Kling 2.6 - AI Video Generation with Sound

Create stunning AI videos with optional sound using Kling 2.6 technology

prompt: The text prompt used to generate the video.
sound*: Specifies whether the generated video contains sound.
aspect ratio: Defines the aspect ratio of the video.

Generate a Kling 2.6 Clip in 3 Steps

1. Pick Text or Image Input

Switch to Text-to-Video for scene prompts or Image-to-Video to animate a photo. Both modes support the optional sound toggle.

2. Dial In Your Settings

Write your prompt, turn Sound on or off, and choose a 5 s or 10 s duration. In Text-to-Video mode, also select an aspect ratio (1:1, 16:9, or 9:16); Image-to-Video keeps the photo's native ratio.

3. Generate & Export

Tap Generate, preview the result with audio, and download the finished MP4 when you're satisfied.
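
For readers who want to sanity-check a combination of options before generating, the settings from the steps above can be sketched as a small validator. This is an illustrative sketch only: the `ClipSettings` class, its field names, and its defaults are hypothetical and are not part of any published Kling 2.6 API. Only the allowed values (two modes, three aspect ratios, 5 s or 10 s) come from this page, and treating the prompt as required in both modes is an assumption.

```python
from dataclasses import dataclass

# Allowed values as listed on this page.
MODES = {"text-to-video", "image-to-video"}
ASPECT_RATIOS = {"1:1", "16:9", "9:16"}
DURATIONS_S = {5, 10}


@dataclass(frozen=True)
class ClipSettings:
    """Hypothetical container for the options chosen in steps 1 and 2."""

    prompt: str
    mode: str = "text-to-video"   # default is an assumption
    sound: bool = False           # default is an assumption
    duration_s: int = 5           # default is an assumption
    aspect_ratio: str = "16:9"    # only checked in text-to-video mode

    def __post_init__(self) -> None:
        # Assumption: a non-empty prompt is required in both modes.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.mode not in MODES:
            raise ValueError(f"mode must be one of {sorted(MODES)}")
        if self.duration_s not in DURATIONS_S:
            raise ValueError("duration_s must be 5 or 10")
        # Image-to-video inherits the photo's ratio, so the field is
        # validated only when the user actually picks a ratio.
        if self.mode == "text-to-video" and self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")


settings = ClipSettings(
    prompt="A fox running through fresh snow",
    sound=True,
    duration_s=10,
    aspect_ratio="9:16",
)
```

A value outside the listed options, such as `duration_s=7`, raises a `ValueError` when the object is created.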

What Sets Kling 2.6 Apart

Toggleable Audio Layer

Flip the sound switch and Kling 2.6 synthesizes ambient audio, dialogue cues, or effects that track the on-screen action frame by frame.

Two Duration Tiers

Choose punchy 5-second clips for social hooks or 10-second cuts for deeper storytelling. Both render at the same quality level.

Refined Visual Engine

Kling 2.6 builds on the proven Kling physics backbone with sharper textures, smoother transitions, and stronger prompt adherence.

Platform-Ready Formats

Output in 1:1 square, 16:9 landscape, or 9:16 portrait — match the native format of any social or broadcast channel without cropping.

Kling 2.6 — Common Questions

Q1. What is Kling 2.6?

Kling 2.6 is Kuaishou's dual-mode video model supporting text-to-video and image-to-video with an optional synchronized audio layer. It outputs in multiple aspect ratios at 5 s or 10 s.

Q2. How does the sound toggle work?

Enable the Sound switch before generating. The model then synthesizes ambient audio, dialogue hints, and effects matched to the visual content. Disabling it produces a silent clip at half the gem cost (80 instead of 160 gems for 5 s, 160 instead of 320 gems for 10 s).

Q3. Which aspect ratios are available?

1:1 (square), 16:9 (landscape), and 9:16 (portrait). Aspect-ratio selection is available in text-to-video mode; image-to-video inherits the photo's native ratio.

Q4. What does each generation cost?

5 s silent = 80 gems, 5 s with sound = 160 gems, 10 s silent = 160 gems, 10 s with sound = 320 gems. Gem packs are on the pricing page.
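
To budget gems across several clips, the four prices above can be put into a small lookup. This is a minimal sketch: the function names `gem_cost` and `batch_cost` are illustrative, and the table simply restates the prices quoted in this answer, which may change on the pricing page.

```python
# (duration in seconds, sound enabled) -> gems, as quoted above.
GEM_COST = {
    (5, False): 80,
    (5, True): 160,
    (10, False): 160,
    (10, True): 320,
}


def gem_cost(duration_s: int, sound: bool) -> int:
    """Return the gem cost of a single generation."""
    try:
        return GEM_COST[(duration_s, sound)]
    except KeyError:
        raise ValueError("duration_s must be 5 or 10") from None


def batch_cost(clips: list[tuple[int, bool]]) -> int:
    """Total gems for a list of (duration_s, sound) pairs."""
    return sum(gem_cost(duration_s, sound) for duration_s, sound in clips)


# One 5 s silent clip plus one 10 s clip with sound: 80 + 320 gems.
total = batch_cost([(5, False), (10, True)])
print(total)  # 400
```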