Seedance 2.0 — ByteDance AI Video with Optional Audio
Generate 4 s, 8 s, or 12 s clips at up to 1080p in six aspect ratios, and toggle synchronized sound effects on or off for each render.
Create a Seedance 2.0 Video in 3 Steps
Describe or Upload
Pick Text-to-Video and write a scene prompt (3–2,500 characters), or switch to Image-to-Video and drag in up to two reference photos (JPEG, PNG, or WEBP; 10 MB max each).
Adjust Settings
Select an aspect ratio (1:1, 16:9, 9:16, 21:9, 4:3, or 3:4), resolution (480p / 720p / 1080p), duration (4 s / 8 s / 12 s), and flip the audio toggle if you want synchronized sound.
Render & Download
Hit Generate, wait for the render to finish, then preview the clip inline and save the final MP4 to your device.
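The limits from the first two steps can be gathered into a single pre-flight check. The Python below is a minimal sketch, not Seedance's API. `RenderRequest`, `validate`, and every constant name are hypothetical. It identifies image formats by file extension only, reads "10 MB" as 10 MiB (an assumption), and skips the prompt check in image-to-video mode because this page doesn't say whether a prompt is required there.

```python
from dataclasses import dataclass, field
from pathlib import Path

# Limits as listed on this page. All names here are hypothetical, not an official API.
PROMPT_MIN_CHARS, PROMPT_MAX_CHARS = 3, 2500
MAX_IMAGES = 2
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}  # JPEG / PNG / WEBP
ASPECT_RATIOS = {"1:1", "16:9", "9:16", "21:9", "4:3", "3:4"}
RESOLUTIONS = {"480p", "720p", "1080p"}
DURATIONS_S = {4, 8, 12}


@dataclass
class RenderRequest:
    mode: str  # "text-to-video" or "image-to-video"
    aspect_ratio: str
    resolution: str
    duration_s: int
    audio: bool = False
    prompt: str = ""
    # (file name, size in bytes) for each uploaded reference image
    images: list[tuple[str, int]] = field(default_factory=list)


def validate(req: RenderRequest) -> list[str]:
    """Collect every rule the request breaks; an empty list means it passes."""
    errors: list[str] = []
    if req.mode == "text-to-video":
        if not PROMPT_MIN_CHARS <= len(req.prompt) <= PROMPT_MAX_CHARS:
            errors.append(f"prompt must be {PROMPT_MIN_CHARS}-{PROMPT_MAX_CHARS} characters")
    elif req.mode == "image-to-video":
        # The page doesn't say whether a prompt is also required here, so it isn't checked.
        if not 1 <= len(req.images) <= MAX_IMAGES:
            errors.append(f"upload 1-{MAX_IMAGES} reference images")
        for name, size in req.images:
            # Extension check only; a real uploader would also inspect file contents.
            if Path(name).suffix.lower() not in IMAGE_EXTENSIONS:
                errors.append(f"{name}: must be JPEG, PNG, or WEBP")
            if size > MAX_IMAGE_BYTES:
                errors.append(f"{name}: larger than 10 MB")
    else:
        errors.append(f"unknown mode: {req.mode!r}")
    if req.aspect_ratio not in ASPECT_RATIOS:
        errors.append(f"unsupported aspect ratio: {req.aspect_ratio}")
    if req.resolution not in RESOLUTIONS:
        errors.append(f"unsupported resolution: {req.resolution}")
    if req.duration_s not in DURATIONS_S:
        errors.append(f"duration must be 4, 8, or 12 s, got {req.duration_s}")
    return errors
```

Returning a list of messages instead of raising on the first problem lets a form show every issue at once. For example, a 16:9, 720p, 8 s text-to-video request with a normal-length prompt returns an empty list.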
Why Choose Seedance 2.0
Dual-Input Pipeline
Describe a scene in words or upload one or two reference images — Seedance 2.0 handles both text-to-video and image-to-video through one unified interface.
Three Resolution Tiers
480p for fast storyboards, 720p for balanced quality, and 1080p for broadcast-grade output. Pick the tier that fits your deadline and budget.
Six Aspect Ratios
1:1, 16:9, 9:16, 21:9, 4:3, and 3:4 cover Instagram, YouTube, TikTok, ultra-wide displays, and classic broadcast formats; pick the ratio each render needs.
On-Demand Audio Layer
Flip a toggle to generate synchronized sound effects alongside the visuals. Audio is billed per second only when enabled, so silent renders stay cheaper.
Flexible Duration Control
Choose 4 s for quick loops, 8 s for standard social clips, or 12 s for longer narrative sequences — all rendered through the same pipeline.
ByteDance Video Engine
Powered by ByteDance's proprietary video diffusion stack, Seedance 2.0 delivers sharp detail, natural motion, and reliable prompt adherence across all settings.
Seedance 2.0 — Common Questions
Q1. What is Seedance 2.0?
Seedance 2.0 is ByteDance's AI video generation model. It turns text prompts or uploaded images into polished clips at up to 1080p, with optional synchronized audio, six aspect ratios, and three duration tiers.
Q2. What resolutions and durations are available?
Resolutions: 480p (Standard), 720p (High), and 1080p (Ultra). Durations: 4 s, 8 s, or 12 s. All combinations work with both text-to-video and image-to-video.
Q3. How is pricing calculated?
Cost scales with resolution and duration — roughly 8–10 gems per second at 480p, 12–14 at 720p, and 16–18 at 1080p. Enabling audio adds about 2 gems per second on top.
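Because the rates above are quoted as ranges, an estimate is best expressed as a low/high pair: (video rate + audio rate) × duration. The Python below is a minimal sketch that encodes only the approximate figures from this answer. `estimate_gems` and the rate tables are hypothetical names, and actual billing may round or differ from these rates.

```python
# Approximate per-second rates from this page, as (low, high) gem ranges.
# The names are hypothetical; actual billing may round or change.
VIDEO_GEMS_PER_SECOND = {
    "480p": (8, 10),
    "720p": (12, 14),
    "1080p": (16, 18),
}
AUDIO_GEMS_PER_SECOND = 2  # "about 2 gems per second" when audio is enabled
DURATIONS_S = (4, 8, 12)


def estimate_gems(resolution: str, duration_s: int, audio: bool = False) -> tuple[int, int]:
    """Return a (low, high) estimate: (video rate + audio rate) x duration."""
    if resolution not in VIDEO_GEMS_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution}")
    if duration_s not in DURATIONS_S:
        raise ValueError(f"duration must be 4, 8, or 12 s, got {duration_s}")
    low, high = VIDEO_GEMS_PER_SECOND[resolution]
    audio_rate = AUDIO_GEMS_PER_SECOND if audio else 0
    return (low + audio_rate) * duration_s, (high + audio_rate) * duration_s
```

For example, an 8 s 720p clip with audio comes to (12 + 2) × 8 = 112 to (14 + 2) × 8 = 128 gems, while a silent 4 s 480p clip comes to 32 to 40 gems.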
Q4. Which image formats can I upload?
JPEG, PNG, and WEBP — up to 10 MB per file and a maximum of two images per generation in image-to-video mode.
Q5. How does the audio toggle work?
When enabled, the model generates synchronized sound effects alongside the video frames. The audio layer adds a small per-second surcharge. Leave the toggle off for a silent clip at lower cost.