StreamDiffusion V2 Pipeline
StreamDiffusionV2 is a streaming inference pipeline and autoregressive video diffusion model from the creators of the original StreamDiffusion project. The model is trained using Self-Forcing on Wan2.1 1.3B, with modifications to support streaming.
At a Glance
| Spec | Value |
| --- | --- |
| Base Model | Wan2.1 1.3B |
| Estimated VRAM | ~20GB |
| Training | Self-Forcing |
| LoRA Support | 1.3B LoRAs |
| VACE Support | Yes |
| T2V / V2V | Yes / Yes |
Examples
The following examples include timeline JSON files with the prompts used, so you can try them yourself.
Evolution
Timeline JSON file
Download the timeline to try this example
Feline
Timeline JSON file
Download the timeline to try this example
Prey
Timeline JSON file
Download the timeline to try this example
Resolution
Generation is faster at smaller resolutions, which results in smoother video. Scope currently uses the input video’s resolution as the output resolution. Visual quality is best at 832x480, the resolution the model was trained on, but you may need a more powerful GPU to reach a high FPS at that resolution.
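Since the output resolution follows the input, one practical approach is to downscale the input video to 832x480 before streaming it in. Below is a minimal preprocessing sketch using OpenCV; the file names are placeholders, and this is a generic step rather than part of the pipeline itself:

```python
import cv2

# Resize an input video to 832x480 (the model's training resolution)
# so the output is generated at that resolution as well.
# "input.mp4" and "input_832x480.mp4" are placeholder paths.
cap = cv2.VideoCapture("input.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
out = cv2.VideoWriter("input_832x480.mp4", fourcc, fps, (832, 480))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # cv2.resize takes (width, height)
    out.write(cv2.resize(frame, (832, 480)))

cap.release()
out.release()
```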
Seed
The seed parameter in the UI can be used to reproduce generations. If you like the result for a certain seed value, input video, and sequence of prompts, you can reuse that seed later with the same input video and prompts to reproduce the generation.
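The same principle applies when driving a diffusion model programmatically. The sketch below is a generic PyTorch illustration of seed-based reproducibility; the generate_video function is a stand-in, not StreamDiffusionV2's actual API:

```python
import torch

SEED = 1234

def generate_video(prompt: str, generator: torch.Generator) -> torch.Tensor:
    # Placeholder for a real pipeline call that accepts a
    # torch.Generator to drive its random sampling.
    return torch.randn(16, 3, 480, 832, generator=generator)

# Re-running with the same seed, prompt, and input reproduces the output.
gen = torch.Generator().manual_seed(SEED)
video_a = generate_video("a cat stalking through tall grass", gen)

gen = torch.Generator().manual_seed(SEED)
video_b = generate_video("a cat stalking through tall grass", gen)

assert torch.equal(video_a, video_b)  # identical outputs for identical seeds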
Prompting
The model works better with long, detailed prompts. A helpful technique for extending a prompt is to take a base prompt and ask an LLM chatbot (e.g. ChatGPT, Claude, Gemini) to write a more detailed version of it.
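As one way to automate that expansion, here is a sketch using the OpenAI Python client; the model name and instruction wording are arbitrary illustrative choices, not part of this pipeline:

```python
from openai import OpenAI

# Requires OPENAI_API_KEY in the environment.
client = OpenAI()

base_prompt = "a cat stalking through tall grass"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model choice
    messages=[
        {
            "role": "user",
            "content": (
                "Rewrite this video generation prompt as a single "
                f"long, detailed paragraph: {base_prompt}"
            ),
        },
    ],
)

# Use the expanded text as the pipeline prompt.
print(response.choices[0].message.content)
```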
Offline Generation
A test script can be used for offline generation. Download the model weights first if they are not present yet, then run the test script; it writes an output.mp4 file in the streamdiffusionv2 directory.
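As a rough sketch of that workflow in Python, assuming the weights are hosted on Hugging Face: the repo id and script name below are placeholders, so check the StreamDiffusionV2 README for the actual values.

```python
import subprocess
from huggingface_hub import snapshot_download

# Placeholder repo id; replace with the weights repo named in the
# StreamDiffusionV2 README.
snapshot_download(
    repo_id="<streamdiffusionv2-weights-repo>",
    local_dir="weights",
)

# Placeholder script name; runs the offline test script, which
# writes output.mp4 in the streamdiffusionv2 directory.
subprocess.run(
    ["python", "test.py"],
    cwd="streamdiffusionv2",
    check=True,
)
```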
See Also
Other Pipelines
- LongLive: Smooth prompt transitions and extended generation from NVIDIA
- Krea Realtime: 14B model for highest quality generation
- RewardForcing: Reward-matched training for improved output quality
- MemFlow: Memory bank for long-context consistency