Pipelines Supported in Scope
Scope supports five autoregressive video diffusion pipelines. All are built on Wan2.1 base models and use the Self-Forcing training methodology, except RewardForcing, which uses Rewarded Distribution Matching. Each pipeline has its own strengths: some optimize for speed, others for quality or long-context consistency. The comparison tables below summarize the key differences to help you choose.

Quick Comparison
Four pipelines use the smaller Wan2.1 1.3B model, while Krea Realtime uses the larger 14B model for higher-quality output at the cost of increased VRAM requirements.

| Pipeline | Base Model | Creator | Estimated VRAM* |
|---|---|---|---|
| StreamDiffusion V2 | Wan2.1 1.3B | StreamDiffusion team | ~20GB |
| LongLive | Wan2.1 1.3B | Nvidia, MIT, HKUST, HKU, THU | ~20GB |
| Krea Realtime | Wan2.1 14B | Krea | ~32GB |
| RewardForcing | Wan2.1 1.3B | ZJU, Ant Group, SIAS-ZJU, HUST, SJTU | ~20GB |
| MemFlow | Wan2.1 1.3B | Kling | ~20GB |
Feature Support
All pipelines support Text-to-Video (T2V) generation, LoRA adapters, and VACE conditioning. All also support Video-to-Video (V2V), though Krea Realtime's V2V support is limited. MemFlow is unique in having a memory bank for improved long-context consistency.

| Feature | StreamDiffusion V2 | LongLive | Krea Realtime | RewardForcing | MemFlow |
|---|---|---|---|---|---|
| Text-to-Video (T2V) | Yes | Yes | Yes | Yes | Yes |
| Video-to-Video (V2V) | Yes | Yes | Limited* | Yes | Yes |
| LoRA Support | 1.3B LoRAs | 1.3B LoRAs | 14B LoRAs | 1.3B LoRAs | 1.3B LoRAs |
| VACE Support | Yes | Yes | Yes | Yes | Yes |
| Memory Bank | No | No | No | No | Yes |
Pipeline Details
StreamDiffusion V2
Real-time streaming from the original StreamDiffusion creators
LongLive
Smooth prompt transitions and extended generation from Nvidia
Krea Realtime
14B model for highest quality generation
RewardForcing
Reward-matched training for improved output quality
MemFlow
Memory bank for long-context consistency
Shared Parameters
All pipelines share these common parameters.

Resolution
Generation is faster at smaller resolutions, resulting in smoother video. Visual quality is best at 832x480, which is the training resolution for most pipelines; you may need a more powerful GPU to maintain high FPS at this resolution.

Seed
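The seed behavior described below follows the standard RNG-seeding principle. This toy sketch (plain Python `random`, not Scope's actual API) shows why saving a seed lets you reproduce a generation:

```python
import random

def sample_noise(seed, n=4):
    # Seeding the RNG makes sampling deterministic:
    # the same seed always yields the same noise sequence.
    rng = random.Random(seed)
    return [round(rng.gauss(0.0, 1.0), 6) for _ in range(n)]

# Same seed -> identical output; different seed -> different output.
assert sample_noise(42) == sample_noise(42)
assert sample_noise(42) != sample_noise(7)
```

The same principle applies to the pipelines' seed parameter: a saved seed plus the same prompt sequence reproduces the run.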
The seed parameter enables reproducible generations. If you find a seed value that produces good results with a specific prompt sequence, save it to reproduce that generation later.

Prompting
These techniques apply to all pipelines and significantly improve output quality.

Subject and Background Anchors
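As an illustration of the anchoring technique described here, the hypothetical prompt sequence below repeats the same subject ("red fox") and setting ("snowy forest") so the scene stays consistent across prompt switches:

```python
# Hypothetical prompt sequence: each prompt re-states the same subject
# and setting anchors so the pipeline preserves scene continuity.
prompts = [
    "A red fox walking through a snowy forest, soft morning light",
    "A red fox pausing by a frozen stream in a snowy forest",
    "A red fox curling up under a pine in a snowy forest at dusk",
]

assert all("red fox" in p for p in prompts)       # subject anchor
assert all("snowy forest" in p for p in prompts)  # setting anchor
```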
Include a clear subject (who/what) and background/setting (where) in each prompt. For scene continuity, reference the same subject and/or setting across prompts.

Cinematic Long Takes
The models work better with long cinematic takes than with rapid shot-by-shot transitions. Avoid quick cutscenes, rapid scene changes, and jump cuts; instead, let scenes flow naturally with gradual transitions.

Long, Detailed Prompts
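One way to apply the expansion technique below is to wrap your base prompt in a fixed instruction and paste the result into any LLM. This helper is a hypothetical sketch, not part of Scope:

```python
# Hypothetical helper: builds an instruction for an LLM (ChatGPT, Claude,
# etc.) that expands a short base prompt into a detailed video prompt.
def expansion_instruction(base_prompt: str) -> str:
    return (
        "Expand this video prompt into one detailed paragraph. "
        "Keep the subject and setting, and add camera movement, "
        "lighting, and atmosphere details: " + base_prompt
    )

msg = expansion_instruction("a red fox in a snowy forest")
assert "red fox" in msg
```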
All pipelines perform better with long, detailed prompts. A helpful technique is to expand a short base prompt using an LLM (ChatGPT, Claude, etc.).

See Also
VAE Types
Configure VAE for quality/speed tradeoffs
System Requirements
Hardware requirements for each pipeline
Pipeline Architecture
Technical details for node developers