Pipelines Supported in Scope

Scope supports five autoregressive video diffusion pipelines. All are built on Wan2.1 base models and trained with the Self-Forcing methodology, except RewardForcing, which uses Rewarded Distribution Matching. Each pipeline has its own strengths: some optimize for speed, others for quality or long-context consistency. The comparison tables below summarize the key differences to help you choose.

Quick Comparison

Four pipelines use the smaller Wan2.1 1.3B model, while Krea Realtime uses the larger 14B model for higher quality output at the cost of increased VRAM requirements.
| Pipeline | Base Model | Creator | Estimated VRAM* |
|---|---|---|---|
| StreamDiffusion V2 | Wan2.1 1.3B | StreamDiffusion team | ~20GB |
| LongLive | Wan2.1 1.3B | Nvidia, MIT, HKUST, HKU, THU | ~20GB |
| Krea Realtime | Wan2.1 14B | Krea | ~32GB |
| RewardForcing | Wan2.1 1.3B | ZJU, Ant Group, SIAS-ZJU, HUST, SJTU | ~20GB |
| MemFlow | Wan2.1 1.3B | Kling | ~20GB |
*Estimated runtime VRAM usage. A 24GB GPU (e.g. RTX 4090) is the smallest commercially available card that meets the 1.3B pipelines' requirements. See System Requirements for minimum hardware specs.

Feature Support

All pipelines support both Text-to-Video (T2V) and Video-to-Video (V2V) generation modes, as well as LoRA adapters and VACE conditioning. MemFlow is unique in having a memory bank for improved long-context consistency.
| Feature | StreamDiffusion V2 | LongLive | Krea Realtime | RewardForcing | MemFlow |
|---|---|---|---|---|---|
| Text-to-Video (T2V) | Yes | Yes | Yes | Yes | Yes |
| Video-to-Video (V2V) | Yes | Yes | Limited* | Yes | Yes |
| LoRA Support | 1.3B LoRAs | 1.3B LoRAs | 14B LoRAs | 1.3B LoRAs | 1.3B LoRAs |
| VACE Support | Yes | Yes | Yes | Yes | Yes |
| Memory Bank | No | No | No | No | Yes |
*Krea Realtime’s regular V2V mode (latent initialization) has known quality issues. Use VACE V2V (visual conditioning with input video) for better results.

Pipeline Details


Shared Parameters

All pipelines share these common parameters.

Resolution

Generation is faster at smaller resolutions, resulting in smoother video. The visual quality is best at 832x480, which is the training resolution for most pipelines. You may need a more powerful GPU to maintain high FPS at this resolution.

Seed

The seed parameter enables reproducible generations. If you find a seed value that produces good results with a specific prompt sequence, save it to reproduce that generation later.
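For example, once a run looks good, you can store the seed alongside the prompt sequence and resolution so the same generation can be replayed later. The sketch below is illustrative only; the field names are hypothetical and not Scope's actual configuration schema.

```python
import json

# Hypothetical record of a generation "recipe"; these field names are
# illustrative, not Scope's actual API or config schema.
recipe = {
    "seed": 421337,
    "resolution": [832, 480],
    "prompts": [
        "A 3D animated scene. A panda walks along a path towards the camera in a park on a spring day.",
        "A 3D animated scene. A panda halts along a path in a park on a spring day.",
    ],
}

# Save the recipe so the generation settings can be reproduced later.
with open("panda_recipe.json", "w") as f:
    json.dump(recipe, f, indent=2)

# Reload it and reuse the exact same seed and prompt sequence.
with open("panda_recipe.json") as f:
    restored = json.load(f)
```

Keeping the seed and prompts together in one file avoids the common failure mode of finding a great seed and then forgetting which prompt sequence it was paired with.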

Prompting

These techniques apply to all pipelines and significantly improve output quality.

Subject and Background Anchors

Include a clear subject (who/what) and background/setting (where) in each prompt. For scene continuity, reference the same subject and/or setting across prompts.
"A 3D animated scene. A panda walks along a path towards the camera in a park on a spring day."
"A 3D animated scene. A panda halts along a path in a park on a spring day."
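One way to keep anchors consistent is to compose each prompt from the same subject and setting strings, varying only the action. The helper below is a sketch of this idea; the function name is my own and not part of Scope.

```python
def anchored_prompt(style: str, subject: str, action: str, setting: str) -> str:
    """Compose a prompt that repeats the same subject and setting anchors."""
    return f"{style} {subject} {action} {setting}."

style = "A 3D animated scene."
subject = "A panda"
setting = "in a park on a spring day"

# Vary only the action; the subject and setting stay fixed for continuity.
prompts = [
    anchored_prompt(style, subject, "walks along a path towards the camera", setting),
    anchored_prompt(style, subject, "halts along a path", setting),
]
```

Because the subject and setting never change between prompts, the model is less likely to drift to a new character or location mid-sequence.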

Cinematic Long Takes

The models work better with long cinematic takes rather than rapid shot-by-shot transitions. Avoid quick cutscenes, rapid scene changes, and jump cuts. Instead, let scenes flow naturally with gradual transitions.

Long, Detailed Prompts

All pipelines perform better with detailed prompts. A helpful technique is to expand a base prompt using an LLM (ChatGPT, Claude, etc.).
Base prompt:
"A cartoon dog jumping and then running."
Expanded prompt:
"A cartoon dog with big expressive eyes and floppy ears suddenly leaps into the frame, tail wagging, and then sprints joyfully toward the camera. Its oversized paws pound playfully on the ground, tongue hanging out in excitement. The animation style is colorful, smooth, and bouncy, with exaggerated motion to emphasize energy and fun. The background blurs slightly with speed lines, giving a lively, comic-style effect as if the dog is about to jump right into the viewer."
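The expansion step can be scripted. The sketch below only builds the instruction text you would send to an LLM of your choice; the exact wording and any API client are assumptions of mine, and nothing here is Scope-specific.

```python
def expansion_instruction(base_prompt: str) -> str:
    """Build an instruction asking an LLM to expand a terse video prompt.

    The instruction wording is an example, not a prescribed template.
    """
    return (
        "Expand the following video-generation prompt into one detailed "
        "paragraph. Describe the subject's appearance, the motion, the "
        "animation style, and the background, and keep it as a single "
        "continuous shot with no cuts or scene changes.\n\n"
        f"Base prompt: {base_prompt}"
    )

instruction = expansion_instruction("A cartoon dog jumping and then running.")
# Send `instruction` to ChatGPT, Claude, etc. with whatever client you
# use; the model's reply becomes the expanded prompt.
```

Note the "single continuous shot" constraint, which also steers the expanded prompt toward the long cinematic takes these models prefer.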

See Also