
Tips and Tricks

Be careful not to use too many words. CLIP (Contrastive Language-Image Pre-training), which maps your text prompt into the embedding space the model is conditioned on, truncates prompts after a fixed number of tokens (77 for the standard CLIP text encoder), so anything past that limit is ignored. Focus on important nouns and adjectives, and don't spend tokens on lengthy sentence structure the way you might with newer models like Flux, which handle longer context. Keep it direct and to the point.
Be mindful of how each step balances the image input against the text prompt. Experiment with strategies that emphasize prompt adherence in the early steps and let the result become more abstract in later ones, or reverse the approach, depending on your goal.
You can’t rely on seeds producing the same results across different machines, so avoid “locking in” a single look. Instead, ensure your prompt works well across a wide range of seeds.
Pay attention to the texture of your image input. Adding a small amount of noise can encourage the model to generate richer detail, but too much noise may result in a muddy or unclear output, so use it carefully.
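As a minimal sketch of adding a small amount of noise to an input frame, the function below assumes the frame is available as a flat buffer of 8-bit channel values (for example raw RGB bytes). The name add_grain and its default strength are hypothetical and illustrative, not part of Daydream or StreamDiffusion; the right amount depends on your input.

```python
import random
from typing import Optional


# Hypothetical helper: adds zero-mean Gaussian noise ("grain") to 8-bit pixel data.
# Assumes `pixels` is a flat buffer of 8-bit channel values, e.g. raw RGB bytes.
def add_grain(pixels: bytes, strength: float = 4.0, seed: Optional[int] = None) -> bytes:
    """Return a noisy copy of `pixels`.

    `strength` is the noise standard deviation in 0-255 units: small values
    add texture, large values make the input (and the output) muddy.
    """
    rng = random.Random(seed)  # seeded generator so results can be reproduced
    out = bytearray(len(pixels))
    for i, value in enumerate(pixels):
        noisy = value + rng.gauss(0.0, strength)
        out[i] = max(0, min(255, round(noisy)))  # clamp back into the 8-bit range
    return bytes(out)


frame = bytes([120, 64, 200] * 4)  # a tiny 4-pixel RGB frame, for illustration
print(list(add_grain(frame, strength=4.0, seed=42)))
```

Start with a low strength and raise it gradually while watching the output; the clamp keeps values valid, but it cannot prevent a muddy result at high strengths.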
A useful approach is to initialize every ControlNet when you create the stream, with its conditioning_scale set to 0, and to include every ControlNet in each subsequent update, whether or not you are using it. The number of ControlNets then stays the same between updates, and only the relevant conditioning_scale values change. A conditioning_scale of 0 effectively disables a ControlNet: the AI runner ignores it, so it stays present without triggering any reloads or changes to the model structure. Use the example below to get started.
```json
{
    "controlnets": [
        {
            "enabled": true,
            "model_id": "thibaud/controlnet-sd21-openpose-diffusers",
            "preprocessor": "pose_tensorrt",
            "conditioning_scale": 0.47,
            "preprocessor_params": {},
            "control_guidance_end": 1,
            "control_guidance_start": 0
        },
        {
            "enabled": true,
            "model_id": "thibaud/controlnet-sd21-hed-diffusers",
            "preprocessor": "soft_edge",
            "conditioning_scale": 0.14,
            "preprocessor_params": {},
            "control_guidance_end": 1,
            "control_guidance_start": 0
        },
        {
            "enabled": true,
            "model_id": "thibaud/controlnet-sd21-canny-diffusers",
            "preprocessor": "canny",
            "conditioning_scale": 0.27,
            "preprocessor_params": {
                "low_threshold": 100,
                "high_threshold": 200
            },
            "control_guidance_end": 1,
            "control_guidance_start": 0
        },
        {
            "enabled": true,
            "model_id": "thibaud/controlnet-sd21-depth-diffusers",
            "preprocessor": "depth_tensorrt",
            "conditioning_scale": 0.34,
            "preprocessor_params": {},
            "control_guidance_end": 1,
            "control_guidance_start": 0
        },
        {
            "enabled": true,
            "model_id": "thibaud/controlnet-sd21-color-diffusers",
            "preprocessor": "passthrough",
            "conditioning_scale": 0.66,
            "preprocessor_params": {},
            "control_guidance_end": 1,
            "control_guidance_start": 0
        }
    ]
}
```
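The "same ControlNets, different scales" pattern can be sketched as a small helper that copies the full list and only rewrites conditioning_scale. This is a hypothetical sketch: build_controlnet_update is not part of any Daydream or StreamDiffusion API, the field names are taken from the JSON example, and how the payload is sent to the stream is not shown.

```python
import copy
import json


# Hypothetical helper (not a Daydream/StreamDiffusion API). Field names follow
# the "controlnets" JSON example; sending the payload is out of scope here.
def build_controlnet_update(controlnets, active_scales):
    """Return a copy of `controlnets` that keeps every entry, in the same order.

    Entries whose model_id is in `active_scales` get that conditioning_scale;
    every other entry gets 0, which disables it without removing it.
    """
    known = {cn["model_id"] for cn in controlnets}
    unknown = set(active_scales) - known
    if unknown:
        # Adding a new ControlNet would change the count, which this pattern avoids.
        raise ValueError(f"not in the initial ControlNet list: {sorted(unknown)}")
    updated = copy.deepcopy(controlnets)  # leave the caller's list untouched
    for cn in updated:
        cn["conditioning_scale"] = float(active_scales.get(cn["model_id"], 0.0))
    return updated


# At stream creation every scale is 0 (entries shortened for brevity).
initial = [
    {"enabled": True, "model_id": "thibaud/controlnet-sd21-openpose-diffusers",
     "preprocessor": "pose_tensorrt", "conditioning_scale": 0.0},
    {"enabled": True, "model_id": "thibaud/controlnet-sd21-canny-diffusers",
     "preprocessor": "canny", "conditioning_scale": 0.0},
]

# Later update: only canny is turned on; openpose stays in the list at 0.
update = {"controlnets": build_controlnet_update(
    initial, {"thibaud/controlnet-sd21-canny-diffusers": 0.27})}
print(json.dumps(update, indent=2))
```

Rejecting unknown model_id values is a deliberate choice in this sketch: it keeps an update from silently changing the number of ControlNets, which is exactly what the approach is meant to prevent.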
This video demonstrates how to use ControlNets with animal prompts to achieve better results.
This video demonstrates a very powerful tool for getting exactly what you want from StreamDiffusion and Daydream.
If you are working with multiple speech prompts, you can blend each new phrase with the previous one to create a smoother, more continuous flow. For example, send the first transcribed phrase as a prompt:
prompt: "A beautiful tree"
When the next phrase arrives:
prompt: "Stands by the river"
merge the two phrases into a list of [text, weight] pairs, where each float is the weight of its corresponding prompt:
prompt: [["A beautiful tree", 0.6], ["Stands by the river", 0.4]]
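The blending step can be sketched as follows, assuming the [text, weight] pair format from the example and weights that sum to 1 (the 0.6 / 0.4 split gives the previous phrase the larger weight). blend_prompts is a hypothetical helper, and previous_weight is an illustrative parameter you would tune.

```python
import json


# Hypothetical helper; the [text, weight] pair format follows the example prompt.
def blend_prompts(previous, current, previous_weight=0.6):
    """Blend the previous phrase into the current one.

    With no previous phrase, the current phrase is returned as a plain string,
    as in the first prompt of the example. Otherwise the two phrases are
    returned as [text, weight] pairs whose weights sum to 1.
    """
    if not 0.0 <= previous_weight <= 1.0:
        raise ValueError("previous_weight must be between 0 and 1")
    if not previous:
        return current
    current_weight = round(1.0 - previous_weight, 6)  # avoid float noise like 0.39999...
    return [[previous, previous_weight], [current, current_weight]]


previous = None
for phrase in ["A beautiful tree", "Stands by the river"]:
    payload = {"prompt": blend_prompts(previous, phrase)}
    print(json.dumps(payload))
    previous = phrase  # the next phrase is blended with this one
```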
Use a sliding window for prompting: select a subset of the most recent context to focus on, and drop the oldest part as new speech arrives. In the example below, the window holds the three most recent transcripts:
Transcript 1: "A beautiful" → Queue: "A beautiful"
Transcript 2: "scene with" → Queue: "A beautiful scene with"
Transcript 3: "mountains" → Queue: "A beautiful scene with mountains"
Transcript 4: "and lakes" → Queue: "scene with mountains and lakes" (removed "A beautiful")
Transcript 5: "by the river" → Queue: "mountains and lakes by the river" (removed "scene with")
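A minimal sketch of this window uses a bounded deque: appending a new transcript chunk automatically evicts the oldest once the window is full. PromptWindow is a hypothetical name, and window_size=3 is chosen only because it reproduces the example; the window counts transcript chunks, not words.

```python
from collections import deque


# Hypothetical helper: keeps the N most recent transcript chunks as the prompt.
class PromptWindow:
    def __init__(self, window_size: int = 3):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        # deque(maxlen=...) drops the oldest chunk when a new one is appended.
        self._chunks = deque(maxlen=window_size)

    def push(self, chunk: str) -> str:
        """Add a transcript chunk and return the current windowed prompt."""
        text = chunk.strip()
        if text:  # ignore empty transcripts so they don't evict real context
            self._chunks.append(text)
        return " ".join(self._chunks)


window = PromptWindow(window_size=3)
for chunk in ["A beautiful", "scene with", "mountains", "and lakes", "by the river"]:
    print(window.push(chunk))
```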
To hide the player controls during playback, append &controls=false to the playback URL:
https://lvpr.tv/?v=<your output_playback_id>&controls=false
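If you build playback links in code, a small sketch with the standard library's urlencode keeps the query string well-formed. playback_url is a hypothetical helper, and "abc123" stands in for your output_playback_id.

```python
from urllib.parse import urlencode


# Hypothetical helper: builds an lvpr.tv player URL for an output playback ID.
def playback_url(playback_id: str, show_controls: bool = True) -> str:
    params = {"v": playback_id}
    if not show_controls:
        params["controls"] = "false"  # hides the player controls
    return "https://lvpr.tv/?" + urlencode(params)


print(playback_url("abc123", show_controls=False))
```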