Generating synchronized audio and video with LTX 2.3 in Scope
LTX 2.3 is a 22-billion-parameter audio-video foundation model from Lightricks. It generates synchronized video and audio from a single text prompt, which makes it different from every other pipeline shipped with Scope. This guide walks you through installing the scope-ltx-2 plugin, running it locally or in the cloud, and using the two LoRA families that define this pipeline: ID-LoRA for identity-driven talking-head video, and IC-LoRAs for structural control.

Running locally? You need to install the scope-ltx-2 plugin yourself (see Install the plugin below). It is not bundled with Scope by default, but it can be installed directly from the desktop app.

Running in the cloud via Remote Inference? The plugin is already pre-installed in the cloud image. Skip the install step and jump to Run it in the cloud.
scope-ltx-2 on GitHub
Source repository for the LTX 2.3 Scope plugin
LTX 2.3 Pipeline Reference
Full parameter schema, schedules, memory architecture, audio modes, and the complete IC-LoRA catalog. Helpful alongside this guide whenever you want the technical detail behind a setting.
Prerequisites
These requirements apply to local installs. For Remote Inference, you only need a Daydream account signed in from the desktop app, and none of the hardware or token setup below applies.
- NVIDIA GPU with 24 GB VRAM or more (RTX 4090, A5000, or similar)
- CUDA 12.8 or higher
- Python 3.12 or higher
- A HuggingFace access token with read permissions (see HuggingFace Auth)
Install the plugin
You can install the plugin from the Scope desktop app or from the CLI. Both paths install the same ltx2 pipeline.
Desktop app
See the Plugins guide for more detail on the desktop plugin workflow.
CLI
From the scope directory, run the plugin install command, then restart the server and confirm the ltx2 pipeline is available.
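Assuming uv is available and you are working from the scope directory, the registration check uses the same command shown in Troubleshooting below:

```
uv run daydream-scope pipelines
```

If ltx2 appears in the output, the plugin installed correctly.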
Upgrade
In the desktop app, open Settings → Plugins. If an update is available, an Update button appears next to the plugin; click it and wait for the server to restart. The same upgrade can be performed from the CLI.
Uninstall
In the desktop app, open Settings → Plugins and click the trash icon next to the plugin. The plugin can also be removed from the CLI.
Run it locally
Set your HuggingFace token
LTX 2.3 downloads weights from the daydreamlive/LTX2.3 repository on first load. Export a token with read access before launching, and add the export to your shell profile (~/.bashrc, ~/.zshrc) so it persists across sessions.
Select the LTX 2.3 pipeline
In the pipeline picker, choose LTX 2.3. On first load, Scope downloads roughly 28 GB of weights (transformer, text encoder, VAEs, ID-LoRA), which can take several minutes depending on your connection.
Weights are stored in ~/.daydream-scope/models/LTX2.3/ by default. Override this location with the DAYDREAM_SCOPE_MODELS_DIR environment variable.
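A minimal shell-profile sketch covering the token and an optional custom models directory (both values are placeholders; substitute your own read token and path):

```shell
# Placeholder token; use your own HuggingFace token with read access
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
# Optional: store model weights somewhere other than ~/.daydream-scope/models
export DAYDREAM_SCOPE_MODELS_DIR=/mnt/fast-disk/scope-models
```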
Run it in the cloud
LTX 2.3 is pre-bundled in the Daydream cloud image used by Remote Inference, so there is no plugin install, no HF_TOKEN to export, and no model download to wait on. Everything that the local setup above covers is already done for you on the cloud H100.
Sign in from the desktop app
Open Settings (gear icon, top-right) → Account and sign in with Daydream.
Remote Inference
Full Remote Inference setup, supported features, and known limitations
Generation modes
LTX 2.3 supports three modes. Parameter details for each are on the reference page.
Text mode
The default. A prompt produces synchronized video and audio with no other input.
Image-to-video
Upload a reference image as the I2V input and the first frame of the generation will be conditioned on it. Use the i2v_strength slider to dial back the conditioning if the first frame feels too locked to the reference.
Video mode
Used with IC-LoRAs for structural control. Select a matching IC-LoRA, switch the input to Video mode, and provide a reference video. Use control_strength to adjust how strictly the output follows the reference.
Identity-driven talking-head with ID-LoRA
ID-LoRA generates video of a specific person speaking with their own voice. Unlike cascaded pipelines that generate video and voice separately, LTX 2.3’s ID-LoRA produces lip-synced audio-video in one pass. You give it three things:
- A text prompt describing the scene
- A reference image of the subject
- A short audio clip (about 5 seconds) of the subject speaking
Prompt structure
ID-LoRA responds best to channel-tagged prompts: [VISUAL] for scene content, [SPEECH] for what the subject says, [SOUNDS] for ambient audio.
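A hypothetical prompt in that format (the scene, dialogue, and ambience here are made up for illustration):

```
[VISUAL] A man in a denim jacket sits at a kitchen table, soft morning light from a window.
[SPEECH] I tested this for a week, and here's what surprised me.
[SOUNDS] A kettle hums quietly in the background.
```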
Workflow
Enable ID-LoRA mode
Set Audio Mode to id_lora. This tells the pipeline that the audio input is an identity reference, not a driving track.
Upload a reference audio clip
Upload about 5 seconds of clean audio of the subject speaking. Clip quality matters more than length.
Write your prompt
Use the channel-tagged format above. Describe the scene, the dialogue, and the ambient audio separately.
Tune identity guidance
The Identity Guidance slider (default 3.0, range 0.0 to 20.0) controls how strongly speaker identity is amplified. Around 4.0 gives roughly 9% better speaker similarity; higher values lock identity harder at the cost of some diversity. Set it to 0 to disable.
ID-LoRA weights (ltx-2.3-id-lora-talkvid-3k.safetensors) download automatically with the base model. You do not need to install them separately.
Structural control with IC-LoRAs
IC-LoRAs (In-Context LoRAs) condition generation on a reference video rather than a text prompt alone. They are how you get precise control over output structure: depth maps, pose skeletons, edge compositions, camera movements, or stylistic transformations like anime-to-photorealism.
Install an IC-LoRA
Download the .safetensors file
Browse the IC-LoRA catalog on the reference page and download the file from HuggingFace.
Place it in the LoRA directory
Copy the file into ~/.daydream-scope/models/lora/. Scope discovers new files automatically and exposes them in the LoRA picker.
Select it in the LoRA picker
In the Scope UI, open the LoRA adapters panel and select your IC-LoRA from the dropdown.
Switch to video mode
Set the input mode to Video and upload your reference video (depth map, pose skeleton, color-graded clip, or whatever the chosen IC-LoRA expects).
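Once the file is copied, the models directory should look something like this (the IC-LoRA filename is a placeholder for whichever file you downloaded):

```
~/.daydream-scope/models/
├── LTX2.3/                        # base weights, downloaded automatically
└── lora/
    └── union-control.safetensors  # hypothetical IC-LoRA filename
```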
Which IC-LoRA should I use?
Match the IC-LoRA to your goal:
- Depth, edges, or pose control → Union Control (Lightricks)
- Motion paths from spline trajectories → Motion Track Control (Lightricks)
- Anime-to-photorealism → Anime2Real (community)
- Replace regions in a video via mask → Inpaint (community)
- Strip color grading → Ungrade (community)
- Sharpen soft footage → Refocus (community)
- Remove compression artifacts → Uncompress (community)
- Extend canvas beyond the frame → Outpaint (community)
- Transfer camera motion → Cameraman (community)
- Colorize black-and-white footage → Colorizer (community)
Fitting the model on a 24 GB GPU
LTX 2.3 is a 22-billion-parameter model. It runs on a 24 GB GPU through FP8 quantization of the transformer, temporary offloading of the text encoder, and CPU-resident transformer blocks that stream to GPU during denoising. This all happens automatically, but there are knobs to turn if you hit memory limits:
- Lower num_frames (33 or 65 instead of the default 129)
- Reduce resolution (keep to multiples of 32)
- Lower ffn_chunk_size to 2048 or 1024
- Set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to reduce fragmentation
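The allocator setting is an environment variable set in the shell before launching Scope:

```shell
# Reduce CUDA allocator fragmentation; must be set before Scope starts
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```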
Troubleshooting
Out of memory errors
Reduce num_frames first (33 or 65 instead of 129), then lower resolution, then lower ffn_chunk_size. Close any other GPU workloads. If the model still does not fit at any practical settings, your GPU may be below spec for LTX 2.3.
Model download fails
- Verify your HF_TOKEN is set correctly and has read access.
- Run huggingface-cli login to confirm the token works.
- Check network connectivity and available disk space (about 28 GB required).
Plugin does not appear after install
Restart the Scope server. From the CLI, confirm the plugin is registered with uv run daydream-scope plugins and that the pipeline is available with uv run daydream-scope pipelines.
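Assuming uv is available from the scope directory, the two checks look like:

```
uv run daydream-scope plugins    # scope-ltx-2 should be listed
uv run daydream-scope pipelines  # ltx2 should be listed
```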
Slow generation
Generation time scales with frame count, resolution, and PCIe throughput during weight streaming. Reduce num_frames for faster iteration and make sure no other workloads are hitting the same GPU.
ID-LoRA ignores my reference voice
Make sure audio_mode is set to id_lora, not driving. In driving mode, the input audio is treated as a lip-sync target and no identity transfer happens. If voice similarity is still low, raise identity_guidance_scale toward 4 to 6.
See also
LTX 2.3 Reference
Full parameter schema, IC-LoRA catalog, and memory architecture details
Using LoRAs
General LoRA installation and management in Scope
Using Nodes
Install and manage third-party Scope plugins
Remote Inference
Run Scope on cloud-hosted GPUs, including LTX 2.3