
Generating synchronized audio and video with LTX 2.3 in Scope

LTX 2.3 is a 22-billion-parameter audio-video foundation model from Lightricks. It generates synchronized video and audio from a single text prompt, which makes it different from every other pipeline shipped with Scope. This guide walks you through installing the scope-ltx-2 plugin, running it locally or in the cloud, and using the two LoRA families that define this pipeline: ID-LoRA for identity-driven talking-head video, and IC-LoRAs for structural control.
Running locally? You need to install the scope-ltx-2 plugin yourself (see Install the plugin below). It is not bundled with Scope by default, but it can be installed directly from the desktop app.
Running in the cloud via Remote Inference? The plugin is pre-installed in the cloud image. Skip the install step and jump to Run it in the cloud.

scope-ltx-2 on GitHub

Source repository for the LTX 2.3 Scope plugin

LTX 2.3 Pipeline Reference

Full parameter schema, schedules, memory architecture, audio modes, and the complete IC-LoRA catalog. Helpful alongside this guide whenever you want the technical detail behind a setting.

Prerequisites

These requirements apply to local installs. For Remote Inference, you only need a Daydream account signed in from the desktop app, and none of the hardware or token setup below applies.
  • NVIDIA GPU with 24 GB VRAM or more (RTX 4090, A5000, or similar)
  • CUDA 12.8 or higher
  • Python 3.12 or higher
  • A HuggingFace access token with read permissions (see HuggingFace Auth)

Install the plugin

This section applies to local installs only. If you are using Remote Inference, skip ahead to Run it in the cloud.
You can install the plugin from the Scope desktop app or from the CLI. Both paths install the same ltx2 pipeline.

Desktop app

1

Open the Plugins tab

Open Settings → Plugins.
2

Paste the plugin URL

Paste the following into the installation input field:
https://github.com/daydreamlive/scope-ltx-2
3

Install

Click Install and wait for the Scope server to restart. The plugin and its ltx2 pipeline appear automatically once the restart completes.
See the Plugins guide for more detail on the desktop plugin workflow.

CLI

From the scope directory:
uv run daydream-scope install https://github.com/daydreamlive/scope-ltx-2
Confirm the plugin registered:
uv run daydream-scope plugins
Confirm the ltx2 pipeline is available:
uv run daydream-scope pipelines

Upgrade

In the desktop app, open Settings → Plugins. If an update is available, an Update button appears next to the plugin — click it and wait for the server to restart. From the CLI:
uv run daydream-scope install --upgrade https://github.com/daydreamlive/scope-ltx-2

Uninstall

In the desktop app, open Settings → Plugins and click the trash icon next to the plugin. From the CLI:
uv run daydream-scope uninstall scope-ltx-2

Run it locally

1

Set your HuggingFace token

LTX 2.3 downloads weights from the daydreamlive/LTX2.3 repository on first load. Export a token with read access before launching:
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Add this to your shell profile (~/.bashrc, ~/.zshrc) so it persists across sessions.
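A quick way to avoid a failed download partway through the first load is to verify the variable is actually set before launching. A minimal sketch (the token value is a placeholder, as above):

```shell
# Example only: export the token for the current session, then fail fast
# if it is empty, rather than hitting a 401 mid-download.
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

if [ -z "${HF_TOKEN}" ]; then
  echo "HF_TOKEN is not set" >&2
  exit 1
fi
echo "HF_TOKEN is set"
```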
2

Start Scope

From the scope directory:
uv run daydream-scope
The web UI opens at http://localhost:8000.
3

Select the LTX 2.3 pipeline

In the pipeline picker, choose LTX 2.3. On first load, Scope downloads roughly 28 GB of weights (transformer, text encoder, VAEs, ID-LoRA), which can take several minutes depending on your connection.
4

Generate

Enter a prompt and press Play. The model produces synchronized audio and video in batches of 129 frames by default.
To prefetch weights without starting the full UI, run uv run download_models --pipeline ltx2.
Model weights land in ~/.daydream-scope/models/LTX2.3/ by default. Override this with the DAYDREAM_SCOPE_MODELS_DIR environment variable.
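If the default location is short on disk space, set the override before the first launch so the initial download lands in the right place. The path below is an example, not a required value:

```shell
# Example only: point Scope's model cache at a larger drive.
# Set this before the first launch or prefetch so the ~28 GB download
# lands under the new path.
export DAYDREAM_SCOPE_MODELS_DIR="/data/scope-models"
echo "Models will be stored under: ${DAYDREAM_SCOPE_MODELS_DIR}"
```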

Run it in the cloud

LTX 2.3 is pre-bundled in the Daydream cloud image used by Remote Inference, so there is no plugin install, no HF_TOKEN to export, and no model download to wait on. Everything that the local setup above covers is already done for you on the cloud H100.
1

Sign in from the desktop app

Open Settings (gear icon, top-right) → Account and sign in with Daydream.
2

Enable Remote Inference

Toggle Remote Inference on in the Account panel.
3

Select LTX 2.3 and press Play

Choose LTX 2.3 from the pipeline picker and press Play.

Remote Inference

Full Remote Inference setup, supported features, and known limitations

Generation modes

LTX 2.3 supports three modes. Parameter details for each are on the reference page.

Text mode

The default. A prompt produces synchronized video and audio with no other input.

Image-to-video

Upload a reference image as the I2V input; the first frame of the generation is conditioned on it. Use the i2v_strength slider to dial back the conditioning if the first frame feels too locked to the reference.

Video mode

Used with IC-LoRAs for structural control. Select a matching IC-LoRA, switch the input to Video mode, and provide a reference video. Use control_strength to blend how strictly the output follows the reference.

Identity-driven talking-head with ID-LoRA

ID-LoRA generates video of a specific person speaking with their own voice. Unlike cascaded pipelines that generate video and voice separately, LTX 2.3’s ID-LoRA produces lip-synced audio-video in one pass. You give it three things:
  • A text prompt describing the scene
  • A reference image of the subject
  • A short audio clip (about 5 seconds) of the subject speaking
The output is a video of that subject speaking, in the scene you described.

Prompt structure

ID-LoRA responds best to channel-tagged prompts:
[VISUAL]: A close-up of a person speaking in a park.
[SPEECH]: Hello world.
[SOUNDS]: Birds chirping.
Each tag controls a separate channel: [VISUAL] for scene content, [SPEECH] for what the subject says, [SOUNDS] for ambient audio.
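If you build prompts programmatically, a small helper keeps the three channels consistent. The helper itself is hypothetical (not part of the plugin); the tag format is the one shown above:

```python
# Hypothetical helper that assembles the channel-tagged prompt format
# ID-LoRA responds best to. Only the [VISUAL]/[SPEECH]/[SOUNDS] tags
# come from this guide; the function is an illustration.
def build_prompt(visual: str, speech: str, sounds: str) -> str:
    return "\n".join([
        f"[VISUAL]: {visual}",
        f"[SPEECH]: {speech}",
        f"[SOUNDS]: {sounds}",
    ])

prompt = build_prompt(
    "A close-up of a person speaking in a park.",
    "Hello world.",
    "Birds chirping.",
)
print(prompt)
```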

Workflow

1

Enable ID-LoRA mode

Set Audio Mode to id_lora. This tells the pipeline that the audio input is an identity reference, not a driving track.
2

Upload a reference audio clip

Upload about 5 seconds of clean audio of the subject speaking. Clip quality matters more than length.
3

Upload a reference image

Set the I2V reference image to a headshot of the same subject.
4

Write your prompt

Use the channel-tagged format above. Describe the scene, the dialogue, and the ambient audio separately.
5

Tune identity guidance

The Identity Guidance slider (default 3.0, range 0.0 to 20.0) controls how strongly speaker identity is amplified. Around 4.0 gives roughly 9% better speaker similarity; higher values lock identity harder at the cost of some diversity. Set it to 0 to disable.
6

Generate

Press Play. The output will be a video of your subject speaking the [SPEECH] line in the scene described by [VISUAL], with [SOUNDS] as the environmental audio.
ID-LoRA weights (ltx-2.3-id-lora-talkvid-3k.safetensors) download automatically with the base model. You do not need to install them separately.
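Step 2 above asks for roughly 5 seconds of clean audio. For WAV files, a stdlib-only sanity check of clip length looks like this (MP3/AAC clips would need a third-party decoder; the path is a placeholder):

```python
# Stdlib-only duration check for a WAV reference clip.
import wave

def clip_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

# Example (placeholder path):
# duration = clip_duration_seconds("reference.wav")
```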

Structural control with IC-LoRAs

IC-LoRAs (In-Context LoRAs) condition generation on a reference video rather than a text prompt alone. They are how you get precise control over output structure: depth maps, pose skeletons, edge compositions, camera movements, or stylistic transformations like anime-to-photorealism.

Install an IC-LoRA

1

Download the .safetensors file

Browse the IC-LoRA catalog on the reference page and download the file from HuggingFace.
2

Place it in the LoRA directory

Copy the file into ~/.daydream-scope/models/lora/.
cp ~/Downloads/ltx-2.3-22b-ic-lora-union-control-ref0.5.safetensors ~/.daydream-scope/models/lora/
Scope discovers new files automatically and exposes them in the LoRA picker.
3

Select it in the LoRA picker

In the Scope UI, open the LoRA adapters panel and select your IC-LoRA from the dropdown.
4

Switch to video mode

Set the input mode to Video and upload your reference video (depth map, pose skeleton, color-graded clip, or whatever the chosen IC-LoRA expects).
5

Tune control strength

Adjust control_strength to blend the guide. The typical range is 0.0 to 1.0, but some IC-LoRAs need values as high as 1.3 before the effect shows.
IC-LoRAs expect reference video at a specific downscale factor relative to the output resolution (often 0.5x). The plugin reads this automatically from the safetensors metadata, so you do not need to configure it by hand.
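The downscale arithmetic is simple if you ever need to prepare a reference clip by hand. The 0.5x factor and the multiple-of-32 constraint come from this guide; the helper is illustrative, not plugin API:

```python
# Sketch of the reference-video sizing described above: scale the output
# resolution by the IC-LoRA's downscale factor, snapping to multiples of 32.
def reference_size(out_w: int, out_h: int, factor: float = 0.5) -> tuple[int, int]:
    def snap(x: float) -> int:
        return max(32, round(x / 32) * 32)  # keep dimensions multiples of 32
    return snap(out_w * factor), snap(out_h * factor)

print(reference_size(1280, 704))  # (640, 352)
```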

Which IC-LoRA should I use?

Match the IC-LoRA to your goal:
  • Depth, edges, or pose control → Union Control (Lightricks)
  • Motion paths from spline trajectories → Motion Track Control (Lightricks)
  • Anime-to-photorealism → Anime2Real (community)
  • Replace regions in a video via mask → Inpaint (community)
  • Strip color grading → Ungrade (community)
  • Sharpen soft footage → Refocus (community)
  • Remove compression artifacts → Uncompress (community)
  • Extend canvas beyond the frame → Outpaint (community)
  • Transfer camera motion → Cameraman (community)
  • Colorize black-and-white footage → Colorizer (community)
Full descriptions, training details, and usage notes for each are in the IC-LoRA catalog.

Fitting the model on a 24 GB GPU

LTX 2.3 is a 22-billion-parameter model. It runs on a 24 GB GPU through FP8 quantization of the transformer, temporary offloading of the text encoder, and CPU-resident transformer blocks that stream to GPU during denoising. This all happens automatically, but there are knobs to turn if you hit memory limits:
  • Lower num_frames (33 or 65 instead of the default 129)
  • Reduce resolution (keep to multiples of 32)
  • Lower ffn_chunk_size to 2048 or 1024
  • Set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to reduce fragmentation
See Memory architecture for the full picture.
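As a concrete starting point, the knobs above might look like the settings sketch below. Parameter names num_frames and ffn_chunk_size appear in this guide; the resolution keys and values are a hedged example, not tuned defaults:

```python
# Hedged low-VRAM sketch using the knobs listed above.
low_vram_settings = {
    "num_frames": 65,        # down from the default 129
    "width": 768,            # resolution kept to multiples of 32
    "height": 448,
    "ffn_chunk_size": 1024,  # smaller chunks trade speed for memory
}

# The allocator hint goes in the environment, not the settings dict:
#   export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
assert low_vram_settings["width"] % 32 == 0
assert low_vram_settings["height"] % 32 == 0
```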

Troubleshooting

Out of memory

Reduce num_frames first (33 or 65 instead of 129), then lower resolution, then lower ffn_chunk_size. Close any other GPU workloads. If the model still does not fit at any practical settings, your GPU may be below spec for LTX 2.3.

Model download fails

  • Verify your HF_TOKEN is set correctly and has read access.
  • Run huggingface-cli login to confirm the token works.
  • Check network connectivity and available disk space (about 28 GB required).

The ltx2 pipeline does not appear

Restart the Scope server. From the CLI, confirm the plugin is registered with uv run daydream-scope plugins and that the pipeline is available with uv run daydream-scope pipelines.

Generation is slow

Generation time scales with frame count, resolution, and PCIe throughput during weight streaming. Reduce num_frames for faster iteration and make sure no other workloads are hitting the same GPU.

The generated voice does not match the reference

Make sure audio_mode is set to id_lora, not driving. In driving mode, the input audio is treated as a lip-sync target and no identity transfer happens. If voice similarity is still low, raise identity_guidance_scale toward 4 to 6.

See also

LTX 2.3 Reference

Full parameter schema, IC-LoRA catalog, and memory architecture details

Using LoRAs

General LoRA installation and management in Scope

Using Nodes

Install and manage third-party Scope plugins

Remote Inference

Run Scope on cloud-hosted GPUs, including LTX 2.3