LTX-2 Video Generation Prompt Engineering: From 36-Scene Horror to Cinematic Continuity Pipelines
Structured prompt specifications for LTX-2 video generation. Covers the 36-scene horror scenario template with mandatory dialogue, cinematic shot design principles, and multi-scene visual continuity control for production pipelines.
Summary
When producing watchable video from LTX-2, prompt structuring is key. Generating a single pretty shot and chaining 36 scenes into a coherent story require fundamentally different engineering. This article consolidates three specifications refined through production use:
- 36-scene horror scenario generation template — system prompt for LLM-driven scenario writing
- Cinematic prompt design principles — structure for maximizing individual shot quality
- Multi-scene continuity control — pipeline design for preventing inter-clip visual breakage
Prerequisites
- Video generation engine: LTX-2 (5-second clip generation)
- Resolution: 3840x2160 (4K), 24fps
- Scenario generation: Local LLM (Mixtral, etc.) generating STORY→SCENES→LTX2_PROMPTS in one pass
- Pipeline: Per-scene generation → ffmpeg concatenation → audio composited downstream
1. 36-Scene Horror Scenario Generation Template
Design Intent
Background: When experimenting with Monstral-123B (NVFP4 quantization) for horror scenario generation, we encountered critical instabilities:
- Scene counts varied between 20–50; achieving a fixed 36-scene structure was impossible
- Narrative structure collapsed; act structure (setup/escalation/resolution) became unclear
- Dialogue disappeared midway; mute scenes proliferated
- Endings became ambiguous or devolved into abrupt “it was all a dream” tropes
These were not failures of model capability; they stemmed from the absence of explicit output format constraints. Simply asking an LLM to “write a horror video scenario” grants too much freedom and destabilizes the results. A system prompt that rigorously constrains the output schema and rules enabled mid-size models like Monstral-123B to generate 36-scene horror narratives reliably.
Output Schema (Fixed)
LLM output is restricted to exactly these 5 sections. No extra commentary.
- STORY_PROMPT — narrative skeleton (one paragraph)
- LABELS — genre, tone, motifs
- OUTLINE — 5-stage plot summary
- SCENES_36 — Scene 01 through Scene 36 (1-3 sentences each)
- LTX2_PROMPTS_36 — Shot 01 through Shot 36 (generation prompt per shot)
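Because the schema is fixed, the LLM output can be checked mechanically before anything is sent to LTX-2. The following is a minimal sketch, assuming each section name appears alone on its own line (optionally with a leading "#" or trailing colon) and that scenes and shots are introduced as "Scene NN:" and "Shot NN:". The function name validate_output is hypothetical and not part of any library.

```python
import re

# Section names come from the schema above. The "name alone on a line"
# layout is an assumption about how the LLM formats its headers.
REQUIRED_SECTIONS = ["STORY_PROMPT", "LABELS", "OUTLINE", "SCENES_36", "LTX2_PROMPTS_36"]

_HEADER = re.compile(
    r"^[ \t]*(?:#+[ \t]*)?(" + "|".join(REQUIRED_SECTIONS) + r")[ \t]*:?[ \t]*$",
    re.MULTILINE,
)


def validate_output(text: str) -> list[str]:
    """Return a list of problems; an empty list means the schema is satisfied."""
    problems = []
    found = _HEADER.findall(text)
    if found != REQUIRED_SECTIONS:
        problems.append(f"expected sections {REQUIRED_SECTIONS}, found {found}")

    expected = [f"{i:02d}" for i in range(1, 37)]
    scenes = re.findall(r"^[ \t]*Scene (\d{2}):", text, re.MULTILINE)
    shots = re.findall(r"^[ \t]*Shot (\d{2}):", text, re.MULTILINE)
    if scenes != expected:
        problems.append(f"expected Scene 01-36 in order, found {len(scenes)} scenes")
    if shots != expected:
        problems.append(f"expected Shot 01-36 in order, found {len(shots)} shots")
    return problems
```

A non-empty result is a signal to regenerate rather than to patch the output by hand, since a missing section usually means the model drifted from the schema.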
Scene Allocation
- 01-08: Daily life and subtle unease (first whispers, minor anomalies)
- 09-20: Escalation and repetition (voices repeat, reflections speak back)
- 21-28: Investigation and confrontation (identifying the cause, revealing truth)
- 29-36: Resolution and closure (cause identified → concrete action → safety restored)
Dream endings, “it was all imagination” dismissals, and unresolved ambiguity are prohibited.
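The allocation above maps directly to a lookup table. As a small sketch (the names ACTS, act_for_scene, and scenes_per_act are hypothetical), the pipeline can tag each scene with its act so that a reviewer or a follow-up LLM pass can check a scene against the stage it belongs to:

```python
# Scene ranges and act descriptions are taken from the allocation above.
ACTS = [
    (1, 8, "daily life and subtle unease"),
    (9, 20, "escalation and repetition"),
    (21, 28, "investigation and confrontation"),
    (29, 36, "resolution and closure"),
]


def act_for_scene(scene_no: int) -> str:
    """Return the act that a 1-based scene number belongs to."""
    for first, last, name in ACTS:
        if first <= scene_no <= last:
            return name
    raise ValueError(f"scene number {scene_no} is outside 1-36")


def scenes_per_act() -> dict[str, int]:
    """Return how many scenes each act receives."""
    return {name: last - first + 1 for first, last, name in ACTS}
```

The four ranges cover 8, 12, 8, and 8 scenes, which sums to the fixed total of 36.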
Required Elements Per Scene
Scene 04:
Visual: A woman stands before a mirror. Camera slowly zooms into her reflection.
Whisper: "You see it too, don't you?"
Sound: Fluorescent light flickers with a low humming sound.
- Visual: what is on screen
- Camera: framing or camera movement
- Dialogue: at least one spoken line (Dialogue / Whisper / Voice(V.O.) / Heard Voice / Inner Voice)
No scene may be silent. For scenes where dialogue is unnatural, use whispered words, distorted voices, internal monologue, reflected/mirrored speech, or remembered phrases.
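The "no silent scene" rule is easy to enforce after generation. The sketch below assumes each scene is a block of "Label: text" lines like the Scene 04 example, and accepts any of the five spoken-line labels listed above. Camera information is not checked separately because the example embeds it in the Visual line. The function name missing_elements is hypothetical.

```python
import re

# A "Visual:" line with some content after the colon.
_VISUAL = re.compile(r"^[ \t]*Visual[ \t]*:[ \t]*\S", re.MULTILINE)

# Any accepted spoken-line label with non-empty content after the colon.
_SPOKEN = re.compile(
    r"^[ \t]*(?:Dialogue|Whisper|Voice(?:[ \t]*\(V\.O\.\))?|Heard Voice|Inner Voice)"
    r"[ \t]*:[ \t]*\S",
    re.MULTILINE,
)


def missing_elements(scene_text: str) -> list[str]:
    """Return the required elements that a scene block lacks."""
    missing = []
    if not _VISUAL.search(scene_text):
        missing.append("visual")
    if not _SPOKEN.search(scene_text):
        missing.append("spoken line")
    return missing
```

Running this over all 36 scenes and regenerating only the ones that report a missing spoken line is one way to catch dialogue dropping out midway through the scenario.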
Setting Rules
- Japan (urban or suburban), deliberately vague location names
- Cultural elements: quiet residential streets, small houses, mirrors, shrines, corridors, sliding doors, evening TV noise, cicadas, wind, fluorescent lights
- No graphic gore or explicit violence. Fear is built through psychological means: whispers, shadows, reflections, repetition, isolation
Story Flavors (Presets)
Pre-built flavor templates the LLM can auto-select when user input is vague:
- trapped_room_mummy: locked in before summer break, only traces remain
- water_reflection: water surface reflections gradually rewrite reality
- station_beep_phrase: station chimes become words that guide people
- fox_shrine_wrath: removal of a small shrine triggers quiet anomalies
LTX2_PROMPTS_36 Shot Definition
Shot 01:
DURATION=5s FPS=24 RES=3840x2160
PROMPT: <concrete visual instruction, short>
NEGATIVE: low quality, blurry, distorted hands, deformed face, gore, blood
CAMERA: handheld POV / slow push-in / over-the-shoulder
LIGHTING: low-key, streetlight, fluorescent hum
AUDIO_CUE: <ambient, dialogue, silence>
CONTINUITY: prev=none next=shared_object:mirror
SEED_HINT: episode_seed+01
Avoid face close-ups; prefer hands, backs, silhouettes. Telop/subtitles are composited downstream via ffmpeg.
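A shot block in this format can be parsed into a structured record before it is handed to the generator. This is a minimal sketch, assuming the layout shown in the Shot 01 template: a "Shot NN:" header, one line of KEY=value parameters, then "KEY: value" fields. The Shot dataclass and parse_shot function are hypothetical names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Shot:
    index: int
    duration_s: int
    fps: int
    width: int
    height: int
    fields: dict[str, str] = field(default_factory=dict)

    @property
    def frame_count(self) -> int:
        """Nominal frame count; the engine may impose its own constraints."""
        return self.duration_s * self.fps


def parse_shot(block: str) -> Shot:
    """Parse one shot block in the template layout into a Shot record."""
    lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
    header = re.fullmatch(r"Shot (\d+):", lines[0])
    if header is None:
        raise ValueError(f"not a shot header: {lines[0]!r}")
    # Second line: DURATION=5s FPS=24 RES=3840x2160
    params = dict(kv.split("=", 1) for kv in lines[1].split())
    width, height = params["RES"].split("x")
    fields = {}
    for ln in lines[2:]:
        # Split on the first colon only, so values may contain colons.
        key, _, value = ln.partition(":")
        fields[key.strip()] = value.strip()
    return Shot(
        index=int(header.group(1)),
        duration_s=int(params["DURATION"].rstrip("s")),
        fps=int(params["FPS"]),
        width=int(width),
        height=int(height),
        fields=fields,
    )
```

At 5 seconds and 24 fps each shot is nominally 120 frames, so a full 36-shot episode comes to 180 seconds of footage.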
2. Cinematic Prompt Design Principles
Shot-First Thinking
The first thing in every prompt is “where the camera is.” Abstract phrasing like “the camera shows” is banned. Use explicit cinematography language:
- static camera / slow pan / close framing / wide interior shot / shallow depth
- Specify camera position, field of view, and spatial compression
Environment Anchoring (Visual Mood)
Embed lighting, color palette, and surface textures as shared parameters across prompts:
- Lighting: warm, flickering, fluorescent, natural, overcast
- Color: muted pastels, warm golds, sickly greens
- Texture: fogged glass, worn metal, dust in light beams
This anchors the diffusion process and stabilizes results across scenes.
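One way to keep these parameters shared is to define them once per episode and append them to every shot prompt. The sketch below is an assumption about how that could be wired up; the VisualMood class, its phrasing, and the anchor_prompt helper are hypothetical and not part of LTX-2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualMood:
    """Episode-wide visual anchor: defined once, reused by every shot."""
    lighting: str
    color: str
    texture: str

    def describe(self) -> str:
        return f"{self.lighting} lighting, {self.color} palette, {self.texture} surfaces"


def anchor_prompt(prompt: str, mood: VisualMood) -> str:
    """Append the shared mood description to a shot prompt."""
    return f"{prompt.rstrip('. ')}. {mood.describe()}."


# Example values chosen from the lists above.
EPISODE_MOOD = VisualMood(
    lighting="flickering fluorescent",
    color="sickly green",
    texture="fogged glass",
)
```

Appending rather than prepending keeps the camera description as the first element of the prompt.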
Action as Continuous Physical Sequence
Write actions as natural progression, not bullet points:
Leaning into frame → Hesitating near the handle → Exhaling slowly to fog the glass
Use arrows (→) to show step-by-step motion. Without this, LTX-2 produces “teleportation” — the model cannot predict intermediate frames when motion is abbreviated. Obsessively describing how the body moves through space is an effective approach to smooth video.
Character Definition Through Behavior
Define characters through posture, micro-expressions, timing, and small physical habits rather than long descriptions. Include age, clothing, and emotional state expressed through motion.
Camera Movement as Narrative Tool
- Slow pan: observation and tension building
- Static shots: tension or comedic timing
- Maintain consistent pan direction and speed across adjacent clips
- When continuing from the previous scene, explicitly state “Continue the same pan”
Audio Is Mandatory
Audio is not decoration — it defines timing:
- Ambient Sound: oven humming, steam, distant noise
- Dialogue Beats: quoted dialogue, [Beat] and [Silence] markers for pauses
- Music must be explicitly included or excluded
Prompt Structure Template
- Shot Establishment
- Environment & Lighting
- Character Position & Emotion
- Core Action Sequence
- Camera Movement (with timing)
- Audio & Dialogue
- Ending Visual Beat (pose, pause, reaction)
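The seven-part template can be represented as a small record that renders its parts in the fixed order, with the action steps joined by arrows. This is a sketch only; the ShotPrompt class and its field names are hypothetical, and the two-step minimum is an assumed threshold for catching abbreviated motion.

```python
from dataclasses import dataclass


def _sentence(text: str) -> str:
    """Trim the text and make sure it ends with closing punctuation."""
    text = text.strip()
    return text if text.endswith((".", "!", "?", '"')) else text + "."


@dataclass
class ShotPrompt:
    shot: str                # 1. shot establishment (camera first)
    environment: str         # 2. environment and lighting
    character: str           # 3. character position and emotion
    action_steps: list[str]  # 4. core action, one physical step per entry
    camera_movement: str     # 5. camera movement with timing
    audio: str               # 6. audio and dialogue
    ending_beat: str         # 7. ending visual beat

    def render(self) -> str:
        if len(self.action_steps) < 2:
            raise ValueError("describe the action as at least two physical steps")
        action = " → ".join(step.strip() for step in self.action_steps)
        parts = [
            self.shot, self.environment, self.character, action,
            self.camera_movement, self.audio, self.ending_beat,
        ]
        return " ".join(_sentence(p) for p in parts)
```

Keeping the order in code rather than in the writer's head means the camera position always leads the prompt and the ending beat always closes it.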
3. Multi-Scene Continuity Control
The Problem
Moving from single beautiful shots to multi-scene narratives introduces:
- Visual drift: lighting and texture shift between shots
- Camera discontinuity: pan speed and direction mismatch across clips
- Timing loss: inability to control acting “beats” and pauses
Last-Frame Continuity
- Continuation directive: explicitly state “Continue the same pan” at the start of the next scene
- Seed incrementing: use a fixed episode seed plus shot index (episode_seed + shot_idx). Allows micro-variation while maintaining macro-consistency
- Texture inheritance: visual states from previous shots (e.g., fogged glass) become preconditions for the next shot
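Seed incrementing and texture inheritance can both be expressed as small helpers. The sketch below is illustrative: the 32-bit wrap-around is an assumption about the seed range the generator accepts, and the "Carried over from the previous shot" wording is a hypothetical phrasing, not a documented LTX-2 directive.

```python
def shot_seed(episode_seed: int, shot_idx: int, modulus: int = 2**32) -> int:
    """Derive a per-shot seed as episode_seed + shot_idx.

    The modulus keeps the result inside an assumed 32-bit seed range.
    """
    if shot_idx < 1:
        raise ValueError("shot_idx is 1-based")
    return (episode_seed + shot_idx) % modulus


def with_preconditions(prompt: str, inherited_states: list[str]) -> str:
    """Append visual states carried over from the previous shot."""
    if not inherited_states:
        return prompt
    carried = ", ".join(inherited_states)
    return f"{prompt} Carried over from the previous shot: {carried}."
```

For example, shot_seed(1000, 1) gives 1001, and every shot in a 36-shot episode receives a distinct seed that is still reproducible from the single episode seed.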
CONTINUITY Field in Practice
Each shot’s CONTINUITY field specifies prev= and next=, connecting adjacent scenes through shared objects, shared sounds, continuing actions, or location transitions:
CONTINUITY: prev=shared_sound:fluorescent_hum next=location_transition:hallway_to_kitchen
Without explicit connection directives, LTX-2 generates each shot independently of its neighbors, producing jarring cuts when the clips are concatenated.
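Since every shot declares both prev= and next=, the chain can be checked for broken links before generation. This sketch assumes a consistency rule that the article does not state explicitly: each shot's prev= should equal the previous shot's next=. The function names are hypothetical.

```python
def parse_continuity(value: str) -> dict[str, str]:
    """Parse 'prev=... next=...' into a dict with 'prev' and 'next' keys."""
    links = dict(part.split("=", 1) for part in value.split())
    missing = {"prev", "next"} - links.keys()
    if missing:
        raise ValueError(f"CONTINUITY is missing {sorted(missing)}: {value!r}")
    return links


def broken_links(continuity_values: list[str]) -> list[int]:
    """Return 1-based shot numbers whose prev= differs from the prior shot's next=."""
    parsed = [parse_continuity(v) for v in continuity_values]
    return [
        i + 1  # convert 0-based list position to 1-based shot number
        for i in range(1, len(parsed))
        if parsed[i]["prev"] != parsed[i - 1]["next"]
    ]
```

An empty result means every cut has a declared bridge, whether a shared object, a shared sound, a continuing action, or a location transition.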
Avoiding Teleportation
When physical motion is abbreviated, the AI cannot predict intermediate frames and produces “teleportation.” The most effective countermeasure is obsessively describing how the body moves through space.
Takeaways
LTX-2 prompt engineering is fundamentally “writing a film storyboard in natural language.” The horror template forces structured output from the LLM. The cinematic principles maximize individual shot quality. The continuity controls connect shots into a narrative. All three layers are required for a production-ready pipeline. The future of video generation AI depends not on single-shot beauty, but on the intelligence embedded in the gaps between consecutive shots.
Reproduction Steps
Minimum Setup
- LTX-2 video generation environment (ComfyUI or API)
- Scenario generation LLM (local or API)
- ffmpeg (clip concatenation and telop compositing)
Horror Scenario Generation Workflow
- Select a Story Flavor (or provide free-form input)
- Feed the system prompt template from this article to the LLM
- Retrieve SCENES_36 and LTX2_PROMPTS_36 from output
- Feed each shot’s PROMPT to LTX-2 sequentially
- Concatenate generated clips with ffmpeg
- Composite audio and telop downstream
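The concatenation step can be sketched with ffmpeg's concat demuxer, which reads a text file of "file '...'" lines. The helpers below only build the list file contents and the command arguments; the clip and output file names are hypothetical. Stream copy (-c copy) assumes all clips share the same codec, resolution, and frame rate, which holds when every shot is generated with the same settings.

```python
def concat_list(clip_paths: list[str]) -> str:
    """Build the contents of an ffmpeg concat-demuxer list file."""
    lines = []
    for path in clip_paths:
        # Inside single quotes, a literal quote is written as '\''
        escaped = str(path).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: str, output: str) -> list[str]:
    """Build the ffmpeg argument list for a stream-copy concatenation."""
    return [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # allow paths that are not plain relative names
        "-i", list_file,
        "-c", "copy",     # no re-encode; clips must share stream parameters
        output,
    ]


# Hypothetical naming scheme: one clip per shot, in shot order.
clips = [f"shot_{i:02d}.mp4" for i in range(1, 37)]
```

Writing concat_list(clips) to a file and passing concat_command(...) to subprocess.run would perform the join; audio and telop compositing remain a separate downstream pass.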
Cinematic Quality Checklist
- Camera position specified at the start of each prompt
- Lighting and color palette consistent across scenes
- Actions described step-by-step with arrows (→)
- CONTINUITY defined between adjacent shots
- Audio directives included
- Face close-ups avoided; hands/backs/silhouettes preferred
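Most of this checklist can be approximated with a per-shot lint pass. The following is a heuristic sketch, assuming each shot is a dict with PROMPT, CONTINUITY, and AUDIO_CUE keys; the lint_shot function and its string checks are hypothetical and will miss phrasing they do not anticipate. Lighting and palette consistency span multiple shots and are not covered here.

```python
# Camera vocabulary from the shot-first principle; extend as needed.
CAMERA_TERMS = (
    "static camera", "slow pan", "close framing",
    "wide interior shot", "shallow depth",
)


def lint_shot(shot: dict[str, str]) -> list[str]:
    """Return checklist violations for one shot; empty means it passes."""
    prompt = shot.get("PROMPT", "")
    lowered = prompt.lower()
    issues = []
    if not lowered.startswith(CAMERA_TERMS):
        issues.append("camera position not stated first")
    if "→" not in prompt:
        issues.append("action not written as arrow-separated steps")
    if not shot.get("CONTINUITY", "").strip():
        issues.append("CONTINUITY not defined")
    if not shot.get("AUDIO_CUE", "").strip():
        issues.append("audio directive missing")
    if "close-up" in lowered and "face" in lowered:
        issues.append("face close-up requested")
    return issues
```

Running the lint before generation is cheap compared with rendering a 4K clip, so it is worth rejecting a prompt on any reported issue.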

