Video generation
Nodes can generate video from text prompts on your own GPU, alongside text and images. It runs entirely on your hardware through stable-diffusion.cpp; there's no cloud path, so it never touches credits. Generation is asynchronous (a clip takes minutes): you start a job and poll it until the video is ready.
1. Deploy a video model
Video needs a video model loaded on a node. On the Nodes page, open a node, go to Media, and deploy one. The node downloads the weights, starts its media runtime, and reports ready — watch it with wai status.
| model | needs | notes |
|---|---|---|
| Wan 2.2 TI2V 5B | 12GB+ VRAM | The lightweight default. Short clips (~2s) in a few minutes on a 12GB card. |
| LTX-2.3 (22B) | 14GB+ VRAM, ~12GB free RAM | Higher quality, larger. Its Gemma-12B text encoder is streamed from RAM so the diffusion model fits a 16GB GPU; slower than Wan. |
The dashboard greys out a model a node can't fit. One media model runs per node at a time; deploying a new one replaces it. Text inference on the same node is unaffected.
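As a rough illustration of the fit check, the sketch below restates the table's thresholds in Python. The dashboard's real logic is not documented here, so treat `VideoModel`, `MODELS`, and `deployable` as hypothetical names, and remember that the LTX-2.3 RAM figure is approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoModel:
    name: str
    min_vram_gb: float
    min_free_ram_gb: float = 0.0


# Thresholds copied from the table above; the ~12GB RAM figure is approximate.
MODELS = [
    VideoModel("Wan 2.2 TI2V 5B", min_vram_gb=12),
    VideoModel("LTX-2.3 (22B)", min_vram_gb=14, min_free_ram_gb=12),
]


def deployable(vram_gb, free_ram_gb):
    """Return the names of models whose stated minimums the node meets.

    This is an assumption about how a fit check could work, not the
    dashboard's actual rule.
    """
    return [
        m.name
        for m in MODELS
        if vram_gb >= m.min_vram_gb and free_ram_gb >= m.min_free_ram_gb
    ]
```

For example, `deployable(16, 8)` lists only Wan, because LTX-2.3 also wants roughly 12GB of free RAM for its text encoder.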
2. Test it from the node
Once a video model reports ready, the fastest check is wai test-video on the node itself. It generates a short clip straight against the local media server (bypassing the gateway), saves it to your Desktop, and opens it — a quick way to confirm the GPU actually renders before wiring it into an app.
wai status                # confirm the media model is "ready"
wai test-video            # default prompt
wai test-video "a red kite over a green field, slow motion"
It prints the job id and polls until done, then writes wai-test-video.webm. A first run also confirms the heavier bits work end to end — for LTX-2.3 that's the Gemma encoder offload and the temporal-tiled VAE decode. (wai test-image does the same for image models.)
Prefer raw HTTP? The node's media server speaks an async API on 127.0.0.1:8081 (loopback only): POST /sdcpp/v1/vid_gen returns a job id, then GET /sdcpp/v1/jobs/{id} returns the base64 clip when complete.
3. Use it
The everyday way is the web UI. In Chat or the playground, pick a video model — they appear as model@node (e.g. ltx-2.3@gpu01) — type a prompt, and the clip renders inline when the job finishes.
To drive it from your own app, use the playground video endpoint. It's session-authenticated (a signed-in WAI session, not a wai_sk_… gateway key — video has no OpenAI-compatible /v1 route yet). Start a job, then poll until status: "done":
POST /api/playground/video
{ "prompt": "a red kite over a green field", "node": "gpu01" } // node optional
-> 202 { "jobId": "…", "nodeId": "…", "node": "gpu01" }
GET /api/playground/video?jobId=…&nodeId=…
-> { "status": "pending" | "running", "progress": … }
-> { "status": "done", "video": "<base64>", "contentType": "video/webm" }
-> { "status": "failed", "error": "…" }
Good to know
- It takes minutes, not seconds. Clips are short by design; expect a few minutes per generation depending on the model, resolution, and frame count.
- Output is WebM. The node returns a base64 WebM clip; the UI renders it inline.
- Node-only, free. There is no cloud failover for video — if no node is ready, the request returns a 503 telling you to deploy a model. Nothing is billed.
- Storage. Model weights (Wan is ~10GB, LTX-2.3 ~24GB) land in the node's models folder; move it to another disk with wai models-dir.
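The start-then-poll flow from section 3 can be sketched in Python as follows. This is a minimal sketch and not an official client: the paths, JSON fields, and the 202/503 statuses come from this page, while `http_send`, `generate_video`, the base URL, the cookie value, the poll interval, and the timeout are all assumptions you would replace with your own.

```python
import base64
import json
import time
import urllib.error
import urllib.parse
import urllib.request


def http_send(base_url, cookie):
    """Build a send(method, path, body) function on top of urllib.

    base_url and cookie are placeholders: use your own WAI origin and the
    Cookie header value from a signed-in session.
    """
    def send(method, path, body=None):
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(base_url + path, data=data, method=method)
        req.add_header("Cookie", cookie)
        if data is not None:
            req.add_header("Content-Type", "application/json")
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.status, json.loads(resp.read() or b"{}")
        except urllib.error.HTTPError as err:
            # Non-2xx replies (such as the 503 when no node is ready) land here.
            raw = err.read()
            try:
                return err.code, json.loads(raw)
            except ValueError:
                return err.code, {"error": raw.decode(errors="replace")}
    return send


def generate_video(send, prompt, node=None, poll_seconds=5.0,
                   timeout_seconds=1800.0, sleep=time.sleep):
    """Start a video job, poll until it settles, and return (bytes, content type)."""
    body = {"prompt": prompt}
    if node is not None:
        body["node"] = node  # optional, per the request example
    status, job = send("POST", "/api/playground/video", body)
    if status == 503:
        raise RuntimeError("no video node is ready; deploy a video model first")
    if status != 202:
        raise RuntimeError(f"unexpected status {status}: {job}")

    query = urllib.parse.urlencode({"jobId": job["jobId"], "nodeId": job["nodeId"]})
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        _, state = send("GET", "/api/playground/video?" + query)
        if state.get("status") == "done":
            return base64.b64decode(state["video"]), state.get("contentType")
        if state.get("status") == "failed":
            raise RuntimeError(state.get("error", "video job failed"))
        sleep(poll_seconds)  # "pending" or "running": wait and ask again
    raise TimeoutError("video job did not finish in time")
```

A caller would build `send = http_send(base_url, cookie)`, call `generate_video(send, "a red kite over a green field")`, and write the returned bytes to a `.webm` file. Passing `send` in as a parameter keeps the polling logic testable without a live node.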
See also the wai CLI reference for the full command list and node states.