Studio tooling reference

A working catalogue of the ij8 · studio.

A single-page technical reference for prospective artists: what is wired, which models are available, where the gallery and minting surfaces are live, and which studio internals still require authenticated capture.

01 Static, self-contained reference for tooling.ij8.ai.

02 Dense feature index, not a marketing page.

03 Auth-gated studio views are marked as capture pending.

00

Premise

ij8 is a generative creative studio for serious artists. Image, video, 3D, code, sound, classroom work, and gallery distribution converge in a single chat-driven canvas. The studio is in active development; the gallery and minting infrastructure are live. Its stance toward AI is a middle way — a collaborator the artist directs and authors with, not a system that replaces the work, and not a pace the artist must match. The human brings the vision, the taste, and the questions; the AI brings the speed and the hands. The same position carries into how it teaches.

Generative Structures II gallery artwork
Public gallery artwork capture: Generative Structures II.
Animi gallery artwork
Public gallery artwork capture: Animi.
Oracles3D gallery artwork
Public gallery artwork capture: Oracles3D.
01

Image generation

The image surface is multi-backend by design: Gemini first, OpenAI fallback, and a ComfyUI workflow library for specific model families and edit modes.

Gemini

gemini-3-pro-image-preview is the primary image model. Multi-turn editing chats are cached in memory by ${userId}:${sessionId}:${imageId} and cleaned every five minutes. The model sees prompt, reference image, and mask.
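
The keyed cache described above can be sketched roughly as follows. This is a minimal illustration, not the studio's code: the class name, the entry shape, and the 30-minute idle limit are assumptions; only the key format and the five-minute sweep cadence come from the text.

```typescript
// Hypothetical entry shape for one cached multi-turn editing chat.
interface ChatEntry<T> {
  value: T;
  lastUsed: number; // epoch milliseconds
}

// Sweep cadence from the text; the idle limit is an assumed value.
const SWEEP_INTERVAL_MS = 5 * 60 * 1000;
const MAX_IDLE_MS = 30 * 60 * 1000;

class ImageChatCache<T> {
  private entries = new Map<string, ChatEntry<T>>();

  // Key format stated in the reference: userId:sessionId:imageId.
  static key(userId: string, sessionId: string, imageId: string): string {
    return `${userId}:${sessionId}:${imageId}`;
  }

  set(key: string, value: T, now: number = Date.now()): void {
    this.entries.set(key, { value, lastUsed: now });
  }

  // Reading an entry refreshes its idle timer.
  get(key: string, now: number = Date.now()): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    entry.lastUsed = now;
    return entry.value;
  }

  // Drop entries idle for longer than maxIdleMs; returns the evicted count.
  sweep(now: number = Date.now(), maxIdleMs: number = MAX_IDLE_MS): number {
    let evicted = 0;
    this.entries.forEach((entry, key) => {
      if (now - entry.lastUsed > maxIdleMs) {
        this.entries.delete(key);
        evicted++;
      }
    });
    return evicted;
  }
}
```

A server would typically run the sweep on a timer, for example `setInterval(() => cache.sweep(), SWEEP_INTERVAL_MS)`.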

OpenAI image

gpt-image-1.5 is the fallback path when Gemini returns 503. The fallback is explicit rather than hidden behind a generic “AI image” label.
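
One way to keep the fallback explicit is to return the backend and model alongside the image. The sketch below assumes hypothetical generator callbacks and an error object carrying a numeric `status`; none of these names come from the studio's code.

```typescript
type ImageBackend = "gemini" | "openai";

// The result names the backend and model that actually produced the image.
interface ImageResult {
  backend: ImageBackend;
  model: string;
  data: string;
}

// Hypothetical generator signature: prompt in, encoded image out.
type Generate = (prompt: string) => Promise<string>;

// Assumes backend errors expose an HTTP status as a numeric `status` field.
function isServiceUnavailable(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { status?: unknown }).status === 503
  );
}

async function generateWithFallback(
  prompt: string,
  gemini: Generate,
  openai: Generate,
): Promise<ImageResult> {
  try {
    const data = await gemini(prompt);
    return { backend: "gemini", model: "gemini-3-pro-image-preview", data };
  } catch (err) {
    // Only a 503 from the primary triggers the fallback; other errors propagate.
    if (isServiceUnavailable(err)) {
      const data = await openai(prompt);
      return { backend: "openai", model: "gpt-image-1.5", data };
    }
    throw err;
  }
}
```

Because the result carries `backend` and `model`, the interface can label the image with the model that really produced it.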

ComfyUI library

comfyui.ts is a ~3.8k-line workflow library spanning Flux 1, Flux 2, Flux 2 Klein, SD 3.5, Qwen-Image 2512, Z-Image Turbo (incl. GGUF-quantized), CyberRealistic, Playground 2.5, Kontext (image-edit adapter), and SDXL style-transfer, plus inpaint, outpaint, upscale, and two-pass variants. Per-checkpoint detection (isFlux2, isZImageGGUF, isQwenImage, isFlux2Klein) routes each builder to the right text-encoder / VAE pair.

Model pool

An image-model pool can be configured. On each turn, one model is selected from the pool, either deterministically or at random, and sent as imageConfig.imageModel. The single-model dropdown writes through to both the pool and the legacy field.
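
A minimal sketch of per-turn selection, assuming that "deterministic" means round-robin by turn index (the reference does not say which deterministic rule is used). The function name and parameters are hypothetical; only `imageConfig.imageModel` comes from the text.

```typescript
type PoolMode = "deterministic" | "random";

// Pick one model from the pool for this turn.
// rng is injectable so random selection can be reproduced in tests.
function pickImageModel(
  pool: readonly string[],
  turnIndex: number,
  mode: PoolMode,
  rng: () => number = Math.random,
): string {
  if (pool.length === 0) throw new Error("image-model pool is empty");
  const index =
    mode === "deterministic"
      ? turnIndex % pool.length // assumed rule: round-robin over turns
      : Math.floor(rng() * pool.length);
  const model = pool[index];
  if (model === undefined) throw new Error("index out of range");
  return model;
}

// The chosen model travels with the request as imageConfig.imageModel.
function buildImageConfig(pool: readonly string[], turnIndex: number, mode: PoolMode) {
  return { imageModel: pickImageModel(pool, turnIndex, mode) };
}
```

Under this reading, a single-model dropdown is just a pool of length one, which is consistent with the write-through behaviour described above.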

Prompt enhancement

/api/enhance-prompt performs a pre-pass that rewrites loose prompts toward the selected backend’s strengths.

Conversational editing

Direct generation and agentic-loop tools share the same image conversation context, allowing multi-turn refinement rather than isolated prompt submissions.

Sketch to image

A pencil button in image mode opens a freehand drawing pad — pen, brush, eraser, color swatches, undo — sized to the session's aspect ratio. The drawing is sent as a composition reference (Gemini or gpt-image, multimodal conditioning, no ControlNet), so the finished image preserves the drawn layout, shapes, and spatial relationships. /api/sketch-to-image.

Provenance

Every generated image records its generationMeta — resolved model id, AI backend, infrastructure target (local vs remote), and seed — shown in the canvas metadata panel. A render is traceable to the exact model and seed that produced it.
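
The provenance record could be typed along these lines. The field names are illustrative assumptions; the reference lists which facts are recorded, not the schema.

```typescript
// Illustrative field names; only the four recorded facts come from the text.
interface GenerationMeta {
  modelId: string;                    // resolved model id
  backend: string;                    // AI backend, e.g. "gemini" or "comfyui"
  infrastructure: "local" | "remote"; // infrastructure target
  seed: number;                       // seed that produced the render
}

// One-line summary of the kind a metadata panel might display.
function describeProvenance(meta: GenerationMeta): string {
  return `${meta.modelId} via ${meta.backend} (${meta.infrastructure}), seed ${meta.seed}`;
}
```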

Authenticated studio capture pending: chat-driven image generation with model pool selector visible.
Authenticated studio capture pending: prompt enhancement before/after view.
Authenticated studio capture pending: multi-turn image conversation with reference image.
02

Image editing

The editing layer is region-aware. Masks, bounding boxes, popovers, inpaint variants, background removal, crop, outpaint, upscale, and code-to-raster conversion are treated as normal studio operations.

Authenticated studio capture pending: mask painter with toolbar, bounding-box overlay, and selected region.
Authenticated studio capture pending: inpaint variation picker and cross-turn session state.
Authenticated studio capture pending: region popover with a localized inpaint prompt.
03

Video

Video generation spans local ComfyUI workflows, Minimax API models, masked video inpaint, animation pipelines, and frame-level timeline inspection.

Wan

Wan 2.2 14B runs through ComfyUI as the local high-end video target, paired with VACE for masked inpaint.

Hailuo

Hailuo 2.3 and Hailuo 2.3 Fast are accessed through the Minimax HTTP API.

VACE

Animate region: on any still image, paint a mask with the full inpaint toolset, type a motion prompt, and generate a short looping video in which only the painted region moves — every unmasked pixel is anchored to the source photograph per frame, by construction (the source and mask are repeated across all frames). Wan 2.2 14B + VACE through ComfyUI; the endpoint is SSE-streamed to survive Cloudflare's 100s upstream timeout.
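
For long renders, an SSE stream can outlast an idle-connection timeout if it emits periodic bytes. The helpers below sketch the wire format only; the event names, the heartbeat approach, and the 15-second cadence are assumptions, not details of the studio's endpoint.

```typescript
// Serialize one Server-Sent Event. JSON.stringify yields a single line,
// so one "data:" field is enough here.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Lines beginning with ":" are comments that SSE clients ignore. Sending one
// on a timer keeps the connection active while the job is still running.
function formatSseHeartbeat(): string {
  return ": keepalive\n\n";
}

// Assumed cadence, chosen to sit well under a 100 s timeout.
const HEARTBEAT_INTERVAL_MS = 15000;
```

A handler would write `formatSseHeartbeat()` every `HEARTBEAT_INTERVAL_MS` until the render finishes, then send a final event such as `formatSseEvent("done", { url })`.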

Cel-style

Cel-style video is available as a ComfyUI pipeline.

Animation

gemini-animation.ts and hy-motion.ts are part of the animation path.

Timeline

A frame timeline scrubber supports per-frame inspection. The share-video path uses ffmpeg with exact-duration alignment for X.com loop compatibility: video is pinned to -t, -r, -vsync cfr, -fflags +genpts; audio is wrapped with apad,atrim=0:videoDurationSec,asetpts=PTS-STARTPTS so MediaRecorder undershoot / overshoot does not break loop replay. AAC is the output codec; H.264 Main is the video codec (level 4.0 — level 3.1's macroblock ceiling rejected square 1080² captures); -movflags +faststart enables progressive download.
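
The flags named above can be assembled into an argument list as in the sketch below. The option type, the function names, and the choice of `libx264` as encoder are assumptions; the flags and the audio filter chain are taken from the text. The second helper reproduces the level arithmetic: H.264 level 3.1 allows at most 3,600 macroblocks per frame and level 4.0 allows 8,192.

```typescript
// Hypothetical options for the share-video export.
interface ShareVideoOptions {
  inputPath: string;
  outputPath: string;
  videoDurationSec: number;
  fps: number;
}

// Build ffmpeg arguments that pin video and audio to the same exact duration.
function buildShareVideoArgs(o: ShareVideoOptions): string[] {
  const d = o.videoDurationSec.toFixed(3);
  return [
    "-fflags", "+genpts",             // regenerate missing timestamps on input
    "-i", o.inputPath,
    "-t", d,                          // exact output duration
    "-r", String(o.fps),
    "-vsync", "cfr",                  // constant frame rate
    // Pad short audio, trim long audio, then reset timestamps to zero.
    "-af", `apad,atrim=0:${d},asetpts=PTS-STARTPTS`,
    "-c:v", "libx264",                // assumed encoder for H.264
    "-profile:v", "main",
    "-level:v", "4.0",
    "-c:a", "aac",
    "-movflags", "+faststart",        // move the index up front for progressive download
    o.outputPath,
  ];
}

// H.264 macroblocks are 16x16 pixels; partial blocks round up.
function macroblocksPerFrame(width: number, height: number): number {
  return Math.ceil(width / 16) * Math.ceil(height / 16);
}

// Maximum frame size in macroblocks for the two levels discussed.
const MAX_FRAME_MACROBLOCKS: Record<"3.1" | "4.0", number> = {
  "3.1": 3600,
  "4.0": 8192,
};

function fitsLevel(width: number, height: number, level: "3.1" | "4.0"): boolean {
  return macroblocksPerFrame(width, height) <= MAX_FRAME_MACROBLOCKS[level];
}
```

A 1080 by 1080 frame needs 68 × 68 = 4,624 macroblocks, which exceeds the level 3.1 ceiling and fits within level 4.0.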

Authenticated studio capture pending: frame timeline scrubber during a generated video session.
Authenticated studio capture pending: masked VACE video inpaint setup.
Authenticated studio capture pending: share-video export controls and ffmpeg duration metadata.
04

3D

The current 3D target is HunYuan3D 2.1. The pipeline separates shape and texture generation, prepares 2D sources for silhouette extraction, and presents meshes in a browser viewer.

Bio-mechanical Artifact gallery artwork
Public gallery artwork suitable as a 2D-to-3D source reference.
Generative Structures II public 3D collection page
Public gallery capture: Generative Structures II collection page with model-viewer meshes.
Animi public 3D collection page
Public gallery capture: a 3D collection page rendering a real mesh in the gallery's model-viewer turntable.
05

Animation + rigging

Rigging combines conventional retargeting, UniRig auto-rigging, a geometric morphology classifier, free-form RigSpec authoring, and Blender-driven bakes streamed over SSE for long jobs.

Authenticated capture pending: /rig-test debug surface with morphology classification output.
Authenticated capture pending: RigSpec authoring view with topology archetype and behavior controls.
Authenticated capture pending: SSE-streamed bake progress for dense radial rig.
06

Code generation

The code surface supports creative coding, shader sketches, validation in Chromium, forked sketches, and conversion of code output into images.

Pipeline

code-pipeline.ts is the ~1.5k-line creative-coding pipeline.

GLSL

glsl-wrapper.ts wraps shader sketches.

Frameworks

Four creative-coding frameworks are wired: p5.js, three.js, GLSL, and Tone.js. An "Audio + visual" pair toggle injects Tone.js into a visual sketch.

Pairing addendum

When "Audio + visual" is enabled, the prompt injects a pairing addendum into the framework instructions. For Tone.js sketches it requires a mandatory p5.js spectrum visualizer in the same HTML. For p5.js / three.js / GLSL sketches it requires a Tone.js audio graph routed through new Tone.Limiter(-1).toDestination(), started by a centered Play overlay that transitions to a corner widget after first interaction.

Safety rules

The prompt enforces frequency guards (≥200 Hz on laptop speakers), forbids Tone.Destination.connect() (always .toDestination()), forbids event-driven triggerAttackRelease(freq, dur, Tone.now()) (omit the time arg to avoid same-frame start-time errors), and forbids p5.js variable names that shadow globals (scale, rotate, color, image, map, random, ...).
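
Three of these rules lend themselves to a mechanical check on generated source. The following is a heuristic sketch, not the studio's validator: the function name, rule labels, and regular expressions are assumptions, and the frequency guard is not covered because it cannot be checked reliably by pattern matching.

```typescript
interface SafetyIssue {
  rule: string;
  detail: string;
}

// The p5.js globals named in the rules above (the source list is open-ended).
const P5_GLOBALS = ["scale", "rotate", "color", "image", "map", "random"];

// Heuristic, regex-based scan of sketch source for forbidden patterns.
function checkSketchSafety(source: string): SafetyIssue[] {
  const issues: SafetyIssue[] = [];

  // Rule: never call Tone.Destination.connect(); use .toDestination().
  if (/Tone\.Destination\.connect\s*\(/.test(source)) {
    issues.push({ rule: "destination-connect", detail: "use .toDestination() instead" });
  }

  // Rule: do not pass Tone.now() as the time argument of triggerAttackRelease.
  if (/triggerAttackRelease\s*\([^)]*Tone\.now\(\)/.test(source)) {
    issues.push({ rule: "timed-trigger", detail: "omit the time argument" });
  }

  // Rule: variable declarations must not shadow p5.js globals.
  const shadowPattern = new RegExp(
    `\\b(?:let|const|var)\\s+(${P5_GLOBALS.join("|")})\\b`,
    "g",
  );
  let match: RegExpExecArray | null;
  while ((match = shadowPattern.exec(source)) !== null) {
    issues.push({ rule: "p5-shadow", detail: `"${match[1]}" shadows a p5.js global` });
  }

  return issues;
}
```

A sketch that routes audio through `new Tone.Limiter(-1).toDestination()` and calls `triggerAttackRelease(freq, dur)` without a time argument passes all three checks.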

Editor

CodeViewer supplies code editor and run-and-capture integration.

Raster export

convert-to-2d converts sketch output to a raster image.

Fork

/api/sessions/fork-sketch creates sketch forks.

Explain

The Learn panel calls /api/images/[id]/explain to break a generated sketch down into its key algorithms from first principles.

Models

gemini-3.5-flash is the primary code model, overridable through GEMINI_CODE_MODEL. gpt-5.4 is the fallback, overridable through OPENAI_CODE_MODEL.
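
Resolution of the two model ids might look like the sketch below. The function and type names are hypothetical, and treating an empty variable as unset is an assumption.

```typescript
// Environment passed in explicitly so the function stays pure and testable.
type Env = Record<string, string | undefined>;

interface CodeModels {
  primary: string;
  fallback: string;
}

// An override wins when set and non-empty; otherwise the documented default applies.
function resolveCodeModels(env: Env): CodeModels {
  return {
    primary: env["GEMINI_CODE_MODEL"] || "gemini-3.5-flash",
    fallback: env["OPENAI_CODE_MODEL"] || "gpt-5.4",
  };
}
```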

Draw to code

A pencil button in code mode opens the same freehand pad — but here the drawing is sent to the code model as vision input, with instructions to reproduce the drawn form as code: vertices, beziers, transforms, joints. The sketch is design guidance, not an asset — it never ships in the artifact and is explicitly excluded from runtime reference injection. Sketch governs shape and structure; the text prompt governs behavior, color, and motion. One drawing surface, two destinations: pixels in image mode, geometry-written-as-code in code mode.

Explanation layers

Every sketch's viewer carries four tabs — Preview, Gist, How it Works, Code. Gist distills the core algorithm for a beginner: real lines copied from the sketch (never invented pseudo-code) with short explanations, plus a plain-steps recipe view. How it Works is the deep layer: sections tagged math, computation, and aesthetics — aesthetic principles treated as a first-class explanatory dimension. Both generate once and cache per sketch; Refresh regenerates.

Stable seeds

The render seed is derived from the sketch id, so a generative sketch reproduces identically on every view — across reloads, sessions, and devices. Sketches are stable artifacts, not slot machines. A New Seed button re-rolls for the session only.
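
Deriving a seed from an id comes down to hashing the id to an integer and feeding it to a seeded generator. The sketch below uses an FNV-1a style hash over UTF-16 code units and the mulberry32 generator; both are stand-ins chosen for brevity, since the reference does not say which hash or generator the studio uses.

```typescript
// FNV-1a style 32-bit hash over the id's UTF-16 code units.
function seedFromId(id: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < id.length; i++) {
    h ^= id.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force unsigned 32-bit
}

// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same sketch id, same random sequence, on every view.
function rngForSketch(sketchId: string): () => number {
  return mulberry32(seedFromId(sketchId));
}
```

A session-only re-roll would seed the generator from a fresh random integer instead of the id, leaving the stored sketch untouched.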

Validation

Browser-side validation runs through Playwright Chromium via CODE_VALIDATION_BROWSER.

Authenticated capture pending: code editor and live sketch viewer integrated with the chat surface.
Authenticated capture pending: run-and-capture result converted to 2D image asset.
Authenticated capture pending: sketch fork flow and validation output.
07

Sound

Sound is a shipped medium, not a planned one. Music, sound effects, and video-to-audio foley generate through ComfyUI; Tone.js is wired as a fourth creative-coding framework so audio can be authored alongside a visual sketch.

Music

ACE-Step v1.5 turbo generates music through ComfyUI — 8 steps, cfg 1, ModelSamplingAuraFlow shift 3. Reached through /api/generate-audio with kind: 'music'.

SFX / ambient

Stable Audio Open 1.0 generates sound effects and ambient beds. The checkpoint ships without T5, so the workflow loads it via a separate CLIPLoader(type='stable_audio').

Foley

HunyuanVideo-Foley XXL generates audio from video. kind: 'foley' routes through comfyui-foley.ts, including a WebP-to-MP4 transcode pipeline.

Endpoint

/api/generate-audio is a single endpoint covering all three modalities, routing to workflows in comfyui-audio.ts and comfyui-foley.ts. Audio defaults to the local backend because an HPC round-trip takes longer than the generation itself. Backend routing for all other modalities is covered in section 12.
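
A single endpoint over three modalities maps naturally onto a discriminated union keyed by `kind`. In the sketch below, `kind: 'music'` and `kind: 'foley'` come from the text, while the `'sfx'` value, the other request fields, and the route shape are assumptions.

```typescript
// Hypothetical request shapes; only kind: "music" and kind: "foley" are confirmed.
type AudioRequest =
  | { kind: "music"; prompt: string }
  | { kind: "sfx"; prompt: string }
  | { kind: "foley"; videoUrl: string; prompt?: string };

interface AudioRoute {
  module: "comfyui-audio.ts" | "comfyui-foley.ts";
  model: string;
  backend: "local"; // audio pins to the local backend
}

// Map each request kind to the workflow module and model named in this section.
function routeAudio(req: AudioRequest): AudioRoute {
  switch (req.kind) {
    case "music":
      return { module: "comfyui-audio.ts", model: "ACE-Step v1.5 turbo", backend: "local" };
    case "sfx":
      return { module: "comfyui-audio.ts", model: "Stable Audio Open 1.0", backend: "local" };
    case "foley":
      return { module: "comfyui-foley.ts", model: "HunyuanVideo-Foley XXL", backend: "local" };
  }
}
```

With the union in place, the compiler flags any new `kind` that is added without a matching route.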

Tone.js

Tone.js is one of four creative-coding frameworks. An "Audio + visual" pair toggle injects Tone.js into a visual sketch so sound and image are authored in the same canvas.

Canvas

AudioPlayer presents generated audio in the canvas. The MediaType model includes music, sfx, and speech alongside image, video, 3D, animation, code, app, and text.

Storyboard soundtracks

Storyboards can ship with a generated ambient music bed that loops in Present mode. Volume slider and mute toggle live next to the panel controls. Generation reuses kind: 'music' with the storyboard title and outline as prompt context.

Authenticated studio capture pending: chat-driven music generation with the AudioPlayer in canvas.
Authenticated studio capture pending: video-to-audio foley result alongside its source clip.
Authenticated studio capture pending: a paired Tone.js + visual sketch with the audio-visual toggle.
08

Storyboards

A storyboard is a sequence of panels — image, video, 3D, code, or text — assembled inside the same chat-driven canvas, with a scaffolded outline, real per-panel playback, an optional ambient soundtrack, and a Present mode for review and sharing.

Authenticated studio capture pending: brainstorm scratchpad and progressive outline fill.
Authenticated studio capture pending: storyboard with mixed image / video / 3D / code panels and ambient soundtrack controls.
Authenticated studio capture pending: Present mode with Narrate + auto-advance running across panels.
09

Apps + classroom

The studio itself is the app surface: chat, canvas, collections, and classroom. The classroom layer adds authoring, cohorts, lesson sessions, AI grading, and exportable reports. The middle way carries into teaching: ij8 teaches AI literacy, computational thinking, design, and creative innovation by having students make — self-paced, with full instructor control and visibility. The full teaching reference is at classroom.ij8.ai; institutional pilots and partnerships are at pilots.ij8.ai.

ij8 Studio App Lab context pack in AppShell with chat and canvas
Studio capture: AppShell + chat + canvas running an App Lab prototype — context pack on the left (refined prompt, entities, variables), live mobile prototype rendered in the canvas.
Authenticated capture pending: student lesson session with stepwise progression and LessonScoreHud.
Authenticated capture pending: admin/teacher report table with CSV export.
10

Hybrid approaches

The platform is strongest when a work moves between modes. The following cross-pipeline flows are routine or directly implied by wired surfaces.

  1. Image to 3D mesh

    Generate or import a 2D image, run the 3D prep prompt, generate shape and then texture, and inspect the mesh in the model viewer.

  2. Mesh to rigged animation

    Generate a mesh, classify morphology, author or infer a RigSpec, then bake continuous behavior in Blender.

  3. Code sketch to image

    Generate p5.js or GLSL, run it, capture the canvas frame, and store the result as an image.

  4. Image to mask to video

    Select a region, create a mask, and send the image/mask pair into VACE or another video pipeline.

  5. Mask to inpaint variation

    Paint a mask, generate multiple inpaint variants, and retain picker state across turns.

  6. Image to background removal to 3D

    Batch remove background with rembg variants, then feed the cleaner silhouette to HunYuan3D.

  7. Animation frame to image edit to re-animation

    Scrub to a frame, edit it as an image, then use the altered frame as a new animation source.

  8. Code to APNG

    Run a sketch or shader, capture frames, and export a lightweight loop.

  9. Sketch to sound

    Pair Tone.js into a visual sketch with the audio-visual toggle so sound is authored alongside the canvas rather than added afterward.

  10. Video to foley

    Generate or scrub a video, then run HunyuanVideo-Foley to synthesize a matching audio track for it.

  11. Course session to minted collection

    Use classroom-guided production to generate work, curate outputs, then publish through collections.

  12. Drawing to artifact

    One freehand pad, two destinations: render the drawing as a finished image, or hand it to the code model as vision guidance and get the form back as live geometry in code.

  13. Still to regional motion

    Mask a region of a finished image and animate only that region into a looping video; the rest stays the literal photograph.

  14. Views to solid mesh

    Hand-pick or synthesize back/left/right views of a subject, then condition the 3D shape pass on all of them to kill the flat-back problem.

  15. Generative reserve to fulfill

    An on-chain reservation triggers background image generation, IPFS pinning, and on-chain fulfillment.

  16. Image edit to gallery drop

    Use conversational editing and background removal to refine a work, then place it inside a scheduled gallery release.

  17. Codegen collection

    Fork a sketch, validate it in Chromium, then mint or surface it as a codegen lane collection.

Isometric studio artwork from public gallery
Public gallery capture: isometric studio work as a reference for hybrid spatial/code/image practice.
Gli Studi degli Artisti artwork
Public gallery capture: environment-like generative image, suitable for downstream 3D or animation treatment.
Authenticated capture pending: a complete hybrid chain from chat prompt to canvas artifact to collection.
11

Distribution: gallery, collections, NFT

The distribution layer is live: artist pages, collection pages, scheduled drops, IPFS pinning, Ethereum contracts, royalties, blind reveals, regenerative fulfill, and gallery surfacing.

ij8 Gallery homepage capture
Public capture: gallery home with featured collections.
ij8 Gallery explore capture
Public capture: explore page with many on-chain collection cards.
ij8 public landing capture
Public capture: unauthenticated ij8 landing surface.
12

Backends + routing

Generation is routed per request across four ComfyUI backends: a local 4090, the SMU SuperPOD (A100), RunPod, and RunComfy. For most users the choice is silent: they see generation time, not infrastructure. Users granted remote access get an explicit Compute backend toggle. HPC access is gated by per-user and per-cohort opt-in; audio always pins to local.

Routing is intentionally policy-driven rather than availability-driven: paid / commercial requests must not be routed to SMU, while free / educational requests can fall back to local if HPC is unreachable.
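
The stated rules can be written as a small pure function. This is a sketch under assumptions: the request shape and names are hypothetical, a commercial request that asks for SMU is sent to local (the reference only says it must not reach SMU), and RunPod / RunComfy requests pass through unchanged because no policy for them is stated here.

```typescript
type Backend = "local" | "smu" | "runpod" | "runcomfy";

// Hypothetical request shape for a routing decision.
interface RouteRequest {
  modality: "image" | "video" | "3d" | "audio";
  commercial: boolean; // paid / commercial vs free / educational
  hpcAccess: boolean;  // per-user or per-cohort opt-in
  requested?: Backend; // explicit Compute backend toggle, if any
}

function chooseBackend(
  req: RouteRequest,
  isReachable: (backend: Backend) => boolean,
): Backend {
  // Audio always pins to local.
  if (req.modality === "audio") return "local";

  const wanted: Backend = req.requested ?? "local";
  if (wanted === "smu") {
    // Policy first: commercial work never goes to SMU, reachable or not.
    // Sending it to local instead is an assumption.
    if (req.commercial) return "local";
    // HPC access is opt-in.
    if (!req.hpcAccess) return "local";
    // Free / educational requests fall back to local when HPC is unreachable.
    return isReachable("smu") ? "smu" : "local";
  }
  return wanted;
}
```

Note that the policy checks run before the reachability check, so availability can never route a commercial request to SMU.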

13

Roadmap

Roadmap items are marked by current state. Sound has shipped and has its own section; word remains a planned medium, not yet a shipped feature.

| State | Item | Notes |
| --- | --- | --- |
| Shipping | Gallery and minting infrastructure | Artist pages, collection pages, scheduled drops, Ethereum L1 + Sepolia contract paths, ERC-2981 royalties, Pinata IPFS pinning. |
| Shipping / active | Image generation and editing | Gemini, OpenAI fallback, ComfyUI workflows, masking, inpaint, outpaint, background removal, crop, upscale, and region-level interactions. |
| Shipping / active | Sound | Music (ACE-Step v1.5 turbo), SFX and ambient (Stable Audio Open 1.0), video-to-audio foley (HunyuanVideo-Foley), and Tone.js as a creative-coding framework. See section 07. |
| Shipping | Learning surfaces | 50 official tutorials (Explain → Show → Play → Make), instructor-authored tutorials, Gist and How-it-Works explanation layers. See classroom.ij8.ai. |
| Shipped | 3D job cancellation | Ownership-checked cancel that interrupts the in-flight ComfyUI workflow; the fulfill worker checks between every stage; per-stage timeouts. |
| In progress | Cancellation everywhere | Video, animation, and generative-fulfill jobs still need the same treatment. |
| Next | Hot-wallet hygiene | Move toward KMS or signing-sidecar abstraction. |
| Next | Observability | Sentry rollout, structured logging, and real per-backend health checks. |
| Started | Test scaffolding | Vitest landed in the studio (11 unit-test suites: outpaint, gist, history-trim, render-policy, app-lab, access). CI gates remain lint + build; typecheck gate and rate limiting still ahead. |
| Planned | Word | Algorithmic poetry, generative text and word-art forms, computational poetics, and lettrist / concrete-poetry tooling, treated as a first-class medium alongside image and sound rather than as a chat afterthought. |
| Exploratory | Speech / TTS | The AudioMediaType enum already includes 'speech'; no implementation. Pending a deliberate choice of voice model and license terms. |
| Exploratory | Multi-chain support | Support beyond Ethereum. |
| Exploratory | Beyond-NFT distribution | Digital-painting fabrication, 3D printing, and game-asset pipelines. |
14

Open questions / what artists shape

Founding-cohort artists are invited to influence which roadmap items get prioritized and what bespoke workflows are developed around their practice. The relationship is closer to founding members than customers: the tools should be tested against real methods, not generic usage funnels.

Open questions include which hybrid flows matter most, where bespoke artist tools should become shared primitives, which distribution formats should sit beyond NFT, and how word should enter the same canvas — and how sound should deepen — without becoming ornamental add-ons. Drawing as a first-class input — one pad feeding both image and code — and storyboard authoring are the most recent shapes under active redesign; the questions there are about how much narrative scaffolding to bake in before it stops feeling like an open canvas.

15

Index / colophon

Complete enumeration of named models, backends, contracts, and infrastructure in this reference.

| Layer | Technology / model / path | Status / use |
| --- | --- | --- |
| Image | gemini-3-pro-image-preview | Primary image generation and multi-turn editing. |
| Image | gpt-image-1.5 | OpenAI fallback for Gemini 503. |
| Chat orchestration | gemini-3.5-flash (GEMINI_CHAT_MODEL override); qwen3:14b (Ollama); minimax-m27 | Primary chat / agentic loop; local non-streaming alternate inside the agentic loop; Minimax HTTP alternate. |
| ComfyUI image | Flux 1, Flux 2, SD 3.5, Qwen-Image 2512, Flux 2 Klein, Z-Image Turbo (incl. GGUF-quantized), CyberRealistic, Playground 2.5, Kontext, SDXL style-transfer | Workflow library in comfyui.ts. |
| Edit | MaskEditor, MaskToolbar, BoundingBoxOverlay, rembg general / anime / precise / birefnet, convert-to-2d, sketch-to-image, draw-to-code (role-separated sketch refs) | Masking, inpaint, outpaint, crop, zoom-enhance, upscale, background removal, code raster. |
| Video | Wan 2.2 14B, Hailuo 2.3, Hailuo 2.3 Fast, VACE, cel-style video, gemini-animation.ts, hy-motion.ts, ffmpeg 6.1.1, VACE animate-region (SSE) | ComfyUI, Minimax API, masked video inpaint, animation, share-video export. |
| 3D | HunYuan3D 2.1, HunYuan3D 2.0, HunYuan3D-2mv multi-view, /api/3d-views/generate, 3D job cancel, build3DPrepPrompt, model-viewer | Dual-pass shape to texture, multi-view conditioning, prep-prompt, viewer, APNG export. |
| Rigging | Mixamo FBX, Blender 4.4.3, UniRig, scripts/classify_morphology.py, classify-morphology.ts, lib/rig/spec.ts, lib/rig/topology.ts, scripts/bake_rig.py, bake-rig.ts, /rig-test, /api/rig/auto-smoke | Retargeting, auto-rigging, morphology classification (radial, stalk, body-tail, pulse-mass), RigSpec authoring, Blender bake, SSE streamed long bakes. |
| Code | code-pipeline.ts, glsl-wrapper.ts, p5.js, three.js, GLSL, Tone.js, app framework, CodeViewer, GistPane, HowItWorksPane, deterministic per-id seed, DrawToCodePanel, /api/sessions/fork-sketch, /api/images/[id]/explain, gemini-3.5-flash, gpt-5.4, CODE_VALIDATION_BROWSER | Creative coding across four frameworks, run-and-capture, fork, sketch explanation, model fallback, browser validation. |
| Storyboards | scenes schema, collection_items.narrative, present mode, narrate toggle, auto-advance, ambient soundtrack | Multi-panel scene authoring; mixed-media panels; present-mode playback. |
| Brainstorm | brainstorm scratchpad + Refine + Send-to-storyboard, update_outline tool, progressive outline fill | Pre-scaffold ideation loop feeding storyboards. |
| App Lab | lib/ai/app-lab/context-pack.ts, prototype-generator.ts, inspect.ts, virtual-files.ts, detect-underspecification.ts, map.ts, shape.ts, stage-selection.ts | React-CDN guided prototype generation with validation + repair. |
| Sound | ACE-Step v1.5 turbo, Stable Audio Open 1.0, HunyuanVideo-Foley XXL, Tone.js, /api/generate-audio, comfyui-audio.ts, comfyui-foley.ts, AudioPlayer, SoundtrackBar | Music, SFX and ambient, video-to-audio foley, Tone.js creative-coding audio, storyboard soundtrack playback. |
| Classroom | course-scaffold.ts, /api/courses/syllabus/parse, CourseAuthorWorkspace, course wizard modal, /api/lessons/sessions/[id]/next, lesson-scoring.ts, gemini-3.1-flash-lite, AppShell, Sidebar, LearnPanel, LessonScoreHud | Course authoring, lesson progression, AI grading, roles, reports, Commons course pool. |
| Learning | 50 seeded tutorials (seed-tutorials.ts), illustration verify-refine agent + heartbeat curator, archetype widgets (mapping-diagram, step-builder, param-explorer, live-animated), tutorials / tutorial_illustrations / tutorial_sessions / course_tutorials tables | Explain→Show→Play→Make tutorial runtime + instructor authoring. |
| Backends | comfyui-backend.ts, comfyui-backend-config.ts, AsyncLocalStorage(comfyBackendContext), TCP+HTTP liveness, 15s cache, users.hpcAccess, cohorts.hpcAccess | Per-request routing across local 4090, SMU SuperPOD A100, RunPod, RunComfy. |
| Operations | /api/health (DB · ComfyUI · HunYuan3D · key-presence per backend), /api/admin/providers, platform_provider_settings, user_provider_settings, pino, Sentry stubs | Health + provider-key admin + observability scaffolding. |
| Contracts | IJGCollection.sol, IJGCollectionV2.sol, IJGCollectionV3.sol, IJGCollectionFactory.sol, Sepolia 0x92B10e1f7107D7A2436445B4842a8077d041151e | V1, generative reservation, regeneration + refund + cloneable, EIP-1167 factory. |
| Stack | Next.js 16.1.6, React 19.2.3, TypeScript 5 strict, Tailwind 4, Zustand 5, Drizzle ORM + better-sqlite3 SQLite (36 tables), Auth.js, Google OAuth, email magic link, DB whitelist, wagmi ^3, viem ^2, custom WalletButton, ComfyUI, Blender 4.4.3, ffmpeg 6.1.1, rembg, UniRig, ACE-Step v1.5 turbo, Stable Audio Open 1.0, HunyuanVideo-Foley XXL, Tone.js, storyboards (scenes schema, present mode, ambient soundtrack), App Lab framework (React 19 UMD, Tailwind, Babel Standalone), Gemini 3.5 Flash, Qwen3:14b (Ollama), Minimax M27, npm workspaces, Turborepo, Cortex-studio, PM2, Cloudflare tunnel, Pinata, Ethereum mainnet + Sepolia | Application and infrastructure stack. |

Self-hosted Inter. No tracking. No external scripts. Single-page static output for tooling.ij8.ai.