gemini-3-pro-image-preview is the primary image model. Multi-turn editing chats are cached in memory by ${userId}:${sessionId}:${imageId} and cleaned every five minutes. The model sees prompt, reference image, and mask.
A working catalogue of the ij8 · studio.
A single-page technical reference for prospective artists: what is wired, which models are available, where the gallery and minting surfaces are live, and which studio internals still require authenticated capture.
Premise
ij8 is a generative creative studio for serious artists. Image, video, 3D, code, sound, classroom work, and gallery distribution converge in a single chat-driven canvas. The studio is in active development; the gallery and minting infrastructure are live. Its stance toward AI is a middle way — a collaborator the artist directs and authors with, not a system that replaces the work, and not a pace the artist must match. The human brings the vision, the taste, and the questions; the AI brings the speed and the hands. The same position carries into how it teaches.



Image generation
The image surface is multi-backend by design: Gemini first, OpenAI fallback, and a ComfyUI workflow library for specific model families and edit modes.
gpt-image-1.5 is the fallback path when Gemini returns 503. The fallback is explicit rather than hidden behind a generic “AI image” label.
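The explicit fallback described above can be sketched as a thin wrapper around two provider calls. This is a minimal illustration, not the studio's code: the `ImageResult` shape, the `Generate` signature, and the assumption that a provider error carries a numeric `status` field are all hypothetical.

```typescript
// Hypothetical shapes: the real provider clients are not shown in this reference.
type ImageResult = { backend: "gemini" | "openai"; model: string };
type Generate = (prompt: string) => Promise<ImageResult>;

// True only for errors that carry an HTTP 503 status (assumed error shape).
function isServiceUnavailable(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { status?: unknown }).status === 503
  );
}

// Try the primary backend; fall back only on 503.
// Any other failure is rethrown so real errors are not masked.
async function generateWithFallback(
  prompt: string,
  primary: Generate,
  fallback: Generate,
): Promise<ImageResult> {
  try {
    return await primary(prompt);
  } catch (err) {
    if (isServiceUnavailable(err)) return fallback(prompt);
    throw err;
  }
}
```

Because the result records which backend produced it, the UI can label the fallback explicitly instead of hiding it.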
comfyui.ts is a ~3.8k-line workflow library spanning Flux 1, Flux 2, Flux 2 Klein, SD 3.5, Qwen-Image 2512, Z-Image Turbo (incl. GGUF-quantized), CyberRealistic, Playground 2.5, Kontext (image-edit adapter), and SDXL style-transfer, plus inpaint, outpaint, upscale, and two-pass variants. Per-checkpoint detection (isFlux2, isZImageGGUF, isQwenImage, isFlux2Klein) routes each builder to the right text-encoder / VAE pair.
A configured image-model pool can be set. Per turn, one model is selected deterministically or randomly and sent as imageConfig.imageModel. The single-model dropdown writes through to both the pool and legacy field.
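A minimal sketch of per-turn selection from the pool follows. The reference does not say how the deterministic choice is made, so round-robin by turn index is an assumption here, and `pickImageModel` and the pool ids are illustrative names only.

```typescript
// Hypothetical helper: picks one model id from the configured pool for a turn.
type PoolMode = "deterministic" | "random";

function pickImageModel(
  pool: readonly string[],
  turnIndex: number,
  mode: PoolMode,
  rng: () => number = Math.random,
): string {
  const n = pool.length;
  if (n === 0) throw new Error("image-model pool is empty");
  const i =
    mode === "deterministic"
      ? ((turnIndex % n) + n) % n // assumption: round-robin by turn index
      : Math.floor(rng() * n);
  const picked = pool[i];
  if (picked === undefined) throw new Error("index out of range");
  return picked;
}

// Illustrative pool ids, not the studio's real identifiers.
const examplePool = ["flux-2", "qwen-image-2512", "z-image-turbo"];
const imageConfig = { imageModel: pickImageModel(examplePool, 4, "deterministic") };
```

Injecting `rng` keeps the random mode testable without changing behavior in production, where it defaults to `Math.random`.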
/api/enhance-prompt performs a pre-pass that rewrites loose prompts toward the selected backend’s strengths.
Direct generation and agentic-loop tools share the same image conversation context, allowing multi-turn refinement rather than isolated prompt submissions.
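The shared conversation context relies on the in-memory cache keyed by userId, sessionId, and imageId and swept every five minutes. A minimal sketch of such a cache is below; the class name, the entry shape, and the idle limit are assumptions, and only the key format and the five-minute sweep interval come from this reference.

```typescript
// Hypothetical in-memory cache; entry shape and idle limit are assumptions.
type ChatEntry<T> = { chat: T; lastUsed: number };

class ImageChatCache<T> {
  private entries = new Map<string, ChatEntry<T>>();
  private maxIdleMs: number;

  constructor(maxIdleMs: number) {
    this.maxIdleMs = maxIdleMs;
  }

  // Key format taken from the reference: userId:sessionId:imageId.
  static key(userId: string, sessionId: string, imageId: string): string {
    return `${userId}:${sessionId}:${imageId}`;
  }

  set(key: string, chat: T, now: number = Date.now()): void {
    this.entries.set(key, { chat, lastUsed: now });
  }

  // Reading an entry refreshes its idle timer.
  get(key: string, now: number = Date.now()): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    entry.lastUsed = now;
    return entry.chat;
  }

  // Removes entries idle longer than maxIdleMs; returns how many were dropped.
  sweep(now: number = Date.now()): number {
    let removed = 0;
    for (const [key, entry] of this.entries) {
      if (now - entry.lastUsed > this.maxIdleMs) {
        this.entries.delete(key);
        removed++;
      }
    }
    return removed;
  }
}

// Usage (the sweep interval is from the reference, the idle limit is assumed):
// const cache = new ImageChatCache<unknown>(30 * 60_000);
// setInterval(() => cache.sweep(), 5 * 60_000);
```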
A pencil button in image mode opens a freehand drawing pad — pen, brush, eraser, color swatches, undo — sized to the session's aspect ratio. The drawing is sent as a composition reference (Gemini or gpt-image, multimodal conditioning, no ControlNet), so the finished image preserves the drawn layout, shapes, and spatial relationships. /api/sketch-to-image.
Every generated image records its generationMeta — resolved model id, AI backend, infrastructure target (local vs remote), and seed — shown in the canvas metadata panel. A render is traceable to the exact model and seed that produced it.
Image editing
The editing layer is region-aware. Masks, bounding boxes, popovers, inpaint variants, background removal, crop, outpaint, upscale, and code-to-raster conversion are treated as normal studio operations.
- Mask painter
MaskEditor, MaskToolbar, and BoundingBoxOverlay support precise selection, freehand masks, region selection, and bounding-box overlay.
- Inpaint
Three inpaint paths are wired: Gemini multi-turn, Kontext Flux through ComfyUI, and a ComfyUI inpainting workflow. The inpaint variation picker is backed by sessionStorage across turns. Within the ComfyUI path, inpaint likewise prefers Flux Fill over the selected checkpoint whenever its components are available.
- Canvas edits
Outpaint, crop, zoom-enhance, upscale, remove-region, and region popovers are available operations. Outpaint always runs Flux Fill — a true inpainting model that conditions on the original pixels — regardless of which 2D model is selected: the selected model is the generation engine, not the outpaint engine. Extension proceeds one edge per pass with computed feathering and a smoothstep soft-stitch that composites the true original back over the render, so the preserved region is untouched and the seam is structural, not cosmetic. The same workflow runs identically on the local GPU and on HPC, with a preflight that verifies all four Flux Fill components before committing. Outpaint runs as an async job so long passes outlive Cloudflare's ~100s ceiling.
- Background removal
rembg runs as a Python subprocess with four variants: general, anime, precise, and birefnet.
- Region popovers
A selected region opens a popover for a localized prompt, prompt enhancement, and an applied inpaint edit — refining one area without reprompting the whole image.
- Code raster
convert-to-2d captures code-rendered output as a raster image so sketches can re-enter the visual pipeline.
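The smoothstep soft-stitch used by outpaint can be illustrated per pixel. This is a sketch under stated assumptions: the reference names smoothstep and computed feathering but not the feather geometry, so the feather band measured inward from the seam and the helper names are hypothetical.

```typescript
// Classic smoothstep: 0 below edge0, 1 above edge1, cubic ease in between.
function smoothstep(edge0: number, edge1: number, x: number): number {
  const t = Math.min(1, Math.max(0, (x - edge0) / (edge1 - edge0)));
  return t * t * (3 - 2 * t);
}

// Weight given to the ORIGINAL pixel, by its distance (px) from the seam
// measured into the preserved region. Assumption: the blend band lies just
// inside the seam, so anything deeper than featherPx is exactly the original.
function originalWeight(distanceFromSeamPx: number, featherPx: number): number {
  return smoothstep(0, featherPx, distanceFromSeamPx);
}

// Composite one channel value: original over render, by weight.
function stitch(original: number, rendered: number, weight: number): number {
  return original * weight + rendered * (1 - weight);
}
```

With a 32 px feather, a pixel 16 px from the seam gets weight 0.5, and any pixel 32 px or deeper keeps its original value unchanged.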
Video
Video generation spans local ComfyUI workflows, Minimax API models, masked video inpaint, animation pipelines, and frame-level timeline inspection.
Wan 2.2 14B runs through ComfyUI as the local high-end video target, paired with VACE for masked inpaint.
Hailuo 2.3 and Hailuo 2.3 Fast are accessed through the Minimax HTTP API.
Animate region: on any still image, paint a mask with the full inpaint toolset, type a motion prompt, and generate a short looping video in which only the painted region moves — every unmasked pixel is anchored to the source photograph per frame, by construction (the source and mask are repeated across all frames). Wan 2.2 14B + VACE through ComfyUI; the endpoint is SSE-streamed to survive Cloudflare's 100s upstream timeout.
Cel-style video is available as a ComfyUI pipeline.
gemini-animation.ts and hy-motion.ts are part of the animation path.
A frame timeline scrubber supports per-frame inspection. The share-video path uses ffmpeg with exact-duration alignment for X.com loop compatibility: video is pinned to -t, -r, -vsync cfr, -fflags +genpts; audio is wrapped with apad,atrim=0:videoDurationSec,asetpts=PTS-STARTPTS so MediaRecorder undershoot / overshoot does not break loop replay. AAC is the output codec; H.264 Main is the video codec (level 4.0 — level 3.1's macroblock ceiling rejected square 1080² captures); -movflags +faststart enables progressive download.
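The flags above can be assembled into an ffmpeg argument list roughly as follows. The flag values come from the paragraph; the function name, argument order, three-decimal duration formatting, and the choice of libx264 as the H.264 encoder are assumptions.

```typescript
// Hypothetical builder for the share-video export arguments.
function shareVideoArgs(
  input: string,
  output: string,
  videoDurationSec: number,
  fps: number,
): string[] {
  const d = videoDurationSec.toFixed(3);
  return [
    "-fflags", "+genpts",          // regenerate missing presentation timestamps
    "-i", input,
    "-t", d,                       // pin output duration to the video length
    "-r", String(fps),
    "-vsync", "cfr",               // constant frame rate
    // Pad audio with silence, trim to the exact video duration, reset timestamps.
    "-af", `apad,atrim=0:${d},asetpts=PTS-STARTPTS`,
    "-c:v", "libx264",             // assumption: libx264 as the H.264 encoder
    "-profile:v", "main",
    "-level:v", "4.0",
    "-c:a", "aac",
    "-movflags", "+faststart",     // progressive download
    output,
  ];
}

// Usage with Node's child_process (not executed here):
// spawn("ffmpeg", shareVideoArgs("capture.webm", "share.mp4", 4, 30));
```

Passing the arguments as an array rather than a shell string avoids quoting problems in the audio filter expression.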
3D
The current 3D target is HunYuan3D 2.1. The pipeline separates shape and texture generation, prepares 2D sources for silhouette extraction, and presents meshes in a browser viewer.
- HunYuan3D 2.1
Current target. Dual-pass shape generation followed by texture generation. ComfyUI restarts between the shape and texture passes to clear VRAM — the two stages do not fit simultaneously on the single 24GB GPU.
- HunYuan3D 2.0
Alternate target retained in the 3D path.
- Prep prompt
build3DPrepPrompt re-renders the 2D source as a sculpture on a light-gray background. Black voids were producing shredded silhouettes. The Convert-to-3D path sends tPose and prepFor3D flags so the source is re-rendered as a sculpture before HunYuan3D conversion.
- Job queue
The 3D queue is durable, has startedAt for stale-recovery, and scopes fulfill jobs per user.
- Multi-view
Single-image 3D must hallucinate what it cannot see — the flat-back problem. An MV picker locks the current image as the front view and fills back/left/right either from hand-picked session images or from a single "Generate views" click that synthesizes turntable rotations. With two or more views, the shape pass runs the HunYuan3D multi-view checkpoint, conditioning the mesh on every labeled view.
- Mesh viewer
model-viewer supports turntable yaw and APNG rotation export.
- Fulfill worker
The async fulfill worker sits behind an internal-secret-gated route. Jobs are cancelable mid-pipeline: cancel is ownership-checked and idempotent, interrupts the in-flight ComfyUI workflow, and the fulfill worker observes cancellation between every stage — prep, T-pose, shape, texture — with per-stage timeouts.
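Checking for cancellation between stages, with a timeout per stage, can be sketched as below. The stage names follow the text; the `Stage` and `Outcome` types and the `isCanceled` callback are hypothetical, and interrupting the in-flight ComfyUI workflow is out of scope for this sketch.

```typescript
type Stage = { name: string; timeoutMs: number; run: () => Promise<void> };
type Outcome = { status: "done" | "canceled"; completed: string[] };

// Rejects if the promise does not settle within ms. Note that this does not
// stop the underlying work; a real worker would also interrupt the backend.
function withTimeout<T>(p: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    );
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Runs stages in order and consults isCanceled before each one, so a cancel
// takes effect at the next stage boundary.
async function runPipeline(
  stages: Stage[],
  isCanceled: () => Promise<boolean>,
): Promise<Outcome> {
  const completed: string[] = [];
  for (const stage of stages) {
    if (await isCanceled()) return { status: "canceled", completed };
    await withTimeout(stage.run(), stage.timeoutMs, stage.name);
    completed.push(stage.name);
  }
  return { status: "done", completed };
}
```

Because `isCanceled` only reads state, calling cancel more than once has no additional effect on the runner, which matches the idempotent behavior described above.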



Animation + rigging
Rigging combines conventional retargeting, UniRig auto-rigging, a geometric morphology classifier, free-form RigSpec authoring, and Blender-driven bakes streamed over SSE for long jobs.
- Retargeting
retarget-mixamo.ts runs Mixamo FBX retargeting through a Blender subprocess.
- UniRig
unirig.ts provides auto-rigging.
- Morphology
scripts/classify_morphology.py and classify-morphology.ts use a Fibonacci-sphere ray probe to identify radial, stalk, body-tail, and pulse-mass archetypes.
- RigSpec
lib/rig/spec.ts and lib/rig/topology.ts define a free-form AI rig schema and topology archetypes. scripts/bake_rig.py and bake-rig.ts bake through Blender.
- Behaviors
Continuous behaviors include sway, pulse, softbody, and rotate.
- Long bakes
/api/rig/auto-smoke is SSE-streamed to survive Cloudflare’s 100s upstream timeout for dense radial bakes with 100+ bones across 240 frames and multiple chains.
- Weighting
The hub bone gets influenceRadius via a chain to prevent global hub dominance through manual_distance_weights. Orphan vertices fall back to the nearest bone to prevent static appendage islands.
- Debug surface
/rig-test is the debug surface for rigging work.
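The Fibonacci-sphere probe used by the morphology classifier depends on near-uniform ray directions. A common way to generate them is the golden-angle spiral sketched here. This illustrates the general technique, not the contents of classify_morphology.py, and the function name is hypothetical.

```typescript
type Vec3 = [number, number, number];

// Returns n unit vectors spread near-uniformly over the sphere using the
// golden-angle (Fibonacci) spiral.
function fibonacciSphere(n: number): Vec3[] {
  const goldenAngle = Math.PI * (3 - Math.sqrt(5));
  const dirs: Vec3[] = [];
  for (let i = 0; i < n; i++) {
    const y = 1 - (2 * (i + 0.5)) / n;  // evenly spaced heights in (-1, 1)
    const r = Math.sqrt(1 - y * y);     // ring radius at that height
    const theta = goldenAngle * i;      // rotate by the golden angle each step
    dirs.push([Math.cos(theta) * r, y, Math.sin(theta) * r]);
  }
  return dirs;
}
```

Casting one ray per direction from a mesh's centroid and recording hit distances gives a coarse radial profile, which is the kind of signal a classifier could use to separate radial from stalk-like shapes.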
Code generation
The code surface supports creative coding, shader sketches, validation in Chromium, forked sketches, and conversion of code output into images.
code-pipeline.ts is the ~1.5k-line creative-coding pipeline.
glsl-wrapper.ts wraps shader sketches.
Four creative-coding frameworks are wired: p5.js, three.js, GLSL, and Tone.js. An "Audio + visual" pair toggle injects Tone.js into a visual sketch.
When "Audio + visual" is enabled, the prompt injects a pairing addendum into the framework instructions. For Tone.js sketches it requires a mandatory p5.js spectrum visualizer in the same HTML. For p5.js / three.js / GLSL sketches it requires a Tone.js audio graph routed through new Tone.Limiter(-1).toDestination(), started by a centered Play overlay that transitions to a corner widget after first interaction.
The prompt enforces frequency guards (≥200 Hz on laptop speakers), forbids Tone.Destination.connect() (always .toDestination()), forbids event-driven triggerAttackRelease(freq, dur, Tone.now()) (omit the time arg to avoid same-frame start-time errors), and forbids p5.js variable names that shadow globals (scale, rotate, color, image, map, random, ...).
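The studio enforces these rules in the prompt. A static check over generated source is one hypothetical way to catch violations afterward, sketched below. The function name and the regular expressions are assumptions, and the shadowed-name list covers only the globals named above, since the original list is elided.

```typescript
// Only the names listed in the reference; the full list is elided there.
const SHADOWED_P5_GLOBALS = ["scale", "rotate", "color", "image", "map", "random"];

// Hypothetical lint pass over sketch source text. Regex-based, so it can miss
// or over-report cases that a real parser would handle correctly.
function findSketchViolations(source: string): string[] {
  const violations: string[] = [];

  if (/Tone\.Destination\.connect\s*\(/.test(source)) {
    violations.push("use .toDestination() instead of Tone.Destination.connect()");
  }

  const decl = new RegExp(
    `\\b(?:let|const|var)\\s+(${SHADOWED_P5_GLOBALS.join("|")})\\b`,
    "g",
  );
  let m: RegExpExecArray | null;
  while ((m = decl.exec(source)) !== null) {
    violations.push(`variable "${m[1]}" shadows a p5.js global`);
  }
  return violations;
}
```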
CodeViewer supplies code editor and run-and-capture integration.
convert-to-2d converts sketch output to a raster image.
/api/sessions/fork-sketch creates sketch forks.
The Learn panel calls /api/images/[id]/explain to break a generated sketch down into its key algorithms from first principles.
gemini-3.5-flash is primary with GEMINI_CODE_MODEL override. gpt-5.4 is fallback with OPENAI_CODE_MODEL override.
A pencil button in code mode opens the same freehand pad — but here the drawing is sent to the code model as vision input, with instructions to reproduce the drawn form as code: vertices, beziers, transforms, joints. The sketch is design guidance, not an asset — it never ships in the artifact and is explicitly excluded from runtime reference injection. Sketch governs shape and structure; the text prompt governs behavior, color, and motion. One drawing surface, two destinations: pixels in image mode, geometry-written-as-code in code mode.
Every sketch's viewer carries four tabs — Preview, Gist, How it Works, Code. Gist distills the core algorithm for a beginner: real lines copied from the sketch (never invented pseudo-code) with short explanations, plus a plain-steps recipe view. How it Works is the deep layer: sections tagged math, computation, and aesthetics — aesthetic principles treated as a first-class explanatory dimension. Both generate once and cache per sketch; Refresh regenerates.
The render seed is derived from the sketch id, so a generative sketch reproduces identically on every view — across reloads, sessions, and devices. Sketches are stable artifacts, not slot machines. A New Seed button re-rolls for the session only.
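Deriving a stable seed from a sketch id can be done by hashing the id and feeding the result to a small seeded generator. The sketch below uses FNV-1a and a mulberry32-style generator. Both are swapped-in, well-known techniques chosen for illustration, since the reference does not name the hash or generator the studio uses.

```typescript
// FNV-1a over UTF-16 code units, returned as an unsigned 32-bit integer.
function hashId(id: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < id.length; i++) {
    h ^= id.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// mulberry32-style PRNG: same seed, same sequence of floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// A sketch seeded from its id replays identically on every view.
function rngForSketch(sketchId: string): () => number {
  return mulberry32(hashId(sketchId));
}
```

A session-only re-roll, like the New Seed button, could pass a fresh random seed to `mulberry32` without touching the id-derived default.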
Browser-side validation runs through Playwright Chromium via CODE_VALIDATION_BROWSER.
Sound
Sound is a shipped medium, not a planned one. Music, sound effects, and video-to-audio foley generate through ComfyUI; Tone.js is wired as a fourth creative-coding framework so audio can be authored alongside a visual sketch.
ACE-Step v1.5 turbo generates music through ComfyUI — 8 steps, cfg 1, ModelSamplingAuraFlow shift 3. Reached through /api/generate-audio with kind: 'music'.
Stable Audio Open 1.0 generates sound effects and ambient beds. The checkpoint ships without T5, so the workflow loads it via a separate CLIPLoader(type='stable_audio').
HunyuanVideo-Foley XXL generates audio from video. kind: 'foley' routes through comfyui-foley.ts, including a WebP-to-MP4 transcode pipeline.
/api/generate-audio is a single endpoint covering all three modalities, routing to workflows in comfyui-audio.ts and comfyui-foley.ts. Audio defaults to the local backend because an HPC round-trip exceeds generation time. Backend routing for all other modalities is covered in the Backends + routing section.
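Routing on `kind` inside a single endpoint can be sketched with a discriminated union. The `kind` values 'music' and 'foley' appear in this reference. The 'sfx' value, the request fields, and the mapping of music and SFX to comfyui-audio.ts are assumptions made for illustration.

```typescript
// Hypothetical request shapes for /api/generate-audio.
type AudioRequest =
  | { kind: "music"; prompt: string }
  | { kind: "sfx"; prompt: string }       // assumed kind value
  | { kind: "foley"; videoPath: string };

type AudioRoute = {
  module: "comfyui-audio.ts" | "comfyui-foley.ts";
  backend: "local"; // audio always pins to the local backend
};

function routeAudio(req: AudioRequest): AudioRoute {
  switch (req.kind) {
    case "music":
    case "sfx":
      return { module: "comfyui-audio.ts", backend: "local" };
    case "foley":
      return { module: "comfyui-foley.ts", backend: "local" };
  }
}
```

The union makes the compiler reject a foley request without a video, or a new `kind` that has no route yet.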
Tone.js is one of four creative-coding frameworks. An "Audio + visual" pair toggle injects Tone.js into a visual sketch so sound and image are authored in the same canvas.
AudioPlayer presents generated audio in the canvas. The MediaType model includes music, sfx, and speech alongside image, video, 3D, animation, code, app, and text.
Storyboards can ship with a generated ambient music bed that loops in Present mode. Volume slider and mute toggle live next to the panel controls. Generation reuses kind: 'music' with the storyboard title and outline as prompt context.
Storyboards
A storyboard is a sequence of panels — image, video, 3D, code, or text — assembled inside the same chat-driven canvas, with a scaffolded outline, real per-panel playback, an optional ambient soundtrack, and a Present mode for review and sharing.
- Sidebar tab
Storyboards live in their own sidebar tab (alongside Folders and Collections). The dropdown switcher above the tab list is the entry point on narrow widths.
- Brainstorm
Brainstorm scratchpad with Refine and Send to storyboard seeds an outline; AI fills the outline progressively via the update_outline tool and always offers a "your own direction" option.
- Outline → panels
The scaffolder writes one placeholder panel per outline beat and auto-navigates to the new storyboard. Generate-this on a placeholder replaces it with the real artifact for its declared media type.
- Panels in canvas
3D panels render textured and animated like the canvas itself (not as static frames). Video and code panels run their real previews. Any result can be sent to a storyboard from the viewer toolbar's collection picker.
- Captions
Panel captions are auto-written from the outline plus image descriptions via Gemini; saved with the panel.
- Soundtrack
An ambient music bed is generated inline (kind: 'music') from the storyboard title + outline. The track loops in Present mode and has a volume slider and a mute toggle.
- Present mode
Narrate toggle, auto-advance, and loop. A render-loop cancellation fix from the May rollout keeps narration + auto-advance from dying between panels.
- Scenes
Panels group into named scenes — add, rename, reorder, and per-panel scene assignment are all in the storyboard UI. Storyboard is also a collection mode (see Distribution).
Apps + classroom
The studio itself is the app surface: chat, canvas, collections, and classroom. The classroom layer adds authoring, cohorts, lesson sessions, AI grading, and exportable reports. The middle way carries into teaching: ij8 teaches AI literacy, computational thinking, design, and creative innovation by having students make — self-paced, with full instructor control and visibility. The full teaching reference is at classroom.ij8.ai; institutional pilots and partnerships are at pilots.ij8.ai.
- Scaffolding
course-scaffold.ts creates AI course scaffolds. /api/courses/syllabus/parse parses syllabi.
- Authoring
CourseAuthorWorkspace and the course wizard modal support course authors.
- Lessons
/api/lessons/sessions/[id]/next advances lesson sessions step by step. Lessons are self-paced: the mastery score is live feedback toward a visible goal, never a lock on the next lesson.
- Grading
lesson-scoring.ts uses gemini-3.1-flash-lite as the judge. The v2 algorithm splits weight by code-eligibility: code lessons earn engagement points across turns and apply / edit channels; non-code lessons (text, image, audio, 3D) reallocate apply-channel weight back to turns so a writing or sound lesson can reach the full 100. The judge sees the actual rendered artifact alongside the student's prompts, code, and explanations.
- Roles
Roles are admin, dev, teacher, student, and user. Cohorts and cohort memberships are part of the data model.
- Commons
The Commons course pool is shared across teachers. Course reports and CSV export are available.
- Shell
AppShell, Sidebar, LearnPanel, and LessonScoreHud integrate into the chat-canvas surface.
- App Lab
lib/ai/app-lab/ is the guided-prototype framework: context-pack.ts bundles the refined prompt, requirements, constraints, variables, and an output sample; prototype-generator.ts generates a single-file React-CDN HTML prototype (React 19 UMD + ReactDOM + Tailwind + Babel Standalone, hash-routing, localStorage, mock FakeAuthProvider); inspect.ts validates structure and Babel syntax; virtual-files.ts isolates context; detect-underspecification.ts flags incomplete briefs. A validation failure triggers up to two repair attempts. Prototype size has a soft cap of 750 KB and a hard cap of 1.5 MB. Prototypes are teaching-grade (readable and annotated), not production.
- Lesson media types
Lessons declare which media types are allowed: image, video, 3d, code, app, text, and audio (which subsumes music, SFX, and foley). Text-mode chat is inlined; artifacts save explicitly. When both text and other modes are allowed, the lesson defaults to non-text so the canvas isn't crowded out.
- Teacher controls
Teachers can override lesson sessions with Mark complete and Reopen. Restarting a lesson snapshots the prior outline as a tile in the new session so the student keeps a thread to what came before.
- Tutorials
Alongside lessons: 50 official AI-led tutorials, each teaching one tightly scoped concept on an Explain → Show → Play → Make arc, with the canvas as the blackboard — every idea illustrated by a live, runnable sketch that was vetted before any student sees it. Ungraded by design, open to every signed-in account, searchable by tag and difficulty. Teachers author their own tutorials with AI co-authoring and accept-to-freeze illustration review, and courses interleave lessons and tutorials in one ordered curriculum. The full teaching treatment is at classroom.ij8.ai.
- Student showcase
Students publish curated work to a public, teacher-curated class gallery (apps/showcase, served at showcase.ij8.ai), which is cohort-scoped and wallet-free, distinct from the NFT gallery. Submissions move student → teacher curation (approve, cohort↔public, scene) → public reader; approved code sketches render live via /api/sketches/[id]/wrapped.

Hybrid approaches
The platform is strongest when a work moves between modes. The following cross-pipeline flows are routine or directly implied by wired surfaces.
- Image to 3D mesh
Generate or import a 2D image, run the 3D prep prompt, extract shape, texture, and inspect in the model viewer.
- Mesh to rigged animation
Generate a mesh, classify morphology, author or infer a RigSpec, then bake continuous behavior in Blender.
- Sketch to image
Generate p5.js or GLSL, run it, capture the canvas frame, and store the result as an image.
- Image to mask to video
Select a region, create a mask, and send the image/mask pair into VACE or another video pipeline.
- Mask to inpaint variation
Paint a mask, generate multiple inpaint variants, and retain picker state across turns.
- Image to background removal to 3D
Batch remove background with rembg variants, then feed the cleaner silhouette to HunYuan3D.
- Animation frame to image edit to re-animation
Scrub to a frame, edit it as an image, then use the altered frame as a new animation source.
- Code to APNG
Run a sketch or shader, capture frames, and export a lightweight loop.
- Sketch to sound
Pair Tone.js into a visual sketch with the audio-visual toggle so sound is authored alongside the canvas rather than added afterward.
- Video to foley
Generate or scrub a video, then run HunyuanVideo-Foley to synthesize a matching audio track for it.
- Course session to minted collection
Use classroom-guided production to generate work, curate outputs, then publish through collections.
- Drawing to artifact
One freehand pad, two destinations: render the drawing as a finished image, or hand it to the code model as vision guidance and get the form back as live geometry in code.
- Still to regional motion
Mask a region of a finished image and animate only that region into a looping video; the rest stays the literal photograph.
- Views to solid mesh
Hand-pick or synthesize back/left/right views of a subject, then condition the 3D shape pass on all of them to kill the flat-back problem.
- Generative reserve to fulfill
An on-chain reservation triggers background image generation, IPFS pinning, and on-chain fulfillment.
- Image edit to gallery drop
Use conversational editing and background removal to refine a work, then place it inside a scheduled gallery release.
- Codegen collection
Fork a sketch, validate it in Chromium, mint or surface it as a codegen lane collection.


Distribution: gallery, collections, NFT
The distribution layer is live: artist pages, collection pages, scheduled drops, IPFS pinning, Ethereum contracts, royalties, blind reveals, regenerative fulfill, and gallery surfacing.
- Contracts
IJGCollection.sol (V1), IJGCollectionV2.sol (generative reservation), IJGCollectionV3.sol (regeneration + refund + cloneable), and IJGCollectionFactory.sol (EIP-1167 minimal-proxy clones).
- Chain + tests
Solidity 0.8.28, Cancun EVM, OpenZeppelin v5.4, 66 contract test cases across 3 suites (V1, V2, V3), Hardhat tooling. Sepolia V1 deploy: 0x92B10e1f7107D7A2436445B4842a8077d041151e, verified.
- Collection modes
open, closed, edition, generative, blind, codegen, storyboard, and showcase. Creative lanes: ai, code, hybrid. Storyboard and showcase are deliberately non-NFT modes: storyboard is a project workspace that hides all minting and contract UI; showcase carries per-item curation state (pending/approved, cohort/public) for teacher-curated class galleries.
- IPFS + fulfill
Pinata REST API handles IPFS pinning. Generative drops move from on-chain reserve to non-blocking image-gen fetch to IPFS pin to on-chain fulfill, with recovery and regen-status routes for orphans.
- Blind drops
Blind drops use the same Closed contract mode, with an API/UI redaction layer that reads tokenSold() on-chain and nulls filePath, title, and prompt for unsold items on a five-second cache.
- Regeneration (V3)
V3 collections expose regenerate(tokenId) bounded by a maxRegenerations counter. The full state machine is reserved → generating → pinning → fulfilling → fulfilled, with refundGenerative() closing the loop when a generation fails. Recovery and regen-status routes pick up orphaned reservations.
- Storyboard collections
The storyboard collection mode is a non-NFT project surface: panels render live in the studio canvas and in the class-showcase reader — 3D panels textured and animated, video and code panels running their real previews, an optional ambient soundtrack looping in Present mode. It deliberately hides minting and contract UI.
- Royalty + fee
ERC-2981 royalty is 7.5% on secondary sales. Platform fee is 10% on primary sales.
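The two percentages can be worked through in basis points, where 7.5% is 750 and 10% is 1000 out of 10000. The sketch below does the arithmetic off-chain on integer amounts. The helper names are hypothetical, and treating artist proceeds as price minus platform fee is an assumption, since on-chain math runs in Solidity over uint256 and any other deductions are not described here.

```typescript
const BPS_DENOMINATOR = 10000;
const ROYALTY_BPS = 750;        // 7.5% ERC-2981 royalty on secondary sales
const PLATFORM_FEE_BPS = 1000;  // 10% platform fee on primary sales

// Integer share of an amount, rounded down, for amounts in whole minor units.
function bpsOf(amount: number, bps: number): number {
  if (!Number.isSafeInteger(amount) || amount < 0) {
    throw new RangeError("amount must be a non-negative safe integer");
  }
  const product = amount * bps;
  if (!Number.isSafeInteger(product)) {
    throw new RangeError("amount too large for number arithmetic");
  }
  return Math.floor(product / BPS_DENOMINATOR);
}

// Assumption: the artist receives the price minus the platform fee.
function splitPrimarySale(price: number): { platformFee: number; artistProceeds: number } {
  const platformFee = bpsOf(price, PLATFORM_FEE_BPS);
  return { platformFee, artistProceeds: price - platformFee };
}

function secondaryRoyalty(salePrice: number): number {
  return bpsOf(salePrice, ROYALTY_BPS);
}
```

For a primary sale of 1,000,000 units the platform fee is 100,000 and the remainder is 900,000; for a secondary sale of 2,000,000 units the royalty is 150,000.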



Backends + routing
Generation is routed per request across four ComfyUI backends. For most users the choice is silent — they see generation time, not infrastructure. Users granted remote access get an explicit Compute backend toggle. Per-user and per-cohort opt-in gates HPC access; audio always pins to local.
- Local 4090
Studio host, no tunnel. Default for every user. Audio (music, SFX, foley) pins here unconditionally — remote round-trip exceeds generation time.
- SMU SuperPOD A100
Reached via an SSH -L tunnel from the studio host. Env: COMFYUI_URL_HPC. Opportunistic partition scheduling prefers the short partition (4-hour cap) over contended batch and auto-retargets pending jobs as queues shift. Cannot serve commercial / paid work; access is gated by cohort + user flag.
- RunPod
Secondary remote target for commercial / paid work and for capacity overflow. Env: COMFYUI_URL_RUNPOD or COMFYUI_RUNPOD_URLS.
- RunComfy API
Managed-template backend with bearer-token auth. Suited to workflows already curated by RunComfy (e.g. HunYuan3D 2.1). Cold-start latency on min_instances=0 templates is ≈ 8.5 minutes, so pre-warm before class. RunComfy also serves 2D generation now: Flux 2 Dev, Qwen-Image 2512, Flux 2 Klein, and Z-Image Turbo are mapped to serverless model endpoints.
- Routing
comfyui-backend.ts uses AsyncLocalStorage (a comfyBackendStore typed as ComfyBackendContext): the per-request preference is resolved at the route layer and inherited by every downstream getComfyUrl() call. No global state, no parameter threading through 20 callees.
- Liveness
Two-stage probe per backend: a 700ms TCP connect (detects whether an SSH tunnel is up at all), then a 3s HTTP GET /queue (confirms the server responds while leaving headroom for heavy generations). Results are cached for 15 seconds. Timeouts on the HTTP stage are treated as alive-but-slow so polling does not falsely demote a backend mid-generation.
- Access
Per-user users.hpcAccess and per-cohort cohorts.hpcAccess opt-in HPC routing. Preference field settings.preferredComfyBackend = 'local' | 'hpc' | 'runpod' | 'api' | 'auto'; 'auto' normalizes to local; 'api' selects the RunComfy managed-template path.
- Health
/api/health reports each backend independently: DB (SELECT 1), ComfyUI /queue probe, HunYuan 3D model presence, and API-key presence for Gemini / OpenAI. Latency captured per backend. Status: ok / degraded / down.
- Provider settings
/api/admin/providers exposes platform-level provider toggles and per-platform key overrides; platform_provider_settings + user_provider_settings tables back per-user keys (platform and per-user settings both have shipped UI; users can supply their own provider keys in the settings panel).
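The per-request routing pattern can be sketched with Node's AsyncLocalStorage. The names comfyBackendStore, ComfyBackendContext, and getComfyUrl come from this reference; the context fields, the URL table, and the handler are hypothetical, and the placeholder URLs stand in for values that would come from env such as COMFYUI_URL_HPC.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

type ComfyBackend = "local" | "hpc" | "runpod" | "api";
type ComfyPreference = ComfyBackend | "auto";
type ComfyBackendContext = { backend: ComfyBackend }; // assumed shape

const comfyBackendStore = new AsyncLocalStorage<ComfyBackendContext>();

// Placeholder URLs for illustration only.
const BACKEND_URLS: Record<ComfyBackend, string> = {
  local: "http://local.example.invalid",
  hpc: "http://hpc.example.invalid",
  runpod: "http://runpod.example.invalid",
  api: "http://api.example.invalid",
};

// 'auto' normalizes to local, as described in the Access entry.
function normalize(pref: ComfyPreference): ComfyBackend {
  return pref === "auto" ? "local" : pref;
}

// Reads the backend for the current request; defaults to local outside one.
function getComfyUrl(): string {
  const ctx = comfyBackendStore.getStore();
  return BACKEND_URLS[ctx?.backend ?? "local"];
}

// Stands in for a callee many layers down that never receives the preference.
async function deepCallee(): Promise<string> {
  await Promise.resolve();
  return getComfyUrl();
}

// Route layer: resolve the preference once, then run the request inside it.
function handleRequest(pref: ComfyPreference): Promise<string> {
  return comfyBackendStore.run({ backend: normalize(pref) }, () => deepCallee());
}
```

The store follows the async call chain, so concurrent requests with different preferences do not see each other's backend.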
Routing policy is intentionally policy-driven rather than availability-driven: paid / commercial requests must not be routed to SMU. Free / educational requests can fall back to local if HPC is unreachable.
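The policy can be expressed as a small pure function. Only two rules come from this reference: paid requests never reach SMU, and educational requests fall back to local when HPC is unreachable. Everything else in the sketch is an assumption, including the type names, sending paid requests that prefer HPC to RunPod, and sending users without HPC access to local.

```typescript
type Backend = "local" | "hpc" | "runpod";
type Tier = "paid" | "educational";

type RouteInput = {
  tier: Tier;
  preferred: Backend;
  hpcAccess: boolean; // user or cohort opt-in flag
  hpcAlive: boolean;  // result of the liveness probe
};

function chooseBackend(r: RouteInput): Backend {
  if (r.preferred !== "hpc") return r.preferred;
  // Policy first: paid work must not be routed to SMU, even if it is alive.
  if (r.tier === "paid") return "runpod"; // assumption: RunPod takes paid work
  if (!r.hpcAccess) return "local";       // assumption: no opt-in means local
  // Availability second: educational work may fall back to local.
  return r.hpcAlive ? "hpc" : "local";
}
```

Checking the tier before liveness is what makes the routing policy-driven: a healthy SMU backend still never receives a paid request.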
Roadmap
Roadmap items are marked by current state. Sound has shipped and has its own section; word remains a planned medium, not yet a shipped feature.
| State | Item | Notes |
|---|---|---|
| Shipping | Gallery and minting infrastructure | Artist pages, collection pages, scheduled drops, Ethereum L1 + Sepolia contract paths, ERC-2981 royalties, Pinata IPFS pinning. |
| Shipping / active | Image generation and editing | Gemini, OpenAI fallback, ComfyUI workflows, masking, inpaint, outpaint, background removal, crop, upscale, and region-level interactions. |
| Shipping / active | Sound | Music (ACE-Step v1.5 turbo), SFX and ambient (Stable Audio Open 1.0), video-to-audio foley (HunyuanVideo-Foley), and Tone.js as a creative-coding framework. See the Sound section. |
| Shipping | Learning surfaces | 50 official tutorials (Explain → Show → Play → Make), instructor-authored tutorials, Gist and How-it-Works explanation layers. See classroom.ij8.ai. |
| Shipped | 3D job cancellation | Ownership-checked cancel that interrupts the in-flight ComfyUI workflow; the fulfill worker checks between every stage; per-stage timeouts. |
| In progress | Cancellation everywhere | Video, animation, and generative-fulfill jobs still need the same treatment. |
| Next | Hot-wallet hygiene | Move toward KMS or signing-sidecar abstraction. |
| Next | Observability | Sentry rollout, structured logging, and real per-backend health checks. |
| Started | Test scaffolding | Vitest landed in the studio (11 unit-test suites: outpaint, gist, history-trim, render-policy, app-lab, access). CI gates remain lint + build; typecheck gate and rate limiting still ahead. |
| Planned | Word | Algorithmic poetry, generative text and word-art forms, computational poetics, and lettrist / concrete-poetry tooling, treated as a first-class medium alongside image and sound rather than as a chat afterthought. |
| Exploratory | Speech / TTS | The AudioMediaType enum already includes 'speech'; no implementation. Pending a deliberate choice of voice model and license terms. |
| Exploratory | Multi-chain support | Support beyond Ethereum. |
| Exploratory | Beyond-NFT distribution | Digital-painting fabrication, 3D printing, and game-asset pipelines. |
Open questions / what artists shape
Founding-cohort artists are invited to influence which roadmap items get prioritized and what bespoke workflows are developed around their practice. The relationship is closer to founding members than customers: the tools should be tested against real methods, not generic usage funnels.
Open questions include which hybrid flows matter most, where bespoke artist tools should become shared primitives, which distribution formats should sit beyond NFT, and how word should enter the same canvas — and how sound should deepen — without becoming ornamental add-ons. Drawing as a first-class input — one pad feeding both image and code — and storyboard authoring are the most recent shapes under active redesign; the questions there are about how much narrative scaffolding to bake in before it stops feeling like an open canvas.
Index / colophon
Complete enumeration of named models, backends, contracts, and infrastructure in this reference.
| Layer | Technology / model / path | Status / use |
|---|---|---|
| Image | gemini-3-pro-image-preview | Primary image generation and multi-turn editing. |
| Image | gpt-image-1.5 | OpenAI fallback for Gemini 503. |
| Chat orchestration | gemini-3.5-flash (GEMINI_CHAT_MODEL override); qwen3:14b (Ollama); minimax-m27 | Primary chat / agentic loop; local non-streaming alternate inside the agentic loop; Minimax HTTP alternate. |
| ComfyUI image | Flux 1, Flux 2, SD 3.5, Qwen-Image 2512, Flux 2 Klein, Z-Image Turbo (incl. GGUF-quantized), CyberRealistic, Playground 2.5, Kontext, SDXL style-transfer | Workflow library in comfyui.ts. |
| Edit | MaskEditor, MaskToolbar, BoundingBoxOverlay, rembg general / anime / precise / birefnet, convert-to-2d, sketch-to-image, draw-to-code (role-separated sketch refs) | Masking, inpaint, outpaint, crop, zoom-enhance, upscale, background removal, code raster. |
| Video | Wan 2.2 14B, Hailuo 2.3, Hailuo 2.3 Fast, VACE, cel-style video, gemini-animation.ts, hy-motion.ts, ffmpeg 6.1.1, VACE animate-region (SSE) | ComfyUI, Minimax API, masked video inpaint, animation, share-video export. |
| 3D | HunYuan3D 2.1, HunYuan3D 2.0, HunYuan3D-2mv multi-view, /api/3d-views/generate, 3D job cancel, build3DPrepPrompt, model-viewer | Dual-pass shape to texture, multi-view conditioning, prep-prompt, viewer, APNG export. |
| Rigging | Mixamo FBX, Blender 4.4.3, UniRig, scripts/classify_morphology.py, classify-morphology.ts, lib/rig/spec.ts, lib/rig/topology.ts, scripts/bake_rig.py, bake-rig.ts, /rig-test, /api/rig/auto-smoke | Retargeting, auto-rigging, morphology classification (radial, stalk, body-tail, pulse-mass), RigSpec authoring, Blender bake, SSE streamed long bakes. |
| Code | code-pipeline.ts, glsl-wrapper.ts, p5.js, three.js, GLSL, Tone.js, app framework, CodeViewer, GistPane, HowItWorksPane, deterministic per-id seed, DrawToCodePanel, /api/sessions/fork-sketch, /api/images/[id]/explain, gemini-3.5-flash, gpt-5.4, CODE_VALIDATION_BROWSER | Creative coding across four frameworks, run-and-capture, fork, sketch explanation, model fallback, browser validation. |
| Storyboards | scenes schema, collection_items.narrative, present mode, narrate toggle, auto-advance, ambient soundtrack | Multi-panel scene authoring; mixed-media panels; present-mode playback. |
| Brainstorm | brainstorm scratchpad + Refine + Send-to-storyboard, update_outline tool, progressive outline fill | Pre-scaffold ideation loop feeding storyboards. |
| App Lab | lib/ai/app-lab/context-pack.ts, prototype-generator.ts, inspect.ts, virtual-files.ts, detect-underspecification.ts, map.ts, shape.ts, stage-selection.ts | React-CDN guided prototype generation with validation + repair. |
| Sound | ACE-Step v1.5 turbo, Stable Audio Open 1.0, HunyuanVideo-Foley XXL, Tone.js, /api/generate-audio, comfyui-audio.ts, comfyui-foley.ts, AudioPlayer, SoundtrackBar | Music, SFX and ambient, video-to-audio foley, Tone.js creative-coding audio, storyboard soundtrack playback. |
| Classroom | course-scaffold.ts, /api/courses/syllabus/parse, CourseAuthorWorkspace, course wizard modal, /api/lessons/sessions/[id]/next, lesson-scoring.ts, gemini-3.1-flash-lite, AppShell, Sidebar, LearnPanel, LessonScoreHud | Course authoring, lesson progression, AI grading, roles, reports, Commons course pool. |
| Learning | 50 seeded tutorials (seed-tutorials.ts), illustration verify-refine agent + heartbeat curator, archetype widgets (mapping-diagram, step-builder, param-explorer, live-animated), tutorials / tutorial_illustrations / tutorial_sessions / course_tutorials tables | Explain→Show→Play→Make tutorial runtime + instructor authoring. |
| Backends | comfyui-backend.ts, comfyui-backend-config.ts, AsyncLocalStorage(comfyBackendContext), TCP+HTTP liveness, 15s cache, users.hpcAccess, cohorts.hpcAccess | Per-request routing across local 4090, SMU SuperPOD A100, RunPod, RunComfy. |
| Operations | /api/health (DB · ComfyUI · HunYuan3D · key-presence per backend), /api/admin/providers, platform_provider_settings, user_provider_settings, pino, Sentry stubs | Health + provider-key admin + observability scaffolding. |
| Contracts | IJGCollection.sol, IJGCollectionV2.sol, IJGCollectionV3.sol, IJGCollectionFactory.sol, Sepolia 0x92B10e1f7107D7A2436445B4842a8077d041151e | V1, generative reservation, regeneration + refund + cloneable, EIP-1167 factory. |
| Stack | Next.js 16.1.6, React 19.2.3, TypeScript 5 strict, Tailwind 4, Zustand 5, Drizzle ORM + better-sqlite3 SQLite (36 tables), Auth.js, Google OAuth, email magic link, DB whitelist, wagmi ^3, viem ^2, custom WalletButton, ComfyUI, Blender 4.4.3, ffmpeg 6.1.1, rembg, UniRig, ACE-Step v1.5 turbo, Stable Audio Open 1.0, HunyuanVideo-Foley XXL, Tone.js, storyboards (scenes schema, present mode, ambient soundtrack), App Lab framework (React 19 UMD, Tailwind, Babel Standalone), Gemini 3.5 Flash, Qwen3:14b (Ollama), Minimax M27, npm workspaces, Turborepo, Cortex-studio, PM2, Cloudflare tunnel, Pinata, Ethereum mainnet + Sepolia | Application and infrastructure stack. |
Self-hosted Inter. No tracking. No external scripts. Single-page static output for tooling.ij8.ai. Last rendered .
