The question behind UltimateDemo2026 was straightforward: how far can pure AI-assisted coding take you on genuinely uncharted technical ground? Not a tutorial platform with abundant examples, but a specific combination that sits well outside the mainstream -- the Ultimate 64's hardware extensions (64 MHz turbo, Ultimate Audio DMA, the UCI) combined with classic C64 demo effects, all compiled with the Oscar64 cross-compiler for 6502.
There was a secondary aim alongside the proof of concept: to build reusable, well-documented library code for the Ultimate 64's extended capabilities, usable in future projects. If the demo worked, the libraries and documentation that had to be constructed to build it would have value beyond the demo itself.
What came out of the project is a multi-scene demo running on real Ultimate 64 hardware: eight visual effects plus a hardware detection sequence, compiled to a 36 KB PRG with Oscar64. This post is an honest account of how that happened -- the techniques that worked, the bugs that didn't behave, and what the AI collaboration was actually like.

The startup screen probes each hardware requirement before the demo begins: UCI presence and firmware version, REU size (16 MB required), turbo mode via the $D031 register, Ultimate Audio version, and the load of the MOD music file from USB into REU. Any failure exits with an error message rather than running a broken demo.
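The check-then-exit flow can be sketched as a table of probes that stops at the first failure. This is a minimal illustration, not the demo's code: hw_check_t, run_hw_checks, probe_ok and probe_fail are hypothetical names, and the stub probes stand in for the real hardware queries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One startup requirement: a label for the error message and a probe. */
typedef struct {
    const char *label;
    bool (*probe)(void);
} hw_check_t;

/* Runs the probes in order and stops at the first failure.
   Returns the index of the failed check, or -1 if all passed. */
int run_hw_checks(const hw_check_t *checks, size_t count)
{
    size_t i;
    assert(checks != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (!checks[i].probe())
            return (int)i;
    }
    return -1;
}

/* Hypothetical stand-ins for the real hardware probes. */
bool probe_ok(void)   { return true; }
bool probe_fail(void) { return false; }
```

A caller would print checks[result].label and exit when the result is not -1, which matches the "error message rather than a broken demo" behavior described above.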
The Hardware: Ultimate 64 at 64 MHz
The Ultimate 64 (U64) is a modern FPGA-based reimplementation of the Commodore 64 mainboard, produced by Gideon Zweijtzer. Beyond being a faithful C64, it adds hardware that the original machine never had.
The key extensions this demo uses:
- Turbo mode: the U64 exposes CPU speed control through registers at $D030 and $D031. With firmware set to "U64 Turbo Registers," the CPU can be stepped from 1 MHz up to 64 MHz (on the Elite-II) in 16 speed increments by writing to $D031. The demo uses all 16 steps.
- 16 MB REU: the U64's built-in RAM Expansion Unit provides 16 MB of fast storage accessible via DMA. At 64 MHz the DMA transfer is nearly free -- this becomes important for the tunnel effect.
- Ultimate Audio: 7-channel DMA audio, with 8-bit PCM samples that can reside in REU or C64 RAM. The demo uses this to play a 4-channel ProTracker MOD file through the hardware mixer rather than the SID chip.
- UCI: the Ultimate Command Interface is the software API for the cartridge's USB filesystem, disk drive emulation, and hardware control. The demo uses UCI to load the MOD file from USB into REU at startup.
The most concrete illustration of what 64 MHz buys: the Mandelbrot fractal in scene 2 requires approximately 130 seconds to render at 1 MHz. At 64 MHz it completes in roughly 2 seconds. The same algorithm, the same C code, the same 6502 instruction stream -- just faster.
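As a rough sketch of the turbo control described in the list above, the helper below writes one of the 16 speed steps into a control register. It is an assumption, not taken from the demo's library, that the step occupies the low four bits of $D031 and that the upper bits should be preserved; turbo_set_step and turbo_get_step are hypothetical names. The register is passed as a pointer so the logic can be exercised off-target.

```c
#include <assert.h>
#include <stdint.h>

#define TURBO_STEPS 16u   /* the 16 speed steps mentioned in the text */

/* Writes a speed step (0 = slowest, 15 = fastest) into a turbo control
   register, keeping the upper four bits as they were.
   ASSUMPTION: the step lives in the low nibble of the register. */
void turbo_set_step(volatile uint8_t *ctrl, uint8_t step)
{
    assert(step < TURBO_STEPS);
    *ctrl = (uint8_t)((*ctrl & 0xF0u) | step);
}

/* Reads the current speed step back from the same register. */
uint8_t turbo_get_step(const volatile uint8_t *ctrl)
{
    return (uint8_t)(*ctrl & 0x0Fu);
}
```

On the target, ctrl would be (volatile uint8_t *)0xD031; the exact bit layout should be checked against the U64 turbo register documentation before relying on it.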
The Toolchain
Three tools formed the development loop:
Oscar64 is a C99/C++ cross-compiler for 6502 family processors, written by drmortalwombat. It produces compact, well-optimized machine code and supports inline assembly, interrupt qualifiers, fixed-point math via fixmath.h, and automatic compilation of dependencies via #pragma compile chains. It is not a widely documented platform -- online resources are limited to the compiler's own README and manual.
Claude Code is Anthropic's AI coding assistant, used here as the primary author for the C source. The working environment was WSL2 (Linux on Windows). Claude wrote, revised, and debugged the C code; the human role was to direct, test, and -- as described below -- intervene when things went wrong.
MCP server on the U64: an MCP (Model Context Protocol) server running in a local Docker container provides Claude Code with tools to interact directly with the U64 over Wi-Fi. The key tool for this project: ultimate_run_program, which deploys and launches a built PRG on the real hardware without any manual file transfer. The iteration cycle was: write C -> compile with Oscar64 -> run on hardware via MCP -> observe the result -> repeat. No emulator was involved; every test ran on the actual machine.
Building the Demo
Gears: Speed as Proof

The opening scene is a pair of meshing wireframe gears on a hires bitmap. The left gear (16 teeth) drives the right gear (8 teeth) at double speed in the opposite direction. The CPU speed ramps from 1 MHz to 64 MHz over 16 steps, one second per step, with a text band at the bottom showing the current speed. The gears visibly accelerate with each step. At 64 MHz they hold for 5 seconds before the scene ends.
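The meshing relation follows from the tooth counts: a 16-tooth driver turns an 8-tooth gear twice as far, in the opposite direction. A minimal sketch, assuming angles are stored as 256 units per full turn (an illustrative choice; driven_gear_angle is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/* Angle of the driven gear for a given driver angle.
   Angles use 256 units per full turn, so uint8_t wraps naturally.
   The result is negated because meshing gears counter-rotate. */
uint8_t driven_gear_angle(uint8_t driver_angle,
                          uint8_t driver_teeth,
                          uint8_t driven_teeth)
{
    unsigned turned;
    assert(driven_teeth != 0);
    turned = (unsigned)driver_angle * driver_teeth / driven_teeth;
    return (uint8_t)(0u - turned);
}
```

A real renderer would also add a constant phase offset so the teeth of one gear fall into the gaps of the other; that is left out here.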
The key technique is XOR animation. Drawing a line or circle in XOR mode and then drawing at exactly the same coordinates again erases it -- no bitmap clear required. Each frame: erase the previous gear positions by redrawing them (XOR), advance the angles, draw at the new positions. The Oscar64 bitmap library supports XOR drawing natively.
Early versions used memset to clear the bitmap between frames. At high turbo speeds the CPU writes to the bitmap while the VIC-II is actively reading it to generate the display signal, producing visible glitches -- the right gear would disappear. XOR animation eliminates the clear and eliminates the race.
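The erase-by-redraw property is easy to demonstrate on a plain byte array laid out like a C64 hires bitmap (8x8 cells, 8 bytes per cell). The sketch below is portable C written for illustration; it is not the Oscar64 bitmap library, and xor_pixel and xor_line are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BMP_W     320
#define BMP_H     200
#define BMP_BYTES 8000

/* Flip one pixel in a C64-style hires bitmap (8x8 cells, 8 bytes each). */
void xor_pixel(uint8_t *bmp, int x, int y)
{
    if (x < 0 || x >= BMP_W || y < 0 || y >= BMP_H)
        return;                                   /* clip off-screen */
    bmp[(y >> 3) * 320 + (x >> 3) * 8 + (y & 7)] ^= (uint8_t)(0x80 >> (x & 7));
}

/* Bresenham line that flips each pixel on its path exactly once,
   so drawing the same line twice restores the bitmap. */
void xor_line(uint8_t *bmp, int x0, int y0, int x1, int y1)
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;

    assert(bmp != NULL);
    for (;;) {
        int e2;
        xor_pixel(bmp, x0, y0);
        if (x0 == x1 && y0 == y1)
            break;
        e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
}
```

One visible side effect of XOR drawing: where two lines cross, the shared pixel is flipped twice and shows as a gap. Redrawing the same set of lines still restores the bitmap exactly.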
Mandelbrot: The 64 MHz Payoff

Scene 2 renders the Mandelbrot set in multicolor bitmap mode (160x200 pixels, 32 iterations maximum), zoomed into the seahorse valley region. The colorization uses a 4x4 palette matrix keyed to the cell's content type (interior vs exterior pixel ratio) and its diagonal zone (a gentle gradient across the frame).
The demo is compiled with -dNOFLOAT to save code space, so all math uses fixed-point arithmetic via Oscar64's fixmath.h library -- 4.12 format, where 4096 represents 1.0. The key routines (lsqr4f12s, lmul4f12s) are native 6502 assembly and run approximately 8 times faster than equivalent C at the same clock frequency. At 64 MHz the render completes in roughly 2 seconds. The 130-second equivalent at 1 MHz is the concrete argument for why turbo mode matters.
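To make the 4.12 arithmetic concrete, here is a portable sketch of the escape-time loop. It deliberately does not call fixmath.h: mul_4f12 is a plain C stand-in written for this example (it truncates toward zero, which may differ from the assembly routines' rounding), and mandel_iter is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t fix4_12;      /* 4 integer bits, 12 fraction bits */
#define FIX_ONE 4096          /* 1.0 in 4.12 format */

/* Portable stand-in for a 4.12 multiply; widened to 32 bits so the
   intermediate product cannot overflow. */
static int32_t mul_4f12(fix4_12 a, fix4_12 b)
{
    return ((int32_t)a * (int32_t)b) / FIX_ONE;
}

/* Escape-time iteration count for c = cx + i*cy.
   cx and cy are expected to lie in [-2.0, 2.0] so that z stays
   inside the 4.12 range between escape checks. */
uint8_t mandel_iter(fix4_12 cx, fix4_12 cy, uint8_t max_iter)
{
    fix4_12 x = 0, y = 0;
    uint8_t i;

    assert(max_iter > 0);
    for (i = 0; i < max_iter; i++) {
        int32_t x2 = mul_4f12(x, x);
        int32_t y2 = mul_4f12(y, y);
        int32_t xy;
        if (x2 + y2 > 4 * FIX_ONE)      /* |z|^2 > 4: escaped */
            break;
        xy = mul_4f12(x, y);
        y = (fix4_12)(2 * xy + cy);     /* imag: 2xy + cy */
        x = (fix4_12)(x2 - y2 + cx);    /* real: x^2 - y^2 + cx */
    }
    return i;                           /* max_iter means "interior" */
}
```

With 32 as max_iter, a return value of 32 marks an interior pixel and anything lower is an exterior pixel whose count can feed the palette selection.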
Ball: When Geometry Goes Wrong
Scene 3 is an Amiga-style shaded 3D ball bouncing on a rotating wireframe floor grid. The ball is drawn in bitmap mode with three concentric Bresenham rings at different radii (28, 19, 10 pixels) and fill patterns (dark, mid, bright) giving a basic radial gradient. An elliptical shadow sits on the grid plane below.
The floor is a 5x5 point grid with Y-axis rotation using a 64-entry integer sine table, with perspective projection.
This scene produced the longest debugging session of the project. The geometry for the floor normals and the near-plane clipping was wrong in ways that caused visible glitches and occasional crashes -- the grid lines would collapse or fly off to impossible screen positions. Claude's approach was to adjust coefficients and try again, then adjust them again. The adjustments were not wrong, exactly, but each iteration addressed a symptom rather than the underlying coordinate system error. Several sessions were spent in this loop.
The resolution came when the approach changed: rather than asking Claude to fix the geometry, the assembly output (build/udemo2026.asm) was read to understand what the compiler had actually emitted, and the coordinate transformation sequence was traced by hand. Once the root cause was identified -- a sign error in the Y-axis rotation matrix applied before perspective -- the fix was a single line change. Claude implemented it immediately and correctly.
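For reference, one common sign convention for a Y-axis rotation followed by a perspective divide looks like the sketch below. It illustrates the kind of transform involved and where a flipped sign would sit; it is not the demo's code. The quarter-wave table values, the 127 scale, and the names sin64, cos64, rotate_y and project are choices made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Quarter wave of sin(), scaled to 127, for a 64-step circle. */
static const int8_t quarter[17] = {
    0, 12, 25, 37, 49, 60, 71, 81, 90, 98, 106, 112, 117, 122, 125, 126, 127
};

int8_t sin64(uint8_t a)
{
    a &= 63;
    if (a <= 16) return quarter[a];
    if (a <= 32) return quarter[32 - a];
    if (a <= 48) return (int8_t)-quarter[a - 32];
    return (int8_t)-quarter[64 - a];
}

int8_t cos64(uint8_t a) { return sin64((uint8_t)(a + 16)); }

typedef struct { int16_t x, z; } point_xz;

/* Rotate about the Y axis. The two sine terms carry opposite signs;
   giving them the same sign shears the grid instead of rotating it. */
point_xz rotate_y(point_xz p, uint8_t a)
{
    point_xz r;
    int16_t s = sin64(a), c = cos64(a);
    r.x = (int16_t)(((int32_t)p.x * c + (int32_t)p.z * s) / 127);
    r.z = (int16_t)(((int32_t)p.z * c - (int32_t)p.x * s) / 127);
    return r;
}

/* Perspective divide, applied after rotation. */
int16_t project(int16_t coord, int16_t z, int16_t focal, int16_t cam_dist)
{
    int16_t depth = (int16_t)(z + cam_dist);
    assert(depth > 0);      /* near-plane clipping must come first */
    return (int16_t)((int32_t)coord * focal / depth);
}
```

Checking a quarter turn by hand, as the tests for this sketch do, is a cheap way to catch a sign error before it reaches the projection stage.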
Tunnel: REU as a Data Bus

Scene 6 is a texture-mapped 3D tunnel: per-pixel polar coordinates (angle and distance from center) mapped through a repeating brick texture, with frame-by-frame animation of the tunnel depth and a lateral camera sway via a 64-step sine table.
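The per-pixel lookup reduces to adding the animation offsets to the stored angle and distance and using the sums as texture coordinates. The sketch below assumes a procedural 16x16 brick pattern in place of the demo's texture data; brick_texel and tunnel_pixel are hypothetical names, and t_ang and t_dist are the per-frame offsets.

```c
#include <assert.h>
#include <stdint.h>

/* Procedural 16x16 brick pattern: 0 = mortar, 1 = brick.
   Alternate courses are shifted by half a brick. */
uint8_t brick_texel(uint8_t u, uint8_t v)
{
    uint8_t shift;
    u &= 15;
    v &= 15;
    if ((v & 7) == 0)
        return 0;                       /* horizontal mortar line */
    shift = (v & 8) ? 4 : 0;
    if (((u + shift) & 7) == 0)
        return 0;                       /* vertical mortar line */
    return 1;
}

/* Texture value for one pixel: the stored polar coordinates plus the
   per-frame animation offsets, wrapped by the uint8_t arithmetic. */
uint8_t tunnel_pixel(uint8_t angle, uint8_t dist,
                     uint8_t t_ang, uint8_t t_dist)
{
    return brick_texel((uint8_t)(angle + t_ang), (uint8_t)(dist + t_dist));
}
```

Advancing t_dist moves the camera along the tunnel and advancing t_ang spins it; neither requires touching the lookup tables.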
The lookup tables -- per-pixel angle (100 rows x 80 bytes) and per-pixel distance (100 rows x 80 bytes), roughly 16 KB combined -- exceeded the available BSS budget in the main code region ($0A00-$C000, essentially full by this point). The solution was to store both tables in REU at a fixed offset ($200000), compute each row into a 160-byte buffer during init and DMA-store it, then DMA-fetch one row at a time during rendering. At 64 MHz the DMA transfer costs a few microseconds per row -- effectively free.
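The row staging scheme can be modeled on a host with memcpy standing in for the REU DMA. This is a sketch under stated assumptions: each 160-byte row is taken to hold 80 angle bytes followed by 80 distance bytes, reu_model replaces the REU region, and reu_store, reu_fetch, row_offset and the *_stub table generators are hypothetical names with placeholder contents.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ROWS       100
#define HALF_ROW   80                 /* bytes per table per row (assumed) */
#define ROW_BYTES  (2 * HALF_ROW)     /* angle bytes, then distance bytes */

/* Host-side model of the REU region that would start at $200000. */
static uint8_t reu_model[ROWS * ROW_BYTES];

void reu_store(uint32_t offset, const uint8_t *src, uint16_t len)
{
    assert(offset + len <= sizeof reu_model);
    memcpy(&reu_model[offset], src, len);
}

void reu_fetch(uint32_t offset, uint8_t *dst, uint16_t len)
{
    assert(offset + len <= sizeof reu_model);
    memcpy(dst, &reu_model[offset], len);
}

uint32_t row_offset(uint8_t row) { return (uint32_t)row * ROW_BYTES; }

/* Placeholder table contents; the real values come from polar maths. */
static uint8_t angle_stub(uint8_t row, uint8_t col) { return (uint8_t)(row + col); }
static uint8_t dist_stub(uint8_t row, uint8_t col)  { return (uint8_t)(row ^ col); }

/* Init: build one row at a time in a small buffer and push it out,
   so main RAM never holds more than 160 bytes of table data. */
void tunnel_tables_init(void)
{
    uint8_t buf[ROW_BYTES];
    uint8_t row, col;
    for (row = 0; row < ROWS; row++) {
        for (col = 0; col < HALF_ROW; col++) {
            buf[col] = angle_stub(row, col);
            buf[HALF_ROW + col] = dist_stub(row, col);
        }
        reu_store(row_offset(row), buf, ROW_BYTES);
    }
}
```

During rendering, a single reu_fetch(row_offset(row), line, ROW_BYTES) per screen row brings both tables in at once.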
The first hardware test came back with feedback: the tunnel moved too fast, and lacked lateral movement. This was a genuine U64-specific constraint that Claude had no way to evaluate from code alone -- only real hardware at real speed reveals it. The fix was to reduce the frame advance rates (t_ang+=1, t_dist+=1 instead of +=3/+=2) and add the lateral sine sway. The combined 160-byte fetch per row (replacing two 80-byte fetches) was added at the same time, halving the DMA call count.
The second hardware test: confirmed working.
Scroller: The Low Point

The final scene before the end screen is a hardware fine-scroll sinus scroller: text scrolls left using the $D016 pixel-scroll register, each character displaced vertically by a sine table entry, producing a wave through the message. The background is a plasma effect. ProTracker music plays throughout.
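The horizontal part of the technique is a small state machine: count the fine-scroll value down from 7 to 0 one pixel at a time, then shift the text row one character left and start again at 7. The sketch below models that logic in portable C. scroller_t, scroller_init and scroller_step are hypothetical names; the only hardware fact used is that the low three bits of $D016 hold the horizontal scroll value. The vertical sine displacement is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define COLS 40

typedef struct {
    uint8_t fine;          /* XSCROLL value, counts 7 down to 0 */
    uint8_t row[COLS];     /* characters currently on the scroll line */
    const char *msg;       /* message text, must not be empty */
    uint16_t pos;          /* index of the next character to bring in */
} scroller_t;

void scroller_init(scroller_t *s, const char *msg)
{
    assert(msg != NULL && msg[0] != '\0');
    s->fine = 7;
    memset(s->row, ' ', COLS);
    s->msg = msg;
    s->pos = 0;
}

/* Advance one pixel to the left.
   Returns the value to write to $D016: upper bits kept, low 3 bits
   replaced by the new fine-scroll value. */
uint8_t scroller_step(scroller_t *s, uint8_t d016)
{
    if (s->fine == 0) {
        /* Coarse step: shift the row and append the next character. */
        memmove(s->row, s->row + 1, COLS - 1);
        s->row[COLS - 1] = (uint8_t)s->msg[s->pos++];
        if (s->msg[s->pos] == '\0')
            s->pos = 0;                 /* wrap the message */
        s->fine = 7;
    } else {
        s->fine--;
    }
    return (uint8_t)((d016 & 0xF8u) | s->fine);
}
```

On hardware the coarse shift and the register write need to land outside the visible part of the scroll line, which is why this effect is usually tied to a raster position.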
On paper it is the most conventional of the scenes -- fine-scroll is a well-understood C64 technique. In practice, this scene consumed more time than any other.
The implementation was close to working. The scroll moved, the sine wave was visible, but the display was corrupted every few seconds -- the screen would partially clear or characters would appear in the wrong positions. Claude made a series of changes: adjusted the fine-scroll timing, changed the plasma phase counters, restructured the frame loop. Each change produced a different manifestation of the corruption, but none fixed it. A second session continued in the same direction.
At some point during this, the working state was not committed. A change that looked minor -- reorganizing the scroller's local variable layout -- broke everything in a way that was not obviously connected to the corruption bug. Claude attempted to fix this too, and in doing so introduced further divergence from the last known-good state. By the start of the third session, rate limits had been reached and the scroller code was further from working than it had been two sessions earlier.
The decision: discard everything back to the last committed state and start the scroller from scratch.
In the fresh implementation, the root cause was identified before writing any code. The modplay IRQ handler -- which fires on every CIA1 timer tick during music playback -- was not saving and restoring zero page locations $52-$57. Oscar64 uses those locations for the scroller's local variable storage. Every music tick overwrote the scroller's working state mid-frame. The corruption was not in the scroller logic at all.
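The failure mode and its fix can be modeled on a host with an array standing in for zero page. This is an illustration only: music_tick is a stand-in that scribbles on the shared locations, and music_irq shows the save/restore wrapper. On the real machine the equivalent would live in the IRQ entry and exit code, and the function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ZP_FIRST 0x52     /* first shared zero page location */
#define ZP_COUNT 6        /* $52 through $57 */

/* Stand-in for the player tick: uses the shared locations as scratch,
   which is what corrupts the interrupted code's state. */
void music_tick(uint8_t *zp)
{
    assert(zp != NULL);
    memset(&zp[ZP_FIRST], 0xFF, ZP_COUNT);
}

/* IRQ wrapper that preserves the shared locations around the tick. */
void music_irq(uint8_t *zp)
{
    uint8_t saved[ZP_COUNT];
    assert(zp != NULL);
    memcpy(saved, &zp[ZP_FIRST], ZP_COUNT);     /* save on entry */
    music_tick(zp);
    memcpy(&zp[ZP_FIRST], saved, ZP_COUNT);     /* restore on exit */
}
```

Calling music_tick directly reproduces the clobber, while going through music_irq leaves the interrupted code's values intact.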
The first step in the fresh implementation was to call modplay_stop() before scroller_run(), taking the IRQ out of the picture entirely so the scroller could be confirmed working in isolation. It worked on the first hardware test. With that confirmed, the music was brought back in -- and ran through the scroller with no issues. In the final demo, music plays continuously from the Mandelbrot scene through to the end screen, where it stops.
The days spent in the dead-end branch were lost. The scroller that ships is the one written from a clean state with a diagnosed root cause.
Working with AI: What Actually Happened
Five things became clear across these sessions that are worth stating explicitly for anyone considering a similar approach.
AI assumes everything you ask is achievable. There is no signal from the model when a request is beyond what it can reliably do on unfamiliar ground. It will attempt the implementation, and the attempt will look plausible, and it may not work. Oscar64 and the Ultimate 64 hardware are not well-represented in training data -- Claude would construct approaches from general C and 6502 principles that were close but subtly wrong in hardware-specific ways. The absence of pushback is not confirmation that the approach is correct.
When it goes in circles, you have to force the pivot. The ball scene geometry debugging is the clearest example. Claude was iterating on coefficient adjustments -- not wrong as a strategy, but operating at the wrong level. The right intervention was to stop that loop, specify a different approach (read the assembly output, trace the coordinate transform by hand), and re-engage Claude on the concrete fix once the root cause was known. Left alone, the loop showed no sign of converging.
You sometimes have to diagnose the root cause yourself. Three of the five significant bugs in this project (the petscii.h charmap killing MOD detection, the ZP clobber in the scroller, the mmap_trampoline omission before MMAP_NO_ROM) were resolved only after the human identified what the root cause actually was. Claude was addressing symptoms in each case. Handing Claude a specific diagnosis ("zero page $52-$57 is being overwritten by the modplay IRQ") produced an immediate and correct fix. Asking Claude to fix "the scroller corruption" did not.
Unknown platform means token burn. Oscar64 is a capable compiler but it is not documented at the level of cc65 or SDCC. U64 hardware registers are documented, but the interactions between them -- the mmap_trampoline requirement, the CIA1 IRQ chain behavior, the precise sequence needed to set up Ultimate Audio -- are not described in any single accessible source. Claude re-derived this information repeatedly, made guesses that had to be tested on hardware, and explored approaches that turned out to be wrong. Building up the project memory and documentation as the work progressed was the mitigation -- once something was confirmed working on hardware, it was recorded so it would not be re-litigated in the next session.
Always commit a working state. The scroller story is the full illustration of what happens when you do not. The half-working branch accumulated changes, broke, accumulated more changes to fix the break, and eventually diverged far enough from any recoverable point that the only option was to discard it entirely. The cost was days of work. The rule that came out of it is absolute: before any session, commit what is working. Before any non-trivial change, commit what is working. The AI can break things in ways that are not immediately obvious, and "undo the last change" does not always restore the previous state.
Verdict
A working multi-scene C64 demo targeting specific Ultimate 64 hardware capabilities, compiled with Oscar64, was built predominantly via AI collaboration across roughly three coding sessions. That is the proof of concept result.
The qualification is equally important: the human role was real throughout. Knowing when to stop an approach that is not converging, identifying root causes that the AI could not find, providing hardware feedback from the real machine, making architectural decisions about memory layout and scene sequencing -- none of this happened automatically. The AI wrote the code; the human directed, tested, and intervened.
The secondary aim was also met. The libraries built for this project -- turbo speed control, Ultimate Audio DMA, ProTracker MOD playback from REU via CIA IRQ, and UCI file I/O -- are documented and available in the repository. The documentation needed to build them had to be assembled from primary sources during development (firmware documentation, the UCI library source, ModPlayer_16k analysis). It now exists in a form that future projects can use directly.
The honest bottom line: the demo is something that I did not yet have the skills to build on my own. AI collaboration made it possible. The limitations described above are real, and they require active management -- but within those limits, the result speaks for itself.
The demo is available for download under the Files section of this site, and on the GitHub repository linked in the credits below.
Credits
Code: Xander Mol
Music: 4ev.mod ("Forever Young") -- artist unknown
UCI/DOS library: Scott Hutter and Francesco Sblendorio -- github.com/xlar54/ultimateii-dos-lib
MOD player: based on ModPlayer_16k by 6510nl / Freshness -- github.com/6510nl/48MHz
PETSCII font: Small Round PETSCII Font by Cupid
Compiler: Oscar64 by drmortalwombat -- github.com/drmortalwombat/oscar64
Ultimate 64 hardware: designed by Gideon Zweijtzer / Gideon's Logic -- ultimate64.com
Licensed under the GNU General Public License v3.0.
Sources and More Information
UltimateDemo2026 repository -- source code, libraries, build instructions, and all documentation manuals: github.com/xahmol/UltimateDemo2026
Turbo speed control library manual -- register reference, detection, speed API, and usage examples for U64 turbo mode: TURBOCONTROLMANUAL.md
Ultimate Audio DMA and MOD player manual -- 7-channel PCM DMA hardware reference, ProTracker MOD player API, IRQ chain setup, and REU integration: ULTIMATEAUDIOMANUAL.md
UCI library manual -- Ultimate Command Interface protocol, file I/O, directory navigation, disk mounting, and REU transfer API: UCILIBMANUAL.md
Oscar64 compiler -- C99/C++ cross-compiler for 6502, by drmortalwombat: github.com/drmortalwombat/oscar64
Oscar64 tutorial series -- worked examples covering bitmap drawing, XOR animation, sprites, raster effects, and more: github.com/drmortalwombat/OscarTutorials
Ultimate II+ Command Library -- UCI/DOS interface library by Scott Hutter and Francesco Sblendorio: github.com/xlar54/ultimateii-dos-lib
ModPlayer_16k -- ProTracker MOD player for C64 by 6510nl, the base for this project's modplay library: github.com/6510nl/48MHz
Ultimate 64 firmware documentation -- firmware releases and update instructions: ultimate64.com/Firmware
Ultimate 64 turbo mode documentation -- register reference for $D030/$D031: 1541u-documentation.readthedocs.io -- turbo mode
Ultimate 64 documentation project -- community-maintained reference for U64 and UII+ features: 1541u-documentation.readthedocs.io
"An Introduction to Programming C-64 Demos" by Puterman (Linus Akerlund), Public Domain -- a thorough reference covering raster IRQ, bitmap modes, effects, and optimization for demo coding: antimon.org/code/Linus -- also available as a reStructuredText conversion on GitHub: github.com/pstankiewicz/introduction-c64-demo-programming
C64 hardware and platform references:
- Ultimate Commodore 64 Reference by Michael Steil -- memory map, KERNAL API, and ROM disassembly in machine-readable form: pagetable.com/c64ref (source: github.com/mist64/c64ref)
- Commodore 64 Programmer's Reference Guide (Commodore, 1982) -- the original hardware and BASIC reference manual: archive.org/details/c64-programmer-ref
- C64 intern (Data Becker) -- in-depth hardware internals reference; no stable online edition, available via retro computing libraries and archives
- C64-Wiki -- community encyclopedia covering hardware, software, and programming: c64-wiki.com
- Codebase64 -- programming articles, source code, and hardware documentation contributed by the C64 community: codebase64.net