I’ve been working as a software engineer for over 8 years at this point, and admittedly I’ve never understood how computers actually work. So I figured I’d try to learn how they work by emulating one. Sorry Ben Eater, I’m not going to build one just yet.
I spent hundreds of hours as a kid catching Pokémon, so the Game Boy was the perfect candidate: real hardware, relatively simple in scope, and something with a strong personal connection.
Instead of jumping straight into it, I first did From NAND to Tetris. It was a great course, and it made me really understand the fundamentals of computers, like registers, memory, and the ALU. Then to get used to building an emulator, I built a CHIP-8 emulator in F#: Fip-8.
A few months later, and after many nights of going to bed at 2 AM even though I told myself I’d only work on it for an hour or two, I have a working Game Boy emulator: Fame Boy. Complete with sound and runs on desktop and web.
I wanted to have the emulator work on both desktop and web, so I focused on having a simple interface between the emulator core and whatever frontend is running it.
The interface between the frontends and core is essentially just two arrays and two functions:
I tried to model Fame Boy in a similar way to the actual hardware of the Game Boy.
The CPU, like the real Sharp LR35902 in the Game Boy, knows nothing about the hardware except a memory map (and the IoController for interrupt signals only). It’s also the most F#-ish part of my codebase, leaning heavily into functional domain modelling.
Memory.fs holds most of the RAM used in the Game Boy, and acts as the memory map and bus between the CPU, IO Controller, and cartridge. It also shares a reference to the same VRAM and OAM RAM arrays with the PPU for performance.
IoController.fs emerged when I found myself adding too much logic to Memory.fs. While a singular IO controller doesn’t exist in the Game Boy hardware, handling all the hardware registers through it simplified and added safety to the interfaces for the individual components.
The stepper function in Emulator.fs is the glue
that brings the whole emulator together, composing all the components’
individual step functions:
While the real hardware components all run in parallel based on a central master oscillator, my emulator is single threaded and so the components have to run in sequence. The stepper function centralises the execution and ensures that all the components are synchronised.
Lastly, for the emulator to be playable it needs to run at the correct number of cycles per second, around 17500 CPU-cycles per 60 FPS frame. The frontends use the audio sampling rate to drive the emulator when the sound is on, and the frame rate to drive the emulator when it’s muted. More on that later.
First of all, I’d like to apologise to the functional programming
purists. While my CHIP-8 emulator is
completely pure (no mutable members and all arrays are
copied for none of that side-effect nonsense), Fame Boy uses mutability
liberally. The Game Boy runs a lot faster than the CHIP-8, and
copying 16+ kB of memory a million times every second didn’t seem like
the smart thing to do.
So, why F# for Fame Boy? Firstly, I think its extensive typing system works really well for modelling CPU instructions. Secondly, and more importantly, I just really like F#. I used to work primarily in F# at my previous company, and so I’m always looking for an excuse to keep on using it.
As an example why I think the CPU modelling works well in F#, I was following Gekkio’s Complete Technical Reference when implementing the CPU. I grouped the instructions like the reference, and ended up with something like this in Instructions.fs:
And it wasn’t just the load instructions. A lot of the other instructions shared similar concepts, like the location of the instruction’s operand:
Even though this is a small domain and most Game Boy devs know the opcodes/instructions basically as is, I felt like it could be neatened up. The code below shows the extraction of the location concept. The code uses different names and order from the source code to make the load instruction more readable for anyone not familiar with F#’s DU.
This helped reduce the CPU instructions down from 512 opcodes to just 58 instructions. Generalising a domain like this risks allowing invalid states, but using a good type system can avoid them.
For example, if I had used a location type, Loc, instead
of the From and To types, this instruction
would compile without any complaints:
Load(Loc.Direct D, Loc.Immediate) (storing a register to
the immediate value). The Game Boy’s hardware (its domain) doesn’t
support writing to the immediate value, so the domain would contain an
illegal state.
By using the F# type system to model the domain correctly, you get a guarantee that illegal states can’t be expressed in your system. You don’t necessarily need unit tests, it just won’t compile. So with simplified types Fame Boy still captures exactly what the Game Boy’s CPU supports and nothing more (with one cheeky exception).
Now the eagle-eyed Game Boy emulator devs would say to me “hey Nick, but what about the opcode 0x76?”, and I would reply “A monad is a monoid in the category of endofunctors” to show that I’m using a functional programming language and therefore smarter than them.
Joking aside though, it’s a compromise I decided on because I felt it
simplified the CPU domain a lot. If you look at the patterns that the
opcodes follow, 0x76 would be
Load(From.Indirect, To.Indirect), or load the 8-bit value
from the memory at location HL to the memory at location HL. My
emulator’s typing allows it, but the instruction doesn’t actually exist
on the Game Boy. Logically, it’s a NOP and not dangerous, and in
practice it’s unreachable since the opcode reader decodes
0x76 as HALT. But it’s a notable blemish in
what I think is otherwise a decent domain model.
Now you can do something similar in most languages, but if you’ve
worked with a functional language it’s hard to properly describe how
smooth it feels working with these types. After using a
match statement or Options in F#, going back to a
switch statement feels clunky and prone to mistakes. For
anyone who hasn’t worked with a functional programming language I’d
recommend you go out and try one.
Since the goal of this project was to learn about computer hardware rather than building the best emulator, I almost never looked at other emulators’ code in depth. However, while casually browsing the source code for CAMLBOY, I spotted lines like this:
I really liked that you could pass in however many flags you wanted, in any order. And since they are just named parameters for the method, the performance overhead is minimal.
But I couldn’t create something exactly like it because F# avoids method overloading and default parameters due to its type system supporting partial application. Instead I settled on something like this:
It never sat well with me, it required an array and a flag type
(e.g. Half). But I carried on anyway as I wanted to make
progress. As I got near the end, I spent a lot of time revisiting my old
code and refactoring, and wanted to try and improve the setFlags
function. So after a lot of mulling over and trying out other
approaches, I ended up with this (Cpu/State.fs L81):
The functions are exactly what I wanted. Effortlessly composable and testable, just simple pure functions. Chef’s kiss.
The previous function required hoisting the values into DU types and putting them in an array, and the setFlags function was more verbose as a result. Furthermore, because the functions are inline and don’t require any heap allocations, the new functions are actually much more performant, they increased the emulator’s FPS by about 10%.