Hi! To share more information about the SA-1 enhancement chip, what to expect from it in the coming years and what the co-processor makes possible, I'm writing a small series of articles about the current state of SA-1 ROM hacking. If you have any suggestions for this series, the comments section is open to everyone!
The Super Accelerator One ("SA-1") chip is one of the enhancement chips included in some SNES games. It is currently used in a lot of ROM hacks because of its 10.74 MHz CPU and its architecture (65c816), which is similar to the SNES CPU's. It is one of the most effective ways to reduce game slowdown and to write advanced effects and interactions, allowing the SNES PPU to be fully exploited by writing real-time HDMA code or using the full potential of the SNES OAM (sprites).

Another interesting point is that the chip allows for a quicker development cycle, since it lets you code more flexible solutions instead of hard-coding everything and restricting creativity to fit the SNES hardware. This is a common practice in the industry for reducing development time and delivering 'value' faster. It is also one of the reasons why some SNES games were released with the SA-1 chip even though it felt like the chip did absolutely nothing "special": the main reason for using it was to let the developers produce more in less time.
Speed and memory capabilities:
The SA-1 CPU has a base speed of 10.74 MHz.
It includes 2 KB of internal RAM (I-RAM), which runs at 10.74 MHz.
The backup RAM (or extra RAM) is called BW-RAM and can be up to 256 KB. It runs at 5.37 MHz.
The memory map is neither LoROM nor HiROM. The SA-1 ROM map includes both layouts, with different implications for the $00-$3F and $80-$FF regions. The Super MMC lets you set up different configurations depending on what you want to achieve. You can map up to 8 MB of ROM using the SA-1 without additional circuitry.
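To make the 8 MB figure concrete: the Super MMC splits ROM into 1 MB blocks and lets you choose which block appears in each group of HiROM-style banks ($C0-$CF, $D0-$DF, $E0-$EF and $F0-$FF, selected by the CXB, DXB, EXB and FXB registers). Three bits per group give 8 blocks, hence 8 MB. The sketch below models only those banks; the LoROM-style $00-$3F/$80-$BF areas are left out, and the function name is hypothetical.

```python
# Hypothetical helper, not part of any SA-1 toolchain: models how the
# Super MMC maps the HiROM-style banks $C0-$FF onto 1 MB ROM blocks.
MB = 0x100000

def sa1_rom_offset(addr: int, cxb: int, dxb: int, exb: int, fxb: int) -> int:
    """Translate a 24-bit SNES address in banks $C0-$FF to a ROM file offset.

    cxb..fxb are the 1 MB block numbers (0-7) selected for the
    $C0-$CF, $D0-$DF, $E0-$EF and $F0-$FF bank groups respectively.
    """
    bank = (addr >> 16) & 0xFF
    if bank < 0xC0:
        raise ValueError("only banks $C0-$FF are modelled here")
    blocks = (cxb, dxb, exb, fxb)
    # Each group of 16 banks shows one 1 MB block; 3 bits -> 8 blocks -> 8 MB.
    block = blocks[(bank - 0xC0) >> 4] & 0x07
    return block * MB + ((bank & 0x0F) << 16) + (addr & 0xFFFF)
```

For example, with blocks 0, 1, 2 and 7 selected, $D0:8000 lands at offset $108000 and $FF:FFFF at $7FFFFF, the last byte of an 8 MB ROM.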

Multiprocessing:
The SA-1 is an additional CPU that runs at the same time as the SNES CPU. This means you can have one routine running on the SNES CPU and another on the SA-1 CPU. The routines can run independently or interact with each other, which has different implications when it comes to synchronization.

The SA-1 memory controller can handle simultaneous access from both the SNES and the SA-1 CPU. Because it has full control of the SA-1 CPU, when the SNES attempts to read or write the same device (ROM, BW-RAM or I-RAM), the memory controller pauses the SA-1 chip (i.e. sets the CPU enable flag to 'false' on the current clock cycle) so that the SA-1 waits one more cycle before doing its read or write on that device. That way, you can have code that frequently accesses ROM or RAM without risking data integrity and without relying on manual handshakes or semaphores, which drastically reduce the performance of one of the chips.
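A toy cycle model of that arbitration, written in Python for illustration: one access per cycle, the SNES side always wins a conflict, and the SA-1 retries on the next cycle. Real timing is more involved (BW-RAM runs at 5.37 MHz, for example), so the device names and the one-access-per-cycle rule are simplifying assumptions, and `schedule_sa1` is a hypothetical name.

```python
from typing import List, Optional

def schedule_sa1(snes: List[Optional[str]], sa1: List[str]) -> List[int]:
    """Return the cycle at which each SA-1 access completes.

    snes[c] is the device the SNES CPU touches on cycle c (or None);
    sa1 is the ordered list of devices the SA-1 wants to touch.
    The SNES always wins a conflict; the SA-1 is held for that cycle.
    """
    done = []
    cycle = 0
    for device in sa1:
        # Stall while the SNES is using the same device on this cycle.
        while cycle < len(snes) and snes[cycle] == device:
            cycle += 1
        done.append(cycle)
        cycle += 1
    return done

# The SA-1 loses cycle 0 (ROM) and cycles 2-3 (I-RAM), but no access is lost.
print(schedule_sa1(["ROM", None, "IRAM", "IRAM"], ["ROM", "IRAM", "BWRAM"]))
```

The point of the model is that neither side needs a lock: the controller only delays the SA-1, and every access still completes in order.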

Hardware capabilities:
A custom DMA circuit capable of transferring data between ROM, I-RAM and BW-RAM faster than the SNES general-purpose DMA circuit.
An additional DMA mode for real-time character conversion: bitmap data in BW-RAM is converted to the SNES PPU image format "on the fly" as the SNES DMA controller reads it, using clock synchronization. It is certainly one of the most impressive features, making it possible to manipulate pictures much faster than with the SNES CPU alone.
A bit-streaming mode for ROM that lets you read a small number of bits at a time, useful for decoding complex compression formats much faster than decoding bit packets manually.
Powerful arithmetic operations, including signed 16 × 16-bit multiplication (32-bit result), 16-bit signed division and a 40-bit cumulative sum (multiply-accumulate, useful for matrix calculations).
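The character conversion the SA-1 performs in hardware is the same job you would otherwise do in software on the SNES CPU: take packed pixel indices from a bitmap and split them into bitplanes. A minimal Python sketch for one 16-colour (4bpp) 8x8 tile, assuming the standard SNES 4bpp tile layout; other colour depths are not covered, and the function name is hypothetical.

```python
def to_snes_4bpp(pixels):
    """Convert one 8x8 tile of 4-bit colour indices to SNES 4bpp planar format.

    pixels: 8 rows of 8 ints in 0-15, leftmost pixel first.
    Returns 32 bytes: bitplanes 0/1 interleaved per row in bytes 0-15,
    bitplanes 2/3 interleaved per row in bytes 16-31.
    """
    out = bytearray(32)
    for y, row in enumerate(pixels):
        for x, colour in enumerate(row):
            mask = 0x80 >> x  # the leftmost pixel lives in the MSB
            for plane in range(4):
                if colour & (1 << plane):
                    base = 0 if plane < 2 else 16
                    out[base + 2 * y + (plane & 1)] |= mask
    return bytes(out)
```

Doing this per pixel on a 3.58 MHz CPU costs many instructions per byte of output, which is why having the SA-1 do it while the data is being DMA'd is such a big win.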

SA-1 is not a silver bullet. You don't need it to make a cool, fast-paced game. A lot can be done with the 3.58 MHz (FastROM) SNES CPU, especially if you are writing a game from scratch (nowadays we know better ways to explore the SNES architecture when writing new games).
In most videos and demonstrations with SA-1 in the title, what you see could also be done with the SNES CPU. The point is that while most (if not all) things can be done with the SNES CPU, you probably would not see them happening if it weren't for the SA-1 chip (or other enhancement chips, really). The chip simply lets you do things you probably thought weren't possible on the regular CPU, because of either speed or data restrictions.
Flexibility is important. While you can have a very fancy technical demo showing 3D polygons and impressive graphics and sound, it may be impractical to make it flexible and modular enough to put directly into a game. That's normal. Either you have to write a lot of hard-coded code and macros (which takes a lot of time and dedication) for some levels, or you have to deal with code architecture limitations that will affect your game's gameplay.

You can have a bullet hell shooter on the Super Nintendo with the regular CPU without problems, but you will probably have trouble keeping that many bullets with different patterns on screen at the same time without the SA-1 chip. The picture is from Touhou Mario 2: each bullet (flame) has independent speed, acceleration, angle and animation, and there are up to 120 of them on screen depending on the phase. A video is available here.
If you already have a released commercial game with no source code available and you want to improve it somehow (making it faster, more animated or smoother), some techniques can be used. Normally these include:
Optimizing the existing code and structures
Converting the game's ROM addresses to use the FastROM format
Converting the game's ROM and RAM memory model to use the SA-1 chip
Rewriting the game from scratch
All of them combined
Each method has advantages and disadvantages. More details about each method (including alternatives like the Super FX chip) will be discussed in a future post.
Converting a game that previously used only the SNES CPU, in either SlowROM or FastROM, is one of the coolest ways of introducing the SA-1 chip, since it lets the game (especially one that struggles to keep a good pace for whatever reason) run much more smoothly on the original hardware. If properly converted, these ROM hacks can be played on real hardware with a real SA-1 chip too.
When you decide to use the SA-1 chip, you are essentially partially rewriting the game for a similar processor, with a similar architecture but a different memory and hardware context. If you already have the original game's source code, the process can be quite simple, but if you only have the ROM image, the process is complex and requires a complete conversion from start to end, since once you convert the memory map to the SA-1 format, you must convert everything to use the new memory map. For example, while FastROM works fine with SlowROM addresses, making it possible to partially convert a game to FastROM, with SA-1 Pack's technique it's all or nothing.
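Why FastROM can be applied piecemeal: a FastROM address is just the same ROM byte seen through the $80-$BF mirror, so an unconverted SlowROM pointer still reads the right data, only more slowly. A remapped RAM variable, by contrast, lives in a different physical device (BW-RAM instead of WRAM), so any reference left unconverted reads the wrong memory. A minimal sketch of the FastROM side, assuming a LoROM layout and covering only banks $00-$3F; the function name is hypothetical.

```python
def to_fastrom(addr: int) -> int:
    """Move a LoROM ROM address in banks $00-$3F to its $80-$BF mirror.

    Both addresses reach the same ROM byte; reads through $80-$FF use the
    3.58 MHz access speed once bit 0 of MEMSEL ($420D) is set.
    Addresses below $8000 in these banks are RAM/I/O and are left alone.
    """
    bank = addr >> 16
    if bank <= 0x3F and (addr & 0xFFFF) >= 0x8000:
        return addr | 0x800000
    return addr
```

For example, `to_fastrom(0x008000)` gives `0x808000`, while a WRAM address such as `0x7E0000` is returned unchanged.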
Note that this is different from just activating the SA-1 chip. It's actually possible to simply put the game ROM on an SA-1 board, and most games will load fine (especially LoROM games up to 2 MB). But enabling the chip alone will not make your game faster: remember that the SA-1 CPU is a separate processor running at the same time as the SNES CPU. You have to move the existing code to the SA-1 CPU to actually get the benefits of the chip.
Back in 2011, I made the very first experiments with the SA-1 chip in Super Mario World. I knew that activating the chip alone would not give any result, because the code would still run on the SNES CPU. So instead I started studying where I could use the SA-1 chip for specific tasks such as graphics manipulation and background processing:
Graphics rotation algorithm: SA-1 Rotating in SMW Take #1 | Super Mario World Custom Boss - Giant N'ball(?) V2
Snow falling algorithm: Overworld Snowfall - Final (or not)
2D graphics drawing: SA-1 Drawing Mode in SMW - Just LoL | SA-1 Triangles
However, after receiving feedback and insights from experienced SNES ROM hackers back in the day about moving part of the game memory to the SA-1 BW-RAM, I started experimenting with patches that would change every single instruction to point to a different location. In other words, instructions like LDA $00 [DP=$0000], STA $0D9F, INC $13CC, STA $14C8+x, STA $00+x [X = $1200] would be changed into LDA $00 [DP=$6000], STA $6D9F, INC $73CC, STA $74C8+x, STA $00+x [X = $7200]. But for that to work, I would need to change every instance, without exception. Missing a single instruction would be enough to cause problems in the game. Missing any layer of indirection would also cause problems (most likely a crash). That's how "The Last Remapper" came to be.
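The rule behind those examples is a fixed offset: anything the game addressed in $0000-$1FFF (the low 8 KB of WRAM) moves up by $6000 into the BW-RAM window at $6000-$7FFF. The same rule applies whether the value is an instruction operand, a direct page base or a 16-bit index, which is why every one of them has to be found. A minimal sketch with a hypothetical helper name:

```python
def remap_wram(addr: int) -> int:
    """Apply the $0000-$1FFF -> $6000-$7FFF shift used in the examples.

    Hypothetical helper: values outside the low 8 KB of WRAM are returned
    unchanged (e.g. PPU registers at $2100).
    """
    if 0x0000 <= addr <= 0x1FFF:
        return addr + 0x6000
    return addr

# Operands, the direct page base and an index register all follow one rule.
for value in (0x0D9F, 0x13CC, 0x14C8, 0x0000, 0x1200):
    print(f"${value:04X} -> ${remap_wram(value):04X}")
```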

Picture: this remap file has around 30,000 lines.
The Last Remapper, or TLR, is a script I coded for the sole purpose of scanning the entire Super Mario World disassembly (initially mikeyk's disassembly, the first publicly known one, and later p4plus2's disassembly, after I found some opcode errors in mikeyk's). The script checked whether the 7 MB disassembly had any instruction whose address fell in the range $0000-$1FFF (which maps to $7E:0000-$7E:1FFF) and then created a patch that changed it to $6000-$7FFF (which maps to $40:0000-$40:1FFF on the SA-1). Later, I had to figure out on my own whether there were any indirect references, either through the data bank (set to $7E) or through indexing (X or Y in 16-bit mode).
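TLR itself is not reproduced here. The sketch below only shows the core idea of such a first pass, assuming an Asar-style disassembly with at most one instruction per line and 4-digit absolute operands. Direct page operands, long addresses, data bank changes and 16-bit indexes (the indirect cases mentioned above) are deliberately not handled; those are exactly the part that had to be found by hand. The pattern and function names are hypothetical.

```python
import re

# Matches an optional label, a 3-letter mnemonic (optionally ".w") and a
# 4-digit absolute operand, e.g. "STA $0D9F" or "LABEL: LDA $14C8,x".
# Immediates ("#$1234") never match because '$' must follow whitespace.
ABS_OPERAND = re.compile(
    r"^(\s*(?:[A-Za-z_][\w.]*:\s*)?[A-Za-z]{3}(?:\.[wW])?\s+)"
    r"\$([0-9A-Fa-f]{4})\b(.*)$"
)

def scan(lines):
    """Yield (line_number, old_line, new_line) for operands in $0000-$1FFF."""
    for number, line in enumerate(lines, start=1):
        match = ABS_OPERAND.match(line)
        if not match:
            continue
        prefix, operand, suffix = match.groups()
        value = int(operand, 16)
        if value <= 0x1FFF:
            # Same +$6000 shift as the remap examples; suffix keeps ",x" etc.
            yield number, line, f"{prefix}${value + 0x6000:04X}{suffix}"

source = ["STA $0D9F", "LDA #$1234", "INC $13CC", "STA $2100",
          "CODE_00800F: LDA $14C8,x ; sprite status"]
for number, old, new in scan(source):
    print(number, old, "->", new)
```

A pass like this finds the easy, explicit cases in bulk; its real value is shrinking the list of places that still need manual review.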
After a few tests, the game still had some bugs but was apparently playable. In Super Mario World, what causes slowdown most of the time, especially in ROM hacks, is a high number of sprites. So after creating the "TLR" patch, I coded the "boost" patch, which basically made the sprite processing routine run on the SA-1 CPU instead of the SNES CPU. The results were shared in two videos: one with a level played on the SA-1 chip and another with the same level played on the slower CPU. The differences were very clear. At that time, FastROM was already a common technique, but it barely allowed 1-2 additional sprites on screen. The SA-1 chip and TLR changed everything.
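Moving a routine like sprite processing to the SA-1 can be modelled, very loosely, as a synchronous call: the SNES CPU hands the routine over and waits until the SA-1 reports it is done. In this model the SNES CPU simply idles, so any gain comes from the SA-1's higher clock speed; a real patch could also let the SNES CPU do other work in the meantime. The Python threads and events below are stand-ins for real signalling, the class and method names are hypothetical, and this is not how SA-1 Pack actually implements the boost patch.

```python
import threading

class Sa1Call:
    """Toy model of the SNES CPU handing a routine to the SA-1 and waiting."""

    def __init__(self):
        self.request = threading.Event()  # "SNES -> SA-1: please run job"
        self.done = threading.Event()     # "SA-1 -> SNES: job finished"
        self.job = None
        self.worker = threading.Thread(target=self._sa1_loop, daemon=True)
        self.worker.start()

    def _sa1_loop(self):
        # The "SA-1" sits idle until a request arrives, runs it, then reports.
        while True:
            self.request.wait()
            self.request.clear()
            self.job()
            self.done.set()

    def run_on_sa1(self, job):
        """Called from the 'SNES CPU' side: start job on the SA-1 and block."""
        self.done.clear()
        self.job = job
        self.request.set()
        self.done.wait()

# Hypothetical usage: the sprite loop runs on the "SA-1" thread.
sprites = [10, 20, 30]
def process_sprites():
    sprites[:] = [x + 1 for x in sprites]

call = Sa1Call()
call.run_on_sa1(process_sprites)
print(sprites)
```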
After fixing a lot of bugs and bundling the SA-1 activator, the remap files and the boost patches into a single 'package' of patches, the first version of SMW SA-1 Pack was released in 2012. Adoption took a while, since the patch simply broke compatibility with all existing tools and patches, including Lunar Magic, the main Super Mario World level editor.

To demonstrate the SA-1 chip's potential, I worked with Wakana on Touhou Mario, an SMW ROM hack that puts Mario in the world of Touhou 8 - Imperishable Night. Because Touhou is a bullet hell game, it was the perfect ROM hack for the SA-1 chip, since the number of enemies on screen would naturally be higher than normal. Touhou Mario was the first SMW ROM hack to use the SA-1 chip, and it was a great success. You can download and play Touhou Mario here; note that because it uses one of the earliest SA-1 Pack versions (v1.03, if I'm not mistaken), it might not work correctly on real hardware. A full playthrough is also available on YouTube.
SA-1 Pack eventually became one of the most popular patches on SMW Central, and it continuously received improvements that enhanced the existing Super Mario World game architecture for more powerful ROM hacks. The method is very aggressive, since it involves completely rewriting the game structure, but it's the most effective way to make a game run 100% on the SA-1 chip.
Fully converting a game to use the SA-1 chip involves the following steps:
Game disassembly. You have to take the ROM image and figure out what is code and what is data. The code must be disassembled correctly, taking into account the different 65c816 CPU modes (16-bit A, 16-bit XY, 16-bit AXY, 8-bit variations, etc.) and where each code section starts and ends. Figure out where the jump tables point so that you don't accidentally mark indirectly referenced code as a data block.
Understand the game structure. A correct disassembly properly separates data from code and keeps the code's context. The next step is figuring out how the data structures work: what is graphics, what is music, how the levels or maps are loaded, how the enemies are indexed and stored in the image and how they interact with each other. You also have to read all the code and figure out its context (whether there are indirect references and how they are assembled, especially implicit ones), the memory management (how e.g. enemies or structures are loaded into memory, how they are allocated, what each region is used for) and the I/O operations (when the CPU talks to the SNES PPU, the SNES APU, the joypads and the CPU's internal registers).
Create the SA-1 memory model. The SA-1 CPU can't read or write WRAM, so the game's RAM has to move to I-RAM or BW-RAM. Because of that, you have to repoint all RAM references, including dynamic, direct page, indirect and implicit ones (via the X and Y registers, i.e. paging), to the BW-RAM or I-RAM blocks. Figure out whether there are specific regions that should move to I-RAM for additional speed and whether there are static regions that can stay in WRAM (such as decompressed graphics), saving BW-RAM space.
Apply the SA-1 memory model. Most of the time, you will miss indirect references, since many of them only show up while playing the game. In Super R-Type stage six, there were robots that start flying up if you don't defeat them on the ground within a few seconds. The flying code had an indirect reference that crashed the game. These things can only be found through extensive manual testing before releasing the patch. It's a trial-and-error process until the game works properly with the new memory model.
Move I/O operations to the SNES CPU. The SA-1 CPU can't access the internal parts of the SNES console. This means that access to the SNES APU, the SNES PPU and the joypads must be done by the SNES CPU. This includes DMA and HDMA. To do that, you have to find every place that accesses those registers and edit it so that the SA-1 CPU calls the SNES CPU to do that work. Extra care is needed so that the SNES CPU doesn't end up executing the rest of the game alone: once that specific part is done, control must go back to the SA-1 CPU to continue processing. You must also make sure the SNES CPU never ends up calling itself recursively (for example, a process that is already running on the SNES CPU jumps to a subroutine that tries to call the SNES CPU again), since that does not work.
Change mathematical operations to use the SA-1 registers. Multiplication and division are done through hardware registers, and the SA-1 can't use the SNES multiplication/division registers, nor vice versa. The two sets are also incompatible: the SA-1 registers have different operand sizes and operation modes that you must keep in mind when converting.
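To illustrate the math step: the SNES CPU's multiplier ($4202 × $4203, result at $4216) is unsigned 8 × 8-bit, while the SA-1 unit takes signed 16-bit operands, so a call site can't be converted by swapping register addresses alone; operand widths and sign handling change too. The Python model below is a sketch with hypothetical function names. The division rounding (quotient toward minus infinity, non-negative remainder) is an assumption based on how emulators such as bsnes model it, and division by zero is not modelled.

```python
def to_s16(v: int) -> int:
    """Interpret the low 16 bits of v as a signed value."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

def snes_mul(a: int, b: int) -> int:
    """SNES CPU multiplier ($4202 x $4203): unsigned 8 x 8 -> 16-bit."""
    return (a & 0xFF) * (b & 0xFF)

def sa1_mul(ma: int, mb: int) -> int:
    """SA-1 multiplier: signed 16 x 16 -> signed 32-bit product."""
    return to_s16(ma) * to_s16(mb)

def sa1_div(ma: int, mb: int) -> tuple[int, int]:
    """SA-1 divider: signed 16-bit dividend / unsigned 16-bit divisor.

    Assumption: floored division, so the remainder is never negative.
    """
    return divmod(to_s16(ma), mb & 0xFFFF)

def sa1_mac(pairs) -> int:
    """SA-1 cumulative sum: accumulate signed 16x16 products in 40 bits."""
    acc = 0
    for ma, mb in pairs:
        acc = (acc + sa1_mul(ma, mb)) & ((1 << 40) - 1)
    return acc

# $FFFF x 2 is 131070 if treated as unsigned, but -2 on the SA-1 multiplier.
print(snes_mul(0xFF, 0xFF), sa1_mul(0xFFFF, 2), sa1_div(-7 & 0xFFFF, 2))
```

In practice this means checking, at every converted call site, whether the original code relied on unsigned 8-bit results before reusing its values with the SA-1 unit.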
Everything is done by testing extensively and repeating the process whenever there are errors. There is no tool that can automate the process, even with extensive research, because every game has many specific details. Each game is unique: there is no standard "data structure" or "file system" like FAT or NTFS, and there are no design patterns that you can simply replace with equivalent code. You have to take each game's individual structure into account and figure out on your own how to build one that will work with the SA-1 chip.
Super Metroid is a 3 MB game, which made this process exponentially more complex than normal. This is the main reason it is taking so much time. I have collected many insights since the beginning of development that I would like to apply to future SA-1 work, especially for games with smaller ROMs (up to 1 MB), to find out whether there is any way to make the process simpler and to remove the repetitive work.
I have set an ETA of August, by which I believe I should have at least a beta version running.
Once the project is done, I plan to work on a new widescreen game, a new SA-1 game (with a ROM size of 1 MB or less) and, most importantly, the Star Fox Super FX-2 project.
Hope you enjoyed the read! In the next post I would like to discuss the upcoming SA-1 projects, including things like the Dual ROM system, the improved SA-1 Pack remapping process and more. Support is appreciated: the amount of work needed for everything is big, and there are still a lot of cool projects to work on in the coming months and years.