frooxius

2DWPU - 2D experimental CPU architecture (FrooxArchive #7)

Added 2022-07-16 18:01:02 +0000 UTC

Hello everyone, and welcome to another entry of the FrooxArchive!

Last week, I showed you the first of my experimental processor architectures dubbed “WPU” - weird processing units that I made around 2010-2012. This week, I have the one I’m most proud of - 2DWPU.

GitHub here: https://github.com/Frooxius/2DWPU

The basic idea of this one was that instead of forming a linear sequence, the instructions are arranged spatially in a 2D grid, with each instruction referring to its neighbors. For example, an ADD instruction refers to two neighboring instructions as the sources of the two arguments to be added together. The execution is also recursive, since those neighbors can in turn refer to their own neighbors.
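To make the idea a bit more concrete, here is a tiny Python sketch of recursive evaluation over a grid. It is only a toy model: the opcode names, the direction keywords and the tuple encoding are all made up for illustration and do not match the real 2DWPU instruction set or 2D Assembly syntax.

```python
# Toy model only: opcodes, direction names and the grid encoding are
# invented for illustration and are not the real 2DWPU instruction set.
DIRS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def evaluate(grid, row, col):
    """Recursively evaluate the instruction stored at grid[row][col]."""
    op, *args = grid[row][col]
    if op == "CONST":
        return args[0]
    # A binary instruction names the two neighbors that supply its operands.
    (dr_a, dc_a), (dr_b, dc_b) = DIRS[args[0]], DIRS[args[1]]
    a = evaluate(grid, row + dr_a, col + dc_a)
    b = evaluate(grid, row + dr_b, col + dc_b)
    if op == "ADD":
        return a + b
    if op == "MUL":
        return a * b
    raise ValueError(f"unknown opcode: {op}")

# Computes (2 + 3) * 4, starting from the MUL in the middle of the grid.
grid = [
    [("CONST", 2), ("ADD", "left", "right"), ("CONST", 3)],
    [None,         ("MUL", "up", "down"),    None],
    [None,         ("CONST", 4),             None],
]
result = evaluate(grid, 1, 1)  # 20
```

Note that evaluation starts at the final operation (the MUL) and works its way out to the constants, rather than starting from the first operand.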

Since this execution model is a bigger departure from the classical program flow, it presents a number of new challenges, but also interesting benefits.

When coding, the structure of your program has to be quite different - some parts are almost reversed. However, this can actually be a bit helpful, because when converting your algorithm to instructions, you decompose it from the top down rather than from the bottom up. This is in contrast to traditional assembly programming, where you compose bigger algorithms from smaller pieces first. Once you get used to the top-down approach, it can provide a better mental model.

It also allowed for an interesting new “type” of instruction - bi-directional instructions, which perform different actions depending on whether they are being entered or exited. My favorite was an instruction that pushes a value onto the stack on entry and pops it on exit. When programming in assembly and calling a subroutine, you have to remember to do this manually, but having a program structured this way allows for this to be implicit, eliminating a potential source of mistakes.
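A rough Python sketch of the enter/exit idea, assuming a toy machine with a dictionary of registers and a list used as the stack. The `preserve` helper, the register name `r0` and the machine layout are hypothetical and only illustrate the behavior, not how 2DWPU actually encodes it.

```python
# Toy model only: `preserve`, the register names and the machine layout are
# hypothetical and do not reflect the actual 2DWPU instruction or encoding.
def preserve(machine, register, body):
    """Run `body` with `register` saved on entry and restored on exit."""
    machine["stack"].append(machine["regs"][register])   # action on entry
    result = body(machine)                               # the nested branch
    machine["regs"][register] = machine["stack"].pop()   # action on exit
    return result

def subroutine(machine):
    machine["regs"]["r0"] = 99   # clobbers r0 while doing its work
    return machine["regs"]["r0"] + 1

machine = {"regs": {"r0": 7}, "stack": []}
value = preserve(machine, "r0", subroutine)
# value is 100, and r0 is back to 7 without the caller doing anything extra
```

Because the save and the restore are two halves of the same instruction, there is no way to write one and forget the other.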

Structuring programs this way presented another interesting benefit - it allowed for implicit parallelization at instruction level. This was the major benefit of this architecture that I decided to explore.

Schematic of the 2DWPU with 3 cores.

Since most instructions branch into multiple other instructions (that further branch into others), it offered a simple and clear way for multiple cores to pick up each of the branches that needed to be processed and run them in parallel.
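As a very rough illustration, the Python sketch below counts how many cycles a toy expression tree needs when a given number of cores can each pick up one ready instruction per cycle. This is a simple greedy scheduler written for this post, not the job distribution mechanism 2DWPU actually uses, and it assumes every instruction (including loading a constant) takes exactly one cycle.

```python
# Toy model only: a greedy scheduler over an expression tree, assuming one
# cycle per instruction. It is not the real 2DWPU core/job mechanism.
def cycles_needed(tree, cores):
    """Count cycles to evaluate `tree` when up to `cores` ready nodes run per cycle."""
    # Flatten the tree into numbered nodes, recording each node's children.
    children = []

    def add(node):
        index = len(children)
        children.append([])
        if isinstance(node, tuple):  # ("OP", child, child, ...)
            children[index] = [add(child) for child in node[1:]]
        return index

    add(tree)

    done = set()
    cycles = 0
    while len(done) < len(children):
        # A node is ready once all of its children have finished.
        ready = [i for i, kids in enumerate(children)
                 if i not in done and all(k in done for k in kids)]
        done.update(ready[:cores])  # each core picks up one ready node
        cycles += 1
    return cycles

# (1 + 2) + (3 + 4): the two inner ADDs are independent branches.
tree = ("ADD", ("ADD", 1, 2), ("ADD", 3, 4))
counts = {n: cycles_needed(tree, n) for n in (1, 2, 4, 8)}
# counts == {1: 7, 2: 5, 4: 3, 8: 3}
```

In this toy, going from 4 to 8 cores changes nothing, because the tree never has more than 4 instructions ready at the same time.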

To test this idea, I implemented not only a software simulator of this architecture, but a hardware version on an FPGA as well. The idea actually bore fruit - adding more cores to an existing algorithm helped improve the performance and lower the execution time.

Results of simulation of the architecture, showing how many instruction cycles were needed to complete the program with increasing number of cores.

Having a version of this architecture running on the FPGA hardware allowed for a pretty powerful demonstration - I synthesized an 11 core version and mapped the individual cores to physical switches on the board. Enabling and disabling the cores on the fly (or more specifically, the signals for them to take up jobs) made the simulation on the display visibly speed up and slow down.

Of course, this wasn’t pure magic and adding more and more cores led to diminishing returns. Some algorithms and parts of them would also parallelize better than others. There was also quite some performance left on the table - the way I implemented this mechanism, the cores would often get “stuck”, waiting for their work to get merged back. In some cases, the cores would also fail to parallelize work when they should have.

Analysis of the number of instructions executed over the run time of the program - the first part of the program clears the memory/display, which is a highly linear process and doesn't auto-parallelize well. The latter part, where the simulation itself occurs, parallelizes much better.

There were a lot of questions left unanswered and unexplored too: what is the performance impact of the querying instructions and of structuring program flow recursively, how would the architecture get pipelined, what benefits would be gained by compiling higher level languages like C into 2D assembly, and so on.

The programming itself presented a number of challenges too - oftentimes when coding examples, I’d find myself having to worry about where to lay out the instructions so they could actually reach each other, which added unexpected complexity.

Old video of testing an app on the FPGA version of 2DWPU: https://youtu.be/De6e_PbzykM

I moved on to other projects before I got to tackling those problems. It might be that a lot of these things would lead to a dead end. Or maybe to more interesting discoveries. Regardless, exploring the parts that I did was a lot of fun and the process of building this architecture taught me a lot.

Building an architecture like this required separating the project into several layers of abstraction to keep it manageable. This was really important when developing the hardware version for an FPGA - debugging those is notably challenging, because unlike in a program, everything is happening “all at once” in the circuit. Having clearly separated modules and roles allowed me to test them piece by piece, rather than getting overwhelmed by the sheer complexity.

Screenshot from debugging session of the hardware version

I applied the same principles to a number of my later projects too, which let me design and implement high-complexity software in a manageable and scalable manner. Even though I dropped this project itself long ago, its impact on my work remains to this day.

It was this architecture that I also presented at Intel ISEF 2012 - an international science competition in the USA (held in Pittsburgh when I participated) and even managed to snag a 4th grand award for it in the Computer Science category.

The whole whirlwind of the experience - getting to the US for the first time through my work - was one of the most transformative experiences of my life and set me on the path that I am on now: building my own projects and following my passions and dreams.

It also taught me that it takes a lot of effort - for each of the projects that got people’s attention, I’ve done a number of others that didn’t. But with each project, I’ve gained something invaluable: experience, new knowledge and skills. Specific projects may come and go, but as time goes on, you build bigger and better ones, incorporating ideas, experience, lessons and other know-how from all the previous ones, sometimes even reusing and repurposing stuff you’ve built in the past. Nothing ever truly gets lost.

As with the previous project, I’ve included the assembler and simulator. Unfortunately, I left the project in a weird state 10 years ago and the simulator will crash on a number of the examples. I’m not entirely sure why, although some of them seem to use an older syntax of 2D Assembly that I changed over time.

I’ve tried looking into it (which is partly why this post came later this week, sorry!), but solving it would require a lot more time. Some of the examples still work though, so hopefully they’ll be good enough to play with this a bit.

However, if anyone would like to poke at it, I’ve published the sources on GitHub, including the hardware implementation for the FPGA! The sources are mostly provided as-is, in their original state, but perhaps some of you will have fun with them. If you’d like to fix them up, feel free to submit a PR!

Anyway, this is all for now (at least that I can remember off the top of my head), hope you enjoyed the post and see you with the next one!