Writing a Compiler from Scratch - Part 1

EDIT: Whoops, I wrote the wrong course number (CMPT 376); it's corrected below. I'm taking CMPT 376, Writing for Computer Scientists, this semester, so I must have gotten myself mixed up :)

Ok, I'm going to try something here, and I don't know if I'll manage to finish it. My all-time favorite class in my school career (so far) has been CMPT 379 - Compiler Design. The class basically consisted of me writing a compiler in Visual C++, with weekly lectures and code reviews from the professor (it was a small class). Since then, I've been trying to think of a reason to write a compiler (e.g. a language that can be ported to .Net, or a useful domain-specific language). Then, this morning, as I walked back to the Computing Science common room from the gym (after my morning workout), I found a reason. I think compilers are some of the most interesting pieces of software, and I thought that if I wrote a series of blog posts in which I developed a compiler, maybe I could share that passion with others.

So, here's the plan: I've designed an extremely simple language, and I'm going to walk through writing a compiler for it. The initial version will target .Net IL code (IL runs on a stack machine, which is much easier to generate code for). Once I've implemented the simple language, I think it'd be awesome if my readers could jump in and propose some ideas for new language features, and I'll try to blog about implementing them.
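To give a feel for why a stack machine is an easy target, here's a minimal sketch in Python (not the compiler this series will build). The tuple-based expression tree and the emit/run helpers are made up for illustration. The instruction names borrow real IL mnemonics (ldloc, ldc.i4, add), but the text format is only IL-like, not actual assembler output. The trick is that a post-order walk of the expression tree (operands first, then the operator) already produces the instructions in the right order.

```python
# Hypothetical mini-AST: an int constant, a variable name, or ("+", left, right).
def emit(node, out):
    """Append stack-machine instructions for `node` to `out` (post-order walk)."""
    if isinstance(node, int):
        out.append(f"ldc.i4 {node}")      # push an integer constant
    elif isinstance(node, str):
        out.append(f"ldloc {node}")       # push a local variable's value
    else:
        op, left, right = node
        emit(left, out)                   # operands first...
        emit(right, out)
        out.append({"+": "add"}[op])      # ...then the operator
    return out


def run(code, local_vars):
    """Tiny interpreter for the instructions above, to sanity-check the output."""
    stack = []
    for instr in code:
        name, _, arg = instr.partition(" ")
        if name == "ldc.i4":
            stack.append(int(arg))
        elif name == "ldloc":
            stack.append(local_vars[arg])
        elif name == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
    return stack.pop()


code = emit(("+", "age", 10), [])
print(code)                     # ['ldloc age', 'ldc.i4 10', 'add']
print(run(code, {"age": 25}))   # 35
```

Notice there's no register allocation anywhere: every instruction just pushes to or pops from the stack, which is most of what makes this kind of target friendly for a first compiler.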

To be honest, this project may well fall flat on its face. Writing a compiler is not a simple task, no matter how simple the language. However, I do think it will be fun (while it lasts), so let's give it a try!

So, step one is: Define your language! The most essential part of compiler design is having a very clear idea of the purpose of your language. The language I'm going to design is called "Duh", because it's probably about as simple as it gets (maybe a little more complex than Brainf**k; NOTE: link target does include a few four-letter words :D). Here's a sample Duh program:

print "Enter your age: ";
ageInput : string;
age : int;
ageInput = readline;
age = ageInput -> int;
print "In 10 years you will be " + (age + 10) -> string + ". Wow, that's old"

Pretty simple, eh? Line 1 is a simple print statement, which accepts a string and writes it to the console. Lines 2 and 3 declare variables of string and 32-bit integer types (I decided to use a more Pascal-esque variable declaration format, just for fun). Lines 4 and 5 assign values to those variables. Line 4 uses another built-in function, readline, which reads a line of text from the console. Note that we don't even support initializing variables in the declaration! That's ok though; this is just a toy language, and we can add initializers later. Line 5 introduces the "->" conversion operator, which takes the value on the left and converts it to the type on the right. It's not quite the same as a cast, because it will actually try to convert the value if it can (here, parsing the text the user typed into an integer). Finally, line 6 adds 10 to the value the user entered and converts the result to a string, then places that string into a larger message and prints it to the console.
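If it helps, here's a rough Python sketch of what the "->" operator is meant to do in the sample program. The convert and age_message names are made up for illustration. Two things in it are my assumptions rather than settled parts of Duh: a failed conversion raises an error, and "->" binds tighter than "+", so (age + 10) becomes a string before the concatenation happens.

```python
def convert(value, target):
    """Sketch of Duh's `->`: try to turn `value` into the type named by `target`."""
    if target == "int":
        return int(value)       # assumption: unparseable input raises ValueError
    if target == "string":
        return str(value)
    raise TypeError(f"unknown target type: {target}")


def age_message(age_input):
    """Mirror lines 5 and 6 of the sample program for one line of input."""
    age = convert(age_input, "int")                 # age = ageInput -> int;
    return ("In 10 years you will be "
            + convert(age + 10, "string")           # (age + 10) -> string
            + ". Wow, that's old")


print(age_message("25"))   # In 10 years you will be 35. Wow, that's old
```

A plain cast would only reinterpret the value as another type; the point of "->" is that it does real work (parsing or formatting) to get from one type to the other.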

Well, here goes nothing! Feel free to post your initial comments. I hope you'll follow along! I'll be posting full source code with every post.