programming grimoire [2]::naming things into existence

Having learned what a program is, it’s time we made one.

getting set up (introducing Rust)
1. the console
2. setup and compilation
naming things
semantics vs instructions
1. shadowing
2. mutable variables
what is a language for?
a dungeon game
second orb to ponder

getting set up (introducing Rust)

That’s probably enough preamble. Why don’t we actually do a little programming?

For this series, we will be using the Rust programming language. Instructions for installing it are here.

Although Rust has a very helpful and informative compiler, we also recommend having a good IDE with language server support. Like all programmers we have opinions. We currently use Zed, but for the love of Lain, please turn off AI in the settings. A traditional console-based option is Vim, or more recently Neovim; alternatively emacs. (In times past, the choice of vim or emacs was so fervently argued it was often called ‘religious debate’.) If you dislike Zed, Sublime is still pretty good and importantly very fast, but its language server support is less good. We won’t recommend anything that is built in Electron (VSCode etc.) even though most people use them. You might feel less strongly about it.

Speaking of religious debate, why Rust? We (canmoms) like Rust. It’s a language that, in our view, strikes a good balance. It is a compiled language, so you will make the actual sort of programs that run on your computer. It doesn’t have as many fiddly annoying aspects as older languages like C, but it will still give you a firm foundation to understand what both ‘high-‘ and ‘low-level’ languages are doing. For the sake of teaching, it is a very explicit language, and accords nicely with the ways that we like to think about programming. In particular, it has a very nice type system. It is also very fast to run (but relatively slow to compile).

Not everyone would agree. People get quite fervent about this. The main complaint people make about Rust is that it is overcomplicated; it also has a very strict model for handling memory which people sometimes find overly restrictive. This is one reason why suggesting starting with Rust, which is infamously quite difficult, is maybe a bit spicy. Later in this series, we’ll talk about some of the design tradeoffs that Rust makes, and compare other languages.

A great deal of this series will be conceptual enough that it will apply to any programming language. If we start talking about a specific Rust feature we will make sure to be clear about it.

This is not a complete guide to the syntax of Rust. For that, there is the Rust book. This is an excellent and well-written resource if you already know a little about programming, but it is quite heavy going.

the console

In the olden days, computers just gave you text-based ‘terminals’. You would type commands, and the results would be printed as text.

In modern times we still tend to use ‘terminal emulators’, also known as ‘consoles’, for programmy activities. The idea is the same: you type the name of a program, and some settings for it, and it will run.

On Linux, there are lots of options depending on your distro, desktop environment etc. but you can probably find yours just by searching for ‘console’ or ‘terminal’. On MacOS, it’s just called Terminal.

On Windows, you have the ‘command prompt’ which is old, and ‘PowerShell’ which is newer. Out of the box, PowerShell is a rather unfetching shade of blue. We recommend using scoop to install concfg to make it less painful to look at. With scoop you can get things feeling a little more comfortable.

On the layer just inside the window, there are lots of different “shells” which work in slightly different ways (bash, fish, powershell etc.) which determine what actual text you’ll see and what commands are available. The most important piece of information is that your shell is ‘somewhere’: it has a ‘working directory’ inside the operating system’s ‘file system’. Whenever you run a command it will take this ‘location’ into account. On most consoles, you can change to other locations with the cd command, and see what is present where you are with the ls command. A single dot (.) is a shortcut for where you are right now, and two dots (..) is a shortcut for the folder containing where you are, so cd .. will take you out of your current folder.

There’s a lot of other stuff but that’s all you need to get started.

setup and compilation

Create a new folder. In the console navigate to this folder and type the incantation cargo init. This will run a program called Cargo, which is the Rust package manager, and tell it to set up a new project for us. Cargo will create some files: a Cargo.toml file which contains some general settings for the project, a src folder, and inside there a main.rs file to contain our program.

Together, these files define what sort of program is to be built. The program that turns them into an actual executable file of the sort we discussed above is called the compiler. Compilers are one of the greatest things humans have ever invented, maybe. They’re certainly one of the richest, most intricate segments of computer science.

To turn our program into an executable file, we should type cargo build into the console. This compiles the program without optimisations and puts the result under target/debug; alternatively, casting cargo build --release will create a faster, optimised version under target/release. You can also build and immediately run a program with cargo run. Cargo will not bother rebuilding if nothing has changed.

If there is a compilation error in your program, the compiler will not create a new executable file, and will give you information about the error, and hopefully some advice on how to fix it, in the console when you type cargo build or cargo run.

OK, that’s the basic set of incantations. Let’s see what we can do with them.

naming things

There’s an old computer science joke which goes like this:

The two hard problems of computer science are naming things, cache invalidation, and off-by-one errors.

It apparently goes back to Phil Karlton, according to his son. At some point someone extended it with the meta joke of ‘off-by-one errors’. Who knows who came up with that one.

Cache invalidation is an interesting and genuinely fiddly engineering problem which we will not explain just yet; the other two are funny jokes, right? Like, yeah, it’s kinda important to come up with good names for things, but at the end of the day it doesn’t matter that much. They’re easy to replace with modern tools. And off-by-one errors are just a common type of accidental slipup.

But actually…

The very first thing you will probably be taught to do in any programming tutorial is “assign a variable”. Let’s go ahead and do that right now.

fn main() {
    let some_number = 1337;
}

Ignore for now the fn main() {...} wrapper, we’ll talk about that later, it just tells our program where to start.

The line breaks down…

let says we are about to name something
some_number is the name we are bestowing
= says we are assigning a value to this name
1337 is the value we are assigning
; ends the statement (in Rust, most things are expressions, but a let declaration like this one is a statement)

If we compile and run this program it won’t do anything. (You will get a warning about an unused variable too.) But we have nevertheless expressed a desire in the language’s syntax: we have said we wish for something to exist, and we have given that ‘thing’ a name. That name is some_number. We have also stored a value behind this name. This is the integer 1337.

What has actually happened, though? According to the language’s semantics, all that we have declared is that we want the name some_number to exist, and that it should contain the value 1337. This means, if, later in the program, we write some_number in a place where a name is expected, it can be replaced by the number 1337.

Where in the computer is the number 1337 actually stored? Logically, it is a place called the ‘stack’, but the compiler has some freedom here: maybe it will store this in a CPU register, maybe it will put it in memory.

Alternatively, as in this case, it will actually determine that we never look for the contents of the name some_number, so it can safely be deleted from the program. After the compiler’s done its work, the value 1337 won’t even appear anywhere.

We can add some special magic to the program to tell it to contact the operating system and write to the console.

fn main() {
    let some_number = 1337;
    println!("some_number = {}", some_number);
}

What does println! mean? This is a Rust standard library “macro” which is actually doing quite a lot under the hood. It is saying: ‘format’ a string, then ask the operating system to write it out to the console. ‘Formatting’ a string involves filling in the gaps where we’ve written {} with values from our program. There are lots of different ways to format things, but that would be a distraction.

For now, I will just say this: this is a piece of prebuilt machinery, a spell which Rust provides us. Later, we will certainly find out what’s going on under the hood; for now, we really need to see something come out of our program so we will have to leave it unexplained. This is the simplest way to build up some other ideas.

All you need to know then: you can write println! followed by something in parentheses. The first thing inside the parentheses must always be a template, wrapped in quotes, "like so". Any place you put {} in the template, you can then put a value from the list that comes after. The first value goes in the first {}, the second value in the second {}, and so on. (You can actually put additional information inside the {} to make it print it in certain ways, such as {:09} to print a number padded with zeroes until it is at least 9 digits long; I won’t explain that right now but if you are dying to know about it there is information here.) Try playing around a bit with println! templates.
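To make those template rules concrete, here is a small sketch. The helper padded is our own invention for illustration, not something Rust provides. It uses format!, a sibling of println! that hands back the finished text instead of printing it. Functions other than main haven’t been covered yet, so treat that part as a peek ahead.

```rust
// Hypothetical helper (our own name): builds the text rather than printing it.
// format! accepts the same templates as println!.
fn padded(number: u32) -> String {
    // {:09} means: at least 9 characters wide, padded on the left with zeroes
    format!("{:09}", number)
}

fn main() {
    let some_number = 1337;
    let other_number = 1312;

    // values fill the {} gaps in order, left to right
    // prints "first: 1337, second: 1312"
    println!("first: {}, second: {}", some_number, other_number);

    // the same value with zero padding
    // prints "padded: 000001337"
    println!("padded: {:09}", some_number);

    // prints "padded via helper: 000001337"
    println!("padded via helper: {}", padded(some_number));
}
```

Note that the padding is a minimum width: a number that is already longer than 9 digits is printed in full.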

semantics vs instructions

What we are observing here is that, unlike the assembly language we saw earlier, a Rust program doesn’t correspond to machine code in a simple way at all. The compiler takes our program and spits out some machine code which ‘matches up with’ what we wanted the program to do.

Using Compiler Explorer, we can actually see exactly what comes out of the compiler. With some knowledge and effort, it’s possible to line things up, but actually most of this code is just setting up to run println! for us.

So what’s going on? How should we understand this?

A Rust program does not describe machine code, exactly. It describes what an ‘abstract machine’ should do. The code generated by the compiler ‘matches up’ in the following way: for a given input, the program is obliged to result in certain things where it touches the ‘outside world’. In this case, there are no inputs (or are there…), so the compiler must have the program print the number 1337 to the console. How it actually goes about doing that is its business.
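Here is one way to picture that freedom, as a sketch rather than a description of what the compiler actually emits. The two helpers with_name and without_name are hypothetical names of our own, and they use format! (which builds text the same way println! does, but returns it) plus functions, which we haven’t covered yet. From the outside, the two are indistinguishable, so a compiler is allowed to treat the first as if it were the second.

```rust
// Hypothetical helper: goes through a name, like our program does.
fn with_name() -> String {
    let some_number = 1337;
    format!("some_number = {}", some_number)
}

// Hypothetical helper: no name at all, just the finished text.
fn without_name() -> String {
    String::from("some_number = 1337")
}

fn main() {
    // both lines print "some_number = 1337"
    println!("{}", with_name());
    println!("{}", without_name());
}
```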

If you write let some_number you’re saying that the compiler should know about this name, which we call declaring an identifier. Then the = 1337 is performing assignment: we are associating that name with a value.

Note that this is not the same as asserting there is a mathematical equality. It’s not like an algebraic equation, despite using the same ‘equals’ sign. Assignment in this type of language is an action: the thing on the right is put inside the thing on the left. There is a before, where the equality might not be true, and an after, where it is. Some languages might use a different syntax like some_number <- 1337. But Rust follows the tradition of C-like languages in which a single = always means assignment.

A Rust program describes a series of statements about how we want things in the abstract machine to change. For example, let’s add some numbers…

fn main() {
    let leet = 1337;
    let acab = 1312;
    let sum = leet + acab;
    println!("{} + {} = {}", leet, acab, sum);
}

We are assigning ‘the result of adding 1337 and 1312’ to the identifier ‘sum’. At the end we’ve got a line which prints out all that stuff we declared, slotting it into the designated boxes in a sentence template.

One interesting wrinkle of this is that the state of things is different on each line of the program. Let’s try printing out the value of sum before we’ve declared it…

fn main() {
    let leet = 1337;
    let acab = 1312;
    println!("sum: {}", sum);
    let sum = leet + acab;
    println!("{} + {} = {}", leet, acab, sum);
}

In your IDE, or if you try to compile, you should get a red squiggly line under ‘sum’, and it will say:

error[E0425]: cannot find value `sum` in this scope

On this line of the program, sum does not exist. It only exists after we declare it. But what’s all this about scope?

In Rust, curly braces do something quite special. They create a kind of mini world in which the names we give are allowed to live. Let’s give it a try. (We have added comments, lines of code which do nothing, by writing // at the start of each one.)

fn main() {
    let leet = 1337;
    
    //scope begins here
    {
        let acab = 1312;
        
        //this will not give an error
        let foo = acab;

        //this is fine too
        let bar = leet;

        //scope ends here
    } 
    
    //this will give an error
    let sum = leet + acab;
    println!("{} + {} = {}", leet, acab, sum);
}

Inside the inner pair of braces, we define the names acab, foo and bar. We can refer to the name acab inside the braces, since it has been defined. We can also refer to the name leet, which came from the ‘world outside’.

However, at the end of the scope, the closing brace }, everything that was declared inside the scope is destroyed! If we try to refer to acab again, it fails. Names do not last forever. We are constantly ‘stacking’ scopes on top of each other (e.g. by calling functions, we’ll get to that in a moment). When we leave a layer of the stack, it disappears, and all the names it knew about are forgotten…

It might not be obvious why this is useful yet, besides avoiding accidents when the same names are used in multiple places… but it all relates to some of Rust’s other useful features to do with memory management and lifetimes. We’ll get there.

There is no concept of scope or names on the level of assembly. There are just registers and memory addresses. Scopes are part of the abstract machine, also known as the semantics of the language.

shadowing

This is honestly a bit niche, but worth mentioning: you can declare a new thing under a name that is already in use. The new declaration lasts until the end of its scope, after which it falls away and reveals the old one again.

fn main() {
    let foo = 1312;
    {
        //the previous foo becomes 'shadowed'
        let foo = 1337;

        //prints 1337
        println!("Inner foo: {}", foo);
        //inner foo is dropped
    }
    //prints 1312
    println!("Outer foo: {}", foo);
}

This is called ‘shadowing’, and many languages do not have it. It is useful to know about, though.
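Rust also lets you shadow a name within the same scope, without any extra braces. Here is a minimal sketch; the helper name shadow_in_place is made up for illustration, and it is a function, which we haven’t properly introduced yet. Unlike the nested example, the old foo is never revealed again here, because both share a scope that ends at the same time.

```rust
// Hypothetical helper (our own name): shadows foo without opening a new scope.
fn shadow_in_place() -> i32 {
    let foo = 1312;
    // same scope, same name: from here on, foo means the new value.
    // the old foo can still be read on the right-hand side of this line.
    let foo = foo + 25;
    foo
}

fn main() {
    // prints "foo: 1337"
    println!("foo: {}", shadow_in_place());
}
```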

mutable variables

So far, we have never changed the value we’ve stored behind a name. Changing it is called ‘mutation’. Traditionally, the things behind names could always be mutated, which is why they tend to be called ‘variables’.

Rust also allows mutation, but we have to explicitly say we want to allow it when we name something. Here’s a simple example:

fn main() {
    let mut tobias_form = "human";
    println!("tobias is currently a {}", tobias_form);
    tobias_form = "hawk";
    println!("tobias is currently a {}", tobias_form);
}

If we hadn’t included the mut when declaring tobias_form, we’d get a compilation error when we try to assign a new value to it.
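Mutation also makes the ‘before and after’ nature of assignment easy to see. The following is a small sketch with numbers; grow_total is a hypothetical helper name of our own, wrapped in a function (not yet covered) so that the final value can be handed back to main.

```rust
// Hypothetical helper (our own name): mutates one name several times.
fn grow_total() -> i32 {
    let mut total = 0;
    // prints "total starts at 0"
    println!("total starts at {}", total);

    // the right-hand side is worked out first, using the old value,
    // then the result is stored back behind the same name
    total = total + 1337;
    // prints "total is now 1337"
    println!("total is now {}", total);

    total = total + 1312;
    // prints "total is now 2649"
    println!("total is now {}", total);

    total
}

fn main() {
    let final_total = grow_total();
    // prints "final total: 2649"
    println!("final total: {}", final_total);
}
```

A line like total = total + 1337 would be nonsense as an algebraic equation, but as an action it is perfectly sensible.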

Note that with mutable variables, you can ‘reach outside’ the self-contained world of a scope.

fn main() {
    let mut tobias_form = "human";
    let cassie_form = "human";
    println!("tobias is currently a {}", tobias_form);
    println!("cassie is currently a {}", cassie_form);
    {
        //time to battle some yeerks!
        //this mutates the outer tobias_form
        tobias_form = "hawk";
        //this only shadows cassie_form inside the scope
        let cassie_form = "horse";
        println!("tobias is currently a {}", tobias_form);
        println!("cassie is currently a {}", cassie_form);
    }
    println!("tobias is currently a {}", tobias_form);
    println!("cassie is currently a {}", cassie_form);
}

Oh no! Tobias got stuck as a hawk!

Mutable variables are generally more complicated than immutable ones because they might be mutated in all sorts of different places. They make it easier to create bugs if you assume a name refers to one thing but it’s actually a different thing. So it’s a good idea to only make things mutable when they need to be mutable.

what is a language for?

The function of a programming language is to let you express, as clearly and precisely as possible, what you intend to happen. In practice, this means describing a number of abstract entities you wish to bestow an ‘as if’ existence. The compiler will then do the hard work of arranging the world in order to fulfil your wish. You need to meet it halfway, though: understand how things work enough to speak a language it understands and follow the rules.

So far we have just given names to some numbers. Soon we’ll be making other things: data structures, functions, types, and all sorts. Then, we can describe how these abstract things should interact with each other.

Could you have a programming language which does not involve bestowing names? Yes, certainly: you could, for example, directly push and pop things onto a stack with suitable commands. It would be a headache and you would not be using your superpower as a language-using creature, which is to invent new concepts and attach them to words.
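To get a feel for that style, here is a rough imitation in Rust, adding our two numbers with nothing but pushes and pops. It is only a sketch: the ‘commands’ in the comments belong to an imaginary language, run_stack_program is a name we made up, and the code leans on things not yet covered (functions, Vec as a growable list, and expect to stop the program if the stack is unexpectedly empty). The Rust version still needs a few names to work, which the imaginary language would not.

```rust
// Hypothetical helper (our own name): plays out an imaginary stack program.
fn run_stack_program() -> i32 {
    // a Vec used as a stack: push adds to the top, pop removes from the top
    let mut stack: Vec<i32> = Vec::new();

    // PUSH 1337
    stack.push(1337);
    // PUSH 1312
    stack.push(1312);

    // ADD: take the top two values off and put their sum back
    let top = stack.pop().expect("stack should hold two values");
    let below = stack.pop().expect("stack should hold two values");
    stack.push(below + top);

    // POP: take the result off the stack
    stack.pop().expect("stack should hold the result")
}

fn main() {
    // prints "result: 2649"
    println!("result: {}", run_stack_program());
}
```

With only two numbers this is manageable, but you have to keep the whole stack in your head to know what each command will touch.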

Ah, but all this adding numbers together is pretty boring, isn’t it. Let’s work through a more complex example: a simple dungeon game to be played in the console.

a dungeon game

You may have played a game like this before. You have a bunch of ASCII characters expressing walls and creatures. Like this…

############
#..........#
#..@.......#
#.......g..#
#..........#
############

In this world, creatures live on a grid. (Grids are easy to represent.) The position of the player is represented by an @ sign. There is another creature, represented by the letter g, in the room.

What does our program need to do? On a high level…

it needs to communicate the state of the game to the player.
it needs to receive input, representing the player’s next action.
it needs to update the simulation in response to the player’s action.
it needs to update the displayed game state.

How can we go about doing this? We need to describe the actions that should be taken with the abstract entities we’re naming. There are multiple ways to do this, but we’re angling towards writing functions.
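As a peek at where this is heading, here is a sketch of just the first job on that list: turning the state of the game into text. Everything in it is an assumption for illustration, not the design this series will settle on: the function name render, the idea of storing positions as (x, y) pairs counted from zero at the top-left corner, and the use of loops and if, none of which have been introduced yet.

```rust
// Hypothetical function (our own name and design): draws a walled room as text.
// Positions are (x, y) pairs, counted from 0 starting at the top-left corner.
fn render(width: usize, height: usize, player: (usize, usize), goblin: (usize, usize)) -> String {
    let mut picture = String::new();
    for y in 0..height {
        for x in 0..width {
            // the outermost ring of tiles is wall; everything else is floor or a creature
            let tile = if y == 0 || y == height - 1 || x == 0 || x == width - 1 {
                '#'
            } else if (x, y) == player {
                '@'
            } else if (x, y) == goblin {
                'g'
            } else {
                '.'
            };
            picture.push(tile);
        }
        // end of one row of the grid
        picture.push('\n');
    }
    picture
}

fn main() {
    // the room from the picture above: 12 wide, 6 tall
    print!("{}", render(12, 6, (3, 2), (8, 3)));
}
```

Notice how much of this is names: width, height, player, goblin, tile. Notice also the height - 1 and width - 1, which is exactly where off-by-one errors like to hide.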

second orb to ponder

Here’s a slightly annoyingly written Rust program to fuck about with names and scopes:

fn main() {
    let spell = "Y";
    {
        let spell = "X";
        print!("{}", spell);
    }
    {
        print!("{}", spell);
        let spell = "Z";
        print!("{}{}", spell, spell);
    }
    print!("{}", spell);
}

Can you work out what it prints? (print! is mostly the same as println!, it just does not put a newline at the end so you can keep writing on the same line.) Write down a guess, then run it and see if you’re right.

Most of the time you will not be directly putting pointless scopes and constantly shadowing variables in the middle of a function like this! But functions, loops, conditionals and the like all depend on scopes to work.