programming grimoire [3]::creating spells

Last time we saw that programming consists of defining names and putting things behind them. We got a little sense for the ‘geography’ of a program: the names only exist in some places.

This is one big part of the puzzle. The next powerful spell is functions.

functions
1. functions abstracted
2. side effects
types
1. types are enforced at compile time
A function to display the dungeon
1. loops
buffering with arrays
a third orb to ponder

functions

Much like the programs inside the operating system, we can think of the inside of a program as being made of smaller pieces called functions. We’ve already seen a function: fn main() is Rust’s way of declaring that there is a function called main. By default, Rust programs always do a little setup and then start running the function called main.

There are several ways to look at functions. One is that you run the function, it does some calculation, and then it returns a value to the place that called it. In Rust, this is defined like so…

fn give_me_four() -> i32 {
    4
}

This is a function which always returns the integer value 4. The beginning part is the most important. It breaks down like this:

fn says that we are declaring a function
give_me_four is the name we are assigning to the function
() is where the function’s arguments would go; this function has no arguments
-> is used to indicate the return type of the function, more on that in a sec
i32 is the type of a signed 32-bit integer, more on that in a sec as well
{ begins a scope, the ‘function body’, which is what the function actually does
4 is an expression (a single calculation), in this case just the value 4. It has no semicolon, so it gets returned
} ends the function scope

Note that in Rust, unlike a lot of other languages, we don’t usually need to use a keyword such as return to leave a function. If the function body ends with an expression that has no semicolon, the value of that expression is what gets returned.

You can call this function like so, by putting some parentheses right after it:

fn main() {
    let hopefully_four = give_me_four();
    println!("given {}", hopefully_four);
}
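Rust does still have a return keyword, which is mostly used to leave a function early. Here’s a small sketch comparing the two styles. The names give_me_four_with_return and the commented-out give_me_nothing are made up for illustration:

```rust
// The idiomatic way: the final expression, with no semicolon, is what gets returned.
fn give_me_four() -> i32 {
    4
}

// Also allowed: an explicit `return`. It's mostly saved for leaving a function early.
fn give_me_four_with_return() -> i32 {
    return 4;
}

// A common slip: the semicolon turns `4` into a statement, so nothing is returned.
// Uncommenting this gives a compile error (expected i32, found ()).
// fn give_me_nothing() -> i32 {
//     4;
// }

fn main() {
    println!("{} {}", give_me_four(), give_me_four_with_return()); // 4 4
}
```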

Calling a function means you jump into the function, run its code, and then come back to where you left off. It’s sort of like you dropped the function body right there as a scope, like this:

fn main() {
    let hopefully_four = {
        4
    };
    println!("given {}", hopefully_four);
}

To be useful, functions usually need to have arguments which are filled in when the function is run. Here’s a really boring example:

fn add_four(num: i32) -> i32 {
    num + 4
}

This function takes a 32-bit integer, adds 4 to it, and then returns the result of this calculation. We can call it by putting a value into the parentheses after the name of the function.

fn main() {
    //we can put a number in
    let four_more_than_twelve = add_four(12); //16

    //we can put a named identifier in
    let leet = 1337;
    let four_more_than_leet = add_four(leet); //1341

    //we can put an expression in.
    //this means we'll evaluate the expression and then put the result into the function.
    let four_more_than_an_expression = add_four(leet + 12); //1353

    //Rust won't let us call the function with the wrong number of arguments
    let error_too_few = add_four(); //error: no argument
    let error_too_many = add_four(5, 6); //error: too many arguments
}

Just like before, it’s as if the function body was inserted where we called it, with a name assigned to hold each input.

fn main() {
    //this...
    let four_more_than_twelve = add_four(12);

    //is essentially the same as...
    let four_more_than_twelve = {
        let num : i32 = 12;
        num + 4
    };
}

The compiler might do exactly that, instead of jumping. This is called ‘inlining’.

functions abstracted

Although I’ve described a function as a series of instructions to perform, this is not how a mathematician views a function. A mathematician thinks of a function as a ‘mapping between sets’, like a bunch of tiny little arrows, each pointing from an input to its output. For our add_four function, 0 is mapped to 4, 1 is mapped to 5, 2 is mapped to 6, and so on.

The actual instructions used to perform this mapping do not matter so much. Many different sets of instructions can be used to represent the same function, as long as they give the same output for every possible input. The compiler can take advantage of this, by finding a faster set of instructions to ‘do the same thing’.
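To make the ‘same mapping, different instructions’ idea concrete, here’s a sketch with a made-up second version of add_four. The hypothetical add_four_the_long_way gets there by adding 1 four times. The check only covers inputs from -1000 to 999, not all four billion:

```rust
// One set of instructions...
fn add_four(num: i32) -> i32 {
    num + 4
}

// ...and a different, clumsier set of instructions.
fn add_four_the_long_way(num: i32) -> i32 {
    let mut result = num;
    for _ in 0..4 {
        result += 1; // add one, four times
    }
    result
}

fn main() {
    // For every input we check here, both give the same output: the same mapping.
    for n in -1000..1000 {
        assert_eq!(add_four(n), add_four_the_long_way(n));
    }
    println!("same function, different instructions");
}
```

An optimizing compiler may well turn the loop into a single addition, since nothing about the result would change.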

Thinking about functions as just a mapping only really works if we have what are known as ‘pure functions’: functions which don’t touch anything ‘outside’ the function, just quietly do their calculation and pass back the result. But there are other types of function…

side effects

println! is also a function. (Technically, it’s a macro that generates a function call. Don’t worry about that for now.) However, this kind of function doesn’t simply make a calculation and then politely return the result: it goes away and does something, in this case telling the operating system that we would like to print out some text, please.

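This is known as a ‘side effect’. In Rust, functions can have side effects. To do so, they need to reach something outside their own scope, such as the console. (Rust does use the word ‘capturing’, but for closures: a different kind of function that can grab variables from the scope it is defined in.)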

For now, the only side effect we will concern ourselves with is printing things to the console. Pure functions are easier to reason about and usually allow the compiler to make things a lot faster to boot.
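Here’s a rough sketch of the difference. It reuses add_four from earlier alongside a made-up announce function that exists only for its side effect:

```rust
// Pure: the result depends only on the input, and nothing else is touched.
fn add_four(num: i32) -> i32 {
    num + 4
}

// Not pure: printing to the console is a side effect. The return type is (),
// the "empty" value, because the useful work is the printing itself.
fn announce(num: i32) {
    println!("the number is {}", num);
}

fn main() {
    // A pure function called twice with the same input gives the same answer,
    // so the compiler may compute it once and reuse the result.
    let a = add_four(1);
    let b = add_four(1);
    println!("{}", a == b); // true

    // Calling announce twice prints twice: the effect is the whole point.
    announce(a);
    announce(a);
}
```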

types

Let’s talk about data types! What’s all this i32 stuff? What’s a ‘32-bit integer’ when it’s at home?

Well, an integer is a whole number. ‘32 bits’ means that it uses 32 bits (binary digits, each of which can be 0 or 1) to store the number. This is the same as four bytes, since a byte is 8 bits. i32 is a signed integer data type, which can store about four billion different whole numbers between \(-2^{31}=-2\,147\,483\,648\) and \(2^{31}-1=2\,147\,483\,647\). (You can get these numbers easily with i32::MIN and i32::MAX.)
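Here’s a quick way to check these claims yourself. It uses std::mem::size_of to ask how many bytes a type takes up:

```rust
use std::mem::size_of;

fn main() {
    // 32 bits = 4 bytes.
    println!("an i32 takes {} bytes", size_of::<i32>()); // 4

    // The limits: -2^31 and 2^31 - 1.
    println!("i32 runs from {} to {}", i32::MIN, i32::MAX);

    // Checking them against powers of two (done in i64, which has room to spare).
    println!("{}", i32::MIN as i64 == -(2i64.pow(31))); // true
    println!("{}", i32::MAX as i64 == 2i64.pow(31) - 1); // true
}
```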

Once again, this is an ‘as if’ rule that belongs to the abstract machine. As far as the hardware is concerned, bytes are just bytes. Let’s say you have a register containing the 32-bit binary value 0000 0000 0000 0000 0000 0101 0011 1001. What is that? Maybe it’s an integer with the value 1337 arranged in big-endian order. Maybe it’s a string of Unicode characters. Maybe it’s part of an image. The computer does not give a shit: it will do whatever its next instruction says. If the next instruction is 32-bit integer addition then it will treat those bits as a 32-bit integer.
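You can watch the same bits being read in different ways in Rust. This sketch takes the bit pattern above and reinterprets it as bytes and as a float, using the standard library’s to_be_bytes and f32::from_bits:

```rust
fn main() {
    // The 32-bit pattern from the text (underscores are just for readability).
    let bits: u32 = 0b0000_0000_0000_0000_0000_0101_0011_1001;

    // Read as an unsigned integer, it's 1337.
    println!("as an integer: {}", bits);

    // The same bits split into four bytes, most significant first.
    // The last byte, 0x39 (57), happens to be the ASCII code for the character '9'.
    println!("as bytes: {:?}", bits.to_be_bytes()); // [0, 0, 5, 57]

    // The same bits read as a 32-bit float: a very tiny positive number.
    println!("as a float: {:e}", f32::from_bits(bits));
}
```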

The compiler cares a lot, though. Rust is a statically and strongly typed language. As far as the language semantics are concerned, every name you assign has a data type, fixed when the program is compiled, which defines what you’re able to do with it.

We’ve not been explicitly assigning types so far, because the compiler is pretty clever: it can infer that if you are putting an integer in something, it’s intended to have an integer type. And by default, whole-number literals (numbers without a decimal point) that nothing else constrains have the type i32, so we can skip writing that.
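Here’s a small sketch of inference at work. It uses std::mem::size_of_val to reveal which type the compiler picked; the variable names are just for illustration:

```rust
use std::mem::size_of_val;

fn main() {
    // Nothing says what type this should be, so the integer literal defaults to i32.
    let plain = 1312;

    // This one starts out un-annotated too...
    let inferred = 200;
    // ...but using it where a u8 is expected lets the compiler infer that it is a u8.
    let byte: u8 = inferred;

    // Literals with a decimal point default to f64.
    let float = 13.12;

    // size_of_val reports how many bytes each value takes up.
    println!("{} {} {}", size_of_val(&plain), size_of_val(&byte), size_of_val(&float)); // 4 1 8
}
```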

We can be explicit if we want to, though. Let’s check out a few animals in the type zoo…

//this line introduces some names from elsewhere
use std::sync::{Arc, Mutex};

fn main() {
   let unsigned_acab : u32 = 1312;
   let bigger_acab : i64 = 1312;
   let float_acab : f32 = 1312.0;
   
   let acab_array : [i32; 5] = [1312, 1312, 1312, 1312, 1312];
   let acab_str_slice : &str = "1312";
   let silly_nonsense : Arc<Mutex<i32>> = Arc::new(Mutex::new(1312));
}

Here we have declared an unsigned 32-bit integer, a signed 64-bit integer, a 32-bit float, an array of integers, a string slice, and a slightly more complicated thing that I’m mostly including as foreshadowing ;p

The u32 uses the same space as an i32, but it only allows non-negative numbers, so it can store numbers from 0 up to \(2^{32}-1=4\,294\,967\,295\).

The i64 type uses twice as much space as an i32, allowing it to store numbers from \(-9\,223\,372\,036\,854\,775\,808\) to \(9\,223\,372\,036\,854\,775\,807\).

The f32 type is a floating point type. Floating point types are a special representation of numbers which allows fractions and large numbers to be stored in the same 32 bits. Just like i32 and u32, you have about four billion possible floats, but they are not spaced evenly on the number line: increasingly, more and more numbers are ‘missing’ the further away you get from zero. You also have special values like ‘infinity’ and ‘not a number’. Floats are quite weird and there are a lot of nuances to them; they are a leaky abstraction designed to approximately represent our intuitive idea of how real numbers work.

The last three are examples of slightly more complex types that you will one day be able to use in Rust. They are, respectively, an array, a string slice, and an “atomically reference-counted smart pointer containing a mutex containing a 32-bit integer”, which is the sort of thing you might sometimes need if you’re writing multithreaded programs.
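Here’s a sketch of some of that float weirdness, using f32. It shows one of the ‘missing’ numbers and the special values:

```rust
fn main() {
    // 2^24 = 16,777,216. Beyond this, f32 can no longer represent every whole number.
    let big: f32 = 16_777_216.0;
    // The true answer, 16,777,217, is one of the "missing" numbers, so the result rounds.
    let bigger = big + 1.0;
    println!("{}", bigger == big); // true

    // Special values: dividing by zero gives infinity, and 0/0 gives "not a number".
    let infinity = 1.0_f32 / 0.0;
    let not_a_number = 0.0_f32 / 0.0;
    println!("{} {}", infinity, not_a_number); // inf NaN

    // NaN is so weird that it isn't even equal to itself.
    println!("{}", not_a_number == not_a_number); // false
}
```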

But what’s the point of all this? Types exist so the compiler can help us. They let us express what something is supposed to be, and make sure, before our program even runs, that we don’t try to do something that doesn’t make sense. If a function is written on the assumption that its input is one thing, the types express that to the compiler so we don’t try to run it in a way that wouldn’t work.

types are enforced at compile time

Let’s add another line to this:

fn main() {
   let unsigned_acab : u32 = 1312;
   let bigger_acab : i64 = 1312;

   let causes_error = unsigned_acab + bigger_acab;
}

This last line will cause an error.

We’re trying to add a 32-bit unsigned integer to a 64-bit signed integer. Although that seems like it probably would make sense, Rust doesn’t know what sort of value you would want the result to be, so it hasn’t defined what addition does for these two types. Before you add the numbers together, you need to convert, or cast, the numbers into a compatible pair of types which can be added.

For simple numeric types like these, the conversion can be done with the as keyword:

fn main() {
   let unsigned_acab : u32 = 1312;
   let bigger_acab : i64 = 1312;

   let no_error = unsigned_acab as i64 + bigger_acab;
}

It’s always safe (in the sense that no information will be lost) to turn a u32 into an i64, but not the other way round. If the i64 is negative, or larger than the maximum u32, there is no mathematically correct answer for this operation. However, Rust has rules about what to do here. Essentially, casting down to a smaller type ‘truncates’ the integer, keeping only the lowest 32 bits. Negative values get a little bit complicated, so we’ll brush over them for now. (Though feel free to try casting some negative integers to unsigned integers and see what comes out!)
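Here’s a sketch of these rules in action. It also shows two standard-library alternatives to as. The first is i64::from, which only exists for conversions that can’t lose information. The second is u32::try_from, which reports failure instead of truncating. The negative case is left out so you can still try it yourself:

```rust
use std::convert::TryFrom;

fn main() {
    // Widening: every u32 fits in an i64, so nothing can be lost.
    let small: u32 = 1312;
    let widened = small as i64;
    // i64::from does the same job, but only compiles for lossless conversions.
    let also_widened = i64::from(small);

    // Narrowing with `as`: 2^32 + 5 doesn't fit in a u32, so only the lowest 32 bits survive.
    let too_big: i64 = 4_294_967_296 + 5;
    let chopped = too_big as u32; // 5

    // If you'd rather find out that the value doesn't fit, try_from returns an error instead.
    let checked = u32::try_from(too_big);

    println!("{} {} {} {}", widened, also_widened, chopped, checked.is_err()); // 1312 1312 5 true
}
```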

Does this mean that we should always use the biggest available number type, such as i128? This is where the abstraction starts to get leaky again! Most modern processors handle integers natively in 64-bit chunks. To handle a 128-bit number, the compiler is forced to generate multiple instructions, so the program can be slower. It will still follow the stipulations of the abstract machine… but the abstract machine is different from the real machine in an important way, and so the real machine has to do more work to satisfy the abstract machine’s demands.

There are other good reasons to want to use smaller types. For example, if you are crunching lots of repetitive data, like pixels, you want to pack the data into arrays, and it matters a lot how much space an array takes up, how quickly it can be loaded in and out of memory, and so on. In this case, there are good reasons to try to use the smallest type you can get away with.
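To put numbers on this, here’s a sketch comparing how much space a hypothetical full-HD greyscale image would take with one u8 per pixel versus one i64 per pixel. No image is actually created; std::mem::size_of just reports the size of the array type:

```rust
use std::mem::size_of;

// A hypothetical full-HD greyscale image: one brightness value per pixel.
const PIXELS: usize = 1920 * 1080;

fn main() {
    let as_u8 = size_of::<[u8; PIXELS]>(); // 1 byte per pixel
    let as_i64 = size_of::<[i64; PIXELS]>(); // 8 bytes per pixel
    println!("u8 pixels: {} bytes", as_u8); // 2073600
    println!("i64 pixels: {} bytes", as_i64); // 16588800
}
```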

Unsigned types are also useful because they can never hold a negative number. For example, the compiler will refuse to store -1 in a u32, so negative values can’t sneak in by accident. Indeed, one of the most powerful things we can do in programming is to create our own types to express the needs of our program to let the compiler help us out. We’ll do that soon, but let’s get back to the dungeon game.
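Here’s an example using hypothetical health and damage values in the dungeon game. An unsigned type plus the standard saturating_sub method keeps the result from going below zero:

```rust
fn main() {
    // let oops: u32 = -1; // uncommenting this is a compile error: a u32 can't be negative

    // Hypothetical dungeon numbers, for illustration only.
    let health: u32 = 3;
    let damage: u32 = 5;

    // Plain `health - damage` would go below zero, which panics in a debug build.
    // saturating_sub stops at zero instead, making the "can't go negative" rule explicit.
    let remaining = health.saturating_sub(damage);
    println!("{} health left", remaining); // 0 health left
}
```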

A function to display the dungeon

Let’s imagine one way of displaying an empty room in the dungeon. We could print it out, line by line, like this:

fn main() {
    println!("#############");
    println!("#...........#");
    println!("#...........#");
    println!("#...........#");
    println!("#############");
}

However, we probably want to print out rooms of different sizes, and in different positions, and so on. And in general, if you find yourself repeating yourself, it’s probably a good idea to make a function to express your intent more clearly.

Let’s start with loops.

loops

One thing that computers are really good at is doing the same thing over and over again. This is such a common construct that nearly every programming language has some way to represent it. It’s called a ‘loop’.

Although so far the story has concerned Rust, I need to briefly talk about some history here. C, C++, and similar languages write loops like this:

for (int i = 0; i < 10; i++) {
    //do some shit
}

In C, this says: declare a variable of integer type called i, starting at 0; before each pass through the loop, check whether i is less than 10, and stop if it isn’t; at the end of each pass, add 1 to i.

In other words, it says ‘for each number from 0 up to 9 inclusive, do some shit’.

In Rust, you can write a loop the same way:

let mut i = 0;
while i < 10 {
    //do some shit
    i = i+1;
}

However, this is not how Rust likes to do things. Rust would generally prefer that you ‘iterate over’ something. It looks like this…

for i in 0..10 {
    //do some shit
}

In a Rust for loop, we are saying something slightly different: we have a thing we’re walking along, and we say, for each time round the loop, take the ‘next thing’ until we run out of things. This is more similar to how Python does it than how C does it. (Note that this time, the variable i only lives in each iteration of the loop, and doesn’t ‘leak out’ into the outer scope.)

The thing we’ve written here, 0..10, is called a ‘range’. It’s Rust’s way of representing ‘the numbers from 0 up to, but not including, 10’. If you want to include 10, you can write 0..=10. You are describing your intent (do something with every number from 0 up to 9) more explicitly, rather than how you go about achieving that intent; the compiler takes care of the fiddly details.
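Here’s a sketch putting the C-style while loop and the range-based for loop side by side, with an inclusive range for comparison. Each one adds up the numbers it visits:

```rust
fn main() {
    // C-style counting with a while loop...
    let mut total_while = 0;
    let mut i = 0;
    while i < 10 {
        total_while += i;
        i += 1;
    }

    // ...and the same thing by iterating over a range (0 up to 9).
    let mut total_for = 0;
    for i in 0..10 {
        total_for += i;
    }

    // An inclusive range also visits the end point (0 up to 10).
    let mut total_inclusive = 0;
    for i in 0..=10 {
        total_inclusive += i;
    }

    println!("{} {} {}", total_while, total_for, total_inclusive); // 45 45 55
}
```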

Ranges are not the only thing we can iterate over, but they’re the only thing we need to iterate over for now.

Let’s try putting a loop in our room drawing program…

fn main() {
    let room_height = 3;
    println!("#############");
    for row in 0..room_height {
        println!("#...........#");
    }
    println!("#############");
}

We could also have another loop to represent the width of the room…

fn main() {
    let room_height = 3;
    let room_width = 9;
    for column in 0..room_width+2 {
        print!("#");
    }
    //print a newline to end the row
    println!();
    
    for row in 0..room_height {
        print!("#");
        for column in 0..room_width {
            print!(".");
        }
        println!("#");
    }
    
    for column in 0..room_width+2 {
        print!("#");
    }
    println!();
}

We can factor this out into a function, and pass the width and height in as arguments.

fn main() {
    print_room(11, 3);
}

fn print_room(width: u32, height: u32) {
    //the walls are 2 wider than the floor, to make room for the corners
    for column in 0..width+2 {
        print!("#");
    }
    //print a newline to end the row
    println!();
    
    for row in 0..height {
        print!("#");
        for column in 0..width {
            print!(".");
        }
        println!("#");
    }
    
    for column in 0..width+2 {
        print!("#");
    }
    }
    println!();
}

Hang on, how are we using print_room before it’s defined? That stuff I said about names only existing after the point where they’re declared applies to let bindings, not to functions. Functions can be declared in any order, and they are visible throughout the whole scope they are declared in, which in this case is the entire program.

That’s a little repetitive: we have the wall-drawing code twice. Let’s make that a function too…

fn main() {
    print_room(11, 3);
}

fn print_room(width: u32, height: u32) {
    print_horizontal_wall(width);
    
    for row in 0..height {
        print!("#");
        for column in 0..width {
            print!(".");
        }
        println!("#");
    }
    
    print_horizontal_wall(width);
}

fn print_horizontal_wall(width: u32) {
    for column in 0..width+2 {
        print!("#");
    }
    //print a newline to end the row
    println!();
}

By giving these individual bits of logic names, it is much easier to understand what the program is doing.
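Functions can be used before they are declared, and this also works for a function declared inside another function. Here’s a sketch with made-up names (quadruple and double):

```rust
// Calls `double` on a line before the one that defines it.
fn quadruple(n: i32) -> i32 {
    let result = double(double(n));

    // A function declared inside another function is visible throughout that
    // function's body, even above this point. Outside `quadruple`, `double` doesn't exist.
    fn double(n: i32) -> i32 {
        n * 2
    }

    result
}

fn main() {
    println!("{}", quadruple(3)); // 12
}
```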

buffering with arrays

This code will work, and it helpfully separates concerns like ‘how big is our room’ from ‘what characters do you use to draw a room’. However, it’s not a good basis for a game: we can’t easily put anything in this room, we can’t easily draw more than one room, and it may also be inefficient to make so many calls to print!, each of which potentially has to bring in a lot of extra machinery. We could theoretically add checks like ‘is there a character in this square’, but that’s going to make our code quite convoluted.

It would be much better if we could have a kind of ‘working area’ where we draw the rooms and characters as separate operations, and then print the whole thing out at once.

Which is a wonderful excuse to introduce array types!
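Here’s a rough preview of one possible shape for such a working area. It’s a fixed-size two-dimensional array of chars that we draw into, then print with a single call. The canvas size and the names draw_room and render are assumptions for illustration, not necessarily how the game will end up being written. It also uses & and &mut (references), which we haven’t covered yet:

```rust
// A fixed-size "canvas" to draw on before printing. The size is an arbitrary choice.
const WIDTH: usize = 20;
const HEIGHT: usize = 8;

// Hypothetical helper: draw a room's walls and floor into the canvas.
// (x, y) is the top-left corner; width and height are the floor size, as in print_room.
// Assumes the room fits inside the canvas; otherwise the indexing panics.
fn draw_room(canvas: &mut [[char; WIDTH]; HEIGHT], x: usize, y: usize, width: usize, height: usize) {
    for row in y..y + height + 2 {
        for column in x..x + width + 2 {
            let on_edge = row == y || row == y + height + 1 || column == x || column == x + width + 1;
            canvas[row][column] = if on_edge { '#' } else { '.' };
        }
    }
}

// Hypothetical helper: turn the whole canvas into one String, so it prints in one go.
fn render(canvas: &[[char; WIDTH]; HEIGHT]) -> String {
    let mut out = String::new();
    for row in canvas.iter() {
        for &c in row.iter() {
            out.push(c);
        }
        out.push('\n');
    }
    out
}

fn main() {
    let mut canvas = [[' '; WIDTH]; HEIGHT];
    draw_room(&mut canvas, 1, 1, 11, 3);
    canvas[2][5] = '@'; // putting a character in the room is a separate step
    print!("{}", render(&canvas));
}
```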

a third orb to ponder

You can clear the console by running

print!("{esc}c", esc = 27 as char);

which sends the escape character (code 27) followed by c, a special ‘escape sequence’ that instructs many consoles to reset and clear themselves. (How do I know that? I googled ‘rust clear console’ and found a Stack Overflow thread which provided various answers; this one worked.)

You can also run this code to make the program wait:

std::thread::sleep(std::time::Duration::from_millis(1000/60));

which will make the program stop doing anything for roughly 1/60 of a second (16 milliseconds, since 1000/60 is integer division and drops the remainder). Don’t worry for now about all this :: stuff: that’s just a way of looking up names. We’ll cover it more later.
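Because 1000/60 is integer division, that pause is slightly shorter than a true sixtieth of a second. If that matters, Duration::from_secs_f64 accepts fractional seconds. Here’s a quick comparison:

```rust
use std::time::Duration;

fn main() {
    // 1000/60 is integer division, so the fraction is dropped: 16 ms rather than about 16.67 ms.
    let rounded = Duration::from_millis(1000 / 60);
    // from_secs_f64 takes fractional seconds, so it gets much closer to 1/60 s.
    let closer = Duration::from_secs_f64(1.0 / 60.0);
    println!("{:?} vs {:?}", rounded, closer);
}
```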

Using these two commands, and an extra loop, can you make an animation which causes the room to expand?