Memory Management With Zig

This post is aimed at people who come from languages with automatic garbage collection and want to get a sense for how to approach memory management in a lower level language. I'll be using Zig (and a bit of Python) to demonstrate, but the concepts apply to C and other languages where you have to care if your data is going on the stack or the heap.

This is related to the post on pointers in Zig, and for a similar audience, but here we're going to look more at the memory management aspects than the syntax and what it means.

If you want to follow along with the code for this, the snippets shown are taken from more complete files available on sourcehut.

RPN Calculators

In order to look at how memory management works, we're going to use a concrete application. Reverse polish notation calculators are frequently used to teach basic concepts in computer science, like how stacks work. Conveniently, a stack is a data structure that grows and shrinks over its lifetime. So we're going to write an RPN calculator in Zig, and use it as a vehicle for exploring how to implement dynamic data structures.

Inconveniently, the call stack is a place where memory lives and has a frustratingly similar name. In the interest of clarity, I'll try to always refer to the call stack as the call stack or execution stack. If I'm talking about a stack and don't qualify it, I mean the data structure that holds values for the calculator.

Before looking at it in Zig, though, we're going to look at a simple Python implementation of an RPN calculator and briefly cover what they do. Ours will be a simple four-function calculator. Numbers will be pushed on a stack, and operators will pop 2 numbers off, apply the operation, and push the result back on.

Here's a Python implementation of the core of the calculator.

ops = {
    '+': lambda a, b: a + b,
    '-': lambda a, b: a - b,
    '*': lambda a, b: a * b,
    '/': lambda a, b: a / b,
}

def rpn(cmd: str, vals: list[float]):
    val = str2float(cmd) # Returns None if it isn't a float.
    if val is not None:
        vals.append(val)
        return None
    op = ops[cmd]
    b = vals.pop()
    a = vals.pop()
    vals.append(op(a, b))
    return vals[-1]

Let's break it down by all the memory it uses. We'll start with the local variables. Several local variables are instantiated in this function, and cease to matter when it returns: val, op, a, and b. These can all live on the execution stack, allocated by the runtime when the function starts and freed when it returns.

Next up are the arguments. The first one, cmd, is just a string to be interpreted. It's either a number, in which case it's pushed on the stack, an operation, in which case it's turned into a function, or something else, which causes an error to bubble up to the caller.

The second argument, though, is the stack we're using to do our work. This stack is lengthened and shortened over the life of the program. Since it's changing size at runtime, it's an appropriate thing to use with dynamic memory.

Zig Though

Python manages our memory for us, though, and we're here to talk about doing that manually. So let's take a look at what a stack looks like in Zig. Of course, the good way to implement a stack in Zig is probably to use std.ArrayList or std.SegmentedList. But we're here to learn about things, not to write production quality software. So instead, we'll write our own linked list implementation.

Allocators?

A brief note on allocators: Zig's philosophy of no hidden allocations is great, but it tends to make you think that you have to become an expert in all the different allocators in order to use Zig well. In most unmanaged languages, I just call malloc and never worry about what the allocator is doing under the hood. Zig feels different, because it gives us a choice to make here. That's ok though, because we're going to choose to use the default. You don't have to become an expert in heaps or arena allocators to use Zig any more than you have to know everything about ORMs and databases to use Django. std.heap.GeneralPurposeAllocator is good enough for our purposes, and probably the vast majority of allocation needs you'll ever have. However, if you do want to learn more about allocators, Benjamin Feng gave a talk about some of the simpler ones and the search term you want for the other ones is heap allocator.

That said, we can write our Stack implementation using idiomatic Zig and have it accept an Allocator as an argument. Problem deferred.

A Stack

This isn't a tutorial on generics, so I'm going to make a very bad stack that only holds 64-bit floats. Zig's generic support is probably among the nicest I've ever seen (see also: D) but it's not for us today. For now the focus is on memory management. To that end, our stack will be built out of everyone's favorite data structure: a linked list. Here's a definition of just the data parts:

pub const Stack = struct {
    const Self = @This();

    pub const Node = struct {
        next: ?*Node = null,
        data: f64,
    };

    head: ?*Node = null,
};

Our stack interface is going to have the same functions we used in the Python RPN implementation: push, pop, and peek. Let's look at how those might be implemented in Zig.

const std = @import("std");
const Allocator = std.mem.Allocator;

pub const Stack = struct {
    const Self = @This();

    pub const Node = struct {
        next: ?*Node = null,
        data: f64,
    };

    head: ?*Node = null,
    allocator: Allocator,

    pub fn push(self: *Self, val: f64) !void {
        const next: ?*Node = self.head;
        var new: *Node = try self.allocator.create(Node);
        new.next = next;
        new.data = val;
        self.head = new;
    }

    pub fn pop(self: *Self) ?f64 {
        var popped: *Node = self.head orelse return null;
        defer self.allocator.destroy(popped);
        self.head = popped.next;
        return popped.data;
    }

    pub fn peek(self: *Self) ?f64 {
        var next: *Node = self.head orelse return null;
        return next.data;
    }

    pub fn init(allocator: Allocator) Self {
        return Self{
            .head = null,
            .allocator = allocator,
        };
    }

    pub fn deinit(self: *Self) void {
        while (self.pop()) |_| {}
    }
};

As you can see, we've got some allocation going on. The peek function doesn't change the size of the data structure, but push and pop both do. When we want to add a node to the list, we have to get some space to put it in, and after reading the value in pop it's important to release the memory.

A Tangent

This is a good time to talk about what the allocator is actually doing. The std.mem.Allocator type provides us with an interface that has a lot of surface area, but we can focus on the important parts. The Allocator interface requires anything that implements it to expose a way to get memory and a way to give it back. Some allocators don't care that you've given the memory back, but that's not important.

The main function that user code will call from an Allocator is create(self: Allocator, comptime T: type) Error!*T. It gets enough memory to hold a T and hands a pointer back to the caller. Its counterpart, destroy(self: Allocator, ptr: anytype) void, gives the memory back. Note that destroy never fails. That's a Zig thing. Freeing resources doesn't fail.

There's also alloc and free, which you can use to get and release an array of a data type, but for our purposes create and destroy will suffice.

So what does an allocator do? It has some memory, and it keeps track of which bits are being used and which bits are free. When someone asks for some, it checks if there's enough free memory to fulfill the request hands out a pointer to it if there is.

Note that alloc returns an error or a pointer. There's no ? in the type, so it's not an optional pointer. So if it didn't fail with an error, you now have a pointer to the thing you requested.

When you're done with the memory the allocator gave you, it's polite to give it back. When writing libraries where you don't know which allocator will be used, it's very important to pair every call to alloc with one to destroy. That tells the allocator you're done with it and it may reuse the memory you had for someone else.

Allocators??

When I said earlier that we don't have to think about which allocator to use, because we're always just going to choose the default, I lied a little. I've written a lot of C, and almost never used anything but the default malloc in production. However, in tests, I very frequently use memcheck which replaces the default heap allocator. Zig conveniently has std.testing.allocator, which we can use to test our stack implementation. So that's what we'll do.

Stack Testing

Here's a little test to ensure that our stack is working properly. Note that if we leave the last 2 calls to pop out, or if we delete the defer self.allocator.destroy(popped) line, we'll get a useful error from the testing allocator about leaking memory.

const testing = std.testing;
const alloc = std.testing.allocator;

test "Create, push, pop" {
    var stack: Stack = Stack.init(alloc);
    try std.testing.expectEqual(@as(?f64, null), stack.peek());
    try std.testing.expectEqual(@as(?f64, null), stack.pop());
    try stack.push(2.5);
    try std.testing.expectEqual(@as(?f64, 2.5), stack.peek());
    try stack.push(3.5);
    try std.testing.expectEqual(@as(?f64, 3.5), stack.peek());
    try std.testing.expectEqual(@as(?f64, 3.5), stack.pop());
    try std.testing.expectEqual(@as(?f64, 2.5), stack.peek());
    try std.testing.expectEqual(@as(?f64, 2.5), stack.pop());
    try std.testing.expectEqual(@as(?f64, null), stack.peek());
    try std.testing.expectEqual(@as(?f64, null), stack.pop());
}

Application

Now we have a working stack implementation, it's time to wrap that RPN calculator around it. We can pretty closely follow the Python implementation.

const std = @import("std");
const Stack = @import("stack.zig").Stack;

fn str2f64(s: []const u8) ?f64 {
    return std.fmt.parseFloat(f64, s) catch return null;
}

fn add(a: f64, b: f64) f64 {
    return a + b;
}

fn sub(a: f64, b: f64) f64 {
    return a - b;
}

fn mul(a: f64, b: f64) f64 {
    return a * b;
}

fn div(a: f64, b: f64) f64 {
    return a / b;
}

fn rpn(cmd: []const u8, vals: *Stack) !?f64 {
    if (str2f64(cmd)) |val| {
        try vals.push(val);
        return null;
    }

    const b = vals.pop().?;
    const a = vals.pop().?;

    var result = switch (cmd[0]) {
        '+' => add(a, b),
        '-' => sub(a, b),
        '*' => mul(a, b),
        '/' => div(a, b),
        else => return error.BadOp,
    };

    try vals.push(result);
    return vals.peek();
}

As you can see, we've got the same val, a, and b execution stack variables. The transition to a less functional style turns op into result, but overall it's pretty clearly doing the same thing. Note that the allocation is all handled by the data structure. The calculator code doesn't need to worry about when memory is allocated or freed. It's on the caller to make sure that the Stack is properly initialized and cleaned up.

Of course, this implies the existence of a caller. Let's add a test that uses the calculator to do some math. The test, as the owner of the stack, is responsible for ensuring that it has its has init and deinit functions called.

const testing = std.testing;
const alloc = std.testing.allocator;
test "Reverse some polish notations." {
    var stack: Stack = Stack.init(alloc);
    defer stack.deinit();

    try testing.expectEqual(@as(?f64, null), try rpn("3.5", &stack));
    try testing.expectEqual(@as(?f64, null), try rpn("5.5", &stack));
    try testing.expectEqual(@as(?f64, 9.0), try rpn("+", &stack));
    try testing.expectEqual(@as(?f64, null), try rpn("83", &stack));
    try testing.expectEqual(@as(?f64, null), try rpn("3", &stack));
    _ = try rpn("/", &stack);
    try testing.expectEqual(@as(?f64, 249.0), try rpn("*", &stack));
    try testing.expectEqual(@as(?f64, null), try rpn("49", &stack));
    try testing.expectEqual(@as(?f64, 200.0), try rpn("-", &stack));
}

Memories

Let's take a look at what the memory looks like over the lifetime of this final program. We start at the top of the call stack, allocating a Stack object.

Stack object on the stack with null head pointer and an allocator

As you can see, the only variable on the execution stack here is stack. After being initialized, its head pointer is null and its allocator points to the testing allocator.

The next thing it does is push 3.5 and onto the stack. Let's look at what the memory looks like after that.

Stack object on the stack, node on the heap. The node's next pointer is null and its data is 3.5. The stack object also points to an allocator.

When we called vals.push(val) inside the rpn function, the Stack called Allocator.create. This gave us a pointer to enough uninitialized memory to hold one Node. Then push filled it in and stuck the address in the Stack's head field. Since the testing allocator uses the general purpose allocator under the hood, the memory it gave us was on the heap. Now we'll skip ahead a bit in the test to show what the memory looks like after a few things have been put on the stack. This is the program's state right before the / is sent:

Stack object on the stack, 3 nodes on the heap. The stack object's head pointer points to Node 3 with data 3.0. Node 3's next pointer points to node 2, which has data 83.0. Node 2's next pointer points to node 1, which has data 9.0. Node 1's next pointer is null. The stack object's alloc field still points to an Allocator.

The subsequent calls to push resulted in more memory allocated on the heap. This memory is not guaranteed to be contiguous (one reason not to use linked lists on modern computers) but it's all been given to us by the allocator.

When we get to the end of the test, the defer stack.deinit() runs and we give all the heap memory back to the allocator by way of pop's call to self.allocator.destroy. In the instant before our call stack frame is reclaimed, the memory looks just like it did at the start.

Stack object on the stack with null head pointer and an allocator

Concluding

Memory management isn't fundamentally all that hard. Just think about your data structures and how they behave at runtime and you'll be able to decide where everything ought to live.

You may have noticed that the singly linked list implementation in Zig's standard library actually doesn't include functions for allocating new nodes. That's because in the general case you might want to store the whole list using nodes that are already on the call stack, or you could want to store a list of structs that themselves have pointers to other data. Managing that is properly done by a higher level part of the system. If we had implemented our stack of floats using std.SinglyLinkedList, we still would have had to wrap it up in something that allocates and frees.

As I said at the start, though, we probably would have been better served by a std.ArrayList or std.SegmentedList. Our stack doesn't really need to give back its memory every time it shrinks a little.

Resource Management

Another thing I want to mention: memory is not the only resource we manage when developing software. I write a lot of embedded code, and sometimes I have to acquire exclusive access to some piece of hardware, do something with it, then leave it in a good state. The tools Zig gives us for cleaning up when we're done with memory (defer and errdefer) are useful for making sure the hardware is left in a good state when we're done with it. They can also be used to do things like releasing mutexes, closing files, and finishing database transactions.

This post uses a little bit of Python, and while Python manages memory for you, it also gives you tools like try/finally and with to do resource management. If you think of memory as just another resource to be managed, you might have an easier time in both managed and unmanaged languages.