Pointers in Zig

This post is going to explain how pointers, arrays, and slices work in Zig, without depending on knowledge of other languages that use pointers. If you're trying to learn Zig and your only programming experience is in a language with managed memory, this is for you. If you're struggling to understand pointers in C, this might help. If you're coming from C or C++ and just want to know the differences, you might be better off reading the official documentation for arrays, slices, and pointers. Ziglearn is another good resource.

Memory

Memory in computers that run Zig code is laid out as a large contiguous region of bytes, where each byte has an address. This model is pretty easy to visualize with a table:

Address  Value
0        3
1        12
2        87
3        255
4        9
5        0
6        0
7        0
8        11
9        57
10       144
11       254
12       93
13       13
14       228
15       65
16 bytes of memory with random data

This chunk of memory has the number 11 at address 8. Any value stored in memory takes up some number of contiguous bytes. That number varies depending on the type, but no matter how large it is, a value's address is the lowest-numbered address it occupies. So a 4-byte u32 value that occupies addresses 8 through 11 would be said to be at address 8.

Using the table above as an example, if we have a u16 stored at address 0, we know that it consists of two bytes: 3 and 12 (0x03 and 0x0c in hex). Depending on the machine's endianness (the order in which bytes are interpreted), that means our u16 is either 3,075 (0x0c03) or 780 (0x030c). Every value that's stored in memory works like this.
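
To make the two readings concrete, here's a minimal sketch that combines the bytes 3 and 12 by hand. The helper names asLittleEndian and asBigEndian are made up for this illustration; they aren't part of Zig's standard library.

```zig
const std = @import("std");

// Hypothetical helpers. `first` is the byte at the lower address.

// Little-endian: the byte at the lowest address is the least significant.
fn asLittleEndian(first: u8, second: u8) u16 {
    return (@as(u16, second) << 8) | first;
}

// Big-endian: the byte at the lowest address is the most significant.
fn asBigEndian(first: u8, second: u8) u16 {
    return (@as(u16, first) << 8) | second;
}

test "two bytes, two possible u16 values" {
    try std.testing.expect(asLittleEndian(3, 12) == 3075); // 0x0c03
    try std.testing.expect(asBigEndian(3, 12) == 780); // 0x030c
}
```

The bytes in memory are identical in both cases; only the interpretation differs.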

Arrays

Sometimes we want to store several values of the same type. For example, a sequence of characters to make the string "Take off every zig" needs to live in memory somehow. We could store each of those letters as ASCII in its own u8 and feed them in sequence to an output function to print the string, but computers are supposed to do work for us, not the other way around.

Instead, we can store them all in order next to each other in memory, and feed the whole chunk to a print function. This bunch of adjacent values that are all the same type is called an array. Arrays in Zig are defined by their length and the type of value that they hold, with the length in [square brackets] right before the type. So "Take off every zig" is an [18]u8, and if we know that nobody will ever modify its contents, we can declare it const. Here it is laid out in memory, at address 51236:

Take off every zig
Address  Value
51236    'T'
51237    'a'
51238    'k'
51239    'e'
51240    ' '
51241    'o'
51242    'f'
51243    'f'
51244    ' '
51245    'e'
51246    'v'
51247    'e'
51248    'r'
51249    'y'
51250    ' '
51251    'z'
51252    'i'
51253    'g'

That chunk of memory from 51236 to 51253 is the array. We can create an array just like it, taking advantage of Zig's array literal length inference syntax ([_]T), with the following Zig code:


var orders: [18]u8 = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ', 'e', 'v', 'e', 'r',
        'y', ' ', 'z', 'i', 'g',
};
      

That's what arrays look like in memory and how to define them. If we wanted to write a function that could accept any 18-byte sequence but promises not to change its contents, it would start with something like
fn look(array: [18]u8) void.
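
Here's a small sketch of what working with that array looks like: its length, indexing into it, and handing it to a function. The countSpaces helper is hypothetical and only exists for this example.

```zig
const std = @import("std");

// Hypothetical helper: counts the spaces in an 18-byte array.
// It only reads from `array`; it never writes to it.
fn countSpaces(array: [18]u8) usize {
    var count: usize = 0;
    var i: usize = 0;
    while (i < array.len) : (i += 1) {
        if (array[i] == ' ') count += 1;
    }
    return count;
}

test "array length and indexing" {
    const orders = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ', 'e', 'v', 'e', 'r',
        'y', ' ', 'z', 'i', 'g',
    };
    try std.testing.expect(orders.len == 18); // the length is part of the type
    try std.testing.expect(orders[0] == 'T'); // indices start at 0
    try std.testing.expect(orders[17] == 'g'); // the last index is len - 1
    try std.testing.expect(countSpaces(orders) == 3);
}
```

Because the length is part of the type, passing a [17]u8 or a [19]u8 to countSpaces would be a compile error.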

Pointers

A pointer is a value that holds the address of another value. In Zig, you can get the address of any value by using the address-of operator &. The type of a pointer is an asterisk (*) before the type of the value to which it points. So a *u8 points to a u8. Here's a little bit of sample Zig code, along with its associated memory. We'll use a machine with 8-bit pointers so we don't have to worry about endianness.


var num: u8 = 12;
const ptr: *u8 = &num;
try std.testing.expectEqual(ptr.*, num);
      
A u8 and its pointer
Address  Value  Note
128      12     num
129      165    uninitialized
130      165
131      165
132      128    ptr
133      165
134      165
135      165

The code included one more new thing: dereferencing. The value of ptr is 128, but we don't particularly care what its value is; we want to know the value of the thing it points to. To ask Zig for the value pointed to by a pointer, we use the dereference operator .*. This is the inverse of the address-of operator (&).
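
Dereferencing works for writing as well as reading. This is a minimal sketch, assuming a reasonably recent Zig where std.testing.expect returns an error that has to be handled with try:

```zig
const std = @import("std");

test "reading and writing through a pointer" {
    var num: u8 = 12;
    const ptr: *u8 = &num; // ptr holds the address of num

    // Reading through the pointer gives us num's value.
    try std.testing.expect(ptr.* == 12);

    // Writing through the pointer changes num itself.
    ptr.* = 13;
    try std.testing.expect(num == 13);
}
```

Note that ptr is declared const and that's fine: the pointer itself never changes, only the u8 it points to.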

This kind of pointer (*T where T is a type) only refers to exactly one item at a time. There is another kind of pointer in Zig, [*]T. This is called an unknown-length pointer, and it points to an arbitrary number of values. Using our orders array from the last section, we can get a pointer to its elements with this code:


var orders: [18]u8 = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ', 'e', 'v', 'e', 'r',
        'y', ' ', 'z', 'i', 'g',
};
var order_ptr: [*]u8 = &orders;
      

This lets us address individual elements of the array, like order_ptr[3], which would have the value 'e'. Unfortunately, this also lets us ask for order_ptr[100] and Zig will happily tell us whatever is 100 bytes away from the start of order_ptr in memory, even though it's not part of orders. This can lead to undetected errors in code, which are the worst errors. The problem here is that unknown-length pointers don't have known lengths. Shocking, right?
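
Here's a sketch of the well-behaved case, staying inside the array. Nothing in the pointer's type would stop us from using an index past 17, which is exactly the problem.

```zig
const std = @import("std");

test "indexing through an unknown-length pointer" {
    var orders = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ', 'e', 'v', 'e', 'r',
        'y', ' ', 'z', 'i', 'g',
    };
    const order_ptr: [*]u8 = &orders;

    // Reading an element that really is part of orders.
    try std.testing.expect(order_ptr[3] == 'e');

    // Writing through the pointer modifies the array it points into.
    order_ptr[0] = 't';
    try std.testing.expect(orders[0] == 't');
}
```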

Slices

Slices solve this problem. You can slice an array with the slicing syntax: array[start..end] where start is the index of the first element that will get sliced and end is the index 1 past the last element. The end parameter can be left off if you want to take the rest of the array, so array[0..] is a slice comprising all the elements in array and array[0..10] is the first 10 elements.

You can think of a slice like a struct with an unknown-length pointer and a length.


fn Slice(comptime T: type) type {
    return struct {
        ptr: [*]T,
        len: usize,
    };
}
      

Zig comes with some extra syntax support for slices, so the type Slice(u8) is referred to as []u8 and the 3rd element of a slice named slice is slice[2]. Raw unknown-length pointers are best avoided in Zig; slices are much safer and easier to use. Going back to our trusty orders example, here's some code and associated memory layout with a slice.


var orders: [18]u8 = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ', 'e', 'v', 'e', 'r',
        'y', ' ', 'z', 'i', 'g',
};
var slice: []const u8 = orders[0..];
      
Slices of arrays
Address  Value  Note
232      236    pointer to orders
233      18     length of orders
234      165    uninitialized
235      165
236      'T'    beginning of orders
237      'a'
238      'k'
239      'e'
240      ' '
241      'o'
242      'f'
243      'f'
244      ' '
245      'e'
246      'v'
247      'e'
248      'r'
249      'y'
250      ' '
251      'z'
252      'i'
253      'g'

The slicing syntax orders[0..] says to take a slice of the orders array, starting at index 0 and going to the end. We could get just "off" by slicing orders[5..8] out of the array instead.
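
A short sketch of both slices in action. It also shows that a slice doesn't copy anything: it's a view into the same memory as the array.

```zig
const std = @import("std");

test "slices carry a length and share memory with the array" {
    var orders = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ', 'e', 'v', 'e', 'r',
        'y', ' ', 'z', 'i', 'g',
    };
    const all: []u8 = orders[0..];
    const off: []const u8 = orders[5..8];

    try std.testing.expect(all.len == 18);
    try std.testing.expect(off.len == 3);
    try std.testing.expect(std.mem.eql(u8, off, "off"));

    // Both slices point into orders, so a write through one
    // is visible through the other.
    all[5] = 'O';
    try std.testing.expect(off[0] == 'O');
}
```

Because a slice knows its length, Zig can check indexes against it. In the safety-checked build modes, indexing past the end of a slice is caught instead of quietly reading unrelated memory.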

const

I've sprinkled the const keyword through this post without really explaining what it does. This is the part where that changes. Whenever you see const in a Zig type, it means that whatever comes next can't be changed. So a slice of u8s declared as []const u8 means that the slice itself can change, but the u8s to which it points can't. Here's some code that shows const in both positions.


var orders: [18]u8 = [_]u8{
    'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ', 'e', 'v', 'e', 'r',
    'y', ' ', 'z', 'i', 'g',
};
var slice: []const u8 = orders[0..];
slice = orders[5..8]; // Legal
slice[0] = 'X'; // error: cannot assign to constant
const slice2: []u8 = orders[0..];
slice2[0] = 'X'; // Legal
slice2 = orders[5..8]; // error: cannot assign to constant
      

So values declared var can change, even if they point to things that can't, and values declared const can't, even if they point to things that can. This is important for...
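
The same rule applies to single-item pointers. Here's a sketch with only the legal operations, so it should compile; the variable names reader and writer are just for illustration.

```zig
const std = @import("std");

test "const in pointer types" {
    var a: u8 = 1;
    const b: u8 = 2;

    // A var pointer to const data: we can re-aim it,
    // but we can't write through it.
    var reader: *const u8 = &a;
    reader = &b;
    try std.testing.expect(reader.* == 2);

    // A const pointer to mutable data: we can write through it,
    // but we can't re-aim it.
    const writer: *u8 = &a;
    writer.* = 10;
    try std.testing.expect(a == 10);
}
```

Trying reader.* = 5 or writer = &a here would be rejected at compile time.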

Function Parameters

Function parameters in Zig are always implicitly const. There's no way to make them var. Consider the following 3 tests:


fn times2(a: u32) u32 {
    return a * 2;
}

test "use function" {
    var a: u32 = 3;
    const b = times2(a);
}

test "inline" {
    const a: u32 = 3;
    const b: u32 = a * 2;
}

test "var" {
    var a: u32 = 3;
    a = a * 2;
    const b = a;
}
      

The first two tests are equivalent, because a is const when it's multiplied by two. The third example is different. It modifies the value of a in place before assigning it to b. Note that the const keyword doesn't appear at all in the definition of times2, but if we had tried to do a = a * 2; return a; in that function, the compiler would have spat out error: cannot assign to constant. That's what "function parameters in Zig are always implicitly const" means.
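
If a function really wants a mutable version of a parameter, the usual workaround is to copy it into a local var. A minimal sketch of times2 written that way:

```zig
const std = @import("std");

fn times2(a: u32) u32 {
    // `a = a * 2;` would not compile here: parameters are constants.
    var result = a; // a local copy is free to change
    result *= 2;
    return result;
}

test "modifying a copy instead of the parameter" {
    try std.testing.expect(times2(3) == 6);
}
```

The caller's value is untouched either way; only the local copy changes.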

Limiting the language to pure functions seems nice, but sometimes you need a function to modify its arguments in place. For example, let's write a function to reverse an array of u8s. It will have to take the array's length as a comptime parameter, since array lengths must be known at compile time.


fn reverse(comptime L: usize, array: [L]u8) [L]u8 {
    // Build the reversed copy in a second array of the same length.
    var out: [L]u8 = undefined;
    for (array, 0..) |elem, idx| {
        out[L - idx - 1] = elem;
    }
    return out;
}
      

This function has some problems. First is that it doubles the amount of memory required by the array. Even if we're never going to use the un-reversed version of the array again, it's still there, unmodified. The other major issue is that we have to know the array's length at compile time. That means no reversing dynamically allocated memory for us. We can get around both of these by switching from an array-reverser to a slice-reverser.
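
Here's a sketch that shows the first problem directly. It uses a hypothetical reversedCopy, which does the same job as the function above but with a while loop, and checks that the input array is still sitting there unchanged after the call.

```zig
const std = @import("std");

// Hypothetical variant of the array-reverser, written with a while loop.
fn reversedCopy(comptime L: usize, array: [L]u8) [L]u8 {
    var out: [L]u8 = undefined;
    var i: usize = 0;
    while (i < L) : (i += 1) {
        out[L - i - 1] = array[i];
    }
    return out;
}

test "the original array is left untouched" {
    const forward = [_]u8{ 'z', 'i', 'g' };
    const backward = reversedCopy(forward.len, forward);

    // Two full arrays now exist: the reversed copy and the original.
    try std.testing.expect(std.mem.eql(u8, &backward, "giz"));
    try std.testing.expect(std.mem.eql(u8, &forward, "zig"));
}
```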

Instead of accepting and returning [L]u8, it can just accept a []u8, reverse the slice in place, and return void. Let's look at the standard library's implementation of std.mem.reverse.


pub fn swap(comptime T: type, a: *T, b: *T) void {
    const tmp = a.*;
    a.* = b.*;
    b.* = tmp;
}

pub fn reverse(comptime T: type, items: []T) void {
    var i: usize = 0;
    const end = items.len / 2;
    while (i < end) : (i += 1) {
        swap(T, &items[i], &items[items.len - i - 1]);
    }
}
      

This has an extra T parameter to make it generic across slices of any type, but it's still pretty straightforward. We walk up to the midpoint of the slice, swapping each element with its pair on the other side of the midpoint. No extra memory required, and no chance of failure. Since swap can't modify its parameters directly, it takes a pair of pointers and swaps the values they point to.

Let's do one more memory layout example that uses std.mem.reverse.


var orders: [18]u8 = [_]u8{
    'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ', 'e', 'v', 'e', 'r',
    'y', ' ', 'z', 'i', 'g',
};
std.mem.reverse(u8, orders[0..]);
      
Before and after reversing
Address  Value Before  Value After  Note
0        2             2            slice.ptr (slice)
1        18            18           slice.len
2        'T'           'g'          orders[0] (orders)
3        'a'           'i'          orders[1]
4        'k'           'z'          orders[2]
5        'e'           ' '          orders[3]
6        ' '           'y'          orders[4]
7        'o'           'r'          orders[5]
8        'f'           'e'          orders[6]
9        'f'           'v'          orders[7]
10       ' '           'e'          orders[8]
11       'e'           ' '          orders[9]
12       'v'           'f'          orders[10]
13       'e'           'f'          orders[11]
14       'r'           'o'          orders[12]
15       'y'           ' '          orders[13]
16       ' '           'e'          orders[14]
17       'z'           'k'          orders[15]
18       'i'           'a'          orders[16]
19       'g'           'T'          orders[17]
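
The "after" column can be checked with a short test. This sketch only relies on std.mem.reverse and std.mem.eql from the standard library:

```zig
const std = @import("std");

test "reversing orders in place" {
    var orders = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ', 'e', 'v', 'e', 'r',
        'y', ' ', 'z', 'i', 'g',
    };

    // The slice is only a view; the bytes that move are the array's own.
    std.mem.reverse(u8, orders[0..]);

    try std.testing.expect(std.mem.eql(u8, &orders, "giz yreve ffo ekaT"));
}
```

The slice's pointer and length are the same before and after. Only the contents of orders change.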

Conclusion

Hopefully pointers, arrays, and slices are somewhat demystified now. If I did a good job, you should have a solid grasp of the basics. If you want to learn more, here are some topics that might be of interest.