Pointers in Zig
This post is going to explain how pointers, arrays, and slices work in Zig, without depending on knowledge of other languages that use pointers. If you're trying to learn Zig and your only programming experience is in a language with managed memory, this is for you. If you're struggling to understand pointers in C, this might help. If you're coming from C or C++ and just want to know the differences, you might be better off reading the official documentation for arrays, slices, and pointers. Ziglearn is another good resource.
Memory
Memory in computers that run Zig code is laid out as a large contiguous region of bytes, where each byte has an address. This model is pretty easy to visualize with a table:
Address | Value |
---|---|
0 | 3 |
1 | 12 |
2 | 87 |
3 | 255 |
4 | 9 |
5 | 0 |
6 | 0 |
7 | 0 |
8 | 11 |
9 | 57 |
10 | 144 |
11 | 254 |
12 | 93 |
13 | 13 |
14 | 228 |
15 | 65 |
This chunk of memory has the number 11 at address 8. Any value stored in memory takes up some number of contiguous bytes. That number varies depending on the type, but no matter how large it is, a value's address is the lowest-numbered address it occupies. So a 4-byte u32 value that occupies addresses 8 through 11 would be said to be at address 8.
Using the table above as an example, if we have a u16 stored at address 0, we know that it consists of two bytes: 3 and 12 (0x03 and 0x0c in hex). Depending on the machine's endianness (the order in which bytes are interpreted), that means our u16 is either 3,075 (0x0c03, on a little-endian machine) or 780 (0x030c, on a big-endian machine). Every value that's stored in memory works like this.
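To make the two readings concrete, here is a minimal sketch that builds both u16 values by hand from the bytes at addresses 0 and 1. The helper name combine is made up for this example; it is not part of the standard library.

```zig
const std = @import("std");

// Hypothetical helper: build a u16 from two bytes, with `lo` as the
// low-order byte and `hi` as the high-order byte.
fn combine(lo: u8, hi: u8) u16 {
    return (@as(u16, hi) << 8) | @as(u16, lo);
}

test "two bytes, two possible u16 values" {
    // The bytes at addresses 0 and 1 in the table above.
    const bytes = [2]u8{ 3, 12 };

    // Little-endian: the byte at the lowest address is the low-order byte.
    const little = combine(bytes[0], bytes[1]);
    // Big-endian: the byte at the lowest address is the high-order byte.
    const big = combine(bytes[1], bytes[0]);

    try std.testing.expect(little == 0x0c03); // 3,075
    try std.testing.expect(big == 0x030c); // 780
}
```

Saved to a file, this should run with zig test. The same two bytes give different numbers depending only on which one we treat as the high-order byte.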
Arrays
Sometimes we want to store several values of the same type. For example, a sequence of characters to make the string "Take off every zig" needs to live in memory somehow. We could store each of those letters as ASCII in its own u8 and feed them in sequence to an output function to print the string, but computers are supposed to do work for us, not the other way around.
Instead, we can store them all in order next to each other in memory, and feed the whole chunk to a print function. This bunch of adjacent values that are all the same type is called an array. Arrays in Zig are defined by their length and the type of value that they hold, with the length in [square brackets] right before the type. So "Take off every zig" is an [18]u8, or, if we know that nobody will ever modify its contents, we can declare it const. Here it is laid out in memory, at address 51236:
Address | Value |
---|---|
51236 | 'T' |
51237 | 'a' |
51238 | 'k' |
51239 | 'e' |
51240 | ' ' |
51241 | 'o' |
51242 | 'f' |
51243 | 'f' |
51244 | ' ' |
51245 | 'e' |
51246 | 'v' |
51247 | 'e' |
51248 | 'r' |
51249 | 'y' |
51250 | ' ' |
51251 | 'z' |
51252 | 'i' |
51253 | 'g' |
That chunk of memory from 51236 to 51253 is the array. We can create an array just like it, taking advantage of Zig's array literal length inference syntax ([_]T), with the following Zig code:
    var orders: [18]u8 = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ',
        'e', 'v', 'e', 'r', 'y', ' ', 'z', 'i', 'g',
    };
That's what arrays look like in memory and how to define them. If we wanted to write a function that could accept any 18-byte sequence but promises not to change its contents, it would start with something like fn look(array: [18]u8) void.
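Here is a small sketch of an array being declared, inspected, and handed to a function that only reads it. The function countSpaces is a made-up stand-in for look; it returns a count so there's something to check.

```zig
const std = @import("std");

// Hypothetical read-only function in the spirit of `look`: it accepts
// exactly 18 bytes and counts the spaces without modifying anything.
fn countSpaces(array: [18]u8) usize {
    var spaces: usize = 0;
    for (array) |char| {
        if (char == ' ') spaces += 1;
    }
    return spaces;
}

test "array length, indexing, and passing to a function" {
    const orders = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ',
        'e', 'v', 'e', 'r', 'y', ' ', 'z', 'i', 'g',
    };

    // The length is part of the array's type, so it is always available.
    try std.testing.expect(orders.len == 18);
    try std.testing.expect(orders[0] == 'T');
    try std.testing.expect(orders[17] == 'g');
    try std.testing.expect(countSpaces(orders) == 3);
}
```

Because the length is part of the type, passing a [17]u8 or a [19]u8 to countSpaces would be rejected at compile time.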
Pointers
A pointer is a value that holds the address of another value. In Zig, you can get the address of any value by using the address-of operator &. The type of a pointer is an asterisk (*) before the type of the value to which it points. So a *u8 points to a u8. Here's a little bit of sample Zig code, along with its associated memory. We'll use a machine with 8-bit pointers so we don't have to worry about endianness.
    var num: u8 = 12;
    const ptr: *u8 = &num;
    try std.testing.expectEqual(ptr.*, num);
Address | Value | Note |
---|---|---|
128 | 12 | num |
129 | 165 | uninitialized |
130 | 165 | |
131 | 165 | |
132 | 128 | ptr |
133 | 165 | |
134 | 165 | |
135 | 165 | |
The code included one more new thing: dereferencing. The value of ptr is 128, but we don't particularly care what its value is; we want to know the value of the thing it points to. To ask Zig for the value pointed to by a pointer, we use the dereference operator .*. This is the inverse of the address-of operator (&).
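Dereferencing works for writing as well as reading. The following is a minimal sketch (the variable names match the sample above, the rest is illustrative) showing that a write through the pointer changes the original value.

```zig
const std = @import("std");

test "address-of and dereference" {
    var num: u8 = 12;
    const ptr: *u8 = &num;

    // Reading through the pointer yields the value stored in num.
    try std.testing.expect(ptr.* == 12);

    // Writing through the pointer changes num itself.
    ptr.* = 13;
    try std.testing.expect(num == 13);

    // Taking the address of the dereferenced value gives the pointer back.
    try std.testing.expect(&ptr.* == ptr);
}
```

Note that ptr itself is declared const: the pointer can't be aimed somewhere else, but the u8 it points to can still be changed.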
This kind of pointer (*T where T is a type) only refers to exactly one item at a time. There is another kind of pointer in Zig, [*]T. This is called an unknown-length pointer (the Zig documentation calls it a many-item pointer), and it points to an arbitrary number of values. Using our orders array from the last section, we can get a pointer to its elements with this code:
    var orders: [18]u8 = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ',
        'e', 'v', 'e', 'r', 'y', ' ', 'z', 'i', 'g',
    };
    var order_ptr: [*]u8 = &orders;
This lets us address individual elements of the array, like order_ptr[3], which would have the value 'e'. Unfortunately, this also lets us ask for order_ptr[100] and Zig will happily tell us whatever is 100 bytes away from order_ptr in memory, even though it's not part of orders. This can lead to errors that are difficult to detect and fix. The problem here is that unknown-length pointers don't have known lengths (surprise) so access through them can't be checked to ensure that it doesn't go past the end of the array.
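As a quick sketch of the in-bounds case, here is the unknown-length pointer being indexed and written through. Only valid indices are used, since reading order_ptr[100] would touch memory that doesn't belong to the array.

```zig
const std = @import("std");

test "indexing through an unknown-length pointer" {
    var orders = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ',
        'e', 'v', 'e', 'r', 'y', ' ', 'z', 'i', 'g',
    };
    const order_ptr: [*]u8 = &orders;

    // In-bounds reads behave just like indexing the array.
    try std.testing.expect(order_ptr[3] == 'e');

    // Writes go straight through to the array's memory.
    order_ptr[0] = 't';
    try std.testing.expect(orders[0] == 't');

    // order_ptr[100] would also compile: nothing in the type [*]u8
    // records that the array stops at index 17.
}
```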
Slices
Slices solve this problem. You can get a slice from an array with the slicing syntax: array[start..end] where start is the index of the first element that will be in the slice and end is the index 1 past the last element. The end parameter can be left off if you want to take the rest of the array, so array[0..] is a slice comprising all the elements in array and array[0..10] is the first 10 elements.
You can think of a slice like a struct with an unknown-length pointer and a length.
    fn Slice(comptime T: type) type {
        return struct {
            ptr: [*]T,
            len: usize,
        };
    }
Zig comes with some extra syntax support for slices, so the type Slice(u8) is referred to as []u8 and the third element of a slice named slice is slice[2]. Raw unknown-length pointers are best avoided in Zig; slices are much safer and easier to use. Going back to our trusty orders example, here's some code and associated memory layout with a slice.
    var orders: [18]u8 = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ',
        'e', 'v', 'e', 'r', 'y', ' ', 'z', 'i', 'g',
    };
    var slice: []const u8 = orders[0..];
Address | Value | Note |
---|---|---|
232 | 236 | pointer to orders |
233 | 18 | length of orders |
234 | 165 | uninitialized |
235 | 165 | |
236 | 'T' | beginning of orders |
237 | 'a' | |
238 | 'k' | |
239 | 'e' | |
240 | ' ' | |
241 | 'o' | |
242 | 'f' | |
243 | 'f' | |
244 | ' ' | |
245 | 'e' | |
246 | 'v' | |
247 | 'e' | |
248 | 'r' | |
249 | 'y' | |
250 | ' ' | |
251 | 'z' | |
252 | 'i' | |
253 | 'g' | |
The slicing syntax orders[0..] says to take a slice of the orders array, starting at index 0 and going to the end. We could get just "off" by slicing orders[5..8] out of the array instead.
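The sketch below checks both slices described above. It assumes nothing beyond the orders array and the standard library's std.mem.eql, and it also shows that a slice refers to the array's memory rather than copying it.

```zig
const std = @import("std");

test "a slice is a pointer plus a length" {
    var orders = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ',
        'e', 'v', 'e', 'r', 'y', ' ', 'z', 'i', 'g',
    };

    const all: []u8 = orders[0..];
    const off: []u8 = orders[5..8];

    // Each slice knows how many elements it covers.
    try std.testing.expect(all.len == 18);
    try std.testing.expect(off.len == 3);
    try std.testing.expect(std.mem.eql(u8, off, "off"));

    // A slice does not copy anything: off[0] and orders[5] are the same byte.
    try std.testing.expect(&off[0] == &orders[5]);
    off[0] = 'O';
    try std.testing.expect(orders[5] == 'O');

    // Unlike a [*]u8, an index past off.len is detected in
    // safety-checked build modes instead of reading neighboring memory.
}
```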
const
I've sprinkled the const keyword through this post without really explaining what it does. This is the part where that changes. Whenever you see const in a Zig type, it means that whatever comes next can't be changed. So a slice of u8s declared as []const u8 means that the slice itself can change, but the u8s to which it points can't. Here's some code that shows const applied first to the pointed-to values and then to the slice itself.
    var orders: [18]u8 = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ',
        'e', 'v', 'e', 'r', 'y', ' ', 'z', 'i', 'g',
    };
    var slice: []const u8 = orders[0..];
    slice = orders[5..8]; // Legal
    slice[0] = 'X'; // error: cannot assign to constant
    const slice2: []u8 = orders[0..];
    slice2[0] = 'X'; // Legal
    slice2 = orders[5..8]; // error: cannot assign to constant
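The same rule applies to single-item pointers. Here is a minimal sketch with made-up variable names, with the lines that would fail to compile left as comments so the test still runs.

```zig
const std = @import("std");

test "const on either side of a pointer" {
    var num: u8 = 12;
    var other: u8 = 99;
    other += 1; // other is now 100

    // A variable holding a pointer to a constant u8:
    // it can be re-pointed, but not written through.
    var read_only: *const u8 = &num;
    read_only = &other; // Legal
    // read_only.* = 0; // would not compile: cannot assign to constant
    try std.testing.expect(read_only.* == 100);

    // A constant holding a pointer to a mutable u8:
    // it can be written through, but not re-pointed.
    const fixed: *u8 = &num;
    fixed.* = 13; // Legal
    // fixed = &other; // would not compile: cannot assign to constant
    try std.testing.expect(num == 13);
}
```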
So values declared var can change, even if they point to things that can't, and values declared const can't, even if they point to things that can. This is important for...
Function Parameters
Function parameters in Zig are always implicitly const. There's no way to make them var. Consider the following 3 tests:
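    fn times2(a: u32) u32 {
        return a * 2;
    }
    test "use function" {
        // Newer compilers may insist on const here, since a is never mutated.
        var a: u32 = 3;
        const b = times2(a);
        _ = b; // discard b: unused locals are a compile error
    }
    test "inline" {
        const a: u32 = 3;
        const b: u32 = a * 2;
        _ = b;
    }
    test "var" {
        var a: u32 = 3;
        a = a * 2;
        const b = a;
        _ = b;
    }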
The first two tests are equivalent, because a is const when it's multiplied by two. The third example is different. It modifies the value of a in place before assigning it to b. Note that the const keyword doesn't appear at all in the definition of times2, but if we had tried to do a = a * 2; return a; in that function, the compiler would have spat out error: cannot assign to constant. That's what "function parameters in Zig are always implicitly const" means.
Limiting the language to pure functions seems nice, but sometimes you need a function to modify its arguments in place. For example, let's write a function to reverse an array of u8s. It will have to take the array's length as a comptime parameter, since array lengths must be known at compile time.
    fn reverse(comptime L: usize, array: [L]u8) [L]u8 {
        var out: [L]u8 = undefined;
        // Zig 0.11 and later; older compilers wrote: for (array) |elem, idx|
        for (array, 0..) |elem, idx| {
            out[L - idx - 1] = elem;
        }
        return out;
    }
This function has some problems. The first is that it doubles the amount of memory required by the array: even if we're never going to use the un-reversed version again, it's still there, unmodified. The other major issue is that we have to know the array's length at compile time. That means no reversing dynamically allocated memory for us. We can get around both of these by switching from an array-reverser to a slice-reverser.
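To see the first problem in action, here is a sketch of the array reverser being called. It is written with a while loop rather than an indexed for loop so that it doesn't depend on which for syntax your compiler accepts; otherwise it is meant to behave the same as the function above.

```zig
const std = @import("std");

// Array reverser equivalent to the one above, using a while loop.
fn reverse(comptime L: usize, array: [L]u8) [L]u8 {
    var out: [L]u8 = undefined;
    var idx: usize = 0;
    while (idx < L) : (idx += 1) {
        out[L - idx - 1] = array[idx];
    }
    return out;
}

test "the array reverser returns a second array" {
    const original = [_]u8{ 'z', 'i', 'g' };
    const reversed = reverse(3, original);

    // The result is reversed...
    try std.testing.expectEqualStrings("giz", &reversed);
    // ...but the original is still around, untouched.
    try std.testing.expectEqualStrings("zig", &original);
}
```

Both arrays exist at the same time, and the length 3 has to be written into the call where the compiler can see it.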
Instead of accepting and returning [L]u8, it can just accept a []u8, reverse the slice in place, and return void. Let's look at the standard library's implementation of std.mem.reverse.
    pub fn swap(comptime T: type, a: *T, b: *T) void {
        const tmp = a.*;
        a.* = b.*;
        b.* = tmp;
    }
    pub fn reverse(comptime T: type, items: []T) void {
        var i: usize = 0;
        const end = items.len / 2;
        while (i < end) : (i += 1) {
            swap(T, &items[i], &items[items.len - i - 1]);
        }
    }
This has an extra T parameter to make it generic across slices of any type, but it's still pretty straightforward. We walk up to the midpoint of the slice, swapping each element with its pair on the other side of the midpoint. No extra memory required, and no chance of failure. Since swap can't modify its parameters directly, it takes a pair of pointers and swaps the values they point to.
Let's do one more memory layout example that uses std.mem.reverse.
    var orders: [18]u8 = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ',
        'e', 'v', 'e', 'r', 'y', ' ', 'z', 'i', 'g',
    };
    std.mem.reverse(u8, orders[0..]);
Address | Value Before | Value After | Note | Variable |
---|---|---|---|---|
0 | 2 | 2 | slice.ptr | slice |
1 | 18 | 18 | slice.len | |
2 | 'T' | 'g' | orders[0] | orders |
3 | 'a' | 'i' | orders[1] | |
4 | 'k' | 'z' | orders[2] | |
5 | 'e' | ' ' | orders[3] | |
6 | ' ' | 'y' | orders[4] | |
7 | 'o' | 'r' | orders[5] | |
8 | 'f' | 'e' | orders[6] | |
9 | 'f' | 'v' | orders[7] | |
10 | ' ' | 'e' | orders[8] | |
11 | 'e' | ' ' | orders[9] | |
12 | 'v' | 'f' | orders[10] | |
13 | 'e' | 'f' | orders[11] | |
14 | 'r' | 'o' | orders[12] | |
15 | 'y' | ' ' | orders[13] | |
16 | ' ' | 'e' | orders[14] | |
17 | 'z' | 'k' | orders[15] | |
18 | 'i' | 'a' | orders[16] | |
19 | 'g' | 'T' | orders[17] | |
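The "Value After" column can be checked with a short test. This sketch uses only std.mem.reverse and std.testing.expectEqualStrings from the standard library.

```zig
const std = @import("std");

test "std.mem.reverse works in place" {
    var orders = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ',
        'e', 'v', 'e', 'r', 'y', ' ', 'z', 'i', 'g',
    };

    // The slice gives reverse a pointer and a length;
    // the bytes it rearranges are the array's own.
    std.mem.reverse(u8, orders[0..]);

    try std.testing.expectEqualStrings("giz yreve ffo ekaT", orders[0..]);
}
```

No second array is created: orders itself holds the reversed text once the call returns.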
Conclusion
Hopefully pointers, arrays, and slices are somewhat demystified now. If I did a good job, you should have a solid grasp of the basics. If you want to learn more, here are some topics that might be of interest.