Pointers in Zig
This post is going to explain how pointers, arrays, and slices work in Zig, without depending on knowledge of other languages that use pointers. If you’re trying to learn Zig and your only programming experience is in a language with managed memory, this is for you. If you’re struggling to understand pointers in C, this might help. If you’re coming from C or C++ and just want to know the differences, you might be better off reading the official documentation for arrays, slices, and pointers. zig.guide is another good resource.
Memory
Memory in computers that run Zig code is laid out as a large contiguous region of bytes, where each byte has an address. This model is pretty easy to visualize with a table:
Address | Value |
---|---|
0 | 3 |
1 | 12 |
2 | 87 |
3 | 255 |
4 | 9 |
5 | 0 |
6 | 0 |
7 | 0 |
8 | 11 |
9 | 57 |
10 | 144 |
11 | 254 |
12 | 93 |
13 | 13 |
14 | 228 |
15 | 65 |
This chunk of memory has the number 11 at address 8. Any value stored in memory takes up some number of contiguous bytes. That number varies depending on the type, but no matter how large it is, a value’s address is the lowest-numbered address it occupies. So a 4-byte `u32` value that occupies addresses 8 through 11 would be said to be at address 8.
Using the table above as an example, if we have a `u16` stored at address 0, we know that it consists of two bytes: 3 and 12 (0x03 and 0x0c in hexadecimal). Depending on the machine’s endianness (the order in which the bytes of a multi-byte value are stored), that means our `u16` is either 3,075 (0x0c03, little-endian) or 780 (0x030c, big-endian). Every value that’s stored in memory works like this.
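To make the two readings concrete, here is a small sketch that assembles both interpretations by hand from the bytes at addresses 0 and 1. The names `readLittle` and `readBig` are made up for this example; they are not standard library functions.

```zig
const std = @import("std");

// Hypothetical helpers, written only to illustrate the paragraph above.

/// Little-endian: the byte at the lowest address is the least significant.
fn readLittle(bytes: [2]u8) u16 {
    return (@as(u16, bytes[1]) << 8) | bytes[0];
}

/// Big-endian: the byte at the lowest address is the most significant.
fn readBig(bytes: [2]u8) u16 {
    return (@as(u16, bytes[0]) << 8) | bytes[1];
}

test "one pair of bytes, two possible values" {
    const bytes = [2]u8{ 3, 12 }; // the values at addresses 0 and 1
    try std.testing.expectEqual(@as(u16, 3075), readLittle(bytes)); // 0x0c03
    try std.testing.expectEqual(@as(u16, 780), readBig(bytes)); // 0x030c
}
```

Running `zig test` on a file containing this block should pass on either kind of machine, because the helpers spell out the byte order explicitly instead of relying on the hardware.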
Arrays
Sometimes we want to store several values of the same type. For example, a sequence of characters to make the string “Take off every zig” needs to live in memory somehow. We could store each of those letters as ASCII in its own `u8` and feed them in sequence to an output function to print the string, but computers are supposed to do work for us, not the other way around.
Instead, we can store them all in order next to each other in memory, and feed the whole chunk to a print function. This bunch of adjacent values that are all the same type is called an array. Arrays in Zig are defined by their length and the type of value that they hold, with the length in `[square brackets]` right before the type. So “Take off every zig” is an `[18]u8`. If we know that nobody will ever modify its contents, we can also declare it `const`. Here it is laid out in memory, at address 51236:
Address | Value |
---|---|
51236 | 'T' |
51237 | 'a' |
51238 | 'k' |
51239 | 'e' |
51240 | ' ' |
51241 | 'o' |
51242 | 'f' |
51243 | 'f' |
51244 | ' ' |
51245 | 'e' |
51246 | 'v' |
51247 | 'e' |
51248 | 'r' |
51249 | 'y' |
51250 | ' ' |
51251 | 'z' |
51252 | 'i' |
51253 | 'g' |
That chunk of memory from 51236 to 51253 is the array. We can create an array just like it, taking advantage of Zig’s array literal length inference syntax (`[_]T`), with the following Zig code:
var orders: [18]u8 = [_]u8{
    'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ',
    'e', 'v', 'e', 'r', 'y', ' ',
    'z', 'i', 'g',
};
That’s what arrays look like in memory and how to define them. If we wanted to write a function that accepts any 18-byte sequence but promises not to change its contents, it would start with something like
fn look(array: [18]u8) void
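Here is a minimal sketch of a function with that shape. `countSpaces` is a hypothetical name chosen for this example; the point is only that an `[18]u8` parameter accepts exactly 18 bytes and that the array carries its length in its type.

```zig
const std = @import("std");

/// Hypothetical example function (not part of the standard library).
/// It accepts exactly 18 bytes and only reads them.
fn countSpaces(array: [18]u8) usize {
    var count: usize = 0;
    for (array) |byte| {
        if (byte == ' ') count += 1;
    }
    return count;
}

test "an array knows its own length" {
    const orders = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ',
        'e', 'v', 'e', 'r', 'y', ' ',
        'z', 'i', 'g',
    };
    try std.testing.expectEqual(@as(usize, 18), orders.len);
    try std.testing.expectEqual(@as(u8, 'T'), orders[0]);
    try std.testing.expectEqual(@as(u8, 'g'), orders[orders.len - 1]);
    try std.testing.expectEqual(@as(usize, 3), countSpaces(orders));
}
```

Passing an array of any other length to `countSpaces` is a compile error, which is the restriction the later sections work around.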
Pointers
A pointer is a value that holds the address of another value. In Zig, you can get the address of any value by using the address-of operator `&`. The type of a pointer is spelled with an asterisk (`*`) before the type of the value to which it points. So a `*u8` points to a `u8`. Here’s a little bit of sample Zig code, along with its associated memory. We’ll use a machine with 8-bit pointers so we don’t have to worry about endianness.
var num: u8 = 12;
const ptr: *u8 = &num;
try std.testing.expectEqual(ptr.*, num);
Address | Value | Note |
---|---|---|
128 | 12 | num |
129 | 165 | uninitialized |
130 | 165 | |
131 | 165 | |
132 | 128 | ptr |
133 | 165 | |
134 | 165 | |
135 | 165 | |
The code included one more new thing: dereferencing. The value of `ptr` is 128, but we don’t particularly care what its value is; we want to know the value of the thing it points to. To ask Zig for the value pointed to by a pointer, we use the dereference operator `.*`. This is the inverse of the address-of operator (`&`).
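Dereferencing works for writing as well as reading. The sketch below stores a new value through the pointer and then hands the address to a function; `increment` is a hypothetical helper written for this example.

```zig
const std = @import("std");

/// Hypothetical helper: adds one to whatever `ptr` points at.
fn increment(ptr: *u8) void {
    ptr.* += 1;
}

test "writing through a pointer changes the original" {
    var num: u8 = 12;
    const ptr: *u8 = &num;

    ptr.* = 13; // store through the pointer
    try std.testing.expectEqual(@as(u8, 13), num);

    increment(&num); // the function receives the address, not a copy
    try std.testing.expectEqual(@as(u8, 14), ptr.*);
}
```

Note that `ptr` itself is `const`: the address it holds never changes, only the `u8` stored at that address.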
This kind of pointer (`*T` where `T` is a type) only refers to exactly one item at a time. There is another kind of pointer in Zig, `[*]T`. This is called an unknown-length pointer (the Zig documentation calls it a many-item pointer), and it points to an arbitrary number of values. Using our `orders` array from the last section, we can get a pointer to its elements with this code:
var orders: [18]u8 = [_]u8{
    'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ',
    'e', 'v', 'e', 'r', 'y', ' ',
    'z', 'i', 'g',
};
var order_ptr: [*]u8 = &orders;
This lets us address individual elements of the array, like `order_ptr[3]`, which would have the value `'e'`. Unfortunately, this also lets us ask for `order_ptr[100]` and Zig will happily tell us whatever is 100 bytes away from `order_ptr` in memory, even though it’s not part of `orders`. This can lead to errors that are difficult to detect and fix. The problem here is that unknown-length pointers don’t have known lengths (surprise) so access through them can’t be checked to ensure that it doesn’t go past the end of the array.
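As a small sketch of the well-behaved case, the test below reads and writes through a `[*]u8` while staying inside the array. `byteAt` is a hypothetical helper; notice that its signature gives it no way to know how many bytes it is allowed to touch.

```zig
const std = @import("std");

/// Hypothetical helper: reads one byte through a pointer with no length.
/// Nothing here can check that `index` is inside the original array.
fn byteAt(ptr: [*]const u8, index: usize) u8 {
    return ptr[index];
}

test "indexing through an unknown-length pointer" {
    var orders = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ',
        'e', 'v', 'e', 'r', 'y', ' ',
        'z', 'i', 'g',
    };
    const order_ptr: [*]u8 = &orders;

    try std.testing.expectEqual(@as(u8, 'e'), order_ptr[3]);
    try std.testing.expectEqual(@as(u8, 'e'), byteAt(order_ptr, 3));

    // Writes go straight to the array's memory.
    order_ptr[0] = 't';
    try std.testing.expectEqual(@as(u8, 't'), orders[0]);
}
```

The test deliberately never indexes past element 17, since nothing would catch the mistake.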
Slices
Slices solve this problem. You can get a slice from an array with the slicing syntax: `array[start..end]` where `start` is the index of the first element that will be in the slice and `end` is the index 1 past the last element. The `end` parameter can be left off if you want to take the rest of the array, so `array[0..]` is a slice comprising all the elements in `array` and `array[0..10]` is the first 10 elements.
You can think of a slice like a struct with an unknown-length pointer and a length.
fn Slice(comptime T: type) type {
    return struct {
        ptr: [*]T,
        len: usize,
    };
}
Zig comes with some extra syntax support for slices, so the type `Slice(u8)` is referred to as `[]u8` and the 3rd element of a slice named `slice` is `slice[2]`. Raw unknown-length pointers are best avoided in Zig; slices are much safer and easier to use. Going back to our trusty `orders` example, here’s some code and associated memory layout with a slice.
var orders: [18]u8 = [_]u8{
    'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ',
    'e', 'v', 'e', 'r', 'y', ' ',
    'z', 'i', 'g',
};
var slice: []const u8 = orders[0..];
Address | Value | Note |
---|---|---|
232 | 236 | pointer to orders |
233 | 18 | length of orders |
234 | 165 | uninitialized |
235 | 165 | |
236 | 'T' | beginning of orders |
237 | 'a' | |
238 | 'k' | |
239 | 'e' | |
240 | ' ' | |
241 | 'o' | |
242 | 'f' | |
243 | 'f' | |
244 | ' ' | |
245 | 'e' | |
246 | 'v' | |
247 | 'e' | |
248 | 'r' | |
249 | 'y' | |
250 | ' ' | |
251 | 'z' | |
252 | 'i' | |
253 | 'g' | |
The slicing syntax `orders[0..]` says to take a slice of the `orders` array, starting at index 0 and going to the end. We could get just “off” by slicing `orders[5..8]` out of the array instead.
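The sketch below puts those two slices side by side. `countByte` is a hypothetical helper; because it takes a `[]const u8`, it works on a slice of any length and its loop stops at the slice’s own `len`.

```zig
const std = @import("std");

/// Hypothetical helper: counts the bytes in `text` equal to `needle`.
/// The slice carries its length, so the loop knows where to stop.
fn countByte(text: []const u8, needle: u8) usize {
    var count: usize = 0;
    for (text) |byte| {
        if (byte == needle) count += 1;
    }
    return count;
}

test "slices are views into the array" {
    var orders = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ',
        'e', 'v', 'e', 'r', 'y', ' ',
        'z', 'i', 'g',
    };
    const whole: []u8 = orders[0..];
    const off: []u8 = orders[5..8];

    try std.testing.expectEqual(@as(usize, 18), whole.len);
    try std.testing.expectEqual(@as(usize, 3), off.len);
    try std.testing.expect(std.mem.eql(u8, off, "off"));
    try std.testing.expectEqual(@as(usize, 2), countByte(off, 'f'));

    // A slice doesn't copy anything: writing through it changes the array.
    off[0] = 'O';
    try std.testing.expectEqual(@as(u8, 'O'), orders[5]);
}
```

In safe build modes, indexing `off[3]` with a run-time index would be caught as an out-of-bounds access instead of silently reading the neighboring byte.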
const
I’ve sprinkled the `const` keyword through this post without really explaining what it does. This is the part where that changes.
Whenever you see `const` in a Zig type, it means that whatever comes next can’t be changed. So when a slice of `u8`s is declared as `[]const u8`, the slice itself can change, but the `u8`s to which it points can’t. In the following example, the “whatever comes next” is `u8` in the first declaration and `slice2` in the second.
var orders: [18]u8 = [_]u8{
    'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ',
    'e', 'v', 'e', 'r', 'y', ' ',
    'z', 'i', 'g',
};
var slice: []const u8 = orders[0..];
slice = orders[5..8]; // Legal
slice[0] = 'X'; // error: cannot assign to constant
const slice2: []u8 = orders[0..];
slice2[0] = 'X'; // Legal
slice2 = orders[5..8]; // error: cannot assign to constant
So values declared `var` can change, even if they point to things that can’t, and values declared `const` can’t, even if they point to things that can. This is important for…
Function Parameters
Function parameters in Zig are always implicitly `const`. There’s no way to make them `var`. Consider the following 3 tests:
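const std = @import("std");

fn times2(a: u32) u32 {
    return a * 2;
}
test "use function" {
    var a: u32 = 3;
    _ = &a; // newer compilers reject a var that is never mutated
    const b = times2(a);
    try std.testing.expectEqual(@as(u32, 6), b);
}
test "inline" {
    const a: u32 = 3;
    const b: u32 = a * 2;
    try std.testing.expectEqual(@as(u32, 6), b);
}
test "var" {
    var a: u32 = 3;
    a = a * 2;
    const b = a;
    try std.testing.expectEqual(@as(u32, 6), b);
}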
The first two tests are equivalent, because `a` is `const` when it’s multiplied by two. The third example is different. It modifies the value of `a` in place before assigning it to `b`. Note that the `const` keyword doesn’t appear at all in the definition of `times2`, but if we had tried to do `a = a * 2; return a;` in that function, the compiler would have spat out `error: cannot assign to constant`. That’s what “function parameters in Zig are always implicitly `const`” means.
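If a function really wants a value it can modify, one option is to copy the parameter into a local `var` first. This is only a sketch; `times2Local` is a hypothetical variant of `times2` written for this example.

```zig
const std = @import("std");

/// Hypothetical variant of `times2` that works on a mutable local copy.
fn times2Local(a: u32) u32 {
    // a = a * 2; // would not compile: parameters are constants
    var doubled = a; // copy the parameter into a local variable
    doubled *= 2;
    return doubled;
}

test "the caller's value is untouched" {
    const x: u32 = 3;
    const y = times2Local(x);
    try std.testing.expectEqual(@as(u32, 6), y);
    try std.testing.expectEqual(@as(u32, 3), x);
}
```

The copy lives only inside the function, so the caller never sees it change.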
Limiting the language to pure functions seems nice, but sometimes you need a function to modify its arguments in place. For example, let’s write a function to reverse an array of `u8`s. It will have to take the array’s length as a comptime parameter, since array lengths must be known at compile time.
fn reverse(comptime L: usize, array: [L]u8) [L]u8 {
    var out: [L]u8 = undefined;
    // Zig 0.11 and later pair each element with its index using `0..`.
    for (array, 0..) |elem, idx| {
        out[L - idx - 1] = elem;
    }
    return out;
}
This function has some problems. First is that it doubles the amount of memory required by the array. Even if we’re never going to use the un-reversed version of the array again, it’s still there, unmodified. The other major issue is that we have to know the array’s length at compile time. That means no reversing dynamically allocated memory for us. We can get around both of these by switching from an array-reverser to a slice-reverser.
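To see the first problem in action, here is a sketch that calls an array-reverser and checks both arrays afterwards. `reversedCopy` is a stand-in for the `reverse` function above, renamed and written with a `while` loop so that it doesn’t depend on which `for` syntax your compiler accepts.

```zig
const std = @import("std");

/// Stand-in for the array-based `reverse` from the text (hypothetical name).
fn reversedCopy(comptime L: usize, array: [L]u8) [L]u8 {
    var out: [L]u8 = undefined;
    var i: usize = 0;
    while (i < L) : (i += 1) {
        out[L - i - 1] = array[i];
    }
    return out;
}

test "the original array is still around afterwards" {
    const word = [_]u8{ 'z', 'i', 'g' };
    // The length must be a compile-time value; `word.len` qualifies.
    const flipped = reversedCopy(word.len, word);

    try std.testing.expectEqualStrings("giz", &flipped);
    try std.testing.expectEqualStrings("zig", &word); // unchanged
    // Both arrays now exist, so the data occupies twice the memory.
}
```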
Instead of accepting and returning `[L]u8`, the function can just accept a `[]u8`, reverse the slice in place, and return `void`. Let’s look at the standard library’s implementation of `std.mem.reverse`.
pub fn swap(comptime T: type, a: *T, b: *T) void {
    const tmp = a.*;
    a.* = b.*;
    b.* = tmp;
}
pub fn reverse(comptime T: type, items: []T) void {
    var i: usize = 0;
    const end = items.len / 2;
    while (i < end) : (i += 1) {
        swap(T, &items[i], &items[items.len - i - 1]);
    }
}
This has an extra `T` parameter to make it generic across slices of any type, but it’s still pretty straightforward. We walk up to the midpoint of the slice, swapping each element with its pair on the other side of the midpoint. No extra memory required, and no chance of failure. Since `swap` can’t modify its parameters directly, it takes a pair of pointers and swaps the values they point to.
Let’s do one more memory layout example that uses `std.mem.reverse`.
var orders: [18]u8 = [_]u8{
    'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ',
    'e', 'v', 'e', 'r', 'y', ' ',
    'z', 'i', 'g',
};
std.mem.reverse(u8, orders[0..]);
Address | Value Before | Value After | Note | |
---|---|---|---|---|
12 | 128 | 128 | slice.ptr | slice |
13 | 18 | 18 | slice.len | |
… | … | … | … | ??? |
128 | 'T' | 'g' | orders[0] | orders |
129 | 'a' | 'i' | orders[1] | |
130 | 'k' | 'z' | orders[2] | |
131 | 'e' | ' ' | orders[3] | |
132 | ' ' | 'y' | orders[4] | |
133 | 'o' | 'r' | orders[5] | |
134 | 'f' | 'e' | orders[6] | |
135 | 'f' | 'v' | orders[7] | |
136 | ' ' | 'e' | orders[8] | |
137 | 'e' | ' ' | orders[9] | |
138 | 'v' | 'f' | orders[10] | |
139 | 'e' | 'f' | orders[11] | |
140 | 'r' | 'o' | orders[12] | |
141 | 'y' | ' ' | orders[13] | |
142 | ' ' | 'e' | orders[14] | |
143 | 'z' | 'k' | orders[15] | |
144 | 'i' | 'a' | orders[16] | |
145 | 'g' | 'T' | orders[17] | |
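If you’d like to check the “Value After” column yourself, the sketch below runs the same call inside a test. The second test is an extra illustration that assumes the testing allocator from the standard library; it shows the slice version also working on memory whose length is only known at run time.

```zig
const std = @import("std");

test "std.mem.reverse works in place" {
    var orders = [_]u8{
        'T', 'a', 'k', 'e', ' ', 'o', 'f', 'f', ' ',
        'e', 'v', 'e', 'r', 'y', ' ',
        'z', 'i', 'g',
    };
    std.mem.reverse(u8, orders[0..]);
    // Same bytes, same addresses, opposite order.
    try std.testing.expectEqualStrings("giz yreve ffo ekaT", &orders);
}

test "the same call handles dynamically allocated memory" {
    const allocator = std.testing.allocator;
    const buf = try allocator.dupe(u8, "zig"); // heap copy of the bytes
    defer allocator.free(buf);

    std.mem.reverse(u8, buf);
    try std.testing.expectEqualStrings("giz", buf);
}
```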
Conclusion
Hopefully pointers, arrays, and slices are somewhat demystified now. If I did a good job, you should have a solid grasp of the basics. If you want to learn more, here are some topics that might be of interest.