Go beginners may have difficulty wrapping their heads around the concept of pointers. This pointer FAQ explains why and how pointers are used in Go.
Memory cells in main memory are continuously numbered. The number of a given cell is called its address. A pointer to a variable represents the starting address of the memory cells in which that variable is stored in the running process' main memory.
Consider the following code snippet.
package main
import "fmt"
func main() {
a := 42 // create an integer
p := &a // create a pointer to a
fmt.Println(" p is", p) // print p -> this prints the address
fmt.Println("*p is", *p) // read the value at address p. "*p" is called "pointer indirection"
}
If Go is new to you, here is how to decipher the pointer syntax.
&a returns the memory address of the integer variable a. The result is a pointer type.
This address gets assigned to p. So p is now of type pointer to int. If you want to declare p beforehand, you would write var p *int to declare that p is a pointer to int.
*p returns the value at the address that p points to.
What happens in this code? Variable a is stored in a memory cell with a certain address, say, 0x8198. Variable p is a pointer to a, and it contains the address of a in memory.
The output of the above code is:
p is 0x8198
*p is 42
(The address value varies.)
The basic concept is super simple. Point to a cell where a value is stored.
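To make the syntax a bit more tangible, here is a small sketch that extends the snippet above. It declares p beforehand with var p *int and also writes through the pointer, which the original snippet does not do. The helper name setThroughPointer is made up for this illustration.

```go
package main

import "fmt"

// setThroughPointer is a hypothetical helper for this illustration.
// It stores v in the variable that p points to.
func setThroughPointer(p *int, v int) {
	*p = v // write to the memory cell at address p
}

func main() {
	a := 42

	var p *int                         // declare p as "pointer to int"
	fmt.Println("p == nil:", p == nil) // p == nil: true

	p = &a                   // now p holds the address of a
	fmt.Println("*p is", *p) // *p is 42

	setThroughPointer(p, 7)
	fmt.Println("a is", a) // a is 7
}
```

The last line shows the point of the exercise: writing to *p changes a itself, because p and a refer to the same memory cell.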
In Go, pointers have two main use cases.
Pointers can be used to describe complex data structures, like linked lists or binary trees.
In a binary tree, for example, every node can have zero, one, or two child nodes. These child nodes can be represented by a pointer to a node.
package main
type node[T any] struct {
data T
left *node[T]
right *node[T]
}
If a child node does not exist, the pointer to that child node contains the value nil. This is a pointer's zero value.
The value nil represents the absence of a value. There is no address to point to.
Code can traverse a tree made of nodes by visiting the left and right children until it finds the pointer to be nil.
// Print is a method of node that prints this node
// and all of its child nodes recursively, with indenting.
func (n *node[T]) Print(indent string) {
if n == nil {
fmt.Printf("%s<nil>\n", indent)
return
}
fmt.Printf("%s%v\n", indent, n.data)
n.left.Print(indent + " ")
n.right.Print(indent + " ")
}
(Playground: create and print a tree)
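As a rough sketch of how such a tree can be built and traversed, the following program creates a tiny tree from struct literals and counts its nodes. The method size is an assumption made for this example and does not appear in the original code. It relies on the same trick as Print: a method with a pointer receiver can be called on a nil pointer, as long as it checks for nil before touching any field.

```go
package main

import "fmt"

type node[T any] struct {
	data  T
	left  *node[T]
	right *node[T]
}

// size is a hypothetical helper, not part of the original article.
// It counts the nodes by following the child pointers until it hits nil.
func (n *node[T]) size() int {
	if n == nil {
		return 0 // a nil pointer means: no node here
	}
	return 1 + n.left.size() + n.right.size()
}

func main() {
	// Build a tiny tree by hand:
	//     2
	//    / \
	//   1   3
	//        \
	//         4
	root := &node[int]{
		data: 2,
		left: &node[int]{data: 1},
		right: &node[int]{
			data:  3,
			right: &node[int]{data: 4},
		},
	}
	fmt.Println(root.size()) // 4

	var empty *node[int]      // nil: a tree without any nodes
	fmt.Println(empty.size()) // 0
}
```

Child pointers that are not set in a struct literal keep their zero value, nil, so the leaves need no special treatment.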
If you want to learn more about binary trees in Go, start with A Binary Search Tree · Applied Go and continue to Balancing a binary search tree · Applied Go. And here is how I turned the balanced version of this tree into a generic tree.
Pointers can also be used for letting different parts of the program access the same data. In fact, this is the use case pointers were originally invented for.
Example: a function creates a struct and passes a pointer to that struct around to several other functions. Each of the called functions can modify the data in place.
NOTE: The following code snippet is kept simple to demonstrate data sharing, but it is not yet correct. The functions do not check the pointer parameter for nil-ness. I'll explain this further below.
package main
import "fmt"
type data struct {
val int
steps int
}
func addTwo(d *data) {
d.val += 2
d.steps++
}
func square(d *data) {
d.val *= d.val
d.steps++
}
func negate(d *data) {
d.val = -d.val
d.steps++
}
func main() {
// minus34 is a pointer to a new data struct
minus34 := &data{
val: 6,
}
// all functions receive a pointer to the data
// they can manipulate the original struct
square(minus34)
negate(minus34)
addTwo(minus34)
fmt.Println(minus34.val, minus34.steps)
}
(Result: -34 3).
You can see this pattern mostly with methods that need to change their receiver. Such methods declare a pointer receiver, the same way the node.Print() method above does (although Print itself only reads the node).
It is entirely possible to achieve something similar without using pointers, like so:
func addTwo(d data) data {
d.val += 2
d.steps++
return d
}
func square(d data) data {
d.val *= d.val
d.steps++
return d
}
func negate(d data) data {
d.val = -d.val
d.steps++
return d
}
func main() {
minus34 := data{
val: 6,
}
minus34 = square(minus34)
minus34 = negate(minus34)
minus34 = addTwo(minus34)
fmt.Println(minus34.val, minus34.steps)
}
(Result: -34 3).
Here, addTwo, square, and negate receive a copy of minus34 and return the calculation result as a new value. The caller (main()) assigns the result back to minus34 each time.
(If you have a background in functional languages, or just love deeply nested expressions, you probably will want to write the code as
minus34 = addTwo(negate(square(data{val: 6})))
which is perfectly valid.)
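To see the two variants side by side, consider this minimal sketch. The function names squareCopy and squareInPlace are invented for the comparison. The sketch deliberately discards the return value of the copying variant to show that the caller's struct stays untouched.

```go
package main

import "fmt"

type data struct {
	val   int
	steps int
}

// squareCopy works on its own copy of d and returns the changed copy.
func squareCopy(d data) data {
	d.val *= d.val
	d.steps++
	return d
}

// squareInPlace changes the struct that d points to.
func squareInPlace(d *data) {
	d.val *= d.val
	d.steps++
}

func main() {
	six := data{val: 6}

	squareCopy(six)                 // result discarded on purpose
	fmt.Println(six.val, six.steps) // 6 0

	squareInPlace(&six)
	fmt.Println(six.val, six.steps) // 36 1
}
```

Forgetting to assign the result back is an easy mistake with the copying variant. The pointer variant avoids that mistake, at the price of a function that has a side effect on its argument.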
In both cases, the result is the same. So what would be the benefit of passing the original variable around by pointer?
One possible answer is: speed. This brings us straight to the next question.
It depends.
Really, it depends. Don't assume that passing a pointer is always faster than copying data.
In general, passing a pointer around avoids copying data. Consider a large buffer, a large struct generated from unmarshalling an insanely complex JSON blob, a multi-dimensional matrix, or any other large data structure you can imagine. The larger this data structure is, the more time is needed to copy all of it over. And each copy needs additional memory. If that memory is on the heap, garbage collection adds to the overall time consumed by copying the data.
In contrast, a single pointer is passed around as quickly as a plain integer.
Sounds good, doesn't it? On the other hand, using pointers instead of copies can cause data on the call stack to escape to the heap. Consider a function that creates a local variable. Such function-local variables typically live on the function's call stack. When the function returns, the local variables go out of scope and are removed from the stack by simply setting the stack pointer back to the caller's stack frame. Quick and easy.
If a function, however, creates a local variable and returns a pointer to that variable, the variable must outlive the function it was created in. The compiler detects this at build time (through escape analysis) and arranges for the variable to be allocated on the heap, where it will eventually be subject to garbage collection.
Code that does this repeatedly quickly generates a large heap of garbage (pun intended), putting pressure on the garbage collector to find and collect all of those escaped variables that are not used anymore.
This can lead to a situation where passing data around by pointer can be less efficient than passing a copy around.
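The two situations can be sketched as follows. The type point and both constructor names are made up for this example. Whether the variable really ends up on the heap depends on the compiler version and on inlining, so treat the comments as the typical outcome, not as a guarantee. Building with go build -gcflags=-m prints the compiler's escape analysis decisions for your own code.

```go
package main

import "fmt"

type point struct{ x, y int }

// newPointPtr returns a pointer to a local variable. The variable must
// outlive the call, so the compiler typically allocates it on the heap.
func newPointPtr(x, y int) *point {
	p := point{x, y}
	return &p
}

// newPointVal returns a copy. Nothing needs to outlive the call, so the
// value can typically stay on the stack.
func newPointVal(x, y int) point {
	return point{x, y}
}

func main() {
	a := newPointPtr(1, 2)
	b := newPointVal(3, 4)
	fmt.Println(a.x+a.y, b.x+b.y) // 3 7
}
```

For a struct of two integers, the copy is about as cheap as it gets. When in doubt, a benchmark of the real code tells more than a rule of thumb.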
Therefore, if the pointer concept is new to you, consider sticking to...
Pointers have a bad reputation for being a super dangerous language construct. This reputation goes back to C and similar languages that do little to secure pointer usage. On the contrary, direct pointer manipulations are required for fundamental operations like accessing an array element.
C allows adding values to or subtracting values from a pointer. Note: we are talking about the pointer itself, not about the value it points to. In other words, in C it is possible to arbitrarily change where the pointer is pointing to by simple plus/minus operations. This "feature" has the elaborate name "pointer arithmetic", but this name should not hide the fact that manipulations of this kind are pretty low-level and thus dangerous.
In C, pointer arithmetic is typically used to access a cell in an array. So instead of a[23], you could write *(a+23). Looks innocent enough, but trouble is waiting just around the corner. Consider that you can add anything to the pointer. You can easily read past the end of the array, either involuntarily or with sinister ambitions. (Buffer Overflow, anyone?) In fact, you can access any memory cell that the C process owns, by creating a pointer that contains an arbitrary value.
(And BTW, C does not even do out-of-bounds checks for array access. Code can access a[23] on an array of length 10 without the language stopping it. The behavior is undefined, but the access may well appear to succeed.)
Go does not allow any of this. Pointer arithmetic does not exist in ordinary Go code (only the unsafe package opens a back door), and pointers can only be created from existing variables. Try writing var p *int = 0x8198 to have a pointer point at a memory location of your choice, and the Go compiler will rightfully complain.
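Here is a minimal sketch of what the same out-of-bounds access looks like in Go. The helper elementAt is hypothetical. It uses recover only to keep the demo program running, so that the result of the runtime's bounds check becomes visible as an error value.

```go
package main

import "fmt"

// elementAt is a hypothetical helper. It returns s[i], or an error if
// the runtime's bounds check rejects the index.
func elementAt(s []int, i int) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return s[i], nil
}

func main() {
	a := make([]int, 10)
	a[3] = 42

	p := &a[3] // fine: a pointer to an existing element
	// p++     // does not compile: Go has no pointer arithmetic
	fmt.Println(*p) // 42

	v, err := elementAt(a, 3)
	fmt.Println(v, err) // 42 <nil>

	_, err = elementAt(a, 23)
	fmt.Println(err != nil) // true: the access was stopped at runtime
}
```

Unlike in C, the access to index 23 never reaches memory outside the slice. The runtime panics before anything is read.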
Compared to naked C pointers, pointers in Go are mostly harmless. Mostly, because two caveats remain: nil pointers, and pointers in a concurrency context.
Like every other data type in Go, pointer types have a zero value, which is named nil. If a pointer is nil, it does not point to any variable. Trying to follow a nil pointer (in technical terms: to dereference the pointer) in order to get the value that it is pointing to is an error. The Go runtime responds to that attempt with a panic, and this usually means that the whole process exits immediately.
Which leads to...
Always check a pointer received from elsewhere for nil before attempting to dereference it.
What does "received from elsewhere" mean? A pointer created from a variable right before accessing the pointer cannot be nil:
var n int
p := &n
// p is guaranteed to not be nil
Here we need no check for nil.
On the other hand, when a function receives a pointer from the caller, the function cannot tell whether this pointer is nil or not. In this situation a nil check is mandatory.
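To illustrate what happens without the check, here is a small sketch based on the unchecked square function from the data sharing example. The wrapper trySquare is an assumption made for this demo. It recovers from the panic only to make the failure observable. It is not a substitute for a proper nil check.

```go
package main

import "fmt"

type data struct {
	val   int
	steps int
}

// square has no nil check, like the first version of the example.
func square(d *data) {
	d.val *= d.val // panics if d is nil
	d.steps++
}

// trySquare is a hypothetical wrapper for demonstration purposes.
// It turns the nil pointer panic into an error.
func trySquare(d *data) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("square panicked: %v", r)
		}
	}()
	square(d)
	return nil
}

func main() {
	fmt.Println(trySquare(&data{val: 6})) // <nil>
	fmt.Println(trySquare(nil) != nil)    // true
}
```

Without the recover call, the second call would terminate the program with a nil pointer dereference panic.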
Let me pick up the original example and add nil checks.
The functions square, negate, and addTwo need to check the received pointer for nil and return an error in that case, because there is no sound concept of squaring a nil pointer.
func square(d *data) error {
if d == nil { // sorry, no way of squaring nil
return fmt.Errorf("d must not be nil")
}
d.val *= d.val
d.steps++
return nil
}
// ...
func main() {
// ...
err := square(minus34)
if err != nil {
log.Fatalf("square: %s", err)
}
// ...
}
(Full code in the playground, with an intentional call to square passing a nil pointer.)
The caller must check the returned error and handle it accordingly. In this particular example, func main does not check minus34 for nil-ness because minus34 is guaranteed to not be nil: we create it from a struct literal at the start of main().
No, not another golden pointer rule this time. Instead, let me point (no pun intended) you to the very first of the Go Proverbs, which summarizes Go's approach to concurrent data processing:
Don't communicate by sharing memory, share memory by communicating.
Here is the bad news: pointers go straight against this approach. Pass a pointer to a goroutine, and you have two goroutines that can access the same piece of data. This is a sure-fire recipe for data races.
Ok, so instead of passing a pointer, we can pass a channel, right? Data sent down a channel is copied data, and the sending and receiving goroutines end up with their own copies of that data.
But there is a catch. Nothing prevents us from sending pointers through a channel. We can even do so inadvertently, by sending a slice through a channel. Technically, only the header of the slice travels through the channel. A slice header contains the capacity of the slice, its current length, and... a pointer to the actual slice data.
And so we are back at sharing memory instead of communicating. So be aware of the things you send through channels.
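The difference between a copied struct and a shared slice can be sketched like this. All names in the example are invented for the illustration. The done channel is there to make the demo free of data races: the sender reads its variables only after the goroutine has finished.

```go
package main

import "fmt"

type data struct{ val int }

// modify receives one struct and one slice, changes both, and then
// signals that it is finished.
func modify(structs <-chan data, slices <-chan []int, done chan<- struct{}) {
	d := <-structs
	d.val = 99 // changes the receiver's copy only

	s := <-slices
	s[0] = 99 // changes the backing array that the sender still refers to

	close(done)
}

// shareDemo returns what the sender sees after the goroutine has
// modified the received values.
func shareDemo() (structVal, sliceVal int) {
	structs := make(chan data)
	slices := make(chan []int)
	done := make(chan struct{})

	d := data{val: 1}
	s := []int{1, 2, 3}

	go modify(structs, slices, done)
	structs <- d
	slices <- s
	<-done // wait for the goroutine, so the reads below do not race

	return d.val, s[0]
}

func main() {
	fmt.Println(shareDemo()) // 1 99
}
```

The struct arrives as an independent copy, but the slice header carries a pointer along, so both goroutines end up looking at the same backing array. Sending a copy of the slice contents instead would restore the separation.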
Check pointers for nil before dereferencing them.
To quote Axel Wagner,
Language design is a trade-off and there is no such thing as a perfect type-system. Go has decided that it doesn't want to occupy the "most type-safety possible" spot in the language design space.
Bottom line: pointers are not fool-proof, but they are not unusually dangerous either. With a healthy dose of developer discipline applied, pointers are a useful tool for modeling absence of data as well as avoiding unnecessary copying of data.
Background photo of cover image by Nick Fewings on Unsplash
Update 2022-10-30: Fix the pointer diagram. Add further reading.