It escaped! How can you know if a variable lives on the stack or the heap, and why should you care?

Function-local variables are stored in the call stack, but sometimes, they "escape to the heap". This affects performance and garbage collection.

You may have heard of variables that "escape to the heap". This sounds funny and worrying at the same time. The good news: If your goal is to write a correct program, you couldn't care less. Variable escaping does not change the semantics of your code a single bit. Close this browser tab and read something else.

However, if your project is in the stage of performance optimization (remember: don't do premature optimization!), then you might care about variable escaping.

But why? And what exactly is variable escaping?

Stack versus heap

Here is a quick memory management primer. In Go (and in many other programming languages, Forth not included), variables can live in one of two places: the stack or the heap.

Stacks are great for function-local data storage

A stack is a piece of contiguous memory that can grow as needed. (Many programming languages fix the maximum stack size up front, but Go takes a different route: stacks start small, and the runtime allocates more stack space if a stack runs out of space.) A stack works as last-in, first-out (LIFO) storage. Data can be added and removed at one end of the stack only.

The LIFO nature of a stack makes it an ideal place for storing function-local data. Once a function is invoked, it can create local variables that live as long as the function lives. A function continues to "live" while it calls other functions. When a function exits, all function-local data becomes invalid.

A function can call at most one other function at any given point in time. All function calls that exist at a given point in time, from main() down to the currently executing function, therefore build a linear call chain that is, basically, a stack. (Maybe you know the term "call stack" already.)

Because of this, when a function starts, it can put all of its local data on top of the call stack and pop the data off again when it exits.

Heaps are great for long-lived data

A heap is a storage structure that allows allocating memory space on demand. This space is not bound to the lifetime of a function. A heap is therefore ideal for storing long-lived data that must outlive the function that created that data. In low-level languages like C, it is the developer's duty to take care that allocated heap space is eventually de-allocated, or freed, somewhere else in the code. Garbage-collected languages like Go do that automatically. Once all pointers to a piece of heap-allocated data are gone, the garbage collector can de-allocate that piece of data.

Stack storage is fast

The linear nature of a stack makes allocations and de-allocations extremely cheap. Stacks usually start with pre-allocated space that is ready to use. (In Go, each goroutine has its own call stack with an initial size of 2048 bytes, enough for a few levels of nested function calls. If the stack is about to run out of space, the runtime allocates more.) A dedicated pointer holds the address where the stack ends (the "top" of the stack). Because the stack is pre-allocated, reserving stack space for storing the local data that a function call needs is as easy as adjusting a pointer value. When the function exits, that space is de-allocated by setting that pointer back to the previous value.

Heap storage is more flexible but comes at a cost

Managing a heap is more involved than managing a stack. Because heap space is allocated on demand and freed "when not needed anymore" (when is that, exactly?), the structure of a heap is not linear like a stack. Deallocation leaves gaps in the heap space that need to be managed, or otherwise, a future allocation request would not find enough contiguous space to match the requested size. Garbage collection also adds to the runtime cost of heap management.

Data on the heap is therefore more time-expensive than data on the stack.

(This is the reason why high-performance libraries often brag about "zero allocations", or "near-zero allocations".)

OK, so what's the problem?

Function-local data is bound to the lifetime of its function and goes on the stack. Long-lived data goes on the heap. It should be clear which kind of data belongs in which of the two storage mechanisms. Why bother?

Unfortunately, things are not that clear.

Some function-local variables cannot live on the stack

Functions can create function-local data that cannot live on the stack. How so?

Imagine a function that creates a local variable and returns a pointer to that variable.

func b() *int {
    m := 1
    return &m
}

If the variable lived on the stack, it would get popped off the stack at the moment the function returns. The returned pointer would be invalid immediately because the de-allocated stack frame is now available for re-allocation by the next function call. The place where the variable was stored would be overwritten by random data.

So the stack is not a suitable place for that variable. The variable needs to be stored on the heap instead, which, as we have seen before, is the place for data that outlives the function that created it.

This has direct implications for performance and memory management. The more variables escape to the heap, the more work the garbage collector has to do.

The compiler can detect such situations. It will then take measures to have that variable allocated on the heap at runtime.

This is what happens when a variable "escapes to the heap."

So do we only have to watch out for pointers as return values? No. Again, things are more complicated.

Hidden pointers

Several data types contain hidden pointers, including:

  • Slices
  • Maps
  • Structs with pointer fields
  • Function literals

This is why looking out for pointer types is not enough to detect escaping variables. Moreover, there can be more situations that justify moving a variable to the heap. Ultimately, this is the compiler's decision. This process is called escape analysis.
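
Here is a sketch of the four cases from the list above. All type and function names (config, newSlice, newMap, newConfig, newSequence) are made up for this example. None of the functions has a pointer as its return type, yet each returned value refers to data that must outlive the function call.

```go
package main

import "fmt"

// config is a struct with a pointer field.
type config struct {
	retries *int
}

// newSlice returns a slice. The slice header points to a backing array.
func newSlice() []int {
	return []int{1, 2, 3}
}

// newMap returns a map. A map value refers to data managed by the runtime.
func newMap() map[string]int {
	return map[string]int{"answer": 42}
}

// newConfig returns a struct by value, but the struct carries
// a pointer to the local variable r.
func newConfig() config {
	r := 3
	return config{retries: &r}
}

// newSequence returns a function literal that captures
// the local variable count.
func newSequence() func() int {
	count := 0
	return func() int {
		count++
		return count
	}
}

func main() {
	s := newSlice()
	m := newMap()
	c := newConfig()
	next := newSequence()
	next() // count is now 1

	fmt.Println(s[2], m["answer"], *c.retries, next()) // prints: 3 42 3 2
}
```

The captured variable count is a good illustration: it keeps its value between calls of the returned function, so it cannot be stored in the stack frame of newSequence.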

To know what the compiler decides, we can make it talk to us.

How to find out if variables escape to the heap

The Go tools provide an option for listing the results of the escape analysis. Or rather, two equivalent options:

go run -gcflags "-m" <file.go ...>

or

go tool compile -m <file.go ...>

The -m flag prints the compiler's optimization decisions, including the results of escape analysis (stack versus heap storage) and inlining.

Showcase

This code contains several examples of variables that move, or escape, to the heap.

package main

// returnValue returns a value over the call stack
func returnValue() int {
    n := 42
    return n
}

// returnPointer returns a pointer to n, and n is moved to the heap
func returnPointer() *int {
    n := 42  // --- line 11 ---
    return &n
}

// returnSlice returns a slice that escapes to the heap,
// because the slice header includes a pointer to the data.
func returnSlice() []int {
    slice := []int{42} // --- line 18 ---
    return slice
}

// returnArray returns an array that does not escape to the heap,
// because arrays need no header for tracking length and capacity
// and are always copied by value.
func returnArray() [1]int {
    return [1]int{42}
}

// largeArray creates a ridiculously large array that escapes to the heap,
// even though the array itself is not returned
// and thus does not outlive the function.
func largeArray() int {
    var largeArray [100000000]int  // --- line 33 ---
    largeArray[42] = 42
    return largeArray[42]
}

// returnFunc() returns a function that escapes to the heap
func returnFunc() func() int {
    f := func() int {  // --- line 40 ---
        return 42
    }
    return f
}

func main() {
    a := returnValue()
    p := *returnPointer()
    s := returnSlice()
    arr := returnArray()
    la := largeArray()
    f := returnFunc()()

    // Consume the variables to avoid "declared and not used" compile errors.
    // I don't use Printf/ln because this produces a lot
    // of extra escape messages.
    if a+p+s[0]+arr[0]+la+f == 0 {
        return
    }
}

In the command below, I also pass the -l flag to disable inlining. This removes the inlining noise from the output.

Compiling this code with -gcflags="-m -l" produces this output:

$ go run -gcflags="-m -l" main.go
# command-line-arguments
./main.go:11:2: moved to heap: n
./main.go:18:16: []int{...} escapes to heap
./main.go:33:6: moved to heap: largeArray
./main.go:40:7: func literal escapes to heap

I added line comments to the code so that you can match the messages with the corresponding statements.

To summarize

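Variable escaping is a process where the compiler decides to store a function-local variable on the heap instead of the stack. This is necessary when the variable must outlive the function that created it, for example, when the function returns a pointer to the variable or returns a value that contains a hidden pointer. The compiler may also choose the heap for other reasons, such as a variable that is too large for the stack.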

You do not have to care about escaping variables unless you need to do performance optimization. Variable escaping does not change the semantics of your code.

If you need to optimize the performance of your code, you can reveal escaping variables by compiling your code with either of the commands go run -gcflags "-m" <file.go ...> or go tool compile -m <file.go ...>.

Happy coding!

Categories: Memory Management, Performance