Spawning goroutine closures in a loop - what can possibly go wrong?

Closure semantics can play tricks on us when the closure is a goroutine. Can you spot the error right away?

UPDATE:

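With the release of Go 1.22, each iteration of a for loop gets its own copy of the loop variable, and the problem I describe below is gone (for modules that declare go 1.22 or later in their go.mod file). Yay!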

Try the playground links in this article to see the new (and better!) behavior.

Goroutines

To run tasks concurrently, Go has a concept called goroutine. A goroutine is a function that, once spawned, runs inside its own thread of execution while the caller's thread of execution can also continue. Both the caller and the callee now run independently of each other. They might run time-sliced (especially if there is only one CPU core) or maybe simultaneously on multiple CPU cores.

Running a goroutine is dead easy. Just put the keyword "go" in front of a function call.

package main

import "fmt"

func concurrent(done chan<- bool) {
    fmt.Println("in concurrent()")
    
    // send a boolean through the channel
    done <- true 
}

func main() {

    // create a channel for tracking the goroutine's progress
    done := make(chan bool)
    
    // start a goroutine and pass the channel
    go concurrent(done)
    
    fmt.Println("In main()")
    
    // wait for anything coming out of the channel
    <-done
    // then exit
}

This typically prints the following (the order of the two lines is not guaranteed):

In main()
in concurrent()

(Playground link. Here, a channel is used for communicating that the goroutine has finished.)

Closures

A closure is a really handy construct. Basically, a closure is a function defined inside another function. When a closure is called, it has access to the outer function's local variables. It captures the variables themselves, not copies of their current values. This allows a couple of neat tricks.

For example, a function can create a closure and return it to the caller. The closure carries the outer function's context with it. This context is even available after the outer function exits.

package main

import "fmt"

// outer() creates and returns a closure
func outer() func() {
    a := 7
    return func() {
        fmt.Println("The closure knows that a is", a)
    }
}

func main() {

    // have outer() create a closure
    closure := outer()
    
    // call the closure after outer() has finished
    closure()
}

(Playground link)

This prints:

The closure knows that a is 7

Closures and goroutines (and loops)

Here is where things become tricky.

Let's assume a typical scenario in concurrent programming: We want to spawn several goroutines at once. For this, we can use a loop like so:

package main

import "fmt"

func main() {

    done := make(chan bool)    

    // create and spawn ten goroutines
    for i := 0; i < 10; i++ {
    
        // define a closure and immediately spawn it as a goroutine. 
        // Pass the done channel    
        go func(finished chan bool) {
            fmt.Println("I am the", i, "th goroutine")
            finished <- true
        }(done) // here we actually call the goroutine
    }
    
    // wait for the goroutines to finish
    for i := 0; i < 10; i++ {    
        <-done
    }
    fmt.Println("Done.")
}

(Playground link)

But wait, what happens? With Go versions before 1.22, the program typically prints:

I am the 10 th goroutine
I am the 10 th goroutine
I am the 10 th goroutine
I am the 10 th goroutine
I am the 10 th goroutine
I am the 10 th goroutine
I am the 10 th goroutine
I am the 10 th goroutine
I am the 10 th goroutine
I am the 10 th goroutine
Done.

Wait, what? All goroutines claim to be the tenth one?

Let's recap:

The goroutines are spawned to run independently from the main thread of execution.
The goroutines are closures, hence they can access the loop variable i.

Why doesn't each goroutine take the value of the loop variable with it at the time it is spawned? For example, when the goroutine for i = 3 is spawned, it should take that particular value with it, right? So why do all of them print "10" instead?

There are two reasons. First, a closure captures the variable i itself, not the value that i has at the moment the closure is created; before Go 1.22, all iterations of the loop share a single i. Second, the goroutines do not start running immediately. Concurrency is not parallelism. On a perfectly parallel processor architecture, the goroutines could indeed start running right away. In real life, however, the start usually gets delayed by the goroutine scheduler, and there are not always enough CPU cores available to run all of the goroutines at once.

So what happens is:

The loop spawns ten goroutines.
The goroutines are ready to run but the main goroutine still claims CPU time.
When the goroutines finally can start off, the loop has finished, and i is 10.
Now the goroutines run and read the value of i from their outer function.
And for all of them, i is 10 in that moment.

How to properly pass data to a goroutine closure

Maybe you already see where the fault lies. The code relies on a closure's ability to read the outer function's local variables.

What we should have done instead is to pass all data to the goroutine via proper function parameters.

Like so:

package main

import "fmt"

func main() {

    done := make(chan bool)    

    // create and spawn ten goroutines
    for i := 0; i < 10; i++ {
    
        // define a closure and immediately spawn it as a goroutine. 
        // Pass the loop index (new!) and the done channel    
        go func(index int, finished chan bool) {
            fmt.Println("I am the", index, "th goroutine")
            finished <- true
        }(i, done) // now we also pass i to the goroutine
    }
    
    // wait for the goroutines to finish
    for i := 0; i < 10; i++ {    
        <-done
    }
    
    fmt.Println("Done.")
}

(Playground.)

Note that the parameter list of the closure now also includes an int variable, and in the actual argument list, we pass the loop index i:

go func(index int, ...) {
    //...
}(i, ...)

Now, i is evaluated at the point of the go statement, and its value is copied into the parameter index. Later changes to i cannot affect that copy.

And with this small change, we get the expected result (the order of the lines varies from run to run):

I am the 9 th goroutine
I am the 4 th goroutine
I am the 0 th goroutine
I am the 1 th goroutine
I am the 2 th goroutine
I am the 3 th goroutine
I am the 6 th goroutine
I am the 5 th goroutine
I am the 7 th goroutine
I am the 8 th goroutine
Done.

A short remark before you leave...

In the above examples, I passed the loop index to the goroutines. I did this as a very simple way of showing what data each goroutine receives.

Please do not confuse this with giving goroutines an "ID"!

Goroutines should be treated as anonymous entities. Goroutines are nothing more than a separate path of execution that runs independently from the main execution path.

Do not think "thread". Do not assign a goroutine an identity. Do not try implementing a goroutine management system that attempts to query states like "blocking", "sleeping", or "in a system call". You should not have to micro-manage goroutines; the Go runtime scheduler does that for you.

In the above examples, I could have claimed the loop variable to be some kind of incoming data to process. But I am a fan of really simple examples, and constructing an artificial "real-world" scenario would not have contributed to the message the code is meant to convey.

Categories: Concurrency, The Language