TryLock: problem solver or footgun?

Go 1.18 introduces a new method for mutexes: TryLock(). The doc warns that correct use cases are rare. Is deadlock prevention a correct use case?

New in Go 1.18: TryLock()

Locking a mutex with the Lock() method is a blocking operation. As long as the mutex is locked by another part of the system, the goroutine that calls Lock() is unable to proceed. This can lead to deadlock situations.

The new TryLock() method in Go 1.18 seems to provide a solution. TryLock() does not block but immediately returns a boolean success message. The goroutine then can continue and take appropriate action.

"May contain traces of correct use cases"

While this sounds like a cool idea, the TryLock() function documentation warns:

Note that while correct uses of TryLock do exist, they are rare, and use of TryLock is often a sign of a deeper problem in a particular use of mutexes.

So should we embrace TryLock as a useful new tool in the Mutex toolbox, or better avoid it if possible?

TryLock for deadlock prevention

To answer the question, let's look at a use case that seems quite valid: deadlock prevention.

Here is an example.

Let's assume we want to implement a file copy function that is safe for concurrent use. For the sake of simplicity, a "file" is only a struct with a path, some data, and a mutex.

type File struct {
    path string
    data [10]byte
    mu   sync.Mutex
}

In order to copy file A to file B, the copy function must be sure that both files are accessible. Again for simplicity, I make no distinction between reading and writing. In both cases, the whole file shall be locked.

So the copy function might look like this:

func copyFile(task string, source, target *File) {
    // get exclusive access to both files
    source.mu.Lock()
    // simulate time for opening the source file
    // to highly increase the chances for a deadlock
    time.Sleep(time.Millisecond)
    target.mu.Lock()

    // simulate copying data between files
    copy(target.data[:], source.data[:])

    // release the file locks again
    target.mu.Unlock()
    source.mu.Unlock()
}

The Sleep timer increases the chance of running into a lock, so that only one or very few attempts are needed to produce a deadlock.

A deadlock situation occurs if the copy function is invoked by two goroutines that want to copy the same files. One goroutine wants to copy file A to file B, and the other goroutine wants to copy file B to file A. Obviously, there is a deeper problem behind that scenario, and ultimately someone would need to improve the design to avoid copying files back and forth, but for this particular deadlock experiment, let's ignore those design considerations. After all, the point is that we want to see a classic deadlock situation.

And we get that deadlock easily by running a "backup" and a "restore" action for the same pair of files at the same time.

func main() {
    orig := &File{path: "original"}
    bck := &File{path: "backup"}
    done := make(chan struct{})

    go func() {
        copyFile("backup", orig, bck)
        done <- struct{}{}
    }()
    copyFile("restore", bck, orig)

    <-done
}

Because of the artificial delay between the first and the second lock in copyFile, almost all test runs will result in the following sequence of events:

  1. Both goroutines first acquire a lock for their respective source file. The backup routine locks the original file, and the restore routine locks the backup file.
  2. Now both try to acquire a lock for their respective target file. However, both files are locked already,

And here we have a deadlock. None of the goroutines can move back or forth anymore.

Non-blocking locking (shocking!?)

(I deeply apologize for the silly pun. I wish I could remove it but the article is published already.)

Obviously, the deadlock is possible due to the blocking nature of the Lock() calls. What if any of the two goroutines was still able to act? It could then give way for the other goroutine to finish its copying action.

With TryLock(), we can do exactly that. Here is how:

  1. First, try to lock the source file.
  2. If that succeeds, try to lock the target file.
  3. If the second lock also succeeds, copy the data and release the locks.
  4. Else ensure the first lock is unlocked, and wait for some time, then go back to step 1.

Expressed as code:

func copyFile(task string, source, target *File) {
    var lock1 bool
    for {
        if source.mu.TryLock() {
            lock1 = true
          
            time.Sleep(time.Millisecond)

            if target.mu.TryLock() {
                // both locks are acquired, leave the loop and copy the data
                break
            }
        }
        // At this point, at least one of the locks is still not acquired.
        // Unlock the source mutex if necessary, back off for a while, then try again
        if lock1 {
            source.mu.Unlock()
            lock1 = false
        }
        time.Sleep(time.Millisecond)
    }

    copy(target.data[:], source.data[:])

    target.mu.Unlock()
    source.mu.Unlock()
}

With this simple modification, the code never can run into a deadlock while attempting to lock two mutexes in a sequence.

Are there any footguns in sight?

In theory, this approach may have some downsides.

  1. The code could go into a livelock situation, where both goroutines repeatedly grab the first lock, fail to grab the second lock, and go into the back-off loop.
  2. The additional lock checking logic and busy loop waiting may affect performance

A livelock, however, seems very unlikely, as the two goroutines would have to run in perfect timely symmetry, in order to always lock the first mutex when it is unlocked and always grab the second mutex when it is already locked. Such a symmetry is so unlikely that some livelock demo algorithms that I found on the Web even go as far as to introduce artificial timings to achieve some sort of cadence that both goroutines follow.

And as far as performance is concerned—well, I would suspect this highly depends on the given scenario. When I ran the above copy and restore goroutines in a loop, I got these numbers from a second of execution time:

56371  backups with      471 backoff rounds
49436 restores with      465 backoff rounds

The number of back-off rounds (that is, sleeping for a millisecond and repeating) is a hundred times (or two orders of magnitude) lower than the number of performed backup and restore actions. In other words, each goroutine backs off only in a hundredth of all attempts to acquire both locks.

And if even this is not an option, there are other deadlock avoidance strategies available. One of these is ordered locking, which I describe in a lecture in my course Concurrency Deep Dive. (Ouch, a shameless self-plug! But hey, it's my party blog, and I cry market if I want to...)

Image by Darby Browning from Pixabay

Categories: Concurrency, The Language