Go 1.18 introduces a new method for mutexes: TryLock(). The doc warns that correct use cases are rare. Is deadlock prevention a correct use case?
Locking a mutex with the Lock()
method is a blocking operation. As long as the mutex is locked by another part of the system, the goroutine that calls Lock()
is unable to proceed. This can lead to deadlock situations.
The new TryLock()
method in Go 1.18 seems to provide a solution. TryLock()
does not block but immediately returns a boolean success message. The goroutine then can continue and take appropriate action.
While this sounds like a cool idea, the TryLock()
function documentation warns:
Note that while correct uses of TryLock do exist, they are rare, and use of TryLock is often a sign of a deeper problem in a particular use of mutexes.
So should we embrace TryLock as a useful new tool in the Mutex toolbox, or better avoid it if possible?
To answer the question, let's look at a use case that seems quite valid: deadlock prevention.
Here is an example.
Let's assume we want to implement a file copy function that is safe for concurrent use. For the sake of simplicity, a "file" is only a struct with a path, some data, and a mutex.
type File struct {
path string
data [10]byte
mu sync.Mutex
}
In order to copy file A to file B, the copy function must be sure that both files are accessible. Again for simplicity, I make no distinction between reading and writing. In both cases, the whole file shall be locked.
So the copy function might look like this:
func copyFile(task string, source, target *File) {
// get exclusive access to both files
source.mu.Lock()
// simulate time for opening the source file
// to highly increase the chances for a deadlock
time.Sleep(time.Millisecond)
target.mu.Lock()
// simulate copying data between files
copy(target.data[:], source.data[:])
// release the file locks again
target.mu.Unlock()
source.mu.Unlock()
}
The Sleep timer increases the chance of running into a lock, so that only one or very few attempts are needed to produce a deadlock.
A deadlock situation occurs if the copy function is invoked by two goroutines that want to copy the same files. One goroutine wants to copy file A to file B, and the other goroutine wants to copy file B to file A. Obviously, there is a deeper problem behind that scenario, and ultimately someone would need to improve the design to avoid copying files back and forth, but for this particular deadlock experiment, let's ignore those design considerations. After all, the point is that we want to see a classic deadlock situation.
And we get that deadlock easily by running a "backup" and a "restore" action for the same pair of files at the same time.
func main() {
orig := &File{path: "original"}
bck := &File{path: "backup"}
done := make(chan struct{})
go func() {
copyFile("backup", orig, bck)
done <- struct{}{}
}()
copyFile("restore", bck, orig)
<-done
}
Because of the artificial delay between the first and the second lock in copyFile
, almost all test runs will result in the following sequence of events:
And here we have a deadlock. None of the goroutines can move back or forth anymore.
(I deeply apologize for the silly pun. I wish I could remove it but the article is published already.)
Obviously, the deadlock is possible due to the blocking nature of the Lock()
calls. What if any of the two goroutines was still able to act? It could then give way for the other goroutine to finish its copying action.
With TryLock()
, we can do exactly that. Here is how:
Expressed as code:
func copyFile(task string, source, target *File) {
var lock1 bool
for {
if source.mu.TryLock() {
lock1 = true
time.Sleep(time.Millisecond)
if target.mu.TryLock() {
// both locks are acquired, leave the loop and copy the data
break
}
}
// At this point, at least one of the locks is still not acquired.
// Unlock the source mutex if necessary, back off for a while, then try again
if lock1 {
source.mu.Unlock()
lock1 = false
}
time.Sleep(time.Millisecond)
}
copy(target.data[:], source.data[:])
target.mu.Unlock()
source.mu.Unlock()
}
With this simple modification, the code never can run into a deadlock while attempting to lock two mutexes in a sequence.
In theory, this approach may have some downsides.
A livelock, however, seems very unlikely, as the two goroutines would have to run in perfect timely symmetry, in order to always lock the first mutex when it is unlocked and always grab the second mutex when it is already locked. Such a symmetry is so unlikely that some livelock demo algorithms that I found on the Web even go as far as to introduce artificial timings to achieve some sort of cadence that both goroutines follow.
And as far as performance is concerned—well, I would suspect this highly depends on the given scenario. When I ran the above copy and restore goroutines in a loop, I got these numbers from a second of execution time:
56371 backups with 471 backoff rounds
49436 restores with 465 backoff rounds
The number of back-off rounds (that is, sleeping for a millisecond and repeating) is a hundred times (or two orders of magnitude) lower than the number of performed backup and restore actions. In other words, each goroutine backs off only in a hundredth of all attempts to acquire both locks.
And if even this is not an option, there are other deadlock avoidance strategies available. One of these is ordered locking, which I describe in a lecture in my course Concurrency Deep Dive. (Ouch, a shameless self-plug! But hey, it's my party blog, and I cry market if I want to...)
Image by Darby Browning from Pixabay
Categories: : Concurrency, The Language