[WARNINGS] link-id not found - [fmiosacomparativestudypartikernellandchapter1bootup|1 link-id not found - [fmiosacomparativestudypartikernellandchapter3scheduling|3 link-id not found - [fmiosacomparativestudypartikernellandchapter5memorymanagement|5 link-id not found - [fmiosacomparativestudypartikernellandchapter7deadlocks|7 link-id not found - [fmiosacomparativestudypartiiuserlandchapter8iolayers|8 link-id not found - [fmiosacomparativestudypartiiuserlandchapter10environmentlayer|10
Intro (I 1 2 3 4 5 6 7) (II 8 9 10 11) (D) (C) (B
I)
h1. Chapter 7: Deadlocks
(_Premature optimization is the root of all evil." -Knuth))
h2. 1. What Causes Deadlocks
Suppose process @foo@ is trying to access the CD-ROM. Since only one process at a time can have the CD-ROM, there is a lock for it, and @foo@ has that lock. Also assume that process @bar@ is using (and has the lock for) the printer. Now, consider @foo@ suddenly needs the printer. While still holding the CD-ROM lock, @foo@ tries to get to the printer, but since @bar@ has the printer locked, @foo@ blocks, waiting for @bar@ to release the printer. However, if @bar@ has decided that it needs the CD-ROM for something, it ends up blocking to wait for the CD-ROM because @foo@ has it locked. Therefore, both processes are locked indefinitely, waiting for a resource the other has locked. This situation is called a deadlock or, perhaps more descriptively, a deadly embrace.
Textbook definition: A set of processes is deadlocked if each process in the set is blocked, waiting for an event that only another process in the set can cause. (Tanenbaum, 163)
When implementing SMP machines, since there are now multiple, simultaneous processes, and the requisite need for more locking, the possibility of deadlocks must be taken into consideration. An example is in sharing the run-tree. Let's say process A is killing process B. Process A puts a lock on process B and then has to go and put a lock on the entire tree so it can delete process B from it. However, before process A gets the tree lock another CPU running a command like @ps@ (which lists all of the running processes) locks the tree in order to traverse it. Therefore, process A blocks, waiting for @ps@ to finish. However, when @ps@ gets down to process B it blocks because process B is locked by process A. A deadlock has formed.
There are many more types of deadlocks, but they mostly fall under the general perspective of processes fighting for resources.
h3. Resources
Resources are parts of a computer that a program uses, or is given access to. Resources come in all shapes and sizes: disk drives, printers, memory, mouse, etc. Whenever a process chooses to use a resource it must request it (lock it) either explicitly or implicitly, use it, and then release it (unlock).
Resources come in two basic flavors.
Preemptible resources can actually be taken away from a process. An example of this is memory or disk access. Suppose a process has a lock on, say, a chunk of memory. Then, if the kernel desires (such as a to avoid a deadlock), it can just preempt that process' "lock" and page out its chunk of memory. The CPU itself is actually a preemptible resource!
Non-preemptible resources are those that cannot be taken away from a process. An example of this is the printer. If process A prints half its job and process B wants the printer, the kernel can't just take the printer away from process A and give it to B because then the printout would be skewed. These non-preemptible resources are the biggest causes of deadlocks.
h2. 2. Deadlock Algorithms
There are two different ways to deal with deadlocks: prevent them from ever happening or catch them if they do happen. Actually, there is a third: ignore deadlocks altogether. This is purely a numbers game and gets riskier the more processes and resources there are. That aside, let's take a look at a few of the different active methods.
h3. Detection and Recovery
One method is to keep track of all of the resource requests and releases. If the kernel detects a circular trend that could lead to a possible deadlock, it could prevent the request altogether, or just kill the process. Another way of handling deadlocks is to just kill processes that have been blocking for a certain amount of time (like an hour).
These methods only work on machines where killing processes is actually acceptable (when done properly). Mainframes running batch jobs are such machines. If a process is killed then it can just be restarted later as long as its files, modified by the process before it was killed, were returned back to their original state.
h3. Prevention
There are a few ways to prevent deadlocks from happening. One possibility is to add a spool, or message queue, to certain resources that require mutual exclusion. An example of this is the printer. Jobs are queued to the printer's spool so no user process has a lock on the printer, itself. Instead, a daemon runs in the background, pulling jobs off of the spool and sending them to the printer. This daemon is the only thing that will ever need direct access to the printer so there are no fights for the lock.
Another option is to prevent processes from requesting new resources if they are already holding one or more. One way to accomplish this is to require process to release (temporarily or permanently) all of its resources before acquiring a new one. However, with processes that copy large amounts of data from one resource to another, this is not an option.
The other way to accomplish this is to have a process announce all of the resources it will use in the course of its execution, and then request them all at once. This is difficult because it's hard (or impossible) for some processes to know all of the resources they will need, ahead of time. Also, locking out numerous parts of the machine, at once, will block a lot of other processes.
Yet another option is to "order" all the resources, and have each thread always request resources in the same order. If a thread discovers it needs some low-ordered resource, it is required to (permanently or temporarily) release all its higher-ordered resources before it can lock that lower-ordered resource. For example, to allow several processes to concurrently modify a tree structure, once a process has a lock on some node it is not allowed to lock the root -- it must first release the lock on the node, because it is only allowed to lock both the root and a node in the proper order: first the root, then the node.
h4. Banker's Algorithm
Avoidance of deadlocks can also come from strategic scheduling of processes. A widely-used algorithm is called the banker's algorithm. This algorithm works similar to how a banker in a small town would handle lines of credit. The basic structures to allow for this algorithm is for each process to announce how many of what resource it's using, and the maximum amount it would use throughout that processes' full execution. For instance, if process A copies information from one tape drive to two others, then it has in its table three as the maximum number of tape drive resources it will need. Let's say, though, that process A is only using one, at the moment, so it also puts in its table that it is using 1 tape drive resource.
The scheduler will also need to know how many of each resource the machine has. For instance, let's assume that the machine running process A has 10 tape drives. Now, let's assume that the other processes' tables contain the following:
| Process | Using | Max | |||
| A | 1 | 3 | |||
| B | 2 | 4 | |||
| C | 2 | 5 | |||
| D | 3 | 8 | |||
| ======== | ======= | ==== | |||
| Remaining: | 2 |
So the "banker's" goal is to ensure that there are enough remaining tape drives to accommodate at least one process's max-needed to make sure the "bank" (i.e. system) doesn't go bankrupt on resources. In the case of the above table, there are enough tape drives left to satisfy process A or B if either of them cashed out and used its max. Although process C couldn't, it could wait until A or B finished their run and release their resources. Enough tape drives would be available, then, for process C to do its thing.
If process D was given one more tape drive then this table would claim the state of the machine unsafe (i.e. deadlock possible) because none of the processes would be able to be accommodated if any were to try and cash out. Therefore, the banker would not allow process D (or C) to claim any more resources.
To implement this simplified model to a system with many different types of non-preemptable resources, the scheduler would just keep two tables: one for resources being used, and one for max resources. Rows of these tables would consist of processes, and columns would consist of resources. The banker's job would be to ensure that at least one process can be granted all of its max resources.
There are two major drawbacks to this algorithm, though. First, it only effectively works on machines with multiple numbers of certain resources. On machines with one CD-ROM, for instance, if two or more processes exist on the run-tree that list, even just 1, as their max CD-ROMs needed, then neither of them will be allowed to access it. Also, it is nearly impossible, as stated earlier, for some processes to know their max resources needed (i.e. those processes that rely highly on user input).
h2. 3. O/S Comparisons
h3. FMI/OS:
The number of possible deadlocks to compensate for is reduced dramatically on a client-server system. In FMI/OS, there are basically three:
h4. Re-Entrant Interrupt Handler
Let's assume that an interrupt handler has locked its subscriber list as it goes through and moves all of the processes from @ready@ to @pending@. Then, while locked, the interrupt happens again. Thus, the handler is called again, starting over from the beginning. However, when it arrives at requesting the lock on its subscriber list it is blocked because the list is already locked. And who has the lock? The handler does, but it doesn't know it! Since the scheduler doesn't control the handlers, there was no context switch, so there's no way to return to the previous running instance of the handler to release the lock. A deadlock is born.
In an SMP system, two CPUs can simultaneously run an interrupt handler. One will just block while the other has the lock. A deadlock only happens if a single CPU is re-entering the interrupt it was just handling.
Fix: The easiest way that FMI/OS avoids this is to turn off interrupts on that CPU during a lock. We use something called auto locks that are locks which automatically turn off interrupts on that CPU within the duration of the lock. Auto locks also increment a counter keeping track of the number of locks being held on that CPU. With this, @runnext()@ will throw an assertion when trying to context switch if the CPU calling it holds any locks (i.e. @locksheld > 0@).
h4. Lock Releasing
Another possible deadlock comes as a result of just bad programming. However, it's still worth mentioning because it is a common mistake. Some programmers make the mistake of allowing a program to grab a lock but not release it. Having these random locks floating around that will never be released could cause catastrophic deadlocks. Auto locks also prevent this from happening.
Fix: Although auto locks will prevent the deadlock, you should still always make sure to release all locks when writing code!
h4. Lock Order
The final deadlock FMI/OS must watch out for is the second example given in this chapter (the one with killing processes vs. @ps@). This deadlock is also a result of bad programming and can easily be prevented with proper ordering of grabbing the locks.
Fix: If a process wants to grab a lock on a thread in the run-tree (i.e. a leaf node of the tree) then it needs to grab a lock on the whole tree, first, before grabbing the leaf lock.
(c)2006 Dimitri Hammond
The ""DragonFly BSD":http://en.wikipedia.org/wiki/DragonFlyBSD#Protectingshared_resources" has yet another way of dealing with locks: "Serializing tokens".
For a long time I thought that, of course, an OS must have locks. Duh. However, recent research on "lock-free and wait-free algorithms":http://en.wikipedia.org/wiki/Lock-freeandwait-free_algorithms showed that locks are never absolutely necessary.
And by "recent", I mean 1991: ** "Maurice Herlihy 1993":http://citeseer.ist.psu.edu/herlihy93waitfree.html. ** "Maurice Herlihy 1991":http://portal.acm.org/citation.cfm?doid=114005.102808.
Since most people don't believe that lock-free (much less wait-free) algorithms were possible, there hasn't been much research into them. The first "proof of concept" lock-free algorithms were much slower than current state-of-the-art locking algorithms. Does that mean lock-free algorithms are inherently less efficient?