Wednesday, March 31, 2010

Load Balancing Actors with Work Stealing Techniques


Actor concurrency is well known to provide a more convenient model to concurrency than traditional thread based approaches. In contrast with threads, you can have as many actors as you want (if memory permits) and the actor implementation will take care of processing actors on threads whenever necessary behind the scenes.

In Akka, which among other things provides actors for the JVM (in Scala), the component which schedules actors for processing on a set of threads is called the "dispatcher". The default dispatcher uses a fixed size thread pool to process actors on. When a message is sent to an actor, a task is scheduled on the thread pool to process all the messages in the mailbox of the actor.

What actor concurrency does not provide out of the box is load balancing among equal actors ("equal" meaning they are both capable of processing the same types of messages).

You can implement load balancing on top of actor frameworks (example in Akka), by sending messages to a set of actors in some clever way (round robin, always choosing the one with the smallest mailbox, ...) but this has a few limitations. This kind of load balancing is rather "static": once a message is sent to an actor, it stays there until that actor processes it. If another actor is already done, it can never help the slow actor for messages that have already been distributed.

Here comes work stealing.

Work stealing is an idea used for example in the Fork/Join framework. Applied to actors, it implies that idle actors can "steal work" from the tail of the mailbox of busy actors. Of course actors are no threads, so "idle actors" can not really do anything because they are not assigned to a thread as long as they stay idle. This is why the implementation is a bit more subtle than the conceptual idea.

I have implemented a work stealing dispatcher for Akka actors. Although its called "work stealing" the implementation actually behaves more as "work donating" because the victim actor takes the initiative. I.e. it actually donates work to its thief, rather than having the thief steal work from the victim.

There is some cleverness in the dispatcher implementation using ReentrantLock.tryLock to check if an actor which is being scheduled on the dispatcher is actually capable of processing its mailbox (another thread might already be busy). If an actor can not process messages for this reason, it will try selecting a thief (round robin), donate a message from its mailbox to the thief (a ConcurrentLinkedDeque, using Deque.pollLast) and process the message in the thief.

Using the victim thread to do the "work donating" while this thread can not be used to process the victims mailbox itself because another thread is already doing that, implies that no threads are "waisted" on the work stealing process. Furthermore, when selecting a thief, the algorithm will make sure that the thief is idle, such that no messages are being redistributed to an actor which can not process them immediately. This also prevents stolen messages from being stolen again by another actor later on.

 To use it, you simply declare to use the work stealing dispatcher in your Akka actors:

val dispatcher = Dispatchers.newExecutorBasedEventDrivenWorkStealingDispatcher("pooled-dispatcher")

class WorkstealingActor extends Actor {
  messageDispatcher = dispatcher
  def receive = {
    case test => {/* process message */}
  }
}

When you create multiple instances of this actor, they'll all share the same work stealing dispatcher.