Description: Akka Transactors

Reference: Supervisor Hierarchies - Fault Handling

Supervisor hierarchies (fault handling)

Supervisor hierarchies originate from Erlang’s OTP framework.

A supervisor is responsible for starting, stopping and monitoring its child processes. The basic idea of a supervisor is that it should keep its child processes alive by restarting them when necessary. This makes for a completely different view on how to write fault-tolerant servers. Instead of trying everything possible to prevent an error from happening, this approach embraces failure: it treats errors as something natural that will happen sooner or later. Just ‘Let It Crash™’, since the components will be reset to a stable state and restarted upon failure.

There are two different restart strategies: All-For-One and One-For-One. They are best explained using some pictures (referenced from erlang.org):

OneForOne

The OneForOne fault handler will restart only the component that has crashed.

AllForOne

The AllForOne fault handler will restart all the components that the supervisor is managing, including the one that has crashed. This strategy should be used when a set of components is coupled in such a way that if one crashes, they all need to be reset to a stable state before continuing.
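
The difference between the two strategies can be sketched in plain Java. This is a toy model and not Akka code: the ‘RestartPolicySketch’ class, its ‘Policy’ enum and the ‘componentsToRestart’ method are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the two restart policies; not part of the Akka API.
final class RestartPolicySketch {
    enum Policy { ONE_FOR_ONE, ALL_FOR_ONE }

    // Returns the components a supervisor would restart when 'crashed' fails.
    static List<String> componentsToRestart(Policy policy, List<String> supervised, String crashed) {
        if (!supervised.contains(crashed)) {
            throw new IllegalArgumentException(crashed + " is not supervised");
        }
        List<String> result = new ArrayList<>();
        if (policy == Policy.ONE_FOR_ONE) {
            result.add(crashed);        // only the failed component
        } else {
            result.addAll(supervised);  // the failed component and all of its siblings
        }
        return result;
    }
}
```

With the supervised components "a", "b" and "c" and a crash in "b", the one-for-one policy yields only "b", while the all-for-one policy yields all three.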

Restart callbacks

There are two different callbacks that the Active Object and Actor can hook into:

  • Pre restart
  • Post restart

These are called just before and just after a restart caused by a failure, and can be used to clean up and reset or reinitialize state. This is important in order to clear the effects of the failure and leave the component in a fresh and stable state before it consumes further messages.
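
The role of the two callbacks can be sketched as follows. This is a plain Java illustration under assumed names: the ‘Restartable’ interface, the ‘CounterComponent’ class and the ‘RestartCallbackSketch.restart’ method are hypothetical and only show the order in which the hooks run and what they are typically used for.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback contract; the real hooks are described in the API sections.
interface Restartable {
    void preRestart();
    void postRestart();
}

final class CounterComponent implements Restartable {
    final List<String> events = new ArrayList<>();
    int count;
    boolean resourceOpen = true;

    void increment() { count++; }

    // Clean up whatever the failed instance was holding on to.
    public void preRestart() {
        resourceOpen = false;
        events.add("preRestart");
    }

    // Reinitialize to a known, stable state before handling new messages.
    public void postRestart() {
        count = 0;
        resourceOpen = true;
        events.add("postRestart");
    }
}

final class RestartCallbackSketch {
    static void restart(Restartable component) {
        component.preRestart();
        // a real supervisor would restart the component at this point
        component.postRestart();
    }
}
```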

Restart and Life-cycle strategies

Both the Active Object supervisor configuration and the Actor supervisor configuration take a ‘RestartStrategy’ instance which defines the fault management:

RestartStrategy(
  AllForOne(), // restart policy (AllForOne or OneForOne) 
  3,           // maximum number of restart retries
  5000         // within time in millis
)
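
One plausible reading of "maximum number of restart retries within time" is a counter that is reset once the time window has passed. The sketch below models that reading in plain Java; the ‘RestartWindowSketch’ class and its ‘tryRestart’ method are hypothetical, and the way Akka itself counts retries may differ.

```java
// Toy model of "at most maxRetries restarts within withinMillis"; not the Akka implementation.
final class RestartWindowSketch {
    private final int maxRetries;
    private final long withinMillis;
    private long windowStart = -1;   // -1 means no restart has been seen yet
    private int retriesInWindow = 0;

    RestartWindowSketch(int maxRetries, long withinMillis) {
        this.maxRetries = maxRetries;
        this.withinMillis = withinMillis;
    }

    // Records a restart attempt at 'nowMillis' and returns true if it is still allowed.
    boolean tryRestart(long nowMillis) {
        if (windowStart < 0 || nowMillis - windowStart > withinMillis) {
            windowStart = nowMillis;  // start a fresh window
            retriesInWindow = 0;
        }
        retriesInWindow++;
        return retriesInWindow <= maxRetries;
    }
}
```

With the values from the example above (3 retries within 5000 millis), three attempts in quick succession are allowed and a fourth one inside the same window is refused. Timestamps are passed in explicitly so that the behaviour is easy to test.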

The other common configuration element is the ‘LifeCycle’ which defines the life-cycle:

LifeCycle(
  Permanent(), // options: 'Permanent' means that the component will always be restarted
               //          'Temporary' means that it will be restarted only if it has exited through an error, not normally
  1000         // maximum time in millis allowed for a clean shutdown before it is killed by force
)
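
The shutdown timeout follows a pattern that is also common in plain Java: ask for a clean stop, wait for a bounded time, then force the stop. The sketch below shows that pattern with a standard ‘ExecutorService’. It is an analogy only, not how Akka shuts components down, and the ‘GracefulShutdownSketch’ class is a hypothetical name.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

final class GracefulShutdownSketch {
    // Returns true if the worker stopped cleanly within the timeout,
    // false if it had to be stopped by force.
    static boolean shutdown(ExecutorService worker, long timeoutMillis) {
        worker.shutdown();                        // clean shutdown: no new tasks are accepted
        try {
            if (worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt for the caller
        }
        worker.shutdownNow();                     // time is up: interrupt the running tasks
        return false;
    }
}
```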

Java API: Active Objects

Declarative supervisor configuration

To configure Active Objects for supervision you use the ‘ActiveObjectManager’ and its ‘configure’ method. This method takes a ‘RestartStrategy’ and an array of ‘Component’ definitions, which define the Active Objects, their ‘LifeCycle’ and their timeout in millis. Finally you call the ‘supervise’ method to start everything up. The Java configuration elements reside in the ‘se.scalablesolutions.akka.kernel.config.JavaConfig’ class and need to be imported statically.

Here is an example:

import static se.scalablesolutions.akka.kernel.config.JavaConfig.*;

private ActiveObjectManager manager = new ActiveObjectManager();

manager.configure(
  new RestartStrategy(new AllForOne(), 3, 1000), new Component[] {
    new Component(
      Foo.class,
      new LifeCycle(new Permanent(), 1000),
      1000),
    new Component(
      Bar.class,
      BarImpl.class,
      new LifeCycle(new Permanent(), 1000),
      1000)
  }).supervise();

Then you can retrieve the Active Object as follows:

Foo foo = manager.getInstance(Foo.class);

Restart callbacks

For Active Objects, restart callbacks can be defined in two different ways. The first is to use annotations. You can annotate any no-argument method that has void as its return type with:

  • @se.scalablesolutions.akka.annotation.prerestart
  • @se.scalablesolutions.akka.annotation.postrestart

The methods can be arbitrarily named.

@prerestart
public void preRestart() {
  ... // clean up before restart
}

@postrestart
public void postRestart() {
  ... // reinit stable state after restart
}

These methods will then be invoked upon restart.
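
How a framework can find such annotated methods is sketched below using standard Java reflection. The annotations ‘PreRestartHook’ and ‘PostRestartHook’ are hypothetical stand-ins for the Akka annotations, defined here only so that the example compiles on its own; ‘AnnotatedPojo’ and ‘AnnotationCallbackSketch’ are likewise illustrative names, and Akka's own lookup may work differently.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical stand-ins for the real annotations.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface PreRestartHook {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface PostRestartHook {}

final class AnnotatedPojo {
    final StringBuilder log = new StringBuilder();

    @PreRestartHook
    public void cleanUp() { log.append("pre;"); }    // the method name is arbitrary

    @PostRestartHook
    public void reinit() { log.append("post;"); }
}

final class AnnotationCallbackSketch {
    // Invokes every public no-argument void method on 'target' that carries 'marker'.
    static void invokeAnnotated(Object target, Class<? extends Annotation> marker) {
        for (Method m : target.getClass().getMethods()) {
            boolean eligible = m.isAnnotationPresent(marker)
                    && m.getParameterCount() == 0
                    && m.getReturnType() == void.class;
            if (!eligible) {
                continue;
            }
            try {
                m.invoke(target);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("callback failed: " + m.getName(), e);
            }
        }
    }
}
```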

The second one is to define the names of these callback methods in the declarative supervisor configuration:

new Component(
  POJO.class,
  new LifeCycle(new Permanent(), 1000),
  new RestartCallbacks("preRestart", "postRestart"),
  10000)
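
Resolving callbacks by name can be sketched with standard Java reflection as well. The ‘NamedPojo’ and ‘NamedCallbackSketch’ classes are hypothetical; the sketch assumes the named callbacks are public no-argument methods and makes no claim about how ‘RestartCallbacks’ is implemented.

```java
import java.lang.reflect.Method;

final class NamedPojo {
    int restarts;
    boolean clean;

    public void preRestart() { clean = true; }
    public void postRestart() { restarts++; }
}

final class NamedCallbackSketch {
    // Looks up a public no-argument method by name and invokes it on 'target'.
    static void invokeByName(Object target, String methodName) {
        try {
            Method m = target.getClass().getMethod(methodName);
            m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no usable callback named " + methodName, e);
        }
    }
}
```

A misspelled method name is only detected when the callback is looked up, which is the usual trade-off of string-based configuration compared with annotations.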

Scala API: Actors

Declarative supervisor configuration

The Actor’s supervision can be declaratively defined by creating a ‘SupervisorFactory’ and overriding its ‘getSupervisorConfig’ method. Here is an example:


object factory extends SupervisorFactory {
  import se.scalablesolutions.akka.kernel.config.ScalaConfig._
  override def getSupervisorConfig = {
    SupervisorConfig(
      RestartStrategy(AllForOne, 3, 100),
      Supervise(
        new MyActor1,
        LifeCycle(Permanent, 100))
      ::
      Supervise(
        new MyActor2,
        LifeCycle(Permanent, 100))
      :: Nil)
  }
}
val supervisor = factory.newSupervisor
supervisor.startSupervisor // will link and start up all actors

Programmatic linking and supervision of Actors

Actors can create, spawn, link and supervise other actors at runtime. Linking is done using one of the link methods available in the Actor itself.

Here is the API:

// link and unlink actors
link(actor)
unlink(actor)

// starts and links Actors atomically
startLink(actor)
startLinkRemote(actor)
 
// spawns (creates and starts) actors
spawn(classOf[MyActor])
spawnRemote(classOf[MyActor])

// spawns and links Actors atomically
spawnLink(classOf[MyActor])
spawnLinkRemote(classOf[MyActor])

If a linked Actor fails and throws an exception, an ‘Exit(deadActor, cause)’ message is sent to the parent. If the parent wants to act on this message, e.g. restart the linked Actor according to the predefined fault handling schemes, it has to set its ‘trapExit’ flag to true so that it can trap the failure.

class MyActor extends Actor {
  trapExit = true // default is 'false'

  ... // implementation omitted
}
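
The effect of the flag can be pictured with a small model, written in Java to keep it independent of the Actor API. All names in it (‘ExitSketch’, ‘Exit’, ‘Parent’, ‘onExit’) are hypothetical. What happens to an ‘Exit’ message that is not trapped is not specified above, so the sketch simply reports it as unhandled.

```java
import java.util.ArrayList;
import java.util.List;

final class ExitSketch {
    // Hypothetical stand-in for the 'Exit(deadActor, cause)' message.
    static final class Exit {
        final String deadActor;
        final Throwable cause;

        Exit(String deadActor, Throwable cause) {
            this.deadActor = deadActor;
            this.cause = cause;
        }
    }

    static final class Parent {
        boolean trapExit = false;  // mirrors the default mentioned above
        final List<String> restarted = new ArrayList<>();

        // Returns true if the failure was trapped and acted on.
        boolean onExit(Exit exit) {
            if (!trapExit) {
                return false;                 // not trapped: this sketch leaves it unhandled
            }
            restarted.add(exit.deadActor);    // a fault handler would restart the actor here
            return true;
        }
    }
}
```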

The supervising Actor also needs to define a fault handler that specifies the restart strategy to apply when it traps an ‘Exit’ message. This is done by setting the ‘faultHandler’ field.

protected var faultHandler: Option[FaultHandlingStrategy] = None

The different options are:

  • AllForOneStrategy(maxNrOfRetries, withinTimeRange)
  • OneForOneStrategy(maxNrOfRetries, withinTimeRange)

Here is an example:

...
faultHandler = Some(AllForOneStrategy(3, 10000))
...

The supervised Actor needs to define a life-cycle. This is done by setting the ‘lifeCycle’ field as follows:

...
lifeCycle = Some(LifeCycle(Permanent, 100)) // Permanent or Temporary
...

Restart callbacks

In the Actor you can override the ‘preRestart’ and ‘postRestart’ methods to hook into the restart process. These methods take two parameters:

  • The reason for the failure.
  • The initial config sent to the Actor through the ‘Init(config)’ message: ‘Some(config)’ if a config has been sent and ‘None’ if not.

def preRestart(reason: AnyRef, config: Option[AnyRef]) = {
  ... // clean up before restart
}

def postRestart(reason: AnyRef, config: Option[AnyRef]) {
  ... // reinit stable state after restart
}
Last edited by rossputin, Mon Nov 30 07:23:11 -0800 2009