Description: An open source clone of Amazon's Dynamo.

Voldemort Rebalancing

Rebalancing has long been the number-one request from Project Voldemort users. Here is the current plan.

Rebalancing Overall Design

Voldemort rebalancing needs to handle a few things:

  1. Load balancing of data
    • A new server should get an equal share of data from every other server in the cluster.
  2. Should not impact online serving
    • Rebalancing will run in the background and should be throttled.
  3. The rebalancing logic needs to handle get()/put()/delete() requests that arrive while rebalancing is in progress. We are considering a “proxy server” model to solve this problem.
    • For external get()/put() requests:
      1. The new server makes a background redirectGet() call.
      2. It does a local put(), ignoring any ObsoleteVersionException.
      3. It serves the original get()/put() request as normal.
    • For delete() requests:
      1. We need to delete the key from the local store and make sure rebalancing does not add it back.
      2. We should delete it locally and send back an exception.
  4. Restrictions
    • Allow only one rebalancing operation at a time, to keep things simple for now.
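
The proxy-serving idea in item 3 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual RedirectStore: the class names `ProxyStoreSketch`, `LocalStore`, and `Versioned` are hypothetical, a plain `long` stands in for Voldemort's real versioning, and the donor is an in-memory object rather than a remote node. The "send back an exception" part of delete() is left out; the sketch only shows how a deleted key can be kept from coming back.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of proxy serving on the new node; not the real RedirectStore. */
final class ProxyStoreSketch {

    /** A value with a simplified numeric version (an assumption for this example). */
    static final class Versioned {
        final String value;
        final long version;
        Versioned(String value, long version) { this.value = value; this.version = version; }
    }

    /** Stand-in for ObsoleteVersionException. */
    static final class ObsoleteVersionException extends RuntimeException {
        ObsoleteVersionException(String key) { super("obsolete version for key " + key); }
    }

    /** Minimal in-memory store that rejects writes that are not newer than what it holds. */
    static final class LocalStore {
        private final Map<String, Versioned> data = new HashMap<>();
        Versioned get(String key) { return data.get(key); }
        void put(String key, Versioned v) {
            Versioned current = data.get(key);
            if (current != null && current.version >= v.version) {
                throw new ObsoleteVersionException(key);
            }
            data.put(key, v);
        }
        void delete(String key) { data.remove(key); }
    }

    private final LocalStore local = new LocalStore();
    private final LocalStore donor;  // node that still holds the not-yet-migrated data
    private final Set<String> deletedDuringRebalance = new HashSet<>();

    ProxyStoreSketch(LocalStore donor) { this.donor = donor; }

    /** Steps 1 and 2: fetch from the donor, then put locally, ignoring obsolete versions. */
    private void proxyFetch(String key) {
        if (deletedDuringRebalance.contains(key)) {
            return;  // never resurrect a key that was deleted during rebalancing
        }
        Versioned remote = donor.get(key);  // plays the role of redirectGet()
        if (remote == null) {
            return;
        }
        try {
            local.put(key, remote);
        } catch (ObsoleteVersionException ignored) {
            // the local copy is already at least as new as the donor's copy
        }
    }

    /** Step 3 for get(): serve from the local store after the proxy fetch. */
    Versioned get(String key) {
        proxyFetch(key);
        return local.get(key);
    }

    /** Step 3 for put(): apply the write locally after the proxy fetch. */
    void put(String key, Versioned v) {
        proxyFetch(key);
        local.put(key, v);
    }

    /** delete(): remove locally and remember the key so migration cannot add it back. */
    void delete(String key) {
        deletedDuringRebalance.add(key);
        local.delete(key);
    }
}
```

Remembering deleted keys in a set is one possible way to satisfy "make sure we don't add it back"; the plan itself does not say how this should be done.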

Rebalancing Process

Steps

  1. Get the rebalancing permit from the cluster.
    • Attempt to set CLUSTER_STATE_KEY to REBALANCING_CLUSTER on all nodes.
    • Fail if this does not succeed on at least a majority of nodes.
      • metadataStore needs to throw an exception if CLUSTER_STATE is already REBALANCING_CLUSTER.
  2. If the permit is obtained:
    1. Set the state of the new/rebalancing node to REBALANCING_MASTER_SERVER.
    2. Store the list of all other nodes under REBALANCING_SLAVES_LIST_KEY.
    3. While REBALANCING_SLAVES_LIST_KEY is not empty:
      1. Choose a node at random from REBALANCING_SLAVES_LIST_KEY.
      2. Set the state of that node to REBALANCING_SLAVE_SERVER.
      3. Set the list of partitions to be stolen/donated on the remote node as REBALANCING_PARTITIONS_LIST_KEY.
      4. Start fetching/updating.
      5. On success:
        • Remove the node from REBALANCING_SLAVES_LIST_KEY.
      6. On failure:
        • Increment the failure count for the node and continue.
        • If the node has failed more than ‘x’ times, log an error and remove the node from the list.
      7. Repeat.
  3. Set the state of the REBALANCING_MASTER_SERVER node back to normal.
  4. Set the cluster State back to NORMAL_CLUSTER.
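
The steps above can be sketched as a small driver loop. Everything here is an assumption made for illustration: the `Node` interface, the `FakeNode` test double, and the string `"NORMAL_SERVER"` are hypothetical and are not part of Voldemort's API. The plan does not describe what happens to nodes that accepted the state change when the permit is refused, nor when a slave returns to its normal state, so the sketch leaves both out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch of the rebalancing steps; not Voldemort's actual implementation. */
final class RebalanceProcessSketch {

    /** Assumed per-node admin operations (illustrative, not a real Voldemort interface). */
    interface Node {
        int id();
        /** False if the node is already in REBALANCING_CLUSTER state. */
        boolean trySetRebalancingCluster();
        void setNormalCluster();
        void setServerState(String state);
        /** Migrates the partitions listed for this node; false on failure. */
        boolean migratePartitions();
    }

    /** In-memory stand-in used only to exercise the sketch. */
    static final class FakeNode implements Node {
        final int id;
        int failuresLeft;  // number of migratePartitions() calls that fail before one succeeds
        boolean rebalancing;
        String serverState = "NORMAL_SERVER";  // assumed name for the normal state
        FakeNode(int id, int failuresLeft) { this.id = id; this.failuresLeft = failuresLeft; }
        public int id() { return id; }
        public boolean trySetRebalancingCluster() {
            if (rebalancing) return false;
            rebalancing = true;
            return true;
        }
        public void setNormalCluster() { rebalancing = false; }
        public void setServerState(String state) { serverState = state; }
        public boolean migratePartitions() { return failuresLeft-- <= 0; }
    }

    /** Step 1: the permit needs a strict majority of nodes to accept the state change. */
    static boolean acquirePermit(List<Node> cluster) {
        int accepted = 0;
        for (Node n : cluster) {
            if (n.trySetRebalancingCluster()) accepted++;
        }
        return accepted > cluster.size() / 2;
    }

    /** Steps 2 to 4. Returns the ids of nodes given up on after more than maxFailures failures. */
    static List<Integer> rebalance(List<Node> cluster, Node master, int maxFailures, Random random) {
        if (!acquirePermit(cluster)) {
            throw new IllegalStateException("could not get the rebalancing permit");
        }
        master.setServerState("REBALANCING_MASTER_SERVER");
        List<Node> slaves = new ArrayList<>(cluster);  // plays the role of REBALANCING_SLAVES_LIST_KEY
        slaves.remove(master);
        Map<Integer, Integer> failures = new HashMap<>();
        List<Integer> dropped = new ArrayList<>();
        while (!slaves.isEmpty()) {
            Node node = slaves.get(random.nextInt(slaves.size()));
            node.setServerState("REBALANCING_SLAVE_SERVER");
            if (node.migratePartitions()) {
                slaves.remove(node);
            } else if (failures.merge(node.id(), 1, Integer::sum) > maxFailures) {
                System.err.println("giving up on node " + node.id());
                slaves.remove(node);
                dropped.add(node.id());
            }
        }
        master.setServerState("NORMAL_SERVER");
        for (Node n : cluster) {
            n.setNormalCluster();  // step 4: back to NORMAL_CLUSTER
        }
        return dropped;
    }
}
```

Because the node is picked at random on every pass, a node that fails is simply retried later; the failure counter is what guarantees that the loop ends.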

Rebalancing Tasks

Task | Details | Issue/Ticket no. | Primary contact | Status
Proxy serving | Add a new RedirectStore() layer to handle proxy serving of get()/put()/delete() calls | | Bhupesh Bansal | Done
InvalidMetadataRequest | Every server always throws an InvalidMetadataException() when it is asked for a key that does not belong to it under the current partitioning and routing strategy | | | Done
Gossip protocol | Servers gossip with each other to stay in sync on cluster.xml, stores.xml, and CLUSTER_STATE | | | Not started
Streaming protocol | Migrating partitions requires a streaming-based protocol. A new AdminClient and AdminServer were added to Voldemort to support streaming and other operations needed by rebalancing | | | Done
Rebalancing Process Design | Implement the full rebalancing logic | | | Partially done
Rebalancing user interface | | | | Not started
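
The InvalidMetadataRequest task can be illustrated with a small ownership check. This is a simplified sketch, not Voldemort's routing code: `MetadataCheckSketch` and `partitionFor` are hypothetical names, the exception class is a local stand-in, and mapping a key to a partition by hashing its bytes modulo the partition count is an assumption made only to keep the example short.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the "key does not belong to this node" check. */
final class MetadataCheckSketch {

    /** Stand-in for InvalidMetadataException. */
    static final class InvalidMetadataException extends RuntimeException {
        InvalidMetadataException(String message) { super(message); }
    }

    private final int numPartitions;
    private final Set<Integer> ownedPartitions;

    MetadataCheckSketch(int numPartitions, Set<Integer> ownedPartitions) {
        this.numPartitions = numPartitions;
        this.ownedPartitions = new HashSet<>(ownedPartitions);
    }

    /** Simplified routing (an assumption): hash of the key bytes modulo the partition count. */
    static int partitionFor(byte[] key, int numPartitions) {
        return Math.floorMod(Arrays.hashCode(key), numPartitions);
    }

    /** Throws if the key maps to a partition this node does not currently own. */
    void checkOwnership(byte[] key) {
        int partition = partitionFor(key, numPartitions);
        if (!ownedPartitions.contains(partition)) {
            throw new InvalidMetadataException(
                "partition " + partition + " is not owned by this node");
        }
    }
}
```

A client that receives this exception would know its view of the cluster is stale; how it refreshes that view is outside the scope of this sketch.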
Last edited by bbansal, Wed Oct 14 17:20:22 -0700 2009