This repository is private.
All pages are served over SSL and all pushing and pulling is done over SSH.
No one may fork, clone, or view it unless they are added as a member.
Every repository with this icon (
) is private.
Every repository with this icon (
This repository is public.
Anyone may fork, clone, or view it.
Every repository with this icon (
) is public.
Every repository with this icon (
Voldemort Rebalancing
Voldemort rebalancing is the numero uno request for Project Voldemort users for a long time, here is the current plan.
Rebalancing Overall Design
Voldemort Rebalancing need to handle few things
- Load balancing of data
- New server should get equal share from all other servers in cluster.
- Should not impact online serving
- rebalancing will run in background and should be throttled.
- Rebalancing logic need to handle get()/put()/delete() while doing rebalancing, We are thinking of a “proxy server” model to solve this problem
- for both get()/put() external requests
- New server makes a background redirectGet() call
- does a local put() ignoring any ObsoleteVersionExceptions
- Serve the original get()/put() request as normal.
- for delete() request
- we need to delete it from local store and make sure we dont add it back due to rebalancing.
- we should delete it locally and send back an exception
- for both get()/put() external requests
- Restrictions
- Allow only one rebalancing at one time to keep things simple for now.
Rebalancing Process.

Steps
- Get the rebalancing permit from the cluster.
- Attempt to set CLUSTER_STATE_KEY to REBALANCING_CLUSTER on all nodes.
- Fail if not successful on atleast Majority nodes.
- metadataStore need to throw exception if CLUSTER_STATE is already REBALANCING_CLUSTER.
- If succeed in getting permit
- Set State of new/rebalancing node as REBALANCING_MASTER_SERVER
- Set the list of all other nodes as REBALANCING_SLAVES_LIST_KEY
- While REBALANCING_SLAVES_LIST_KEY is not empty
- Choose a node at Random from REBALANCING_SLAVES_LIST_KEY
- set state on node as REBALANCING_SLAVE_SERVER
- set partition list to be stolen/donated to the remote node as REBALANCING_PARTITIONS_LIST_KEY.
- start fetching/updating
- if success
- remove node from REBALANCING_SLAVES_LIST_KEY
- if failed
- Increment failed count for node and continue.
- if failed more than ‘x’ times throw error in log and remove node from list.
- repeat
- Set the state of REBALANCING_MASTER_SERVER back to Normal State
- Set the cluster State back to NORMAL_CLUSTER.
Rebalancing Tasks
| Task | Details | Issue/Ticket no. | primary contact | Status |
| Proxy serving | * Add a new RedirectStore() layer to handle proxy serving ** get()/put()/delete() calls |
Bhupesh Bansal | Done | |
| InvalidMetadataRequest | All servers at all times throws an InvalidMetadataException() if they are requested with a key not belonging to them based on current partitioning and routing strategy |
Done | ||
| Gossip protocol | Servers gossip with each other to keep in sync on * cluster.xml * stores.xml * CLUSTER_STATE |
Not started | ||
| Streaming protocol | For migrating partitions we need to support streaming based protocol. * A new AdminClient and AdminServer was added to Voldemort to support streaming and other operations needed by Rebalancing. |
Done | ||
| Rebalancing Process Design | Implement the full rebalancing logic | Partially done | ||
| Rebalancing user interface | Not Started |






