Friday, September 27, 2013

MySQL Fabric - Sharding - Shard Maintenance


Introduction

Let us start by listing the scenarios in which we will need to perform shard maintenance. In addition to periodic maintenance operations the following situations might mandate performing shard maintenance.

A shard is physically a server group. A single non-performant server can impact the latency of requests for the entire group. Hence we need to be able to pull out this non-performant server and replace it by a spare, without impacting the accessibility of the entire shard.

Maintenance activity might involve performing software upgrades on the different servers in the server group of the shard.

We might also need to upgrade the hardware on all the machines in the server group of a shard without impacting the accessibility of the entire data.

What if we want to form an alternate group and migrate the entire shard there? For example during times of heavy load we will want to create a group using more powerful servers, while during light load we might want to migrate to a group of more economical, less power consuming and cheaper machines.

Let us look at what options are available to do the above.

 

Sharding Topology

The topology we create has three shards. There are four server groups. One of the server groups serves as the Global group for the setup. Each of the server group has two MySQL servers, one serving as the master and another as the slave.

The assumption is that you have already gone through the following related blogs to create the above configuration.

Pulling out a non-performant server

We use the intrinsic HA management capabilities built into a Fabric server group to manage moving servers in and out seamlessly out of the shards.



case 1: Server is a master in the group

In the above setup assume we want to retire the MySQL Server running on host3:port3 (master in GROUPID1).

Step 1: Find the UUID of the server


command:


mysqlfabric server lookup_uuid <host:port> <user_id> <password>


where:


host:port – host:port of the server whose uuid we want to lookup (in the above


case host3:port3). 

user_id, password – credentials for the MySQL Server.


(OR)


mysqlfabric group lookup_servers <group-id>


where:



group-id – The group-id of the Group whose servers we want to lookup (in the 


above case GROUPID1).


Step 2: Choose another server in the group and promote that server into 


master


command:


mysqlfabric group promote <group-id><server-uuid>



where:


group-id – The group-id of the Group in which we want to promote a server to 


master (in the above case GROUPID1).


server-id – The server-id of the server which we want to promote as master.


Step 3: Remove the server from the group


command:


mysqlfabric group remove <group-id> <server-uuid>


where:


group-id – The groupid of the Group from which we want to remove a server (in 


the above case GROUPID1).

server-uuid – The uuid of the server which we want to remove from the group.

Now the server is safely out of the group. Now we can perform maintanence operations on the server and restore it later.

case 2: Server is a slave in the group

Step 1: Remove the server from the group
command:
mysqlfabric group remove <group-id> <server-uuid>
where:
group-id – The group-id of the group from which we want to remove a server (in the above case GROUPID1).
Server-uuid – The server UUID of the server we want to remove.

 

Performing software and hardware updates on a server

The same operation as above can be performed to remove the servers one by one and perform upgrades on them (or) we can move the entire shard to an alternate server group, upgrade all the servers in the current server group and move the
shard back.

Moving shards is explained below.


Moving the shard into another group

The sharding implementation supports a Move operations that allows moving a shard into another server group (a group with GROUPID4). The original server group is detached from the global group and will not receive any updates and its servers will not be reflected during lookups either.

While the move operation is in progress transactions can carry on without being affected.

For example to move a shard with shard-id=1 into a new group with new-group-id=GROUPID4. (The shard-id can be found using the command – mysqlfabric store dump_sharding_information)

command:
mysqlfabric sharding move <shard-id> <new-group-id>

where:
shard-id – The shard-id of the shard that needs to be moved (in this case 1)
new-group-id – The group-id of the server group into which this shard needs to be moved. (in this case GROUPID4).

Now we can cleanup and use the servers in GROUPID1 in whichever way we want.

Sunday, September 22, 2013

MySQL Fabric - Sharding - Migrating From an Unsharded to a Sharded Setup


Introduction


Having written an application to work with an unsharded MySQL setup, how do we migrate the application to a sharded setup ?


The above problem can be divided into two parts


  • Migrating the MySQL Store (with minimum disruption to ongoing transactions)
  • Migrating the Application.


This blog deals with the first problem of migrating the MySQL Store. The second problem of migrating the application will be handled in a separate blog post, and we shall assume for now that we just replace the old unsharded non-Fabric aware application with the new Fabric aware application.



 
Most enterprises start with a single server setup to develop a prototype and later realize that they need to scale. This blog presents a way for such a setup to seamlessly scale to a sharded setup with minimal disruption of ongoing transactions.


Unsharded Setup


The following is the schema in the MySQL Server running on localhost:13013


 
The employee database has the following tables

  • employee
  • salaries
  • dept_emp
  • fuel_reimb

 

Creating the Sharded Configuration


Let us assume that our final sharded configuration will have three shards. Each of these shards will be present in one group and there will be one global group.


 
The tables employee, salaries and dept_emp need to be sharded based on emp_no while fuel_reimb is a global table that needs to be present in all the shards.


Bring the Target server into the Fabric ecosystem


Create a Fabric group and add the server to it. Promote the server to master.


Commands:

mysqlfabric group create GROUPID1

mysqlfabric group add GROUPID1 localhost:13013 root ""

mysqlfabric group promote GROUPID1


 
The server is now part of the Fabric setup.


Creating the Global Group


The global group in a sharding setup is used to propagate schema updates to all tables in the sharding setup and updates to global tables throughout the sharding ecosystem. The global group contains

  • The schema of the sharded tables
  • The global tables


NOTE: Once we have restored the schema on the global group, until the global group is connected to the server group of the shard (GROUPID1) i.e. Until we create the sharding setup, we need to ensure that schema changing transactions do not occur on the server group of the shard. If schema changes occur in the server group of the shard, these will need to be propogated to the global group also, otherwise it will result in mismatching schemas in the global group and the server group of the shard. This can cause problems for global operations that expect the new schema. Once the sharding setup is created, schema changes can be sent to the global group.


NOTE: Similarly once the global tables have been restored on the global group, we need to prevent changes to the global groups until the sharding setup is created, after which the changes can be propagated to the global group.


Commands:

mysqlfabric group create GLOBAL_GROUP

mysqlfabric group add GLOBAL_GROUP localhost:13011 root ""

mysqlfabric group promote GLOBAL_GROUP

 

Creating the data in the Global Group


Backup the schema of all the tables and restore them on the on the global group. Also backup the data in the global tables and restore them on the global group.


Commands:

Use mysqldump to backup the schemas

mysqldump -d -u root --single-transaction --all-databases --socket <Server socket file for localhost:13013> > schema.sql

NOTE: --no-data can be used instead of -d in the above command.


Use mysql client to restore the schemas

mysql -u root --socket <Server socket file for localhost:13011> < schema.sql


Use mysqldump to backup the global tables

mysqldump -u root --no-create-info --single-transaction --socket <server socket file for localhost:13013> employee fuel_reimb > global.sql


The GTID_EXECUTED variable would have been set on the destination server. This causes applying the dump using the mysql client to fail. Reset this to allow restoring the global table.

reset master


Restore the data on the global tables

mysql --database=employee -u root --socket <server socket file for localhost:13011> < global.sql

 

Creating the Sharding Setup


Create a sharding definition


A sharding definition introduces a particular sharding scheme to which multiple tables related by a common sharding key can be mapped. A Sharding definition is composed of

  • Sharding scheme – RANGE, HASH
  • Global Group – Gets the global updates for all the shard tables mapped to this sharding definition. It also stores the global tables in this sharding definition.

Command

mysqlfabric sharding define HASH GLOBAL_GROUP

 

Map the tables to the sharding definition


Each sharding definition is associated with a unique ID. We map this unique ID to the table being sharded. Each mapping maps a table and the column containing the shard key to the sharding definition.

Commands:

mysqlfabric sharding add_mapping 1 employee.employee emp_no

mysqlfabric sharding add_mapping 1 employee.dept_emp emp_no

mysqlfabric sharding add_mapping 1 employee.salaries emp_no

 

Define the Shards


The shard definitions define the way the data is sharded based on the shard key. For a hash based sharding scheme, since the shards are automatically defined on the shard key, we do not need to manually define the lower_bounds for each shard.


Each shard is associated with a sharding definition and a group on which the data will be present. When we define a shard we also need to specify if it will be ENABLED for operations on shards.

Commands:

mysqlfabric sharding add_shard 1 GROUPID1 ENABLED


The sharding setup now contains a global group and one shard containing all the data.




Once the sharding setup has been created we can start using the Fabric enabled connector to start executing transactions. This would require shutting down the non-Fabric-aware application and replacing it by a Fabric-aware application.

 

Split the shards


Split the shard containing the data. Splitting helps to distribute the load into another group. This is a way to scale our setup for both read and write loads. The splitting can happen while transactions are ongoing.

Commands:

Add another group to which we want to split the data

mysqlfabric group create GROUPID2


Add the servers that are part of the split group


mysqlfabric group add GROUPID2 localhost:13009 root ""


Promote a server as master in the split group


mysqlfabric group promote GROUPID2


Split the shard


If you do not know the shard ID, dump the sharding information to find the shard ID.

mysqlfabric store dump_sharding_information


split the shard

mysqlfabric sharding split 1 GROUPID2


Now the data is distributed between the GROUPID1 and the GROUPID2.


 

Further Splits


The shards can be split further depending on need. For example to split shard with shard ID = 3 into a group with group ID = GroupID3 we will run the following command,

Commands:

mysqlfabric sharding split 3 GROUPID3


After the above command is executed the topology would look like the following.
 

Please find a few thoughts on maintaining a sharded system here http://vnwrites.blogspot.com/2013/09/mysqlfabric-sharding-maintenance.html

Saturday, September 21, 2013

MySQL Fabric - Sharding - Simple Example


In this article we discuss a use case for starting with a single server (single shard setup) and scaling gradually as the application and its data grows. We also describe here the different aspects of creating shards using MySQL Fabric and what each of it means.

Note: Unlike the previous example the present example is not working with data that already exists.

Note: The example also covers how to modify the python application code to work with Fabric.

Prototyping the Application on a Single MySQL Shard Setup


The figure below depicts the difference in the initial setup between using and not using MySQL Fabric.

 
Note: For the initial setup the Global Group can contain just another MySQL Server running on the same machine.


Note: The application would also need changes in the way it uses the python connector, when it switches to FABRIC.

The Application


In order to keep the focus on working with Fabric, a very simple application is used as an example. The application basically registers and queries employee details. The schema itself is highly simplified and contains the bare minimum to cover the all the use cases in Fabric. 


Application Schema:

 

The employee, department and the salary tables are sharded based on the emp_no that is used as the shard key.

The fuel_reimb represents a table that is often used (in joins queries) to find the fuel allowance for various employees. All the data in this table is often required in joins.


Connecting to the database: 
We use the python connector to write into the database. The python connector is MySQL Fabric aware. We shall look into the details of using the Fabric aware python connector later in this blog.

Starting MySQL Fabric


  1. Start a MySQL Server that acts as the state store.
  2. Change fabric.cfg, for configuring Fabric, to use this MySQL Server as the state store.
  3. Fabric persists metadata about the shard topology in a set of relation tables in the MySQL Server acting as the state store. This step creates the schemas of the tables in the state store.

Commands:

mysqlfabric manage setup

  1. Starting the Fabric Server (contains an XML-RPC Server).

Commands:

mysqlfabric manage start

 Creating the sharding topology


Creating the Global Group


The global group in a sharding setup is used to propagate schema updates to all tables in the sharding setup and updates to global tables throughout the sharding ecosystem. The global group contains,

  • The schema of the sharded tables
  • The global tables

Commands:

mysqlfabric group create GLOBAL_GROUP



mysqlfabric group add GLOBAL_GROUP localhost:13001 root ""

mysqlfabric group add GLOBAL_GROUP localhost:13002 root ""


Find the UUID of localhost:13001 and promote it as master

mysqlfabric group lookup_servers GLOBAL_GROUP


mysqlfabric group promote GLOBAL_GROUP <UUID of MySQL Server running on localhost:13001>


Creating the Server Group (for the first shard)


Each shard maps to a server group. Each server group is a set of MySQL servers in a HA configuration. The application starts with a single server group and adds more shards (server groups) as required. Let us create the first shard.

Commands:

mysqlfabric group create GROUPID1


mysqlfabric group add GROUPID1 localhost:13003 root ""

mysqlfabric group add GROUPID1 localhost:13004 root ""


mysqlfabric group lookup_servers GROUPID1

mysqlfabric group promote <UUID of MySQLServer running on localhost:13003>



  Create a Sharding Definition


A sharding definition introduces a particular sharding scheme to which multiple tables (related by a common sharding key) can be mapped. A Sharding definition is composed of

  • Sharding scheme – RANGE, HASH
  • Global Group – Gets the global updates for all the shard tables mapped to this sharding definition. It also stores the global tables in this sharding definition.
Command

mysqlfabric sharding define RANGE GLOBAL_GROUP

Map the tables to the sharding definition

Each sharding definition is associated with a unique ID. This unique ID is generated when we create a sharding definition. In the above case this ID is 1.We map this unique ID to the table being sharded. Each mapping maps a table and the column containing the shard key to the sharding definition.


Commands:


mysqlfabric sharding add_mapping 1 employee.employee emp_no


mysqlfabric sharding add_mapping 1 employee.salaries emp_no

mysqlfabric sharding add_mapping 1 employee.dept_emp emp_no

Add the Shards


When we add shards we define the way the data is sharded based on the shard key. Each of the shards,

  • Specify the shard mapping ID to which it belongs.
  • It also specifies the group in which the shard data will be present.
  • When we add a shard we also need to specify if it will be ENABLED for operations on shards.
  • A RANGE shard definition also specifies a lower_bound which represents the least value of the shard key that will be present in a particular shard. In this case we assume that the lower_bound=1.
Commands:

mysqlfabric sharding add_shard 1 GROUPID1 ENABLED 1




Define the Schema on the Global Group


The schema for the tables is defined on the master of the global group so that it gets replicated to all the shards.

Queries:

CREATE DATABASE employee;


use employee;



CREATE TABLE employee(emp_no INT PRIMARY KEY AUTO_INCREMENT, birth_date DATE, first_name VARCHAR(14), last_name VARCHAR(16), gender ENUM('M','F'), hire_date DATE);


CREATE TABLE salaries(emp_no INT, salary INT, from_date DATE, to_date DATE, FOREIGN KEY(emp_no) REFERENCES employee(emp_no));


CREATE TABLE dept_emp(emp_no INT, dept_no CHAR(4), from_date DATE, to_date DATE, FOREIGN KEY(emp_no) REFERENCES employee(emp_no));


CREATE TABLE fuel_reimb(role VARCHAR(20), allowance INTEGER);


Write the Application


The below application is written in Python. The application is very simple and uses connector python to insert a row into the table. The goal of the application is to demonstrate the extensions in the python connector that make it FABRIC aware.


 Application Code:


import mysql.connector.fabric as connector

if __name__ == "__main__":

fabric_params = {

"fabric" : {"host" : "localhost", "port" : 8080},

"user" : "root", "passwd" : ""

}

__cnx = connector.MySQLFabricConnection(**fabric_params)

__cnx.set_property(key=str(1),tables=["employee.employee"])

cur = __cnx.cursor()

cur.execute("use employee")

cur.execute("INSERT INTO employee(birth_date, first_name, last_name, gender, hire_date) VALUES(NULL, 'A', 'B', 'M', NULL)")

__cnx.commit()

 Explanation:

fabric_params = {

"fabric" : {"host" : "localhost", "port" : 8080},

"user" : "root", "passwd" : ""

}

__cnx = connector.MySQLFabricConnection(**fabric_params)


The above code gets a connection to the running Fabric XML-RPC server. The parameters that are passed to the MySQLFabricConnection class are

host” - The hostname at which the Fabric server runs.

port” - The port number at which the Fabric server runs.

user”, “passwd” - Login credentials for the Fabric state store.


__cnx.set_property(key=str(1),tables=["employee.employee"])


key – The shard key value

tables – The table name being sharded.

set_property automatically redirects internally to the correct shard on which the queries are executed.


If a query needs to be executed on all the shards, example an update on a global table or a schema change operation on all the shards, then add a parameter

scope = “GLOBAL”

to signify global operations. This is set of “LOCAL” by default.


Adding more Shards


As the size of the shard increases, we add more shards to the sharding topology.


Commands:

Add a group for the shard

mysqlfabric group create GROUPID2


mysqlfabric group add GROUPID2 localhost:13005 root ""

mysqlfabric group add GROUPID2 localhost:13006 root ""


mysqlfabric group lookup_servers GROUPID2

mysqlfabric group promote <UUID of MySQLServer running on localhost:13005>


Add the shard definition

The difference from the previous definition will be that the lower_bound will point to employee number from which we start inserting into this shard. In this case we choose a hypothetical value of 10001. This means that starting 10001 the employee information will be stored in this shard.


mysqlfabric sharding add_shard 1 GROUPID2 ENABLED 10001


The following will direct the application to this new shard.

__cnx.set_property(key=str(10001),tables=["employee.employee"])


In the above case any key value greater than 10001 will automatically redirect the insert to the new shard.


NOTE: The above step actually adds a new shard starting at a specified lower_bound. Other options for moving data around in a sharded setup are shard moves and splits. These shall be explored in later blogs.

Find out how to migrate from an unsharded to a sharded setup here http://vnwrites.blogspot.in/2013/09/mysqlfabric-sharding-migration.html .

MySQL Fabric - Sharding - Introduction


Introduction

Enterprises often start with a single server setup.


As the enterprise grows, so does the data and the number of requests for the data. Using a single server setup makes it difficult to manage the increasing load. This creates the requirement to scale out.

One solution would be to replicate to read servers. The writes go to the master and the reads go to the slaves.

 
Although this setup handles the increased read load, it still cannot handle the increasing write load. Trying to handle this by adding another master just compounds the problem. The write load must be repeated on every master (more work for the application and each master is just as busy as if there were just one).

Thus we need to tackle the problem in a different way. Instead of focusing on our setup we shift focus to our data. By splitting the data we can correspondingly distribute the load on the data.



Fabric Sharding allows the distribution of rows from a MySQL table over multiple
database servers or physical locations.

MySQL Fabric Sharding - Basic Architecture

 

The basic architecture consists of the,
  • Fabric Aware Connector
  • Fabric Server
  • Server Groups - That store the shard data.

MySQL Fabric Aware Connector

The connector contacts the Fabric server to find the location of a particular shard key. The connector also implements caching to avoid repeated round trips to the Fabric server to fetch the sharding information. By building the functionality into the connector, the architecture avoids the need to for an extra hop to a proxy/routing node and so minimizes latency. In the future we may add the option of a standalone routing node to enable Fabric-unaware connectors work with MySQL Fabric.

MySQL Fabric Server

The Fabric server,
  • Stores the sharding information
  • Range Based Sharding
  • Hash Based Sharding
  • Handles Re-Sharding
  • Shard Moving
  • Shard Splitting

Server Groups - Storing the shard data

Each server group represents an HA configuration of MySQL Servers, with one of the servers acting as master and the others acting as slaves.

Sharding Schemes

The data can basically be partitioned using the RANGE based or HASH based sharding scheme. In the current version, the application must provide the sharding key but this is something that we aim to make more transparent in the future.

Range Based Sharding

The Range based sharding scheme partitions a given table based on, user defined ranges of the value of a column, chosen as the sharding key.
For example, in the following Employee table schema,


EmpID
EmpName
EmpAddress
EmpPhone




 
The table can be partitioned on EmpID. Let us assume that the EmpID ranges between values 0 – 10000. We can create five partitions with lower_bounds = 0, 201, 401, 601,801. Where the partitions with lower_bound = 0 stores all the rows with EmpID between 0 – 200. Similarly the partition with lower_bound = 201 stores all the rows with EmpID between 201 – 400.

 Hash Based Sharding

The Hash based sharding scheme uses the md5 cryptographic hash and builds on the underlying RANGE based sharding scheme. The lower_bounds for the underlying RANGE based sharding scheme are generated by applying the cryptographic hash on the group ID for the particular shard.

In order to decide whether a given row of a table must be in a particular shard, we apply the md5 algorithm on the value of the column chosen as the shard key. We compare this value with the lower_bound of the shard definition to decide if the row should be placed in this shard.

Hence in the case of the employee table schema, assume that we have a FABRIC group with ID as GROUPID-1. We set lower_bound = md5(“GROUPID-1”). Now when we want to insert a row into this shard, we check to see if md5(EmpID for row) > md5(“GROUPID-1”) before inserting.

Handling Global Operations

In order to handle transactions that need to be executed on all the shards, E.g. Schema updates / updates on global tables, FABRIC creates the concept of a global group.


The Global Group stores the schema (IMPORTANT: Not the data) of the tables that are sharded. It also stores the tables (IMPORTANT: And the data) for all such tables that need to be present on all the shards (E.g. Pincode table that might need to participate in joins across all the shards).

When we need to fire a global operation, for example a schema update on a sharded table, we run this on the global group. The update automatically gets propagated across all the shards.

Moving the Shard


When a given shard is overloaded, we may decide to move the shard into a new Fabric Group (For e.g. The new Fabric Group could be composed of servers running on more powerful hardware). While the shard move takes place we would like to ensure minimal impact on ongoing transactions.

Fabric allows moving a shard registered with the Fabric system into another Fabric group seamlessly.
 

Splitting the Shard



When the data in a shard grows larger than the size optimal for good performance, we can split the data in this shard into 2 server groups. While shard split takes place we would like to ensure minimal impact on ongoing transactions. Fabric allows splitting a shard registered with the Fabric system into another Fabric group seamlessly.

Find an use case for starting with a single server (single shard setup) and scaling gradually as the application and its data grows here http://vnwrites.blogspot.in/2013/09/mysqlfabric-sharding-example.html .