Hi and welcome to another blog post. Today, we will explore Vitess. We will set up a simple Vitess cluster on Kubernetes. Once the cluster is running, we will discuss how to move tables and reshard the cluster. Finally, we will look at how to monitor a Vitess cluster using Prometheus. Please note that I am not an expert in Vitess; I simply want to experiment with it in this blog post.
Before we begin setting up Vitess, we need a running Kubernetes cluster. If you want to try this on your local machine, you can use a tool like kind to create a local cluster. Additionally, we need to install the MySQL Client and vtctldclient locally:
go install vitess.io/vitess/go/cmd/vtctldclient@v0.21.3
brew install mysql@8.4
brew link mysql@8.4
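If you do not have a cluster at hand, a local one created with kind is enough for this walkthrough. As a minimal sketch (the cluster name is arbitrary):

# Create a small local Kubernetes cluster with kind
kind create cluster --name vitess-playground
# Make sure kubectl points at the new cluster
kubectl cluster-info --context kind-vitess-playground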
If you are not familiar with the architecture and concepts of Vitess, I recommend reviewing the documentation before proceeding. All the configuration files we are using can be found in the ricoberger/playground GitHub repository.
We will create our Vitess cluster using the Vitess Operator. To install the operator, we will create a new namespace called vitess, install the CRDs, and set up the operator using the following commands. Afterward, we can verify that the operator is running by executing kubectl get pods.
kubectl apply --server-side -f 001_namespace.yaml
kubectl apply --server-side -f 002_crds.yaml
kubectl apply --server-side -f 003_operator.yaml
NAME READY STATUS RESTARTS AGE
vitess-operator-7cc877ccc5-vdndl 1/1 Running 0 21s
Once the operator is running, we can launch our first Vitess cluster. The cluster will use one cell (zone1) that includes all the control plane components (VTAdmin, vtctld, Topology Store) and one keyspace named commerce, which will contain one primary tablet and one replica tablet.
To bring up the cluster we can apply the 101_initial_cluster.yaml manifest. Afterwards we can check the state of the cluster using kubectl get pods. After a few minutes, all pods should be running.
kubectl apply --server-side -f 101_initial_cluster.yaml
NAME READY STATUS RESTARTS AGE
example-commerce-x-x-zone1-vtorc-c13ef6ff-86bd96dfb4-kp8w5 1/1 Running 2 (59s ago) 71s
example-etcd-faf13de3-1 1/1 Running 0 72s
example-etcd-faf13de3-2 1/1 Running 0 72s
example-etcd-faf13de3-3 1/1 Running 0 72s
example-vttablet-zone1-2469782763-bfadd780 3/3 Running 2 (46s ago) 71s
example-vttablet-zone1-2548885007-46a852d0 3/3 Running 1 (46s ago) 71s
example-zone1-vtadmin-c03d7eae-68d845dbfd-wnlk9 2/2 Running 0 72s
example-zone1-vtctld-1d4dcad0-75f6fb7c6b-78rpv 1/1 Running 1 (51s ago) 72s
example-zone1-vtgate-bc6cde92-57fdc84bb6-cdj75 1/1 Running 2 (45s ago) 72s
vitess-operator-7cc877ccc5-vdndl 1/1 Running 0 2m29s
For ease of use, Vitess provides a script to port-forward from Kubernetes to our local machine. This script also recommends setting up aliases for mysql and vtctldclient. Once the port-forward is running, the VTAdmin UI will be available at http://localhost:14000/.
alias vtctldclient="vtctldclient --server=localhost:15999"
alias mysql="mysql -h 127.0.0.1 -P 15306 -u user"
./pf.sh &
In the last step of the initial installation we will create our initial schema, which will deploy a single unsharded keyspace named commerce with the following tables:
- The product table contains the product information for all of the products.
- The customer table has a customer_id column with auto_increment. A typical customer table would have a lot more columns, and sometimes additional detail tables.
- The corder table (named so because order is an SQL reserved word) has an order_id auto-increment column. It also has foreign keys into customer(customer_id) and product(sku).
vtctldclient ApplySchema --sql-file="102_create_commerce_schema.sql" commerce
vtctldclient ApplyVSchema --vschema-file="103_vschema_commerce_initial.json" commerce
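The contents of the two files are not reproduced here, but for an unsharded keyspace the VSchema is essentially just a list of the tables it contains. As a rough sketch (not necessarily identical to 103_vschema_commerce_initial.json), applying it inline would look like this:

# Rough equivalent of the initial, unsharded VSchema for the commerce keyspace
vtctldclient ApplyVSchema --vschema='{
  "tables": {
    "product": {},
    "customer": {},
    "corder": {}
  }
}' commerce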
We should now be able to connect to the VTGate server in our cluster by running the mysql command.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| commerce |
| information_schema |
| mysql |
| sys |
| performance_schema |
+--------------------+
5 rows in set (0.02 sec)
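VTGate also understands a few Vitess-specific show statements, which come in handy throughout the rest of this post, for example:

mysql -e "show vitess_keyspaces"
mysql -e "show vitess_shards"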
In the next step we will create a new keyspace named customer and move the customer and corder tables to the newly created keyspace. This is the recommended approach before splitting a single table across multiple servers (sharding).
Let's start by loading some data into the tables we created and looking at the data we inserted. Notice that all of our tables are currently in the commerce schema/keyspace.
mysql < 201_insert_commerce_data.sql
mysql --table < 202_select_commerce_data.sql
Using commerce
Customer
+-------------+--------------------+
| customer_id | email |
+-------------+--------------------+
| 1 | alice@domain.com |
| 2 | bob@domain.com |
| 3 | charlie@domain.com |
| 4 | dan@domain.com |
| 5 | eve@domain.com |
+-------------+--------------------+
Product
+----------+-------------+-------+
| sku | description | price |
+----------+-------------+-------+
| SKU-1001 | Monitor | 100 |
| SKU-1002 | Keyboard | 30 |
+----------+-------------+-------+
COrder
+----------+-------------+----------+-------+
| order_id | customer_id | sku | price |
+----------+-------------+----------+-------+
| 1 | 1 | SKU-1001 | 100 |
| 2 | 2 | SKU-1002 | 30 |
| 3 | 3 | SKU-1002 | 30 |
| 4 | 4 | SKU-1002 | 30 |
| 5 | 5 | SKU-1002 | 30 |
+----------+-------------+----------+-------+
When we list our tablets using the following command, we can see that we have two tablets running: one primary and one replica.
mysql -e "show vitess_tablets"
+-------+----------+-------+------------+---------+------------------+--------------+----------------------+
| Cell | Keyspace | Shard | TabletType | State | Alias | Hostname | PrimaryTermStartTime |
+-------+----------+-------+------------+---------+------------------+--------------+----------------------+
| zone1 | commerce | - | PRIMARY | SERVING | zone1-2469782763 | 10.244.5.244 | 2025-04-10T06:08:22Z |
| zone1 | commerce | - | REPLICA | SERVING | zone1-2548885007 | 10.244.14.73 | |
+-------+----------+-------+------------+---------+------------------+--------------+----------------------+
Now it is time to deploy new tablets for our customer keyspace by applying the 203_customer_tablets.yaml manifest. After a few minutes we should see the newly created tablets in a running state. We should also see that a new vtorc instance was created for the customer keyspace.
kubectl apply --server-side -f 203_customer_tablets.yaml
NAME READY STATUS RESTARTS AGE
example-commerce-x-x-zone1-vtorc-c13ef6ff-86bd96dfb4-kp8w5 1/1 Running 2 (5m52s ago) 6m4s
example-customer-x-x-zone1-vtorc-53d270f6-7754f557c-bb87n 1/1 Running 0 72s
example-etcd-faf13de3-1 1/1 Running 0 6m5s
example-etcd-faf13de3-2 1/1 Running 0 6m5s
example-etcd-faf13de3-3 1/1 Running 0 6m5s
example-vttablet-zone1-1250593518-17c58396 3/3 Running 0 72s
example-vttablet-zone1-2469782763-bfadd780 3/3 Running 2 (5m39s ago) 6m4s
example-vttablet-zone1-2548885007-46a852d0 3/3 Running 1 (5m39s ago) 6m4s
example-vttablet-zone1-3778123133-6f4ed5fc 3/3 Running 2 (35s ago) 72s
example-zone1-vtadmin-c03d7eae-68d845dbfd-wnlk9 2/2 Running 0 6m5s
example-zone1-vtctld-1d4dcad0-75f6fb7c6b-78rpv 1/1 Running 1 (5m44s ago) 6m5s
example-zone1-vtgate-bc6cde92-57fdc84bb6-cdj75 1/1 Running 2 (5m38s ago) 6m5s
vitess-operator-7cc877ccc5-vdndl 1/1 Running 0 7m22s
Before we continue, we restart the port-forward once the new pods have finished launching. Afterwards we can list our tablets again. We can see that we have four tablets now: the two existing ones for the commerce keyspace and two new ones for the customer keyspace.
killall kubectl
./pf.sh &
mysql -e "show vitess_tablets"
+-------+----------+-------+------------+---------+------------------+--------------+----------------------+
| Cell | Keyspace | Shard | TabletType | State | Alias | Hostname | PrimaryTermStartTime |
+-------+----------+-------+------------+---------+------------------+--------------+----------------------+
| zone1 | commerce | - | PRIMARY | SERVING | zone1-2469782763 | 10.244.5.244 | 2025-04-10T06:08:22Z |
| zone1 | commerce | - | REPLICA | SERVING | zone1-2548885007 | 10.244.14.73 | |
| zone1 | customer | - | PRIMARY | SERVING | zone1-1250593518 | 10.244.11.55 | 2025-04-10T06:13:10Z |
| zone1 | customer | - | REPLICA | SERVING | zone1-3778123133 | 10.244.8.170 | |
+-------+----------+-------+------------+---------+------------------+--------------+----------------------+
In the next step we will create a MoveTables workflow, which copies the tables from the commerce keyspace into the customer keyspace. This operation does not block any database activity.
vtctldclient MoveTables --target-keyspace customer --workflow commerce2customer create --source-keyspace commerce --tables 'customer,corder'
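To double-check that the workflow was created, we should also be able to list the workflows in the target keyspace (a quick sanity check, output omitted):

vtctldclient GetWorkflows customer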
To see what happens under the covers, let's look at the routing rules that the MoveTables operation created. These are instructions used by a VTGate to determine which backend keyspace to send requests to for a given table.
vtctldclient GetRoutingRules
We can monitor the progress of the MoveTables operation using the status action. We can also validate its correctness by running a VDiff, which performs a logical diff between the source and target to confirm that they are fully synced.
# Monitoring Progress
vtctldclient MoveTables --target-keyspace customer --workflow commerce2customer status --format=json
{
"table_copy_state": {},
"shard_streams": {
"customer/-": {
"streams": [
{
"id": 1,
"tablet": {
"cell": "zone1",
"uid": 1250593518
},
"source_shard": "commerce/-",
"position": "2a5c9b12-15d2-11f0-a7a3-a2742b031576:1-40",
"status": "Running",
"info": "VStream Lag: 0s"
}
]
}
},
"traffic_state": "Reads Not Switched. Writes Not Switched"
}
# Validate Correctness
vtctldclient VDiff --target-keyspace customer --workflow commerce2customer create
vtctldclient VDiff --format=json --target-keyspace customer --workflow commerce2customer show last --verbose
VDiff 1163c387-4284-4214-9896-74b3af6a0cef scheduled on target shards, use show to view progress
{
"Workflow": "commerce2customer",
"Keyspace": "customer",
"State": "started",
"UUID": "1163c387-4284-4214-9896-74b3af6a0cef",
"RowsCompared": 0,
"HasMismatch": false,
"Shards": "-",
"StartedAt": "2025-04-10 06:19:56",
"TableSummary": {
"corder": {
"TableName": "corder",
"State": "started",
"RowsCompared": 0,
"MatchingRows": 0,
"MismatchedRows": 0,
"ExtraRowsSource": 0,
"ExtraRowsTarget": 0
},
"customer": {
"TableName": "customer",
"State": "pending",
"RowsCompared": 0,
"MatchingRows": 0,
"MismatchedRows": 0,
"ExtraRowsSource": 0,
"ExtraRowsTarget": 0
}
},
"Reports": {
"corder": {
"-": {
"TableName": "",
"ProcessedRows": 0,
"MatchingRows": 0,
"MismatchedRows": 0,
"ExtraRowsSource": 0,
"ExtraRowsTarget": 0
}
},
"customer": {
"-": {
"TableName": "",
"ProcessedRows": 0,
"MatchingRows": 0,
"MismatchedRows": 0,
"ExtraRowsSource": 0,
"ExtraRowsTarget": 0
}
}
},
"Progress": {
"Percentage": 0
}
}
Once the MoveTables operation is complete, the first step in making the changes live is to switch all query serving traffic from the old commerce keyspace to the customer keyspace for the tables we moved. Queries against the other tables will continue to route to the commerce keyspace.
vtctldclient MoveTables --target-keyspace customer --workflow commerce2customer SwitchTraffic
If we now look at the routing rules after the SwitchTraffic step, we will see that all queries against the customer and corder tables will get routed to the customer keyspace.
vtctldclient GetRoutingRules
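We can also verify the switch from the client side by querying the moved tables through VTGate with fully qualified names; both should now be served by the customer keyspace:

mysql -e "select count(*) from customer.customer"
mysql -e "select count(*) from customer.corder"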
The final step is to complete the migration using the Complete action. This will (by default) remove the routing rules that were created and DROP the original tables in the source keyspace (commerce). Along with freeing up space on the original tablets, this is an important step to eliminate potential future confusion.
vtctldclient MoveTables --target-keyspace customer --workflow commerce2customer complete
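If we are not ready to drop the source tables yet, the complete action can be told to keep them. Assuming current vtctldclient flags, something along these lines should work instead:

# Keep the original tables and routing rules instead of dropping them
vtctldclient MoveTables --target-keyspace customer --workflow commerce2customer complete --keep-data --keep-routing-rules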
In this step, we will divide our customer keyspace into two shards. The final architecture will appear as shown in the graphic below.
Before we can start, we have to create sequence tables for our auto-increment columns and decide on sharding keys, or Primary Vindexes, within a VSchema. More information regarding these two topics can be found in the Vitess documentation. To create the sequence tables and VSchema we can run the following commands:
vtctldclient ApplySchema --sql="$(cat 301_create_commerce_seq.sql)" commerce
vtctldclient ApplyVSchema --vschema="$(cat 302_vschema_commerce_seq.json)" commerce
vtctldclient ApplyVSchema --vschema="$(cat 303_vschema_customer_sharded.json)" customer
vtctldclient ApplySchema --sql="$(cat 304_create_customer_sharded.sql)" customer
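The important part of the files above is the sharded VSchema for the customer keyspace: it declares a hash Vindex on customer_id as the Primary Vindex for both tables and wires the auto-increment columns to sequence tables in commerce. As a rough sketch (the sequence table names are assumptions and the file 303_vschema_customer_sharded.json may differ in detail):

vtctldclient ApplyVSchema --vschema='{
  "sharded": true,
  "vindexes": {
    "hash": { "type": "hash" }
  },
  "tables": {
    "customer": {
      "column_vindexes": [{ "column": "customer_id", "name": "hash" }],
      "auto_increment": { "column": "customer_id", "sequence": "commerce.customer_seq" }
    },
    "corder": {
      "column_vindexes": [{ "column": "customer_id", "name": "hash" }],
      "auto_increment": { "column": "order_id", "sequence": "commerce.order_seq" }
    }
  }
}' customer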
At this point, we have finalized our sharded VSchema and vetted all the queries to make sure they still work. Now, it's time to reshard. To do this we will create the target shards by applying the 305_new_shards.yaml manifest. After a few minutes we should see four new tablets and two new vtorc pods (via kubectl get pods) for the created target shards.
kubectl apply --server-side -f 305_new_shards.yaml
# Restart the port-forward afterwards:
killall kubectl
./pf.sh &
NAME READY STATUS RESTARTS AGE
example-commerce-x-x-zone1-vtorc-c13ef6ff-86bd96dfb4-kp8w5 1/1 Running 2 (19m ago) 19m
example-customer-80-x-zone1-vtorc-836adff9-b67657589-ndpxq 1/1 Running 0 76s
example-customer-x-80-zone1-vtorc-2bf8b95e-86b8b56fbc-q69h4 1/1 Running 0 76s
example-customer-x-x-zone1-vtorc-53d270f6-7754f557c-bb87n 1/1 Running 0 14m
example-etcd-faf13de3-1 1/1 Running 0 19m
example-etcd-faf13de3-2 1/1 Running 0 19m
example-etcd-faf13de3-3 1/1 Running 0 19m
example-vttablet-zone1-0118374573-10d08e80 3/3 Running 2 (35s ago) 76s
example-vttablet-zone1-0120139806-fed29577 3/3 Running 0 76s
example-vttablet-zone1-1250593518-17c58396 3/3 Running 0 14m
example-vttablet-zone1-2289928654-7de47379 3/3 Running 0 76s
example-vttablet-zone1-2469782763-bfadd780 3/3 Running 2 (19m ago) 19m
example-vttablet-zone1-2548885007-46a852d0 3/3 Running 1 (19m ago) 19m
example-vttablet-zone1-3778123133-6f4ed5fc 3/3 Running 2 (13m ago) 14m
example-vttablet-zone1-4277914223-0f04a9a6 3/3 Running 0 76s
example-zone1-vtadmin-c03d7eae-68d845dbfd-wnlk9 2/2 Running 0 19m
example-zone1-vtctld-1d4dcad0-75f6fb7c6b-78rpv 1/1 Running 1 (19m ago) 19m
example-zone1-vtgate-bc6cde92-57fdc84bb6-cdj75 1/1 Running 2 (19m ago) 19m
vitess-operator-7cc877ccc5-vdndl 1/1 Running 0 20m
Now we can start the Reshard operation. It occurs online and will not block any read or write operations to your database:
vtctldclient Reshard --target-keyspace customer --workflow cust2cust create --source-shards '-' --target-shards '-80,80-'
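Just like MoveTables, the Reshard workflow can be monitored with the status action while the copy is in progress:

vtctldclient Reshard --target-keyspace customer --workflow cust2cust status --format=json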
After the reshard is complete, we can use VDiff to check data integrity and ensure our source and target shards are consistent:
vtctldclient VDiff --target-keyspace customer --workflow cust2cust create
vtctldclient VDiff --format=json --target-keyspace customer --workflow cust2cust show last
VDiff da8b1af8-eaf0-415b-9b63-4d3606798435 scheduled on target shards, use show to view progress
{
"Workflow": "cust2cust",
"Keyspace": "customer",
"State": "started",
"UUID": "da8b1af8-eaf0-415b-9b63-4d3606798435",
"RowsCompared": 4,
"HasMismatch": false,
"Shards": "-80,80-",
"StartedAt": "2025-04-10 06:29:57",
"Progress": {
"Percentage": 100,
"ETA": "2025-04-10 06:29:57"
}
}
After validating for correctness, the next step is to switch all traffic from the source shards to the target shards:
vtctldclient Reshard --target-keyspace customer --workflow cust2cust SwitchTraffic
We should now be able to see the data that has been copied over to the new shards:
mysql --table < 306_select_customer-80_data.sql
mysql --table < 307_select_customer80-_data.sql
Using customer/-80
Customer
+-------------+--------------------+
| customer_id | email |
+-------------+--------------------+
| 1 | alice@domain.com |
| 2 | bob@domain.com |
| 3 | charlie@domain.com |
| 5 | eve@domain.com |
+-------------+--------------------+
COrder
+----------+-------------+----------+-------+
| order_id | customer_id | sku | price |
+----------+-------------+----------+-------+
| 1 | 1 | SKU-1001 | 100 |
| 2 | 2 | SKU-1002 | 30 |
| 3 | 3 | SKU-1002 | 30 |
| 5 | 5 | SKU-1002 | 30 |
+----------+-------------+----------+-------+
Using customer/80-
Customer
+-------------+----------------+
| customer_id | email |
+-------------+----------------+
| 4 | dan@domain.com |
+-------------+----------------+
COrder
+----------+-------------+----------+-------+
| order_id | customer_id | sku | price |
+----------+-------------+----------+-------+
| 4 | 4 | SKU-1002 | 30 |
+----------+-------------+----------+-------+
We can now complete the created Reshard workflow and remove the shard that is no longer required:
vtctldclient Reshard --target-keyspace customer --workflow cust2cust complete
kubectl apply --server-side -f 308_down_shard_-.yaml
kubectl delete vitessshards.planetscale.com example-customer-x-x-dc880356
Afterwards the list of running pods should look as follows. As we can see, the two tablets for the old shard as well as its vtorc pod were removed.
NAME READY STATUS RESTARTS AGE
example-commerce-x-x-zone1-vtorc-c13ef6ff-86bd96dfb4-kp8w5 1/1 Running 2 (33m ago) 33m
example-customer-80-x-zone1-vtorc-836adff9-b67657589-ndpxq 1/1 Running 0 15m
example-customer-x-80-zone1-vtorc-2bf8b95e-86b8b56fbc-q69h4 1/1 Running 0 15m
example-etcd-faf13de3-1 1/1 Running 0 33m
example-etcd-faf13de3-2 1/1 Running 0 33m
example-etcd-faf13de3-3 1/1 Running 0 33m
example-vttablet-zone1-0118374573-10d08e80 3/3 Running 2 (14m ago) 15m
example-vttablet-zone1-0120139806-fed29577 3/3 Running 0 15m
example-vttablet-zone1-2289928654-7de47379 3/3 Running 0 15m
example-vttablet-zone1-2469782763-bfadd780 3/3 Running 2 (33m ago) 33m
example-vttablet-zone1-2548885007-46a852d0 3/3 Running 1 (33m ago) 33m
example-vttablet-zone1-4277914223-0f04a9a6 3/3 Running 0 15m
example-zone1-vtadmin-c03d7eae-68d845dbfd-wnlk9 2/2 Running 0 33m
example-zone1-vtctld-1d4dcad0-75f6fb7c6b-78rpv 1/1 Running 1 (33m ago) 33m
example-zone1-vtgate-bc6cde92-57fdc84bb6-cdj75 1/1 Running 2 (33m ago) 33m
vitess-operator-7cc877ccc5-vdndl 1/1 Running 0 34m
To monitor our Vitess cluster we will use Prometheus and Grafana. We will not go through the setup of Prometheus and Grafana within this post and assume that we already have a running Prometheus and Grafana instance. To monitor the Vitess cluster, we will create a scrape configuration for Prometheus and import some dashboards into Grafana.
kubectl apply --server-side -f 401_monitoring.yaml
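The manifest above is specific to my setup, so if the targets do not show up in Prometheus, it can help to check the metrics endpoint of a component directly; the Vitess components expose Prometheus metrics on their web port under /metrics. For example, for one of the vttablet pods (pod name taken from the output above):

# Forward the vttablet web port and peek at the exposed metrics
kubectl port-forward example-vttablet-zone1-4277914223-0f04a9a6 15000:15000
curl -s http://localhost:15000/metrics | head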
Once we get some data in the dashboards, we can also generate some load by running the example application. The following command will create customers in our customer table. We can increase the load after some time by restarting the command with the -goroutines flag.
go run . -create-customers
go run . -create-customers -goroutines=20
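The load generator lives in the same playground repository. Under the hood, creating a customer presumably boils down to a simple insert through VTGate, with the customer_id filled in from the sequence table in the commerce keyspace. Roughly (a sketch, not the actual Go code):

# Insert a new customer through VTGate; customer_id comes from the customer sequence
mysql -e "insert into customer(email) values ('somebody@domain.com')"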
The following commands will create some orders in our corder table. To create a new order, we select a random customer and all products, then create an order for the selected customer and one of the selected products.
go run . -create-orders -goroutines=10
go run . -create-orders -goroutines=100
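Again as a rough sketch of what each iteration does (not the actual Go code): pick a customer, look at the products, and insert an order, with the order_id filled in from the sequence table:

# Pick a customer (the application picks one at random)
mysql -e "select customer_id from customer limit 1"
# Fetch all products
mysql -e "select sku, price from product"
# Create an order for the chosen customer and product
mysql -e "insert into corder(customer_id, sku, price) values (1, 'SKU-1001', 100)"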
Now we can also restart some tablets and monitor the behaviour of Vitess via the created dashboards. For example, we can restart tablets in the commerce keyspace as well as the tablets of shard 80- in the customer keyspace.
Last but not least, we can monitor a single tablet by creating a port-forward and opening http://localhost:15005 in our browser. The dashboard displays the number of queries per second, the current query and transaction log, real-time queries, and much more.
kubectl port-forward example-vttablet-zone1-4277914223-0f04a9a6 15005:15000
That's it for today's post. I had a lot of fun playing around with Vitess and hopefully gained a better understanding of how it works. I hope you also enjoyed the post, and I'll see you next time.