Search This Blog

Sunday, October 20, 2013

InnoDB scalability issues due to tables without primary keys

InnoDB scalability issues due to tables without primary keys:

Scalability issues due to tables without primary keys

This scalability issue is caused by the usage of tables without primary keys. This issue typically shows itself as contention on the InnoDB dict_sys mutex. Now the dict_sys mutex controls access to the data dictionary. This mutex is used at various places. I will only mention a few of them:
  • During operations such as opening and closing table handles, or
  • When accessing I_S tables, or
  • During undo of a freshly inserted row, or
  • During other data dictionary modification operations such as CREATE TABLE, or
  • Within the “Persistent Stats” subsystem, among other things.
Of course this list is not exhaustive but should give you a good picture of how heavily it is used.
But the thing is when you are mainly debugging contention related to a data dictionary control structure, you start to look off at something that is directly related to data dictionary modifications. You look for execution of CREATE TABLE, DROP TABLE, TRUNCATE TABLE, etc. But what if none of that is actually causing the contention on the dict_sys mutex? Are you aware when generating “row-id” values, for tables without explicit primary keys, or without non-nullable unique keys, dict_sys mutex is acquired. So INSERTs to tables with implicit primary keys is a InnoDB system-wide contention point.
Let’s also take a look at the relevant source code.
Firstly, below is the function that does the row-id allocation which is defined in the file storage/innobase/row/row0ins.cc
Secondly, below is the function that actually generates the row-id which is defined in the filestorage/innobase/include/dict0boot.ic
Finally, I would like to share results of a few benchmarks that I conducted in order to show you how this affects performance.

Benchmarking affects of non-presence of primary keys

First off all, let me share information about the host that was used in the benchmarks. I will also share the MySQL version and InnoDB configuration used.

Hardware

The host was a “hi1.4xlarge” Amazon EC2 instance. The instance comes with 16 vCPUs and 60.5GB of memory. The instance storage consists of 2×1024 SSD-backed storage volumes, and the instance is connected to a 10 Gigabit ethernet network. So the IO performance is very decent. I created a RAID 0 array from the 2 instance storage volumes and created XFS filesystem on the resultant software RAID 0 volume. This configuration would allows us to get the best possible IO performance out of the instance.

MySQL

The MySQL version used was 5.5.34 MySQL Community Server, and the InnoDB configuration looked as follows:
I conducted two different types of benchmarks, and both of them were done by using sysbench.
First one involved benchmarking the performance of single-row INSERTs for tables with and without explicit primary keys. That’s what I would be showing first.

Single-row INSERTs

The tables were generated as follows for the benchmark involving tables with primary keys:
This resulted in the following table being created:
While the tables without primary keys were generated as follows:
This resulted in the tables being created with the following structure:
The actual benchmark for the table with primary keys was run as follows:
While the actual benchmark for the table without primary keys was run as follows:
Note that the benchmarks were run with three variations in the number of concurrent threads used by sysbench: 16, 32 and 64.
Below are how the graphs look like for each of these benchmarks.
Writes per second 16 threads
Writes per second 32 threads
Writes per second 64 threads
Some key things to note from the graphs are that the throughput of the INSERTs to the tables without explicit primary keys never goes above 87% of the throughput of the INSERTs to the tables with primary keys defined. Furthermore, as we increase the concurrency downward spikes start appearing. These become more apparent when we move to a concurrency of 64 threads. This is expected, because the contention is supposed to increase as we increase the concurrency of operations that contend on the dict_sys mutex.
Now let’s take a look at how this impacts the bulk load performance.

Bulk Loads

The bulk loads to the tables with primary keys were performed as follows:
While the bulk loads to the tables without primary keys were performed as follows:
Note that the benchmarks were again run with three variations in the number of concurrent threads used by sysbench: 16, 32 and 64.
Below is what the picture is portrayed by the graph.
Parallel Bulk Loading of Tables
Here again, you can see how the bulk load time increases as we increase the number of concurrent threads. This against points to the increase in contention on the dict_sys mutex. With 16 threads the bulk load time for tables without primary keys is 107% more than the bulk load time for the tables with primary keys. This increases to 116% with 32 threads and finally 124% with 64 threads.

Conclusion

Tables without primary keys cause a wide range of contention because they rely on acquiring dict_sys mutex to generate row-id values. This mutex is used at critical places within InnoDB. Hence the affect of large amount of INSERTs to tables without primary keys is not only isolated to that table alone but can be seen very widely. There are a number of times I have seen tables without primary keys being used in many different scenarios that include simple INSERTs to these tables as well as multi-row INSERTs as a result of, for example, INSERT … SELECT into a table that is being temporarily created. The advice is always to have primary keys present in your tables. Hopefully I have been able to highlight the true impact non-presence of primary keys can have.

No comments:

Post a Comment