Search This Blog

Sunday, July 21, 2013

MySQL :: Partitioning with Dates in MySQL 5.1

MySQL :: Partitioning with Dates in MySQL 5.1:

Partitioning with Dates in MySQL 5.1

Many of you have responded very positively on the first and second partitioning articles that I wrote some time back. It’s clear that a lot of you are excited about the upcoming MySQL 5.1 release, and in particular, the new data partitioning feature that’s being offered. That’s no surprise because, as we covered in the previous two articles, partitioning has some excellent performance and management benefits for anyone who’s building data warehouses or other large databases with MySQL.
However, I’ve gotten some feedback indicating a disappointment that MySQL 5.1 doesn’t support partitioning on a native date/time column. Given that many data warehouses and other large databases try and segment data by some date-based algorithm – especially via the RANGE partitioning style – it’s understandable that a number of you are wondering if partitioning in 5.1 will meet the needs of your particular business intelligence application that relies on time-based functions.
It’s true that in MySQL 5.1 the actual partitioning key (the element used by the MySQL partition engine to horizontally divide the rows across the various partitions defined on a table or index) must be an integer value. But that doesn’t mean you can’t partition by date in MySQL 5.1 – it just requires the proper selection between a couple of MySQL date functions and you’ll be in business.
Let’s take a look at how to properly partition by dates in MySQL 5.1 and also showcase how not to try and partition by date, so you won’t waste time and effort on something that doesn’t give you the results that you’re looking for.

How Not to Partition by Date

I always appreciate it when someone shows me where not to step, so let’s first see what you don’t want to try when it comes to partitioning by date in MySQL 5.1. If you try and define a table with a partition key that references a date datatype, MySQL will quickly kick back an error:
mysql>  create table rms (d date)
    ->  partition by range (d)
    -> (partition p0 values less than ('1995-01-01'),
    ->  partition p1 VALUES LESS THAN ('2010-01-01'));
ERROR 1064 (42000): VALUES value must be of same type as partition function near '),
 partition p1 VALUES LESS THAN ('2010-01-01'))' at line 3
Because MySQL DBAs and developers are very clever and innovative, the first thing they'll do is try a workaround for the integer-only restriction in MySQL 5.1 regarding the partition key. For example, here's one way to get around the restriction:
mysql> CREATE TABLE part_date1
-> ( c1 int default NULL,
-> c2 varchar(30) default NULL,
-> c3 date default NULL) engine=myisam
-> partition by range (cast(date_format(c3,'%Y%m%d') as signed))
-> (PARTITION p0 VALUES LESS THAN (19950101),
-> PARTITION p1 VALUES LESS THAN (19960101) ,
-> PARTITION p2 VALUES LESS THAN (19970101) ,
-> PARTITION p3 VALUES LESS THAN (19980101) ,
-> PARTITION p4 VALUES LESS THAN (19990101) ,
-> PARTITION p5 VALUES LESS THAN (20000101) ,
-> PARTITION p6 VALUES LESS THAN (20010101) ,
-> PARTITION p7 VALUES LESS THAN (20020101) ,
-> PARTITION p8 VALUES LESS THAN (20030101) ,
-> PARTITION p9 VALUES LESS THAN (20040101) ,
-> PARTITION p10 VALUES LESS THAN (20100101),
-> PARTITION p11 VALUES LESS THAN MAXVALUE );
Query OK, 0 rows affected (0.01 sec)
Bingo! MySQL took it. Here’s another possible workaround:
mysql> CREATE TABLE part_date2
    ->      (  c1 int default NULL,
    ->  c2 varchar(30) default NULL,
    ->  c3 date default NULL) engine=myisam
    ->      partition by range (YEAR(C3)*100+MONTH(C3))
    -> (PARTITION p0 VALUES LESS THAN (199501),
    -> PARTITION p1 VALUES LESS THAN (199601) ,
    -> PARTITION p2 VALUES LESS THAN (199701) ,
    -> PARTITION p3 VALUES LESS THAN (199801) ,
    -> PARTITION p4 VALUES LESS THAN (199901) ,
    -> PARTITION p5 VALUES LESS THAN (200001) ,
    -> PARTITION p6 VALUES LESS THAN (200101) ,
    -> PARTITION p7 VALUES LESS THAN (200201) ,
    -> PARTITION p8 VALUES LESS THAN (200301) ,
    -> PARTITION p9 VALUES LESS THAN (200401) ,
    -> PARTITION p10 VALUES LESS THAN (201001),
    -> PARTITION p11 VALUES LESS THAN MAXVALUE );
Query OK, 0 rows affected (0.02 sec)
MySQL took that too. At this point, you might be feeling pretty crafty as you’ve outwitted MySQL into partitioning by a date column. But watch what happens when we fill the tables with millions of rows of data and then see how each reacts with a standard range-based query:
mysql> explain partitions
    -> select count(*) from part_date1 where
    ->      c3 > date '1995-01-01' and c3 < date '1995-12-31'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: part_date1
   partitions: p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 8100000
        Extra: Using where
1 row in set (0.00 sec)

mysql> explain partitions
    -> select count(*) from part_date2 where
    ->      c3 > date '1995-01-01' and c3 < date '1995-12-31'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: part_date1
   partitions: p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 8100000
        Extra: Using where
1 row in set (0.00 sec)   
Yikes! What happened? MySQL is doing a full table scan rather than just scanning the partition(s) that contain the data we need. The issue is that you’ve forgotten the second part of the equation when it comes to leveraging date-based partitioning (or any partitioning style really), which is the MySQL optimizer. Yes, you can divide your data using any number of date-to-integer conversion techniques, but that doesn’t mean you’ll get the big bang when it comes to using partitioning, which is partition pruning – the act where MySQL only references the partitions needed to fulfill a particular query. The optimizer’s partition pruning mechanism is the actual hero that accomplishes the performance speed-up’s in partitioned vs. non-partitioned tables, and without it, you might as well go back to using non-partitioned tables if it’s an increase in performance you’re looking for.
But take heart – there is a way to partition by date and get partition pruning.

How to Partition by Date

The MySQL optimizer will recognize two date-based functions for partition pruning purposes:
  • 1.TO_DAYS()
  • 2.YEAR()
If you partition date columns using either of these functions, you can both define a partitioned table by date and get the benefit of partition pruning from the optimizer (provided, of course, that your query is written in such a way so as to take advantage of the optimizer’s ability to prune partitions). For example, we can redefine the tables used in our previous example using both the TO_DAYS() and YEAR() date-based functions:
mysql> CREATE TABLE part_date3
    ->      (  c1 int default NULL,
    ->  c2 varchar(30) default NULL,
    ->  c3 date default NULL) engine=myisam
    ->      partition by range (to_days(c3))
    -> (PARTITION p0 VALUES LESS THAN (to_days('1995-01-01')),
    -> PARTITION p1 VALUES LESS THAN (to_days('1996-01-01')) ,
    -> PARTITION p2 VALUES LESS THAN (to_days('1997-01-01')) ,
    -> PARTITION p3 VALUES LESS THAN (to_days('1998-01-01')) ,
    -> PARTITION p4 VALUES LESS THAN (to_days('1999-01-01')) ,
    -> PARTITION p5 VALUES LESS THAN (to_days('2000-01-01')) ,
    -> PARTITION p6 VALUES LESS THAN (to_days('2001-01-01')) ,
    -> PARTITION p7 VALUES LESS THAN (to_days('2002-01-01')) ,
    -> PARTITION p8 VALUES LESS THAN (to_days('2003-01-01')) ,
    -> PARTITION p9 VALUES LESS THAN (to_days('2004-01-01')) ,
    -> PARTITION p10 VALUES LESS THAN (to_days('2010-01-01')),
    -> PARTITION p11 VALUES LESS THAN MAXVALUE );
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE part_date4
    ->      (  c1 int default NULL,
    ->  c2 varchar(30) default NULL,
    ->  c3 date default NULL) engine=myisam
    ->      PARTITION BY RANGE (year(c3))
    -> (PARTITION p0 VALUES LESS THAN (1995),
    -> PARTITION p1 VALUES LESS THAN (1996) ,
    -> PARTITION p2 VALUES LESS THAN (1997) ,
    -> PARTITION p3 VALUES LESS THAN (1998) ,
    -> PARTITION p4 VALUES LESS THAN (1999) ,
    -> PARTITION p5 VALUES LESS THAN (2000) ,
    -> PARTITION p6 VALUES LESS THAN (2001) ,
    -> PARTITION p7 VALUES LESS THAN (2002) ,
    -> PARTITION p8 VALUES LESS THAN (2003) ,
    -> PARTITION p9 VALUES LESS THAN (2004) ,
    -> PARTITION p10 VALUES LESS THAN (2010),
    -> PARTITION p11 VALUES LESS THAN MAXVALUE );
Query OK, 0 rows affected (0.02 sec)
If we fill both tables with the same amount of data as the tables we created with non-prunable partitioning keys and issue range-based queries against the newly loaded well-partitioned tables, we’ll find a big difference when it comes to what the MySQL optimizer does:
mysql> explain partitions
    -> select count(*) from part_date3 where
    ->      c3 > date '1995-01-01' and c3 < date '1995-12-31'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: part_date3
   partitions: p1
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 808431
        Extra: Using where
1 row in set (0.00 sec)

mysql> explain partitions
    -> select count(*) from part_date4 where
    ->      c3 > date '1995-01-01' and c3 < date '1995-12-31'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: part_date4
   partitions: p1
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 200000
        Extra: Using where
1 row in set (0.00 sec)
Notice only the p1 partition is being scanned in these two tables, whereas all partitions were being checked in the tables that had partition keys the MySQL optimizer couldn’t use for partition pruning. A quick test run shows partition pruning in action along with the resulting performance difference:
mysql> select count(*) from part_date3 where
    ->      c3 > date '1995-01-01' and c3 < date '1995-12-31';
+----------+
| count(*) |
+----------+
|   805114 |
+----------+
1 row in set (4.11 sec)

mysql> select count(*) from part_date1 where
    ->      c3 > date '1995-01-01' and c3 < date '1995-12-31';
+----------+
| count(*) |
+----------+
|   805114 |
+----------+
1 row in set (40.33 sec)
A 90% decrease in response time for the query referencing table part_date3 over the query requesting data from table part_date1 spells out pretty clearly why it’s important to use date functions that the MySQL optimizer can utilize when defining date-based partitioned tables.

Conclusion

While a restriction does exist in MySQL 5.1 that says partitioned tables must be partitioned via an integer-based value, it doesn’t mean you can’t partition tables by date or get the benefits of partition pruning. You must, however, ensure that you’re using date conversion functions in the defined partitioning key for the table that the MySQL optimizer can make use of.
For more information on partitioning in general, please check out our online reference manual at http://dev.mysql.com/doc/refman/5.1/en/partitioning.html. Also, don’t forget about our MySQL forums, which have a special section devoted to partitioning at http://forums.mysql.com/list.php?106.
Of course I’d be remiss if I didn’t encourage you to download the latest copy of MySQL 5.1 (which is now in beta) today and give partitioning a try. I’d certainly love to hear how you’re using partitioning and other new features in MySQL 5.1.
Thanks, as always, for supporting MySQL!

No comments:

Post a Comment