Search This Blog

Thursday, October 13, 2011

When EXPLAIN estimates can go wrong!

When EXPLAIN estimates can go wrong!:

I have been working with a few customer cases and one interesting case popped up. The customer was facing a peculiar problem where the rows column in the EXPLAIN output of the query was totally off. The actual number of rows was 18 times more than the number of rows reported by MySQL in the output of EXPLAIN. Now this can be a real pain as MySQL uses “the number of rows” estimation to pick and choose indexes and it could really be picking up a wrong index simply because of the wrong estimate.


The customer reported that he changed the value of innodb_stats_sample_pages from 8 to 256 but with no effect, however I think innodb_stats_sample_pages really would have no effect on the “number of rows estimation”, because page sampling is used to generate index cardinality estimates which are then in turn used by MySQL to pick and choose indexes and hence is irrelevant to this problem.


Next, I proceeded to test using MySQL 5.1.58 and 5.5.15 vanilla releases.


Following is the definition of the table that I used for the test:


mysql [localhost] {msandbox} (foo2) > show create table test_estimate \G
*************************** 1. row ***************************
Table: test_estimate
Create Table: CREATE TABLE `test_estimate` (
`id` int(11) NOT NULL DEFAULT '0',
`created` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`type` tinyint(4) NOT NULL DEFAULT '0',
KEY `id` (`id`),
KEY `type_created` (`type`,`created`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

First I did a test for a SIMPLE SELECT to see the actual count of rows:


mysql [localhost] {msandbox} (foo2) > select count(*) from test_estimate where type=6 \G
*************************** 1. row ***************************
count(*): 3372104

And then proceeded to run the EXPLAIN on both 5.1 and 5.5:


On 5.1:


mysql [localhost] {msandbox} (foo2) > explain select count(*) from test_estimate where type=6 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: test_estimate
type: ref
possible_keys: type_created
key: type_created
key_len: 1
ref: const
rows: 185440
Extra: Using index

On 5.5:


mysql [localhost] {msandbox} (foo2) > explain select count(*) from test_estimate where type=6 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: test_estimate
type: ref
possible_keys: type_created
key: type_created
key_len: 1
ref: const
rows: 3135918
Extra: Using index

The results on 5.1 are very off, you don’t expect that much variation between the actual number of rows and the rows estimated. While the results on 5.5 are much closer and are acceptable.


Next, I decided to do a test run for a SELECT involving RANGE SCAN:


The actual count of rows is:


mysql [localhost] {msandbox} (foo2) > select count(*) from test_estimate where type between 3 and 6 \G
*************************** 1. row ***************************
count(*): 19391022

And the EXPLAIN result,


On 5.1:


mysql [localhost] {msandbox} (foo2) > explain select count(*) from test_estimate where type between 3 and 6 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: test_estimate
type: range
possible_keys: type_created
key: type_created
key_len: 1
ref: NULL
rows: 339184
Extra: Using where; Using index

On 5.5:


mysql [localhost] {msandbox} (foo2) > explain select count(*) from test_estimate where type between 3 and 6 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: test_estimate
type: range
possible_keys: type_created
key: type_created
key_len: 1
ref: NULL
rows: 14313458
Extra: Using where; Using index

Again the row estimates on 5.1 is very off, its as much as 57 times less than the number of rows, which is not acceptable at all. While 5.5 returns an acceptable estimate.


This test proves that there is a bug in MySQL 5.1 and how it calculates the row estimates. This bug was tracked down to http://bugs.mysql.com/bug.php?id=53761 and was indeed fixed in 5.5 as my tests show.


Vasil Dimov goes on to explain the reason of the bug:

“For a given level the algo knows the number of pages in the requested range and the number of records on the leftmost and the rightmost page. Then it assumes all pages in between contain the average between the two border pages and multiplies this average number by the number of intermediate pages.”


Obviously this assumption, that all the pages in the requested range have almost similar number of records, fails when the pages between the leftmost and the rightmost page have fewer or larger number of records per page. This is not something that can happen in all cases, but there are two scenarios when this can happen:



  1. when you have a large number of records that are concentrated against a key value and that key value happens to be in the intermediate pages,

  2. when you have a large number of records that are concentrated against key values that lie in the leftmost and rightmost page and very few records concentrated against the key values that lie in the intermediate pages.


The fix that is introduced is:

“Same idea, but peek a few (10) of the intermediate pages to get a better estimate of the average number of records per page. If there are less than 10 intermediate pages then all of them will be scanned and the result will be precise, not an estimation.”


This seems to be a good fix as sampling more of the intermediate pages is going to increase the quality of the estimation, and make the estimation very precise if the index tree is not much wide.

No comments:

Post a Comment