Tuesday, April 10, 2012

Optimizer: new EXPLAIN FORMAT=JSON

New feature: structured EXPLAIN


Probably, most of us tried to read the output of MySQL EXPLAIN command at least once.
And probably many of us have decided, that it isn't so readable and understandable as it has to be.
Some of us even tried to create or use external pretty-printing programs and other complicated converters to make EXPLAIN's output less cryptic, for example nice Percona's Visual Explain script.
But from now we have a native MySQL command that explains query execution plan in a better human-readable and machine-readable way: EXPLAIN FORMAT=JSON!

A trivial example, SELECT from a single table:

mysql> CREATE TABLE t1 (i INT, j INT);
Query OK, 0 rows affected (0.00 sec)

mysql> EXPLAIN SELECT * FROM t1;
+----+-------------+-------+--------+---------------+------+---------+------+------+---------------------+
| id | select_type | table | type   | possible_keys | key  | key_len | ref  | rows | Extra               |
+----+-------------+-------+--------+---------------+------+---------+------+------+---------------------+
|  1 | SIMPLE      | t1    | system | NULL          | NULL | NULL    | NULL |    0 | const row not found |
+----+-------------+-------+--------+---------------+------+---------+------+------+---------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN FORMAT=JSON SELECT * FROM t1;
+------------------------------------------------- ...
| EXPLAIN                                          
+------------------------------------------------- ...
| {
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "t1",
      "access_type": "system",
      "rows": 0,
      "filtered": 0,
      "const_row_not_found": true
    }
  }
} |
+------------------------------------------------- ...
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS;
+-------+------+----------------------------------------------------------------+
| Level | Code | Message                                                        |
+-------+------+----------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select NULL AS `i`,NULL AS `j` from `test`.`t1` |
+-------+------+----------------------------------------------------------------+
1 row in set (0.00 sec)
  

Moreover, new EXPLAIN is not just a 1-to-1 translator of the old EXPLAIN command output.
JSON output is much more informative:

  • it tries to display a real execution path of the query;
  • it describes pushed conditions;
  • temporary table and index creation is displayed in more precise

More advanced example, nested loop join and pushed conditions (see "attached_conditions"):

mysql> CREATE TABLE t2 (i INT KEY, j INT);
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO t1 VALUES (1, 1), (2, 2), (3, 3);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> INSERT INTO t2 SELECT * FROM t1;
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.i = t2.i WHERE t1.j > 1 AND t2.j < 3;
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref       | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------------+
|  1 | SIMPLE      | t1    | ALL    | NULL          | NULL    | NULL    | NULL      |    3 | Using where |
|  1 | SIMPLE      | t2    | eq_ref | PRIMARY       | PRIMARY | 4       | test.t1.i |    1 | Using where |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------------+
2 rows in set (0.00 sec)

mysql> SET end_markers_in_json=true;
Query OK, 0 rows affected (0.00 sec)

mysql> EXPLAIN FORMAT=JSON SELECT * FROM t1 JOIN t2 ON t1.i = t2.i WHERE t1.j > 1 AND t2.j < 3;
+------------------------------------------------- ...
| EXPLAIN
+------------------------------------------------- ...
| {
  "query_block": {
    "select_id": 1,
    "nested_loop": [
      {
        "table": {
          "table_name": "t1",
          "access_type": "ALL",
          "rows": 3,
          "filtered": 100,
          "attached_condition": "((`test`.`t1`.`j` > 1) and (`test`.`t1`.`i` is not null))"
        } /* table */
      },
      {
        "table": {
          "table_name": "t2",
          "access_type": "eq_ref",
          "possible_keys": [
            "PRIMARY"
          ] /* possible_keys */,
          "key": "PRIMARY",
          "key_length": "4",
          "ref": [
            "test.t1.i"
          ] /* ref */,
          "rows": 1,
          "filtered": 100,
          "attached_condition": "(`test`.`t2`.`j` < 3)"
        } /* table */
      }
    ] /* nested_loop */
  } /* query_block */
} |
+------------------------------------------------- ...
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS;
+-------+------+---------------------------------- ...
| Level | Code | Message
+-------+------+---------------------------------- ...
| Note  | 1003 | /* select#1 */ select `test`.`t1`.`i` AS `i`,`test`.`t1`.`j` AS `j`,`test`.`t2`.`i` AS `i`,`test`.`t2`.`j` AS `j` from `test`.`t1` join `test`.`t2` where ((`test`.`t2`.`i` = `test`.`t1`.`i`) and (`test`.`t1`.`j` > 1) and (`test`.`t2`.`j` < 3)) |
+-------+------+---------------------------------- ...
1 row in set (0.00 sec)
  

Thus, instead of the "Using where" in the traditional EXPLAIN output we can see where each part of query condition was pushed down and what part was pushed.

Similar example: "Using index" goes "index_condition":

mysql> EXPLAIN SELECT * FROM t2 WHERE i > 1 AND j < 3;
+----+-------------+-------+-------+---------------+---------+---------+------+------+------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                              |
+----+-------------+-------+-------+---------------+---------+---------+------+------+------------------------------------+
|  1 | SIMPLE      | t2    | range | PRIMARY       | PRIMARY | 4       | NULL |    2 | Using index condition; Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+------------------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN FORMAT=JSON SELECT * FROM t2 WHERE i > 1 AND j < 3;
+------------------------------------------------- ...
| EXPLAIN
+------------------------------------------------- ...
| {
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "t2",
      "access_type": "range",
      "possible_keys": [
        "PRIMARY"
      ] /* possible_keys */,
      "key": "PRIMARY",
      "key_length": "4",
      "rows": 2,
      "filtered": 100,
      "index_condition": "(`test`.`t2`.`i` > 1)",
      "attached_condition": "(`test`.`t2`.`j` < 3)"
    } /* table */
  } /* query_block */
} |
+------------------------------------------------- ...
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS;
+-------+------+---------------------------------- ...
| Level | Code | Message                                                                                                                                      |
+-------+------+---------------------------------- ...
| Note  | 1003 | /* select#1 */ select `test`.`t2`.`i` AS `i`,`test`.`t2`.`j` AS `j` from `test`.`t2` where ((`test`.`t2`.`i` > 1) and (`test`.`t2`.`j` < 3)) |
+-------+------+---------------------------------- ...
1 row in set (0.00 sec)

  

Also the new explain reflects the proper execution order of "grouping_operation" (GROUP BY clause), "ordering_operation" (ORDER BY) and "duplicates_removal" (DISTINCT etc.) operations.
Each of them may be an initiator of temporary table or temporary index (filesort) creation.
One of the most advanced features of new EXPLAIN is how it displays temporary tables and temporary indices.
Whereas the traditional EXPLAIN may "aggregate" two facts of temporary table creation into the single "Using temporary table" message (and similar for temporary index), the new JSON output tries to display both (if any) at the proper place: i.e. if the "ORDER BY" clause is the initiator of a temporary table creation, the new EXPLAIN shows "using_temporary_table": true key-value pair under the "ordering_operation" node, and if, for example, the "GROUP BY" causes a temporary index creation, the EXPLAIN outputs "using_filesort": true pair under the "grouping_operation" node.

Simple example with DISTINCT, GROUP BY and ORDER BY:

mysql> EXPLAIN SELECT DISTINCT * FROM t1 GROUP BY i ORDER BY j;
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
|  1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL |    3 | Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN FORMAT=JSON SELECT DISTINCT * FROM t1 GROUP BY i ORDER BY j;
+------------------------------------------------- ...
| EXPLAIN
+------------------------------------------------- ...
| {
  "query_block": {
    "select_id": 1,
    "ordering_operation": {
      "using_filesort": true,
      "duplicates_removal": {
        "using_filesort": false,
        "grouping_operation": {
          "using_temporary_table": true,
          "using_filesort": false,
          "table": {
            "table_name": "t1",
            "access_type": "ALL",
            "rows": 3,
            "filtered": 100
          } /* table */
        } /* grouping_operation */
      } /* duplicates_removal */
    } /* ordering_operation */
  } /* query_block */
} |
+------------------------------------------------- ...
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS;
+-------+------+---------------------------------- ...
| Level | Code | Message                                                                                                                                         |
+-------+------+---------------------------------- ...
| Note  | 1003 | /* select#1 */ select distinct `test`.`t1`.`i` AS `i`,`test`.`t1`.`j` AS `j` from `test`.`t1` group by `test`.`t1`.`i` order by `test`.`t1`.`j` |
+-------+------+---------------------------------- ...
1 row in set (0.00 sec)
  

Another advanced feature is how the new EXPLAIN displays table materializations. It includes "derived tables", semi-join materialization etc.
Update: See Guilhem's post about subqueries with materialization for a complicated example.

Trivial subquery materialization example: "derived table"

mysql> EXPLAIN SELECT * FROM (SELECT * FROM t1) t;
+----+-------------+------------+------+---------------+------+---------+------+------+-------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+------+-------+
|  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL |    3 | NULL  |
|  2 | DERIVED     | t1         | ALL  | NULL          | NULL | NULL    | NULL |    3 | NULL  |
+----+-------------+------------+------+---------------+------+---------+------+------+-------+
2 rows in set (0.00 sec)

mysql> EXPLAIN FORMAT=JSON SELECT * FROM (SELECT * FROM t1) t;
+------------------------------------------------- ...
| EXPLAIN
+------------------------------------------------- ...
| {
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "t",
      "access_type": "ALL",
      "rows": 3,
      "filtered": 100,
      "materialized_from_subquery": {
        "using_temporary_table": true,
        "dependent": false,
        "cacheable": true,
        "query_block": {
          "select_id": 2,
          "table": {
            "table_name": "t1",
            "access_type": "ALL",
            "rows": 3,
            "filtered": 100
          } /* table */
        } /* query_block */
      } /* materialized_from_subquery */
    } /* table */
  } /* query_block */
} |
+------------------------------------------------- ...
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS;
+-------+------+---------------------------------- ...
| Level | Code | Message                                                                                                                                             |
+-------+------+---------------------------------- ...
| Note  | 1003 | /* select#1 */ select `t`.`i` AS `i`,`t`.`j` AS `j` from (/* select#2 */ select `test`.`t1`.`i` AS `i`,`test`.`t1`.`j` AS `j` from `test`.`t1`) `t` |
+-------+------+---------------------------------- ...
1 row in set (0.00 sec)
  

The new JSON EXPLAIN is feature rich, please try it and have fun!


And, at the end of this post, let me thank Evgeny Potemkin and Guilhem Bichot for their great help in design and implementation of structured EXPLAIN!