Cool SQL optimizations that do not depend on the cost model. Part 1 Cool SQL optimizations that do not depend on the cost model. Part 2 Cool SQL optimizations that do not depend on the cost model. Part 3
8. CHECK restrictions
Oh, this is cool stuff! Our Sakila database has a CHECK constraint on the FILM.RATING column :CREATE TABLE film (
..
RATING varchar(10) DEFAULT 'G',
..
CONSTRAINT check_special_rating
CHECK (rating IN ('G','PG','PG-13','R','NC-17')),
..
);
Seriously, use CHECK constraints to ensure data integrity. The cost of adding them is extremely low - much less than other restrictions, for example, PRIMARY , UNIQUE or FOREIGN KEY , because they do not require an index to work, so you get them practically "for free". But there is an interesting nuance related to optimization! Consider the following queries:
Impossible predicates
We've already encountered impossible predicates , even NOT NULL constraints (which are actually a special kind of CHECK constraint ), but this one is even cooler:SELECT *
FROM film
WHERE rating = 'N/A';
There is no such movie, and there cannot be, since the CHECK constraint prevents its insertion (or update). Again, this should translate to a command to do nothing. What about this request?
CREATE INDEX idx_film_rating ON film (rating);
SELECT count(*)
FROM film
WHERE rating NOT IN ('G','PG','PG-13','R');
Thanks to the index above, it's probably enough to just do a quick scan of the index and count all the movies with rating = 'NC-17' , since that's the only rating left. So the query should be rewritten like this:
SELECT count(*)
FROM film
WHERE rating = 'NC-17';
This should be the case regardless of the index, because comparing a column with one value is faster than comparing with 4. So, what databases can do this?
DB2
Impossible predicate (rating = 'N/A') Cool!Explain Plan
-----------------------------------
ID | Operation | Rows | Cost
1 | RETURN | | 0
2 | TBSCAN GENROW | 0 of 0 | 0
Predicate Information
2 - RESID (1 = 0)
Reverse predicate (rating = 'NC-17') Nope...
Explain Plan
------------------------------------------------------------
ID | Operation | Rows | Cost
1 | RETURN | | 34
2 | GRPBY (COMPLETE) | 1 of 210 ( .48%) | 34
3 | IXSCAN IDX_FILM_RATING | 210 of 1000 ( 21.00%) | 34
Predicate Information
3 - SARG NOT(Q1.RATING IN ('G', 'PG', 'PG-13', 'R'))
Although the ID=3 step uses an index, and although the cardinalities are correct, a full scan occurs because the plan does not have a range predicate, only the "SARG" predicate. See Marcus Wynand's review for details . You can also demonstrate this by manually inverting the predicate and getting:
Explain Plan
------------------------------------------------------------
ID | Operation | Rows | Cost
1 | RETURN | | 7
2 | GRPBY (COMPLETE) | 1 of 210 ( .48%) | 7
3 | IXSCAN IDX_FILM_RATING | 210 of 1000 ( 21.00%) | 7
Predicate Information
3 - START (Q1.RATING = 'NC-17')
STOP (Q1.RATING = 'NC-17')
Now we have the desired range predicate.
MySQL
MySQL supports the CHECK constraint syntax , but for some reason does not enforce them. Try this:CREATE TABLE x (a INT CHECK (a != 0));
INSERT INTO x VALUES (0);
SELECT * FROM x;
and you will get:
A
-
0
Zero points for MySQL (really, why not just support CHECK constraints ?)
Oracle
Impossible predicate (rating = 'N/A')--------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 0 |
|* 1 | FILTER | | 1 | | 0 |
|* 2 | TABLE ACCESS FULL| FILM | 0 | 89 | 0 |
--------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(NULL IS NOT NULL)
2 - filter("RATING"='N/A')
Again, a very strange filter NULL IS NOT NULL , cutting off FULL TABLE SCAN , which could just as easily be removed from the plan altogether. But at least it works! Reverse predicate (rating = 'NC-17') Oops:
----------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
|* 2 | INDEX FAST FULL SCAN| IDX_FILM_RATING | 1 | 415 | 210 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter((RATING'PG-13' AND RATING'R' AND RATING'PG' AND RATING'G'))
The predicate cannot be inverted, the cardinality assessment is very lame, in addition we get INDEX FAST FULL SCAN instead of INDEX RANGE SCAN , and the filter predicate instead of the access predicate . But this is what we should get, for example, by manually inverting the predicate:
------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
|* 2 | INDEX RANGE SCAN| IDX_FILM_RATING | 1 | 210 | 210 |
------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("RATING"='NC-17')
Bummer!
PostgreSQL
Note that the PostgreSQL version of Sakila database uses the ENUM type instead of CHECK constraints on the RATING column . I duplicated the table using a CHECK constraint instead . Impossible predicate (rating = 'N/A') Doesn't work:QUERY PLAN
------------------------------------------------------
Seq Scan on film2 (cost=0.00..67.50 rows=1 width=385)
Filter: ((rating)::text = 'N/A'::text)
The reverse predicate (rating = 'NC-17') doesn't work either:
QUERY PLAN
------------------------------------------------------------------
Aggregate (cost=70.53..70.54 rows=1 width=8)
-> Seq Scan on film2 (cost=0.00..70.00 rows=210 width=0)
Filter: ((rating)::text ALL ('{G,PG,PG-13,R}'::text[]))
Very sorry! Note: As David Rowley kindly pointed out to us in the comments , this feature can be enabled by setting the parameter:
SET constraint_exclusion TO on;
SQL Server
Невозможный предикат (rating = 'N/A')
Да!
|--Constant Scan
Reverse predicate (rating = 'NC-17') Yes too!
|--Compute Scalar
|--Stream Aggregate
|--Index Seek(OBJECT:([idx_film_rating]), SEEK:([rating]='NC-17'))
Summary
Database | Impossible predicate | Reverse predicate |
---|---|---|
DB2 LUW 10.5 | Yes | No |
MySQL 8.0.2 | Not supported | Not supported |
Oracle 12.2.0.1 | Yes | No |
PostgreSQL 9.6 | No | No |
9. Unnecessary reflexive connections.
As your queries become more complex, you may well end up needing to perform a reflective join on a table based on its primary key. Believe me, this is a very common practice when building complex views and connecting them to each other, so making sure the database pays attention to this is a critical part of optimizing complex SQL code. I will not demonstrate a complex example, a simple one will suffice, for example:SELECT a1.first_name, a1.last_name
FROM actor a1
JOIN actor a2 ON a1.actor_id = a2.actor_id;
This can be seen as a special case of JOIN elimination , since we don't actually need a join to A2 , we can do everything we need with just table A1 . Next, INNER JOIN elimination only works properly if there is a FOREIGN KEY , which we don't have here. But thanks to the primary key by ACTOR_ID , we can prove that in fact A1 = A2 . In a sense, this is again transitive closure . You can go even further and use columns from both tables A1 and A2 :
SELECT a1.first_name, a2.last_name
FROM actor a1
JOIN actor a2 ON a1.actor_id = a2.actor_id;
In the classic case of JOIN elimination , it would no longer be possible to eliminate it since both tables are projected. But since we have already proven that A1 = A2 , then they are interchangeable, so we can expect the query to be converted to:
SELECT first_name, last_name
FROM actor;
What DBMS can do this?
DB2
Projection of table A1 only Yes:Explain Plan
------------------------------------------------
ID | Operation | Rows | Cost
1 | RETURN | | 20
2 | TBSCAN ACTOR | 200 of 200 (100.00%) | 20
Projection of tables A1 and A2 ... also yes:
Explain Plan
------------------------------------------------
ID | Operation | Rows | Cost
1 | RETURN | | 20
2 | TBSCAN ACTOR | 200 of 200 (100.00%) | 20
MySQL
Projection of table A1 only No.ID TABLE REF EXTRA
-----------------------------------
1 a1
1 a2 a1.actor_id Using index
Projection of tables A1 and A2 ... also no
ID TABLE REF EXTRA
-----------------------------------
1 a1
1 a2 a1.actor_id
Complete disappointment...
Oracle
Projection of table A1 only Yes--------------------------------------------
| Id | Operation | Name | E-Rows |
--------------------------------------------
| 0 | SELECT STATEMENT | | |
| 1 | TABLE ACCESS FULL| ACTOR | 200 |
--------------------------------------------
Projection of tables A1 and A2 Yes again
--------------------------------------------
| Id | Operation | Name | E-Rows |
--------------------------------------------
| 0 | SELECT STATEMENT | | |
| 1 | TABLE ACCESS FULL| ACTOR | 200 |
--------------------------------------------
PostgreSQL
Projection of table A1 only No:QUERY PLAN
--------------------------------------------------------------------
Hash Join (cost=6.50..13.25 rows=200 width=13)
Hash Cond: (a1.actor_id = a2.actor_id)
-> Seq Scan on actor a1 (cost=0.00..4.00 rows=200 width=17)
-> Hash (cost=4.00..4.00 rows=200 width=4)
-> Seq Scan on actor a2 (cost=0.00..4.00 rows=200 width=4)
Projection of tables A1 and A2 And again no:
QUERY PLAN
---------------------------------------------------------------------
Hash Join (cost=6.50..13.25 rows=200 width=13)
Hash Cond: (a1.actor_id = a2.actor_id)
-> Seq Scan on actor a1 (cost=0.00..4.00 rows=200 width=10)
-> Hash (cost=4.00..4.00 rows=200 width=11)
-> Seq Scan on actor a2 (cost=0.00..4.00 rows=200 width=11)
SQL Server
Projection of table A1 only Oddly enough, no! (But keep in mind that I'm using SQL Server 2014, newer versions may have this fixed. I could definitely use an upgrade!)|--Merge Join(Inner Join, MERGE:([a2].[actor_id])=([a1].[actor_id]))
|--Index Scan(OBJECT:([a2]))
|--Sort(ORDER BY:([a1].[actor_id] ASC))
|--Table Scan(OBJECT:([a1]))
Projection of tables A1 and A2 Not again, and the plan even changed for the worse:
|--Hash Match(Inner Join, HASH:([a1].[actor_id])=([a2].[actor_id]))
|--Table Scan(OBJECT:([sakila].[dbo].[actor] AS [a1]))
|--Table Scan(OBJECT:([sakila].[dbo].[actor] AS [a2]))
Summary
Frankly speaking, I expected that this optimization would be performed on all databases, but I was very mistaken, sadly. Along with eliminating JOIN , this is one of the most important optimizations, allowing you to build huge SQL queries from reusable parts like views and table functions. Unfortunately, it is not supported in 3 of the 5 most common databases.Database | Removing reflective join, single table projection | Reflexive connection elimination, full projection |
---|---|---|
DB2 LUW 10.5 | Yes | Yes |
MySQL 8.0.2 | No | No |
Oracle 12.2.0.1 | Yes | Yes |
PostgreSQL 9.6 | No | No |
SQL Server 2014 | No | No |
GO TO FULL VERSION