JavaRush /Java Blog /Random EN /Cool SQL optimizations that do not depend on the cost mod...

Level 41

28 February 2021
49 views
0 comments

Cool SQL optimizations that do not depend on the cost model. Part 4

Cool SQL optimizations that do not depend on the cost model. Part 1 Cool SQL optimizations that do not depend on the cost model. Part 2 Cool SQL optimizations that do not depend on the cost model. Part 3

8. CHECK restrictions

Oh, this is cool stuff! Our Sakila database has a CHECK constraint on the FILM.RATING column :

CREATE TABLE film (
  ..
  RATING varchar(10) DEFAULT 'G',
  ..
  CONSTRAINT check_special_rating
    CHECK (rating IN ('G','PG','PG-13','R','NC-17')),
  ..
);

Seriously, use CHECK constraints to ensure data integrity. The cost of adding them is extremely low - much less than other restrictions, for example, PRIMARY , UNIQUE or FOREIGN KEY , because they do not require an index to work, so you get them practically "for free". But there is an interesting nuance related to optimization! Consider the following queries:

Impossible predicates

We've already encountered impossible predicates , even NOT NULL constraints (which are actually a special kind of CHECK constraint ), but this one is even cooler:

SELECT *
FROM film
WHERE rating = 'N/A';

There is no such movie, and there cannot be, since the CHECK constraint prevents its insertion (or update). Again, this should translate to a command to do nothing. What about this request?

CREATE INDEX idx_film_rating ON film (rating);

SELECT count(*)
FROM film
WHERE rating NOT IN ('G','PG','PG-13','R');

Thanks to the index above, it's probably enough to just do a quick scan of the index and count all the movies with rating = 'NC-17' , since that's the only rating left. So the query should be rewritten like this:

SELECT count(*)
FROM film
WHERE rating = 'NC-17';

This should be the case regardless of the index, because comparing a column with one value is faster than comparing with 4. So, what databases can do this?

DB2

Impossible predicate (rating = 'N/A') Cool!

Explain Plan
-----------------------------------
ID | Operation      |   Rows | Cost
 1 | RETURN         |        |    0
 2 |  TBSCAN GENROW | 0 of 0 |    0

Predicate Information
 2 - RESID (1 = 0)

Reverse predicate (rating = 'NC-17') Nope...

Explain Plan
------------------------------------------------------------
ID | Operation                |                  Rows | Cost
 1 | RETURN                   |                       |   34
 2 |  GRPBY (COMPLETE)        |    1 of 210 (   .48%) |   34
 3 |   IXSCAN IDX_FILM_RATING | 210 of 1000 ( 21.00%) |   34

Predicate Information
 3 - SARG  NOT(Q1.RATING IN ('G', 'PG', 'PG-13', 'R'))

Although the ID=3 step uses an index, and although the cardinalities are correct, a full scan occurs because the plan does not have a range predicate, only the "SARG" predicate. See Marcus Wynand's review for details . You can also demonstrate this by manually inverting the predicate and getting:

Explain Plan
------------------------------------------------------------
ID | Operation                |                  Rows | Cost
 1 | RETURN                   |                       |    7
 2 |  GRPBY (COMPLETE)        |    1 of 210 (   .48%) |    7
 3 |   IXSCAN IDX_FILM_RATING | 210 of 1000 ( 21.00%) |    7

Predicate Information
 3 - START (Q1.RATING = 'NC-17')
      STOP (Q1.RATING = 'NC-17')

Now we have the desired range predicate.

MySQL

MySQL supports the CHECK constraint syntax , but for some reason does not enforce them. Try this:

CREATE TABLE x (a INT CHECK (a != 0));
INSERT INTO x VALUES (0);
SELECT * FROM x;

and you will get:

A
-
0

Zero points for MySQL (really, why not just support CHECK constraints ?)

Oracle

Impossible predicate (rating = 'N/A')

--------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      0 |
|*  1 |  FILTER            |      |      1 |        |      0 |
|*  2 |   TABLE ACCESS FULL| FILM |      0 |     89 |      0 |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(NULL IS NOT NULL)
   2 - filter("RATING"='N/A')

Again, a very strange filter NULL IS NOT NULL , cutting off FULL TABLE SCAN , which could just as easily be removed from the plan altogether. But at least it works! Reverse predicate (rating = 'NC-17') Oops:

----------------------------------------------------------------------------
| Id  | Operation             | Name            | Starts | E-Rows | A-Rows |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                 |      1 |        |      1 |
|   1 |  SORT AGGREGATE       |                 |      1 |      1 |      1 |
|*  2 |   INDEX FAST FULL SCAN| IDX_FILM_RATING |      1 |    415 |    210 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter((RATING'PG-13' AND RATING'R' AND RATING'PG' AND RATING'G'))

The predicate cannot be inverted, the cardinality assessment is very lame, in addition we get INDEX FAST FULL SCAN instead of INDEX RANGE SCAN , and the filter predicate instead of the access predicate . But this is what we should get, for example, by manually inverting the predicate:

------------------------------------------------------------------------
| Id  | Operation         | Name            | Starts | E-Rows | A-Rows |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                 |      1 |        |      1 |
|   1 |  SORT AGGREGATE   |                 |      1 |      1 |      1 |
|*  2 |   INDEX RANGE SCAN| IDX_FILM_RATING |      1 |    210 |    210 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("RATING"='NC-17')

Bummer!

PostgreSQL

Note that the PostgreSQL version of Sakila database uses the ENUM type instead of CHECK constraints on the RATING column . I duplicated the table using a CHECK constraint instead . Impossible predicate (rating = 'N/A') Doesn't work:

QUERY PLAN
------------------------------------------------------
Seq Scan on film2  (cost=0.00..67.50 rows=1 width=385)
  Filter: ((rating)::text = 'N/A'::text)

The reverse predicate (rating = 'NC-17') doesn't work either:

QUERY PLAN
------------------------------------------------------------------
Aggregate  (cost=70.53..70.54 rows=1 width=8)
  ->  Seq Scan on film2  (cost=0.00..70.00 rows=210 width=0)
        Filter: ((rating)::text  ALL ('{G,PG,PG-13,R}'::text[]))

Very sorry! Note: As David Rowley kindly pointed out to us in the comments , this feature can be enabled by setting the parameter:

SET constraint_exclusion TO on;
SQL Server
Невозможный предикат (rating = 'N/A')
Да!
|--Constant Scan

Reverse predicate (rating = 'NC-17') Yes too!

|--Compute Scalar
     |--Stream Aggregate
          |--Index Seek(OBJECT:([idx_film_rating]), SEEK:([rating]='NC-17'))

Summary

Database	Impossible predicate	Reverse predicate
DB2 LUW 10.5	Yes	No
MySQL 8.0.2	Not supported	Not supported
Oracle 12.2.0.1	Yes	No
PostgreSQL 9.6	No	No

9. Unnecessary reflexive connections.

As your queries become more complex, you may well end up needing to perform a reflective join on a table based on its primary key. Believe me, this is a very common practice when building complex views and connecting them to each other, so making sure the database pays attention to this is a critical part of optimizing complex SQL code. I will not demonstrate a complex example, a simple one will suffice, for example:

SELECT a1.first_name, a1.last_name
FROM actor a1
JOIN actor a2 ON a1.actor_id = a2.actor_id;

This can be seen as a special case of JOIN elimination , since we don't actually need a join to A2 , we can do everything we need with just table A1 . Next, INNER JOIN elimination only works properly if there is a FOREIGN KEY , which we don't have here. But thanks to the primary key by ACTOR_ID , we can prove that in fact A1 = A2 . In a sense, this is again transitive closure . You can go even further and use columns from both tables A1 and A2 :

SELECT a1.first_name, a2.last_name
FROM actor a1
JOIN actor a2 ON a1.actor_id = a2.actor_id;

In the classic case of JOIN elimination , it would no longer be possible to eliminate it since both tables are projected. But since we have already proven that A1 = A2 , then they are interchangeable, so we can expect the query to be converted to:

SELECT first_name, last_name
FROM actor;

What DBMS can do this?

DB2

Projection of table A1 only Yes:

Explain Plan
------------------------------------------------
ID | Operation     |                 Rows | Cost
 1 | RETURN        |                      |   20
 2 |  TBSCAN ACTOR | 200 of 200 (100.00%) |   20

Projection of tables A1 and A2 ... also yes:

Explain Plan
------------------------------------------------
ID | Operation     |                 Rows | Cost
 1 | RETURN        |                      |   20
 2 |  TBSCAN ACTOR | 200 of 200 (100.00%) |   20

MySQL

Projection of table A1 only No.

ID  TABLE  REF          EXTRA
-----------------------------------
1   a1
1   a2     a1.actor_id  Using index

Projection of tables A1 and A2 ... also no

ID  TABLE  REF          EXTRA
-----------------------------------
1   a1
1   a2     a1.actor_id

Complete disappointment...

Oracle

Projection of table A1 only Yes

--------------------------------------------
| Id  | Operation         | Name  | E-Rows |
--------------------------------------------
|   0 | SELECT STATEMENT  |       |        |
|   1 |  TABLE ACCESS FULL| ACTOR |    200 |
--------------------------------------------

Projection of tables A1 and A2 Yes again

--------------------------------------------
| Id  | Operation         | Name  | E-Rows |
--------------------------------------------
|   0 | SELECT STATEMENT  |       |        |
|   1 |  TABLE ACCESS FULL| ACTOR |    200 |
--------------------------------------------

PostgreSQL

Projection of table A1 only No:

QUERY PLAN
--------------------------------------------------------------------
Hash Join  (cost=6.50..13.25 rows=200 width=13)
  Hash Cond: (a1.actor_id = a2.actor_id)
  ->  Seq Scan on actor a1  (cost=0.00..4.00 rows=200 width=17)
  ->  Hash  (cost=4.00..4.00 rows=200 width=4)
        ->  Seq Scan on actor a2  (cost=0.00..4.00 rows=200 width=4)

Projection of tables A1 and A2 And again no:

QUERY PLAN
---------------------------------------------------------------------
Hash Join  (cost=6.50..13.25 rows=200 width=13)
  Hash Cond: (a1.actor_id = a2.actor_id)
  ->  Seq Scan on actor a1  (cost=0.00..4.00 rows=200 width=10)
  ->  Hash  (cost=4.00..4.00 rows=200 width=11)
        ->  Seq Scan on actor a2  (cost=0.00..4.00 rows=200 width=11)

SQL Server

Projection of table A1 only Oddly enough, no! (But keep in mind that I'm using SQL Server 2014, newer versions may have this fixed. I could definitely use an upgrade!)

|--Merge Join(Inner Join, MERGE:([a2].[actor_id])=([a1].[actor_id]))
     |--Index Scan(OBJECT:([a2]))
     |--Sort(ORDER BY:([a1].[actor_id] ASC))
          |--Table Scan(OBJECT:([a1]))

Projection of tables A1 and A2 Not again, and the plan even changed for the worse:

|--Hash Match(Inner Join, HASH:([a1].[actor_id])=([a2].[actor_id]))
     |--Table Scan(OBJECT:([sakila].[dbo].[actor] AS [a1]))
     |--Table Scan(OBJECT:([sakila].[dbo].[actor] AS [a2]))

Summary

Frankly speaking, I expected that this optimization would be performed on all databases, but I was very mistaken, sadly. Along with eliminating JOIN , this is one of the most important optimizations, allowing you to build huge SQL queries from reusable parts like views and table functions. Unfortunately, it is not supported in 3 of the 5 most common databases.

Database	Removing reflective join, single table projection	Reflexive connection elimination, full projection
DB2 LUW 10.5	Yes	Yes
MySQL 8.0.2	No	No
Oracle 12.2.0.1	Yes	Yes
PostgreSQL 9.6	No	No
SQL Server 2014	No	No