JavaRush /Java Blog /Random EN /Cool SQL optimizations that do not depend on the cost mod...

Cool SQL optimizations that do not depend on the cost model. Part 2

Published in the Random EN group
Cool SQL optimizations that do not depend on the cost model. Part 1 Cool SQL optimizations that do not depend on the cost model.  Part 2 - 1

4. Elimination of "meaningless" predicates

Equally meaningless are predicates that are (almost) always true. As you can imagine, if you are asking:
SELECT * FROM actor WHERE 1 = 1;
...then the databases will not actually execute it, but will simply ignore it. I once answered a question about this on Stack Overflow and that is why I decided to write this article. I'll leave testing this as an exercise to the reader, but what happens if the predicate is a little less "meaningless"? For example:
SELECT * FROM film WHERE release_year = release_year;
Do you really need to compare the value to itself for each row? No, there is no value for which this predicate would be FALSE , right? But we still need to check it. Although the predicate cannot be equal to FALSE , it may well be equal to NULL everywhere , again due to three-valued logic. The RELEASE_YEAR column is nullable, and if any of the rows have RELEASE_YEAR IS NULL , then NULL = NULL results in NULL and the row must be eliminated. So the request becomes the following:
SELECT * FROM film WHERE release_year IS NOT NULL;
Which databases do this?

DB2

Yes!
Explain Plan
-------------------------------------------------
ID | Operation    |                   Rows | Cost
 1 | RETURN       |                        |   49
 2 |  TBSCAN FILM | 1000 of 1000 (100.00%) |   49
Predicate Information
 2 - SARG Q1.RELEASE_YEAR IS NOT NULL

MySQL

It's a shame, but MySQL, again, doesn't map predicates into execution plans, so figuring out whether MySQL implements this particular optimization is a little tricky. You can perform a performance assessment and see if any large-scale comparisons are being made. Or you can add an index:
CREATE INDEX i_release_year ON film (release_year);
And get plans for the following requests in return:
SELECT * FROM film WHERE release_year = release_year;
SELECT * FROM film WHERE release_year IS NOT NULL;
If the optimization works, then the plans of both queries should be approximately the same. But in this case this is not the case:
ID  TABLE  POSSIBLE_KEYS   ROWS  FILTERED  EXTRA
------------------------------------------------------
1   film             1000  10.00           Using where

ID  TABLE  POSSIBLE_KEYS   ROWS  FILTERED  EXTRA
------------------------------------------------------
1   film   i_release_year  1000  100.00    Using where
As you can see, our two queries differ significantly in the values ​​of the POSSIBLE_KEYS and FILTERED columns . So I'd venture a reasonable guess that MySQL doesn't optimize this.

Oracle

Yes!
----------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows |
----------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |
|*  1 |  TABLE ACCESS FULL| FILM |      1 |   1000 |
----------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("RELEASE_YEAR" IS NOT NULL)

PostgreSQL

Unfortunately no!
QUERY PLAN
--------------------------------------------------------------
Seq Scan on film  (cost=0.00..67.50 rows=5 width=386)
  Filter: ((release_year)::integer = (release_year)::integer)
Plans and costs vary. Namely, look at the evaluation of cardinality, which is absolutely no good, while this predicate:
SELECT * FROM film WHERE release_year IS NOT NULL;
gives much better results:
QUERY PLAN
---------------------------------------------------------
Seq Scan on film  (cost=0.00..65.00 rows=1000 width=386)
  Filter: (release_year IS NOT NULL)
Bummer!

SQL Server

Oddly enough, SQL Server doesn't seem to do this either:
|--Table Scan(OBJECT:([film]), WHERE:([release_year]=[release_year]))
However, based on the appearance of the plan, the cardinality assessment is correct, as is the cost. But in my experience with SQL Server, I would say that in this case, no optimization occurs, since SQL Server would display the actually executed predicate in the plan (to see why, take a look at the CHECK constraint examples below). What about "meaningless" predicates on NOT NULL columns? The above conversion was only necessary because RELEASE_YEAR can be undefined. What happens if you run the same meaningless query on, for example, the FILM_ID column ?
SELECT * FROM film WHERE film_id = film_id
Does it now correspond to no predicate at all? Or at least that's how it should be. But is it?

DB2

Yes!
Explain Plan
-------------------------------------------------
ID | Operation    |                   Rows | Cost
 1 | RETURN       |                        |   49
 2 |  TBSCAN FILM | 1000 of 1000 (100.00%) |   49
No predicates are applied at all and we select all movies.

MySQL

Yes! (Again, educated guess)
ID  TABLE  POSSIBLE_KEYS   ROWS  FILTERED  EXTRA
------------------------------------------------------
1   film                   1000  100.00
Notice that the EXTRA column is now empty, as if we don't have a WHERE clause at all!

Oracle

Yes!
----------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows |
----------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |
|   1 |  TABLE ACCESS FULL| FILM |      1 |   1000 |
----------------------------------------------------
Again, no predicates apply.

PostgreSQL

Wow, no again!
QUERY PLAN
------------------------------------------------------
Seq Scan on film  (cost=0.00..67.50 rows=5 width=386)
  Filter: (film_id = film_id)
The filter is applied and the cardinality score is still 5. Bummer!

SQL Server

And here again no!
|--Table Scan(OBJECT:([film]), WHERE:([film_id]=[film_id]))

Summary

It seems like a simple optimization, but it is not used in all DBMSs; in particular, oddly enough, it is not used in SQL Server!
Database Meaningless but necessary predicates (NULL semantics) Meaningless and unnecessary predicates (non-NULL semantics)
DB2 LUW 10.5 Yes Yes
MySQL 8.0.2 No Yes
Oracle 12.2.0.1 Yes Yes
PostgreSQL 9.6 No No
SQL Server 2014 No No

5. Projections in EXISTS subqueries

Interestingly, I get asked about them all the time at my master class, where I defend the point of view that SELECT * usually does not lead to any good. The question is: is it possible to use SELECT * in an EXISTS subquery ? For example, if we need to find actors who played in films...
SELECT first_name, last_name
FROM actor a
WHERE EXISTS (
  SELECT * -- Is this OK?
  FROM film_actor fa
  WHERE a.actor_id = fa.actor_id
)
And the answer is... yes. Can. The asterisk does not affect the request. How can you be sure of this? Consider the following query:
-- DB2
SELECT 1 / 0 FROM sysibm.dual
-- Oracle
SELECT 1 / 0 FROM dual
-- PostgreSQL, SQL Server
SELECT 1 / 0
-- MySQL
SELECT pow(-1, 0.5);
All of these databases report a division by zero error. Note an interesting fact: in MySQL, when we divide by zero, we get NULL instead of an error, so we have to do something else that is not allowed. Now, what happens if we execute, instead of the above, the following queries?
-- DB2
SELECT CASE WHEN EXISTS (
  SELECT 1 / 0 FROM sysibm.dual
) THEN 1 ELSE 0 END
FROM sysibm.dual
-- Oracle
SELECT CASE WHEN EXISTS (
  SELECT 1 / 0 FROM dual
) THEN 1 ELSE 0 END
FROM dual
-- PostgreSQL
SELECT EXISTS (SELECT 1 / 0)
-- SQL Server
SELECT CASE WHEN EXISTS (
  SELECT 1 / 0
) THEN 1 ELSE 0 END
-- MySQL
SELECT EXISTS (SELECT pow(-1, 0.5));
Now none of the databases return an error. They all return TRUE or 1 . This means that none of our databases actually evaluate the projection (that is, the SELECT clause ) of the EXISTS subquery . SQL Server, for example, shows the following plan:
|--Constant Scan(VALUES:((CASE WHEN (1) THEN (1) ELSE (0) END)))
As you can see, the CASE expression has been converted to a constant and the subquery has been eliminated. Other databases store the subquery in the plan and don't mention anything about the projection, so let's take another look at the original query plan in Oracle:
SELECT first_name, last_name
FROM actor a
WHERE EXISTS (
  SELECT *
  FROM film_actor fa
  WHERE a.actor_id = fa.actor_id
)
The above query plan looks like this:
------------------------------------------------------------------
| Id  | Operation             | Name                    | E-Rows |
------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                         |        |
|*  1 |  HASH JOIN SEMI       |                         |    200 |
|   2 |   TABLE ACCESS FULL   | ACTOR                   |    200 |
|   3 |   INDEX FAST FULL SCAN| IDX_FK_FILM_ACTOR_ACTOR |   5462 |
------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."ACTOR_ID"="FA"."ACTOR_ID")
Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) LAST_NAME, FIRST_NAME
   2 - (rowset=256) A.ACTOR_ID, FIRST_NAME, LAST_NAME
   3 - FA.ACTOR_ID
We observe information about the projection at Id=3 . In fact, we don't even access the FILM_ACTOR table because we don't need to. The EXISTS predicate can be performed using a foreign key index on a single ACTOR_ID column - all that is needed for this query - even though we wrote SELECT * .

Summary

Luckily, all of our databases remove projection from EXISTS subqueries :
Database Projection EXISTS
DB2 LUW 10.5 Yes
MySQL 8.0.2 Yes
Oracle 12.2.0.1 Yes
PostgreSQL 9.6 Yes
SQL Server 2014 Yes
Stay tuned for Part 3 , where we'll discuss other cool SQL optimizations.
Comments
TO VIEW ALL COMMENTS OR TO MAKE A COMMENT,
GO TO FULL VERSION