JavaRush /Java Blog /Random EN /Cool SQL optimizations that don't depend on the cost mode...

Cool SQL optimizations that don't depend on the cost model. Part 2

Published in the Random EN group
Cool SQL optimizations that don't depend on the cost model. Part 1 Cool SQL optimizations that don't depend on the cost model.  Part 2 - 1

4. Elimination of "meaningless" predicates

Equally meaningless are predicates that are (almost) always true. As you can imagine if you are asking for:
SELECT * FROM actor WHERE 1 = 1;
... then the databases will not actually execute it, but will simply ignore it. I once answered a question about this on Stack Overflow , which is why I decided to write this article. I'll leave testing this as an exercise for the reader, but what happens if the predicate is a little less "meaningless"? For example:
SELECT * FROM film WHERE release_year = release_year;
Is it really necessary to compare the value with itself for each row? No, because there is no value for which this predicate will be FALSE , right? But we still need to test it. Although the predicate cannot be FALSE , it could very well be NULL everywhere , again due to three-valued logic. The RELEASE_YEAR column allows an undefined value, and if for any of the rows RELEASE_YEAR IS NULL , then NULL = NULL yields NULL and the row must be excluded. So the request becomes the following:
SELECT * FROM film WHERE release_year IS NOT NULL;
Which database does this?

DB2

Yes!
Explain Plan
-------------------------------------------------
ID | Operation    |                   Rows | Cost
 1 | RETURN       |                        |   49
 2 |  TBSCAN FILM | 1000 of 1000 (100.00%) |   49
Predicate Information
 2 - SARG Q1.RELEASE_YEAR IS NOT NULL

MySQL

Unfortunately, MySQL, again, doesn't show predicates in execution plans, so it's a bit tricky to find out if MySQL implements this particular optimization. You can perform a performance evaluation and see if any large-scale comparisons are being made. Or you can add an index:
CREATE INDEX i_release_year ON film (release_year);
And get plans in return for the following requests:
SELECT * FROM film WHERE release_year = release_year;
SELECT * FROM film WHERE release_year IS NOT NULL;
If the optimization works, then the plans of both queries should be approximately the same. But in this case it doesn't:
ID  TABLE  POSSIBLE_KEYS   ROWS  FILTERED  EXTRA
------------------------------------------------------
1   film             1000  10.00           Using where

ID  TABLE  POSSIBLE_KEYS   ROWS  FILTERED  EXTRA
------------------------------------------------------
1   film   i_release_year  1000  100.00    Using where
As you can see, our two queries differ significantly in the values ​​of the POSSIBLE_KEYS and FILTERED columns . So I'm going to risk a reasonable guess that MySQL doesn't optimize for this.

Oracle

Yes!
----------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows |
----------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |
|*  1 |  TABLE ACCESS FULL| FILM |      1 |   1000 |
----------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("RELEASE_YEAR" IS NOT NULL)

PostgreSQL

Unfortunately no!
QUERY PLAN
--------------------------------------------------------------
Seq Scan on film  (cost=0.00..67.50 rows=5 width=386)
  Filter: ((release_year)::integer = (release_year)::integer)
Plans and costs vary. Namely, look at the cardinality evaluation, which is completely useless, while this predicate:
SELECT * FROM film WHERE release_year IS NOT NULL;
gives much better results:
QUERY PLAN
---------------------------------------------------------
Seq Scan on film  (cost=0.00..65.00 rows=1000 width=386)
  Filter: (release_year IS NOT NULL)
Bummer!

SQL Server

Oddly enough, SQL Server doesn't seem to do this either:
|--Table Scan(OBJECT:([film]), WHERE:([release_year]=[release_year]))
However, according to the appearance of the plan, the assessment of cardinality is correct, as are the costs. But in my experience with SQL Server, I would say that, in this case, no optimization takes place, since SQL Server would show the actually executed predicate in the plan (to understand why, take a look at the examples of the CHECK constraint below ) . What about "meaningless" predicates on NOT NULL columns? The above conversion was only necessary because RELEASE_YEAR can be null. What happens if you run the same meaningless query, for example, with the FILM_ID column ?
SELECT * FROM film WHERE film_id = film_id
Now it corresponds to no predicate at all? Or at least that's how it should be. But is it?

DB2

Yes!
Explain Plan
-------------------------------------------------
ID | Operation    |                   Rows | Cost
 1 | RETURN       |                        |   49
 2 |  TBSCAN FILM | 1000 of 1000 (100.00%) |   49
No predicates are applied at all and we select all movies.

MySQL

Yes! (Again, an educated guess)
ID  TABLE  POSSIBLE_KEYS   ROWS  FILTERED  EXTRA
------------------------------------------------------
1   film                   1000  100.00
Notice how the EXTRA column is now empty, as if we don't have a WHERE clause at all!

Oracle

Yes!
----------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows |
----------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |
|   1 |  TABLE ACCESS FULL| FILM |      1 |   1000 |
----------------------------------------------------
Again, no predicates are applied.

PostgreSQL

Wow, again no!
QUERY PLAN
------------------------------------------------------
Seq Scan on film  (cost=0.00..67.50 rows=5 width=386)
  Filter: (film_id = film_id)
The filter is applied and the cardinality score is still 5. Bummer!

SQL Server

And again, no!
|--Table Scan(OBJECT:([film]), WHERE:([film_id]=[film_id]))

Summary

It seems to be a simple optimization, but it is by no means used in all DBMS, in particular, oddly enough, it is not used in SQL Server!
Database Pointless but necessary predicates (NULL semantics) Meaningless and unnecessary predicates (non-NULL semantics)
DB2 LUW 10.5 Yes Yes
MySQL 8.0.2 No Yes
Oracle 12.2.0.1 Yes Yes
PostgreSQL 9.6 No No
SQL Server 2014 No No

5. Projections in EXISTS subqueries

Interestingly, I get asked about them all the time in my master class, where I defend the point of view that SELECT * usually does not lead to good. The question is: can you use SELECT * in an EXISTS subquery ? For example, if we need to find actors who have played in movies...
SELECT first_name, last_name
FROM actor a
WHERE EXISTS (
  SELECT * -- Is this OK?
  FROM film_actor fa
  WHERE a.actor_id = fa.actor_id
)
And the answer is... yes. Can. The asterisk has no effect on the query. How to be sure of this? Consider the following query:
-- DB2
SELECT 1 / 0 FROM sysibm.dual
-- Oracle
SELECT 1 / 0 FROM dual
-- PostgreSQL, SQL Server
SELECT 1 / 0
-- MySQL
SELECT pow(-1, 0.5);
All of these databases report a divide-by-zero error. Note an interesting fact: in MySQL, when dividing by zero, we get NULL as a result , not an error, so we have to perform another forbidden action. Now, what happens if we execute, instead of the above, these queries?
-- DB2
SELECT CASE WHEN EXISTS (
  SELECT 1 / 0 FROM sysibm.dual
) THEN 1 ELSE 0 END
FROM sysibm.dual
-- Oracle
SELECT CASE WHEN EXISTS (
  SELECT 1 / 0 FROM dual
) THEN 1 ELSE 0 END
FROM dual
-- PostgreSQL
SELECT EXISTS (SELECT 1 / 0)
-- SQL Server
SELECT CASE WHEN EXISTS (
  SELECT 1 / 0
) THEN 1 ELSE 0 END
-- MySQL
SELECT EXISTS (SELECT pow(-1, 0.5));
Now none of the databases return an error. They all return TRUE or 1 . This means that none of our databases actually evaluate the projection (that is, the SELECT clause ) of the EXISTS subquery . SQL Server, for example, shows the following plan:
|--Constant Scan(VALUES:((CASE WHEN (1) THEN (1) ELSE (0) END)))
As you can see, the CASE expression has been converted to a constant and the subquery has been eliminated. In other databases, the subquery is stored in the plan and nothing is mentioned about the projection, so let's take another look at the plan of the original query in Oracle:
SELECT first_name, last_name
FROM actor a
WHERE EXISTS (
  SELECT *
  FROM film_actor fa
  WHERE a.actor_id = fa.actor_id
)
The plan for the above query looks like this:
------------------------------------------------------------------
| Id  | Operation             | Name                    | E-Rows |
------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                         |        |
|*  1 |  HASH JOIN SEMI       |                         |    200 |
|   2 |   TABLE ACCESS FULL   | ACTOR                   |    200 |
|   3 |   INDEX FAST FULL SCAN| IDX_FK_FILM_ACTOR_ACTOR |   5462 |
------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."ACTOR_ID"="FA"."ACTOR_ID")
Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) LAST_NAME, FIRST_NAME
   2 - (rowset=256) A.ACTOR_ID, FIRST_NAME, LAST_NAME
   3 - FA.ACTOR_ID
Observing projection information at Id=3 . In fact, we don't even access the FILM_ACTOR table because we don't need to. The EXISTS predicate can be done with a foreign key index on a single ACTOR_ID column - whatever is needed for this query - despite the fact that we wrote SELECT * .

Summary

Fortunately, all of our databases remove the projection from EXISTS subqueries :
Database EXIST projection
DB2 LUW 10.5 Yes
MySQL 8.0.2 Yes
Oracle 12.2.0.1 Yes
PostgreSQL 9.6 Yes
SQL Server 2014 Yes
Stay tuned for Part 3 , where we'll discuss the rest of the cool SQL optimizations.
Comments
TO VIEW ALL COMMENTS OR TO MAKE A COMMENT,
GO TO FULL VERSION