Cool SQL optimizations that do not depend on the cost model. Part 1 Cool SQL optimizations that do not depend on the cost model. Part 2 Cool SQL optimizations that do not depend on the cost model. Part 3 Cool SQL optimizations that do not depend on the cost model. Part 4
10. Pushing predicates
This optimization is not entirely appropriate here, because it cannot be said that it is not based at all on the cost model. But since I can't think of a single reason why the optimizer shouldn't push predicates into derived tables, I'll list it here, along with the rest of the non-cost optimizations. Consider the request:SELECT *
FROM (
SELECT *
FROM actor
) a
WHERE a.actor_id = 1;
The derived table in this query does not make any sense and should be eliminated by reducing the number of query nesting levels. But let's ignore that for now. You can expect the database to execute the following query instead of the above:
SELECT *
FROM (
SELECT *
FROM actor
WHERE actor_id = 1
) a;
And then, again, maybe eliminate the external request. A more complex example is obtained using UNION :
SELECT *
FROM (
SELECT first_name, last_name, 'actor' type
FROM actor
UNION ALL
SELECT first_name, last_name, 'customer' type
FROM customer
) people
WHERE people.last_name = 'DAVIS';
The result of this query:
FIRST_NAME LAST_NAME TYPE
----------------------------
JENNIFER DAVIS actor
SUSAN DAVIS actor
SUSAN DAVIS actor
JENNIFER DAVIS customer
Now, it would be great if the database optimizer would run a query like this instead:
SELECT *
FROM (
SELECT first_name, last_name, 'actor' type
FROM actor
WHERE last_name = 'DAVIS'
UNION ALL
SELECT first_name, last_name, 'customer' type
FROM customer
WHERE last_name = 'DAVIS'
) people;
That is, so that it pushes the predicate into the derived table, and from there into two UNION ALL subqueries , since, after all, we have an index on both the ACTOR.LAST_NAME column and the CUSTOMER.LAST_NAME column . Again, this conversion is probably based on cost estimates in most databases, but I still think it's a no-brainer since, with any algorithm, it's almost always better to reduce the number of tuples processed as early as possible. If you know of a case where such a transformation turns out to be a bad idea, I will be glad to hear your comments! I would be very interested. So which of our databases can do this? (And please, it’s so simple and so important, let the answer be: everything)
DB2
Simple derived table YesExplain Plan
--------------------------------------------------
ID | Operation | Rows | Cost
1 | RETURN | | 6
2 | FETCH ACTOR | 1 of 1 (100.00%) | 6
3 | IXSCAN PK_ACTOR | 1 of 200 ( .50%) | 0
Predicate Information
3 - START (Q1.ACTOR_ID = 1)
STOP (Q1.ACTOR_ID = 1)
Derived table with UNION Also yes:
Explain Plan
-----------------------------------------------------------------
ID | Operation | Rows | Cost
1 | RETURN | | 20
2 | UNION | 2 of 1 | 20
3 | FETCH CUSTOMER | 1 of 1 (100.00%) | 13
4 | IXSCAN IDX_CUSTOMER_LAST_NAME | 1 of 599 ( .17%) | 6
5 | FETCH ACTOR | 1 of 1 (100.00%) | 6
6 | IXSCAN IDX_ACTOR_LAST_NAME | 1 of 200 ( .50%) | 0
Predicate Information
4 - START (Q1.LAST_NAME = 'DAVIS')
STOP (Q1.LAST_NAME = 'DAVIS')
6 - START (Q3.LAST_NAME = 'DAVIS')
STOP (Q3.LAST_NAME = 'DAVIS')
Also, in both cases, the derived table (view) was excluded from the plan as it was not actually needed.
MySQL
Simple derived table YesID TABLE TYPE KEY REF EXTRA
---------------------------------------
1 actor const PRIMARY const
The usual primary key access by constant value is used. Derived table with UNION Oops, no.
ID SELECT_TYPE TABLE TYPE KEY REF ROWS EXTRA
------------------------------------------------------------------
1 PRIMARY ref const 10
2 DERIVED actor ALL 200
3 UNION customer ALL 599
Manual conversion results in the plan:
ID SELECT_TYPE TABLE TYPE KEY REF ROWS EXTRA
--------------------------------------------------------------------------
1 PRIMARY ALL 5
2 DERIVED actor ref idx_actor_last_name const 3
3 UNION customer ref idx_last_name const 1
This is a serious problem when using complex nested queries in MySQL!
Oracle
Simple derived table Yes, it works.---------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | TABLE ACCESS BY INDEX ROWID| ACTOR | 1 | 1 | 1 |
|* 2 | INDEX UNIQUE SCAN | PK_ACTOR | 1 | 1 | 1 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ACTOR"."ACTOR_ID"=1)
And the number of nesting levels has been reduced. Derived table with UNION also works:
---------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | |
| 1 | VIEW | | 4 |
| 2 | UNION-ALL | | |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| ACTOR | 3 |
|* 4 | INDEX RANGE SCAN | IDX_ACTOR_LAST_NAME | 3 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER | 1 |
|* 6 | INDEX RANGE SCAN | IDX_CUSTOMER_LAST_NAME | 1 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("LAST_NAME"='DAVIS')
6 - access("LAST_NAME"='DAVIS')
However, without reducing the number of nesting levels. Id=1 "View" shows that the derived table is still there. In this case it is not a big problem, just perhaps a small additional cost.
PostgreSQL
Simple derived table Yes, it works:QUERY PLAN
----------------------------------------------------
Seq Scan on actor (cost=0.00..4.50 rows=1 width=25)
Filter: (actor_id = 1)
Note, however, that PostgreSQL sometimes doesn't even use the primary key to look up a single row, but instead scans the entire table. In this case, 200 rows × 25 bytes per row ("width") fit in one block, so what's the point of bothering with index reads other than generating unnecessary I/O operations to access such a small table? Derived table with UNION Yes, it works too:
QUERY PLAN
-----------------------------------------------------------------------------------
Append (cost=0.00..12.83 rows=4 width=45)
-> Seq Scan on actor (cost=0.00..4.50 rows=3 width=45)
Filter: ((last_name)::text = 'DAVIS'::text)
-> Index Scan using idx_last_name on customer (cost=0.28..8.29 rows=1 width=45)
Index Cond: ((last_name)::text = 'DAVIS'::text)
Again, an index on the ACTOR.LAST_NAME column is not used, but an index on the CUSTOMER.LAST_NAME column is used because the CUSTOMER table is much larger.
SQL Server
Simple derived table Yes, it works|--Nested Loops(Inner Join)
|--Index Seek(SEEK:([actor_id]=(1)))
|--RID Lookup(OBJECT:([actor]))
Derived table with UNION also works.
|--Concatenation
|--Compute Scalar(DEFINE:([Expr1003]='actor'))
| |--Nested Loops(Inner Join)
| |--Index Seek(SEEK:([actor].[last_name]='DAVIS'))
| |--RID Lookup(OBJECT:([actor]))
|--Compute Scalar(DEFINE:([Expr1007]='customer'))
|--Nested Loops(Inner Join)
|--Index Seek(SEEK:([customer].[last_name]='DAVIS'))
|--RID Lookup(OBJECT:([customer]))
Summary
My hopes did not come true. MySQL 8.0.2 does not yet fully support this simple optimization. Everyone else, however, is supportive.Database | Push a simple derived table | Pushing a derived table with UNION |
---|---|---|
DB2 LUW 10.5 | Yes | Yes |
MySQL 8.0.2 | Yes | No |
Oracle 12.2.0.1 | Yes | Yes |
PostgreSQL 9.6 | Yes | Yes |
SQL Server 2014 | Yes | Yes |
Conclusion
The list presented here is far from complete. There are many other simple SQL transformations that are not (or should not be) difficult for databases to implement, even before a cost optimizer gets involved. They eliminate unnecessary, extra work [for the database] ( as opposed to unnecessary, required work, which I already wrote about ). These are important tools to:- Stupid [developer] mistakes had no impact on performance. Errors are inevitable, and as a project grows and SQL queries become more complex, these errors can accumulate, hopefully without any effect.
- Provide the ability to reuse complex blocks, such as views and table functions, that can be embedded in parent SQL queries, transformed, or partially deleted or rewritten.
GO TO FULL VERSION