JavaRush /Java Blog /Random EN /Cool SQL optimizations that do not depend on the cost mod...

Level 41

28 February 2021
47 views
0 comments

Cool SQL optimizations that do not depend on the cost model. Part 5

Cool SQL optimizations that do not depend on the cost model. Part 1 Cool SQL optimizations that do not depend on the cost model. Part 2 Cool SQL optimizations that do not depend on the cost model. Part 3 Cool SQL optimizations that do not depend on the cost model. Part 4

10. Pushing predicates

This optimization is not entirely appropriate here, because it cannot be said that it is not based at all on the cost model. But since I can't think of a single reason why the optimizer shouldn't push predicates into derived tables, I'll list it here, along with the rest of the non-cost optimizations. Consider the request:

SELECT *
FROM (
  SELECT *
  FROM actor
) a
WHERE a.actor_id = 1;

The derived table in this query does not make any sense and should be eliminated by reducing the number of query nesting levels. But let's ignore that for now. You can expect the database to execute the following query instead of the above:

SELECT *
FROM (
  SELECT *
  FROM actor
  WHERE actor_id = 1
) a;

And then, again, maybe eliminate the external request. A more complex example is obtained using UNION :

SELECT *
FROM (
  SELECT first_name, last_name, 'actor' type
  FROM actor
  UNION ALL
  SELECT first_name, last_name, 'customer' type
  FROM customer
) people
WHERE people.last_name = 'DAVIS';

The result of this query:

FIRST_NAME  LAST_NAME  TYPE
----------------------------
JENNIFER    DAVIS      actor
SUSAN       DAVIS      actor
SUSAN       DAVIS      actor
JENNIFER    DAVIS      customer

Now, it would be great if the database optimizer would run a query like this instead:

SELECT *
FROM (
  SELECT first_name, last_name, 'actor' type
  FROM actor
  WHERE last_name = 'DAVIS'
  UNION ALL
  SELECT first_name, last_name, 'customer' type
  FROM customer
  WHERE last_name = 'DAVIS'
) people;

That is, so that it pushes the predicate into the derived table, and from there into two UNION ALL subqueries , since, after all, we have an index on both the ACTOR.LAST_NAME column and the CUSTOMER.LAST_NAME column . Again, this conversion is probably based on cost estimates in most databases, but I still think it's a no-brainer since, with any algorithm, it's almost always better to reduce the number of tuples processed as early as possible. If you know of a case where such a transformation turns out to be a bad idea, I will be glad to hear your comments! I would be very interested. So which of our databases can do this? (And please, it’s so simple and so important, let the answer be: everything)

DB2

Simple derived table Yes

Explain Plan
--------------------------------------------------
ID | Operation         |               Rows | Cost
 1 | RETURN            |                    |    6
 2 |  FETCH ACTOR      |   1 of 1 (100.00%) |    6
 3 |   IXSCAN PK_ACTOR | 1 of 200 (   .50%) |    0
Predicate Information
 3 - START (Q1.ACTOR_ID = 1)
      STOP (Q1.ACTOR_ID = 1)

Derived table with UNION Also yes:

Explain Plan
-----------------------------------------------------------------
ID | Operation                        |               Rows | Cost
 1 | RETURN                           |                    |   20
 2 |  UNION                           |             2 of 1 |   20
 3 |   FETCH CUSTOMER                 |   1 of 1 (100.00%) |   13
 4 |    IXSCAN IDX_CUSTOMER_LAST_NAME | 1 of 599 (   .17%) |    6
 5 |   FETCH ACTOR                    |   1 of 1 (100.00%) |    6
 6 |    IXSCAN IDX_ACTOR_LAST_NAME    | 1 of 200 (   .50%) |    0
Predicate Information
 4 - START (Q1.LAST_NAME = 'DAVIS')
      STOP (Q1.LAST_NAME = 'DAVIS')
 6 - START (Q3.LAST_NAME = 'DAVIS')
      STOP (Q3.LAST_NAME = 'DAVIS')

Also, in both cases, the derived table (view) was excluded from the plan as it was not actually needed.

MySQL

Simple derived table Yes

ID  TABLE  TYPE   KEY      REF    EXTRA
---------------------------------------
1   actor  const  PRIMARY  const

The usual primary key access by constant value is used. Derived table with UNION Oops, no.

ID  SELECT_TYPE  TABLE       TYPE  KEY          REF    ROWS  EXTRA
------------------------------------------------------------------
1   PRIMARY        ref   	const  10
2   DERIVED      actor       ALL                       200
3   UNION        customer    ALL                       599

Manual conversion results in the plan:

ID  SELECT_TYPE  TABLE       TYPE  KEY                  REF    ROWS  EXTRA
--------------------------------------------------------------------------
1   PRIMARY        ALL                               5
2   DERIVED      actor       ref   idx_actor_last_name  const  3
3   UNION        customer    ref   idx_last_name        const  1

This is a serious problem when using complex nested queries in MySQL!

Oracle

Simple derived table Yes, it works.

---------------------------------------------------------------------------
| Id  | Operation                   | Name     | Starts | E-Rows | A-Rows |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |      1 |        |      1 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ACTOR    |      1 |      1 |      1 |
|*  2 |   INDEX UNIQUE SCAN         | PK_ACTOR |      1 |      1 |      1 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ACTOR"."ACTOR_ID"=1)

And the number of nesting levels has been reduced. Derived table with UNION also works:

---------------------------------------------------------------------------------
| Id  | Operation                             | Name                   | E-Rows |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                        |        |
|   1 |  VIEW                                 |                        |      4 |
|   2 |   UNION-ALL                           |                        |        |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| ACTOR                  |      3 |
|*  4 |     INDEX RANGE SCAN                  | IDX_ACTOR_LAST_NAME    |      3 |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER               |      1 |
|*  6 |     INDEX RANGE SCAN                  | IDX_CUSTOMER_LAST_NAME |      1 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("LAST_NAME"='DAVIS')
   6 - access("LAST_NAME"='DAVIS')

However, without reducing the number of nesting levels. Id=1 "View" shows that the derived table is still there. In this case it is not a big problem, just perhaps a small additional cost.

PostgreSQL

Simple derived table Yes, it works:

QUERY PLAN
----------------------------------------------------
Seq Scan on actor  (cost=0.00..4.50 rows=1 width=25)
  Filter: (actor_id = 1)

Note, however, that PostgreSQL sometimes doesn't even use the primary key to look up a single row, but instead scans the entire table. In this case, 200 rows × 25 bytes per row ("width") fit in one block, so what's the point of bothering with index reads other than generating unnecessary I/O operations to access such a small table? Derived table with UNION Yes, it works too:

QUERY PLAN
-----------------------------------------------------------------------------------
Append  (cost=0.00..12.83 rows=4 width=45)
  ->  Seq Scan on actor  (cost=0.00..4.50 rows=3 width=45)
        Filter: ((last_name)::text = 'DAVIS'::text)
  ->  Index Scan using idx_last_name on customer  (cost=0.28..8.29 rows=1 width=45)
        Index Cond: ((last_name)::text = 'DAVIS'::text)

Again, an index on the ACTOR.LAST_NAME column is not used, but an index on the CUSTOMER.LAST_NAME column is used because the CUSTOMER table is much larger.

SQL Server

Simple derived table Yes, it works

|--Nested Loops(Inner Join)
     |--Index Seek(SEEK:([actor_id]=(1)))
     |--RID Lookup(OBJECT:([actor]))

Derived table with UNION also works.

|--Concatenation
     |--Compute Scalar(DEFINE:([Expr1003]='actor'))
     |    |--Nested Loops(Inner Join)
     |         |--Index Seek(SEEK:([actor].[last_name]='DAVIS'))
     |         |--RID Lookup(OBJECT:([actor]))
     |--Compute Scalar(DEFINE:([Expr1007]='customer'))
          |--Nested Loops(Inner Join)
               |--Index Seek(SEEK:([customer].[last_name]='DAVIS'))
               |--RID Lookup(OBJECT:([customer]))

Summary

My hopes did not come true. MySQL 8.0.2 does not yet fully support this simple optimization. Everyone else, however, is supportive.

Database	Push a simple derived table	Pushing a derived table with UNION
DB2 LUW 10.5	Yes	Yes
MySQL 8.0.2	Yes	No
Oracle 12.2.0.1	Yes	Yes
PostgreSQL 9.6	Yes	Yes
SQL Server 2014	Yes	Yes

Conclusion

The list presented here is far from complete. There are many other simple SQL transformations that are not (or should not be) difficult for databases to implement, even before a cost optimizer gets involved. They eliminate unnecessary, extra work [for the database] ( as opposed to unnecessary, required work, which I already wrote about ). These are important tools to:

Stupid [developer] mistakes had no impact on performance. Errors are inevitable, and as a project grows and SQL queries become more complex, these errors can accumulate, hopefully without any effect.

Provide the ability to reuse complex blocks, such as views and table functions, that can be embedded in parent SQL queries, transformed, or partially deleted or rewritten.

These capabilities are critical to point 2. Without them, it would be very difficult to create 4000-row SQL queries with normal performance based on a library of reusable SQL components. Much to the disappointment of PostgreSQL and MySQL users, these two popular open source databases still have a long way to go compared to their commercial competitors DB2, Oracle and SQL Server, of which DB2 has performed best, with Oracle and SQL Server roughly neck and neck. a little behind.