August 24, 2020

Logical Trees Part 4: Filtering with the select operator; SQL WHERE

In part 3 we looked at the basic operators that describe getting and projecting data.

In this post we will see how basic filters are specified using the LogOp_Select operator, commonly resulting from a SQL WHERE statement.

Operators
Examples
Optimization
Get the Code
Remember This


Operators


LogOp_Select Filter data


LogOp_Select Operator

The select operator describes filtering - aka applying a predicate - using two child operators. The first is a logical operator describing some relational data to be filtered, and the second is an operator describing the filter itself.

The filter is an expression returning a potentially three valued boolean  - true, false or unknown.

Examples of operators that return boolean values include:

- Scalar comparisons, describing =, >, <, <>.
- Logical comparisons, describing AND, OR
- Relational comparisons, describing SOME, ALL, EXISTS, NOT EXISTS - we will examine these in detail in a later post. 


ScaOp_Comp Compare values


ScaOp_Comp Operator

The scalar comparison ScaOp_Comp specifies a comparison type (see table below), and has two children specifying the values to be compared.

Operator

T-SQL

ScaOp_Comp x_cmpEq

=

ScaOp_Comp x_cmpGt

> 

ScaOp_Comp x_cmpLt

< 

ScaOp_Comp x_cmpGe

>= or !<

ScaOp_Comp x_cmpLe

<= or !>

ScaOp_Comp x_cmpNe

<> or !=

ScaOp_Comp x_cmpIs

No explicit support

ScaOp_Comp x_cmpIsNot

No explicit support



Uncovered: Null Equality - Not supported in T-SQL, but there under the hood.

In the above list there are two operators - is and is not - that have no explicit equivalent in T-SQL

If we want to test if two values are equal, and return true if both values are null, we either write the somewhat clumsy:

WHERE a = b OR a IS NULL AND b IS NULL -- True if a and b are both null

Or, 
in some set comparison queries, we can use a more elegant EXISTS & INTERSECT construct. 

However, in the SQL standard - but not implemented in T-SQL - is the following syntax:

WHERE a IS NOT DISTINCT FROM b -- True if a and b are both null

It is frustrating then to discover that in certain queries (such as the EXISTS & INTERSECT construct) we then see that the engine does in fact include an operator that supports the null equality comparison functionality written into the SQL standard over twenty years ago.

For a deeper look at this limitation, and potential solutions, see this post by Paul White.


In addition to the above, SQL LIKE is a special case comparison with it's own operator syntax:

ScaOp_Intrinsic like


LogOp_Logical Apply boolean logic - AND / OR


ScaOp_Logical Operator

The LogOp_Logical operator describes boolean logical operations such as AND. Each child operator describes a boolean result, to be compared. Note that unlike SQL, the operator can have more than two child inputs - a sample query demonstrates this later, in the examples section. 


Operator

T-SQL

ScaOp_Logical x_lopAnd

AND

ScaOp_Logical x_lopOr

OR

No operator. NOT expressions are rewritten using De Morgan's laws.

NOT



Why no operator for NOT?

NOT has no operator in the logical trees, because negation is rewritten to eliminate it. For example:

NOT (a = b AND x = y)

Can be rewritten as:

a <> b OR x <> y

The rewritten expression - seen in the first logical tree (Converted) - is an example of De Morgan's laws for propositional logic. A runnable example is included in the queries below. It is not important to understand the academic theory, but it is worth understanding that rewrites such as these can make it easier for subsequent optimization rules to further simply the query. 


ScaOp_Identifier Specify a column


ScaOp_Identifier Operator

This leaf operator specifies a column - either physical or derived.

QCOL: [{table alias}].{column name}] - Column from a table specified in a child operator
COL: {expression name} - A derived column, where the expression is defined in a child operator.


ScaOp_Const Specify a constant


ScaOp_Constant Operator

This leaf operator specifies a constant.

TI ({type information}) - Type specification
XVAR ({value specification}) - Value specification. XVAR is an example of the extensive internal use of a type we know externally as sql_variant. (Paul White).

Whilst the individual elements of the specification have many variations, I won't in this post attempt document them all. However, they are usually sufficiently readable to understand the meaning in the context of the tree.



Examples

The examples use the AdventureWorks database. Any version from 2008R2 onwards will work - available hereThe queries specify a set of trace flags that will print the logical trees (and the physical output tree) to the messages window in SSMS. The tree output shown was produced by the SQL Server 2019 optimizer. Older versions may have different behaviour.

Equality Comparison


-- Simple equality comparison, comparing a column with a constant
SELECT p.Name
FROM Production.Product AS p
WHERE p.ProductID = 500
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
/* *** Converted Tree: ***
LogOp_Project QCOL: [p].Name
    LogOp_Select
        LogOp_Get TBL: Production.Product(alias TBL: p) Production.Product TableID=482100758 TableReferenceID=0 IsRow: COL: IsBaseRow1000 
        ScaOp_Comp x_cmpEq
            ScaOp_Identifier QCOL: [p].ProductID
            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=500)
    AncOp_PrjList

LogOp_Select has two childen:
A logical operator describing the input data
A scalar comparison operator (ScaOp_Comp), specifying an equality comparison (x_cmpEq)

ScaOp_Comp has two children, describing the two values to be compared
ScaOp_Identifier specifies a column, using the alias declared in the LogOp_Get
ScaOp_Const specifies a constant, with a data type and value.
*/


Boolean Comparison


-- Simple comparison with boolean expression
SELECT p.Name
FROM Production.Product AS p
WHERE p.Color = 'Silver'
      AND p.ProductLine = 'M'
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
/* *** Converted Tree: ***
LogOp_Project QCOL: [p].Name
    LogOp_Select
        LogOp_Get TBL: Production.Product(alias TBL: p) Production.Product TableID=482100758 TableReferenceID=0 IsRow: COL: IsBaseRow1000 
        ScaOp_Logical x_lopAnd
            ScaOp_Comp x_cmpEq
                ScaOp_Identifier QCOL: [p].Color
                ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=12) XVAR(nvarchar,Owned,Value=Len,Data = (12,83105108118101114))
            ScaOp_Comp x_cmpEq
                ScaOp_Identifier QCOL: [p].ProductLine
                ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=2) XVAR(nvarchar,Owned,Value=Len,Data = (2,77))

    AncOp_PrjList 
LogOp_Select has two childen:
A logical operator describing the input data
A logical comparison operator (ScaOp_Logical), specifying an AND comparison (x_lopAnd)

ScaOp_Logical has two children, specifying the two comparisons eacg returning a boolean result, as input to the AND expression.
*/


3 Part Boolean Comparison


-- Simple comparison with 3 part boolean expression
SELECT p.Name
FROM Production.Product AS p
WHERE p.Color = 'Silver'
      AND p.ProductLine = 'M'
      AND p.FinishedGoodsFlag = 1
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
/* *** Converted Tree: *** (partial. operators omitted for clarity, indicated by :)
        :
        ScaOp_Logical x_lopAnd
            ScaOp_Comp x_cmpEq
            :
            ScaOp_Comp x_cmpEq
            :
            ScaOp_Comp x_cmpEq
        :
Note that unlike SQL syntax, the logical operator can take more than 2 inputs. 
*/


SQL NOT, eliminated using De Morgan's laws


-- NOT expression rewritten using De Morgan's laws.
SELECT p.ProductID
FROM Production.Product AS p
WHERE NOT (
              p.Color = 'Silver'
              AND p.ProductLine = 'M'
          )
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
/* *** Converted Tree: *** (partial. operators omitted for clarity, indicated by :)
    :
    ScaOp_Logical x_lopOr
        ScaOp_Comp x_cmpNe
            ScaOp_Identifier QCOL: [p].Color
            ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=12) XVAR(nvarchar,Owned,Value=Len,Data = (12,83105108118101114))
        ScaOp_Comp x_cmpNe
            ScaOp_Identifier QCOL: [p].ProductLine
            ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=2) XVAR(nvarchar,Owned,Value=Len,Data = (2,77))
    :

Tree shows the expression with NOT eliminated, and the AND converted to OR
*/



IN comparison with a set of constants


- *** IN Test against constants
SELECT p.Name
FROM Production.Product AS p
WHERE p.ProductID IN ( 500501 )
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

/* *** Converted Tree: ***
    :
    ScaOp_Logical x_lopOr
        ScaOp_Comp x_cmpEq
            ScaOp_Identifier QCOL: [p].ProductID
            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=501)
        ScaOp_Comp x_cmpEq
            ScaOp_Identifier QCOL: [p].ProductID
            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=500)w
    :

Described using OR and multiple equality comparisons
*/




Contradiction (Explicit)

In this example, the optimizer recognises a contradiction that will prevent any rows being returned from the relevant table.

-- Optimization: Contradiction (Explicit)
SELECT p.Name
FROM Production.Product AS p
WHERE 1=0
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

/* *** Converted Tree: ***
LogOp_Project QCOL: [p].Name
    LogOp_Select
        LogOp_Get TBL: Production.Product(alias TBL: p) Production.Product TableID=482100758 TableReferenceID=0 IsRow: COL: IsBaseRow1000 
        ScaOp_Const TI(bit,ML=1) XVAR(bit,Not Owned,Value=0)
    AncOp_PrjList 

Initially, a select operator has a filter defined with a zero constant, i.e. always false


*** Simplified Tree: ***

LogOp_ConstTableGet (0) COL: IsBaseRow1000  QCOL: [p].ProductID QCOL: [p].Name COL: ProductNumber  COL: rowguid 

In this tree, the 'always false' select has been replaced by LogOp_ConstTableGet.
This includes a rowcount (0), indicating no rows will be returned.
Instead it gives an indication that some table metadata will be needed  (e.g. column names, data types).
*/



Contradiction (Query vs. Check Constraint)

If a query specifies a condition that contradicts the requirement of a check constraint, the optimizer can eliminate the redundant part of the tree.

-- Optimization: Contradiction (Check Constraint)
SELECT p.Name
FROM Production.Product AS p
WHERE p.ListPrice < 0
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

/* *** Converted Tree: ***
    :
    LogOp_Select
        LogOp_Get TBL: Production.Product(alias TBL: p) Production.Product TableID=482100758 TableReferenceID=0 IsRow: COL: IsBaseRow1000 
        ScaOp_Comp x_cmpLt
            ScaOp_Identifier QCOL: [p].ListPrice
            ScaOp_Const TI(money,ML=8) XVAR(money,Not Owned,Value=(10000units)=(0))
    :

The specified comparison is in the converted tree.


*** Simplified Tree: ***

LogOp_ConstTableGet (0) COL: IsBaseRow1000  QCOL: [p].ProductID QCOL: [p].Name COL: ProductNumber  COL: rowguid 

In this tree, the comparison that can never return true due to the check constraint, is replaced with LogOp_ConstTableGet (0)
*/



Domain Simplification

Here, the optimizer can determine that two ranges specified with an AND can be rationalized to their intersection; simply p.ProductID = 5

This same optimization logic will also merge overlapping ranges specified with an OR operator.

This can be useful when queries are generated with unexpected range overlaps - e.g. poor queries generated by an ORM framework, or from a user interface that allows users to create redundant criteria. In the latter case, it's important to be aware of this rule, to avoid wasted work attempting to implement the rationalization in the application, when the optimizer is already capable of performing the work.

-- Optimization: Domain Simplification (Rule: SelPredNorm)
SELECT p.ProductID
FROM Production.Product AS p
WHERE p.ProductID
      BETWEEN 1 AND 5
      AND p.ProductID
      BETWEEN 5 AND 10
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

/* *** Converted Tree: ***
    :
    ScaOp_Logical x_lopAnd
        ScaOp_Comp x_cmpGe
            ScaOp_Identifier QCOL: [p].ProductID
            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=1)
        ScaOp_Comp x_cmpLe
            ScaOp_Identifier QCOL: [p].ProductID
            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=5)
        ScaOp_Comp x_cmpGe
            ScaOp_Identifier QCOL: [p].ProductID
            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=5)
        ScaOp_Comp x_cmpLe
            ScaOp_Identifier QCOL: [p].ProductID
            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=10)
    :

Initialy the comparison is expressed in full.


*** Simplified Tree: ***
    :
    ScaOp_Comp x_cmpEq
        ScaOp_Identifier QCOL: [p].ProductID
        ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=5)
    :

In the simplified tree, it is rationalised to ProductId = 5
*/






Github Repo for everything in the blog: SQL-Server-Examples


  • To best leverage indexes, understand SArgability
  • The optimizer has logic that will deal with some 'strange' query constructs e.g. generated by an ORM.
  • When considering a poorly performing and unreadable ORM generated query, it is sometimes better to focus on the optimized result (plan in SSMS) rather than be distracted by bad query constructs that the optimizer has coped with and therefore aren't the cause of the problem.
  • If your application generates it's own non-ORM queries, be aware of what the optimizer will manage for you - i.e. don't write your own optimizations when they aren't needed. (e.g domain simplification).



In part 5 we will look at joins, including clever tricks the optimizer can do with them to simplify plans.

No comments:

Post a Comment