August 24, 2020

Contents

Optimizer Logical Trees Step-By-Step


Logical trees expose the early stages of optimization. This is a step-by-step guide to those trees, starting with the basics. Included are tips for getting queries to run faster.

Part 1 - Introduction
Part 2 - Overview of Trees and Operators
Part 3 - Getting and projecting; SQL FROM, SELECT
Part 4 - Filtering with the select operator; SQL WHERE

Coming soon: Part 5 - Joins and Apply


Special thanks to Paul White for providing an invaluable technical review of this series, and to Redgate for donating software.

General


Logical Trees Part 4: Filtering with the select operator; SQL WHERE

In part 3 we looked at the basic operators that describe getting and projecting data.

In this post we will see how basic filters are specified using the LogOp_Select operator, commonly resulting from a SQL WHERE statement.

Operators
Examples
Optimization
Get the Code
Remember This


Operators


LogOp_Select Filter data


LogOp_Select Operator

The select operator describes filtering - aka applying a predicate - using two child operators. The first is a logical operator describing some relational data to be filtered, and the second is an operator describing the filter itself.

The filter is an expression returning a potentially three valued boolean  - true, false or unknown.

Examples of operators that return boolean values include:

- Scalar comparisons, describing =, >, <, <>.
- Logical comparisons, describing AND, OR
- Relational comparisons, describing SOME, ALL, EXISTS, NOT EXISTS - we will examine these in detail in a later post. 


ScaOp_Comp Compare values


ScaOp_Comp Operator

The scalar comparison ScaOp_Comp specifies a comparison type (see table below), and has two children specifying the values to be compared.

Operator

T-SQL

ScaOp_Comp x_cmpEq

=

ScaOp_Comp x_cmpGt

> 

ScaOp_Comp x_cmpLt

< 

ScaOp_Comp x_cmpGe

>= or !<

ScaOp_Comp x_cmpLe

<= or !>

ScaOp_Comp x_cmpNe

<> or !=

ScaOp_Comp x_cmpIs

No explicit support

ScaOp_Comp x_cmpIsNot

No explicit support



Uncovered: Null Equality - Not supported in T-SQL, but there under the hood.

In the above list there are two operators - is and is not - that have no explicit equivalent in T-SQL

If we want to test if two values are equal, and return true if both values are null, we either write the somewhat clumsy:

WHERE a = b OR a IS NULL AND b IS NULL -- True if a and b are both null

Or, 
in some set comparison queries, we can use a more elegant EXISTS & INTERSECT construct. 

However, in the SQL standard - but not implemented in T-SQL - is the following syntax:

WHERE a IS NOT DISTINCT FROM b -- True if a and b are both null

It is frustrating then to discover that in certain queries (such as the EXISTS & INTERSECT construct) we then see that the engine does in fact include an operator that supports the null equality comparison functionality written into the SQL standard over twenty years ago.

For a deeper look at this limitation, and potential solutions, see this post by Paul White.


In addition to the above, SQL LIKE is a special case comparison with it's own operator syntax:

ScaOp_Intrinsic like


LogOp_Logical Apply boolean logic - AND / OR


ScaOp_Logical Operator

The LogOp_Logical operator describes boolean logical operations such as AND. Each child operator describes a boolean result, to be compared. Note that unlike SQL, the operator can have more than two child inputs - a sample query demonstrates this later, in the examples section. 


Operator

T-SQL

ScaOp_Logical x_lopAnd

AND

ScaOp_Logical x_lopOr

OR

No operator. NOT expressions are rewritten using De Morgan's laws.

NOT



Why no operator for NOT?

NOT has no operator in the logical trees, because negation is rewritten to eliminate it. For example:

NOT (a = b AND x = y)

Can be rewritten as:

a <> b OR x <> y

The rewritten expression - seen in the first logical tree (Converted) - is an example of De Morgan's laws for propositional logic. A runnable example is included in the queries below. It is not important to understand the academic theory, but it is worth understanding that rewrites such as these can make it easier for subsequent optimization rules to further simply the query. 


ScaOp_Identifier Specify a column


ScaOp_Identifier Operator

This leaf operator specifies a column - either physical or derived.

QCOL: [{table alias}].{column name}] - Column from a table specified in a child operator
COL: {expression name} - A derived column, where the expression is defined in a child operator.


ScaOp_Const Specify a constant


ScaOp_Constant Operator

This leaf operator specifies a constant.

TI ({type information}) - Type specification
XVAR ({value specification}) - Value specification. XVAR is an example of the extensive internal use of a type we know externally as sql_variant. (Paul White).

Whilst the individual elements of the specification have many variations, I won't in this post attempt document them all. However, they are usually sufficiently readable to understand the meaning in the context of the tree.



Examples

The examples use the AdventureWorks database. Any version from 2008R2 onwards will work - available hereThe queries specify a set of trace flags that will print the logical trees (and the physical output tree) to the messages window in SSMS. The tree output shown was produced by the SQL Server 2019 optimizer. Older versions may have different behaviour.

Equality Comparison


-- Simple equality comparison, comparing a column with a constant
SELECT p.Name
FROM Production.Product AS p
WHERE p.ProductID = 500
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
/* *** Converted Tree: ***
LogOp_Project QCOL: [p].Name
    LogOp_Select
        LogOp_Get TBL: Production.Product(alias TBL: p) Production.Product TableID=482100758 TableReferenceID=0 IsRow: COL: IsBaseRow1000 
        ScaOp_Comp x_cmpEq
            ScaOp_Identifier QCOL: [p].ProductID
            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=500)
    AncOp_PrjList

LogOp_Select has two childen:
A logical operator describing the input data
A scalar comparison operator (ScaOp_Comp), specifying an equality comparison (x_cmpEq)

ScaOp_Comp has two children, describing the two values to be compared
ScaOp_Identifier specifies a column, using the alias declared in the LogOp_Get
ScaOp_Const specifies a constant, with a data type and value.
*/


Boolean Comparison


-- Simple comparison with boolean expression
SELECT p.Name
FROM Production.Product AS p
WHERE p.Color = 'Silver'
      AND p.ProductLine = 'M'
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
/* *** Converted Tree: ***
LogOp_Project QCOL: [p].Name
    LogOp_Select
        LogOp_Get TBL: Production.Product(alias TBL: p) Production.Product TableID=482100758 TableReferenceID=0 IsRow: COL: IsBaseRow1000 
        ScaOp_Logical x_lopAnd
            ScaOp_Comp x_cmpEq
                ScaOp_Identifier QCOL: [p].Color
                ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=12) XVAR(nvarchar,Owned,Value=Len,Data = (12,83105108118101114))
            ScaOp_Comp x_cmpEq
                ScaOp_Identifier QCOL: [p].ProductLine
                ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=2) XVAR(nvarchar,Owned,Value=Len,Data = (2,77))

    AncOp_PrjList 
LogOp_Select has two childen:
A logical operator describing the input data
A logical comparison operator (ScaOp_Logical), specifying an AND comparison (x_lopAnd)

ScaOp_Logical has two children, specifying the two comparisons eacg returning a boolean result, as input to the AND expression.
*/


3 Part Boolean Comparison


-- Simple comparison with 3 part boolean expression
SELECT p.Name
FROM Production.Product AS p
WHERE p.Color = 'Silver'
      AND p.ProductLine = 'M'
      AND p.FinishedGoodsFlag = 1
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
/* *** Converted Tree: *** (partial. operators omitted for clarity, indicated by :)
        :
        ScaOp_Logical x_lopAnd
            ScaOp_Comp x_cmpEq
            :
            ScaOp_Comp x_cmpEq
            :
            ScaOp_Comp x_cmpEq
        :
Note that unlike SQL syntax, the logical operator can take more than 2 inputs. 
*/


SQL NOT, eliminated using De Morgan's laws


-- NOT expression rewritten using De Morgan's laws.
SELECT p.ProductID
FROM Production.Product AS p
WHERE NOT (
              p.Color = 'Silver'
              AND p.ProductLine = 'M'
          )
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
/* *** Converted Tree: *** (partial. operators omitted for clarity, indicated by :)
    :
    ScaOp_Logical x_lopOr
        ScaOp_Comp x_cmpNe
            ScaOp_Identifier QCOL: [p].Color
            ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=12) XVAR(nvarchar,Owned,Value=Len,Data = (12,83105108118101114))
        ScaOp_Comp x_cmpNe
            ScaOp_Identifier QCOL: [p].ProductLine
            ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=2) XVAR(nvarchar,Owned,Value=Len,Data = (2,77))
    :

Tree shows the expression with NOT eliminated, and the AND converted to OR
*/



IN comparison with a set of constants


- *** IN Test against constants
SELECT p.Name
FROM Production.Product AS p
WHERE p.ProductID IN ( 500501 )
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

/* *** Converted Tree: ***
    :
    ScaOp_Logical x_lopOr
        ScaOp_Comp x_cmpEq
            ScaOp_Identifier QCOL: [p].ProductID
            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=501)
        ScaOp_Comp x_cmpEq
            ScaOp_Identifier QCOL: [p].ProductID
            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=500)w
    :

Described using OR and multiple equality comparisons
*/




Contradiction (Explicit)

In this example, the optimizer recognises a contradiction that will prevent any rows being returned from the relevant table.

-- Optimization: Contradiction (Explicit)
SELECT p.Name
FROM Production.Product AS p
WHERE 1=0
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

/* *** Converted Tree: ***
LogOp_Project QCOL: [p].Name
    LogOp_Select
        LogOp_Get TBL: Production.Product(alias TBL: p) Production.Product TableID=482100758 TableReferenceID=0 IsRow: COL: IsBaseRow1000 
        ScaOp_Const TI(bit,ML=1) XVAR(bit,Not Owned,Value=0)
    AncOp_PrjList 

Initially, a select operator has a filter defined with a zero constant, i.e. always false


*** Simplified Tree: ***

LogOp_ConstTableGet (0) COL: IsBaseRow1000  QCOL: [p].ProductID QCOL: [p].Name COL: ProductNumber  COL: rowguid 

In this tree, the 'always false' select has been replaced by LogOp_ConstTableGet.
This includes a rowcount (0), indicating no rows will be returned.
Instead it gives an indication that some table metadata will be needed  (e.g. column names, data types).
*/



Contradiction (Query vs. Check Constraint)

If a query specifies a condition that contradicts the requirement of a check constraint, the optimizer can eliminate the redundant part of the tree.

-- Optimization: Contradiction (Check Constraint)
SELECT p.Name
FROM Production.Product AS p
WHERE p.ListPrice < 0
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

/* *** Converted Tree: ***
    :
    LogOp_Select
        LogOp_Get TBL: Production.Product(alias TBL: p) Production.Product TableID=482100758 TableReferenceID=0 IsRow: COL: IsBaseRow1000 
        ScaOp_Comp x_cmpLt
            ScaOp_Identifier QCOL: [p].ListPrice
            ScaOp_Const TI(money,ML=8) XVAR(money,Not Owned,Value=(10000units)=(0))
    :

The specified comparison is in the converted tree.


*** Simplified Tree: ***

LogOp_ConstTableGet (0) COL: IsBaseRow1000  QCOL: [p].ProductID QCOL: [p].Name COL: ProductNumber  COL: rowguid 

In this tree, the comparison that can never return true due to the check constraint, is replaced with LogOp_ConstTableGet (0)
*/



Domain Simplification

Here, the optimizer can determine that two ranges specified with an AND can be rationalized to their intersection; simply p.ProductID = 5

This same optimization logic will also merge overlapping ranges specified with an OR operator.

This can be useful when queries are generated with unexpected range overlaps - e.g. poor queries generated by an ORM framework, or from a user interface that allows users to create redundant criteria. In the latter case, it's important to be aware of this rule, to avoid wasted work attempting to implement the rationalization in the application, when the optimizer is already capable of performing the work.

-- Optimization: Domain Simplification (Rule: SelPredNorm)
SELECT p.ProductID
FROM Production.Product AS p
WHERE p.ProductID
      BETWEEN 1 AND 5
      AND p.ProductID
      BETWEEN 5 AND 10
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

/* *** Converted Tree: ***
    :
    ScaOp_Logical x_lopAnd
        ScaOp_Comp x_cmpGe
            ScaOp_Identifier QCOL: [p].ProductID
            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=1)
        ScaOp_Comp x_cmpLe
            ScaOp_Identifier QCOL: [p].ProductID
            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=5)
        ScaOp_Comp x_cmpGe
            ScaOp_Identifier QCOL: [p].ProductID
            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=5)
        ScaOp_Comp x_cmpLe
            ScaOp_Identifier QCOL: [p].ProductID
            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=10)
    :

Initialy the comparison is expressed in full.


*** Simplified Tree: ***
    :
    ScaOp_Comp x_cmpEq
        ScaOp_Identifier QCOL: [p].ProductID
        ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=5)
    :

In the simplified tree, it is rationalised to ProductId = 5
*/






Github Repo for everything in the blog: SQL-Server-Examples


  • To best leverage indexes, understand SArgability
  • The optimizer has logic that will deal with some 'strange' query constructs e.g. generated by an ORM.
  • When considering a poorly performing and unreadable ORM generated query, it is sometimes better to focus on the optimized result (plan in SSMS) rather than be distracted by bad query constructs that the optimizer has coped with and therefore aren't the cause of the problem.
  • If your application generates it's own non-ORM queries, be aware of what the optimizer will manage for you - i.e. don't write your own optimizations when they aren't needed. (e.g domain simplification).



In part 5 we will look at joins, including clever tricks the optimizer can do with them to simplify plans.

August 19, 2020

Logical Trees Part 3: Getting and projecting; SQL FROM, SELECT

In part 2 we looked at the individual trees and overall types of logical operators.

In this post we will get going with some examples you can try: Basic SQL SELECT statements and how they are described using logical get and project operators.

Operators
Examples
Optimization
Get the Code
Remember This



Operators


LogOp_Get  Get data from source


LogOp_Get Operator

The LogOp_Get logical operator describes getting data from a table or a multi statement table valued function. It is a leaf operator - i.e. it has no children. Inline table valued functions are handled by a different structure. We will look at both types of table valued functions in a later post.

The source specification includes the following:

TBL {schema.name} - Table or function name. If the query uses a synonym, the synonym name will be given.
(alias TBL: {alias}) Alias, if specified in the query.
{schema.name} - The base object - i.e. if the the query uses a synonym, the name given here will be the object the synonym references.
TableId={Object Id} - Object id of the base object.
TableReference={Id} - Table reference identifier, which may be used to differentiate different uses of the same source, e.g. where a sub query references a table that is also used in the parent query.
IsRow: COL: {IsBaseRow Derived Column} - A derived internal flag that may be later used in data modification queries and some cursor plans - ref: SQLKiwi (Paul White) 


LogOp_Project  Project columns


LogOp_Project Operator

The Project logical operator specifies the columns to be returned from the child operators. In most cases, these columns are specified on the same line as the Project. Each column is specified as follows - for  physical and derived columns respectively.

QCOL: [{table alias}].{column name}] - Physical column from a table specified in a child operator
COL: {expression name} - A derived column, where the expression is defined in a child operator.

Project has two children. The first is a logical operator - this could be a simple get, or an operator that has it's own descendants. This nesting behaviour can give rise to deep trees for complex queries - similar to query plans in SSMS.

The second child operator - AncOp_PrjList (project list) acts as a parent for possible derived column definitions. If there are no derived columns, the operator will still be present, but childless.

In some cases, the initial tree (Converted) will include a passive project operator - i.e. one that does not specify any additional columns. This will be eliminated later in optimization.


These examples queries highlight how the operators appear at the first optimization step - 'Converted Tree'.

The examples use the AdventureWorks database. Any version from 2008R2 onwards will work - available hereThe queries specify a set of trace flags that will print the logical trees (and the physical output tree) to the messages window in SSMS. The tree output shown was produced by the SQL Server 2019 optimizer. Older versions may have different behaviour.


Select a column


-- Select a column

SELECT ProductID
FROM Production.Product
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

/* *** Converted Tree: ***

LogOp_Project QCOL: [AdventureWorks2017].[Production].[Product].ProductID
    LogOp_Get TBL: Production.Product Production.Product TableID=482100758 TableReferenceID=0 IsRow: COL: IsBaseRow1000 
    AncOp_PrjList       

LogOp_Get specifies the table given in the query, which can be a synonym.
The second reference is to the base table - in this case the same as the query.
TableId is the base object Id.
TableReferenceID may be used to differentiate otherwise ambiguous table references;

LogOp_Project {column specifiers} has two children:
- The LogOp_Get describing the table read.
- AncOp_PrjList; which can have it's own children producing derived columns.
*/


Select a column with a table alias


-- Select a column with table alias

SELECT p.ProductID
FROM Production.Product AS p
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

/* *** Converted Tree: ***

LogOp_Project QCOL: [p].ProductID
    LogOp_Get TBL: Production.Product(alias TBL: p) Production.Product TableID=482100758 TableReferenceID=0 IsRow: COL: IsBaseRow1000
    AncOp_PrjList

Project uses the table alias in the column specifier
Get associates the table with an alias.
*/



Select multiple columns


-- Select multiple columns

SELECT p.ProductID,
       p.Name
FROM Production.Product AS p
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

/*  *** Converted Tree: ***

LogOp_Project QCOL: [p].ProductID QCOL: [p].Name
    LogOp_Get TBL: Production.Product(alias TBL: p) Production.Product TableID=482100758 TableReferenceID=0 IsRow: COL: IsBaseRow1000
    AncOp_PrjList

LogOp_Project specifies more than one QCOL columns
*/



Select from a synonym


-- Select from a table synonym

CREATE SYNONYM Production.ProductSynonym
FOR Production.Product;

SELECT ps.ProductID
FROM Production.ProductSynonym AS ps
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

DROP SYNONYM Production.ProductSynonym;

/* *** Converted Tree: ***

LogOp_Project QCOL: [ps].ProductID
    LogOp_Get TBL: Production.ProductSynonym(alias TBL: ps) Production.Product TableID=482100758 TableReferenceID=0 IsRow: COL: IsBaseRow1000
    AncOp_PrjList 

LogOp_Get specifies the synonym first, and then specifies the base table.
*/



It is easy to think of indexes solely as a way of speeding up queries that have some selectivity. However, 
even non-selective queries like those above still benefit from the optimization logic termed Index Matching.

Index Matching


When data needs to be fetched from a table, there may be more than one structure that can support what is needed. Heaps and indexes - clustered and non clustered - are all possible candidates.

For queries with a simple criteria - one that can be matched to index keys - the optimizer will favour those that support a seek if it provides sufficient selectivity to be more performant than a full scan. 

If the index does not cover all the columns that are needed to be returned, the optimizer may augment the access (a seek or scan) with a subsequent lookup to another index, to fetch the data needed.

Index width may be considered. If more than one structure is sufficient for the selectivity, the optimizer will favour the narrowest structure - whether it be the one used for the initial selectivity, or the one used in a following lookup step. Additionally, the logic may also consider ordering requirements (ascending or descending) and partitioning.

In the post optimization rewrite (separate to index matching), a filter operator implementing a predicate that could not leverage an index seek may, nonetheless, be collapsed into the data access operator. This means that the predicate is evaluated in the same loop of code reading data from the source. The greatly reduced need for locking results in both better performance, and concurrency.

The above is not a comprehensive analysis of the matching logic, but gives a flavour of some of the factors the optimizer considers.


Simple Index Matching Example


In the output tree in the example below, the index scan physical operator (PhyOp_Range) specifies which index has been chosen. The (3) indicates index_id 3 for the Product table - AK_Product_Name. This is the narrowest index available to satisfy the query.


-- Optimization: Index Matching

SELECT p.Name
FROM Production.Product AS p
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

/* *** Output Tree: (trivial plan) ***

PhyOp_NOP
    PhyOp_Range TBL: Production.Product(alias TBL: p)(3) ASC  Bmk ( QCOL: [p].ProductID) IsRow: COL: IsBaseRow1000 

Index matching looks for the narrowest index that will satisfy the query fastest.
In this case the only column needed is Name.
Index Id 3 - AK_Product_Name - is chosen. It is the narrowest of the indexes that contain the Name column
*/


Finding Index IDs


Although it's generally easier to discover the name of the chosen index by viewing the query plan in SSMS, one can instead map the id shown in the output tree by looking it up in the sys.indexes view.

- Finding Index Ids

SELECT [Schema] = OBJECT_SCHEMA_NAME(i.object_id),
       [Table] = OBJECT_NAME(i.object_id),
       i.index_id,
       IndexName = i.name
FROM sys.indexes AS i
ORDER BY [Schema],
         [Table],
         i.index_id,
         IndexName;





Github Repo for everything in the blog: SQL-Server-Examples



  • Even non selective queries can benefit from a well matched index.
  • For poorly performing non selective queries consider a tailored index - most beneficial where required the columns are collectively narrow relative to the overall table width.

In part 4 we will look at simple WHERE criteria, with some examples of the filtering operators used in logical trees.

August 17, 2020

Logical Trees Part 2: Overview of Trees and Operators

In part 1 we looked at the overall flow of the optimizer.

In this post we will examine the different types of logical trees, and the types of operators you see inside them.

Logical Trees
Tree Types
Operator Types
Remember This


Logical Trees

Logical trees are composed of operators, each of which may have child operators, producing a cascading plan. Conceptually, the structure is similar to a query plan you will see in Management Studio. There are however some key characteristics to note.

Unlike physical operators in a query plan, logical operators never execute. Their sole purpose is to describe an operation that logically exists at a given step. For example, a join will be shown in it's logical form (inner, outer, full, semi, anti-semi) with no consideration as to whether it should be a hash join, merge join, nested loop or apply. Similarly, getting table data is described with no determination of how the table should be accessed, e.g. with a seek or scan, or whether a heap or specific index will be used as the source of the data.

Terminology is different than SQL - Relational rather SQL terminology is used. Get used to the idea that selection means determining which rows to return - aka filtering. Projection means defining which columns to return - akin the behaviour of SQL SELECT.

Logical operators describe the intrinsic functionality expressed in T-SQL code. This is not necessarily a direct translation. When we look at logical trees, we will see cases where the same logical operators are used for seemingly different SQL functions or constructs. For example, there is no logical left string operator - it can be adequately described using the more multi-purpose substring. Conversely, logical (and physical) operators can describe a null equality test - which is not explicitly available in T-SQL.

Logical (and physical) trees are solely diagnostics - they omit lots of 'under the hood' information that is used by the optimizer. Also, in places, they have a few syntax anomalies that would make them unsuitable for machine reading. But despite these limitations - or perhaps because of some of them - they are human readable and useful in understanding steps in the optimization process.


Tree Types

At different steps of optimization we can examine the logical tree, and also some physical trees at the end of the process. Below are some examples * of the changes that can occur between each tree. We will look at practical examples in later posts.


SQL Server Optimizer Overview

Converted Tree

Views are expanded. The full logic for a view's SQL code is written into the tree.

Calculated columns are expanded, with their explicit expression written into the tree regardless of need (this will be eliminated later if not used).

Expressions may be simplified with Constant Folding. If an expression evaluates constants e.g. 10+2, this can be rewritten simply as 12.

Other expression rewrites occur, such as eliminating double negatives.

Input Tree

Regular joins are rewritten as cartesian joins, with a subsequent filter to describe the join criteria. Most will be rationalised back to regular (non cartesian) joins in a later step. 

Table valued constructors are tidied, from using unions to a simpler structure.

Simplified Tree

Redundant joins may be eliminated. This is a very powerful optimizer feature that we will explore in depth in a later post.

Some outer joins may be converted to inner joins, e.g. if a predicate is applied to the outer table.

Unused calculated columns - expanded earlier in the Converted tree - are eliminated.

Contradiction Detection eliminates areas of a query that cannot return rows - e.g. WHERE p.ProductID < 10 AND p.ProductId > 10

Domain Simplification can rationalise multiple clauses against a single source. For example:
WHERE p.ProductId BETWEEN 1 AND 5 AND p.ProductId BETWEEN 5 and 10
can be rationalised to
WHERE p.ProductId = 5

Join Collapsed Tree

Ahead of this tree being output, statistics for relevant columns are created or updated as required. This facilitates estimating row counts (cardinality) for each operator in the tree. These estimates are fundamental to choosing a good plan in later stages of optimization. Some stats update/create work may also occur during cost based optimization, if any of the generated alternative plans require different statistical information.

An initial join order is then set - aka Heuristic Join Reorder. This process may also eliminate redundant joins, not previously eliminated during simplification.

This may manifest as a surprising operator - one with no equivalent in a physical query plan - the NAry join. It describes a join of more than two tables in a single operator, combining the respective join criteria into one composite comparison.

After Project Normalization Tree

Expressions that match a calculated column definition are collapsed into a simple reference to the relevant column.

Common expressions - ones that appear in more than one place in the query - may be refactored to only be evaluated once, and then referenced as a single derived column.

Other rewrites include pushing some expressions deeper into the plan, e.g. to evaluate them before a join operation rather than after.

Output Tree 

This is the first physical tree we see. The simple logical operators are replaced with more complex physical operators - joins have now been determined to be nested loop, hash or merge. Table access operators describe specific indexes or heaps to be used, and whether there will be a seek or a scan.

This is the result of the Exploration stages - during which logically equivalent, but physically different trees are produced and costed. One of these trees may be chosen for two reasons. Firstly, it may be deemed to be 'good enough': i.e. further exploration is not expected to be cost effective in improving performance. Alternatively, if a time limit has been reached, the best tree found to far will be chosen. The exploration stages are beyond the scope of this series, but it is well worth learning the basics to round out your understanding of the later steps of optimization.

Post Optimization Rewrites

Following the output tree produced by the optimizer, some more work may be done in order to form an execution plan. This includes rewrites that are always beneficial for a given scenario - i.e. they do not need costing, and therefore have no place in the exploration stages, but can only be applied once there are physical operators in the tree.

An example of this collapsing a filter operator into the preceding data access operator, for better performance.

Operator Types

In logical trees there are three overall types of operator.

Logical Operators - LogOp_{name}

Logical operators describe the processing of sets of data. This includes originating data from tables, and operations which in most cases take other operators as inputs; such as joins, filters and aggregations. Although there are 75 of these operators in SQL Server 2019, there is a surprisingly small subset that satisfy most queries. 

In the output tree we will see logical operations expressed instead as specific physical operators - PhyOp_{name} - e.g. a Hash Join rather than a generic logical join. In this step by step guide, we'll look at examples of this evolution, and how you can design tables, indexes and queries to get the operators that will deliver the best performance for a query.

Ancillary Operators - AncOp_{name}

Ancillary operators describe additional functionality for a logical operator. Of the 16 ancillary operators in SQL Server 2019, one of the most commonly seen uses is encapsulating an expression to derive a new column that will be evaluated at run time.

Scalar Operators - ScaOp_{name}

Scalar operators are the building blocks of expressions, producing a scalar rather than relational output. They range from basic mathematical operations such as add, multiply etc, through to comparisons and set comparisons such as exists.

Note that unlike logical operators (LogOp_{name}), the scalar operators found in logical trees can also exist in physical plans - i.e. they can be executed.

Not only are there over fifty of these operators, several of those have many variants, leading to over a hundred explicit uses. Despite this apparent proliferation, most are readable without the need for in-depth knowledge.





  • Logical trees show the early stages of optimization.
  • A number of simplifications are applied, to eliminate unnecessary work.
  • Logical operators do not specify physical processing methods such as physical join types (e.g. hash) or data access methods - this is determined later.

In part 3, we'll look at examples of logical operators with T-SQL examples you can try, starting with getting and projecting data from tables.




* The tree behaviour examples given above have been common in my tests of 250 queries. The following sql file provides repros for these observations:

Logical Trees Part 2 - Overview of Trees and Operators.sql (GitHub)

These behaviours are however far from comprehensive. The optimizer contains around four hundred rules that can rewrite portions of a tree in certain circumstances.

The following sql file gives queries reproducing 
Please do comment below or get in touch if you have other examples that might be worth a mention.