August 17, 2020

Logical Trees Part 2: Overview of Trees and Operators

In part 1 we looked at the overall flow of the optimizer.

In this post we will examine the different types of logical trees, and the types of operators you see inside them.

Logical Trees
Tree Types
Operator Types
Remember This

Logical Trees

Logical trees are composed of operators, each of which may have child operators, producing a cascading plan. Conceptually, the structure is similar to a query plan you will see in Management Studio. There are however some key characteristics to note.

Unlike physical operators in a query plan, logical operators never execute. Their sole purpose is to describe an operation that logically exists at a given step. For example, a join will be shown in it's logical form (inner, outer, full, semi, anti-semi) with no consideration as to whether it should be a hash join, merge join, nested loop or apply. Similarly, getting table data is described with no determination of how the table should be accessed, e.g. with a seek or scan, or whether a heap or specific index will be used as the source of the data.

Terminology is different than SQL - Relational rather SQL terminology is used. Get used to the idea that selection means determining which rows to return - aka filtering. Projection means defining which columns to return - akin the behaviour of SQL SELECT.

Logical operators describe the intrinsic functionality expressed in T-SQL code. This is not necessarily a direct translation. When we look at logical trees, we will see cases where the same logical operators are used for seemingly different SQL functions or constructs. For example, there is no logical left string operator - it can be adequately described using the more multi-purpose substring. Conversely, logical (and physical) operators can describe a null equality test - which is not explicitly available in T-SQL.

Logical (and physical) trees are solely diagnostics - they omit lots of 'under the hood' information that is used by the optimizer. Also, in places, they have a few syntax anomalies that would make them unsuitable for machine reading. But despite these limitations - or perhaps because of some of them - they are human readable and useful in understanding steps in the optimization process.

Tree Types

At different steps of optimization we can examine the logical tree, and also some physical trees at the end of the process. Below are some examples * of the changes that can occur between each tree. We will look at practical examples in later posts.

SQL Server Optimizer Overview

Converted Tree

Views are expanded. The full logic for a view's SQL code is written into the tree.

Calculated columns are expanded, with their explicit expression written into the tree regardless of need (this will be eliminated later if not used).

Expressions may be simplified with Constant Folding. If an expression evaluates constants e.g. 10+2, this can be rewritten simply as 12.

Other expression rewrites occur, such as eliminating double negatives.

Input Tree

Regular joins are rewritten as cartesian joins, with a subsequent filter to describe the join criteria. Most will be rationalised back to regular (non cartesian) joins in a later step. 

Table valued constructors are tidied, from using unions to a simpler structure.

Simplified Tree

Redundant joins may be eliminated. This is a very powerful optimizer feature that we will explore in depth in a later post.

Some outer joins may be converted to inner joins, e.g. if a predicate is applied to the outer table.

Unused calculated columns - expanded earlier in the Converted tree - are eliminated.

Contradiction Detection eliminates areas of a query that cannot return rows - e.g. WHERE p.ProductID < 10 AND p.ProductId > 10

Domain Simplification can rationalise multiple clauses against a single source. For example:
WHERE p.ProductId BETWEEN 1 AND 5 AND p.ProductId BETWEEN 5 and 10
can be rationalised to
WHERE p.ProductId = 5

Join Collapsed Tree

Ahead of this tree being output, statistics for relevant columns are created or updated as required. This facilitates estimating row counts (cardinality) for each operator in the tree. These estimates are fundamental to choosing a good plan in later stages of optimization. Some stats update/create work may also occur during cost based optimization, if any of the generated alternative plans require different statistical information.

An initial join order is then set - aka Heuristic Join Reorder. This process may also eliminate redundant joins, not previously eliminated during simplification.

This may manifest as a surprising operator - one with no equivalent in a physical query plan - the NAry join. It describes a join of more than two tables in a single operator, combining the respective join criteria into one composite comparison.

After Project Normalization Tree

Expressions that match a calculated column definition are collapsed into a simple reference to the relevant column.

Common expressions - ones that appear in more than one place in the query - may be refactored to only be evaluated once, and then referenced as a single derived column.

Other rewrites include pushing some expressions deeper into the plan, e.g. to evaluate them before a join operation rather than after.

Output Tree 

This is the first physical tree we see. The simple logical operators are replaced with more complex physical operators - joins have now been determined to be nested loop, hash or merge. Table access operators describe specific indexes or heaps to be used, and whether there will be a seek or a scan.

This is the result of the Exploration stages - during which logically equivalent, but physically different trees are produced and costed. One of these trees may be chosen for two reasons. Firstly, it may be deemed to be 'good enough': i.e. further exploration is not expected to be cost effective in improving performance. Alternatively, if a time limit has been reached, the best tree found to far will be chosen. The exploration stages are beyond the scope of this series, but it is well worth learning the basics to round out your understanding of the later steps of optimization.

Post Optimization Rewrites

Following the output tree produced by the optimizer, some more work may be done in order to form an execution plan. This includes rewrites that are always beneficial for a given scenario - i.e. they do not need costing, and therefore have no place in the exploration stages, but can only be applied once there are physical operators in the tree.

An example of this collapsing a filter operator into the preceding data access operator, for better performance.

Operator Types

In logical trees there are three overall types of operator.

Logical Operators - LogOp_{name}

Logical operators describe the processing of sets of data. This includes originating data from tables, and operations which in most cases take other operators as inputs; such as joins, filters and aggregations. Although there are 75 of these operators in SQL Server 2019, there is a surprisingly small subset that satisfy most queries. 

In the output tree we will see logical operations expressed instead as specific physical operators - PhyOp_{name} - e.g. a Hash Join rather than a generic logical join. In this step by step guide, we'll look at examples of this evolution, and how you can design tables, indexes and queries to get the operators that will deliver the best performance for a query.

Ancillary Operators - AncOp_{name}

Ancillary operators describe additional functionality for a logical operator. Of the 16 ancillary operators in SQL Server 2019, one of the most commonly seen uses is encapsulating an expression to derive a new column that will be evaluated at run time.

Scalar Operators - ScaOp_{name}

Scalar operators are the building blocks of expressions, producing a scalar rather than relational output. They range from basic mathematical operations such as add, multiply etc, through to comparisons and set comparisons such as exists.

Note that unlike logical operators (LogOp_{name}), the scalar operators found in logical trees can also exist in physical plans - i.e. they can be executed.

Not only are there over fifty of these operators, several of those have many variants, leading to over a hundred explicit uses. Despite this apparent proliferation, most are readable without the need for in-depth knowledge.

  • Logical trees show the early stages of optimization.
  • A number of simplifications are applied, to eliminate unnecessary work.
  • Logical operators do not specify physical processing methods such as physical join types (e.g. hash) or data access methods - this is determined later.

In part 3, we'll look at examples of logical operators with T-SQL examples you can try, starting with getting and projecting data from tables.

* The tree behaviour examples given above have been common in my tests of 250 queries. The following sql file provides repros for these observations:

Logical Trees Part 2 - Overview of Trees and Operators.sql (GitHub)

These behaviours are however far from comprehensive. The optimizer contains around four hundred rules that can rewrite portions of a tree in certain circumstances.

The following sql file gives queries reproducing 
Please do comment below or get in touch if you have other examples that might be worth a mention.

No comments:

Post a Comment