August 19, 2020

Logical Trees Part 3: Getting and projecting; SQL FROM, SELECT

In part 2 we looked at the individual trees and overall types of logical operators.

In this post we will get going with some examples you can try: Basic SQL SELECT statements and how they are described using logical get and project operators.

Operators
Examples
Optimization
Get the Code
Remember This



Operators


LogOp_Get  Get data from source


LogOp_Get Operator

The LogOp_Get logical operator describes getting data from a table or a multi statement table valued function. It is a leaf operator - i.e. it has no children. Inline table valued functions are handled by a different structure. We will look at both types of table valued functions in a later post.

The source specification includes the following:

TBL {schema.name} - Table or function name. If the query uses a synonym, the synonym name will be given.
(alias TBL: {alias}) Alias, if specified in the query.
{schema.name} - The base object - i.e. if the the query uses a synonym, the name given here will be the object the synonym references.
TableId={Object Id} - Object id of the base object.
TableReference={Id} - Table reference identifier, which may be used to differentiate different uses of the same source, e.g. where a sub query references a table that is also used in the parent query.
IsRow: COL: {IsBaseRow Derived Column} - A derived internal flag that may be later used in data modification queries and some cursor plans - ref: SQLKiwi (Paul White) 


LogOp_Project  Project columns


LogOp_Project Operator

The Project logical operator specifies the columns to be returned from the child operators. In most cases, these columns are specified on the same line as the Project. Each column is specified as follows - for  physical and derived columns respectively.

QCOL: [{table alias}].{column name}] - Physical column from a table specified in a child operator
COL: {expression name} - A derived column, where the expression is defined in a child operator.

Project has two children. The first is a logical operator - this could be a simple get, or an operator that has it's own descendants. This nesting behaviour can give rise to deep trees for complex queries - similar to query plans in SSMS.

The second child operator - AncOp_PrjList (project list) acts as a parent for possible derived column definitions. If there are no derived columns, the operator will still be present, but childless.

In some cases, the initial tree (Converted) will include a passive project operator - i.e. one that does not specify any additional columns. This will be eliminated later in optimization.


These examples queries highlight how the operators appear at the first optimization step - 'Converted Tree'.

The examples use the AdventureWorks database. Any version from 2008R2 onwards will work - available hereThe queries specify a set of trace flags that will print the logical trees (and the physical output tree) to the messages window in SSMS. The tree output shown was produced by the SQL Server 2019 optimizer. Older versions may have different behaviour.


Select a column


-- Select a column

SELECT ProductID
FROM Production.Product
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

/* *** Converted Tree: ***

LogOp_Project QCOL: [AdventureWorks2017].[Production].[Product].ProductID
    LogOp_Get TBL: Production.Product Production.Product TableID=482100758 TableReferenceID=0 IsRow: COL: IsBaseRow1000 
    AncOp_PrjList       

LogOp_Get specifies the table given in the query, which can be a synonym.
The second reference is to the base table - in this case the same as the query.
TableId is the base object Id.
TableReferenceID may be used to differentiate otherwise ambiguous table references;

LogOp_Project {column specifiers} has two children:
- The LogOp_Get describing the table read.
- AncOp_PrjList; which can have it's own children producing derived columns.
*/


Select a column with a table alias


-- Select a column with table alias

SELECT p.ProductID
FROM Production.Product AS p
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

/* *** Converted Tree: ***

LogOp_Project QCOL: [p].ProductID
    LogOp_Get TBL: Production.Product(alias TBL: p) Production.Product TableID=482100758 TableReferenceID=0 IsRow: COL: IsBaseRow1000
    AncOp_PrjList

Project uses the table alias in the column specifier
Get associates the table with an alias.
*/



Select multiple columns


-- Select multiple columns

SELECT p.ProductID,
       p.Name
FROM Production.Product AS p
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

/*  *** Converted Tree: ***

LogOp_Project QCOL: [p].ProductID QCOL: [p].Name
    LogOp_Get TBL: Production.Product(alias TBL: p) Production.Product TableID=482100758 TableReferenceID=0 IsRow: COL: IsBaseRow1000
    AncOp_PrjList

LogOp_Project specifies more than one QCOL columns
*/



Select from a synonym


-- Select from a table synonym

CREATE SYNONYM Production.ProductSynonym
FOR Production.Product;

SELECT ps.ProductID
FROM Production.ProductSynonym AS ps
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

DROP SYNONYM Production.ProductSynonym;

/* *** Converted Tree: ***

LogOp_Project QCOL: [ps].ProductID
    LogOp_Get TBL: Production.ProductSynonym(alias TBL: ps) Production.Product TableID=482100758 TableReferenceID=0 IsRow: COL: IsBaseRow1000
    AncOp_PrjList 

LogOp_Get specifies the synonym first, and then specifies the base table.
*/



It is easy to think of indexes solely as a way of speeding up queries that have some selectivity. However, 
even non-selective queries like those above still benefit from the optimization logic termed Index Matching.

Index Matching


When data needs to be fetched from a table, there may be more than one structure that can support what is needed. Heaps and indexes - clustered and non clustered - are all possible candidates.

For queries with a simple criteria - one that can be matched to index keys - the optimizer will favour those that support a seek if it provides sufficient selectivity to be more performant than a full scan. 

If the index does not cover all the columns that are needed to be returned, the optimizer may augment the access (a seek or scan) with a subsequent lookup to another index, to fetch the data needed.

Index width may be considered. If more than one structure is sufficient for the selectivity, the optimizer will favour the narrowest structure - whether it be the one used for the initial selectivity, or the one used in a following lookup step. Additionally, the logic may also consider ordering requirements (ascending or descending) and partitioning.

In the post optimization rewrite (separate to index matching), a filter operator implementing a predicate that could not leverage an index seek may, nonetheless, be collapsed into the data access operator. This means that the predicate is evaluated in the same loop of code reading data from the source. The greatly reduced need for locking results in both better performance, and concurrency.

The above is not a comprehensive analysis of the matching logic, but gives a flavour of some of the factors the optimizer considers.


Simple Index Matching Example


In the output tree in the example below, the index scan physical operator (PhyOp_Range) specifies which index has been chosen. The (3) indicates index_id 3 for the Product table - AK_Product_Name. This is the narrowest index available to satisfy the query.


-- Optimization: Index Matching

SELECT p.Name
FROM Production.Product AS p
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

/* *** Output Tree: (trivial plan) ***

PhyOp_NOP
    PhyOp_Range TBL: Production.Product(alias TBL: p)(3) ASC  Bmk ( QCOL: [p].ProductID) IsRow: COL: IsBaseRow1000 

Index matching looks for the narrowest index that will satisfy the query fastest.
In this case the only column needed is Name.
Index Id 3 - AK_Product_Name - is chosen. It is the narrowest of the indexes that contain the Name column
*/


Finding Index IDs


Although it's generally easier to discover the name of the chosen index by viewing the query plan in SSMS, one can instead map the id shown in the output tree by looking it up in the sys.indexes view.

- Finding Index Ids

SELECT [Schema] = OBJECT_SCHEMA_NAME(i.object_id),
       [Table] = OBJECT_NAME(i.object_id),
       i.index_id,
       IndexName = i.name
FROM sys.indexes AS i
ORDER BY [Schema],
         [Table],
         i.index_id,
         IndexName;





Github Repo for everything in the blog: SQL-Server-Examples



  • Even non selective queries can benefit from a well matched index.
  • For poorly performing non selective queries consider a tailored index - most beneficial where required the columns are collectively narrow relative to the overall table width.

In part 4 we will look at simple WHERE criteria, with some examples of the filtering operators used in logical trees.

No comments:

Post a Comment