Wednesday, 08 June 2022

Simple Parameterization and Trivial Plans—Part 5

Checkpoint

In part 4, I described the role the parser plays turning a statement with constant values into a parameterized prepared statement. To recap briefly, the parser:

Identifies potential parameters
Assigns an initial data type
Accounts for both simple and forced parameterization rules
Sets a flag if it determines simple parameterization is impossible

When the parser allows simple parameterization to continue, SQL Server increments the Auto-Param Attempts/sec counter of the SQL Statistics object.

Let’s now continue following the compilation process noting the effects on simple parameterization as we go.

As in previous parts, code examples use the Stack Overflow 2010 database on SQL Server 2019 CU 16 with the following additional nonclustered index:

CREATE INDEX [IX dbo.Users Reputation (DisplayName)] 
ON dbo.Users (Reputation) 
INCLUDE (DisplayName);

Normalization

The output of parsing is a logical representation of the statement called a parse tree. This tree does not contain SQL language elements. It is an abstraction of the logical elements of the original query specification.

It is frequently possible to write the same logical requirement in different ways. Using a SQL analogy, x IN (4, 8) is logically the same as writing x = 4 OR x = 8. This flexibility can be useful when writing queries, but it makes implementing an optimizer more difficult.

In general terms, normalization is an attempt to standardize. It recognises common variations expressing the same logic, and rewrites them in a standard way. For example, normalization is responsible for turning x BETWEEN y AND z into x >= y AND x <= z. The normalized, or standardized, form chosen is one the query processor finds convenient to work with internally.

I’m simplifying a bit here. Like parameterization, normalization happens in stages at different points during statement compilation and optimization. Logical operator trees and expressions are normalized at different times and in different ways. It is only important to understand the broad concept for what follows.

Maximizing Parameterized Plan Reuse

SQL Server could use the parameter locations determined during parsing to directly replace constants with parameter markers. This would function to allow plan reuse, but also require future statements to be written in exactly the same way (constant values aside) to benefit from plan reuse.

This might be useful enough in itself, but we can do better by normalizing. Expressing the parameterized statement in a standardized way allows future statements to match if they normalize to the same form, even where the original text is quite different. I’ll show you some examples shortly.

Decoding

SQL Server implements this important normalization aspect by decoding the logical operator tree back into a SQL representation. Decoding uses slightly different rules for simple and forced parameterization but in either case the normalized SQL will usually be quite different from the original.

For simple parameterization, decoding emits keywords in upper case, data types and intrinsic functions in lower case, delimits object names with square brackets, and removes unnecessary spaces and comments.

Forced parameterization emits keywords in lower case, does not use bracket delimiters, and places a space either side of the ‘dot’ between schema and object names.

Decoding also standardizes elements like comparison operators, the AS keyword for aliases, and the presence of an ASC or DESC specification in an ORDER BY clause. For example, decoding always expresses ‘not equal’ using <> even if != was used in the original statement.

Neither normalization scheme changes the casing of non-keyword elements like object names. This could be unsafe in case-sensitive databases.

Whichever normalization scheme is used, the key point is AdHoc statements can be written using different keyword casing, spacing, comments, and delimiters but still benefit from plan reuse through server-side parameterization. The text will normalize to the same parameterized form, allowing a cached prepared plan to be matched and reused.

Let’s look at an example.

Example 1 - Normalization

The following statements generate the same normalized text under simple parameterization:

ALTER DATABASE SCOPED CONFIGURATION 
    CLEAR PROCEDURE_CACHE;
GO
SELECT U.DisplayName FROM dbo.Users AS U 
WHERE U.Reputation IN (2) ORDER BY U.Id;
GO
SELECT U.DisplayName 
FROM dbo.Users AS U 
WHERE U.Reputation =3
ORDER BY U.Id;
GO
select 
    U.DisplayName 
FROM [dbo].[Users] as U 
WhErE 
    U.Reputation = (4)
Order By
    U.[Id];
GO
SELECT
    U.DisplayName 
    -- Note: No AS used for the alias
from [dbo] . Users U 
where
    U.Reputation = 5
order by 
    U.Id asc;
GO

Plan cache

Let’s look at the plan cache:

SELECT
    CP.usecounts,
    CP.objtype,
    ST.[text]
FROM sys.dm_exec_cached_plans AS CP
CROSS APPLY sys.dm_exec_sql_text (CP.plan_handle) AS ST
WHERE 
    ST.[text] NOT LIKE '%dm_exec_cached_plans%'
    AND ST.[text] LIKE '%DisplayName%Users%'
ORDER BY 
    CP.objtype ASC;

The output shows four AdHoc shell plans pointing to a single Prepared plan:

Normalized statement plan reuse

Each AdHoc statement is different, but the logical trees all decode to the same form (line breaks added for clarity):

(@1 tinyint)
SELECT [U].[DisplayName]
FROM [dbo].[Users] [U] 
WHERE [U].[Reputation]=@1 
ORDER BY [U].[Id] ASC

The normalized and parameterized form is used in cache lookups so the prepared plan is successfully found and reused.

This is a neat feature of simple parameterization but it is still best to use a consistent SQL style. Admittedly, this can be difficult to enforce in practical work environments over time.

Normalization isn’t perfect. You can probably find any number of ways to subtly change the statements above to produce different ‘normalized’ parameterized forms. Normalization promotes reuse of parameterized plans—it doesn’t guarantee it.

Failed Parameterization

The decoding step for simple parameterization has deliberately limited capabilities. It is only able to decode certain logical tree elements into SQL. These limitations determine which clauses and intrinsic functions (among other things) are compatible with simple parameterization.

If the decoding step successfully produces a complete SQL representation, the compilation process proceeds from this point as a prepared (parameterized) statement.

When decoding fails, the Failed Auto-Params/sec counter of the SQL Statistics object is incremented and compilation continues as an unparameterized (not prepared) statement.

Statements that fail at the parsing or decoding steps do not show any parameter details in estimated or actual plans because a prepared version of the statement is never created. Refer back to part three for execution plan details.

Decoder Limitations

For a parameterization attempt to fail here, it must have passed the generic tests performed by the parser, but fail at the decoding stage. The syntax elements, global variables, and intrinsic functions supported by the simple parameterization decoder are not documented.

The decoder limitations explain why (and when) examples shown earlier in this series failed with LOWER and CEILING built-in functions, but succeeded with FLOOR and ABS. Constant to constant comparison is also unsupported by the simple parameterization decoder, which explains why the well-known WHERE 1 = 1 trick prevents simple parameterization. As further examples, the decoder supports the @@SPID and @@TRANCOUNT global variables, but not @@ROWCOUNT or @@IDENTITY.

The performance counter test rig in part three can be used to explore which statement features fail at the decoding stage. Such statements will increment both Auto-Param Attempts/sec and Failed Auto-Params/sec.

Remember a statement disqualified from simple parameterization by the parser doesn’t increment Auto-Param Attempts/sec.

Example 2 - Plan Reuse

Try to predict how many plans of which type (ad-hoc or prepared) will be cached if we remove the GO batch separators from example 1:

-- Single batch
SELECT U.DisplayName FROM dbo.Users AS U 
WHERE U.Reputation IN (2) ORDER BY U.Id;

SELECT U.DisplayName 
FROM dbo.Users AS U 
WHERE U.Reputation =3
ORDER BY U.Id;

select 
    U.DisplayName 
FROM [dbo].[Users] as U 
WhErE 
    U.Reputation = (4)
Order By
    U.[Id];

SELECT
    U.DisplayName 
    -- Note: No AS used for the alias
from [dbo] . Users U 
where
    U.Reputation = 5
order by 
    U.Id asc;

You may be surprised to find only two plans are cached, one Adhoc and one Prepared.

The prepared plan is used four times:

Two plans cached

Explanation

Back in part one, I said it can be useful to think of a prepared statement as an (unnamed) stored procedure. Imagine example 2 had contained four calls to a single stored procedure with different parameter values each time. That would cache the ad-hoc batch once and the stored procedure plan once. The single stored procedure plan would be executed four times with the different parameter values. Something conceptually similar happened with simple parameterization as I’ll now explain.

SQL Server compiles a single plan for all statements in a batch, but it compiles each statement in the batch sequentially. In example 2, the first statement encountered qualifies for simple parameterization so a separate prepared statement is built and cached.

The second, third, and fourth statements in the batch also qualify for simple parameterization. After normalization and decoding they match the prepared plan cached by the first statement. The single prepared plan is therefore executed a total of four times.

Shell plans revisited

The Adhoc plan cached in example 2 contains four shell plans. Each shell plan points to the same Prepared plan.

Caching the Adhoc shell ensures that future execution of exactly the same statement text (including constant values) will quickly find the parameterized plan.

Without the shells, the statement would need to be parsed, parameterized, normalized, and decoded back to SQL representation before it could be matched to the prepared plan.

This is less work than generating even a trivial plan, but it is still less efficient than using the shell to locate the prepared statement directly from the source text.

The goal of simple parameterization is to optimize performance for simple and frequently-executed ad-hoc SQL statements that parameterize to the same form. For an OLTP workload with a very rapid submission of such statements, every millisecond counts. Besides, all compilation activity consumes some server resources.

Only the Prepared plan is cached

Algebrization and constant folding

The next step of the compilation process is algebrization. This complex stage performs a number of tasks, including binding object names found in the parse tree to physical database entities, constant folding, matching aggregates to grouping elements, and deriving final data types from metadata using type conversion rules as necessary. The output from this activity is a bound tree of logical operators.

The most relevant part of algebrization to simple parameterization is the initial round of constant folding it performs.

As the linked documentation states, constant folding is the early evaluation of expressions to improve runtime performance. For example, the expression DATEFROMPARTS(2022, 07, 11) can be evaluated early to the date value 11 July 2022.

The algebrization stage is the first time constant folding is applied during compilation, so earlier stages will always see unfolded expressions.

Let’s look at this with an example.

Example 3 - Constant Folding

SELECT 
    U.DisplayName
FROM dbo.Users AS U 
WHERE 
    U.Reputation = 900 + 90 + 9;

This statement qualifies for simple parameterization. Each of the constants is identified as a separate parameter by the parser, and initially typed as integer. No integer type shrinking to smallint or tinyint is performed because the immediate parsing context is an arithmetic operator as described in part four.

The post-execution (actual) plan confirms parameterization occurred before constant folding had a chance to evaluate 900 + 90 + 9:

Simple parameterization applied before constant folding

Notice the three integer parameters and the normalized representation of the parameterized statement text in the top bar.

This example is specific to simple parameterization. Forced parameterization doesn’t parameterize constant-foldable expressions that are arguments of the +, -, *, /, and % operators.

Constant folding may be repeated several times later in the compilation and optimization process as new folding opportunities arise. These later constant folding runs may apply to the whole tree, newly generated alternative subtrees, or an individual expression.

End of Part 5

In the final part of this series, I’ll continue analysis of the compilation process with the simplification and trivial plan stages, including how and when SQL Server decides if simple parameterization is safe or unsafe.