Be Careful with LOBs and OPTION (RECOMPILE)
It sometimes makes sense to add OPTION (RECOMPILE)
to a query. Typically this will be when:
- A good enough plan for the query is very sensitive to one or more parameters
- No good single value exists for the parameter to use in a hint
- Optimize for unknown doesnā€™t give a good result
- The plan might be expected to change over time
- The cost of recompiling the statement is much less than the expected execution time
- Recompiling every time is very likely to save more time and resources than it costs overall
All that is fairly well-known. The point of this short post is to draw your attention to another side-effect of adding OPTION (RECOMPILE)
ā€” the parameter embedding optimization (PEO).
When PEO is used, SQL Server takes the value of any variables and parameters and embeds the runtime values in the query text, pretty much as if you had entered them by hand before compiling. This is often very useful for plan quality, but there is a potential drawback when large object types (LOBs) are in play.
Example
Consider the following toy query, which creates a 512MB LOB string then returns the first character:
DECLARE
@x varchar(max),
@y varchar(max),
@lob varchar(max);
SET @x = 'x';
SET @y = 'y';
SET @lob = @y + REPLICATE(@x, 512 * 1024 * 1024);
SELECT LEFT(@lob, 1);
That executes in around 3 seconds, with the vast majority of the time spent constructing the large string.
And it is a large string, but not outlandishly so. It is not unheard of for people to read complete files or XML/JSON into a variable using OPENROWSET...BULK
for example.
I will note in passing that LOB variables and parameters are not limited to 2GB. That is the storage limit for LOBs. You are free to construct a LOB of any size if you really want to hurt your server that way.
With a recompile hint
Add OPTION (RECOMPILE)
to the previous query:
DECLARE
@x varchar(max),
@y varchar(max),
@lob varchar(max);
SET @x = 'x';
SET @y = 'y';
SET @lob = @y + REPLICATE(@x, 512 * 1024 * 1024);
SELECT
LEFT(@lob, 1)
OPTION (RECOMPILE);
The SELECT
now takes around 30 seconds.
This is a factor of ten slower. Why?
Explanation
Well, there are a number of reasons. For one, embedding the value of @lob
at runtime requires making at least one copy of the entire string, which is resource-intensive. There are other internal details like the time taken to compute a hash of the constant value when storing it in a memo group.
The broader point is that you probably wouldnā€™t embed a 512MB string literal in a query, submit it in a query, and expect good performance. Adding OPTION (RECOMPILE)
is the equivalent of doing that.
Another relevant factor is the optimizer makes no general guarantees about the number of executions or exact timing of scalar expression evaluations. This means literals or expressions can be duplicated in the query tree, meaning an embedded LOB value can appear more than once and each instance will require a fresh copy of the large value. It is not always possible to anticipate when the optimizer will decide to do this.
The final thing I want to mention is constant folding. This means evaluating an expression at compile time, for example DATALENGTH(@lob)
in a variation of the example above. Embedding the 512MB string is followed by evaluating DATALENGTH
during compilation, which is itself an expensive operation.
Second example
If you have a few minutes to kill, try running this variation with and without the recompile hint:
DECLARE
@x varchar(max),
@y varchar(max),
@lob varchar(max);
SET @x = 'x';
SET @y = 'y';
SET @lob = @y + REPLICATE(@x, 512 * 1024 * 1024);
SELECT
LEFT(@lob, 1),
LEFT(@lob, 1),
LEFT(@lob, 1),
LEFT(@lob, 1),
LEFT(@lob, 1),
LEFT(@lob, 1),
LEFT(@lob, 1),
LEFT(@lob, 1)
OPTION (RECOMPILE);
Conclusion
I would argue that embedding large objects via PEO will rarely, if ever, pay dividends. SQL Server ought not do it.
In the meantime, ensure you only use OPTION (RECOMPILE)
when it truly makes sense, and you have given it full consideration.
Thanks for reading.