Beware Sneaky Reads with Unique Indexes
Introduction
I saw a question asked recently on #sqlhelp:
Might SQL Server retrieve (out-of-row) LOB data from a table, even if the column isn’t referenced in the query?
Leaving aside trivial cases like selecting a computed column that does reference the LOB data, one might be tempted to say that no, SQL Server does not read data you haven’t asked for. In general, that is correct; however, there are cases where SQL Server might sneakily read a LOB column.
Example Table
Here’s a T-SQL script to create that table and populate it with 1,000 rows:
CREATE TABLE dbo.LOBtest
(
pk integer IDENTITY NOT NULL,
some_value integer NULL,
lob_data varchar(max) NULL,
another_column char(5) NULL,
CONSTRAINT [PK dbo.LOBtest pk]
PRIMARY KEY CLUSTERED (pk ASC)
);
GO
DECLARE
@Data varchar(max);
SET @Data = REPLICATE(CONVERT(varchar(max), 'x'), 65540);
WITH
Numbers (n) AS
(
SELECT
ROW_NUMBER() OVER (
ORDER BY (SELECT 0))
FROM master.sys.columns C1
CROSS JOIN master.sys.columns C2
)
INSERT LOBtest
WITH (TABLOCKX)
(
some_value,
lob_data
)
SELECT TOP (1000)
N.n,
@Data
FROM Numbers AS N
WHERE
N.n <= 1000;
Test 1: A Simple Update
Let’s run a query to subtract one from every value in the some_value
column:
UPDATE dbo.LOBtest
WITH (TABLOCKX)
SET some_value = some_value - 1;
Modifying this integer column in 1,000 rows doesn’t take very long, or use many resources.
The STATISTICS IO
and TIME
output shows a total of 9 logical reads, and 25ms elapsed time. The execution plan is:
The Clustered Index Scan properties show that SQL Server retrieves only the pk
and some_value
columns during the scan:
The pk
column is needed by the Clustered Index Update operator to uniquely identify the row that is being changed.
The some_value
column is used by the Compute Scalar to calculate the new value.
In case you are wondering what the Top operator is for, it is used to enforce SET ROWCOUNT
in versions of SQL Server before 2012.
Test 2: Update with an Index
Now let’s create a nonclustered index keyed on the some_value
column, with lob_data
as an included column:
CREATE NONCLUSTERED INDEX
[IX dbo.LOBtest some_value (lob_data)]
ON dbo.LOBtest
(some_value)
INCLUDE
(lob_data)
WITH
(
FILLFACTOR = 100,
MAXDOP = 1,
SORT_IN_TEMPDB = ON
);
This is not a useful index for our update query, but imagine someone else created it for a different purpose.
Let’s run our update query again:
UPDATE dbo.LOBtest
WITH (TABLOCKX)
SET some_value = some_value - 1;
Now this requires 4,014 logical reads and the elapsed query time has increased to around 100ms.
The extra logical reads (4 per row) are an expected consequence of maintaining the nonclustered index.
The execution plan is similar to before:
The Clustered Index Update operator picks up the extra work of maintaining the nonclustered index.
The new Compute Scalar operators detect whether the value in the some_value
column has been changed by the update.
SQL Server may be able to skip maintaining the nonclustered index if the value hasn’t changed (see my previous post about non-updating updates for details).
Our update query does change the value of the some_data
column in every row, so this optimization doesn’t add any value in this case.
The list of output columns from the Clustered Index Scan hasn’t changed from the one shown previously. SQL Server still just reads the pk
and some_data
columns.
Overall, adding the nonclustered index hasn’t had any startling effects, and the LOB column data still isn’t being read from the table.
Let’s see what happens if we make the nonclustered index unique.
Test 3: Update with a Unique Index
Here’s the script to create a new unique index, and drop the previous one:
CREATE UNIQUE NONCLUSTERED INDEX
[UQ dbo.LOBtest some_value (lob_data)]
ON dbo.LOBtest
(some_value)
INCLUDE
(lob_data)
WITH
(
FILLFACTOR = 100,
MAXDOP = 1,
SORT_IN_TEMPDB = ON
);
GO
DROP INDEX [IX dbo.LOBtest some_value (lob_data)]
ON dbo.LOBtest;
Remember that SQL Server only enforces uniqueness on index keys (the some_data
column here). The lob_data
column is simply stored at the leaf-level of the non-clustered index.
With all that in mind, we might expect this change to make very little difference to the update query. Let’s see:
UPDATE dbo.LOBtest
WITH (TABLOCKX)
SET some_value = some_value - 1;
Now look at the elapsed time and logical reads:
Scan count 1,
logical reads 2016, physical reads 0, read-ahead reads 0,
lob logical reads 36015, lob physical reads 0, lob read-ahead reads 15992.
CPU time = 172 ms, elapsed time = 16172 ms.
Even with all the data and index pages already in memory, the update took over 16 seconds to update just 1,000 rows, performing over 52,000 lob logical reads (nearly 16,000 of those using read-ahead).
Why on earth is SQL Server reading LOB data for a statement that only updates a single integer column?
The Execution Plan
The execution plan for test 3 is a bit more complex than before:
The bottom level of the plan is exactly the same as we saw with the non-unique index test. The top level has heaps of new stuff, which I will address shortly.
Now, you might be expecting to find that the Clustered Index Scan is reading the lob_data
column (for some reason). After all, we need to explain where all the LOB logical reads are coming from.
Yet when we look at the properties of the Clustered Index Scan, we see exactly the same as before:
SQL Server is still only reading the pk
and some_value
columns . So what is doing all those reported LOB reads?
Updates that Sneakily Read Data
We have to look as the Clustered Index Update operator before we see the LOB column in the output list:
[Expr1020]
is a bit flag added by an earlier Compute Scalar. It is set true
if the some_value
column has not been changed (part of the non-updating updates optimization I mentioned earlier).
The Clustered Index Update operator adds two new columns, the lob_data
column, and some_value_OLD
.
The some_value_OLD
column, as the name suggests, is the pre-update value of the some_value
column.
At this point, the clustered index has been updated with the new value, but we haven’t touched the nonclustered index yet.
An interesting observation here is that the Clustered Index Update operator can read a column into the data flow as part of its operation.
SQL Server could have read the LOB data as part of the initial Clustered Index Scan, but that would mean carrying the LOB data through all the operations that occur prior to the Clustered Index Update.
The server knows it will have to go back to the clustered index row to update it, so it delays reading the LOB data until then. Sneaky!
Why the LOB Data Is Needed
This is all very interesting (I hope), but why is SQL Server reading the LOB data? For that matter, why does it need to pass the pre-update value of the some_value
column out of the Clustered Index Update?
The answer relates to the top row of the execution plan for test 3, reproduced here for ease of reference:
This is a wide (per-index) update plan. SQL Server used a narrow (per-row) update plan in test 2, where the Clustered Index Update took care of maintaining the nonclustered index as well. More about this difference shortly.
Split, Sort, Collapse
The Split/Sort/Collapse combination is an optimization, which aims to make per-index update plans more efficient. It does this by:
- Splitting each update into a delete/insert pair
- Sorting the operations by key value and operation type
- Collapsing now-adjacent delete/insert pairs to an update
This process also avoids transient unique key violations:
Imagine we had a unique index which currently holds three rows with the values 1, 2, and 3. If we run a query that adds 1 to each row value, we should end up with values 2, 3, and 4. If we tried to immediately update the value 1 to a 2, it would conflict with the existing value 2 (which would soon be updated to 3) and the query would fail with an error, when it should not.
You might argue that SQL Server could avoid the transient uniqueness violation by starting with the highest value (3) and working down. That’s fine in this case, but it’s not possible to generalize this logic to work with every possible update query, and all possible row processing orders.
Wide updates
SQL Server has to use a wide update plan if it sees any risk of false uniqueness violations. It’s worth noting that the logic SQL Server uses to detect whether these violations are possible has definite limits. As a result, you will often receive a wide update plan, even when you can see that no violations are possible.
Another benefit of this optimization is that it includes a sort on the index key as part of its work. Processing the index changes in index key order promotes sequential I/O on the index, and avoiding touching a page more than once.
Introduced inserts
A side-effect of the above is that the split changes might include one or more inserts.
In order to insert a new row in the index, SQL Server needs all the columns— the key column and the included LOB column. This is the reason SQL Server reads the LOB data as part of the Clustered Index Update—it needs it for an insert.
In addition, the some_value_OLD
column is required by the Split operator (remember it splits updates into delete/insert pairs). In order to generate the correct index key delete operation as part of its work, it needs the old key value.
In this particular case, the Split/Sort/Collapse combination is detrimental to performance, because reading all that LOB data is expensive. There is no way to avoid this with disk-based tables in SQL Server.
Finally, for completeness, I should mention that the Filter operator is there to filter out the non-updating updates.
Beating the Set-Based Update with a Cursor
One situation where SQL Server can see that false unique-key violations aren’t possible is where it can guarantee that only one row is being updated.
Armed with this knowledge, we can write a cursor (or WHILE
-loop equivalent) that updates one row at a time, and so avoids reading the LOB data:
SET NOCOUNT ON;
SET STATISTICS XML, IO, TIME OFF;
DECLARE
@PK integer,
@StartTime datetime;
SET @StartTime = GETUTCDATE();
DECLARE curUpdate CURSOR
LOCAL
FORWARD_ONLY
KEYSET
SCROLL_LOCKS
FOR
SELECT L.pk
FROM dbo.LOBtest AS L
ORDER BY L.pk ASC;
OPEN curUpdate;
WHILE (1 = 1)
BEGIN
FETCH NEXT
FROM curUpdate
INTO @PK;
IF @@FETCH_STATUS = -1 BREAK;
IF @@FETCH_STATUS = -2 CONTINUE;
UPDATE dbo.LOBtest
SET some_value = some_value - 1
WHERE CURRENT OF curUpdate;
END;
CLOSE curUpdate;
DEALLOCATE curUpdate;
SELECT DATEDIFF(MILLISECOND, @StartTime, GETUTCDATE());
That completes the update in 220 milliseconds (remember test 3 took over 16 seconds).
I used the WHERE CURRENT OF
syntax there and a KEYSET
cursor, just for the fun of it. One could just as well use a WHERE
clause that specified the primary key value instead.
Clustered Indexes
A clustered index is the ultimate index with included columns. All non-key columns are included columns in a clustered index.
Let’s re-create the test table and data with an updatable primary key (previously not updatable due to the identity property) and without any non-clustered indexes:
IF OBJECT_ID(N'dbo.LOBtest', N'U') IS NOT NULL
BEGIN
DROP TABLE dbo.LOBtest;
END;
GO
CREATE TABLE dbo.LOBtest
(
pk integer NOT NULL,
some_value integer NULL,
lob_data varchar(max) NULL,
another_column char(5) NULL,
CONSTRAINT [PK dbo.LOBtest pk]
PRIMARY KEY CLUSTERED (pk ASC)
);
GO
DECLARE @Data varchar(max);
SET @Data = REPLICATE(CONVERT(varchar(max), 'x'), 65540);
WITH
Numbers (n) AS
(
SELECT
ROW_NUMBER() OVER (
ORDER BY (SELECT 0))
FROM master.sys.columns AS C1
CROSS JOIN master.sys.columns AS C2
)
INSERT dbo.LOBtest
WITH (TABLOCKX)
(
pk,
some_value,
lob_data
)
SELECT TOP (1000)
N.n,
N.n,
@Data
FROM Numbers AS N
WHERE
N.n <= 1000;
Here’s a query to modify the clustering keys:
UPDATE dbo.LOBtest
SET pk = pk + 1;
The execution plan is:
The Split/Sort/Collapse optimization is present, and we gain an Eager Table Spool for Halloween protection.
SQL Server now has no choice but to read the LOB data in the Clustered Index Scan:
The performance is not great, as you might expect (even though there is no non-clustered index to maintain):
Table 'LOBtest'.
Scan count 1,
logical reads 2011, physical reads 0, read-ahead reads 0,
lob logical reads 36015, lob physical reads 0, lob read-ahead reads 15992.
Table 'Worktable'.
Scan count 1,
logical reads 2040, physical reads 0, read-ahead reads 0,
lob logical reads 34000, lob physical reads 0, lob read-ahead reads 8000.
SQL Server Execution Times:
CPU time = 483 ms, elapsed time = 17884 ms.
Notice how the LOB data is read twice; once at the Clustered Index Scan, and again from the work table in tempdb used by the Eager Table Spool.
If you try the same test with a non-unique clustered index (rather than a primary key), you’ll get a much more efficient plan that just passes the clustering key (including uniqueifier) around (no LOB data or other non-key columns):
A unique nonclustered index (on a heap) works well too:
Both updates complete in a few tens of milliseconds, with no LOB reads, and just a few thousand logical reads. In fact the heap is rather more efficient.
There are lots more fun combinations to try that I don’t have space for here.
Final Thoughts
The behaviour shown in this post is not limited to LOB data by any means. If the conditions are met, any unique index that has included columns can produce similar behaviour—something to bear in mind when adding large INCLUDE
columns to achieve covering queries perhaps.
Thanks for reading.