Tuesday, 14 December 2010

Beware Sneaky Reads with Unique Indexes

Introduction

I saw a question asked recently on #sqlhelp:

Might SQL Server retrieve (out-of-row) LOB data from a table, even if the column isn’t referenced in the query?

Leaving aside trivial cases like selecting a computed column that does reference the LOB data, one might be tempted to say that no, SQL Server does not read data you haven’t asked for. In general, that is correct; however, there are cases where SQL Server might sneakily read a LOB column.

Example Table

Table schema and data

Here’s a T-SQL script to create that table and populate it with 1,000 rows:

CREATE TABLE dbo.LOBtest 
(
    pk integer IDENTITY NOT NULL, 
    some_value integer NULL, 
    lob_data varchar(max) NULL,
    another_column char(5) NULL,

    CONSTRAINT [PK dbo.LOBtest pk]
        PRIMARY KEY CLUSTERED (pk ASC)
);
GO
DECLARE
    @Data varchar(max);

SET @Data = REPLICATE(CONVERT(varchar(max), 'x'), 65540);

WITH 
    Numbers (n) AS
    (
        SELECT
            ROW_NUMBER() OVER (
                ORDER BY (SELECT 0))
        FROM master.sys.columns C1
        CROSS JOIN master.sys.columns C2
    )
INSERT LOBtest 
WITH (TABLOCKX)
(
    some_value,
    lob_data
)
SELECT TOP (1000)
    N.n, 
    @Data
FROM Numbers AS N
WHERE
    N.n <= 1000;

Test 1: A Simple Update

Let’s run a query to subtract one from every value in the some_value column:

UPDATE dbo.LOBtest
    WITH (TABLOCKX)
SET some_value = some_value - 1;

Modifying this integer column in 1,000 rows doesn’t take very long, or use many resources.

The STATISTICS IO and TIME output shows a total of 9 logical reads, and 25ms elapsed time. The execution plan is:

Update execution plan

The Clustered Index Scan properties show that SQL Server retrieves only the pk and some_value columns during the scan:

Clustered Index Scan properties

The pk column is needed by the Clustered Index Update operator to uniquely identify the row that is being changed.

The some_value column is used by the Compute Scalar to calculate the new value.

In case you are wondering what the Top operator is for, it is used to enforce SET ROWCOUNT in versions of SQL Server before 2012.

Test 2: Update with an Index

Now let’s create a nonclustered index keyed on the some_value column, with lob_data as an included column:

CREATE NONCLUSTERED INDEX 
    [IX dbo.LOBtest some_value (lob_data)]
ON dbo.LOBtest
    (some_value) 
INCLUDE
    (lob_data)
WITH
(
    FILLFACTOR = 100,
    MAXDOP = 1,
    SORT_IN_TEMPDB = ON
);

This is not a useful index for our update query, but imagine someone else created it for a different purpose.

Let’s run our update query again:

UPDATE dbo.LOBtest
    WITH (TABLOCKX)
SET some_value = some_value - 1;

Now this requires 4,014 logical reads and the elapsed query time has increased to around 100ms.

The extra logical reads (4 per row) are an expected consequence of maintaining the nonclustered index.

The execution plan is similar to before:

Update execution plan with nonclustered index

The Clustered Index Update operator picks up the extra work of maintaining the nonclustered index.

The new Compute Scalar operators detect whether the value in the some_value column has been changed by the update.

SQL Server may be able to skip maintaining the nonclustered index if the value hasn’t changed (see my previous post about non-updating updates for details).

Our update query does change the value of the some_data column in every row, so this optimization doesn’t add any value in this case.

The list of output columns from the Clustered Index Scan hasn’t changed from the one shown previously. SQL Server still just reads the pk and some_data columns.

Overall, adding the nonclustered index hasn’t had any startling effects, and the LOB column data still isn’t being read from the table.

Let’s see what happens if we make the nonclustered index unique.

Test 3: Update with a Unique Index

Here’s the script to create a new unique index, and drop the previous one:

CREATE UNIQUE NONCLUSTERED INDEX 
    [UQ dbo.LOBtest some_value (lob_data)]
ON dbo.LOBtest
    (some_value)
INCLUDE
    (lob_data)
WITH
(
    FILLFACTOR = 100,
    MAXDOP = 1,
    SORT_IN_TEMPDB = ON
);
GO
DROP INDEX [IX dbo.LOBtest some_value (lob_data)]
ON dbo.LOBtest;

Remember that SQL Server only enforces uniqueness on index keys (the some_data column here). The lob_data column is simply stored at the leaf-level of the non-clustered index.

With all that in mind, we might expect this change to make very little difference to the update query. Let’s see:

UPDATE dbo.LOBtest
    WITH (TABLOCKX)
SET some_value = some_value - 1;

Now look at the elapsed time and logical reads:

Scan count 1,
logical reads 2016, physical reads 0, read-ahead reads 0, 
lob logical reads 36015, lob physical reads 0, lob read-ahead reads 15992.

CPU time = 172 ms, elapsed time = 16172 ms.

Even with all the data and index pages already in memory, the update took over 16 seconds to update just 1,000 rows, performing over 52,000 lob logical reads (nearly 16,000 of those using read-ahead).

Why on earth is SQL Server reading LOB data for a statement that only updates a single integer column?

The Execution Plan

The execution plan for test 3 is a bit more complex than before:

Test 3 update execution plan

The bottom level of the plan is exactly the same as we saw with the non-unique index test. The top level has heaps of new stuff, which I will address shortly.

Now, you might be expecting to find that the Clustered Index Scan is reading the lob_data column (for some reason). After all, we need to explain where all the LOB logical reads are coming from.

Yet when we look at the properties of the Clustered Index Scan, we see exactly the same as before:

Clustered Index Scan properties

SQL Server is still only reading the pk and some_value columns . So what is doing all those reported LOB reads?

Updates that Sneakily Read Data

We have to look as the Clustered Index Update operator before we see the LOB column in the output list:

Clustered Index Update properties

[Expr1020] is a bit flag added by an earlier Compute Scalar. It is set true if the some_value column has not been changed (part of the non-updating updates optimization I mentioned earlier).

The Clustered Index Update operator adds two new columns, the lob_data column, and some_value_OLD.

The some_value_OLD column, as the name suggests, is the pre-update value of the some_value column.

At this point, the clustered index has been updated with the new value, but we haven’t touched the nonclustered index yet.

An interesting observation here is that the Clustered Index Update operator can read a column into the data flow as part of its operation.

SQL Server could have read the LOB data as part of the initial Clustered Index Scan, but that would mean carrying the LOB data through all the operations that occur prior to the Clustered Index Update.

The server knows it will have to go back to the clustered index row to update it, so it delays reading the LOB data until then. Sneaky!

Why the LOB Data Is Needed

This is all very interesting (I hope), but why is SQL Server reading the LOB data? For that matter, why does it need to pass the pre-update value of the some_value column out of the Clustered Index Update?

The answer relates to the top row of the execution plan for test 3, reproduced here for ease of reference:

Test 3 execution plan

This is a wide (per-index) update plan. SQL Server used a narrow (per-row) update plan in test 2, where the Clustered Index Update took care of maintaining the nonclustered index as well. More about this difference shortly.

Split, Sort, Collapse

The Split/Sort/Collapse combination is an optimization, which aims to make per-index update plans more efficient. It does this by:

Splitting each update into a delete/insert pair
Sorting the operations by key value and operation type
Collapsing now-adjacent delete/insert pairs to an update

This process also avoids transient unique key violations:

Imagine we had a unique index which currently holds three rows with the values 1, 2, and 3. If we run a query that adds 1 to each row value, we should end up with values 2, 3, and 4. If we tried to immediately update the value 1 to a 2, it would conflict with the existing value 2 (which would soon be updated to 3) and the query would fail with an error, when it should not.

You might argue that SQL Server could avoid the transient uniqueness violation by starting with the highest value (3) and working down. That’s fine in this case, but it’s not possible to generalize this logic to work with every possible update query, and all possible row processing orders.

Wide updates

SQL Server has to use a wide update plan if it sees any risk of false uniqueness violations. It’s worth noting that the logic SQL Server uses to detect whether these violations are possible has definite limits. As a result, you will often receive a wide update plan, even when you can see that no violations are possible.

Another benefit of this optimization is that it includes a sort on the index key as part of its work. Processing the index changes in index key order promotes sequential I/O on the index, and avoiding touching a page more than once.

Introduced inserts

A side-effect of the above is that the split changes might include one or more inserts.

In order to insert a new row in the index, SQL Server needs all the columns— the key column and the included LOB column. This is the reason SQL Server reads the LOB data as part of the Clustered Index Update—it needs it for an insert.

In addition, the some_value_OLD column is required by the Split operator (remember it splits updates into delete/insert pairs). In order to generate the correct index key delete operation as part of its work, it needs the old key value.

In this particular case, the Split/Sort/Collapse combination is detrimental to performance, because reading all that LOB data is expensive. There is no way to avoid this with disk-based tables in SQL Server.

Finally, for completeness, I should mention that the Filter operator is there to filter out the non-updating updates.

Beating the Set-Based Update with a Cursor

One situation where SQL Server can see that false unique-key violations aren’t possible is where it can guarantee that only one row is being updated.

Armed with this knowledge, we can write a cursor (or WHILE-loop equivalent) that updates one row at a time, and so avoids reading the LOB data:

SET NOCOUNT ON;
SET STATISTICS XML, IO, TIME OFF;

DECLARE
    @PK integer,
    @StartTime datetime;

SET @StartTime = GETUTCDATE();

DECLARE curUpdate CURSOR 
    LOCAL 
    FORWARD_ONLY 
    KEYSET 
    SCROLL_LOCKS
FOR     
    SELECT L.pk
    FROM dbo.LOBtest AS L
    ORDER BY L.pk ASC;

OPEN curUpdate;

WHILE (1 = 1)
BEGIN
    FETCH NEXT
    FROM curUpdate 
    INTO @PK;

    IF @@FETCH_STATUS = -1 BREAK;
    IF @@FETCH_STATUS = -2 CONTINUE;

    UPDATE dbo.LOBtest
    SET some_value = some_value - 1
    WHERE CURRENT OF curUpdate;
END;

CLOSE curUpdate;
DEALLOCATE curUpdate;

SELECT DATEDIFF(MILLISECOND, @StartTime, GETUTCDATE());

That completes the update in 220 milliseconds (remember test 3 took over 16 seconds).

I used the WHERE CURRENT OF syntax there and a KEYSET cursor, just for the fun of it. One could just as well use a WHERE clause that specified the primary key value instead.

Clustered Indexes

A clustered index is the ultimate index with included columns. All non-key columns are included columns in a clustered index.

Let’s re-create the test table and data with an updatable primary key (previously not updatable due to the identity property) and without any non-clustered indexes:

IF OBJECT_ID(N'dbo.LOBtest', N'U') IS NOT NULL
BEGIN
    DROP TABLE dbo.LOBtest;
END;
GO
CREATE TABLE dbo.LOBtest 
(
    pk integer NOT NULL, 
    some_value integer NULL, 
    lob_data varchar(max) NULL,
    another_column char(5) NULL,

    CONSTRAINT [PK dbo.LOBtest pk]
        PRIMARY KEY CLUSTERED (pk ASC)
);
GO
DECLARE @Data varchar(max);

SET @Data = REPLICATE(CONVERT(varchar(max), 'x'), 65540);

WITH
    Numbers (n) AS
    (
        SELECT
            ROW_NUMBER() OVER (
                ORDER BY (SELECT 0))
        FROM master.sys.columns AS C1
        CROSS JOIN master.sys.columns AS C2
    )
INSERT dbo.LOBtest 
WITH (TABLOCKX)
(
    pk,
    some_value,
    lob_data
)
SELECT TOP (1000)
    N.n,
    N.n, 
    @Data
FROM Numbers AS N
WHERE
    N.n <= 1000;

Here’s a query to modify the clustering keys:

UPDATE dbo.LOBtest
SET pk = pk + 1;

The execution plan is:

Updating the primary key

The Split/Sort/Collapse optimization is present, and we gain an Eager Table Spool for Halloween protection.

SQL Server now has no choice but to read the LOB data in the Clustered Index Scan:

Reading LOB data in the scan

The performance is not great, as you might expect (even though there is no non-clustered index to maintain):

Table 'LOBtest'. 

Scan count 1,
logical reads 2011, physical reads 0, read-ahead reads 0, 
lob logical reads 36015, lob physical reads 0, lob read-ahead reads 15992.

Table 'Worktable'. 

Scan count 1,
logical reads 2040, physical reads 0, read-ahead reads 0, 
lob logical reads 34000, lob physical reads 0, lob read-ahead reads 8000.

SQL Server Execution Times:
CPU time = 483 ms,  elapsed time = 17884 ms.

Notice how the LOB data is read twice; once at the Clustered Index Scan, and again from the work table in tempdb used by the Eager Table Spool.

If you try the same test with a non-unique clustered index (rather than a primary key), you’ll get a much more efficient plan that just passes the clustering key (including uniqueifier) around (no LOB data or other non-key columns):

Non-unique clustered index

A unique nonclustered index (on a heap) works well too:

Unique nonclustered index on a heap

Both updates complete in a few tens of milliseconds, with no LOB reads, and just a few thousand logical reads. In fact the heap is rather more efficient.

There are lots more fun combinations to try that I don’t have space for here.

Final Thoughts

The behaviour shown in this post is not limited to LOB data by any means. If the conditions are met, any unique index that has included columns can produce similar behaviour—something to bear in mind when adding large INCLUDE columns to achieve covering queries perhaps.

Thanks for reading.