LOB logical reads without LOB columns

Spanish Armada

Introduction

Is it possible to see LOB (large object) logical reads from STATISTICS IO output on a table with no LOB columns?

I was asked this question today by someone who had spent a good fraction of their afternoon trying to work out why this was occurring—even going so far as to re-run DBCC CHECKDB to see if corruption was the cause.

The table in question wasn’t particularly pretty. It had grown somewhat organically over time, with new columns being added every so often as the need arose.

Nevertheless, it remained a simple structure with no LOB columns—no text or image, no xml, no max types—nothing aside from ordinary integer, money, varchar, and datetime types.

To add to the air of mystery, not every query that ran against the table would report LOB logical reads—just sometimes—but when it did, the query often took much longer to execute.

Example Setup

I’m not allowed to reproduce the exact structure here, but the following script creates a table that will serve to demonstrate the same effects:

IF OBJECT_ID(N'dbo.Test', N'U') IS NOT NULL
BEGIN
    DROP TABLE dbo.Test;
END;
GO
CREATE TABLE dbo.Test
(
    row_id numeric IDENTITY NOT NULL,
    col01 nvarchar(450) NOT NULL,
    col02 nvarchar(450) NOT NULL,
    col03 nvarchar(450) NOT NULL,
    col04 nvarchar(450) NOT NULL,
    col05 nvarchar(450) NOT NULL,
    col06 nvarchar(450) NOT NULL,
    col07 nvarchar(450) NOT NULL,
    col08 nvarchar(450) NOT NULL,
    col09 nvarchar(450) NOT NULL,
    col10 nvarchar(450) NOT NULL,

    CONSTRAINT [PK dbo.Test row_id]
        PRIMARY KEY CLUSTERED (row_id)
);

The next script loads the ten variable-length character columns with one-character strings in the first row, two-character strings in the second row, and so on down to the 450th row:

WITH
    Numbers AS
    (
        -- Generates numbers 1 - 450 inclusive
        SELECT TOP (450)
            n = ROW_NUMBER() OVER (
                ORDER BY (SELECT 0))
        FROM master.sys.columns AS C1
        CROSS JOIN master.sys.columns AS C2
        CROSS JOIN master.sys.columns AS C3
        ORDER BY
            n ASC
        )
INSERT  dbo.Test
    WITH (TABLOCKX)
SELECT
    REPLICATE(N'A', N.n), REPLICATE(N'B', N.n),
    REPLICATE(N'C', N.n), REPLICATE(N'D', N.n),
    REPLICATE(N'E', N.n), REPLICATE(N'F', N.n),
    REPLICATE(N'G', N.n), REPLICATE(N'H', N.n),
    REPLICATE(N'I', N.n), REPLICATE(N'J', N.n)
FROM Numbers AS N
ORDER BY
    N.n ASC;

Once those two scripts have run, the table contains 450 rows and 10 columns of data like this:

Illustration of table data

Queries

Most of the time, when we query data from this table, we don’t see any LOB logical reads, for example:

-- Find the maximum length of the data in 
-- column 5 for a range of rows
SELECT
    result = MAX(DATALENGTH(T.col05))
FROM dbo.Test AS T
WHERE
    T.row_id BETWEEN 50 AND 100;

The execution plan and output is shown below:

Plan, output, and I/O statistics

But with a different query


-- Read all the data in column 1
SELECT
    result = MAX(DATALENGTH(T.col01))
FROM dbo.Test AS T;

LOB reads for column 1


suddenly we have 49 lob logical reads, as well as the ‘normal’ logical reads we would expect.

The Explanation

If we had tried to create this table in SQL Server 2000, we would have received a warning message to say that future INSERT or UPDATE operations on the table might fail if the resulting row exceeded the in-row storage limit of 8060 bytes.

If we needed to store more data than would fit in an 8060 byte row (including internal overhead) we had to use a LOB column — text, ntext, or image.

These special data types store the large data values in a separate structure, with just a pointer left in the original row.

Row Overflow

SQL Server 2005 introduced a feature called row overflow, which allows one or more variable-length columns in a row to move to off-row storage if the data in a particular row would otherwise exceed 8060 bytes.

You no longer receive a warning when creating (or altering) a table that might need more than 8060 bytes of in-row storage. If SQL Server finds that it can no longer fit a variable-length column in a particular row, it will silently move one or more of these columns off the row into a separate allocation unit.

Only variable-length columns can be moved in this way (for example the (n)varchar, varbinary, and sql_variant types). Fixed-length columns (like integer and datetime for example) never move into ‘row overflow’ storage.

The decision to move a column off-row is done on a row-by-row basis. Data in a particular column might be stored in-row for some table records, and off-row for others.

If SQL Server finds that it needs to move a column into row-overflow storage, it moves the largest variable-length column record for that row. In the case of an UPDATE statement that results in the 8060 byte limit being exceeded, it might not be the column that grew that is moved!

Sneaky LOBs

Anyway, that’s all very interesting but I don’t want to get too carried away with the intricacies of row-overflow storage internals.

The point is that it is now possible to define a table with non-LOB columns that will silently exceed the old row-size limit and result in ordinary variable-length columns being moved to off-row storage.

Adding new columns to a table, expanding an existing column definition, or simply storing more data in a column than you used to — all these things can result in one or more variable-length columns being moved off row.

Row-overflow storage is logically quite different from old-style LOB and new-style max data type storage. Individual variable-length columns are still limited to 8000 bytes each, you can just have more of them now.

Having said that, the physical mechanisms involved are very similar to full LOB storage. A column moved to row-overflow leaves a 24-byte pointer record in the row, and the ‘separate storage’ I have been talking about is structured very similarly to both old-style LOBs and new-style max types.

The disadvantages are also the same. When SQL Server needs a row-overflow column value it needs to follow the in-row pointer and navigate another chain of pages, just like retrieving a traditional LOB.

Final Thoughts

In the example script presented above, the rows with row_idvalues from 402 to 450 inclusive all exceed the total in-row storage limit of 8060 bytes.

A SELECT that references a column in one of those rows that has moved to off-row storage will incur one or more lob logical reads as the storage engine locates the data.

The results on your system might vary slightly depending on your settings, but in my tests only column 1 in rows 402-450 moved off-row.

You might like to play around with the script—updating columns, changing data type lengths, and so on—to see the effect on lob logical reads and which columns get moved when.

You might even see row-overflow columns moving back in-row if they are updated to be smaller (hint: reduce the size of a column entry by at least 1000 bytes if you hope to see this).

Be aware that SQL Server will not warn you when it moves ‘ordinary’ variable-length columns into overflow storage, and it can have dramatic effects on performance.

It makes more sense than ever to choose column data types sensibly. If you make every column a varchar(8000) or nvarchar(4000), and someone stores data that results in a row needing more than 8060 bytes, SQL Server might turn some of your column data into pseudo-LOBs – all without saying a word.

Finally, some people make a distinction between ordinary LOBs (those that can hold up to 2GB of data) and the LOB-like structures created by row-overflow (where columns are still limited to 8000 bytes) by referring to row-overflow LOBs as SLOBs.

I find that quite appealing, but the ‘S’ stands for ‘small’, which makes expanding the whole acronym a little odd: ‘small large objects’.

Thanks for reading.