Wednesday, 12 March 2025

What Was the TOP PERCENT Bug and How Was It Fixed?

Bug Demo

Let’s create a table with six rows, two in group A and four in group B:

CREATE TABLE dbo.T
(
    TID integer NOT NULL PRIMARY KEY, 
    GroupID char(1) NOT NULL
);

INSERT dbo.T 
    (TID, GroupID)
VALUES 
    (1, 'A'), 
    (2, 'A'), 
    (3, 'B'), 
    (4, 'B'), 
    (5, 'B'), 
    (6, 'B');

The task is to return half the rows in each group. Any rows will do, so a deterministic query is not required.

A suitable T-SQL expression of the requirement is shown below:

SELECT
    Groups.GroupID, 
    CA.TID, 
    CA.GroupID
FROM 
(
    -- Group list
    VALUES 
        ('A'), 
        ('B')
) AS Groups (GroupID)
CROSS APPLY
(
    -- Half the rows per group
    SELECT TOP (50) PERCENT 
        T.TID, 
        T.GroupID
     FROM dbo.T AS T
     WHERE
        T.GroupID = Groups.GroupID
) AS CA;

-- Tidy up
DROP TABLE dbo.T;

The correct result is to return one of the two rows from group A and two of the four rows from group B.

Results

If you run the demo on SQL Server 2019 or later, you will get correct results.

On SQL Server 2017 and earlier, the output will be wrong.

In both cases, the execution plan is the same in all respects:

Execution plan for both correct and incorrect results Execution plan for both correct and incorrect results

An example of incorrect results obtained on SQL Server 2017 is below:

Incorrect query results on SQL Server 2017 Incorrect query results on SQL Server 2017

Notice that two rows (not one) have been returned from group A and three rows (not two) from group B.

An example of correct results (using SQL Server 2022) is:

Correct query results on SQL Server 2022 Correct query results on SQL Server 2022

The problem

The fault lies with the highlighted Eager Index Spool operator.

The spool performs its normal nonclustered index building function of course, but it is also required to count the total number of rows in the source set so the Top operator can calculate how many rows 50 PERCENT represents.

The issue is that the Eager Index Spool counts six rows (correctly) and the Top operator deduces (also correctly) that three rows is 50% of that count. Which would be fine, but the set is split into two groups (A & B) and the 50% should be computed separately for each.

For correct results, the Top and Eager Index Spool combination should recalculate what 50 PERCENT means whenever the APPLY correlated parameter (Groups.GroupID) changes (a ‘rebind’).

The current plan does not do that, so it tries to produce 50% of six rows (three rows) for each group. That explains why we see two rows from group A (there are only two present) and three rows from group B.

The spool’s single count of six rows was applied uniformly, ignoring the group-specific logic required by the CROSS APPLY.

Side note: The query could easily be made deterministic by the addition of an ORDER BY clause. The demo doesn’t do this to avoid the optimizer choosing a Sort instead of an Eager Index Spool. This cost based decision depends on the small number of rows in the table.

A larger demo table with a deterministic query would also produce an Eager Index Spool and reproduce the issue.

Potential solutions

At first glance, it’s not easy to see how this issue could be fixed.

The Eager Index Spool only ever reads its input once. Unlike an Eager Table Spool, its input can never depend on correlated parameter values. Even if we were to relax that restriction, there’s no guarantee the spool’s subtree could efficiently deliver rows just for the current group—repeated subtree executions with a filter would rather defeat the main purpose of the spool, which is to cache results.

Another idea would be for the spool to remember the count of rows per group, rather than just the overall total. It would return the per group count for the current value(s) of any correlated parameters. This information would have to be stored somewhere, since we generally don’t know ahead of time how many groups there will be at runtime. Storing it in memory isn’t really a sensible option without also making the spool an operator that can request a memory grant, which would be quite a bit of engineering effort.

The other option is to store the row count in the spool’s worktable, but this breaks the general layout of spool worktables and reduces the maximum row size available for user data. It’s also conceptually a little clunky to store a per-group subtotal in a worktable designed to hold detail rows. This type of solution would require extensive testing.

While these options posed challenges, Microsoft ultimately took a simpler approach.

Chosen solution

Microsoft chose to physically count the rows in the worktable that match the current value of any correlated parameters whenever the Top operator requests row count statistics.

While perhaps not the last word in overall efficiency, this does massively simplify the implementation and testing effort. Instead of returning the single stored total row count, the spool simply counts the matching rows. It knows it can do this relatively cheaply because the spool is indexed on the correlated parameter keys.

For those who enjoy a call stack, here is SQL Server 2022 seeking to the first matching row in the worktable in response to a request for statistics from the Top operator:

Call stack showing a seek to find the current group's row count Call stack showing a seek to find the current group’s row count

The spool then calls sqlmin!RowsetNewSS::GetNextRows to count each matching row individually:

Call stack showing the next row being fetched from the worktable Call stack showing the next row being fetched from the worktable

So there you have it. Microsoft chose to fix this bug by having the spool count rows in its worktable whenever there are correlated parameters and it is asked for statistics.

I couldn’t find any reference to this fix in the Cumulative Update documentation, so perhaps it was quietly addressed for SQL Server 2019 RTM.

Note: This bug remains unfixed prior to SQL Server 2019.

Thanks for reading.