Monday, 10 December 2012

MERGE Bug with Filtered Indexes

A MERGE statement can fail, and incorrectly report a unique key violation when:

The target table uses a unique filtered index; and
No key column of the filtered index is updated; and
A column from the filtering condition is updated; and
Transient key violations are possible

Reproduction script

Say we have two tables, one that is the target of a MERGE statement, and another that contains updates to be applied to the target.

The target table contains three columns, an integer primary key, a single character alternate key, and a status code column.

A filtered unique index exists on the alternate key, which is only enforced when the status code is ‘a’:

CREATE TABLE #Target 
(
    pk integer NOT NULL, 
    ak character(1) NOT NULL,
    status_code character(1) NOT NULL,

    PRIMARY KEY (pk)
);

CREATE UNIQUE INDEX uq1
ON #Target
    (ak)
INCLUDE 
    (status_code)
WHERE 
    status_code = 'a';

The “changes” table contains an integer primary key (to identify the target row to change) and the new status code:

CREATE TABLE #Changes
(
    pk integer NOT NULL,
    status_code character(1) NOT NULL,

    PRIMARY KEY (pk)
);

Sample data for the example is:

INSERT #Target 
    (pk, ak, status_code)
VALUES 
    (1, 'A', 'a'),
    (2, 'B', 'a'),
    (3, 'C', 'a'),
    (4, 'A', 'd');

INSERT #Changes
    (pk, status_code)
VALUES
    (1, 'd'),
    (4, 'a');

Visually:

        #Target                   #Changes
╔════╦════╦═════════════╗   ╔════╦═════════════╗  
║ pk ║ ak ║ status_code ║   ║ pk ║ status_code ║  
╠════╬════╬═════════════╣   ╠════╬═════════════╣  
║  1 ║  A ║      a      ║   ║  1 ║      d      ║  
║  2 ║  B ║      a      ║   ║  4 ║      a      ║  
║  3 ║  C ║      a      ║   ╚════╩═════════════╝  
║  4 ║  A ║      d      ║  
╚════╩════╩═════════════╝

Remember, the alternate key (ak) column in the target table is unique, for rows where status_code = ‘a’.

Applying the changes to the target will modify:

row 1 from status ‘a’ to status ‘d’
row 4 from status ‘d’ to status ‘a’.

The result of applying all the changes will still satisfy the filtered unique index, because:

The ‘A’ in row 1 will be deleted from the filtered index (status_code changes to ‘d’)
The ‘A’ in row 4 will be added to the filtered index (status_code changes to ‘a’).

Note that there is deliberately a risk of a transient key violation here.

`MERGE` Test One

We now execute a MERGE statement to apply these changes:

MERGE #Target AS T
USING #Changes AS C
    ON C.pk = T.pk
WHEN MATCHED AND C.status_code <> T.status_code 
    THEN UPDATE SET status_code = C.status_code;

The MERGE changes the two target rows as expected.

The updated target table now contains:

╔════╦════╦═════════════╗
║ pk ║ ak ║ status_code ║
╠════╬════╬═════════════╣
║  1 ║  A ║      d      ║ <— changed from ‘a’  
║  2 ║  B ║      a      ║
║  3 ║  C ║      a      ║
║  4 ║  A ║      a      ║ <— changed from ‘d’  
╚════╩════╩═════════════╝

`MERGE` Test Two

Now let’s repopulate the #Changes table to reverse the updates we just performed:

TRUNCATE TABLE #Changes;

INSERT #Changes
    (pk, status_code)
VALUES
    (1, 'a'),
    (4, 'd');

This will change row 1 back to status ‘a’ and row 4 back to status ‘d’.

Before these changes, the tables contain:

        #Target                   #Changes
╔════╦════╦═════════════╗   ╔════╦═════════════╗  
║ pk ║ ak ║ status_code ║   ║ pk ║ status_code ║  
╠════╬════╬═════════════╣   ╠════╬═════════════╣  
║  1 ║  A ║      d      ║   ║  1 ║      a      ║  
║  2 ║  B ║      a      ║   ║  4 ║      d      ║  
║  3 ║  C ║      a      ║   ╚════╩═════════════╝  
║  4 ║  A ║      a      ║  
╚════╩════╩═════════════╝

We now execute the same MERGE statement as before:

MERGE #Target AS T
USING #Changes AS C
    ON C.pk = T.pk
WHEN MATCHED AND C.status_code <> T.status_code 
    THEN UPDATE SET status_code = C.status_code;

This time we receive the error message:

Msg 2601, Level 14, State 1
Cannot insert duplicate key row in object 'dbo.#Target' with unique index 'uq1'. The duplicate key value is (A).
The statement has been terminated.

Applying the changes using `UPDATE`

We will now rewrite the MERGE as an UPDATE:

UPDATE T
SET status_code = C.status_code
FROM #Target AS T
JOIN #Changes AS C ON
    T.pk = C.pk
WHERE
    C.status_code <> T.status_code;

This statement succeeds where the MERGE failed. The two rows are updated as expected:

╔════╦════╦═════════════╗  
║ pk ║ ak ║ status_code ║  
╠════╬════╬═════════════╣  
║  1 ║  A ║      a      ║ <— changed back to ‘a’  
║  2 ║  B ║      a      ║  
║  3 ║  C ║      a      ║  
║  4 ║  A ║      d      ║ <— changed back to ‘d’  
╚════╩════╩═════════════╝

What went wrong with `MERGE`?

Execution of the MERGE statement happens to apply the changes in the order of the pk column.

In test one, this was not a problem Row 1 was removed from the unique filtered index by changing status_code from ‘a’ to ‘d’ before row 4 was added to it. At no point does the table contain two rows where ak = ‘A’ and status_code = ‘a’.

In test two, the first operation changed row 1 from status ‘d’ to status ‘a’. This meant there were transiently two rows in the filtered unique index where ak = ‘A’ (both row 1 and row 4 meet the index filtering criteria ‘status_code = a’).

The storage engine does not allow even transient unique key violations (unless IGNORE_DUP_KEY is on, but that is a different story, and is ignored by MERGE in any case).

This strict rule applies regardless of the fact that after all changes were applied, there would be no unique key violation (row 4 would eventually be changed from ‘a’ to ‘d’, removing it from the filtered unique index, and resolving the key violation).

Why it went wrong

The query optimizer usually detects when a transient uniqueness violation could occur, and builds a plan that avoids the issue.

I wrote about this previously in Beware Sneaky Reads with Unique Indexes (you can read more about the details on pages 495-497 of the Microsoft SQL Server 2008 Internals book, and in Craig Freedman’s post Maintaining Unique Indexes).

To summarize: The optimizer introduces Split, Filter, Sort, and Collapse operators into the query plan to:

Split each update into a delete followed by an insert
Filter out rows that would not change the index (due to the filter on the index, or a non-updating update)
Sort the resulting stream by index key, with deletes before inserts
Collapse surviving delete/insert pairs on the same index key back into an update

The effect of all this is that only net changes are applied to an index (as one or more insert, update, and/or delete operations).

In this case, the net effect is a single update of the filtered unique index: Changing the row for ak = ‘A’ from pk = 4 to pk = 1. In case that is less than 100% clear, let’s look at the operation in test two again:

#Target table             #Changes table       Result
╔════╦════╦═════════════╗ ╔════╦═════════════╗ ╔════╦════╦═════════════╗  
║ pk ║ ak ║ status_code ║ ║ pk ║ status_code ║ ║ pk ║ ak ║ status_code ║  
╠════╬════╬═════════════╣ ╠════╬═════════════╣ ╠════╬════╬═════════════╣  
║  1 ║  A ║      d      ║ ║  1 ║      d      ║ ║  1 ║  A ║      a      ║  
║  2 ║  B ║      a      ║ ║  4 ║      a      ║ ║  2 ║  B ║      a      ║  
║  3 ║  C ║      a      ║ ╚════╩═════════════╝ ║  3 ║  C ║      a      ║  
║  4 ║  A ║      a      ║                      ║  4 ║  A ║      d      ║  
╚════╩════╩═════════════╝                      ╚════╩════╩═════════════╝

From the filtered index’s point of view (filtered for status_code = ‘a’ and shown in nonclustered index key order) the overall effect of the query is:

Before        After
╔════╦════╗   ╔════╦════╗  
║ pk ║ ak ║   ║ pk ║ ak ║  
╠════╬════╣   ╠════╬════╣  
║  4 ║  A ║   ║  1 ║  A ║  <- the only change
║  2 ║  B ║   ║  2 ║  B ║  
║  3 ║  C ║   ║  3 ║  C ║  
╚════╩════╝   ╚════╩════╝

The single net change there is a change of pk from 4 to 1 for the nonclustered index entry ak = ‘A’.

This is the magic performed by the Split, Sort, and Collapse. Notice in particular how the original changes to the index key (on the ak column) have been transformed into an update of a non-key column (pk is included in the nonclustered index).

By not updating any nonclustered index keys, we are guaranteed to avoid transient key violations.

The Execution Plans

The MERGE execution plan that produces the incorrect key-violation error looks like this:

Bad MERGE plan

The successful UPDATE execution plan is:

Good update plan

The MERGE execution plan is a narrow (per-row) update. The Clustered Index Merge operator maintains both the clustered index and the filtered nonclustered index.

The UPDATE plan is a wide (per-index) update. The clustered index is maintained first, then the Split, Filter, Sort, and Collapse sequence is applied, before the nonclustered index is updated as necessary.

There is always a wide update plan for any (non-Hekaton) statement that modifies the database. The narrow form is a performance optimization where the number of rows is expected to be relatively small, and is not available for all operations.

One of the operations that should disallow a narrow plan is maintaining a unique index where transient key violations could occur.

Workarounds

The MERGE can be made to work (producing a wide update plan with Split, Sort, and Collapse) by:

Adding all columns referenced in the filtered index’s WHERE clause to the index key (INCLUDE is not sufficient); or
Executing the query with trace flag 8790 set e.g. OPTION (QUERYTRACEON 8790).

Undocumented trace flag 8790 forces a wide update plan for any data-changing statement (remember that a wide update plan is always possible). The flag is ignored for Hekaton updates.

Either change will produce a successfully-executing wide update plan for the MERGE that failed.

Conclusion

The optimizer fails to spot the possibility of transient unique key violations with MERGE under the conditions listed at the start of this post.

It incorrectly chooses a narrow update plan for the MERGE, which cannot provide the protection of a Split/Sort/Collapse sequence for the nonclustered index maintenance.

The MERGE plan may fail at execution time depending on the order in which rows are processed, and the distribution of data in the database. Worse, a previously solid MERGE query may suddenly start to fail unpredictably if a filtered unique index is added to the merge target table at any point.

Web archive of the Connect bug I filed: MERGE Incorrectly Reports Unique Key Violations

This was closed as Won’t Fix and not copied across to the new Azure Feedback site. Even so, it was fixed at some point:

Bug reproduces on:

SQL Server 2014 (SP3-CU2) build 12.0.6214.1
SQL Server 2012 (SP4-GDR) build 11.0.7462.6
SQL Server 2008 R2 (SP3-GDR) build 10.50.6560.0
SQL Server 2008 (SP3 CU8) build 10.0.5828

Fixed in:

SQL Server 2016 (SP2-CU5) build 13.0.5264.1
SQL Server 2017 (RTM-CU13-OD) build 14.0.3049.1
SQL Server 2019 (CTP2.2) build 15.0.1200.24

Thanks for reading.