The Halloween Problem—Part 3
The MERGE
statement (introduced in SQL Server 2008) allows us to perform a mixture of INSERT
, UPDATE
, and DELETE
operations using a single statement.
The Halloween Protection issues for MERGE
are mostly a combination of the requirements of the individual operations, but there are some important differences and a couple of interesting optimizations that apply only to MERGE
.
Avoiding the Halloween Problem with MERGE
We start by looking again at the Demo and Staging example from part two:
CREATE TABLE dbo.Demo
(
SomeKey integer NOT NULL,
CONSTRAINT PK_Demo
PRIMARY KEY (SomeKey)
);
CREATE TABLE dbo.Staging
(
SomeKey integer NOT NULL
);
INSERT dbo.Staging
(SomeKey)
VALUES
(1234),
(1234);
CREATE NONCLUSTERED INDEX c
ON dbo.Staging (SomeKey);
INSERT dbo.Demo
SELECT s.SomeKey
FROM dbo.Staging AS s
WHERE NOT EXISTS
(
SELECT 1
FROM dbo.Demo AS d
WHERE d.SomeKey = s.SomeKey
);
As you may recall, this example was used to show that an INSERT
requires Halloween Protection when the insert target table is also referenced in the SELECT
part of the query (the EXISTS
clause in this case).
The correct behaviour for the INSERT
statement above is to try to add both 1234 values, and to consequently fail with a PRIMARY KEY
violation. Without phase separation, the INSERT
would incorrectly add one value, completing without an error being thrown.
The INSERT execution plan
The code above has one difference from that used in part two; a nonclustered index on the Staging
table has been added. The INSERT
execution plan still requires Halloween Protection though:
The MERGE execution plan
Now try the same logical insert expressed using MERGE
syntax:
MERGE dbo.Demo AS d
USING dbo.Staging AS s ON
s.SomeKey = d.SomeKey
WHEN NOT MATCHED BY TARGET THEN
INSERT (SomeKey)
VALUES (s.SomeKey);
In case you are not familiar with the syntax, the logic is to compare rows in the Staging
and Demo
tables on the SomeKey
value, and if no matching row is found in the target (Demo
) table, we insert a new row. This has exactly the same semantics as the previous INSERT...WHERE NOT EXISTS
code, of course. The execution plan is quite different however:
Notice the lack of an Eager Table Spool in this plan. Despite that, the query still produces the correct error message. It seems SQL Server has found a way to execute the MERGE
plan iteratively while respecting the logical phase separation required by the SQL standard.
The hole-filling optimization
In the right circumstances, the SQL Server optimizer can recognize that the MERGE
statement is hole-filling, which is just another way of saying that the statement only adds rows where there is an existing gap in the target tableā€™s key.
For this optimization to be applied, the values used in the WHEN NOT MATCHED BY TARGET
clause must exactly match the ON
part of the USING
clause. Also, the target table must have a unique key (a requirement satisfied by the PRIMARY KEY
in the present case).
Where these requirements are met, the MERGE
statement does not require protection from the Halloween Problem.
Of course, the MERGE
statement is logically no more or less hole-filling than the original INSERT...WHERE NOT EXISTS
syntax. The difference is that the optimizer has complete control over implementing the MERGE
statement, whereas the INSERT
syntax would require it to reason about the wider semantics of the query. A human can easily see that the INSERT
is also hole-filling, but the optimizer does not think about things in the same way we do.
To illustrate the exact matching requirement I mentioned, consider the following query syntax, which does not benefit from the hole-filling optimization. The result is full Halloween Protection provided by an Eager Table Spool:
MERGE dbo.Demo AS d
USING dbo.Staging AS s ON
s.SomeKey = d.SomeKey
WHEN NOT MATCHED THEN
INSERT (SomeKey)
VALUES (s.SomeKey * 1);
The only difference there is the multiplication by one in the VALUES
clause ā€“ something which does not change the logic of the query, but which is enough to prevent the hole-filling optimization being applied.
Hole-filling with Nested Loops
In the previous example, the optimizer chose to join the tables using a Merge Join. The hole-filling optimization can also be applied where a Nested Loops Join is chosen, but this requires an extra uniqueness guarantee on the source table, and an index seek on the inner side of the join.
To see this in action, we can clear out the existing staging data, add uniqueness to the nonclustered index, and try the MERGE
again:
-- Remove existing duplicate rows
TRUNCATE TABLE dbo.Staging;
-- Convert index to unique
CREATE UNIQUE NONCLUSTERED INDEX c
ON dbo.Staging (SomeKey)
WITH (DROP_EXISTING = ON);
-- Sample data
INSERT dbo.Staging
(SomeKey)
VALUES
(1234),
(5678);
-- Hole-filling merge
MERGE dbo.Demo AS d
USING dbo.Staging AS s ON
s.SomeKey = d.SomeKey
WHEN NOT MATCHED THEN
INSERT (SomeKey)
VALUES (s.SomeKey);
The resulting execution plan again uses the hole-filling optimization to avoid Halloween Protection, using a nested loops join and an inner-side seek into the target table:
Avoiding unnecessary index traversals
Where the hole-filling optimization applies, the engine may also apply a further optimization:
It can remember the current index position while reading the target table (processing one row at a time, remember) and reuse that information when performing the insert, instead of seeking down the b-tree to find the insert location.
The reasoning is that the current read position is very likely to be on the same page where the new row should be inserted. Checking that the row does in fact belong on this page is very fast, since it involves checking only the lowest and highest keys currently stored there.
The combination of eliminating the Eager Table Spool and saving an index navigation per row can provide a significant benefit in OLTP workloads, provided the execution plan is retrieved from cache.
The compilation cost for MERGE
statements is rather higher than for INSERT
, UPDATE
and DELETE
, so plan reuse is an important consideration. It is also helpful to ensure that pages have sufficient free space to accommodate new rows, avoiding page splits. This is typically achieved through normal index maintenance and the assignment of a suitable FILLFACTOR
.
I mention OLTP workloads, which typically feature a large number of relatively small changes, because the MERGE
optimizations may not be a good choice where a large number of are rows processed per statement. Other optimizations like minimally-logged INSERTs
cannot currently be combined with hole-filling. As always, the performance characteristics should be benchmarked to ensure the expected benefits are realized.
The hole-filling optimization for MERGE
inserts may be combined with updates and deletes using additional MERGE
clauses; each data-changing operation is assessed separately for the Halloween Problem.
Avoiding the join
The final optimization we will look at can be applied where the MERGE
statement contains update and delete operations as well as a hole-filling insert, and the target table has a unique clustered index.
The following example shows a common MERGE
pattern where unmatched rows are inserted, and matching rows are updated or deleted depending on an additional condition:
CREATE TABLE #T
(
col1 integer NOT NULL,
col2 integer NOT NULL,
CONSTRAINT PK_T
PRIMARY KEY (col1)
);
CREATE TABLE #S
(
col1 integer NOT NULL,
col2 integer NOT NULL,
CONSTRAINT PK_S
PRIMARY KEY (col1)
);
INSERT #T
(col1, col2)
VALUES
(1, 50),
(3, 90);
INSERT #S
(col1, col2)
VALUES
(1, 40),
(2, 80),
(3, 90);
The MERGE
statement required to make all the required changes is remarkably compact:
MERGE #T AS T
USING #S AS S ON T.col1 = S.col1
WHEN NOT MATCHED THEN INSERT VALUES (S.col1, S.col2)
WHEN MATCHED AND T.col2 - S.col2 = 0 THEN DELETE
WHEN MATCHED THEN UPDATE SET T.col2 -= S.col2;
The execution plan is quite surprising:
No Halloween Protection, no join between the source and target tables, and itā€™s not often you will see a Clustered Index Insert operator followed by a Clustered Index Merge to the same table. This is another optimization targeted at OLTP workloads with high plan reuse and suitable indexing.
The idea is to read a row from the source table and immediately try to insert it into the target. If a key violation results, the error is suppressed, the Insert operator outputs the conflicting row it found, and that row is then processed for an update or delete operation using the Merge plan operator as normal.
If the original insert succeeds (without a key violation) processing continues with the next row from the source (the Merge operator only processes updates and deletes).
This optimization primarily benefits MERGE
queries where most source rows result in an insert. Again, careful benchmarking is required to ensure performance is better than using separate statements.
Summary
The MERGE
statement provides several unique optimization opportunities.
In the right circumstances, it can avoid the need to add explicit Halloween Protection compared with an equivalent INSERT
operation, or perhaps even a combination of INSERT
, UPDATE
, and DELETE
statements. Additional MERGE
-specific optimizations can avoid the index b-tree traversal that is usually needed to locate the insert position for a new row, and may also avoid the need to join the source and target tables completely.
In the final part of this series, we will look at how the query optimizer reasons about the need for Halloween protection, and identify some more tricks it can employ to avoid the need to add Eager Table Spools to execution plans that change data.