Friday, 11 September 2015

Interesting things about INSTEAD OF triggers

Triggers and Row Versioning

Only DML AFTER triggers use row versioning (in SQL Server 2005 onward) to provide the inserted and deleted pseudo-tables inside a trigger procedure. This point is not clearly made in much of the official documentation. In most places, the documentation simply says that row-versioning is used to build the inserted and deleted tables in triggers without qualification (examples below):

Presumably, the original versions of these entries were written before INSTEAD OF triggers were added to the product, and never updated. Either that, or it is a simple (but repeated) oversight. Anyway, the way row-versioning works with AFTER triggers is quite intuitive. These triggers fire after the modifications in question have been performed so it’s easy to see how maintaining versions of the modified rows enables the database engine to provide the inserted and deleted pseudo-tables. The deleted pseudo-table is constructed from versions of the affected rows before the modifications took place; the inserted pseudo-table is formed from the versions of the affected rows as at the time the trigger procedure started.

INSTEAD OF Triggers

INSTEAD OF triggers are different because this type of DML trigger completely replaces the triggered action. The inserted and deleted pseudo-tables now represent changes that would have been made, had the triggering statement actually executed. Row-versioning cannot be used for these triggers because no modifications have occurred, by definition. So, if not using row versions how does SQL Server do it?

The answer is that SQL Server modifies the execution plan for the triggering DML statement when an INSTEAD OF trigger exists. Rather than modifying the affected tables directly, the execution plan writes information about the changes to a hidden worktable. This worktable contains all the data needed to perform the original changes, the type of modification to perform on each row (delete or insert), as well as any information needed in the trigger for an OUTPUT clause.

Execution plan without a trigger

To see all this in action, we will first run a simple test without an INSTEAD OF trigger present:

CREATE TABLE Test
(
    RowID integer NOT NULL, 
    Data integer NOT NULL,

    CONSTRAINT PK_Test_RowID
    PRIMARY KEY CLUSTERED (RowID)
);
GO
INSERT dbo.Test
    (RowID, Data)
VALUES
    (1, 100),
    (2, 200),
    (3, 300);
GO
DELETE dbo.Test;
GO
DROP TABLE dbo.Test;

The execution plan for the delete is very straightforward:

DELETE plan with no trigger

Each row that qualifies is passed directly to a Clustered Index Delete operator, which deletes it. Easy.

Plan with an `INSTEAD OF` trigger

Now let’s modify the test to include an INSTEAD OF DELETE trigger (one that just performs the same delete action for simplicity):

CREATE TABLE Test
(
    RowID integer NOT NULL, 
    Data integer NOT NULL,

    CONSTRAINT PK_Test_RowID
    PRIMARY KEY CLUSTERED (RowID)
);
GO
INSERT dbo.Test
    (RowID, Data)
VALUES
    (1, 100),
    (2, 200),
    (3, 300);
GO
CREATE TRIGGER dbo_Test_IOD
ON dbo.Test
INSTEAD OF DELETE
AS 
BEGIN
    SET NOCOUNT ON;

    DELETE FROM dbo.Test
    WHERE EXISTS
    (
        SELECT * FROM Deleted
        WHERE Deleted.RowID = dbo.Test.RowID
    );
END;
GO
DELETE dbo.Test;
GO
DROP TABLE dbo.Test;

The execution plan for the DELETE is now quite different:

Delete plan with an instead of trigger

The Clustered Index Delete operator has been replaced by a Clustered Index Insert. This is the insert to the hidden worktable, which is renamed (in the public execution plan representation) to the name of the base table affected by the delete. The renaming occurs when the XML show plan is generated from the internal execution plan representation, so there is no documented way to see the hidden worktable.

As a result of this change, the plan therefore appears to perform an insert to the base table in order to delete rows from it. This is confusing, but it does at least disclose the presence of an INSTEAD OF trigger. Replacing the Insert operator with a Delete might be even more confusing. Perhaps the ideal would be a new graphical icon for an INSTEAD OF trigger worktable? Anyway, it is what it is.

The new Compute Scalar operator defines the type of action performed on each row. This action code is an integer, with the following meanings:

Action Code	Meaning
3	DELETE
4	INSERT
259	DELETE in a MERGE plan
260	INSERT in a MERGE plan

For this query, the action is a constant 3 meaning every row is to be deleted:

Compute Scalar properties

Update actions

As an aside, an INSTEAD OF UPDATE execution plan replaces a single Update operator with two Clustered Index Inserts to the same hidden worktable—one for the inserted pseudo-table rows and one for the deleted pseudo-table rows. An example execution plan:

Instead of Update Plan

A MERGE that performs an UPDATE also produces an execution plan with two inserts to the same base table for similar reasons:

MERGE UPDATE plan

The Trigger Plan

The execution plan for the trigger body also has some interesting features:

Trigger body plan

The first thing to notice is that the graphical icon used for the deleted table is not the same as the icon used in AFTER trigger plans:

Deleted scan showplan operator

The representation in the INSTEAD OF trigger plan is a Clustered Index Seek. The underlying object is the same internal worktable we saw earlier, though here it is named deleted instead of being given the base table name, presumably for some sort of consistency with AFTER triggers.

The seek operation on the deleted table might not be what you were expecting (if you were expecting a seek on RowID):

Deleted Seek

This ‘seek’ returns all rows from the worktable that have an action code of 3 (delete), making it exactly equivalent to the Deleted Scan operator seen in AFTER trigger plans. The same internal worktable is used to hold rows for both inserted and deleted pseudo-tables in INSTEAD OF triggers. The equivalent of an Inserted Scan is a seek on action code 4 (which is possible in a delete trigger, but the result will always be empty). There are no indexes on the internal worktable aside from the non-unique clustered index on the action column alone. In addition, there are no statistics associated with this internal index.

The analysis so far might leave you wondering where the join between the RowID columns is performed. This comparison occurs at the Nested Loops Left Semi Join operator as a residual predicate:

Residual predicate

Now that we know the ‘seek’ is effectively a full scan of the deleted table, the execution plan chosen by the query optimizer seems pretty inefficient. The overall flow of the execution plan is that each row from the Test table is potentially compared with the entire set of deleted rows, which sounds a lot like a cartesian product.

The saving grace is that the join is a semi join, meaning the comparison process stops for a given Test row as soon as the first deleted row satisfies the residual predicate. Nevertheless, the strategy seems a curious one. Perhaps the execution plan would be better if the Test table contained more rows?

Trigger with 1,000 rows

The following script can be used to test the trigger with a larger number of rows. We will start with 1,000:

CREATE TABLE Test
(
    RowID integer NOT NULL, 
    Data integer NOT NULL,

    CONSTRAINT PK_Test_RowID
    PRIMARY KEY CLUSTERED (RowID)
);
GO
SET STATISTICS XML OFF;
SET NOCOUNT ON;
GO
DECLARE @i integer = 1;
WHILE @i <= 1000
BEGIN
    INSERT dbo.Test (RowID, Data)
    VALUES (@i, @i * 100);

    SET @i += 1;
END;
GO
CREATE TRIGGER dbo_Test_IOD 
ON dbo.Test 
INSTEAD OF DELETE 
AS 
BEGIN
    SET NOCOUNT ON;

    DELETE FROM dbo.Test
    WHERE EXISTS
    (
        SELECT * FROM Deleted
        WHERE Deleted.RowID = dbo.Test.RowID
    );
END;
GO
SET STATISTICS XML ON;
GO
DELETE dbo.Test;
GO
DROP TABLE dbo.Test;

The execution plan for the trigger body is now:

1000 row trigger plan

Mentally replacing the (misleading) Clustered Index Seek with a Deleted Scan, the plan looks generally pretty good. The optimizer has chosen a one-to-many Merge Join instead of a Nested Loops Semi Join, which seems reasonable. The Distinct Sort is a curious addition though:

Sort properties

This sort is performing two functions. First, it is providing the merge join with the sorted input it needs, which is fair enough because there is no index on the internal worktable to provide the necessary order. The second thing the sort is doing is to distinct on RowID. This might seem odd, because RowID is the primary key of the base table.

The issue is that rows in the deleted table are simply candidate rows that the original DELETE query identified. Unlike an AFTER trigger, these rows have not been checked for constraint or key violations yet, so the query processor has no guarantee they are in fact unique.

Generally, this is a very important point to bear in mind with INSTEAD OF triggers: there is no guarantee that the rows provided meet any of the constraints on the base table (including NOT NULL). This is not only important for the trigger author to remember; it also limits the simplifications and transformations the query optimizer can perform.

A second issue shown in the Sort properties above, but not highlighted, is that the output estimate is just 32 rows. The internal worktable has no statistics associated with it, so the optimizer guesses at the effect of the Distinct operation. We ‘know’ the RowID values are unique, but without any hard information to go on the optimizer makes a poor guess. This issue will return to haunt us in the next test.

Trigger with 5,000 rows

Now modify the test script to generate 5,000 rows:

CREATE TABLE Test
(
    RowID integer NOT NULL, 
    Data integer NOT NULL,

    CONSTRAINT PK_Test_RowID
    PRIMARY KEY CLUSTERED (RowID)
);
GO
SET STATISTICS XML OFF;
SET NOCOUNT ON;
GO
DECLARE @i integer = 1;
WHILE @i <= 5000
BEGIN
    INSERT dbo.Test (RowID, Data)
    VALUES (@i, @i * 100);

    SET @i += 1;
END;
GO
CREATE TRIGGER dbo_Test_IOD 
ON dbo.Test 
INSTEAD OF DELETE 
AS 
BEGIN
    SET NOCOUNT ON;

    DELETE FROM dbo.Test
    WHERE EXISTS
    (
        SELECT * FROM Deleted
        WHERE Deleted.RowID = dbo.Test.RowID
    );
END;
GO
SET STATISTICS XML ON;
GO
DELETE dbo.Test;
GO
DROP TABLE dbo.Test;

The trigger execution plan is:

5000 row trigger plan

This time the optimizer has decided to split the distinct and sort operations. The distinct on RowID is performed by the Hash Match (Aggregate) operator:

Hash match operator properties

Notice the optimizer’s estimate for the output is 71 rows. In fact, all 5,000 rows survive the distinct because RowID is unique. The inaccurate estimate means that an inadequate fraction of the query memory grant is allocated to the Sort, which ends up spilling to tempdb:

Sort properties

This test has to be performed on SQL Server 2012 or higher in order to see the sort warning in the execution plan. In prior versions, the plan contains no information about spills—a Profiler trace on the Sort Warnings event would be needed to reveal it (and you would need to correlate that back to the source query somehow).

On SQL Server 2014

If the previous test is repeated on SQL Server 2014, in a database set to compatibility level 120 so the new cardinality estimator (CE) is used, the trigger execution plan is different again:

5000 row plan on 2014

In some ways, this execution plan seems like an improvement. The (unnecessary) Distinct Sort is still there, but the overall strategy seems more natural: For each distinct candidate RowID in the deleted table, join to the base table (so verifying that the candidate row actually exists) and then delete it.

Unfortunately, the 2014 plan is based on worse cardinality estimates than we saw in SQL Server 2012. Switching to Plan Explorer to display the estimated row counts shows the problem clearly:

Estimated plan

The optimizer chose a Nested Loops strategy for the join because it expected a very small number of rows on its top input. The first issue occurs at the Clustered Index Seek. The optimizer knows the deleted table contains 5,000 rows at this point as we can see by switching to Plan Tree view and adding the optional Table Cardinality column:

Plan tree view

The ‘old’ cardinality estimator in SQL Server 2012 and earlier is smart enough to know that the ‘seek’ on the internal worktable would return all 5,000 rows (so it chose a merge join). The new CE is not so smart. It sees the worktable as a ‘black box’ and guesses at the effect of the seek on action code = 3:

Black box calculation

The guess of 71 rows (rounded up) is a pretty miserable outcome, but the error is compounded when the new CE estimates the rows for the distinct operation on those 71 rows:

Distinct estimate

Based on the expected 8 rows, the optimizer chooses the Nested Loops strategy. Another way to see these estimation errors is to add the following statement to the trigger body (for test purposes only):

SELECT COUNT_BIG(DISTINCT RowID)
FROM Deleted;

The estimated plan shows the estimation errors:

Estimated plan

The actual plan still shows 5,000 rows of course:

Actual plan

Or you could compare estimate versus actual at the same time in Plan Tree view:

Plan tree view

A million rows…

The poor guess-estimates when using the 2014 cardinality estimator cause the optimizer to select a Nested Loops strategy even when the Test table contains a million rows. The 2014 new CE estimated plan for that test is:

Million row estimated plan

The ‘seek’ estimates 1,000 rows from the known cardinality of 1,000,000 and the distinct estimate is 32 rows. The post-execution plan reveals the effect on the memory reserved for the Hash Match:

Hash Spills

Expecting only 32 rows, the Hash Match gets into real trouble, recursively spilling its hash table before eventually completing.

Final Thoughts

While it is true that a trigger should rarely be written to do something that can be achieved with declarative referential integrity, it is also true that a well-written trigger that uses an efficient execution plan can be comparable in performance to the cost of maintaining an extra nonclustered index.

There are two practical problems with the above statement. First (and with the best will in the world) people don’t always write good trigger code. Second, getting a good execution plan from the query optimizer in all circumstances can be difficult. The nature of triggers is that they are called with a wide range of input cardinalities and data distributions.

Even for AFTER triggers, the lack of indexes and statistics on the deleted and inserted pseudo-tables means plan selection is often based on guesses or poor quality information. Even where a good plan is initially selected, later executions may reuse the same plan when a recompilation would have been a better choice. There are ways to work around the limitations, primarily through the use of temporary tables and explicit indexes/statistics but even there great care is required since triggers are a form of stored procedure.

With INSTEAD OF triggers, the risks can be even greater because the contents of the inserted and deleted tables are unverified candidates. The query optimizer cannot use constraints on the base table to simplify and refine its execution plan. The new cardinality estimator in SQL Server 2014 also represents a real step backwards when it comes to INSTEAD OF trigger plans. Guessing at the effect of a seek operation that the engine introduced itself is a surprising and unwelcome oversight.

Thanks for reading.