Improved RCSI Ghost Cleanup in SQL Server 2022
Background
The read committed snapshot isolation (RCSI) isolation level provides many benefits. Chief among them:
- Readers wonāt block writers (and vice versa).
- Each statement sees a point-in-time view of the committed database (with exceptions for submodule calls like non-inlined functions).
On the other side of the coin, we have the costs of maintaining the row versions needed by the RCSI implementation.
This is not just a case of making sure tempdb (or the user database if ADR is in use) is large enough and can handle the extra concurrent activity:
- Writers changing existing data must create row versions (some inserts too).
- The system must check and maintain the version store in the background.
- Readers have to locate the correct version of each row.
That last item can cause a significant performance reduction when long version chains are involved. This is somewhat less likely to occur under RCSI than SI because each RCSI statement sees a more recent point in time. You might still encounter issues with long-running RCSI statements like reports, but thatās not the focus of this article.
There is a separate issue with RCSI that I want to cover today.
Demo
It is not unusual to find a frequently updated table in a database using RCSI. This toy example boils the scenario down to one row in a table with a single indexed column:
DROP TABLE IF EXISTS dbo.Test;
GO
CREATE TABLE dbo.Test (id integer PRIMARY KEY CLUSTERED);
INSERT dbo.Test (id) VALUES (0);
GO
CHECKPOINT;
To simulate the effect of a reasonable number of separate modification statements, weāll change the value stored in the row 123,456 times:
SET NOCOUNT, XACT_ABORT ON;
SET STATISTICS XML OFF;
DECLARE
@i integer = 1;
WHILE @i < 123456
BEGIN
BEGIN TRANSACTION;
UPDATE TOP (1)
dbo.Test WITH (TABLOCKX)
SET @i = id = @i + 1;
COMMIT TRANSACTION
WITH (DELAYED_DURABILITY = ON);
END;
Note: The TOP (1)
and TABLOCKX
hints make the updates faster by lowering the locking overhead and avoiding unnecessary Halloween Protection in the execution plan.
Updates under RCSI (without Optimized Locking) acquire update locks when locating qualifying rows to change. The reason the table lock is an optimization will be explained shortly.
Weāre only ever updating one row (over and over again), so the Halloween Protection provided by the Top operator is sufficient to avoid an Eager Table Spool.
Delayed durability is used because weāre not looking to test logging performance here.
Feel free to run the example without those optimizations if you prefer.
The (optimized) estimated execution plan is:
The redundant Top operator is just Halloween Protection doing its thing.
After each run, weāll look at the physical index statistics with this DMV query:
SELECT
[level] = IPS.index_level,
[pages] = IPS.page_count,
[used %] = FORMAT(IPS.avg_page_space_used_in_percent, 'N4'),
[avg size] = IPS.avg_record_size_in_bytes,
[rows] = IPS.record_count,
[ghosts] = IPS.ghost_record_count,
[versions] = IPS.version_record_count,
[ghost versions] = IPS.version_ghost_record_count
FROM sys.dm_db_index_physical_stats
(
DB_ID(),
OBJECT_ID(N'dbo.Test', 'U'),
1, -- index id
NULL, -- partition
'DETAILED'
) AS IPS
ORDER BY
IPS.index_level;
1. SQL Server 2019 Locking Read Committed
This first test runs on SQL Server 2019 CU29-GDR (build 15.0.4410)āthe latest available as of December 2, 2004.
The test database is set to use the simple recovery model and allow delayed durability. It has ADR, SI, and RCSI all disabled:
ALTER DATABASE CURRENT
SET RECOVERY SIMPLE;
ALTER DATABASE CURRENT
SET READ_COMMITTED_SNAPSHOT OFF
WITH ROLLBACK IMMEDIATE;
ALTER DATABASE CURRENT
SET ALLOW_SNAPSHOT_ISOLATION OFF;
ALTER DATABASE CURRENT
SET DELAYED_DURABILITY = ALLOWED;
ALTER DATABASE CURRENT
SET ACCELERATED_DATABASE_RECOVERY = OFF;
This sets a performance baseline using locking read committed. In this configuration, the test runs for roughly 2 seconds.
The DMV query shows a single page in the table containing one row and no ghosts, as expected:
level | pages | used % | avg size | rows | ghosts | versions | ghost versions |
---|---|---|---|---|---|---|---|
0 | 1 | 0.1359 | 11 | 1 | 0 | 0 | 0 |
As a side note, without delayed transaction durability the test runs for 12 seconds.
Ghost story
That might seem like a long time, but weāre changing the key column of the index so the update is performed as a delete followed by an insert. The delete marks the row as a ghost and the insert adds a new row.
The next iteration of the loop will encounter the ghost row during its index scan and add it to the background ghost cleanup taskās queue. Depending on how quickly ghost cleanup proceeds, subsequent iterations might encounter many ghost rows before locating the one non-deleted row.
In this particular test, using a laptop with no other activity, the ghost cleanup task does very wellāthere are no ghosts in the table at the end of the test. This is not always the case even on modern SQL Server versions with their long history of ghost cleanup fixes, as Forrest McDaniel discovered in Fixing Queues with Watermarks.
2. SQL Server 2019 RCSI
Letās now switch from locking read committed to RCSI without changing anything else.
ALTER DATABASE CURRENT
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;
Running exactly the same test again results in an execution time of 1m 22s. Thatās significantly longer than the 2s using the locking read committed isolation level.
The DMV query shows:
level | pages | used % | avg size | rows | ghosts | versions | ghost versions |
---|---|---|---|---|---|---|---|
0 | 86 | 49.8145 | 25 | 1 | 12301 | 12848 | 547 |
1 | 1 | 13.7880 | 11 | 86 | 0 | 0 | 0 |
Explanation
The big difference this time is that the scan canāt queue ghosts the update encounters for cleanup because their transaction sequence number isnāt older than the current oldest active transaction. This is just the normal way SQL Server decides if it safe to dispose of versions and ghosts or not under row versioning isolation levels.
Note that this would still be the case with an autocommit transaction instead of the explicit one used in the test. An autocommit transaction would also require us to use FORCED
delayed durability because thereās no COMMIT
statement to attach the durability option to. If you try that configuration, youāll find performance is the same. You can save a bit of typing by using implicit transactions, but again, performance will be the same.
Once each update transaction completes, normal ghost and version store cleanup can start removing the deleted rows from the table. Thatās an asynchronous process on a separate system thread, so it will be running at the same time as the next update.
The situation at the end of the test shows an intermidate state after some ghosts have been exorcised but not all of them.
After cleaning
A few seconds later, the DMV query returns:
level | pages | used % | avg size | rows | ghosts | versions | ghost versions |
---|---|---|---|---|---|---|---|
0 | 2 | 0.3089 | 25 | 1 | 1 | 1 | 0 |
1 | 1 | 0.2965 | 11 | 2 | 0 | 0 | 0 |
Notice that the index acquired a second level, meaning the original page split during the test. This happened because the page was full of active version ghosts. The insert part of the update statement had to split the page and add a new index root. This happened many times as we can infer from the previous DMV output showing 86 pages at the leaf level, all roughly 50% full as is typical after a split.
The leaf-level row size has increased from 11 bytes to 25 due to the 14-byte row versioning overhead. Non-leaf levels do not use versioning or ghosting.
The one remaining ghost is on the original index page. SQL Server leaves a single ghost record when it is the root page of the index according to system metadata. The one non-deleted row is on a different page. All the other leaf level index pages have been cleaned up and disposed of since they were empty and not the first metadata page.
Scarier ghost story
The terrible performance under RCSI is explained by the increased number of ghost records each update scan encounters and has to skip over to get to the one visible row.
Not only does SQL Server have to find the row to discover that it is marked as a ghost, it also has to request an update lock on it beforehand. The lock is ultimately not acquired because it is covered by the existing TABLOCKX
, but the lock manager still has to process the request and discover it is redundant due to the already held lock. Without the locking hint, real locks are taken and performance is much worse.
None of the foregoing takes very long individually, but monitoring shows SQL Server was handling around 6 million lock requests per second during the test (and skipping the same number of ghosts).
Remember, update locks are required when locating qualifying rows even under RCSI when Optimized Locking isnāt available or enabled. That is an Azure SQL Database only feature at the time of writing. I expect weāll see it in SQL Server 2025.
Ultimately, the problem is caused by having to request a lock onāand then skipāhundreds of millions of ghosts during the test. Remember, weāre dealing with a one column table with a single row.
Ghost and version cleanup under RCSI canāt start until the end of each transaction and the next update starts almost immediately. That update will encounter version ghosts from the prior run(s) as well as those it creates itself.
But wait, it gets worse.
3. SQL Server 2019 RCSI with an open transaction
This third test is a repeat of the second test with an open transaction. Drop and recreate the test table as usual, but then start a transaction in a separate session connected to the same database:
-- Will be RCSI
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
-- Leave this transaction open
BEGIN TRANSACTION;
SELECT COUNT_BIG(*) FROM dbo.AccessMe AS AM;
It is important that the transaction runs under a row versioning isolation level and has truly started (by accessing a user table in the target database). You can use any table in the above code block, even the dbo.Test
table itself.
The important thing is that the transaction appears in the DMV sys.dm_tran
_active_
snapshot
_database
_transactions
. Notice that the transaction does not need to be using snapshot isolationāRCSI will do just fine.
Terrifying ghost story
Running the update test now takes over four minutes. Without the delayed durability optimization, that becomes half an hour. Recall that the locking read committed test required just 12 seconds in those conditions. RCSI without the open transaction took 1m 22s. Weāre only updating one row 123,456 times!
The DMV query returns:
level | pages | used % | avg size | rows | ghosts | versions | ghost versions |
---|---|---|---|---|---|---|---|
0 | 823 | 50.0148 | 25 | 1 | 0 | 123455 | 123455 |
1 | 2 | 66.0675 | 11 | 823 | 0 | 0 | 0 |
2 | 1 | 0.2965 | 11 | 2 | 0 | 0 | 0 |
The index for our one-row table now has three levels, with 823 pages at the leaf. There are 123,455 ghost versionsāone for every update performed. These ghosts canāt be cleaned up until the transaction we left open commits or rolls back. Each ghost has a versioned row in the version store.
Details
Each of the 123,456 individual single row updates had to skip one less ghost row than its iteration number. For example, update number 100,000 had to skip 99,999 ghost rows.
Remember, each skipped row is preceeded by a lock request. The locking canāt be avoided because SERIALIZABLE
transactions can lock ghosted rows. Not honouring a lock could result in a violation of the ACID guarantees.
Hopefully, you can see how inefficient this all is and where the time goes. Wait statistics wonāt help because SQL Server is never waiting for a resourceāitās working as hard as it can requesting locks and skipping ghosts at an impressive rate.
Unless you know to look at performance counters for skipped ghost records, it seems like SQL Server is just taking up to half an hour to update one row 123,456 times, with one core maxed out.
Donāt forget to close the open transaction!
The ghost versions will be cleaned up within a few seconds after you do that.
SQL Server 2022 Improvement
On SQL Server 2022, the locking read committed test completes in 2-3 seconds, as before. The RCSI test runs for about 1m 28s, again about the same as on 2019.
The big difference is with the open transaction. That test now runs for roughly the same time as without the open transactionā1m 32s in the test I ran just now. Thatās much better than over 4 minutes, but still not great.
At the end of the test, the DMV shows that some ghosts were cleaned during the run:
level | pages | used % | avg size | rows | ghosts | versions | ghost versions |
---|---|---|---|---|---|---|---|
0 | 231 | 49.9272 | 25 | 1 | 34201 | 34590 | 389 |
1 | 1 | 37.0768 | 11 | 231 | 0 | 0 | 0 |
A few seconds later, the cleaning is complete:
level | pages | used % | avg size | rows | ghosts | versions | ghost versions |
---|---|---|---|---|---|---|---|
0 | 2 | 0.3089 | 25 | 1 | 1 | 1 | 0 |
1 | 1 | 0.2965 | 11 | 2 | 0 | 0 | 0 |
Analysis
The open RCSI transaction no longer prevents ghost cleanup, which explains the improved performance. Fewer ghosts are encountered, so fewer locks are requested and fewer rows are skipped. Rather than 6 million lock requests per second, the server now settles at around 10,000 reflecting the lower average number of ghost versions encountered.
SQL Server 2022 enables statement-level ghost cleanup under RCSI.
Itās still nowhere near as fast as locking read committed, but itās much better than on 2019. If you ever need to disable this improvement on SQL Server 2022, undocumented global trace flag 7006 serves that purpose.
Important
The improvement in SQL Server 2022 only allows ghosts to be cleaned up earlier when there is a concurrent active RCSI transaction. The version store records are unaffected and will remain until the end of the transaction as before.
This much is at least documented:
Even though
READ COMMITTED
transactions using row versioning provides a transactionally consistent view of the data at a statement level, row versions generated or accessed by this type of transaction are maintained until the transaction completes.
There appears to be a pending improvement to address this, but it is currently hidden behind an undocumented trace flag and isnāt fully functional even with it enabled. Something to watch out for in SQL Server 2025, perhaps.
An open snapshot isolation transaction still prevents early ghost cleanup because that transaction might legitimately need to encounter those ghosts and ghosted versions to provide a view of the committed database as it was when the transaction started. Remember, RCSI only offers a view as of the start of the current statement.
Final Thoughts
You may not rapidly update a single-row table and access it with a locking scan as literally shown in these tests. Nevertheless, ghosts are a fact of life and maintained even for heap tables when a row versioning isolation level is in use.
Scans may be frowned upon, but remember all seeks have a scanning component unless they are equality tests on an enforced key (a āsingleton seekā into a unique index or constraint). The range scan component of a seek may have to skip ghosts to find rows it can logically āseeā.
Row versioning comes with costs, even without long version chains. None of the examples in this article resulted in more than a single entry in each version chain. If that sounds wrong, consider that the update is implemented as a delete followed by an insert. There is a new live row on each iteration.
The convenience of RCSI and SI do not come for free. In some cases, the effects can be dramatic and hard to diagnose unless you already know what youāre looking for.
Itās not currently possible to exclude some tables from row versioning. The setting is per database and all-or-nothing, heavily updated tables and indexed views as well as static ones.
Specifying READCOMMITTEDLOCK
only affects whether shared locks are acquired, it wonāt help you avoid ghosts and ghost versions.
Row versions are always generated and maintained when a row versioning isolation level is enabledāeven if it is never used.
Long-running transactions are always bad news, especially if they are idle and unintentional. SQL Server 2022 can help avoid the worst of the performance impacts shown here, but the baseline RCSI run time (1.5 minutes) was still pretty poor compared with locking read committed (2 seconds). That occurred without any other open transactions.
ADR was disabled for these tests, but the core issues reproduce the same with it enabled. Having ADR makes life more difficult because the performance counters for skipped ghost records and version store size do not function, without an obvious replacement.
Thanks for reading.