Wednesday, 18 December 2013

Snapshot Isolation in SQL Server

Snapshot Isolation can be used to reduce deadlocks in your SQL Server. It works by marking records with a Row Versioning identifier which is stored in TempDB. When you query a database, the database returns all the data as a consistent dataset but still allows data updates while the query is in progress. Any subsequent queries will return the newly updated record, even if the previous query had yet to complete.

Snapshot Isolation is best used with a second setting which changes the Snapshot Isolation to Read Committed Snapshot. This tells SQL Server to observe the row versioning indicator when making a query.

Without Snapshot Isolation, you may see this behaviour:

Large query 1 begins
Small update begins
Small query 2 begins
[pause until query 1 completes]
Large query 1 ends
Small update ends
Small query 2 ends

The first large query is holding a lock on all the records it is trying to return until the query is complete. This stops the update occurring, which in turn stops the second query executing.  Only when that lock from Query 1 is released is SQL able to perform the update, and only once the update is complete can query 2 execute. Of course I could have used a (NOLOCK) table hint in both queries, which would have allowed both the update and second query to run straightaway, but there are two issues with this. Firstly NOLOCK is deprecated in SQL Server 2012, and secondly in both queries it would have been pot luck whether I got the version of the record pre or post update. In the example above, a particularly large query could have caused the update to be chosen as a deadlock victim and resulted in a failed update.

With Snapshot Isolation, the row versioning would have allowed the update to progress, and the subsequent query to progress also. The example above would now behave like this:

Large query 1 begins
Small update begins – notes that row is in use and marks the update accordingly.
Small query 2 begins
Small update ends
Small query 2 ends – returning post-update value
[pause until query 1 completes]
Large query 1 ends – returning pre-update value
Pre-update value is now removed completely

Of course this could still be a slightly confusing result: the first query to complete returns the new value, and the second query to complete contains the old value! It is important to remember that the result of your query that you see is the result as the database was when then query started, not when it completed.

Because one of the databases I look after is very heavily queried, we were seeing a lot of issues with the inserts and updates of records. We were constantly seeing errors in our CRM logs reporting deadlocks, and the performance of some of the inserts was poor. On changing the snapshot isolation settings, the difference was a 500x improvement in the average time to insert a record into one of our core tables, and our deadlocks vanished overnight. Inserts and deletions benefit particularly from this technology, as updates may affect one of two data pages and create a local lock, but an insert and associated index reorganisation can end up with many more data pages being affected, and therefore many more pages requiring a lock to progress.

The penalty, if you can call it that,  of changing the implementationwas that we found our TempDB usage went up from almost zero in size to around 1GB. The technique heavily uses TempDB, so before you enable it make sure that your TempDB is on a decent disk and there is plenty of memory available otherwise you may have just transferred the bottleneck from one place to another.

Snapshot Isolation causes the underlying database structure to be altered as a few extra bytes are added to each row to hold a version indicator. Unfortunately if your database is mirrored, you must break the mirror to make the change and recreate it as the change is not supported by mirroring. Also you must kill all the user sessions on the database. If the database is used by Dynamics, that will mean stopping IIS too on the CRM Web Servers to prevent connections re-establishing themselves.

To make the change, run the following commands:

ALTER DATABASE YOURDB
SET READ_COMMITTED_SNAPSHOT ON;
GO
ALTER DATABASE YOURDB
SET ALLOW_SNAPSHOT_ISOLATION ON
GO


I struggle to see any downsides to changing the database this way. The performance gains we saw were impressive, and the price of a little extra activity in TempDB was a very small price to pay. Of course, there may be reasons why your transactions must be kept atomic, but for every database I create in the future, I will be using Snapshot Isolation.

Max degree of Parallelism in SQL Server

I was asked to review the Max Degree of Parallelism (MDOP) settings for one of our CRM database servers. Reading some of the web blogs, there appeared to be conflicting information out there regarding the optimum settings should be. Some have suggested that you should leave the default setting, some suggest reducing it 1 for servers serving high numbers of connections.

First lets describe what this parameter actually does.

The Max Degree of Parallelism dictates how many CPU cores SQL Server can dedicate to any single query. SQL Server is able to process parts of a query in parallel and dedicate multiple cores of CPU to the processing of that query. This parallel approach can be used when the query is retrieving data from multiple sources or to help process a particular part of a query. Part of that query’s execution time is to calculate how many cores should be used to process the query, so part of the motivation behind setting this to a value of 1 is that this portion of the query execution can be bypassed as SQL knows only one core should be used.

The MDOP setting is adjusted with the following command, and is applied server-wide immediately:

sp_configure 'show advanced options', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
sp_configure 'max degree of parallelism', 8;
GO
RECONFIGURE WITH OVERRIDE;
GO

As part of my investigation I used Profiler to extract a particularly long-running query on our database. The query was extracting all of the Active records of an entity, and sort them by a timestamp field. Because the entity was a CRM entity, the data was spread across the two CRM tables: the Base table where the entity management takes place and the ExtensionBase table where the user-defined data is stored, including my timestamp field. So SQL was generating a query which was essentially this:

SELECT [columns]
FROM [base table] B
INNER JOIN [extension table] E ON B.Id = E.Id
WHERE B.StateCode = 0 AND B.DeletionStateCode = 0
ORDER BY E.[Timestamp field]

Isolating the query and running it in Query Analyzer, I could see that SQL was querying the data from the Base table and Extension table, merging them and sorting the results. According to the execution plan, the results sort was taking 77% of the execution time. By varying the MDOP setting from 1 to 8, the query execution time varied from 24 seconds (MDOP=1), 6 seconds (MDOP=4) and 4.7 seconds (MDOP=8). It was clear then that the Sort part of the query was able to make use of the additional cores to return a quicker result.

What you then have to weigh up is that if you have several long-running queries running simultaneously and using many cores, there may be very limited CPU available for other operations.

Setting a value of 0 allows SQL to allocate all cores to a single query. Set it to this value if your database is used sporadically, but when you do query the database it is a large query and you want the results as soon as possible.


So in conclusion, reducing the MDOP to a very low value could result in long execution times, setting it to a high value (or zero) will throttle the number of queries that can execute at the same time. Depending on the number of cores available and how many parallel tasks you are trying to perform, you should set the MDOP to a compromise between the two. Setting it to a very low value can have a negative impact on the execution time of your query. If you aren’t sure what you are doing, best to leave it to the default setting but there may be good reason to set play with different settings, if the server isn’t performing as you want it to.