The SQL Servant

Wednesday, 18 December 2013

Snapshot Isolation in SQL Server

Snapshot Isolation can be used to reduce deadlocks in your SQL Server. It works by marking records with a Row Versioning identifier which is stored in TempDB. When you query a database, the database returns all the data as a consistent dataset but still allows data updates while the query is in progress. Any subsequent queries will return the newly updated record, even if the previous query had yet to complete.

Snapshot Isolation is best used with a second setting which changes the Snapshot Isolation to Read Committed Snapshot. This tells SQL Server to observe the row versioning indicator when making a query.

Without Snapshot Isolation, you may see this behaviour:

Large query 1 begins

Small update begins

Small query 2 begins

[pause until query 1 completes]

Large query 1 ends

Small update ends

Small query 2 ends

The first large query is holding a lock on all the records it is trying to return until the query is complete. This stops the update occurring, which in turn stops the second query executing. Only when that lock from Query 1 is released is SQL able to perform the update, and only once the update is complete can query 2 execute. Of course I could have used a (NOLOCK) table hint in both queries, which would have allowed both the update and second query to run straightaway, but there are two issues with this. Firstly NOLOCK is deprecated in SQL Server 2012, and secondly in both queries it would have been pot luck whether I got the version of the record pre or post update. In the example above, a particularly large query could have caused the update to be chosen as a deadlock victim and resulted in a failed update.

With Snapshot Isolation, the row versioning would have allowed the update to progress, and the subsequent query to progress also. The example above would now behave like this:

Large query 1 begins

Small update begins – notes that row is in use and marks the update accordingly.

Small query 2 begins

Small update ends

Small query 2 ends – returning post-update value

[pause until query 1 completes]

Large query 1 ends – returning pre-update value

Pre-update value is now removed completely

Of course this could still be a slightly confusing result: the first query to complete returns the new value, and the second query to complete contains the old value! It is important to remember that the result of your query that you see is the result as the database was when then query started, not when it completed.

Because one of the databases I look after is very heavily queried, we were seeing a lot of issues with the inserts and updates of records. We were constantly seeing errors in our CRM logs reporting deadlocks, and the performance of some of the inserts was poor. On changing the snapshot isolation settings, the difference was a 500x improvement in the average time to insert a record into one of our core tables, and our deadlocks vanished overnight. Inserts and deletions benefit particularly from this technology, as updates may affect one of two data pages and create a local lock, but an insert and associated index reorganisation can end up with many more data pages being affected, and therefore many more pages requiring a lock to progress.

The penalty, if you can call it that, of changing the implementationwas that we found our TempDB usage went up from almost zero in size to around 1GB. The technique heavily uses TempDB, so before you enable it make sure that your TempDB is on a decent disk and there is plenty of memory available otherwise you may have just transferred the bottleneck from one place to another.

Snapshot Isolation causes the underlying database structure to be altered as a few extra bytes are added to each row to hold a version indicator. Unfortunately if your database is mirrored, you must break the mirror to make the change and recreate it as the change is not supported by mirroring. Also you must kill all the user sessions on the database. If the database is used by Dynamics, that will mean stopping IIS too on the CRM Web Servers to prevent connections re-establishing themselves.

To make the change, run the following commands:

ALTER DATABASE YOURDB

SET READ_COMMITTED_SNAPSHOT ON;

ALTER DATABASE YOURDB

SET ALLOW_SNAPSHOT_ISOLATION ON

I struggle to see any downsides to changing the database this way. The performance gains we saw were impressive, and the price of a little extra activity in TempDB was a very small price to pay. Of course, there may be reasons why your transactions must be kept atomic, but for every database I create in the future, I will be using Snapshot Isolation.

Max degree of Parallelism in SQL Server

I was asked to review the Max Degree of Parallelism (MDOP) settings for one of our CRM database servers. Reading some of the web blogs, there appeared to be conflicting information out there regarding the optimum settings should be. Some have suggested that you should leave the default setting, some suggest reducing it 1 for servers serving high numbers of connections.

First lets describe what this parameter actually does.

The Max Degree of Parallelism dictates how many CPU cores SQL Server can dedicate to any single query. SQL Server is able to process parts of a query in parallel and dedicate multiple cores of CPU to the processing of that query. This parallel approach can be used when the query is retrieving data from multiple sources or to help process a particular part of a query. Part of that query’s execution time is to calculate how many cores should be used to process the query, so part of the motivation behind setting this to a value of 1 is that this portion of the query execution can be bypassed as SQL knows only one core should be used.

The MDOP setting is adjusted with the following command, and is applied server-wide immediately:

sp_configure 'show advanced options', 1;

RECONFIGURE WITH OVERRIDE;

sp_configure 'max degree of parallelism', 8;

RECONFIGURE WITH OVERRIDE;

As part of my investigation I used Profiler to extract a particularly long-running query on our database. The query was extracting all of the Active records of an entity, and sort them by a timestamp field. Because the entity was a CRM entity, the data was spread across the two CRM tables: the Base table where the entity management takes place and the ExtensionBase table where the user-defined data is stored, including my timestamp field. So SQL was generating a query which was essentially this:

SELECT [columns]

FROM [base table] B

INNER JOIN [extension table] E ON B.Id = E.Id

WHERE B.StateCode = 0 AND B.DeletionStateCode = 0

ORDER BY E.[Timestamp field]

Isolating the query and running it in Query Analyzer, I could see that SQL was querying the data from the Base table and Extension table, merging them and sorting the results. According to the execution plan, the results sort was taking 77% of the execution time. By varying the MDOP setting from 1 to 8, the query execution time varied from 24 seconds (MDOP=1), 6 seconds (MDOP=4) and 4.7 seconds (MDOP=8). It was clear then that the Sort part of the query was able to make use of the additional cores to return a quicker result.

What you then have to weigh up is that if you have several long-running queries running simultaneously and using many cores, there may be very limited CPU available for other operations.

Setting a value of 0 allows SQL to allocate all cores to a single query. Set it to this value if your database is used sporadically, but when you do query the database it is a large query and you want the results as soon as possible.

So in conclusion, reducing the MDOP to a very low value could result in long execution times, setting it to a high value (or zero) will throttle the number of queries that can execute at the same time. Depending on the number of cores available and how many parallel tasks you are trying to perform, you should set the MDOP to a compromise between the two. Setting it to a very low value can have a negative impact on the execution time of your query. If you aren’t sure what you are doing, best to leave it to the default setting but there may be good reason to set play with different settings, if the server isn’t performing as you want it to.

Wednesday, 13 February 2013

Anatomy of the CRM 4.0 Filtered View

Introduction

This post explains the way that filtered views work in a CRM 4 database, and the ways in which this impacts object sharing in CRM, and can have repercussions for the way in which you decide to organise your database hierarchy.
Filtered Views can be used directly in queries on a CRM database SQL Server, and are particularly useful when building customised reports through SQL Server Reporting Services. By querying the Filtered View in the report, and then installing the report in CRM, the user executing the report gets to see only the entities to which they have permission.

Example

Imagine a situation where I have two Team Leaders: A and B, and a number of Agents A1, A2, B1, B2 etc. A Team Leader role is created which has no rights to the Agent entity. A Supervisor then shares the agents A1, A2…An to Team Leader A and B1, B2…Bn to Team Leader B. A custom report is then created which accesses the Agent data by means of the Filtered View, and this is shared with Team Leaders A and B.
The Team Leaders can run the report, and will only see the data for the agents to whom they have been explicitly given access to: the report fully respects the rights granted in CRM, even though the CRM Web Service is bypassed in the generation of the report. This is very powerful, and means reports can be shared to users safe in the knowledge that the database is restricting the data that the report will contain.

Object Access Steps

When CRM decides if a user as rights to see an object in CRM, it makes a number of tests, of which at least one must come back positive:

- Does the user own the object?
- Does the role of the user allow access to the object at Business Unit level, and is the user in the same Business Unit?
- Does the role of the user allow access to the object at Child Business Unit level, and is the user in a Parental Business Unit?
- Has the object been explicitly share with the user?

To check the privilege granted to the object by the user, the function fn_GetMaxPrivilegeDepthMask is called, passing the Entity Type Id as parameter. This in turn calls the function fn_FindUserGuid to find the GUID of the user querying the filtered view using the SUSER_NAME() SQL function and joins it to the Roles and Privileges table to check access rights.
The column PrivilegeDepthMask in the RolePrivileges table is returned, which contains a bit mask indicating the level of access to the object permitted:

0x88 – Child Business Unit
0x44 – Global Access
0x22 – Business Unit

If none of these permissions are set, then a join is made to the Principal Object Access table. This table contains one row for each object shared with each user, so 10 objects shared with 3 users will have 30 records in this table. Remember also that if the object has a Parental Cascading relationship with child objects, that sharing an instance of an object to a user will also create an entry in the POA table for every child object also.

Filtered View Query

The Filtered View is now ready to be compiled. The pseudo code is:

SELECT {entity attribute list}
WHERE
PrivilegeRight = Child business unit AND Object Owner BU is in child business unit as Curernt User
OR
PrivilegeRight = Global
OR
PrivilegeRight = Business unit AND Object Owner is in same business unit as Current User
OR Id IN (SELECT Objects in POA for Current User)

Conclusions

So what conclusions can we draw from all of this? The main part is limiting the records in the Principal Object Access table. The more records that are in this table, the greater the overhead on the Filtered View query. Sharing too many objects will rapidly become a drain on database performance and will result in system slowdown. The rule of thumb is that on a reasonable system, the number of records in this table should be less than 5 million, but I would suggest an upper limit of no more than 1 million, and to aim for less than 100000 if possible. Use sharing sparingly and be aware that cascading permissions could be adding many records to the POA table.

When querying data via the CRM web service, the Filtered View is not used, but a very similar query is used instead, with very similar characteristics including the join to the POA table. Reducing records in the POA will have benefits here too.

The use of Business Units to partition data can reduce the number of records which require sharing. Use them wherever possible, providing they are sensible to reduce the need to share objects, and your database will perform better as a result.

Another way to reduce the impact of Filtered View queries is to query the list of parameters for your report using the Filtered View, but run the report directly from the entity view, once you are sure that the report input parameters will always restrict the output to those objects the user should be able to see. Default/empty parameters can trip you up here if you’re not careful.

Don’t be scared of sharing objects – it can be the only way to solve certain problems – but know the impact of what will happen to the database from those shares and then you can make the best choice.

Tuesday, 5 February 2013

Pivoting Columns to Rows in SQL Server

I've found this trick useful, and maybe you will too.

I was given a table with a number of columns which needed to be turned into a vertical table with one row for each column as name-value pairs.

i.e. changing:
Title FirstName Surname
------- ----------- --------
Mr John Smith

into:
NamePart NameValue
----------- ---------
Title Mr
FirstName John
Surname Smith

To solve this, I created a table with one column and 3 distinct rows: one for each column I wanted to split out, and included it unjoined in the query. I then used a CASE to return the appropriate column for each value in the temporary table, and sorted the output by the temporary table value to keep the ordering. Of course this only works if all the columns have the same data type, otherwise they will have to be CAST to a common type.

The example makes more sense, so here is the code:

declare @Tab table (Id INT IDENTITY(1,1), Title VARCHAR(30), FirstName VARCHAR(30), Surname VARCHAR(30));

insert into @Tab values ('Mr','John','Smith');
insert into @Tab values ('Dr','David','Jones');
insert into @Tab values ('Mrs','Sally','Brown');

select Id, Title, FirstName, Surname
from @Tab;

with P (Col1)
as
(select Col1=1
union all
select Col1=2
union all
select Col1=3)

select T.Id,
case P.Col1
when 1 then 'Title'
when 2 then 'FirstName'
when 3 then 'Surname'
end as [Name Part],
case P.Col1
when 1 then T.Title
when 2 then T.FirstName
when 3 then T.Surname
end as [Name Value]
from @Tab T, P
order by T.Id, P.Col1;

Output:

Id Title FirstName Surname
----------- ------------ ------------- ------------------------------
1 Mr John Smith
2 Dr David Jones
3 Mrs Sally Brown

Id NamePart NameValue
----------- --------- ------------------------------
1 Title Mr
1 FirstName John
1 Surname Smith
2 Title Dr
2 FirstName David
2 Surname Jones
3 Title Mrs
3 FirstName Sally
3 Surname Brown

Tuesday, 22 January 2013

What is and where is my GUID?

Looking after a SQL database which was created by CRM, leaves you either loving or loathing GUIDs. Personally, I like GUIDs. They aren't suitable all of the time as a unique identifier compared to say, a SMALLINT, but they have their place. I like them because they are unique, not just in my database, but there are so many potential values that any GUID I will have will never be in your database. Ever! This means I could give you my schema, you could populate it with data (including your own GUIDs) and at some point in the future we could merge our two databases together seamlessly. You couldn't do that with integers, even a 64-bit one.

Lets rewind a bit and review what a GUID is, before we start hunting our database for one. GUIDs come in differing forms, feel free to read all about them on Wikipedia.

In SQL Server a GUID is a 16-byte hexadecimal number with a datatype of UNIQUEIDENTIFIER, and we can generate a new one using the NEWID() function. When queried, the GUID is represented in groups of bytes in a 4-2-2-2-6 pattern. e.g.:

DECLARE @myGUID UNIQUEIDENTIFIER = NEWID()
SELECT @myGUID

Result:

------------------------------------
399E4198-490B-4A32-B824-F01DC785B56F

Great, so we have a GUID. Now what?

In my job, I am often presented with a GUID, and asked what it is. Very often, someone will have been investigating a log file and found a GUID in an error message, but not know what object type it belongs to, just that it exists somewhere in the database.

So I created a little script to spit out some SQL that would generate a query for every table/column combination in the entire database where the column datatype is UNIQUEIDENTIFER. Be sure to extend the Query Analyzer output to more than the default 256 characters per column and to send the results to text so that you can copy the script to the clipboard.

The SQL is:

SELECT 'PRINT ''Table: ' + OBJECT_NAME(C.[object_id]) + ' Column: ' + C.[name] + ''';
SELECT * FROM ' + OBJECT_NAME(C.[object_id]) + ' WITH (NOLOCK) WHERE ' + C.[name] + '=''' + CAST(@myGUID AS CHAR(36)) + ''';' AS [SQL]
FROM sys.columns C
INNER JOIN sys.types T ON C.user_type_id = T.user_type_id
WHERE T.[name] = 'UNIQUEIDENTIFIER'
AND OBJECTPROPERTY(C.[object_id], N'IsUserTable') = 1
ORDER BY OBJECT_NAME(C.[object_id]), C.[name]

The output is:

PRINT 'Table: myTable1 Column: Col1';

SELECT * FROM myTable1 WITH (NOLOCK) WHERE Col1='954FF5D8-3442-4CD5-B25C-323A616CBD61';

PRINT 'Table: myTable2 Column: Col2';

SELECT * FROM myTable2 WITH (NOLOCK) WHERE Col2='954FF5D8-3442-4CD5-B25C-323A616CBD61';

...

Now paste the SQL into a new query window, and leave it spin for a while. As the column is going to be an identifier of some sort, the chances are it will be indexed so shouldn't take too long. Once all the queries of run, you should be able to search the results for your GUID, and find all the instances of it. This will then show you what table(s) it is found and and in which column. You may find it in a relational table joining two tables together that you hadn't even previously considered. I find that a lot when I inherit a Dynamics CRM database and am presented with this problem.

Of course there is plenty of room for refinement. I could have put each row into its own SQL variable and executed it, I could have put a READ UNCOMMITTED at the front of the query instead of putting a NOLOCK on each query, or set the ROWCOUNT for the query batch (in case the GUID is something like a CRM object owner and appears thousands or millions of times). But I'll leave that to you, now you know what the gaps are and what is unique to your database. Happy hunting!

Monday, 21 January 2013

Data Compression in SQL Server 2008 R2

I've been working with a database which has been struggling for performance of late. The issue was we had a database deployed which required re-indexing, but had increasingly 24x7 demands placed upon it. Standard Edition 2008 R2 did not permit Online re-indexing, but Enterprise Edition does, along with a host of other features. Finally an Enterprise Edition was approved, allowing online re-indexing but also another feature: database compression.

Database Compression comes in a number of guises: Backup Compression, Row Compression, Page Compression.

Backup Compression

Backup Compression compresses backups as they are taken, and apply to complete backups, differential backups and transaction log backups. As a rule-of-thumb, I usually reckon my backups are about 75% of the combined data and index size, i.e. a 50GB database with 40GB of data and indexing will create a backup file of around 30GB. It varies depending on the content of your database, but for finger-in-air guesstimates, its a ratio that has always worked for me. As soon as backup compression was applied, this reduced to about 30% of the size. Not only that, but instead of taking 21 minutes to write to disk, it was written in 9. The penalty, as would be expected, is increased CPU. I may have missed the performance counter twitch, but the CPUs on my 16-core system barely registered a flicker. If you have Enterprise Edition, then unless your CPUs hover round the 90% mark all day every day, then firstly you should get that sorted, and then you should enable backup compression.

Row and Page Compression

Enterprise Edition also allows the compression of the database itself. Two levels of compression are offered: Row Compression and Page Compression.

The effects of Row Compression on your data is described here:
http://msdn.microsoft.com/en-us/library/cc280576(v=sql.105).aspx
And Page Compression:
http://msdn.microsoft.com/en-us/library/cc280464(v=sql.105).aspx

Handily, the best way to check if Row and Page compression will work for you is to use the Stored Procedure:
EXEC sp_estimate_data_compression_savings @schema, @table, @index, @partition, @compressiontype
e.g.
EXEC sp_estimate_data_compression_savings 'dbo','MyTable',2,null,'PAGE'

Give the Stored Procedure your Schema, table name, index, partition and compression type, and SQL will take a sample of the data in the table and report how much space you will save by using that compression type. Leave the parameters Index and Partition as NULL if you want to get the values for all the indexes and all partitions. The values you can set for compression are PAGE, ROW and NONE.

The resultant recordset is:
Object_Name: Name of your table
Schema_Name: The schema in which it belongs
Index Id: Index number
Size_With_Current_Compression_Setting(KB): As described
Size_With_Requested_Compression_Setting(KB): Estimate of table size with compression applied (or removed)
Sample_Size_With_Current_Compression_Setting(KB): As described
Sample_Size_With_Requested_Compression_Setting(KB): As described

Index Compression?

The output from the Stored Procedure is worth a bit more explanation. The data in the table can be compressed, but also the indexes. When you index a column or columns in SQL Server, SQL will store a copy of the data referenced by the index in the index itself. This is so that if you query a value from the index, the index can be traversed, and the value found and returned without referring to the table itself. If you need an non-indexed value, SQL will traverse the index, look up the data page where the row is found, get the data page, find the data and return it - much more expensive in terms of data retrieval and processing time. But the upshot is that the data in the index can be compressed, hence the ability to compress data in the indexes separately from the table.

A final potential area of confusion is the clustered index. I assume all your tables have a clustered index? (Even if your database is a CRM database, and CRM has made a poor choice!). When you compress your clustered index, you are compressing your table data. However, you aren't required to have a clustered index on your table, so it is possible to compress your data without.

Compressing Your Data

To compress your index, use the ALTER INDEX command, with the option: DATA_COMPRESSION=PAGE, ROW or NONE. e.g.:

ALTER INDEX [ndx_Table] ON [dbo].[myTable] REBUILD PARTITION = ALL WITH ( DATA_COMPRESSION = PAGE)

A neat extension with Enterprise Edition is that you can rebuild your index with ONLINE=ON, so that database availability can be maintained whilst your data or index is being compressed.

If you are compressing data in a table without a clustered index, then use the ALTER TABLE command instead:

ALTER TABLE [dbo].[myTable] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION =  PAGE)

Performance

The compressing and decompressing of indexes requires more CPU cycles each time a table is written to or queried. I compressed a CRM Base table with approx 1.3M rows. The table has 6 indexes, 5 of which are compressed. Uncompressed the table has the following sizes:

Rows: 1284350
Data: 186952KB
Indexing: 313552KB

And with Page compression:

Data: 64856KB
Indexing: 228256KB

Compressed, the table is approx 60% of the size of the uncompressed table.

Inserting 100K rows took 10s without compression applied, 15s with page compression
Querying the table and aggregating across all rows gave query times approx 120ms both with and without compression. The increase in CPU for the query was not possible to quantify above the general chatter on the server.
I have read some claimed performance gains of a factor of 2-3 for database compression. If your database is large, and your tables generally have to be retrieved from disk for table scans, then the compression will really have a performance benefit as fewer pages have to be read from the disk. If your database is already wholly in memory, then the reduction in data pages read from memory will balance out the increase in CPU time to uncompress the data.

Conclusions

Row and Page compression can result is some worthwhile reductions in the size of your tables and indexing. The reduction could leave your database able to be resident in RAM and eliminate disk access, and that will really give your database a performance boost. The increase in complexity for compression will slow down your inserts, updates and deletes, and take up CPU cycles. The more indexes you have on your table, the greater the amount of time will be required to compress the data in them, and the slower these transactions will be. If your database has few writes but many queries, then compression will really work well for you.