Tuesday 22 January 2013

What is and where is my GUID?

Looking after a SQL database which was created by CRM leaves you either loving or loathing GUIDs. Personally, I like GUIDs. They aren't always the right choice of unique identifier compared to, say, a SMALLINT, but they have their place. I like them because they are unique not just within my database: there are so many potential values that any GUID I hold will never turn up in your database. Ever! This means I could give you my schema, you could populate it with data (including your own GUIDs) and at some point in the future we could merge our two databases together seamlessly. You couldn't do that with integers, even 64-bit ones.

Let's rewind a bit and review what a GUID is before we start hunting our database for one. GUIDs come in differing forms; feel free to read all about them on Wikipedia.

In SQL Server a GUID is a 16-byte value with a datatype of UNIQUEIDENTIFIER, and we can generate a new one using the NEWID() function. When queried, the GUID is displayed as hexadecimal digits in byte groups following a 4-2-2-2-6 pattern, e.g.:


DECLARE @myGUID UNIQUEIDENTIFIER = NEWID()
SELECT @myGUID

Result:

------------------------------------
399E4198-490B-4A32-B824-F01DC785B56F

Great, so we have a GUID. Now what?

In my job, I am often presented with a GUID and asked what it is. Very often, someone will have been investigating a log file and found a GUID in an error message, without knowing what object type it belongs to - just that it exists somewhere in the database.

So I created a little script to spit out some SQL that generates a query for every table/column combination in the entire database where the column datatype is UNIQUEIDENTIFIER. Be sure to extend the Query Analyzer output to more than the default 256 characters per column, and to send the results to text, so that you can copy the generated script to the clipboard.

The SQL is:


-- Set @myGUID to the GUID you are hunting for
DECLARE @myGUID UNIQUEIDENTIFIER = '954FF5D8-3442-4CD5-B25C-323A616CBD61'

SELECT 'PRINT ''Table: ' + OBJECT_NAME(C.[object_id]) + ' Column: ' + C.[name] + ''';
SELECT * FROM ' + OBJECT_NAME(C.[object_id]) + ' WITH (NOLOCK) WHERE ' + C.[name] + '=''' + CAST(@myGUID AS CHAR(36)) + ''';' AS [SQL]
FROM sys.columns C
INNER JOIN sys.types T ON C.user_type_id = T.user_type_id
WHERE T.[name] = 'UNIQUEIDENTIFIER'
AND OBJECTPROPERTY(C.[object_id], N'IsUserTable') = 1
ORDER BY OBJECT_NAME(C.[object_id]), C.[name]

The output is:

PRINT 'Table: myTable1 Column: Col1';
SELECT * FROM myTable1 WITH (NOLOCK) WHERE Col1='954FF5D8-3442-4CD5-B25C-323A616CBD61';
PRINT 'Table: myTable2 Column: Col2';
SELECT * FROM myTable2 WITH (NOLOCK) WHERE Col2='954FF5D8-3442-4CD5-B25C-323A616CBD61';
...

Now paste the generated SQL into a new query window and leave it to spin for a while. As the column is going to be an identifier of some sort, the chances are it will be indexed, so it shouldn't take too long. Once all the queries have run, you should be able to search the results for your GUID and find all the instances of it. This will then show you which table(s) it is found in and in which column. You may find it in a relationship table joining two tables together that you hadn't even previously considered. I find that a lot when I inherit a Dynamics CRM database and am presented with this problem.

Of course there is plenty of room for refinement. I could have put each generated statement into a SQL variable and executed it, I could have set READ UNCOMMITTED at the front of the batch instead of putting a NOLOCK hint on each query, or I could have set ROWCOUNT for the batch (in case the GUID is something like a CRM object owner and appears thousands or millions of times). But I'll leave the polishing to you, now that you know what the gaps are and what is unique to your database. Happy hunting!
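
For what it's worth, here is a minimal sketch of that first refinement - executing each generated statement directly rather than copying it to a new window. The cursor, the variable names and the READ UNCOMMITTED/ROWCOUNT settings are my own additions, not part of the original script:

-- A minimal sketch: execute each generated statement directly instead of
-- copying the script to a new window. READ UNCOMMITTED replaces the NOLOCK
-- hints and SET ROWCOUNT caps the rows returned per table.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SET ROWCOUNT 100;

DECLARE @myGUID UNIQUEIDENTIFIER = '954FF5D8-3442-4CD5-B25C-323A616CBD61';
DECLARE @sql NVARCHAR(MAX);

DECLARE guid_cursor CURSOR FAST_FORWARD FOR
    SELECT N'SELECT ''' + OBJECT_NAME(C.[object_id]) + ''' AS [Table], '''
         + C.[name] + ''' AS [Column], * FROM ' + QUOTENAME(OBJECT_NAME(C.[object_id]))
         + ' WHERE ' + QUOTENAME(C.[name]) + ' = @guid'
    FROM sys.columns C
    INNER JOIN sys.types T ON C.user_type_id = T.user_type_id
    WHERE T.[name] = 'UNIQUEIDENTIFIER'
      AND OBJECTPROPERTY(C.[object_id], N'IsUserTable') = 1;

OPEN guid_cursor;
FETCH NEXT FROM guid_cursor INTO @sql;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC sp_executesql @sql, N'@guid UNIQUEIDENTIFIER', @guid = @myGUID;
    FETCH NEXT FROM guid_cursor INTO @sql;
END
CLOSE guid_cursor;
DEALLOCATE guid_cursor;

SET ROWCOUNT 0;  -- reset the row cap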

Monday 21 January 2013

Data Compression in SQL Server 2008 R2

I've been working with a database which has been struggling for performance of late. The issue was that we had a database deployed which required re-indexing, but which had increasingly 24x7 demands placed upon it. Standard Edition of SQL Server 2008 R2 does not permit online re-indexing, but Enterprise Edition does, along with a host of other features. Finally an Enterprise Edition licence was approved, allowing online re-indexing but also unlocking another feature: data compression.

Compression in SQL Server comes in a number of guises: Backup Compression, Row Compression and Page Compression.


Backup Compression

Backup Compression compresses backups as they are taken, and applies to full backups, differential backups and transaction log backups. As a rule of thumb, I usually reckon my backups are about 75% of the combined data and index size, i.e. a 50GB database with 40GB of data and indexes will create a backup file of around 30GB. It varies depending on the content of your database, but for finger-in-the-air guesstimates it's a ratio that has always worked for me. As soon as backup compression was applied, this reduced to about 30% of the size. Not only that, but instead of taking 21 minutes to write to disk, the backup was written in 9. The penalty, as you would expect, is increased CPU. I may have missed the performance counter twitch, but the CPUs on my 16-core system barely registered a flicker. If you have Enterprise Edition then, unless your CPUs hover around the 90% mark all day every day (in which case you should get that sorted first), you should enable backup compression.
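
As a sketch (the database name and backup path here are made up), backup compression is just a WITH COMPRESSION option on the BACKUP command, and it can also be made the default for the whole instance:

-- Compress a single backup (database name and path are hypothetical)
BACKUP DATABASE [MyDatabase]
TO DISK = N'D:\Backups\MyDatabase.bak'
WITH COMPRESSION, STATS = 10;

-- Or make compression the default for every backup taken on the instance
EXEC sp_configure 'backup compression default', 1;
RECONFIGURE;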


Row and Page Compression

Enterprise Edition also allows the compression of the database itself. Two levels of compression are offered: Row Compression and Page Compression. 

The effects of Row Compression on your data are described here:
http://msdn.microsoft.com/en-us/library/cc280576(v=sql.105).aspx
And Page Compression:
http://msdn.microsoft.com/en-us/library/cc280464(v=sql.105).aspx

Handily, the best way to check whether Row or Page compression will work for you is to use the stored procedure:
EXEC sp_estimate_data_compression_savings @schema_name, @object_name, @index_id, @partition_number, @data_compression
e.g.
EXEC sp_estimate_data_compression_savings 'dbo','MyTable',2,null,'PAGE'

Give the stored procedure your schema, table name, index ID, partition number and compression type, and SQL Server will take a sample of the data in the table and report how much space you would save by using that compression type. Leave the index and partition parameters as NULL if you want to get the values for all indexes and all partitions. The values you can set for compression are PAGE, ROW and NONE.
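
For example, to estimate the savings across all indexes and partitions of a table (the table name here is hypothetical) for both compression types:

-- Estimate ROW and PAGE savings for every index and partition of dbo.MyTable
EXEC sp_estimate_data_compression_savings 'dbo', 'MyTable', NULL, NULL, 'ROW';
EXEC sp_estimate_data_compression_savings 'dbo', 'MyTable', NULL, NULL, 'PAGE';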

The resultant recordset contains:
Object_Name: Name of your table
Schema_Name: The schema to which it belongs
Index_Id: ID of the heap, clustered index or non-clustered index
Size_With_Current_Compression_Setting(KB): Current size of the table or index
Size_With_Requested_Compression_Setting(KB): Estimated size with the requested compression applied (or removed)
Sample_Size_With_Current_Compression_Setting(KB): Size of the sampled data with the current setting
Sample_Size_With_Requested_Compression_Setting(KB): Size of the sampled data with the requested setting


Index Compression?

The output from the stored procedure is worth a bit more explanation. The data in the table can be compressed, but so can the indexes. When you index a column or columns in SQL Server, SQL will store a copy of the data referenced by the index in the index itself. This means that if you query a value held in the index, the index can be traversed and the value found and returned without referring to the table itself. If you need a non-indexed value, SQL will traverse the index, look up the data page where the row is found, get the data page, find the data and return it - much more expensive in terms of data retrieval and processing time. The upshot is that the data in the index can also be compressed, hence the ability to compress the indexes separately from the table.

A final potential area of confusion is the clustered index. I assume all your tables have a clustered index? (Even if your database is a CRM database, and CRM has made a poor choice of one!) When you compress your clustered index, you are compressing your table data. However, you aren't required to have a clustered index on your table, so it is also possible to compress your data without one.
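
If you want to see what compression (if any) is currently applied to each table, index and partition, sys.partitions records it. A quick check along these lines (my own query, not part of the estimating procedure above):

-- Current compression setting for every user table, index and partition
SELECT OBJECT_NAME(p.[object_id]) AS [Table],
       ISNULL(i.[name], 'HEAP') AS [Index],
       p.partition_number,
       p.data_compression_desc
FROM sys.partitions p
INNER JOIN sys.indexes i ON p.[object_id] = i.[object_id] AND p.index_id = i.index_id
WHERE OBJECTPROPERTY(p.[object_id], N'IsUserTable') = 1
ORDER BY [Table], i.index_id, p.partition_number;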


Compressing Your Data

To compress your index, use the ALTER INDEX command with the option DATA_COMPRESSION = PAGE, ROW or NONE, e.g.:

ALTER INDEX [ndx_Table] ON [dbo].[myTable] REBUILD PARTITION = ALL WITH ( DATA_COMPRESSION = PAGE)

A neat extension with Enterprise Edition is that you can rebuild your index with ONLINE=ON, so that database availability can be maintained whilst your data or index is being compressed.
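
For example, reusing the index and table names from above, an online compressing rebuild looks like:

ALTER INDEX [ndx_Table] ON [dbo].[myTable]
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE, ONLINE = ON)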

If you are compressing data in a table without a clustered index, then use the ALTER TABLE command instead:

ALTER TABLE [dbo].[myTable] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE)



Performance

The compressing and decompressing of data requires more CPU cycles each time a table is written to or queried. I compressed a CRM base table with approx 1.3M rows. The table has 6 indexes, 5 of which were compressed. Uncompressed, the table has the following sizes:

Rows: 1284350
Data: 186952KB
Indexing: 313552KB

And with Page compression:

Data: 64856KB
Indexing: 228256KB

Compressed, the table is approx 60% of the size of the uncompressed table.
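
If you want to take the same before-and-after measurements on your own tables, sp_spaceused reports the row count, data size and index size (the table name here is hypothetical):

-- Row count, reserved, data and index sizes for a single table
EXEC sp_spaceused 'dbo.myTable'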

Inserting 100K rows took 10s without compression applied and 15s with page compression.
Querying the table and aggregating across all rows gave query times of approx 120ms both with and without compression. The increase in CPU for the query was not possible to quantify above the general chatter on the server.
I have read some claimed performance gains of a factor of 2-3 for database compression. If your database is large, and your tables generally have to be retrieved from disk for table scans, then compression will bring a real performance benefit as fewer pages have to be read from disk. If your database already sits wholly in memory, then the reduction in data pages read from memory will roughly balance out the increase in CPU time to decompress the data.
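
As a rough way to gauge how much of a database is already sitting in memory (and therefore how much disk I/O compression could actually save you), you can count its pages in the buffer pool. A sketch, run in the database in question:

-- Approximate space the current database occupies in the buffer pool (8KB pages)
SELECT COUNT(*) * 8 / 1024 AS buffer_pool_MB
FROM sys.dm_os_buffer_descriptors
WHERE database_id = DB_ID()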


Conclusions

Row and Page compression can result in some worthwhile reductions in the size of your tables and indexes. The reduction could leave your database small enough to be resident in RAM and eliminate disk access, and that will really give your database a performance boost. The extra work of compressing and decompressing will slow down your inserts, updates and deletes, and take up CPU cycles. The more indexes you have on your table, the more time will be required to compress the data in them, and the slower these transactions will be. If your database has few writes but many queries, then compression will really work well for you.