Monday 21 January 2013

Data Compression in SQL Server 2008 R2

I've been working with a database which has been struggling for performance of late. The issue was that the database needed re-indexing, but had increasingly 24x7 demands placed upon it. Standard Edition of SQL Server 2008 R2 does not permit online re-indexing, but Enterprise Edition does, along with a host of other features. Eventually an upgrade to Enterprise Edition was approved, allowing online re-indexing but also another feature: data compression.

Data compression comes in a number of guises: Backup Compression, Row Compression and Page Compression.


Backup Compression

Backup Compression compresses backups as they are taken, and applies to full backups, differential backups and transaction log backups. As a rule of thumb, I usually reckon my backups are about 75% of the combined data and index size, i.e. a 50GB database with 40GB of data and indexing will create a backup file of around 30GB. It varies depending on the content of your database, but for finger-in-the-air guesstimates it's a ratio that has always worked for me. As soon as backup compression was applied, this reduced to about 30% of the size. Not only that, but instead of taking 21 minutes to write to disk, the backup was written in 9. The penalty, as you would expect, is increased CPU. I may have missed the performance counter twitch, but the CPUs on my 16-core system barely registered a flicker. If you have Enterprise Edition then, unless your CPUs hover around the 90% mark all day every day (in which case you should get that sorted first), you should enable backup compression.
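
You can enable it per-backup, or set it as the server-wide default. A minimal example (the database name and backup path below are placeholders, not from my system):

-- Compress a single backup
BACKUP DATABASE [MyDatabase]
TO DISK = 'D:\Backups\MyDatabase.bak'
WITH COMPRESSION;

-- Or make compression the default for all backups on the server
EXEC sp_configure 'backup compression default', 1;
RECONFIGURE;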


Row and Page Compression

Enterprise Edition also allows the compression of the database itself. Two levels of compression are offered: Row Compression and Page Compression. 

The effects of Row Compression on your data are described here:
http://msdn.microsoft.com/en-us/library/cc280576(v=sql.105).aspx
And Page Compression:
http://msdn.microsoft.com/en-us/library/cc280464(v=sql.105).aspx

Handily, the best way to check whether Row or Page compression will work for you is to use the Stored Procedure:
EXEC sp_estimate_data_compression_savings @schema_name, @object_name, @index_id, @partition_number, @data_compression
e.g.
EXEC sp_estimate_data_compression_savings 'dbo','MyTable',2,null,'PAGE'

Give the Stored Procedure your schema, table name, index, partition and compression type, and SQL Server will take a sample of the data in the table and report how much space you would save by using that compression type. Leave the index and partition parameters as NULL if you want to get the values for all the indexes and all partitions, as in the example below. The values you can set for compression are PAGE, ROW and NONE.
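
e.g. to sample PAGE compression savings across every index and every partition of the same (hypothetical) table:
EXEC sp_estimate_data_compression_savings 'dbo', 'MyTable', NULL, NULL, 'PAGE'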

The resultant recordset is:
Object_Name: Name of your table
Schema_Name: The schema to which it belongs
Index_Id: Index number (0 for a heap, 1 for the clustered index, 2 and above for non-clustered indexes)
Partition_Number: Partition number (1 for a non-partitioned table)
Size_With_Current_Compression_Setting(KB): Current size of the object
Size_With_Requested_Compression_Setting(KB): Estimate of the object's size with the requested compression applied (or removed)
Sample_Size_With_Current_Compression_Setting(KB): Size of the sample taken, with the current compression setting
Sample_Size_With_Requested_Compression_Setting(KB): Size of the sample with the requested compression applied
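
If you want to compare savings across several tables, one approach is to capture the recordset with INSERT ... EXEC and compute the percentage saving yourself. This is just a sketch - the temp table columns are my own names, matched positionally to the output described above:

CREATE TABLE #savings (
    object_name sysname,
    schema_name sysname,
    index_id int,
    partition_number int,
    size_current_KB bigint,
    size_requested_KB bigint,
    sample_current_KB bigint,
    sample_requested_KB bigint
);

INSERT INTO #savings
EXEC sp_estimate_data_compression_savings 'dbo', 'MyTable', NULL, NULL, 'PAGE';

-- Estimated percentage saved per index
SELECT object_name, index_id, size_current_KB, size_requested_KB,
       100 - (100 * size_requested_KB / NULLIF(size_current_KB, 0)) AS saving_pct
FROM #savings;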


Index Compression?

The output from the Stored Procedure is worth a bit more explanation. It is not just the data in the table that can be compressed, but also the indexes. When you index a column or columns in SQL Server, SQL stores a copy of the indexed data in the index itself. This is so that if you query a value covered by the index, the index can be traversed, and the value found and returned without referring to the table itself. If you need a non-indexed value, SQL will traverse the index, look up the data page where the row is found, fetch the data page, find the data and return it - much more expensive in terms of data retrieval and processing time. The upshot is that the data held in an index can be compressed, hence the ability to compress the indexes separately from the table.

A final potential area of confusion is the clustered index. I assume all your tables have a clustered index? (Even if your database is a CRM database, and CRM has made a poor choice!) When you compress your clustered index, you are compressing your table data, because the leaf level of the clustered index is the table data itself. However, you aren't required to have a clustered index on your table, so it is possible to compress your data without one.
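
If you aren't sure which of your tables lack a clustered index, heaps appear in sys.indexes with a type of 0, so a quick query along these lines will list them:

SELECT s.name AS schema_name, o.name AS table_name
FROM sys.objects o
JOIN sys.schemas s ON s.schema_id = o.schema_id
JOIN sys.indexes i ON i.object_id = o.object_id
WHERE o.type = 'U'   -- user tables only
  AND i.type = 0;    -- type 0 = heap (no clustered index)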


Compressing Your Data

To compress an index, use the ALTER INDEX command with the option DATA_COMPRESSION = PAGE, ROW or NONE, e.g.:

ALTER INDEX [ndx_Table] ON [dbo].[myTable] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE)

A neat extension with Enterprise Edition is that you can rebuild your index with ONLINE = ON, so that database availability is maintained while your data or index is being compressed.
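
Combining the two, the rebuild above (same hypothetical index and table names) can be kept online while the compression is applied:

ALTER INDEX [ndx_Table] ON [dbo].[myTable] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE, ONLINE = ON)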

If you are compressing data in a table without a clustered index, then use the ALTER TABLE command instead:

ALTER TABLE [dbo].[myTable] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION =  PAGE)
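
Either way, you can check what compression setting is actually in place afterwards by querying sys.partitions:

SELECT OBJECT_NAME(p.object_id) AS table_name,
       p.index_id,
       p.partition_number,
       p.data_compression_desc
FROM sys.partitions p
WHERE p.object_id = OBJECT_ID('dbo.myTable');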



Performance

Compressing and decompressing data costs extra CPU cycles each time a table is written to or queried. I compressed a CRM base table with approx 1.3M rows. The table has 6 indexes, 5 of which are compressed. Uncompressed, the table has the following sizes:

Rows: 1284350
Data: 186952KB
Indexing: 313552KB

And with Page compression:

Data: 64856KB
Indexing: 228256KB

Compressed, the table is approx 60% of the size of the uncompressed table.
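
Sizes like those above are conveniently measured with sp_spaceused, which reports row count, data size and index size for a table - a handy way to capture the before and after:

EXEC sp_spaceused 'dbo.myTable';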

Inserting 100K rows took 10s without compression applied, and 15s with page compression.
Querying the table and aggregating across all rows gave query times of approx 120ms both with and without compression. The increase in CPU for the query was impossible to distinguish from the general chatter on the server.
I have read claims of performance gains of a factor of 2-3 from database compression. If your database is large, and your tables generally have to be retrieved from disk for table scans, then compression will give a real performance benefit, as fewer pages have to be read from the disk. If your database already sits wholly in memory, then the reduction in data pages read from memory will tend to balance out the increase in CPU time needed to decompress the data.


Conclusions

Row and Page compression can result in some worthwhile reductions in the size of your tables and indexes. The reduction could leave your database able to reside entirely in RAM, eliminating disk access, and that will really give your database a performance boost. The extra work of compressing and decompressing will slow down your inserts, updates and deletes, and take up CPU cycles. The more indexes you have on your table, the more data there is to compress, and the slower these transactions will be. If your database has few writes but many queries, then compression will really work well for you.











