I/O Efficiency - Storage Performance Optimization

In a typical business application, data builds over time as sales are recorded.  Data is stored in blocks for tables and in leaf pages for indices, and the number of blocks & leaf pages grows.  A full table scan or a full index scan performs more I/O operations, so the execution time of SQL statements using these access paths grows.  Users and applications experience performance as execution time.  Time is the critical metric.

In seeming opposition, the blocks in a relational database will typically contain fewer and fewer rows over time, making each I/O operation less efficient.  This is due to logical deletes (rows are merely marked as deleted) and to block & page splits when data is inserted.  When a split occurs, a new page is added and roughly half of the rows move to it, leaving both pages partially full.  Inefficient I/O is like sending a shipping container halfway around the world, opening the door, and finding one package of noodles.  While everyone loves a steaming hot meal and is grateful for it, can its delivery be more efficient, consume less energy, and take less time?
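On SQL Server, you can observe exactly how full each page is with the `sys.dm_db_index_physical_stats` dynamic management function.  The sketch below assumes a hypothetical table named dbo.Sales; note that the `avg_page_space_used_in_percent` column is only populated in SAMPLED or DETAILED mode, not LIMITED.

```sql
-- Hypothetical table dbo.Sales; requires VIEW DATABASE STATE permission.
-- 'SAMPLED' mode populates avg_page_space_used_in_percent; 'LIMITED' does not.
SELECT  i.name                            AS index_name,
        ps.page_count,
        ps.avg_page_space_used_in_percent,   -- how full each page really is
        ps.avg_fragmentation_in_percent      -- logical fragmentation
FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Sales'),
                                       NULL, NULL, 'SAMPLED') AS ps
JOIN    sys.indexes AS i
        ON i.object_id = ps.object_id AND i.index_id = ps.index_id;
```

A low average page-space-used figure is the half-empty shipping container: every read of that page carries mostly air.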

Imagine if the storage subsystem shipped a block all the way to the database and it was empty: a lot of work expended for no benefit.  A storage block is like a shipping container.  The computer’s operating system does I/O operations on blocks, not rows; in turn, a block or page contains rows of data.  The more rows each block holds, the more efficient each I/O operation.

Most business applications add data, update it, and delete it.  Relational databases simply mark a row when it is deleted; it still consumes space.  This is called wasted (white) space due to logical deletes.  Applications like PeopleSoft insert rows into a table, use the data for processing (joins), and then delete all of the rows.  Temp tables are used by many applications.
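On SQL Server, a quick way to see this white space is `sp_spaceused`, run before and after a mass delete.  The staging table name below is hypothetical:

```sql
-- Hypothetical staging table; run before and after a mass DELETE.
EXEC sp_spaceused N'dbo.SalesStaging';
-- Returns: name, rows, reserved, data, index_size, unused.
-- A large 'unused' figure after the delete is space still allocated
-- to the table but holding no rows -- the white space described above.
```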

How do you pack blocks and pages with the maximum number of rows?

Relational databases offer reorgs, rebuilds, and drop/create to pack each block with rows so that each I/O operation is more efficient.  Space will be recovered too.  Measure the number of blocks & pages before and after; the percentage space savings is a rough estimate of the time savings each time a scan is performed.  If the SQL statements accessing the index execute numerous times, the aggregated time and resource savings will add up.  This has a nice ripple effect, as those resources become available for other processes, and it extends to other virtual servers in a VMware environment.
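On SQL Server, the before-and-after measurement and the maintenance operation itself look like the sketch below.  The table and index names (dbo.Sales, IX_Sales_Date) are hypothetical:

```sql
-- Capture page_count before the operation (hypothetical table dbo.Sales).
SELECT page_count
FROM   sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Sales'),
                                      NULL, NULL, 'LIMITED');

-- Reorganize (always online, lighter weight) or rebuild (more thorough):
ALTER INDEX IX_Sales_Date ON dbo.Sales REORGANIZE;
-- ALTER INDEX IX_Sales_Date ON dbo.Sales REBUILD;
--   (add WITH (ONLINE = ON) on editions that support online rebuilds)

-- ...then run the page_count query again.  The percentage drop in pages
-- approximates the relative time savings for each future scan.
```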

Minimizing time accessing storage will reduce the execution time of SQL statements.  It will reduce contention and the likelihood that locking & blocking will interfere with processing.  Faster processing means better application performance and more productive users.  This has a true economic benefit.

Reorgs and rebuilds can be completed ad hoc, scripted & scheduled, or automated with a tool like IDERA’s SQL Defrag Manager.
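A scripted & scheduled approach might look like the following sketch, which walks every index in the current database and chooses reorganize or rebuild using commonly cited fragmentation thresholds (reorganize between 5% and 30%, rebuild above 30%).  The thresholds and the minimum page count are assumptions to tune for your own workload:

```sql
-- Sketch of a scheduled maintenance pass.  Thresholds (5%/30%) and the
-- 100-page floor are assumptions; tune them for your workload.
DECLARE @sql nvarchar(max);
DECLARE cur CURSOR FOR
    SELECT N'ALTER INDEX ' + QUOTENAME(i.name) + N' ON '
           + QUOTENAME(s.name) + N'.' + QUOTENAME(o.name)
           + CASE WHEN ps.avg_fragmentation_in_percent > 30
                  THEN N' REBUILD;' ELSE N' REORGANIZE;' END
    FROM   sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ps
    JOIN   sys.indexes i ON i.object_id = ps.object_id AND i.index_id = ps.index_id
    JOIN   sys.objects o ON o.object_id = ps.object_id
    JOIN   sys.schemas s ON s.schema_id = o.schema_id
    WHERE  ps.avg_fragmentation_in_percent > 5
      AND  ps.page_count > 100          -- skip tiny indexes
      AND  i.name IS NOT NULL;          -- skip heaps
OPEN cur;
FETCH NEXT FROM cur INTO @sql;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC sp_executesql @sql;
    FETCH NEXT FROM cur INTO @sql;
END
CLOSE cur;
DEALLOCATE cur;
```

Scheduling this as a SQL Server Agent job during a low-activity window keeps the packing work away from peak processing.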

Caution - SQL Server will write to its transaction log during these operations; if there is not enough log space, the operation will fail.  It is recommended that you start with smaller objects before trying the largest and busiest tables and indices.  Prove to yourself that this technique works for your applications; document the before & after.
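Before rebuilding a large index, it is worth checking how much log headroom you have.  One quick check (and keep in mind that REORGANIZE generally logs less than REBUILD):

```sql
-- Report log size and percent used for every database on the instance.
DBCC SQLPERF(LOGSPACE);
-- Returns one row per database: Database Name, Log Size (MB),
-- Log Space Used (%), Status.
```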


Picture thanks to Wikipedia:  MSC_Oscar