Precise Detects a Problem in VMware Storage

by Jan 28, 2017

Our demo environment suffered a storage hardware failure. Below is the timeline, followed by how the issue appeared in Precise for SQL Server.

1/9/17 – The data center experienced a power outage. The Dell PowerVault MD3200i was abruptly powered off, causing 2 drives to enter a predictive failure state.

1/10/17 – Faulty drive 1 of 2 replaced; time allowed for the RAID rebuild to complete before replacing drive 2 of 2.

1/11/17 – RAID rebuild completed; faulty drive 2 of 2 replaced.

1/16/17 – The Precise team notified IT of severe performance degradation of the VMs in the demo environment.

1/18/17 – After troubleshooting the VMs and the host, it was apparent that disk reads were causing the issue. The storage device GUI marked the RAID rebuild as completed and successful, yet the VMs were still experiencing performance issues. The storage device's system logs, however, showed that the RAID rebuild had failed, and the RAID configuration now showed the array as degraded. VMs are being migrated to a new datastore.

1/20/17 – VM migration completed; the Precise team confirmed the problem was solved.

I returned from vacation on Monday, January 16, and could not help but notice that the Precise GUI in the demo environment was painfully slow. In this environment, Precise for SQL Server measures the performance of its own historical repository, a database named PMDB (short for Performance Management DataBase). Precise loads PMDB with performance metrics 24 by 7, and the Precise GUI retrieves those metrics from it.

This screen shows that Log Waits increased roughly 10X under a static workload. The top three SQL statements are two updates and an insert.

The PMDB database lives in a file named PMDB. This screen shows a profile of the performance characteristics for PMDB. I/O Stall Time rises from negligible to significant. This metric is a general measure of storage performance, and it is most useful in a trend graph, where a jump stands out clearly against the normal baseline.
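To illustrate why a stall metric reads best as a trend, here is a minimal Python sketch. It assumes cumulative per-file counters shaped like those SQL Server exposes in sys.dm_io_virtual_file_stats (io_stall, num_of_reads, num_of_writes), which accumulate from instance startup; it does not claim this is how Precise computes its I/O Stall Time metric. The FileStatsSample and stall_trend names are hypothetical, and the sample numbers are illustrative, not data from this incident.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FileStatsSample:
    """One snapshot of cumulative counters for a database file (hypothetical shape)."""
    timestamp_s: float  # when the snapshot was taken, in seconds
    io_stall_ms: int    # cumulative ms spent waiting on I/O for the file
    num_reads: int      # cumulative number of reads
    num_writes: int     # cumulative number of writes


def stall_trend(samples: List[FileStatsSample]) -> List[Tuple[float, float]]:
    """Return (interval end time, average stall ms per I/O) for each interval.

    Cumulative counters only grow, so a single reading says little; the
    difference between consecutive snapshots shows what happened in that window.
    """
    trend = []
    for prev, cur in zip(samples, samples[1:]):
        if (cur.io_stall_ms < prev.io_stall_ms
                or cur.num_reads < prev.num_reads
                or cur.num_writes < prev.num_writes):
            # Counters went backwards (e.g. an instance restart); approximate
            # the interval with the activity recorded since the reset.
            d_stall = cur.io_stall_ms
            d_ios = cur.num_reads + cur.num_writes
        else:
            d_stall = cur.io_stall_ms - prev.io_stall_ms
            d_ios = (cur.num_reads - prev.num_reads) + (cur.num_writes - prev.num_writes)
        # An idle interval has no I/O; report 0 instead of dividing by zero.
        trend.append((cur.timestamp_s, d_stall / d_ios if d_ios > 0 else 0.0))
    return trend


# Illustrative snapshots one minute apart (hypothetical numbers).
samples = [
    FileStatsSample(0, 1_000, 900, 100),
    FileStatsSample(60, 1_500, 1_800, 200),
    FileStatsSample(120, 21_500, 2_700, 300),
]
print(stall_trend(samples))  # → [(60, 0.5), (120, 20.0)]
```

With the same I/O volume in both intervals, the average stall per I/O jumps from 0.5 ms to 20 ms, which is the kind of step change that is obvious on a trend graph but hidden in a single cumulative total.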

Management by measurement is the hallmark of best practices. Before-and-after measurement documents that the problem is solved: after the migration, the normal processing profile resumed, the Precise application ran great, and the GUI performed to expectations.

Precise captures, 24 by 7, the SQL statements that drive resource consumption, showing cause and effect. Precise's value proposition is a little wider and a little deeper, with a lighter footprint. This gives the DBA more opportunities to take performance to the next level.