Statistics and PowerBI Visualizations, Part Two

by Jan 15, 2018

In the previous post, we went over the details of DBCC_SHOW_STATISTICS, now let’s see how this comes together.  We shall run the histogram for the ProductID from the Sales.SalesOrderDetail table.

 

Pay attention to the Range_HI_Key of 831.   The Avg_Range_Rows are 36 2/3.

 

Now let’s run a query to see how SQL Server uses these statistics to create an execution plan.

 

Let’s look at the execution plan below.  You can see that the estimated number of records coming from the index for ProductID is 308.

 

 

So how did the optimizer come to this?  The range for the ProductID in the query is 827 to 831.  This includes the entire range of the step, including the Range_Hi_Key.  In the previous post, we mentioned that the Range_Rows is  the estimated count of the rows in the step, EXCLUDING the Range_HI_Key and the EQ_Rows is the estimated number of rows for columns who have the value of the Range_HI_Key. The number 308 is the summation of the EQ_Rows and Range_Rows.

 

Query on Range_HI_Key

Now let’s see what happens when we query the Range_Hi_Key exactly

When we look at the estimated execution plan, you will see the estimated number of rows is 198.  This is the value for EQ_Rows which is the estimated number of rows for columns who have the value of the Range_HI_Key.

  

Query on Step excluding Range_HI_Key

Finally, we shall see how SQL Server estimates the records for a value within the step excluding the Range_HI_Key.

The execution plan visual is rounding the estimated number of 36.66667 to 37, but this is the value of the Avg_Range_Rows which is the average number of records, with duplicate values within the step, excluding the Range_HI_Key.

 

  

Old versus New

When I was putting together the lesson plans for this, I wanted to make my own query for the comparisons, not borrow one from another site or blog.  Yes, I borrow plenty, but I wanted this to be mine.  When I was presenting my “code tuning” class, I had recently upgraded my instance from 2012 to 2017.  I had also put my database into 2017 compatibility mode.  I had used this query to show that unions that are intensive can cause issues with tempdb and cause spill over.  To my “joy”, when I ran the query in the class I did not get the tempdb spillover.  And right then I realized that I was not in Kansas(2012 compatibility) any longer.  But this proved to be opportunistic for the statistics/optimizer comparison.

So here, let’s see the query in 2012(Cardinality Estimator 7) compatibility.

In the screenshot below, you’ll see that Cardinality Estimator 7 under estimates the amount of rows for the sort.  Therefore requesting less memory for the query.  Since less memory is given to the query, rows have to spill to tempdb for the sorting of the union.

Query in 2017(Cardinality Estimator 14) Compatibility

 

 

In the screenshot below, you will see that Cardinality Estimator 14 OVER estimates the amount of records for the distinct sort.  Also, if you notice and compare the Hash Match Inner Joins to Cardinality Estimator 7 above, CE14 has a much closer value for estimate to actual.  This means that the Cardinality Estimator requested more memory for this query and it did NOT need to spill over to tempdb.

 

Now we see how SQL Server is using the statistics and how the differences in the Cardinality Estimators can determine differences in the execution plans and affect the resources needed.

The next post is where the fun comes in.  Running these stats in SSMS doesn’t show much.  In the final part, we will show the PowerBI report designed by Mr. Patrick LeBlanc(b|t).