Filtered Indexes and Performance

Introduction

During a recent SQL Server Performance Tuning consultation, I was asked a fascinating question by a Senior DBA working with me from the client-side. Let me share this query, as I believe you will find it intriguing as well.

“We often create an index on the columns used in the WHERE clause of the SELECT statement. Why can’t we create an index with the WHERE clause instead of just including every single row for selected key columns in the index?”

Of course, we can do that as well. We do not have to create an index on every row of the table if the rows are not retrieved frequently, or if the queries do not focus on them. In today’s blog post, we will discuss Filtered Indexes – in essence, indexes with a WHERE clause.

What is the Filtered Index?

Filtered Index is a new feature in SQL SERVER. A Filtered Index serves to index a portion of rows in a table. This technique applies a filter on INDEX, which improves query performance, reduces index maintenance costs, and reduce index storage costs compared with full-table indexes.

A Filtered Index is an optimized non-clustered index. This feature is a significant performance improvement in SQL Server, reducing the index storage cost and reducing maintenance costs.

It isn’t easy to understand how filtered indexes work and their advantages. Let us better understand the filtered index with the help of demonstration.

Demonstration of a Filtered Index

For the demonstration, we will create the script from scratch.

This example will use a table from a sample database, AdventureWorks, and create a filtered index. Next, we will run two queries - one where the query matches the WHERE clause of the index and one where it does not.  We will measure statistics IO for both situations and see how a filtered index is efficient in this instance.

In the database AdventureWorks, there is a table called HumanResources.Employee. Let us create an index on its column Title.

CREATE NONCLUSTERED INDEX NCI_Department

ON HumanResources.Employee(LoginID)

WHERE JobTitle= ‘Marketing Specialist’

Once we create the index, we will run the following two queries. However, immediately before running them, we will enable the STATISTICS IO ON to measure how much data the system is reading from the disk. You can read more about this over here.

The command to enable statistics IO is as following:

SET STATISTICS IO ON

Now, let us run the following query in which we have a WHERE condition similar to the Filtered Index we have created.

SELECT LoginID

FROM HumanResources.Employee

WHERE JobTitle= ‘Marketing Specialist’

Here is the output of the messages window which lists the IO details.

Table ‘Employee’. Scan count 1, logical reads 2

From the IO details, we can see that the query has read the 2 data pages. The size of each data page is 8K.

The next step is to run another query similar to the previous one. The only difference here is the WHERE condition, which is different from what we used in the index.

SELECT LoginID

FROM HumanResources.Employee

WHERE JobTitle= ‘Senior Tool Designer’

Here is the output of the messages window which lists the IO details.

 Table ‘Employee’. Scan count 1, logical reads 9

It is clear from the observation that filtered indexes are very effective. In our case, the filtered index is over four times more effective when the WHERE condition matches what we have specified in the index definition. However, when the WHERE condition does not correspond with the index, the performance is less efficient. This efficiency loss arises because the query may either scan the entire table looking for values or use an alternative index.

When to use Filtered Index

After looking at the results in the previous experiment, you can choose two different directions. The first is to create multiple filtered indexes, while the second is to create a regular non-clustered index instead of a filtered non-clustered index on a table. Honestly, without running any experiment, it is impossible to say which method is correct. Unless and until we test a regular index with a filtered index, we can’t tell which one is better for your specific query.

Here is a general guideline on when one should consider a filtered index. If you have a massive table of data, with only a subset of the data under frequent access, you may use a filtered index with the qualifying condition of the frequently-accessed data in the WHERE clause. Let’s say your table contains many years of data, but you only need to retrieve data for the most recent year. In that case, you may consider creating a filtered index with the current year’s condition to avoid making a colossal index with data that are unnecessary for your business queries.

Summary

This blog post should give you an idea of how filtered indexes work and when to use them. Filtered indexes can be incredibly convenient when dealing with a large table with consistently high access to specific data. In the next episode of this series, we will discuss a real-world scenario of Columnstore Indexes.

Anonymous