A long time ago I looked at some reasons as to why persisted computed columns ( and therefore indexes on those columns) are not used (Part1, Part2).
Recently I was drawn into an issue where the same thing was happening, a persisted computed column had been created with an index, but neither were being used when expected.
Using AdventureWorks lets re-create this scenario:
1 2 3 4 5 6 7 8 9 10 11 12 13
create table ccTest ( id integer primary key, surname varchar(50) not null, cSum as checksum(isnull(surname,' ')) persisted not null) go create index idxcsum on ccTest(cSum) go insert into ccTest(Id,surname) Select BusinessEntityID,LastName from Person.Person go
Now, if we want to find all those surnames with a specific checksum (yes, i know! checksum is bad ), it would be reasonable to :
select * from ccTest where cSum=0
and assume that the index idxcsum is used to retrieve the required data. Looking at the plan it obvious that this does not happen.
The clustered index is scanned, the non clustered index is ignored. There are cases where this is a good thing to happen, for instance to avoid a key/bookmark lookup when that will be an expensive operation. Here though there is an estimate of ~16rows not enough for that to be the case.
Maybe we need to force the use of the index ? :
select * from ccTest with(index=idxcsum) where cSum=0
This uses the index, but not in the manner we need :
The index is scanned, all the rows are read from the index, and the key lookup previously mentioned has entered the equation.
So what is really happening here ?
Well in the second example, there are some very graphical clues, the Compute Scalar and Filter.
What is the compute scalar computing ? Lets look ‘defined values’ in the properties :
[[AdventureWorks2012].[dbo].[ccTest].cSum] = Scalar Operator(checksum([AdventureWorks2012].[dbo].[ccTest].[surname]))
So the persisted data is being recomputed, looking at the filter the predicate is :
That computed value is then being filtered upon. But this is the persisted data, why is the index not used ?. Well, that’s because the computation isn’t logically the same. But how can this be ?
There are two things that need to be appreciated here:
- SQL Server puts a lot of energy into simplifying queries and operations. Redundant branches are pruned and contradictions removed.
- Computed columns are expanded back to the base definition before being optimized.
In our computed column definition we have a redundant code branch. ISNULL is used to guard against the Surname column being NULL, however the value in Surname can never be NULL as there is a NOT NULL constraint on it, therefore when the column definition is expanded it is then simplified and this redundancy removed. You can see this in the ‘defined values’ above ? There is no ISNULL definition is the computation. So the optimizer cannot now rematch this computation back to the persisted computed column as the definition is not the same. It would be nice to see the NOEXPAND hint used to prevent this happening as it does for indexed views.
Now we know that the resolution is relatively straightforward:
1 2 3 4 5 6 7 8 9 10 11 12 13
create table ccTest2 ( id integer primary key, surname varchar(50) not null, cSum as checksum(surname) persisted not null) go create index idxcsum on ccTest2(cSum) go insert into ccTest2(Id,surname) Select BusinessEntityID,LastName from Person.Person go
And then if we :
select * from ccTest2 where ccTest2.cSum=0
We will see the index used as expected in the first place
So, the takeaway is that if you are using persisted computed columns ensure that the definitions have no redundancy and then the indexes should be used.
NB : Curiously there is something else , or more accurately another level of indirection, here. If you use the query :
select * from ccTest where CHECKSUM(surname) = 0
You may have expected it to match to the persisted computed column, as it would against ccTest2, but it doesn’t.