Please use my filtered index

Not long ago we had a requirement to timeout asynchronous application tasks not completing within an expected period (defined per task type). There is a high hit rate on the table with the number of pending modules at any one time in the 100s, and for this reason timeout processing needed to cause minimal blocking to truly active tasks. The schema design is as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
-- timeout (sec) per task
create table [dbo].[TaskTimeouts] 
(
       TaskName varchar(100)
     , TimeoutSeconds int
     , constraint [PK_TaskTimeouts] primary key(TaskName)
);
go
 
-- task state for a process. In this case the interesting states are 0=pending, 1=complete, 4=timeout
create table [dbo].[AsyncTaskStates] 
(
       ProcessId uniqueidentifier
     , TaskName varchar(100)
     , StateId tinyint
     , StartedAt datetime
     , CompletedAt datetime
     , constraint [PK_AsyncTaskStates] primary key(ProcessId, TaskName)
);
go

The solution seemed simple enough, implement an index by the started time with a filter on the pending state. Also to minimize blocking truly active tasks the timeout process should only scan based on the overall minimum timeout value.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- proposed index to support fnding pending tasks for timeout
create nonclustered index [IX_AsyncTaskStates_StartedAt] 
on    [dbo].[AsyncTaskStates]
      (
           [StartedAt] ASC
      )
      where 
      (
           [StateId]=(0)
      );
go
 
-- query to verify proposed index
declare @minTimeoutSec int = (select min(TimeoutSeconds) from [dbo].[TaskTimeouts]);
 
select s.ProcessId, s.Taskname, s.StartedAt
from   [dbo].[AsyncTaskStates] s
join   [dbo].[TaskTimeouts] t on s.TaskName = t.TaskName 
where  s.StateId = 0
and    s.StartedAt < cast(dateadd(second, -@minTimeoutSec, getdate()) as datetime2(3)) and    datediff(second, s.StartedAt, getdate()) > t.TimeoutSeconds;
go

SelectPlanToVerifyFilteredIndexUse

Everything was developed, tested and deployed to production. There was unexpected blocking and clustered index scanning of a large busy table to boot. After replicating the problem locally it was easy to see the filtered column also needed to be an included column.

1
2
3
4
5
6
7
8
9
10
11
12
-- update implemented to find pending tasks for timeout 
declare @minTimeoutSec int = (select min(TimeoutSeconds) from [dbo].[TaskTimeouts]);
 
update s
set    CompletedAt = getdate()
     , StateId = 4
 
from   [dbo].[AsyncTaskStates] s
join   [dbo].[TaskTimeouts] t on s.TaskName = t.TaskName 
where  s.StateId = 0
and    s.StartedAt < cast(dateadd(second, -@minTimeoutSec, getdate()) as datetime2(3)) and    datediff(second, s.StartedAt, getdate()) > t.TimeoutSeconds;
go

And the bad plan

BadUpdatePlanUsingScan

I didn’t understand why a filtered column needed to also be an included one. Then I remembered a post I was forwarded about six months ago, and sure enough the answer (at least a reasonable explanation and workaround) was given in the post’s comments. The post filterd index side effect and its comments are well worth reading.

Great, problem solved. Well, yes and no. What really was the problem? Well, unfortunately me and here is why:

  1. During discussions with the dev team I presented the solution using a select statement.
  2. The statement implemented was an update.
  3. I should have reviewed the code better, communicated the change better to the DBAs, who in turn on spotting the problem could have fed back more promptly.
  4. How about tests to ensure the index was being used, tricky, but very possible.

And the real lesson learned is update code behaves very differently to select code and the optimizer does not make any assumptions about an index filter clause. As stated in the post I reference above, best practice for filtered indexes is to ensure the filtered columns are included.

All the code presented is available here in GitHub.

Enjoy.