When is getdate() deterministic ?

Def :     

Deterministic , a function that will always return the same output for the same input.
Non-Deterministic, a function that will return different result for the same input.

Going by that the above definition is is obvious that getdate() is non-deterministic,  each execution will return a different output as time goes by.  However, there is at least one case inside sqlserver where it isn’t treated as such and that is within estimations.

A question I was asked recently was:
Which is better ?

1
2
3
select count(*)
from DateTab
where dateCol > =cast(getdate() as date)

or

1
2
3
4
5
declare @nowdate
Select @now=getdate()
Select count(*)
from DateTab
where Datecol>=@now

‘Better’ is a bit of an woolly term and can mean many things,  but for our purpose lets assume that better means ‘Provides the most accurate estimate’.

At first glance there isn’t much to choose between the two, they are functionally equivalent, but there is one significant difference, and that is the row estimation.

To demonstrate, we will need some data:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
use tempdb
go
Drop table DateTab
go
Create table DateTab
(
DateCol date primarykey
)
go
insert into DateTab
select top(20) dateadd(dd,row_number() over(order by (select null))-9,getdate())
from sys.columns
go
select count(*)
from DateTab
where dateCol>=cast(getdate()as date)
go
declare @nowdate
Select @now=getdate()
Select count(*)
from DateTab
where Datecol>=@now

Upon executing the count statements, you will see that by using getdate() directly the estimate is 12, spot on accurate with the amount of actual rows,  by using the variable the estimate is 6.  It has returned an estimate of ~30% of the quantity of rows,  this is the default behaviour when using a variable and a non-equality predicate of >=.

By using getdate() however the estimate has been derived from the statistics of the column.  When building the plan the optimizer has read the statistics and summed the following rows and returned that as the estimate.  Naturally this will mean that as time marches on and as the plan is cached the the estimate of rows will be based upon the initially executed getdate().  But I would imagine that this sort of query is used for counting orders over the last X days (or other temporal period),  that being the case then estimate of rows should still be ‘good’ as the table will be continually being populated with fresh orders.

It is important to note that even if the predicate is ‘dateadd(dd,-30,getdate())’ then the optimizer will return an estimate based upon today minus 30 days.

This behaviour is documented here: http://msdn.microsoft.com/en-us/library/ms175933(v=sql.100).aspx under the snappily titled “Compile-Time Expression Evaluation for Cardinality Estimation”

  

 

Leave a Reply

Your email address will not be published. Required fields are marked *