May 26

Introducing SQL Horizons

Doing SQLLunch as I have been over the past year or so, I get asked by a lot of people, perhaps a surprisingly high number given the number of webinars that do happen, “Can you make these available online?”.

The answer to this hasn’t changed: it’s still no. I think that those who can get to the venues like the formula of SQLLunch, and I don’t intend to change that.

However, it is clear that there may still be a gap in the market, and with that in mind I have been working with the good folks at PASS and have created an online-only user group, “SQL Horizons Online”.

Our first meeting will be on Thursday 30th May @ 8pm BST with Jody Roberts, PASS regional mentor for Africa, who will be talking about “HA and DR: What’s the big difference?”

Registration and meeting room details will be posted at http://sqlhorizons.sqlpass.org/

I hope that you can join us online for this and future events.

May 17

When is getdate() deterministic?

Definitions:

Deterministic: a function that will always return the same output for the same input.
Non-deterministic: a function that may return a different output for the same input.

Going by the above definition it is obvious that getdate() is non-deterministic; each execution will return a different output as time goes by. However, there is at least one place inside SQL Server where it isn’t treated as such, and that is within cardinality estimation.

A question I was asked recently was:
Which is better?

select count(*)
from DateTab
where DateCol >= cast(getdate() as date)

or

declare @now date
select @now = getdate()
select count(*)
from DateTab
where DateCol >= @now

‘Better’ is a bit of a woolly term and can mean many things, but for our purposes let’s assume that better means ‘provides the most accurate row estimate’.

At first glance there isn’t much to choose between the two; they are functionally equivalent. But there is one significant difference, and that is the row estimation.

To demonstrate, we will need some data:

use tempdb
go
drop table DateTab
go
create table DateTab
(
    DateCol date primary key
)
go
insert into DateTab
select top(20) dateadd(dd, row_number() over(order by (select null)) - 9, getdate())
from sys.columns
go
select count(*)
from DateTab
where DateCol >= cast(getdate() as date)
go
declare @now date
select @now = getdate()
select count(*)
from DateTab
where DateCol >= @now

Upon executing the count statements, you will see that by using getdate() directly the estimate is 12, spot on with the actual number of rows, whereas by using the variable the estimate is 6. That is an estimate of ~30% of the table’s rows, which is the default behaviour when using a variable with an inequality predicate such as >=.
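As an aside (this variation is not in the original demo), if you do need the variable, adding OPTION (RECOMPILE) lets the optimizer see the variable’s value at compile time and estimate from the column statistics rather than falling back to the 30% guess; a minimal sketch:

declare @now date
select @now = getdate()
select count(*)
from DateTab
where DateCol >= @now
option (recompile)   -- statement is compiled at execution time, so the value in @now is known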

By using getdate() directly, however, the estimate has been derived from the column’s statistics. When building the plan the optimizer has read the statistics, estimated the rows at or after that date, and returned that as the estimate. Naturally this means that, as time marches on and the plan stays cached, the row estimate will still be based upon the getdate() value at the time of the initial compilation. But I would imagine that this sort of query is typically used for counting orders over the last X days (or some other temporal period), in which case the row estimate should still be ‘good’, as the table will be continually populated with fresh orders.

It is important to note that even if the predicate is ‘dateadd(dd,-30,getdate())’, the optimizer will return an estimate based upon today minus 30 days.
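So a typical ‘orders in the last 30 days’ style of count, written against the demo table, is estimated from the statistics in just the same way:

-- dateadd(dd, -30, getdate()) is evaluated at plan compilation time,
-- so the estimate comes from the column statistics for that date range.
select count(*)
from DateTab
where DateCol >= dateadd(dd, -30, getdate())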

This behaviour is documented here: http://msdn.microsoft.com/en-us/library/ms175933(v=sql.100).aspx under the snappily titled “Compile-Time Expression Evaluation for Cardinality Estimation”

  

 

May 11

What price a deadlock?

“Oh, if we deadlock we’ll retry a few times until it goes through.”

I’ve had a few variations of this conversation over the years, and on the face of it, what does it really matter? You get a deadlock, you retry, retry, retry until eventually it completes OK. You get the right data eventually and the job is done.
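Such retry logic typically looks something like the sketch below (hypothetical code, not from any real system), catching deadlock error 1205 and resubmitting:

-- A minimal sketch of the retry-on-deadlock pattern; the update is a placeholder.
declare @tries integer = 0
declare @done bit = 0

while (@done = 0 and @tries < 3)
begin
    begin try
        set @tries += 1
        begin transaction
        update DeadLockTab
           set SomeData = newid()
         where ID = 1
        commit
        set @done = 1               -- success, stop looping
    end try
    begin catch
        if @@trancount > 0 rollback;
        if error_number() <> 1205
            throw;                  -- not a deadlock victim: re-raise immediately
        -- deadlock victim: loop round and resubmit
        -- (a real implementation would raise an error once the retries are exhausted)
    end catch
end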

The problem here is that although SQL Server does a fine job of detecting a deadlock and killing off one of the processes involved, this is NOT instantaneous. There is a background process called the Lock Monitor that scans the lock chains for deadlocked processes, and it fires every 5 seconds: http://msdn.microsoft.com/en-us/library/ms178104(v=sql.105).aspx. The point being that in a deadlock scenario it can be up to 5 seconds before a process is rolled back, and 5 seconds is an eternity on a busy OLTP system.

Let’s dummy up some code to demonstrate how, with scale, the situation gets worse and worse.

Create table DeadLockTab
(
ID integer identity Primary key,
SomeData uniqueidentifier
)
go
insert into DeadLockTab
(
    SomeData
)
Select top(10) newid()
from sys.columns
go

Now, using Adam Machanic’s SQLQueryStress, let’s run some code that will deadlock. All I wish to do is simply update 3 random rows in the 10-row table.

set nocount on
Declare @c integer =0

begin transaction
while(@C<3) begin
Declare @id integer = cast(rand()*10 as integer)+1
  Update DeadLockTab
     set SomeData = newid() 
  where ID = @id
select @c+=1

end
commit

 

Not the prettiest of code but it will do the job and if we run 100 iterations on 1 thread…

So, how about 100 iterations on 3 threads?

A massive increase in seconds/iteration, from 0.0005 to 0.0900 (well over a hundred times slower), and that with only 6 deadlocks (Total Exceptions) over the 300 iterations.

Let’s go crazy and simulate a webscale 7 concurrent sessions :)

That is the cost of a deadlock right there, an average of 0.1 seconds per call.

There are many solutions to this, but I want to say: don’t be afraid to proactively block. If we drop the concurrency to 1 by using “select count(*) from DeadLockTab with (TABLOCKX, HOLDLOCK)”…

Then the average execution speed AT SCALE (and that’s what really matters) is dramatically improved.
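For reference, this is roughly how the stressed batch changes; the original post only gives the locking select itself, so placing it at the top of the transaction is my reading of the approach (a sketch, not the author’s exact test code):

set nocount on
declare @c integer = 0

begin transaction
-- Take an exclusive table lock up front and hold it to the end of the
-- transaction, serialising the concurrent batches instead of deadlocking.
select count(*) from DeadLockTab with (TABLOCKX, HOLDLOCK)

while (@c < 3)
begin
    declare @id integer = cast(rand() * 10 as integer) + 1
    update DeadLockTab
       set SomeData = newid()
     where ID = @id
    select @c += 1
end
commit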

So consider the time taken to resolve a deadlock next time you are told “If that happens we’ll simply resubmit the query”. 5 seconds is an eon, and during this time further locks and blocks will accumulate behind the deadlocked processes, further exacerbating the situation.

Apr 10

What Price A TimeOut?

If you have been interested in query processing and looking at execution plans, you may well have noticed that a query plan states a “Reason For Early Termination” as a property.

clip_image001

There are four return values here: <BLANK>, “Good Enough Plan Found”, “MemoryLimitExceeded” and “Timeout”.

Let’s ignore “MemoryLimitExceeded”; I’ve never experienced it in the wild, but the intent behind it is (relatively) clear.  “Good Enough Plan Found” states that the optimizer has found a decent plan to execute; remember that the optimizer’s job is to find a “good enough plan in a short enough time”, not a perfect plan.  It is no good if the optimizer takes 60 seconds to return that perfect plan when an OK plan can be found and executed in 5 seconds.  <BLANK> is a tricky one to summarise succinctly, but it states that optimization has come to a natural end: the optimizer completed a stage of optimization but did not meet the entry requirements of the next. That leaves “Timeout”, and in this blog I wish to focus on that.
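If you are curious which of these reasons are turning up on your own server, a query along these lines will list them from the plan cache (a sketch; it assumes the plans of interest are still cached and reads the StatementOptmEarlyAbortReason attribute from the showplan XML):

-- Sketch: list cached statements whose optimization ended early, and why.
;with xmlnamespaces (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select top (50)
       qp.query_plan.value('(//StmtSimple/@StatementOptmEarlyAbortReason)[1]', 'varchar(50)') as EarlyAbortReason,
       st.text as QueryText
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_query_plan(cp.plan_handle) qp
cross apply sys.dm_exec_sql_text(cp.plan_handle) st
where qp.query_plan.exist('//StmtSimple[@StatementOptmEarlyAbortReason]') = 1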

Conventional wisdom dictates that “Timeout” means the optimizer has taken too long whilst optimizing a query and the best of the plans found so far is returned.  This would imply that if the optimizer could work harder a ‘better’ plan would be returned.  First off, let’s find a query that does time out.

 

 
declare @SalesPersonID integer
declare @FullName nvarchar(255)
declare @JobTitle nvarchar(30)
declare @SalesTerritory nvarchar(30)
declare @Sales2002 money
declare @Sales2003 money
declare @Sales2004 money
select  @SalesPersonID = SalesPersonID,
        @FullName = FullName,
        @JobTitle = JobTitle,
        @SalesTerritory = SalesTerritory,
        @Sales2002 = [2002],
        @Sales2003 = [2003],
        @Sales2004 = [2004]
   From Sales.vSalesPersonSalesByFiscalYears
  where SalesTerritory ='Canada'

If you execute this against AdventureWorks2012, you will see that the reason for early termination is “Timeout”.

clip_image002

Plan Explorer has highlighted the issue, and also neatly adds a yellow warning triangle to the plan itself.  So why did this query time out?  Well, the number of potential plan combinations was massive and the optimizer only tried so many before returning. How many? Interesting question; let’s use the undocumented trace flag 8675 to find out the stages entered and the number of tasks (not potential plans as such) considered by the optimizer.

 
declare @SalesPersonID integer
declare @FullName nvarchar(255)
declare @JobTitle nvarchar(30)
declare @SalesTerritory nvarchar(30)
declare @Sales2002 money
declare @Sales2003 money
declare @Sales2004 money
select  @SalesPersonID = SalesPersonID,
        @FullName = FullName,
        @JobTitle = JobTitle,
        @SalesTerritory = SalesTerritory,
        @Sales2002 = [2002],
        @Sales2003 = [2003],
        @Sales2004 = [2004]
   From Sales.vSalesPersonSalesByFiscalYears
  where SalesTerritory ='Canada'
option(QueryTraceOn 8675,QueryTraceOn 3604)

That gives us the following output:

clip_image003

To prevent the optimizer from carrying on any longer, search(1) has been aborted at task 1100 and a plan has been returned.

So this is potentially a bad plan, right?  And if we made the optimizer work harder, a better plan would be found.  This can actually be achieved with another undocumented trace flag, 8780, which disables the optimizer timeout(s).  Let’s see that in action:

 

 
declare @SalesPersonID integer
declare @FullName nvarchar(255) 
declare @JobTitle nvarchar(30) 
declare @SalesTerritory nvarchar(30) 
declare @Sales2002 money 
declare @Sales2003 money 
declare @Sales2004 money 
select  @SalesPersonID = SalesPersonID, 
        @FullName = FullName, 
        @JobTitle = JobTitle, 
        @SalesTerritory = SalesTerritory, 
        @Sales2002 = [2002], 
        @Sales2003 = [2003], 
        @Sales2004 = [2004] 
   From Sales.vSalesPersonSalesByFiscalYears 
  where SalesTerritory ='Canada'
option (querytraceon 8780,querytraceon 8675,querytraceon 3604)

clip_image004

As you can see, no more “Timeout”: the returned plan, which is now an entirely different plan to before, has a reason for early termination of “Good Enough Plan Found”, but it did take a total time of 0.132 secs as opposed to the 0.053 secs of the “Timeout” version.  Is that sacrifice worthwhile? How does our “Good Enough” query’s runtime duration compare to the “Timeout” query’s duration?

clip_image005

As it turns out, not too well: the “Good Enough” plan is slower by quite a significant margin, rejecting the notion that “Timeout” = bad.

But this is just my laptop (an i3 with an SSD, if you must know), hardly what you would call enterprise-standard hardware. Maybe the extra cost of CPU is offset by the saving in IO? Not a bad assumption. Let’s throw some ‘big iron’ at the problem: to the cloud.

A quick hat-tip must go to Jamie Thomson and his AdventureWorks on Azure community project; if you wish to play on Azure this is the best way to do it: http://sqlblog.com/blogs/jamie_thomson/archive/2012/03/27/adventureworks2012-now-available-to-all-on-sql-azure.aspx

 

SQL Azure does not support the use of trace flags, but you can use the USE PLAN hint and provide a plan. This I have done, and I have also created stored procedures for the queries. What are the timings now?
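In case you have not used the hint before: the idea is to capture the showplan XML for the statement and embed it in an OPTION (USE PLAN N'<ShowPlanXML ...>') hint on the Azure copy. One way to grab the XML is from the plan cache DMVs, sketched below; the LIKE filter is just an illustrative way of locating the plan and is not from the original post.

-- Sketch: pull the cached showplan XML for the on-premise query so it can be
-- pasted into an OPTION (USE PLAN N'...') hint on the Azure version.
select st.text, qp.query_plan
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_query_plan(cp.plan_handle) qp
cross apply sys.dm_exec_sql_text(cp.plan_handle) st
where st.text like '%vSalesPersonSalesByFiscalYears%'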

With timeout

clip_image006

and “Good enough”

clip_image007

These timings are after several executions so no compile times are included here.

That’s another assumption put to bed: throwing hardware at the problem hasn’t helped (as usual :) ).

Returning to on-premise (or at least my laptop): why is the “Good Enough” query slower, and specifically why is the CPU usage so high? There is nothing in particular in the plan that would point to high CPU usage.  One tool that is gaining more exposure in the SQL Server world is XPERF, an excellent tool for revealing what a process has spent its time doing.  Examining the traces whilst running the “Good Enough” plan shows something quite interesting:

clip_image008

The majority of the processing time has been spent in DateAdd and DatePart; this comes from the line in the view “YEAR(DATEADD(m, 6, soh.[OrderDate])) AS [FiscalYear]”.

The plan itself exposes this activity in a rather innocuous “compute scalar” operator that is executed 31,465 times.

clip_image009

A quick reworking of the schema and view will enable us to entirely remove this calculation, and therefore this CPU cost.

 
alter table Sales.SalesOrderHeader add FiscalYearCalc smallint
go
update Sales.SalesOrderHeader set FiscalYearCalc = YEAR(DATEADD(m, 6, [OrderDate]))
go
CREATE VIEW [Sales].[vSalesPersonSalesByFiscalYearsCalc]
AS
SELECT pvt.[SalesPersonID]
      ,pvt.[FullName]
      ,pvt.[JobTitle]
      ,pvt.[SalesTerritory]
      ,pvt.[2002]
      ,pvt.[2003]
      ,pvt.[2004]
FROM (SELECT soh.[SalesPersonID]
            ,p.[FirstName] + ' ' + COALESCE(p.[MiddleName], '') + ' ' + p.[LastName] AS [FullName]
            ,e.[JobTitle]
            ,st.[Name] AS [SalesTerritory]
           ,soh.[SubTotal]
           ,[FiscalYearCalc] AS [FiscalYear]
      FROM [Sales].[SalesPerson] sp
INNER JOIN [Sales].[SalesOrderHeader] soh
   ON sp.[BusinessEntityID] = soh.[SalesPersonID]
INNER JOIN [Sales].[SalesTerritory] st
   ON sp.[TerritoryID] = st.[TerritoryID]
INNER JOIN [HumanResources].[Employee] e
   ON soh.[SalesPersonID] = e.[BusinessEntityID]
INNER JOIN [Person].[Person] p
   ON p.[BusinessEntityID] = sp.[BusinessEntityID]
) AS soh
PIVOT
(
SUM([SubTotal])
FOR [FiscalYear]
IN ([2002], [2003], [2004])
) AS pvt;

Now upon retrying the equivalent queries something quite interesting happens:

 
declare @SalesPersonID integer
declare @FullName nvarchar(255)
declare @JobTitle nvarchar(30)
declare @SalesTerritory nvarchar(30)
declare @Sales2002 money
declare @Sales2003 money
declare @Sales2004 money
select @SalesPersonID = SalesPersonID,
       @FullName = FullName,
       @JobTitle = JobTitle,
       @SalesTerritory = SalesTerritory,
       @Sales2002 = [2002],
       @Sales2003 = [2003],
       @Sales2004 = [2004]
  From Sales.vSalesPersonSalesByFiscalYearsCalc
 where SalesTerritory ='Canada'

clip_image010

 
declare @SalesPersonID integer
declare @FullName nvarchar(255)
declare @JobTitle nvarchar(30)
declare @SalesTerritory nvarchar(30)
declare @Sales2002 money
declare @Sales2003 money
declare @Sales2004 money
select @SalesPersonID = SalesPersonID,
       @FullName = FullName,
       @JobTitle = JobTitle,
       @SalesTerritory = SalesTerritory,
       @Sales2002 = [2002], 
       @Sales2003 = [2003],
       @Sales2004 = [2004]
  From Sales.vSalesPersonSalesByFiscalYearsCalc
 where SalesTerritory ='Canada'
option(querytraceon 8780)

clip_image011

Allowing the optimizer to run to its conclusion and not time out has resulted in the same plan being returned. So, in this case, we can state that “Timeout” is certainly “Good Enough”.

Hopefully this blog removes some of the vilification of “Timeout”, shows to some extent how it protects the optimizer from getting carried away with itself too much, and demonstrates how a change in schema results in a change of plan as the expected workload changes.

Naturally this ‘all depends’: your mileage will vary, and there will be cases where removing the timeout will allow a better plan to be built.  In the grand scheme of things, should you overly worry about timeouts? I’ll leave you with a quote from Conor Cunningham: “In practice, what I suggest you do is to examine whether you are happy with any given plan and stop worrying about the nastiness of the generalities of the search framework”. To paraphrase: if the plan you have does the job you need it to do in an acceptable time frame, then that is “Good Enough”.