Aug 15

Please please tell me now…

“…Is there something I should know?” sang Duran Duran many moons ago, when I was young and before I knew that query optimization was even a thing. But it is quite a handy little line to keep in mind when you are tuning queries: giving SQL Server a little more explicit information about your data can go a long way towards removing the work it does to guarantee something about that data.

The classic case for this is adding constraints, for example:

select * from Production.Product where DaysToManufacture < 0

Produces the plan of:

[Image: execution plan showing a constant scan instead of a table read]
SQL Server doesn’t need to read any of the table data. This is because there is a constraint on the table guaranteeing that no rows have DaysToManufacture < 0, so SQL Server can safely replace the table read with a constant scan.
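For reference, the constraint responsible is easy to find; this is just a sketch listing the check constraints on the table (the AdventureWorks constraint name in the comment is from memory, so verify it on your own copy):

select name, definition
  from sys.check_constraints
 where parent_object_id = OBJECT_ID('Production.Product')
-- returns, amongst others, something equivalent to:
-- CK_Product_DaysToManufacture : ([DaysToManufacture]>=(0))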

So how else can this be exploited?

Well, I’m working on a large data warehouse project and one of the issues we have had is poor query performance against the warehouse when processing the cubes. Imagine we are using this query:

select dimCustomer.CustomerKey,dimCustomer.Title,dimCustomer.FirstName,dimCustomer.LastName,
       FactInternetSales.SalesOrderNumber,FactInternetSales.SalesOrderLineNumber,
       factInternetSales.ProductKey,UnitPrice 
  from dimCustomer
  join factInternetSales
    on dimCustomer.CustomerKey = factInternetSales.CustomerKey

in a DSV. Now SSAS has a high propensity (I don’t know if it always does this) to wrap queries in a subselect and apply DISTINCT. So our query would become something like this:

Select distinct *
from (
   select dimCustomer.CustomerKey,dimCustomer.Title,dimCustomer.FirstName,dimCustomer.LastName,
          FactInternetSales.SalesOrderNumber,FactInternetSales.SalesOrderLineNumber,
          factInternetSales.ProductKey,UnitPrice from dimCustomer
     join factInternetSales
       on dimCustomer.CustomerKey = factInternetSales.CustomerKey
) as Data

The plan hasn’t changed; both queries produce this:

[Image: execution plan with no sort or aggregate needed to satisfy the DISTINCT]

That is because the primary keys of both tables are being returned, and that guarantees the DISTINCT. If either key column is removed from the select then SQL Server will have to sort to enforce the DISTINCT.

Select distinct *
from (
    select dimCustomer.CustomerKey,dimCustomer.Title,dimCustomer.FirstName,dimCustomer.LastName,
           /* FactInternetSales.SalesOrderNumber,FactInternetSales.SalesOrderLineNumber, */
           factInternetSales.ProductKey,UnitPrice 
      from dimCustomer
      join factInternetSales
        on dimCustomer.CustomerKey = factInternetSales.CustomerKey
) as Data
option(maxdop 1) -- Inhibit parallelism for simplicity

[Image: execution plan with a sort added to enforce the DISTINCT]

Yes, we are asking SQL Server for two different sets of data and the queries are not logically equivalent, but sometimes the data is implicitly unique and SQL Server still has to do work to guarantee that uniqueness.

Let us further imagine that unknown members are dealt with by setting the customer key to -1 in factInternetSales, BUT the member -1 itself is not in the dimCustomer table and is instead provided by a view. This is quite a bad practice, but that never stops this sort of thing occurring in the real world, so bear with me…

Drop view vwCustomer
go
Create View vwCustomer
as
Select CustomerKey,Title,FirstName,LastName from dimCustomer
union
Select -1,null,null,'Unknown'

 

The first thing to note is that UNION is used. UNION guarantees a unique set of rows, so SQL Server has to do extra work to enforce that, even on a simple select * from vwCustomer. The recommendation would be to use UNION ALL: we implicitly know that the data will be unique, so we can save SQL Server the bother.

Drop view vwCustomer
go
Create View vwCustomer
as
Select CustomerKey,Title,FirstName,LastName from dimCustomer
union all
Select -1,null,null,'Unknown'

Now we plumb that into our DSV…

select dimCustomer.CustomerKey,dimCustomer.Title,dimCustomer.FirstName,dimCustomer.LastName,
       FactInternetSales.SalesOrderNumber,FactInternetSales.SalesOrderLineNumber,
       factInternetSales.ProductKey,UnitPrice from vwCustomer dimCustomer
  join factInternetSales
    on dimCustomer.CustomerKey = factInternetSales.CustomerKey

Which, if we run it in isolation, is just peachy. However, let us wrap it in a DISTINCT as SSAS would:

Select distinct *
from (
   select dimCustomer.CustomerKey,dimCustomer.Title,dimCustomer.FirstName,dimCustomer.LastName,
          FactInternetSales.SalesOrderNumber,FactInternetSales.SalesOrderLineNumber,
          factInternetSales.ProductKey,UnitPrice from vwCustomer dimCustomer
     join factInternetSales
       on dimCustomer.CustomerKey = factInternetSales.CustomerKey
) as Data
option(maxdop 1)

[Image: execution plan now containing a sort to enforce the DISTINCT]
Oh, we now have a sort. This will be REALLY heavy in a regular DW load; ordering millions of rows is not quick.

So, what is our solution? Let’s tell SQL Server what our implied knowledge of the data is and make it explicit.

Drop view vwCustomer
go
Create View vwCustomer
as
Select CustomerKey,Title,FirstName,LastName from dimCustomer
 where CustomerKey >0
union all
Select -1,null,null,'Unknown'

To all intents and purposes, adding CustomerKey > 0 does nothing: there are no rows with a key of zero or below, so nothing will be filtered, and that is exactly the point. This was implied knowledge; we have now given SQL Server that same knowledge explicitly.

So what happens when we use that view?

[Image: execution plan with no sort required]

Great, no sorting 🙂

So that deals with this as a one-off, but let’s let SQL Server know that this holds for all queries against the table, not just for this view.

alter table dimCustomer add Constraint ChkSk check(CustomerKey>0)
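One caveat worth adding here (my note, not something shown in the plans above): the optimizer will only lean on a check constraint like this if it is trusted, i.e. it was not created or re-enabled WITH NOCHECK. A quick sketch to verify and, if needed, re-validate it:

select name, is_not_trusted, is_disabled
  from sys.check_constraints
 where parent_object_id = OBJECT_ID('dimCustomer')
go
-- if is_not_trusted = 1, re-check the existing rows so the optimizer can rely on the constraint
alter table dimCustomer with check check constraint ChkSk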

We are now guaranteeing that all CustomerKeys are positive, so if we back up a bit and use the original view definition of:

Drop view vwCustomer
go
Create View vwCustomer
as
Select CustomerKey,Title,FirstName,LastName from dimCustomer
union all
Select -1,null,null,'Unknown'

And then run :

Select distinct *
from (
   select dimCustomer.CustomerKey,dimCustomer.Title,dimCustomer.FirstName,dimCustomer.LastName,
          FactInternetSales.SalesOrderNumber,FactInternetSales.SalesOrderLineNumber,
          factInternetSales.ProductKey,UnitPrice 
     from vwCustomer dimCustomer
     join factInternetSales
       on dimCustomer.CustomerKey = factInternetSales.CustomerKey
) as Data
option(maxdop 1)

Once again, the DISTINCT is explicitly guaranteed by the schema and no extra operations are required by SQL Server to prove it 🙂

So, next time you are query tuning, remember that adding a little extra (and sometimes seemingly superfluous) code can go a long way.

To paraphrase a bit

Please please tell me now, is there something you can say, to make that operator go away…

Apr 12

Reasons why your plans suck: No. 56,536

I have been working with SQL Server for more years than I really care to mention and like to think that I am now quite comfortable with the internals, specifically the Query Optimizer. Every so often, though, a new problem gets thrown into the mix, just to keep me on my toes (or so it would seem).

Plans go “wrong” for a multitude of reasons, but they mostly boil down to a poor estimation, and poor estimations are generally caused by inadequate statistics. This is why SQL Server puts so much effort into efficiently creating and maintaining them. So, on my dev box I had a plan that went “wrong”: a 3 second process turned into a 3 hour process, and a quick bit of investigation turned up the culprit, a relatively simple select statement. No worries, it’s probably due to an old cached plan being used. Here is the plan:

[Image: execution plan estimating a single row from Application]

In reality, the Application table had hundreds of thousands of rows, not one.
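At this point it is worth seeing what statistics the optimizer actually had available on the table; a quick sketch along these lines does the job (the dbo.Application name is illustrative, and sys.dm_db_stats_properties needs 2008 R2 SP2 / 2012 SP1 or later):

select s.name, s.auto_created, sp.last_updated, sp.rows, sp.rows_sampled, sp.modification_counter
  from sys.stats as s
 cross apply sys.dm_db_stats_properties(s.object_id, s.stats_id) as sp
 where s.object_id = OBJECT_ID('dbo.Application')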

So, I added WITH(RECOMPILE); that should sort that one. But no, on the next run, same thing with the same statement. Curious! Running the statement in isolation caused a “good” plan to be built; running the full process, a “bad” plan was built. So, why the bad estimation? Bad stats? In fact the plan was showing “missing stats” as a warning.

[Image: plan warning showing missing column statistics]

Right, so not bad stats, it’s no stats. Why would there be no stats? There are not too many reasons: asynchronous stats processing could be a culprit, and it’s a large table so stats could potentially be built async. But no, that option is disabled.
Possibilities are narrowing down now, but I thought I would take a look at the internal info that is not normally available within the execution plan. For this you will need the undocumented trace flag 8666: turn it on using DBCC TRACEON(8666) before retrieving the plan.
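The flag is session scoped by default, so the pattern is simply this (the extra internal detail is easiest to see in the plan XML):

dbcc traceon(8666)
go
-- run the problem statement and capture the actual execution plan as normal;
-- the plan should now carry extra internal information, including what the optimizer did with statistics
dbcc traceoff(8666)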
So inside the plan, I saw this:

[Image: plan internals (trace flag 8666) showing the statistics failed to build]

Stats failed to build: not “not found” or “non-existent”, the stats actually failed to build. Running Profiler with the error events, StmtStarting and StmtEnding, we can see this activity happening. If you are not aware, stats are built using the STATMAN function, and this can be seen in a Profiler trace just as any user-initiated action can. So, run the process and…

[Image: Profiler trace showing the STATMAN statement failing]

There it failed to build stats, so the estimation was wrong and I got a bad plan. Bingo, that’s the effect; what’s the cause of the failed stats build?

There are a few reasons why stats fail to build, mostly edge-case scenarios such as deadlocks, but remember that I said this was a dev box? A small dev box indeed, and in this case the stats build failed due to memory pressure. How can I be certain of that? Well, playing the hunch card, I saw an extended event, “query_no_cqscan_due_to_memory_limitation”, which, amongst a few others, sounded relevant; and indeed, when running the process through, that event does fire when stats builds fail under memory pressure.

[Image: the query_no_cqscan_due_to_memory_limitation extended event firing]
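If you want to keep an eye out for this yourself, a minimal Extended Events session along these lines will catch it (the session name and file target are mine, and I’m assuming the event is exposed under the sqlserver package on your build):

create event session StatsBuildMemoryPressure on server
add event sqlserver.query_no_cqscan_due_to_memory_limitation
add target package0.event_file (set filename = N'StatsBuildMemoryPressure.xel')
go
alter event session StatsBuildMemoryPressure on server state = start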

Sometimes I am in awe of the amount of work that has gone into SQL Server’s query processing engine, and this is one of those times. Ideally I would have liked the problem to be signposted more clearly, maybe in the error log, but beggars can’t be choosers.

Mar 29

T4 Support in SSDT

Sometimes a piece of technology passes you by completely; T4 templating happens to have done that for me. On Tuesday night at SQL Supper, Geoff Clark (a UK MCM) (t) gave a fantastic presentation on end-to-end data warehouse loading, a portion of which was on T4 templating. The question arose about SSDT and T4 templating and, as a coincidence, literally the next day the SSDT team put out the March 2014 release. Having my interest already piqued, I thought I’d have a play 🙂

So… let’s add a TSQL Template

[Image: adding a TSQL Template item to the project]

 

That will now add a base TSQL template into your project. However, there is a slight snag: it’s broken 🙁

[Image: the generated template reporting an error]

This is confirmed and will be fixed in the next release, as stated in this MSDN forum thread. Bugger!!!

However, until then, all is not lost: although the DacFX portion is broken, T4 templating still works, so a more ‘generic’ solution is available. The basis of this is remarkably similar to TSQL Smells: iterate through a project, add the .sql files into a model and examine the DOM for interesting ‘stuff’.

After a bit of playing around (I am very much a T4 noob), this is the script I came up with:

<#@ template language="C#" debug="true" hostspecific="true" #>
<#@ assembly name="Microsoft.VisualStudio.Shell.Interop.8.0" #>
<#@ assembly name="EnvDTE" #>
<#@ assembly name="EnvDTE80" #>
<#@ assembly name="VSLangProj" #>
<#@ assembly name="C:\Program Files (x86)\Microsoft SQL Server\120\SDK\Assemblies\Microsoft.SqlServer.TransactSql.ScriptDom.dll" #>
<#@ assembly name="C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE\Extensions\Microsoft\SQLDB\DAC\120\Microsoft.SqlServer.Dac.dll" #>
<#@ assembly name="C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE\Extensions\Microsoft\SQLDB\DAC\120\Microsoft.SqlServer.Dac.Extensions.dll" #>
<#@ import namespace="Microsoft.VisualStudio.Shell.Interop" #>
<#@ import namespace="EnvDTE" #>
<#@ import namespace="EnvDTE80" #>
<#@ import namespace="Microsoft.VisualStudio.TextTemplating" #>
<#@ import namespace="Microsoft.SqlServer.Dac" #>
<#@ import namespace="Microsoft.SqlServer.Dac.Model" #>
<#@ import namespace="System.IO" #>
<#@ import namespace="System.Collections.Generic" #>
<#@ output extension=".sql" #>
 
-- Dynamic File generated by db
<#
    var hostServiceProvider = (IServiceProvider)this.Host;
    var dte = (DTE)hostServiceProvider.GetService(typeof(DTE));
 
    using (TSqlModel model = new TSqlModel(SqlServerVersion.Sql110, new TSqlModelOptions { }))
    {
        foreach(Project project in dte.Solution)
        {
            IterateThroughProject(project.ProjectItems,model);
        }
 
        List<TSqlObject> allTables = GetAllTables(model);
        foreach (var table in allTables)
        {
#>
--				Table <#= table.Name.Parts[0] #>.<#= table.Name.Parts[1] #>
<#
 
        }
    }
 
#>
-- File Done
<#+
 
    public List<TSqlObject> GetAllTables(TSqlModel model)
    {
        List<TSqlObject> allTables = new List<TSqlObject>();
 
        var tables = model.GetObjects(DacQueryScopes.All, ModelSchema.Table);
        if (tables != null)
        {
            allTables.AddRange(tables);
        }
        return allTables;
    }
 
 
 
    private void IterateThroughProject(ProjectItems PrjItems,TSqlModel model)
    {
        foreach(ProjectItem PrjItem in  PrjItems)
        {
 
            if(PrjItem.Name.EndsWith(".tt", StringComparison.OrdinalIgnoreCase)){ // Dont Load the files we want to build
                continue;
 
            }
            if(PrjItem.ProjectItems!=null)
            {
                IterateThroughProject(PrjItem.ProjectItems,model);
            }
            if(//PrjItem.Object.GetType().ToString() == "Microsoft.VisualStudio.Data.Tools.Package.Project.DatabaseFileNode" && 
                PrjItem.Name.EndsWith(".sql", StringComparison.OrdinalIgnoreCase))
            {
#>
--				This is a sql file and will be processed
--				<#= PrjItem.FileNames[0] #>
<#+
                if (!PrjItem.Saved)
                {
                    PrjItem.Save();
                }
                StreamReader Reader = new StreamReader(PrjItem.FileNames[0]);
 
                string Script = Reader.ReadToEnd();
                model.AddObjects(Script);
            }
        }
    }
 
#>

This should all be fairly self-explanatory: we iterate through the project in the member function IterateThroughProject, adding any .sql files found to the model, then use the model.GetObjects member to find all the tables in the model and iterate over that list, printing out each schema and table name.

I have seen a few clunky TSQL generation routines written in TSQL, but I think porting those into T4 templating and having them built directly into a dacpac will be a big boon.

Have fun

Jan 03

Reading a snapshot – are there performance implications?

My present role is to look at the performance of a large data warehouse load. This is right up my alley, and I have been playing with all manner of performance tuning fun: indexing, execution plans, filtered statistics etc., all good stuff.

However, during QA testing some of my suggestions actually made things worse. Not a little worse, but a whole lotta worse; big surprise. Going back to my dev box I confirmed that the TSQL changes should have made things better, but there was one significant difference (other than the data load). To ensure transactional integrity a database snapshot is used; this means that the load process works on a static view of the data whilst the OLTP system carries on doing its work. In the QA testing we, by definition, ran the whole process, whereas when I was dev’ing I read from the database, not the snapshot. Interesting.

When I switched to using a snapshot things did indeed go slower. Let’s try this against AdventureWorks:

 

CREATE DATABASE AdventureWorks2012ss ON
( NAME = AdventureWorks2012_Data, FILENAME = 
'D:\DATAAdventureWorks2012_data.ss' )
AS SNAPSHOT OF AdventureWorks2012;
GO
set statistics io on
GO
DBCC DROPCLEANBUFFERS
GO
use AdventureWorks2012
GO
select count(SOH.Comment),count(SOD.ProductID) from sales.SalesOrderHeader SOH
inner loop join sales.SalesOrderDetail SOD
on SOH.SalesOrderID = SOD.SalesOrderID
GO
use AdventureWorks2012ss
GO
set statistics io on
go
DBCC DROPCLEANBUFFERS
GO
select count(SOH.Comment),count(SOD.ProductID) from sales.SalesOrderHeader SOH
inner loop join sales.SalesOrderDetail SOD
on SOH.SalesOrderID = SOD.SalesOrderID
GO

OK, so that is a fairly nonsensical query and forcing the loop join is there to demonstrate the issue, but the plans are exactly the same. Running those and tracing in Profiler shows the performance difference:
[Image: Profiler trace comparing query duration against the database and the snapshot]

Reading the database takes ~1.0–1.1 seconds, reading the snapshot ~1.5–1.6 seconds: a sizeable difference. Other noteworthy things here are that the snapshot file and the database file exist on the same physical drive, and that drive is a slow USB drive. Testing against my laptop’s SSD, try as I might, demonstrated no measurable difference.

There is, however, one difference in the output of the above scripts: physical reads.
[Image: STATISTICS IO output comparing physical reads against the database and the snapshot]

150 against the database, 1,150 against the snapshot. This sounds to me like a likely cause of the slowdown.
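Another way to see the same imbalance, without a Profiler trace, is to compare the file-level IO counters for the two databases before and after each run; a rough sketch using the names from the script above:

select db_name(vfs.database_id) as database_name, mf.type_desc,
       vfs.num_of_reads, vfs.num_of_bytes_read, vfs.io_stall_read_ms
  from sys.dm_io_virtual_file_stats(null, null) as vfs
  join sys.master_files as mf
    on mf.database_id = vfs.database_id
   and mf.file_id = vfs.file_id
 where vfs.database_id in (db_id('AdventureWorks2012'), db_id('AdventureWorks2012ss'))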

When reading a snapshot, the sparse file has to be read first and then the database is read if the page has not yet been changed: a double whammy, if you will. Also, this happens on a page-by-page basis, which is why I forced the nested loop join in the query. Using Process Monitor from the Sysinternals toolset we can see this activity in action…

Firstly reading the database:

[Image: Process Monitor showing reads against the database file]

Nice large reads of 65,536 bytes (a full 64KB extent) each; now the snapshot…

[Image: Process Monitor showing reads against the snapshot]

Lots of reads of 8,192 bytes, a single page at a time; this is the activity that I believe is slowing my queries down. I asked Microsoft for an opinion on what is actually happening here, and the response was:

When we do a large read against a COW sparse file, pages that have not been updated in the primary will not be in the sparse file and we will have to read from the primary’s file. This is done on a page by page basis so can result in many reads against the primary for a single read in the snapshot DB.

If the pages have been pushed over to the snapshot’s COW file then they can be read in a large read.

Essentially, we’ll only do large reads against pages that have been pushed into the snapshot.

The broken up reads against the primary are issued in parallel, so while it is more IOs it is not necessarily a longer time to do those IOs.

Sparse file reads themselves are not likely very optimal since they are by definition fragmented, so OS will likely end up breaking up those reads anyway.

So the activity that I am seeing is natural and normal and that explanation does highlight the physical differences that are necessitated when reading a snapshot.

So, back to the problem in hand. To fix the queries that were running slowly in the real examples I used the FORCESCAN hint, which prevents the nested loop operation and enables SQL Server to do larger reads against the database file for those pages unchanged in the snapshot; this did have a dramatic effect.
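Applied to the AdventureWorks repro above, the shape of that change would be something like this (illustrative only; the real queries were of course rather larger):

select count(SOH.Comment),count(SOD.ProductID)
  from sales.SalesOrderHeader SOH with (forcescan)
  join sales.SalesOrderDetail SOD with (forcescan)
    on SOH.SalesOrderID = SOD.SalesOrderID
-- FORCESCAN steers the optimizer away from the per-row lookups of the loop join,
-- so the unchanged pages can be read from the source database in larger chunks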

Have you seen similar activity before? Is this only an issue for slow or overloaded IO systems? I’d love to know what you have witnessed.

Sep 30

A quickie post on debugging

I’m not a great fan of the debugging feature in Management Studio. It always strikes me that if you are getting any use from it then you are iterating (looping) too much, i.e. doing TSQL wrong. TSQL is not a procedural programming language, and the debugger would only seem to be useful if you did develop TSQL in a procedural manner. If you could use it to peek inside a temporary table or execute a single CTE in a complex statement, then we’d be talking 🙂

What I do like to do, though, is use the little-known (at least to the people I speak to) “Trace Query in SQL Server Profiler” option.

[Image: the “Trace Query in SQL Server Profiler” context-menu option]

This is accessed by right-clicking in a query window and will run Profiler with a filter on the current window’s SPID. Quite neat.
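Under the covers the filter it adds is just on your session’s SPID, which you can confirm from the same query window:

-- the value Profiler filters on is simply this session's id
select @@SPID as current_session_spid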

This feature can be further enhanced to be even more useful. When Profiler is run it will default to using the template “TSQL_SPs”. The default configuration of this template is, IMO, not very useful, so I have created my own “TSQL_SPs” template which contains the events and columns that I am interested in: all errors and warnings, SP:StmtCompleted, SQL:StmtRecompile, SQL:BatchCompleted and a few others, with the Duration, CPU, Reads and Writes columns. This is simple to do: create a profiler session with the events and columns that you are concerned about and simply File -> Save As -> Template and select “TSQL_SPs” from the drop down. Hopefully next time you “Trace Query in SQL Server Profiler” your template will be loaded and the SPID filter automatically added.

To my mind, that is real TSQL debugging: how hard is this statement going to hit the metal?

 

Apr 10

What Price A TimeOut?

If you have been interested in query processing and looking at execution plans, you may well have noticed that a query states a “Reason For Early Termination” as a property.

[Image: the “Reason For Early Termination” plan property]

There are four possible values here: <BLANK>, “Good Enough Plan Found”, “MemoryLimitExceeded” and “Timeout”.

Let’s ignore “MemoryLimitExceeded”; I’ve never experienced it in the wild, but the intent behind it is (relatively) clear. “Good Enough Plan Found” states that the optimizer has found a decent plan to execute; remember that the optimizer’s job is to find a “good enough plan in a short enough time”, not a perfect plan. It’s not a good thing if the optimizer takes 60 seconds to return that perfect plan when an OK plan can be found and executed in 5 seconds. <BLANK> is a tricky one to summarise succinctly, but it states that optimization came to a natural end: the optimizer completed an optimization stage but did not meet the entry requirements of the next. That leaves “TimeOut”, and in this blog I wish to focus on that.
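As an aside, the same property is exposed in the cached plan XML as the StatementOptmEarlyAbortReason attribute, so a query along these lines (just a sketch, and XQuery over the plan cache is not cheap on a busy server) will show you which cached plans compiled with a timeout:

with xmlnamespaces (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select top (20) cp.usecounts, cp.objtype, qp.query_plan
  from sys.dm_exec_cached_plans as cp
 cross apply sys.dm_exec_query_plan(cp.plan_handle) as qp
 where qp.query_plan.exist('//StmtSimple[@StatementOptmEarlyAbortReason="TimeOut"]') = 1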

Conventional wisdom dictates that TimeOut means the optimizer has taken too long whilst optimizing a query and the best of the plans found so far is returned. This would imply that if the optimizer could work harder, a ‘better’ plan would be returned. First off, let’s find a query that does “Timeout”.

 

 
declare @SalesPersonID integer
declare @FullName nvarchar(255)
declare @JobTitle nvarchar(30)
declare @SalesTerritory nvarchar(30)
declare @Sales2002 money
declare @Sales2003 money
declare @Sales2004 money
select  @SalesPersonID = SalesPersonID,
        @FullName = FullName,
        @JobTitle = JobTitle,
        @SalesTerritory = SalesTerritory,
        @Sales2002 = [2002],
        @Sales2003 = [2003],
        @Sales2004 = [2004]
   From Sales.vSalesPersonSalesByFiscalYears
  where SalesTerritory ='Canada'

If you execute this against AdventureWorks2012, you will see that the reason for early termination is “Timeout”

[Image: Plan Explorer showing Reason For Early Termination = TimeOut]

Plan Explorer has highlighted the issue, and also neatly adds a yellow warning triangle to the plan itself. So why did this query time out? Well, the number of potential plan combinations was massive and the optimizer only tried so many before returning. How many? Interesting question; let’s use the undocumented trace flag 8675 to find out the stages entered and the number of tasks (not potential plans as such) considered by the optimizer.

 
declare @SalesPersonID integer
declare @FullName nvarchar(255)
declare @JobTitle nvarchar(30)
declare @SalesTerritory nvarchar(30)
declare @Sales2002 money
declare @Sales2003 money
declare @Sales2004 money
select  @SalesPersonID = SalesPersonID,
        @FullName = FullName,
        @JobTitle = JobTitle,
        @SalesTerritory = SalesTerritory,
        @Sales2002 = [2002],
        @Sales2003 = [2003],
        @Sales2004 = [2004]
   From Sales.vSalesPersonSalesByFiscalYears
  where SalesTerritory ='Canada'
option(QueryTraceOn 8675,QueryTraceOn 3604)

That gives us the output

[Image: trace flag 8675 output showing optimizer stages and task counts]

To prevent the optimizer carrying on for longer, search(1) has been aborted at task 1100 and a plan has been returned.

So this is potentially a bad plan, right? And if we made the optimizer work harder, a better plan would be found. This can actually be achieved with another undocumented trace flag, 8780, which disables the optimizer timeout(s). Let’s see that in action:

 

 
declare @SalesPersonID integer
declare @FullName nvarchar(255) 
declare @JobTitle nvarchar(30) 
declare @SalesTerritory nvarchar(30) 
declare @Sales2002 money 
declare @Sales2003 money 
declare @Sales2004 money 
select  @SalesPersonID = SalesPersonID, 
        @FullName = FullName, 
        @JobTitle = JobTitle, 
        @SalesTerritory = SalesTerritory, 
        @Sales2002 = [2002], 
        @Sales2003 = [2003], 
        @Sales2004 = [2004] 
   From Sales.vSalesPersonSalesByFiscalYears 
  where SalesTerritory ='Canada'
option (querytraceon 8780,querytraceon 8675,querytraceon 3604)

[Image: trace flag 8675 output with no timeout reported]

As you can see, no more “TimeOut”, and the returned plan, which is now an entirely different plan to before, has a reason for early termination of “Good Enough Plan Found”; but optimization did take a total time of 0.132 secs as opposed to the 0.053 secs of the “TimeOut” version. Is that sacrifice worthwhile? How does the “Good Enough” query’s runtime duration compare to the TimeOut query’s duration?

[Image: runtime comparison of the “Good Enough” and “TimeOut” plans]

As it turns out, not too well: the “Good Enough” plan is slower by quite a significant margin, rejecting the notion that “TimeOut” = bad.
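If you want to reproduce the comparison yourself, SET STATISTICS TIME reports the parse and compile time separately from the execution time, so something as simple as this does the trick (clear the cache first so each run really does compile; dev box only, please):

dbcc freeproccache   -- dev box only: force a fresh compile on the next run
go
set statistics time on
go
-- now run the two variants (with and without option(querytraceon 8780)) and compare the
-- "SQL Server parse and compile time" and "SQL Server Execution Times" messages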

But this is just my laptop (an i3 with an SSD, if you must know), hardly what you would call enterprise-standard hardware. Maybe the extra cost of CPU is offset by the saving in IO? Not a bad assumption. Let’s throw some ‘big iron’ at the problem: to the cloud!

A quick hat tip must go to Jamie Thomson and his AdventureWorks on Azure community project; if you wish to play on Azure this is the best way to do it: http://sqlblog.com/blogs/jamie_thomson/archive/2012/03/27/adventureworks2012-now-available-to-all-on-sql-azure.aspx

 

SQL Azure does not support the use of trace flags, but you can use the USE PLAN hint and provide a plan. This I have done, and I have also created stored procedures for the queries. What are the timings now?

With timeout

[Image: SQL Azure timings for the “TimeOut” plan]

and “Good enough”

[Image: SQL Azure timings for the “Good Enough” plan]

These timings are after several executions so no compile times are included here.

That’s another assumption put to bed; throwing hardware at the problem hasn’t helped (as usual 🙂).

Returning to on-premise (or at least my laptop): why is the “Good Enough” query slower, and specifically why is the CPU usage so high? There is nothing in particular in the plan that would point to high CPU usage. One tool that is gaining more exposure in the SQL Server world is XPerf, and it is excellent for revealing what a process has spent its time doing. Examining the traces whilst running the “Good Enough” plan shows something quite interesting:

[Image: XPerf trace showing time spent in DateAdd and DatePart]

The majority of the processing time has been spent in DateAdd and DatePart; this comes from the line in the view “YEAR(DATEADD(m, 6, soh.[OrderDate])) AS [FiscalYear]”.

The plan itself exposes this activity in a rather innocuous “compute scalar” operator that is executed 31,465 times.

[Image: the Compute Scalar operator in the execution plan]

A quick reworking of the schema and view will enable us to remove this calculation entirely, and therefore this CPU cost.

 
alter table Sales.SalesOrderHeader add FiscalYearCalc smallint
go
update Sales.SalesOrderHeader set FiscalYearCalc = YEAR(DATEADD(m, 6, [OrderDate]))
go
CREATE VIEW [Sales].[vSalesPersonSalesByFiscalYearsCalc]
AS
SELECT pvt.[SalesPersonID]
      ,pvt.[FullName]
      ,pvt.[JobTitle]
      ,pvt.[SalesTerritory]
      ,pvt.[2002]
      ,pvt.[2003]
      ,pvt.[2004]
FROM (SELECT soh.[SalesPersonID]
            ,p.[FirstName] + ' ' + COALESCE(p.[MiddleName], '') + ' ' + p.[LastName] AS [FullName]
            ,e.[JobTitle]
            ,st.[Name] AS [SalesTerritory]
           ,soh.[SubTotal]
           ,[FiscalYearCalc] AS [FiscalYear]
      FROM [Sales].[SalesPerson] sp
INNER JOIN [Sales].[SalesOrderHeader] soh
   ON sp.[BusinessEntityID] = soh.[SalesPersonID]
INNER JOIN [Sales].[SalesTerritory] st
   ON sp.[TerritoryID] = st.[TerritoryID]
INNER JOIN [HumanResources].[Employee] e
   ON soh.[SalesPersonID] = e.[BusinessEntityID]
INNER JOIN [Person].[Person] p
   ON p.[BusinessEntityID] = sp.[BusinessEntityID]
) AS soh
PIVOT
(
SUM([SubTotal])
FOR [FiscalYear]
IN ([2002], [2003], [2004])
) AS pvt;

Now upon retrying the equivalent queries something quite interesting happens:

 
declare @SalesPersonID integer
declare @FullName nvarchar(255)
declare @JobTitle nvarchar(30)
declare @SalesTerritory nvarchar(30)
declare @Sales2002 money
declare @Sales2003 money
declare @Sales2004 money
select @SalesPersonID = SalesPersonID,
       @FullName = FullName,
       @JobTitle = JobTitle,
       @SalesTerritory = SalesTerritory,
       @Sales2002 = [2002],
       @Sales2003 = [2003],
       @Sales2004 = [2004]
  From Sales.vSalesPersonSalesByFiscalYearsCalc
 where SalesTerritory ='Canada'

[Image: plan and timings for the query against the reworked view]

 
declare @SalesPersonID integer
declare @FullName nvarchar(255)
declare @JobTitle nvarchar(30)
declare @SalesTerritory nvarchar(30)
declare @Sales2002 money
declare @Sales2003 money
declare @Sales2004 money
select @SalesPersonID = SalesPersonID,
       @FullName = FullName,
       @JobTitle = JobTitle,
       @SalesTerritory = SalesTerritory,
       @Sales2002 = [2002], 
       @Sales2003 = [2003],
       @Sales2004 = [2004]
  From Sales.vSalesPersonSalesByFiscalYearsCalc
 where SalesTerritory ='Canada'
option(querytraceon 8780)

[Image: plan and timings with trace flag 8780, the same plan is returned]

Allowing the optimizer to run to its conclusion and not timeout has resulted in the same plan being returned. So, in this case, we can state that “Timeout” is certainly “Good Enough”.

Hopefully this blog removes some of the vilification of “Timeout”, shows to some extent how it protects the optimizer from getting too carried away with itself, and also demonstrates how a change in schema results in a change of plan as the expected workload changes.

Naturally this ‘all depends’ and your mileage will vary; there will be cases where removing the timeout will allow a better plan to be built. In the grand scheme of things, should you overly worry about timeouts? I’ll leave you with a quote from Conor Cunningham: “In practice, what I suggest you do is to examine whether you are happy with any given plan and stop worrying about the nastiness of the generalities of the search framework”. To paraphrase: if the plan you have does the job you need it to in an acceptable time frame, then that is “Good Enough”.