Aug 22

Who did what to my database and when…

One of the most popular questions on forums, Stack Overflow etc. is, “How can I find out who dropped a table, truncated a table, dropped a procedure…?” I’m sure we have all been there: something changes (maybe schema or data) and we have no way of telling who did it and when.

SQL auditing can get you there, and I’m not suggesting you shouldn’t use it, but what if it is not set up to monitor the objects you are interested in?

If you look back through my posts, I have been quite interested in ScriptDom and TSQL parsing,  so if you have a trace of all the actions that have taken place over a time period, you can parse the statements to find the activity.

Drawing inspiration (cough, stealing) from Ed Elliot’s dacpac explorer, I have created a git repo for a simple parser built using T4 templates. T4 templates are ideal here, as the full language domain can be exploded out automatically and you, as a developer, can then add your own code to cherry-pick the parts you are interested in.

At this time the project has hooks to report on SchemaObjectName objects, so any time an object is referenced, be that in a CREATE, SELECT, DROP, MERGE etc., it will fall into the OnSchemaObjectName function; there is also an OnDropTableStatement function that will be hit when DROP TABLE is used.

This is not intended to be a full end-to-end solution, not least because the SQL to be parsed could be coming from any number of sources, but if you have the skills to run it as is, you probably have enough to tailor it to your requirements.

As ever,  let me know what you think and any comments gratefully received.

The GitHub repo is: https://github.com/davebally/TSQLParse

 

Jun 17

Why is creating excel sheets from SSIS so hard ?

If there is one process that should be simpler than it is out of the box, it is creating Excel spreadsheets from SSIS. Over the years I’ve tried doing it a number of ways: the built-in component, Interop, OLE DB etc., and they all suck to one degree or another, being either unreliable, too slow or simply unusable.

A twitter conversation A) proved I wasn’t alone and B) pointed me in the direction of EPPlus.

Over on SSC there is already a post on using EPPlus with SSIS, some of which, such as putting EPPlus into the GAC, is still relevant for this post.

However, right now I have a big love of BIML; simply put, I think this is what SSIS should have been in the first place, and I personally find all the pointing and clicking a real time sink. Additionally, in BIML, once you have written a package to do something, i.e. a simple dataflow, it’s a snip to repeat that over 10, 20, 50, 100 or 1000s of tables. But the real time saver for me is when you need to re-architect, i.e. turn sequential dataflows into a parallel dataflow; it’s only really a case of changing where you iterate in your BIML code.

Anyway, I’ve combined these two pain points to create a BIML routine that uses EPPlus to output multi-sheet Excel spreadsheets reliably and fast.

At the moment it’s very basic: take SQL statements and output the data to an Excel file. In time, though, I hope to create some metadata to start ‘getting the crayons out’ and make the output look a bit prettier.

The code is on GitHub at https://github.com/davebally/BIML-SSIS-Excel-Output ; I hope this is of use to someone.

 

 

Mar 24

.Net 4.6.1 and MultiSubnetFailover

I’m not a big clustering/HA expert, but I obviously understand the need for such things to exist 🙂

However, the standard SQL tooling (SSMS etc.) seems to go out of its way to make the MultiSubnetFailover option as inaccessible and convoluted as possible. Quite why this option even exists and is not on by default escapes me; clients should be entirely agnostic to the server they are talking to.

After going through this pain again, this time deploying SSIS packages to an AG, I stumbled over this MSDN blog:

https://blogs.msdn.microsoft.com/alwaysonpro/2015/12/01/improved-multisubnet-listener-behavior-with-newly-released-sql-client-provider-in-net-4-6-1/

So .NET 4.6.1 does default to MultiSubnetFailover = True, which is great.

There is a rather cryptic note at the end, though:

Note that this isn’t what ended up being shipped.  Instead a new default property TransparentNetworkIPResolution was introduced to SqlConnection.

I’ve asked for some clarification on what exactly that means, as I’m unable to find any other reference to that property, but in my testing the effect seems to be the same :S
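For what it’s worth, the property does also appear to surface as a connection-string keyword alongside the familiar MultiSubnetFailover one; a hedged example (the listener and database names below are made up, not from any real environment):

Server=tcp:MyAgListener,1433;Database=MyDb;Integrated Security=SSPI;MultiSubnetFailover=True
Server=tcp:MyAgListener,1433;Database=MyDb;Integrated Security=SSPI;TransparentNetworkIPResolution=True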

So, if you have MultiSubnetFailover=True pain, give .NET 4.6.1 a whirl; it should resolve it.

 

Oct 04

SQL Saturday 467 – Precon – Query processing and internals

I’m not doing a tremendous amount of public speaking this year, concentrating instead on more professional matters; however, I’m pleased to say that on Friday 5th December I will be presenting a day-long pre-con on Query Processing and Internals at SQL Saturday 467 in Southampton.

We will be spending the whole day looking at the optimizer and picking apart its internals to show how it generates a plan from your given SQL. This will cover a wide range of topics, starting from execution plan basics and working through to demonstrating internals with undocumented traceflags.

Other pre-cons are available if this doesn’t float your boat. Full details, pricing and hotel discount codes are available from http://www.sqlsaturday.com/467/eventhome.aspx

Mar 23

Getting out of the ground with TSQL Smells.

 

At SQLBits I had a number of conversations with people about TSQL Smells, my open source project for static code analysis of SQL code. The general opinion was that although the concept was sound, the process of getting it running (and developing it further) was not documented. I’m sure that if you have sufficient .Net skills you could muddle through on your own, but since it will be mainly TSQL devs who have an interest here, I thought I would take the time to draw up some ‘Arse and Elbow’ documentation to get you started.

So let’s go:

  1. Grab Visual Studio 2013. The Community edition is fine operationally (I’m not a lawyer; check the licensing terms).
  2. After that has installed, update SSDT. Tools -> Extensions and Updates -> Updates -> Product Updates
  3. Install both the 32-bit and 64-bit versions of the DACFX framework; at the time of writing the latest is downloadable from http://www.microsoft.com/en-us/download/confirmation.aspx?id=45886
  4. Ensure that the SSDT extensions directory ( C:\Program Files (x86)\Microsoft Visual Studio 12.0\Common7\IDE\Extensions\Microsoft\SQLDB\DAC\120\Extensions ) is writable (I use Everyone / Full control).

That should be the tooling side of the equation sorted.

Now clone (I won’t go into what this means; a number of tutorials are available on GitHub) the TSQL Smells repo on GitHub: http://github.com/davebally/TSQL-Smells
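If you are happy at a command prompt, a one-liner along these lines (assuming git is installed and on your path) will do it:

git clone http://github.com/davebally/TSQL-Smells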

In terms of software this should now all be done.

Run up Visual Studio, load the TSQLSmells solution and “Rebuild Solution”.

If that all goes according to plan, you will now be able to run the unit tests

And if you are even more lucky they will all pass.

This now means that the rules are installed correctly and we can proceed onwards.

The unit test cases reference SQL files that are held in the TSQLSmellsTest project; you can test further by loading this project and selecting the “Run Code Analysis” option.

You are now in a position to load your own projects and run the code analysis against them.

Finally, included in the solution is RunSCAAnalysis; this generates a small exe that can run the static code analysis rules against a database or dacpac.

Command line options for that are -S Server, -o outputfile, and one of -d database or -f filename. My intention is that this would form part of your automated build/deploy framework (if you are not using SSDT) and break the build on a “Smell”.
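Purely by way of illustration, an invocation might look something like the lines below; the server, database and file names are mine, and I am assuming the exe keeps the project name:

RunSCAAnalysis.exe -S MyServer -d MyDatabase -o SmellReport.xml
RunSCAAnalysis.exe -S MyServer -f C:\Builds\MyDatabase.dacpac -o SmellReport.xml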

I hope that this blog entry goes some way to starting you on your way to better TSQL code quality; for my part, I will be giving it some love in the near future.


Update 16-Apr-2016

I’ve had some comments that the project is not working, so I’ve now updated the GitHub project to work with VS2015 / SQL 2014. To do this I built a fresh Azure VM with VS2015 installed, and I did run into some problems.

If you are having build problems :

Mostly these will be due to missing references, and with the multiple packages involved it is hard to give absolute instructions.

Missing DAC references: You need to find and install the latest (or appropriate) DACFX packages (32-bit is essential, 64-bit is optional for a runtime environment). The SSDT team blog is the best source of reference: https://blogs.msdn.microsoft.com/ssdt/ After install you may need to delete the reference in the project, add a reference and browse to the DACFX folder ( C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\Extensions\Microsoft\SQLDB\DAC\130\ ), and add the DLLs in.

If you are not using VS2015 and SQL 2014 then you will probably have to look in folders with different version numbers.

Missing TransactSql.ScriptDom reference: This is installed as part of SQL Server, not Visual Studio. If you do not have SQL Server installed, you will need to find the “Transact-SQL Language Service” in the SQL Server Feature Pack https://www.microsoft.com/en-gb/download/details.aspx?id=42295. The package is tsqllanguageservice.msi.

Again, delete the references and re-add them by browsing to the folder C:\Program Files (x86)\Microsoft SQL Server\130\SDK\Assemblies. The 130 version number will change over time / with older versions.

It builds but the tests fail :

This will probably be due to the output being placed in the incorrect folder. SSDT will look in the folders (C:\Program Files\Microsoft SQL Server\X\Dac\Bin\Extensions\) where X is the SQL version being targeted. Make sure that the folder exists for your specific setup.

I wish that there were a way to get over all the versioning and streamline this, but as of yet there isn’t.

 

 

Mar 17

Increasing ColumnStore compression ratio for archival and “Cold storage”

At my present client, part of the task I have to undertake is to re-engineer the “Cold Storage”. This is ~20 terabytes of compressed CSVs going back over 15 years or so. Presently these are held on ageing and creaking Centera servers that are way past end of life. The solution probably made sense 10 years ago (or whenever it was designed), but now it doesn’t, and the client’s first thought was to throw it all into Oracle RAC with page compression. That seems a vaguely sensible (if expensive) way to handle the data. But maybe there is an even better way, and right now I would say that columnstore compression is that better way.

As a really simple POC I took some pseudo-real sample data and threw it into a SQL Server table. The data size was approximately 300MB. After adding page compression, that came down to ~42MB, which is roughly in line with the expected estimates (and, I would hazard a guess, with what could be achieved by Oracle page compression). So how does columnstore compare? Adding a clustered columnstore index took the data size down to 2,128KB. Yes, from 300MB to 2MB; that is amazing. But wait, it gets better: using the COLUMNSTORE_ARCHIVE setting, which applies a variant of LZH compression to (presumably) each rowgroup, the data size falls even further, to 464KB. From 300MB to less than 1MB; now that is compression for you.
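For anyone wanting to repeat the POC, the statements involved are roughly as follows; the table name is illustrative and the figures above came from my sample data rather than this exact script:

-- Rowstore with page compression
ALTER TABLE dbo.ColdStorageSample REBUILD WITH (DATA_COMPRESSION = PAGE)
go
-- Clustered columnstore (assumes a heap; if the table already has a clustered
-- index, create the columnstore with drop_existing as in the demos below)
CREATE CLUSTERED COLUMNSTORE INDEX ccsColdStorageSample ON dbo.ColdStorageSample
go
-- The archival flavour of columnstore compression
ALTER TABLE dbo.ColdStorageSample REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)
go
-- Check the resulting size after each step
EXEC sp_spaceused 'ColdStorageSample'
go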

The other side of the coin is query speed (arguably less important at this client), something I have yet to measure and compare but will do when I get my dirty hands on the real data.

However, the achievable compression can be increased even further with a little bit of thought and jigging of the data. Simplistically, columnstore is all about repeating data over a column; the metadata says “in column X I have the value Y, and that happens for the next Z rows”. My data has some large, almost sequential columns, transaction reference, batch transaction reference etc., stored as large-ish chars of 16, 20 and 32 bytes in length. These are fairly unique throughout the data, and as such columnstore compression has to specify the full value on each and every row. What would be the impact if we only held the part of each reference that was actually changing? I.e. the whole 32 characters are held, but only the last few change, as these are effectively “strings that look like integers”.

A quick bit of guesstimation, for simplicity’s sake splitting the column into two parts, one of 31 chars and one of 1:

10 rows × 32 bytes = 320 bytes

against

1 row × 31 bytes + 10 × 1 byte = 41 bytes

Interesting: the data comes down to roughly 13% of its original size. Let’s push this idea further and create a million-row table:

drop table CCSTest1
go
-- A single 23-character 'reference' column
Create Table CCSTest1
(
Col1 char(23) not null
)
go
with cteRown
as
(
-- 1,000 sequential numbers
Select top(1000) ROW_NUMBER() over( order by (select null)) as r
from sys.columns a cross apply sys.columns b
)
,
ctePad
as
(
-- Zero-pad each number to four characters
Select REPLICATE('0',4-len(cast(r as varchar(4))))+cast(r as varchar(4)) as PadR
from cteRown
)
-- 1,000 x 1,000 = one million rows of the form 'nnnn000000000000000nnnn'
insert into CCSTest1
select r1.PadR+'000000000000000'+r2.PadR
from ctePad r1
cross join ctePad r2
 
go
Create clustered index CCSTest1idx on CCSTest1(Col1)

Using sp_spaceused we can see the data size is 31,632KB.
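For reference, that figure is just the output of:

EXEC sp_spaceused 'CCSTest1'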

Adding page compression

ALTER TABLE CCSTest1 REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE);

This brings the data size down to 13,120KB; not too shabby.

Let us now switch to a clustered columnstore:

CREATE CLUSTERED COLUMNSTORE INDEX CCSTest1idx ON CCSTest1 with (drop_existing = on)

This brings the data storage down even further, to 2,864KB, quite neat in itself even before we change the underlying data structure. Note that adding the COLUMNSTORE_ARCHIVE option has no effect on the data size here; this is due to the dictionary that is used by default to add an extra level of compression to string data.

Let’s now split Col1 into smaller (and, more importantly for columnstore, repeating) segments.

drop table CCSTest2
go
Create Table CCSTest2
(
Col1 char(19) not null,
Col2 char(4) not null
)
go
with cteRown
as
(
Select top(1000) ROW_NUMBER() over( order by (select null)) as r
from sys.columns a cross apply sys.columns b
)
,
ctePad
as
(
Select REPLICATE('0',4-len(cast(r as varchar(4))))+cast(r as varchar(4)) as PadR
from cteRown
)
insert into CCSTest2(Col1,Col2)
select r1.PadR+'000000000000000',r2.PadR
from ctePad r1
cross join ctePad r2
go
Create clustered index CCSTest2idx on CCSTest2(Col1,Col2)

The data size here is 31,632KB, no real difference from before.

Now add the columnstore index

CREATE CLUSTERED COLUMNSTORE INDEX CCSTest2idx ON CCSTest2 with (drop_existing = on)

The data size is now 1,392KB, less than half the size of the unsplit columnstore, and this is certainly a practice that I shall be employing to further increase the potential compression on the data. Yes, query speed will be ever so slightly degraded, but that is perfectly acceptable in my scenario.
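If you want to see where the space goes at the column level, the segment metadata can be queried; a rough sketch (I am joining on hobt_id here, adjust to your own tables as required):

select object_name(p.object_id) as table_name,
       s.column_id,
       sum(s.on_disk_size)/1024 as size_kb,
       sum(s.row_count) as segment_rows
from sys.column_store_segments s
join sys.partitions p on p.hobt_id = s.hobt_id
where p.object_id in (object_id('CCSTest1'),object_id('CCSTest2'))
group by p.object_id, s.column_id
order by table_name, s.column_id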

If you are using columnstore compression, to achieve the ‘optimal’ compression ratio there are still things you have to do and hoops to jump through. Consider splitting columns into multiple parts to increase the ‘repeatability’ of data over sequential rows, and ensure that a clustered index is built upon the ‘raw’ data prior to adding the columnstore to define what that ‘sequential’ order is. See also “Data Loading performance considerations with Clustered Columnstore indexes“ which, though more concerned with throughput, does contain information on compression ratio.

Mar 08

SQLBits, Incremental Statistics and Demo Fail

So another great SQLBits is done and dusted; many thanks to the team, helpers and sponsors for putting on such a great event. It’s always a pleasure and an honour to be able to present here and to get to spend time with a great crowd of intelligent SQL geeks.

I have uploaded the slides and scripts from both of my sessions: Friday’s session “Statistics and Estimations” is available here http://1drv.ms/1C0BMva and Saturday’s “2014 Cardinality Estimation” session is available here http://1drv.ms/1FwA7er .

During the 2014 session I suffered a massive demo fail and, as ever, realised my mistake as soon as I had left the room. I was in the middle of what I think is an interesting scenario with incremental statistics, so I thought it was worth adding the details of the point I was *trying* to make on my blog.

Incremental statistics are a great new addition; simply put, each partition in a table now has its own set of statistics. These are aggregated together to form the table statistics for estimation purposes. I have seen posts that sort of suggest that more than 200 histogram steps are now used for estimation, but I haven’t found any evidence for this; I’ll pick it up later in the post.

For most partitioned tables incremental statistics are great, as in reality most tables that are partitioned only have updates to the ‘last’ partition. Most partitioning strategies are based on something that is continually incrementing (like a datetime or an ID), so only the last partition is touched. Using incremental statistics means that SQL Server no longer has to look at the previous partitions (which have not been changed) to form statistics, a nice little cost saving. It’s almost a no-brainer; I can’t really see a downside. Ben Nevarez covers the basics in more detail here.
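For completeness, the incremental behaviour can be switched on in a few places, not just the CREATE STATISTICS form that the demo below uses; a quick sketch with made-up object names:

-- Database default for auto-created statistics
ALTER DATABASE MyPartitionedDb SET AUTO_CREATE_STATISTICS ON (INCREMENTAL = ON)
go
-- Per index
CREATE INDEX ix_Sales_SaleDate ON dbo.Sales(SaleDate) WITH (STATISTICS_INCREMENTAL = ON)
go
-- Resample only the partitions that have changed
UPDATE STATISTICS dbo.Sales ix_Sales_SaleDate WITH RESAMPLE ON PARTITIONS (4)
go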

However, for those tables that have updates to multiple partitions, there are some nuances that you will need to be aware of, and this is the scenario where I had the demo fail.

As ever, the best way of demonstrating the subtleties is by example. Let’s create a table, define the partitioning strategy and add some data.

drop database PartitionTest
go
Create Database PartitionTest
go
USE PartitionTest;
GO
-- Adds four new filegroups to the PartitionTest database
ALTER DATABASE PartitionTest
ADD FILEGROUP test1fg;
GO
ALTER DATABASE PartitionTest
ADD FILEGROUP test2fg;
GO
ALTER DATABASE PartitionTest
ADD FILEGROUP test3fg;
GO
ALTER DATABASE PartitionTest
ADD FILEGROUP test4fg;
go
 
-- Adds one file for each filegroup.
ALTER DATABASE PartitionTest
ADD FILE
(
NAME = test1dat1,FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\t1dat1.ndf',SIZE = 5MB,MAXSIZE = 100MB,FILEGROWTH = 5MB
)
TO FILEGROUP test1fg;
 
ALTER DATABASE PartitionTest
ADD FILE
(
NAME = test2dat2,FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\t2dat2.ndf',SIZE = 5MB,MAXSIZE = 100MB,FILEGROWTH = 5MB
)
TO FILEGROUP test2fg;
GO
ALTER DATABASE PartitionTest
ADD FILE
(
NAME = test3dat3,FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\t3dat3.ndf',SIZE= 5MB,MAXSIZE = 100MB,FILEGROWTH = 5MB
)
TO FILEGROUP test3fg;
GO
ALTER DATABASE PartitionTest
ADD FILE
(
NAME = test4dat4,FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\t4dat4.ndf',SIZE = 5MB,MAXSIZE = 100MB,FILEGROWTH = 5MB
)
TO FILEGROUP test4fg;
GO
-- Creates a partition function called myRangePF1 that will partition a table into four partitions
CREATE PARTITION FUNCTION myRangePF1 (int)
AS RANGE LEFT FOR VALUES (100000, 200000, 300000) ;
GO
-- Creates a partition scheme called myRangePS1 that applies myRangePF1 to the four filegroups created above
CREATE PARTITION SCHEME myRangePS1
AS PARTITION myRangePF1
TO (test1fg, test2fg, test3fg, test4fg) ;
GO
-- Creates a partitioned table called PartitionTable that uses myRangePS1 to partition col1
drop TABLE PartitionTable
go
CREATE TABLE PartitionTable (col1 int identity primary KEY, col2 int)
ON myRangePS1 (col1) ;
GO
-- Can be a bit slow so....
ALTER DATABASE PartitionTest SET DELAYED_DURABILITY = FORCED;
go
/* Populate with 10* SalesOrderHeader*/
insert into PartitionTable(col2)
select CurrencyRateID
from AdventureWorks2012.sales.SalesOrderHeader
go 10

All being well, you should now have multiple partitions populated with some data; confirm that with:

select partition_number,rows from sys.partitions where OBJECT_ID=OBJECT_ID('PartitionTable')

Now to add incremental statistics to the table we need to simply:

create statistics StatPartitionTable
on PartitionTable(Col2)
with incremental = on;
go

What we are now interested in is when a table has its stats updated and when a partition has its stats updated. Naturally, partition stats updates happen as part of the table stats update and can’t happen as an independent process.

To start, let’s update all the stats. But first, run Profiler with the SP:StmtCompleted and SQL:StmtCompleted events (obviously filtered to the SPID that you are working in 🙂); with this on we can monitor the STATMAN commands that are used internally by SQL Server and the partitions that are being sampled.

With that running, update the stats with :

update statistics PartitionTable(statpartitiontable)

In the profiler session you will see 4 STATMAN calls:

Looking at the TextData for each one you will see that there is one per partition indicated by the usual partition function filter of

WHERE $PARTITION.myRangePF1([col1]) = X

X being the partition number.

Let’s update some rows in a single partition:

update PartitionTable
set col2=1
where col1<=20500

This is 20% of the rows + 500, enough to cause SQL Server to need to update the table statistics. In case you are wondering, TF 2371 is still relevant and will use the dynamic drop-off as in versions prior to 2014.

You can also monitor the number of modifications using

select * From sys.dm_db_stats_properties(object_id('partitiontable'),2)

If we now run a statement that requires statistics on the table such as

select count(*) from PartitionTable where col2=877
option(recompile)

In our profiler session we will see that indeed only one of the partitions has been updated and the modification count in dm_db_stats_properties has been reset to zero.

Quite neat: why bother to sample the others when there have been no changes?

Let’s complicate things a bit more and update two partitions:

update PartitionTable
set col2=407
where col1<=25000
go
update PartitionTable
set col2=407
where col1>100000 and col1<=125000
go

The modification counter now equals 50,000, and after executing

select count(*) from PartitionTable where col2=877 option(recompile)

the counter gets reset to zero and STATMAN calls for both partitions will be seen in Profiler.

Now, here is the scenario that I wanted to show and had a demo fail in my session:

update PartitionTable
set col2=2225
where col1<=15000
go
update PartitionTable
set col2=2225
where col1>100000 and col1<=113000
go

Looking at the modification counter there are 28,000 updates, so the question is: will the stats now be updated?

Run the count(*) and find out 🙂

The modification counter will equal zero, but there have been no STATMAN calls. Why is this? Well, although the table has had 20% + 500 changes, the partitions are also required to have a certain level of updates too. I apologise for the use of ‘certain level’ here, as I haven’t been able to find what the exact formula is, but from my observations around 20% of a partition’s rows need to be updated to force the stats update for that partition.

So what has happened to the update counter? Well, it is still there at the partition level; this can be seen in the internal version of the stats properties DMV.

select * From sys.dm_db_stats_properties_internal(object_id('partitiontable'),2)

Partitions one and two have 15k and 13k updates respectively and will continue to increment as updates happen. This does muddy the waters of the functionality a bit and, to be honest, in my mind makes it very hard to come up with a solid usage recommendation.

Overall I think that if you have a partitioned table then you will want to enable incremental statistics, but just be aware that the individual partitions may not be updating in the manner that you are expecting.

Let us return to the >200 step histogram issue too. To my mind, if it were possible for SQL Server to use the histograms at the partition level, then it would show when filtering the table using the partition function, i.e.

$PARTITION.myRangePF1([col1])

However if we completely mangle our test data with

update PartitionTable set col2 = $PARTITION.myRangePF1([col1])

and then

Select count(*) from PartitionTable where col2=1 and $PARTITION.myRangePF1([col1])=1 option(recompile)
go
Select count(*) from PartitionTable where col2=1 and $PARTITION.myRangePF1([col1])=2 option(recompile)

We do get a different estimation, but let’s dive into this a bit further with the cardinality estimator extended event.

For partition 1 it looks like this:

For partition 2 it looks like this

So the selectivity of col2 is the same across both partitions, 0.318, but the selectivity of the partition function is different, and hence the different estimation. Maybe I am oversimplifying things here but, as I say, if there could be a difference then I would have expected it to show here.

Nov 02

Non use of persisted data – Part Three

A long time ago I looked at some reasons why persisted computed columns (and therefore indexes on those columns) are not used (Part 1, Part 2).
Recently I was drawn into an issue where the same thing was happening: a persisted computed column had been created with an index, but neither was being used when expected.
Using AdventureWorks, let’s re-create this scenario:

create table ccTest
(
id integer primary key,
surname varchar(50) not null,
-- Note: ISNULL wrapped around a NOT NULL column - this redundancy matters later
cSum as checksum(isnull(surname,' ')) persisted not null)
go
create index idxcsum on ccTest(cSum)
go
 
 
insert into ccTest(Id,surname)
Select BusinessEntityID,LastName from Person.Person
go

Now, if we want to find all those surnames with a specific checksum (yes, I know, checksum is bad), it would be reasonable to:

select * from ccTest where cSum=0

and assume that the index idxcsum is used to retrieve the required data. Looking at the plan, it is obvious that this does not happen.

[Plan image: ixscan1]

The clustered index is scanned and the nonclustered index is ignored. There are cases where this is a good thing to happen, for instance to avoid a key/bookmark lookup when that would be an expensive operation. Here, though, there is an estimate of ~16 rows, not enough for that to be the case.
Maybe we need to force the use of the index?

select * from ccTest with(index=idxcsum) where cSum=0

This uses the index, but not in the manner we need :

[Plan image: ixforced]

The index is scanned, all the rows are read from it, and the key lookup previously mentioned has entered the equation.
So what is really happening here?
Well, in the second example there are some very graphical clues: the Compute Scalar and the Filter.
What is the Compute Scalar computing? Let’s look at ‘Defined Values’ in its properties:
[[AdventureWorks2012].[dbo].[ccTest].cSum] = Scalar Operator(checksum([AdventureWorks2012].[dbo].[ccTest].[surname]))
So the persisted data is being recomputed. Looking at the Filter, the predicate is:
[AdventureWorks2012].[dbo].[ccTest].[cSum]=(0)
That computed value is then being filtered upon. But this is the persisted data, so why is the index not used? Well, that’s because the computation isn’t logically the same. But how can this be?
There are two things that need to be appreciated here:

  1. SQL Server puts a lot of energy into simplifying queries and operations. Redundant branches are pruned and contradictions removed.
  2. Computed columns are expanded back to the base definition before being optimized.

In our computed column definition we have a redundant code branch: ISNULL is used to guard against the Surname column being NULL; however, the value in Surname can never be NULL as there is a NOT NULL constraint on it. Therefore, when the column definition is expanded it is simplified and this redundancy is removed. You can see this in the ‘Defined Values’ above: there is no ISNULL in the computation. The optimizer cannot now match this computation back to the persisted computed column, as the definition is no longer the same. It would be nice to see the NOEXPAND hint usable to prevent this happening, as it is for indexed views.
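As an aside, a quick way to double-check the stored definition of the computed column (before the optimizer expands and simplifies it) is the catalog view:

select name, definition, is_persisted
from sys.computed_columns
where object_id = object_id('ccTest')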
Armed with that knowledge, the resolution is relatively straightforward:

create table ccTest2
(
id integer primary key,
surname varchar(50) not null,
cSum as checksum(surname) persisted not null)
go
create index idxcsum on ccTest2(cSum)
go
 
 
insert into ccTest2(Id,surname)
Select BusinessEntityID,LastName from Person.Person
go

And then if we :

select * from ccTest2 where ccTest2.cSum=0

We will see the index used as expected in the first place

[Plan image: ixgood]

So, the takeaway is that if you are using persisted computed columns, ensure that the definitions have no redundancy and then the indexes should be used.
NB: Curiously there is something else, or more accurately another level of indirection, here. If you use the query:

select * from ccTest where CHECKSUM(surname) = 0

You may have expected it to match to the persisted computed column, as it would against ccTest2, but it doesn’t.

Oct 20

Manchester and Leeds user groups – 16/17 Oct

Thanks to all those that turned out on Wednesday and Thursday night in Leeds and Manchester to hear my presentation on Estimation and Statistics. The ‘lucky’ Manchester crowd also got to hear ‘Trace Flag Fun’, a deep dive into optimizer internals.

As promised here are the slides and scripts:

 

Estimation and Statistics

TraceFlag Fun

If you have any follow up questions then please mail me : dave@clearskysql.co.uk

Sep 30

A quickie post on debugging

I’m not a great fan of the debugging feature in Management Studio; it always strikes me that to get any use from it you must be iterating (looping) too much, i.e. doing TSQL wrong. TSQL is not a procedural programming language, and the debugger would seem most useful if you did develop TSQL in a procedural manner. If you could use it to peek inside a temporary table or execute a single CTE in a complex statement, then we’d be talking 🙂

What I do like to do, though, is use the little-known (at least to the people I speak to) “Trace Query in SQL Server Profiler” option.

[Image: Profile]

This is accessed by right-clicking in a query window and will run Profiler with a filter on the current window’s SPID. Quite neat.

This feature can be further enhanced to be even more useful. When Profiler is run it will default to using the template “TSQL_SPs”. The default configuration of this template is, IMO, not very useful, so I have created my own “TSQL_SPs” template which contains the events and columns that I am interested in: all errors and warnings, SP:StmtCompleted, SQL:StmtRecompile, SQL:BatchCompleted and a few others, with the Duration, CPU, Reads and Writes columns. This is simple to do: create a Profiler session with the events and columns that you are concerned about, then File -> Save As -> Template and select “TSQL_SPs” from the drop-down. Hopefully, next time you “Trace Query in SQL Server Profiler” your template will be loaded and the SPID filter automatically added.

To my mind, that is real TSQL debugging: how hard is this statement going to hit the metal?