SQLBits, Incremental Statistics and Demo Fail

So another great SQLBits is done and dusted, many thanks to the team, helpers and sponsors for putting on such a great event. It’s always a pleasure and honour to be able to present here and to get to spend time with a great crowd of intelligent SQL Geeks.

I have uploaded the slides and scripts from both of my sessions: Friday's session "Statistics and Estimations" is available here http://1drv.ms/1C0BMva and Saturday's "2014 Cardinality Estimation" session is available here http://1drv.ms/1FwA7er .

During the 2014 session I suffered a massive demo fail and, as ever, realised my mistake as soon as I had left the room. I was in the middle of what I think is an interesting scenario with incremental statistics, so I thought it was worth adding the details of the point I was *trying* to make on my blog.

Incremental statistics are a great new addition: simply put, each partition in a table now has its own set of statistics, and these are aggregated together to form the table statistics for estimation purposes. I have seen posts that sort of suggest that more than 200 histogram steps are now used for estimation, but I haven't found any evidence for this; I'll pick this up later in the post.

For most partitioned tables incremental statistics are great, as in reality most partitioned tables only receive updates to the 'last' partition. Most partitioning strategies are based on something that is continually incrementing (like a datetime or ID), so only the last partition is touched. Using incremental statistics means that SQL Server no longer has to look at the previous partitions (which have not been changed) to form statistics: a nice little cost saving. It's almost a no-brainer; I can't really see a downside. Ben Nevarez covers the basics in more detail here.

However, for those tables that have updates to multiple partitions, there are some nuances that you will need to be aware of, and this is the scenario I had in the demo fail.

As ever, the best way of demonstrating the subtleties is by example, so let's create a table, define the partitioning strategy and add some data.

drop database PartitionTest
go
Create Database PartitionTest
go
USE PartitionTest;
GO
-- Adds four new filegroups to the PartitionTest database
ALTER DATABASE PartitionTest
ADD FILEGROUP test1fg;
GO
ALTER DATABASE PartitionTest
ADD FILEGROUP test2fg;
GO
ALTER DATABASE PartitionTest
ADD FILEGROUP test3fg;
GO
ALTER DATABASE PartitionTest
ADD FILEGROUP test4fg;
go
 
-- Adds one file for each filegroup.
ALTER DATABASE PartitionTest
ADD FILE
(
NAME = test1dat1,FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\t1dat1.ndf',SIZE = 5MB,MAXSIZE = 100MB,FILEGROWTH = 5MB
)
TO FILEGROUP test1fg;
 
ALTER DATABASE PartitionTest
ADD FILE
(
NAME = test2dat2,FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\t2dat2.ndf',SIZE = 5MB,MAXSIZE = 100MB,FILEGROWTH = 5MB
)
TO FILEGROUP test2fg;
GO
ALTER DATABASE PartitionTest
ADD FILE
(
NAME = test3dat3,FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\t3dat3.ndf',SIZE= 5MB,MAXSIZE = 100MB,FILEGROWTH = 5MB
)
TO FILEGROUP test3fg;
GO
ALTER DATABASE PartitionTest
ADD FILE
(
NAME = test4dat4,FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\t4dat4.ndf',SIZE = 5MB,MAXSIZE = 100MB,FILEGROWTH = 5MB
)
TO FILEGROUP test4fg;
GO
-- Creates a partition function called myRangePF1 that will partition a table into four partitions
CREATE PARTITION FUNCTION myRangePF1 (int)
AS RANGE LEFT FOR VALUES (100000, 200000, 300000) ;
GO
-- Creates a partition scheme called myRangePS1 that applies myRangePF1 to the four filegroups created above
CREATE PARTITION SCHEME myRangePS1
AS PARTITION myRangePF1
TO (test1fg, test2fg, test3fg, test4fg) ;
GO
-- Creates a partitioned table called PartitionTable that uses myRangePS1 to partition col1
drop TABLE PartitionTable
go
CREATE TABLE PartitionTable (col1 int identity primary KEY, col2 int)
ON myRangePS1 (col1) ;
GO
-- Can be a bit slow so....
ALTER DATABASE PartitionTest SET DELAYED_DURABILITY = FORCED;
go
/* Populate with 10* SalesOrderHeader*/
insert into PartitionTable(col2)
select CurrencyRateID
from AdventureWorks2012.sales.SalesOrderHeader
go 10

All being well you should now have multiple partitions populated with some data, confirm that with

select partition_number,rows from sys.partitions where OBJECT_ID=OBJECT_ID('PartitionTable')

Now to add incremental statistics to the table we need to simply:

create statistics StatPartitionTable
on PartitionTable(Col2)
with incremental = on;
go

What we are now interested in is when a table has its stats updated versus when a partition has its stats updated. Naturally, partition stats updates happen as part of a table stats update and can't happen as an independent process.

To start, let's update all the stats. But first, run Profiler with the SP:StmtCompleted and SQL:StmtCompleted events (obviously filtered to the SPID that you are working in). With this on we can monitor the STATMAN commands that are used internally by SQL Server and the partitions that are being sampled.

With that running, update the stats with:

update statistics PartitionTable(statpartitiontable)

In the profiler session you will see 4 STATMAN calls:

Looking at the TextData for each one, you will see that there is one call per partition, indicated by the usual partition function filter of

WHERE $PARTITION.myRangePF1([col1]) = X

X being the partition number.

Let's update some rows in a single partition:

update PartitionTable
set col2=1
where col1<=20500

This is 20% of the rows + 500, enough modifications to cause SQL Server to need to update the table statistics. In case you are wondering, TF 2371 is still relevant in 2014 and will use the same dynamic threshold drop-off as prior versions.
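As an aside, the thresholds involved can be sketched with a little arithmetic. This is a sketch based on the commonly documented "20% + 500" rule and the published TF 2371 behaviour, not on the internal STATMAN source; the function name is mine:

```python
import math

def stats_update_threshold(rowcount, tf2371=False):
    """Modifications needed before SQL Server considers the stats stale."""
    static = 500 + 0.20 * rowcount              # classic 20% + 500 rule
    if tf2371 and rowcount > 25_000:
        # TF 2371: dynamic threshold of sqrt(1000 * rows), which falls
        # below the static threshold as the table grows
        return min(static, math.sqrt(1000 * rowcount))
    return static

# Partition 1 of the demo table holds 100,000 rows: 20% + 500 = 20,500,
# exactly the number of rows the update above touches.
print(stats_update_threshold(100_000))              # 20500.0
# The whole table (10 x SalesOrderHeader) is ~314,650 rows; under
# TF 2371 the dynamic threshold is sqrt(1000 * 314650), ~17,738.
print(stats_update_threshold(314_650, tf2371=True))
```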

You can also monitor the number of modifications using

select * From sys.dm_db_stats_properties(object_id('partitiontable'),2)

If we now run a statement that requires statistics on the table, such as

select count(*) from PartitionTable where col2=877
option(recompile)

In our Profiler session we will see that indeed only one of the partitions has been updated, and the modification count in sys.dm_db_stats_properties has been reset to zero.

Quite neat: why bother to sample the others when there have been no changes?

Let's complicate things a bit more and update two partitions:

update PartitionTable
set col2=407
where col1<=25000
go
update PartitionTable
set col2=407
where col1>100000 and col1<=125000
go

The modification counter now equals 50,000, and after executing

select count(*) from PartitionTable where col2=877 option(recompile)

the counter gets reset to zero and both partition STATMAN calls will be seen in Profiler.

Now, here is the scenario that I wanted to show and where I had the demo fail in my session:

update PartitionTable
set col2=2225
where col1<=15000
go
update PartitionTable
set col2=2225
where col1>100000 and col1<=113000
go

Looking at the modification counter there are 28,000 updates, so the question is: will the stats now be updated?

Run the count(*) and find out.

The modification counter will equal zero, but there will have been no STATMAN calls. Why is this? Well, although the table has had 20% + 500 changes, the individual partitions are also required to have a certain level of modifications. I apologise for the vague 'certain level' here; I haven't been able to find what the exact formula is, but from my observations roughly 20% of a partition's rows need to be updated to force that partition's stats update.
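A minimal sketch of this observed behaviour (to be clear: the per-partition 20% figure is purely my observation from this demo, not a documented formula, and the helper name is mine, not anything SQL Server exposes):

```python
def partitions_to_refresh(partition_rows, partition_mods, table_threshold_met):
    """Which partitions' incremental stats would refresh on the next
    stats update, based on the behaviour observed in the demo."""
    if not table_threshold_met:
        return []
    # Observation: a partition only appears to be re-sampled once
    # roughly 20% of its own rows have been modified.
    return [p for p, (rows, mods)
            in enumerate(zip(partition_rows, partition_mods), start=1)
            if mods >= 0.20 * rows]

# Demo-fail scenario: 15,000 mods in partition 1 (100,000 rows) and
# 13,000 in partition 2 -- the table-level threshold is met, yet
# neither partition reaches 20%, so no STATMAN calls fire.
print(partitions_to_refresh([100_000, 100_000, 100_000, 14_650],
                            [15_000, 13_000, 0, 0], True))   # []
```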

So what has happened to the update counter? Well, it is still tracked at the partition level, which can be seen in the internal version of the stats properties DMV:

select * From sys.dm_db_stats_properties_internal(object_id('partitiontable'),2)

Partitions one and two have 15k and 13k updates respectively, and these will continue to increment as updates happen. This does muddy the waters of the functionality a bit and, TBH, in my mind makes it very hard to come up with a solid usage recommendation.

Overall I think that if you have a partitioned table then you will want to enable incremental statistics, but just be aware that the individual partitions may not be updating in the manner that you are expecting.

Let us return to the > 200 step histogram issue. To my mind, if it were possible for SQL Server to use the histograms at the partition level, then it would show when filtering the table using the partition function, i.e.

$PARTITION.myRangePF1([col1])

However, if we completely mangle our test data with

update PartitionTable set col2 = $PARTITION.myRangePF1([col1])

and then

Select count(*) from PartitionTable where col2=1 and $PARTITION.myRangePF1([col1])=1 option(recompile)
go
Select count(*) from PartitionTable where col2=1 and $PARTITION.myRangePF1([col1])=2 option(recompile)

We do get a different estimation, but let's dive into this a bit further with the cardinality estimation extended event.

For partition 1 it looks like this:

For partition 2 it looks like this

So the selectivity of col2 is the same across both partitions (0.318), but the selectivity of the partition function is different, hence the different estimations. Maybe I am over-simplifying things here, but as I say, if there could be a difference then I would have expected it to show here.
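To make the arithmetic behind that argument concrete, here is a back-of-the-envelope sketch using the demo's row counts (10 x SalesOrderHeader, ~314,650 rows; the function name is mine). After the mangle update, col2 = 1 only inside partition 1, so a per-partition histogram would give col2 = 1 a selectivity near 1.0 in partition 1 and near 0.0 in partition 2. The fact that both queries report the same 0.318 is what a single table-wide merged histogram would produce:

```python
def table_level_selectivity(rows_with_value, total_rows):
    """Selectivity from one merged, table-wide histogram."""
    return rows_with_value / total_rows

# col2 = 1 exists only in partition 1 (100,000 of ~314,650 rows),
# so the table-wide selectivity of col2 = 1 is:
sel = table_level_selectivity(100_000, 314_650)
print(round(sel, 3))   # 0.318 -- matching the XE output for both queries
```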
