An application generated sequential GUID

Recently one of our development teams implemented a GUID based on this SqlGuid Structure. Initially I did not give much thought to the implementation, however later during a code review, noting the choice of uniqueidentifier as the primary key, I began to join the dots. Clearly this is meant to be sequentially generated, and leaving aside the debate on the choice of GUID as PK, lets look at how reliable the implementation is. The link above suggests that in sql only the last six bytes are evaluated, however if one looks at the output from newsequentialid() we can see this is not the case, and therefore the implementation may not be reliable.

newsequetialid_sorting

The sequential GUID implementation

The C# implementation is given below and is basically calculate the number of ticks (see definition here) since an agreed point in time, in this case Jan 1st 2000, and then apply the least significant 6 bytes from the ticks value into the last six bytes of a randomly generated GUID.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
private static readonly long _ReferenceDate = new DateTime(2000, 1, 1).Ticks;
 
// return guid based on last six bytes
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlGuid NewSequentialGuidOrig()
{
    var ticks = DateTime.UtcNow.Ticks - _ReferenceDate;
    return SequentialGuidOrig.New(ticks);
}
 
private static class SequentialGuidOrig
{
    public static Guid New(long ticks)
    {
        var guidBytes = Guid.NewGuid().ToByteArray();
        var tickBytes = BitConverter.GetBytes(ticks);
 
        return new Guid(new[]
        {
            guidBytes[0],
            guidBytes[1],
            guidBytes[2],
            guidBytes[3],
            guidBytes[4],
            guidBytes[5],
            guidBytes[6],
            guidBytes[7],
            guidBytes[8],
            guidBytes[9],
 
            // the last six bytes need to be sequential to sort in SQL Server
            tickBytes[7],
            tickBytes[6],
            tickBytes[5],
            tickBytes[4],
            tickBytes[3],
            tickBytes[2]
        });
    }
}

Testing it out

We can easily test this by implementing a sqlclr version over a small data set of say, 65k rows.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
/**
create a set of guids with int row id. 
On resorting data by inserted guid its accuracy can be determined by distance of new row id from inserted one.
**/
if object_id('SequentialGuidOrigResults') is not null 
   truncate table SequentialGuidOrigResults
else
   create table SequentialGuidOrigResults (Id int not null, SequentialGuid uniqueidentifier not null);
go
 
with t1(i) as (select 1 union select 2)
   , t2(i) as (select t1.i from t1 cross join t1 t)
   , t3(i) as (select t2.i from t2 cross join t2 t)
   , t4(i) as (select t3.i from t3 cross join t3 t)
   , t5(i) as (select t4.i from t4 cross join t4 t)
   , t(Id) as (select row_number() over(order by (select null)) from t5)
 
insert into SequentialGuidOrigResults 
select [Id]
     , dbo.NewSequentialGuidOrig() [SequentialGuid]
from   t;
go
 
-- sequential guid distance distribution
-- reorder data set by SequentialGuid and calculate distance from inserted row number
select distance
     , count(1) [count] 
from   ( 
       select Id
            , [SequentialGuid]
            , abs(Id - row_number() over(order by [SequentialGuid])) distance
       from   SequentialGuidOrigResults
       ) t
group  by
       distance 
order  by
       1;

and as we can see from plotting a 300 row sample of sorted SequentialGuid values against the expected insert order (red line) in the graph below, its a bit hoppy.

sequential_guid_initial

Can we improve

Since we have only used six of the eight available bytes from our ticks value, then all we need to do is work out the next two bytes that have sorting precedence and apply the remaining two (and most significant) bytes and test.

1
2
3
4
5
6
7
8
9
-- following the last 6 byte block, the next sorting precedence is the last 2 byte block. Byte 8 is most significant.
select cast(g as uniqueidentifier) from 
(
       values ('00000000-0000-0000-00AA-000000000000')
            , ('00000000-0000-0000-AA00-000000000000')
) t(g) 
order  by
       1;
go
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
private static class SequentialGuidTicks
{
    public static Guid New(long ticks)
    {
        var guidBytes = Guid.NewGuid().ToByteArray();
        var tickBytes = BitConverter.GetBytes(ticks);
 
        return new Guid(new[]
        {
            guidBytes[0],
            guidBytes[1],
            guidBytes[2],
            guidBytes[3],
            guidBytes[4],
            guidBytes[5],
            guidBytes[6],
            guidBytes[7],
 
            // after last 6, bytes 9 & 10 have sorting precedence. 
            tickBytes[1],
            tickBytes[0],
 
            // the last six bytes need to be sequential to sort in SQL Server
            tickBytes[7],
            tickBytes[6],
            tickBytes[5],
            tickBytes[4],
            tickBytes[3],
            tickBytes[2]
        });
    }
}

Overlaying the sorted SequentialGuid values generated using all bytes from the ticks value, the results show a marked improvement, but still hoppy.

sequential_guid_fullticks

Adding A Sequencer

Clearly, implementing a sequencer in the remaining bytes should be helpful, and this would need to be thread safe. Fortunately the system.threading namespace in .NET helps us out here and provides interlocked class. The increment function not only implements integer increment in an thread safe atomic operation, it also rolls over to 1 on reaching max int. In this situation this proves to be very useful. Lets implement that…

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- following the last 8 bytes, 
-- the next sorting precedence is the middle 2 byte block. Byte 7 is most significant then byte 6.
-- then the first 2 byte block. Byte 5 is most significant.
-- finally the first 4 byte block. Byte 3 is most significant.
select cast(g as uniqueidentifier) from 
(
       values ('AA000000-0000-0000-0000-000000000000')
            , ('00AA0000-0000-0000-0000-000000000000')
            , ('0000AA00-0000-0000-0000-000000000000')
            , ('000000AA-0000-0000-0000-000000000000')
            , ('00000000-AA00-0000-0000-000000000000')
            , ('00000000-00AA-0000-0000-000000000000')
            , ('00000000-0000-AA00-0000-000000000000')
            , ('00000000-0000-00AA-0000-000000000000')
) t(g) 
order  by
       1;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
private static readonly Sequencer _sequencer = new Sequencer();
 
// sequencer wrapper implementing the interlocked increment, getting around limitation of static readonly variable access
// this is a thread safe implementation
private class Sequencer
{
    public int _i;
    public int Increment()
    {
        return Interlocked.Increment(ref _i);
    }
}
 
private static class SequentialGuidSeqn
{
    public static Guid New(long ticks, int sequence)
    {
        var guidBytes = Guid.NewGuid().ToByteArray();
        var tickBytes = BitConverter.GetBytes(ticks);
        var seqBytes = BitConverter.GetBytes(sequence);
 
        return new Guid(new[]
        {
            guidBytes[0],
            guidBytes[1],
            guidBytes[2],
            guidBytes[3],
 
            // sql server presentation for this byte block is swapped
            seqBytes[1],
            seqBytes[0],
 
            // sql server presentation for this byte block is swapped
            seqBytes[3],
            seqBytes[2],
 
            // after last 6, bytes 9 & 10 have sorting precedence. 
            tickBytes[1],
            tickBytes[0],
 
            // the last six bytes need to be sequential to sort in SQL Server
            tickBytes[7],
            tickBytes[6],
            tickBytes[5],
            tickBytes[4],
            tickBytes[3],
            tickBytes[2]
        });
    }
}

Overlaying the sorted SequentialGuid values generated when including a sequencer, the results (that fat blue line hiding behind the target red one) are sequential.

sequential_guid_sequence

In Summary

Clearly there are some limitations, for example the reference time from which ticks is generated needs to fixed. Scaling out would also need to be carefully considered. What is also not so obvious is the interlocked.increment rollover can result in some out of sequence values, although when putting this post together I noticed there is an increment64 which would be much safer to use.

All the code presented is available here in GitHub as a sql project.

Enjoy.