Big Data

What is Hashing; using Modulus to partition data

Hopefully this post goes some way toward helping the reader better understand what hashing and hash indexes are, and why row chains are needed by the hash indexes on In-Memory Tables (Hekaton) in SQL Server 2014. Purpose of hashing? Hashing can be used to index character data: instead of building an index on a varchar(50) column, for example,(…)
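To make the idea in the title concrete, here is a minimal Python sketch of hashing a character key and using modulus to pick a bucket, with colliding keys chained together in the same bucket. It is an illustration only, not SQL Server's internal hash function or the post's own code: the names `bucket_for` and `build_buckets`, the choice of SHA-256, and the bucket count are all assumptions made for the example.

```python
import hashlib


def bucket_for(key: str, bucket_count: int) -> int:
    """Map a character key to a bucket number in the range [0, bucket_count)."""
    # Assumption: SHA-256 is used only as a stable, well-distributed hash.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest[:8], "big")
    # Modulus folds the large hash value into the fixed number of buckets.
    return hash_value % bucket_count


def build_buckets(keys, bucket_count: int):
    """Partition keys into buckets; keys that collide share a bucket's chain."""
    buckets = [[] for _ in range(bucket_count)]
    for key in keys:
        buckets[bucket_for(key, bucket_count)].append(key)
    return buckets


if __name__ == "__main__":
    sample_keys = ["apple", "banana", "cherry", "damson", "elderberry"]
    for number, chain in enumerate(build_buckets(sample_keys, 4)):
        print(number, chain)
```

Because there are more possible keys than buckets, several keys can land in the same bucket; the list held by each bucket plays the role of the chain that a lookup has to walk.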

Reducing SQL Server IO and Access Times using Bloom Filters – Part 3 (Inserting Data)

Part 2 (Basics of the method in SQL Server) explained how to get data into a Bloom Filter structure; it now needs persisting. This post explains one method of storing a Bloom Filter in a SQL Server database. I assume you have read Part 1 and Part 2 and understand about(…)

Reducing SQL Server IO and Access Times using Bloom Filters – Part 2 (Basics of the method in SQL Server)

Part 1 addressed Bloom Filter concepts; if you haven’t already read it, it’s important to start there. In this post I will show the basics of how we set and query the bit array that holds our Bloom Filter structure. Step 1 – Hash the target data element (key): multiple hash functions are used over your(…)
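The setting and querying of the bit array described in this excerpt can be sketched in a few lines of Python. This is a generic Bloom filter under stated assumptions, not the T-SQL method the post itself develops: the class name `BloomFilter`, its methods, and the use of seeded SHA-256 digests to stand in for "multiple hash functions" are all hypothetical choices made for illustration.

```python
import hashlib


class BloomFilter:
    """A minimal Bloom filter backed by a byte array used as a bit array."""

    def __init__(self, bit_count: int, hash_count: int):
        self.bit_count = bit_count
        self.hash_count = hash_count
        self.bits = bytearray((bit_count + 7) // 8)

    def _positions(self, key: str):
        # Assumption: each "hash function" is SHA-256 over the key with a
        # different seed prefix, reduced to a bit position with modulus.
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.bit_count

    def add(self, key: str) -> None:
        # Set one bit per hash function.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # True only if every bit for this key is set. False positives are
        # possible; false negatives are not.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


if __name__ == "__main__":
    bloom = BloomFilter(bit_count=1024, hash_count=3)
    bloom.add("6F9619FF-8B86-D011-B42D-00C04FC964FF")
    print(bloom.might_contain("6F9619FF-8B86-D011-B42D-00C04FC964FF"))
```

A negative answer from `might_contain` is definitive, so the expensive lookup can be skipped for that key; a positive answer still has to be confirmed against the real data.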

Reducing SQL Server IO and Access Times using Bloom Filters – Part 1 (Concepts)

Given a 10 million row table with a GUID as its primary key, we have a 50,000 row table that we want to look up against it to see if we have any matching rows and, for those matching rows, aggregate the data. Let’s assume that 50% of the rows have a corresponding match, so(…)

Overview of HADOOP, new features in Version 1, Version 2 branch description

Introduction: This essay assumes the reader has no knowledge of Hadoop or MapReduce. It gives an overview of Hadoop and of the confusion around the project branches that form v1.0.0 and v2.0.0, and discusses the features introduced in v1.0.0 and some of the use cases that they can be applied(…)