Experience Innovative Online Hands-on SQL Server Learning via YouTube

We would like to introduce you to the first online hands-on learning experience for SQL Server.

Learn SQL Server internals and practice the hands-on examples shown in the YouTube videos right away – LIVE – while watching! No bulky VMs to download; the SQLTest tool simulates the hands-on labs for you! No registration necessary.

Here are some of the topics we have available at the moment:

  • Why does SQL Server choose Scan instead of Seek? Also check out the cost-based optimizer and see for yourself: it sometimes chooses a seek over a scan even though the seek is slightly more expensive.
  • How about Resource Governor? Are you aware that Resource Governor CPU throttling does not work for mixed workloads until SQL Server 2016? Before you promise your manager anything, practice some scenarios!
  • Did you know the SQL Server optimizer doesn’t account for the CPU overhead of expanding compressed data? Learn why you should never compress certain tables by checking out the hands-on video Data Compression – Pros and Cons.
  • Don’t count on the wait time: SQL Server resets the ASYNC_NETWORK_IO wait time after a threshold. Don’t believe any canned demos; check it out for yourself!
  • Concurrent updates and selects from adjacent rows on the same page from many sessions can lead to PAGELATCH_EX and PAGELATCH_SH contention; practice the solution.

All tutorials can be found on sqlvideo.com (YouTube embedded), hands-on example scripts included.

Enjoy the experience and please take a few minutes to give us feedback so we can improve our tutorials!

Are there any specific topics you would like us to cover in our next clips? Don’t forget to mention them as well.

If you want to take the learning a step further, join one of our in-person hands-on workshops! Schedule

Looking forward,

Ramesh and Carmen from SQLWorkshops.com

Subscribe to our newsletter

  • To receive a free Community License for SQLTest.
  • To receive access to additional, bonus video tutorials and example scripts.
  • To keep the learning going.

NUMA remote (foreign) memory access overhead on Windows, SQL Server and In-Memory OLTP

In NUMA (Non-Uniform Memory Access), processors in the same physical location are grouped into a node that has its own local node memory. A NUMA-based system has more than one such node, and the nodes use a shared interconnect to transfer data between them. In such a system, a processor accessing its local node memory is much faster than the same processor accessing memory from a remote node.
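To see the NUMA layout SQL Server itself detects, you can query the sys.dm_os_nodes DMV. A minimal sketch (the exact set of columns varies slightly by SQL Server version):

```sql
-- List the NUMA nodes SQL Server sees: one row per node,
-- plus the hidden node reserved for the Dedicated Admin Connection (DAC).
SELECT node_id,
       node_state_desc,
       memory_node_id,
       cpu_affinity_mask,
       online_scheduler_count
FROM sys.dm_os_nodes;
```

On a single-node (non-NUMA) machine you will typically see just node 0 plus the DAC node.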

To receive the example scripts used in the video, subscribe to our mailing list: https://newsletter.sqlworkshops.com

Reference: https://software.intel.com/en-us/articles/optimizing-applications-for-numa

Most operating systems optimize memory allocation on a NUMA-based system: when a thread executing on a processor allocates memory, the operating system tries to allocate it from the processor’s local node unless there is a memory availability issue, in which case it allocates memory from a remote node.

Using a simple Windows application, we can demonstrate that a thread running on a processor accessing remote memory can be 15% or more slower than accessing its local memory, depending on the processor model.

In SQL Server, when memory is allocated for caching data, it is allocated from the local node whenever possible. When a query scans a table serially (MAXDOP = 1), the memory allocated for the data cache always comes from the local node when possible, which can lead to the table residing entirely in one memory node. When a thread executing a query that retrieves data from this table happens to be on a different node, the data access becomes expensive.
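A hedged sketch of the scenario described above, using a hypothetical table name:

```sql
-- Serial scan (MAXDOP 1): the single worker caches the pages it reads,
-- so the table's data cache tends to accumulate on that worker's local
-- memory node. dbo.SalesLarge is a hypothetical large table.
SELECT COUNT(*)
FROM dbo.SalesLarge
OPTION (MAXDOP 1);

-- A later query whose thread runs on a different node now reads
-- much of this table from remote (foreign) memory.
```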

Remote memory is referred to as “foreign memory” in SQL Server. A large amount of foreign node memory allocation by itself does not indicate an issue, as the threads accessing this memory can be from any node. SQL Server does some amount of NUMA optimization; for example, when a query executes in parallel, it keeps all the threads on a single node when possible.
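One way to observe foreign memory is the sys.dm_os_memory_nodes DMV; a minimal sketch:

```sql
-- Per memory node: total committed memory and how much of it is "foreign",
-- i.e. backed by physical memory belonging to a different hardware node.
SELECT memory_node_id,
       virtual_address_space_committed_kb,
       foreign_committed_kb
FROM sys.dm_os_memory_nodes;
```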

As explained above, when a table is cached on a single NUMA node as part of a large table or range scan and is later accessed from another node as part of a large table or range scan, for example under serial execution (MAXDOP = 1), the performance penalty can be 15% or more depending on the processor model. To mitigate this issue it is recommended to scan the table in parallel (MAXDOP greater than the number of processors in a NUMA node) so the data is not isolated to a single NUMA node.
On the positive side, one can isolate a table to a single NUMA node using Resource Governor and always access that table using processors in that node, avoiding this 15%+ penalty. On a 2-node NUMA system this can lead to a 7.5%+ overall improvement.
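A minimal sketch of the Resource Governor approach, assuming a 2-node system (pool and group names are hypothetical, and you still need a classifier function to route the relevant sessions into the group):

```sql
-- Pin a resource pool to NUMA node 0. Sessions classified into its
-- workload group run only on node 0's schedulers, so the table they
-- scan stays cached in node 0's local memory.
CREATE RESOURCE POOL Node0Pool WITH (AFFINITY NUMANODE = (0));
CREATE WORKLOAD GROUP Node0Group USING Node0Pool;
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;
```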

This applies to SQL Server 2014 In-Memory OLTP technology as well, with a slight variation. With regular (non-memory-optimized) tables, data that is not in the cache is loaded into the cache on demand, whenever someone accesses it. Hence, it is important to distribute the data across all NUMA nodes during query execution for predictable performance.

With memory-optimized tables, data is loaded by SQL Server at startup, and during this load SQL Server distributes the data across all NUMA nodes.

With memory-optimized tables, the problem of data distribution across NUMA nodes occurs only during the initial data load into the table: since queries execute serially (always MAXDOP = 1) in In-Memory OLTP, all the data inserted by a single thread may end up residing on a single NUMA node. The problem disappears after a SQL Server restart, as explained above.

Since In-Memory OLTP technology uses native code, fewer processor instructions are executed; this magnifies the relative impact of foreign memory access. With In-Memory OLTP technology, this penalty can be 30% or more depending on the processor model.

Usually foreign memory access should not be an issue with In-Memory OLTP, as one should not perform large table or range scans with it.
Again, as with normal tables and large scans, it is recommended to load the table with many threads, which sometimes means splitting large inserts into smaller parts and executing them in separate batches.
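Such a split load can be sketched as follows, with hypothetical table and column names; each range is inserted from a separate session, so the inserting threads, and the rows they allocate, are spread across NUMA nodes:

```sql
-- Session 1:
INSERT INTO dbo.OrdersInMem (OrderID, Amount)
SELECT OrderID, Amount
FROM dbo.OrdersStaging
WHERE OrderID BETWEEN 1 AND 1000000;

-- Session 2, run concurrently from another connection:
INSERT INTO dbo.OrdersInMem (OrderID, Amount)
SELECT OrderID, Amount
FROM dbo.OrdersStaging
WHERE OrderID BETWEEN 1000001 AND 2000000;
```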