Bare metal to cloud-hosted virtual machines

To many data professionals, particularly those in an operational database administrator (DBA) role, Non-Uniform Memory Access (NUMA) is a four-letter word in every sense, well beyond simply the number of characters in the acronym.
Fear not, dear reader: this article will remove some of the mystery around the basics of NUMA, address how those basics relate to SQL Server, and take a word or two to differentiate between hardware NUMA and soft-NUMA.
To the delight of some and the trepidation of others, we simply cannot discuss NUMA without talking about hardware. For a system (computer) to be a NUMA system, it must have more than one CPU; not multiple cores, but multiple physical sockets, each with its own CPU. For the scope of this article, the number of cores per CPU is functionally irrelevant. As we think about the hardware layout of a motherboard with multiple CPU sockets, each socket has its own bank of RAM slots connected directly to that CPU.
Each socket and its directly attached RAM is known as a node; hence a two-socket computer has two NUMA nodes, a four-socket computer has four NUMA nodes, and so on. Generally, each NUMA node has direct access to the total maximum system RAM divided by the number of CPU sockets.
For example, a dual-socket system with a 1TB (1024GB) RAM maximum has two NUMA nodes in which each socket can directly access 512GB of RAM, while in a four-socket system with the same maximum RAM, each NUMA node directly accesses 256GB of RAM. However, that is not to say a socket cannot access all of the RAM; it simply cannot directly access RAM outside of its own node.
There are cross connects between NUMA nodes which allow for access to memory in other nodes, but accessing remote memory is quite expensive from a compute perspective.
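The arithmetic above can be sketched as a tiny function (Python used purely for illustration; the function name is mine, not part of any SQL Server or OS API):

```python
def ram_per_numa_node(total_ram_gb: float, sockets: int) -> float:
    """Directly accessible RAM per NUMA node: total system RAM
    divided evenly across CPU sockets (one node per socket)."""
    return total_ram_gb / sockets

# 1 TB (1024 GB) across two sockets -> 512 GB per node
print(ram_per_numa_node(1024, 2))  # 512.0
# The same 1 TB across four sockets -> 256 GB per node
print(ram_per_numa_node(1024, 4))  # 256.0
```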
If your SQL Server instance is running on bare metal (a physical server), feel free to skip to the next section. However, if you are running in a virtual environment, there are a few more items to talk about.
When a virtual machine is built, its configuration needs to consider the number of virtual CPUs (vCPUs) and virtual cores (vCores). The basic rule of thumb is: do not assign more vCores to a vCPU than there are physical cores (pCores) in a physical CPU (pCPU) on the smallest host the virtual machine could end up running on.
If the smallest host has four eight-core processors, no vCPU should be created with more than eight vCores. Similarly, if you create a VM with two vCPUs of four vCores each, the hypervisor will coalesce those so they run on a single socket rather than managing NUMA. Please check the documentation for your hypervisor of choice for additional details. When it comes to cloud-based Infrastructure as a Service (IaaS) VMs, you are likely provisioning the VM in a provider-defined class, so we need to trust that the provider has considered NUMA in its design.
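The rule of thumb above is simple enough to express as a sanity check (a sketch in Python; the function name is illustrative and not part of any hypervisor tooling):

```python
def valid_vcpu_layout(vcores_per_vcpu: int, pcores_per_pcpu: int) -> bool:
    """Rule of thumb: a virtual socket should never have more vCores
    than the physical cores in one socket of the smallest host the
    VM could end up running on."""
    return vcores_per_vcpu <= pcores_per_pcpu

# Smallest host has eight-core processors:
print(valid_vcpu_layout(8, 8))   # True  - fits within one physical socket
print(valid_vcpu_layout(12, 8))  # False - would spill across sockets
```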
The most prominent way NUMA affects SQL Server relates to the Maximum Degree of Parallelism (MAXDOP). When we consider the high compute cost of accessing memory in a remote NUMA node, the basis of guidelines saying we want to keep MAXDOP at or below the number of cores in a NUMA node begins to make sense.
The following tables from Microsoft’s “Configure the max degree of parallelism Server Configuration Option” document give some guidance on setting MAXDOP. Please note that the phrase “logical processors” refers to the number of hyper-threaded cores if hyper-threading is enabled, or the number of physical cores if it is not.
Starting with SQL Server 2016 (13.x), use the following guidelines when you configure the max degree of parallelism server configuration value:

Server with single NUMA node:
- Less than or equal to 8 logical processors: keep MAXDOP at or below the number of logical processors.
- Greater than 8 logical processors: keep MAXDOP at 8.

Server with multiple NUMA nodes:
- Less than or equal to 16 logical processors per NUMA node: keep MAXDOP at or below the number of logical processors per NUMA node.
- Greater than 16 logical processors per NUMA node: keep MAXDOP at half the number of logical processors per NUMA node, with a MAX value of 16.
From SQL Server 2008 through SQL Server 2014 (12.x), use the following guidelines when you configure the max degree of parallelism server configuration value:

Server with multiple NUMA nodes:
- Less than or equal to 8 logical processors per NUMA node: keep MAXDOP at or below the number of logical processors per NUMA node.
- Greater than 8 logical processors per NUMA node: keep MAXDOP at 8.
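The SQL Server 2016 and later guidance above can be sketched as a small function (Python used for illustration; the function name and the assumption that logical processors are evenly distributed across nodes are mine). It returns the upper bound of the recommended MAXDOP:

```python
def recommended_maxdop(logical_procs_per_node: int, numa_nodes: int) -> int:
    """Upper bound of the recommended MAXDOP, per Microsoft's
    guidance for SQL Server 2016 (13.x) and later."""
    if numa_nodes == 1:
        # Single NUMA node: number of logical processors, capped at 8
        return min(logical_procs_per_node, 8)
    if logical_procs_per_node <= 16:
        # Multiple nodes, <= 16 logical processors per node
        return logical_procs_per_node
    # Multiple nodes, > 16 per node: half the per-node count, capped at 16
    return min(logical_procs_per_node // 2, 16)

print(recommended_maxdop(12, 1))  # 8  - single node, more than 8 logical processors
print(recommended_maxdop(12, 2))  # 12 - at or below 16 per node
print(recommended_maxdop(40, 2))  # 16 - half of 40 is 20, capped at 16
```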
The tables above imply that core counts in a single processor can grow rather large, and that is in fact true. In supported versions of SQL Server (2012 and higher at the time of this writing) there exists a concept called software-NUMA, or soft-NUMA.
For SQL Server instances running 2014 SP2 and newer, soft-NUMA is automatically configured; prior to that, it is a manual configuration. When SQL Server detects a hardware NUMA node with more than eight cores per socket, soft-NUMA splits the hardware node, generally allowing for improved performance and scalability.
The preferred number of cores per soft-NUMA node is eight, though it can drop as low as five cores per node or go as high as nine. Additional details on soft-NUMA, including the registry entries for manual configuration, are available in the Microsoft document “Soft-NUMA (SQL Server)”.
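To make the idea concrete, here is an illustrative sketch (in Python, not SQL Server's actual partitioning algorithm) of splitting one hardware node into soft-NUMA nodes as close to the preferred size of eight cores as possible:

```python
def soft_numa_split(cores_per_socket: int, target: int = 8) -> list[int]:
    """Illustrative only: divide a hardware node's cores into
    soft-NUMA nodes sized as close to `target` (8) as possible."""
    nodes = max(1, round(cores_per_socket / target))
    base, extra = divmod(cores_per_socket, nodes)
    # Distribute any remainder one core at a time across the nodes
    return [base + 1] * extra + [base] * (nodes - extra)

print(soft_numa_split(24))  # [8, 8, 8] - three soft nodes of 8
print(soft_numa_split(18))  # [9, 9]   - two soft nodes of 9
```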
One of the basic things we need to keep in mind when managing a system that has multiple NUMA nodes is that we need to look at resources per node rather than per system overall. It is entirely possible for the CPU of a single node to be heavily used while the other CPUs are doing very little.
You can imagine that if a single node were saturated in a four-NUMA-node system and you looked at CPU for the entire system, you would see only somewhere around 25% total CPU utilization, which runs the risk of throwing off any investigation.
Similarly, memory information, such as page life expectancy, needs to be viewed on a per-node basis. In Windows Perfmon, you will want to look at the Processor Information counter group, rather than Processor, in which the cores are referenced as a pair of numbers indicating the NUMA node first and the specific core second. For example, 1,3 would be core 3 on NUMA node 1. As you investigate memory, you will want to use the SQL Server Buffer Node counter group, rather than SQL Server Buffer Manager, in which the NUMA nodes are listed by number. The same distinction holds if you are querying the sys.dm_os_performance_counters DMV from within SQL Server.
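If you are scripting against those counter instance names, the node-first convention is easy to get backwards. A minimal sketch (in Python; the function name is mine) of parsing a Processor Information instance name:

```python
def parse_processor_instance(instance: str) -> tuple[int, int]:
    """Parse a Perfmon 'Processor Information' instance name of the
    form '<numa_node>,<core>' into (node, core) integers."""
    node, core = instance.split(",")
    return int(node), int(core)

print(parse_processor_instance("1,3"))  # (1, 3) - core 3 on NUMA node 1
```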
We’ve discussed the basic concept that, in a multi-processor system, a CPU and its directly attached memory make up a single NUMA node, and that while a node can access the resources of another node, doing so is a costly operation from a compute perspective.
This incurred cost is why we keep our MAXDOP within the bounds of a single NUMA node. The bounds of that NUMA node are always set by the hardware, but where per-socket core counts are large, there may also be a software division of that node via soft-NUMA. Managing resources such as CPU utilization and memory usage requires us to look at the node level rather than the holistic system level when multiple NUMA nodes are in play.
Aside from the documents linked above, Microsoft has additional documentation covering NUMA as it pertains to SQL Server and the Windows operating system. Entering NUMA or non-uniform memory access into your search engine of choice will also yield many scholarly computer science and computer engineering documents on the technicalities of NUMA, which are far beyond the target scope of this particular post.
Written by Peter Shore/posted by Zak Cole