x64? My Terminal Servers Run Just Fine With 32 Bits and 8/12/16 GB RAM!

A recent discussion with a colleague drew my attention to a topic I have not yet covered in detail in my series on Windows x64:

Memory scalability of 32-bit Terminal Servers

Many people seem to think that on 32-bit Windows only specially adapted applications can access memory above 4 GB. Since such applications are rare, that would effectively mean that putting more than 4 GB RAM into a terminal server is a waste of resources.

Luckily, things are different. To understand why, we must take a look at the x86 processor architecture.

AWE

32-bit processes can, of course, only address 4 GB RAM because 2^32 bytes equals 4 GB. But this limit can, like many other restrictions, be overcome with tricks. In this case the “trick” is called Address Windowing Extensions (AWE), a Windows API that allows 32-bit applications to access more physical memory than they have virtual address space. With AWE, memory above the 4 GB “barrier” can be accessed by mapping portions of it into the space below 4 GB. By moving this mapped slice around, a practically unlimited amount of memory can be accessed.

This moving around may sound easy, but applications must actively do it themselves if they need more than 2 GB of memory (remember, half of each process’s virtual address space is reserved for the kernel). This is typically the case for large databases.
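To make the windowing idea concrete, here is a toy model in Python. It is only a sketch of the principle and does not call the real AWE API (on Windows the work is done by Win32 functions such as AllocateUserPhysicalPages and MapUserPhysicalPages). The class WindowedMemory and all its method names are made up for this illustration.

```python
PAGE_SIZE = 4096  # bytes per page, matching the 4 KB pages mentioned in the article


class WindowedMemory:
    """Toy model (hypothetical): a large 'physical' store reached through a small window."""

    def __init__(self, physical_pages, window_pages):
        # Memory the process owns but cannot address all at once.
        self.physical = [bytearray(PAGE_SIZE) for _ in range(physical_pages)]
        self.window_pages = window_pages
        self.base = 0  # index of the first physical page currently mapped

    def map_window(self, first_page):
        """Slide the window so that it starts at physical page first_page."""
        if not 0 <= first_page <= len(self.physical) - self.window_pages:
            raise ValueError("window would fall outside physical memory")
        self.base = first_page

    def _locate(self, offset):
        # Translate an offset inside the window to (physical page, byte index).
        page, byte = divmod(offset, PAGE_SIZE)
        if not 0 <= page < self.window_pages:
            raise IndexError("offset is outside the mapped window")
        return self.physical[self.base + page], byte

    def write(self, offset, value):
        page, byte = self._locate(offset)
        page[byte] = value

    def read(self, offset):
        page, byte = self._locate(offset)
        return page[byte]


mem = WindowedMemory(physical_pages=64, window_pages=4)
mem.map_window(0)
mem.write(10, 0xAA)   # lands in physical page 0
mem.map_window(60)
mem.write(10, 0xBB)   # same window offset, now physical page 60
mem.map_window(0)
first = mem.read(10)  # the first slice is untouched by the second write
```

The application only ever uses offsets inside its 4-page window, yet it reaches all 64 pages. The price is that the application itself has to keep track of which slice is currently mapped.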

On terminal servers the situation is entirely different. Here we typically have many concurrently running processes each using less than 2 GB RAM. And here comes the good news: using more than 4 GB RAM is a no-brainer and requires no changes whatsoever to your applications. So how does it work?

PAE

Windows can be booted with the /PAE option. PAE makes the kernel use 4 additional address lines which are built into (nearly) every CPU since the Pentium Pro. Modern CPUs have 36 instead of 32 address lines and support a maximum of 64 GB RAM (with PAE enabled).
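The 4 GB and 64 GB figures follow directly from the number of address lines. A quick check, assuming "GB" means 2^30 bytes as everywhere in this article (the function name is made up for this illustration):

```python
GB = 2 ** 30  # bytes per GB as used in this article


def addressable_gb(address_lines):
    """Physical memory reachable with the given number of address lines, in GB."""
    return 2 ** address_lines // GB


without_pae = addressable_gb(32)
with_pae = addressable_gb(36)
print(without_pae)  # 4
print(with_pae)     # 64
```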

But how can a traditional 32-bit application that can only count to 2^32 and that knows nothing of AWE access such “high” memory?

Easy: all the hard work is done by the operating system. Remember that each application has its private virtual address space that always ranges from 0x00000000 to 0xFFFFFFFF. But programs cannot simply access any address in that range. Doing so would result in the well-known general protection fault. Instead, applications must request memory from the OS. The OS then chooses the appropriate number of free pages (RAM is managed in units of pages with a size of 4 KB each). Finally, the pages are mapped from physical RAM into the virtual memory of the requesting process.

Because the address space of processes is virtualized, it is irrelevant to applications where a page is actually located. It could have been moved to disk (to the page file, of course) to make room for other processes’ demands. Or it could reside in the area above 4 GB. Since applications only work with virtual addresses in the 32-bit range, they need not be modified as long as they use less than 2 GB of memory.
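The following Python sketch models this mapping in a very simplified way. It is not how the Windows memory manager is implemented. The classes, the allocation strategy and the process names are assumptions chosen only to show that the same 32-bit virtual address can be backed by physical pages below or above 4 GB.

```python
PAGE_SIZE = 4096
FOUR_GB = 4 * 2 ** 30


class PhysicalMemory:
    """Hands out free page frames (hypothetical allocator with two simple counters)."""

    def __init__(self, total_bytes):
        self.next_low = 0
        self.next_high = total_bytes // PAGE_SIZE - 1

    def allocate_frame(self, high=False):
        if self.next_low > self.next_high:
            raise MemoryError("out of physical pages")
        if high:
            frame = self.next_high
            self.next_high -= 1
        else:
            frame = self.next_low
            self.next_low += 1
        return frame


class Process:
    """Each process has a private page table: virtual page number -> physical frame."""

    def __init__(self, name, ram):
        self.name = name
        self.ram = ram
        self.page_table = {}

    def request_page(self, virtual_address, high=False):
        # The OS, not the application, decides which physical frame is used.
        self.page_table[virtual_address // PAGE_SIZE] = self.ram.allocate_frame(high)

    def translate(self, virtual_address):
        vpn, offset = divmod(virtual_address, PAGE_SIZE)
        return self.page_table[vpn] * PAGE_SIZE + offset


ram = PhysicalMemory(8 * 2 ** 30)            # a server with 8 GB RAM
word = Process("winword.exe", ram)           # process names are examples only
excel = Process("excel.exe", ram)
word.request_page(0x00400000)                # backed by a frame below 4 GB
excel.request_page(0x00400000, high=True)    # backed by a frame above 4 GB

word_phys = word.translate(0x00400010)
excel_phys = excel.translate(0x00400010)
```

Both processes use the identical virtual address 0x00400010, but excel_phys is a physical address that no longer fits into 32 bits. Neither process can tell the difference.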

With PAE, multiple 32-bit processes without AWE support can still only use 2 GB each. But it is perfectly possible to have 20 applications on a terminal server that each need 300 MB RAM if the machine is equipped with 8 GB (leaving room for the kernel and OS). This is because the map from virtual to physical memory is different for every process.
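The numbers from this example can be checked with a few lines of Python. The variable names are chosen for this illustration, and 300 MB per application is simply the figure used above:

```python
MB = 2 ** 20
GB = 2 ** 30

installed_ram = 8 * GB
app_count = 20
app_memory = 300 * MB
per_process_limit = 2 * GB  # user-mode address space of a 32-bit process without AWE

total_app_memory = app_count * app_memory
headroom = installed_ram - total_app_memory  # left for the kernel and OS

print(total_app_memory / GB)  # about 5.86
print(headroom / GB)          # about 2.14
```

Each application stays far below its own 2 GB limit, while all of them together need more than 4 GB. That combined demand is what PAE makes possible.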

Caveats

Drivers, again.

Drivers often access physical memory directly. Badly written drivers use only 32-bit pointers and thus cannot address memory above 4 GB. That is why the 32-bit versions of Windows XP and Vista support only 4 GB RAM: Microsoft feared that bad drivers for consumer hardware would cause too many crashes on systems with more than 4 GB of memory.

And money.

The 32-bit versions of Windows Server 2003 and 2008 support more than 4 GB RAM only in the Enterprise and Datacenter Editions. Those are significantly more expensive than the Standard Editions, which are most often used for terminal servers.

Terminal Server Farm Scalability

In total, there are three options to scale your terminal server farm:

  • Use PAE and install more than 4 GB RAM per server (requires Enterprise Edition of Windows)
  • Install more than 4 GB RAM and reinstall every server with Windows x64
  • Add more servers to the farm

Each of these options has its own little problems and drawbacks. But that is a topic for another article.

Side notes

Did you know that PAE is very likely enabled on the system you are currently working on? Every Windows version since XP SP2 enables PAE if the CPU supports the no-execute (NX) feature, because NX relies on PAE.

32-bit terminal servers will only scale up (by adding more RAM) if the system is not kernel memory constrained. See my earlier article on kernel memory limitations for details.

Quotes

From Microsoft KB article #283037:

To summarize, PAE is a function of the Windows 2000 and Windows Server 2003 memory managers that provides more physical memory to a program that requests memory. The program is not aware that any of the memory that it uses resides in the range greater than 4 GB, just as a program is not aware that the memory it has requested is actually in the page file.

AWE is an API set that enables programs to reserve large chunks of memory. The reserved memory is non-pageable and is only accessible to that program.
