VMware vs. Microsoft: Why Memory Overcommitment is Useful in Production and Why Microsoft Denies it
In the ongoing virtualization war, much has been written (pro and con) about the value of memory overcommitment, a feature VMware ESX has and Microsoft Hyper-V lacks (XenServer, too, for that matter). But few people look at what the term overcommitment actually means. In this article I will explain how overcommitment works, why it greatly benefits VDI installations, and why Microsoft denies exactly this.
What is Memory Overcommitment?
Memory overcommitment is not a feature in itself, but a collection of technologies, of which “transparent page sharing” is, in my opinion, the most interesting. The usefulness of transparent page sharing will spring to mind immediately if you consider VDI environments. There, host machines typically run large numbers of identical client operating systems with the same applications installed on them. Since all the clients/guests run the same application set, most of the code pages they need to keep in memory are identical. Each guest keeps separate copies of all system and application EXEs and DLLs in memory – what a waste!
Now consider the following: in memory, program code is organized in units of pages. If you had a component that identified all those identical pages in each guest’s virtual memory and mapped them to one set of pages in the host’s physical memory, you could reduce the code memory footprint of n virtual machines to that of a single virtual machine. And that, in a nutshell, is what VMware’s “transparent page sharing” does.
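The idea can be sketched in a few lines. This is a simplification I am assuming, not VMware’s implementation: real transparent page sharing hashes page contents to find candidates, verifies matches byte for byte, and remaps shared pages copy-on-write. The sketch below just deduplicates identical pages across several “guests” by content hash:

```python
import hashlib

PAGE_SIZE = 4096

def share_pages(guest_pages):
    """Map identical pages from many guests onto one physical copy."""
    physical = {}     # content hash -> the single physical copy
    page_tables = []  # per-guest list of hashes, standing in for mappings
    for pages in guest_pages:
        table = []
        for page in pages:
            h = hashlib.sha256(page).hexdigest()
            # Real TPS compares byte-for-byte before sharing; a strong
            # hash makes a collision practically impossible in this sketch.
            physical.setdefault(h, page)
            table.append(h)
        page_tables.append(table)
    return physical, page_tables

# Three "guests" with 8 identical code pages each, plus one private page:
code = [bytes([i]) * PAGE_SIZE for i in range(8)]
guests = [code + [bytes([42 + g]) * PAGE_SIZE] for g in range(3)]
physical, tables = share_pages(guests)
print(len(physical))  # 11 physical pages back 27 guest pages
```

The guests’ 3 × 9 = 27 pages collapse to 8 shared code pages plus 3 private ones, which is exactly the footprint reduction the article describes.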
Windows Has Been Overcommitting Since the Earliest Days of NT!
You can also look at it another way: This technology has been around for ages and is built into every modern operating system. Consider a typical Windows system with several applications running. All of those applications require a common set of system DLLs, like advapi32.dll, ntdll.dll and so on. It would be a waste of RAM to load those system DLLs into memory multiple times. Instead, the OS loads each DLL only once and maps the one set of physical pages into each process’s virtual memory. I have explained this in more detail here.
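You can observe the same mechanism from user space. The POSIX-only sketch below (it relies on `mmap`’s `prot` argument) maps one file read-only twice, like two processes that both loaded the same DLL; the kernel backs both views with a single set of physical pages in the page cache:

```python
import mmap, os, tempfile

PAGE = mmap.PAGESIZE
fd, path = tempfile.mkstemp()
os.write(fd, b"\x90" * PAGE)   # stand-in for shared library code

# Two read-only mappings of the same file, analogous to two processes
# mapping ntdll.dll: the kernel keeps one physical copy of the pages.
m1 = mmap.mmap(fd, PAGE, prot=mmap.PROT_READ)
m2 = mmap.mmap(fd, PAGE, prot=mmap.PROT_READ)
same = m1[:16] == m2[:16]      # both views see identical bytes
print(same)

m1.close(); m2.close(); os.close(fd); os.remove(path)
```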
What VMware has done is actually quite simple: they have taken the proven technology of page sharing one step further, from multiple processes on a single OS to multiple OSes on a single VM host. This is the logical step to take. Others will take it when they have the technology ready.
So why does Microsoft not recognize the value of transparent page sharing? Simple: because they do not have it yet. It is the same with VMotion/LiveMigration and countless other technologies whose value Microsoft traditionally plays down until they have it themselves.
VMware white paper explaining memory resource management in ESX server
Although I am a huge VMware fan, I don’t think you are right on this one. You are only writing about transparent memory page sharing, not about memory overcommitment. What I see as overcommitment is when you have less physical RAM in your host than is needed by all your VMs combined.
So let’s take the VDI example:
– 50 VMs with XP in them, each with 1 GB RAM assigned (a bit much, but just for the argument)
– Now, through transparent memory page sharing, each VM actually needs just 512 MB of RAM to itself.
– So the ESX host would have to have 50 × 512 MB = 25 GB of RAM
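The commenter’s sizing math, written out (the 1 GB assigned and 512 MB resident figures are the commenter’s assumptions, not measured savings):

```python
# Back-of-the-envelope math for the commenter's VDI example.
vms = 50
assigned_mb = 1024   # 1 GB assigned to each XP guest
resident_mb = 512    # assumed need per guest after page sharing

total_assigned_gb = vms * assigned_mb / 1024  # memory promised to guests
total_resident_gb = vms * resident_mb / 1024  # memory actually backed
print(total_assigned_gb, total_resident_gb)   # 50.0 25.0
```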
In my opinion, overcommitment is when you have a host with just 24 GB of RAM or less. And that is a situation I would never allow in a production environment!
So maybe someone can clarify on what the correct definition is of overcommitment.
You are right: transparent page sharing is not the same as memory overcommitment. The latter is not a technology but a more general term encompassing a range of technologies which, taken together, allow you to assign more memory to the guests than is physically available on the host.
In my article I picked the most interesting of the overcommitment technologies and tried to explain how it works and why it is useful.
To your example: if you assign 1 GB of RAM to each of 50 machines, then the host would need 50 GB of RAM (plus what ESX needs itself). As I understand it, that is the situation with Hyper-V. With transparent page sharing enabled, some of the memory used by each VM need not be allocated 50 times in physical RAM, but only once. On average, that might come to (to pick a likely number) 768 MB per guest. All guests combined would then consume 768 MB × 50 = 37.5 GB of RAM. That is certainly less than 50 times 1 GB.
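Under this definition, overcommitment can be expressed as a simple ratio of assigned to physically backed memory. The 768 MB average is, as above, just a picked number for illustration:

```python
def overcommit_ratio(vms, assigned_mb, avg_backed_mb):
    """Guest memory assigned divided by physical memory actually needed."""
    return (vms * assigned_mb) / (vms * avg_backed_mb)

# 50 guests, 1 GB assigned each, assumed 768 MB backed each after sharing:
ratio = overcommit_ratio(50, 1024, 768)
print(round(ratio, 2))  # ~1.33: 50 GB assigned fits in 37.5 GB physical
```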
Maybe we are using different definitions of “overcommitment”. I use the one from the following article:
At least that is how I understood it.
Nice article. I posted some further thoughts on this on my blog. http://www.shawnbass.com/Blogs/tabid/58/EntryID/153/Default.aspx
Posted a similar response over on Shawn’s site, since he brought up ASLR:
If you guys check out this post from Microsoft, http://blogs.technet.com/b/virtualization/archive/2011/02/15/vmware-aslr-follow-up-blog.aspx, and get past the mud-slinging in the comments, you can walk away with a pretty good understanding of how ASLR impacts density. Turn it off, density increases (w/ TPS enabled), but security decreases. The security benefit of ASLR also appears to be argued as much as the benefit of TPS, so caveat administrator. :)