I’ve reported on this twice already, and it seems a fix will be offered soon. I discovered the problem back in March during a project where we virtualized a large number of Citrix XenApp servers on an AMD platform with RVI capabilities. Because hardware MMU increased performance significantly, it was enabled by default for 32-bit OSes. This is when we noticed that large pages (a side effect of enabling hardware MMU) are not TPS’ed (shared via Transparent Page Sharing) and thus give a totally different view of resource consumption than on your average cluster. When vSphere and Nehalem were released, more customers experienced this behavior, as EPT (Intel’s version of RVI) is fully supported and utilized on vSphere, as reported in this article. To be absolutely clear: large pages were never supposed to be TPS’ed, and this is not a bug but actually working as designed. However, we did discover an issue with the algorithm used to calculate Guest Active Memory, which causes the alarms to be triggered as “kichaonline” describes in this reply.

I’m not going to reiterate everything that has been reported in this VMTN Topic about the problem, but what I would like to mention is that a patch will be released soon to fix the incorrect alarms:

Several people have, understandably, asked about when this issue will be fixed. We are on track to resolving the problem in Patch 2, which is expected in mid to late September.

In the meantime, disabling large page usage as a temporary work-around is probably the best approach, but I would like to reiterate that this causes a measurable loss of performance. So once the patch becomes available, it is a good idea to go back and reenable large pages.

Also a small clarification. Someone asked if the temporary work-around would be “free” (i.e., have no performance penalty) for Win2k3 x64 which doesn’t enable large pages by default. While this may seem plausible, it is however not the case. When running a virtual machine, there are two levels of memory mapping in use: from guest linear to guest physical address and from guest physical to machine address. Large pages provide benefits at each of these levels. A guest that doesn’t enable large pages in the first level mapping, will still get performance improvements from large pages if they can be used for the second level mapping. (And, unsurprisingly, large pages provide the biggest benefits when both mappings are done with large pages.) You can read more about this in the “Memory and MMU Virtualization” section of this document:

http://www.vmware.com/resources/techresources/10036

Thanks,
Ole
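
To make Ole’s point about the two mapping levels a bit more concrete, here is a rough sketch of how many memory references a single hardware page walk can take after a TLB miss when both mappings are walked in hardware (RVI/EPT). It is a simplified model and not how ESX measures anything: it assumes x86-64 style paging with a 4-level walk for 4 KB pages and a 3-level walk for 2 MB pages, ignores page-walk caches and the extra TLB reach large pages give you, and a 32-bit guest would have fewer levels. The function name walk_refs is made up for this illustration.

```python
def walk_refs(guest_levels: int, host_levels: int) -> int:
    """Memory references for one nested page walk after a TLB miss (simplified)."""
    # Each guest page-table pointer is a guest physical address: translating it
    # costs host_levels references, plus one more to read the guest entry itself.
    per_guest_level = host_levels + 1
    # The final guest physical address of the data needs one more host walk.
    return guest_levels * per_guest_level + host_levels


# Assumed walk depths: 4 levels for 4 KB pages, 3 levels for 2 MB large pages.
SMALL, LARGE = 4, 3

for guest_name, g in (("small", SMALL), ("large", LARGE)):
    for host_name, h in (("small", SMALL), ("large", LARGE)):
        print(f"guest {guest_name} / host {host_name}: {walk_refs(g, h)} references")
```

Under these assumptions the walk costs 24 references with small pages at both levels, 19 when only one of the two levels uses large pages, and 15 when both do. That matches the quote: a guest that does not use large pages itself still benefits when the second-level mapping uses them, and the biggest gain comes from large pages at both levels.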

Mid to late September may sound too vague for some, which is probably why Ole posted the following update yesterday:

The problem will be fixed in Patch 02, which we currently expect to be available approximately September 30.

Thanks,
Ole

http://www.yellow-bricks.com/2009/09/11/memory-alarms-triggered-with-amd-rvi-and-intel-ept/