XenApp and RDS Sizing Part 2 - Determining Farm Capacity

This article is part of a mini-series. You can find the other articles here.

As we have seen in part 1 of this series, when sizing a new farm the first thing we need to know is the capacity of the existing farm. Armed with data on capacity and additionally load, we can easily calculate the capacity of a new farm. In this article I describe how to determine capacity of the four relevant hardware components of a XenApp server: CPU, memory, storage and network.

CPU Capacity

If you only have one CPU model, determining the total CPU capacity is simple: just add up the total number of cores. But what if you have different models? How do you add the performance of, say, a Xeon 7100 and a Xeon E5640? Simply adding gigahertz values does not cut it; modern CPUs are more efficient per hertz than older ones.

Normalization

You can add numbers from two different systems only after you normalize them, i.e. make them comparable. One method of making CPUs comparable is to run benchmarks. I did not want to do that, though. Way too cumbersome. There had to be a more elegant solution. And there is.

Moore’s law states that the number of transistors - and thus performance - doubles every 18-24 months. That gives us a simple way to calculate the relative performance of two processors. The only things we need to know are the CPUs’ release dates. Easy enough. But wait! How can Moore’s law be universally true? It is nothing more than one man’s prediction dating back to 1965!

It can easily be shown that Moore's law is true, or at least has been for the past 40 years:

[Figure: Transistor Count and Moore's Law - 2011, by Wgsimon, under CC]

As to the why: Reality conforming to Gordon Moore’s prediction would be entirely by chance if many of the important players in the semiconductor industry had not started to use his law to time their product releases, making Moore’s law a self-fulfilling prophecy.

Calculating with Gordon Moore

Adapted to our situation Moore’s law states:

CPU B is twice as fast as CPU A if B was released 18-24 months after A and both CPUs are from the same price segment.

As a formula, it looks like this:

Perf(B) = e^(0.033 × t) × Perf(A)

In words: the performance of CPU B, released t months after CPU A, is equal to (e to the power of 0.033 times t) times the performance of CPU A.
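In code, the formula is a one-liner (a minimal sketch; note that 0.033 ≈ ln 2 / 21, i.e. performance doubling every 21 months):

```python
import math

def relative_performance(months_newer: float) -> float:
    """Performance of a CPU released `months_newer` months after a
    reference CPU, relative to that reference (Moore's law)."""
    return math.exp(0.033 * months_newer)

# A CPU released 21 months later comes out roughly twice as fast:
print(round(relative_performance(21), 2))  # → 2.0
```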

Normalized CPU Cores

When applying the above formula to a typical set of server CPUs we get the following table:

| CPU | Cores | First sold | Perf index | Perf index per core |
|---|---|---|---|---|
| Xeon 7100 3.0 GHz | 2 | 08/2006 | 2.0 | 1.0 |
| Xeon 7100 3.4 GHz | 2 | 03/2007 | 2.52 | 1.26 |
| Xeon E5440 | 4 | 11/2007 | 3.28 | 0.82 |
| Xeon X5550 | 4 | 03/2009 | 5.56 | 1.40 |
| Xeon E5640 | 4 | 03/2010 | 8.26 | 2.06 |

As you can see, we set the oldest CPU’s performance index per core to 1.0. Moore’s law gives us performance indices for other CPUs.

By calculating the performance index per CPU core we normalized CPU performance values. As a result, we can now directly add the (normalized) performance values of different CPU models. In our example, with 10 Xeon 7100 3.0 GHz, 16 Xeon 7100 3.4 GHz, 6 Xeon E5440, 8 Xeon X5550 and 6 Xeon E5640, the table looks like this:

| CPU | Number of CPUs | Number of cores | Perf index per core | Number of normalized cores |
|---|---|---|---|---|
| Xeon 7100 3.0 GHz | 10 | 20 | 1.0 | 20 |
| Xeon 7100 3.4 GHz | 16 | 32 | 1.26 | 40.3 |
| Xeon E5440 | 6 | 24 | 0.82 | 19.7 |
| Xeon X5550 | 8 | 32 | 1.40 | 44.8 |
| Xeon E5640 | 6 | 24 | 2.06 | 49.4 |
| Total | 46 | 132 | - | 174.2 |
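The whole calculation can be reproduced with a short script (a sketch under the assumptions above; release dates are approximated to the first of the month):

```python
import math
from datetime import date

def months_between(a: date, b: date) -> int:
    return (b.year - a.year) * 12 + (b.month - a.month)

BASELINE = date(2006, 8, 1)  # Xeon 7100 3.0 GHz: perf index 2.0, i.e. 1.0 per core

# (model, number of CPUs, cores per CPU, first sold)
FLEET = [
    ("Xeon 7100 3.0 GHz", 10, 2, date(2006, 8, 1)),
    ("Xeon 7100 3.4 GHz", 16, 2, date(2007, 3, 1)),
    ("Xeon E5440",         6, 4, date(2007, 11, 1)),
    ("Xeon X5550",         8, 4, date(2009, 3, 1)),
    ("Xeon E5640",         6, 4, date(2010, 3, 1)),
]

total_normalized_cores = 0.0
for model, cpus, cores, released in FLEET:
    # per-core index: whole-CPU index (2.0 × Moore factor) divided by core count
    per_core = 2.0 * math.exp(0.033 * months_between(BASELINE, released)) / cores
    total_normalized_cores += cpus * cores * per_core

print(round(total_normalized_cores, 1))  # ≈ 174, matching the table up to rounding
```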

Memory Capacity

Determining RAM capacity is easier than determining CPU capacity because we can simply add RAM values from different servers. Different speeds (DDR, DDR2, DDR3, etc.) do not matter in practice. However, there is one thing that makes this a little more interesting: we need the amount of free RAM. Only RAM that is available for user sessions is relevant for farm capacity. Calculate it like this:

RAM_free = RAM_installed - RAM_invisible - RAM_OS

with:

  • RAM_installed: Physical RAM in the server
  • RAM_invisible: Memory that cannot be accessed by the operating system
  • RAM_OS: Memory used by the OS
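As a trivial helper (the example values come from the tables that follow: a DL360 G4 with 4 GB installed, 0.5 GB invisible and roughly 900 MB of platform use):

```python
def free_ram_mb(installed: int, invisible: int, os_used: int) -> int:
    """RAM available for user sessions; all values in MB."""
    return installed - invisible - os_used

# DL360 G4 running Server 2003 x86:
print(free_ram_mb(4096, 512, 900))  # → 2684
```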

Physical RAM installed in the server can be invisible to the OS for several reasons, technical as well as licensing. Especially on 32-bit platforms a relatively large part of the 4 GB address space is needed for device drivers to communicate with the hardware. The following table illustrates this for several generations of popular HP servers:

| Server model | Invisible RAM | Percentage lost (out of 4 GB total) |
|---|---|---|
| DL360 G4 | 0.5 GB | 12.5% |
| DL360 G5 | 0.75 GB | 18.75% |
| DL360 G6 | 0.51 GB | 12.8% |
| DL360 G7 | 0.51 GB | 12.8% |

The amount of memory used by the OS should really be called memory used by the platform because it includes add-on software components like Citrix XenApp, antivirus, user environment management and so on. In short: everything that does not directly belong to a user session. Here are some values I compiled for Server 2003 and 2008 R2 (your mileage can - and will - vary):

| Description | Server 2003 x86 | Server 2008 R2 x64 |
|---|---|---|
| Kernel | 175 MB | 300 MB |
| Session 0 (OS + XA + AV) | 325 MB | 600 MB |
| File system cache | 400 MB | 500 MB |
| Total | 900 MB | 1,400 MB |

File system cache is very important: if it gets too small, performance goes down and response times go up.

Did you ever notice how much memory was wasted with yesterday's 4 GB 32-bit servers? Out of the 3.5 GB that were actually usable, more than 25% was required by the OS alone. Less than 75% of all RAM could be used for its main purpose: hosting user sessions!

Continuing our example from above, we calculate total memory capacity for our 23-server farm as follows:

| Server model | Number of systems | Free RAM per W2k3 server | Free RAM (sum) |
|---|---|---|---|
| DL360 G4 | 13 | 2,684 MB | 34,892 MB |
| DL360 G5 | 3 | 2,428 MB | 7,284 MB |
| DL360 G6 | 4 | 2,674 MB | 10,696 MB |
| DL360 G7 | 3 | 2,674 MB | 8,022 MB |
| Total | 23 | - | 60,894 MB |
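The sum can be verified with a few lines of Python (assuming 4 GB installed per server, roughly 900 MB platform use, and 0.51 GB ≈ 522 MB invisible on the G6/G7 models):

```python
# Invisible RAM per model in MB (0.5 GB = 512 MB, 0.75 GB = 768 MB, 0.51 GB ≈ 522 MB)
INVISIBLE_MB = {"DL360 G4": 512, "DL360 G5": 768, "DL360 G6": 522, "DL360 G7": 522}
SERVERS = {"DL360 G4": 13, "DL360 G5": 3, "DL360 G6": 4, "DL360 G7": 3}

INSTALLED_MB = 4096  # 4 GB per server
OS_MB = 900          # platform use on Server 2003 x86

total_free_mb = sum(
    count * (INSTALLED_MB - INVISIBLE_MB[model] - OS_MB)
    for model, count in SERVERS.items()
)
print(total_free_mb)  # → 60894
```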

Storage Performance

I deliberately do not use the term capacity with storage lest people think only in terms of size. Much more relevant for multi-user systems with hundreds of concurrent processes, each depending on mass storage, is throughput - not how many bytes per second the storage system can read or write, but how many different IO operations it can process per second. IOPS is by far the most important storage metric in our field of work, and for that reason I use it to characterize a farm's storage performance.

How do we determine the total number of IOPS an existing farm can process? Although we can simply add IOPS values from different hard drives we still need the numbers for each type of drive. Unfortunately measuring IOPS is not easy because the result greatly depends on how you measure (transfer request size, queue depth, random vs. sequential, read vs. write). Luckily we are bound to find traditional spinning disks in the servers of older farms (and probably also in most new farm servers) whose IOPS values depend mainly on rotational speed. Ballpark numbers that are good enough for our purpose can be found in Wikipedia:

| Device | IOPS |
|---|---|
| 7,200 rpm SATA HDD | 75-100 |
| 10,000 rpm SATA HDD | 125-150 |
| 10,000 rpm SAS HDD | 140 |
| 15,000 rpm SAS HDD | 175-210 |

With this data, total IOPS calculation becomes dead simple. Continuing our example, we probably have 15,000 rpm disks even in the older G4 servers, each server equipped with two disks configured as a RAID-1 mirror. Since only one disk is active in a two-disk mirror, we use only one disk per server in our calculation. With an average value per disk of 190 IOPS we get:

Total IOPS per farm = 23 * 190 = 4,370
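Spelled out (one active mirror disk per server and 190 IOPS apiece, both assumptions from above):

```python
SERVERS = 23
ACTIVE_DISKS_PER_SERVER = 1  # one active disk in a two-disk RAID-1 mirror
IOPS_PER_DISK = 190          # average for a 15,000 rpm SAS drive

total_iops = SERVERS * ACTIVE_DISKS_PER_SERVER * IOPS_PER_DISK
print(total_iops)  # → 4370
```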

Notice how small this number is compared to SSDs? Even a cheap consumer SSD easily outperforms those 23 active server hard disks by a factor of two - and that is in the more difficult discipline of writing data.

Network Capacity

For our purposes network capacity is equal to throughput in bytes per second. The XenApp servers in our example are equipped with one active gigabit network connection. Gigabit Ethernet is capable of transferring roughly 100 MB/s.

Total network capacity: 23 × 100 MB/s = 2,300 MB/s ≈ 2.25 GB/s

Being able to transfer 2.25 gigabytes per second is pretty impressive, and very probably much more than we need for XenApp.

Capacity per User

Server capacity values are certainly nice to have, but what we really need are values per user. Continuing with our example, our 23-server farm was designed for a maximum of 250 concurrent users. That is also the number of Citrix XenApp licenses the customer purchased. Farm capacity per user can be calculated by dividing total farm capacity by the user count:

| Component | Farm capacity | Capacity per user |
|---|---|---|
| CPU | 174.2 norm. cores | 0.7 norm. cores |
| Memory | 60,894 MB | 244 MB |
| Storage | 4,370 IOPS | 18 IOPS |
| Network | 2,300 MB/s | 9.2 MB/s |
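The per-user values are simply the farm totals divided by the user count (a sketch; the table above rounds memory up to 244 MB and storage up to 18 IOPS):

```python
USERS = 250  # concurrent users the farm was designed for

farm_capacity = {
    "CPU (norm. cores)": 174.2,
    "Memory (MB)": 60894,
    "Storage (IOPS)": 4370,
    "Network (MB/s)": 2300,
}

per_user = {k: round(v / USERS, 1) for k, v in farm_capacity.items()}
print(per_user)
# {'CPU (norm. cores)': 0.7, 'Memory (MB)': 243.6, 'Storage (IOPS)': 17.5, 'Network (MB/s)': 9.2}
```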

Preview

In this article I have shown how to assess the computing power you have in your datacenter. The next article will be about determining how that hardware is used, in other words: determining the load.
