Benchmarking an $11,699 HP Enterprise Performance SSD

[Update on 29/06/2012: added tests 5 and 6 with RAID controller cache and Atlantis Ilio]

When a 400 GB SSD costs nearly as much as a small car, you expect a lot, performance-wise. At least I did when I got my hands on two drives from HP’s enterprise performance series. Read on to find out whether that hefty price tag is justified.

Test Environment

HP Enterprise Performance SSD

The two SSDs came in a ProLiant DL380 G7. The machine was a test/demo device. It was equipped with two Intel Xeon E5640 CPUs @ 2.67 GHz, 64 GB RAM and a Smart Array P410i RAID controller (without battery-backed write cache - keep that in mind for now, I will come back to it later). Six 2.5" SAS drives were installed: two 15K drives with 146 GB capacity each, two 7.2K or 10K drives with 1 TB capacity each, and the two SSDs this article is all about: HP Enterprise Performance 400 GB, 6G, SLC, SAS (EO400FBRWA), list price $11,699.

I tested the performance of the SSDs from virtual machines running on ESX 4.0. The VMs used local storage on the SSDs. Each VM was 40 GB in size and thick provisioned. The guest OS was Windows 7 x64 SP1 with 8 GB RAM and 2 vCPUs. The VMFS version was 3.33 with a block size of 1 MB (the smallest possible). No Citrix PVS or MCS anywhere in sight. Each VM had McAfee VirusScan Enterprise 8.8 installed.

All in all a typical case of persistent VDI, although on non-typical disk hardware.

IOMeter Configuration

I performed all tests with IOMeter (the 2006 version) configured as follows:

  • Size of the test area on disk: 10,000,000 sectors (= 5,000,000 KB ≈ 4.77 GB; see the quick check after this list)
  • 1 manager, 2 workers
  • Transfer request size: 4 KB
  • 100% random
  • Aligned on sector boundaries
  • Duration 1 minute
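
For reference, here is a quick sanity check of that test-area figure, plus what a given IOPS number translates to in throughput at a 4 KB request size. This is just my own back-of-the-envelope Python, not part of the test setup:

```python
# Quick check of the IOMeter test-area size and of what an IOPS figure means
# in throughput terms at a 4 KB request size (not part of the test itself).

SECTOR_BYTES = 512          # IOMeter sizes the test area in 512-byte sectors
REQUEST_BYTES = 4 * 1024    # 4 KB transfer request size

test_area_bytes = 10_000_000 * SECTOR_BYTES
print(f"Test area: {test_area_bytes / 1024:,.0f} KB "
      f"= {test_area_bytes / 1024**3:.2f} GB")      # 5,000,000 KB = 4.77 GB

def throughput_mb_s(iops: float) -> float:
    """Throughput implied by an IOPS figure at a 4 KB request size."""
    return iops * REQUEST_BYTES / 1024**2

# Example: a hypothetical 10,000 IOPS works out to roughly 39 MB/s
print(f"{throughput_mb_s(10000):.0f} MB/s at 10,000 IOPS")
```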

Test Results

Except where otherwise noted, all tests were run 4 times. The values shown are the averages of those 4 runs.

Test 1: One VM Running

The VM was on a single-disk RAID-0. Only one instance of IOMeter was running in the single powered-on VM.

# of outstanding IOs | 100% read | 100% write
1                    |     1,512 |      1,761
4                    |     3,770 |      5,073
8                    |     6,656 |      7,998
16                   |     7,033 |     10,766
24                   |     6,299 |     11,214

As can clearly be seen, the queue length needs to be high enough for the SSD to operate at peak performance; if it is chosen too high, performance drops again. This device seems to have its sweet spot somewhere between 16 and 24 outstanding IOs. I tested higher values, but beyond 24 performance only got worse.
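
One back-of-the-envelope way to interpret the queue-depth dependency is Little's Law: sustained IOPS ≈ outstanding IOs / average latency. The per-IO latencies implied by the Test 1 read results can be estimated like this (these are my own derived estimates from the table above, not measured values):

```python
# Rough per-IO latency implied by the Test 1 read results via Little's Law:
#   IOPS = outstanding IOs / average latency  =>  latency = outstanding IOs / IOPS
# Estimates derived from the table above, not measured latencies.

read_results = {1: 1512, 4: 3770, 8: 6656, 16: 7033, 24: 6299}   # QD -> read IOPS

for qd, iops in read_results.items():
    latency_ms = qd / iops * 1000
    print(f"QD {qd:>2}: {iops:>5} IOPS -> ~{latency_ms:.2f} ms per read")

# At QD 1 the drive needs ~0.66 ms per 4 KB read; by QD 16 the implied latency
# has grown to ~2.3 ms, so adding even more outstanding IOs buys no extra IOPS.
```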

Two things are particularly striking about these results:

  1. Write IOPS are much higher than read IOPS. Usually it is the other way around.
  2. The IOPS are really low. According to the data sheet, the SSD should deliver 42,000 read IOPS and 13,300 write IOPS.

Test 2: Two VMs Running

I thought maybe this SSD delivers maximum performance only when concurrency is very high. To check that, I repeated the first test, but this time I ran two instances of IOMeter concurrently in two different virtual machines.

# of outstanding IOs | 100% read | 100% write
4 x 2                |     7,321 |      8,475
8 x 2                |     7,471 |     11,368
12 x 2               |     7,346 |      9,672

The total performance is basically identical to the single-VM case. I am still looking for an explanation.

Test 3: Comparing with a Consumer SSD

To get a handle on the numbers, I repeated the tests with the same OS image on an HP EliteBook 8560p laptop equipped with an Intel X25-M 160 GB consumer SSD (SSDSA2M160G2HP).

# of outstanding IOs | 100% read | 100% write
1                    |     5,056 |      5,361
4                    |    18,012 |      9,512
8                    |    28,652 |     10,284
16                   |    37,730 |      9,928
32                   |    45,216 |     11,071
48                   |    45,121 |     10,314

The consumer drive peaks at a queue length of 32. As expected, its read IOPS are much higher than its write IOPS. Interestingly, they are also much higher than the server SSD’s read IOPS.

Test 4: Hypervisor Footprint

Could it be that the hypervisor somehow reduced the SSD’s performance? To test that, I installed Windows Server 2008 R2 directly on the hardware, eliminating the hypervisor altogether, and tested some more.

# of outstanding IOs | 100% read | 100% write
16                   |     8,085 |     11,928

Although a little better than with ESX, the performance is substantially unchanged.

Other Considerations

To rule out the number of IOMeter workers as a factor, I performed an additional test with 8 workers per manager on the physical Windows installation. Here the write IOPS were 11,985, nearly identical to the value with 2 workers.

Deleting the IOMeter file iobw.tst between tests did not change the results either.

I performed all tests without a battery-backed write cache on the RAID controller the disks were attached to. The reason for that is simple: the demo machine was configured that way and I did not have a cache module to play with. According to the SSD’s data sheet the difference in write performance should be 42% (18,900 write IOPS instead of 13,300). The data sheet does not say anything about read performance with the cache module.

Adding a RAID Controller Cache Module

A little while after this article was initially published we got a 1 GB battery-backed cache module for the Smart Array P410i. I repeated the test with this new configuration.

Test 5: With RAID Controller Cache Memory

Only two test runs per test here.

# of outstanding IOs | 100% read | 100% write
1                    |     2,947 |      7,226
4                    |    10,071 |     17,336
8                    |    17,725 |     17,173
16                   |    29,923 |     17,123
24                   |    34,689 |     17,077
32                   |    34,254 |     17,034

Now we’re talking! Apparently a RAID controller without cache memory is not much good.
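
For comparison with the data sheet: the write rating with a cache module is 18,900 IOPS (see above), while the 42,000 read IOPS rating does not specify a cache configuration. A quick check of how close Test 5 gets to those ratings:

```python
# How close Test 5 gets to the data sheet ratings.
# 18,900 write IOPS is the data sheet figure *with* a cache module; the 42,000
# read IOPS figure does not state a cache configuration (see above).
rated    = {"read": 42000, "write": 18900}
measured = {"read": 34689, "write": 17336}    # best Test 5 results

for op in ("read", "write"):
    pct = measured[op] / rated[op] * 100
    print(f"{op}: {measured[op]:,} of {rated[op]:,} rated IOPS ({pct:.0f}%)")
# -> read ~83% of rated, write ~92% of rated
```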

Atlantis Ilio on Magnetic Drives

I am not sure whether this is a valid benchmark/comparison, but since we had the system in place I thought I might publish the numbers for a different configuration, too, just for the fun of it.

On the same server, I installed Atlantis Ilio, a very neat storage optimization product. In addition to deduplicating at the block level, Ilio also reduces IOPS by doing clever things like combining many smaller IOs into fewer larger ones.
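
I do not know how Ilio implements this internally, so the following is only a conceptual sketch of the general idea of coalescing small writes into fewer, larger requests, not Ilio's actual algorithm:

```python
# Conceptual illustration of IO coalescing - NOT Atlantis Ilio's actual algorithm.
# Contiguous small writes are merged into fewer, larger requests.

def coalesce(requests):
    """Merge contiguous (offset, length) write requests, given in arbitrary order."""
    merged = []
    for offset, length in sorted(requests):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            last_off, last_len = merged[-1]
            merged[-1] = (last_off, max(last_len, offset + length - last_off))
        else:
            merged.append((offset, length))
    return merged

# Eight 4 KB writes collapse into three larger IOs.
writes = [(0, 4096), (4096, 4096), (8192, 4096), (65536, 4096),
          (69632, 4096), (131072, 4096), (12288, 4096), (16384, 4096)]
print(coalesce(writes))   # [(0, 20480), (65536, 8192), (131072, 4096)]
```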

Test 6: Atlantis Ilio on Magnetic Drives (with RAID Controller Cache)

Only two test runs per test here.

The virtual machine was located on an Ilio datastore accessed via NFS. The datastore was placed on a RAID-0 array comprised of the server’s two 15K 146 GB disks.

# of outstanding IOs | 100% read | 100% write
1                    |     4,700 |      3,397
4                    |     7,694 |      5,454
8                    |     8,916 |      6,085
16                   |    12,363 |      8,194
24                   |    12,178 |      8,003
32                   |    14,256 |      8,486

Given that a 15K drive delivers at most around 300 IOPS, these are really astonishing numbers. In all honesty, I am not sure whether Ilio perhaps works better with IOMeter workloads than it does with real VDI user workloads.
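
Just to put a rough number on it: assuming about 300 random IOPS per 15K spindle, the two-disk RAID-0 should be good for roughly 600 IOPS on its own, which makes the measured results look like this:

```python
# Rough estimate of the multiplication factor Ilio appears to provide
# (assuming ~300 random IOPS per 15K spindle, i.e. ~600 for the two-disk RAID-0).
raw_iops = 2 * 300
best_read, best_write = 14256, 8486            # best Test 6 results

print(f"read:  {best_read / raw_iops:.0f}x the raw spindle IOPS")    # ~24x
print(f"write: {best_write / raw_iops:.0f}x the raw spindle IOPS")   # ~14x
```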

Conclusion

The enterprise and the consumer SSDs differ in price by a factor of 17.5 (price per GB, calculated from the price of the X25-M’s successor: $500 for 300 GB). That is a lot, given that the consumer device outperforms the enterprise drive in read performance by 30% and lags behind by only 36% in write IOPS (at least in the workload used for this test). The enterprise SSD would need to be a lot more reliable to make up for the difference in price.
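
For the record, the arithmetic behind these figures (as far as I can tell, the 30% and 36% compare the consumer drive's best results with the enterprise drive's best cached results):

```python
# Price-per-GB and performance deltas behind the conclusion above.
enterprise_per_gb = 11699 / 400      # HP Enterprise Performance 400 GB
consumer_per_gb   = 500 / 300        # X25-M successor: $500 for 300 GB
print(f"price per GB: factor {enterprise_per_gb / consumer_per_gb:.1f}")   # ~17.5

# Best consumer results vs. best enterprise results (with controller cache):
read_delta  = (45216 - 34689) / 34689 * 100    # consumer reads faster
write_delta = (17336 - 11071) / 17336 * 100    # consumer writes slower
print(f"consumer reads  ~{read_delta:.0f}% faster")    # ~30%
print(f"consumer writes ~{write_delta:.0f}% slower")   # ~36%
```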

Apart from that, I learned from these tests that there is no way around a cache module on a RAID controller. That got me thinking - when do we see the “real” (raw) performance of that enterprise SSD - with or without cache on the RAID controller? Naively I would say without - the cache gives it a boost just as it would any other drive. Am I maybe wrong about this?

Finally, it is refreshing to see what kind of performance is possible with two 15K spindles and a little help from Atlantis Ilio. It looks like Ilio gives magnetic drives just the kind of boost they need to be usable in local disk VDI deployments.
