True Cloud Storage – Storage Spaces Direct

Let me start off by saying this is one of the most exciting new features in the next version of Windows Server. Prior to joining Microsoft, as a customer, I had been begging for this feature for years. Well, the product team has delivered, and in a big way. All of the testing I have performed has shown this to be a solid technology with minimal performance overhead: you can expect near line-rate network speeds and full utilization of disk performance. It is as resilient as you would expect and as flexible as you will need. Announced at Ignite 2015, Storage Spaces Direct (S2D) is available in Windows Server 2016.

The previous version of Storage Spaces relied on a shared SAS JBOD connected to all storage nodes. While this reduces the cost of storage by replacing expensive SANs with less expensive SAS JBODs, it still costs more than leveraging DAS for a few reasons: dual-port SAS drives are required to provide redundant connectivity to the SAS backplane, the JBOD itself adds cost, and there are datacenter costs for the rack space, power, and cabling of the JBOD shelf. Storage Spaces Direct overcomes these hurdles by leveraging local disks in the server chassis, creating cloud storage out of the lowest-cost hardware. You can finally leverage SATA SSDs with Storage Spaces and still maintain redundancy and failover capabilities. Data is synchronously replicated to disks on other nodes of the cluster over SMB3, meaning you can take advantage of SMB Multichannel and RDMA-capable NICs for maximum performance.
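If you want to confirm your NICs and SMB configuration are ready to take advantage of this, a quick check along these lines should do it (a minimal sketch; it assumes the SMB and NetAdapter PowerShell modules that ship with recent Windows Server builds):

# Check whether the NICs report RDMA capability and whether it is enabled
Get-NetAdapterRdma | Format-Table Name, Enabled

# Check what the SMB client sees: RSS and RDMA capability per interface
Get-SmbClientNetworkInterface | Format-Table InterfaceIndex, RssCapable, RdmaCapable, Speed

# SMB Multichannel should be enabled (it is by default)
Get-SmbClientConfiguration | Select-Object EnableMultiChannel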

Here’s a quick look at a Storage Spaces Direct deployment in my lab:

Shared Nothing Storage Spaces

Storage Spaces Direct

Clustering creates virtual enclosures using the local disks in each storage node. Data is then automatically replicated and tiered according to the policy of the virtual disk. There are several recommendations for this release (a quick sanity check follows the list):

  • A minimum of 4 nodes is recommended to ensure availability of data in servicing and failure scenarios
  • 10Gbps network with RDMA is the recommended backplane between nodes
  • Three-way mirrored virtual disks are recommended
  • The server HBA must be capable of pass-through or JBOD mode
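Before enabling S2D, you can sanity-check a couple of these recommendations from one of the cluster nodes. A minimal sketch, assuming the Failover Clustering tools are installed on the node where you run it:

# Number of nodes in the cluster
(Get-ClusterNode).Count

# Disks eligible for pooling (the HBA must present them in pass-through/JBOD mode for CanPool to be true)
Get-PhysicalDisk -CanPool $true | Sort-Object Size | Format-Table FriendlyName, MediaType, Size, CanPool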

All of the other features you know and love about Storage Spaces work in conjunction with this feature. So how do we configure it? PowerShell, of course. Standard cluster configuration applies, and it's assumed you have a working cluster, the network configured, and disks ready to go. The process of configuring S2D has been streamlined to a single command in Server 2016: Enable-ClusterS2D. This command will scan all available storage on the cluster nodes, add the physical disks to a new storage pool, and create two storage tiers based on the disk characteristics. Alternatively, you can skip the eligibility checks when enabling S2D and create the storage pool and tiers manually.

Enable-ClusterS2D
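If you'd rather build the pool and tiers yourself, as mentioned above, you can enable the feature without the automatic configuration. A sketch, assuming the -SkipEligibilityChecks and -Autoconfig parameters of Enable-ClusterStorageSpacesDirect (the full cmdlet name behind the Enable-ClusterS2D alias) are present in your build:

# Enable S2D but skip the disk eligibility checks and the automatic pool/tier creation
Enable-ClusterStorageSpacesDirect -SkipEligibilityChecks -Autoconfig:$false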

We can confirm S2D has been enabled by looking at the cluster properties:

Get-Cluster | fl S2D*

Storage Spaces Direct Cluster Properties

We can also confirm that enclosures have been created and that all disks are recognized by the cluster:

Disks and Enclosures
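You can run checks along these lines yourself (a sketch; the exact output formatting will differ from the screenshot):

# Virtual enclosures created from each node's local disks
Get-StorageEnclosure | Format-Table FriendlyName, HealthStatus, NumberOfSlots

# All physical disks now visible to the cluster, with their enclosure numbers
Get-PhysicalDisk | Sort-Object FriendlyName | Format-Table FriendlyName, EnclosureNumber, MediaType, Size, HealthStatus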

The storage pool and tiers should also now be visible. I chose to create mine manually (see my post on SSDs and Storage Spaces performance).

Storage Spaces Direct Storage Pool
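For reference, the manual route looks roughly like this. This is a sketch under a few assumptions: the clustered storage subsystem's friendly name matches the *Cluster* wildcard, and the pool and tier names (pool1, SSDTier, HDDTier) are just examples:

# Create a pool from all poolable disks in the clustered storage subsystem
$disks = Get-PhysicalDisk -CanPool $true
New-StoragePool -StorageSubSystemFriendlyName "*Cluster*" -FriendlyName pool1 -PhysicalDisks $disks

# Create the tiers based on media type
New-StorageTier -StoragePoolFriendlyName pool1 -FriendlyName SSDTier -MediaType SSD
New-StorageTier -StoragePoolFriendlyName pool1 -FriendlyName HDDTier -MediaType HDD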

If the media type of your disks shows up as "Unspecified", which can happen with a guest cluster on a third-party hypervisor like ESXi, you can set it manually to HDD or SSD now that the disks are in a storage pool, like so:

Get-StoragePool pool1 | Get-PhysicalDisk | Set-PhysicalDisk -MediaType SSD
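Note that the command above tags every disk in the pool as SSD. If your pool mixes media, you can filter before setting the type; the size threshold below is just an illustrative example, not a rule:

# Example: treat the small disks as SSD and the large ones as HDD (adjust the threshold for your hardware)
Get-StoragePool pool1 | Get-PhysicalDisk | Where-Object Size -lt 1TB | Set-PhysicalDisk -MediaType SSD
Get-StoragePool pool1 | Get-PhysicalDisk | Where-Object Size -ge 1TB | Set-PhysicalDisk -MediaType HDD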

The enclosures, disks and storage pool will now appear in Failover Cluster Manager:

Failover Cluster Manager Enclosures and Disks

Lastly, we'll create a virtual disk (you can do this through the GUI as well). You'll want to tweak the number of columns for your environment (hint: it's best not to exceed the number of local disks per virtual enclosure):

New-Volume -StoragePoolFriendlyName "pool1" -FriendlyName "vm-storage1" -PhysicalDiskRedundancy 2 -FileSystem CSVFS_NTFS -ResiliencySettingName Mirror -Size 250GB
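If you want to pin the column count explicitly, as suggested above, a sketch like the following should work, assuming a -NumberOfColumns parameter is available on New-Volume in your build (otherwise New-VirtualDisk exposes the same setting):

# Same three-way mirror, with the column count fixed at 4 (match this to your disks per enclosure)
New-Volume -StoragePoolFriendlyName "pool1" -FriendlyName "vm-storage1" -PhysicalDiskRedundancy 2 -NumberOfColumns 4 -FileSystem CSVFS_NTFS -ResiliencySettingName Mirror -Size 250GB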

Voilà! We have a mirrored virtual disk that's been created using Storage Spaces Direct:

Virtual Disk

Rebooting a node will cause the pool and virtual disk to enter a degraded state. When the node comes back online, a background Regeneration storage job will automatically begin. The fairness algorithm ensures workloads get sufficient disk IO while the data is regenerated as quickly as possible; this will saturate the network bandwidth, which, in this case, is a good thing. You can view the status of the running job with the Get-StorageJob cmdlet. Disk failures are handled the same way they were previously: retire and then remove the disk.
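For reference, monitoring the rebuild and retiring a failed disk look roughly like this (a sketch; the Where-Object filter is just one example of how you might select the failed disk):

# Watch the regeneration job triggered after a node comes back online
Get-StorageJob

# Retire the failed disk, let the virtual disks repair, then remove it from the pool
$failed = Get-StoragePool pool1 | Get-PhysicalDisk | Where-Object HealthStatus -ne 'Healthy'
$failed | Set-PhysicalDisk -Usage Retired
Get-VirtualDisk | Repair-VirtualDisk
Remove-PhysicalDisk -StoragePoolFriendlyName pool1 -PhysicalDisks $failed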

Regarding resiliency, I ran several IO tests while performing various simulated failures:

  • Node reboot – 33% performance drop for 20s, then full performance; 10% performance drop during regeneration
  • Node crash (hard power cycle) – full IO pause for 20s, then full performance; 10% performance drop during regeneration
  • Disk surprise removal – full IO pause for 4s, then full performance; 10% performance drop during regeneration

Note that when using Hyper-V over SMB, the IO pause does not impact running VMs. They will see increased latency, but the VMs will continue to run.

SSDs on Storage Spaces are killing your VMs' performance

We're wrapping up a project that involved Windows Storage Spaces on Server 2012 R2. I was very excited to get my hands on new SSDs and test out Tiered Storage Spaces with Hyper-V. As it turns out, the newest technology in SSD drives combined with the default configuration of Storage Spaces is killing the performance of VMs.

First, it's important to understand sector sizes on physical disks, as this is the crux of the issue. The sector size is the amount of data the controller inside your hard disk actually writes to the storage medium. Since the invention of the hard disk, the sector size has been 512 bytes, and many other aspects of storage are built on that premise. Until recently this did not pose an issue, but ever-larger disks caused capacity problems. In fact, the 512-byte sector is the reason for the 2.2TB limit on MBR partitions: MBR uses 32-bit sector addresses, and 2^32 sectors × 512 bytes ≈ 2.2TB.

Disk manufacturers realized that 512-byte sector drives would not be sustainable at larger capacities, and started introducing 4k-sector, aka Advanced Format, disks in 2007. To ensure compatibility, they used 512-byte emulation, aka 512e: the disk controller accepts reads and writes of 512 bytes but uses a physical sector size of 4k. To do this, the controller reads the full 4k sector from the physical medium into its internal cache, manipulates the relevant 512 bytes there, and then either writes the 4k back to disk or returns the requested 512 bytes to the system. Manufacturers took this additional processing into account when spec'ing the performance of these drives. There are also 4k-native drives, which use a physical sector size of 4k and do not perform this 512-byte translation in the controller; they expect the system to send 4k blocks to disk.

The key thing to understand is that since SSDs were first released, they've always had a physical sector size of 4k, even if they advertise 512 bytes; they are by definition either 512e or 4k-native drives. Additionally, Windows accommodates 4k-native drives by performing the same Read-Modify-Write (RMW) operations at the OS level that are normally performed inside the disk controller on 512e disks. This means that if the OS sees a disk with a 4k sector size but receives a 512-byte write, it will read the full 4k of data from disk into memory, replace the 512 bytes of data in memory, then flush the 4k of data from memory back down to disk.

Enter Storage Spaces and Hyper-V. Storage Spaces understands that physical disks may have 512-byte or 4k sector sizes, and because it's virtualizing storage, it too has a sector size associated with each virtual disk. Using PowerShell, we can see these sector sizes:

Get-PhysicalDisk | sort-object SlotNumber | select SlotNumber, FriendlyName, Manufacturer, Model, PhysicalSectorSize, LogicalSectorSize | ft

Get-PhysicalDisk

Any disk whose PhysicalSectorSize is 4k but whose LogicalSectorSize is 512 bytes is a 512e disk; a disk with a PhysicalSectorSize and LogicalSectorSize of 4k is a 4k-native disk; and a disk with 512 bytes for both PhysicalSectorSize and LogicalSectorSize is a standard HDD.

The problem with all of this is that when creating a virtual disk with Storage Spaces, if you do not specify a logical sector size via the PowerShell cmdlet, the system will create a virtual disk with a LogicalSectorSize equal to the greatest PhysicalSectorSize of any disk in the pool. This means that if you have SSDs in your pool and you created the virtual disk using the GUI, your virtual disk will have a 4k LogicalSectorSize. If a 512-byte write is sent to a virtual disk with a 4k LogicalSectorSize, the OS will perform the RMW, and if your physical disks are actually 512e, they too will have to perform RMW in the disk controller for each 512 bytes of the 4k write they receive from the OS. That's a significant performance hit, and it can cause you to see about 1/4 of the advertised write speeds and 8x the IO latency.
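One way to avoid this is to pin the logical sector size at pool creation time and then verify what the virtual disks inherit. A sketch, assuming the -LogicalSectorSizeDefault parameter of New-StoragePool is available on your build and using example names:

# Create the pool with a 512-byte default logical sector size, even though the SSDs report 4k physical sectors
$ss = Get-StorageSubSystem | Select-Object -First 1
New-StoragePool -StorageSubSystemFriendlyName $ss.FriendlyName -FriendlyName pool1 -PhysicalDisks (Get-PhysicalDisk -CanPool $true) -LogicalSectorSizeDefault 512

# Verify the sector sizes the virtual disks end up with
Get-VirtualDisk | Select-Object FriendlyName, LogicalSectorSize, PhysicalSectorSize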

Why does this matter with Hyper-V? Unless you've specifically formatted your VHDX files with 4k sectors, they are likely using 512-byte sectors, meaning every write to a VHDX stored on a Storage Spaces virtual disk performs this RMW operation in memory at the OS level and then again at the disk controller. The proof is in the IOMeter tests:

32K Request, 65% Read, 65% Random

IOMeter results: Virtual Disk with 4k LogicalSectorSize

IOMeter results: Virtual Disk with 512-byte LogicalSectorSize
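If you want to check what your existing VHDX files are using, or create new ones with 4k sectors to match a 4k virtual disk, the Hyper-V module exposes this directly. A sketch with an example path (adjust to your environment):

# Check the sector sizes of an existing VHDX (example path)
Get-VHD -Path "D:\VMs\vm1.vhdx" | Select-Object Path, LogicalSectorSize, PhysicalSectorSize

# Create a new VHDX that uses 4k logical and physical sectors
New-VHD -Path "D:\VMs\vm2.vhdx" -SizeBytes 100GB -Dynamic -LogicalSectorSizeBytes 4096 -PhysicalSectorSizeBytes 4096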