True Cloud Storage – Storage Spaces Direct

Let me start off by saying this is one of the most exciting new features in the next version of Windows Server. Prior to joining Microsoft, as a customer, I had been begging for this feature for years. Well, the product team has delivered – and in a big way. All of the testing I have performed has shown this to be a solid technology with minimal performance overhead. You can expect to see near line-rate network speeds and full utilization of disk performance. It is as resilient as you would expect, and as flexible as you will need. Recently announced at Ignite 2015, Storage Spaces Direct (S2D) is available in Windows Server 2016.

The previous version of Storage Spaces relied on a shared SAS JBOD being connected to all storage nodes. While this helps reduce the cost of storage by replacing expensive SANs with less expensive SAS JBODs, it is still more costly than leveraging DAS for a few reasons: dual-port SAS drives are required to provide redundant connectivity to the SAS backplane, the JBOD itself carries additional cost, and there are datacenter costs for the rack space, power, and cabling of the JBOD shelf. Storage Spaces Direct overcomes these hurdles by leveraging local disks in the server chassis, thereby creating cloud storage out of the lowest-cost hardware. You can finally leverage SATA SSDs with Storage Spaces and still maintain redundancy and failover capabilities. Data is synchronously replicated to disks on other nodes of the cluster over SMB3, meaning you can take advantage of SMB Multichannel and RDMA-capable NICs for maximum performance.
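Since the replication traffic rides on SMB3, it is worth confirming that SMB Multichannel and RDMA are actually in play on your storage network. A minimal sketch of that check (adjust for your own adapters; output columns may differ slightly by OS build):

# Verify RDMA is enabled on the NICs carrying east-west storage traffic
Get-NetAdapterRdma | Format-Table Name, Enabled -AutoSize

# Confirm SMB sees RDMA/RSS-capable interfaces for Multichannel
Get-SmbClientNetworkInterface | Format-Table InterfaceIndex, RdmaCapable, RssCapable, Speed -AutoSize

# Once traffic is flowing, active multichannel connections show up here
Get-SmbMultichannelConnection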

Here’s a quick look at a Storage Spaces Direct deployment in my lab:

Shared Nothing Storage Spaces (Storage Spaces Direct)

Clustering creates virtual enclosures using the local disks in each storage node. Data is then automatically replicated and tiered according to the policy of the virtual disk. There are several recommendations for this release (a quick disk eligibility check is sketched after the list):

  • A minimum of 4 nodes is recommended to ensure availability of data in servicing and failure scenarios
  • 10Gbps network with RDMA is the recommended backplane between nodes
  • Three-way mirrored virtual disks are recommended
  • The server HBA must be capable of pass-through or JBOD mode
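Before enabling anything, it is worth confirming that each node's disks are actually presented in pass-through/JBOD mode and are poolable. A minimal sketch, assuming a four-node cluster (node1 through node4 are placeholder names, and the Test-Cluster categories may vary by build):

# Disks must be poolable; BusType should be SAS or SATA, not RAID
Get-PhysicalDisk -CanPool $true |
    Sort-Object MediaType |
    Format-Table FriendlyName, MediaType, BusType, CanPool, Size -AutoSize

# Optionally run cluster validation, including the storage tests
Test-Cluster -Node node1, node2, node3, node4 -Include "Storage Spaces Direct", "Inventory", "Network"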

All of the other features you know and love about Storage Spaces work in conjunction with this feature. So how do we configure this? PowerShell, of course. Standard cluster configuration applies, and it's assumed you have a working cluster, network configured, and disks ready to go. The process of configuring S2D has been streamlined to a single command in Server 2016: Enable-ClusterS2D. This command will scan all available storage on the cluster nodes, add the physical disks to a new storage pool, and create two storage tiers based on the disk characteristics. Alternatively, you can manually create the storage pool and tiers by skipping the eligibility checks when enabling S2D.

Enable-ClusterS2D
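If you prefer the manual route mentioned above, the general idea looks like the sketch below. Treat it as an outline rather than a definitive recipe: the -SkipEligibilityChecks and -Autoconfig parameters, the "*Cluster*" subsystem wildcard, and the pool/tier names are assumptions that can vary by build and environment.

# Enable S2D without the automatic disk claim / pool creation
Enable-ClusterS2D -SkipEligibilityChecks
# (on later builds, -Autoconfig:$false is the switch that skips automatic pool creation)

# Create the pool from all poolable disks on the clustered storage subsystem
New-StoragePool -StorageSubSystemFriendlyName "*Cluster*" -FriendlyName "pool1" `
    -PhysicalDisks (Get-PhysicalDisk -CanPool $true)

# Create one tier per media type
New-StorageTier -StoragePoolFriendlyName "pool1" -FriendlyName "SSDTier" -MediaType SSD
New-StorageTier -StoragePoolFriendlyName "pool1" -FriendlyName "HDDTier" -MediaType HDD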

We can confirm S2D has been enabled by looking at the cluster properties:

Get-Cluster | fl S2D*

Storage Spaces Direct Cluster Properties

And that enclosures have been created, and all disks are recognized by the cluster:

Disks and Enclosures
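If you would rather run that check from PowerShell than rely on the screenshot, a minimal sketch:

# Each node's local disks are surfaced as a virtual enclosure
Get-StorageEnclosure | Format-Table FriendlyName, HealthStatus, NumberOfSlots -AutoSize

# Count the physical disks the cluster can see, grouped by media type
Get-PhysicalDisk | Group-Object MediaType | Format-Table Name, Count -AutoSize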

The storage pool and tiers should also now be visible; I chose to create mine manually (see my post on SSDs and Storage Spaces Performance).

Storage Spaces Direct Storage Pool
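Again, the same information is available from PowerShell (a sketch; the tier names are whatever you created above):

# The S2D pool and its health
Get-StoragePool -IsPrimordial $false | Format-Table FriendlyName, HealthStatus, Size, AllocatedSize -AutoSize

# The tiers carved out of that pool
Get-StorageTier | Format-Table FriendlyName, MediaType, Size -AutoSize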

If the media type of your disks shows up as “Unspecified”, which can happen with a guest cluster on a third-party hypervisor like ESXi, you can set it manually to HDD or SSD now that the disks are in a storage pool, like so:

Get-StoragePool pool1 | Get-PhysicalDisk | Set-PhysicalDisk -MediaType SSD
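If the pool mixes SSDs and HDDs, you won't want to blanket-set everything to SSD; filtering on size (or bus type) first is one way to split them. A hypothetical example, assuming the SSDs happen to be the only disks smaller than 500GB:

# Tag small disks as SSD and the rest as HDD (the 500GB threshold is just an example)
Get-StoragePool pool1 | Get-PhysicalDisk | Where-Object Size -lt 500GB | Set-PhysicalDisk -MediaType SSD
Get-StoragePool pool1 | Get-PhysicalDisk | Where-Object Size -ge 500GB | Set-PhysicalDisk -MediaType HDD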

The enclosures, disks and storage pool will now appear in Failover Cluster Manager:

Failover Cluster Manager Enclosures and Disks

Lastly, we’ll create a virtual disk (you can do this through the GUI as well). You’ll want to tweak the number of columns for your environment (hint: it’s best not to exceed the number of local disks per virtual enclosure); a variation with an explicit column count is sketched after the screenshot below:

New-Volume -StoragePoolFriendlyName "pool1" -FriendlyName "vm-storage1" -PhysicalDiskRedundancy 2 -FileSystem CSVFS_NTFS -ResiliencySettingName Mirror -Size 250GB

Voila! We have a mirrored virtual disk that’s been created using Storage Spaces Direct:

Virtual Disk
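To pin the column count mentioned above, or to carve a volume out of the storage tiers instead of a flat mirror, the same cmdlet can be used. A sketch (the friendly names and sizes are examples, and tier/column parameter support varies by OS build):

# Mirrored volume with an explicit column count (don't exceed the disks per enclosure)
New-Volume -StoragePoolFriendlyName "pool1" -FriendlyName "vm-storage2" -PhysicalDiskRedundancy 2 `
    -FileSystem CSVFS_NTFS -ResiliencySettingName Mirror -NumberOfColumns 4 -Size 250GB

# Tiered volume, sized per tier
New-Volume -StoragePoolFriendlyName "pool1" -FriendlyName "vm-storage3" -FileSystem CSVFS_NTFS `
    -StorageTierFriendlyNames "SSDTier", "HDDTier" -StorageTierSizes 100GB, 500GB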

Rebooting a node will cause the pool and virtual disk to enter a degraded state. When the node comes back online, it will automatically begin a background regeneration storage job. A fairness algorithm ensures workloads keep sufficient disk IO access while the data is regenerated as quickly as possible – this will saturate the network bandwidth, which, in this case, is a good thing. You can view the status of the running job by using the Get-StorageJob cmdlet. Disk failures are handled the same way they were previously: retire and then remove the disk.
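The relevant cmdlets, sketched out (the OperationalStatus filter is just an example; match on whatever state your failed disk actually reports):

# Watch the regeneration/repair jobs kicked off after a node or disk failure
Get-StorageJob | Format-Table Name, JobState, PercentComplete -AutoSize

# Retire a failed disk, repair the affected virtual disks, then remove it from the pool
$bad = Get-StoragePool pool1 | Get-PhysicalDisk | Where-Object OperationalStatus -ne "OK"
$bad | Set-PhysicalDisk -Usage Retired
Get-VirtualDisk | Repair-VirtualDisk
Remove-PhysicalDisk -StoragePoolFriendlyName pool1 -PhysicalDisks $bad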

Regarding resiliency, I ran several IO tests while performing various simulated failures:

  • Node reboot – 33% performance drop for 20s, then full performance; 10% performance drop during regeneration
  • Node crash (hard power cycle) – full IO pause for 20s, then full performance; 10% performance drop during regeneration
  • Disk surprise removal – full IO pause for 4s, then full performance; 10% performance drop during regeneration

Note that when using Hyper-V over SMB, the IO pause does not impact running VMs. They will see increased latency, but the VMs will continue to run.

Partition Alignment

Squeezing every ounce of performance out of your disk array is critical for IO-intensive applications. Most of the time, this is simply an afterthought. However, doing a little legwork during the implementation phase can go a long way toward increasing the performance of your application. Aligning partitions is a great idea for SQL and virtualized environments – these are the places you will see the most benefit.

The concept of aligning partitions is actually quite simple and applies to SANs and really any disk array alike. If you are using RAID in any capacity, then aligning disk partitions will help increase performance. It is best illustrated by the following graphics, borrowed from http://www.vmware.com/pdf/esx3_partition_align.pdf (a great read; it is specific to VMware environments, but the same concepts apply).

Using unaligned partitions in a virtual environment, you can see that a single read could ultimately result in three accesses to the underlying disk subsystem:

By aligning partitions properly, that same read results in just one disk access:

While these graphics are virtual machine and VMware specific, the same is true for Hyper-V and SQL (for SQL, just remove the middle layer). In order for partition alignment to work properly, you need to ensure that the lowest level of the disk subsystem has the largest segment size (also referred to as stripe size). Depending on your RAID controller or SAN, this could default to as low as 4K or as high as 1024K. I won’t cover what differences in segment size mean for performance (that’s an entirely different discussion), but generally speaking, defaults are usually 64K or 128K. The basic idea behind a proper stripe size is that you want most of your reads and writes to complete in a single operation.

From there, you need to ensure that your block or file allocation unit size is set properly – ideally the same size as the segment size or smaller, such that the segment size is an even multiple of the allocation unit. Lastly, you should set the partition offset to the segment size or a multiple of it. For example, with a 64K segment size, a 1024K offset (16 x 64K) keeps clusters lined up with stripe boundaries, whereas a 31.5K offset straddles them. By default, Windows 2003 will offset by 31.5K, Windows 2008 by 1024K, and VMware VMFS defaults to 128 sectors (64K).

Setting the segment size may or may not be an online operation – that depends entirely on your RAID controller or SAN, and on whether it can be changed on an already-configured array or must be set during the initial configuration. Changing the offset and/or block size of a partition, however, is NOT an online operation. This means that all data has to be moved off the partition, the offset configured, and the partition recreated. Prior to Windows 2008, this cannot be done to system partitions, so for Windows 2003 you would have to attach the virtual hard disk to another system, set the offset and format the partition, and then perform the Windows installation.
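To see what you currently have, the offsets and cluster sizes are exposed through WMI, which works even on the older versions of Windows mentioned above. A sketch (the 64K comparison value is just an example segment size; substitute your own):

# Partition starting offsets – an aligned partition is an even multiple of the stripe size
Get-WmiObject Win32_DiskPartition |
    Select-Object Name, StartingOffset, @{Name="Aligned64K";Expression={($_.StartingOffset % 65536) -eq 0}}

# File allocation unit (cluster) size per volume
Get-WmiObject Win32_Volume |
    Select-Object Name, @{Name="AllocationUnitKB";Expression={$_.BlockSize / 1KB}}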

The following links provide detailed information about aligning partitions in both VMware and Windows. Consult your SAN or RAID controller documentation for setting or finding out the segment size.

Recommendations for Aligning VMFS Partitions

Disk Partition Alignment Best Practices for SQL Server