WMI Bug with Scale Out File Server

During the build out of our Windows Azure Pack infrastructure, I uncovered what I believe is a bug with WMI and Scale Out File Server. For us, the issue bubbled up in Virtual Machine Manager where deployments of VM templates from a library on a SOFS share would randomly fail with the following error:

Error (12710)

VMM does not have appropriate permissions to access the Windows Remote Management resources on the server ( CLOUD-LIBRARY01.domain.com).

Unknown error (0x80338105)

This issue was intermittent, and rebooting the SOFS nodes always seemed to clear up the problem. Upon tracing the process, I found BITS was getting an Access Denied error when attempting to create the URL in wsman. Furthermore, VMM was effectively saying the path specified did not exist. From the VMM trace:

ConvertUNCPathToPhysicalPath (catch CarmineException) [[(CarmineException#f0912a) { Microsoft.VirtualManager.Utils.CarmineException: The specified path is not a valid share path on CLOUD-LIBRARY01.domain.com.  Specify a valid share path on CLOUD-LIBRARY01.domain.com to the virtual machine to be saved, and then try the operation again.

Further testing, I found I got mixed results when querying cluster share properties via WMI:

PS C:\Users\jeff> gwmi Win32_ClusterShare -ComputerName CLOUD-LIBRARY01

None.

PS C:\Users\jeff> gwmi Win32_ClusterShare -ComputerName CLOUD-LIBRARY01

Name                                    Path                                    Description
—-                                    —-                                    ———–
\\CLOUD-VMMLIB\ClusterStorage$          C:\ClusterStorage                       Cluster Shared Volumes Default Share
\\CLOUD-LIBRARY\ClusterStorage$         C:\ClusterStorage                       Cluster Shared Volumes Default Share
\\CLOUD-VMMLIB\MSSCVMMLibrary           C:\ClusterStorage\Volume1\Shares\MSS…

Finally, while viewing procmon while performing the WMI queries:

A success:

Date & Time:  6/3/2014 3:56:20 PM
Event Class:   File System
Operation:     CreateFile
Result: SUCCESS
Path:   \\CLOUD-VMMLIB\PIPE\srvsvc
TID:    996
Duration:       0.0006634
Desired Access:        Generic Read/Write
Disposition:    Open
Options:        Non-Directory File, Open No Recall
Attributes:     n/a
ShareMode:   Read, Write
AllocationSize: n/a
Impersonating:         S-1-5-21-xxxx
OpenResult:   Opened

A failure:

Date & Time:  6/3/2014 3:56:57 PM
Event Class:   File System
Operation:     CreateFile
Result: ACCESS DENIED
Path:   \\CLOUD-VMMLIB\PIPE\srvsvc
TID:    996
Duration:       0.0032664
Desired Access:        Generic Read/Write
Disposition:    Open
Options:        Non-Directory File, Open No Recall
Attributes:     n/a
ShareMode:   Read, Write
AllocationSize: n/a
Impersonating:         S-1-5-21-xxx

What’s happening here is that WMI is attempting to access the named pipe of the server service on the SOFS cluster object. Because we’re using SOFS, the DNS entry for the SOFS cluster object contains IP’s for every server in the cluster. The WMI call attempts to connect using the cluster object name, but because of DNS round robin, that may or may not be the local node. It would have appropriate access to that named pipe for the local server, but it will not for other servers in the cluster.

There are two workarounds for this issue. First, you can add a local hosts file entry on each of the cluster nodes containing the SOFS cluster object pointing back to localhost, or second, you can add the computer account(s) of each cluster node to the local Administrators group of all other cluster nodes. We chose to implement the first workaround until the issue can be corrected by Microsoft.