WMI Bug with Scale Out File Server

During the build out of our Windows Azure Pack infrastructure, I uncovered what I believe is a bug with WMI and Scale Out File Server. For us, the issue bubbled up in Virtual Machine Manager where deployments of VM templates from a library on a SOFS share would randomly fail with the following error:

Error (12710)

VMM does not have appropriate permissions to access the Windows Remote Management resources on the server ( CLOUD-LIBRARY01.domain.com).

Unknown error (0x80338105)

This issue was intermittent, and rebooting the SOFS nodes always seemed to clear up the problem. Upon tracing the process, I found BITS was getting an Access Denied error when attempting to create the URL in wsman. Furthermore, VMM was effectively saying the path specified did not exist. From the VMM trace:

ConvertUNCPathToPhysicalPath (catch CarmineException) [[(CarmineException#f0912a) { Microsoft.VirtualManager.Utils.CarmineException: The specified path is not a valid share path on CLOUD-LIBRARY01.domain.com.  Specify a valid share path on CLOUD-LIBRARY01.domain.com to the virtual machine to be saved, and then try the operation again.

Further testing, I found I got mixed results when querying cluster share properties via WMI:

PS C:\Users\jeff> gwmi Win32_ClusterShare -ComputerName CLOUD-LIBRARY01

None.

PS C:\Users\jeff> gwmi Win32_ClusterShare -ComputerName CLOUD-LIBRARY01

Name                                    Path                                    Description
—-                                    —-                                    ———–
\\CLOUD-VMMLIB\ClusterStorage$          C:\ClusterStorage                       Cluster Shared Volumes Default Share
\\CLOUD-LIBRARY\ClusterStorage$         C:\ClusterStorage                       Cluster Shared Volumes Default Share
\\CLOUD-VMMLIB\MSSCVMMLibrary           C:\ClusterStorage\Volume1\Shares\MSS…

Finally, while viewing procmon while performing the WMI queries:

A success:

Date & Time:  6/3/2014 3:56:20 PM
Event Class:   File System
Operation:     CreateFile
Result: SUCCESS
Path:   \\CLOUD-VMMLIB\PIPE\srvsvc
TID:    996
Duration:       0.0006634
Desired Access:        Generic Read/Write
Disposition:    Open
Options:        Non-Directory File, Open No Recall
Attributes:     n/a
ShareMode:   Read, Write
AllocationSize: n/a
Impersonating:         S-1-5-21-xxxx
OpenResult:   Opened

A failure:

Date & Time:  6/3/2014 3:56:57 PM
Event Class:   File System
Operation:     CreateFile
Result: ACCESS DENIED
Path:   \\CLOUD-VMMLIB\PIPE\srvsvc
TID:    996
Duration:       0.0032664
Desired Access:        Generic Read/Write
Disposition:    Open
Options:        Non-Directory File, Open No Recall
Attributes:     n/a
ShareMode:   Read, Write
AllocationSize: n/a
Impersonating:         S-1-5-21-xxx

What’s happening here is that WMI is attempting to access the named pipe of the server service on the SOFS cluster object. Because we’re using SOFS, the DNS entry for the SOFS cluster object contains IP’s for every server in the cluster. The WMI call attempts to connect using the cluster object name, but because of DNS round robin, that may or may not be the local node. It would have appropriate access to that named pipe for the local server, but it will not for other servers in the cluster.

There are two workarounds for this issue. First, you can add a local hosts file entry on each of the cluster nodes containing the SOFS cluster object pointing back to localhost, or second, you can add the computer account(s) of each cluster node to the local Administrators group of all other cluster nodes. We chose to implement the first workaround until the issue can be corrected by Microsoft.

Resolving error 0x8007007e Cannot cannect to wmi provider

Recently, I had to troubleshoot a problem with SQL backups via Microsoft Data Protection Manager 2012 SP1 for a SQL Server 2008 system. DPM was alerting us that database auto-protection failed with error code ID 32511. The detailed errors showed that DPM could not enumerate SQL Server instances using Windows Management Instrumentation on the protected computer. This error was detailed in the DPMRACurr.errlog on the production server:

WARNING Failed: Hr: = [0x8007007e] : unable to execute the WQL query: SELECT * FROM ServerSettings

This pointed to a problem with the underlying WMI configuration for SQL, so I used wbemtest.exe from the remote DPM server to test WMI connectivity. If you are unsure of exactly what WMI namespaces are in use or what queries are being run, you can use WMI Tracing to see what’s happening under the hood.

Log Name: Microsoft-Windows-WMI-Activity/Trace
Source: Microsoft-Windows-WMI-Activity
Date: 10/22/2013 3:59:39 PM
Event ID: 1
Task Category: None
Level: Information
Keywords:
User: SYSTEM
Computer: SERVERNAME
Description:
GroupOperationId = 9283379; OperationId = 9300341; Operation = Start IWbemServices::ExecQuery – SELECT * FROM ServerSettings; ClientMachine = DPMSERVER; User = jeff; ClientProcessId = 2540; NamespaceName = \\.\root\Microsoft\SqlServer\ComputerManagement10

Once wbemtest is open, connect to the appropriate namespace:

SQL 2005
\\SERVERNAME\root\Microsoft\SqlServer\ComputerManagement

SQL 2008 & 2008 R2
\\SERVERNAME\root\Microsoft\SqlServer\ComputerManagement10

SQL 2012
\\SERVERNAME\root\Microsoft\SqlServer\ComputerManagement11

Once connected, try executing the WQL query that your application is using – in my case, it was SELECT * FROM ServerSettings. Doing this resulted in the error:

Number: 0x8007007e
Facility: Win32
Description: The specified module could not be found.

Some quick research shows this can most often be resolved by recompiling the WMI template for SQL with mofcomp:

http://support.microsoft.com/kb/956013

On 64-bit Windows with SQL 2008, the command is:

mofcomp “C:\Program Files (x86)\Microsoft SQL Server\100\Shared\sqlmgmproviderxpsp2up.mof”

You may need to adjust the command for bitness and version of SQL and then restart the WMI service for the changes to take effect. However, this did not resolve the issue on the specific system where I was encountering the problem. The same error was returned when trying to run a query in wbemtest after recompiling and restarting the service, the DPM console also displayed the same error when attempting to enumerate SQL instances. The 0x8007007e error typically means a DLL or registration is missing. Time to break out procmon and see what’s happening under the covers. Using filters to include only the wmiprvse.exe process and excluding entries with a SUCCESS result, I could see that there was a file it seemed to be looking for, but could not find:

Procmon WMI SQL

 

It seemed to be scouring the path looking for sqlmgmprovider.dll and svrenumapi100.dll. I checked on disk, and sure enough, neither of those files existed under the path C:\Program Files\Microsoft SQL Server\100\Shared, however, their 32-bit counterparts were located under C:\Program Files\Microsoft SQL Server\100\Shared. Checking another  64-bit SQL 2008 server, I was able to find those files under that first path. After copying them from a known working system, the error was resolved. Also, the second file was only listed in procmon once I copied the first to the server and retested, so it make take several passes to completely resolve.

Note that this resolved this specific error for me, though it may not be the best solution. The reason those files were not on the server is because there was only a 32-bit instance of SQL Server on the system. By adding those two files and re-running wbemtest, an error was no longer returned, but the query also did not show any instances of SQL Server because it was querying for 64-bit instances.