WMI Bug with Scale Out File Server

During the build-out of our Windows Azure Pack infrastructure, I uncovered what I believe is a bug with WMI and Scale Out File Server (SOFS). For us, the issue surfaced in Virtual Machine Manager (VMM), where deployments of VM templates from a library on an SOFS share would intermittently fail with the following error:

Error (12710)

VMM does not have appropriate permissions to access the Windows Remote Management resources on the server ( CLOUD-LIBRARY01.domain.com).

Unknown error (0x80338105)

This issue was intermittent, and rebooting the SOFS nodes always seemed to clear up the problem. Upon tracing the process, I found BITS was getting an Access Denied error when attempting to create the URL in wsman. Furthermore, VMM was effectively saying the path specified did not exist. From the VMM trace:

ConvertUNCPathToPhysicalPath (catch CarmineException) [[(CarmineException#f0912a) { Microsoft.VirtualManager.Utils.CarmineException: The specified path is not a valid share path on CLOUD-LIBRARY01.domain.com.  Specify a valid share path on CLOUD-LIBRARY01.domain.com to the virtual machine to be saved, and then try the operation again.

On further testing, I got mixed results when querying cluster share properties via WMI. The same query sometimes returned nothing and sometimes returned the shares:

PS C:\Users\jeff> gwmi Win32_ClusterShare -ComputerName CLOUD-LIBRARY01

None.

PS C:\Users\jeff> gwmi Win32_ClusterShare -ComputerName CLOUD-LIBRARY01

Name                                    Path                                    Description
----                                    ----                                    -----------
\\CLOUD-VMMLIB\ClusterStorage$          C:\ClusterStorage                       Cluster Shared Volumes Default Share
\\CLOUD-LIBRARY\ClusterStorage$         C:\ClusterStorage                       Cluster Shared Volumes Default Share
\\CLOUD-VMMLIB\MSSCVMMLibrary           C:\ClusterStorage\Volume1\Shares\MSS…

Finally, I watched procmon while running the WMI queries and captured both outcomes:

A success:

Date & Time:  6/3/2014 3:56:20 PM
Event Class:   File System
Operation:     CreateFile
Result: SUCCESS
Path:   \\CLOUD-VMMLIB\PIPE\srvsvc
TID:    996
Duration:       0.0006634
Desired Access:        Generic Read/Write
Disposition:    Open
Options:        Non-Directory File, Open No Recall
Attributes:     n/a
ShareMode:   Read, Write
AllocationSize: n/a
Impersonating:         S-1-5-21-xxxx
OpenResult:   Opened

A failure:

Date & Time:  6/3/2014 3:56:57 PM
Event Class:   File System
Operation:     CreateFile
Result: ACCESS DENIED
Path:   \\CLOUD-VMMLIB\PIPE\srvsvc
TID:    996
Duration:       0.0032664
Desired Access:        Generic Read/Write
Disposition:    Open
Options:        Non-Directory File, Open No Recall
Attributes:     n/a
ShareMode:   Read, Write
AllocationSize: n/a
Impersonating:         S-1-5-21-xxx

What’s happening here is that WMI is attempting to access the named pipe of the Server service (srvsvc) on the SOFS cluster object. Because we’re using SOFS, the DNS entry for the SOFS cluster object contains the IP addresses of every server in the cluster. The WMI call connects using the cluster object name, but because of DNS round robin, that name may or may not resolve to the local node. The call has appropriate access to the named pipe on the local server, but not on the other servers in the cluster.
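To see the round-robin behavior described above, one option is to list every address the SOFS name resolves to and compare them with the addresses of the node you are on. The following is a minimal sketch, not something from the original troubleshooting: the function names are made up for illustration, the name CLOUD-VMMLIB is taken from the procmon path above, and resolving the local host name may not return every address bound to the node.

```python
import socket


def resolve_all(name, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses that `name` resolves to."""
    infos = resolver(name, None)
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address string.
    return sorted({info[4][0] for info in infos})


def classify(addresses, local_addresses):
    """Split addresses into those owned by this node and those owned by others."""
    local = [a for a in addresses if a in local_addresses]
    remote = [a for a in addresses if a not in local_addresses]
    return local, remote


if __name__ == "__main__":
    sofs_name = "CLOUD-VMMLIB"  # assumption: the SOFS name seen in the procmon trace
    try:
        local_addrs = set(resolve_all(socket.gethostname()))
        local_hits, remote_hits = classify(resolve_all(sofs_name), local_addrs)
    except socket.gaierror as err:
        print("Could not resolve:", err)
    else:
        print("Resolves to this node:  ", local_hits)
        print("Resolves to other nodes:", remote_hits)
```

If the second list is non-empty when run on a cluster node, a connection made by name from that node can land on a different node, which would be consistent with the intermittent ACCESS DENIED results.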

There are two workarounds for this issue. First, you can add a local hosts file entry on each cluster node that points the SOFS cluster object name back to localhost. Second, you can add the computer account of each cluster node to the local Administrators group on all other cluster nodes. We chose to implement the first workaround until Microsoft corrects the issue.
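For the first workaround, the hosts file edit can be made idempotent so it is safe to run repeatedly on each node. This is a rough sketch under assumptions: `ensure_hosts_entry` is a hypothetical helper, 127.0.0.1 is assumed as the "localhost" address, and CLOUD-VMMLIB is assumed to be the SOFS name. It only transforms text and never writes to the real hosts file, which on Windows lives at C:\Windows\System32\drivers\etc\hosts.

```python
def ensure_hosts_entry(hosts_text, ip, name):
    """Return hosts_text with an `ip name` line appended, unless name is already mapped."""
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address + host names.
        entry = line.split("#", 1)[0].split()
        if len(entry) >= 2 and name.lower() in (h.lower() for h in entry[1:]):
            return hosts_text  # already present; host names are case-insensitive
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{ip}\t{name}\n"


if __name__ == "__main__":
    sample = "127.0.0.1 localhost\n"
    # Assumed values: the loopback address and the SOFS name from the trace.
    print(ensure_hosts_entry(sample, "127.0.0.1", "CLOUD-VMMLIB"), end="")
```

Whether the fully qualified name also needs an entry depends on which name the WMI call uses; that is not covered by the notes above.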

VMM 2012 R2 service crashes on start with exception code 0xe0434352

I was working on a new VMM 2012 R2 install for a Windows Azure Pack POC and spent the better part of a day dealing with a failing VMM service. SQL 2012 SP1 had been installed on the same server, and during installation VMM was configured to run under the local SYSTEM account and use the local SQL instance. Installation completed successfully, but the VMM service would not start, logging the following errors in the Application log in Event Viewer:

Log Name: Application
Source: .NET Runtime
Date: 12/31/2013 12:43:27 PM
Event ID: 1026
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: AZPK01
Description:
Application: vmmservice.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.AggregateException
Stack:
at Microsoft.VirtualManager.Engine.VirtualManagerService.WaitForStartupTasks()
at Microsoft.VirtualManager.Engine.VirtualManagerService.TimeStartupMethod(System.String, TimedStartupMethod)
at Microsoft.VirtualManager.Engine.VirtualManagerService.ExecuteRealEngineStartup()
at Microsoft.VirtualManager.Engine.VirtualManagerService.TryStart(System.Object)
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.TimerQueueTimer.CallCallback()
at System.Threading.TimerQueueTimer.Fire()
at System.Threading.TimerQueue.FireNextTimers()

Log Name: Application
Source: Application Error
Date: 12/31/2013 12:43:28 PM
Event ID: 1000
Task Category: (100)
Level: Error
Keywords: Classic
User: N/A
Computer: AZPK01
Description:
Faulting application name: vmmservice.exe, version: 3.2.7510.0, time stamp: 0x522d2a8a
Faulting module name: KERNELBASE.dll, version: 6.3.9600.16408, time stamp: 0x523d557d
Exception code: 0xe0434352
Fault offset: 0x000000000000ab78
Faulting process id: 0x10ac
Faulting application start time: 0x01cf064fc9e2947a
Faulting application path: C:\Program Files\Microsoft System Center 2012 R2\Virtual Machine Manager\Bin\vmmservice.exe
Faulting module path: C:\windows\system32\KERNELBASE.dll
Report Id: 0e0178f3-7243-11e3-80bb-001dd8b71c66
Faulting package full name:
Faulting package-relative application ID:

I attempted re-installing VMM 2012 R2 and selected a domain account during installation, but had the same result. I enabled VMM Tracing to collect debug logging and was seeing various SQL exceptions:

[0]0BAC.06EC::2013-12-31 12:46:04.590 [Microsoft-VirtualMachineManager-Debug]4,2,Catalog.cs,1077,SqlException [ex#4f] caught by scope.Complete !!! (catch SqlException) [[(SqlException#62f6e9) System.Data.SqlClient.SqlException (0x80131904): Could not obtain information about Windows NT group/user 'DOMAIN\jeff', error code 0x5.

I was finally able to find a helpful error message in the standard VMM logs located under C:\ProgramData\VMMLogs\SCVMM.\report.txt (I probably should have looked there first):

System.AggregateException: One or more errors occurred. ---> Microsoft.VirtualManager.DB.CarmineSqlException: The SQL Server service account does not have permission to access Active Directory Domain Services (AD DS).
Ensure that the SQL Server service is running under a domain account or a computer account that has permission to access AD DS. For more information, see “Some applications and APIs require access to authorization information on account objects” in the Microsoft Knowledge Base at http://go.microsoft.com/fwlink/?LinkId=121054.

My local SQL instance was configured to run under a local user account, not a domain account. I re-checked the VMM installation requirements and could not find this requirement documented anywhere. Sure enough, once I reconfigured SQL to run as a domain account (I also had to fix an SPN issue: http://softwarelounge.co.uk/archives/3191) and restarted the SQL service, the VMM service started successfully.
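A quick pre-install check is to look at the account name the SQL Server service runs under and flag the local-user case that bit me here. This is a small sketch: `is_local_user_account` is a hypothetical helper, and the account strings in the demo are made-up examples. It assumes the name is in the usual DOMAIN\user form, as shown in the Services console, and it deliberately does not flag built-in accounts such as LocalSystem, since the error text above also allows a computer account.

```python
def is_local_user_account(account, computer_name):
    """Return True if `account` is a local user such as .\\user or COMPUTER\\user."""
    prefix, sep, _user = account.partition("\\")
    if not sep:
        return False  # no PREFIX\user form (e.g. "LocalSystem"); not classified here
    return prefix == "." or prefix.lower() == computer_name.lower()


if __name__ == "__main__":
    computer = "AZPK01"  # computer name from the event log above
    # Hypothetical account names, for illustration only.
    for acct in (".\\sqlsvc", "AZPK01\\sqlsvc", "DOMAIN\\sqlsvc", "LocalSystem"):
        print(acct, "->", "local user" if is_local_user_account(acct, computer) else "not a local user")
```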

SCVMM 2008 R2 Installation defaults and Self Service Portal

SCVMM is an excellent product for managing your Hyper-V environment. The Self Service Portal (SSP), a component of the SCVMM install, allows end users to manage and deploy VMs remotely. However, be sure to read the fine print when installing.

During the installation process, you will be prompted to select which ports the VMM Agent will use when communicating with the Hyper-V host. The default ports for WinRM and BITS are 80 and 443, respectively. If you plan on running the Self Service Portal from the same host system, you will need to change either the ports the VMM Agent uses or the ports the Self Service Portal uses.

Since browsers and IIS always default to 80 and 443 for HTTP and HTTPS, I would recommend making the change to the VMM Agent. Port 8080 for the VMM Agent control port (WinRM) and 8443 for the VMM Agent data port (BITS) are nice alternatives. Note that using a different IP for the SSP is NOT an option, as WinRM and BITS will self-configure to listen on all IP addresses thereby hijacking the ports.
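Before picking ports, it can help to check what is already listening on the box. The sketch below is one way to do that, assuming a plain TCP connect test against the loopback address is good enough. `port_in_use` is a hypothetical helper, and the port list is just the defaults and alternatives mentioned above.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (80, 443, 8080, 8443):
        state = "in use" if port_in_use(port) else "free"
        print(f"TCP {port}: {state}")
```

A port reported as free here only means nothing answered on the loopback address at that moment. A service bound to a specific non-loopback IP would not be detected by this check.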

A quick note: changing the default ports is recommended if you are planning on running ANY website on the same box. For instance, we had initially installed SCVMM on the same box running Operations Manager. It wasn’t until our VM migrations began failing that we realized the default installation of SCVMM was conflicting with the Operations Manager Console, which was also running on the same system.

Lastly, you may not even receive an error of any type when having this issue – rather, the SSP simply won’t install. You may see behavior similar to this:

http://social.technet.microsoft.com/Forums/en-US/virtualmachinemanager/thread/bba52f08-7b95-4a74-9c9b-ceaf0499e29c/#1dbef478-f896-48e4-af4e-b455d120c10b