SCVMM 2008 R2 Installation defaults and Self Service Portal

SCVMM is an excellent product for managing your Hyper-V environment. The Self Service Portal (SSP), a component of the SCVMM install, allows end users to manage and deploy VMs remotely. However, be sure to read the fine print when installing.

During the installation process, you will be prompted to select what ports the VMM Agent will use when communicating with the Hyper-V host. The default ports WinRM and BITS use are 80 and 443 respectively. If you plan on running the Self Service Portal from the same host system, you will either need to change the ports the VMM Agent uses or change the ports the Self Service Portal uses.

Since browsers and IIS always default to 80 and 443 for HTTP and HTTPS, I would recommend making the change to the VMM Agent. Port 8080 for the VMM Agent control port (WinRM) and 8443 for the VMM Agent data port (BITS) are nice alternatives. Note that using a different IP for the SSP is NOT an option, as WinRM and BITS will self-configure to listen on all IP addresses thereby hijacking the ports.
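
Before settling on alternative ports, it can help to confirm that nothing else on the host is already listening on them. The following is a minimal Python sketch and not part of SCVMM; the helper name is_listening and the list of candidate ports are assumptions for illustration. It only tests whether a TCP connection to a local port succeeds, so treat the result as a hint rather than a guarantee.

```python
import socket

def is_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or otherwise unreachable.
        return False

if __name__ == "__main__":
    # Installer defaults (80, 443) plus the alternatives suggested above.
    for port in (80, 443, 8080, 8443):
        state = "in use" if is_listening(port) else "free"
        print(f"port {port}: {state}")
```

Run it on the intended SCVMM host before installation; any port reported as "in use" is a candidate for a conflict.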

A quick note: changing the default ports is recommended if you are planning on running ANY website on the same box. For instance, we had initially installed SCVMM on the same box running Operations Manager. It wasn’t until our VM migrations began failing that we realized the default installation of SCVMM was conflicting with the Operations Manager console, which was also running on the same system.

Lastly, you may not even receive an error of any type when having this issue – rather, the SSP simply won’t install. You may see behavior similar to this:

http://social.technet.microsoft.com/Forums/en-US/virtualmachinemanager/thread/bba52f08-7b95-4a74-9c9b-ceaf0499e29c/#1dbef478-f896-48e4-af4e-b455d120c10b

Error 0x800423f3 backing up Hyper-V VM with DPM 2007

One error you may receive while backing up a Hyper-V VM with DPM 2007 is the generic “DPM encountered a retryable VSS error. (ID 30112 Details: Unknown error (0x800423f3) (0x800423F3)).” There are a couple of different things that could cause this error. The two most common are:

1. You are running a Windows Server 2008 SP1 Hyper-V host and do not have the appropriate prerequisites installed, specifically the hotfix described in KB959962.

http://technet.microsoft.com/en-us/library/dd347840.aspx

2. There is a VSS error of some kind inside the VM causing the Hyper-V VSS writer to fail.

One of the most common VSS errors I have seen inside a Server 2008 VM is event ID 8193:

Log Name:      Application
Source:        VSS
Date:          <DateTime>
Event ID:      8193
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      <ComputerName>
Description:
Volume Shadow Copy Service error: Unexpected error calling routine ConvertStringSidToSid.  hr = 0x80070539.

Operation:
OnIdentify event
Gathering Writer Data
Context:
Execution Context: Shadow Copy Optimization Writer
Writer Class Id: {4dc3bdd4-ab48-4d07-adb0-3bee2926fd7f}
Writer Name: Shadow Copy Optimization Writer
Writer Instance ID: {3586f039-f2f9-4dcb-a46e-3aaa20f1a2fa}

This error can be solved by following the instructions in this blog post. Specifically, perform these steps outlined in KB947242:
  • Delete unresolvable SIDs in the ‘Administrators’ group on the VM.
  • Open regedit and locate ‘HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList’
  • Under the ProfileList subkey, delete any subkey that is named SID.bak
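
To see which subkeys would be affected before deleting anything, the ProfileList key can be enumerated read-only. This is a rough Python sketch under a few assumptions: the function names are made up for illustration, it must run inside the VM with rights to read HKEY_LOCAL_MACHINE, and it deliberately does not delete anything.

```python
PROFILE_LIST = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList"

def bak_profile_subkeys(subkey_names):
    """Return the subkey names that end in .bak (case-insensitive)."""
    return [name for name in subkey_names if name.lower().endswith(".bak")]

def list_profile_subkeys():
    """Read the subkey names under ProfileList (Windows only, read-only)."""
    import winreg  # imported here because the module exists only on Windows
    names = []
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, PROFILE_LIST) as key:
        subkey_count = winreg.QueryInfoKey(key)[0]
        for index in range(subkey_count):
            names.append(winreg.EnumKey(key, index))
    return names

if __name__ == "__main__":
    import sys
    if sys.platform == "win32":
        # Print candidates only; review them and delete by hand in regedit.
        for name in bak_profile_subkeys(list_profile_subkeys()):
            print(name)
```

Reviewing the printed names first makes it harder to remove a profile key by mistake.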

This has resolved the issue in most cases where I have seen that DPM error occur. Some other suggested troubleshooting tips that have solved this problem for me in the past:

  • Re-install the Integration Components and reboot the VM
  • Resolve issues for any VSS writers not listed as stable by the “vssadmin list writers” command on the host or inside the VM. You can restart the following services to resolve some problems:
    • System Writer – Cryptographic Services service (doesn’t affect the system)
    • IIS Metabase Writer – IIS Administrative service (will reset all of IIS)
    • SqlServerWriter – SQL VSS service (doesn’t affect SQL)
    • WMI Writer – Windows Management Instrumentation service (WMI will be unavailable during the service restart)
    • BITS Writer – BITS service (BITS will be unavailable during the service restart)
  • Re-register VSS components as described in KB940032
  • Ensure there is sufficient space inside the VM for shadow copies
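
Scanning the “vssadmin list writers” output by eye gets tedious on hosts with many writers. Here is a small Python sketch that picks out writers whose state is not Stable and pairs them with the services listed above. The function name, the mapping name, and the abbreviated sample output are illustrative assumptions; real output contains more fields per writer.

```python
import re

# Writer-to-service mapping taken from the list above.
WRITER_SERVICES = {
    "System Writer": "Cryptographic Services",
    "IIS Metabase Writer": "IIS Administrative service",
    "SqlServerWriter": "SQL VSS service",
    "WMI Writer": "Windows Management Instrumentation",
    "BITS Writer": "BITS",
}

def unstable_writers(vssadmin_output):
    """Return (writer name, state) pairs for writers whose state is not Stable."""
    results = []
    name = None
    for raw_line in vssadmin_output.splitlines():
        line = raw_line.strip()
        match = re.match(r"Writer name: '(.+)'", line)
        if match:
            name = match.group(1)
            continue
        match = re.match(r"State: \[\d+\] (.+)", line)
        if match and name is not None:
            state = match.group(1).strip()
            if state.lower() != "stable":
                results.append((name, state))
            name = None
    return results

# Abbreviated, illustrative sample; not captured from a real system.
SAMPLE = """\
Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'WMI Writer'
   State: [7] Failed
   Last error: Timed out
"""

if __name__ == "__main__":
    for writer, state in unstable_writers(SAMPLE):
        service = WRITER_SERVICES.get(writer, "unknown service")
        print(f"{writer}: {state} (consider restarting: {service})")
```

In practice you would feed it the captured output of the command instead of SAMPLE.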

DPM Daily Maintenance Script

We recently completed a project to move over 300 servers from our old backup infrastructure to a brand new disk-based DPM 2007 solution. We have been very pleased with DPM 2007 thus far, but are finding that it requires a fair amount of hand-holding in the mornings to kick off failed jobs, increase disk allocations, and perform consistency checks. Unfortunately, the DPM console can only be loaded on the DPM server itself, and it cannot connect to a remote DPM server. That means logging in via RDP to each DPM server and addressing the alerts. After a few weeks of doing this by hand, we added the DPM servers to our SCOM 2007 server, which helped consolidate the alerts into a single interface, but we found we could not modify disk allocations via SCOM.

So I sat down and hashed out the DPM Daily Maintenance Script. This PowerShell script queries the database for alerts and addresses the four most common: replica disk threshold exceeded, recovery point volume threshold exceeded, replica is inconsistent, and recovery point creation failed. The script takes 4 optional parameters:

replicaIncreaseRatio – Ratio applied to the existing replica disk size (e.g., 1.1 increases it by 10%, which is the default if nothing is specified)
scIncreaseRatio – Ratio applied to the existing recovery point volume size (e.g., 1.1 increases it by 10%, which is the default if nothing is specified)
replicaIncreaseSize – Fixed amount by which to increase the replica disk (e.g., 1GB)
scIncreaseSize – Fixed amount by which to increase the recovery point volume (e.g., 1GB)
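
To make the arithmetic behind these parameters concrete, here is a short Python sketch (the script itself is PowerShell). The function name new_volume_size is hypothetical, and the rule that a fixed increase takes precedence over the ratio when both are given is an assumption, not something the script is confirmed to do.

```python
GB = 1024 ** 3  # bytes per gigabyte

def new_volume_size(base_size, ratio=1.1, fixed_increase=None):
    """Return the target volume size in bytes.

    base_size      -- size to grow from, in bytes
    ratio          -- multiplier, e.g. 1.1 grows the volume by 10%
    fixed_increase -- bytes to add; assumed to override ratio when given
    """
    if fixed_increase is not None:
        return base_size + fixed_increase
    return round(base_size * ratio)

if __name__ == "__main__":
    print(new_volume_size(100 * GB) / GB)                      # default 10% growth
    print(new_volume_size(100 * GB, fixed_increase=GB) / GB)   # fixed 1GB growth
```

A ratio scales with the volume, so it suits large datasources; a fixed amount is more predictable on small ones.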

The script first queries the database for alerts, and then sorts them alphabetically and by alert type. This means that if a replica became inconsistent because the replica disk threshold was exceeded, or if a recovery point creation failed because the recovery point volume threshold was exceeded, the script will increase the size of the volume before re-running the job. Also, for replica disks, the script queries the original datasource and resizes the replica disk to the current workload’s size plus the ratio or fixed amount specified in the script. This ensures that the replica disk is extended to the proper amount during the first pass in cases where a large amount of data is added to the workload.
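
The ordering idea can be illustrated with a few lines of Python. This is only a sketch of the principle, not the script's actual code: the Alert class, the alert type labels, and the priority numbers are all invented for the example. The point is that, for each datasource, the threshold alerts sort ahead of the alerts that trigger a job re-run.

```python
from dataclasses import dataclass

# Lower number = handled earlier. Labels are hypothetical, not DPM's own names.
PRIORITY = {
    "replica_threshold_exceeded": 0,
    "recovery_point_volume_threshold_exceeded": 0,
    "replica_inconsistent": 1,
    "recovery_point_failed": 1,
}

@dataclass(frozen=True)
class Alert:
    datasource: str
    kind: str

def processing_order(alerts):
    """Sort by datasource name, then put volume-resize alerts before re-run alerts."""
    return sorted(
        alerts,
        key=lambda a: (a.datasource.lower(), PRIORITY[a.kind], a.kind),
    )

if __name__ == "__main__":
    alerts = [
        Alert("SQL01", "replica_inconsistent"),
        Alert("FS01", "recovery_point_failed"),
        Alert("SQL01", "replica_threshold_exceeded"),
        Alert("FS01", "recovery_point_volume_threshold_exceeded"),
    ]
    for alert in processing_order(alerts):
        print(alert.datasource, alert.kind)
```

With this ordering, a volume is always grown before the consistency check or recovery point job that depends on it is retried.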

We have been running this script on 6 DPM servers for about 6 weeks now, and I have to say it has virtually eliminated the daily maintenance (I was on vacation for 2 weeks during that time and DPM happily hummed along without any intervention, self-healing twice per day). We still use SCOM to monitor the alerts, and we manually check for replicas that are constantly becoming inconsistent or recovery point creations that are consistently failing and address those by hand. We have set up a scheduled task that runs twice per day using the following command line:

C:\Windows\system32\windowspowershell\v1.0\powershell.exe -PSConsoleFile "C:\Program Files\Microsoft DPM\DPM\bin\dpmshell.psc1" -command ".'C:\admin\DailyMaintenance.ps1'" >> C:\admin\DailyMaintenance.log

DailyMaintenance

There are a few third-party products that can help with these same alerts, and Microsoft is working on making our lives easier with DPM v3, but in the meantime, this should take some of the burden off of the sysadmins.