Using same remote SQL 2012 SP1 instance for DPM 2012 SP1 and DPM 2012 R2

We recently began to deploy DPM 2012 R2 into our environment. For ease of management, we use a single remote SQL instance for all of our DPM installations. Naturally, we decided to use the same remote SQL 2012 SP1 instance for new DPM 2012 R2 installs.

One of the first steps is to run DPM Remote SQL Prep on the SQL server. When we ran it from the DPM 2012 R2 installation media, it upgraded the existing DPM 2012 SP1 Remote SQL Prep files, causing all of the existing jobs on the DPM 2012 SP1 servers to fail. The errors did not appear in the DPM console. Instead, they were logged in the SQL Agent job history on the remote SQL instance:

Message
Executed as user: DOMAIN\sqlservice. The process could not be created for step 1 of job 0x8ADCFE6FE202F04F8C7A11C240E42059 (reason: The system cannot find the file specified). The step failed.

The resolution was to re-run the DPM Remote SQL Prep install from the DPM 2012 SP1 media AFTER running it from the DPM 2012 R2 media on the remote SQL server. This restored the necessary files on disk, and the jobs began running again immediately.
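These failures only show up in SQL Agent history and not in the DPM console, so on a shared instance it can help to scan job-history messages for this specific signature. Below is a minimal Python sketch. It assumes you have already exported the message text (for example, from the SQL Agent job history viewer) as plain strings. The regular expression is inferred from the single message above, and the function name `missing_file_failures` is hypothetical.

```python
import re

# Pattern inferred from the SQL Agent message shown above (an assumption:
# other failure messages may be worded differently).
AGENT_FAILURE = re.compile(
    r"step (?P<step>\d+) of job (?P<job>0x[0-9A-F]+) "
    r"\(reason: (?P<reason>[^)]*)\)",
    re.IGNORECASE,
)


def missing_file_failures(messages):
    """Return (job_id, step) pairs for job steps that failed because a file was missing."""
    hits = []
    for msg in messages:
        m = AGENT_FAILURE.search(msg)
        if m and "cannot find the file" in m.group("reason").lower():
            hits.append((m.group("job"), int(m.group("step"))))
    return hits
```

If every DPM 2012 SP1 job on the instance shows up in this list right after a Remote SQL Prep run, that points to the overwritten prep files described above rather than to individual job problems.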

SQL Server Reporting Services error installing DPM 2012 SP1 with remote SQL 2012 database

We use Microsoft Data Protection Manager in our environment to protect our Windows workloads. DPM 2012 SP1 was recently released, and we have begun upgrading each of our DPM servers to this version, but we encountered a problem with the latest server to be upgraded. Although the prerequisite check succeeded, DPM failed to install, citing an error with SQL Server Reporting Services on our remote SQL 2012 server:

DPM Setup cannot query the SQL Server Reporting Services configuration

The setup error log shows the following exception when DPM queries the SSRS configuration via WMI:

[3/4/2013 12:05:44 PM] Information : Getting the reporting secure connection level for DPMSQL01/MSSQLSERVER
[3/4/2013 12:05:44 PM] Information : Querying WMI Namespace: \\DPMSQL01\root\Microsoft\SqlServer\ReportServer\RS_MSSQLSERVER\v10\admin for query: SELECT * FROM MSReportServer_ConfigurationSetting WHERE InstanceName='MSSQLSERVER'
[3/4/2013 12:05:44 PM] * Exception : => System.Management.ManagementException: Provider load failure

DPM uses WMI to get information about the SSRS installation and receives a “Provider load failure” error. The natural next troubleshooting step is to run the same query manually with wbemtest on the SQL server itself, and sure enough, it fails with a 0x80041013 “Provider Load Failure” error:

0x80041013 Provider Load Failure

The SQL server was originally deployed as SQL 2008 R2 and later upgraded to SQL 2012 SP1. Although there is a KB article describing this issue, there is no update for SQL 2012 SP1. Notice also that the namespace path in the log includes v10, which refers to SQL 2008/2008 R2; SQL 2012 registers its Reporting Services WMI provider under v11. So the underlying problem appears to stem from how the upgrade from SQL 2008 R2 to SQL 2012 handled the SSRS WMI namespace.
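A quick way to spot this kind of mismatch is to compare the version segment in the namespace DPM Setup queried against the version your SQL build should use. The following is a minimal Python sketch that only manipulates strings and does not touch WMI. The version table (v10 for SQL 2008 and 2008 R2, v11 for SQL 2012) reflects how SSRS names its WMI namespaces, while the helper names are hypothetical.

```python
import re

# SSRS registers its WMI provider under ...\ReportServer\RS_<instance>\v<major>\admin,
# where <major> is the SQL Server major version.
SQL_MAJOR = {"SQL 2008": 10, "SQL 2008 R2": 10, "SQL 2012": 11}

# Extracts the vNN segment from a DPM setup log line like the one above.
NS_VERSION = re.compile(r"\\ReportServer\\RS_[^\\]+\\v(\d+)\\", re.IGNORECASE)


def expected_namespace(server, instance, major):
    """Build the SSRS WMI namespace path for a server, instance and major version."""
    return rf"\\{server}\root\Microsoft\SqlServer\ReportServer\RS_{instance}\v{major}\admin"


def logged_namespace_version(log_line):
    """Return the major version DPM Setup queried, or None if no namespace is found."""
    m = NS_VERSION.search(log_line)
    return int(m.group(1)) if m else None
```

For the log above, `logged_namespace_version` returns 10, while `SQL_MAJOR["SQL 2012"]` is 11. That is the same v10-versus-SQL 2012 discrepancy that pointed us at the upgrade.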

Rather than open a PSS case to find the root cause, we decided it would probably be faster to uninstall SQL entirely, install a fresh instance of SQL 2012, and restore the DPM databases. If you choose this route, be sure to back up your SSRS encryption key, the DPM databases, master, msdb, and the SSRS databases. Otherwise, you'll spend hours reconfiguring reports and setting up SQL security, and you'll have to run DPMSync to recreate the SQL jobs.
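To make sure nothing on that list gets skipped, you can generate the backup statements up front. The minimal Python sketch below only prints T-SQL `BACKUP DATABASE` statements for you to review and run. The database names are assumptions: `DPMDB_DPM01` stands in for your DPM database(s), and `ReportServer`/`ReportServerTempDB` are the SSRS defaults for a default instance. The SSRS encryption key is not a database, so back it up separately (for example, from the Encryption Keys page in Reporting Services Configuration Manager).

```python
# Hypothetical database names; replace with the ones on your instance.
DATABASES = ["DPMDB_DPM01", "master", "msdb", "ReportServer", "ReportServerTempDB"]


def backup_statements(databases, backup_dir):
    """Return one full-backup T-SQL statement per database, writing <db>.bak into backup_dir."""
    stmts = []
    for db in databases:
        path = f"{backup_dir}\\{db}.bak"
        # WITH INIT overwrites any existing backup set in that file.
        stmts.append(f"BACKUP DATABASE [{db}] TO DISK = N'{path}' WITH INIT;")
    return stmts


if __name__ == "__main__":
    for stmt in backup_statements(DATABASES, r"D:\Backup"):
        print(stmt)
```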

DPM 2010 Tape Belongs to a DPM server sharing this library

Recently, I ran into an issue with our DPM 2010 shared tape library where several tapes added back to the library were reporting that they belonged to another DPM server sharing the library. I did not care about the data on these tapes; they just needed to be marked as Free so they could be reused. I logged in to each of our DPM servers trying to find the server that owned the tapes, but all of them reported the same error.

I tried erase operations, re-cataloging the tapes, identifying the unknown tapes, the ForceTapeFree script, and external erase operations, but DPM would not loosen its grip. Finally, I surmised that the problem must be in the DPMDB rather than in the data on the tapes.

It turned out that the media had been associated with an orphaned media pool. To correct this, I used the following DB queries.

First, locate the tape's details. This query returns the slot and barcode number, which should let you identify the piece of media you need to correct. Note the GlobalMediaId value from this query:

select media.BarcodeValue, media.SlotNum, media.MediaId, media.GlobalMediaId, gmedia.MediaPoolId
from tbl_MM_Global_ArchiveMedia gmedia
    inner join tbl_MM_Media media
        on gmedia.MediaId = media.GlobalMediaId

Next, you’ll want to find the appropriate “Free Media Pool” for your library. You can do this with the following query:

select library.ProductId, library.SerialNo, library.LibraryId, mpool.Name, mpool.MediaPoolId, mpool.GlobalMediaPoolId
from tbl_MM_MediaPool mpool
inner join tbl_MM_Library library
on mpool.LibraryId = library.LibraryId
where mpool.Name = 'Free Media Pool'

Note the GlobalMediaPoolId GUID from that query. Then update the media record so that its MediaPoolId points at the Free Media Pool:

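-- Use uniqueidentifier: a varchar declared without a length defaults to
-- varchar(1) and would truncate the GUIDs.
declare @GlobalMediaId as uniqueidentifier
declare @GlobalMediaPoolId as uniqueidentifier

-- GlobalMediaId from query 1, GlobalMediaPoolId from query 2
set @GlobalMediaId = '<GUID from query 1>'
set @GlobalMediaPoolId = '<GUID from query 2>'

-- Reassign the tape to the Free Media Pool
update tbl_MM_Global_ArchiveMedia
set MediaPoolId = @GlobalMediaPoolId
where MediaId = @GlobalMediaId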

Lastly, perform a refresh in the DPM Console. Your tapes should now be marked as Free.
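To see the find-then-update logic end to end without touching a production DPMDB, here is a minimal Python sketch that replays the three steps against an in-memory SQLite database. The table and column names come from the queries above, but the schema is trimmed to just those columns, all data values are made up, and filtering by barcode is an addition for illustration. The real DPMDB has many more columns and constraints, so treat this as a model of the queries, not a tool to run against DPM.

```python
import sqlite3


def build_demo_db():
    """In-memory stand-in for DPMDB with only the columns the queries use (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tbl_MM_Media (MediaId TEXT, GlobalMediaId TEXT, BarcodeValue TEXT, SlotNum INTEGER);
        CREATE TABLE tbl_MM_Global_ArchiveMedia (MediaId TEXT, MediaPoolId TEXT);
        CREATE TABLE tbl_MM_Library (LibraryId TEXT, ProductId TEXT, SerialNo TEXT);
        CREATE TABLE tbl_MM_MediaPool (MediaPoolId TEXT, GlobalMediaPoolId TEXT, LibraryId TEXT, Name TEXT);
        INSERT INTO tbl_MM_Library VALUES ('lib-1', 'DemoLib', 'SN001');
        INSERT INTO tbl_MM_MediaPool VALUES ('pool-free', 'gpool-free', 'lib-1', 'Free Media Pool');
        INSERT INTO tbl_MM_Media VALUES ('m-1', 'gm-1', 'ABC123L4', 7);
        -- The tape points at a pool that no longer exists (the orphaned pool).
        INSERT INTO tbl_MM_Global_ArchiveMedia VALUES ('gm-1', 'gpool-orphan');
    """)
    return con


def free_tape(con, barcode):
    """Move the tape with this barcode into the Free Media Pool; return (media id, pool id)."""
    # Step 1: find the tape's GlobalMediaId (same join as query 1).
    row = con.execute(
        """SELECT media.GlobalMediaId
           FROM tbl_MM_Global_ArchiveMedia gmedia
           INNER JOIN tbl_MM_Media media ON gmedia.MediaId = media.GlobalMediaId
           WHERE media.BarcodeValue = ?""", (barcode,)).fetchone()
    if row is None:
        raise LookupError(f"no tape with barcode {barcode}")
    global_media_id = row[0]

    # Step 2: find the Free Media Pool (same join as query 2). This assumes a single
    # library; with several libraries you would also filter on library.SerialNo.
    row = con.execute(
        """SELECT mpool.GlobalMediaPoolId
           FROM tbl_MM_MediaPool mpool
           INNER JOIN tbl_MM_Library library ON mpool.LibraryId = library.LibraryId
           WHERE mpool.Name = 'Free Media Pool'""").fetchone()
    if row is None:
        raise LookupError("no Free Media Pool found")
    global_pool_id = row[0]

    # Step 3: repoint the tape, using bound parameters instead of pasted literals.
    con.execute("UPDATE tbl_MM_Global_ArchiveMedia SET MediaPoolId = ? WHERE MediaId = ?",
                (global_pool_id, global_media_id))
    con.commit()
    return global_media_id, global_pool_id
```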

Error 0x800423f3 backing up Hyper-V VM with DPM 2007

One error you may receive while backing up a Hyper-V VM with DPM 2007 is the generic “DPM encountered a retryable VSS error. (ID 30112 Details: Unknown error (0x800423f3) (0x800423F3)).” Several things can cause this error. The two most common are:

1. You are running a Windows Server 2008 SP1 Hyper-V host and do not have the appropriate prerequisites installed, specifically the hotfix described in KB959962.

http://technet.microsoft.com/en-us/library/dd347840.aspx

2. There is a VSS error of some kind inside the VM causing the Hyper-V VSS writer to fail.

One of the most common VSS errors I have seen inside a Windows Server 2008 VM is event ID 8193:

Log Name:      Application
Source:        VSS
Date:          <DateTime>
Event ID:      8193
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      <ComputerName>
Description:
Volume Shadow Copy Service error: Unexpected error calling routine ConvertStringSidToSid.  hr = 0x80070539.

Operation:
OnIdentify event
Gathering Writer Data
Context:
Execution Context: Shadow Copy Optimization Writer
Writer Class Id: {4dc3bdd4-ab48-4d07-adb0-3bee2926fd7f}
Writer Name: Shadow Copy Optimization Writer
Writer Instance ID: {3586f039-f2f9-4dcb-a46e-3aaa20f1a2fa}

This error can be solved by following the instructions in this blog post. Specifically, perform these steps outlined in KB947242:
  • Delete unresolvable SIDs in the ‘Administrators’ group on the VM.
  • Open regedit and locate ‘HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList’
  • Under the ProfileList subkey, delete any subkey that is named SID.bak
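If you want to see which ProfileList entries the last step applies to before opening regedit, here is a minimal Python sketch. It only lists candidates and deletes nothing. It uses the standard-library `winreg` module, which exists only on Windows, so the registry read is guarded. The function names are hypothetical, and the filter simply matches subkey names ending in `.bak`.

```python
import sys

PROFILE_LIST = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList"


def bak_profile_keys(subkey_names):
    """Return the ProfileList subkeys named <SID>.bak (the ones KB947242 says to delete)."""
    return [name for name in subkey_names if name.lower().endswith(".bak")]


def read_profile_list():
    """List ProfileList subkey names under HKLM (read-only)."""
    import winreg  # Windows-only standard-library module

    names = []
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, PROFILE_LIST) as key:
        i = 0
        while True:
            try:
                names.append(winreg.EnumKey(key, i))
            except OSError:  # raised when there are no more subkeys
                break
            i += 1
    return names


if __name__ == "__main__" and sys.platform == "win32":
    for name in bak_profile_keys(read_profile_list()):
        print("Candidate for deletion:", name)
```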

These steps have resolved the issue in most cases where I have seen this DPM error. Other troubleshooting steps that have solved this problem for me in the past:

  • Re-install the Integration Components and reboot the VM
  • Resolve issues for any VSS writers not listed as Stable in the output of “vssadmin list writers” on the host or inside the VM. Restarting the associated service resolves some problems:
    • System Writer – Cryptographic Services service (doesn’t affect the system)
    • IIS Metabase Writer – IIS Admin Service (will reset all of IIS)
    • SqlServerWriter – SQL Server VSS Writer service (doesn’t affect SQL)
    • WMI Writer – Windows Management Instrumentation service (WMI will be unavailable during the service restart)
    • BITS Writer – BITS service (BITS will be unavailable during the service restart)
  • Re-register VSS components as described in KB940032
  • Ensure there is sufficient space inside the VM for shadow copies
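When several VMs are involved, checking the writer list by eye gets tedious. The minimal Python sketch below parses saved `vssadmin list writers` output (for example, redirected to a text file) and reports every writer whose state is not Stable, together with the service to restart according to the list above. The parsing relies on the usual “Writer name: '…'” and “State: [n] …” lines. The mapping uses Windows service short names, and the function name is hypothetical.

```python
import re

# Writer name -> service short name to restart, following the list above.
RESTART_SERVICE = {
    "System Writer": "CryptSvc",        # Cryptographic Services
    "IIS Metabase Writer": "IISADMIN",  # IIS Admin Service (resets all of IIS)
    "SqlServerWriter": "SQLWriter",     # SQL Server VSS Writer
    "WMI Writer": "Winmgmt",            # Windows Management Instrumentation
    "BITS Writer": "BITS",              # Background Intelligent Transfer Service
}

# Each writer block starts with "Writer name: '<name>'" and later has "State: [n] <state>".
WRITER_BLOCK = re.compile(
    r"Writer name: '(?P<name>[^']+)'.*?State: \[\d+\] (?P<state>[^\r\n]+)",
    re.DOTALL,
)


def unstable_writers(vssadmin_output):
    """Return (writer, state, service or None) for each writer whose state is not Stable."""
    results = []
    for m in WRITER_BLOCK.finditer(vssadmin_output):
        state = m.group("state").strip()
        if state != "Stable":
            name = m.group("name")
            results.append((name, state, RESTART_SERVICE.get(name)))
    return results
```

A `None` in the third position means the writer is not on the list above and needs a different fix, such as re-registering the VSS components.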

DPM Daily Maintenance Script

We recently completed a project to move over 300 servers from our old backup infrastructure to a brand new disk-based DPM 2007 solution. We have been very pleased with DPM 2007 thus far, but we are finding that it requires a fair amount of hand-holding in the mornings to kick off failed jobs, increase disk allocations, and perform consistency checks. Unfortunately, the DPM console can only be loaded on the DPM server itself and cannot connect to a remote DPM server, so addressing the alerts means logging in to each DPM server via RDP. After a few weeks of doing this by hand, we added the DPM servers to our SCOM 2007 server, which helped consolidate the alerts into a single interface, but we found we could not modify disk allocations via SCOM.

So I sat down and wrote the DPM Daily Maintenance Script. This PowerShell script queries the database for alerts and addresses the four most common: replica disk threshold exceeded, recovery point volume threshold exceeded, replica is inconsistent, and recovery point creation failed. The script takes four optional parameters:

replicaIncreaseRatio – Ratio by which to increase the existing replica disk size (e.g., 1.1 increases it by 10%; this is the default if nothing is specified)
scIncreaseRatio – Ratio by which to increase the existing recovery point volume size (e.g., 1.1 increases it by 10%; this is the default if nothing is specified)
replicaIncreaseSize – Fixed amount by which to increase the replica disk (e.g., 1GB)
scIncreaseSize – Fixed amount by which to increase the recovery point volume (e.g., 1GB)
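To make the difference between the ratio and fixed-size parameters concrete, here is a minimal Python sketch of the size arithmetic. It is not the script itself, which is PowerShell. The function name is hypothetical, and one detail is an assumption because the post does not specify it: when a fixed size is given, it takes precedence over the ratio.

```python
GB = 1024 ** 3


def new_allocation(base_bytes, ratio=1.1, fixed_bytes=None):
    """Target size for a replica disk or recovery point volume.

    base_bytes is the current size the increase is applied to.
    Assumption: fixed_bytes, when given, overrides the ratio.
    """
    if fixed_bytes is not None:
        return base_bytes + fixed_bytes
    if ratio <= 1:
        raise ValueError("ratio must be greater than 1 to grow the volume")
    # round() absorbs floating-point noise such as 1000 * 1.1 == 1100.0000000000002
    return int(round(base_bytes * ratio))
```

With the default ratio, a 100 GB replica grows to about 110 GB, while `fixed_bytes=1 * GB` adds exactly 1 GB regardless of the current size. The ratio scales with large workloads, and the fixed amount keeps growth predictable on small ones.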

The script first queries the database for alerts and then sorts them alphabetically and by alert type. This means that if a replica became inconsistent because the replica disk threshold was exceeded, or a recovery point creation failed because the recovery point volume threshold was exceeded, the script increases the size of the volume before re-running the job. For replica disks, the script also queries the original data source and resizes the replica disk to the workload's current size plus the ratio or fixed amount specified in the parameters. This ensures that the replica disk is extended by the proper amount on the first pass when a large amount of data has been added to the workload.
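The ordering trick is what lets a single pass both fix the space problem and re-run the failed job. Below is a minimal Python sketch of that idea. It assumes “alphabetically” means by data source name, and the alert-type labels are hypothetical stand-ins, since the real script's alert names are not shown here. Threshold alerts get a lower priority number, so for each data source they sort ahead of the consistency-check and recovery-point reruns.

```python
from dataclasses import dataclass

# Hypothetical alert-type labels; 0 = resize first, 1 = re-run the job afterwards.
PRIORITY = {
    "ReplicaDiskThresholdExceeded": 0,
    "RecoveryPointVolumeThresholdExceeded": 0,
    "ReplicaInconsistent": 1,
    "RecoveryPointCreationFailed": 1,
}


@dataclass(frozen=True)
class Alert:
    datasource: str
    alert_type: str


def ordered(alerts):
    """Sort by data source name, then put resize alerts ahead of job reruns."""
    return sorted(alerts, key=lambda a: (a.datasource.lower(), PRIORITY[a.alert_type]))
```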

We have been running this script on 6 DPM servers for about 6 weeks now, and it has virtually eliminated the daily maintenance (I was on vacation for 2 weeks during that time, and DPM happily hummed along without any intervention, self-healing twice per day). We still use SCOM to monitor the alerts, and we manually check for replicas that repeatedly become inconsistent or recovery point creations that consistently fail, addressing those by hand. We set up a scheduled task that runs twice per day using the following command line:

C:\Windows\system32\windowspowershell\v1.0\powershell.exe -PSConsoleFile "C:\Program Files\Microsoft DPM\DPM\bin\dpmshell.psc1" -command ".'C:\admin\DailyMaintenance.ps1'" >> C:\admin\DailyMaintenance.log

DailyMaintenance

There are a few third-party products that can help with these same alerts, and Microsoft is working on making our lives easier with DPM v3, but in the meantime, this script should take some of the burden off of sysadmins.