Background
We had a particular Veeam Backup & Replication job that began taking an inordinately long time to complete, and during the job the guest VM's CPU usage would skyrocket. Upon closer inspection we found that the actual backup time wasn't that long; the problem was deleting the snapshot. As you may know if you use Veeam, when a job begins, Veeam instructs the VMware host to take a snapshot of the virtual machine. When the backup completes, Veeam instructs the VMware host to delete the snapshot. It was the process of deleting the snapshot (or, to be more precise, committing all of the changes made since the snapshot was taken) that was taking so long.
The problematic virtual machine in this case runs Windows Server 2012 and an application called EventSentry, which uses a PostgreSQL database. This issue could theoretically occur with any application that does a lot of disk I/O on a virtual machine while a VMware snapshot is in place. Simply put, the more disk changes that occur while the snapshot exists, the longer it takes to delete the snapshot (commit the changes).
After some experimenting we found that if we first stopped the EventSentry Database service (which stops the PostgreSQL database), the Veeam backup jobs completed much more quickly and the CPU on the guest VM being backed up remained stable. In fact, jobs that used to take 2.5 hours finished in less than an hour with the application's database stopped.
Possible Solutions
OK. So this should be simple. Just figure out a way to stop the EventSentry Database service before the Veeam job starts, and then restart the service once the job completes. Well, in my case it was not quite so simple to figure out, which is why I'm writing this blog post about it.
Veeam offers a means to run scripts before and after either the entire job or the processing of a specific VM within the job.
- The scripts you run before and after a job are referred to as Pre-Job and Post-Job Scripts.
- The scripts you run for a specific guest VM are referred to as Pre-Freeze and Post-Thaw scripts.
Pre-Freeze and Post-Thaw Scripts
It's actually VMware Tools that provides the Pre-Freeze and Post-Thaw script function. Veeam just copies the scripts you choose to the guest VM, then uses the function within the VMware Tools installed on that guest to actually run them.
So, we created a simple script that contained the command NET STOP "EventSentry Database" and configured that to be the Pre-Freeze script. But when the Veeam job would run we would get an error in Veeam that said "Exit Code: 5". Not very helpful.
Veeam offers great documentation on how to use these Pre-Freeze scripts but no help at all for troubleshooting:
http://helpcenter.veeam.com/backup/80/hyperv/backup_job_vss_scripts_hv.html
From the beginning we suspected it was some type of permissions issue. We assumed however that VMware Tools was executing the scripts using the local System account. Turns out, it executes the scripts using the same credentials you provided the Veeam Backup job to do the "Guest Processing". Makes sense once you know this.
From the problematic virtual machine, if we opened a Command Prompt (not as Administrator) we found that we could not even manually stop the service using the command NET STOP "EventSentry Database". Windows would give an error: "System error 5 has occurred. Access is denied." So that's where the Exit Code 5 came from in the Veeam error message.
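Since a bare "Exit Code: 5" is not very descriptive, a small lookup like the one below can help when reading job logs. This is only a sketch: the numbers are standard Windows system error codes, but the helper name `describe_exit_code` is made up for illustration, and it assumes the tool you ran passes the Windows error code through as its exit code.

```python
# Windows system error codes that commonly show up when stopping or
# starting a service. The helper name is hypothetical.
SERVICE_ERRORS = {
    0: "Success",
    5: "Access is denied (ERROR_ACCESS_DENIED)",
    1056: "An instance of the service is already running",
    1060: "The specified service does not exist as an installed service",
    1062: "The service has not been started",
}


def describe_exit_code(code: int) -> str:
    """Return a readable description for a service-related exit code."""
    return SERVICE_ERRORS.get(code, f"Unrecognized exit code {code}")


if __name__ == "__main__":
    print(describe_exit_code(5))
```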
The user account we ran the command as also happens to be a Domain Admin, so why could we not stop and restart this service? It turns out that with this particular service (EventSentry Database), only the local administrators and the local System account had permission to do this.
The Permissions Fix
After some Googling we found that you can edit the security permissions of a service using the SC.exe command, but a much faster and friendlier method was to use a utility called Service Security Editor, which we found here: http://www.coretechnologies.com/products/ServiceSecurityEditor/
Using this very-easy-to-use tool we were able to quickly select the EventSentry Database service, see the current permissions, and add the domain username we needed to give rights to stop and start this particular service. Thank you Core Technologies Consulting, LLC. This saved me tremendous time, and it also helped to be able to visually see the security permissions on other services as well.
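If you would rather stay on the command line, `sc.exe sdshow eventsentrydatabase` prints the service's permissions as an SDDL string. The sketch below shows one way to read such a string and list which trustees are allowed to both start and stop the service. The sample SDDL is illustrative only and is not the actual descriptor of the EventSentry service; the function name is hypothetical, and rights written as a hex mask are simply skipped.

```python
import re
from typing import List

# Two-letter SDDL rights as they apply to Windows services.
START_RIGHT = "RP"  # SERVICE_START
STOP_RIGHT = "WP"   # SERVICE_STOP


def trustees_with_start_stop(sddl: str) -> List[str]:
    """Return trustees whose allow ACEs grant both start and stop."""
    # Only inspect the DACL ("D:"), not the audit SACL ("S:") after it.
    start = sddl.find("D:")
    if start == -1:
        return []
    end = sddl.find("S:", start)
    dacl = sddl[start:] if end == -1 else sddl[start:end]

    allowed = []
    for ace in re.findall(r"\(([^)]*)\)", dacl):
        fields = ace.split(";")
        if len(fields) != 6:
            continue
        ace_type, _flags, rights, _obj, _inherit, trustee = fields
        # Skip deny ACEs and rights given as a hex mask (not handled here).
        if ace_type != "A" or rights.startswith("0x"):
            continue
        tokens = {rights[i:i + 2] for i in range(0, len(rights), 2)}
        if START_RIGHT in tokens and STOP_RIGHT in tokens:
            allowed.append(trustee)
    return allowed


if __name__ == "__main__":
    # Illustrative descriptor: SY = Local System, BA = built-in
    # Administrators, IU = interactive users.
    sample = (
        "D:(A;;CCLCSWRPWPDTLOCRRC;;;SY)"
        "(A;;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;BA)"
        "(A;;CCLCSWLOCRRC;;;IU)"
        "S:(AU;FA;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;WD)"
    )
    print(trustees_with_start_stop(sample))
```

In a descriptor like the sample, only Local System and the built-in Administrators group can start and stop the service, which matches what we saw. Granting another account the right means adding an ACE for that account's SID, which is exactly what Service Security Editor does for you.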
Pre-Freeze and Post-Thaw Scripts Didn't Help
After all of that work, we discovered that this was not helping us in our particular situation. We assumed that the Pre-Freeze script would run before the VMware snapshot (which it does) and the Post-Thaw script would run after the snapshot was deleted (it does not). The Post-Thaw script runs immediately after the snapshot is created. Therefore, when the job ran, it would stop the EventSentry Database service, take a VMware snapshot, then restart the service right away. The snapshot happens so fast we initially thought the script wasn't working.
Now we understand that Pre-Freeze and Post-Thaw scripts are meant to help you with applications that are not Microsoft VSS-aware (Volume Shadow Copy Service). They allow you to stop a process only during snapshot creation so that you have a clean snapshot image. Apps that are VSS-aware do not need these scripts, since VMware Tools uses VSS to quiesce apps such as Microsoft SQL Server databases during snapshot creation.
Pre-Job and Post-Job Scripts
Since the above was NOT doing what we needed, we had to fall back on Veeam's Pre-Job and Post-Job scripts.
The frustrating part (to me) about this option is that if you are backing up multiple VMs with one job, the service being stopped will remain offline while all of the other VMs are being backed up. Thus you have to create a separate Veeam job just for the specific VM whose services you wish to stop.
OK, so we follow their instructions here: http://helpcenter.veeam.com/backup/80/vsphere/backup_job_advanced_advanced_vm.html
Once again I made an incorrect assumption about how this process works. I assumed it would behave much like the Pre-Freeze script, in that Veeam would copy your script to the guest and execute it using the user context defined in the Guest Processing settings. Not at all correct.
Instead, Pre-Job and Post-Job scripts run on the Veeam server and not the guest. This means that if you wish to stop a service on the guest you cannot use the NET STOP command, which only works against the local machine. Instead you have to use either the SC.exe command or a utility such as PsService.exe.
This also means you may need to supply the username and password in your script. In our case, the Veeam server is logged in as the same user (a domain admin) that we previously granted permission to stop and start the service using Service Security Editor. Thus, with Windows pass-through authentication, we didn't have to specify credentials. Personally, I wish Veeam would offer a means to execute job scripts using credentials you store within Veeam, just as they do with other facets of Veeam.
In our case, we used this command in the Pre-Job script:
sc.exe \\nameofguestVM stop eventsentrydatabase
We used this command in the Post-Job script:
sc.exe \\nameofguestVM start eventsentrydatabase
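One thing to keep in mind is that `sc.exe stop` only requests the stop and returns right away, so the service may still be shutting down when the snapshot is taken. The sketch below wraps the same commands and polls `sc.exe query` until the service reports STOPPED. It is an illustration rather than what we actually ran: the function names, timeout, and polling interval are made up, it assumes sc.exe passes the Windows error code through as its exit code, and it assumes Python is installed on the Veeam server and that the script is launched from whatever script type your Veeam version accepts (for example a small .bat wrapper).

```python
import re
import subprocess
import sys
import time
from typing import List, Optional

GUEST = r"\\nameofguestVM"       # placeholder host name from the post
SERVICE = "eventsentrydatabase"  # service name used in the post


def sc_command(action: str, guest: str = GUEST,
               service: str = SERVICE) -> List[str]:
    """Build an sc.exe command line targeting the remote guest."""
    return ["sc.exe", guest, action, service]


def parse_state(query_output: str) -> Optional[str]:
    """Extract the state name (e.g. RUNNING) from 'sc query' output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", query_output)
    return match.group(1) if match else None


def stop_and_wait(timeout_s: int = 120, poll_s: int = 5) -> int:
    """Request a stop, then poll until STOPPED. Returns 0 on success."""
    result = subprocess.run(sc_command("stop"),
                            capture_output=True, text=True)
    # Assumption: 1062 means the service was already stopped.
    if result.returncode not in (0, 1062):
        print(result.stdout, file=sys.stderr)
        return result.returncode
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        query = subprocess.run(sc_command("query"),
                               capture_output=True, text=True)
        if parse_state(query.stdout) == "STOPPED":
            return 0
        time.sleep(poll_s)
    return 1  # timed out waiting for the service to stop


# Only run for real on Windows, where sc.exe exists.
if __name__ == "__main__" and sys.platform == "win32":
    sys.exit(stop_and_wait())
```

A Post-Job counterpart would use `sc_command("start")` and wait for RUNNING in the same way.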
Finally! This accomplished what we were after.
In Summary
When you need to stop/start a Windows service during a Veeam backup job, the first thing to determine is whether you only need to quiesce a non-VSS-aware application during the creation of the snapshot. If so, use the Pre-Freeze and Post-Thaw method.
If you need the service stopped during the entire job, use the Pre-Job and Post-Job method.
Either way, understand the user context the method will be using. Although this may not be true of every Windows service, you may have to use a utility such as Service Security Editor to grant that user the rights to stop and start the service.
Pre-Freeze scripts use the credentials you defined in the "Guest Processing" section of the job, and they run on the guest VM being backed up. Pre-Job scripts use the credentials that the Veeam service is running under, and they run on the Veeam server and NOT the guest VM.
As always, hoping this helps someone else in the future.