Wednesday, October 14, 2015

Hyper-V dynamic MAC addressing is a cluster...

Hyper-V dynamic MAC addressing is a cluster. There - I said it.

Microsoft has been making great strides with Hyper-V by adding new features and functionality. If you compare Hyper-V on Server 2012 R2 to VMware vSphere 5.5, most people would agree that Hyper-V is "good enough" and has most of the features that the majority of businesses require. Unfortunately, Microsoft has overlooked a small detail - MAC addresses. Yes, MAC addresses - something you'd think would be a solved problem by now. If you've used Hyper-V in any capacity, you probably know that there are two methods for handling MAC addresses for your virtual guests:

  • Static addresses - where you can permanently set whichever MAC address you want
  • Dynamic addresses - where each guest pulls from a dynamic pool of MAC addresses that is defined automatically on a host.
MAC address options in Hyper-V Manager

Most organizations are likely utilizing dynamic MAC addresses for their virtual guests, as it's kind of the default "set it and forget it" method. This is fine for most purposes, especially if you statically set IP addresses inside your guests, or if you only have single Hyper-V hosts with local storage.

The dynamic MAC address range is set automatically on each host based on the IP address of the host, so that ranges on different hosts should not conflict. You can view the MAC address range for a host by opening Hyper-V Manager, clicking Virtual Switch Manager in the Actions pane, and then selecting the MAC Address Range option in the left pane of the Virtual Switch Manager.

Defining the dynamic MAC address pool on each host
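
The same range can also be read from PowerShell, which is handy for comparing the pools on every node of a cluster. The following is a minimal sketch, assuming the Hyper-V PowerShell module is installed. Format-Mac and Get-NodeMacRange are illustrative names of my own, not built-in cmdlets.

```powershell
# Sketch only: Format-Mac and Get-NodeMacRange are hypothetical helper names.

function Format-Mac {
    # Turn '00155D010A00' into '00-15-5D-01-0A-00' for easier reading.
    param([Parameter(Mandatory)][string]$Mac)
    $Mac -replace '(..)(?!$)', '$1-'
}

function Get-NodeMacRange {
    # Ask each Hyper-V host for the bounds of its dynamic MAC address pool.
    param([string[]]$ComputerName = $env:COMPUTERNAME)
    foreach ($node in $ComputerName) {
        $vmHost = Get-VMHost -ComputerName $node
        [pscustomobject]@{
            Host    = $node
            Minimum = Format-Mac $vmHost.MacAddressMinimum
            Maximum = Format-Mac $vmHost.MacAddressMaximum
        }
    }
}
```

On a cluster node, something like `Get-NodeMacRange -ComputerName (Get-ClusterNode).Name` should list the pool for each node side by side (this assumes the FailoverClusters module is available).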

The problem comes when you have multiple Hyper-V hosts in a failover cluster. When a VM using dynamic MAC addressing live migrates to a new node in a failover cluster, it retains the same MAC address to prevent network disruptions. However, as soon as that VM is rebooted, it will grab a new MAC address from the node it's currently on.

Now, in most scenarios that's fine. If you have servers that are using statically set IP addresses, the MAC address changing on reboot will have little effect. If you are hosting virtual desktops using DHCP, this shouldn't matter much either, as it's expected for the IP address of a desktop to change when using DHCP.

RDS Pooled Desktops with Rollback Enabled

Hyper-V dynamic MAC addressing in a clustered configuration has an unfortunate side effect when using pooled desktops with RDS. When a pooled desktop is created, Hyper-V creates a checkpoint shortly after the desktop powers on for the first time - this is the mechanism that RDS uses to perform the rollback. When a user is finished with their session and logs off a pooled desktop, RDS creates a task for Hyper-V to restore the checkpoint. This brings the pooled desktop back to its original state.

When Hyper-V creates the checkpoint, it stores the MAC address of the desktop in the checkpoint. So each time the desktop performs a rollback, the MAC address reverts to whatever it was when the checkpoint was created. This causes a problem in a clustered configuration. A desktop that is created on the first node of the cluster gets a MAC address from that node's dynamic MAC pool. Through the magic of Live Migration, that virtual desktop can move to another node in the cluster.

The MAC address stored in the checkpoint

Now that the pooled desktop is on a different node, the first node of the cluster can reallocate that original MAC address from its pool, since no desktop on the first node is using it. See the problem? The first node will give that MAC address to a new VM, but the pooled desktop on the second node will revert to that same MAC address when performing the rollback. Voila! You now have a duplicate MAC address in your environment, which can cause issues with layer two switching, as well as DHCP.
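
To check whether this has already happened, you can collect the network adapters of every VM in the cluster and group them by MAC address. The sketch below is one way to do that. Find-DuplicateMac is a hypothetical helper name, and it assumes the input objects carry VMName, ComputerName and MacAddress properties, as the objects returned by Get-VMNetworkAdapter do.

```powershell
# Sketch only: Find-DuplicateMac is a hypothetical helper, not a Hyper-V cmdlet.

function Find-DuplicateMac {
    # Accepts adapter objects from the pipeline and reports any MAC address
    # that is held by more than one adapter.
    param([Parameter(Mandatory, ValueFromPipeline)][object[]]$Adapter)
    begin   { $all = @() }
    process { $all += $Adapter }
    end {
        $all |
            # Skip adapters that have not been assigned a dynamic MAC yet.
            Where-Object { $_.MacAddress -and $_.MacAddress -ne '000000000000' } |
            Group-Object MacAddress |
            Where-Object { $_.Count -gt 1 } |
            ForEach-Object {
                [pscustomobject]@{
                    MacAddress = $_.Name
                    VMs        = ($_.Group | ForEach-Object { "$($_.VMName)@$($_.ComputerName)" }) -join ', '
                }
            }
    }
}
```

A possible way to feed it on a cluster node: `Get-ClusterNode | ForEach-Object { Get-VM -ComputerName $_.Name | Get-VMNetworkAdapter } | Find-DuplicateMac`. No output means no duplicates were found among the adapters that currently hold an address.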

Now, if you have SCVMM in your environment, it attempts to resolve this issue by allowing you to define a static MAC address pool that all hosts in your cluster can pull from. Again, this is a great solution for regular VMs, but doesn't work so well with RDS in the picture. If your template virtual machine is set to a static MAC address, RDS will throw an error when you try to create and/or update the pooled template. RDS doesn't want to create multiple virtual desktops all with the same static MAC address, so it forces you to use dynamic MAC addresses and will not continue until you change your template VM.

Failing to update the pool when the template has a static MAC address

A workaround

I've developed a workaround for this behavior that prevents the creation of duplicate MAC addresses in the environment. We know that when the rollback occurs, the MAC address of the virtual desktop may change, causing the problem. By using PowerShell, we can modify the MAC address inside the checkpoint, and "blank" it out - this will cause the virtual desktop to grab a new dynamic MAC address from the host it currently resides on each time that a rollback occurs.

Get-VM <Name> | Get-VMSnapshot | Get-VMNetworkAdapter | Set-VMNetworkAdapter -DynamicMacAddress

Resetting the MAC address present in the checkpoint
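
A pool usually contains many desktops, so it may be convenient to wrap the one-liner above in a loop. The following is a rough sketch that reuses the same pipeline for every VM matching a name pattern. Reset-CheckpointMac is a hypothetical name, and the sketch assumes Get-VM accepts a wildcard in -Name; try it on a single test desktop first.

```powershell
# Sketch only: Reset-CheckpointMac is a hypothetical helper built on the
# one-liner above. It changes checkpoint settings, so test before wide use.

function Reset-CheckpointMac {
    # $VMName may contain wildcards, e.g. 'VDI-*'.
    param([Parameter(Mandatory)][string]$VMName)
    foreach ($vm in Get-VM -Name $VMName) {
        # Network adapters as stored inside this VM's checkpoint(s).
        $adapters = $vm | Get-VMSnapshot | Get-VMNetworkAdapter
        foreach ($adapter in $adapters) {
            # Blank the stored MAC so a dynamic one is assigned after a rollback.
            $adapter | Set-VMNetworkAdapter -DynamicMacAddress
            "{0}: checkpoint adapter '{1}' set to dynamic" -f $vm.Name, $adapter.Name
        }
    }
}
```

For example, `Reset-CheckpointMac -VMName 'VDI-*'` would process every desktop whose name starts with VDI- and print one line per checkpoint adapter it touched.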

However, we're not quite done yet. When we make this change, we're forcing the virtual desktop to change MAC addresses while it's still technically running, which doesn't quite work. The virtual desktop will grab a new MAC address, but all network connectivity will be lost until we reboot the desktop. We certainly don't want to sit around all day rebooting virtual desktops. Luckily, Hyper-V logs an event whenever a checkpoint is restored. We can create another simple PowerShell script to reboot the virtual desktop, and use Task Scheduler to trigger this script whenever the rollback event is logged.

Event ID 18596 is logged whenever a checkpoint is restored

The PowerShell script is pretty simple. First we need to find the specific event log entry that triggered the script. We do this by passing the eventChannel and eventRecordID as parameters. When Task Scheduler triggers the script, it will pass this information into the script (we'll set this up next). We can then use the Get-WinEvent cmdlet to retrieve the specific event entry we're looking for.

Once we have the event entry, it's as simple as looking inside the message of the event and grabbing the name of the virtual desktop. Now that we have the name of the virtual desktop, we can use some Hyper-V cmdlets to force a reboot of the virtual desktop.
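
To make the flow concrete, here is a minimal sketch of that logic. It is not the script from the repository, just an illustration of the steps described above. The function names are hypothetical, and the sketch assumes the event message starts with the desktop name in single quotes (for example 'VDI-01' was restored successfully); check a real 18596 event in your environment before relying on that pattern.

```powershell
# Sketch only, NOT the Restart-CheckpointedVDI script itself.
# Get-VMNameFromMessage and Restart-RestoredVM are hypothetical names.

function Get-VMNameFromMessage {
    # Assumption: the message begins with the VM name in single quotes.
    param([Parameter(Mandatory)][string]$Message)
    if ($Message -match "^'(?<name>[^']+)'") { $Matches['name'] }
}

function Restart-RestoredVM {
    param(
        [Parameter(Mandatory)][string]$EventChannel,
        [Parameter(Mandatory)][long]$EventRecordID   # numeric, so it is safe inside the XPath
    )
    # Fetch exactly the event record that fired the scheduled task.
    $xpath  = "*[System[EventRecordID=$EventRecordID]]"
    $record = Get-WinEvent -LogName $EventChannel -FilterXPath $xpath -MaxEvents 1
    $vmName = Get-VMNameFromMessage -Message $record.Message
    if ($vmName) {
        # -Force skips the confirmation prompt so the task can run unattended.
        Restart-VM -Name $vmName -Force
        $vmName   # emit the name so the caller can log it
    }
}
```

A script file launched by Task Scheduler could declare matching -EventChannel and -EventRecordID parameters and simply pass them on to Restart-RestoredVM.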

Here is a link to my GitHub repository for the scripts - Restart-CheckpointedVDI on GitHub.

You'll need to set up a scheduled task that is triggered by the Hyper-V event. Use the following parameters for the scheduled task:

  • Context: SYSTEM
  • Run whether user is logged on or not: enabled
  • Run with highest privileges: enabled

Use the following Trigger settings:

  • Log: Microsoft-Windows-Hyper-V-Worker/Admin
  • Source: Hyper-V-Worker
  • Event ID: 18596

The trigger settings for the scheduled task

For the action, call C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe and use the following arguments:

-NoLogo -ExecutionPolicy Bypass -File D:\Restart-CheckpointedVDI.ps1 -EventChannel $(eventChannel) -EventRecordID $(eventRecordID)

The scheduled task after it has been set up

To trigger the script correctly, Task Scheduler needs to pass the eventChannel and eventRecordID to the script - this does not happen natively. I found a slick TechNet blog post detailing how to set this up. However, for ease of use, I've also included an XML file you can use to import the scheduled task (Restart-CheckpointedVDI on GitHub). Simply modify the path to the script in the argument section of the Action tab and the scheduled task should be all set.
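
For reference, the piece that makes this work is the ValueQueries element of the event trigger, which the Task Scheduler GUI does not expose. The sketch below shows roughly what such a task definition might look like and how it could be registered from PowerShell. It is not the XML file from the repository. The channel name in the subscription (written with a hyphen before Admin) and the default task name are assumptions; confirm the channel name with Get-WinEvent -ListLog *Hyper-V-Worker* before using it.

```powershell
# Sketch only: a hand-written task definition, not the XML from the repository.
# Single-quoted here-string, so $(eventChannel) is passed through literally.
$taskXml = @'
<Task version="1.2" xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">
  <Triggers>
    <EventTrigger>
      <Enabled>true</Enabled>
      <Subscription>&lt;QueryList&gt;&lt;Query Id="0" Path="Microsoft-Windows-Hyper-V-Worker-Admin"&gt;&lt;Select Path="Microsoft-Windows-Hyper-V-Worker-Admin"&gt;*[System[EventID=18596]]&lt;/Select&gt;&lt;/Query&gt;&lt;/QueryList&gt;</Subscription>
      <ValueQueries>
        <Value name="eventChannel">Event/System/Channel</Value>
        <Value name="eventRecordID">Event/System/EventRecordID</Value>
      </ValueQueries>
    </EventTrigger>
  </Triggers>
  <Principals>
    <Principal id="Author">
      <UserId>S-1-5-18</UserId>
      <RunLevel>HighestAvailable</RunLevel>
    </Principal>
  </Principals>
  <Actions Context="Author">
    <Exec>
      <Command>C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe</Command>
      <Arguments>-NoLogo -ExecutionPolicy Bypass -File D:\Restart-CheckpointedVDI.ps1 -EventChannel $(eventChannel) -EventRecordID $(eventRecordID)</Arguments>
    </Exec>
  </Actions>
</Task>
'@

function Register-RollbackRebootTask {
    # Hypothetical wrapper; the default task name is an assumption.
    param([string]$TaskName = 'Restart-CheckpointedVDI')
    Register-ScheduledTask -TaskName $TaskName -Xml $taskXml
}
```

Each Value name defined under ValueQueries becomes available to the action as $(name), which is how $(eventChannel) and $(eventRecordID) in the arguments get filled in when the task fires. S-1-5-18 is the SID of the SYSTEM account.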

The net effect of modifying the MAC address inside the checkpoint, setting up a scheduled task to trigger based on event ID 18596 and running the reboot script is as follows:

  1. A user finishes their session and logs off of the pooled desktop.
  2. RDS sends a task to Hyper-V to initiate the rollback.
  3. Hyper-V stops the VM, applies the checkpoint, and restores the VM.
  4. Hyper-V logs event ID 18596 in the Microsoft-Windows-Hyper-V-Worker/Admin event log.
  5. Task Scheduler is triggered by the event and launches the reboot script.
  6. The reboot script grabs the event that triggered it based on parameter input, parses the virtual desktop name, and reboots the desktop.

This entire procedure occurs over the course of 15-30 seconds, depending on how fast your Hyper-V virtual infrastructure is.

Final Thoughts

It's unfortunate how Hyper-V dynamic MAC addressing works in a situation such as this. It's also unfortunate that Microsoft chose to implement static MAC address pools as part of SCVMM. If RDS were able to integrate more tightly with SCVMM and grab a MAC from the static pool, that would solve this problem.

Ideally, Microsoft should have implemented a static MAC address pool directly as part of Failover Clustering, so customers of all sizes could have access to a single standardized pool of MAC addresses, preventing DHCP conflicts in scenarios such as RDS pooled desktops. VMware already does this in a standardized fashion by centrally managing MAC addresses with vCenter - a standard part of any size VMware deployment. It's clear that the Hyper-V, SCVMM and RDS teams at Microsoft are not all on the same page on this issue.

1 comment:

  1. Hi Tom,

    Thanks for the script, it works great. Just a question: do you know of a way to see what addresses are available in the pool for assignment?