Know thy Hypervisor’s limits: VMware HA, DRS and DPM

Last week I was setting up a vSphere cluster and, like any good admin, I was test-driving all its features, making sure everything was working fine. As a side note, I’m trying to squeeze as much value as possible out of the vSphere licenses we currently have, so I’ve set this cluster up with lots of the bells and whistles ESXi has, like:

  • Distributed virtual Switches
  • SDRS Datastore Clusters
  • NetIOC and SIOC
  • VMware HA, DRS and Distributed Power Management

(In v5.1 a lot of them have gotten better, more mature, less buggy)

So there I was: I had this 5-host cluster (ESXi1 to 5) set up nicely and I was testing VMware HA, but in a slightly different scenario, one where DPM said “hey, power down these 3 hosts, you don’t need them, you have enough capacity”. Fine… “Apply DRS Recommendation” I did, and the hosts went to standby.

So there I had my 201 dual-vCPU test VMs running on just 2 servers (mind you, this cluster is just for testing, so the VMs were mostly idle). Time to do some damage.

Let me tell you what happens when one of the remaining blades goes down, say ESXi1:

  1. First, HA kicks in and notices that ESXi1 is not responding to heartbeats via the IP or storage networks, which means the host is down, not just isolated.
  2. HA figures out that ESXi1 had protected VMs running on it and starts powering them on elsewhere. DRS/DPM will also eventually figure out that you need more capacity and start bringing some of your standby hosts back online.
  3. HA will power on almost all of your VMs just fine, but by the time it finished, DRS still had not brought the standby hosts back online… so I ended up with 1 VM that HA did not manage to power on. Bummer!

I investigated the issue. First stop: the events view for the surviving host, which had this to say:

“vSphere HA unsuccessfully failed over TestVM007. vSphere HA will retry if the maximum number of attempts has not been exceeded.

Reason: Failed to power on VM. warning

6/27/2013 1:39:00 PM
TestVM007”

Related events also showed this information:

“Failed to power on VM.
Could not power on VM : Number of running VCPUs limit exceeded.
Max VCPUs limit reached: 400 (7781 worlds)
SharedArea: Unable to find ‘vmkCrossProfShared’ in SHARED_PER_VCPU_VMX area.
SharedArea: Unable to find ‘nmiShared’ in SHARED_PER_VCPU_VMX area.
SharedArea: Unable to find ‘testSharedAreaPtr’ in SHARED_PER_VM_VMX area.”

I then went straight to the ESXi 5.1 Configuration Maximums document… surely an ESXi host can take more than 400 vCPUs, right? And there it was, on page 2:

“Virtual CPUs per host: 2048”

OK… I’m at 400, nowhere near that limit. Then I found a KB article from VMware… no help, since it seems to apply to ESXi 4.x, not 5.1. Also, you can’t find the advanced configuration item it mentions. I looked for it using the Web Client, the vSphere Client and PowerCLI; it’s not there. However, you do see the value listed if you run this from an SSH session on the target host… but I suspect it is not configurable:

# esxcli system settings kernel list | grep maxVCPUsPerCore

maxVCPUsPerCore uint32 Max number of VCPUs should run on a single core. 0 == determine at runtime 0 0 0

I went back to the maximums document, where I also read on page 2:

“Virtual CPUs per core: 25”

OK… so the message said the 400 vCPU limit was reached:

400 / 25 = 16, which is exactly the number of cores I have in each of my ESXi boxes.

Eureka-Moment

So, kids, I managed to reach one of ESXi’s limits with my configuration. Which makes you wonder a little about running high-density ESXi hosts… and about VMware’s claim that a host can run 2048 vCPUs: sure it can, if you can cram 80-odd physical cores into one box (2048 / 25 ≈ 82) 🙂
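If you want to sanity-check the ceiling on your own hosts, here is a minimal sketch you could run from an SSH session; it assumes the effective limit really is physical cores × 25, as the error message and the maximums document suggest:

# Read the physical core count and multiply by the documented 25 vCPUs-per-core limit
CORES=$(esxcli hardware cpu global get | awk '/CPU Cores/ {print $3}')
echo "Physical cores: $CORES"
echo "Estimated running vCPU ceiling: $((CORES * 25))"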

My hosts had 16 pCPUs and 192 GB of RAM, and half the RAM slots were still empty, so in theory I could double the RAM and stuff each ESXi server with 200 VMs, each with 2 vCPUs and 1-2 GB of RAM… but at 400 running vCPUs each host would already be at its ceiling, so I would not be able to fail anything over in case of a host failure, or handle other such scenarios.
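And for a rough idea of how close a host already is to that ceiling, here is a hypothetical tally you could run over SSH; it assumes every running VM’s vCPU count can be read from the numvcpus line of its .vmx file (the line is absent for single-vCPU VMs) and that none of the VM paths contain spaces:

# Sum the configured vCPUs of all VMs currently running on this host
total=0
for vmx in $(esxcli vm process list | awk -F': ' '/Config File/ {print $2}'); do
  # numvcpus is omitted from the .vmx when the VM has a single vCPU
  n=$(grep -i '^numvcpus' "$vmx" | tr -d '" ' | cut -d= -f2)
  total=$((total + ${n:-1}))
done
echo "Running vCPUs on this host: $total"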

Where else does the vCPU-to-pCPU limit manifest itself?

I also tried to see exactly which other scenarios might cause vCenter and ESXi to act up and misbehave. Here’s what I’ve got so far:

Scenario A: 5-host DRS cluster, with VMware HA enabled, percentage-based failover, and admission control enabled. 201 VMs running, 2 vCPUs each:

  • Put each host in maintenance mode, one at a time, until DRS stops migrating VMs for you and Enter Maintenance Mode gets stuck at 2%. Note that no VMs are migrated from the evacuated host, but this is due to admission control, not to hitting the vCPU ceiling.
  • Once you reach that point, DRS will issue a fault saying that it has no more resources and will not initiate a single vMotion. The error I got was:

“Insufficient resources to satisfy configured failover level for vSphere HA.

Migrate TestVM007 from ESXi1.contoso.com to any host”

Scenario B: 5-host DRS cluster, with VMware HA enabled, percentage-based failover, and admission control disabled. 401 VMs running, 2 vCPUs each:

  • Put each host in maintenance mode, one at a time, until DRS stops migrating VMs for you and Enter Maintenance Mode gets stuck at 2%. Note that this time VMs are migrated from the evacuated host; the vMotions only stop once you hit the 400 vCPU limit.
  • The error you get is:

“The VM failed to resume on the destination during early power on.

Failed to power on VM.
Could not power on VM : Number of running VCPUs limit exceeded.
Max VCPUs limit reached: 400 (7781 worlds)
SharedArea: Unable to find ‘vmkCrossProfShared’ in SHARED_PER_VCPU_VMX area.
SharedArea: Unable to find ‘nmiShared’ in SHARED_PER_VCPU_VMX area.
SharedArea: Unable to find ‘testSharedAreaPtr’ in SHARED_PER_VM_VMX area.”

So a similar message to the one you get when HA can’t power on a VM.
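If you suspect you are hitting the same ceiling somewhere, a quick and admittedly rough check is to grep the logs for that message; this sketch assumes the “VCPUs limit” text also lands in the host’s vmkernel.log and in the affected VM’s vmware.log, under the standard ESXi 5.x paths:

# Search the host log for the limit message
grep -i "VCPUs limit" /var/log/vmkernel.log
# Search the per-VM logs (adjust the datastore glob for your environment)
grep -i "VCPUs limit" /vmfs/volumes/*/*/vmware.log 2>/dev/null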

To wrap this up, I think there are some corner cases where people might start to see this behaviour (I’m thinking mostly of VDI environments), and it would be wise to take a serious look at the vCPU:pCPU ratio in failover scenarios to avoid hitting ESXi’s maximum values.

Change vSphere Service Console IP

Now I get a chance to write an article I’ve been meaning to write about something I ran into while working with vSphere 4.1. Initially I called it a “bug” (I may have said so on Twitter, I guess); now I’m starting to think “it serves me right”, in a way. It is about what happens when you want to change the vSphere Service Console IP of a host that is already in a cluster. Here’s the history:

  • 3 hosts configured in a cluster. After some weeks it was decided that we had to change the IPs and VLAN, to make room for some other VLANs that needed room to grow.
  • No problem: get the new IPs, talk to the network guys to trunk the ports on the physical hosts, and reconfigure the switches to make sure that traffic can reach our vCenter Server.
  • Google for how to change the Service Console IP… and 5 minutes later, Google for how to also change the VLAN ID of the Service Console. For changing the IP and VLAN, these are 2 good places to start:
    • Place the host in maintenance mode (while still in the cluster; we chose not to remove it or delete the cluster since we had resource pools configured)
    • Make all the changes (IP, gateway, hosts file)
    • Test settings (ping, nslookup)
    • Once all hosts are reconfigured properly, update each host’s hosts file with the updated IP/hostname entries for the other nodes in the cluster (see the sample /etc/hosts snippet after this list).
  • Obviously, when I took each host out of maintenance mode, our cluster would not work properly, which was to be expected.
  • Now… let’s reconfigure the vSphere cluster, since it was not a proper cluster anymore. Reconfigure Cluster finished “successfully” (the task took longer than we expected), and everything seemed great.
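For illustration, the hosts file entries we kept in sync across the hosts looked something like this (the host names and addresses here are made up):

# /etc/hosts on each ESX host: one line per cluster node, using the new addresses
192.168.10.11   esx01.example.local   esx01
192.168.10.12   esx02.example.local   esx02
192.168.10.13   esx03.example.local   esx03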

Fast forward a few days: I was doing a routine configuration check of the systems and our cluster started to throw “HA agent misconfigured” errors. I discovered that although I had updated the hosts file on each host, the OLD IP addresses were still there; there was a mix of the old settings and the new settings. I started asking my colleagues if anyone had made any changes, but no one had touched anything. After some troubleshooting (which included a file-level search on the hosts for files where that IP might be listed), I concluded this:

“When you reconfigure the IP address of a host that is in a cluster, and then you Reconfigure the cluster for HA, information about the hosts’ IPs as they were when they initially joined the cluster is stored somewhere (maybe the vCenter DB). Therefore any cluster reconfiguration of hosts with new IPs will end up with a mix of old and new IPs in the /etc/hosts file, and possibly with Reconfigure for HA errors.”

To fix this, we disabled HA, disbanded the cluster and recreated it from scratch.

The right way to change vSphere Service Console IP

In light of these issues, these are the steps to properly change the IP address of a host:

  1. If the host is in a cluster, remove it from the cluster.
  2. Put the host in maintenance mode.
  3. Disconnect it from vCenter.
  4. Log in to the physical (or remote KVM) console and change the IP settings. Change the gateway by editing /etc/sysconfig/network so that the GATEWAY line points to your new gateway. Change the IP and VLAN using these commands:
# Re-address the Service Console interface (vswif0) with the new IP and netmask
esxcfg-vswif -i <new IP> -n <new Mask> vswif0
# Set the new VLAN ID on the Service Console port group
esxcfg-vswitch vSwitch0 -p <port group Name> -v <VLAN ID>
# Bounce the interface so the changes take effect (-s disables, -e re-enables it)
esxcfg-vswif -s vswif0
esxcfg-vswif -e vswif0

5. Ping your reconfigured host to see that everything is working properly (see the quick check after these steps).

6. Rejoin the host to the cluster and reconfigure it for HA (let HA reconfigure your hosts file instead of changing it manually). Enjoy not having to worry about cluster issues 🙂
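As a quick check for steps 5 and 6, here is a minimal verification sketch from the Service Console, assuming the standard esxcfg tools (the gateway IP is a placeholder):

# List the vswif interfaces and confirm vswif0 carries the new IP and netmask
esxcfg-vswif -l
# List the vSwitches and confirm the Service Console port group shows the new VLAN ID
esxcfg-vswitch -l
# Basic connectivity test towards the new gateway
ping -c 3 <new gateway IP>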

A colleague of mine also wrote this “interactive script” that prompts you for the required information and changes all these settings, since I’m a bit LSI (Linux Shell Impaired).

#!/bin/sh
# Prompt for the new Service Console network settings
echo "New IP :"
read new_ip
echo "New Mask:"
read new_mask
echo "New Gw:"
read new_gw
echo "New vlan:"
read new_vlan

# Swap the old gateway (taken from the GATEWAY= line) for the new one
old_gw=`grep GATEWAY= /etc/sysconfig/network | cut -d = -f 2`
sed -i "s/$old_gw/$new_gw/g" /etc/sysconfig/network

# Re-address the Service Console interface and set the new VLAN ID
esxcfg-vswif -i $new_ip -n $new_mask vswif0
esxcfg-vswitch vSwitch0 -p "Service Console" -v $new_vlan

# Bounce vswif0 so the changes take effect
esxcfg-vswif -s vswif0
esxcfg-vswif -e vswif0
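For what it’s worth, this is how I would run it, assuming it is saved on the host as /tmp/change_sc_ip.sh (a made-up path); it prompts for the four values and applies them immediately, so run it from the physical or remote KVM console rather than over an SSH session that is about to lose its IP:

chmod +x /tmp/change_sc_ip.sh
/tmp/change_sc_ip.sh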

I hope you enjoyed the read, and remember:

If you need to change the IP of a host in a cluster… remove it from the cluster first; you’ll save yourself some time and brain cells. Comments and critique are welcome, as usual.