Now I get a chance to write an article I’ve been meaning to write, about something I ran into while working with vSphere 4.1. At first I called it a “bug” (I may have said so on Twitter); now I’m starting to think “it serves me right”, in a way. It is about what happens when you want to change the Service Console IP of a vSphere host that is already in a cluster. Here’s the history:
- Three hosts configured in a cluster. After a few weeks, we decided we had to change the IPs and VLAN to make room for other VLANs that needed space to grow.
- No problem: get the new IPs, ask the network guys to trunk the switch ports for the physical hosts, and reconfigure the switches so that traffic can reach our vCenter Server.
- Google how to change the Service Console IP… 5 minutes later, Google how to also change the VLAN ID of the Service Console. So, for changing the IP and VLAN, those are two good places to start.
- Place each host in maintenance mode (while still in the cluster; we chose not to remove it or delete the cluster, since we had resource pools configured).
- Make all the changes (IP, gateway, hosts file).
- Test settings (ping, nslookup)
- Once all hosts were reconfigured, we updated each host’s hosts file with the new IP/hostname entries for the other nodes in the cluster.
- Obviously, when I took each host out of maintenance mode, the cluster did not work; that was to be expected.
- Now… let’s reconfigure the vSphere cluster, since it was not a proper cluster anymore. Reconfigure cluster finishes “Successfully” (the task took longer than we expected), and everything seems great.
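The hosts-file update in the list above (pointing each node’s entries for the other hosts at their new IPs) can be sketched as a small shell function. This is a minimal, hypothetical helper (`update_hosts_entry` is an illustrative name, not a VMware tool); it assumes a standard hosts-format file, and it removes every non-comment line that names the host before appending the new mapping, so no stale entry is left behind by the manual edit.

```sh
#!/bin/sh
# Hypothetical helper, not a VMware tool: map a hostname to a new IP in a
# hosts-format file, dropping any stale line that still names the host.
# Usage: update_hosts_entry <hosts_file> <new_ip> <fqdn> [short_name]
update_hosts_entry() {
    file=$1
    new_ip=$2
    fqdn=$3
    short=${4:-}
    tmp="$file.tmp.$$"

    # Keep comment lines untouched; drop any other line that lists the FQDN
    # or short name as one of its hostname fields (fields 2..NF).
    awk -v f="$fqdn" -v s="$short" '
        /^[ \t]*#/ { print; next }
        {
            keep = 1
            for (i = 2; i <= NF; i++)
                if ($i == f || (s != "" && $i == s)) keep = 0
            if (keep) print
        }' "$file" > "$tmp" || return 1

    # Append the fresh mapping (tab between the IP and the names).
    if [ -n "$short" ]; then
        printf '%s\t%s %s\n' "$new_ip" "$fqdn" "$short" >> "$tmp"
    else
        printf '%s\t%s\n' "$new_ip" "$fqdn" >> "$tmp"
    fi

    # Write back in place, which keeps the file's owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example, `update_hosts_entry /etc/hosts 192.168.50.11 esx1.example.local esx1` (example address and names) would point esx1 at its new IP. Trying it on a copy of the file first is a good idea.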
Fast forward a few days: during a routine configuration check of the systems, our cluster starts throwing “HA agent misconfigured” errors. I discover that although I had updated the hosts file on each host, the OLD IP addresses were still there; there was a mix of old and new settings. I asked my colleagues whether anyone had made any changes, but no one had touched anything. After some troubleshooting (including a file-level search on the host for any file where the old IP might be listed), I concluded this:
“When you reconfigure the IP address of a host that is already in a cluster and then run Reconfigure for HA, the IP addresses the hosts had when they first joined the cluster are still stored somewhere (maybe in the vCenter database)! As a result, any cluster reconfiguration of hosts with new IPs will leave a mix of old and new IPs in /etc/hosts, and possibly Reconfigure for HA errors.”
To fix this, we disabled HA, deleted the cluster and recreated it.
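The file-level search from the troubleshooting above can be done with `grep`. A minimal sketch, assuming GNU grep and using a hypothetical helper name, `find_stale_ip`. The `-w` flag keeps 10.0.0.1 from also matching 10.0.0.11, and `-F` stops the dots in the IP from acting as regex wildcards.

```sh
#!/bin/sh
# Hypothetical helper: list files under a directory that still mention an old IP.
# Usage: find_stale_ip <old_ip> [dir]   (dir defaults to /etc)
find_stale_ip() {
    old_ip=$1
    dir=${2:-/etc}
    # -r: recurse into subdirectories
    # -l: print only the names of matching files
    # -w: whole-word match, so 10.0.0.1 does not also match 10.0.0.11
    # -F: fixed string, so the dots are literal rather than "any character"
    # Permission errors on unreadable files are discarded.
    grep -rlwF -- "$old_ip" "$dir" 2>/dev/null
}
```

Running `find_stale_ip 10.0.0.11` (an example old address) would list every file under /etc that still mentions it; the exit status is non-zero when nothing is found.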
The right way to change the vSphere Service Console IP
In light of these issues, these are the steps to properly change the IP address of a host:
1. If the host is in a cluster, remove it from the cluster.
2. Put the host in maintenance mode.
3. Disconnect it from vCenter.
4. Log in to the physical (or remote KVM) console and change the IP settings. Change the gateway by editing /etc/sysconfig/network so that the GATEWAY line points to your new gateway, then change the IP (and the VLAN ID, if needed) with these commands:
   esxcfg-vswif -i <new IP> -n <new mask> vswif0
   esxcfg-vswitch -v <VLAN ID> -p "<port group name>" vSwitch0
   esxcfg-vswif -s vswif0
   esxcfg-vswif -e vswif0
5. Ping your reconfigured host to check that everything is working properly.
6. Rejoin the host to the cluster and run Reconfigure for HA (let HA rewrite your hosts file instead of changing it manually). Enjoy not having to worry about cluster issues 🙂
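Before running the commands in step 4, it may be worth checking that the new gateway actually sits on the new subnet; if it does not, the host will likely be unreachable from other subnets once vswif0 comes back up. A minimal sketch in POSIX shell, with hypothetical helper names `ip_to_int` and `same_subnet`, assuming well-formed dotted-quad input without leading zeros:

```sh
#!/bin/sh
# Hypothetical pre-flight check for step 4: is the new gateway inside the
# subnet defined by the new IP and mask?
# Assumes valid dotted-quad IPv4 input with no leading zeros.

# ip_to_int <a.b.c.d>: print the address packed into one 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_subnet <ip> <mask> <gateway>: exit status 0 if ip and gateway share
# the same network address under the given mask.
same_subnet() {
    ip=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    gw=$(ip_to_int "$3")
    [ $(( ip & mask )) -eq $(( gw & mask )) ]
}
```

With example values, `same_subnet 192.168.50.11 255.255.255.0 192.168.50.1 && echo ok` prints `ok`, while a gateway of 192.168.51.1 would fail the check.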
A colleague of mine also wrote this “interactive script”, which prompts for the information needed to change all these settings (I’m a bit LSI: Linux Shell Impaired).
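#!/bin/sh
# Prompt for the new Service Console settings
echo "New IP:"
read new_ip
echo "New Mask:"
read new_mask
echo "New Gw:"
read new_gw
echo "New VLAN:"
read new_vlan
# Point the GATEWAY line in /etc/sysconfig/network at the new gateway
sed -i "s/^GATEWAY=.*/GATEWAY=$new_gw/" /etc/sysconfig/network
# Set the new IP and netmask on the Service Console interface
esxcfg-vswif -i "$new_ip" -n "$new_mask" vswif0
# Tag the "Service Console" port group on vSwitch0 with the new VLAN ID
esxcfg-vswitch -v "$new_vlan" -p "Service Console" vSwitch0
# Disable and re-enable vswif0 so the changes take effect
esxcfg-vswif -s vswif0
esxcfg-vswif -e vswif0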
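The script applies whatever it reads without checking it, so a typo goes straight into the network configuration. A minimal sketch of input checks that could run after each `read` (hypothetical helpers `valid_ipv4` and `valid_vlan`, not part of the colleague’s script), assuming the 0–4095 VLAN ID range that vSphere accepts for port groups:

```sh
#!/bin/sh
# Hypothetical input checks to run before any setting is changed.

# valid_ipv4 <string>: exit status 0 for a dotted quad with octets 0-255.
valid_ipv4() {
    case $1 in
        # Only digits and single dots, no leading or trailing dot.
        *[!0-9.]*|*..*|.*|*.) return 1 ;;
    esac
    old_ifs=$IFS
    IFS=.
    set -- $1          # split on dots into separate octets
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    for octet in "$@"; do
        # Reject empty, over-long, or out-of-range octets.
        [ -n "$octet" ] && [ ${#octet} -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
}

# valid_vlan <string>: exit status 0 for an integer VLAN ID from 0 to 4095.
valid_vlan() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric
    esac
    [ ${#1} -le 4 ] && [ "$1" -le 4095 ]
}
```

For example, right after `read new_vlan` the script could run `valid_vlan "$new_vlan" || { echo "Invalid VLAN: $new_vlan"; exit 1; }`, and likewise `valid_ipv4` for the IP, mask and gateway.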
I hope you enjoyed the read, and remember:
If you need to change the IP of a host in a cluster… remove it from the cluster first. It will save you some time and brain cells. Comments and critique are welcome, as usual.