Automate vSphere Certificate Generation

A couple of weeks ago I was working on an internal audit and discovered that some of our vSphere servers were running with self-generated certificates. These were unmanaged servers (ESXi free license), but they still needed proper certificates. As is often the case with such servers, they are “critical”, just not critical enough to warrant licenses :).

The “problem” with vSphere certificates is that they have to be generated using OpenSSL; you cannot generate them using Windows tools like certreq, which would have made the process much easier. There is also an issue with the request files produced by OpenSSL: they do not carry template information, and the Windows CA cannot issue a certificate if it does not know which kind of certificate you want.

I trawled the internet for ways to automate this and did not find an end-to-end solution for certificate generation. I only found bits and pieces, with people describing how to do each certificate one by one. This didn’t sit well with me, and looking at the workflow I saw no reason not to have a script that does “it” automatically. I will define what “it” is with a short description of the steps required to generate a vSphere certificate:

  1. Generate CSR file and key file using OpenSSL
  2. Submit CSR file to certification authority
  3. Retrieve response from certification authority
  4. Rename certificate file and key file and upload to vSphere host

Some notes regarding the setup in which this would work:

  • I used PowerShell to automate this, so this won’t work on other platforms.
  • I used a Windows Server 2008 R2 PKI CA with a “Web Server” template.
  • The CA also had automatic approval for this type of certificate (which made automating the response retrieval easier).
  • The user running this script needs the right to request/issue the given certificate template and should be a local admin on the box where the script runs. Otherwise you would have to modify the script to run some of the commands with “runas”.

The script

I started from a preexisting script for mass certificate generation, found here.

What differs from the way they did it is how variables are passed when building the “config file”, and the fact that each CSR has its own config file, specified on the command line. This helps you track your work for troubleshooting purposes. Note that their script, and also mine, rely on a special OpenSSL config file: the lines the script modifies are addressed by line number, not found by searching the file, so beware of making changes to the “custom_openssl.cfg” file. It would probably have been more elegant to search for the lines in the file, but I didn’t want to spend time getting it to work.
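For what it’s worth, the search-based approach could look something like the sketch below. It is written in Python rather than the PowerShell the script uses, and it is not what the script does. The key names in the usage note (commonName, emailAddress) are assumptions about what a file like custom_openssl.cfg might contain, not a description of the real file.

```python
import re


def set_config_values(config_text, values):
    """Return config_text with `key = value` lines rewritten for the keys in values.

    Lines are found by key name instead of by line number. Only the first
    occurrence of each key is replaced; a key that is never found raises
    KeyError so that a changed template does not go unnoticed.
    """
    lines = config_text.splitlines()
    remaining = dict(values)
    for i, line in enumerate(lines):
        match = re.match(r"^(\s*)([A-Za-z0-9_.]+)(\s*=\s*)", line)
        if match and match.group(2) in remaining:
            key = match.group(2)
            # Keep the original indentation and spacing around "=".
            lines[i] = f"{match.group(1)}{key}{match.group(3)}{remaining.pop(key)}"
    if remaining:
        raise KeyError(f"keys not found in config: {sorted(remaining)}")
    return "\n".join(lines)
```

Calling set_config_values(template_text, {"commonName": "esx01.example.local"}) would return the per-host config text, which can then be written to that host’s own config file. The trade-off is that the template can be reordered freely, at the price of relying on key names being unique in the file.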

The download link for the script I built is this one: Generate-vSphere-Cert. Below you will find some explanations of how it works.

Learning points

The script takes some parameters as input (get some of them wrong and the script might not work as intended, or might quit):

a) vSphereHostFile – a CSV file that must contain the host name and domain name in 2 separate columns.

b) CAMachineName_CAName – the name of your CA in the format (hostname\display name).

c) TemplateName – the name of the certificate template you want to use for certificate generation, as defined on your CA.

Lines 32 – 44 – change the variables here to match your requirements (different paths, location, country, email, company, etc). There is room for improvement: you could include this info in the CSV file, which is useful when creating certificates for multiple companies with different contact information.

Lines 49 – 73 – build out a folder structure, one folder per host, where all of the host’s files will be stored. This part also builds the CN and the SANs (Subject Alternative Names); you may wish to customize what you add here. I added the short name and the FQDN, and left out the IP address as that can change more easily than the name.
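To make the CN/SAN part a bit more concrete, here is a small Python sketch of the idea (the script itself is PowerShell, so this is an illustration, not the original code). The CSV column names hostname and domain are hypothetical, and using the FQDN as the CN is my assumption.

```python
import csv
import io


def subject_names(host, domain):
    """Return (common_name, subject_alt_name) for one host."""
    fqdn = f"{host}.{domain}"
    # Assumption: the CN is the FQDN; the SAN list holds the short name
    # and the FQDN, and deliberately no IP address.
    san = f"DNS:{host}, DNS:{fqdn}"
    return fqdn, san


def names_from_csv(csv_text):
    """Yield (host, cn, san) for each row of a two-column CSV with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        host = row["hostname"].strip()   # hypothetical column name
        domain = row["domain"].strip()   # hypothetical column name
        cn, san = subject_names(host, domain)
        yield host, cn, san
```

The SAN string uses the DNS:name, DNS:name form that OpenSSL accepts for subjectAltName, so it can be dropped straight into the per-host config file.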

Lines 80-97 – create a temporary file from the original OpenSSL config file, containing the parameters we have set up until now. This piece of code uses numbered lines, so if you make changes to the original file, change the line numbers here too.

Lines 99-104 – build out the file paths and generate a CSR with OpenSSL. The command I used is slightly different from the ones on the internet; I needed a specific RSA key length, so I used:

"$openssldir\openssl.exe req -newkey rsa:2048 -out $csr -keyout $key -config $config"

Lines 109-114 – build paths for the files to send to and receive from the Windows CA. I also used something “unusual” (as in, not in your first page of Google search results), which is specifying the CA name and the template name.

The CA name is needed so you do not get a prompt each time certreq is invoked.

The certificate template is specified using the -attrib parameter, which was the missing piece in my attempt to automate CSR submission. See below:

$ConfigString = """$CAMachineName_CAName"""
$attrib = "CertificateTemplate:$TemplateName"
$issuecerts_cmd = "certreq -submit -attrib $attrib -config $ConfigString $csr $crt $p7b $rsp "
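Putting the two commands side by side for a single host, the flow could be sketched like this in Python. The file names, the folder layout and the function name request_commands are hypothetical; only the openssl and certreq arguments are taken from the commands shown above.

```python
from pathlib import PureWindowsPath


def request_commands(work_dir, host, openssl_exe, ca_config, template):
    """Return the openssl and certreq argument lists for one host."""
    host_dir = PureWindowsPath(work_dir) / host
    # Hypothetical naming: one folder per host, files named after the host.
    paths = {ext: str(host_dir / f"{host}.{ext}")
             for ext in ("cfg", "csr", "key", "crt", "p7b", "rsp")}
    # Step 1: generate the CSR and key from the per-host config file.
    openssl_args = [openssl_exe, "req", "-newkey", "rsa:2048",
                    "-out", paths["csr"], "-keyout", paths["key"],
                    "-config", paths["cfg"]]
    # Steps 2 and 3: submit the CSR and collect the CA's response files.
    # -config names the CA (no prompt), -attrib names the template.
    certreq_args = ["certreq", "-submit",
                    "-attrib", f"CertificateTemplate:{template}",
                    "-config", ca_config,
                    paths["csr"], paths["crt"], paths["p7b"], paths["rsp"]]
    return openssl_args, certreq_args
```

Each list can be handed to subprocess.run on the Windows machine. Passing the CA name as its own list element avoids the extra quoting that the string version needs when the name contains spaces.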

Lines 117-122 – unless you use this script for automating creation of vCenter certificates, you can comment these lines out. They generate a PFX certificate, which is required with vCenter. PFX certificates are not required for vSphere host certificates.

The next step to automation would be to upload these files to your vSphere host. I used this script here and changed some paths to suit my folder structure. You can also use SCP or other methods to upload the file. After the files are uploaded you need to reboot the host for the certificates to take effect.

As always with these scripts, do your best to try them in a test environment before unleashing them into production. You are dealing with Certification Authorities and your vSphere hosts. Failure to upload a correct certificate to a host will result in you not being able to connect with the vSphere Client, and having to go to the console (NOT SSH) and regenerate the self-signed certificate.

I hope this was a useful read, comments and critique are open, as always.

Things to keep in mind about Snapshots

Some time ago I set up a VMware environment, and I was involved in the sizing and design decisions. I did a lot of reading about how to size VMFS datastores: how many VMDKs per datastore, and how to calculate the appropriate size. Everyone on the web mentioned you have to take snapshot size into account, so I did (for a good read on snapshots try this post by VMware). I split the VMFS datastores according to roles (Logs, Database, OS, swap) and included a snapshot allowance for each datastore.

Fast forward 3 months and a couple of snapshotted VMs later: I ran a usage report on the datastores and noticed something I didn’t expect. I used the VMware vCenter reporting features to get the disk usage (they are pretty sweet, by the way). I was amazed that the report said zero space was used for snapshots, although those VMs had snapshots and VMDKs on the datastores. I cycled through the datastores and found where the snapshots were stored: on the datastore holding the OS disk, the same one where the config file was located. Then I looked it up in the documentation and found this:

  • The default location for snapshots of Virtual Machines is their Working Directory.

  • The default Working Directory is the datastore where the Configuration File (.vmx) of the VM is stored.

Wow, that was unexpected, for me at least, since it meant I had undersized my OS datastore a little. So this question haunted me: how do you change this setting in dire situations, when you want to avoid VMs crashing because your datastore is out of space? I did more research and discovered this:

  • The default Working Directory can be changed by adding or changing this line in the VMX file: workingDir = "path/path/"

  • Doing so will ALSO change the location of your .vswp file (the swap file created by vSphere) to the location specified by workingDir.

According to this article you can also specify the location of the swap file within the VMX by adding this line: sched.swap.dir = "/vmfs/volumes/Volume1/VM/". However, either this setting or workingDir in the configuration file will take precedence over the “Store Virtual Machine Swap file in location specified by the Host” option (on the logic that VM settings take precedence over host settings, unless defaults are used for the VM; please correct me if I’m wrong).
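If you do end up touching many VMX files, the edit itself is simple enough to script. Here is a minimal Python sketch of setting one of the two options mentioned above in the text of a VMX file. The function name is hypothetical, and it assumes the key is written in the file exactly as you pass it (the match is case-sensitive).

```python
import re


def set_vmx_option(vmx_text, key, value):
    """Return vmx_text with `key = "value"` set, replacing or appending the line."""
    new_line = f'{key} = "{value}"'
    # Match the key at the start of a line, followed by "=".
    pattern = re.compile(rf"^\s*{re.escape(key)}\s*=")
    lines = vmx_text.splitlines()
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = new_line      # key already present: replace the line
            break
    else:
        lines.append(new_line)       # key absent: add it at the end
    return "\n".join(lines) + "\n"
```

For example, set_vmx_option(text, "workingDir", "/vmfs/volumes/snap/web01/") moves the working directory, and a second call with "sched.swap.dir" pins the swap file somewhere else. The paths here are made up; keep a copy of the original VMX before writing anything back.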

The consequence of this is that you no longer define swap file storage at host level (which was pretty easy because you have far fewer hosts); instead you define it at VM level (and you may have VMs in the hundreds). Taking this further, you’d probably have to use PowerShell to set this easily… and have this thoroughly documented for each VM.

You can see how something relatively benign, like changing the defaults for snapshots, turns into quite an administrative burden. You then have to balance that administrative burden against resizing datastores.

Datastore sizing – revisited

With this information, the way datastores are sized gets a little more complex. Before I knew about this, I read what really smart and knowledgeable people had to say about datastore sizing, and it went a little like this:

(Avg VM * #VMs ) * (100% + (Snapshot Allowance) + 10% Reserve)

Snapshot allowance was 10-20%.
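As a quick worked example of that formula, here it is as a small Python function. The numbers in the usage note are made up for illustration; percentages are passed as whole numbers.

```python
def datastore_size_gb(avg_vm_gb, vm_count, snapshot_pct=20, reserve_pct=10):
    """(Avg VM * #VMs) * (100% + Snapshot Allowance + Reserve), in GB."""
    return avg_vm_gb * vm_count * (100 + snapshot_pct + reserve_pct) / 100
```

With 20 VMs averaging 60 GB each, a 20% snapshot allowance and the 10% reserve, datastore_size_gb(60, 20) gives 1560 GB: 1200 GB of VMs plus 30% on top.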

That is great for datastores that hold entire VMs. If you want to separate I/O, though, you have to create multiple datastores, and each VM can have more than 1 VMDK. The math above then applies to a single type of datastore (e.g. a datastore for DB VMDKs):

(Avg DB VMDK * #VMDKs) * (100% + (Snapshot Allowance) + 10% Reserve)

In light of my recent discovery about snapshots, the math changes yet again. For a datastore that does not receive any snapshots, the sizing would be:

(AvgVMDK * #VMDKs) * (100% + 10% Reserve)

Now, assuming the VM configuration files (and therefore the snapshots) are stored where you store your OS VMDKs, the sizing of this datastore changes as follows:

(AvgVMDK * #VMDKs) * (100% + 10% Reserve) + (Other Datastores [db,app,log,swap]) + Snapshot Allowance

Where Snapshot Allowance is now sized differently:

Snapshot Allowance = (OS Datastore Size + DB/App/Log/Swap Datastore Size) * (10-20%)
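To put some numbers on the revised snapshot allowance, here is a small Python sketch. It only covers the two pieces that are unambiguous: the per-datastore size with reserve, and the snapshot allowance calculated over all datastores. The function names and the example sizes are made up.

```python
def base_size_gb(avg_vmdk_gb, vmdk_count, reserve_pct=10):
    """(AvgVMDK * #VMDKs) * (100% + Reserve), in GB, with no snapshot allowance."""
    return avg_vmdk_gb * vmdk_count * (100 + reserve_pct) / 100


def snapshot_allowance_gb(os_size_gb, other_sizes_gb, snapshot_pct=20):
    """(OS datastore + DB/App/Log/Swap datastores) * snapshot percentage, in GB."""
    return (os_size_gb + sum(other_sizes_gb)) * snapshot_pct / 100
```

Take 10 VMs with a 40 GB OS disk, a 100 GB DB disk and a 20 GB log disk each. The three datastores come out at 440, 1100 and 220 GB with the reserve. At 20%, snapshot_allowance_gb(440, [1100, 220]) is 352 GB, and all of it lands on the OS datastore, which is a lot more than 20% of 440 GB.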

In essence, if no VMware snapshot defaults are changed and snapshots will be used (they appear in a lot of processes within VMware: backup solutions, VDI, development, patch management of guests), the space occupied by these snapshots is important, and so is the datastore they consume this space from. Whatever the design, it must include some form of “snapshot space management”, to use some fancy words for it. Any comments or different angles on this are welcome, as usual.

Change vSphere Service Console IP

Now I get a chance to write an article I’ve been meaning to write, about something I ran into while working with vSphere 4.1. Initially I called it a “bug” (I may have said so on Twitter), but now I’m starting to think “it serves me right”, in a way. It is about what happens when you want to change the vSphere Service Console IP of a host that is already in a cluster. Here’s the history:

  • 3 hosts were configured in a cluster. After some weeks it was decided that we had to change the IPs and VLAN, to make room for some other VLANs that needed room to grow.
  • No problem: get the new IPs, talk to the network guys to trunk the ports on the physical hosts, and reconfigure the switches to make sure that traffic can reach our vCenter Server.
  • Google for how to change the Service Console IP… 5 minutes later, Google for how to also change the VLAN ID of the Service Console. So for changing the IP and VLAN these are 2 good places to start.
    • Place the host in maintenance mode (while still in the cluster; we chose not to remove it or delete the cluster since we had resource pools configured).
    • Make all the changes (IP, gateway, hosts file).
    • Test the settings (ping, nslookup).
    • Once all hosts were reconfigured properly, we updated each host’s hosts file with the new IP/hostname entries for the other nodes in the cluster.
  • Obviously, when I took each host out of maintenance mode our cluster did not work, which was to be expected.
  • Now… let’s reconfigure the vSphere cluster, since it was not a proper cluster anymore. The cluster reconfiguration finished “Successfully” (the task took longer than we expected), and everything seemed great.

Fast forward a few days: I do a routine configuration check of the systems, and our cluster starts to throw “HA agent misconfigured” errors. I discover that although I had updated the hosts file on each vSphere host, the OLD IP addresses were still there; there was a mix of the old settings and the new settings. I start asking my colleagues whether anyone made any changes, but no one had done anything. After some troubleshooting (which included a file-level search on the vSphere host for files where that IP may be listed) I concluded this:

“When you reconfigure the IP address of a host that is in a cluster, and then you Reconfigure the cluster for HA, somewhere (maybe the vCenter DB) information about the IPs of the hosts is stored, as they were when the hosts initially joined the cluster! Therefore any cluster reconfiguration of hosts with new IPs will get a mix of old and new IPs in the /etc/hosts file, and possibly Reconfigure for HA errors.”

To fix this, we disabled HA, disbanded the cluster and recreated it.
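The mix of old and new addresses is easy to miss by eye, so a check along these lines may help after any reconfiguration. It is a Python sketch that compares the text of a hosts file against the addresses you expect; the function name and the sample data in the usage note are made up.

```python
def stale_host_entries(hosts_text, expected):
    """Return (name, found_ip, expected_ip) for names mapped to an unexpected IP.

    expected maps a host name (short or FQDN) to the IP it should have.
    """
    stale = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            if name in expected and expected[name] != ip:
                stale.append((name, ip, expected[name]))
    return stale
```

Given a copy of /etc/hosts from a host and expected = {"esx01": "10.0.20.11", "esx02": "10.0.20.12"}, any entry that still points esx01 or esx02 at an old address is returned, and an empty list means the file matches.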

The right way to change vSphere Service Console IP

In light of these issues these are the steps to properly change the IP address of a host:

  1. If host is in a cluster, remove it from the cluster.
  2. Put host in maintenance mode.
  3. Disconnect from vCenter
  4. Log in to the physical (or remote KVM) console and change the IP settings. Change the gateway by editing /etc/sysconfig/network so that the GATEWAY line points to your new gateway. Change the IP using these commands:
esxcfg-vswif -i <new IP> -n <new Mask> vswif0
esxcfg-vswitch vSwitch0 -p <port group Name> -v <VLAN ID>
esxcfg-vswif -s vswif0
esxcfg-vswif -e vswif0
  5. Ping your reconfigured host to see that all is working properly.
  6. Rejoin the host to the cluster and reconfigure for HA (let HA reconfigure your hosts file instead of changing it manually). Enjoy not having to worry about cluster issues 🙂

A colleague of mine also wrote this “interactive script” that prompts you for the information required to change all these settings (I’m a bit LSI: Linux Shell Impaired).

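# Prompt for the new Service Console network settings.
echo "New IP:"
read new_ip
echo "New Mask:"
read new_mask
echo "New Gw:"
read new_gw
echo "New vlan:"
read new_vlan

# Point the GATEWAY line in /etc/sysconfig/network at the new gateway.
sed -i "s/^GATEWAY=.*/GATEWAY=$new_gw/" /etc/sysconfig/network

# Set the new IP and mask on vswif0, then tag the Service Console port group.
esxcfg-vswif -i "$new_ip" -n "$new_mask" vswif0
esxcfg-vswitch vSwitch0 -p "Service Console" -v "$new_vlan"

# Disable and re-enable vswif0 so the changes take effect.
esxcfg-vswif -s vswif0
esxcfg-vswif -e vswif0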
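Since a typo in any of these values can cut you off from the host, it may be worth sanity-checking them on your workstation before typing them into the console. Below is a minimal Python sketch of such a check. The function name is hypothetical, and the 0-4095 VLAN range is my assumption based on the VLAN ID being a 12-bit field.

```python
import ipaddress


def check_settings(ip, mask, gateway, vlan):
    """Return a list of problems found in the new settings (empty if none).

    Malformed addresses or masks raise ValueError from the ipaddress module.
    """
    problems = []
    iface = ipaddress.ip_interface(f"{ip}/{mask}")
    gw = ipaddress.ip_address(gateway)
    # The gateway has to be reachable on the host's own subnet.
    if gw not in iface.network:
        problems.append(f"gateway {gw} is outside {iface.network}")
    if gw == iface.ip:
        problems.append("gateway and host IP are identical")
    # Assumption: valid VLAN IDs are 0-4095.
    if not 0 <= int(vlan) <= 4095:
        problems.append(f"VLAN ID {vlan} is outside 0-4095")
    return problems
```

For example, check_settings("10.0.20.15", "255.255.255.0", "10.0.30.1", "20") reports that the gateway is outside 10.0.20.0/24, which is exactly the kind of mistake that is painful to find out about from a KVM console.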

I hope you enjoyed the read, and remember:

If you need to change the IP of a host in a cluster… remove it from the cluster first. It saves you some time and brain cells. Comments and critique are welcome, as usual.