I have struggled to find a comprehensive, end-to-end guide on deploying VCF for home lab use. Generally I prefer blogs for this sort of thing, because a blog will tell me exactly how to get the application up and running without necessarily explaining all the bells and whistles; that’s what the formal documentation is for. But even though VCF isn’t exactly a new product, it is still not as well documented for home lab use as other VMware products.
I think one reason for this is that the hardware requirements for VCF are quite high.
The recommended architecture for VCF is 4 physical servers for the management domain and 3 physical servers for each VI workload domain. That’s 7 servers just to get started.
That is beyond my budget, and probably many others’, which is likely why VCF for home lab use is not well documented.
Another option is to deploy the VCF environment in a consolidated fashion using Holodeck. This tool nests 3 ESXi hosts and all the VCF components in an automated fashion. Again, the minimum specs here are very high…
Holodeck Minimum Specs for VCF Consolidated:
- 2 sockets – Total 16 cores
- 384 GB RAM
- 3.5 TB SSD Disk
I tried to deploy Holodeck on one of my ESXi servers, and somewhere near the end of the installation it failed; CPU usage was spiking close to 120% during that time. I think I just didn’t have enough juice to run it.
So where does that leave us?
I have a solution that utilizes my 2 lab servers, the 8-core Supermicro machines: a single physical ESXi server for the management domain, and the second physical ESXi server for the workload domain.
This blog post is mainly about setting up the management domain. The workload domain will come in a later post.
Let’s get started…
Software Versions used in this demo
| Software | Version |
| --- | --- |
| VMware Cloud Foundation | 5.1 |
| Physical vSphere Host | 8.0 Update 1 |
Prepping the environment for VCF
Take a look at the architecture below. I have 2 physical ESXi servers, each with nested ESXi hosts inside. Management and general VM traffic goes over the 1GB links through the switch, which is connected to my home fabric and out to the internet. The 2 10GB links are directly connected between the physical hosts, and they carry trunked traffic since the VLAN is set to 4095.
Everything in this blog post assumes the above servers and a similar networking architecture. Please adjust for your environment as needed.
Physical ESXi networking requirements
Let’s jump into the physical host and see what is configured for the network. As you can see below, I have 6 physical NICs connected to each host.
- 1GB – Management, and VM Network
- 1GB – Not in use
- 1GB – Not in use
- 1GB – Not in use
- 10GB – vSAN & vMotion
- 10GB – Not in use (yet)
There are 2 virtual switches configured. Don’t worry about vSwitch1; it is no longer in use in this lab.
vSwitch0 uses NIC1 for management, and vSanvMotionSwitch uses uplink 5, the first available 10GB NIC.
Set the MTU to 9000 on vSanvMotionSwitch and make sure to set the Promiscuous mode, MAC address changes, and Forged transmits to Accept.
Do the same for vSwitch0.
For the portgroups, create a single portgroup named vSanvMotionPG and set the VLAN to 4095.
If you’re using 2 physical servers for this lab, as I am, then you’ll recreate all these same settings on that second host.
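If you prefer doing this from the ESXi shell rather than the host client UI, the esxcli equivalents look roughly like the sketch below. It assumes the two vSwitches already exist and uses my switch and portgroup names, so adjust them for your environment.
# Run in the ESXi shell on each physical host
# Jumbo frames on both switches
esxcli network vswitch standard set --vswitch-name=vSanvMotionSwitch --mtu=9000
esxcli network vswitch standard set --vswitch-name=vSwitch0 --mtu=9000
# Security policy needed for nested ESXi: promiscuous mode, MAC address changes, forged transmits
esxcli network vswitch standard policy security set --vswitch-name=vSanvMotionSwitch --allow-promiscuous=true --allow-mac-change=true --allow-forged-transmits=true
esxcli network vswitch standard policy security set --vswitch-name=vSwitch0 --allow-promiscuous=true --allow-mac-change=true --allow-forged-transmits=true
# Trunked portgroup (VLAN 4095) for the nested hosts' vSAN/vMotion traffic
esxcli network vswitch standard portgroup add --portgroup-name=vSanvMotionPG --vswitch-name=vSanvMotionSwitch
esxcli network vswitch standard portgroup set --portgroup-name=vSanvMotionPG --vlan-id=4095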
Deploy 4 nested ESXi VMs
There are a couple of ways to handle this next section. You can create 4 VMs manually, install the ESXi ISO onto each of them, and run through some basic network configuration. It’s not too bad; it will take you 30 minutes or so to do all 4. I’ll show you below what settings I have on each of the nested hosts.
Or you can do it the automated way, using Ansible. I created a playbook that deploys a VM and automatically installs and configures the nested ESXi hosts for you. See that blog post for more details.
However you decide to do it, you should end up with 4 VMs, as shown above, with the settings listed below (a scripted alternative is sketched after the list).
- 8 CPUs; don’t forget to enable Expose hardware assisted virtualization to the guest OS
- 75GB of memory (it shows as 73.24GB for some reason)
- 3 Disks (adjust as necessary)
- 40GB for the ESXi OS – Thin Provisioned
- 1TB for the primary vSAN storage – Thin Provisioned
- 40GB for caching – Thin Provisioned
- 4 Networks
- VM Network – VMXNET 3
- VM Network – VMXNET 3
- vSanvMotionPG – VMXNET 3
- vSanvMotionPG – VMXNET 3
- cddrive – Used for installing the ISO, then not needed afterwards
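If you want a middle ground between clicking through the UI and running the Ansible playbook, a tool like govc can script most of the VM creation. The sketch below is only an illustration, assuming govc is installed and the GOVC_URL/GOVC_USERNAME/GOVC_PASSWORD variables point at your physical host; the names, sizes, and guest ID are my assumptions, so adjust as needed.
# Hypothetical govc sketch for one nested host (repeat for the other three)
# vmkernel8Guest is the ESXi 8.x guest ID; adjust for your version
govc vm.create -c 8 -m 76800 -g vmkernel8Guest -net "VM Network" -net.adapter vmxnet3 -disk 40GB -on=false vcf-mgmt-esxi-1
# Equivalent of "Expose hardware assisted virtualization to the guest OS"
govc vm.change -vm vcf-mgmt-esxi-1 -nested-hv-enabled=true
# Extra disks for vSAN capacity and cache
govc vm.disk.create -vm vcf-mgmt-esxi-1 -name vcf-mgmt-esxi-1/vsan-capacity -size 1TB
govc vm.disk.create -vm vcf-mgmt-esxi-1 -name vcf-mgmt-esxi-1/vsan-cache -size 40GB
# Additional NICs: second VM Network NIC plus the two vSanvMotionPG NICs
govc vm.network.add -vm vcf-mgmt-esxi-1 -net "VM Network" -net.adapter vmxnet3
govc vm.network.add -vm vcf-mgmt-esxi-1 -net "vSanvMotionPG" -net.adapter vmxnet3
govc vm.network.add -vm vcf-mgmt-esxi-1 -net "vSanvMotionPG" -net.adapter vmxnet3
Keep in mind this only builds the VM shells with the right virtual hardware; you still need to attach the ESXi installer ISO and run the install, which is what the Ansible playbook automates.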
If you are doing this the manual way, here is a blog post showing the process: Configure a nested ESXi host
Configure DNS
You will need to reserve a few A records for the various components of the VCF lab. Back in the day I used a simple BIND server as my primary DNS server and it worked fine, but I ended up adopting pfSense and now use it strictly for DNS purposes.
Here’s a list of the FQDNs you will need. Both forward and reverse DNS lookups must work (a quick verification loop follows the list).
- vcfcloudbuilder.home.lab 192.168.3.15 # VCF Cloud Builder ova file
- vcf-mgmt-esxi-1.home.lab 192.168.3.16 # nested ESXi #1
- vcf-mgmt-esxi-2.home.lab 192.168.3.17 # nested ESXi #2
- vcf-mgmt-esxi-3.home.lab 192.168.3.18 # nested ESXi #3
- vcf-mgmt-esxi-4.home.lab 192.168.3.19 # nested ESXi #4
- vcf-sddc-mgr-01.home.lab 192.168.3.60 # SDDC Manager
- vcf-nsx01a.home.lab 192.168.3.61 # NSX Manager
- vcf-nsx01.home.lab 192.168.3.62 # NSX VIP
- vcf-vcenter-01.home.lab 192.168.3.63 # vCenter
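Before moving on, it may be worth a quick loop to confirm that both directions resolve. This is just a sketch using dig against my pfSense DNS server at 192.168.3.6; swap in your own resolver and domain.
# Run from any Linux box that can reach your DNS server
for h in vcfcloudbuilder vcf-mgmt-esxi-1 vcf-mgmt-esxi-2 vcf-mgmt-esxi-3 vcf-mgmt-esxi-4 vcf-sddc-mgr-01 vcf-nsx01a vcf-nsx01 vcf-vcenter-01; do
  ip=$(dig +short "$h.home.lab" @192.168.3.6)
  ptr=""
  [ -n "$ip" ] && ptr=$(dig +short -x "$ip" @192.168.3.6)
  echo "$h.home.lab -> ${ip:-MISSING A RECORD} -> ${ptr:-MISSING PTR RECORD}"
done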
VCF Cloud Builder – To deploy the Management Cluster
Downloading the OVA
Sign in to my.vmware.com and navigate to Products and Accounts -> All Products
VMware Cloud Foundation -> View Download Components
In this lab I’ll be using VCF 5.1. Go to Downloads.
Download both the VMware Cloud Builder OVA and the Cloud Builder Deployment Parameter Guide (an .xlsx file).
Deploy the VM
The Cloud Builder tool, which installs via an OVA file, needs to be deployed into some vSphere environment, so you have a few choices here.
- Deploy this onto your PC/Mac using VMware Workstation
- Deploy this via the physical ESXi host that will host the VCF management plane
- Deploy this via the second physical ESXi host, which will later host the workload domain
I chose the third option and deployed the OVA onto that currently unused physical ESXi host. I probably could have gotten by with deploying it directly to the other host, so honestly either is fine.
Install the VCF Cloud Builder OVA…
Deploy a virtual machine from an OVF or OVA file
Name the VM vcfcloudbuilder and select the OVA file you downloaded in an earlier step.
Select the datastore.
Read and accept the EULA
Set the appropriate network; you’ll configure the network settings in the next tab.
Set the network settings; again, make sure you have set up DNS in advance. If your NTP server doesn’t match what is configured on the ESXi hosts, you will get a warning in the VCF prechecks.
Finish the OVA deployment and wait a while. It takes about an hour or so.
Eventually it will be up and running. Yes, I know my VM name has changed, and so will the FQDN in a later step. No worries, it’s the same image and version.
Set the cluster minimum size in the VCF Cloud Builder tool
Once the VCF Cloud Builder tool is up and running, SSH into the OVA and run the following commands.
su -
# Lower the minimum management cluster size check so a smaller nested cluster passes bring-up
echo "bringup.mgmt.cluster.minimum.size=1" >> /etc/vmware/vcf/bringup/application.properties
# Restart the bring-up service so the new property takes effect
systemctl restart vcf-bringup.service
Once you’ve done the above, move on to the next step.
Prepare the JSON Config file
OK, now on to the fun part. And honestly, we’re almost done: once the VCF Cloud Builder tool is up and running and you’ve reviewed the JSON file, that’s basically it. There are a few steps to submit it, but the rest is a waiting game to see whether it deploys successfully or not.
Remember that xlsx sheet we downloaded earlier? Go ahead and open it.
In order for VCF to provision the entire environment (vCenter, NSX, vSAN, etc.), it needs a ton of variables up front: network settings, portgroup mappings, and so on. All of these settings are stored in a JSON file, and one of the first steps in the VCF Cloud Builder tool asks you for that file.
When I was trying to get this working in my home lab, I looked at various blogs, but none of them were a perfect fit for my environment. I ended up having to use the Excel sheet, map it out the way I needed, then convert it to JSON and merge it into my existing JSON file. Hopefully you won’t have to do that, and my JSON file below will be comprehensive enough for your needs.
So I recommend reviewing the xlsx they provide and mocking it up as if it were your own setup. It is my understanding that the Excel sheet in .xlsx format can be uploaded directly to the VCF Cloud Builder tool. I have not tried this, but assuming it works, there is no need to convert anything to JSON.
Below is my JSON file. Feel free to comment if anything is unclear.
{
"subscriptionLicensing": false,
"skipEsxThumbprintValidation": true,
"managementPoolName": "vcf-mgmt-01",
"sddcManagerSpec": {
"secondUserCredentials": {
"username": "vcf",
"password": "PASSWORD_SET_ME"
},
"ipAddress": "192.168.3.60",
"hostname": "vcf-sddc-mgr-01",
"rootUserCredentials": {
"username": "root",
"password": "PASSWORD_SET_ME"
},
"localUserPassword": "PASSWORD_SET_ME"
},
"sddcId": "vcf-01",
"esxLicense": "LICENSE_FOR_ESX",
"taskName": "workflowconfig/workflowspec-ems.json",
"ceipEnabled": false,
"fipsEnabled": false,
"ntpServers": ["pool.ntp.org"],
"dnsSpec": {
"subdomain": "home.lab",
"domain": "home.lab",
"nameserver": "192.168.3.6"
},
"networkSpecs": [
{
"networkType": "MANAGEMENT",
"subnet": "192.168.3.0/24",
"gateway": "192.168.3.1",
"vlanId": "0",
"mtu": "9000",
"portGroupKey": "cl01-vds01-pg-mgmt",
"standbyUplinks":[],
"activeUplinks":[
"uplink1",
"uplink2"
]
},
{
"networkType": "VMOTION",
"subnet": "172.16.12.0/24",
"gateway": "172.16.12.253",
"vlanId": "1612",
"mtu": "9000",
"portGroupKey": "cl01-vds02-pg-vmotion",
"includeIpAddressRanges": [{"endIpAddress": "172.16.12.104", "startIpAddress": "172.16.12.101"}],
"standbyUplinks":[],
"activeUplinks":[
"uplink1",
"uplink2"
]
},
{
"networkType": "VSAN",
"subnet": "172.16.13.0/24",
"gateway": "172.16.13.253",
"vlanId": "1613",
"mtu": "9000",
"portGroupKey": "cl01-vds02-pg-vsan",
"includeIpAddressRanges": [{"endIpAddress": "172.16.13.104", "startIpAddress": "172.16.13.101"}],
"standbyUplinks":[],
"activeUplinks":[
"uplink1",
"uplink2"
]
},
{
"networkType": "VM_MANAGEMENT",
"subnet": "192.168.3.0/24",
"gateway": "192.168.3.1",
"vlanId": "0",
"mtu": "9000",
"portGroupKey": "cl01-vds01-pg-vm-mgmt",
"standbyUplinks":[],
"activeUplinks":[
"uplink1",
"uplink2"
]
}
],
"nsxtSpec":
{
"nsxtManagerSize": "small",
"nsxtManagers": [
{
"hostname": "vcf-nsx01a",
"ip": "192.168.3.61"
}
],
"rootNsxtManagerPassword": "PASSWORD_SET_ME",
"nsxtAdminPassword": "PASSWORD_SET_ME",
"nsxtAuditPassword": "PASSWORD_SET_ME",
"vip": "192.168.3.62",
"vipFqdn": "vcf-nsx01",
"nsxtLicense": "LICENSE_FOR_NSX",
"transportVlanId": 1614
},
"vsanSpec": {
"licenseFile": "LICENSE_FOR_VSAN",
"vsanDedup": "true",
"esaConfig": {
"enabled": false
},
"datastoreName": "cl01-ds-vsan01"
},
"dvsSpecs": [
{
"dvsName": "cl01-vds01",
"vmnics": [
"vmnic0",
"vmnic1"
],
"mtu": 9000,
"networks":[
"MANAGEMENT",
"VM_MANAGEMENT"
],
"niocSpecs":[
{
"trafficType":"VSAN",
"value":"HIGH"
},
{
"trafficType":"VMOTION",
"value":"LOW"
},
{
"trafficType":"VDP",
"value":"LOW"
},
{
"trafficType":"VIRTUALMACHINE",
"value":"HIGH"
},
{
"trafficType":"MANAGEMENT",
"value":"NORMAL"
},
{
"trafficType":"NFS",
"value":"LOW"
},
{
"trafficType":"HBR",
"value":"LOW"
},
{
"trafficType":"FAULTTOLERANCE",
"value":"LOW"
},
{
"trafficType":"ISCSI",
"value":"LOW"
}
],
"nsxtSwitchConfig": {
"transportZones": [ {
"name": "vcf-01-tz-overlay01",
"transportType": "OVERLAY"
},
{
"name": "vcf-01-tz-vlan01",
"transportType": "VLAN"
}
]
}
},
{
"dvsName": "cl01-vds02",
"vmnics": [
"vmnic2",
"vmnic3"
],
"mtu": 9000,
"networks":[
"VSAN",
"VMOTION"
],
"nsxtSwitchConfig": {
"transportZones": [
{
"name": "vcf-01-tz-vlan02",
"transportType": "VLAN"
}
]
}
}
],
"clusterSpec":
{
"clusterName": "cl01",
"clusterEvcMode": "",
"clusterImageEnabled": true,
"vmFolders": {
"MANAGEMENT": "vcf-01-fd-mgmt",
"NETWORKING": "vcf-01-fd-nsx",
"EDGENODES": "vcf-01-fd-edge"
},
"resourcePoolSpecs": [{
"name": "cl01-rp-sddc-mgmt",
"type": "management",
"cpuReservationPercentage": 0,
"cpuLimit": -1,
"cpuReservationExpandable": true,
"cpuSharesLevel": "normal",
"cpuSharesValue": 0,
"memoryReservationMb": 0,
"memoryLimit": -1,
"memoryReservationExpandable": true,
"memorySharesLevel": "normal",
"memorySharesValue": 0
}, {
"name": "cl01-rp-sddc-edge",
"type": "network",
"cpuReservationPercentage": 0,
"cpuLimit": -1,
"cpuReservationExpandable": true,
"cpuSharesLevel": "normal",
"cpuSharesValue": 0,
"memoryReservationPercentage": 0,
"memoryLimit": -1,
"memoryReservationExpandable": true,
"memorySharesLevel": "normal",
"memorySharesValue": 0
}, {
"name": "cl01-rp-user-edge",
"type": "compute",
"cpuReservationPercentage": 0,
"cpuLimit": -1,
"cpuReservationExpandable": true,
"cpuSharesLevel": "normal",
"cpuSharesValue": 0,
"memoryReservationPercentage": 0,
"memoryLimit": -1,
"memoryReservationExpandable": true,
"memorySharesLevel": "normal",
"memorySharesValue": 0
}, {
"name": "cl01-rp-user-vm",
"type": "compute",
"cpuReservationPercentage": 0,
"cpuLimit": -1,
"cpuReservationExpandable": true,
"cpuSharesLevel": "normal",
"cpuSharesValue": 0,
"memoryReservationPercentage": 0,
"memoryLimit": -1,
"memoryReservationExpandable": true,
"memorySharesLevel": "normal",
"memorySharesValue": 0
}]
},
"pscSpecs": [
{
"adminUserSsoPassword": "PASSWORD_SET_ME",
"pscSsoSpec": {
"ssoDomain": "vsphere.local"
}
}
],
"vcenterSpec": {
"vcenterIp": "192.168.3.63",
"vcenterHostname": "vcf-vcenter-01",
"licenseFile": "LICENSE_FOR_VCENTER",
"vmSize": "small",
"storageSize": "",
"rootVcenterPassword": "PASSWORD_SET_ME"
},
"hostSpecs": [
{
"association": "dc01",
"ipAddressPrivate": {
"ipAddress": "192.168.3.16"
},
"hostname": "vcf-mgmt-esxi-1",
"credentials": {
"username": "root",
"password": "PASSWORD_SET_ME"
},
"vSwitch": "vSwitch0"
},
{
"association": "dc01",
"ipAddressPrivate": {
"ipAddress": "192.168.3.17"
},
"hostname": "vcf-mgmt-esxi-2",
"credentials": {
"username": "root",
"password": "PASSWORD_SET_ME"
},
"vSwitch": "vSwitch0"
},
{
"association": "dc01",
"ipAddressPrivate": {
"ipAddress": "192.168.3.18"
},
"hostname": "vcf-mgmt-esxi-3",
"credentials": {
"username": "root",
"password": "PASSWORD_SET_ME"
},
"vSwitch": "vSwitch0"
},
{
"association": "dc01",
"ipAddressPrivate": {
"ipAddress": "192.168.3.19"
},
"hostname": "vcf-mgmt-esxi-4",
"credentials": {
"username": "root",
"password": "PASSWORD_SET_ME"
},
"vSwitch": "vSwitch0"
}
]
}
OK, that’s it for the configuration. Either the above JSON file or the Excel spreadsheet should work for the next step.
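One small sanity check that can save you a failed upload: make sure the JSON is at least syntactically valid before handing it to Cloud Builder. Either of the commands below does the trick on most systems (the filename here just matches the one used later in this post).
# Validate the JSON syntax before uploading
python3 -m json.tool vcf-config.5.1.json > /dev/null && echo "JSON is valid"
# or, if you have jq installed
jq . vcf-config.5.1.json > /dev/null && echo "JSON is valid"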
Utilize the UI to deploy VCF
Navigate to the FQDN of your Cloud Builder tool and log in. You should see the page below:
Select VMware Cloud Foundation and click Next.
Confirm that you’ve read the terms, then click Next.
Select the JSON file from the previous step, or if you used the workbook, upload the Excel sheet.
Time for the pre-check. This takes about 5 minutes or so, depending on your environment. Mine takes a bit longer since the vMotion and vSAN networks aren’t actually reachable outside of the physical servers and there is no pingable default gateway, so I think it pings for a minute or so before giving up and moving on to the next tasks. That only results in a warning, though.
If you see any errors, you won’t be able to proceed to the next steps; warnings you should be able to bypass.
Another error I ran into when I first tried this deployment was about using 1GB NICs instead of 10GB. After moving to the 10GB NICs, the error went away.
If this happens to you, you can force the install and bypass that error using the following commands.
# SSH into the vcf cloud builder ova
touch /home/admin/vcf-config.5.1.json
vi /home/admin/vcf-config.5.1.json #Paste in values from the json file we created earlier
curl -k -u admin:'PASSWORD' -X POST https://localhost/v1/sddcs -H "Content-Type: application/json" -d "@/home/admin/vcf-config.5.1.json"
Performing the prechecks…
All the pre-checks are finished, and I have some warnings.
Notice the yellow tab: “Errors found during configuration file validation. Proceed with caution”
- Gateway 172.16.12.253 for VMOTION network is not responding from vcf-mgmt-esxi-2.home.lab
- Gateway 172.16.13.253 for VSAN network is not responding from vcf-mgmt-esxi-2.home.lab
- DHCP has failed to assign IP Address to vmk ‘vmk30’ on ESXi Host ‘vcf-mgmt-esxi-2.home.lab’ for network ‘NSXT_HOST_OVERLAY’
- No remote NTP Server exists for ESXi Host vcfova.home.lab
You can ignore all of the above; in my experience they don’t matter. Click Acknowledge on the yellow tab and click Next.
Click Deploy SDDC
Starting to deploy the VCF environment! This process takes a long time… 2-3 hours in my environment.
You can see the Cloud Builder Timetable and task list here.
During this time you can sit back and relax, or log into the various nested ESXi hosts and watch the magic happen. Some steps, like bringing up the vCenter or NSX environments, take quite a while (30 minutes or more), so don’t worry if it appears to hang on one of those tasks.
And if you’re really bored, you can tail the VCF log file and watch everything in real time.
# SSH into the vcf cloud builder tool
tail -f /var/log/vmware/vcf/bringup/vcf-bringup.log
Eventually you’ll get to see the screenshot above. VCF is finally deployed and ready to use!
Tips and Tricks for VCF
After doing this install a dozen or so times, most of them failing, I learned a few things you might run into. Most importantly, it’s worth spending a few extra minutes on the Excel sheet and really reviewing the JSON file. Make sure all the IP addresses are available, registered in DNS, and that NTP is consistent everywhere. I’m not sure if it’s required, but I would go ahead and configure DHCP on your primary management network just in case; I have it in mine, so I never ran into an issue.
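As a rough pre-flight sketch along those lines, a loop like the one below (run from the Cloud Builder VM or any machine with SSH access to the nested hosts) can confirm the hosts resolve, respond, and roughly agree on the time before you kick off the bring-up. The host names are my lab values, and this assumes root SSH is enabled on the nested ESXi hosts.
# Quick reachability and clock check against the nested hosts
for h in vcf-mgmt-esxi-1.home.lab vcf-mgmt-esxi-2.home.lab vcf-mgmt-esxi-3.home.lab vcf-mgmt-esxi-4.home.lab; do
  ping -c 1 -W 2 "$h" > /dev/null && echo "$h: reachable" || echo "$h: UNREACHABLE"
  # Large clock drift between hosts tends to surface later as NTP precheck warnings
  ssh root@"$h" 'date -u' 2>/dev/null
done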
Restart a failed installation of VCF
If your VCF deployment failed, you can’t just retry it from the VCF Cloud Builder tool. I know there’s an option for it, but it will probably fail again. It’s best to reprovision the environment, build out new ESXi hosts, modify your JSON file as needed, and redo it.
If you’re using the Ansible-based nested ESXi deployment playbook I created, then reprovisioning the environment should be fast and simple.
There’s only one extra thing you’ll need to do, directly on the VCF Cloud Builder tool itself.
# SSH into the vcf cloud builder tool
su -
sudo /usr/pgsql/13/bin/psql -U postgres -d bringup -h localhost
delete from execution;
delete from "Resource";
\q
The above commands wipe the database of your existing install so you can start again. MUCH faster than redeploying a new OVA for the Cloud Builder tool.
Forcing the VCF deployment via CLI
I showed these commands above, but I’m listing them again under tips and tricks. If you want to bypass the pre-checks, you can force the Cloud Builder tool to accept your JSON file and start the deployment via a CLI call. I wouldn’t recommend this unless you know the error or warning you’re seeing is unimportant, like the warning about 1GB NICs; that can be bypassed, and I’m sure there are others worth bypassing. But some things, like invalid certificates on the nested ESXi hosts, are worth fixing instead.
# SSH into the vcf cloud builder ova
touch /home/admin/vcf-config.5.1.json
vi /home/admin/vcf-config.5.1.json #Paste in values from the json file we created earlier
curl -k -u admin:'PASSWORD' -X POST https://localhost/v1/sddcs -H "Content-Type: application/json" -d "@/home/admin/vcf-config.5.1.json"
Great post Matt. Since you are using “bringup.mgmt.cluster.minimum.size=1,” it’s important to note that in a single node scenario, “hostFailuresToTolerate”: 0 must be included in the clusterSpec section of the JSON file.
Br.
Hey Maciej,
Thanks for the feedback. Can you tell me a little more about that setting? I am not seeing any errors or warnings in the SDDC Manager or vCenter. That setting makes sense; I’m just wondering what it is supposed to suppress or fix.
Thanks!
You are not affected by any FTT-related errors because your system has a four-node cluster. During the setup of VCF, FTT=1 is configured by default. FTT=1, which refers to RAID 1, ensures n+1 availability by creating a duplicate of the data on a different host within the cluster.
The setting “bringup.mgmt.cluster.minimum.size=1” indicates that it is possible to deploy a VCF management domain using just one ESXi host, although this is not recommended for production environments.
Consequently, in a single node scenario, it’s essential to have “hostFailuresToTolerate”: 0 in the JSON file, which means there is only a single data component without any copies.
Br.
I faced an issue where the task “Update SDDC Manager with Licensing Information” failed, with VCF on VxRail 5.1.1.