If you’ve made it this far and you’re still seeing errors in your lab (as I was), there are a few things you should check.
- DNS
- NTP
- MTU
DNS
I would ssh into all the components we’ve created (ESXi hosts, NSX Manager, edge, VyOS, etc.) and run some dig and nslookup queries. If any of them fail, there is a DNS problem. Also test outbound resolution: dig google.com and see if you get a response. There are a ton of guides out there on troubleshooting DNS issues, but definitely get this resolved before you move on.
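If you want to script those checks, here is a minimal sketch. It assumes dig is installed on the machine you run it from. The function name check_dns and the RESOLVER variable are my own, and the hostnames in the example line are hypothetical, so swap in the names from your lab.

```shell
# Command used to look up a name. "dig +short" prints only the answer,
# so empty output means the lookup returned nothing.
RESOLVER="${RESOLVER:-dig +short}"

# check_dns NAME...: try to resolve each name, print OK/FAIL per name,
# and return non-zero if any lookup came back empty.
check_dns() {
    failed=0
    for name in "$@"; do
        # dig writes its diagnostics as lines starting with ';', so drop those.
        answer=$($RESOLVER "$name" 2>/dev/null | grep -v '^;' | head -n 1)
        if [ -n "$answer" ]; then
            echo "OK   $name -> $answer"
        else
            echo "FAIL $name"
            failed=1
        fi
    done
    return "$failed"
}

# Example (hypothetical lab names):
#   check_dns esxi1.lab.local nsx-mgr.lab.local google.com
```

Any FAIL line tells you which name to chase down first.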
NTP
NTP can cause quite a few issues in nested environments. Make sure NTP is set on all the ESXi hosts, physical and nested, as well as NSX Manager, the edge, VyOS, and the DNS server. Everything! And test it!
# To test NTP: ssh into the device and run
ntpq -p
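In the ntpq -p output, the peer the device is currently synced to is marked with a leading *. If you want a quick pass/fail instead of reading the table, something like this sketch works (the function name ntp_synced is my own):

```shell
# ntp_synced: read `ntpq -p` output on stdin and succeed only if some peer
# line starts with '*', the marker ntpq puts on the peer selected for sync.
ntp_synced() {
    grep -q '^\*'
}

# Usage on a device that has ntpq:
#   ntpq -p | ntp_synced && echo "NTP synced" || echo "NTP NOT synced"
```

If no line has a *, the device has peers configured but has not synced to any of them yet.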
MTU
And finally, MTU. NSX is surprisingly picky about MTU.
From VMware: NSX-T leverages the Generic Network Virtualization Encapsulation (Geneve) protocol, a network virtualization tunneling protocol used to establish tunnels across transport nodes to carry overlay traffic. Transport nodes include VM and physical-based Edges, ESX hosts, and KVM Hypervisors, all of which require at least one Geneve Termination End Point (TEP). With encapsulation technologies, like Geneve, it is essential to increase the maximum transmission unit (MTU) supported both on transport nodes and the physical network underlay. This article looks at steps to validate MTU in an NSX-T Environment.
From Me: Set everything to MTU 9000 and you don’t have to worry about anything.
If you’re using an unmanaged switch, like I am, then google the model to make sure it supports jumbo frames. Since we have two Supermicro ESXi hosts, every network device that traffic between them passes through must also support jumbo frames.
Physical ESXi and Nested ESXi
MTU can be set in two places: on the virtual switch and on the VMkernel NICs.
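From the ESXi shell, both places can be inspected and changed with esxcli. This is a rough sketch that assumes standard vSwitches (a distributed switch has its MTU set in vCenter instead). The function names are my own, and vSwitch0 and vmk0 in the example are placeholder names.

```shell
# show_mtu: dump every standard vSwitch and VMkernel NIC; look for the
# "MTU:" field in each entry.
show_mtu() {
    esxcli network vswitch standard list
    esxcli network ip interface list
}

# set_mtu VSWITCH VMK MTU: apply the same MTU to a standard vSwitch and to
# a VMkernel NIC. The vSwitch is set first because the VMkernel NIC's MTU
# cannot be larger than the MTU of the switch it sits on.
set_mtu() {
    vswitch=$1
    vmk=$2
    mtu=$3
    esxcli network vswitch standard set --vswitch-name="$vswitch" --mtu="$mtu"
    esxcli network ip interface set --interface-name="$vmk" --mtu="$mtu"
}

# Example (placeholder names):
#   set_mtu vSwitch0 vmk0 9000
```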
One way to test this is to run vmkping on the hosts (physical and nested).
# Example: from physical host1, vmkping your other physical host2
# -d  set the don't-fragment bit
# -s  ICMP payload size in bytes: 8972 = 9000 MTU - 20 (IPv4 header) - 8 (ICMP header)
vmkping -d -s 8972 192.168.3.5
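The 8972 is not arbitrary: the -s value is the ICMP payload, and the IPv4 header (20 bytes, assuming no IP options) plus the ICMP header (8 bytes) have to fit inside the MTU as well. If part of your path uses a different MTU, this small helper (the name max_ping_payload is my own) works out the right -s value:

```shell
# max_ping_payload MTU: largest ICMP payload that fits in a single
# unfragmented IPv4 packet of the given MTU.
max_ping_payload() {
    mtu=$1
    # 20-byte IPv4 header (no options) + 8-byte ICMP header
    echo $(( mtu - 20 - 8 ))
}

max_ping_payload 9000   # prints 8972
max_ping_payload 1500   # prints 1472
```

If the vmkping with the computed size fails but a smaller one succeeds, some device on the path has a lower MTU than you think.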
When an MTU change is applied to an interface, when does it take effect? Is the change live, or do we have to restart the edge and the ESXi hosts?
Thanks
I don’t remember restarting any edges/esx hosts. As soon as you make the change, it goes into effect. Same as the vyos router.