I am building a vSAN cluster consisting of 2 racks, each with 3 nodes (this will eventually be a stretched cluster). Each rack uses different subnets, as listed below:
Rack 1:
- Management: 10.73.8.0/25 (Gateway: 10.73.8.126)
- vMotion: 10.73.10.0/25 (Gateway: 10.73.10.126)
- vSAN: 10.73.11.0/25 (Gateway: 10.73.10.126)
Rack 2:
- Management: 10.73.8.128/25 (Gateway: 10.73.8.254)
- vMotion: 10.73.10.128/25 (Gateway: 10.73.10.254)
- vSAN: 10.73.11.128/25 (Gateway: 10.73.10.254)
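As a sanity check on the addressing listed above, here is a minimal Python sketch (standard library `ipaddress` only; the function names are illustrative, not from any VMware tool) that confirms whether each gateway lies inside its own subnet. With the values exactly as listed, it flags both vSAN gateways (10.73.10.x) as outside the vSAN subnets (10.73.11.x). That may simply be a typo in the listing, but it is worth comparing against the override gateways actually configured on the vSAN vmks.

```python
import ipaddress

# Addressing copied from the rack listing above: (rack, network, subnet, gateway).
NETWORKS = [
    ("Rack 1", "Management", "10.73.8.0/25", "10.73.8.126"),
    ("Rack 1", "vMotion", "10.73.10.0/25", "10.73.10.126"),
    ("Rack 1", "vSAN", "10.73.11.0/25", "10.73.10.126"),
    ("Rack 2", "Management", "10.73.8.128/25", "10.73.8.254"),
    ("Rack 2", "vMotion", "10.73.10.128/25", "10.73.10.254"),
    ("Rack 2", "vSAN", "10.73.11.128/25", "10.73.10.254"),
]


def gateway_in_subnet(subnet: str, gateway: str) -> bool:
    """Return True if the gateway address falls inside the given subnet."""
    return ipaddress.ip_address(gateway) in ipaddress.ip_network(subnet)


def mismatched_gateways(networks):
    """Return (rack, network) pairs whose gateway is outside its subnet."""
    return [
        (rack, name)
        for rack, name, subnet, gateway in networks
        if not gateway_in_subnet(subnet, gateway)
    ]


if __name__ == "__main__":
    for rack, name in mismatched_gateways(NETWORKS):
        print(f"{rack} {name}: gateway is outside its subnet")
```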
I built the cluster with all nodes in Rack 1 without any problems: everything works and I have a few test VMs running. When I try to add nodes from Rack 2 to the same cluster, I get a "vSAN cluster partition" error. Here is what I've checked/tested:
- I have full end-to-end connectivity between ALL nodes (vmkping between nodes in both racks works on all subnets, using MTU-sized packets with fragmentation disabled).
- The unicast agent list on all nodes is correctly showing all other nodes with the right UUIDs, IP addresses, and cert thumbprints.
- I've tried various permutations of leaving/joining the cluster with the partitioned nodes.
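For reference, the MTU-sized vmkping test in the first item can be sketched as below. `vmkping -s` sets the ICMP payload size, so a full-MTU IPv4 packet needs a payload of MTU minus 28 bytes (20-byte IPv4 header plus 8-byte ICMP header), and `-d` sets the don't-fragment bit. The vmk name, target address, and MTU in the example are assumptions for illustration only, not values from my environment.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def vmkping_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def vmkping_command(vmk: str, target: str, mtu: int) -> str:
    """Build a vmkping command line: -I interface, -d don't fragment, -s size."""
    return f"vmkping -I {vmk} -d -s {vmkping_payload(mtu)} {target}"


if __name__ == "__main__":
    # Hypothetical values: vmk2 as the vSAN vmk, a Rack 2 vSAN address, MTU 9000.
    print(vmkping_command("vmk2", "10.73.11.129", 9000))
```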
All my google-fu indicates that my issue should be one of the above, but that doesn't appear to be the case. I even added static routes for the vSAN networks, even though I already have override default gateways set on the vSAN vmks. No dice.
I know this is a strange one, but if anyone can point me toward other possible causes for this error, it would be greatly appreciated.