• Background:
• Current Emphasis: Datacenter technologies, network virtualization, server and desktop virtualization, SAN switching
• Overview: Started in LAN networking and network management systems, and gradually moved into data center architecture and management
• Publications: Author of MCITP 70-647: Windows Server 2008 R2 Enterprise Administration
• Publication: MCITP 70-237: Designing Messaging Solutions with Microsoft® Exchange Server 2007
• Certifications: Dozens of networking/datacenter/SAN switching/server administration technical certifications from Cisco, VMware, Microsoft, and Novell, plus industry-recognized security certifications from ISC2 (CISSP)
• Education: Graduated with honors from Ohio State University with a Bachelor of Science degree in Zoology (Pre-Med) and minors in Finance and Economics
• History:
– VMware NSBU: Technical Enablement Architect, 02/2015 – Present
– Firefly: Director of Cisco Integration for VMware and Microsoft Integration, 01/2013 – 02/2015
– Firefly: Senior Instructor and PLD for Cisco Data Center Virtualization, 11/2009 – 12/2012
– CEO, NITTCI: Professional services, training, content development (courseware and books), 10/2003 – Present
– CEO, Dynacomp Network Systems: Consulting, training, courseware development, 02/1989 – 10/2003, including LAN networking specializing in Novell NetWare, Directory Services, and LAN design
Agenda – Part 1
• 1200+ NSX customers
• 250+ production deployments (adding 25–50 per quarter)
• 100+ organizations have spent over US$1M on NSX
• Be more efficient: run things cheaper on CapEx (increase compute efficiency, ensure full life of network hardware, etc.)
Primary Use Cases with NSX
• Security – Theme: Inherently Secure Infrastructure. Value: secure infrastructure at 1/3 the cost. Other projects: DMZ Anywhere, Secure End User Infrastructure.
• Automation – Theme: IT at the Speed of Business. Value: reduce infrastructure provisioning time from weeks to minutes. Other projects: Developer Cloud, Multi-tenant Infrastructure.
• Application Continuity – Theme: Datacenter Anywhere. Value: reduce RTO by 80%. Other projects: Metro/Geo Pooling, NSX in Public Cloud.
What is NSX?
NSX provides a faithful reproduction of network & security services in software.
NSX Components
• Cloud Consumption – self-service portal: vCloud Automation Center, OpenStack, or a custom CMP
• Management Plane – NSX Manager: single configuration portal and REST API entry point
• Control Plane – NSX Controller: manages logical networks, runs the control-plane protocol, and keeps the control and data planes separate
• Data Plane – distributed services (Logical Switch, Distributed Logical Router, Distributed Firewall, …) and NSX Edge: high-performance data plane with a scale-out distributed forwarding model
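Since the NSX Manager is the single REST API entry point, objects such as logical switches can be queried or created programmatically. The sketch below uses Python's requests library against the NSX-v API; the manager address and credentials are placeholders, and the endpoint path is an assumption that should be verified against the API guide for your release.

```python
# Minimal sketch: list logical switches via the NSX Manager REST API.
# Address, credentials, and endpoint path are illustrative assumptions.
import requests

NSX_MANAGER = "https://nsx-manager.example.local"  # hypothetical address
AUTH = ("admin", "password")                       # replace with real creds

def list_logical_switches() -> str:
    # NSX-v answers in XML; verify=False only because lab managers often
    # use self-signed certificates -- validate properly in production.
    resp = requests.get(
        f"{NSX_MANAGER}/api/2.0/vdn/virtualwires",  # assumed endpoint
        auth=AUTH,
        headers={"Accept": "application/xml"},
        verify=False,
    )
    resp.raise_for_status()
    return resp.text  # raw XML list of logical switches ("virtualwires")

if __name__ == "__main__":
    print(list_logical_switches())
```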
NSX vSwitch and NSX Edge
• NSX vSwitch (in ESXi): VDS plus VXLAN, distributed routing, distributed firewall, switch security, and the message bus
• NSX Logical Router Control VM: updates the controller; determines the active ESXi host for VXLAN-to-VLAN layer 2 bridging
• NSX Edge Services Gateway: L3–L7 services (NAT, DHCP, load balancer, VPN, firewall) in a VM form factor with high availability
Virtual Networks (VMware NSX)
Virtual Extensible LANs (VXLAN)
Design decision: Should VXLAN be included in the design?
VXLAN Frame Format
The original L2 frame (header and payload) is encapsulated in a UDP/IP packet, adding roughly 50 bytes of VXLAN overhead: the original L2 frame becomes the payload, and VXLAN, UDP, IP, and MAC headers are prepended.
• Outer MAC header: 14+ bytes (destination address 6, source address 6, EtherType; +4 with an 802.1Q tag, type 0x8100)
• Outer IP header: 20 bytes (including the header checksum)
• Outer UDP header: 8 bytes (including source port and UDP length)
• VXLAN header: 8 bytes (VXLAN flags 1, 24-bit VNI 3, plus reserved fields)
• Inner L2 frame: the original frame header and up to 1500 bytes of payload, followed by the FCS
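Because every encapsulated frame carries this overhead, the transport network MTU must be at least 50 bytes larger than the guest-facing MTU; 1600 is the commonly used minimum for 1500-byte guest frames. A quick arithmetic check in Python, using the header sizes listed above:

```python
# VXLAN overhead arithmetic, using the header sizes listed above.
INNER_ETH = 14   # inner Ethernet header carried inside the tunnel
VXLAN_HDR = 8
OUTER_UDP = 8
OUTER_IP = 20

def required_transport_mtu(guest_mtu: int = 1500) -> int:
    """Minimum IP MTU the physical transport network must carry."""
    return guest_mtu + INNER_ETH + VXLAN_HDR + OUTER_UDP + OUTER_IP

print(required_transport_mtu())      # 1550 -- hence the usual 1600 MTU
print(required_transport_mtu(8900))  # jumbo guest frames need ~9000
```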
NSX for vSphere VXLAN Replication Modes
NSX for vSphere provides three modes of traffic replication (one data-plane based and two controller based):
• Multicast Mode – requires IGMP for a Layer 2 topology and multicast routing for an L3 topology
• Unicast Mode – all replication occurs using unicast; a proxy VTEP (a Unicast Tunnel End Point, UTEP) replicates to the other VTEPs in its segment
• Hybrid Mode – local replication is offloaded to the physical network, while remote replication occurs via unicast; in hybrid mode this proxy is called a Multicast Tunnel End Point (MTEP)
• The list of UTEPs or MTEPs is NOT synced to each VTEP.
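A minimal sketch of the fan-out decision a source VTEP makes for BUM (broadcast, unknown unicast, multicast) traffic under each mode. The data structures and proxy selection are illustrative assumptions, not NSX's actual implementation:

```python
# Illustrative fan-out for BUM traffic under the three replication modes.
def replication_targets(mode, local_vteps, remote_segments, mcast_group):
    """Where a source VTEP sends copies of a BUM frame.

    local_vteps     -- other VTEPs in the source VTEP's own IP segment
    remote_segments -- {segment_id: [vtep_ips]} for all other IP segments
    mcast_group     -- multicast group assigned to the logical switch
    """
    if mode == "multicast":
        # One copy to the multicast group; the physical network replicates.
        return [mcast_group]
    if mode == "unicast":
        # Unicast to every local VTEP, plus one copy to a proxy (UTEP)
        # per remote segment, which then re-replicates locally.
        return local_vteps + [v[0] for v in remote_segments.values()]
    if mode == "hybrid":
        # L2 multicast locally (offloaded via IGMP snooping), plus one
        # unicast copy to an MTEP per remote segment.
        return [mcast_group] + [v[0] for v in remote_segments.values()]
    raise ValueError(f"unknown replication mode: {mode}")
```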
Logical Switching
Enterprise Topology
• A common enterprise-level topology.
(Figure: a physical router on the external network, a VLAN 20 uplink, a VXLAN 5020 uplink, and a single logical router instance (LR Instance 1) serving the VM logical switches.)
Service Provider: Multiple Tenant Topology
• Multiple tenants connect to the same NSX Edge gateway.
(Figure: external network; Tenant 1 on LR Instance 1 and Tenant 2 on LR Instance 2, each with its own Web, App, and DB logical switches.)
NSX Multiple Tenant Topology (IP Domain Separation)
(Figure: external network; LR Instance 1 through LR Instance 10, each fronting its own Web, App, and DB logical switches.)
NSX – Physical View
(Figure: NSX Edge, Controller, and Manager alongside VM1–VM5 on the Web and App logical switches, within a transport zone spanning the physical network.)
Management Plane Components
• Cloud consumption via vRA, OpenStack, or a custom CMP
• 1:1 mapping between vCenter and NSX Manager
• Runs in a vSphere cluster with vSphere HA and DRS anti-affinity rules
NSX Controllers – properties:
• Virtual form factor (4 vCPU, 4 GB RAM), host agent
• High availability
• VXLAN with no multicast
• ARP suppression
Deploying and Configuring VMware NSX
Component deployment (one time):
• Deploy NSX Manager
• Deploy NSX Controller Cluster
• Prepare the virtual infrastructure (NSX Mgmt and NSX Edge components)
Consumption (recurring):
• Programmatic virtual network deployment
• Deploy logical switches per tier (logical networks)
(Figure: Cross-VC deployment – a Universal Controller Cluster and the Universal Synchronization Service (USS) span vCenter & NSX Manager A through vCenter & NSX Manager H, with a Universal DFW across all of them.)
Cross-VC NSX Components & Terminology
• Cross-VC NSX objects use the term Universal and include:
– Universal Synchronization Service (USS)
– Universal Controller Cluster (UCC)
– Universal Transport Zone (UTZ)
– Universal Logical Switch (ULS)
– Universal Distributed Logical Router (UDLR)
– Universal IP Set/MAC Set
– Universal Security Group/Service/Service Group
L2 Fabric Topologies & Design Considerations
• VLANs are carried throughout the fabric, between the WAN/Internet layer and the PODs
• L2 application scope is limited to a single POD
• Default gateway – HSRP/VRRP at the aggregation layer
• Ideally multiple aggregation PODs (POD A, POD B) to limit the Layer 2 domain size, although that is not always the case
• Uniform configurations
L3 Fabric Topologies & Design Considerations
• The ToR provides the default gateway service for each VLAN subnet
• The VLAN boundary sits at the ToR; VMkernel traffic reaches Hypervisor 1 through Hypervisor n over 802.1Q trunks
Physical Fabric Options with NSX
• Routed leaf/spine DC fabric: L3 between leaf and spine, with L2 VLANs at the leaf for bridging
• Edge leaf: L3 to the DC fabric, L2 to external networks (WAN/Internet)
• Compute clusters scale per vCenter, up to the max supported number of VMs
• Infrastructure clusters (Edge, Storage, vCenter and Cloud Management System) sit alongside the management cluster
• Management Cluster: L2 required for management workloads such as vCenter Server, NSX Controllers, and NSX Manager
• Edge Cluster: L2 required for external 802.1Q VLANs, the Edge default gateway, and VMkernel VLANs; needed because Edge HA uses GARP to announce failover
(Both clusters attach as leaves to the routed DC fabric that carries WAN/Internet connectivity.)
L2 Fabric – Network Addressing and VLAN Definition Considerations
Compute racks – IP address allocations and VLANs (Y = rack/POD number):

Function     VLAN ID   IP Subnet
Management   66        10.66.Y.0/24
vMotion      77        10.77.Y.0/24
VXLAN        88        10.88.Y.0/24
Storage      99        10.99.Y.0/24

• Compute Cluster A and Compute Cluster B (32 hosts each) share L2 Fabric A; the VLANs span the fabric and reach each host over an 802.1Q VLAN trunk
• Example host A1: Mgmt 10.66.1.25/26 (DGW 10.66.1.1), vMotion 10.77.1.25/26 (GW 10.77.1.1), VXLAN 10.88.1.25/26 (DGW 10.88.1.1), Storage 10.99.1.25/26 (GW 10.99.1.1)
• Dynamic routing protocols (OSPF, BGP) are used to advertise to the rest of the fabric
• Scalability and predictable network addressing, based on the number of ESXi hosts per rack or cluster
• Reduces VLAN usage by reusing VLANs within a rack (L3) or POD (L2)
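The function-specific VLAN plus per-rack octet pattern above is easy to automate. A small sketch using Python's standard ipaddress module; the function-to-VLAN map comes from the table, while the helper itself and the uniform /24 assumption are illustrative:

```python
# Generate per-rack infrastructure subnets following the
# 10.<VLAN>.<rack>.0/24 pattern from the table above.
import ipaddress

FUNCTIONS = {"Management": 66, "vMotion": 77, "VXLAN": 88, "Storage": 99}

def rack_subnets(rack):
    """Return the four infrastructure subnets for a rack/POD number."""
    return {
        name: ipaddress.ip_network(f"10.{vlan}.{rack}.0/24")
        for name, vlan in FUNCTIONS.items()
    }

# Rack 1 yields 10.66.1.0/24, 10.77.1.0/24, 10.88.1.0/24, 10.99.1.0/24;
# by the convention above, the gateway is the first host address (.1).
for name, net in rack_subnets(1).items():
    print(f"{name:<12} {net}  gateway {net.network_address + 1}")
```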
VMkernel Networking
Multi-instance TCP/IP stack
• Introduced with vSphere 5.5 and leveraged by VXLAN (the NSX vSwitch transport network)
• Separate routing table, ARP table, and default gateway per stack instance
• Provides increased isolation and reservation of networking resources
• Enables VXLAN VTEPs to use a gateway independent of the default TCP/IP stack
• Management, vMotion, FT, NFS, and iSCSI leverage the default TCP/IP stack in 5.5
VMkernel Networking
Static routing
• VMkernel VLANs do not extend beyond the rack in an L3 fabric design, or beyond the cluster with an L2 fabric; therefore static routes are required for Management, Storage, and vMotion traffic
• Host Profiles reduce the overhead of managing static routes and ensure persistence
• Follow the RPQ (Request for Product Qualification) process for official support of routed vMotion; routing of IP storage traffic also has some caveats
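On ESXi, these static routes are typically added with esxcli and then captured in a Host Profile. The generator below is an illustrative sketch that reuses the 10.<VLAN>.<rack>.1 gateway convention from the earlier addressing slide; verify the exact esxcli syntax against your vSphere release:

```python
# Illustrative: emit esxcli commands for per-host VMkernel static routes,
# reusing the 10.<VLAN>.<rack>.1 gateway convention shown earlier.
FUNCTIONS = {"Management": 66, "vMotion": 77, "Storage": 99}

def static_route_cmds(rack, dest_prefixes):
    """dest_prefixes maps a function name to the remote network to reach."""
    cmds = []
    for name, prefix in dest_prefixes.items():
        gateway = f"10.{FUNCTIONS[name]}.{rack}.1"  # local ToR gateway
        cmds.append(
            f"esxcli network ip route ipv4 add "
            f"--network {prefix} --gateway {gateway}"
        )
    return cmds

# Example: from rack 1, reach the full function ranges via local gateways.
for cmd in static_route_cmds(1, {"Management": "10.66.0.0/16",
                                 "vMotion": "10.77.0.0/16",
                                 "Storage": "10.99.0.0/16"}):
    print(cmd)
```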
VMkernel Networking
VMkernel teaming recommendations
• LACP (802.3ad) provides optimal use of available bandwidth and quick convergence, but does require physical network configuration
• Load Based Teaming is also a good option for VMkernel traffic where there is a desire to simplify configuration and reduce dependencies on the physical network, while still effectively using multiple uplinks
• Explicit Failover allows for predictable traffic flows and manual balancing of VMkernel traffic
• Refer to the VDS best practices white paper for more details on common configurations: http://www.vmware.com/files/pdf/techpaper/vsphere-distributed-switch-best-practices.pdf
• NSX introduces support for multiple VTEPs per host with VXLAN
Recap: vCenter – Scale Boundaries
• vCenter Server: 10,000 powered-on VMs, 1,000 ESXi hosts, 128 VDS per DC object
• VDS: max 500 hosts
• Cluster: max 32 hosts
• DRS-based vMotion within a cluster; manual vMotion across clusters
NSX for vSphere – Scale Boundaries
• 1:1 mapping of vCenter to NSX cluster (Manager)
• A Cloud Management System such as vCAC drives each vCenter/NSX pair through the NSX API
• DRS-based vMotion within a cluster; manual vMotion across clusters
(Figure: vCenter Servers A and B, each with its own NSX Manager, fronting Web VMs across Compute Clusters A through N plus an Edge Cluster.)
VDS Uplink Connectivity Options in NSX
NSX supports multiple teaming policies for VXLAN traffic. NSX for vSphere also supports multiple VTEPs per ESXi host, to load balance VXLAN traffic across available uplinks.

Teaming mode                                   VXLAN support   Multi-VTEP
Route based on Originating Port (SRC-ID)       ✓               ✓
LACP                                           ✓               ×
Route based on IP Hash (Static EtherChannel)   ✓               ×
Explicit Failover Order                        ✓               ×
Route based on Physical NIC Load (LBT)         ×               ×
Uplink Connectivity Recommendation for VXLAN Traffic
Teaming and failover mode recommendation for VXLAN traffic depends on:
• VXLAN bandwidth requirements per ESXi host
• The NSX administrator's familiarity with networking configuration
Load Balance – SRC ID
• Use where LACP isn't available or bandwidth requirements for VXLAN traffic exceed a single uplink
• Recommended for the Edge cluster, to avoid the complexity and support burden of routing over LACP
• Route based on SRC-ID with multi-VTEP works well, but it is a more advanced configuration
Network Adapter Offloads
• VXLAN TCP Segmentation Offload (VXLAN TSO): the operating system sends large TCP packets (VXLAN encapsulated) to the NIC, and the NIC segments them per the physical MTU
VMware internal slide – do not share

Vendor (driver)      NICs                                      Driver version        VXLAN TSO   RSS
Intel (ixgbe)        82599, X540, I350                         3.7.13.7.14iov-NAPI   Yes         Yes
                                                               3.21.4                Yes         Yes
Broadcom (bnx2x)     57810, 57711                              1.72.56.v55.2         No          No
                                                               1.78.58.v55.3         Yes         Yes
Mellanox (mlx4_en)   Connect X-2, Connect X3, Connect X3 Pro   1.9.7.0               No          Yes
                     (Mellanox is planning an async release to support VXLAN TSO for Connect X3 Pro)
Emulex (elxnet)      BE2, BE3                                  –                     No          No
                     Skyhawk                                   –                     Yes         No
                     (Emulex is planning an async release to support RSS)
VXLAN Design Recommendations
• Unicast Mode is appropriate for small deployments, or for L3 fabric networks where the number of hosts in a segment is limited
• Hybrid Mode is generally recommended for production deployments, and particularly for L2 physical network topologies
• Hybrid Mode also helps when there is multicast traffic sourced from VMs
• Validate connectivity and MTU on the transport network before moving on to L3 and above
• Not all network adapters are created equal for VXLAN
Use Cases: Migration
• L2 as well as L3
• Virtual to virtual, physical to virtual
• Temporary; bandwidth not critical
(Figure: traffic bridged between VXLAN and VLAN during migration.)
Use Cases: Integration of Non-Virtualized Workloads
• Typically necessary for integrating a non-virtualized appliance
• A gateway takes care of the on-ramp/off-ramp between VXLAN and VLAN
Software Layer 2 Gateway Form Factor
• Native capability of NSX: a high-performance VXLAN-to-VLAN gateway in the hypervisor kernel
• Scale-up: rides the x86 performance curve
• Flexibility & operations: rich set of stateful services, encapsulation & encryption offloads, multi-tier logical routing, advanced monitoring
• Scale-out as you grow: a single gateway can handle all P/V traffic, then additional gateways can be introduced (e.g., for VLANs 10, 20, and 30)
Hardware Layer 2 Gateway Form Factor
• Some partner switches integrate with NSX and provide the VXLAN-to-VLAN gateway in hardware
• Main benefits of this form factor: bandwidth, scale, and low latency
L2 Connectivity of Physical Workloads
• Physical workloads in the same subnet (L2), reached over a VLAN
• 10+ Gbps performance
• 1:1 mapping between VXLAN and VLAN
Logical to Physical – NSX L2 Bridging
• Migrate workloads (P2V or V2V)
• Extend logical networks to physical
• Leverage network/security services on VLAN-backed networks
(Figure: active and standby DLR Control VMs bridging VXLAN 5001 to VLAN 100, connecting a physical workload behind a physical gateway.)
NSX L2 Bridging Design Considerations
Usage of ESXi dvUplinks
• Bridged traffic uses the same uplink as VXLAN traffic, while other traffic types use their own uplinks
• Need to ensure the bridged VLAN (VLAN 10 in the example) is carried on the uplink used for VXLAN traffic; the switch port must also allow traffic from/to that VLAN
• Can achieve more than 10G for bridged traffic by bundling two 10G physical interfaces together
VXLAN to VLAN SW L2 Bridging – Considerations
Multiple bridge instances vs. separate logical routers
• Bridge instances are limited to the throughput of a single ESXi host
• Bridged traffic enters and leaves the host via the dvUplink used for VXLAN traffic – the VDS teaming/failover policy is not used
Interoperability
• The VLAN dvPortgroup and the VXLAN logical switches must be available on the same VDS
• Distributed logical routing cannot be used on a logical switch that is bridged
• Bridging a VLAN ID of 0 is not supported
Scalability
• L2 bridging provides line-rate throughput
• Latency and CPU usage are comparable with standard VXLAN
Loop prevention
• Only one bridge is active for a given VXLAN–VLAN pair
• Packets received via a different uplink are detected and filtered by matching MAC addresses
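A toy model of the loop-prevention rule just described: frames arriving from the VLAN side are dropped if they come in on an unexpected uplink or if their source MAC is already known on the VXLAN side. This structure is purely illustrative; NSX enforces the rule in the kernel data path:

```python
# Toy model of bridge loop prevention: one active bridge per
# VXLAN-VLAN pair, with reflected frames filtered by source MAC.
class BridgeInstance:
    def __init__(self, vxlan_id, vlan_id, uplink):
        self.vxlan_id, self.vlan_id = vxlan_id, vlan_id
        self.uplink = uplink          # dvUplink that carries VXLAN traffic
        self.vxlan_macs = set()       # MACs learned on the VXLAN side

    def accept_from_vlan(self, src_mac, arrival_uplink):
        if arrival_uplink != self.uplink:
            return False  # arrived via a different uplink: potential loop
        if src_mac in self.vxlan_macs:
            return False  # source MAC lives on the VXLAN side: reflection
        return True

bridge = BridgeInstance(vxlan_id=5001, vlan_id=100, uplink="dvUplink1")
bridge.vxlan_macs.add("00:50:56:aa:bb:cc")
print(bridge.accept_from_vlan("00:50:56:aa:bb:cc", "dvUplink1"))  # False
print(bridge.accept_from_vlan("00:50:56:11:22:33", "dvUplink1"))  # True
```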
NSX L2 Bridging Design Considerations
Routing + bridging use case – not supported
(Figure: two unsupported topologies – a DLR instance with Web/App VMs plus Bridge Instance 1 attaching physical servers on VLAN 10, and an NSX Edge with Bridge Instance 1 (VXLAN 5000 to VLAN 10) and Bridge Instance 2 (VXLAN 5001 to VLAN 20) plus a SW VTEP attaching physical servers.)
VXLAN to VLAN L2 Bridging – Summary
NSX-v SW L2 bridging instance vs. HW VTEPs
• Always lead with NSX-v SW native bridging; its performance is sufficient for nearly all use cases and it is HW agnostic
• Some customers believe they need a HW L2 VTEP when they don't, due to positioning by network vendors. Find out what their use cases are first and whether L2 bridging is actually a requirement
• Potential use cases for a HW L2 VTEP:
– Low-latency traffic
– Very large volumes of physical servers
– High amounts of guest-initiated storage traffic
• HW VTEP caveats:
– Mandates deploying multicast in the network infrastructure to handle the delivery of VXLAN-encapsulated multi-destination traffic – Broadcast, Unknown Unicast, Multicast (BUM); multicast mode is only needed for the VXLAN segments that are bridged to VLANs
– NSX-v has no direct control over the hardware VTEP devices: no control-plane communication with the controller, nor orchestration/automation capabilities (manual configuration required for HW VTEPs). Note: full control-plane/data-plane integration is only available with NSX-MH
– End-to-end loop exposure: no capability on HW VTEPs to detect an L2 loop caused by a physical L2 backdoor connection
– Unsupported coexistence of HW and SW VTEPs: bare-metal servers (or VLAN-attached VMs) can only connect to a pair of HW VTEP ToRs
Let's Compare:
(Figure: a HW vendor virtualization solution with VLAN-backed VMs on hypervisors vs. vSphere hypervisors running NSX.)
VXLAN Hardware Encapsulation Benefits
(Figure: HW gateway vs. SW vSwitch, each encapsulating the same VXLAN L2 payload – NSX virtualization in the hypervisor vSwitch delivers the same performance as the HW gateway.)
NSX Bridging Instance vs. Hardware Gateway
• NSX bridging: a single bridging instance per logical switch, so bandwidth is limited by that single instance, and the L2 network must be extended to reach all the non-virtualized devices (part of the same L2 segment)
• Hardware gateways: several can be deployed at several locations simultaneously, and VLANs can be kept local
(Figure: ESXi host with a Distributed Logical Router (LIF1, LIF2), DLR Control VM, and NSX Edge bridging VXLAN to VLAN for non-virtualized devices in the same L2 segment.)
NSX Logical Routing: Components Interaction
1. A new Distributed Logical Router instance is created on NSX Manager with dynamic routing configured
2. The Controller pushes the new logical router configuration, including LIFs, to the ESXi hosts
• The NSX Edge acts as the next-hop router, peering (OSPF, BGP) with the external network over VLAN and with the DLR over VXLAN
• Only VXLAN LIFs are supported (the figure shows host VTEPs 192.168.220.101, 192.168.230.101, and 192.168.240.101)
(Figure: Compute A and Compute B clusters with Web VMs, plus vCenter Server, NSX Manager, the Controller Cluster, and the NSX Edges.)
Routing peering
• Recommended use of a VXLAN (not a VLAN) for the transit link between the DLR and the NSX Edge – e.g., a VXLAN 5020 transit link on the Compute/Edge VDS, with a VLAN-backed Edge uplink
Multi-Tenant Routing Topology
• A single NSX Edge can provide centralized routing for multiple connected tenants
• Up to 9 tenants are supported on a single NSX Edge for pre-6.1 NSX releases (one transit link per tenant, e.g., VXLAN 5020 through VXLAN 5029)
Multi-Tenant Routing Topology (Post-6.1 NSX Release)
• From NSX software release 6.1, a new type of interface is supported on the NSX Edge (in addition to Internal and Uplink): the "Trunk" interface
• This allows many sub-interfaces (VLAN- or VXLAN-backed) to be created on a single NSX Edge vNIC, with routing established over each of them
High-Scale Multi-Tenant Topology
• Used to scale up the number of tenants (the only option before the VXLAN trunk introduction)
• Supports overlapping IP addresses between tenants connected to different first-tier NSX Edges
• NAT can be configured on the first-tier NSX Edge
(Figure: an X-Large NSX Edge acting as a route-aggregation layer toward the external network, with a VXLAN 5100 transit down to per-tenant NSX Edge Services Gateways, each fronting Web, App, and DB logical switches.)
Active/Standby HA Model
(Figure: routing peering between DLR instances and NSX Edges over VXLAN 5020/5030 transit links toward the external network; the NSX Edge and the DLR Control VM are each deployed as an active/standby pair.)
Active/Standby HA Model
All North-South traffic is handled by the Active NSX Physical Router
Edge R1> show ip route VXLAN
O 172.16.1.0/24 via 172.16.1.2 Core
The Active NSX Edge is the only one establishing adjacencies to the O 172.16.2.0/24 via 172.16.1.2 VLAN
DLR and the physical router O 172.16.3.0/24 via 172.16.1.2
Routing
Adjacency
Physical Router
.1
172.16.1.0/24
E1 .2 E2
Active Standby
ESXi Host Kernel
.2
net-vdr -l --route Default+Edge-1
192.168.1.0/24
O 0.0.0.0 via 192.168.1.2
.1
DLR
Active Standby
Web App DB
OSPF/BGP Timers
O 0.0.0.0 via 192.168.1.2 (40 sec, 120 sec)
Other HA recommendations:
.1
vSphere HA should be enabled for the NSX Edge VMs
DLR
Stateful services supported on the NSX Edge pair Active Standby
FW, Load-Balancing, NAT
Web App DB
104
Active/Standby HA Model – Failure of the Control VM
• Failure of the active Control VM triggers failover to the standby VM
(Figure: the routing adjacency between the physical router and the active Edge is preserved; the DLR Control VM fails over while the ESXi host kernel retains its routes – net-vdr -l --route Default+Edge-1: O 0.0.0.0 via 192.168.1.2.)
Active/Standby HA Model – Failure of the Control VM (continued)
• Heartbeat dead-timer tuning on the Control VM is not required to improve convergence in this failure scenario
• South-to-North flows keep flowing based on the forwarding information programmed in the kernel of the ESXi hosts; this is true despite the routing protocol not yet running on the newly activated DLR Control VM
• North-to-South flows keep flowing based on the information programmed in the NSX Edge forwarding table; the (30, 120 sec) protocol timer settings ensure that the NSX Edge keeps the routing adjacency to the DLR active, preventing the forwarding-table information from being flushed
ECMP HA Model
• ECMP support on the DLR and on the NSX Edge: both can install up to 8 equal-cost routes for a given destination in their forwarding tables
• 8 NSX Edges can be simultaneously deployed for a given tenant
• Increases the available bandwidth for North-South communication (up to 80 Gbps*)
• Reduces the traffic outage in an ESG failure scenario (only 1/Nth of the flows are affected)
• Load-balancing algorithm on the NSX Edge: based on the Linux kernel's flow-based random round-robin algorithm for next-hop selection; a flow is a pair of source IP and destination IP
• Load-balancing algorithm on the DLR: hashing of the source IP and destination IP determines the chosen next hop
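Conceptually, the DLR's behavior reduces to a deterministic hash of the (source IP, destination IP) pair over the set of equal-cost next hops, so every packet of a flow takes the same path. A minimal sketch of that selection; the hash function and encoding are illustrative assumptions, not the actual kernel algorithm:

```python
# Illustrative ECMP next-hop selection: hash (src IP, dst IP) over up
# to 8 equal-cost next hops. The real NSX kernel hash differs.
import hashlib

def ecmp_next_hop(src_ip, dst_ip, next_hops):
    assert 1 <= len(next_hops) <= 8, "up to 8 equal-cost paths in NSX"
    digest = hashlib.sha256(f"{src_ip}->{dst_ip}".encode()).digest()
    return next_hops[digest[0] % len(next_hops)]

edges = [f"192.168.1.{i}" for i in range(2, 10)]  # E1..E8 transit IPs
# The same flow always picks the same Edge; different flows spread out.
print(ecmp_next_hop("172.16.1.10", "10.0.0.5", edges))
print(ecmp_next_hop("172.16.1.11", "10.0.0.5", edges))
```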
Enabling ECMP on DLR and NSX Edge
Why the FW Is Not Disabled from 6.1.2
(Figure: one tenant requiring stateful services behind active/standby Edges, and one tenant not requiring stateful services behind ECMP Edges acting as next-hop 1 and next-hop 2, both above the DLR and the physical core routers.)
ECMP HA Model (Up to 8 NSX Edges)
• North-South traffic is handled by all active NSX Edges
• Active routing adjacencies are established with the DLR Control VM and the physical router
• Traffic is hashed across equal-cost paths based on source/destination IP address values
• On failure of an NSX Edge, the corresponding flows are re-hashed through the remaining active units
• The DLR and the physical router time out the routing adjacencies with the failed Edge and remove the routing-table entries pointing to that next-hop IP address
• Recommended to aggressively tune the hello/hold-time (keep-alive/hold-down) routing timers (1/3 seconds) to speed up traffic recovery
Other HA recommendations:
• No need to deploy a standby for each active NSX Edge
• vSphere HA should remain enabled
ECMP – DLR Active Control VM Failure
• North-to-South traffic initially flows based on the dynamic routing information provided by the active DLR Control VM to the ESGs; each ESG forwarding table (E1 through E8) shows O 172.16.1.0/24, 172.16.2.0/24, and 172.16.3.0/24 via 192.168.1.1
• South-to-North traffic flows based on the routing information (usually just a default route) programmed in the kernel of the ESXi hosts: net-vdr -l --route Default+Edge-1 shows O 0.0.0.0/0 via 192.168.1.2 … 192.168.1.9
• After the failure of the active Control VM, all the adjacencies with the NSX Edges are brought down (until the standby takes over and restarts the routing services); this is because of the aggressive timer settings required to speed up convergence
• On the ESGs, logical-network routes dynamically learned from the DLR are removed from the forwarding tables, so North-to-South traffic flows stop
• South-to-North traffic keeps flowing based on the forwarding-table information available in the ESXi hypervisors at the time of failure
(Figure: logical switches 172.16.1.0/24, 172.16.2.0/24, and 172.16.3.0/24 behind the DLR on transit 192.168.1.0/24.)
ECMP – DLR Active Control VM Failure: Using Static Routes to Remove the Traffic Outage
• The North-to-South traffic outage can be avoided by leveraging static routes on the ESGs to reach the logical-switch prefixes
• If possible, the recommendation is to configure a single summary static route (e.g., each ESG: S 172.16.0.0/16 via 192.168.1.1)
• With this configuration, failure of the DLR active Control VM results in a zero-packet outage
ECMP – Simultaneous Failure of NSX Edge and Control VM
• A specific failure scenario is one where the DLR active Control VM fails at the same time as an ESG; this happens if both VMs are co-located on the same ESXi host
• The recommendation is to use anti-affinity rules to prevent the DLR Control VM from being deployed on the same ESXi host as an active ESG
• DLR Control VMs can be deployed on dedicated hosts within the Edge cluster or (the recommended option) on the compute clusters
VLAN Traffic and ESXi Uplinks Design (Option 1)
North-South communication
• Design principle: the number of ESG logical uplinks matches the number of ESXi physical uplinks
• ESXi uplink = VLAN ID = routing adjacency = active path
• Assuming an ESXi host is equipped with 2x10GE uplinks, this implies two logical uplinks on each ESG
• Each ESG logical uplink is mapped to a unique VLAN-backed port group carried only on a specific ESXi host uplink (i.e., VLAN 10 on uplink 1, VLAN 20 on uplink 2)
• Both physical uplinks can be utilized by a single NSX Edge
• As an alternative deployment model, the …
(Figure: physical view – routed DC fabric with core routers R1 and R2 and 802.1Q trunks to ToR1–ToR4 in the edge racks; logical view – external VLANs 10 and 20 from R1/R2 to Edges E1–E8 over a transit VXLAN down to the DLR.)
Edge HA Models Comparison – Bandwidth, Services & Convergence
Active/Standby HA Model
• Single-path bandwidth (~10 Gbps per tenant)
• Single routing adjacency to the physical router
• Stateful services supported – NAT, SLB, FW
ECMP HA Model
• Up to 8 paths of bandwidth (~80 Gbps per tenant)
• Routing adjacencies from each Edge (E1–E8) to the physical router
NSX Edge Services Gateway
NSX Edge Gateway: integrated network services – routing/NAT, firewall, load balancing, L2/L3 VPN, and DDI (DHCP/DNS relay)
• Multi-functional, multi-use VM model; deployment varies based on its use, place in the topology, performance, etc.
• Functional use – N/S routing only, LB only, perimeter FW, etc.
• Form factor – X-Large to Compact (one license)
• Stateful switchover of services (FW/NAT, LB, DHCP & IPsec/SSL)
• Multi-interface routing support – OSPF & BGP
• Can be deployed in high-availability or standalone mode
• Per-tenant edge services – scaling by interface and instance
NSX Edge Design Considerations
High availability considerations
• Use Edge HA where stateful services are required
• Use dynamic routing failover if Edges are only performing N/S routing
3-Tier App Logical to Physical Mapping
(Figure: NSX Manager, vCAC, NSX Controller Cluster, and vCenter on Hosts 1 and 2 in the management cluster; Web, App, and DB VMs spread across the vSphere hosts.)
NSX Edge High Availability Failover Behavior

Feature            Behavior
Firewall / NAT     Stateful failover for firewall connections; connection entries are synced to the standby appliance. Failover to standby in 15 seconds by default, configurable down to 6.
DHCP               When the standby becomes active, HA link synchronization preserves the DHCP allocation table state.
Load Balancer      For L7, sticky tables are synced, as is the health of backend pool servers; performs a back-end status health check before becoming available.
SSL VPN / L2 VPN   When the standby becomes active, the client reconnects automatically.