To realize the goal of dynamically obtaining and announcing layer-3 network identity (IPv4 addresses) for cloud-native workloads, we propose an approach that utilizes the BGP routing protocol.
In a typical network design, these two steps are necessary:
1. Add an EIGRP "network" statement at the access layer for the wider "service" subnet (e.g. 192.168.0.0/24). Note that "access-to-distribution" dynamic routing is achieved using EIGRP in our datacenter.
2. Because a cloud-native load balancer such as MetalLB announces the /32 service identity with a next-hop value of its primary interface, there is no need to add a network statement. Instead, a secondary address/mask is specified on the default gateway's layer-3 interface for the access VLAN, which allows the gateway to send ARP requests on the access network when it receives traffic destined to the service subnet.
Over an eBGP multi-hop session with MetalLB, the "distribution" switch receives the /32 with the new AS_PATH and, optionally, extended-community attributes. The distribution switch can now resolve the next hop. Note that there is no need to establish a BGP relationship with the "access" layer. Additionally, we must set up the "service subnet" as a "secondary" IP address/mask on the gateway's layer-3 interface. We typically employ a pair of switches with HSRP enabled for high availability.
This method requires us to aggregate Linux BGP speakers with MetalLB at rack or server-room level. This represents a significant new capability for publishing services from cloud-native environments such as Kubernetes.
Typically, when a router receives an eBGP route whose next hop is unknown, the route is neither installed in the routing table nor propagated.
NEXT_HOP is inaccessible
We solved this problem by using an EIGRP network statement for the service prefix, which can be obtained through the APIs of a Network Identity Manager such as Infoblox.
If the router doesn't know how to reach a route's next hop, a recursive lookup will fail, and the route can't be added to BGP. For example, if a BGP router receives a route for 192.168.0.11/32 with a NEXT_HOP attribute of 192.168.0.1, but doesn't have an entry in its routing table for a subnet containing 192.168.0.1, the received route for 192.168.0.11 is useless and won't be installed in the routing table.
If, however, MetalLB announces the prefix with the next-hop value of its primary interface (10.205.32.15 in this example), then the IGP (EIGRP) between the "access" and "distribution" layers ensures that the next hop is reachable, and the route is successfully installed.
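The next-hop check described above can be sketched in a few lines of Python. This is only an illustration of the recursive-lookup idea, not how any router implements it; the function names and the 10.205.32.0/22 IGP route are assumptions made for the example.

```python
import ipaddress

def resolve_next_hop(next_hop, routing_table):
    """Return the longest-match prefix that covers next_hop, or None."""
    hop = ipaddress.ip_address(next_hop)
    matches = [net for net in routing_table if hop in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)

def accept_route(prefix, next_hop, routing_table):
    """A received route is usable only when its next hop resolves."""
    return resolve_next_hop(next_hop, routing_table) is not None

# Hypothetical table on the distribution switch: the /22 is assumed
# to be learned from the access layer via EIGRP.
igp_routes = [ipaddress.ip_network("10.205.32.0/22")]

# Next hop inside the service subnet: no covering route, so rejected.
print(accept_route("192.168.0.11/32", "192.168.0.1", igp_routes))
# Next hop on the primary interface: covered by the IGP route, so accepted.
print(accept_route("192.168.0.11/32", "10.205.32.15", igp_routes))
```

The first call models the failure case and the second the case where MetalLB uses its primary interface address as the next hop.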
BGP configuration example at the network layer router:
router bgp 65001
 neighbor 10.205.32.15 remote-as 65002
 address-family ipv4 unicast
  neighbor 10.205.32.15 ebgp-multihop
  neighbor 10.205.32.15 activate
  neighbor 10.205.32.15 prefix-list service-subnets in
Prefix-list
`ip prefix-list service-subnets seq 100 permit 192.168.0.0/24 ge 32`
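To make the effect of `ge 32` concrete, here is a minimal Python sketch of how such a prefix-list entry is commonly understood to match: the candidate must fall inside the base prefix and its length must lie in the permitted range. The function `prefix_list_permits` is a hypothetical helper written for this illustration, not part of any router software.

```python
import ipaddress

def prefix_list_permits(candidate, entry, ge=None, le=None):
    """Model one 'permit <entry> [ge N] [le M]' prefix-list line."""
    cand = ipaddress.ip_network(candidate)
    base = ipaddress.ip_network(entry)
    if ge is None and le is None:
        # Without ge/le the entry matches only the exact prefix.
        return cand == base
    low = ge if ge is not None else base.prefixlen
    high = le if le is not None else cand.max_prefixlen
    return cand.subnet_of(base) and low <= cand.prefixlen <= high

# Host route inside the service subnet: permitted.
print(prefix_list_permits("192.168.0.11/32", "192.168.0.0/24", ge=32))
# The aggregate itself: rejected, because its length is below 32.
print(prefix_list_permits("192.168.0.0/24", "192.168.0.0/24", ge=32))
# Host route outside the service subnet: rejected.
print(prefix_list_permits("192.168.1.11/32", "192.168.0.0/24", ge=32))
```

Under this reading, the entry accepts only /32 host routes drawn from 192.168.0.0/24.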
EIGRP configuration at the "access" layer switches
router eigrp 100
 network 192.168.0.0/24
Layer-3 Gateway at "access" layer
interface vlan 3032
 ip address 10.0.32.2/22
 ip address 192.168.0.1/24 secondary
 hsrp 32
  ip 10.0.32.1
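Since the service prefix may come from an external source of record, the two access-layer snippets can be derived from it mechanically. The sketch below is one possible way to do that in Python; `render_service_subnet_config` is a hypothetical helper, and the choice of the first usable host as the gateway's secondary address is an assumption that happens to match 192.168.0.1 in this example. Fetching the prefix from an API is deliberately left out.

```python
import ipaddress

def render_service_subnet_config(service_subnet, eigrp_as, vlan_id):
    """Render the EIGRP network statement and the secondary gateway address."""
    net = ipaddress.ip_network(service_subnet)
    # Assumption: the gateway takes the first usable host in the subnet.
    gateway = next(iter(net.hosts()))
    return "\n".join([
        f"router eigrp {eigrp_as}",
        f" network {net.with_prefixlen}",
        f"interface vlan {vlan_id}",
        f" ip address {gateway}/{net.prefixlen} secondary",
    ])

print(render_service_subnet_config("192.168.0.0/24", 100, 3032))
```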
MetalLB configuration
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    peers:
    - peer-address: 10.0.0.1
      peer-asn: 65001
      my-asn: 65002
    address-pools:
    - name: default
      protocol: bgp
      addresses:
      - 192.168.0.0/24
      bgp-advertisements:
      - aggregation-length: 32
        localpref: 100
        communities:
        - no-advertise
      - aggregation-length: 24
    bgp-communities:
      no-advertise: 64512-64534
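The two `aggregation-length` entries mean that each service address is announced twice: once as a /32 host route and once as the /24 that contains it. A small Python sketch of that arithmetic follows; `advertised_prefixes` is a hypothetical helper for illustration and does not reflect MetalLB's internal code.

```python
import ipaddress

def advertised_prefixes(service_ip, aggregation_lengths):
    """Return the prefix announced for service_ip at each aggregation length."""
    return [
        # strict=False masks the host bits down to the aggregate.
        str(ipaddress.ip_network(f"{service_ip}/{length}", strict=False))
        for length in aggregation_lengths
    ]

# Hypothetical service address taken from the 192.168.0.0/24 pool.
print(advertised_prefixes("192.168.0.11", [32, 24]))
```

With the `service-subnets` prefix-list shown earlier applied inbound, only the /32 of these two announcements would be accepted by the router.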