BGP MPLS-VPN Option B

carrying label information in BGP updates

1 May 2022   13 min read

The Cisco documentation calls this Inter-AS Option B, with the use case being to extend LSPs between sites over the one link. As Option B is the only MPLS-VPN method supported by Cisco SD-WAN, I wanted to get a better understanding of how it works as well as see if it could be used to extend multi-VRF prefixes between edge routers and a core switch within the same AS (rather than using Option C with LDP).



Topology

To quote RFC 3107 (Carrying Label Information in BGP-4):

  • If two immediately adjacent Label Switched Routers (LSRs) are also BGP peers, then label distribution can be done without the need for any other label distribution protocol.
  • Label mapping information for a particular route is piggybacked in the same BGP Update message that is used to distribute the route itself. When BGP is used to distribute a particular route it also distributes an MPLS label which is mapped to that route.

The lab topology uses the IPv4 address-family underlay to advertise loopbacks (loopback1) that are used to build the VPNv4 address-family overlay. I did it this way because I wanted to route through any issues (failures of WAN to Core links) rather than route around them (routing convergence).

To enable labeled BGP, send-label is added to the underlay (IPv4 address-family) neighbor and mpls bgp forwarding to the physical interface facing that neighbor. The rest of the configuration is pretty standard, although I needed to make WAN01 and WAN02 route reflectors on the underlay (route-reflector-client) so that their loopbacks are advertised to CORE01 in failure scenarios. The configs can be found here.

WAN02# show ip bgp summary | in 19|N
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.110.1   4        65103      13      13        7    0    0 00:07:02        4
192.168.112.2   4        65103      11      13        7    0    0 00:06:46        3

WAN02# show ip route | in B__
B        10.1.1.1 [200/0] via 192.168.110.1, 00:05:54
B        10.3.3.3 [200/0] via 192.168.112.2, 00:05:54
B        192.168.111.0 [200/0] via 192.168.110.1, 00:05:54

WAN02# show bgp vpnv4 unicast all summary | in 10|N
BGP router identifier 10.2.2.2, local AS number 65103
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.1.1.1        4        65103     126     126        6    0    0 00:06:13        0
10.3.3.3        4        65103     127     127        6    0    0 00:06:10        0

WAN02# show mpls interfaces
Interface              IP            Tunnel   BGP Static Operational
GigabitEthernet1       No            No       Yes No     Yes
GigabitEthernet2       No            No       Yes No     Yes

The WAN routers have a BLU VRF BGP peering to external data centres (different ASs) and all devices within the DC and campus advertise loopbacks in to the BLU VRF.

WAN02# show bgp vpnv4 unicast all summary | in 10|N
BGP router identifier 10.2.2.2, local AS number 65103
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.1.1.1        4        65103    1802    1802       28    0    0 01:31:51        5
10.3.3.3        4        65103    1801    1803       28    0    0 01:31:48        1
192.168.200.1   4        65102      28      34       28    0    0 00:22:58        4

BGP advertised MPLS Labels

A device running labeled BGP (mpls bgp forwarding and send-label) assigns a label to every prefix it advertises. For example, WAN02 has a label for its local loopback as well as for the loopbacks learnt from the DC2 peering.

WAN02#show mpls forwarding-table
Local      Outgoing   Prefix              Bytes Label   Outgoing        Next Hop
Label      Label      or Tunnel Id        Switched      interface
16         No Label   172.16.200.1/32[V]  11970         Gi3             192.168.200.1
18         No Label   172.16.200.2/32[V]  0             Gi3             192.168.200.1
19         No Label   172.16.200.3/32[V]  114000        Gi3             192.168.200.1
20         No Label   192.168.200.0/30[V] 0             aggregate/BLU
21         Pop Label  172.16.2.2/32[V]    0             aggregate/BLU

The Bytes Label Switched counter only increments for remote prefixes; it does not do so for the router's local loopback or interface. Similarly, in a traceroute you only see the MPLS label if the prefix is not local to the router.

CORE01# traceroute vrf BLU 172.16.2.2 source loopback 11
Type escape sequence to abort.
Tracing the route to 172.16.2.2
VRF info: (vrf in name/id, vrf out name/id)
  1 172.16.2.2 14 msec *  7 msec

CORE01# traceroute vrf BLU 172.16.200.1 source loopback 11
Type escape sequence to abort.
Tracing the route to 172.16.200.1
VRF info: (vrf in name/id, vrf out name/id)
  1 192.168.200.2 [MPLS: Label 16 Exp 0] 15 msec 7 msec 8 msec
  2 192.168.200.1 20 msec *  10 msec

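In the forwarding table above the local loopback (172.16.2.2/32) has an Outgoing Label of Pop Label, whilst prefixes learnt remotely (over the BGP peering to DC2) have No Label, as does the connected 192.168.200.0/30 subnet, which is an aggregate entry for the BLU VRF.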

  • No Label: Strips all of the MPLS labels off the packet and forwards the raw IP packet (no longer a labeled packet)
  • Pop Label: Removes the top label and forwards the remaining payload including any other labels. If the label is the last label in the stack (has BoS bit set) the outgoing packet is no longer a labeled packet and is forwarded as an IPv4 packet
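The difference between the two actions can be sketched in a few lines of Python. This is only a toy model of a label stack as a list (top of stack first), not how IOS implements it, and the label value 300 is made up for illustration.

```python
from typing import List, Tuple


def pop_label(stack: List[int]) -> Tuple[List[int], bool]:
    """Pop Label: remove only the top label.

    Returns the remaining stack and whether the packet is still labeled.
    """
    remaining = stack[1:]
    return remaining, len(remaining) > 0


def no_label(stack: List[int]) -> Tuple[List[int], bool]:
    """No Label: discard the whole stack and forward the bare IP packet."""
    return [], False


# stack[0] is the top of the stack; 300 is a hypothetical outer label.
two_labels = [300, 21]
print(pop_label(two_labels))  # ([21], True)  -> still an MPLS packet
print(no_label(two_labels))   # ([], False)   -> plain IP packet
print(pop_label([21]))        # ([], False)   -> last label popped, plain IP packet
```

With a single label on the packet (as in this lab) both actions give the same result, which is why the distinction only really matters once there is more than one label in the stack.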

The below capture shows the BGP UPDATE message from WAN02 when loopback11 (172.16.2.2) was brought up. You can see in the NLRI of the update that the prefix carries an MPLS label (21) flagged as bottom, meaning it is the Bottom-of-Stack (BoS) label.
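As a rough illustration of what the capture is showing, RFC 3107 encodes each label in the NLRI as 3 octets: the high-order 20 bits are the label value and the low-order bit is Bottom of Stack. The sketch below assumes the three bits in between are simply left as zero; the function names are my own and not from any library.

```python
from typing import Tuple


def encode_nlri_label(label: int, bottom_of_stack: bool) -> bytes:
    """Encode one label the way RFC 3107 carries it in the NLRI (3 octets)."""
    if not 0 <= label < 2 ** 20:
        raise ValueError("label must fit in 20 bits")
    # 20-bit label in the high-order bits, BoS flag in the lowest bit.
    value = (label << 4) | (1 if bottom_of_stack else 0)
    return value.to_bytes(3, "big")


def decode_nlri_label(field: bytes) -> Tuple[int, bool]:
    """Decode a 3-octet NLRI label field into (label, bottom_of_stack)."""
    if len(field) != 3:
        raise ValueError("label field must be exactly 3 octets")
    value = int.from_bytes(field, "big")
    return value >> 4, bool(value & 1)


encoded = encode_nlri_label(21, True)
print(encoded.hex())               # 000151
print(decode_nlri_label(encoded))  # (21, True)
```

So label 21 with the bottom flag set should appear on the wire as the bytes 00 01 51 in front of the route distinguisher and prefix.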

The BGP assigned labels are either in (local) or out (remote) labels, matching the In label/Out label columns in the output below, and can be viewed per-prefix, per-rd, per-vrf or for all prefixes.

  • in label: Assigned by this device to prefixes it is advertising, so the label a neighbor will use when sending packets through this device
  • out label: Learnt from a neighbor along with the prefix, so the label this device will use when sending packets through its neighbor

WAN02# show bgp vrf BLU 172.16.3.3
BGP routing table entry for 10.3.3.3:3001:172.16.3.3/32, version 14
Paths: (1 available, best #1, table BLU)
  Advertised to update-groups:
     2
  Refresh Epoch 2
  Local
    10.3.3.3 (via default) from 10.3.3.3 (10.3.3.3)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Extended Community: RT:65103:10003001
      mpls labels in/out nolabel/17
      rx pathid: 0, tx pathid: 0x0

WAN02# show bgp vpnv4 unicast all labels
   Network              Next Hop          In label/Out label
Route Distinguisher: 10.3.3.3:3001 (BLU)
   172.16.1.1/32        10.1.1.1          nolabel/17
   172.16.2.2/32        0.0.0.0           21/nolabel(BLU)
   172.16.3.3/32        10.3.3.3          nolabel/17
   172.16.100.1/32      10.1.1.1          nolabel/18
   172.16.100.2/32      10.1.1.1          nolabel/19
   172.16.100.3/32      10.1.1.1          nolabel/20
   172.16.200.1/32      192.168.200.1     16/nolabel
   172.16.200.2/32      192.168.200.1     18/nolabel
   172.16.200.3/32      192.168.200.1     19/nolabel
   192.168.100.0/30     10.1.1.1          nolabel/16
   192.168.200.0/30     192.168.200.1     20/nolabel
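If you want to compare these tables between routers, a small parser helps. The following is a minimal sketch that assumes the column layout shown above (prefix, next hop, in/out labels); the regular expression and function names are my own and it has only been checked against the three sample rows included in it.

```python
import re
from typing import Dict, Optional, Tuple

# prefix, next hop, then "in/out" where either side may be the word "nolabel".
ROW = re.compile(r"^\s*(\d+\.\d+\.\d+\.\d+/\d+)\s+(\S+)\s+(\w+)/(\w+)")


def to_label(text: str) -> Optional[int]:
    """Convert a label column to an int, or None when it reads 'nolabel'."""
    return None if text == "nolabel" else int(text)


def parse_labels(output: str) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Map each prefix to (in_label, out_label); header lines are skipped."""
    table = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            prefix, _next_hop, in_label, out_label = match.groups()
            table[prefix] = (to_label(in_label), to_label(out_label))
    return table


SAMPLE = """\
   Network              Next Hop          In label/Out label
Route Distinguisher: 10.3.3.3:3001 (BLU)
   172.16.2.2/32        0.0.0.0           21/nolabel(BLU)
   172.16.3.3/32        10.3.3.3          nolabel/17
   172.16.200.1/32      192.168.200.1     16/nolabel
"""

table = parse_labels(SAMPLE)
for prefix, (in_label, out_label) in table.items():
    print(prefix, "in:", in_label, "out:", out_label)
```

Prefixes with an in label are the ones WAN02 itself advertises labels for, while prefixes with only an out label are reached by pushing a label that a neighbor assigned.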

Traffic from CORE01 to loopback11 (172.16.2.2) on WAN02 will use label 21. A ping sourced from CORE01's loopback11 (172.16.3.3) is successful.

CORE01# show ip route vrf BLU 172.16.2.2
Routing Table: BLU
Routing entry for 172.16.2.2/32
  Known via "bgp 65103", distance 200, metric 0, type internal
  Last update from 10.2.2.2 00:24:43 ago
  Routing Descriptor Blocks:
  * 10.2.2.2 (default), from 10.2.2.2, 00:24:43 ago
      Route metric is 0, traffic share count is 1
      AS Hops 0
      MPLS label: 21
      MPLS Flags: MPLS Required

CORE01# show ip cef vrf BLU 172.16.2.2 detail
172.16.2.2/32, epoch 0, flags [rib defined all labels]
  recursive via 10.2.2.2 label 21()
    recursive via 192.168.112.1
      attached to GigabitEthernet2

CORE01# ping vrf BLU 172.16.2.2 source loopback 11
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.2.2, timeout is 2 seconds:
Packet sent with a source address of 172.16.3.3
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 2/74/360 ms

This capture is taken on the link between WAN02 and CORE01 and shows the extra MPLS header with a label of 21. It has the BoS bit set indicating it is the last label in the stack causing WAN02 to forward it as an IP packet using the BLU routing table.
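For reference, the MPLS header seen in a capture like this is a single 32-bit label stack entry (RFC 3032): 20 bits of label, 3 bits of Exp, 1 Bottom-of-Stack bit and 8 bits of TTL. Below is a small sketch of packing and unpacking it; the TTL of 255 is an assumed value as the post does not state what the capture contained.

```python
import struct
from typing import NamedTuple


class LabelStackEntry(NamedTuple):
    label: int  # 20 bits
    exp: int    # 3 bits (Exp / Traffic Class)
    bos: bool   # 1 bit, set on the last label in the stack
    ttl: int    # 8 bits


def pack_entry(entry: LabelStackEntry) -> bytes:
    """Pack one label stack entry into its 4-byte wire format."""
    word = (entry.label << 12) | (entry.exp << 9) | (int(entry.bos) << 8) | entry.ttl
    return struct.pack("!I", word)


def unpack_entry(data: bytes) -> LabelStackEntry:
    """Unpack the first 4 bytes of data as a label stack entry."""
    (word,) = struct.unpack("!I", data[:4])
    return LabelStackEntry(
        label=word >> 12,
        exp=(word >> 9) & 0x7,
        bos=bool((word >> 8) & 0x1),
        ttl=word & 0xFF,
    )


# Label 21, Exp 0, BoS set; the TTL value is an assumption for illustration.
wire = pack_entry(LabelStackEntry(label=21, exp=0, bos=True, ttl=255))
print(wire.hex())          # 000151ff
print(unpack_entry(wire))  # LabelStackEntry(label=21, exp=0, bos=True, ttl=255)
```

Because the BoS bit is set, whatever follows these four bytes is the IP header rather than another label.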

Labeled BGP peers must be adjacent

I configured the lab in this way with loopbacks thinking that if a WAN to CORE link went down it would route through the problem. The idea was that if a link between a WAN router and CORE01 went down, the underlay peering would be lost (IPv4 BGP over physical interface) but the overlay peerings (VPNv4 BGP using loopback) would stay up as the loopback is routable via the other WAN router.

WAN02(config)# int gi2
WAN02(config-if)# shut

CORE01#show ip bgp summary | in 19|N
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.111.1   4        65103     141     139        8    0    0 02:03:13        4
192.168.112.1   4        65103       0       0        1    0    0 00:00:31 Active

BGP router identifier 10.3.3.3, local AS number 65103
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.1.1.1        4        65103    2436    2434       33    0    0 02:04:16        5
10.2.2.2        4        65103    2436    2435       33    0    0 02:04:15        5

CORE01#show ip cef vrf BLU 172.16.2.2
172.16.2.2/32
  nexthop 10.2.2.2 GigabitEthernet2 label 21()

Although CORE01 still has a route, reachability between the loopbacks on CORE01 and WAN02 is broken.

CORE01#ping vrf BLU 172.16.2.2 source loopback 11
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.2.2, timeout is 2 seconds:
Packet sent with a source address of 172.16.3.3
.....
Success rate is 0 percent (0/5)

From a packet capture I can see the packets arriving at WAN01, but they are not forwarded on to WAN02. I think the reason is that WAN01 tries to forward them based on label 21, but that label entry does not exist on WAN01.

WAN01#show mpls forwarding-table labels 21
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop
Label      Label      or Tunnel Id     Switched      interface

Goes to show you should read the RFC properly to start with; it states pretty clearly in there that this will not work.

  1. When the BGP Peers are not Directly Adjacent
    Consider the following LSR topology: A–B–C–D. Suppose that D distributes a label L to A. In this topology, A cannot simply push L onto a packet’s label stack, and then send the resulting packet to B. D must be the only LSR that sees L at the top of the stack. Before A sends the packet to B, it must push on another label, which was distributed by B. B must replace this label with yet another label, which was distributed by C. In other words, there must be an LSP between A and D. If there is no such LSP, A cannot make use of label L. This is true any time labels are distributed between non-adjacent LSRs, whether that distribution is done by BGP or by some other method.
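The A–B–C–D example from the RFC can be walked through with a toy simulation. This is only a sketch: the transport labels 100 and 200 are invented, and having C pop the transport label (rather than swap it for one distributed by D) is my assumption to keep the example short.

```python
from typing import List, Tuple

L = 21  # the label D distributed to A for the VPN prefix

# Per-router table: incoming top label -> (action, new label, next hop).
# Transport labels 100 and 200 are hypothetical values.
LFIB = {
    "B": {100: ("swap", 200, "C")},
    "C": {200: ("pop", None, "D")},
    "D": {L: ("deliver", None, None)},
}


def forward(stack: List[int], first_hop: str) -> Tuple[str, List[str]]:
    """Walk a packet hop by hop and return (result, routers visited)."""
    node, path = first_hop, []
    stack = list(stack)  # stack[0] is the top label
    while True:
        path.append(node)
        entry = LFIB[node].get(stack[0]) if stack else None
        if entry is None:
            return "dropped at " + node, path
        action, new_label, next_hop = entry
        if action == "deliver":
            return "delivered at " + node, path
        if action == "swap":
            stack[0] = new_label
        elif action == "pop":
            stack.pop(0)
        node = next_hop


# With an LSP: A pushes L, then the label distributed by B on top of it.
print(forward([100, L], "B"))  # ('delivered at D', ['B', 'C', 'D'])
# Without an LSP: A pushes only L, which B has never heard of.
print(forward([L], "B"))       # ('dropped at B', ['B'])
```

The second case is what happened in the lab: CORE01 pushed only the label learnt from WAN02 and handed the packet to WAN01, which plays the part of B.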

Strangely, although pings to loopback11 of WAN02 didn’t work, pings to loopback12 of DC2 did work.

CORE01#ping vrf BLU 172.16.200.2 source loopback 11
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.200.2, timeout is 2 seconds:
Packet sent with a source address of 172.16.3.3
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 10/28/49 ms

This worked because the label used for 172.16.200.2 (18) also exists on WAN01, but for a different prefix (DC1 loopback11 172.16.100.1).

CORE01# show ip route vrf BLU 172.16.200.2
Routing Table: BLU
Routing entry for 172.16.200.2/32
  Known via "bgp 65103", distance 200, metric 0
  Tag 65102, type internal
  Last update from 10.2.2.2 01:07:19 ago
  Routing Descriptor Blocks:
  * 10.2.2.2 (default), from 10.2.2.2, 01:07:19 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 65102
      MPLS label: 18
      MPLS Flags: MPLS Required

WAN01# show mpls forwarding-table labels 18
Local      Outgoing   Prefix              Bytes Label   Outgoing     Next Hop
Label      Label      or Tunnel Id        Switched      interface
18         No Label   172.16.100.1/32[V]  1074          Gi3         192.168.100.1

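Once traffic arrived at WAN01 it matched WAN01's own entry for label 18, so the label was stripped and the IP packet was forwarded out Gi3 towards DC1 (192.168.100.1). After going round the houses it eventually got to the destination.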

CORE01# traceroute vrf BLU 172.16.200.2 source loopback 11
Type escape sequence to abort.
Tracing the route to 172.16.200.2
VRF info: (vrf in name/id, vrf out name/id)
  1 192.168.100.2 [MPLS: Label 18 Exp 0] 24 msec 13 msec 13 msec
  2 192.168.100.1 49 msec 21 msec 11 msec
  3 192.168.100.2 14 msec 15 msec 8 msec
  4 192.168.200.2 [MPLS: Label 18 Exp 0] 19 msec 14 msec 13 msec
  5 192.168.200.1 18 msec *  11 msec
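Both behaviours (label 21 being dropped and label 18 being delivered to the wrong place first) come down to labels only being locally significant. A minimal sketch using the forwarding-table entries shown in this post; the WAN01 table only contains the single entry that was displayed, so it is incomplete.

```python
from typing import Dict, Optional, Tuple

Entry = Tuple[str, str]  # (prefix, next hop)

# Copied from the WAN02 show mpls forwarding-table output earlier in the post.
WAN02_LFIB: Dict[int, Entry] = {
    16: ("172.16.200.1/32", "192.168.200.1"),
    18: ("172.16.200.2/32", "192.168.200.1"),
    19: ("172.16.200.3/32", "192.168.200.1"),
    20: ("192.168.200.0/30", "aggregate/BLU"),
    21: ("172.16.2.2/32", "aggregate/BLU"),
}

# Only the one WAN01 entry shown in the post; the lookup for 21 returned nothing.
WAN01_LFIB: Dict[int, Entry] = {
    18: ("172.16.100.1/32", "192.168.100.1"),
}


def lookup(lfib: Dict[int, Entry], label: int) -> Optional[Entry]:
    """Return the entry for a label on one router, or None (packet dropped)."""
    return lfib.get(label)


print(lookup(WAN02_LFIB, 21))  # ('172.16.2.2/32', 'aggregate/BLU')
print(lookup(WAN01_LFIB, 21))  # None -> dropped, the failed ping
print(lookup(WAN02_LFIB, 18))  # ('172.16.200.2/32', '192.168.200.1')
print(lookup(WAN01_LFIB, 18))  # ('172.16.100.1/32', '192.168.100.1') -> sent to DC1
```

The same label number means different things on each router, so a label learnt from WAN02 is only safe to use when the packet is handed directly to WAN02.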

Non-loopback overlay - Still doesn’t work

I tried changing the design to use the physical interfaces instead of loopbacks for the overlay peering to see if it made any difference.

WAN02# show ip bgp summary | in 651|N
BGP router identifier 10.2.2.2, local AS number 65103
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.110.1   4        65103      43      43        7    0    0 00:01:47        2
192.168.112.2   4        65103      40      44        7    0    0 00:01:50        2

WAN02# show bgp vpnv4 unicast all summary | in 651|N
BGP router identifier 10.2.2.2, local AS number 65103
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.110.1   4        65103      52      52       18    0    0 00:02:13        6
192.168.112.2   4        65103      49      53       18    0    0 00:02:16        1
192.168.200.1   4        65102      44      47       18    0    0 00:02:09        4

As expected, with all the links up everything works the same as it did when using loopbacks for the overlay.

CORE01# show ip cef vrf BLU 172.16.200.1
172.16.200.1/32
  nexthop 192.168.112.1 GigabitEthernet2 label 21()

CORE01# traceroute vrf BLU 172.16.200.1 source loopback 11
Type escape sequence to abort.
Tracing the route to 172.16.200.1
VRF info: (vrf in name/id, vrf out name/id)
  1 192.168.200.2 [MPLS: Label 21 Exp 0] 10 msec 6 msec 8 msec
  2 192.168.200.1 5 msec *  9 msec

WAN02# show mpls forwarding-table labels 16
Local      Outgoing   Prefix              Bytes Label   Outgoing      Next Hop
Label      Label      or Tunnel Id        Switched      interface
16         No Label   172.16.200.2/32[V]  1140          Gi3           192.168.200.1

With the link between WAN02 and CORE01 down, traffic must go through WAN01 and once again, despite having the routing information, traffic forwarding does not work as WAN01 has no entry for that label.

WAN02(config)# int gi 2
WAN02(config-if)# shut

CORE01#show ip cef vrf BLU 172.16.200.1
172.16.200.1/32
  nexthop 192.168.111.1 GigabitEthernet1 label 21()

CORE01#ping vrf BLU 172.16.200.1 source loopback 11
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.200.1, timeout is 2 seconds:
Packet sent with a source address of 172.16.3.3
.....
Success rate is 0 percent (0/5)

CORE01#traceroute vrf BLU 172.16.200.1 source loopback 11
Type escape sequence to abort.
Tracing the route to 172.16.200.1
VRF info: (vrf in name/id, vrf out name/id)
  1  *  *  *
  2  *  *  *
  3  *  *  *
  4

Summary

To sum up MPLS-VPN option B (InterAS option B):

  • MPLS BGP forwarding is only supported on directly connected interfaces enabled to receive MPLS traffic (mpls bgp forwarding)
  • The IPv4 BGP peering between devices is used to send and receive labels (send-label)
  • It requires only one MP-BGP session (using physical interfaces or loopbacks) to exchange all VPN prefixes between the LSRs
  • The MP-BGP session distributes labeled VPN prefixes between the LSRs. As a result, the traffic that flows between the LSRs is labeled
  • Because the traffic is MPLS, QoS mechanisms that are applied only to IP traffic cannot be carried and the VRFs cannot be isolated
  • This feature provides nonstop forwarding (NSF) and Graceful Restart

MPLS-VPN option B is pretty simple to set up, and as long as you understand its limitations and use case it is fairly straightforward to use and troubleshoot. If you did want to use it for intra-AS traffic in place of LDP you could probably work around the limitations using per-VRF peerings, redistribution from other routing protocols or possibly static label entries. However, all of these options would add a level of complexity to the solution so I am not really sure that there is much benefit to them.

The main reason I am looking at it is because this is the only option supported by Cisco SD-WAN. I still need to test it fully with SD-WAN but don’t think it will be a problem in this scenario as there will be no MPLS between SD-WAN cEdges. MPLS would be used from each SD-WAN cEdge to the Core, but between the cEdges SD-WAN uses OMP to share routing information and this will be redistributed into BGP on the cEdges. Therefore, in theory you would not have the problem as prefixes will have their own labels generated on each cEdge.

https://blog.ipspace.net/2014/11/handling-bottom-of-mpls-stack.html
https://datatracker.ietf.org/doc/html/rfc3107
https://lostintransit.se/2016/03/02/ccde-inter-as-l3-vpns
https://www.cisco.com/c/en/us/td/docs/switches/datacenter/sw/5_x/nx-os/mpls/configuration/guide/mpls_cg/mp_interas_optionb_lite.html