rfc9692.original | rfc9692.txt | |||
---|---|---|---|---|
RIFT Working Group A. Przygienda, Ed. | Internet Engineering Task Force (IETF) T. Przygienda, Ed. | |||
Internet-Draft J. Head, Ed. | Request for Comments: 9692 J. Head, Ed. | |||
Intended status: Standards Track Juniper Networks | Category: Standards Track Juniper Networks | |||
Expires: 24 November 2024 A. Sharma | ISSN: 2070-1721 A. Sharma | |||
Hudson River Trading | Hudson River Trading | |||
P. Thubert | P. Thubert | |||
Bruno. Rijsman | B. Rijsman | |||
Individual | Individual | |||
Dmitry. Afanasiev | D. Afanasiev | |||
Yandex | Yandex | |||
23 May 2024 | January 2025 | |||
RIFT: Routing in Fat Trees | RIFT: Routing in Fat Trees | |||
draft-ietf-rift-rift-24 | ||||
Abstract | Abstract | |||
This document defines a specialized, dynamic routing protocol for | This document defines a specialized, dynamic routing protocol for | |||
Clos, fat tree, and variants thereof. These topologies were | Clos, fat tree, and variants thereof. These topologies were | |||
initially used within crossbar interconnects, and consequently router | initially used within crossbar interconnects and consequently router | |||
and switch backplanes, but their characteristics make them ideal for | and switch backplanes, but their characteristics make them ideal for | |||
constructing IP fabrics as well. The protocol specified by this | constructing IP fabrics as well. The protocol specified by this | |||
document is optimized toward the minimization of control plane state | document is optimized towards the minimization of control plane state | |||
to support very large substrates as well as the minimization of | to support very large substrates as well as the minimization of | |||
configuration and operational complexity to allow for simplified | configuration and operational complexity to allow for a simplified | |||
deployment of said topologies. | deployment of said topologies. | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This is an Internet Standards Track document. | |||
provisions of BCP 78 and BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
Internet Standards is available in Section 2 of RFC 7841. | ||||
This Internet-Draft will expire on 24 November 2024. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc9692. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2024 IETF Trust and the persons identified as the | Copyright (c) 2025 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
in the Revised BSD License. | ||||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 5 | 1. Introduction | |||
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 8 | 1.1. Requirements Language | |||
2. A Reader's Digest . . . . . . . . . . . . . . . . . . . . . . 8 | 2. A Reader's Digest | |||
3. Reference Frame . . . . . . . . . . . . . . . . . . . . . . . 10 | 3. Reference Frame | |||
3.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 10 | 3.1. Terminology | |||
3.2. Topology . . . . . . . . . . . . . . . . . . . . . . . . 16 | 3.2. Topology | |||
4. RIFT: Routing in Fat Trees . . . . . . . . . . . . . . . . . 19 | 4. RIFT: Routing in Fat Trees | |||
5. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 19 | 5. Overview | |||
5.1. Properties . . . . . . . . . . . . . . . . . . . . . . . 19 | 5.1. Properties | |||
5.2. Generalized Topology View . . . . . . . . . . . . . . . . 20 | 5.2. Generalized Topology View | |||
5.2.1. Terminology and Glossary . . . . . . . . . . . . . . 20 | 5.2.1. Terminology and Glossary | |||
5.2.2. Clos as Crossed, Stacked Crossbars . . . . . . . . . 21 | 5.2.2. Clos as Crossed, Stacked Crossbars | |||
5.3. Fallen Leaf Problem . . . . . . . . . . . . . . . . . . . 31 | 5.3. Fallen Leaf Problem | |||
5.4. Discovering Fallen Leaves . . . . . . . . . . . . . . . . 33 | 5.4. Discovering Fallen Leaves | |||
5.5. Addressing the Fallen Leaves Problem . . . . . . . . . . 34 | 5.5. Addressing the Fallen Leaves Problem | |||
6. Specification . . . . . . . . . . . . . . . . . . . . . . . . 35 | 6. Specification | |||
6.1. Transport . . . . . . . . . . . . . . . . . . . . . . . . 36 | 6.1. Transport | |||
6.2. Link (Neighbor) Discovery (LIE Exchange) . . . . . . . . 36 | 6.2. Link (Neighbor) Discovery (LIE Exchange) | |||
6.2.1. LIE Finite State Machine . . . . . . . . . . . . . . 42 | 6.2.1. LIE Finite State Machine | |||
6.3. Topology Exchange (TIE Exchange) . . . . . . . . . . . . 52 | 6.3. Topology Exchange (TIE Exchange) | |||
6.3.1. Topology Information Elements . . . . . . . . . . . . 52 | 6.3.1. Topology Information Elements | |||
6.3.2. Southbound and Northbound TIE Representation . . . . 53 | 6.3.2. Southbound and Northbound TIE Representation | |||
6.3.3. Flooding . . . . . . . . . . . . . . . . . . . . . . 56 | 6.3.3. Flooding | |||
6.3.4. TIE Flooding Scopes . . . . . . . . . . . . . . . . . 65 | 6.3.4. TIE Flooding Scopes | |||
6.3.5. RAIN: RIFT Adjacency Inrush Notification . . . . . . 70 | 6.3.5. RAIN: RIFT Adjacency Inrush Notification | |||
6.3.6. Initial and Periodic Database Synchronization . . . . 70 | 6.3.6. Initial and Periodic Database Synchronization | |||
6.3.7. Purging and Roll-Overs . . . . . . . . . . . . . . . 70 | 6.3.7. Purging and Rollovers | |||
6.3.8. Southbound Default Route Origination . . . . . . . . 71 | 6.3.8. Southbound Default Route Origination | |||
6.3.9. Northbound TIE Flooding Reduction . . . . . . . . . . 72 | 6.3.9. Northbound TIE Flooding Reduction | |||
6.3.10. Special Considerations . . . . . . . . . . . . . . . 77 | 6.3.10. Special Considerations | |||
6.4. Reachability Computation . . . . . . . . . . . . . . . . 78 | 6.4. Reachability Computation | |||
6.4.1. Northbound Reachability SPF . . . . . . . . . . . . . 79 | 6.4.1. Northbound Reachability SPF | |||
6.4.2. Southbound Reachability SPF . . . . . . . . . . . . . 80 | 6.4.2. Southbound Reachability SPF | |||
6.4.3. East-West Forwarding Within a non-ToF Level . . . . . 80 | 6.4.3. East-West Forwarding Within a Non-ToF Level | |||
6.4.4. East-West Links Within ToF Level . . . . . . . . . . 80 | 6.4.4. East-West Links Within a ToF Level | |||
6.5. Automatic Disaggregation on Link & Node Failures . . . . 80 | 6.5. Automatic Disaggregation on Link & Node Failures | |||
6.5.1. Positive, Non-transitive Disaggregation . . . . . . . 80 | 6.5.1. Positive, Non-Transitive Disaggregation | |||
6.5.2. Negative, Transitive Disaggregation for Fallen | 6.5.2. Negative, Transitive Disaggregation for Fallen Leaves | |||
Leaves . . . . . . . . . . . . . . . . . . . . . . . 84 | 6.6. Attaching Prefixes | |||
6.6. Attaching Prefixes . . . . . . . . . . . . . . . . . . . 86 | 6.7. Optional Zero Touch Provisioning (RIFT ZTP) | |||
6.7. Optional Zero Touch Provisioning (RIFT ZTP) . . . . . . . 94 | 6.7.1. Terminology | |||
6.7.1. Terminology . . . . . . . . . . . . . . . . . . . . . 95 | 6.7.2. Automatic System ID Selection | |||
6.7.2. Automatic System ID Selection . . . . . . . . . . . . 97 | 6.7.3. Generic Fabric Example | |||
6.7.3. Generic Fabric Example . . . . . . . . . . . . . . . 97 | 6.7.4. Level Determination Procedure | |||
6.7.4. Level Determination Procedure . . . . . . . . . . . . 98 | 6.7.5. RIFT ZTP FSM | |||
6.7.5. RIFT ZTP FSM . . . . . . . . . . . . . . . . . . . . 100 | 6.7.6. Resulting Topologies | |||
6.7.6. Resulting Topologies . . . . . . . . . . . . . . . . 105 | 6.8. Further Mechanisms | |||
6.8. Further Mechanisms . . . . . . . . . . . . . . . . . . . 106 | 6.8.1. Route Preferences | |||
6.8.1. Route Preferences . . . . . . . . . . . . . . . . . . 106 | 6.8.2. Overload Bit | |||
6.8.2. Overload Bit . . . . . . . . . . . . . . . . . . . . 107 | 6.8.3. Optimized Route Computation on Leaves | |||
6.8.3. Optimized Route Computation on Leaves . . . . . . . . 107 | 6.8.4. Mobility | |||
6.8.4. Mobility . . . . . . . . . . . . . . . . . . . . . . 108 | 6.8.5. Key/Value (KV) Store | |||
6.8.5. Key/Value (KV) Store . . . . . . . . . . . . . . . . 111 | 6.8.6. Interactions with BFD | |||
6.8.6. Interactions with BFD . . . . . . . . . . . . . . . . 112 | 6.8.7. Fabric Bandwidth Balancing | |||
6.8.7. Fabric Bandwidth Balancing . . . . . . . . . . . . . 113 | 6.8.8. Label Binding | |||
6.8.8. Label Binding . . . . . . . . . . . . . . . . . . . . 116 | 6.8.9. L2L Procedures | |||
6.8.9. Leaf to Leaf Procedures . . . . . . . . . . . . . . . 116 | 6.8.10. Address Family and Multi-Topology Considerations | |||
6.8.10. Address Family and Multi Topology Considerations . . 117 | 6.8.11. One-Hop Healing of Levels with East-West Links | |||
6.8.11. One-Hop Healing of Levels with East-West Links . . . 117 | 6.9. Security | |||
6.9. Security . . . . . . . . . . . . . . . . . . . . . . . . 117 | 6.9.1. Security Model | |||
6.9.1. Security Model . . . . . . . . . . . . . . . . . . . 117 | 6.9.2. Security Mechanisms | |||
6.9.2. Security Mechanisms . . . . . . . . . . . . . . . . . 119 | 6.9.3. Security Envelope | |||
6.9.3. Security Envelope . . . . . . . . . . . . . . . . . . 120 | 6.9.4. Weak Nonces | |||
6.9.4. Weak Nonces . . . . . . . . . . . . . . . . . . . . . 124 | 6.9.5. Lifetime | |||
6.9.5. Lifetime . . . . . . . . . . . . . . . . . . . . . . 125 | 6.9.6. Security Association Changes | |||
6.9.6. Security Association Changes . . . . . . . . . . . . 125 | 7. Information Elements Schema | |||
7. Information Elements Schema . . . . . . . . . . . . . . . . . 125 | 7.1. Backwards-Compatible Extension of Schema | |||
7.1. Backwards-Compatible Extension of Schema . . . . . . . . 126 | 7.2. common.thrift | |||
7.2. common.thrift . . . . . . . . . . . . . . . . . . . . . . 127 | 7.3. encoding.thrift | |||
7.3. encoding.thrift . . . . . . . . . . . . . . . . . . . . . 133 | 8. Further Details on Implementation | |||
8. Further Details on Implementation . . . . . . . . . . . . . . 140 | 8.1. Considerations for Leaf-Only Implementation | |||
8.1. Considerations for Leaf-Only Implementation . . . . . . . 140 | 8.2. Considerations for Spine Implementation | |||
8.2. Considerations for Spine Implementation . . . . . . . . . 141 | 9. Security Considerations | |||
9. Security Considerations . . . . . . . . . . . . . . . . . . . 141 | 9.1. General | |||
9.1. General . . . . . . . . . . . . . . . . . . . . . . . . . 141 | 9.2. Time to Live and Hop Limit Values | |||
9.2. Time to Live and Hop Limit Values . . . . . . . . . . . . 142 | 9.3. Malformed Packets | |||
9.3. Malformed Packets . . . . . . . . . . . . . . . . . . . . 142 | 9.4. RIFT ZTP | |||
9.4. RIFT ZTP . . . . . . . . . . . . . . . . . . . . . . . . 143 | 9.5. Lifetime | |||
9.5. Lifetime . . . . . . . . . . . . . . . . . . . . . . . . 143 | 9.6. Packet Number | |||
9.6. Packet Number . . . . . . . . . . . . . . . . . . . . . . 143 | 9.7. Outer Fingerprint Attacks | |||
9.7. Outer Fingerprint Attacks . . . . . . . . . . . . . . . . 143 | 9.8. TIE Origin Fingerprint DoS Attacks | |||
9.8. TIE Origin Fingerprint DoS Attacks . . . . . . . . . . . 144 | 9.9. Host Implementations | |||
9.9. Host Implementations . . . . . . . . . . . . . . . . . . 144 | 9.9.1. IPv4 Broadcast and IPv6 All-Routers Multicast | |||
9.9.1. IPv4 Broadcast and IPv6 All Routers Multicast | Implementations | |||
Implementations . . . . . . . . . . . . . . . . . . . 145 | 10. IANA Considerations | |||
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 145 | 10.1. Multicast and Port Numbers | |||
10.1. Requested Multicast and Port Numbers . . . . . . . . . . 145 | 10.2. Registry for RIFT Security Algorithms | |||
10.2. Requested Registry for RIFT Security Algorithms . . . . 146 | 10.3. Registries with Assigned Values for Schema Values | |||
10.3. Requested Registries with Assigned Values for Schema | 10.3.1. RIFTVersions Registry | |||
Values . . . . . . . . . . . . . . . . . . . . . . . . . 147 | 10.3.2. RIFTCommonAddressFamilyType Registry | |||
10.3.1. Registry RIFT/Versions . . . . . . . . . . . . . . . 148 | 10.3.3. RIFTCommonHierarchyIndications Registry | |||
10.3.2. Registry RIFT/common/AddressFamilyType . . . . . . . 148 | 10.3.4. RIFTCommonIEEE8021ASTimeStampType Registry | |||
10.3.3. Registry RIFT/common/HierarchyIndications . . . . . 149 | 10.3.5. RIFTCommonIPAddressType Registry | |||
10.3.4. Registry RIFT/common/IEEE802_1ASTimeStampType . . . 149 | 10.3.6. RIFTCommonIPPrefixType Registry | |||
10.3.5. Registry RIFT/common/IPAddressType . . . . . . . . . 150 | 10.3.7. RIFTCommonIPv4PrefixType Registry | |||
10.3.6. Registry RIFT/common/IPPrefixType . . . . . . . . . 150 | 10.3.8. RIFTCommonIPv6PrefixType Registry | |||
10.3.7. Registry RIFT/common/IPv4PrefixType . . . . . . . . 151 | 10.3.9. RIFTCommonKVTypes Registry | |||
10.3.8. Registry RIFT/common/IPv6PrefixType . . . . . . . . 151 | 10.3.10. RIFTCommonPrefixSequenceType Registry | |||
10.3.9. Registry RIFT/common/KVTypes . . . . . . . . . . . . 152 | 10.3.11. RIFTCommonRouteType Registry | |||
10.3.10. Registry RIFT/common/PrefixSequenceType . . . . . . 152 | 10.3.12. RIFTCommonTIETypeType Registry | |||
10.3.11. Registry RIFT/common/RouteType . . . . . . . . . . . 153 | 10.3.13. RIFTCommonTieDirectionType Registry | |||
10.3.12. Registry RIFT/common/TIETypeType . . . . . . . . . . 154 | 10.3.14. RIFTEncodingCommunity Registry | |||
10.3.13. Registry RIFT/common/TieDirectionType . . . . . . . 155 | 10.3.15. RIFTEncodingKeyValueTIEElement Registry | |||
10.3.14. Registry RIFT/encoding/Community . . . . . . . . . . 156 | 10.3.16. RIFTEncodingKeyValueTIEElementContent Registry | |||
10.3.15. Registry RIFT/encoding/KeyValueTIEElement . . . . . 156 | 10.3.17. RIFTEncodingLIEPacket Registry | |||
10.3.16. Registry RIFT/encoding/KeyValueTIEElementContent . . 157 | 10.3.18. RIFTEncodingLinkCapabilities Registry | |||
10.3.17. Registry RIFT/encoding/LIEPacket . . . . . . . . . . 157 | 10.3.19. RIFTEncodingLinkIDPair Registry | |||
10.3.18. Registry RIFT/encoding/LinkCapabilities . . . . . . 160 | 10.3.20. RIFTEncodingNeighbor Registry | |||
10.3.19. Registry RIFT/encoding/LinkIDPair . . . . . . . . . 161 | 10.3.21. RIFTEncodingNodeCapabilities Registry | |||
10.3.20. Registry RIFT/encoding/Neighbor . . . . . . . . . . 163 | 10.3.22. RIFTEncodingNodeFlags Registry | |||
10.3.21. Registry RIFT/encoding/NodeCapabilities . . . . . . 163 | 10.3.23. RIFTEncodingNodeNeighborsTIEElement Registry | |||
10.3.22. Registry RIFT/encoding/NodeFlags . . . . . . . . . . 164 | 10.3.24. RIFTEncodingNodeTIEElement Registry | |||
10.3.23. Registry RIFT/encoding/NodeNeighborsTIEElement . . . 165 | 10.3.25. RIFTEncodingPacketContent Registry | |||
10.3.24. Registry RIFT/encoding/NodeTIEElement . . . . . . . 166 | 10.3.26. RIFTEncodingPacketHeader Registry | |||
10.3.25. Registry RIFT/encoding/PacketContent . . . . . . . . 167 | 10.3.27. RIFTEncodingPrefixAttributes Registry | |||
10.3.26. Registry RIFT/encoding/PacketHeader . . . . . . . . 168 | 10.3.28. RIFTEncodingPrefixTIEElement Registry | |||
10.3.27. Registry RIFT/encoding/PrefixAttributes . . . . . . 169 | 10.3.29. RIFTEncodingProtocolPacket Registry | |||
10.3.28. Registry RIFT/encoding/PrefixTIEElement . . . . . . 171 | 10.3.30. RIFTEncodingTIDEPacket Registry | |||
10.3.29. Registry RIFT/encoding/ProtocolPacket . . . . . . . 171 | 10.3.31. RIFTEncodingTIEElement Registry | |||
10.3.30. Registry RIFT/encoding/TIDEPacket . . . . . . . . . 171 | 10.3.32. RIFTEncodingTIEHeader Registry | |||
10.3.31. Registry RIFT/encoding/TIEElement . . . . . . . . . 172 | 10.3.33. RIFTEncodingTIEHeaderWithLifeTime Registry | |||
10.3.32. Registry RIFT/encoding/TIEHeader . . . . . . . . . . 173 | 10.3.34. RIFTEncodingTIEID Registry | |||
10.3.33. Registry RIFT/encoding/TIEHeaderWithLifeTime . . . . 174 | 10.3.35. RIFTEncodingTIEPacket Registry | |||
10.3.34. Registry RIFT/encoding/TIEID . . . . . . . . . . . . 175 | 10.3.36. RIFTEncodingTIREPacket Registry | |||
10.3.35. Registry RIFT/encoding/TIEPacket . . . . . . . . . . 175 | 11. References | |||
10.3.36. Registry RIFT/encoding/TIREPacket . . . . . . . . . 176 | 11.1. Normative References | |||
11. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 176 | 11.2. Informative References | |||
12. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 177 | Appendix A. Sequence Number Binary Arithmetic | |||
13. References . . . . . . . . . . . . . . . . . . . . . . . . . 178 | Appendix B. Examples | |||
13.1. Normative References . . . . . . . . . . . . . . . . . . 178 | B.1. Normal Operation | |||
13.2. Informative References . . . . . . . . . . . . . . . . . 180 | B.2. Leaf Link Failure | |||
Appendix A. Sequence Number Binary Arithmetic . . . . . . . . . 183 | B.3. Partitioned Fabric | |||
Appendix B. Examples . . . . . . . . . . . . . . . . . . . . . . 184 | B.4. Northbound Partitioned Router and Optional East-West Links | |||
B.1. Normal Operation . . . . . . . . . . . . . . . . . . . . 184 | Acknowledgments | |||
B.2. Leaf Link Failure . . . . . . . . . . . . . . . . . . . . 186 | Contributors | |||
B.3. Partitioned Fabric . . . . . . . . . . . . . . . . . . . 187 | Authors' Addresses | |||
B.4. Northbound Partitioned Router and Optional East-West | ||||
Links . . . . . . . . . . . . . . . . . . . . . . . . . . 188 | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 189 | ||||
1. Introduction | 1. Introduction | |||
Clos [CLOS] topologies have gained prominence in today's networking, | Clos [CLOS] topologies have gained prominence in today's networking, | |||
primarily as a result of the paradigm shift towards a centralized | primarily as a result of the paradigm shift towards a centralized | |||
data-center architecture that is poised to deliver a majority of | data center architecture that is poised to deliver a majority of | |||
computation and storage services in the future. Such networks are | computation and storage services in the future. Such networks are | |||
called commonly a fat tree/network in modern IP fabric considerations | commonly called a fat tree / network in modern IP fabric | |||
[VAHDAT08] as homonym to the original definition of the term | considerations [VAHDAT08] as a similar term for the original | |||
[FATTREE]. In most generic terms, and disregarding exceptions like | definition of the term Fat Tree [FATTREE]. In most generic terms, | |||
horizontal shortcuts, those networks are all variations of a | and disregarding exceptions like horizontal shortcuts, those networks | |||
structured design isomorphic to a ranked lattice where the least | are all variations of a structured design isomorphic to a ranked | |||
upper bound is the "top of the fabric" and links closer to the top | lattice where the least upper bound is the "top of the fabric" and | |||
may be "fatter" to guarantee non-blocking bi-sectional capacity. | links closer to the top may be "fatter" to guarantee non-blocking | |||
bisectional capacity. | ||||
Many builders of such IP fabrics desire a protocol that auto- | Many builders of such IP fabrics desire a protocol that | |||
configures itself and deals with failures and mis-configurations with | autoconfigures itself and deals with failures and misconfigurations | |||
a minimum of human intervention. Such a solution would allow local | with a minimum amount of human intervention. Such a solution would | |||
IP fabric bandwidth to be consumed in a 'standard component' fashion, | allow local IP fabric bandwidth to be consumed in a "standard | |||
i.e. provision it much faster and operate it at much lower costs than | component" fashion, i.e., provision it much faster and operate it at | |||
today, much like compute or storage is consumed already. | much lower costs than today, similar to how compute or storage is | |||
consumed already. | ||||
In looking at the problem through the lens of such IP fabric | In looking at the problem through the lens of such IP fabric | |||
requirements, RIFT (Routing in Fat Trees) addresses those challenges | requirements, Routing in Fat Trees (RIFT) addresses those challenges | |||
not through an incremental modification of either a link-state | not through an incremental modification of either a link-state | |||
(distributed computation) or distance-vector (diffused computation) | (distributed computation) or distance-vector (diffused computation) | |||
techniques but rather a mixture of both, briefly described as "link- | technique but rather a mixture of both, briefly described as "link- | |||
state towards the spines" and "distance vector towards the leaves". | state towards the spines" and "distance vector towards the leaves". | |||
In other words, "bottom" levels are flooding their link-state | In other words, "bottom" levels are flooding their link-state | |||
information in the "northern" direction while each node generates | information in the "northern" direction while each node generates | |||
under normal conditions a "default route" and floods it in the | under normal conditions a "default route" and floods it in the | |||
"southern" direction. This type of protocol naturally supports | "southern" direction. This type of protocol naturally supports | |||
highly desirable address aggregation. Alas, such aggregation could | highly desirable address aggregation. Alas, such aggregation could | |||
drop traffic in cases of misconfiguration or while failures are being | drop traffic in cases of misconfiguration or while failures are being | |||
resolved or even cause persistent network partitioning and this has | resolved. It could also cause persistent network partitioning, which | |||
to be addressed by some adequate mechanism. The approach RIFT takes | has to be addressed by some adequate mechanism. The approach RIFT | |||
is described in Section 6.5 and is based on automatic, sufficient | takes is described in Section 6.5 and is based on automatic, | |||
disaggregation of prefixes in case of link and node failures. | sufficient disaggregation of prefixes in case of link and node | |||
failures. | ||||
The protocol further provides: | The protocol further provides: | |||
* optional fully automated construction of fat tree topologies based | * optional fully automated construction of fat tree topologies based | |||
on detection of links without any configuration (Section 6.7), | on detection of links without any configuration (Section 6.7) | |||
while allowing for conventional configuration methods or an | while allowing for conventional configuration methods or an | |||
arbitrary mix of both, | arbitrary mix of both, | |||
* minimum amount of routing state held by nodes, | * the minimum amount of routing state held by nodes, | |||
* automatic pruning and load balancing of topology flooding | * automatic pruning and load balancing of topology flooding | |||
exchanges over a sufficient subset of links (Section 6.3.9), | exchanges over a sufficient subset of links (Section 6.3.9), | |||
* automatic address aggregation (Section 6.3.8) and consequently | * automatic address aggregation (Section 6.3.8) and consequently | |||
automatic disaggregation (Section 6.5) of prefixes on link and | automatic disaggregation (Section 6.5) of prefixes on link and | |||
node failures to prevent traffic loss and suboptimal routing, | node failures to prevent traffic loss and suboptimal routing, | |||
* loop-free non-ECMP forwarding due to its inherent valley-free | * loop-free non-ECMP forwarding due to its inherent valley-free | |||
nature, | nature, | |||
* fast mobility (Section 6.8.4), | * fast mobility (Section 6.8.4), | |||
* re-balancing of traffic towards the spines based on bandwidth | * rebalancing of traffic towards the spines based on bandwidth | |||
available (Section 6.8.7.1), and finally | available (Section 6.8.7.1), and finally | |||
* mechanisms to synchronize a limited key-value data-store | * mechanisms to synchronize a limited key-value datastore | |||
(Section 6.8.5.1) that can be used after protocol convergence to | (Section 6.8.5.1) that can be used after protocol convergence to, | |||
e.g. bootstrap higher levels of functionality on nodes. | e.g., bootstrap higher levels of functionality on nodes. | |||
Figure 1 illustrates a simplified, conceptual view of a RIFT fabric | Figure 1 illustrates a simplified, conceptual view of a RIFT fabric | |||
with its routing tables and topology databases using IPv4 as address | with its routing tables and topology databases using IPv4 as the | |||
family. The top of the fabric's link-state database holds | address family. The top of the fabric's link-state database holds | |||
information about the nodes below it and the routes to them. When | information about the nodes below it and the routes to them. When | |||
referring to Figure 1, /32 notation corresponds to each node's IPv4 | referring to Figure 1, /32 notation corresponds to each node's IPv4 | |||
loopback address (e.g. A/32 is node A's loopback, etc.) and 0/0 | loopback address (e.g., A/32 is node A's loopback, etc.) and 0/0 | |||
indicates a default IPv4 route. The first row of database | indicates a default IPv4 route. The first row of database | |||
information represents the nodes for which full topology information | information represents the nodes for which full topology information | |||
is available. The second row of database information indicates that | is available. The second row of database information indicates that | |||
partial information of other nodes in the same level is also | partial information of other nodes in the same level is also | |||
available. Such information will be needed to perform certain | available. Such information will be needed to perform certain | |||
algorithms necessary for correct protocol operation. When the | algorithms necessary for correct protocol operation. When the | |||
"bottom" (or in other words leaves) of the fabric is considered, the | "bottom" (or in other words leaves) of the fabric is considered, the | |||
topology is basically empty and, under normal conditions, the leaves | topology is basically empty and, under normal conditions, the leaves | |||
hold a load balanced default route to the next level. | hold a load-balanced default route to the next level. | |||
The remainder of this document fills in the protocol specification | The remainder of this document fills in the protocol specification | |||
details. | details. | |||
[A,B,C,D] | [A,B,C,D] | |||
[E] | [E] | |||
+---------+ +---------+ A/32 @ [C,D] | +---------+ +---------+ A/32 @ [C,D] | |||
| E | | F | B/32 @ [C,D] | | E | | F | B/32 @ [C,D] | |||
+-+-----+-+ +-+-----+-+ C/32 @ C | +-+-----+-+ +-+-----+-+ C/32 @ C | |||
skipping to change at page 8, line 9 | skipping to change at line 320 | |||
+-+-----+-+ +-+-----+-+ | +-+-----+-+ +-+-----+-+ | |||
0/0 @ [C,D] | A | | B | 0/0 @ [C,D] | 0/0 @ [C,D] | A | | B | 0/0 @ [C,D] | |||
+---------+ +---------+ | +---------+ +---------+ | |||
Figure 1: RIFT Information Distribution | Figure 1: RIFT Information Distribution | |||
1.1. Requirements Language | 1.1. Requirements Language | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
2. A Reader's Digest | 2. A Reader's Digest | |||
This section is an initial guided tour through the document in order | This section is an initial guided tour through the document in order | |||
to convey the necessary information for different readers, depending | to convey the necessary information for different readers, depending | |||
on their level of interest. The authors recommend reading the HTML | on their level of interest. The authors recommend reading the HTML | |||
or PDF versions of this document due to the inherent limitation of | or PDF versions of this document due to the inherent limitation of | |||
text version to represent complex figures. | text version to represent complex figures. | |||
The Terminology (Section 3.1) section should be used as a supporting | The "Terminology" (Section 3.1) section should be used as a | |||
reference as the document is read. | supporting reference as the document is read. | |||
The indications of direction (i.e. "top", "bottom", etc.) referenced | The indications of direction (i.e., "top", "bottom", etc.) referenced | |||
in Section 1 are of paramount importance. RIFT requires a topology | in Section 1 are of paramount importance. RIFT requires a topology | |||
with a sense of top and bottom in order to properly achieve a sorted | with a sense of top and bottom in order to properly achieve a sorted | |||
topology. Clos, Fat Tree, and other similarly structured networks | topology. Clos, fat tree, and other similarly structured networks | |||
are conducive to such requirements. Where RIFT does allow for | are conducive to such requirements. Where RIFT allows for further | |||
further relaxation of these constraints, this will be mentioned later | relaxation of these constraints will be mentioned later in this | |||
in this section. | section. | |||
Several of the images in this document are annotated with "northern | Several of the images in this document are annotated with "northern | |||
view" or "southern view" to indicate perspective to the reader. A | view" or "southern view" to indicate perspective to the reader. A | |||
"northern view" should be interpreted as "from the top of the fabric | "northern view" should be interpreted as "from the top of the fabric | |||
looking down", whereas "southern view" should be interpreted as "from | looking down", whereas "southern view" should be interpreted as "from | |||
the bottom looking up". | the bottom looking up". | |||
Operators and implementors alike must decide whether multi-plane IP | Operators and implementors alike must decide whether multi-plane IP | |||
fabrics are of interest for them. Section 3.2 illustrates an example | fabrics are of interest for them. Section 3.2 illustrates an example | |||
of both single-plane in Figure 2 and multi-plane fabric in Figure 3. | of both single-plane in Figure 2 and multi-plane fabric in Figure 3. | |||
Multi-plane fabrics require understanding of additional RIFT concepts | Multi-plane fabrics require understanding of additional RIFT concepts | |||
(e.g. negative disaggregation in Section 6.5.2) that are unnecessary | (e.g., negative disaggregation in Section 6.5.2) that are unnecessary | |||
in the context of fabrics consisting of a single-plane only. The | in the context of fabrics consisting of a single-plane only. | |||
Overview (Section 5) and Section 5.2 aim to provide enough context to | "Overview" (Section 5) and "Generalized Topology View" (Section 5.2) | |||
determine if multi-plane fabrics are of interest to the reader. The | aim to provide enough context to determine if multi-plane fabrics are | |||
Fallen Leaf part (Section 5.3), and additionally Section 5.4 and | of interest to the reader. "Fallen Leaf Problem" (Section 5.3) and | |||
Section 5.5 describe further considerations that are specific to | additionally Sections 5.4 and 5.5 describe further considerations | |||
multi-plane fabrics. | that are specific to multi-plane fabrics. | |||
The fundamental protocol concepts are described starting in the | The fundamental protocol concepts are described starting in | |||
specification part (Section 6), but some sub-sections are less | "Specification" (Section 6), but some subsections are less relevant | |||
relevant unless the protocol is being implemented. The protocol | unless the protocol is being implemented. The protocol transport | |||
transport (Section 6.1) is of particular importance for two reasons. | (Section 6.1) is of particular importance for two reasons. First, it | |||
First, it introduces RIFT's packet format content in the form of a | introduces RIFT's packet format content in the form of a normative | |||
normative Thrift [thrift] model given in Section 7.3 which is carried | Thrift [thrift] model given in Section 7.3, which is carried in an | |||
in according security envelope as described in Section 6.9.3. | according security envelope as described in Section 6.9.3. Second, | |||
Second, the Thrift model component is a prerequisite to understanding | the Thrift model component is a prerequisite to understanding | |||
RIFT's inherent security features as defined in both security | RIFT's inherent security features as defined in both "Security" | |||
models part (Section 6.9) and the security segment (Section 9). The | (Section 6.9) and "Security Considerations" (Section 9). The | |||
normative schema defining the Thrift model can be found in | normative schema defining the Thrift model can be found in Sections | |||
Section 7.2 and Section 7.3. Furthermore, while a detailed | 7.2 and 7.3. Furthermore, while a detailed understanding of Thrift | |||
understanding of Thrift [thrift] and the models is not required | [thrift] and the model is not required unless implementing RIFT, they | |||
unless implementing RIFT, they may provide additional useful | may provide additional useful information for other readers. | |||
information for other readers. | ||||
If implementing RIFT to support multi-plane topologies Section 6 | If implementing RIFT to support multi-plane topologies, Section 6 | |||
should be reviewed in its entirety in conjunction with the previously | should be reviewed in its entirety in conjunction with the previously | |||
mentioned Thrift schemas. Sections not relevant to single-plane | mentioned Thrift schemas. Sections not relevant to single-plane | |||
implementations will be noted later in this section. | implementations will be noted later in this section. | |||
All readers dealing with implementation of the protocol should pay | All readers dealing with implementation of the protocol should pay | |||
special attention to the Link Information Element (LIE) definitions | special attention to the Link Information Element (LIE) definitions | |||
part (Section 6.2) as it not only outlines basic neighbor discovery | (Section 6.2) as it not only outlines basic neighbor discovery and | |||
and adjacency formation, but also provides necessary context for | adjacency formation but also provides necessary context for RIFT's | |||
RIFT's optional Zero Touch Provisioning (ZTP) (Section 6.7) and mis- | optional Zero Touch Provisioning (ZTP) (Section 6.7) and miscabling | |||
cabling detection capabilities that allow it to automatically detect | detection capabilities that allow it to automatically detect and | |||
and build the underlay topology with basically no configuration. | build the underlay topology with basically no configuration. These | |||
These specific capabilities are detailed in Section 6.7. | specific capabilities are detailed in Section 6.7. | |||
For other readers, the following sections provide a more detailed | For other readers, the following sections provide a more detailed | |||
understanding of the fundamental properties and highlight some | understanding of the fundamental properties and highlight some | |||
additional benefits of RIFT such as link state packet formats, | additional benefits of RIFT, such as link-state packet formats, | |||
efficient flooding, synchronization, loop-free path computation and | efficient flooding, synchronization, loop-free path computation, and | |||
link-state database maintenance - Section 6.3, Section 6.3.2, | link-state database maintenance (see Sections 6.3, 6.3.2, 6.3.3, | |||
Section 6.3.3, Section 6.3.4, Section 6.3.6, Section 6.3.7, | 6.3.4, 6.3.6, 6.3.7, 6.3.8, 6.4, 6.4.1, 6.4.2, 6.4.3, and 6.4.4). | |||
Section 6.3.8, Section 6.4, Section 6.4.1, Section 6.4.2, | RIFT's ability to perform weighted unequal-cost load balancing of | |||
Section 6.4.3, Section 6.4.4. RIFT's ability to perform weighted | traffic across all available links is outlined in Section 6.8.7 with | |||
unequal-cost load balancing of traffic across all available links is | an accompanying example. | |||
outlined in Section 6.8.7 with an accompanying example. | ||||
Section 6.5 is the place where the single-plane vs. multi-plane | Section 6.5 is the place where the single-plane vs. multi-plane | |||
requirement is explained in more detail. For those interested in | requirement is explained in more detail. For those interested in | |||
single-plane fabrics, only Section 6.5.1 is required. For the multi- | single-plane fabrics, only Section 6.5.1 is required. For the multi- | |||
plane interested reader Section 6.5.2, Section 6.5.2.1, | plane-interested reader, Sections 6.5.2, 6.5.2.1, 6.5.2.2, and | |||
Section 6.5.2.2, and Section 6.5.2.3 are also mandatory. Section 6.6 | 6.5.2.3 are also mandatory. Section 6.6 is especially important for | |||
is especially important for any multi-plane interested reader as it | any multi-plane-interested reader as it outlines how the Routing | |||
outlines how the RIB (Routing Information Base) and FIB (Forwarding | Information Base (RIB) and Forwarding Information Base (FIB) are | |||
Information Base) are built via the disaggregation mechanisms, but | built via the disaggregation mechanisms but also illustrates how they | |||
also illustrates how they prevent defective routing decisions that | prevent defective routing decisions that cause traffic loss in both | |||
cause traffic loss in both single and multi-plane topologies. | single-plane and multi-plane topologies. | |||
Appendix B contains a set of comprehensive examples that show how | Appendix B contains a set of comprehensive examples that show how | |||
RIFT contains the impact of failures to only the required set of | RIFT contains the impact of failures to only the required set of | |||
nodes. It should also help cement some of RIFT's core concepts in | nodes. It should also help cement some of RIFT's core concepts in | |||
the reader's mind. | the reader's mind. | |||
Last, but not least, RIFT has other optional capabilities. One | Last but not least, RIFT has other optional capabilities. One | |||
example is the key-value data-store, which enables RIFT to advertise | example is the key-value datastore, which enables RIFT to advertise | |||
data post-convergence in order to bootstrap higher levels of | data post-convergence in order to bootstrap higher levels of | |||
functionality (e.g. operational telemetry). Those are covered in | functionality (e.g., operational telemetry). Those are covered in | |||
Section 6.8. | Section 6.8. | |||
More information related to RIFT can be found in the "RIFT | More information related to RIFT can be found in the "RIFT | |||
Applicability" [APPLICABILITY] document, which discusses alternate | Applicability" [APPLICABILITY] document, which discusses alternate | |||
topologies upon which RIFT may be deployed, use cases where it is | topologies upon which RIFT may be deployed, describes use cases where | |||
applicable, and presents operational considerations that complement | it is applicable, and presents operational considerations that | |||
this document. The RIFT DayOne [DayOne] book covers some practical | complement this document. "RIFT Day One" [DayOne] covers some | |||
details of existing RIFT implementations and deployment details. | practical details of existing RIFT implementations and deployment | |||
details. | ||||
3. Reference Frame | 3. Reference Frame | |||
3.1. Terminology | 3.1. Terminology | |||
This section presents the terminology used in this document. | This section presents the terminology used in this document. | |||
Bandwidth Adjusted Distance (BAD): | Bandwidth Adjusted Distance (BAD): | |||
Each RIFT node can calculate the amount of northbound bandwidth | Each RIFT node can calculate the amount of northbound bandwidth | |||
available towards a node compared to other nodes at the same level | available towards a node compared to other nodes at the same level | |||
and can modify the route distance accordingly to allow for the | and can modify the route distance accordingly to allow for the | |||
lower level to adjust their load balancing towards spines. | lower level to adjust their load balancing towards spines. | |||
Bi-directional Adjacency: | Bidirectional Adjacency: | |||
Bidirectional adjacency is an adjacency where nodes of both sides | Bidirectional adjacency is an adjacency where nodes of both sides | |||
of the adjacency advertised it in the Node TIEs with the correct | of the adjacency advertised it in the Node TIEs with the correct | |||
levels and System IDs. Bi-directionality is used to check in | levels and System IDs. Bidirectionality is used to check in | |||
different algorithms whether the link should be included. | different algorithms whether the link should be included. | |||
Bow-tying: | Bow-tying: | |||
Traffic patterns in fully converged IP fabrics traverse normally | Traffic patterns in fully converged IP fabrics normally traverse | |||
the shortest route based on hop count toward their destination | the shortest route based on hop count towards their destination | |||
(e.g., leaf, spine, leaf). Some failure scenarios with partial | (e.g., leaf, spine, leaf). Some failure scenarios with partial | |||
routing information cause nodes to lose the required downstream | routing information cause nodes to lose the required downstream | |||
reachability to a destination and force traffic to utilize routes | reachability to a destination and force traffic to utilize routes | |||
that traverse higher levels in the fabric in order to turn south | that traverse higher levels in the fabric in order to turn south | |||
again using a different route to resolve reachability (e.g., leaf, | again using a different route to resolve reachability (e.g., leaf, | |||
spine-1, super-spine, spine-2, leaf). | spine-1, superspine, spine-2, leaf). | |||
Clos/Fat Tree: | Clos / fat tree: | |||
This document uses the terms Clos and Fat Tree interchangeably | This document uses the terms "Clos" and "fat tree" interchangeably | |||
where it always refers to a folded spine-and-leaf topology with | where it always refers to a folded spine-and-leaf topology with | |||
possibly multiple Points of Delivery (PoDs) and one or multiple | possibly multiple Points of Delivery (PoDs) and one or multiple | |||
Top of Fabric (ToF) planes. Several modifications such as leaf- | Top of Fabric (ToF) planes. Several modifications such as L2L | |||
2-leaf shortcuts and multiple level shortcuts are possible and | shortcuts and multi-level shortcuts are possible and described | |||
described further in the document. | further in the document. | |||
Cost: | Cost: | |||
A natural number without a unit associated with two entities. The | A natural number without the unit associated with two entities. | |||
usual natural numbers algebra can be applied to costs. A cost may | The cost is a monoid under addition. A cost may be associated | |||
be associated with either a single link or prefix or it may | with either a single link or prefix, or it may represent the sum | |||
represent the sum of costs (distance) of links in the path between | of costs (distance) of links in the path between two nodes. | |||
two nodes. | ||||
Crossbar: | Crossbar: | |||
Physical arrangement of ports in a switching matrix without | Physical arrangement of ports in a switching matrix without | |||
implying any further scheduling or buffering disciplines. | implying any further scheduling or buffering disciplines. | |||
Directed Acyclic Graph (DAG): | Directed Acyclic Graph (DAG): | |||
A finite directed graph with no directed cycles (loops). If links | A finite directed graph with no directed cycles (loops). If links | |||
in a Clos are considered as either being all directed towards the | in a Clos are considered as either being all directed towards the | |||
top or vice versa, each of such two graphs is a DAG. | top or vice versa, each of two such graphs is a DAG. | |||
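As a rough illustration of the DAG entry above, the following sketch orients every link of a small, made-up three-level Clos northbound and checks that the resulting graph has no directed cycle. The node names, the level map, and the helpers `northbound_dag` and `is_acyclic` are hypothetical and not part of RIFT; Python's standard `graphlib` module serves only as a convenient cycle detector.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical fabric: node -> level (0 = leaf, highest value = ToF).
LEVELS = {"leaf1": 0, "leaf2": 0, "spine1": 1, "spine2": 1, "tof1": 2}

# Undirected physical links between adjacent levels.
LINKS = [
    ("leaf1", "spine1"), ("leaf1", "spine2"),
    ("leaf2", "spine1"), ("leaf2", "spine2"),
    ("spine1", "tof1"), ("spine2", "tof1"),
]

def northbound_dag(levels, links):
    """Direct every link from its lower-level end to its higher-level end.

    Returns a mapping: node -> set of its southern neighbors (predecessors).
    """
    preds = {node: set() for node in levels}
    for a, b in links:
        if levels[a] == levels[b]:
            # East-West links have no north/south orientation in this sketch.
            raise ValueError("East-West link cannot be oriented north/south")
        south, north = (a, b) if levels[a] < levels[b] else (b, a)
        preds[north].add(south)
    return preds

def is_acyclic(preds):
    """Return True if the predecessor mapping describes a DAG."""
    try:
        # Forcing the iterator makes graphlib detect cycles.
        list(TopologicalSorter(preds).static_order())
        return True
    except CycleError:
        return False
```

Reversing every edge (all links directed towards the bottom) yields the second DAG mentioned in the definition.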
Disaggregation: | Disaggregation: | |||
Process in which a node decides to advertise more specific | The process in which a node decides to advertise more specific | |||
prefixes Southwards, either positively to attract the | prefixes southwards, either positively to attract the | |||
corresponding traffic, or negatively to repel it. Disaggregation | corresponding traffic or negatively to repel it. Disaggregation | |||
is performed to prevent traffic loss and suboptimal routing to the | is performed to prevent traffic loss and suboptimal routing to the | |||
more specific prefixes. | more specific prefixes. | |||
Distance: | Distance: | |||
The sum of costs (bound by infinite cost constant) between two | The sum of costs (bound by the infinite cost constant) between two | |||
nodes. A distance is primarily used to express separation between | nodes. A distance is primarily used to express separation between | |||
two entities and can be used again as cost in another context. | two entities and can be used again as cost in another context. | |||
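The relationship between cost and distance can be sketched as a bounded sum. The example below is only an illustration under stated assumptions: the value chosen for `INFINITE_COST` is a placeholder (the normative constant lives in the encoding schema), and the helper names `add_costs` and `distance` are hypothetical.

```python
# Placeholder for the infinite cost constant; the normative value is
# defined by the encoding schema, not by this sketch.
INFINITE_COST = 0x7FFFFFFF

def add_costs(a: int, b: int) -> int:
    """Add two costs, saturating at INFINITE_COST."""
    if a < 0 or b < 0:
        raise ValueError("costs are natural numbers")
    return min(a + b, INFINITE_COST)

def distance(link_costs) -> int:
    """Distance of a path: the bounded sum of the costs of its links."""
    total = 0  # starting value for the sum in this sketch
    for cost in link_costs:
        total = add_costs(total, cost)
    return total
```

A distance computed this way can be fed back into `add_costs` as a cost in another context, which mirrors the last sentence of the definition.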
East-West (E-W) Link: | East-West (E-W) Link: | |||
A link between two nodes at the same level. East-West links are | A link between two nodes at the same level. East-West links are | |||
normally not part of Clos or "fat tree" topologies. | normally not part of Clos or fat tree topologies. | |||
Flood Repeater (FR): | Flood Repeater (FR): | |||
A node can designate one or more northbound neighbor nodes to be | A node can designate one or more northbound neighbor nodes to be | |||
flood repeaters. The flood repeaters are responsible for flooding | flood repeaters. The flood repeaters are responsible for flooding | |||
northbound TIEs further north. The document sometimes calls them | northbound TIEs further north. The document sometimes calls them | |||
flood leaders as well. | flood leaders as well. | |||
Folded Spine-and-Leaf: | Folded Spine-and-Leaf: | |||
In case the Clos fabric input and output stages are equivalent, | In case the Clos fabric input and output stages are equivalent, | |||
the fabric can be "folded" to build a "superspine" or top which is | the fabric can be "folded" to build a "superspine" or top, which | |||
called the ToF in this document. | is called the ToF in this document. | |||
Interface: | Interface: | |||
A layer 3 entity over which RIFT control packets are exchanged. | A layer 3 entity over which RIFT control packets are exchanged. | |||
Key Value (KV) TIE: | Key Value (KV) TIE: | |||
A TIE that is carrying a set of key value pairs [DYNAMO]. It can | A TIE that is carrying a set of key value pairs [DYNAMO]. It can | |||
be used to distribute non topology related information within the | be used to distribute non-topology-related information within the | |||
protocol. | protocol. | |||
Leaf-to-Leaf Shortcuts (L2L): | Leaf-to-Leaf (L2L) Shortcuts: | |||
East-West links at leaf level will need to be differentiated from | East-West links at leaf level will need to be differentiated from | |||
East-West links at other levels. | East-West links at other levels. | |||
Leaf: | Leaf: | |||
A node without southbound adjacencies. Level 0 implies a leaf in | A node without southbound adjacencies. Level 0 implies a leaf in | |||
RIFT but a leaf does not have to be level 0. | RIFT, but a leaf does not have to be level 0. | |||
Level: | Level: | |||
Clos and Fat Tree networks are topologically partially ordered | Clos and fat tree networks are topologically partially ordered | |||
graphs and 'level' denotes the set of nodes at the same height in | graphs, and "level" denotes the set of nodes at the same height in | |||
such a network. Nodes at the top level (i.e., ToF) are at the | such a network. Nodes at the top level (i.e., ToF) are at the | |||
level with the highest value and count down to the nodes at the | level with the highest value and count down to the nodes at the | |||
bottom level (i.e., leaf) with the lowest value. A node will have | bottom level (i.e., leaf) with the lowest value. A node will have | |||
links to nodes one level down and/or one level up. In some | links to nodes one level down and/or one level up. In some | |||
circumstances, a node may have links to other nodes at the same | circumstances, a node may have links to other nodes at the same | |||
level. A leaf node may also have links to nodes multiple levels | level. A leaf node may also have links to nodes multiple levels | |||
higher. In RIFT, Level 0 always indicates that a node is a leaf, | higher. In RIFT, level 0 always indicates that a node is a leaf | |||
but a leaf does not have to be level 0. Level values can be configured | but a leaf does not have to be level 0. Level values can be configured | |||
manually or automatically derived via Section 6.7. As a final | manually or automatically as described in Section 6.7. | |||
footnote: Clos terminology often uses the concept of "stage", but | ||||
due to the folded nature of the Fat Tree it is not used from this | | As a final footnote: Clos terminology often uses the concept | |||
point on to prevent misunderstandings. | | of "stage", but due to the folded nature of the fat tree, it | |||
| is not used from this point on to prevent misunderstandings. | ||||
LIE: | LIE: | |||
This is an acronym for a "Link Information Element" exchanged on | This is an acronym for a "Link Information Element" exchanged on | |||
all the system's links running RIFT to form _ThreeWay_ adjacencies | all the system's links running RIFT to form _ThreeWay_ adjacencies | |||
and carry information used to perform RIFT Zero Touch Provisioning | and carry information used to perform RIFT Zero Touch Provisioning | |||
(ZTP) of levels. | (ZTP) of levels. | |||
Metric: | Metric: | |||
Used interchangeably with cost. | Used interchangeably with "cost". | |||
Neighbor: | Neighbor: | |||
Once a _ThreeWay_ adjacency has been formed a neighborship | Once a _ThreeWay_ adjacency has been formed, a neighborship | |||
relationship contains the neighbor's properties. Multiple | relationship contains the neighbor's properties. Multiple | |||
adjacencies can be formed to a remote node via parallel point-to- | adjacencies can be formed to a remote node via parallel point-to- | |||
point interfaces but such adjacencies are *not* sharing a neighbor | point interfaces, but such adjacencies are *not* sharing a | |||
structure. Saying "neighbor" is thus equivalent to saying "a | neighbor structure. Saying "neighbor" is thus equivalent to | |||
_ThreeWay_ adjacency". | saying "a _ThreeWay_ adjacency". | |||
Node TIE: | Node TIE: | |||
This stands as acronym for a "Node Topology Information Element", | This is an acronym for a "Node Topology Information Element", | |||
which contains all adjacencies the node discovered and information | which contains all adjacencies the node discovered and information | |||
about the node itself. Node TIE should not be confused with a | about the node itself. Node TIE should not be confused with a | |||
North TIE since "node" defines the type of TIE rather than its | North TIE since "node" defines the type of TIE rather than its | |||
direction. Consequently, North Node TIEs and South Node TIEs | direction. Consequently, North Node TIEs and South Node TIEs | |||
exist. | exist. | |||
North SPF (N-SPF): | North SPF (N-SPF): | |||
A reachability calculation that is progressing northbound, as | A reachability calculation that is progressing northbound, for | |||
example SPF that is using South Node TIEs only. Normally it | example, SPF that is using South Node TIEs only. Normally it | |||
progresses a single hop only and installs default routes. | progresses by only a single hop and installs default routes. | |||
Northbound Link: | Northbound Link: | |||
A link to a node one level up or in other words, one level further | A link to a node one level up or, in other words, one level | |||
north. | further north. | |||
Northbound representation: | Northbound Representation: | |||
Subset of topology information flooded towards higher levels of | The subset of topology information flooded towards higher levels | |||
the fabric. | of the fabric. | |||
Overloaded: | Overloaded: | |||
Applies to a node advertising the _overload_ attribute as set. | Applies to a node advertising the _overload_ attribute as set. | |||
Overload attribute is carried in the _NodeFlags_ object of the | The overload attribute is carried in the _NodeFlags_ object of the | |||
encoding schema. | encoding schema. | |||
Point of Delivery (PoD): | Point of Delivery (PoD): | |||
A self-contained vertical slice or subset of a Clos or Fat Tree | A self-contained vertical slice or subset of a Clos or fat tree | |||
network containing normally only level 0 and level 1 nodes. A | network normally containing only level 0 and level 1 nodes. A | |||
node in a PoD communicates with nodes in other PoDs via the ToF | node in a PoD communicates with nodes in other PoDs via the ToF | |||
nodes. PoDs are numbered to distinguish them and PoD value 0 | nodes. PoDs are numbered to distinguish them, and PoD value 0 | |||
(defined later in the encoding schema as _common.default_pod_) is | (defined later in the encoding schema as _common.default_pod_) is | |||
used to denote "undefined" or "any" PoD. | used to denote "undefined" or "any" PoD. | |||
Prefix TIE: | Prefix TIE: | |||
This is an acronym for a "Prefix Topology Information Element" and | This is an acronym for a "Prefix Topology Information Element", | |||
it contains all prefixes directly attached to this node in case of | and it contains all prefixes directly attached to this node in | |||
a North TIE and in case of South TIE the necessary default routes | case of a North TIE and the necessary default routes the node | |||
the node advertises southbound. | advertises southbound in case of a South TIE. | |||
Radix: | Radix: | |||
A radix of a switch is the number of switching ports it provides. | A radix of a switch is the number of switching ports it provides. | |||
It's sometimes called fanout as well. | It's sometimes called "fanout" as well. | |||
Routing on the Host (RotH): | Routing on the Host (RotH): | |||
Modern data center architecture variant where servers/leaves are | A modern data center architecture variant where servers/leaves are | |||
multi-homed and consequently participate in routing. | multihomed and consequently participate in routing. | |||
Security Envelope: | Security Envelope: | |||
RIFT packets are flooded within an authenticated security envelope | RIFT packets are flooded within an authenticated security envelope | |||
that allows to protect the integrity of information a node accepts | that optionally enables protection of the integrity of information | |||
if any of the mechanisms in Section 10.2 is used. This is further | a node accepts if any of the mechanisms in Section 10.2 are used. | |||
described in Section 6.9.3. | This is further described in Section 6.9.3. | |||
Shortest-Path First (SPF): | Shortest Path First (SPF): | |||
A well-known graph algorithm attributed to Dijkstra [DIJKSTRA] | A well-known graph algorithm attributed to Dijkstra [DIJKSTRA] | |||
that establishes a tree of shortest paths from a source to | that establishes a tree of shortest paths from a source to | |||
destinations on the graph. SPF acronym is used due to its | destinations on the graph. The SPF acronym is used due to its | |||
familiarity as general term for the node reachability calculations | familiarity as a general term for the node reachability | |||
RIFT can employ to ultimately calculate routes of which Dijkstra | calculations RIFT can employ to ultimately calculate routes, of | |||
algorithm is a possible one. | which Dijkstra's algorithm is a possible one. | |||
South Reflection: | South Reflection: | |||
Often abbreviated just as "reflection", it defines a mechanism | Often abbreviated just as "reflection", it defines a mechanism | |||
where South Node TIEs are "reflected" from the level south back up | where South Node TIEs are "reflected" from the level south back up | |||
north to allow nodes in the same level without E-W links to be | north to allow nodes in the same level without E-W links to be | |||
aware of each other's node Topology Information Elements (TIEs). | aware of each other's node Topology Information Elements (TIEs). | |||
South SPF (S-SPF): | South SPF (S-SPF): | |||
A reachability calculation that is progressing southbound, as | A reachability calculation that is progressing southbound, for | |||
example SPF that is using North Node TIEs only. | example, SPF that is using North Node TIEs only. | |||
South/Southbound and North/Northbound (Direction): | South/Southbound and North/Northbound (Direction): | |||
When describing protocol elements and procedures, in different | When describing protocol elements and procedures, in different | |||
situations the directionality of the compass is used. i.e., | situations, the directionality of the compass is used, i.e., | |||
'lower', 'south' or 'southbound' mean moving towards the bottom of | "lower", "south", and "southbound" mean moving towards the bottom | |||
the Clos or Fat Tree network and 'higher', 'north' and | of the Clos or fat tree network and "higher", "north", and | |||
'northbound' mean moving towards the top of the Clos or Fat Tree | "northbound" mean moving towards the top of the Clos or fat tree | |||
network. | network. | |||
Southbound Link: | Southbound Link: | |||
A link to a node one level down or in other words, one level | A link to a node one level down or, in other words, one level | |||
further south. | further south. | |||
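The link terms in this glossary (Northbound Link, Southbound Link, and East-West Link) can be read as a simple comparison of the levels at the two ends of a link. The sketch below shows one possible reading; the function name `link_direction` and the returned labels are hypothetical. Links that span more than one level, which the Level entry allows for leaf nodes, are deliberately left unclassified here.

```python
def link_direction(local_level: int, remote_level: int) -> str:
    """Classify a link as seen from the local node, based on levels only."""
    if remote_level == local_level:
        return "east-west"    # both ends at the same level
    if remote_level == local_level + 1:
        return "northbound"   # neighbor is one level up
    if remote_level == local_level - 1:
        return "southbound"   # neighbor is one level down
    # More than one level apart: not covered by the three glossary terms.
    return "other"
```

For example, a level 1 node sees a link to a level 2 node as northbound, while the level 2 node sees the same link as southbound.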
Southbound representation: | Southbound Representation: | |||
Subset of topology information sent towards a lower level. | The subset of topology information sent towards a lower level. | |||
Spine: | Spine: | |||
Any nodes north of leaves and south of ToF nodes. Multiple layers | Any nodes north of leaves and south of ToF nodes. Multiple layers | |||
of spines in a PoD are possible. | of spines in a PoD are possible. | |||
Superspine, Aggregation/Spine and Edge/Leaf Switches:" | Superspine, Aggregation/Spine, and Edge/Leaf Switches: | |||
Traditional level names in 5-stages folded Clos for Level 2, 1 and | Typical level names in 5 stages folded Clos for levels 2, 1, and | |||
0 respectively (counting up from the bottom). We normalize this | 0, respectively (counting up from the bottom). We normalize this | |||
language to talk about ToF, Top-of-Pod (ToP) and leaves. | language to talk about ToF, Top-of-Pod (ToP), and leaves. | |||
System ID: | System ID: | |||
RIFT nodes identify themselves with a unique network-wide number | RIFT nodes identify themselves with a unique network-wide number | |||
when trying to build adjacencies or describe their topology. RIFT | when trying to build adjacencies or describe their topology. RIFT | |||
System IDs can be auto-derived or configured. | System IDs can be auto-derived or configured. | |||
ThreeWay Adjacency: | ThreeWay Adjacency: | |||
RIFT tries to form a unique adjacency between two nodes over a | RIFT tries to form a unique adjacency between two nodes over a | |||
point-to-point interface and exchange local configuration and | point-to-point interface and exchange local configuration and | |||
necessary RIFT ZTP information. An adjacency is only advertised | necessary RIFT ZTP information. An adjacency is only advertised | |||
in Node TIEs and used for computations after it achieved | in Node TIEs and used for computations after it achieved | |||
_ThreeWay_ state, i.e. both routers reflected each other in LIEs | _ThreeWay_ state, i.e., both routers reflected each other in LIEs, | |||
including relevant security information. Nevertheless, LIEs | including relevant security information. Nevertheless, LIEs | |||
before _ThreeWay_ state is reached may carry RIFT ZTP related | before _ThreeWay_ state is reached may already carry information | |||
information already. | related to RIFT ZTP. | |||
TIDE: | TIDE: | |||
Topology Information Description Element carrying descriptors of | The Topology Information Description Element carries descriptors | |||
the TIEs stored in the node. | of the TIEs stored in the node. | |||
TIE: | TIE: | |||
This is an acronym for a "Topology Information Element". TIEs are | This is an acronym for a "Topology Information Element". TIEs are | |||
exchanged between RIFT nodes to describe parts of a network such | exchanged between RIFT nodes to describe parts of a network such | |||
as links and address prefixes. A TIE has always a direction and a | as links and address prefixes. A TIE always has a direction and a | |||
type. North TIEs (sometimes abbreviated as N-TIEs) are used when | type. North TIEs (sometimes abbreviated as N-TIEs) are used when | |||
dealing with TIEs in the northbound representation and South-TIEs | dealing with TIEs in the northbound representation, and South-TIEs | |||
(sometimes abbreviated as S-TIEs) for the southbound equivalent. | are used (sometimes abbreviated as S-TIEs) for the southbound | |||
TIEs have different types such as node and prefix TIEs. | equivalent. TIEs have different types, such as node and prefix | |||
TIEs. | ||||
TIEDB: | TIEDB: | |||
The database holding the newest versions of all TIE headers (and | The database holding the newest versions of all TIE headers (and | |||
the corresponding TIE content if it is available). | the corresponding TIE content if it is available). | |||
TIRE: | TIRE: | |||
Topology Information Request Element carrying set of TIDE | The Topology Information Request Element carries a set of TIDE | |||
descriptors. It can both confirm received and request missing | descriptors. It can both confirm received and request missing | |||
TIEs. | TIEs. | |||
Top of Fabric (ToF): | Top of Fabric (ToF): | |||
The set of nodes that provide inter-PoD communication and have no | The set of nodes that provide inter-PoD communication and have no | |||
northbound adjacencies, i.e. are at the "very top" of the fabric. | northbound adjacencies, i.e., are at the "very top" of the fabric. | |||
ToF nodes do not belong to any PoD and are assigned | ToF nodes do not belong to any PoD and are assigned the | |||
_common.default_pod_ PoD value to indicate the equivalent of "any" | _common.default_pod_ PoD value to indicate the equivalent of "any" | |||
PoD. | PoD. | |||
Top of PoD (ToP): | Top of PoD (ToP): | |||
The set of nodes that provide intra-PoD communication and have | The set of nodes that provide intra-PoD communication and have | |||
northbound adjacencies outside of the PoD, i.e. are at the "top" | northbound adjacencies outside of the PoD, i.e., are at the "top" | |||
of the PoD. | of the PoD. | |||
ToF Plane or Partition: | ToF Plane or Partition: | |||
In large fabrics ToF switches may not have enough ports to | In large fabrics, ToF switches may not have enough ports to | |||
aggregate all switches south of them and with that, the ToF is | aggregate all switches south of them, and with that, the ToF is | |||
'split' into multiple independent planes. Section 5.2 explains | "split" into multiple independent planes. Section 5.2 explains | |||
the concept in more detail. A plane is a subset of ToF nodes that | the concept in more detail. A plane is a subset of ToF nodes that | |||
are aware of each other through south reflection or E-W links. | are aware of each other through south reflection or E-W links. | |||
Valid LIE: | Valid LIE: | |||
LIEs undergo different checks to determine their validity. The | LIEs undergo different checks to determine their validity. The | |||
term "valid LIE" is used to describe a LIE that can be used to | term "valid LIE" is used to describe a LIE that can be used to | |||
form or maintain an adjacency. The amount of checking itself | form or maintain an adjacency. The amount of checking itself | |||
depends on the FSM (Finite State Machine) involved and its state. | depends on the Finite State Machine (FSM) involved and its state. | |||
A "minimally valid LIE" is a LIE that passes checks necessary on | A "minimally valid LIE" is a LIE that passes checks necessary on | |||
any FSM in any state. A "ThreeWay valid LIE" is a LIE that | any FSM in any state. A "ThreeWay valid LIE" is a LIE that | |||
successfully underwent further checks with a LIE FSM in _ThreeWay_ | successfully underwent further checks with a LIE FSM in _ThreeWay_ | |||
state. Minimally valid LIE is a subcategory of _ThreeWay_ valid | state. A minimally valid LIE is a subcategory of a _ThreeWay_ | |||
LIE. | valid LIE. | |||
RIFT Zero Touch Provisioning (abbreviated as RIFT ZTP or just | RIFT Zero Touch Provisioning (abbreviated as RIFT ZTP or just | |||
ZTP): | ZTP): | |||
Optional RIFT mechanism which allows the automatic derivation of | An optional RIFT mechanism that allows the automatic derivation of | |||
node levels based on minimum configuration as detailed in | node levels based on minimum configuration, as detailed in | |||
Section 6.7. Such a mininum configuration consists solely of ToFs | Section 6.7. Such a minimum configuration consists solely of ToFs | |||
being configured as such. RIFT ZTP contains a recommendation for | being configured as such. RIFT ZTP contains a recommendation for | |||
automatic collision-free derivation of the System ID as well. | automatic collision-free derivation of the System ID as well. | |||
Additionally, when the specification refers to elements of packet | Additionally, when the specification refers to elements of packet | |||
encoding or constants provided in the Section 7 a special emphasis is | encoding or the constants provided in Section 7, a special emphasis | |||
used, e.g. _invalid_distance_. The same convention is used when | is used, e.g., _invalid_distance_. The same convention is used when | |||
referring to finite state machine states or events outside the | referring to finite state machine states or events outside the | |||
context of the machine itself, e.g., _OneWay_. | context of the machine itself, e.g., _OneWay_. | |||
3.2. Topology | 3.2. Topology | |||
^ N +--------+ +--------+ | ^ N +--------+ +--------+ | |||
Level 2 | |ToF 21| |ToF 22| | Level 2 | |ToF 21| |ToF 22| | |||
W <-*-> E ++-+--+-++ ++-+--+-++ | W <-*-> E ++-+--+-++ ++-+--+-++ | |||
| | | | | | | | | | | | | | | | | | | | |||
S v P111/2 P121/2 | | | | | S v P111/2 P121/2 | | | | | |||
^ ^ ^ ^ | | | | | ^ ^ ^ ^ | | | | | |||
| | | | | | | | | | | | | | | | | | |||
+--------------+ | +-----------+ | | | +---------------+ | +--------------+ | +-----------+ | | | +---------------+ | |||
| | | | | | | | | | | | | | | | | | |||
South +-----------------------------+ | | ^ | South +-----------------------------+ | | ^ | |||
skipping to change at page 17, line 34 ¶ | skipping to change at line 768 ¶ | |||
| +---0/0--->-----+ 0/0 | +----------------+ | | | +---0/0--->-----+ 0/0 | +----------------+ | | |||
0/0 | | | | | | | | 0/0 | | | | | | | | |||
| +---<-0/0-----+ | v | +--------------+ | | | | +---<-0/0-----+ | v | +--------------+ | | | |||
v | | | | | | | | v | | | | | | | | |||
+-+---+-+ +--+--+-+ +-+---+-+ +---+-+-+ | +-+---+-+ +--+--+-+ +-+---+-+ +---+-+-+ | |||
Level 0 | | (L2L) | | | | | | | Level 0 | | (L2L) | | | | | | | |||
|Leaf111+~~~~~~~~~~+Leaf112| |Leaf121| |Leaf122| | |Leaf111+~~~~~~~~~~+Leaf112| |Leaf121| |Leaf122| | |||
+-+-----+ +-+---+-+ +--+--+-+ +-+-----+ | +-+-----+ +-+---+-+ +--+--+-+ +-+-----+ | |||
+ + \ / + + | + + \ / + + | |||
Prefix111 Prefix112 \ / Prefix121 Prefix122 | Prefix111 Prefix112 \ / Prefix121 Prefix122 | |||
multi-homed | multihomed | |||
Prefix | Prefix | |||
+---------- PoD 1 ---------+ +---------- PoD 2 ---------+ | +---------- PoD 1 ---------+ +---------- PoD 2 ---------+ | |||
Figure 2: A Three Level Spine-and-Leaf Topology | Figure 2: A Three-Level Spine-and-Leaf Topology | |||
____________________________________________________________________________ | ||||
| [Plane A] . [Plane B] . [Plane C] . [Plane D] | | ||||
|..........................................................................| | ||||
| +-+ . +-+ . +-+ . +-+ | | ||||
| |n| . |n| . |n| . |n| | | ||||
| +++ . +++ . +++ . +++ | | ||||
| . | | . . | | . . | | . . | | | | ||||
| . | | . . | | . . | | . . | | | | ||||
| +-+ | | . +-+ | | . +-+ | | . +-+ | | | | ||||
| |1| +-+ | . |1| +-+ | . |1| +-+ | . |1| +-+ | | | ||||
| +++ | | . +++ | | . +++ | | . +++ | | | | ||||
| || | | . || | | . || | | . || | | | | ||||
| || | | . || | | . || | | . || | | | | ||||
| |+--|--+| . |+--|--+| . |+--|--+| . |+--|----+ | | ||||
| | | || . | | || . | | || . | | || | | ||||
| | | || . | | || . | | || . | | +|---+ | | ||||
=====|===|==||=========|===|==||=========|===|==||=========|===|====|===|=== | | ||||
/ | | | || . | | || . | | || . | | / | | / | | ||||
/ | | | || . | | || . | | || . | | / ++---++ / | | ||||
/ | | | || . | | || . | | || . | | / | n | / | | ||||
/ | | | || . | | || . | | || . | | / +++-+++ / | | ||||
/ | ++---++ || . ++---++ || . ++---++ || . ++---++/ / | | ||||
/ | | 1 | || . | 2 | || . | 3 | || . | 4 |/ / | | ||||
/ | +++-+++ || . +++-+++ || . +++-+++ || . +++-+++/ / | | ||||
/ | || || || . || || || . || || || . || || / / | | ||||
/ \__||_||_____________||_||_____________||_||_____________||_||_/_________/_/ | ||||
/ || || || || || || || || / || || / | ||||
/ || || +-----------+| || || || || || / || || / | ||||
/ || || |+-----------|-||-------------+| || || || / || || / | ||||
/ || || ||+----------|-||--------------|-||-------------+| || / || || / | ||||
/ || || ||| | || | || +-------+ || / || || / | ||||
/ || || ||| | |+--------------|-||------|---+ || / || || / | ||||
/ || || ||| | | | || | | +-+| / || || / | ||||
/ || || ||| | +-----------+ | || | | | | / || || / | ||||
/ || +|-|||----------|------------+| | |+------|---|---|-+| / || || / | ||||
/ || +-|||----------|------------||---|-|-------|-+ | | || / || || / | ||||
/ || ||| | +------||---+ | | | | | || / || || / | ||||
/ |+----|||-----+ | |+-----||-----|-------+ | | | || / || || / | ||||
/ | ||| | | || || | | | | || / || || / | ||||
/ | ||| | | || || | +----|-|---+ || / || || / | ||||
/ | ||| | | || || | | | | || / || || / | ||||
/ |+----+|| | | || || | | | | || / || || / | ||||
/ || +---+| | | +---+| |+---+ | | | +---+ || / +++-+++ / | ||||
/ || |+---+ +---+| |+---+ +---+| |+---+ +----+| || / | n | / | ||||
/ || || || || || || || || / +++-+++ / | ||||
/ +++-+++ +++-+++ +++-+++ +++-+++/=========/ | ||||
/ | 1 | | 2 + | 3 | . . . | n |/ ^^ | ||||
/ +++-+++ +-----+ +-----+ +-----+/ // | ||||
/ / PoDs | ||||
================================================================== // | ||||
Figure 3: Topology with Multiple Planes | ||||
The topology in Figure 2 is referred to in all further | The topology in Figure 2 is referred to in all further | |||
considerations. This figure depicts a generic "single plane fat | considerations. This figure depicts a generic "single-plane fat | |||
tree" and the concepts explained using three levels apply by | tree" and the concepts explained using three levels apply by | |||
induction to further levels and higher degrees of connectivity. | induction to further levels and higher degrees of connectivity. | |||
Further, this document will deal also with designs that provide only | ||||
sparser connectivity and "partitioned spines" as shown in Figure 3 | (Artwork only available as SVG: see | |||
https://www.rfc-editor.org/rfc/rfc9692.html) | ||||
Figure 3: Topology with Multiple Planes | ||||
Further, this document will also deal with designs that provide only | ||||
sparser connectivity and "partitioned spines", as shown in Figure 3 | ||||
and explained further in Section 5.2. | and explained further in Section 5.2. | |||
4. RIFT: Routing in Fat Trees | 4. RIFT: Routing in Fat Trees | |||
The remainder of this document presents the detailed specification of | The remainder of this document presents the detailed specification of | |||
the RIFT protocol, which in the most abstract terms has many | the RIFT protocol, which in the most abstract terms has many | |||
properties of a modified link-state protocol when distributing | properties of a modified link-state protocol when distributing | |||
information northbound and a distance vector protocol when | information northbound and a distance-vector protocol when | |||
distributing information southbound. While this is an unusual | distributing information southbound. While this is an unusual | |||
combination, it does quite naturally exhibit desired properties. | combination, it does quite naturally exhibit desired properties. | |||
5. Overview | 5. Overview | |||
5.1. Properties | 5.1. Properties | |||
The most singular property of RIFT is that it floods link-state | The most singular property of RIFT is that it only floods link-state | |||
information northbound only so that each level obtains the full | information northbound so that each level obtains the full topology | |||
topology of levels south of it. Link-State information is, with some | of levels south of it. Link-State information is, with some | |||
exceptions, not flooded East-West nor back South again. Exceptions | exceptions, not flooded East-West nor back south again. Exceptions | |||
like south reflection is explained in detail in Section 6.5.1 and | like south reflection are explained in detail in Section 6.5.1, and | |||
east-west flooding at ToF level in multi-plane fabrics is outlined in | east-west flooding at the ToF level in multi-plane fabrics is | |||
Section 5.2. In the southbound direction, the necessary routing | outlined in Section 5.2. In the southbound direction, the necessary | |||
information required (normally just a default route as per | routing information required (normally just a default route as per | |||
Section 6.3.8) only propagates one hop south. Those nodes then | Section 6.3.8) only propagates one hop south. Those nodes then | |||
generate their own routing information and flood it south to avoid | generate their own routing information and flood it south to avoid | |||
the overhead of building an update per adjacency. For the moment | the overhead of building an update per adjacency. The East-West | |||
describing the East-West direction is left out until later in the | direction is described later in the document. | |||
document. | ||||
Those information flow constraints create not only an anisotropic | Those information flow constraints create not only an anisotropic | |||
protocol (i.e. the information is not distributed "evenly" or | protocol (i.e., the information is not distributed "evenly" or | |||
"clumped" but summarized along the N-S gradient) but also a "smooth" | "clumped" but summarized along the north-south gradient) but also a | |||
information propagation where nodes do not receive the same | "smooth" information propagation where nodes do not receive the same | |||
information from multiple directions at the same time. Normally, | information from multiple directions at the same time. Normally, | |||
accepting the same reachability on any link, without understanding | accepting the same reachability on any link, without understanding | |||
its topological significance, forces tie-breaking on some kind of | its topological significance, forces tie-breaking on some kind of | |||
distance function. And such tie-breaking leads ultimately to hop-by- | distance function. And such tie-breaking ultimately leads to hop-by- | |||
hop forwarding by shortest paths only. In contrast to that, RIFT, | hop forwarding by shortest paths only. In contrast to that, RIFT, | |||
under normal conditions, does not need to tie-break the same | under normal conditions, does not need to tie-break the same | |||
reachability information from multiple directions. Its computation | reachability information from multiple directions. Its computation | |||
principles (south forwarding direction is always preferred) leads to | principles (south forwarding direction is always preferred) lead to | |||
valley-free [VFR] forwarding behavior. In shortest terms, valley | valley-free [VFR] forwarding behavior. In the shortest terms, | |||
free paths allow reversal of direction at most once from a packet | valley-free paths allow reversal of direction from a packet heading | |||
heading northbound to southbound while permitting traversal of | northbound to southbound while permitting traversal of horizontal | |||
horizontal links in the northbound phase. Those principles guarantee | links in the northbound phase at most once. Those principles | |||
loop-free forwarding and with that can take advantage of all such | guarantee loop-free forwarding and with that can take advantage of | |||
feasible paths on a fabric. This is another highly desirable | all such feasible paths on a fabric. This is another highly | |||
property if available bandwidth should be utilized to the maximum | desirable property if available bandwidth should be utilized to the | |||
extent possible. | maximum extent possible. | |||
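The valley-free rule above can be illustrated with a minimal sketch. It is not part of the specification: the function name and the hop encoding ('N' for northbound, 'S' for southbound, 'H' for a horizontal East-West link) are assumptions made for this example. The check only captures the direction rule, i.e., once a packet has turned south it does not go north or horizontal again, and it does not model any limit on the number of horizontal hops.

```python
from typing import Iterable


def is_valley_free(hops: Iterable[str]) -> bool:
    """Return True if no northbound or horizontal hop follows a southbound hop.

    Hypothetical encoding: 'N' = northbound, 'S' = southbound,
    'H' = horizontal (East-West) link.
    """
    heading_south = False
    for hop in hops:
        if hop not in ("N", "S", "H"):
            raise ValueError(f"unknown hop direction: {hop!r}")
        if hop == "S":
            # The single permitted reversal: from here on the path only descends.
            heading_south = True
        elif heading_south:
            # An 'N' or 'H' hop after the first 'S' would form a valley.
            return False
    return True


# Leaf -> ToP -> ToF -> ToP -> Leaf reverses direction exactly once.
print(is_valley_free(["N", "N", "S", "S"]))
# Going north again after descending is rejected.
print(is_valley_free(["N", "S", "N"]))
```

A path that passes this check cannot revisit a level it has already descended below, which is the intuition behind the loop-free property stated above.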
To account for the "northern" and the "southern" information split | To account for the "northern" and the "southern" information split, | |||
the link state database is partitioned accordingly into "north | the link state database (LSDB) is partitioned accordingly into "north | |||
representation" and "south representation" Topology Information | representation" and "south representation" Topology Information | |||
Elements (TIEs). In simplest terms the North TIEs contain a link | Elements (TIEs). In the simplest terms, the North TIEs contain a | |||
state topology description of lower levels and South TIEs carry | link-state topology description of lower levels and South TIEs simply | |||
simply node description of the level above and default routes | carry a node description of the level above and default routes | |||
pointing north. This oversimplified view will be refined gradually | pointing north. This oversimplified view will be refined gradually | |||
in the following sections while introducing protocol procedures and | in the following sections while introducing protocol procedures and | |||
state machines at the same time. | state machines at the same time. | |||
5.2. Generalized Topology View | 5.2. Generalized Topology View | |||
This section and resulting Section 6.5.2 are dedicated to multi-plane | This section and Section 6.5.2 are dedicated to multi-plane fabrics, | |||
fabrics, in contrast with the single plane designs where all ToF | in contrast with the single-plane designs where all ToF nodes are | |||
nodes are topologically equal and initially connected to all the | topologically equal and initially connected to all the switches at | |||
switches at the level below them. | the level below them. | |||
Multi-plane design is effectively a multi-dimensional switching | The multi-plane design is effectively a multidimensional switching | |||
matrix. To make that easier to visualize, this document introduces a | matrix. To make that easier to visualize, this document introduces a | |||
methodology depicting the connectivity in two-dimensional pictures. | methodology depicting the connectivity in two-dimensional pictures. | |||
Further, it can be leveraged that what is under consideration here | Further, it can be leveraged that what is under consideration here is | |||
are basically stacked crossbar fabrics where ports align "on top of | basically stacked crossbar fabrics where ports align "on top of each | |||
each other" in a regular fashion. | other" in a regular fashion. | |||
A word of caution to the reader; at this point it should be observed | A word of caution to the reader: At this point, it should be observed | |||
that the language used to describe Clos variations, especially in | that the language used to describe Clos variations, especially in | |||
multi-plane designs, varies widely between sources. This description | multi-plane designs, varies widely between sources. This description | |||
follows the terminology introduced in Section 3.1. This terminology | follows the terminology introduced in Section 3.1. This terminology | |||
is needed to follow the rest of this section correctly. | is needed to follow the rest of this section correctly. | |||
5.2.1. Terminology and Glossary | 5.2.1. Terminology and Glossary | |||
This section describes the terminology and abbreviations used in the | This section describes the terminology and abbreviations used in the | |||
rest of the text. Though the glossary may not be clear on a first | rest of the text. Though the glossary may not be clear on a first | |||
read, the following sections will introduce the terms in their proper | read, the following sections will introduce the terms in their proper | |||
context. | context. | |||
P: | P: | |||
Denotes the number of PoDs in a topology. | Denotes the number of PoDs in a topology. | |||
S: | S: | |||
Denotes the number of ToF nodes in a topology. | Denotes the number of ToF nodes in a topology. | |||
K: | K: | |||
To simplify the visual aids, notations and further considerations, | To simplify the visual aids, notations, and further | |||
the assumption is made that the switches are symmetrical, i.e., | considerations, the assumption is made that the switches are | |||
they have an equal number of ports pointing northbound and | symmetrical, i.e., they have an equal number of ports pointing | |||
southbound. With that simplification, K denotes half of the radix | northbound and southbound. With that simplification, K denotes | |||
of a symmetrical switch, meaning that the switch has K ports | half of the radix of a symmetrical switch, meaning that the switch | |||
pointing north and K ports pointing south. K_LEAF (K of a leaf) | has K ports pointing north and K ports pointing south. K_LEAF (K | |||
thus represents both the number of access ports in a leaf Node and | of a leaf) thus represents both the number of access ports in a | |||
the maximum number of planes in the fabric, whereas K_TOP (K of a | leaf node and the maximum number of planes in the fabric, whereas | |||
ToP) represents the number of leaves in the PoD and the number of | K_TOP (K of a ToP) represents the number of leaves in the PoD and | |||
ports pointing north in a ToP Node towards a higher spine level | the number of ports pointing north in a ToP Node towards a higher | |||
and thus the number of ToF nodes in a plane. | spine level and thus the number of ToF nodes in a plane. | |||
ToF Plane: | ToF Plane: | |||
Set of ToFs that are aware of each other by means of south | Set of ToFs that are aware of each other by means of south | |||
reflection. Planes are designated by capital letters, e.g. plane | reflection. Planes are designated by capital letters, e.g., plane | |||
A. | A. | |||
N: | N: | |||
Denotes the number of independent ToF planes in a topology. | Denotes the number of independent ToF planes in a topology. | |||
R: | R: | |||
Denotes a redundancy factor, i.e., number of connections a spine | Denotes a redundancy factor, i.e., the number of ToP nodes in a | |||
has towards a ToF plane. In single plane design K_TOP is equal to | PoD that are connected to a ToF plane. In a single-plane design, | |||
R. | R is equal to K_LEAF. | |||
Fallen Leaf: | Fallen Leaf: | |||
A fallen leaf in a plane Z is a switch that lost all connectivity | A fallen leaf in a plane Z is a switch that lost all connectivity | |||
northbound to Z. | northbound to Z. | |||
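The relationships between K_LEAF, K_TOP, and the quantities derived from them in the glossary above can be summarized in a short sketch. The class and property names are hypothetical and chosen only for this illustration. The single-plane value of R follows the revised definition (the number of ToP nodes in a PoD that are connected to a ToF plane).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodDimensions:
    """Symmetrical-switch assumption: K ports north and K ports south."""

    k_leaf: int  # K of a leaf node
    k_top: int   # K of a ToP node

    @property
    def leaf_radix(self) -> int:
        # K is half of the radix of a symmetrical switch.
        return 2 * self.k_leaf

    @property
    def leaves_per_pod(self) -> int:
        # K_TOP represents the number of leaves in the PoD.
        return self.k_top

    @property
    def max_planes(self) -> int:
        # K_LEAF represents the maximum number of planes in the fabric.
        return self.k_leaf

    @property
    def tof_nodes_per_plane(self) -> int:
        # K_TOP also gives the number of ToF nodes in a plane.
        return self.k_top

    @property
    def single_plane_redundancy(self) -> int:
        # In a single-plane design, R is equal to K_LEAF (revised text).
        return self.k_leaf


dims = PodDimensions(k_leaf=6, k_top=8)
print(dims.leaf_radix, dims.leaves_per_pod, dims.max_planes)
```

The values K_LEAF=6 and K_TOP=8 match the example used in the figures of the following section.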
5.2.2. Clos as Crossed, Stacked Crossbars | 5.2.2. Clos as Crossed, Stacked Crossbars | |||
The typical topology for which RIFT is defined is built of P number | The typical topology for which RIFT is defined is built of P number | |||
of PoDs and connected together by S number of ToF nodes. A PoD node | of PoDs and connected together by S number of ToF nodes. A PoD node | |||
has K number of ports. From here on half of them (K=Radix/2) are | has 2K number of ports. From here on, half of them (K=Radix/2) are | |||
assumed to connect host devices from the south, and the other half to | assumed to connect host devices from the south, and the other half is | |||
connect to interleaved PoD Top-Level switches to the north. The K | assumed to connect to interleaved PoD top-level switches to the | |||
ratio can be chosen differently without loss of generality when port | north. The K ratio can be chosen differently without loss of | |||
speeds differ or the fabric is oversubscribed but K=Radix/2 allows | generality when port speeds differ or the fabric is oversubscribed, | |||
for more readable representation whereby there are as many ports | but K=Radix/2 allows for more readable representation whereby there | |||
facing north as south on any intermediate node. A node is hence | are as many ports facing north as south on any intermediate node. A | |||
represented in a schematic fashion with ports "sticking out" to its | node is hence represented in a schematic fashion with ports "sticking | |||
north and south rather than by the usual real-world front faceplate | out" to its north and south, rather than by the usual real-world | |||
designs of the day. | front faceplate designs of the day. | |||
Figure 4 provides a view of a leaf node as seen from the north, i.e. | Figure 4 provides a view of a leaf node as seen from the north, i.e., | |||
showing ports that connect northbound. For lack of a better symbol, | showing ports that connect northbound. For lack of a better symbol, | |||
the document chooses to use the "o" as ASCII visualisation of a | the document chooses to use the "o" as ASCII visualization of a | |||
single port. In this example, K_LEAF has 6 ports. Observe that the | single port. In this example, K_LEAF has 6 ports. Observe that the | |||
number of PoDs is not related to Radix unless the ToF Nodes are | number of PoDs is not related to the Radix unless the ToF nodes are | |||
constrained to be the same as the PoD nodes in a particular | constrained to be the same as the PoD nodes in a particular | |||
deployment. | deployment. | |||
Top view | Top View | |||
+---+ | +---+ | |||
| | | | | | |||
| O | e.g., Radix = 12, K_LEAF = 6 | | o | e.g., Radix = 12, K_LEAF = 6 | |||
| | | | | | |||
| O | | | o | | |||
| | ------------------------- | | | ------------------------- | |||
| o <------ Physical Port (Ethernet) ----+ | | o <------ Physical Port (Ethernet) ----+ | |||
| | ------------------------- | | | | ------------------------- | | |||
| O | | | | o | | | |||
| | | | | | | | |||
| O | | | | o | | | |||
| | | | | | | | |||
| O | | | | o | | | |||
| | | | | | | | |||
+---+ v | +---+ v | |||
|| || || || || || || | || || || || || || || | |||
+----+ +------------------------------------------------+ | +----+ +------------------------------------------------+ | |||
| | | | | | | | | | |||
+----+ +------------------------------------------------+ | +----+ +------------------------------------------------+ | |||
|| || || || || || || | || || || || || || || | |||
Side views | Side Views | |||
Figure 4: A Leaf Node, K_LEAF=6 | Figure 4: A Leaf Node, K_LEAF=6 | |||
The Radix of a PoD's top node may be different than that of the leaf | The Radix of a PoD's top node may be different than that of the leaf | |||
node. Though, more often than not, a same type of node is used for | node. Though, more often than not, the same type of node is used for | |||
both, effectively forming a square (K*K). In the general case, | both, effectively forming a square (K*K). In the general case, | |||
switches at the top of the PoD with K_TOP southern ports not | switches at the top of the PoD with K_TOP southern ports not | |||
necessarily equal to K_LEAF could be considered . For instance, in | necessarily equal to K_LEAF could be considered. For instance, in | |||
the representations below, we pick a 6 port K_LEAF and an 8 port | the representations below, we pick a 6-port K_LEAF and an 8-port | |||
K_TOP. In order to form a crossbar, K_TOP Leaf Nodes are necessary | K_TOP. In order to form a crossbar, K_TOP leaf nodes are necessary | |||
as illustrated in Figure 5. | as illustrated in Figure 5. | |||
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |||
| O | | O | | O | | O | | O | | O | | O | | O | | | O | | O | | O | | O | | O | | O | | O | | O | | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |||
| O | | O | | O | | O | | O | | O | | O | | O | | | O | | O | | O | | O | | O | | O | | O | | O | | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |||
| O | | O | | O | | O | | O | | O | | O | | O | | | O | | O | | O | | O | | O | | O | | O | | O | | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |||
| O | | O | | O | | O | | O | | O | | O | | O | | | O | | O | | O | | O | | O | | O | | O | | O | | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |||
| O | | O | | O | | O | | O | | O | | O | | O | | | O | | O | | O | | O | | O | | O | | O | | O | | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |||
| O | | O | | O | | O | | O | | O | | O | | O | | | O | | O | | O | | O | | O | | O | | O | | O | | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |||
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | |||
Figure 5: Southern View of Leaf Nodes of a PoD, K_TOP=8 | Figure 5: Southern View of Leaf Nodes of a PoD, K_TOP=8 | |||
As further visualized in Figure 6 the K_TOP Leaf Nodes are fully | As further visualized in Figure 6, the K_TOP leaf nodes are fully | |||
interconnected with the K_LEAF ToP nodes, providing connectivity that | interconnected with the K_LEAF ToP nodes, providing connectivity that | |||
can be represented as a crossbar when "looked at" from the north. | can be represented as a crossbar when "looked at" from the north. | |||
The result is that, in the absence of a failure, a packet entering | The result is that, in the absence of a failure, a packet entering | |||
the PoD from the north on any port can be routed to any port in the | the PoD from the north on any port can be routed to any port in the | |||
south of the PoD and vice versa. And that is precisely why it makes | south of the PoD and vice versa. And that is precisely why it makes | |||
sense to talk about a "switching matrix". | sense to talk about a "switching matrix". | |||
W <---*---> E | W <---*---> E | |||
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | |||
skipping to change at page 24, line 37 ¶ | skipping to change at line 1024 ¶ | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |||
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | | +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | | |||
^ | | ^ | | |||
| | | | | | |||
| ---------- ----------------------- | | | ---------- ----------------------- | | |||
+----- Leaf Node Top-of-PoD Node (Spine) --+ | +----- Leaf Node Top-of-PoD Node (Spine) --+ | |||
---------- ----------------------- | ---------- ----------------------- | |||
Figure 6: Northern View of a PoD's Spines, K_TOP=8 | Figure 6: Northern View of a PoD's Spines, K_TOP=8 | |||
Side views of this PoD is illustrated in Figure 7 and Figure 8. | Side views of this PoD are illustrated in Figures 7 and 8. | |||
Connecting to Spine Nodes | Connecting to ToP Nodes | |||
|| || || || || || || || | || || || || || || || || | |||
+----------------------------------------------------------------+ N | +----------------------------------------------------------------+ N | |||
| Top-of-PoD Node (Sideways) | ^ | | Top-of-PoD Node (Sideways) | ^ | |||
+----------------------------------------------------------------+ | | +----------------------------------------------------------------+ | | |||
|| || || || || || || || * | || || || || || || || || * | |||
+----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ | | +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ | | |||
|Leaf| |Leaf| |Leaf| |Leaf| |Leaf| |Leaf| |Leaf| |Leaf| v | |Leaf| |Leaf| |Leaf| |Leaf| |Leaf| |Leaf| |Leaf| |Leaf| v | |||
|Node| |Node| |Node| |Node| |Node| |Node| |Node| |Node| S | |Node| |Node| |Node| |Node| |Node| |Node| |Node| |Node| S | |||
+----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ | +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ | |||
|| || || || || || || || | || || || || || || || || | |||
Connecting to Client Nodes | Connecting to Client Nodes | |||
Figure 7: Side View of a PoD, K_TOP=8, K_LEAF=6 | Figure 7: Side View of a PoD, K_TOP=8, K_LEAF=6 | |||
Connecting to Spine Nodes | Connecting to ToP Nodes | |||
|| || || || || || | || || || || || || | |||
+----+ +----+ +----+ +----+ +----+ +----+ N | +----+ +----+ +----+ +----+ +----+ +----+ N | |||
|ToP | |ToP | |ToP | |ToP | |ToP | |ToP | ^ | |ToP | |ToP | |ToP | |ToP | |ToP | |ToP | ^ | |||
|Node| |Node| |Node| |Node| |Node| |Node| | | |Node| |Node| |Node| |Node| |Node| |Node| | | |||
+----+ +----+ +----+ +----+ +----+ +----+ * | +----+ +----+ +----+ +----+ +----+ +----+ * | |||
|| || || || || || | | || || || || || || | | |||
+------------------------------------------------+ v | +------------------------------------------------+ v | |||
| Leaf Node (Sideways) | S | | Leaf Node (Sideways) | S | |||
+------------------------------------------------+ | +------------------------------------------------+ | |||
Connecting to Client Nodes | Connecting to Client Nodes | |||
Figure 8: Other Side View of a PoD, K_TOP=8, K_LEAF=6, 90-Degree | Figure 8: Other Side View of a PoD, K_TOP=8, K_LEAF=6, 90-Degree | |||
Turn in E-W Plane from the Previous Figure | Turn in E-W Plane from the Previous Figure | |||
As a next step, observe that a resulting PoD can be abstracted as a | As a next step, observe that a resulting PoD can be abstracted as a | |||
bigger node with a number K of K_POD= K_TOP * K_LEAF, and the design | bigger node with a number K of K_POD = K_TOP * K_LEAF, and the design | |||
can recurse. | can recurse. | |||
It will be critical at this point that, before progressing further, | It will be critical at this point that, before progressing further, | |||
the concept and the picture of "crossed crossbars" is understood. | the concept and the picture of "crossed crossbars" is understood. | |||
Else, the following considerations might be difficult to comprehend. | Else, the following considerations might be difficult to comprehend. | |||
To continue, the PoDs are interconnected with each other through a | To continue, the PoDs are interconnected with each other through a | |||
ToF node at the very top or the north edge of the fabric. The | ToF node at the very top or the north edge of the fabric. The | |||
resulting ToF is *not* partitioned if, and only if (IFF), every PoD | resulting ToF is *not* partitioned if and only if (IFF) every ToP | |||
top level node (spine) is connected to every ToF Node. This topology | node is connected to every ToF node. This topology is also referred | |||
is also referred to as a single plane configuration and is quite | to as a single-plane configuration and is quite popular due to its | |||
popular due to its simplicity. In order to reach a 1:1 connectivity | simplicity. There are K_TOP ToF nodes and K_LEAF ToP nodes because | |||
ratio between the ToF and the leaves, it results that there are K_TOP | each port of a ToP node connects to a different ToF node. | |||
ToF nodes, because each port of a ToP node connects to a different | Consequently, it will take at least P * K_LEAF ports on a ToF node to | |||
ToF node, and K_LEAF ToP nodes for the same reason. Consequently, it | connect to each of the K_LEAF ToP nodes of the P PoDs. Figure 9 | |||
will take at least (P * K_LEAF) ports on a ToF node to connect to | illustrates this, looking at P=3 PoDs from above and 2 sides. The | |||
each of the K_LEAF ToP nodes of the P PoDs. Figure 9 illustrates | large view is the one from above, with the 8 ToF of 3 * 6 ports each | |||
this, looking at P=3 PoDs from above and 2 sides. The large view is | interconnecting the PoDs and every ToP Node being connected to every | |||
the one from above, with the 8 ToF of 3*6 ports each interconnecting | ToF node. | |||
the PoDs, every ToP Node being connected to every ToF node. | ||||
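The single-plane arithmetic above can be sketched in a few lines of Python. This is only an illustration of the counts given in the text. The function name and dictionary keys are hypothetical and are not part of RIFT.

```python
def single_plane_sizing(k_top: int, k_leaf: int, pods: int) -> dict:
    """Node and port counts for a non-partitioned (single-plane) ToF level."""
    return {
        # A PoD abstracted as one bigger node: K_POD = K_TOP * K_LEAF.
        "k_pod": k_top * k_leaf,
        # Each northbound port of a ToP node connects to a different ToF node.
        "tof_nodes": k_top,
        "top_nodes_per_pod": k_leaf,
        # A ToF node needs one port per ToP node in every PoD.
        "min_tof_ports": pods * k_leaf,
    }


# Values taken from the figures: K_TOP=8, K_LEAF=6, P=3 PoDs.
sizing = single_plane_sizing(k_top=8, k_leaf=6, pods=3)
print(sizing)
```

With the values used in Figure 9, this gives 8 ToF nodes of at least 3 * 6 = 18 ports each.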
[ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] <-----+ | [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] <-----+ | |||
| | | | | | | | | | | | | | | | | | | | |||
[=================================] | -------------- | [=================================] | -------------- | |||
| | | | | | | | +----- ToF | | | | | | | | | +----- ToF | |||
[ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] +----- Node ---+ | [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] +----- Node ---+ | |||
| -------------- | | | -------------- | | |||
| v | | v | |||
+-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ <-----+ +-+ | +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ <-----+ +-+ | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |||
skipping to change at page 26, line 37 ¶ | skipping to change at line 1114 ¶ | |||
| | | | | | | | | | | | | | | | -+ +- +-+ v | | | | | | | | | | | | | | | | | | | -+ +- +-+ v | | | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ] | | --| |--[ ]--| | | [ |o| |o| |o| |o| |o| |o| |o| |o| ] | | --| |--[ ]--| | | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ] | ----- | --| |--[ ]--| | | [ |o| |o| |o| |o| |o| |o| |o| |o| ] | ----- | --| |--[ ]--| | | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ] +--- PoD ---+ --| |--[ ]--| | | [ |o| |o| |o| |o| |o| |o| |o| |o| ] +--- PoD ---+ --| |--[ ]--| | | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ] | ----- | --| |--[ ]--| | | [ |o| |o| |o| |o| |o| |o| |o| |o| ] | ----- | --| |--[ ]--| | | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ] | | --| |--[ ]--| | | [ |o| |o| |o| |o| |o| |o| |o| |o| ] | | --| |--[ ]--| | | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ] | | --| |--[ ]--| | | [ |o| |o| |o| |o| |o| |o| |o| |o| ] | | --| |--[ ]--| | | |||
| | | | | | | | | | | | | | | | -+ +- +-+ | | | | | | | | | | | | | | | | | | | -+ +- +-+ | | | |||
+-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ | +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ | |||
Figure 9: Fabric Spines and TOFs in Single Plane Design, 3 PoDs | Figure 9: Fabric Spines and ToFs in Single-Plane Design, 3 PoDs | |||
The top view can be collapsed into a third dimension where the hidden | The top view can be collapsed into a third dimension where the hidden | |||
depth index is representing the PoD number. One PoD can be shown | depth index is representing the PoD number. One PoD can be shown | |||
then as a class of PoDs and hence save one dimension in the | then as a class of PoDs and hence save one dimension in the | |||
representation. The Spine Node expands in the depth and the vertical | representation. The ToF node expands in the depth and the vertical | |||
dimensions, whereas the PoD top level Nodes are constrained, in | dimensions, whereas the ToP nodes are constrained in the horizontal | |||
horizontal dimension. A port in the 2-D representation represents | dimension. A port in the 2-D representation effectively represents | |||
effectively the class of all the ports at the same position in all | the class of all the ports at the same position in all the PoDs that | |||
the PoDs that are projected in its position along the depth axis. | are projected in its position along the depth axis. This is shown in | |||
This is shown in Figure 10. | Figure 10. | |||
/ / / / / / / / / / / / / / / / | / / / / / / / / / / / / / / / / | |||
/ / / / / / / / / / / / / / / / | / / / / / / / / / / / / / / / / | |||
/ / / / / / / / / / / / / / / / | / / / / / / / / / / / / / / / / | |||
/ / / / / / / / / / / / / / / / ] | / / / / / / / / / / / / / / / / ] | |||
+-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ ]] | +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ ]] | |||
| | | | | | | | | | | | | | | | ] ----------------------- | | | | | | | | | | | | | | | | | ] ----------------------- | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ] <-- Top of PoD Node (Spine) | [ |o| |o| |o| |o| |o| |o| |o| |o| ] <-- Top of PoD Node (Spine) | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ] ----------------------- | [ |o| |o| |o| |o| |o| |o| |o| |o| ] ----------------------- | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ]]]] | [ |o| |o| |o| |o| |o| |o| |o| |o| ]]]] | |||
skipping to change at page 27, line 26 ¶ | skipping to change at line 1147 ¶ | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ] // (depth) | [ |o| |o| |o| |o| |o| |o| |o| |o| ] // (depth) | |||
| |/| |/| |/| |/| |/| |/| |/| |/ // | | |/| |/| |/| |/| |/| |/| |/| |/ // | |||
+-+ +-+ +-+/+-+/+-+ +-+ +-+ +-+ // | +-+ +-+ +-+/+-+/+-+ +-+ +-+ +-+ // | |||
^ | ^ | |||
| -------- | | -------- | |||
+----- ToF Node | +----- ToF Node | |||
-------- | -------- | |||
Figure 10: Collapsed Northern View of a Fabric for Any Number of PoDs | Figure 10: Collapsed Northern View of a Fabric for Any Number of PoDs | |||
As simple as a single plane deployment is, it introduces a limit due | As simple as a single-plane deployment is, it introduces a limit due | |||
to the bound on the available radix of the ToF nodes that has to be | to the bound on the available radix of the ToF nodes that has to be | |||
at least P * K_LEAF. Nevertheless, it will become clear that a | at least P * K_LEAF. Nevertheless, it will become clear that a | |||
distinct advantage of a connected or non-partitioned ToF is that all | distinct advantage of a connected or non-partitioned ToF is that all | |||
failures can be resolved by simple, non-transitive, positive | failures can be resolved by simple, non-transitive, positive | |||
disaggregation (i.e., nodes advertising more specific prefixes with | disaggregation (i.e., nodes advertising more specific prefixes with | |||
the default to the level below them that is, however, not propagated | the default to the level below them that is not propagated further | |||
further down the fabric) as described in Section 6.5.1 . In other | down the fabric) as described in Section 6.5.1. In other words, non- | |||
words, non-partitioned ToF nodes can always reach nodes below or | partitioned ToF nodes can always reach nodes below or withdraw the | |||
withdraw the routes from PoDs they cannot reach unambiguously. And | routes from PoDs they cannot reach unambiguously. And with this, | |||
with this, positive disaggregation can heal all failures and still | positive disaggregation can heal all failures and still allow all the | |||
allow all the ToF nodes to be aware of each other via south | ToF nodes to be aware of each other via south reflection. | |||
reflection. Disaggregation will be explained in further detail in | Disaggregation will be explained in further detail in Section 6.5. | |||
Section 6.5. | ||||
In order to scale beyond the "single plane limit", the ToF can be | In order to scale beyond the "single-plane limit", the ToF can be | |||
partitioned into N number of identically wired planes where N is an | partitioned into N number of identically wired planes where N is an | |||
integer divisor of K_LEAF. The 1:1 ratio and the desired symmetry | integer divisor of K_LEAF. The 1:1 ratio and the desired symmetry | |||
are still served, this time with (K_TOP * N) ToF nodes, each of (P * | are still served, this time with (K_TOP*N) ToF nodes, each of | |||
K_LEAF / N) ports. N=1 represents a non-partitioned Spine and | (P*K_LEAF/N) ports. N=1 represents a non-partitioned ToF | |||
N=K_LEAF is a maximally partitioned Spine. Further, if R is any | (superspine), and N=K_LEAF is a maximally partitioned ToF. Further, | |||
integer divisor of K_LEAF, then N=K_LEAF/R is a feasible number of | if R is any integer divisor of K_LEAF, then N=K_LEAF/R is a feasible | |||
planes and R a redundancy factor that denotes the number of | number of planes and R is a redundancy factor that denotes the number | |||
independent paths between 2 leaves within a plane. It proves | of independent paths between 2 leaves within a plane. It proves | |||
convenient for deployments to use a radix for the leaf nodes that is | convenient for deployments to use a radix for the leaf nodes that is | |||
a power of 2 so they can pick a number of planes that is a lower | a power of 2 so they can pick a number of planes that is a lower | |||
power of 2. The example in Figure 11 splits the Spine in 2 planes | power of 2. The example in Figure 11 splits the ToF in 2 planes with | |||
with a redundancy factor R=3, meaning that there are 3 non- | a redundancy factor of R=3, meaning that there are 3 non-intersecting | |||
intersecting paths between any leaf node and any ToF node. A ToF | paths between any leaf node and any ToF node. A ToF node must have, | |||
node must have, in this case, at least 3*P ports, and be directly | in this case, at least 3*P ports and be directly connected to 3 of | |||
connected to 3 of the 6 ToP nodes (spines) in each PoD. The ToP | the 6 ToP nodes (spines) in each PoD. The ToP nodes are represented | |||
nodes are represented horizontally with K_TOP=8 ports northwards | horizontally with K_TOP=8 ports northwards each. | |||
each. | ||||
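As a hedged sketch of the multi-plane sizing described above, the following Python lists the feasible plane counts for a given K_LEAF and derives the ToF node count, the per-ToF port count, and the redundancy factor R. The helper names are hypothetical, and the formulas are only the ones stated in the paragraph.

```python
def feasible_planes(k_leaf: int) -> list:
    """Every N that divides K_LEAF is a feasible number of planes."""
    return [n for n in range(1, k_leaf + 1) if k_leaf % n == 0]


def multi_plane_sizing(k_top: int, k_leaf: int, pods: int, planes: int) -> dict:
    """ToF sizing for N identically wired planes."""
    if k_leaf % planes != 0:
        raise ValueError("the number of planes must divide K_LEAF")
    return {
        "tof_nodes": k_top * planes,              # K_TOP * N
        "ports_per_tof": pods * k_leaf // planes, # P * K_LEAF / N
        "redundancy": k_leaf // planes,           # R = K_LEAF / N
    }


# Figure 11 values (K_TOP=8, K_LEAF=6, N=2), with P=3 PoDs assumed.
options = feasible_planes(6)
two_planes = multi_plane_sizing(k_top=8, k_leaf=6, pods=3, planes=2)
max_partitioned = multi_plane_sizing(k_top=8, k_leaf=6, pods=3, planes=6)
print(options, two_planes, max_partitioned)
```

For N=2, each ToF node needs 3*P = 9 ports and R=3. For N=K_LEAF=6, the port requirement drops to P and R=1.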
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | |||
+-| |--| |--| |--| |--| |--| |--| |--| |-+ | +-| |--| |--| |--| |--| |--| |--| |--| |-+ | |||
| | O | | O | | O | | O | | O | | O | | O | | O | | | | | O | | O | | O | | O | | O | | O | | O | | O | | | |||
+-| |--| |--| |--| |--| |--| |--| |--| |-+ | +-| |--| |--| |--| |--| |--| |--| |--| |-+ | |||
+-| |--| |--| |--| |--| |--| |--| |--| |-+ | +-| |--| |--| |--| |--| |--| |--| |--| |-+ | |||
| | O | | O | | O | | O | | O | | O | | O | | O | | | | | O | | O | | O | | O | | O | | O | | O | | O | | | |||
+-| |--| |--| |--| |--| |--| |--| |--| |-+ | +-| |--| |--| |--| |--| |--| |--| |--| |-+ | |||
+-| |--| |--| |--| |--| |--| |--| |--| |-+ | +-| |--| |--| |--| |--| |--| |--| |--| |-+ | |||
| | O | | O | | O | | O | | O | | O | | O | | O | | | | | O | | O | | O | | O | | O | | O | | O | | O | | | |||
skipping to change at page 29, line 5 ¶ | skipping to change at line 1214 ¶ | |||
+-| |--| |--| |--| |--| |--| |--| |--| |-+ | +-| |--| |--| |--| |--| |--| |--| |--| |-+ | |||
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | |||
^ | ^ | |||
| | | | |||
| --------------------- | | --------------------- | |||
+----- ToF Node Across Depth | +----- ToF Node Across Depth | |||
--------------------- | --------------------- | |||
Figure 11: Northern View of a Multi-Plane ToF Level, K_LEAF=6, N=2 | Figure 11: Northern View of a Multi-Plane ToF Level, K_LEAF=6, N=2 | |||
At the extreme end of the spectrum it is even possible to fully | At the extreme end of the spectrum, it is even possible to fully | |||
partition the spine with N = K_LEAF and R=1, while maintaining | partition the ToF with N=K_LEAF and R=1 while maintaining | |||
connectivity between each leaf node and each ToF node. In that case | connectivity between each leaf node and each ToF node. In that case, | |||
the ToF node connects to a single Port per PoD, so it appears as a | the ToF node connects to a single port per PoD, so it appears as a | |||
single port in the projected view represented in Figure 12. The | single port in the projected view represented in Figure 12. The | |||
number of ports required on the Spine Node is more than or equal to | number of ports required on the ToF node is more than or equal to P, | |||
P, the number of PoDs. | i.e., the number of PoDs. | |||
Plane 1 | Plane 1 | |||
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ -+ | +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ -+ | |||
+-| |--| |--| |--| |--| |--| |--| |--| |-+ | | +-| |--| |--| |--| |--| |--| |--| |--| |-+ | | |||
| | O | | O | | O | | O | | O | | O | | O | | O | | | | | | O | | O | | O | | O | | O | | O | | O | | O | | | | |||
+-| |--| |--| |--| |--| |--| |--| |--| |-+ | | +-| |--| |--| |--| |--| |--| |--| |--| |-+ | | |||
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | | +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | | |||
----------- . ------------------- . ------------ . ------- | | ----------- . ------------------- . ------------ . ------- | | |||
Plane 2 | | Plane 2 | | |||
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | | +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | | |||
skipping to change at page 31, line 8 ¶ | skipping to change at line 1274 ¶ | |||
| | | | | | |||
| ---------------- ------------- | | | ---------------- ------------- | | |||
+----- ToF Node Class of PoDs ---+ | +----- ToF Node Class of PoDs ---+ | |||
---------------- ------------- | ---------------- ------------- | |||
Figure 12: Northern View of a Maximally Partitioned ToF Level, R=1 | Figure 12: Northern View of a Maximally Partitioned ToF Level, R=1 | |||
5.3. Fallen Leaf Problem | 5.3. Fallen Leaf Problem | |||
As mentioned earlier, RIFT exhibits an anisotropic behavior tailored | As mentioned earlier, RIFT exhibits an anisotropic behavior tailored | |||
for fabrics with a North / South orientation and a high level of | for fabrics with a north-south orientation and a high level of | |||
interleaving paths. A non-partitioned fabric makes a total loss of | interleaving paths. A non-partitioned fabric makes a total loss of | |||
connectivity between a ToF node at the north and a leaf node at the | connectivity between a ToF node at the north and a leaf node at the | |||
south a very rare but yet possible occasion that is fully healed by | south a very rare but possible occasion that is fully healed by | |||
positive disaggregation as described in Section 6.5.1. In large | positive disaggregation as described in Section 6.5.1. In large | |||
fabrics or fabrics built from switches with low radix, the ToF may | fabrics or fabrics built from switches with a low radix, the ToF may | |||
often become partitioned in planes which makes the occurrence of | often become partitioned in planes, which makes it more likely that a | |||
having a given leaf being only reachable from a subset of the ToF | given leaf is only reachable from a subset of the ToF nodes. This | |||
nodes more likely to happen. This makes some further considerations | makes some further considerations necessary. | |||
necessary. | ||||
A "Fallen Leaf" is a leaf that can be reached by only a subset of ToF | A "fallen leaf" is a leaf that can be reached by only a subset of ToF | |||
nodes due to missing connectivity. If R is the redundancy factor, | nodes due to missing connectivity. If R is the redundancy factor, | |||
then it takes at least R breakages to reach a "Fallen Leaf" | then it takes at least R breakages to reach a "fallen leaf" | |||
situation. | situation. | |||
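The definition can be illustrated with a small reachability check. This is a minimal sketch under simplifying assumptions: a three-level fabric (ToF, ToP, leaf), links given as plain tuples of live adjacencies, and hypothetical node names. It is not the procedure RIFT itself uses to detect fallen leaves.

```python
def fallen_leaves(tof_top_links, top_leaf_links, tof_nodes, leaves):
    """Map each fallen leaf to the ToF nodes that cannot reach it."""
    # ToP nodes that each ToF node still reaches over live links.
    tops_of = {tof: set() for tof in tof_nodes}
    for tof, top in tof_top_links:
        tops_of[tof].add(top)
    # ToP nodes that still have a live southern link to each leaf.
    tops_above = {leaf: set() for leaf in leaves}
    for top, leaf in top_leaf_links:
        tops_above[leaf].add(top)
    fallen = {}
    for leaf in leaves:
        unreachable = {
            tof for tof in tof_nodes if not (tops_of[tof] & tops_above[leaf])
        }
        if unreachable:
            fallen[leaf] = unreachable
    return fallen


# Hypothetical maximally partitioned fabric (R=1): two planes, one PoD.
# The link between top_B and leaf_2 is broken, so it is not listed.
tof_top = [("tof_A", "top_A"), ("tof_B", "top_B")]
top_leaf = [("top_A", "leaf_1"), ("top_A", "leaf_2"), ("top_B", "leaf_1")]
result = fallen_leaves(tof_top, top_leaf, ["tof_A", "tof_B"], ["leaf_1", "leaf_2"])
print(result)
```

In this example, a single breakage is enough to make leaf_2 a fallen leaf for tof_B, which matches the R=1 case.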
In a maximally partitioned fabric, the redundancy factor is R=1, so | In a maximally partitioned fabric, the redundancy factor is R=1, so | |||
any breakage in the fabric will cause one or more fallen leaves in | any breakage in the fabric will cause one or more fallen leaves in | |||
the affected plane. R=2 guarantees that a single breakage will not | the affected plane. R=2 guarantees that a single breakage will not | |||
cause a fallen leaf. However, not all cases require disaggregation. | cause a fallen leaf. However, not all cases require disaggregation. | |||
The following cases do not require particular action: | The following cases do not require particular action: | |||
If a southern link on a node goes down, then connectivity through | * If a southern link on a node goes down, then connectivity through | |||
that node is lost for all nodes south of it. There is no need to | that node is lost for all nodes south of that link. There is no | |||
disaggregate since the connectivity to this node is lost for all | need to disaggregate since the connectivity to this node is lost | |||
spine nodes in a same fashion. | for all spine nodes in the same fashion. | |||
If a ToF Node goes down, then northern traffic towards it is | * If a ToF node goes down, then northern traffic towards it is | |||
routed via alternate ToF nodes in the same plane and there is no | routed via alternate ToF nodes in the same plane and there is no | |||
need to disaggregate routes. | need to disaggregate routes. | |||
In a general manner, the mechanism of non-transitive positive | In a general manner, the mechanism of non-transitive, positive | |||
disaggregation is sufficient when the disaggregating ToF nodes | disaggregation is sufficient when the disaggregating ToF nodes | |||
collectively connect to all the ToP nodes in the broken plane. This | collectively connect to all the ToP nodes in the broken plane. This | |||
happens in the following case: | happens in the following case: | |||
If the breakage is the last northern link from a ToP node to a ToF | * If the breakage is the last northern link from a ToP node to a ToF | |||
node going down, then the fallen leaf problem affects only that | node going down, then the fallen leaf problem affects only that | |||
ToF node, and the connectivity to all the nodes in the PoD is lost | ToF node, and the connectivity to all the nodes in the PoD is lost | |||
from that ToF node. This can be observed by other ToF nodes | from that ToF node. This can be observed by other ToF nodes | |||
within the plane where the ToP node is located and positively | within the plane where the ToP node is located and positively | |||
disaggregated within that plane. | disaggregated within that plane. | |||
On the other hand, there is a need to disaggregate the routes to | On the other hand, there is a need to disaggregate the routes to | |||
Fallen Leaves within the plane in a transitive fashion, that is, all | fallen leaves within the plane in a transitive fashion, that is, all | |||
the way to the other leaves, in the following cases: | the way to the other leaves, in the following cases: | |||
* If the breakage is the last northern link from a leaf node within | * If the breakage is the last northern link from a leaf node within | |||
a plane (there is only one such link in a maximally partitioned | a plane (there is only one such link in a maximally partitioned | |||
fabric) that goes down, then connectivity to all unicast prefixes | fabric) that goes down, then connectivity to all unicast prefixes | |||
attached to the leaf node is lost within the plane where the link | attached to the leaf node is lost within the plane where the link | |||
is located. Southern Reflection by a leaf node, e.g., between ToP | is located. Southern Reflection by a leaf node, e.g., between ToP | |||
nodes, if the PoD has only 2 levels, happens in between planes, | nodes, if the PoD has only 2 levels, happens in between planes, | |||
allowing the ToP nodes to detect the problem within the PoD where | allowing the ToP nodes to detect the problem within the PoD where | |||
it occurs and positively disaggregate. The breakage can be | it occurs and positively disaggregate. The breakage can be | |||
observed by the ToF nodes in the same plane through the North | observed by the ToF nodes in the same plane through the north | |||
flooding of TIEs from the ToP nodes. The ToF nodes however need | flooding of TIEs from the ToP nodes. However, the ToF nodes need | |||
to be aware of all the affected prefixes for the negative, | to be aware of all the affected prefixes for the negative, | |||
possibly transitive disaggregation to be fully effective (i.e., a | possibly transitive, disaggregation to be fully effective (i.e., a | |||
node advertising in the control plane that it cannot reach a | node advertising in the control plane that it cannot reach a | |||
certain more specific prefix than default whereas such | certain more specific prefix than the default prefix, whereas such | |||
disaggregation must in the extreme condition propagate further | disaggregation in the extreme condition must be propagated further | |||
down southbound). The problem can also be observed by the ToF | down southbound). The problem can also be observed by the ToF | |||
nodes in the other planes through the flooding of North TIEs from | nodes in the other planes through the flooding of North TIEs from | |||
the affected leaf nodes, together with non-node North TIEs which | the affected leaf nodes, together with non-node North TIEs, which | |||
indicate the affected prefixes. To be effective in that case, the | indicate the affected prefixes. To be effective in that case, the | |||
positive disaggregation must reach down to the nodes that make the | positive disaggregation must reach down to the nodes that make the | |||
plane selection, which are typically the ingress leaf nodes. The | plane selection, which are typically the ingress leaf nodes. The | |||
information is not useful for routing in the intermediate levels. | information is not useful for routing in the intermediate levels. | |||
* If the breakage is a ToP node in a maximally partitioned fabric | * If the breakage is a ToP node in a maximally partitioned fabric | |||
(in which case it is the only ToP node serving the plane in that | (in which case it is the only ToP node serving the plane in that | |||
PoD that goes down), then the connectivity to all the nodes in the | PoD that goes down), then the connectivity to all the nodes in the | |||
PoD is lost within the plane where the ToP node is located. | PoD is lost within the plane where the ToP node is located. | |||
Consequently, all leaves of the PoD fall in this plane. Since the | Consequently, all leaves of the PoD fall in this plane. Since the | |||
Southern Reflection between the ToF nodes happens only within a | Southern Reflection between the ToF nodes happens only within a | |||
plane, ToF nodes in other planes cannot discover fallen leaves in | plane, ToF nodes in other planes cannot discover fallen leaves in | |||
a different plane. They also cannot determine beyond their local | a different plane. They also cannot determine beyond their local | |||
plane whether a leaf node that was initially reachable has become | plane whether a leaf node that was initially reachable has become | |||
unreachable. As the breakage can be observed by the ToF nodes in | unreachable. As the breakage can be observed by the ToF nodes in | |||
the plane where the breakage happened, the ToF nodes in the plane | the plane where the breakage happened, the ToF nodes in the plane | |||
need to be aware of all the affected prefixes for the negative | need to be aware of all the affected prefixes for the negative | |||
disaggregation to be fully effective. The problem can also be | disaggregation to be fully effective. The problem can also be | |||
observed by the ToF nodes in the other planes through the flooding | observed by the ToF nodes in the other planes through the flooding | |||
of North TIEs from the affected leaf nodes, if there are only 3 | of North TIEs from the affected leaf nodes if the failing ToP node | |||
levels and the ToP nodes are directly connected to the leaf nodes, | is directly connected to its leaf nodes, which can detect the link | |||
and then again it can only be effective if it is propagated | going down. Then again, the knowledge of the failure at the ToF | |||
transitively to the leaf, and useless above that level. | level can only be useful if it is propagated transitively to all | |||
the leaves; it is useless above that level since the decision of | ||||
placing a packet in a plane happens at the leaf that injects the | ||||
packet in the fabric. | ||||
These abstractions are rolled back into a simplified example that | These abstractions are rolled back into a simplified example that | |||
shows that in Figure 3 the loss of link between spine node 3 and leaf | shows that in Figure 3 the loss of the link between spine node 3 and | |||
node 3 will make leaf node 3 a fallen leaf for ToF nodes in plane C. | leaf node 3 will make leaf node 3 a fallen leaf for ToF nodes in | |||
Worse, if the cabling was never present in the first place, plane C | plane C. Worse, if the cabling was never present in the first place, | |||
will not even be able to know that such a fallen leaf exists. Hence | plane C will not even be able to know that such a fallen leaf exists. | |||
partitioning without further treatment results in two grave problems: | Hence, partitioning without further treatment results in two grave | |||
problems: | ||||
* Leaf node 1 trying to route to leaf node 3 must not choose spine | 1. Leaf node 1 trying to route to leaf node 3 must not choose spine | |||
node 3 in plane C as its next hop since it will inevitably drop | node 3 in plane C as its next hop since it will inevitably drop | |||
the packet when forwarding using default routes or do excessive | the packet when forwarding using default routes or do excessive | |||
bow-tying. This information must be in its routing table. | bow-tying. This information must be in its routing table. | |||
* A path computation trying to deal with the problem by distributing | 2. A path computation trying to deal with the problem by | |||
host routes may only form paths through leaves. The flooding of | distributing host routes may only form paths through leaves. The | |||
information about leaf node 3 would have to go up to ToF nodes in | flooding of information about leaf node 3 would have to go up to | |||
planes A, B, and D and then "loop back" over other leaves to ToF C | ToF nodes in planes A, B, and D and then "loop back" over other | |||
leading in extreme cases to traffic for leaf node 3 when presented | leaves to ToF C, leading in extreme cases to traffic for leaf | |||
to plane C taking an "inverted fabric" path where leaves start to | node 3 when presented to plane C taking an "inverted fabric" path | |||
serve as ToFs, at least for the duration of a protocol's | where leaves start to serve as ToFs, at least for the duration of | |||
convergence. | a protocol's convergence. | |||
5.4. Discovering Fallen Leaves | 5.4. Discovering Fallen Leaves | |||
When aggregation is used, RIFT deals with fallen leaves by ensuring | When aggregation is used, RIFT deals with fallen leaves by ensuring | |||
that all the ToF nodes share the same north topology database. This | that all the ToF nodes share the same north topology database. This | |||
happens naturally in single plane design by the means of northbound | happens naturally in single-plane design by the means of northbound | |||
flooding and south reflection but needs additional considerations in | flooding and south reflection but needs additional considerations in | |||
multi-plane fabrics. To enable routing to fallen leaves in multi- | multi-plane fabrics. To enable routing to fallen leaves in multi- | |||
plane designs, RIFT requires additional interconnection across planes | plane designs, RIFT requires additional interconnection across planes | |||
between the ToF nodes, e.g., using rings as illustrated in Figure 13. | between the ToF nodes, e.g., using rings as illustrated in Figure 13. | |||
Other solutions are possible but they either need more cabling or end | Other solutions are possible, but they either need more cabling or | |||
up having much longer flooding paths and/or single points of failure. | end up having much longer flooding paths and/or single points of | |||
failure. | ||||
In detail, by reserving at least two ports on each ToF node it is | In detail, by reserving at least two ports on each ToF node, it is | |||
possible to connect them together by interplane bi-directional rings | possible to connect them together by interplane bidirectional rings | |||
as illustrated in Figure 13. The rings will be used to exchange full | as illustrated in Figure 13. The rings will be used to exchange full | |||
north topology information between planes. With all ToFs sharing the | north topology information between planes. With all ToFs sharing the | |||
same north topology, the transitive, negative disaggregation | same north topology, the transitive, negative disaggregation | |||
described in Section 6.5.2 can efficiently fix any | described in Section 6.5.2 can efficiently fix any | |||
possible fallen leaf scenario. Somewhat as a side effect, the | possible fallen leaf scenario. Somewhat as a side effect, the | |||
exchange of information fulfills the requirement for a full view of | exchange of information fulfills the requirement for a full view of | |||
the fabric topology at the ToF level, without the need to collate it | the fabric topology at the ToF level without the need to collate it | |||
from multiple points. | from multiple points. | |||
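One way to picture the ring wiring is the following sketch, which links the ToF node at the same index in each plane into a bidirectional ring. The (plane, index) naming is an assumption made for illustration, and the sketch assumes at least three planes. With fewer planes the ring degenerates into parallel or self links.

```python
from collections import Counter


def interplane_rings(planes, tofs_per_plane):
    """Return ring links joining same-index ToF nodes across planes."""
    links = []
    for idx in range(1, tofs_per_plane + 1):
        for j, plane in enumerate(planes):
            # The last plane wraps around to the first to close the ring.
            next_plane = planes[(j + 1) % len(planes)]
            links.append(((plane, idx), (next_plane, idx)))
    return links


# Four planes as in Figure 13, with 2 ToF nodes per plane assumed.
rings = interplane_rings(["A", "B", "C", "D"], tofs_per_plane=2)
# Count how many ring ports each ToF node uses.
ports_used = Counter(node for link in rings for node in link)
print(len(rings), dict(ports_used))
```

Each ToF node ends up on exactly two ring links, which corresponds to the two reserved ports mentioned above.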
____________________________________________________________________________ | _______________________________________________________________________ | |||
| [Plane A] . [Plane B] . [Plane C] . [Plane D] | | | [Plane A] . [Plane B] . [Plane C] . [Plane D] | | |||
|..........................................................................| | |.....................................................................| | |||
| +-------------------------------------------------------------+ | | | +------------------------------------------------------------+ | | |||
| | +---+ . +---+ . +---+ . +---+ | | | | | +---+ . +---+ . +---+ . +---+ | | | |||
| +-+ n +-------------+ n +-------------+ n +-------------+ n +-+ | | | +-+ n +-------------+ n +-------------+ n +------------+ n +-+ | | |||
| +--++ . +-+++ . +-+++ . +--++ | | | +--++ . +-+++ . +-+++ . +--++ | | |||
| || . || . || . || | | | || . || . || . || | | |||
| +---------||---------------||----------------||---------------+ || | | | +---------||---------------||----------------||--------------+ || | | |||
| | +---+ || . +---+ || . +---+ || . +---+ | || | | | | +---+ || . +---+ || . +---+ || . +---+ | || | | |||
| +-+ 1 +---||--------+ 1 +--||---------+ 1 +--||---------+ 1 +-+ || | | | +-+ 1 +---||--------+ 1 +--||---------+ 1 +--||--------+ 1 +-+ || | | |||
| +--++ || . +-+++ || . +-+++ || . +-+++ || | | | +--++ || . +-+++ || . +-+++ || . +-+++ || | | |||
| || || . || || . || || . || || | | | || || . || || . || || . || || | | |||
| || || . || || . || || . || || | | | || || . || || . || || . || || | | |||
Figure 13: Using rings to bring all planes and at the ToF bind them | Figure 13: Using Rings to Bring All Planes and Bind Them at the ToF | |||
5.5. Addressing the Fallen Leaves Problem | 5.5. Addressing the Fallen Leaves Problem | |||
One consequence of the "Fallen Leaf" problem is that some prefixes | One consequence of the "fallen leaf" problem is that some prefixes | |||
attached to the fallen leaf become unreachable from some of the ToF | attached to the fallen leaf become unreachable from some of the ToF | |||
nodes. RIFT defines two methods to address this issue denoted as | nodes. RIFT defines two methods to address this issue, denoted as | |||
positive disaggregation and negative disaggregation. Both methods | positive disaggregation and negative disaggregation. Both methods | |||
flood corresponding types of South TIEs to advertise the impacted | flood corresponding types of South TIEs to advertise the impacted | |||
prefix(es). | prefix(es). | |||
When used for the operation of disaggregation, a positive South TIE, | When used for the operation of disaggregation, a positive South TIE, | |||
as usual, indicates reachability to a prefix of given length and all | as usual, indicates reachability to a prefix of given length and all | |||
addresses subsumed by it. In contrast, a negative route | addresses subsumed by it. In contrast, a negative route | |||
advertisement indicates that the origin cannot route to the | advertisement indicates that the origin cannot route to the | |||
advertised prefix. | advertised prefix. | |||
The positive disaggregation is originated by a router that can still | The positive disaggregation is originated by a router that can still | |||
reach the advertised prefix, and the operation is not transitive. In | reach the advertised prefix, and the operation is not transitive. In | |||
other words, the receiver does *not* generate its own TIEs or flood | other words, the receiver does *not* generate its own TIEs or flood | |||
them south as a consequence of receiving positive disaggregation | them south as a consequence of receiving positive disaggregation | |||
advertisements from a higher level node. The effect of a positive | advertisements from a higher-level node. The effect of a positive | |||
disaggregation is that the traffic to the impacted prefix will follow | disaggregation is that the traffic to the impacted prefix will follow | |||
the longest match and will be limited to the northbound routers that | the longest match and will be limited to the northbound routers that | |||
advertised the more specific route. | advertised the more specific route. | |||
In contrast, the negative disaggregation can be transitive, and is | In contrast, the negative disaggregation can be transitive and is | |||
propagated south when all the possible routes have been advertised as | propagated south when all the possible routes have been advertised as | |||
negative exceptions. A negative route advertisement is only | negative exceptions. A negative route advertisement is only | |||
actionable when the negative prefix is aggregated by a positive route | actionable when the negative prefix is aggregated by a positive route | |||
advertisement for a shorter prefix. In such case, the negative | advertisement for a shorter prefix. In such case, the negative | |||
advertisement "punches out a hole" in the positive route in the | advertisement "punches out a hole" in the positive route in the | |||
routing table, making the positive prefix reachable through the | routing table, making the positive prefix reachable through the | |||
originator with the special consideration of the negative prefix | originator with the special consideration of the negative prefix | |||
removing certain next hop neighbors. The specific procedures will be | removing certain next-hop neighbors. The specific procedures are | |||
explained in detail in Section 6.5.2.3. | explained in detail in Section 6.5.2.3. | |||
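
The "punching out a hole" behavior described above can be illustrated with a minimal sketch. It is not the normative procedure of Section 6.5.2.3; it only assumes that a negative advertisement is actionable when a shorter positive prefix covers it, and that the advertising neighbors are then removed from the next hops inherited from that positive prefix. The function name, the dictionary layout, and the node names are hypothetical.

```python
import ipaddress


def next_hops_after_negative(positive_routes, negative_adverts, prefix):
    """Next hops for `prefix` once negative advertisements are applied.

    positive_routes:  {prefix string: set of next-hop neighbor names}
    negative_adverts: {prefix string: set of neighbors that advertised
                       the prefix negatively}
    Returns None when no shorter positive prefix covers `prefix`, i.e. the
    negative advertisement is not actionable in this simplified model.
    """
    target = ipaddress.ip_network(prefix)
    best_key, best_net = None, None
    for key in positive_routes:
        net = ipaddress.ip_network(key)
        covers = (
            net.version == target.version
            and net.prefixlen < target.prefixlen
            and target.subnet_of(net)
        )
        # Keep the longest covering positive prefix.
        if covers and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_key, best_net = key, net
    if best_key is None:
        return None
    # Punch the hole: drop neighbors that said they cannot reach the prefix.
    return positive_routes[best_key] - negative_adverts.get(prefix, set())


# Hypothetical example: a default route via three ToF nodes, one of which
# advertises 10.1.0.0/16 negatively.
routes = {"0.0.0.0/0": {"tof1", "tof2", "tof3"}}
negatives = {"10.1.0.0/16": {"tof2"}}
remaining = next_hops_after_negative(routes, negatives, "10.1.0.0/16")
```

An empty result in this sketch corresponds to the case where all possible routes have been advertised as negative exceptions, which is when the text above says the negative disaggregation is propagated south.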
When the ToF switches are not partitioned into multiple planes, the | When the ToF switches are not partitioned into multiple planes, the | |||
resulting southbound flooding of the positive disaggregation by the | resulting southbound flooding of the positive disaggregation by the | |||
ToF nodes that can still reach the impacted prefix is in general | ToF nodes that can still reach the impacted prefix is generally | |||
enough to cover all the switches at the next level south, typically | enough to cover all the switches at the next level south, typically | |||
the ToP nodes. If all those switches are aware of the | the ToP nodes. If all those switches are aware of the | |||
disaggregation, they collectively create a ceiling that intercepts | disaggregation, they collectively create a ceiling that intercepts | |||
all the traffic north and forwards it to the ToF nodes that | all the traffic north and forwards it to the ToF nodes that | |||
advertised the more specific route. In that case, the positive | advertised the more specific route. In that case, the positive | |||
disaggregation alone is sufficient to solve the fallen leaf problem. | disaggregation alone is sufficient to solve the fallen leaf problem. | |||
On the other hand, when the fabric is partitioned in planes, the | On the other hand, when the fabric is partitioned in planes, the | |||
positive disaggregation from ToF nodes in different planes do not | positive disaggregation from ToF nodes in different planes do not | |||
reach the ToP switches in the affected plane and cannot solve the | reach the ToP switches in the affected plane and cannot solve the | |||
skipping to change at page 35, line 33 ¶ | skipping to change at line 1488 ¶ | |||
packet typically occurs at the leaf level and the disaggregation must | packet typically occurs at the leaf level and the disaggregation must | |||
be transitive and reach all the leaves. In that case, the negative | be transitive and reach all the leaves. In that case, the negative | |||
disaggregation is necessary. The details on the RIFT approach to | disaggregation is necessary. The details on the RIFT approach to | |||
deal with fallen leaves in an optimal way are specified in | deal with fallen leaves in an optimal way are specified in | |||
Section 6.5.2. | Section 6.5.2. | |||
6. Specification | 6. Specification | |||
This section specifies the protocol in a normative fashion by either | This section specifies the protocol in a normative fashion by either | |||
prescriptive procedures or behavior defined by Finite State Machines | prescriptive procedures or behavior defined by Finite State Machines | |||
(FSM). | (FSMs). | |||
The FSMs, as usual, are presented as states a neighbor can assume, | The FSMs, as usual, are presented as states a neighbor can assume, | |||
events that can occur, and the corresponding actions performed when | events that can occur, and the corresponding actions performed when | |||
transitioning between states on event processing. | transitioning between states on event processing. | |||
Actions are performed before the end state is assumed. | Actions are performed before the end state is assumed. | |||
The FSMs can queue events against itself to chain actions or against | The FSMs can queue events against themselves to chain actions or | |||
other FSMs in the specification. Events are always processed in the | against other FSMs in the specification. Events are always processed | |||
sequence they have been queued. | in the sequence they have been queued. | |||
Consequently, "On Entry" actions for an FSM state are performed every | Consequently, "On Entry" actions for an FSM state are performed every | |||
time and right before the corresponding state is entered, i.e., after | time and right before the corresponding state is entered, i.e., after | |||
any transitions from previous state. | any transitions from previous state. | |||
"On Exit" actions are performed every time and immediately when a | "On Exit" actions are performed every time and immediately when a | |||
state is exited, i.e., before any transitions towards target state | state is exited, i.e., before any transitions towards the target | |||
are performed. | state are performed. | |||
Any attempt to transition from a state towards another on reception | Any attempt to transition from a state towards another on reception | |||
of an event where no action is specified MUST be considered an | of an event where no action is specified MUST be considered an | |||
unrecoverable error and the protocol MUST reset all adjacencies and | unrecoverable error, and the protocol MUST reset all adjacencies and | |||
discard all the state (i.e., force the FSM back to _OneWay_ and flush | discard all the states (i.e., force the FSM back to _OneWay_ and | |||
all of the queues holding flooding information). | flush all of the queues holding flooding information). | |||
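
The FSM conventions above (events processed in queue order, "On Exit" before the transition action, "On Entry" right before the new state is assumed, and a reset on any unspecified transition) can be sketched as follows. This is a conceptual illustration only: the class, the event names used in the example, and the decision to run entry and exit hooks only when the state actually changes are assumptions, and the reset here covers a single FSM rather than all adjacencies.

```python
from collections import deque


class UnrecoverableFsmError(Exception):
    """Raised when an event has no transition defined in the current state."""


class Fsm:
    def __init__(self, initial, transitions, on_entry=None, on_exit=None):
        self.initial = initial
        self.state = initial
        # transitions: {(state, event): (action or None, target state)}
        self.transitions = transitions
        self.on_entry = on_entry or {}
        self.on_exit = on_exit or {}
        self.events = deque()

    def push(self, event):
        """Queue an event; actions may call this to chain further events."""
        self.events.append(event)

    def reset(self):
        """Flush queued events and force the FSM back to its initial state."""
        self.events.clear()
        self.state = self.initial

    def run(self):
        # Events are processed strictly in the order they were queued.
        while self.events:
            event = self.events.popleft()
            key = (self.state, event)
            if key not in self.transitions:
                failed_state = self.state
                self.reset()
                raise UnrecoverableFsmError(
                    f"no transition for {event!r} in state {failed_state!r}"
                )
            action, target = self.transitions[key]
            changing = target != self.state  # assumption: hooks skip self-loops
            if changing and self.state in self.on_exit:
                self.on_exit[self.state](self)
            if action is not None:
                action(self)  # performed before the end state is assumed
            if changing and target in self.on_entry:
                self.on_entry[target](self)
            self.state = target
```

A caller would construct the machine with "OneWay" as the initial state, push events as packets and timers arrive, and call `run()` to drain the queue.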
The data structures and FSMs described in this document are | The data structures and FSMs described in this document are | |||
conceptual and do not have to be implemented precisely as described | conceptual and do not have to be implemented precisely as described | |||
here, i.e., an implementation is considered conforming as long as it | here, i.e., an implementation is considered conforming as long as it | |||
supports the described functionality and exhibits externally | supports the described functionality and exhibits externally | |||
observable behavior equivalent to the behavior of the standardized | observable behavior equivalent to the behavior of the standardized | |||
FSMs. | FSMs. | |||
The FSMs can use "timers" for different situations. Those timers are | The FSMs can use "timers" for different situations. Those timers are | |||
started through actions and their expiration leads to queuing of | started through actions, and their expiration leads to queuing of | |||
corresponding events to be processed. | corresponding events to be processed. | |||
The term "holdtime" is used often as short-hand for "holddown timer" | The term "holdtime" is used often as shorthand for "holddown timer" | |||
and signifies either the length of the holding down period or the | and signifies either the length of the holding down period or the | |||
timer used to expire after such period. Such timers are used to | timer used to expire after such period. Such timers are used to | |||
"hold down" state within an FSM that is cleaned if the machine | "holddown" the state within an FSM that is cleaned if the machine | |||
triggers a _HoldtimeExpired_ event. | triggers a _HoldtimeExpired_ event. | |||
6.1. Transport | 6.1. Transport | |||
All normative RIFT packet structures and their contents are defined | All normative RIFT packet structures and their contents are defined | |||
in the Thrift [thrift] models in Section 7. The packet structure | in the Thrift [thrift] models in Section 7. The packet structure | |||
itself is defined in _ProtocolPacket_ which contains the packet | itself is defined in _ProtocolPacket_, which contains the packet | |||
header in _PacketHeader_ and the packet contents in _PacketContent_. | header in _PacketHeader_ and the packet contents in _PacketContent_. | |||
_PacketContent_ is a union of the LIE, TIE, TIDE, and TIRE packets | _PacketContent_ is a union of the LIE, TIE, TIDE, and TIRE packets, | |||
which are subsequently defined in _LIEPacket_, _TIEPacket_, | which are subsequently defined in _LIEPacket_, _TIEPacket_, | |||
_TIDEPacket_, and _TIREPacket_ respectively. | _TIDEPacket_, and _TIREPacket_, respectively. | |||
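
The nesting described above can be pictured with a small sketch. The normative definitions are the Thrift models in Section 7; the Python classes below are only a simplified stand-in that reuses the schema element names, treats the four payloads as opaque values, and omits all header fields except the optional level.

```python
from dataclasses import dataclass
from typing import Any, Optional

_KINDS = ("lie", "tie", "tide", "tire")


@dataclass
class PacketHeader:
    # Optional: absent when the level is not provisioned.
    level: Optional[int] = None


@dataclass
class PacketContent:
    # A union: exactly one of these is expected to be set.
    lie: Optional[Any] = None
    tie: Optional[Any] = None
    tide: Optional[Any] = None
    tire: Optional[Any] = None

    def kind(self) -> str:
        """Return which member of the union is populated."""
        present = [k for k in _KINDS if getattr(self, k) is not None]
        if len(present) != 1:
            raise ValueError("PacketContent must carry exactly one packet type")
        return present[0]


@dataclass
class ProtocolPacket:
    header: PacketHeader
    content: PacketContent


# Hypothetical LIE with a placeholder payload.
packet = ProtocolPacket(PacketHeader(level=0), PacketContent(lie={"name": "leaf1"}))
```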
Further, in terms of bits on the wire, it is the _ProtocolPacket_ | Further, in terms of bits on the wire, it is the _ProtocolPacket_ | |||
that is serialized and carried in an envelope defined in | that is serialized and carried in an envelope defined in | |||
Section 6.9.3 within a UDP frame that provides security and allows | Section 6.9.3 within a UDP frame that provides security and allows | |||
validation/modification of several important fields without Thrift | validation/modification of several important fields without Thrift | |||
de-serialization for performance and security reasons. Security | deserialization for performance and security reasons. Security | |||
model and procedures are further explained in Section 9. | models and procedures are further explained in Section 9. | |||
6.2. Link (Neighbor) Discovery (LIE Exchange) | 6.2. Link (Neighbor) Discovery (LIE Exchange) | |||
RIFT LIE exchange auto-discovers neighbors, negotiates RIFT ZTP | RIFT LIE exchange auto-discovers neighbors, negotiates RIFT ZTP | |||
parameters and discovers miscablings. The formation progresses under | parameters, and discovers miscablings. The formation progresses | |||
normal conditions from _OneWay_ to _TwoWay_ and then _ThreeWay_ state | under normal conditions from _OneWay_ to _TwoWay_ and then _ThreeWay_ | |||
at which point it is ready to exchange TIEs per Section 6.3. The | state, at which point it is ready to exchange TIEs as described in | |||
adjacency exchanges RIFT ZTP information (Section 6.7) in any of the | Section 6.3. The adjacency exchanges RIFT ZTP information | |||
states, i.e. it is not necessary to reach _ThreeWay_ for zero-touch | (Section 6.7) in any of the states, i.e., it is not necessary to | |||
provisioning to operate. | reach _ThreeWay_ for ZTP to operate. | |||
RIFT supports any combination of IPv4 and IPv6 addressing, including | RIFT supports any combination of IPv4 and IPv6 addressing, including | |||
link-local scope, on the fabric to form adjacencies with the | link-local scope, on the fabric to form adjacencies with the | |||
additional capability for forwarding paths that are capable of | additional capability for forwarding paths that are capable of | |||
forwarding IPv4 packets in presence of IPv6 addressing only. | forwarding IPv4 packets in the presence of IPv6 addressing only. | |||
IPv4 LIE exchange happens by default over well-known administratively | IPv4 LIE exchange happens by default over a well-known IPv4 multicast | |||
locally scoped and configured or otherwise well-known IPv4 multicast | address [RFC2365] that may also be administratively configured (e.g., | |||
address [RFC2365]. For IPv6 [RFC8200] exchange is performed over | with a local scope). For IPv6 [RFC8200], exchange is performed over | |||
link-local multicast scope [RFC4291] address which is configured or | the link-local multicast scope [RFC4291] address, which is configured | |||
otherwise well-known. In both cases a destination UDP port defined | or otherwise well-known. In both cases, a destination UDP port | |||
in the schema Section 7.2 is used unless configured otherwise. LIEs | defined in the schema (Section 7.2) is used unless configured | |||
MUST be sent with an IPv4 Time to Live (TTL) or an IPv6 Hop Limit | otherwise. LIEs MUST be sent with an IPv4 Time to Live (TTL) or an | |||
(HL) of either 1 or 255 to prevent RIFT information reaching beyond a | IPv6 Hop Limit (HL) of either 1 or 255 to prevent RIFT information | |||
single L3 next-hop in the topology. Observe that for the allocated | reaching beyond a single Layer 3 (L3) next hop in the topology. | |||
link-local scope IP multicast address TTL value of 1 is a more | Observe that, for the allocated link-local scope IP multicast | |||
logical choice since TTL value of 255 may in some environment lead to | address, the TTL value of 1 is a more logical choice since the TTL | |||
an early drop due to suspicious TTL value for a packet addressed to | value of 255 may, in some environments, lead to an early drop due to | |||
such destination. LIEs SHOULD be sent with network control | the suspicious TTL value for a packet addressed to such a | |||
precedence unless an implementation is prevented from doing so | destination. LIEs SHOULD be sent with network control precedence | |||
[RFC2474]. | unless an implementation is prevented from doing so [RFC2474]. | |||
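
The sending-side requirements above can be sketched with standard socket options. The multicast group and UDP port are deliberately left out because they come from the schema or from configuration. The function name is hypothetical. The TOS byte 0xE0 (IP precedence 7) is one assumed reading of "network control precedence"; an implementation may map it differently, so it is a parameter here. A failure to set it is tolerated because the text makes it a SHOULD.

```python
import socket

ALLOWED_HOP_LIMITS = (1, 255)


def make_lie_send_socket(family=socket.AF_INET, hop_limit=1, tos=0xE0):
    """Create a UDP socket configured for sending LIEs (illustrative only)."""
    if hop_limit not in ALLOWED_HOP_LIMITS:
        raise ValueError("LIEs must be sent with a TTL/Hop Limit of 1 or 255")
    sock = socket.socket(family, socket.SOCK_DGRAM)
    try:
        # MUST: limit multicast LIEs to a single L3 hop.
        if family == socket.AF_INET6:
            sock.setsockopt(
                socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_HOPS, hop_limit
            )
        else:
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, hop_limit)
    except OSError:
        sock.close()
        raise
    try:
        # SHOULD: request elevated precedence, unless the platform prevents it.
        if family == socket.AF_INET6:
            sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_TCLASS, tos)
        else:
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    except (OSError, AttributeError):
        pass
    return sock
```

A hop limit of 1 is the default in this sketch, following the observation above that 255 may look suspicious for a link-local scope multicast destination.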
Any LIE packet received on an address that is neither the well-known | Any LIE packet received on an address that is neither the well-known | |||
nor configured multicast or a broadcast address MUST be discarded. | nor configured multicast or a broadcast address MUST be discarded. | |||
The originating port of the LIE has no further significance other | The originating port of the LIE has no further significance, other | |||
than identifying the origination point. LIEs are exchanged over all | than identifying the origination point. LIEs are exchanged over all | |||
links running RIFT. | links running RIFT. | |||
An implementation may listen and send LIEs on IPv4 and/or IPv6 | An implementation may listen and send LIEs on IPv4 and/or IPv6 | |||
multicast addresses. A node MUST NOT originate LIEs on an address | multicast addresses. A node MUST NOT originate LIEs on an address | |||
family if it does not process received LIEs on that family. LIEs on | family if it does not process received LIEs on that family. LIEs on | |||
the same link are considered part of the same LIE FSM independent of | the same link are considered part of the same LIE FSM independent of | |||
the address family they arrive on. The LIE source address may not | the address family they arrive on. The LIE source address may not | |||
identify the peer uniquely in unnumbered or link-local address cases | identify the peer uniquely in unnumbered or link-local address cases | |||
so the response transmission MUST occur over the same interface the | so the response transmission MUST occur over the same interface the | |||
LIEs have been received on. A node may use any of the adjacency's | LIEs have been received on. A node may use any of the adjacency's | |||
source addresses it saw in LIEs on the specific interface during | source addresses it saw in LIEs on the specific interface during | |||
adjacency formation to send TIEs (Section 6.3.3). That implies that | adjacency formation to send TIEs (Section 6.3.3). That implies that | |||
an implementation MUST be ready to accept TIEs on all addresses it | an implementation MUST be ready to accept TIEs on all addresses it | |||
used as source of LIE frames. | used as sources of LIE frames. | |||
A simplified version MAY be implemented on platforms with limited | A simplified version MAY be implemented on platforms with limited | |||
multicast support (e.g. IoT devices) by sending and receiving LIE | multicast support (e.g., Internet of Things (IoT) devices) by sending | |||
frames on IPv4 subnet broadcast addresses or IPv6 all routers | and receiving LIE frames on IPv4 subnet broadcast addresses or IPv6 | |||
multicast address. However, this technique is less optimal and | all-routers multicast addresses. However, this technique is less | |||
presents a wider attack surface from a security perspective and | optimal and presents a wider attack surface from a security | |||
should hence be used only as last resort. | perspective and should hence be used only as a last resort. | |||
A _ThreeWay_ adjacency (as defined in the glossary) over any address | A _ThreeWay_ adjacency (as defined in the glossary) over any address | |||
family implies support for IPv4 forwarding if the | family implies support for IPv4 forwarding if the | |||
_ipv4_forwarding_capable_ flag in _LinkCapabilities_ is set to true. | _ipv4_forwarding_capable_ flag in _LinkCapabilities_ is set to true. | |||
In the absence of IPv4 LIEs with _ipv4_forwarding_capable_ set to | In the absence of IPv4 LIEs with _ipv4_forwarding_capable_ set to | |||
true, a node MUST forward IPv4 packets using gateways discovered on | true, a node MUST forward IPv4 packets using gateways discovered on | |||
IPv6-only links advertising this capability. The mechanism to | IPv6-only links advertising this capability. The mechanism to | |||
discover the corresponding IPv6 gateway is out of scope for this | discover the corresponding IPv6 gateway is out of scope for this | |||
specification and may be implementation specific. It is expected | specification and may be implementation-specific. It is expected | |||
that the whole fabric supports the same type of forwarding of address | that the whole fabric supports the same type of forwarding of address | |||
families on all the links, any other combination is outside the scope | families on all the links; any other combination is outside the scope | |||
of this specification. If IPv4 forwarding is supported on an | of this specification. If IPv4 forwarding is supported on an | |||
interface, _ipv4_forwarding_capable_ MUST be set to true for all LIEs | interface, _ipv4_forwarding_capable_ MUST be set to true for all LIEs | |||
advertised from that interface. If IPv4 and IPv6 LIEs indicate | advertised from that interface. If IPv4 and IPv6 LIEs indicate | |||
contradicting information, protocol behavior is unspecified. A node | contradicting information, protocol behavior is unspecified. A node | |||
sending IPv4 LIEs MUST set the _ipv4_forwarding_capable_ flag to true | sending IPv4 LIEs MUST set the _ipv4_forwarding_capable_ flag to true | |||
on all LIEs advertised from that interface. | on all LIEs advertised from that interface. | |||
Operation of a fabric where only some of the links are supporting | Operation of a fabric where only some of the links are supporting | |||
forwarding on an address family or have an address in a family and | forwarding on an address family or have an address in a family and | |||
others do not is outside the scope of this specification. | others do not is outside the scope of this specification. | |||
Any attempt to construct IPv6 forwarding over IPv4 only adjacencies | Any attempt to construct IPv6 forwarding over IPv4-only adjacencies | |||
is outside this specification. | is outside the scope of this specification. | |||
Table 1 outlines protocol behavior pertaining to LIE exchange over | Table 1 outlines protocol behavior pertaining to LIE exchange over | |||
different address family combinations. Table 2 outlines the way in | different address family combinations. Table 2 outlines the way in | |||
which neighbors forward traffic as it pertains to the | which neighbors forward traffic as it pertains to the | |||
_ipv4_forwarding_capable_ flag setting across the same address family | _ipv4_forwarding_capable_ flag setting across the same address family | |||
combinations. The table is symmetric, i.e. local and remote can be | combinations. The table is symmetric, i.e., the local and remote | |||
exchanged to construct the remaining combinations. | columns can be exchanged to construct the remaining combinations. | |||
The specific forwarding implementation to support the described | The specific forwarding implementation to support the described | |||
behavior is out of scope for this document. | behavior is out of scope for this document. | |||
+==========+==========+==========================================+ | +==========+==========+==========================================+ | |||
| Local | Remote | LIE Exchange Behavior | | | Local | Remote | LIE Exchange Behavior | | |||
| Neighbor | Neighbor | | | | Neighbor | Neighbor | | | |||
| AF | AF | | | | Address | Address | | | |||
| Family | Family | | | ||||
+==========+==========+==========================================+ | +==========+==========+==========================================+ | |||
| IPv4 | IPv4 | LIEs and TIEs are exchanged over IPv4 | | | IPv4 | IPv4 | LIEs and TIEs are exchanged over IPv4 | | |||
| | | only. The local neighbor receives TIEs | | | | | only. The local neighbor receives TIEs | | |||
| | | from remote neighbors on any of the LIE | | | | | from remote neighbors on any of the LIE | | |||
| | | source addresses. | | | | | source addresses. | | |||
+----------+----------+------------------------------------------+ | +----------+----------+------------------------------------------+ | |||
| IPv6 | IPv6 | LIEs and TIEs are exchanged over IPv6 | | | IPv6 | IPv6 | LIEs and TIEs are exchanged over IPv6 | | |||
| | | only. The local neighbor receives TIEs | | | | | only. The local neighbor receives TIEs | | |||
| | | from remote neighbors on any of the LIE | | | | | from remote neighbors on any of the LIE | | |||
| | | source addresses. | | | | | source addresses. | | |||
+----------+----------+------------------------------------------+ | +----------+----------+------------------------------------------+ | |||
| IPv4, | IPv6 | The local neighbor sends LIEs for both | | | IPv4, | IPv6 | The local neighbor sends LIEs for both | | |||
| IPv6 | | IPv4 and IPv6 while the remote neighbor | | | IPv6 | | IPv4 and IPv6, while the remote neighbor | | |||
| | | only sends LIEs for IPv6. The resulting | | | | | only sends LIEs for IPv6. The resulting | | |||
| | | adjacency will exchange TIEs over IPv6 | | | | | adjacency will exchange TIEs over IPv6 | | |||
| | | on any of the IPv6 LIE source addresses. | | | | | on any of the IPv6 LIE source addresses. | | |||
+----------+----------+------------------------------------------+ | +----------+----------+------------------------------------------+ | |||
| IPv4, | IPv4, | LIEs and TIEs are exchanged over IPv6 | | | IPv4, | IPv4, | LIEs and TIEs are exchanged over IPv6 | | |||
| IPv6 | IPv6 | and IPv4. TIEs are received on any of | | | IPv6 | IPv6 | and IPv4. TIEs are received on any of | | |||
| | | the IPv4 or IPv6 LIE source addresses. | | | | | the IPv4 or IPv6 LIE source addresses. | | |||
| | | The local neighbor receives TIEs from | | | | | The local neighbor receives TIEs from | | |||
| | | the remote neighbors on any of the IPv4 | | | | | the remote neighbors on any of the IPv4 | | |||
| | | or IPv6 LIE source addresses. | | | | | or IPv6 LIE source addresses. | | |||
+----------+----------+------------------------------------------+ | +----------+----------+------------------------------------------+ | |||
| IPv4, | IPv4 | The local neighbor sends LIEs for both | | | IPv4, | IPv4 | The local neighbor sends LIEs for both | | |||
| IPv6 | | IPv4 and IPv6 while the remote neighbor | | | IPv6 | | IPv4 and IPv6, while the remote neighbor | | |||
| | | only sends LIEs for IPv4. The resulting | | | | | only sends LIEs for IPv4. The resulting | | |||
| | | adjacency will exchange TIEs over IPv4 | | | | | adjacency will exchange TIEs over IPv4 | | |||
| | | on any of the IPv4 LIE source addresses. | | | | | on any of the IPv4 LIE source addresses. | | |||
+----------+----------+------------------------------------------+ | +----------+----------+------------------------------------------+ | |||
Table 1: Control Plane Behavior for Neighbor AF Combinations | Table 1: Control Plane Behavior for Neighbor Address Family | |||
Combinations | ||||
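
The rows of Table 1 follow one pattern: TIEs are exchanged over the address families on which both neighbors send LIEs. The short sketch below expresses that reading. The function name is hypothetical, and the empty result for neighbors with no family in common is an assumption, since Table 1 does not list that combination.

```python
def tie_exchange_families(local_lie_families, remote_lie_families):
    """Address families over which TIEs are exchanged, per Table 1.

    Each argument is an iterable of family names such as "IPv4" or "IPv6",
    listing the families on which that neighbor sends LIEs.
    """
    return frozenset(local_lie_families) & frozenset(remote_lie_families)


# The five rows of Table 1, in order.
rows = [
    tie_exchange_families({"IPv4"}, {"IPv4"}),
    tie_exchange_families({"IPv6"}, {"IPv6"}),
    tie_exchange_families({"IPv4", "IPv6"}, {"IPv6"}),
    tie_exchange_families({"IPv4", "IPv6"}, {"IPv4", "IPv6"}),
    tie_exchange_families({"IPv4", "IPv6"}, {"IPv4"}),
]
```

Because the table is symmetric, swapping the two arguments gives the same result.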
+==========+==========+==========================================+ | +==========+==========+==========================================+ | |||
| Local | Remote | Forwarding Behavior | | | Local | Remote | Forwarding Behavior | | |||
| Neighbor | Neighbor | | | | Neighbor | Neighbor | | | |||
| AF | AF | | | | Address | Address | | | |||
| Family | Family | | | ||||
+==========+==========+==========================================+ | +==========+==========+==========================================+ | |||
| IPv4 | IPv4 | Only IPv4 traffic can be forwarded. | | | IPv4 | IPv4 | Only IPv4 traffic can be forwarded. | | |||
+----------+----------+------------------------------------------+ | +----------+----------+------------------------------------------+ | |||
| IPv6 | IPv6 | If either neighbor sets | | | IPv6 | IPv6 | If either neighbor sets | | |||
| | | _ipv4_forwarding_capable_ to false, only | | | | | _ipv4_forwarding_capable_ to false, only | | |||
| | | IPv6 traffic can be forwarded. If both | | | | | IPv6 traffic can be forwarded. If both | | |||
| | | neighbors set _ipv4_forwarding_capable_ | | | | | neighbors set _ipv4_forwarding_capable_ | | |||
| | | to true, IPv4 traffic is also forwarded | | | | | to true, IPv4 traffic is also forwarded | | |||
| | | via IPv6 gateways. | | | | | via IPv6 gateways. | | |||
+----------+----------+------------------------------------------+ | +----------+----------+------------------------------------------+ | |||
skipping to change at page 40, line 35 ¶ | skipping to change at line 1710 ¶ | |||
+----------+----------+------------------------------------------+ | +----------+----------+------------------------------------------+ | |||
| IPv4, | IPv4, | IPv4 and IPv6 traffic can be forwarded. | | | IPv4, | IPv4, | IPv4 and IPv6 traffic can be forwarded. | | |||
| IPv6 | IPv6 | If IPv4 and IPv6 LIEs advertise | | | IPv6 | IPv6 | If IPv4 and IPv6 LIEs advertise | | |||
| | | conflicting _ipv4_forwarding_capable_ | | | | | conflicting _ipv4_forwarding_capable_ | | |||
| | | flags, the behavior is unspecified. | | | | | flags, the behavior is unspecified. | | |||
+----------+----------+------------------------------------------+ | +----------+----------+------------------------------------------+ | |||
| IPv4, | IPv4 | IPv4 traffic can be forwarded. | | | IPv4, | IPv4 | IPv4 traffic can be forwarded. | | |||
| IPv6 | | | | | IPv6 | | | | |||
+----------+----------+------------------------------------------+ | +----------+----------+------------------------------------------+ | |||
Table 2: Forwarding Behavior for Neighbor AF Combinations | Table 2: Forwarding Behavior for Neighbor Address Family | |||
Combinations | ||||
The protocol does *not* support selective disabling of address | The protocol does *not* support selective disabling of address | |||
families after adjacency formation, disabling IPv4 forwarding | families after adjacency formation, disabling IPv4 forwarding | |||
capability or any local address changes in _ThreeWay_ state, i.e. if | capability, or any local address changes in _ThreeWay_ state, i.e., | |||
a link has entered ThreeWay IPv4 and/or IPv6 with a neighbor on an | if a link has entered ThreeWay IPv4 and/or IPv6 with a neighbor on an | |||
adjacency and it wants to stop supporting one of the families or | adjacency and it wants to stop supporting one of the families, change | |||
change any of its local addresses or stop IPv4 forwarding, it MUST | any of its local addresses, or stop IPv4 forwarding, it MUST tear | |||
tear down and rebuild the adjacency. It MUST also remove any state | down and rebuild the adjacency. It MUST also remove any state it | |||
it stored about the remote side of the adjacency such as associated | stored about the remote side of the adjacency such as associated LIE | |||
LIE source addresses. | source addresses. | |||
Unless RIFT ZTP as described in Section 6.7 is used, each node is | Unless RIFT ZTP is used as described in Section 6.7, each node is | |||
provisioned with the level at which it is operating and advertises it | provisioned with the level at which it is operating and advertises it | |||
in the _level_ of the _PacketHeader_ schema element. It MAY be also | in the _level_ of the _PacketHeader_ schema element. It MAY also be | |||
provisioned with its PoD. If level is not provisioned, it is not | provisioned with its PoD. If the level is not provisioned, it is not | |||
present in the optional _PacketHeader_ schema element and established | present in the optional _PacketHeader_ schema element and established | |||
by ZTP procedures if feasible. If PoD is not provisioned, it is | by ZTP procedures, if feasible. If PoD is not provisioned, it is | |||
governed by the _LIEPacket_ schema element assuming the | governed by the _LIEPacket_ schema element assuming the | |||
_common.default_pod_ value. This means that switches except ToF do | _common.default_pod_ value. This means that switches except ToF do | |||
not need to be configured at all. Necessary information to configure | not need to be configured at all. Necessary information to configure | |||
all values is exchanged in the _LIEPacket_ and _PacketHeader_ or | all values is exchanged in the _LIEPacket_ and _PacketHeader_ or | |||
derived by the node automatically. | derived by the node automatically. | |||
Further definitions of leaf flags are found in Section 6.7 given they | Further leaf flag definitions are found in Section 6.7 as they have | |||
have implications in terms of level and adjacency forming here. Leaf | implications in terms of level and adjacency formation. Leaf flags | |||
flags are carried in _HierarchyIndications_. | are carried in _HierarchyIndications_. | |||
A node MUST form a _ThreeWay_ adjacency if at a minimum the following | A node MUST form a _ThreeWay_ adjacency if, at a minimum, the | |||
first order logic conditions are satisfied on a LIE packet as | following first order logic conditions are satisfied on a LIE packet, | |||
specified by the _LIEPacket_ schema element and received on a link | as specified by the _LIEPacket_ schema element and received on a link | |||
(such a LIE is considered a "minimally valid" LIE). Observe that | (such a LIE is considered a "minimally valid" LIE). Observe that, | |||
depending on the FSM involved and its state further conditions may be | depending on the FSM involved and its state, further conditions may | |||
checked and even a minimally valid LIE can be considered ultimately | be checked, and even a minimally valid LIE can be considered | |||
invalid if any of the additional conditions fail. | ultimately invalid if any of the additional conditions fail: | |||
1. the neighboring node is running the same major schema version as | 1. the neighboring node is running the same major schema version as | |||
indicated in the _major_version_ element in _PacketHeader_ *and* | indicated in the _major_version_ element in _PacketHeader_ *and* | |||
2. the neighboring node uses a valid System ID (i.e. value different | 2. the neighboring node uses a valid System ID (i.e., a value | |||
from _IllegalSystemID_) in the _sender_ element in _PacketHeader_ | different from _IllegalSystemID_) in the _sender_ element in | |||
*and* | _PacketHeader_ *and* | |||
3. the neighboring node uses a different System ID than the node | 3. the neighboring node uses a different System ID than the node | |||
itself *and* | itself *and* | |||
4. (the advertised MTU values in the _LIEPacket_ element match on | 4. (the advertised MTU values in the _LIEPacket_ element match on | |||
both sides while a missing MTU in the _LIEPacket_ element is | both sides, while a missing MTU in the _LIEPacket_ element is | |||
interpreted as _default_mtu_size_) *and* | interpreted as _default_mtu_size_) *and* | |||
5. both nodes advertise defined level values in _level_ element in | 5. both nodes advertise defined level values in the _level_ element | |||
_PacketHeader_ *and* | in _PacketHeader_ *and* | |||
6. [ | 6. [ | |||
i) the node is at _leaf_level_ value and has no _ThreeWay_ | a. the node is at the _leaf_level_ value and does not already | |||
adjacencies already to nodes at Highest Adjacency _ThreeWay_ | have any _ThreeWay_ adjacencies to nodes that are at the | |||
(HAT as defined later in Section 6.7.1) with level different | Highest Adjacency _ThreeWay_ (HAT), as defined in | |||
than the adjacent node *or* | Section 6.7.1, with a level that is different than the | |||
adjacent node *or* | ||||
ii) the node is not at _leaf_level_ value and the neighboring | b. the node is not at the _leaf_level_ value and the neighboring | |||
node is at _leaf_level_ value *or* | node is at the _leaf_level_ value *or* | |||
iii) both nodes are at _leaf_level_ values *and* both indicate | c. both nodes are at the _leaf_level_ value *and* both indicate | |||
support for Section 6.8.9 *or* | support for that described in Section 6.8.9 *or* | |||
iv) neither node is at _leaf_level_ value and the neighboring | ||||
node is at most one level difference away | ||||
]. | d. neither node is at the _leaf_level_ value and the neighboring | |||
node is, at most, one level away. | ||||
] | ||||
LIEs arriving with IPv4 Time to Live (TTL) or an IPv6 Hop Limit (HL) | LIEs arriving with IPv4 Time to Live (TTL) or an IPv6 Hop Limit (HL) | |||
different than 1 or 255 MUST be ignored. | different than 1 or 255 MUST be ignored. | |||
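The six "minimally valid" LIE conditions listed above can be read as
a single predicate. The following is a minimal sketch of that
reading, not a normative implementation. The _LieView_ class, the
_hat_adjacency_levels_ parameter, and the numeric constants are
assumptions made for illustration; the authoritative names and values
are those in the schema in Section 7.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Placeholder constants: the real values live in the RIFT schema (Section 7);
# the numbers below are assumptions for this sketch only.
ILLEGAL_SYSTEM_ID = 0
LEAF_LEVEL = 0
DEFAULT_MTU_SIZE = 1400


@dataclass
class LieView:
    """Hypothetical, flattened view of the fields the six conditions read."""
    major_version: int
    system_id: int
    level: Optional[int]          # None models an undefined level
    mtu: Optional[int] = None     # None models a missing MTU
    leaf_to_leaf: bool = False    # support for the Section 6.8.9 procedures


def minimally_valid(local: LieView, remote: LieView,
                    hat_adjacency_levels: Iterable[int] = ()) -> bool:
    """Return True if conditions 1-6 hold for a received LIE.

    hat_adjacency_levels: levels of the neighbors with which the local node
    already holds ThreeWay adjacencies at HAT (read by condition 6a only).
    """
    if remote.major_version != local.major_version:           # condition 1
        return False
    if remote.system_id == ILLEGAL_SYSTEM_ID:                 # condition 2
        return False
    if remote.system_id == local.system_id:                   # condition 3
        return False
    # Condition 4: a missing MTU is interpreted as the default MTU size.
    local_mtu = DEFAULT_MTU_SIZE if local.mtu is None else local.mtu
    remote_mtu = DEFAULT_MTU_SIZE if remote.mtu is None else remote.mtu
    if local_mtu != remote_mtu:
        return False
    if local.level is None or remote.level is None:           # condition 5
        return False

    local_leaf = local.level == LEAF_LEVEL
    remote_leaf = remote.level == LEAF_LEVEL
    # Condition 6: any one of the four alternatives is sufficient.
    cond_a = local_leaf and all(
        lvl == remote.level for lvl in hat_adjacency_levels)
    cond_b = (not local_leaf) and remote_leaf
    cond_c = (local_leaf and remote_leaf
              and local.leaf_to_leaf and remote.leaf_to_leaf)
    cond_d = ((not local_leaf) and (not remote_leaf)
              and abs(local.level - remote.level) <= 1)
    return cond_a or cond_b or cond_c or cond_d
```

For example, a node at level 1 accepts a LIE from a neighbor at
_leaf_level_ through alternative b, while two non-leaf nodes at
levels 3 and 1 fail alternative d. As noted above, the LIE FSM may
still reject a LIE that passes this predicate.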
6.2.1. LIE Finite State Machine | 6.2.1. LIE Finite State Machine | |||
This section specifies the precise, normative LIE FSM which is given | This section specifies the precise, normative LIE FSM, which is also | |||
as well in Figure 14. Additionally, some sets of actions often | shown in Figure 14. Additionally, some sets of actions often repeat | |||
repeat and are hence summarized into well-known procedures. | and are hence summarized into well-known procedures. | |||
Events generated are fairly fine grained, especially when indicating | Events generated are fairly fine grained, especially when indicating | |||
problems in adjacency forming conditions to simplify tracking of | problems in adjacency-forming conditions to simplify tracking of | |||
problems in deployment. | problems in deployment. | |||
Initial state is _OneWay_. | The initial state is _OneWay_. | |||
The machine sends LIEs proactively on several transitions to | The machine sends LIEs proactively on several transitions to | |||
accelerate adjacency bring-up without waiting for the corresponding | accelerate adjacency bring-up without waiting for the corresponding | |||
timer tick. | timer tick. | |||
Enter | Enter | |||
| | | | |||
V | V | |||
+-----------+ | +-----------+ | |||
| OneWay |<----+ | | OneWay |<----+ | |||
skipping to change at page 45, line 17 ¶ | skipping to change at line 1935 ¶ | |||
| | LevelChanged | | | LevelChanged | |||
+------------+ MultipleNeighborsDone | +------------+ MultipleNeighborsDone | |||
Figure 14: LIE FSM | Figure 14: LIE FSM | |||
The following words are used for well-known procedures: | The following words are used for well-known procedures: | |||
* PUSH Event: queues an event to be executed by the FSM upon exit of | * PUSH Event: queues an event to be executed by the FSM upon exit of | |||
this action | this action | |||
* CLEANUP: The FSM *conceptually* holds a `current neighbor` | * CLEANUP: The FSM *conceptually* holds a "current neighbor" | |||
variable that contains information received in the remote node's | variable that contains information received in the remote node's | |||
LIE that is processed against LIE validation rules. In the event | LIE that is processed against LIE validation rules. In the event | |||
that the LIE is considered to be invalid, the existing state held | that the LIE is considered to be invalid, the existing state held | |||
by `current neighbor` MUST be deleted. | by "current neighbor" MUST be deleted. | |||
* SEND_LIE: create and send a new LIE packet | * SEND_LIE: create and send a new LIE packet | |||
1. reflecting the _neighbor_ element as described in | 1. reflecting the _neighbor_ element as described in | |||
ValidReflection and | ValidReflection, | |||
2. setting the necessary _not_a_ztp_offer_ variable if level was | 2. setting the necessary _not_a_ztp_offer_ variable if the level | |||
derived from the last known neighbor on this interface and | was derived from the last-known neighbor on this interface, | |||
and | ||||
3. setting _you_are_flood_repeater_ variable to the computed | 3. setting the _you_are_flood_repeater_ variable to the computed | |||
value | value. | |||
* PROCESS_LIE: | * PROCESS_LIE: | |||
1. if LIE has a major version not equal to this node's major | 1. if LIE has a major version not equal to this node's major | |||
version *or* System ID equal to (this node's System ID or | version *or* System ID equal to this node's System ID or | |||
_IllegalSystemID_) then CLEANUP else | _IllegalSystemID_, then CLEANUP, else | |||
2. if both sides advertise Layer 2 MTU values and the MTU in the | 2. if both sides advertise Layer 2 MTU values and the MTU in the | |||
received LIE does not match the MTU advertised by the local | received LIE does not match the MTU advertised by the local | |||
system *or* at least one of the nodes does not advertise an | system *or* at least one of the nodes does not advertise an | |||
MTU value and the advertising node's LIE does not match the | MTU value and the advertising node's LIE does not match the | |||
_default_mtu_size_ of the system not advertising an MTU then | _default_mtu_size_ of the system not advertising an MTU, then | |||
CLEANUP, PUSH UpdateZTPOffer, PUSH MTUMismatch else | CLEANUP, PUSH UpdateZTPOffer, and PUSH MTUMismatch, else | |||
3. if the LIE has an undefined level *or* this node's level is | 3. if the LIE has an undefined level *or* this node's level is | |||
undefined *or* this node is a leaf and remote level is lower | undefined *or* this node is a leaf and the remote level is | |||
than HAT *or* (the LIE's level is not leaf *and* its | lower than HAT *or* (the LIE's level is not leaf *and* its | |||
difference is more than one from this node's level) then | difference is more than one from this node's level), then | |||
CLEANUP, PUSH UpdateZTPOffer, PUSH UnacceptableHeader else | CLEANUP, PUSH UpdateZTPOffer, and PUSH UnacceptableHeader, | |||
else | ||||
4. PUSH UpdateZTPOffer, construct temporary new neighbor | 4. PUSH UpdateZTPOffer, construct a temporary new neighbor | |||
structure with values from LIE, if no current neighbor exists | structure with values from LIE, if no current neighbor exists, | |||
then set current neighbor to new neighbor, PUSH NewNeighbor | then set current neighbor to new neighbor, PUSH NewNeighbor | |||
event, CHECK_THREE_WAY else | event, CHECK_THREE_WAY, else | |||
1. if current neighbor System ID differs from LIE's System ID | a. if the current neighbor System ID differs from LIE's | |||
then PUSH MultipleNeighbors else | System ID, then PUSH MultipleNeighbors, else | |||
2. if current neighbor stored level differs from LIE's level | b. if the current neighbor stored level differs from LIE's | |||
then PUSH NeighborChangedLevel else | level, then PUSH NeighborChangedLevel, else | |||
3. if current neighbor stored IPv4/v6 address differs from | c. if the current neighbor stored IPv4/v6 address differs | |||
LIE's address then PUSH NeighborChangedAddress else | from LIE's address, then PUSH NeighborChangedAddress, else | |||
4. if any of neighbor's flood address port, name, or local | d. if any of the neighbor's flood address port, name, or | |||
LinkID changed then PUSH NeighborChangedMinorFields | local LinkID changed, then PUSH NeighborChangedMinorFields | |||
5. CHECK_THREE_WAY | e. CHECK_THREE_WAY | |||
* CHECK_THREE_WAY: if current state is _OneWay_ do nothing else | * CHECK_THREE_WAY: if the current state is _OneWay_, do nothing, | |||
else | ||||
1. if LIE packet does not contain neighbor then if current state | 1. if LIE packet does not contain a neighbor then if the current | |||
is _ThreeWay_ then PUSH NeighborDroppedReflection else | state is _ThreeWay_, then PUSH NeighborDroppedReflection, else | |||
2. if packet reflects this system's ID and local port and state | 2. if the packet reflects this node's System ID and local port and the | |||
is _ThreeWay_ then PUSH event ValidReflection else PUSH event | state is _ThreeWay_, then PUSH the ValidReflection event, else | |||
MultipleNeighbors | PUSH the MultipleNeighbors event. | |||
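The PROCESS_LIE procedure above is an ordered chain of checks in
which the first failing check decides the outcome. The following is
a minimal sketch of that ordering, not a normative implementation.
The _Lie_ class, the _process_lie_ function, the string step names,
and the numeric constants are assumptions made for illustration. The
sketch returns the steps it would take instead of performing them,
and it leaves CHECK_THREE_WAY as a named step.

```python
from dataclasses import dataclass
from typing import List, Optional

# Placeholder constants: assumptions for this sketch; see the schema in
# Section 7 for the authoritative values.
ILLEGAL_SYSTEM_ID = 0
LEAF_LEVEL = 0
DEFAULT_MTU_SIZE = 1400


@dataclass
class Lie:
    """Hypothetical view of the LIE fields that PROCESS_LIE inspects."""
    major_version: int
    system_id: int
    level: Optional[int]          # None models an undefined level
    mtu: Optional[int] = None     # None models a non-advertised MTU
    address: str = ""
    minor_fields: tuple = ()      # flood port, name, local LinkID


def process_lie(local: Lie, lie: Lie, current: Optional[Lie],
                hat: Optional[int]) -> List[str]:
    """Return, in order, the steps PROCESS_LIE takes for a received LIE."""
    # Step 1: version and System ID checks.
    if (lie.major_version != local.major_version
            or lie.system_id in (local.system_id, ILLEGAL_SYSTEM_ID)):
        return ["CLEANUP"]
    # Step 2: MTU check; a non-advertised MTU is compared as the default.
    local_mtu = DEFAULT_MTU_SIZE if local.mtu is None else local.mtu
    lie_mtu = DEFAULT_MTU_SIZE if lie.mtu is None else lie.mtu
    if local_mtu != lie_mtu:
        return ["CLEANUP", "PUSH UpdateZTPOffer", "PUSH MTUMismatch"]
    # Step 3: level checks.
    if (lie.level is None or local.level is None
            or (local.level == LEAF_LEVEL and hat is not None
                and lie.level < hat)
            or (lie.level != LEAF_LEVEL
                and abs(lie.level - local.level) > 1)):
        return ["CLEANUP", "PUSH UpdateZTPOffer", "PUSH UnacceptableHeader"]
    # Step 4: compare against the current neighbor, if any.
    steps = ["PUSH UpdateZTPOffer"]
    if current is None:
        return steps + ["SET current neighbor", "PUSH NewNeighbor",
                        "CHECK_THREE_WAY"]
    if current.system_id != lie.system_id:
        return steps + ["PUSH MultipleNeighbors"]
    if current.level != lie.level:
        return steps + ["PUSH NeighborChangedLevel"]
    if current.address != lie.address:
        return steps + ["PUSH NeighborChangedAddress"]
    if current.minor_fields != lie.minor_fields:
        steps.append("PUSH NeighborChangedMinorFields")
    steps.append("CHECK_THREE_WAY")
    return steps
```

Returning a list of step names keeps the sketch free of side effects,
so the ordering can be inspected directly. A real implementation
would push the events into the FSM queue instead.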
States: | States: | |||
* OneWay: initial state the FSM is starting from. In this state the | * OneWay: The initial state the FSM is starting from. In this | |||
router did not receive any valid LIEs from a neighbor. | state, the router did not receive any valid LIEs from a neighbor. | |||
* TwoWay: that state is entered when a node has received a minimally | * TwoWay: This state is entered when a node has received a minimally | |||
valid LIE from a neighbor but not a ThreeWay valid LIE. | valid LIE from a neighbor but not a ThreeWay valid LIE. | |||
* ThreeWay: this state signifies that _ThreeWay_ valid LIEs from a | * ThreeWay: This state signifies that _ThreeWay_ valid LIEs from a | |||
neighbor have been received. On achieving this state the link can | neighbor have been received. On achieving this state, the link | |||
be advertised in _neighbors_ element in _NodeTIEElement_. | can be advertised in the _neighbors_ element in _NodeTIEElement_. | |||
* MultipleNeighborsWait: occurs normally when more than two nodes | * MultipleNeighborsWait: Occurs normally when more than two nodes | |||
become aware of each other on the same link or a remote node is | become aware of each other on the same link or a remote node is | |||
quickly reconfigured or rebooted without regressing to _OneWay_ | quickly reconfigured or rebooted without regressing to _OneWay_ | |||
first. Each occurrence of the event SHOULD generate notification | first. Each occurrence of the event SHOULD generate a | |||
to help operational deployments. | notification to help operational deployments. | |||
Events: | Events: | |||
* TimerTick: one-second timer tick, i.e., the event is provided to | * TimerTick: One-second timer tick, i.e., the event is provided to | |||
the FSM once a second by an implementation-specific mechanism that | the FSM once a second by an implementation-specific mechanism that | |||
is outside the scope of this specification. This event is quietly | is outside the scope of this specification. This event is quietly | |||
ignored if the relevant transition does not exist. | ignored if the relevant transition does not exist. | |||
* LevelChanged: node's level has been changed by ZTP or | * LevelChanged: Node's level has been changed by ZTP or | |||
configuration. This is provided by the ZTP FSM. | configuration. This is provided by the ZTP FSM. | |||
* HALChanged: best HAL computed by ZTP has changed. This is | * HALChanged: Best HAL computed by ZTP has changed. This is | |||
provided by the ZTP FSM. | provided by the ZTP FSM. | |||
* HATChanged: HAT computed by ZTP has changed. This is provided by | * HATChanged: HAT computed by ZTP has changed. This is provided by | |||
the ZTP FSM. | the ZTP FSM. | |||
* HALSChanged: set of HAL offering systems computed by ZTP has | * HALSChanged: Set of HAL offering systems computed by ZTP has | |||
changed. This is provided by the ZTP FSM. | changed. This is provided by the ZTP FSM. | |||
* LieRcvd: received LIE on the interface. | * LieRcvd: Received LIE on the interface. | |||
* NewNeighbor: new neighbor is present in the received LIE. | * NewNeighbor: New neighbor is present in the received LIE. | |||
* ValidReflection: received valid reflection of this node from | * ValidReflection: Received valid reflection of this node from the | |||
neighbor, i.e. all elements in _neighbor_ element in _LIEPacket_ | neighbor, i.e., all elements in the _neighbor_ element in | |||
have values corresponding to this link. | _LIEPacket_ have values corresponding to this link. | |||
* NeighborDroppedReflection: lost previously held reflection from | * NeighborDroppedReflection: Lost previously held reflection from | |||
neighbor, i.e. _neighbor_ element in _LIEPacket_ does not | the neighbor, i.e., the _neighbor_ element in _LIEPacket_ does not | |||
correspond to this node or is not present. | correspond to this node or is not present. | |||
* NeighborChangedLevel: neighbor changed advertised level from the | * NeighborChangedLevel: Neighbor changed the advertised level from | |||
previously held one. | the previously held one. | |||
* NeighborChangedAddress: neighbor changed IP address, i.e. LIE has | * NeighborChangedAddress: Neighbor changed the IP address, i.e., the | |||
been received from an address different from previous LIEs. Those | LIE has been received from an address different from previous | |||
changes will influence the sockets used to listen to TIEs, TIREs, | LIEs. Those changes will influence the sockets used to listen to | |||
TIDEs. | TIEs, TIREs, and TIDEs. | |||
* UnacceptableHeader: Unacceptable header received. | * UnacceptableHeader: Unacceptable header received. | |||
* MTUMismatch: MTU mismatched. | * MTUMismatch: MTU mismatched. | |||
* NeighborChangedMinorFields: minor fields changed in neighbor's | * NeighborChangedMinorFields: Minor fields changed in the neighbor's | |||
LIE. | LIE. | |||
* HoldtimeExpired: adjacency holddown timer expired. | * HoldtimeExpired: Adjacency holddown timer expired. | |||
* MultipleNeighbors: more than one neighbor is present on interface | * MultipleNeighbors: More than one neighbor is present on the | |||
* MultipleNeighborsDone: multiple neighbors timer expired. | interface. | |||
* FloodLeadersChanged: node's election algorithm determined new set | * MultipleNeighborsDone: Multiple neighbors timer expired. | |||
* FloodLeadersChanged: Node's election algorithm determined new set | ||||
of flood leaders. | of flood leaders. | |||
* SendLie: send a LIE out. | * SendLie: Send a LIE out. | |||
* UpdateZTPOffer: update this node's ZTP offer. This is sent to the | * UpdateZTPOffer: Update this node's ZTP offer. This is sent to the | |||
ZTP FSM. | ZTP FSM. | |||
Actions: | Actions: | |||
* on HATChanged in _OneWay_ finishes in OneWay: store HAT | * on HATChanged in _OneWay_ finishes in OneWay: store HAT | |||
* on FloodLeadersChanged in _OneWay_ finishes in OneWay: update | * on FloodLeadersChanged in _OneWay_ finishes in OneWay: update | |||
_you_are_flood_repeater_ LIE elements based on flood leader | _you_are_flood_repeater_ LIE elements based on the flood leader | |||
election results | election results | |||
* on UnacceptableHeader in _OneWay_ finishes in OneWay: no action | * on UnacceptableHeader in _OneWay_ finishes in OneWay: no action | |||
* on NeighborChangedMinorFields in _OneWay_ finishes in OneWay: no | * on NeighborChangedMinorFields in _OneWay_ finishes in OneWay: no | |||
action | action | |||
* on SendLie in _OneWay_ finishes in OneWay: SEND_LIE | * on SendLie in _OneWay_ finishes in OneWay: SEND_LIE | |||
* on HALSChanged in _OneWay_ finishes in OneWay: store HALS | * on HALSChanged in _OneWay_ finishes in OneWay: store the HALS | |||
* on MultipleNeighbors in _OneWay_ finishes in | * on MultipleNeighbors in _OneWay_ finishes in | |||
MultipleNeighborsWait: start multiple neighbors timer with | MultipleNeighborsWait: start multiple neighbors timer with the | |||
interval _multiple_neighbors_lie_holdtime_multipler_ * | interval _multiple_neighbors_lie_holdtime_multiplier_ * | |||
_default_lie_holdtime_ | _default_lie_holdtime_ | |||
* on NeighborChangedLevel in _OneWay_ finishes in OneWay: no action | * on NeighborChangedLevel in _OneWay_ finishes in OneWay: no action | |||
* on LieRcvd in _OneWay_ finishes in OneWay: PROCESS_LIE | * on LieRcvd in _OneWay_ finishes in OneWay: PROCESS_LIE | |||
* on MTUMismatch in _OneWay_ finishes in OneWay: no action | * on MTUMismatch in _OneWay_ finishes in OneWay: no action | |||
* on ValidReflection in _OneWay_ finishes in ThreeWay: no action | * on ValidReflection in _OneWay_ finishes in ThreeWay: no action | |||
* on LevelChanged in _OneWay_ finishes in OneWay: update level with | * on LevelChanged in _OneWay_ finishes in OneWay: update the level | |||
event value, PUSH SendLie event | with the event value, PUSH the SendLie event | |||
* on HALChanged in _OneWay_ finishes in OneWay: store new HAL | * on HALChanged in _OneWay_ finishes in OneWay: store the new HAL | |||
* on HoldtimeExpired in _OneWay_ finishes in OneWay: no action | * on HoldtimeExpired in _OneWay_ finishes in OneWay: no action | |||
* on NeighborChangedAddress in _OneWay_ finishes in OneWay: no | * on NeighborChangedAddress in _OneWay_ finishes in OneWay: no | |||
action | action | |||
* on NewNeighbor in _OneWay_ finishes in TwoWay: PUSH SendLie event | * on NewNeighbor in _OneWay_ finishes in TwoWay: PUSH the SendLie | |||
event | ||||
* on UpdateZTPOffer in _OneWay_ finishes in OneWay: send offer to | * on UpdateZTPOffer in _OneWay_ finishes in OneWay: send the offer | |||
ZTP FSM | to the ZTP FSM | |||
* on NeighborDroppedReflection in _OneWay_ finishes in OneWay: no | * on NeighborDroppedReflection in _OneWay_ finishes in OneWay: no | |||
action | action | |||
* on TimerTick in _OneWay_ finishes in OneWay: PUSH SendLie event | * on TimerTick in _OneWay_ finishes in OneWay: PUSH SendLie event | |||
* on FloodLeadersChanged in _TwoWay_ finishes in TwoWay: update | * on FloodLeadersChanged in _TwoWay_ finishes in TwoWay: update | |||
_you_are_flood_repeater_ LIE elements based on flood leader | _you_are_flood_repeater_ LIE elements based on the flood leader | |||
election results | election results | |||
* on UpdateZTPOffer in _TwoWay_ finishes in TwoWay: send offer to | * on UpdateZTPOffer in _TwoWay_ finishes in TwoWay: send the offer | |||
ZTP FSM | to the ZTP FSM | |||
* on NewNeighbor in _TwoWay_ finishes in MultipleNeighborsWait: PUSH | * on NewNeighbor in _TwoWay_ finishes in MultipleNeighborsWait: PUSH | |||
SendLie event | the SendLie event | |||
* on ValidReflection in _TwoWay_ finishes in ThreeWay: no action | * on ValidReflection in _TwoWay_ finishes in ThreeWay: no action | |||
* on LieRcvd in _TwoWay_ finishes in TwoWay: PROCESS_LIE | * on LieRcvd in _TwoWay_ finishes in TwoWay: PROCESS_LIE | |||
* on UnacceptableHeader in _TwoWay_ finishes in OneWay: no action | * on UnacceptableHeader in _TwoWay_ finishes in OneWay: no action | |||
* on HALChanged in _TwoWay_ finishes in TwoWay: store new HAL | * on HALChanged in _TwoWay_ finishes in TwoWay: store the new HAL | |||
* on HoldtimeExpired in _TwoWay_ finishes in OneWay: no action | * on HoldtimeExpired in _TwoWay_ finishes in OneWay: no action | |||
* on LevelChanged in _TwoWay_ finishes in TwoWay: update level with | * on LevelChanged in _TwoWay_ finishes in TwoWay: update the level | |||
event value | with the event value | |||
* on TimerTick in _TwoWay_ finishes in TwoWay: PUSH SendLie event, | * on TimerTick in _TwoWay_ finishes in TwoWay: PUSH SendLie event, | |||
if last valid LIE was received more than _holdtime_ ago as | if last valid LIE was received more than _holdtime_ ago as | |||
advertised by neighbor then PUSH HoldtimeExpired event | advertised by the neighbor, then PUSH the HoldtimeExpired event | |||
* on HATChanged in _TwoWay_ finishes in TwoWay: store HAT | * on HATChanged in _TwoWay_ finishes in TwoWay: store HAT | |||
* on NeighborChangedLevel in _TwoWay_ finishes in OneWay: no action | * on NeighborChangedLevel in _TwoWay_ finishes in OneWay: no action | |||
* on HALSChanged in _TwoWay_ finishes in TwoWay: store HALS | * on HALSChanged in _TwoWay_ finishes in TwoWay: store the HALS | |||
* on MTUMismatch in _TwoWay_ finishes in OneWay: no action | * on MTUMismatch in _TwoWay_ finishes in OneWay: no action | |||
* on NeighborChangedAddress in _TwoWay_ finishes in OneWay: no | * on NeighborChangedAddress in _TwoWay_ finishes in OneWay: no | |||
action | action | |||
* on SendLie in _TwoWay_ finishes in TwoWay: SEND_LIE | * on SendLie in _TwoWay_ finishes in TwoWay: SEND_LIE | |||
* on MultipleNeighbors in _TwoWay_ finishes in | * on MultipleNeighbors in _TwoWay_ finishes in | |||
MultipleNeighborsWait: start multiple neighbors timer with | MultipleNeighborsWait: start multiple neighbors timer with the | |||
interval _multiple_neighbors_lie_holdtime_multipler_ * | interval _multiple_neighbors_lie_holdtime_multiplier_ * | |||
_default_lie_holdtime_ | _default_lie_holdtime_ | |||
* on TimerTick in _ThreeWay_ finishes in ThreeWay: PUSH SendLie | * on TimerTick in _ThreeWay_ finishes in ThreeWay: PUSH the SendLie | |||
event, if last valid LIE was received more than _holdtime_ ago as | event, if the last valid LIE was received more than _holdtime_ ago | |||
advertised by neighbor then PUSH HoldtimeExpired event | as advertised by the neighbor, then PUSH the HoldtimeExpired event | |||
* on LevelChanged in _ThreeWay_ finishes in OneWay: update level | * on LevelChanged in _ThreeWay_ finishes in OneWay: update the level | |||
with event value | with the event value | |||
* on HATChanged in _ThreeWay_ finishes in ThreeWay: store HAT | * on HATChanged in _ThreeWay_ finishes in ThreeWay: store HAT | |||
* on MTUMismatch in _ThreeWay_ finishes in OneWay: no action | * on MTUMismatch in _ThreeWay_ finishes in OneWay: no action | |||
* on UnacceptableHeader in _ThreeWay_ finishes in OneWay: no action | * on UnacceptableHeader in _ThreeWay_ finishes in OneWay: no action | |||
* on MultipleNeighbors in _ThreeWay_ finishes in | * on MultipleNeighbors in _ThreeWay_ finishes in | |||
MultipleNeighborsWait: start multiple neighbors timer with | MultipleNeighborsWait: start multiple neighbors timer with the | |||
interval _multiple_neighbors_lie_holdtime_multipler_ * | interval _multiple_neighbors_lie_holdtime_multiplier_ * | |||
_default_lie_holdtime_ | _default_lie_holdtime_ | |||
* on NeighborChangedLevel in _ThreeWay_ finishes in OneWay: no | * on NeighborChangedLevel in _ThreeWay_ finishes in OneWay: no | |||
action | action | |||
* on HALSChanged in _ThreeWay_ finishes in ThreeWay: store HALS | * on HALSChanged in _ThreeWay_ finishes in ThreeWay: store the HALS | |||
* on LieRcvd in _ThreeWay_ finishes in ThreeWay: PROCESS_LIE | * on LieRcvd in _ThreeWay_ finishes in ThreeWay: PROCESS_LIE | |||
* on FloodLeadersChanged in _ThreeWay_ finishes in ThreeWay: update | * on FloodLeadersChanged in _ThreeWay_ finishes in ThreeWay: update | |||
_you_are_flood_repeater_ LIE elements based on flood leader | _you_are_flood_repeater_ LIE elements based on the flood leader | |||
election results, PUSH SendLie | election results, PUSH the SendLie event | |||
* on NeighborDroppedReflection in _ThreeWay_ finishes in TwoWay: no | * on NeighborDroppedReflection in _ThreeWay_ finishes in TwoWay: no | |||
action | action | |||
* on HoldtimeExpired in _ThreeWay_ finishes in OneWay: no action | * on HoldtimeExpired in _ThreeWay_ finishes in OneWay: no action | |||
* on ValidReflection in _ThreeWay_ finishes in ThreeWay: no action | * on ValidReflection in _ThreeWay_ finishes in ThreeWay: no action | |||
* on UpdateZTPOffer in _ThreeWay_ finishes in ThreeWay: send offer | * on UpdateZTPOffer in _ThreeWay_ finishes in ThreeWay: send the | |||
to ZTP FSM | offer to the ZTP FSM | |||
* on NeighborChangedAddress in _ThreeWay_ finishes in OneWay: no | * on NeighborChangedAddress in _ThreeWay_ finishes in OneWay: no | |||
action | action | |||
* on HALChanged in _ThreeWay_ finishes in ThreeWay: store new HAL | * on HALChanged in _ThreeWay_ finishes in ThreeWay: store the new | |||
HAL | ||||
* on SendLie in _ThreeWay_ finishes in ThreeWay: SEND_LIE | * on SendLie in _ThreeWay_ finishes in ThreeWay: SEND_LIE | |||
* on MultipleNeighbors in MultipleNeighborsWait finishes in | * on MultipleNeighbors in MultipleNeighborsWait finishes in | |||
MultipleNeighborsWait: start multiple neighbors timer with | MultipleNeighborsWait: start multiple neighbors timer with the | |||
interval _multiple_neighbors_lie_holdtime_multipler_ * | interval _multiple_neighbors_lie_holdtime_multiplier_ * | |||
_default_lie_holdtime_ | _default_lie_holdtime_ | |||
* on FloodLeadersChanged in MultipleNeighborsWait finishes in | * on FloodLeadersChanged in MultipleNeighborsWait finishes in | |||
MultipleNeighborsWait: update _you_are_flood_repeater_ LIE | MultipleNeighborsWait: update _you_are_flood_repeater_ LIE | |||
elements based on flood leader election results | elements based on the flood leader election results | |||
* on TimerTick in MultipleNeighborsWait finishes in | * on TimerTick in MultipleNeighborsWait finishes in | |||
MultipleNeighborsWait: check MultipleNeighbors timer, if timer | MultipleNeighborsWait: check MultipleNeighbors timer, if the timer | |||
expired PUSH MultipleNeighborsDone | expired, PUSH MultipleNeighborsDone | |||
* on ValidReflection in MultipleNeighborsWait finishes in | * on ValidReflection in MultipleNeighborsWait finishes in | |||
MultipleNeighborsWait: no action | MultipleNeighborsWait: no action | |||
* on UpdateZTPOffer in MultipleNeighborsWait finishes in | * on UpdateZTPOffer in MultipleNeighborsWait finishes in | |||
MultipleNeighborsWait: send offer to ZTP FSM | MultipleNeighborsWait: send the offer to the ZTP FSM | |||
* on NeighborDroppedReflection in MultipleNeighborsWait finishes in | * on NeighborDroppedReflection in MultipleNeighborsWait finishes in | |||
MultipleNeighborsWait: no action | MultipleNeighborsWait: no action | |||
* on LieRcvd in MultipleNeighborsWait finishes in | * on LieRcvd in MultipleNeighborsWait finishes in | |||
MultipleNeighborsWait: no action | MultipleNeighborsWait: no action | |||
* on UnacceptableHeader in MultipleNeighborsWait finishes in | * on UnacceptableHeader in MultipleNeighborsWait finishes in | |||
MultipleNeighborsWait: no action | MultipleNeighborsWait: no action | |||
* on NeighborChangedAddress in MultipleNeighborsWait finishes in | * on NeighborChangedAddress in MultipleNeighborsWait finishes in | |||
MultipleNeighborsWait: no action | MultipleNeighborsWait: no action | |||
* on LevelChanged in MultipleNeighborsWait finishes in OneWay: | * on LevelChanged in MultipleNeighborsWait finishes in OneWay: | |||
update level with event value | update the level with the event value | |||
* on HATChanged in MultipleNeighborsWait finishes in | * on HATChanged in MultipleNeighborsWait finishes in | |||
MultipleNeighborsWait: store HAT | MultipleNeighborsWait: store HAT | |||
* on MTUMismatch in MultipleNeighborsWait finishes in | * on MTUMismatch in MultipleNeighborsWait finishes in | |||
MultipleNeighborsWait: no action | MultipleNeighborsWait: no action | |||
* on HALSChanged in MultipleNeighborsWait finishes in | * on HALSChanged in MultipleNeighborsWait finishes in | |||
MultipleNeighborsWait: store HALS | MultipleNeighborsWait: store the HALS | |||
* on HALChanged in MultipleNeighborsWait finishes in | * on HALChanged in MultipleNeighborsWait finishes in | |||
MultipleNeighborsWait: store new HAL | MultipleNeighborsWait: store the new HAL | |||
* on HoldtimeExpired in MultipleNeighborsWait finishes in | * on HoldtimeExpired in MultipleNeighborsWait finishes in | |||
MultipleNeighborsWait: no action | MultipleNeighborsWait: no action | |||
* on SendLie in MultipleNeighborsWait finishes in | * on SendLie in MultipleNeighborsWait finishes in | |||
MultipleNeighborsWait: no action | MultipleNeighborsWait: no action | |||
* on MultipleNeighborsDone in MultipleNeighborsWait finishes in | * on MultipleNeighborsDone in MultipleNeighborsWait finishes in | |||
OneWay: no action | OneWay: no action | |||
* on Entry into OneWay: CLEANUP | * on Entry into OneWay: CLEANUP | |||
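The action list above is effectively a lookup table keyed by the
current state and the event. The following is a minimal sketch of a
table-driven reading that also shows how PUSH queues an event until
the running action has finished. It is not a normative
implementation. The _LieFsm_ class, its method names, and the log
are assumptions made for illustration, and only a small subset of
the transitions is included. The sketch also assumes that the
"Entry into OneWay" action applies when OneWay is entered from a
different state, and it ignores any event that has no entry in the
table.

```python
from collections import deque

# (state, event) -> (next state, action method name or None).
# A small subset of the transitions listed above.
TRANSITIONS = {
    ("OneWay", "TimerTick"): ("OneWay", "push_send_lie"),
    ("OneWay", "SendLie"): ("OneWay", "send_lie"),
    ("OneWay", "NewNeighbor"): ("TwoWay", "push_send_lie"),
    ("TwoWay", "SendLie"): ("TwoWay", "send_lie"),
    ("TwoWay", "ValidReflection"): ("ThreeWay", None),
    ("ThreeWay", "SendLie"): ("ThreeWay", "send_lie"),
    ("ThreeWay", "NeighborDroppedReflection"): ("TwoWay", None),
    ("ThreeWay", "HoldtimeExpired"): ("OneWay", None),
    ("ThreeWay", "MultipleNeighbors"):
        ("MultipleNeighborsWait", "start_multiple_neighbors_timer"),
    ("MultipleNeighborsWait", "MultipleNeighborsDone"): ("OneWay", None),
}


class LieFsm:
    """Hypothetical table-driven LIE FSM covering a few transitions."""

    def __init__(self):
        self.state = "OneWay"      # the initial state is OneWay
        self.queue = deque()       # events queued by PUSH
        self.log = []              # records side effects for illustration

    def push(self, event):
        # PUSH: queue the event; it runs after the current action exits.
        self.queue.append(event)

    def push_send_lie(self):
        self.push("SendLie")

    def send_lie(self):
        self.log.append("SEND_LIE")

    def start_multiple_neighbors_timer(self):
        self.log.append("START_MULTIPLE_NEIGHBORS_TIMER")

    def cleanup(self):
        self.log.append("CLEANUP")

    def handle(self, event):
        """Process one external event and everything it pushes."""
        self.push(event)
        while self.queue:
            ev = self.queue.popleft()
            entry = TRANSITIONS.get((self.state, ev))
            if entry is None:
                continue           # no transition in this sketch: ignore
            next_state, action = entry
            if action is not None:
                getattr(self, action)()
            if next_state == "OneWay" and self.state != "OneWay":
                self.cleanup()     # on Entry into OneWay: CLEANUP
            self.state = next_state
```

With this sketch, handling NewNeighbor in OneWay moves the machine
to TwoWay and then runs the pushed SendLie event in the new state,
which matches the proactive LIE sending described at the start of
this section.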
6.3. Topology Exchange (TIE Exchange) | 6.3. Topology Exchange (TIE Exchange) | |||
6.3.1. Topology Information Elements | 6.3.1. Topology Information Elements | |||
Topology and reachability information in RIFT is conveyed by TIEs. | Topology and reachability information in RIFT is conveyed by TIEs. | |||
The TIE exchange mechanism uses the port indicated by each node in | The TIE exchange mechanism uses the port indicated by each node in | |||
the LIE exchange as _flood_port_ in _LIEPacket_ and the interface on | the LIE exchange as _flood_port_ in _LIEPacket_ and the interface on | |||
which the adjacency has been formed as destination. TIEs MUST be | which the adjacency has been formed as the destination. TIEs MUST be | |||
sent with an IPv4 Time to Live (TTL) or an IPv6 Hop Limit (HL) of | sent with an IPv4 Time to Live (TTL) or an IPv6 Hop Limit (HL) of | |||
either 1 or 255 and also MUST be ignored if received with values | either 1 or 255 and also MUST be ignored if received with values | |||
different than 1 or 255. This helps to protect RIFT information from | different than 1 or 255. This helps to protect RIFT information from | |||
being accepted beyond a single L3 next-hop in the topology. TIEs | being accepted beyond a single L3 next hop in the topology. TIEs | |||
SHOULD be sent with network control precedence unless an | SHOULD be sent with network control precedence unless an | |||
implementation is prevented from doing so [RFC2474]. | implementation is prevented from doing so [RFC2474]. | |||
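The TTL/HL acceptance rule above can be sketched as a receive-side check. This is a minimal illustration only; the names ACCEPTED_HOP_VALUES and accept_tie_hop_value are hypothetical and do not come from the RIFT schema.

```python
# Hypothetical names; only the values 1 and 255 come from the text above.
ACCEPTED_HOP_VALUES = frozenset({1, 255})


def accept_tie_hop_value(hop_value: int) -> bool:
    """Return True if a received TIE's IPv4 TTL or IPv6 Hop Limit is acceptable.

    TIEs received with any value other than 1 or 255 are to be ignored.
    """
    return hop_value in ACCEPTED_HOP_VALUES
```

A receiver would call this with the TTL or Hop Limit taken from the packet before any further TIE processing and silently drop the packet when it returns False.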
TIEs contain sequence numbers, lifetimes, and a type. Each type has | TIEs contain sequence numbers, lifetimes, and a type. Each type has | |||
ample identifying number space and information is spread across | ample identifying number space, and information is spread across | |||
multiple TIEs with the same TIEElement type (this is true for all TIE | multiple TIEs with the same TIEElement type (this is true for all TIE | |||
types). | types). | |||
More information about the TIE structure can be found in the schema | More information about the TIE structure can be found in the schema | |||
in Section 7 starting with _TIEPacket_ root. | in Section 7, starting with _TIEPacket_ root. | |||
6.3.2. Southbound and Northbound TIE Representation | 6.3.2. Southbound and Northbound TIE Representation | |||
A central concept of RIFT is that each node represents itself | A central concept of RIFT is that each node represents itself | |||
differently depending on the direction in which it is advertising | differently, depending on the direction in which it is advertising | |||
information. More precisely, a spine node represents two different | information. More precisely, a spine node represents two different | |||
databases over its adjacencies depending on whether it advertises | databases over its adjacencies, depending on whether it advertises | |||
TIEs to the north or to the south/east-west. Those differing TIE | TIEs to the north or to the south/east-west. Those differing TIE | |||
databases are called either south- or northbound (South TIEs and | databases are called either southbound or northbound (South TIEs and | |||
North TIEs) depending on the direction of distribution. | North TIEs), depending on the direction of distribution. | |||
The North TIEs hold all of the node's adjacencies and local prefixes | The North TIEs hold all of the node's adjacencies and local prefixes, | |||
while the South TIEs hold only all of the node's adjacencies, the | while the South TIEs hold all of the node's adjacencies, the default | |||
default prefix with necessary disaggregated prefixes and local | prefix with necessary disaggregated prefixes, and local prefixes. | |||
prefixes. Section 6.5 explains further details. | Section 6.5 explains further details. | |||
All TIE types are mostly symmetrical in both directions. The | All TIE types are mostly symmetrical in both directions. Section 7.3 | |||
(Section 7.3) defines the TIE types (i.e., the TIETypeType element) | defines the TIE types (i.e., the TIETypeType element) and their | |||
and their directionality (i.e., _direction_ within the _TIEID_ | directionality (i.e., _direction_ within the _TIEID_ element). | |||
element). | ||||
As an example illustrating a database holding both representations, | As an example illustrating a database holding both representations, | |||
the topology in Figure 2 with the optional link between spine 111 and | the topology in Figure 2 with the optional link between spine 111 and | |||
spine 112 (so that the flooding on an East-West link can be shown) is | spine 112 (so that the flooding on an East-West link can be shown) is | |||
shown below. Unnumbered interfaces are implicitly assumed and for | shown below. Unnumbered interfaces are implicitly assumed and, for | |||
simplicity, the key value elements which may be included in their | simplicity, the key value elements, which may be included in their | |||
South TIEs or North TIEs are not shown. First, in Figure 15 are the | South TIEs or North TIEs, are not shown. First, Figure 15 shows the | |||
TIEs generated by some nodes. | TIEs generated by some nodes. | |||
ToF 21 South TIEs: | ToF 21 South TIEs: | |||
Node South TIE: | South Node TIE: | |||
NodeTIEElement(level=2, | NodeTIEElement(level=2, | |||
neighbors( | neighbors( | |||
(Spine 111, level 1, cost 1, links(...)), | (Spine 111, level 1, cost 1, links(...)), | |||
(Spine 112, level 1, cost 1, links(...)), | (Spine 112, level 1, cost 1, links(...)), | |||
(Spine 121, level 1, cost 1, links(...)), | (Spine 121, level 1, cost 1, links(...)), | |||
(Spine 122, level 1, cost 1, links(...)) | (Spine 122, level 1, cost 1, links(...)) | |||
) | ) | |||
) | ) | |||
Prefix South TIE: | South Prefix TIE: | |||
PrefixTIEElement(prefixes(0/0, metric 1), (::/0, metric 1)) | PrefixTIEElement(prefixes(0/0, metric 1), (::/0, metric 1)) | |||
Spine 111 South TIEs: | ||||
Node South TIE: | ||||
NodeTIEElement(level=1, | ||||
neighbors( | ||||
(ToF 21, level 2, cost 1, links(...)), | ||||
(ToF 22, level 2, cost 1, links(...)), | ||||
(Spine 112, level 1, cost 1, links(...)), | ||||
(Leaf111, level 0, cost 1, links(...)), | ||||
(Leaf112, level 0, cost 1, links(...)) | ||||
) | ||||
) | ||||
Prefix South TIE: | ||||
PrefixTIEElement(prefixes(0/0, metric 1), (::/0, metric 1)) | ||||
Spine 111 North TIEs: | Spine 111 South TIEs: | |||
Node North TIE: | South Node TIE: | |||
NodeTIEElement(level=1, | NodeTIEElement(level=1, | |||
neighbors( | neighbors( | |||
(ToF 21, level 2, cost 1, links(...)), | (ToF 21, level 2, cost 1, links(...)), | |||
(ToF 22, level 2, cost 1, links(...)), | (ToF 22, level 2, cost 1, links(...)), | |||
(Spine 112, level 1, cost 1, links(...)), | (Spine 112, level 1, cost 1, links(...)), | |||
(Leaf111, level 0, cost 1, links(...)), | (Leaf111, level 0, cost 1, links(...)), | |||
(Leaf112, level 0, cost 1, links(...)) | (Leaf112, level 0, cost 1, links(...)) | |||
) | ) | |||
) | ) | |||
Prefix North TIE: | South Prefix TIE: | |||
PrefixTIEElement(prefixes(Spine 111.loopback) | PrefixTIEElement(prefixes(0/0, metric 1), (::/0, metric 1)) | |||
Spine 121 South TIEs: | Spine 111 North TIEs: | |||
Node South TIE: | North Node TIE: | |||
NodeTIEElement(level=1, | NodeTIEElement(level=1, | |||
neighbors( | neighbors( | |||
(ToF 21, level 2, cost 1, links(...)), | (ToF 21, level 2, cost 1, links(...)), | |||
(ToF 22, level 2, cost 1, links(...)), | (ToF 22, level 2, cost 1, links(...)), | |||
(Leaf121, level 0, cost 1, links(...)), | (Spine 112, level 1, cost 1, links(...)), | |||
(Leaf122, level 0, cost 1, links(...)) | (Leaf111, level 0, cost 1, links(...)), | |||
) | (Leaf112, level 0, cost 1, links(...)) | |||
) | ) | |||
Prefix South TIE: | ) | |||
PrefixTIEElement(prefixes(0/0, metric 1), (::/0, metric 1)) | North Prefix TIE: | |||
PrefixTIEElement(prefixes(Spine 111.loopback) | ||||
Spine 121 North TIEs: | Spine 121 South TIEs: | |||
Node North TIE: | South Node TIE: | |||
NodeTIEElement(level=1, | NodeTIEElement(level=1, | |||
neighbors( | neighbors( | |||
(ToF 21, level 2, cost 1, links(...)), | (ToF 21, level 2, cost 1, links(...)), | |||
(ToF 22, level 2, cost 1, links(...)), | (ToF 22, level 2, cost 1, links(...)), | |||
(Leaf121, level 0, cost 1, links(...)), | (Leaf121, level 0, cost 1, links(...)), | |||
(Leaf122, level 0, cost 1, links(...)) | (Leaf122, level 0, cost 1, links(...)) | |||
) | ) | |||
) | ) | |||
Prefix North TIE: | South Prefix TIE: | |||
PrefixTIEElement(prefixes(Spine 121.loopback) | PrefixTIEElement(prefixes(0/0, metric 1), (::/0, metric 1)) | |||
Leaf112 North TIEs: | Spine 121 North TIEs: | |||
North Node TIE: | ||||
NodeTIEElement(level=1, | ||||
neighbors( | ||||
(ToF 21, level 2, cost 1, links(...)), | ||||
(ToF 22, level 2, cost 1, links(...)), | ||||
(Leaf121, level 0, cost 1, links(...)), | ||||
(Leaf122, level 0, cost 1, links(...)) | ||||
) | ||||
) | ||||
North Prefix TIE: | ||||
PrefixTIEElement(prefixes(Spine 121.loopback) | ||||
Node North TIE: | Leaf112 North TIEs: | |||
NodeTIEElement(level=0, | North Node TIE: | |||
neighbors( | NodeTIEElement(level=0, | |||
(Spine 111, level 1, cost 1, links(...)), | neighbors( | |||
(Spine 112, level 1, cost 1, links(...)) | (Spine 111, level 1, cost 1, links(...)), | |||
) | (Spine 112, level 1, cost 1, links(...)) | |||
) | ) | |||
Prefix North TIE: | ) | |||
PrefixTIEElement(prefixes(Leaf112.loopback, Prefix112, Prefix_MH)) | North Prefix TIE: | |||
PrefixTIEElement(prefixes(Leaf112.loopback, Prefix112, Prefix_MH)) | ||||
Figure 15: Example TIEs Generated in a 2 Level Spine-and-Leaf | Figure 15: Example TIEs Generated in a 2-Level Spine-and-Leaf | |||
Topology | Topology | |||
It may not be obvious here as to why the Node South TIEs contain all | It may not be obvious here as to why the South Node TIEs contain all | |||
the adjacencies of the corresponding node. This will be necessary | the adjacencies of the corresponding node. This will be necessary | |||
for algorithms further elaborated on in Section 6.3.9 and | for algorithms further elaborated on in Sections 6.3.9 and 6.8.7. | |||
Section 6.8.7. | ||||
For Node TIEs to carry more adjacencies than fit into an MTU-sized | For Node TIEs to carry more adjacencies than fit into an MTU-sized | |||
packet, the element _neighbors_ may contain a different set of | packet, the _neighbors_ element may contain a different set of | |||
neighbors in each TIE. Those disjointed sets of neighbors MUST be | neighbors in each TIE. Those disjointed sets of neighbors MUST be | |||
joined during corresponding computation. However, if the following | joined during corresponding computation. However, if the following | |||
occurs across multiple Node TIEs | occurs across multiple Node TIEs: | |||
1. _capabilities_ do not match *or* | 1. _capabilities_ do not match *or* | |||
2. _flags_ values do not match *or* | 2. _flags_ values do not match *or* | |||
3. same neighbor repeats in multiple TIEs with different values | 3. the same neighbor repeats in multiple TIEs with different values. | |||
The implementation is expected to use the value of any of the valid | The implementation is expected to use the value of any of the valid | |||
TIEs it received as it cannot control the arrival order of those | TIEs it received, as it cannot control the arrival order of those | |||
TIEs. | TIEs. | |||
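The joining of disjoint neighbor sets described above could look roughly like the following sketch. The NodeTIE type and its reduction of a neighbor to a single cost are simplifying assumptions made for illustration, and keeping the first value seen is just one of the permitted choices, since any valid value may be used when a neighbor repeats with different values.

```python
from typing import Dict, Iterable, NamedTuple


class NodeTIE(NamedTuple):
    """Hypothetical, heavily simplified Node TIE."""
    tie_nr: int
    neighbors: Dict[str, int]  # neighbor name -> cost (simplified)


def join_neighbors(node_ties: Iterable[NodeTIE]) -> Dict[str, int]:
    """Join the neighbor sets carried in multiple Node TIEs of one node."""
    joined: Dict[str, int] = {}
    for tie in node_ties:
        for neighbor, cost in tie.neighbors.items():
            # A repeated neighbor with a different value is a conflict; the
            # first value seen is kept here, which is an arbitrary choice.
            joined.setdefault(neighbor, cost)
    return joined
```

Because arrival order is outside the receiver's control, the result for a conflicting neighbor depends on the order in which the TIEs are iterated.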
The _miscabled_links_ element SHOULD be included in every Node TIE, | The _miscabled_links_ element SHOULD be included in every Node TIE; | |||
otherwise the behavior is undefined. | otherwise, the behavior is undefined. | |||
A ToF node MUST include information on all other ToFs it is aware of | A ToF node MUST include information on all other ToFs it is aware of | |||
through reflection. The _same_plane_tofs_ element is used to carry | through reflection. The _same_plane_tofs_ element is used to carry | |||
this information. To prevent MTU overrun problems, multiple Node | this information. To prevent MTU overrun problems, multiple Node | |||
TIEs can carry disjointed sets of ToFs which MUST be joined to form a | TIEs can carry disjointed sets of ToFs, which MUST be joined to form | |||
single set. | a single set. | |||
Different TIE types are carried in _TIEElement_. Schema enum | Different TIE types are carried in _TIEElement_. Schema enum | |||
`common.TIETypeType` in _TIEID_ indicates which elements MUST be | 'common.TIETypeType' in _TIEID_ indicates which elements MUST be | |||
present in the _TIEElement_. In case of a mismatch between the | present in _TIEElement_. In case of a mismatch between _TIETypeType_ | |||
_TIETypeType_ in the _TIEID_ and the present element, the unexpected | in the _TIEID_ and the present element, the unexpected elements MUST | |||
elements MUST be ignored. In case of lack of expected element in the | be ignored. In case of the lack of an expected element in the TIE, | |||
TIE an error MUST be reported and the TIE MUST be ignored. The | an error MUST be reported and the TIE MUST be ignored. The | |||
element _positive_disaggregation_prefixes_ and | _positive_disaggregation_prefixes_ and | |||
_positive_external_disaggregation_prefixes_ MUST be advertised | _positive_external_disaggregation_prefixes_ elements MUST be | |||
southbound only and ignored in North TIEs. The element | advertised southbound only and ignored in North TIEs. The | |||
_negative_disaggregation_prefixes_ MUST be propagated according to | _negative_disaggregation_prefixes_ element MUST be propagated, | |||
Section 6.5.2 southwards towards lower levels to heal pathological | according to Section 6.5.2, southwards towards lower levels to heal | |||
upper-level partitioning, otherwise traffic loss may occur in | pathological upper-level partitioning; otherwise, traffic loss may | |||
multiplane fabrics. It MUST NOT be advertised within a North TIE and | occur in multi-plane fabrics. It MUST NOT be advertised within a | |||
MUST be ignored otherwise. | North TIE and MUST be ignored otherwise. | |||
6.3.3. Flooding | 6.3.3. Flooding | |||
As described before, TIEs themselves are transported over UDP with | As described before, TIEs themselves are transported over UDP with | |||
the ports indicated in the LIE exchanges and using the destination | the ports indicated in the LIE exchanges and use the destination | |||
address on which the LIE adjacency has been formed. | address on which the LIE adjacency has been formed. | |||
TIEs are uniquely identified by the _TIEID_ schema element. The | TIEs are uniquely identified by the _TIEID_ schema element. _TIEID_ | |||
_TIEID_ induces a total order achieved by comparing the elements in | induces a total order achieved by comparing the elements in sequence | |||
sequence defined in the element and comparing each value as an | defined in the element and comparing each value as an unsigned | |||
unsigned integer of corresponding length. The _TIEHeader_ element | integer of corresponding length. The _TIEHeader_ element contains a | |||
contains a _seq_nr_ element to distinguish newer versions of same | _seq_nr_ element to distinguish newer versions of the same TIE. | |||
TIE. | ||||
The _TIEHeader_ can also carry an _origination_time_ schema element | _TIEHeader_ can also carry an _origination_time_ schema element (for | |||
(for fabrics that utilize precision timing) which contains the | fabrics that utilize precision timing) that contains the absolute | |||
absolute timestamp of when the TIE was generated and an | timestamp of when the TIE was generated and an _origination_lifetime_ | |||
_origination_lifetime_ to indicate the original lifetime when the TIE | to indicate the original lifetime when the TIE was generated. When | |||
was generated. When carried, they can be used for debugging or | carried, they can be used for debugging or security purposes (e.g., | |||
security purposes (e.g. to prevent lifetime modification attacks). | to prevent lifetime modification attacks). Clock synchronization is | |||
Clock synchronization is considered in more detail in Section 6.8.4. | considered in more detail in Section 6.8.4. | |||
_remaining_lifetime_ counts down to 0 from _origination_lifetime_. | _remaining_lifetime_ counts down to 0 from _origination_lifetime_. | |||
TIEs with lifetimes differing by less than _lifetime_diff2ignore_ | TIEs with lifetimes differing by less than _lifetime_diff2ignore_ | |||
MUST be considered EQUAL (if all other fields are equal). This | MUST be considered EQUAL (if all other fields are equal). This | |||
constant MUST be larger than _purge_lifetime_ to avoid | constant MUST be larger than _purge_lifetime_ to avoid | |||
retransmissions. | retransmissions. | |||
This normative ordering methodology is described in Figure 16 and | This normative ordering methodology is described in Figure 16 and | |||
MUST be used by all implementations. | MUST be used by all implementations. | |||
function Compare(X: TIEHeader, Y: TIEHeader) returns Ordering: | function Compare(X: TIEHeader, Y: TIEHeader) returns Ordering: | |||
seq_nr of a TIEHeader = TIEHeader.seq_nr | seq_nr of a TIEHeader = TIEHeader.seq_nr | |||
TIEID of a TIEHeader = TIEHeader.TIEID | TIEID of a TIEHeader = TIEHeader.TIEID | |||
direction of a TIEID = TIEID.direction | direction of a TIEID = TIEID.direction | |||
# System ID | # System ID | |||
originator of a TIEID = TIEID.originator | originator of a TIEID = TIEID.originator | |||
# is of type TIETypeType | # is of type TIETypeType | |||
skipping to change at page 57, line 31 | skipping to change at line 2512 | |||
else if X.direction < Y.direction: | else if X.direction < Y.direction: | |||
return Y is larger | return Y is larger | |||
else if X.originator > Y.originator: | else if X.originator > Y.originator: | |||
return X is larger | return X is larger | |||
else if X.originator < Y.originator: | else if X.originator < Y.originator: | |||
return Y is larger | return Y is larger | |||
else: | else: | |||
if X.tietype == Y.tietype: | if X.tietype == Y.tietype: | |||
if X.tie_nr == Y.tie_nr: | if X.tie_nr == Y.tie_nr: | |||
if X.seq_nr == Y.seq_nr: | if X.seq_nr == Y.seq_nr: | |||
X.lifetime_left = X.remaining_lifetime - time since TIE was received | X.lifetime_left = X.remaining_lifetime | |||
Y.lifetime_left = Y.remaining_lifetime - time since TIE was received | - time since TIE was received | |||
Y.lifetime_left = Y.remaining_lifetime | ||||
- time since TIE was received | ||||
if absolute_value_of(X.lifetime_left - Y.lifetime_left) <= common.lifetime_diff2ignore: | if absolute_value_of(X.lifetime_left - | |||
Y.lifetime_left) <= common.lifetime_diff2ignore: | ||||
return Both are Equal | return Both are Equal | |||
else: | else: | |||
return TIEHeader with larger lifetime_left is larger | return TIEHeader with larger lifetime_left is | |||
larger | ||||
else: | else: | |||
return return TIEHeader with larger seq_nr is larger | return TIEHeader with larger seq_nr is larger | |||
else: | else: | |||
return TIEHeader with larger tie_nr is larger | return TIEHeader with larger tie_nr is larger | |||
else: | else: | |||
return TIEHeader with larger TIEType is larger | return TIEHeader with larger TIEType is larger | |||
Figure 16: TIEHeader Comparison Function | Figure 16: TIEHeader Comparison Function | |||
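The comparison in Figure 16 can be sketched in runnable form as follows. Part of the figure is elided in this excerpt, so the sketch relies only on the branches that are visible (direction, originator, tietype, tie_nr, seq_nr, then remaining lifetime). The Header class, the flattening of TIEID fields into it, and the default threshold value are assumptions for illustration; the real constant is defined in the schema. Sequence number rollover (Appendix A) is not handled.

```python
from dataclasses import dataclass

# Assumed placeholder; the normative value is common.lifetime_diff2ignore.
LIFETIME_DIFF2IGNORE = 400


@dataclass(frozen=True)
class Header:
    """Hypothetical flattened view of a TIEHeader and its TIEID."""
    direction: int
    originator: int
    tietype: int
    tie_nr: int
    seq_nr: int
    lifetime_left: int  # remaining_lifetime minus time since reception


def compare(x: Header, y: Header,
            lifetime_diff2ignore: int = LIFETIME_DIFF2IGNORE) -> int:
    """Return 1 if x is larger, -1 if y is larger, 0 if both are equal."""
    # Fields are compared in sequence as unsigned integers.
    for name in ("direction", "originator", "tietype", "tie_nr", "seq_nr"):
        a, b = getattr(x, name), getattr(y, name)
        if a != b:
            return 1 if a > b else -1
    # Same TIE version: small lifetime differences are ignored.
    if abs(x.lifetime_left - y.lifetime_left) <= lifetime_diff2ignore:
        return 0
    return 1 if x.lifetime_left > y.lifetime_left else -1
```

Returning an integer rather than one of the headers keeps the function usable as a three-way comparator, for example with functools.cmp_to_key when sorting headers for a TIDE.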
All valid TIE types are defined in _TIETypeType_. This enum | All valid TIE types are defined in _TIETypeType_. This enum | |||
indicates what TIE type the TIE is carrying. In case the value is | indicates what TIE type the TIE is carrying. In case the value is | |||
not known to the receiver, the TIE MUST be re-flooded with scope | not known to the receiver, the TIE MUST be reflooded with the scope | |||
identical to the scope of a prefix TIE. This allows for future | identical to the scope of a prefix TIE. This allows for future | |||
extensions of the protocol within the same major schema with types | extensions of the protocol that are within the same major schema and | |||
opaque to some nodes with some restrictions defined in Section 7. | that have types that are opaque to some nodes; some restrictions are | |||
defined in Section 7. | ||||
6.3.3.1. Normative Flooding Procedures | 6.3.3.1. Normative Flooding Procedures | |||
On reception of a TIE with an undefined level value in the packet | On reception of a TIE with an undefined level value in the packet | |||
header the node MUST issue a warning and discard the packet. | header, the node MUST issue a warning and discard the packet. | |||
This section specifies the precise, normative flooding mechanism and | This section specifies the precise, normative flooding mechanism and | |||
can be omitted unless the reader is pursuing an implementation of the | can be omitted unless the reader is pursuing an implementation of the | |||
protocol or looks for a deep understanding of underlying information | protocol or looks for a deep understanding of underlying information | |||
distribution mechanism. | distribution mechanism. | |||
Flooding Procedures are described in terms of the flooding state of | Flooding procedures are described in terms of the flooding state of | |||
an adjacency and resulting operations on it driven by packet | an adjacency, and resulting operations on it are driven by packet | |||
arrivals. Implementations MUST implement a behavior that is | arrivals. Implementations MUST implement a behavior that is | |||
externally indistinguishable from the FSMs and normative procedures | externally indistinguishable from the FSMs and normative procedures | |||
given here. | given here. | |||
RIFT does not specify any kind of flood rate limiting. To help with | RIFT does not specify any kind of flood rate limiting. To help with | |||
adjustment of flooding speeds the encoded packets provide hints to | adjustment of flooding speeds, the encoded packets provide hints to | |||
react accordingly to losses or overruns via | react accordingly to losses or overruns via | |||
_you_are_sending_too_quickly_ in the _LIEPacket_ and `Packet Number` | _you_are_sending_too_quickly_ in the _LIEPacket_ and "Packet Number" | |||
in the security envelope described in Section 6.9.3. Flooding of all | in the security envelope described in Section 6.9.3. Flooding of all | |||
corresponding topology exchange elements SHOULD be performed at the | corresponding topology exchange elements SHOULD be performed at the | |||
highest feasible rate but the rate of transmission MUST be throttled | highest feasible rate, but the rate of transmission MUST be throttled | |||
by reacting to packet elements and features of the system such as | by reacting to packet elements and features of the system, such as | |||
e.g. queue lengths or congestion indications in the protocol packets. | queue lengths or congestion indications in the protocol packets. | |||
A node SHOULD NOT send out any topology information elements if the | A node SHOULD NOT send out any topology information elements if the | |||
adjacency is not in a "ThreeWay" state. No further tightening of | adjacency is not in a _ThreeWay_ state. No further tightening of | |||
this rule is possible. For example, link buffering may cause both | this rule is possible. For example, link buffering may cause both | |||
LIEs and TIEs/TIDEs/TIREs to be re-ordered. | LIEs and TIEs/TIDEs/TIREs to be reordered. | |||
A node MUST drop any received TIEs/TIDEs/TIREs unless it is in | A node MUST drop any received TIEs/TIDEs/TIREs unless it is in the | |||
_ThreeWay_ state. | _ThreeWay_ state. | |||
TIEs generated by other nodes MUST be re-flooded. TIDEs and TIREs | TIEs generated by other nodes MUST be reflooded. TIDEs and TIREs | |||
MUST NOT be re-flooded. | MUST NOT be reflooded. | |||
6.3.3.1.1. FloodState Structure per Adjacency | 6.3.3.1.1. FloodState Structure per Adjacency | |||
The structure contains conceptually for each adjacency the following | For each adjacency, the structure conceptually contains the following | |||
elements. The word "collection" or "queue" indicates a set of | elements. The word "collection" or "queue" indicates a set of | |||
elements that can be iterated over: | elements that can be iterated over the following: | |||
TIES_TX: | TIES_TX: | |||
Collection containing all the TIEs to transmit on the adjacency. | Collection containing all the TIEs to transmit on the adjacency. | |||
TIES_ACK: | TIES_ACK: | |||
Collection containing all the TIEs that have to be acknowledged on | Collection containing all the TIEs that have to be acknowledged on | |||
the adjacency. | the adjacency. | |||
TIES_REQ: | TIES_REQ: | |||
Collection containing all the TIE headers that have to be | Collection containing all the TIE headers that have to be | |||
skipping to change at page 59, line 31 | skipping to change at line 2604 | |||
TIES_RTX: | TIES_RTX: | |||
Collection containing all TIEs that need retransmission with the | Collection containing all TIEs that need retransmission with the | |||
corresponding time to retransmit. | corresponding time to retransmit. | |||
FILTERED_TIEDB: | FILTERED_TIEDB: | |||
A filtered view of TIEDB, which retains for consideration only | A filtered view of TIEDB, which retains for consideration only | |||
those headers permitted by is_tide_entry_filtered and which either | those headers permitted by is_tide_entry_filtered and which either | |||
have a lifetime left > 0 or have no content. | have a lifetime left > 0 or have no content. | |||
Following words are used for well-known elements and procedures | The following words are used for well-known elements and procedures | |||
operating on this structure: | operating on this structure: | |||
TIE: | TIE: | |||
Describes either a full RIFT TIE or just the _TIEHeader_ or | describes either a full RIFT TIE or just the _TIEHeader_ or | |||
_TIEID_ equivalent as defined in Section 7.3. The corresponding | _TIEID_ equivalent, as defined in Section 7.3. The corresponding | |||
meaning is unambiguously contained in the context of each | meaning is unambiguously contained in the context of each | |||
algorithm. | algorithm. | |||
is_flood_reduced(TIE): | is_flood_reduced(TIE): | |||
returns whether a TIE can be flood reduced or not. | returns whether a TIE can be flood-reduced or not. | |||
is_tide_entry_filtered(TIE): | is_tide_entry_filtered(TIE): | |||
returns whether a header should be propagated in TIDE according to | returns whether a header should be propagated in TIDE according to | |||
flooding scopes. | flooding scopes. | |||
is_request_filtered(TIE): | is_request_filtered(TIE): | |||
returns whether a TIE request should be propagated to neighbor or | returns whether a TIE request should be propagated to the neighbor | |||
not according to flooding scopes. | or not, according to flooding scopes. | |||
is_flood_filtered(TIE): | is_flood_filtered(TIE): | |||
returns whether a requested TIE should be flooded to neighbor or not | returns whether a requested TIE should be flooded to the neighbor or not, | |||
according to flooding scopes. | according to flooding scopes. | |||
try_to_transmit_tie(TIE): | try_to_transmit_tie(TIE): | |||
A. if not is_flood_filtered(TIE) then | if not is_flood_filtered(TIE), then | |||
1. remove TIE from TIES_RTX if present | 1. remove the TIE from TIES_RTX if present | |||
2. if TIE with same key is found on TIES_ACK then | 2. if the TIE with same key is found on TIES_ACK, then | |||
a. if TIE is same or newer than TIE do nothing else | a. if the TIE is the same as or newer than TIE, do nothing, | |||
else | ||||
b. remove TIE from TIES_ACK and add TIE to TIES_TX | b. remove the TIE from TIES_ACK and add TIE to TIES_TX | |||
3. else insert TIE into TIES_TX | 3. else insert the TIE into TIES_TX. | |||
ack_tie(TIE): | ack_tie(TIE): | |||
remove TIE from all collections and then insert TIE into TIES_ACK. | remove the TIE from all collections and then insert the TIE into | |||
TIES_ACK. | ||||
tie_been_acked(TIE): | tie_been_acked(TIE): | |||
remove TIE from all collections. | remove the TIE from all collections. | |||
remove_from_all_queues(TIE): | remove_from_all_queues(TIE): | |||
same as _tie_been_acked_. | same as _tie_been_acked_. | |||
request_tie(TIE): | request_tie(TIE): | |||
if not is_request_filtered(TIE) then remove_from_all_queues(TIE) | if not is_request_filtered(TIE), then remove_from_all_queues(TIE) | |||
and add to TIES_REQ. | and add to TIES_REQ. | |||
move_to_rtx_list(TIE): | move_to_rtx_list(TIE): | |||
remove TIE from TIES_TX and then add to TIES_RTX using TIE | remove the TIE from TIES_TX and then add to TIES_RTX, using the | |||
retransmission interval. | TIE retransmission interval. | |||
clear_requests(TIEs): | clear_requests(TIEs): | |||
remove all TIEs from TIES_REQ. | remove all TIEs from TIES_REQ. | |||
bump_own_tie(TIE): | bump_own_tie(TIE): | |||
for self-originated TIE originate an empty or re-generate with | for a self-originated TIE, originate an empty or regenerate with | |||
version number higher than the one in TIE. | the version number higher than the one in the TIE. | |||
The collection SHOULD be served with the following priorities if the | The collection SHOULD be served with the following priorities if the | |||
system cannot process all the collections in real time: | system cannot process all the collections in real time: | |||
1. Elements on TIES_ACK should be processed with highest priority | 1. Elements on TIES_ACK should be processed with highest priority | |||
2. TIES_TX | 2. TIES_TX | |||
3. TIES_REQ and TIES_RTX should be processed with lowest priority | 3. TIES_REQ and TIES_RTX should be processed with lowest priority | |||
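The per-adjacency collections and the procedures operating on them can be sketched as below. This is an illustration under several assumptions: a TIE is reduced to a key (a tuple standing in for the TIEID) plus its seq_nr, the filter functions default to "nothing is filtered", the retransmission interval is an arbitrary placeholder, and step 2a of try_to_transmit_tie is read as "if the TIE already on TIES_ACK is the same as or newer than the one to transmit, do nothing". FILTERED_TIEDB, bump_own_tie, and the serving priorities are not modeled.

```python
from typing import Callable, Dict, Iterable, Tuple

TieKey = Tuple[int, int, int, int]  # hypothetical stand-in for a TIEID


class FloodState:
    """Simplified flood state of one adjacency; values hold seq_nr only."""

    def __init__(
        self,
        is_flood_filtered: Callable[[TieKey], bool] = lambda key: False,
        is_request_filtered: Callable[[TieKey], bool] = lambda key: False,
        rtx_interval: float = 1.0,  # assumed retransmission interval (seconds)
    ) -> None:
        self.is_flood_filtered = is_flood_filtered
        self.is_request_filtered = is_request_filtered
        self.rtx_interval = rtx_interval
        self.ties_tx: Dict[TieKey, int] = {}
        self.ties_ack: Dict[TieKey, int] = {}
        self.ties_req: Dict[TieKey, int] = {}
        self.ties_rtx: Dict[TieKey, Tuple[int, float]] = {}

    def remove_from_all_queues(self, key: TieKey) -> None:
        for collection in (self.ties_tx, self.ties_ack,
                           self.ties_req, self.ties_rtx):
            collection.pop(key, None)

    def tie_been_acked(self, key: TieKey) -> None:
        self.remove_from_all_queues(key)

    def try_to_transmit_tie(self, key: TieKey, seq_nr: int) -> None:
        if self.is_flood_filtered(key):
            return
        self.ties_rtx.pop(key, None)
        if key in self.ties_ack:
            if self.ties_ack[key] >= seq_nr:
                return  # the version awaiting acknowledgment is same or newer
            del self.ties_ack[key]
        self.ties_tx[key] = seq_nr

    def ack_tie(self, key: TieKey, seq_nr: int) -> None:
        self.remove_from_all_queues(key)
        self.ties_ack[key] = seq_nr

    def request_tie(self, key: TieKey, seq_nr: int) -> None:
        if not self.is_request_filtered(key):
            self.remove_from_all_queues(key)
            self.ties_req[key] = seq_nr

    def move_to_rtx_list(self, key: TieKey, now: float) -> None:
        seq_nr = self.ties_tx.pop(key, None)
        if seq_nr is not None:
            self.ties_rtx[key] = (seq_nr, now + self.rtx_interval)

    def clear_requests(self, keys: Iterable[TieKey]) -> None:
        for key in keys:
            self.ties_req.pop(key, None)
```

Dictionaries keyed by TIEID make "remove from all collections" cheap and guarantee that a given TIE sits in at most one version per collection, which matches how the procedures above treat a TIE "with the same key".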
6.3.3.1.2. TIDEs | 6.3.3.1.2. TIDEs | |||
_TIEID_ and _TIEHeader_ space forms a strict total order (modulo | _TIEID_ and _TIEHeader_ spaces form a strict total order (modulo | |||
incomparable sequence numbers (found in `TIEHeader.seq_nr`) as | incomparable sequence numbers (found in "TIEHeader.seq_nr"), as | |||
explained in Appendix A in the very unlikely event that can occur if | explained in Appendix A, in the very unlikely event that a TIE is | |||
a TIE is "stuck" in a part of a network while the originator reboots | "stuck" in a part of a network while the originator reboots and | |||
and reissues TIEs many times to the point its sequence# rolls over | reissues TIEs many times to the point its sequence number rolls over | |||
and forms incomparable distance to the "stuck" copy) which implies | and forms an incomparable distance to the "stuck" copy), which | |||
that a comparison relation is possible between two elements. With | implies that a comparison relation is possible between two elements. | |||
that it is implicitly possible to compare TIEs, TIEHeaders and TIEIDs | With that, it is implicitly possible to compare TIEs, TIEHeaders, and | |||
to each other whereas the shortest viable key is always implied. | TIEIDs to each other, whereas the shortest viable key is always | |||
implied. | ||||
6.3.3.1.2.1. TIDE Generation | 6.3.3.1.2.1. TIDE Generation | |||
As given by timer constant, periodically generate TIDEs by: | NEXT_TIDE_ID: ID of the next TIE to be sent in the TIDE. | |||
NEXT_TIDE_ID: ID of next TIE to be sent in TIDE. | As given by the timer constant, periodically generate TIDEs by: | |||
a. NEXT_TIDE_ID = MIN_TIEID | 1. NEXT_TIDE_ID = MIN_TIEID | |||
b. while NEXT_TIDE_ID not equal to MAX_TIEID do | 2. while NEXT_TIDE_ID is not equal to MAX_TIEID do: | |||
1. HEADERS = Exactly TIRDEs_PER_PKT headers from FILTERED_TIEDB | a. HEADERS = Exactly TIRES_PER_TIDE_PKT headers from | |||
starting at NEXT_TIDE_ID, unless fewer than TIRDEs_PER_PKT | FILTERED_TIEDB starting at NEXT_TIDE_ID, unless fewer than | |||
remain, in which case all remaining headers. | TIRES_PER_TIDE_PKT remain, in which case all remaining | |||
headers. | ||||
2. if HEADERS is empty then START = MIN_TIEID else START = first | b. if HEADERS is empty, then START = MIN_TIEID, else START = | |||
element in HEADERS | first element in HEADERS | |||
3. if HEADERS' size less than TIRDEs_PER_PKT then END = | c. if HEADERS size is less than TIRES_PER_TIDE_PKT, then END = | |||
MAX_TIEID else END = last element in HEADERS | MAX_TIEID, else END = last element in HEADERS | |||
4. send *sorted* HEADERS as TIDE setting START and END as its | d. send *sorted* HEADERS as TIDE, setting START and END as its | |||
range | range | |||
5. NEXT_TIDE_ID = END | e. NEXT_TIDE_ID = END | |||
The constant _TIRDEs_PER_PKT_ SHOULD be computed per interface and | The constant _TIRES_PER_TIDE_PKT_ SHOULD be computed per interface | |||
used by the implementation to limit the amount of TIE headers per | and used by the implementation to limit the amount of TIE headers per | |||
TIDE so the sent TIDE PDU does not exceed interface MTU. | TIDE so the sent TIDE PDU does not exceed the MTU of the interface. | |||
TIDE PDUs SHOULD be spaced on sending to prevent packet drops. | TIDE PDUs SHOULD be transmitted at a rate that does not lead to | |||
packet drops. | ||||
The algorithm will intentionally enter the loop once and send a | The algorithm will intentionally enter the loop once and send a | |||
single TIDE even when the database is empty, otherwise no TIDEs would | single TIDE, even when the database is empty; otherwise, no TIDEs | |||
be sent in case of empty database and break intended | would be sent in case of an empty database and break the intended | |||
synchronization. | synchronization. | |||
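The generation loop above can be sketched in Python as follows. This is a minimal illustration and not a normative implementation. It models TIE IDs as plain integers, and the sentinel values for MIN_TIEID and MAX_TIEID, the function name generate_tides, and the parameter headers_per_tide (standing in for the per-packet constant in the text) are all assumptions made for the example. It also assumes that "starting at NEXT_TIDE_ID" means strictly after that ID, so that the last header of one TIDE is not repeated in the next.

```python
from typing import Iterator, List, Tuple

MIN_TIEID = 0        # assumed sentinel, smaller than every real TIE ID
MAX_TIEID = 2 ** 64  # assumed sentinel, larger than every real TIE ID


def generate_tides(filtered_tiedb: List[int],
                   headers_per_tide: int) -> Iterator[Tuple[int, int, List[int]]]:
    """Yield (START, END, HEADERS) for each TIDE of one periodic run."""
    if headers_per_tide < 1:
        raise ValueError("headers_per_tide must be at least 1")
    ordered = sorted(filtered_tiedb)
    next_tide_id = MIN_TIEID
    while next_tide_id != MAX_TIEID:
        # Assumption: take headers strictly after NEXT_TIDE_ID.
        headers = [h for h in ordered if h > next_tide_id][:headers_per_tide]
        start = headers[0] if headers else MIN_TIEID
        end = headers[-1] if len(headers) == headers_per_tide else MAX_TIEID
        yield start, end, headers  # headers are already sorted
        next_tide_id = end
```

With an empty database the loop body still runs once and yields a single TIDE covering the whole range, which matches the behavior described in the last paragraph above.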
6.3.3.1.2.2. TIDE Processing | 6.3.3.1.2.2. TIDE Processing | |||
On reception of TIDEs the following processing is performed: | TXKEYS: Collection of TIE headers to be sent after processing of the | |||
packet | ||||
TXKEYS: Collection of TIE Headers to be sent after processing of | ||||
the packet | ||||
REQKEYS: Collection of TIEIDs to be requested after processing of | REQKEYS: Collection of TIEIDs to be requested after processing of | |||
the packet | the packet | |||
CLEARKEYS: Collection of TIEIDs to be removed from flood state | CLEARKEYS: Collection of TIEIDs to be removed from flood state | |||
queues | queues | |||
LASTPROCESSED: Last processed TIEID in TIDE | LASTPROCESSED: Last processed TIEID in the TIDE | |||
DBTIE: TIE in the Link State Database (LSDB) if found | DBTIE: TIE in the LSDB, if found | |||
a. LASTPROCESSED = TIDE.start_range | On reception of TIDEs, the following processing is performed: | |||
b. for every HEADER in TIDE do | 1. LASTPROCESSED = TIDE.start_range | |||
1. DBTIE = find HEADER in current LSDB | 2. For every HEADER in the TIDE do: | |||
2. if HEADER < LASTPROCESSED then report error and reset | a. DBTIE = find HEADER in the current LSDB | |||
b. if HEADER < LASTPROCESSED, then report an error and reset the | ||||
adjacency and return | adjacency and return | |||
3. put all TIEs in LSDB where (TIE.HEADER > LASTPROCESSED and | c. put all TIEs in LSDB, where (TIE.HEADER > LASTPROCESSED and | |||
TIE.HEADER < HEADER) into TXKEYS | TIE.HEADER < HEADER) into TXKEYS | |||
4. LASTPROCESSED = HEADER | d. LASTPROCESSED = HEADER | |||
5. if DBTIE not found then | e. if DBTIE is not found, then | |||
I) if originator is this node, then bump_own_tie | i. if originator is this node, then bump_own_tie | |||
II) else put HEADER into REQKEYS | ii. else put HEADER into REQKEYS | |||
6. if DBTIE.HEADER < HEADER then | f. if DBTIE.HEADER < HEADER then | |||
I) if originator is this node then bump_own_tie else | i. if the originator is this node, then bump_own_tie, else | |||
i. if this is a North TIE header from a northbound | 1. if this is a North TIE header from a northbound | |||
neighbor then override DBTIE in LSDB with HEADER | neighbor, then override DBTIE in LSDB with HEADER | |||
ii. else put HEADER into REQKEYS | 2. else put HEADER into REQKEYS | |||
7. if DBTIE.HEADER > HEADER then put DBTIE.HEADER into TXKEYS | g. if DBTIE.HEADER > HEADER, then put DBTIE.HEADER into TXKEYS | |||
8. if DBTIE.HEADER = HEADER then | h. if DBTIE.HEADER = HEADER, then | |||
I) if DBTIE has content already then put DBTIE.HEADER into | i. if DBTIE has content already, then put DBTIE.HEADER into | |||
CLEARKEYS | CLEARKEYS, else | |||
II) else put HEADER into REQKEYS | ii. put HEADER into REQKEYS | |||
c. put all TIEs in LSDB where (TIE.HEADER > LASTPROCESSED and | 3. put all TIEs in LSDB, where (TIE.HEADER > LASTPROCESSED and | |||
TIE.HEADER <= TIDE.end_range) into TXKEYS | TIE.HEADER <= TIDE.end_range) into TXKEYS | |||
d. for all TIEs in TXKEYS try_to_transmit_tie(TIE) | 4. for all TIEs in TXKEYS, try_to_transmit_tie(TIE) | |||
e. for all TIEs in REQKEYS request_tie(TIE) | 5. for all TIEs in REQKEYS, request_tie(TIE) | |||
f. for all TIEs in CLEARKEYS remove_from_all_queues(TIE) | 6. for all TIEs in CLEARKEYS, remove_from_all_queues(TIE) | |||
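The TIDE processing steps above can be sketched in Python under some simplifying assumptions. The Header and DbTie classes, the integer tie_id (standing in for the TIEID ordering), and the integer seq (standing in for the version comparison between headers of the same TIE) are hypothetical names introduced for this example. The special case for a North TIE header received from a northbound neighbor is left out, and the sketch returns the collected keys instead of calling try_to_transmit_tie, request_tie, and remove_from_all_queues.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Header:
    tie_id: int      # stands in for the TIEID ordering
    seq: int         # stands in for the version comparison
    originator: int


@dataclass
class DbTie:
    header: Header
    has_content: bool = True


def process_tide(lsdb: Dict[int, DbTie], start: int, end: int,
                 headers: List[Header], my_id: int
                 ) -> Tuple[List[Header], List[Header], List[Header], List[Header]]:
    """Return (TXKEYS, REQKEYS, CLEARKEYS, headers needing bump_own_tie)."""
    txkeys, reqkeys, clearkeys, bump = [], [], [], []
    last = start
    for h in headers:
        if h.tie_id < last:
            # The text asks to report an error and reset the adjacency.
            raise ValueError("TIDE headers out of order")
        # TIEs we hold that the neighbor skipped between two headers.
        txkeys += [t.header for k, t in sorted(lsdb.items()) if last < k < h.tie_id]
        last = h.tie_id
        db = lsdb.get(h.tie_id)
        if db is None or db.header.seq < h.seq:
            # North-TIE override case omitted in this sketch.
            (bump if h.originator == my_id else reqkeys).append(h)
        elif db.header.seq > h.seq:
            txkeys.append(db.header)
        elif db.has_content:
            clearkeys.append(db.header)
        else:
            reqkeys.append(h)
    # TIEs we hold between the last header and the end of the range.
    txkeys += [t.header for k, t in sorted(lsdb.items()) if last < k <= end]
    return txkeys, reqkeys, clearkeys, bump
```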
6.3.3.1.3. TIREs | 6.3.3.1.3. TIREs | |||
6.3.3.1.3.1. TIRE Generation | 6.3.3.1.3.1. TIRE Generation | |||
Elements from both TIES_REQ and TIES_ACK MUST be collected and sent | Elements from both TIES_REQ and TIES_ACK MUST be collected and sent | |||
out as fast as feasible as TIREs. When sending TIREs with elements | out as fast as feasible as TIREs. When sending TIREs with elements | |||
from TIES_REQ the _remaining_lifetime_ field in | from TIES_REQ, the _remaining_lifetime_ field in | |||
_TIEHeaderWithLifeTime_ MUST be set to 0 to force reflooding from the | _TIEHeaderWithLifeTime_ MUST be set to 0 to force reflooding from the | |||
neighbor even if the TIEs seem to be same. | neighbor even if the TIEs seem to be the same. | |||
6.3.3.1.3.2. TIRE Processing | 6.3.3.1.3.2. TIRE Processing | |||
On reception of TIREs the following processing is performed: | TXKEYS: Collection of TIE headers to be sent after processing of the | |||
packet | ||||
TXKEYS: Collection of TIE Headers to be sent after processing of | REQKEYS: Collection of TIEIDs to be requested after processing of | |||
the packet | the packet | |||
REQKEYS: Collection of TIEIDs to be requested after processing of | ACKKEYS: Collection of TIEIDs that have been acknowledged | |||
the packet | ||||
ACKKEYS: Collection of TIEIDs that have been acked | DBTIE: TIE in the LSDB, if found | |||
DBTIE: TIE in the LSDB if found | On reception of TIREs, the following processing is performed: | |||
a. for every HEADER in TIRE do | 1. for every HEADER in TIRE do: | |||
1. DBTIE = find HEADER in current LSDB | a. DBTIE = find HEADER in the current LSDB | |||
2. if DBTIE not found then do nothing | ||||
3. if DBTIE.HEADER < HEADER then put HEADER into REQKEYS | b. if DBTIE is not found, then do nothing | |||
4. if DBTIE.HEADER > HEADER then put DBTIE.HEADER into TXKEYS | c. if DBTIE.HEADER < HEADER, then put HEADER into REQKEYS | |||
5. if DBTIE.HEADER = HEADER then put DBTIE.HEADER into ACKKEYS | d. if DBTIE.HEADER > HEADER, then put DBTIE.HEADER into TXKEYS | |||
b. for all TIEs in TXKEYS try_to_transmit_tie(TIE) | e. if DBTIE.HEADER = HEADER, then put DBTIE.HEADER into ACKKEYS | |||
c. for all TIEs in REQKEYS request_tie(TIE) | 2. for all TIEs in TXKEYS, try_to_transmit_tie(TIE) | |||
d. for all TIEs in ACKKEYS tie_been_acked(TIE) | 3. for all TIEs in REQKEYS, request_tie(TIE) | |||
4. for all TIEs in ACKKEYS, tie_been_acked(TIE) | ||||
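As an illustration of the TIRE processing steps above, the following Python sketch classifies each header of a received TIRE. It is a simplification: the LSDB and the TIRE are modeled as hypothetical dictionaries mapping an integer TIE ID to an integer version, and the function returns the three collections rather than invoking try_to_transmit_tie, request_tie, and tie_been_acked.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (tie_id, version), an assumed stand-in for a TIE header


def process_tire(lsdb: Dict[int, int],
                 tire: Dict[int, int]) -> Tuple[List[Key], List[Key], List[Key]]:
    """Return (TXKEYS, REQKEYS, ACKKEYS) for one received TIRE."""
    txkeys: List[Key] = []
    reqkeys: List[Key] = []
    ackkeys: List[Key] = []
    for tie_id, version in tire.items():
        db_version = lsdb.get(tie_id)
        if db_version is None:
            continue  # unknown TIE: do nothing
        if db_version < version:
            reqkeys.append((tie_id, version))      # neighbor has a newer copy
        elif db_version > version:
            txkeys.append((tie_id, db_version))    # we hold a newer copy
        else:
            ackkeys.append((tie_id, db_version))   # same version: acknowledged
    return txkeys, reqkeys, ackkeys
```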
6.3.3.1.4. TIEs Processing on Flood State Adjacency | 6.3.3.1.4. TIEs Processing on Flood State Adjacency | |||
On reception of TIEs the following processing is performed: | On reception of TIEs, the following processing is performed: | |||
ACKTIE: TIE to acknowledge | ACKTIE: TIE to acknowledge | |||
TXTIE: TIE to transmit | TXTIE: TIE to transmit | |||
DBTIE: TIE in the LSDB if found | DBTIE: TIE in the LSDB, if found | |||
a. DBTIE = find TIE in current LSDB | 1. DBTIE = find TIE in the current LSDB | |||
b. if DBTIE not found then | 2. if DBTIE is not found, then | |||
1. if originator is this node then bump_own_tie with a short | a. if the originator is this node, then bump_own_tie with a | |||
remaining lifetime | short remaining lifetime | |||
2. else insert TIE into LSDB and ACKTIE = TIE | b. else insert TIE into LSDB and ACKTIE = TIE | |||
else | else | |||
1. if DBTIE.HEADER = TIE.HEADER then | a. if DBTIE.HEADER = TIE.HEADER, then | |||
i. if DBTIE has content already then ACKTIE = TIE | i. if DBTIE has content already, then ACKTIE = TIE | |||
ii. else process like the "DBTIE.HEADER < TIE.HEADER" case | ii. else process like the "DBTIE.HEADER < TIE.HEADER" case | |||
2. if DBTIE.HEADER < TIE.HEADER then | b. if DBTIE.HEADER < TIE.HEADER, then | |||
i. if originator is this node then bump_own_tie | i. if the originator is this node, then bump_own_tie | |||
ii. else insert TIE into LSDB and ACKTIE = TIE | ii. else insert TIE into LSDB and ACKTIE = TIE | |||
3. if DBTIE.HEADER > TIE.HEADER then | c. if DBTIE.HEADER > TIE.HEADER, then | |||
i. if DBTIE has content already then TXTIE = DBTIE | ||||
i. if DBTIE has content already, then TXTIE = DBTIE | ||||
ii. else ACKTIE = DBTIE | ii. else ACKTIE = DBTIE | |||
c. if TXTIE is set then try_to_transmit_tie(TXTIE) | 3. if TXTIE is set, then try_to_transmit_tie(TXTIE) | |||
d. if ACKTIE is set then ack_tie(TIE) | 4. if ACKTIE is set, then ack_tie(TIE) | |||
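The decision tree above for a received TIE can be sketched in Python as shown below. The Tie class, its integer seq field (standing in for the header comparison), and the returned tuple are assumptions made for this example. The sketch reports whether bump_own_tie would be needed instead of performing it, and it does not model the short remaining lifetime mentioned for the not-found case.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Tie:
    tie_id: int
    seq: int               # stands in for the header comparison
    originator: int
    has_content: bool = True


def process_tie(lsdb: Dict[int, Tie], tie: Tie, my_id: int
                ) -> Tuple[Optional[Tie], Optional[Tie], bool]:
    """Return (ACKTIE, TXTIE, bump_needed) for one received TIE."""
    ack: Optional[Tie] = None
    tx: Optional[Tie] = None
    bump = False
    db = lsdb.get(tie.tie_id)
    if db is not None and db.seq > tie.seq:
        # We hold a newer version: send it if we have the content.
        if db.has_content:
            tx = db
        else:
            ack = db
    elif db is not None and db.seq == tie.seq and db.has_content:
        ack = tie
    else:
        # Not found, older copy held, or same version held as header only.
        if tie.originator == my_id:
            bump = True
        else:
            lsdb[tie.tie_id] = tie
            ack = tie
    return ack, tx, bump
```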
6.3.3.1.5. Sending TIEs | 6.3.3.1.5. Sending TIEs | |||
On a periodic basis all TIEs with lifetime left > 0 MUST be sent out | On a periodic basis, all TIEs with a lifetime of > 0 left MUST be | |||
on the adjacency, removed from TIES_TX list and requeued onto | sent out on the adjacency, removed from the TIES_TX list, and | |||
TIES_RTX list. The specific period is out of scope for this | requeued onto TIES_RTX list. The specific period is out of scope for | |||
document. | this document. | |||
6.3.3.1.6. TIEs Processing In LSDB | 6.3.3.1.6. TIEs Processing in LSDB | |||
The Link State Database (LSDB) holds the most recent copy of TIEs | The LSDB holds the most recent copy of TIEs received via flooding | |||
received via flooding from according peers. Consecutively, after | from according peers. Consecutively, after version tie-breaking by | |||
version tie-breaking by LSDB, a peer receives from the LSDB the | LSDB, a peer receives from the LSDB the newest versions of TIEs | |||
newest versions of TIEs received by other peers and processes them | received by other peers and processes them (without any filtering) | |||
(without any filtering) just like receiving TIEs from its remote | just like receiving TIEs from its remote peer. Such a publisher | |||
peer. Such a publisher model can be implemented in several ways, | model can be implemented in several ways, either in a single thread | |||
either in a single thread of execution or in multiple parallel | of execution or in multiple parallel threads. | |||
threads. | ||||
LSDB can be logically considered as the entity aging out TIEs, i.e. | LSDB can be logically considered as the entity aging out TIEs, i.e., | |||
being responsible to discard TIEs that are stored longer than | being responsible to discard TIEs that are stored longer than | |||
_remaining_lifetime_ on their reception. | _remaining_lifetime_ on their reception. | |||
LSDB is also expected to periodically re-originate the node's own | LSDB is also expected to periodically reoriginate the node's own | |||
TIEs. Originating at an interval significantly shorter than | TIEs. Originating at an interval significantly shorter than | |||
_default_lifetime_ is RECOMMENDED to prevent TIE expiration by other | _default_lifetime_ is RECOMMENDED to prevent TIE expiration by other | |||
nodes in the network which can lead to instabilities. | nodes in the network, which can lead to instabilities. | |||
6.3.4. TIE Flooding Scopes | 6.3.4. TIE Flooding Scopes | |||
In a somewhat analogous fashion to link-local, area and domain | In a somewhat analogous fashion to link-local, area, and domain | |||
flooding scopes, RIFT defines several complex "flooding scopes" | flooding scopes, RIFT defines several complex "flooding scopes", | |||
depending on the direction and type of TIE propagated. | depending on the direction and type of TIE propagated. | |||
Every North TIE is flooded northbound, providing a node at a given | Every North TIE is flooded northbound, providing a node at a given | |||
level with the complete topology of the Clos or Fat Tree network that | level with the complete topology of the Clos or fat tree network that | |||
is reachable southwards of it, including all specific prefixes. This | is reachable southwards of it, including all specific prefixes. This | |||
means that a packet received from a node at the same or lower level | means that a packet received from a node at the same or lower level | |||
whose destination is covered by one of those specific prefixes will | whose destination is covered by one of those specific prefixes will | |||
be routed directly towards the node advertising that prefix rather | be routed directly towards the node advertising that prefix, rather | |||
than sending the packet to a node at a higher level. | than sending the packet to a node at a higher level. | |||
A node's Node South TIEs, consisting of all node's adjacencies and | A node's South Node TIEs, consisting of all node's adjacencies and | |||
prefix South TIEs limited to those related to default IP prefix and | South Prefix TIEs limited to those related to default IP prefix and | |||
disaggregated prefixes, are flooded southbound in order to inform | disaggregated prefixes, are flooded southbound in order to inform | |||
nodes one level down of connectivity of the higher level as well as | nodes one level down of connectivity of the higher level as well as | |||
reachability to the rest of the fabric. In order to allow an E-W | reachability to the rest of the fabric. In order to allow an E-W | |||
disconnected node in a given level to receive the South TIEs of other | disconnected node in a given level to receive the South TIEs of other | |||
nodes at its level, every *NODE* South TIE is "reflected" northbound | nodes at its level, every South Node TIE is "reflected" northbound to | |||
to the level from which it was received. It should be noted that | the level from which it was received. It should be noted that East- | |||
East-West links are included in South TIE flooding (except at the ToF | West links are included in South TIE flooding (except at the ToF | |||
level); those TIEs need to be flooded to satisfy algorithms in | level); those TIEs need to be flooded to satisfy the algorithms | |||
Section 6.4. In that way nodes at same level can learn about each | described in Section 6.4. In that way, nodes at same level can learn | |||
other using without a lower level except in case of leaf level. The | about each other without using a lower level except in case of leaf | |||
precise, normative flooding scopes are given in Table 3. Those rules | level. The precise, normative flooding scopes are given in Table 3. | |||
also govern what SHOULD be included in TIDEs on the adjacency. | Those rules also govern what SHOULD be included in TIDEs on the | |||
Again, East-West flooding scopes are identical to South flooding | adjacency. Again, East-West flooding scopes are identical to | |||
scopes except in case of ToF East-West links (rings) which are | southern flooding scopes, except in case of ToF East-West links | |||
basically performing northbound flooding. | (rings), which are basically performing northbound flooding. | |||
Node South TIE "south reflection" enables support of positive | South Node TIE "south reflection" enables support of positive | |||
disaggregation on failures as described in Section 6.5 and flooding | disaggregation on failures, as described in Section 6.5, and flooding | |||
reduction in Section 6.3.9. | reduction, as described in Section 6.3.9. | |||
+===========+======================+==============+=================+ | +===========+======================+==============+=================+ | |||
| Type / | South | North | East-West | | | Type / | South | North | East-West | | |||
| Direction | | | | | | Direction | | | | | |||
+===========+======================+==============+=================+ | +===========+======================+==============+=================+ | |||
| Node | flood if level of | flood if | flood only if | | | South | flood if the level | flood if the | flood only if | | |||
| South TIE | originator is | level of | this node is | | | Node TIE | of the originator | level of the | this node is | | |||
| | equal to this | originator | not ToF | | | | is equal to this | originator | not ToF | | |||
| | node | is higher | | | | | node | is higher | | | |||
| | | than this | | | | | | than this | | | |||
| | | node | | | | | | node | | | |||
+-----------+----------------------+--------------+-----------------+ | +-----------+----------------------+--------------+-----------------+ | |||
| non-Node | flood self- | flood only | flood only if | | | non-Node | flood self- | flood only | flood only if | | |||
| South TIE | originated only | if neighbor | self-originated | | | South TIE | originated only | if the | it is self- | | |||
| | | is | and this node | | | | | neighbor is | originated and | | |||
| | | originator | is not ToF | | | | | the | this node is | | |||
| | | originator | not ToF | | ||||
| | | of TIE | | | | | | of TIE | | | |||
+-----------+----------------------+--------------+-----------------+ | +-----------+----------------------+--------------+-----------------+ | |||
| all North | never flood | flood always | flood only if | | | all North | never flood | flood always | flood only if | | |||
| TIEs | | | this node is | | | TIEs | | | this node is | | |||
| | | | ToF | | | | | | ToF | | |||
+-----------+----------------------+--------------+-----------------+ | +-----------+----------------------+--------------+-----------------+ | |||
| TIDE | include at least | include at | if this node is | | | TIDE | include at least | include at | if this node is | | |||
| | all non-self | least all | ToF then | | | | all non-self- | least all | ToF, then | | |||
| | originated North | Node South | include all | | | | originated North | South Node | include all | | |||
| | TIE headers and | TIEs and all | North TIEs, | | | | TIE headers and | TIEs and all | North TIEs; | | |||
| | self-originated | South TIEs | otherwise only | | | | self-originated | South TIEs | otherwise, only | | |||
| | South TIE headers | originated | self-originated | | | | South TIE headers | originated | include self- | | |||
| | and Node South | by peer and | TIEs | | | | and South Node TIEs | by a peer | originated TIEs | | |||
| | TIEs of nodes at | all North | | | | | of nodes at same | and all | | | |||
| | same level | TIEs | | | | | level | North TIEs | | | |||
+-----------+----------------------+--------------+-----------------+ | +-----------+----------------------+--------------+-----------------+ | |||
| TIRE as | request all North | request all | if this node is | | | TIRE as | request all North | request all | if this node is | | |||
| Request | TIEs and all | South TIEs | ToF then apply | | | Request | TIEs and all peer's | South TIEs | ToF, then apply | | |||
| | peer's self- | | North scope | | | | self-originated | | north scope | | |||
| | originated TIEs | | rules, | | | | TIEs and all South | | rules; | | |||
| | and all Node | | otherwise South | | | | Node TIEs | | otherwise, | | |||
| | South TIEs | | scope rules | | | | | | apply south | | |||
| | | | scope rules | | ||||
+-----------+----------------------+--------------+-----------------+ | +-----------+----------------------+--------------+-----------------+ | |||
| TIRE as | Ack all received | Ack all | Ack all | | | TIRE as | Ack all received | Ack all | Ack all | | |||
| Ack | TIEs | received | received TIEs | | | Ack | TIEs | received | received TIEs | | |||
| | | TIEs | | | | | | TIEs | | | |||
+-----------+----------------------+--------------+-----------------+ | +-----------+----------------------+--------------+-----------------+ | |||
Table 3: Normative Flooding Scopes | Table 3: Normative Flooding Scopes | |||
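The first three rows of Table 3 (the flooding decisions for TIEs) can be read as a predicate. The Python sketch below is one possible encoding, not a normative one. The enum names, the function name should_flood, and its keyword parameters are hypothetical, and the TIDE and TIRE rows of the table are not covered.

```python
from enum import Enum


class Kind(Enum):
    NODE_SOUTH = 1    # South Node TIE
    OTHER_SOUTH = 2   # non-Node South TIE
    NORTH = 3         # any North TIE


class Direction(Enum):
    SOUTH = 1
    NORTH = 2
    EAST_WEST = 3


def should_flood(kind: Kind, direction: Direction, *, originator_level: int,
                 my_level: int, self_originated: bool,
                 neighbor_is_originator: bool, i_am_tof: bool) -> bool:
    """Return True if Table 3 allows flooding the TIE in this direction."""
    if kind is Kind.NODE_SOUTH:
        if direction is Direction.SOUTH:
            return originator_level == my_level
        if direction is Direction.NORTH:
            return originator_level > my_level   # "reflection" northbound
        return not i_am_tof
    if kind is Kind.OTHER_SOUTH:
        if direction is Direction.SOUTH:
            return self_originated
        if direction is Direction.NORTH:
            return neighbor_is_originator
        return self_originated and not i_am_tof
    # All North TIEs.
    if direction is Direction.SOUTH:
        return False
    if direction is Direction.NORTH:
        return True
    return i_am_tof
```

For example, assuming leaves at level 0 and spines at level 1, a leaf holding a South Node TIE originated by a spine would flood it northbound to its other spine neighbors, since the originator's level is higher than its own.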
If the TIDE includes additional TIE headers beside the ones | If the TIDE includes additional TIE headers beside the ones | |||
specified, the receiving neighbor must apply the corresponding filter | specified, the receiving neighbor must apply the corresponding filter | |||
skipping to change at page 69, line 9 ¶ | skipping to change at line 3000 ¶ | |||
To illustrate these rules, consider using the topology in Figure 2, | To illustrate these rules, consider using the topology in Figure 2, | |||
with the optional link between spine 111 and spine 112, and the | with the optional link between spine 111 and spine 112, and the | |||
associated TIEs given in Figure 15. The flooding from particular | associated TIEs given in Figure 15. The flooding from particular | |||
nodes of the TIEs is given in Table 4. | nodes of the TIEs is given in Table 4. | |||
+============+==========+===========================================+ | +============+==========+===========================================+ | |||
| Local | Neighbor | TIEs Flooded from Local to Neighbor Node | | | Local | Neighbor | TIEs Flooded from Local to Neighbor Node | | |||
| Node | Node | | | | Node | Node | | | |||
+============+==========+===========================================+ | +============+==========+===========================================+ | |||
| Leaf111 | Spine | Leaf111 North TIEs, Spine 111 Node South | | | Leaf111 | Spine | Leaf111 North TIEs, Spine 111 South Node | | |||
| | 112 | TIE | | | | 112 | TIE | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| Leaf111 | Spine | Leaf111 North TIEs, Spine 112 Node South | | | Leaf111 | Spine | Leaf111 North TIEs, Spine 112 South Node | | |||
| | 111 | TIE | | | | 111 | TIE | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| ... | ... | ... | | | ... | ... | ... | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| Spine | Leaf111 | Spine 111 South TIEs | | | Spine | Leaf111 | Spine 111 South TIEs | | |||
| 111 | | | | | 111 | | | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| Spine | Leaf112 | Spine 111 South TIEs | | | Spine | Leaf112 | Spine 111 South TIEs | | |||
| 111 | | | | | 111 | | | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| Spine | Spine | Spine 111 South TIEs | | | Spine | Spine | Spine 111 South TIEs | | |||
| 111 | 112 | | | | 111 | 112 | | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| Spine | ToF 21 | Spine 111 North TIEs, Leaf111 North TIEs, | | | Spine | ToF 21 | Spine 111 North TIEs, Leaf111 North TIEs, | | |||
| 111 | | Leaf112 North TIEs, ToF 22 Node South TIE | | | 111 | | Leaf112 North TIEs, ToF 22 South Node TIE | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| Spine | ToF 22 | Spine 111 North TIEs, Leaf111 North TIEs, | | | Spine | ToF 22 | Spine 111 North TIEs, Leaf111 North TIEs, | | |||
| 111 | | Leaf112 North TIEs, ToF 21 Node South TIE | | | 111 | | Leaf112 North TIEs, ToF 21 South Node TIE | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| ... | ... | ... | | | ... | ... | ... | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| ToF 21 | Spine | ToF 21 South TIEs | | | ToF 21 | Spine | ToF 21 South TIEs | | |||
| | 111 | | | | | 111 | | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| ToF 21 | Spine | ToF 21 South TIEs | | | ToF 21 | Spine | ToF 21 South TIEs | | |||
| | 112 | | | | | 112 | | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| ToF 21 | Spine | ToF 21 South TIEs | | | ToF 21 | Spine | ToF 21 South TIEs | | |||
| | 121 | | | | | 121 | | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| ToF 21 | Spine | ToF 21 South TIEs | | | ToF 21 | Spine | ToF 21 South TIEs | | |||
| | 122 | | | | | 122 | | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| ... | ... | ... | | | ... | ... | ... | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
Table 4: Flooding some TIEs from example topology | Table 4: Flooding Some TIEs from Example Topology | |||
6.3.5. RAIN: RIFT Adjacency Inrush Notification | 6.3.5. RAIN: RIFT Adjacency Inrush Notification | |||
The optional RIFT Adjacency Inrush Notification (RAIN) mechanism | The optional RIFT Adjacency Inrush Notification (RAIN) mechanism | |||
helps to prevent adjacencies from being overwhelmed by flooding on | helps to prevent adjacencies from being overwhelmed by flooding on | |||
restart or bring-up with many southbound neighbors. A node MAY set | restart or bring-up with many southbound neighbors. In its LIEs, a | |||
in its LIEs the corresponding _you_are_sending_too_quickly_ flag to | node MAY set the corresponding _you_are_sending_too_quickly_ flag to | |||
indicate to the neighbor that it SHOULD flood Node TIEs with normal | indicate to the neighbor that it SHOULD flood Node TIEs with normal | |||
speed and significantly slow down the flooding of any other TIEs. | speed and significantly slow down the flooding of any other TIEs. | |||
The flag SHOULD be set only in the southbound direction. The | The flag SHOULD be set only in the southbound direction. The | |||
receiving node SHOULD accommodate the request to lessen the flooding | receiving node SHOULD accommodate the request to lessen the flooding | |||
load on the affected node if south of the sender and should ignore | load on the affected node if it is south of the sender and should | |||
the indication if north of the sender. | ignore the indication if it is north of the sender. | |||
The distribution of Node TIEs at normal speed even at high load | The distribution of Node TIEs at normal speed, even at high load, | |||
guarantees correct behavior of algorithms like disaggregation or | guarantees correct behavior of algorithms like disaggregation or | |||
default route origination. Furthermore though, the use of this bit | default route origination. Furthermore though, the use of this bit | |||
presents an inherent trade-off between processing load and | presents an inherent trade-off between processing load and | |||
convergence speed since significantly slowing down flooding of | convergence speed since significantly slowing down flooding of | |||
northbound prefixes from neighbors for an extended time will lead to | northbound prefixes from neighbors for an extended time will lead to | |||
traffic losses. | traffic losses. | |||
6.3.6. Initial and Periodic Database Synchronization | 6.3.6. Initial and Periodic Database Synchronization | |||
The initial exchange of RIFT includes periodic TIDE exchanges that | The initial exchange of RIFT includes periodic TIDE exchanges that | |||
contain description of the link state database and TIREs which | contain descriptions of the LSDB and TIREs, which perform the | |||
perform the function of requesting unknown TIEs as well as confirming | function of requesting unknown TIEs as well as confirming the | |||
reception of flooded TIEs. The content of TIDEs and TIREs is | reception of flooded TIEs. The content of TIDEs and TIREs is | |||
governed by Table 3. | governed by Table 3. | |||
6.3.7. Purging and Roll-Overs | 6.3.7. Purging and Rollovers | |||
When a node exits the network, if "unpurged", residual stale TIEs may | When a node exits the network, if "unpurged", residual stale TIEs may | |||
exist in the network until their lifetimes expire (which in case of | exist in the network until their lifetimes expire (which in case of | |||
RIFT is by default a rather long period to prevent ongoing re- | RIFT is by default a rather long period to prevent ongoing | |||
origination of TIEs in very large topologies). RIFT does not have a | reorigination of TIEs in very large topologies). RIFT does not have | |||
"purging mechanism" based on sending specialized "purge" packets. In | a "purging mechanism" based on sending specialized "purge" packets. | |||
other routing protocols such a mechanism has proven to be complex and | In other routing protocols, such a mechanism has proven to be complex | |||
fragile based on many years of experience. RIFT simply issues a new, | and fragile based on many years of experience. RIFT simply issues a | |||
i.e., higher sequence number, empty version of the TIE with a short | new, i.e., higher sequence number, empty version of the TIE with a | |||
lifetime given by the _purge_lifetime_ constant and relies on each | short lifetime given by the _purge_lifetime_ constant and relies on | |||
node to age out and delete each TIE copy independently. Abundant | each node to age out and delete each TIE copy independently. | |||
amounts of memory are available today even on low-end platforms and | Abundant amounts of memory are available today, even on low-end | |||
hence keeping those relatively short-lived extra copies for a while | platforms, and hence, keeping those relatively short-lived extra | |||
is acceptable. The information will age out and in the meantime all | copies for a while is acceptable. The information will age out and, | |||
computations will deliver correct results if a node leaves the | in the meantime, all computations will deliver correct results if a | |||
network due to the new information distributed by its adjacent nodes | node leaves the network due to the new information distributed by its | |||
breaking bi-directional connectivity checks in different | adjacent nodes breaking bidirectional connectivity checks in | |||
computations. | different computations. | |||
Once a RIFT node issues a TIE with an ID, it SHOULD preserve the ID | Once a RIFT node issues a TIE with an ID, it SHOULD preserve the ID | |||
as long as feasible (also when the protocol restarts), even if the | as long as feasible (also when the protocol restarts), even if the | |||
TIE loses all content. The re-advertisement of an empty TIE | TIE loses all content. The re-advertisement of an empty TIE | |||
fulfills the purpose of purging any information advertised in | fulfills the purpose of purging any information advertised in | |||
previous versions. The originator is free to not re-originate the | previous versions. The originator is free to not reoriginate the | |||
corresponding empty TIE again or originate an empty TIE with | corresponding empty TIE again or originate an empty TIE with a | |||
relatively short lifetime to prevent large number of long-lived empty | relatively short lifetime to prevent a large number of long-lived | |||
stubs polluting the network. Each node MUST time out and clean up | empty stubs polluting the network. Each node MUST time out and clean | |||
the corresponding empty TIEs independently. | up the corresponding empty TIEs independently. | |||
Upon restart a node MUST be prepared to receive TIEs with its own | Upon restart, a node MUST be prepared to receive TIEs with its own | |||
System ID and supersede them with equivalent, newly generated, empty | System ID and supersede them with equivalent, newly generated, empty | |||
TIEs with a higher sequence number. As above, the lifetime can be | TIEs with a higher sequence number. As above, the lifetime can be | |||
relatively short since it only needs to exceed the necessary | relatively short since it only needs to exceed the necessary | |||
propagation and processing delay by all the nodes that are within the | propagation and processing delay by all the nodes that are within the | |||
TIE's flooding scope. | TIE's flooding scope. | |||
TIE sequence numbers are rolled over using the method described in | TIE sequence numbers are rolled over using the method described in | |||
Appendix A. First sequence number of any spontaneously originated | Appendix A. The first sequence number of any spontaneously | |||
TIE (i.e. not originated to override a detected older copy in the | originated TIE (i.e., not originated to override a detected older | |||
network) MUST be a reasonably unpredictable random number (for | copy in the network) MUST be a reasonably unpredictable random number | |||
example [RFC4086]) in the interval [0, 2^30-1] which will prevent | (for example, [RFC4086]) in the interval [0, 2^30-1], which will | |||
otherwise identical TIE headers from remaining "stuck" in the network with | prevent otherwise identical TIE headers from remaining "stuck" in the | |||
content different from TIE originated after reboot. In traditional | network with content different from the TIE originated after reboot. | |||
link-state protocols this is delegated to a 16-bit checksum on packet | In typical link-state protocols, this is delegated to a 16-bit | |||
content. RIFT avoids this design due to the CPU burden presented by | checksum on packet content. RIFT avoids this design due to the CPU | |||
computation of such checksums and additional complications tied to | burden presented by computation of such checksums and additional | |||
the fact that the checksum must be "patched" into the packet after | complications tied to the fact that the checksum must be "patched" | |||
the generation of the content, a difficult proposition in binary | into the packet after the generation of the content, which is a | |||
hand-crafted formats already and highly incompatible with model- | difficult proposition in binary, hand-crafted formats already and | |||
based, serialized formats. The sequence number space is hence | highly incompatible with model-based, serialized formats. The | |||
consciously chosen to be 64 bits wide to make the occurrence of a TIE | sequence number space is hence consciously chosen to be 64 bits wide | |||
with same sequence number but different content as much or even more | to make the occurrence of a TIE with the same sequence number but | |||
unlikely than the checksum method. To emulate the "checksum | different content as much or even more unlikely than the checksum | |||
behavior" an implementation could choose to compute a 64-bit checksum | method. To emulate the "checksum behavior", an implementation could | |||
or hash function over the TIE content and use that as part of the | choose to compute a 64-bit checksum or hash function over the TIE | |||
first sequence number after reboot. | content and use that as part of the first sequence number after | |||
reboot. | ||||
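The choice of the first sequence number described above can be sketched as follows. This is a minimal illustration, not a normative procedure: the function names are hypothetical, and the way the content hash is mixed with randomness and reduced into the interval [0, 2^30-1] is an assumption, since the text only says the hash could be used "as part of" the first sequence number.

```python
import hashlib
import secrets

MAX_INITIAL_SEQ = 2**30 - 1  # upper bound of the interval [0, 2^30-1]


def initial_sequence_number() -> int:
    """Return an unpredictable first sequence number in [0, 2^30-1]."""
    # secrets draws from the operating system's CSPRNG.
    return secrets.randbelow(MAX_INITIAL_SEQ + 1)


def content_derived_sequence_number(tie_content: bytes) -> int:
    """Hypothetical variant emulating the "checksum behavior"."""
    digest = hashlib.sha256(tie_content).digest()
    content_hash = int.from_bytes(digest[:8], "big")  # 64-bit hash of the content
    # Assumption: mix the hash with fresh randomness, then reduce the result
    # into the interval required for a spontaneously originated TIE.
    return (content_hash ^ secrets.randbits(64)) % (MAX_INITIAL_SEQ + 1)
```

Either function yields a value that a restarted node is unlikely to reuse for different content, which is the property the paragraph above relies on.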
6.3.8. Southbound Default Route Origination | 6.3.8. Southbound Default Route Origination | |||
Under certain conditions nodes issue a default route in their South | Under certain conditions, nodes issue a default route in their South | |||
Prefix TIEs with costs as computed in Section 6.8.7.1. | Prefix TIEs with costs as computed in Section 6.8.7.1. | |||
A node X that | A node X that | |||
1. is *not* overloaded *and* | 1. is *not* overloaded *and* | |||
2. has southbound or East-West adjacencies | 2. has southbound or East-West adjacencies | |||
SHOULD originate in its south prefix TIE such a default route if and | ||||
SHOULD originate such a default route in its South Prefix TIE if and | ||||
only if | only if | |||
1. all other nodes at X's' level are overloaded *or* | 1. all other nodes at X's level are overloaded *or* | |||
2. all other nodes at X's' level have NO northbound adjacencies *or* | 2. all other nodes at X's level have NO northbound adjacencies, *or* | |||
3. X has computed reachability to a default route during N-SPF. | 3. X has computed reachability to a default route during N-SPF. | |||
The term "all other nodes at X's' level" describes obviously just the | The term "all other nodes at X's level" obviously describes just the | |||
nodes at the same level in the PoD with a viable lower level | nodes at the same level in the PoD with a viable lower level | |||
(otherwise the Node South TIEs cannot be reflected. The nodes in PoD | (otherwise, the South Node TIEs cannot be reflected; the nodes in PoD | |||
1 and PoD 2 are "invisible" to each other). | 1 and PoD 2 are "invisible" to each other). | |||
A node originating a southbound default route SHOULD install a | A node originating a southbound default route SHOULD install a | |||
default discard route if it did not compute a default route during | default discard route if it did not compute a default route during | |||
N-SPF. This basically means that the top of the fabric will drop | N-SPF. This basically means that the top of the fabric will drop | |||
traffic for unreachable addresses. | traffic for unreachable addresses. | |||
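The origination conditions of Section 6.3.8 can be read as a single predicate. The sketch below is an illustration only. `PeerState` and both function names are hypothetical. It assumes the state of "all other nodes at X's level" has already been extracted from reflected South Node TIEs, and that an empty peer list satisfies the "all other nodes" conditions vacuously.

```python
from dataclasses import dataclass


@dataclass
class PeerState:
    """Hypothetical summary of another node at X's level."""
    overloaded: bool
    has_northbound_adjacencies: bool


def should_originate_default(
    overloaded: bool,
    has_south_or_ew_adjacencies: bool,
    peers: list[PeerState],
    default_reachable_in_nspf: bool,
) -> bool:
    """Decide whether node X originates a southbound default route."""
    # Preconditions: X is not overloaded and has southbound or
    # East-West adjacencies.
    if overloaded or not has_south_or_ew_adjacencies:
        return False
    all_peers_overloaded = all(p.overloaded for p in peers)
    no_peer_has_north = all(not p.has_northbound_adjacencies for p in peers)
    return all_peers_overloaded or no_peer_has_north or default_reachable_in_nspf


def needs_discard_route(originates_default: bool,
                        default_reachable_in_nspf: bool) -> bool:
    """A default discard route is installed when X originates the default
    route without having computed one itself during N-SPF."""
    return originates_default and not default_reachable_in_nspf
```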
6.3.9. Northbound TIE Flooding Reduction | 6.3.9. Northbound TIE Flooding Reduction | |||
RIFT chooses only a subset of northbound nodes to propagate flooding | RIFT chooses only a subset of northbound nodes to propagate flooding | |||
and with that both balances it (to prevent 'hot' flooding links) | and, with that, both balances it (to prevent "hot" flooding links) | |||
across the fabric as well as reduces its volume. The solution is | across the fabric as well as reduces its volume. The solution is | |||
based on several principles: | based on several principles: | |||
1. a node MUST flood self-originated North TIEs to all the reachable | 1. a node MUST flood self-originated North TIEs to all the reachable | |||
nodes at the level above which is called the node's "parents"; | nodes at the level above, which is called the node's "parents"; | |||
2. it is typically not necessary that all parents reflood the North | 2. it is typically not necessary that all parents reflood the North | |||
TIEs to achieve a complete flooding of all the reachable nodes | TIEs to achieve a complete flooding of all the reachable nodes | |||
two levels above which we call the node's "grandparents"; | two levels above, which we call the node's "grandparents"; | |||
3. to control the volume of its flooding two hops North and yet keep | 3. to control the volume of its flooding two hops north and yet keep | |||
it robust enough, it is advantageous for a node to select a | it robust enough, it is advantageous for a node to select a | |||
subset of its parents as "Flood Repeaters" (FRs), which when | subset of its parents as "Flood Repeaters" (FRs), which when | |||
combined, deliver two or more copies of its flooding to all of | combined, deliver two or more copies of its flooding to all of | |||
its parents, i.e. the originating node's grandparents; | its parents, i.e., the originating node's grandparents; | |||
4. nodes at the same level do *not* have to agree on a specific | 4. nodes at the same level do *not* have to agree on a specific | |||
algorithm to select the FRs, but overall load balancing should be | algorithm to select the FRs, but overall load balancing should be | |||
achieved so that different nodes at the same level should tend to | achieved so that different nodes at the same level should tend to | |||
select different parents as FRs (consideration of possible | select different parents as FRs (consideration of possible | |||
strategies in an unrelated but similar field can be found in | strategies in an unrelated but similar field can be found in | |||
[RFC2991]); | [RFC2991]); | |||
5. there are usually many solutions to the problem of finding a set | 5. there are usually many solutions to the problem of finding a set | |||
of FRs for a given node; the problem of finding the minimal set | of FRs for a given node; the problem of finding the minimal set | |||
is (similar to) a NP-Complete problem and a globally optimal set | is (similar to) an NP-Complete problem, and a globally optimal | |||
may not be the minimal one if load-balancing with other nodes is | set may not be the minimal one if load balancing with other nodes | |||
an important consideration; | is an important consideration; | |||
6. it is expected that there will often exist sets of equivalent | 6. it is expected that sets of equivalent nodes at a level L will | |||
nodes at a level L, defined as having a common set of parents at | often exist, defined as having a common set of parents at L+1. | |||
L+1. Applying this observation at both L and L+1, an algorithm | Applying this observation at both L and L+1, an algorithm may | |||
may attempt to split the larger problem into a sum of smaller | attempt to split the larger problem into a sum of smaller, separate | |||
separate problems; | problems; and | |||
7. it is expected that there will be from time to time a broken link | 7. it is expected that there will be a broken link between a parent | |||
between a parent and a grandparent, and in that case the parent | and a grandparent from time to time, and in that case, the parent | |||
is probably a poor FR due to its lower reliability. An algorithm | is probably a poor FR due to its lower reliability. An algorithm | |||
may attempt to eliminate parents with broken northbound | may attempt to eliminate parents with broken northbound | |||
adjacencies first in order to reduce the number of FRs. Albeit | adjacencies first in order to reduce the number of FRs. Albeit | |||
it could be argued that relying on higher fanout FRs will slow | it could be argued that relying on higher fanout FRs will slow | |||
flooding due to higher replication, load reliability of FR's | flooding due to higher replication, load reliability of FR's | |||
links is likely a more pressing concern. | links is likely a more pressing concern. | |||
In a fully connected Clos Network, this means that a node selects one | In a fully connected Clos network, this means that a node selects one | |||
arbitrary parent as FR and then a second one for redundancy. The | arbitrary parent as the FR and then a second one for redundancy. The | |||
computation can be relatively simple and completely distributed | computation can be relatively simple and completely distributed | |||
without any need for synchronization among nodes. In a "PoD" | without any need for synchronization among nodes. In a "PoD" | |||
structure, where the Level L+2 is partitioned into silos of | structure, where the level L+2 is partitioned into silos of | |||
equivalent grandparents that are only reachable from respective | equivalent grandparents that are only reachable from respective | |||
parents, this means treating each silo as a fully connected Clos | parents, this means treating each silo as a fully connected Clos | |||
Network and solving the problem within the silo. | network and solving the problem within the silo. | |||
In terms of signaling, a node has enough information to select its | In terms of signaling, a node has enough information to select its | |||
set of FRs; this information is derived from the node's parents' Node | set of FRs; this information is derived from the node's parents' | |||
South TIEs, which indicate the parent's reachable northbound | South Node TIEs, which indicate the parent's reachable northbound | |||
adjacencies to its own parents (the node's grandparents). A node may | adjacencies to its own parents (the node's grandparents). A node may | |||
send a LIE to a northbound neighbor with the optional boolean field | send a LIE to a northbound neighbor with the optional boolean field | |||
_you_are_flood_repeater_ set to false, to indicate that the | _you_are_flood_repeater_ set to false to indicate that the northbound | |||
northbound neighbor is not a flood repeater for the node that sent | neighbor is not a flood repeater for the node that sent the LIE. In | |||
the LIE. In that case the northbound neighbor SHOULD NOT reflood | that case, the northbound neighbor SHOULD NOT reflood northbound TIEs | |||
northbound TIEs received from the node that sent the LIE. If the | received from the node that sent the LIE. If | |||
_you_are_flood_repeater_ is absent or if _you_are_flood_repeater_ is | _you_are_flood_repeater_ is absent or _you_are_flood_repeater_ is set | |||
set to true, then the northbound neighbor is a flood repeater for the | to true, then the northbound neighbor is a flood repeater for the | |||
node that sent the LIE and MUST reflood northbound TIEs received from | node that sent the LIE and MUST reflood northbound TIEs received from | |||
that node. The element _you_are_flood_repeater_ MUST be ignored if | that node. The element _you_are_flood_repeater_ MUST be ignored if | |||
received from a northbound adjacency. | received from a northbound adjacency. | |||
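The handling of _you_are_flood_repeater_ on reception can be summarized in a few lines. The function below is a hypothetical helper, not part of the specification. It models an absent field as `None` and returns `None` when the element has to be ignored.

```python
from typing import Optional


def is_flood_repeater_for(you_are_flood_repeater: Optional[bool],
                          received_from_northbound: bool) -> Optional[bool]:
    """Return True if the receiver is a flood repeater for the LIE sender,
    False if it is not, and None if the element is to be ignored."""
    if received_from_northbound:
        # The element is ignored when it arrives from a northbound adjacency.
        return None
    # An absent field (None) and an explicit True both grant the role;
    # only an explicit False revokes it.
    return you_are_flood_repeater is not False
```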
This specification provides a simple default algorithm that SHOULD be | This specification provides a simple default algorithm that SHOULD be | |||
implemented and used by default on every RIFT node. | implemented and used by default on every RIFT node. | |||
* let |NA(Node) be the set of Northbound adjacencies of node Node | * let |NA(Node) be the set of northbound adjacencies of node Node | |||
and CN(Node) be the cardinality of |NA(Node); | and CN(Node) be the cardinality of |NA(Node); | |||
* let |SA(Node) be the set of Southbound adjacencies of node Node | * let |SA(Node) be the set of southbound adjacencies of node Node | |||
and CS(Node) be the cardinality of |SA(Node); | and CS(Node) be the cardinality of |SA(Node); | |||
* let |P(Node) be the set of node Node's parents; | * let |P(Node) be the set of node Node's parents; | |||
* let |G(Node) be the set of node Node's grandparents. Observe | * let |G(Node) be the set of node Node's grandparents. Observe | |||
that |G(Node) = |P(|P(Node)); | that |G(Node) = |P(|P(Node)); | |||
* let N be the child node at level L computing a set of FR; | * let N be the child node at level L computing a set of FRs; | |||
* let P be a node at level L+1 and a parent node of N, i.e. bi- | * let P be a node at level L+1 and a parent node of N, i.e., | |||
directionally reachable over adjacency ADJ(N, P); | bidirectionally reachable over adjacency ADJ(N, P); | |||
* let G be a grandparent node of N, reachable transitively via a | * let G be a grandparent node of N, reachable transitively via a | |||
parent P over adjacencies ADJ(N, P) and ADJ(P, G). Observe that N | parent P over adjacencies ADJ(N, P) and ADJ(P, G). Observe that N | |||
does not have enough information to check bidirectional | does not have enough information to check bidirectional | |||
reachability of ADJ(P, G); | reachability of ADJ(P, G); | |||
* let R be a redundancy constant integer; a value of 2 or higher for | * let R be a redundancy constant integer; a value of 2 or higher for | |||
R is RECOMMENDED; | R is RECOMMENDED; | |||
* let S be a similarity constant integer; a value in range 0 .. 2 | * let S be a similarity constant integer; a value in range 0 .. 2 | |||
for S is RECOMMENDED, the value of 1 SHOULD be used. Two | for S is RECOMMENDED, and the value of 1 SHOULD be used. Two | |||
cardinalities are considered as equivalent if their absolute | cardinalities are considered as equivalent if their absolute | |||
difference is less than or equal to S, i.e. |a-b|<=S. | difference is less than or equal to S, i.e., |a-b|<=S | |||
* let RND be a 64-bit random number (for example [RFC4086]) | * let RND be a 64-bit random number (for example, as described in | |||
generated by the system once on startup. | [RFC4086]) generated by the system once on startup. | |||
The algorithm consists of the following steps: | The algorithm consists of the following steps: | |||
1. Derive a 64-bits number by XOR'ing 'N's System ID with RND. | 1. Derive a 64-bit number by XORing N's System ID with RND. | |||
2. Derive a 16-bits pseudo-random unsigned integer PR(N) from the | 2. Derive a 16-bit pseudo-random unsigned integer PR(N) from the | |||
resulting 64-bits number by splitting it in 16-bits-long words | resulting 64-bit number by splitting it into 16-bit-long words | |||
W1, W2, W3, W4 (where W1 are the least significant 16 bits of the | W1, W2, W3, W4 (where W1 are the least significant 16 bits of the | |||
64-bits number, and W4 are the most significant 16 bits) and then | 64-bit number, and W4 are the most significant 16 bits) and then | |||
XOR'ing the circularly shifted resulting words together: | XORing the circularly shifted resulting words together: | |||
A. (W1<<1) xor (W2<<2) xor (W3<<3) xor (W4<<4); | A. (W1<<1) xor (W2<<2) xor (W3<<3) xor (W4<<4); | |||
where << is the circular shift operator. | where << is the circular shift operator. | |||
3. Sort the parents by decreasing number of northbound adjacencies | 3. Sort the parents by decreasing number of northbound adjacencies | |||
(using decreasing System ID of the parent as tie-breaker): | (using decreasing System ID of the parent as a tie-breaker): | |||
sort |P(N) by decreasing CN(P), for all P in |P(N), as ordered | sort |P(N) by decreasing CN(P), for all P in |P(N), as the | |||
array |A(N) | ordered array |A(N) | |||
4. Partition |A(N) in subarrays |A_k(N) of parents with equivalent | 4. Partition |A(N) in subarrays |A_k(N) of parents with equivalent | |||
cardinality of northbound adjacencies (in other words with | cardinality of northbound adjacencies (in other words, with | |||
equivalent number of grandparents they can reach): | equivalent number of grandparents they can reach): | |||
A. set k=0; // k is the ID of the subarrray | a. set k=0; // k is the ID of the subarray | |||
B. set i=0; | b. set i=0; | |||
C. while i < CN(N) do | c. while i < CN(N) do | |||
i) set j=i; | i. set j=i; | |||
ii) while i < CN(N) and CN(|A(N)[j]) - CN(|A(N)[i]) <= S | ii. while i < CN(N) and CN(|A(N)[j]) - CN(|A(N)[i]) <= S: | |||
a. place |A(N)[i] in |A_k(N) // abstract action, maybe | 1. place |A(N)[i] in |A_k(N) // abstract action, maybe | |||
noop | noop | |||
b. set i=i+1; | 2. set i=i+1; | |||
iii) /* At this point j is the index in |A(N) of the first | iii. /* At this point, j is the index in |A(N) of the first | |||
member of |A_k(N) and (i-j) is C_k(N) defined as the | member of |A_k(N) and (i-j) is C_k(N) defined as the | |||
cardinality of |A_k(N) */ | cardinality of |A_k(N). */ | |||
set k=k+1; | set k=k+1; | |||
/* At this point k is the total number of subarrays, initialized | /* At this point, k is the total number of subarrays, initialized | |||
for the shuffling operation below */ | for the shuffling operation below. */ | |||
5. shuffle individually each subarray |A_k(N) of cardinality C_k(N) | 5. Shuffle each subarray |A_k(N) of cardinality C_k(N) within |A(N) | |||
within |A(N) using the Durstenfeld variation of Fisher-Yates | individually using the Durstenfeld variation of the Fisher-Yates | |||
algorithm that depends on N's System ID: | algorithm that depends on N's System ID: | |||
A. while k > 0 do | a. while k > 0 do | |||
i) for i from C_k(N)-1 to 1 decrementing by 1 do | i. for i from C_k(N)-1 to 1 decrementing by 1 do | |||
a. set j to PR(N) modulo i; | 1. set j to PR(N) modulo i; | |||
b. exchange |A_k[j] and |A_k[i]; | 2. exchange |A_k[j] and |A_k[i]; | |||
ii) set k=k-1; | ii. set k=k-1; | |||
6. For each grandparent G, initialize a counter c(G) with the number | 6. For each grandparent G, initialize a counter c(G) with the number | |||
of its south-bound adjacencies to elected flood repeaters (which | of its southbound adjacencies to elected flood repeaters (which | |||
is initially zero): | is initially zero): | |||
A. for each G in |G(N) set c(G) = 0; | a. for each G in |G(N), set c(G) = 0; | |||
7. Finally keep as FRs only parents that are needed to maintain the | 7. Finally, keep as FRs only the parents that are needed to maintain the | |||
number of adjacencies between the FRs and any grandparent G equal | number of adjacencies between the FRs and any grandparent G equal | |||
to or above the redundancy constant R: | to or above the redundancy constant R: | |||
A. for each P in reshuffled |A(N); | a. for each P in reshuffled |A(N): | |||
i) if there exists an adjacency ADJ(P, G) in |NA(P) such | i. if there exists an adjacency ADJ(P, G) in |NA(P) such | |||
that c(G) < R then | that c(G) < R, then | |||
a. place P in FR set; | 1. place P in FR set; | |||
b. for all adjacencies ADJ(P, G') in |NA(P) increment | 2. for all adjacencies ADJ(P, G') in |NA(P) increment | |||
c(G') | c(G') | |||
B. If any c(G) is still < R, it was not possible to elect a set | 8. If any c(G) is still < R, it was not possible to elect a set of | |||
of FRs that covers all grandparents with redundancy R | FRs that covers all grandparents with redundancy R | |||
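The default algorithm above can be sketched in Python as follows. This is a non-normative illustration. The names `rotl16`, `pseudo_random`, and `elect_flood_repeaters` are hypothetical. It assumes that N has already learned, from its parents' South Node TIEs, which grandparents each parent reaches, and that each parent has at most one adjacency per grandparent. The shuffle mirrors the text as written (`j = PR(N) modulo i`) and does not substitute a textbook Fisher-Yates index.

```python
def rotl16(word: int, shift: int) -> int:
    """Circularly shift a 16-bit word left by `shift` bits (0 < shift < 16)."""
    word &= 0xFFFF
    return ((word << shift) | (word >> (16 - shift))) & 0xFFFF


def pseudo_random(system_id: int, rnd: int) -> int:
    """Steps 1-2: fold (System ID XOR RND) into the 16-bit value PR(N)."""
    x = (system_id ^ rnd) & 0xFFFFFFFFFFFFFFFF
    # words[0] is W1 (least significant 16 bits), words[3] is W4.
    words = [(x >> (16 * n)) & 0xFFFF for n in range(4)]
    pr = 0
    for n, w in enumerate(words, start=1):
        pr ^= rotl16(w, n)  # (W1<<1) xor (W2<<2) xor (W3<<3) xor (W4<<4)
    return pr


def elect_flood_repeaters(system_id: int, rnd: int,
                          parents: dict[int, set[int]],
                          r: int = 2, s: int = 1) -> tuple[list[int], bool]:
    """`parents` maps each parent's System ID to the set of grandparent
    System IDs it reaches. Returns (FR list, full-coverage flag)."""
    pr = pseudo_random(system_id, rnd)

    # Step 3: sort by decreasing CN(P), tie-break on decreasing System ID.
    a = sorted(parents, key=lambda p: (len(parents[p]), p), reverse=True)

    # Step 4: partition into subarrays whose cardinality differs from the
    # first member of the subarray by at most S.
    groups, i = [], 0
    while i < len(a):
        j = i
        while i < len(a) and len(parents[a[j]]) - len(parents[a[i]]) <= s:
            i += 1
        groups.append(a[j:i])

    # Step 5: shuffle each subarray using the deterministic value PR(N).
    for g in groups:
        for i in range(len(g) - 1, 0, -1):
            j = pr % i
            g[i], g[j] = g[j], g[i]

    # Step 6: per-grandparent counters start at zero.
    count = {gp: 0 for gps in parents.values() for gp in gps}

    # Step 7: keep a parent only if some grandparent it reaches is still
    # covered by fewer than R flood repeaters.
    frs = []
    for p in (p for g in groups for p in g):
        if any(count[gp] < r for gp in parents[p]):
            frs.append(p)
            for gp in parents[p]:
                count[gp] += 1

    # Final check: was redundancy R achieved for every grandparent?
    return frs, all(c >= r for c in count.values())
```

In a fully connected Clos network, for example four parents that all reach the same two grandparents, the sketch selects exactly two parents with R = 2, matching the description given earlier in this section.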
Additional rules for flooding reduction: | Additional rules for flooding reduction: | |||
1. The algorithm MUST be re-evaluated by a node on every change of | 1. The algorithm MUST be re-evaluated by a node on every change of | |||
local adjacencies or reception of a parent South TIE with changed | local adjacencies or reception of a parent South TIE with changed | |||
adjacencies. A node MAY apply a hysteresis to prevent excessive | adjacencies. A node MAY apply a hysteresis to prevent an | |||
amount of computation during periods of network instability just | excessive amount of computation during periods of network | |||
like in the case of reachability computation. | instability just like in the case of reachability computation. | |||
2. Upon a change of the flood repeater set, a node SHOULD send out | 2. Upon a change of the flood repeater set, a node SHOULD send out | |||
LIEs that grant flood repeater status to newly promoted nodes | LIEs that grant flood repeater status to newly promoted nodes | |||
before it sends LIEs that revoke the status to the nodes that | before it sends LIEs that revoke the status to the nodes that | |||
have been newly demoted. This is done to prevent transient | have been newly demoted. This is done to prevent transient | |||
behavior where the full coverage of grandparents is not | behavior where the full coverage of grandparents is not | |||
guaranteed. Such a condition is sometimes unavoidable in case of | guaranteed. Such a condition is sometimes unavoidable in case of | |||
lost LIEs but it will correct itself, though at the cost of a possible transient | lost LIEs, but it will correct itself at the cost of a possible transient | |||
reduction in flooding propagation speeds. The election can use | reduction in flooding propagation speeds. The election can use | |||
the LIE FSM _FloodLeadersChanged_ event to notify LIE FSMs of | the LIE FSM _FloodLeadersChanged_ event to notify LIE FSMs of the | |||
necessity to update the sent LIEs. | necessity to update the sent LIEs. | |||
3. A node MUST always flood its self-originated TIEs to all its | 3. A node MUST always flood its self-originated TIEs to all its | |||
neighbors. | neighbors. | |||
4. A node receiving a TIE originated by a node for which it is not a | 4. A node receiving a TIE originated by a node for which it is not a | |||
flood repeater SHOULD NOT reflood such TIEs to its neighbors | flood repeater SHOULD NOT reflood such TIEs to its neighbors, | |||
except for rules in Section 6.3.9, Paragraph 10, Item 6. | except for the rules described in Section 6.3.9, Paragraph 10, | |||
Item 6. | ||||
5. The indication of flood reduction capability MUST be carried in | 5. The indication of flood reduction capability MUST be carried in | |||
the Node TIEs in the _flood_reduction_ element and MAY be used to | the Node TIEs in the _flood_reduction_ element and MAY be used to | |||
optimize the algorithm to account for nodes that will flood | optimize the algorithm to account for nodes that will flood | |||
regardless. | regardless. | |||
6. A node generates TIDEs as usual but when receiving TIREs or TIDEs | 6. A node generates TIDEs as usual, but when receiving TIREs or | |||
resulting in requests for a TIE of which the newest received copy | TIDEs resulting in requests for a TIE of which the newest | |||
came on an adjacency where the node was not flood repeater it | received copy came on an adjacency where the node was not a flood | |||
SHOULD ignore such requests on first and only first request. | repeater, it SHOULD ignore such requests on only the first | |||
Normally, the nodes that received the TIEs as flooding repeaters | request. Normally, the nodes that received the TIEs as flooding | |||
should satisfy the requesting node and with that no further TIREs | repeaters should satisfy the requesting node and, with that, no | |||
for such TIEs will be generated. Otherwise, the next set of | further TIREs for such TIEs will be generated. Otherwise, the | |||
TIDEs and TIREs MUST lead to flooding independent of the flood | next set of TIDEs and TIREs MUST lead to flooding independent of | |||
repeater status. This solves a very difficult incast problem on | the flood repeater status. This solves a very difficult "incast" | |||
nodes restarting with a very wide fanout, especially northbound. | problem on nodes restarting with a very wide fanout, especially | |||
To retrieve the full database they often end up processing many | northbound. To retrieve the full database, they often end up | |||
in-rushing copies whereas this approach load-balances the | processing many inrushing copies, whereas this approach load | |||
incoming database between adjacent nodes and flood repeaters and | balances the incoming database between adjacent nodes and flood | |||
should guarantee that two copies are sent by different nodes to | repeaters and should guarantee that two copies are sent by | |||
ensure against any losses. | different nodes to ensure against any losses. | |||
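The ordering requirement in rule 2 (grant flood repeater status before revoking it) can be illustrated with a small helper. The function name and the `(parent, flag)` tuple representation are assumptions made for this sketch. An actual implementation would drive the LIE FSMs instead of returning a list.

```python
def lie_update_order(old_frs: set[int], new_frs: set[int]) -> list[tuple[int, bool]]:
    """Return (parent System ID, you_are_flood_repeater) pairs in the order
    in which the updated LIEs should be sent."""
    promoted = sorted(new_frs - old_frs)  # newly elected flood repeaters
    demoted = sorted(old_frs - new_frs)   # parents losing the role
    # Grants go out first so that grandparent coverage never drops in the
    # window between the two sets of LIEs.
    return [(p, True) for p in promoted] + [(p, False) for p in demoted]
```

Parents whose status does not change are left out of the returned list because their LIEs need no update.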
6.3.10. Special Considerations | 6.3.10. Special Considerations | |||
First, due to the distributed, asynchronous nature of ZTP, it can | First, due to the distributed, asynchronous nature of ZTP, it can | |||
create temporary convergence anomalies where nodes at higher levels | create temporary convergence anomalies where nodes at higher levels | |||
of the fabric temporarily become lower than where they ultimately | of the fabric temporarily become lower than where they ultimately | |||
belong. Since flooding can begin before ZTP is "finished" and in | belong. Since flooding can begin before ZTP is "finished" and in | |||
fact must do so given there is no global termination criterion for the | fact must do so given there is no global termination criterion for the | |||
unsychronized ZTP algorithm, information may end up temporarily in | unsynchronized ZTP algorithm, information may temporarily end up in | |||
wrong layers. A special clause when changing level takes care of | wrong layers. A special clause when changing level takes care of | |||
that. | that. | |||
More difficult is a condition where a node (e.g. a leaf) floods a TIE | More difficult is a condition where a node (e.g., a leaf) floods a | |||
north towards its grandparent, then its parent reboots, partitioning | TIE north towards its grandparent, then its parent reboots, | |||
the grandparent from leaf directly and then the leaf itself reboots. | partitioning the grandparent from the leaf directly, and then the | |||
That can leave the grandparent holding the "primary copy" of the | leaf itself reboots. That can leave the grandparent holding the | |||
leaf's TIE. Normally this condition is resolved easily by the leaf | "primary copy" of the leaf's TIE. Normally, this condition is | |||
re-originating its TIE with a higher sequence number than it notices | resolved easily by the leaf reoriginating its TIE with a higher | |||
in the northbound TIEs, here however, when the parent comes back it | sequence number than it notices in the northbound TIEs; here however, | |||
won't be able to obtain leaf's North TIE from the grandparent easily | when the parent comes back, it won't be able to obtain the leaf's | |||
and with that the leaf may not issue the TIE with a higher sequence | North TIE from the grandparent easily, and with that, the leaf may | |||
number that can reach the grandparent for a long time. Flooding | not issue the TIE with a higher sequence number that can reach the | |||
procedures are extended to deal with the problem by the means of | grandparent for a long time. Flooding procedures are extended to | |||
special clauses that override the database of a lower level with | deal with the problem by the means of special clauses that override | |||
headers of newer TIEs received in TIDEs coming from the north. Those | the database of a lower level with headers of newer TIEs received in | |||
headers are then propagated southbound towards the leaf to cause it | TIDEs coming from the north. Those headers are then propagated | |||
to originate a higher sequence number of the TIE effectively | southbound towards the leaf to cause it to originate a higher | |||
refreshing it all the way up to ToF. | sequence number of the TIE, effectively refreshing it all the way up | |||
to ToF. | ||||
6.4. Reachability Computation | 6.4. Reachability Computation | |||
A node has three possible sources of relevant information for | A node has three possible sources of relevant information for | |||
reachability computation. A node knows the full topology south of it | reachability computation. A node knows the full topology south of it | |||
from the received North Node TIEs or alternately north of it from the | from the received North Node TIEs or alternately north of it from the | |||
South Node TIEs. A node has the set of prefixes with their | South Node TIEs. A node has the set of prefixes with their | |||
associated distances and bandwidths from corresponding prefix TIEs. | associated distances and bandwidths from corresponding prefix TIEs. | |||
To compute prefix reachability, a node runs conceptually a northbound | To compute prefix reachability, a node conceptually runs a northbound | |||
and a southbound SPF. N-SPF and S-SPF notation denotes here the | and a southbound SPF. Here, N-SPF and S-SPF notation denotes the | |||
direction in which the computation front is progressing. | direction in which the computation front is progressing. | |||
Since neither computation can "loop", it is possible to compute non- | Since neither computation can "loop", it is possible to compute non- | |||
equal-cost or even k-shortest paths [EPPSTEIN] and "saturate" the | equal costs or even k-shortest paths [EPPSTEIN] and "saturate" the | |||
fabric to the extent desired. This specification however uses | fabric to the extent desired. This specification however uses | |||
simple, familiar SPF algorithms and concepts as example due to their | simple, familiar SPF algorithms and concepts as examples due to their | |||
prevalence in today's routing. | prevalence in today's routing. | |||
For reachability computation purposes, RIFT considers all parallel | For reachability computation purposes, RIFT considers all parallel | |||
links between two nodes to be of the same cost advertised in the | links between two nodes to be of the same cost advertised in the | |||
_cost_ element of _NodeNeighborsTIEElement_. In case the neighbor has | _cost_ element of _NodeNeighborsTIEElement_. In case the neighbor has | |||
multiple parallel links at different cost, the largest distance | multiple parallel links at different costs, the largest distance | |||
(highest numerical value) MUST be advertised. Given the range of | (highest numerical value) MUST be advertised. Given the range of | |||
thrift encodings, _infinite_distance_ is defined as the largest non- | Thrift encodings, _infinite_distance_ is defined as the largest non- | |||
negative _MetricType_. Any link with metric larger than that (i.e. | negative _MetricType_. Any link with a metric larger than that (i.e., | |||
negative MetricType) MUST be ignored in computations. Any link with | the negative MetricType) MUST be ignored in computations. Any link | |||
metric set to _invalid_distance_ MUST also be ignored in computation. | with the metric set to _invalid_distance_ MUST also be ignored in | |||
In case of a negatively distributed prefix the metric attribute MUST | computation. In case of a negatively distributed prefix, the metric | |||
be set to _infinite_distance_ by the originator and it MUST be | attribute MUST be set to _infinite_distance_ by the originator, and | |||
ignored by all nodes during computation except for the purpose of | it MUST be ignored by all nodes during computation, except for the | |||
determining transitive propagation and building the corresponding | purpose of determining transitive propagation and building the | |||
routing table. | corresponding routing table. | |||
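The metric rules above can be sketched as follows. This is a minimal illustration only: it assumes that MetricType is a signed 32-bit integer and that _invalid_distance_ has the value 0, and the function names are hypothetical rather than part of the schema.

```python
# Assumed constants: MetricType is taken to be a signed 32-bit integer and
# invalid_distance is taken to be 0. Both are assumptions for illustration.
INFINITE_DISTANCE = 0x7FFFFFFF  # largest non-negative value of an assumed i32
INVALID_DISTANCE = 0            # assumed value of invalid_distance


def advertised_cost(parallel_link_costs):
    """Cost to advertise for a neighbor reached over parallel links.

    All parallel links are treated as having one cost, and the largest
    distance (highest numerical value) is the one advertised.
    """
    return max(parallel_link_costs)


def usable_in_computation(metric):
    """Return True if a link with this metric may be used in computation."""
    if metric < 0:
        # A metric larger than infinite_distance shows up as a negative value.
        return False
    if metric == INVALID_DISTANCE:
        return False
    return True
```

Note that a link advertised with exactly _infinite_distance_ is still accepted by this sketch, since only larger (i.e., negative) and invalid values are required to be ignored.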
A prefix can carry the _directly_attached_ attribute to indicate that | A prefix can carry the _directly_attached_ attribute to indicate that | |||
the prefix is directly attached, i.e., should be routed to even if | the prefix is directly attached, i.e., should be routed to even if | |||
the node is in overload. In case of a negatively distributed prefix | the node is in overload. In case of a negatively distributed prefix, | |||
this attribute MUST NOT be included by the originator and it MUST be | this attribute MUST NOT be included by the originator, and it MUST be | |||
ignored by all nodes during SPF computation. If a prefix is locally | ignored by all nodes during SPF computation. If a prefix is locally | |||
originated the attribute _from_link_ can indicate the interface to | originated, the attribute _from_link_ can indicate the interface to | |||
which the address belongs. In case of a negatively distributed | which the address belongs. In case of a negatively distributed | |||
prefix this attribute MUST NOT be included by the originator and it | prefix, this attribute MUST NOT be included by the originator, and it | |||
MUST be ignored by all nodes during computation. A prefix can also | MUST be ignored by all nodes during computation. A prefix can also | |||
carry the _loopback_ attribute to indicate the said property. | carry the _loopback_ attribute to indicate the said property. | |||
Prefixes are carried in different types of TIEs indicating their | Prefixes are carried in different types of TIEs indicating their | |||
type. For same prefix being included in different TIE types tie- | type. For the same prefix being included in different TIE types, | |||
breaking is performed according to Section 6.8.1. If the same prefix | tie-breaking is performed according to Section 6.8.1. If the same | |||
is included multiple times in multiple TIEs of the same type | prefix is included multiple times in multiple TIEs of the same type | |||
originating at the same node the resulting behavior is unspecified. | originating at the same node, the resulting behavior is unspecified. | |||
6.4.1. Northbound Reachability SPF | 6.4.1. Northbound Reachability SPF | |||
N-SPF MUST use exclusively northbound and East-West adjacencies in | N-SPF MUST use exclusively northbound and East-West adjacencies in | |||
the computing node's node North TIEs (since if the node is a leaf it | the computing node's North Node TIEs (since if the node is a leaf, it | |||
may not have generated a Node South TIE) when starting SPF. Observe | may not have generated a South Node TIE) when starting SPF. Observe | |||
that N-SPF is really just a one hop variety since Node South TIEs are | that N-SPF is really just a one-hop variety since South Node TIEs are | |||
not re-flooded southbound beyond a single level (or East-West) and | not reflooded southbound beyond a single level (or East-West), and | |||
with that the computation cannot progress beyond adjacent nodes. | with that, the computation cannot progress beyond adjacent nodes. | |||
Once progressing, the computation uses the next higher level's Node | Once progressing, the computation uses the next higher level's South | |||
South TIEs to find corresponding adjacencies to verify backlink | Node TIEs to find corresponding adjacencies to verify backlink | |||
connectivity. Two unidirectional links MUST be associated to confirm | connectivity. Two unidirectional links MUST be associated to confirm | |||
bidirectional connectivity, a process often known as `backlink | bidirectional connectivity, a process often known as "backlink | |||
check`. As part of the check, both Node TIEs MUST contain the correct | check". As part of the check, both Node TIEs MUST contain the | |||
System IDs *and* expected levels. | correct System IDs *and* expected levels. | |||
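The backlink check described above can be illustrated with a small sketch. The NodeTie class is a hypothetical, heavily simplified stand-in for a Node TIE and is not the schema's data model. It only records the originator's System ID, its level, and the level it expects each neighbor to have.

```python
from dataclasses import dataclass, field


@dataclass
class NodeTie:
    """Hypothetical, simplified view of a Node TIE (not the real schema)."""
    system_id: int
    level: int
    # neighbor System ID -> level the originator expects that neighbor to have
    neighbors: dict = field(default_factory=dict)


def backlink_ok(a: NodeTie, b: NodeTie) -> bool:
    """True if both unidirectional links exist and the levels match.

    Each side must list the other's System ID, and the level it expects
    must equal the level the other side actually advertises.
    """
    return (
        a.neighbors.get(b.system_id) == b.level
        and b.neighbors.get(a.system_id) == a.level
    )
```

For example, a leaf at level 0 and a spine at level 1 pass the check only if each lists the other with the correct level. A spine that does not list the leaf, or that advertises a different level than the leaf expects, fails it.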
The default route found when crossing an E-W link SHOULD be used if | The default route found when crossing an E-W link SHOULD be used if | |||
and only if | and only if: | |||
1. the node itself does *not* have any northbound adjacencies *and* | 1. the node itself does *not* have any northbound adjacencies *and* | |||
2. the adjacent node has one or more northbound adjacencies | 2. the adjacent node has one or more northbound adjacencies | |||
This rule forms a "one-hop default route split-horizon" and prevents | This rule forms a "one-hop default route split-horizon" and prevents | |||
looping over default routes while allowing for "one-hop protection" | looping over default routes while allowing for "one-hop protection" | |||
of nodes that lost all northbound adjacencies except at the ToF where | of nodes that lost all northbound adjacencies, except at the ToF | |||
the links are used exclusively to flood topology information in | where the links are used exclusively to flood topology information in | |||
multi-plane designs. | multi-plane designs. | |||
Other south prefixes found when crossing E-W link MAY be used if and | Other south prefixes found when crossing E-W links MAY be used if and | |||
only if | only if | |||
1. no north neighbors are advertising same or a supersuming non- | 1. no north neighbors are advertising the same or a supersuming non- | |||
default prefix *and* | default prefix *and* | |||
2. the node does not originate a non-default supersuming prefix | 2. the node does not originate a non-default supersuming prefix | |||
itself. | itself. | |||
I.e., the E-W link can be used as a gateway of last resort for a | That is, the E-W link can be used as a gateway of last resort for a | |||
specific prefix only. Using south prefixes across E-W link can be | specific prefix only. Using south prefixes across an E-W link can be | |||
beneficial e.g., on automatic disaggregation in pathological fabric | beneficial, e.g., on automatic disaggregation in pathological fabric | |||
partitioning scenarios. | partitioning scenarios. | |||
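The two E-W usage rules above can be written as predicates. This is a sketch under stated assumptions: the function names and argument shapes are hypothetical, prefixes are given as strings, and "supersuming" is read as "strictly covering".

```python
import ipaddress


def may_use_ew_default(own_north_adjacencies, neighbor_north_adjacencies):
    """Default route over an E-W link: the node has no northbound
    adjacencies and the adjacent node has at least one."""
    return own_north_adjacencies == 0 and neighbor_north_adjacencies >= 1


def _is_default(prefix):
    return ipaddress.ip_network(prefix).prefixlen == 0


def _covers(candidate, prefix, strict=False):
    """True if candidate is the same as, or supersumes, prefix."""
    c = ipaddress.ip_network(candidate)
    p = ipaddress.ip_network(prefix)
    if c.version != p.version:
        return False
    if strict and c == p:
        return False
    return p.subnet_of(c)


def may_use_ew_prefix(prefix, north_advertised, self_originated):
    """Other south prefixes over an E-W link: allowed only if no north
    neighbor advertises the same or a supersuming non-default prefix and
    the node does not originate a non-default supersuming prefix itself."""
    covered_north = any(
        not _is_default(n) and _covers(n, prefix) for n in north_advertised
    )
    covered_self = any(
        not _is_default(s) and _covers(s, prefix, strict=True)
        for s in self_originated
    )
    return not covered_north and not covered_self
```

With these definitions, a node that only hears a default route from the north may use 10.1.1.0/24 learned across an E-W link, while a node that hears 10.1.0.0/16 from the north may not.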
A detailed example can be found in Appendix B.4. | A detailed example can be found in Appendix B.4. | |||
6.4.2. Southbound Reachability SPF | 6.4.2. Southbound Reachability SPF | |||
S-SPF MUST use the southbound adjacencies in the Node South TIEs | S-SPF MUST use the southbound adjacencies in the South Node TIEs | |||
exclusively, i.e. progresses towards nodes at lower levels. Observe | exclusively, i.e., progresses towards nodes at lower levels. Observe | |||
that E-W adjacencies are NEVER used in this computation. This | that E-W adjacencies are NEVER used in this computation. This | |||
enforces the requirement that a packet traversing in a southbound | enforces the requirement that a packet traversing in a southbound | |||
direction must never change its direction. | direction must never change its direction. | |||
S-SPF MUST use northbound adjacencies in node North TIEs to verify | S-SPF MUST use northbound adjacencies in North Node TIEs to verify | |||
backlink connectivity by checking for presence of the link beside | backlink connectivity by checking for the presence of the link beside | |||
correct System ID and level. | the correct System ID and level. | |||
6.4.3. East-West Forwarding Within a non-ToF Level | 6.4.3. East-West Forwarding Within a Non-ToF Level | |||
Using south prefixes over horizontal links MAY occur if the N-SPF | Using south prefixes over horizontal links MAY occur if the N-SPF | |||
includes East-West adjacencies in computation. It can protect | includes East-West adjacencies in computation. It can protect | |||
against pathological fabric partitioning cases that leave only paths | against pathological fabric partitioning cases that leave only paths | |||
to destinations that would necessitate multiple changes of forwarding | to destinations that would necessitate multiple changes of forwarding | |||
direction between north and south. | direction between north and south. | |||
6.4.4. East-West Links Within ToF Level | 6.4.4. East-West Links Within a ToF Level | |||
E-W ToF links behave in terms of flooding scopes defined in | E-W ToF links behave in terms of flooding scopes defined in | |||
Section 6.3.4 like northbound links and MUST be used exclusively for | Section 6.3.4 like northbound links and MUST be used exclusively for | |||
control plane information flooding. Even though a ToF node could be | control plane information flooding. Even though a ToF node could be | |||
tempted to use those links during southbound SPF and carry traffic | tempted to use those links during southbound SPF and carry traffic | |||
over them this MUST NOT be attempted since it may, in anycast cases, | over them, this MUST NOT be attempted since it may, in anycast cases, | |||
lead to routing loops. An implementation MAY try to resolve the | lead to routing loops. An implementation MAY try to resolve the | |||
looping problem by following on the ring strictly tie-broken | looping problem by following on the ring strictly tie-broken | |||
shortest-paths only but the details are outside this specification. | shortest-paths only, but the details are outside this specification. | |||
And even then, the problem of proper capacity provisioning of such | And even then, the problem of proper capacity provisioning of such | |||
links when they become traffic-bearing in case of failures is vexing | links when they become traffic-bearing in case of failures is vexing, | |||
and when used for forwarding purposes, they defeat statistical non- | and when used for forwarding purposes, they defeat statistical non- | |||
blocking guarantees that Clos is providing normally. | blocking guarantees that Clos is providing normally. | |||
6.5. Automatic Disaggregation on Link & Node Failures | 6.5. Automatic Disaggregation on Link & Node Failures | |||
6.5.1. Positive, Non-transitive Disaggregation | 6.5.1. Positive, Non-Transitive Disaggregation | |||
Under normal circumstances, a node's South TIEs contain just the | Under normal circumstances, a node's South TIEs contain just the | |||
adjacencies and a default route. However, if a node detects that its | adjacencies and a default route. However, if a node detects that its | |||
default IP prefix covers one or more prefixes that are reachable | default IP prefix covers one or more prefixes that are reachable | |||
through it but not through one or more other nodes at the same level, | through it but not through one or more other nodes at the same level, | |||
then it MUST explicitly advertise those prefixes in a South TIE. | then it MUST explicitly advertise those prefixes in a South TIE. | |||
Otherwise, some percentage of the northbound traffic for those | Otherwise, some percentage of the northbound traffic for those | |||
prefixes would be sent to nodes without corresponding reachability, | prefixes would be sent to nodes without corresponding reachability, | |||
causing it to be dropped. Even when traffic is not being dropped, | causing it to be dropped. Even when traffic is not being dropped, | |||
the resulting forwarding could 'backhaul' packets through the higher | the resulting forwarding could "backhaul" packets through the higher- | |||
level spines, clearly an undesirable condition affecting the blocking | level spines, clearly an undesirable condition affecting the blocking | |||
probabilities of the fabric. | probabilities of the fabric. | |||
This specification refers to the process of advertising additional | This specification refers to the process of advertising additional | |||
prefixes southbound as 'positive disaggregation'. Such | prefixes southbound as "positive disaggregation". Such | |||
disaggregation is non-transitive, i.e., its effects are always | disaggregation is non-transitive, i.e., its effects are always | |||
constrained to a single level of the fabric. Naturally, multiple | constrained to a single level of the fabric. Naturally, multiple | |||
node or link failures can lead to several independent instances of | node or link failures can lead to several independent instances of | |||
positive disaggregation necessary to prevent looping or bow-tying the | positive disaggregation necessary to prevent looping or bow-tying the | |||
fabric. | fabric. | |||
A node determines the set of prefixes needing disaggregation using | A node determines the set of prefixes needing disaggregation using | |||
the following steps: | the following steps: | |||
1. A DAG computation in the southern direction is performed first. | 1. A DAG computation in the southern direction is performed first. | |||
The North TIEs are used to find all of the prefixes it can reach | The North TIEs are used to find all of the prefixes it can reach | |||
and the set of next-hops in the lower level for each of them. | and the set of next hops in the lower level for each of them. | |||
Such a computation can be easily performed on a Fat Tree by | Such a computation can be easily performed on a fat tree by | |||
setting all link costs in the southern direction to 1 and all | setting all link costs in the southern direction to 1 and all | |||
northern directions to infinity. We term set of those | northern directions to infinity. The set of those prefixes is | |||
prefixes |R, and for each prefix, r, in |R, its set of next-hops | referred to as |R; for each prefix r in |R, its set of next hops | |||
is defined to be |H(r). | is referred to as |H(r). | |||
2. The node uses reflected South TIEs to find all nodes at the same | 2. The node uses reflected South TIEs to find all nodes at the same | |||
level in the same PoD and the set of southbound adjacencies for | level in the same PoD and the set of southbound adjacencies for | |||
each. The set of nodes at the same level is termed |N and for | each. The set of nodes at the same level is termed |N, and for | |||
each node, n, in |N, its set of southbound adjacencies is defined | each node, n, in |N, its set of southbound adjacencies is defined | |||
to be |A(n). | to be |A(n). | |||
3. For a given r, if the intersection of |H(r) and |A(n), for any n, | 3. For a given r, if the intersection of |H(r) and |A(n), for any n, | |||
is empty then that prefix r must be explicitly advertised by the | is empty, then that prefix r must be explicitly advertised by the | |||
node in a South TIE. | node in a South TIE. | |||
4. Identical set of disaggregated prefixes is flooded on each of the | 4. An identical set of disaggregated prefixes is flooded on each of | |||
node's southbound adjacencies. In accordance with the normal | the node's southbound adjacencies. In accordance with the normal | |||
flooding rules for a South TIE, a node at the lower level that | flooding rules for a South TIE, a node at the lower level that | |||
receives this South TIE SHOULD NOT propagate it south-bound or | receives this South TIE SHOULD NOT propagate it southbound or | |||
reflect the disaggregated prefixes back over its adjacencies to | reflect the disaggregated prefixes back over its adjacencies to | |||
nodes at the level from which it was received. | nodes at the level from which it was received. | |||
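Steps 1 to 3 above reduce to a set intersection test. The sketch below assumes that |H(r) and |A(n) have already been computed and are passed in as plain dictionaries of sets. The function and parameter names are hypothetical.

```python
def prefixes_to_disaggregate(next_hops_by_prefix, south_adjacencies_by_node):
    """Return the prefixes r for which some node n has |H(r) & |A(n) empty.

    next_hops_by_prefix:       r -> |H(r), the node's own next hops for r
    south_adjacencies_by_node: n -> |A(n), for each node n in |N
    """
    result = set()
    for prefix, next_hops in next_hops_by_prefix.items():
        for adjacencies in south_adjacencies_by_node.values():
            if not (next_hops & adjacencies):
                # At least one same-level node cannot reach any next hop
                # for this prefix, so it must be advertised explicitly.
                result.add(prefix)
                break
    return result
```

As an example, if the node reaches prefix p1 only via leaf1 and a same-level node has only leaf2 as a southbound adjacency, p1 is selected. A prefix reachable via both leaf1 and leaf2 is not.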
To summarize the above in simplest terms: if a node detects that its | To summarize the above in simplest terms: If a node detects that its | |||
default route encompasses prefixes for which one of the other nodes | default route encompasses prefixes for which one of the other nodes | |||
in its level has no possible next-hops in the level below, it has to | in its level has no possible next hops in the level below, it has to | |||
disaggregate it to prevent traffic loss or suboptimal routing through | disaggregate it to prevent traffic loss or suboptimal routing through | |||
such nodes. Hence, a node X needs to determine if it can reach a | such nodes. Hence, a node X needs to determine if it can reach a | |||
different set of south neighbors than other nodes at the same level, | different set of south neighbors than other nodes at the same level, | |||
which are connected to it via at least one common south neighbor. If | which are connected to it via at least one common south neighbor. If | |||
it can, then prefix disaggregation may be required. If it can't, | it can, then prefix disaggregation may be required. If it can't, | |||
then no prefix disaggregation is needed. An example of | then no prefix disaggregation is needed. An example of | |||
disaggregation is provided in Appendix B.3. | disaggregation is provided in Appendix B.3. | |||
Finally, a possible algorithm is described here: | Finally, a possible algorithm is described here: | |||
1. Create partial_neighbors = (empty), a set of neighbors with | 1. Create partial_neighbors = (empty), a set of neighbors with | |||
partial connectivity to the node X's level from X's perspective. | partial connectivity to the node X's level from X's perspective. | |||
Each entry in the set is a south neighbor of X and a list of | Each entry in the set is a south neighbor of X and a list of | |||
nodes of X.level that can't reach that neighbor. | nodes of X.level that can't reach that neighbor. | |||
2. A node X determines its set of southbound neighbors | 2. A node X determines its set of southbound neighbors | |||
X.south_neighbors. | X.south_neighbors. | |||
3. For each South TIE originated from a node Y that X has which is | 3. For each South TIE originated from a node Y that X has, which is | |||
at X.level, if Y.south_neighbors is not the same as | at X.level, if Y.south_neighbors is not the same as | |||
X.south_neighbors but the nodes share at least one southern | X.south_neighbors but the nodes share at least one southern | |||
neighbor, for each neighbor N in X.south_neighbors but not in | neighbor, for each neighbor N in X.south_neighbors but not in | |||
Y.south_neighbors, add (N, (Y)) to partial_neighbors if N isn't | Y.south_neighbors, add (N, (Y)) to partial_neighbors if N isn't | |||
there or add Y to the list for N. | there or add Y to the list for N. | |||
4. If partial_neighbors is empty, then node X does not disaggregate | 4. If partial_neighbors is empty, then node X does not disaggregate | |||
any prefixes. If node X is advertising disaggregated prefixes in | any prefixes. If node X is advertising disaggregated prefixes in | |||
its South TIE, X SHOULD remove them and re-advertise its South | its South TIE, X SHOULD remove them and re-advertise its South | |||
TIEs. | TIEs. | |||
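Steps 1 to 3 of this algorithm can be sketched as below. The input shapes are assumptions made for illustration: X's south neighbors as a set, and a dictionary mapping each same-level node Y (learned from its South TIE) to Y's set of south neighbors.

```python
def build_partial_neighbors(x_south_neighbors, south_neighbors_by_peer):
    """Map each south neighbor N of X to the same-level nodes that
    cannot reach N.

    Only peers whose south neighbor set differs from X's but shares at
    least one neighbor with it are considered.
    """
    partial_neighbors = {}
    for peer, peer_south in south_neighbors_by_peer.items():
        if peer_south == x_south_neighbors:
            continue  # identical connectivity, nothing to record
        if not (peer_south & x_south_neighbors):
            continue  # no common south neighbor with X
        for neighbor in x_south_neighbors - peer_south:
            partial_neighbors.setdefault(neighbor, set()).add(peer)
    return partial_neighbors
```

An empty result corresponds to step 4: node X has nothing to disaggregate.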
A node X computes reachability to all nodes below it based upon the | A node X computes reachability to all nodes below it based upon the | |||
received North TIEs first. This results in a set of routes, each | received North TIEs first. This results in a set of routes, each | |||
categorized by (prefix, path_distance, next-hop set). Alternately, | categorized by (prefix, path_distance, next-hop set). Alternately, | |||
for clarity in the following procedure, these can be organized by | for clarity in the following procedure, these can be organized by a | |||
next-hop set as ((next-hops), {(prefix, path_distance)}). If | next-hop set as ((next-hops), {(prefix, path_distance)}). If | |||
partial_neighbors isn't empty, then the procedure in Figure 17 | partial_neighbors isn't empty, then the procedure in Figure 17 | |||
describes how to identify prefixes to disaggregate. | describes how to identify prefixes to disaggregate. | |||
disaggregated_prefixes = { empty } | disaggregated_prefixes = { empty } | |||
nodes_same_level = { empty } | nodes_same_level = { empty } | |||
for each South TIE | for each South TIE | |||
if (South TIE.level == X.level and | if (South TIE.level == X.level and | |||
South TIE.originator shares at least one S-neighbor with X) | South TIE.originator shares at least one S-neighbor with X) | |||
add South TIE.originator to nodes_same_level | add South TIE.originator to nodes_same_level | |||
end if | end if | |||
end for | end for | |||
for each next-hop-set NHS | for each next-hop-set NHS | |||
isolated_nodes = nodes_same_level | isolated_nodes = nodes_same_level | |||
for each NH in NHS | for each NH in NHS | |||
if NH in partial_neighbors | if NH in partial_neighbors | |||
isolated_nodes = | isolated_nodes = | |||
intersection(isolated_nodes, | intersection(isolated_nodes, | |||
partial_neighbors[NH].nodes) | partial_neighbors[NH].nodes) | |||
end if | end if | |||
end for | end for | |||
if isolated_nodes is not empty | if isolated_nodes is not empty | |||
for each prefix using NHS | for each prefix using NHS | |||
add (prefix, distance) to disaggregated_prefixes | add (prefix, distance) to disaggregated_prefixes | |||
end for | end for | |||
end if | end if | |||
end for | end for | |||
copy disaggregated_prefixes to X's South TIE | copy disaggregated_prefixes to X's South TIE | |||
if X's South TIE is different | if X's South TIE is different | |||
schedule South TIE for flooding | schedule South TIE for flooding | |||
end if | end if | |||
Figure 17: Computation of Disaggregated Prefixes | Figure 17: Computation of Disaggregated Prefixes | |||
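The main loop of Figure 17 can be rendered in Python roughly as follows. This is a sketch and not a normative restatement. It makes one assumption that goes beyond a literal reading of the figure: a next hop that has no entry in partial_neighbors is treated as reachable by every same-level node, so it empties the isolated set. The argument shapes are also assumptions.

```python
def compute_disaggregated_prefixes(nodes_same_level, partial_neighbors,
                                   routes_by_next_hop_set):
    """Select (prefix, path_distance) pairs that need disaggregation.

    nodes_same_level:       same-level nodes sharing a south neighbor with X
    partial_neighbors:      south neighbor -> set of nodes that can't reach it
    routes_by_next_hop_set: frozenset of next hops -> list of
                            (prefix, path_distance) tuples
    """
    disaggregated = set()
    for next_hop_set, routes in routes_by_next_hop_set.items():
        isolated_nodes = set(nodes_same_level)
        for next_hop in next_hop_set:
            # Assumption: a next hop absent from partial_neighbors is
            # reachable by all same-level nodes, so nobody is isolated.
            isolated_nodes &= partial_neighbors.get(next_hop, set())
        if isolated_nodes:
            disaggregated.update(routes)
    return disaggregated
```

A same-level node counts as isolated from a next-hop set only if it cannot reach any member of that set, which is why the per-next-hop sets are intersected.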
Each disaggregated prefix is sent with the corresponding | Each disaggregated prefix is sent with the corresponding | |||
path_distance. This allows a node to send the same South TIE to each | path_distance. This allows a node to send the same South TIE to each | |||
south neighbor. The south neighbor which is connected to that prefix | south neighbor. The south neighbor that is connected to that prefix | |||
will thus have a shorter path. | will thus have a shorter path. | |||
Finally, to summarize the less obvious points partially omitted in | Finally, to summarize the less obvious points partially omitted in | |||
the algorithms to keep them more tractable: | the algorithms to keep them more tractable: | |||
1. all neighbor relationships MUST perform backlink checks. | 1. All neighbor relationships MUST perform backlink checks. | |||
2. overload flag as introduced in Section 6.8.2 and carried in the | 2. The overload flag as introduced in Section 6.8.2 and carried in | |||
_overload_ schema element have to be respected during the | the _overload_ schema element has to be respected during the | |||
computation. Nodes advertising themselves as overloaded MUST NOT | computation. Nodes advertising themselves as overloaded MUST NOT | |||
be transited in reachability computation but MUST be used as | be transited in reachability computation but MUST be used as | |||
terminal nodes with prefixes they advertise being reachable. | terminal nodes with prefixes they advertise being reachable. | |||
3. all the lower-level nodes are flooded the same disaggregated | 3. All the lower-level nodes are flooded the same disaggregated | |||
prefixes since RIFT does not build a South TIE per node which | prefixes since RIFT does not build a South TIE per node, which | |||
would complicate things unnecessarily. The lower-level node that | would complicate things unnecessarily. The lower-level node that | |||
can compute a southbound route to the prefix will prefer it to | can compute a southbound route to the prefix will prefer it to | |||
the disaggregated route anyway based on route preference rules. | the disaggregated route anyway based on route preference rules. | |||
4. positively disaggregated prefixes do *not* have to propagate to | 4. Positively disaggregated prefixes do *not* have to propagate to | |||
lower levels. With that the disturbance in terms of new flooding | lower levels. With that, the disturbance in terms of new | |||
is contained to a single level experiencing failures. | flooding is contained to a single level experiencing failures. | |||
5. disaggregated Prefix South TIEs are not "reflected" by the lower | 5. Disaggregated South Prefix TIEs are not "reflected" by the lower | |||
level. Nodes within same level do *not* need to be aware which | level. Nodes within the same level do *not* need to be aware of | |||
node computed the need for disaggregation. | which node computed the need for disaggregation. | |||
6. The fabric is still supporting maximum load balancing properties | 6. The fabric is still supporting maximum load balancing properties | |||
while not trying to send traffic northbound unless necessary. | while not trying to send traffic northbound unless necessary. | |||
In case positive disaggregation is triggered and due to the very | In case positive disaggregation is triggered and due to the very | |||
stable but un-synchronized nature of the algorithm the nodes may | stable but unsynchronized nature of the algorithm, the nodes may | |||
issue the necessary disaggregated prefixes at different points in | issue the necessary disaggregated prefixes at different points in | |||
time. This can lead for a short time to an "incast" behavior where | time. For a short time, this can lead to an "incast" behavior where | |||
the first advertising router based on the nature of longest prefix | the first advertising router based on the nature of the longest | |||
match will attract all the traffic. Different implementation | prefix match will attract all the traffic. Different implementation | |||
strategies can be used to lessen that effect, but those are outside | strategies can be used to lessen that effect, but those are outside | |||
the scope of this specification. | the scope of this specification. | |||
It is worth observing that, in a single plane ToF, this | It is worth observing that, in a single-plane ToF, this | |||
disaggregation prevents traffic loss up to (K_LEAF * P) link failures | disaggregation prevents traffic loss up to (K_LEAF * P) link failures | |||
in terms of Section 5.2 or, in other terms, it takes at minimum that | in terms of Section 5.2 or, in other terms, it takes at minimum that | |||
many link failures to partition the ToF into multiple planes. | many link failures to partition the ToF into multiple planes. | |||
6.5.2. Negative, Transitive Disaggregation for Fallen Leaves | 6.5.2. Negative, Transitive Disaggregation for Fallen Leaves | |||
As explained in Section 5.3 failures in multi-plane ToF or more than | As explained in Section 5.3, failures in multi-plane ToF or more than | |||
(K_LEAF * P) links failing in single plane design can generate fallen | (K_LEAF * P) links failing in single-plane design can generate fallen | |||
leaves. Such a scenario cannot be addressed by positive disaggregation | leaves. Such a scenario cannot be addressed by positive disaggregation | |||
only and needs a further mechanism. | only and needs a further mechanism. | |||
6.5.2.1. Cabling of Multiple ToF Planes | 6.5.2.1. Cabling of Multiple ToF Planes | |||
Returning in this section to designs with multiple planes as shown | Returning in this section to designs with multiple planes as shown | |||
originally in Figure 3, Figure 18 highlights how the ToF is cabled in | originally in Figure 3, Figure 18 highlights how the ToF is cabled in | |||
case of two planes by the means of dual-rings to distribute all the | case of two planes by the means of dual-rings to distribute all the | |||
North TIEs within both planes. | North TIEs within both planes. | |||
____________________________________________________________________________ | _______________________________________________________________________ | |||
| [Plane A] . [Plane B] . [Plane C] . [Plane D] | | | [Plane A] . [Plane B] . [Plane C] . [Plane D] | | |||
|..........................................................................| | |.....................................................................| | |||
| +-------------------------------------------------------------+ | | | +------------------------------------------------------------+ | | |||
| | +---+ . +---+ . +---+ . +---+ | | | | | +---+ . +---+ . +---+ . +---+ | | | |||
| +-+ n +-------------+ n +-------------+ n +-------------+ n +-+ | | | +-+ n +-------------+ n +-------------+ n +------------+ n +-+ | | |||
| +--++ . +-+++ . +-+++ . +--++ | | | +--++ . +-+++ . +-+++ . +--++ | | |||
| || . || . || . || | | | || . || . || . || | | |||
| +---------||---------------||----------------||---------------+ || | | | +---------||---------------||----------------||--------------+ || | | |||
| | +---+ || . +---+ || . +---+ || . +---+ | || | | | | +---+ || . +---+ || . +---+ || . +---+ | || | | |||
| +-+ 1 +---||--------+ 1 +--||---------+ 1 +--||---------+ 1 +-+ || | | | +-+ 1 +---||--------+ 1 +--||---------+ 1 +--||--------+ 1 +-+ || | | |||
| +--++ || . +-+++ || . +-+++ || . +-+++ || | | | +--++ || . +-+++ || . +-+++ || . +-+++ || | | |||
| || || . || || . || || . || || | | | || || . || || . || || . || || | | |||
| || || . || || . || || . || || | | | || || . || || . || || . || || | | |||
Figure 18: Topologically Connected Planes | Figure 18: Topologically Connected Planes | |||
Section 5.3 already describes how failures in multi-plane fabrics can | Section 5.3 already describes how failures in multi-plane fabrics can | |||
lead to traffic loss that normal positive disaggregation cannot fix. | lead to traffic loss that normal positive disaggregation cannot fix. | |||
The mechanism of negative, transitive disaggregation incorporated in | The mechanism of negative, transitive disaggregation incorporated in | |||
RIFT provides the corresponding solution and next section explains | RIFT provides the corresponding solution, and the next section | |||
the involved mechanisms in more detail. | explains the involved mechanisms in more detail. | |||
6.5.2.2. Transitive Advertisement of Negative Disaggregates | 6.5.2.2. Transitive Advertisement of Negative Disaggregates | |||
A ToF node discovering that it cannot reach a fallen leaf SHOULD | A ToF node discovering that it cannot reach a fallen leaf SHOULD | |||
disaggregate all the prefixes of that leaf. It uses for that purpose | disaggregate all the prefixes of that leaf. For that purpose, it | |||
negative prefix South TIEs that are, as usual, flooded southwards | uses negative South Prefix TIEs that are, as usual, flooded | |||
with the scope defined in Section 6.3.4. | southwards with the scope defined in Section 6.3.4. | |||
Transitively, a node explicitly loses connectivity to a prefix when | Transitively, a node explicitly loses connectivity to a prefix when | |||
none of its children advertises it and when the prefix is negatively | none of its children advertises it and when the prefix is negatively | |||
disaggregated by all of its parents. When that happens, the node | disaggregated by all of its parents. When that happens, the node | |||
originates the negative prefix further down south. Since the | originates the negative prefix further down south. Since the | |||
mechanism applies recursively south the negative prefix may propagate | mechanism applies recursively south, the negative prefix may | |||
transitively all the way down to the leaf. This is necessary since | propagate transitively all the way down to the leaf. This is | |||
leaves connected to multiple planes by means of disjointed paths may | necessary since leaves connected to multiple planes by means of | |||
have to choose the correct plane at the very bottom of the fabric to | disjointed paths may have to choose the correct plane at the very | |||
make sure that they don't send traffic towards another leaf using a | bottom of the fabric to make sure that they don't send traffic | |||
plane where it is "fallen" which would make traffic loss unavoidable. | towards another leaf using a plane where it is "fallen", which would | |||
make traffic loss unavoidable. | ||||
When connectivity is restored, a node that disaggregated a prefix | When connectivity is restored, a node that disaggregated a prefix | |||
withdraws the negative disaggregation by the usual mechanism of re- | withdraws the negative disaggregation by the usual mechanism of re- | |||
advertising TIEs omitting the negative prefix. | advertising TIEs omitting the negative prefix. | |||
6.5.2.3. Computation of Negative Disaggregates | 6.5.2.3. Computation of Negative Disaggregates | |||
Negative prefixes can in fact be advertised due to two different | Negative prefixes can in fact be advertised due to two different | |||
triggers. This will be described consecutively. | triggers. This will be described consecutively. | |||
The first origination reason is a computation that uses all the node | The first origination reason is a computation that uses all the North | |||
North TIEs to build the set of all reachable nodes by reachability | Node TIEs to build the set of all reachable nodes by reachability | |||
computation over the complete graph and including horizontal ToF | computation over the complete graph, including horizontal ToF links. | |||
links. The computation uses the node itself as root. This is | The computation uses the node itself as the root. This is compared | |||
compared with the result of the normal southbound SPF as described in | with the result of the normal southbound SPF as described in | |||
Section 6.4.2. The difference are the fallen leaves and all their | Section 6.4.2. The differences are the fallen leaves and all their | |||
attached prefixes are advertised as negative prefixes southbound if | attached prefixes are advertised as negative prefixes southbound if | |||
the node does not consider the prefix to be reachable within the | the node does not consider the prefix to be reachable within the | |||
southbound SPF. | southbound SPF. | |||
The second origination reason hinges on the understanding how the | The second origination reason hinges on the understanding of how the | |||
negative prefixes are used within the computation as described in | negative prefixes are used within the computation as described in | |||
Figure 19. When attaching the negative prefixes at a certain point | Figure 19. When attaching the negative prefixes at a certain point | |||
in time the negative prefix may find itself with all the viable nodes | in time, the negative prefix may find itself with all the viable | |||
from the shorter match nexthop being pruned. In other words, all its | nodes from the shorter match next hop being pruned. In other words, | |||
northbound neighbors provided a negative prefix advertisement. This | all its northbound neighbors provided a negative prefix | |||
is the trigger to advertise this negative prefix transitively south | advertisement. This is the trigger to advertise this negative prefix | |||
and is normally caused by the node being in a plane where the prefix | transitively south and is normally caused by the node being in a | |||
belongs to a fabric leaf that has "fallen" in this plane. Obviously, | plane where the prefix belongs to a fabric leaf that has "fallen" in | |||
when one of the northbound switches withdraws its negative | this plane. Obviously, when one of the northbound switches withdraws | |||
advertisement, the node has to withdraw its transitively provided | its negative advertisement, the node has to withdraw its transitively | |||
negative prefix as well. | provided negative prefix as well. | |||
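The first origination trigger amounts to a set difference between two reachability computations. The following Python is a minimal sketch under stated assumptions, not the normative procedure: the adjacency maps, the node names, and the helper names `reachable` and `fallen_leaves` are hypothetical. A real implementation would derive both graphs from the North Node TIEs and from the southbound SPF of Section 6.4.2.

```python
from collections import deque

def reachable(adjacency, root):
    """Breadth-first reachability from root over an adjacency map."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def fallen_leaves(full_graph, southbound_graph, root, leaves):
    """Leaves present in the complete graph but missing from the
    southbound view of root (hypothetical helper)."""
    everything = reachable(full_graph, root)
    south = reachable(southbound_graph, root)
    return {node for node in everything - south if node in leaves}

# Assumed two-plane fabric: ToF nodes A1 and B1 share a horizontal
# link; leaf L2 has lost its link to SA, the only spine in plane A.
full_graph = {
    "A1": ["B1", "SA"], "B1": ["A1", "SB"],
    "SA": ["A1", "L1"], "SB": ["B1", "L1", "L2"],
    "L1": ["SA", "SB"], "L2": ["SB"],
}
southbound_graph = {
    "A1": ["SA"], "B1": ["SB"],
    "SA": ["L1"], "SB": ["L1", "L2"],
}
fallen = fallen_leaves(full_graph, southbound_graph, "A1", {"L1", "L2"})
```

In this assumed topology, A1 learns about L2 only through the horizontal link, so L2 is the leaf whose prefixes A1 would advertise negatively.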
6.6. Attaching Prefixes | 6.6. Attaching Prefixes | |||
After an SPF is run, it is necessary to attach the resulting | After an SPF is run, it is necessary to attach the resulting | |||
reachability information in form of prefixes. For S-SPF, prefixes | reachability information in the form of prefixes. For S-SPF, | |||
from a North TIE are attached to the originating node with that | prefixes from a North TIE are attached to the originating node with | |||
node's next-hop set and a distance equal to the prefix's cost plus | that node's next-hop set and a distance equal to the prefix's cost | |||
the node's minimized path distance. The RIFT route database, a set | plus the node's minimized path distance. The RIFT route database, a | |||
of (prefix, prefix-type, attributes, path_distance, next-hop set), | set of (prefix, prefix-type, attributes, path_distance, next-hop | |||
accumulates these results. | set), accumulates these results. | |||
N-SPF prefixes from each South TIE need to also be added to the RIFT | N-SPF prefixes from each South TIE need to also be added to the RIFT | |||
route database. The N-SPF is really just a stub so the computing | route database. The N-SPF is really just a stub so the computing | |||
node needs simply to determine, for each prefix in a South TIE that | node simply needs to determine, for each prefix in a South TIE that | |||
originated from adjacent node, what next-hops to use to reach that | originated from adjacent node, what next hops to use to reach that | |||
node. Since there may be parallel links, the next-hops to use can be | node. Since there may be parallel links, the next hops to use can be | |||
a set; presence of the computing node in the associated Node South | a set; the presence of the computing node in the associated South | |||
TIE is sufficient to verify that at least one link has bidirectional | Node TIE is sufficient to verify that at least one link has | |||
connectivity. The set of minimum cost next-hops from the computing | bidirectional connectivity. The set of minimum cost next hops from | |||
node X to the originating adjacent node is determined. | the computing node X to the originating adjacent node is determined. | |||
Each prefix has its cost adjusted before being added into the RIFT | Each prefix has its cost adjusted before being added into the RIFT | |||
route database. The cost of the prefix is set to the cost received | route database. The cost of the prefix is set to the cost received | |||
plus the cost of the minimum distance next-hop to that neighbor while | plus the cost of the minimum distance next hop to that neighbor while | |||
considering its attributes such as mobility per Section 6.8.4. Then | considering its attributes such as mobility per Section 6.8.4. Then | |||
each prefix can be added into the RIFT route database with the next- | each prefix can be added into the RIFT route database with the next- | |||
hop set; ties are broken based upon type first and then distance and | hop set; ties are broken based upon type first and then distance and | |||
further on _PrefixAttributes_. Only the best combination is used for | further on _PrefixAttributes_. Only the best combination is used for | |||
forwarding. RIFT route preferences are normalized by the enum | forwarding. RIFT route preferences are normalized by the enum | |||
_RouteType_ in Thrift [thrift] model given in Section 7. | _RouteType_ in the Thrift [thrift] model given in Section 7. | |||
An example implementation for node X follows: | An example implementation for node X follows: | |||
for each South TIE | for each South TIE | |||
if South TIE.level > X.level | if South TIE.level > X.level | |||
next_hop_set = set of minimum cost links to the | next_hop_set = set of minimum cost links to the | |||
South TIE.originator | South TIE.originator | |||
next_hop_cost = minimum cost link to | next_hop_cost = minimum cost link to | |||
South TIE.originator | South TIE.originator | |||
end if | end if | |||
for each prefix P in the South TIE | for each prefix P in the South TIE | |||
P.cost = P.cost + next_hop_cost | P.cost = P.cost + next_hop_cost | |||
if P not in route_database: | if P not in route_database: | |||
add (P, P.cost, P.type, | add (P, P.cost, P.type, | |||
P.attributes, next_hop_set) to route_database | P.attributes, next_hop_set) to route_database | |||
end if | end if | |||
if (P in route_database): | if (P in route_database): | |||
if route_database[P].cost > P.cost or | if route_database[P].cost > P.cost or | |||
route_database[P].type > P.type: | route_database[P].type > P.type: | |||
update route_database[P] with (P, P.type, P.cost, | update route_database[P] with (P, P.type, P.cost, | |||
P.attributes, | P.attributes, | |||
next_hop_set) | next_hop_set) | |||
else if route_database[P].cost == P.cost and | else if route_database[P].cost == P.cost and | |||
route_database[P].type == P.type: | route_database[P].type == P.type: | |||
update route_database[P] with (P, P.type, | update route_database[P] with (P, P.type, | |||
P.cost, P.attributes, | P.cost, P.attributes, | |||
merge(next_hop_set, route_database[P].next_hop_set)) | merge(next_hop_set, route_database[P].next_hop_set)) | |||
else | else | |||
// Not preferred route so ignore | // Not preferred route so ignore | |||
end if | end if | |||
end if | end if | |||
end for | end for | |||
end for | end for | |||
Figure 19: Adding Routes from South TIE Positive and Negative | Figure 19: Adding Routes from South TIE Positive and Negative | |||
Prefixes | Prefixes | |||
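The positive-prefix part of the procedure in Figure 19 can also be written as runnable code. The Python below is an illustrative sketch only. The `Route` class, the function name `attach_south_prefixes`, and the input shapes are assumptions made for the example. It orders candidates by (type, cost), following the prose ("type first and then distance"), assumes that a lower type value is preferred, skips South TIEs from nodes that are not above X, and leaves out _PrefixAttributes_ tie-breaking and negative prefixes.

```python
from dataclasses import dataclass, field

@dataclass
class Route:
    route_type: int  # lower value is assumed to be preferred
    cost: int
    next_hops: set = field(default_factory=set)

def attach_south_prefixes(route_db, x_level, south_ties, link_costs):
    """south_ties: iterable of (originator, level, prefixes), where
    prefixes is a list of (prefix, route_type, cost).
    link_costs: {originator: {link_name: cost}} for the links of X."""
    for originator, level, prefixes in south_ties:
        links = link_costs.get(originator)
        if level <= x_level or not links:
            continue  # only adjacent nodes above X contribute
        best = min(links.values())
        next_hops = {name for name, cost in links.items() if cost == best}
        for prefix, route_type, cost in prefixes:
            candidate = (route_type, cost + best)
            current = route_db.get(prefix)
            if current is None or candidate < (current.route_type, current.cost):
                # New or strictly better route replaces the old entry.
                route_db[prefix] = Route(route_type, cost + best, set(next_hops))
            elif candidate == (current.route_type, current.cost):
                # Equal preference: merge the next-hop sets.
                current.next_hops |= next_hops
            # Otherwise the route is not preferred and is ignored.
    return route_db

# Hypothetical input: S1 is reached over two parallel links, S2 over
# one link, and S3 over a more expensive link.
ties = [
    ("S1", 1, [("::/0", 2, 1)]),
    ("S2", 1, [("::/0", 2, 1)]),
    ("S3", 1, [("::/0", 2, 1)]),
]
costs = {"S1": {"if1": 1, "if2": 1}, "S2": {"if3": 1}, "S3": {"if4": 5}}
db = attach_south_prefixes({}, 0, ties, costs)
```

With these inputs, the default route ends up with the three equal-cost links toward S1 and S2, and the costlier path via S3 is left out.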
After the positive prefixes are attached and tie-broken, negative | After the positive prefixes are attached and tie-broken, negative | |||
prefixes are attached and used in case of northbound computation, | prefixes are attached and used in case of northbound computation, | |||
ideally from the shortest length to the longest. The nexthop | ideally from the shortest length to the longest. The next-hop | |||
adjacencies for a negative prefix are inherited from the longest | adjacencies for a negative prefix are inherited from the longest | |||
positive prefix that aggregates it, and subsequently adjacencies to | positive prefix that aggregates it; subsequently, adjacencies to | |||
nodes that advertised negative for this prefix are removed. | nodes that advertised negative disaggregation for this prefix are | |||
removed. | ||||
The rule of inheritance MUST be maintained when the nexthop list for | The rule of inheritance MUST be maintained when the next-hop list for | |||
a prefix is modified, as the modification may affect the entries for | a prefix is modified, as the modification may affect the entries for | |||
matching negative prefixes of immediate longer prefix length. For | matching negative prefixes of immediate longer prefix length. For | |||
instance, if a nexthop is added, then by inheritance it must be added | instance, if a next hop is added, then by inheritance, it must be | |||
to all the negative routes of immediate longer prefixes length unless | added to all the negative routes of immediate longer prefixes length | |||
it is pruned due to a negative advertisement for the same next hop. | unless it is pruned due to a negative advertisement for the same next | |||
Similarly, if a nexthop is deleted for a given prefix, then it is | hop. Similarly, if a next hop is deleted for a given prefix, then it | |||
deleted for all the immediately aggregated negative routes. This | is deleted for all the immediately aggregated negative routes. This | |||
will recurse in the case of nested negative prefix aggregations. | will recurse in the case of nested negative prefix aggregations. | |||
The rule of inheritance MUST also be maintained when a new prefix of | The rule of inheritance MUST also be maintained when a new prefix of | |||
intermediate length is inserted, or when the immediately aggregating | intermediate length is inserted or when the immediately aggregating | |||
prefix is deleted from the routing table, making an even shorter | prefix is deleted from the routing table, making an even shorter | |||
aggregating prefix the one from which the negative routes now inherit | aggregating prefix the one from which the negative routes now inherit | |||
their adjacencies. As the aggregating prefix changes, all the | their adjacencies. As the aggregating prefix changes, all the | |||
negative routes MUST be recomputed, and then again the process may | negative routes MUST be recomputed, and then again, the process may | |||
recurse in case of nested negative prefix aggregations. | recurse in case of nested negative prefix aggregations. | |||
Although these operations can be computationally expensive, the | Although these operations can be computationally expensive, the | |||
overall load on devices in the network is low because these | overall load on devices in the network is low because these | |||
computations are not run very often, as positive route advertisements | computations are not run very often, as positive route advertisements | |||
are always preferred over negative ones. This prevents recursion in | are always preferred over negative ones. This prevents recursion in | |||
most cases because positive reachability information never inherits | most cases because positive reachability information never inherits | |||
next hops. | next hops. | |||
To make the negative disaggregation less abstract and provide an | To make the negative disaggregation less abstract and provide an | |||
example ToP node T1 with 4 ToF parents S1..S4 as represented in | example ToP node, T1 with 4 ToF parents S1..S4 as represented in | |||
Figure 20 are considered further: | Figure 20 are considered further: | |||
+----+ +----+ +----+ +----+ N | +----+ +----+ +----+ +----+ N | |||
| S1 | | S2 | | S3 | | S4 | ^ | | S1 | | S2 | | S3 | | S4 | ^ | |||
+----+ +----+ +----+ +----+ W< + >E | +----+ +----+ +----+ +----+ W< + >E | |||
| | | | v | | | | | v | |||
|+--------+ | | S | |+--------+ | | S | |||
||+-----------------+ | | ||+-----------------+ | | |||
|||+----------------+---------+ | |||+----------------+---------+ | |||
|||| | |||| | |||
+----+ | +----+ | |||
| T1 | | | T1 | | |||
+----+ | +----+ | |||
Figure 20: A ToP Node with 4 Parents | Figure 20: A ToP Node with 4 Parents | |||
If all ToF nodes can reach all the prefixes in the network; with | If all ToF nodes can reach all the prefixes in the network, with | |||
RIFT, they will normally advertise a default route south. An | RIFT, they will normally advertise a default route south. An | |||
abstract Routing Information Base (RIB), more commonly known as a | abstract Routing Information Base (RIB), more commonly known as a | |||
routing table, stores all types of maintained routes including the | routing table, stores all types of maintained routes, including the | |||
negative ones and "tie-breaks" for the best one, whereas an abstract | negative ones and "tie-breaks" for the best one, whereas an abstract | |||
Forwarding table (FIB) retains only the ultimately computed | forwarding table (FIB) retains only the ultimately computed | |||
"positive" routing instructions. In T1, those tables would look as | "positive" routing instructions. In T1, those tables would look as | |||
illustrated in Figure 21: | illustrated in Figure 21: | |||
+---------+ | +---------+ | |||
| Default | | | Default | | |||
+---------+ | +---------+ | |||
| | | | |||
| +--------+ | | +--------+ | |||
+---> | Via S1 | | +---> | Via S1 | | |||
| +--------+ | | +--------+ | |||
skipping to change at page 89, line 38 ¶ | skipping to change at line 3969 ¶ | |||
+---> | Via S3 | | +---> | Via S3 | | |||
| +--------+ | | +--------+ | |||
| | | | |||
| +--------+ | | +--------+ | |||
+---> | Via S4 | | +---> | Via S4 | | |||
+--------+ | +--------+ | |||
Figure 21: Abstract RIB | Figure 21: Abstract RIB | |||
In case T1 receives a negative advertisement for prefix 2001:db8::/32 | In case T1 receives a negative advertisement for prefix 2001:db8::/32 | |||
from S1 a negative route is stored in the RIB (indicated by a ~ | from S1, a negative route is stored in the RIB (indicated by a "~" | |||
sign), while the more specific routes to the complementing ToF nodes | sign), while the more specific routes to the complementing ToF nodes | |||
are installed in FIB. RIB and FIB in T1 now look as illustrated in | are installed in FIB. RIB and FIB in T1 now look as illustrated in | |||
Figure 22 and Figure 23, respectively: | Figures 22 and 23, respectively: | |||
+---------+ +-----------------+ | +---------+ +-----------------+ | |||
| Default | <-------------- | ~2001:db8::/32 | | | Default | <-------------- | ~2001:db8::/32 | | |||
+---------+ +-----------------+ | +---------+ +-----------------+ | |||
| | | | | | |||
| +--------+ | +--------+ | | +--------+ | +--------+ | |||
+---> | Via S1 | +---> | Via S1 | | +---> | Via S1 | +---> | Via S1 | | |||
| +--------+ +--------+ | | +--------+ +--------+ | |||
| | | | |||
| +--------+ | | +--------+ | |||
skipping to change at page 90, line 25 ¶ | skipping to change at line 3994 ¶ | |||
| +--------+ | | +--------+ | |||
| | | | |||
| +--------+ | | +--------+ | |||
+---> | Via S3 | | +---> | Via S3 | | |||
| +--------+ | | +--------+ | |||
| | | | |||
| +--------+ | | +--------+ | |||
+---> | Via S4 | | +---> | Via S4 | | |||
+--------+ | +--------+ | |||
Figure 22: Abstract RIB after Negative 2001:db8::/32 from S1 | Figure 22: Abstract RIB After Negative 2001:db8::/32 from S1 | |||
The negative 2001:db8::/32 prefix entry inherits from ::/0, so the | The negative 2001:db8::/32 prefix entry inherits from ::/0, so the | |||
positive more specific routes are the complements to S1 in the set of | positive, more specific routes are the complements to S1 in the set | |||
next-hops for the default route. That entry is composed of S2, S3, | of next hops for the default route. That entry is composed of S2, | |||
and S4, or, in other words, it uses all entries in the default route | S3, and S4, or in other words, it uses all entries in the default | |||
with a "hole punched" for S1 into them. These are the next hops that | route with a "hole punched" for S1 into them. These are the next | |||
are still available to reach 2001:db8::/32, now that S1 advertised | hops that are still available to reach 2001:db8::/32 now that S1 | |||
that it will not forward 2001:db8::/32 anymore. Ultimately, those | advertised that it will not forward 2001:db8::/32 anymore. | |||
resulting next-hops are installed in FIB for the more specific route | Ultimately, those resulting next hops are installed in FIB for the | |||
to 2001:db8::/32 as illustrated below: | more specific route to 2001:db8::/32 as illustrated below: | |||
+---------+ +---------------+ | +---------+ +---------------+ | |||
| Default | | 2001:db8::/32 | | | Default | | 2001:db8::/32 | | |||
+---------+ +---------------+ | +---------+ +---------------+ | |||
| | | | | | |||
| +--------+ | | | +--------+ | | |||
+---> | Via S1 | | | +---> | Via S1 | | | |||
| +--------+ | | | +--------+ | | |||
| | | | | | |||
| +--------+ | +--------+ | | +--------+ | +--------+ | |||
skipping to change at page 91, line 25 ¶ | skipping to change at line 4026 ¶ | |||
| +--------+ | +--------+ | | +--------+ | +--------+ | |||
| | | | | | |||
| +--------+ | +--------+ | | +--------+ | +--------+ | |||
+---> | Via S3 | +---> | Via S3 | | +---> | Via S3 | +---> | Via S3 | | |||
| +--------+ | +--------+ | | +--------+ | +--------+ | |||
| | | | | | |||
| +--------+ | +--------+ | | +--------+ | +--------+ | |||
+---> | Via S4 | +---> | Via S4 | | +---> | Via S4 | +---> | Via S4 | | |||
+--------+ +--------+ | +--------+ +--------+ | |||
Figure 23: Abstract FIB after Negative 2001:db8::/32 from S1 | Figure 23: Abstract FIB After Negative 2001:db8::/32 from S1 | |||
To illustrate matters further consider T1 receiving a negative | To illustrate matters further, consider T1 receiving a negative | |||
advertisement for prefix 2001:db8:1::/48 from S2, which is stored in | advertisement for prefix 2001:db8:1::/48 from S2, which is stored in | |||
RIB again. After the update, the RIB in T1 is illustrated in | RIB again. After the update, the RIB in T1 is illustrated in | |||
Figure 24: | Figure 24: | |||
+---------+ +----------------+ +------------------+ | +---------+ +----------------+ +------------------+ | |||
| Default | <----- | ~2001:db8::/32 | <------ | ~2001:db8:1::/48 | | | Default | <----- | ~2001:db8::/32 | <------ | ~2001:db8:1::/48 | | |||
+---------+ +----------------+ +------------------+ | +---------+ +----------------+ +------------------+ | |||
| | | | | | | | |||
| +--------+ | +--------+ | | | +--------+ | +--------+ | | |||
+---> | Via S1 | +---> | Via S1 | | | +---> | Via S1 | +---> | Via S1 | | | |||
skipping to change at page 91, line 52 ¶ | skipping to change at line 4053 ¶ | |||
| +--------+ +--------+ | | +--------+ +--------+ | |||
| | | | |||
| +--------+ | | +--------+ | |||
+---> | Via S3 | | +---> | Via S3 | | |||
| +--------+ | | +--------+ | |||
| | | | |||
| +--------+ | | +--------+ | |||
+---> | Via S4 | | +---> | Via S4 | | |||
+--------+ | +--------+ | |||
Figure 24: Abstract RIB after Negative 2001:db8:1::/48 from S2 | Figure 24: Abstract RIB After Negative 2001:db8:1::/48 from S2 | |||
Negative 2001:db8:1::/48 inherits from 2001:db8::/32 now, so the | Negative 2001:db8:1::/48 inherits from 2001:db8::/32 now, so the | |||
positive more specific routes are the complements to S2 in the set of | positive, more specific routes are the complements to S2 in the set | |||
next hops for 2001:db8::/32, which are S3 and S4, or, in other words, | of next hops for 2001:db8::/32, which are S3 and S4, or in other | |||
all entries of the parent with the negative holes "punched in" again. | words, all entries of the parent with the negative holes "punched in" | |||
After the update, the FIB in T1 shows as illustrated in Figure 25: | again. After the update, the FIB in T1 shows as illustrated in | |||
Figure 25: | ||||
+---------+ +---------------+ +-----------------+ | +---------+ +---------------+ +-----------------+ | |||
| Default | | 2001:db8::/32 | | 2001:db8:1::/48 | | | Default | | 2001:db8::/32 | | 2001:db8:1::/48 | | |||
+---------+ +---------------+ +-----------------+ | +---------+ +---------------+ +-----------------+ | |||
| | | | | | | | |||
| +--------+ | | | | +--------+ | | | |||
+---> | Via S1 | | | | +---> | Via S1 | | | | |||
| +--------+ | | | | +--------+ | | | |||
| | | | | | | | |||
| +--------+ | +--------+ | | | +--------+ | +--------+ | | |||
skipping to change at page 92, line 31 ¶ | skipping to change at line 4082 ¶ | |||
| +--------+ | +--------+ | | | +--------+ | +--------+ | | |||
| | | | | | | | |||
| +--------+ | +--------+ | +--------+ | | +--------+ | +--------+ | +--------+ | |||
+---> | Via S3 | +---> | Via S3 | +---> | Via S3 | | +---> | Via S3 | +---> | Via S3 | +---> | Via S3 | | |||
| +--------+ | +--------+ | +--------+ | | +--------+ | +--------+ | +--------+ | |||
| | | | | | | | |||
| +--------+ | +--------+ | +--------+ | | +--------+ | +--------+ | +--------+ | |||
+---> | Via S4 | +---> | Via S4 | +---> | Via S4 | | +---> | Via S4 | +---> | Via S4 | +---> | Via S4 | | |||
+--------+ +--------+ +--------+ | +--------+ +--------+ +--------+ | |||
Figure 25: Abstract FIB after Negative 2001:db8:1::/48 from S2 | Figure 25: Abstract FIB After Negative 2001:db8:1::/48 from S2 | |||
Further, assume that S3 stops advertising its service as default | Further, assume that S3 stops advertising its service as a default | |||
gateway. The entry is removed from RIB as usual. In order to update | gateway. The entry is removed from RIB as usual. In order to update | |||
the FIB, it is necessary to eliminate the FIB entry for the default | the FIB, it is necessary to eliminate the FIB entry for the default | |||
route, as well as all the FIB entries that were created for negative | route, as well as all the FIB entries that were created for negative | |||
routes pointing to the RIB entry being removed (::/0). This is done | routes pointing to the RIB entry being removed (::/0). This is done | |||
recursively for 2001:db8::/32 and then for, 2001:db8:1::/48. The | recursively for 2001:db8::/32 and then for 2001:db8:1::/48. The | |||
related FIB entries via S3 are removed, as illustrated in Figure 26. | related FIB entries via S3 are removed as illustrated in Figure 26. | |||
+---------+ +---------------+ +-----------------+ | +---------+ +---------------+ +-----------------+ | |||
| Default | | 2001:db8::/32 | | 2001:db8:1::/48 | | | Default | | 2001:db8::/32 | | 2001:db8:1::/48 | | |||
+---------+ +---------------+ +-----------------+ | +---------+ +---------------+ +-----------------+ | |||
| | | | | | | | |||
| +--------+ | | | | +--------+ | | | |||
+---> | Via S1 | | | | +---> | Via S1 | | | | |||
| +--------+ | | | | +--------+ | | | |||
| | | | | | | | |||
| +--------+ | +--------+ | | | +--------+ | +--------+ | | |||
skipping to change at page 93, line 25 ¶ | skipping to change at line 4112 ¶ | |||
| +--------+ | +--------+ | | | +--------+ | +--------+ | | |||
| | | | | | | | |||
| | | | | | | | |||
| | | | | | | | |||
| | | | | | | | |||
| | | | | | | | |||
| +--------+ | +--------+ | +--------+ | | +--------+ | +--------+ | +--------+ | |||
+---> | Via S4 | +---> | Via S4 | +---> | Via S4 | | +---> | Via S4 | +---> | Via S4 | +---> | Via S4 | | |||
+--------+ +--------+ +--------+ | +--------+ +--------+ +--------+ | |||
Figure 26: Abstract FIB after Loss of S3 | Figure 26: Abstract FIB After Loss of S3 | |||
Say that at that time, S4 would also disaggregate prefix | Say that at that time, S4 would also disaggregate prefix | |||
2001:db8:1::/48. This would mean that the FIB entry for | 2001:db8:1::/48. This would mean that the FIB entry for | |||
2001:db8:1::/48 becomes a discard route, and that would be the signal | 2001:db8:1::/48 becomes a discard route, and that would be the signal | |||
for T1 to disaggregate prefix 2001:db8:1::/48 negatively in a | for T1 to disaggregate prefix 2001:db8:1::/48 negatively in a | |||
transitive fashion with its own children. | transitive fashion with its own children. | |||
Finally, the case occurs where S3 becomes available again as a | Finally, the case occurs where S3 becomes available again as a | |||
default gateway, and a negative advertisement is received from S4 | default gateway, and a negative advertisement is received from S4 | |||
about prefix 2001:db8:2::/48 as opposed to 2001:db8:1::/48. Again, a | about prefix 2001:db8:2::/48 as opposed to 2001:db8:1::/48. Again, a | |||
negative route is stored in the RIB, and the more specific route to | negative route is stored in the RIB, and the more specific route to | |||
the complementing ToF nodes are installed in FIB. Since | the complementing ToF nodes is installed in FIB. Since | |||
2001:db8:2::/48 inherits from 2001:db8::/32, the positive FIB routes | 2001:db8:2::/48 inherits from 2001:db8::/32, the positive FIB routes | |||
are chosen by removing S4 from S2, S3, S4. The abstract FIB in T1 | are chosen by removing S4 from S2, S3, S4. The abstract FIB in T1 | |||
now shows as illustrated in Figure 27: | now shows as illustrated in Figure 27: | |||
+-----------------+ | +-----------------+ | |||
| 2001:db8:2::/48 | | | 2001:db8:2::/48 | | |||
+-----------------+ | +-----------------+ | |||
| | | | |||
+---------+ +---------------+ +-----------------+ | +---------+ +---------------+ +-----------------+ | |||
| Default | | 2001:db8::/32 | | 2001:db8:1::/48 | | | Default | | 2001:db8::/32 | | 2001:db8:1::/48 | | |||
skipping to change at page 94, line 29 ¶ | skipping to change at line 4153 ¶ | |||
| +--------+ | +--------+ | +--------+ | | +--------+ | +--------+ | +--------+ | |||
| | | | | | | | |||
| +--------+ | +--------+ | +--------+ | | +--------+ | +--------+ | +--------+ | |||
+---> | Via S3 | +---> | Via S3 | +---> | Via S3 | | +---> | Via S3 | +---> | Via S3 | +---> | Via S3 | | |||
| +--------+ | +--------+ | +--------+ | | +--------+ | +--------+ | +--------+ | |||
| | | | | | | | |||
| +--------+ | +--------+ | +--------+ | | +--------+ | +--------+ | +--------+ | |||
+---> | Via S4 | +---> | Via S4 | +---> | Via S4 | | +---> | Via S4 | +---> | Via S4 | +---> | Via S4 | | |||
+--------+ +--------+ +--------+ | +--------+ +--------+ +--------+ | |||
Figure 27: Abstract FIB after Negative 2001:db8:2::/48 from S4 | Figure 27: Abstract FIB After Negative 2001:db8:2::/48 from S4 | |||
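The inheritance rule used throughout this example can be sketched in a few lines. The Python below is a hedged illustration, not the specified algorithm. The function name `resolve_fib` and the dictionary-based RIB are assumptions. As in the figures, each negative prefix inherits from the longest already-resolved aggregate, which may itself be a resolved negative entry, and then drops the neighbors that advertised it negatively. An empty next-hop set stands for a discard route.

```python
import ipaddress

def resolve_fib(positive, negative):
    """positive: {prefix: set of next hops}
    negative: {prefix: set of neighbors advertising it negatively}
    Returns {prefix: set of next hops}; an empty set means discard."""
    fib = {ipaddress.ip_network(p): set(hops) for p, hops in positive.items()}
    negs = sorted(
        ((ipaddress.ip_network(p), set(adv)) for p, adv in negative.items()),
        key=lambda item: item[0].prefixlen,  # shortest length first
    )
    for net, advertisers in negs:
        if net in fib:
            continue  # a positive route for the same prefix is preferred
        parents = [
            cand for cand in fib
            if cand.version == net.version
            and cand.prefixlen < net.prefixlen
            and net.subnet_of(cand)
        ]
        if not parents:
            continue  # nothing to inherit from
        parent = max(parents, key=lambda cand: cand.prefixlen)
        fib[net] = fib[parent] - advertisers
    return {str(net): hops for net, hops in fib.items()}

default = {"::/0": {"S1", "S2", "S3", "S4"}}
negs = {"2001:db8::/32": {"S1"}, "2001:db8:1::/48": {"S2"}}

fib_25 = resolve_fib(default, negs)                          # as in Figure 25
fib_26 = resolve_fib({"::/0": {"S1", "S2", "S4"}}, negs)     # S3 lost
discard = resolve_fib(
    {"::/0": {"S1", "S2", "S4"}},
    {**negs, "2001:db8:1::/48": {"S2", "S4"}},               # S4 also negative
)
fib_27 = resolve_fib(default, {**negs, "2001:db8:2::/48": {"S4"}})
```

Recomputing from the shortest prefix down on every change is the simplest way to honor the inheritance rule in a sketch; an implementation concerned with cost would update only the affected subtree. The empty set for 2001:db8:1::/48 in `discard` corresponds to the case where T1 has to disaggregate the prefix negatively toward its own children.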
6.7. Optional Zero Touch Provisioning (RIFT ZTP) | 6.7. Optional Zero Touch Provisioning (RIFT ZTP) | |||
Each RIFT node can operate in zero touch provisioning (ZTP) mode, | Each RIFT node can operate in Zero Touch Provisioning (ZTP) mode, | |||
i.e. it has no RIFT specific configuration (unless it is a ToF or it | i.e., it has no RIFT-specific configuration (unless it is a ToF or it | |||
is explicitly configured to operate in the overall topology as leaf | is explicitly configured to operate in the overall topology as a leaf | |||
and/or support leaf-2-leaf procedures) and it will fully | and/or support L2L procedures), and it will fully, automatically | |||
automatically derive necessary RIFT parameters itself after being | derive necessary RIFT parameters itself after being attached to the | |||
attached to the topology. Manually configured nodes and nodes | topology. Manually configured nodes and nodes operating using RIFT | |||
operating using RIFT ZTP can be mixed freely and will form a valid | ZTP can be mixed freely and will form a valid topology if achievable. | |||
topology if achievable. | ||||
The derivation of the level of each node happens based on offers | The derivation of the level of each node happens based on offers | |||
received from its neighbors whereas each node (with the possible | received from its neighbors, whereas each node (with the possible | |||
exception of nodes configured as leaves) tries to attach at the | exception of nodes configured as leaves) tries to attach at the | |||
highest possible point in the fabric. This guarantees that even if | highest possible point in the fabric. This guarantees that even if | |||
the diffusion front of offers reaches a node from "below" faster than | the diffusion front of offers reaches a node from "below" faster than | |||
from "above", it will greedily abandon already negotiated level | from "above", it will greedily abandon an already negotiated level | |||
derived from nodes topologically below it and properly peer with | derived from nodes topologically below it and properly peer with | |||
nodes above. | nodes above. | |||
The fabric is very consciously numbered from the top down to allow | The fabric is very consciously numbered from the top down to allow | |||
for PoDs of different heights and to minimize the number of | for PoDs of different heights and to minimize the number of | |||
configuration necessary, in this case just a TOP_OF_FABRIC flag on | configurations necessary, in this case, just a TOP_OF_FABRIC flag on | |||
every node at the top of the fabric. | every node at the top of the fabric. | |||
This section describes the necessary concepts and procedures of RIFT | This section describes the necessary concepts and procedures of the | |||
ZTP operation. | RIFT ZTP operation. | |||
6.7.1. Terminology | 6.7.1. Terminology | |||
The interdependencies between the different flags and the configured | The interdependencies between the different flags and the configured | |||
level can be somewhat vexing at first and it may take multiple reads | level can be somewhat vexing at first, and it may take multiple reads | |||
of the glossary to comprehend them. | of the glossary to comprehend them. | |||
Automatic Level Derivation: | Automatic Level Derivation: | |||
Procedures which allow nodes without level configured to derive it | Procedures that allow nodes without a level configured to derive | |||
automatically. Only applied if CONFIGURED_LEVEL is undefined. | it automatically. Only applied if CONFIGURED_LEVEL is undefined. | |||
UNDEFINED_LEVEL: | UNDEFINED_LEVEL: | |||
A "null" value that indicates that the level has not been | A "null" value that indicates that the level has not been | |||
determined and has not been configured. Schemas normally indicate | determined and has not been configured. Schemas normally indicate | |||
that by a missing optional value without an available defined | that by a missing optional value without an available defined | |||
default. | default. | |||
LEAF_ONLY: | LEAF_ONLY: | |||
An optional configuration flag that can be configured on a node to | An optional configuration flag that can be configured on a node to | |||
make sure it never leaves the "bottom of the hierarchy". | make sure it never leaves the "bottom of the hierarchy". The | |||
TOP_OF_FABRIC flag and CONFIGURED_LEVEL cannot be defined at the | TOP_OF_FABRIC flag and CONFIGURED_LEVEL cannot be defined at the | |||
same time as this flag. It implies CONFIGURED_LEVEL value of | same time as this flag. It implies a CONFIGURED_LEVEL value of | |||
_leaf_level_. It is indicated in the _leaf_only_ schema element. | _leaf_level_. It is indicated in the _leaf_only_ schema element. | |||
TOP_OF_FABRIC: | TOP_OF_FABRIC: | |||
A configuration flag that MUST be provided on all ToF nodes. | A configuration flag that MUST be provided on all ToF nodes. | |||
LEAF_FLAG and CONFIGURED_LEVEL cannot be defined at the same time | LEAF_FLAG and CONFIGURED_LEVEL cannot be defined at the same time | |||
as this flag. It implies a CONFIGURED_LEVEL value. In fact, it | as this flag. It implies a CONFIGURED_LEVEL value. In fact, it | |||
is basically a shortcut for configuring same level at all ToF | is basically a shortcut for configuring the same level at all ToF | |||
nodes which is unavoidable since an initial 'seed' is needed for | nodes, which is unavoidable since an initial "seed" is needed for | |||
other ZTP nodes to derive their level in the topology. The flag | other ZTP nodes to derive their level in the topology. The flag | |||
plays an important role in fabrics with multiple planes to enable | plays an important role in fabrics with multiple planes to enable | |||
successful negative disaggregation (Section 6.5.2). It is carried | successful negative disaggregation (Section 6.5.2). It is carried | |||
in the _top_of_fabric_ schema element. A standards conforming | in the _top_of_fabric_ schema element. A standards-conforming | |||
RIFT implementation implies a CONFIGURED_LEVEL value of | RIFT implementation implies a CONFIGURED_LEVEL value of | |||
_top_of_fabric_level_ in case of TOP_OF_FABRIC. This value is | _top_of_fabric_level_ in case of TOP_OF_FABRIC. This value is | |||
kept reasonably low to allow for fast ZTP re-convergence on | kept reasonably low to allow for fast ZTP reconvergence on | |||
failures. | failures. | |||
CONFIGURED_LEVEL: | CONFIGURED_LEVEL: | |||
A level value provided manually. When this is defined (i.e. it is | A level value provided manually. When this is defined (i.e., it | |||
not an UNDEFINED_LEVEL) the node is not participating in ZTP in | is not an UNDEFINED_LEVEL), the node is not participating in ZTP | |||
the sense of deriving its own level based on other nodes' | in the sense of deriving its own level based on other nodes' | |||
information. TOP_OF_FABRIC flag is ignored when this value is | information. The TOP_OF_FABRIC flag is ignored when this value is | |||
defined. LEAF_ONLY can be set only if this value is undefined or | defined. LEAF_ONLY can be set only if this value is undefined or | |||
set to _leaf_level_. | set to _leaf_level_. | |||
DERIVED_LEVEL: | DERIVED_LEVEL: | |||
Level value computed via automatic level derivation when | Level value computed via automatic level derivation when | |||
CONFIGURED_LEVEL is equal to UNDEFINED_LEVEL. | CONFIGURED_LEVEL is equal to UNDEFINED_LEVEL. | |||
LEAF_2_LEAF: | LEAF_2_LEAF: | |||
An optional flag that can be configured on a node to make sure it | An optional flag that can be configured on a node to make sure it | |||
supports procedures defined in Section 6.8.9. It is a capability | supports procedures defined in Section 6.8.9. It is a capability | |||
that implies LEAF_ONLY and the corresponding restrictions. | that implies LEAF_ONLY and the corresponding restrictions. The | |||
TOP_OF_FABRIC flag is ignored when set at the same time as this | TOP_OF_FABRIC flag is ignored when set at the same time as this | |||
flag. It is carried in the _leaf_only_and_leaf_2_leaf_procedures_ | flag. It is carried in the _leaf_only_and_leaf_2_leaf_procedures_ | |||
schema flag. | schema flag. | |||
LEVEL_VALUE: | LEVEL_VALUE: | |||
With ZTP, the original definition of "level" in Section 3.1 is | With ZTP, the original definition of "level" in Section 3.1 is | |||
both extended and relaxed. First, level is defined now as | both extended and relaxed. First, the level is defined now as | |||
LEVEL_VALUE and is the first defined value of CONFIGURED_LEVEL | LEVEL_VALUE and is the first defined value of CONFIGURED_LEVEL | |||
followed by DERIVED_LEVEL. Second, it is possible for nodes to be | followed by DERIVED_LEVEL. Second, it is possible for nodes to be | |||
more than one level apart to form adjacencies if any of the nodes | more than one level apart to form adjacencies if any of the nodes | |||
is at least LEAF_ONLY. | is at least LEAF_ONLY. | |||
Valid Offered Level (VOL): | Valid Offered Level (VOL): | |||
A neighbor's level received in a valid LIE (i.e. passing all | A neighbor's level received in a valid LIE (i.e., passing all | |||
checks for adjacency formation while disregarding all clauses | checks for adjacency formation while disregarding all clauses | |||
involving level values) persisting for the duration of the | involving level values) persisting for the duration of the | |||
holdtime interval on the LIE. Observe that offers from nodes | holdtime interval on the LIE. Observe that offers from nodes | |||
offering level value of _leaf_level_ do not constitute VOLs (since | offering the level value of _leaf_level_ do not constitute VOLs | |||
no valid DERIVED_LEVEL can be obtained from those and consequently | (since no valid DERIVED_LEVEL can be obtained from those and | |||
_not_a_ztp_offer_ flag MUST be ignored). Offers from LIEs with | consequently the _not_a_ztp_offer_ flag MUST be ignored). Offers | |||
_not_a_ztp_offer_ being true are not VOLs either. If a node | from LIEs with _not_a_ztp_offer_ being true are not VOLs either. | |||
maintains parallel adjacencies to the neighbor, VOL on each | If a node maintains parallel adjacencies to the neighbor, VOL on | |||
adjacency is considered as equivalent, i.e. the newest VOL from | each adjacency is considered as equivalent, i.e., the newest VOL | |||
any such adjacency updates the VOL received from the same node. | from any such adjacency updates the VOL received from the same | |||
node. | ||||
Highest Available Level (HAL): | Highest Available Level (HAL): | |||
Highest defined level value received from all VOLs received. | Highest-defined level value received from all VOLs received. | |||
Highest Available Level Systems (HALS): | Highest Available Level Systems (HALS): | |||
Set of nodes offering HAL VOLs. | Set of nodes offering HAL VOLs. | |||
Highest Adjacency ThreeWay (HAT): | Highest Adjacency ThreeWay (HAT): | |||
Highest neighbor level of all the formed _ThreeWay_ adjacencies | Highest neighbor level of all the formed _ThreeWay_ adjacencies | |||
for the node. | for the node. | |||
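The glossary terms can be made concrete with a small sketch. The following is an illustration only, not the normative schema: the `Offer` container and all function names are hypothetical, `None` stands in for UNDEFINED_LEVEL, and the numeric values for _leaf_level_ (0) and _top_of_fabric_level_ (24, as shown for node A in Figure 31) are assumptions. Where the glossary entries overlap, the sketch assumes that a defined CONFIGURED_LEVEL wins over TOP_OF_FABRIC.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

LEAF_LEVEL = 0            # assumed value of _leaf_level_
TOP_OF_FABRIC_LEVEL = 24  # assumed value of _top_of_fabric_level_


@dataclass(frozen=True)
class Offer:
    """Hypothetical container for the level carried in a valid LIE."""
    system_id: int
    level: Optional[int]           # None models UNDEFINED_LEVEL
    not_a_ztp_offer: bool = False


def level_value(configured: Optional[int], derived: Optional[int],
                leaf_only: bool = False,
                top_of_fabric: bool = False) -> Optional[int]:
    """LEVEL_VALUE: first defined of CONFIGURED_LEVEL, then DERIVED_LEVEL."""
    if leaf_only and top_of_fabric:
        raise ValueError("LEAF_ONLY and TOP_OF_FABRIC are mutually exclusive")
    if leaf_only and configured not in (None, LEAF_LEVEL):
        raise ValueError("LEAF_ONLY requires an undefined or leaf level")
    if configured is not None:
        return configured          # TOP_OF_FABRIC is ignored in this case
    if leaf_only:
        return LEAF_LEVEL          # flag implies CONFIGURED_LEVEL
    if top_of_fabric:
        return TOP_OF_FABRIC_LEVEL
    return derived


def is_vol(offer: Offer) -> bool:
    """A VOL needs a defined level above leaf and no not_a_ztp_offer flag."""
    return (offer.level is not None
            and offer.level > LEAF_LEVEL
            and not offer.not_a_ztp_offer)


def highest_available_level(offers: Iterable[Offer]) -> Optional[int]:
    """HAL: highest level among all VOLs, or None if there are none."""
    levels = [o.level for o in offers if is_vol(o)]
    return max(levels) if levels else None


def hals(offers: Iterable[Offer]) -> Set[int]:
    """HALS: System IDs of the nodes offering the HAL."""
    offers = list(offers)
    hal = highest_available_level(offers)
    return {o.system_id for o in offers if is_vol(o) and o.level == hal}
```

With offers at levels 24, 23, 0, undefined, and 24 with _not_a_ztp_offer_ set, only the first two qualify as VOLs, so HAL is 24 and HALS contains a single node.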
6.7.2. Automatic System ID Selection | 6.7.2. Automatic System ID Selection | |||
RIFT nodes require a 64-bit System ID which SHOULD be derived as | RIFT nodes require a 64-bit System ID that SHOULD be derived as | |||
EUI-64 MA-L derive according to [EUI64]. The organizationally | EUI-64 MAC Address Block Large (MA-L) according to [EUI64]. The | |||
governed portion of this ID (24 bits) can be used to generate | organizationally governed portion of this ID (24 bits) can be used to | |||
multiple IDs if required to indicate more than one RIFT instance. | generate multiple IDs if required to indicate more than one RIFT | |||
instance. | ||||
As a matter of operational concern, the router MUST ensure that such | As a matter of operational concern, the router MUST ensure that such | |||
an identifier is not changing very frequently (or at least not without | an identifier is not changing very frequently (or at least not without | |||
sending all its TIEs with fairly short lifetimes, i.e. purging them) | sending all its TIEs with fairly short lifetimes, i.e., purging them) | |||
since otherwise the network may be left with large amounts of stale | since the network may otherwise be left with large amounts of stale | |||
TIEs in other nodes (though this is not necessarily a serious problem | TIEs in other nodes (though this is not necessarily a serious problem | |||
if the procedures described in Section 9 are implemented). | if the procedures described in Section 9 are implemented). | |||
6.7.3. Generic Fabric Example | 6.7.3. Generic Fabric Example | |||
ZTP forces considerations of an incorrectly or unusually cabled | ZTP forces considerations of an incorrectly or unusually cabled | |||
fabric and how such a topology can be forced into a "lattice" | fabric and how such a topology can be forced into a "lattice" | |||
structure which a fabric represents (with further restrictions). A | structure that a fabric represents (with further restrictions). A | |||
necessary and sufficient physical cabling is shown in Figure 28. The | necessary and sufficient physical cabling is shown in Figure 28. The | |||
assumption here is that all nodes are in the same PoD. | assumption here is that all nodes are in the same PoD. | |||
+---+ | +---+ | |||
| A | s = TOP_OF_FABRIC | | A | s = TOP_OF_FABRIC | |||
| s | L = LEAF_ONLY | | s | L = LEAF_ONLY | |||
++-++ L2L = LEAF_2_LEAF | ++-++ L2L = LEAF_2_LEAF | |||
| | | | | | |||
+--+ +--+ | +--+ +--+ | |||
| | | | | | |||
skipping to change at page 98, line 38 ¶ | skipping to change at line 4328 ¶ | |||
+-----------------+ | | | +-----------------+ | | | |||
| | | | | | | | | | | | |||
++-++ ++-++ | | ++-++ ++-++ | | |||
| X +-----+ Y +-+ | | X +-----+ Y +-+ | |||
|L2L| | L | | |L2L| | L | | |||
+---+ +---+ | +---+ +---+ | |||
Figure 28: Generic ZTP Cabling Considerations | Figure 28: Generic ZTP Cabling Considerations | |||
First, RIFT must anchor the "top" of the cabling and that's what the | First, RIFT must anchor the "top" of the cabling and that's what the | |||
TOP_OF_FABRIC flag at node A is for. Then things look smooth until | TOP_OF_FABRIC flag at node A is for. Then, things look smooth until | |||
the protocol has to decide whether node Y is at the same level as I, | the protocol has to decide whether node Y is at the same level as I, | |||
J (and as a consequence, X is south of it) or at the same level as X. | J (and as a consequence, X is south of it), or X. This is unresolvable | |||
This is unresolvable here until we "nail down the bottom" of the | here until we "nail down the bottom" of the topology. To achieve | |||
topology. To achieve that the protocol chooses to use in this | that, the protocol chooses to use the leaf flags in X and Y in this | |||
example the leaf flags in X and Y. In case where Y would not have a | example. In the case where Y does not have a leaf flag, it will try | |||
leaf flag it will try to elect highest level offered and end up being | to elect the highest level offered and end up being in the same level as | |||
in the same level as I and J. | I and J. | |||
6.7.4. Level Determination Procedure | 6.7.4. Level Determination Procedure | |||
A node starting up with UNDEFINED_LEVEL (i.e. without a | A node starting up with UNDEFINED_LEVEL (i.e., without a | |||
CONFIGURED_LEVEL or any leaf or TOP_OF_FABRIC flag) MUST follow those | CONFIGURED_LEVEL or any leaf or TOP_OF_FABRIC flag) MUST follow these | |||
additional procedures: | additional procedures: | |||
1. It advertises its LEVEL_VALUE on all LIEs (observe that this can | 1. It advertises its LEVEL_VALUE on all LIEs (observe that this can | |||
be UNDEFINED_LEVEL which in terms of the schema is simply an | be UNDEFINED_LEVEL, which in terms of the schema, is simply an | |||
omitted optional value). | omitted optional value). | |||
2. It computes HAL as numerically highest available level in all | 2. It computes HAL as the numerically highest available level in all | |||
VOLs. | VOLs. | |||
3. It chooses then MAX(HAL-1,0) as its DERIVED_LEVEL. The node then | 3. Then, it chooses MAX(HAL-1,0) as its DERIVED_LEVEL. The node | |||
starts to advertise this derived level. | then starts to advertise this derived level. | |||
4. A node that lost all adjacencies with HAL value MUST hold down | 4. A node that lost all adjacencies with the HAL value MUST holddown | |||
computation of new DERIVED_LEVEL for at least one second unless | computation of the new DERIVED_LEVEL for at least one second | |||
it has no VOLs from southbound adjacencies. After the holddown | unless it has no VOLs from southbound adjacencies. After the | |||
timer has expired, it MUST discard all received offers, recompute | holddown timer has expired, it MUST discard all received offers, | |||
DERIVED_LEVEL and announce it to all neighbors. | recompute DERIVED_LEVEL, and announce it to all neighbors. | |||
5. A node MUST reset any adjacency that has changed the level it is | 5. A node MUST reset any adjacency that has changed the level it is | |||
offering and is in _ThreeWay_ state. | offering and is in _ThreeWay_ state. | |||
6. A node that changed its defined level value MUST readvertise its | 6. A node that changed its defined level value MUST re-advertise its | |||
own TIEs (since the new _PacketHeader_ will contain a different | own TIEs (since the new _PacketHeader_ will contain a different | |||
level than before). The sequence number of each TIE MUST be | level than before). The sequence number of each TIE MUST be | |||
increased. | increased. | |||
7. After a level has been derived the node MUST set the | 7. After a level has been derived, the node MUST set the | |||
_not_a_ztp_offer_ on LIEs towards all systems offering a VOL for | _not_a_ztp_offer_ on LIEs towards all systems offering a VOL for | |||
HAL. | HAL. | |||
8. A node that changed its level SHOULD flush from its link state | 8. A node that changed its level SHOULD flush TIEs of all other | |||
database TIEs of all other nodes, otherwise stale information may | nodes from its LSDB; otherwise, stale information may persist on | |||
persist on "direction reversal", i.e., nodes that seemed south | "direction reversal", i.e., nodes that seemed south are now north | |||
are now north or east-west. This will not prevent the correct | or east-west. This will not prevent the correct operation of the | |||
operation of the protocol but could be slightly confusing | protocol but could be slightly confusing operationally. | |||
operationally. | ||||
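Steps 2, 3, and 7 of the procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the neighbor names are hypothetical, the input is assumed to contain only offers that already qualify as VOLs, and holddown, adjacency reset, and TIE re-origination (steps 4 through 6 and 8) are left out.

```python
from typing import Dict, Optional, Set, Tuple


def derive_level(vols: Dict[str, int]) -> Tuple[Optional[int], Set[str]]:
    """Return (DERIVED_LEVEL, neighbors offering HAL).

    `vols` maps a (hypothetical) neighbor name to its Valid Offered Level.
    The second element lists the systems towards which the node would set
    _not_a_ztp_offer_ on its LIEs once the level has been derived.
    """
    if not vols:
        return None, set()               # nothing to derive from yet
    hal = max(vols.values())             # step 2: numerically highest VOL
    derived = max(hal - 1, 0)            # step 3: MAX(HAL-1,0)
    hal_systems = {name for name, lvl in vols.items() if lvl == hal}
    return derived, hal_systems          # step 7 applies to hal_systems


# An offer from "below" arrives first, then one from "above": the node
# abandons the lower derived level in favor of the higher one.
first = derive_level({"south": 22})
later = derive_level({"south": 22, "A": 24})
```

In this example the node first derives level 21 from the neighbor at 22 and, once the offer of 24 arrives, moves to level 23, which illustrates the greedy attachment at the highest possible point.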
A node starting with LEVEL_VALUE being 0 (i.e., it assumes a leaf | A node starting with LEVEL_VALUE being 0 (i.e., it assumes a leaf | |||
function by being configured with the appropriate flags or has a | function by being configured with the appropriate flags or has a | |||
CONFIGURED_LEVEL of 0) MUST follow those additional procedures: | CONFIGURED_LEVEL of 0) MUST follow this additional procedure: | |||
1. It computes HAT per procedures above but does *not* use it to | 1. It computes HAT per the procedures above but does *not* use it to | |||
compute DERIVED_LEVEL. HAT is used to limit adjacency formation | compute DERIVED_LEVEL. HAT is used to limit adjacency formation | |||
per Section 6.2. | per Section 6.2. | |||
It MAY also follow modified procedures: | It MAY also follow this modified procedure: | |||
1. It may pick a different strategy to choose VOL, e.g. use the VOL | 1. It may pick a different strategy to choose VOL, e.g., use the VOL | |||
value with the highest number of VOLs. Such strategies are only | value with the highest number of VOLs. Such strategies are only | |||
possible since the node always remains "at the bottom of the | possible since the node always remains "at the bottom of the | |||
fabric" while another layer could "invert" the fabric by picking | fabric", while another layer could "invert" the fabric by picking | |||
its preferred VOL in a different fashion rather than always | its preferred VOL in a different fashion rather than always | |||
trying to achieve the highest viable level. | trying to achieve the highest viable level. | |||
6.7.5. RIFT ZTP FSM | 6.7.5. RIFT ZTP FSM | |||
This section specifies the precise, normative ZTP FSM and can be | This section specifies the precise, normative ZTP FSM and can be | |||
omitted unless the reader is pursuing an implementation of the | omitted unless the reader is pursuing an implementation of the | |||
protocol. For additional clarity a graphical representation of the | protocol. For additional clarity, a graphical representation of the | |||
ZTP FSM is depicted in Figure 29. It may also be helpful to refer to | ZTP FSM is depicted in Figure 29. It may also be helpful to refer to | |||
the normative schema in Section 7. | the normative schema in Section 7. | |||
Initial state is ComputeBestOffer. | The initial state is ComputeBestOffer. | |||
Enter | Enter | |||
| | | | |||
v | v | |||
+------------------+ | +------------------+ | |||
| ComputeBestOffer | | | ComputeBestOffer | | |||
| |<----+ | | |<----+ | |||
| | | BetterHAL | | | | BetterHAL | |||
| | | BetterHAT | | | | BetterHAT | |||
| | | ChangeLocalConfiguredLevel | | | | ChangeLocalConfiguredLevel | |||
skipping to change at page 101, line 39 ¶ | skipping to change at line 4473 ¶ | |||
| | | ShortTic | | | | ShortTic | |||
| |-----+ | | |-----+ | |||
+------------------+ | +------------------+ | |||
| | | | |||
| LostHAL | | LostHAL | |||
V | V | |||
(HoldingDown) | (HoldingDown) | |||
Figure 29: RIFT ZTP FSM | Figure 29: RIFT ZTP FSM | |||
The following words are used for well-known procedures: | The following terms are used for well-known procedures: | |||
* PUSH Event: queues an event to be executed by the FSM upon exit of | * PUSH Event: queues an event to be executed by the FSM upon exit of | |||
this action | this action | |||
* COMPARE_OFFERS: checks whether based on current offers and held | * COMPARE_OFFERS: checks whether, based on current offers and held | |||
last results, the events BetterHAL/LostHAL/BetterHAT/LostHAT are | last results, the events BetterHAL/LostHAL/BetterHAT/LostHAT are | |||
necessary and returns them | necessary and returns them | |||
* UPDATE_OFFER: store current offer with adjacency holdtime as | * UPDATE_OFFER: store current offer with adjacency holdtime as | |||
lifetime and COMPARE_OFFERS, then PUSH corresponding events | lifetime and COMPARE_OFFERS, then PUSH corresponding events | |||
* LEVEL_COMPUTE: compute best offered or configured level and HAL/ | * LEVEL_COMPUTE: compute best offered or configured level and HAL/ | |||
HAT, if anything changed PUSH ComputationDone | HAT, if anything changed, PUSH ComputationDone | |||
* REMOVE_OFFER: remove the corresponding offer and COMPARE_OFFERS, | * REMOVE_OFFER: remove the corresponding offer and COMPARE_OFFERS, | |||
PUSH corresponding events | PUSH corresponding events | |||
* PURGE_OFFERS: REMOVE_OFFER for all held offers, COMPARE_OFFERS, | * PURGE_OFFERS: REMOVE_OFFER for all held offers, COMPARE_OFFERS, | |||
PUSH corresponding events | PUSH corresponding events | |||
* PROCESS_OFFER: | * PROCESS_OFFER: | |||
1. if no level offered then REMOVE_OFFER | 1. if no level is offered, then REMOVE_OFFER | |||
2. else | 2. else | |||
1. if offered level > leaf then UPDATE_OFFER | a. if offered level > leaf, then UPDATE_OFFER | |||
2. else REMOVE_OFFER | b. else REMOVE_OFFER | |||
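The offer-handling procedures listed above can be sketched as a small table of held offers. This is a non-normative illustration: the class and method names are hypothetical, HAT handling is omitted, and the rule used here for choosing between BetterHAL and LostHAL (a higher or first HAL yields BetterHAL, a lower or vanished HAL yields LostHAL) is an assumption, since the text only says that COMPARE_OFFERS returns the events that are "necessary". Events are returned as strings rather than pushed onto an FSM queue.

```python
from typing import Dict, List, Optional, Tuple

LEAF_LEVEL = 0  # assumed value of _leaf_level_


class OfferTable:
    """Hypothetical store of held offers keyed by System ID."""

    def __init__(self) -> None:
        # system_id -> (offered level, absolute expiry time)
        self._offers: Dict[int, Tuple[int, float]] = {}
        self._last_hal: Optional[int] = None

    def _hal(self) -> Optional[int]:
        return max((lvl for lvl, _ in self._offers.values()), default=None)

    def compare_offers(self) -> List[str]:
        """COMPARE_OFFERS: compare current HAL with the held last result."""
        old, new = self._last_hal, self._hal()
        self._last_hal = new
        if new == old:
            return []
        if new is not None and (old is None or new > old):
            return ["BetterHAL"]
        return ["LostHAL"]

    def update_offer(self, system_id: int, level: int,
                     holdtime: float, now: float) -> List[str]:
        """UPDATE_OFFER: store offer with adjacency holdtime as lifetime."""
        self._offers[system_id] = (level, now + holdtime)
        return self.compare_offers()

    def remove_offer(self, system_id: int) -> List[str]:
        """REMOVE_OFFER: drop the offer, then COMPARE_OFFERS."""
        self._offers.pop(system_id, None)
        return self.compare_offers()

    def process_offer(self, system_id: int, level: Optional[int],
                      holdtime: float, now: float) -> List[str]:
        """PROCESS_OFFER as listed above."""
        if level is None:                       # no level offered
            return self.remove_offer(system_id)
        if level > LEAF_LEVEL:                  # offered level > leaf
            return self.update_offer(system_id, level, holdtime, now)
        return self.remove_offer(system_id)

    def remove_expired(self, now: float) -> List[str]:
        """Drop offers whose lifetime ran out (e.g., on ShortTic)."""
        expired = [s for s, (_, exp) in self._offers.items() if exp <= now]
        for system_id in expired:
            del self._offers[system_id]
        return self.compare_offers()
```

Keying the table by System ID rather than by adjacency reflects the VOL definition, under which the newest offer on any parallel adjacency replaces the one held for the same node.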
States: | States: | |||
* ComputeBestOffer: processes received offers to derive ZTP | * ComputeBestOffer: Processes received offers to derive ZTP | |||
variables | variables. | |||
* HoldingDown: holding down while receiving updates | * HoldingDown: Holding down while receiving updates. | |||
* UpdatingClients: updates other FSMs on the same node with | * UpdatingClients: Updates other FSMs on the same node with | |||
computation results | computation results. | |||
Events: | Events: | |||
* ChangeLocalHierarchyIndications: node locally configured with new | * ChangeLocalHierarchyIndications: Node locally configured with new | |||
leaf flags. | leaf flags. | |||
* ChangeLocalConfiguredLevel: node locally configured with a defined | * ChangeLocalConfiguredLevel: Node locally configured with a defined | |||
level | level. | |||
* NeighborOffer: a new neighbor offer with optional level and | * NeighborOffer: A new neighbor offer with optional level and | |||
neighbor state. | neighbor state. | |||
* BetterHAL: better HAL computed internally. | * BetterHAL: Better HAL computed internally. | |||
* BetterHAT: better HAT computed internally. | * BetterHAT: Better HAT computed internally. | |||
* LostHAL: lost last HAL in computation. | * LostHAL: Lost last HAL in computation. | |||
* LostHAT: lost HAT in computation. | * LostHAT: Lost HAT in computation. | |||
* ComputationDone: computation performed. | * ComputationDone: Computation performed. | |||
* HoldDownExpired: holddown timer expired. | * HoldDownExpired: Holddown timer expired. | |||
* ShortTic: one-second timer tick. This event is provided to the | * ShortTic: One-second timer tick. This event is provided to the | |||
FSM once a second by an implementation-specific mechanism that is | FSM once a second by an implementation-specific mechanism that is | |||
outside the scope of this specification. This event is quietly | outside the scope of this specification. This event is quietly | |||
ignored if the relevant transition does not exist. | ignored if the relevant transition does not exist. | |||
Actions: | Actions: | |||
* on ChangeLocalConfiguredLevel in HoldingDown finishes in | * on ChangeLocalConfiguredLevel in HoldingDown finishes in | |||
ComputeBestOffer: store configured level | ComputeBestOffer: store configured level | |||
* on BetterHAT in HoldingDown finishes in HoldingDown: no action | * on BetterHAT in HoldingDown finishes in HoldingDown: no action | |||
* on ShortTic in HoldingDown finishes in HoldingDown: remove expired | * on ShortTic in HoldingDown finishes in HoldingDown: remove expired | |||
offers and if holddown timer expired PUSH HoldDownExpired | offers, and if holddown timer expired, PUSH HoldDownExpired | |||
* on NeighborOffer in HoldingDown finishes in HoldingDown: | * on NeighborOffer in HoldingDown finishes in HoldingDown: | |||
PROCESS_OFFER | PROCESS_OFFER | |||
* on ComputationDone in HoldingDown finishes in HoldingDown: no | * on ComputationDone in HoldingDown finishes in HoldingDown: no | |||
action | action | |||
* on BetterHAL in HoldingDown finishes in HoldingDown: no action | * on BetterHAL in HoldingDown finishes in HoldingDown: no action | |||
* on LostHAT in HoldingDown finishes in HoldingDown: no action | * on LostHAT in HoldingDown finishes in HoldingDown: no action | |||
skipping to change at page 104, line 6 ¶ | skipping to change at line 4583 ¶ | |||
* on NeighborOffer in ComputeBestOffer finishes in ComputeBestOffer: | * on NeighborOffer in ComputeBestOffer finishes in ComputeBestOffer: | |||
PROCESS_OFFER | PROCESS_OFFER | |||
* on BetterHAT in ComputeBestOffer finishes in ComputeBestOffer: | * on BetterHAT in ComputeBestOffer finishes in ComputeBestOffer: | |||
LEVEL_COMPUTE | LEVEL_COMPUTE | |||
* on ChangeLocalHierarchyIndications in ComputeBestOffer finishes in | * on ChangeLocalHierarchyIndications in ComputeBestOffer finishes in | |||
ComputeBestOffer: store leaf flags and LEVEL_COMPUTE | ComputeBestOffer: store leaf flags and LEVEL_COMPUTE | |||
* on LostHAL in ComputeBestOffer finishes in HoldingDown: if any | * on LostHAL in ComputeBestOffer finishes in HoldingDown: if any | |||
southbound adjacencies present then update holddown timer to | southbound adjacencies present, then update holddown timer to | |||
normal duration else fire holddown timer immediately | normal duration, else fire holddown timer immediately | |||
* on ShortTic in ComputeBestOffer finishes in ComputeBestOffer: | * on ShortTic in ComputeBestOffer finishes in ComputeBestOffer: | |||
remove expired offers | remove expired offers | |||
* on ComputationDone in ComputeBestOffer finishes in | * on ComputationDone in ComputeBestOffer finishes in | |||
UpdatingClients: no action | UpdatingClients: no action | |||
* on ChangeLocalConfiguredLevel in ComputeBestOffer finishes in | * on ChangeLocalConfiguredLevel in ComputeBestOffer finishes in | |||
ComputeBestOffer: store configured level and LEVEL_COMPUTE | ComputeBestOffer: store configured level and LEVEL_COMPUTE | |||
* on BetterHAL in ComputeBestOffer finishes in ComputeBestOffer: | * on BetterHAL in ComputeBestOffer finishes in ComputeBestOffer: | |||
LEVEL_COMPUTE | LEVEL_COMPUTE | |||
* on ShortTic in UpdatingClients finishes in UpdatingClients: remove | * on ShortTic in UpdatingClients finishes in UpdatingClients: remove | |||
expired offers | expired offers | |||
* on LostHAL in UpdatingClients finishes in HoldingDown: if any | * on LostHAL in UpdatingClients finishes in HoldingDown: if any | |||
southbound adjacencies are present then update holddown timer to | southbound adjacencies are present, then update holddown timer to | |||
normal duration else fire holddown timer immediately | normal duration, else fire holddown timer immediately | |||
* on BetterHAT in UpdatingClients finishes in ComputeBestOffer: no | * on BetterHAT in UpdatingClients finishes in ComputeBestOffer: no | |||
action | action | |||
* on BetterHAL in UpdatingClients finishes in ComputeBestOffer: no | * on BetterHAL in UpdatingClients finishes in ComputeBestOffer: no | |||
action | action | |||
* on ChangeLocalConfiguredLevel in UpdatingClients finishes in | * on ChangeLocalConfiguredLevel in UpdatingClients finishes in | |||
ComputeBestOffer: store configured level | ComputeBestOffer: store configured level | |||
skipping to change at page 105, line 40 ¶ | skipping to change at line 4663 ¶ | |||
| | | | | | | | |||
+---------+ | | | +---------+ | | | |||
| | | | | | | | |||
++-++ +---+ | | ++-++ +---+ | | |||
| X | | Y +-+ | | X | | Y +-+ | |||
| 0 | | 0 | | | 0 | | 0 | | |||
+---+ +---+ | +---+ +---+ | |||
Figure 30: Generic ZTP Topology Autoconfigured | Figure 30: Generic ZTP Topology Autoconfigured | |||
In case where the LEAF_ONLY restriction on Y is removed the outcome | In the case where the LEAF_ONLY restriction on Y is removed, the | |||
would be very different however and result in Figure 31. This | outcome would be very different however and result in Figure 31. | |||
demonstrates basically that auto configuration makes miscabling | This basically demonstrates that autoconfiguration makes miscabling | |||
detection hard and with that can lead to undesirable effects in cases | detection hard and, with that, can lead to undesirable effects in | |||
where leaves are not "nailed" by the appropriately configured flags | cases where leaves are not "nailed" by the appropriately configured | |||
and arbitrarily cabled. | flags and arbitrarily cabled. | |||
+---+ | +---+ | |||
| A | | | A | | |||
| 24| | | 24| | |||
++-++ | ++-++ | |||
| | | | | | |||
+--+ +--+ | +--+ +--+ | |||
| | | | | | |||
+--++ ++--+ | +--++ ++--+ | |||
| E | | F | | | E | | F | | |||
skipping to change at page 106, line 41 ¶ | skipping to change at line 4706 ¶ | |||
| X +--------+ | | X +--------+ | |||
| 0 | | | 0 | | |||
+---+ | +---+ | |||
Figure 31: Generic ZTP Topology Autoconfigured | Figure 31: Generic ZTP Topology Autoconfigured | |||
6.8. Further Mechanisms | 6.8. Further Mechanisms | |||
6.8.1. Route Preferences | 6.8.1. Route Preferences | |||
Since RIFT distinguishes between different route types such as e.g. | Since RIFT distinguishes between different route types, such as | |||
external routes from other protocols and additionally advertises | external routes from other protocols, and additionally advertises | |||
special types of routes on disaggregation, the protocol MUST tie- | special types of routes on disaggregation, the protocol MUST tie- | |||
break internally different types on a clear preference scale to | break internally different types on a clear preference scale to | |||
prevent traffic loss or loops. The preferences are given in the | prevent traffic loss or loops. The preferences are given in the | |||
schema type _RouteType_. | schema type _RouteType_. | |||
Table 5 contains the route type as derived from the TIE type carrying | Table 5 contains the route type as derived from the TIE type carrying | |||
it. Entries are sorted from the most preferred route type to the | it. Entries are sorted from the most preferred route type to the | |||
least preferred route type. | least preferred route type. | |||
+==================================+======================+ | +==================================+======================+ | |||
skipping to change at page 107, line 33 ¶ | skipping to change at line 4745 ¶ | |||
| South External Prefix and South | SouthExternalPrefix | | | South External Prefix and South | SouthExternalPrefix | | |||
| Positive External Disaggregation | | | | Positive External Disaggregation | | | |||
+----------------------------------+----------------------+ | +----------------------------------+----------------------+ | |||
| South Negative Prefix | NegativeSouthPrefix | | | South Negative Prefix | NegativeSouthPrefix | | |||
+----------------------------------+----------------------+ | +----------------------------------+----------------------+ | |||
Table 5: TIEs and Contained Route Types | Table 5: TIEs and Contained Route Types | |||
6.8.2. Overload Bit | 6.8.2. Overload Bit | |||
Overload attribute is specified in the packet encoding schema | The overload attribute is specified in the packet encoding schema | |||
(Section 7) in the _overload_ flag. | (Section 7) in the _overload_ flag. | |||
The overload flag MUST be respected by all necessary SPF | The overload flag MUST be respected by all necessary SPF | |||
computations. A node with the overload flag set SHOULD advertise all | computations. A node with the overload flag set SHOULD advertise all | |||
locally hosted prefixes both northbound and southbound, all other | locally hosted prefixes, both northbound and southbound; all other | |||
southbound prefixes SHOULD NOT be advertised. | southbound prefixes SHOULD NOT be advertised. | |||
Leaf nodes SHOULD set the overload attribute on all originated Node | Leaf nodes SHOULD set the overload attribute on all originated Node | |||
TIEs. If spine nodes were to forward traffic not intended for the | TIEs. If spine nodes were to forward traffic not intended for the | |||
local node, the leaf node would not be able to prevent routing/ | local node, the leaf node would not be able to prevent routing/ | |||
forwarding loops as it does not have the necessary topology | forwarding loops as it does not have the necessary topology | |||
information to do so. | information to do so. | |||
6.8.3. Optimized Route Computation on Leaves | 6.8.3. Optimized Route Computation on Leaves | |||
Leaf nodes only have visibility to directly connected nodes and | Leaf nodes only have visibility to directly connected nodes and | |||
therefore are not required to run "full" SPF computations. Instead, | therefore are not required to run "full" SPF computations. Instead, | |||
prefixes from neighboring nodes can be gathered to run a "partial" | prefixes from neighboring nodes can be gathered to run a "partial" | |||
SPF computation in order to build the routing table. | SPF computation in order to build the routing table. | |||
Leaf nodes SHOULD only hold their own N-TIEs, and in cases of L2L | Leaf nodes SHOULD only hold their own N-TIEs and, in cases of L2L | |||
implementations, the N-TIEs of their East/West neighbors. Leaf nodes | implementations, the N-TIEs of their East-West neighbors. Leaf nodes | |||
MUST hold all S-TIEs from their neighbors. | MUST hold all S-TIEs from their neighbors. | |||
Normally, a full network graph is created based on local N-TIEs and | Normally, a full network graph is created based on local N-TIEs and | |||
remote S-TIEs that it receives from neighbors, at which time, | remote S-TIEs that it receives from neighbors, at which time, | |||
necessary SPF computations are performed. Instead, leaf nodes can | necessary SPF computations are performed. Instead, leaf nodes can | |||
simply compute the minimum cost and next-hop set of each leaf | simply compute the minimum cost and next-hop set of each leaf | |||
neighbor by examining its local adjacencies. Associated N-TIEs are | neighbor by examining its local adjacencies. Associated N-TIEs are | |||
used to determine bi-directionality and derive the next-hop set. | used to determine bidirectionality and derive the next-hop set. The | |||
Cost is then derived from the minimum cost of the local adjacency to | cost is then derived from the minimum cost of the local adjacency to | |||
the neighbor and the prefix cost. | the neighbor and the prefix cost. | |||
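The partial computation described above can be sketched as follows. This is a minimal illustration and not part of the specification: the `Adjacency` type, its field names, and the `leaf_route` helper are hypothetical. The sketch assumes that bidirectionality has already been checked against the associated N-TIEs and that the adjacency cost and the prefix cost are combined by addition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjacency:
    neighbor: str        # hypothetical neighbor identifier
    link: str            # hypothetical local interface name
    cost: int            # cost of this local adjacency
    bidirectional: bool  # assumed to be verified from the N-TIEs already


def leaf_route(adjacencies, neighbor, prefix_cost):
    """Return (cost, next-hop set) towards a prefix attached to `neighbor`,
    or None when no usable adjacency to that neighbor exists."""
    usable = [a for a in adjacencies
              if a.neighbor == neighbor and a.bidirectional]
    if not usable:
        return None
    # Minimum cost over the (possibly parallel) adjacencies to the neighbor.
    min_cost = min(a.cost for a in usable)
    # The next-hop set is every adjacency that achieves the minimum.
    next_hops = {a.link for a in usable if a.cost == min_cost}
    # Assumption: the route cost is adjacency cost plus prefix cost.
    return min_cost + prefix_cost, next_hops
```

Parallel links of equal minimum cost all end up in the next-hop set, while a more expensive or unidirectional link to the same neighbor is left out.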
Leaf nodes would then attach necessary prefixes as described in | Leaf nodes would then attach necessary prefixes as described in | |||
Section 6.6. | Section 6.6. | |||
6.8.4. Mobility | 6.8.4. Mobility | |||
The RIFT control plane MUST maintain the real time status of every | The RIFT control plane MUST maintain the real time status of every | |||
prefix, to which port it is attached, and to which leaf node that | prefix, to which port it is attached, and to which leaf node that | |||
port belongs. This is still true in cases of IP mobility where the | port belongs. This is still true in cases of IP mobility where the | |||
point of attachment may change several times a second. | point of attachment may change several times a second. | |||
There are two classic approaches to explicitly maintain this | There are two classic approaches to explicitly maintain this | |||
information, "timestamp" and "sequence counter" as follows: | information, "timestamp" and "sequence counter", which are defined as | |||
follows: | ||||
timestamp: | timestamp: | |||
With this method, the infrastructure SHOULD record the precise | With this method, the infrastructure SHOULD record the precise | |||
time at which the movement is observed. One key advantage of this | time at which the movement is observed. One key advantage of this | |||
technique is that it has no dependency on the mobile device. One | technique is that it has no dependency on the mobile device. One | |||
drawback is that the infrastructure MUST be precisely synchronized | drawback is that the infrastructure MUST be precisely synchronized | |||
in order to be able to compare timestamps as the points of | in order to be able to compare timestamps as the points of | |||
attachment change. This could be accomplished by utilizing | attachment change. This could be accomplished by utilizing the | |||
Precision Time Protocol (PTP) IEEE Std. 1588 [IEEEstd1588] or | Precision Time Protocol (PTP) (IEEE Std. 1588 [IEEEstd1588] or | |||
802.1AS [IEEEstd8021AS] which is designed for bridged LANs. Both | 802.1AS [IEEEstd8021AS]), which is designed for bridged LANs. | |||
the precision of the synchronization protocol and the resolution | Both the precision of the synchronization protocol and the | |||
of the timestamp must beat the shortest possible roaming time on | resolution of the timestamp must beat the shortest possible | |||
the fabric. Another drawback is that the presence of a mobile | roaming time on the fabric. Another drawback is that the presence | |||
device may only be observed asynchronously, such as when it starts | of a mobile device may only be observed asynchronously, such as | |||
using an IP protocol like ARP [RFC0826], IPv6 Neighbor Discovery | when it starts using an IP protocol like ARP [RFC0826], IPv6 | |||
[RFC4861], IPv6 Stateless Address Configuration [RFC4862], DHCP | Neighbor Discovery [RFC4861], IPv6 Stateless Address Configuration | |||
[RFC2131], or DHCPv6 [RFC8415]. | [RFC4862], DHCP [RFC2131], or DHCPv6 [RFC8415]. | |||
sequence counter: | sequence counter: | |||
With this method, a mobile device notifies its point of attachment | With this method, a mobile device notifies its point of attachment | |||
on arrival with a sequence counter that is incremented upon each | on arrival with a sequence counter that is incremented upon each | |||
movement. On the positive side, this method does not have a | movement. On the positive side, this method does not have a | |||
dependency on a precise sense of time, since the sequence of | dependency on a precise sense of time, since the sequence of | |||
movements is kept in order by the mobile device. The disadvantage | movements is kept in order by the mobile device. The disadvantage | |||
of this approach is the need for support for protocols that may be | of this approach is the need for support for protocols that may be | |||
used by the mobile device to register its presence to the leaf | used by the mobile device to register its presence to the leaf | |||
node with the capability to provide a sequence counter. Well- | node with the capability to provide a sequence counter. Well- | |||
known issues with sequence counters such as wrapping and | known issues with sequence counters, such as wrapping and | |||
comparison rules MUST be addressed properly. Sequence numbers | comparison rules, MUST be addressed properly. Sequence numbers | |||
MUST be compared by a single homogeneous source to make operation | MUST be compared by a single homogeneous source to make operation | |||
feasible. Sequence number comparison from multiple heterogeneous | feasible. Sequence number comparison from multiple heterogeneous | |||
sources would be extremely difficult to implement. | sources would be extremely difficult to implement. | |||
RIFT supports a hybrid approach by using an optional | RIFT supports a hybrid approach by using an optional | |||
'PrefixSequenceType' attribute (that is also called a | 'PrefixSequenceType' attribute (which is also called a | |||
_monotonic_clock_ in the schema) that consists of a timestamp and | _monotonic_clock_ in the schema) that consists of a timestamp and | |||
optional sequence number field. In case of a negatively distributed | optional sequence number field. In case of a negatively distributed | |||
prefix this attribute MUST NOT be included by the originator and it | prefix, this attribute MUST NOT be included by the originator and it | |||
MUST be ignored by all nodes during computation. When this attribute | MUST be ignored by all nodes during computation. When this attribute | |||
is present (observe that per data schema the attribute itself is | is present (observe that per data schema, the attribute itself is | |||
optional but in case it is included the 'timestamp' field is | optional, but in case it is included, the "timestamp" field is | |||
required): | required): | |||
* The leaf node MAY advertise a timestamp of the latest sighting of | * The leaf node MAY advertise a timestamp of the latest sighting of | |||
a prefix, e.g., by snooping IP protocols or the node using the | a prefix, e.g., by snooping IP protocols or the node using the | |||
time at which it advertised the prefix. RIFT transports the | time at which it advertised the prefix. RIFT transports the | |||
timestamp within the desired Prefix North TIEs as [IEEEstd1588] | timestamp within the desired North Prefix TIEs as the | |||
timestamp. | [IEEEstd1588] timestamp. | |||
* RIFT MAY interoperate with "Registration Extensions for 6LoWPAN | * RIFT MAY interoperate with "Registration Extensions for 6LoWPAN | |||
Neighbor Discovery" [RFC8505], which provides a method for | Neighbor Discovery" [RFC8505], which provides a method for | |||
registering a prefix with a sequence number called a Transaction | registering a prefix with a sequence number called a Transaction | |||
ID (TID). In such cases, RIFT SHOULD transport the derived TID | ID (TID). In such cases, RIFT SHOULD transport the derived TID | |||
without modification. | without modification. | |||
* RIFT also defines an abstract negative clock (ASNC) (also called | * RIFT also defines an abstract negative clock (ASNC) (also called | |||
an 'undefined' clock). The ASNC MUST be considered older than any | an "undefined" clock). The ASNC MUST be considered older than any | |||
other defined clock. By default, when a node receives a Prefix | other defined clock. By default, when a node receives a North | |||
North TIE that does not contain a 'PrefixSequenceType' attribute, | Prefix TIE that does not contain a 'PrefixSequenceType' attribute, | |||
it MUST interpret the absence as the ASNC. | it MUST interpret the absence as the ASNC. | |||
* Any prefix present on the fabric in multiple nodes that have the | * Any prefix present on the fabric in multiple nodes that have the | |||
*same* clock is considered as anycast. | *same* clock is considered as anycast. | |||
* RIFT specification assumes that all nodes are being synchronized | * The RIFT specification assumes that all nodes are being | |||
to within 200 milliseconds. This is achievable | synchronized to within 200 milliseconds. This is | |||
through the use of NTP [RFC5905]. An implementation MAY provide a | achievable through the use of NTP [RFC5905]. An implementation | |||
way to reconfigure a domain to a different value, and provides for | MAY provide a way to reconfigure a domain to a different value and | |||
this purpose a variable called MAXIMUM_CLOCK_DELTA. | provides a variable called MAXIMUM_CLOCK_DELTA for this purpose. | |||
6.8.4.1. Clock Comparison | 6.8.4.1. Clock Comparison | |||
All monotonic clock values MUST be compared to each other using the | All monotonic clock values MUST be compared to each other using the | |||
following rules: | following rules: | |||
1. The ASNC is older than any other value except ASNC *and* | 1. The ASNC is older than any other value except ASNC *and* | |||
2. Clocks with timestamp differing by more than MAXIMUM_CLOCK_DELTA | 2. Clocks with timestamps differing by more than MAXIMUM_CLOCK_DELTA | |||
are comparable by using the timestamps only *and* | are comparable by using the timestamps only *and* | |||
3. Clocks with timestamps differing by less than MAXIMUM_CLOCK_DELTA | 3. Clocks with timestamps differing by less than MAXIMUM_CLOCK_DELTA | |||
are comparable by using their TIDs only *and* | are comparable by using their TIDs only, *and* | |||
4. An undefined TID is always older than any other TID *and* | 4. An undefined TID is always older than any other TID, *and* | |||
5. TIDs are compared using rules of [RFC8505]. | 5. TIDs are compared using rules of [RFC8505]. | |||
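The five rules above can be read as a single comparison function. The following is a sketch under stated assumptions, not a normative implementation. The `Clock` type, the use of `None` for the ASNC and for an undefined TID, and the millisecond unit are illustrative choices. The default `tid_cmp_octet` comparator is a hypothetical stand-in for the [RFC8505] rules, using a plain half-range wraparound comparison on one octet. The rules do not say what happens when two timestamps differ by exactly MAXIMUM_CLOCK_DELTA; this sketch falls through to the TID comparison in that case.

```python
from dataclasses import dataclass
from typing import Optional

MAXIMUM_CLOCK_DELTA_MS = 200  # default synchronization bound from the text


@dataclass(frozen=True)
class Clock:
    timestamp_ms: int
    tid: Optional[int] = None  # None models an undefined TID


def tid_cmp_octet(a: int, b: int) -> int:
    """Hypothetical stand-in for the [RFC8505] TID comparison."""
    d = (a - b) & 0xFF
    if d == 0:
        return 0
    if d == 0x80:
        # Exactly half the range apart: order cannot be determined.
        raise ValueError("TIDs are not comparable")
    return 1 if d < 0x80 else -1


def compare_clocks(a, b, tid_cmp=tid_cmp_octet) -> int:
    """Return 1 if a is newer, -1 if a is older, 0 if equal.
    A clock of None models the ASNC."""
    # Rule 1: the ASNC is older than any value except another ASNC.
    if a is None or b is None:
        return (a is not None) - (b is not None)
    # Rule 2: far-apart timestamps are compared on their own.
    delta = a.timestamp_ms - b.timestamp_ms
    if abs(delta) > MAXIMUM_CLOCK_DELTA_MS:
        return 1 if delta > 0 else -1
    # Rule 4: an undefined TID is older than any defined TID.
    if a.tid is None or b.tid is None:
        return (a.tid is not None) - (b.tid is not None)
    # Rules 3 and 5: close timestamps are ordered by their TIDs.
    return tid_cmp(a.tid, b.tid)
```

Passing the TID comparator as a parameter keeps the timestamp logic separate from the sequence-counter arithmetic, so the stand-in can be swapped for an exact implementation of the referenced rules.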
6.8.4.2. Interaction between Time Stamps and Sequence Counters | 6.8.4.2. Interaction Between Timestamps and Sequence Counters | |||
For attachment changes that occur less frequently (e.g., once per | For attachment changes that occur less frequently (e.g., once per | |||
second), the timestamp that the RIFT infrastructure captures should | second), the timestamp that the RIFT infrastructure captures should | |||
be enough to determine the most current discovery. If the point of | be enough to determine the most current discovery. If the point of | |||
attachment changes faster than the maximum drift of the time stamping | attachment changes faster than the maximum drift of the timestamping | |||
mechanism (i.e., MAXIMUM_CLOCK_DELTA), then a sequence number SHOULD | mechanism (i.e., MAXIMUM_CLOCK_DELTA), then a sequence number SHOULD | |||
be used to enable necessary precision to determine currency. | be used to enable necessary precision to determine currency. | |||
The sequence counter in [RFC8505] is encoded as one octet and wraps | The sequence counter in [RFC8505] is encoded as one octet and wraps | |||
around using Appendix A. | around using the arithmetic defined in Appendix A. | |||
Within the resolution of MAXIMUM_CLOCK_DELTA, sequence counter values | Within the resolution of MAXIMUM_CLOCK_DELTA, sequence counter values | |||
captured during 2 sequential iterations of the same timestamp SHOULD | captured during 2 sequential iterations of the same timestamp SHOULD | |||
be comparable. This means that with default values, a node may move | be comparable. This means that with default values, a node may move | |||
up to 127 times in a 200-millisecond period and the clocks will | up to 127 times in a 200-millisecond period and the clocks will | |||
remain comparable. This allows the RIFT infrastructure to explicitly | remain comparable. This allows the RIFT infrastructure to explicitly | |||
assert the most up-to-date advertisement. | assert the most up-to-date advertisement. | |||
6.8.4.3. Anycast vs. Unicast | 6.8.4.3. Anycast vs. Unicast | |||
A unicast prefix can be attached to at most one leaf, whereas an | A unicast prefix can be attached to one leaf at most, whereas an | |||
anycast prefix may be reachable via more than one leaf. | anycast prefix may be reachable via more than one leaf. | |||
If a monotonic clock attribute is provided on the prefix, then the | If a monotonic clock attribute is provided on the prefix, then the | |||
prefix with the *newest* clock value is strictly preferred. An | prefix with the *newest* clock value is strictly preferred. An | |||
anycast prefix does not carry a clock or all clock attributes MUST be | anycast prefix does not carry a clock, or all clock attributes MUST | |||
the same under the rules of Section 6.8.4.1. | be the same under the rules of Section 6.8.4.1. | |||
It is important that in mobility events the leaf is re-flooding as | In mobility events, it is important that the leaf is reflooding as | |||
quickly as possible to communicate the absence of the prefix that | quickly as possible to communicate the absence of the prefix that | |||
moved. | moved. | |||
Without support for [RFC8505] movements on the fabric within | Without support for [RFC8505], movements on the fabric within | |||
intervals smaller than 100msec will be interpreted as anycast. | intervals smaller than 100 msec will be interpreted as anycast. | |||
6.8.4.4. Overlays and Signaling | 6.8.4.4. Overlays and Signaling | |||
RIFT is agnostic to any overlay technologies and their associated | RIFT is agnostic to any overlay technologies and their associated | |||
control and transports that run on top of it (e.g. VXLAN). It is | control and transports that run on top of it (e.g., Virtual | |||
expected that leaf nodes and possibly ToF nodes can perform necessary | eXtensible Local Area Network (VXLAN)). It is expected that leaf | |||
data plane encapsulation. | nodes and possibly ToF nodes can perform necessary data plane | |||
encapsulation. | ||||
In the context of mobility, overlays provide another possible | In the context of mobility, overlays provide another possible | |||
solution to avoid injecting mobile prefixes into the fabric as well | solution to avoid injecting mobile prefixes into the fabric as well | |||
as improving scalability of the deployment. It makes sense to | as improving scalability of the deployment. It makes sense to | |||
consider overlays for mobility solutions in IP fabrics. As an | consider overlays for mobility solutions in IP fabrics. As an | |||
example, a mobility protocol such as LISP [RFC9300] [RFC9301] may | example, a mobility protocol such as the Locator/ID Separation | |||
inform the ingress leaf of the location of the egress leaf in real | Protocol (LISP) [RFC9300] [RFC9301] may inform the ingress leaf of | |||
time. | the location of the egress leaf in real time. | |||
Another possibility is to consider that mobility as an underlay | Another possibility is to consider that mobility is an underlay | |||
service and support it in RIFT to an extent. The load on the fabric | service and support it in RIFT to an extent. The load on the fabric | |||
increases with the amount of mobility obviously since a move forces | increases with the amount of mobility since a move forces flooding | |||
flooding and computation on all nodes in the scope of the move so | and computation on all nodes in the scope of the move so tunneling | |||
tunneling from leaf to the ToF may be desired to speed up convergence | from the leaf to the ToF may be desired to speed up convergence | |||
times. | times. | |||
6.8.5. Key/Value (KV) Store | 6.8.5. Key/Value (KV) Store | |||
6.8.5.1. Southbound | 6.8.5.1. Southbound | |||
RIFT supports the southbound distribution of key-value pairs that can | RIFT supports the southbound distribution of key-value pairs that can | |||
be used to distribute information to facilitate higher levels of | be used to distribute information to facilitate higher levels of | |||
functionality (e.g. distribution of configuration information). KV | functionality (e.g., distribution of configuration information). KV | |||
South TIEs may arrive from multiple nodes, and a receiving node therefore MUST execute | South TIEs may arrive from multiple nodes, and a receiving node therefore MUST execute | |||
the following tie-breaking rules for each key: | the following tie-breaking rules for each key: | |||
1. Only KV TIEs received from nodes to which a bi-directional | 1. Only KV TIEs received from nodes to which a bidirectional | |||
adjacency exists MUST be considered. | adjacency exists MUST be considered. | |||
2. Among all valid KV South TIEs that contain the same key, the | 2. Among all valid KV South TIEs that contain the same key, the | |||
value within the South TIE with the highest level will be | value within the South TIE with the highest level will be | |||
preferred. If the levels are identical, the highest originating | preferred. If the levels are identical, the highest originating | |||
System ID will be preferred. In the case of overlapping keys in | System ID will be preferred. In the case of overlapping keys in | |||
the winning South TIE, the behavior is undefined. | the winning South TIE, the behavior is undefined. | |||
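The two tie-breaking rules above can be illustrated with a short sketch. The `KVSouthTIE` type, its field names, and the `select_kv` helper are hypothetical and exist only for this example. The set of neighbors with a bidirectional adjacency is assumed to be known already. Because each TIE's keys are held in a dictionary here, duplicate keys within one TIE (the undefined case) cannot occur in this model.

```python
from dataclasses import dataclass, field


@dataclass
class KVSouthTIE:
    originator: int  # originating System ID
    level: int       # level of the originating node
    kvs: dict = field(default_factory=dict)  # key -> value


def select_kv(ties, bidirectional_neighbors):
    """Pick one value per key from the received KV South TIEs."""
    winners = {}  # key -> ((level, system_id), value)
    for tie in ties:
        # Rule 1: only consider TIEs from bidirectional adjacencies.
        if tie.originator not in bidirectional_neighbors:
            continue
        # Rule 2: highest level wins, then highest System ID.
        rank = (tie.level, tie.originator)
        for key, value in tie.kvs.items():
            if key not in winners or rank > winners[key][0]:
                winners[key] = (rank, value)
    return {key: value for key, (_, value) in winners.items()}
```

Comparing the `(level, originator)` tuple applies both preferences in one step, since Python orders tuples element by element.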
Consider that if a node goes down, nodes south of it will lose | Consider that if a node goes down, nodes south of it will lose | |||
associated adjacencies causing them to disregard corresponding KVs. | associated adjacencies, causing them to disregard corresponding KVs. | |||
New KV South TIEs are advertised to prevent stale information being | New KV South TIEs are advertised to prevent stale information being | |||
used by nodes that are further south. KV advertisements southbound | used by nodes that are further south. KV advertisements southbound | |||
are not a result of independent computation by every node over the | are not a result of independent computation by every node over the | |||
same set of South TIEs, but a diffused computation. | same set of South TIEs but a diffused computation. | |||
6.8.5.2. Northbound | 6.8.5.2. Northbound | |||
Certain use cases necessitate distribution of essential KV | Certain use cases necessitate distribution of essential KV | |||
information that is generated by the leaves in the northbound | information that is generated by the leaves in the northbound | |||
direction. Such information is flooded in KV North TIEs. Since the | direction. Such information is flooded in KV North TIEs. Since the | |||
originator of the KV North TIEs is preserved during flooding, the | originator of the KV North TIEs is preserved during flooding, the | |||
corresponding mechanism will define, if necessary, tie-breaking rules | corresponding mechanism will define, if necessary, tie-breaking rules | |||
depending on the semantics of the information. | depending on the semantics of the information. | |||
Only KV TIEs from nodes that are reachable via multiplane | Only KV TIEs from nodes that are reachable via multi-plane | |||
reachability computation mentioned in Section 6.5.2.3 SHOULD be | reachability computation mentioned in Section 6.5.2.3 SHOULD be | |||
considered. | considered. | |||
6.8.6. Interactions with BFD | 6.8.6. Interactions with BFD | |||
RIFT MAY incorporate BFD [RFC5881] to react quickly to link failures. | RIFT MAY incorporate Bidirectional Forwarding Detection (BFD) | |||
In such case, the following procedures are introduced: | [RFC5881] to react quickly to link failures. In such case, the | |||
following procedures are introduced: | ||||
After RIFT _ThreeWay_ hello adjacency convergence a BFD session | 1. After RIFT _ThreeWay_ hello adjacency convergence, a BFD session | |||
MAY be formed automatically between the RIFT endpoints without | MAY be formed automatically between the RIFT endpoints without | |||
further configuration using the exchanged discriminators that are | further configuration using the exchanged discriminators that are | |||
equal to the _local_id_ in the _LIEPacket_. The capability of the | equal to the _local_id_ in the _LIEPacket_. The capability of the | |||
remote side to support BFD is carried in the LIEs in | remote side to support BFD is carried in the LIEs in | |||
_LinkCapabilities_. | _LinkCapabilities_. | |||
In case an established BFD session goes Down after it was Up, RIFT | 2. In case an established BFD session goes Down after it was Up, | |||
adjacency SHOULD be re-initialized and subsequently started from | RIFT adjacency SHOULD be re-initialized and subsequently started | |||
Init after it receives a consecutive BFD Up. | from Init after it receives a consecutive BFD Up. | |||
In case of parallel links between nodes each link MAY run its own | 3. In case of parallel links between nodes, each link MAY run its | |||
independent BFD session or they MAY share a session. The specific | own independent BFD session or they MAY share a session. The | |||
manner in which this is implemented is outside the scope of this | specific manner in which this is implemented is outside the scope | |||
document. | of this document. | |||
If link identifiers or BFD capabilities change, both the LIE and | 4. If link identifiers or BFD capabilities change, both the LIE and | |||
any BFD sessions SHOULD be brought down and back up again. In | any BFD sessions SHOULD be brought down and back up again. In | |||
case only the advertised capabilities change, the node MAY choose | case only the advertised capabilities change, the node MAY choose | |||
to persist the BFD session. | to persist the BFD session. | |||
Multiple RIFT instances MAY choose to share a single BFD session, | 5. Multiple RIFT instances MAY choose to share a single BFD session; | |||
in such cases the behavior for which discriminators are used is | in such cases, the behavior for which discriminators are used is | |||
undefined. However, RIFT MAY advertise the same link ID for the | undefined. However, RIFT MAY advertise the same link ID for the | |||
same interface in multiple instances to "share" discriminators. | same interface in multiple instances to "share" discriminators. | |||
The BFD TTL follows [RFC5082]. | 6. The BFD TTL follows [RFC5082]. | |||
6.8.7. Fabric Bandwidth Balancing | 6.8.7. Fabric Bandwidth Balancing | |||
A well understood problem in fabrics is that, in case of link | A well understood problem in fabrics is that, in case of link | |||
failures, it would be ideal to rebalance how much traffic is sent to | failures, it would be ideal to rebalance how much traffic is sent to | |||
switches in the next level based on available ingress and egress | switches in the next level based on the available ingress and egress | |||
bandwidth. | bandwidth. | |||
RIFT supports a light-weight mechanism that can deal with the problem | RIFT supports a light-weight mechanism that can deal with the problem | |||
based on the fact that RIFT is loop-free. | based on the fact that RIFT is loop-free. | |||
6.8.7.1. Northbound Direction | 6.8.7.1. Northbound Direction | |||
Every RIFT node SHOULD compute the amount of northbound bandwidth | Every RIFT node SHOULD compute the amount of northbound bandwidth | |||
available through neighbors at a higher level and modify the distance | available through neighbors at a higher level and modify the distance | |||
received on default route from these neighbors. The bandwidth is | received on the default route from these neighbors. The bandwidth is | |||
advertised in _NodeNeighborsTIEElement_ element which represents the | advertised in the _NodeNeighborsTIEElement_ element, which represents | |||
sum of the bandwidths of all the parallel links to a neighbor. | the sum of the bandwidths of all the parallel links to a neighbor. | |||
Default routes with differing distances SHOULD be used to support | Default routes with differing distances SHOULD be used to support | |||
weighted ECMP forwarding. Such a distance is called Bandwidth | weighted ECMP forwarding. Such a distance is called Bandwidth | |||
Adjusted Distance (BAD). This is best illustrated by a simple | Adjusted Distance (BAD). This is best illustrated by a simple | |||
example. | example. | |||
100 x 100 100 MBits | 100 x 100 100 Mbit/s | |||
| x | | | | x | | | |||
+-+---+-+ +-+---+-+ | +-+---+-+ +-+---+-+ | |||
| | | | | | | | | | |||
|Spin111| |Spin112| | |Spin111| |Spin112| | |||
+-+---+++ ++----+++ | +-+---+++ ++----+++ | |||
|x || || || | |x || || || | |||
|| |+---------------+ || | || |+---------------+ || | |||
|| +---------------+| || | || +---------------+| || | |||
|| || || || | || || || || | |||
|| || || || | || || || || | |||
-----All Links 10 MBit------- | -----All Links 10 Mbit/s----- | |||
|| || || || | || || || || | |||
|| || || || | || || || || | |||
|| +------------+| || || | || +------------+| || || | |||
|| |+------------+ || || | || |+------------+ || || | |||
|x || || || | |x || || || | |||
+-+---+++ +--++-+++ | +-+---+++ +--++-+++ | |||
| | | | | | | | | | |||
|Leaf111| |Leaf112| | |Leaf111| |Leaf112| | |||
+-------+ +-------+ | +-------+ +-------+ | |||
Figure 32: Balancing Bandwidth | Figure 32: Balancing Bandwidth | |||
Figure 32 depicts an example topology where links between leaf and | Figure 32 depicts an example topology where links between leaf and | |||
spine nodes are 10 MBit/s and links from spine nodes northbound are | spine nodes are 10 Mbit/s and links from spine nodes northbound are | |||
100 MBit/s. It includes parallel link failure between Leaf 111 and | 100 Mbit/s. It includes parallel link failure between Leaf 111 and | |||
Spine 111 and as a result, Leaf 111 wants to forward more traffic | Spine 111, and as a result, Leaf 111 wants to forward more traffic | |||
toward Spine 112. Additionally, it includes as well an uplink | towards Spine 112. Additionally, it includes an uplink failure on | |||
failure on Spine 111. | Spine 111. | |||
The local modification of the received default route distance from | The local modification of the received default route distance from | |||
upper level is achieved by running a relatively simple algorithm | the upper level is achieved by running a relatively simple algorithm | |||
where the bandwidth is weighted exponentially, while the distance on | where the bandwidth is weighted exponentially, while the distance on | |||
the default route represents a multiplier for the bandwidth weight | the default route represents a multiplier for the bandwidth weight | |||
for easy operational adjustments. | for easy operational adjustments. | |||
On a node, L, use Node TIEs to compute from each non-overloaded | On a node, L, use Node TIEs to compute 3 values from each non- | |||
northbound neighbor N to compute 3 values: | overloaded northbound neighbor, N: | |||
L_N_u: sum of the bandwidth available from L to N (to account for | 1. L_N_u: sum of the bandwidth available from L to N (to account for | |||
parallel links) | parallel links) | |||
N_u: sum of the uplink bandwidth available on N | 2. N_u: sum of the uplink bandwidth available on N | |||
T_N_u: L_N_u * OVERSUBSCRIPTION_CONSTANT + N_u | 3. T_N_u: L_N_u * OVERSUBSCRIPTION_CONSTANT + N_u | |||
For all T_N_u determine the corresponding M_N_u as | For all T_N_u, determine the corresponding M_N_u as | |||
log_2(next_power_2(T_N_u)) and determine MAX_M_N_u as maximum value | log_2(next_power_2(T_N_u)) and determine MAX_M_N_u as the maximum | |||
of all such M_N_u values. | value of all such M_N_u values. | |||
For each advertised default route from a node N modify the advertised | For each advertised default route from a node N, modify the | |||
distance D to BAD = D * (1 + MAX_M_N_u - M_N_u) and use BAD instead | advertised distance D to BAD = D * (1 + MAX_M_N_u - M_N_u) and use | |||
of distance D to weight balance default forwarding towards N. | BAD instead of distance D to balance the weight of the default | |||
forwarding towards N. | ||||
For the example above, a simple table of values will help in | For the example above, a simple table of values will help in | |||
understanding of the concept. The implicit assumption here is that | understanding the concept. The implicit assumption here is that all | |||
all default route distances are advertised with D=1 and that | default route distances are advertised with D=1 and that | |||
OVERSUBSCRIPTION_CONSTANT = 1. | OVERSUBSCRIPTION_CONSTANT=1. | |||
+=========+===========+=======+=======+=====+ | +=========+===========+=======+=======+=====+ | |||
| Node | N | T_N_u | M_N_u | BAD | | | Node | N | T_N_u | M_N_u | BAD | | |||
+=========+===========+=======+=======+=====+ | +=========+===========+=======+=======+=====+ | |||
| Leaf111 | Spine 111 | 110 | 7 | 2 | | | Leaf111 | Spine 111 | 110 | 7 | 2 | | |||
+---------+-----------+-------+-------+-----+ | +---------+-----------+-------+-------+-----+ | |||
| Leaf111 | Spine 112 | 220 | 8 | 1 | | | Leaf111 | Spine 112 | 220 | 8 | 1 | | |||
+---------+-----------+-------+-------+-----+ | +---------+-----------+-------+-------+-----+ | |||
| Leaf112 | Spine 111 | 120 | 7 | 2 | | | Leaf112 | Spine 111 | 120 | 7 | 2 | | |||
+---------+-----------+-------+-------+-----+ | +---------+-----------+-------+-------+-----+ | |||
| Leaf112 | Spine 112 | 220 | 8 | 1 | | | Leaf112 | Spine 112 | 220 | 8 | 1 | | |||
+---------+-----------+-------+-------+-----+ | +---------+-----------+-------+-------+-----+ | |||
Table 6: BAD Computation | Table 6: BAD Computation | |||
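The BAD values in Table 6 can be reproduced with a few lines of code. This is a minimal sketch, not a reference implementation: the function names and the input layout are illustrative, and next_power_2 is assumed to return the smallest power of two that is greater than or equal to its argument. The bandwidth inputs in the usage note are read off Figure 32 (failed links excluded) and are consistent with the T_N_u column of the table.

```python
def m_value(t_n_u: int) -> int:
    """log_2(next_power_2(T_N_u)) for a positive integer T_N_u."""
    if t_n_u < 1:
        raise ValueError("T_N_u must be a positive integer")
    # For t >= 1, (t - 1).bit_length() is the exponent of the smallest
    # power of two that is >= t.
    return (t_n_u - 1).bit_length()


def bad_distances(neighbors, oversubscription_constant=1):
    """Compute BAD per northbound neighbor.

    neighbors maps a neighbor name to (L_N_u, N_u, D), where D is the
    advertised default route distance.
    """
    if not neighbors:
        return {}
    m = {
        name: m_value(l_n_u * oversubscription_constant + n_u)
        for name, (l_n_u, n_u, _) in neighbors.items()
    }
    max_m = max(m.values())
    # BAD = D * (1 + MAX_M_N_u - M_N_u)
    return {
        name: d * (1 + max_m - m[name])
        for name, (_, _, d) in neighbors.items()
    }
```

With D=1 and OVERSUBSCRIPTION_CONSTANT=1, `bad_distances({"Spine 111": (10, 100, 1), "Spine 112": (20, 200, 1)})` gives the Leaf111 rows of the table, and `bad_distances({"Spine 111": (20, 100, 1), "Spine 112": (20, 200, 1)})` gives the Leaf112 rows.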
If a calculation produces a result exceeding the range of the type, | If a calculation produces a result exceeding the range of the type, | |||
e.g. bandwidth, the result is set to the highest possible value for | e.g., bandwidth, the result is set to the highest possible value for | |||
that type. | that type. | |||
BAD SHOULD only be computed for default routes. A node MAY compute | BAD SHOULD only be computed for default routes. A node MAY compute | |||
and use BAD for any disaggregated prefixes or other RIFT routes. A | and use BAD for any disaggregated prefixes or other RIFT routes. A | |||
node MAY use a different algorithm to weight northbound traffic based | node MAY use a different algorithm to weight northbound traffic based | |||
on bandwidth. If a different algorithm is used, its successful | on the bandwidth. If a different algorithm is used, its successful | |||
behavior MUST NOT depend on uniformity of algorithm or | behavior MUST NOT depend on uniformity of the algorithm or | |||
synchronization of BAD computations across the fabric. E.g. it is | synchronization of BAD computations across the fabric. For example, | |||
conceivable that leaves could use real-time link loads gathered by | it is conceivable that leaves could use real-time link loads gathered | |||
analytics to change the amount of traffic assigned to each default | by analytics to change the amount of traffic assigned to each default | |||
route next hop. | route next hop. | |||
A change in available bandwidth will only affect, at most, two levels | A change in available bandwidth will only affect, at most, two levels | |||
down in the fabric, i.e., the blast radius of bandwidth adjustments | down in the fabric, i.e., the blast radius of bandwidth adjustments | |||
is constrained no matter the fabric's height. | is constrained no matter the fabric's height. | |||
6.8.7.2. Southbound Direction | 6.8.7.2. Southbound Direction | |||
Due to its loop free nature, during South SPF, a node MAY account for | Due to its loop-free nature, during South SPF, a node MAY account for | |||
maximum available bandwidth on nodes in lower levels and modify the | the maximum available bandwidth on nodes in lower levels and modify | |||
amount of traffic offered to the next level's southbound nodes. It | the amount of traffic offered to the next level's southbound nodes. | |||
is worth considering that such computations may be more effective if | It is worth considering that such computations may be more effective | |||
standardized, but do not have to be. As long as a packet continues | if they are standardized, but they do not have to be. As long as a | |||
to flow southbound, it will take some viable, loop-free path to reach | packet continues to flow southbound, it will take some viable, loop- | |||
its destination. | free path to reach its destination. | |||
6.8.8. Label Binding | 6.8.8. Label Binding | |||
A node MAY advertise in its LIEs, a locally significant, downstream | In its LIEs, a node MAY advertise a locally significant, downstream- | |||
assigned, interface specific label. One use of such a label is a | assigned, interface-specific label. One use of such a label is a | |||
hop-by-hop encapsulation allowing forwarding planes to be easily | hop-by-hop encapsulation allowing forwarding planes to be easily | |||
distinguished among multiple RIFT instances. | distinguished among multiple RIFT instances. | |||
6.8.9. Leaf to Leaf Procedures | 6.8.9. L2L Procedures | |||
RIFT implementations SHOULD support special East-West adjacencies | RIFT implementations SHOULD support special East-West adjacencies | |||
between leaf nodes. Leaf nodes supporting these procedures MUST: | between leaf nodes. Leaf nodes supporting these procedures MUST: | |||
advertise the LEAF_2_LEAF flag in its node capabilities *and* | 1. advertise the LEAF_2_LEAF flag in its node capabilities, | |||
set the overload flag on all leaf's Node TIEs *and* | 2. set the overload flag on all leaf's Node TIEs, | |||
flood only a node's own north and south TIEs over E-W leaf | 3. flood only a node's own North and South TIEs over E-W leaf | |||
adjacencies *and* | adjacencies, | |||
always use E-W leaf adjacency in all SPF computations *and* | 4. always use E-W leaf adjacency in all SPF computations, | |||
install a discard route for any advertised aggregate routes in a | 5. install a discard route for any advertised aggregate routes in a | |||
leaf's TIE *and* | leaf's TIE, *and* | |||
never form southbound adjacencies. | 6. never form southbound adjacencies. | |||
This will allow the E-W leaf nodes to exchange traffic strictly for | This will allow the E-W leaf nodes to exchange traffic strictly for | |||
the prefixes advertised in each other's north prefix TIEs since the | the prefixes advertised in each other's North Prefix TIEs since the | |||
southbound computation will find the reverse direction in the other | southbound computation will find the reverse direction in the other | |||
node's TIE and install its north prefixes. | node's TIE and install its north prefixes. | |||
6.8.10. Address Family and Multi Topology Considerations | 6.8.10. Address Family and Multi-Topology Considerations | |||
Multi-Topology (MT)[RFC5120] and Multi-Instance (MI)[RFC8202] | Multi-Topology (MT) [RFC5120] and Multi-Instance (MI) [RFC8202] | |||
concepts are used today in link-state routing protocols to support | concepts are used today in link-state routing protocols to support | |||
several domains on the same physical topology. RIFT supports this | several domains on the same physical topology. RIFT supports this | |||
capability by carrying transport ports in the LIE protocol exchanges. | capability by carrying transport ports in the LIE protocol exchanges. | |||
Multiplexing of LIEs can be achieved by either choosing varying | Multiplexing of LIEs can be achieved by either choosing varying | |||
multicast addresses or ports on the same address. | multicast addresses or ports on the same address. | |||
BFD interactions in Section 6.8.6 are implementation dependent when | BFD interactions in Section 6.8.6 are implementation-dependent when | |||
multiple RIFT instances run on the same link. | multiple RIFT instances run on the same link. | |||
6.8.11. One-Hop Healing of Levels with East-West Links | 6.8.11. One-Hop Healing of Levels with East-West Links | |||
Based on the rules defined in Section 6.4, Section 6.3.8 and given | Based on the rules defined in Sections 6.4 and 6.3.8 and given the | |||
the presence of E-W links, RIFT can provide a one-hop protection for | presence of E-W links, RIFT can provide a one-hop protection for | |||
nodes that have lost all their northbound links. This can also be | nodes that have lost all their northbound links. This can also be | |||
applied to multi-plane designs where complex link set failures occur | applied to multi-plane designs where complex link set failures occur | |||
at the ToF when links are exclusively used for flooding topology | at the ToF when links are exclusively used for flooding topology | |||
information. Appendix B.4 outlines this behavior. | information. Appendix B.4 outlines this behavior. | |||
6.9. Security | 6.9. Security | |||
6.9.1. Security Model | 6.9.1. Security Model | |||
An inherent property of any security and ZTP architecture is the | An inherent property of any security and ZTP architecture is the | |||
resulting trade-off in regard to integrity verification of the | resulting trade-off in regard to integrity verification of the | |||
information distributed through the fabric vs. provisioning and auto- | information distributed through the fabric vs. provisioning and | |||
configuration requirements. At a minimum the security of an | autoconfiguration requirements. At a minimum, the security of an | |||
established adjacency should be ensured. The stricter the security | established adjacency should be ensured. The stricter the security | |||
model the more provisioning must take over the role of ZTP. | model, the more provisioning must take over the role of ZTP. | |||
RIFT supports the following security models to allow for flexible | RIFT supports the following security models to allow for flexible | |||
control by the operator. | control by the operator: | |||
* The most security conscious operators may choose to have control | * The most security-conscious operators may choose to have control | |||
over which ports interconnect between a given pair of nodes; such | over which ports interconnect between a given pair of nodes; such | |||
a model is called the "Port-Association Model" (PAM). This is | a model is called the "Port-Association Model" (PAM). This is | |||
achievable by configuring each pair of directly connected ports | achievable by configuring each pair of directly connected ports | |||
with a designated shared key or public/private key pair. | with a designated shared key or public/private key pair. | |||
* In physically secure data center locations, operators may choose | * In physically secure data center locations, operators may choose | |||
to control connectivity between entire nodes, called here the | to control connectivity between entire nodes, called here the | |||
"Node-Association Model" (NAM). A benefit of this model is that | "Node-Association Model" (NAM). A benefit of this model is that | |||
it allows for simplified port sparing. | it allows for simplified port sparing. | |||
skipping to change at page 118, line 20 ¶ | skipping to change at line 5228 ¶ | |||
are replaced more often than network nodes. In addition, this | are replaced more often than network nodes. In addition, this | |||
model allows for simplified node sparing. | model allows for simplified node sparing. | |||
* These models may be mixed throughout the fabric depending upon | * These models may be mixed throughout the fabric depending upon | |||
security requirements at various levels of the fabric and | security requirements at various levels of the fabric and | |||
willingness to accept increased provisioning complexity. | willingness to accept increased provisioning complexity. | |||
In order to support the cases mentioned above, RIFT implementations | In order to support the cases mentioned above, RIFT implementations | |||
support, through operator control, mechanisms that allow for: | support, through operator control, mechanisms that allow for: | |||
a. specification of the appropriate level in the fabric, | * a specification of the appropriate level in the fabric, | |||
b. discovery and reporting of missing connections, | * discovery and reporting of missing connections, and | |||
c. discovery and reporting of unexpected connections while | * discovery and reporting of unexpected connections while preventing | |||
preventing them from forming insecure adjacencies. | them from forming insecure adjacencies. | |||
Operators may only choose to configure the level of each node, but | Operators may only choose to configure the level of each node but not | |||
not explicitly configure which connections are allowed. In this | explicitly configure which connections are allowed. In this case, | |||
case, RIFT will only allow adjacencies to establish between nodes | RIFT will only allow adjacencies to establish between nodes that are | |||
that are in adjacent levels. Operators with the lowest security | in adjacent levels. Operators with the lowest security requirements | |||
requirements may not use any configuration to specify which | may not use any configuration to specify which connections are | |||
connections are allowed. Nodes in such fabrics could rely fully on | allowed. Nodes in such fabrics could rely fully on ZTP and | |||
ZTP and only established adjacencies between nodes in adjacent | established adjacencies between nodes in adjacent levels. Figure 33 | |||
levels. Figure 33 illustrates inherent tradeoffs between the | illustrates inherent trade-offs between the different security | |||
different security models. | models. | |||
Some level of link quality verification may be required prior to an | Some level of link quality verification may be required prior to an | |||
adjacency being used for forwarding. For example, an implementation | adjacency being used for forwarding. For example, an implementation | |||
may require that a BFD session comes up before advertising the | may require that a BFD session comes up before advertising the | |||
adjacency. | adjacency. | |||
For the cases outlined above, RIFT has two approaches to enforce that | For the cases outlined above, RIFT has two approaches to enforce that | |||
a local port is connected to the correct port on the correct remote | a local port is connected to the correct port on the correct remote | |||
node. One approach is to piggy-back on RIFT's authentication | node. One approach is to piggyback on RIFT's authentication | |||
mechanism. Assuming the provisioning model (e.g. YANG) is flexible | mechanism. Assuming the provisioning model (e.g., YANG) is flexible | |||
enough, operators can choose to provision a unique authentication key | enough, operators can choose to provision a unique authentication key | |||
for the following conceptual models: | for the following conceptual models: | |||
a. each pair of ports in "port-association model" or | * each pair of ports in "port-association model" | |||
b. each pair of switches in "node-association model" or | * each pair of switches in "node-association model", or | |||
c. the entire fabric in "fabric-association model". | ||||
The other approach is to rely on the System ID, port-id and level | * the entire fabric in "fabric-association model". | |||
The other approach is to rely on the System ID, port-id, and level | ||||
fields in the LIE message to validate an adjacency against the | fields in the LIE message to validate an adjacency against the | |||
expected cabling topology, and optionally introduce some new rules in | expected cabling topology and optionally introduce some new rules in | |||
the FSM to allow the adjacency to come up if the expectations are | the FSM to allow the adjacency to come up if the expectations are | |||
met. | met. | |||
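The second approach, checking LIE fields against the expected cabling, might look like the following minimal sketch. The Peer type, the cabling table, and the function name are all hypothetical; the text only states that the System ID, port-id, and level fields are compared with the expected topology.

```python
from typing import Dict, NamedTuple


class Peer(NamedTuple):
    """Fields of a received LIE used for the check (names are illustrative)."""
    system_id: int
    port_id: int
    level: int


def adjacency_expected(local_port: str, seen: Peer,
                       cabling: Dict[str, Peer]) -> bool:
    """True only if the LIE matches the provisioned peer for this local port.

    Assumption: `cabling` is operator-provisioned and maps each local port
    to the single remote port that is allowed to connect to it.
    """
    expected = cabling.get(local_port)
    return expected is not None and expected == seen


# Hypothetical provisioning: local port "eth0" must face port 3 of node 0x1.
cabling = {"eth0": Peer(system_id=0x1, port_id=3, level=1)}
ok = adjacency_expected("eth0", Peer(0x1, 3, 1), cabling)
miscabled = adjacency_expected("eth0", Peer(0x2, 3, 1), cabling)
```

An FSM rule built on such a check could then let the adjacency come up only when it returns true, and report the mismatch otherwise.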
^ /\ | | ^ /\ | | |||
/|\ / \ | | /|\ / \ | | |||
| / \ | | | / \ | | |||
| / PAM \ | | | / PAM \ | | |||
Increasing / \ Increasing | Increasing / \ Increasing | |||
Integrity +----------+ Flexibility | Integrity +----------+ Flexibility | |||
& / NAM \ & | & / NAM \ & | |||
skipping to change at page 119, line 30 ¶ | skipping to change at line 5287 ¶ | |||
Provisioning / FAM \ Configuration | Provisioning / FAM \ Configuration | |||
| / \ | | | / \ | | |||
| +--------------------+ \|/ | | +--------------------+ \|/ | |||
| / Zero Configuration \ v | | / Zero Configuration \ v | |||
+------------------------+ | +------------------------+ | |||
Figure 33: Security Model | Figure 33: Security Model | |||
6.9.2. Security Mechanisms | 6.9.2. Security Mechanisms | |||
RIFT Security goals are to ensure: | RIFT security goals are to ensure: | |||
1. authentication | * authentication, | |||
2. message integrity | * message integrity, | |||
3. the prevention of replay attacks | * the prevention of replay attacks, | |||
4. low processing overhead | * low processing overhead, and | |||
5. efficient messaging | * efficient messaging | |||
unless no security is deployed by means of using | unless no security is deployed by means of using | |||
`undefined_securitykey_id` as key identifiers. | 'undefined_securitykey_id' as key identifiers (key ID). | |||
Message confidentiality is a non-goal. | Message confidentiality is a non-goal. | |||
The model in the previous section allows a range of security key | The model in the previous section allows a range of security key | |||
types that are analogous to the various security association models. | types that are analogous to the various security association models. | |||
PAM and NAM allow security associations at the port or node level | PAM and NAM allow security associations at the port or node level | |||
using symmetric or asymmetric keys that are pre-installed. FAM | using symmetric or asymmetric keys that are preinstalled. FAM argues | |||
argues for security associations to be applied only at a group level | for security associations to be applied only at a group level or to | |||
or to be refined once the topology has been established. RIFT does | be refined once the topology has been established. RIFT does not | |||
not specify how security keys are installed or updated, though it | specify how security keys are installed or updated, though it does | |||
does specify how the key can be used to achieve security goals. | specify how the key can be used to achieve security goals. | |||
The protocol has provisions for "weak" nonces to prevent replay | The protocol has provisions for "weak" nonces to prevent replay | |||
attacks and includes authentication mechanisms comparable to | attacks and includes authentication mechanisms comparable to those | |||
[RFC5709] and [RFC7987]. | described in [RFC5709] and [RFC7987]. | |||
6.9.3. Security Envelope | 6.9.3. Security Envelope | |||
A serialized schema _ProtocolPacket_ MUST be carried in a secure | A serialized schema _ProtocolPacket_ MUST be carried in a secure | |||
envelope illustrated in Figure 34. The _ProtocolPacket_ MUST be | envelope as illustrated in Figure 34. The _ProtocolPacket_ MUST be | |||
serialized using the default Thrift's Binary Protocol. Any value in | serialized using the default Thrift's binary protocol. Any value in | |||
the packet following a security fingerprint MUST be used by a | the packet following a security fingerprint MUST be used by a | |||
receiver only after the fingerprint generated based on acceptable, | receiver only after the fingerprint generated based on an acceptable, | |||
advertised key ID has been validated against the data covered by it | advertised key ID has been validated against the data covered by the | |||
bare exceptions arising from operational exigencies where, based on | bare exceptions arising from operational exigencies. Based on local | |||
local configuration, a node MAY allow for the envelope's integrity | configuration, a node MAY allow for the envelope's integrity checks | |||
checks to be skipped and for behavior specified in Section 6.9.6. | to be skipped and for the procedure specified in Section 6.9.6 to be | |||
This means that for all packets, in case the node is configured to | implemented. This means that for all packets, in case the node is | |||
validate the outer fingerprint based on a key ID, an unexpected key | configured to validate the outer fingerprint based on a key ID, an | |||
ID or fingerprint not validating against expected key ID will lead to | unexpected key ID or fingerprint not validating against the expected | |||
packet rejection. Further, in case of reception of a TIE, and the | key ID will lead to packet rejection. Further, in case of reception | |||
receiver being configured to validate the originator by checking the | of a TIE and the receiver being configured to validate the originator | |||
TIE Origin Security Envelope Header fingerprint against a key ID, an | by checking the TIE Origin Security Envelope Header fingerprint | |||
incorrect key ID or inner fingerprint not validating against the key | against a key ID, an incorrect key ID or inner fingerprint not | |||
ID will lead to the rejection of the packet. | validating against the key ID will lead to the rejection of the | |||
packet. | ||||
For reasons of clarity it is important to observe that the | For reasons of clarity, it is important to observe that the | |||
specification uses the word fingerprint and signature interchangeably | specification uses the words "fingerprint" and "signature" | |||
since the specific properties of the fingerprint part of the envelope | interchangeably since the specific properties of the fingerprint part | |||
depend on the algorithms used to ensure the payload integrity. | of the envelope depend on the algorithms used to ensure the payload | |||
Moreover, any security chosen never implies encryption due to | integrity. Moreover, any security chosen never implies encryption | |||
performance impact involved but only fingerprint or signature | due to performance impact involved but only fingerprint or signature | |||
generation and validation. | generation and validation. | |||
An implementation MUST implement at least both sending and receiving | An implementation MUST implement at least both sending and receiving | |||
HMAC-SHA256 fingerprints as defined in Section 10.2 to ensure | HMAC-SHA256 fingerprints as defined in Section 10.2 to ensure | |||
interoperability but MAY use `undefined_securitykey_id` by default. | interoperability but MAY use 'undefined_securitykey_id' by default. | |||
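As a rough illustration of generating and checking an HMAC-SHA256 fingerprint, the sketch below uses Python's standard hmac module. The exact octets covered and the key handling are governed by Section 10.2 and Figure 34; here `covered` is simply a placeholder for "all data following the fingerprint", and padding is done on byte boundaries, which is an assumption of this sketch.

```python
import hashlib
import hmac
from typing import Tuple


def pad_to_32bit_words(fingerprint: bytes) -> bytes:
    """Keep the fingerprint left-aligned and zero-pad it on the right
    to a multiple of 4 bytes (byte granularity is an assumption)."""
    return fingerprint + b"\x00" * (-len(fingerprint) % 4)


def hmac_sha256_fingerprint(key: bytes, covered: bytes) -> Tuple[int, bytes]:
    """Return (length in 32-bit words, padded fingerprint)."""
    digest = hmac.new(key, covered, hashlib.sha256).digest()
    padded = pad_to_32bit_words(digest)
    return len(padded) // 4, padded


def fingerprint_valid(key: bytes, covered: bytes, received: bytes) -> bool:
    """Recompute the fingerprint and compare in constant time."""
    _, expected = hmac_sha256_fingerprint(key, covered)
    return hmac.compare_digest(expected, received)


# Hypothetical key and payload, for illustration only.
words, fp = hmac_sha256_fingerprint(b"example-key", b"covered octets")
```

Because a SHA-256 digest is 32 bytes long, the length field carries 8 in this case and no padding is needed.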
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
UDP Header: | UDP Header: | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Source Port | RIFT destination port | | | Source Port | RIFT destination port | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| UDP Length | UDP Checksum | | | UDP Length | UDP Checksum | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
Outer Security Envelope Header: | Outer Security Envelope Header: | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| RIFT MAGIC | Packet Number | | | RIFT MAGIC | Packet Number | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Reserved | RIFT Major | Outer Key ID | Fingerprint | | | Reserved | RIFT Major | Outer Key ID | Fingerprint | | |||
| | Version | | Length | | | | Version | | Length | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | | | | | |||
~ Security Fingerprint covers all following content ~ | ~ Security Fingerprint covers all following content ~ | |||
| | | | | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Weak Nonce Local | Weak Nonce Remote | | | Weak Nonce Local | Weak Nonce Remote | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Remaining TIE Lifetime (all 1s in case of LIE) | | | Remaining TIE Lifetime (all 1s in case of LIE) | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
TIE Origin Security Envelope Header: | TIE Origin Security Envelope Header: | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| TIE Origin Key ID | Fingerprint | | | TIE Origin Key ID | Fingerprint | | |||
| | Length | | | | Length | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | | | | | |||
~ Security Fingerprint covers all following content ~ | ~ Security Fingerprint covers all following content ~ | |||
| | | | | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
Serialized RIFT Model Object | Serialized RIFT Model Object | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | | | | | |||
~ Serialized RIFT Model Object ~ | ~ Serialized RIFT Model Object ~ | |||
| | | | | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
Figure 34: Security Envelope | Figure 34: Security Envelope | |||
RIFT MAGIC: | RIFT MAGIC: 16 bits | |||
16 bits. Constant value of 0xA1F7 that allows easy classification | ||||
of RIFT packets independent of the UDP port used. | ||||
Packet Number: | Constant value of 0xA1F7 that allows easy classification of RIFT | |||
16 bits. An optional, per adjacency, per packet type number set | packets independent of the UDP port used. | |||
using the sequence number arithmetic defined in Appendix A. If | ||||
the arithmetic in Appendix A is not used the node MUST set the | Packet Number: 16 bits | |||
value to _undefined_packet_number_. This number can be used to | ||||
detect losses and misordering in flooding for either operational | An optional, per-adjacency, per-packet type number set using the | |||
purposes or in implementation to adjust flooding behavior to | sequence number arithmetic defined in Appendix A. If the | |||
current link or buffer quality. This number MUST NOT be used to | arithmetic in Appendix A is not used, the node MUST set the value | |||
discard or validate the correctness of packets. Packet numbers | to _undefined_packet_number_. This number can be used to detect | |||
are incremented on each interface and within that for each type of | losses and misordering in flooding for either operational purposes | |||
or in implementation to adjust flooding behavior to current link | ||||
or buffer quality. This number MUST NOT be used to discard or | ||||
validate the correctness of packets. Packet numbers are | ||||
incremented on each interface and within that for each type of | ||||
packet independently. This allows parallelizing packet generation | packet independently. This allows parallelizing packet generation | |||
and processing for different types within an implementation if so | and processing for different types within an implementation, if so | |||
desired. | desired. | |||
RIFT Major Version: | RIFT Major Version: 8 bits | |||
8 bits. This value MUST be set to `protocol_major_version` | ||||
This value MUST be set to "protocol_major_version", which is | ||||
defined in the schema and used to serialize the object contained. | defined in the schema and used to serialize the object contained. | |||
It allows checking whether protocol versions are compatible on | It allows checking whether protocol versions are compatible on | |||
both sides, i.e., which schema version is necessary to decode the | both sides, i.e., which schema version is necessary to decode the | |||
serialized object. An implementation MUST drop packets with | serialized object. An implementation MUST drop packets with | |||
unexpected values and MAY report a problem. The specification of | unexpected values and MAY report a problem. The specification of | |||
how an implementation may negotiate the schema's major version is | how an implementation may negotiate the schema's major version is | |||
outside the scope of this document. | outside the scope of this document. | |||
Outer Key ID: | Outer Key ID: 8 bits | |||
8 bits. A simple, unstructured value acting as indirection into a | ||||
A simple, unstructured value acting as indirection into a | ||||
structure holding an algorithm and any related secrets necessary | structure holding an algorithm and any related secrets necessary | |||
to validate any provided outer security fingerprint or signature. | to validate any provided outer security fingerprint or signature. | |||
Value _undefined_securitykey_id_ means that no valid fingerprint | The value _undefined_securitykey_id_ means that no valid | |||
was computed or is provided, otherwise one of the algorithms in | fingerprint was computed or is provided; otherwise, one of the | |||
Section 10.2 MUST be used to compute the fingerprint. This Key ID | algorithms in Section 10.2 MUST be used to compute the | |||
scope is local to the nodes on both ends of the adjacency. | fingerprint. This key ID scope is local to the nodes on both ends | |||
of the adjacency. | ||||
TIE Origin Key ID: | TIE Origin Key ID: 24 bits | |||
24 bits. A simple, unstructured value acting as indirection into | ||||
a structure holding an algorithm and any related secrets necessary | A simple, unstructured value acting as indirection into a | |||
structure holding an algorithm and any related secrets necessary | ||||
to validate any provided inner security fingerprint or signature. | to validate any provided inner security fingerprint or signature. | |||
Value _undefined_securitykey_id_ means that no valid fingerprint | The value _undefined_securitykey_id_ means that no valid | |||
was computed, otherwise one of the algorithms in Section 10.2 MUST | fingerprint was computed; otherwise, one of the algorithms in | |||
be used to compute the fingerprint.. This Key ID scope is global | Section 10.2 MUST be used to compute the fingerprint. This key ID | |||
to the RIFT instance since it may imply the originator of the TIE | scope is global to the RIFT instance since it may imply the | |||
so the contained object does not have to be de-serialized to | originator of the TIE so the contained object does not have to be | |||
obtain the originator. | deserialized to obtain the originator. | |||
Length of Fingerprint: | Fingerprint Length: 8 bits | |||
8 bits. Length in 32-bit multiples of the following fingerprint | ||||
(not including lifetime or weak nonces). It allows the structure | Length in 32-bit multiples of the following fingerprint (not | |||
to be navigated when an unknown key type is present. To clarify, | including lifetime or weak nonces). It allows the structure to be | |||
a common corner case when this value is set to 0 is when it | navigated when an unknown key type is present. To clarify, a | |||
common corner case when this value is set to 0 is when it | ||||
signifies an empty (0 bytes long) security fingerprint. | signifies an empty (0 bytes long) security fingerprint. | |||
Security Fingerprint: | Security Fingerprint: 32 bits * Fingerprint Length. | |||
32 bits * Length of Fingerprint. This is a signature that is | ||||
computed over all data following after it. If the significant | ||||
bits of fingerprint are fewer than the 32 bits padded length then | ||||
the significant bits MUST be left aligned and remaining bits on | ||||
the right padded with 0s. When using PKI (Public Key | ||||
Infrastructure) the Security fingerprint originating node uses its | ||||
private key to create the signature. The original packet can then | ||||
be verified provided the public key is shared and current. | ||||
Methodology to negotiate, distribute, or roll over keys are | ||||
outside the scope of this document. | ||||
Remaining TIE Lifetime: | This is a signature that is computed over all data following after | |||
32 bits. In case of anything but TIEs this field MUST be set to | it. If the significant bits of the fingerprint are fewer than the | |||
all ones and Origin Security Envelope Header MUST NOT be present | 32-bit padded length, then the significant bits MUST be left | |||
in the packet. For TIEs this field represents the remaining | aligned and the remaining bits on the right are padded with 0s. | |||
lifetime of the TIE and Origin Security Envelope Header MUST be | When using Public Key Infrastructure (PKI), the security | |||
present in the packet. | fingerprint originating node uses its private key to create the | |||
signature. The original packet can then be verified, provided the | ||||
public key is shared and current. Methodology to negotiate, | ||||
distribute, or rollover keys is outside the scope of this | ||||
document. | ||||
Weak Nonce Local: | Remaining TIE Lifetime: 32 bits | |||
16 bits. Local Weak Nonce of the adjacency as advertised in LIEs. | ||||
Weak Nonce Remote: | In case of anything but TIEs, this field MUST be set to all ones | |||
16 bits. Remote Weak Nonce of the adjacency as received in LIEs. | and the Origin Security Envelope Header MUST NOT be present in the | |||
packet. For TIEs, this field represents the remaining lifetime of | ||||
the TIE and the Origin Security Envelope Header MUST be present in | ||||
the packet. | ||||
TIE Origin Security Envelope Header: | Weak Nonce Local: 16 bits | |||
It MUST be present if and only if the Remaining TIE Lifetime field | ||||
is *not* all ones. It carries through the originators Key ID and | ||||
corresponding fingerprint of the object to protect TIE from | ||||
modification during flooding. This ensures origin validation and | ||||
integrity (but does not provide validation of a chain of trust). | ||||
Observe that due to the schema migration rules per Section 7 the | Local Weak Nonce of the adjacency, as advertised in LIEs. | |||
contained model can be always decoded if the major version matches | ||||
Weak Nonce Remote: 16 bits | ||||
Remote Weak Nonce of the adjacency, as received in LIEs. | ||||
TIE Origin Security Envelope Header: It MUST be present if and only | ||||
if the Remaining TIE Lifetime field is *not* all ones. It carries | ||||
through the originator's key ID and corresponding fingerprint of | ||||
the object to protect TIE from modification during flooding. This | ||||
ensures origin validation and integrity (but does not provide | ||||
validation of a chain of trust). | ||||
Observe that, due to the schema migration rules per Section 7, the | ||||
contained model can always be decoded if the major version matches | ||||
and the envelope integrity has been validated. Consequently, | and the envelope integrity has been validated. Consequently, | |||
description of the TIE is available to flood it properly including | description of the TIE is available to flood it properly, including | |||
unknown TIE types. | unknown TIE types. | |||
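As an illustration of the presence rule for the TIE Origin Security Envelope Header described above, the following is a minimal Python sketch. The function names and the ALL_ONES_32 constant are hypothetical and chosen only for this example; only the "all ones means not a TIE" rule is taken from the text.

```python
# Hypothetical names; only the all-ones rule comes from the text above.
ALL_ONES_32 = 0xFFFFFFFF  # Remaining TIE Lifetime value used for non-TIE packets


def origin_envelope_expected(remaining_tie_lifetime: int) -> bool:
    """Return True if the TIE Origin Security Envelope Header must be present.

    The header is present if and only if the 32-bit Remaining TIE Lifetime
    field is not all ones.
    """
    return (remaining_tie_lifetime & ALL_ONES_32) != ALL_ONES_32


def envelope_consistent(remaining_tie_lifetime: int, has_origin_header: bool) -> bool:
    """Check that header presence agrees with the lifetime field."""
    return origin_envelope_expected(remaining_tie_lifetime) == has_origin_header
```

A receiver could run such a check before attempting any fingerprint validation, since an inconsistent packet cannot be well formed.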
6.9.4. Weak Nonces | 6.9.4. Weak Nonces | |||
The protocol uses two 16-bit nonces to salt generated signatures. | The protocol uses two 16-bit nonces to salt generated signatures. | |||
The term "nonce" is used a bit loosely since RIFT nonces are not | The term "nonce" is used a bit loosely since RIFT nonces are not | |||
being changed in every packet as often common in cryptography. For | being changed in every packet, which is common in cryptography. For | |||
efficiency purposes they are changed at a high enough frequency to | efficiency purposes, they are changed at a high enough frequency to | |||
dwarf practical replay attack attempts. And hence, such nonces are | dwarf practical replay attack attempts. And hence, such nonces are | |||
called from this point on "weak" nonces. | called from this point on "weak" nonces. | |||
Any implementation using outer key ID different from | Any implementation using a different outer key ID from | |||
`undefined_securitykey_id` MUST generate and wrap around local nonces | 'undefined_securitykey_id' MUST generate and wrap around local nonces | |||
properly and SHOULD do it even if not using any algorithm in | properly and SHOULD do it even if not using any algorithm from | |||
Section 10.2. When a nonce increment leads to _undefined_nonce_ | Section 10.2. When a nonce increment leads to the _undefined_nonce_ | |||
value, the value MUST be incremented again immediately. All | value, the value MUST be incremented again immediately. All | |||
implementations MUST reflect the neighbor's nonces. An | implementations MUST reflect the neighbor's nonces. An | |||
implementation SHOULD increment a chosen nonce on every LIE FSM | implementation SHOULD increment a chosen nonce on every LIE FSM | |||
transition that ends up in a different state from the previous one | transition that ends up in a different state from the previous one | |||
and MUST increment its nonce at least every | and MUST increment its nonce at least every | |||
_nonce_regeneration_interval_ if using any algorithm in Section 10.2 | _nonce_regeneration_interval_ if using any algorithm in Section 10.2 | |||
(such considerations allow for efficient implementations without | (such considerations allow for efficient implementations without | |||
opening a significant security risk). When flooding TIEs, the | opening a significant security risk). When flooding TIEs, the | |||
implementation MUST use recent (i.e. within allowed difference) | implementation MUST use recent (i.e., within allowed difference) | |||
nonces reflected in the LIE exchange. The schema specifies in | nonces reflected in the LIE exchange. The schema specifies in | |||
_maximum_valid_nonce_delta_ the maximum allowable nonce value | _maximum_valid_nonce_delta_ the maximum allowable nonce value | |||
difference on a packet compared to reflected nonces in the LIEs. Any | difference on a packet compared to reflected nonces in the LIEs. Any | |||
packet received with nonces deviating more than the allowed delta | packet received with nonces deviating more than the allowed delta | |||
MUST be discarded without further computation of signatures to | MUST be discarded without further computation of signatures to | |||
prevent computation load attacks. The delta is either a negative or | prevent computation load attacks. The delta is either a negative or | |||
positive difference that a mirrored nonce can deviate from local | positive difference that a mirrored nonce can deviate from the local | |||
value to be considered valid. If nonces are not changed on every | value to be considered valid. If nonces are not changed on every | |||
packet but at the maximum interval on both sides this opens | packet, but at the maximum interval on both sides, this opens | |||
statistically a _maximum_valid_nonce_delta_/2 window for identical | statistically a _maximum_valid_nonce_delta_/2 window for identical | |||
LIEs, TIE and TI(x)E replays. The interval cannot be too small since | LIEs, TIE, and TI(x)E replays. The interval cannot be too small | |||
LIE FSM may change states fairly quickly during ZTP without sending | since LIE FSM may change states fairly quickly during ZTP without | |||
LIEs and additionally, UDP can both lose as well as misorder | sending LIEs, and additionally, UDP can both lose as well as | |||
packets. | misorder packets. | |||
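The nonce rules above can be sketched in Python as follows. This is a sketch under stated assumptions, not a definitive implementation: the constant values mirror undefined_nonce and maximum_valid_nonce_delta from the schema, while the function names and the way the delta is measured across the 16-bit rollover are assumptions made for illustration.

```python
UNDEFINED_NONCE = 0            # undefined_nonce in the schema
MAXIMUM_VALID_NONCE_DELTA = 5  # maximum_valid_nonce_delta in the schema
NONCE_MODULUS = 1 << 16        # nonces are 16-bit rolling-over unsigned values


def next_nonce(current: int) -> int:
    """Increment a nonce with wraparound, skipping the undefined value."""
    candidate = (current + 1) % NONCE_MODULUS
    if candidate == UNDEFINED_NONCE:
        # An increment that lands on undefined_nonce is repeated immediately.
        candidate = (candidate + 1) % NONCE_MODULUS
    return candidate


def nonce_delta(received: int, local: int) -> int:
    """Signed distance from local to received on the 16-bit circle.

    Assumption: the shortest path around the circle is used.
    """
    diff = (received - local) % NONCE_MODULUS
    return diff - NONCE_MODULUS if diff >= NONCE_MODULUS // 2 else diff


def nonce_acceptable(received: int, local: int) -> bool:
    """Return False if the packet should be discarded before any
    signature computation because its nonce deviates too far."""
    return abs(nonce_delta(received, local)) <= MAXIMUM_VALID_NONCE_DELTA
```

Performing the cheap delta comparison first is what keeps an attacker from forcing signature computations with stale or arbitrary nonces.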
In cases where a secure implementation does not receive signatures or | In cases where a secure implementation does not receive signatures or | |||
receives undefined nonces from a neighbor (indicating that it does | receives undefined nonces from a neighbor (indicating that it does | |||
not support or verify signatures), it is a matter of local policy as | not support or verify signatures), it is a matter of local policy as | |||
to how those packets are treated. A secure implementation MAY refuse | to how those packets are treated. A secure implementation MAY refuse | |||
forming an adjacency with an implementation that is not advertising | forming an adjacency with an implementation that is not advertising | |||
signatures or valid nonces, or it MAY continue signing local packets | signatures or valid nonces, or it MAY continue signing local packets | |||
while accepting a neighbor's packets without further security | while accepting a neighbor's packets without further security | |||
validation. | validation. | |||
As a necessary exception, an implementation MUST advertise the remote | As a necessary exception, an implementation MUST advertise the remote | |||
nonce value as _undefined_nonce_ when the FSM is not in _TwoWay_ or | nonce value as _undefined_nonce_ when the FSM is not in _TwoWay_ or | |||
_ThreeWay_ state and accept an _undefined_nonce_ for its local nonce | _ThreeWay_ state and accept an _undefined_nonce_ for its local nonce | |||
value on packets in any other state than _ThreeWay_. | value on packets in any other state than _ThreeWay_. | |||
As an optional optimization, an implementation MAY send one LIE with | As an optional optimization, an implementation MAY send one LIE with | |||
previously negotiated neighbor's nonce to try to speed up a | a previously negotiated neighbor's nonce to try to speed up a | |||
neighbor's transition from _ThreeWay_ to _OneWay_ and MUST revert to | neighbor's transition from _ThreeWay_ to _OneWay_ and MUST revert to | |||
sending _undefined_nonce_ after that. | sending _undefined_nonce_ after that. | |||
6.9.5. Lifetime | 6.9.5. Lifetime | |||
Reflooding same TIE version quickly with small variations in its | Reflooding the same TIE version quickly with small variations in its | |||
lifetime may lead to an excessive number of security fingerprint | lifetime may lead to an excessive number of security fingerprint | |||
computations. To avoid this, the application generating the | computations. To avoid this, the application generating the | |||
fingerprints for flooded TIEs MAY round the value down to the next | fingerprints for flooded TIEs MAY round the value down to the next | |||
_rounddown_lifetime_interval_ on the packet header to reuse previous | _rounddown_lifetime_interval_ on the packet header to reuse previous | |||
computation results. TIEs flooded with such rounded lifetimes only | computation results. TIEs flooded with such rounded lifetimes will | |||
will limit the amount of computations necessary during transitions | only limit the amount of computations necessary during transitions | |||
that lead to advertisement of same TIEs with same information within | that lead to advertisement of the same TIEs with the same information | |||
a short period of time. | within a short period of time. | |||
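A minimal Python sketch of the optional lifetime rounding follows. It assumes that "rounding down to the next interval" means truncating to a multiple of rounddown_lifetime_interval; the function name is hypothetical.

```python
ROUNDDOWN_LIFETIME_INTERVAL = 60  # rounddown_lifetime_interval in the schema


def rounded_lifetime(remaining: int,
                     interval: int = ROUNDDOWN_LIFETIME_INTERVAL) -> int:
    """Round a remaining lifetime in seconds down to a multiple of interval.

    TIEs reflooded within the same interval then carry an identical
    lifetime, so a previously computed fingerprint can be reused.
    """
    return remaining - (remaining % interval)
```

With the default interval, lifetimes of 604799 and 604741 seconds both round to 604740, so the two refloods would share one fingerprint computation.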
6.9.6. Security Association Changes | 6.9.6. Security Association Changes | |||
No mechanism is specified to convert a security envelope for the same | No mechanism is specified to convert a security envelope for the same | |||
Key ID from one algorithm to another once the envelope is | key ID from one algorithm to another once the envelope is | |||
operational. The recommended procedure to change to a new algorithm | operational. The recommended procedure to change to a new algorithm | |||
is to take the adjacency down, make the necessary changes to the | is to take the adjacency down, make the necessary changes to the | |||
secret and algorithm used by the corresponding key ID, and bring the | secret and algorithm used by the corresponding key ID, and bring the | |||
adjacency back up. Obviously, an implementation MAY choose to stop | adjacency back up. Obviously, an implementation MAY choose to stop | |||
verifying security envelope for the duration of algorithm change to | verifying the security envelope for the duration of the algorithm | |||
keep the adjacency up but since this introduces a security | change to keep the adjacency up, but since this introduces a security | |||
vulnerability window, such roll-over SHOULD NOT be recommended. | vulnerability window, such rollover SHOULD NOT be recommended. Other | |||
Other approaches, such as accepting multiple algorithms for same key | approaches, such as accepting multiple algorithms for same key ID for | |||
ID for a configured time window are possible but in the realm of | a configured time window, are possible but in the realm of | |||
implementation choices rather than protocol specification. | implementation choices rather than protocol specification. | |||
7. Information Elements Schema | 7. Information Elements Schema | |||
This section introduces the schema for information elements. The IDL | This section introduces the schema for information elements. The | |||
is Thrift [thrift]. | Interface Description Language (IDL) is Thrift [thrift]. | |||
On schema changes that | On schema changes that | |||
1. change field numbers *or* | 1. change field numbers *or* | |||
2. add new *required* fields *or* | 2. add new *required* fields *or* | |||
3. remove any fields *or* | 3. remove any fields *or* | |||
4. change lists into sets, unions into structures *or* | 4. change lists into sets, unions into structures *or* | |||
5. change multiplicity of fields *or* | 5. change multiplicity of fields *or* | |||
6. changes type or name of any field *or* | 6. changes type or name of any field *or* | |||
7. change data types of the type of any field *or* | 7. change data types of the type of any field *or* | |||
skipping to change at page 126, line 26 ¶ | skipping to change at line 5615 ¶ | |||
9. removes or changes any defined constant or constant value *or* | 9. removes or changes any defined constant or constant value *or* | |||
10. changes any enumeration type except extending | 10. changes any enumeration type except extending | |||
`common.TIETypeType` (use of enumeration types is generally | `common.TIETypeType` (use of enumeration types is generally | |||
discouraged) *or* | discouraged) *or* | |||
11. adds new TIE type to _TIETypeType_ with flooding scope different | 11. adds new TIE type to _TIETypeType_ with flooding scope different | |||
from prefix TIE flooding scope | from prefix TIE flooding scope | |||
major version of the schema MUST increase. All other changes MUST | the major version of the schema MUST increase. All other changes | |||
increase minor version within the same major. | MUST increase the minor version within the same major. | |||
Introducing an optional field does not cause a major version increase | Introducing an optional field does not cause a major version increase | |||
even if the fields inside the structure are optional with defaults. | even if the fields inside the structure are optional with defaults. | |||
All signed integer as forced by Thrift [thrift] support must be cast | All signed integers, as forced by Thrift [thrift] support, must be | |||
for internal purposes to equivalent unsigned values without | cast for internal purposes to equivalent unsigned values without | |||
discarding the signedness bit. An implementation SHOULD try to avoid | discarding the signedness bit. An implementation SHOULD try to avoid | |||
using the signedness bit when generating values. | using the signedness bit when generating values. | |||
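The cast from Thrift's signed integers to the equivalent unsigned values can be illustrated with the Python sketch below. The helper names are hypothetical; the bit widths correspond to the Thrift i8, i16, i32, and i64 types.

```python
def as_unsigned(value: int, bits: int) -> int:
    """Reinterpret a signed Thrift integer as unsigned, keeping every bit."""
    return value & ((1 << bits) - 1)


def as_signed(value: int, bits: int) -> int:
    """Inverse cast, used before handing a value back to the Thrift encoder."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value
```

For example, the schema constant keyvaluetarget_all_south_leaves is written as -1 on an i64, which corresponds to all 64 bits set once reinterpreted as unsigned.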
The schema is normative. | The schema is normative. | |||
7.1. Backwards-Compatible Extension of Schema | 7.1. Backwards-Compatible Extension of Schema | |||
The set of rules in Section 7 guarantees that every decoder can | The set of rules in Section 7 guarantees that every decoder can | |||
process serialized content generated by a higher minor version of the | process serialized content generated by a higher minor version of the | |||
schema and with that the protocol can progress without a 'flag-day'. | schema, and with that, the protocol can progress without a 'flag- | |||
Contrary to that, content serialized using a major version X is *not* | day'. Contrary to that, content serialized using a major version X | |||
expected to be decodable by any implementation using decoder for a | is *not* expected to be decodable by any implementation using a | |||
model with a major version lower than X. Schema negotiation and | decoder for a model with a major version lower than X. Schema | |||
translation within RIFT is outside the scope of this document. | negotiation and translation within RIFT is outside the scope of this | |||
document. | ||||
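The decodability expectation described above can be summarized in a short Python sketch. The function name is hypothetical. The text only guarantees decoding when the major versions match, so treating content with a lower major version as not decodable is a conservative assumption of this sketch.

```python
def expected_decodable(local_major: int, content_major: int,
                       content_minor: int) -> bool:
    """Return True if a decoder built for local_major is expected to
    process content serialized with (content_major, content_minor).

    The minor version is deliberately ignored: within one major version,
    content from any higher minor version remains decodable.
    """
    return content_major == local_major
```

Note that a True result only covers decoding. Whether a capability is actually supported is signaled separately through the node capabilities.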
Additionally, based on the propagated minor version in encoded | Additionally, based on the propagated minor version in encoded | |||
content and added optional node capabilities new TIE types or even | content and added optional node capabilities, new TIE types or even | |||
de-facto mandatory fields can be introduced without progressing the | de facto mandatory fields can be introduced without progressing the | |||
major version albeit only nodes supporting such new extensions would | major version, albeit only nodes supporting such new extensions would | |||
decode them. Given the model is encoded at the source and never re- | decode them. Given the model is encoded at the source and never re- | |||
encoded flooding through nodes not understanding any new extensions | encoded, flooding through nodes not understanding any new extensions | |||
will preserve the corresponding fields. However, it is important to | will preserve the corresponding fields. However, it is important to | |||
understand that a higher minor version of a schema does *not* | understand that a higher minor version of a schema does *not* | |||
guarantee that capabilities introduced in lower minors of the same | guarantee that capabilities introduced in lower minors of the same | |||
major are supported. The _node_capabilities_ field is used to | major are supported. The _node_capabilities_ field is used to | |||
indicate which capabilities are supported. | indicate which capabilities are supported. | |||
Specifically, the schema SHOULD add elements to _NodeCapabilities_ | Specifically, the schema SHOULD add elements to the | |||
field future capabilities to indicate whether it will support | _NodeCapabilities_ field's future capabilities to indicate whether it | |||
interpretation of schema extensions on the same major revision if | will support interpretation of schema extensions on the same major | |||
they are present. Such fields MUST be optional and have an implicit | revision if they are present. Such fields MUST be optional and have | |||
or explicit false default value. If a future capability changes | an implicit or explicit false default value. If a future capability | |||
route selection or generates conditions that cause packet loss if | changes route selection or generates conditions that cause packet | |||
some nodes are not supporting it then a major version increment will | loss if some nodes are not supporting it, then a major version | |||
be however unavoidable. _NodeCapabilities_ shown in LIE MUST match | increment will be unavoidable. _NodeCapabilities_ shown in LIE MUST | |||
the capabilities shown in the Node TIEs, otherwise the behavior is | match the capabilities shown in the Node TIEs; otherwise, the | |||
unspecified. A node detecting the mismatch SHOULD generate a | behavior is unspecified. A node detecting the mismatch SHOULD | |||
notification. | generate a notification. | |||
Alternately or additionally, new optional fields can be introduced | Alternately or additionally, new optional fields can be introduced | |||
into e.g. _NodeTIEElement_ if a special field is chosen to indicate | into, e.g., _NodeTIEElement_, if a special field is chosen to | |||
via its presence that an optional feature is enabled (since | indicate via its presence that an optional feature is enabled (since | |||
capability to support a feature does not necessarily mean that the | capability to support a feature does not necessarily mean that the | |||
feature is actually configured and operational). | feature is actually configured and operational). | |||
To support new TIE types without increasing the major version | To support new TIE types without increasing the major version | |||
enumeration _TIEElement_ can be extended with new optional elements | enumeration, _TIEElement_ can be extended with new optional elements | |||
for new `common.TIETypeType` values as long as the scope of the new TIE | for new 'common.TIETypeType' values as long as the scope of the new TIE | |||
matches the prefix TIE scope. In case it is necessary to understand | matches the prefix TIE scope. In case it is necessary to understand | |||
whether all nodes can parse the new TIE type a node capability MUST | whether all nodes can parse the new TIE type, a node capability MUST | |||
be added in _NodeCapabilities_ to prevent a non-homogenous network. | be added in _NodeCapabilities_ to prevent a non-homogenous network. | |||
7.2. common.thrift | 7.2. common.thrift | |||
/** | This schema references [RFC5837], [RFC5880], and [RFC6550]. | |||
Thrift file with common definitions for RIFT | ||||
*/ | ||||
namespace py common | /** | |||
Thrift file with common definitions for RIFT | ||||
*/ | ||||
/** @note MUST be interpreted in implementation as unsigned 64 bits. | namespace py common | |||
*/ | ||||
typedef i64 SystemIDType | ||||
typedef i32 IPv4Address | ||||
typedef i32 MTUSizeType | ||||
/** @note MUST be interpreted in implementation as unsigned | ||||
rolling over number */ | ||||
typedef i64 SeqNrType | ||||
/** @note MUST be interpreted in implementation as unsigned */ | ||||
typedef i32 LifeTimeInSecType | ||||
/** @note MUST be interpreted in implementation as unsigned */ | ||||
typedef i8 LevelType | ||||
typedef i16 PacketNumberType | ||||
/** @note MUST be interpreted in implementation as unsigned */ | ||||
typedef i32 PodType | ||||
/** @note MUST be interpreted in implementation as unsigned. | ||||
/** this has to be long enough to accomodate prefix */ | ||||
typedef binary IPv6Address | ||||
/** @note MUST be interpreted in implementation as unsigned */ | ||||
typedef i16 UDPPortType | ||||
/** @note MUST be interpreted in implementation as unsigned */ | ||||
typedef i32 TIENrType | ||||
/** @note MUST be interpreted in implementation as unsigned | ||||
This is carried in the | ||||
security envelope and must hence fit into 8 bits. */ | ||||
typedef i8 VersionType | ||||
/** @note MUST be interpreted in implementation as unsigned */ | ||||
typedef i16 MinorVersionType | ||||
/** @note MUST be interpreted in implementation as unsigned */ | ||||
typedef i32 MetricType | ||||
/** @note MUST be interpreted in implementation as unsigned | ||||
and unstructured */ | ||||
typedef i64 RouteTagType | ||||
/** @note MUST be interpreted in implementation as unstructured | ||||
label value */ | ||||
typedef i32 LabelType | ||||
/** @note MUST be interpreted in implementation as unsigned */ | ||||
typedef i32 BandwithInMegaBitsType | ||||
/** @note Key Value Key ID type */ | ||||
typedef i32 KeyIDType | ||||
/** node local, unique identification for a link (interface/tunnel | ||||
* etc. Basically anything RIFT runs on). This is kept | ||||
* at 32 bits so it aligns with BFD [RFC5880] discriminator size. | ||||
*/ | ||||
typedef i32 LinkIDType | ||||
/** @note MUST be interpreted in implementation as unsigned, | ||||
especially since we have the /128 IPv6 case. */ | ||||
typedef i8 PrefixLenType | ||||
/** timestamp in seconds since the epoch */ | ||||
typedef i64 TimestampInSecsType | ||||
/** security nonce. | ||||
@note MUST be interpreted in implementation as rolling | ||||
over unsigned value */ | ||||
typedef i16 NonceType | ||||
/** LIE FSM holdtime type */ | ||||
typedef i16 TimeIntervalInSecType | ||||
/** Transaction ID type for prefix mobility as specified by RFC6550, | ||||
value MUST be interpreted in implementation as unsigned */ | ||||
typedef i8 PrefixTransactionIDType | ||||
/** Timestamp per IEEE 802.1AS, all values MUST be interpreted in | ||||
implementation as unsigned. */ | ||||
struct IEEE802_1ASTimeStampType { | ||||
1: required i64 AS_sec; | ||||
2: optional i32 AS_nsec; | ||||
} | ||||
/** generic counter type */ | ||||
typedef i64 CounterType | ||||
/** Platform Interface Index type, i.e. index of interface on hardware, | ||||
can be used e.g. with RFC5837 */ | ||||
typedef i32 PlatformInterfaceIndex | ||||
/** Flags indicating node configuration in case of ZTP. | /** @note MUST be interpreted in implementation as unsigned 64 bits. | |||
*/ | */ | |||
enum HierarchyIndications { | typedef i64 SystemIDType | |||
/** forces level to `leaf_level` and enables according procedures */ | typedef i32 IPv4Address | |||
leaf_only = 0, | typedef i32 MTUSizeType | |||
/** forces level to `leaf_level` and enables according procedures */ | /** @note MUST be interpreted in implementation as unsigned | |||
leaf_only_and_leaf_2_leaf_procedures = 1, | rolling over number */ | |||
/** forces level to `top_of_fabric` and enables according | typedef i64 SeqNrType | |||
procedures */ | /** @note MUST be interpreted in implementation as unsigned */ | |||
top_of_fabric = 2, | typedef i32 LifeTimeInSecType | |||
} | /** @note MUST be interpreted in implementation as unsigned */ | |||
typedef i8 LevelType | ||||
typedef i16 PacketNumberType | ||||
/** @note MUST be interpreted in implementation as unsigned */ | ||||
typedef i32 PodType | ||||
/** @note MUST be interpreted in implementation as unsigned. | ||||
/** this has to be long enough to accommodate prefix */ | ||||
typedef binary IPv6Address | ||||
/** @note MUST be interpreted in implementation as unsigned */ | ||||
typedef i16 UDPPortType | ||||
/** @note MUST be interpreted in implementation as unsigned */ | ||||
typedef i32 TIENrType | ||||
/** @note MUST be interpreted in implementation as unsigned | ||||
This is carried in the security envelope and must | ||||
hence fit into 8 bits. */ | ||||
typedef i8 VersionType | ||||
/** @note MUST be interpreted in implementation as unsigned */ | ||||
typedef i16 MinorVersionType | ||||
/** @note MUST be interpreted in implementation as unsigned */ | ||||
typedef i32 MetricType | ||||
/** @note MUST be interpreted in implementation as unsigned | ||||
and unstructured */ | ||||
typedef i64 RouteTagType | ||||
/** @note MUST be interpreted in implementation as unstructured | ||||
label value */ | ||||
typedef i32 LabelType | ||||
/** @note MUST be interpreted in implementation as unsigned */ | ||||
typedef i32 BandwidthInMegaBitsType | ||||
/** @note Key Value key ID type */ | ||||
typedef i32 KeyIDType | ||||
/** node local, unique identification for a link (interface/tunnel/ | ||||
* etc., basically anything RIFT runs on). This is kept | ||||
* at 32 bits so it aligns with BFD (RFC 5880) discriminator size. | ||||
*/ | ||||
typedef i32 LinkIDType | ||||
/** @note MUST be interpreted in implementation as unsigned, | ||||
especially since we have the /128 IPv6 case. */ | ||||
typedef i8 PrefixLenType | ||||
/** timestamp in seconds since the epoch */ | ||||
typedef i64 TimestampInSecsType | ||||
/** security nonce. | ||||
@note MUST be interpreted in implementation as rolling | ||||
over unsigned value */ | ||||
typedef i16 NonceType | ||||
/** LIE FSM holdtime type */ | ||||
typedef i16 TimeIntervalInSecType | ||||
/** Transaction ID type for prefix mobility as specified by RFC 6550, | ||||
value MUST be interpreted in implementation as unsigned */ | ||||
typedef i8 PrefixTransactionIDType | ||||
/** Timestamp per IEEE 802.1AS, all values MUST be interpreted in | ||||
implementation as unsigned. */ | ||||
struct IEEE802_1ASTimeStampType { | ||||
1: required i64 AS_sec; | ||||
2: optional i32 AS_nsec; | ||||
} | ||||
/** generic counter type */ | ||||
typedef i64 CounterType | ||||
/** Platform Interface Index type, i.e., index of interface on | ||||
hardware, can be used, e.g., with RFC 5837 */ | ||||
typedef i32 PlatformInterfaceIndex | ||||
const PacketNumberType undefined_packet_number = 0 | /** Flags indicating node configuration in case of ZTP. | |||
/** used when node is configured as top of fabric in ZTP.*/ | */ | |||
const LevelType top_of_fabric_level = 24 | enum HierarchyIndications { | |||
/** default bandwidth on a link */ | /** forces level to 'leaf_level' and enables | |||
const BandwithInMegaBitsType default_bandwidth = 100 | according procedures */ | |||
/** fixed leaf level when ZTP is not used */ | leaf_only = 0, | |||
const LevelType leaf_level = 0 | /** forces level to 'leaf_level' and enables | |||
const LevelType default_level = leaf_level | according procedures */ | |||
const PodType default_pod = 0 | leaf_only_and_leaf_2_leaf_procedures = 1, | |||
const LinkIDType undefined_linkid = 0 | /** forces level to 'top_of_fabric' and enables according | |||
procedures */ | ||||
top_of_fabric = 2, | ||||
} | ||||
/** invalid key for key value */ | const PacketNumberType undefined_packet_number = 0 | |||
const KeyIDType invalid_key_value_key = 0 | /** used when node is configured as top of fabric in ZTP.*/ | |||
/** default distance used */ | const LevelType top_of_fabric_level = 24 | |||
const MetricType default_distance = 1 | /** default bandwidth on a link */ | |||
/** any distance larger than this will be considered infinity */ | const BandwidthInMegaBitsType default_bandwidth = 100 | |||
const MetricType infinite_distance = 0x7FFFFFFF | /** fixed leaf level when ZTP is not used */ | |||
/** represents invalid distance */ | const LevelType leaf_level = 0 | |||
const MetricType invalid_distance = 0 | const LevelType default_level = leaf_level | |||
const bool overload_default = false | const PodType default_pod = 0 | |||
const bool flood_reduction_default = true | const LinkIDType undefined_linkid = 0 | |||
/** default LIE FSM LIE TX internval time */ | ||||
const TimeIntervalInSecType default_lie_tx_interval = 1 | ||||
/** default LIE FSM holddown time */ | ||||
const TimeIntervalInSecType default_lie_holdtime = 3 | ||||
/** multipler for default_lie_holdtime to hold down multiple neighbors */ | ||||
const i8 multiple_neighbors_lie_holdtime_multipler = 4 | ||||
/** default ZTP FSM holddown time */ | ||||
const TimeIntervalInSecType default_ztp_holdtime = 1 | ||||
/** by default LIE levels are ZTP offers */ | ||||
const bool default_not_a_ztp_offer = false | ||||
/** by default everyone is repeating flooding */ | ||||
const bool default_you_are_flood_repeater = true | ||||
/** 0 is illegal for SystemID */ | ||||
const SystemIDType IllegalSystemID = 0 | ||||
/** empty set of nodes */ | ||||
const set<SystemIDType> empty_set_of_nodeids = {} | ||||
/** default lifetime of TIE is one week */ | ||||
const LifeTimeInSecType default_lifetime = 604800 | ||||
/** default lifetime when TIEs are purged is 5 minutes */ | ||||
const LifeTimeInSecType purge_lifetime = 300 | ||||
/** optional round down interval when TIEs are sent with security signatures | ||||
to prevent excessive computation. **/ | ||||
const LifeTimeInSecType rounddown_lifetime_interval = 60 | ||||
/** any `TieHeader` that has a smaller lifetime difference | ||||
than this constant is equal (if other fields equal). */ | ||||
const LifeTimeInSecType lifetime_diff2ignore = 400 | ||||
/** default UDP port to run LIEs on */ | /** invalid key for key value */ | |||
const UDPPortType default_lie_udp_port = 914 | const KeyIDType invalid_key_value_key = 0 | |||
/** default UDP port to receive TIEs on, that can be peer specific */ | /** default distance used */ | |||
const UDPPortType default_tie_udp_flood_port = 915 | const MetricType default_distance = 1 | |||
/** any distance larger than this will be considered infinity */ | ||||
const MetricType infinite_distance = 0x7FFFFFFF | ||||
/** represents invalid distance */ | ||||
const MetricType invalid_distance = 0 | ||||
const bool overload_default = false | ||||
const bool flood_reduction_default = true | ||||
/** default LIE FSM LIE TX interval time */ | ||||
const TimeIntervalInSecType default_lie_tx_interval = 1 | ||||
/** default LIE FSM holddown time */ | ||||
const TimeIntervalInSecType default_lie_holdtime = 3 | ||||
/** multiplier for default_lie_holdtime to | ||||
holddown multiple neighbors */ | ||||
const i8 multiple_neighbors_lie_holdtime_multiplier = 4 | ||||
/** default ZTP FSM holddown time */ | ||||
const TimeIntervalInSecType default_ztp_holdtime = 1 | ||||
/** by default LIE levels are ZTP offers */ | ||||
const bool default_not_a_ztp_offer = false | ||||
/** by default everyone is repeating flooding */ | ||||
const bool default_you_are_flood_repeater = true | ||||
/** 0 is illegal for System IDs */ | ||||
const SystemIDType IllegalSystemID = 0 | ||||
/** empty set of nodes */ | ||||
const set<SystemIDType> empty_set_of_nodeids = {} | ||||
/** default lifetime of TIE is one week */ | ||||
const LifeTimeInSecType default_lifetime = 604800 | ||||
/** default lifetime when TIEs are purged is 5 minutes */ | ||||
const LifeTimeInSecType purge_lifetime = 300 | ||||
/** optional round down interval when | ||||
* TIEs are sent with security signatures | ||||
* to prevent excessive computation. | ||||
*/ | ||||
const LifeTimeInSecType rounddown_lifetime_interval = 60 | ||||
/** any 'TieHeader' that has a smaller lifetime difference | ||||
than this constant is equal (if other fields equal). */ | ||||
const LifeTimeInSecType lifetime_diff2ignore = 400 | ||||
/** default MTU link size to use */ | /** default UDP port to run LIEs on */ | |||
const MTUSizeType default_mtu_size = 1400 | const UDPPortType default_lie_udp_port = 914 | |||
/** default link being BFD capable */ | /** default UDP port to receive TIEs on, | |||
const bool bfd_default = true | which can be peer specific */ | |||
const UDPPortType default_tie_udp_flood_port = 915 | ||||
/** type used to target nodes with key value */ | /** default MTU link size to use */ | |||
typedef i64 KeyValueTargetType | const MTUSizeType default_mtu_size = 1400 | |||
/** default link being BFD capable */ | ||||
const bool bfd_default = true | ||||
/** default target for key value are all nodes. */ | /** type used to target nodes with key value */ | |||
const KeyValueTargetType keyvaluetarget_default = 0 | typedef i64 KeyValueTargetType | |||
/** value for _all leaves_ addressing. Represented by all bits set. */ | ||||
const KeyValueTargetType keyvaluetarget_all_south_leaves = -1 | ||||
/** undefined nonce, equivalent to missing nonce */ | /** default target for key value are all nodes. */ | |||
const NonceType undefined_nonce = 0; | const KeyValueTargetType keyvaluetarget_default = 0 | |||
/** outer security Key ID, MUST be interpreted as in implementation | /** value for _all leaves_ addressing. | |||
as unsigned */ | Represented by all bits set. */ | |||
typedef i8 OuterSecurityKeyID | const KeyValueTargetType keyvaluetarget_all_south_leaves = -1 | |||
/** security Key ID, MUST be interpreted as in implementation | ||||
as unsigned */ | ||||
typedef i32 TIESecurityKeyID | ||||
/** undefined key */ | ||||
const TIESecurityKeyID undefined_securitykey_id = 0; | ||||
/** Maximum delta (negative or positive) that a mirrored nonce can | ||||
deviate from local value to be considered valid. */ | ||||
const i16 maximum_valid_nonce_delta = 5; | ||||
const TimeIntervalInSecType nonce_regeneration_interval = 300; | ||||
/** Direction of TIEs. */ | /** undefined nonce, equivalent to missing nonce */ | |||
enum TieDirectionType { | const NonceType undefined_nonce = 0; | |||
Illegal = 0, | /** outer security key ID, MUST be interpreted as in implementation | |||
South = 1, | as unsigned */ | |||
North = 2, | typedef i8 OuterSecurityKeyID | |||
DirectionMaxValue = 3, | /** security key ID, MUST be interpreted as in implementation | |||
} | as unsigned */ | |||
typedef i32 TIESecurityKeyID | ||||
/** undefined key */ | ||||
const TIESecurityKeyID undefined_securitykey_id = 0; | ||||
/** Maximum delta (negative or positive) that a mirrored nonce can | ||||
deviate from local value to be considered valid. */ | ||||
const i16 maximum_valid_nonce_delta = 5; | ||||
const TimeIntervalInSecType nonce_regeneration_interval = 300; | ||||
/** Address family type. */ | /** Direction of TIEs. */ | |||
enum AddressFamilyType { | enum TieDirectionType { | |||
Illegal = 0, | Illegal = 0, | |||
AddressFamilyMinValue = 1, | South = 1, | |||
IPv4 = 2, | North = 2, | |||
IPv6 = 3, | DirectionMaxValue = 3, | |||
AddressFamilyMaxValue = 4, | } | |||
} | ||||
/** IPv4 prefix type. */ | /** Address family type. */ | |||
struct IPv4PrefixType { | enum AddressFamilyType { | |||
1: required IPv4Address address; | Illegal = 0, | |||
2: required PrefixLenType prefixlen; | AddressFamilyMinValue = 1, | |||
} | IPv4 = 2, | |||
IPv6 = 3, | ||||
AddressFamilyMaxValue = 4, | ||||
} | ||||
/** IPv6 prefix type. */ | /** IPv4 prefix type. */ | |||
struct IPv6PrefixType { | struct IPv4PrefixType { | |||
1: required IPv6Address address; | 1: required IPv4Address address; | |||
2: required PrefixLenType prefixlen; | 2: required PrefixLenType prefixlen; | |||
} | } | |||
/** IP address type. */ | /** IPv6 prefix type. */ | |||
union IPAddressType { | struct IPv6PrefixType { | |||
/** Content is IPv4 */ | 1: required IPv6Address address; | |||
1: optional IPv4Address ipv4address; | 2: required PrefixLenType prefixlen; | |||
/** Content is IPv6 */ | } | |||
2: optional IPv6Address ipv6address; | ||||
} | ||||
/** Prefix advertisement. | /** IP address type. */ | |||
union IPAddressType { | ||||
/** Content is IPv4 */ | ||||
1: optional IPv4Address ipv4address; | ||||
/** Content is IPv6 */ | ||||
2: optional IPv6Address ipv6address; | ||||
} | ||||
@note: for interface | /** Prefix advertisement. | |||
addresses the protocol can propagate the address part beyond | ||||
the subnet mask and on reachability computation that has to | ||||
be normalized. The non-significant bits can be used | ||||
for operational purposes. | ||||
*/ | ||||
union IPPrefixType { | ||||
1: optional IPv4PrefixType ipv4prefix; | ||||
2: optional IPv6PrefixType ipv6prefix; | ||||
} | ||||
/** Sequence of a prefix in case of move. | @note: For interface | |||
*/ | addresses, the protocol can propagate the address part beyond | |||
struct PrefixSequenceType { | the subnet mask and on reachability computation that has to | |||
1: required IEEE802_1ASTimeStampType timestamp; | be normalized. The non-significant bits can be used | |||
/** Transaction ID set by client in e.g. in 6LoWPAN. */ | for operational purposes. | |||
2: optional PrefixTransactionIDType transactionid; | */ | |||
} | union IPPrefixType { | |||
1: optional IPv4PrefixType ipv4prefix; | ||||
2: optional IPv6PrefixType ipv6prefix; | ||||
} | ||||
/** Type of TIE. | /** Sequence of a prefix in case of move. | |||
*/ | */ | |||
enum TIETypeType { | struct PrefixSequenceType { | |||
Illegal = 0, | 1: required IEEE802_1ASTimeStampType timestamp; | |||
TIETypeMinValue = 1, | /** Transaction ID set by the client in, e.g., 6LoWPAN. */ | |||
/** first legal value */ | 2: optional PrefixTransactionIDType transactionid; | |||
NodeTIEType = 2, | } | |||
PrefixTIEType = 3, | ||||
PositiveDisaggregationPrefixTIEType = 4, | ||||
NegativeDisaggregationPrefixTIEType = 5, | ||||
PGPrefixTIEType = 6, | ||||
KeyValueTIEType = 7, | ||||
ExternalPrefixTIEType = 8, | ||||
PositiveExternalDisaggregationPrefixTIEType = 9, | ||||
TIETypeMaxValue = 10, | ||||
} | ||||
/** RIFT route types. | /** Type of TIE. | |||
@note: The only purpose of those values is to introduce an | */ | |||
ordering whereas an implementation can choose internally | enum TIETypeType { | |||
any other values as long the ordering is preserved | Illegal = 0, | |||
*/ | TIETypeMinValue = 1, | |||
enum RouteType { | /** first legal value */ | |||
Illegal = 0, | NodeTIEType = 2, | |||
RouteTypeMinValue = 1, | PrefixTIEType = 3, | |||
/** First legal value. */ | PositiveDisaggregationPrefixTIEType = 4, | |||
/** Discard routes are most preferred */ | NegativeDisaggregationPrefixTIEType = 5, | |||
Discard = 2, | PGPrefixTIEType = 6, | |||
KeyValueTIEType = 7, | ||||
ExternalPrefixTIEType = 8, | ||||
PositiveExternalDisaggregationPrefixTIEType = 9, | ||||
TIETypeMaxValue = 10, | ||||
} | ||||
/** Local prefixes are directly attached prefixes on the | /** RIFT route types. | |||
* system such as e.g. interface routes. | @note: The only purpose of those values is to introduce an | |||
*/ | ordering, whereas an implementation can internally choose | |||
LocalPrefix = 3, | any other values as long the ordering is preserved. | |||
/** Advertised in S-TIEs */ | */ | |||
SouthPGPPrefix = 4, | enum RouteType { | |||
/** Advertised in N-TIEs */ | Illegal = 0, | |||
NorthPGPPrefix = 5, | RouteTypeMinValue = 1, | |||
/** Advertised in N-TIEs */ | /** First legal value. */ | |||
NorthPrefix = 6, | /** Discard routes are most preferred */ | |||
/** Externally imported north */ | Discard = 2, | |||
NorthExternalPrefix = 7, | ||||
/** Advertised in S-TIEs, either normal prefix or positive | ||||
disaggregation */ | ||||
SouthPrefix = 8, | ||||
/** Externally imported south */ | ||||
SouthExternalPrefix = 9, | ||||
/** Negative, transitive prefixes are least preferred */ | ||||
NegativeSouthPrefix = 10, | ||||
RouteTypeMaxValue = 11, | ||||
} | ||||
enum KVTypes { | /** Local prefixes are directly attached prefixes on the | |||
Experimental = 1, | * system, such as interface routes. | |||
WellKnown = 2, | */ | |||
OUI = 3, | LocalPrefix = 3, | |||
} | /** Advertised in S-TIEs */ | |||
SouthPGPPrefix = 4, | ||||
/** Advertised in N-TIEs */ | ||||
NorthPGPPrefix = 5, | ||||
/** Advertised in N-TIEs */ | ||||
NorthPrefix = 6, | ||||
/** Externally imported north */ | ||||
NorthExternalPrefix = 7, | ||||
/** Advertised in S-TIEs, either normal prefix or positive | ||||
disaggregation */ | ||||
SouthPrefix = 8, | ||||
/** Externally imported south */ | ||||
SouthExternalPrefix = 9, | ||||
/** Negative, transitive prefixes are least preferred */ | ||||
NegativeSouthPrefix = 10, | ||||
RouteTypeMaxValue = 11, | ||||
} | ||||
enum KVTypes { | ||||
Experimental = 1, | ||||
WellKnown = 2, | ||||
OUI = 3, | ||||
} | ||||
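The RouteType comments above state that the values exist only to impose an ordering, with Discard most preferred and NegativeSouthPrefix least preferred. The following is a minimal sketch of how that ordering could be applied, not part of the schema. The helper names `is_legal` and `most_preferred` are hypothetical. Treating only values strictly between the Min/Max sentinels as legal is an assumption drawn from the "first legal value" comments.

```python
from enum import IntEnum


class RouteType(IntEnum):
    """Mirror of the RouteType values listed in common.thrift."""
    Illegal = 0
    RouteTypeMinValue = 1
    Discard = 2
    LocalPrefix = 3
    SouthPGPPrefix = 4
    NorthPGPPrefix = 5
    NorthPrefix = 6
    NorthExternalPrefix = 7
    SouthPrefix = 8
    SouthExternalPrefix = 9
    NegativeSouthPrefix = 10
    RouteTypeMaxValue = 11


def is_legal(route_type):
    # Hypothetical helper. Assumption: legal values lie strictly
    # between the RouteTypeMinValue and RouteTypeMaxValue sentinels.
    return (RouteType.RouteTypeMinValue < route_type
            < RouteType.RouteTypeMaxValue)


def most_preferred(candidates):
    # Hypothetical helper: the lowest legal value is the most
    # preferred route type; returns None if nothing legal is offered.
    legal = [rt for rt in candidates if is_legal(rt)]
    return min(legal) if legal else None
```

As the note in the schema allows, an implementation may use different internal values as long as the same relative ordering is kept.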
7.3. encoding.thrift | 7.3. encoding.thrift | |||
/** | /** | |||
Thrift file for packet encodings for RIFT | Thrift file for packet encodings for RIFT | |||
*/ | */ | |||
include "common.thrift" | include "common.thrift" | |||
namespace py encoding | namespace py encoding | |||
/** Represents protocol encoding schema major version */ | /** Represents protocol encoding schema major version */ | |||
const common.VersionType protocol_major_version = 8 | const common.VersionType protocol_major_version = 8 | |||
/** Represents protocol encoding schema minor version */ | /** Represents protocol encoding schema minor version */ | |||
const common.MinorVersionType protocol_minor_version = 0 | const common.MinorVersionType protocol_minor_version = 0 | |||
/** Common RIFT packet header. */ | ||||
struct PacketHeader { | ||||
/** Major version of protocol. */ | ||||
1: required common.VersionType major_version = | ||||
protocol_major_version; | ||||
/** Minor version of protocol. */ | ||||
2: required common.MinorVersionType minor_version = | ||||
protocol_minor_version; | ||||
/** Node sending the packet, in case of LIE/TIRE/TIDE | ||||
also the originator of it. */ | ||||
3: required common.SystemIDType sender; | ||||
/** Level of the node sending the packet, required on everything | ||||
except LIEs. Lack of presence on LIEs indicates UNDEFINED_LEVEL | ||||
and is used in ZTP procedures. | ||||
*/ | ||||
4: optional common.LevelType level; | ||||
} | ||||
/** Prefix community. */ | /** Common RIFT packet header. */ | |||
struct Community { | struct PacketHeader { | |||
/** Higher order bits */ | /** Major version of protocol. */ | |||
1: required i32 top; | 1: required common.VersionType major_version = | |||
/** Lower order bits */ | protocol_major_version; | |||
2: required i32 bottom; | /** Minor version of protocol. */ | |||
} | 2: required common.MinorVersionType minor_version = | |||
protocol_minor_version; | ||||
/** Node sending the packet, in case of LIE/TIRE/TIDE | ||||
also the originator of it. */ | ||||
3: required common.SystemIDType sender; | ||||
/** Level of the node sending the packet, required on everything | ||||
except LIEs. Lack of presence on LIEs indicates | ||||
UNDEFINED_LEVEL and is used in ZTP procedures. | ||||
*/ | ||||
4: optional common.LevelType level; | ||||
} | ||||
/** Neighbor structure. */ | /** Prefix community. */ | |||
struct Neighbor { | struct Community { | |||
/** System ID of the originator. */ | /** Higher order bits */ | |||
1: required common.SystemIDType originator; | 1: required i32 top; | |||
/** ID of remote side of the link. */ | /** Lower order bits */ | |||
2: required common.LinkIDType remote_id; | 2: required i32 bottom; | |||
} | } | |||
/** Capabilities the node supports. */ | /** Neighbor structure. */ | |||
struct NodeCapabilities { | struct Neighbor { | |||
/** Must advertise supported minor version dialect that way. */ | /** System ID of the originator. */ | |||
1: required common.MinorVersionType protocol_minor_version = | 1: required common.SystemIDType originator; | |||
protocol_minor_version; | /** ID of remote side of the link. */ | |||
/** indicates that node supports flood reduction. */ | 2: required common.LinkIDType remote_id; | |||
2: optional bool flood_reduction = | } | |||
common.flood_reduction_default; | ||||
/** indicates place in hierarchy, i.e. top-of-fabric or | ||||
leaf only (in ZTP) or support for leaf-2-leaf | ||||
procedures. */ | ||||
3: optional common.HierarchyIndications hierarchy_indications; | ||||
} | /** Capabilities the node supports. */ | |||
struct NodeCapabilities { | ||||
/** Must advertise supported minor version dialect that way. */ | ||||
1: required common.MinorVersionType protocol_minor_version = | ||||
protocol_minor_version; | ||||
/** indicates that node supports flood reduction. */ | ||||
2: optional bool flood_reduction = | ||||
common.flood_reduction_default; | ||||
/** indicates place in hierarchy, i.e., top of fabric or | ||||
leaf only (in ZTP) or support for L2L | ||||
procedures. */ | ||||
3: optional common.HierarchyIndications hierarchy_indications; | ||||
} | ||||
/** Link capabilities. */ | /** Link capabilities. */ | |||
struct LinkCapabilities { | struct LinkCapabilities { | |||
/** Indicates that the link is supporting BFD. */ | /** Indicates that the link is supporting BFD. */ | |||
1: optional bool bfd = | 1: optional bool bfd = | |||
common.bfd_default; | common.bfd_default; | |||
/** Indicates whether the interface will support IPv4 forwarding. */ | /** Indicates whether the interface will support IPv4 | |||
2: optional bool ipv4_forwarding_capable = | forwarding. */ | |||
true; | 2: optional bool ipv4_forwarding_capable = | |||
} | true; | |||
} | ||||
/** RIFT LIE Packet. | /** RIFT LIE Packet. | |||
@note: this node's level is already included on the packet header | @note: This node's level is already included on the packet header. | |||
*/ | */ | |||
struct LIEPacket { | struct LIEPacket { | |||
/** Node or adjacency name. */ | /** Node or adjacency name. */ | |||
1: optional string name; | 1: optional string name; | |||
/** Local link ID. */ | /** Local link ID. */ | |||
2: required common.LinkIDType local_id; | 2: required common.LinkIDType local_id; | |||
/** UDP port to which we can receive flooded TIEs. */ | /** UDP port to which we can receive flooded TIEs. */ | |||
3: required common.UDPPortType flood_port = | 3: required common.UDPPortType flood_port = | |||
common.default_tie_udp_flood_port; | common.default_tie_udp_flood_port; | |||
/** Layer 2 MTU, used to discover mismatch. */ | /** Layer 2 MTU, used to discover mismatch. */ | |||
4: optional common.MTUSizeType link_mtu_size = | 4: optional common.MTUSizeType link_mtu_size = | |||
common.default_mtu_size; | common.default_mtu_size; | |||
/** Local link bandwidth on the interface. */ | /** Local link bandwidth on the interface. */ | |||
5: optional common.BandwithInMegaBitsType | 5: optional common.BandwidthInMegaBitsType | |||
link_bandwidth = common.default_bandwidth; | link_bandwidth = common.default_bandwidth; | |||
/** Reflects the neighbor once received to provide | /** Reflects the neighbor once received to provide | |||
3-way connectivity. */ | 3-way connectivity. */ | |||
6: optional Neighbor neighbor; | 6: optional Neighbor neighbor; | |||
/** Node's PoD. */ | /** Node's PoD. */ | |||
7: optional common.PodType pod = | 7: optional common.PodType pod = | |||
common.default_pod; | common.default_pod; | |||
/** Node capabilities supported. */ | /** Node capabilities supported. */ | |||
10: required NodeCapabilities node_capabilities; | 10: required NodeCapabilities node_capabilities; | |||
/** Capabilities of this link. */ | /** Capabilities of this link. */ | |||
11: optional LinkCapabilities link_capabilities; | 11: optional LinkCapabilities link_capabilities; | |||
/** Required holdtime of the adjacency, i.e. for how | /** Required holdtime of the adjacency, i.e., for how long a | |||
long a period should adjacency be kept up without valid LIE reception. */ | period adjacency should be kept up without valid LIE | |||
12: required common.TimeIntervalInSecType | reception. */ | |||
holdtime = common.default_lie_holdtime; | 12: required common.TimeIntervalInSecType | |||
/** Optional, unsolicited, downstream assigned locally significant label | holdtime = common.default_lie_holdtime; | |||
value for the adjacency. */ | /** Optional, unsolicited, downstream assigned locally significant | |||
13: optional common.LabelType label; | label value for the adjacency. */ | |||
/** Indicates that the level on the LIE must not be used | 13: optional common.LabelType label; | |||
to derive a ZTP level by the receiving node. */ | /** Indicates that the level on the LIE must not be used | |||
21: optional bool not_a_ztp_offer = | to derive a ZTP level by the receiving node. */ | |||
common.default_not_a_ztp_offer; | 21: optional bool not_a_ztp_offer = | |||
/** Indicates to northbound neighbor that it should | common.default_not_a_ztp_offer; | |||
be reflooding TIEs received from this node to achieve flood | /** Indicates to northbound neighbor that it should | |||
reduction and balancing for northbound flooding. */ | be reflooding TIEs received from this node to achieve flood | |||
22: optional bool you_are_flood_repeater = | reduction and balancing for northbound flooding. */ | |||
common.default_you_are_flood_repeater; | 22: optional bool you_are_flood_repeater = | |||
/** Indicates to neighbor to flood node TIEs only and slow down | common.default_you_are_flood_repeater; | |||
all other TIEs. Ignored when received from southbound neighbor. */ | /** Indicates to neighbor to flood node TIEs only and slow down | |||
23: optional bool you_are_sending_too_quickly = | all other TIEs. Ignored when received from southbound | |||
false; | neighbor. */ | |||
/** Instance name in case multiple RIFT instances running on same | 23: optional bool you_are_sending_too_quickly = | |||
interface. */ | false; | |||
24: optional string instance_name; | /** Instance name in case multiple RIFT instances running on same | |||
/** It provides the optional ID of the Fabric configured. This MUST match the information advertised | interface. */ | |||
on the node element. */ | 24: optional string instance_name; | |||
35: optional common.FabricIDType fabric_id = common.default_fabric_id; | /** It provides the optional ID of the fabric configured. This | |||
MUST match the information advertised on the node element. */ | ||||
35: optional common.FabricIDType fabric_id = | ||||
common.default_fabric_id; | ||||
} | } | |||
/** LinkID pair describes one of parallel links between two nodes. */ | /** LinkID pair describes one of parallel links between two nodes. */ | |||
struct LinkIDPair { | struct LinkIDPair { | |||
/** Node-wide unique value for the local link. */ | /** Node-wide unique value for the local link. */ | |||
1: required common.LinkIDType local_id; | 1: required common.LinkIDType local_id; | |||
/** Received remote link ID for this link. */ | /** Received remote link ID for this link. */ | |||
2: required common.LinkIDType remote_id; | 2: required common.LinkIDType remote_id; | |||
/** Describes the local interface index of the link. */ | /** Describes the local interface index of the link. */ | |||
10: optional common.PlatformInterfaceIndex platform_interface_index; | 10: optional common.PlatformInterfaceIndex | |||
/** Describes the local interface name. */ | platform_interface_index; | |||
11: optional string platform_interface_name; | /** Describes the local interface name. */ | |||
/** Indicates whether the link is secured, i.e. protected by | 11: optional string platform_interface_name; | |||
outer key, absence of this element means no indication, | /** Indicates whether the link is secured, i.e., protected by | |||
undefined outer key means not secured. */ | outer key, absence of this element means no indication, | |||
12: optional common.OuterSecurityKeyID | undefined outer key means not secured. */ | |||
trusted_outer_security_key; | 12: optional common.OuterSecurityKeyID | |||
/** Indicates whether the link is protected by established | trusted_outer_security_key; | |||
BFD session. */ | /** Indicates whether the link is protected by established | |||
13: optional bool bfd_up; | BFD session. */ | |||
/** Optional indication which address families are up on the | 13: optional bool bfd_up; | |||
interface */ | /** Optional indication which address families are up on the | |||
14: optional set<common.AddressFamilyType> | interface */ | |||
address_families; | 14: optional set<common.AddressFamilyType> | |||
} | address_families; | |||
} | ||||
/** Unique ID of a TIE. */ | /** Unique ID of a TIE. */ | |||
struct TIEID { | struct TIEID { | |||
/** direction of TIE */ | /** direction of TIE */ | |||
1: required common.TieDirectionType direction; | 1: required common.TieDirectionType direction; | |||
/** indicates originator of the TIE */ | /** indicates originator of the TIE */ | |||
2: required common.SystemIDType originator; | 2: required common.SystemIDType originator; | |||
/** type of the tie */ | /** type of the tie */ | |||
3: required common.TIETypeType tietype; | 3: required common.TIETypeType tietype; | |||
/** number of the tie */ | /** number of the tie */ | |||
4: required common.TIENrType tie_nr; | 4: required common.TIENrType tie_nr; | |||
} | } | |||
/** Header of a TIE. */ | /** Header of a TIE. */ | |||
struct TIEHeader { | struct TIEHeader { | |||
/** ID of the tie. */ | /** ID of the tie. */ | |||
2: required TIEID tieid; | 2: required TIEID tieid; | |||
/** Sequence number of the tie. */ | /** Sequence number of the tie. */ | |||
3: required common.SeqNrType seq_nr; | 3: required common.SeqNrType seq_nr; | |||
/** Absolute timestamp when the TIE was generated. */ | /** Absolute timestamp when the TIE was generated. */ | |||
10: optional common.IEEE802_1ASTimeStampType origination_time; | 10: optional common.IEEE802_1ASTimeStampType origination_time; | |||
/** Original lifetime when the TIE was generated. */ | /** Original lifetime when the TIE was generated. */ | |||
12: optional common.LifeTimeInSecType origination_lifetime; | 12: optional common.LifeTimeInSecType origination_lifetime; | |||
} | } | |||
/** Header of a TIE as described in TIRE/TIDE. | /** Header of a TIE as described in TIRE/TIDE. | |||
*/ | */ | |||
struct TIEHeaderWithLifeTime { | struct TIEHeaderWithLifeTime { | |||
1: required TIEHeader header; | 1: required TIEHeader header; | |||
/** Remaining lifetime. */ | /** Remaining lifetime. */ | |||
2: required common.LifeTimeInSecType remaining_lifetime; | 2: required common.LifeTimeInSecType remaining_lifetime; | |||
} | } | |||
/** TIDE with *sorted* TIE headers. */ | /** TIDE with *sorted* TIE headers. */ | |||
struct TIDEPacket { | struct TIDEPacket { | |||
/** First TIE header in the tide packet. */ | /** First TIE header in the TIDE packet. */ | |||
1: required TIEID start_range; | 1: required TIEID start_range; | |||
/** Last TIE header in the tide packet. */ | /** Last TIE header in the TIDE packet. */ | |||
2: required TIEID end_range; | 2: required TIEID end_range; | |||
/** _Sorted_ list of headers. */ | /** _Sorted_ list of headers. */ | |||
3: required list<TIEHeaderWithLifeTime> | 3: required list<TIEHeaderWithLifeTime> | |||
headers; | headers; | |||
} | } | |||
/** TIRE packet */ | /** TIRE packet */ | |||
struct TIREPacket { | struct TIREPacket { | |||
1: required set<TIEHeaderWithLifeTime> | 1: required set<TIEHeaderWithLifeTime> | |||
headers; | headers; | |||
} | } | |||
/** neighbor of a node */ | ||||
struct NodeNeighborsTIEElement { | ||||
/** level of neighbor */ | ||||
1: required common.LevelType level; | ||||
/** Cost to neighbor. Ignore anything equal/larger than `infinite_distance` or equal `invalid_distance` */ | ||||
3: optional common.MetricType cost | ||||
= common.default_distance; | ||||
/** can carry description of multiple parallel links in a TIE */ | ||||
4: optional set<LinkIDPair> | ||||
link_ids; | ||||
/** total bandwith to neighbor as sum of all parallel links */ | ||||
5: optional common.BandwithInMegaBitsType | ||||
bandwidth = common.default_bandwidth; | ||||
} | ||||
/** Indication flags of the node. */ | /** neighbor of a node */ | |||
struct NodeFlags { | struct NodeNeighborsTIEElement { | |||
/** Indicates that node is in overload, do not transit traffic | /** level of neighbor */ | |||
through it. */ | 1: required common.LevelType level; | |||
1: optional bool overload = common.overload_default; | /** Cost to neighbor. Ignore anything equal/larger than | |||
} | 'infinite_distance' or equal 'invalid_distance' */ | |||
3: optional common.MetricType cost | ||||
= common.default_distance; | ||||
/** can carry description of multiple parallel links in a TIE */ | ||||
4: optional set<LinkIDPair> | ||||
link_ids; | ||||
/** total bandwidth to neighbor as sum of all parallel links */ | ||||
5: optional common.BandwidthInMegaBitsType | ||||
bandwidth = common.default_bandwidth; | ||||
} | ||||
/** Description of a node. */ | /** Indication flags of the node. */ | |||
struct NodeTIEElement { | struct NodeFlags { | |||
/** Level of the node. */ | /** Indicates that node is in overload, do not transit traffic | |||
1: required common.LevelType level; | through it. */ | |||
/** Node's neighbors. Multiple node TIEs can carry disjoint sets of neighbors. */ | 1: optional bool overload = common.overload_default; | |||
2: required map<common.SystemIDType, | } | |||
NodeNeighborsTIEElement> neighbors; | ||||
/** Capabilities of the node. */ | ||||
3: required NodeCapabilities capabilities; | ||||
/** Flags of the node. */ | ||||
4: optional NodeFlags flags; | ||||
/** Optional node name for easier operations. */ | ||||
5: optional string name; | ||||
/** PoD to which the node belongs. */ | ||||
6: optional common.PodType pod; | ||||
/** optional startup time of the node */ | ||||
7: optional common.TimestampInSecsType startup_time; | ||||
/** If any local links are miscabled, this indication is flooded. */ | /** Description of a node. */ | |||
10: optional set<common.LinkIDType> | struct NodeTIEElement { | |||
miscabled_links; | /** Level of the node. */ | |||
1: required common.LevelType level; | ||||
/** Node's neighbors. Multiple node TIEs can carry disjoint sets | ||||
of neighbors. */ | ||||
2: required map<common.SystemIDType, | ||||
NodeNeighborsTIEElement> neighbors; | ||||
/** Capabilities of the node. */ | ||||
3: required NodeCapabilities capabilities; | ||||
/** Flags of the node. */ | ||||
4: optional NodeFlags flags; | ||||
/** Optional node name for easier operations. */ | ||||
5: optional string name; | ||||
/** PoD to which the node belongs. */ | ||||
6: optional common.PodType pod; | ||||
/** Optional startup time of the node */ | ||||
7: optional common.TimestampInSecsType startup_time; | ||||
/** ToFs in the same plane. Only carried by ToF. Multiple Node TIEs can carry disjoint sets of ToFs | /** If any local links are miscabled, this indication is | |||
which MUST be joined to form a single set. */ | flooded. */ | |||
12: optional set<common.SystemIDType> | 10: optional set<common.LinkIDType> | |||
same_plane_tofs; | miscabled_links; | |||
/** It provides the optional ID of the Fabric configured */ | /** ToFs in the same plane. Only carried by ToF. Multiple Node | |||
20: optional common.FabricIDType fabric_id = common.default_fabric_id; | TIEs can carry disjoint sets of ToFs that MUST be joined to | |||
form a single set. */ | ||||
12: optional set<common.SystemIDType> | ||||
same_plane_tofs; | ||||
} | /** It provides the optional ID of the fabric configured */ | |||
20: optional common.FabricIDType fabric_id = | ||||
common.default_fabric_id; | ||||
/** Attributes of a prefix. */ | } | |||
struct PrefixAttributes { | ||||
/** Distance of the prefix. */ | ||||
2: required common.MetricType metric | ||||
= common.default_distance; | ||||
/** Generic unordered set of route tags, can be redistributed | ||||
to other protocols or use within the context of real time | ||||
analytics. */ | ||||
3: optional set<common.RouteTagType> | ||||
tags; | ||||
/** Monotonic clock for mobile addresses. */ | ||||
4: optional common.PrefixSequenceType monotonic_clock; | ||||
/** Indicates if the prefix is a node loopback. */ | ||||
6: optional bool loopback = false; | ||||
/** Indicates that the prefix is directly attached. */ | ||||
7: optional bool directly_attached = true; | ||||
/** link to which the address belongs to. */ | ||||
10: optional common.LinkIDType from_link; | ||||
/** Optional, per prefix significant label. */ | ||||
12: optional common.LabelType label; | ||||
} | ||||
/** TIE carrying prefixes */ | /** Attributes of a prefix. */ | |||
struct PrefixTIEElement { | struct PrefixAttributes { | |||
/** Prefixes with the associated attributes. */ | /** Distance of the prefix. */ | |||
1: required map<common.IPPrefixType, PrefixAttributes> prefixes; | 2: required common.MetricType metric | |||
} | = common.default_distance; | |||
/** Generic unordered set of route tags, can be redistributed | ||||
to other protocols or used within the context of real time | ||||
analytics. */ | ||||
3: optional set<common.RouteTagType> | ||||
tags; | ||||
/** Monotonic clock for mobile addresses. */ | ||||
4: optional common.PrefixSequenceType monotonic_clock; | ||||
/** Indicates if the prefix is a node loopback. */ | ||||
6: optional bool loopback = false; | ||||
/** Indicates that the prefix is directly attached. */ | ||||
7: optional bool directly_attached = true; | ||||
/** Link to which the address belongs to. */ | ||||
10: optional common.LinkIDType from_link; | ||||
/** Optional, per-prefix significant label. */ | ||||
12: optional common.LabelType label; | ||||
} | ||||
/** Defines the targeted nodes and the value carried. */ | /** TIE carrying prefixes */ | |||
struct KeyValueTIEElementContent { | struct PrefixTIEElement { | |||
1: optional common.KeyValueTargetType targets = common.keyvaluetarget_default; | /** Prefixes with the associated attributes. */ | |||
2: optional binary value; | 1: required map<common.IPPrefixType, PrefixAttributes> prefixes; | |||
} | } | |||
/** Generic key value pairs. */ | /** Defines the targeted nodes and the value carried. */ | |||
struct KeyValueTIEElement { | struct KeyValueTIEElementContent { | |||
1: required map<common.KeyIDType, KeyValueTIEElementContent> keyvalues; | 1: optional common.KeyValueTargetType targets = | |||
} | common.keyvaluetarget_default; | |||
2: optional binary value; | ||||
} | ||||
/** Single element in a TIE. */ | /** Generic key value pairs. */ | |||
union TIEElement { | struct KeyValueTIEElement { | |||
/** Used in case of enum common.TIETypeType.NodeTIEType. */ | 1: required map<common.KeyIDType, KeyValueTIEElementContent> | |||
1: optional NodeTIEElement node; | keyvalues; | |||
/** Used in case of enum common.TIETypeType.PrefixTIEType. */ | } | |||
2: optional PrefixTIEElement prefixes; | ||||
/** Positive prefixes (always southbound). */ | ||||
3: optional PrefixTIEElement positive_disaggregation_prefixes; | ||||
/** Transitive, negative prefixes (always southbound) */ | ||||
5: optional PrefixTIEElement negative_disaggregation_prefixes; | ||||
/** Externally reimported prefixes. */ | ||||
6: optional PrefixTIEElement external_prefixes; | ||||
/** Positive external disaggregated prefixes (always southbound). */ | ||||
7: optional PrefixTIEElement | ||||
positive_external_disaggregation_prefixes; | ||||
/** Key-Value store elements. */ | ||||
9: optional KeyValueTIEElement keyvalues; | ||||
} | ||||
/** TIE packet */ | /** Single element in a TIE. */ | |||
struct TIEPacket { | union TIEElement { | |||
1: required TIEHeader header; | /** Used in case of enum common.TIETypeType.NodeTIEType. */ | |||
2: required TIEElement element; | 1: optional NodeTIEElement node; | |||
} | /** Used in case of enum common.TIETypeType.PrefixTIEType. */ | |||
2: optional PrefixTIEElement prefixes; | ||||
/** Positive prefixes (always southbound). */ | ||||
3: optional PrefixTIEElement positive_disaggregation_prefixes; | ||||
/** Transitive, negative prefixes (always southbound) */ | ||||
5: optional PrefixTIEElement negative_disaggregation_prefixes; | ||||
/** Externally reimported prefixes. */ | ||||
6: optional PrefixTIEElement external_prefixes; | ||||
/** Positive external disaggregated prefixes (always | ||||
southbound). */ | ||||
7: optional PrefixTIEElement | ||||
positive_external_disaggregation_prefixes; | ||||
/** Key-Value store elements. */ | ||||
9: optional KeyValueTIEElement keyvalues; | ||||
} | ||||
/** Content of a RIFT packet. */ | /** TIE packet */ | |||
union PacketContent { | struct TIEPacket { | |||
1: optional LIEPacket lie; | 1: required TIEHeader header; | |||
2: optional TIDEPacket tide; | 2: required TIEElement element; | |||
3: optional TIREPacket tire; | } | |||
4: optional TIEPacket tie; | ||||
} | ||||
/** RIFT packet structure. */ | /** Content of a RIFT packet. */ | |||
struct ProtocolPacket { | union PacketContent { | |||
1: required PacketHeader header; | 1: optional LIEPacket lie; | |||
2: required PacketContent content; | 2: optional TIDEPacket tide; | |||
} | 3: optional TIREPacket tire; | |||
4: optional TIEPacket tie; | ||||
} | ||||
/** RIFT packet structure. */ | ||||
struct ProtocolPacket { | ||||
1: required PacketHeader header; | ||||
2: required PacketContent content; | ||||
} | ||||
8. Further Details on Implementation | 8. Further Details on Implementation | |||
8.1. Considerations for Leaf-Only Implementation | 8.1. Considerations for Leaf-Only Implementation | |||
RIFT can and is intended to be stretched to the lowest level in the | RIFT can and is intended to be stretched to the lowest level in the | |||
IP fabric to integrate ToRs or even servers. Since those entities | IP fabric to integrate ToRs or even servers. Since those entities | |||
would run as leaves only, it is worth to observe that a leaf only | would run as leaves only, it is worth it to observe that a leaf-only | |||
version is significantly simpler to implement and requires far fewer | version is significantly simpler to implement and requires far fewer | |||
resources: | resources: | |||
1. Leaf nodes only need to maintain a multipath default route under | 1. Leaf nodes only need to maintain a multipath default route under | |||
normal circumstances. However, in cases of catastrophic | normal circumstances. However, in cases of catastrophic | |||
partitioning, leaf nodes SHOULD be capable of accommodating all | partitioning, leaf nodes SHOULD be capable of accommodating all | |||
the leaf routes in their own PoD to prevent traffic loss. | the leaf routes in their own PoD to prevent traffic loss. | |||
2. Leaf nodes hold only their own North TIEs and the South TIEs of | 2. Leaf nodes only hold their own North TIEs and the South TIEs of | |||
Level 1 nodes they are connected to. | level 1 nodes they are connected to. | |||
3. Leaf nodes do not have to support any type of disaggregation | 3. Leaf nodes do not have to support any type of disaggregation | |||
computation or propagation. | computation or propagation. | |||
4. Leaf nodes are not required to support the overload flag. | 4. Leaf nodes are not required to support the overload flag. | |||
5. Leaf nodes do not need to originate S-TIEs unless optional leaf- | 5. Leaf nodes do not need to originate S-TIEs unless optional L2L | |||
2-leaf features are desired. | features are desired. | |||
8.2. Considerations for Spine Implementation | 8.2. Considerations for Spine Implementation | |||
Nodes that do not act as ToF are not required to discover fallen | Nodes that do not act as ToF are not required to discover fallen | |||
leaves by comparing reachable destinations with peers and therefore | leaves by comparing reachable destinations with peers and therefore | |||
do not need to run the computation of disaggregated routes based on | do not need to run the computation of disaggregated routes based on | |||
that discovery. On the other hand, non-ToF nodes need to respect | that discovery. On the other hand, non-ToF nodes need to respect | |||
disaggregated routes advertised from the north. In the case of | disaggregated routes advertised from the north. In the case of | |||
negative disaggregation, spine nodes need to generate southbound | negative disaggregation, spine nodes need to generate southbound | |||
disaggregated routes when all parents are lost for a fallen leaf. | disaggregated routes when all parents are lost for a fallen leaf. | |||
9. Security Considerations | 9. Security Considerations | |||
9.1. General | 9.1. General | |||
One can consider attack vectors where a router may reboot many times | One can consider attack vectors where a router may reboot many times | |||
while changing its System ID and pollute the network with many stale | while changing its System ID and pollute the network with many stale | |||
TIEs or TIEs that are sent with very long lifetimes and not cleaned | TIEs or TIEs that are sent with very long lifetimes and not cleaned | |||
up when the routes vanish. Those attack vectors are not unique to | up when the routes vanish. Those attack vectors are not unique to | |||
RIFT. Given large memory footprints available today those attacks | RIFT. Given large memory footprints available today, those attacks | |||
should be relatively benign. Otherwise, a node SHOULD implement a | should be relatively benign. Otherwise, a node SHOULD implement a | |||
strategy of discarding contents of all TIEs that were not present in | strategy of discarding contents of all TIEs that were not present in | |||
the SPF tree over a certain, configurable period of time. Since the | the SPF tree over a certain, configurable period of time. Since the | |||
protocol is self-stabilizing and will advertise the presence of such | protocol is self-stabilizing and will advertise the presence of such | |||
TIEs to its neighbors, they can be re-requested again if a | TIEs to its neighbors, they can be re-requested again if a | |||
computation finds that it has an adjacency formed towards the System | computation finds that it has an adjacency formed towards the System | |||
ID of the discarded TIEs. | ID of the discarded TIEs. | |||
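The discard strategy described above can be sketched as follows. This is a minimal illustration only, not the specified procedure: the class name, the `max_unused_seconds` knob, and the use of opaque TIE identifiers are all assumptions made for the example. The sketch remembers when each TIE was last part of the SPF tree and reports those that have gone unused for longer than the configured period.

```python
from dataclasses import dataclass, field


@dataclass
class StaleTIEPurger:
    """Tracks when each TIE last appeared in the SPF tree (hypothetical helper)."""

    max_unused_seconds: float  # assumed configurable period
    last_used: dict = field(default_factory=dict)  # tie_id -> timestamp

    def record_spf_run(self, all_tie_ids, ties_in_spf_tree, now):
        """Update bookkeeping after one SPF computation."""
        known = set(all_tie_ids)
        used = set(ties_in_spf_tree)
        for tie_id in known:
            if tie_id in used:
                self.last_used[tie_id] = now
            else:
                # Start the clock the first time an unused TIE is seen.
                self.last_used.setdefault(tie_id, now)
        # Forget TIEs that are no longer in the database at all.
        for tie_id in list(self.last_used):
            if tie_id not in known:
                del self.last_used[tie_id]

    def stale(self, now):
        """Return TIE ids whose contents are candidates for discarding."""
        return sorted(
            tie_id
            for tie_id, seen in self.last_used.items()
            if now - seen > self.max_unused_seconds
        )
```

Because neighbors keep advertising the presence of such TIEs, a discarded TIE can be requested again later, so the sketch only decides what to drop and leaves re-requesting to normal flooding.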
The inner protection configured based on any of the mechanisms in | The inner protection configured based on any of the mechanisms in | |||
Section 10.2 guarantees the integrity of TIE content and when | Section 10.2 guarantees the integrity of TIE content, and when | |||
combined with outer part of the envelope using any of the mechanisms | combined with the outer part of the envelope, using any of the | |||
in Section 10.2 guarantees protection against replay attacks as well. | mechanisms in Section 10.2, guarantees protection against replay | |||
If only outer protection (i.e., an outer key ID different from | attacks as well. If only outer protection (i.e., an outer key ID | |||
`undefined_securitykey_id`) is applied to an adjacency by the means | different from 'undefined_securitykey_id') is applied to an adjacency | |||
of any mechanism in Section 10.2 the integrity of the packet and | by the means of any mechanism in Section 10.2, the integrity of the | |||
replay protection is guaranteed only over the adjacency involved in | packet and replay protection is guaranteed only over the adjacency | |||
any of the configured directions. Further considerations can be | involved in any of the configured directions. Further considerations | |||
found in Section 9.7 and Section 9.8. | can be found in Sections 9.7 and 9.8. | |||
9.2. Time to Live and Hop Limit Values | 9.2. Time to Live and Hop Limit Values | |||
RIFT explicitly requires the use of a TTL/HL value of 1 *or* 255 when | RIFT explicitly requires the use of a TTL/HL value of 1 *or* 255 when | |||
sending/receiving LIEs and TIEs so that implementors have a choice | sending/receiving LIEs and TIEs so that implementors have a choice | |||
between the two. | between the two. | |||
Using a TTL/HL value of 255 does come with security concerns, but | Using a TTL/HL value of 255 does come with security concerns, but | |||
those risks are addressed in [RFC5082]. However, this approach may | those risks are addressed in [RFC5082]. However, this approach may | |||
still have difficulties with some forwarding implementations (e.g. | still have difficulties with some forwarding implementations (e.g., | |||
incorrectly processing TTL/HL, loops within forwarding plane itself, | incorrectly processing TTL/HL, loops within the forwarding plane | |||
etc.). | itself, etc.). | |||
It is for this reason that RIFT also allows implementations to use a | It is for this reason that RIFT also allows implementations to use a | |||
TTL/HL of 1. Attacks that exploit this by spoofing it from several | TTL/HL of 1. Attacks that exploit this by spoofing it from several | |||
hops away are indeed possible, but are exceptionally difficult to | hops away are indeed possible but are exceptionally difficult to | |||
engineer. Replay attacks are another potential attack vector, but as | engineer. Replay attacks are another potential attack vector, but as | |||
described in the subsequent security sections, RIFT is well protected | described in the subsequent security sections, RIFT is well protected | |||
against such attacks if any of the mechanisms in Section 10.2 is | against such attacks if any of the mechanisms in Section 10.2 are | |||
applied. Additionally, for link-local scoped multicast addresses | applied. Additionally, for link-local scoped multicast addresses | |||
used for LIE the value of 1 presents a more consistent choice. | used for LIE, the value of 1 presents a more consistent choice. | |||
9.3. Malformed Packets | 9.3. Malformed Packets | |||
The protocol protects packets extensively through optional signatures | The protocol protects packets extensively through optional signatures | |||
and nonces so if the possibility of maliciously injected malformed or | and nonces, so if the possibility of maliciously injected malformed | |||
replayed packets exists in a deployment algorithms in Section 10.2 | or replayed packets exists in a deployment, algorithms in Section 10.2 | |||
must be applied. | must be applied. | |||
Even with the security envelope, since RIFT relies on Thrift encoders | Even with the security envelope, since RIFT relies on Thrift encoders | |||
and decoders generated automatically from IDL it is conceivable that | and decoders generated automatically from IDL, it is conceivable that | |||
errors in such encoders/decoders could be discovered and lead to | errors in such encoders/decoders could be discovered and lead to | |||
delivery of corrupted packets or reception of packets that cannot be | delivery of corrupted packets or reception of packets that cannot be | |||
decoded. Misformatted packets lead normally to decoder returning an | decoded. Misformatted packets normally lead to the decoder returning | |||
error condition to the caller and with that the packet is basically | an error condition to the caller, and with that, the packet is | |||
unparsable with no other choice but to discard it. Should the | basically unparsable with no other choice but to discard it. Should | |||
unlikely scenario occur of the decoder being forced to abort the | the unlikely scenario occur of the decoder being forced to abort the | |||
protocol this is neither better nor worse than today's behavior of | protocol, this is neither better nor worse than today's behavior of | |||
other protocols. | other protocols. | |||
9.4. RIFT ZTP | 9.4. RIFT ZTP | |||
Section 6.7 presents many attack vectors in untrusted environments, | Section 6.7 presents many attack vectors in untrusted environments, | |||
starting with nodes that oscillate their level offers to the | starting with nodes that oscillate their level offers to the | |||
possibility of nodes offering a _ThreeWay_ adjacency with the highest | possibility of nodes offering a _ThreeWay_ adjacency with the highest | |||
possible level value and a very long holdtime trying to put itself | possible level value and a very long holdtime trying to put itself | |||
"on top of the lattice" thereby allowing it to gain access to the | "on top of the lattice", thereby allowing it to gain access to the | |||
whole southbound topology. Session authentication mechanisms are | whole southbound topology. Session authentication mechanisms are | |||
necessary in environments where this is possible and RIFT provides | necessary in environments where this is possible, and RIFT provides | |||
the security envelope to ensure this if so desired if any mechanism | the security envelope to ensure this, if so desired, if any mechanism | |||
in Section 10.2 is deployed. | in Section 10.2 is deployed. | |||
9.5. Lifetime | 9.5. Lifetime | |||
RIFT removes lifetime modification and replay attack vectors by | RIFT removes lifetime modification and replay attack vectors by | |||
protecting the lifetime behind a signature computed over it and | protecting the lifetime behind a signature computed over it and | |||
additional nonce combination which results in the inability of an | additional nonce combination, which results in the inability of an | |||
attacker to artificially shorten the _remaining_lifetime_. This only | attacker to artificially shorten the _remaining_lifetime_. This only | |||
applies if any mechanism in Section 10.2 is used. | applies if any mechanism in Section 10.2 is used. | |||
9.6. Packet Number | 9.6. Packet Number | |||
An optional defined value number that is carried in the security | A packet number is an optional defined value number that is carried | |||
envelope without any fingerprint protection and is hence vulnerable | in the security envelope without any fingerprint protection and is | |||
to replay and modification attacks. Contrary to nonces, this number | hence vulnerable to replay and modification attacks. Contrary to | |||
must change on every packet and would present a very high | nonces, this number must change on every packet and would present a | |||
cryptographic load if signed. The attack vector the packet number | very high cryptographic load if signed. The attack vector the packet | |||
presents is relatively benign. Changing the packet number by a man- | number presents is relatively benign. Changing the packet number by a | |||
in-the-middle attack will only affect operational validation tools | man-in-the-middle attack will only affect operational validation | |||
and possibly some performance optimizations on flooding. It is | tools and possibly some performance optimizations on flooding. It is | |||
expected that an implementation detecting too many "fake losses" or | expected that an implementation detecting too many "fake losses" or | |||
"misorderings" due to the attack on the packet number would simply | "misorderings" due to the attack on the packet number would simply | |||
suppress its further processing. | suppress its further processing. | |||
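One way to picture the suppression behavior described above is the following sketch. It is illustrative only: the 16-bit wrap of the packet number, the class name, and the `max_anomalies` threshold are assumptions made for the example and are not taken from this section.

```python
class PacketNumberMonitor:
    """Counts apparent losses and misorderings; gives up after too many."""

    MODULUS = 1 << 16  # assumption: packet numbers wrap at 16 bits

    def __init__(self, max_anomalies):
        self.max_anomalies = max_anomalies  # hypothetical threshold
        self.last = None
        self.anomalies = 0
        self.suppressed = False

    def observe(self, number):
        """Classify one received packet number."""
        if self.suppressed:
            return "suppressed"
        if self.last is None:
            self.last = number
            return "ok"
        delta = (number - self.last) % self.MODULUS
        if delta == 1:
            verdict = "ok"
        elif 1 < delta < self.MODULUS // 2:
            verdict = "loss"  # one or more numbers were skipped
        else:
            verdict = "misordered"  # duplicate or older than the last one
        if verdict != "misordered":
            self.last = number
        if verdict != "ok":
            self.anomalies += 1
            if self.anomalies > self.max_anomalies:
                # Too many "fake losses": stop interpreting the field.
                self.suppressed = True
        return verdict
```

Since the packet number only feeds validation tools and flooding optimizations, ignoring it after repeated anomalies costs nothing in terms of protocol correctness.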
9.7. Outer Fingerprint Attacks | 9.7. Outer Fingerprint Attacks | |||
Even when a mechanism in Section 10.2 is enabled to generate outer | Even when a mechanism in Section 10.2 is enabled to generate outer | |||
fingerprints further attack considerations apply. | fingerprints, further attack considerations apply. | |||
A node can try to inject LIE packets observing a conversation on the | A node can try to inject LIE packets observing a conversation on the | |||
wire by using the observed outer Key ID albeit it cannot generate | wire by using the observed outer key ID, albeit it cannot generate | |||
valid signatures in case it changes the integrity of the message so | valid signatures in case it changes the integrity of the message, so | |||
the only possible attack is DoS due to excessive LIE validation if | the only possible attack is DoS due to excessive LIE validation if | |||
any mechanism in Section 10.2 is used. | any mechanism in Section 10.2 is used. | |||
A node can try to replay previous LIEs with changed state that it | A node can try to replay previous LIEs with a changed state that it | |||
recorded but the attack is hard to replicate since the nonce | recorded, but the attack is hard to replicate since the nonce | |||
combination must match the ongoing exchange and is then limited to a | combination must match the ongoing exchange and is then limited to | |||
single flap only since both nodes will advance their nonces in case | only a single flap since both nodes will advance their nonces in case | |||
the adjacency state changed. Even in the most unlikely case the | the adjacency state changed. Even in the most unlikely case, the | |||
attack length is limited due to both sides periodically increasing | attack length is limited due to both sides periodically increasing | |||
their nonces. | their nonces. | |||
Generally, since weak nonces are not changed on every packet for | Generally, since weak nonces are not changed on every packet for | |||
performance reasons a conceivable attack vector by a man-in-the- | performance reasons, a conceivable attack vector by a man in the | |||
middle is to flood a receiving node with maximum bandwidth of | middle is to flood a receiving node with the maximum bandwidth of | |||
recently observed packets, both LIEs as well as TIEs. In a scenario | recently observed packets, both LIEs as well as TIEs. In a scenario | |||
where such attacks are likely _maximum_valid_nonce_delta_ can be | where such attacks are likely, _maximum_valid_nonce_delta_ and | |||
implemented as configurable, small value and | _nonce_regeneration_interval_ can be implemented as configurable and | |||
_nonce_regeneration_interval_ configured to very small value as well. | set to small values. This will likely present a significant | |||
This will likely present a significant computational load on large | computational load on large fabrics under normal operation. | |||
fabrics under normal operation. | ||||
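The effect of a small _maximum_valid_nonce_delta_ can be illustrated with a short sketch. The function below is a hypothetical helper: it assumes 16-bit wrapping nonces and a simple "how far behind is the reflected nonce" comparison, whereas the exact validation rules are those of the security envelope procedures and are not reproduced here.

```python
def nonce_within_delta(local_nonce, reflected_nonce,
                       maximum_valid_nonce_delta, modulus=1 << 16):
    """Return True if the reflected nonce is recent enough to accept.

    Assumes nonces wrap at `modulus` (16 bits here, an assumption) and that
    a reflected nonce may lag the local one by at most the configured delta.
    """
    behind = (local_nonce - reflected_nonce) % modulus
    return behind <= maximum_valid_nonce_delta
```

A smaller delta narrows the window in which recorded packets can be replayed, at the price of regenerating and validating nonces more often.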
9.8. TIE Origin Fingerprint DoS Attacks | 9.8. TIE Origin Fingerprint DoS Attacks | |||
Even when a mechanism in Section 10.2 is enabled to generate inner | Even when a mechanism in Section 10.2 is enabled to generate inner | |||
fingerprints or signatures further attack considerations apply. | fingerprints or signatures, further attack considerations apply. | |||
In case the inner fingerprint could be generated by a compromised | In case the inner fingerprint could be generated by a compromised | |||
node in the network other than the originator based on shared secrets | node in the network other than the originator based on shared | |||
the deployment must fall back on use of signatures that can be | secrets, the deployment must fall back on use of signatures that can | |||
validated but not generated by any other node but the originator. | be validated but not generated by any other node except the | |||
originator. | ||||
A compromised node in the network can attempt to brute force "fake | A compromised node in the network can attempt to brute force "fake | |||
TIEs" using other nodes' TIE origin key identifiers without | TIEs" using other nodes' TIE origin key ID without possessing the | |||
possessing the necessary secrets. Albeit the ultimate validation of | necessary secrets. Albeit the ultimate validation of the origin | |||
the origin signature will fail in such scenarios and not progress | signature will fail in such scenarios and not progress further than | |||
further than immediately peering nodes, the resulting denial of | immediately peering nodes, the resulting DoS attack seems unavoidable | |||
service attack seems unavoidable since the TIE origin Key ID is only | since the TIE origin key ID is only protected by the (here assumed to | |||
protected by the (here assumed to be compromised) node. | be compromised) node. | |||
9.9. Host Implementations | 9.9. Host Implementations | |||
It can be reasonably expected that with the proliferation of RotH | It can be reasonably expected that the proliferation of RotH servers, | |||
servers, rather than dedicated networking devices, will represent a | rather than dedicated networking devices, will represent a | |||
significant amount of RIFT devices. Given their normally far wider | significant amount of RIFT devices. Given their normally far wider | |||
software envelope and access granted to them, such servers are also | software envelope and access granted to them, such servers are also | |||
far more likely to be compromised and present an attack vector on the | far more likely to be compromised and present an attack vector on the | |||
protocol. Hijacking of prefixes to attract traffic is a trust | protocol. Hijacking of prefixes to attract traffic is a trust | |||
problem and cannot be easily addressed within the protocol if the | problem and cannot be easily addressed within the protocol if the | |||
trust model is breached, i.e. the server presents valid credentials | trust model is breached, i.e., the server presents valid credentials | |||
to form an adjacency and issue TIEs. In an even more devious way, | to form an adjacency and issue TIEs. In an even more devious way, | |||
the servers can present DoS (or even DDoS) vectors of issuing too | the servers can present DoS (or even DDoS) vectors from issuing too | |||
many LIE packets, flooding large amounts of North TIEs, and | many LIE packets, flooding large amounts of North TIEs, and | |||
attempting similar resource overrun attacks. A prudent | attempting similar resource overrun attacks. A prudent | |||
implementation forming adjacencies to leaves should implement | implementation forming adjacencies to leaves should implement | |||
thresholds mechanisms and raise warnings when, e.g., a leaf is | threshold mechanisms and raise warnings when, e.g., a leaf is | |||
advertising an excess number of TIEs or prefixes. Additionally, such an | advertising an excess number of TIEs or prefixes. Additionally, such an | |||
implementation could refuse any topology information except the | implementation could refuse any topology information except the | |||
node's own TIEs and authenticated, reflected South Node TIEs at own | node's own TIEs and authenticated, reflected South Node TIEs at their | |||
level. | own level. | |||
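A threshold mechanism of the kind suggested above might look like the following sketch. The function name, the two limits, and the warning text are assumptions chosen for illustration; the document does not prescribe specific thresholds.

```python
def check_leaf_advertisements(system_id, tie_count, prefix_count,
                              max_ties, max_prefixes):
    """Return warning strings for a leaf that advertises too much.

    `max_ties` and `max_prefixes` are hypothetical, operator-chosen limits.
    """
    warnings = []
    if tie_count > max_ties:
        warnings.append(
            f"leaf {system_id}: {tie_count} TIEs exceeds threshold {max_ties}"
        )
    if prefix_count > max_prefixes:
        warnings.append(
            f"leaf {system_id}: {prefix_count} prefixes exceeds "
            f"threshold {max_prefixes}"
        )
    return warnings
```

An implementation could log these warnings or, more aggressively, stop accepting further TIEs from the offending adjacency.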
To isolate possible attack vectors on the leaf to the largest | To isolate possible attack vectors on the leaf to the largest | |||
possible extent a dedicated leaf-only implementation could run | possible extent, a dedicated leaf-only implementation could run | |||
without any configuration by hard-coding a well-known adjacency key | without any configuration by: | |||
(which can be always rolled-over by the means of, e.g., well-known | ||||
key-value distributed from top of the fabric), leaf level value and | ||||
always setting overload flag. All other values can be derived by | ||||
automatic means as described above. | ||||
9.9.1. IPv4 Broadcast and IPv6 All Routers Multicast Implementations | * hard-coding a well-known adjacency key (which can be always rolled | |||
over by means of, e.g., a well-known key-value distributed from | ||||
top of the fabric), | ||||
* hard-coding a leaf level value, and | ||||
* always setting the overload flag | ||||
9.9.1. IPv4 Broadcast and IPv6 All-Routers Multicast Implementations | ||||
Section 6.2 describes an optional implementation that supports LIE | Section 6.2 describes an optional implementation that supports LIE | |||
exchange over IPv4 broadcast addresses and/or the IPv6 all routers | exchange over IPv4 broadcast addresses and/or the IPv6 all-routers | |||
multicast address. It is important to consider that if an | multicast address. It is important to consider that if an | |||
implementation supports this, the attack surface widens as LIEs may | implementation supports this, the attack surface widens as LIEs may | |||
be propagated to devices outside of the intended RIFT topology. This | be propagated to devices outside of the intended RIFT topology. This | |||
may leave RIFT nodes more susceptible to the various attack vectors | may leave RIFT nodes more susceptible to the various attack vectors | |||
already described in this section. | already described in this section. | |||
10. IANA Considerations | 10. IANA Considerations | |||
This specification requests multicast address assignments and | As detailed below, multicast addresses and standard port numbers have | |||
standard port numbers. Additionally, registries for the schema are | been assigned. Additionally, registries for the schema have been | |||
requested and suggested values provided that reflect the numbers | created with initial values assigned. | |||
allocated in the given schema. | ||||
10.1. Requested Multicast and Port Numbers | 10.1. Multicast and Port Numbers | |||
This document requests allocation in the 'IPv4 Multicast Address | In the "IPv4 Multicast Address Space" registry, the value of | |||
Space' registry the suggested value of 224.0.0.121 as | 224.0.0.121 has been assigned for 'ALL_V4_RIFT_ROUTERS'. In the | |||
'ALL_V4_RIFT_ROUTERS' and in the 'IPv6 Multicast Address Space' | "IPv6 Multicast Address Space" registry, the value of ff02::a1f7 has | |||
registry the suggested value of ff02::a1f7 as 'ALL_V6_RIFT_ROUTERS'. | been assigned for 'ALL_V6_RIFT_ROUTERS'. | |||
This document requests the following allocations from the "Service | The following assignments have been made in the "Service Name and | |||
Name and Transport Protocol Port Number Registry": | Transport Protocol Port Number Registry": | |||
_RIFT LIE Port_ | _RIFT LIE Port_ | |||
Service Name: rift-lies | ||||
Transport Protocol(s): UDP | Service Name: rift-lies | |||
Assignee: Tony Przygienda (prz@juniper.net) | Port Number: 914 | |||
Contact: Jordan Head (jhead@juniper.net) | Transport Protocol: udp | |||
Description: Routing in Fat Trees Link Information Element | Description: Routing in Fat Trees Link Information Element | |||
Reference: This Document | Assignee: IESG (iesg@ietf.org) | |||
Port Number: 914 | Contact: IETF Chair (chair@ietf.org) | |||
Reference: RFC 9692 | ||||
_RIFT TIE Port_ | _RIFT TIE Port_ | |||
Service Name: rift-ties | Service Name: rift-ties | |||
Transport Protocol(s): UDP | Port Number: 915 | |||
Assignee: Tony Przygienda (prz@juniper.net) | Transport Protocol: udp | |||
Contact: Jordan Head (jhead@juniper.net) | Description: Routing in Fat Trees Topology Information Element | |||
Description: Routing in Fat Trees Topology Information Element | Assignee: IESG (iesg@ietf.org) | |||
Reference: This Document | Contact: IETF Chair (chair@ietf.org) | |||
Port Number: 915 | Reference: RFC 9692 | |||
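As an illustration of how the assigned values fit together, the sketch below defines them as constants and shows one possible way to open an IPv4 listener for LIEs. The function names are hypothetical, the listener is deliberately simplified (no interface selection, no IPv6), and binding to a port below 1024 typically requires elevated privileges.

```python
import socket

# Values assigned in the registries listed above.
ALL_V4_RIFT_ROUTERS = "224.0.0.121"
ALL_V6_RIFT_ROUTERS = "ff02::a1f7"
RIFT_LIE_PORT = 914  # service name rift-lies, UDP
RIFT_TIE_PORT = 915  # service name rift-ties, UDP


def ipv4_membership_request(group, local_addr="0.0.0.0"):
    """Build the 8-byte ip_mreq structure used with IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(local_addr)


def open_lie_listener(local_addr="0.0.0.0"):
    """Open a UDP socket joined to ALL_V4_RIFT_ROUTERS (simplified sketch)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", RIFT_LIE_PORT))
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_ADD_MEMBERSHIP,
        ipv4_membership_request(ALL_V4_RIFT_ROUTERS, local_addr),
    )
    return sock
```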
10.2. Requested Registry for RIFT Security Algorithms | 10.2. Registry for RIFT Security Algorithms | |||
This section requests generation of a new registry holding the | A new registry has been created to hold the allowed RIFT security | |||
allowed RIFT Security Algorithms. No particular enumeration values | algorithms. No particular enumeration values are necessary since | |||
are necessary since RIFT uses a key ID abstraction on packets without | RIFT uses a key ID abstraction on packets without disclosing any | |||
disclosing any information about the algorithm or secrets used and | information about the algorithm or secrets used and only carries the | |||
only carries the resulting fingerprint or signature protecting the | resulting fingerprint or signature protecting the integrity of the | |||
integrity of the data. | data. | |||
The registry applies the "Specification Required" policy per | The registry applies the "Specification Required" policy per | |||
[RFC5226]. The designated expert should ensure that the algorithms | [RFC8126]. The designated expert should ensure that the algorithms | |||
suggested represent the state of the art at a given point in time and | suggested represent the state of the art at a given point in time and | |||
avoid introducing algorithms which do not represent enhanced security | avoid introducing algorithms that do not represent enhanced security | |||
properties or ensure such properties at lower cost as compared to | properties or ensure such properties at a lower cost as compared to | |||
existing registry entries. | existing registry entries. | |||
+==========================+===========+==========================+ | +==========================+==========================+============+ | |||
| Name | Reference | Recommendation | | | Name | Recommendation | Reference | | |||
+==========================+===========+==========================+ | +==========================+==========================+============+ | |||
| HMAC-SHA256 | [SHA-2] | Simplest way to ensure | | | HMAC-SHA256 | Simplest way to ensure | [SHA-2] | | |||
| | and | integrity of | | | | integrity of | and | | |||
| | [RFC2104] | transmissions across | | | | transmissions across | [RFC2104] | | |||
| | | adjacencies when used as | | | | adjacencies when used as | | | |||
| | | outer key and integrity | | | | outer keys and integrity | | | |||
| | | of TIEs when used as | | | | of TIEs when used as | | | |||
| | | inner keys. Recommended | | | | inner keys. Recommended | | | |||
| | | for most interoperable | | | | for most interoperable | | | |||
| | | security protection. | | | | security protection. | | | |||
+--------------------------+-----------+--------------------------+ | +--------------------------+--------------------------+------------+ | |||
| HMAC-SHA512 | [SHA-2] | Same as HMAC-SHA256 with | | | HMAC-SHA512 | Same as HMAC-SHA256 with | [SHA-2] | | |||
| | and | stronger protection. | | | | stronger protection. | and | | |||
| | [RFC2104] | | | | | | [RFC2104] | | |||
+--------------------------+-----------+--------------------------+ | +--------------------------+--------------------------+------------+ | |||
| SHA256-RSASSA-PKCS1-v1_5 | [RFC8017] | Recommended for high | | | SHA256-RSASSA-PKCS1-v1_5 | Recommended for high | [RFC8017], | | |||
| | Section | security applications | | | | security applications | Section | | |||
| | 8.2 | where private keys are | | | | where private keys are | 8.2 | | |||
| | | protected by according | | | | protected by according | | | |||
| | | nodes. Recommended as | | | | nodes. Recommended as | | | |||
| | | well in case not only | | | | well in case not only | | | |||
| | | integrity but origin | | | | integrity but origin | | | |||
| | | validation is necessary | | | | validation is necessary | | | |||
| | | for TIEs. Recommended | | | | for TIEs. Recommended | | | |||
| | | when adjacencies must be | | | | when adjacencies must be | | | |||
| | | protected without | | | | protected without | | | |||
| | | disclosing the secrets | | | | disclosing the secrets | | | |||
| | | on both sides of the | | | | on both sides of the | | | |||
| | | adjacency. | | | | adjacency. | | | |||
+--------------------------+-----------+--------------------------+ | +--------------------------+--------------------------+------------+ | |||
| SHA512-RSASSA-PKCS1-v1_5 | [RFC8017] | Same as SHA256-RSASSA- | | | SHA512-RSASSA-PKCS1-v1_5 | Same as SHA256-RSASSA- | [RFC8017] | | |||
| | | PKCS1-v1_5 with stronger | | | | PKCS1-v1_5 with stronger | | | |||
| | | protection. | | | | protection. | | | |||
+--------------------------+-----------+--------------------------+ | +--------------------------+--------------------------+------------+ | |||
Table 7 | Table 7: RIFT Security Algorithms | |||
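To make the HMAC entries in the table concrete, the following sketch computes and checks a fingerprint with Python's standard `hmac` module. It is only an illustration: the function names are hypothetical, and which bytes of a packet are covered by the fingerprint is defined by the security envelope, so `protected_bytes` stands in for that data.

```python
import hmac


def compute_fingerprint(secret, protected_bytes, digest="sha256"):
    """Return an HMAC over the protected bytes ("sha256" or "sha512")."""
    return hmac.new(secret, protected_bytes, digest).digest()


def verify_fingerprint(secret, protected_bytes, fingerprint, digest="sha256"):
    """Check a received fingerprint using a constant-time comparison."""
    expected = compute_fingerprint(secret, protected_bytes, digest)
    return hmac.compare_digest(expected, fingerprint)
```

With shared-secret HMACs, any node holding the secret can both generate and validate fingerprints, which is why the RSASSA-PKCS1-v1_5 entries are recommended when origin validation is needed.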
10.3. Requested Registries with Assigned Values for Schema Values | 10.3. Registries with Assigned Values for Schema Values | |||
This section requests registries that help govern the schema via | This section requests registries that help govern the schema via the | |||
usual IANA registry procedures. A top-level group named 'RIFT' | usual IANA registry procedures. The registry group "Routing in Fat | |||
should hold the corresponding registries requested in the following | Trees (RIFT)" holds the following registries. Registry values are | |||
sections with their pre-defined values. Registry values are stored | stored with their minimum and maximum version in which they are | |||
with their minimum and maximum version in which they are available. | available. All values not provided are to be considered | |||
All values not provided as to be considered `Unassigned`. The range | "Unassigned". The range of every registry is a 16-bit integer. | |||
of every registry is a 16-bit integer. Allocation of new values is | Allocation of new values is performed via "Expert Review" action only | |||
performed via `Expert Review` action in case of major or minor Change | in the case of minor changes per the rules in Section 7. All other | |||
per rules in Section 7. Any other allocation is performed via | allocations are performed via "Specification Required". | |||
'Specification Required'. | ||||
The registries do not contain in some cases necessary information | In some cases, the registries do not contain necessary information | |||
such as whether the fields are optional or required, what units are | such as whether the fields are optional or required, what units are | |||
used or what datatype is involved. This information is encoded in | used, or what datatype is involved. This information is encoded in | |||
the normative schema itself by the means of IDL syntax or necessary | the normative schema itself by the means of IDL syntax or necessary | |||
type definitions and their names. | type definitions and their names. | |||
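
The registry rules above (values recorded with a minimum and maximum schema version, a 16-bit range, and everything not listed being "Unassigned") can be illustrated with a minimal sketch. It is not part of the specification. The names RegistryEntry, lookup, and ADDRESS_FAMILY are hypothetical, and the sketch makes three assumptions: the 16-bit range is read as 0..65535 (a signed reading would be narrower), an empty or "All Versions" maximum means no upper bound, and a value queried outside its version window is reported as "Unassigned". The sample entries are copied from the RIFTCommonAddressFamilyType table.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Version = Tuple[int, int]  # (major, minor), e.g. (8, 0) for schema 8.0


@dataclass(frozen=True)
class RegistryEntry:
    """One row of a schema registry (hypothetical model, not normative)."""
    value: int
    name: str
    min_version: Version
    max_version: Optional[Version] = None  # assumption: None = no upper bound

    def __post_init__(self) -> None:
        # Assumption: the "16-bit integer" range is taken as unsigned.
        if not 0 <= self.value <= 0xFFFF:
            raise ValueError(f"value {self.value} does not fit in 16 bits")

    def available_in(self, version: Version) -> bool:
        """True if this entry exists in the given schema version."""
        if version < self.min_version:
            return False
        return self.max_version is None or version <= self.max_version


def lookup(registry: Dict[int, RegistryEntry], value: int,
           version: Version) -> str:
    """Return the entry name, or "Unassigned" if nothing applies."""
    entry = registry.get(value)
    if entry is None or not entry.available_in(version):
        return "Unassigned"
    return entry.name


# Initial values of RIFTCommonAddressFamilyType as listed in Table 9.
ADDRESS_FAMILY: Dict[int, RegistryEntry] = {
    e.value: e
    for e in (
        RegistryEntry(0, "Illegal", (8, 0)),
        RegistryEntry(1, "AddressFamilyMinValue", (8, 0)),
        RegistryEntry(2, "IPv4", (8, 0)),
        RegistryEntry(3, "IPv6", (8, 0)),
        RegistryEntry(4, "AddressFamilyMaxValue", (8, 0)),
    )
}

if __name__ == "__main__":
    print(lookup(ADDRESS_FAMILY, 2, (8, 0)))
    print(lookup(ADDRESS_FAMILY, 9, (8, 0)))
```

Comparing versions as (major, minor) tuples keeps the ordering check simple. Whether a real implementation tracks availability per version this way is a design choice the text does not prescribe.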
10.3.1. Registry RIFT/Versions | 10.3.1. RIFTVersions Registry | |||
This registry stores all RIFT protocol schema major and minor | This registry stores all RIFT protocol schema major and minor | |||
versions including the reference to the document introducing the | versions, including the reference to the document introducing the | |||
version. This means as well that if multiple documents extend rift | version. This also means that, if multiple documents extend rift | |||
schema they have to serialize using this registry to increase the | schema, they have to serialize using this registry to increase the | |||
minor or major versions sequentially. | minor or major versions sequentially. | |||
+================+===================================+ | +================+=====================+ | |||
| Schema Version | Reference | | | Schema Version | Reference | | |||
+================+===================================+ | +================+=====================+ | |||
| 8.0 | https://datatracker.ietf.org/doc/ | | | 8.0 | RFC 9692, Section 7 | | |||
| | draft-ietf-rift-rift/ Section 7 | | +----------------+---------------------+ | |||
+----------------+-----------------------------------+ | ||||
Table 8 | ||||
10.3.2. Registry RIFT/common/AddressFamilyType | ||||
The name of the registry should be RIFTCommonAddressFamilyType. | Table 8 | |||
Address family type. | 10.3.2. RIFTCommonAddressFamilyType Registry | |||
+=======================+=======+=============+=========+=========+ | This registry has the following initial values. | |||
| Name | Value | Min. Schema | Max. | Comment | | ||||
| | | Version | Schema | | | ||||
| | | | Version | | | ||||
+=======================+=======+=============+=========+=========+ | ||||
| Illegal | 0 | 8.0 | | | | ||||
+-----------------------+-------+-------------+---------+---------+ | ||||
| AddressFamilyMinValue | 1 | 8.0 | | | | ||||
+-----------------------+-------+-------------+---------+---------+ | ||||
| IPv4 | 2 | 8.0 | | | | ||||
+-----------------------+-------+-------------+---------+---------+ | ||||
| IPv6 | 3 | 8.0 | | | | ||||
+-----------------------+-------+-------------+---------+---------+ | ||||
| AddressFamilyMaxValue | 4 | 8.0 | | | | ||||
+-----------------------+-------+-------------+---------+---------+ | ||||
Table 9 | +=======+=======================+=============+=========+=========+ | |||
| Value | Name | Min. Schema | Max. | Comment | | ||||
| | | Version | Schema | | | ||||
| | | | Version | | | ||||
+=======+=======================+=============+=========+=========+ | ||||
| 0 | Illegal | 8.0 | | | | ||||
+-------+-----------------------+-------------+---------+---------+ | ||||
| 1 | AddressFamilyMinValue | 8.0 | | | | ||||
+-------+-----------------------+-------------+---------+---------+ | ||||
| 2 | IPv4 | 8.0 | | | | ||||
+-------+-----------------------+-------------+---------+---------+ | ||||
| 3 | IPv6 | 8.0 | | | | ||||
+-------+-----------------------+-------------+---------+---------+ | ||||
| 4 | AddressFamilyMaxValue | 8.0 | | | | ||||
+-------+-----------------------+-------------+---------+---------+ | ||||
10.3.3. Registry RIFT/common/HierarchyIndications | Table 9: Address Family Type | |||
The name of the registry should be RIFTCommonHierarchyIndications. | 10.3.3. RIFTCommonHierarchyIndications Registry | |||
Flags indicating node configuration in case of ZTP. | This registry has the following initial values. | |||
+====================================+=====+=======+=======+=======+ | +=====+====================================+=======+=======+=======+ | |||
|Name |Value| Min.| Max.|Comment| | |Value|Name |Min. |Max. |Comment| | |||
| | | Schema| Schema| | | | | |Schema |Schema | | | |||
| | |Version|Version| | | | | |Version|Version| | | |||
+====================================+=====+=======+=======+=======+ | +=====+====================================+=======+=======+=======+ | |||
|leaf_only | 0| 8.0| | | | |0 |leaf_only |8.0 | | | | |||
+------------------------------------+-----+-------+-------+-------+ | +-----+------------------------------------+-------+-------+-------+ | |||
|leaf_only_and_leaf_2_leaf_procedures| 1| 8.0| | | | |1 |leaf_only_and_leaf_2_leaf_procedures|8.0 | | | | |||
+------------------------------------+-----+-------+-------+-------+ | +-----+------------------------------------+-------+-------+-------+ | |||
|top_of_fabric | 2| 8.0| | | | |2 |top_of_fabric |8.0 | | | | |||
+------------------------------------+-----+-------+-------+-------+ | +-----+------------------------------------+-------+-------+-------+ | |||
Table 10 | Table 10: Flags Indicating Node Configuration in Case of ZTP | |||
10.3.4. Registry RIFT/common/IEEE802_1ASTimeStampType | 10.3.4. RIFTCommonIEEE8021ASTimeStampType Registry | |||
The name of the registry should be RIFTCommonIEEE8021ASTimeStampType. | This registry has the following initial values. | |||
Timestamp per IEEE 802.1AS, all values MUST be interpreted in | The timestamp is per IEEE 802.1AS; all values MUST be interpreted in | |||
implementation as unsigned. | implementation as unsigned. | |||
+==========+=======+=====================+=============+=========+ | +=======+==========+=====================+=============+=========+ | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | | Value | Name | Min. Schema Version | Max. Schema | Comment | | |||
| | | | Version | | | | | | | Version | | | |||
+==========+=======+=====================+=============+=========+ | +=======+==========+=====================+=============+=========+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| AS_sec | 1 | 8.0 | | | | | 1 | AS_sec | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| AS_nsec | 2 | 8.0 | | | | | 2 | AS_nsec | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
Table 11 | Table 11 | |||
10.3.5. Registry RIFT/common/IPAddressType | 10.3.5. RIFTCommonIPAddressType Registry | |||
The name of the registry should be RIFTCommonIPAddressType. | ||||
IP address type. | ||||
+=============+=======+=====================+=============+=========+ | ||||
| Name | Value | Min. Schema | Max. Schema | Comment | | ||||
| | | Version | Version | | | ||||
+=============+=======+=====================+=============+=========+ | ||||
| Reserved | 0 | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------------+-------+---------------------+-------------+---------+ | ||||
| ipv4address | 1 | 8.0 | | Content | | ||||
| | | | | is ipv4 | | ||||
+-------------+-------+---------------------+-------------+---------+ | ||||
| ipv6address | 2 | 8.0 | | Content | | ||||
| | | | | is ipv6 | | ||||
+-------------+-------+---------------------+-------------+---------+ | ||||
Table 12 | ||||
10.3.6. Registry RIFT/common/IPPrefixType | This registry has the following initial values. | |||
The name of the registry should be RIFTCommonIPPrefixType. | +=======+=============+=====================+=============+=========+ | |||
| Value | Name | Min. Schema | Max. Schema | Comment | | ||||
| | | Version | Version | | | ||||
+=======+=============+=====================+=============+=========+ | ||||
| 0 | Reserved | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------+-------------+---------------------+-------------+---------+ | ||||
| 1 | ipv4address | 8.0 | | Content | | ||||
| | | | | is IPv4 | | ||||
+-------+-------------+---------------------+-------------+---------+ | ||||
| 2 | ipv6address | 8.0 | | Content | | ||||
| | | | | is IPv6 | | ||||
+-------+-------------+---------------------+-------------+---------+ | ||||
Prefix advertisement. | Table 12: IP Address Type | |||
@note: for interface addresses the protocol can propagate the address | 10.3.6. RIFTCommonIPPrefixType Registry | |||
part beyond the subnet mask and on reachability computation that has | ||||
to be normalized. The non-significant bits can be used for | ||||
operational purposes. | ||||
+============+=======+=====================+=============+=========+ | This registry has the following initial values. | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | ||||
| | | | Version | | | ||||
+============+=======+=====================+=============+=========+ | ||||
| Reserved | 0 | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+------------+-------+---------------------+-------------+---------+ | ||||
| ipv4prefix | 1 | 8.0 | | | | ||||
+------------+-------+---------------------+-------------+---------+ | ||||
| ipv6prefix | 2 | 8.0 | | | | ||||
+------------+-------+---------------------+-------------+---------+ | ||||
Table 13 | | Note: For interface addresses the protocol can propagate the | |||
| address part beyond the subnet mask and on reachability | ||||
| computation the non-significant bits have to be normalized. | ||||
| Those bits can be used for operational purposes. | ||||
10.3.7. Registry RIFT/common/IPv4PrefixType | +=======+============+=====================+=============+=========+ | |||
| Value | Name | Min. Schema Version | Max. Schema | Comment | | ||||
| | | | Version | | | ||||
+=======+============+=====================+=============+=========+ | ||||
| 0 | Reserved | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------+------------+---------------------+-------------+---------+ | ||||
| 1 | ipv4prefix | 8.0 | | | | ||||
+-------+------------+---------------------+-------------+---------+ | ||||
| 2 | ipv6prefix | 8.0 | | | | ||||
+-------+------------+---------------------+-------------+---------+ | ||||
The name of the registry should be RIFTCommonIPv4PrefixType. | Table 13: Prefix Advertisement | |||
IPv4 prefix type. | 10.3.7. RIFTCommonIPv4PrefixType Registry | |||
+===========+=======+=====================+=============+=========+ | This registry has the following initial values. | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | ||||
| | | | Version | | | ||||
+===========+=======+=====================+=============+=========+ | ||||
| Reserved | 0 | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-----------+-------+---------------------+-------------+---------+ | ||||
| address | 1 | 8.0 | | | | ||||
+-----------+-------+---------------------+-------------+---------+ | ||||
| prefixlen | 2 | 8.0 | | | | ||||
+-----------+-------+---------------------+-------------+---------+ | ||||
Table 14 | +=======+===========+=====================+=============+=========+ | |||
| Value | Name | Min. Schema Version | Max. Schema | Comment | | ||||
| | | | Version | | | ||||
+=======+===========+=====================+=============+=========+ | ||||
| 0 | Reserved | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------+-----------+---------------------+-------------+---------+ | ||||
| 1 | address | 8.0 | | | | ||||
+-------+-----------+---------------------+-------------+---------+ | ||||
| 2 | prefixlen | 8.0 | | | | ||||
+-------+-----------+---------------------+-------------+---------+ | ||||
10.3.8. Registry RIFT/common/IPv6PrefixType | Table 14: IPv4 Prefix Type | |||
The name of the registry should be RIFTCommonIPv6PrefixType. | 10.3.8. RIFTCommonIPv6PrefixType Registry | |||
IPv6 prefix type. | This registry has the following initial values. | |||
+===========+=======+=====================+=============+=========+ | +=======+===========+=====================+=============+=========+ | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | | Value | Name | Min. Schema Version | Max. Schema | Comment | | |||
| | | | Version | | | | | | | Version | | | |||
+===========+=======+=====================+=============+=========+ | +=======+===========+=====================+=============+=========+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+-----------+-------+---------------------+-------------+---------+ | +-------+-----------+---------------------+-------------+---------+ | |||
| address | 1 | 8.0 | | | | | 1 | address | 8.0 | | | | |||
+-----------+-------+---------------------+-------------+---------+ | +-------+-----------+---------------------+-------------+---------+ | |||
| prefixlen | 2 | 8.0 | | | | | 2 | prefixlen | 8.0 | | | | |||
+-----------+-------+---------------------+-------------+---------+ | +-------+-----------+---------------------+-------------+---------+ | |||
Table 15 | Table 15: IPv6 Prefix Type | |||
10.3.9. Registry RIFT/common/KVTypes | 10.3.9. RIFTCommonKVTypes Registry | |||
The name of the registry should be RIFTCommonKVTypes. | This registry has the following initial values. | |||
+==============+=======+=============+=============+=========+ | +=======+==============+=============+=============+=========+ | |||
| Name | Value | Min. Schema | Max. Schema | Comment | | | Value | Name | Min. Schema | Max. Schema | Comment | | |||
| | | Version | Version | | | | | | Version | Version | | | |||
+==============+=======+=============+=============+=========+ | +=======+==============+=============+=============+=========+ | |||
| Experimental | 1 | 8.0 | | | | | 0 | Unassigned | | | | | |||
+--------------+-------+-------------+-------------+---------+ | +-------+--------------+-------------+-------------+---------+ | |||
| WellKnown | 2 | 8.0 | | | | | 1 | Experimental | 8.0 | | | | |||
+--------------+-------+-------------+-------------+---------+ | +-------+--------------+-------------+-------------+---------+ | |||
| OUI | 3 | 8.0 | | | | | 2 | WellKnown | 8.0 | | | | |||
+--------------+-------+-------------+-------------+---------+ | +-------+--------------+-------------+-------------+---------+ | |||
| 3 | OUI | 8.0 | | | | ||||
+-------+--------------+-------------+-------------+---------+ | ||||
Table 16 | Table 16 | |||
10.3.10. Registry RIFT/common/PrefixSequenceType | 10.3.10. RIFTCommonPrefixSequenceType Registry | |||
The name of the registry should be RIFTCommonPrefixSequenceType. | ||||
Sequence of a prefix in case of move. | ||||
+===============+=======+=============+==========+==================+ | ||||
| Name | Value | Min. | Max. | Comment | | ||||
| | | Schema | Schema | | | ||||
| | | Version | Version | | | ||||
+===============+=======+=============+==========+==================+ | ||||
| Reserved | 0 | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+---------------+-------+-------------+----------+------------------+ | ||||
| timestamp | 1 | 8.0 | | | | ||||
+---------------+-------+-------------+----------+------------------+ | ||||
| transactionid | 2 | 8.0 | | Transaction id | | ||||
| | | | | set by client in | | ||||
| | | | | e.g. in 6lowpan. | | ||||
+---------------+-------+-------------+----------+------------------+ | ||||
Table 17 | ||||
10.3.11. Registry RIFT/common/RouteType | ||||
The name of the registry should be RIFTCommonRouteType. | This registry has the following initial values. | |||
RIFT route types. @note: The only purpose of those values is to | +=======+===============+=========+==========+===================+ | |||
introduce an ordering whereas an implementation can choose internally | | Value | Name | Min. | Max. | Comment | | |||
any other values as long the ordering is preserved | | | | Schema | Schema | | | |||
+=====================+=======+=============+=============+=========+ | | | | Version | Version | | | |||
| Name | Value | Min. Schema | Max. | Comment | | +=======+===============+=========+==========+===================+ | |||
| | | Version | Schema | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Version | | | | | | | Versions | | | |||
+=====================+=======+=============+=============+=========+ | +-------+---------------+---------+----------+-------------------+ | |||
| Illegal | 0 | 8.0 | | | | | 1 | timestamp | 8.0 | | | | |||
+---------------------+-------+-------------+-------------+---------+ | +-------+---------------+---------+----------+-------------------+ | |||
| RouteTypeMinValue | 1 | 8.0 | | | | | 2 | transactionid | 8.0 | | Transaction ID | | |||
+---------------------+-------+-------------+-------------+---------+ | | | | | | set by client in, | | |||
| Discard | 2 | 8.0 | | | | | | | | | e.g., 6LoWPAN. | | |||
+---------------------+-------+-------------+-------------+---------+ | +-------+---------------+---------+----------+-------------------+ | |||
| LocalPrefix | 3 | 8.0 | | | | ||||
+---------------------+-------+-------------+-------------+---------+ | ||||
| SouthPGPPrefix | 4 | 8.0 | | | | ||||
+---------------------+-------+-------------+-------------+---------+ | ||||
| NorthPGPPrefix | 5 | 8.0 | | | | ||||
+---------------------+-------+-------------+-------------+---------+ | ||||
| NorthPrefix | 6 | 8.0 | | | | ||||
+---------------------+-------+-------------+-------------+---------+ | ||||
| NorthExternalPrefix | 7 | 8.0 | | | | ||||
+---------------------+-------+-------------+-------------+---------+ | ||||
| SouthPrefix | 8 | 8.0 | | | | ||||
+---------------------+-------+-------------+-------------+---------+ | ||||
| SouthExternalPrefix | 9 | 8.0 | | | | ||||
+---------------------+-------+-------------+-------------+---------+ | ||||
| NegativeSouthPrefix | 10 | 8.0 | | | | ||||
+---------------------+-------+-------------+-------------+---------+ | ||||
| RouteTypeMaxValue | 11 | 8.0 | | | | ||||
+---------------------+-------+-------------+-------------+---------+ | ||||
Table 18 | Table 17: Sequence of a Prefix in Case of Move | |||
10.3.12. Registry RIFT/common/TIETypeType | 10.3.11. RIFTCommonRouteType Registry | |||
The name of the registry should be RIFTCommonTIETypeType. | This registry has the following initial values. | |||
Type of TIE. | | Note: The only purpose of these values is to introduce an | |||
| ordering, whereas an implementation can internally choose any | ||||
| other values as long the ordering is preserved. | ||||
+===========================================+=====+=======+=======+=======+ | +=======+=====================+=============+=============+=========+ | |||
|Name |Value| Min.| Max.|Comment| | | Value | Name | Min. Schema | Max. | Comment | | |||
| | | Schema| Schema| | | | | | Version | Schema | | | |||
| | |Version|Version| | | | | | | Version | | | |||
+===========================================+=====+=======+=======+=======+ | +=======+=====================+=============+=============+=========+ | |||
|Illegal | 0| 8.0| | | | | 0 | Illegal | 8.0 | | | | |||
+-------------------------------------------+-----+-------+-------+-------+ | +-------+---------------------+-------------+-------------+---------+ | |||
|TIETypeMinValue | 1| 8.0| | | | | 1 | RouteTypeMinValue | 8.0 | | | | |||
+-------------------------------------------+-----+-------+-------+-------+ | +-------+---------------------+-------------+-------------+---------+ | |||
|NodeTIEType | 2| 8.0| | | | | 2 | Discard | 8.0 | | | | |||
+-------------------------------------------+-----+-------+-------+-------+ | +-------+---------------------+-------------+-------------+---------+ | |||
|PrefixTIEType | 3| 8.0| | | | | 3 | LocalPrefix | 8.0 | | | | |||
+-------------------------------------------+-----+-------+-------+-------+ | +-------+---------------------+-------------+-------------+---------+ | |||
|PositiveDisaggregationPrefixTIEType | 4| 8.0| | | | | 4 | SouthPGPPrefix | 8.0 | | | | |||
+-------------------------------------------+-----+-------+-------+-------+ | +-------+---------------------+-------------+-------------+---------+ | |||
|NegativeDisaggregationPrefixTIEType | 5| 8.0| | | | | 5 | NorthPGPPrefix | 8.0 | | | | |||
+-------------------------------------------+-----+-------+-------+-------+ | +-------+---------------------+-------------+-------------+---------+ | |||
|PGPrefixTIEType | 6| 8.0| | | | | 6 | NorthPrefix | 8.0 | | | | |||
+-------------------------------------------+-----+-------+-------+-------+ | +-------+---------------------+-------------+-------------+---------+ | |||
|KeyValueTIEType | 7| 8.0| | | | | 7 | NorthExternalPrefix | 8.0 | | | | |||
+-------------------------------------------+-----+-------+-------+-------+ | +-------+---------------------+-------------+-------------+---------+ | |||
|ExternalPrefixTIEType | 8| 8.0| | | | | 8 | SouthPrefix | 8.0 | | | | |||
+-------------------------------------------+-----+-------+-------+-------+ | +-------+---------------------+-------------+-------------+---------+ | |||
|PositiveExternalDisaggregationPrefixTIEType| 9| 8.0| | | | | 9 | SouthExternalPrefix | 8.0 | | | | |||
+-------------------------------------------+-----+-------+-------+-------+ | +-------+---------------------+-------------+-------------+---------+ | |||
|TIETypeMaxValue | 10| 8.0| | | | | 10 | NegativeSouthPrefix | 8.0 | | | | |||
+-------------------------------------------+-----+-------+-------+-------+ | +-------+---------------------+-------------+-------------+---------+ | |||
| 11 | RouteTypeMaxValue | 8.0 | | | | ||||
+-------+---------------------+-------------+-------------+---------+ | ||||
Table 19 | Table 18: RIFT Route Types | |||
10.3.13. Registry RIFT/common/TieDirectionType | 10.3.12. RIFTCommonTIETypeType Registry | |||
The name of the registry should be RIFTCommonTieDirectionType. | This registry has the following initial values. | |||
Direction of TIEs. | +=====+===========================================+=======+=======+=======+ | |||
|Value|Name |Min. |Max. |Comment| | ||||
| | |Schema |Schema | | | ||||
| | |Version|Version| | | ||||
+=====+===========================================+=======+=======+=======+ | ||||
|0 |Illegal |8.0 | | | | ||||
+-----+-------------------------------------------+-------+-------+-------+ | ||||
|1 |TIETypeMinValue |8.0 | | | | ||||
+-----+-------------------------------------------+-------+-------+-------+ | ||||
|2 |NodeTIEType |8.0 | | | | ||||
+-----+-------------------------------------------+-------+-------+-------+ | ||||
|3 |PrefixTIEType |8.0 | | | | ||||
+-----+-------------------------------------------+-------+-------+-------+ | ||||
|4 |PositiveDisaggregationPrefixTIEType |8.0 | | | | ||||
+-----+-------------------------------------------+-------+-------+-------+ | ||||
|5 |NegativeDisaggregationPrefixTIEType |8.0 | | | | ||||
+-----+-------------------------------------------+-------+-------+-------+ | ||||
|6 |PGPrefixTIEType |8.0 | | | | ||||
+-----+-------------------------------------------+-------+-------+-------+ | ||||
|7 |KeyValueTIEType |8.0 | | | | ||||
+-----+-------------------------------------------+-------+-------+-------+ | ||||
|8 |ExternalPrefixTIEType |8.0 | | | | ||||
+-----+-------------------------------------------+-------+-------+-------+ | ||||
|9 |PositiveExternalDisaggregationPrefixTIEType|8.0 | | | | ||||
+-----+-------------------------------------------+-------+-------+-------+ | ||||
|10 |TIETypeMaxValue |8.0 | | | | ||||
+-----+-------------------------------------------+-------+-------+-------+ | ||||
+===================+=======+=============+=============+=========+ | Table 19: Type of TIE | |||
| Name | Value | Min. Schema | Max. Schema | Comment | | ||||
| | | Version | Version | | | ||||
+===================+=======+=============+=============+=========+ | ||||
| Illegal | 0 | 8.0 | | | | ||||
+-------------------+-------+-------------+-------------+---------+ | ||||
| South | 1 | 8.0 | | | | ||||
+-------------------+-------+-------------+-------------+---------+ | ||||
| North | 2 | 8.0 | | | | ||||
+-------------------+-------+-------------+-------------+---------+ | ||||
| DirectionMaxValue | 3 | 8.0 | | | | ||||
+-------------------+-------+-------------+-------------+---------+ | ||||
Table 20 | 10.3.13. RIFTCommonTieDirectionType Registry | |||
10.3.14. Registry RIFT/encoding/Community | This registry has the following initial values. | |||
The name of the registry should be RIFTEncodingCommunity. | +=======+===================+=============+=============+=========+ | |||
| Value | Name | Min. Schema | Max. Schema | Comment | | ||||
| | | Version | Version | | | ||||
+=======+===================+=============+=============+=========+ | ||||
| 0 | Illegal | 8.0 | | | | ||||
+-------+-------------------+-------------+-------------+---------+ | ||||
| 1 | South | 8.0 | | | | ||||
+-------+-------------------+-------------+-------------+---------+ | ||||
| 2 | North | 8.0 | | | | ||||
+-------+-------------------+-------------+-------------+---------+ | ||||
| 3 | DirectionMaxValue | 8.0 | | | | ||||
+-------+-------------------+-------------+-------------+---------+ | ||||
Prefix community. | Table 20: Direction of TIEs | |||
+==========+=======+=====================+=============+============+ | 10.3.14. RIFTEncodingCommunity Registry | |||
| Name | Value | Min. Schema | Max. Schema | Comment | | ||||
| | | Version | Version | | | ||||
+==========+=======+=====================+=============+============+ | ||||
| Reserved | 0 | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+----------+-------+---------------------+-------------+------------+ | ||||
| top | 1 | 8.0 | | Higher | | ||||
| | | | | order bits | | ||||
+----------+-------+---------------------+-------------+------------+ | ||||
| bottom | 2 | 8.0 | | Lower | | ||||
| | | | | order bits | | ||||
+----------+-------+---------------------+-------------+------------+ | ||||
Table 21 | This registry has the following initial values. | |||
10.3.15. Registry RIFT/encoding/KeyValueTIEElement | +=======+==========+=====================+=============+============+ | |||
| Value | Name | Min. Schema | Max. Schema | Comment | | ||||
| | | Version | Version | | | ||||
+=======+==========+=====================+=============+============+ | ||||
| 0 | Reserved | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------+----------+---------------------+-------------+------------+ | ||||
| 1 | top | 8.0 | | Higher | | ||||
| | | | | order bits | | ||||
+-------+----------+---------------------+-------------+------------+ | ||||
| 2 | bottom | 8.0 | | Lower | | ||||
| | | | | order bits | | ||||
+-------+----------+---------------------+-------------+------------+ | ||||
The name of the registry should be RIFTEncodingKeyValueTIEElement. | Table 21: Prefix Community | |||
Generic key value pairs. | 10.3.15. RIFTEncodingKeyValueTIEElement Registry | |||
+===========+=======+=====================+=============+=========+ | This registry has the following initial values. | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | ||||
| | | | Version | | | ||||
+===========+=======+=====================+=============+=========+ | ||||
| Reserved | 0 | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-----------+-------+---------------------+-------------+---------+ | ||||
| keyvalues | 1 | 8.0 | | | | ||||
+-----------+-------+---------------------+-------------+---------+ | ||||
Table 22 | +=======+===========+=====================+=============+=========+ | |||
| Value | Name | Min. Schema Version | Max. Schema | Comment | | ||||
| | | | Version | | | ||||
+=======+===========+=====================+=============+=========+ | ||||
| 0 | Reserved | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------+-----------+---------------------+-------------+---------+ | ||||
| 1 | keyvalues | 8.0 | | | | ||||
+-------+-----------+---------------------+-------------+---------+ | ||||
10.3.16. Registry RIFT/encoding/KeyValueTIEElementContent | Table 22: Generic Key Value Pairs | |||
The name of the registry should be | 10.3.16. RIFTEncodingKeyValueTIEElementContent Registry | |||
RIFTEncodingKeyValueTIEElementContent. | ||||
Defines the targeted nodes and the value carried. | This registry has the following initial values. It defines the | |||
targeted nodes and the value carried. | ||||
+==========+=======+=====================+=============+=========+ | +=======+==========+=====================+=============+=========+ | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | | Value | Name | Min. Schema Version | Max. Schema | Comment | | |||
| | | | Version | | | | | | | Version | | | |||
+==========+=======+=====================+=============+=========+ | +=======+==========+=====================+=============+=========+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| targets | 1 | 8.0 | | | | | 1 | targets | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| value | 2 | 8.0 | | | | | 2 | value | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
Table 23 | Table 23 | |||
10.3.17. Registry RIFT/encoding/LIEPacket | 10.3.17. RIFTEncodingLIEPacket Registry | |||
The name of the registry should be RIFTEncodingLIEPacket. | ||||
RIFT LIE Packet. | ||||
@note: this node's level is already included on the packet header | ||||
+=============================+=====+=======+========+=============+ | ||||
| Name |Value| Min.| Max.|Comment | | ||||
| | | Schema| Schema| | | ||||
| | |Version| Version| | | ||||
+=============================+=====+=======+========+=============+ | ||||
| Reserved | 0| 8.0| All| | | ||||
| | | |Versions| | | ||||
+-----------------------------+-----+-------+--------+-------------+ | ||||
| name | 1| 8.0| | Node or| | ||||
| | | | | adjacency| | ||||
| | | | | name.| | ||||
+-----------------------------+-----+-------+--------+-------------+ | ||||
| local_id | 2| 8.0| | Local link| | ||||
| | | | | id.| | ||||
+-----------------------------+-----+-------+--------+-------------+ | ||||
| flood_port | 3| 8.0| | Udp port to| | ||||
| | | | | which we can| | ||||
| | | | | receive| | ||||
| | | | |flooded ties.| | ||||
+-----------------------------+-----+-------+--------+-------------+ | ||||
| link_mtu_size | 4| 8.0| | Layer 2 mtu,| | ||||
| | | | | used to| | ||||
| | | | | discover| | ||||
| | | | | mismatch.| | ||||
+-----------------------------+-----+-------+--------+-------------+ | ||||
| link_bandwidth | 5| 8.0| | Local link| | ||||
| | | | | bandwidth on| | ||||
| | | | | the| | ||||
| | | | | interface.| | ||||
+-----------------------------+-----+-------+--------+-------------+ | ||||
| neighbor | 6| 8.0| | Reflects the| | ||||
| | | | |neighbor once| | ||||
| | | | | received to| | ||||
| | | | |provide 3-way| | ||||
| | | | |connectivity.| | ||||
+-----------------------------+-----+-------+--------+-------------+ | ||||
| pod | 7| 8.0| | Node's pod.| | ||||
+-----------------------------+-----+-------+--------+-------------+ | ||||
| node_capabilities | 10| 8.0| | Node| | ||||
| | | | | capabilities| | ||||
| | | | | supported.| | ||||
+-----------------------------+-----+-------+--------+-------------+ | ||||
| link_capabilities | 11| 8.0| | Capabilities| | ||||
| | | | |of this link.| | ||||
+-----------------------------+-----+-------+--------+-------------+ | ||||
| holdtime | 12| 8.0| | Required| | ||||
| | | | | holdtime of| | ||||
| | | | | the| | ||||
| | | | | adjacency,| | ||||
| | | | | i.e. for how| | ||||
| | | | |long a period| | ||||
| | | | | should| | ||||
| | | | | adjacency be| | ||||
| | | | | kept up| | ||||
| | | | |without valid| | ||||
| | | | | lie| | ||||
| | | | | reception.| | ||||
+-----------------------------+-----+-------+--------+-------------+ | ||||
| label | 13| 8.0| | Optional,| | ||||
| | | | | unsolicited,| | ||||
| | | | | downstream| | ||||
| | | | | assigned| | ||||
| | | | | locally| | ||||
| | | | | significant| | ||||
| | | | | label value| | ||||
| | | | | for the| | ||||
| | | | | adjacency.| | ||||
+-----------------------------+-----+-------+--------+-------------+ | ||||
| not_a_ztp_offer | 21| 8.0| | Indicates| | ||||
| | | | | that the| | ||||
| | | | | level on the| | ||||
| | | | | lie must not| | ||||
| | | | | be used to| | ||||
| | | | | derive a ztp| | ||||
| | | | | level by the| | ||||
| | | | | receiving| | ||||
| | | | | node.| | ||||
+-----------------------------+-----+-------+--------+-------------+ | ||||
| you_are_flood_repeater | 22| 8.0| | Indicates to| | ||||
| | | | | northbound| | ||||
| | | | |neighbor that| | ||||
| | | | | it should be| | ||||
| | | | | reflooding| | ||||
| | | | |ties received| | ||||
| | | | | from this| | ||||
| | | | | node to| | ||||
| | | | |achieve flood| | ||||
| | | | |reduction and| | ||||
| | | | |balancing for| | ||||
| | | | | northbound| | ||||
| | | | | flooding.| | ||||
+-----------------------------+-----+-------+--------+-------------+ | ||||
| you_are_sending_too_quickly | 23| 8.0| | Indicates to| | ||||
| | | | | neighbor to| | ||||
| | | | | flood node| | ||||
| | | | |ties only and| | ||||
| | | | |slow down all| | ||||
| | | | | other ties.| | ||||
| | | | | ignored when| | ||||
| | | | |received from| | ||||
| | | | | southbound| | ||||
| | | | | neighbor.| | ||||
+-----------------------------+-----+-------+--------+-------------+ | ||||
| instance_name | 24| 8.0| |Instance name| | ||||
| | | | | in case| | ||||
| | | | |multiple rift| | ||||
| | | | | instances| | ||||
| | | | | running on| | ||||
| | | | | same| | ||||
| | | | | interface.| | ||||
+-----------------------------+-----+-------+--------+-------------+ | ||||
| fabric_id | 35| 8.0| | It provides| | ||||
| | | | | the optional| | ||||
| | | | | id of the| | ||||
| | | | | fabric| | ||||
| | | | | configured.| | ||||
| | | | | this must| | ||||
| | | | | match the| | ||||
| | | | | information| | ||||
| | | | |advertised on| | ||||
| | | | | the node| | ||||
| | | | | element.| | ||||
+-----------------------------+-----+-------+--------+-------------+ | ||||
Table 24 | ||||
10.3.18. Registry RIFT/encoding/LinkCapabilities | ||||
The name of the registry should be RIFTEncodingLinkCapabilities. | ||||
Link capabilities. | ||||
+=========================+=====+=========+==========+==============+ | ||||
| Name |Value| Min. | Max. | Comment | | ||||
| | | Schema | Schema | | | ||||
| | | Version | Version | | | ||||
+=========================+=====+=========+==========+==============+ | ||||
| Reserved | 0| 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------------------------+-----+---------+----------+--------------+ | ||||
| bfd | 1| 8.0 | | Indicates | | ||||
| | | | | that the | | ||||
| | | | | link is | | ||||
| | | | | supporting | | ||||
| | | | | bfd. | | ||||
+-------------------------+-----+---------+----------+--------------+ | ||||
| ipv4_forwarding_capable | 2| 8.0 | | Indicates | | ||||
| | | | | whether the | | ||||
| | | | | interface | | ||||
| | | | | will | | ||||
| | | | | support | | ||||
| | | | | ipv4 | | ||||
| | | | | forwarding. | | ||||
+-------------------------+-----+---------+----------+--------------+ | ||||
Table 25 | ||||
10.3.19. Registry RIFT/encoding/LinkIDPair | ||||
The name of the registry should be RIFTEncodingLinkIDPair. | ||||
LinkID pair describes one of parallel links between two nodes. | ||||
+============================+=====+=======+========+===============+ | ||||
| Name |Value| Min.| Max.| Comment | | ||||
| | | Schema| Schema| | | ||||
| | |Version| Version| | | ||||
+============================+=====+=======+========+===============+ | ||||
| Reserved | 0| 8.0| All| | | ||||
| | | |Versions| | | ||||
+----------------------------+-----+-------+--------+---------------+ | ||||
| local_id | 1| 8.0| | Node-wide | | ||||
| | | | | unique value | | ||||
| | | | | for the | | ||||
| | | | | local link. | | ||||
+----------------------------+-----+-------+--------+---------------+ | ||||
| remote_id | 2| 8.0| | Received | | ||||
| | | | | remote link | | ||||
| | | | | id for this | | ||||
| | | | | link. | | ||||
+----------------------------+-----+-------+--------+---------------+ | ||||
| platform_interface_index | 10| 8.0| | Describes | | ||||
| | | | | the local | | ||||
| | | | | interface | | ||||
| | | | | index of the | | ||||
| | | | | link. | | ||||
+----------------------------+-----+-------+--------+---------------+ | ||||
| platform_interface_name | 11| 8.0| | Describes | | ||||
| | | | | the local | | ||||
| | | | | interface | | ||||
| | | | | name. | | ||||
+----------------------------+-----+-------+--------+---------------+ | ||||
| trusted_outer_security_key | 12| 8.0| | Indicates | | ||||
| | | | | whether the | | ||||
| | | | | link is | | ||||
| | | | | secured, | | ||||
| | | | | i.e. | | ||||
| | | | | protected by | | ||||
| | | | | outer key, | | ||||
| | | | | absence of | | ||||
| | | | | this element | | ||||
| | | | | means no | | ||||
| | | | | indication, | | ||||
| | | | | undefined | | ||||
| | | | | outer key | | ||||
| | | | | means not | | ||||
| | | | | secured. | | ||||
+----------------------------+-----+-------+--------+---------------+ | ||||
| bfd_up | 13| 8.0| | Indicates | | ||||
| | | | | whether the | | ||||
| | | | | link is | | ||||
| | | | | protected by | | ||||
| | | | | established | | ||||
| | | | | bfd session. | | ||||
+----------------------------+-----+-------+--------+---------------+ | ||||
| address_families | 14| 8.0| | Optional | | ||||
| | | | | indication | | ||||
| | | | | which | | ||||
| | | | | address | | ||||
| | | | | families are | | ||||
| | | | | up on the | | ||||
| | | | | interface. | | ||||
+----------------------------+-----+-------+--------+---------------+ | ||||
Table 26 | ||||
10.3.20. Registry RIFT/encoding/Neighbor | ||||
The name of the registry should be RIFTEncodingNeighbor. | ||||
Neighbor structure. | ||||
+============+=======+=============+=============+=================+ | ||||
| Name | Value | Min. Schema | Max. Schema | Comment | | ||||
| | | Version | Version | | | ||||
+============+=======+=============+=============+=================+ | ||||
| Reserved | 0 | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+------------+-------+-------------+-------------+-----------------+ | ||||
| originator | 1 | 8.0 | | System id of | | ||||
| | | | | the originator. | | ||||
+------------+-------+-------------+-------------+-----------------+ | ||||
| remote_id | 2 | 8.0 | | Id of remote | | ||||
| | | | | side of the | | ||||
| | | | | link. | | ||||
+------------+-------+-------------+-------------+-----------------+ | ||||
Table 27 | ||||
10.3.21. Registry RIFT/encoding/NodeCapabilities | ||||
The name of the registry should be RIFTEncodingNodeCapabilities. | ||||
Capabilities the node supports. | ||||
+========================+=====+=======+==========+=================+ | ||||
| Name |Value| Min.| Max. | Comment | | ||||
| | | Schema| Schema | | | ||||
| | |Version| Version | | | ||||
+========================+=====+=======+==========+=================+ | ||||
| Reserved | 0| 8.0| All | | | ||||
| | | | Versions | | | ||||
+------------------------+-----+-------+----------+-----------------+ | ||||
| protocol_minor_version | 1| 8.0| | Must advertise | | ||||
| | | | | supported | | ||||
| | | | | minor version | | ||||
| | | | | dialect that | | ||||
| | | | | way. | | ||||
+------------------------+-----+-------+----------+-----------------+ | ||||
| flood_reduction | 2| 8.0| | Indicates that | | ||||
| | | | | node supports | | ||||
| | | | | flood | | ||||
| | | | | reduction. | | ||||
+------------------------+-----+-------+----------+-----------------+ | ||||
| hierarchy_indications | 3| 8.0| | Indicates | | ||||
| | | | | place in | | ||||
| | | | | hierarchy, | | ||||
| | | | | i.e. top-of- | | ||||
| | | | | fabric or leaf | | ||||
| | | | | only (in ztp) | | ||||
| | | | | or support for | | ||||
| | | | | leaf-2-leaf | | ||||
| | | | | procedures. | | ||||
+------------------------+-----+-------+----------+-----------------+ | ||||
Table 28 | ||||
10.3.22. Registry RIFT/encoding/NodeFlags | ||||
The name of the registry should be RIFTEncodingNodeFlags. | ||||
Indication flags of the node. | This registry has the following initial values. | |||
+==========+=======+=========+==========+===========================+ | | Note: This node's level is already included on the packet | |||
| Name | Value | Min. | Max. | Comment | | | header. | |||
| | | Schema | Schema | | | ||||
| | | Version | Version | | | ||||
+==========+=======+=========+==========+===========================+ | ||||
| Reserved | 0 | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+----------+-------+---------+----------+---------------------------+ | ||||
| overload | 1 | 8.0 | | Indicates that node | | ||||
| | | | | is in overload, do | | ||||
| | | | | not transit traffic | | ||||
| | | | | through it. | | ||||
+----------+-------+---------+----------+---------------------------+ | ||||
Table 29 | +=====+=============================+=======+========+==============+ | |||
|Value| Name |Min. |Max. |Comment | | ||||
| | |Schema |Schema | | | ||||
| | |Version|Version | | | ||||
+=====+=============================+=======+========+==============+ | ||||
|0 | Reserved |8.0 |All | | | ||||
| | | |Versions| | | ||||
+-----+-----------------------------+-------+--------+--------------+ | ||||
|1 | name |8.0 | |Node or | | ||||
| | | | |adjacency | | ||||
| | | | |name. | | ||||
+-----+-----------------------------+-------+--------+--------------+ | ||||
|2 | local_id |8.0 | |Local link | | ||||
| | | | |ID. | | ||||
+-----+-----------------------------+-------+--------+--------------+ | ||||
|3 | flood_port |8.0 | |UDP port to | | ||||
| | | | |which we can | | ||||
| | | | |receive | | ||||
| | | | |flooded ties. | | ||||
+-----+-----------------------------+-------+--------+--------------+ | ||||
|4 | link_mtu_size |8.0 | |Layer 2 MTU, | | ||||
| | | | |used to | | ||||
| | | | |discover | | ||||
| | | | |mismatch. | | ||||
+-----+-----------------------------+-------+--------+--------------+ | ||||
|5 | link_bandwidth |8.0 | |Local link | | ||||
| | | | |bandwidth on | | ||||
| | | | |the | | ||||
| | | | |interface. | | ||||
+-----+-----------------------------+-------+--------+--------------+ | ||||
|6 | neighbor |8.0 | |Reflects the | | ||||
| | | | |neighbor once | | ||||
| | | | |received to | | ||||
| | | | |provide 3-way | | ||||
| | | | |connectivity. | | ||||
+-----+-----------------------------+-------+--------+--------------+ | ||||
|7 | pod |8.0 | |Node's PoD. | | ||||
+-----+-----------------------------+-------+--------+--------------+ | ||||
|10 | node_capabilities |8.0 | |Node | | ||||
| | | | |capabilities | | ||||
| | | | |supported. | | ||||
+-----+-----------------------------+-------+--------+--------------+ | ||||
|11 | link_capabilities |8.0 | |Capabilities | | ||||
| | | | |of this link. | | ||||
+-----+-----------------------------+-------+--------+--------------+ | ||||
|12 | holdtime |8.0 | |Required | | ||||
| | | | |holdtime of | | ||||
| | | | |the | | ||||
| | | | |adjacency, | | ||||
| | | | |i.e., for how | | ||||
| | | | |long a period | | ||||
| | | | |adjacency | | ||||
| | | | |should be | | ||||
| | | | |kept up | | ||||
| | | | |without valid | | ||||
| | | | |LIE | | ||||
| | | | |reception. | | ||||
+-----+-----------------------------+-------+--------+--------------+ | ||||
|13 | label |8.0 | |Optional, | | ||||
| | | | |unsolicited, | | ||||
| | | | |downstream | | ||||
| | | | |assigned | | ||||
| | | | |locally | | ||||
| | | | |significant | | ||||
| | | | |label value | | ||||
| | | | |for the | | ||||
| | | | |adjacency. | | ||||
+-----+-----------------------------+-------+--------+--------------+ | ||||
|21 | not_a_ztp_offer |8.0 | |Indicates | | ||||
| | | | |that the | | ||||
| | | | |level on the | | ||||
| | | | |LIE must not | | ||||
| | | | |be used to | | ||||
| | | | |derive a ZTP | | ||||
| | | | |level by the | | ||||
| | | | |receiving | | ||||
| | | | |node. | | ||||
+-----+-----------------------------+-------+--------+--------------+ | ||||
|22 | you_are_flood_repeater |8.0 | |Indicates to | | ||||
| | | | |the | | ||||
| | | | |northbound | | ||||
| | | | |neighbor that | | ||||
| | | | |it should be | | ||||
| | | | |reflooding | | ||||
| | | | |TIEs received | | ||||
| | | | |from this | | ||||
| | | | |node to | | ||||
| | | | |achieve flood | | ||||
| | | | |reduction and | | ||||
| | | | |balancing for | | ||||
| | | | |northbound | | ||||
| | | | |flooding. | | ||||
+-----+-----------------------------+-------+--------+--------------+ | ||||
|23 | you_are_sending_too_quickly |8.0 | |Indicates to | | ||||
| | | | |the neighbor | | ||||
| | | | |to flood node | | ||||
| | | | |ties only and | | ||||
| | | | |slow down all | | ||||
| | | | |other ties. | | ||||
| | | | |Ignored when | | ||||
| | | | |received from | | ||||
| | | | |the | | ||||
| | | | |southbound | | ||||
| | | | |neighbor. | | ||||
+-----+-----------------------------+-------+--------+--------------+ | ||||
|24 | instance_name |8.0 | |Instance name | | ||||
| | | | |in case | | ||||
| | | | |multiple RIFT | | ||||
| | | | |instances are | | ||||
| | | | |running on | | ||||
| | | | |the same | | ||||
| | | | |interface. | | ||||
+-----+-----------------------------+-------+--------+--------------+ | ||||
|35 | fabric_id |8.0 | |It provides | | ||||
| | | | |the optional | | ||||
| | | | |ID of the | | ||||
| | | | |fabric | | ||||
| | | | |configured. | | ||||
| | | | |This must | | ||||
| | | | |match the | | ||||
| | | | |information | | ||||
| | | | |advertised on | | ||||
| | | | |the node | | ||||
| | | | |element. | | ||||
+-----+-----------------------------+-------+--------+--------------+ | ||||
10.3.23. Registry RIFT/encoding/NodeNeighborsTIEElement | Table 24: RIFT LIE Packet | |||
The name of the registry should be | 10.3.18. RIFTEncodingLinkCapabilities Registry | |||
RIFTEncodingNodeNeighborsTIEElement. | ||||
neighbor of a node | This registry has the following initial values. | |||
+===========+=======+=========+==========+==========================+ | ||||
| Name | Value | Min. | Max. | Comment | | ||||
| | | Schema | Schema | | | ||||
| | | Version | Version | | | ||||
+===========+=======+=========+==========+==========================+ | ||||
| Reserved | 0 | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-----------+-------+---------+----------+--------------------------+ | ||||
| level | 1 | 8.0 | | Level of neighbor. | | ||||
+-----------+-------+---------+----------+--------------------------+ | ||||
| cost | 3 | 8.0 | | Cost to neighbor. | | ||||
| | | | | ignore anything | | ||||
| | | | | equal or larger than | | ||||
| | | | | `infinite_distance` | | ||||
| | | | | and equal to | | ||||
| | | | | `invalid_distance`. | | ||||
+-----------+-------+---------+----------+--------------------------+ | ||||
| link_ids | 4 | 8.0 | | Carries description | | ||||
| | | | | of multiple parallel | | ||||
| | | | | links in a tie. | | ||||
+-----------+-------+---------+----------+--------------------------+ | ||||
| bandwidth | 5 | 8.0 | | Total bandwidth to | | ||||
| | | | | neighbor as sum of | | ||||
| | | | | all parallel links. | | ||||
+-----------+-------+---------+----------+--------------------------+ | ||||
Table 30 | +=====+=========================+=========+==========+==============+ | |||
|Value| Name | Min. | Max. | Comment | | ||||
| | | Schema | Schema | | | ||||
| | | Version | Version | | | ||||
+=====+=========================+=========+==========+==============+ | ||||
|0 | Reserved | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-----+-------------------------+---------+----------+--------------+ | ||||
|1 | bfd | 8.0 | | Indicates | | ||||
| | | | | that the | | ||||
| | | | | link is | | ||||
| | | | | supporting | | ||||
| | | | | BFD. | | ||||
+-----+-------------------------+---------+----------+--------------+ | ||||
|2 | ipv4_forwarding_capable | 8.0 | | Indicates | | ||||
| | | | | whether the | | ||||
| | | | | interface | | ||||
| | | | | will | | ||||
| | | | | support | | ||||
| | | | | IPv4 | | ||||
| | | | | forwarding. | | ||||
+-----+-------------------------+---------+----------+--------------+ | ||||
10.3.24. Registry RIFT/encoding/NodeTIEElement | Table 25: Link Capabilities | |||
The name of the registry should be RIFTEncodingNodeTIEElement. | 10.3.19. RIFTEncodingLinkIDPair Registry | |||
Description of a node. | The LinkID pair describes one of the parallel links between two | |||
nodes. | ||||
+=================+=======+=========+==========+====================+ | This registry has the following initial values. | |||
| Name | Value | Min. | Max. | Comment | | ||||
| | | Schema | Schema | | | ||||
| | | Version | Version | | | ||||
+=================+=======+=========+==========+====================+ | ||||
| Reserved | 0 | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-----------------+-------+---------+----------+--------------------+ | ||||
| level | 1 | 8.0 | | Level of the | | ||||
| | | | | node. | | ||||
+-----------------+-------+---------+----------+--------------------+ | ||||
| neighbors | 2 | 8.0 | | Node's neighbors. | | ||||
| | | | | multiple node | | ||||
| | | | | ties can carry | | ||||
| | | | | disjoint sets of | | ||||
| | | | | neighbors. | | ||||
+-----------------+-------+---------+----------+--------------------+ | ||||
| capabilities | 3 | 8.0 | | Capabilities of | | ||||
| | | | | the node. | | ||||
+-----------------+-------+---------+----------+--------------------+ | ||||
| flags | 4 | 8.0 | | Flags of the | | ||||
| | | | | node. | | ||||
+-----------------+-------+---------+----------+--------------------+ | ||||
| name | 5 | 8.0 | | Optional node | | ||||
| | | | | name for easier | | ||||
| | | | | operations. | | ||||
+-----------------+-------+---------+----------+--------------------+ | ||||
| pod | 6 | 8.0 | | Pod to which the | | ||||
| | | | | node belongs. | | ||||
+-----------------+-------+---------+----------+--------------------+ | ||||
| startup_time | 7 | 8.0 | | Optional startup | | ||||
| | | | | time of the node | | ||||
+-----------------+-------+---------+----------+--------------------+ | ||||
| miscabled_links | 10 | 8.0 | | If any local | | ||||
| | | | | links are | | ||||
| | | | | miscabled, this | | ||||
| | | | | indication is | | ||||
| | | | | flooded. | | ||||
+-----------------+-------+---------+----------+--------------------+ | ||||
| same_plane_tofs | 12 | 8.0 | | Tofs in the same | | ||||
| | | | | plane. only | | ||||
| | | | | carried by tof. | | ||||
| | | | | multiple node | | ||||
| | | | | ties can carry | | ||||
| | | | | disjoint sets of | | ||||
| | | | | tofs which must | | ||||
| | | | | be joined to form | | ||||
| | | | | a single set. | | ||||
+-----------------+-------+---------+----------+--------------------+ | ||||
| fabric_id | 20 | 8.0 | | It provides the | | ||||
| | | | | optional id of | | ||||
| | | | | the fabric | | ||||
| | | | | configured | | ||||
+-----------------+-------+---------+----------+--------------------+ | ||||
Table 31 | +=====+============================+=======+========+==============+ | |||
|Value| Name |Min. |Max. | Comment | | ||||
| | |Schema |Schema | | | ||||
| | |Version|Version | | | ||||
+=====+============================+=======+========+==============+ | ||||
|0 | Reserved |8.0 |All | | | ||||
| | | |Versions| | | ||||
+-----+----------------------------+-------+--------+--------------+ | ||||
|1 | local_id |8.0 | | Node-wide | | ||||
| | | | | unique value | | ||||
| | | | | for the | | ||||
| | | | | local link. | | ||||
+-----+----------------------------+-------+--------+--------------+ | ||||
|2 | remote_id |8.0 | | Received the | | ||||
| | | | | remote link | | ||||
| | | | | ID for this | | ||||
| | | | | link. | | ||||
+-----+----------------------------+-------+--------+--------------+ | ||||
|10 | platform_interface_index |8.0 | | Describes | | ||||
| | | | | the local | | ||||
| | | | | interface | | ||||
| | | | | index of the | | ||||
| | | | | link. | | ||||
+-----+----------------------------+-------+--------+--------------+ | ||||
|11 | platform_interface_name |8.0 | | Describes | | ||||
| | | | | the local | | ||||
| | | | | interface | | ||||
| | | | | name. | | ||||
+-----+----------------------------+-------+--------+--------------+ | ||||
|12 | trusted_outer_security_key |8.0 | | Indicates | | ||||
| | | | | whether the | | ||||
| | | | | link is | | ||||
| | | | | secured, | | ||||
| | | | | i.e., | | ||||
| | | | | protected by | | ||||
| | | | | outer key, | | ||||
| | | | | absence of | | ||||
| | | | | this element | | ||||
| | | | | means no | | ||||
| | | | | indication, | | ||||
| | | | | undefined | | ||||
| | | | | outer key | | ||||
| | | | | means not | | ||||
| | | | | secured. | | ||||
+-----+----------------------------+-------+--------+--------------+ | ||||
|13 | bfd_up |8.0 | | Indicates | | ||||
| | | | | whether the | | ||||
| | | | | link is | | ||||
| | | | | protected by | | ||||
| | | | | an | | ||||
| | | | | established | | ||||
| | | | | BFD session. | | ||||
+-----+----------------------------+-------+--------+--------------+ | ||||
|14 | address_families |8.0 | | Optional | | ||||
| | | | | indication | | ||||
| | | | | that address | | ||||
| | | | | families are | | ||||
| | | | | up on the | | ||||
| | | | | interface. | | ||||
+-----+----------------------------+-------+--------+--------------+ | ||||
10.3.25. Registry RIFT/encoding/PacketContent | Table 26 | |||
The name of the registry should be RIFTEncodingPacketContent. | 10.3.20. RIFTEncodingNeighbor Registry | |||
Content of a RIFT packet. | This registry has the following initial values. | |||
+==========+=======+=====================+=============+=========+ | +=======+============+=============+=============+=================+ | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | | Value | Name | Min. Schema | Max. Schema | Comment | | |||
| | | | Version | | | | | | Version | Version | | | |||
+==========+=======+=====================+=============+=========+ | +=======+============+=============+=============+=================+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+------------+-------------+-------------+-----------------+ | |||
| lie | 1 | 8.0 | | | | | 1 | originator | 8.0 | | System ID of | | |||
+----------+-------+---------------------+-------------+---------+ | | | | | | the originator. | | |||
| tide | 2 | 8.0 | | | | +-------+------------+-------------+-------------+-----------------+ | |||
+----------+-------+---------------------+-------------+---------+ | | 2 | remote_id | 8.0 | | ID of remote | | |||
| tire | 3 | 8.0 | | | | | | | | | side of the | | |||
+----------+-------+---------------------+-------------+---------+ | | | | | | link. | | |||
| tie | 4 | 8.0 | | | | +-------+------------+-------------+-------------+-----------------+ | |||
+----------+-------+---------------------+-------------+---------+ | ||||
Table 32 | Table 27: Neighbor Structure | |||
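As an illustration of how the `neighbor` element of the LIE packet relates to the Neighbor structure, the sketch below checks whether a received LIE reflects the local node, which is the condition the `neighbor` comment describes as providing 3-way connectivity. This is a minimal sketch and not the normative adjacency state machine. The Python classes, the `reflects_us` helper, and the sample values are hypothetical; only the field names (`local_id`, `neighbor`, `originator`, `remote_id`) and their field numbers come from the registries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Neighbor:
    originator: int  # field 1: system ID of the originator
    remote_id: int   # field 2: ID of the remote side of the link


@dataclass(frozen=True)
class LIEPacket:
    local_id: int                        # field 2: sender's local link ID
    neighbor: Optional[Neighbor] = None  # field 6: reflected neighbor, if any


def reflects_us(lie: LIEPacket, my_system_id: int, my_link_id: int) -> bool:
    """Return True if the received LIE reflects this node and this link."""
    if lie.neighbor is None:
        # Nothing reflected yet: the sender has not seen a LIE from us.
        return False
    return (lie.neighbor.originator == my_system_id
            and lie.neighbor.remote_id == my_link_id)
```

A LIE carrying no `neighbor` element, or one that reflects a different system ID or link ID, fails the check; how an implementation reacts to each case is outside the scope of this sketch.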
10.3.26. Registry RIFT/encoding/PacketHeader | 10.3.21. RIFTEncodingNodeCapabilities Registry | |||
The name of the registry should be RIFTEncodingPacketHeader. | This registry has the following initial values. | |||
Common RIFT packet header. | +=====+========================+=========+==========+==============+ | |||
|Value| Name | Min. | Max. | Comment | | ||||
| | | Schema | Schema | | | ||||
| | | Version | Version | | | ||||
+=====+========================+=========+==========+==============+ | ||||
|0 | Reserved | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-----+------------------------+---------+----------+--------------+ | ||||
|1 | protocol_minor_version | 8.0 | | Must | | ||||
| | | | | advertise | | ||||
| | | | | supported | | ||||
| | | | | minor | | ||||
| | | | | version | | ||||
| | | | | dialect that | | ||||
| | | | | way. | | ||||
+-----+------------------------+---------+----------+--------------+ | ||||
|2 | flood_reduction | 8.0 | | Indicates | | ||||
| | | | | that node | | ||||
| | | | | supports | | ||||
| | | | | flood | | ||||
| | | | | reduction. | | ||||
+-----+------------------------+---------+----------+--------------+ | ||||
|3 | hierarchy_indications | 8.0 | | Indicates | | ||||
| | | | | place in | | ||||
| | | | | hierarchy, | | ||||
| | | | | i.e., top of | | ||||
| | | | | fabric or | | ||||
| | | | | leaf only | | ||||
| | | | | (in ZTP) or | | ||||
| | | | | support for | | ||||
| | | | | L2L | | ||||
| | | | | procedures. | | ||||
+-----+------------------------+---------+----------+--------------+ | ||||
+===============+=======+=========+==========+===================+ | Table 28: Capabilities the Node Supports | |||
| Name | Value | Min. | Max. | Comment | | ||||
| | | Schema | Schema | | | ||||
| | | Version | Version | | | ||||
+===============+=======+=========+==========+===================+ | ||||
| Reserved | 0 | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+---------------+-------+---------+----------+-------------------+ | ||||
| major_version | 1 | 8.0 | | Major version of | | ||||
| | | | | protocol. | | ||||
+---------------+-------+---------+----------+-------------------+ | ||||
| minor_version | 2 | 8.0 | | Minor version of | | ||||
| | | | | protocol. | | ||||
+---------------+-------+---------+----------+-------------------+ | ||||
| sender | 3 | 8.0 | | Node sending the | | ||||
| | | | | packet, in case | | ||||
| | | | | of lie/tire/tide | | ||||
| | | | | also the | | ||||
| | | | | originator of it. | | ||||
+---------------+-------+---------+----------+-------------------+ | ||||
| level | 4 | 8.0 | | Level of the node | | ||||
| | | | | sending the | | ||||
| | | | | packet, required | | ||||
| | | | | on everything | | ||||
| | | | | except lies. lack | | ||||
| | | | | of presence on | | ||||
| | | | | lies indicates | | ||||
| | | | | undefined_level | | ||||
| | | | | and is used in | | ||||
| | | | | ztp procedures. | | ||||
+---------------+-------+---------+----------+-------------------+ | ||||
Table 33 | 10.3.22. RIFTEncodingNodeFlags Registry | |||
10.3.27. Registry RIFT/encoding/PrefixAttributes | This registry has the following initial values. | |||
The name of the registry should be RIFTEncodingPrefixAttributes. | +=======+==========+=========+==========+===========================+ | |||
| Value | Name | Min. | Max. | Comment | | ||||
| | | Schema | Schema | | | ||||
| | | Version | Version | | | ||||
+=======+==========+=========+==========+===========================+ | ||||
| 0 | Reserved | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------+----------+---------+----------+---------------------------+ | ||||
| 1 | overload | 8.0 | | Indicates that node | | ||||
| | | | | is in overload; do | | ||||
| | | | | not transit traffic | | ||||
| | | | | through it. | | ||||
+-------+----------+---------+----------+---------------------------+ | ||||
Attributes of a prefix. | Table 29: Indication Flags of the Node | |||
+===================+=======+=========+==========+==================+ | 10.3.23. RIFTEncodingNodeNeighborsTIEElement Registry | |||
| Name | Value | Min. | Max. | Comment | | ||||
| | | Schema | Schema | | | ||||
| | | Version | Version | | | ||||
+===================+=======+=========+==========+==================+ | ||||
| Reserved | 0 | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------------------+-------+---------+----------+------------------+ | ||||
| metric | 2 | 8.0 | | Distance of the | | ||||
| | | | | prefix. | | ||||
+-------------------+-------+---------+----------+------------------+ | ||||
| tags | 3 | 8.0 | | Generic | | ||||
| | | | | unordered set | | ||||
| | | | | of route tags, | | ||||
| | | | | can be | | ||||
| | | | | redistributed | | ||||
| | | | | to other | | ||||
| | | | | protocols or | | ||||
| | | | | use within the | | ||||
| | | | | context of real | | ||||
| | | | | time analytics. | | ||||
+-------------------+-------+---------+----------+------------------+ | ||||
| monotonic_clock | 4 | 8.0 | | Monotonic clock | | ||||
| | | | | for mobile | | ||||
| | | | | addresses. | | ||||
+-------------------+-------+---------+----------+------------------+ | ||||
| loopback | 6 | 8.0 | | Indicates if | | ||||
| | | | | the prefix is a | | ||||
| | | | | node loopback. | | ||||
+-------------------+-------+---------+----------+------------------+ | ||||
| directly_attached | 7 | 8.0 | | Indicates that | | ||||
| | | | | the prefix is | | ||||
| | | | | directly | | ||||
| | | | | attached. | | ||||
+-------------------+-------+---------+----------+------------------+ | ||||
| from_link | 10 | 8.0 | | Link to which | | ||||
| | | | | the address | | ||||
| | | | | belongs to. | | ||||
+-------------------+-------+---------+----------+------------------+ | ||||
| label | 12 | 8.0 | | Optional, per | | ||||
| | | | | prefix | | ||||
| | | | | significant | | ||||
| | | | | label. | | ||||
+-------------------+-------+---------+----------+------------------+ | ||||
Table 34 | This registry has the following initial values. | |||
10.3.28. Registry RIFT/encoding/PrefixTIEElement | +=======+===========+=========+==========+======================+ | |||
| Value | Name | Min. | Max. | Comment | | ||||
| | | Schema | Schema | | | ||||
| | | Version | Version | | | ||||
+=======+===========+=========+==========+======================+ | ||||
| 0 | Reserved | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------+-----------+---------+----------+----------------------+ | ||||
| 1 | level | 8.0 | | Level of neighbor. | | ||||
+-------+-----------+---------+----------+----------------------+ | ||||
| 3 | cost | 8.0 | | Cost to neighbor. | | ||||
| | | | | Ignore anything | | ||||
| | | | | equal or larger than | | ||||
| | | | | 'infinite_distance' | | ||||
| | | | | and equal to | | ||||
| | | | | 'invalid_distance'. | | ||||
+-------+-----------+---------+----------+----------------------+ | ||||
| 4 | link_ids | 8.0 | | Carries description | | ||||
| | | | | of multiple parallel | | ||||
| | | | | links in a tie. | | ||||
+-------+-----------+---------+----------+----------------------+ | ||||
| 5 | bandwidth | 8.0 | | Total bandwidth to | | ||||
| | | | | neighbor as sum of | | ||||
| | | | | all parallel links. | | ||||
+-------+-----------+---------+----------+----------------------+ | ||||
The name of the registry should be RIFTEncodingPrefixTIEElement. | Table 30: Neighbor of a Node | |||
TIE carrying prefixes | 10.3.24. RIFTEncodingNodeTIEElement Registry | |||
+==========+=======+=============+=============+================+ | This registry has the following initial values. | |||
| Name | Value | Min. Schema | Max. Schema | Comment | | ||||
| | | Version | Version | | | ||||
+==========+=======+=============+=============+================+ | ||||
| Reserved | 0 | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+----------+-------+-------------+-------------+----------------+ | ||||
| prefixes | 1 | 8.0 | | Prefixes with | | ||||
| | | | | the associated | | ||||
| | | | | attributes. | | ||||
+----------+-------+-------------+-------------+----------------+ | ||||
Table 35 | +=======+=================+=========+==========+====================+ | |||
| Value | Name | Min. | Max. | Comment | | ||||
| | | Schema | Schema | | | ||||
| | | Version | Version | | | ||||
+=======+=================+=========+==========+====================+ | ||||
| 0 | Reserved | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------+-----------------+---------+----------+--------------------+ | ||||
| 1 | level | 8.0 | | Level of the | | ||||
| | | | | node. | | ||||
+-------+-----------------+---------+----------+--------------------+ | ||||
| 2 | neighbors | 8.0 | | Node's neighbors. | | ||||
| | | | | Multiple node | | ||||
| | | | | ties can carry | | ||||
| | | | | disjoint sets of | | ||||
| | | | | neighbors. | | ||||
+-------+-----------------+---------+----------+--------------------+ | ||||
| 3 | capabilities | 8.0 | | Capabilities of | | ||||
| | | | | the node. | | ||||
+-------+-----------------+---------+----------+--------------------+ | ||||
| 4 | flags | 8.0 | | Flags of the | | ||||
| | | | | node. | | ||||
+-------+-----------------+---------+----------+--------------------+ | ||||
| 5 | name | 8.0 | | Optional node | | ||||
| | | | | name for easier | | ||||
| | | | | operations. | | ||||
+-------+-----------------+---------+----------+--------------------+ | ||||
| 6 | pod | 8.0 | | Pod to which the | | ||||
| | | | | node belongs. | | ||||
+-------+-----------------+---------+----------+--------------------+ | ||||
| 7 | startup_time | 8.0 | | Optional startup | | ||||
| | | | | time of the node. | | ||||
+-------+-----------------+---------+----------+--------------------+ | ||||
| 10 | miscabled_links | 8.0 | | If any local | | ||||
| | | | | links are | | ||||
| | | | | miscabled, this | | ||||
| | | | | indication is | | ||||
| | | | | flooded. | | ||||
+-------+-----------------+---------+----------+--------------------+ | ||||
| 12 | same_plane_tofs | 8.0 | | ToFs in the same | | ||||
| | | | | plane. Only | | ||||
| | | | | carried by ToF. | | ||||
| | | | | Multiple node | | ||||
| | | | | ties can carry | | ||||
| | | | | disjoint sets of | | ||||
| | | | | ToFs that must be | | ||||
| | | | | joined to form a | | ||||
| | | | | single set. | | ||||
+-------+-----------------+---------+----------+--------------------+ | ||||
| 20 | fabric_id | 8.0 | | It provides the | | ||||
| | | | | optional ID of | | ||||
| | | | | the fabric | | ||||
| | | | | configured. | | ||||
+-------+-----------------+---------+----------+--------------------+ | ||||
10.3.29. Registry RIFT/encoding/ProtocolPacket | Table 31: Description of a Node | |||
The name of the registry should be RIFTEncodingProtocolPacket. | 10.3.25. RIFTEncodingPacketContent Registry | |||
RIFT packet structure. | This registry has the following initial values. | |||
+==========+=======+=====================+=============+=========+ | +=======+==========+=====================+=============+=========+ | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | | Value | Name | Min. Schema Version | Max. Schema | Comment | | |||
| | | | Version | | | | | | | Version | | | |||
+==========+=======+=====================+=============+=========+ | +=======+==========+=====================+=============+=========+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| header | 1 | 8.0 | | | | | 1 | lie | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| content | 2 | 8.0 | | | | | 2 | tide | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| 3 | tire | 8.0 | | | | ||||
+-------+----------+---------------------+-------------+---------+ | ||||
| 4 | tie | 8.0 | | | | ||||
+-------+----------+---------------------+-------------+---------+ | ||||
Table 36 | Table 32: Content of a RIFT Packet | |||
10.3.30. Registry RIFT/encoding/TIDEPacket | 10.3.26. RIFTEncodingPacketHeader Registry | |||
The name of the registry should be RIFTEncodingTIDEPacket. | This registry has the following initial values. | |||
TIDE with *sorted* TIE headers. | +=======+===============+=========+==========+===================+ | |||
| Value | Name | Min. | Max. | Comment | | ||||
| | | Schema | Schema | | | ||||
| | | Version | Version | | | ||||
+=======+===============+=========+==========+===================+ | ||||
| 0 | Reserved | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------+---------------+---------+----------+-------------------+ | ||||
| 1 | major_version | 8.0 | | Major version of | | ||||
| | | | | protocol. | | ||||
+-------+---------------+---------+----------+-------------------+ | ||||
| 2 | minor_version | 8.0 | | Minor version of | | ||||
| | | | | protocol. | | ||||
+-------+---------------+---------+----------+-------------------+ | ||||
| 3 | sender | 8.0 | | Node sending the | | ||||
| | | | | packet, in case | | ||||
| | | | | of LIE/TIRE/TIDE | | ||||
| | | | | also the | | ||||
| | | | | originator of it. | | ||||
+-------+---------------+---------+----------+-------------------+ | ||||
| 4 | level | 8.0 | | Level of the node | | ||||
| | | | | sending the | | ||||
| | | | | packet, required | | ||||
| | | | | on everything | | ||||
| | | | | except LIEs. | | ||||
| | | | | Lack of presence | | ||||
| | | | | on LIEs indicates | | ||||
| | | | | undefined_level | | ||||
| | | | | and is used in | | ||||
| | | | | ZTP procedures. | | ||||
+-------+---------------+---------+----------+-------------------+ | ||||
+=============+=======+=============+=============+===============+ | Table 33: Common RIFT Packet Header | |||
| Name | Value | Min. Schema | Max. Schema | Comment | | ||||
| | | Version | Version | | | ||||
+=============+=======+=============+=============+===============+ | ||||
| Reserved | 0 | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------------+-------+-------------+-------------+---------------+ | ||||
| start_range | 1 | 8.0 | | First tie | | ||||
| | | | | header in the | | ||||
| | | | | tide packet. | | ||||
+-------------+-------+-------------+-------------+---------------+ | ||||
| end_range | 2 | 8.0 | | Last tie | | ||||
| | | | | header in the | | ||||
| | | | | tide packet. | | ||||
+-------------+-------+-------------+-------------+---------------+ | ||||
| headers | 3 | 8.0 | | _sorted_ list | | ||||
| | | | | of headers. | | ||||
+-------------+-------+-------------+-------------+---------------+ | ||||
Table 37 | 10.3.27. RIFTEncodingPrefixAttributes Registry | |||
10.3.31. Registry RIFT/encoding/TIEElement | This registry has the following initial values. | |||
The name of the registry should be RIFTEncodingTIEElement. | +=======+===================+=========+==========+==================+ | |||
| Value | Name | Min. | Max. | Comment | | ||||
| | | Schema | Schema | | | ||||
| | | Version | Version | | | ||||
+=======+===================+=========+==========+==================+ | ||||
| 0 | Reserved | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------+-------------------+---------+----------+------------------+ | ||||
| 2 | metric | 8.0 | | Distance of the | | ||||
| | | | | prefix. | | ||||
+-------+-------------------+---------+----------+------------------+ | ||||
| 3 | tags | 8.0 | | Generic | | ||||
| | | | | unordered set | | ||||
| | | | | of route tags, | | ||||
| | | | | can be | | ||||
| | | | | redistributed | | ||||
| | | | | to other | | ||||
| | | | | protocols or | | ||||
| | | | | used within the | | ||||
| | | | | context of real | | ||||
| | | | | time analytics. | | ||||
+-------+-------------------+---------+----------+------------------+ | ||||
| 4 | monotonic_clock | 8.0 | | Monotonic clock | | ||||
| | | | | for mobile | | ||||
| | | | | addresses. | | ||||
+-------+-------------------+---------+----------+------------------+ | ||||
| 6 | loopback | 8.0 | | Indicates if | | ||||
| | | | | the prefix is a | | ||||
| | | | | node loopback. | | ||||
+-------+-------------------+---------+----------+------------------+ | ||||
| 7 | directly_attached | 8.0 | | Indicates that | | ||||
| | | | | the prefix is | | ||||
| | | | | directly | | ||||
| | | | | attached. | | ||||
+-------+-------------------+---------+----------+------------------+ | ||||
| 10 | from_link | 8.0 | | Link to which | | ||||
| | | | | the address | | ||||
| | | | | belongs to. | | ||||
+-------+-------------------+---------+----------+------------------+ | ||||
| 12 | label | 8.0 | | Optional, per- | | ||||
| | | | | prefix | | ||||
| | | | | significant | | ||||
| | | | | label. | | ||||
+-------+-------------------+---------+----------+------------------+ | ||||
Single element in a TIE. | Table 34: Attributes of a Prefix | |||
+=========================================+=====+=======+========+=================================+ | 10.3.28. RIFTEncodingPrefixTIEElement Registry | |||
|Name |Value| Min.| Max.|Comment | | ||||
| | | Schema| Schema| | | ||||
| | |Version| Version| | | ||||
+=========================================+=====+=======+========+=================================+ | ||||
|Reserved | 0| 8.0| All| | | ||||
| | | |Versions| | | ||||
+-----------------------------------------+-----+-------+--------+---------------------------------+ | ||||
|node | 1| 8.0| | Used in case of enum| | ||||
| | | | | common.tietypetype.nodetietype.| | ||||
+-----------------------------------------+-----+-------+--------+---------------------------------+ | ||||
|prefixes | 2| 8.0| | Used in case of enum| | ||||
| | | | |common.tietypetype.prefixtietype.| | ||||
+-----------------------------------------+-----+-------+--------+---------------------------------+ | ||||
|positive_disaggregation_prefixes | 3| 8.0| | Positive prefixes (always| | ||||
| | | | | southbound).| | ||||
+-----------------------------------------+-----+-------+--------+---------------------------------+ | ||||
|negative_disaggregation_prefixes | 5| 8.0| | Transitive, negative prefixes| | ||||
| | | | | (always southbound)| | ||||
+-----------------------------------------+-----+-------+--------+---------------------------------+ | ||||
|external_prefixes | 6| 8.0| | Externally reimported prefixes.| | ||||
+-----------------------------------------+-----+-------+--------+---------------------------------+ | ||||
|positive_external_disaggregation_prefixes| 7| 8.0| | Positive external disaggregated| | ||||
| | | | | prefixes (always southbound).| | ||||
+-----------------------------------------+-----+-------+--------+---------------------------------+ | ||||
|keyvalues | 9| 8.0| | Key-value store elements.| | ||||
+-----------------------------------------+-----+-------+--------+---------------------------------+ | ||||
Table 38 | This registry has the following initial values. | |||
10.3.32. Registry RIFT/encoding/TIEHeader | +=======+==========+=============+=============+================+ | |||
| Value | Name | Min. Schema | Max. Schema | Comment | | ||||
| | | Version | Version | | | ||||
+=======+==========+=============+=============+================+ | ||||
| 0 | Reserved | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------+----------+-------------+-------------+----------------+ | ||||
| 1 | prefixes | 8.0 | | Prefixes with | | ||||
| | | | | the associated | | ||||
| | | | | attributes. | | ||||
+-------+----------+-------------+-------------+----------------+ | ||||
The name of the registry should be RIFTEncodingTIEHeader. | Table 35: TIE Carrying Prefixes | |||
Header of a TIE. | 10.3.29. RIFTEncodingProtocolPacket Registry | |||
+======================+=======+=========+==========+==============+ | This registry has the following initial values. | |||
| Name | Value | Min. | Max. | Comment | | ||||
| | | Schema | Schema | | | ||||
| | | Version | Version | | | ||||
+======================+=======+=========+==========+==============+ | ||||
| Reserved | 0 | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+----------------------+-------+---------+----------+--------------+ | ||||
| tieid | 2 | 8.0 | | Id of tie. | | ||||
+----------------------+-------+---------+----------+--------------+ | ||||
| seq_nr | 3 | 8.0 | | Sequence | | ||||
| | | | | number of | | ||||
| | | | | tie. | | ||||
+----------------------+-------+---------+----------+--------------+ | ||||
| origination_time | 10 | 8.0 | | Absolute | | ||||
| | | | | timestamp | | ||||
| | | | | when tie was | | ||||
| | | | | generated. | | ||||
+----------------------+-------+---------+----------+--------------+ | ||||
| origination_lifetime | 12 | 8.0 | | Original | | ||||
| | | | | lifetime | | ||||
| | | | | when tie was | | ||||
| | | | | generated. | | ||||
+----------------------+-------+---------+----------+--------------+ | ||||
Table 39 | +=======+==========+=====================+=============+=========+ | |||
| Value | Name | Min. Schema Version | Max. Schema | Comment | | ||||
| | | | Version | | | ||||
+=======+==========+=====================+=============+=========+ | ||||
| 0 | Reserved | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------+----------+---------------------+-------------+---------+ | ||||
| 1 | header | 8.0 | | | | ||||
+-------+----------+---------------------+-------------+---------+ | ||||
| 2 | content | 8.0 | | | | ||||
+-------+----------+---------------------+-------------+---------+ | ||||
10.3.33. Registry RIFT/encoding/TIEHeaderWithLifeTime | Table 36: RIFT Packet Structure | |||
The name of the registry should be RIFTEncodingTIEHeaderWithLifeTime. | 10.3.30. RIFTEncodingTIDEPacket Registry | |||
Header of a TIE as described in TIRE/TIDE. | This registry has the following initial values. | |||
+====================+=======+=============+==========+===========+ | +=======+=============+=============+=============+===============+ | |||
| Name | Value | Min. Schema | Max. | Comment | | | Value | Name | Min. Schema | Max. Schema | Comment | | |||
| | | Version | Schema | | | | | | Version | Version | | | |||
| | | | Version | | | +=======+=============+=============+=============+===============+ | |||
+====================+=======+=============+==========+===========+ | | 0 | Reserved | 8.0 | All | | | |||
| Reserved | 0 | 8.0 | All | | | | | | | Versions | | | |||
| | | | Versions | | | +-------+-------------+-------------+-------------+---------------+ | |||
+--------------------+-------+-------------+----------+-----------+ | | 1 | start_range | 8.0 | | First TIE | | |||
| header | 1 | 8.0 | | | | | | | | | header in the | | |||
+--------------------+-------+-------------+----------+-----------+ | | | | | | TIDE packet. | | |||
| remaining_lifetime | 2 | 8.0 | | Remaining | | +-------+-------------+-------------+-------------+---------------+ | |||
| | | | | lifetime. | | | 2 | end_range | 8.0 | | Last TIE | | |||
+--------------------+-------+-------------+----------+-----------+ | | | | | | header in the | | |||
| | | | | TIDE packet. | | ||||
+-------+-------------+-------------+-------------+---------------+ | ||||
| 3 | headers | 8.0 | | _sorted_ list | | ||||
| | | | | of headers. | | ||||
+-------+-------------+-------------+-------------+---------------+ | ||||
Table 40 | Table 37: TIDE with Sorted TIE Headers | |||
10.3.34. Registry RIFT/encoding/TIEID | 10.3.31. RIFTEncodingTIEElement Registry | |||
The name of the registry should be RIFTEncodingTIEID. | This registry has the following initial values. | |||
Unique ID of a TIE. | +=====+========================+=======+========+===================+ | |||
|Value|Name |Min. |Max. |Comment | | ||||
| | |Schema |Schema | | | ||||
| | |Version|Version | | | ||||
+=====+========================+=======+========+===================+ | ||||
|0 |Reserved |8.0 |All | | | ||||
| | | |Versions| | | ||||
+-----+------------------------+-------+--------+-------------------+ | ||||
|1 |node |8.0 | |Used in case of | | ||||
| | | | |enum | | ||||
| | | | |common.tietypetype.| | ||||
| | | | |nodetietype. | | ||||
+-----+------------------------+-------+--------+-------------------+ | ||||
|2 |prefixes |8.0 | |Used in case of | | ||||
| | | | |enum | | ||||
| | | | |common.tietypetype.| | ||||
| | | | |prefixtietype. | | ||||
+-----+------------------------+-------+--------+-------------------+ | ||||
|3 |positive_disaggregation_|8.0 | |Positive prefixes | | ||||
| |prefixes | | |(always | | ||||
| | | | |southbound). | | ||||
+-----+------------------------+-------+--------+-------------------+ | ||||
|5 |negative_disaggregation_|8.0 | |Transitive, | | ||||
| |prefixes | | |negative prefixes | | ||||
| | | | |(always southbound)| | ||||
+-----+------------------------+-------+--------+-------------------+ | ||||
|6 |external_prefixes |8.0 | |Externally | | ||||
| | | | |reimported | | ||||
| | | | |prefixes. | | ||||
+-----+------------------------+-------+--------+-------------------+ | ||||
|7 |positive_external_ |8.0 | |Positive external | | ||||
| |disaggregation_prefixes | | |disaggregated | | ||||
| | | | |prefixes | | ||||
| | | | |(always | | ||||
| | | | |southbound). | | ||||
+-----+------------------------+-------+--------+-------------------+ | ||||
|9 |keyvalues |8.0 | |Key-value | | ||||
| | | | |store elements. | | ||||
+-----+------------------------+-------+--------+-------------------+ | ||||
+============+=======+=============+=============+============+ | Table 38: Single Element in a TIE | |||
| Name | Value | Min. Schema | Max. Schema | Comment | | ||||
| | | Version | Version | | | ||||
+============+=======+=============+=============+============+ | ||||
| Reserved | 0 | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+------------+-------+-------------+-------------+------------+ | ||||
| direction | 1 | 8.0 | | Direction | | ||||
| | | | | of tie. | | ||||
+------------+-------+-------------+-------------+------------+ | ||||
| originator | 2 | 8.0 | | Indicates | | ||||
| | | | | originator | | ||||
| | | | | of tie. | | ||||
+------------+-------+-------------+-------------+------------+ | ||||
| tietype | 3 | 8.0 | | Type of | | ||||
| | | | | tie. | | ||||
+------------+-------+-------------+-------------+------------+ | ||||
| tie_nr | 4 | 8.0 | | Number of | | ||||
| | | | | tie. | | ||||
+------------+-------+-------------+-------------+------------+ | ||||
Table 41 | 10.3.32. RIFTEncodingTIEHeader Registry | |||
10.3.35. Registry RIFT/encoding/TIEPacket | This registry has the following initial values. | |||
The name of the registry should be RIFTEncodingTIEPacket. | +=======+======================+=========+==========+==============+ | |||
| Value | Name | Min. | Max. | Comment | | ||||
| | | Schema | Schema | | | ||||
| | | Version | Version | | | ||||
+=======+======================+=========+==========+==============+ | ||||
| 0 | Reserved | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------+----------------------+---------+----------+--------------+ | ||||
| 2 | tieid | 8.0 | | ID of TIE. | | ||||
+-------+----------------------+---------+----------+--------------+ | ||||
| 3 | seq_nr | 8.0 | | Sequence | | ||||
| | | | | number of | | ||||
| | | | | TIE. | | ||||
+-------+----------------------+---------+----------+--------------+ | ||||
| 10 | origination_time | 8.0 | | Absolute | | ||||
| | | | | timestamp | | ||||
| | | | | when TIE was | | ||||
| | | | | generated. | | ||||
+-------+----------------------+---------+----------+--------------+ | ||||
| 12 | origination_lifetime | 8.0 | | Original | | ||||
| | | | | lifetime | | ||||
| | | | | when TIE was | | ||||
| | | | | generated. | | ||||
+-------+----------------------+---------+----------+--------------+ | ||||
TIE packet | Table 39: Header of a TIE | |||
+==========+=======+=====================+=============+=========+ | 10.3.33. RIFTEncodingTIEHeaderWithLifeTime Registry | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | ||||
| | | | Version | | | ||||
+==========+=======+=====================+=============+=========+ | ||||
| Reserved | 0 | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+----------+-------+---------------------+-------------+---------+ | ||||
| header | 1 | 8.0 | | | | ||||
+----------+-------+---------------------+-------------+---------+ | ||||
| element | 2 | 8.0 | | | | ||||
+----------+-------+---------------------+-------------+---------+ | ||||
Table 42 | This registry has the following initial values. | |||
10.3.36. Registry RIFT/encoding/TIREPacket | +=======+====================+=============+==========+===========+ | |||
| Value | Name | Min. Schema | Max. | Comment | | ||||
| | | Version | Schema | | | ||||
| | | | Version | | | ||||
+=======+====================+=============+==========+===========+ | ||||
| 0 | Reserved | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------+--------------------+-------------+----------+-----------+ | ||||
| 1 | header | 8.0 | | | | ||||
+-------+--------------------+-------------+----------+-----------+ | ||||
| 2 | remaining_lifetime | 8.0 | | Remaining | | ||||
| | | | | lifetime. | | ||||
+-------+--------------------+-------------+----------+-----------+ | ||||
The name of the registry should be RIFTEncodingTIREPacket. | Table 40: Header of a TIE as Described in TIRE/TIDE | |||
TIRE packet | 10.3.34. RIFTEncodingTIEID Registry | |||
+==========+=======+=====================+=============+=========+ | This registry has the following initial values. | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | ||||
| | | | Version | | | ||||
+==========+=======+=====================+=============+=========+ | ||||
| Reserved | 0 | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+----------+-------+---------------------+-------------+---------+ | ||||
| headers | 1 | 8.0 | | | | ||||
+----------+-------+---------------------+-------------+---------+ | ||||
Table 43 | +=======+============+=============+=============+============+ | |||
| Value | Name | Min. Schema | Max. Schema | Comment | | ||||
| | | Version | Version | | | ||||
+=======+============+=============+=============+============+ | ||||
| 0 | Reserved | 8.0 | All | | | ||||
| | | | Versions | | | ||||
+-------+------------+-------------+-------------+------------+ | ||||
| 1 | direction | 8.0 | | Direction | | ||||
| | | | | of TIE. | | ||||
+-------+------------+-------------+-------------+------------+ | ||||
| 2 | originator | 8.0 | | Indicates | | ||||
| | | | | originator | | ||||
| | | | | of TIE. | | ||||
+-------+------------+-------------+-------------+------------+ | ||||
| 3 | tietype | 8.0 | | Type of | | ||||
| | | | | TIE. | | ||||
+-------+------------+-------------+-------------+------------+ | ||||
| 4 | tie_nr | 8.0 | | Number of | | ||||
| | | | | TIE. | | ||||
+-------+------------+-------------+-------------+------------+ | ||||
11. Acknowledgments | Table 41: Unique ID of a TIE | |||
A new routing protocol in its complexity is not a product of a parent | 10.3.35. RIFTEncodingTIEPacket Registry | |||
but of a village as the author list shows already. However, many | ||||
more people provided input, fine-combed the specification based on | ||||
their experience in design, implementation or application of | ||||
protocols in IP fabrics. This section will make an inadequate | ||||
attempt in recording their contribution. | ||||
Many thanks to Naiming Shen for some of the early discussions around | This registry has the following initial values. | |||
the topic of using IGPs for routing in topologies related to Clos. | ||||
Russ White to be especially acknowledged for the key conversation on | ||||
epistemology that allowed to tie current asynchronous distributed | ||||
systems theory results to a modern protocol design presented in this | ||||
scope. Adrian Farrel, Joel Halpern, Jeffrey Zhang, Krzysztof | ||||
Szarkowicz, Nagendra Kumar, Melchior Aelmans, Kaushal Tank, Will | ||||
Jones, Moin Ahmed, Sandy Zhang, Donald Eastlake provided thoughtful | ||||
comments that improved the readability of the document and found good | ||||
amount of corners where the light failed to shine. Kris Price was | ||||
first to mention single router, single arm default considerations. | ||||
Jeff Tantsura helped out with some initial thoughts on BFD | ||||
interactions while Jeff Haas corrected several misconceptions about | ||||
BFD's finer points and helped to improve the security section around | ||||
leaf considerations. Artur Makutunowicz pointed out many possible | ||||
improvements and acted as sounding board in regard to modern protocol | ||||
implementation techniques RIFT is exploring. Barak Gafni formalized | ||||
first time clearly the problem of partitioned spine and fallen leaves | ||||
on a (clean) napkin in Singapore that led to the very important part | ||||
of the specification centered around multiple ToF planes and negative | ||||
disaggregation. Igor Gashinsky and others shared many thoughts on | ||||
problems encountered in design and operation of large-scale data | ||||
center fabrics. Xu Benchong found a delicate error in the flooding | ||||
procedures and a schema datatype size mismatch. | ||||
Too many people to mention provided reviews from many directions in | +=======+==========+=====================+=============+=========+ | |||
IETF, often pointing to critical defects, sometimes asking for things | | Value | Name | Min. Schema Version | Max. Schema | Comment | | |||
again that have been removed by one the previous reviewers as | | | | | Version | | | |||
objectionable or superfluous, and many times claiming the document | +=======+==========+=====================+=============+=========+ | |||
being somewhere on the extremes between too crowded with the obvious | | 0 | Reserved | 8.0 | All | | | |||
and omitting introduction to cryptic concepts everywhere. The result | | | | | Versions | | | |||
is the best editors could do to find a balance of a document guiding | +-------+----------+---------------------+-------------+---------+ | |||
the reader by Section 2 into a specification tight enough to result | | 1 | header | 8.0 | | | | |||
in interoperable implementations while at the same time introducing | +-------+----------+---------------------+-------------+---------+ | |||
enough operational context of IP routable fabrics to guarantee a | | 2 | element | 8.0 | | | | |||
concise, common language when facing unaccustomed concepts the | +-------+----------+---------------------+-------------+---------+ | |||
protocol relies on. In the process it was important to not end up | ||||
carrying Aesop's donkey of course so while the result may not be | ||||
perceived as perfect by everyone it should be practically speaking | ||||
more than sufficient for everyone that ends up using it in the | ||||
future. | ||||
Last but not least, Alvaro Retana, John Scudder, Andrew Alston and | Table 42: TIE Packet | |||
Jim Guichard guided the undertaking as ADs by asking many necessary | ||||
procedural and technical questions which did not only improve the | ||||
content but did also lay out the track towards publication. And | ||||
Roman Danyliw is mentioned very last but not least either for his | ||||
painstakingly detailed review and improvement of security aspects of | ||||
the specification. | ||||
12. Contributors | 10.3.36. RIFTEncodingTIREPacket Registry | |||
This work is a product of a list of individuals which are all to be | This registry has the following initial values. | |||
considered major contributors independent of the fact whether their | ||||
name made it to the limited boilerplate author's list or not. | ||||
+======================+===+==================+===+================+ | +=======+==========+=====================+=============+=========+ | |||
+======================+===+==================+===+================+ | | Value | Name | Min. Schema Version | Max. Schema | Comment | | |||
| Tony Przygienda, Ed. | | | | | | Pascal Thubert | | | | | | Version | | | |||
+----------------------+---+------------------+---+----------------+ | +=======+==========+=====================+=============+=========+ | |||
| Juniper | | | | | | Cisco | | | 0 | Reserved | 8.0 | All | | | |||
+----------------------+---+------------------+---+----------------+ | | | | | Versions | | | |||
| Bruno Rijsman | | | Jordan Head, Ed. | | | Dmitry | | +-------+----------+---------------------+-------------+---------+ | |||
| | | | | Afanasiev | | | 1 | headers | 8.0 | | | | |||
+----------------------+---+------------------+---+----------------+ | +-------+----------+---------------------+-------------+---------+ | |||
| Individual | | | Juniper | | | Individual | | ||||
+----------------------+---+------------------+---+----------------+ | ||||
| Don Fedyk | | | Alia Atlas | | | John Drake | | ||||
+----------------------+---+------------------+---+----------------+ | ||||
| LabN | | | Individual | | | Individual | | ||||
+----------------------+---+------------------+---+----------------+ | ||||
| Ilya Vershkov | | | | | | | | | | ||||
+----------------------+---+------------------+---+----------------+ | ||||
| NVidia | | | | | | | | | | ||||
+----------------------+---+------------------+---+----------------+ | ||||
Table 44: RIFT Authors | Table 43: TIRE Packet | |||
13. References | 11. References | |||
13.1. Normative References | 11.1. Normative References | |||
[EUI64] IEEE, "Guidelines for Use of Extended Unique Identifier | [EUI64] IEEE, "Guidelines for Use of Extended Unique Identifier | |||
(EUI), Organizationally Unique Identifier (OUI), and | (EUI), Organizationally Unique Identifier (OUI), and | |||
Company ID (CID)", IEEE EUI, | Company ID (CID)", <https://standards-support.ieee.org/hc/ | |||
<http://standards.ieee.org/develop/regauth/tut/eui.pdf>. | en-us/articles/4888705676564-Guidelines-for-Use-of- | |||
Extended-Unique-Identifier-EUI-Organizationally-Unique- | ||||
Identifier-OUI-and-Company-ID-CID>. | ||||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC2365] Meyer, D., "Administratively Scoped IP Multicast", BCP 23, | [RFC2365] Meyer, D., "Administratively Scoped IP Multicast", BCP 23, | |||
RFC 2365, DOI 10.17487/RFC2365, July 1998, | RFC 2365, DOI 10.17487/RFC2365, July 1998, | |||
<https://www.rfc-editor.org/info/rfc2365>. | <https://www.rfc-editor.org/info/rfc2365>. | |||
skipping to change at page 180, line 21 ¶ | skipping to change at line 7829 ¶ | |||
[RFC9300] Farinacci, D., Fuller, V., Meyer, D., Lewis, D., and A. | [RFC9300] Farinacci, D., Fuller, V., Meyer, D., Lewis, D., and A. | |||
Cabellos, Ed., "The Locator/ID Separation Protocol | Cabellos, Ed., "The Locator/ID Separation Protocol | |||
(LISP)", RFC 9300, DOI 10.17487/RFC9300, October 2022, | (LISP)", RFC 9300, DOI 10.17487/RFC9300, October 2022, | |||
<https://www.rfc-editor.org/info/rfc9300>. | <https://www.rfc-editor.org/info/rfc9300>. | |||
[RFC9301] Farinacci, D., Maino, F., Fuller, V., and A. Cabellos, | [RFC9301] Farinacci, D., Maino, F., Fuller, V., and A. Cabellos, | |||
Ed., "Locator/ID Separation Protocol (LISP) Control | Ed., "Locator/ID Separation Protocol (LISP) Control | |||
Plane", RFC 9301, DOI 10.17487/RFC9301, October 2022, | Plane", RFC 9301, DOI 10.17487/RFC9301, October 2022, | |||
<https://www.rfc-editor.org/info/rfc9301>. | <https://www.rfc-editor.org/info/rfc9301>. | |||
[SHA-2] National Institute of Standards and Technology, "Secure | [SHA-2] NIST, "Secure Hash Standard (SHS)", FIPS PUB 180-4, | |||
Hash Standard, FIPS PUB 180-3", 2008. | DOI 10.6028/NIST.FIPS.180-4, July 2015, | |||
<https://csrc.nist.gov/pubs/fips/180-4/upd1/final>. | ||||
[thrift] Apache Software Foundation, "Thrift Language | [thrift] Apache Software Foundation, "Apache Thrift Documentation", | |||
Implementation and Documentation", | <https://thrift.apache.org/docs/>. | |||
<https://github.com/apache/thrift/tree/0.15.0/doc>. | ||||
13.2. Informative References | 11.2. Informative References | |||
[APPLICABILITY] | [APPLICABILITY] | |||
Wei, Y., Zhang, Z., Afanasiev, D., Thubert, P., and T. | Wei, Y., Zhang, Z., Afanasiev, D., Thubert, P., and T. | |||
Przygienda, "RIFT Applicability", Work in Progress, | Przygienda, "RIFT Applicability and Operational | |||
Internet-Draft, draft-ietf-rift-applicability-15, 13 May | Considerations", Work in Progress, Internet-Draft, draft- | |||
2024, <https://datatracker.ietf.org/doc/html/draft-ietf- | ietf-rift-applicability-17, 17 June 2024, | |||
rift-applicability-15>. | <https://datatracker.ietf.org/doc/html/draft-ietf-rift- | |||
applicability-17>. | ||||
[CLOS] Yuan, X., "On Nonblocking Folded-Clos Networks in Computer | [CLOS] Yuan, X., "On Nonblocking Folded-Clos Networks in Computer | |||
Communication Environments", IEEE International Parallel & | Communication Environments", 2011 IEEE International | |||
Distributed Processing Symposium, 2011. | Parallel & Distributed Processing Symposium, | |||
DOI 10.1109/IPDPS.2011.27, 2011, | ||||
<https://ieeexplore.ieee.org/document/6012836>. | ||||
[DayOne] Aelmans, M., Vandezande, O., Rijsman, B., Head, J., Graf, | [DayOne] Aelmans, M., Vandezande, O., Rijsman, B., Head, J., Graf, | |||
C., Alberro, L., Mali, H., and O. Steudler, "Day One: | C., Alberro, L., Mali, H., and O. Steudler, "Day One: | |||
Routing in Fat Trees (RIFT)", Juniper DayOne . | Routing in Fat Trees (RIFT)", Juniper Network Books, | |||
ISBN 978-1-7363160-0-9, December 2020. | ||||
[DIJKSTRA] Dijkstra, E. W., "A Note on Two Problems in Connexion with | [DIJKSTRA] Dijkstra, E. W., "A Note on Two Problems in Connexion with | |||
Graphs", Journal Numer. Math. , 1959. | Graphs", Numerische Mathematik, vol. 1, pp. 269-271, | |||
DOI 10.1007/BF01386390, December 1959, | ||||
<https://link.springer.com/article/10.1007/BF01386390>. | ||||
[DYNAMO] De Candia et al., G., "Dynamo: amazon's highly available | [DYNAMO] De Candia, G., Hastorun, D., Jampani, M., Kakulpati, G., | |||
key-value store", ACM SIGOPS symposium on Operating | Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, | |||
systems principles (SOSP '07), 2007. | P., and W. Vogels, "Dynamo: amazon's highly available key- | |||
value store", ACM SIGOPS Operating Systems Review, vol. | ||||
41, no. 6, pp. 205-220, DOI 10.1145/1323293.1294281, 2007, | ||||
<https://dl.acm.org/doi/10.1145/1323293.1294281>. | ||||
[EPPSTEIN] Eppstein, D., "Finding the k-Shortest Paths", 1997. | [EPPSTEIN] Eppstein, D., "Finding the k Shortest Paths", 1997, | |||
<https://ics.uci.edu/~eppstein/pubs/Epp-SJC-98.pdf>. | ||||
[FATTREE] Leiserson, C. E., "Fat-Trees: Universal Networks for | [FATTREE] Leiserson, C. E., "Fat-Trees: Universal Networks for | |||
Hardware-Efficient Supercomputing", 1985. | Hardware-Efficient Supercomputing", IEEE Transactions on | |||
Computers, vol. C-34, no. 10, pp. 892-901, | ||||
DOI 10.1109/TC.1985.6312192, October 1985, | ||||
<https://ieeexplore.ieee.org/document/6312192>. | ||||
[IEEEstd1588] | [IEEEstd1588] | |||
IEEE, "IEEE Standard for a Precision Clock Synchronization | IEEE, "IEEE Standard for a Precision Clock Synchronization | |||
Protocol for Networked Measurement and Control Systems", | Protocol for Networked Measurement and Control Systems", | |||
IEEE Standard 1588, | IEEE Std 1588-2008, DOI 10.1109/IEEESTD.2008.4579760, July | |||
<https://ieeexplore.ieee.org/document/4579760/>. | 2008, <https://ieeexplore.ieee.org/document/4579760/>. | |||
[IEEEstd8021AS] | [IEEEstd8021AS] | |||
IEEE, "IEEE Standard for Local and Metropolitan Area | IEEE, "IEEE Standard for Local and Metropolitan Area | |||
Networks - Timing and Synchronization for Time-Sensitive | Networks - Timing and Synchronization for Time-Sensitive | |||
Applications in Bridged Local Area Networks", | Applications in Bridged Local Area Networks", IEEE Std | |||
IEEE Standard 802.1AS, | 802.1AS-2011, DOI 10.1109/IEEESTD.2011.5741898, March | |||
<https://ieeexplore.ieee.org/document/5741898/>. | 2011, <https://ieeexplore.ieee.org/document/5741898/>. | |||
[RFC0826] Plummer, D., "An Ethernet Address Resolution Protocol: Or | [RFC0826] Plummer, D., "An Ethernet Address Resolution Protocol: Or | |||
Converting Network Protocol Addresses to 48.bit Ethernet | Converting Network Protocol Addresses to 48.bit Ethernet | |||
Address for Transmission on Ethernet Hardware", STD 37, | Address for Transmission on Ethernet Hardware", STD 37, | |||
RFC 826, DOI 10.17487/RFC0826, November 1982, | RFC 826, DOI 10.17487/RFC0826, November 1982, | |||
<https://www.rfc-editor.org/info/rfc826>. | <https://www.rfc-editor.org/info/rfc826>. | |||
[RFC1982] Elz, R. and R. Bush, "Serial Number Arithmetic", RFC 1982, | [RFC1982] Elz, R. and R. Bush, "Serial Number Arithmetic", RFC 1982, | |||
DOI 10.17487/RFC1982, August 1996, | DOI 10.17487/RFC1982, August 1996, | |||
<https://www.rfc-editor.org/info/rfc1982>. | <https://www.rfc-editor.org/info/rfc1982>. | |||
skipping to change at page 182, line 20 ¶ | skipping to change at line 7936 ¶ | |||
[RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman, | [RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman, | |||
"Neighbor Discovery for IP version 6 (IPv6)", RFC 4861, | "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861, | |||
DOI 10.17487/RFC4861, September 2007, | DOI 10.17487/RFC4861, September 2007, | |||
<https://www.rfc-editor.org/info/rfc4861>. | <https://www.rfc-editor.org/info/rfc4861>. | |||
[RFC4862] Thomson, S., Narten, T., and T. Jinmei, "IPv6 Stateless | [RFC4862] Thomson, S., Narten, T., and T. Jinmei, "IPv6 Stateless | |||
Address Autoconfiguration", RFC 4862, | Address Autoconfiguration", RFC 4862, | |||
DOI 10.17487/RFC4862, September 2007, | DOI 10.17487/RFC4862, September 2007, | |||
<https://www.rfc-editor.org/info/rfc4862>. | <https://www.rfc-editor.org/info/rfc4862>. | |||
[RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an | ||||
IANA Considerations Section in RFCs", RFC 5226, | ||||
DOI 10.17487/RFC5226, May 2008, | ||||
<https://www.rfc-editor.org/info/rfc5226>. | ||||
[RFC5837] Atlas, A., Ed., Bonica, R., Ed., Pignataro, C., Ed., Shen, | [RFC5837] Atlas, A., Ed., Bonica, R., Ed., Pignataro, C., Ed., Shen, | |||
N., and JR. Rivers, "Extending ICMP for Interface and | N., and JR. Rivers, "Extending ICMP for Interface and | |||
Next-Hop Identification", RFC 5837, DOI 10.17487/RFC5837, | Next-Hop Identification", RFC 5837, DOI 10.17487/RFC5837, | |||
April 2010, <https://www.rfc-editor.org/info/rfc5837>. | April 2010, <https://www.rfc-editor.org/info/rfc5837>. | |||
[RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection | [RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection | |||
(BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010, | (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010, | |||
<https://www.rfc-editor.org/info/rfc5880>. | <https://www.rfc-editor.org/info/rfc5880>. | |||
[RFC6550] Winter, T., Ed., Thubert, P., Ed., Brandt, A., Hui, J., | [RFC6550] Winter, T., Ed., Thubert, P., Ed., Brandt, A., Hui, J., | |||
Kelsey, R., Levis, P., Pister, K., Struik, R., Vasseur, | Kelsey, R., Levis, P., Pister, K., Struik, R., Vasseur, | |||
JP., and R. Alexander, "RPL: IPv6 Routing Protocol for | JP., and R. Alexander, "RPL: IPv6 Routing Protocol for | |||
Low-Power and Lossy Networks", RFC 6550, | Low-Power and Lossy Networks", RFC 6550, | |||
DOI 10.17487/RFC6550, March 2012, | DOI 10.17487/RFC6550, March 2012, | |||
<https://www.rfc-editor.org/info/rfc6550>. | <https://www.rfc-editor.org/info/rfc6550>. | |||
[RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for | ||||
Writing an IANA Considerations Section in RFCs", BCP 26, | ||||
RFC 8126, DOI 10.17487/RFC8126, June 2017, | ||||
<https://www.rfc-editor.org/info/rfc8126>. | ||||
[RFC8415] Mrugalski, T., Siodelski, M., Volz, B., Yourtchenko, A., | [RFC8415] Mrugalski, T., Siodelski, M., Volz, B., Yourtchenko, A., | |||
Richardson, M., Jiang, S., Lemon, T., and T. Winters, | Richardson, M., Jiang, S., Lemon, T., and T. Winters, | |||
"Dynamic Host Configuration Protocol for IPv6 (DHCPv6)", | "Dynamic Host Configuration Protocol for IPv6 (DHCPv6)", | |||
RFC 8415, DOI 10.17487/RFC8415, November 2018, | RFC 8415, DOI 10.17487/RFC8415, November 2018, | |||
<https://www.rfc-editor.org/info/rfc8415>. | <https://www.rfc-editor.org/info/rfc8415>. | |||
[VAHDAT08] Al-Fares, M., Loukissas, A., and A. Vahdat, "A Scalable, | [VAHDAT08] Al-Fares, M., Loukissas, A., and A. Vahdat, "A Scalable, | |||
Commodity Data Center Network Architecture", SIGCOMM , | Commodity Data Center Network Architecture", ACM SIGCOMM | |||
2008. | Computer Communication Review, vol. 38, no. 4, pp. 63-74, | |||
DOI 10.1145/1402946.1402967, August 2008, | ||||
<https://dl.acm.org/doi/10.1145/1402946.1402967>. | ||||
[VFR] Giotsas, V. and S. Zhou, "Valley-free violation in | [VFR] Giotsas, V. and S. Zhou, "Valley-free violation in | |||
Internet routing - Analysis based on BGP Community data", | Internet routing - Analysis based on BGP Community data", | |||
2012 IEEE International Conference on Communications | 2012 IEEE International Conference on Communications | |||
(ICC) , 2012. | (ICC), DOI 10.1109/ICC.2012.6363987, 2012, | |||
<https://ieeexplore.ieee.org/document/6363987>. | ||||
Appendix A. Sequence Number Binary Arithmetic | Appendix A. Sequence Number Binary Arithmetic | |||
This section defines a variant of sequence number arithmetic related | This section defines a variant of sequence number arithmetic related | |||
to [RFC1982] explained over two complement arithmetic which is easy | to [RFC1982] explained over two complement arithmetic, which is easy | |||
to implement. | to implement. | |||
Assuming straight two complement's subtractions on the bit-width of | Assuming straight two complement's subtractions on the bit width of | |||
the sequence numbers, the corresponding >: and =: relations are | the sequence numbers, the corresponding >: and =: relations are | |||
defined as: | defined as: | |||
U_1, U_2 are 12-bits aligned unsigned version number | * U_1, U_2 are 12-bits aligned unsigned version number | |||
D_f is ( U_1 - U_2 ) interpreted as two complement signed 12-bits | * D_f is ( U_1 - U_2 ) interpreted as two complement signed 12-bits | |||
D_b is ( U_2 - U_1 ) interpreted as two complement signed 12-bits | ||||
U_1 >: U_2 IIF D_f > 0 *and* D_b < 0 | * D_b is ( U_2 - U_1 ) interpreted as two complement signed 12-bits | |||
U_1 =: U_2 IIF D_f = 0 | ||||
* U_1 >: U_2 IIF D_f > 0 *and* D_b < 0 | ||||
* U_1 =: U_2 IIF D_f = 0 | ||||
The >: relationship is anti-symmetric but not transitive. Observe | The >: relationship is anti-symmetric but not transitive. Observe | |||
that this leaves >: of the numbers having maximum two complement | that this leaves >: of the numbers having maximum two complement | |||
distance, e.g. ( 0 and 0x800 ) undefined in the 12-bits case since | distance, e.g., ( 0 and 0x800 ) undefined in the 12-bits case since | |||
D_f and D_b are both -0x800. | D_f and D_b are both -0x800. | |||
A simple example of the relationship in case of 3-bit arithmetic | A simple example of the relationship in case of 3-bit arithmetic | |||
follows as table indicating D_f/D_b values and then the relationship | follows as table indicating D_f/D_b values and then the relationship | |||
of U_1 to U_2: | of U_1 to U_2: | |||
U2 / U1 0 1 2 3 4 5 6 7 | +=========+=====+=====+=====+=====+=====+=====+=====+=====+ | |||
0 +/+ +/- +/- +/- -/- -/+ -/+ -/+ | | U2 / U1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | | |||
1 -/+ +/+ +/- +/- +/- -/- -/+ -/+ | +=========+=====+=====+=====+=====+=====+=====+=====+=====+ | |||
2 -/+ -/+ +/+ +/- +/- +/- -/- -/+ | | 0 | +/+ | +/- | +/- | +/- | -/- | -/+ | -/+ | -/+ | | |||
3 -/+ -/+ -/+ +/+ +/- +/- +/- -/- | +---------+-----+-----+-----+-----+-----+-----+-----+-----+ | |||
4 -/- -/+ -/+ -/+ +/+ +/- +/- +/- | | 1 | -/+ | +/+ | +/- | +/- | +/- | -/- | -/+ | -/+ | | |||
5 +/- -/- -/+ -/+ -/+ +/+ +/- +/- | +---------+-----+-----+-----+-----+-----+-----+-----+-----+ | |||
6 +/- +/- -/- -/+ -/+ -/+ +/+ +/- | | 2 | -/+ | -/+ | +/+ | +/- | +/- | +/- | -/- | -/+ | | |||
7 +/- +/- +/- -/- -/+ -/+ -/+ +/+ | +---------+-----+-----+-----+-----+-----+-----+-----+-----+ | |||
U2 / U1 0 1 2 3 4 5 6 7 | | 3 | -/+ | -/+ | -/+ | +/+ | +/- | +/- | +/- | -/- | | |||
0 = > > > ? < < < | +---------+-----+-----+-----+-----+-----+-----+-----+-----+ | |||
1 < = > > > ? < < | | 4 | -/- | -/+ | -/+ | -/+ | +/+ | +/- | +/- | +/- | | |||
2 < < = > > > ? < | +---------+-----+-----+-----+-----+-----+-----+-----+-----+ | |||
3 < < < = > > > ? | | 5 | +/- | -/- | -/+ | -/+ | -/+ | +/+ | +/- | +/- | | |||
4 ? < < < = > > > | +---------+-----+-----+-----+-----+-----+-----+-----+-----+ | |||
5 > ? < < < = > > | | 6 | +/- | +/- | -/- | -/+ | -/+ | -/+ | +/+ | +/- | | |||
6 > > ? < < < = > | +---------+-----+-----+-----+-----+-----+-----+-----+-----+ | |||
7 > > > ? < < < = | | 7 | +/- | +/- | +/- | -/- | -/+ | -/+ | -/+ | +/+ | | |||
+---------+-----+-----+-----+-----+-----+-----+-----+-----+ | ||||
Table 44 | ||||
+=========+===+===+===+===+===+===+===+===+ | ||||
| U2 / U1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | | ||||
+=========+===+===+===+===+===+===+===+===+ | ||||
| 0 | = | > | > | > | ? | < | < | < | | ||||
+---------+---+---+---+---+---+---+---+---+ | ||||
| 1 | < | = | > | > | > | ? | < | < | | ||||
+---------+---+---+---+---+---+---+---+---+ | ||||
| 2 | < | < | = | > | > | > | ? | < | | ||||
+---------+---+---+---+---+---+---+---+---+ | ||||
| 3 | < | < | < | = | > | > | > | ? | | ||||
+---------+---+---+---+---+---+---+---+---+ | ||||
| 4 | ? | < | < | < | = | > | > | > | | ||||
+---------+---+---+---+---+---+---+---+---+ | ||||
| 5 | > | ? | < | < | < | = | > | > | | ||||
+---------+---+---+---+---+---+---+---+---+ | ||||
| 6 | > | > | ? | < | < | < | = | > | | ||||
+---------+---+---+---+---+---+---+---+---+ | ||||
| 7 | > | > | > | ? | < | < | < | = | | ||||
+---------+---+---+---+---+---+---+---+---+ | ||||
Table 45 | ||||
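The relations defined in this appendix can be sketched in a few lines of Python. This is a minimal illustration rather than part of the specification: the helper names `to_signed`, `compare`, and `relation_table`, and the `bits` parameter, are assumptions introduced here. With `bits=12` the sketch follows the 12-bit definitions, and with `bits=3` it reproduces the relation table above (rows indexed by U2, columns by U1).

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def compare(u1, u2, bits=12):
    """Return '>', '<', '=' or '?' for u1 relative to u2.

    '?' marks a pair at maximum distance, where neither >: nor its
    converse holds.
    """
    d_f = to_signed(u1 - u2, bits)  # forward difference
    d_b = to_signed(u2 - u1, bits)  # backward difference
    if d_f == 0:
        return "="
    if d_f > 0 and d_b < 0:
        return ">"
    if d_b > 0 and d_f < 0:
        return "<"
    return "?"


def relation_table(bits=3):
    """Build the relation table: rows are indexed by U2, columns by U1."""
    size = 1 << bits
    return [[compare(u1, u2, bits) for u1 in range(size)] for u2 in range(size)]


# Print the 3-bit relation table, one row per U2 value.
for u2, row in enumerate(relation_table(3)):
    print(u2, " ".join(row))
```

The same function also shows why the relation is not transitive: in the 12-bit case, `compare(0x400, 0)`, `compare(0x800, 0x400)`, and `compare(0xC00, 0x800)` each yield `'>'`, yet `compare(0xC00, 0)` yields `'<'`.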
Appendix B. Examples | Appendix B. Examples | |||
B.1. Normal Operation | B.1. Normal Operation | |||
^ N +--------+ +--------+ | ^ N +--------+ +--------+ | |||
Level 2 | |ToF 21| |ToF 22| | Level 2 | |ToF 21| |ToF 22| | |||
E <-*-> W ++-+--+-++ ++-+--+-++ | E <-*-> W ++-+--+-++ ++-+--+-++ | |||
| | | | | | | | | | | | | | | | | | | | |||
S v P111/2 |P121/2 | | | | | S v P111/2 |P121/2 | | | | | |||
skipping to change at page 184, line 48 ¶ | skipping to change at line 8082 ¶ | |||
| +---0/0--->-----+ 0/0 | +----------------+ | | | +---0/0--->-----+ 0/0 | +----------------+ | | |||
0/0 | | | | | | | | 0/0 | | | | | | | | |||
| +---<-0/0-----+ | v | +--------------+ | | | | +---<-0/0-----+ | v | +--------------+ | | | |||
v | | | | | | | | v | | | | | | | | |||
+-+---+-+ +--+--+-+ +-+---+-+ +---+-+-+ | +-+---+-+ +--+--+-+ +-+---+-+ +---+-+-+ | |||
Level 0 | | | | | | | | | Level 0 | | | | | | | | | |||
|Leaf111| |Leaf112| |Leaf121| |Leaf122| | |Leaf111| |Leaf112| |Leaf121| |Leaf122| | |||
+-+-----+ +-+---+-+ +--+--+-+ +-+-----+ | +-+-----+ +-+---+-+ +--+--+-+ +-+-----+ | |||
+ + \ / + + | + + \ / + + | |||
Prefix111 Prefix112 \ / Prefix121 Prefix122 | Prefix111 Prefix112 \ / Prefix121 Prefix122 | |||
multi-homed | multihomed | |||
Prefix | Prefix | |||
+---------- PoD 1 ---------+ +---------- PoD 2 ---------+ | +---------- PoD 1 ---------+ +---------- PoD 2 ---------+ | |||
Figure 35: Normal Case Topology | Figure 35: Normal Case Topology | |||
This section describes RIFT deployment in the example topology given | This section describes RIFT deployment in the example topology given | |||
in Figure 35 without any node or link failures. The scenario | in Figure 35 without any node or link failures. The scenario | |||
disregards flooding reduction for simplicity's sake and compresses | disregards flooding reduction for simplicity's sake and compresses | |||
the node names in some cases to fit them into the picture better. | the node names in some cases to fit them into the picture better. | |||
First, the following bi-directional adjacencies will be established: | First, the following bidirectional adjacencies will be established: | |||
1. ToF 21 (PoD 0) to Spine 111, Spine 112, Spine 121, and Spine 122 | 1. ToF 21 (PoD 0) to Spine 111, Spine 112, Spine 121, and Spine 122 | |||
2. ToF 22 (PoD 0) to Spine 111, Spine 112, Spine 121, and Spine 122 | 2. ToF 22 (PoD 0) to Spine 111, Spine 112, Spine 121, and Spine 122 | |||
3. Spine 111 to Leaf 111, Leaf 112 | 3. Spine 111 to Leaf 111 and Leaf 112 | |||
4. Spine 112 to Leaf 111, Leaf 112 | 4. Spine 112 to Leaf 111 and Leaf 112 | |||
5. Spine 121 to Leaf 121, Leaf 122 | 5. Spine 121 to Leaf 121 and Leaf 122 | |||
6. Spine 122 to Leaf 121, Leaf 122 | 6. Spine 122 to Leaf 121 and Leaf 122 | |||
Leaf 111 and Leaf 112 originate N-TIEs for Prefix 111 and Prefix 112 | Leaf 111 and Leaf 112 originate N-TIEs for Prefix 111 and Prefix 112 | |||
(respectively) to both Spine 111 and Spine 112 (Leaf 112 also | (respectively) to both Spine 111 and Spine 112 (Leaf 112 also | |||
originates an N-TIE for the multi-homed prefix). Spine 111 and Spine | originates an N-TIE for the multihomed prefix). Spine 111 and Spine | |||
112 will then originate their own N-TIEs, as well as flood the N-TIEs | 112 will then originate their own N-TIEs, as well as flood the N-TIEs | |||
received from Leaf 111 and Leaf 112 to both ToF 21 and ToF 22. | received from Leaf 111 and Leaf 112 to both ToF 21 and ToF 22. | |||
Similarly, Leaf 121 and Leaf 122 originate North TIEs for Prefix 121 | Similarly, Leaf 121 and Leaf 122 originate North TIEs for Prefix 121 | |||
and Prefix 122 (respectively) to Spine 121 and Spine 122 (Leaf 121 | and Prefix 122 (respectively) to Spine 121 and Spine 122 (Leaf 121 | |||
also originates a North TIE for the multi-homed prefix). Spine 121 | also originates a North TIE for the multihomed prefix). Spine 121 | |||
and Spine 122 will then originate their own North TIEs, as well as | and Spine 122 will then originate their own North TIEs, as well as | |||
flood the North TIEs received from Leaf 121 and Leaf 122 to both ToF | flood the North TIEs received from Leaf 121 and Leaf 122 to both ToF | |||
21 and ToF 22. | 21 and ToF 22. | |||
Spines hold only North TIEs of level 0 for their PoD, while leaves | Spines hold only North TIEs of level 0 for their PoD, while leaves | |||
only hold their own North TIEs while, at this point, both ToF 21 and | only hold their own North TIEs while, at this point, both ToF 21 and | |||
ToF 22 (as well as any northbound connected controllers) would have | ToF 22 (as well as any northbound connected controllers) would have | |||
the complete network topology. | the complete network topology. | |||
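The resulting database contents can be checked with a small simulation. This is a simplified sketch under stated assumptions: it models only the rule that North TIEs are flooded northbound, ignores South TIEs and flooding reduction, and treats the ToF nodes as originating no North TIEs. The `NORTHBOUND` adjacency map and the `north_tie_database` helper are hypothetical names used for illustration, not RIFT data structures.

```python
# Northbound adjacencies of the example topology (node -> nodes one level up).
NORTHBOUND = {
    "Leaf 111": ["Spine 111", "Spine 112"],
    "Leaf 112": ["Spine 111", "Spine 112"],
    "Leaf 121": ["Spine 121", "Spine 122"],
    "Leaf 122": ["Spine 121", "Spine 122"],
    "Spine 111": ["ToF 21", "ToF 22"],
    "Spine 112": ["ToF 21", "ToF 22"],
    "Spine 121": ["ToF 21", "ToF 22"],
    "Spine 122": ["ToF 21", "ToF 22"],
    "ToF 21": [],
    "ToF 22": [],
}


def north_tie_database(northbound):
    """Return, per node, the set of originators whose North TIEs it holds."""
    # Assumption of this sketch: only nodes with a northbound adjacency
    # originate North TIEs, and each such node holds its own.
    database = {
        node: ({node} if parents else set())
        for node, parents in northbound.items()
    }
    changed = True
    while changed:  # repeat until no node learns anything new
        changed = False
        for node, parents in northbound.items():
            for parent in parents:
                missing = database[node] - database[parent]
                if missing:
                    database[parent] |= missing  # flood one level north
                    changed = True
    return database


for node, held in north_tie_database(NORTHBOUND).items():
    print(node, "holds North TIEs from", sorted(held))
```

In this model each leaf ends up with only its own entry, each spine with its own entry plus the two leaves of its PoD, and both ToF nodes with the entries of all four spines and all four leaves.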
ToF 21 and ToF 22 would then originate and flood South TIEs | ToF 21 and ToF 22 would then originate and flood South TIEs | |||
containing any established adjacencies and a default IP route to all | containing any established adjacencies and a default IP route to all | |||
spines. Spine 111, Spine 112, Spine 121, and Spine 122 will reflect | spines. Spine 111, Spine 112, Spine 121, and Spine 122 will reflect | |||
all Node South TIEs received from ToF 21 to ToF 22, and all Node | all South Node TIEs received from ToF 21 to ToF 22 and all South Node | |||
South TIEs from ToF 22 to ToF 21. South TIEs will not be re- | TIEs from ToF 22 to ToF 21. South TIEs will not be re-propagated | |||
propagated southbound. | southbound. | |||
South TIEs containing a default IP route are then originated by both | South TIEs containing a default IP route are then originated by both | |||
Spine 111 and Spine 112 toward Leaf 111 and Leaf 112. Similarly, | Spine 111 and Spine 112 towards Leaf 111 and Leaf 112. Similarly, | |||
South TIEs containing a default IP route are originated by Spine 121 | South TIEs containing a default IP route are originated by Spine 121 | |||
and Spine 122 toward Leaf 121 and Leaf 122. | and Spine 122 towards Leaf 121 and Leaf 122. | |||
At this point IP connectivity across maximum number of viable paths | At this point, IP connectivity across the maximum number of viable | |||
has been established for all leaves, with routing information | paths has been established for all leaves, with routing information | |||
constrained to only the minimum amount that allows for normal | constrained to only the minimum amount that allows for normal | |||
operation and redundancy. | operation and redundancy. | |||
B.2. Leaf Link Failure | B.2. Leaf Link Failure | |||
| | | | | | | | | | |||
+-+---+-+ +-+---+-+ | +-+---+-+ +-+---+-+ | |||
| | | | | | | | | | |||
|Spin111| |Spin112| | |Spin111| |Spin112| | |||
+-+---+-+ ++----+-+ | +-+---+-+ ++----+-+ | |||
skipping to change at page 186, line 40 ¶ | skipping to change at line 8167 ¶ | |||
+-------+ +-------+ | +-------+ +-------+ | |||
+ + | + + | |||
Prefix111 Prefix112 | Prefix111 Prefix112 | |||
Figure 36: Single Leaf Link Failure | Figure 36: Single Leaf Link Failure | |||
In the event of a link failure between Spine 112 and Leaf 112, both | In the event of a link failure between Spine 112 and Leaf 112, both | |||
nodes will originate new Node TIEs that contain their connected | nodes will originate new Node TIEs that contain their connected | |||
adjacencies, except for the one that just failed. Leaf 112 will send | adjacencies, except for the one that just failed. Leaf 112 will send | |||
a North Node TIE to Spine 111. Spine 112 will send a North Node TIE | a North Node TIE to Spine 111. Spine 112 will send a North Node TIE | |||
to ToF 21 and ToF 22 as well as a new Node South TIE to Leaf 111 that | to ToF 21 and ToF 22 as well as a new South Node TIE to Leaf 111 that | |||
will be reflected to Spine 111. Necessary SPF recomputation will | will be reflected to Spine 111. Necessary SPF recomputation will | |||
occur, resulting in Spine 112 no longer being in the forwarding path | occur, resulting in Spine 112 no longer being in the forwarding path | |||
for Prefix 112. | for Prefix 112. | |||
Spine 111 will also disaggregate Prefix 112 by sending new Prefix | Spine 111 will also disaggregate Prefix 112 by sending new South | |||
South TIE to Leaf 111 and Leaf 112. Though disaggregation is covered | Prefix TIE to Leaf 111 and Leaf 112. Though disaggregation is | |||
in more detail in the following section, it is worth mentioning in | covered in more detail in the following section, it is worth | |||
this example as it further illustrates RIFT's mechanism to mitigate | mentioning in this example as it further illustrates RIFT's mechanism | |||
traffic loss. Consider that Leaf 111 has yet to receive the more | to mitigate traffic loss. Consider that Leaf 111 has yet to receive | |||
specific (disaggregated) route from Spine 111. In such a scenario, | the more specific (disaggregated) route from Spine 111. In such a | |||
traffic from Leaf 111 toward Prefix 112 may still use Spine 112's | scenario, traffic from Leaf 111 towards Prefix 112 may still use | |||
default route, causing it to traverse ToF 21 and ToF 22 back down via | Spine 112's default route, causing it to traverse ToF 21 and ToF 22 | |||
Spine 111. While this behavior is suboptimal, it is transient in | back down via Spine 111. While this behavior is suboptimal, it is | |||
nature and preferred to dropping traffic. | transient in nature and preferred to dropping traffic. | |||
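The effect of the disaggregated route on Leaf 111 can be illustrated with a longest-prefix-match lookup. The sketch below is only an illustration: this appendix does not give an address for Prefix 112, so `198.51.100.0/24` is a hypothetical stand-in, and the `lookup` helper and the two route tables are assumptions made here rather than RIFT data structures.

```python
import ipaddress


def lookup(routes, destination):
    """Return the next hops of the longest prefix in `routes` covering destination."""
    address = ipaddress.ip_address(destination)
    matches = [network for network in routes if address in network]
    if not matches:
        return set()
    best = max(matches, key=lambda network: network.prefixlen)
    return routes[best]


DEFAULT = ipaddress.ip_network("0.0.0.0/0")
PREFIX_112 = ipaddress.ip_network("198.51.100.0/24")  # hypothetical address

# Leaf 111 before the more specific route from Spine 111 arrives:
# only the default routes from both spines are present.
before = {DEFAULT: {"Spine 111", "Spine 112"}}

# Leaf 111 after receiving the disaggregated prefix from Spine 111.
after = {
    DEFAULT: {"Spine 111", "Spine 112"},
    PREFIX_112: {"Spine 111"},
}

print(sorted(lookup(before, "198.51.100.7")))
print(sorted(lookup(after, "198.51.100.7")))
```

Before the update, traffic for the hypothetical Prefix 112 address may be hashed onto either spine; afterwards the more specific entry restricts it to Spine 111, while all other destinations keep using both default routes.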
B.3. Partitioned Fabric | B.3. Partitioned Fabric | |||
+--------+ +--------+ | +--------+ +--------+ | |||
Level 2 |ToF 21| |ToF 22| | Level 2 |ToF 21| |ToF 22| | |||
++-+--+-++ ++-+--+-++ | ++-+--+-++ ++-+--+-++ | |||
| | | | | | | | | | | | | | | | | | |||
| | | | | | | 0/0 | | | | | | | | 0/0 | |||
| | | | | | | | | | | | | | | | | | |||
| | | | | | | | | | | | | | | | | | |||
skipping to change at page 188, line 5 ¶ | skipping to change at line 8219 ¶ | |||
+-+---+-+ +--+--+-+ +-+---+-+ +---+-+-+ | +-+---+-+ +--+--+-+ +-+---+-+ +---+-+-+ | |||
Level 3 | | | | | | | | | Level 3 | | | | | | | | | |||
|Leaf111| |Leaf112| |Leaf121| |Leaf122| | |Leaf111| |Leaf112| |Leaf121| |Leaf122| | |||
+-+-----+ ++------+ +-----+-+ +-+-----+ | +-+-----+ ++------+ +-----+-+ +-+-----+ | |||
+ + + + | + + + + | |||
Prefix111 Prefix112 Prefix121 Prefix122 | Prefix111 Prefix112 Prefix121 Prefix122 | |||
1.1/16 | 1.1/16 | |||
Figure 37: Fabric Partition | Figure 37: Fabric Partition | |||
Figure 37 shows one of more catastrophic scenarios where ToF 21 is | Figure 37 shows more catastrophic scenario where ToF 21 is completely | |||
completely severed from access to Prefix 121 due to a double link | severed from access to Prefix 121 due to a double link failure. If | |||
failure. If only default routes existed, this would result in 50% of | only default routes existed, this would result in 50% of traffic from | |||
traffic from Leaf 111 and Leaf 112 toward Prefix 121 being dropped. | Leaf 111 and Leaf 112 towards Prefix 121 being dropped. | |||
The mechanism to resolve this scenario hinges on ToF 21's South TIEs | The mechanism to resolve this scenario hinges on ToF 21's South TIEs | |||
being reflected from Spine 111 and Spine 112 to ToF 22. Once ToF 22 | being reflected from Spine 111 and Spine 112 to ToF 22. Once ToF 22 | |||
is informed that Prefix 121 cannot be reached from ToF 21, it will | is informed that Prefix 121 cannot be reached from ToF 21, it will | |||
begin to disaggregate Prefix 121 by advertising a more specific route | begin to disaggregate Prefix 121 by advertising a more specific route | |||
(1.1/16) along with the default IP prefix route to all spines (ToF 21 | (1.1/16), along with the default IP prefix route to all spines (ToF | |||
still only sends a default route). The result is Spine 111 and | 21 still only sends a default route). The result is Spine 111 and | |||
Spine112 using the more specific route to Prefix 121 via ToF 22. All | Spine 112 using the more specific route to Prefix 121 via ToF 22. | |||
other prefixes continue to use the default IP prefix route toward | All other prefixes continue to use the default IP prefix route | |||
both ToF 21 and ToF 22. | towards both ToF 21 and ToF 22. | |||
The more specific route for Prefix 121 being advertised by ToF 22 | The more specific route for Prefix 121 being advertised by ToF 22 | |||
does not need to be propagated further south to the leaves, as they | does not need to be propagated further south to the leaves, as they | |||
do not benefit from this information. Spine 111 and Spine 112 are | do not benefit from this information. Spine 111 and Spine 112 are | |||
only required to reflect the new South Node TIEs received from ToF 22 | only required to reflect the new South Node TIEs received from ToF 22 | |||
to ToF 21. In short, only the relevant nodes received the relevant | to ToF 21. In short, only the relevant nodes received the relevant | |||
updates, thereby restricting the failure to only the partitioned | updates, thereby restricting the failure to only the partitioned | |||
level rather than burdening the whole fabric with the flooding and | level rather than burdening the whole fabric with the flooding and | |||
recomputation of the new topology information. | recomputation of the new topology information. | |||
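The decision made by ToF 22 can be sketched as a set computation. This is a simplified illustration under loudly stated assumptions, not the normative procedure of Section 6.5: it assumes that a prefix is disaggregated when some other node at the same level has no southbound adjacency to any of the prefix's next hops, and it assumes that after the double link failure ToF 21 is left with Spine 111 and Spine 112 as its only southbound neighbors. The function name `prefixes_to_disaggregate` and the `PEERS` map are hypothetical; the next-hop sets mirror the |H values listed for this example.

```python
def prefixes_to_disaggregate(next_hops, peer_neighbors):
    """Return prefixes for which some same-level peer reaches none of the next hops.

    next_hops:      prefix -> set of southbound next hops (|H in Section 6.5 notation)
    peer_neighbors: peer node -> set of its southbound neighbors, as learned
                    from reflected South Node TIEs
    """
    selected = []
    for prefix, hops in next_hops.items():
        # Assumed rule: disaggregate when a peer shares no next hop for the prefix.
        if any(hops.isdisjoint(neighbors) for neighbors in peer_neighbors.values()):
            selected.append(prefix)
    return sorted(selected)


# Next-hop sets as seen by ToF 22 in this example.
H = {
    "Prefix 111": {"Spine 111", "Spine 112"},
    "Prefix 112": {"Spine 111", "Spine 112"},
    "Prefix 121": {"Spine 121", "Spine 122"},
    "Prefix 122": {"Spine 121", "Spine 122"},
}

# Assumed post-failure southbound neighbors of the other ToF node.
PEERS = {"ToF 21": {"Spine 111", "Spine 112"}}

print(prefixes_to_disaggregate(H, PEERS))
```

Under these assumed inputs the function selects only prefixes whose next hops lie in PoD 2, Prefix 121 among them, and leaves the PoD 1 prefixes on the default route.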
To finish this example, the following table shows sets computed by | To finish this example, the following list shows sets computed by ToF | |||
ToF 22 using notation introduced in Section 6.5: | 22 using notation introduced in Section 6.5: | |||
|R = Prefix 111, Prefix 112, Prefix 121, Prefix 122 | |R = Prefix 111, Prefix 112, Prefix 121, Prefix 122 | |||
|H (for r=Prefix 111) = Spine 111, Spine 112 | |H (for r=Prefix 111) = Spine 111, Spine 112 | |||
|H (for r=Prefix 112) = Spine 111, Spine 112 | |H (for r=Prefix 112) = Spine 111, Spine 112 | |||
|H (for r=Prefix 121) = Spine 121, Spine 122 | |H (for r=Prefix 121) = Spine 121, Spine 122 | |||
|H (for r=Prefix 122) = Spine 121, Spine 122 | |H (for r=Prefix 122) = Spine 121, Spine 122 | |||
skipping to change at page 189, line 29 ¶ | skipping to change at line 8289 ¶ | |||
| | | | | | | | | | | | | | | | | | | | |||
++-+-+--+ | +---+---+ | +-+---+-++ | ++-+-+--+ | +---+---+ | +-+---+-++ | |||
| | +-+ +-+ | | | | | +-+ +-+ | | | |||
| L01 | | L02 | | L03 | Level 0 | | L01 | | L02 | | L03 | Level 0 | |||
+-------+ +-------+ +--------+ | +-------+ +-------+ +--------+ | |||
Figure 38: North Partitioned Router | Figure 38: North Partitioned Router | |||
Figure 38 shows a part of a fabric where level 1 is horizontally | Figure 38 shows a part of a fabric where level 1 is horizontally | |||
connected and A01 lost its only northbound adjacency. Based on N-SPF | connected and A01 lost its only northbound adjacency. Based on N-SPF | |||
rules in Section 6.4.1 A01 will compute northbound reachability by | rules in Section 6.4.1, A01 will compute northbound reachability by | |||
using the link A01 to A02. A02 however, will *not* use this link | using the link A01 to A02. However, A02 will *not* use this link | |||
during N-SPF. The result is A01 utilizing the horizontal link for | during N-SPF. The result is A01 utilizing the horizontal link for | |||
default route advertisement and unidirectional routing. | default route advertisement and unidirectional routing. | |||
Furthermore, if A02 also loses its only northbound adjacency (N2), | Furthermore, if A02 also loses its only northbound adjacency (N2), | |||
the situation evolves. A01 will no longer have northbound | the situation evolves. A01 will no longer have northbound | |||
reachability while it receives A03's northbound adjacencies in South | reachability while it receives A03's northbound adjacencies in South | |||
Node TIEs reflected by nodes south of it. As a result, A01 will no | Node TIEs reflected by nodes south of it. As a result, A01 will no | |||
longer advertise its default route in accordance with Section 6.3.8. | longer advertise its default route in accordance with Section 6.3.8. | |||
Acknowledgments | ||||
A new routing protocol in its complexity is not a product of a parent | ||||
but of a village, as the author list already shows. However, many | ||||
more people provided input and fine-combed the specification based on | ||||
their experience in design, implementation, or application of | ||||
protocols in IP fabrics. This section will make an inadequate | ||||
attempt in recording their contribution. | ||||
Many thanks to Naiming Shen for some of the early discussions around | ||||
the topic of using IGPs for routing in topologies related to Clos. | ||||
Russ White is especially acknowledged for the key conversation on | ||||
epistemology that tied the current asynchronous distributed systems | ||||
theory results to a modern protocol design presented in this scope. | ||||
Adrian Farrel, Joel Halpern, Jeffrey Zhang, Krzysztof Szarkowicz, | ||||
Nagendra Kumar, Melchior Aelmans, Kaushal Tank, Will Jones, Moin | ||||
Ahmed, Zheng (Sandy) Zhang, and Donald Eastlake provided thoughtful | ||||
comments that improved the readability of the document and found a | ||||
good amount of corners where the light failed to shine. Kris Price | ||||
was first to mention single router, single arm default | ||||
considerations. Jeff Tantsura helped out with some initial thoughts | ||||
on BFD interactions while Jeff Haas corrected several misconceptions | ||||
about BFD's finer points and helped to improve the security section | ||||
around leaf considerations. Artur Makutunowicz pointed out many | ||||
possible improvements and acted as a sounding board in regard to | ||||
modern protocol implementation techniques RIFT is exploring. Barak | ||||
Gafni formalized the problem of partitioned spine and fallen leaves | ||||
for the first time clearly on a (clean) napkin in Singapore that led | ||||
to the very important part of the specification centered around | ||||
multiple ToF planes and negative disaggregation. Igor Gashinsky and | ||||
others shared many thoughts on problems encountered in design and | ||||
operation of large-scale data center fabrics. Xu Benchong found a | ||||
delicate error in the flooding procedures and a schema datatype size | ||||
mismatch. | ||||
Too many people to mention provided reviews from many directions in | ||||
IETF, often pointing to critical defects, sometimes asking for things | ||||
again that have been removed by one of the previous reviewers as | ||||
objectionable or superfluous, and many times claiming the document | ||||
being somewhere on the extremes between too crowded with the obvious | ||||
and omitting introduction to cryptic concepts everywhere. The result | ||||
is the best editors could do to find a balance of a document guiding | ||||
the reader by Section 2 into a specification tight enough to result | ||||
in interoperable implementations while at the same time introducing | ||||
enough operational context of IP routable fabrics to guarantee a | ||||
concise, common language when facing unaccustomed concepts the | ||||
protocol relies on. In the process, it was important to not end up | ||||
carrying Aesop's donkey of course, so while the result may not be | ||||
perceived as perfect by everyone, it should be practically speaking | ||||
more than sufficient for everyone that ends up using it in the | ||||
future. | ||||
Last but not least, Alvaro Retana, John Scudder, Andrew Alston, and | ||||
Jim Guichard guided the undertaking as ADs by asking many necessary | ||||
procedural and technical questions that did not only improve the | ||||
content but also laid out the track towards publication. And Roman | ||||
Danyliw is mentioned very last but not least for both his | ||||
painstakingly detailed review and improvement of security aspects of | ||||
the specification. | ||||
Contributors | ||||
This work is a product of a list of individuals who are all to be | ||||
considered major contributors, independent of the fact whether or not | ||||
their name made it to the limited author list. | ||||
Tony Przygienda, Ed. | ||||
Juniper | ||||
Jordan Head, Ed. | ||||
Juniper | ||||
Alankar Sharma | ||||
Hudson River Trading | ||||
Pascal Thubert | ||||
Cisco | ||||
Bruno Rijsman | ||||
Individual | ||||
Dmitry Afanasiev | ||||
Individual | ||||
Don Fedyk | ||||
LabN | ||||
Alia Atlas | ||||
Individual | ||||
John Drake | ||||
Individual | ||||
Ilya Vershkov | ||||
Nvidia | ||||
Authors' Addresses | Authors' Addresses | |||
Tony Przygienda (editor) | Tony Przygienda (editor) | |||
Juniper Networks | Juniper Networks | |||
1137 Innovation Way | 1137 Innovation Way | |||
Sunnyvale, CA 94089 | Sunnyvale, CA 94089 | |||
United States of America | United States of America | |||
Email: prz@juniper.net | Email: prz@juniper.net | |||
Jordan Head (editor) | Jordan Head (editor) | |||
Juniper Networks | Juniper Networks | |||
1137 Innovation Way | 1137 Innovation Way | |||
Sunnyvale, CA 94089 | Sunnyvale, CA 94089 | |||
United States of America | United States of America | |||
Email: jhead@juniper.net | Email: jhead@juniper.net | |||
Alankar Sharma | Alankar Sharma | |||
Hudson River Trading | Hudson River Trading | |||
United States of America | United States of America | |||
End of changes. 1093 change blocks. | ||||
3945 lines changed or deleted | 3990 lines changed or added | |||