| rfc9768v6.txt | rfc9768.txt | |||
|---|---|---|---|---|
| Internet Engineering Task Force (IETF) B. Briscoe | Internet Engineering Task Force (IETF) B. Briscoe | |||
| Request for Comments: 9768 Independent | Request for Comments: 9768 Independent | |||
| Updates: 3168 M. Kühlewind | Updates: 3168 M. Kühlewind | |||
| Category: Standards Track Ericsson | Category: Standards Track Ericsson | |||
| ISSN: 2070-1721 R. Scheffenegger | ISSN: 2070-1721 R. Scheffenegger | |||
| NetApp | NetApp | |||
| October 2025 | November 2025 | |||
| More Accurate Explicit Congestion Notification (AccECN) Feedback in TCP | More Accurate Explicit Congestion Notification (AccECN) Feedback in TCP | |||
| Abstract | Abstract | |||
| Explicit Congestion Notification (ECN) is a mechanism by which | Explicit Congestion Notification (ECN) is a mechanism by which | |||
| network nodes can mark IP packets instead of dropping them to | network nodes can mark IP packets instead of dropping them to | |||
| indicate incipient congestion to the endpoints. Receivers with an | indicate incipient congestion to the endpoints. Receivers with an | |||
| ECN-capable transport protocol feed back this information to the | ECN-capable transport protocol feed back this information to the | |||
| sender. ECN was originally specified for TCP in such a way that only | sender. ECN was originally specified for TCP in such a way that only | |||
| one feedback signal can be transmitted per Round-Trip Time (RTT). | one feedback signal can be transmitted per Round-Trip Time (RTT). | |||
| More recently defined TCP mechanisms like Congestion Exposure | More recently defined mechanisms like Congestion Exposure (ConEx), | |||
| (ConEx), Data Center TCP (DCTCP), or Low Latency, Low Loss, and | Data Center TCP (DCTCP), or Low Latency, Low Loss, and Scalable | |||
| Scalable Throughput (L4S) need more Accurate ECN (AccECN) feedback | Throughput (L4S) need more precise ECN feedback information whenever | |||
| information whenever more than one marking is received in one RTT. | more than one marking is received in one RTT. This document updates | |||
| This document updates the original ECN specification defined in RFC | the original ECN specification defined in RFC 3168 by specifying a | |||
| 3168 by specifying a scheme that provides more than one feedback | scheme that provides more than one feedback signal per RTT in the TCP | |||
| signal per RTT in the TCP header. Given TCP header space is scarce, | header. Given TCP header space is scarce, it allocates a reserved | |||
| it allocates a reserved header bit previously assigned to the ECN- | header bit previously assigned to the ECN-nonce. It also overloads | |||
| nonce. It also overloads the two existing ECN flags in the TCP | the two existing ECN flags in the TCP header. The resulting extra | |||
| header. The resulting extra space is additionally exploited to feed | space is additionally exploited to feed back the IP-ECN field | |||
| back the IP ECN field received during the TCP connection | received during the TCP connection establishment. Supplementary | |||
| establishment. Supplementary feedback information can optionally be | feedback information can optionally be provided in two new TCP Option | |||
| provided in two new TCP Option alternatives, which are never used on | alternatives, which are never used on the TCP SYN. The document also | |||
| the TCP SYN. The document also specifies the treatment of this | specifies the treatment of this updated TCP wire protocol by | |||
| updated TCP wire protocol by middleboxes. | middleboxes. | |||
| Status of This Memo | Status of This Memo | |||
| This is an Internet Standards Track document. | This is an Internet Standards Track document. | |||
| This document is a product of the Internet Engineering Task Force | This document is a product of the Internet Engineering Task Force | |||
| (IETF). It represents the consensus of the IETF community. It has | (IETF). It represents the consensus of the IETF community. It has | |||
| received public review and has been approved for publication by the | received public review and has been approved for publication by the | |||
| Internet Engineering Steering Group (IESG). Further information on | Internet Engineering Steering Group (IESG). Further information on | |||
| Internet Standards is available in Section 2 of RFC 7841. | Internet Standards is available in Section 2 of RFC 7841. | |||
| skipping to change at line 135 | skipping to change at line 135 | |||
| 6. Summary: Protocol Properties | 6. Summary: Protocol Properties | |||
| 7. IANA Considerations | 7. IANA Considerations | |||
| 8. Security and Privacy Considerations | 8. Security and Privacy Considerations | |||
| 9. References | 9. References | |||
| 9.1. Normative References | 9.1. Normative References | |||
| 9.2. Informative References | 9.2. Informative References | |||
| Appendix A. Example Algorithms | Appendix A. Example Algorithms | |||
| A.1. Example Algorithm to Encode/Decode the AccECN Option | A.1. Example Algorithm to Encode/Decode the AccECN Option | |||
| A.2. Example Algorithm for Safety Against Long Sequences of ACK | A.2. Example Algorithm for Safety Against Long Sequences of ACK | |||
| Loss | Loss | |||
| A.2.1. Safety Algorithm Without the AccECN Option | A.2.1. Safety Algorithm without the AccECN Option | |||
| A.2.2. Safety Algorithm with the AccECN Option | A.2.2. Safety Algorithm with the AccECN Option | |||
| A.3. Example Algorithm to Estimate Marked Bytes from Marked | A.3. Example Algorithm to Estimate Marked Bytes from Marked | |||
| Packets | Packets | |||
| A.4. Example Algorithm to Count Not-ECT Bytes | A.4. Example Algorithm to Count Not-ECT Bytes | |||
| Appendix B. Rationale for Usage of TCP Header Flags | Appendix B. Rationale for Usage of TCP Header Flags | |||
| B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | |||
| B.2. Four Codepoints in the SYN/ACK | B.2. Four Codepoints in the SYN/ACK | |||
| B.3. Space for Future Evolution | B.3. Space for Future Evolution | |||
| Acknowledgements | Acknowledgements | |||
| Authors' Addresses | Authors' Addresses | |||
| skipping to change at line 176 | skipping to change at line 176 | |||
| supporting the pre-existing TCP congestion controllers that use just | supporting the pre-existing TCP congestion controllers that use just | |||
| one feedback signal per round. Congestion control is the term the | one feedback signal per round. Congestion control is the term the | |||
| IETF uses to describe data rate management. It is the algorithm that | IETF uses to describe data rate management. It is the algorithm that | |||
| a sender uses to optimize its sending rate so that it transmits data | a sender uses to optimize its sending rate so that it transmits data | |||
| as fast as the network can carry it, but no faster. A fuller | as fast as the network can carry it, but no faster. A fuller | |||
| description of the motivation for this specification is given in the | description of the motivation for this specification is given in the | |||
| associated requirements document [RFC7560]. | associated requirements document [RFC7560]. | |||
| This document specifies a Standards Track scheme for ECN feedback in | This document specifies a Standards Track scheme for ECN feedback in | |||
| the TCP header to provide more than one feedback signal per RTT. It | the TCP header to provide more than one feedback signal per RTT. It | |||
| is called the more "Accurate ECN feedback" scheme, or AccECN for | is called the "more Accurate ECN feedback" scheme, or AccECN for | |||
| short. This document updates RFC 3168 with respect to negotiation | short. This document updates RFC 3168 with respect to negotiation | |||
| and use of the feedback scheme for TCP. All aspects of RFC 3168 | and use of the feedback scheme for TCP. All aspects of RFC 3168 | |||
| other than the TCP feedback scheme and its negotiation remain | other than the TCP feedback scheme and its negotiation remain | |||
| unchanged by this specification. In particular, the definition of | unchanged by this specification. In particular, the definition of | |||
| ECN at the IP layer is unaffected. Section 4 details the aspects of | ECN at the IP layer is unaffected. Section 4 details the aspects of | |||
| RFC 3168 that are updated by this document. | RFC 3168 that are updated by this document. | |||
| This document uses the term "Classic ECN feedback" when it needs to | This document uses the term "Classic ECN feedback" when it needs to | |||
| distinguish the TCP/ECN feedback scheme defined in [RFC3168] from the | distinguish the TCP/ECN feedback scheme defined in [RFC3168] from the | |||
| AccECN TCP feedback scheme. AccECN is intended to offer a complete | AccECN TCP feedback scheme. AccECN is intended to offer a complete | |||
| skipping to change at line 257 | skipping to change at line 257 | |||
| main TCP header and quantifies the space left for future use. | main TCP header and quantifies the space left for future use. | |||
| 1.2. Goals | 1.2. Goals | |||
| [RFC7560] enumerates requirements that a candidate feedback scheme | [RFC7560] enumerates requirements that a candidate feedback scheme | |||
| needs to satisfy, under the headings: resilience, timeliness, | needs to satisfy, under the headings: resilience, timeliness, | |||
| integrity, accuracy (including ordering and lack of bias), | integrity, accuracy (including ordering and lack of bias), | |||
| complexity, overhead, and compatibility (both backward and forward). | complexity, overhead, and compatibility (both backward and forward). | |||
| It recognizes that a perfect scheme that fully satisfies all the | It recognizes that a perfect scheme that fully satisfies all the | |||
| requirements is unlikely and trade-offs between requirements are | requirements is unlikely and trade-offs between requirements are | |||
| likely. Section 6 considers the properties of AccECN against these | likely. Section 6 assesses the properties of AccECN against these | |||
| requirements and discusses the trade-offs. | requirements and discusses the trade-offs. | |||
| The requirements document recognizes that a protocol as ubiquitous as | The requirements document recognizes that a protocol as ubiquitous as | |||
| TCP needs to be able to serve as-yet-unspecified requirements. | TCP needs to be able to serve as-yet-unspecified requirements. | |||
| Therefore, an AccECN receiver acts as a generic (mechanistic) | Therefore, an AccECN receiver acts as a generic (mechanistic) | |||
| reflector of congestion information with the aim that new sender | reflector of congestion information with the aim that new sender | |||
| behaviours can be deployed unilaterally (see Section 2.5) in the | behaviours can be deployed unilaterally in the future (see | |||
| future. | Section 2.5). | |||
| 1.3. Terminology | 1.3. Terminology | |||
| Accurate ECN feedback: The more Accurate ECN feedback scheme is | AccECN: The more Accurate ECN feedback scheme. | |||
| called AccECN for short. | ||||
| Classic ECN: The ECN protocol specified in [RFC3168]. | Classic ECN: The ECN protocol specified in [RFC3168]. | |||
| Classic ECN feedback: The feedback aspect of the ECN protocol | Classic ECN feedback: The feedback aspect of the ECN protocol | |||
| specified in [RFC3168], including generation, encoding, | specified in [RFC3168], including generation, encoding, | |||
| transmission and decoding of feedback, but not the Data Sender's | transmission and decoding of feedback, but not the Data Sender's | |||
| subsequent response to that feedback. | subsequent response to that feedback. | |||
| ACK: A TCP acknowledgement, with or without a data payload (ACK=1). | ACK: A TCP acknowledgement, with or without a data payload (ACK=1). | |||
| skipping to change at line 315 | skipping to change at line 314 | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| "OPTIONAL" in this document are to be interpreted as described in | "OPTIONAL" in this document are to be interpreted as described in | |||
| BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
| capitals, as shown here. | capitals, as shown here. | |||
| 1.4. Recap of Existing ECN Feedback in IP/TCP | 1.4. Recap of Existing ECN Feedback in IP/TCP | |||
| Explicit Congestion Notification (ECN) [RFC3168] can be split into | Explicit Congestion Notification (ECN) [RFC3168] can be split into | |||
| two parts conceptionally. In the forward direction, alongside the | two parts conceptually. In the forward direction, alongside the data | |||
| data stream, it uses a 2-bit field in the IP header. This is | stream, it uses a 2-bit field in the IP header. This is referred to | |||
| referred to as IP ECN later on. This signal carried in the IP (Layer | as IP ECN later on. This signal carried in the IP (Layer 3) header | |||
| 3) header is exposed to network devices and may be modified when such | is exposed to network devices, which can modify it when they start to | |||
| a device starts to experience congestion (see Table 1). The second | experience congestion (see Table 1). The second part is the feedback | |||
| part is the feedback mechanism, by which the original data sender is | mechanism, by which the data receiver notifies the current congestion | |||
| notified of the current congestion state of the intermediate path. | state to the original data sender of the intermediate path. That | |||
| That returned signal is carried in a protocol-specific manner, and is | returned signal is carried in a transport-protocol-specific manner, | |||
| not to be modified by intermediate network devices. While ECN is in | and is not to be modified by intermediate network devices. While ECN | |||
| active use for protocols such as QUIC [RFC9000], SCTP [RFC9260], RTP | is in active use for protocols such as QUIC [RFC9000], SCTP | |||
| [RFC6679], and Remote Direct Memory Access over Converged Ethernet | [RFC9260], RTP [RFC6679], and Remote Direct Memory Access over | |||
| [RoCEv2], this document only concerns itself with the specific | Converged Ethernet [RoCEv2], this document only concerns itself with | |||
| implementation for the TCP protocol. | the specific implementation for the TCP protocol. | |||
| Once ECN has been negotiated for a transport layer connection, the | Once ECN has been negotiated for a transport layer connection, the | |||
| Data Sender for either half-connection can set two possible | Data Sender for either half-connection can set two possible | |||
| codepoints (ECT(0) or ECT(1)) in the IP header of a data packet to | codepoints (ECT(0) or ECT(1)) in the IP header of a data packet to | |||
| indicate an ECN-capable transport (ECT). If the ECN codepoint is | indicate an ECN-capable transport (ECT). If the ECN codepoint is | |||
| 0b00, the packet is considered to have been sent by a Not ECN-capable | 0b00, the packet is considered to have been sent by a Not ECN-capable | |||
| Transport (Not-ECT). When a network node experiences congestion, it | Transport (Not-ECT). When a network node experiences congestion, it | |||
| will occasionally either drop or mark a packet, with the choice | will occasionally either drop or mark a packet, with the choice | |||
| depending on the packet's ECN codepoint. If the codepoint is Not- | depending on the packet's ECN codepoint. If the codepoint is Not- | |||
| ECT, only drop is appropriate. If the codepoint is ECT(0) or ECT(1), | ECT, only drop is appropriate. If the codepoint is ECT(0) or ECT(1), | |||
| the node can mark the packet by setting the ECN codepoint to 0b11, | the node can mark the packet by setting the ECN codepoint to 0b11, | |||
| which is termed 'Congestion Experienced' (CE), or loosely a | which is termed 'Congestion Experienced' (CE), or loosely a | |||
| 'congestion mark'. Table 1 summarises these codepoints. | 'congestion mark'. Table 1 summarises these codepoints. | |||
| +==================+================+===========================+ | +==================+================+===========================+ | |||
| \| IP ECN Codepoint \| Codepoint Name \| Description \| | \| IP-ECN Codepoint \| Codepoint Name \| Description \| | |||
| +==================+================+===========================+ | +==================+================+===========================+ | |||
| \| 0b00 \| Not-ECT \| Not ECN-Capable Transport \| | \| 0b00 \| Not-ECT \| Not ECN-Capable Transport \| | |||
| +------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| \| 0b01 \| ECT(1) \| ECN-Capable Transport (1) \| | \| 0b01 \| ECT(1) \| ECN-Capable Transport (1) \| | |||
| +------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| \| 0b10 \| ECT(0) \| ECN-Capable Transport (0) \| | \| 0b10 \| ECT(0) \| ECN-Capable Transport (0) \| | |||
| +------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| \| 0b11 \| CE \| Congestion Experienced \| | \| 0b11 \| CE \| Congestion Experienced \| | |||
| +------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| skipping to change at line 404 | skipping to change at line 403 | |||
| Like the general TCP approach, the Data Receiver of each TCP half- | Like the general TCP approach, the Data Receiver of each TCP half- | |||
| connection sends AccECN feedback to the Data Sender on TCP | connection sends AccECN feedback to the Data Sender on TCP | |||
| acknowledgements, reusing data packets of the other half-connection | acknowledgements, reusing data packets of the other half-connection | |||
| whenever possible. | whenever possible. | |||
| The AccECN protocol has had to be designed in two parts: | The AccECN protocol has had to be designed in two parts: | |||
| * an essential feedback part that reuses the TCP-ECN header bits for | * an essential feedback part that reuses the TCP-ECN header bits for | |||
| the Data Receiver to feed back the number of packets arriving with | the Data Receiver to feed back the number of packets arriving with | |||
| CE in the IP ECN field. This provides more accuracy than Classic | CE in the IP-ECN field. This provides more accuracy than Classic | |||
| ECN feedback, but limited resilience against ACK loss. | ECN feedback, but limited resilience against ACK loss. | |||
| * a supplementary feedback part using one of two new alternative | * a supplementary feedback part using one of two new alternative | |||
| AccECN TCP Options that provide additional feedback on the number | AccECN TCP Options that provide additional feedback on the number | |||
| of payload bytes that arrive marked with each of the three ECN | of payload bytes that arrive marked with each of the three ECN | |||
| codepoints in the IP ECN field (not just CE marks). See the BCP | codepoints in the IP-ECN field (not just CE marks). See the BCP | |||
| on Byte and Packet Congestion Notification [RFC7141] for the | on Byte and Packet Congestion Notification [RFC7141] for the | |||
| rationale determining that conveying congested payload bytes | rationale determining that conveying congested payload bytes | |||
| should be preferred over just providing feedback about congested | should be preferred over just providing feedback about congested | |||
| packets. This also provides greater resilience against ACK loss | packets. This also provides greater resilience against ACK loss | |||
| than the essential feedback, but it is currently more likely to | than the essential feedback, but it is currently more likely to | |||
| suffer from middlebox interference. | suffer from middlebox interference. | |||
| The two part design was necessary, given limitations on the space | The two part design was necessary, given limitations on the space | |||
| available for TCP Options and given the possibility that certain | available for TCP Options and given the possibility that certain | |||
| incorrectly designed middleboxes might prevent TCP from using any new | incorrectly designed middleboxes might prevent TCP from using any new | |||
| skipping to change at line 442 | skipping to change at line 441 | |||
| * a single upgrade path for the TCP protocol is preferable to a fork | * a single upgrade path for the TCP protocol is preferable to a fork | |||
| in the design that modifies the TCP header to convey all ECN | in the design that modifies the TCP header to convey all ECN | |||
| feedback; | feedback; | |||
| * otherwise, Classic and Accurate ECN feedback could give | * otherwise, Classic and Accurate ECN feedback could give | |||
| conflicting feedback about the same segment, which could open up | conflicting feedback about the same segment, which could open up | |||
| new security concerns and make implementations unnecessarily | new security concerns and make implementations unnecessarily | |||
| complex; | complex; | |||
| * middleboxes are more likely to faithfully forward the TCP ECN | * middleboxes are more likely to faithfully forward the TCP-ECN | |||
| flags than newly defined areas of the TCP header. | flags than newly defined areas of the TCP header. | |||
| AccECN is designed to work even if the supplementary feedback part is | AccECN is designed to work even if the supplementary feedback part is | |||
| removed or zeroed out, as long as the essential feedback part gets | removed or zeroed out, as long as the essential feedback part gets | |||
| through. | through. | |||
| 2.1. Capability Negotiation | 2.1. Capability Negotiation | |||
| AccECN changes the wire protocol of the main TCP header; therefore, | AccECN changes the wire protocol of the main TCP header; therefore, | |||
| it can only be used if both endpoints have been upgraded to | it can only be used if both endpoints have been upgraded to | |||
| skipping to change at line 470 | skipping to change at line 469 | |||
| An AccECN TCP Client does not send an AccECN Option on the SYN as SYN | An AccECN TCP Client does not send an AccECN Option on the SYN as SYN | |||
| option space is limited. The TCP Server sends an AccECN Option on | option space is limited. The TCP Server sends an AccECN Option on | |||
| the SYN/ACK, and the TCP Client sends one on the first ACK to test | the SYN/ACK, and the TCP Client sends one on the first ACK to test | |||
| whether the network path forwards these options correctly. | whether the network path forwards these options correctly. | |||
| 2.2. Feedback Mechanism | 2.2. Feedback Mechanism | |||
| A Data Receiver maintains four counters initialized at the start of | A Data Receiver maintains four counters initialized at the start of | |||
| the half-connection. Three count the number of arriving payload | the half-connection. Three count the number of arriving payload | |||
| bytes marked CE, ECT(1), and ECT(0) in the IP ECN field. These byte | bytes marked CE, ECT(1), and ECT(0) in the IP-ECN field. These byte | |||
| counters reflect only the TCP payload length, excluding the TCP | counters reflect only the TCP payload length, excluding the TCP | |||
| header and TCP Options. The fourth counter counts the number of | header and TCP Options. The fourth counter counts the number of | |||
| packets arriving marked with a CE codepoint (including control | packets arriving marked with a CE codepoint (including control | |||
| packets without payload if they are CE-marked). | packets without payload if they are CE-marked). | |||
| The Data Sender maintains four equivalent counters for the half- | The Data Sender maintains four equivalent counters for the half- | |||
| connection, and the AccECN protocol is designed to ensure they will | connection, and the AccECN protocol is designed to ensure they will | |||
| match the values in the Data Receiver's counters, albeit after a | match the values in the Data Receiver's counters, albeit after a | |||
| little delay. | little delay. | |||
| skipping to change at line 511 | skipping to change at line 510 | |||
| actually cycled completely and then incremented by one. The Data | actually cycled completely and then incremented by one. The Data | |||
| Receiver is not allowed to delay sending an ACK to such an extent | Receiver is not allowed to delay sending an ACK to such an extent | |||
| that the ACE field would cycle. However, ACKs received at the Data | that the ACE field would cycle. However, ACKs received at the Data | |||
| Sender could still cycle because a whole sequence of ACKs carrying | Sender could still cycle because a whole sequence of ACKs carrying | |||
| intervening values of the field might all be lost or delayed in | intervening values of the field might all be lost or delayed in | |||
| transit. | transit. | |||
| The fields in an AccECN Option are larger, but they will increment in | The fields in an AccECN Option are larger, but they will increment in | |||
| larger steps because they count bytes not packets. Nonetheless, | larger steps because they count bytes not packets. Nonetheless, | |||
| their size has been chosen such that a whole cycle of the field would | their size has been chosen such that a whole cycle of the field would | |||
| never occur between ACKs unless there has been an infeasibly long | never occur between ACKs unless there had been an infeasibly long | |||
| sequence of ACK losses. Therefore, provided that an AccECN Option is | sequence of ACK losses. Therefore, provided that an AccECN Option is | |||
| available, it can be treated as a dependable feedback channel. | available, it can be treated as a dependable feedback channel. | |||
| If an AccECN Option is not available, e.g., it is being stripped by a | If an AccECN Option is not available, e.g., it is being stripped by a | |||
| middlebox, the AccECN protocol will only feed back information on CE | middlebox, the AccECN protocol will only feed back information on CE | |||
| markings (using the ACE field). Although not ideal, this will be | markings (using the ACE field). Although not ideal, this will be | |||
| sufficient, because it is envisaged that neither ECT(0) nor ECT(1) | sufficient, because it is envisaged that neither ECT(0) nor ECT(1) | |||
| will ever indicate more severe congestion than CE, even though future | will ever indicate more severe congestion than CE, even though future | |||
| uses for ECT(0) or ECT(1) are still unclear [RFC8311]. Because the | uses for ECT(0) or ECT(1) are still unclear [RFC8311]. Because the | |||
| 3-bit ACE field is so small, when it is the only field available, the | 3-bit ACE field is so small, when it is the only field available, the | |||
| skipping to change at line 557 | skipping to change at line 556 | |||
| other than the L4S experiment [RFC9330], such as a lower severity or | other than the L4S experiment [RFC9330], such as a lower severity or | |||
| a more instant congestion signal than CE. | a more instant congestion signal than CE. | |||
| Feedback in bytes is provided to protect against the receiver or a | Feedback in bytes is provided to protect against the receiver or a | |||
| middlebox using attacks similar to 'ACK-Division' to artificially | middlebox using attacks similar to 'ACK-Division' to artificially | |||
| inflate the congestion window, which is why [RFC5681] now recommends | inflate the congestion window, which is why [RFC5681] now recommends | |||
| that TCP counts acknowledged bytes not packets. | that TCP counts acknowledged bytes not packets. | |||
| 2.5. Generic (Mechanistic) Reflector | 2.5. Generic (Mechanistic) Reflector | |||
| The ACE field provides feedback about CE markings in the IP ECN field | The ACE field provides feedback about CE markings in the IP-ECN field | |||
| of both data and control packets. According to [RFC3168], the Data | of both data and control packets. According to [RFC3168], the Data | |||
| Sender is meant to set the IP ECN field of control packets to Not- | Sender is meant to set the IP-ECN field of control packets to Not- | |||
| ECT. However, mechanisms in certain private networks (e.g., data | ECT. However, mechanisms in certain private networks (e.g., data | |||
| centres) set control packets to be ECN-capable because they are | centres) set control packets to be ECN-capable because they are | |||
| precisely the packets that performance depends on most. | precisely the packets that performance depends on most. | |||
| For this reason, AccECN is designed to be a generic reflector of | For this reason, AccECN is designed to be a generic reflector of | |||
| whatever ECN markings it sees, whether or not they are compliant with | whatever ECN markings it sees, whether or not they are compliant with | |||
| a current standard. Then as standards evolve, Data Senders can | a current standard. Then as standards evolve, Data Senders can | |||
| upgrade unilaterally without any need for receivers to upgrade too. | upgrade unilaterally without any need for receivers to upgrade too. | |||
| It is also useful to be able to rely on generic reflection behaviour | It is also useful to be able to rely on generic reflection behaviour | |||
| when senders need to test for unexpected interference with markings | when senders need to test for unexpected interference with markings | |||
| (for instance Sections 3.2.2.3, 3.2.2.4, and 3.2.3.2 of the present | (for instance Sections 3.2.2.3, 3.2.2.4, and 3.2.3.2 of the present | |||
| document and paragraph 2 of Section 20.2 of [RFC3168]). | document and paragraph 2 of Section 20.2 of [RFC3168]). | |||
| The initial SYN and SYN/ACK are the most critical control packets, so | The initial SYN and SYN/ACK are the most critical control packets, so | |||
| AccECN feeds back their IP ECN fields. Although RFC 3168 prohibits | AccECN feeds back their IP-ECN fields. Although RFC 3168 prohibits | |||
| ECN-capable SYNs and SYN/ACKs, providing feedback of ECN marking on | ECN-capable SYNs and SYN/ACKs, providing feedback of ECN marking on | |||
| the SYN and SYN/ACK supports future scenarios in which SYNs might be | the SYN and SYN/ACK supports future scenarios in which SYNs might be | |||
| ECN-enabled (without prejudging whether they ought to be). For | ECN-enabled (without prejudging whether they ought to be). For | |||
| instance, [RFC8311] updates this aspect of RFC 3168 to allow | instance, [RFC8311] updates this aspect of RFC 3168 to allow | |||
| experimentation with ECN-capable TCP control packets. | experimentation with ECN-capable TCP control packets. | |||
| Even if the TCP Client (or Server) has set the SYN (or SYN/ACK) to | Even if the TCP Client (or Server) has set the SYN (or SYN/ACK) to | |||
| Not-ECT in compliance with RFC 3168, feedback on the state of the IP | Not-ECT in compliance with RFC 3168, feedback on the state of the IP- | |||
| ECN field when it arrives at the receiver could still be useful, | ECN field when it arrives at the receiver could still be useful, | |||
| because middleboxes have been known to overwrite the IP ECN field as | because middleboxes have been known to overwrite the IP-ECN field as | |||
| if it is still part of the old Type of Service (ToS) field | if it is still part of the old Type of Service (ToS) field | |||
| [Mandalari18]. For example, if a TCP Client has set the SYN to Not- | [Mandalari18]. For example, if a TCP Client has set the SYN to Not- | |||
| ECT, but receives feedback that the IP ECN field on the SYN arrived | ECT, but receives feedback that the IP-ECN field on the SYN arrived | |||
| with a different codepoint, it can detect such middlebox | with a different codepoint, it can detect such middlebox | |||
| interference. Previously, neither end knew what IP ECN field the | interference. Previously, neither end knew what IP-ECN field the | |||
| other sent. So, if a TCP Server received ECT or CE on a SYN, it | other sent. So, if a TCP Server received ECT or CE on a SYN, it | |||
| could not know whether it was invalid because only the TCP Client | could not know whether it was invalid because only the TCP Client | |||
| knew whether it originally marked the SYN as Not-ECT (or ECT). | knew whether it originally marked the SYN as Not-ECT (or ECT). | |||
| Therefore, prior to AccECN, the Server's only safe course of action | Therefore, prior to AccECN, the Server's only safe course of action | |||
| in this example was to disable ECN for the connection. Instead, the | in this example was to disable ECN for the connection. Instead, the | |||
| AccECN protocol allows the Server and Client to feed back the ECN | AccECN protocol allows the Server and Client to feed back the ECN | |||
| field received on the SYN and SYN/ACK to their peer, which now has | field received on the SYN and SYN/ACK to their peer, which now has | |||
| all the information to decide whether the connection has to fall back | all the information to decide whether the connection has to fall back | |||
| from supporting ECN (or not). | from supporting ECN (or not). | |||
| skipping to change at line 630 | skipping to change at line 629 | |||
| TCP Three-Way Handshake | TCP Three-Way Handshake | |||
| During the TCP three-way handshake at the start of a connection, to | During the TCP three-way handshake at the start of a connection, to | |||
| request more Accurate ECN feedback the TCP Client (host A) MUST set | request more Accurate ECN feedback the TCP Client (host A) MUST set | |||
| the TCP flags (AE,CWR,ECE) = (1,1,1) in the initial SYN segment. | the TCP flags (AE,CWR,ECE) = (1,1,1) in the initial SYN segment. | |||
| If a TCP Server (host B) that is AccECN-enabled receives a SYN with | If a TCP Server (host B) that is AccECN-enabled receives a SYN with | |||
| the above three flags set, it MUST set both its half-connections into | the above three flags set, it MUST set both its half-connections into | |||
| AccECN mode. Then it MUST set the AE, CWR, and ECE TCP flags on the | AccECN mode. Then it MUST set the AE, CWR, and ECE TCP flags on the | |||
| SYN/ACK to the combination in the top block of Table 2 that feeds | SYN/ACK to the combination in the top block of Table 2 that feeds | |||
| back the IP ECN field that arrived on the SYN. This applies whether | back the IP-ECN field that arrived on the SYN. This applies whether | |||
| or not the Server itself supports setting the IP ECN field on a SYN | or not the Server itself supports setting the IP-ECN field on a SYN | |||
| or SYN/ACK (see Section 2.5 for rationale). | or SYN/ACK (see Section 2.5 for rationale). | |||
| When the TCP Server returns any of the four combinations in the top | When the TCP Server returns any of the four combinations in the top | |||
| block of Table 2, it confirms that it supports AccECN. The TCP | block of Table 2, it confirms that it supports AccECN. The TCP | |||
| Server MUST NOT set one of these four combinations of flags on the | Server MUST NOT set one of these four combinations of flags on the | |||
| SYN/ACK unless the preceding SYN requested support for AccECN as | SYN/ACK unless the preceding SYN requested support for AccECN as | |||
| above. | above. | |||
| Once a TCP Client (A) has sent the above SYN to declare that it | Once a TCP Client (A) has sent the above SYN to declare that it | |||
| supports AccECN, and once it has received the above SYN/ACK segment | supports AccECN, and once it has received the above SYN/ACK segment | |||
| skipping to change at line 661 | skipping to change at line 660 | |||
| The procedures for retransmission of SYNs or SYN/ACKs are given in | The procedures for retransmission of SYNs or SYN/ACKs are given in | |||
| Section 3.1.4. | Section 3.1.4. | |||
| It is RECOMMENDED that the AccECN protocol be implemented alongside | It is RECOMMENDED that the AccECN protocol be implemented alongside | |||
| Selective Acknowledgement (SACK) [RFC2018]. If SACK is implemented | Selective Acknowledgement (SACK) [RFC2018]. If SACK is implemented | |||
| with AccECN, Duplicate Selective Acknowledgement (D-SACK) [RFC2883] | with AccECN, Duplicate Selective Acknowledgement (D-SACK) [RFC2883] | |||
| MUST also be implemented. | MUST also be implemented. | |||
| 3.1.2. Backward Compatibility | 3.1.2. Backward Compatibility | |||
| The three flags set to 1 to indicate AccECN support on the SYN has | The setting of all three flags to 1 in order to indicate AccECN | |||
| been carefully chosen to enable natural fall-back to prior stages in | support on the SYN was carefully chosen to enable natural fall-back | |||
| the evolution of ECN. Table 2 tabulates all the negotiation | to prior stages in the evolution of ECN. Table 2 tabulates all the | |||
| possibilities for ECN-related capabilities that involve at least one | negotiation possibilities for ECN-related capabilities that involve | |||
| AccECN-capable host. The entries in the first two columns have been | at least one AccECN-capable host. The entries in the first two | |||
| abbreviated, as follows: | columns have been abbreviated, as follows: | |||
| AccECN: Supports more Accurate ECN feedback (the present | AccECN: Supports more Accurate ECN feedback (the present | |||
| specification). | specification). | |||
| Nonce: Supports ECN-nonce feedback [RFC3540]. | Nonce: Supports ECN-nonce feedback [RFC3540]. | |||
| ECN: Supports 'Classic' ECN feedback [RFC3168]. | ECN: Supports 'Classic' ECN feedback [RFC3168]. | |||
| No ECN: Not ECN-capable. Implicit congestion notification using | No ECN: Not ECN-capable. Implicit congestion notification using | |||
| packet drop. | packet drop. | |||
| skipping to change at line 793 | skipping to change at line 792 | |||
| such a combination, the Server MUST negotiate the use of AccECN as if | such a combination, the Server MUST negotiate the use of AccECN as if | |||
| the three flags had been set to (1,1,1). However, an AccECN Client | the three flags had been set to (1,1,1). However, an AccECN Client | |||
| implementation MUST NOT send a SYN with any combination other than | implementation MUST NOT send a SYN with any combination other than | |||
| the three listed. | the three listed. | |||
| If a TCP Client sent a SYN requesting AccECN feedback with | If a TCP Client sent a SYN requesting AccECN feedback with | |||
| (AE,CWR,ECE) = (1,1,1) and then receives a SYN/ACK with the currently | (AE,CWR,ECE) = (1,1,1) and then receives a SYN/ACK with the currently | |||
| reserved combination (AE,CWR,ECE) = (1,0,1) but it does not have | reserved combination (AE,CWR,ECE) = (1,0,1) but it does not have | |||
| logic specific to such a combination, the Client MUST enable AccECN | logic specific to such a combination, the Client MUST enable AccECN | |||
| mode as if the SYN/ACK confirmed that the Server supported AccECN and | mode as if the SYN/ACK confirmed that the Server supported AccECN and | |||
| as if it fed back that the IP ECN field on the SYN had arrived | as if it fed back that the IP-ECN field on the SYN had arrived | |||
| unchanged. However, an AccECN Server implementation MUST NOT send a | unchanged. However, an AccECN Server implementation MUST NOT send a | |||
| SYN/ACK with this combination (AE,CWR,ECE) = (1,0,1). | SYN/ACK with this combination (AE,CWR,ECE) = (1,0,1). | |||
| | For the avoidance of doubt, the behaviour described in the | | For the avoidance of doubt, the behaviour described in the | |||
| | present specification applies whether or not the three | | present specification applies whether or not the three | |||
| | remaining reserved TCP header flags are zero. | | remaining reserved TCP header flags are zero. | |||
| All of these requirements ensure that future uses of all the Reserved | All of these requirements ensure that future uses of all the Reserved | |||
| combinations on a SYN or SYN/ACK (see Table 2) can rely on consistent | combinations of all the TCP header bits on a SYN or SYN/ACK (see | |||
| behaviour from the installed base of AccECN implementations. See | Table 2) can rely on consistent behaviour from the installed base of | |||
| Appendix B.3 for related discussion. | AccECN implementations. See Appendix B.3 for related discussion. | |||
| 3.1.4. Multiple SYNs or SYN/ACKs | 3.1.4. Multiple SYNs or SYN/ACKs | |||
| 3.1.4.1. Retransmitted SYNs | 3.1.4.1. Retransmitted SYNs | |||
| If the sender of an AccECN SYN (the TCP Client) times out before | If the sender of an AccECN SYN (the TCP Client) times out before | |||
| receiving the SYN/ACK, it SHOULD attempt to negotiate the use of | receiving the SYN/ACK, it SHOULD attempt to negotiate the use of | |||
| AccECN at least one more time by continuing to set all three TCP ECN | AccECN at least one more time by continuing to set all three TCP-ECN | |||
| flags (AE,CWR,ECE) = (1,1,1) on the first retransmitted SYN (using | flags (AE,CWR,ECE) = (1,1,1) on the first retransmitted SYN (using | |||
| the usual retransmission timeouts). If this first retransmission | the usual retransmission timeouts). If this first retransmission | |||
| also fails to be acknowledged, in deployment scenarios where AccECN | also fails to be acknowledged, in deployment scenarios where AccECN | |||
| path traversal might be problematic, the TCP Client SHOULD send | path traversal might be problematic, the TCP Client SHOULD send | |||
| subsequent retransmissions of the SYN with the three TCP-ECN flags | subsequent retransmissions of the SYN with the three TCP-ECN flags | |||
| cleared (AE,CWR,ECE) = (0,0,0). Such a retransmitted SYN MUST use | cleared (AE,CWR,ECE) = (0,0,0). Such a retransmitted SYN MUST use | |||
| the same initial sequence number (ISN) as the original SYN. | the same initial sequence number (ISN) as the original SYN. | |||
| Retrying once before fall-back adds delay in the case where a | Retrying once before fall-back adds delay in the case where a | |||
| middlebox drops an AccECN (or ECN) SYN deliberately. However, recent | middlebox drops an AccECN (or ECN) SYN deliberately. However, recent | |||
| measurements [Mandalari18] imply that a drop is less likely to be due | measurements [Mandalari18] imply that a drop is less likely to be due | |||
| to middlebox interference than other intermittent causes of loss, | to middlebox interference than other intermittent causes of loss, | |||
| e.g., congestion, wireless transmission loss, etc. | e.g., congestion, wireless transmission loss, etc. | |||
| Implementers MAY use other fall-back strategies if they are found to | Implementers MAY use other fall-back strategies if they are found to | |||
| be more effective (e.g., attempting to negotiate AccECN on the SYN | be more effective, e.g., attempting to negotiate AccECN on the SYN | |||
| only once or more than twice (most appropriate during high levels of | only once or more than twice (most appropriate during high levels of | |||
| congestion)). | congestion). | |||
| Further it might make sense to also remove any other new or | Further it might make sense to also remove any other new or | |||
| experimental fields or options on the SYN in case a middlebox might | experimental fields or options on the SYN in case a middlebox might | |||
| be blocking them, although the required behaviour will depend on the | be blocking them, although the required behaviour will depend on the | |||
| specification of the other option(s) and any attempt to coordinate | specification of the other option(s) and any attempt to coordinate | |||
| fall-back between different modules of the stack. For instance, if | fall-back between different modules of the stack. For instance, if | |||
| taking part in an [RFC8311] experiment that allows ECT on a SYN, it | taking part in an [RFC8311] experiment that allows ECT on a SYN, it | |||
| would be advisable to have a fall-back strategy that tries use of | would be advisable to have a fall-back strategy that tries use of | |||
| AccECN without setting ETC on SYN. | AccECN without setting ECT on the SYN. | |||
| Whichever fall-back strategy is used, the TCP initiator SHOULD cache | Whichever fall-back strategy is used, the TCP initiator SHOULD cache | |||
| failed connection attempts. If it does, it SHOULD NOT give up | failed connection attempts. If it does, it SHOULD NOT give up | |||
| attempting to negotiate AccECN on the SYN of subsequent connection | attempting to negotiate AccECN on the SYN of subsequent connection | |||
| attempts until it is clear that the blockage is persistently and | attempts until it is clear that the blockage is persistently and | |||
| specifically due to AccECN. The cache needs to be arranged to expire | specifically due to AccECN. The cache needs to be arranged to expire | |||
| so that the initiator will infrequently attempt to check whether the | so that the initiator will infrequently attempt to check whether the | |||
| problem has been resolved. | problem has been resolved. | |||
| All fall-back strategies will need to follow all the normative rules | All fall-back strategies will need to follow all the normative rules | |||
| skipping to change at line 870 | skipping to change at line 869 | |||
| possibly reordered. | possibly reordered. | |||
| * Such a TCP Client enters the feedback mode appropriate to the | * Such a TCP Client enters the feedback mode appropriate to the | |||
| first SYN/ACK it receives according to Table 2, and it does not | first SYN/ACK it receives according to Table 2, and it does not | |||
| switch to a different mode, whatever other SYN/ACKs it might | switch to a different mode, whatever other SYN/ACKs it might | |||
| receive or send. | receive or send. | |||
| * If a TCP Client has entered AccECN mode but then subsequently | * If a TCP Client has entered AccECN mode but then subsequently | |||
| sends a SYN or receives a SYN/ACK with (AE,CWR,ECE) = (0,0,0), it | sends a SYN or receives a SYN/ACK with (AE,CWR,ECE) = (0,0,0), it | |||
| is still allowed to set ECT on packets for the rest of the | is still allowed to set ECT on packets for the rest of the | |||
| connection. Note that this rule is different than that of a | connection. Note that this rule is different from that of a | |||
| Server in an equivalent position (Section 3.1.5 explains). | Server in an equivalent position (Section 3.1.5 explains). | |||
| * Having entered AccECN mode, in general a TCP Client commits to | * Having entered AccECN mode, in general a TCP Client commits to | |||
| respond to any incoming congestion feedback, whether or not it | respond to any incoming congestion feedback, whether or not it | |||
| sets ECT on outgoing packets (for rationale and some exceptions | sets ECT on outgoing packets (for rationale and some exceptions | |||
| see Section 3.2.2.3, Section 3.2.2.4). | see Section 3.2.2.3, Section 3.2.2.4). | |||
| * Having entered AccECN mode, a TCP Client commits to using AccECN | * Having entered AccECN mode, a TCP Client commits to using AccECN | |||
| to feed back the IP ECN field in incoming packets for the rest of | to feed back the IP-ECN field in incoming packets for the rest of | |||
| the connection, as specified in Section 3.2, even if it is not | the connection, as specified in Section 3.2, even if it is not | |||
| itself setting ECT on outgoing packets. | itself setting ECT on outgoing packets. | |||
| 3.1.4.2. Retransmitted SYN/ACKs | 3.1.4.2. Retransmitted SYN/ACKs | |||
| A TCP Server might send multiple SYN/ACKs indicating different | A TCP Server might send multiple SYN/ACKs indicating different | |||
| feedback modes. For instance, when falling back to sending a SYN/ACK | feedback modes. For instance, when falling back to sending a SYN/ACK | |||
| with (AE,CWR,ECE) = (0,0,0) after previous AccECN SYN/ACKs have timed | with (AE,CWR,ECE) = (0,0,0) after previous AccECN SYN/ACKs have timed | |||
| out (Section 3.2.3.2.2); or to acknowledge different retransmissions | out (Section 3.2.3.2.2); or to acknowledge different retransmissions | |||
| of the SYN (Section 3.1.4.1). | of the SYN (Section 3.1.4.1). | |||
| skipping to change at line 908 | skipping to change at line 907 | |||
| * An AccECN-capable TCP Server enters the feedback mode appropriate | * An AccECN-capable TCP Server enters the feedback mode appropriate | |||
| to the first SYN it receives using Table 2, and it does not switch | to the first SYN it receives using Table 2, and it does not switch | |||
| to a different mode, whatever other SYNs it might receive and | to a different mode, whatever other SYNs it might receive and | |||
| whatever SYN/ACKs it might send. | whatever SYN/ACKs it might send. | |||
| * If a TCP Server in AccECN mode receives a SYN with (AE,CWR,ECE) = | * If a TCP Server in AccECN mode receives a SYN with (AE,CWR,ECE) = | |||
| (0,0,0), it preferably acknowledges it first using an AccECN SYN/ | (0,0,0), it preferably acknowledges it first using an AccECN SYN/ | |||
| ACK, but it can retry using a SYN/ACK with (AE,CWR,ECE) = (0,0,0). | ACK, but it can retry using a SYN/ACK with (AE,CWR,ECE) = (0,0,0). | |||
| * If a TCP Server in AccECN mode sends multiple AccECN SYN/ACKs, it | * If a TCP Server in AccECN mode sends multiple AccECN SYN/ACKs, it | |||
| uses the TCP-ECN flags in each SYN/ACK to feed back the IP ECN | uses the TCP-ECN flags in each SYN/ACK to feed back the IP-ECN | |||
| field on the latest SYN to have arrived. | field on the latest SYN to have arrived. | |||
| * If a TCP Server enters AccECN mode and then subsequently sends a | * If a TCP Server enters AccECN mode and then subsequently sends a | |||
| SYN/ACK or receives a SYN with (AE,CWR,ECE) = (0,0,0), it is | SYN/ACK or receives a SYN with (AE,CWR,ECE) = (0,0,0), it is | |||
| prohibited from setting ECT on any packet for the rest of the | prohibited from setting ECT on any packet for the rest of the | |||
| connection. | connection. | |||
| * Having entered AccECN mode, in general a TCP Server commits to | * Having entered AccECN mode, in general a TCP Server commits to | |||
| respond to any incoming congestion feedback, whether or not it | respond to any incoming congestion feedback, whether or not it | |||
| sets ECT on outgoing packets (for rationale and some exceptions | sets ECT on outgoing packets (for rationale and some exceptions | |||
| see Sections 3.2.2.3, 3.2.2.4). | see Sections 3.2.2.3, 3.2.2.4). | |||
| * Having entered AccECN mode, a TCP Server commits to using AccECN | * Having entered AccECN mode, a TCP Server commits to using AccECN | |||
| to feed back the IP ECN field in incoming packets for the rest of | to feed back the IP-ECN field in incoming packets for the rest of | |||
| the connection, as specified in Section 3.2, even if it is not | the connection, as specified in Section 3.2, even if it is not | |||
| itself setting ECT on outgoing packets. | itself setting ECT on outgoing packets. | |||
| 3.1.5. Implications of AccECN Mode | 3.1.5. Implications of AccECN Mode | |||
| Section 3.1.1 describes the only ways that a host can enter AccECN | Section 3.1.1 describes the only ways that a host can enter AccECN | |||
| mode, whether as a Client or as a Server. | mode, whether as a Client or as a Server. | |||
| An implementation that supports AccECN has the rights and obligations | An implementation that supports AccECN has the rights and obligations | |||
| concerning the use of ECN defined below, which update those in | concerning the use of ECN defined below, which update those in | |||
| skipping to change at line 947 | skipping to change at line 946 | |||
| synchronization. | synchronization. | |||
| 'Valid SYN': A SYN that has the same port numbers and the same ISN | 'Valid SYN': A SYN that has the same port numbers and the same ISN | |||
| as the SYN that first caused the Server to open the connection. | as the SYN that first caused the Server to open the connection. | |||
| An 'Acceptable' packet is defined in Section 1.3. | An 'Acceptable' packet is defined in Section 1.3. | |||
| Handling SYNs or SYN/ACKs of multiple types (e.g., fall-back): | Handling SYNs or SYN/ACKs of multiple types (e.g., fall-back): | |||
| * Any implementation that supports AccECN: | * Any implementation that supports AccECN: | |||
| - MUST NOT switch into a different feedback mode than the one it | - MUST NOT switch into a different feedback mode from the one it | |||
| first entered according to Table 2, no matter whether it | first entered according to Table 2, no matter whether it | |||
| subsequently receives valid SYNs or Acceptable SYN/ACKs of | subsequently receives valid SYNs or Acceptable SYN/ACKs of | |||
| different types; | different types; | |||
| - SHOULD ignore the TCP-ECN flags in SYNs or SYN/ACKs that are | - SHOULD ignore the TCP-ECN flags in SYNs or SYN/ACKs that are | |||
| received after the implementation reaches the ESTABLISHED | received after the implementation reaches the ESTABLISHED | |||
| state, in line with the general TCP approach [RFC9293]; | state, in line with the general TCP approach [RFC9293]; | |||
| Reason: Reaching ESTABLISHED state implies that at least one | Reason: Reaching ESTABLISHED state implies that at least one | |||
| SYN and one SYN/ACK have successfully been delivered. And all | SYN and one SYN/ACK have successfully been delivered. And all | |||
| skipping to change at line 987 | skipping to change at line 986 | |||
| handshake. | handshake. | |||
| The last four rules are necessary because, if one peer were to | The last four rules are necessary because, if one peer were to | |||
| negotiate the feedback mode in two different types of handshake, | negotiate the feedback mode in two different types of handshake, | |||
| it would not be possible for the other peer to know for certain | it would not be possible for the other peer to know for certain | |||
| which handshake packet(s) the other end had eventually received or | which handshake packet(s) the other end had eventually received or | |||
| in which order it received them. So, in the absence of these | in which order it received them. So, in the absence of these | |||
| rules, the two peers could end up using different ECN feedback | rules, the two peers could end up using different ECN feedback | |||
| modes without knowing it. | modes without knowing it. | |||
| * A host in AccECN mode that is feeding back the IP ECN field on a | * A host in AccECN mode that is feeding back the IP-ECN field on a | |||
| SYN or SYN/ACK: | SYN or SYN/ACK: | |||
| - MUST feed back the IP ECN field on the latest valid SYN or | - MUST feed back the IP-ECN field on the latest valid SYN or | |||
| acceptable SYN/ACK to arrive. | acceptable SYN/ACK to arrive. | |||
| * A TCP Server already in AccECN mode: | * A TCP Server already in AccECN mode: | |||
| - SHOULD acknowledge a valid SYN arriving with (AE,CWR,ECE) = | - SHOULD acknowledge a valid SYN arriving with (AE,CWR,ECE) = | |||
| (0,0,0) by emitting an AccECN SYN/ACK (with the appropriate | (0,0,0) by emitting an AccECN SYN/ACK (with the appropriate | |||
| combination of TCP-ECN flags to feed back the IP ECN field of | combination of TCP-ECN flags to feed back the IP-ECN field of | |||
| this latest SYN); | this latest SYN); | |||
| - MAY acknowledge a valid SYN arriving with (AE,CWR,ECE) = | - MAY acknowledge a valid SYN arriving with (AE,CWR,ECE) = | |||
| (0,0,0) by sending a SYN/ACK with (AE,CWR,ECE) = (0,0,0). | (0,0,0) by sending a SYN/ACK with (AE,CWR,ECE) = (0,0,0). | |||
| Rationale: When a SYN arrives with (AE,CWR,ECE) = (0,0,0) at a TCP | Rationale: When a SYN arrives with (AE,CWR,ECE) = (0,0,0) at a TCP | |||
| Server that is already in AccECN mode, it implies that the TCP | Server that is already in AccECN mode, it implies that the TCP | |||
| Client had probably not received the previous AccECN SYN/ACK | Client had probably not received the previous AccECN SYN/ACK | |||
| emitted by the TCP Server. Therefore, the first bullet recommends | emitted by the TCP Server. Therefore, the first bullet recommends | |||
| attempting at least one more AccECN SYN/ACK. Nonetheless, the | attempting at least one more AccECN SYN/ACK. Nonetheless, the | |||
| skipping to change at line 1085 | skipping to change at line 1084 | |||
| For the avoidance of doubt, this is unlike an RFC 3168 data | For the avoidance of doubt, this is unlike an RFC 3168 data | |||
| sender and this does not preclude the Data Sender from setting | sender and this does not preclude the Data Sender from setting | |||
| the bits of the ACE counter field, which includes an overloaded | the bits of the ACE counter field, which includes an overloaded | |||
| use of the same bit. | use of the same bit. | |||
| Receiving ECT: | Receiving ECT: | |||
| * A host in AccECN mode: | * A host in AccECN mode: | |||
| - MUST feed back the information in the IP ECN field of incoming | - MUST feed back the information in the IP-ECN field of incoming | |||
| packets using Accurate ECN feedback, as specified in | packets using Accurate ECN feedback, as specified in | |||
| Section 3.2. | Section 3.2. | |||
| For the avoidance of doubt, this requirement stands even if the | For the avoidance of doubt, this requirement stands even if the | |||
| AccECN host has also sent or received a SYN or SYN/ACK with | AccECN host has also sent or received a SYN or SYN/ACK with | |||
| (AE,CWR,ECE) = (0,0,0). Reason: Such a SYN or SYN/ACK implies | (AE,CWR,ECE) = (0,0,0). Reason: Such a SYN or SYN/ACK implies | |||
| some form of packet mangling might be present. Even if the | some form of packet mangling might be present. Even if the | |||
| remote peer is not setting ECT, it could still be set | remote peer is not setting ECT, it could still be set | |||
| erroneously by packet mangling at the IP layer (see | erroneously by packet mangling at the IP layer (see | |||
| Section 3.2.2.3). In such cases, the Data Sender is best | Section 3.2.2.3). In such cases, the Data Sender is best | |||
| placed to decide whether ECN markings are valid, but it can | placed to decide whether ECN markings are valid, but it can | |||
| only do that if the Data Receiver mechanistically feeds back | only do that if the Data Receiver mechanistically feeds back | |||
| any ECN markings. This approach will not lead to TCP Options | any ECN markings. This approach will not lead to TCP Options | |||
| being generated unnecessarily if the recommended simple scheme | being generated unnecessarily if the recommended simple scheme | |||
| in Section 3.2.3.3 is used, because no byte counters will | in Section 3.2.3.3 is used, because no byte counters will | |||
| change if no packets are set to ECT. | change if no packets are set to ECT. | |||
| - MUST NOT use reception of packets with ECT set in the IP ECN | - MUST NOT use reception of packets with ECT set in the IP-ECN | |||
| field as an implicit signal that the peer is ECN-capable. | field as an implicit signal that the peer is ECN-capable. | |||
| Reason: ECT at the IP layer does not explicitly confirm the | Reason: ECT at the IP layer does not explicitly confirm the | |||
| peer has the correct ECN feedback logic, because the packets | peer has the correct ECN feedback logic, because the packets | |||
| could have been mangled at the IP layer. | could have been mangled at the IP layer. | |||
| 3.2. AccECN Feedback | 3.2. AccECN Feedback | |||
| Each Data Receiver of each half-connection maintains four counters, | Each Data Receiver of each half-connection maintains four counters, | |||
| r.cep, r.ceb, r.e0b, and r.e1b: | r.cep, r.ceb, r.e0b, and r.e1b: | |||
| * The Data Receiver MUST increment the CE packet counter (r.cep), | * The Data Receiver MUST increment the CE packet counter (r.cep), | |||
| for every Acceptable packet that it receives with the CE code | for every Acceptable packet that it receives with the CE code | |||
| point in the IP ECN field, including CE-marked control packets and | point in the IP-ECN field, including CE-marked control packets and | |||
| retransmissions but excluding CE on SYN packets (SYN=1; ACK=0). | retransmissions but excluding CE on SYN packets (SYN=1; ACK=0). | |||
| * A Data Receiver that supports sending of AccECN TCP Options MUST | * A Data Receiver that supports sending of AccECN TCP Options MUST | |||
| increment the r.ceb, r.e0b, or r.e1b byte counters by the number | increment the r.ceb, r.e0b, or r.e1b byte counters by the number | |||
| of TCP payload octets in Acceptable packets marked with the CE, | of TCP payload octets in Acceptable packets marked with the CE, | |||
| ECT(0), and ECT(1) codepoint in their IP ECN field, including any | ECT(0), and ECT(1) codepoint in their IP-ECN field, including any | |||
| payload octets on control packets and retransmissions, but not | payload octets on control packets and retransmissions, but not | |||
| including any payload octets on SYN packets (SYN=1; ACK=0). | including any payload octets on SYN packets (SYN=1; ACK=0). | |||
| Each Data Sender of each half-connection maintains four counters, | Each Data Sender of each half-connection maintains four counters, | |||
| s.cep, s.ceb, s.e0b, and s.e1b, intended to track the equivalent | s.cep, s.ceb, s.e0b, and s.e1b, intended to track the equivalent | |||
| counters at the Data Receiver. | counters at the Data Receiver. | |||
| A Data Receiver feeds back the CE packet counter using the Accurate | A Data Receiver feeds back the CE packet counter using the Accurate | |||
| ECN (ACE) field, as explained in Section 3.2.2. And it optionally | ECN (ACE) field, as explained in Section 3.2.2. And it optionally | |||
| feeds back all the byte counters using the AccECN TCP Option, as | feeds back all the byte counters using the AccECN TCP Option, as | |||
| skipping to change at line 1201 | skipping to change at line 1200 | |||
| Both parts of each of these conditions are equally important. For | Both parts of each of these conditions are equally important. For | |||
| instance, even if AccECN negotiation has been successful, the ACE | instance, even if AccECN negotiation has been successful, the ACE | |||
| field is not defined on any segments with SYN=1 (e.g., a | field is not defined on any segments with SYN=1 (e.g., a | |||
| retransmission of an unacknowledged SYN/ACK, or when both ends send | retransmission of an unacknowledged SYN/ACK, or when both ends send | |||
| SYN/ACKs after AccECN support has been successfully negotiated during | SYN/ACKs after AccECN support has been successfully negotiated during | |||
| a simultaneous open). | a simultaneous open). | |||
| 3.2.2.1. ACE Field on the ACK of the SYN/ACK | 3.2.2.1. ACE Field on the ACK of the SYN/ACK | |||
| A TCP Client (A) in AccECN mode MUST feed back which of the 4 | A TCP Client (A) in AccECN mode MUST feed back which of the 4 | |||
| possible values of the IP ECN field was on the SYN/ACK by writing it | possible values of the IP-ECN field was on the SYN/ACK by writing it | |||
| into the ACE field of a pure ACK with no SACK blocks using the binary | into the ACE field of a pure ACK with no SACK blocks using the binary | |||
| encoding in Table 3 (which is the same as that used on the SYN/ACK in | encoding in Table 3 (which is the same as that used on the SYN/ACK in | |||
| Table 2). This shall be called the "handshake encoding" of the ACE | Table 2). This shall be called the "handshake encoding" of the ACE | |||
| field, and it is the only exception to the rule that the ACE field | field, and it is the only exception to the rule that the ACE field | |||
| carries the 3 least significant bits of the r.cep counter on packets | carries the 3 least significant bits of the r.cep counter on packets | |||
| with SYN=0. | with SYN=0. | |||
| Normally, a TCP Client acknowledges a SYN/ACK with an ACK that | Normally, a TCP Client acknowledges a SYN/ACK with an ACK that | |||
| satisfies the above conditions anyway (SYN=0, no data, no SACK | satisfies the above conditions anyway (SYN=0, no data, no SACK | |||
| blocks). If an AccECN TCP Client intends to acknowledge the SYN/ACK | blocks). If an AccECN TCP Client intends to acknowledge the SYN/ACK | |||
| with a packet that does not satisfy these conditions (e.g., it has | with a packet that does not satisfy these conditions (e.g., it has | |||
| data to include on the ACK), it SHOULD first send a pure ACK that | data to include on the ACK), it SHOULD first send a pure ACK that | |||
| does satisfy these conditions (see Section 5.2), so that it can feed | does satisfy these conditions (see Section 5.2), so that it can feed | |||
| back which of the four values of the IP ECN field arrived on the SYN/ | back which of the four values of the IP-ECN field arrived on the SYN/ | |||
| ACK. A valid exception to this "SHOULD" would be where the | ACK. A valid exception to this "SHOULD" would be where the | |||
| implementation will only be used in an environment where mangling of | implementation will only be used in an environment where mangling of | |||
| the ECN field is unlikely. | the ECN field is unlikely. | |||
| The TCP Client MUST also use the handshake encoding for the pure ACK | The TCP Client MUST also use the handshake encoding for the pure ACK | |||
| of any retransmitted SYN/ACK that confirms that the TCP Server | of any retransmitted SYN/ACK that confirms that the TCP Server | |||
| supports AccECN. If the final ACK of the handshake does not arrive | supports AccECN. If the TCP Server does not receive the final ACK of | |||
| before its retransmission timer expires, the procedure that the TCP | the handshake before its retransmission timer expires, the procedure | |||
| Server will follow is given in Section 3.1.4.2. | for it to follow is given in Section 3.1.4.2. | |||
| +==================+================+=====================+ | +==================+================+=====================+ | |||
| | IP ECN Codepoint | ACE on Pure | r.cep of TCP Client | | | IP-ECN Codepoint | ACE on Pure | r.cep of TCP Client | | |||
| | on SYN/ACK | ACK of SYN/ACK | in AccECN Mode | | | on SYN/ACK | ACK of SYN/ACK | in AccECN Mode | | |||
| +==================+================+=====================+ | +==================+================+=====================+ | |||
| | Not-ECT | 0b010 | 5 | | | Not-ECT | 0b010 | 5 | | |||
| +------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| | ECT(1) | 0b011 | 5 | | | ECT(1) | 0b011 | 5 | | |||
| +------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| | ECT(0) | 0b100 | 5 | | | ECT(0) | 0b100 | 5 | | |||
| +------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| | CE | 0b110 | 6 | | | CE | 0b110 | 6 | | |||
| +------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| Table 3: The Encoding of the ACE Field in the ACK of | Table 3: The Encoding of the ACE Field in the ACK of | |||
| the SYN-ACK to Reflect the SYN-ACK's IP ECN Field | the SYN-ACK to Reflect the SYN-ACK's IP-ECN Field | |||
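
The handshake encoding in Table 3 can be illustrated as a simple lookup. The following is a minimal sketch, not a definitive implementation: the names `HANDSHAKE_ACE` and `client_handshake_feedback` are hypothetical, and the codepoints are represented as strings purely for readability.

```python
# Illustrative sketch only; all names here are hypothetical and not taken
# from the specification.

# Table 3: IP-ECN codepoint seen on the SYN/ACK -> ACE value that the
# TCP Client writes on the pure ACK of the SYN/ACK.
HANDSHAKE_ACE = {
    "Not-ECT": 0b010,
    "ECT(1)": 0b011,
    "ECT(0)": 0b100,
    "CE": 0b110,
}


def client_handshake_feedback(ip_ecn_on_synack: str) -> tuple[int, int]:
    """Return (ACE field, r.cep) for a Client in AccECN mode, per Table 3."""
    ace = HANDSHAKE_ACE[ip_ecn_on_synack]
    # r.cep starts at 5 and is 6 only if the SYN/ACK arrived CE-marked.
    r_cep = 6 if ip_ecn_on_synack == "CE" else 5
    return ace, r_cep
```

For example, a SYN/ACK arriving with ECT(0) would be fed back as ACE = 0b100 with r.cep left at 5.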
| When an AccECN Server in SYN-RCVD state receives a pure ACK with | When an AccECN Server in SYN-RCVD state receives a pure ACK with | |||
| SYN=0 and no SACK blocks, it MUST infer the meaning of each possible | SYN=0 and no SACK blocks, it MUST infer the meaning of each possible | |||
| value of the ACE field from Table 4 instead of treating the ACE field | value of the ACE field from Table 4 instead of treating the ACE field | |||
| as a counter. As a result, an AccECN Server MUST set s.cep to the | as a counter. As a result, an AccECN Server MUST set s.cep to the | |||
| respective value, also shown in Table 4. | respective value, also shown in Table 4. | |||
| Given this encoding of the ACE field on the ACK of a SYN/ACK is | Given this encoding of the ACE field on the ACK of a SYN/ACK is | |||
| exceptional, an AccECN Server using large receive offload (LRO) might | exceptional, an AccECN Server using large receive offload (LRO) might | |||
| prefer to disable LRO until the ACK of the SYN/ACK was sent and it | prefer to disable LRO until it transitions out of SYN-RCVD state | |||
| has transitioned out of SYN-RCVD state. | (when it first receives an ACK that covers the SYN/ACK). | |||
| +============+==========================+=====================+ | +============+==========================+=====================+ | |||
| | ACE on ACK | IP ECN Codepoint on SYN/ | s.cep of TCP Server | | | ACE on ACK | IP-ECN Codepoint on SYN/ | s.cep of TCP Server | | |||
| | of SYN/ACK | ACK Inferred by Server | in AccECN Mode | | | of SYN/ACK | ACK Inferred by Server | in AccECN Mode | | |||
| +============+==========================+=====================+ | +============+==========================+=====================+ | |||
| | 0b000 | {Notes 1, 3} | Disable s.cep | | | 0b000 | {Notes 1, 3} | Disable s.cep | | |||
| +------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| | 0b001 | {Notes 2, 3} | 5 | | | 0b001 | {Notes 2, 3} | 5 | | |||
| +------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| | 0b010 | Not-ECT | 5 | | | 0b010 | Not-ECT | 5 | | |||
| +------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| | 0b011 | ECT(1) | 5 | | | 0b011 | ECT(1) | 5 | | |||
| +------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| skipping to change at line 1323 ¶ | skipping to change at line 1322 ¶ | |||
| Note 3: In the case where a Server that implements AccECN is also | Note 3: In the case where a Server that implements AccECN is also | |||
| using a stateless handshake (termed a SYN cookie), it will | using a stateless handshake (termed a SYN cookie), it will | |||
| not remember whether it entered AccECN mode. The values | not remember whether it entered AccECN mode. The values | |||
| 0b000 or 0b001 will remind it that it did not enter AccECN | 0b000 or 0b001 will remind it that it did not enter AccECN | |||
| mode, because AccECN does not use them (see Section 5.1 for | mode, because AccECN does not use them (see Section 5.1 for | |||
| details). If a Server that uses a stateless handshake and | details). If a Server that uses a stateless handshake and | |||
| implements AccECN receives either of these two values in the | implements AccECN receives either of these two values in the | |||
| ACK, its action is implementation-dependent and outside the | ACK, its action is implementation-dependent and outside the | |||
| scope of this document. It will certainly not take the | scope of this document. It will certainly not take the | |||
| action in the third column because, after it receives either | action in the third column because, after it receives either | |||
| of these values, it is not in AccECN mode. For example, it | of these values, it is not in AccECN mode. That is, it will | |||
| will not disable ECN (at least not just because ACE is | not disable ECN (at least not just because ACE is 0b000) and | |||
| 0b000) and it will not set s.cep. | it will not set s.cep. | |||
| 3.2.2.2. Encoding and Decoding Feedback in the ACE Field | 3.2.2.2. Encoding and Decoding Feedback in the ACE Field | |||
| Whenever the Data Receiver sends an ACK with SYN=0 (with or without | Whenever the Data Receiver sends an ACK with SYN=0 (with or without | |||
| data), unless the handshake encoding in Section 3.2.2.1 applies, the | data), unless the handshake encoding in Section 3.2.2.1 applies, the | |||
| Data Receiver MUST encode the least significant 3 bits of its r.cep | Data Receiver MUST encode the least significant 3 bits of its r.cep | |||
| counter into the ACE field (see Appendix A.2). | counter into the ACE field (see Appendix A.2). | |||
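
Outside the handshake, the encoding rule above amounts to masking the counter to 3 bits. A minimal sketch, assuming r.cep is held as a non-negative integer (the function name `ace_from_r_cep` is hypothetical):

```python
# Illustrative sketch only; the function name is hypothetical.

def ace_from_r_cep(r_cep: int) -> int:
    """Return the 3 least significant bits of r.cep for the ACE field."""
    return r_cep & 0b111
```

Because only 3 bits are carried, the value wraps every 8 increments: an r.cep of 5 and an r.cep of 13 both produce the same ACE value, which is why the Data Sender needs the safety procedures against ambiguity described later.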
| Whenever the Data Sender receives an ACK with SYN=0 (with or without | Whenever the Data Sender receives an ACK with SYN=0 (with or without | |||
| data), it first checks whether it has already been superseded | data), it first checks whether it has already been superseded | |||
| skipping to change at line 1355 ¶ | skipping to change at line 1354 ¶ | |||
| * It then follows the safety procedures in Section 3.2.2.5.2 to | * It then follows the safety procedures in Section 3.2.2.5.2 to | |||
| calculate or estimate how many packets the ACK could have | calculate or estimate how many packets the ACK could have | |||
| acknowledged under the prevailing conditions to determine whether | acknowledged under the prevailing conditions to determine whether | |||
| the ACE field might have wrapped more than once. | the ACE field might have wrapped more than once. | |||
| The encode/decode procedures during the three-way handshake are | The encode/decode procedures during the three-way handshake are | |||
| exceptions to the general rules given so far, so they are spelled out | exceptions to the general rules given so far, so they are spelled out | |||
| step by step below for clarity: | step by step below for clarity: | |||
| * If a TCP Server in AccECN mode receives a CE mark in the IP ECN | * If a TCP Server in AccECN mode receives a CE mark in the IP-ECN | |||
| field of a SYN (SYN=1, ACK=0), it MUST NOT increment r.cep (it | field of a SYN (SYN=1, ACK=0), it MUST NOT increment r.cep (it | |||
| remains at its initial value of 5). | remains at its initial value of 5). | |||
| Reason: It would be redundant for the Server to include CE-marked | Reason: It would be redundant for the Server to include CE-marked | |||
| SYNs in its r.cep counter, because it already reliably delivers | SYNs in its r.cep counter, because it already reliably delivers | |||
| feedback of any CE marking using the encoding in the top block of | feedback of any CE marking using the encoding in the top block of | |||
| Table 2 in the SYN/ACK. This also ensures that, when the Server | Table 2 in the SYN/ACK. This also ensures that, when the Server | |||
| starts using the ACE field, it has not unnecessarily consumed more | starts using the ACE field, it has not unnecessarily consumed more | |||
| than one initial value, given they can be used to negotiate | than one initial value, given they can be used to negotiate | |||
| variants of the AccECN protocol (see Appendix B.3). | variants of the AccECN protocol (see Appendix B.3). | |||
| * If a TCP Client in AccECN mode receives CE feedback in the TCP | * If a TCP Client in AccECN mode receives CE feedback in the TCP | |||
| flags of a SYN/ACK, it MUST NOT increment s.cep (it remains at its | flags of a SYN/ACK, it MUST NOT increment s.cep (it remains at its | |||
| initial value of 5) so that it stays in step with r.cep on the | initial value of 5) so that it stays in step with r.cep on the | |||
| Server. Nonetheless, the TCP Client still triggers the congestion | Server. Nonetheless, the TCP Client still triggers the congestion | |||
| control actions necessary to respond to the CE feedback. | control actions necessary to respond to the CE feedback. | |||
| * If a TCP Client in AccECN mode receives a CE mark in the IP ECN | * If a TCP Client in AccECN mode receives a CE mark in the IP-ECN | |||
| field of a SYN/ACK, it MUST increment r.cep, but no more than once | field of a SYN/ACK, it MUST increment r.cep, but no more than once | |||
| no matter how many CE-marked SYN/ACKs it receives (i.e., | no matter how many CE-marked SYN/ACKs it receives (i.e., | |||
| incremented from 5 to 6, but no further). | incremented from 5 to 6, but no further). | |||
| Reason: Incrementing r.cep ensures the Client will eventually | Reason: Incrementing r.cep ensures the Client will eventually | |||
| deliver any CE marking to the Server reliably when it starts using | deliver any CE marking to the Server reliably when it starts using | |||
| the ACE field. Even though the Client also feeds back any CE | the ACE field. Even though the Client also feeds back any CE | |||
| marking on the ACK of the SYN/ACK using the encoding in Table 3, | marking on the ACK of the SYN/ACK using the encoding in Table 3, | |||
| this ACK is not delivered reliably, so it can be considered as a | this ACK is not delivered reliably, so it can be considered as a | |||
| timely notification that is redundant but unreliable. The Client | timely notification that is redundant but unreliable. The Client | |||
| skipping to change at line 1418 ¶ | skipping to change at line 1417 ¶ | |||
| ACK of the SYN/ACK) that is delayed for longer than the Server's | ACK of the SYN/ACK) that is delayed for longer than the Server's | |||
| retransmission timeout; or packet duplication by the network. And | retransmission timeout; or packet duplication by the network. And | |||
| the impact of any error in the feedback on such ACKs will only be | the impact of any error in the feedback on such ACKs will only be | |||
| temporary. | temporary. | |||
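
The step-by-step handshake rules listed earlier in this section (for CE marks on the SYN and on the SYN/ACK) can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the class and function names are hypothetical, and the boolean return value merely signals that a congestion response is called for, without modelling what that response is.

```python
# Illustrative sketch only; all names are hypothetical.

INITIAL_CEP = 5  # initial value of r.cep and s.cep given in the text


def server_r_cep_after_syn(syn_is_ce_marked: bool) -> int:
    """A Server MUST NOT increment r.cep for a CE-marked SYN."""
    # The CE mark is fed back in the TCP flags of the SYN/ACK instead,
    # so r.cep stays at its initial value whatever the marking.
    return INITIAL_CEP


class ClientHandshakeCounters:
    """Tracks a Client's r.cep and s.cep while SYN/ACKs arrive."""

    def __init__(self) -> None:
        self.r_cep = INITIAL_CEP
        self.s_cep = INITIAL_CEP
        self._synack_ce_counted = False

    def on_synack(self, ip_ecn_is_ce: bool, tcp_flags_feed_back_ce: bool) -> bool:
        """Process one SYN/ACK; return True if a congestion response is needed."""
        # CE feedback in the TCP flags: s.cep is NOT incremented, so it stays
        # in step with r.cep on the Server, but the Client still responds.
        needs_congestion_response = tcp_flags_feed_back_ce
        # CE mark in the IP-ECN field: increment r.cep at most once
        # (5 -> 6), however many CE-marked SYN/ACKs arrive.
        if ip_ecn_is_ce and not self._synack_ce_counted:
            self.r_cep += 1
            self._synack_ce_counted = True
        return needs_congestion_response
```

With this sketch, two CE-marked SYN/ACKs in a row leave r.cep at 6 and s.cep at 5.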
| 3.2.2.3. Testing for Mangling of the IP/ECN Field | 3.2.2.3. Testing for Mangling of the IP/ECN Field | |||
| * TCP Client side: | * TCP Client side: | |||
| The value of the TCP-ECN flags on the SYN/ACK indicates the value | The value of the TCP-ECN flags on the SYN/ACK indicates the value | |||
| of the IP ECN field when the SYN arrived at the Server. The TCP | of the IP-ECN field when the SYN arrived at the Server. The TCP | |||
| Client can compare this with how it originally set the IP ECN | Client can compare this with how it originally set the IP-ECN | |||
| field on the SYN. If this comparison implies an invalid | field on the SYN. If this comparison implies an invalid | |||
| transition (defined below) of the IP ECN field, for the remainder | transition (defined below) of the IP-ECN field, for the remainder | |||
| of the half-connection the Client is advised to send non-ECN- | of the half-connection the Client is advised to send non-ECN- | |||
| capable packets, but it still ought to respond to any feedback of | capable packets, but it still ought to respond to any feedback of | |||
| CE markings (explained below). However, the TCP Client MUST | CE markings (explained below). However, the TCP Client MUST | |||
| remain in the AccECN feedback mode and it MUST continue to feed | remain in the AccECN feedback mode and it MUST continue to feed | |||
| back any ECN markings on arriving packets (in its role as Data | back any ECN markings on arriving packets (in its role as Data | |||
| Receiver). | Receiver). | |||
| * TCP Server side: | * TCP Server side: | |||
| The value of the ACE field on the last ACK of the three-way | The value of the ACE field on the last ACK of the three-way | |||
| handshake indicates the value of the IP ECN field when the SYN/ACK | handshake indicates the value of the IP-ECN field when the SYN/ACK | |||
| arrived at the TCP Client. The Server can compare this with how | arrived at the TCP Client. The Server can compare this with how | |||
| it originally set the IP ECN field on the SYN/ACK. If this | it originally set the IP-ECN field on the SYN/ACK. If this | |||
| comparison implies an invalid transition of the IP ECN field, for | comparison implies an invalid transition of the IP-ECN field, for | |||
| the remainder of the half-connection the Server is advised to send | the remainder of the half-connection the Server is advised to send | |||
| non-ECN-capable packets, but it still ought to respond to any | non-ECN-capable packets, but it still ought to respond to any | |||
| feedback of CE markings (explained below). However, the Server | feedback of CE markings (explained below). However, the Server | |||
| MUST remain in the AccECN feedback mode and it MUST continue to | MUST remain in the AccECN feedback mode and it MUST continue to | |||
| feed back any ECN markings on arriving packets (in its role as | feed back any ECN markings on arriving packets (in its role as | |||
| Data Receiver). | Data Receiver). | |||
| If a Data Sender in AccECN mode starts sending non-ECN-capable | If a Data Sender in AccECN mode starts sending non-ECN-capable | |||
| packets because it has detected mangling, it is still advised to | packets because it has detected mangling, it is still advised to | |||
| respond to CE feedback. Reason: Any CE marking arriving at the Data | respond to CE feedback. Reason: Any CE marking arriving at the Data | |||
| Receiver could be due to something early in the path mangling the | Receiver could be due to something early in the path mangling the | |||
| non-ECN-capable IP ECN field into an ECN-capable codepoint and then, | non-ECN-capable IP-ECN field into an ECN-capable codepoint and then, | |||
| later in the path, a network bottleneck might be applying CE markings | later in the path, a network bottleneck might be applying CE markings | |||
| to indicate genuine congestion. This argument applies whether the | to indicate genuine congestion. This argument applies whether the | |||
| handshake packet originally sent by the TCP Client or Server was non- | handshake packet originally sent by the TCP Client or Server was non- | |||
| ECN-capable or ECN-capable because, in either case, an unsafe | ECN-capable or ECN-capable because, in either case, an unsafe | |||
| transition could imply that non-ECN-capable packets later in the | transition could imply that non-ECN-capable packets later in the | |||
| connection might get mangled. | connection might get mangled. | |||
| Once a Data Sender has entered AccECN mode it is advised to check | Once a Data Sender has entered AccECN mode it is advised to check | |||
| whether it is receiving continuous feedback of CE. Specifying | whether it is receiving continuous feedback of CE. Specifying | |||
| exactly how to do this is beyond the scope of the present | exactly how to do this is beyond the scope of the present | |||
| skipping to change at line 1470 ¶ | skipping to change at line 1469 ¶ | |||
| marking. If continuous CE marking is detected, for the remainder of | marking. If continuous CE marking is detected, for the remainder of | |||
| the half-connection, the Data Sender ought to send non-ECN-capable | the half-connection, the Data Sender ought to send non-ECN-capable | |||
| packets, and it is advised not to respond to any feedback of CE | packets, and it is advised not to respond to any feedback of CE | |||
| markings. The Data Sender might occasionally test whether it can | markings. The Data Sender might occasionally test whether it can | |||
| resume sending ECN-capable packets. | resume sending ECN-capable packets. | |||
| The above advice on switching to sending non-ECN-capable packets but | The above advice on switching to sending non-ECN-capable packets but | |||
| still responding to CE markings unless they become continuous is not | still responding to CE markings unless they become continuous is not | |||
| stated normatively (in capitals), because the best strategy might | stated normatively (in capitals), because the best strategy might | |||
| depend on experience of the most likely types of mangling, which can | depend on experience of the most likely types of mangling, which can | |||
| only be known at the time of deployment. The same is true for other | only be known at the time of deployment. For instance, later in a | |||
| forms of mangling (or resumption of expected marking) during later | connection, sender implementations might need to detect the onset (or | |||
| stages of a connection. | the end) of mangling and stop (or start) sending ECN-capable packets | |||
| | accordingly. | |||
| As always, once a host has entered AccECN mode, it follows the | As always, once a host has entered AccECN mode, it follows the | |||
| general mandatory requirements (Section 3.1.5) to remain in the same | general mandatory requirements (Section 3.1.5) to remain in the same | |||
| feedback mode and to continue feeding back any ECN markings on | feedback mode and to continue feeding back any ECN markings on | |||
| arriving packets using AccECN feedback. This follows the general | arriving packets using AccECN feedback. This follows the general | |||
| approach where an AccECN Data Receiver mechanistically reflects | approach where an AccECN Data Receiver mechanistically reflects | |||
| whatever it receives (Section 2.5). | whatever it receives (Section 2.5). | |||
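
The advice in this section separates three things a host controls: whether it sends ECN-capable packets, whether it responds to CE feedback, and whether it keeps feeding back AccECN. A minimal sketch of that separation follows; the class `SenderEcnPolicy` and its method names are hypothetical, and how mangling or continuous CE marking is detected is deliberately left out because the text leaves it to implementations.

```python
# Illustrative sketch only; all names are hypothetical.
from dataclasses import dataclass


@dataclass
class SenderEcnPolicy:
    send_ect: bool = True        # send ECN-capable packets on this half-connection
    respond_to_ce: bool = True   # react to CE feedback as congestion
    accecn_feedback: bool = True  # mandatory: never switched off by these events

    def on_invalid_transition(self) -> None:
        # Advised: stop sending ECN-capable packets, but keep responding
        # to any CE feedback, which might still reflect real congestion.
        self.send_ect = False

    def on_continuous_ce(self) -> None:
        # Advised: stop sending ECN-capable packets and do not respond
        # to CE feedback, since it no longer carries information.
        self.send_ect = False
        self.respond_to_ce = False
```

Note that neither event touches `accecn_feedback`, reflecting the requirement that the host remains in AccECN feedback mode in its role as Data Receiver.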
| The ACK of the SYN/ACK is not reliably delivered (nonetheless, the | The ACK of the SYN/ACK is not reliably delivered (nonetheless, the | |||
| count of CE marks is still eventually delivered reliably). If this | count of CE marks is still eventually delivered reliably). If this | |||
| ACK does not arrive, the Server is advised to continue to send ECN- | ACK does not arrive, the Server is advised to continue to send ECN- | |||
| capable packets without having tested for mangling of the IP ECN | capable packets without having tested for mangling of the IP-ECN | |||
| field on the SYN/ACK. | field on the SYN/ACK. | |||
| All the fall-back behaviours in this section are necessary in case | All the fall-back behaviours in this section are necessary in case | |||
| mangling of the IP ECN field is asymmetric, which is currently common | mangling of the IP-ECN field is asymmetric, which is currently common | |||
| over some mobile networks [Mandalari18]. In this case, one end might | over some mobile networks [Mandalari18]. In this case, one end might | |||
| see no unsafe transition and continue sending ECN-capable packets, | see no unsafe transition and continue sending ECN-capable packets, | |||
| while the other end sees an unsafe transition and stops sending ECN- | while the other end sees an unsafe transition and stops sending ECN- | |||
| capable packets. | capable packets. | |||
| Invalid transitions of the IP ECN field are defined in Section 18 of | Invalid transitions of the IP-ECN field are defined in Section 18 of | |||
| the Classic ECN specification [RFC3168] and repeated here for | the Classic ECN specification [RFC3168] and repeated here for | |||
| convenience: | convenience: | |||
| * the Not-ECT codepoint changes; | * the Not-ECT codepoint changes; | |||
| * either ECT codepoint transitions to Not-ECT; | * either ECT codepoint transitions to Not-ECT; | |||
| * the CE codepoint changes. | * the CE codepoint changes. | |||
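
The three invalid transitions listed above can be checked with a small predicate. This is a sketch only: the function name `is_invalid_transition` is hypothetical, codepoints are represented as strings for readability, and a change between ECT(0) and ECT(1) is treated as not invalid simply because it does not appear in the list above.

```python
# Illustrative sketch only; the function name is hypothetical.

ECT_CODEPOINTS = ("ECT(0)", "ECT(1)")


def is_invalid_transition(sent: str, received: str) -> bool:
    """Return True if sent -> received is one of the three listed transitions."""
    if sent == received:
        return False
    if sent == "Not-ECT":
        return True   # the Not-ECT codepoint changes
    if sent == "CE":
        return True   # the CE codepoint changes
    if sent in ECT_CODEPOINTS and received == "Not-ECT":
        return True   # either ECT codepoint transitions to Not-ECT
    return False      # e.g. ECT -> CE is ordinary congestion marking
```

A Client would call this with the codepoint it set on the SYN and the codepoint reported back in the TCP-ECN flags of the SYN/ACK; a Server would do likewise for the SYN/ACK and the ACE field of the final ACK.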
| RFC 3168 says that a router that changes ECT to Not-ECT is invalid | RFC 3168 says that a router that changes ECT to Not-ECT is invalid | |||
| skipping to change at line 1543 ¶ | skipping to change at line 1543 ¶ | |||
| * a broken remote TCP implementation; | * a broken remote TCP implementation; | |||
| * potential mangling of the ECN fields in the TCP headers (although | * potential mangling of the ECN fields in the TCP headers (although | |||
| unlikely given they clearly survived during the handshake). | unlikely given they clearly survived during the handshake). | |||
| This advice is not stated normatively (in capitals), because the best | This advice is not stated normatively (in capitals), because the best | |||
| strategy might depend on the likelihood to experience these | strategy might depend on the likelihood to experience these | |||
| scenarios, which can only be known at the time of deployment. | scenarios, which can only be known at the time of deployment. | |||
| Note that a host in AccECN mode MUST continue to provide Accurate ECN | | Note that a host in AccECN mode MUST continue to provide | |||
| feedback to its peer, even if it is no longer sending ECT itself over | | Accurate ECN feedback to its peer, even if it is no longer | |||
| the other half-connection. | | sending ECT itself over the other half-connection. | |||
| If reordering occurs, the first feedback packet that arrives will not | If reordering occurs, the first feedback packet that arrives will not | |||
| necessarily be the same as the first packet in sequence order. The | necessarily be the same as the first packet in sequence order. The | |||
| test has been specified loosely like this to simplify implementation, | test has been specified loosely like this to simplify implementation, | |||
| and because it would not have been any more precise to have specified | and because it would not have been any more precise to have specified | |||
| the first packet in sequence order, which would not necessarily be | the first packet in sequence order, which would not necessarily be | |||
| the first ACE counter that the Data Receiver fed back anyway, given | the first ACE counter that the Data Receiver fed back anyway, given | |||
| it might have been a retransmission. | it might have been a retransmission. | |||
| The possibility of reordering means that there is a small chance that | The possibility of reordering means that there is a small chance that | |||
| the ACE field on the first packet to arrive is genuinely zero | the ACE field on the first packet to arrive is genuinely zero | |||
| (without middlebox interference). This would cause a host to | (without middlebox interference). This would cause a host to | |||
| unnecessarily disable ECN for a half-connection. Therefore, in | unnecessarily disable ECN for a half-connection. Therefore, in | |||
| environments where there is no evidence of the ACE field being | environments where there is no evidence of the ACE field being | |||
| zeroed, implementations MAY skip this test. | zeroed, implementations MAY skip this test. | |||
| Note that the Data Sender MUST NOT test whether the arriving counter | | Note that the Data Sender MUST NOT test whether the arriving | |||
| in the initial ACE field has been initialized to a specific valid | | counter in the initial ACE field has been initialized to a | |||
| value -- the above check solely tests whether the ACE fields have | | specific valid value -- the above check solely tests whether | |||
| been incorrectly zeroed. This allows hosts to use different initial | | the ACE fields have been incorrectly zeroed. This allows hosts | |||
| values as an additional signalling channel in the future. | | to use different initial values as an additional signalling | |||
| | | channel in the future. | |||
| 3.2.2.5. Safety Against Ambiguity of the ACE Field | 3.2.2.5. Safety Against Ambiguity of the ACE Field | |||
| If too many CE-marked segments are acknowledged at once, or if a long | If too many CE-marked segments are acknowledged at once, or if a long | |||
| run of ACKs is lost or thinned out, the 3-bit counter in the ACE | run of ACKs is lost or thinned out, the 3-bit counter in the ACE | |||
| field might have cycled between two ACKs arriving at the Data Sender. | field might have cycled between two ACKs arriving at the Data Sender. | |||
| The following safety procedures minimize this ambiguity. | The following safety procedures minimize this ambiguity. | |||
| 3.2.2.5.1. Packet Receiver Safety Procedures | 3.2.2.5.1. Packet Receiver Safety Procedures | |||
| skipping to change at line 1622 ¶ | skipping to change at line 1623 ¶ | |||
| Even if a number of data packets do not arrive as one event, the | Even if a number of data packets do not arrive as one event, the | |||
| 'Change-Triggered ACKs' rule could sometimes cause the ACK rate to be | 'Change-Triggered ACKs' rule could sometimes cause the ACK rate to be | |||
| problematic for high performance (although high performance protocols | problematic for high performance (although high performance protocols | |||
| such as DCTCP already successfully use change-triggered ACKs). The | such as DCTCP already successfully use change-triggered ACKs). The | |||
| rationale for change-triggered ACKs is so that the Data Sender can | rationale for change-triggered ACKs is so that the Data Sender can | |||
| rely on them to detect queue growth as soon as possible, particularly | rely on them to detect queue growth as soon as possible, particularly | |||
| at the start of a flow. The approach can lead to some additional | at the start of a flow. The approach can lead to some additional | |||
| ACKs but it feeds back the timing and the order in which ECN marks | ACKs but it feeds back the timing and the order in which ECN marks | |||
| are received with minimal additional complexity. If CE marks are | are received with minimal additional complexity. If CE marks are | |||
| infrequent, as is the case for most Active Queue Management (AQM) | infrequent, as is the case for most Active Queue Management (AQM) | |||
| packet schedulers at the time of writing, or there are multiple marks | algorithms at the time of writing, or there are multiple marks in a | |||
| in a row, the additional load will be low. However, marking patterns | row, the additional load will be low. However, marking patterns with | |||
| with numerous non-contiguous CE marks could increase the load | numerous non-contiguous CE marks could increase the load | |||
| significantly. One possible compromise would be for the receiver to | significantly. One possible compromise would be for the receiver to | |||
| heuristically detect whether the sender is in slow-start, then to | heuristically detect whether the sender is in slow-start, then to | |||
| implement change-triggered ACKs while the sender is in slow-start, | implement change-triggered ACKs while the sender is in slow-start, | |||
| and offload otherwise. | and offload otherwise. | |||
| In a scenario where both endpoints support AccECN, if host B has | In a scenario where both endpoints support AccECN, if host B has | |||
| chosen to use ECN-capable pure ACKs (as allowed in [RFC8311] | chosen to use ECN-capable pure ACKs (as allowed in [RFC8311] | |||
| experiments) and enough of these ACKs become CE marked, then the | experiments) and enough of these ACKs become CE marked, then the | |||
| 'Increment-Triggered ACKs' rule ensures that its peer (host A) gives | 'Increment-Triggered ACKs' rule ensures that its peer (host A) gives | |||
| B sufficient feedback about this congestion on the ACKs from B to A. | B sufficient feedback about this congestion on the ACKs from B to A. | |||
| skipping to change at line 1864 ¶ | skipping to change at line 1865 ¶ | |||
| AccECN Options. To expedite connection setup in deployment scenarios | AccECN Options. To expedite connection setup in deployment scenarios | |||
| where AccECN path traversal might be problematic, the TCP Server | where AccECN path traversal might be problematic, the TCP Server | |||
| SHOULD retransmit the SYN/ACK, but with no AccECN Option. If this | SHOULD retransmit the SYN/ACK, but with no AccECN Option. If this | |||
| retransmission times out, to expedite connection setup, the TCP | retransmission times out, to expedite connection setup, the TCP | |||
| Server SHOULD retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and | Server SHOULD retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and | |||
| no AccECN Option, but it remains in AccECN feedback mode (per | no AccECN Option, but it remains in AccECN feedback mode (per | |||
| Section 3.1.5). | Section 3.1.5). | |||
| | Note that a retransmitted AccECN SYN/ACK will not necessarily | | Note that a retransmitted AccECN SYN/ACK will not necessarily | |||
| | have the same TCP-ECN flags as the original SYN/ACK, because it | | have the same TCP-ECN flags as the original SYN/ACK, because it | |||
| | feeds back the IP ECN field of the latest SYN to have arrived | | feeds back the IP-ECN field of the latest SYN to have arrived | |||
| | (by the rule in Section 3.1.5). | | (by the rule in Section 3.1.5). | |||
| The above fall-back approach limits any interference by middleboxes | The above fall-back approach limits any interference by middleboxes | |||
| that might drop packets with unknown options, even though it is more | that might drop packets with unknown options, even though it is more | |||
| likely that SYN/ACK loss is due to congestion. The TCP Server MAY | likely that SYN/ACK loss is due to congestion. The TCP Server MAY | |||
| try to send another packet with an AccECN Option at a later point | try to send another packet with an AccECN Option at a later point | |||
| during the connection but it ought to monitor if that packet got lost | during the connection but it ought to monitor if that packet got lost | |||
| as well, in which case it SHOULD disable the sending of AccECN | as well, in which case it SHOULD disable the sending of AccECN | |||
| Options for this half-connection. | Options for this half-connection. | |||
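The SYN/ACK fall-back sequence described above can be sketched as a small decision function. This is only an illustration, not part of either version of the specification. The names `SynAckPlan` and `plan_synack` are hypothetical, and the sketch assumes that the original SYN/ACK carried an AccECN Option.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SynAckPlan:
    """Hypothetical description of how to build the next SYN/ACK."""
    include_accecn_option: bool
    # None means "encode the usual AccECN handshake feedback in the flags";
    # a tuple forces (AE, CWR, ECE) to the given values.
    forced_flags: Optional[Tuple[int, int, int]]
    # The Server stays in AccECN feedback mode throughout the fall-back.
    accecn_feedback_mode: bool = True


def plan_synack(timeouts: int) -> SynAckPlan:
    """Choose the SYN/ACK form after `timeouts` retransmission timeouts."""
    if timeouts < 0:
        raise ValueError("timeouts must be non-negative")
    if timeouts == 0:
        # Assumption: the first SYN/ACK carries an AccECN Option.
        return SynAckPlan(include_accecn_option=True, forced_flags=None)
    if timeouts == 1:
        # First retransmission: drop the option, keep the AccECN flags.
        return SynAckPlan(include_accecn_option=False, forced_flags=None)
    # Later retransmissions: also clear (AE, CWR, ECE).
    return SynAckPlan(include_accecn_option=False, forced_flags=(0, 0, 0))
```

In this sketch the flags on a retransmission with `forced_flags=None` are recomputed from the latest SYN to have arrived, so they need not match those on the original SYN/ACK.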
| skipping to change at line 1973 ¶ | skipping to change at line 1974 ¶ | |||
| * the TCP Client MAY check that the initial value of the EE0B field | * the TCP Client MAY check that the initial value of the EE0B field | |||
| or the EE1B field is non-zero on the SYN/ACK. If it runs a test | or the EE1B field is non-zero on the SYN/ACK. If it runs a test | |||
| and either initial value is zero, the Client will switch into a | and either initial value is zero, the Client will switch into a | |||
| mode that ignores AccECN Options for this half-connection. | mode that ignores AccECN Options for this half-connection. | |||
| While a host is in the mode that ignores AccECN Options, it MUST | While a host is in the mode that ignores AccECN Options, it MUST | |||
| adopt the conservative interpretation of the ACE field discussed in | adopt the conservative interpretation of the ACE field discussed in | |||
| Section 3.2.2.5. | Section 3.2.2.5. | |||
| Note that the Data Sender MUST NOT test whether the arriving byte | | Note that the Data Sender MUST NOT test whether the arriving | |||
| counters in an initial AccECN Option have been initialized to | | byte counters in an initial AccECN Option have been initialized | |||
| specific valid values -- the above checks solely test whether these | | to specific valid values -- the above checks solely test | |||
| fields have been incorrectly zeroed. This allows hosts to use | | whether these fields have been incorrectly zeroed. This allows | |||
| different initial values as an additional signalling channel in the | | hosts to use different initial values as an additional | |||
| future. Also note that the initial value of either field might be | | signalling channel in the future. Also note that the initial | |||
| greater than its expected initial value, because the counters might | | value of either field might be greater than its expected | |||
| already have been incremented. Nonetheless, the initial values of | | initial value, because the counters might already have been | |||
| the counters have been chosen so that they cannot wrap to zero on | | incremented. Nonetheless, the initial values of the counters | |||
| these initial segments. | | have been chosen so that they cannot wrap to zero on these | |||
| | initial segments. | ||||
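The optional zeroing test on the SYN/ACK might look like the following sketch. The function name is hypothetical, and representing a field that is absent from the option as `None` is an assumption of this example. The function deliberately checks only for zero and never compares against a specific expected initial value.

```python
from typing import Optional


def should_ignore_accecn_options(ee0b: Optional[int],
                                 ee1b: Optional[int]) -> bool:
    """Return True if the Client should ignore AccECN Options.

    ee0b and ee1b are the initial EE0B and EE1B values seen on the
    SYN/ACK, or None if that field was not present (an assumption of
    this sketch).
    """
    present = [value for value in (ee0b, ee1b) if value is not None]
    # Only test for incorrect zeroing. Any non-zero value passes, so
    # other initial values remain usable as a future signalling channel.
    return any(value == 0 for value in present)
```

A host for which this returns True would then also adopt the conservative interpretation of the ACE field.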
| 3.2.3.2.5. Consistency Between AccECN Feedback Fields | 3.2.3.2.5. Consistency Between AccECN Feedback Fields | |||
| When AccECN Options are available, they ought to provide more | When AccECN Options are available, they ought to provide more | |||
| unambiguous feedback. However, they supplement but do not replace | unambiguous feedback. However, they supplement but do not replace | |||
| the ACE field. An endpoint using AccECN feedback MUST always | the ACE field. An endpoint using AccECN feedback MUST always | |||
| reconcile the information provided in the ACE field with that in any | reconcile the information provided in the ACE field with that in any | |||
| AccECN Option, so that the state of the ACE-related packet counter | AccECN Option, so that the state of the ACE-related packet counter | |||
| can be relied on if future feedback does not carry an AccECN Option. | can be relied on if future feedback does not carry an AccECN Option. | |||
| skipping to change at line 2008 ¶ | skipping to change at line 2010 ¶ | |||
| could also occur if a middlebox mangled an AccECN Option but not the | could also occur if a middlebox mangled an AccECN Option but not the | |||
| ACE field. However, the Data Sender has to assume that the integrity | ACE field. However, the Data Sender has to assume that the integrity | |||
| of AccECN Options is sound, based on the above test of the well-known | of AccECN Options is sound, based on the above test of the well-known | |||
| initial values and optionally other integrity tests (Section 5.3). | initial values and optionally other integrity tests (Section 5.3). | |||
| If either endpoint detects that the s.ceb counter has increased but | If either endpoint detects that the s.ceb counter has increased but | |||
| the s.cep has not (and by testing ACK coverage it is certain how much | the s.cep has not (and by testing ACK coverage it is certain how much | |||
| the ACE field has wrapped), and if there is no explanation other than | the ACE field has wrapped), and if there is no explanation other than | |||
| an invalid protocol transition due to some form of feedback mangling, | an invalid protocol transition due to some form of feedback mangling, | |||
| the Data Sender MUST disable sending ECN-capable packets for the | the Data Sender MUST disable sending ECN-capable packets for the | |||
| remainder of the half-connection by setting the IP ECN field in all | remainder of the half-connection by setting the IP-ECN field in all | |||
| subsequent packets to Not-ECT. | subsequent packets to Not-ECT. | |||
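The condition in the paragraph above can be written as a single predicate. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the two boolean inputs stand in for the ACK coverage test and for whatever other checks an implementation uses to rule out benign explanations.

```python
def must_disable_ecn(ceb_increase: int,
                     cep_increase: int,
                     ace_wrap_certain: bool,
                     other_explanation: bool) -> bool:
    """Return True if the sender has to fall back to Not-ECT.

    ceb_increase / cep_increase: growth of s.ceb and s.cep since the
    previous feedback (hypothetical inputs).
    ace_wrap_certain: ACK coverage shows how much the ACE field wrapped.
    other_explanation: some valid cause other than mangling exists.
    """
    byte_counter_moved = ceb_increase > 0
    packet_counter_stuck = cep_increase == 0
    return (byte_counter_moved
            and packet_counter_stuck
            and ace_wrap_certain
            and not other_explanation)
```

When the predicate holds, all subsequent packets for the rest of the half-connection would be sent with the IP-ECN field set to Not-ECT.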
| 3.2.3.3. Usage of the AccECN TCP Option | 3.2.3.3. Usage of the AccECN TCP Option | |||
| If a Data Receiver in AccECN mode intends to use AccECN TCP Options | If a Data Receiver in AccECN mode intends to use AccECN TCP Options | |||
| to provide feedback, the rules below determine when to include an | to provide feedback, the rules below determine when to include an | |||
| AccECN TCP Option, and which fields to include, given other options | AccECN TCP Option, and which fields to include, given other options | |||
| might be competing for limited option space: | might be competing for limited option space: | |||
| Importance of Congestion Control: AccECN is for congestion control, | Importance of Congestion Control: AccECN is for congestion control, | |||
| skipping to change at line 2100 ¶ | skipping to change at line 2102 ¶ | |||
| is to be included. | is to be included. | |||
| The recommended scheme is intended as a simple way to ensure that all | The recommended scheme is intended as a simple way to ensure that all | |||
| the relevant byte counters will be carried on any ACK that reaches | the relevant byte counters will be carried on any ACK that reaches | |||
| the Data Sender, no matter how many pure ACKs are filtered or | the Data Sender, no matter how many pure ACKs are filtered or | |||
| coalesced along the network path, and without consuming the space | coalesced along the network path, and without consuming the space | |||
| available for payload data with counter field(s) that have never | available for payload data with counter field(s) that have never | |||
| changed. | changed. | |||
| As an example of the recommended scheme, if ECT(0) is the only | As an example of the recommended scheme, if ECT(0) is the only | |||
| codepoint that has ever arrived in the IP ECN field, the Data | codepoint that has ever arrived in the IP-ECN field, the Data | |||
| Receiver will feed back an AccECN0 TCP Option with only the EE0B | Receiver will feed back an AccECN0 TCP Option with only the EE0B | |||
| field on every packet that acknowledges new data. However, as soon | field on every packet that acknowledges new data. However, as soon | |||
| as even one CE-marked packet arrives, on every packet that | as even one CE-marked packet arrives, on every packet that | |||
| acknowledges new data it will start to include an option with two | acknowledges new data it will start to include an option with two | |||
| fields, EE0B and ECEB. As a second example, if the first packet to | fields, EE0B and ECEB. As a second example, if the first packet to | |||
| arrive happens to be CE marked, the Data Receiver will have to | arrive happens to be CE marked, the Data Receiver will have to | |||
| arbitrarily choose whether to precede the ECEB field with an EE0B | arbitrarily choose whether to precede the ECEB field with an EE0B | |||
| field or an EE1B field. If it chooses, say, EE0B but it turns out | field or an EE1B field. If it chooses, say, EE0B but it turns out | |||
| never to receive ECT(0), it can start sending EE1B and ECEB instead | never to receive ECT(0), it can start sending EE1B and ECEB instead | |||
| -- it does not have to include the EE0B field if the r.e0b counter | -- it does not have to include the EE0B field if the r.e0b counter | |||
| skipping to change at line 2149 ¶ | skipping to change at line 2151 ¶ | |||
| on each side complied with the present AccECN specification and each | on each side complied with the present AccECN specification and each | |||
| side negotiated AccECN independently of the other side. | side negotiated AccECN independently of the other side. | |||
| 3.3.2. Requirements for Transparent Middleboxes and TCP Normalizers | 3.3.2. Requirements for Transparent Middleboxes and TCP Normalizers | |||
| Another large class of middleboxes intervenes to some degree at the | Another large class of middleboxes intervenes to some degree at the | |||
| transport layer, but attempts to be transparent (invisible) to the | transport layer, but attempts to be transparent (invisible) to the | |||
| end-to-end connection. A subset of this class of middleboxes | end-to-end connection. A subset of this class of middleboxes | |||
| attempts to 'normalize' the TCP wire protocol by checking that all | attempts to 'normalize' the TCP wire protocol by checking that all | |||
| values in header fields comply with a rather narrow interpretation of | values in header fields comply with a rather narrow interpretation of | |||
| the TCP specifications that is not always up to date. | the TCP specifications that is also not always kept up to date. | |||
| A middlebox that is not normalizing the TCP protocol and does not | A middlebox that is not normalizing the TCP protocol and does not | |||
| itself act as a back-to-back pair of TCP endpoints (i.e., a middlebox | itself act as a back-to-back pair of TCP endpoints (i.e., a middlebox | |||
| that intends to be transparent or invisible at the transport layer) | that intends to be transparent or invisible at the transport layer) | |||
| ought to forward AccECN TCP Options unaltered, whether or not the | ought to forward AccECN TCP Options unaltered, whether or not the | |||
| length value matches one of those specified in Section 3.2.3, and | length value matches one of those specified in Section 3.2.3, and | |||
| whether or not the initial values of the byte-counter fields match | whether or not the initial values of the byte-counter fields match | |||
| those in Section 3.2.1. This is because blocking apparently invalid | those in Section 3.2.1. This is because blocking apparently invalid | |||
| values prevents the standardized set of values from being extended in | values prevents the standardized set of values from being extended in | |||
| the future (such outdated normalizers would block updated hosts from | the future (such outdated normalizers would block updated hosts from | |||
| skipping to change at line 2172 ¶ | skipping to change at line 2174 ¶ | |||
| A TCP normalizer is likely to block or alter an AccECN TCP Option if | A TCP normalizer is likely to block or alter an AccECN TCP Option if | |||
| the length value or the initial values of its byte-counter fields do | the length value or the initial values of its byte-counter fields do | |||
| not match one of those specified in Sections 3.2.3 or 3.2.1. | not match one of those specified in Sections 3.2.3 or 3.2.1. | |||
| However, to comply with the present AccECN specification, a middlebox | However, to comply with the present AccECN specification, a middlebox | |||
| MUST NOT change the ACE field; or those fields of an AccECN Option | MUST NOT change the ACE field; or those fields of an AccECN Option | |||
| that are currently specified in Section 3.2.3; or any AccECN field | that are currently specified in Section 3.2.3; or any AccECN field | |||
| covered by integrity protection (e.g., [RFC5925]). | covered by integrity protection (e.g., [RFC5925]). | |||
| 3.3.3. Requirements for TCP ACK Filtering | 3.3.3. Requirements for TCP ACK Filtering | |||
| Section 5.2.1 of RFC 3449 [BCP69] gives best current practice on | Section 5.2.1 of [RFC3449] gives best current practice on | |||
| filtering (aka thinning or coalescing) of pure TCP ACKs. It advises | filtering (aka thinning or coalescing) of pure TCP ACKs. It advises | |||
| that filtering ACKs carrying ECN feedback ought to preserve the | that filtering ACKs carrying ECN feedback ought to preserve the | |||
| correct operation of ECN feedback. As the present specification | correct operation of ECN feedback. As the present specification | |||
| updates the operation of ECN feedback, this section discusses how an | updates the operation of ECN feedback, this section discusses how an | |||
| ACK filter might preserve correct operation of AccECN feedback as | ACK filter might preserve correct operation of AccECN feedback as | |||
| well. | well. | |||
| The problem divides into two parts: determining if an ACK is part of | The problem divides into two parts: determining if an ACK is part of | |||
| a connection that is using AccECN and then preserving the correct | a connection that is using AccECN and then preserving the correct | |||
| operation of AccECN feedback: | operation of AccECN feedback: | |||
| * To determine whether a pure TCP ACK is part of an AccECN | * To determine whether a pure TCP ACK is part of an AccECN | |||
| connection without resorting to connection tracking and per-flow | connection without resorting to connection tracking and per-flow | |||
| state, a useful heuristic would be to check for a non-zero ECN | state, a useful heuristic would be to check for a non-zero ECN | |||
| field at the IP layer (because the ECN++ experiment only allows | field at the IP layer (because the ECN++ experiment only allows | |||
| TCP pure ACKs to be ECN-capable if AccECN has been negotiated | TCP pure ACKs to be ECN-capable if AccECN has been negotiated | |||
| [ECN++]). This heuristic is simple and stateless. However, it | [ECN++]). This heuristic is simple and stateless. However, it | |||
| might omit some AccECN ACKs because AccECN can be used without | might omit some AccECN ACKs because AccECN can be used without | |||
| ECN++. Even if ECN++ is used, pure ACKs do not necessarily have | ECN++. Even if a sender uses ECN++, it does not necessarily have | |||
| to be marked as ECN-capable -- only deployment experience will | to mark pure ACKs as ECN-capable -- only deployment experience | |||
| tell. Also, TCP ACKs might be ECN-capable owing to some scheme | will tell. Also, TCP ACKs might be ECN-capable owing to some | |||
| other than AccECN, e.g., [RFC5690] or some future standards | scheme other than AccECN, e.g., [RFC5690] or some future standards | |||
| action. Again, only deployment experience will tell. | action. Again, only deployment experience will tell. | |||
| * The main concern with preserving correct AccECN operation involves | * The main concern with preserving correct AccECN operation involves | |||
| leaving enough ACKs for the Data Sender to work out whether the | leaving enough ACKs for the Data Sender to work out whether the | |||
| 3-bit ACE field has wrapped. In the worst case, in feedback about | 3-bit ACE field has wrapped. In the worst case, in feedback about | |||
| a run of received packets that were all ECN-marked, the ACE field | a run of received packets that were all ECN-marked, the ACE field | |||
| will wrap every 8 acknowledged packets. ACE field wrap might be | will wrap every 8 acknowledged packets. ACE field wrap might be | |||
| of less concern if packets also carry AccECN TCP Options. | of less concern if packets also carry AccECN TCP Options. | |||
| However, note that logic to read an AccECN TCP Option is optional | However, note that logic to read an AccECN TCP Option is optional | |||
| to implement (albeit recommended -- see Section 3.2.3). So one | to implement (albeit recommended -- see Section 3.2.3). So one | |||
| skipping to change at line 2249 ¶ | skipping to change at line 2251 ¶ | |||
| around incompatibilities (e.g., when only global configurable TSO TCP | around incompatibilities (e.g., when only global configurable TSO TCP | |||
| Flag bitmasks are available), otherwise this would cause some issues. | Flag bitmasks are available), otherwise this would cause some issues. | |||
| One way around this could be to only negotiate for Accurate ECN, but | One way around this could be to only negotiate for Accurate ECN, but | |||
| not offer a fall back to Classic ECN [RFC3168]. Another way could be | not offer a fall back to Classic ECN [RFC3168]. Another way could be | |||
| to allow TSO only as long as the CWR flag in the TCP header is not | to allow TSO only as long as the CWR flag in the TCP header is not | |||
| set -- at the cost of more processing overhead while the ACE field | set -- at the cost of more processing overhead while the ACE field | |||
| has this bit set. | has this bit set. | |||
| For LRO in the receive direction, a different issue may get exposed | For LRO in the receive direction, a different issue may get exposed | |||
| with Classic ECN [RFC3168] supporting hardware. | with hardware that supports Classic ECN [RFC3168]. | |||
| The ACE field changes with every received CE marking, so today's | The ACE field changes with every received CE marking, so today's | |||
| receive offloading could lead to many interrupts in high congestion | receive offloading could lead to many interrupts in high congestion | |||
| situations. Although that would be useful (because congestion | situations. Although that would be useful (because congestion | |||
| information is received sooner), it could also significantly increase | information is received sooner), it could also significantly increase | |||
| processor load, particularly in scenarios such as DCTCP or L4S where | processor load, particularly in scenarios such as DCTCP or L4S where | |||
| the marking rate is generally higher. | the marking rate is generally higher. | |||
| Current offload hardware ejects a segment from the coalescing process | Current offload hardware ejects a segment from the coalescing process | |||
| whenever the TCP ECN flags change. In data centres, it has been | whenever the TCP-ECN flags change. In data centres, it has been | |||
| fortunate for this offload hardware that DCTCP-style feedback changes | fortunate for this offload hardware that DCTCP-style feedback changes | |||
| less often when there are long sequences of CE marks, which is more | less often when there are long sequences of CE marks, which is more | |||
| common with a step marking threshold (but less likely the more short | common with a step marking threshold (but less likely the more short | |||
| flows are in the mix). The ACE counter approach has been designed so | flows are in the mix). The ACE counter approach has been designed so | |||
| that coalescing can continue over arbitrary patterns of marking and | that coalescing can continue over arbitrary patterns of marking and | |||
| only needs to stop when the counter wraps. Nonetheless, until the | only needs to stop when the counter wraps. Nonetheless, until the | |||
| particular offload hardware in use implements this more efficient | particular offload hardware in use implements this more efficient | |||
| approach, it is likely to be more efficient for AccECN connections to | approach, it is likely to be more efficient for AccECN connections to | |||
| implement this counter-style logic using software segmentation | implement this counter-style logic using software segmentation | |||
| offload. | offload. | |||
| ECN encodes a varying signal in the ACK stream, so it is inevitable | ECN encodes a varying signal in the ACK stream, so it is inevitable | |||
| that offload hardware will ultimately need to handle any form of ECN | that offload hardware will ultimately need to handle any form of ECN | |||
| feedback exceptionally. The ACE field has been designed as a counter | feedback exceptionally. The ACE field has been designed as a counter | |||
| so that it is straightforward for offload hardware to pass on the | so that it is straightforward for offload hardware to pass on the | |||
| highest counter, and to push a segment from its cache before the | highest counter, and to push a segment from its cache before the | |||
| counter wraps. The purpose of working towards standardized TCP ECN | counter wraps. The purpose of working towards standardized TCP-ECN | |||
| feedback is to reduce the risk for hardware developers, who would | feedback is to reduce the risk for hardware developers, who would | |||
| otherwise have to guess which scheme is likely to become dominant. | otherwise have to guess which scheme is likely to become dominant. | |||
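The idea of passing on the highest counter and pushing a segment before the counter wraps can be illustrated with a toy coalescer. This is a software sketch, not a description of any real offload hardware. The class `AceCoalescer` and its methods are hypothetical, and the sketch assumes that the ACE field advances by fewer than 8 between consecutive segments offered to it.

```python
from typing import Optional

ACE_MODULUS = 8  # the ACE field is 3 bits wide


class AceCoalescer:
    """Toy model of receive coalescing keyed on the 3-bit ACE counter."""

    def __init__(self, last_delivered_ace: int) -> None:
        self.latest = last_delivered_ace % ACE_MODULUS
        self.advance = 0  # total ACE advance held in the current batch
        self.held = 0     # number of segments in the current batch

    def offer(self, ace: int) -> Optional[int]:
        """Add a segment; return the ACE value of a forced flush, if any."""
        step = (ace - self.latest) % ACE_MODULUS
        delivered = None
        if self.held and self.advance + step >= ACE_MODULUS:
            # Merging would hide a full wrap from the stack, so push
            # the batch held so far before accepting this segment.
            delivered = self.flush()
        self.latest = ace % ACE_MODULUS
        self.advance += step
        self.held += 1
        return delivered

    def flush(self) -> int:
        """Deliver the batch, passing on the most recent ACE value."""
        self.advance = 0
        self.held = 0
        return self.latest
```

For example, starting from ACE 0 and offering segments with ACE values 3, 6, and 1 keeps the first two in one batch and forces a flush carrying ACE 6 before the third is accepted, because merging all three would span more than one wrap of the counter.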
| The above process has been designed to enable a continuing | The above process has been designed to enable a continuing | |||
| incremental deployment path -- to more highly dynamic congestion | incremental deployment path -- to more highly dynamic congestion | |||
| control. Once offload hardware supports AccECN, it will be able to | control. Once offload hardware supports AccECN, it will be able to | |||
| coalesce efficiently for any sequence of marks, instead of relying on | coalesce efficiently for any sequence of marks, instead of relying on | |||
| the long marking sequences from step marking for efficiency. In the | the long marking sequences from step marking for efficiency. In the | |||
| next stage, marking can evolve from a step to a ramp function. That | next stage, marking can evolve from a step to a ramp function. That | |||
| in turn will allow host congestion control algorithms to respond | in turn will allow host congestion control algorithms to respond | |||
| faster to dynamics, while being backwards compatible with existing | faster to dynamics, while being backwards compatible with existing | |||
| host algorithms. | host algorithms. | |||
| 4. Updates to RFC 3168 | 4. Updates to RFC 3168 | |||
| This section clarifies which parts of RFC 3168 are updated and maps | This section clarifies which parts of RFC 3168 are updated and maps | |||
| them to the relevant updated sections of the present AccECN | them to the relevant updated sections of the present AccECN | |||
| specification. | specification. | |||
| * The whole of Section 6.1.1 of [RFC3168] is updated by Section 3.1 | * The whole of Section 6.1.1 (TCP Initialization) of [RFC3168] is | |||
| of the present specification. | updated by Section 3.1 of the present specification. | |||
| * In Section 6.1.2 of [RFC3168], all mentions of a congestion | * In Section 6.1.2 (The TCP Sender) of [RFC3168], all mentions of a | |||
| response to an ECN-Echo (ECE) ACK packet are updated by | congestion response to an ECN-Echo (ECE) ACK packet are updated by | |||
| Section 3.2 of the present specification to mean an increment to | Section 3.2 of the present specification to mean an increment to | |||
| the sender's count of CE-marked packets, s.cep. And the | the sender's count of CE-marked packets, s.cep. And the | |||
| requirements to set the CWR flag no longer apply, as specified in | requirements to set the CWR flag no longer apply, as specified in | |||
| Section 3.1.5 of the present specification. Otherwise, the | Section 3.1.5 of the present specification. Otherwise, the | |||
| remaining requirements in Section 6.1.2 of [RFC3168] still stand. | remaining requirements in Section 6.1.2 (The TCP Sender) of | |||
| [RFC3168] still stand. | ||||
| It will be noted that [RFC8311] already updates a number of the | It will be noted that [RFC8311] already updates a number of the | |||
| requirements in Section 6.1.2 of [RFC3168]. Section 6.1.2 of RFC | requirements in Section 6.1.2 (The TCP Sender) of [RFC3168]. | |||
| 3168 extended standard TCP congestion control [RFC5681] to cover | Section 6.1.2 of [RFC3168] extended standard TCP congestion | |||
| ECN marking as well as packet drop. Whereas, [RFC8311] enables | control [RFC5681] to cover ECN marking as well as packet drop. | |||
| experimentation with alternative responses to ECN marking, if | Whereas, [RFC8311] enables experimentation with alternative | |||
| specified for instance by an Experimental RFC produced by the IETF | responses to ECN marking, if specified for instance by an | |||
| Stream. [RFC8311] also strengthened the statement that "ECT(0) | Experimental RFC produced by the IETF Stream. [RFC8311] also | |||
| SHOULD be used" to a "MUST" (see [RFC8311] for the details). | strengthened the statement that "ECT(0) SHOULD be used" to a | |||
| "MUST" (see [RFC8311] for the details). | ||||
| * The whole of Section 6.1.3 of [RFC3168] is updated by Section 3.2 | * The whole of Section 6.1.3 (The TCP Receiver) of [RFC3168] is | |||
| of the present specification, with the exception of the last | updated by Section 3.2 of the present specification, with the | |||
| paragraph (about congestion response to drop and ECN in the same | exception of the last paragraph (about congestion response to drop | |||
| round trip), which still stands. Incidentally, this last | and ECN in the same round trip), which still stands. | |||
| paragraph is in the wrong section, because it relates to "TCP | Incidentally, this last paragraph is in the wrong section, because | |||
| Sender" behaviour. | it relates to "TCP Sender" behaviour. | |||
| * The following text within Section 6.1.5 of [RFC3168]: | * The following text within Section 6.1.5 (Retransmitted TCP | |||
| packets) of [RFC3168]: | ||||
| | the TCP data receiver SHOULD ignore the ECN field on arriving | | the TCP data receiver SHOULD ignore the ECN field on arriving | |||
| | data packets that are outside of the receiver's current window. | | data packets that are outside of the receiver's current window. | |||
| is updated by more stringent acceptability tests for any packet | is updated by more stringent acceptability tests for any packet | |||
| (not just data packets) in the present specification. | (not just data packets) in the present specification. | |||
| Specifically, in the normative specification of AccECN | Specifically, in the normative specification of AccECN | |||
| (Section 3), only 'Acceptable' packets contribute to the ECN | (Section 3), only 'Acceptable' packets contribute to the ECN | |||
| counters at the AccECN receiver and Section 1.3 defines an | counters at the AccECN receiver and Section 1.3 defines an | |||
| Acceptable packet as one that passes acceptability tests | Acceptable packet as one that passes acceptability tests | |||
| equivalent in strength to those in both [RFC9293] and [RFC5961]. | equivalent in strength to those in both [RFC9293] and [RFC5961]. | |||
| * Sections 5.2, 6.1.1, 6.1.4, 6.1.5, and 6.1.6 of [RFC3168] prohibit | * Sections 5.2 (Dropped or Corrupted Packets), 6.1.1 (TCP | |||
| use of ECN on TCP control packets and retransmissions. The | Initialization), 6.1.4 (Congestion on the ACK-path), 6.1.5 | |||
| present specification does not update that aspect of [RFC3168], | (Retransmitted TCP packets), and 6.1.6 (TCP Window Probes) of | |||
| but it does say what feedback an AccECN Data Receiver ought to | [RFC3168] prohibit use of ECN on TCP control packets and | |||
| provide if it receives an ECN-capable control packet or | retransmissions. The present specification does not update that | |||
| retransmission. This ensures AccECN is forward compatible with | aspect of [RFC3168], but it does say what feedback an AccECN Data | |||
| any future scheme that allows ECN on these packets, as provided | Receiver ought to provide if it receives an ECN-capable control | |||
| for in Section 4.3 of [RFC8311] and as proposed in [ECN++]. | packet or retransmission. This ensures AccECN is forward | |||
| compatible with any future scheme that allows ECN on these | ||||
| packets, as provided for in Section 4.3 of [RFC8311] and as | ||||
| proposed in [ECN++]. | ||||
| 5. Interaction with TCP Variants | 5. Interaction with TCP Variants | |||
| This section is informative, not normative. | This section is informative, not normative. | |||
| 5.1. Compatibility with SYN Cookies | 5.1. Compatibility with SYN Cookies | |||
| A TCP Server can use SYN Cookies (see Appendix A of [RFC4987]) to | A TCP Server can use SYN Cookies (see Appendix A of [RFC4987]) to | |||
| protect itself from SYN flooding attacks. It places minimal commonly | protect itself from SYN flooding attacks. It places minimal commonly | |||
| used connection state in the SYN/ACK, and deliberately does not hold | used connection state in the SYN/ACK, and deliberately does not hold | |||
| skipping to change at line 2393 ¶ | skipping to change at line 2401 ¶ | |||
| 5.2. Compatibility with TCP Experiments and Common TCP Options | 5.2. Compatibility with TCP Experiments and Common TCP Options | |||
| AccECN is compatible (at least on paper) with the most commonly used | AccECN is compatible (at least on paper) with the most commonly used | |||
| TCP Options: MSS, timestamp, window scaling, SACK, and TCP-AO. It is | TCP Options: MSS, timestamp, window scaling, SACK, and TCP-AO. It is | |||
| also compatible with Multipath TCP (MPTCP [RFC8684]) and the | also compatible with Multipath TCP (MPTCP [RFC8684]) and the | |||
| experimental TCP Option TCP Fast Open (TFO [RFC7413]). AccECN is | experimental TCP Option TCP Fast Open (TFO [RFC7413]). AccECN is | |||
| friendly to all these protocols, because space for TCP Options is | friendly to all these protocols, because space for TCP Options is | |||
| particularly scarce on the SYN, where AccECN consumes zero additional | particularly scarce on the SYN, where AccECN consumes zero additional | |||
| header space. | header space. | |||
| Because option space is limited, Section 3.2.3.3 provides guidance on | Because option space is limited, Section 3.2.3.3 specifies which | |||
| how important it is to send an AccECN Option relative to other | AccECN Option fields are more important to include and provides | |||
| options and specifies which fields are more important to include. | guidance on the relative importance of AccECN Options against other | |||
| TCP Options. | ||||
| Implementers of TFO need to take careful note of the recommendation | Implementers of TFO need to take careful note of the recommendation | |||
| in Section 3.2.2.1. That section recommends that, if the TCP Client | in Section 3.2.2.1. That section recommends that, if the TCP Client | |||
| has successfully negotiated AccECN, when acknowledging the SYN/ACK, | has successfully negotiated AccECN, when acknowledging the SYN/ACK, | |||
| even if it has data to send, it sends a pure ACK immediately before | even if it has data to send, it sends a pure ACK immediately before | |||
| the data. Then it can reflect the IP ECN field of the SYN/ACK on | the data. Then it can reflect the IP-ECN field of the SYN/ACK on | |||
| this pure ACK, which allows the Server to detect ECN mangling. Note | this pure ACK, which allows the Server to detect ECN mangling. Note | |||
| that, as specified in Section 3.2, any data on the SYN (SYN=1, ACK=0) | that, as specified in Section 3.2, any data on the SYN (SYN=1, ACK=0) | |||
| is not included in any of the byte counters held locally for each ECN | is not included in any of the byte counters held locally for each ECN | |||
| marking, nor in the AccECN Option on the wire. | marking, nor in the AccECN Option on the wire. | |||
| AccECN feedback is compatible with the ECN++ experiment [ECN++], | AccECN feedback is compatible with the ECN++ experiment [ECN++], | |||
| which allows TCP control packets and retransmissions to be ECN- | which allows TCP control packets and retransmissions to be ECN- | |||
| capable ([RFC3168] was updated by [RFC8311] to permit such | capable ([RFC3168] was updated by [RFC8311] to permit such | |||
| experiments). AccECN is likely to inherently support any experiment | experiments). AccECN is likely to inherently support any experiment | |||
| with ECN-capable packets, because it feeds back the contents of the | with ECN-capable packets, because it feeds back the contents of the | |||
| skipping to change at line 2425 ¶ | skipping to change at line 2434 ¶ | |||
| an earlier experimental protocol with narrower scope than ECN++ and a | an earlier experimental protocol with narrower scope than ECN++ and a | |||
| 5-way handshake. | 5-way handshake. | |||
| 5.3. Compatibility with Feedback Integrity Mechanisms | 5.3. Compatibility with Feedback Integrity Mechanisms | |||
| Three alternative mechanisms are available to assure the integrity of | Three alternative mechanisms are available to assure the integrity of | |||
| ECN and/or loss signals. AccECN is compatible with any of these | ECN and/or loss signals. AccECN is compatible with any of these | |||
| approaches: | approaches: | |||
| * The Data Sender can test the integrity of the receiver's ECN (or | * The Data Sender can test the integrity of the receiver's ECN (or | |||
| loss) feedback by occasionally setting the IP ECN field to a value | loss) feedback by occasionally setting the IP-ECN field to a value | |||
| normally only set by the network (and/or deliberately leaving a | normally only set by the network (and/or deliberately leaving a | |||
| sequence number gap). Then it can test whether the Data | sequence number gap). Then it can test whether the Data | |||
| Receiver's feedback faithfully reports what it expects (similar to | Receiver's feedback faithfully reports what it expects (similar to | |||
| paragraph 2 of Section 20.2 of [RFC3168]). Unlike the ECN-nonce | paragraph 2 of Section 20.2 of [RFC3168]). Unlike the ECN-nonce | |||
| [RFC3540], this approach does not waste the ECT(1) codepoint in | [RFC3540], this approach does not waste the ECT(1) codepoint in | |||
| the IP header, it does not require standardization, and it does | the IP header, it does not require standardization, and it does | |||
| not rely on misbehaving receivers volunteering to reveal feedback | not rely on misbehaving receivers volunteering to reveal feedback | |||
| information that allows them to be detected. However, setting the | information that allows them to be detected. However, setting the | |||
| CE mark by the sender might conceal actual congestion feedback | CE mark by the sender might conceal actual congestion feedback | |||
| from the network and therefore ought to only be done sparingly. | from the network and therefore ought to only be done sparingly. | |||
| skipping to change at line 2546 ¶ | skipping to change at line 2555 ¶ | |||
| can assure the integrity of ECN feedback. If AccECN Options are | can assure the integrity of ECN feedback. If AccECN Options are | |||
| stripped, the resolution of the feedback is degraded, but the | stripped, the resolution of the feedback is degraded, but the | |||
| integrity of this degraded feedback can still be assured. | integrity of this degraded feedback can still be assured. | |||
| Backward Compatibility: If only one endpoint supports the AccECN | Backward Compatibility: If only one endpoint supports the AccECN | |||
| scheme, it will fall back to the most advanced ECN feedback scheme | scheme, it will fall back to the most advanced ECN feedback scheme | |||
| supported by the other end. | supported by the other end. | |||
| If AccECN Options are stripped by a middlebox, AccECN still | If AccECN Options are stripped by a middlebox, AccECN still | |||
| provides basic congestion feedback in the ACE field. Further, | provides basic congestion feedback in the ACE field. Further, | |||
| AccECN can be used to detect mangling of the IP ECN field; | AccECN can be used to detect mangling of the IP-ECN field; | |||
| mangling of the TCP ECN flags; blocking of ECT-marked segments; | mangling of the TCP-ECN flags; blocking of ECT-marked segments; | |||
| and blocking of segments carrying an AccECN Option. It can detect | and blocking of segments carrying an AccECN Option. It can detect | |||
| these conditions during TCP's three-way handshake so that it can | these conditions during TCP's three-way handshake so that it can | |||
| fall back to operation without ECN and/or operation without AccECN | fall back to operation without ECN and/or operation without AccECN | |||
| Options. | Options. | |||
| Forward Compatibility: The behaviour of endpoints and middleboxes is | Forward Compatibility: The behaviour of endpoints and middleboxes is | |||
| carefully defined for all reserved or currently unused codepoints | carefully defined for all reserved or currently unused codepoints | |||
| in the scheme. Then, the designers of security devices can | in the scheme. Then, the designers of security devices can | |||
| understand which currently unused values might appear in the | understand which currently unused values might appear in the | |||
| future. So, even if they choose to treat such values as anomalous | future. So, even if they choose to treat such values as anomalous | |||
| while they are not widely used, any blocking will at least be | while they are not widely used, any blocking will at least be | |||
| under policy control and not hard-coded. Then, if previously | under policy control, not hard-coded. Then, if previously unused | |||
| unused values start to appear on the Internet (or in standards), | values start to appear on the Internet (or in standards), such | |||
| such policies could be quickly reversed. | policies could be quickly reversed. | |||
| 7. IANA Considerations | 7. IANA Considerations | |||
| This document reassigns the TCP header flag at bit offset 7 to the | This document reassigns the TCP header flag at bit offset 7 to the | |||
| AccECN protocol. This bit was previously called the Nonce Sum (NS) | AccECN protocol. This bit was previously called the Nonce Sum (NS) | |||
| flag [RFC3540], but RFC 3540 has been reclassified as Historic | flag [RFC3540], but RFC 3540 has been reclassified as Historic | |||
| [RFC8311]. The flag is now defined as the following in the "TCP | [RFC8311]. The flag is now defined as the following in the "TCP | |||
| Header Flags" registry in the "Transmission Control Protocol (TCP) | Header Flags" registry in the "Transmission Control Protocol (TCP) | |||
| Parameters" registry group: | Parameters" registry group: | |||
| skipping to change at line 2641 ¶ | skipping to change at line 2650 ¶ | |||
| still be assured. Assuring that Data Senders respond appropriately | still be assured. Assuring that Data Senders respond appropriately | |||
| to ECN feedback is possible, but the scope of the present document is | to ECN feedback is possible, but the scope of the present document is | |||
| confined to the feedback protocol and excludes the response to this | confined to the feedback protocol and excludes the response to this | |||
| feedback. | feedback. | |||
| In Section 3.2.3, a Data Sender is allowed to ignore an unrecognized | In Section 3.2.3, a Data Sender is allowed to ignore an unrecognized | |||
| TCP AccECN Option length and read as many whole 3-octet fields from | TCP AccECN Option length and read as many whole 3-octet fields from | |||
| it as possible up to a maximum of 3, treating the remainder as | it as possible up to a maximum of 3, treating the remainder as | |||
| padding. This opens up a potential covert channel of up to 29B (40 - | padding. This opens up a potential covert channel of up to 29B (40 - | |||
| (2+3*3)). However, it is really an overt channel (not hidden) and it | (2+3*3)). However, it is really an overt channel (not hidden) and it | |||
| is no different than the use of unknown TCP Options with unknown | is no different from the use of unknown TCP Options with unknown | |||
| option lengths in general. Therefore, where this is of concern, it | option lengths in general. Therefore, where this is of concern, it | |||
| can already be adequately mitigated by regular TCP normalizer | can already be adequately mitigated by regular TCP normalizer | |||
| technology (see Section 3.3.2). | technology (see Section 3.3.2). | |||
| The AccECN protocol is not believed to introduce any new privacy | ||||
| concerns, because it merely counts and feeds back signals at the | ||||
| transport layer that had already been visible at the IP layer. A | ||||
| covert channel can be used to compromise privacy. However, as | ||||
| explained above, undefined TCP Options in general open up such | ||||
| channels, and common techniques are available to close them off. | ||||
| There is a potential concern that a Data Receiver could deliberately | There is a potential concern that a Data Receiver could deliberately | |||
| omit AccECN Options pretending that they had been stripped by a | omit AccECN Options pretending that they had been stripped by a | |||
| middlebox. Currently, there is no known way for a receiver to take | middlebox. Currently, there is no known way for a receiver to take | |||
| advantage of this behaviour, which seems to always degrade its own | advantage of this behaviour, which seems to always degrade its own | |||
| performance. However, the concern is mentioned here for | performance. However, the concern is mentioned here for | |||
| completeness. | completeness. | |||
| The AccECN protocol is not believed to introduce any new privacy | ||||
| concerns, because it merely counts and feeds back signals at the | ||||
| transport layer that had already been visible at the IP layer. A | ||||
| covert channel can be used to compromise privacy. However, as | ||||
| explained above, undefined TCP Options in general open up such | ||||
| channels, and common techniques are available to close them off. | ||||
| A generic privacy concern of any new protocol is that for a while it | A generic privacy concern of any new protocol is that for a while it | |||
| will be used by a small population of hosts, and thus those hosts | will be used by a small population of hosts, and thus those hosts | |||
| could be more easily identified. However, it is expected that AccECN | could be more easily identified. However, it is expected that AccECN | |||
| will become available in operating systems over time and that it will | will become available in more operating systems over time and that it | |||
| eventually be turned on by default. Thus, an individual | will eventually be turned on by default. Thus, an individual | |||
| identification of a particular user is less of a concern than the | identification of a particular user is less of a concern than the | |||
| fingerprinting of specific versions of operating systems. However, | fingerprinting of specific versions of operating systems. However, | |||
| the latter can be done using different means independent of Accurate | the latter can be done using different means independent of Accurate | |||
| ECN. | ECN. | |||
| As Accurate ECN exposes more bits in the TCP header that could be | As Accurate ECN exposes more bits in the TCP header that could be | |||
| tampered with without interfering with the transport excessively, it | tampered with without interfering with the transport excessively, it | |||
| may allow an additional way to identify specific data streams across | may allow an additional way to identify specific data streams across | |||
| a virtual private network (VPN) to an attacker that has access to the | a virtual private network (VPN) to an attacker that has access to the | |||
| datastream before and after the VPN tunnel endpoints. This may be | datastream before and after the VPN tunnel endpoints. This may be | |||
| achieved by injecting or modifying the ACE field in specific patterns | achieved by injecting or modifying the ACE field in specific patterns | |||
| that can be recognized. | that can be recognized. | |||
| Overall, Accurate ECN does not change the risk profile on privacy to | Overall, Accurate ECN does not change the risk profile on privacy to | |||
| a user dramatically beyond what is already possible using classic | a user dramatically beyond what is already possible using classic | |||
| ECN. However, in order to prevent such attacks and means of easier | ECN. However, in order to prevent such attacks and means of easier | |||
| identification of flows, it is advisable for privacy-conscious users | identification of flows, it is advisable for privacy-conscious users | |||
| behind VPNs to not enable the Accurate ECN, or Classic ECN for that | behind VPNs to not enable Accurate ECN, or Classic ECN for that | |||
| matter. | matter. | |||
| 9. References | 9. References | |||
| 9.1. Normative References | 9.1. Normative References | |||
| [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP | [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP | |||
| Selective Acknowledgment Options", RFC 2018, | Selective Acknowledgment Options", RFC 2018, | |||
| DOI 10.17487/RFC2018, October 1996, | DOI 10.17487/RFC2018, October 1996, | |||
| <https://www.rfc-editor.org/info/rfc2018>. | <https://www.rfc-editor.org/info/rfc2018>. | |||
| skipping to change at line 2724 ¶ | skipping to change at line 2733 ¶ | |||
| [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
| May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
| [RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)", | [RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)", | |||
| STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, | STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, | |||
| <https://www.rfc-editor.org/info/rfc9293>. | <https://www.rfc-editor.org/info/rfc9293>. | |||
| 9.2. Informative References | 9.2. Informative References | |||
| [BCP69] Best Current Practice 69, | ||||
| <https://www.rfc-editor.org/info/bcp69>. | ||||
| At the time of writing, this BCP comprises the following: | ||||
| Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. | ||||
| Sooriyabandara, "TCP Performance Implications of Network | ||||
| Path Asymmetry", BCP 69, RFC 3449, DOI 10.17487/RFC3449, | ||||
| December 2002, <https://www.rfc-editor.org/info/rfc3449>. | ||||
| [ECN++] Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit | [ECN++] Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit | |||
| Congestion Notification (ECN) to TCP Control Packets", | Congestion Notification (ECN) to TCP Control Packets", | |||
| Work in Progress, Internet-Draft, draft-ietf-tcpm- | Work in Progress, Internet-Draft, draft-ietf-tcpm- | |||
| generalized-ecn-17, 21 April 2025, | generalized-ecn-17, 21 April 2025, | |||
| <https://datatracker.ietf.org/doc/html/draft-ietf-tcpm- | <https://datatracker.ietf.org/doc/html/draft-ietf-tcpm- | |||
| generalized-ecn-17>. | generalized-ecn-17>. | |||
| [Mandalari18] | [Mandalari18] | |||
| Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Ö. | Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Ö. | |||
| Alay, "Measuring ECN++: Good News for ++, Bad News for ECN | Alay, "Measuring ECN++: Good News for ++, Bad News for ECN | |||
| over Mobile", IEEE Communications Magazine, March 2018, | over Mobile", IEEE Communications Magazine, March 2018, | |||
| <http://www.it.uc3m.es/amandala/ | <http://www.it.uc3m.es/amandala/ | |||
| ecn++/ecn_commag_2018.html>. | ecn++/ecn_commag_2018.html>. | |||
| [RFC3449] Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. | ||||
| Sooriyabandara, "TCP Performance Implications of Network | ||||
| Path Asymmetry", BCP 69, RFC 3449, DOI 10.17487/RFC3449, | ||||
| December 2002, <https://www.rfc-editor.org/info/rfc3449>. | ||||
| [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | |||
| Congestion Notification (ECN) Signaling with Nonces", | Congestion Notification (ECN) Signaling with Nonces", | |||
| RFC 3540, DOI 10.17487/RFC3540, June 2003, | RFC 3540, DOI 10.17487/RFC3540, June 2003, | |||
| <https://www.rfc-editor.org/info/rfc3540>. | <https://www.rfc-editor.org/info/rfc3540>. | |||
| [RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common | [RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common | |||
| Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007, | Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007, | |||
| <https://www.rfc-editor.org/info/rfc4987>. | <https://www.rfc-editor.org/info/rfc4987>. | |||
| [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. | [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. | |||
| skipping to change at line 2858 ¶ | skipping to change at line 2863 ¶ | |||
| (L4S) Internet Service: Architecture", RFC 9330, | (L4S) Internet Service: Architecture", RFC 9330, | |||
| DOI 10.17487/RFC9330, January 2023, | DOI 10.17487/RFC9330, January 2023, | |||
| <https://www.rfc-editor.org/info/rfc9330>. | <https://www.rfc-editor.org/info/rfc9330>. | |||
| [RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., | [RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., | |||
| "CUBIC for Fast and Long-Distance Networks", RFC 9438, | "CUBIC for Fast and Long-Distance Networks", RFC 9438, | |||
| DOI 10.17487/RFC9438, August 2023, | DOI 10.17487/RFC9438, August 2023, | |||
| <https://www.rfc-editor.org/info/rfc9438>. | <https://www.rfc-editor.org/info/rfc9438>. | |||
| [RoCEv2] InfiniBand Trade Association, "InfiniBand Architecture | [RoCEv2] InfiniBand Trade Association, "InfiniBand Architecture | |||
| Specification", | Specification", Volume 1, Release 1.4, 2020, | |||
| <https://www.infinibandta.org/ibta-specification/>. | <https://www.infinibandta.org/ibta-specification/>. | |||
| Appendix A. Example Algorithms | Appendix A. Example Algorithms | |||
| This appendix is informative, not normative. It gives example | This appendix is informative, not normative. It gives example | |||
| algorithms that would satisfy the normative requirements of the | algorithms that would satisfy the normative requirements of the | |||
| AccECN protocol. However, implementers are free to choose other ways | AccECN protocol. However, implementers are free to choose other ways | |||
| to satisfy the requirements. | to satisfy the requirements. | |||
| A.1. Example Algorithm to Encode/Decode the AccECN Option | A.1. Example Algorithm to Encode/Decode the AccECN Option | |||
| skipping to change at line 2943 ¶ | skipping to change at line 2948 ¶ | |||
| heuristically detect a long enough unbroken string of ACK losses that | heuristically detect a long enough unbroken string of ACK losses that | |||
| could have concealed a cycle of the congestion counter in the ACE | could have concealed a cycle of the congestion counter in the ACE | |||
| field of the next ACK to arrive. | field of the next ACK to arrive. | |||
| Two variants of the algorithm are given: i) a more conservative | Two variants of the algorithm are given: i) a more conservative | |||
| variant for a Data Sender to use if it detects that AccECN Options | variant for a Data Sender to use if it detects that AccECN Options | |||
| are not available (see Section 3.2.2.5 and Section 3.2.3.2); and ii) | are not available (see Section 3.2.2.5 and Section 3.2.3.2); and ii) | |||
| a less conservative variant that is feasible when complementary | a less conservative variant that is feasible when complementary | |||
| information is available from AccECN Options. | information is available from AccECN Options. | |||
| A.2.1. Safety Algorithm Without the AccECN Option | A.2.1. Safety Algorithm without the AccECN Option | |||
| It is assumed that each local packet counter is a sufficiently sized | It is assumed that each local packet counter is a sufficiently sized | |||
| unsigned integer (probably 32b) and that the following constant has | unsigned integer (probably 32b) and that the following constant has | |||
| been assigned: | been assigned: | |||
| DIVACE = 2^3 | DIVACE = 2^3 | |||
| Every time an Acceptable CE marked packet arrives (Section 3.2.2.2), | Every time an Acceptable CE marked packet arrives (Section 3.2.2.2), | |||
| the Data Receiver increments its local value of r.cep by 1. It | the Data Receiver increments its local value of r.cep by 1. It | |||
| repeats the same value of ACE in every subsequent ACK until the next | repeats the same value of ACE in every subsequent ACK until the next | |||
| skipping to change at line 3030 ¶ | skipping to change at line 3035 ¶ | |||
| average segment size and prevailing ECN marking. For instance, | average segment size and prevailing ECN marking. For instance, | |||
| newlyAckedPkt in the above formula could be replaced with | newlyAckedPkt in the above formula could be replaced with | |||
| newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing | newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing | |||
| segment size and p is the prevailing ECN marking probability. | segment size and p is the prevailing ECN marking probability. | |||
| However, ultimately, if TCP's ECN feedback becomes inaccurate, it | However, ultimately, if TCP's ECN feedback becomes inaccurate, it | |||
| still has loss detection to fall back on. Therefore, it would seem | still has loss detection to fall back on. Therefore, it would seem | |||
| safe to implement a simple algorithm, rather than a perfect one. | safe to implement a simple algorithm, rather than a perfect one. | |||
| The simple algorithm for dSafer.cep above requires no monitoring of | The simple algorithm for dSafer.cep above requires no monitoring of | |||
| prevailing conditions and it would still be safe if, for example, | prevailing conditions and it would still be safe if, for example, | |||
| segments were on average at least 5% of a full-sized packet as long | segments were on average at least 5% of a full-sized segment as long | |||
| as ECN marking was 5% or less. Assuming it was used, the Data Sender | as ECN marking was 5% or less. Assuming it was used, the Data Sender | |||
| would increment its packet counter as follows: | would increment its packet counter as follows: | |||
| s.cep += dSafer.cep | s.cep += dSafer.cep | |||
| If missing acknowledgement numbers arrive later (due to reordering), | If missing acknowledgement numbers arrive later (due to reordering), | |||
| Section 3.2.2.5.2 says "the Data Sender MAY attempt to neutralize the | Section 3.2.2.5.2 says "the Data Sender MAY attempt to neutralize the | |||
| effect of any action it took based on a conservative assumption that | effect of any action it took based on a conservative assumption that | |||
| it later found to be incorrect". To do this, the Data Sender would | it later found to be incorrect". To do this, the Data Sender would | |||
| have to store the values of all the relevant variables whenever it | have to store the values of all the relevant variables whenever it | |||
| skipping to change at line 3118 ¶ | skipping to change at line 3123 ¶ | |||
| size is more likely to have been just less than one MSS, rather | size is more likely to have been just less than one MSS, rather | |||
| than below MSS/2. | than below MSS/2. | |||
| If pure ACKs were allowed to be ECN-capable, missing ACKs would be | If pure ACKs were allowed to be ECN-capable, missing ACKs would be | |||
| far less likely. However, because [RFC3168] currently precludes | far less likely. However, because [RFC3168] currently precludes | |||
| this, the above algorithm assumes that pure ACKs are not ECN-capable. | this, the above algorithm assumes that pure ACKs are not ECN-capable. | |||
| A.3. Example Algorithm to Estimate Marked Bytes from Marked Packets | A.3. Example Algorithm to Estimate Marked Bytes from Marked Packets | |||
| If AccECN Options are not available, the Data Sender can only decode | If AccECN Options are not available, the Data Sender can only decode | |||
| a CE marking from the ACE field in packets. Every time an ACK | the ACE field as a number of marked packets. Every time an ACK | |||
| arrives, to convert the number of CE markings into an estimate of CE- | arrives, to convert the number of CE markings into an estimate of CE- | |||
| marked bytes, it needs an average of the segment size, s_ave. Then | marked bytes, it needs an average of the segment size, s_ave. Then | |||
| it can add or subtract s_ave from the value of d.ceb as the value of | it can add or subtract s_ave from the value of d.ceb as the value of | |||
| d.cep increments or decrements. Some possible ways to calculate | d.cep increments or decrements. Some possible ways to calculate | |||
| s_ave are outlined below. The precise details will depend on why an | s_ave are outlined below. The precise details will depend on why an | |||
| estimate of marked bytes is needed. | estimate of marked bytes is needed. | |||
| The implementation could keep a record of the byte numbers of all the | The implementation could keep a record of the byte numbers of all the | |||
| boundaries between packets in flight (including control packets), and | boundaries between packets in flight (including control packets), and | |||
| recalculate s_ave on every ACK. However, it would be simpler to | recalculate s_ave on every ACK. However, it would be simpler to | |||
| skipping to change at line 3178 ¶ | skipping to change at line 3183 ¶ | |||
| IPv6 Traffic Class field). To detect bleaching, it will be | IPv6 Traffic Class field). To detect bleaching, it will be | |||
| sufficient to detect whether nearly all bytes arrive marked as Not- | sufficient to detect whether nearly all bytes arrive marked as Not- | |||
| ECT. Therefore, there ought to be no need to keep track of the | ECT. Therefore, there ought to be no need to keep track of the | |||
| details of retransmissions. | details of retransmissions. | |||
| Appendix B. Rationale for Usage of TCP Header Flags | Appendix B. Rationale for Usage of TCP Header Flags | |||
| B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | |||
| AccECN uses a rather unorthodox approach to negotiate the highest | AccECN uses a rather unorthodox approach to negotiate the highest | |||
| version TCP ECN feedback scheme that both ends support, as justified | version TCP-ECN feedback scheme that both ends support, as justified | |||
| below. It follows from the original TCP ECN capability negotiation | below. It follows from the original TCP-ECN capability negotiation | |||
| [RFC3168], in which the Client set the 2 least significant of the | [RFC3168], in which the Client set the 2 least significant of the | |||
| original reserved flags in the TCP header, and fell back to No ECN | original reserved flags in the TCP header, and fell back to no | |||
| support if the Server responded with the 2 flags cleared, which had | support for ECN if the Server responded with the 2 flags cleared, | |||
| previously been the default. | which had previously been the default. | |||
| Classic ECN used header flags rather than a TCP Option because it was | Classic ECN used header flags rather than a TCP Option because it was | |||
| considered more efficient to use a header flag for 1 bit of feedback | considered more efficient to use a header flag for 1 bit of feedback | |||
| per ACK, and this bit could be overloaded to indicate support for | per ACK, and this bit could be overloaded to indicate support for | |||
| Classic ECN during the handshake. During the development of ECN, 1 | Classic ECN during the handshake. During the development of ECN, 1 | |||
| bit crept up to 2, in order to deliver the feedback reliably and to | bit crept up to 2, in order to deliver the feedback reliably and to | |||
| work round some broken hosts that reflected the reserved flags during | work round some broken hosts that reflected the reserved flags during | |||
| the handshake. | the handshake. | |||
| In order to be backward compatible with RFC 3168, AccECN continues | In order to be backward compatible with RFC 3168, AccECN continues | |||
| skipping to change at line 3238 ¶ | skipping to change at line 3243 ¶ | |||
| indicate on the SYN/ACK, four already indicated earlier (or broken) | indicate on the SYN/ACK, four already indicated earlier (or broken) | |||
| versions of ECN support, one now being Historic. In the early design | versions of ECN support, one now being Historic. In the early design | |||
| of AccECN, an AccECN Server could use only 2 of the 4 remaining | of AccECN, an AccECN Server could use only 2 of the 4 remaining | |||
| codepoints. They both indicated AccECN support, but one fed back | codepoints. They both indicated AccECN support, but one fed back | |||
| that the SYN had arrived marked as CE. Even though ECN support on a | that the SYN had arrived marked as CE. Even though ECN support on a | |||
| SYN is not yet on the Standards Track, the idea is for either end to | SYN is not yet on the Standards Track, the idea is for either end to | |||
| act as a mechanistic reflector, so that future capabilities can be | act as a mechanistic reflector, so that future capabilities can be | |||
| unilaterally deployed without requiring 2-ended deployment (justified | unilaterally deployed without requiring 2-ended deployment (justified | |||
| in Section 2.5). | in Section 2.5). | |||
| During traversal testing, it was discovered that the IP ECN field in | During traversal testing, it was discovered that the IP-ECN field in | |||
| the SYN was mangled on a non-negligible proportion of paths. | the SYN was mangled on a non-negligible proportion of paths. | |||
| Therefore, it was necessary to allow the SYN/ACK to feed all four IP | Therefore, it was necessary to allow the SYN/ACK to feed all four IP- | |||
| ECN codepoints that the SYN could arrive with back to the Client. | ECN codepoints that the SYN could arrive with back to the Client. | |||
| Without this, the Client could not know whether to disable ECN for | Without this, the Client could not know whether to disable ECN for | |||
| the connection due to mangling of the IP ECN field (also explained in | the connection due to mangling of the IP-ECN field (also explained in | |||
| Section 2.5). This development consumed the remaining two codepoints | Section 2.5). This development consumed the remaining two codepoints | |||
| on the SYN/ACK that had been reserved for future use by AccECN in | on the SYN/ACK that had been reserved for future use by AccECN in | |||
| earlier draft versions of this document. | earlier draft versions of this document. | |||
| B.3. Space for Future Evolution | B.3. Space for Future Evolution | |||
| Despite availability of usable TCP header space being extremely | Despite availability of usable TCP header space being extremely | |||
| scarce, the AccECN protocol has taken all possible steps to ensure | scarce, the AccECN protocol has taken all possible steps to ensure | |||
| that there is space to negotiate possible future variants of the | that there is space to negotiate possible future variants of the | |||
| protocol, either if a variant of AccECN is required, or if a | protocol, either if a variant of AccECN is required, or if a | |||
| End of changes. 101 change blocks. | ||||
| 207 lines changed or deleted | 212 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. | ||||
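
The Security Considerations text shown in both columns above gives a covert-channel bound of 29 octets, computed as 40 - (2+3*3). The following minimal Python sketch reproduces that arithmetic. It assumes, as that text describes, that a Data Sender reads as many whole 3-octet fields as possible, up to 3, and treats the remainder as padding. The function name `split_accecn_option` and the constant names are hypothetical, chosen for illustration only. The sketch also assumes that the 40 in the text is the maximum TCP option space in octets and that the 2 is the Kind and Length octets of the option.

```python
# Illustrative sketch only; names are hypothetical and not taken from the RFC.
MAX_TCP_OPTION_SPACE = 40  # assumed: the "40" in the text, max octets of TCP options
OPTION_HEADER_LEN = 2      # assumed: the "2" in the text, Kind + Length octets
FIELD_LEN = 3              # each AccECN Option field is 3 octets
MAX_FIELDS = 3             # at most 3 fields are read


def split_accecn_option(length: int) -> tuple[int, int]:
    """Return (fields_read, padding_octets) for an option of total Length `length`."""
    if not OPTION_HEADER_LEN <= length <= MAX_TCP_OPTION_SPACE:
        raise ValueError(f"option length {length} is outside 2..40 octets")
    payload = length - OPTION_HEADER_LEN
    # Read as many whole 3-octet fields as possible, up to a maximum of 3.
    fields = min(MAX_FIELDS, payload // FIELD_LEN)
    # Whatever is left over is treated as padding.
    padding = payload - fields * FIELD_LEN
    return fields, padding


if __name__ == "__main__":
    for length in (2, 7, 11, 40):
        fields, padding = split_accecn_option(length)
        print(f"Length={length}: fields={fields}, padding={padding}")
```

With a Length of 40, the sketch reads 3 fields and leaves 29 octets of padding, which matches the bound in the text. A Length of 11 leaves no padding at all.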