rfc9626.original | rfc9626.txt | |||
---|---|---|---|---|
Network Working Group M. Zanaty | Internet Engineering Task Force (IETF) M. Zanaty | |||
Internet-Draft E. Berger | Request for Comments: 9626 E. Berger | |||
Intended status: Experimental S. Nandakumar | Category: Experimental S. Nandakumar | |||
Expires: 5 September 2024 Cisco Systems | ISSN: 2070-1721 Cisco Systems | |||
4 March 2024 | August 2024 | |||
Video Frame Marking RTP Header Extension | Video Frame Marking RTP Header Extension | |||
draft-ietf-avtext-framemarking-16 | ||||
Abstract | Abstract | |||
This document describes a Video Frame Marking RTP header extension | This document describes a Video Frame Marking RTP header extension | |||
used to convey information about video frames that is critical for | used to convey information about video frames that is critical for | |||
error recovery and packet forwarding in RTP middleboxes or network | error recovery and packet forwarding in RTP middleboxes or network | |||
nodes. It is most useful when media is encrypted, and essential when | nodes. It is most useful when media is encrypted and essential when | |||
the middlebox or node has no access to the media decryption keys. It | the middlebox or node has no access to the media decryption keys. It | |||
is also useful for codec-agnostic processing of encrypted or | is also useful for codec-agnostic processing of encrypted or | |||
unencrypted media, while it also supports extensions for codec- | unencrypted media, while it also supports extensions for codec- | |||
specific information. | specific information. | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This document is not an Internet Standards Track specification; it is | |||
provisions of BCP 78 and BCP 79. | published for examination, experimental implementation, and | |||
evaluation. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document defines an Experimental Protocol for the Internet | |||
and may be updated, replaced, or obsoleted by other documents at any | community. This document is a product of the Internet Engineering | |||
time. It is inappropriate to use Internet-Drafts as reference | Task Force (IETF). It represents the consensus of the IETF | |||
material or to cite them other than as "work in progress." | community. It has received public review and has been approved for | |||
publication by the Internet Engineering Steering Group (IESG). Not | ||||
all documents approved by the IESG are candidates for any level of | ||||
Internet Standard; see Section 2 of RFC 7841. | ||||
This Internet-Draft will expire on 5 September 2024. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc9626. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2024 IETF Trust and the persons identified as the | Copyright (c) 2024 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
in the Revised BSD License. | ||||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction | |||
2. Key Words for Normative Requirements . . . . . . . . . . . . 4 | 2. Requirements Language | |||
3. Frame Marking RTP Header Extension . . . . . . . . . . . . . 4 | 3. Frame Marking RTP Header Extension | |||
3.1. Long Extension for Scalable Streams . . . . . . . . . . . 5 | 3.1. Long Extension for Scalable Streams | |||
3.2. Short Extension for Non-Scalable Streams . . . . . . . . 7 | 3.2. Short Extension for Non-scalable Streams | |||
3.3. Layer ID Mappings for Scalable Streams . . . . . . . . . 7 | 3.3. LID Mappings for Scalable Streams | |||
3.3.1. VP9 LID Mapping . . . . . . . . . . . . . . . . . . . 8 | 3.3.1. VP9 LID Mapping | |||
3.3.2. H265 LID Mapping . . . . . . . . . . . . . . . . . . 8 | 3.3.2. H265 LID Mapping | |||
3.3.3. H264-SVC LID Mapping . . . . . . . . . . . . . . . . 9 | 3.3.3. H264 Scalable Video Coding (SVC) LID Mapping | |||
3.3.4. H264 (AVC) LID Mapping . . . . . . . . . . . . . . . 10 | 3.3.4. H264 Advanced Video Coding (AVC) LID Mapping | |||
3.3.5. VP8 LID Mapping . . . . . . . . . . . . . . . . . . . 10 | 3.3.5. VP8 LID Mapping | |||
3.3.6. Future Codec LID Mapping . . . . . . . . . . . . . . 11 | 3.3.6. Future Codec LID Mapping | |||
3.4. Signaling Information . . . . . . . . . . . . . . . . . . 11 | 3.4. Signaling Information | |||
3.5. Usage Considerations . . . . . . . . . . . . . . . . . . 11 | 3.5. Usage Considerations | |||
3.5.1. Relation to Layer Refresh Request (LRR) . . . . . . . 12 | 3.5.1. Relation to Layer Refresh Request (LRR) | |||
3.5.2. Scalability Structures . . . . . . . . . . . . . . . 12 | 3.5.2. Scalability Structures | |||
4. Security Considerations and Privacy Considerations . . . . . 12 | 4. Security and Privacy Considerations | |||
5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 13 | 5. IANA Considerations | |||
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 | 6. References | |||
7. References . . . . . . . . . . . . . . . . . . . . . . . . . 14 | 6.1. Normative References | |||
7.1. Normative References . . . . . . . . . . . . . . . . . . 14 | 6.2. Informative References | |||
7.2. Informative References . . . . . . . . . . . . . . . . . 14 | Acknowledgements | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 16 | Authors' Addresses | |||
1. Introduction | 1. Introduction | |||
Many widely deployed RTP [RFC3550] topologies [RFC7667] used in | Many widely deployed RTP [RFC3550] topologies [RFC7667] used in | |||
modern voice and video conferencing systems include a centralized | modern voice and video conferencing systems include a centralized | |||
component that acts as an RTP switch. It receives voice and video | component that acts as an RTP switch. It receives voice and video | |||
streams from each participant, which may be encrypted using SRTP | streams from each participant, which may be encrypted using Secure | |||
[RFC3711], or extensions that provide participants with private media | Real-time Transport Protocol (SRTP) [RFC3711] or extensions that | |||
[RFC8871] via end-to-end encryption where the switch has no access to | provide participants with private media [RFC8871] via end-to-end | |||
media decryption keys. The goal is to provide a set of streams back | encryption where the switch has no access to media decryption keys. | |||
to the participants which enable them to render the right media | The goal is to provide a set of streams back to the participants, | |||
content. In a simple video configuration, for example, the goal will | which enable them to render the right media content. For example, in | |||
be that each participant sees and hears just the active speaker. In | a simple video configuration, the goal will be that each participant | |||
that case, the goal of the switch is to receive the voice and video | sees and hears just the active speaker. In that case, the goal of | |||
streams from each participant, determine the active speaker based on | the switch is to receive the voice and video streams from each | |||
energy in the voice packets, possibly using the client-to-mixer audio | participant, determine the active speaker based on energy in the | |||
level RTP header extension [RFC6464], and select the corresponding | voice packets, possibly using the client-to-mixer audio level RTP | |||
video stream for transmission to participants; see Figure 1. | header extension [RFC6464], and select the corresponding video stream | |||
for transmission to participants; see Figure 1. | ||||
In this document, an "RTP switch" is used as a common short term for | In this document, an "RTP switch" is used as shorthand for the terms | |||
the terms "switching RTP mixer", "source projecting middlebox", | "switching RTP mixer", "source projecting middlebox", "source | |||
"source forwarding unit/middlebox" and "video switching MCU" as | forwarding unit/middlebox" and "video switching Multipoint Control | |||
discussed in [RFC7667]. | Unit (MCU)", as discussed in [RFC7667]. | |||
+---+ +------------+ +---+ | +---+ +------------+ +---+ | |||
| A |<---->| |<---->| B | | | A |<---->| |<---->| B | | |||
+---+ | | +---+ | +---+ | | +---+ | |||
| RTP | | | RTP | | |||
+---+ | Switch | +---+ | +---+ | Switch | +---+ | |||
| C |<---->| |<---->| D | | | C |<---->| |<---->| D | | |||
+---+ +------------+ +---+ | +---+ +------------+ +---+ | |||
Figure 1: RTP switch | Figure 1: RTP Switch | |||
In order to properly support switching of video streams, the RTP | In order to properly support the switching of video streams, the RTP | |||
switch typically needs some critical information about video frames | switch typically needs some critical information about video frames | |||
in order to start and stop forwarding streams. | in order to start and stop forwarding streams. | |||
* Because of inter-frame dependencies, it should ideally switch | * Because of inter-frame dependencies, it should ideally switch | |||
video streams at a point where the first frame from the new | video streams at a point where the first frame from the new | |||
speaker can be decoded by recipients without prior frames, e.g | speaker can be decoded by recipients without prior frames, e.g., | |||
switch on an intra-frame. | switch on an intra-frame. | |||
* In many cases, the switch may need to drop frames in order to | * In many cases, the switch may need to drop frames in order to | |||
realize congestion control techniques, and needs to know which | realize congestion control techniques, and it needs to know which | |||
frames can be dropped with minimal impact to video quality. | frames can be dropped with minimal impact to video quality. | |||
* For scalable streams with dependent layers, the switch may need to | * For scalable streams with dependent layers, the switch may need to | |||
selectively forward specific layers to specific recipients due to | selectively forward specific layers to specific recipients due to | |||
recipient bandwidth or decoder limits. | recipient bandwidth or decoder limits. | |||
Furthermore, it is highly desirable to do this in a payload format- | Furthermore, it is highly desirable to do this in a payload format- | |||
agnostic way which is not specific to each different video codec. | agnostic way that is not specific to each different video codec. | |||
Most modern video codecs share common concepts around frame types and | Most modern video codecs share common concepts around frame types and | |||
other critical information to make this codec-agnostic handling | other critical information to make this codec-agnostic handling | |||
possible. | possible. | |||
It is also desirable to be able to do this for SRTP without requiring | It is also desirable to be able to do this for SRTP without requiring | |||
the video switch to decrypt the packets. SRTP will encrypt the RTP | the video switch to decrypt the packets. SRTP will encrypt the RTP | |||
payload format contents and consequently this data is not usable for | payload format contents; consequently, this data is not usable for | |||
the switching function without decryption, which may not even be | the switching function without decryption, which may not even be | |||
possible in the case of end-to-end encryption of private media | possible in the case of end-to-end encryption of private media | |||
[RFC8871]. | [RFC8871]. | |||
By providing meta-information about the RTP streams outside the | By providing meta-information about the RTP streams outside the | |||
encrypted media payload, an RTP switch can do codec-agnostic | encrypted media payload, an RTP switch can do codec-agnostic | |||
selective forwarding without decrypting the payload. This document | selective forwarding without decrypting the payload. This document | |||
specifies the necessary meta-information in an RTP header extension. | specifies the necessary meta-information in an RTP header extension. | |||
2. Key Words for Normative Requirements | 2. Requirements Language | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
3. Frame Marking RTP Header Extension | 3. Frame Marking RTP Header Extension | |||
This specification uses RTP header extensions as defined in | This specification uses RTP header extensions as defined in | |||
[RFC8285]. A subset of meta-information from the video stream is | [RFC8285]. A subset of meta-information from the video stream is | |||
provided as an RTP header extension to allow an RTP switch to do | provided as an RTP header extension to allow an RTP switch to do | |||
generic selective forwarding of video streams encoded with | generic selective forwarding of video streams encoded with | |||
potentially different video codecs. | potentially different video codecs. | |||
The Frame Marking RTP header extension is encoded using the one-byte | The Frame Marking RTP header extension is encoded using the one-byte | |||
header or two-byte header as described in [RFC8285]. The one-byte | header or two-byte header as described in [RFC8285]. The one-byte | |||
header format is used for examples in this memo. The two-byte header | header format is used for examples in this document. The two-byte | |||
format is used when other two-byte header extensions are present in | header format is used when other two-byte header extensions are | |||
the same RTP packet, since mixing one-byte and two-byte extensions is | present in the same RTP packet since mixing one-byte and two-byte | |||
not possible in the same RTP packet. | extensions is not possible in the same RTP packet. | |||
This extension is only specified for Source (not Redundancy) RTP | This extension is only specified for Source (not Redundancy) RTP | |||
Streams [RFC7656] that carry video payloads. It is not specified for | Streams [RFC7656] that carry video payloads. It is not specified for | |||
audio payloads, nor is it specified for Redundancy RTP Streams. The | audio payloads, nor is it specified for Redundancy RTP Streams. The | |||
(separate) specifications for Redundancy RTP Streams often include | (separate) specifications for Redundancy RTP Streams often include | |||
provisions for recovering any header extensions that were part of the | provisions for recovering any header extensions that were part of the | |||
original source packet. Such provisions can be followed to recover | original source packet. Such provisions can be followed to recover | |||
the Frame Marking RTP header extension of the original source packet. | the Frame Marking RTP header extension of the original source packet. | |||
Source packet frame markings may be useful when generating Redundancy | Source packet frame markings may be useful when generating Redundancy | |||
RTP Streams; for example, the I (Independent Frame) and D | RTP Streams; for example, the I (Independent Frame) and D | |||
(Discardable Frame) bits, defined in Section 3.1, can be used to | (Discardable Frame) bits, defined in Section 3.1, can be used to | |||
generate extra or no redundancy, respectively, and redundancy schemes | generate extra or no redundancy, respectively, and redundancy schemes | |||
with source blocks can align source block boundaries with independent | with source blocks can align source block boundaries with independent | |||
frame boundaries as marked by the I bit. | frame boundaries as marked by the I bit. | |||
A frame, in the context of this specification, is the set of RTP | A frame, in the context of this specification, is the set of RTP | |||
packets with the same RTP timestamp from a specific RTP | packets with the same RTP timestamp from a specific RTP | |||
synchronization source (SSRC). A frame within a layer is the set of | Synchronization Source (SSRC). A frame within a layer is the set of | |||
RTP packets with the same RTP timestamp, SSRC, Temporal ID (TID), and | RTP packets with the same RTP timestamp, SSRC, Temporal ID (TID), and | |||
Layer ID (LID). | Layer ID (LID). | |||
3.1. Long Extension for Scalable Streams | 3.1. Long Extension for Scalable Streams | |||
The following RTP header extension is RECOMMENDED for scalable | The following RTP header extension is RECOMMENDED for scalable | |||
streams. It MAY also be used for non-scalable streams, in which case | streams. It MAY also be used for non-scalable streams, in which case | |||
TID, LID and TL0PICIDX MUST be 0 or omitted. The ID is assigned per | the TID, LID, and TL0PICIDX MUST be 0 or omitted. The ID is assigned | |||
[RFC8285], and the length is encoded as L=2 which indicates 3 octets | per [RFC8285]. The length is encoded as follows: | |||
of data when nothing is omitted, or L=1 for 2 octets when TL0PICIDX | ||||
is omitted, or L=0 for 1 octet when both LID and TL0PICIDX are | * L=2 to indicate 3 octets of data when nothing is omitted, | |||
omitted. | ||||
* L=1 for 2 octets when TL0PICIDX is omitted, or | ||||
* L=0 for 1 octet when both the LID and TL0PICIDX are omitted. | ||||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| ID=? | L=2 |S|E|I|D|B| TID | LID | TL0PICIDX | | | ID=? | L=2 |S|E|I|D|B| TID | LID | TL0PICIDX | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
or | or | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| ID=? | L=1 |S|E|I|D|B| TID | LID | (TL0PICIDX omitted) | | ID=? | L=1 |S|E|I|D|B| TID | LID | (TL0PICIDX omitted) | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
or | or | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| ID=? | L=0 |S|E|I|D|B| TID | (LID and TL0PICIDX omitted) | | ID=? | L=0 |S|E|I|D|B| TID | (LID and TL0PICIDX omitted) | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
The following information are extracted from the media payload and | The following information is extracted from the media payload and | |||
sent in the Frame Marking RTP header extension. | sent in the Frame Marking RTP header extension. | |||
* S: Start of Frame (1 bit) - MUST be 1 in the first packet in a | S: Start of Frame (1 bit) | |||
frame within a layer; otherwise MUST be 0. | MUST be 1 in the first packet in a frame within a layer; | |||
* E: End of Frame (1 bit) - MUST be 1 in the last packet in a frame | otherwise, MUST be 0. | |||
within a layer; otherwise MUST be 0. Note that the RTP header | ||||
marker bit MAY be used to infer the last packet of the highest | ||||
enhancement layer, in payload formats with such semantics. | ||||
* I: Independent Frame (1 bit) - MUST be 1 for a frame within a | ||||
layer that can be decoded independent of temporally prior frames, | ||||
e.g. intra-frame, VPX keyframe, H.264 IDR [RFC6184], H.265 | ||||
IDR/CRA/BLA/RAP [RFC7798]; otherwise MUST be 0. Note that this | ||||
bit only signals temporal independence, so it can be 1 in spatial | ||||
or quality enhancement layers that depend on temporally co-located | ||||
layers but not temporally prior frames. | ||||
* D: Discardable Frame (1 bit) - MUST be 1 for a frame within a | ||||
layer the sender knows can be discarded, and still provide a | ||||
decodable media stream; otherwise MUST be 0. | ||||
* B: Base Layer Sync (1 bit) - When TID is not 0, this MUST be 1 if | ||||
the sender knows this frame within a layer only depends on the | ||||
base temporal layer; otherwise MUST be 0. When TID is 0 or if no | ||||
scalability is used, this MUST be 0. | ||||
* TID: Temporal ID (3 bits) - Identifies the temporal layer/sub- | E: End of Frame (1 bit) | |||
layer encoded, starting with 0 for the base layer, and increasing | MUST be 1 in the last packet in a frame within a layer; otherwise, | |||
with higher temporal fidelity. If no scalability is used, this | MUST be 0. Note that the RTP header marker bit MAY be used to | |||
MUST be 0. It is implicitly 0 in the short extension format. | infer the last packet of the highest enhancement layer in payload | |||
* LID: Layer ID (8 bits) - Identifies the spatial and quality layer | formats with such semantics. | |||
encoded, starting with 0 for the base layer, and increasing with | ||||
higher fidelity. If no scalability is used, this MUST be 0 or | ||||
omitted to reduce length. When omitted, TL0PICIDX MUST also be | ||||
omitted. It is implicitly 0 in the short extension format or when | ||||
omitted in the long extension format. | ||||
* TL0PICIDX: Temporal Layer 0 Picture Index (8 bits) - When TID is 0 | ||||
and LID is 0, this is a cyclic counter labeling base layer frames. | ||||
When TID is not 0 or LID is not 0, this indicates a dependency on | ||||
the given index, such that this frame within this layer depends on | ||||
the frame with this label in the layer with TID 0 and LID 0. If | ||||
no scalability is used, or the cyclic counter is unknown, this | ||||
MUST be omitted to reduce length. Note that 0 is a valid index | ||||
value for TL0PICIDX. | ||||
The layer information contained in TID and LID convey useful aspects | I: Independent Frame (1 bit) | |||
of the layer structure that can be utilized in selective forwarding. | MUST be 1 for a frame within a layer that can be decoded | |||
independent of temporally prior frames, e.g., intra-frame, VPX | ||||
keyframe, H.264 Instantaneous Decoding Refresh (IDR) [RFC6184], or | ||||
H.265 IDR / Clean Random Access (CRA) / Broken Link Access (BLA) / | ||||
Random Access Point (RAP) [RFC7798]; otherwise, MUST be 0. Note | ||||
that this bit only signals temporal independence, so it can be 1 | ||||
in spatial or quality enhancement layers that depend on temporally | ||||
co-located layers but not temporally prior frames. | ||||
D: Discardable Frame (1 bit) | ||||
MUST be 1 for a frame within a layer the sender knows can be | ||||
discarded and still provide a decodable media stream; otherwise, | ||||
MUST be 0. | ||||
B: Base Layer Sync (1 bit) | ||||
When the TID is not 0, this MUST be 1 if the sender knows this | ||||
frame within a layer only depends on the base temporal layer; | ||||
otherwise, MUST be 0. When the TID is 0 or if no scalability is | ||||
used, this MUST be 0. | ||||
TID: Temporal ID (3 bits) | ||||
Identifies the temporal layer/sub-layer encoded, starting with 0 | ||||
for the base layer and increasing with higher temporal fidelity. | ||||
If no scalability is used, this MUST be 0. It is implicitly 0 in | ||||
the short extension format. | ||||
LID: Layer ID (8 bits) | ||||
Identifies the spatial and quality layer encoded, starting with 0 | ||||
for the base layer and increasing with higher fidelity. If no | ||||
scalability is used, this MUST be 0 or omitted to reduce length. | ||||
When the LID is omitted, TL0PICIDX MUST also be omitted. It is | ||||
implicitly 0 in the short extension format or when omitted in the | ||||
long extension format. | ||||
TL0PICIDX: Temporal Layer 0 Picture Index (8 bits) | ||||
When the TID is 0 and the LID is 0, this is a cyclic counter | ||||
labeling base layer frames. When the TID is not 0 or the LID is | ||||
not 0, the indication is that a dependency on the given index, | ||||
such that this frame within this layer depends on the frame with | ||||
this label in the layer with a TID 0 and LID 0. If no scalability | ||||
is used, or the cyclic counter is unknown, TL0PICIDX MUST be | ||||
omitted to reduce length. Note that 0 is a valid index value for | ||||
TL0PICIDX. | ||||
The layer information contained in the TID and LID convey useful | ||||
aspects of the layer structure that can be utilized in selective | ||||
forwarding. | ||||
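
As an illustration of the field layout described above, the following Python sketch decodes one Frame Marking element carried with the one-byte header of [RFC8285]. The names FrameMarking, parse_frame_marking, and parse_one_byte_element are hypothetical and are not defined by this specification. The sketch assumes the caller has already located the element inside the RTP header extension block, and it does not check the negotiated ID.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FrameMarking:
    start: bool                # S: first packet of a frame within a layer
    end: bool                  # E: last packet of a frame within a layer
    independent: bool          # I: decodable without temporally prior frames
    discardable: bool          # D: can be dropped, stream stays decodable
    base_layer_sync: bool      # B: depends only on the base temporal layer
    tid: int                   # TID: 3-bit temporal layer ID
    lid: int                   # LID: implicitly 0 when omitted
    tl0picidx: Optional[int]   # TL0PICIDX: None when omitted


def parse_frame_marking(data: bytes) -> FrameMarking:
    """Decode the 1, 2, or 3 data octets that follow the ID/L octet."""
    if not 1 <= len(data) <= 3:
        raise ValueError("frame marking data must be 1 to 3 octets")
    flags = data[0]
    return FrameMarking(
        start=bool(flags & 0x80),
        end=bool(flags & 0x40),
        independent=bool(flags & 0x20),
        discardable=bool(flags & 0x10),
        base_layer_sync=bool(flags & 0x08),
        tid=flags & 0x07,
        lid=data[1] if len(data) >= 2 else 0,
        tl0picidx=data[2] if len(data) == 3 else None,
    )


def parse_one_byte_element(element: bytes) -> Tuple[int, FrameMarking]:
    """Split the one-byte header (4-bit ID, 4-bit L) from the data octets."""
    if not element:
        raise ValueError("empty element")
    ext_id = element[0] >> 4
    length = (element[0] & 0x0F) + 1   # L encodes the data length minus one
    data = element[1:1 + length]
    if len(data) != length:
        raise ValueError("truncated element")
    return ext_id, parse_frame_marking(data)
```

For example, the octets 0x32 0xA2 0x01 0x07 would decode as ID 3 with S=1, I=1, TID=2, LID=1, and TL0PICIDX=7.
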
Without further information about the layer structure, these TID/LID | Without further information about the layer structure, these TID/LID | |||
identifiers can only be used for relative priority of layers and | identifiers can only be used for relative priority of layers and | |||
implicit dependencies between layers. They convey a layer hierarchy | implicit dependencies between layers. They convey a layer hierarchy | |||
with TID=0 and LID=0 identifying the base layer. Higher values of | with TID = 0 and LID = 0 identifying the base layer. Higher values | |||
TID identify higher temporal layers with higher frame rates. Higher | of TID identify higher temporal layers with higher frame rates. | |||
values of LID identify higher spatial and/or quality layers with | Higher values of LID identify higher spatial and/or quality layers | |||
higher resolutions and/or bitrates. Implicit dependencies between | with higher resolutions and/or bitrates. Implicit dependencies | |||
layers assume that a layer with a given TID/LID MAY depend on | between layers assume that a layer with a given TID/LID MAY depend on | |||
layer(s) with the same or lower TID/LID, but MUST NOT depend on | a layer or layers with the same or lower TID/LID, but they MUST NOT | |||
layer(s) with higher TID/LID. | depend on a layer or layers with higher TID/LID. | |||
With further information, for example, possible future RTCP SDES | With further information, for example, possible future RTCP source | |||
items that convey full layer structure information, it may be | description (SDES) items that convey full layer structure | |||
possible to map these TIDs and LIDs to specific absolute frame rates, | information, it may be possible to map these TIDs and LIDs to | |||
resolutions and bitrates, as well as explicit dependencies between | specific absolute frame rates, resolutions, bitrates, and explicit | |||
layers. Such additional layer information may be useful for | dependencies between layers. Such additional layer information may | |||
forwarding decisions in the RTP switch, but is beyond the scope of | be useful for forwarding decisions in the RTP switch but is beyond | |||
this memo. The relative layer information is still useful for many | the scope of this memo. The relative layer information is still | |||
selective forwarding decisions even without such additional layer | useful for many selective forwarding decisions, even without such | |||
information. | additional layer information. | |||
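
The relative layer hierarchy described above is enough for a simple per-recipient filter. The Python sketch below is one possible reading of it, not a normative algorithm: the function name should_forward and the max_tid and max_lid limits are hypothetical, and the sketch assumes that LID can be compared as a single ordered value, which may not hold for every codec-specific mapping.

```python
def should_forward(tid: int, lid: int, max_tid: int, max_lid: int) -> bool:
    """Forward a packet only if its layer is at or below the recipient limits.

    Because a layer MUST NOT depend on a layer with a higher TID/LID,
    dropping everything above the limits is assumed to leave the
    remaining layers decodable.
    """
    return tid <= max_tid and lid <= max_lid


# A recipient limited to the base spatial layer and two temporal layers.
packets = [(0, 0), (1, 0), (2, 0), (0, 1)]
forwarded = [p for p in packets if should_forward(p[0], p[1], max_tid=1, max_lid=0)]
```

A switch would typically choose the limits per recipient from bandwidth estimates or decoder capabilities; how those limits are chosen is outside what this sketch shows.
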
3.2. Short Extension for Non-Scalable Streams | 3.2. Short Extension for Non-scalable Streams | |||
The following RTP header extension is RECOMMENDED for non-scalable | The following RTP header extension is RECOMMENDED for non-scalable | |||
streams. It is identical to the shortest form of the extension for | streams. It is identical to the shortest form of the extension for | |||
scalable streams, except the last four bits (B and TID) are replaced | scalable streams, except the last four bits (B and TID) are replaced | |||
with zeros. It MAY also be used for scalable streams if the sender | with zeros. It MAY also be used for scalable streams if the sender | |||
has limited or no information about stream scalability. The ID is | has limited or no information about stream scalability. The ID is | |||
assigned per [RFC8285], and the length is encoded as L=0 which | assigned per [RFC8285]; the length is encoded as L=0, which indicates | |||
indicates 1 octet of data. | 1 octet of data. | |||
0 1 | 0 1 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| ID=? | L=0 |S|E|I|D|0 0 0 0| | | ID=? | L=0 |S|E|I|D|0 0 0 0| | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
The following information are extracted from the media payload and | The following information is extracted from the media payload and | |||
sent in the Frame Marking RTP header extension. | sent in the Frame Marking RTP header extension. | |||
* S: Start of Frame (1 bit) - MUST be 1 in the first packet in a | S: Start of Frame (1 bit) | |||
frame; otherwise MUST be 0. | MUST be 1 in the first packet in a frame; otherwise, MUST be 0. | |||
* E: End of Frame (1 bit) - MUST be 1 in the last packet in a frame; | ||||
otherwise MUST be 0. SHOULD match the RTP header marker bit in | ||||
payload formats with such semantics for marking end of frame. | ||||
* I: Independent Frame (1 bit) - MUST be 1 for frames that can be | ||||
decoded independent of temporally prior frames, e.g. intra-frame, | ||||
VPX keyframe, H.264 IDR [RFC6184], H.265 IDR/CRA/BLA/IRAP | ||||
[RFC7798]; otherwise MUST be 0. | ||||
* D: Discardable Frame (1 bit) - MUST be 1 for frames the sender | ||||
knows can be discarded, and still provide a decodable media | ||||
stream; otherwise MUST be 0. | ||||
* The remaining (4 bits) - are reserved/fixed values and not used | ||||
for non-scalable streams; they MUST be set to 0 upon transmission | ||||
and ignored upon reception. | ||||
3.3. Layer ID Mappings for Scalable Streams | E: End of Frame (1 bit) | |||
MUST be 1 in the last packet in a frame; otherwise, MUST be 0. | ||||
SHOULD match the RTP header marker bit in payload formats with | ||||
such semantics for marking end of frame. | ||||
This section maps the specific Layer ID information contained in | I: Independent Frame (1 bit) | |||
specific scalable codecs to the generic LID and TID fields. | MUST be 1 for frames that can be decoded independent of temporally | |||
prior frames, e.g., intra-frame, VPX keyframe, H.264 IDR | ||||
[RFC6184], or H.265 IDR/CRA/BLA/IRAP [RFC7798]; otherwise, MUST be | ||||
0. | ||||
Note that non-scalable streams have no Layer ID information and thus | D: Discardable Frame (1 bit) | |||
no mappings. | MUST be 1 for frames the sender knows can be discarded and still | |||
provide a decodable media stream; otherwise, MUST be 0. | ||||
The remaining (4 bits) | ||||
These are reserved/fixed values and not used for non-scalable | ||||
streams; they MUST be set to 0 upon transmission and ignored upon | ||||
reception. | ||||
3.3. LID Mappings for Scalable Streams | ||||
This section maps the specific Layer ID (LID) information contained | ||||
in specific scalable codecs to the generic LID and TID fields. | ||||
Note that non-scalable streams have no LID information; thus, they | ||||
have no mappings. | ||||
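The S, E, I, and D bit definitions above can be sketched in code. This is a minimal illustration, not part of either document version. It assumes the one-byte form places S in the most significant bit, followed by E, I, and D, which is the bit order shown in the figures of the subsections below. The names `ShortFrameMarking`, `pack_short`, and `unpack_short` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShortFrameMarking:
    """Flags carried for non-scalable streams (hypothetical helper type)."""
    start: bool         # S: first packet in a frame
    end: bool           # E: last packet in a frame
    independent: bool   # I: decodable without temporally prior frames
    discardable: bool   # D: sender knows the frame can be dropped


def pack_short(m: ShortFrameMarking) -> bytes:
    """Build the extension data byte; the 4 reserved bits are sent as 0."""
    value = (
        (int(m.start) << 7)
        | (int(m.end) << 6)
        | (int(m.independent) << 5)
        | (int(m.discardable) << 4)
    )
    return bytes([value])


def unpack_short(data: bytes) -> ShortFrameMarking:
    """Read the flags; the 4 reserved bits are ignored on reception."""
    if len(data) != 1:
        raise ValueError("expected exactly one byte of extension data")
    b = data[0]
    return ShortFrameMarking(
        start=bool(b & 0x80),
        end=bool(b & 0x40),
        independent=bool(b & 0x20),
        discardable=bool(b & 0x10),
    )
```

The receiver masks out the low 4 bits rather than rejecting nonzero values, which follows the "ignored upon reception" rule.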
3.3.1. VP9 LID Mapping | 3.3.1. VP9 LID Mapping | |||
The VP9 [I-D.ietf-payload-vp9] Spatial Layer ID (SID, 3 bits) and | The VP9 [RFC9628] Spatial Layer ID (SID, 3 bits) and Temporal Layer | |||
Temporal Layer ID (TID, 3 bits) in the VP9 payload descriptor are | ID (TID, 3 bits) in the VP9 payload descriptor are mapped to the | |||
mapped to the generic LID and TID fields in the header extension as | generic LID and TID fields in the header extension as shown in the | |||
shown in the following figure. | following figure. | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| ID=? | L=2 |S|E|I|D|B| TID |0|0|0|0|0| SID | TL0PICIDX | | | ID=? | L=2 |S|E|I|D|B| TID |0|0|0|0|0| SID | TL0PICIDX | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
The S bit MUST match the B bit in the VP9 payload descriptor. | The S bit MUST match the B bit in the VP9 payload descriptor. | |||
The E bit MUST match the E bit in the VP9 payload descriptor. | The E bit MUST match the E bit in the VP9 payload descriptor. | |||
The I bit MUST match the inverse of the P bit in the VP9 payload | The I bit MUST match the inverse of the P bit in the VP9 payload | |||
descriptor. | descriptor. | |||
The D bit MUST be 1 if the refresh_frame_flags in the VP9 payload | The D bit MUST be 1 if the refresh_frame_flags in the VP9 payload | |||
uncompressed header are all 0, otherwise it MUST be 0. | uncompressed header are all 0; otherwise, it MUST be 0. | |||
The B bit MUST be 0 if TID is 0; otherwise, if TID is not 0, it MUST | The B bit MUST be 0 if the TID is 0; if the TID is not 0, it MUST | |||
match the U bit in the VP9 payload descriptor. Note: When using | match the U bit in the VP9 payload descriptor. Note: when using | |||
temporally nested scalability structures as recommended in | temporally nested scalability structures as recommended in | |||
Section 3.5.2, the B bit and VP9 U bit will always be 1 if TID is not | Section 3.5.2, the B bit and VP9 U bit will always be 1 if the TID is | |||
0, since it is always possible to switch up to a higher temporal | not 0 since it is always possible to switch up to a higher temporal | |||
layer in such nested structures. | layer in such nested structures. | |||
TID, SID and TL0PICIDX MUST match the correspondingly named fields in | The TID, SID, and TL0PICIDX MUST match the correspondingly named | |||
the VP9 payload descriptor, with SID aligned in the least significant | fields in the VP9 payload descriptor, with SID aligned in the least | |||
3 bits of the 8-bit LID field and zeros in the most significant 5 | significant 3 bits of the 8-bit LID field and zeros in the most | |||
bits. | significant 5 bits. | |||
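The VP9 rules above can be sketched as a function that builds the three data bytes of the extension (flags and TID, then LID, then TL0PICIDX). This is an illustrative sketch only. The function name and parameter names are hypothetical, and the caller is assumed to have already parsed the VP9 payload descriptor and uncompressed header.

```python
def vp9_frame_marking(desc_b: bool, desc_e: bool, desc_p: bool,
                      desc_u: bool, refresh_frame_flags: int,
                      tid: int, sid: int, tl0picidx: int) -> bytes:
    """Map VP9 descriptor fields to the 3-byte frame marking data."""
    if not 0 <= tid <= 7:
        raise ValueError("TID is a 3-bit field")
    if not 0 <= sid <= 7:
        raise ValueError("SID is a 3-bit field")
    if not 0 <= tl0picidx <= 255:
        raise ValueError("TL0PICIDX is an 8-bit field")

    s = desc_b                          # S matches the descriptor's B bit
    e = desc_e                          # E matches the descriptor's E bit
    i = not desc_p                      # I is the inverse of the P bit
    d = refresh_frame_flags == 0        # D is 1 only if no buffer is refreshed
    b = desc_u if tid != 0 else False   # B is 0 for TID 0, else the U bit

    flags = (
        (int(s) << 7) | (int(e) << 6) | (int(i) << 5)
        | (int(d) << 4) | (int(b) << 3) | tid
    )
    lid = sid  # SID sits in the low 3 bits; the high 5 bits stay 0
    return bytes([flags, lid, tl0picidx])
```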
3.3.2. H265 LID Mapping | 3.3.2. H265 LID Mapping | |||
The H265 [RFC7798] LayerID (6 bits) and TID (3 bits) from the NAL | The H265 [RFC7798] LayerID (6 bits), and TID (3 bits) from the | |||
unit header are mapped to the generic LID and TID fields in the | Network Abstraction Layer (NAL) unit header are mapped to the generic | |||
header extension as shown in the following figure. | LID and TID fields in the header extension as shown in the following | |||
figure. | ||||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| ID=? | L=2 |S|E|I|D|B| TID |0|0| LayerID | TL0PICIDX | | | ID=? | L=2 |S|E|I|D|B| TID |0|0| LayerID | TL0PICIDX | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
The S and E bits MUST match the correspondingly named bits in | The S and E bits MUST match the correspondingly named bits in | |||
PACI:PHES:TSCI payload structures. | PACI:PHES:TSCI payload structures. | |||
The I bit MUST be 1 when the NAL unit type is 16-23 (inclusive) or | The I bit MUST be 1 when the NAL unit type is 16-23 (inclusive) or | |||
32-34 (inclusive), or an aggregation packet or fragmentation unit | 32-34 (inclusive), or an aggregation packet or fragmentation unit | |||
encapsulating any of these types, otherwise it MUST be 0. These | encapsulating any of these types; otherwise, it MUST be 0. These | |||
ranges cover intra (IRAP) frames as well as critical parameter sets | ranges cover intra (IRAP) frames as well as critical parameter sets | |||
(VPS, SPS, PPS). | (Video Parameter Set (VPS), Sequence Parameter Set (SPS), Picture | |||
Parameter Set (PPS)). | ||||
The D bit MUST be 1 when the NAL unit type is 0, 2, 4, 6, 8, 10, 12, | The D bit MUST be 1 when the NAL unit type is 0, 2, 4, 6, 8, 10, 12, | |||
14, or 38, or an aggregation packet or fragmentation unit | 14, 38, or an aggregation packet or fragmentation unit encapsulating | |||
encapsulating only these types, otherwise it MUST be 0. These ranges | only these types; otherwise, it MUST be 0. These ranges cover non- | |||
cover non-reference frames as well as filler data. | reference frames as well as filler data. | |||
The B bit can not be determined reliably from simple inspection of | The B bit cannot be determined reliably from simple inspection of | |||
payload headers, and therefore is determined by implementation- | payload headers; therefore, it is determined by implementation- | |||
specific means. For example, internal codec interfaces may provide | specific means. For example, internal codec interfaces may provide | |||
information to set this reliably. | information to set this reliably. | |||
TID and LayerID MUST match the correspondingly named fields in the | The TID and LayerID MUST match the correspondingly named fields in | |||
H265 NAL unit header, with LayerID aligned in the least significant 6 | the H265 NAL unit header, with LayerID aligned in the least | |||
bits of the 8-bit LID field and zeros in the most significant 2 bits. | significant 6 bits of the 8-bit LID field and zeros in the most | |||
significant 2 bits. | ||||
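The H265 I and D bit rules above differ in one detail: an aggregation packet or fragmentation unit sets I if any encapsulated NAL unit qualifies, but sets D only if all of them qualify. A minimal sketch of that distinction follows. It assumes the caller supplies the list of encapsulated NAL unit types (one element for a single NAL unit packet), and all names are hypothetical.

```python
from typing import Iterable

# NAL unit types 16-23 and 32-34 (both ranges inclusive) set the I bit.
H265_INDEPENDENT_TYPES = frozenset(range(16, 24)) | frozenset(range(32, 35))
# These NAL unit types set the D bit.
H265_DISCARDABLE_TYPES = frozenset({0, 2, 4, 6, 8, 10, 12, 14, 38})


def h265_i_bit(nal_types: Iterable[int]) -> bool:
    """I is 1 if ANY encapsulated NAL unit has an independent type."""
    return any(t in H265_INDEPENDENT_TYPES for t in nal_types)


def h265_d_bit(nal_types: Iterable[int]) -> bool:
    """D is 1 only if ALL encapsulated NAL units have a discardable type."""
    types = list(nal_types)
    # An empty list is treated as not discardable (an assumption).
    return bool(types) and all(t in H265_DISCARDABLE_TYPES for t in types)


def h265_lid(layer_id: int) -> int:
    """Place the 6-bit LayerID in the low bits of the 8-bit LID field."""
    if not 0 <= layer_id <= 63:
        raise ValueError("LayerID is a 6-bit field")
    return layer_id  # the 2 most significant bits remain 0
```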
3.3.3. H264-SVC LID Mapping | 3.3.3. H264 Scalable Video Coding (SVC) LID Mapping | |||
The following shows H264-SVC [RFC6190] Layer encoding information (3 | The following shows H264-SVC [RFC6190] Layer encoding information (3 | |||
bits for spatial/dependency layer, 4 bits for quality layer and 3 | bits for spatial/dependency layer, 4 bits for quality layer, and 3 | |||
bits for temporal layer) mapped to the generic LID and TID fields. | bits for temporal layer) mapped to the generic LID and TID fields. | |||
The S, E, I and D bits MUST match the correspondingly named bits in | The S, E, I, and D bits MUST match the correspondingly named bits in | |||
PACSI payload structures. | Payload Content Scalability Information (PACSI) payload structures. | |||
The I bit MUST be 1 when the NAL unit type is 5, 7, 8, 13, or 15, or | The I bit MUST be 1 when the NAL unit type is 5, 7, 8, 13, 15, or an | |||
an aggregation packet or fragmentation unit encapsulating any of | aggregation packet or fragmentation unit encapsulating any of these | |||
these types, otherwise it MUST be 0. These ranges cover intra (IDR) | types; otherwise, it MUST be 0. These ranges cover intra (IDR) | |||
frames as well as critical parameter sets (SPS/PPS variants). | frames as well as critical parameter sets (SPS/PPS variants). | |||
The D bit MUST be 1 when the NAL unit header NRI field is 0, or an | The D bit MUST be 1 when the NAL unit header Network Remote | |||
aggregation packet or fragmentation unit encapsulating only NAL units | Identification (NRI) field is 0, or an aggregation packet or | |||
with NRI=0, otherwise it MUST be 0. The NRI=0 condition signals non- | fragmentation unit encapsulating only NAL units with NRI=0; | |||
reference frames. | otherwise, it MUST be 0. The NRI=0 condition signals non-reference | |||
frames. | ||||
The B bit can not be determined reliably from simple inspection of | The B bit cannot be determined reliably from simple inspection of | |||
payload headers, and therefore is determined by implementation- | payload headers; therefore, it is determined by implementation- | |||
specific means. For example, internal codec interfaces may provide | specific means. For example, internal codec interfaces may provide | |||
information to set this reliably. | information to set this reliably. | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| ID=? | L=2 |S|E|I|D|B| TID |0| DID | QID | TL0PICIDX | | | ID=? | L=2 |S|E|I|D|B| TID |0| DID | QID | TL0PICIDX | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
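The LID layout in the figure above (one zero bit, then the 3-bit DID, then the 4-bit QID) can be sketched as a pair of helpers. The function names are hypothetical, and the sketch covers only the LID byte, not the flags or TL0PICIDX.

```python
from typing import Tuple


def h264_svc_lid(did: int, qid: int) -> int:
    """Pack DID (3 bits) and QID (4 bits) into the 8-bit LID field."""
    if not 0 <= did <= 7:
        raise ValueError("DID is a 3-bit field")
    if not 0 <= qid <= 15:
        raise ValueError("QID is a 4-bit field")
    # The most significant bit stays 0, DID occupies the next 3 bits,
    # and QID occupies the low 4 bits.
    return (did << 4) | qid


def h264_svc_split_lid(lid: int) -> Tuple[int, int]:
    """Recover (DID, QID) from an 8-bit LID value."""
    return (lid >> 4) & 0x07, lid & 0x0F
```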
3.3.4. H264 (AVC) LID Mapping | 3.3.4. H264 Advanced Video Coding (AVC) LID Mapping | |||
The following shows the header extension for H264 (AVC) [RFC6184] | The following shows the header extension for H264 (AVC) [RFC6184] | |||
that contains only temporal layer information. | that contains only temporal layer information. | |||
The S bit MUST be 1 when the timestamp in the RTP header differs from | The S bit MUST be 1 when the timestamp in the RTP header differs from | |||
the timestamp in the prior RTP sequence number from the same SSRC, | the timestamp in the prior RTP sequence number from the same SSRC; | |||
otherwise it MUST be 0. | otherwise, it MUST be 0. | |||
The E bit MUST match the M bit in the RTP header. | The E bit MUST match the M bit in the RTP header. | |||
The I bit MUST be 1 when the NAL unit type is 5, 7, or 8, or an | The I bit MUST be 1 when the NAL unit type is 5, 7, or 8, or an | |||
aggregation packet or fragmentation unit encapsulating any of these | aggregation packet or fragmentation unit encapsulating any of these | |||
types, otherwise it MUST be 0. These ranges cover intra (IDR) frames | types; otherwise, it MUST be 0. These ranges cover intra (IDR) | |||
as well as critical parameter sets (SPS/PPS). | frames as well as critical parameter sets (SPS/PPS). | |||
The D bit MUST be 1 when the NAL unit header NRI field is 0, or an | The D bit MUST be 1 when the NAL unit header NRI field is 0, or an | |||
aggregation packet or fragmentation unit encapsulating only NAL units | aggregation packet or fragmentation unit encapsulating only NAL units | |||
with NRI=0, otherwise it MUST be 0. The NRI=0 condition signals non- | with NRI=0; otherwise, it MUST be 0. The NRI=0 condition signals | |||
reference frames. | non-reference frames. | |||
The B bit can not be determined reliably from simple inspection of | The B bit cannot be determined reliably from simple inspection of | |||
payload headers, and therefore is determined by implementation- | payload headers; therefore, it is determined by implementation- | |||
specific means. For example, internal codec interfaces may provide | specific means. For example, internal codec interfaces may provide | |||
information to set this reliably. | information to set this reliably. | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| ID=? | L=2 |S|E|I|D|B| TID |0|0|0|0|0|0|0|0| TL0PICIDX | | | ID=? | L=2 |S|E|I|D|B| TID |0|0|0|0|0|0|0|0| TL0PICIDX | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
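For H264 (AVC), the I and D bits can be derived from the first byte of each encapsulated NAL unit, which carries a 1-bit F field, the 2-bit NRI field, and the 5-bit NAL unit type. The sketch below assumes the caller has already extracted those header bytes from any aggregation packet or fragmentation unit. The function name is hypothetical.

```python
from typing import Iterable, Tuple

# NAL unit types that set the I bit: IDR slice (5), SPS (7), PPS (8).
H264_INDEPENDENT_TYPES = frozenset({5, 7, 8})


def h264_i_and_d_bits(nal_header_bytes: Iterable[int]) -> Tuple[bool, bool]:
    """Return (I, D) for a packet, given each NAL unit's first byte."""
    headers = list(nal_header_bytes)
    nal_types = [h & 0x1F for h in headers]    # low 5 bits: NAL unit type
    nris = [(h >> 5) & 0x03 for h in headers]  # bits 5-6: NRI

    # I is 1 if ANY encapsulated NAL unit has an independent type.
    i_bit = any(t in H264_INDEPENDENT_TYPES for t in nal_types)
    # D is 1 only if ALL encapsulated NAL units have NRI == 0.
    # An empty list is treated as not discardable (an assumption).
    d_bit = bool(headers) and all(n == 0 for n in nris)
    return i_bit, d_bit
```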
3.3.5. VP8 LID Mapping | 3.3.5. VP8 LID Mapping | |||
The following shows the header extension for VP8 [RFC7741] that | The following shows the header extension for VP8 [RFC7741] that | |||
contains only temporal layer information. | contains only temporal layer information. | |||
The S bit MUST match the correspondingly named bit in the VP8 payload | The S bit MUST match the correspondingly named bit in the VP8 payload | |||
descriptor when PID=0, otherwise it MUST be 0. | descriptor when PID=0; otherwise, it MUST be 0. | |||
The E bit MUST match the M bit in the RTP header. | The E bit MUST match the M bit in the RTP header. | |||
The I bit MUST match the inverse of the P bit in the VP8 payload | The I bit MUST match the inverse of the P bit in the VP8 payload | |||
header. | header. | |||
The D bit MUST match the N bit in the VP8 payload descriptor. | The D bit MUST match the N bit in the VP8 payload descriptor. | |||
The B bit MUST match the Y bit in the VP8 payload descriptor. Note: | The B bit MUST match the Y bit in the VP8 payload descriptor. Note: | |||
When using temporally nested scalability structures as recommended in | when using temporally nested scalability structures as recommended in | |||
Section 3.5.2, the B bit and VP8 Y bit will always be 1 if TID is not | Section 3.5.2, the B bit and VP8 Y bit will always be 1 if the TID is | |||
0, since it is always possible to switch up to a higher temporal | not 0 since it is always possible to switch up to a higher temporal | |||
layer in such nested structures. | layer in such nested structures. | |||
TID and TL0PICIDX MUST match the correspondingly named fields in the | The TID and TL0PICIDX MUST match the correspondingly named fields in | |||
VP8 payload descriptor. | the VP8 payload descriptor. | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| ID=? | L=2 |S|E|I|D|B| TID |0|0|0|0|0|0|0|0| TL0PICIDX | | | ID=? | L=2 |S|E|I|D|B| TID |0|0|0|0|0|0|0|0| TL0PICIDX | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
3.3.6. Future Codec LID Mapping | 3.3.6. Future Codec LID Mapping | |||
The RTP payload format specification for future video codecs SHOULD | The RTP payload format specification for future video codecs SHOULD | |||
skipping to change at page 11, line 51 ¶ | skipping to change at line 544 ¶ | |||
3.5. Usage Considerations | 3.5. Usage Considerations | |||
The header extension values MUST represent what is already in the RTP | The header extension values MUST represent what is already in the RTP | |||
payload. | payload. | |||
When an RTP switch needs to discard a received video frame due to | When an RTP switch needs to discard a received video frame due to | |||
congestion control considerations, it is RECOMMENDED that it | congestion control considerations, it is RECOMMENDED that it | |||
preferably drop frames marked with the D (Discardable) bit set, or | preferably drop frames marked with the D (Discardable) bit set, or | |||
the highest values of TID and LID, which indicate the highest | the highest values of TID and LID, which indicate the highest | |||
temporal and spatial/quality enhancement layers, since those | temporal and spatial/quality enhancement layers, since those | |||
typically have fewer dependenices on them than lower layers. | typically have fewer dependencies on them than lower layers. | |||
When an RTP switch wants to forward a new video stream to a receiver, | When an RTP switch wants to forward a new video stream to a receiver, | |||
it is RECOMMENDED to select the new video stream from the first | it is RECOMMENDED to select the new video stream from the first | |||
switching point with the I (Independent) bit set in all spatial | switching point with the I (Independent) bit set in all spatial | |||
layers and forward the same. An RTP switch can request a media | layers and forward the same. An RTP switch can request that a media | |||
source to generate a switching point by sending Full Intra Request | source generate a switching point by sending Full Intra Request (RTCP | |||
(RTCP FIR) as defined in [RFC5104], for example. | FIR) as defined in [RFC5104], for example. | |||
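The drop preference described above can be sketched as a sort over candidate frames. This is one possible reading, not a prescribed algorithm. It assumes that D-marked frames go first, and that among the rest a higher TID is preferred for dropping before a higher LID. The text does not rank TID against LID, so that tie-break is an assumption. The `Marking` type and `drop_order` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Marking:
    """Frame marking fields relevant to drop decisions (hypothetical type)."""
    discardable: bool  # D bit
    tid: int           # temporal layer ID
    lid: int           # spatial/quality layer ID


def drop_order(frames: Iterable[Marking]) -> List[Marking]:
    """Order frames from most preferred to least preferred to drop."""
    return sorted(
        frames,
        # False sorts before True, so discardable frames come first;
        # negated IDs put the highest enhancement layers next.
        key=lambda f: (not f.discardable, -f.tid, -f.lid),
    )
```

Under congestion, a switch using this sketch would drop from the front of the returned list until the required rate reduction is met.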
3.5.1. Relation to Layer Refresh Request (LRR) | 3.5.1. Relation to Layer Refresh Request (LRR) | |||
Receivers can use the Layer Refresh Request (LRR) | Receivers can use the Layer Refresh Request (LRR) [RFC9627] RTCP | |||
[I-D.ietf-avtext-lrr] RTCP feedback message to upgrade to a higher | feedback message to upgrade to a higher layer in scalable encodings. | |||
layer in scalable encodings. The TID/LID values and formats used in | The TID/LID values and formats used in LRR messages MUST correspond | |||
LRR messages MUST correspond to the same values and formats specified | to the same values and formats specified in Section 3.1. | |||
in Section 3.1. | ||||
Because frame marking can only be used with temporally-nested | Because frame marking can only be used with temporally nested | |||
streams, temporal-layer LRR refreshes are unnecessary for frame- | streams, temporal-layer LRR refreshes are unnecessary for frame- | |||
marked streams. Other refreshes can be detected based on the I bit | marked streams. Other refreshes can be detected based on the I bit | |||
being set for the specific spatial layers. | being set for the specific spatial layers. | |||
3.5.2. Scalability Structures | 3.5.2. Scalability Structures | |||
The LID and TID information is most useful for fixed scalability | The LID and TID information is most useful for fixed scalability | |||
structures, such as nested hierarchical temporal layering structures, | structures, such as nested hierarchical temporal layering structures, | |||
where each temporal layer only references lower temporal layers or | where each temporal layer only references lower temporal layers or | |||
the base temporal layer. The LID and TID information is less useful, | the base temporal layer. The LID and TID information is less useful, | |||
or even not useful at all, for complex, irregular scalability | or even not useful at all, for complex, irregular scalability | |||
structures that do not conform to common, fixed patterns of inter- | structures that do not conform to common, fixed patterns of inter- | |||
layer dependencies and referencing structures. Therefore it is | layer dependencies and referencing structures. Therefore, it is | |||
RECOMMENDED to use LID and TID information for RTP switch forwarding | RECOMMENDED to use LID and TID information for RTP switch forwarding | |||
decisions only in the case of temporally nested scalability | decisions only in the case of temporally nested scalability | |||
structures, and it is NOT RECOMMENDED for other (more complex or | structures, and it is NOT RECOMMENDED for other (more complex or | |||
irregular) scalability structures. | irregular) scalability structures. | |||
4. Security Considerations and Privacy Considerations | 4. Security and Privacy Considerations | |||
In the Secure Real-Time Transport Protocol (SRTP) [RFC3711], RTP | In "The Secure Real-time Transport Protocol (SRTP)" [RFC3711], RTP | |||
header extensions are authenticated and optionally encrypted | header extensions are authenticated and optionally encrypted | |||
[RFC9335]. When unencrypted header extensions are used, some | [RFC9335]. When unencrypted header extensions are used, some | |||
metadata is exposed and visible to middle boxes on the network path, | metadata is exposed and visible to middleboxes on the network path, | |||
while encrypted media data and metadata in encrypted header | while encrypted media data and metadata in encrypted header | |||
extensions are not exposed. | extensions are not exposed. | |||
The primary utility of this specification is for RTP switches to make | The primary utility of this specification is for RTP switches to make | |||
proper media forwarding decisions. RTP switches are the SRTP peers | proper media forwarding decisions. RTP switches are the SRTP peers | |||
of endpoints, so they can access encrypted header extensions, but not | of endpoints, so they can access encrypted header extensions, but not | |||
end-to-end encrypted private media payloads. Other middle boxes on | end-to-end encrypted private media payloads. Other middleboxes on | |||
the network path can only access unencrypted header extensions, since | the network path can only access unencrypted header extensions since | |||
they are not SRTP peers. | they are not SRTP peers. | |||
RTP endpoints which negotiate this extension should consider whether | RTP endpoints that negotiate this extension should consider whether: | |||
this video frame marking metadata needs to be exposed to the SRTP | ||||
peer only, in which case the header extension can be encrypted; or | ||||
whether other middle boxes on the network path also need this | ||||
metadata, for example, to optimize packet drop decisions that | ||||
minimize media quality impacts, in which case the header extension | ||||
can be unencrypted, if the endpoint accepts the potential privacy | ||||
leakage of this metadata. For example, it would be possible to | ||||
determine keyframes and their frequency in unencrypted header | ||||
extensions. This information can often be obtained via statistical | ||||
analysis of encrypted data. For example, keyframes are usually much | ||||
larger than other frames, so frame size alone can leak this in the | ||||
absence of any unencrypted metadata. However, unencrypted metadata | ||||
provides a reliable signal rather than a statistical probability; so | ||||
endpoints should take that into consideration to balance the privacy | ||||
leakage risk against the potential benefit of optimized media | ||||
delivery when deciding whether to negotiate and encrypt this header | ||||
extension. | ||||
5. Acknowledgements | * this video frame marking metadata needs to be exposed to the SRTP | |||
peer only, in which case the header extension can be encrypted; or | ||||
Many thanks to Bernard Aboba, Jonathan Lennox, Stephan Wenger, Dale | * other middleboxes on the network path also need this metadata, for | |||
Worley, and Magnus Westerlund for their inputs. | example, to optimize packet drop decisions that minimize media | |||
quality impacts, in which case the header extension can be | ||||
unencrypted, if the endpoint accepts the potential privacy leakage | ||||
of this metadata. | ||||
6. IANA Considerations | For example, it would be possible to determine keyframes and their | |||
frequency in unencrypted header extensions. This information can | ||||
often be obtained via statistical analysis of encrypted data. For | ||||
example, keyframes are usually much larger than other frames, so | ||||
frame size alone can leak this in the absence of any unencrypted | ||||
metadata. However, unencrypted metadata provides a reliable signal | ||||
rather than a statistical probability; so endpoints should take that | ||||
into consideration to balance the privacy leakage risk against the | ||||
potential benefit of optimized media delivery when deciding whether | ||||
to negotiate and encrypt this header extension. | ||||
This document defines a new extension URI to the RTP Compact | 5. IANA Considerations | |||
HeaderExtensions sub-registry of the Real-Time Transport Protocol | ||||
(RTP) Parameters registry, according to the following data: | This document defines a new extension URI listed in the "RTP Compact | |||
Header Extensions" subregistry of the "Real-Time Transport Protocol | ||||
(RTP) Parameters" registry, according to the following data: | ||||
Extension URI: urn:ietf:params:rtp-hdrext:framemarkinginfo | Extension URI: urn:ietf:params:rtp-hdrext:framemarkinginfo | |||
Description: Frame marking information for video streams | Description: Frame marking information for video streams | |||
Contact: mzanaty@cisco.com | Contact: mzanaty@cisco.com | |||
Reference: RFC XXXX | Reference: RFC 9626 | |||
Note to RFC Editor: please replace RFC XXXX with the number of this | ||||
RFC. | ||||
7. References | 6. References | |||
7.1. Normative References | 6.1. Normative References | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
skipping to change at page 14, line 43 ¶ | skipping to change at line 669 ¶ | |||
[RFC7741] Westin, P., Lundin, H., Glover, M., Uberti, J., and F. | [RFC7741] Westin, P., Lundin, H., Glover, M., Uberti, J., and F. | |||
Galligan, "RTP Payload Format for VP8 Video", RFC 7741, | Galligan, "RTP Payload Format for VP8 Video", RFC 7741, | |||
DOI 10.17487/RFC7741, March 2016, | DOI 10.17487/RFC7741, March 2016, | |||
<https://www.rfc-editor.org/info/rfc7741>. | <https://www.rfc-editor.org/info/rfc7741>. | |||
[RFC7798] Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M. | [RFC7798] Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M. | |||
M. Hannuksela, "RTP Payload Format for High Efficiency | M. Hannuksela, "RTP Payload Format for High Efficiency | |||
Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798, | Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798, | |||
March 2016, <https://www.rfc-editor.org/info/rfc7798>. | March 2016, <https://www.rfc-editor.org/info/rfc7798>. | |||
7.2. Informative References | 6.2. Informative References | |||
[RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and | [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and | |||
B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms | B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms | |||
for Real-Time Transport Protocol (RTP) Sources", RFC 7656, | for Real-Time Transport Protocol (RTP) Sources", RFC 7656, | |||
DOI 10.17487/RFC7656, November 2015, | DOI 10.17487/RFC7656, November 2015, | |||
<https://www.rfc-editor.org/info/rfc7656>. | <https://www.rfc-editor.org/info/rfc7656>. | |||
[RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, | [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, | |||
DOI 10.17487/RFC7667, November 2015, | DOI 10.17487/RFC7667, November 2015, | |||
<https://www.rfc-editor.org/info/rfc7667>. | <https://www.rfc-editor.org/info/rfc7667>. | |||
skipping to change at page 15, line 40 ¶ | skipping to change at line 712 ¶ | |||
[RFC8871] Jones, P., Benham, D., and C. Groves, "A Solution | [RFC8871] Jones, P., Benham, D., and C. Groves, "A Solution | |||
Framework for Private Media in Privacy-Enhanced RTP | Framework for Private Media in Privacy-Enhanced RTP | |||
Conferencing (PERC)", RFC 8871, DOI 10.17487/RFC8871, | Conferencing (PERC)", RFC 8871, DOI 10.17487/RFC8871, | |||
January 2021, <https://www.rfc-editor.org/info/rfc8871>. | January 2021, <https://www.rfc-editor.org/info/rfc8871>. | |||
[RFC9335] Uberti, J., Jennings, C., and S. Murillo, "Completely | [RFC9335] Uberti, J., Jennings, C., and S. Murillo, "Completely | |||
Encrypting RTP Header Extensions and Contributing | Encrypting RTP Header Extensions and Contributing | |||
Sources", RFC 9335, DOI 10.17487/RFC9335, January 2023, | Sources", RFC 9335, DOI 10.17487/RFC9335, January 2023, | |||
<https://www.rfc-editor.org/info/rfc9335>. | <https://www.rfc-editor.org/info/rfc9335>. | |||
[I-D.ietf-avtext-lrr] | [RFC9627] Lennox, J., Hong, D., Uberti, J., Holmer, S., and M. | |||
Lennox, J., Hong, D., Uberti, J., Holmer, S., and M. | ||||
Flodman, "The Layer Refresh Request (LRR) RTCP Feedback | Flodman, "The Layer Refresh Request (LRR) RTCP Feedback | |||
Message", Work in Progress, Internet-Draft, draft-ietf- | Message", RFC 9627, DOI 10.17487/RFC9627, August 2024, | |||
avtext-lrr-07, 2 July 2017, | <https://www.rfc-editor.org/info/rfc9627>. | |||
<https://datatracker.ietf.org/doc/html/draft-ietf-avtext- | ||||
lrr-07>. | ||||
[I-D.ietf-payload-vp9] | [RFC9628] Uberti, J., Holmer, S., Flodman, M., Hong, D., and J. | |||
Uberti, J., Holmer, S., Flodman, M., Hong, D., and J. | Lennox, "RTP Payload Format for VP9 Video", RFC 9628, | |||
Lennox, "RTP Payload Format for VP9 Video", Work in | DOI 10.17487/RFC9628, August 2024, | |||
Progress, Internet-Draft, draft-ietf-payload-vp9-16, 10 | <https://www.rfc-editor.org/info/rfc9628>. | |||
June 2021, <https://datatracker.ietf.org/doc/html/draft- | ||||
ietf-payload-vp9-16>. | Acknowledgements | |||
Many thanks to Bernard Aboba, Jonathan Lennox, Stephan Wenger, Dale | ||||
Worley, and Magnus Westerlund for their inputs. | ||||
Authors' Addresses | Authors' Addresses | |||
Mo Zanaty | Mo Zanaty | |||
Cisco Systems | Cisco Systems | |||
170 West Tasman Drive | 170 West Tasman Drive | |||
San Jose, CA 95134 | San Jose, CA 95134 | |||
United States of America | United States of America | |||
Email: mzanaty@cisco.com | Email: mzanaty@cisco.com | |||