rfc9639.original.xml | rfc9639.xml | |||
---|---|---|---|---|
<?xml version="1.0" encoding="utf-8"?> | <?xml version="1.0" encoding="utf-8"?> | |||
<!-- name="GENERATOR" content="github.com/mmarkdown/mmark Mmark Markdown Process | ||||
or - mmark.miek.nl" --> | <!DOCTYPE rfc [ | |||
<rfc version="3" ipr="trust200902" docName="draft-ietf-cellar-flac-14" submissio | <!ENTITY nbsp "&#160;"> | |||
nType="IETF" category="std" xml:lang="en" xmlns:xi="http://www.w3.org/2001/XIncl | <!ENTITY zwsp "&#8203;"> | |||
ude" indexInclude="true"> | <!ENTITY nbhy "&#8209;"> | |||
<!ENTITY wj "&#8288;"> | ||||
]> | ||||
<rfc version="3" ipr="trust200902" docName="draft-ietf-cellar-flac-14" number="9 | ||||
639" submissionType="IETF" category="std" xml:lang="en" xmlns:xi="http://www.w3. | ||||
org/2001/XInclude" tocInclude="true" consensus="true" updates="" obsoletes="" so | ||||
rtRefs="true" symRefs="true" > | ||||
<front> | <front> | |||
<title abbrev="FLAC">Free Lossless Audio Codec</title><seriesInfo value="draft-i | <title abbrev="FLAC">Free Lossless Audio Codec (FLAC)</title> | |||
etf-cellar-flac-14" stream="IETF" status="standard" name="Internet-Draft"></seri | <seriesInfo name="RFC" value="9639"/> | |||
esInfo> | <author initials="M.Q.C." surname="van Beurden" fullname="Martijn van Beurden" | |||
<author initials="M.Q.C." surname="van Beurden" fullname="Martijn van Beurden">< | > | |||
organization></organization><address><postal><street></street> | <address> | |||
<country>NL</country> | <postal> | |||
</postal><email>mvanb1@gmail.com</email> | <country>Netherlands</country> | |||
</address></author><author initials="A." surname="Weaver" fullname="Andrew Weave | </postal> | |||
r"><organization></organization><address><postal><street></street> | <email>mvanb1@gmail.com</email> | |||
</postal><email>theandrewjw@gmail.com</email> | </address> | |||
</address></author><date/> | </author> | |||
<area>art</area> | <author initials="A" surname="Weaver" fullname="Andrew Weaver"> | |||
<workgroup>cellar</workgroup> | <address> | |||
<keyword>free,lossless,audio,codec,encoder,decoder,compression,compressor,archiv | <email>theandrewjw@gmail.com</email> | |||
al,archive,archiving,backup,music</keyword> | </address> | |||
</author> | ||||
<date year="2024" month="November"/> | ||||
<area>art</area> | ||||
<workgroup>cellar</workgroup> | ||||
<keyword>free</keyword> | ||||
<keyword>lossless</keyword> | ||||
<keyword>audio</keyword> | ||||
<keyword>codec</keyword> | ||||
<keyword>encoder</keyword> | ||||
<keyword>decoder</keyword> | ||||
<keyword>compression</keyword> | ||||
<keyword>compressor</keyword> | ||||
<keyword>archival</keyword> | ||||
<keyword>archive</keyword> | ||||
<keyword>archiving</keyword> | ||||
<keyword>backup</keyword> | ||||
<keyword>music</keyword> | ||||
<abstract> | <abstract> | |||
<t>This document defines the Free Lossless Audio Codec (FLAC) format and its str | <t>This document defines the Free Lossless Audio Codec (FLAC) format and its str | |||
eamable subset. FLAC is designed to reduce the amount of computer storage space | eamable subset. FLAC is designed to reduce the amount of computer storage | |||
needed to store digital audio signals without losing information in doing so (i. | space needed to store digital audio signals. It does this losslessly, | |||
e., lossless). FLAC is free in the sense that its specification is open and its | i.e., it does so without losing information. FLAC is free in the sense that i | |||
reference implementation is open-source. Compared to other lossless (audio) codi | ts specification is open and its reference implementation is open source. | |||
ng formats, FLAC is a format with low complexity and can be coded to and from wi | Compared to other lossless audio coding formats, FLAC is a format with low | |||
th little computing resources. Decoding of FLAC has seen many independent implem | complexity and can be encoded and decoded with little computing | |||
entations on many different platforms, and both encoding and decoding can be imp | resources. Decoding of FLAC has been implemented independently | |||
lemented without needing floating-point arithmetic.</t> | for many different platforms, and both encoding and decoding can | |||
be implemented without needing floating-point arithmetic. </t> | ||||
</abstract> | </abstract> | |||
</front> | </front> | |||
<middle> | <middle> | |||
<section anchor="introduction"><name>Introduction</name> | <section anchor="introduction"><name>Introduction</name> | |||
<t>This document defines the FLAC format and its streamable subset. FLAC files a | <t>This document defines the Free Lossless Audio Codec (FLAC) format and its str | |||
nd streams can code for pulse-code modulated (PCM) audio with 1 to 8 channels, s | eamable subset. FLAC files and streams can code for pulse-code modulated (PCM) a | |||
ample rates from 1 up to 1048575 hertz and bit depths from 4 up to 32 bits. Most | udio with 1 to 8 channels, sample rates from 1 to 1048575 hertz, and bit depths | |||
tools for coding to and decoding from the FLAC format have been optimized for C | from 4 to 32 bits. Most tools for coding to and decoding from the FLAC format ha | |||
D-audio, which is PCM audio with 2 channels, a sample rate of 44.1 kHz, and a bi | ve been optimized for CD-audio, which is PCM audio with 2 channels, a sample rat | |||
t depth of 16 bits.</t> | e of 44.1 kHz, and a bit depth of 16 bits.</t> | |||
<t>FLAC is able to achieve lossless compression because samples in audio signals | <t>FLAC is able to achieve lossless compression because samples in audio signals | |||
tend to be highly correlated with their close neighbors. In contrast with gener | tend to be highly correlated with their close neighbors. In contrast with gener | |||
al-purpose compressors, which often use dictionaries, do run-length coding, or e | al-purpose compressors, which often use dictionaries, do run-length coding, or e | |||
xploit long-term repetition, FLAC removes redundancy solely in the very short te | xploit long-term repetition, FLAC removes redundancy solely in the very short te | |||
rm, looking back at at most 32 samples.</t> | rm, looking back at 32 samples at most.</t> | |||
<t>The coding methods provided by the FLAC format work best on PCM audio signals | ||||
, of which the samples have a signed representation and are centered around zero | <t> The coding methods provided by the FLAC format work best on PCM audio | |||
. Audio signals in which samples have an unsigned representation must be transfo | signals with samples that have a signed representation and are centered around | |||
rmed to a signed representation as described in this document in order to achiev | zero. Audio signals in which samples have an unsigned representation must be | |||
e reasonable compression. The FLAC format is not suited for compressing audio th | transformed to a signed representation as described in this document in order | |||
at is not PCM.</t> | to achieve reasonable compression. The FLAC format is not suited for | |||
compressing audio that is not PCM.</t> | ||||
</section> | </section> | |||
<section anchor="notation-and-conventions"><name>Notation and Conventions</name> | <section anchor="notation-and-conventions"><name>Notation and Conventions</name> | |||
<t>The key words "MUST", "MUST NOT", "REQUIRED", & | <t> | |||
quot;SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT&qu | The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", | |||
ot;, "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL NOT</bcp14> | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | ", | |||
14 <xref target="RFC2119"></xref> <xref target="RFC8174"></xref> when, and only | "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", | |||
when, they appear in all capitals, as shown here.</t> | "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>", | |||
<t>Values expressed as <tt>u(n)</tt> represent unsigned big-endian integer using | "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to | |||
<tt>n</tt> bits. Values expressed as <tt>s(n)</tt> represent signed big-endian | be | |||
integer using <tt>n</tt> bits, signed two's complement. Where necessary <tt>n</t | interpreted as described in BCP 14 <xref target="RFC2119"/> <xref | |||
t> is expressed as an equation using <tt>*</tt> (multiplication), <tt>/</tt> (di | target="RFC8174"/> when, and only when, they appear in all capitals, as | |||
vision), <tt>+</tt> (addition), or <tt>-</tt> (subtraction). An inclusive range | shown here. | |||
of the number of bits expressed is represented with an ellipsis, such as <tt>u(m | </t> | |||
...n)</tt>.</t> | <t>Values expressed as <tt>u(n)</tt> represent an unsigned big-endian integer us | |||
ing <tt>n</tt> bits. Values expressed as <tt>s(n)</tt> represent a signed big-en | ||||
dian integer using <tt>n</tt> bits, signed two's complement. Where necessary, <t | ||||
t>n</tt> is expressed as an equation using <tt>*</tt> (multiplication), <tt>/</t | ||||
t> (division), <tt>+</tt> (addition), or <tt>-</tt> (subtraction). An inclusive | ||||
range of the number of bits expressed is represented with an ellipsis, such as < | ||||
tt>u(m...n)</tt>.</t> | ||||
<t>All shifts mentioned in this document are arithmetic shifts.</t> | ||||
<t>While the FLAC format can store digital audio as well as other digital signals, this document uses terminology specific to digital audio. The use of more generic terminology was deemed less clear, so a reader interested in non-audio use of the FLAC format is expected to make the translation from audio-specific terms to more generic terminology.</t> | <t>While the FLAC format can store digital audio as well as other digital signals, this document uses terminology specific to digital audio. The use of more generic terminology was deemed less clear, so a reader interested in non-audio use of the FLAC format is expected to make the translation from audio-specific terms to more generic terminology.</t> | |||
</section> | </section> | |||
<section anchor="definitions"><name>Definitions</name> | <section anchor="definitions"><name>Definitions</name> | |||
<ul> | <dl> | |||
<li><t><strong>Lossless compression</strong>: reducing the amount of computer st | <dt><strong>Lossless compression</strong>:</dt><dd>Reducing the amount of comput | |||
orage space needed to store data without needing to remove or irreversibly alter | er storage space needed to store data without needing to remove or irreversibly | |||
any of this data in doing so. In other words, decompressing losslessly compress | alter any of this data in doing so. In other words, decompressing losslessly com | |||
ed information returns exactly the original data.</t> | pressed information returns exactly the original data.</dd> | |||
</li> | <dt><strong>Lossy compression</strong>:</dt><dd>Like lossless compression, but | |||
<li><t><strong>Lossy compression</strong>: like lossless compression, but instea | instead removing, irreversibly altering, or only approximating information for | |||
d removing, irreversibly altering, or only approximating information for the pur | the purpose of further reducing the amount of computer storage space | |||
pose of further reducing the amount of computer storage space needed. In other w | needed. In other words, decompressing lossy compressed information returns an | |||
ords, decompressing lossy compressed information returns an approximation of the | approximation of the original data.</dd> | |||
original data.</t> | <dt><strong>Block</strong>:</dt><dd>A (short) section of linear PCM audio with o | |||
</li> | ne or more channels.</dd> | |||
<li><t><strong>Block</strong>: A (short) section of linear pulse-code modulated | <dt><strong>Subblock</strong>:</dt><dd>All samples within a corresponding block | |||
audio with one or more channels.</t> | for one channel. One or more subblocks form a block, and all subblocks in a cert | |||
</li> | ain block contain the same number of samples.</dd> | |||
<li><t><strong>Subblock</strong>: All samples within a corresponding block for o | <dt><strong>Frame</strong>:</dt><dd>A frame header, one or more subframes, and a | |||
ne channel. One or more subblocks form a block, and all subblocks in a certain b | frame footer. It encodes the contents of a corresponding block.</dd> | |||
lock contain the same number of samples.</t> | <dt><strong>Subframe</strong>:</dt><dd>An encoded subblock. All subframes within | |||
</li> | a frame code for the same number of samples. When interchannel decorrelation is | |||
<li><t><strong>Frame</strong>: A frame header, one or more subframes, and a fram | used, a subframe can correspond to either the (per-sample) average of two subbl | |||
e footer. It encodes the contents of a corresponding block.</t> | ocks or the (per-sample) difference between two subblocks, instead of to a subbl | |||
</li> | ock directly; see <xref target="interchannel-decorrelation"></xref>.</dd> | |||
<li><t><strong>Subframe</strong>: An encoded subblock. All subframes within a fr | <dt><strong>Interchannel samples</strong>:</dt><dd>A sample count that applies t | |||
ame code for the same number of samples. When interchannel decorrelation is used | o all channels. For example, one second of 44.1 kHz audio has 44100 interchannel | |||
, a subframe can correspond to either the (per-sample) average of two subblocks | samples, meaning each channel has that number of samples.</dd> | |||
or the (per-sample) difference between two subblocks, instead of to a subblock d | <dt><strong>Block size</strong>:</dt><dd>The number of interchannel samples cont | |||
irectly, see <xref target="interchannel-decorrelation"></xref>.</t> | ained in a block or coded in a frame.</dd> | |||
</li> | <dt><strong>Bit depth</strong> or <strong>bits per sample</strong>:</dt><dd>The | |||
<li><t><strong>Interchannel samples</strong>: A sample count that applies to all | number of bits used to contain each sample. This <bcp14>MUST</bcp14> be the same | |||
channels. For example, one second of 44.1 kHz audio has 44100 interchannel samp | for all subblocks in a block but <bcp14>MAY</bcp14> be different for different | |||
les, meaning each channel has that number of samples.</t> | subframes in a frame because of interchannel decorrelation. (See <xref target="i | |||
</li> | nterchannel-decorrelation"></xref> for details on interchannel decorrelation.)</ | |||
<li><t><strong>Block size</strong>: The number of interchannel samples contained | dd> | |||
in a block or coded in a frame.</t> | <dt><strong>Predictor</strong>:</dt><dd>A model used to predict samples in an au | |||
</li> | dio signal based on past samples. FLAC uses such predictors to remove redundancy | |||
<li><t><strong>Bit depth</strong> or <strong>bits per sample</strong>: the numbe | in a signal in order to be able to compress it.</dd> | |||
r of bits used to contain each sample. This MUST be the same for all subblocks i | <dt><strong>Linear predictor</strong>:</dt><dd> A predictor using linear predict | |||
n a block but MAY be different for different subframes in a frame because of int | ion (see <xref target="LinearPrediction"></xref>). This is also called <strong>l | |||
erchannel decorrelation. (See <xref target="interchannel-decorrelation"></xref> | inear predictive coding (LPC)</strong>. With a linear predictor, each prediction | |||
for details on interchannel decorrelation)</t> | is a linear combination of past samples (hence the name). A linear predictor ha | |||
</li> | s a causal discrete-time finite impulse response (see <xref target="FIR"></xref> | |||
<li><t><strong>Predictor</strong>: a model used to predict samples in an audio s | ).</dd> | |||
ignal based on past samples. FLAC uses such predictors to remove redundancy in a | <dt><strong>Fixed predictor</strong>:</dt><dd>A linear predictor in which the mo | |||
signal in order to be able to compress it.</t> | del parameters are the same across all FLAC files and thus do not need to be sto | |||
</li> | red.</dd> | |||
<li><t><strong>Linear predictor</strong>: a predictor using linear prediction (s | <dt><strong>Predictor order</strong>:</dt><dd>The number of past samples that a | |||
ee <xref target="LinearPrediction"></xref>). This is also called <strong>linear | predictor uses. For example, a 4th order predictor uses the 4 samples directly p | |||
predictive coding (LPC)</strong>. With a linear predictor, each prediction is a | receding a certain sample to predict it. In FLAC, samples used in a predictor ar | |||
linear combination of past samples, hence the name. A linear predictor has a cau | e always consecutive and are always the samples directly before the sample that | |||
sal discrete-time finite impulse response (see <xref target="FIR"></xref>).</t> | is being predicted.</dd> | |||
</li> | <dt><strong>Residual</strong>:</dt><dd>The audio signal that remains after a | |||
<li><t><strong>Muxing</strong>: short for multiplexing, combining several stream | predictor has been subtracted from a subblock. If the predictor has been able | |||
s or files into a single stream or file. In the context of this document, muxing | to remove redundancy from the signal, the samples of the remaining signal (the | |||
more specifically refers to embedding a FLAC stream in a container as described | <strong>residual samples</strong>) will have, on average, a numerical value | |||
in <xref target="container-mappings"></xref>.</t> | closer to zero than the original signal.</dd> <dt><strong>Rice | |||
</li> | code</strong>:</dt><dd>A variable-length code (see <xref | |||
<li><t><strong>Fixed predictor</strong>: a linear predictor in which the model p | target="VarLengthCode"></xref>). It uses a short code for samples close to | |||
arameters are the same across all FLAC files, and thus do not need to be stored. | zero and a progressively longer code for samples further away from zero. This | |||
</t> | makes use of the observation that residual samples are often close to zero. | |||
</li> | </dd> | |||
<li><t><strong>Predictor order</strong>: the number of past samples that a predi | <dt><strong>Muxing</strong>:</dt><dd>Short for multiplexing. Combining several s | |||
ctor uses. For example, a 4th order predictor uses the 4 samples directly preced | treams or files into a single stream or file. In | |||
ing a certain sample to predict it. In FLAC, samples used in a predictor are alw | the context of this document, muxing specifically refers to embedding a FLAC str | |||
ays consecutive, and are always the samples directly before the sample that is b | eam in a container as described in <xref | |||
eing predicted.</t> | target="container-mappings"></xref>.</dd> | |||
</li> | </dl> </section> <section anchor="conceptual-overview"><name>Conceptual | |||
<li><t><strong>Residual</strong>: The audio signal that remains after a predicto | Overview</name> | |||
r has been subtracted from a subblock. If the predictor has been able to remove | <t>Similar to many other audio coders, a FLAC file is encoded following the step | |||
redundancy from the signal, the samples of the remaining signal (the <strong>res | s below. To decode a FLAC file, these steps are performed in reverse order, i.e. | |||
idual samples</strong>) will have, on average, a smaller numerical value than th | , from bottom to top.</t> | |||
e original signal.</t> | <ol> | |||
</li> | ||||
<li><t><strong>Rice code</strong>: A variable-length code (see <xref target="Var | ||||
LengthCode"></xref>) that compresses data by making use of the observation that, | ||||
after using an effective predictor, most residual samples are closer to zero th | ||||
an the original samples, while still allowing for a small part of the samples to | ||||
be much larger.</t> | ||||
</li> | ||||
</ul> | ||||
</section> | ||||
<section anchor="conceptual-overview"><name>Conceptual overview</name> | ||||
<t>Similar to many other audio coders, a FLAC file is encoded following the step | ||||
s below. On decoding a FLAC file, these steps are undone in reverse order, i.e., | ||||
from bottom to top.</t> | ||||
<ul> | ||||
<li><t><strong>Blocking</strong> (see <xref target="blocking"></xref>). The input is split up into many contiguous blocks.</t> | <li><t><strong>Blocking</strong> (see <xref target="blocking"></xref>). The input is split up into many contiguous blocks.</t> | |||
</li> | </li> | |||
<li><t><strong>Interchannel Decorrelation</strong> (see <xref target="interchannel-decorrelation"></xref>). In the case of stereo streams, the FLAC format allows for transforming the left-right signal into a mid-side signal, a left-side signal or a side-right signal to remove redundancy between channels. Choosing between any of these transformations is done independently for each block.</t> | <li><t><strong>Interchannel Decorrelation</strong> (see <xref target="interchannel-decorrelation"></xref>). In the case of stereo streams, the FLAC format allows for transforming the left-right signal into a mid-side signal, a left-side signal, or a side-right signal to remove redundancy between channels. Choosing between any of these transformations is done independently for each block.</t> | |||
</li> | </li> | |||
<li><t><strong>Prediction</strong> (see <xref target="prediction"></xref>). To remove redundancy in a signal, a predictor is stored for each subblock or its transformation as formed in the previous step. A predictor consists of a simple mathematical description that can be used, as the name implies, to predict a certain sample from the samples that preceded it. As this prediction is rarely exact, the error of this prediction is passed on to the next stage. The predictor of each subblock is completely independent from other subblocks. Since the methods of prediction are known to both the encoder and decoder, only the parameters of the predictor need to be included in the compressed stream. If no usable predictor can be found for a certain subblock, the signal is stored uncompressed and the next stage is skipped.</t> | <li><t><strong>Prediction</strong> (see <xref target="prediction"></xref>). To remove redundancy in a signal, a predictor is stored for each subblock or its transformation as formed in the previous step. A predictor consists of a simple mathematical description that can be used, as the name implies, to predict a certain sample from the samples that preceded it. As this prediction is rarely exact, the error of this prediction is passed on to the next stage. The predictor of each subblock is completely independent from other subblocks. Since the methods of prediction are known to both the encoder and decoder, only the parameters of the predictor need to be included in the compressed stream. If no usable predictor can be found for a certain subblock, the signal is stored uncompressed, and the next stage is skipped.</t> | |||
</li> | </li> | |||
<li><t><strong>Residual Coding</strong> (see <xref target="residual-coding"></xref>). As the predictor does not describe the signal exactly, the difference between the original signal and the predicted signal (called the error or residual signal) is coded losslessly. If the predictor is effective, the residual signal will require fewer bits per sample than the original signal. FLAC uses Rice coding, a subset of Golomb coding, with either 4-bit or 5-bit parameters to code the residual signal.</t> | <li><t><strong>Residual Coding</strong> (see <xref target="residual-coding"></xref>). As the predictor does not describe the signal exactly, the difference between the original signal and the predicted signal (called the error or residual signal) is coded losslessly. If the predictor is effective, the residual signal will require fewer bits per sample than the original signal. FLAC uses Rice coding, a subset of Golomb coding, with either 4-bit or 5-bit parameters to code the residual signal.</t> | |||
</li> | </li> | |||
</ul> | </ol> | |||
<t>In addition, FLAC specifies a metadata system (see <xref target="file-level-m | <t>In addition, FLAC specifies a metadata system (see <xref target="file-level-m | |||
etadata"></xref>), which allows arbitrary information about the stream to be inc | etadata"></xref>) that allows arbitrary information about the stream to be inclu | |||
luded at the beginning of the stream.</t> | ded at the beginning of the stream.</t> | |||
<section anchor="blocking"><name>Blocking</name> | <section anchor="blocking"><name>Blocking</name> | |||
<t>The block size used for audio data has a direct effect on the compression rat | <t>The block size used for audio data has a direct effect on the compression rat | |||
io. If the block size is too small, the resulting large number of frames means t | io. If the block size is too small, the resulting large number of frames means t | |||
hat a disproportionate amount of bytes will be spent on frame headers. If the bl | hat a disproportionate number of bytes will be spent on frame headers. If the bl | |||
ock size is too large, the characteristics of the signal may vary so much that t | ock size is too large, the characteristics of the signal may vary so much that t | |||
he encoder will be unable to find a good predictor. In order to simplify encoder | he encoder will be unable to find a good predictor. In order to simplify encoder | |||
/decoder design, FLAC imposes a minimum block size of 16 samples, except for the | /decoder design, FLAC imposes a minimum block size of 16 samples, except for the | |||
last block, and a maximum block size of 65535 samples. The last block is allowe | last block, and a maximum block size of 65535 samples. The last block is allowe | |||
d to be smaller than 16 samples to be able to match the length of the encoded au | d to be smaller than 16 samples to be able to match the length of the encoded au | |||
dio without using padding.</t> | dio without using padding.</t> | |||
<t>While the block size does not have to be constant in a FLAC file, it is often | <t>While the block size does not have to be constant in a FLAC file, it is often | |||
difficult to find the optimal arrangement of block sizes for maximum compressio | difficult to find the optimal arrangement of block sizes for maximum compressio | |||
n. Because of this, the FLAC format explicitly stores whether a file has a const | n. Because of this, a FLAC stream has explicitly either a constant or variable | |||
ant or a variable block size throughout the stream, and stores a block number in | block size throughout and stores a block number instead of a sample number | |||
stead of a sample number to slightly improve compression if a stream has a const | to slightly improve compression if a stream has a constant block size.</t> | |||
ant block size.</t> | ||||
</section> | </section> | |||
<section anchor="interchannel-decorrelation"><name>Interchannel Decorrelation</name> | <section anchor="interchannel-decorrelation"><name>Interchannel Decorrelation</name> | |||
<t>In many audio files, channels are correlated. The FLAC format can exploit thi | <t>Channels are correlated in many audio files. The FLAC format can exploit this | |||
s correlation in stereo files by not directly coding subblocks into subframes, b | correlation in stereo files by coding an average of all samples in both | |||
ut instead coding an average of all samples in both subblocks (a mid channel) or | subblocks (a mid channel) or the difference between all samples in both subblock | |||
the difference between all samples in both subblocks (a side channel). The foll | s (a side channel) instead of directly coding subblocks into subframes. The foll | |||
owing combinations are possible:</t> | owing combinations are possible:</t> | |||
<ul> | <ul> | |||
<li><t><strong>Independent</strong>. All channels are coded independently. All non-stereo files MUST be encoded this way.</t> | <li><t><strong>Independent</strong>. All channels are coded independently. All non-stereo files <bcp14>MUST</bcp14> be encoded this way.</t> | |||
</li> | </li> | |||
<li><t><strong>Mid-side</strong>. A left and right subblock are converted to mid and side subframes. To calculate a sample for a mid subframe, the corresponding left and right samples are summed and the result is shifted right by 1 bit. To calculate a sample for a side subframe, the corresponding right sample is subtracted from the corresponding left sample. On decoding, all mid channel samples have to be shifted left by 1 bit. Also, if a side channel sample is odd, 1 has to be added to the corresponding mid channel sample after it has been shifted left by one bit. To reconstruct the left channel, the corresponding samples in the mid and side subframes are added and the result shifted right by 1 bit, while for the right channel the side channel has to be subtracted from the mid channel and the result shifted right by 1 bit.</t> | <li><t><strong>Mid-side</strong>. A left and right subblock are converted to mid and side subframes. To calculate a sample for a mid subframe, the corresponding left and right samples are summed, and the result is shifted right by 1 bit. To calculate a sample for a side subframe, the corresponding right sample is subtracted from the corresponding left sample. On decoding, all mid channel samples have to be shifted left by 1 bit. Also, if a side channel sample is odd, 1 has to be added to the corresponding mid channel sample after it has been shifted left by 1 bit. To reconstruct the left channel, the corresponding samples in the mid and side subframes are added and the result shifted right by 1 bit. For the right channel, the side channel has to be subtracted from the mid channel and the result shifted right by 1 bit.</t> | |||
</li> | </li> | |||
<li><t><strong>Left-side</strong>. The left subblock is coded and the left and right subblocks are used to code a side subframe. The side subframe is constructed in the same way as for mid-side. To decode, the right subblock is restored by subtracting the samples in the side subframe from the corresponding samples in the the left subframe.</t> | <li><t><strong>Left-side</strong>. The left subblock is coded, and the left and right subblocks are used to code a side subframe. The side subframe is constructed in the same way as for mid-side. To decode, the right subblock is restored by subtracting the samples in the side subframe from the corresponding samples in the left subframe.</t> | |||
</li> | </li> | |||
<li><t><strong>Side-right</strong>. The left and right subblocks are used to code a side subframe and the right subblock is coded. The side subframe is constructed in the same way as for mid-side. To decode, the left subblock is restored by adding the samples in the side subframe to the corresponding samples in the right subframe.</t> | <li><t><strong>Side-right</strong>. The left and right subblocks are used to code a side subframe, and the right subblock is coded. The side subframe is constructed in the same way as for mid-side. To decode, the left subblock is restored by adding the samples in the side subframe to the corresponding samples in the right subframe.</t> | |||
</li> | </li> | |||
</ul> | </ul> | |||
<t>The side channel needs one extra bit of bit depth as the subtraction can produce sample values twice as large as the maximum possible in any given bit depth. The mid channel in mid-side stereo does not need one extra bit, as it is shifted right one bit. The right shift of the mid channel does not lead to lossy behavior, because an odd sample in the mid subframe must always be accompanied by a corresponding odd sample in the side subframe, which means the lost least-significant bit can be restored by taking it from the sample in the side subframe.</t> | <t>The side channel needs one extra bit of bit depth, as the subtraction can produce sample values twice as large as the maximum possible in any given bit depth. The mid channel in mid-side stereo does not need one extra bit, as it is shifted right 1 bit. The right shift of the mid channel does not lead to lossy behavior because an odd sample in the mid subframe must always be accompanied by a corresponding odd sample in the side subframe, which means the lost least-significant bit can be restored by taking it from the sample in the side subframe.</t> | |||
</section> | </section> | |||
<section anchor="prediction"><name>Prediction</name> | <section anchor="prediction"><name>Prediction</name> | |||
<t>The FLAC format has four methods for modeling the input signal:</t> | <t>The FLAC format has four methods for modeling the input signal:</t> | |||
<ol> | <ol> | |||
<li><t><strong>Verbatim</strong>. Samples are stored directly, without any modeling. This method is used for inputs with little correlation, like white noise. Since the raw signal is not actually passed through the residual coding stage (it is added to the stream 'verbatim'), this method is different from using a zero-order fixed predictor.</t> | <li><t><strong>Verbatim</strong>. Samples are stored directly, without any modeling. This method is used for inputs with little correlation. Since the raw signal is not actually passed through the residual coding stage (it is added to the stream "verbatim"), this method is different from using a zero-order fixed predictor.</t> | |||
</li> | </li> | |||
<li><t><strong>Constant</strong>. A single sample value is stored. This method is used whenever a signal is pure DC (&quot;digital silence&quot;), i.e., a constant value throughout.</t> | <li><t><strong>Constant</strong>. A single sample value is stored. This method is used whenever a signal is pure DC ("digital silence"), i.e., a constant value throughout.</t> | |||
</li> | </li> | |||
<li><t><strong>Fixed predictor</strong>. Samples are predicted with one of five fixed (i.e., predefined) predictors, and the error of this prediction is processed by the residual coder. These fixed predictors are well suited for predicting simple waveforms. Since the predictors are fixed, no predictor coefficients are stored. From a mathematical point of view, the predictors work by extrapolating the signal from the previous samples. The number of previous samples used is equal to the predictor order. For more information, see <xref target="fixed-predictor-subframe"></xref>.</t> | <li><t><strong>Fixed predictor</strong>. Samples are predicted with one of five fixed (i.e., predefined) predictors, and the error of this prediction is processed by the residual coder. These fixed predictors are well suited for predicting simple waveforms. Since the predictors are fixed, no predictor coefficients are stored. From a mathematical point of view, the predictors work by extrapolating the signal from the previous samples. The number of previous samples used is equal to the predictor order. For more information, see <xref target="fixed-predictor-subframe"></xref>.</t> | |||
</li> | </li> | |||
<li><t><strong>Linear predictor</strong>. Samples are predicted using past samples and a set of predictor coefficients, and the error of this prediction is processed by the residual coder. Compared to a fixed predictor, using a generic linear predictor adds overhead as predictor coefficients need to be stored. Therefore, this method of prediction is best suited for predicting more complex waveforms, where the added overhead is offset by space savings in the residual coding stage resulting from more accurate prediction. A linear predictor in FLAC has two parameters besides the predictor coefficients and the predictor order: the number of bits with which each coefficient is stored (the coefficient precision) and a prediction right shift. A prediction is formed by taking the sum of multiplying each predictor coefficient with the corresponding past sample, and dividing that sum by applying the specified right shift. For more information, see <xref target="linear-predictor-subframe"></xref>.</t> | <li><t><strong>Linear predictor</strong>. Samples are predicted using past samples and a set of predictor coefficients, and the error of this prediction is processed by the residual coder. Compared to a fixed predictor, using a generic linear predictor adds overhead as predictor coefficients need to be stored. Therefore, this method of prediction is best suited for predicting more complex waveforms, where the added overhead is offset by space savings in the residual coding stage resulting from more accurate prediction. A linear predictor in FLAC has two parameters besides the predictor coefficients and the predictor order: the number of bits with which each coefficient is stored (the coefficient precision) and a prediction right shift. A prediction is formed by taking the sum of multiplying each predictor coefficient with the corresponding past sample and dividing that sum by applying the specified right shift. For more information, see <xref target="linear-predictor-subframe"></xref>.</t> | |||
</li> | </li> | |||
</ol> | </ol> | |||
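<t>The linear predictor described in the list above can be sketched as follows. The function names are hypothetical, and two details are assumptions of this sketch rather than statements from this section: the first coefficient is applied to the most recent sample, and the right shift is an arithmetic shift that rounds toward negative infinity.</t>

```python
def lpc_predict(history, coefficients, shift):
    """Predict the next sample from past samples.

    history[-1] is the most recent sample. Assumption: coefficients[0]
    belongs to the most recent sample, coefficients[1] to the one before
    it, and so on.
    """
    total = sum(c * history[-1 - i] for i, c in enumerate(coefficients))
    return total >> shift  # "dividing" by applying the right shift


def lpc_decode(warmup, coefficients, shift, residual):
    """Rebuild samples from unencoded warm-up samples plus a residual."""
    samples = list(warmup)
    for r in residual:
        samples.append(lpc_predict(samples, coefficients, shift) + r)
    return samples
```

<t>For example, the coefficients 4 and -2 with a shift of 1 behave like a predictor that extrapolates a straight line through the two previous samples, so a linear ramp yields an all-zero residual.</t>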
<t>A FLAC encoder is free to select any of the above methods to model the input. However, to ensure lossless coding, the following exceptions apply:</t> | <t>A FLAC encoder is free to select any of the above methods to model the input. However, to ensure lossless coding, the following exceptions apply:</t> | |||
<ul spacing="compact"> | <ul> | |||
<li>When the samples that need to be stored do not all have the same value (i.e., the signal is not constant), a constant subframe cannot be used.</li> | <li>When the samples that need to be stored do not all have the same value (i.e., the signal is not constant), a constant subframe cannot be used.</li> | |||
<li>When an encoder is unable to find a fixed or linear predictor for which all residual samples are representable in 32-bit signed integers as stated in <xref target="coded-residual"></xref>, a verbatim subframe is used.</li> | <li>When an encoder is unable to find a fixed or linear predictor for which all residual samples are representable in 32-bit signed integers as stated in <xref target="coded-residual"></xref>, a verbatim subframe is used.</li> | |||
</ul> | </ul> | |||
<t>For more information on fixed and linear predictors, see <xref target="HPL-1999-144"></xref> and <xref target="robinson-tr156"></xref>.</t> | <t>For more information on fixed and linear predictors, see <xref target="Lossless-Compression"></xref> and <xref target="Robinson-TR156"></xref>.</t> | |||
</section> | </section> | |||
<section anchor="residual-coding"><name>Residual Coding</name> | <section anchor="residual-coding"><name>Residual Coding</name> | |||
<t>If a subframe uses a predictor to approximate the audio signal, a residual is | <t>If a subframe uses a predictor to approximate the audio signal, a residual is | |||
stored to 'correct' the approximation to the exact value. When an effective pre | stored to "correct" the approximation to the exact value. When an effective pre | |||
dictor is used, the average numerical value of the residual samples is smaller t | dictor is used, the average numerical value of the residual samples is smaller t | |||
han that of the samples before prediction. While having smaller values on averag | han that of the samples before prediction. While having smaller values on averag | |||
e, it is possible that a few 'outlier' residual samples are much larger than any | e, it is possible that a few "outlier" residual samples are much larger than any | |||
of the original samples. Sometimes these outliers even exceed the range the bit | of the original samples. | |||
depth of the original audio offers.</t> | Sometimes these outliers even exceed the range that the bit depth of the origina | |||
<t>To be able to efficiently code such a stream of relatively small numbers with | l audio offers.</t> | |||
an occasional outlier, Rice coding (a subset of Golomb coding) is used. Dependi | <t>To efficiently code such a stream of relatively small numbers with an occasio | |||
ng on how small the numbers are that have to be coded, a Rice parameter is chose | nal outlier, Rice coding (a subset of Golomb coding) is used. Depending on how s | |||
n. The numerical value of each residual sample is split into two parts by dividi | mall the numbers are that have to be coded, a Rice parameter is chosen. The nume | |||
ng it by <tt>2^(Rice parameter)</tt>, creating a quotient and a remainder. The q | rical value of each residual sample is split into two parts by dividing it by 2< | |||
uotient is stored in unary form, the remainder in binary form. If indeed most re | sup>(Rice parameter)</sup>, creating a quotient and a remainder. | |||
sidual samples are close to zero and a suitable Rice parameter is chosen, this f | ||||
orm of coding, with a so-called variable-length code, uses fewer bits than the r | The quotient is stored in unary form and the remainder in binary form. If indeed | |||
esidual in unencoded form.</t> | most residual samples are close to zero and a suitable Rice parameter is chosen | |||
, this form of coding, with a so-called variable-length code, uses fewer bits th | ||||
an the residual in unencoded form.</t> | ||||
<t>As Rice codes can only handle unsigned numbers, signed numbers are zigzag encoded to a so-called folded residual. See <xref target="coded-residual"></xref> for a more thorough explanation.</t> | <t>As Rice codes can only handle unsigned numbers, signed numbers are zigzag encoded to a so-called folded residual. See <xref target="coded-residual"></xref> for a more thorough explanation.</t> | |||
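<t>The folding and Rice coding steps described above can be sketched as follows. This is an illustration under stated assumptions, not the normative procedure: the function names are hypothetical, bits are modeled as a string of "0" and "1" characters, the folding is assumed to be the usual zigzag mapping (0, -1, 1, -2 become 0, 1, 2, 3), the unary quotient is assumed to precede the binary remainder, and unary numbers are written as zero bits terminated by a one bit.</t>

```python
def fold(value):
    """Zigzag-fold a signed residual into an unsigned number (assumed mapping)."""
    return 2 * value if value >= 0 else -2 * value - 1


def unfold(folded):
    """Inverse of fold()."""
    return folded >> 1 if folded % 2 == 0 else -((folded + 1) >> 1)


def rice_encode(residuals, k):
    """Encode residuals with Rice parameter k; returns a bit string."""
    bits = []
    for value in residuals:
        folded = fold(value)
        quotient = folded >> k               # folded // 2**k
        remainder = folded & ((1 << k) - 1)  # folded % 2**k
        bits.append("0" * quotient + "1")    # unary part
        if k:
            bits.append(format(remainder, "0{}b".format(k)))  # binary part
    return "".join(bits)


def rice_decode(bits, k, count):
    """Decode `count` residuals from a bit string made by rice_encode()."""
    values, pos = [], 0
    for _ in range(count):
        quotient = 0
        while bits[pos] == "0":
            quotient += 1
            pos += 1
        pos += 1  # skip the terminating one bit
        remainder = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append(unfold((quotient << k) | remainder))
    return values
```

<t>With a Rice parameter of 2, the residual 3 folds to 6, which splits into quotient 1 and remainder 2 and is written as the four bits 0110. A large outlier only lengthens the unary part of its own code.</t>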
<t>Quite often, the optimal Rice parameter varies over the course of a subframe. To accommodate this, the residual can be split up into partitions, where each partition has its own Rice parameter. To keep overhead and complexity low, the number of partitions used in a subframe is limited to powers of two.</t> | <t>Quite often, the optimal Rice parameter varies over the course of a subframe. To accommodate this, the residual can be split up into partitions, where each partition has its own Rice parameter. To keep overhead and complexity low, the number of partitions used in a subframe is limited to powers of two.</t> | |||
<t>The FLAC format uses two forms of Rice coding, which only differ in the number of bits used for encoding the Rice parameter, either 4 or 5 bits.</t> | <t>The FLAC format uses two forms of Rice coding, which only differ in the number of bits used for encoding the Rice parameter, either 4 or 5 bits.</t> | |||
</section> | </section> | |||
</section> | </section> | |||
<section anchor="format-principles"><name>Format principles</name> | <section anchor="format-principles"><name>Format Principles</name> | |||
<t>FLAC has no format version information, but it does contain reserved space in | <t>FLAC has no format version information, but it does contain reserved space in | |||
several places. Future versions of the format MAY use this reserved space safel | several places. Future versions of the format <bcp14>MAY</bcp14> use this reser | |||
y without breaking the format of older streams. Older decoders MAY choose to abo | ved space safely without breaking the format of older streams. Older decoders <b | |||
rt decoding when encountering data encoded using methods they do not recognize. | cp14>MAY</bcp14> choose to abort decoding when encountering data that is encoded | |||
Apart from reserved patterns, the format specifies forbidden patterns in certain | using methods they do not recognize. Apart from reserved patterns, the format s | |||
places, meaning that the patterns MUST NOT appear in any bitstream. They are li | pecifies forbidden patterns in certain places, meaning that the patterns <bcp14> | |||
sted in the following table.</t> | MUST NOT</bcp14> appear in any bitstream. They are listed in the following table | |||
.</t> | ||||
<table anchor="tableforbiddenpatterns"> | <table anchor="tableforbiddenpatterns"> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
<th align="left">Reference</th> | <th align="left">Reference</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
skipping to change at line 169 ¶ | skipping to change at line 222 ¶ | |||
<td align="left">Minimum and maximum block sizes smaller than 16 in streaminfo metadata block</td> | <td align="left">Minimum and maximum block sizes smaller than 16 in streaminfo metadata block</td> | |||
<td align="left"><xref target="streaminfo"></xref></td> | <td align="left"><xref target="streaminfo"></xref></td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">Sample rate bits 0b1111</td> | <td align="left">Sample rate bits 0b1111</td> | |||
<td align="left"><xref target="sample-rate-bits"></xref></td> | <td align="left"><xref target="sample-rate-bits"></xref></td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">Uncommon blocksize 65536</td> | <td align="left">Uncommon block size 65536</td> | |||
<td align="left"><xref target="uncommon-block-size"></xref></td> | <td align="left"><xref target="uncommon-block-size"></xref></td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">Predictor coefficient precision bits 0b1111</td> | <td align="left">Predictor coefficient precision bits 0b1111</td> | |||
<td align="left"><xref target="linear-predictor-subframe"></xref></td> | <td align="left"><xref target="linear-predictor-subframe"></xref></td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">Negative predictor right shift</td> | <td align="left">Negative predictor right shift</td> | |||
<td align="left"><xref target="linear-predictor-subframe"></xref></td> | <td align="left"><xref target="linear-predictor-subframe"></xref></td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>All numbers used in a FLAC bitstream are integers, there are no float | </table><t>All numbers used in a FLAC bitstream are integers; there are no float | |||
ing-point representations. All numbers are big-endian coded, except the field le | ing-point representations. All numbers are big-endian coded, except the field le | |||
ngths used in Vorbis comments (see <xref target="vorbis-comment"></xref>), which | ngths used in Vorbis comments (see <xref target="vorbis-comment"></xref>), which | |||
are little-endian coded. This exception for Vorbis comments is to keep as much | are little-endian coded. This exception for Vorbis comments is to keep as much | |||
commonality as possible with Vorbis comments as used by the Vorbis codec (see <x | commonality as possible with Vorbis comments as used by the Vorbis codec (see <x | |||
ref target="Vorbis"></xref>). All numbers are unsigned except linear predictor c | ref target="Vorbis"></xref>). All numbers are unsigned except linear predictor c | |||
oefficients, the linear prediction shift (see <xref target="linear-predictor-sub | oefficients, the linear prediction shift (see <xref target="linear-predictor-sub | |||
frame"></xref>), and numbers that directly represent samples, which are signed. | frame"></xref>), and numbers that directly represent samples, which are signed. | |||
None of these restrictions apply to application metadata blocks or to Vorbis com | None of these restrictions apply to application metadata blocks or to Vorbis com | |||
ment field contents.</t> | ment field contents.</t> | |||
<t>All samples encoded to and decoded from the FLAC format MUST be in a signed r | <t>All samples encoded to and decoded from the FLAC format <bcp14>MUST</bcp14> b | |||
epresentation.</t> | e in a signed representation.</t> | |||
<t>There are several ways to convert unsigned sample representations to signed s | <t>There are several ways to convert unsigned sample representations to | |||
ample representations, but the coding methods provided by the FLAC format work b | signed sample representations, but the coding methods provided by the | |||
est on audio signals of which the numerical values of the samples are centered a | FLAC format work best on samples that have numerical values that are | |||
round zero, i.e., have no DC offset. In most unsigned audio formats, signals are | centered around zero, i.e., have no DC offset. | |||
centered around halfway the range of the unsigned integer type used. If that is | In most unsigned audio formats, signals are centered around halfway within the r | |||
the case, converting sample representations by first copying the number to a si | ange of the unsigned integer type used. If that is the case, converting sample r | |||
gned integer with sufficient range and then subtracting half of the range of the | epresentations by first copying the number to a signed integer with a sufficient | |||
unsigned integer type, results in a signal with samples centered around 0.</t> | range and then subtracting half of the range of the unsigned integer type resul | |||
<t>Unary coding in a FLAC bitstream is done with zero bits terminated with a one | ts in a signal with samples centered around 0.</t> | |||
bit, e.g., the number 5 is coded unary as 0b000001. This prevents the frame syn | <t>Unary coding in a FLAC bitstream is done with zero bits terminated with a one | |||
c code from appearing in unary coded numbers.</t> | bit, e.g., the number 5 is coded unary as 0b000001. This prevents the frame syn | |||
<t>When a FLAC file contains data that is forbidden or otherwise not valid, deco | c code from appearing in unary-coded numbers.</t> | |||
der behavior is left unspecified. A decoder MAY choose to stop decoding upon enc | <t>When a FLAC file contains data that is forbidden or otherwise not valid, deco | |||
ountering such data. Examples of such data are</t> | der behavior is left unspecified. A decoder <bcp14>MAY</bcp14> choose to stop de | |||
coding upon encountering such data. Examples of such data include the following: | ||||
</t> | ||||
<ul spacing="compact"> | <ul> | |||
<li>One or more decoded sample values exceed the range offered by the bit depth | <li>One or more decoded sample values exceed the range offered by the bit depth | |||
as coded for that frame. E.g., in a frame with a bit depth of 8 bits, any sample | as coded for that frame. For example, in a frame with a bit depth of 8 bits, any | |||
s not in the inclusive range from -128 to 127 are not valid.</li> | samples not in the inclusive range from -128 to 127 are not valid.</li> | |||
<li>The number of wasted bits (see <xref target="wasted-bits-per-sample"></xref>) used by a subframe is such that the bit depth of that subframe (see <xref target="constant-subframe"></xref> for a description of subframe bit depth) equals zero or is negative.</li> | <li>The number of wasted bits (see <xref target="wasted-bits-per-sample"></xref>) used by a subframe is such that the bit depth of that subframe (see <xref target="constant-subframe"></xref> for a description of subframe bit depth) equals zero or is negative.</li> | |||
<li>A frame header CRC (see <xref target="frame-header-crc"></xref>) or frame fo | <li>A frame header Cyclic Redundancy Check (CRC) (see <xref target="frame-header | |||
oter CRC (see <xref target="frame-footer"></xref>) does not validate.</li> | -crc"></xref>) or frame footer CRC (see <xref target="frame-footer"></xref>) doe | |||
<li>One of the forbidden bit patterns described in <xref target="tableforbiddenp | s not validate.</li> | |||
atterns"></xref> above is used.</li> | <li>One of the forbidden bit patterns described in <xref target="tableforbiddenp | |||
atterns"></xref> is used.</li> | ||||
</ul> | </ul> | |||
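<t>Two points from this section lend themselves to a short sketch: the conversion of unsigned samples to the signed representation, and the range check implied by the first example of invalid data. The function names are hypothetical, and the sketch assumes the unsigned signal is centered halfway within the range of its integer type.</t>

```python
def unsigned_to_signed(samples, bit_depth):
    """Center unsigned samples around zero by subtracting half the range."""
    half_range = 1 << (bit_depth - 1)
    return [s - half_range for s in samples]


def signed_to_unsigned(samples, bit_depth):
    """Inverse conversion, for writing decoded audio to an unsigned format."""
    half_range = 1 << (bit_depth - 1)
    return [s + half_range for s in samples]


def sample_in_range(sample, bit_depth):
    """True when a signed sample fits the given bit depth."""
    limit = 1 << (bit_depth - 1)
    return -limit <= sample <= limit - 1
```

<t>For 8-bit audio, the unsigned values 0, 128, and 255 map to -128, 0, and 127, which matches the inclusive range given in the list above.</t>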
</section> | </section> | |||
<section anchor="format-layout-overview"><name>Format layout overview</name> | <section anchor="format-layout-overview"><name>Format Layout Overview</name> | |||
<t>A FLAC bitstream consists of the <tt>fLaC</tt> (i.e., 0x664C6143) marker at t | <t>A FLAC bitstream consists of the <tt>fLaC</tt> (i.e., 0x664C6143) marker at t | |||
he beginning of the stream, followed by a mandatory metadata block (called the S | he beginning of the stream, followed by a mandatory metadata block (called the s | |||
TREAMINFO block), any number of other metadata blocks, and then the audio frames | treaminfo metadata block), any number of other metadata blocks, and then the aud | |||
.</t> | io frames.</t> | |||
<t>FLAC supports 127 kinds of metadata blocks; currently, 7 kinds are defined in <xref target="file-level-metadata"></xref>.</t> | <t>FLAC supports 127 kinds of metadata blocks; currently, 7 kinds are defined in <xref target="file-level-metadata"></xref>.</t> | |||
<t>The audio data is composed of one or more audio frames. Each frame consists o | <t>The audio data is composed of one or more audio frames. Each frame consists o | |||
f a frame header, which contains a sync code, information about the frame (like | f a frame header that contains a sync code, information about the frame (like th | |||
the block size, sample rate and number of channels), and an 8-bit CRC. The frame | e block size, sample rate, and number of channels), and an 8-bit CRC. The frame | |||
header also contains either the sample number of the first sample in the frame | header also contains either the sample number of the first sample in the frame ( | |||
(for variable block size streams), or the frame number (for fixed block size str | for variable block size streams) or the frame number (for fixed block size strea | |||
eams). This allows for fast, sample-accurate seeking to be performed. Following | ms). This allows for fast, sample-accurate seeking to be performed. | |||
the frame header are encoded subframes, one for each channel. The frame is then | Following the frame header are encoded subframes, one for each channel. The fram | |||
zero-padded to a byte boundary and finished with a frame footer containing a che | e is then zero-padded to a byte boundary and finished with a frame footer contai | |||
cksum for the frame. Each subframe has its own header that specifies how the sub | ning a checksum for the frame. Each subframe has its own header that specifies h | |||
frame is encoded.</t> | ow the subframe is encoded.</t> | |||
<t>In order to allow a decoder to start decoding at any place in the stream, each frame starts with a byte-aligned 15-bit sync code. However, since it is not guaranteed that the sync code does not appear elsewhere in the frame, the decoder can check that it synced correctly by parsing the rest of the frame header and validating the frame header CRC.</t> | <t>In order to allow a decoder to start decoding at any place in the stream, each frame starts with a byte-aligned 15-bit sync code. However, since it is not guaranteed that the sync code does not appear elsewhere in the frame, the decoder can check that it synced correctly by parsing the rest of the frame header and validating the frame header CRC.</t> | |||
<t>Furthermore, to allow a decoder to start decoding at any place in the stream | <t>Furthermore, to allow a decoder to start decoding at any place in the stream | |||
even without having received a streaminfo metadata block, each frame header cont | even without having received a streaminfo metadata block, each frame header cont | |||
ains some basic information about the stream. This information includes sample r | ains some basic information about the stream. This information includes sample r | |||
ate, bits per sample, number of channels, etc. Since the frame header is overhea | ate, bits per sample, number of channels, etc. Since the frame header is overhea | |||
d, it has a direct effect on the compression ratio. To keep the frame header as | d, it has a direct effect on the compression ratio. To keep the frame header as | |||
small as possible, FLAC uses lookup tables for the most commonly used values for | small as possible, FLAC uses lookup tables for the most commonly used values for | |||
frame properties. When a certain property has a value that is not covered by th | frame properties. When a certain property has a value that is not covered by th | |||
e lookup table, the decoder is directed to find the value of that property (for | e lookup table, the decoder is directed to find the value of that property (for | |||
example, the sample rate) at the end of the frame header or in the streaminfo me | example, the sample rate) at the end of the frame header or in the streaminfo me | |||
tadata block. If a frame header refers to the streaminfo metadata block, the fil | tadata block. If a frame header refers to the streaminfo metadata block, the fil | |||
e is not 'streamable', see <xref target="streamable-subset"></xref> for details. | e is not "streamable"; see <xref target="streamable-subset"></xref> for details. | |||
By using lookup tables, the file is streamable and the frame header size small | By using lookup tables, the file is streamable and the frame header size is sma | |||
for the most common forms of audio data.</t> | ll for the most common forms of audio data.</t> | |||
<t>Individual subframes (one for each channel) are coded separately within a fra | <t>Individual subframes (one for each channel) are coded separately within a fra | |||
me, and appear serially in the stream. In other words, the encoded audio data is | me and appear serially in the stream. In other words, the encoded audio data is | |||
NOT channel-interleaved. This reduces decoder complexity at the cost of requiri | NOT channel-interleaved. This reduces decoder complexity at the cost of requirin | |||
ng larger decode buffers. Each subframe has its own header specifying the attrib | g larger decode buffers. Each subframe has its own header specifying the attribu | |||
utes of the subframe, like prediction method and order, residual coding paramete | tes of the subframe, like prediction method and order, residual coding parameter | |||
rs, etc. Each subframe header is followed by the encoded audio data for that cha | s, etc. Each subframe header is followed by the encoded audio data for that chan | |||
nnel.</t> | nel.</t> | |||
</section> | </section> | |||
<section anchor="streamable-subset"><name>Streamable subset</name> | <section anchor="streamable-subset"><name>Streamable Subset</name> | |||
<t>The FLAC format specifies a subset of itself as the FLAC streamable subset. T | <t>The FLAC format specifies a subset of itself as the FLAC streamable subset. T | |||
he purpose of this is to ensure that any streams encoded according to this subse | he purpose of this is to ensure that any streams encoded according to this subse | |||
t are truly &quot;streamable&quot;, meaning that a decoder that cannot seek with | t are truly "streamable", meaning that a decoder that cannot seek within the str | |||
in the stream can still pick up in the middle of the stream and start decoding. | eam can still pick up in the middle of the stream and start decoding. It also ma | |||
It also makes hardware decoder implementations more practical by limiting the en | kes hardware decoder implementations more practical by limiting the encoding par | |||
coding parameters in such a way that decoder buffer sizes and other resource req | ameters in such a way that decoder buffer sizes and other resource requirements | |||
uirements can be easily determined. The streamable subset makes the following li | can be easily determined. The streamable subset makes the following limitations | |||
mitations on what MAY be used in the stream:</t> | on what <bcp14>MAY</bcp14> be used in the stream:</t> | |||
<ul spacing="compact"> | <ul> | |||
<li>The sample rate bits (see <xref target="sample-rate-bits"></xref>) in the fr | <li>The sample rate bits (see <xref target="sample-rate-bits"></xref>) in the fr | |||
ame header MUST be 0b0001-0b1110, i.e., the frame header MUST NOT refer to the s | ame header <bcp14>MUST</bcp14> be 0b0001-0b1110, i.e., the frame header <bcp14>M | |||
treaminfo metadata block to describe the sample rate.</li> | UST NOT</bcp14> refer to the streaminfo metadata block to describe the sample ra | |||
<li>The bit depth bits (see <xref target="bit-depth-bits"></xref>) in the frame | te.</li> | |||
header MUST be 0b001-0b111, i.e., the frame header MUST NOT refer to the streami | <li>The bit depth bits (see <xref target="bit-depth-bits"></xref>) in the frame | |||
nfo metadata block to describe the bit depth.</li> | header <bcp14>MUST</bcp14> be 0b001-0b111, i.e., the frame header <bcp14>MUST NO | |||
<li>The stream MUST NOT contain blocks with more than 16384 interchannel samples | T</bcp14> refer to the streaminfo metadata block to describe the bit depth.</li> | |||
, i.e., the maximum block size must not be larger than 16384.</li> | <li>The stream <bcp14>MUST NOT</bcp14> contain blocks with more than 16384 inter | |||
<li>Audio with a sample rate less than or equal to 48000 Hz MUST NOT be containe | channel samples, i.e., the maximum block size must not be larger than 16384.</li | |||
d in blocks with more than 4608 interchannel samples, i.e., the maximum block si | > | |||
ze used for this audio must not be larger than 4608.</li> | <li>Audio with a sample rate less than or equal to 48000 Hz <bcp14>MUST NOT</bcp | |||
<li>Linear prediction subframes (see <xref target="linear-predictor-subframe"></ | 14> be contained in blocks with more than 4608 interchannel samples, i.e., the m | |||
xref>) containing audio with a sample rate less than or equal to 48000 Hz MUST h | aximum block size used for this audio must not be larger than 4608.</li> | |||
ave a predictor order less than or equal to 12, i.e., the subframe type bits in | <li>Linear prediction subframes (see <xref target="linear-predictor-subframe"></ | |||
the subframe header (see <xref target="subframe-header"></xref>) MUST NOT be 0b1 | xref>) containing audio with a sample rate less than or equal to 48000 Hz <bcp14 | |||
01100-0b111111.</li> | >MUST</bcp14> have a predictor order less than or equal to 12, i.e., the subfram | |||
<li>The Rice partition order (see <xref target="coded-residual"></xref>) MUST be | e type bits in the subframe header (see <xref target="subframe-header"></xref>) | |||
less than or equal to 8.</li> | <bcp14>MUST NOT</bcp14> be 0b101100-0b111111.</li> | |||
<li>The channel ordering MUST be equal to one defined in <xref target="channels- | <li>The Rice partition order (see <xref target="coded-residual"></xref>) <bcp14> | |||
bits"></xref>, i.e., the FLAC file MUST NOT need a WAVEFORMATEXTENSIBLE_CHANNEL_ | MUST</bcp14> be less than or equal to 8.</li> | |||
MASK tag to describe the channel ordering. See <xref target="channel-mask"></xre | <li>The channel ordering <bcp14>MUST</bcp14> be equal to one defined in <xref ta | |||
f> for details.</li> | rget="channels-bits"></xref>, i.e., the FLAC file <bcp14>MUST NOT</bcp14> need a | |||
WAVEFORMATEXTENSIBLE_CHANNEL_MASK tag to describe the channel ordering. See <xr | ||||
ef target="channel-mask"></xref> for details.</li> | ||||
</ul> | </ul> | |||
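<t>The numeric limits in the list above can be collected into a small checker, sketched below. The function name and its parameters are hypothetical, and the sketch covers only the block size, predictor order, and Rice partition order limits; the rules about sample rate bits, bit depth bits, and channel ordering depend on header fields that are not modeled here.</t>

```python
def streamable_violations(sample_rate_hz, block_size,
                          lpc_order=None, rice_partition_order=0):
    """Return a list of streamable-subset limits that are exceeded.

    lpc_order is None for subframes that do not use linear prediction.
    """
    problems = []
    if block_size > 16384:
        problems.append("block size larger than 16384")
    if sample_rate_hz <= 48000 and block_size > 4608:
        problems.append("block size larger than 4608 at or below 48000 Hz")
    if lpc_order is not None and sample_rate_hz <= 48000 and lpc_order > 12:
        problems.append("predictor order above 12 at or below 48000 Hz")
    if rice_partition_order > 8:
        problems.append("Rice partition order above 8")
    return problems
```

<t>An empty list means none of the checked limits is exceeded; it does not by itself prove that a stream conforms to the whole subset.</t>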
</section> | </section> | |||
<section anchor="file-level-metadata"><name>File-level metadata</name> | <section anchor="file-level-metadata"><name>File-Level Metadata</name> | |||
<t>At the start of a FLAC file or stream, following the <tt>fLaC</tt> ASCII file | <t>At the start of a FLAC file or stream, following the <tt>fLaC</tt> ASCII file | |||
signature, one or more metadata blocks MUST be present before any audio frames | signature, one or more metadata blocks <bcp14>MUST</bcp14> be present before an | |||
appear. The first metadata block MUST be a streaminfo block.</t> | y audio frames appear. The first metadata block <bcp14>MUST</bcp14> be a streami | |||
nfo metadata block.</t> | ||||
<section anchor="metadata-block-header"><name>Metadata block header</name> | <section anchor="metadata-block-header"><name>Metadata Block Header</name> | |||
<t>Each metadata block starts with a 4 byte header. The first bit in this header | <t>Each metadata block starts with a 4-byte header. The first bit in this header | |||
flags whether a metadata block is the last one: it is a 0 when other metadata b | flags whether a metadata block is the last one. It is 0 when other metadata blo | |||
locks follow, otherwise it is a 1. The 7 remaining bits of the first header byte | cks follow; otherwise, it is 1. The 7 remaining bits of the first header byte co | |||
contain the type of the metadata block as an unsigned number between 0 and 126 | ntain the type of the metadata block as an unsigned number between 0 and 126, ac | |||
according to the following table. A value of 127 (i.e., 0b1111111) is forbidden. | cording to the following table. A value of 127 (i.e., 0b1111111) is forbidden. T | |||
The three bytes that follow code for the size of the metadata block in bytes, e | he three bytes that follow code for the size of the metadata block in bytes, exc | |||
xcluding the 4 header bytes, as an unsigned number coded big-endian.</t> | luding the 4 header bytes, as an unsigned number coded big-endian.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Value</th> | <th align="left">Value</th> | |||
<th align="left">Metadata block type</th> | <th align="left">Metadata Block Type</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0</td> | <td align="left">0</td> | |||
<td align="left">Streaminfo</td> | <td align="left">Streaminfo</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
skipping to change at line 251 ¶ | skipping to change at line 309 ¶ | |||
<td align="left">Padding</td> | <td align="left">Padding</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">2</td> | <td align="left">2</td> | |||
<td align="left">Application</td> | <td align="left">Application</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">3</td> | <td align="left">3</td> | |||
<td align="left">Seektable</td> | <td align="left">Seek table</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">4</td> | <td align="left">4</td> | |||
<td align="left">Vorbis comment</td> | <td align="left">Vorbis comment</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">5</td> | <td align="left">5</td> | |||
<td align="left">Cuesheet</td> | <td align="left">Cuesheet</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">6</td> | <td align="left">6</td> | |||
<td align="left">Picture</td> | <td align="left">Picture</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">7 - 126</td> | <td align="left">7 - 126</td> | |||
<td align="left">reserved</td> | <td align="left">Reserved</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">127</td> | <td align="left">127</td> | |||
<td align="left">forbidden, to avoid confusion with a frame sync code</td> | ||||
<td align="left">Forbidden (to avoid confusion with a frame sync code)</td> | ||||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table></section> | </table></section> | |||
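The header layout described above can be illustrated with a minimal Python sketch. The names `parse_metadata_block_header`, `block_type_name`, and `BLOCK_TYPES` are hypothetical and chosen only for this example; the sketch assumes the caller has already read the four header bytes.

```python
# Block type names taken from the table above; 7-126 are reserved.
BLOCK_TYPES = {
    0: "Streaminfo", 1: "Padding", 2: "Application", 3: "Seek table",
    4: "Vorbis comment", 5: "Cuesheet", 6: "Picture",
}


def parse_metadata_block_header(header: bytes):
    """Return (is_last, block_type, size) for a 4-byte metadata block header."""
    if len(header) != 4:
        raise ValueError("a metadata block header is exactly 4 bytes")
    is_last = bool(header[0] & 0x80)   # first bit: 1 means this is the last block
    block_type = header[0] & 0x7F      # remaining 7 bits of the first byte
    if block_type == 127:
        raise ValueError("metadata block type 127 is forbidden")
    # Three bytes, big-endian: size of the block excluding these 4 header bytes.
    size = int.from_bytes(header[1:4], "big")
    return is_last, block_type, size


def block_type_name(block_type: int) -> str:
    """Map a block type number to its name (hypothetical helper)."""
    return BLOCK_TYPES.get(block_type, "Reserved")
```

A reader would typically call this in a loop, skipping `size` bytes after each header until `is_last` is true.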
<section anchor="streaminfo"><name>Streaminfo</name> | <section anchor="streaminfo"><name>Streaminfo</name> | |||
<t>The streaminfo metadata block has information about the whole stream, like sa | <t>The streaminfo metadata block has information about the whole stream, such as | |||
mple rate, number of channels, total number of samples, etc. It MUST be present | sample rate, number of channels, total number of samples, etc. It <bcp14>MUST</ | |||
as the first metadata block in the stream. Other metadata blocks MAY follow. The | bcp14> be present as the first metadata block in the stream. Other metadata bloc | |||
re MUST be no more than one streaminfo metadata block per FLAC stream.</t> | ks <bcp14>MAY</bcp14> follow. There <bcp14>MUST</bcp14> be no more than one stre | |||
<t>If the streaminfo metadata block contains incorrect or incomplete information | aminfo metadata block per FLAC stream.</t> | |||
, decoder behavior is left unspecified (i.e., up to the decoder implementation). | <t>If the streaminfo metadata block contains incorrect or incomplete information | |||
A decoder MAY choose to stop further decoding when the information supplied by | , decoder behavior is left unspecified (i.e., it is up to the decoder implementa | |||
the streaminfo metadata block turns out to be incorrect or contains forbidden va | tion). A decoder <bcp14>MAY</bcp14> choose to stop further decoding when the inf | |||
lues. A decoder accepting information from the streaminfo block (most-significan | ormation supplied by the streaminfo metadata block turns out to be incorrect or | |||
tly the maximum frame size, maximum block size, number of audio channels, number | contains forbidden values. A decoder accepting information from the streaminfo m | |||
of bits per sample, and total number of samples) without doing further checks d | etadata block (most significantly, the maximum frame size, maximum block size, n | |||
uring decoding of audio frames could be vulnerable to buffer overflows. See also | umber of audio channels, number of bits per sample, and total number of samples) | |||
<xref target="security-considerations"></xref>.</t> | without doing further checks during decoding of audio frames could be vulnerabl | |||
<t>The following table describes the streaminfo metadata block, excluding the me | e to buffer overflows. See also <xref target="security-considerations"></xref>.< | |||
tadata block header.</t> | /t> | |||
<t>The following table describes the streaminfo metadata block in order, excludi | ||||
ng the metadata block header.</t> | ||||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Data</th> | <th align="left">Data</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
skipping to change at line 331 ¶ | skipping to change at line 391 ¶ | |||
<td align="left">(number of channels)-1. FLAC supports from 1 to 8 channels.</td > | <td align="left">(number of channels)-1. FLAC supports from 1 to 8 channels.</td > | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(5)</tt></td> | <td align="left"><tt>u(5)</tt></td> | |||
<td align="left">(bits per sample)-1. FLAC supports from 4 to 32 bits per sample .</td> | <td align="left">(bits per sample)-1. FLAC supports from 4 to 32 bits per sample .</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(36)</tt></td> | <td align="left"><tt>u(36)</tt></td> | |||
<td align="left">Total number of interchannel samples in the stream. A value of zero here means the number of total samples is unknown.</td> | <td align="left">Total number of interchannel samples in the stream. A value of 0 here means the number of total samples is unknown.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(128)</tt></td> | <td align="left"><tt>u(128)</tt></td> | |||
<td align="left">MD5 checksum of the unencoded audio data. This allows the decod er to determine if an error exists in the audio data even when, despite the erro r, the bitstream itself is valid. A value of <tt>0</tt> signifies that the value is not known.</td> | <td align="left">MD5 checksum of the unencoded audio data. This allows the decod er to determine if an error exists in the audio data even when, despite the erro r, the bitstream itself is valid. A value of <tt>0</tt> signifies that the value is not known.</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>The minimum block size and the maximum block size MUST be in the 16-6 | </table><t>The minimum block size and the maximum block size <bcp14>MUST</bcp14> | |||
5535 range. The minimum block size MUST be equal to or less than the maximum blo | be in the 16-65535 range. The minimum block size <bcp14>MUST</bcp14> be equal t | |||
ck size.</t> | o or less than the maximum block size.</t> | |||
<t>Any frame but the last one MUST have a block size equal to or greater than th | <t>Any frame but the last one <bcp14>MUST</bcp14> have a block size equal to or | |||
e minimum block size and MUST have a block size equal to or lesser than the maxi | greater than the minimum block size and <bcp14>MUST</bcp14> have a block size eq | |||
mum block size. The last frame MUST have a block size equal to or lesser than th | ual to or less than the maximum block size. The last frame <bcp14>MUST</bcp14> h | |||
e maximum block size, it does not have to comply to the minimum block size becau | ave a block size equal to or less than the maximum block size; it does not have | |||
se the block size of that frame must be able to accommodate the length of the au | to comply to the minimum block size because the block size of that frame must be | |||
dio data the stream contains.</t> | able to accommodate the length of the audio data the stream contains.</t> | |||
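The block size constraints in the two paragraphs above can be expressed as simple checks. This is a sketch only; the function names `streaminfo_block_sizes_ok` and `frame_block_size_ok` are hypothetical, and how a decoder reacts to a violation is left to the implementation.

```python
def streaminfo_block_sizes_ok(min_block: int, max_block: int) -> bool:
    """Both sizes must be in the 16-65535 range, with minimum <= maximum."""
    return 16 <= min_block <= max_block <= 65535


def frame_block_size_ok(block_size: int, min_block: int, max_block: int,
                        is_last_frame: bool) -> bool:
    """Check one frame's block size against the streaminfo limits."""
    if block_size > max_block:
        return False
    # Only the last frame is exempt from the minimum block size.
    return is_last_frame or block_size >= min_block
```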
<t>If the minimum block size is equal to the maximum block size, the file contai | <t>If the minimum block size is equal to the maximum block size, the file contai | |||
ns a fixed block size stream, as the minimum block size excludes the last block. | ns a fixed block size stream, as the minimum block size excludes the last block. | |||
Note that in the case of a stream with a variable block size, the actual maximu | Note that in the case of a stream with a variable block size, the actual maximu | |||
m block size MAY be smaller than the maximum block size listed in the streaminfo | m block size <bcp14>MAY</bcp14> be smaller than the maximum block size listed in | |||
block, and the actual smallest block size excluding the last block MAY be large | the streaminfo metadata block, and the actual smallest block size excluding the | |||
r than the minimum block size listed in the streaminfo block. This is because th | last block <bcp14>MAY</bcp14> be larger than the minimum block size listed in t | |||
e encoder has to write these fields before receiving any input audio data, and c | he streaminfo metadata block. | |||
annot know beforehand what block sizes it will use, only between what bounds the | This is because the encoder has to write these fields before receiving any input | |||
se will be chosen.</t> | audio data and cannot know beforehand what block sizes it will use, only betwee | |||
<t>The sample rate MUST NOT be 0 when the FLAC file contains audio. A sample rat | n what bounds the block sizes will be chosen.</t> | |||
e of 0 MAY be used when non-audio is represented. This is useful if data is enco | <t>The sample rate <bcp14>MUST NOT</bcp14> be 0 when the FLAC file contains audi | |||
ded that is not along a time axis, or when the sample rate of the data lies outs | o. A sample rate of 0 <bcp14>MAY</bcp14> be used when non-audio is represented. | |||
ide the range that FLAC can represent in the streaminfo metadata block. If a sam | This is useful if data is encoded that is not along a time axis or when the samp | |||
ple rate of 0 is used it is recommended to store the meaning of the encoded cont | le rate of the data lies outside the range that FLAC can represent in the stream | |||
ent in a Vorbis comment field (see <xref target="vorbis-comment"></xref>) or an | info metadata block. If a sample rate of 0 is used, it is recommended to store t | |||
application metadata block (see <xref target="application"></xref>). This docume | he meaning of the encoded content in a Vorbis comment field (see <xref target="v | |||
nt does not define such metadata.</t> | orbis-comment"></xref>) or an application metadata block (see <xref target="appl | |||
<t>The MD5 checksum is computed by applying the MD5 message-digest algorithm in | ication"></xref>). This document does not define such metadata.</t> | |||
<xref target="RFC1321"></xref>. The message to this algorithm consists of all th | <t>The MD5 checksum is computed by applying the MD5 message-digest algorithm in | |||
e samples of all channels interleaved, represented in signed, little-endian form | <xref target="RFC1321"></xref>. The message to this algorithm consists of all th | |||
. This interleaving is on a per-sample basis, so for a stereo file this means fi | e samples of all channels interleaved, represented in signed, little-endian form | |||
rst the first sample of the first channel, then the first sample of the second c | . | |||
hannel, then the second sample of the first channel etc. Before computing the ch | This interleaving is on a per-sample basis, so for a stereo file, this means | |||
ecksum, all samples must be byte-aligned. If the bit depth is not a whole number | the first sample of the first channel, then the first sample of the | |||
of bytes, the value of each sample is sign extended to the next whole number of | second channel, then the second sample of the first channel, etc. Before | |||
bytes.</t> | computing the checksum, all samples must be byte-aligned. If the bit depth is | |||
<t>So, in the case of a 2-channel stream with 6-bit samples, bits will be lined | not a whole number of bytes, the value of each sample is sign-extended to the | |||
up as follows.</t> | next whole number of bytes.</t> | |||
<t>In the case of a 2-channel stream with 6-bit samples, bits will be lined up a | ||||
s follows:</t> | ||||
<artwork><![CDATA[SSAAAAAASSBBBBBBSSCCCCCC | <artwork type="ascii-art"> | |||
<![CDATA[SSAAAAAASSBBBBBBSSCCCCCC | ||||
^ ^ ^ ^ ^ ^ | ^ ^ ^ ^ ^ ^ | |||
| | | | | Bits of 2nd sample of 1st channel | | | | | | Bits of 2nd sample of 1st channel | |||
| | | | Sign extension bits of 2nd sample of 2nd channel | | | | | Sign extension bits of 2nd sample of 2nd channel | |||
| | | Bits of 1st sample of 2nd channel | | | | Bits of 1st sample of 2nd channel | |||
| | Sign extension bits of 1st sample of 2nd channel | | | Sign extension bits of 1st sample of 2nd channel | |||
| Bits of 1st sample of 1st channel | | Bits of 1st sample of 1st channel | |||
Sign extention bits of 1st sample of 1st channel | Sign extension bits of 1st sample of 1st channel | |||
]]></artwork> | ||||
]]> | <t>In the case of a 1-channel stream with 12-bit samples, bits are lined up in l | |||
</artwork> | ittle-endian byte order as follows:</t> | |||
<t>As another example, in the case of a 1-channel with 12-bit samples, bits are | ||||
lined up as follows, showing the little-endian byte order</t> | ||||
<artwork><![CDATA[AAAAAAAASSSSAAAABBBBBBBBSSSSBBBB | <artwork type="ascii-art"> | |||
<![CDATA[AAAAAAAASSSSAAAABBBBBBBBSSSSBBBB | ||||
^ ^ ^ ^ ^ ^ | ^ ^ ^ ^ ^ ^ | |||
| | | | | Most-significant 4 bits of 2nd sample | | | | | | Most-significant 4 bits of 2nd sample | |||
| | | | Sign extension bits of 2nd sample | | | | | Sign extension bits of 2nd sample | |||
| | | Least-significant 8 bits of 2nd sample | | | | Least-significant 8 bits of 2nd sample | |||
| | Most-significant 4 bits of 1st sample | | | Most-significant 4 bits of 1st sample | |||
| Sign extension bits of 1st sample | | Sign extension bits of 1st sample | |||
Least-significant 8 bits of 1st sample | Least-significant 8 bits of 1st sample | |||
]]></artwork> | ||||
]]> | ||||
</artwork> | ||||
</section> | </section> | |||
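The MD5 message layout described in the Streaminfo section (per-sample interleaving, sign extension to a whole number of bytes, little-endian byte order) can be sketched in Python as follows. The names `pack_samples_for_md5` and `audio_md5` are hypothetical, and the sketch assumes the decoded audio is available as a sequence of tuples holding one signed integer per channel.

```python
import hashlib


def pack_samples_for_md5(frames, bits_per_sample: int) -> bytes:
    """Build the MD5 input message from interleaved signed samples.

    frames: iterable of tuples, one signed integer per channel.
    """
    # Round the bit depth up to the next whole number of bytes.
    bytes_per_sample = (bits_per_sample + 7) // 8
    message = bytearray()
    for frame in frames:
        for sample in frame:  # first channel first, then second, and so on
            # signed=True performs the sign extension to the full byte width.
            message += sample.to_bytes(bytes_per_sample, "little", signed=True)
    return bytes(message)


def audio_md5(frames, bits_per_sample: int) -> bytes:
    """Return the 16-byte MD5 digest of the packed samples."""
    return hashlib.md5(pack_samples_for_md5(frames, bits_per_sample)).digest()
```

For a 1-channel stream with 12-bit samples, a sample value of 0x123 is packed as the bytes 0x23 0x01, matching the second diagram in that section.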
<section anchor="padding"><name>Padding</name> | <section anchor="padding"><name>Padding</name> | |||
<t>The padding metadata block allows for an arbitrary amount of padding. This bl ock is useful when it is known that metadata will be edited after encoding; the user can instruct the encoder to reserve a padding block of sufficient size so t hat when metadata is added, it will simply overwrite the padding (which is relat ively quick) instead of having to insert it into the existing file (which would normally require rewriting the entire file). There MAY be one or more padding me tadata blocks per FLAC stream.</t> | <t>The padding metadata block allows for an arbitrary amount of padding. This bl ock is useful when it is known that metadata will be edited after encoding; the user can instruct the encoder to reserve a padding block of sufficient size so t hat when metadata is added, it will simply overwrite the padding (which is relat ively quick) instead of having to insert it into the existing file (which would normally require rewriting the entire file). There <bcp14>MAY</bcp14> be one or more padding metadata blocks per FLAC stream.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Data</th> | <th align="left">Data</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(n)</tt></td> | <td align="left"><tt>u(n)</tt></td> | |||
<td align="left">n '0' bits (n MUST be a multiple of 8, i.e., a whole number of bytes, and MAY be zero). n is 8 times the size described in the metadata block h eader.</td> | <td align="left">n "0" bits (n <bcp14>MUST</bcp14> be a multiple of 8, i.e., a w hole number of bytes, and <bcp14>MAY</bcp14> be zero). n is 8 times the size des cribed in the metadata block header.</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table></section> | </table></section> | |||
<section anchor="application"><name>Application</name> | <section anchor="application"><name>Application</name> | |||
<t>The application metadata block is for use by third-party applications. The on ly mandatory field is a 32-bit identifier. An ID registry is being maintained at <eref target="https://xiph.org/flac/id.html">https://xiph.org/flac/id.html</ere f>.</t> | <t>The application metadata block is for use by third-party applications. The on ly mandatory field is a 32-bit application identifier (application ID). Applicat ion IDs are registered in the IANA "FLAC Application Metadata Block IDs" registr y (see <xref target="application-id-registry"></xref>).</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Data</th> | <th align="left">Data</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(32)</tt></td> | <td align="left"><tt>u(32)</tt></td> | |||
<td align="left">Registered application ID.</td> | <td align="left">Registered application ID.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(n)</tt></td> | <td align="left"><tt>u(n)</tt></td> | |||
<td align="left">Application data (n MUST be a multiple of 8, i.e., a whole numb er of bytes) n is 8 times the size described in the metadata block header, minus the 32 bits already used for the application ID.</td> | <td align="left">Application data (n <bcp14>MUST</bcp14> be a multiple of 8, i.e ., a whole number of bytes). n is 8 times the size described in the metadata blo ck header minus the 32 bits already used for the application ID.</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>Application IDs are registered with the IANA, see <xref target="appli cation-id-registry"></xref>.</t> | </table> | |||
</section> | </section> | |||
<section anchor="seektable"><name>Seektable</name> | <section anchor="seektable"><name>Seek Table</name> | |||
<t>The seektable metadata block can be used to store seek points. It is possible | <t>The seek table metadata block can be used to store seek points. It is possibl | |||
to seek to any given sample in a FLAC stream without a seek table, but the dela | e to seek to any given sample in a FLAC stream without a seek table, but the del | |||
y can be unpredictable since the bitrate may vary widely within a stream. By add | ay can be unpredictable since the bitrate may vary widely within a stream. By ad | |||
ing seek points to a stream, this delay can be significantly reduced. There MUST | ding seek points to a stream, this delay can be significantly reduced. There <bc | |||
NOT be more than one seektable metadata block in a stream, but the table can ha | p14>MUST NOT</bcp14> be more than one seek table metadata block in a stream, but | |||
ve any number of seek points.</t> | the table can have any number of seek points.</t> | |||
<t>Each seek point takes 18 bytes, so a seek table with 1% resolution within a s | <t>Each seek point takes 18 bytes, so a seek table with 1% resolution within a s | |||
tream adds less than 2 kilobyte of data. The number of seek points is implied by | tream adds less than 2 kilobytes of data. The number of seek points is implied b | |||
the size described in the metadata block header, i.e., equal to size / 18. Ther | y the size described in the metadata block header, i.e., equal to size / 18. The | |||
e is also a special 'placeholder' seekpoint that will be ignored by decoders but | re is also a special "placeholder" seek point that will be ignored by decoders b | |||
can be used to reserve space for future seek point insertion.</t> | ut can be used to reserve space for future seek point insertion.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Data</th> | <th align="left">Data</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">Seekpoints</td> | <td align="left">Seek points</td> | |||
<td align="left">Zero or more seek points as defined in <xref target="seekpoint" ></xref>.</td> | <td align="left">Zero or more seek points as defined in <xref target="seekpoint" ></xref>.</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>A seektable is generally not usable for seeking in a FLAC file embedd ed in a container (see <xref target="container-mappings"></xref>), as such conta iners usually interleave FLAC data with other data and the offsets used in seekp oints are those of an unmuxed FLAC stream. Also, containers often provide their own seeking methods. It is, however, possible to store the seektable in the cont ainer along with other metadata when muxing a FLAC file, so this stored seektabl e can be restored when demuxing the FLAC stream into a standalone FLAC file.</t> | </table><t>A seek table is generally not usable for seeking in a FLAC file embed ded in a container (see <xref target="container-mappings"></xref>), as such cont ainers usually interleave FLAC data with other data and the offsets used in seek points are those of an unmuxed FLAC stream. Also, containers often provide thei r own seeking methods. However, it is possible to store the seek table in the co ntainer along with other metadata when muxing a FLAC file, so this stored seek t able can be restored when demuxing the FLAC stream into a standalone FLAC file.< /t> | |||
<section anchor="seekpoint"><name>Seekpoint</name> | <section anchor="seekpoint"><name>Seek Point</name> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Data</th> | <th align="left">Data</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(64)</tt></td> | <td align="left"><tt>u(64)</tt></td> | |||
<td align="left">Sample number of the first sample in the target frame, or <tt>0 xFFFFFFFFFFFFFFFF</tt> for a placeholder point.</td> | <td align="left">Sample number of the first sample in the target frame or <tt>0x FFFFFFFFFFFFFFFF</tt> for a placeholder point.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(64)</tt></td> | <td align="left"><tt>u(64)</tt></td> | |||
<td align="left">Offset (in bytes) from the first byte of the first frame header to the first byte of the target frame's header.</td> | <td align="left">Offset (in bytes) from the first byte of the first frame header to the first byte of the target frame's header.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(16)</tt></td> | <td align="left"><tt>u(16)</tt></td> | |||
<td align="left">Number of samples in the target frame.</td> | <td align="left">Number of samples in the target frame.</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>NOTES</t> | </table> | |||
<ul spacing="compact"> | <t>Notes:</t> | |||
<ul> | ||||
<li>For placeholder points, the second and third field values are undefined.</li > | <li>For placeholder points, the second and third field values are undefined.</li > | |||
<li>Seek points within a table MUST be sorted in ascending order by sample numbe | <li>Seek points within a table <bcp14>MUST</bcp14> be sorted in ascending order | |||
r.</li> | by sample number.</li> | |||
<li>Seek points within a table MUST be unique by sample number, with the excepti | <li>Seek points within a table <bcp14>MUST</bcp14> be unique by sample number, w | |||
on of placeholder points.</li> | ith the exception of placeholder points.</li> | |||
<li>The previous two notes imply that there MAY be any number of placeholder poi | <li>The previous two notes imply that there <bcp14>MAY</bcp14> be any number of | |||
nts, but they MUST all occur at the end of the table.</li> | placeholder points, but they <bcp14>MUST</bcp14> all occur at the end of the tab | |||
<li>The sample offsets are those of an unmuxed FLAC stream. The offsets MUST NOT | le.</li> | |||
be updated on muxing to reflect the new offsets of FLAC frames in a container.< | <li>The sample offsets are those of an unmuxed FLAC stream. The offsets <bcp14>M | |||
/li> | UST NOT</bcp14> be updated on muxing to reflect the new offsets of FLAC frames i | |||
n a container.</li> | ||||
</ul> | </ul> | |||
</section> | </section> | |||
</section> | </section> | |||
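The seek point layout and the ordering rules in the notes above can be sketched in Python. The names `parse_seek_table` and `seek_table_is_valid` are hypothetical; the sketch assumes `body` is the content of a seek table metadata block without its header, and that the three fields are coded big-endian like other fixed-length integers in the format.

```python
import struct

PLACEHOLDER = 0xFFFFFFFFFFFFFFFF
# u(64) sample number, u(64) byte offset, u(16) sample count: 18 bytes.
SEEK_POINT = struct.Struct(">QQH")


def parse_seek_table(body: bytes):
    """Return a list of (sample_number, offset, frame_samples) tuples."""
    if len(body) % SEEK_POINT.size != 0:
        raise ValueError("seek table size is not a multiple of 18 bytes")
    return [SEEK_POINT.unpack_from(body, pos)
            for pos in range(0, len(body), SEEK_POINT.size)]


def seek_table_is_valid(points) -> bool:
    """Check ascending, unique sample numbers with placeholders at the end."""
    real = [p[0] for p in points if p[0] != PLACEHOLDER]
    # Every placeholder must come after all non-placeholder points.
    if any(p[0] == PLACEHOLDER for p in points[:len(real)]):
        return False
    # Strictly increasing covers both "sorted" and "unique".
    return all(a < b for a, b in zip(real, real[1:]))
```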
<section anchor="vorbis-comment"><name>Vorbis comment</name> | <section anchor="vorbis-comment"><name>Vorbis Comment</name> | |||
<t>A Vorbis comment metadata block contains human-readable information coded in | ||||
UTF-8. The name Vorbis comment points to the fact that the Vorbis codec stores s | ||||
uch metadata in almost the same way, see <xref target="Vorbis"></xref>. A Vorbis | ||||
comment metadata block consists of a vendor string optionally followed by a num | ||||
ber of fields, which are pairs of field names and field contents. Many users ref | ||||
er to these fields as FLAC tags or simply as tags. A FLAC file MUST NOT contain | ||||
more than one Vorbis comment metadata block.</t> | ||||
<t>In a Vorbis comment metadata block, the metadata block header is directly fol | ||||
lowed by 4 bytes containing the length in bytes of the vendor string as an unsig | ||||
ned number coded little-endian. The vendor string follows UTF-8 coded, and is no | ||||
t terminated in any way.</t> | ||||
<t>Following the vendor string are 4 bytes containing the number of fields that | ||||
are in the Vorbis comment block, stored as an unsigned number, coded little-endi | ||||
an. If this number is non-zero, it is followed by the fields themselves, each of | ||||
which is stored with a 4 byte length. First, the 4 byte field length in bytes i | ||||
s stored as an unsigned number, coded little-endian. The field itself is, like t | ||||
he vendor string, UTF-8 coded, not terminated in any way.</t> | ||||
<t>Each field consists of a field name and a field content, separated by an = ch | ||||
aracter. The field name MUST only consist of UTF-8 code points U+0020 through U+ | ||||
007E, excluding U+003D, which is the = character. In other words, the field name | ||||
can contain all printable ASCII characters except the equals sign. The evaluati | ||||
on of the field names MUST be case insensitive, so U+0041 through U+005A (A-Z) M | ||||
UST be considered equivalent to U+0061 through U+007A (a-z) respectively. The fi | ||||
eld contents can contain any UTF-8 character.</t> | ||||
<t>Note that the Vorbis comment as used in Vorbis allows for on the order of 2^6 | ||||
4 bytes of data whereas the FLAC metadata block is limited to 2^24 bytes. Given | ||||
the stated purpose of Vorbis comments, i.e., human-readable textual information, | ||||
the FLAC metadata block limit is unlikely to be restrictive. Also note that the | ||||
32-bit field lengths are coded little-endian, as opposed to the usual big-endia | ||||
n coding of fixed-length integers in the rest of the FLAC format.</t> | ||||
<section anchor="standard-field-names"><name>Standard field names</name> | <t>A Vorbis comment metadata block contains human-readable information coded | |||
<t>Only one standard field name is defined: the channel mask field, in <xref tar | in UTF-8. The name "Vorbis comment" points to the fact that the Vorbis codec | |||
get="channel-mask"></xref>. No other field names are defined because the applica | stores such metadata in almost the same way (see <xref | |||
bility of any field name is strongly tied to the content it is associated with. | target="Vorbis"></xref>). A Vorbis comment metadata block consists of a vendor | |||
For example, field names useful for describing files that contain a single work | string optionally followed by a number of fields, which are pairs of field | |||
of music would be unusable when labeling archived broadcasts, recordings of any | names and field contents. The vendor string contains the name of the program | |||
kind, or a collection of music works. Even when describing a single work of musi | that generated the file or stream. The fields contain metadata describing | |||
c, different conventions exist depending on the kind of music: orchestral music | various aspects of the contained audio. Many users refer to these fields as | |||
differs from music by solo artists or bands.</t> | "FLAC tags" or simply as "tags". A FLAC file <bcp14>MUST NOT</bcp14> contain | |||
more than one Vorbis comment metadata block.</t> | ||||
<t>In a Vorbis comment metadata block, the metadata block header is directly | ||||
followed by 4 bytes containing the length in bytes of the vendor string as an | ||||
unsigned number coded little-endian. The vendor string follows, is UTF-8 coded | ||||
and is not terminated in any way. | ||||
</t> | ||||
<t>Following the vendor string are 4 bytes containing the number of fields | ||||
that are in the Vorbis comment block, stored as an unsigned number coded | ||||
little-endian. If this number is non-zero, it is followed by the fields | ||||
themselves, each of which is stored with a 4-byte length. For each field, the | ||||
field length in bytes is stored as a 4-byte unsigned number coded | ||||
little-endian. The field itself follows it. Like the vendor string, the field | ||||
is UTF-8 coded and not terminated in any way.</t> | ||||
<t>Each field consists of a field name and field contents, separated by an = cha | ||||
racter. The field name <bcp14>MUST</bcp14> only consist of UTF-8 code points U+0 | ||||
020 through U+007E, excluding U+003D, which is the = character. In other words, | ||||
the field name can contain all printable ASCII characters except the equals sign | ||||
. The evaluation of the field names <bcp14>MUST</bcp14> be case insensitive, so | ||||
U+0041 through U+005A (A-Z) <bcp14>MUST</bcp14> be considered equivalent to U+00 | ||||
61 through U+007A (a-z). The field contents can contain any UTF-8 character.</t> | ||||
<t>Note that the Vorbis comment as used in Vorbis allows for 2<sup>64</sup> byte | ||||
s of data whereas the FLAC metadata block is limited to 2<sup>24</sup> bytes. Gi | ||||
ven the stated purpose of Vorbis comments, i.e., human-readable textual informat | ||||
ion, the FLAC metadata block limit is unlikely to be restrictive. Also, note tha | ||||
t the 32-bit field lengths are coded little-endian as opposed to the usual big-e | ||||
ndian coding of fixed-length integers in the rest of the FLAC format.</t> | ||||
<section anchor="standard-field-names"><name>Standard Field Names</name> | ||||
<t>Only one standard field name is defined: the channel mask field (see <xref ta | ||||
rget="channel-mask"></xref>). No other field names are defined because the appli | ||||
cability of any field name is strongly tied to the content it is associated with | ||||
. For example, field names that are useful for describing files that contain a s | ||||
ingle work of music would be unusable when labeling archived broadcasts, recordi | ||||
ngs of any kind, or a collection of music works. Even when describing a single w | ||||
ork of music, different conventions exist depending on the kind of music: orches | ||||
tral music differs from music by solo artists or bands.</t> | ||||
<t>Despite the fact that no field names are formally defined, there is a general trend among devices and software capable of FLAC playback that are meant to pla y music. Most of those recognize at least the following field names:</t> | <t>Despite the fact that no field names are formally defined, there is a general trend among devices and software capable of FLAC playback that are meant to pla y music. Most of those recognize at least the following field names:</t> | |||
<ul spacing="compact"> | <dl> | |||
<li>Title: name of the current work.</li> | <dt>Title:</dt><dd>Name of the current work.</dd> | |||
<li>Artist: name of the artist generally responsible for the current work. For o | <dt>Artist:</dt><dd>Name of the artist generally responsible for the current wor | |||
rchestral works, this is usually the composer; otherwise, it is often the perfor | k. For orchestral works, this is usually the composer; otherwise, it is often th | |||
mer.</li> | e performer.</dd> | |||
<li>Album: name of the collection the current work belongs to.</li> | <dt>Album:</dt><dd>Name of the collection the current work belongs to.</dd> | |||
</ul> | </dl> | |||
<t>For a more comprehensive list of possible field names suited for describing a | <t>For a more comprehensive list of possible field names suited for describing a | |||
single work of music in various genres, the list of tags used in the MusicBrain | single work of music in various genres, the list of tags used in the MusicBrain | |||
z project, see <xref target="MusicBrainz"></xref>, is suggested.</t> | z project is suggested; see <xref target="MusicBrainz"></xref>.</t> | |||
</section> | </section> | |||
<section anchor="channel-mask"><name>Channel mask</name> | <section anchor="channel-mask"><name>Channel Mask</name> | |||
<t>Besides fields containing information about the work itself, one field is def | ||||
ined for technical reasons, of which the field name is WAVEFORMATEXTENSIBLE_CHAN | <t>Besides fields containing information about the work itself, one field is def | |||
NEL_MASK. This field is used to communicate that the channels in a file differ f | ined for technical reasons: WAVEFORMATEXTENSIBLE_CHANNEL_MASK. This field is use | |||
rom the default channels defined in <xref target="channels-bits"></xref>. For ex | d to communicate that the channels in a file differ from the default channels de | |||
ample, by default, a FLAC file containing two channels is interpreted to contain | fined in <xref target="channels-bits"></xref>. For example, by default, a FLAC f | |||
a left and right channel, but with this field, it is possible to describe diffe | ile containing two channels is interpreted to contain a left and right channel, | |||
rent channel contents.</t> | but with this field, it is possible to describe different channel contents.</t> | |||
<t>The channel mask consists of flag bits indicating which channels are present. | ||||
The flags only signal which channels are present, not in which order, so if a f | <t>The channel mask consists of flag bits indicating which channels are | |||
ile has to be encoded in which channels are ordered differently, they have to be | present. The flags only signal which channels are present, not in which order, | |||
reordered. This mask is stored with a hexadecimal representation, preceded by 0 | so if a file to be encoded has channels that are ordered differently, they | |||
x, see the examples below. Please note that a file in which the channel order is | have to be reordered. This mask is stored with a hexadecimal representation | |||
defined through the WAVEFORMATEXTENSIBLE_CHANNEL_MASK is not streamable (see <x | preceded by 0x; see the examples below. Please note that a file in which the cha | |||
ref target="streamable-subset"></xref>), as the field is not found in each frame | nnel | |||
header. The mask bits can be found in the following table.</t> | order is defined through the WAVEFORMATEXTENSIBLE_CHANNEL_MASK is not | |||
<table> | streamable (see <xref target="streamable-subset"></xref>), as the field is not | |||
found in each frame header. The mask bits can be found in the following | ||||
table.</t> | ||||
<table anchor="mask-bits-table"> | ||||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Bit number</th> | <th align="left">Bit Number</th> | |||
<th align="left">Channel description</th> | <th align="left">Channel Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0</td> | <td align="left">0</td> | |||
<td align="left">Front left</td> | <td align="left">Front left</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
skipping to change at line 593 ¶ | skipping to change at line 692 ¶ | |||
<td align="left">Top rear center</td> | <td align="left">Top rear center</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">17</td> | <td align="left">17</td> | |||
<td align="left">Top rear right</td> | <td align="left">Top rear right</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>Following are three examples:</t> | </table><t>Following are three examples:</t> | |||
<ul spacing="compact"> | <ul> | |||
<li>If a file has a single channel, being a LFE channel, the Vorbis comment fiel | <li>A file has a single channel -- an LFE channel. The Vorbis comment field is W | |||
d is WAVEFORMATEXTENSIBLE_CHANNEL_MASK=0x8.</li> | AVEFORMATEXTENSIBLE_CHANNEL_MASK=0x8.</li> | |||
<li>If a file has four channels, being front left, front right, top front left, | <li>A file has four channels -- front left, front right, top front left, and top | |||
and top front right, the Vorbis comment field is WAVEFORMATEXTENSIBLE_CHANNEL_MA | front right. The Vorbis comment field is WAVEFORMATEXTENSIBLE_CHANNEL_MASK=0x50 | |||
SK=0x5003.</li> | 03.</li> | |||
<li>If an input has four channels, being back center, top front center, front ce | <li>An input has four channels -- back center, top front center, front center, a | |||
nter, and top rear center in that order, they have to be reordered to front cent | nd top rear center in that order. These have to be reordered to front center, ba | |||
er, back center, top front center and top rear center. The Vorbis comment field | ck center, top front center, and top rear center. The Vorbis comment field added | |||
added is WAVEFORMATEXTENSIBLE_CHANNEL_MASK=0x12104.</li> | is WAVEFORMATEXTENSIBLE_CHANNEL_MASK=0x12104.</li> | |||
</ul> | </ul> | |||
<t>WAVEFORMATEXTENSIBLE_CHANNEL_MASK fields MAY be padded with zeros, for exampl e, 0x0008 for a single LFE channel. Parsing of WAVEFORMATEXTENSIBLE_CHANNEL_MASK fields MUST be case-insensitive for both the field name and the field contents. </t> | <t>WAVEFORMATEXTENSIBLE_CHANNEL_MASK fields <bcp14>MAY</bcp14> be padded with ze ros, for example, 0x0008 for a single LFE channel. Parsing of WAVEFORMATEXTENSIB LE_CHANNEL_MASK fields <bcp14>MUST</bcp14> be case-insensitive for both the fiel d name and the field contents.</t> | |||
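<t>The parsing rules above (hexadecimal value preceded by 0x, optional zero padding, case-insensitive name and contents) can be sketched as follows. The function name <tt>parse_channel_mask</tt> is hypothetical, and the sketch returns only the numbers of the bits that are set; mapping them to channel descriptions is left to the mask bits table.</t>

```python
import string

FIELD_NAME = "WAVEFORMATEXTENSIBLE_CHANNEL_MASK"


def parse_channel_mask(field):
    """Return the sorted bit numbers set in a channel mask field.

    `field` is the decoded text of one Vorbis comment field
    (hypothetical helper, not taken from any library).
    """
    name, sep, value = field.partition("=")
    # The field name is compared case-insensitively.
    if not sep or name.upper() != FIELD_NAME:
        raise ValueError("not a channel mask field")

    # The contents are case-insensitive too, so accept 0x and 0X.
    if not value.lower().startswith("0x"):
        raise ValueError("mask must be preceded by 0x")
    digits = value[2:]
    if not digits or any(c not in string.hexdigits for c in digits):
        raise ValueError("mask is not hexadecimal")

    # Leading zeros (e.g. 0x0008) do not change the value.
    mask = int(digits, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]
```

<t>Applied to the three examples above, this yields bits 3, then 0, 1, 12, and 14, then 2, 8, 13, and 16. A mask of 0x0 yields an empty list.</t>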
<t>A WAVEFORMATEXTENSIBLE_CHANNEL_MASK field of 0x0 can be used to indicate that none of the audio channels of a file correlate with speaker positions. This is the case when audio needs to be decoded into speaker positions (e.g., Ambisonics B-format audio) or when a multitrack recording is contained.</t> | <t>A WAVEFORMATEXTENSIBLE_CHANNEL_MASK field of 0x0 can be used to indicate that none of the audio channels of a file correlate with speaker positions. This is the case when audio needs to be decoded into speaker positions (e.g., Ambisonics B-format audio) or when a multitrack recording is contained.</t> | |||
<t>It is possible for a WAVEFORMATEXTENSIBLE_CHANNEL_MASK field to code for fewe | <t>It is possible for a WAVEFORMATEXTENSIBLE_CHANNEL_MASK field to code for fewe | |||
r channels than are present in the audio. If that is the case, the remaining cha | r channels than are present in the audio. If that is the case, the remaining cha | |||
nnels SHOULD NOT be rendered by a playback application unfamiliar with their pur | nnels <bcp14>SHOULD NOT</bcp14> be rendered by a playback application unfamiliar | |||
pose. For example, the Ambisonics UHJ format is compatible with stereo playback: | with their purpose. | |||
its first two channels can be played back on stereo equipment, but all four cha | ||||
nnels together can be decoded into surround sound. For that example, the Vorbis | For example, the Ambisonics UHJ format is compatible with stereo playback: its f | |||
comment field WAVEFORMATEXTENSIBLE_CHANNEL_MASK=0x3 would be set, indicating the | irst two channels can be played back on stereo equipment, but all four channels | |||
first two channels are front left and front right, and other channels do not co | together can be decoded into surround sound. For that example, the Vorbis commen | |||
rrelate with speaker positions directly.</t> | t field WAVEFORMATEXTENSIBLE_CHANNEL_MASK=0x3 would be set, indicating that the | |||
first two channels are front left and front right and other channels do not corr | ||||
elate with speaker positions directly.</t> | ||||
<t>If audio channels not assigned to any speaker are contained and decoding to s peaker positions is possible, it is recommended to provide metadata on how this decoding should take place in another Vorbis comment field or an application met adata block. This document does not define such metadata.</t> | <t>If audio channels not assigned to any speaker are contained and decoding to s peaker positions is possible, it is recommended to provide metadata on how this decoding should take place in another Vorbis comment field or an application met adata block. This document does not define such metadata.</t> | |||
</section> | </section> | |||
</section> | </section> | |||
<section anchor="cuesheet"><name>Cuesheet</name> | <section anchor="cuesheet"><name>Cuesheet</name> | |||
<t>To either store the track and index point structure of a Compact Disc Digital Audio (CD-DA) along with its audio or to provide a mechanism to store locations of interest within a FLAC file, a cuesheet metadata block can be used. Certain aspects of this metadata block follow directly from the CD-DA specification, cal led Red Book, which is standardized as <xref target="IEC.60908.1999"></xref>. T he description below is complete and further reference to [IEC.60908.1999] is no t needed to implement this metadata block.</t> | <t>A cuesheet metadata block can be used either to store the track and index poi nt structure of a Compact Disc Digital Audio (CD-DA) along with its audio or to provide a mechanism to store locations of interest within a FLAC file. Certain a spects of this metadata block come directly from the CD-DA specification (called Red Book), which is standardized as <xref target="IEC.60908.1999"></xref>. The description below is complete, and further reference to <xref target="IEC.60908 .1999"/> is not needed to implement this metadata block.</t> | |||
<t>The structure of a cuesheet metadata block is enumerated in the following tab le.</t> | <t>The structure of a cuesheet metadata block is enumerated in the following tab le.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Data</th> | <th align="left">Data</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(128*8)</tt></td> | <td align="left"><tt>u(128*8)</tt></td> | |||
<td align="left">Media catalog number, in ASCII printable characters 0x20-0x7E.< /td> | <td align="left">Media catalog number in ASCII printable characters 0x20-0x7E.</ td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(64)</tt></td> | <td align="left"><tt>u(64)</tt></td> | |||
<td align="left">Number of lead-in samples.</td> | <td align="left">Number of lead-in samples.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(1)</tt></td> | <td align="left"><tt>u(1)</tt></td> | |||
<td align="left"><tt>1</tt> if the cuesheet corresponds to a CD-DA, else <tt>0</ | ||||
tt>.</td> | <td align="left"><tt>1</tt> if the cuesheet corresponds to a CD-DA; else <tt>0</ | |||
tt>.</td> | ||||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(7+258*8)</tt></td> | <td align="left"><tt>u(7+258*8)</tt></td> | |||
<td align="left">Reserved. All bits MUST be set to zero.</td> | <td align="left">Reserved. All bits <bcp14>MUST</bcp14> be set to zero.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(8)</tt></td> | <td align="left"><tt>u(8)</tt></td> | |||
<td align="left">Number of tracks in this cuesheet.</td> | <td align="left">Number of tracks in this cuesheet.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">Cuesheet tracks</td> | <td align="left">Cuesheet tracks</td> | |||
<td align="left">A number of structures as specified in <xref target="cuesheet-t rack"></xref> equal to the number of tracks specified previously.</td> | <td align="left">A number of structures as specified in <xref target="cuesheet-t rack"></xref> equal to the number of tracks specified previously.</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>If the media catalog number is less than 128 bytes long, it is right- | </table><t>If the media catalog number is less than 128 bytes long, it is right- | |||
padded with 0x00 bytes. For CD-DA, this is a thirteen digit number, followed by | padded with 0x00 bytes. For CD-DA, this is a 13-digit number followed by 115 0x0 | |||
115 0x00 bytes.</t> | 0 bytes.</t> | |||
<t>The number of lead-in samples has meaning only for CD-DA cuesheets; for other | <t>The number of lead-in samples has meaning only for CD-DA cuesheets; for other | |||
uses, it should be 0. For CD-DA, the lead-in is the TRACK 00 area where the tab | uses, it should be 0. For CD-DA, the lead-in is the TRACK 00 area where the tab | |||
le of contents is stored; more precisely, it is the number of samples from the f | le of contents is stored; more precisely, it is the number of samples from the f | |||
irst sample of the media to the first sample of the first index point of the fir | irst sample of the media to the first sample of the first index point of the fir | |||
st track. According to <xref target="IEC.60908.1999"></xref>, the lead-in MUST b | st track. According to <xref target="IEC.60908.1999"></xref>, the lead-in <bcp14 | |||
e silence and CD grabbing software does not usually store it; additionally, the | >MUST</bcp14> be silent, and CD grabbing software does not usually store it; add | |||
lead-in MUST be at least two seconds but MAY be longer. For these reasons, the l | itionally, the lead-in <bcp14>MUST</bcp14> be at least two seconds but <bcp14>MA | |||
ead-in length is stored here so that the absolute position of the first track ca | Y</bcp14> be longer. For these reasons, the lead-in length is stored here so tha | |||
n be computed. Note that the lead-in stored here is the number of samples up to | t the absolute position of the first track can be computed. Note that the lead-i | |||
the first index point of the first track, not necessarily to INDEX 01 of the fir | n stored here is the number of samples up to the first index point of the first | |||
st track; even the first track MAY have INDEX 00 data.</t> | track, not necessarily to INDEX 01 of the first track; even the first track <bcp | |||
<t>The number of tracks MUST be at least 1, as a cuesheet block MUST have a lead | 14>MAY</bcp14> have INDEX 00 data.</t> | |||
-out track. For CD-DA, this number MUST be no more than 100 (99 regular tracks a | <t>The number of tracks <bcp14>MUST</bcp14> be at least 1, as a cuesheet block < | |||
nd one lead-out track). The lead-out track is always the last track in the cuesh | bcp14>MUST</bcp14> have a lead-out track. For CD-DA, this number <bcp14>MUST</bc | |||
eet. For CD-DA, the lead-out track number MUST be 170 as specified by <xref targ | p14> be no more than 100 (99 regular tracks and one lead-out track). The lead-ou | |||
et="IEC.60908.1999"></xref>, otherwise it MUST be 255.</t> | t track is always the last track in the cuesheet. For CD-DA, the lead-out track | |||
number <bcp14>MUST</bcp14> be 170 as specified by <xref target="IEC.60908.1999"> | ||||
</xref>; otherwise, it <bcp14>MUST</bcp14> be 255.</t> | ||||
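<t>A minimal sketch of reading the fixed-size part of a cuesheet metadata block is given below. It assumes that <tt>block</tt> holds the block contents without the metadata block header, that integers are big-endian, and that the one-bit CD-DA flag is the most significant bit of its byte; the function name and the returned dictionary are hypothetical.</t>

```python
import struct

# 128 bytes catalog number, 8 bytes lead-in, 1 + 7 + 258*8 bits of
# flag and reserved space (259 bytes), 1 byte number of tracks.
HEADER_SIZE = 128 + 8 + 259 + 1


def parse_cuesheet_header(block):
    """Parse the fields that precede the cuesheet tracks (sketch)."""
    if len(block) < HEADER_SIZE:
        raise ValueError("cuesheet block too short")

    # The catalog number is right-padded with 0x00 bytes.
    catalog = block[:128].rstrip(b"\x00").decode("ascii")

    # Number of lead-in samples, 64-bit unsigned, big-endian.
    (lead_in,) = struct.unpack_from(">Q", block, 128)

    # First bit of byte 136 is the CD-DA flag; the other 7 bits and
    # the next 258 bytes are reserved and expected to be zero.
    is_cdda = bool(block[136] >> 7)
    reserved_is_zero = (block[136] & 0x7F) == 0 and not any(block[137:395])

    num_tracks = block[395]
    if num_tracks < 1:
        raise ValueError("a cuesheet needs at least the lead-out track")
    if is_cdda and num_tracks > 100:
        raise ValueError("CD-DA allows at most 100 tracks")

    return {
        "catalog": catalog,
        "lead_in": lead_in,
        "is_cdda": is_cdda,
        "reserved_is_zero": reserved_is_zero,
        "num_tracks": num_tracks,
    }
```

<t>The sketch reports nonzero reserved bits instead of rejecting them; whether a reader should reject such a block is a design choice this example does not settle.</t>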
<section anchor="cuesheet-track"><name>Cuesheet track</name> | <section anchor="cuesheet-track"><name>Cuesheet Track</name> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Data</th> | <th align="left">Data</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
skipping to change at line 688 ¶ | skipping to change at line 790 ¶ | |||
<td align="left">The track type: 0 for audio, 1 for non-audio. This corresponds to the CD-DA Q-channel control bit 3.</td> | <td align="left">The track type: 0 for audio, 1 for non-audio. This corresponds to the CD-DA Q-channel control bit 3.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(1)</tt></td> | <td align="left"><tt>u(1)</tt></td> | |||
<td align="left">The pre-emphasis flag: 0 for no pre-emphasis, 1 for pre-emphasi s. This corresponds to the CD-DA Q-channel control bit 5.</td> | <td align="left">The pre-emphasis flag: 0 for no pre-emphasis, 1 for pre-emphasi s. This corresponds to the CD-DA Q-channel control bit 5.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(6+13*8)</tt></td> | <td align="left"><tt>u(6+13*8)</tt></td> | |||
<td align="left">Reserved. All bits MUST be set to zero.</td> | <td align="left">Reserved. All bits <bcp14>MUST</bcp14> be set to zero.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(8)</tt></td> | <td align="left"><tt>u(8)</tt></td> | |||
<td align="left">The number of track index points.</td> | <td align="left">The number of track index points.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">Cuesheet track index points</td> | <td align="left">Cuesheet track index points</td> | |||
<td align="left">For all tracks except the lead-out track, a number of structure s as specified in <xref target="cuesheet-track-index-point"></xref> equal to the number of index points specified previously.</td> | <td align="left">For all tracks except the lead-out track, a number of structure s as specified in <xref target="cuesheet-track-index-point"></xref> equal to the number of index points specified previously.</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>Note that the track offset differs from the one in CD-DA, where the t | </table> | |||
rack's offset in the TOC is that of the track's INDEX 01 even if there is an IND | ||||
EX 00. For CD-DA, the track offset MUST be evenly divisible by 588 samples (588 | ||||
samples = 44100 samples/s * 1/75 s).</t> | ||||
<t>A track number of 0 is not allowed, because the CD-DA specification reserves | ||||
this for the lead-in. For CD-DA the number MUST be 1-99, or 170 for the lead-out | ||||
; for non-CD-DA, the track number MUST be 255 for the lead-out. It is recommende | ||||
d to start with track 1 and increase sequentially. Track numbers MUST be unique | ||||
within a cuesheet.</t> | ||||
<t>The track ISRC (International Standard Recording Code) is a 12-digit alphanum | ||||
eric code; see <xref target="ISRC-handbook"></xref>. A value of 12 ASCII 0x00 ch | ||||
aracters MAY be used to denote the absence of an ISRC.</t> | ||||
<t>There MUST be at least one index point in every track in a cuesheet except fo | ||||
r the lead-out track, which MUST have zero. For CD-DA, the number of index point | ||||
s MUST NOT be more than 100.</t> | ||||
<section anchor="cuesheet-track-index-point"><name>Cuesheet track index point</n | <t>Note that the track offset differs from the one in CD-DA, where the track's o | |||
ame> | ffset in the table of contents (TOC) is that of the track's INDEX 01 even if the | |||
re is an INDEX 00. For CD-DA, the track offset <bcp14>MUST</bcp14> be evenly div | ||||
isible by 588 samples (588 samples = 44100 samples/s * 1/75 s).</t> | ||||
<t>A track number of 0 is not allowed because the CD-DA specification reserves t | ||||
his for the lead-in. For CD-DA, the number <bcp14>MUST</bcp14> be 1-99 or 170 fo | ||||
r the lead-out; for non-CD-DA, the track number <bcp14>MUST</bcp14> be 255 for t | ||||
he lead-out. It is recommended to start with track 1 and increase sequentially. | ||||
Track numbers <bcp14>MUST</bcp14> be unique within a cuesheet.</t> | ||||
<t>The track ISRC (International Standard Recording Code) is a 12-digit alphanum | ||||
eric code; see <xref target="ISRC-handbook"></xref>. A value of 12 ASCII 0x00 ch | ||||
aracters <bcp14>MAY</bcp14> be used to denote the absence of an ISRC.</t> | ||||
<t>There <bcp14>MUST</bcp14> be at least one index point in every track in a cue | ||||
sheet except for the lead-out track, which <bcp14>MUST</bcp14> have zero. For CD | ||||
-DA, the number of index points <bcp14>MUST NOT</bcp14> be more than 100.</t> | ||||
<section anchor="cuesheet-track-index-point"><name>Cuesheet Track Index Point</n | ||||
ame> | ||||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Data</th> | <th align="left">Data</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
skipping to change at line 728 ¶ | skipping to change at line 833 ¶ | |||
<td align="left">Offset in samples, relative to the track offset, of the index p oint.</td> | <td align="left">Offset in samples, relative to the track offset, of the index p oint.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(8)</tt></td> | <td align="left"><tt>u(8)</tt></td> | |||
<td align="left">The track index point number.</td> | <td align="left">The track index point number.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(3*8)</tt></td> | <td align="left"><tt>u(3*8)</tt></td> | |||
<td align="left">Reserved. All bits MUST be set to zero.</td> | <td align="left">Reserved. All bits <bcp14>MUST</bcp14> be set to zero.</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>For CD-DA, the track index point offset MUST be evenly divisible by 5 | </table><t>For CD-DA, the track index point offset <bcp14>MUST</bcp14> be evenly | |||
88 samples (588 samples = 44100 samples/s * 1/75 s). Note that the offset is fro | divisible by 588 samples (588 samples = 44100 samples/s * 1/75 s). Note that th | |||
m the beginning of the track, not the beginning of the audio data.</t> | e offset is from the beginning of the track, not the beginning of the audio data | |||
<t>For CD-DA, a track index point number of 0 corresponds to the track pre-gap. | .</t> | |||
The first index point in a track MUST have a number of 0 or 1, and subsequently, | <t>For CD-DA, a track index point number of 0 corresponds to the track pre-gap. | |||
index point numbers MUST increase by 1. Index point numbers MUST be unique with | The first index point in a track <bcp14>MUST</bcp14> have a number of 0 or 1, an | |||
in a track.</t> | d subsequently, index point numbers <bcp14>MUST</bcp14> increase by 1. Index poi | |||
nt numbers <bcp14>MUST</bcp14> be unique within a track.</t> | ||||
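<t>The index point rules above can be checked with a short routine such as the following sketch. The function name <tt>check_index_points</tt> and the representation of index points as (offset, number) pairs are assumptions made for this example; the routine applies to tracks other than the lead-out track, which has no index points.</t>

```python
# 44100 samples/s * 1/75 s = 588 samples per CD-DA frame.
CDDA_SAMPLES_PER_FRAME = 44100 // 75


def check_index_points(points, is_cdda):
    """Return a list of rule violations for one track (sketch).

    `points` is a list of (offset, number) tuples in stored order,
    where offset is in samples relative to the track offset.
    """
    problems = []
    if not points:
        problems.append("track has no index points")
        return problems
    if is_cdda and len(points) > 100:
        problems.append("CD-DA allows at most 100 index points")

    # The first index point number is 0 (pre-gap) or 1.
    first_number = points[0][1]
    if first_number not in (0, 1):
        problems.append("first index point number must be 0 or 1")

    for position, (offset, number) in enumerate(points):
        # Numbers increase by 1, which also makes them unique.
        if number != first_number + position:
            problems.append("index point numbers must increase by 1")
        # CD-DA offsets fall on frame boundaries.
        if is_cdda and offset % CDDA_SAMPLES_PER_FRAME != 0:
            problems.append("offset %d is not a multiple of 588" % offset)
    return problems
```

<t>For example, a CD-DA track with index points (0, 0) and (88200, 1) passes, since 88200 is 150 frames of 588 samples.</t>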
</section> | </section> | |||
</section> | </section> | |||
</section> | </section> | |||
<section anchor="picture"><name>Picture</name> | <section anchor="picture"><name>Picture</name> | |||
<t>The picture metadata block contains image data of a picture in some way belon | <t>The picture metadata block contains image data of a picture in some way belon | |||
ging to the audio contained in the FLAC file. Its format is derived from the API | ging to the audio contained in the FLAC file. Its format is derived from the Att | |||
C frame in the ID3v2 specification, see <xref target="ID3v2"></xref>. However, c | ached Picture (APIC) frame in the ID3v2 specification; see <xref target="ID3v2"> | |||
ontrary to the APIC frame in ID3v2, the media type and description are prepended | </xref>. However, contrary to the APIC frame in ID3v2, the media type and descri | |||
with a 4-byte length field instead of being 0x00 delimited strings. A FLAC file | ption are prepended with a 4-byte length field instead of being 0x00 delimited s | |||
MAY contain one or more picture metadata blocks.</t> | trings. A FLAC file <bcp14>MAY</bcp14> contain one or more picture metadata bloc | |||
<t>Note that while the length fields for media type, description, and picture da | ks.</t> | |||
ta are 4 bytes in length and could in theory code for a size up to 4 GiB, the to | <t>Note that while the length fields for media type, description, and picture da | |||
tal metadata block size cannot exceed what can be described by the metadata bloc | ta are 4 bytes in length and could code for a size up to 4 GiB in theory, the to | |||
k header, i.e., 16 MiB.</t> | tal metadata block size cannot exceed what can be described by the metadata bloc | |||
<t>Instead of picture data, the picture metadata block can also contain an URI a | k header, i.e., 16 MiB.</t> | |||
s described in <xref target="RFC3986"></xref>.</t> | <t>Instead of picture data, the picture metadata block can also contain a URI as | |||
described in <xref target="RFC3986"></xref>.</t> | ||||
<t>The structure of a picture metadata block is enumerated in the following tabl e.</t> | <t>The structure of a picture metadata block is enumerated in the following tabl e.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Data</th> | <th align="left">Data</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(32)</tt></td> | <td align="left"><tt>u(32)</tt></td> | |||
<td align="left">The picture type according to next table</td> | <td align="left">The picture type according to <xref target="table13"/>.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(32)</tt></td> | <td align="left"><tt>u(32)</tt></td> | |||
<td align="left">The length of the media type string in bytes.</td> | <td align="left">The length of the media type string in bytes.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(n*8)</tt></td> | <td align="left"><tt>u(n*8)</tt></td> | |||
<td align="left">The media type string as specified by <xref target="RFC2046"></ xref>, or the text string <tt>--></tt> to signify that the data part is a URI of the picture instead of the picture data itself. This field must be in printa ble ASCII characters 0x20-0x7E.</td> | <td align="left">The media type string as specified by <xref target="RFC2046"></ xref>, or the text string <tt>--></tt> to signify that the data part is a URI of the picture instead of the picture data itself. This field must be in printa ble ASCII characters 0x20-0x7E.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(32)</tt></td> | <td align="left"><tt>u(32)</tt></td> | |||
<td align="left">The length of the description string in bytes.</td> | <td align="left">The length of the description string in bytes.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(n*8)</tt></td> | <td align="left"><tt>u(n*8)</tt></td> | |||
<td align="left">The description of the picture, in UTF-8.</td> | <td align="left">The description of the picture in UTF-8.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(32)</tt></td> | <td align="left"><tt>u(32)</tt></td> | |||
<td align="left">The width of the picture in pixels.</td> | <td align="left">The width of the picture in pixels.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(32)</tt></td> | <td align="left"><tt>u(32)</tt></td> | |||
<td align="left">The height of the picture in pixels.</td> | <td align="left">The height of the picture in pixels.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(32)</tt></td> | <td align="left"><tt>u(32)</tt></td> | |||
<td align="left">The color depth of the picture in bits per pixel.</td> | <td align="left">The color depth of the picture in bits per pixel.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(32)</tt></td> | <td align="left"><tt>u(32)</tt></td> | |||
<td align="left">For indexed-color pictures (e.g., GIF), the number of colors us ed, or <tt>0</tt> for non-indexed pictures.</td> | <td align="left">For indexed-color pictures (e.g., GIF), the number of colors us ed; <tt>0</tt> for non-indexed pictures.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(32)</tt></td> | <td align="left"><tt>u(32)</tt></td> | |||
<td align="left">The length of the picture data in bytes.</td> | <td align="left">The length of the picture data in bytes.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(n*8)</tt></td> | <td align="left"><tt>u(n*8)</tt></td> | |||
<td align="left">The binary picture data.</td> | <td align="left">The binary picture data.</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>The height, width, color depth, and 'number of colors' fields are for | </table><t>The height, width, color depth, and "number of colors" fields are for | |||
informational purposes only. Applications MUST NOT use them in decoding the pic | informational purposes only. Applications <bcp14>MUST NOT</bcp14> use them in d | |||
ture or deciding how to display it, but MAY use them to decide whether to proces | ecoding the picture or deciding how to display it, but applications <bcp14>MAY</ | |||
s a block or not (e.g., when selecting between different picture blocks) and MAY | bcp14> use them to decide whether or not to process a block (e.g., when selectin | |||
show them to the user. If a picture has no concept for any of these fields (e.g | g between different picture blocks) and <bcp14>MAY</bcp14> show them to the user | |||
., vector images may not have a height or width in pixels) or the content of any | . If a picture has no concept for any of these fields (e.g., vector images may n | |||
field is unknown, the affected fields MUST be set to zero.</t> | ot have a height or width in pixels) or the content of any field is unknown, the | |||
<t>The following table contains all the defined picture types. Values other than | affected fields <bcp14>MUST</bcp14> be set to zero.</t> | |||
those listed in the table are reserved. There MAY only be one each of picture t | <t>The following table contains all the defined picture types. Values other than | |||
ypes 1 and 2 in a file. In general practice, many FLAC playback devices and soft | those listed in the table are reserved. There <bcp14>MAY</bcp14> only be one ea | |||
ware display the contents of a picture metadata block with picture type 3 (front | ch of picture types 1 and 2 in a file. In general practice, many FLAC playback d | |||
cover) during playback, if present.</t> | evices and software display the contents of a picture metadata block, if present | |||
<table> | , with picture type 3 (front cover) during playback.</t> | |||
<table anchor="table13"> | ||||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Value</th> | <th align="left">Value</th> | |||
<th align="left">Picture type</th> | <th align="left">Picture Type</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0</td> | <td align="left">0</td> | |||
<td align="left">Other</td> | <td align="left">Other</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">1</td> | <td align="left">1</td> | |||
<td align="left">PNG file icon of 32x32 pixels, see <xref target="RFC2083"></xref></td> | <td align="left">PNG file icon of 32x32 pixels (see <xref target="RFC2083"></xref>)</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">2</td> | <td align="left">2</td> | |||
<td align="left">General file icon</td> | <td align="left">General file icon</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">3</td> | <td align="left">3</td> | |||
<td align="left">Front cover</td> | <td align="left">Front cover</td> | |||
skipping to change at line 922 ¶ | skipping to change at line 1028 ¶ | |||
<tr> | <tr> | |||
<td align="left">19</td> | <td align="left">19</td> | |||
<td align="left">Band or artist logotype</td> | <td align="left">Band or artist logotype</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">20</td> | <td align="left">20</td> | |||
<td align="left">Publisher or studio logotype</td> | <td align="left">Publisher or studio logotype</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>The origin and use of value 17, "A bright colored fish", is | </table><t>The origin and use of value 17 ("A bright colored fish") is unclear. | |||
unclear. This was copied to maintain compatibility with ID3v2. Applications are | This was copied to maintain compatibility with ID3v2. Applications are discourag | |||
discouraged from offering this value to users when embedding a picture.</t> | ed from offering this value to users when embedding a picture.</t> | |||
<t>If not a picture but a URI is contained in this block, the following points a | <t>If a URI (not a picture) is contained in this block, the following points app | |||
pply:</t> | ly:</t> | |||
<ul spacing="compact"> | <ul> | |||
<li>The URI can be either in absolute or relative form. If an URI is in relative | <li>The URI can be in either absolute or relative form. If a URI is in relative | |||
form, it is related to the URI of the FLAC content processed.</li> | form, it is related to the URI of the FLAC content processed.</li> | |||
<li>Applications MUST obtain explicit user approval to retrieve images via remot | <li>Applications <bcp14>MUST</bcp14> obtain explicit user approval to retrieve i | |||
e protocols and to retrieve local images not located in the same directory as th | mages via remote protocols and to retrieve local images that are not located in | |||
e FLAC file being processed.</li> | the same directory as the FLAC file being processed.</li> | |||
<li>Applications supporting linked images MUST handle unavailability of URIs gra | <li>Applications supporting linked images <bcp14>MUST</bcp14> handle unavailabil | |||
cefully. They MAY report unavailability to the user.</li> | ity of URIs gracefully. They <bcp14>MAY</bcp14> report unavailability to the use | |||
<li>Applications MAY reject processing URIs for any reason, in particular for se | r.</li> | |||
curity or privacy reasons.</li> | <li>Applications <bcp14>MAY</bcp14> reject processing URIs for any reason, parti | |||
cularly for security or privacy reasons.</li> | ||||
</ul> | </ul> | |||
</section> | </section> | |||
</section> | </section> | |||
<section anchor="frame-structure"><name>Frame structure</name> | <section anchor="frame-structure"><name>Frame Structure</name> | |||
<t>Directly after the last metadata block, one or more frames follow. Each frame | <t>One or more frames follow directly after the last metadata block. Each frame | |||
consists of a frame header, one or more subframes, padding zero bits to achieve | consists of a frame header, one or more subframes, padding zero bits to achieve | |||
byte-alignment, and a frame footer. The number of subframes in each frame is eq | byte alignment, and a frame footer. The number of subframes in each frame is equ | |||
ual to the number of audio channels.</t> | al to the number of audio channels.</t> | |||
<t>Each frame header stores the audio sample rate, number of bits per sample, an | <t>Each frame header stores the audio sample rate, number of bits per sample, an | |||
d number of channels independently of the streaminfo metadata block and other fr | d number of channels independently of the streaminfo metadata block and other fr | |||
ame headers. This was done to permit multicasting of FLAC files, but it also all | ame headers. This was done to permit multicasting of FLAC files, but it also all | |||
ows these properties to change mid-stream. Because not all environments in which | ows these properties to change mid-stream. Because not all environments in which | |||
FLAC decoders are used are able to cope with changes to these properties during | FLAC decoders are used are able to cope with changes to these properties during | |||
playback, a decoder MAY choose to stop decoding on such a change. A decoder tha | playback, a decoder <bcp14>MAY</bcp14> choose to stop decoding on such a change | |||
t does not check for such a change could be vulnerable to buffer overflows. See | . A decoder that does not check for such a change could be vulnerable to buffer | |||
also <xref target="security-considerations"></xref>.</t> | overflows. See also <xref target="security-considerations"></xref>.</t> | |||
<t>Note that storing audio with changing audio properties in FLAC results in var | <t>Note that storing audio with changing audio properties in FLAC results in var | |||
ious practical problems. For example, these changes of audio properties must hap | ious practical problems. For example, these changes of audio properties must hap | |||
pen on a frame boundary, or the process will not be lossless. When a variable bl | pen on a frame boundary or the process will not be lossless. When a variable blo | |||
ock size is chosen to accommodate this, note that blocks smaller than 16 samples | ck size is chosen to accommodate this, note that blocks smaller than 16 samples | |||
are not allowed and it is therefore not possible to store an audio stream in wh | are not allowed; therefore, it is not possible to store an audio stream in which | |||
ich these properties change within 16 samples of the last change or the start of | these properties change within 16 samples of the last change or the start of th | |||
the file. Also, since the streaminfo metadata block can only accommodate a sing | e file. Also, since the streaminfo metadata block can only accommodate a single | |||
le set of properties, it is only valid for part of such an audio stream. Instead | set of properties, it is only valid for part of such an audio stream. Instead, i | |||
, it is RECOMMENDED to store an audio stream with changing properties in FLAC en | t is <bcp14>RECOMMENDED</bcp14> to store an audio stream with changing propertie | |||
capsulated in a container capable of handling such changes, as these do not suff | s in FLAC encapsulated in a container capable of handling such changes, as these | |||
er from the mentioned limitations. See <xref target="container-mappings"></xref> | do not suffer from the mentioned limitations. See <xref target="container-mappi | |||
for details.</t> | ngs"></xref> for details.</t> | |||
<section anchor="frame-header"><name>Frame header</name> | <section anchor="frame-header"><name>Frame Header</name> | |||
<t>Each frame MUST start on a byte boundary and starts with the 15-bit frame syn | <t>Each frame <bcp14>MUST</bcp14> start on a byte boundary and start with the 15 | |||
c code 0b111111111111100. Following the sync code is the blocking strategy bit, | -bit frame sync code 0b111111111111100. Following the sync code is the blocking | |||
which MUST NOT change during the audio stream. The blocking strategy bit is 0 fo | strategy bit, which <bcp14>MUST NOT</bcp14> change during the audio stream. The | |||
r a fixed block size stream or 1 for a variable block size stream. If the blocki | blocking strategy bit is 0 for a fixed block size stream or 1 for a variable blo | |||
ng strategy is known, a decoder can include this bit when searching for the star | ck size stream. If the blocking strategy is known, a decoder can include this bi | |||
t of a frame to reduce the possibility of encountering a false positive, as the | t when searching for the start of a frame to reduce the possibility of encounter | |||
first two bytes of a frame are either 0xFFF8 for a fixed block size stream or 0x | ing a false positive, as the first two bytes of a frame are either 0xFFF8 for a | |||
FFF9 for a variable block size stream.</t> | fixed block size stream or 0xFFF9 for a variable block size stream.</t> | |||
<section anchor="block-size-bits"><name>Block size bits</name> | <section anchor="block-size-bits"><name>Block Size Bits</name> | |||
<t>Following the frame sync code and blocking strategy bit are 4 bits (the first | <t>Following the frame sync code and blocking strategy bit are 4 bits (the first | |||
4 bits of the third byte of each frame) referred to as the block size bits. The | 4 bits of the third byte of each frame) referred to as the block size bits. The | |||
ir value relates to the block size according to the following table, where v is | ir value relates to the block size according to the following table, where v is | |||
the value of the 4 bits as an unsigned number. If the block size bits code for a | the value of the 4 bits as an unsigned number. If the block size bits code for a | |||
n uncommon block size, this is stored after the coded number, see <xref target=" | n uncommon block size, this is stored after the coded number; see <xref target=" | |||
uncommon-block-size"></xref>.</t> | uncommon-block-size"></xref>.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Value</th> | <th align="left">Value</th> | |||
<th align="left">Block size</th> | <th align="left">Block Size</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0b0000</td> | <td align="left">0b0000</td> | |||
<td align="left">reserved</td> | <td align="left">Reserved</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b0001</td> | <td align="left">0b0001</td> | |||
<td align="left">192</td> | <td align="left">192</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b0010 - 0b0101</td> | <td align="left">0b0010 - 0b0101</td> | |||
<td align="left">144 * (2^v), i.e., 576, 1152, 2304, or 4608</td> | <td align="left">144 * (2<sup>v</sup>), i.e., 576, 1152, 2304, or 4608</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b0110</td> | <td align="left">0b0110</td> | |||
<td align="left">uncommon block size minus 1 stored as an 8-bit number</td> | <td align="left">Uncommon block size minus 1, stored as an 8-bit number</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b0111</td> | <td align="left">0b0111</td> | |||
<td align="left">uncommon block size minus 1 stored as a 16-bit number</td> | <td align="left">Uncommon block size minus 1, stored as a 16-bit number</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b1000 - 0b1111</td> | <td align="left">0b1000 - 0b1111</td> | |||
<td align="left">2^v, i.e., 256, 512, 1024, 2048, 4096, 8192, 16384, or 32768</td> | <td align="left">2<sup>v</sup>, i.e., 256, 512, 1024, 2048, 4096, 8192, 16384, or 32768</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table></section> | </table></section> | |||
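As an illustration of the block size table above, the following minimal Python sketch maps the 4 block size bits to a block size. The function name and the `(block_size, extra_bytes)` return convention are assumptions made for this example and are not part of the specification. A block size of `None` means the actual value is stored after the coded number in `extra_bytes` bytes.

```python
def block_size_from_bits(v):
    """Map the 4 block size bits to (block_size, extra_bytes).

    block_size is None when the block size (minus 1) is stored after the
    coded number; extra_bytes is then the width of that stored value.
    """
    if not 0 <= v <= 0b1111:
        raise ValueError("block size bits must be a 4-bit value")
    if v == 0b0000:
        raise ValueError("block size bits 0b0000 are reserved")
    if v == 0b0001:
        return 192, 0
    if v <= 0b0101:
        # 0b0010 - 0b0101: 144 * (2^v), i.e., 576, 1152, 2304, or 4608
        return 144 * (1 << v), 0
    if v == 0b0110:
        return None, 1  # uncommon block size minus 1, 8-bit number
    if v == 0b0111:
        return None, 2  # uncommon block size minus 1, 16-bit number
    # 0b1000 - 0b1111: 2^v, i.e., 256 up to 32768
    return 1 << v, 0
```

For instance, `block_size_from_bits(0b1100)` yields a block size of 4096 with no extra bytes to read.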
<section anchor="sample-rate-bits"><name>Sample rate bits</name> | <section anchor="sample-rate-bits"><name>Sample Rate Bits</name> | |||
<t>The next 4 bits (the last 4 bits of the third byte of each frame), referred t | <t>The next 4 bits (the last 4 bits of the third byte of each frame), referred t | |||
o as the sample rate bits, contain the sample rate of the audio according to the | o as the sample rate bits, contain the sample rate of the audio according to the | |||
following table. If the sample rate bits code for an uncommon sample rate, this | following table. If the sample rate bits code for an uncommon sample rate, this | |||
is stored after the uncommon block size or after the coded number if no uncommo | is stored after the uncommon block size; if no uncommon block size was used, th | |||
n block size was used. See <xref target="uncommon-sample-rate"></xref>.</t> | is is stored after the coded number. See <xref target="uncommon-sample-rate"></x | |||
ref>.</t> | ||||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Value</th> | <th align="left">Value</th> | |||
<th align="left">Sample rate</th> | <th align="left">Sample Rate</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0b0000</td> | <td align="left">0b0000</td> | |||
<td align="left">sample rate only stored in the streaminfo metadata block</td> | <td align="left">Sample rate only stored in the streaminfo metadata block</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b0001</td> | <td align="left">0b0001</td> | |||
<td align="left">88.2 kHz</td> | <td align="left">88.2 kHz</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b0010</td> | <td align="left">0b0010</td> | |||
<td align="left">176.4 kHz</td> | <td align="left">176.4 kHz</td> | |||
skipping to change at line 1058 ¶ | skipping to change at line 1164 ¶ | |||
<td align="left">48 kHz</td> | <td align="left">48 kHz</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b1011</td> | <td align="left">0b1011</td> | |||
<td align="left">96 kHz</td> | <td align="left">96 kHz</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b1100</td> | <td align="left">0b1100</td> | |||
<td align="left">uncommon sample rate in kHz stored as an 8-bit number</td> | <td align="left">Uncommon sample rate in kHz, stored as an 8-bit number</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b1101</td> | <td align="left">0b1101</td> | |||
<td align="left">uncommon sample rate in Hz stored as a 16-bit number</td> | <td align="left">Uncommon sample rate in Hz, stored as a 16-bit number</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b1110</td> | <td align="left">0b1110</td> | |||
<td align="left">uncommon sample rate in Hz divided by 10, stored as a 16-bit number</td> | <td align="left">Uncommon sample rate in Hz divided by 10, stored as a 16-bit number</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b1111</td> | <td align="left">0b1111</td> | |||
<td align="left">forbidden</td> | <td align="left">Forbidden</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table></section> | </table></section> | |||
<section anchor="channels-bits"><name>Channels bits</name> | <section anchor="channels-bits"><name>Channels Bits</name> | |||
<t>The next 4 bits (the first 4 bits of the fourth byte of each frame), referred to as the channels bits, contain both the number of channels of the audio as well as any stereo decorrelation used according to the following table.</t> | <t>The next 4 bits (the first 4 bits of the fourth byte of each frame), referred to as the channels bits, contain both the number of channels of the audio as well as any stereo decorrelation used according to the following table.</t> | |||
<t>If a channel layout different than the ones listed in the following table is used, this can be signaled with a WAVEFORMATEXTENSIBLE_CHANNEL_MASK tag in a Vorbis comment metadata block, see <xref target="channel-mask"></xref> for details. Note that even when such a different channel layout is specified with a WAVEFORMATEXTENSIBLE_CHANNEL_MASK and the channel ordering in the following table is overridden, the channels bits still contain the actual number of channels coded in the frame. For details on the way left/side, right/side, and mid/side stereo are coded, see <xref target="interchannel-decorrelation"></xref>.</t> | <t>If a channel layout different than the ones listed in the following table is used, this can be signaled with a WAVEFORMATEXTENSIBLE_CHANNEL_MASK tag in a Vorbis comment metadata block; see <xref target="channel-mask"></xref> for details. Note that even when such a different channel layout is specified with a WAVEFORMATEXTENSIBLE_CHANNEL_MASK and the channel ordering in the following table is overridden, the channels bits still contain the actual number of channels coded in the frame. For details on the way left-side, side-right, and mid-side stereo are coded, see <xref target="interchannel-decorrelation"></xref>.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Value</th> | <th align="left">Value</th> | |||
<th align="left">Channels</th> | <th align="left">Channels</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
skipping to change at line 1132 ¶ | skipping to change at line 1238 ¶ | |||
<td align="left">7 channels: front left, front right, front center, LFE, back center, side left, side right</td> | <td align="left">7 channels: front left, front right, front center, LFE, back center, side left, side right</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b0111</td> | <td align="left">0b0111</td> | |||
<td align="left">8 channels: front left, front right, front center, LFE, back left, back right, side left, side right</td> | <td align="left">8 channels: front left, front right, front center, LFE, back left, back right, side left, side right</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b1000</td> | <td align="left">0b1000</td> | |||
<td align="left">2 channels, left, right, stored as left/side stereo</td> | <td align="left">2 channels: left, right; stored as left-side stereo</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b1001</td> | <td align="left">0b1001</td> | |||
<td align="left">2 channels, left, right, stored as right/side stereo</td> | <td align="left">2 channels: left, right; stored as side-right stereo</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b1010</td> | <td align="left">0b1010</td> | |||
<td align="left">2 channels, left, right, stored as mid/side stereo</td> | <td align="left">2 channels: left, right; stored as mid-side stereo</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b1011 - 0b1111</td> | <td align="left">0b1011 - 0b1111</td> | |||
<td align="left">reserved</td> | <td align="left">Reserved</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table></section> | </table></section> | |||
<section anchor="bit-depth-bits"><name>Bit depth bits</name> | <section anchor="bit-depth-bits"><name>Bit Depth Bits</name> | |||
<t>The next 3 bits (bits 5, 6 and 7 of each fourth byte of each frame) contain t | <t>The next 3 bits (bits 5, 6, and 7 of each fourth byte of each frame) contain | |||
he bit depth of the audio according to the following table.</t> | the bit depth of the audio according to the following table. The next bit is res | |||
erved and <bcp14>MUST</bcp14> be zero.</t> | ||||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Value</th> | <th align="left">Value</th> | |||
<th align="left">Bit depth</th> | <th align="left">Bit Depth</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0b000</td> | <td align="left">0b000</td> | |||
<td align="left">bit depth only stored in the streaminfo metadata block</td> | <td align="left">Bit depth only stored in the streaminfo metadata block</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b001</td> | <td align="left">0b001</td> | |||
<td align="left">8 bits per sample</td> | <td align="left">8 bits per sample</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b010</td> | <td align="left">0b010</td> | |||
<td align="left">12 bits per sample</td> | <td align="left">12 bits per sample</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b011</td> | <td align="left">0b011</td> | |||
<td align="left">reserved</td> | <td align="left">Reserved</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b100</td> | <td align="left">0b100</td> | |||
<td align="left">16 bits per sample</td> | <td align="left">16 bits per sample</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b101</td> | <td align="left">0b101</td> | |||
<td align="left">20 bits per sample</td> | <td align="left">20 bits per sample</td> | |||
skipping to change at line 1203 ¶ | skipping to change at line 1309 ¶ | |||
<tr> | <tr> | |||
<td align="left">0b110</td> | <td align="left">0b110</td> | |||
<td align="left">24 bits per sample</td> | <td align="left">24 bits per sample</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b111</td> | <td align="left">0b111</td> | |||
<td align="left">32 bits per sample</td> | <td align="left">32 bits per sample</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>The next bit is reserved and MUST be zero.</t> | </table> | |||
</section> | </section> | |||
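To show how the fields described so far sit in the first four bytes of a frame, here is a minimal Python sketch that splits them apart. The function name, the dictionary keys, and the choice to raise `ValueError` on reserved values are assumptions for this example only. The block size, sample rate, and channels bits are returned raw rather than interpreted.

```python
# Bit depth table from the text; 0b000 (see streaminfo) and 0b011
# (reserved) are deliberately absent.
BIT_DEPTHS = {0b001: 8, 0b010: 12, 0b100: 16, 0b101: 20, 0b110: 24, 0b111: 32}

def parse_leading_header_bits(header):
    """Split the first 4 bytes of a frame header into their fields."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes")
    # 15-bit sync code: all of byte 0 plus the top 7 bits of byte 1.
    sync = (header[0] << 7) | (header[1] >> 1)
    if sync != 0b111111111111100:
        raise ValueError("frame sync code not found")
    depth_bits = (header[3] >> 1) & 0b111
    if depth_bits == 0b011:
        raise ValueError("bit depth bits 0b011 are reserved")
    if header[3] & 1:
        raise ValueError("reserved bit must be zero")
    return {
        "variable_block_size": bool(header[1] & 1),  # blocking strategy bit
        "block_size_bits": header[2] >> 4,
        "sample_rate_bits": header[2] & 0x0F,
        "channels_bits": header[3] >> 4,
        # None means the bit depth is only stored in the streaminfo block.
        "bit_depth": BIT_DEPTHS.get(depth_bits),
    }
```

A frame that starts with the bytes 0xFF 0xF8 0x69 0x18 would, under this sketch, be reported as a fixed block size frame with block size bits 0b0110, sample rate bits 0b1001, channels bits 0b0001, and a bit depth of 16.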
<section anchor="coded-number"><name>Coded number</name> | <section anchor="coded-number"><name>Coded Number</name> | |||
<t>Following the reserved bit (starting at the fifth byte of the frame) is eithe | <t>Following the reserved bit (starting at the fifth byte of the frame) is eithe | |||
r a sample or a frame number, which will be referred to as the coded number. Whe | r a sample or a frame number, which will be referred to as the coded number. Whe | |||
n dealing with variable block size streams, the sample number of the first sampl | n dealing with variable block size streams, the sample number of the first sampl | |||
e in the frame is encoded. When the file contains a fixed block size stream, the | e in the frame is encoded. When the file contains a fixed block size stream, the | |||
frame number is encoded. See <xref target="frame-header"></xref> on the blockin | frame number is encoded. See <xref target="frame-header"></xref> on the blockin | |||
g strategy bit which signals whether a stream is a fixed block size stream or a | g strategy bit, which signals whether a stream is a fixed block size stream or a | |||
variable block size stream. Also see <xref target="addition-of-blocking-strategy | variable block size stream. See also <xref target="addition-of-blocking-strateg | |||
-bit"></xref>.</t> | y-bit"></xref>.</t> | |||
<t>The coded number is stored in a variable length code like UTF-8 as defined in | <t>The coded number is stored in a variable-length code like UTF-8 as defined in | |||
<xref target="RFC3629"></xref>, but extended to a maximum of 36 bits unencoded, | <xref target="RFC3629"></xref> but extended to a maximum of 36 bits unencoded o | |||
7 bytes encoded.</t> | r 7 bytes encoded.</t> | |||
<t>When a frame number is encoded, the value MUST NOT be larger than what fits a | <t>When a frame number is encoded, the value <bcp14>MUST NOT</bcp14> be larger t | |||
value of 31 bits unencoded or 6 bytes encoded. Please note that as most general | han what fits a value of 31 bits unencoded or 6 bytes encoded. Please note that | |||
purpose UTF-8 encoders and decoders follow <xref target="RFC3629"></xref>, they | as most general purpose UTF-8 encoders and decoders follow <xref target="RFC3629 | |||
will not be able to handle these extended codes. Furthermore, while UTF-8 is sp | "></xref>, they will not be able to handle these extended codes. Furthermore, wh | |||
ecifically used to encode characters, FLAC uses it to encode numbers instead. To | ile UTF-8 is specifically used to encode characters, FLAC uses it to encode numb | |||
encode or decode a coded number, follow the procedures of Section 3 of <xref ta | ers instead. To encode or decode a coded number, follow the procedures in <xref | |||
rget="RFC3629"></xref>, but instead of using a character number, use a frame or | target="RFC3629" sectionFormat="of" section="3"/>, but instead of using a charac | |||
sample number, and instead of the table in Section 3 of <xref target="RFC3629">< | ter number, use a frame or sample number. In addition, use the extended table be | |||
/xref>, use the extended table below.</t> | low instead of the table in <xref target="RFC3629" sectionFormat="of" section="3 | |||
"/>.</t> | ||||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Number range (hexadecimal)</th> | <th align="left">Number Range (Hexadecimal)</th> | |||
<th align="left">Octet sequence (binary)</th> | <th align="left">Octet Sequence (Binary)</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0000 0000 0000 -<br /> | <td align="left">0000 0000 0000 -<br/> | |||
0000 0000 007F</td> | 0000 0000 007F</td> | |||
<td align="left">0xxxxxxx</td> | <td align="left">0xxxxxxx</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0000 0000 0080 -<br /> | <td align="left">0000 0000 0080 -<br/> | |||
0000 0000 07FF</td> | 0000 0000 07FF</td> | |||
<td align="left">110xxxxx 10xxxxxx</td> | <td align="left">110xxxxx 10xxxxxx</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0000 0000 0800 -<br /> | <td align="left">0000 0000 0800 -<br/> | |||
0000 0000 FFFF</td> | 0000 0000 FFFF</td> | |||
<td align="left">1110xxxx 10xxxxxx 10xxxxxx</td> | <td align="left">1110xxxx 10xxxxxx 10xxxxxx</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0000 0001 0000 -<br /> | <td align="left">0000 0001 0000 -<br/> | |||
0000 001F FFFF</td> | 0000 001F FFFF</td> | |||
<td align="left">11110xxx 10xxxxxx 10xxxxxx 10xxxxxx</td> | <td align="left">11110xxx 10xxxxxx 10xxxxxx 10xxxxxx</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0000 0020 0000 -<br /> | <td align="left">0000 0020 0000 -<br/> | |||
0000 03FF FFFF</td> | 0000 03FF FFFF</td> | |||
<td align="left">111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx</td> | <td align="left">111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0000 0400 0000 -<br /> | <td align="left">0000 0400 0000 -<br/> | |||
0000 7FFF FFFF</td> | 0000 7FFF FFFF</td> | |||
<td align="left">1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx</td> | <td align="left">1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0000 8000 0000 -<br /> | <td align="left">0000 8000 0000 -<br/> | |||
000F FFFF FFFF</td> | 000F FFFF FFFF</td> | |||
<td align="left">11111110 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx</td> | <td align="left">11111110 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>If the coded number is a frame number, it MUST be equal to the number | </table><t>If the coded number is a frame number, it <bcp14>MUST</bcp14> be equa | |||
of frames preceding the current frame. If the coded number is a sample number, | l to the number of frames preceding the current frame. If the coded number is a | |||
it MUST be equal to the number of samples preceding the current frame. In a stre | sample number, it <bcp14>MUST</bcp14> be equal to the number of samples precedin | |||
am where these requirements are not met, seeking is not (reliably) possible.</t> | g the current frame. In a stream where these requirements are not met, seeking i | |||
<t>For example, a frame that belongs to a variable block size stream and has exa | s not (reliably) possible.</t> | |||
ctly 51 billion samples preceding it, has its coded number constructed as follow | <t>For example, for a frame that belongs to a variable block size stream and has | |||
s.</t> | exactly 51 billion samples preceding it, the coded number is constructed as fol | |||
lows:</t> | ||||
<artwork><![CDATA[Octets 1-5 | <artwork type="ascii-art"> | |||
<![CDATA[Octets 1-5 | ||||
0b11111110 0b10101111 0b10011111 0b10110101 0b10100011 | 0b11111110 0b10101111 0b10011111 0b10110101 0b10100011 | |||
^^^^^^ ^^^^^^ ^^^^^^ ^^^^^^ | ^^^^^^ ^^^^^^ ^^^^^^ ^^^^^^ | |||
| | | Bits 18-13 | | | | Bits 18-13 | |||
| | Bits 24-19 | | | Bits 24-19 | |||
| Bits 30-25 | | Bits 30-25 | |||
Bits 36-31 | Bits 36-31 | |||
Octets 6-7 | Octets 6-7 | |||
0b10111000 0b10000000 | 0b10111000 0b10000000 | |||
^^^^^^ ^^^^^^ | ^^^^^^ ^^^^^^ | |||
| Bits 6-1 | | Bits 6-1 | |||
Bits 12-7 | Bits 12-7 | |||
]]> | ]]></artwork> | |||
</artwork> | <t>A decoder that relies on the coded number during seeking could be vulnerable | |||
<t>A decoder that relies on the coded number during seeking could be vulnerable | to buffer overflows or getting stuck in an infinite loop if it seeks in a stream | |||
to buffer overflows or getting stuck in an infinite loop if it seeks in a stream | where the coded numbers are not strictly increasing or are otherwise not valid. | |||
where the coded numbers are not strictly increasing or otherwise not valid. See | See also <xref target="security-considerations"></xref>.</t> | |||
also <xref target="security-considerations"></xref>.</t> | ||||
</section> | </section> | |||
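The extended UTF-8-like scheme in the table above can be decoded with a few lines of code. The following Python sketch is one possible reading of it, not a reference implementation. The function name and return convention are assumptions, and for brevity it does not reject overlong encodings or enforce the 31-bit limit that applies to frame numbers.

```python
def decode_coded_number(data):
    """Return (value, length_in_bytes) for the coded number at data[0]."""
    if not data:
        raise ValueError("no data")
    first = data[0]
    if first < 0x80:
        return first, 1  # single octet 0xxxxxxx
    # The number of leading 1 bits in the first octet is the total length.
    length = 0
    mask = 0x80
    while first & mask:
        length += 1
        mask >>= 1
    if length == 1 or length > 7:
        raise ValueError("invalid first octet of coded number")
    if len(data) < length:
        raise ValueError("coded number is truncated")
    # Bits below the terminating 0 bit of the first octet are payload.
    value = first & (mask - 1)
    for byte in data[1:length]:
        if (byte & 0xC0) != 0x80:
            raise ValueError("invalid continuation octet")
        value = (value << 6) | (byte & 0x3F)
    return value, length
```

Applied to the seven octets of the example above (0xFE 0xAF 0x9F 0xB5 0xA3 0xB8 0x80), this returns the value 51000000000 and a length of 7.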
<section anchor="uncommon-block-size"><name>Uncommon block size</name> | <section anchor="uncommon-block-size"><name>Uncommon Block Size</name> | |||
<t>If the block size bits defined earlier in this section were 0b0110 or 0b0111 | ||||
(uncommon block size minus 1 stored), this follows the coded number as either an | <t>If the block size bits defined earlier in this section are 0b0110 or | |||
8-bit or a 16-bit unsigned number coded big-endian. A value of 65535 (correspon | 0b0111 (uncommon block size minus 1 stored), the block size minus 1 follows t | |||
ding to a block size of 65536) is forbidden and MUST NOT be used, because such a | he | |||
block size cannot be represented in the streaminfo metadata block. A value from | coded number as either an 8-bit or 16-bit unsigned number coded big-endian. A | |||
0 up to (and including) 14, which corresponds to a block size from 1 to 15, is | value of 65535 (corresponding to a block size of 65536) is forbidden and <bcp14 | |||
only valid for the last frame in a stream and MUST NOT be used for any other fra | >MUST NOT</bcp14> be used, because such a block size cannot be represented in th | |||
me. See also <xref target="streaminfo"></xref>.</t> | e streaminfo metadata block. A value from 0 up to (and including) 14, which corr | |||
esponds to a block size from 1 to 15, is only valid for the last frame in a stre | ||||
am and <bcp14>MUST NOT</bcp14> be used for any other frame. See also <xref targe | ||||
t="streaminfo"></xref>.</t> | ||||
</section> | </section> | |||
<section anchor="uncommon-sample-rate"><name>Uncommon sample rate</name> | <section anchor="uncommon-sample-rate"><name>Uncommon Sample Rate</name> | |||
<t>Following the uncommon block size (or the coded number if no uncommon block s | ||||
ize is stored) is the sample rate, if the sample rate bits were 0b1100, 0b1101, | <t> If the sample rate bits are 0b1100, 0b1101, or 0b1110 (uncommon sample | |||
or 0b1110 (uncommon sample rate stored), as either an 8-bit or a 16-bit unsigned | rate stored), the sample rate follows the uncommon block size (or the coded | |||
number coded big-endian.</t> | number if no uncommon block size is stored) as either an 8-bit or a 16-bit | |||
<t>The sample rate MUST NOT be 0 when the subframe contains audio. A sample rate | unsigned number coded big-endian.</t> | |||
of 0 MAY be used when non-audio is represented. See <xref target="streaminfo">< | <t>The sample rate <bcp14>MUST NOT</bcp14> be 0 when the subframe contains audio | |||
/xref> for details.</t> | . A sample rate of 0 <bcp14>MAY</bcp14> be used when non-audio is represented. S | |||
ee <xref target="streaminfo"></xref> for details.</t> | ||||
</section> | </section> | |||
<section anchor="frame-header-crc"><name>Frame header CRC</name> | <section anchor="frame-header-crc"><name>Frame Header CRC</name> | |||
<t>Finally, after either the frame/sample number, an uncommon block size, or an | <t>Finally, an 8-bit CRC follows the frame/sample number, an uncommon block size | |||
uncommon sample rate, depending on whether the latter two are stored, is an 8-bi | , or an uncommon sample rate (depending on whether the latter two are stored). T | |||
t CRC. This CRC is initialized with 0 and has the polynomial x^8 + x^2 + x^1 + x | his CRC is initialized with 0 and has the polynomial x<sup>8</sup> + x<sup>2</su | |||
^0. This CRC covers the whole frame header before the CRC, including the sync co | p> + x<sup>1</sup> + x<sup>0</sup>. This CRC covers the whole frame header befor | |||
de.</t> | e the CRC, including the sync code.</t> | |||
</section> | </section> | |||
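The frame header CRC described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `crc8` is hypothetical, and the bit ordering (most significant bit first, no reflection, no final XOR) is an assumption that this section does not spell out. The polynomial x^8 + x^2 + x^1 + x^0 corresponds to the constant 0x07 once the implicit x^8 term is dropped.

```python
def crc8(data: bytes) -> int:
    """CRC-8 with polynomial 0x07 and initial value 0.

    Assumption: bits are processed most significant bit first,
    without reflection and without a final XOR.
    """
    crc = 0  # the text states the CRC is initialized with 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                # Top bit set: shift out and fold in the polynomial.
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc
```

Under these assumptions, running the same routine over the header bytes followed by the stored CRC byte yields 0, which gives a decoder a simple way to validate a frame header.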
</section> | </section> | |||
<section anchor="subframes"><name>Subframes</name> | <section anchor="subframes"><name>Subframes</name> | |||
<t>Following the frame header are a number of subframes equal to the number of a | <t>Following the frame header are a number of subframes equal to the number of a | |||
udio channels. Note that as subframes contain a bitstream that does not necessar | udio channels. | |||
ily has to be a whole number of bytes, only the first subframe always starts at | Note that subframes contain a bitstream that does not necessarily have to be a w | |||
a byte boundary.</t> | hole number of bytes, so only the first subframe starts at a byte boundary.</t> | |||
<section anchor="subframe-header"><name>Subframe header</name> | <section anchor="subframe-header"><name>Subframe Header</name> | |||
<t>Each subframe starts with a header. The first bit of the header MUST be 0, fo | <t>Each subframe starts with a header. The first bit of the header <bcp14>MUST</ | |||
llowed by 6 bits describing which subframe type is used according to the followi | bcp14> be 0, followed by 6 bits that describe which subframe type is used accord | |||
ng table, where v is the value of the 6 bits as an unsigned number.</t> | ing to the following table, where v is the value of the 6 bits as an unsigned nu | |||
mber.</t> | ||||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Value</th> | <th align="left">Value</th> | |||
<th align="left">Subframe type</th> | <th align="left">Subframe Type</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0b000000</td> | <td align="left">0b000000</td> | |||
<td align="left">Constant subframe</td> | <td align="left">Constant subframe</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b000001</td> | <td align="left">0b000001</td> | |||
<td align="left">Verbatim subframe</td> | <td align="left">Verbatim subframe</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b000010 - 0b000111</td> | <td align="left">0b000010 - 0b000111</td> | |||
<td align="left">reserved</td> | <td align="left">Reserved</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b001000 - 0b001100</td> | <td align="left">0b001000 - 0b001100</td> | |||
<td align="left">Subframe with a fixed predictor of order v-8, i.e., 0, 1, 2, 3 or 4</td> | <td align="left">Subframe with a fixed predictor of order v-8; i.e., 0, 1, 2, 3 or 4</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b001101 - 0b011111</td> | <td align="left">0b001101 - 0b011111</td> | |||
<td align="left">reserved</td> | <td align="left">Reserved</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0b100000 - 0b111111</td> | <td align="left">0b100000 - 0b111111</td> | |||
<td align="left">Subframe with a linear predictor of order v-31, i.e., 1 through 32 (inclusive)</td> | <td align="left">Subframe with a linear predictor of order v-31; i.e., 1 through 32 (inclusive)</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>Following the subframe type bits is a bit that flags whether the subf | </table> | |||
rame uses any wasted bits (see <xref target="wasted-bits-per-sample"></xref>). I | <t>Following the subframe type bits is a bit that flags whether the subframe use | |||
f it is 0, the subframe doesn't use any wasted bits and the subframe header is c | s any wasted bits (see <xref target="wasted-bits-per-sample"></xref>). If the fl | |||
omplete. If it is 1, the subframe does use wasted bits and the number of used wa | ag bit is 0, the subframe doesn't use any wasted bits and the subframe header is | |||
sted bits follows unary coded.</t> | complete. If the flag bit is 1, the subframe uses wasted bits | |||
and the number of used wasted bits minus 1 appears | ||||
in unary form, directly following the flag bit.</t> | ||||
</section> | </section> | |||
<section anchor="wasted-bits-per-sample"><name>Wasted bits per sample</name> | <section anchor="wasted-bits-per-sample"><name>Wasted Bits per Sample</name> | |||
<t>Most uncompressed audio file formats can only store audio samples with a bit | <t>Most uncompressed audio file formats can only store audio samples with a bit | |||
depth that is an integer number of bytes. Samples of which the bit depth is not | depth that is an integer number of bytes. Samples in which the bit depth is not | |||
an integer number of bytes are usually stored in such formats by padding them wi | an integer number of bytes are usually stored in such formats by padding them wi | |||
th least-significant zero bits to a bit depth that is an integer number of bytes | th least-significant zero bits to a bit depth that is an integer number of bytes | |||
. For example, shifting a 14-bit sample right by 2 pads it to a 16-bit sample, w | . For example, shifting a 14-bit sample right by 2 pads it to a 16-bit sample, w | |||
hich then has two zero least-significant bits. In this specification, these leas | hich then has two zero least-significant bits. In this specification, these leas | |||
t-significant zero bits are referred to as wasted bits per sample or simply wast | t-significant zero bits are referred to as wasted bits per sample or simply wast | |||
ed bits. They are wasted in the sense that they contain no information, but are | ed bits. They are wasted in the sense that they contain no information but are s | |||
stored anyway.</t> | tored anyway.</t> | |||
<t>The FLAC format can optionally take advantage of these wasted bits by signali | <t>The FLAC format can optionally take advantage of these wasted bits by signali | |||
ng their presence and coding the subframe without them. To do this, the wasted b | ng their presence and coding the subframe without them. To do this, the wasted b | |||
its per sample flag in a subframe header is set to 0 and the number of wasted bi | its per sample flag in a subframe | |||
ts per sample (k) minus 1 follows the flag in an unary encoding. For example, if | header is set to 1 and the number of wasted bits per sample | |||
k is 3, 0b001 follows. If k = 0, the wasted bits per sample flag is 0 and no un | (k) minus 1 follows the flag in an unary encoding. For example, if k is 3, 0b | |||
ary coded k follows. In this document, if a subframe header signals a certain nu | 001 follows. If k = 0, the wasted bits per sample flag is 0 and no unary-coded k | |||
mber of wasted bits, it is said it 'uses' these wasted bits.</t> | follows. In this document, if a subframe header signals a certain number of was | |||
<t>If a subframe uses wasted bits (i.e., k is not equal to 0), samples are coded | ted bits, it is said it "uses" these wasted bits.</t> | |||
ignoring k least-significant bits. For example, if a frame not employing stereo | <t>If a subframe uses wasted bits (i.e., k is not equal to 0), samples are coded | |||
decorrelation specifies a sample size of 16 bits per sample in the frame header | ignoring k least-significant bits. For example, if a frame not employing stereo | |||
and k of a subframe is 3, samples in the subframe are coded as 13 bits per samp | decorrelation specifies a sample size of 16 bits per sample in the frame header | |||
le. For more details, see <xref target="constant-subframe"></xref> on how the bi | and k of a subframe is 3, samples in the subframe are coded as 13 bits per samp | |||
t depth of a subframe is calculated. A decoder MUST add k least-significant zero | le. For more details, see <xref target="constant-subframe"></xref> on how the bi | |||
bits by shifting left (padding) after decoding a subframe sample. If the frame | t depth of a subframe is calculated. A decoder <bcp14>MUST</bcp14> add k least-s | |||
has left/side, right/side, or mid/side stereo, a decoder MUST perform padding on | ignificant zero bits by shifting left (padding) after decoding a subframe sample | |||
the subframes before restoring the channels to left and right. The number of wa | . If the frame has left-side, side-right, or mid-side stereo, a decoder <bcp14>M | |||
sted bits per sample MUST be such that the resulting number of bits per sample ( | UST</bcp14> perform padding on the subframes before restoring the channels to le | |||
of which the calculation is explained in <xref target="constant-subframe"></xref | ft and right. The number of wasted bits per sample <bcp14>MUST</bcp14> be such t | |||
>) is larger than zero.</t> | hat the resulting number of bits per sample (of which the calculation is explain | |||
<t>Besides audio files that have a certain number of wasted bits for the whole f | ed in <xref target="constant-subframe"></xref>) is larger than zero.</t> | |||
ile, there exist audio files in which the number of wasted bits varies. There ar | <t>Besides audio files that have a certain number of wasted bits for the whole f | |||
e DVD-Audio discs in which blocks of samples have had their least-significant bi | ile, audio files exist in which the number of wasted bits varies. There are DVD- | |||
ts selectively zeroed to slightly improve the compression of their otherwise los | Audio discs in which blocks of samples have had their least-significant bits sel | |||
sless Meridian Lossless Packing codec, see <xref target="MLP"></xref>. There are | ectively zeroed to slightly improve the compression of their otherwise lossless | |||
also audio processors like lossyWAV, see <xref target="lossyWAV"></xref>, which | Meridian Lossless Packing codec; see <xref target="MLP"></xref>. There are also | |||
zero a number of least-sigificant bits for a block of samples, increasing the c | audio processors like lossyWAV (see <xref target="lossyWAV"></xref>) that zero a | |||
ompression in a non-lossless way. Because of this, the number of wasted bits k M | number of least-significant bits for a block of samples, increasing the compres | |||
AY change between frames and MAY differ between subframes. If the number of wast | sion in a non-lossless way. Because of this, the number of wasted bits k <bcp14> | |||
ed bits changes halfway through a subframe (e.g., the first part has 2 wasted bi | MAY</bcp14> change between frames and <bcp14>MAY</bcp14> differ between subframe | |||
ts and the second part has 4 wasted bits) the subframe uses the lowest number of | s. If the number of wasted bits changes halfway through a subframe (e.g., the fi | |||
wasted bits, as otherwise non-zero bits would be discarded and the process woul | rst part has 2 wasted bits and the second part has 4 wasted bits), the subframe | |||
d not be lossless.</t> | uses the lowest number of wasted bits; otherwise, non-zero bits would be discard | |||
ed, and the process would not be lossless.</t> | ||||
</section> | </section> | |||
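As a hedged sketch of the wasted-bits signaling described above: the helper names `read_wasted_bits` and `pad_sample` are hypothetical, and the bitstream is modeled as a plain iterable of 0/1 integers positioned at the wasted-bits flag. The sketch follows the reading in which a flag of 1 means wasted bits are used, and k minus 1 is then coded in unary as a run of zero bits terminated by a one bit, matching the example where k = 3 is followed by 0b001.

```python
def read_wasted_bits(bits):
    """Return k, the number of wasted bits signaled in a subframe header.

    `bits` is an iterable of 0/1 integers starting at the flag bit
    (an assumption made for illustration).
    """
    it = iter(bits)
    if next(it) == 0:
        return 0  # flag is 0: no wasted bits, header is complete
    k = 1  # flag is 1: at least one wasted bit
    while next(it) == 0:
        k += 1  # each zero before the terminating one adds a wasted bit
    return k


def pad_sample(sample, k):
    """Restore k least-significant zero bits by shifting left."""
    return sample << k
```

For example, `read_wasted_bits([1, 0, 0, 1])` reads the flag followed by 0b001 and returns 3; each decoded subframe sample would then be passed through `pad_sample(sample, 3)` before any stereo channels are restored.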
<section anchor="constant-subframe"><name>Constant subframe</name> | <section anchor="constant-subframe"><name>Constant Subframe</name> | |||
<t>In a constant subframe, only a single sample is stored. This sample is stored | <t>In a constant subframe, only a single sample is stored. This sample is stored | |||
as an integer number coded big-endian, signed two's complement. The number of b | as an integer number coded big-endian, signed two's complement. The number of b | |||
its used to store this sample depends on the bit depth of the current subframe. | its used to store this sample depends on the bit depth of the current subframe. | |||
The bit depth of a subframe is equal to the bit depth as coded in the frame head | The bit depth of a subframe is equal to the bit depth as coded in the frame head | |||
er (see <xref target="bit-depth-bits"></xref>), minus the number of used wasted | er (see <xref target="bit-depth-bits"></xref>) minus the number of used wasted b | |||
bits coded in the subframe header (see <xref target="wasted-bits-per-sample"></x | its coded in the subframe header (see <xref target="wasted-bits-per-sample"></xr | |||
ref>). If a subframe is a side subframe (see <xref target="interchannel-decorrel | ef>). If a subframe is a side subframe (see <xref target="interchannel-decorrela | |||
ation"></xref>), the bit depth of that subframe is increased by 1 bit.</t> | tion"></xref>), the bit depth of that subframe is increased by 1 bit.</t> | |||
</section> | </section> | |||
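The bit depth rule above is simple arithmetic, sketched here under stated assumptions: the function name `subframe_bit_depth` and its parameters are hypothetical, and the check that the result is larger than zero reflects the requirement in the section on wasted bits.

```python
def subframe_bit_depth(frame_bit_depth, wasted_bits, is_side_subframe):
    """Bit depth used to store unencoded samples in a subframe."""
    depth = frame_bit_depth - wasted_bits
    if is_side_subframe:
        depth += 1  # a side subframe needs one extra bit
    if depth <= 0:
        raise ValueError("resulting bits per sample must be larger than zero")
    return depth
```

With a frame bit depth of 16 and k = 3 on a subframe that is not a side subframe, this returns 13, matching the example given for wasted bits.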
<section anchor="verbatim-subframe"><name>Verbatim subframe</name> | <section anchor="verbatim-subframe"><name>Verbatim Subframe</name> | |||
<t>A verbatim subframe stores all samples unencoded in sequential order. See <xr | <t>A verbatim subframe stores all samples unencoded in sequential order. See <xr | |||
ef target="constant-subframe"></xref> on how a sample is stored unencoded. The n | ef target="constant-subframe"></xref> on how a sample is stored unencoded. The n | |||
umber of samples that need to be stored in a subframe is given by the block size | umber of samples that need to be stored in a subframe is provided by the block s | |||
in the frame header.</t> | ize in the frame header.</t> | |||
</section> | </section> | |||
<section anchor="fixed-predictor-subframe"><name>Fixed predictor subframe</name> | <section anchor="fixed-predictor-subframe"><name>Fixed Predictor | |||
<t>Five different fixed predictors are defined in the following table, one for e | Subframe</name> | |||
ach prediction order 0 through 4. In the table is also a derivation, which expla | <t>Five different fixed predictors are defined in the following table, one for e | |||
ins the rationale for choosing these fixed predictors.</t> | ach prediction order 0 through 4. The table also contains a derivation that expl | |||
ains the rationale for choosing these fixed predictors.</t> | ||||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Order</th> | <th align="left">Order</th> | |||
<th align="left">Prediction</th> | <th align="left">Prediction</th> | |||
<th align="left">Derivation</th> | <th align="left">Derivation</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
skipping to change at line 1400 ¶ | skipping to change at line 1520 ¶ | |||
<td align="left">3 * a(n-1) - 3 * a(n-2) + a(n-3)</td> | <td align="left">3 * a(n-1) - 3 * a(n-2) + a(n-3)</td> | |||
<td align="left">a(n-1) + a'(n-1) + a''(n-1)</td> | <td align="left">a(n-1) + a'(n-1) + a''(n-1)</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">4</td> | <td align="left">4</td> | |||
<td align="left">4 * a(n-1) - 6 * a(n-2) + 4 * a(n-3) - a(n-4)</td> | <td align="left">4 * a(n-1) - 6 * a(n-2) + 4 * a(n-3) - a(n-4)</td> | |||
<td align="left">a(n-1) + a'(n-1) + a''(n-1) + a'''(n-1)</td> | <td align="left">a(n-1) + a'(n-1) + a''(n-1) + a'''(n-1)</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>Where</t> | </table><t>Where:</t> | |||
<ul spacing="compact"> | <ul> | |||
<li>n is the number of the sample being predicted.</li> | <li>n is the number of the sample being predicted.</li> | |||
<li>a(n) is the sample being predicted.</li> | <li>a(n) is the sample being predicted.</li> | |||
<li>a(n-1) is the sample before the one being predicted.</li> | <li>a(n-1) is the sample before the one being predicted.</li> | |||
<li>a'(n-1) is the difference between the previous sample and the sample before that, i.e., a(n-1) - a(n-2). This is the closest available first-order discrete derivative.</li> | <li>a'(n-1) is the difference between the previous sample and the sample before that, i.e., a(n-1) - a(n-2). This is the closest available first-order discrete derivative.</li> | |||
<li>a''(n-1) is a'(n-1) - a'(n-2) or the closest available second-order discrete derivative.</li> | <li>a''(n-1) is a'(n-1) - a'(n-2) or the closest available second-order discrete derivative.</li> | |||
<li>a'''(n-1) is a''(n-1) - a''(n-2) or the closest available third-order discrete derivative.</li> | <li>a'''(n-1) is a''(n-1) - a''(n-2) or the closest available third-order discrete derivative.</li> | |||
</ul> | </ul> | |||
<t>As a predictor makes use of samples preceding the sample that is predicted, it can only be used when enough samples are known. As each subframe in FLAC is coded completely independently, the first few samples in each subframe cannot be predicted. Therefore, a number of so-called warm-up samples equal to the predictor order is stored. These are stored unencoded, bypassing the predictor and residual coding stages. See <xref target="constant-subframe"></xref> on how samples are stored unencoded. The table below defines how a fixed predictor subframe appears in the bitstream.</t> | <t>As a predictor makes use of samples preceding the sample that is predicted, it can only be used when enough samples are known. As each subframe in FLAC is coded completely independently, the first few samples in each subframe cannot be predicted. Therefore, a number of so-called warm-up samples equal to the predictor order is stored. These are stored unencoded, bypassing the predictor and residual coding stages. See <xref target="constant-subframe"></xref> on how samples are stored unencoded. The table below defines how a fixed predictor subframe appears in the bitstream.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
skipping to change at line 1430 ¶ | skipping to change at line 1550 ¶ | |||
<tr> | <tr> | |||
<td align="left"><tt>s(n)</tt></td> | <td align="left"><tt>s(n)</tt></td> | |||
<td align="left">Unencoded warm-up samples (n = subframe's bits per sample * predictor order).</td> | <td align="left">Unencoded warm-up samples (n = subframe's bits per sample * predictor order).</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">Coded residual</td> | <td align="left">Coded residual</td> | |||
<td align="left">Coded residual as defined in <xref target="coded-residual"></xref></td> | <td align="left">Coded residual as defined in <xref target="coded-residual"></xref></td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>As the fixed predictors are specified, they do not have to be stored. The fixed predictor order, which is stored in the subframe header, specifies which predictor is used.</t> | </table><t>Because fixed predictors are specified, they do not have to be stored. The fixed predictor order, which is stored in the subframe header, specifies which predictor is used.</t> | |||
<t>To encode a signal with a fixed predictor, each sample has the corresponding prediction subtracted and sent to the residual coder. To decode a signal with a fixed predictor, the residual is decoded, and then the prediction can be added for each sample. This means that decoding is necessarily a sequential process within a subframe, as for each sample, enough fully decoded previous samples are needed to calculate the prediction.</t> | <t>To encode a signal with a fixed predictor, each sample has the corresponding prediction subtracted and sent to the residual coder. To decode a signal with a fixed predictor, the residual is decoded, and then the prediction can be added for each sample. This means that decoding is necessarily a sequential process within a subframe, as for each sample, enough fully decoded previous samples are needed to calculate the prediction.</t> | |||
<t>For fixed predictor order 0, the prediction is always 0, thus each residual s | <t>For fixed predictor order 0, the prediction is always 0; thus, each residual | |||
ample is equal to its corresponding input or decoded sample. The difference betw | sample is equal to its corresponding input or decoded sample. The difference bet | |||
een a fixed predictor with order 0 and a verbatim subframe, is that a verbatim s | ween a fixed predictor with order 0 and a verbatim subframe is that a verbatim s | |||
ubframe stores all samples unencoded, while a fixed predictor with order 0 has a | ubframe stores all samples unencoded while a fixed predictor with order 0 has al | |||
ll its samples processed by the residual coder.</t> | l its samples processed by the residual coder.</t> | |||
<t>The first order fixed predictor is comparable to how DPCM encoding works, as | <t>The first-order fixed predictor is comparable to how differential pulse-code | |||
the resulting residual sample is the difference between the corresponding sample | modulation (DPCM) encoding works, as the resulting residual sample is the differ | |||
and the sample before it. The higher order fixed predictors can be understood a | ence between the corresponding sample and the sample before it. The higher-order | |||
s polynomials fitted to the previous samples.</t> | fixed predictors can be understood as polynomials fitted to the previous sample | |||
s.</t> | ||||
</section> | </section> | |||
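The fixed predictor encode and decode steps described above can be sketched as follows. This is an illustration under stated assumptions: the names `FIXED_COEFFS`, `fixed_predict`, `encode_fixed`, and `decode_fixed` are hypothetical, samples are plain Python integers, and residual coding is left out entirely. The coefficients for orders 3 and 4 are taken from the table above; the rows for orders 0 through 2 are not visible in this excerpt and are filled in here from the same derivation (0, a(n-1), and 2 * a(n-1) - a(n-2)).

```python
# Coefficients applied to a(n-1), a(n-2), ... for each fixed predictor order.
FIXED_COEFFS = {
    0: [],
    1: [1],
    2: [2, -1],
    3: [3, -3, 1],
    4: [4, -6, 4, -1],
}


def fixed_predict(history, order):
    """Predict the next sample from the samples decoded so far."""
    coeffs = FIXED_COEFFS[order]
    # history[-1] is a(n-1), history[-2] is a(n-2), and so on.
    return sum(c * history[-1 - i] for i, c in enumerate(coeffs))


def encode_fixed(samples, order):
    """Split samples into warm-up samples and residuals."""
    warmup = list(samples[:order])
    residuals = [
        samples[n] - fixed_predict(samples[:n], order)
        for n in range(order, len(samples))
    ]
    return warmup, residuals


def decode_fixed(warmup, residuals, order):
    """Rebuild samples sequentially by adding each prediction to its residual."""
    if len(warmup) != order:
        raise ValueError("number of warm-up samples must equal the order")
    samples = list(warmup)
    for r in residuals:
        samples.append(fixed_predict(samples, order) + r)
    return samples
```

Note that `decode_fixed` has to run sample by sample, since each prediction depends on samples decoded just before it, which is the sequential constraint the text describes. With order 0 the prediction is always 0, so the residuals equal the samples themselves.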
<section anchor="linear-predictor-subframe"><name>Linear predictor subframe</nam | <section anchor="linear-predictor-subframe"><name>Linear Predictor Subframe</nam | |||
e> | e> | |||
<t>Whereas fixed predictors are well suited for simple signals, using a (non-fix | <t>Whereas fixed predictors are well suited for simple signals, using a (non-fix | |||
ed) linear predictor on more complex signals can improve compression by making t | ed) linear predictor on more complex signals can improve compression by making t | |||
he residual samples even smaller. There is a certain trade-off however, as stori | he residual samples even smaller. There is a certain trade-off, however, as stor | |||
ng the predictor coefficients takes up space as well.</t> | ing the predictor coefficients takes up space as well.</t> | |||
<t>In the FLAC format, a predictor is defined by up to 32 predictor coefficients | <t>In the FLAC format, a predictor is defined by up to 32 predictor coefficients | |||
and a shift. To form a prediction, each coefficient is multiplied by its corres | and a shift. To form a prediction, each coefficient is multiplied by its corres | |||
ponding past sample, the results are summed, and this sum is then shifted. To en | ponding past sample, the results are summed, and this sum is then shifted. To en | |||
code a signal with a linear predictor, each sample has the corresponding predict | code a signal with a linear predictor, each sample has the corresponding predict | |||
ion subtracted and sent to the residual coder. To decode a signal with a linear | ion subtracted and sent to the residual coder. To decode a signal with a linear | |||
predictor, the residual is decoded, and then the prediction can be added for eac | predictor, the residual is decoded, and then the prediction can be added for eac | |||
h sample. This means that decoding MUST be a sequential process within a subfram | h sample. This means that decoding <bcp14>MUST</bcp14> be a sequential process w | |||
e, as for each sample, enough decoded samples are needed to calculate the predic | ithin a subframe, as enough decoded samples are needed to calculate the predicti | |||
tion.</t> | on for each sample.</t> | |||
<t>The table below defines how a linear predictor subframe appears in the bitstream.</t> | <t>The table below defines how a linear predictor subframe appears in the bitstream.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Data</th> | <th align="left">Data</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left"><tt>s(n)</tt></td> | <td align="left"><tt>s(n)</tt></td> | |||
<td align="left">Unencoded warm-up samples (n = subframe's bits per sample * lpc order).</td> | <td align="left">Unencoded warm-up samples (n = subframe's bits per sample * LPC order).</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>u(4)</tt></td> | <td align="left"><tt>u(4)</tt></td> | |||
<td align="left">(Predictor coefficient precision in bits)-1 (NOTE: 0b1111 is forbidden).</td> | <td align="left">(Predictor coefficient precision in bits)-1 (Note: 0b1111 is forbidden).</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>s(5)</tt></td> | <td align="left"><tt>s(5)</tt></td> | |||
<td align="left">Prediction right shift needed in bits.</td> | <td align="left">Prediction right shift needed in bits.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left"><tt>s(n)</tt></td> | <td align="left"><tt>s(n)</tt></td> | |||
<td align="left">Predictor coefficients (n = predictor coefficient precision * lpc order).</td> | <td align="left">Predictor coefficients (n = predictor coefficient precision * LPC order).</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">Coded residual</td> | <td align="left">Coded residual</td> | |||
<td align="left">Coded residual as defined in <xref target="coded-residual"></xref></td> | <td align="left">Coded residual as defined in <xref target="coded-residual"></xref>.</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>See <xref target="constant-subframe"></xref> on how the warm-up samples are stored unencoded. The predictor coefficients are stored as an integer number coded big-endian, signed two's complement, where the number of bits needed for each coefficient is defined by the predictor coefficient precision. While the prediction right shift is signed two's complement, this number MUST NOT be negative, see <xref target="restriction-of-lpc-shift-to-non-negative-values"></xref> for an explanation why this is.</t> | </table><t>See <xref target="constant-subframe"></xref> on how the warm-up samples are stored unencoded. The predictor coefficients are stored as an integer number coded big-endian, signed two's complement, where the number of bits needed for each coefficient is defined by the predictor coefficient precision. While the prediction right shift is signed two's complement, this number <bcp14>MUST NOT</bcp14> be negative; see <xref target="restriction-of-lpc-shift-to-non-negative-values"></xref> for an explanation why this is.</t> | |||
<t>Please note that the order in which the predictor coefficients appear in the bitstream corresponds to which <strong>past</strong> sample they belong to. In other words, the order of the predictor coefficients is opposite to the chronological order of the samples. So, the first predictor coefficient has to be multiplied with the sample directly before the sample that is being predicted, the second predictor coefficient has to be multiplied with the sample before that, etc.</t> | <t>Please note that the order in which the predictor coefficients appear in the bitstream corresponds to which <strong>past</strong> sample they belong to. In other words, the order of the predictor coefficients is opposite to the chronological order of the samples. So, the first predictor coefficient has to be multiplied with the sample directly before the sample that is being predicted, the second predictor coefficient has to be multiplied with the sample before that, etc.</t> | |||
</section> | </section> | |||
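A minimal sketch of the linear prediction step described above, assuming integer samples and an arithmetic (sign-preserving) right shift, which is how Python's `>>` behaves on negative integers. The names `lpc_predict` and `decode_lpc` are hypothetical, and parsing of the bitstream fields and the residual decoder are out of scope here.

```python
def lpc_predict(history, coeffs, shift):
    """Multiply each coefficient with its past sample, sum, then shift.

    coeffs[0] pairs with the sample directly before the predicted one,
    coeffs[1] with the sample before that, and so on.
    """
    total = sum(c * history[-1 - i] for i, c in enumerate(coeffs))
    return total >> shift  # assumption: arithmetic right shift


def decode_lpc(warmup, coeffs, shift, residuals):
    """Rebuild samples sequentially from warm-up samples and residuals."""
    if shift < 0:
        raise ValueError("prediction right shift must not be negative")
    if len(warmup) != len(coeffs):
        raise ValueError("number of warm-up samples must equal the order")
    samples = list(warmup)
    for r in residuals:
        samples.append(lpc_predict(samples, coeffs, shift) + r)
    return samples
```

As a usage example, `decode_lpc([1, 2], [2, -1], 0, [0, 0, 0])` extends a linear ramp, and the coefficients `[4, -2]` with a shift of 1 give the same predictions, which illustrates how the shift scales the quantized coefficients back down.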
<section anchor="coded-residual"><name>Coded residual</name> | <section anchor="coded-residual"><name>Coded Residual</name> | |||
<t>The first two bits in a coded residual indicate which coding method is used. See the table below.</t> | <t>The first two bits in a coded residual indicate which coding method is used. See the table below.</t> | |||
<table> | <table anchor="coded-residual-table"> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="right">Value</th> | <th align="left">Value</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="right">0b00</td> | <td align="left">0b00</td> | |||
<td align="left">partitioned Rice code with 4-bit parameters</td> | <td align="left">Partitioned Rice code with 4-bit parameters</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="right">0b01</td> | <td align="left">0b01</td> | |||
<td align="left">partitioned Rice code with 5-bit parameters</td> | <td align="left">Partitioned Rice code with 5-bit parameters</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="right">0b10 - 0b11</td> | <td align="left">0b10 - 0b11</td> | |||
<td align="left">reserved</td> | <td align="left">Reserved</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>Both defined coding methods work the same way, but differ in the number of bits used for Rice parameters. The 4 bits that directly follow the coding method bits form the partition order, which is an unsigned number. The rest of the coded residual consists of 2^(partition order) partitions. For example, if the 4 bits are 0b1000, the partition order is 8 and the residual is split up into 2^8 = 256 partitions.</t> | </table><t>Both defined coding methods work the same way but differ in the number of bits used for Rice parameters. The 4 bits that directly follow the coding method bits form the partition order, which is an unsigned number. The rest of the coded residual consists of 2<sup>(partition order)</sup> partitions. For example, if the 4 bits are 0b1000, the partition order is 8, and the residual is split up into 2<sup>8</sup> = 256 partitions.</t> | |||
<t>Each partition contains a certain number of residual samples. The number of residual samples in the first partition is equal to (block size >> partition order) - predictor order, i.e., the block size divided by the number of partitions minus the predictor order. In all other partitions, the number of residual samples is equal to (block size >> partition order).</t> | <t>Each partition contains a certain number of residual samples. The number of residual samples in the first partition is equal to (block size >> partition order) - predictor order, i.e., the block size divided by the number of partitions minus the predictor order. In all other partitions, the number of residual samples is equal to (block size >> partition order).</t> | |||
<t>The partition order MUST be such that the block size is evenly divisible by t | <t>The partition order <bcp14>MUST</bcp14> be such that the block size is evenly | |||
he number of partitions. This means, for example, that for all odd block sizes, | divisible by the number of partitions. | |||
only partition order 0 is allowed. The partition order also MUST be such that t | This means, for example, that only partition order 0 is allowed for all odd bloc | |||
he (block size >> partition order) is larger than the predictor order. Thi | k sizes. | |||
s means, for example, that with a block size of 4096 and a predictor order of 4, | The partition order also <bcp14>MUST</bcp14> be such that the (block size >&g | |||
the partition order cannot be larger than 9.</t> | t; partition order) is larger than the predictor order. This means, for example, | |||
<t>Each partition starts with a parameter. If the coded residual of a subframe i | that with a block size of 4096 and a predictor order of 4, the partition order | |||
s one with 4-bit Rice parameters (see the table at the start of this section), t | cannot be larger than 9.</t> | |||
he first 4 bits of each partition are either a Rice parameter or an escape code. | <t>Each partition starts with a parameter. If the coded residual of a subframe i | |||
These 4 bits indicate an escape code if they are 0b1111, otherwise they contain | s one with 4-bit Rice parameters (see <xref target="coded-residual-table"/>), th | |||
the Rice parameter as an unsigned number. If the coded residual of the current | e first 4 bits of each partition are either a Rice parameter or an escape code. | |||
subframe is one with 5-bit Rice parameters, the first 5 bits of each partition i | These 4 bits indicate an escape code if they are 0b1111; otherwise, they contain | |||
ndicate an escape code if they are 0b11111, otherwise, they contain the Rice par | the Rice parameter as an unsigned number. If the coded residual of the current | |||
ameter as an unsigned number as well.</t> | subframe is one with 5-bit Rice parameters, the first 5 bits of each partition i | |||
ndicate an escape code if they are 0b11111; otherwise, they contain the Rice par | ||||
ameter as an unsigned number as well.</t> | ||||
<section anchor="escaped-partition"><name>Escaped partition</name> | <section anchor="escaped-partition"><name>Escaped Partition</name> | |||
<t>If an escape code was used, the partition does not contain a variable-length | <t>If an escape code was used, the partition does not contain a variable-length | |||
Rice coded residual, but a fixed-length unencoded residual. Directly following t | Rice-coded residual; rather, it contains a fixed-length unencoded residual. Dire | |||
he escape code are 5 bits containing the number of bits with which each residual | ctly following the escape code are 5 bits containing the number of bits with whi | |||
sample is stored, as an unsigned number. The residual samples themselves are st | ch each residual sample is stored, as an unsigned number. The residual samples t | |||
ored signed two's complement. For example, when a partition is escaped and each | hemselves are stored signed two's complement. For example, when a partition is e | |||
residual sample is stored with 3 bits, the number -1 is represented as 0b111.</t | scaped and each residual sample is stored with 3 bits, the number -1 is represen | |||
> | ted as 0b111.</t> | |||
<t>Note that it is possible that the number of bits with which each sample is st | <t>Note that it is possible that the number of bits with which each sample is st | |||
ored is 0, which means all residual samples in that partition have a value of 0 | ored is 0, which means that all residual samples in that partition have a value | |||
and that no bits are used to store the samples. In that case, the partition cont | of 0 and that no bits are used to store the samples. In that case, the partition | |||
ains nothing except the escape code and 0b00000.</t> | contains nothing except the escape code and 0b00000.</t> | |||
</section> | </section> | |||
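A minimal sketch of reading an escaped partition, assuming the bitstream is available as a string of `0` and `1` characters that starts directly after the escape code. The function name `decode_escaped_partition` and the string-based bit representation are assumptions made for illustration only.

```python
def decode_escaped_partition(bits, sample_count):
    """Decode an escaped partition from a string of '0'/'1' characters.

    `bits` must start directly after the escape code. Returns the
    decoded samples and the number of bits consumed. Hypothetical helper.
    """
    width = int(bits[:5], 2)  # 5-bit unsigned number: bits per sample
    pos = 5
    samples = []
    for _ in range(sample_count):
        if width == 0:
            # No bits are stored; every sample in the partition is 0.
            samples.append(0)
            continue
        raw = int(bits[pos:pos + width], 2)
        # Interpret the raw value as signed two's complement.
        if raw >= 1 << (width - 1):
            raw -= 1 << width
        samples.append(raw)
        pos += width
    return samples, pos
```

With a width of 3 bits, the pattern 0b111 decodes to -1, matching the example in the text.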
<section anchor="rice-code"><name>Rice code</name> | <section anchor="rice-code"><name>Rice Code</name> | |||
<t>If a Rice parameter was provided for a certain partition, that partition cont | <t>If a Rice parameter was provided for a certain partition, that partition cont | |||
ains a Rice coded residual. The residual samples, which are signed numbers, are | ains a Rice-coded residual. The residual samples, which are signed numbers, are | |||
represented by unsigned numbers in the Rice code. For positive numbers, the repr | represented by unsigned numbers in the Rice code. For positive numbers, the repr | |||
esentation is the number doubled, for negative numbers, the representation is th | esentation is the number doubled. For negative numbers, the representation is th | |||
e number multiplied by -2 and has 1 subtracted. This representation of signed nu | e number multiplied by -2 and with 1 subtracted. This representation of signed n | |||
mbers is also known as zigzag encoding. The zigzag encoded residual is called th | umbers is also known as zigzag encoding. The zigzag-encoded residual is called t | |||
e folded residual.</t> | he folded residual.</t> | |||
<t>Each folded residual sample is then split into two parts, a most-significant part and a least-significant part. The Rice parameter at the start of each partition determines where that split lies: it is the number of bits in the least-significant part. Each residual sample is then stored by coding the most-significant part as unary, followed by the least-significant part as binary.</t> | <t>Each folded residual sample is then split into two parts, a most-significant part and a least-significant part. The Rice parameter at the start of each partition determines where that split lies: it is the number of bits in the least-significant part. Each residual sample is then stored by coding the most-significant part as unary, followed by the least-significant part as binary.</t> | |||
<t>For example, take a partition with Rice parameter 3 containing a folded resid | <t>For example, take a partition with Rice parameter 3 containing a folded resid | |||
ual sample with 38 as its value, which is 0b100110 in binary. The most-significa | ual sample with 38 as its value, which is 0b100110 in binary. | |||
nt part is 0b100 (4) and is stored unary as 0b00001. The least-significant part | The most-significant part is 0b100 (4) and is stored in unary form as 0b00001. T | |||
is 0b110 (6) and is stored as is. The Rice code word is thus 0b00001110. The Ric | he least-significant part is 0b110 (6) and is stored as is. The Rice code word i | |||
e code words for all residual samples in a partition are stored consecutively.</ | s thus 0b00001110. The Rice code words for all residual samples in a partition a | |||
t> | re stored consecutively.</t> | |||
<t>To decode a Rice code word, zero bits must be counted until encountering a on | ||||
e bit, after which a number of bits given by the Rice parameter must be read. Th | <t>To decode a Rice code word, zero bits must be counted until encountering a on | |||
e count of zero bits is shifted left by the Rice parameter (i.e., multiplied by | e bit, after which a number of bits given by the Rice parameter must be read. | |||
2 raised to the power Rice parameter) and bitwise ORed with (i.e., added to) the | The count of zero bits is shifted left by the Rice parameter (i.e., multiplied b | |||
read value. This is the folded residual value. An even folded residual value is | y 2 raised to the power Rice parameter) and bitwise ORed with (i.e., added to) t | |||
shifted right 1 bit (i.e., divided by two) to get the (unfolded) residual value | he read value. This is the folded residual value. An even folded residual value | |||
. An odd folded residual value is shifted right 1 bit and then has all bits flip | is shifted right 1 bit (i.e., divided by 2) to get the (unfolded) residual value | |||
ped (1 added to and divided by -2) to get the (unfolded) residual value, subject | . An odd folded residual value is shifted right 1 bit and then has all bits flip | |||
to negative numbers being signed two's complement on the decoding machine.</t> | ped (1 added to and divided by -2) to get the (unfolded) residual value, subject | |||
to negative numbers being signed two's complement on the decoding machine.</t> | ||||
<t><xref target="examples"></xref> shows decoding of a complete coded residual.</t> | <t><xref target="examples"></xref> shows decoding of a complete coded residual.</t> | |||
</section> | </section> | |||
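The folding, encoding, and decoding steps of the Rice code section can be sketched as below. This is an illustration under assumptions: bits are modeled as a string of `0` and `1` characters, and the function names (`fold`, `unfold`, `rice_encode`, `rice_decode`) are hypothetical.

```python
def fold(residual):
    """Zigzag-encode a signed residual into an unsigned folded residual."""
    return 2 * residual if residual >= 0 else -2 * residual - 1


def unfold(folded):
    """Invert fold(): shift right 1 bit, then flip all bits if odd."""
    half = folded >> 1
    return half if folded % 2 == 0 else ~half


def rice_encode(residual, rice_parameter):
    """Return the Rice code word for one residual as a bit string."""
    folded = fold(residual)
    # Most-significant part in unary: that many zero bits, then a one bit.
    unary = "0" * (folded >> rice_parameter) + "1"
    # Least-significant part as plain binary, rice_parameter bits wide.
    low = folded & ((1 << rice_parameter) - 1)
    binary = format(low, "b").zfill(rice_parameter) if rice_parameter else ""
    return unary + binary


def rice_decode(bits, rice_parameter, pos=0):
    """Decode one code word starting at pos; return (residual, new pos)."""
    zeros = 0
    while bits[pos] == "0":  # count zero bits until a one bit is found
        zeros += 1
        pos += 1
    pos += 1                 # skip the terminating one bit
    low = int(bits[pos:pos + rice_parameter], 2) if rice_parameter else 0
    pos += rice_parameter
    return unfold((zeros << rice_parameter) | low), pos
```

For the example in the text, the residual 19 folds to 38, and `rice_encode(19, 3)` yields the code word `00001110`.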
<section anchor="residual-sample-value-limit"><name>Residual sample value limit< | <section anchor="residual-sample-value-limit"><name>Residual Sample Value Limit< | |||
/name> | /name> | |||
<t>All residual sample values MUST be representable in the range offered by a 32 | <t>All residual sample values <bcp14>MUST</bcp14> be representable in the range | |||
-bit integer, signed one's complement. Equivalently, all residual sample values | offered by a 32-bit integer, signed one's complement. Equivalently, all residual | |||
MUST fall in the range offered by a 32-bit integer signed two's complement exclu | sample values <bcp14>MUST</bcp14> fall in the range offered by a 32-bit integer | |||
ding the most negative possible value of that range. This means residual sample | signed two's complement, excluding the most negative possible value of that ran | |||
values MUST NOT have an absolute value equal to, or larger than, 2 to the power | ge. This means residual sample values <bcp14>MUST NOT</bcp14> have an absolute v | |||
31. A FLAC encoder MUST make sure of this. If a FLAC encoder is, for a certain s | alue equal to, or larger than, 2 to the power 31. A FLAC encoder <bcp14>MUST</bc | |||
ubframe, unable to find a suitable predictor for which all residual samples fall | p14> make sure of this. If a FLAC encoder is, for a certain subframe, unable to | |||
within said range, it MUST default to writing a verbatim subframe. <xref target | find a suitable predictor for which all residual samples fall within said range, | |||
="numerical-considerations"></xref> explains in which circumstances residual sam | it <bcp14>MUST</bcp14> default to writing a verbatim subframe. <xref target="nu | |||
ples are already implicitly representable in said range and thus an additional c | merical-considerations"></xref> explains in which circumstances residual samples | |||
heck is not needed.</t> | are already implicitly representable in said range; thus, an additional check i | |||
<t>The reason for this limit is to ensure that decoders can use 32-bit integers | s not needed.</t> | |||
when processing residuals, simplifying decoding. The reason the most negative va | <t>The reason for this limit is to ensure that decoders can use 32-bit integers | |||
lue of a 32-bit int signed two's complement is specifically excluded is to preve | when processing residuals, simplifying decoding. The reason the most negative va | |||
nt decoders from having to implement specific handling of that value, as it cann | lue of a 32-bit integer signed two's complement is specifically excluded is to p | |||
ot be negated within a 32-bit signed int, and most library routines calculating | revent decoders from having to implement specific handling of that value, as it | |||
an absolute value have undefined behavior on processing that value.</t> | cannot be negated within a 32-bit signed integer, and most library routines calc | |||
ulating an absolute value have undefined behavior for processing that value.</t> | ||||
</section> | </section> | |||
</section> | </section> | |||
</section> | </section> | |||
<section anchor="frame-footer"><name>Frame footer</name> | <section anchor="frame-footer"><name>Frame Footer</name> | |||
<t>Following the last subframe is the frame footer. If the last subframe is not | <t>Following the last subframe is the frame footer. If the last subframe is not | |||
byte aligned (i.e., the number of bits required to store all subframes put toget | byte aligned (i.e., the number of bits required to store all subframes put toget | |||
her is not divisible by 8), zero bits are added until byte alignment is reached. | her is not divisible by 8), zero bits are added until byte alignment is reached. | |||
Following this is a 16-bit CRC, initialized with 0, with the polynomial x^16 + | Following this is a 16-bit CRC, initialized with 0, with the polynomial x<sup>1 | |||
x^15 + x^2 + x^0. This CRC covers the whole frame excluding the 16-bit CRC, incl | 6</sup> + x<sup>15</sup> + x<sup>2</sup> + x<sup>0</sup>. This CRC covers the wh | |||
uding the sync code.</t> | ole frame, excluding the 16-bit CRC but including the sync code.</t> | |||
</section> | </section> | |||
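A bitwise sketch of the frame footer CRC. The polynomial x^16 + x^15 + x^2 + x^0 corresponds to the constant 0x8005 once the implicit x^16 term is dropped. The sketch assumes bits are processed most-significant bit first with no final XOR, which this section does not state explicitly; the function name `flac_crc16` is hypothetical.

```python
def flac_crc16(data):
    """Compute a 16-bit CRC with polynomial 0x8005, initialized with 0.

    Assumption: bits are processed MSB first and no final XOR is applied.
    """
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8005) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

Under these assumptions, running the function over a whole frame including its big-endian 16-bit CRC yields 0, which gives a decoder a simple validity check.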
</section> | </section> | |||
<section anchor="container-mappings"><name>Container mappings</name> | <section anchor="container-mappings"><name>Container Mappings</name> | |||
<t>The FLAC format can be used without any container, as it already provides for | <t>The FLAC format can be used without any container, as it already provides for | |||
the most basic features normally associated with a container. However, the func | the most basic features normally associated with a container. However, the func | |||
tionality this basic container provides is rather limited, and for more advanced | tionality this basic container provides is rather limited, and for more advanced | |||
features, like combining FLAC audio with video, it needs to be encapsulated by | features (such as combining FLAC audio with video), it needs to be encapsulated | |||
a more capable container. This presents a problem: because of these container f | by a more capable container. This presents a problem: because of these containe | |||
eatures, the FLAC format mixes data that belongs to the encoded data (like block | r features, the FLAC format mixes data that belongs to the encoded data (like bl | |||
size and sample rate) with data that belongs to the container (like checksum an | ock size and sample rate) with data that belongs to the container (like checksum | |||
d timecode). The choice was made to encapsulate FLAC frames as they are, which m | and timecode). The choice was made to encapsulate FLAC frames as they are, whic | |||
eans some data will be duplicated and potentially deviating between the FLAC fra | h means some data will be duplicated and potentially deviating between the FLAC | |||
mes and the encapsulating container.</t> | frames and the encapsulating container.</t> | |||
<t>As FLAC frames are completely independent of each other, container format features handling dependencies do not need to be used. For example, all FLAC frames embedded in Matroska are marked as keyframes when they are stored in a SimpleBlock, and tracks in an MP4 file containing only FLAC frames do not need a sync sample box.</t> | <t>As FLAC frames are completely independent of each other, container format features handling dependencies do not need to be used. For example, all FLAC frames embedded in Matroska are marked as keyframes when they are stored in a SimpleBlock, and tracks in an MP4 file containing only FLAC frames do not need a sync sample box.</t> | |||
<section anchor="ogg-mapping"><name>Ogg mapping</name> | <section anchor="ogg-mapping"><name>Ogg Mapping</name> | |||
<t>The Ogg container format is defined in <xref target="RFC3533"></xref>. The first packet of a logical bitstream carrying FLAC data is structured according to the following table.</t> | <t>The Ogg container format is defined in <xref target="RFC3533"></xref>. The first packet of a logical bitstream carrying FLAC data is structured according to the following table.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Data</th> | <th align="left">Data</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">5 bytes</td> | <td align="left">5 bytes</td> | |||
<td align="left">Bytes <tt>0x7F 0x46 0x4C 0x41 0x43</tt> (as also defined by <xref target="RFC5334"></xref>)</td> | <td align="left">Bytes <tt>0x7F 0x46 0x4C 0x41 0x43</tt> (as also defined by <xref target="RFC5334"></xref>).</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">2 bytes</td> | <td align="left">2 bytes</td> | |||
<td align="left">Version number of the FLAC-in-Ogg mapping. These bytes are <tt>0x01 0x00</tt>, meaning version 1.0 of the mapping.</td> | <td align="left">Version number of the FLAC-in-Ogg mapping. These bytes are <tt>0x01 0x00</tt>, meaning version 1.0 of the mapping.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">2 bytes</td> | <td align="left">2 bytes</td> | |||
<td align="left">Number of header packets (excluding the first header packet) as an unsigned number coded big-endian.</td> | <td align="left">Number of header packets (excluding the first header packet) as an unsigned number coded big-endian.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">4 bytes</td> | <td align="left">4 bytes</td> | |||
<td align="left">The <tt>fLaC</tt> signature</td> | <td align="left">The <tt>fLaC</tt> signature.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">4 bytes</td> | <td align="left">4 bytes</td> | |||
<td align="left">A metadata block header for the streaminfo block</td> | <td align="left">A metadata block header for the streaminfo metadata block.</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">34 bytes</td> | <td align="left">34 bytes</td> | |||
<td align="left">A streaminfo metadata block</td> | <td align="left">A streaminfo metadata block.</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>The number of header packets MAY be 0, which means the number of pack | </table><t>The number of header packets <bcp14>MAY</bcp14> be 0, which means the | |||
ets that follow is unknown. This first packet MUST NOT share a Ogg page with any | number of packets that follow is unknown. This first packet <bcp14>MUST NOT</bc | |||
other packets. This means the first page of a logical stream of FLAC-in-Ogg is | p14> share a Ogg page with any other packets. This means the first page of a log | |||
always 79 bytes.</t> | ical stream of FLAC-in-Ogg is always 79 bytes.</t> | |||
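The layout of the first FLAC-in-Ogg packet in the table above can be sketched as follows. The function name `flac_in_ogg_first_packet` and its parameters are hypothetical; the metadata block header and the streaminfo metadata block are treated as opaque byte strings supplied by the caller.

```python
import struct


def flac_in_ogg_first_packet(header_packets, block_header, streaminfo):
    """Assemble the first packet of a FLAC-in-Ogg logical bitstream.

    Hypothetical helper: block_header (4 bytes) and streaminfo (34 bytes)
    are passed in as already-encoded bytes and are not validated further.
    """
    if len(block_header) != 4 or len(streaminfo) != 34:
        raise ValueError("unexpected metadata block header or streaminfo size")
    return (
        b"\x7fFLAC"                           # 5 bytes: 0x7F 0x46 0x4C 0x41 0x43
        + b"\x01\x00"                         # 2 bytes: mapping version 1.0
        + struct.pack(">H", header_packets)   # 2 bytes: big-endian count, 0 = unknown
        + b"fLaC"                             # 4 bytes: fLaC signature
        + block_header                        # 4 bytes
        + streaminfo                          # 34 bytes
    )
```

The packet is 51 bytes long. An Ogg page header is 27 bytes and a 51-byte packet needs a single segment table entry, so a page holding only this packet is 27 + 1 + 51 = 79 bytes, in line with the size stated in the text.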
<t>Following the first packet are one or more header packets, each of which cont | <t>Following the first packet are one or more header packets, each of which cont | |||
ains a single metadata block. The first of these packets SHOULD be a Vorbis comm | ains a single metadata block. The first of these packets <bcp14>SHOULD</bcp14> b | |||
ent metadata block, for historic reasons. This is contrary to unencapsulated FLA | e a Vorbis comment metadata block for historic reasons. This is contrary to unen | |||
C streams, where the order of metadata blocks is not important except for the st | capsulated FLAC streams, where the order of metadata blocks is not important exc | |||
reaminfo block and where a Vorbis comment metadata block is optional.</t> | ept for the streaminfo metadata block and where a Vorbis comment metadata block | |||
<t>Following the header packets are audio packets. Each audio packet contains a | is optional.</t> | |||
single FLAC frame. The first audio packet MUST start on a new Ogg page, i.e., th | <t>Following the header packets are audio packets. Each audio packet contains a | |||
e last metadata block MUST finish its page before any audio packets are encapsul | single FLAC frame. The first audio packet <bcp14>MUST</bcp14> start on a new Ogg | |||
ated.</t> | page, i.e., the last metadata block <bcp14>MUST</bcp14> finish its page before | |||
<t>The granule position of all pages containing header packets MUST be 0. For pa | any audio packets are encapsulated.</t> | |||
ges containing audio packets, the granule position is the number of the last sam | <t>The granule position of all pages containing header packets <bcp14>MUST</bcp1 | |||
ple contained in the last completed packet in the frame. The sample numbering co | 4> be 0. For pages containing audio packets, the granule position is the number | |||
nsiders interchannel samples. If a page contains no packet end (e.g., when it on | of the last sample contained in the last completed packet in the frame. The samp | |||
ly contains the start of a large packet, which continues on the next page), then | le numbering considers interchannel samples. If a page contains no packet end (e | |||
the granule position is set to the maximum value possible, i.e., <tt>0xFF 0xFF | .g., when it only contains the start of a large packet that continues on the nex | |||
0xFF 0xFF 0xFF 0xFF 0xFF 0xFF</tt>.</t> | t page), then the granule position is set to the maximum value possible, i.e., < | |||
<t>The granule position of the first audio data page with a completed packet MAY | tt>0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF</tt>.</t> | |||
be larger than the number of samples contained in packets that complete on that | <t>The granule position of the first audio data page with a completed packet <bc | |||
page. In other words, the apparent sample number of the first sample in the str | p14>MAY</bcp14> be larger than the number of samples contained in packets that c | |||
eam following from the granule position and the audio data MAY be larger than 0. | omplete on that page. In other words, the apparent sample number of the first sa | |||
This allows, for example, a server to cast a live stream to several clients tha | mple in the stream following from the granule position and the audio data <bcp14 | |||
t joined at different moments, without rewriting the granule position for each c | >MAY</bcp14> be larger than 0. This allows, for example, a server to cast a liv | |||
lient.</t> | e stream to several clients that joined at different moments without rewriting t | |||
he granule position for each client.</t> | ||||
<t>If an audio stream is encoded where audio properties (sample rate, number of channels, or bit depth) change at some point in the stream, this should be dealt with by finishing encoding of the current Ogg stream and starting a new Ogg stream, concatenated to the previous one. This is called chaining in Ogg. See the Ogg specification <xref target="RFC3533"></xref> for details.</t> | <t>If an audio stream is encoded where audio properties (sample rate, number of channels, or bit depth) change at some point in the stream, this should be dealt with by finishing encoding of the current Ogg stream and starting a new Ogg stream, concatenated to the previous one. This is called chaining in Ogg. See the Ogg specification <xref target="RFC3533"></xref> for details.</t> | |||
</section> | </section> | |||
<section anchor="matroska-mapping"><name>Matroska mapping</name> | <section anchor="matroska-mapping"><name>Matroska Mapping</name> | |||
<t>The Matroska container format is defined in <xref target="I-D.ietf-cellar-mat | <t>The Matroska container format is defined in <xref target="RFC9559"></xref>. T | |||
roska"></xref>. The codec ID (EBML path <tt>\Segment\Tracks\TrackEntry\CodecID</ | he codec ID (EBML path <tt>\Segment\Tracks\TrackEntry\CodecID</tt>) assigned to | |||
tt>) assigned to signal tracks carrying FLAC data is <tt>A_FLAC</tt> in ASCII. A | signal tracks carrying FLAC data is <tt>A_FLAC</tt> in ASCII. All FLAC data befo | |||
ll FLAC data before the first audio frame (i.e., the <tt>fLaC</tt> ASCII signatu | re the first audio frame (i.e., the <tt>fLaC</tt> ASCII signature and all metada | |||
re and all metadata blocks) is stored as CodecPrivate data (EBML path <tt>\Segme | ta blocks) is stored as CodecPrivate data (EBML path <tt>\Segment\Tracks\TrackEn | |||
nt\Tracks\TrackEntry\CodecPrivate</tt>).</t> | try\CodecPrivate</tt>).</t> | |||
<t>Each FLAC frame (including all of its subframes) is treated as a single frame | <t>Each FLAC frame (including all of its subframes) is treated as a single frame | |||
in the Matroska context.</t> | in the context of Matroska.</t> | |||
<t>If an audio stream is encoded where audio properties (sample rate, number of channels, or bit depth) change at some point in the stream, this should be dealt with by finishing the current Matroska segment and starting a new one with the new properties.</t> | <t>If an audio stream is encoded where audio properties (sample rate, number of channels, or bit depth) change at some point in the stream, this should be dealt with by finishing the current Matroska segment and starting a new one with the new properties.</t> | |||
</section> | </section> | |||
<section anchor="iso-base-media-file-format-mp4-mapping"><name>ISO Base Media File Format (MP4) mapping</name> | <section anchor="iso-base-media-file-format-mp4-mapping"><name>ISO Base Media File Format (MP4) Mapping</name> | |||
<t>The full encapsulation definition of FLAC audio in MP4 files was deemed too extensive to include in this document. A definition document can be found at <xref target="FLAC-in-MP4-specification"></xref>.</t> | <t>The full encapsulation definition of FLAC audio in MP4 files was deemed too extensive to include in this document. A definition document can be found at <xref target="FLAC-in-MP4-specification"></xref>.</t> | |||
</section> | </section> | |||
</section> | </section> | |||
<section anchor="implementation-status"><name>Implementation status</name> | ||||
<t>Note to RFC Editor - please remove this entire section before publication, as | ||||
well as the reference to RFC 7942.</t> | ||||
<t>This section records the status of known implementations of the FLAC format, | ||||
and is based on a proposal described in <xref target="RFC7942"></xref>. Please n | ||||
ote that the listing of any individual implementation here does not imply endors | ||||
ement by the IETF. Furthermore, no effort has been spent to verify the informati | ||||
on presented here that was supplied by IETF contributors. This is not intended a | ||||
s, and must not be construed to be, a catalog of available implementations or th | ||||
eir features. Readers are advised to note that other implementations may exist. | ||||
</t> | ||||
<t>A reference encoder and decoder implementation of the FLAC format exists, kno | ||||
wn as libFLAC, maintained by Xiph.Org. It can be found at <eref target="https:// | ||||
xiph.org/flac/">https://xiph.org/flac/</eref> Note that while all libFLAC compon | ||||
ents are licensed under 3-clause BSD, the flac and metaflac command line tools o | ||||
ften supplied together with libFLAC are licensed under GPL.</t> | ||||
<t>Another completely independent implementation of both encoder and decoder of | ||||
the FLAC format is available in libavcodec, maintained by FFmpeg, licensed under | ||||
LGPL 2.1 or later. It can be found at <eref target="https://ffmpeg.org/">https: | ||||
//ffmpeg.org/</eref></t> | ||||
<t>A list of other implementations and an overview of which parts of the format | ||||
they implement can be found at <xref target="FLAC-wiki-implementations"></xref>. | ||||
</t> | ||||
</section> | ||||
<section anchor="security-considerations"><name>Security Considerations</name> | <section anchor="security-considerations"><name>Security Considerations</name> | |||
<t>Like any other codec (such as <xref target="RFC6716"></xref>), FLAC should no | <t>Like any other codec (such as <xref target="RFC6716"></xref>), FLAC should no | |||
t be used with insecure ciphers or cipher modes that are vulnerable to known pla | t be used with insecure ciphers or cipher modes that are vulnerable to known pla | |||
intext attacks. Some of the header bits as well as the padding are easily predic | intext attacks. Some of the header bits, as well as the padding, are easily pred | |||
table.</t> | ictable.</t> | |||
<t>Implementations of the FLAC codec need to take appropriate security considera | <t>Implementations of the FLAC codec need to take appropriate security considera | |||
tions into account. Section 2.1 of <xref target="RFC4732"></xref> provides gener | tions into account. <xref target="RFC4732" sectionFormat="of" section="2.1"/> pr | |||
al information on DoS attacks on end-systems and describes some mitigation strat | ovides general information on DoS attacks on end systems and describes some miti | |||
egies. Areas of concern specific to FLAC follow.</t> | gation strategies. Areas of concern specific to FLAC follow.</t> | |||
<t>It is extremely important for the decoder to be robust against malformed payloads. Payloads that do not conform to this specification <bcp14>MUST NOT</bcp14> cause the decoder to overrun its allocated memory or take an excessive amount of resources to decode. An overrun in allocated memory could lead to arbitrary code execution by an attacker. The same applies to the encoder, even though problems with encoders are typically rarer. Malformed audio streams <bcp14>MUST NOT</bcp14> cause the encoder to misbehave because this would allow an attacker to attack transcoding gateways.</t> | <t>It is extremely important for the decoder to be robust against malformed payloads. Payloads that do not conform to this specification <bcp14>MUST NOT</bcp14> cause the decoder to overrun its allocated memory or take an excessive amount of resources to decode. An overrun in allocated memory could lead to arbitrary code execution by an attacker. The same applies to the encoder, even though problems with encoders are typically rarer. Malformed audio streams <bcp14>MUST NOT</bcp14> cause the encoder to misbehave because this would allow an attacker to attack transcoding gateways.</t> | |||
<t>As with all compression algorithms, both encoding and decoding can produce an | <t>As with all compression algorithms, both encoding and decoding can produce an | |||
output much larger than the input. For decoding, the most extreme possible case | output much larger than the input. For decoding, the most extreme possible case | |||
of this is a frame with eight constant subframes of block size 65535 and coding | of this is a frame with eight constant subframes of block size 65535 and coding | |||
for 32-bit PCM. This frame is only 49 bytes in size, but codes for more than 2 | for 32-bit PCM. This frame is only 49 bytes in size but codes for more than 2 m | |||
megabytes of uncompressed PCM data. For encoding, it is possible to have an even | egabytes of uncompressed PCM data. For encoding, it is possible to have an even | |||
larger size increase, although such behavior is generally considered faulty. Th | larger size increase, although such behavior is generally considered faulty. Thi | |||
is happens if the encoder chooses a rice parameter that does not fit with the re | s happens if the encoder chooses a Rice parameter that does not fit with the res | |||
sidual that has to be encoded. In such a case, very long unary coded symbols can | idual that has to be encoded. In such a case, very long unary-coded symbols can | |||
appear, in the most extreme case, more than 4 gigabytes per sample. Decoder and | appear (in the most extreme case, more than 4 gigabytes per sample). Decoder and | |||
encoder implementors are advised to take precautions to prevent excessive resou | encoder implementors are advised to take precautions to prevent excessive resou | |||
rce utilization in such cases.</t> | rce utilization in such cases.</t> | |||
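The worst-case decoding expansion mentioned above can be checked with simple arithmetic. The helper name `decoded_pcm_bytes` is hypothetical and assumes samples are stored as whole bytes.

```python
def decoded_pcm_bytes(channels, block_size, bits_per_sample):
    """Size in bytes of the uncompressed PCM produced by one frame.

    Hypothetical helper; assumes bits_per_sample is a multiple of 8.
    """
    return channels * block_size * (bits_per_sample // 8)


# Eight constant subframes, block size 65535, 32-bit PCM.
worst_case = decoded_pcm_bytes(8, 65535, 32)
```

Here `worst_case` is 2,097,120 bytes, so the 49-byte frame from the text expands by a factor of more than 42,000. A decoder that caps the output it will allocate per frame stays within this bound.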
<t>Where metadata is handled, implementors are advised to either thoroughly test | <t>Where metadata is handled, implementors are advised to either thoroughly test | |||
the handling of extreme cases or impose reasonable limits beyond the limits of | the handling of extreme cases or impose reasonable limits beyond the limits of | |||
this specification document. For example, a single Vorbis comment metadata block | this specification. For example, a single Vorbis comment metadata block can cont | |||
can contain millions of valid fields. It is unlikely such a limit is ever reach | ain millions of valid fields. It is unlikely such a limit is ever reached except | |||
ed except in a potentially malicious file. Likewise, the media type and descript | in a potentially malicious file. Likewise, the media type and description of a | |||
ion of a picture metadata block can be millions of characters long, despite ther | picture metadata block can be millions of characters long, despite there being n | |||
e being no reasonable use of such contents. One possible use case for very long | o reasonable use of such contents. One possible use case for very long character | |||
character strings is in lyrics, which can be stored in Vorbis comment metadata b | strings is in lyrics, which can be stored in Vorbis comment metadata block fiel | |||
lock fields.</t> | ds.</t> | |||
<t>Various kinds of metadata blocks contain length fields or field counts. While | <t>Various kinds of metadata blocks contain length fields or field counts. While | |||
reading a block following these lengths or counts, a decoder MUST make sure hig | reading a block following these lengths or counts, a decoder <bcp14>MUST</bcp14 | |||
her-level lengths or counts (most importantly, the length field of the metadata | > make sure higher-level lengths or counts (most importantly, the length field o | |||
block itself) are not exceeded. As some of these length fields code string lengt | f the metadata block itself) are not exceeded. | |||
hs, memory for which must be allocated, parsers MUST first verify that a block i | As some of these length fields code string lengths and memory must be allocated | |||
s valid before allocating memory based on its contents, except when explicitly i | for that, parsers <bcp14>MUST</bcp14> first verify that a block is valid before | |||
nstructed to salvage data from a malformed file.</t> | allocating memory based on its contents, except when explicitly instructed to sa | |||
<t>Metadata blocks can also contain references, e.g., the picture metadata block | lvage data from a malformed file.</t> | |||
can contain a URI. When following an URI, the security considerations of [RFC39 | ||||
86] apply. Applications MUST obtain explicit user approval to retrieve resources | <t>Metadata blocks can also contain references, e.g., the picture metadata block | |||
via remote protocols. Following external URIs introduces a tracking risk from o | can contain a URI. When following a URI, the security considerations of <xref t | |||
n-path observers and the operator of the service hosting the URI. Likewise, the | arget="RFC3986"/> apply. Applications <bcp14>MUST</bcp14> obtain explicit user a | |||
choice of scheme, if it isn’t protected like https, could also introduce integri | pproval to retrieve resources via remote protocols. Following external URIs intr | |||
ty attacks by an on-path observer. A malicious operator of the service hosting t | oduces a tracking risk from on-path observers and the operator of the service ho | |||
he URI can return arbitrary content that the parser will read. Also, such retrie | sting the URI. Likewise, the choice of scheme, if it isn't protected like https, | |||
vals can be used in a DDoS attack when the URI points to a potential victim. The | could also introduce integrity attacks by an on-path observer. A malicious oper | |||
refore, applications need to ask user approval for each retrieval individually, | ator of the service hosting the URI can return arbitrary content that the parser | |||
take extra precautions when parsing retrieved data, and cache retrieved resource | will read. Also, such retrievals can be used in a DDoS attack when the URI poin | |||
s. Applications MUST obtain explicit user approval to retrieve local resources n | ts to a potential victim. Therefore, applications need to ask user approval for | |||
ot located in the same directory as the FLAC file being processed. Since relativ | each retrieval individually, take extra precautions when parsing retrieved data, | |||
e URIs are permitted, applications MUST guard against directory traversal attack | and cache retrieved resources. Applications <bcp14>MUST</bcp14> obtain explicit | |||
s and guard against a violation of a same-origin policy if such a policy is bein | user approval to retrieve local resources not located in the same directory as | |||
g enforced.</t> | the FLAC file being processed. Since relative URIs are permitted, applications < | |||
<t>Seeking in a FLAC stream that is not in a container relies on the coded numbe | bcp14>MUST</bcp14> guard against directory traversal attacks and guard against a | |||
r in frame headers and optionally a seektable metadata block. Parsers MUST emplo | violation of a same-origin policy if such a policy is being enforced.</t> | |||
y thorough checks on whether a found coded number or seekpoint is at all possibl | ||||
e, e.g., whether it is within bounds and not directly contradicting any other co | <t>Seeking in a FLAC stream that is not in a container relies on the coded numbe | |||
ded number or seekpoint that the seeking process relies on. Without these checks | r in frame headers and optionally a seek table metadata block. Parsers <bcp14>MU | |||
, seeking might get stuck in an infinite loop when numbers in frames are non-con | ST</bcp14> employ thorough checks on whether a found coded number or seek point | |||
secutive or otherwise not valid, which could be used in denial of service attack | is at all possible, e.g., whether it is within bounds and not directly contradic | |||
s.</t> | ting any other coded number or seek point that the seeking process relies on. Wi | |||
thout these checks, seeking might get stuck in an infinite loop when numbers in | ||||
frames are non-consecutive or otherwise not valid, which could be used in DoS at | ||||
tacks.</t> | ||||
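The checks on seek points described above could look like the following sketch. It assumes the seek table has already been read into `(sample_number, byte_offset, frame_samples)` tuples and that the caller knows the total number of samples and the size of the frame data in bytes. The function name and parameters are hypothetical, and the requirement that byte offsets rise together with sample numbers is an assumption of this sketch.

```python
PLACEHOLDER = 0xFFFFFFFFFFFFFFFF  # sample number marking a placeholder point


def plausible_seek_points(points, total_samples, audio_bytes):
    """Return the seek points that pass basic sanity checks.

    `points` is a list of (sample_number, byte_offset, frame_samples)
    tuples as read from a seek table. `total_samples` and `audio_bytes`
    are the stream length in samples and the size in bytes of the frame
    data; both are assumed to be known to the caller.
    """
    accepted = []
    last_sample = -1
    last_offset = -1
    for sample, offset, frame_samples in points:
        if sample == PLACEHOLDER:
            continue  # placeholders carry no seek information
        if sample >= total_samples or offset >= audio_bytes:
            continue  # out of bounds for this stream
        if sample <= last_sample or offset <= last_offset:
            continue  # contradicts an earlier accepted point
        accepted.append((sample, offset, frame_samples))
        last_sample, last_offset = sample, offset
    return accepted
```

A seek routine would apply the same kind of bounds and consistency test to coded numbers found in frame headers, and additionally cap the number of search iterations so that inconsistent numbers cannot keep it looping.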
<t>Implementors are advised to employ fuzz testing combined with different sanitizers on FLAC decoders to find security problems. Ignoring the results of CRC checks improves the efficiency of decoder fuzz testing.</t> | <t>Implementors are advised to employ fuzz testing combined with different sanitizers on FLAC decoders to find security problems. Ignoring the results of CRC checks improves the efficiency of decoder fuzz testing.</t> | |||
<t>See <xref target="FLAC-decoder-testbench"></xref> for a non-exhaustive list of FLAC files with extreme configurations that lead to crashes or reboots on some known implementations. Besides providing a starting point for security testing, this set of files can also be used to test conformance with this specification.</t> | <t>See <xref target="FLAC-decoder-testbench"></xref> for a non-exhaustive list of FLAC files with extreme configurations that lead to crashes or reboots on some known implementations. Besides providing a starting point for security testing, this set of files can also be used to test conformance with this specification.</t> | |||
<t>FLAC files may contain executable code, although the FLAC format is not designed for it and it is uncommon. One use case where FLAC is occasionally used to store executable code is when compressing images of mixed mode CDs, which contain both audio and non-audio data, of which the non-audio portion can contain executable code. In that case, the executable code is stored as if it were audio and is potentially obscured. Of course, it is also possible to store executable code as metadata, for example as a vorbis comment with help of a binary-to-text encoding or directly in an application metadata block. Applications MUST NOT execute code contained in FLAC files or present parts of FLAC files as executable code to the user, except when an application has that explicit purpose, e.g., applications reading FLAC files as disc images and presenting it as virtual disc drive.</t> | <t>FLAC files may contain executable code, although the FLAC format is not designed for it and it is uncommon. One use case where FLAC is occasionally used to store executable code is when compressing images of mixed-mode CDs, which contain both audio and non-audio data, the non-audio portion of which can contain executable code. In that case, the executable code is stored as if it were audio and is potentially obscured. Of course, it is also possible to store executable code as metadata, for example, as a Vorbis comment with help of a binary-to-text encoding or directly in an application metadata block. Applications <bcp14>MUST NOT</bcp14> execute code contained in FLAC files or present parts of FLAC files as executable code to the user, except when an application has that explicit purpose, e.g., applications reading FLAC files as disc images and presenting it as a virtual disc drive.</t> | |||
</section> | </section> | |||
<section anchor="iana-considerations"><name>IANA Considerations</name> | <section anchor="iana-considerations"><name>IANA Considerations</name> | |||
<t>This document registers one new media type, "audio/flac", as define | <t> Per this document, IANA has registered one new media type ("audio/flac") and | |||
d in the following section, and creates a new IANA registry.</t> | created a new IANA registry, as described in the subsections below.</t> | |||
<section anchor="media-type-registration"><name>Media Type Registration</name> | ||||
<section anchor="media-type-registration"><name>Media type registration</name> | <t>IANA has registered the "audio/flac" media type as follows. This media type i | |||
<t>The following information serves as the registration form for the "audio | s applicable for FLAC audio that is not packaged in a container as described in | |||
/flac" media type. This media type is applicable for FLAC audio that is not | <xref target="container-mappings"></xref>. FLAC audio packaged in such a contain | |||
packaged in a container as described in <xref target="container-mappings"></xre | er will take on the media type of that container, for example, "audio/ogg" when | |||
f>. FLAC audio packaged in such a container will take on the media type of that | packaged in an Ogg container or "video/mp4" when packaged in an MP4 container al | |||
container, for example, audio/ogg when packaged in an Ogg container, or video/mp | ongside a video track.</t> | |||
4 when packaged in an MP4 container alongside a video track.</t> | ||||
<artwork><![CDATA[Type name: audio | ||||
Subtype name: flac | ||||
Required parameters: N/A | ||||
Optional parameters: N/A | ||||
Encoding considerations: as per THISRFC | ||||
Security considerations: see the security considerations in Section | ||||
12 of THISRFC | ||||
Interoperability considerations: see the descriptions of past format | ||||
changes in Appendix B of THISRFC | ||||
Published specification: THISRFC | ||||
Applications that use this media type: ffmpeg, apache, firefox | <dl> | |||
<dt>Type name:</dt><dd>audio</dd> | ||||
Fragment identifier considerations: none | <dt>Subtype name:</dt><dd>flac</dd> | |||
Additional information: | <dt>Required parameters:</dt><dd>N/A</dd> | |||
Deprecated alias names for this type: audio/x-flac | <dt>Optional parameters:</dt><dd>N/A</dd> | |||
Magic number(s): fLaC | <dt>Encoding considerations:</dt><dd>as per RFC 9639</dd> | |||
File extension(s): flac | <dt>Security considerations:</dt><dd>See the security considerations in | |||
<xref target="security-considerations"></xref> of RFC 9639.</dd> | ||||
Macintosh file type code(s): none | <dt>Interoperability considerations:</dt><dd>See the descriptions of past | |||
format changes in <xref target="past-format-changes"/> of RFC 9639.</dd> | ||||
Uniform Type Identifier: org.xiph.flac conforms to public.audio | <dt>Published specification:</dt><dd>RFC 9639</dd> | |||
Windows Clipboard Format Name: audio/flac | <dt>Applications that use this media type:</dt><dd>FFmpeg, Apache, | |||
Firefox</dd> | ||||
Person & email address to contact for further information: | <dt>Fragment identifier considerations:</dt><dd>N/A</dd> | |||
IETF CELLAR WG cellar@ietf.org | ||||
Intended usage: COMMON | <dt>Additional information:</dt><dd> | |||
<t><br/></t> | ||||
<dl spacing="compact"> | ||||
<dt>Deprecated alias names for this type:</dt><dd>audio/x-flac</dd> | ||||
<dt>Magic number(s):</dt><dd>fLaC</dd> | ||||
<dt>File extension(s):</dt><dd>flac</dd> | ||||
<dt>Macintosh file type code(s):</dt><dd>N/A</dd> | ||||
<dt>Uniform Type Identifier:</dt><dd>org.xiph.flac conforms to public.audio</dd> | ||||
<dt>Windows Clipboard Format Name:</dt><dd>audio/flac</dd> | ||||
</dl> | ||||
</dd> | ||||
Restrictions on usage: N/A | <dt>Person & email address to contact for further | |||
information:</dt><dd>IETF CELLAR Working Group (cellar@ietf.org)</dd> | ||||
Author: IETF CELLAR WG | <dt>Intended usage:</dt><dd>COMMON</dd> | |||
Change controller: Internet Engineering Task Force | <dt>Restrictions on usage:</dt><dd>N/A</dd> | |||
(mailto:iesg@ietf.org) | ||||
Provisional registration? (standards tree only): NO | <dt>Author:</dt><dd>IETF CELLAR Working Group</dd> | |||
]]> | <dt>Change controller:</dt><dd>Internet Engineering Task Force | |||
</artwork> | (iesg@ietf.org)</dd> | |||
</dl> | ||||
</section> | </section> | |||
<section anchor="application-id-registry"><name>Application ID Registry</name> | <section anchor="application-id-registry"><name>FLAC Application Metadata | |||
<t>This document creates a new IANA registry called the "FLAC Application M | Block IDs Registry</name> | |||
etadata Block ID" registry. The values correspond to the 32-bit identifier | <t>IANA has created a new registry called the "FLAC Application Metadata Block I | |||
described in <xref target="application"></xref>.</t> | Ds" registry. The values correspond to the 32-bit identifier described in <xref | |||
<t>To register a new Application ID in this registry, one needs an Application I | target="application"></xref>.</t> | |||
D, a description, optionally a reference to a document describing the Applicatio | ||||
n ID and a Change Controller (IETF or email of registrant). The Application IDs | <t>To register a new application ID in this registry, one needs an application I | |||
are to be allocated according to the "First Come First Served" policy | D, a description, an optional reference to a document describing the application | |||
[RFC8126], so that there is no impediment to registering any Application IDs the | ID, and a Change Controller (IETF or email of registrant). The application IDs | |||
FLAC community encounters, especially if they were used in audio files but were | are allocated according to the "First Come First Served" policy <xref target="R | |||
not registered when the audio files were encoded. An Application ID can be any | FC8126"/> so that there is no impediment to registering any application IDs the | |||
32-bit value, but is often composed of 4 ASCII characters, to be human-readable. | FLAC community encounters, especially if they were used in audio files but were | |||
</t> | not registered when the audio files were encoded. An application ID can be any 3 | |||
<t>The FLAC Application Metadata Block ID registry is assigned the following ini | 2-bit value but is often composed of 4 ASCII characters that are human-readable. | |||
tial values, taken from the registration page at xiph.org (see <xref target="ID- | </t> | |||
registration-page"></xref>), which is no longer being maintained as it is replac | <t>The initial contents of "FLAC Application Metadata Block IDs" registry are sh | |||
ed by this registry.</t> | own in the table below. These initial values were taken from the registration pa | |||
ge at xiph.org (see <xref target="ID-registration-page"></xref>), which is no lo | ||||
nger being maintained as it has been replaced by this registry.</t> | ||||
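The relation between an application ID and its ASCII rendition in the table below can be sketched as follows, assuming the 32-bit identifier is read most significant byte first. The function name `ascii_rendition` is hypothetical, and treating only the printable range 0x20 to 0x7E as renderable is a choice made for this sketch.

```python
def ascii_rendition(app_id):
    """Return the 4-character ASCII rendition of a 32-bit ID, or None."""
    if not 0 <= app_id <= 0xFFFFFFFF:
        raise ValueError("application ID must fit in 32 bits")
    raw = app_id.to_bytes(4, "big")
    # Only render IDs whose four bytes are all printable ASCII.
    if all(0x20 <= b <= 0x7E for b in raw):
        return raw.decode("ascii")
    return None
```

For example, `ascii_rendition(0x41544348)` gives `"ATCH"`, while an ID containing a byte outside the printable range has no rendition and gives `None`.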
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Application ID</th> | <th align="left">Application ID</th> | |||
<th align="left">ASCII rendition (if available)</th> | <th align="left">ASCII Rendition (If Available)</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
<th align="left">Specification</th> | <th align="left">Reference</th> | |||
<th align="left">Change controller</th> | <th align="left">Change Controller</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0x41544348</td> | <td align="left">0x41544348</td> | |||
<td align="left">ATCH</td> | <td align="left">ATCH</td> | |||
<td align="left">FlacFile</td> | <td align="left">FlacFile</td> | |||
<td align="left"><xref target="FlacFile"></xref></td> | <td align="left"><xref target="FlacFile"></xref>, RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x42534F4C</td> | <td align="left">0x42534F4C</td> | |||
<td align="left">BSOL</td> | <td align="left">BSOL</td> | |||
<td align="left">beSolo</td> | <td align="left">beSolo</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x42554753</td> | <td align="left">0x42554753</td> | |||
<td align="left">BUGS</td> | <td align="left">BUGS</td> | |||
<td align="left">Bugs Player</td> | <td align="left">Bugs Player</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x43756573</td> | <td align="left">0x43756573</td> | |||
<td align="left">Cues</td> | <td align="left">Cues</td> | |||
<td align="left">GoldWave cue points</td> | <td align="left">GoldWave cue points</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x46696361</td> | <td align="left">0x46696361</td> | |||
<td align="left">Fica</td> | <td align="left">Fica</td> | |||
<td align="left">CUE Splitter</td> | <td align="left">CUE Splitter</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x46746F6C</td> | <td align="left">0x46746F6C</td> | |||
<td align="left">Ftol</td> | <td align="left">Ftol</td> | |||
<td align="left">flac-tools</td> | <td align="left">flac-tools</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x4D4F5442</td> | <td align="left">0x4D4F5442</td> | |||
<td align="left">MOTB</td> | <td align="left">MOTB</td> | |||
<td align="left">MOTB MetaCzar</td> | <td align="left">MOTB MetaCzar</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x4D505345</td> | <td align="left">0x4D505345</td> | |||
<td align="left">MPSE</td> | <td align="left">MPSE</td> | |||
<td align="left">MP3 Stream Editor</td> | <td align="left">MP3 Stream Editor</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x4D754D4C</td> | <td align="left">0x4D754D4C</td> | |||
<td align="left">MuML</td> | <td align="left">MuML</td> | |||
<td align="left">MusicML: Music Metadata Language</td> | <td align="left">MusicML: Music Metadata Language</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x52494646</td> | <td align="left">0x52494646</td> | |||
<td align="left">RIFF</td> | <td align="left">RIFF</td> | |||
<td align="left">Sound Devices RIFF chunk storage</td> | <td align="left">Sound Devices RIFF chunk storage</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x5346464C</td> | <td align="left">0x5346464C</td> | |||
<td align="left">SFFL</td> | <td align="left">SFFL</td> | |||
<td align="left">Sound Font FLAC</td> | <td align="left">Sound Font FLAC</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x534F4E59</td> | <td align="left">0x534F4E59</td> | |||
<td align="left">SONY</td> | <td align="left">SONY</td> | |||
<td align="left">Sony Creative Software</td> | <td align="left">Sony Creative Software</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x5351455A</td> | <td align="left">0x5351455A</td> | |||
<td align="left">SQEZ</td> | <td align="left">SQEZ</td> | |||
<td align="left">flacsqueeze</td> | <td align="left">flacsqueeze</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x54745776</td> | <td align="left">0x54745776</td> | |||
<td align="left">TtWv</td> | <td align="left">TtWv</td> | |||
<td align="left">TwistedWave</td> | <td align="left">TwistedWave</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x55495453</td> | <td align="left">0x55495453</td> | |||
<td align="left">UITS</td> | <td align="left">UITS</td> | |||
<td align="left">UITS Embedding tools</td> | <td align="left">UITS Embedding tools</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x61696666</td> | <td align="left">0x61696666</td> | |||
<td align="left">aiff</td> | <td align="left">aiff</td> | |||
<td align="left">FLAC AIFF chunk storage</td> | <td align="left">FLAC AIFF chunk storage</td> | |||
<td align="left"><xref target="Foreign-metadata"></xref></td> | <td align="left"><xref target="Foreign-metadata"></xref>, RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x696D6167</td> | <td align="left">0x696D6167</td> | |||
<td align="left">imag</td> | <td align="left">imag</td> | |||
<td align="left">flac-image</td> | <td align="left">flac-image</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x7065656D</td> | <td align="left">0x7065656D</td> | |||
<td align="left">peem</td> | <td align="left">peem</td> | |||
<td align="left">Parseable Embedded Extensible Metadata</td> | <td align="left">Parseable Embedded Extensible Metadata</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x71667374</td> | <td align="left">0x71667374</td> | |||
<td align="left">qfst</td> | <td align="left">qfst</td> | |||
<td align="left">QFLAC Studio</td> | <td align="left">QFLAC Studio</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x72696666</td> | <td align="left">0x72696666</td> | |||
<td align="left">riff</td> | <td align="left">riff</td> | |||
<td align="left">FLAC RIFF chunk storage</td> | <td align="left">FLAC RIFF chunk storage</td> | |||
<td align="left"><xref target="Foreign-metadata"></xref></td> | <td align="left"><xref target="Foreign-metadata"></xref>, RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x74756E65</td> | <td align="left">0x74756E65</td> | |||
<td align="left">tune</td> | <td align="left">tune</td> | |||
<td align="left">TagTuner</td> | <td align="left">TagTuner</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x773634C0</td> | <td align="left">0x77363420</td> | |||
<td align="left">w64</td> | <td align="left">w64 </td> | |||
<td align="left">FLAC Wave64 chunk storage</td> | <td align="left">FLAC Wave64 chunk storage</td> | |||
<td align="left"><xref target="Foreign-metadata"></xref></td> | <td align="left"><xref target="Foreign-metadata"></xref>, RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x78626174</td> | <td align="left">0x78626174</td> | |||
<td align="left">xbat</td> | <td align="left">xbat</td> | |||
<td align="left">XBAT</td> | <td align="left">XBAT</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x786D6364</td> | <td align="left">0x786D6364</td> | |||
<td align="left">xmcd</td> | <td align="left">xmcd</td> | |||
<td align="left">xmcd</td> | <td align="left">xmcd</td> | |||
<td align="left"></td> | <td align="left">RFC 9639</td> | |||
<td align="left">IETF</td> | <td align="left">IETF</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table></section> | </table></section> | |||
</section> | </section> | |||
<section anchor="acknowledgments"><name>Acknowledgments</name> | ||||
<t>FLAC owes much to the many people who have advanced the audio compression fie | ||||
ld so freely. For instance:</t> | ||||
<ul spacing="compact"> | ||||
<li>A. J. Robinson for his work on Shorten; his paper (see <xref target="robinso | ||||
n-tr156"></xref>) is a good starting point on some of the basic methods used by | ||||
FLAC. FLAC trivially extends and improves the fixed predictors, LPC coefficient | ||||
quantization, and Rice coding used in Shorten.</li> | ||||
<li>S. W. Golomb and Robert F. Rice; their universal codes are used by FLAC's en | ||||
tropy coder, see <xref target="Rice"></xref>.</li> | ||||
<li>N. Levinson and J. Durbin; the FLAC reference encoder (see <xref target="imp | ||||
lementation-status"></xref>) uses an algorithm developed and refined by them for | ||||
determining the LPC coefficients from the autocorrelation coefficients, see <xr | ||||
ef target="Durbin"></xref>.</li> | ||||
<li>And of course, Claude Shannon, see <xref target="Shannon"></xref>.</li> | ||||
</ul> | ||||
<t>The FLAC format, the FLAC reference implementation, and this document were or | ||||
iginally developed by Josh Coalson. While many others have contributed since, th | ||||
is original effort is deeply appreciated.</t> | ||||
</section> | ||||
</middle> | </middle> | |||
<back> | <back> | |||
<references><name>References</name> | <references> | |||
<references><name>Normative References</name> | <name>References</name> | |||
<xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-cel | <references> | |||
lar-matroska.xml"/> | <name>Normative References</name> | |||
<reference anchor="ISRC-handbook" target="https://www.ifpi.org/isrc_handbook/"> | <reference anchor="ISRC-handbook" target="https://www.ifpi.org/isrc_handbook/"> | |||
<front> | <front> | |||
    <title>International Standard Recording Code (ISRC) Handbook, 4th edition</title> |     <title>International Standard Recording Code (ISRC) Handbook</title> | |||
<author> | <author> | |||
<organization>International ISRC Registration Authority</organization> | <organization>International ISRC Registration Authority</organization> | |||
</author> | </author> | |||
<date year="2021"></date> | <date year="2021"/> | |||
</front> | </front> | |||
<refcontent>4th edition</refcontent> | ||||
</reference> | </reference> | |||
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.1321.xml" /> | <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.1321.xml" /> | |||
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2046.xml" /> | <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2046.xml" /> | |||
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2083.xml" /> | <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2083.xml" /> | |||
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml" /> | <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml" /> | |||
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3533.xml" /> | <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3533.xml" /> | |||
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3629.xml" /> | <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3629.xml" /> | |||
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3986.xml" /> | <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3986.xml" /> | |||
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml" /> | <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml" /> | |||
</references> | <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9559.xml" | |||
<references><name>Informative References</name> | /> | |||
</references> | ||||
<references> | ||||
<name>Informative References</name> | ||||
<reference anchor="Durbin" target="https://www.jstor.org/stable/1401322"> | <reference anchor="Durbin" target="https://www.jstor.org/stable/1401322"> | |||
<front> | <front> | |||
<title>The Fitting of Time-Series Models </title> | <title>The Fitting of Time-Series Models</title> | |||
<author fullname="James Durbin" initials="J" surname="Durbin"> | <author fullname="James Durbin" initials="J" surname="Durbin"> | |||
<organization>University of London</organization> | <organization>University of London</organization> | |||
</author> | </author> | |||
<date year="1959" month="12"></date> | <date year="1960"/> | |||
</front> | </front> | |||
<seriesInfo name="DOI" value="10.2307/1401322"></seriesInfo> | <seriesInfo name="DOI" value="10.2307/1401322"></seriesInfo> | |||
  <refcontent>Revue de l'Institut International de Statistique / Review of the International Statistical Institute, vol. 28, no. 3, pp. 233–44</refcontent> | |||
</reference> | </reference> | |||
<reference anchor="FIR" target="https://en.wikipedia.org/wiki/Finite_impulse_res | ||||
ponse"> | <reference anchor="FIR" target="https://en.wikipedia.org/w/index.php?title=Finit | |||
e_impulse_response&oldid=1240945295"> | ||||
<front> | <front> | |||
<title>Finite impulse response - Wikipedia</title> | <title>Finite impulse response</title> | |||
<author></author> | <author> | |||
<date></date> | <organization>Wikipedia</organization> | |||
</author> | ||||
<date month="August" year="2024"/> | ||||
</front> | </front> | |||
</reference> | </reference> | |||
<reference anchor="FLAC-decoder-testbench" target="https://github.com/ietf-wg-cellar/flac-test-files"> | <reference anchor="FLAC-decoder-testbench" target="https://github.com/ietf-wg-cellar/flac-test-files"> | |||
<front> | <front> | |||
<title>FLAC decoder testbench</title> | <title> The Free Lossless Audio Codec (FLAC) test files</title> | |||
<author></author> | <author></author> | |||
<date year="2023" month="08"></date> | <date year="2023" month="08"></date> | |||
</front> | </front> | |||
<refcontent>commit aa7b0c6</refcontent> | <refcontent>commit aa7b0c6</refcontent> | |||
</reference> | </reference> | |||
<reference anchor="FLAC-in-MP4-specification" target=" https://github.com/xiph/f | ||||
lac/blob/master/doc/isoflac.txt"> | <reference anchor="FLAC-in-MP4-specification" target="https://github.com/xiph/fl | |||
ac/blob/master/doc/isoflac.txt"> | ||||
<front> | <front> | |||
<title>Encapsulation of FLAC in ISO Base Media File Format</title> | <title>Encapsulation of FLAC in ISO Base Media File Format</title> | |||
<author fullname="Christopher Montgomery" initials="C" surname="Montgomery"> | <author></author> | |||
</author> | <date year="2022" month="07"/> | |||
<date year="2022" month="07"></date> | ||||
</front> | </front> | |||
<refcontent>commit 78d85dd</refcontent> | <refcontent>commit 78d85dd</refcontent> | |||
</reference> | </reference> | |||
<reference anchor="FLAC-specification-github" target="https://github.com/ietf-wg-cellar/flac-specification"> | <reference anchor="FLAC-specification-github" target="https://github.com/ietf-wg-cellar/flac-specification"> | |||
<front> | <front> | |||
<title>FLAC specification github repository</title> | <title>The Free Lossless Audio Codec (FLAC) Specification</title> | |||
<author></author> | <author></author> | |||
<date></date> | <date></date> | |||
</front> | </front> | |||
</reference> | </reference> | |||
<reference anchor="FLAC-wiki-implementations" target="https://github.com/ietf-wg | ||||
-cellar/flac-specification/wiki/Implementations"> | ||||
<front> | ||||
<title>FLAC specification wiki: Implementations</title> | ||||
<author></author> | ||||
</front> | ||||
</reference> | ||||
<reference anchor="FLAC-wiki-interoperability" target="https://github.com/ietf-wg-cellar/flac-specification/wiki/Interoperability-considerations"> | <reference anchor="FLAC-wiki-interoperability" target="https://github.com/ietf-wg-cellar/flac-specification/wiki/Interoperability-considerations"> | |||
<front> | <front> | |||
<title>FLAC specification wiki: Interoperability considerations</title> | <title>Interoperability considerations</title> | |||
<author></author> | <author></author> | |||
</front> | </front> | |||
<refcontent>commit 58a06d6</refcontent> | ||||
</reference> | </reference> | |||
<reference anchor="FlacFile" target="https://web.archive.org/web/20071023070305/http://firestuff.org:80/flacfile/"> | <reference anchor="FlacFile" target="https://web.archive.org/web/20071023070305/http://firestuff.org:80/flacfile/"> | |||
<front> | <front> | |||
<title>FlacFile</title> | <title>FlacFile</title> | |||
<author></author> | <author></author> | |||
<date year="2007" month="10"></date> | <date year="2007" month="10"></date> | |||
</front> | </front> | |||
<refcontent>Wayback Machine archive</refcontent> | ||||
</reference> | </reference> | |||
<reference anchor="FLAC-implementation" | ||||
target="https://xiph.org/flac/"> | ||||
<front> | ||||
<title>FLAC</title> | ||||
<author></author> | ||||
<date></date> | ||||
</front> | ||||
</reference> | ||||
<reference anchor="Foreign-metadata" target="https://github.com/xiph/flac/blob/master/doc/foreign_metadata_storage.md"> | <reference anchor="Foreign-metadata" target="https://github.com/xiph/flac/blob/master/doc/foreign_metadata_storage.md"> | |||
<front> | <front> | |||
<title>Specification of foreign metadata storage in FLAC</title> | <title>Specification of foreign metadata storage in FLAC</title> | |||
<author></author> | <author></author> | |||
<date year="2023" month="11"></date> | <date year="2023" month="11"></date> | |||
</front> | </front> | |||
<refcontent>commit 72787c3</refcontent> | ||||
</reference> | </reference> | |||
<reference anchor="HPL-1999-144" target="https://www.hpl.hp.com/techreports/1999 | ||||
/HPL-1999-144.pdf"> | <reference anchor="Lossless-Compression" target="https://ieeexplore.ieee.org/doc | |||
ument/939834"> | ||||
<front> | <front> | |||
<title>Lossless Compression of Digital Audio</title> | <title>Lossless compression of digital audio</title> | |||
<author fullname="Mat Hans" initials="M" surname="Hans"> | <author fullname="Mat Hans" initials="M" surname="Hans"> | |||
<organization>Client and Media Systems Laboratory, HP Laboratories Palo Alto</organization> | <organization>Client and Media Systems Laboratory, HP Laboratories Palo Alto</organization> | |||
</author> | </author> | |||
<author fullname="Ronald W. Schafer" initials="RW" surname="Schafer"> | <author fullname="Ronald W. Schafer" initials="R. W" surname="Schafer"> | |||
<organization>Center for Signal & Image Processing at the School of Electrical and Computer Engineering, Georgia Institute of the Technology, Atlanta, Georgia</organization> | <organization>Center for Signal & Image Processing at the School of Electrical and Computer Engineering, Georgia Institute of the Technology, Atlanta, Georgia</organization> | |||
</author> | </author> | |||
<date year="1999" month="11"></date> | <date year="2001" month="July"></date> | |||
</front> | </front> | |||
<seriesInfo name="DOI" value="10.1109/79.939834"></seriesInfo> | <seriesInfo name="DOI" value="10.1109/79.939834"></seriesInfo> | |||
<refcontent>IEEE Signal Processing Magazine, vol. 18, no. 4, pp. 21-32</refcontent> | ||||
</reference> | </reference> | |||
<reference anchor="ID-registration-page" target="https://xiph.org/flac/id.html"> | <reference anchor="ID-registration-page" target="https://xiph.org/flac/id.html"> | |||
<front> | <front> | |||
<title>FLAC - ID Registry</title> | <title>ID registry</title> | |||
<author></author> | <author> | |||
<organization>Xiph.Org</organization> | ||||
</author> | ||||
</front> | </front> | |||
</reference> | </reference> | |||
<reference anchor="ID3v2" target="https://web.archive.org/web/20220903174949/https://id3.org/id3v2.4.0-frames"> | <reference anchor="ID3v2" target="https://web.archive.org/web/20220903174949/https://id3.org/id3v2.4.0-frames"> | |||
<front> | <front> | |||
<title>id3v2.4.0-frames.txt</title> | <title>ID3 tag version 2.4.0 - Native Frames</title> | |||
<author fullname="Martin Nilsson" initials="M" surname="Nilsson"></author> | <author fullname="Martin Nilsson" initials="M" surname="Nilsson"></author> | |||
<date year="2000" month="11"></date> | <date year="2000" month="11"></date> | |||
</front> | </front> | |||
<refcontent>Wayback Machine archive</refcontent> | ||||
</reference> | </reference> | |||
<reference anchor="IEC.60908.1999" target=""> | ||||
<reference anchor="IEC.60908.1999" target="https://webstore.iec.ch/publication/3 | ||||
885"> | ||||
<front> | <front> | |||
<title>Audio recording - Compact disc digital audio system</title> | <title>Audio recording - Compact disc digital audio system</title> | |||
<author> | <author> | |||
<organization>International Electrotechnical Commission</organization> | <organization>International Electrotechnical Commission</organization> | |||
</author> | </author> | |||
<date year="1999"></date> | <date year="1999"></date> | |||
</front> | </front> | |||
<seriesInfo name="IEC" value="International standard 60908 second edition"></seriesInfo> | <seriesInfo name="IEC" value="60908:1999-02"></seriesInfo> | |||
</reference> | </reference> | |||
<reference anchor="LinearPrediction" target="https://en.wikipedia.org/wiki/Linea | ||||
r_prediction"> | <reference anchor="LinearPrediction" target="https://en.wikipedia.org/w/index.ph | |||
p?title=Linear_prediction&oldid=1169015573"> | ||||
<front> | <front> | |||
<title>Linear prediction - Wikipedia</title> | <title>Linear prediction</title> | |||
<author></author> | <author> | |||
<date></date> | <organization>Wikipedia</organization> | |||
</author> | ||||
<date month="August" year="2023" /> | ||||
</front> | </front> | |||
</reference> | </reference> | |||
<reference anchor="MLP" target="https://www.aes.org/e-lib/online/browse.cfm?elib=8082"> | <reference anchor="MLP" target="https://www.aes.org/e-lib/online/browse.cfm?elib=8082"> | |||
<front> | <front> | |||
<title>The MLP Lossless Compression System</title> | <title>The MLP Lossless Compression System</title> | |||
<author fullname="Michael A. Gerzon" initials="MA" surname="Gerzon"></author | <author fullname="Michael A. Gerzon" initials="M. A" surname="Gerzon"></auth | |||
> | or> | |||
<author fullname="Peter G. Craven" initials="PG" surname="Craven"> | <author fullname="Peter G. Craven" initials="P. G" surname="Craven"> | |||
<organization>Algol Applications Ltd, Hove, England</organization> | <organization>Algol Applications Ltd, Hove, England</organization> | |||
</author> | </author> | |||
<author fullname="J. Robert Stuart" initials="JR" surname="Stuart"> | <author fullname="J. Robert Stuart" initials="J. R" surname="Stuart"> | |||
<organization>Meridian Audio Ltd, Huntingdon, England</organization> | <organization>Meridian Audio Ltd, Huntingdon, England</organization> | |||
</author> | </author> | |||
<author fullname="Malcolm J. Law" initials="MJ" surname="Law"> | <author fullname="Malcolm J. Law" initials="M. J" surname="Law"> | |||
<organization>Algol Applications Ltd, Hove, England</organization> | <organization>Algol Applications Ltd, Hove, England</organization> | |||
</author> | </author> | |||
<author fullname="Rhonda J. Wilson" initials="RJ" surname="Wilson"> | <author fullname="Rhonda J. Wilson" initials="R. J" surname="Wilson"> | |||
<organization>Meridian Audio Ltd, Huntingdon, England</organization> | <organization>Meridian Audio Ltd, Huntingdon, England</organization> | |||
</author> | </author> | |||
<date year="1999" month="09"></date> | <date year="1999" month="09"></date> | |||
</front> | </front> | |||
<refcontent>Audio Engineering Society Conference: 17th International Conference: High-Quality Audio Codin</refcontent> | ||||
</reference> | </reference> | |||
<reference anchor="MusicBrainz" target="https://picard-docs.musicbrainz.org/en/variables/variables.html"> | <reference anchor="MusicBrainz" target="https://picard-docs.musicbrainz.org/en/variables/variables.html"> | |||
<front> | <front> | |||
<title>Tags & Variables - MusicBrainz Picard v2.10 documentation</title> | <title>Tags & Variables</title> | |||
<author> | <author> | |||
<organization>MusicBrainz</organization> | <organization>MusicBrainz</organization> | |||
</author> | </author> | |||
<date></date> | ||||
</front> | </front> | |||
<refcontent>MusicBrainz Picard v2.10 documentation</refcontent> | ||||
</reference> | </reference> | |||
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4732.xml" /> | <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4732.xml" /> | |||
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5334.xml" /> | <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5334.xml" /> | |||
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6716.xml" /> | <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6716.xml" /> | |||
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7942.xml" | <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8126.xml" | |||
/> | /> | |||
<reference anchor="Rice" target="https://ieeexplore.ieee.org/document/1090789"> | <reference anchor="Rice" target="https://ieeexplore.ieee.org/document/1090789"> | |||
<front> | <front> | |||
<title>Adaptive Variable-Length Coding for Efficient Compression of Spacecraft Television Data</title> | <title>Adaptive Variable-Length Coding for Efficient Compression of Spacecraft Television Data</title> | |||
<author fullname="Robert Rice" initials="RF" surname="Rice"> | <author fullname="Robert Rice" initials="R. F" surname="Rice"> | |||
<organization>Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA</organization> | <organization>Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA</organization> | |||
</author> | </author> | |||
<author initials="JR" surname="Plaunt"> | <author initials="J. R" surname="Plaunt"> | |||
<organization>Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA</organization> | <organization>Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA</organization> | |||
</author> | </author> | |||
<date year="1971" month="12"></date> | <date year="1971" month="12"></date> | |||
</front> | </front> | |||
<seriesInfo name="DOI" value="10.1109/TCOM.1971.1090789"></seriesInfo> | <seriesInfo name="DOI" value="10.1109/TCOM.1971.1090789"></seriesInfo> | |||
<refcontent>IEEE Transactions on Communication Technology, vol. 19, no. 6, pp. 889-897</refcontent> | ||||
</reference> | </reference> | |||
<reference anchor="Shannon" target="https://ieeexplore.ieee.org/document/1697831"> | <reference anchor="Shannon" target="https://ieeexplore.ieee.org/document/1697831"> | |||
<front> | <front> | |||
<title>Communication in the Presence of Noise</title> | <title>Communication in the Presence of Noise</title> | |||
<author fullname="Claude Shannon" initials="CE" surname="Shannon"> | <author fullname="Claude Shannon" initials="C. E" surname="Shannon"> | |||
<organization>Bell Telephone Laboratories, Inc., Murray Hill, NJ, USA</organization> | <organization>Bell Telephone Laboratories, Inc., Murray Hill, NJ, USA</organization> | |||
</author> | </author> | |||
<date year="1949" month="01"></date> | <date year="1949" month="01"></date> | |||
</front> | </front> | |||
<seriesInfo name="DOI" value="10.1109/JRPROC.1949.232969"></seriesInfo> | <seriesInfo name="DOI" value="10.1109/JRPROC.1949.232969"></seriesInfo> | |||
<refcontent>Proceedings of the IRE, vol. 37, no. 1, pp. 10-21</refcontent> | ||||
</reference> | </reference> | |||
<reference anchor="VarLengthCode" target="https://en.wikipedia.org/wiki/Variable | ||||
-length_code"> | <reference anchor="VarLengthCode" target="https://en.wikipedia.org/w/index.php?t | |||
itle=Variable-length_code&oldid=1220260423"> | ||||
<front> | <front> | |||
<title>Variable-length code - Wikipedia</title> | <title>Variable-length code</title> | |||
<author></author> | <author> | |||
<date></date> | <organization>Wikipedia</organization> | |||
</author> | ||||
<date month="April" year="2024" /> | ||||
</front> | </front> | |||
</reference> | </reference> | |||
<reference anchor="Vorbis" target="https://xiph.org/vorbis/doc/v-comment.html"> | <reference anchor="Vorbis" target="https://xiph.org/vorbis/doc/v-comment.html"> | |||
<front> | <front> | |||
<title>Ogg Vorbis I format specification: comment field and header specification</title> | <title>Ogg Vorbis I format specification: comment field and header specification</title> | |||
<author> | <author> | |||
<organization>Xiph.Org</organization> | <organization>Xiph.Org</organization> | |||
</author> | </author> | |||
<date></date> | <date></date> | |||
</front> | </front> | |||
</reference> | </reference> | |||
<reference anchor="lossyWAV" target="https://wiki.hydrogenaud.io/index.php?title | ||||
=LossyWAV"> | <reference anchor="lossyWAV" target="https://wiki.hydrogenaud.io/index.php?title | |||
=LossyWAV&oldid=32877"> | ||||
<front> | <front> | |||
<title>lossyWAV - Hydrogenaudio Knowledgebase</title> | <title>lossyWAV</title> | |||
<author></author> | <author> | |||
<organization>Hydrogenaudio Knowledgebase</organization> | ||||
</author> | ||||
<date month="July" year="2021" /> | ||||
</front> | </front> | |||
</reference> | </reference> | |||
<reference anchor="robinson-tr156" target="https://mi.eng.cam.ac.uk/reports/abst | ||||
racts/robinson_tr156.html"> | <reference anchor="Robinson-TR156" target="https://mi.eng.cam.ac.uk/reports/svr- | |||
ftp/auto-pdf/robinson_tr156.pdf"> | ||||
<front> | <front> | |||
<title>SHORTEN: Simple lossless and near-lossless waveform compression</title> | <title>SHORTEN: Simple lossless and near-lossless waveform compression</title> | |||
<author fullname="Tony Robinson" initials="T" surname="Robinson"> | <author fullname="Tony Robinson" initials="T" surname="Robinson"> | |||
<organization>Cambridge University Engineering Department</organization> | <organization>Cambridge University Engineering Department</organization> | |||
</author> | </author> | |||
<date year="1994" month="12"></date> | <date year="1994" month="12"></date> | |||
</front> | </front> | |||
<refcontent>Cambridge University Engineering Department Technical Report CUED/F-INFENG/TR.156</refcontent> | ||||
</reference> | </reference> | |||
</references> | </references> | |||
</references> | </references> | |||
<section anchor="numerical-considerations"><name>Numerical considerations</name> | <section anchor="numerical-considerations"><name>Numerical Considerations</name> | |||
<t>In order to maintain lossless behavior, all arithmetic used in encoding and d | <t>In order to maintain lossless behavior, all arithmetic used in encoding and d | |||
ecoding sample values must be done with integer data types to eliminate the poss | ecoding sample values must be done with integer data types to eliminate the poss | |||
ibility of introducing rounding errors associated with floating-point arithmetic | ibility of introducing rounding errors associated with floating-point arithmetic | |||
. Use of floating-point representations in analysis (e.g., finding a good predic | . Use of floating-point representations in analysis (e.g., finding a good predic | |||
tor or Rice parameter) is not a concern, as long as the process of using the fou | tor or Rice parameter) is not a concern as long as the process of using the foun | |||
nd predictor and Rice parameter to encode audio samples is implemented with only | d predictor and Rice parameter to encode audio samples is implemented with only | |||
integer math.</t> | integer math.</t> | |||
<t>Furthermore, the possibility of integer overflow can be eliminated by using l | <t>Furthermore, the possibility of integer overflow can be eliminated by using d | |||
arge enough data types. Choosing a 64-bit signed data type for all arithmetic in | ata types that are large enough. Choosing a 64-bit signed data type for all arit | |||
volving sample values would make sure the possibility for overflow is eliminated | hmetic involving sample values would make sure the possibility for overflow is e | |||
, but usually smaller data types are chosen for increased performance, especiall | liminated, but usually, smaller data types are chosen for increased performance, | |||
y in embedded devices. This appendix provides guidelines for choosing the approp | especially in embedded devices. This appendix provides guidelines for choosing | |||
riate data type for each step of encoding and decoding FLAC files.</t> | the appropriate data type for each step of encoding and decoding FLAC files.</t> | |||
<t>In this appendix, signed data types are signed two's complement.</t> | <t>In this appendix, signed data types are signed two's complement.</t> | |||
<section anchor="determining-the-necessary-data-type-size"><name>Determining the necessary data type size</name> | <section anchor="determining-the-necessary-data-type-size"><name>Determining the Necessary Data Type Size</name> | |||
<t>To find the smallest data type size that is guaranteed not to overflow for a certain sequence of arithmetic operations, the combination of values producing the largest possible result should be considered.</t> | <t>To find the smallest data type size that is guaranteed not to overflow for a certain sequence of arithmetic operations, the combination of values producing the largest possible result should be considered.</t> | |||
<t>If, for example, two 16-bit signed integers are added, the largest possible r | <t>For example, if two 16-bit signed integers are added, the largest possible re | |||
esult forms if both values are the largest number that can be represented with a | sult forms if both values are the largest number that can be represented with a | |||
16-bit signed integer. To store the result, a signed integer data type with at | 16-bit signed integer. To store the result, a signed integer data type with at l | |||
least 17 bits is needed. Similarly, when adding 4 of these values, 18 bits are n | east 17 bits is needed. Similarly, when adding 4 of these values, 18 bits are ne | |||
eeded; when adding 8, 19 bits are needed, etc. In general, the number of bits ne | eded; when adding 8, 19 bits are needed, etc. In general, the number of bits nec | |||
cessary when adding numbers together is increased by the log base 2 of the numbe | essary when adding numbers together is increased by the log base 2 of the number | |||
r of values rounded up to the nearest integer. So, when adding 18 unknown values | of values rounded up to the nearest integer. So, when adding 18 unknown values | |||
stored in 8 bit signed integers, we need a signed integer data type of at least | stored in 8-bit signed integers, we need a signed integer data type of at least | |||
13 bits to store the result, as the log base 2 of 18 rounded up is 5.</t> | 13 bits to store the result, as the log base 2 of 18 rounded up is 5.</t> | |||
<t>When multiplying two numbers, the number of bits needed for the result is the | <t>When multiplying two numbers, the number of bits needed for the result is the | |||
size of the first number plus the size of the second number. If, for example, a | size of the first number plus the size of the second number. For example, if a | |||
16-bit signed integer is multiplied by another 16-bit signed integer, the resul | 16-bit signed integer is multiplied by another 16-bit signed integer, the result | |||
t needs at least 32 bits to be stored without overflowing. To show this in pract | needs at least 32 bits to be stored without overflowing. To show this in practi | |||
ice, the largest signed value that can be stored in 4 bits is -8. (-8)*(-8) is 6 | ce, the largest signed value that can be stored in 4 bits is -8. (-8)*(-8) is 64 | |||
4, which needs at least 8 bits (signed) to store.</t> | , which needs at least 8 bits (signed) to store.</t> | |||
</section> | </section> | |||
<section anchor="stereo-decorrelation"><name>Stereo decorrelation</name> | <section anchor="stereo-decorrelation"><name>Stereo Decorrelation</name> | |||
<t>When stereo decorrelation is used, the side channel will have one extra bit o | <t>When stereo decorrelation is used, the side channel will have one extra bit o | |||
f bit depth, see <xref target="interchannel-decorrelation"></xref>.</t> | f bit depth; see <xref target="interchannel-decorrelation"></xref>.</t> | |||
<t>This means that while 16-bit signed integers have sufficient range to store samples from a fully decoded FLAC frame with a bit depth of 16 bits, the decoding of a side subframe in such a file will need a data type with at least 17 bits to store decoded subframe samples before undoing stereo decorrelation.</t> | <t>This means that while 16-bit signed integers have sufficient range to store samples from a fully decoded FLAC frame with a bit depth of 16 bits, the decoding of a side subframe in such a file will need a data type with at least 17 bits to store decoded subframe samples before undoing stereo decorrelation.</t> | |||
<t>Most FLAC decoders store decoded (subframe) samples as 32-bit values, which is sufficient for files with bit depths up to (and including) 31 bits.</t> | <t>Most FLAC decoders store decoded (subframe) samples as 32-bit values, which is sufficient for files with bit depths up to (and including) 31 bits.</t> | |||
</section> | </section> | |||
<section anchor="prediction-1"><name>Prediction</name> | <section anchor="prediction-1"><name>Prediction</name> | |||
<t>A prediction (which is used to calculate the residual on encoding or added to the residual to calculate the sample value on decoding) is formed by multiplying and summing preceding sample values. In order to eliminate the possibility of integer overflow, the combination of preceding sample values and predictor coefficients producing the largest possible value should be considered.</t> | <t>A prediction (which is used to calculate the residual on encoding or added to the residual to calculate the sample value on decoding) is formed by multiplying and summing preceding sample values. In order to eliminate the possibility of integer overflow, the combination of preceding sample values and predictor coefficients producing the largest possible value should be considered.</t> | |||
<t>To determine the size of the data type needed to calculate either a residual sample (on encoding) or an audio sample value (on decoding) in a fixed predictor subframe, the maximal possible value for these is calculated as described in <xref target="determining-the-necessary-data-type-size"></xref> in the following table. For example: if a frame codes for 16-bit audio and has some form of stereo decorrelation, the subframe coding for the side channel would need 16+1+3 bits if a third order fixed predictor is used.</t> | <t>To determine the size of the data type needed to calculate either a residual sample (on encoding) or an audio sample value (on decoding) in a fixed predictor subframe, the maximum possible value for these is calculated as described in <xref target="determining-the-necessary-data-type-size"></xref> and in the following table. For example, if a frame codes for 16-bit audio and has some form of stereo decorrelation, the subframe coding for the side channel would need 16+1+3 bits if a third-order fixed predictor is used.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Order</th> | <th align="left">Order</th> | |||
<th align="left">Calculation of residual</th> | <th align="left">Calculation of Residual</th> | |||
<th align="left">Sample values summed</th> | <th align="left">Sample Values Summed</th> | |||
<th align="left">Extra bits</th> | <th align="left">Extra Bits</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0</td> | <td align="left">0</td> | |||
<td align="left">a(n)</td> | <td align="left">a(n)</td> | |||
<td align="left">1</td> | <td align="left">1</td> | |||
<td align="left">0</td> | <td align="left">0</td> | |||
</tr> | </tr> | |||
skipping to change at line 2191 ¶ | skipping to change at line 2349 ¶ | |||
<td align="left">3</td> | <td align="left">3</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">4</td> | <td align="left">4</td> | |||
<td align="left">a(n) - 4 * a(n-1) + 6 * a(n-2) - 4 * a(n-3) + a(n-4)</td> | <td align="left">a(n) - 4 * a(n-1) + 6 * a(n-2) - 4 * a(n-3) + a(n-4)</td> | |||
<td align="left">16</td> | <td align="left">16</td> | |||
<td align="left">4</td> | <td align="left">4</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>Where</t> | </table><t>Where:</t> | |||
<ul spacing="compact"> | <ul> | |||
<li>n is the number of the sample being predicted.</li> | <li>n is the number of the sample being predicted.</li> | |||
<li>a(n) is the sample being predicted.</li> | <li>a(n) is the sample being predicted.</li> | |||
<li>a(n-1) is the sample before the one being predicted, a(n-2) is the sample before that, etc.</li> | <li>a(n-1) is the sample before the one being predicted, a(n-2) is the sample before that, etc.</li> | |||
</ul> | </ul> | |||
<t>For subframes with a linear predictor, the calculation is a little more complicated. Each prediction is the sum of several multiplications. Each of these multiply a sample value with a predictor coefficient. The extra bits needed can be calculated by adding the predictor coefficient precision (in bits) to the bit depth of the audio samples. To account for the summing of these multiplications, the log base 2 of the predictor order rounded up is added.</t> | <t>For subframes with a linear predictor, the calculation is a little more complicated. Each prediction is the sum of several multiplications. Each of these multiply a sample value with a predictor coefficient. The extra bits needed can be calculated by adding the predictor coefficient precision (in bits) to the bit depth of the audio samples. To account for the summing of these multiplications, the log base 2 of the predictor order rounded up is added.</t> | |||
<t>For example, if the sample bit depth of the source is 24, the current subframe encodes a side channel (see <xref target="interchannel-decorrelation"></xref>), the predictor order is 12, and the predictor coefficient precision is 15 bits, the minimum required size of the used signed integer data type is at least (24 + 1) + 15 + ceil(log2(12)) = 44 bits. As another example, with a side-channel subframe bit depth of 16, a predictor order of 8, and a predictor coefficient precision of 12 bits, the minimum required size of the used signed integer data type is (16 + 1) + 12 + ceil(log2(8)) = 32 bits.</t> | <t>For example, if the sample bit depth of the source is 24, the current subframe encodes a side channel (see <xref target="interchannel-decorrelation"></xref>), the predictor order is 12, and the predictor coefficient precision is 15 bits, the minimum required size of the used signed integer data type is at least (24 + 1) + 15 + ceil(log2(12)) = 44 bits. As another example, with a side-channel subframe bit depth of 16, a predictor order of 8, and a predictor coefficient precision of 12 bits, the minimum required size of the used signed integer data type is (16 + 1) + 12 + ceil(log2(8)) = 32 bits.</t> | |||
</section> | </section> | |||
<section anchor="residual"><name>Residual</name> | <section anchor="residual"><name>Residual</name> | |||
<t>As stated in <xref target="coded-residual"></xref>, an encoder must make sure residual samples are representable by a 32-bit integer, signed two's complement, excluding the most negative value. Continuing as in the previous section, it is possible to calculate when residual samples already implicitly fit and when an additional check is needed. This implicit fit is achieved when residuals would fit a theoretical 31-bit signed int, as that satisfies both of the mentioned criteria. When this implicit fit is not achieved, all residual values must be calculated and checked individually.</t> | <t>As stated in <xref target="coded-residual"></xref>, an encoder must make sure residual samples are representable by a 32-bit integer, signed two's complement, excluding the most negative value. As in the previous section, it is possible to calculate when residual samples already implicitly fit and when an additional check is needed. This implicit fit is achieved when residuals would fit a theoretical 31-bit signed integer, as that satisfies both of the mentioned criteria. When this implicit fit is not achieved, all residual values must be calculated and checked individually.</t> | |||
<t>For the residual of a fixed predictor, the maximum residual sample size was already calculated in the previous section. However, for a linear predictor, the prediction is shifted right by a certain amount. The number of bits needed for the residual is the number of bits calculated in the previous section, reduced by the prediction right shift, and increased by one bit to account for the subtraction of the prediction from the current sample on encoding.</t> | <t>For the residual of a fixed predictor, the maximum residual sample size was already calculated in the previous section. However, for a linear predictor, the prediction is shifted right by a certain amount. The number of bits needed for the residual is the number of bits calculated in the previous section, reduced by the prediction right shift, and increased by one bit to account for the subtraction of the prediction from the current sample on encoding.</t> | |||
<t>Taking the last example of the previous section, where 32 bits were needed for the prediction, the required data type size for the residual samples in case of a right shift of 10 bits would be 32 - 10 + 1 = 23 bits, which means it is not necessary to perform the aforementioned check.</t> | <t>Taking the last example of the previous section, where 32 bits were needed for the prediction, the required data type size for the residual samples in case of a right shift of 10 bits would be 32 - 10 + 1 = 23 bits, which means it is not necessary to perform the aforementioned check.</t> | |||
<t>As another example, when encoding 32-bit PCM with fixed predictors, all predi | <t>As another example, when encoding 32-bit PCM with fixed predictors, all predi | |||
ctor orders must be checked. While the 0-order fixed predictor is guaranteed to | ctor orders must be checked. While the zero-order fixed predictor is guaranteed | |||
have residual samples that fit a 32-bit signed int, it might produce a residual | to have residual samples that fit a 32-bit signed integer, it might produce a re | |||
sample value that is the most negative representable value of that 32-bit signed | sidual sample value that is the most negative representable value of that 32-bit | |||
int.</t> | signed integer.</t> | |||
<t>Note that on decoding, while the residual sample values are limited to the af | <t>Note that on decoding, while the residual sample values are limited to the af | |||
orementioned range, the predictions are not. This means that while the decoding | orementioned range, the predictions are not. This means that while the decoding | |||
of the residual samples can happen fully in 32-bit signed integers, decoders mus | of the residual samples can happen fully in 32-bit signed integers, decoders mus | |||
t be sure to execute the addition of each residual sample to its accompanying pr | t be sure to execute the addition of each residual sample to its accompanying pr | |||
ediction with a wide enough signed integer data type like on encoding.</t> | ediction with a signed integer data type that is wide enough, as with encoding.< | |||
/t> | ||||
</section> | </section> | |||
<section anchor="rice-coding"><name>Rice coding</name> | <section anchor="rice-coding"><name>Rice Coding</name> | |||
<t>When folding (i.e., zig-zag encoding) the residual sample values, no extra bi | <t>When folding (i.e., zigzag encoding) the residual sample values, no extra bit | |||
ts are needed when the absolute value of each residual sample is first stored in | s are needed when the absolute value of each residual sample is first stored in | |||
an unsigned data type of the size of the last step, then doubled, and then has | an unsigned data type of the size of the last step, then doubled, and then has o | |||
one subtracted depending on whether the residual sample was positive or negative | ne subtracted depending on whether the residual sample was positive or negative. | |||
. Many implementations, however, choose to require one extra bit of data type si | However, many implementations choose to require one extra bit of data type size | |||
ze so zig-zag encoding can happen in one step and without a cast instead of the | so zigzag encoding can happen in one step without a cast instead of the procedu | |||
procedure described in the previous sentence.</t> | re described in the previous sentence.</t> | |||
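The folding procedure described in this paragraph can be sketched as follows. This is an illustrative Python sketch, not text from either version of the document; `fold` and `unfold` are hypothetical names, and it assumes the common zigzag mapping in which negative values land on odd numbers (0, -1, 1, -2 become 0, 1, 2, 3).

```python
def fold(residual):
    """Zigzag-fold a signed residual into a non-negative integer."""
    magnitude = abs(residual)   # fits an unsigned type of the same size
    folded = magnitude * 2      # doubling
    if residual < 0:
        folded -= 1             # negative values map to odd numbers
    return folded


def unfold(folded):
    """Inverse of fold()."""
    if folded & 1:
        return -((folded + 1) // 2)
    return folded // 2


print([fold(r) for r in (0, -1, 1, -2, 2)])
```

Because the most negative 32-bit value is excluded from the residual range, the largest folded value produced this way still fits an unsigned 32-bit type, which is why no extra bit is required for this variant.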
</section> | </section> | |||
</section> | </section> | |||
<section anchor="past-format-changes"><name>Past format changes</name> | <section anchor="past-format-changes"><name>Past Format Changes</name> | |||
<t>This informational appendix documents the changes made to the FLAC format over the years. This information might be of use when encountering FLAC files that were made with software following the format as it was before the changes documented in this appendix.</t> | <t>This informational appendix documents the changes made to the FLAC format over the years. This information might be of use when encountering FLAC files that were made with software following the format as it was before the changes documented in this appendix.</t> | |||
<t>The FLAC format was first specified in December 2000 and the bitstream format was considered frozen with the release of FLAC (the reference encoder/decoder) 1.0 in July 2001. Only changes made since this first stable release are considered in this appendix. Changes made to the FLAC streamable subset definition (see <xref target="streamable-subset"></xref>) are not considered.</t> | <t>The FLAC format was first specified in December 2000, and the bitstream format was considered frozen with the release of FLAC 1.0 (the reference encoder/decoder) in July 2001. Only changes made since this first stable release are considered in this appendix. Changes made to the FLAC streamable subset definition (see <xref target="streamable-subset"></xref>) are not considered.</t> | |||
<section anchor="addition-of-blocking-strategy-bit"><name>Addition of blocking s | <section anchor="addition-of-blocking-strategy-bit"><name>Addition of Blocking S | |||
trategy bit</name> | trategy Bit</name> | |||
<t>Perhaps the largest backwards incompatible change to the specification was pu | <t>Perhaps the largest backwards-incompatible change to the specification was pu | |||
blished in July 2007. Before this change, variable block size streams were not e | blished in July 2007. Before this change, variable block size streams were not e | |||
xplicitly marked as such by a flag bit in the frame header. A decoder had two wa | xplicitly marked as such by a flag bit in the frame header. A decoder had two wa | |||
ys to detect a variable block size stream, either by comparing the minimum and m | ys to detect a variable block size stream: by comparing the minimum and maximum | |||
aximum block size in the STREAMINFO metadata block (which are equal for a fixed | block sizes in the streaminfo metadata block (which are equal for a fixed block | |||
block size stream), or, if a decoder did not receive a STREAMINFO metadata block | size stream) or by detecting a change of block size during a stream if a decoder | |||
, by detecting a change of block size during a stream, which could in theory not | did not receive a streaminfo metadata block, which could not happen at all in t | |||
happen at all. As the meaning of the coded number in the frame header depends o | heory. As the meaning of the coded number in the frame header depends on whether | |||
n whether or not a stream is variable block size, this presented a problem: the | or not a stream has a variable block size, this presented a problem: the meanin | |||
meaning of the coded number could not be reliably determined. To fix this proble | g of the coded number could not be reliably determined. To fix this problem, one | |||
m, one of the reserved bits was changed to be used as a blocking strategy bit. S | of the reserved bits was changed to be used as a blocking strategy bit. See als | |||
ee also <xref target="frame-header"></xref>.</t> | o <xref target="frame-header"></xref>.</t> | |||
<t>Along with the addition of a new flag, the meaning of the block size bits (se | <t>Along with the addition of a new flag, the meaning of the block size bits (se | |||
e <xref target="block-size-bits"></xref>) was subtly changed. Initially, block s | e <xref target="block-size-bits"></xref>) was subtly changed. Initially, block s | |||
ize bits patterns 0b0001-0b0101 and 0b1000-0b1111 could only be used for fixed b | ize bits patterns 0b0001-0b0101 and 0b1000-0b1111 could only be used for fixed b | |||
lock size streams, while 0b0110 and 0b0111 could be used for both fixed block si | lock size streams, while 0b0110 and 0b0111 could be used for both fixed block si | |||
ze and variable block size streams. With the change, these restrictions were lif | ze and variable block size streams. With this change, these restrictions were li | |||
ted, and patterns 0b0001-0b1111 are now used for both variable block size and fi | fted, and patterns 0b0001-0b1111 are now used for both variable block size and f | |||
xed block size streams.</t> | ixed block size streams.</t> | |||
</section> | </section> | |||
<section anchor="restriction-of-encoded-residual-samples"><name>Restriction of e | <section anchor="restriction-of-encoded-residual-samples"><name>Restriction of E | |||
ncoded residual samples</name> | ncoded Residual Samples</name> | |||
<t>Another change to the specification was deemed necessary during standardizati | <t>Another change to the specification was deemed necessary during standardizati | |||
on by the CELLAR working group of the IETF. As specified in <xref target="coded- | on by the CELLAR Working Group of the IETF. As specified in <xref target="coded- | |||
residual"></xref> a limit is imposed on residual samples. This limit was not spe | residual"></xref>, a limit is imposed on residual samples. This limit was not sp | |||
cified prior to the IETF standardization effort. However, as far as was known to | ecified prior to the IETF standardization effort. However, as far as was known t | |||
the working group, no FLAC encoder at that time produced FLAC files containing | o the working group, no FLAC encoder at that time produced FLAC files containing | |||
residual samples exceeding this limit. This is mostly because it is very unlikel | residual samples exceeding this limit. This is mostly because it is very unlike | |||
y to encounter residual samples exceeding this limit when encoding 24-bit PCM, a | ly to encounter residual samples exceeding this limit when encoding 24-bit PCM, | |||
nd encoding of PCM with higher bit depths was not yet implemented in any known e | and encoding of PCM with higher bit depths was not yet implemented in any known | |||
ncoder. In fact, these FLAC encoders would produce corrupt files upon being trig | encoder. In fact, these FLAC encoders would produce corrupt files upon being tri | |||
gered to produce such residual samples and it is unlikely any non-experimental e | ggered to produce such residual samples, and it is unlikely any non-experimental | |||
ncoder would ever do so, even when presented with crafted material. Therefore, i | encoder would ever do so, even when presented with crafted material. Therefore, | |||
t was not expected that existing implementations would be rendered non-compliant | it was not expected that existing implementations would be rendered non-complia | |||
by this change.</t> | nt by this change.</t> | |||
</section> | </section> | |||
<section anchor="addition-of-5-bit-rice-parameters"><name>Addition of 5-bit Rice | <section anchor="addition-of-5-bit-rice-parameters"><name>Addition of 5-Bit Rice | |||
parameters</name> | Parameters</name> | |||
<t>One significant addition to the format was the residual coding method using 5 | <t>One significant addition to the format was the residual coding method using | |||
-bit Rice parameters. Prior to publication of this addition in July 2007, there | 5-bit Rice parameters. Prior to publication of this addition in July 2007, a | |||
was only one residual coding method specified, a partitioned Rice code with 4-bi | partitioned Rice code with 4-bit Rice parameters was the only residual coding | |||
t Rice parameters. The range offered by this coding method proved too small when | method specified. The range offered by this coding method proved too small | |||
encoding 24-bit PCM, therefore, a second residual coding method was specified, | when encoding 24-bit PCM; therefore, a second residual coding method was | |||
identical to the first but with 5-bit Rice parameters.</t> | specified that was identical to the first, but with 5-bit Rice parameters.</t> | |||
</section> | </section> | |||
<section anchor="restriction-of-lpc-shift-to-non-negative-values"><name>Restrict | <section anchor="restriction-of-lpc-shift-to-non-negative-values"><name>Restrict | |||
ion of LPC shift to non-negative values</name> | ion of LPC Shift to Non-negative Values</name> | |||
<t>As stated in <xref target="linear-predictor-subframe"></xref>, the predictor | <t>As stated in <xref target="linear-predictor-subframe"></xref>, the predictor | |||
right shift is a number signed two's complement, which MUST NOT be negative. Thi | right shift is a number signed two's complement, which <bcp14>MUST NOT</bcp14> b | |||
s is because right shifting a number by a negative amount is undefined behavior | e negative. This is because shifting a number to the right by a negative amount | |||
in the C programming language standard. The intended behavior was that a positiv | is undefined behavior in the C programming language standard. The intended behav | |||
e number would be a right shift and a negative number would be a left shift. The | ior was that a positive number would be a right shift and a negative number woul | |||
FLAC reference encoder was changed in 2007 to not generate LPC subframes with a | d be a left shift. The FLAC reference encoder was changed in 2007 to not generat | |||
negative predictor right shift, as it turned out that the use of such subframes | e LPC subframes with a negative predictor right shift, as it turned out that the | |||
would only very rarely provide any benefit, and the decoders that were already | use of such subframes would only very rarely provide any benefit and the decode | |||
widely in use at that point were not able to handle such subframes.</t> | rs that were already widely in use at that point were not able to handle such su | |||
bframes.</t> | ||||
</section> | </section> | |||
</section> | </section> | |||
<section anchor="interoperability-considerations"><name>Interoperability conside | <section anchor="interoperability-considerations"><name>Interoperability | |||
rations</name> | Considerations</name> | |||
<t>As documented in <xref target="past-format-changes"></xref>, there have been some changes and additions to the FLAC format. Additionally, implementation of certain features of the FLAC format took many years, meaning early decoder implementations could not be tested against files with these features. Finally, many lower-quality FLAC decoders only implement just enough features required for playback of the most common FLAC files.</t> | <t>As documented in <xref target="past-format-changes"></xref>, there have been some changes and additions to the FLAC format. Additionally, implementation of certain features of the FLAC format took many years, meaning early decoder implementations could not be tested against files with these features. Finally, many lower-quality FLAC decoders only implement just enough features required for playback of the most common FLAC files.</t> | |||
<t>This appendix provides some considerations for encoder implementations aiming to create highly compatible files. As this topic is one that might change after this document is finished, consult <xref target="FLAC-wiki-interoperability"></xref> for more up-to-date information.</t> | <t>This appendix provides some considerations for encoder implementations aiming to create highly compatible files. As this topic is one that might change after this document is published, consult <xref target="FLAC-wiki-interoperability"></xref> for more up-to-date information.</t> | |||
<section anchor="features-outside-of-the-streamable-subset"><name>Features outside of the streamable subset</name> | <section anchor="features-outside-of-the-streamable-subset"><name>Features outside of the Streamable Subset</name> | |||
<t>As described in <xref target="streamable-subset"></xref>, FLAC specifies a subset of its capabilities as the FLAC streamable subset. Certain decoders may choose to only decode FLAC files conforming to the limitations imposed by the streamable subset. Therefore, maximum compatibility with decoders is achieved when the limitations of the FLAC streamable subset are followed when creating FLAC files.</t> | <t>As described in <xref target="streamable-subset"></xref>, FLAC specifies a subset of its capabilities as the FLAC streamable subset. Certain decoders may choose to only decode FLAC files conforming to the limitations imposed by the streamable subset. Therefore, maximum compatibility with decoders is achieved when the limitations of the FLAC streamable subset are followed when creating FLAC files.</t> | |||
</section> | </section> | |||
<section anchor="variable-block-size"><name>Variable block size</name> | <section anchor="variable-block-size"><name>Variable Block Size</name> | |||
<t>Because it is often difficult to find the optimal arrangement of block sizes | <t>Because it is often difficult to find the optimal arrangement of block sizes | |||
for maximum compression, most encoders choose to create files with a fixed block | for maximum compression, most encoders choose to create files with a fixed block | |||
size. Because of this, many decoder implementations receive minimal use when ha | size. Because of this, many decoder implementations receive minimal use when ha | |||
ndling variable block size streams, and this can reveal bugs or reveal that impl | ndling variable block size streams, and this can reveal bugs or reveal that impl | |||
ementations do not decode them at all. Furthermore, as explained in <xref target | ementations do not decode them at all. Furthermore, as explained in <xref target | |||
="addition-of-blocking-strategy-bit"></xref>, there have been some changes to th | ="addition-of-blocking-strategy-bit"></xref>, there have been some changes to th | |||
e way variable block size streams were encoded. Because of this, maximum compati | e way variable block size streams are encoded. Because of this, maximum compatib | |||
bility with decoders is achieved when FLAC files are created using fixed block s | ility with decoders is achieved when FLAC files are created using fixed block si | |||
ize streams.</t> | ze streams.</t> | |||
</section> | </section> | |||
<section anchor="rice-parameter-5-bit"><name>5-bit Rice parameter</name> | <section anchor="rice-parameter-5-bit"><name>5-Bit Rice Parameters</name> | |||
<t>As the addition of the 5-bit Rice parameter, as described in <xref target="ad | <t> As the addition of the coding method using 5-bit Rice parameters, | |||
dition-of-5-bit-rice-parameters"></xref>, occurred quite a few years after the F | as described in <xref target="addition-of-5-bit-rice-parameters"/>, occurred | |||
LAC format was first introduced, some early decoders might not be able to decode | quite a few years after the | |||
files containing such Rice parameters. The introduction of this was specificall | FLAC format was first introduced, some early decoders might not | |||
y aimed at improving compression of 24-bit PCM audio, and compression of 16-bit | be able to decode files containing such Rice parameters. The introduction of | |||
PCM audio only rarely benefits from using 5-bit Rice parameters. Therefore, maxi | this was specifically aimed at improving compression of 24-bit PCM audio, and co | |||
mum compatibility with decoders is achieved when FLAC files containing audio wit | mpression of 16-bit PCM audio only rarely benefits from using 5-bit Rice paramet | |||
h a bit depth of 16 bits or lower are created without any use of 5-bit Rice para | ers. Therefore, maximum compatibility with decoders is achieved when FLAC files | |||
meters.</t> | containing audio with a bit depth of 16 bits or less are created without any use | |||
of 5-bit Rice parameters.</t> | ||||
</section> | </section> | |||
<section anchor="rice-escape-code"><name>Rice escape code</name> | <section anchor="rice-escape-code"><name>Rice Escape Code</name> | |||
<t>Escaped Rice partitions are seldom used, as it turned out their use provides | <t>Escaped Rice partitions are seldom used, as it turned out their use provides | |||
only a very small compression improvement. As many encoders therefore do not use | only a very small compression improvement. As many encoders do not use these by | |||
these by default or are not capable of producing them at all, it is likely that | default or are not capable of producing them at all, it is likely that many deco | |||
many decoder implementations are not able to decode them correctly. Therefore, | der implementations are not able to decode them correctly. Therefore, maximum co | |||
maximum compatibility with decoders is achieved when FLAC files are created with | mpatibility with decoders is achieved when FLAC files are created without any us | |||
out any use of escaped Rice partitions.</t> | e of escaped Rice partitions.</t> | |||
</section> | </section> | |||
<section anchor="uncommon-block-size-1"><name>Uncommon block size</name> | <section anchor="uncommon-block-size-1"><name>Uncommon Block Size</name> | |||
<t>For unknown reasons, some decoders have chosen to support only common block sizes for all but the last block of a stream. Therefore, maximum compatibility with decoders is achieved when creating FLAC files using common block sizes, as listed in <xref target="block-size-bits"></xref>, for all but the last block of a stream.</t> | <t>For unknown reasons, some decoders have chosen to support only common block sizes for all but the last block of a stream. Therefore, maximum compatibility with decoders is achieved when creating FLAC files using common block sizes, as listed in <xref target="block-size-bits"></xref>, for all but the last block of a stream.</t> | |||
</section> | </section> | |||
<section anchor="uncommon-bit-depth"><name>Uncommon bit depth</name> | <section anchor="uncommon-bit-depth"><name>Uncommon Bit Depth</name> | |||
<t>Most audio is stored in bit depths that are a whole number of bytes, e.g., 8, | <t>Most audio is stored in bit depths that are a whole number of bytes, e.g., 8, | |||
16 or 24 bit. There is however audio with different bit depths. A few examples: | 16, or 24 bits. However, there is audio with different bit depths. A few exampl | |||
</t> | es:</t> | |||
<ul spacing="compact"> | <ul> | |||
<li>DVD-Audio has the possibility to store 20 bit PCM audio.</li> | <li>DVD-Audio has the possibility to store 20-bit PCM audio.</li> | |||
<li>DAT and DV can store 12 bit PCM audio.</li> | <li>DAT and DV can store 12-bit PCM audio.</li> | |||
<li>NICAM-728 samples at 14 bit, which is companded to 10 bit.</li> | <li>NICAM-728 samples at 14 bits, which is companded to 10 bits.</li> | |||
<li>8-bit µ-law can be losslessly converted to 14 bit (Linear) PCM.</li> | <li>8-bit µ-law can be losslessly converted to 14-bit (Linear) PCM.</li> | |||
<li>8-bit A-law can be losslessly converted to 13 bit (Linear) PCM.</li> | <li>8-bit A-law can be losslessly converted to 13-bit (Linear) PCM.</li> | |||
</ul> | </ul> | |||
<t>The FLAC format can contain these bit depths directly, but because they are uncommon, some decoders are not able to process the resulting files correctly. It is possible to store these formats in a FLAC file with a more common bit depth without sacrificing compression by padding each sample with zero bits to a bit depth that is a whole byte. The FLAC format can efficiently compress these wasted bits. See <xref target="wasted-bits-per-sample"></xref> for details.</t> | <t>The FLAC format can contain these bit depths directly, but because they are uncommon, some decoders are not able to process the resulting files correctly. It is possible to store these formats in a FLAC file with a more common bit depth without sacrificing compression by padding each sample with zero bits to a bit depth that is a whole byte. The FLAC format can efficiently compress these wasted bits. See <xref target="wasted-bits-per-sample"></xref> for details.</t> | |||
<t>Therefore, maximum compatibility with decoders is achieved when FLAC files are created by padding samples of such audio with zero bits to the bit depth that is the next whole number of bytes.</t> | <t>Therefore, maximum compatibility with decoders is achieved when FLAC files are created by padding samples of such audio with zero bits to the bit depth that is the next whole number of bytes.</t> | |||
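The padding recommended here can be sketched as below. This is a minimal Python illustration, not part of either version of the document: `pad_sample` and `unpad_sample` are hypothetical helpers, and the sketch assumes the zero bits are appended as least-significant bits so that they show up as wasted bits.

```python
def padded_depth(bit_depth):
    # Round the bit depth up to the next whole number of bytes.
    return -(-bit_depth // 8) * 8


def pad_sample(sample, bit_depth):
    """Return (padded_sample, padded_bit_depth)."""
    target = padded_depth(bit_depth)
    # Shifting left appends zero bits at the least-significant end.
    return sample << (target - bit_depth), target


def unpad_sample(padded, original_bit_depth):
    """Undo pad_sample(); requires knowing the original bit depth."""
    target = padded_depth(original_bit_depth)
    return padded >> (target - original_bit_depth)


print(pad_sample(0x7FFFF, 20))
```

Reversing the operation needs the original bit depth as an input, which is the reason the surrounding text asks for that value to be stored alongside the audio.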
<t>In cases where the original signal is already padded, this operation cannot b | <t>In cases where the original signal is already padded, this operation cannot b | |||
e reversed losslessly without knowing the original bit depth. To leave no ambigu | e reversed losslessly without knowing the original bit depth. | |||
ity, the original bit depth needs to be stored, for example, in a vorbis comment | To leave no ambiguity, the original bit depth needs to be stored, for example, | |||
field, by storing the header of the original file, or in a description of the f | in a Vorbis comment field or by storing the header of the original file. The | |||
ile. The choice of a suitable method is left to the implementer.</t> | choice of a suitable method is left to the implementor.</t> | |||
<t>Besides audio with a 'non-whole byte' bit depth, some decoder implementations | <t>Besides audio with a "non-whole byte" bit depth, some decoder implementations | |||
have chosen to only accept FLAC files coding for PCM audio with a bit depth of | have chosen to only accept FLAC files coding for PCM audio with a bit depth of | |||
16 bit. Many implementations support bit depths up to 24 bit but no higher. Cons | 16 bits. Many implementations support bit depths up to 24 bits, but no higher. C | |||
ult <xref target="FLAC-wiki-interoperability"></xref> for more up-to-date inform | onsult <xref target="FLAC-wiki-interoperability"></xref> for more up-to-date inf | |||
ation.</t> | ormation.</t> | |||
</section> | </section> | |||
<section anchor="multi-channel-audio-and-uncommon-sample-rates"><name>Multi-channel audio and uncommon sample rates</name> | <section anchor="multi-channel-audio-and-uncommon-sample-rates"><name>Multi-Channel Audio and Uncommon Sample Rates</name> | |||
<t>Many FLAC audio players are unable to render multi-channel audio or audio with an uncommon sample rate. While this is not a concern specific to the FLAC format, it is of note when requiring maximum compatibility with decoders. Unlike the previously mentioned interoperability considerations, this is one where compatibility cannot be improved without sacrificing the lossless nature of the FLAC format.</t> | <t>Many FLAC audio players are unable to render multi-channel audio or audio with an uncommon sample rate. While this is not a concern specific to the FLAC format, it is of note when requiring maximum compatibility with decoders. Unlike the previously mentioned interoperability considerations, this is one where compatibility cannot be improved without sacrificing the lossless nature of the FLAC format.</t> | |||
<t>From a non-exhaustive inquiry, it seems that a non-negligible amount of playe | <t>From a non-exhaustive inquiry, it seems that a non-negligible number of playe | |||
rs, especially hardware players, do not support audio with 3 or more channels or | rs, especially hardware players, do not support audio with 3 or more channels or | |||
sample rates other than those considered common, see <xref target="sample-rate- | sample rates other than those considered common; see <xref target="sample-rate- | |||
bits"></xref>.</t> | bits"></xref>.</t> | |||
<t>For those players that do support and are able to render multi-channel audio, | <t>For those players that do support and are able to render multi-channel audio, | |||
many do not parse and use the WAVEFORMATEXTENSIBLE_CHANNEL_MASK tag (see <xref | many do not parse and use the WAVEFORMATEXTENSIBLE_CHANNEL_MASK tag (see <xref | |||
target="channel-mask"></xref>). This too is an interoperability consideration wh | target="channel-mask"></xref>). This is also an interoperability consideration b | |||
ere compatibility cannot be improved without sacrificing the lossless nature of | ecause compatibility cannot be improved without sacrificing the lossless nature | |||
the FLAC format.</t> | of the FLAC format.</t> | |||
</section> | </section> | |||
<section anchor="changing-audio-properties-mid-stream"><name>Changing audio properties mid-stream</name> | <section anchor="changing-audio-properties-mid-stream"><name>Changing Audio Properties Mid-Stream</name> | |||
<t>Each FLAC frame header stores the audio sample rate, number of bits per sample, and number of channels independently of the streaminfo metadata block and other frame headers. This was done to permit multicasting of FLAC files, but it also allows these properties to change mid-stream. However, many FLAC decoders do not handle such changes, as few other formats are capable of holding such streams and changing playback properties during playback is often not possible without interrupting playback. Also, as explained in <xref target="frame-structure"></xref>, using this feature of FLAC results in various practical problems.</t> | <t>Each FLAC frame header stores the audio sample rate, number of bits per sample, and number of channels independently of the streaminfo metadata block and other frame headers. This was done to permit multicasting of FLAC files, but it also allows these properties to change mid-stream. However, many FLAC decoders do not handle such changes, as few other formats are capable of holding such streams and changing playback properties during playback is often not possible without interrupting playback. Also, as explained in <xref target="frame-structure"></xref>, using this feature of FLAC results in various practical problems.</t> | |||
<t>However, even when storing an audio stream with changing properties in FLAC encapsulated in a container capable of handling such changes, as recommended in <xref target="frame-structure"></xref>, many decoders are not able to decode such a stream correctly. Therefore, maximum compatibility with decoders is achieved when FLAC files are created with a single set of audio properties, in which the properties coded in the streaminfo metadata block (see <xref target="streaminfo"></xref>) and the properties coded in all frame headers (see <xref target="frame-header"></xref>) are the same. This can be achieved by splitting up an input stream with changing audio properties at the points where these properties change into separate streams or files.</t> | <t>However, even when storing an audio stream with changing properties in FLAC encapsulated in a container capable of handling such changes, as recommended in <xref target="frame-structure"></xref>, many decoders are not able to decode such a stream correctly. Therefore, maximum compatibility with decoders is achieved when FLAC files are created with a single set of audio properties, in which the properties coded in the streaminfo metadata block (see <xref target="streaminfo"></xref>) and the properties coded in all frame headers (see <xref target="frame-header"></xref>) are the same. This can be achieved by splitting up an input stream with changing audio properties at the points where these properties change into separate streams or files.</t> | |||
</section> | </section> | |||
</section> | </section> | |||
<section anchor="examples"><name>Examples</name> | <section anchor="examples"><name>Examples</name> | |||
<t>This informational appendix contains short example FLAC files that are decode | <t>This informational appendix contains short examples of FLAC files that are de | |||
d step by step. These examples provide a more engaging way to understand the FLA | coded step by step. These examples provide a more engaging way to understand the | |||
C format than the formal specification. The text explaining these examples assum | FLAC format than the formal specification. The text explaining these examples a | |||
es the reader has at least cursorily read the specification and that the reader | ssumes the reader has at least cursorily read the specification and that the rea | |||
refers to the specification for explanation of the terminology used. These examp | der refers to the specification for explanation of the terminology used. These e | |||
les mostly focus on the layout of several metadata blocks and subframe types and | xamples mostly focus on the layout of several metadata blocks, subframe types, a | |||
the implications of certain aspects (for example, wasted bits and stereo decorr | nd the implications of certain aspects (e.g., wasted bits and stereo decorrelati | |||
elation) on this layout.</t> | on) on this layout.</t> | |||
<t>The examples feature files generated by various FLAC encoders. These are pres | <t>The examples feature files generated by various FLAC encoders. These are pres | |||
ented in hexadecimal or binary format, followed by tables and text referring to | ented in hexadecimal or binary format, followed by tables and text referring to | |||
various features by their starting bit positions in these representations. Each | various features by their starting bit positions in these representations. Each | |||
starting position (shortened to 'start' in the tables) is a hexadecimal byte pos | starting position (shortened to "start" in the tables) is a hexadecimal byte pos | |||
ition and a start bit within that byte, separated by a plus sign. Counts for the | ition and a start bit within that byte, separated by a plus sign. Counts for the | |||
se start at zero. For example, a feature starting at the 3rd bit of the 17th byt | se start at zero. For example, a feature starting at the 3rd bit of the 17th byt | |||
e is referred to as starting at 0x10+2. The files that are explored in these exa | e is referred to as starting at 0x10+2. The files that are explored in these exa | |||
mples can be found at <xref target="FLAC-specification-github"></xref>.</t> | mples can be found at <xref target="FLAC-specification-github"></xref>.</t> | |||
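The start-position notation described above can be sketched in a few lines of Python. This is only an illustration; the helper names `format_position` and `parse_position` are hypothetical and do not appear in the specification.

```python
def format_position(bit_index: int) -> str:
    """Format a zero-based absolute bit index as 'hex byte offset + start bit'."""
    byte_offset, bit_in_byte = divmod(bit_index, 8)
    return f"0x{byte_offset:02x}+{bit_in_byte}"


def parse_position(text: str) -> int:
    """Inverse of format_position: return the zero-based absolute bit index."""
    byte_part, bit_part = text.split("+")
    return int(byte_part, 16) * 8 + int(bit_part)


# The 3rd bit of the 17th byte: counts start at zero, so byte 16, bit 2.
example = format_position(16 * 8 + 2)   # "0x10+2"
```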
<t>All data in this appendix has been thoroughly verified. However, as this appendix is informational, if any information here conflicts with statements in the formal specification, the latter takes precedence.</t> | <t>All data in this appendix has been thoroughly verified. However, as this appendix is informational, if any information here conflicts with statements in the formal specification, the latter takes precedence.</t> | |||
<section anchor="decoding-example-1"><name>Decoding example 1</name> | <section anchor="decoding-example-1"><name>Decoding Example 1</name> | |||
<t>This very short example FLAC file codes for PCM audio that has two channels, each containing one sample. The focus of this example is on the essential parts of a FLAC file.</t> | <t>This very short example FLAC file codes for PCM audio that has two channels, each containing one sample. The focus of this example is on the essential parts of a FLAC file.</t> | |||
<section anchor="example-file-1-in-hexadecimal-representation"><name>Example file 1 in hexadecimal representation</name> | <section anchor="example-file-1-in-hexadecimal-representation"><name>Example File 1 in Hexadecimal Representation</name> | |||
<artwork><![CDATA[00000000: 664c 6143 8000 0022 1000 1000 fLaC...".... | <artwork type=""> | |||
00000000: 664c 6143 8000 0022 1000 1000 fLaC...".... | ||||
0000000c: 0000 0f00 000f 0ac4 42f0 0000 ........B... | 0000000c: 0000 0f00 000f 0ac4 42f0 0000 ........B... | |||
00000018: 0001 3e84 b418 07dc 6903 0758 ..>.....i..X | 00000018: 0001 3e84 b418 07dc 6903 0758 ..>.....i..X | |||
00000024: 6a3d ad1a 2e0f fff8 6918 0000 j=......i... | 00000024: 6a3d ad1a 2e0f fff8 6918 0000 j=......i... | |||
00000030: bf03 58fd 0312 8baa 9a ..X...... | 00000030: bf03 58fd 0312 8baa 9a ..X...... | |||
]]> | ||||
</artwork> | </artwork> | |||
</section> | </section> | |||
<section anchor="example-file-1-in-binary-representation"><name>Example file 1 in binary representation</name> | <section anchor="example-file-1-in-binary-representation"><name>Example File 1 in Binary Representation</name> | |||
<artwork><![CDATA[00000000: 01100110 01001100 01100001 01000011 fLaC | <artwork type=""> | |||
00000000: 01100110 01001100 01100001 01000011 fLaC | ||||
00000004: 10000000 00000000 00000000 00100010 ..." | 00000004: 10000000 00000000 00000000 00100010 ..." | |||
00000008: 00010000 00000000 00010000 00000000 .... | 00000008: 00010000 00000000 00010000 00000000 .... | |||
0000000c: 00000000 00000000 00001111 00000000 .... | 0000000c: 00000000 00000000 00001111 00000000 .... | |||
00000010: 00000000 00001111 00001010 11000100 .... | 00000010: 00000000 00001111 00001010 11000100 .... | |||
00000014: 01000010 11110000 00000000 00000000 B... | 00000014: 01000010 11110000 00000000 00000000 B... | |||
00000018: 00000000 00000001 00111110 10000100 ..>. | 00000018: 00000000 00000001 00111110 10000100 ..>. | |||
0000001c: 10110100 00011000 00000111 11011100 .... | 0000001c: 10110100 00011000 00000111 11011100 .... | |||
00000020: 01101001 00000011 00000111 01011000 i..X | 00000020: 01101001 00000011 00000111 01011000 i..X | |||
00000024: 01101010 00111101 10101101 00011010 j=.. | 00000024: 01101010 00111101 10101101 00011010 j=.. | |||
00000028: 00101110 00001111 11111111 11111000 .... | 00000028: 00101110 00001111 11111111 11111000 .... | |||
0000002c: 01101001 00011000 00000000 00000000 i... | 0000002c: 01101001 00011000 00000000 00000000 i... | |||
00000030: 10111111 00000011 01011000 11111101 ..X. | 00000030: 10111111 00000011 01011000 11111101 ..X. | |||
00000034: 00000011 00010010 10001011 10101010 .... | 00000034: 00000011 00010010 10001011 10101010 .... | |||
00000038: 10011010 | 00000038: 10011010 | |||
]]> | ||||
</artwork> | </artwork> | |||
</section> | </section> | |||
<section anchor="signature-and-streaminfo"><name>Signature and streaminfo</name> | <section anchor="signature-and-streaminfo"><name>Signature and Streaminfo</name> | |||
<t>The first 4 bytes of the file contain the fLaC file signature. Directly follo | <t>The first 4 bytes of the file contain the <tt>fLaC</tt> file signature. Direc | |||
wing it is a metadata block. The signature and the first metadata block header a | tly following it is a metadata block. The signature and the first metadata block | |||
re broken down in the following table.</t> | header are broken down in the following table.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Start</th> | <th align="left">Start</th> | |||
<th align="left">Length</th> | <th align="left">Length</th> | |||
<th align="left">Contents</th> | <th align="left">Contents</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0x00+0</td> | <td align="left">0x00+0</td> | |||
<td align="left">4 bytes</td> | <td align="left">4 bytes</td> | |||
<td align="left">0x664C6143</td> | <td align="left">0x664C6143</td> | |||
<td align="left">fLaC</td> | <td align="left"><tt>fLaC</tt></td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x04+0</td> | <td align="left">0x04+0</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b1</td> | <td align="left">0b1</td> | |||
<td align="left">Last metadata block</td> | <td align="left">Last metadata block</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x04+1</td> | <td align="left">0x04+1</td> | |||
<td align="left">7 bits</td> | <td align="left">7 bits</td> | |||
<td align="left">0b0000000</td> | <td align="left">0b0000000</td> | |||
<td align="left">Streaminfo metadata block</td> | <td align="left">Streaminfo metadata block</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x05+0</td> | <td align="left">0x05+0</td> | |||
<td align="left">3 bytes</td> | <td align="left">3 bytes</td> | |||
<td align="left">0x000022</td> | <td align="left">0x000022</td> | |||
<td align="left">Length 34 byte</td> | <td align="left">Length of 34 bytes</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>As the header indicates that this is the last metadata block, the position of the first audio frame can now be calculated as the position of the first byte after the metadata block header + the length of the block, i.e., 8+34 = 42 or 0x2a. As can be seen, 0x2a indeed contains the frame sync code for fixed block size streams, 0xfff8.</t> | </table><t>As the header indicates that this is the last metadata block, the position of the first audio frame can now be calculated as the position of the first byte after the metadata block header + the length of the block, i.e., 8+34 = 42 or 0x2a. Thus, 0x2a indeed contains the frame sync code for fixed block size streams -- 0xfff8.</t> | |||
<t>The streaminfo metadata block contents are broken down in the following table.</t> | <t>The streaminfo metadata block contents are broken down in the following table.</t> | |||
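The offset calculation above can be reproduced with a short Python sketch. It is a minimal illustration, not a complete parser: the function name `first_frame_offset` is hypothetical, and the byte string is copied from the hexadecimal dump of example file 1.

```python
# Example file 1, transcribed from the hexadecimal representation above.
EXAMPLE_1 = bytes.fromhex(
    "664c 6143 8000 0022 1000 1000"
    "0000 0f00 000f 0ac4 42f0 0000"
    "0001 3e84 b418 07dc 6903 0758"
    "6a3d ad1a 2e0f fff8 6918 0000"
    "bf03 58fd 0312 8baa 9a"
)


def first_frame_offset(data: bytes) -> int:
    """Walk the metadata block headers and return the offset of the first frame."""
    if data[:4] != b"fLaC":
        raise ValueError("missing fLaC signature")
    pos = 4
    while True:
        is_last = data[pos] >> 7                          # 1 bit: last-block flag
        length = int.from_bytes(data[pos + 1:pos + 4], "big")  # 3 bytes: block length
        pos += 4 + length                                 # skip header and block body
        if is_last:
            return pos


offset = first_frame_offset(EXAMPLE_1)   # 4 + 4 + 34 = 42, i.e., 0x2a
sync = EXAMPLE_1[offset:offset + 2]      # the two bytes 0xff 0xf8
```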
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Start</th> | <th align="left">Start</th> | |||
<th align="left">Length</th> | <th align="left">Length</th> | |||
<th align="left">Contents</th> | <th align="left">Contents</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
skipping to change at line 2401 ¶ | skipping to change at line 2572 ¶ | |||
<td align="left">0x0a+0</td> | <td align="left">0x0a+0</td> | |||
<td align="left">2 bytes</td> | <td align="left">2 bytes</td> | |||
<td align="left">0x1000</td> | <td align="left">0x1000</td> | |||
<td align="left">Max. block size 4096</td> | <td align="left">Max. block size 4096</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x0c+0</td> | <td align="left">0x0c+0</td> | |||
<td align="left">3 bytes</td> | <td align="left">3 bytes</td> | |||
<td align="left">0x00000f</td> | <td align="left">0x00000f</td> | |||
<td align="left">Min. frame size 15 byte</td> | <td align="left">Min. frame size 15 bytes</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x0f+0</td> | <td align="left">0x0f+0</td> | |||
<td align="left">3 bytes</td> | <td align="left">3 bytes</td> | |||
<td align="left">0x00000f</td> | <td align="left">0x00000f</td> | |||
<td align="left">Max. frame size 15 byte</td> | <td align="left">Max. frame size 15 bytes</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x12+0</td> | <td align="left">0x12+0</td> | |||
<td align="left">20 bits</td> | <td align="left">20 bits</td> | |||
<td align="left">0x0ac4, 0b0100</td> | <td align="left">0x0ac4, 0b0100</td> | |||
<td align="left">Sample rate 44100 hertz</td> | <td align="left">Sample rate 44100 hertz</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
skipping to change at line 2446 ¶ | skipping to change at line 2617 ¶ | |||
<td align="left">Total no. of samples 1</td> | <td align="left">Total no. of samples 1</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x1a</td> | <td align="left">0x1a</td> | |||
<td align="left">16 bytes</td> | <td align="left">16 bytes</td> | |||
<td align="left">(...)</td> | <td align="left">(...)</td> | |||
<td align="left">MD5 checksum</td> | <td align="left">MD5 checksum</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>The minimum and maximum block size are both 4096. This was apparently the block size the encoder planned to use, but as only 1 interchannel sample was provided, no frames with 4096 samples are actually present in this file.</t> | </table><t>The minimum and maximum block sizes are both 4096. This was apparently the block size the encoder planned to use, but as only 1 interchannel sample was provided, no frames with 4096 samples are actually present in this file.</t> | |||
<t>Note that anywhere a number of samples is mentioned (block size, total number of samples, sample rate), interchannel samples are meant.</t> | <t>Note that anywhere a number of samples is mentioned (block size, total number of samples, sample rate), interchannel samples are meant.</t> | |||
<t>The MD5 checksum (starting at 0x1a) is 0x3e84 b418 07dc 6903 0758 6a3d ad1a 2e0f. This will be validated after decoding the samples.</t> | <t>The MD5 checksum (starting at 0x1a) is 0x3e84 b418 07dc 6903 0758 6a3d ad1a 2e0f. This will be validated after decoding the samples.</t> | |||
</section> | </section> | |||
<section anchor="audio-frames"><name>Audio frames</name> | <section anchor="audio-frames"><name>Audio Frames</name> | |||
<t>The frame header starts at position 0x2a and is broken down in the following table.</t> | <t>The frame header starts at position 0x2a and is broken down in the following table.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Start</th> | <th align="left">Start</th> | |||
<th align="left">Length</th> | <th align="left">Length</th> | |||
<th align="left">Contents</th> | <th align="left">Contents</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0x2a+0</td> | <td align="left">0x2a+0</td> | |||
<td align="left">15 bits</td> | <td align="left">15 bits</td> | |||
<td align="left">0xff, 0b1111100</td> | <td align="left">0xff, 0b1111100</td> | |||
<td align="left">frame sync</td> | <td align="left">Frame sync</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x2b+7</td> | <td align="left">0x2b+7</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b0</td> | <td align="left">0b0</td> | |||
<td align="left">blocking strategy</td> | <td align="left">Blocking strategy</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x2c+0</td> | <td align="left">0x2c+0</td> | |||
<td align="left">4 bits</td> | <td align="left">4 bits</td> | |||
<td align="left">0b0110</td> | <td align="left">0b0110</td> | |||
<td align="left">8-bit block size further down</td> | <td align="left">8-bit block size further down</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x2c+4</td> | <td align="left">0x2c+4</td> | |||
<td align="left">4 bits</td> | <td align="left">4 bits</td> | |||
<td align="left">0b1001</td> | <td align="left">0b1001</td> | |||
<td align="left">sample rate 44.1 kHz</td> | <td align="left">Sample rate 44.1 kHz</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x2d+0</td> | <td align="left">0x2d+0</td> | |||
<td align="left">4 bits</td> | <td align="left">4 bits</td> | |||
<td align="left">0b0001</td> | <td align="left">0b0001</td> | |||
<td align="left">stereo, no decorrelation</td> | <td align="left">Stereo, no decorrelation</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x2d+4</td> | <td align="left">0x2d+4</td> | |||
<td align="left">3 bits</td> | <td align="left">3 bits</td> | |||
<td align="left">0b100</td> | <td align="left">0b100</td> | |||
<td align="left">bit depth 16 bit</td> | <td align="left">Bit depth 16 bits</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x2d+7</td> | <td align="left">0x2d+7</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b0</td> | <td align="left">0b0</td> | |||
<td align="left">mandatory 0 bit</td> | <td align="left">Mandatory 0 bit</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x2e+0</td> | <td align="left">0x2e+0</td> | |||
<td align="left">1 byte</td> | <td align="left">1 byte</td> | |||
<td align="left">0x00</td> | <td align="left">0x00</td> | |||
<td align="left">frame number 0</td> | <td align="left">Frame number 0</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x2f+0</td> | <td align="left">0x2f+0</td> | |||
<td align="left">1 byte</td> | <td align="left">1 byte</td> | |||
<td align="left">0x00</td> | <td align="left">0x00</td> | |||
<td align="left">block size 1</td> | <td align="left">Block size 1</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x30+0</td> | <td align="left">0x30+0</td> | |||
<td align="left">1 byte</td> | <td align="left">1 byte</td> | |||
<td align="left">0xbf</td> | <td align="left">0xbf</td> | |||
<td align="left">frame header CRC</td> | <td align="left">Frame header CRC</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>As the stream is a fixed block size stream, the number at 0x2e contains a frame number. As the value is smaller than 128, only 1 byte is used for the encoding.</t> | </table><t>As the stream is a fixed block size stream, the number at 0x2e contains a frame number. Because the value is smaller than 128, only 1 byte is used for the encoding.</t> | |||
<t>At byte 0x31, the first subframe starts, which is broken down in the following table.</t> | <t>At byte 0x31, the first subframe starts, which is broken down in the following table.</t> | |||
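The frame header table above can be checked with a small Python sketch. It only handles the case that occurs in this example (a one-byte coded number and block size bits 0b0110, meaning an 8-bit "block size minus 1" follows) and does not verify the CRC. The function name `parse_example_frame_header` and the dictionary keys are hypothetical.

```python
# Bytes 0x2a through 0x30 of example file 1.
FRAME_HEADER = bytes.fromhex("ff f8 69 18 00 00 bf")


def parse_example_frame_header(h: bytes) -> dict:
    """Split the 7-byte frame header of example 1 into its fields."""
    fields = {
        "sync": (h[0] << 7) | (h[1] >> 1),   # 15 bits
        "blocking_strategy": h[1] & 1,        # 0 = fixed block size
        "block_size_bits": h[2] >> 4,
        "sample_rate_bits": h[2] & 0x0F,
        "channel_bits": h[3] >> 4,
        "bit_depth_bits": (h[3] >> 1) & 0x07,
        "reserved": h[3] & 1,
    }
    # Assumption: the coded number fits in one byte (value below 128).
    if h[4] >= 0x80:
        raise ValueError("multi-byte coded number not handled in this sketch")
    fields["frame_number"] = h[4]
    # Assumption: block size bits 0b0110, so one byte holds (block size - 1).
    if fields["block_size_bits"] != 0b0110:
        raise ValueError("only the 8-bit block size case is handled here")
    fields["block_size"] = h[5] + 1
    fields["header_crc"] = h[6]
    return fields


header = parse_example_frame_header(FRAME_HEADER)
```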
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Start</th> | <th align="left">Start</th> | |||
<th align="left">Length</th> | <th align="left">Length</th> | |||
<th align="left">Contents</th> | <th align="left">Contents</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0x31+0</td> | <td align="left">0x31+0</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b0</td> | <td align="left">0b0</td> | |||
<td align="left">mandatory 0 bit</td> | <td align="left">Mandatory 0 bit</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x31+1</td> | <td align="left">0x31+1</td> | |||
<td align="left">6 bits</td> | <td align="left">6 bits</td> | |||
<td align="left">0b000001</td> | <td align="left">0b000001</td> | |||
<td align="left">verbatim subframe</td> | <td align="left">Verbatim subframe</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x31+7</td> | <td align="left">0x31+7</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b1</td> | <td align="left">0b1</td> | |||
<td align="left">wasted bits used</td> | <td align="left">Wasted bits used</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x32+0</td> | <td align="left">0x32+0</td> | |||
<td align="left">2 bits</td> | <td align="left">2 bits</td> | |||
<td align="left">0b01</td> | <td align="left">0b01</td> | |||
<td align="left">2 wasted bits used</td> | <td align="left">2 wasted bits used</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x32+2</td> | <td align="left">0x32+2</td> | |||
<td align="left">14 bits</td> | <td align="left">14 bits</td> | |||
<td align="left">0b011000, 0xfd</td> | <td align="left">0b011000, 0xfd</td> | |||
<td align="left">14-bit unencoded sample</td> | <td align="left">14-bit unencoded sample</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>As the wasted bits flag is 1 in this subframe, an unary coded number | </table><t>As the wasted bits flag is 1 in this subframe, a unary-coded number f | |||
follows. Starting at 0x32, we see 0b01, which unary codes for 1, meaning this su | ollows. Starting at 0x32, we see 0b01, which unary codes for 1, meaning that thi | |||
bframe uses 2 wasted bits.</t> | s subframe uses 2 wasted bits.</t> | |||
<t>As this is a verbatim subframe, the subframe only contains unencoded sample v | <t>As this is a verbatim subframe, the subframe only contains unencoded sample v | |||
alues. With a block size of 1, it contains only a single sample. The bit depth o | alues. With a block size of 1, it contains only a single sample. The bit depth o | |||
f the audio is 16 bits, but as the subframe header signals the use of 2 wasted b | f the audio is 16 bits, but as the subframe header signals the use of 2 wasted b | |||
its, only 14 bits are stored. As no stereo decorrelation is used, a bit depth in | its, only 14 bits are stored. As no stereo decorrelation is used, a bit depth in | |||
crease for the side channel is not applicable. So, the next 14 bits (starting at | crease for the side channel is not applicable. So, the next 14 bits (starting at | |||
position 0x32+2) contain the unencoded sample coded big-endian, signed two's co | position 0x32+2) contain the unencoded sample coded big-endian, signed two's co | |||
mplement. The value reads 0b011000 11111101, or 6397. This value needs to be shi | mplement. The value reads 0b011000 11111101, or 6397. This value needs to be shi | |||
fted left by 2 bits, to account for the wasted bits. The value is then 0b011000 | fted left by 2 bits to account for the wasted bits. The value is then 0b011000 1 | |||
11111101 00, or 25588.</t> | 1111101 00, or 25588.</t> | |||
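The wasted-bits handling described above can be sketched in Python as follows. This is a minimal illustration for a verbatim subframe holding a single sample; the `BitReader` class and the function `decode_verbatim_sample` are hypothetical names, and the input bytes are those at 0x31 through 0x36 of example file 1.

```python
class BitReader:
    """Reads bits most-significant-bit first from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # absolute bit position

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def read_unary(self) -> int:
        """Count zero bits up to and including the terminating one bit."""
        zeros = 0
        while self.read(1) == 0:
            zeros += 1
        return zeros


def decode_verbatim_sample(reader: BitReader, bit_depth: int):
    """Decode a verbatim subframe with one sample; return (sample, wasted_bits)."""
    if reader.read(1) != 0:
        raise ValueError("mandatory 0 bit is not 0")
    if reader.read(6) != 0b000001:
        raise ValueError("not a verbatim subframe")
    wasted = 0
    if reader.read(1):                      # wasted bits flag
        wasted = reader.read_unary() + 1    # unary-coded (wasted bits - 1)
    width = bit_depth - wasted
    raw = reader.read(width)
    if raw >> (width - 1):                  # two's complement sign bit
        raw -= 1 << width
    return raw << wasted, wasted


reader = BitReader(bytes.fromhex("03 58 fd 03 12 8b"))
first = decode_verbatim_sample(reader, 16)   # (25588, 2)
```

Because the reader keeps its bit position, calling `decode_verbatim_sample(reader, 16)` a second time continues with the subframe that follows in the file.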
<t>The second subframe starts at 0x34, and is broken down in the following table | <t>The second subframe starts at 0x34 and is broken down in the following table. | |||
.</t> | </t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Start</th> | <th align="left">Start</th> | |||
<th align="left">Length</th> | <th align="left">Length</th> | |||
<th align="left">Contents</th> | <th align="left">Contents</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0x34+0</td> | <td align="left">0x34+0</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b0</td> | <td align="left">0b0</td> | |||
<td align="left">mandatory 0 bit</td> | <td align="left">Mandatory 0 bit</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x34+1</td> | <td align="left">0x34+1</td> | |||
<td align="left">6 bits</td> | <td align="left">6 bits</td> | |||
<td align="left">0b000001</td> | <td align="left">0b000001</td> | |||
<td align="left">verbatim subframe</td> | <td align="left">Verbatim subframe</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x34+7</td> | <td align="left">0x34+7</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b1</td> | <td align="left">0b1</td> | |||
<td align="left">wasted bits used</td> | <td align="left">Wasted bits used</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x35+0</td> | <td align="left">0x35+0</td> | |||
<td align="left">4 bits</td> | <td align="left">4 bits</td> | |||
<td align="left">0b0001</td> | <td align="left">0b0001</td> | |||
<td align="left">4 wasted bits used</td> | <td align="left">4 wasted bits used</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x35+4</td> | <td align="left">0x35+4</td> | |||
<td align="left">12 bits</td> | <td align="left">12 bits</td> | |||
<td align="left">0b0010, 0x8b</td> | <td align="left">0b0010, 0x8b</td> | |||
<td align="left">12-bit unencoded sample</td> | <td align="left">12-bit unencoded sample</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>Here the wasted bits flag is also one, but the unary coded number that follows it is 4 bit long, indicating the use of 4 wasted bits. This means the sample is stored in 12 bits. The sample value is 0b0010 10001011, or 651. This value now has to be shifted left by 4 bits, i.e., 0b0010 10001011 0000 or 10416.</t> | </table><t>The wasted bits flag is also one, but the unary-coded number that follows it is 4 bits long, indicating the use of 4 wasted bits. This means the sample is stored in 12 bits. The sample value is 0b0010 10001011, or 651. This value now has to be shifted left by 4 bits, i.e., 0b0010 10001011 0000, or 10416.</t> | |||
<t>At this point, we would undo stereo decorrelation if that was applicable.</t> | <t>At this point, we would undo stereo decorrelation if that was applicable.</t> | |||
<t>As the last subframe ends byte-aligned, no padding bits follow it. The next 2 bytes, starting at 0x37, contain the frame CRC. As this is the only frame in the file, the file ends with the CRC.</t> | <t>As the last subframe ends byte-aligned, no padding bits follow it. The next 2 bytes, starting at 0x37, contain the frame CRC. As this is the only frame in the file, the file ends with the CRC.</t> | |||
<t>To validate the MD5 checksum, we line up the samples interleaved, byte-aligned, little endian, signed two's complement. The first sample, with value 25588, translates to 0xf463, the second sample, with value 10416, translates to 0xb028. When computing the MD5 checksum with 0xf463b028 as input, we get the MD5 checksum found in the header, so decoding was lossless.</t> | <t>To validate the MD5 checksum, we line up the samples interleaved, byte-aligned, little-endian, signed two's complement. The first sample, with value 25588, translates to 0xf463, and the second sample, with value 10416, translates to 0xb028. When computing the MD5 checksum with 0xf463b028 as input, we get the MD5 checksum found in the header, so decoding was lossless.</t> | |||
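The MD5 validation step can be sketched in Python with the standard `hashlib` and `struct` modules. The variable names are illustrative only; the expected digest is the value listed for the streaminfo block at 0x1a.

```python
import hashlib
import struct

# Decoded samples of example 1, interleaved: first channel, then second channel.
samples = [25588, 10416]

# Byte-aligned, little-endian, signed two's complement, 16 bits per sample.
pcm = struct.pack("<2h", *samples)          # bytes f4 63 b0 28

digest = hashlib.md5(pcm).hexdigest()

# MD5 checksum as stored in the streaminfo metadata block (0x1a, 16 bytes).
STREAMINFO_MD5 = "3e84b41807dc690307586a3dad1a2e0f"

lossless = digest == STREAMINFO_MD5
```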
</section> | </section> | |||
</section> | </section> | |||
<section anchor="decoding-example-2"><name>Decoding example 2</name> | <section anchor="decoding-example-2"><name>Decoding Example 2</name> | |||
<t>This FLAC file is larger than the first example, but still contains very litt | <t>This FLAC file is larger than the first example, but still contains very litt | |||
le audio. The focus of this example is on decoding a subframe with a fixed predi | le audio. The focus of this example is on decoding a subframe with a fixed predi | |||
ctor and a coded residual, but it also contains a very short seektable, a Vorbis | ctor and a coded residual, but it also contains a very short seek table, a Vorbi | |||
comment metadata block, and a padding metadata block.</t> | s comment metadata block, and a padding metadata block.</t> | |||
<section anchor="example-file-2-in-hexadecimal-representation"><name>Example file 2 in hexadecimal representation</name> | <section anchor="example-file-2-in-hexadecimal-representation"><name>Example File 2 in Hexadecimal Representation</name> | |||
<artwork><![CDATA[00000000: 664c 6143 0000 0022 0010 0010 fLaC...".... | <artwork> | |||
00000000: 664c 6143 0000 0022 0010 0010 fLaC...".... | ||||
0000000c: 0000 1700 0044 0ac4 42f0 0000 .....D..B... | 0000000c: 0000 1700 0044 0ac4 42f0 0000 .....D..B... | |||
00000018: 0013 d5b0 5649 75e9 8b8d 8b93 ....VIu..... | 00000018: 0013 d5b0 5649 75e9 8b8d 8b93 ....VIu..... | |||
00000024: 0422 757b 8103 0300 0012 0000 ."u{........ | 00000024: 0422 757b 8103 0300 0012 0000 ."u{........ | |||
00000030: 0000 0000 0000 0000 0000 0000 ............ | 00000030: 0000 0000 0000 0000 0000 0000 ............ | |||
0000003c: 0000 0010 0400 003a 2000 0000 .......: ... | 0000003c: 0000 0010 0400 003a 2000 0000 .......: ... | |||
00000048: 7265 6665 7265 6e63 6520 6c69 reference li | 00000048: 7265 6665 7265 6e63 6520 6c69 reference li | |||
00000054: 6246 4c41 4320 312e 332e 3320 bFLAC 1.3.3 | 00000054: 6246 4c41 4320 312e 332e 3320 bFLAC 1.3.3 | |||
00000060: 3230 3139 3038 3034 0100 0000 20190804.... | 00000060: 3230 3139 3038 3034 0100 0000 20190804.... | |||
0000006c: 0e00 0000 5449 544c 453d d7a9 ....TITLE=.. | 0000006c: 0e00 0000 5449 544c 453d d7a9 ....TITLE=.. | |||
00000078: d79c d795 d79d 8100 0006 0000 ............ | 00000078: d79c d795 d79d 8100 0006 0000 ............ | |||
00000084: 0000 0000 fff8 6998 000f 9912 ......i..... | 00000084: 0000 0000 fff8 6998 000f 9912 ......i..... | |||
00000090: 0867 0162 3d14 4299 8f5d f70d .g.b=.B..].. | 00000090: 0867 0162 3d14 4299 8f5d f70d .g.b=.B..].. | |||
0000009c: 6fe0 0c17 caeb 2100 0ee7 a77a o.....!....z | 0000009c: 6fe0 0c17 caeb 2100 0ee7 a77a o.....!....z | |||
000000a8: 24a1 590c 1217 b603 097b 784f $.Y......{xO | 000000a8: 24a1 590c 1217 b603 097b 784f $.Y......{xO | |||
000000b4: aa9a 33d2 85e0 70ad 5b1b 4851 ..3...p.[.HQ | 000000b4: aa9a 33d2 85e0 70ad 5b1b 4851 ..3...p.[.HQ | |||
000000c0: b401 0d99 d2cd 1a68 f1e6 b810 .......h.... | 000000c0: b401 0d99 d2cd 1a68 f1e6 b810 .......h.... | |||
000000cc: fff8 6918 0102 a402 c382 c40b ..i......... | 000000cc: fff8 6918 0102 a402 c382 c40b ..i......... | |||
000000d8: c14a 03ee 48dd 03b6 7c13 30 .J..H...|.0 | 000000d8: c14a 03ee 48dd 03b6 7c13 30 .J..H...|.0 | |||
]]> | ||||
</artwork> | </artwork> | |||
</section> | </section> | |||
<section anchor="example-file-2-in-binary-representation-only-audio-frames"><name>Example file 2 in binary representation (only audio frames)</name> | <section anchor="example-file-2-in-binary-representation-only-audio-frames"><name>Example File 2 in Binary Representation (Only Audio Frames)</name> | |||
<artwork><![CDATA[00000088: 11111111 11111000 01101001 10011000 ..i. | <artwork type=""> | |||
00000088: 11111111 11111000 01101001 10011000 ..i. | ||||
0000008c: 00000000 00001111 10011001 00010010 .... | 0000008c: 00000000 00001111 10011001 00010010 .... | |||
00000090: 00001000 01100111 00000001 01100010 .g.b | 00000090: 00001000 01100111 00000001 01100010 .g.b | |||
00000094: 00111101 00010100 01000010 10011001 =.B. | 00000094: 00111101 00010100 01000010 10011001 =.B. | |||
00000098: 10001111 01011101 11110111 00001101 .].. | 00000098: 10001111 01011101 11110111 00001101 .].. | |||
0000009c: 01101111 11100000 00001100 00010111 o... | 0000009c: 01101111 11100000 00001100 00010111 o... | |||
000000a0: 11001010 11101011 00100001 00000000 ..!. | 000000a0: 11001010 11101011 00100001 00000000 ..!. | |||
000000a4: 00001110 11100111 10100111 01111010 ...z | 000000a4: 00001110 11100111 10100111 01111010 ...z | |||
000000a8: 00100100 10100001 01011001 00001100 $.Y. | 000000a8: 00100100 10100001 01011001 00001100 $.Y. | |||
000000ac: 00010010 00010111 10110110 00000011 .... | 000000ac: 00010010 00010111 10110110 00000011 .... | |||
000000b0: 00001001 01111011 01111000 01001111 .{xO | 000000b0: 00001001 01111011 01111000 01001111 .{xO | |||
skipping to change at line 2691 ¶ | skipping to change at line 2863 ¶ | |||
000000bc: 01011011 00011011 01001000 01010001 [.HQ | 000000bc: 01011011 00011011 01001000 01010001 [.HQ | |||
000000c0: 10110100 00000001 00001101 10011001 .... | 000000c0: 10110100 00000001 00001101 10011001 .... | |||
000000c4: 11010010 11001101 00011010 01101000 ...h | 000000c4: 11010010 11001101 00011010 01101000 ...h | |||
000000c8: 11110001 11100110 10111000 00010000 .... | 000000c8: 11110001 11100110 10111000 00010000 .... | |||
000000cc: 11111111 11111000 01101001 00011000 ..i. | 000000cc: 11111111 11111000 01101001 00011000 ..i. | |||
000000d0: 00000001 00000010 10100100 00000010 .... | 000000d0: 00000001 00000010 10100100 00000010 .... | |||
000000d4: 11000011 10000010 11000100 00001011 .... | 000000d4: 11000011 10000010 11000100 00001011 .... | |||
000000d8: 11000001 01001010 00000011 11101110 .J.. | 000000d8: 11000001 01001010 00000011 11101110 .J.. | |||
000000dc: 01001000 11011101 00000011 10110110 H... | 000000dc: 01001000 11011101 00000011 10110110 H... | |||
000000e0: 01111100 00010011 00110000 |.0 | 000000e0: 01111100 00010011 00110000 |.0 | |||
]]> | ||||
</artwork> | </artwork> | |||
</section> | </section> | |||
<section anchor="streaminfo-metadata-block"><name>Streaminfo metadata block</nam | <section anchor="streaminfo-metadata-block"><name>Streaminfo Metadata Block</nam | |||
e> | e> | |||
<t>Most of the streaminfo block, including its header, is the same as in example | <t>Most of the streaminfo metadata block, including its header, is the same as i | |||
1, so only parts that are different are listed in the following table.</t> | n example 1, so only parts that are different are listed in the following table. | |||
</t> | ||||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Start</th> | <th align="left">Start</th> | |||
<th align="left">Length</th> | <th align="left">Length</th> | |||
<th align="left">Contents</th> | <th align="left">Contents</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
skipping to change at line 2733 ¶ | skipping to change at line 2904 ¶ | |||
<td align="left">0x0a+0</td> | <td align="left">0x0a+0</td> | |||
<td align="left">2 bytes</td> | <td align="left">2 bytes</td> | |||
<td align="left">0x0010</td> | <td align="left">0x0010</td> | |||
<td align="left">Max. block size 16</td> | <td align="left">Max. block size 16</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x0c+0</td> | <td align="left">0x0c+0</td> | |||
<td align="left">3 bytes</td> | <td align="left">3 bytes</td> | |||
<td align="left">0x000017</td> | <td align="left">0x000017</td> | |||
<td align="left">Min. frame size 23 byte</td> | <td align="left">Min. frame size 23 bytes</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x0f+0</td> | <td align="left">0x0f+0</td> | |||
<td align="left">3 bytes</td> | <td align="left">3 bytes</td> | |||
<td align="left">0x000044</td> | <td align="left">0x000044</td> | |||
<td align="left">Max. frame size 68 byte</td> | <td align="left">Max. frame size 68 bytes</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x15+4</td> | <td align="left">0x15+4</td> | |||
<td align="left">36 bits</td> | <td align="left">36 bits</td> | |||
<td align="left">0b0000, 0x00000013</td> | <td align="left">0b0000, 0x00000013</td> | |||
<td align="left">Total no. of samples 19</td> | <td align="left">Total no. of samples 19</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x1a</td> | <td align="left">0x1a</td> | |||
<td align="left">16 bytes</td> | <td align="left">16 bytes</td> | |||
<td align="left">(...)</td> | <td align="left">(...)</td> | |||
<td align="left">MD5 checksum</td> | <td align="left">MD5 checksum</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>This time, the minimum and maximum block sizes are reflected in the file: there is one block of 16 samples, the last block (which has 3 samples) is not considered for the minimum block size. The MD5 checksum is 0xd5b0 5649 75e9 8b8d 8b93 0422 757b 8103, this will be verified at the end of this example.</t> | </table><t>This time, the minimum and maximum block sizes are reflected in the file: there is one block of 16 samples, and the last block (which has 3 samples) is not considered for the minimum block size. The MD5 checksum is 0xd5b0 5649 75e9 8b8d 8b93 0422 757b 8103. This will be verified at the end of this example.</t> | |||
</section> | </section> | |||
<section anchor="seektable-1"><name>Seektable</name> | <section anchor="seektable-1"><name>Seek Table</name> | |||
<t>The seektable metadata block only holds one entry. It is not really useful he | <t>The seek table metadata block only holds one entry. It is not really useful h | |||
re, as it points to the first frame, but it is enough for this example. The seek | ere, as it points to the first frame, but it is enough for this example. The see | |||
table metadata block is broken down in the following table.</t> | k table metadata block is broken down in the following table.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Start</th> | <th align="left">Start</th> | |||
<th align="left">Length</th> | <th align="left">Length</th> | |||
<th align="left">Contents</th> | <th align="left">Contents</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
skipping to change at line 2784 ¶ | skipping to change at line 2955 ¶ | |||
<td align="left">0x2a+0</td> | <td align="left">0x2a+0</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b0</td> | <td align="left">0b0</td> | |||
<td align="left">Not the last metadata block</td> | <td align="left">Not the last metadata block</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x2a+1</td> | <td align="left">0x2a+1</td> | |||
<td align="left">7 bits</td> | <td align="left">7 bits</td> | |||
<td align="left">0b0000011</td> | <td align="left">0b0000011</td> | |||
<td align="left">Seektable metadata block</td> | <td align="left">Seek table metadata block</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x2b+0</td> | <td align="left">0x2b+0</td> | |||
<td align="left">3 bytes</td> | <td align="left">3 bytes</td> | |||
<td align="left">0x000012</td> | <td align="left">0x000012</td> | |||
<td align="left">Length 18 byte</td> | <td align="left">Length 18 bytes</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x2e+0</td> | <td align="left">0x2e+0</td> | |||
<td align="left">8 bytes</td> | <td align="left">8 bytes</td> | |||
<td align="left">0x0000000000000000</td> | <td align="left">0x0000000000000000</td> | |||
<td align="left">Seekpoint to sample 0</td> | <td align="left">Seek point to sample 0</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x36+0</td> | <td align="left">0x36+0</td> | |||
<td align="left">8 bytes</td> | <td align="left">8 bytes</td> | |||
<td align="left">0x0000000000000000</td> | <td align="left">0x0000000000000000</td> | |||
<td align="left">Seekpoint to offset 0</td> | <td align="left">Seek point to offset 0</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x3e+0</td> | <td align="left">0x3e+0</td> | |||
<td align="left">2 bytes</td> | <td align="left">2 bytes</td> | |||
<td align="left">0x0010</td> | <td align="left">0x0010</td> | |||
<td align="left">Seekpoint to block size 16</td> | <td align="left">Seek point to block size 16</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table></section> | </table></section> | |||
<section anchor="vorbis-comment-1"><name>Vorbis comment</name> | <section anchor="vorbis-comment-1"><name>Vorbis Comment</name> | |||
<t>The Vorbis comment metadata block contains the vendor string and a single comment. It is broken down in the following table.</t> | <t>The Vorbis comment metadata block contains the vendor string and a single comment. It is broken down in the following table.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Start</th> | <th align="left">Start</th> | |||
<th align="left">Length</th> | <th align="left">Length</th> | |||
<th align="left">Contents</th> | <th align="left">Contents</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
skipping to change at line 2848 ¶ | skipping to change at line 3019 ¶ | |||
<td align="left">0x40+1</td> | <td align="left">0x40+1</td> | |||
<td align="left">7 bits</td> | <td align="left">7 bits</td> | |||
<td align="left">0b0000100</td> | <td align="left">0b0000100</td> | |||
<td align="left">Vorbis comment metadata block</td> | <td align="left">Vorbis comment metadata block</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x41+0</td> | <td align="left">0x41+0</td> | |||
<td align="left">3 bytes</td> | <td align="left">3 bytes</td> | |||
<td align="left">0x00003a</td> | <td align="left">0x00003a</td> | |||
<td align="left">Length 58 byte</td> | <td align="left">Length 58 bytes</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x44+0</td> | <td align="left">0x44+0</td> | |||
<td align="left">4 bytes</td> | <td align="left">4 bytes</td> | |||
<td align="left">0x20000000</td> | <td align="left">0x20000000</td> | |||
<td align="left">Vendor string length 32 byte</td> | <td align="left">Vendor string length 32 bytes</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x48+0</td> | <td align="left">0x48+0</td> | |||
<td align="left">32 bytes</td> | <td align="left">32 bytes</td> | |||
<td align="left">(...)</td> | <td align="left">(...)</td> | |||
<td align="left">Vendor string</td> | <td align="left">Vendor string</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x68+0</td> | <td align="left">0x68+0</td> | |||
<td align="left">4 bytes</td> | <td align="left">4 bytes</td> | |||
<td align="left">0x01000000</td> | <td align="left">0x01000000</td> | |||
<td align="left">Number of fields 1</td> | <td align="left">Number of fields 1</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x6c+0</td> | <td align="left">0x6c+0</td> | |||
<td align="left">4 bytes</td> | <td align="left">4 bytes</td> | |||
<td align="left">0x0e000000</td> | <td align="left">0x0e000000</td> | |||
<td align="left">Field length 14 byte</td> | <td align="left">Field length 14 bytes</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x70+0</td> | <td align="left">0x70+0</td> | |||
<td align="left">14 bytes</td> | <td align="left">14 bytes</td> | |||
<td align="left">(...)</td> | <td align="left">(...)</td> | |||
<td align="left">Field contents</td> | <td align="left">Field contents</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>The vendor string is reference libFLAC 1.3.3 20190804, and the field contents of the only field is TITLE=שלום. The Vorbis comment field is 14 bytes but only 10 characters in size, because it contains four 2-byte characters.</t> | </table><t>The vendor string is reference libFLAC 1.3.3 20190804, and the field contents of the only field is TITLE=שלום. The Vorbis comment field is 14 bytes but only 10 characters in size, because it contains four 2-byte characters.</t> | |||
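The length fields in the table above are stored little-endian, unlike the big-endian block length in the metadata block header. The following is a minimal sketch of how the listed values can be checked; the variable names are illustrative and not part of the format, and the Hebrew letters are written as Unicode escapes only to keep the listing ASCII.

```python
# Illustrative check of the Vorbis comment values listed in the table.
# All variable names here are assumptions made for this sketch.

# The 4-byte length fields are little-endian.
vendor_length = int.from_bytes(bytes.fromhex("20000000"), "little")
field_count = int.from_bytes(bytes.fromhex("01000000"), "little")
field_length = int.from_bytes(bytes.fromhex("0e000000"), "little")

vendor_string = "reference libFLAC 1.3.3 20190804"

# "TITLE=" followed by four Hebrew letters (shin, lamed, vav, final mem).
field = "TITLE=\u05e9\u05dc\u05d5\u05dd"
encoded_field = field.encode("utf-8")

print(vendor_length, field_count, field_length)
print(len(field), "characters,", len(encoded_field), "bytes")
```

Each of the four Hebrew letters takes 2 bytes in UTF-8, which accounts for the difference between the character count and the byte count.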
skipping to change at line 2932 ¶ | skipping to change at line 3103 ¶ | |||
<tr> | <tr> | |||
<td align="left">0x82+0</td> | <td align="left">0x82+0</td> | |||
<td align="left">6 bytes</td> | <td align="left">6 bytes</td> | |||
<td align="left">0x000000000000</td> | <td align="left">0x000000000000</td> | |||
<td align="left">Padding bytes</td> | <td align="left">Padding bytes</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table></section> | </table></section> | |||
<section anchor="first-audio-frame"><name>First audio frame</name> | <section anchor="first-audio-frame"><name>First Audio Frame</name> | |||
<t>The frame header starts at position 0x88 and is broken down in the following table.</t> | <t>The frame header starts at position 0x88 and is broken down in the following table.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Start</th> | <th align="left">Start</th> | |||
<th align="left">Length</th> | <th align="left">Length</th> | |||
<th align="left">Contents</th> | <th align="left">Contents</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0x88+0</td> | <td align="left">0x88+0</td> | |||
<td align="left">15 bits</td> | <td align="left">15 bits</td> | |||
<td align="left">0xff, 0b1111100</td> | <td align="left">0xff, 0b1111100</td> | |||
<td align="left">frame sync</td> | <td align="left">Frame sync</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x89+7</td> | <td align="left">0x89+7</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b0</td> | <td align="left">0b0</td> | |||
<td align="left">blocking strategy</td> | <td align="left">Blocking strategy</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x8a+0</td> | <td align="left">0x8a+0</td> | |||
<td align="left">4 bits</td> | <td align="left">4 bits</td> | |||
<td align="left">0b0110</td> | <td align="left">0b0110</td> | |||
<td align="left">8-bit block size further down</td> | <td align="left">8-bit block size further down</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x8a+4</td> | <td align="left">0x8a+4</td> | |||
<td align="left">4 bits</td> | <td align="left">4 bits</td> | |||
<td align="left">0b1001</td> | <td align="left">0b1001</td> | |||
<td align="left">sample rate 44.1 kHz</td> | <td align="left">Sample rate 44.1 kHz</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x8b+0</td> | <td align="left">0x8b+0</td> | |||
<td align="left">4 bits</td> | <td align="left">4 bits</td> | |||
<td align="left">0b1001</td> | <td align="left">0b1001</td> | |||
<td align="left">side-right stereo</td> | <td align="left">Side-right stereo</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x8b+4</td> | <td align="left">0x8b+4</td> | |||
<td align="left">3 bits</td> | <td align="left">3 bits</td> | |||
<td align="left">0b100</td> | <td align="left">0b100</td> | |||
<td align="left">bit depth 16 bit</td> | <td align="left">Bit depth 16 bit</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x8b+7</td> | <td align="left">0x8b+7</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b0</td> | <td align="left">0b0</td> | |||
<td align="left">mandatory 0 bit</td> | <td align="left">Mandatory 0 bit</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x8c+0</td> | <td align="left">0x8c+0</td> | |||
<td align="left">1 byte</td> | <td align="left">1 byte</td> | |||
<td align="left">0x00</td> | <td align="left">0x00</td> | |||
<td align="left">frame number 0</td> | <td align="left">Frame number 0</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x8d+0</td> | <td align="left">0x8d+0</td> | |||
<td align="left">1 byte</td> | <td align="left">1 byte</td> | |||
<td align="left">0x0f</td> | <td align="left">0x0f</td> | |||
<td align="left">block size 16</td> | <td align="left">Block size 16</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x8e+0</td> | <td align="left">0x8e+0</td> | |||
<td align="left">1 byte</td> | <td align="left">1 byte</td> | |||
<td align="left">0x99</td> | <td align="left">0x99</td> | |||
<td align="left">frame header CRC</td> | <td align="left">Frame header CRC</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>The first subframe starts at byte 0x8f, it is broken down in the following table excluding the coded residual. As this subframe codes for a side channel, the bit depth is increased by 1 bit from 16 bit to 17 bit. This is most clearly present in the unencoded warm-up sample.</t> | </table><t>The first subframe starts at byte 0x8f, and it is broken down in the following table, excluding the coded residual. As this subframe codes for a side channel, the bit depth is increased by 1 bit from 16 bits to 17 bits. This is most clearly present in the unencoded warm-up sample.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Start</th> | <th align="left">Start</th> | |||
<th align="left">Length</th> | <th align="left">Length</th> | |||
<th align="left">Contents</th> | <th align="left">Contents</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0x8f+0</td> | <td align="left">0x8f+0</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b0</td> | <td align="left">0b0</td> | |||
<td align="left">mandatory 0 bit</td> | <td align="left">Mandatory 0 bit</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x8f+1</td> | <td align="left">0x8f+1</td> | |||
<td align="left">6 bits</td> | <td align="left">6 bits</td> | |||
<td align="left">0b001001</td> | <td align="left">0b001001</td> | |||
<td align="left">fixed subframe, 1st order</td> | <td align="left">Fixed subframe, 1st order</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x8f+7</td> | <td align="left">0x8f+7</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b0</td> | <td align="left">0b0</td> | |||
<td align="left">no wasted bits used</td> | <td align="left">No wasted bits used</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x90+0</td> | <td align="left">0x90+0</td> | |||
<td align="left">17 bits</td> | <td align="left">17 bits</td> | |||
<td align="left">0x0867, 0b0</td> | <td align="left">0x0867, 0b0</td> | |||
<td align="left">unencoded warm-up sample</td> | <td align="left">Unencoded warm-up sample</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>The coded residual is broken down in the following table. All quotients are unary coded, all remainders are stored unencoded with a number of bits specified by the Rice parameter.</t> | </table><t>The coded residual is broken down in the following table. All quotients are unary coded, and all remainders are stored unencoded with a number of bits specified by the Rice parameter.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Start</th> | <th align="left">Start</th> | |||
<th align="left">Length</th> | <th align="left">Length</th> | |||
<th align="left">Contents</th> | <th align="left">Contents</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
skipping to change at line 3298 ¶ | skipping to change at line 3469 ¶ | |||
<td align="left">Quotient 0</td> | <td align="left">Quotient 0</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0xaa+5</td> | <td align="left">0xaa+5</td> | |||
<td align="left">11 bits</td> | <td align="left">11 bits</td> | |||
<td align="left">0b00100001100</td> | <td align="left">0b00100001100</td> | |||
<td align="left">Remainder 268</td> | <td align="left">Remainder 268</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>At this point, the decoder should know it is done decoding the coded | </table><t>At this point, the decoder should know it is done decoding the coded | |||
residual, as it received 16 samples: 1 warm-up sample and 15 residual samples. E | residual, as it received 16 samples: 1 warm-up sample and 15 residual samples. | |||
ach residual sample can be calculated from the quotient and remainder, and undoi | ||||
ng the zig-zag encoding. For example, the value of the first zig-zag encoded res | Each residual sample can be calculated from the quotient and remainder and from | |||
idual sample is 3 * 2^11 + 244 = 6388. As this is an even number, the zig-zag en | undoing the zigzag encoding. For example, the value of the first zigzag-encoded | |||
coding is undone by dividing by 2, the residual sample value is 3194. This is do | residual sample is 3 * 2<sup>11</sup> + 244 = 6388. As this is an even number, t | |||
ne for all residual samples in the next table.</t> | he zigzag encoding is undone by dividing by 2; the residual sample value is 3194 | |||
. This is done for all residual samples in the next table.</t> | ||||
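The calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the function names are invented for this sketch, and the Rice parameter of 11 is taken from the 11-bit remainders in the preceding table.

```python
# Sketch of turning a Rice-coded (quotient, remainder) pair into a residual.
# Function names are hypothetical; they do not come from the specification.

def rice_value(quotient: int, remainder: int, rice_parameter: int) -> int:
    """Combine quotient and remainder into the zigzag-encoded value."""
    return quotient * 2 ** rice_parameter + remainder


def undo_zigzag(folded: int) -> int:
    """Map a zigzag-encoded value back to a signed residual.

    Even values map to non-negative residuals (folded / 2),
    odd values map to negative residuals (-(folded + 1) / 2).
    """
    if folded % 2 == 0:
        return folded // 2
    return -((folded + 1) // 2)


# First residual of the example: quotient 3, remainder 244, parameter 11.
folded = rice_value(3, 244, 11)
print(folded, undo_zigzag(folded))
```

The odd branch is not exercised by the first residual, but it is the case that produces the negative values listed in the table.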
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Quotient</th> | <th align="left">Quotient</th> | |||
<th align="left">Remainder</th> | <th align="left">Remainder</th> | |||
<th align="left">Zig-zag encoded</th> | <th align="left">Zigzag Encoded</th> | |||
<th align="left">Residual sample value</th> | <th align="left">Residual Sample Value</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">3</td> | <td align="left">3</td> | |||
<td align="left">244</td> | <td align="left">244</td> | |||
<td align="left">6388</td> | <td align="left">6388</td> | |||
<td align="left">3194</td> | <td align="left">3194</td> | |||
</tr> | </tr> | |||
skipping to change at line 3415 ¶ | skipping to change at line 3588 ¶ | |||
<td align="left">-267</td> | <td align="left">-267</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0</td> | <td align="left">0</td> | |||
<td align="left">268</td> | <td align="left">268</td> | |||
<td align="left">268</td> | <td align="left">268</td> | |||
<td align="left">134</td> | <td align="left">134</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>It can be calculated that using a Rice code is, in this case, more ef | </table> | |||
ficient than storing values unencoded. The Rice code (excluding the partition or | ||||
der and parameter) is 199 bits in length. The largest residual value (-13172) wo | <t>In this case, using a Rice code is more efficient than storing values | |||
uld need 15 bits to be stored unencoded, so storing all 15 samples with 15 bits | unencoded. The Rice code (excluding the partition order and parameter) is 199 | |||
results in a sequence with a length of 225 bits.</t> | bits in length. The largest residual value (-13172) would need 15 bits to be | |||
<t>The next step is using the predictor and the residuals to restore the sample | stored unencoded, so storing all 15 samples with 15 bits results in a sequence | |||
values. As this subframe uses a fixed predictor with order 1, this means adding | with a length of 225 bits.</t> | |||
the residual value to the value of the previous sample.</t> | <t>The next step is using the predictor and the residuals to restore the sample | |||
values. As this subframe uses a fixed predictor with order 1, the residual value | ||||
is added to the value of the previous sample.</t> | ||||
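For a fixed predictor of order 1, the prediction is simply the previous sample, so restoring the signal amounts to a running sum over the residuals. A minimal sketch, with a hypothetical function name and using only values that appear in this example:

```python
# Sketch of restoring samples for a fixed predictor of order 1.
# The function name is an assumption made for this illustration.

def restore_fixed_order1(warm_up: int, residuals: list[int]) -> list[int]:
    """Return the warm-up sample followed by the restored samples."""
    samples = [warm_up]
    for residual in residuals:
        # Prediction is the previous sample; add the residual to it.
        samples.append(samples[-1] + residual)
    return samples


# Warm-up sample 4302 followed by the first residual 3194.
print(restore_fixed_order1(4302, [3194]))
```

Higher-order fixed predictors use more than one previous sample and are not covered by this sketch.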
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th>Residual</th> | <th>Residual</th> | |||
<th align="left">Sample value</th> | <th align="left">Sample Value</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td>(warm-up)</td> | <td>(warm-up)</td> | |||
<td align="left">4302</td> | <td align="left">4302</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
skipping to change at line 3506 ¶ | skipping to change at line 3685 ¶ | |||
<tr> | <tr> | |||
<td>-267</td> | <td>-267</td> | |||
<td align="left">-6299</td> | <td align="left">-6299</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td>134</td> | <td>134</td> | |||
<td align="left">-6165</td> | <td align="left">-6165</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>With this, the decoding of the first subframe is complete. The decoding of the second subframe is very similar, as it also uses a fixed predictor of order 1, so this is left as an exercise for the reader, the results are in the next table. The next step is undoing stereo decorrelation, which is done in the following table. As the stereo decorrelation is side-right, the samples in the right channel come directly from the second subframe, while the samples in the left channel are found by adding the values of both subframes for each sample.</t> | </table><t>With this, the decoding of the first subframe is complete. The decoding of the second subframe is very similar, as it also uses a fixed predictor of order 1. This is left as an exercise for the reader; the results are in the next table. The next step is undoing stereo decorrelation, which is done in the following table. As the stereo decorrelation is side-right, the samples in the right channel come directly from the second subframe, while the samples in the left channel are found by adding the values of both subframes for each sample.</t> | |||
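Undoing side-right decorrelation, as described above, can be sketched like this. The function name is hypothetical, and the sketch assumes both subframes have already been decoded into lists of equal length.

```python
# Sketch of undoing side-right stereo decorrelation.
# Subframe 1 carries the side channel (left minus right),
# subframe 2 carries the right channel unchanged.

def undo_side_right(side: list[int], right: list[int]) -> tuple[list[int], list[int]]:
    """Return (left, right) given the side and right subframes."""
    left = [s + r for s, r in zip(side, right)]
    return left, list(right)


# Last sample pair of the first frame in this example.
left, right = undo_side_right([-6165], [-8653])
print(left, right)
```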
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Subframe 1</th> | <th align="left">Subframe 1</th> | |||
<th align="left">Subframe 2</th> | <th align="left">Subframe 2</th> | |||
<th align="left">Left</th> | <th align="left">Left</th> | |||
<th align="left">Right</th> | <th align="left">Right</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
skipping to change at line 3633 ¶ | skipping to change at line 3812 ¶ | |||
<tr> | <tr> | |||
<td align="left">-6165</td> | <td align="left">-6165</td> | |||
<td align="left">-8653</td> | <td align="left">-8653</td> | |||
<td align="left">-14818</td> | <td align="left">-14818</td> | |||
<td align="left">-8653</td> | <td align="left">-8653</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>As the second subframe ends byte-aligned, no padding bits follow it. Finally, the last 2 bytes of the frame contain the frame CRC.</t> | </table><t>As the second subframe ends byte-aligned, no padding bits follow it. Finally, the last 2 bytes of the frame contain the frame CRC.</t> | |||
</section> | </section> | |||
<section anchor="second-audio-frame"><name>Second audio frame</name> | <section anchor="second-audio-frame"><name>Second Audio Frame</name> | |||
<t>The second audio frame is very similar to the frame decoded in the first exam | <t>The second audio frame is very similar to the frame decoded in the first exam | |||
ple, but this time not 1 but 3 samples are present.</t> | ple, but this time, 3 samples (not 1) are present.</t> | |||
<t>The frame header starts at position 0xcc and is broken down in the following table.</t> | <t>The frame header starts at position 0xcc and is broken down in the following table.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Start</th> | <th align="left">Start</th> | |||
<th align="left">Length</th> | <th align="left">Length</th> | |||
<th align="left">Contents</th> | <th align="left">Contents</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0xcc+0</td> | <td align="left">0xcc+0</td> | |||
<td align="left">15 bits</td> | <td align="left">15 bits</td> | |||
<td align="left">0xff, 0b1111100</td> | <td align="left">0xff, 0b1111100</td> | |||
<td align="left">frame sync</td> | <td align="left">Frame sync</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0xcd+7</td> | <td align="left">0xcd+7</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b0</td> | <td align="left">0b0</td> | |||
<td align="left">blocking strategy</td> | <td align="left">Blocking strategy</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0xce+0</td> | <td align="left">0xce+0</td> | |||
<td align="left">4 bits</td> | <td align="left">4 bits</td> | |||
<td align="left">0b0110</td> | <td align="left">0b0110</td> | |||
<td align="left">8-bit block size further down</td> | <td align="left">8-bit block size further down</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0xce+4</td> | <td align="left">0xce+4</td> | |||
<td align="left">4 bits</td> | <td align="left">4 bits</td> | |||
<td align="left">0b1001</td> | <td align="left">0b1001</td> | |||
<td align="left">sample rate 44.1 kHz</td> | <td align="left">Sample rate 44.1 kHz</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0xcf+0</td> | <td align="left">0xcf+0</td> | |||
<td align="left">4 bits</td> | <td align="left">4 bits</td> | |||
<td align="left">0b0001</td> | <td align="left">0b0001</td> | |||
<td align="left">stereo, no decorrelation</td> | <td align="left">Stereo, no decorrelation</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0xcf+4</td> | <td align="left">0xcf+4</td> | |||
<td align="left">3 bits</td> | <td align="left">3 bits</td> | |||
<td align="left">0b100</td> | <td align="left">0b100</td> | |||
<td align="left">bit depth 16 bit</td> | <td align="left">Bit depth 16 bits</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0xcf+7</td> | <td align="left">0xcf+7</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b0</td> | <td align="left">0b0</td> | |||
<td align="left">mandatory 0 bit</td> | <td align="left">Mandatory 0 bit</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0xd0+0</td> | <td align="left">0xd0+0</td> | |||
<td align="left">1 byte</td> | <td align="left">1 byte</td> | |||
<td align="left">0x01</td> | <td align="left">0x01</td> | |||
<td align="left">frame number 1</td> | <td align="left">Frame number 1</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0xd1+0</td> | <td align="left">0xd1+0</td> | |||
<td align="left">1 byte</td> | <td align="left">1 byte</td> | |||
<td align="left">0x02</td> | <td align="left">0x02</td> | |||
<td align="left">block size 3</td> | <td align="left">Block size 3</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0xd2+0</td> | <td align="left">0xd2+0</td> | |||
<td align="left">1 byte</td> | <td align="left">1 byte</td> | |||
<td align="left">0xa4</td> | <td align="left">0xa4</td> | |||
<td align="left">frame header CRC</td> | <td align="left">Frame header CRC</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>The first subframe starts at 0xd3+0 and is broken down in the following table.</t> | </table><t>The first subframe starts at 0xd3+0 and is broken down in the following table.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Start</th> | <th align="left">Start</th> | |||
<th align="left">Length</th> | <th align="left">Length</th> | |||
<th align="left">Contents</th> | <th align="left">Contents</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0xd3+0</td> | <td align="left">0xd3+0</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b0</td> | <td align="left">0b0</td> | |||
<td align="left">mandatory 0 bit</td> | <td align="left">Mandatory 0 bit</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0xd3+1</td> | <td align="left">0xd3+1</td> | |||
<td align="left">6 bits</td> | <td align="left">6 bits</td> | |||
<td align="left">0b000001</td> | <td align="left">0b000001</td> | |||
<td align="left">verbatim subframe</td> | <td align="left">Verbatim subframe</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0xd3+7</td> | <td align="left">0xd3+7</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b0</td> | <td align="left">0b0</td> | |||
<td align="left">no wasted bits used</td> | <td align="left">No wasted bits used</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0xd4+0</td> | <td align="left">0xd4+0</td> | |||
<td align="left">16 bits</td> | <td align="left">16 bits</td> | |||
<td align="left">0xc382</td> | <td align="left">0xc382</td> | |||
<td align="left">16-bit unencoded sample</td> | <td align="left">16-bit unencoded sample</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
skipping to change at line 3787 ¶ | skipping to change at line 3967 ¶ | |||
<th align="left">Contents</th> | <th align="left">Contents</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0xda+0</td> | <td align="left">0xda+0</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b0</td> | <td align="left">0b0</td> | |||
<td align="left">mandatory 0 bit</td> | <td align="left">Mandatory 0 bit</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0xda+1</td> | <td align="left">0xda+1</td> | |||
<td align="left">6 bits</td> | <td align="left">6 bits</td> | |||
<td align="left">0b000001</td> | <td align="left">0b000001</td> | |||
<td align="left">verbatim subframe</td> | <td align="left">Verbatim subframe</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0xda+7</td> | <td align="left">0xda+7</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b1</td> | <td align="left">0b1</td> | |||
<td align="left">wasted bits used</td> | <td align="left">Wasted bits used</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0xdb+0</td> | <td align="left">0xdb+0</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b1</td> | <td align="left">0b1</td> | |||
<td align="left">1 wasted bit used</td> | <td align="left">1 wasted bit used</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
skipping to change at line 3833 ¶ | skipping to change at line 4013 ¶ | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0xde+7</td> | <td align="left">0xde+7</td> | |||
<td align="left">15 bits</td> | <td align="left">15 bits</td> | |||
<td align="left">0b110110110011111</td> | <td align="left">0b110110110011111</td> | |||
<td align="left">15-bit unencoded sample</td> | <td align="left">15-bit unencoded sample</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>As this subframe uses wasted bits, the 15-bit unencoded samples need to be shifted left by 1 bit. For example, sample 1 is stored as -4536 and becomes -9072 after shifting left 1 bit.</t> | </table><t>As this subframe uses wasted bits, the 15-bit unencoded samples need to be shifted left by 1 bit. For example, sample 1 is stored as -4536 and becomes -9072 after shifting left 1 bit.</t> | |||
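The two steps involved here, reading a 15-bit two's complement value and then restoring the wasted bit, can be sketched as follows. Both function names are assumptions made for this illustration; the left shift is written as a multiplication by a power of two, which is equivalent for integers.

```python
# Sketch of decoding a verbatim sample from a subframe with wasted bits.
# Function names are hypothetical.

def sign_extend(raw: int, bits: int) -> int:
    """Interpret an unsigned `bits`-wide value as two's complement."""
    sign_bit = 2 ** (bits - 1)
    return (raw ^ sign_bit) - sign_bit


def restore_wasted_bits(sample: int, wasted_bits: int) -> int:
    """Shift a decoded sample left by the number of wasted bits."""
    return sample * 2 ** wasted_bits


# Last sample in the table above: 15 bits stored, 1 wasted bit.
stored = sign_extend(0b110110110011111, 15)
print(stored, restore_wasted_bits(stored, 1))

# Sample 1 from the text: stored as -4536.
print(restore_wasted_bits(-4536, 1))
```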
<t>As the last subframe does not end on byte alignment, 2 padding bits are added | ||||
before the 2 byte frame CRC follows at 0xe1+0.</t> | <t>As the last subframe does not end on byte alignment, 2 padding bits are added | |||
before the 2-byte frame CRC, which follows at 0xe1+0.</t> | ||||
</section> | </section> | |||
<section anchor="md5-checksum-verification"><name>MD5 checksum verification</nam | <section anchor="md5-checksum-verification"><name>MD5 Checksum Verification</nam | |||
e> | e> | |||
<t>All samples in the file have been decoded, we can now verify the MD5 checksum | <t>All samples in the file have been decoded, and we can now verify the MD5 chec | |||
. All sample values must be interleaved and stored signed, coded little-endian. | ksum. All sample values must be interleaved and stored signed coded little-endia | |||
The result of this follows in groups of 12 samples (i.e., 6 interchannel samples | n. The result of this follows in groups of 12 samples (i.e., 6 interchannel samp | |||
) per line.</t> | les) per line.</t> | |||
<artwork><![CDATA[0x8428 B617 7946 3129 5E3A 2722 D445 D128 0B3D B723 EB45 DF28 | <artwork type=""> | |||
0x8428 B617 7946 3129 5E3A 2722 D445 D128 0B3D B723 EB45 DF28 | ||||
0x723f 1E25 9D46 4929 B841 7026 5747 B829 8F43 8127 AEC7 14DF | 0x723f 1E25 9D46 4929 B841 7026 5747 B829 8F43 8127 AEC7 14DF | |||
0x9FC4 41DD 54C7 E4DE A5C4 40DD 1EC6 33DE 82C3 90DC 0BC4 02DD | 0x9FC4 41DD 54C7 E4DE A5C4 40DD 1EC6 33DE 82C3 90DC 0BC4 02DD | |||
0x4AC1 3EDB | 0x4AC1 3EDB | |||
]]> | ||||
</artwork> | </artwork> | |||
<t>The MD5 checksum of this is indeed the same as the one found in the streaminfo metadata block.</t> | <t>The MD5 checksum of this is indeed the same as the one found in the streaminfo metadata block.</t> | |||
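As a rough sketch of how the MD5 input above is laid out, the following Python interleaves the channels and packs each sample as a signed 16-bit little-endian value. The function name `md5_input_16bit` is an assumption made for illustration, and the sample values are read back from the hexadecimal listing rather than taken from a decoder.

```python
import struct


def md5_input_16bit(channels):
    """Interleave per-channel sample lists and pack them as signed
    16-bit little-endian values, one interchannel sample at a time."""
    out = bytearray()
    for interchannel_sample in zip(*channels):
        for sample in interchannel_sample:
            out += struct.pack("<h", sample)
    return bytes(out)


# First and last interchannel samples of the listing above:
# 0x8428 B617 and 0x4AC1 3EDB.
first = md5_input_16bit([[10372], [6070]])
last = md5_input_16bit([[-16054], [-9410]])
```

The byte order is what makes 10372 (0x2884) appear as `84 28` in the listing; feeding the full interleaved byte string to an MD5 routine is what the text compares against the streaminfo metadata block.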
</section> | </section> | |||
</section> | </section> | |||
<section anchor="decoding-example-3"><name>Decoding example 3</name> | <section anchor="decoding-example-3"><name>Decoding Example 3</name> | |||
<t>This example is once again a very short FLAC file. The focus of this example is on decoding a subframe with a linear predictor and a coded residual with more than one partition.</t> | <t>This example is once again a very short FLAC file. The focus of this example is on decoding a subframe with a linear predictor and a coded residual with more than one partition.</t> | |||
<section anchor="example-file-3-in-hexadecimal-representation"><name>Example file 3 in hexadecimal representation</name> | <section anchor="example-file-3-in-hexadecimal-representation"><name>Example File 3 in Hexadecimal Representation</name> | |||
<artwork><![CDATA[00000000: 664c 6143 8000 0022 1000 1000 fLaC...".... | <artwork type=""> | |||
00000000: 664c 6143 8000 0022 1000 1000 fLaC...".... | ||||
0000000c: 0000 1f00 001f 07d0 0070 0000 .........p.. | 0000000c: 0000 1f00 001f 07d0 0070 0000 .........p.. | |||
00000018: 0018 f8f9 e396 f5cb cfc6 dc80 ............ | 00000018: 0018 f8f9 e396 f5cb cfc6 dc80 ............ | |||
00000024: 7f99 7790 6b32 fff8 6802 0017 ..w.k2..h... | 00000024: 7f99 7790 6b32 fff8 6802 0017 ..w.k2..h... | |||
00000030: e944 004f 6f31 3d10 47d2 27cb .D.Oo1=.G.'. | 00000030: e944 004f 6f31 3d10 47d2 27cb .D.Oo1=.G.'. | |||
0000003c: 6d09 0831 452b dc28 2222 8057 m..1E+.("".W | 0000003c: 6d09 0831 452b dc28 2222 8057 m..1E+.("".W | |||
00000048: a3 . | 00000048: a3 . | |||
]]> | ||||
</artwork> | </artwork> | |||
</section> | </section> | |||
<section anchor="example-file-3-in-binary-representation-only-audio-frame"><name>Example file 3 in binary representation (only audio frame)</name> | <section anchor="example-file-3-in-binary-representation-only-audio-frame"><name>Example File 3 in Binary Representation (Only Audio Frame)</name> | |||
<artwork><![CDATA[0000002a: 11111111 11111000 01101000 00000010 ..h. | <artwork type=""> | |||
0000002a: 11111111 11111000 01101000 00000010 ..h. | ||||
0000002e: 00000000 00010111 11101001 01000100 ...D | 0000002e: 00000000 00010111 11101001 01000100 ...D | |||
00000032: 00000000 01001111 01101111 00110001 .Oo1 | 00000032: 00000000 01001111 01101111 00110001 .Oo1 | |||
00000036: 00111101 00010000 01000111 11010010 =.G. | 00000036: 00111101 00010000 01000111 11010010 =.G. | |||
0000003a: 00100111 11001011 01101101 00001001 '.m. | 0000003a: 00100111 11001011 01101101 00001001 '.m. | |||
0000003e: 00001000 00110001 01000101 00101011 .1E+ | 0000003e: 00001000 00110001 01000101 00101011 .1E+ | |||
00000042: 11011100 00101000 00100010 00100010 .("" | 00000042: 11011100 00101000 00100010 00100010 .("" | |||
00000046: 10000000 01010111 10100011 .W. | 00000046: 10000000 01010111 10100011 .W. | |||
]]> | ||||
</artwork> | </artwork> | |||
</section> | </section> | |||
<section anchor="streaminfo-metadata-block-1"><name>Streaminfo metadata block</name> | <section anchor="streaminfo-metadata-block-1"><name>Streaminfo Metadata Block</name> | |||
<t>Most of the streaminfo metadata block, including its header, is the same as in example 1, so only parts that are different are listed in the following table.</t> | <t>Most of the streaminfo metadata block, including its header, is the same as in example 1, so only parts that are different are listed in the following table.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Start</th> | <th align="left">Start</th> | |||
<th align="left">Length</th> | <th align="left">Length</th> | |||
<th align="left">Contents</th> | <th align="left">Contents</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td align="left">0x0c+0</td> | <td align="left">0x0c+0</td> | |||
<td align="left">3 bytes</td> | <td align="left">3 bytes</td> | |||
<td align="left">0x00001f</td> | <td align="left">0x00001f</td> | |||
<td align="left">Min. frame size 31 byte</td> | <td align="left">Min. frame size 31 bytes</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x0f+0</td> | <td align="left">0x0f+0</td> | |||
<td align="left">3 bytes</td> | <td align="left">3 bytes</td> | |||
<td align="left">0x00001f</td> | <td align="left">0x00001f</td> | |||
<td align="left">Max. frame size 31 byte</td> | <td align="left">Max. frame size 31 bytes</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x12+0</td> | <td align="left">0x12+0</td> | |||
<td align="left">20 bits</td> | <td align="left">20 bits</td> | |||
<td align="left">0x07d0, 0x0000</td> | <td align="left">0x07d0, 0x0000</td> | |||
<td align="left">Sample rate 32000 hertz</td> | <td align="left">Sample rate 32000 hertz</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x14+4</td> | <td align="left">0x14+4</td> | |||
<td align="left">3 bits</td> | <td align="left">3 bits</td> | |||
<td align="left">0b000</td> | <td align="left">0b000</td> | |||
<td align="left">1 channel</td> | <td align="left">1 channel</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x14+7</td> | <td align="left">0x14+7</td> | |||
<td align="left">5 bits</td> | <td align="left">5 bits</td> | |||
<td align="left">0b00111</td> | <td align="left">0b00111</td> | |||
<td align="left">Sample bit depth 8 bit</td> | <td align="left">Sample bit depth 8 bits</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x15+4</td> | <td align="left">0x15+4</td> | |||
<td align="left">36 bits</td> | <td align="left">36 bits</td> | |||
<td align="left">0b0000, 0x00000018</td> | <td align="left">0b0000, 0x00000018</td> | |||
<td align="left">Total no. of samples 24</td> | <td align="left">Total no. of samples 24</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x1a</td> | <td align="left">0x1a</td> | |||
<td align="left">16 bytes</td> | <td align="left">16 bytes</td> | |||
<td align="left">(...)</td> | <td align="left">(...)</td> | |||
<td align="left">MD5 checksum</td> | <td align="left">MD5 checksum</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table></section> | </table></section> | |||
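The bit-packed fields in the table above (sample rate, channel count, bit depth, and total samples) occupy exactly 64 bits starting at 0x12. A minimal Python sketch of unpacking them follows; the function name `parse_streaminfo_fields` is hypothetical, and the "+ 1" adjustments are inferred from the table, where 0b000 denotes 1 channel and 0b00111 denotes 8 bits.

```python
def parse_streaminfo_fields(data: bytes):
    """Unpack the 8 bytes at 0x12..0x19 of the example file:
    20 bits sample rate, 3 bits channels - 1, 5 bits bit depth - 1,
    36 bits total number of samples."""
    packed = int.from_bytes(data[:8], "big")
    sample_rate = packed >> 44
    channels = ((packed >> 41) & 0x7) + 1
    bit_depth = ((packed >> 36) & 0x1F) + 1
    total_samples = packed & ((1 << 36) - 1)
    return sample_rate, channels, bit_depth, total_samples


# Bytes 0x12..0x19 copied from the hexadecimal representation of example 3.
fields = parse_streaminfo_fields(bytes.fromhex("07d0007000000018"))
```

For these bytes, `fields` is `(32000, 1, 8, 24)`, matching the descriptions in the table.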
<section anchor="audio-frame"><name>Audio frame</name> | <section anchor="audio-frame"><name>Audio Frame</name> | |||
<t>The frame header starts at position 0x2a and is broken down in the following table.</t> | <t>The frame header starts at position 0x2a and is broken down in the following table.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Start</th> | <th align="left">Start</th> | |||
<th align="left">Length</th> | <th align="left">Length</th> | |||
<th align="left">Contents</th> | <th align="left">Contents</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
skipping to change at line 3995 ¶ | skipping to change at line 4176 ¶ | |||
<td align="left">0x2d+0</td> | <td align="left">0x2d+0</td> | |||
<td align="left">4 bits</td> | <td align="left">4 bits</td> | |||
<td align="left">0b0000</td> | <td align="left">0b0000</td> | |||
<td align="left">Mono audio (1 channel)</td> | <td align="left">Mono audio (1 channel)</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x2d+4</td> | <td align="left">0x2d+4</td> | |||
<td align="left">3 bits</td> | <td align="left">3 bits</td> | |||
<td align="left">0b001</td> | <td align="left">0b001</td> | |||
<td align="left">Bit depth 8 bit</td> | <td align="left">Bit depth 8 bits</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x2d+7</td> | <td align="left">0x2d+7</td> | |||
<td align="left">1 bit</td> | <td align="left">1 bit</td> | |||
<td align="left">0b0</td> | <td align="left">0b0</td> | |||
<td align="left">Mandatory 0 bit</td> | <td align="left">Mandatory 0 bit</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
skipping to change at line 4026 ¶ | skipping to change at line 4207 ¶ | |||
<td align="left">Block size 24</td> | <td align="left">Block size 24</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x30+0</td> | <td align="left">0x30+0</td> | |||
<td align="left">1 byte</td> | <td align="left">1 byte</td> | |||
<td align="left">0xe9</td> | <td align="left">0xe9</td> | |||
<td align="left">Frame header CRC</td> | <td align="left">Frame header CRC</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>The first and only subframe starts at byte 0x31, it is broken down in the following table, without the coded residual.</t> | </table><t>The first and only subframe starts at byte 0x31. It is broken down in the following table, without the coded residual.</t> | |||
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th align="left">Start</th> | <th align="left">Start</th> | |||
<th align="left">Length</th> | <th align="left">Length</th> | |||
<th align="left">Contents</th> | <th align="left">Contents</th> | |||
<th align="left">Description</th> | <th align="left">Description</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
skipping to change at line 4274 ¶ | skipping to change at line 4455 ¶ | |||
<td align="left">Rice parameter 1</td> | <td align="left">Rice parameter 1</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td align="left">0x43+3</td> | <td align="left">0x43+3</td> | |||
<td align="left">23 bits</td> | <td align="left">23 bits</td> | |||
<td align="left">(...)</td> | <td align="left">(...)</td> | |||
<td align="left">Residual partition 4</td> | <td align="left">Residual partition 4</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>The frame ends with 6 padding bits and a 2 byte frame CRC</t> | </table><t>The frame ends with 6 padding bits and a 2-byte frame CRC.</t> | |||
<t>To decode this subframe, 21 predictions have to be calculated and added to th | <t>To decode this subframe, 21 predictions have to be calculated and added to th | |||
eir corresponding residuals. This is a sequential process: as each prediction us | eir corresponding residuals. This is a sequential process: as each prediction us | |||
es previous samples, it is not possible to start this decoding halfway a subfram | es previous samples, it is not possible to start this decoding halfway through a | |||
e or decode a subframe with parallel threads.</t> | subframe or decode a subframe with parallel threads.</t> | |||
<t>The following table breaks down the calculation for each sample. For example, the predictor without shift value of row 4 is found by applying the predictor with the three warm-up samples: 7*111 - 6*79 + 2*0 = 303. This value is then shifted right by 2 bits: 303 &gt;&gt; 2 = 75. Then, the decoded residual sample is added: 75 + 3 = 78.</t> | <t>The following table breaks down the calculation for each sample. For example, the predictor without shift value of row 4 is found by applying the predictor with the three warm-up samples: 7*111 - 6*79 + 2*0 = 303. This value is then shifted right by 2 bits: 303 &gt;&gt; 2 = 75. Then, the decoded residual sample is added: 75 + 3 = 78.</t> | |||
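The calculation in the paragraph above can be sketched as a short Python loop. The function names are hypothetical; the coefficients (7, -6, 2) and the shift of 2 are taken from the worked formula in the text, with the first coefficient applied to the most recent sample.

```python
def lpc_predict(history, coefficients, shift):
    """Return (predictor without shift, predictor) for the next sample.
    coefficients[0] multiplies the most recent sample in `history`."""
    acc = sum(c * s for c, s in zip(coefficients, reversed(history)))
    # Python's >> on negative integers rounds toward minus infinity,
    # i.e., it behaves as an arithmetic right shift.
    return acc, acc >> shift


def decode_lpc(warmup, residuals, coefficients, shift):
    """Decode sequentially: each prediction depends on earlier samples."""
    samples = list(warmup)
    for residual in residuals:
        _, prediction = lpc_predict(samples, coefficients, shift)
        samples.append(prediction + residual)
    return samples


COEFFICIENTS = [7, -6, 2]
row_4 = decode_lpc([0, 79, 111], [3], COEFFICIENTS, 2)
```

With the three warm-up samples and the first residual, `row_4` ends in 78, as in the text. The loop also shows why decoding cannot start partway through a subframe: every iteration reads samples produced by earlier ones.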
<table> | <table> | |||
<thead> | <thead> | |||
<tr> | <tr> | |||
<th>Residual</th> | <th>Residual</th> | |||
<th align="left">Predictor w/o shift</th> | <th align="left">Predictor w/o Shift</th> | |||
<th align="left">Predictor</th> | <th align="left">Predictor</th> | |||
<th align="left">Sample value</th> | <th align="left">Sample Value</th> | |||
</tr> | </tr> | |||
</thead> | </thead> | |||
<tbody> | <tbody> | |||
<tr> | <tr> | |||
<td>(warm-up)</td> | <td>(warm-up)</td> | |||
<td align="left">N/A</td> | <td align="left">N/A</td> | |||
<td align="left">N/A</td> | <td align="left">N/A</td> | |||
<td align="left">0</td> | <td align="left">0</td> | |||
</tr> | </tr> | |||
skipping to change at line 4456 ¶ | skipping to change at line 4637 ¶ | |||
<td align="left">-5</td> | <td align="left">-5</td> | |||
</tr> | </tr> | |||
<tr> | <tr> | |||
<td>0</td> | <td>0</td> | |||
<td align="left">1</td> | <td align="left">1</td> | |||
<td align="left">0</td> | <td align="left">0</td> | |||
<td align="left">0</td> | <td align="left">0</td> | |||
</tr> | </tr> | |||
</tbody> | </tbody> | |||
</table><t>By lining all these samples up, we get the following input for the MD | </table><t>By lining up all these samples, we get the following input for the | |||
5 checksum calculation process.</t> | MD5 checksum calculation process:</t> | |||
<artwork><![CDATA[0x004F 6F4E 08C3 A6BC F32A 4335 0DE5 D2DA F40E 1813 06FC FB00 | <artwork type=""> | |||
]]> | 0x004F 6F4E 08C3 A6BC F32A 4335 0DE5 D2DA F40E 1813 06FC FB00 | |||
</artwork> | </artwork> | |||
<t>Which indeed results in the MD5 checksum found in the streaminfo metadata blo | <t>This indeed results in the MD5 checksum found in | |||
ck.</t> | the streaminfo metadata block.</t> | |||
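A minimal sketch of this final check, using Python's standard `hashlib` module, is shown below. The names `SAMPLES`, `STREAMINFO_MD5`, `md5_input_8bit`, and `checksum_matches` are assumptions for illustration. The sample list is obtained by reading each byte of the MD5 input above as a signed 8-bit value, and the expected digest is the 16 bytes at 0x1a in the hexadecimal representation of the file.

```python
import hashlib
import struct

# Decoded samples of example 3 (one channel, 8 bits per sample).
SAMPLES = [0, 79, 111, 78, 8, -61, -90, -68, -13, 42, 67, 53,
           13, -27, -46, -38, -12, 14, 24, 19, 6, -4, -5, 0]

# MD5 checksum field copied from offset 0x1a of the example file.
STREAMINFO_MD5 = bytes.fromhex("f8f9e396f5cbcfc6dc807f9977906b32")


def md5_input_8bit(samples):
    """Pack samples as signed 8-bit values, one byte per sample."""
    return struct.pack(f"<{len(samples)}b", *samples)


def checksum_matches(samples, expected):
    """Compare the MD5 digest of the packed samples with `expected`."""
    return hashlib.md5(md5_input_8bit(samples)).digest() == expected


md5_input = md5_input_8bit(SAMPLES)
matches = checksum_matches(SAMPLES, STREAMINFO_MD5)
```

`md5_input` reproduces the 24 bytes listed above. According to the text, the digest agrees with the streaminfo value, so `matches` is expected to be true; the sketch performs the comparison rather than assuming its outcome.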
</section> | ||||
</section> | </section> | |||
</section> | </section> | |||
<section numbered="false" anchor="acknowledgments"><name>Acknowledgments</name> | ||||
<t>FLAC owes much to the many people who have advanced the audio compression fie | ||||
ld so freely. For instance:</t> | ||||
<ul> | ||||
<li><t><contact fullname="Tony Robinson"/>: He worked on Shorten, and his paper | ||||
(see <xref target="Robinson-TR156"></xref>) is a good starting point on some | ||||
of the basic methods used by FLAC. FLAC trivially extends and improves the | ||||
fixed predictors, LPC coefficient quantization, and Rice coding used in | ||||
Shorten.</t></li> | ||||
<li><t><contact fullname="Solomon W. Golomb"/> and <contact fullname="Robert | ||||
F. Rice"/>: Their universal codes are used by FLAC's entropy coder. See <xref | ||||
target="Rice"></xref>.</t></li> | ||||
<li><t><contact fullname="Norman Levinson"/> and <contact fullname="James Durbin | ||||
"/>: | ||||
The FLAC reference encoder uses an algorithm developed and refined by them for | ||||
determining the LPC coefficients from the autocorrelation coefficients. See | ||||
<xref target="Durbin"></xref>).</t></li> | ||||
<li><t><contact fullname="Claude Shannon"/>: See <xref | ||||
target="Shannon"></xref>.</t></li> | ||||
</ul> | ||||
<t>The FLAC format, the FLAC reference implementation <xref target="FLAC-impleme | ||||
ntation"/>, and the initial draft version of this document were originally devel | ||||
oped by <contact fullname="Josh | ||||
Coalson"/>. While many others have contributed since, this original effort is | ||||
deeply appreciated. </t> | ||||
</section> | </section> | |||
</back> | </back> | |||
</rfc> | </rfc> | |||
End of changes. 438 change blocks. | ||||
1626 lines changed or deleted | 1747 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. |