<?xml version="1.0" encoding="utf-8"?>
<!-- name="GENERATOR" content="github.com/mmarkdown/mmark Mmark Markdown Processor - mmark.miek.nl" -->
<!DOCTYPE rfc [ <!ENTITY nbsp "&#160;"> <!ENTITY zwsp "&#8203;"> <!ENTITY nbhy "&#8209;"> <!ENTITY wj "&#8288;"> ]>
<rfc version="3" ipr="trust200902" docName="draft-ietf-cellar-flac-14" number="9639" submissionType="IETF" category="std" xml:lang="en" xmlns:xi="http://www.w3.org/2001/XInclude" tocInclude="true" consensus="true" updates="" obsoletes="" sortRefs="true" symRefs="true">
<front>
<title abbrev="FLAC">Free Lossless Audio Codec (FLAC)</title>
<seriesInfo name="RFC" value="9639"/>
<author initials="M.Q.C." surname="van Beurden" fullname="Martijn van Beurden"> <address> <postal> <country>Netherlands</country> </postal> <email>mvanb1@gmail.com</email> </address> </author>
<author initials="A" surname="Weaver" fullname="Andrew Weaver"> <address> <email>theandrewjw@gmail.com</email> </address> </author>
<date year="2024" month="November"/>
<area>art</area>
<workgroup>cellar</workgroup>
<keyword>free</keyword> <keyword>lossless</keyword> <keyword>audio</keyword> <keyword>codec</keyword> <keyword>encoder</keyword> <keyword>decoder</keyword> <keyword>compression</keyword> <keyword>compressor</keyword> <keyword>archival</keyword> <keyword>archive</keyword> <keyword>archiving</keyword> <keyword>backup</keyword> <keyword>music</keyword>
<abstract>
<t>This document defines the 
Free Lossless Audio Codec (FLAC) format and its streamable subset. FLAC is designed to reduce the amount of computer storage space needed to store digital audio signals. It does this losslessly, i.e., it does so without losing information. FLAC is free in the sense that its specification is open and its reference implementation is open source. Compared to other lossless audio coding formats, FLAC is a format with low complexity and can be encoded and decoded with little computing resources. Decoding of FLAC has been implemented independently for many different platforms, and both encoding and decoding can be implemented without needing floating-point arithmetic.</t>
</abstract>
</front>
<middle>
<section anchor="introduction"><name>Introduction</name>
<t>This document defines the Free Lossless Audio Codec (FLAC) format and its streamable subset. FLAC files and streams can code for pulse-code modulated (PCM) audio with 1 to 8 channels, sample rates from 1 to 1048575 hertz, and bit depths from 4 to 32 bits. Most tools for coding to and decoding from the FLAC format have been optimized for CD-audio, which is PCM audio with 2 channels, a sample rate of 44.1 kHz, and a bit depth of 16 bits.</t>
<t>FLAC is able to achieve lossless compression because samples in audio signals tend to be highly correlated with their close neighbors. In contrast with general-purpose compressors, which often use dictionaries, do run-length coding, or exploit long-term repetition, FLAC removes redundancy solely in the very short term, looking back at 32 samples at most.</t>
<t>The coding methods provided by the FLAC format work best on PCM audio signals with samples that have a signed representation and are centered around zero. 
Audio signals in which samples have an unsigned representation must be transformed to a signed representation as described in this document in order to achieve reasonable compression. The FLAC format is not suited for compressing audio that is not PCM.</t>
</section>
<section anchor="notation-and-conventions"><name>Notation and Conventions</name>
<t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>", "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they appear in all capitals, as shown here.</t>
<t>Values expressed as <tt>u(n)</tt> represent an unsigned big-endian integer using <tt>n</tt> bits. Values expressed as <tt>s(n)</tt> represent a signed big-endian integer using <tt>n</tt> bits, signed two's complement. Where necessary, <tt>n</tt> is expressed as an equation using <tt>*</tt> (multiplication), <tt>/</tt> (division), <tt>+</tt> (addition), or <tt>-</tt> (subtraction). An inclusive range of the number of bits expressed is represented with an ellipsis, such as <tt>u(m...n)</tt>.</t>
<t>All shifts mentioned in this document are arithmetic shifts.</t>
<t>While the FLAC format can store digital audio as well as other digital signals, this document uses terminology specific to digital audio. 
The use of more generic terminology was deemed less clear, so a reader interested in non-audio use of the FLAC format is expected to make the translation from audio-specific terms to more generic terminology.</t>
</section>
<section anchor="definitions"><name>Definitions</name>
<dl>
<dt><strong>Lossless compression</strong>:</dt><dd>Reducing the amount of computer storage space needed to store data without needing to remove or irreversibly alter any of this data in doing so. In other words, decompressing losslessly compressed information returns exactly the original data.</dd>
<dt><strong>Lossy compression</strong>:</dt><dd>Like lossless compression, but instead removing, irreversibly altering, or only approximating information for the purpose of further reducing the amount of computer storage space needed. In other words, decompressing lossy compressed information returns an approximation of the original data.</dd>
<dt><strong>Block</strong>:</dt><dd>A (short) section of linear PCM audio with one or more channels.</dd>
<dt><strong>Subblock</strong>:</dt><dd>All samples within a corresponding block for one channel. One or more subblocks form a block, and all subblocks in a certain block contain the same number of samples.</dd>
<dt><strong>Frame</strong>:</dt><dd>A frame header, one or more subframes, and a frame footer. It encodes the contents of a corresponding block.</dd>
<dt><strong>Subframe</strong>:</dt><dd>An encoded subblock. All subframes within a frame code for the same number of samples. 
When interchannel decorrelation is used, a subframe can correspond to either the (per-sample) average of two subblocks or the (per-sample) difference between two subblocks, instead of to a subblock directly; see <xref target="interchannel-decorrelation"></xref>.</dd>
<dt><strong>Interchannel samples</strong>:</dt><dd>A sample count that applies to all channels. For example, one second of 44.1 kHz audio has 44100 interchannel samples, meaning each channel has that number of samples.</dd>
<dt><strong>Block size</strong>:</dt><dd>The number of interchannel samples contained in a block or coded in a frame.</dd>
<dt><strong>Bit depth</strong> or <strong>bits per sample</strong>:</dt><dd>The number of bits used to contain each sample. This <bcp14>MUST</bcp14> be the same for all subblocks in a block but <bcp14>MAY</bcp14> be different for different subframes in a frame because of interchannel decorrelation. (See <xref target="interchannel-decorrelation"></xref> for details on interchannel decorrelation.)</dd>
<dt><strong>Predictor</strong>:</dt><dd>A model used to predict samples in an audio signal based on past samples. FLAC uses such predictors to remove redundancy in a signal in order to be able to compress it.</dd>
<dt><strong>Linear predictor</strong>:</dt><dd>A predictor using linear prediction (see <xref target="LinearPrediction"></xref>). This is also called <strong>linear predictive coding (LPC)</strong>. With a linear predictor, each prediction is a linear combination of past samples (hence the name). 
A linear predictor has a causal discrete-time finite impulse response (see <xref target="FIR"></xref>).</dd>
<dt><strong>Fixed predictor</strong>:</dt><dd>A linear predictor in which the model parameters are the same across all FLAC files and thus do not need to be stored.</dd>
<dt><strong>Predictor order</strong>:</dt><dd>The number of past samples that a predictor uses. For example, a 4th order predictor uses the 4 samples directly preceding a certain sample to predict it. In FLAC, samples used in a predictor are always consecutive and are always the samples directly before the sample that is being predicted.</dd>
<dt><strong>Residual</strong>:</dt><dd>The audio signal that remains after a predictor has been subtracted from a subblock. If the predictor has been able to remove redundancy from the signal, the samples of the remaining signal (the <strong>residual samples</strong>) will have, on average, a numerical value closer to zero than the original signal.</dd>
<dt><strong>Rice code</strong>:</dt><dd>A variable-length code (see <xref target="VarLengthCode"></xref>). It uses a short code for samples close to zero and a progressively longer code for samples further away from zero. 
This makes use of the observation that residual samples are often close to zero.</dd>
<dt><strong>Muxing</strong>:</dt><dd>Short for multiplexing, i.e., combining several streams or files into a single stream or file. In the context of this document, muxing specifically refers to embedding a FLAC stream in a container as described in <xref target="container-mappings"></xref>.</dd>
</dl>
</section>
<section anchor="conceptual-overview"><name>Conceptual Overview</name>
<t>Similar to many other audio coders, a FLAC file is encoded following the steps below. To decode a FLAC file, these steps are performed in reverse order, i.e., from bottom to top.</t>
<ol>
<li><t><strong>Blocking</strong> (see <xref target="blocking"></xref>). The input is split up into many contiguous blocks.</t> </li>
<li><t><strong>Interchannel Decorrelation</strong> (see <xref target="interchannel-decorrelation"></xref>). In the case of stereo streams, the FLAC format allows for transforming the left-right signal into a mid-side signal, a left-side signal, or a side-right signal to remove redundancy between channels. Choosing between any of these transformations is done independently for each block.</t> </li>
<li><t><strong>Prediction</strong> (see <xref target="prediction"></xref>). To remove redundancy in a signal, a predictor is stored for each subblock or its transformation as formed in the previous step. A predictor consists of a simple mathematical description that can be used, as the name implies, to predict a certain sample from the samples that preceded it. As this prediction is rarely exact, the error of this prediction is passed on to the next stage. The predictor of each subblock is completely independent from other subblocks. 
Since the methods of prediction are known to both the encoder and decoder, only the parameters of the predictor need to be included in the compressed stream. If no usable predictor can be found for a certain subblock, the signal is stored uncompressed, and the next stage is skipped.</t> </li>
<li><t><strong>Residual Coding</strong> (see <xref target="residual-coding"></xref>). As the predictor does not describe the signal exactly, the difference between the original signal and the predicted signal (called the error or residual signal) is coded losslessly. If the predictor is effective, the residual signal will require fewer bits per sample than the original signal. FLAC uses Rice coding, a subset of Golomb coding, with either 4-bit or 5-bit parameters to code the residual signal.</t> </li>
</ol>
<t>In addition, FLAC specifies a metadata system (see <xref target="file-level-metadata"></xref>) that allows arbitrary information about the stream to be included at the beginning of the stream.</t>
<section anchor="blocking"><name>Blocking</name>
<t>The block size used for audio data has a direct effect on the compression ratio. If the block size is too small, the resulting large number of frames means that a disproportionate number of bytes will be spent on frame headers. If the block size is too large, the characteristics of the signal may vary so much that the encoder will be unable to find a good predictor. In order to simplify encoder/decoder design, FLAC imposes a minimum block size of 16 samples, except for the last block, and a maximum block size of 65535 samples. The last block is allowed to be smaller than 16 samples to be able to match the length of the encoded audio without using padding.</t>
<t>While the block size does not have to be constant in a FLAC file, it is often difficult to find the optimal arrangement of block sizes for maximum compression. 
Because of this, a FLAC stream explicitly signals whether it has a constant or a variable block size throughout, and it stores a block number instead of a sample number to slightly improve compression if a stream has a constant block size.</t>
</section>
<section anchor="interchannel-decorrelation"><name>Interchannel Decorrelation</name>
<t>Channels are correlated in many audio files. The FLAC format can exploit this correlation in stereo files by coding an average of all samples in both subblocks (a mid channel) or the difference between all samples in both subblocks (a side channel) instead of directly coding subblocks into subframes. The following combinations are possible:</t>
<ul>
<li><t><strong>Independent</strong>. All channels are coded independently. All non-stereo files <bcp14>MUST</bcp14> be encoded this way.</t> </li>
<li><t><strong>Mid-side</strong>. A left and right subblock are converted to mid and side subframes. To calculate a sample for a mid subframe, the corresponding left and right samples are summed, and the result is shifted right by 1 bit. To calculate a sample for a side subframe, the corresponding right sample is subtracted from the corresponding left sample. On decoding, all mid channel samples have to be shifted left by 1 bit. Also, if a side channel sample is odd, 1 has to be added to the corresponding mid channel sample after it has been shifted left by 1 bit. To reconstruct the left channel, the corresponding samples in the mid and side subframes are added and the result shifted right by 1 bit. For the right channel, the side channel has to be subtracted from the mid channel and the result shifted right by 1 bit.</t> </li>
<li><t><strong>Left-side</strong>. The left subblock is coded, and the left and right subblocks are used to code a side subframe. 
The side subframe is constructed in the same way as for mid-side. To decode, the right subblock is restored by subtracting the samples in the side subframe from the corresponding samples in the left subframe.</t> </li>
<li><t><strong>Side-right</strong>. The left and right subblocks are used to code a side subframe, and the right subblock is coded. The side subframe is constructed in the same way as for mid-side. To decode, the left subblock is restored by adding the samples in the side subframe to the corresponding samples in the right subframe.</t> </li>
</ul>
<t>The side channel needs one extra bit of bit depth, as the subtraction can produce sample values twice as large as the maximum possible in any given bit depth. The mid channel in mid-side stereo does not need one extra bit, as it is shifted right by 1 bit. The right shift of the mid channel does not lead to lossy behavior because an odd sum of the left and right samples is always accompanied by an odd sample in the side subframe (the sum and the difference of two integers have the same parity), which means the lost least-significant bit can be restored by taking it from the sample in the side subframe.</t>
</section>
<section anchor="prediction"><name>Prediction</name>
<t>The FLAC format has four methods for modeling the input signal:</t>
<ol>
<li><t><strong>Verbatim</strong>. Samples are stored directly, without any modeling. This method is used for inputs with little correlation. Since the raw signal is not actually passed through the residual coding stage (it is added to the stream "verbatim"), this method is different from using a zero-order fixed predictor.</t> </li>
<li><t><strong>Constant</strong>. A single sample value is stored. This method is used whenever a signal is pure DC ("digital silence"), i.e., a constant value throughout.</t> </li>
<li><t><strong>Fixed predictor</strong>. 
Samples are predicted with one of five fixed (i.e., predefined) predictors, and the error of this prediction is processed by the residual coder. These fixed predictors are well suited for predicting simple waveforms. Since the predictors are fixed, no predictor coefficients are stored. From a mathematical point of view, the predictors work by extrapolating the signal from the previous samples. The number of previous samples used is equal to the predictor order. For more information, see <xref target="fixed-predictor-subframe"></xref>.</t> </li>
<li><t><strong>Linear predictor</strong>. Samples are predicted using past samples and a set of predictor coefficients, and the error of this prediction is processed by the residual coder. Compared to a fixed predictor, using a generic linear predictor adds overhead as predictor coefficients need to be stored. Therefore, this method of prediction is best suited for predicting more complex waveforms, where the added overhead is offset by space savings in the residual coding stage resulting from more accurate prediction. A linear predictor in FLAC has two parameters besides the predictor coefficients and the predictor order: the number of bits with which each coefficient is stored (the coefficient precision) and a prediction right shift. A prediction is formed by multiplying each predictor coefficient with the corresponding past sample, summing these products, and dividing that sum by applying the specified right shift. For more information, see <xref target="linear-predictor-subframe"></xref>.</t> </li>
</ol>
<t>A FLAC encoder is free to select any of the above methods to model the input. 
However, to ensure lossless coding, the following exceptions apply:</t>
<ul>
<li>When the samples that need to be stored do not all have the same value (i.e., the signal is not constant), a constant subframe cannot be used.</li>
<li>When an encoder is unable to find a fixed or linear predictor for which all residual samples are representable in 32-bit signed integers as stated in <xref target="coded-residual"></xref>, a verbatim subframe is used.</li>
</ul>
<t>For more information on fixed and linear predictors, see <xref target="Lossless-Compression"></xref> and <xref target="Robinson-TR156"></xref>.</t>
</section>
<section anchor="residual-coding"><name>Residual Coding</name>
<t>If a subframe uses a predictor to approximate the audio signal, a residual is stored to "correct" the approximation to the exact value. When an effective predictor is used, the average numerical value of the residual samples is smaller than that of the samples before prediction. While having smaller values on average, it is possible that a few "outlier" residual samples are much larger than any of the original samples. Sometimes these outliers even exceed the range that the bit depth of the original audio offers.</t>
<t>To efficiently code such a stream of relatively small numbers with an occasional outlier, Rice coding (a subset of Golomb coding) is used. Depending on how small the numbers are that have to be coded, a Rice parameter is chosen. The numerical value of each residual sample is split into two parts by dividing it by 2<sup>(Rice parameter)</sup>, creating a quotient and a remainder. The quotient is stored in unary form and the remainder in binary form. 
If indeed most residual samples are close to zero and a suitable Rice parameter is chosen, this form of coding, with a so-called variable-length code, uses fewer bits than the residual in unencoded form.</t>
<t>As Rice codes can only handle unsigned numbers, signed numbers are zigzag encoded to a so-called folded residual. See <xref target="coded-residual"></xref> for a more thorough explanation.</t>
<t>Quite often, the optimal Rice parameter varies over the course of a subframe. To accommodate this, the residual can be split up into partitions, where each partition has its own Rice parameter. To keep overhead and complexity low, the number of partitions used in a subframe is limited to powers of two.</t>
<t>The FLAC format uses two forms of Rice coding, which only differ in the number of bits used for encoding the Rice parameter, either 4 or 5 bits.</t>
</section>
</section>
<section anchor="format-principles"><name>Format Principles</name>
<t>FLAC has no format version information, but it does contain reserved space in several places. Future versions of the format <bcp14>MAY</bcp14> use this reserved space safely without breaking the format of older streams. Older decoders <bcp14>MAY</bcp14> choose to abort decoding when encountering data that is encoded using methods they do not recognize. Apart from reserved patterns, the format specifies forbidden patterns in certain places, meaning that the patterns <bcp14>MUST NOT</bcp14> appear in any bitstream. 
They are listed in the following table.</t>
<table anchor="tableforbiddenpatterns">
<thead>
<tr> <th align="left">Description</th> <th align="left">Reference</th> </tr>
</thead>
<tbody>
<tr> <td align="left">Metadata block type 127</td> <td align="left"><xref target="metadata-block-header"></xref></td> </tr>
<tr> <td align="left">Minimum and maximum block sizes smaller than 16 in streaminfo metadata block</td> <td align="left"><xref target="streaminfo"></xref></td> </tr>
<tr> <td align="left">Sample rate bits 0b1111</td> <td align="left"><xref target="sample-rate-bits"></xref></td> </tr>
<tr> <td align="left">Uncommon block size 65536</td> <td align="left"><xref target="uncommon-block-size"></xref></td> </tr>
<tr> <td align="left">Predictor coefficient precision bits 0b1111</td> <td align="left"><xref target="linear-predictor-subframe"></xref></td> </tr>
<tr> <td align="left">Negative predictor right shift</td> <td align="left"><xref target="linear-predictor-subframe"></xref></td> </tr>
</tbody>
</table>
<t>All numbers used in a FLAC bitstream are integers; there are no floating-point representations. All numbers are big-endian coded, except the field lengths used in Vorbis comments (see <xref target="vorbis-comment"></xref>), which are little-endian coded. This exception for Vorbis comments is to keep as much commonality as possible with Vorbis comments as used by the Vorbis codec (see <xref target="Vorbis"></xref>). All numbers are unsigned except linear predictor coefficients, the linear prediction shift (see <xref target="linear-predictor-subframe"></xref>), and numbers that directly represent samples, which are signed. 
None of these restrictions apply to application metadata blocks or to Vorbis comment field contents.</t>
<t>All samples encoded to and decoded from the FLAC format <bcp14>MUST</bcp14> be in a signed representation.</t>
<t>There are several ways to convert unsigned sample representations to signed sample representations, but the coding methods provided by the FLAC format work best on samples that have numerical values that are centered around zero, i.e., have no DC offset. In most unsigned audio formats, signals are centered around halfway within the range of the unsigned integer type used. If that is the case, converting sample representations by first copying the number to a signed integer with a sufficient range and then subtracting half of the range of the unsigned integer type results in a signal with samples centered around 0.</t>
<t>Unary coding in a FLAC bitstream is done with zero bits terminated with a one bit, e.g., the number 5 is coded unary as 0b000001. This prevents the frame sync code from appearing in unary-coded numbers.</t>
<t>When a FLAC file contains data that is forbidden or otherwise not valid, decoder behavior is left unspecified. A decoder <bcp14>MAY</bcp14> choose to stop decoding upon encountering such data. 
Examples of such data include the following:</t>
<ul>
<li>One or more decoded sample values exceed the range offered by the bit depth as coded for that frame. For example, in a frame with a bit depth of 8 bits, any samples not in the inclusive range from -128 to 127 are not valid.</li>
<li>The number of wasted bits (see <xref target="wasted-bits-per-sample"></xref>) used by a subframe is such that the bit depth of that subframe (see <xref target="constant-subframe"></xref> for a description of subframe bit depth) equals zero or is negative.</li>
<li>A frame header Cyclic Redundancy Check (CRC) (see <xref target="frame-header-crc"></xref>) or frame footer CRC (see <xref target="frame-footer"></xref>) does not validate.</li>
<li>One of the forbidden bit patterns described in <xref target="tableforbiddenpatterns"></xref> is used.</li>
</ul>
</section>
<section anchor="format-layout-overview"><name>Format Layout Overview</name>
<t>A FLAC bitstream consists of the <tt>fLaC</tt> (i.e., 0x664C6143) marker at the beginning of the stream, followed by a mandatory metadata block (called the streaminfo metadata block), any number of other metadata blocks, and then the audio frames.</t>
<t>FLAC supports 127 kinds of metadata blocks; currently, 7 kinds are defined in <xref target="file-level-metadata"></xref>.</t>
<t>The audio data is composed of one or more audio frames. Each frame consists of a frame header that contains a sync code, information about the frame (like the block size, sample rate, and number of channels), and an 8-bit CRC. The frame header also contains either the sample number of the first sample in the frame (for variable block size streams) or the frame number (for fixed block size streams). This allows for fast, sample-accurate seeking to be performed. Following the frame header are encoded subframes, one for each channel. 
The frame is then zero-padded to a byte boundary and finished with a frame footer containing a checksum for the frame. Each subframe has its own header that specifies how the subframe is encoded.</t>
<t>In order to allow a decoder to start decoding at any place in the stream, each frame starts with a byte-aligned 15-bit sync code. However, since it is not guaranteed that the sync code does not appear elsewhere in the frame, the decoder can check that it synced correctly by parsing the rest of the frame header and validating the frame header CRC.</t>
<t>Furthermore, to allow a decoder to start decoding at any place in the stream even without having received a streaminfo metadata block, each frame header contains some basic information about the stream. This information includes sample rate, bits per sample, number of channels, etc. Since the frame header is overhead, it has a direct effect on the compression ratio. To keep the frame header as small as possible, FLAC uses lookup tables for the most commonly used values for frame properties. When a certain property has a value that is not covered by the lookup table, the decoder is directed to find the value of that property (for example, the sample rate) at the end of the frame header or in the streaminfo metadata block. If a frame header refers to the streaminfo metadata block, the file is not "streamable"; see <xref target="streamable-subset"></xref> for details. By using lookup tables, the file is streamable and the frame header size is small for the most common forms of audio data.</t>
<t>Individual subframes (one for each channel) are coded separately within a frame and appear serially in the stream. In other words, the encoded audio data is NOT channel-interleaved. This reduces decoder complexity at the cost of requiring larger decode buffers. Each subframe has its own header specifying the attributes of the subframe, like prediction method and order, residual coding parameters, etc. 
Each subframe header is followed by the encoded audio data for that channel.</t>
</section>
<section anchor="streamable-subset"><name>Streamable Subset</name>
<t>The FLAC format specifies a subset of itself as the FLAC streamable subset. The purpose of this is to ensure that any streams encoded according to this subset are truly "streamable", meaning that a decoder that cannot seek within the stream can still pick up in the middle of the stream and start decoding. It also makes hardware decoder implementations more practical by limiting the encoding parameters in such a way that decoder buffer sizes and other resource requirements can be easily determined. The streamable subset makes the following limitations on what <bcp14>MAY</bcp14> be used in the stream:</t>
<ul>
<li>The sample rate bits (see <xref target="sample-rate-bits"></xref>) in the frame header <bcp14>MUST</bcp14> be 0b0001-0b1110, i.e., the frame header <bcp14>MUST NOT</bcp14> refer to the streaminfo metadata block to describe the sample rate.</li>
<li>The bit depth bits (see <xref target="bit-depth-bits"></xref>) in the frame header <bcp14>MUST</bcp14> be 0b001-0b111, i.e., the frame header <bcp14>MUST NOT</bcp14> refer to the streaminfo metadata block to describe the bit depth.</li>
<li>The stream <bcp14>MUST NOT</bcp14> contain blocks with more than 16384 interchannel samples, i.e., the maximum block size must not be larger than 16384.</li>
<li>Audio with a sample rate less than or equal to 48000 Hz <bcp14>MUST NOT</bcp14> be contained in blocks with more than 4608 interchannel samples, i.e., the maximum block size used for this audio must not be larger than 4608.</li>
<li>Linear prediction subframes (see <xref target="linear-predictor-subframe"></xref>) containing audio with a sample rate less than or equal to 48000 Hz <bcp14>MUST</bcp14> have a predictor order less than or equal to 12, i.e., the subframe type bits 
in the subframe header (see <xref target="subframe-header"></xref>) <bcp14>MUST NOT</bcp14> be 0b101100-0b111111.</li> <li>The Rice partition order (see <xref target="coded-residual"></xref>) <bcp14>MUST</bcp14> be less than or equal to 8.</li> <li>The channel ordering <bcp14>MUST</bcp14> be equal to one defined in <xref target="channels-bits"></xref>, i.e., the FLAC file <bcp14>MUST NOT</bcp14> need a WAVEFORMATEXTENSIBLE_CHANNEL_MASK tag to describe the channel ordering. See <xref target="channel-mask"></xref> for details.</li> </ul> </section> <section anchor="file-level-metadata"><name>File-Level Metadata</name> <t>At the start of a FLAC file or stream, following the <tt>fLaC</tt> ASCII file signature, one or more metadata blocks <bcp14>MUST</bcp14> be present before any audio frames appear. The first metadata block <bcp14>MUST</bcp14> be a streaminfo metadata block.</t> <section anchor="metadata-block-header"><name>Metadata Block Header</name> <t>Each metadata block starts with a 4-byte header. The first bit in this header flags whether a metadata block is the last one. It is 0 when other metadata blocks follow; otherwise, it is 1. The 7 remaining bits of the first header byte contain the type of the metadata block as an unsigned number between 0 and 126, according to the following table. A value of 127 (i.e., 0b1111111) is forbidden. 
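</t>
<t>As a non-normative illustration of the header layout, a minimal Python sketch of header parsing could look as follows. The function name <tt>parse_block_header</tt> and its return shape are assumptions made for this example and are not part of the format. The last three header bytes are read as the big-endian block length.</t>
<sourcecode type="python"><![CDATA[
def parse_block_header(header: bytes):
    """Split a 4-byte metadata block header into its three fields.

    Hypothetical helper for illustration only.
    """
    if len(header) != 4:
        raise ValueError("a metadata block header is exactly 4 bytes")
    # Most significant bit of the first byte: 1 marks the last block.
    is_last = bool(header[0] & 0x80)
    # Remaining 7 bits: block type, where 127 is forbidden.
    block_type = header[0] & 0x7F
    if block_type == 127:
        raise ValueError("block type 127 is forbidden")
    # Three bytes, big-endian: block size excluding this header.
    length = int.from_bytes(header[1:4], "big")
    return is_last, block_type, length
]]></sourcecode>
<t>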
The three bytes that follow code for the size of the metadata block in bytes, excluding the 4 header bytes, as an unsigned number coded big-endian.</t> <table> <thead> <tr> <th align="left">Value</th> <th align="left">Metadata Block Type</th> </tr> </thead> <tbody> <tr> <td align="left">0</td> <td align="left">Streaminfo</td> </tr> <tr> <td align="left">1</td> <td align="left">Padding</td> </tr> <tr> <td align="left">2</td> <td align="left">Application</td> </tr> <tr> <td align="left">3</td> <td align="left">Seek table</td> </tr> <tr> <td align="left">4</td> <td align="left">Vorbis comment</td> </tr> <tr> <td align="left">5</td> <td align="left">Cuesheet</td> </tr> <tr> <td align="left">6</td> <td align="left">Picture</td> </tr> <tr> <td align="left">7 - 126</td> <td align="left">Reserved</td> </tr> <tr> <td align="left">127</td> <td align="left">Forbidden (to avoid confusion with a frame sync code)</td> </tr> </tbody> </table> </section> <section anchor="streaminfo"><name>Streaminfo</name> <t>The streaminfo metadata block has information about the whole stream, such as sample rate, number of channels, total number of samples, etc. It <bcp14>MUST</bcp14> be present as the first metadata block in the stream. Other metadata blocks <bcp14>MAY</bcp14> follow. There <bcp14>MUST</bcp14> be no more than one streaminfo metadata block per FLAC stream.</t> <t>If the streaminfo metadata block contains incorrect or incomplete information, decoder behavior is left unspecified (i.e., it is up to the decoder implementation). A decoder <bcp14>MAY</bcp14> choose to stop further decoding when the information supplied by the streaminfo metadata block turns out to be incorrect or contains forbidden values. 
A decoder accepting information from the streaminfo metadata block (most significantly, the maximum frame size, maximum block size, number of audio channels, number of bits per sample, and total number of samples) without doing further checks during decoding of audio frames could be vulnerable to buffer overflows. See also <xref target="security-considerations"></xref>.</t> <t>The following table describes the streaminfo metadata block in order, excluding the metadata block header.</t> <table> <thead> <tr> <th align="left">Data</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left"><tt>u(16)</tt></td> <td align="left">The minimum block size (in samples) used in the stream, excluding the last block.</td> </tr> <tr> <td align="left"><tt>u(16)</tt></td> <td align="left">The maximum block size (in samples) used in the stream.</td> </tr> <tr> <td align="left"><tt>u(24)</tt></td> <td align="left">The minimum frame size (in bytes) used in the stream. A value of <tt>0</tt> signifies that the value is not known.</td> </tr> <tr> <td align="left"><tt>u(24)</tt></td> <td align="left">The maximum frame size (in bytes) used in the stream. A value of <tt>0</tt> signifies that the value is not known.</td> </tr> <tr> <td align="left"><tt>u(20)</tt></td> <td align="left">Sample rate in Hz.</td> </tr> <tr> <td align="left"><tt>u(3)</tt></td> <td align="left">(number of channels)-1. FLAC supports from 1 to 8 channels.</td> </tr> <tr> <td align="left"><tt>u(5)</tt></td> <td align="left">(bits per sample)-1. FLAC supports from 4 to 32 bits per sample.</td> </tr> <tr> <td align="left"><tt>u(36)</tt></td> <td align="left">Total number of interchannel samples in the stream. A value of 0 here means the number of total samples is unknown.</td> </tr> <tr> <td align="left"><tt>u(128)</tt></td> <td align="left">MD5 checksum of the unencoded audio data. 
This allows the decoder to determine if an error exists in the audio data even when, despite the error, the bitstream itself is valid. A value of <tt>0</tt> signifies that the value is not known.</td> </tr> </tbody> </table> <t>The minimum block size and the maximum block size <bcp14>MUST</bcp14> be in the 16-65535 range. The minimum block size <bcp14>MUST</bcp14> be equal to or less than the maximum block size.</t> <t>Any frame but the last one <bcp14>MUST</bcp14> have a block size equal to or greater than the minimum block size and <bcp14>MUST</bcp14> have a block size equal to or less than the maximum block size. The last frame <bcp14>MUST</bcp14> have a block size equal to or less than the maximum block size; it does not have to comply to the minimum block size because the block size of that frame must be able to accommodate the length of the audio data the stream contains.</t> <t>If the minimum block size is equal to the maximum block size, the file contains a fixed block size stream, as the minimum block size excludes the last block. Note that in the case of a stream with a variable block size, the actual maximum block size <bcp14>MAY</bcp14> be smaller than the maximum block size listed in the streaminfo metadata block, and the actual smallest block size excluding the last block <bcp14>MAY</bcp14> be larger than the minimum block size listed in the streaminfo metadata block. This is because the encoder has to write these fields before receiving any input audio data and cannot know beforehand what block sizes it will use, only between what bounds the block sizes will be chosen.</t> <t>The sample rate <bcp14>MUST NOT</bcp14> be 0 when the FLAC file contains audio. A sample rate of 0 <bcp14>MAY</bcp14> be used when non-audio is represented. 
This is useful if data is encoded that is not along a time axis or when the sample rate of the data lies outside the range that FLAC can represent in the streaminfo metadata block. If a sample rate of 0 is used, it is recommended to store the meaning of the encoded content in a Vorbis comment field (see <xref target="vorbis-comment"></xref>) or an application metadata block (see <xref target="application"></xref>). This document does not define such metadata.</t> <t>The MD5 checksum is computed by applying the MD5 message-digest algorithm in <xref target="RFC1321"></xref>. The message to this algorithm consists of all the samples of all channels interleaved, represented in signed, little-endian form. This interleaving is on a per-sample basis, so for a stereo file, this means the first sample of the first channel, then the first sample of the second channel, then the second sample of the first channel, etc. Before computing the checksum, all samples must be byte-aligned. 
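</t>
<t>The construction of the MD5 message can be sketched in Python as follows. This is a minimal, non-normative illustration that assumes the decoded samples are already available as lists of Python integers, one list per channel. The function name <tt>flac_md5</tt> is hypothetical. Each sample is written as a signed little-endian integer occupying a whole number of bytes, which sign-extends samples whose bit depth is not a multiple of 8.</t>
<sourcecode type="python"><![CDATA[
import hashlib


def flac_md5(channels, bits_per_sample):
    """Return the MD5 digest of interleaved, byte-aligned samples.

    `channels` is assumed to be a list of equally long lists of
    signed integers, one list per channel (illustrative layout).
    """
    # Round the bit depth up to a whole number of bytes.
    bytes_per_sample = (bits_per_sample + 7) // 8
    md5 = hashlib.md5()
    # zip(*channels) yields one tuple per interchannel sample:
    # 1st sample of each channel, then 2nd sample of each, etc.
    for interchannel_sample in zip(*channels):
        for sample in interchannel_sample:
            # signed=True gives two's complement, so the unused
            # upper bits are filled with the sign bit.
            md5.update(
                sample.to_bytes(bytes_per_sample, "little",
                                signed=True))
    return md5.digest()
]]></sourcecode>
<t>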
If the bit depth is not a whole number of bytes, the value of each sample is sign-extended to the next whole number of bytes.</t> <t>In the case of a 2-channel stream with 6-bit samples, bits will be lined up as follows:</t> <artwork type="ascii-art"><![CDATA[
SSAAAAAASSBBBBBBSSCCCCCC
^ ^     ^ ^     ^ ^
| |     | |     | Bits of 2nd sample of 1st channel
| |     | |     Sign extension bits of 2nd sample of 1st channel
| |     | Bits of 1st sample of 2nd channel
| |     Sign extension bits of 1st sample of 2nd channel
| Bits of 1st sample of 1st channel
Sign extension bits of 1st sample of 1st channel
]]></artwork> <t>In the case of a 1-channel stream with 12-bit samples, bits are lined up in little-endian byte order as follows:</t> <artwork type="ascii-art"><![CDATA[
AAAAAAAASSSSAAAABBBBBBBBSSSSBBBB
^       ^   ^   ^       ^   ^
|       |   |   |       |   Most-significant 4 bits of 2nd sample
|       |   |   |       Sign extension bits of 2nd sample
|       |   |   Least-significant 8 bits of 2nd sample
|       |   Most-significant 4 bits of 1st sample
|       Sign extension bits of 1st sample
Least-significant 8 bits of 1st sample
]]></artwork> </section> <section anchor="padding"><name>Padding</name> <t>The padding metadata block allows for an arbitrary amount of padding. This block is useful when it is known that metadata will be edited after encoding; the user can instruct the encoder to reserve a padding block of sufficient size so that when metadata is added, it will simply overwrite the padding (which is relatively quick) instead of having to insert it into the existing file (which would normally require rewriting the entire file). 
There <bcp14>MAY</bcp14> be one or more padding metadata blocks per FLAC stream.</t> <table> <thead> <tr> <th align="left">Data</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left"><tt>u(n)</tt></td> <td align="left">n "0" bits (n <bcp14>MUST</bcp14> be a multiple of 8, i.e., a whole number of bytes, and <bcp14>MAY</bcp14> be zero). n is 8 times the size described in the metadata block header.</td> </tr> </tbody> </table> </section> <section anchor="application"><name>Application</name> <t>The application metadata block is for use by third-party applications. The only mandatory field is a 32-bit application identifier (application ID). Application IDs are registered in the IANA "FLAC Application Metadata Block IDs" registry (see <xref target="application-id-registry"></xref>).</t> <table> <thead> <tr> <th align="left">Data</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left"><tt>u(32)</tt></td> <td align="left">Registered application ID.</td> </tr> <tr> <td align="left"><tt>u(n)</tt></td> <td align="left">Application data (n <bcp14>MUST</bcp14> be a multiple of 8, i.e., a whole number of bytes). n is 8 times the size described in the metadata block header minus the 32 bits already used for the application ID.</td> </tr> </tbody> </table> </section> <section anchor="seektable"><name>Seek Table</name> <t>The seek table metadata block can be used to store seek points. It is possible to seek to any given sample in a FLAC stream without a seek table, but the delay can be unpredictable since the bitrate may vary widely within a stream. By adding seek points to a stream, this delay can be significantly reduced. 
There <bcp14>MUST NOT</bcp14> be more than one seek table metadata block in a stream, but the table can have any number of seek points.</t> <t>Each seek point takes 18 bytes, so a seek table with 1% resolution within a stream adds less than 2 kilobytes of data. The number of seek points is implied by the size described in the metadata block header, i.e., equal to size / 18. There is also a special "placeholder" seek point that will be ignored by decoders but can be used to reserve space for future seek point insertion.</t> <table> <thead> <tr> <th align="left">Data</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left">Seek points</td> <td align="left">Zero or more seek points as defined in <xref target="seekpoint"></xref>.</td> </tr> </tbody> </table> <t>A seek table is generally not usable for seeking in a FLAC file embedded in a container (see <xref target="container-mappings"></xref>), as such containers usually interleave FLAC data with other data and the offsets used in seek points are those of an unmuxed FLAC stream. 
Also, containers often provide their own seeking methods. However, it is possible to store the seek table in the container along with other metadata when muxing a FLAC file, so this stored seek table can be restored when demuxing the FLAC stream into a standalone FLAC file.</t> <section anchor="seekpoint"><name>Seek Point</name> <table> <thead> <tr> <th align="left">Data</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left"><tt>u(64)</tt></td> <td align="left">Sample number of the first sample in the target frame or <tt>0xFFFFFFFFFFFFFFFF</tt> for a placeholder point.</td> </tr> <tr> <td align="left"><tt>u(64)</tt></td> <td align="left">Offset (in bytes) from the first byte of the first frame header to the first byte of the target frame's header.</td> </tr> <tr> <td align="left"><tt>u(16)</tt></td> <td align="left">Number of samples in the target frame.</td> </tr> </tbody> </table> <t>Notes:</t> <ul> <li>For placeholder points, the second and third field values are undefined.</li> <li>Seek points within a table <bcp14>MUST</bcp14> be sorted in ascending order by sample number.</li> <li>Seek points within a table <bcp14>MUST</bcp14> be unique by sample number, with the exception of placeholder points.</li> <li>The previous two notes imply that there <bcp14>MAY</bcp14> be any number of placeholder points, but they <bcp14>MUST</bcp14> all occur at the end of the table.</li> <li>The sample offsets are those of an unmuxed FLAC stream. The offsets <bcp14>MUST NOT</bcp14> be updated on muxing to reflect the new offsets of FLAC frames in a container.</li> </ul> </section> </section> <section anchor="vorbis-comment"><name>Vorbis Comment</name> <t>A Vorbis comment metadata block contains human-readable information coded in UTF-8. 
The name "Vorbis comment" points to the fact that the Vorbis codec stores such metadata in almost the same way (see <xref target="Vorbis"></xref>). A Vorbis comment metadata block consists of a vendor string optionally followed by a number of fields, which are pairs of field names and field contents. The vendor string contains the name of the program that generated the file or stream. The fields contain metadata describing various aspects of the contained audio. Many users refer to these fields as "FLAC tags" or simply as "tags". A FLAC file <bcp14>MUST NOT</bcp14> contain more than one Vorbis comment metadata block.</t> <t>In a Vorbis comment metadata block, the metadata block header is directly followed by 4 bytes containing the length in bytes of the vendor string as an unsigned number coded little-endian. The vendor string follows, is UTF-8 coded and is not terminated in any way.</t> <t>Following the vendor string are 4 bytes containing the number of fields that are in the Vorbis comment block, stored as an unsigned number coded little-endian. If this number is non-zero, it is followed by the fields themselves, each of which is stored with a 4-byte length. For each field, the field length in bytes is stored as a 4-byte unsigned number coded little-endian. The field itself follows it. Like the vendor string, the field is UTF-8 coded and not terminated in any way.</t> <t>Each field consists of a field name and field contents, separated by an = character. The field name <bcp14>MUST</bcp14> only consist of UTF-8 code points U+0020 through U+007E, excluding U+003D, which is the = character. In other words, the field name can contain all printable ASCII characters except the equals sign. 
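</t>
<t>The layout described so far can be sketched as a small Python parser. This is a minimal, non-normative illustration that assumes <tt>block</tt> holds the body of a Vorbis comment metadata block with the 4-byte metadata block header already removed. The function name <tt>parse_vorbis_comment</tt> and the returned tuple shape are assumptions made for this example.</t>
<sourcecode type="python"><![CDATA[
import struct


def parse_vorbis_comment(block: bytes):
    """Return (vendor, [(name, contents), ...]) for a block body.

    Hypothetical helper for illustration only.
    """
    pos = 0

    def read_u32_le():
        # Lengths and counts in this block are little-endian.
        nonlocal pos
        (value,) = struct.unpack_from("<I", block, pos)
        pos += 4
        return value

    def read_utf8(length):
        # Strings are UTF-8 coded and not terminated.
        nonlocal pos
        if pos + length > len(block):
            raise ValueError("string runs past end of block")
        text = block[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_utf8(read_u32_le())
    fields = []
    for _ in range(read_u32_le()):
        field = read_utf8(read_u32_le())
        # Split at the first '='; contents may contain more.
        name, separator, contents = field.partition("=")
        if not separator:
            raise ValueError("field lacks an '=' separator")
        if not all(0x20 <= ord(c) <= 0x7E for c in name):
            raise ValueError("field name is not printable ASCII")
        fields.append((name, contents))
    return vendor, fields
]]></sourcecode>
<t>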
The evaluation of the field names <bcp14>MUST</bcp14> be case insensitive, so U+0041 through U+005A (A-Z) <bcp14>MUST</bcp14> be considered equivalent to U+0061 through U+007A (a-z). The field contents can contain any UTF-8 character.</t> <t>Note that the Vorbis comment as used in Vorbis allows for on the order of 2<sup>64</sup> bytes of data whereas the FLAC metadata block is limited to 2<sup>24</sup> bytes. Given the stated purpose of Vorbis comments, i.e., human-readable textual information, the FLAC metadata block limit is unlikely to be restrictive. Also, note that the 32-bit field lengths are coded little-endian as opposed to the usual big-endian coding of fixed-length integers in the rest of the FLAC format.</t> <section anchor="standard-field-names"><name>Standard Field Names</name> <t>Only one standard field name is defined: the channel mask field (see <xref target="channel-mask"></xref>). No other field names are defined because the applicability of any field name is strongly tied to the content it is associated with. For example, field names that are useful for describing files that contain a single work of music would be unusable when labeling archived broadcasts, recordings of any kind, or a collection of music works. Even when describing a single work of music, different conventions exist depending on the kind of music: orchestral music differs from music by solo artists or bands.</t> <t>Despite the fact that no field names are formally defined, there is a general trend among devices and software capable of FLAC playback that are meant to play music. Most of those recognize at least the following field names:</t> <dl> <dt>Title:</dt><dd>Name of the current work.</dd> <dt>Artist:</dt><dd>Name of the artist generally responsible for the current work. 
For orchestral works, this is usually the composer; otherwise, it is often the performer.</dd> <dt>Album:</dt><dd>Name of the collection the current work belongs to.</dd> </dl> <t>For a more comprehensive list of possible field names suited for describing a single work of music in various genres, the list of tags used in the MusicBrainz project is suggested; see <xref target="MusicBrainz"></xref>.</t> </section> <section anchor="channel-mask"><name>Channel Mask</name> <t>Besides fields containing information about the work itself, one field is defined for technical reasons: WAVEFORMATEXTENSIBLE_CHANNEL_MASK. This field is used to communicate that the channels in a file differ from the default channels defined in <xref target="channels-bits"></xref>. For example, by default, a FLAC file containing two channels is interpreted to contain a left and right channel, but with this field, it is possible to describe different channel contents.</t> <t>The channel mask consists of flag bits indicating which channels are present. The flags only signal which channels are present, not in which order, so if a file to be encoded has channels that are ordered differently, they have to be reordered. This mask is stored with a hexadecimal representation preceded by 0x; see the examples below. Please note that a file in which the channel order is defined through the WAVEFORMATEXTENSIBLE_CHANNEL_MASK is not streamable (see <xref target="streamable-subset"></xref>), as the field is not found in each frame header. 
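</t>
<t>Decoding such a mask can be sketched in Python as follows. This is a minimal, non-normative illustration: the list <tt>SPEAKERS</tt> simply restates the bit positions from the table that follows, and the function name <tt>channels_from_mask</tt> is hypothetical. Bits above 17 are ignored in this sketch.</t>
<sourcecode type="python"><![CDATA[
# Index in this list equals the bit number in the mask.
SPEAKERS = [
    "front left", "front right", "front center",
    "low-frequency effects (LFE)", "back left", "back right",
    "front left of center", "front right of center",
    "back center", "side left", "side right", "top center",
    "top front left", "top front center", "top front right",
    "top rear left", "top rear center", "top rear right",
]


def channels_from_mask(field_contents: str):
    """List the speaker positions flagged in a channel mask.

    `field_contents` is the text after the '=' of the field,
    e.g. "0x12104"; zero padding and either case are accepted
    because int() with base 16 handles both.
    """
    mask = int(field_contents, 16)
    return [name for bit, name in enumerate(SPEAKERS)
            if (mask >> bit) & 1]
]]></sourcecode>
<t>With this sketch, <tt>channels_from_mask("0x8")</tt> yields only the LFE channel, and a mask of <tt>0x0</tt> yields an empty list.</t>
<t>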
The mask bits can be found in the following table.</t> <table anchor="mask-bits-table"> <thead> <tr> <th align="left">Bit Number</th> <th align="left">Channel Description</th> </tr> </thead> <tbody> <tr> <td align="left">0</td> <td align="left">Front left</td> </tr> <tr> <td align="left">1</td> <td align="left">Front right</td> </tr> <tr> <td align="left">2</td> <td align="left">Front center</td> </tr> <tr> <td align="left">3</td> <td align="left">Low-frequency effects (LFE)</td> </tr> <tr> <td align="left">4</td> <td align="left">Back left</td> </tr> <tr> <td align="left">5</td> <td align="left">Back right</td> </tr> <tr> <td align="left">6</td> <td align="left">Front left of center</td> </tr> <tr> <td align="left">7</td> <td align="left">Front right of center</td> </tr> <tr> <td align="left">8</td> <td align="left">Back center</td> </tr> <tr> <td align="left">9</td> <td align="left">Side left</td> </tr> <tr> <td align="left">10</td> <td align="left">Side right</td> </tr> <tr> <td align="left">11</td> <td align="left">Top center</td> </tr> <tr> <td align="left">12</td> <td align="left">Top front left</td> </tr> <tr> <td align="left">13</td> <td align="left">Top front center</td> </tr> <tr> <td align="left">14</td> <td align="left">Top front right</td> </tr> <tr> <td align="left">15</td> <td align="left">Top rear left</td> </tr> <tr> <td align="left">16</td> <td align="left">Top rear center</td> </tr> <tr> <td align="left">17</td> <td align="left">Top rear right</td> </tr> </tbody> </table> <t>Following are three examples:</t> <ul> <li>A file has a single channel -- an LFE channel. The Vorbis comment field is WAVEFORMATEXTENSIBLE_CHANNEL_MASK=0x8.</li> <li>A file has four channels -- front left, front right, top front left, and top front right. 
The Vorbis comment field is WAVEFORMATEXTENSIBLE_CHANNEL_MASK=0x5003.</li> <li>An input has four channels -- back center, top front center, front center, and top rear center in that order. These have to be reordered to front center, back center, top front center, and top rear center. The Vorbis comment field added is WAVEFORMATEXTENSIBLE_CHANNEL_MASK=0x12104.</li> </ul> <t>WAVEFORMATEXTENSIBLE_CHANNEL_MASK fields <bcp14>MAY</bcp14> be padded with zeros, for example, 0x0008 for a single LFE channel. Parsing of WAVEFORMATEXTENSIBLE_CHANNEL_MASK fields <bcp14>MUST</bcp14> be case-insensitive for both the field name and the field contents.</t> <t>A WAVEFORMATEXTENSIBLE_CHANNEL_MASK field of 0x0 can be used to indicate that none of the audio channels of a file correlate with speaker positions. This is the case when audio needs to be decoded into speaker positions (e.g., Ambisonics B-format audio) or when a multitrack recording is contained.</t> <t>It is possible for a WAVEFORMATEXTENSIBLE_CHANNEL_MASK field to code for fewer channels than are present in the audio. If that is the case, the remaining channels <bcp14>SHOULD NOT</bcp14> be rendered by a playback application unfamiliar with their purpose. For example, the Ambisonics UHJ format is compatible with stereo playback: its first two channels can be played back on stereo equipment, but all four channels together can be decoded into surround sound. For that example, the Vorbis comment field WAVEFORMATEXTENSIBLE_CHANNEL_MASK=0x3 would be set, indicating that the first two channels are front left and front right and other channels do not correlate with speaker positions directly.</t> <t>If audio channels not assigned to any speaker are contained and decoding to speaker positions is possible, it is recommended to provide metadata on how this decoding should take place in another Vorbis comment field or an application metadata block. 
This document does not define such metadata.</t> </section> </section> <section anchor="cuesheet"><name>Cuesheet</name> <t>A cuesheet metadata block can be used either to store the track and index point structure of a Compact Disc Digital Audio (CD-DA) along with its audio or to provide a mechanism to store locations of interest within a FLAC file. Certain aspects of this metadata block come directly from the CD-DA specification (called Red Book), which is standardized as <xref target="IEC.60908.1999"></xref>. The description below is complete, and further reference to <xref target="IEC.60908.1999"/> is not needed to implement this metadata block.</t> <t>The structure of a cuesheet metadata block is enumerated in the following table.</t> <table> <thead> <tr> <th align="left">Data</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left"><tt>u(128*8)</tt></td> <td align="left">Media catalog number in ASCII printable characters 0x20-0x7E.</td> </tr> <tr> <td align="left"><tt>u(64)</tt></td> <td align="left">Number of lead-in samples.</td> </tr> <tr> <td align="left"><tt>u(1)</tt></td> <td align="left"><tt>1</tt> if the cuesheet corresponds to a CD-DA; else <tt>0</tt>.</td> </tr> <tr> <td align="left"><tt>u(7+258*8)</tt></td> <td align="left">Reserved. All bits <bcp14>MUST</bcp14> be set to zero.</td> </tr> <tr> <td align="left"><tt>u(8)</tt></td> <td align="left">Number of tracks in this cuesheet.</td> </tr> <tr> <td align="left">Cuesheet tracks</td> <td align="left">A number of structures as specified in <xref target="cuesheet-track"></xref> equal to the number of tracks specified previously.</td> </tr> </tbody> </table> <t>If the media catalog number is less than 128 bytes long, it is right-padded with 0x00 bytes. 
For CD-DA, this is a 13-digit number followed by 115 0x00 bytes.</t> <t>The number of lead-in samples has meaning only for CD-DA cuesheets; for other uses, it should be 0. For CD-DA, the lead-in is the TRACK 00 area where the table of contents is stored; more precisely, it is the number of samples from the first sample of the media to the first sample of the first index point of the first track. According to <xref target="IEC.60908.1999"></xref>, the lead-in <bcp14>MUST</bcp14> be silent, and CD grabbing software does not usually store it; additionally, the lead-in <bcp14>MUST</bcp14> be at least two seconds but <bcp14>MAY</bcp14> be longer. For these reasons, the lead-in length is stored here so that the absolute position of the first track can be computed. Note that the lead-in stored here is the number of samples up to the first index point of the first track, not necessarily to INDEX 01 of the first track; even the first track <bcp14>MAY</bcp14> have INDEX 00 data.</t> <t>The number of tracks <bcp14>MUST</bcp14> be at least 1, as a cuesheet block <bcp14>MUST</bcp14> have a lead-out track. For CD-DA, this number <bcp14>MUST</bcp14> be no more than 100 (99 regular tracks and one lead-out track). The lead-out track is always the last track in the cuesheet. 
For CD-DA, the lead-out track number <bcp14>MUST</bcp14> be 170 as specified by <xref target="IEC.60908.1999"></xref>; otherwise, it <bcp14>MUST</bcp14> be 255.</t> <section anchor="cuesheet-track"><name>Cuesheet Track</name> <table> <thead> <tr> <th align="left">Data</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left"><tt>u(64)</tt></td> <td align="left">Track offset of the first index point in samples, relative to the beginning of the FLAC audio stream.</td> </tr> <tr> <td align="left"><tt>u(8)</tt></td> <td align="left">Track number.</td> </tr> <tr> <td align="left"><tt>u(12*8)</tt></td> <td align="left">Track ISRC.</td> </tr> <tr> <td align="left"><tt>u(1)</tt></td> <td align="left">The track type: 0 for audio, 1 for non-audio. This corresponds to the CD-DA Q-channel control bit 3.</td> </tr> <tr> <td align="left"><tt>u(1)</tt></td> <td align="left">The pre-emphasis flag: 0 for no pre-emphasis, 1 for pre-emphasis. This corresponds to the CD-DA Q-channel control bit 5.</td> </tr> <tr> <td align="left"><tt>u(6+13*8)</tt></td> <td align="left">Reserved. All bits <bcp14>MUST</bcp14> be set to zero.</td> </tr> <tr> <td align="left"><tt>u(8)</tt></td> <td align="left">The number of track index points.</td> </tr> <tr> <td align="left">Cuesheet track index points</td> <td align="left">For all tracks except the lead-out track, a number of structures as specified in <xref target="cuesheet-track-index-point"></xref> equal to the number of index points specified previously.</td> </tr> </tbody> </table> <t>Note that the track offset differs from the one in CD-DA, where the track's offset in the table of contents (TOC) is that of the track's INDEX 01 even if there is an INDEX 00. 
For CD-DA, the track offset <bcp14>MUST</bcp14> be evenly divisible by 588 samples (588 samples = 44100 samples/s * 1/75 s).</t> <t>A track number of 0 is not allowed because the CD-DA specification reserves this for the lead-in. For CD-DA, the number <bcp14>MUST</bcp14> be 1-99 or 170 for the lead-out; for non-CD-DA, the track number <bcp14>MUST</bcp14> be 255 for the lead-out. It is recommended to start with track 1 and increase sequentially. Track numbers <bcp14>MUST</bcp14> be unique within a cuesheet.</t> <t>The track ISRC (International Standard Recording Code) is a 12-digit alphanumeric code; see <xref target="ISRC-handbook"></xref>. A value of 12 ASCII 0x00 characters <bcp14>MAY</bcp14> be used to denote the absence of an ISRC.</t> <t>There <bcp14>MUST</bcp14> be at least one index point in every track in a cuesheet except for the lead-out track, which <bcp14>MUST</bcp14> have zero. For CD-DA, the number of index points <bcp14>MUST NOT</bcp14> be more than 100.</t> <section anchor="cuesheet-track-index-point"><name>Cuesheet Track Index Point</name> <table> <thead> <tr> <th align="left">Data</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left"><tt>u(64)</tt></td> <td align="left">Offset in samples, relative to the track offset, of the index point.</td> </tr> <tr> <td align="left"><tt>u(8)</tt></td> <td align="left">The track index point number.</td> </tr> <tr> <td align="left"><tt>u(3*8)</tt></td> <td align="left">Reserved. All bits <bcp14>MUST</bcp14> be set to zero.</td> </tr> </tbody> </table> <t>For CD-DA, the track index point offset <bcp14>MUST</bcp14> be evenly divisible by 588 samples (588 samples = 44100 samples/s * 1/75 s). Note that the offset is from the beginning of the track, not the beginning of the audio data.</t> <t>For CD-DA, a track index point number of 0 corresponds to the track pre-gap. 
The first index point in a track <bcp14>MUST</bcp14> have a number of 0 or 1, and subsequently, index point numbers <bcp14>MUST</bcp14> increase by 1. Index point numbers <bcp14>MUST</bcp14> be unique within a track.</t> </section> </section> </section> <section anchor="picture"><name>Picture</name> <t>The picture metadata block contains image data of a picture in some way belonging to the audio contained in the FLAC file. Its format is derived from the Attached Picture (APIC) frame in the ID3v2 specification; see <xref target="ID3v2"></xref>. However, contrary to the APIC frame in ID3v2, the media type and description are prepended with a 4-byte length field instead of being 0x00 delimited strings. A FLAC file <bcp14>MAY</bcp14> contain one or more picture metadata blocks.</t> <t>Note that while the length fields for media type, description, and picture data are 4 bytes in length and could code for a size up to 4 GiB in theory, the total metadata block size cannot exceed what can be described by the metadata block header, i.e., 16 MiB.</t> <t>Instead of picture data, the picture metadata block can also contain a URI as described in <xref target="RFC3986"></xref>.</t> <t>The structure of a picture metadata block is enumerated in the following table.</t> <table> <thead> <tr> <th align="left">Data</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left"><tt>u(32)</tt></td> <td align="left">The picture type according to <xref target="table13"/>.</td> </tr> <tr> <td align="left"><tt>u(32)</tt></td> <td align="left">The length of the media type string in bytes.</td> </tr> <tr> <td align="left"><tt>u(n*8)</tt></td> <td align="left">The media type string as specified by <xref target="RFC2046"></xref>, or the text string <tt>--></tt> to signify that the data part is a URI of the picture instead of the picture data itself. 
This field must be in printable ASCII characters 0x20-0x7E.</td> </tr> <tr> <td align="left"><tt>u(32)</tt></td> <td align="left">The length of the description string in bytes.</td> </tr> <tr> <td align="left"><tt>u(n*8)</tt></td> <td align="left">The description of the picture in UTF-8.</td> </tr> <tr> <td align="left"><tt>u(32)</tt></td> <td align="left">The width of the picture in pixels.</td> </tr> <tr> <td align="left"><tt>u(32)</tt></td> <td align="left">The height of the picture in pixels.</td> </tr> <tr> <td align="left"><tt>u(32)</tt></td> <td align="left">The color depth of the picture in bits per pixel.</td> </tr> <tr> <td align="left"><tt>u(32)</tt></td> <td align="left">For indexed-color pictures (e.g., GIF), the number of colors used, or <tt>0</tt> for non-indexed pictures.</td> </tr> <tr> <td align="left"><tt>u(32)</tt></td> <td align="left">The length of the picture data in bytes.</td> </tr> <tr> <td align="left"><tt>u(n*8)</tt></td> <td align="left">The binary picture data.</td> </tr> </tbody> </table><t>The height, width, color depth, and "number of colors" fields are for informational purposes only. Applications <bcp14>MUST NOT</bcp14> use them in decoding the picture or deciding how to display it, but applications <bcp14>MAY</bcp14> use them to decide whether or not to process a block (e.g., when selecting between different picture blocks) and <bcp14>MAY</bcp14> show them to the user. If a picture has no concept for any of these fields (e.g., vector images may not have a height or width in pixels) or the content of any field is unknown, the affected fields <bcp14>MUST</bcp14> be set to zero.</t> <t>The following table contains all the defined picture types. Values other than those listed in the table are reserved. There <bcp14>MAY</bcp14> only be one each of picture types 1 and 2 in a file. 
In general practice, many FLAC playback devices and software display the contents of a picture metadata block, if present, with picture type 3 (front cover) during playback.</t> <table anchor="table13"> <thead> <tr> <th align="left">Value</th> <th align="left">Picture Type</th> </tr> </thead> <tbody> <tr> <td align="left">0</td> <td align="left">Other</td> </tr> <tr> <td align="left">1</td> <td align="left">PNG file icon of 32x32 pixels (see <xref target="RFC2083"></xref>)</td> </tr> <tr> <td align="left">2</td> <td align="left">General file icon</td> </tr> <tr> <td align="left">3</td> <td align="left">Front cover</td> </tr> <tr> <td align="left">4</td> <td align="left">Back cover</td> </tr> <tr> <td align="left">5</td> <td align="left">Liner notes page</td> </tr> <tr> <td align="left">6</td> <td align="left">Media label (e.g., CD, Vinyl or Cassette label)</td> </tr> <tr> <td align="left">7</td> <td align="left">Lead artist, lead performer, or soloist</td> </tr> <tr> <td align="left">8</td> <td align="left">Artist or performer</td> </tr> <tr> <td align="left">9</td> <td align="left">Conductor</td> </tr> <tr> <td align="left">10</td> <td align="left">Band or orchestra</td> </tr> <tr> <td align="left">11</td> <td align="left">Composer</td> </tr> <tr> <td align="left">12</td> <td align="left">Lyricist or text writer</td> </tr> <tr> <td align="left">13</td> <td align="left">Recording location</td> </tr> <tr> <td align="left">14</td> <td align="left">During recording</td> </tr> <tr> <td align="left">15</td> <td align="left">During performance</td> </tr> <tr> <td align="left">16</td> <td align="left">Movie or video screen capture</td> </tr> <tr> <td align="left">17</td> <td align="left">A bright colored fish</td> </tr> <tr> <td align="left">18</td> <td align="left">Illustration</td> </tr> <tr> <td align="left">19</td> <td align="left">Band or artist logotype</td> </tr> <tr> <td 
align="left">20</td> <td align="left">Publisher or studio logotype</td> </tr> </tbody> </table><t>The origin and use of value 17 ("A bright colored fish") is unclear. This was copied to maintain compatibility with ID3v2. Applications are discouraged from offering this value to users when embedding a picture.</t> <t>If a URI (not a picture) is contained in this block, the following points apply:</t> <ul> <li>The URI can be in either absolute or relative form. If a URI is in relative form, it is relative to the URI of the FLAC content being processed.</li> <li>Applications <bcp14>MUST</bcp14> obtain explicit user approval to retrieve images via remote protocols and to retrieve local images that are not located in the same directory as the FLAC file being processed.</li> <li>Applications supporting linked images <bcp14>MUST</bcp14> handle unavailability of URIs gracefully. They <bcp14>MAY</bcp14> report unavailability to the user.</li> <li>Applications <bcp14>MAY</bcp14> reject processing URIs for any reason, particularly for security or privacy reasons.</li> </ul> </section> </section> <section anchor="frame-structure"><name>Frame Structure</name> <t>One or more frames follow directly after the last metadata block. Each frame consists of a frame header, one or more subframes, padding zero bits to achieve byte alignment, and a frame footer. The number of subframes in each frame is equal to the number of audio channels.</t> <t>Each frame header stores the audio sample rate, number of bits per sample, and number of channels independently of the streaminfo metadata block and other frame headers. This was done to permit multicasting of FLAC files, but it also allows these properties to change mid-stream. 
Because not all environments in which FLAC decoders are used are able to cope with changes to these properties during playback, a decoder <bcp14>MAY</bcp14> choose to stop decoding on such a change. A decoder that does not check for such a change could be vulnerable to buffer overflows. See also <xref target="security-considerations"></xref>.</t> <t>Note that storing audio with changing audio properties in FLAC results in various practical problems. For example, these changes of audio properties must happen on a frame boundary or the process will not be lossless. When a variable block size is chosen to accommodate this, note that blocks smaller than 16 samples are not allowed; therefore, it is not possible to store an audio stream in which these properties change within 16 samples of the last change or the start of the file. Also, since the streaminfo metadata block can only accommodate a single set of properties, it is only valid for part of such an audio stream. Instead, it is <bcp14>RECOMMENDED</bcp14> to store an audio stream with changing properties in FLAC encapsulated in a container capable of handling such changes, as these do not suffer from the mentioned limitations. See <xref target="container-mappings"></xref> for details.</t> <section anchor="frame-header"><name>Frame Header</name> <t>Each frame <bcp14>MUST</bcp14> start on a byte boundary and start with the 15-bit frame sync code 0b111111111111100. Following the sync code is the blocking strategy bit, which <bcp14>MUST NOT</bcp14> change during the audio stream. The blocking strategy bit is 0 for a fixed block size stream or 1 for a variable block size stream. 
If the blocking strategy is known, a decoder can include this bit when searching for the start of a frame to reduce the possibility of encountering a false positive, as the first two bytes of a frame are either 0xFFF8 for a fixed block size stream or 0xFFF9 for a variable block size stream.</t> <section anchor="block-size-bits"><name>Block Size Bits</name> <t>Following the frame sync code and blocking strategy bit are 4 bits (the first 4 bits of the third byte of each frame) referred to as the block size bits. Their value relates to the block size according to the following table, where v is the value of the 4 bits as an unsigned number. If the block size bits code for an uncommon block size, this is stored after the coded number; see <xref target="uncommon-block-size"></xref>.</t> <table> <thead> <tr> <th align="left">Value</th> <th align="left">Block Size</th> </tr> </thead> <tbody> <tr> <td align="left">0b0000</td> <td align="left">Reserved</td> </tr> <tr> <td align="left">0b0001</td> <td align="left">192</td> </tr> <tr> <td align="left">0b0010 - 0b0101</td> <td align="left">144 * (2<sup>v</sup>), i.e., 576, 1152, 2304, or 4608</td> </tr> <tr> <td align="left">0b0110</td> <td align="left">Uncommon block size minus 1, stored as an 8-bit number</td> </tr> <tr> <td align="left">0b0111</td> <td align="left">Uncommon block size minus 1, stored as a 16-bit number</td> </tr> <tr> <td align="left">0b1000 - 0b1111</td> <td align="left">2<sup>v</sup>, i.e., 256, 512, 1024, 2048, 4096, 8192, 16384, or 32768</td> </tr> </tbody> </table></section> <section anchor="sample-rate-bits"><name>Sample Rate Bits</name> <t>The next 4 bits (the last 4 bits of the third byte of each frame), referred to as the sample rate bits, contain the sample rate of the audio according to the following table. 
If the sample rate bits code for an uncommon sample rate, this is stored after the uncommon block size; if no uncommon block size was used, this is stored after the coded number. See <xref target="uncommon-sample-rate"></xref>.</t> <table> <thead> <tr> <th align="left">Value</th> <th align="left">Sample Rate</th> </tr> </thead> <tbody> <tr> <td align="left">0b0000</td> <td align="left">Sample rate only stored in the streaminfo metadata block</td> </tr> <tr> <td align="left">0b0001</td> <td align="left">88.2 kHz</td> </tr> <tr> <td align="left">0b0010</td> <td align="left">176.4 kHz</td> </tr> <tr> <td align="left">0b0011</td> <td align="left">192 kHz</td> </tr> <tr> <td align="left">0b0100</td> <td align="left">8 kHz</td> </tr> <tr> <td align="left">0b0101</td> <td align="left">16 kHz</td> </tr> <tr> <td align="left">0b0110</td> <td align="left">22.05 kHz</td> </tr> <tr> <td align="left">0b0111</td> <td align="left">24 kHz</td> </tr> <tr> <td align="left">0b1000</td> <td align="left">32 kHz</td> </tr> <tr> <td align="left">0b1001</td> <td align="left">44.1 kHz</td> </tr> <tr> <td align="left">0b1010</td> <td align="left">48 kHz</td> </tr> <tr> <td align="left">0b1011</td> <td align="left">96 kHz</td> </tr> <tr> <td align="left">0b1100</td> <td align="left">Uncommon sample rate in kHz, stored as an 8-bit number</td> </tr> <tr> <td align="left">0b1101</td> <td align="left">Uncommon sample rate in Hz, stored as a 16-bit number</td> </tr> <tr> <td align="left">0b1110</td> <td align="left">Uncommon sample rate in Hz divided by 10, stored as a 16-bit number</td> </tr> <tr> <td align="left">0b1111</td> <td align="left">Forbidden</td> </tr> </tbody> </table></section> <section anchor="channels-bits"><name>Channels Bits</name> <t>The next 4 bits (the first 4 bits of the fourth byte of each frame), referred to as the 
channels bits, contain both the number of channels of the audio as well as any stereo decorrelation used according to the following table.</t> <t>If a channel layout different than the ones listed in the following table is used, this can be signaled with a WAVEFORMATEXTENSIBLE_CHANNEL_MASK tag in a Vorbis comment metadata block; see <xref target="channel-mask"></xref> for details. Note that even when such a different channel layout is specified with a WAVEFORMATEXTENSIBLE_CHANNEL_MASK and the channel ordering in the following table is overridden, the channels bits still contain the actual number of channels coded in the frame. For details on the way left-side, side-right, and mid-side stereo are coded, see <xref target="interchannel-decorrelation"></xref>.</t> <table> <thead> <tr> <th align="left">Value</th> <th align="left">Channels</th> </tr> </thead> <tbody> <tr> <td align="left">0b0000</td> <td align="left">1 channel: mono</td> </tr> <tr> <td align="left">0b0001</td> <td align="left">2 channels: left, right</td> </tr> <tr> <td align="left">0b0010</td> <td align="left">3 channels: left, right, center</td> </tr> <tr> <td align="left">0b0011</td> <td align="left">4 channels: front left, front right, back left, back right</td> </tr> <tr> <td align="left">0b0100</td> <td align="left">5 channels: front left, front right, front center, back/surround left, back/surround right</td> </tr> <tr> <td align="left">0b0101</td> <td align="left">6 channels: front left, front right, front center, LFE, back/surround left, back/surround right</td> </tr> <tr> <td align="left">0b0110</td> <td align="left">7 channels: front left, front right, front center, LFE, back center, side left, side right</td> </tr> <tr> <td align="left">0b0111</td> <td align="left">8 channels: front left, front right, front center, LFE, back left, back right, side left, side right</td> </tr> <tr> <td align="left">0b1000</td> <td align="left">2 channels: 
left, right; stored as left-side stereo</td> </tr> <tr> <td align="left">0b1001</td> <td align="left">2 channels: left, right; stored as side-right stereo</td> </tr> <tr> <td align="left">0b1010</td> <td align="left">2 channels: left, right; stored as mid-side stereo</td> </tr> <tr> <td align="left">0b1011 - 0b1111</td> <td align="left">Reserved</td> </tr> </tbody> </table></section> <section anchor="bit-depth-bits"><name>Bit Depth Bits</name> <t>The next 3 bits (bits 5, 6, and 7 of the fourth byte of each frame) contain the bit depth of the audio according to the following table. The next bit is reserved and <bcp14>MUST</bcp14> be zero.</t> <table> <thead> <tr> <th align="left">Value</th> <th align="left">Bit Depth</th> </tr> </thead> <tbody> <tr> <td align="left">0b000</td> <td align="left">Bit depth only stored in the streaminfo metadata block</td> </tr> <tr> <td align="left">0b001</td> <td align="left">8 bits per sample</td> </tr> <tr> <td align="left">0b010</td> <td align="left">12 bits per sample</td> </tr> <tr> <td align="left">0b011</td> <td align="left">Reserved</td> </tr> <tr> <td align="left">0b100</td> <td align="left">16 bits per sample</td> </tr> <tr> <td align="left">0b101</td> <td align="left">20 bits per sample</td> </tr> <tr> <td align="left">0b110</td> <td align="left">24 bits per sample</td> </tr> <tr> <td align="left">0b111</td> <td align="left">32 bits per sample</td> </tr> </tbody> </table> </section> <section anchor="coded-number"><name>Coded Number</name> <t>Following the reserved bit (starting at the fifth byte of the frame) is either a sample or a frame number, which will be referred to as the coded number. When dealing with variable block size streams, the sample number of the first sample in the frame is encoded. 
When the file contains a fixed block size stream, the frame number is encoded. See <xref target="frame-header"></xref> on the blocking strategy bit, which signals whether a stream is a fixed block size stream or a variable block size stream. See also <xref target="addition-of-blocking-strategy-bit"></xref>.</t> <t>The coded number is stored in a variable-length code like UTF-8 as defined in <xref target="RFC3629"></xref> but extended to a maximum of 36 bits unencoded or 7 bytes encoded.</t> <t>When a frame number is encoded, the value <bcp14>MUST NOT</bcp14> be larger than what fits a value of 31 bits unencoded or 6 bytes encoded. Please note that as most general-purpose UTF-8 encoders and decoders follow <xref target="RFC3629"></xref>, they will not be able to handle these extended codes. Furthermore, while UTF-8 is specifically used to encode characters, FLAC uses it to encode numbers instead. To encode or decode a coded number, follow the procedures in <xref target="RFC3629" sectionFormat="of" section="3"/>, but instead of using a character number, use a frame or sample number. 
In addition, use the extended table below instead of the table in <xref target="RFC3629" sectionFormat="of" section="3"/>.</t> <table> <thead> <tr> <th align="left">Number Range (Hexadecimal)</th> <th align="left">Octet Sequence (Binary)</th> </tr> </thead> <tbody> <tr> <td align="left">0000 0000 0000-<br/> 0000 0000 007F</td> <td align="left">0xxxxxxx</td> </tr> <tr> <td align="left">0000 0000 0080-<br/> 0000 0000 07FF</td> <td align="left">110xxxxx 10xxxxxx</td> </tr> <tr> <td align="left">0000 0000 0800-<br/> 0000 0000 FFFF</td> <td align="left">1110xxxx 10xxxxxx 10xxxxxx</td> </tr> <tr> <td align="left">0000 0001 0000-<br/> 0000 001F FFFF</td> <td align="left">11110xxx 10xxxxxx 10xxxxxx 10xxxxxx</td> </tr> <tr> <td align="left">0000 0020 0000-<br/> 0000 03FF FFFF</td> <td align="left">111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx</td> </tr> <tr> <td align="left">0000 0400 0000-<br/> 0000 7FFF FFFF</td> <td align="left">1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx</td> </tr> <tr> <td align="left">0000 8000 0000-<br/> 000F FFFF FFFF</td> <td align="left">11111110 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx</td> </tr> </tbody> </table><t>If the coded number is a frame number, it <bcp14>MUST</bcp14> be equal to the number of frames preceding the current frame. If the coded number is a sample number, it <bcp14>MUST</bcp14> be equal to the number of samples preceding the current frame. 
In a stream where these requirements are not met, seeking is not (reliably) possible.</t> <t>For example, for a frame that belongs to a variable block size stream and has exactly 51 billion samples preceding it, the coded number is constructed as follows:</t> <artwork type="ascii-art"><![CDATA[
Octets 1-5

0b11111110 0b10101111 0b10011111 0b10110101 0b10100011
               ^^^^^^     ^^^^^^     ^^^^^^     ^^^^^^
                 |          |          |      Bits 18-13
                 |          |      Bits 24-19
                 |      Bits 30-25
             Bits 36-31

Octets 6-7

0b10111000 0b10000000
    ^^^^^^     ^^^^^^
      |       Bits 6-1
  Bits 12-7
]]></artwork> <t>A decoder that relies on the coded number during seeking could be vulnerable to buffer overflows or getting stuck in an infinite loop if it seeks in a stream where the coded numbers are not strictly increasing or are otherwise not valid. See also <xref target="security-considerations"></xref>.</t> </section> <section anchor="uncommon-block-size"><name>Uncommon Block Size</name> <t>If the block size bits defined earlier in this section are 0b0110 or 0b0111 (uncommon block size minus 1 stored), the block size minus 1 follows the coded number as either an 8-bit or 16-bit unsigned number coded big-endian. A value of 65535 (corresponding to a block size of 65536) is forbidden and <bcp14>MUST NOT</bcp14> be used, because such a block size cannot be represented in the streaminfo metadata block. A value from 0 up to (and including) 14, which corresponds to a block size from 1 to 15, is only valid for the last frame in a stream and <bcp14>MUST NOT</bcp14> be used for any other frame. 
See also <xref target="streaminfo"></xref>.</t> </section> <section anchor="uncommon-sample-rate"><name>Uncommon Sample Rate</name> <t>If the sample rate bits are 0b1100, 0b1101, or 0b1110 (uncommon sample rate stored), the sample rate follows the uncommon block size (or the coded number if no uncommon block size is stored) as either an 8-bit or a 16-bit unsigned number coded big-endian.</t> <t>The sample rate <bcp14>MUST NOT</bcp14> be 0 when the subframe contains audio. A sample rate of 0 <bcp14>MAY</bcp14> be used when non-audio is represented. See <xref target="streaminfo"></xref> for details.</t> </section> <section anchor="frame-header-crc"><name>Frame Header CRC</name> <t>Finally, an 8-bit CRC follows the frame/sample number, an uncommon block size, or an uncommon sample rate (depending on whether the latter two are stored). This CRC is initialized with 0 and has the polynomial x<sup>8</sup> + x<sup>2</sup> + x<sup>1</sup> + x<sup>0</sup>. This CRC covers the whole frame header before the CRC, including the sync code.</t> </section> </section> <section anchor="subframes"><name>Subframes</name> <t>Following the frame header are a number of subframes equal to the number of audio channels. Note that subframes contain a bitstream that does not necessarily have to be a whole number of bytes, so only the first subframe starts at a byte boundary.</t> <section anchor="subframe-header"><name>Subframe Header</name> <t>Each subframe starts with a header. 
The first bit of the header <bcp14>MUST</bcp14> be 0, followed by 6 bits that describe which subframe type is used according to the following table, where v is the value of the 6 bits as an unsigned number.</t> <table> <thead> <tr> <th align="left">Value</th> <th align="left">Subframe Type</th> </tr> </thead> <tbody> <tr> <td align="left">0b000000</td> <td align="left">Constant subframe</td> </tr> <tr> <td align="left">0b000001</td> <td align="left">Verbatim subframe</td> </tr> <tr> <td align="left">0b000010 - 0b000111</td> <td align="left">Reserved</td> </tr> <tr> <td align="left">0b001000 - 0b001100</td> <td align="left">Subframe with a fixed predictor of order v-8; i.e., 0, 1, 2, 3, or 4</td> </tr> <tr> <td align="left">0b001101 - 0b011111</td> <td align="left">Reserved</td> </tr> <tr> <td align="left">0b100000 - 0b111111</td> <td align="left">Subframe with a linear predictor of order v-31; i.e., 1 through 32 (inclusive)</td> </tr> </tbody> </table> <t>Following the subframe type bits is a bit that flags whether the subframe uses any wasted bits (see <xref target="wasted-bits-per-sample"></xref>). If the flag bit is 0, the subframe doesn't use any wasted bits and the subframe header is complete. If the flag bit is 1, the subframe uses wasted bits and the number of used wasted bits minus 1 appears in unary form, directly following the flag bit.</t> </section> <section anchor="wasted-bits-per-sample"><name>Wasted Bits per Sample</name> <t>Most uncompressed audio file formats can only store audio samples with a bit depth that is an integer number of bytes. Samples in which the bit depth is not an integer number of bytes are usually stored in such formats by padding them with least-significant zero bits to a bit depth that is an integer number of bytes. 
For example, shifting a 14-bit sample left by 2 pads it to a 16-bit sample, which then has two zero least-significant bits. In this specification, these least-significant zero bits are referred to as wasted bits per sample or simply wasted bits. They are wasted in the sense that they contain no information but are stored anyway.</t> <t>The FLAC format can optionally take advantage of these wasted bits by signaling their presence and coding the subframe without them. To do this, the wasted bits per sample flag in a subframe header is set to 1 and the number of wasted bits per sample (k) minus 1 follows the flag in a unary encoding. For example, if k is 3, 0b001 follows. If k = 0, the wasted bits per sample flag is 0 and no unary-coded k follows. In this document, if a subframe header signals a certain number of wasted bits, it is said that it "uses" these wasted bits.</t> <t>If a subframe uses wasted bits (i.e., k is not equal to 0), samples are coded ignoring k least-significant bits. For example, if a frame not employing stereo decorrelation specifies a sample size of 16 bits per sample in the frame header and k of a subframe is 3, samples in the subframe are coded as 13 bits per sample. For more details, see <xref target="constant-subframe"></xref> on how the bit depth of a subframe is calculated. A decoder <bcp14>MUST</bcp14> add k least-significant zero bits by shifting left (padding) after decoding a subframe sample. If the frame has left-side, side-right, or mid-side stereo, a decoder <bcp14>MUST</bcp14> perform padding on the subframes before restoring the channels to left and right. 
The number of wasted bits per sample <bcp14>MUST</bcp14> be such that the resulting number of bits per sample (of which the calculation is explained in <xref target="constant-subframe"></xref>) is larger than zero.</t> <t>Besides audio files that have a certain number of wasted bits for the whole file, audio files exist in which the number of wasted bits varies. There are DVD-Audio discs in which blocks of samples have had their least-significant bits selectively zeroed to slightly improve the compression of their otherwise lossless Meridian Lossless Packing codec; see <xref target="MLP"></xref>. There are also audio processors like lossyWAV (see <xref target="lossyWAV"></xref>) that zero a number of least-significant bits for a block of samples, increasing the compression in a non-lossless way. Because of this, the number of wasted bits k <bcp14>MAY</bcp14> change between frames and <bcp14>MAY</bcp14> differ between subframes. If the number of wasted bits changes halfway through a subframe (e.g., the first part has 2 wasted bits and the second part has 4 wasted bits), the subframe uses the lowest number of wasted bits; otherwise, non-zero bits would be discarded, and the process would not be lossless.</t> </section> <section anchor="constant-subframe"><name>Constant Subframe</name> <t>In a constant subframe, only a single sample is stored. This sample is stored as an integer number coded big-endian, signed two's complement. The number of bits used to store this sample depends on the bit depth of the current subframe. The bit depth of a subframe is equal to the bit depth as coded in the frame header (see <xref target="bit-depth-bits"></xref>) minus the number of used wasted bits coded in the subframe header (see <xref target="wasted-bits-per-sample"></xref>). 
If a subframe is a side subframe (see <xref target="interchannel-decorrelation"></xref>), the bit depth of that subframe is increased by 1 bit.</t> </section> <section anchor="verbatim-subframe"><name>Verbatim Subframe</name> <t>A verbatim subframe stores all samples unencoded in sequential order. See <xref target="constant-subframe"></xref> on how a sample is stored unencoded. The number of samples that need to be stored in a subframe is provided by the block size in the frame header.</t> </section> <section anchor="fixed-predictor-subframe"><name>Fixed Predictor Subframe</name> <t>Five different fixed predictors are defined in the following table, one for each prediction order 0 through 4. The table also contains a derivation that explains the rationale for choosing these fixed predictors.</t> <table> <thead> <tr> <th align="left">Order</th> <th align="left">Prediction</th> <th align="left">Derivation</th> </tr> </thead> <tbody> <tr> <td align="left">0</td> <td align="left">0</td> <td align="left">N/A</td> </tr> <tr> <td align="left">1</td> <td align="left">a(n-1)</td> <td align="left">N/A</td> </tr> <tr> <td align="left">2</td> <td align="left">2 * a(n-1) - a(n-2)</td> <td align="left">a(n-1) + a'(n-1)</td> </tr> <tr> <td align="left">3</td> <td align="left">3 * a(n-1) - 3 * a(n-2) + a(n-3)</td> <td align="left">a(n-1) + a'(n-1) + a''(n-1)</td> </tr> <tr> <td align="left">4</td> <td align="left">4 * a(n-1) - 6 * a(n-2) + 4 * a(n-3) - a(n-4)</td> <td align="left">a(n-1) + a'(n-1) + a''(n-1) + a'''(n-1)</td> </tr> </tbody> </table><t>Where:</t> <ul> <li>n is the number of the sample being predicted.</li> <li>a(n) is the sample being predicted.</li> <li>a(n-1) is the sample before the one being predicted.</li> <li>a'(n-1) is the difference between the previous sample and the sample before that, i.e., a(n-1) - a(n-2). 
This is the closest available first-order discrete derivative.</li> <li>a''(n-1) is a'(n-1) - a'(n-2) or the closest available second-order discrete derivative.</li> <li>a'''(n-1) is a''(n-1) - a''(n-2) or the closest available third-order discrete derivative.</li> </ul> <t>As a predictor makes use of samples preceding the sample that is predicted, it can only be used when enough samples are known. As each subframe in FLAC is coded completely independently, the first few samples in each subframe cannot be predicted. Therefore, a number of so-called warm-up samples equal to the predictor order is stored. These are stored unencoded, bypassing the predictor and residual coding stages. See <xref target="constant-subframe"></xref> on how samples are stored unencoded. The table below defines how a fixed predictor subframe appears in the bitstream.</t> <table> <thead> <tr> <th align="left">Data</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left"><tt>s(n)</tt></td> <td align="left">Unencoded warm-up samples (n = subframe's bits per sample * predictor order).</td> </tr> <tr> <td align="left">Coded residual</td> <td align="left">Coded residual as defined in <xref target="coded-residual"></xref>.</td> </tr> </tbody> </table><t>Because fixed predictors are specified, they do not have to be stored. The fixed predictor order, which is stored in the subframe header, specifies which predictor is used.</t> <t>To encode a signal with a fixed predictor, each sample has the corresponding prediction subtracted and sent to the residual coder. To decode a signal with a fixed predictor, the residual is decoded, and then the prediction can be added for each sample. 
This means that decoding is necessarily a sequential process within a subframe, as for each sample, enough fully decoded previous samples are needed to calculate the prediction.</t> <t>For fixed predictor order 0, the prediction is always 0; thus, each residual sample is equal to its corresponding input or decoded sample. The difference between a fixed predictor with order 0 and a verbatim subframe is that a verbatim subframe stores all samples unencoded while a fixed predictor with order 0 has all its samples processed by the residual coder.</t> <t>The first-order fixed predictor is comparable to how differential pulse-code modulation (DPCM) encoding works, as the resulting residual sample is the difference between the corresponding sample and the sample before it. The higher-order fixed predictors can be understood as polynomials fitted to the previous samples.</t> </section> <section anchor="linear-predictor-subframe"><name>Linear Predictor Subframe</name> <t>Whereas fixed predictors are well suited for simple signals, using a (non-fixed) linear predictor on more complex signals can improve compression by making the residual samples even smaller. There is a certain trade-off, however, as storing the predictor coefficients takes up space as well.</t> <t>In the FLAC format, a predictor is defined by up to 32 predictor coefficients and a shift. To form a prediction, each coefficient is multiplied by its corresponding past sample, the results are summed, and this sum is then shifted. To encode a signal with a linear predictor, each sample has the corresponding prediction subtracted and sent to the residual coder. To decode a signal with a linear predictor, the residual is decoded, and then the prediction can be added for each sample. 
This means that decoding <bcp14>MUST</bcp14> be a sequential process within a subframe, as enough decoded samples are needed to calculate the prediction for each sample.</t> <t>The table below defines how a linear predictor subframe appears in the bitstream.</t> <table> <thead> <tr> <th align="left">Data</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left"><tt>s(n)</tt></td> <td align="left">Unencoded warm-up samples (n = subframe's bits per sample * LPC order).</td> </tr> <tr> <td align="left"><tt>u(4)</tt></td> <td align="left">(Predictor coefficient precision in bits)-1 (Note: 0b1111 is forbidden).</td> </tr> <tr> <td align="left"><tt>s(5)</tt></td> <td align="left">Prediction right shift needed in bits.</td> </tr> <tr> <td align="left"><tt>s(n)</tt></td> <td align="left">Predictor coefficients (n = predictor coefficient precision * LPC order).</td> </tr> <tr> <td align="left">Coded residual</td> <td align="left">Coded residual as defined in <xref target="coded-residual"></xref>.</td> </tr> </tbody> </table> <t>See <xref target="constant-subframe"></xref> on how the warm-up samples are stored unencoded. The predictor coefficients are stored as an integer number coded big-endian, signed two's complement, where the number of bits needed for each coefficient is defined by the predictor coefficient precision. While the prediction right shift is signed two's complement, this number <bcp14>MUST NOT</bcp14> be negative; see <xref target="restriction-of-lpc-shift-to-non-negative-values"></xref> for an explanation why this is.</t> <t>Please note that the order in which the predictor coefficients appear in the bitstream corresponds to which <strong>past</strong> sample they belong to. In other words, the order of the predictor coefficients is opposite to the chronological order of the samples. 
So, the first predictor coefficient has to be multiplied with the sample directly before the sample that is being predicted, the second predictor coefficient has to be multiplied with the sample before that, etc.</t> </section> <section anchor="coded-residual"><name>Coded Residual</name> <t>The first two bits in a coded residual indicate which coding method is used. See the table below.</t> <table anchor="coded-residual-table"> <thead> <tr> <th align="left">Value</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left">0b00</td> <td align="left">Partitioned Rice code with 4-bit parameters</td> </tr> <tr> <td align="left">0b01</td> <td align="left">Partitioned Rice code with 5-bit parameters</td> </tr> <tr> <td align="left">0b10 - 0b11</td> <td align="left">Reserved</td> </tr> </tbody> </table> <t>Both defined coding methods work the same way but differ in the number of bits used for Rice parameters. The 4 bits that directly follow the coding method bits form the partition order, which is an unsigned number. The rest of the coded residual consists of 2<sup>(partition order)</sup> partitions. For example, if the 4 bits are 0b1000, the partition order is 8, and the residual is split up into 2<sup>8</sup> = 256 partitions.</t> <t>Each partition contains a certain number of residual samples. The number of residual samples in the first partition is equal to (block size >> partition order) - predictor order, i.e., the block size divided by the number of partitions minus the predictor order. In all other partitions, the number of residual samples is equal to (block size >> partition order).</t> <t>The partition order <bcp14>MUST</bcp14> be such that the block size is evenly divisible by the number of partitions. 
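</t> <t>The partition layout can be sketched as follows. The function partition_sizes is a hypothetical helper written for this illustration and is not part of any FLAC library. It returns the number of residual samples in each partition and rejects partition orders that violate the constraints given in this section.</t>
<sourcecode type="python"><![CDATA[
```python
def partition_sizes(block_size, predictor_order, partition_order):
    """Return the number of residual samples in each partition."""
    partitions = 1 << partition_order
    if block_size % partitions != 0:
        raise ValueError("block size not divisible by partitions")
    per_partition = block_size >> partition_order
    if per_partition <= predictor_order:
        raise ValueError("partitions too small for predictor order")
    # Warm-up samples are not part of the residual, so the first
    # partition is shorter by the predictor order.
    first = per_partition - predictor_order
    return [first] + [per_partition] * (partitions - 1)
```
]]></sourcecode>
<t>With a block size of 4096, a predictor order of 4, and a partition order of 9, this yields 512 partitions: the first holds 4 residual samples and all others hold 8, for a total of 4092 residual samples.</t> <t>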
This means, for example, that only partition order 0 is allowed for all odd block sizes. The partition order also <bcp14>MUST</bcp14> be such that the (block size >> partition order) is larger than the predictor order. This means, for example, that with a block size of 4096 and a predictor order of 4, the partition order cannot be larger than 9.</t> <t>Each partition starts with a parameter. If the coded residual of a subframe is one with 4-bit Rice parameters (see the table at the start of this section), the first 4 bits of each partition are either a Rice parameter or an escape code. These 4 bits indicate an escape code if they are 0b1111; otherwise, they contain the Rice parameter as an unsigned number. If the coded residual of the current subframe is one with 5-bit Rice parameters, the first 5 bits of each partition indicate an escape code if they are 0b11111; otherwise, they contain the Rice parameter as an unsigned number as well.</t> <section anchor="escaped-partition"><name>Escaped Partition</name> <t>If an escape code was used, the partition does not contain a variable-length Rice-coded residual; rather, it contains a fixed-length unencoded residual. Directly following the escape code are 5 bits containing the number of bits with which each residual sample is stored, as an unsigned number. The residual samples themselves are stored signed two's complement. For example, when a partition is escaped and each residual sample is stored with 3 bits, the number -1 is represented as 0b111.</t> <t>Note that it is possible that the number of bits with which each sample is stored is 0, which means that all residual samples in that partition have a value of 0 and that no bits are used to store the samples. 
In that case, the partition contains nothing except the escape code and 0b00000.</t> </section> <section anchor="rice-code"><name>Rice Code</name> <t>If a Rice parameter was provided for a certain partition, that partition contains a Rice-coded residual. The residual samples, which are signed numbers, are represented by unsigned numbers in the Rice code. For non-negative numbers, the representation is the number doubled. For negative numbers, the representation is the number multiplied by -2 with 1 subtracted. This representation of signed numbers is also known as zigzag encoding. The zigzag-encoded residual is called the folded residual.</t> <t>Each folded residual sample is then split into two parts, a most-significant part and a least-significant part. The Rice parameter at the start of each partition determines where that split lies: it is the number of bits in the least-significant part. Each residual sample is then stored by coding the most-significant part as unary, followed by the least-significant part as binary.</t> <t>For example, take a partition with Rice parameter 3 containing a folded residual sample with 38 as its value, which is 0b100110 in binary. The most-significant part is 0b100 (4) and is stored in unary form as 0b00001. The least-significant part is 0b110 (6) and is stored as is. The Rice code word is thus 0b00001110. The Rice code words for all residual samples in a partition are stored consecutively.</t> <t>To decode a Rice code word, zero bits must be counted until encountering a one bit, after which a number of bits given by the Rice parameter must be read. The count of zero bits is shifted left by the Rice parameter (i.e., multiplied by 2 raised to the power Rice parameter) and bitwise ORed with (i.e., added to) the read value. This is the folded residual value. An even folded residual value is shifted right 1 bit (i.e., divided by 2) to get the (unfolded) residual value. 
An odd folded residual value is shifted right 1 bit and then has all bits flipped (i.e., 1 is added and the result is divided by -2) to get the (unfolded) residual value, subject to negative numbers being signed two's complement on the decoding machine.</t> <t><xref target="examples"></xref> shows decoding of a complete coded residual.</t> </section> <section anchor="residual-sample-value-limit"><name>Residual Sample Value Limit</name> <t>All residual sample values <bcp14>MUST</bcp14> be representable in the range offered by a 32-bit integer, signed one's complement. Equivalently, all residual sample values <bcp14>MUST</bcp14> fall in the range offered by a 32-bit integer signed two's complement, excluding the most negative possible value of that range. This means residual sample values <bcp14>MUST NOT</bcp14> have an absolute value equal to, or larger than, 2 to the power 31. A FLAC encoder <bcp14>MUST</bcp14> make sure of this. If a FLAC encoder is, for a certain subframe, unable to find a suitable predictor for which all residual samples fall within said range, it <bcp14>MUST</bcp14> default to writing a verbatim subframe. <xref target="numerical-considerations"></xref> explains in which circumstances residual samples are already implicitly representable in said range; thus, an additional check is not needed.</t> <t>The reason for this limit is to ensure that decoders can use 32-bit integers when processing residuals, simplifying decoding. 
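</t> <t>The Rice decoding steps and the limit above can be combined in a short sketch. The names unfold and read_rice are chosen for illustration only, and the bitstream is modelled as a Python iterator yielding 0 or 1 for each bit, which is an assumption made to keep the example self-contained.</t>
<sourcecode type="python"><![CDATA[
```python
def unfold(folded):
    """Map a folded (zigzag-encoded) value to a signed residual."""
    if folded % 2 == 0:
        return folded >> 1
    # Odd values: shift right, then flip all bits.
    return ~(folded >> 1)


def read_rice(bits, rice_parameter):
    """Decode one Rice code word from an iterator of bits."""
    zero_count = 0
    while next(bits) == 0:
        zero_count += 1
    low = 0
    for _ in range(rice_parameter):
        low = (low << 1) | next(bits)
    residual = unfold((zero_count << rice_parameter) | low)
    # Enforce the limit: |residual| must be below 2**31.
    if abs(residual) >= 2 ** 31:
        raise ValueError("residual outside the permitted range")
    return residual
```
]]></sourcecode>
<t>For the code word 0b00001110 with Rice parameter 3, read_rice(iter([0, 0, 0, 0, 1, 1, 1, 0]), 3) reconstructs the folded value 38 and returns the residual 19.</t> <t>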
The reason the most negative value of a 32-bit integer signed two's complement is specifically excluded is to prevent decoders from having to implement specific handling of that value, as it cannot be negated within a 32-bit signed integer, and most library routines calculating an absolute value have undefined behavior when processing that value.</t> </section> </section> </section> <section anchor="frame-footer"><name>Frame Footer</name> <t>Following the last subframe is the frame footer. If the last subframe is not byte aligned (i.e., the number of bits required to store all subframes put together is not divisible by 8), zero bits are added until byte alignment is reached. Following this is a 16-bit CRC, initialized with 0, with the polynomial x<sup>16</sup> + x<sup>15</sup> + x<sup>2</sup> + x<sup>0</sup>. This CRC covers the whole frame, excluding the 16-bit CRC but including the sync code.</t> </section> </section> <section anchor="container-mappings"><name>Container Mappings</name> <t>The FLAC format can be used without any container, as it already provides for the most basic features normally associated with a container. However, the functionality this basic container provides is rather limited, and for more advanced features (such as combining FLAC audio with video), it needs to be encapsulated by a more capable container. This presents a problem: because of these container features, the FLAC format mixes data that belongs to the encoded data (like block size and sample rate) with data that belongs to the container (like checksum and timecode). The choice was made to encapsulate FLAC frames as they are, which means some data will be duplicated and could potentially deviate between the FLAC frames and the encapsulating container.</t> <t>As FLAC frames are completely independent of each other, container format features handling dependencies do not need to be used. 
For example, all FLAC frames embedded in Matroska are marked as keyframes when they are stored in a SimpleBlock, and tracks in an MP4 file containing only FLAC frames do not need a sync sample box.</t> <section anchor="ogg-mapping"><name>Ogg Mapping</name> <t>The Ogg container format is defined in <xref target="RFC3533"></xref>. The first packet of a logical bitstream carrying FLAC data is structured according to the following table.</t> <table> <thead> <tr> <th align="left">Data</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left">5 bytes</td> <td align="left">Bytes <tt>0x7F 0x46 0x4C 0x41 0x43</tt> (as also defined by <xref target="RFC5334"></xref>).</td> </tr> <tr> <td align="left">2 bytes</td> <td align="left">Version number of the FLAC-in-Ogg mapping. These bytes are <tt>0x01 0x00</tt>, meaning version 1.0 of the mapping.</td> </tr> <tr> <td align="left">2 bytes</td> <td align="left">Number of header packets (excluding the first header packet) as an unsigned number coded big-endian.</td> </tr> <tr> <td align="left">4 bytes</td> <td align="left">The <tt>fLaC</tt> signature.</td> </tr> <tr> <td align="left">4 bytes</td> <td align="left">A metadata block header for the streaminfo metadata block.</td> </tr> <tr> <td align="left">34 bytes</td> <td align="left">A streaminfo metadata block.</td> </tr> </tbody> </table> <t>The number of header packets <bcp14>MAY</bcp14> be 0, which means the number of packets that follow is unknown. This first packet <bcp14>MUST NOT</bcp14> share an Ogg page with any other packets. This means the first page of a logical stream of FLAC-in-Ogg is always 79 bytes.</t> <t>Following the first packet are one or more header packets, each of which contains a single metadata block. The first of these packets <bcp14>SHOULD</bcp14> be a Vorbis comment metadata block for historic reasons. 
This is contrary to unencapsulated FLAC streams, where the order of metadata blocks is not important except for the streaminfo metadata block and where a Vorbis comment metadata block is optional.</t> <t>Following the header packets are audio packets. Each audio packet contains a single FLAC frame. The first audio packet <bcp14>MUST</bcp14> start on a new Ogg page, i.e., the last metadata block <bcp14>MUST</bcp14> finish its page before any audio packets are encapsulated.</t> <t>The granule position of all pages containing header packets <bcp14>MUST</bcp14> be 0. For pages containing audio packets, the granule position is the number of the last sample contained in the last completed packet in the frame. The sample numbering considers interchannel samples. If a page contains no packet end (e.g., when it only contains the start of a large packet that continues on the next page), then the granule position is set to the maximum value possible, i.e., <tt>0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF</tt>.</t> <t>The granule position of the first audio data page with a completed packet <bcp14>MAY</bcp14> be larger than the number of samples contained in packets that complete on that page. In other words, the apparent sample number of the first sample in the stream following from the granule position and the audio data <bcp14>MAY</bcp14> be larger than 0. This allows, for example, a server to cast a live stream to several clients that joined at different moments without rewriting the granule position for each client.</t> <t>If an audio stream is encoded where audio properties (sample rate, number of channels, or bit depth) change at some point in the stream, this should be dealt with by finishing encoding of the current Ogg stream and starting a new Ogg stream, concatenated to the previous one. This is called chaining in Ogg. 
See the Ogg specification <xref target="RFC3533"></xref> for details.</t> </section> <section anchor="matroska-mapping"><name>Matroska Mapping</name> <t>The Matroska container format is defined in <xref target="RFC9559"></xref>. The codec ID (EBML path <tt>\Segment\Tracks\TrackEntry\CodecID</tt>) assigned to signal tracks carrying FLAC data is <tt>A_FLAC</tt> in ASCII. All FLAC data before the first audio frame (i.e., the <tt>fLaC</tt> ASCII signature and all metadata blocks) is stored as CodecPrivate data (EBML path <tt>\Segment\Tracks\TrackEntry\CodecPrivate</tt>).</t> <t>Each FLAC frame (including all of its subframes) is treated as a single frame in the context of Matroska.</t> <t>If an audio stream is encoded where audio properties (sample rate, number of channels, or bit depth) change at some point in the stream, this should be dealt with by finishing the current Matroska segment and starting a new one with the new properties.</t> </section> <section anchor="iso-base-media-file-format-mp4-mapping"><name>ISO Base Media File Format (MP4) Mapping</name> <t>The full encapsulation definition of FLAC audio in MP4 files was deemed too extensive to include in this document. A definition document can be found at <xref target="FLAC-in-MP4-specification"></xref>.</t> </section> </section> <section anchor="implementation-status"><name>Implementation Status</name> <t>Note to RFC Editor - please remove this entire section before publication, as well as the reference to RFC 7942.</t> <t>This section records the status of known implementations of the FLAC format, and is based on a proposal described in <xref target="RFC7942"></xref>. Please note that the listing of any individual implementation here does not imply endorsement by the IETF. Furthermore, no effort has been spent to verify the information presented here that was supplied by IETF contributors. 
This is not intended as, and must not be construed to be, a catalog of available implementations or their features. Readers are advised to note that other implementations may exist.</t> <t>A reference encoder and decoder implementation of the FLAC format exists, known as libFLAC, maintained by Xiph.Org. It can be found at <eref target="https://xiph.org/flac/">https://xiph.org/flac/</eref>. Note that while all libFLAC components are licensed under 3-clause BSD, the flac and metaflac command line tools often supplied together with libFLAC are licensed under GPL.</t> <t>Another completely independent implementation of both encoder and decoder of the FLAC format is available in libavcodec, maintained by FFmpeg, licensed under LGPL 2.1 or later. It can be found at <eref target="https://ffmpeg.org/">https://ffmpeg.org/</eref>.</t> <t>A list of other implementations and an overview of which parts of the format they implement can be found at <xref target="FLAC-wiki-implementations"></xref>.</t> </section> <section anchor="security-considerations"><name>Security Considerations</name> <t>Like any other codec (such as <xref target="RFC6716"></xref>), FLAC should not be used with insecure ciphers or cipher modes that are vulnerable to known plaintext attacks. Some of the header bits, as well as the padding, are easily predictable.</t> <t>Implementations of the FLAC codec need to take appropriate security considerations into account. <xref target="RFC4732" sectionFormat="of" section="2.1"/> provides general information on denial-of-service (DoS) attacks on end systems and describes some mitigation strategies. Areas of concern specific to FLAC follow.</t> <t>It is extremely important for the decoder to be robust against malformed payloads. Payloads that do not conform to this specification <bcp14>MUST NOT</bcp14> cause the decoder to overrun its allocated memory or take an excessive amount of resources to decode. 
An overrun in allocated memory could lead to arbitrary code execution by an attacker. The same applies to the encoder, even though problems with encoders are typically rarer. Malformed audio streams <bcp14>MUST NOT</bcp14> cause the encoder to misbehave because this would allow an attacker to attack transcoding gateways.</t> <t>As with all compression algorithms, both encoding and decoding can produce an output much larger than the input. For decoding, the most extreme possible case of this is a frame with eight constant subframes of block size 65535 and coding for 32-bit PCM. This frame is only 49 bytes in size but codes for more than 2 megabytes of uncompressed PCM data. For encoding, it is possible to have an even larger size increase, although such behavior is generally considered faulty. This happens if the encoder chooses a Rice parameter that does not fit with the residual that has to be encoded. In such a case, very long unary-coded symbols can appear (in the most extreme case, more than 4 gigabytes per sample). Decoder and encoder implementors are advised to take precautions to prevent excessive resource utilization in such cases.</t> <t>Where metadata is handled, implementors are advised to either thoroughly test the handling of extreme cases or impose reasonable limits beyond the limits of this specification. For example, a single Vorbis comment metadata block can contain millions of valid fields. It is unlikely such a limit is ever reached except in a potentially malicious file. Likewise, the media type and description of a picture metadata block can be millions of characters long, despite there being no reasonable use of such contents. One possible use case for very long character strings is in lyrics, which can be stored in Vorbis comment metadata block fields.</t> <t>Various kinds of metadata blocks contain length fields or field counts. 
While reading a block following these lengths or counts, a decoder <bcp14>MUST</bcp14> make sure higher-level lengths or counts (most importantly, the length field of the metadata block itself) are not exceeded. As some of these length fields code string lengths, for which memory must be allocated, parsers <bcp14>MUST</bcp14> first verify that a block is valid before allocating memory based on its contents, except when explicitly instructed to salvage data from a malformed file.</t> <t>Metadata blocks can also contain references, e.g., the picture metadata block can contain a URI. When following a URI, the security considerations of <xref target="RFC3986"/> apply. Applications <bcp14>MUST</bcp14> obtain explicit user approval to retrieve resources via remote protocols. Following external URIs introduces a tracking risk from on-path observers and the operator of the service hosting the URI. Likewise, the choice of scheme, if it isn't protected like https, could also introduce integrity attacks by an on-path observer. A malicious operator of the service hosting the URI can return arbitrary content that the parser will read. Also, such retrievals can be used in a DDoS attack when the URI points to a potential victim. Therefore, applications need to ask user approval for each retrieval individually, take extra precautions when parsing retrieved data, and cache retrieved resources. Applications <bcp14>MUST</bcp14> obtain explicit user approval to retrieve local resources not located in the same directory as the FLAC file being processed. Since relative URIs are permitted, applications <bcp14>MUST</bcp14> guard against directory traversal attacks and guard against a violation of a same-origin policy if such a policy is being enforced.</t> <t>Seeking in a FLAC stream that is not in a container relies on the coded number in frame headers and optionally a seek table metadata block. 
Parsers <bcp14>MUST</bcp14> employ thorough checks on whether a found coded number or seek point is at all possible, e.g., whether it is within bounds and not directly contradicting any other coded number or seek point that the seeking process relies on. Without these checks, seeking might get stuck in an infinite loop when numbers in frames are non-consecutive or otherwise not valid, which could be used in DoS attacks.</t> <t>Implementors are advised to employ fuzz testing combined with different sanitizers on FLAC decoders to find security problems. Ignoring the results of CRC checks improves the efficiency of decoder fuzz testing.</t> <t>See <xref target="FLAC-decoder-testbench"></xref> for a non-exhaustive list of FLAC files with extreme configurations that lead to crashes or reboots on some known implementations. Besides providing a starting point for security testing, this set of files can also be used to test conformance with this specification.</t> <t>FLAC files may contain executable code, although the FLAC format is not designed for it and it is uncommon. One use case where FLAC is occasionally used to store executable code is when compressing images of mixed-mode CDs, which contain both audio and non-audio data, the non-audio portion of which can contain executable code. In that case, the executable code is stored as if it were audio and is potentially obscured. Of course, it is also possible to store executable code as metadata, for example, as a Vorbis comment with help of a binary-to-text encoding or directly in an application metadata block. 
Applications <bcp14>MUST NOT</bcp14> execute code contained in FLAC files or present parts of FLAC files as executable code to the user, except when an application has that explicit purpose, e.g., applications reading FLAC files as disc images and presenting them as a virtual disc drive.</t> </section> <section anchor="iana-considerations"><name>IANA Considerations</name> <t>Per this document, IANA has registered one new media type ("audio/flac") and created a new IANA registry, as described in the subsections below.</t> <section anchor="media-type-registration"><name>Media Type Registration</name> <t>IANA has registered the "audio/flac" media type as follows. This media type is applicable for FLAC audio that is not packaged in a container as described in <xref target="container-mappings"></xref>. 
FLAC audio packaged in such a container will take on the media type of that container, for example, "audio/ogg" when packaged in an Ogg container or "video/mp4" when packaged in an MP4 container alongside a video track.</t>
<dl>
<dt>Type name:</dt><dd>audio</dd>
<dt>Subtype name:</dt><dd>flac</dd>
<dt>Required parameters:</dt><dd>N/A</dd>
<dt>Optional parameters:</dt><dd>N/A</dd>
<dt>Encoding considerations:</dt><dd>as per RFC 9639</dd>
<dt>Security considerations:</dt><dd>See the security considerations in <xref target="security-considerations"></xref> of RFC 9639.</dd>
<dt>Interoperability considerations:</dt><dd>See the descriptions of past format changes in <xref target="past-format-changes"/> of RFC 9639.</dd>
<dt>Published specification:</dt><dd>RFC 9639</dd>
<dt>Applications that use this media type:</dt><dd>FFmpeg, Apache, Firefox</dd>
<dt>Fragment identifier considerations:</dt><dd>N/A</dd>
<dt>Additional information:</dt><dd> <t><br/></t>
<dl spacing="compact">
<dt>Deprecated alias names for this type:</dt><dd>audio/x-flac</dd>
<dt>Magic number(s):</dt><dd>fLaC</dd>
<dt>File extension(s):</dt><dd>flac</dd>
<dt>Macintosh file type code(s):</dt><dd>N/A</dd>
<dt>Uniform Type Identifier:</dt><dd>org.xiph.flac conforms to public.audio</dd>
<dt>Windows Clipboard Format Name:</dt><dd>audio/flac</dd>
</dl> </dd>
<dt>Person &amp; email address to contact for further information:</dt><dd>IETF CELLAR Working Group (cellar@ietf.org)</dd>
<dt>Intended usage:</dt><dd>COMMON</dd>
<dt>Restrictions on usage:</dt><dd>N/A</dd>
<dt>Author:</dt><dd>IETF CELLAR Working Group</dd>
<dt>Change controller:</dt><dd>Internet Engineering Task Force (iesg@ietf.org)</dd>
</dl> </section> <section anchor="application-id-registry"><name>FLAC Application Metadata Block IDs Registry</name> <t>IANA has created a new registry called the "FLAC Application Metadata Block IDs" registry. The values correspond to the 32-bit identifier described in <xref target="application"></xref>.</t> <t>To register a new application ID in this registry, one needs an application ID, a description, an optional reference to a document describing the application ID, and a Change Controller (IETF or email of registrant). The application IDs are allocated according to the "First Come First Served" policy <xref target="RFC8126"/> so that there is no impediment to registering any application IDs the FLAC community encounters, especially if they were used in audio files but were not registered when the audio files were encoded. An application ID can be any 32-bit value but is often composed of 4 ASCII characters that are human-readable.</t> <t>The initial contents of the "FLAC Application Metadata Block IDs" registry are shown in the table below. 
These initialvalues,values were taken from the registration page at xiph.org (see <xref target="ID-registration-page"></xref>), which is no longer being maintained as itishas been replaced by this registry.</t> <table> <thead> <tr> <th align="left">Application ID</th> <th align="left">ASCIIrendition (if available)</th>Rendition (If Available)</th> <th align="left">Description</th> <thalign="left">Specification</th>align="left">Reference</th> <th align="left">Changecontroller</th>Controller</th> </tr> </thead> <tbody> <tr> <td align="left">0x41544348</td> <td align="left">ATCH</td> <td align="left">FlacFile</td> <td align="left"><xreftarget="FlacFile"></xref></td>target="FlacFile"></xref>, RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x42534F4C</td> <td align="left">BSOL</td> <td align="left">beSolo</td> <tdalign="left"></td>align="left">RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x42554753</td> <td align="left">BUGS</td> <td align="left">Bugs Player</td> <tdalign="left"></td>align="left">RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x43756573</td> <td align="left">Cues</td> <td align="left">GoldWave cue points</td> <tdalign="left"></td>align="left">RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x46696361</td> <td align="left">Fica</td> <td align="left">CUE Splitter</td> <tdalign="left"></td>align="left">RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x46746F6C</td> <td align="left">Ftol</td> <td align="left">flac-tools</td> <tdalign="left"></td>align="left">RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x4D4F5442</td> <td align="left">MOTB</td> <td align="left">MOTB MetaCzar</td> <tdalign="left"></td>align="left">RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x4D505345</td> <td align="left">MPSE</td> <td align="left">MP3 Stream Editor</td> <tdalign="left"></td>align="left">RFC 9639</td> <td 
align="left">IETF</td> </tr> <tr> <td align="left">0x4D754D4C</td> <td align="left">MuML</td> <td align="left">MusicML: Music Metadata Language</td> <tdalign="left"></td>align="left">RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x52494646</td> <td align="left">RIFF</td> <td align="left">Sound Devices RIFF chunk storage</td> <tdalign="left"></td>align="left">RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x5346464C</td> <td align="left">SFFL</td> <td align="left">Sound Font FLAC</td> <tdalign="left"></td>align="left">RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x534F4E59</td> <td align="left">SONY</td> <td align="left">Sony Creative Software</td> <tdalign="left"></td>align="left">RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x5351455A</td> <td align="left">SQEZ</td> <td align="left">flacsqueeze</td> <tdalign="left"></td>align="left">RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x54745776</td> <td align="left">TtWv</td> <td align="left">TwistedWave</td> <tdalign="left"></td>align="left">RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x55495453</td> <td align="left">UITS</td> <td align="left">UITS Embedding tools</td> <tdalign="left"></td>align="left">RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x61696666</td> <td align="left">aiff</td> <td align="left">FLAC AIFF chunk storage</td> <td align="left"><xreftarget="Foreign-metadata"></xref></td>target="Foreign-metadata"></xref>, RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x696D6167</td> <td align="left">imag</td> <td align="left">flac-image</td> <tdalign="left"></td>align="left">RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x7065656D</td> <td align="left">peem</td> <td align="left">Parseable Embedded Extensible Metadata</td> <tdalign="left"></td>align="left">RFC 9639</td> <td align="left">IETF</td> </tr> <tr> 
<td align="left">0x71667374</td> <td align="left">qfst</td> <td align="left">QFLAC Studio</td> <td align="left">RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x72696666</td> <td align="left">riff</td> <td align="left">FLAC RIFF chunk storage</td> <td align="left"><xref target="Foreign-metadata"></xref>, RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x74756E65</td> <td align="left">tune</td> <td align="left">TagTuner</td> <td align="left">RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x77363420</td> <td align="left">w64 </td> <td align="left">FLAC Wave64 chunk storage</td> <td align="left"><xref target="Foreign-metadata"></xref>, RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x78626174</td> <td align="left">xbat</td> <td align="left">XBAT</td> <td align="left">RFC 9639</td> <td align="left">IETF</td> </tr> <tr> <td align="left">0x786D6364</td> <td align="left">xmcd</td> <td align="left">xmcd</td> <td align="left">RFC 9639</td> <td align="left">IETF</td> </tr> </tbody> </table></section> </section><section anchor="acknowledgments"><name>Acknowledgments</name> <t>FLAC owes much to the many people who have advanced the audio compression field so freely. For instance:</t> <ul spacing="compact"> <li>A. J. Robinson for his work on Shorten; his paper (see <xref target="robinson-tr156"></xref>) is a good starting point on some of the basic methods used by FLAC. FLAC trivially extends and improves the fixed predictors, LPC coefficient quantization, and Rice coding used in Shorten.</li> <li>S. W. Golomb and Robert F. Rice; their universal codes are used by FLAC's entropy coder; see <xref target="Rice"></xref>.</li> <li>N. Levinson and J. 
Durbin; the FLAC reference encoder (see <xref target="implementation-status"></xref>) uses an algorithm developed and refined by them for determining the LPC coefficients from the autocorrelation coefficients; see <xref target="Durbin"></xref>.</li> <li>And of course, Claude Shannon; see <xref target="Shannon"></xref>.</li> </ul> <t>The FLAC format, the FLAC reference implementation, and this document were originally developed by Josh Coalson. While many others have contributed since, this original effort is deeply appreciated.</t> </section></middle> <back> <references> <name>References</name> <references> <name>Normative References</name> <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-cellar-matroska.xml"/> <reference anchor="ISRC-handbook" target="https://www.ifpi.org/isrc_handbook/"> <front> <title>International Standard Recording Code (ISRC) Handbook</title> <author> <organization>International ISRC Registration Authority</organization> </author> <date year="2021"/> </front> <refcontent>4th edition</refcontent> </reference> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.1321.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2046.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2083.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3533.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3629.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3986.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9559.xml"/> </references> <references> 
<name>Informative References</name> <reference anchor="Durbin" target="https://www.jstor.org/stable/1401322"> <front> <title>The Fitting of Time-Series Models</title> <author fullname="James Durbin" initials="J" surname="Durbin"> <organization>University of London</organization> </author> <date year="1960"/> </front> <seriesInfo name="DOI" value="10.2307/1401322"></seriesInfo> <refcontent>Revue de l'Institut International de Statistique / Review of the International Statistical Institute, vol. 28, no. 3, pp. 233–44</refcontent> </reference> <reference anchor="FIR" target="https://en.wikipedia.org/w/index.php?title=Finite_impulse_response&amp;oldid=1240945295"> <front> <title>Finite impulse response</title> <author> <organization>Wikipedia</organization> </author> <date month="August" year="2024"/> </front> </reference> <reference anchor="FLAC-decoder-testbench" target="https://github.com/ietf-wg-cellar/flac-test-files"> <front> <title>The Free Lossless Audio Codec (FLAC) test files</title> <author></author> <date year="2023" month="08"></date> </front> <refcontent>commit aa7b0c6</refcontent> </reference> <reference anchor="FLAC-in-MP4-specification" target="https://github.com/xiph/flac/blob/master/doc/isoflac.txt"> <front> <title>Encapsulation of FLAC in ISO Base Media File Format</title> <author></author> <date year="2022" month="07"/> </front> <refcontent>commit 78d85dd</refcontent> </reference> <reference anchor="FLAC-specification-github" target="https://github.com/ietf-wg-cellar/flac-specification"> <front> <title>The Free Lossless Audio Codec (FLAC) 
Specification</title> <author></author> <date></date> </front> </reference> <reference anchor="FLAC-wiki-implementations" target="https://github.com/ietf-wg-cellar/flac-specification/wiki/Implementations"> <front> <title>FLAC specification wiki: Implementations</title> <author></author> </front> </reference> <reference anchor="FLAC-wiki-interoperability" target="https://github.com/ietf-wg-cellar/flac-specification/wiki/Interoperability-considerations"> <front> <title>Interoperability considerations</title> <author></author> </front> <refcontent>commit 58a06d6</refcontent> </reference> <reference anchor="FlacFile" target="https://web.archive.org/web/20071023070305/http://firestuff.org:80/flacfile/"> <front> <title>FlacFile</title> <author></author> <date year="2007" month="10"></date> </front> <refcontent>Wayback Machine archive</refcontent> </reference> <reference anchor="FLAC-implementation" target="https://xiph.org/flac/"> <front> <title>FLAC</title> <author></author> <date></date> </front> </reference> <reference anchor="Foreign-metadata" target="https://github.com/xiph/flac/blob/master/doc/foreign_metadata_storage.md"> <front> <title>Specification of foreign metadata storage in FLAC</title> <author></author> <date year="2023" month="11"></date> </front> <refcontent>commit 72787c3</refcontent> </reference> <reference anchor="HPL-1999-144" target="https://ieeexplore.ieee.org/document/939834"> <front> <title>Lossless compression of digital audio</title> <author fullname="Mat Hans" initials="M" surname="Hans"> <organization>Client and Media Systems Laboratory, HP Laboratories Palo Alto</organization> </author> <author fullname="Ronald W. Schafer" initials="R. 
W" surname="Schafer"> <organization>Center for Signal &amp; Image Processing at the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia</organization> </author> <date year="2001" month="July"></date> </front> <seriesInfo name="DOI" value="10.1109/79.939834"></seriesInfo> <refcontent>IEEE Signal Processing Magazine, vol. 18, no. 4, pp. 21-32</refcontent> </reference> <reference anchor="ID-registration-page" target="https://xiph.org/flac/id.html"> <front> <title>ID registry</title> <author> <organization>Xiph.Org</organization> </author> </front> </reference> <reference anchor="ID3v2" target="https://web.archive.org/web/20220903174949/https://id3.org/id3v2.4.0-frames"> <front> <title>ID3 tag version 2.4.0 - Native Frames</title> <author fullname="Martin Nilsson" initials="M" surname="Nilsson"></author> <date year="2000" month="11"></date> </front> <refcontent>Wayback Machine archive</refcontent> </reference> <reference anchor="IEC.60908.1999" target="https://webstore.iec.ch/publication/3885"> <front> <title>Audio recording - Compact disc digital audio system</title> <author> <organization>International Electrotechnical Commission</organization> </author> <date year="1999"></date> </front> <seriesInfo name="IEC" value="60908:1999-02"></seriesInfo> </reference> <reference anchor="LinearPrediction" target="https://en.wikipedia.org/w/index.php?title=Linear_prediction&amp;oldid=1169015573"> <front> <title>Linear prediction</title> <author> <organization>Wikipedia</organization> </author> <date month="August" year="2023" /> </front> </reference> <reference anchor="MLP" target="https://www.aes.org/e-lib/online/browse.cfm?elib=8082"> 
<front> <title>The MLP Lossless Compression System</title> <author fullname="Michael A. Gerzon" initials="M. A" surname="Gerzon"></author> <author fullname="Peter G. Craven" initials="P. G" surname="Craven"> <organization>Algol Applications Ltd, Hove, England</organization> </author> <author fullname="J. Robert Stuart" initials="J. R" surname="Stuart"> <organization>Meridian Audio Ltd, Huntingdon, England</organization> </author> <author fullname="Malcolm J. Law" initials="M. J" surname="Law"> <organization>Algol Applications Ltd, Hove, England</organization> </author> <author fullname="Rhonda J. Wilson" initials="R. J" surname="Wilson"> <organization>Meridian Audio Ltd, Huntingdon, England</organization> </author> <date year="1999" month="09"></date> </front> <refcontent>Audio Engineering Society Conference: 17th International Conference: High-Quality Audio Coding</refcontent> </reference> <reference anchor="MusicBrainz" target="https://picard-docs.musicbrainz.org/en/variables/variables.html"> <front> <title>Tags &amp; Variables</title> <author> <organization>MusicBrainz</organization> </author> </front> <refcontent>MusicBrainz Picard v2.10 documentation</refcontent> </reference> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4732.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5334.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6716.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7942.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8126.xml"/> <reference anchor="Rice" target="https://ieeexplore.ieee.org/document/1090789"> <front> <title>Adaptive Variable-Length Coding for Efficient Compression of Spacecraft Television Data</title> <author fullname="Robert Rice" initials="R. 
F" surname="Rice"> <organization>Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA</organization> </author> <author initials="J. R" surname="Plaunt"> <organization>Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA</organization> </author> <date year="1971" month="12"></date> </front> <seriesInfo name="DOI" value="10.1109/TCOM.1971.1090789"></seriesInfo> <refcontent>IEEE Transactions on Communication Technology, vol. 19, no. 6, pp. 889-897</refcontent> </reference> <reference anchor="Shannon" target="https://ieeexplore.ieee.org/document/1697831"> <front> <title>Communication in the Presence of Noise</title> <author fullname="Claude Shannon" initials="C. E" surname="Shannon"> <organization>Bell Telephone Laboratories, Inc., Murray Hill, NJ, USA</organization> </author> <date year="1949" month="01"></date> </front> <seriesInfo name="DOI" value="10.1109/JRPROC.1949.232969"></seriesInfo> <refcontent>Proceedings of the IRE, vol. 37, no. 1, pp. 
10-21</refcontent> </reference> <reference anchor="VarLengthCode" target="https://en.wikipedia.org/w/index.php?title=Variable-length_code&amp;oldid=1220260423"> <front> <title>Variable-length code</title> <author> <organization>Wikipedia</organization> </author> <date month="April" year="2024" /> </front> </reference> <reference anchor="Vorbis" target="https://xiph.org/vorbis/doc/v-comment.html"> <front> <title>Ogg Vorbis I format specification: comment field and header specification</title> <author> <organization>Xiph.Org</organization> </author> <date></date> </front> </reference> <reference anchor="lossyWAV" target="https://wiki.hydrogenaud.io/index.php?title=LossyWAV&amp;oldid=32877"> <front> <title>lossyWAV</title> <author> <organization>Hydrogenaudio Knowledgebase</organization> </author> <date month="July" year="2021" /> </front> </reference> <reference anchor="robinson-tr156" target="https://mi.eng.cam.ac.uk/reports/svr-ftp/auto-pdf/robinson_tr156.pdf"> <front> <title>SHORTEN: Simple lossless and near-lossless waveform compression</title> <author fullname="Tony Robinson" initials="T" surname="Robinson"> <organization>Cambridge University Engineering Department</organization> </author> <date year="1994" month="12"></date> </front> <refcontent>Cambridge University Engineering Department Technical Report CUED/F-INFENG/TR.156</refcontent> </reference> </references> </references> <section anchor="numerical-considerations"><name>Numerical Considerations</name> <t>In order to maintain lossless behavior, all arithmetic used in encoding and decoding sample values must be done with integer data types to eliminate the 
possibility of introducing rounding errors associated with floating-point arithmetic. Use of floating-point representations in analysis (e.g., finding a good predictor or Rice parameter) is not a concern as long as the process of using the found predictor and Rice parameter to encode audio samples is implemented with only integer math.</t> <t>Furthermore, the possibility of integer overflow can be eliminated by using data types that are large enough. Choosing a 64-bit signed data type for all arithmetic involving sample values would make sure the possibility of overflow is eliminated, but usually, smaller data types are chosen for increased performance, especially in embedded devices. This appendix provides guidelines for choosing the appropriate data type for each step of encoding and decoding FLAC files.</t> <t>In this appendix, signed data types are signed two's complement.</t> <section anchor="determining-the-necessary-data-type-size"><name>Determining the Necessary Data Type Size</name> <t>To find the smallest data type size that is guaranteed not to overflow for a certain sequence of arithmetic operations, the combination of values producing the largest possible result should be considered.</t> <t>For example, if two 16-bit signed integers are added, the largest possible result forms if both values are the largest number that can be represented with a 16-bit signed integer. To store the result, a signed integer data type with at least 17 bits is needed. Similarly, when adding 4 of these values, 18 bits are needed; when adding 8, 19 bits are needed, etc. In general, the number of bits necessary when adding numbers together is increased by the log base 2 of the number of values rounded up to the nearest integer. 
So, when adding 18 unknown values stored in 8-bit signed integers, we need a signed integer data type of at least 13 bits to store the result, as the log base 2 of 18 rounded up is 5.</t> <t>When multiplying two numbers, the number of bits needed for the result is the size of the first number plus the size of the second number. For example, if a 16-bit signed integer is multiplied by another 16-bit signed integer, the result needs at least 32 bits to be stored without overflowing. To show this in practice, the signed value with the largest magnitude that can be stored in 4 bits is -8. (-8)*(-8) is 64, which needs at least 8 bits (signed) to store.</t> </section> <section anchor="stereo-decorrelation"><name>Stereo Decorrelation</name> <t>When stereo decorrelation is used, the side channel will have one extra bit of bit depth; see <xref target="interchannel-decorrelation"></xref>.</t> <t>This means that while 16-bit signed integers have sufficient range to store samples from a fully decoded FLAC frame with a bit depth of 16 bits, the decoding of a side subframe in such a file will need a data type with at least 17 bits to store decoded subframe samples before undoing stereo decorrelation.</t> <t>Most FLAC decoders store decoded (subframe) samples as 32-bit values, which is sufficient for files with bit depths up to (and including) 31 bits.</t> </section> <section anchor="prediction-1"><name>Prediction</name> <t>A prediction (which is used to calculate the residual on encoding or added to the residual to calculate the sample value on decoding) is formed by multiplying and summing preceding sample values. 
In order to eliminate the possibility of integer overflow, the combination of preceding sample values and predictor coefficients producing the largest possible value should be considered.</t> <t>To determine the size of the data type needed to calculate either a residual sample (on encoding) or an audio sample value (on decoding) in a fixed predictor subframe, the maximum possible value for these is calculated as described in <xref target="determining-the-necessary-data-type-size"></xref> and in the following table. For example, if a frame codes for 16-bit audio and has some form of stereo decorrelation, the subframe coding for the side channel would need 16+1+3 = 20 bits if a third-order fixed predictor is used.</t> <table> <thead> <tr> <th align="left">Order</th> <th align="left">Calculation of Residual</th> <th align="left">Sample Values Summed</th> <th align="left">Extra Bits</th> </tr> </thead> <tbody> <tr> <td align="left">0</td> <td align="left">a(n)</td> <td align="left">1</td> <td align="left">0</td> </tr> <tr> <td align="left">1</td> <td align="left">a(n) - a(n-1)</td> <td align="left">2</td> <td align="left">1</td> </tr> <tr> <td align="left">2</td> <td align="left">a(n) - 2 * a(n-1) + a(n-2)</td> <td align="left">4</td> <td align="left">2</td> </tr> <tr> <td align="left">3</td> <td align="left">a(n) - 3 * a(n-1) + 3 * a(n-2) - a(n-3)</td> <td align="left">8</td> <td align="left">3</td> </tr> <tr> <td align="left">4</td> <td align="left">a(n) - 4 * a(n-1) + 6 * a(n-2) - 4 * a(n-3) + a(n-4)</td> <td align="left">16</td> <td align="left">4</td> </tr> </tbody> </table><t>Where:</t> <ul> <li>n is the number of the sample being predicted.</li> <li>a(n) is the sample being predicted.</li> <li>a(n-1) is the sample before the one being predicted, a(n-2) is the sample before that, etc.</li> </ul> <t>For subframes with a linear predictor, the calculation is a little 
more complicated. Each prediction is the sum of several multiplications. Each of these multiplies a sample value by a predictor coefficient. The extra bits needed can be calculated by adding the predictor coefficient precision (in bits) to the bit depth of the audio samples. To account for the summing of these multiplications, the log base 2 of the predictor order rounded up is added.</t> <t>For example, if the sample bit depth of the source is 24, the current subframe encodes a side channel (see <xref target="interchannel-decorrelation"></xref>), the predictor order is 12, and the predictor coefficient precision is 15 bits, the minimum required size of the used signed integer data type is at least (24 + 1) + 15 + ceil(log2(12)) = 44 bits. As another example, for a side-channel subframe of audio with a sample bit depth of 16, a predictor order of 8, and a predictor coefficient precision of 12 bits, the minimum required size of the used signed integer data type is (16 + 1) + 12 + ceil(log2(8)) = 32 bits.</t> </section> <section anchor="residual"><name>Residual</name> <t>As stated in <xref target="coded-residual"></xref>, an encoder must make sure residual samples are representable by a 32-bit integer, signed two's complement, excluding the most negative value. As in the previous section, it is possible to calculate when residual samples already implicitly fit and when an additional check is needed. This implicit fit is achieved when residuals would fit a theoretical 31-bit signed integer, as that satisfies both of the mentioned criteria. When this implicit fit is not achieved, all residual values must be calculated and checked individually.</t> <t>For the residual of a fixed predictor, the maximum residual sample size was already calculated in the previous section. However, for a linear predictor, the prediction is shifted right by a certain amount. 
The number of bits needed for the residual is the number of bits calculated in the previous section, reduced by the prediction right shift, and increased by one bit to account for the subtraction of the prediction from the current sample on encoding.</t> <t>Taking the last example of the previous section, where 32 bits were needed for the prediction, the required data type size for the residual samples in case of a right shift of 10 bits would be 32 - 10 + 1 = 23 bits, which means it is not necessary to perform the aforementioned check.</t> <t>As another example, when encoding 32-bit PCM with fixed predictors, all predictor orders must be checked. While the zero-order fixed predictor is guaranteed to have residual samples that fit a 32-bit signed integer, it might produce a residual sample value that is the most negative representable value of that 32-bit signed integer.</t> <t>Note that on decoding, while the residual sample values are limited to the aforementioned range, the predictions are not. 
This means that while the decoding of the residual samples can happen fully in 32-bit signed integers, decoders must be sure to execute the addition of each residual sample to its accompanying prediction with a signed integer data type that is wide enough, as with encoding.</t> </section> <section anchor="rice-coding"><name>Rice Coding</name> <t>When folding (i.e., zigzag encoding) the residual sample values, no extra bits are needed when the absolute value of each residual sample is first stored in an unsigned data type of the size of the last step, then doubled, and then has one subtracted depending on whether the residual sample was positive or negative. However, many implementations choose to require one extra bit of data type size so zigzag encoding can happen in one step without a cast instead of the procedure described in the previous sentence.</t> </section> </section> <section anchor="past-format-changes"><name>Past Format Changes</name> <t>This informational appendix documents the changes made to the FLAC format over the years. This information might be of use when encountering FLAC files that were made with software following the format as it was before the changes documented in this appendix.</t> <t>The FLAC format was first specified in December 2000, and the bitstream format was considered frozen with the release of FLAC 1.0 (the reference encoder/decoder) in July 2001. Only changes made since this first stable release are considered in this appendix. Changes made to the FLAC streamable subset definition (see <xref target="streamable-subset"></xref>) are not considered.</t> <section anchor="addition-of-blocking-strategy-bit"><name>Addition of Blocking Strategy Bit</name> <t>Perhaps the largest backwards-incompatible change to the specification was published in July 2007. 
Before this change, variable block size streams were not explicitly marked as such by a flag bit in the frame header. A decoder had two ways to detect a variable block size stream: by comparing the minimum and maximum block sizes in the streaminfo metadata block (which are equal for a fixed block size stream) or by detecting a change of block size during a stream if a decoder did not receive a streaminfo metadata block, which could not happen at all in theory. As the meaning of the coded number in the frame header depends on whether or not a stream has a variable block size, this presented a problem: the meaning of the coded number could not be reliably determined. To fix this problem, one of the reserved bits was changed to be used as a blocking strategy bit. See also <xref target="frame-header"></xref>.</t> <t>Along with the addition of a new flag, the meaning of the block size bits (see <xref target="block-size-bits"></xref>) was subtly changed. Initially, block size bit patterns 0b0001-0b0101 and 0b1000-0b1111 could only be used for fixed block size streams, while 0b0110 and 0b0111 could be used for both fixed block size and variable block size streams. With this change, these restrictions were lifted, and patterns 0b0001-0b1111 are now used for both variable block size and fixed block size streams.</t> </section> <section anchor="restriction-of-encoded-residual-samples"><name>Restriction of Encoded Residual Samples</name> <t>Another change to the specification was deemed necessary during standardization by the CELLAR Working Group of the IETF. As specified in <xref target="coded-residual"></xref>, a limit is imposed on residual samples. This limit was not specified prior to the IETF standardization effort. 
However, as far as was known to the working group, no FLAC encoder at that time produced FLAC files containing residual samples exceeding this limit. This is mostly because it is very unlikely to encounter residual samples exceeding this limit when encoding 24-bit PCM, and encoding of PCM with higher bit depths was not yet implemented in any known encoder. In fact, these FLAC encoders would produce corrupt files upon being triggered to produce such residual samples, and it is unlikely any non-experimental encoder would ever do so, even when presented with crafted material. Therefore, it was not expected that existing implementations would be rendered non-compliant by this change.</t> </section> <section anchor="addition-of-5-bit-rice-parameters"><name>Addition of 5-Bit Rice Parameters</name> <t>One significant addition to the format was the residual coding method using 5-bit Rice parameters. Prior to publication of this addition in July 2007, a partitioned Rice code with 4-bit Rice parameters was the only residual coding method specified. The range offered by this coding method proved too small when encoding 24-bit PCM; therefore, a second residual coding method was specified that was identical to the first, but with 5-bit Rice parameters.</t> </section> <section anchor="restriction-of-lpc-shift-to-non-negative-values"><name>Restriction of LPC Shift to Non-negative Values</name> <t>As stated in <xref target="linear-predictor-subframe"></xref>, the predictor right shift is a signed two's complement number, which <bcp14>MUST NOT</bcp14> be negative. This is because shifting a number to the right by a negative amount is undefined behavior in the C programming language standard. The intended behavior was that a positive number would be a right shift and a negative number would be a left shift. 
The FLAC reference encoder was changed in 2007 to not generate LPC subframes with a negative predictor right shift, as it turned out that the use of such subframes would only very rarely provide any benefit and the decoders that were already widely in use at that point were not able to handle such subframes.</t> </section> </section> <section anchor="interoperability-considerations"><name>Interoperability Considerations</name> <t>As documented in <xref target="past-format-changes"></xref>, there have been some changes and additions to the FLAC format. Additionally, implementation of certain features of the FLAC format took many years, meaning early decoder implementations could not be tested against files with these features. Finally, many lower-quality FLAC decoders only implement just enough features required for playback of the most common FLAC files.</t> <t>This appendix provides some considerations for encoder implementations aiming to create highly compatible files. As this topic is one that might change after this document is published, consult <xref target="FLAC-wiki-interoperability"></xref> for more up-to-date information.</t> <section anchor="features-outside-of-the-streamable-subset"><name>Features outside of the Streamable Subset</name> <t>As described in <xref target="streamable-subset"></xref>, FLAC specifies a subset of its capabilities as the FLAC streamable subset. Certain decoders may choose to only decode FLAC files conforming to the limitations imposed by the streamable subset. Therefore, maximum compatibility with decoders is achieved when the limitations of the FLAC streamable subset are followed when creating FLAC files.</t> </section> <section anchor="variable-block-size"><name>Variable Block Size</name> <t>Because it is often difficult to find the optimal arrangement of block sizes for maximum compression, most encoders choose to create files with a fixed block size. 
Because of this, many decoder implementations receive minimal use when handling variable block size streams, and this can reveal bugs or reveal that implementations do not decode them at all. Furthermore, as explained in <xref target="addition-of-blocking-strategy-bit"></xref>, there have been some changes to the way variable block size streams are encoded. Because of this, maximum compatibility with decoders is achieved when FLAC files are created using fixed block size streams.</t> </section> <section anchor="rice-parameter-5-bit"><name>5-Bit Rice Parameters</name> <t>As the addition of the coding method using 5-bit Rice parameters, as described in <xref target="addition-of-5-bit-rice-parameters"/>, occurred quite a few years after the FLAC format was first introduced, some early decoders might not be able to decode files containing such Rice parameters. The introduction of this was specifically aimed at improving compression of 24-bit PCM audio, and compression of 16-bit PCM audio only rarely benefits from using 5-bit Rice parameters. Therefore, maximum compatibility with decoders is achieved when FLAC files containing audio with a bit depth of 16 bits or less are created without any use of 5-bit Rice parameters.</t> </section> <section anchor="rice-escape-code"><name>Rice Escape Code</name> <t>Escaped Rice partitions are seldom used, as it turned out their use provides only a very small compression improvement. As many encoders do not use these by default or are not capable of producing them at all, it is likely that many decoder implementations are not able to decode them correctly. 
Therefore, maximum compatibility with decoders is achieved when FLAC files are created without any use of escaped Rice partitions.</t> </section>
<section anchor="uncommon-block-size-1"><name>Uncommon Block Size</name>
<t>For unknown reasons, some decoders have chosen to support only common block sizes for all but the last block of a stream. Therefore, maximum compatibility with decoders is achieved when creating FLAC files using common block sizes, as listed in <xref target="block-size-bits"></xref>, for all but the last block of a stream.</t> </section>
<section anchor="uncommon-bit-depth"><name>Uncommon Bit Depth</name>
<t>Most audio is stored in bit depths that are a whole number of bytes, e.g., 8, 16, or 24 bits. However, there is audio with different bit depths. A few examples:</t>
<ul>
<li>DVD-Audio has the possibility to store 20-bit PCM audio.</li>
<li>DAT and DV can store 12-bit PCM audio.</li>
<li>NICAM-728 samples at 14 bits, which is companded to 10 bits.</li>
<li>8-bit µ-law can be losslessly converted to 14-bit (Linear) PCM.</li>
<li>8-bit A-law can be losslessly converted to 13-bit (Linear) PCM.</li>
</ul>
<t>The FLAC format can contain these bit depths directly, but because they are uncommon, some decoders are not able to process the resulting files correctly. It is possible to store these formats in a FLAC file with a more common bit depth without sacrificing compression by padding each sample with zero bits to a bit depth that is a whole byte. The FLAC format can efficiently compress these wasted bits. 
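</t>
<t>A minimal Python sketch of this padding step follows. It assumes signed integer samples; the helper names pad_to_whole_bytes and unpad are hypothetical and are not part of the format. The padded samples all end in zero bits, which is the pattern that the wasted bits mechanism removes again.</t>
<sourcecode type="python"><![CDATA[
```python
def pad_to_whole_bytes(samples, bit_depth):
    """Pad samples with zero bits up to the next whole-byte bit depth.

    Hypothetical helper for illustration. Returns the padded samples,
    the padded bit depth, and the number of zero bits appended.
    """
    padded_depth = -(-bit_depth // 8) * 8  # round up to a multiple of 8
    shift = padded_depth - bit_depth
    return [s << shift for s in samples], padded_depth, shift


def unpad(samples, shift):
    """Undo the padding; this requires knowing the original bit depth."""
    return [s >> shift for s in samples]


# 20-bit samples (e.g., from DVD-Audio) stored as 24-bit samples.
original = [-524288, -1, 0, 1, 524287]
padded, padded_depth, shift = pad_to_whole_bytes(original, 20)
restored = unpad(padded, shift)
```
]]></sourcecode>
<t>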
See <xref target="wasted-bits-per-sample"></xref> for details.</t>
<t>Therefore, maximum compatibility with decoders is achieved when FLAC files are created by padding samples of such audio with zero bits to the bit depth that is the next whole number of bytes.</t>
<t>In cases where the original signal is already padded, this operation cannot be reversed losslessly without knowing the original bit depth. To leave no ambiguity, the original bit depth needs to be stored, for example, in a Vorbis comment field or by storing the header of the original file. The choice of a suitable method is left to the implementor.</t>
<t>Apart from audio with a "non-whole byte" bit depth, some decoder implementations have chosen to only accept FLAC files coding for PCM audio with a bit depth of 16 bits. Many implementations support bit depths up to 24 bits, but no higher. Consult <xref target="FLAC-wiki-interoperability"></xref> for more up-to-date information.</t> </section>
<section anchor="multi-channel-audio-and-uncommon-sample-rates"><name>Multi-Channel Audio and Uncommon Sample Rates</name>
<t>Many FLAC audio players are unable to render multi-channel audio or audio with an uncommon sample rate. While this is not a concern specific to the FLAC format, it is of note when requiring maximum compatibility with decoders. 
Unlike the previously mentioned interoperability considerations, this is one where compatibility cannot be improved without sacrificing the lossless nature of the FLAC format.</t>
<t>From a non-exhaustive inquiry, it seems that a non-negligible number of players, especially hardware players, do not support audio with 3 or more channels or sample rates other than those considered common; see <xref target="sample-rate-bits"></xref>.</t>
<t>For those players that do support and are able to render multi-channel audio, many do not parse and use the WAVEFORMATEXTENSIBLE_CHANNEL_MASK tag (see <xref target="channel-mask"></xref>). This, too, is an interoperability consideration where compatibility cannot be improved without sacrificing the lossless nature of the FLAC format.</t> </section>
<section anchor="changing-audio-properties-mid-stream"><name>Changing Audio Properties Mid-Stream</name>
<t>Each FLAC frame header stores the audio sample rate, number of bits per sample, and number of channels independently of the streaminfo metadata block and other frame headers. This was done to permit multicasting of FLAC files, but it also allows these properties to change mid-stream. However, many FLAC decoders do not handle such changes, as few other formats are capable of holding such streams and changing playback properties during playback is often not possible without interrupting playback. Also, as explained in <xref target="frame-structure"></xref>, using this feature of FLAC results in various practical problems.</t>
<t>However, even when storing an audio stream with changing properties in FLAC encapsulated in a container capable of handling such changes, as recommended in <xref target="frame-structure"></xref>, many decoders are not able to decode such a stream correctly. 
Therefore, maximum compatibility with decoders is achieved when FLAC files are created with a single set of audio properties, in which the properties coded in the streaminfo metadata block (see <xref target="streaminfo"></xref>) and the properties coded in all frame headers (see <xref target="frame-header"></xref>) are the same. This can be achieved by splitting up an input stream with changing audio properties at the points where these properties change into separate streams or files.</t> </section> </section>
<section anchor="examples"><name>Examples</name>
<t>This informational appendix contains short examples of FLAC files that are decoded step by step. These examples provide a more engaging way to understand the FLAC format than the formal specification. The text explaining these examples assumes the reader has at least cursorily read the specification and that the reader refers to the specification for explanation of the terminology used. These examples mostly focus on the layout of several metadata blocks, subframe types, and the implications of certain aspects (e.g., wasted bits and stereo decorrelation) on this layout.</t>
<t>The examples feature files generated by various FLAC encoders. These are presented in hexadecimal or binary format, followed by tables and text referring to various features by their starting bit positions in these representations. Each starting position (shortened to "start" in the tables) is a hexadecimal byte position and a start bit within that byte, separated by a plus sign. Counts for these start at zero. For example, a feature starting at the 3rd bit of the 17th byte is referred to as starting at 0x10+2. The files that are explored in these examples can be found at <xref target="FLAC-specification-github"></xref>.</t>
<t>All data in this appendix has been thoroughly verified. 
However, as this appendix is informational, if any information here conflicts with statements in the formal specification, the latter takes precedence.</t>
<section anchor="decoding-example-1"><name>Decoding Example 1</name>
<t>This very short example FLAC file codes for PCM audio that has two channels, each containing one sample. The focus of this example is on the essential parts of a FLAC file.</t>
<section anchor="example-file-1-in-hexadecimal-representation"><name>Example File 1 in Hexadecimal Representation</name>
<artwork><![CDATA[00000000: 664c 6143 8000 0022 1000 1000 fLaC..."....
0000000c: 0000 0f00 000f 0ac4 42f0 0000 ........B...
00000018: 0001 3e84 b418 07dc 6903 0758 ..>.....i..X
00000024: 6a3d ad1a 2e0f fff8 6918 0000 j=......i...
00000030: bf03 58fd 0312 8baa 9a        ..X......]]></artwork> </section>
<section anchor="example-file-1-in-binary-representation"><name>Example File 1 in Binary Representation</name>
<artwork><![CDATA[00000000: 01100110 01001100 01100001 01000011 fLaC
00000004: 10000000 00000000 00000000 00100010 ..."
00000008: 00010000 00000000 00010000 00000000 ....
0000000c: 00000000 00000000 00001111 00000000 ....
00000010: 00000000 00001111 00001010 11000100 ....
00000014: 01000010 11110000 00000000 00000000 B...
00000018: 00000000 00000001 00111110 10000100 ..>.
0000001c: 10110100 00011000 00000111 11011100 ....
00000020: 01101001 00000011 00000111 01011000 i..X
00000024: 01101010 00111101 10101101 00011010 j=..
00000028: 00101110 00001111 11111111 11111000 ....
0000002c: 01101001 00011000 00000000 00000000 i...
00000030: 10111111 00000011 01011000 11111101 ..X.
00000034: 00000011 00010010 10001011 10101010 ....
00000038: 10011010]]></artwork> </section>
<section anchor="signature-and-streaminfo"><name>Signature and Streaminfo</name>
<t>The first 4 bytes of the file contain the <tt>fLaC</tt> file signature. Directly following it is a metadata block. The signature and the first metadata block header are broken down in the following table.</t>
<table>
<thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead>
<tbody>
<tr> <td align="left">0x00+0</td> <td align="left">4 bytes</td> <td align="left">0x664C6143</td> <td align="left"><tt>fLaC</tt></td> </tr>
<tr> <td align="left">0x04+0</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">Last metadata block</td> </tr>
<tr> <td align="left">0x04+1</td> <td align="left">7 bits</td> <td align="left">0b0000000</td> <td align="left">Streaminfo metadata block</td> </tr>
<tr> <td align="left">0x05+0</td> <td align="left">3 bytes</td> <td align="left">0x000022</td> <td align="left">Length of 34 bytes</td> </tr>
</tbody>
</table>
<t>As the header indicates that this is the last metadata block, the position of the first audio frame can now be calculated as the position of the first byte after the metadata block header + the length of the block, i.e., 8+34 = 42 or 0x2a. As can be seen, 0x2a indeed contains the frame sync code for fixed block size streams, 0xfff8.</t>
<t>The streaminfo metadata block contents are broken down in the following table.</t>
<table>
<thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead>
<tbody>
<tr> <td align="left">0x08+0</td> <td align="left">2 bytes</td> <td align="left">0x1000</td> <td align="left">Min. block size 4096</td> </tr>
<tr> <td align="left">0x0a+0</td> <td align="left">2 bytes</td> <td align="left">0x1000</td> <td align="left">Max. 
block size 4096</td> </tr>
<tr> <td align="left">0x0c+0</td> <td align="left">3 bytes</td> <td align="left">0x00000f</td> <td align="left">Min. frame size 15 bytes</td> </tr>
<tr> <td align="left">0x0f+0</td> <td align="left">3 bytes</td> <td align="left">0x00000f</td> <td align="left">Max. frame size 15 bytes</td> </tr>
<tr> <td align="left">0x12+0</td> <td align="left">20 bits</td> <td align="left">0x0ac4, 0b0100</td> <td align="left">Sample rate 44100 hertz</td> </tr>
<tr> <td align="left">0x14+4</td> <td align="left">3 bits</td> <td align="left">0b001</td> <td align="left">2 channels</td> </tr>
<tr> <td align="left">0x14+7</td> <td align="left">5 bits</td> <td align="left">0b01111</td> <td align="left">Sample bit depth 16</td> </tr>
<tr> <td align="left">0x15+4</td> <td align="left">36 bits</td> <td align="left">0b0000, 0x00000001</td> <td align="left">Total no. of samples 1</td> </tr>
<tr> <td align="left">0x1a</td> <td align="left">16 bytes</td> <td align="left">(...)</td> <td align="left">MD5 checksum</td> </tr>
</tbody>
</table>
<t>The minimum and maximum block sizes are both 4096. This was apparently the block size the encoder planned to use, but as only 1 interchannel sample was provided, no frames with 4096 samples are actually present in this file.</t>
<t>Note that anywhere a number of samples is mentioned (block size, total number of samples, sample rate), interchannel samples are meant.</t>
<t>The MD5 checksum (starting at 0x1a) is 0x3e84 b418 07dc 6903 0758 6a3d ad1a 2e0f. 
This will be validated after decoding the samples.</t> </section>
<section anchor="audio-frames"><name>Audio Frames</name>
<t>The frame header starts at position 0x2a and is broken down in the following table.</t>
<table>
<thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead>
<tbody>
<tr> <td align="left">0x2a+0</td> <td align="left">15 bits</td> <td align="left">0xff, 0b1111100</td> <td align="left">Frame sync</td> </tr>
<tr> <td align="left">0x2b+7</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">Blocking strategy</td> </tr>
<tr> <td align="left">0x2c+0</td> <td align="left">4 bits</td> <td align="left">0b0110</td> <td align="left">8-bit block size further down</td> </tr>
<tr> <td align="left">0x2c+4</td> <td align="left">4 bits</td> <td align="left">0b1001</td> <td align="left">Sample rate 44.1 kHz</td> </tr>
<tr> <td align="left">0x2d+0</td> <td align="left">4 bits</td> <td align="left">0b0001</td> <td align="left">Stereo, no decorrelation</td> </tr>
<tr> <td align="left">0x2d+4</td> <td align="left">3 bits</td> <td align="left">0b100</td> <td align="left">Bit depth 16 bits</td> </tr>
<tr> <td align="left">0x2d+7</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">Mandatory 0 bit</td> </tr>
<tr> <td align="left">0x2e+0</td> <td align="left">1 byte</td> <td align="left">0x00</td> <td align="left">Frame number 0</td> </tr>
<tr> <td align="left">0x2f+0</td> <td align="left">1 byte</td> <td align="left">0x00</td> <td align="left">Block size 1</td> </tr>
<tr> <td align="left">0x30+0</td> <td align="left">1 byte</td> <td align="left">0xbf</td> <td align="left">Frame header CRC</td> </tr>
</tbody>
</table>
<t>As the stream is a fixed block size 
stream, the number at 0x2e contains a frame number. Because the value is smaller than 128, only 1 byte is used for the encoding.</t>
<t>At byte 0x31, the first subframe starts, which is broken down in the following table.</t>
<table>
<thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead>
<tbody>
<tr> <td align="left">0x31+0</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">Mandatory 0 bit</td> </tr>
<tr> <td align="left">0x31+1</td> <td align="left">6 bits</td> <td align="left">0b000001</td> <td align="left">Verbatim subframe</td> </tr>
<tr> <td align="left">0x31+7</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">Wasted bits used</td> </tr>
<tr> <td align="left">0x32+0</td> <td align="left">2 bits</td> <td align="left">0b01</td> <td align="left">2 wasted bits used</td> </tr>
<tr> <td align="left">0x32+2</td> <td align="left">14 bits</td> <td align="left">0b011000, 0xfd</td> <td align="left">14-bit unencoded sample</td> </tr>
</tbody>
</table>
<t>As the wasted bits flag is 1 in this subframe, a unary-coded number follows. Starting at 0x32, we see 0b01, which unary codes for 1, meaning that this subframe uses 2 wasted bits.</t>
<t>As this is a verbatim subframe, the subframe only contains unencoded sample values. With a block size of 1, it contains only a single sample. The bit depth of the audio is 16 bits, but as the subframe header signals the use of 2 wasted bits, only 14 bits are stored. As no stereo decorrelation is used, a bit depth increase for the side channel is not applicable. So, the next 14 bits (starting at position 0x32+2) contain the unencoded sample coded big-endian, signed two's complement. The value reads 0b011000 11111101, or 6397. This value needs to be shifted left by 2 bits to account for the wasted bits. 
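</t>
<t>The steps above can be sketched in Python as follows. This is an illustrative sketch only: the helper names are hypothetical, it handles just a verbatim subframe holding a single sample, and it takes the three subframe bytes at 0x31 to 0x33 from the hexadecimal representation of example file 1.</t>
<sourcecode type="python"><![CDATA[
```python
def read_bits(data, bit_pos, count):
    """Read `count` bits, MSB first, starting at absolute bit offset `bit_pos`."""
    value = 0
    for i in range(bit_pos, bit_pos + count):
        value = (value << 1) | ((data[i // 8] >> (7 - i % 8)) & 1)
    return value


def to_signed(value, bits):
    """Interpret an unsigned `bits`-wide value as signed two's complement."""
    return value - (1 << bits) if value >> (bits - 1) else value


def decode_single_verbatim_sample(data, bit_depth):
    """Decode a verbatim subframe that holds exactly one sample."""
    if read_bits(data, 0, 1) != 0:
        raise ValueError("mandatory 0 bit is set")
    if read_bits(data, 1, 6) != 0b000001:
        raise ValueError("not a verbatim subframe")
    pos = 8
    wasted = 0
    if read_bits(data, 7, 1):                # wasted bits flag
        wasted = 1
        while read_bits(data, pos, 1) == 0:  # unary: count zeros before the 1
            wasted += 1
            pos += 1
        pos += 1                             # skip the terminating 1 bit
    stored_bits = bit_depth - wasted
    sample = to_signed(read_bits(data, pos, stored_bits), stored_bits)
    return sample << wasted                  # restore the wasted zero bits


first_subframe = bytes.fromhex("0358fd")     # bytes 0x31 to 0x33
print(decode_single_verbatim_sample(first_subframe, 16))  # 25588
```
]]></sourcecode>
<t>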
The value is then 0b011000 11111101 00, or 25588.</t>
<t>The second subframe starts at 0x34 and is broken down in the following table.</t>
<table>
<thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead>
<tbody>
<tr> <td align="left">0x34+0</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">Mandatory 0 bit</td> </tr>
<tr> <td align="left">0x34+1</td> <td align="left">6 bits</td> <td align="left">0b000001</td> <td align="left">Verbatim subframe</td> </tr>
<tr> <td align="left">0x34+7</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">Wasted bits used</td> </tr>
<tr> <td align="left">0x35+0</td> <td align="left">4 bits</td> <td align="left">0b0001</td> <td align="left">4 wasted bits used</td> </tr>
<tr> <td align="left">0x35+4</td> <td align="left">12 bits</td> <td align="left">0b0010, 0x8b</td> <td align="left">12-bit unencoded sample</td> </tr>
</tbody>
</table>
<t>Here, the wasted bits flag is also 1, but the unary-coded number that follows it is 4 bits long, indicating the use of 4 wasted bits. This means the sample is stored in 12 bits. The sample value is 0b0010 10001011, or 651. This value now has to be shifted left by 4 bits, i.e., 0b0010 10001011 0000, or 10416.</t>
<t>At this point, we would undo stereo decorrelation if that was applicable.</t>
<t>As the last subframe ends byte-aligned, no padding bits follow it. The next 2 bytes, starting at 0x37, contain the frame CRC. As this is the only frame in the file, the file ends with the CRC.</t>
<t>To validate the MD5 checksum, we line up the samples interleaved, byte-aligned, little-endian, signed two's complement. The first sample, with value 25588, translates to 0xf463, and the second sample, with value 10416, translates to 0xb028. 
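</t>
<t>The byte layout described above can be sketched in Python with the standard struct and hashlib modules. The function name pack_for_md5 is hypothetical, and the sketch assumes 16-bit samples so that each sample occupies exactly 2 bytes.</t>
<sourcecode type="python"><![CDATA[
```python
import hashlib
import struct


def pack_for_md5(interleaved_samples):
    """Pack 16-bit samples as little-endian signed two's complement bytes."""
    count = len(interleaved_samples)
    return struct.pack("<%dh" % count, *interleaved_samples)


# One interchannel sample: first channel 25588, second channel 10416.
raw = pack_for_md5([25588, 10416])
digest = hashlib.md5(raw).hexdigest()

print(raw.hex())  # f463b028
print(digest)     # to be compared with the checksum stored at 0x1a
```
]]></sourcecode>
<t>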
When computing the MD5 checksum with 0xf463b028 as input, we get the MD5 checksum found in the header, so decoding was lossless.</t> </section> </section>
<section anchor="decoding-example-2"><name>Decoding Example 2</name>
<t>This FLAC file is larger than the first example, but still contains very little audio. The focus of this example is on decoding a subframe with a fixed predictor and a coded residual, but it also contains a very short seek table, a Vorbis comment metadata block, and a padding metadata block.</t>
<section anchor="example-file-2-in-hexadecimal-representation"><name>Example File 2 in Hexadecimal Representation</name>
<artwork><![CDATA[00000000: 664c 6143 0000 0022 0010 0010 fLaC..."....
0000000c: 0000 1700 0044 0ac4 42f0 0000 .....D..B...
00000018: 0013 d5b0 5649 75e9 8b8d 8b93 ....VIu.....
00000024: 0422 757b 8103 0300 0012 0000 ."u{........
00000030: 0000 0000 0000 0000 0000 0000 ............
0000003c: 0000 0010 0400 003a 2000 0000 .......: ...
00000048: 7265 6665 7265 6e63 6520 6c69 reference li
00000054: 6246 4c41 4320 312e 332e 3320 bFLAC 1.3.3 
00000060: 3230 3139 3038 3034 0100 0000 20190804....
0000006c: 0e00 0000 5449 544c 453d d7a9 ....TITLE=..
00000078: d79c d795 d79d 8100 0006 0000 ............
00000084: 0000 0000 fff8 6998 000f 9912 ......i.....
00000090: 0867 0162 3d14 4299 8f5d f70d .g.b=.B..]..
0000009c: 6fe0 0c17 caeb 2100 0ee7 a77a o.....!....z
000000a8: 24a1 590c 1217 b603 097b 784f $.Y......{xO
000000b4: aa9a 33d2 85e0 70ad 5b1b 4851 ..3...p.[.HQ
000000c0: b401 0d99 d2cd 1a68 f1e6 b810 .......h....
000000cc: fff8 6918 0102 a402 c382 c40b ..i.........
000000d8: c14a 03ee 48dd 03b6 7c13 30   .J..H...|.0]]></artwork> </section>
<section anchor="example-file-2-in-binary-representation-only-audio-frames"><name>Example File 2 in Binary Representation (Only Audio Frames)</name>
<artwork><![CDATA[00000088: 11111111 11111000 01101001 10011000 ..i.
0000008c: 00000000 00001111 10011001 00010010 ....
00000090: 00001000 01100111 00000001 01100010 .g.b
00000094: 00111101 00010100 01000010 10011001 =.B.
00000098: 10001111 01011101 11110111 00001101 .]..
0000009c: 01101111 11100000 00001100 00010111 o...
000000a0: 11001010 11101011 00100001 00000000 ..!.
000000a4: 00001110 11100111 10100111 01111010 ...z
000000a8: 00100100 10100001 01011001 00001100 $.Y.
000000ac: 00010010 00010111 10110110 00000011 ....
000000b0: 00001001 01111011 01111000 01001111 .{xO
000000b4: 10101010 10011010 00110011 11010010 ..3.
000000b8: 10000101 11100000 01110000 10101101 ..p.
000000bc: 01011011 00011011 01001000 01010001 [.HQ
000000c0: 10110100 00000001 00001101 10011001 ....
000000c4: 11010010 11001101 00011010 01101000 ...h
000000c8: 11110001 11100110 10111000 00010000 ....
000000cc: 11111111 11111000 01101001 00011000 ..i.
000000d0: 00000001 00000010 10100100 00000010 ....
000000d4: 11000011 10000010 11000100 00001011 ....
000000d8: 11000001 01001010 00000011 11101110 .J..
000000dc: 01001000 11011101 00000011 10110110 H...
000000e0: 01111100 00010011 00110000          |.0]]></artwork> </section>
<section anchor="streaminfo-metadata-block"><name>Streaminfo Metadata Block</name>
<t>Most of the streaminfo metadata block, including its header, is the same as in example 1, so only parts that are different are listed in the following table.</t>
<table>
<thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead>
<tbody>
<tr> <td align="left">0x04+0</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">Not the last metadata block</td> </tr>
<tr> <td align="left">0x08+0</td> <td align="left">2 bytes</td> <td align="left">0x0010</td> <td align="left">Min. block size 16</td> </tr>
<tr> <td align="left">0x0a+0</td> <td align="left">2 bytes</td> <td align="left">0x0010</td> <td align="left">Max. block size 16</td> </tr>
<tr> <td align="left">0x0c+0</td> <td align="left">3 bytes</td> <td align="left">0x000017</td> <td align="left">Min. frame size 23 bytes</td> </tr>
<tr> <td align="left">0x0f+0</td> <td align="left">3 bytes</td> <td align="left">0x000044</td> <td align="left">Max. frame size 68 bytes</td> </tr>
<tr> <td align="left">0x15+4</td> <td align="left">36 bits</td> <td align="left">0b0000, 0x00000013</td> <td align="left">Total no. of samples 19</td> </tr>
<tr> <td align="left">0x1a</td> <td align="left">16 bytes</td> <td align="left">(...)</td> <td align="left">MD5 checksum</td> </tr>
</tbody>
</table>
<t>This time, the minimum and maximum block sizes are reflected in the file: there is one block of 16 samples, and the last block (which has 3 samples) is not considered for the minimum block size. The MD5 checksum is 0xd5b0 5649 75e9 8b8d 8b93 0422 757b 8103. 
This will be verified at the end of this example.</t> </section>
<section anchor="seektable-1"><name>Seek Table</name>
<t>The seek table metadata block only holds one entry. It is not really useful here, as it points to the first frame, but it is enough for this example. The seek table metadata block is broken down in the following table.</t>
<table>
<thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead>
<tbody>
<tr> <td align="left">0x2a+0</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">Not the last metadata block</td> </tr>
<tr> <td align="left">0x2a+1</td> <td align="left">7 bits</td> <td align="left">0b0000011</td> <td align="left">Seek table metadata block</td> </tr>
<tr> <td align="left">0x2b+0</td> <td align="left">3 bytes</td> <td align="left">0x000012</td> <td align="left">Length 18 bytes</td> </tr>
<tr> <td align="left">0x2e+0</td> <td align="left">8 bytes</td> <td align="left">0x0000000000000000</td> <td align="left">Seek point to sample 0</td> </tr>
<tr> <td align="left">0x36+0</td> <td align="left">8 bytes</td> <td align="left">0x0000000000000000</td> <td align="left">Seek point to offset 0</td> </tr>
<tr> <td align="left">0x3e+0</td> <td align="left">2 bytes</td> <td align="left">0x0010</td> <td align="left">Seek point to block size 16</td> </tr>
</tbody>
</table> </section>
<section anchor="vorbis-comment-1"><name>Vorbis Comment</name>
<t>The Vorbis comment metadata block contains the vendor string and a single comment. 
It is broken down in the following table.</t>
<table>
<thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead>
<tbody>
<tr> <td align="left">0x40+0</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">Not the last metadata block</td> </tr>
<tr> <td align="left">0x40+1</td> <td align="left">7 bits</td> <td align="left">0b0000100</td> <td align="left">Vorbis comment metadata block</td> </tr>
<tr> <td align="left">0x41+0</td> <td align="left">3 bytes</td> <td align="left">0x00003a</td> <td align="left">Length 58 bytes</td> </tr>
<tr> <td align="left">0x44+0</td> <td align="left">4 bytes</td> <td align="left">0x20000000</td> <td align="left">Vendor string length 32 bytes</td> </tr>
<tr> <td align="left">0x48+0</td> <td align="left">32 bytes</td> <td align="left">(...)</td> <td align="left">Vendor string</td> </tr>
<tr> <td align="left">0x68+0</td> <td align="left">4 bytes</td> <td align="left">0x01000000</td> <td align="left">Number of fields 1</td> </tr>
<tr> <td align="left">0x6c+0</td> <td align="left">4 bytes</td> <td align="left">0x0e000000</td> <td align="left">Field length 14 bytes</td> </tr>
<tr> <td align="left">0x70+0</td> <td align="left">14 bytes</td> <td align="left">(...)</td> <td align="left">Field contents</td> </tr>
</tbody>
</table>
<t>The vendor string is reference libFLAC 1.3.3 20190804, and the contents of the only field are TITLE=שלום. 
The Vorbis comment field is 14 bytes but only 10 characters in size, because it contains four 2-byte characters.</t> </section>
<section anchor="padding-1"><name>Padding</name>
<t>The last metadata block is a (very short) padding block.</t>
<table>
<thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead>
<tbody>
<tr> <td align="left">0x7e+0</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">Last metadata block</td> </tr>
<tr> <td align="left">0x7e+1</td> <td align="left">7 bits</td> <td align="left">0b0000001</td> <td align="left">Padding metadata block</td> </tr>
<tr> <td align="left">0x7f+0</td> <td align="left">3 bytes</td> <td align="left">0x000006</td> <td align="left">Length 6 bytes</td> </tr>
<tr> <td align="left">0x82+0</td> <td align="left">6 bytes</td> <td align="left">0x000000000000</td> <td align="left">Padding bytes</td> </tr>
</tbody>
</table> </section>
<section anchor="first-audio-frame"><name>First Audio Frame</name>
<t>The frame header starts at position 0x88 and is broken down in the following table.</t>
<table>
<thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead>
<tbody>
<tr> <td align="left">0x88+0</td> <td align="left">15 bits</td> <td align="left">0xff, 0b1111100</td> <td align="left">Frame sync</td> </tr>
<tr> <td align="left">0x89+7</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">Blocking strategy</td> </tr>
<tr> <td align="left">0x8a+0</td> <td align="left">4 bits</td> <td align="left">0b0110</td> <td align="left">8-bit block size further down</td> </tr>
<tr> <td align="left">0x8a+4</td> <td align="left">4 bits</td> <td align="left">0b1001</td> <td align="left">Sample rate 44.1 kHz</td> </tr>
<tr> <td align="left">0x8b+0</td> <td 
align="left">4 bits</td> <td align="left">0b1001</td> <td align="left">Side-right stereo</td> </tr>
<tr> <td align="left">0x8b+4</td> <td align="left">3 bits</td> <td align="left">0b100</td> <td align="left">Bit depth 16 bits</td> </tr>
<tr> <td align="left">0x8b+7</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">Mandatory 0 bit</td> </tr>
<tr> <td align="left">0x8c+0</td> <td align="left">1 byte</td> <td align="left">0x00</td> <td align="left">Frame number 0</td> </tr>
<tr> <td align="left">0x8d+0</td> <td align="left">1 byte</td> <td align="left">0x0f</td> <td align="left">Block size 16</td> </tr>
<tr> <td align="left">0x8e+0</td> <td align="left">1 byte</td> <td align="left">0x99</td> <td align="left">Frame header CRC</td> </tr>
</tbody>
</table>
<t>The first subframe starts at byte 0x8f, and it is broken down in the following table, excluding the coded residual. As this subframe codes for a side channel, the bit depth is increased by 1 bit from 16 bits to 17 bits. 
This is most clearly present in the unencoded warm-up sample.</t>
<table>
<thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead>
<tbody>
<tr> <td align="left">0x8f+0</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">Mandatory 0 bit</td> </tr>
<tr> <td align="left">0x8f+1</td> <td align="left">6 bits</td> <td align="left">0b001001</td> <td align="left">Fixed subframe, 1st order</td> </tr>
<tr> <td align="left">0x8f+7</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">No wasted bits used</td> </tr>
<tr> <td align="left">0x90+0</td> <td align="left">17 bits</td> <td align="left">0x0867, 0b0</td> <td align="left">Unencoded warm-up sample</td> </tr>
</tbody>
</table>
<t>The coded residual is broken down in the following table. All quotients are unary coded, and all remainders are stored unencoded with a number of bits specified by the Rice parameter.</t>
<table>
<thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead>
<tbody>
<tr> <td align="left">0x92+1</td> <td align="left">2 bits</td> <td align="left">0b00</td> <td align="left">Rice code with 4-bit parameter</td> </tr>
<tr> <td align="left">0x92+3</td> <td align="left">4 bits</td> <td align="left">0b0000</td> <td align="left">Partition order 0</td> </tr>
<tr> <td align="left">0x92+7</td> <td align="left">4 bits</td> <td align="left">0b1011</td> <td align="left">Rice parameter 11</td> </tr>
<tr> <td align="left">0x93+3</td> <td align="left">4 bits</td> <td align="left">0b0001</td> <td align="left">Quotient 3</td> </tr>
<tr> <td align="left">0x93+7</td> <td align="left">11 bits</td> <td align="left">0b00011110100</td> <td align="left">Remainder 244</td> </tr>
<tr> <td align="left">0x95+2</td> <td align="left">2 
bits</td> <td align="left">0b01</td> <td align="left">Quotient 1</td> </tr> <tr> <td align="left">0x95+4</td> <td align="left">11 bits</td> <td align="left">0b01000100001</td> <td align="left">Remainder 545</td> </tr> <tr> <td align="left">0x96+7</td> <td align="left">2 bits</td> <td align="left">0b01</td> <td align="left">Quotient 1</td> </tr> <tr> <td align="left">0x97+1</td> <td align="left">11 bits</td> <td align="left">0b00110011000</td> <td align="left">Remainder 408</td> </tr> <tr> <td align="left">0x98+4</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">Quotient 0</td> </tr> <tr> <td align="left">0x98+5</td> <td align="left">11 bits</td> <td align="left">0b11101011101</td> <td align="left">Remainder 1885</td> </tr> <tr> <td align="left">0x9a+0</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">Quotient 0</td> </tr> <tr> <td align="left">0x9a+1</td> <td align="left">11 bits</td> <td align="left">0b11101110000</td> <td align="left">Remainder 1904</td> </tr> <tr> <td align="left">0x9b+4</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">Quotient 0</td> </tr> <tr> <td align="left">0x9b+5</td> <td align="left">11 bits</td> <td align="left">0b10101101111</td> <td align="left">Remainder 1391</td> </tr> <tr> <td align="left">0x9d+0</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">Quotient 0</td> </tr> <tr> <td align="left">0x9d+1</td> <td align="left">11 bits</td> <td align="left">0b11000000000</td> <td align="left">Remainder 1536</td> </tr> <tr> <td align="left">0x9e+4</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">Quotient 0</td> </tr> <tr> <td align="left">0x9e+5</td> <td align="left">11 bits</td> <td align="left">0b10000010111</td> <td align="left">Remainder 1047</td> </tr> <tr> <td align="left">0xa0+0</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">Quotient 0</td> </tr> <tr> <td align="left">0xa0+1</td> 
<td align="left">11 bits</td> <td align="left">0b10010101110</td> <td align="left">Remainder 1198</td> </tr> <tr> <td align="left">0xa1+4</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">Quotient 0</td> </tr> <tr> <td align="left">0xa1+5</td> <td align="left">11 bits</td> <td align="left">0b01100100001</td> <td align="left">Remainder 801</td> </tr> <tr> <td align="left">0xa3+0</td> <td align="left">13 bits</td> <td align="left">0b0000000000001</td> <td align="left">Quotient 12</td> </tr> <tr> <td align="left">0xa4+5</td> <td align="left">11 bits</td> <td align="left">0b11011100111</td> <td align="left">Remainder 1767</td> </tr> <tr> <td align="left">0xa6+0</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">Quotient 0</td> </tr> <tr> <td align="left">0xa6+1</td> <td align="left">11 bits</td> <td align="left">0b01001110111</td> <td align="left">Remainder 631</td> </tr> <tr> <td align="left">0xa7+4</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">Quotient 0</td> </tr> <tr> <td align="left">0xa7+5</td> <td align="left">11 bits</td> <td align="left">0b01000100100</td> <td align="left">Remainder 548</td> </tr> <tr> <td align="left">0xa9+0</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">Quotient 0</td> </tr> <tr> <td align="left">0xa9+1</td> <td align="left">11 bits</td> <td align="left">0b01000010101</td> <td align="left">Remainder 533</td> </tr> <tr> <td align="left">0xaa+4</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">Quotient 0</td> </tr> <tr> <td align="left">0xaa+5</td> <td align="left">11 bits</td> <td align="left">0b00100001100</td> <td align="left">Remainder 268</td> </tr> </tbody> </table><t>At this point, the decoder should know it is done decoding the coded residual, as it received 16 samples: 1 warm-up sample and 15 residual samples. 
Each residual sample can be calculated from the quotient and remainder and from undoing the zigzag encoding. For example, the value of the first zigzag-encoded residual sample is 3 * 2<sup>11</sup> + 244 = 6388. As this is an even number, the zigzag encoding is undone by dividing by 2; the residual sample value is 3194. This is done for all residual samples in the next table.</t> <table> <thead> <tr> <th align="left">Quotient</th> <th align="left">Remainder</th> <th align="left">Zigzag Encoded</th> <th align="left">Residual Sample Value</th> </tr> </thead> <tbody> <tr> <td align="left">3</td> <td align="left">244</td> <td align="left">6388</td> <td align="left">3194</td> </tr> <tr> <td align="left">1</td> <td align="left">545</td> <td align="left">2593</td> <td align="left">-1297</td> </tr> <tr> <td align="left">1</td> <td align="left">408</td> <td align="left">2456</td> <td align="left">1228</td> </tr> <tr> <td align="left">0</td> <td align="left">1885</td> <td align="left">1885</td> <td align="left">-943</td> </tr> <tr> <td align="left">0</td> <td align="left">1904</td> <td align="left">1904</td> <td align="left">952</td> </tr> <tr> <td align="left">0</td> <td align="left">1391</td> <td align="left">1391</td> <td align="left">-696</td> </tr> <tr> <td align="left">0</td> <td align="left">1536</td> <td align="left">1536</td> <td align="left">768</td> </tr> <tr> <td align="left">0</td> <td align="left">1047</td> <td align="left">1047</td> <td align="left">-524</td> </tr> <tr> <td align="left">0</td> <td align="left">1198</td> <td align="left">1198</td> <td align="left">599</td> </tr> <tr> <td align="left">0</td> <td align="left">801</td> <td align="left">801</td> <td align="left">-401</td> </tr> <tr> <td align="left">12</td> <td align="left">1767</td> <td align="left">26343</td> <td align="left">-13172</td> </tr> <tr> <td align="left">0</td> <td align="left">631</td> <td 
align="left">631</td> <td align="left">-316</td> </tr> <tr> <td align="left">0</td> <td align="left">548</td> <td align="left">548</td> <td align="left">274</td> </tr> <tr> <td align="left">0</td> <td align="left">533</td> <td align="left">533</td> <td align="left">-267</td> </tr> <tr> <td align="left">0</td> <td align="left">268</td> <td align="left">268</td> <td align="left">134</td> </tr> </tbody> </table><t>In this case, using a Rice code is more efficient than storing values unencoded. The Rice code (excluding the partition order and parameter) is 199 bits in length. The largest residual value (-13172) would need 15 bits to be stored unencoded, so storing all 15 samples with 15 bits results in a sequence with a length of 225 bits.</t> <t>The next step is using the predictor and the residuals to restore the sample values. As this subframe uses a fixed predictor with order 1, the residual value is added to the value of the previous sample.</t> <table> <thead> <tr> <th>Residual</th> <th align="left">Sample Value</th> </tr> </thead> <tbody> <tr> <td>(warm-up)</td> <td align="left">4302</td> </tr> <tr> <td>3194</td> <td align="left">7496</td> </tr> <tr> <td>-1297</td> <td align="left">6199</td> </tr> <tr> <td>1228</td> <td align="left">7427</td> </tr> <tr> <td>-943</td> <td align="left">6484</td> </tr> <tr> <td>952</td> <td align="left">7436</td> </tr> <tr> <td>-696</td> <td align="left">6740</td> </tr> <tr> <td>768</td> <td align="left">7508</td> </tr> <tr> <td>-524</td> <td align="left">6984</td> </tr> <tr> <td>599</td> <td align="left">7583</td> </tr> <tr> <td>-401</td> <td align="left">7182</td> </tr> <tr> <td>-13172</td> <td align="left">-5990</td> </tr> <tr> <td>-316</td> <td align="left">-6306</td> </tr> <tr> <td>274</td> <td align="left">-6032</td> </tr> <tr> <td>-267</td> <td align="left">-6299</td> </tr> <tr> <td>134</td> <td align="left">-6165</td> </tr> </tbody> </table><t>With 
this, the decoding of the first subframe is complete. The decoding of the second subframe is very similar, as it also uses a fixed predictor of order 1. This is left as an exercise for the reader; the results are in the next table. The next step is undoing stereo decorrelation, which is done in the following table. As the stereo decorrelation is side-right, the samples in the right channel come directly from the second subframe, while the samples in the left channel are found by adding the values of both subframes for each sample.</t> <table> <thead> <tr> <th align="left">Subframe 1</th> <th align="left">Subframe 2</th> <th align="left">Left</th> <th align="left">Right</th> </tr> </thead> <tbody> <tr> <td align="left">4302</td> <td align="left">6070</td> <td align="left">10372</td> <td align="left">6070</td> </tr> <tr> <td align="left">7496</td> <td align="left">10545</td> <td align="left">18041</td> <td align="left">10545</td> </tr> <tr> <td align="left">6199</td> <td align="left">8743</td> <td align="left">14942</td> <td align="left">8743</td> </tr> <tr> <td align="left">7427</td> <td align="left">10449</td> <td align="left">17876</td> <td align="left">10449</td> </tr> <tr> <td align="left">6484</td> <td align="left">9143</td> <td align="left">15627</td> <td align="left">9143</td> </tr> <tr> <td align="left">7436</td> <td align="left">10463</td> <td align="left">17899</td> <td align="left">10463</td> </tr> <tr> <td align="left">6740</td> <td align="left">9502</td> <td align="left">16242</td> <td align="left">9502</td> </tr> <tr> <td align="left">7508</td> <td align="left">10569</td> <td align="left">18077</td> <td align="left">10569</td> </tr> <tr> <td align="left">6984</td> <td align="left">9840</td> <td align="left">16824</td> <td align="left">9840</td> </tr> <tr> <td align="left">7583</td> <td align="left">10680</td> <td align="left">18263</td> <td align="left">10680</td> </tr> <tr> <td align="left">7182</td> <td align="left">10113</td> <td 
align="left">17295</td> <td align="left">10113</td> </tr> <tr> <td align="left">-5990</td> <td align="left">-8428</td> <td align="left">-14418</td> <td align="left">-8428</td> </tr> <tr> <td align="left">-6306</td> <td align="left">-8895</td> <td align="left">-15201</td> <td align="left">-8895</td> </tr> <tr> <td align="left">-6032</td> <td align="left">-8476</td> <td align="left">-14508</td> <td align="left">-8476</td> </tr> <tr> <td align="left">-6299</td> <td align="left">-8896</td> <td align="left">-15195</td> <td align="left">-8896</td> </tr> <tr> <td align="left">-6165</td> <td align="left">-8653</td> <td align="left">-14818</td> <td align="left">-8653</td> </tr> </tbody> </table><t>As the second subframe ends byte-aligned, no padding bits follow it. Finally, the last 2 bytes of the frame contain the frame CRC.</t> </section> <section anchor="second-audio-frame"><name>Second Audio Frame</name> <t>The second audio frame is very similar to the frame decoded in the first example, but this time, 3 samples (not 1) are present.</t> <t>The frame header starts at position 0xcc and is broken down in the following table.</t> <table> <thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left">0xcc+0</td> <td align="left">15 bits</td> <td align="left">0xff, 0b1111100</td> <td align="left">Frame sync</td> </tr> <tr> <td align="left">0xcd+7</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">Blocking strategy</td> </tr> <tr> <td align="left">0xce+0</td> <td align="left">4 bits</td> <td align="left">0b0110</td> <td align="left">8-bit block size further down</td> </tr> <tr> <td align="left">0xce+4</td> <td align="left">4 bits</td> <td align="left">0b1001</td> <td align="left">Sample rate 44.1 kHz</td> </tr> <tr> <td align="left">0xcf+0</td> <td 
align="left">4 bits</td> <td align="left">0b0001</td> <td align="left">Stereo, no decorrelation</td> </tr> <tr> <td align="left">0xcf+4</td> <td align="left">3 bits</td> <td align="left">0b100</td> <td align="left">Bit depth 16 bits</td> </tr> <tr> <td align="left">0xcf+7</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">Mandatory 0 bit</td> </tr> <tr> <td align="left">0xd0+0</td> <td align="left">1 byte</td> <td align="left">0x01</td> <td align="left">Frame number 1</td> </tr> <tr> <td align="left">0xd1+0</td> <td align="left">1 byte</td> <td align="left">0x02</td> <td align="left">Block size 3</td> </tr> <tr> <td align="left">0xd2+0</td> <td align="left">1 byte</td> <td align="left">0xa4</td> <td align="left">Frame header CRC</td> </tr> </tbody> </table><t>The first subframe starts at 0xd3+0 and is broken down in the following table.</t> <table> <thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left">0xd3+0</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">Mandatory 0 bit</td> </tr> <tr> <td align="left">0xd3+1</td> <td align="left">6 bits</td> <td align="left">0b000001</td> <td align="left">Verbatim subframe</td> </tr> <tr> <td align="left">0xd3+7</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">No wasted bits used</td> </tr> <tr> <td align="left">0xd4+0</td> <td align="left">16 bits</td> <td align="left">0xc382</td> <td align="left">16-bit unencoded sample</td> </tr> <tr> <td align="left">0xd6+0</td> <td align="left">16 bits</td> <td align="left">0xc40b</td> <td align="left">16-bit unencoded sample</td> </tr> <tr> <td align="left">0xd8+0</td> <td align="left">16 bits</td> <td 
align="left">0xc14a</td> <td align="left">16-bit unencoded sample</td> </tr> </tbody> </table><t>The second subframe starts at 0xda+0 and is broken down in the following table.</t> <table> <thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left">0xda+0</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">Mandatory 0 bit</td> </tr> <tr> <td align="left">0xda+1</td> <td align="left">6 bits</td> <td align="left">0b000001</td> <td align="left">Verbatim subframe</td> </tr> <tr> <td align="left">0xda+7</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">Wasted bits used</td> </tr> <tr> <td align="left">0xdb+0</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">1 wasted bit used</td> </tr> <tr> <td align="left">0xdb+1</td> <td align="left">15 bits</td> <td align="left">0b110111001001000</td> <td align="left">15-bit unencoded sample</td> </tr> <tr> <td align="left">0xdd+0</td> <td align="left">15 bits</td> <td align="left">0b110111010000001</td> <td align="left">15-bit unencoded sample</td> </tr> <tr> <td align="left">0xde+7</td> <td align="left">15 bits</td> <td align="left">0b110110110011111</td> <td align="left">15-bit unencoded sample</td> </tr> </tbody> </table><t>As this subframe uses wasted bits, the 15-bit unencoded samples need to be shifted left by 1 bit. For example, sample 1 is stored as -4536 and becomes -9072 after shifting left 1 bit.</t> <t>As the last subframe does not end on byte alignment, 2 padding bits are added before the 2-byte frame CRC, which follows at 0xe1+0.</t> </section> <section anchor="md5-checksum-verification"><name>MD5 Checksum Verification</name> <t>All samples in the file have been decoded, and we can now verify the MD5 checksum. 
All sample values must be interleaved and stored signed coded little-endian. The result of this follows in groups of 12 samples (i.e., 6 interchannel samples) per line.</t> <artwork><![CDATA[
0x8428 B617 7946 3129 5E3A 2722 D445 D128 0B3D B723 EB45 DF28
0x723f 1E25 9D46 4929 B841 7026 5747 B829 8F43 8127 AEC7 14DF
0x9FC4 41DD 54C7 E4DE A5C4 40DD 1EC6 33DE 82C3 90DC 0BC4 02DD
0x4AC1 3EDB
]]></artwork> <t>The MD5 checksum of this is indeed the same as the one found in the streaminfo metadata block.</t> </section> </section> <section anchor="decoding-example-3"><name>Decoding Example 3</name> <t>This example is once again a very short FLAC file. The focus of this example is on decoding a subframe with a linear predictor and a coded residual with more than one partition.</t> <section anchor="example-file-3-in-hexadecimal-representation"><name>Example File 3 in Hexadecimal Representation</name> <artwork><![CDATA[
00000000: 664c 6143 8000 0022 1000 1000  fLaC..."....
0000000c: 0000 1f00 001f 07d0 0070 0000  .........p..
00000018: 0018 f8f9 e396 f5cb cfc6 dc80  ............
00000024: 7f99 7790 6b32 fff8 6802 0017  ..w.k2..h...
00000030: e944 004f 6f31 3d10 47d2 27cb  .D.Oo1=.G.'.
0000003c: 6d09 0831 452b dc28 2222 8057  m..1E+.("".W
00000048: a3                             .
]]></artwork> </section> <section anchor="example-file-3-in-binary-representation-only-audio-frame"><name>Example File 3 in Binary Representation (Only Audio Frame)</name> <artwork><![CDATA[
0000002a: 11111111 11111000 01101000 00000010  ..h.
0000002e: 00000000 00010111 11101001 01000100  ...D
00000032: 00000000 01001111 01101111 00110001  .Oo1
00000036: 00111101 00010000 01000111 11010010  =.G.
0000003a: 00100111 11001011 01101101 00001001  '.m.
0000003e: 00001000 00110001 01000101 00101011  .1E+
00000042: 11011100 00101000 00100010 00100010  .(""
00000046: 10000000 01010111 10100011           .W.
]]></artwork> </section> <section anchor="streaminfo-metadata-block-1"><name>Streaminfo Metadata Block</name> <t>Most of the streaminfo metadata block, including its header, is the same as in example 1, so only parts that are different are listed in the following table.</t> <table> <thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left">0x0c+0</td> <td align="left">3 bytes</td> <td align="left">0x00001f</td> <td align="left">Min. frame size 31 bytes</td> </tr> <tr> <td align="left">0x0f+0</td> <td align="left">3 bytes</td> <td align="left">0x00001f</td> <td align="left">Max. frame size 31 bytes</td> </tr> <tr> <td align="left">0x12+0</td> <td align="left">20 bits</td> <td align="left">0x07d0, 0x0000</td> <td align="left">Sample rate 32000 hertz</td> </tr> <tr> <td align="left">0x14+4</td> <td align="left">3 bits</td> <td align="left">0b000</td> <td align="left">1 channel</td> </tr> <tr> <td align="left">0x14+7</td> <td align="left">5 bits</td> <td align="left">0b00111</td> <td align="left">Sample bit depth 8 bits</td> </tr> <tr> <td align="left">0x15+4</td> <td align="left">36 bits</td> <td align="left">0b0000, 0x00000018</td> <td align="left">Total no. 
of samples 24</td> </tr> <tr> <td align="left">0x1a</td> <td align="left">16 bytes</td> <td align="left">(...)</td> <td align="left">MD5 checksum</td> </tr> </tbody> </table></section> <section anchor="audio-frame"><name>Audio Frame</name> <t>The frame header starts at position 0x2a and is broken down in the following table.</t> <table> <thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left">0x2a+0</td> <td align="left">15 bits</td> <td align="left">0xff, 0b1111100</td> <td align="left">Frame sync</td> </tr> <tr> <td align="left">0x2b+7</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">Blocking strategy</td> </tr> <tr> <td align="left">0x2c+0</td> <td align="left">4 bits</td> <td align="left">0b0110</td> <td align="left">8-bit block size further down</td> </tr> <tr> <td align="left">0x2c+4</td> <td align="left">4 bits</td> <td align="left">0b1000</td> <td align="left">Sample rate 32 kHz</td> </tr> <tr> <td align="left">0x2d+0</td> <td align="left">4 bits</td> <td align="left">0b0000</td> <td align="left">Mono audio (1 channel)</td> </tr> <tr> <td align="left">0x2d+4</td> <td align="left">3 bits</td> <td align="left">0b001</td> <td align="left">Bit depth 8 bits</td> </tr> <tr> <td align="left">0x2d+7</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">Mandatory 0 bit</td> </tr> <tr> <td align="left">0x2e+0</td> <td align="left">1 byte</td> <td align="left">0x00</td> <td align="left">Frame number 0</td> </tr> <tr> <td align="left">0x2f+0</td> <td align="left">1 byte</td> <td align="left">0x17</td> <td align="left">Block size 24</td> </tr> <tr> <td align="left">0x30+0</td> <td align="left">1 byte</td> <td align="left">0xe9</td> <td align="left">Frame header CRC</td> </tr> </tbody> </table><t>The first and only subframe starts at byte 0x31. 
It is broken down in the following table, without the coded residual.</t> <table> <thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left">0x31+0</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">Mandatory 0 bit</td> </tr> <tr> <td align="left">0x31+1</td> <td align="left">6 bits</td> <td align="left">0b100010</td> <td align="left">Linear prediction subframe, 3rd order</td> </tr> <tr> <td align="left">0x31+7</td> <td align="left">1 bit</td> <td align="left">0b0</td> <td align="left">No wasted bits used</td> </tr> <tr> <td align="left">0x32+0</td> <td align="left">8 bits</td> <td align="left">0x00</td> <td align="left">Unencoded warm-up sample 0</td> </tr> <tr> <td align="left">0x33+0</td> <td align="left">8 bits</td> <td align="left">0x4f</td> <td align="left">Unencoded warm-up sample 79</td> </tr> <tr> <td align="left">0x34+0</td> <td align="left">8 bits</td> <td align="left">0x6f</td> <td align="left">Unencoded warm-up sample 111</td> </tr> <tr> <td align="left">0x35+0</td> <td align="left">4 bits</td> <td align="left">0b0011</td> <td align="left">Coefficient precision 4 bit</td> </tr> <tr> <td align="left">0x35+4</td> <td align="left">5 bits</td> <td align="left">0b00010</td> <td align="left">Prediction right shift 2</td> </tr> <tr> <td align="left">0x36+1</td> <td align="left">4 bits</td> <td align="left">0b0111</td> <td align="left">Predictor coefficient 7</td> </tr> <tr> <td align="left">0x36+5</td> <td align="left">4 bits</td> <td align="left">0b1010</td> <td align="left">Predictor coefficient -6</td> </tr> <tr> <td align="left">0x37+1</td> <td align="left">4 bits</td> <td align="left">0b0010</td> <td align="left">Predictor coefficient 2</td> </tr> </tbody> </table><t>The data stream continues with the coded residual, which is broken down in the following table. 
Residual partitions 3 and 4 are left as an exercise for the reader.</t> <table> <thead> <tr> <th align="left">Start</th> <th align="left">Length</th> <th align="left">Contents</th> <th align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="left">0x37+5</td> <td align="left">2 bits</td> <td align="left">0b00</td> <td align="left">Rice-coded residual, 4-bit parameter</td> </tr> <tr> <td align="left">0x37+7</td> <td align="left">4 bits</td> <td align="left">0b0010</td> <td align="left">Partition order 2</td> </tr> <tr> <td align="left">0x38+3</td> <td align="left">4 bits</td> <td align="left">0b0011</td> <td align="left">Rice parameter 3</td> </tr> <tr> <td align="left">0x38+7</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">Quotient 0</td> </tr> <tr> <td align="left">0x39+0</td> <td align="left">3 bits</td> <td align="left">0b110</td> <td align="left">Remainder 6</td> </tr> <tr> <td align="left">0x39+3</td> <td align="left">1 bit</td> <td align="left">0b1</td> <td align="left">Quotient 0</td> </tr> <tr> <td align="left">0x39+4</td> <td align="left">3 bits</td> <td align="left">0b001</td> <td align="left">Remainder 1</td> </tr> <tr> <td align="left">0x39+7</td> <td align="left">4 bits</td> <td align="left">0b0001</td> <td align="left">Quotient 3</td> </tr> <tr> <td align="left">0x3a+3</td> <td align="left">3 bits</td> <td align="left">0b001</td> <td align="left">Remainder 1</td> </tr> <tr> <td align="left">0x3a+6</td> <td align="left">4 bits</td> <td align="left">0b1111</td> <td align="left">No Rice parameter, escape code</td> </tr> <tr> <td align="left">0x3b+2</td> <td align="left">5 bits</td> <td align="left">0b00101</td> <td align="left">Partition encoded with 5 bits</td> </tr> <tr> <td align="left">0x3b+7</td> <td align="left">5 bits</td> <td align="left">0b10110</td> <td align="left">Residual -10</td> </tr> <tr> <td align="left">0x3c+4</td> <td align="left">5 bits</td> <td align="left">0b11010</td> <td 
align="left">Residual -6</td> </tr> <tr> <td align="left">0x3d+1</td> <td align="left">5 bits</td> <td align="left">0b00010</td> <td align="left">Residual 2</td> </tr> <tr> <td align="left">0x3d+6</td> <td align="left">5 bits</td> <td align="left">0b01000</td> <td align="left">Residual 8</td> </tr> <tr> <td align="left">0x3e+3</td> <td align="left">5 bits</td> <td align="left">0b01000</td> <td align="left">Residual 8</td> </tr> <tr> <td align="left">0x3f+0</td> <td align="left">5 bits</td> <td align="left">0b00110</td> <td align="left">Residual 6</td> </tr> <tr> <td align="left">0x3f+5</td> <td align="left">4 bits</td> <td align="left">0b0010</td> <td align="left">Rice parameter 2</td> </tr> <tr> <td align="left">0x40+1</td> <td align="left">22 bits</td> <td align="left">(...)</td> <td align="left">Residual partition 3</td> </tr> <tr> <td align="left">0x42+7</td> <td align="left">4 bits</td> <td align="left">0b0001</td> <td align="left">Rice parameter 1</td> </tr> <tr> <td align="left">0x43+3</td> <td align="left">23 bits</td> <td align="left">(...)</td> <td align="left">Residual partition 4</td> </tr> </tbody> </table><t>The frame ends with 6 padding bits and a 2-byte frame CRC.</t> <t>To decode this subframe, 21 predictions have to be calculated and added to their corresponding residuals. This is a sequential process: as each prediction uses previous samples, it is not possible to start this decoding halfway through a subframe or decode a subframe with parallel threads.</t> <t>The following table breaks down the calculation for each sample. For example, the predictor without shift value of row 4 is found by applying the predictor with the three warm-up samples: 7*111 - 6*79 + 2*0 = 303. This value is then shifted right by 2 bits: 303 >> 2 = 75. 
Then, the decoded residual sample is added: 75 + 3 = 78.</t> <table> <thead> <tr> <th>Residual</th> <th align="left">Predictor w/o Shift</th> <th align="left">Predictor</th> <th align="left">Sample Value</th> </tr> </thead> <tbody> <tr> <td>(warm-up)</td> <td align="left">N/A</td> <td align="left">N/A</td> <td align="left">0</td> </tr> <tr> <td>(warm-up)</td> <td align="left">N/A</td> <td align="left">N/A</td> <td align="left">79</td> </tr> <tr> <td>(warm-up)</td> <td align="left">N/A</td> <td align="left">N/A</td> <td align="left">111</td> </tr> <tr> <td>3</td> <td align="left">303</td> <td align="left">75</td> <td align="left">78</td> </tr> <tr> <td>-1</td> <td align="left">38</td> <td align="left">9</td> <td align="left">8</td> </tr> <tr> <td>-13</td> <td align="left">-190</td> <td align="left">-48</td> <td align="left">-61</td> </tr> <tr> <td>-10</td> <td align="left">-319</td> <td align="left">-80</td> <td align="left">-90</td> </tr> <tr> <td>-6</td> <td align="left">-248</td> <td align="left">-62</td> <td align="left">-68</td> </tr> <tr> <td>2</td> <td align="left">-58</td> <td align="left">-15</td> <td align="left">-13</td> </tr> <tr> <td>8</td> <td align="left">137</td> <td align="left">34</td> <td align="left">42</td> </tr> <tr> <td>8</td> <td align="left">236</td> <td align="left">59</td> <td align="left">67</td> </tr> <tr> <td>6</td> <td align="left">191</td> <td align="left">47</td> <td align="left">53</td> </tr> <tr> <td>0</td> <td align="left">53</td> <td align="left">13</td> <td align="left">13</td> </tr> <tr> <td>-3</td> <td align="left">-93</td> <td align="left">-24</td> <td align="left">-27</td> </tr> <tr> <td>-5</td> <td align="left">-161</td> <td align="left">-41</td> <td align="left">-46</td> </tr> <tr> <td>-4</td> <td align="left">-134</td> <td align="left">-34</td> <td align="left">-38</td> </tr> <tr> <td>-1</td> <td align="left">-44</td> <td align="left">-11</td> <td align="left">-12</td> </tr> <tr> <td>1</td> <td 
align="left">52</td> <td align="left">13</td> <td align="left">14</td> </tr> <tr> <td>1</td> <td align="left">94</td> <td align="left">23</td> <td align="left">24</td> </tr> <tr> <td>4</td> <td align="left">60</td> <td align="left">15</td> <td align="left">19</td> </tr> <tr> <td>2</td> <td align="left">17</td> <td align="left">4</td> <td align="left">6</td> </tr> <tr> <td>2</td> <td align="left">-24</td> <td align="left">-6</td> <td align="left">-4</td> </tr> <tr> <td>2</td> <td align="left">-26</td> <td align="left">-7</td> <td align="left">-5</td> </tr> <tr> <td>0</td> <td align="left">1</td> <td align="left">0</td> <td align="left">0</td> </tr> </tbody> </table><t>By lining up all these samples, we get the following input for the MD5 checksum calculation process:</t> <artwork><![CDATA[0x004F 6F4E 08C3 A6BC F32A 4335 0DE5 D2DA F40E 1813 06FC FB00]]></artwork><t>This indeed results in the MD5 checksum found in the streaminfo metadata block.</t> </section> </section> </section> <section numbered="false" anchor="acknowledgments"><name>Acknowledgments</name> <t>FLAC owes much to the many people who have advanced the audio compression field so freely. For instance:</t> <ul> <li><t><contact fullname="Tony Robinson"/>: He worked on Shorten, and his paper (see <xref target="Robinson-TR156"></xref>) is a good starting point on some of the basic methods used by FLAC. FLAC trivially extends and improves the fixed predictors, LPC coefficient quantization, and Rice coding used in Shorten.</t></li> <li><t><contact fullname="Solomon W. Golomb"/> and <contact fullname="Robert F. Rice"/>: Their universal codes are used by FLAC's entropy coder. See <xref target="Rice"></xref>.</t></li> <li><t><contact fullname="Norman Levinson"/> and <contact fullname="James Durbin"/>: The FLAC reference encoder uses an algorithm developed and refined by them for determining the LPC coefficients from the autocorrelation coefficients. 
See <xref target="Durbin"></xref>.</t></li> <li><t><contact fullname="Claude Shannon"/>: See <xref target="Shannon"></xref>.</t></li> </ul> <t>The FLAC format, the FLAC reference implementation <xref target="FLAC-implementation"/>, and the initial draft version of this document were originally developed by <contact fullname="Josh Coalson"/>. While many others have contributed since, this original effort is deeply appreciated. </t> </section> </back> </rfc>