Overview

Introduction

This document provides detailed delivery information for all accepted media and files for the iTunes Store, including music, music video, television, and movies. If further details are required, contact your iTunes Technical Representative.

Quality is important to us at iTunes. We expect to receive the highest-quality assets available. Our product must meet or exceed the quality of the physical product already out in the marketplace. For example, if 5.1 surround sound or closed captions exist on the physical version of the product, those must be provided. If the physical product gives the chapters actual names (as opposed to Chapter 1, Chapter 2, and so on), then our product should have those same chapter titles. If the album is in stereo, stereo audio must be provided.

Changes Made in this Release

Date/Version | Changes Made
October 4, 2021 - Version 5.3.8 | Corrected accepted audio sample rates. Clarified Dolby Atmos audio.

For a complete history of changes, see Previous Guide Revisions.

What’s New in the iTunes Video and Audio Asset Guide 5.3.8?

Music: Immersive Audio Source Profile Clarification

Dolby Atmos, 5.1, and 7.1 audio files generated automatically and/or algorithmically from a stereo master are not allowed. See Immersive Audio Source Profile for more requirements.

Music: Audio Profile Correction

Apple accepts audio with a sampling rate of 44.1, 48, 88.2, 96, 176.4, or 192 kHz with 16-bit or 24-bit resolution.

Music Audio Content Profiles

Music Audio Source Profile

Apple accepts audio with a sampling rate of 44.1, 48, 88.2, 96, 176.4, or 192 kHz with 16-bit or 24-bit resolution. Note that if stereo audio source exists, it must be used. See Apple Digital Masters Source Profile for audio requirements specific to Apple Digital Masters content.

Where stereo source is not available, as may be the case with certain vintage or field recordings, send audio source with two identical channels for left and right. Single-channel audio will not be accepted.
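
The acceptance rule above can be expressed as a small pre-delivery check. This is a minimal sketch, not Apple tooling: the function name is_accepted_music_source and its parameters are hypothetical, and it covers only the sample rate, bit depth, and channel count stated in this section.

```python
# Values taken from the Music Audio Source Profile text above.
ACCEPTED_SAMPLE_RATES_HZ = {44_100, 48_000, 88_200, 96_000, 176_400, 192_000}
ACCEPTED_BIT_DEPTHS = {16, 24}


def is_accepted_music_source(sample_rate_hz: int, bit_depth: int, channels: int) -> bool:
    """Return True if the parameters fit the stereo source profile (hypothetical helper)."""
    return (
        sample_rate_hz in ACCEPTED_SAMPLE_RATES_HZ
        and bit_depth in ACCEPTED_BIT_DEPTHS
        and channels == 2  # single-channel audio is not accepted
    )


print(is_accepted_music_source(96_000, 24, 2))  # True
print(is_accepted_music_source(44_100, 16, 1))  # False
```

A mono field recording would pass only after being delivered as two identical left and right channels, as described above.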

Uncompressed audio formats supported are:

Format | Container Type | Qualified CODEC
Pulse-Code Modulation (PCM) | WAV (.wav) |
Apple Lossless (ALAC) | M4A (.m4a) | QuickTime (https://www.apple.com/quicktime), Apple (https://www.apple.com/itunes)
Apple Lossless (ALAC) | CAF (.caf) | iTunes Producer
Free Lossless Audio Codec (FLAC) | FLAC (.flac) | FLAC (https://xiph.org/flac/)

All other audio formats will be rejected.

Important: All audio must be generated using a CODEC qualified and approved by Apple.

Apple Digital Masters Source Profile

The audio for Apple Digital Masters (formerly “Mastered for iTunes”) must follow these requirements to be badged as such in Apple's services:

  • Audio must be delivered at 24-bit resolution in an approved format.

  • Acceptable sample rates are 44.1, 48, 88.2, 96, 176.4, and 192 kHz.

For an in-depth description of Apple Digital Masters, refer to the technology brief here: Apple Digital Masters.

Best Practices for Apple Digital Masters Content

The following lists best practices for producing Apple Digital Masters content. Apple recommends communicating these best practices to your mastering house:

  • Source format must have been at least 24-bit with a minimum sample rate of 44.1 kHz. (Up-sampling and/or bit-padding of 44.1 kHz/16-bit files is not allowed.)

  • All masters must have been auditioned as encoded by the current Apple AAC encoder, using the “Apple Digital Masters Droplet,” the “RoundTripAAC” plug-in, or the Sonnox “Pro-Codec V2” plug-in that includes Apple's “iTunes +” AAC CODEC. Use these tools to set an appropriate level so that the encode doesn't show clipping.

  • Although Apple doesn't reject masters for specific numbers of clips, audible clipping caused by excessive levels to the encoder may be reason for tracks to not be badged and marketed as an “Apple Digital Master.”

  • The format of the masters must be 24-bit PCM at a sample rate of 44.1, 48, 88.2, 96, 176.4, or 192 kHz. Native resolution of project should be used — do not down-sample. (ALAC or FLAC lossless compression is acceptable.)

Immersive Audio Source Profile

The immersive source audio must meet the following requirements:

For Dolby Atmos music deliverables:

Note: Dolby Atmos audio files generated automatically and/or algorithmically from a stereo master are not allowed.

  • Dolby Atmos master file must be provided as a BWF ADM file using 24fps timecode for all tracks.

  • All audio must be 24-bit LPCM audio at 48 kHz.

  • All deliverables must be conformed and synced to the original stereo reference masters.

  • Target Loudness value should not exceed -18 LKFS measured as per ITU-R BS.1770-4.

  • True-peak level should not exceed -1 dB TP measured as per ITU-R BS.1770-4.

  • For albums where gapless playback is intended between tracks:

    • Each album track must be delivered as an individual BWF ADM file.

    • Each track boundary must be no more than half a frame (1,000 samples @ 48 kHz) earlier or later than the same track boundary in the corresponding stereo deliverable.

    • There must be no additional silence at the end of each track when compared to the same track from the corresponding stereo deliverable.
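
The gapless boundary tolerance above (half a frame, or 1,000 samples at 48 kHz) can be checked with exact arithmetic. The sketch below is illustrative only: the helper name boundary_in_tolerance and its parameters are hypothetical, and it assumes boundaries are known as sample offsets from the start of each deliverable.

```python
from fractions import Fraction

# Half a frame at 24 fps, i.e. 1,000 samples at 48 kHz.
TOLERANCE_SECONDS = Fraction(1_000, 48_000)


def boundary_in_tolerance(atmos_sample: int, stereo_sample: int,
                          stereo_rate_hz: int, atmos_rate_hz: int = 48_000) -> bool:
    """Compare one track boundary in both deliverables (hypothetical helper).

    Sample offsets are converted to exact seconds so that a stereo master at a
    different sample rate can be compared with the 48 kHz Atmos master.
    """
    atmos_time = Fraction(atmos_sample, atmos_rate_hz)
    stereo_time = Fraction(stereo_sample, stereo_rate_hz)
    return abs(atmos_time - stereo_time) <= TOLERANCE_SECONDS


# Stereo boundary at exactly 100 s in a 44.1 kHz master.
print(boundary_in_tolerance(4_800_900, 4_410_000, 44_100))  # True (900 samples late)
print(boundary_in_tolerance(4_801_001, 4_410_000, 44_100))  # False (1,001 samples late)
```

Using Fraction avoids floating-point rounding right at the edge of the tolerance.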

For 5.1 Surround and 7.1 Surround deliverables:

Note: 5.1 and 7.1 audio files generated automatically and/or algorithmically from a stereo master are not allowed.

  • Audio files must be provided as LPCM in .mov containers with channel assignments.

  • At least 48 kHz with 16-bit or 24-bit resolution.

  • Target Loudness value should not exceed -18 LKFS measured as per ITU-R BS.1770-4.

  • True-peak level should not exceed -1 dB TP measured as per ITU-R BS.1770-4.

  • All deliverables should be conformed and synced to the original stereo reference master.

Immersive Audio Channel Assignments

  • The QuickTime .mov file extension is expected for immersive 5.1 and 7.1 audio content.

  • For 7.1 surround audio, each audio channel must have an assignment, which must match one of the options in the table below. Option 1 shows one track with all eight channels and Option 2 shows one track for each channel. Note that “Rsl” (Rear Surround Left) can also be represented by “Rls” (Rear Left Surround).

    [Image: table of 7.1 audio channel assignments]

  • For 5.1 immersive audio, the channel assignments must match one of the options in the table below. For Option 1 (one track with all six channels), the order of the channel assignments can vary as noted in Option 1a, 1b, 1c, and 1d.

    [Image: table of 5.1 audio channel assignments]

    • For all stereo tracks in all options listed, the channel assignments can be indicated using L and R, or Lt and Rt, but not L and Rt, or R and Lt.

    Refer to Audio Channel Assignments for instructions on applying audio channel assignments and label descriptions.

Best Practices for Immersive Audio Content

Keep the following in mind when delivering immersive audio:

  • On initial delivery, you must deliver a data file with the role audio.2_0 containing the 2.0 stereo audio source unless you instruct Apple to downmix, using the audio.transform_to.2_0 attribute.

  • Only two audio sources per track (one stereo, one immersive) are allowed.

  • You cannot deliver two immersive audio sources or two stereo sources for the same track.

  • A single data file containing both 2.0 stereo source and 5.1 or 7.1 immersive source is not supported.

  • The difference between the most recently delivered (new or update) stereo source and immersive source durations must be less than or equal to 50 milliseconds.

  • All tracks in an album should be at the same frame rate.
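
Some of the constraints above lend themselves to an automated check before delivery. The following sketch is an assumption-laden illustration: check_track_sources and the ("stereo" | "immersive", duration in milliseconds) tuples are hypothetical and do not correspond to any Apple API or metadata structure.

```python
MAX_DURATION_DIFF_MS = 50  # stereo vs. immersive duration tolerance from the list above


def check_track_sources(sources: list[tuple[str, int]]) -> list[str]:
    """Return a list of problems for one track's audio sources (hypothetical helper).

    Each source is a (kind, duration_ms) pair, where kind is "stereo" or "immersive".
    """
    problems = []
    stereo = [duration for kind, duration in sources if kind == "stereo"]
    immersive = [duration for kind, duration in sources if kind == "immersive"]
    if len(stereo) > 1:
        problems.append("more than one stereo source")
    if len(immersive) > 1:
        problems.append("more than one immersive source")
    if len(stereo) == 1 and len(immersive) == 1:
        if abs(stereo[0] - immersive[0]) > MAX_DURATION_DIFF_MS:
            problems.append("stereo and immersive durations differ by more than 50 ms")
    return problems


print(check_track_sources([("stereo", 200_000), ("immersive", 200_030)]))  # []
```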

Ringtone Source Profile

  • Sampling rate of 44.1 kHz with 16-bit or 24-bit resolution, or 96 kHz with 24-bit resolution

  • Must be lossless

  • WAV, FLAC, or ALAC format

  • Minimum length is 5 seconds and the maximum length is 30 seconds

See the table in Music Audio Source Profile for the uncompressed audio formats that are supported. All other audio formats will be rejected. Note that if stereo audio source exists, it must be used.

Important: All audio must be generated using a CODEC qualified and approved by Apple.
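
As an illustration of the ringtone limits above, the sketch below checks the sample rate and bit depth combinations and the length range. The helper name is_accepted_ringtone is hypothetical, and the sampling-rate bullet is read here as allowing either 44.1 kHz (16-bit or 24-bit) or 96 kHz (24-bit).

```python
# (sample rate in Hz, bit depth) pairs read from the Ringtone Source Profile.
ACCEPTED_RINGTONE_AUDIO = {(44_100, 16), (44_100, 24), (96_000, 24)}
MIN_SECONDS, MAX_SECONDS = 5, 30


def is_accepted_ringtone(sample_rate_hz: int, bit_depth: int, duration_seconds: float) -> bool:
    """Return True if the audio parameters and length fit the profile (hypothetical helper)."""
    return (
        (sample_rate_hz, bit_depth) in ACCEPTED_RINGTONE_AUDIO
        and MIN_SECONDS <= duration_seconds <= MAX_SECONDS
    )


print(is_accepted_ringtone(44_100, 16, 29.5))  # True
print(is_accepted_ringtone(96_000, 16, 20))    # False
```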

Music Album Cover Art Profile

  • JPEG with .jpg extension (quality unconstrained) or PNG with .png extension

  • Color space: RGB (screen standard)

  • Recommended size for album cover art files is 3000 x 3000 pixels; minimum size is 1400 x 1400 pixels.

  • For ringtones, minimum size of 800 x 800 pixels. 1400 x 1400 pixels recommended for best results

  • Images must be square

  • File formats: JPEG or PNG (100% quality)

  • 1:1 aspect ratio

Do not increase the size of a smaller image to meet the minimum size standard. Excessively blurry or pixelated images will be rejected.

Important: CMYK (print standard) images will not be accepted.
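
The cover art constraints above can be sketched as a simple check on image properties. Everything named here is hypothetical: check_cover_art is not an Apple tool, and color_mode is assumed to be a string such as "RGB" or "CMYK" obtained from whatever image library you use.

```python
def check_cover_art(width: int, height: int, color_mode: str,
                    is_ringtone: bool = False) -> list[str]:
    """Return a list of problems with a cover art image (hypothetical helper)."""
    problems = []
    if width != height:
        problems.append("image is not square")
    minimum = 800 if is_ringtone else 1400  # minimum edge length in pixels
    if min(width, height) < minimum:
        problems.append(f"image is smaller than {minimum} x {minimum} pixels")
    if color_mode == "CMYK":
        problems.append("CMYK images are not accepted")
    return problems


print(check_cover_art(3000, 3000, "RGB"))  # []
```

The check cannot detect an image that was upscaled to reach the minimum size; that still requires inspecting the original artwork.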

Music Digital Booklet Profile

  • PDF format with .pdf extension

  • Four-page minimum

  • No more than 10 MB in size

  • All fonts embedded

  • 11 in x 8.264 in (28 cm x 21 cm)

  • RGB color

  • Horizontal presentation

  • All images full-bleed as shown in sample pages

Important: These booklets are expressly designed for the Apple format, and cannot be reproductions of the liner notes with borders to increase their size. Booklets are not available on Apple Music.

Content Considerations

  • When saving as PDF, make sure the document opens full screen with no negative space surrounding the document.

  • If the digital booklet is many pages, consider using fewer images or optimizing images to achieve lower overall file size.

  • Printer’s marks are not allowed.

  • You cannot sell or advertise other products or services. No other promotional sites are allowed.

  • No links to anything outside of the booklet, except to the artist and/or label website(s).

  • No time-sensitive information (for example, a promotion or dates for an upcoming tour or concert).

Digital booklet examples

Music Video Content Profiles

Music Video SD Source Profile

Note: Chaptering is not supported for music videos.

NTSC and PAL

  • ITU-R BT.601 color space, Long GOP

  • Accepted formats and dimensions:

Format | Bitrate | Encoded Dimensions | Display Dimensions
Apple ProRes 422 (HQ) | VBR: 40-60 Mbps | NTSC: 720 x 480 or 720 x 486 (1) | 853 x 480 for 16:9 content; 640 x 480 for 4:3 content
Apple ProRes 422 (HQ) | VBR: 40-60 Mbps | PAL: 720 x 576 | 1024 x 576 for 16:9 content; 768 x 576 for 4:3 content
MPEG-2 Program Stream Main Profile | 15 Mbps minimum | NTSC or PAL: 640 x height (2) | 640 x height (2)

(1) Content that has 720 x 486 encoded pixels should have a minimum of 4 pixels of black at the top and 2 at the bottom; crop values for top and bottom must total at least 6 pixels

(2) Height depends on aspect ratio of source with a maximum of 480 pixels

  • All encoded content must include pixel aspect ratio (pasp), preferably one that results in a display aspect ratio of 4:3 or 16:9.

  • Native frame rate of original source: 

    • 29.97 interlaced frames per second video source, delivered as either interlaced or de-interlaced properly tagged as progressive

    • 24, 25, or 30 frames per second sourced from film, delivered as progressive

    • 23.976 frames per second for inverse telecine, delivered as progressive; must not be delivered interlaced or delivery will fail

    • For mixed frame rate material, contact your iTunes Technical Representative

  • Interlaced content must be tagged non-progressive and field ordering must be defined in the stream

  • Field dominance must be properly tagged (top field first, bottom field first, or progressive)

  • Gamma values are accepted and the value must be between 2.15 and 2.25

  • Telecine materials will not be accepted

  • Video source may be delivered matted: letterbox, pillarbox, or windowbox.

    • If the SD source is matted, the SD source should be delivered in its full-frame state with metadata included to specify the crop rectangle. See Music Video Single Crop Dimensions Metadata Example in the iTunes Package Music Specification for details.

    • If the SD source file is not delivered matted or if there are no inactive pixels, Apple recommends setting all crop dimension attributes to '0' (zero).

    • 4:3 standard definition video should not be delivered anamorphic 16:9 with matting.

Important: All video must begin and end with at least one black frame. In addition, videos can only have empty edits in the last edit of the edit list; videos with empty edits other than the last edit will be blocked.
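
The SD display dimensions listed in this profile are consistent with scaling the active picture height by the display aspect ratio. The sketch below reproduces those numbers; the function name display_width is hypothetical and the truncation to whole pixels is an assumption that happens to match the listed values.

```python
from fractions import Fraction


def display_width(active_height: int, display_aspect: Fraction) -> int:
    """Square-pixel display width for a given height, truncated to whole pixels."""
    return int(active_height * display_aspect)


print(display_width(480, Fraction(16, 9)))  # 853  (NTSC 16:9)
print(display_width(480, Fraction(4, 3)))   # 640  (NTSC 4:3)
print(display_width(576, Fraction(16, 9)))  # 1024 (PAL 16:9)
print(display_width(576, Fraction(4, 3)))   # 768  (PAL 4:3)
```

For 720 x 486 content, the active height after the required crop of at least 6 pixels is 480, so the NTSC values apply.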

Music Video HD Source Profile

Note: Chaptering is not supported for music videos.

  • Apple ProRes 422 (HQ) or 4444 or 4444 (XQ)

  • VBR expected at ~220 Mbps

  • HD encoded dimensions accepted to support square pixel aspect ratios (PASP):

Encoded | PASP | Converted to ProRes From
1920 x 1080 | 1:1 | HDCAM SR, D5, ATSC
1280 x 720 | 1:1 | ATSC progressive

  • HD encoded dimensions accepted to support non-square pixel aspect ratios (this allows you to send HD video in the native dimensions of your best original source, for example in HD broadcast dimensions*):

Encoded | PASP | Converted to ProRes From
1440 x 1080 | 1:1.33333 | XDCAM-HD, HDCAM
1280 x 1080 | 1:1.5 | DVCProHD interlaced
960 x 720 | 1:1.33333 | DVCProHD progressive

* If your original HD source pixel aspect ratio is non-square, contact your iTunes Technical Representative before delivery.

  • Native frame rate of original source:

    • 29.97 or 25 interlaced frames per second for video-sourced material

    • 23.976, 24, 25, or 30 frames per second for digital-progressive or film-sourced material.

  • Gamma values are accepted and the value must be between 2.15 and 2.25

  • HD source may be delivered matted: letterbox, pillarbox, or windowbox.

    • The HD source may be delivered in its full-frame state with metadata included to specify the crop rectangle. See Music Video Single Crop Dimensions Metadata Example in the iTunes Package Music Specification for details.

    • If the HD source file is not delivered matted or if there are no inactive pixels, Apple recommends setting all crop dimension attributes to '0' (zero).

Important: All video must begin and end with at least one black frame. In addition, videos that begin with or contain empty edits will be blocked; the file can contain an empty edit in its edit list only if it is the last edit.
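
To see how the non-square HD dimensions in this profile map to standard display sizes, the sketch below multiplies the encoded width by the pixel aspect ratio. It is illustrative only: display_dimensions is a hypothetical helper, and the PASP values (listed as 1:1.33333 and 1:1.5) are read here as horizontal stretch factors of 4/3 and 3/2.

```python
from fractions import Fraction

# Encoded dimensions and PASP values from the non-square table in this profile.
NON_SQUARE_SOURCES = {
    (1440, 1080): Fraction(4, 3),  # listed as 1:1.33333
    (1280, 1080): Fraction(3, 2),  # listed as 1:1.5
    (960, 720): Fraction(4, 3),    # listed as 1:1.33333
}


def display_dimensions(encoded_width: int, encoded_height: int) -> tuple[int, int]:
    """Return display dimensions; unlisted sizes are treated as square-pixel (1:1)."""
    pasp = NON_SQUARE_SOURCES.get((encoded_width, encoded_height), Fraction(1))
    return int(encoded_width * pasp), encoded_height


print(display_dimensions(1440, 1080))  # (1920, 1080)
print(display_dimensions(960, 720))    # (1280, 720)
```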

Music Video HDR Source Profile

The HDR source video must meet the minimum requirements in the subsections below.

For both Dolby® Vision and HDR10

  • Display dimensions and PASP must match corresponding primary video display dimensions and PASP

  • HDR source video must have progressive scan and uniform frame rate

  • HDR source video track should only contain a single edit list entry

  • Duration must match corresponding primary video duration

  • Frame count must match corresponding primary video frame count

  • HDR format (Dolby® Vision or HDR10) specified as a source attribute

  • HDR video source that contains embedded audio will be accepted, but the audio will be ignored; the audio on the SDR source will be used for bundling.

  • Gamma values are accepted and the value must be between 2.15 and 2.25

  • HDR source may be delivered matted: letterbox, pillarbox, or windowbox.

    • The HDR source may be delivered in its full-frame state with metadata included to specify the crop rectangle. See Music Video Single Crop Dimensions Metadata Example in the iTunes Package Music Specification for details.

    • If the HDR source file is not delivered matted or if there are no inactive pixels, Apple recommends setting all crop dimension attributes to '0' (zero).

  • All video must begin and end with at least one black frame. In addition, videos that begin with or contain empty edits will be blocked; the file can contain an empty edit in its edit list only if it is the last edit. For HDR source video in Dolby® Vision format, the sidecar metadata file should cover these black frames.

For Dolby® Vision

  • Single Apple ProRes 4444 or 4444 XQ 12-bit file accompanied by a single Dolby® Vision CM metadata file (Dolby Vision CM version 2.9 and CM version 4.0 sidecar metadata files are supported)

  • Transfer function: SMPTE ST 2084 (PQ)

  • White point and color primaries: ITU-R BT.2020 or D65 P3

  • Transform matrix: BT.2020 (for BT.2020 primaries) or BT.709 (for D65 P3 primaries)

  • The Dolby® Vision CM metadata should not contain any gaps in the shots and the frames referenced in the shots should cover all frames in the video essence

For HDR10

  • Single Apple ProRes 4444 or 4444 XQ 12-bit file

  • Transfer function: SMPTE ST 2084 (PQ)

  • White point and color primaries: ITU-R BT.2020 or D65 P3

  • Transform matrix: BT.2020 (for BT.2020 primaries) or BT.709 (for D65 P3 primaries)

  • ST 2086 and MaxCLL / MaxFALL metadata provided as additional source attributes

Music Video 4K Source Profile

  • Dimensions should be 3840 x 2160 (UHD) or 4096 x 2160 (DCI 4K). Any DCI 4K asset can have optional crop values (Apple strongly recommends sending crop values for DCI 4K).

  • Apple ProRes 422 HQ or 4444 or 4444 XQ

  • VBR expected at ~880 Mbps for 422 HQ, ~1320 Mbps for 4444 and ~2000 Mbps for 4444 XQ

  • Content should be encoded using ITU-R BT.709 color space. For more information see http://www.itu.int/rec/R-REC-BT.709/en

  • Content should be delivered in the original frame rate of the source

  • 4K source must be progressive scan and can be delivered in 23.976, 24, 25, 29.97, or 30 frames per second

  • Gamma values are accepted and the value must be between 2.15 and 2.25

  • 4K source may be delivered matted: letterbox, pillarbox, or windowbox.

    • The 4K source may be delivered in its full-frame state with metadata included to specify the crop rectangle. See Music Video Single Crop Dimensions Metadata Example in the iTunes Package Music Specification for details.

    • If the 4K source file is not delivered matted or if there are no inactive pixels, Apple recommends setting all crop dimension attributes to '0' (zero).

  • All video must begin and end with at least one black frame. In addition, videos that begin with or contain empty edits will be blocked; the file can contain an empty edit in its edit list only if it is the last edit.

Music Video Audio Source Profile

If 5.1 Surround is available for a music video audio source, the audio should be delivered in 5.1 Surround in addition to providing a stereo version; otherwise the audio may be delivered in Stereo only.

Surround

  • LPCM in either Big Endian or Little Endian, 16-bit or 24-bit, at least 48 kHz

  • Expected channels: L, R, C, LFE, Ls, Rs

Stereo

  • MPEG-1 layer II stereo

  • 384 kbps

  • 48 kHz

  • Included in the same file as the delivered video

Music Video Audio/Video Container

  • Deliver all content in an MPEG-2 Program Stream file container

  • The .mpg file extension is expected for all MPEG-2 content

  • Audio must be delivered muxed with the video stream

Music Video Closed Captioning Profile

Note: Closed captioning can be sent with ProRes and MPEG-2 files.

  • Text in EIA 608 format.

  • Delivered in the same package with the video it references.

  • In a Scenarist SCC formatted file, using .scc file extension.

  • The timecode frame rate can only be 29.97 and is independent of your video source frame rate. The timecode format, however, must match the timecode format of the source video, either drop frame (DF) or non-drop frame (NDF).

    • Drop frame format has colons for the first two time delimiters and a semi-colon for the last time delimiter (HH:MM:SS;FF)

    • Non-drop frame format has colons for all the time delimiters (HH:MM:SS:FF)

Source Video Frame Rate | Closed Caption Frame Rate | Description | Timecode Format | Timecode Example
29.97, 59.94i | 29.97 | NTSC Video | DF | HH:MM:SS;FF
25, 50i | 29.97 | PAL Video | NDF | HH:MM:SS:FF
24 | 29.97 | Film | NDF | HH:MM:SS:FF
23.976 | 29.97 | NTSC Film | NDF | HH:MM:SS:FF

  • Captions should display and synchronize to within one second of the initial, audible dialog to be represented in text.

The timecodes of the captions are relative to the start of the program, and not the QuickTime movie's timecode track.
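
The drop frame and non-drop frame delimiters described above are easy to tell apart programmatically. The following is a minimal sketch with hypothetical names (timecode_format, caption_timecode_ok); the mapping mirrors only the table in this section, so a source whose own timecode format differs from the table still has to follow the matching rule stated in the text.

```python
import re

# HH:MM:SS;FF is drop frame, HH:MM:SS:FF is non-drop frame.
_TIMECODE = re.compile(r"(\d{2}):(\d{2}):(\d{2})([:;])(\d{2})")

# Expected caption timecode format per source frame rate, from the table above.
EXPECTED_FORMAT = {
    "29.97": "DF", "59.94i": "DF",
    "25": "NDF", "50i": "NDF",
    "24": "NDF", "23.976": "NDF",
}


def timecode_format(timecode: str) -> str:
    """Return "DF" or "NDF" based on the last delimiter of an SCC timecode."""
    match = _TIMECODE.fullmatch(timecode)
    if match is None:
        raise ValueError(f"not an SCC timecode: {timecode!r}")
    return "DF" if match.group(4) == ";" else "NDF"


def caption_timecode_ok(source_frame_rate: str, timecode: str) -> bool:
    """Check a caption timecode against the table entry for the source frame rate."""
    return timecode_format(timecode) == EXPECTED_FORMAT[source_frame_rate]


print(timecode_format("01:00:00;00"))                 # DF
print(caption_timecode_ok("23.976", "00:00:10:15"))   # True
```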

Currently, iTunes does not support EIA 708 (ATSC closed captioning) or Teletext.

MacCaption is a tool you can use to create .scc files: http://www.telestream.net. (Note that this product is not endorsed by Apple. Apple cannot and does not provide support for third-party products.)

Notes:

  • To test the closed captioning before delivering a video, see Import and preview captions in Compressor.

  • If closed caption data is available for any broadcast or web delivery system, it must be supplied to iTunes.

Music Video Screen Capture Image Profile

Note: Apple strongly recommends using the image_time attribute to deliver the timecode for the screen capture over sending an asset file. Apple will use the frame specified with the timecode to create the image for you. See Music Video Single Metadata Annotations in the iTunes Package Music Specification for details.

  • Screen capture directly from delivered video (specifying the timecode with the image_time attribute in XML is preferred)

  • 16:9 or 4:3 (for legacy videos) aspect ratio

  • Preferred size: 3840 x 2160 - 4K resolution

  • Minimum size (below this size will not be accepted): at least 1920 wide or 1080 high

  • JPEG with .jpg extension (quality unconstrained) or PNG with .png extension

  • RGB color profile (screen standard)

  • Only the active pixel area may be included (no letterboxing / black bars on outer portions of image to fulfill required aspect ratio)

Video screen capture will be rejected if it includes any of the following:

  • Motion blurred imagery

  • Low quality or pixelated imagery

  • Harsh crops of faces, arms, shoulders

  • Text containing the artist name or music video title

  • Third-party trademarks without authorization or usage rights

  • Unintentionally closed eyes

Excessively blurry or pixelated images will be rejected.

Important:

  • Do not increase the size of a smaller image to meet the minimum size standard.

  • CMYK color profile images will not be accepted.

  • Images must be taken directly from the video.

TTML File Format

Overview

This chapter describes the file format used to deliver song lyrics for an album song track (the file format does not apply to music video tracks). The file format is a restricted dialect of TTML Version 1.0 (http://www.w3.org/TR/ttaf1-dfxp/) and also includes some extensions used for Apple-specific properties. The lyrics file must be delivered in the correct format with the .ttml extension. Once the TTML document has been created and saved with the .ttml extension, it is delivered as a file in the <lyrics_file> block for a track.

Note: To understand this document, you should have some familiarity with TTML.

Overview of TTML for Lyrics

You send lyrics for songs in an album (you cannot send lyrics for music videos) in documents formatted according to the TTML format with extensions for features specific to iTunes. Each document contains the lyrics for one song and that file is referred to in the metadata.xml under a <track> tag using the <lyrics_file> tag (see Basic Music Album Metadata Example for how to deliver the lyrics file). A full example of a TTML file and annotations appear later in this chapter.

Note: You can still send lyrics in plain text format using the <lyrics> tag; however, sending lyrics in HTML format is being deprecated.

The following sections briefly describe the standard TTML content elements for lyrics. Throughout this chapter, the iTunes extension symbol indicates an iTunes-specific implementation or recommendation.

Lyrics Structure

The basic structure of a song is based on the standard TTML content elements.

Lyrics Content | TTML Element | Use
Song | <body> | The <body> element block encloses the tags used to deliver the lyrics.
Paragraphs | <div> | The <div> block encloses the tags used to deliver the lyrics for one paragraph (or line). iTunes extension symbol The <div> block can include a song-part attribute to note its role in the song. See the section below this table for more information.
Lines | <p> | Each line in the lyrics is enclosed within a <p> tag. The ttm:agent attribute can be added to any <p> and <span> element to indicate which artist is performing the contents. You can also apply timing to a line or word (see Timing).
Word (or words) | <span> | The <span> tag is used within a <p> tag to apply special conditions to a word or phrase, for example, bold or italic styles or the performer name singing the word.
Line break | <br> | iTunes extension symbol Line breaks are strongly discouraged. Instead, use <p> tags to delimit lines.

iTunes Extension for Lyrics

In addition to the standard TTML, additional extensions have been added for Apple-specific properties. You can use these extensions to specify:

  • words that are unclear or indecipherable

  • words that are explicit

  • different song parts to distinguish between verse and chorus, for example

iTunes extension symbol Unsure Words: When you are unsure of a word or have low confidence in the lyrics transcription for a word, you can mark that word or phrase by enclosing it within a <span> tag and adding the attribute itunes:unsure="true":

<span>Baby it's <span itunes:unsure="true">mold</span> outside</span>

iTunes extension symbol Explicit Words: When a word or phrase in a song is considered explicit, you can mark that word or phrase by enclosing it within a <span> tag and adding the attribute itunes:explicit="true":

<span>Baby it's <span itunes:explicit="true">explicit</span> outside</span>

iTunes extension symbol Song Parts: To distinguish between one song part and another, you can add the song-part attribute to a <div> element to note its role in the song:

<div itunes:song-part="Chorus">

You can use any value you want for the song-part attribute, but the preferred values are:

  • Verse

  • Chorus

  • PreChorus

  • Bridge

  • Intro

  • Outro

  • Refrain

  • Instrumental

Note: Currently, song parts are not displayed to the users.

Timing

You can add optional timing to the lyrics, by line or even by word, so that the lyrics can be synchronized to the timing of the actual words sung by the singers, much like the display of lyrics in karaoke.

Time values are expressed as a SMIL full- or partial-clock-value:

(Hours ":")? Minutes ":" Seconds ("." Fraction up to 3 digits)?

Specifying hours and fractions of seconds is optional (as indicated by the ?).
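
The clock-value grammar above can be turned into a small parser. This is a sketch under stated assumptions: clock_to_ms is a hypothetical helper, and it is more lenient than SMIL in that it does not enforce two-digit minutes and seconds or their 00-59 range.

```python
import re

# (Hours ":")? Minutes ":" Seconds ("." Fraction up to 3 digits)?
_CLOCK = re.compile(r"(?:(\d+):)?(\d+):(\d+)(?:\.(\d{1,3}))?")


def clock_to_ms(value: str) -> int:
    """Convert a full or partial clock value to milliseconds."""
    match = _CLOCK.fullmatch(value)
    if match is None:
        raise ValueError(f"not a clock value: {value!r}")
    hours, minutes, seconds, fraction = match.groups()
    whole_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    # ".5" means 500 ms, so pad the fraction on the right to three digits.
    millis = int(fraction.ljust(3, "0")) if fraction else 0
    return whole_seconds * 1000 + millis


print(clock_to_ms("00:05:01.120"))  # 301120
print(clock_to_ms("1:30.5"))        # 90500
```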

The following rules apply for timing:

  • Where timing occurs, both the begin and end attributes must be specified.

  • All begin times must appear in ascending time sequence.

  • All time codes must be valid.

  • The begin time code must be before the end time code in the same element.

  • Time codes of sub-elements must be within the time codes of the parent element.

  • Time codes must be within the duration of song.

  • If the lyrics file specifies a duration (the dur attribute on the <body> element), time codes must be within that duration.

  • The timing of <div> elements must not overlap (regardless of any agent attribute).
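
Several of the timing rules above can be checked mechanically. The sketch below covers only a subset (paired begin and end attributes, valid time codes, begin before end, and ascending begin times on <p> elements); timing_problems is a hypothetical helper built on Python's standard xml.etree.ElementTree module, not a validator supplied by Apple.

```python
import re
import xml.etree.ElementTree as ET

TT_NS = "{http://www.w3.org/ns/ttml}"
_CLOCK = re.compile(r"(?:(\d+):)?(\d+):(\d+)(?:\.(\d{1,3}))?")


def _to_ms(value: str) -> int:
    match = _CLOCK.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid time code: {value!r}")
    hours, minutes, seconds, fraction = match.groups()
    whole = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return whole * 1000 + int((fraction or "0").ljust(3, "0"))


def timing_problems(ttml_text: str) -> list[str]:
    """Check <p> elements against a subset of the timing rules."""
    problems = []
    previous_begin = None
    for p in ET.fromstring(ttml_text).iter(TT_NS + "p"):
        begin, end = p.get("begin"), p.get("end")
        if begin is None and end is None:
            continue  # timing is optional
        if begin is None or end is None:
            problems.append("begin and end must both be specified")
            continue
        try:
            begin_ms, end_ms = _to_ms(begin), _to_ms(end)
        except ValueError as error:
            problems.append(str(error))
            continue
        if begin_ms >= end_ms:
            problems.append(f"begin {begin} is not before end {end}")
        if previous_begin is not None and begin_ms < previous_begin:
            problems.append(f"begin {begin} breaks ascending order")
        previous_begin = begin_ms
    return problems
```

The remaining rules (containment within the parent element, the song duration, and non-overlapping <div> elements) would need the parent timings and the track length, which this sketch does not model.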

Text and Whitespace

White space in the document is processed according to rules defined by [XML 1.0], §2.10, White Space Handling. iTunes extension symbol Apple strongly recommends that you adhere to the xml:space="default" rules and refrain from specifying XML whitespace handling attributes.

Lyrics Example XML Delivery

Below is a partial example of a lyrics document file in TTML format.

<?xml version="1.0" encoding="UTF-8"?>
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling"
    xmlns:itunes="http://itunes.apple.com/lyric-ttml-extensions"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xml:lang="en-US">
  <head>
    <metadata>
      <ttm:title>Baby, It's Cold Outside</ttm:title>
    </metadata>
    <ttm:agent xml:id="voice1" type="person" itunes:adamid="73568">
      <ttm:name type="full">Ella Fitzgerald</ttm:name>
    </ttm:agent>
    <ttm:agent xml:id="voice2" type="person">
      <ttm:name type="full">Louis Jordan</ttm:name>
    </ttm:agent>
    <!-- Additional agent elements here as needed. -->
  </head>
  <body dur="00:05:01.120">
    <div itunes:song-part="Verse">
      <p begin="00:00:08.000" end="00:00:11.500" ttm:agent="voice1">
        I really can't stay
      </p>
      <p begin="00:00:09.000" end="00:00:14.000" ttm:agent="voice2">
        But, baby it's cold outside
      </p>
      <p begin="00:00:12.000" end="00:00:15.000" ttm:agent="voice1">
        I've got to go away
      </p>
    </div>
    <div itunes:song-part="Verse">
      <p begin="00:01:22.000" end="00:01:31.500" ttm:agent="voice2">
        Ooo, baby you're so <span tts:fontStyle="italic">delicious</span>
      </p>
      <p begin="00:01:30.00" end="00:01:31.800" ttm:agent="voice1">
        Well maybe just one little <span itunes:explicit="true">explicit</span> more
      </p>
      <p begin="00:01:32.000" end="00:01:33.500" ttm:agent="voice2">
        Never such a <span itunes:unsure="true">buzzard</span> before
      </p>
    </div>
    <!-- Additional div elements here as needed. -->
  </body>
</tt>
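
To illustrate how the agent references in a document like the one above resolve to performer names, the sketch below reads a TTML string with Python's standard xml.etree.ElementTree module. The function name lines_with_agents is hypothetical, and the code assumes ttm:agent is set on <p> elements only, as in the example.

```python
import xml.etree.ElementTree as ET

NS = {
    "tt": "http://www.w3.org/ns/ttml",
    "ttm": "http://www.w3.org/ns/ttml#metadata",
}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
AGENT_ATTR = "{http://www.w3.org/ns/ttml#metadata}agent"


def lines_with_agents(ttml_text: str) -> list[tuple[str, str]]:
    """Return (performer name, lyric line) pairs for every <p> element."""
    root = ET.fromstring(ttml_text)
    # Map each agent's xml:id to the name given in its ttm:name child.
    names = {
        agent.get(XML_ID): agent.findtext("ttm:name", default="", namespaces=NS)
        for agent in root.iterfind(".//ttm:agent", NS)
    }
    result = []
    for p in root.iterfind(".//tt:p", NS):
        # Collect text from the line and any nested <span>, collapsing whitespace.
        text = " ".join("".join(p.itertext()).split())
        result.append((names.get(p.get(AGENT_ATTR), ""), text))
    return result
```

Applied to the example document, the first pair would associate "I really can't stay" with the agent whose xml:id is voice1.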

Lyrics Example Annotations

The iTunes extension symbol in the annotations indicates an Apple-specific implementation or recommendation. The reference numbers (for example, §7.2.2) refer to the section in the TTML specification in which the element is explained.

<?xml version="1.0" encoding="UTF-8"?>

XML Declaration (required)

The character encoding of your document must be defined.

Apple only accepts UTF-8 encoding as it efficiently encodes non-Roman characters.

Important: The TTML file must not contain a byte-order mark (BOM).
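
The encoding requirements above (UTF-8, no byte-order mark) can be verified on the raw bytes of the file before delivery. A minimal sketch, with the hypothetical helper name encoding_problems:

```python
import codecs


def encoding_problems(data: bytes) -> list[str]:
    """Return a list of encoding problems found in the raw bytes of a TTML file."""
    problems = []
    if data.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 byte-order mark")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    return problems


print(encoding_problems(b'<?xml version="1.0" encoding="UTF-8"?><tt/>'))  # []
```

Reading the file in binary mode (for example with open(path, "rb")) matters here, because text-mode decoding may silently strip or reinterpret the BOM.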

<tt xmlns="http://www.w3.org/ns/ttml"

  xmlns:tts="http://www.w3.org/ns/ttml#styling"

  xmlns:itunes="http://itunes.apple.com/lyric-ttml-extensions"

  xmlns:ttm="http://www.w3.org/ns/ttml#metadata"

Document Container §7.1.1 (required)

The tt element begins the TTML document container. The xmlns (for XML namespace) attribute is required and is needed for schema validation. It is used to declare the namespace (and associated schema) to which the tags in the XML are expected to conform. The namespace must be http://www.w3.org/ns/ttml.

You must supply the XML namespace that allows the use of iTunes extensions: http://itunes.apple.com/lyric-ttml-extensions. Note that this example uses the prefix "itunes" (declared as xmlns:itunes); however, using "itunes" is not required, and you can use any prefix.

xml:lang="en-US">

Language §7.2.2 (required)

This is the same format that is described for use in the locale attribute in the Language Codes, and indicates the language and optional dialect used in the lyrics.

Head Element

<head>

Head §7.1.2 (required)

Begins the head element, which specifies document metadata, such as title, and other document-specific information, such as the artists performing.

Note: iTunes extension symbol Any styling and layout elements included in the Head element will be ignored by Apple processing. You can use bold and italic styles inline (see the annotation for the <p> tag below).

<metadata>

Metadata §8.1.1 (required)

Begins the metadata block.

  <ttm:title>Baby, It's Cold Outside</ttm:title>

</metadata>

Title §12.1.2 (required)

Supplies the title of the song for the lyrics you are delivering.

<ttm:agent xml:id="voice1" type="person" itunes:adamid="73568">

Agent §12.1.5 (optional)

The ttm:agent element is used to indicate all the artists who perform vocal parts of the lyrics. Later in the TTML document when you supply the lyrics, you can indicate which artist is performing a line or word(s) by supplying the value provided with the xml:id attribute. See the annotations for the <p> and <span> elements for how to use the ttm:agent and xml:id attributes.

The attributes supplied with the ttm:agent element include:

  • xml:id supplies the reference ID used to refer to the artist

  • type indicates the type of performer; the value must be "person"

  • itunes:adamid supplies the Apple ID of the artist

(iTunes extension) The agent attribute can only be specified at one element in the hierarchy. For example, if you specify agent in a parent element (div), you cannot specify agent in a child element (p).

<ttm:name type="full">Ella Fitzgerald</ttm:name>

</ttm:agent>

Name §12.1.6 (required)

ttm:name supplies the name and name type of the performer.

The type attribute is required and must have the value of "full".

Body Element

<body dur="00:05:01.120">

Body §7.1.3 (required)

Begins the body element, which is used to determine the lyrics and their timings. The <body> element is required, but the dur attribute is optional.

<div itunes:song-part="Verse">

Div §7.1.4 (required)

Begins the section within the document where you enter the actual text of the lyrics and the timing.

Each paragraph is represented by a <div> element and each lyrics document can include multiple consecutive <div> elements. (iTunes extension) Paragraphs identified with multiple <div> tags will not need a <p></br></p> to indicate a blank line.

(iTunes extension) The optional song-part attribute can be added to a <div> element to note its role in the song. You can use any value you want, but Apple prefers that you use the following values:

  • Verse

  • Chorus

  • PreChorus

  • Bridge

  • Intro

  • Outro

  • Refrain

  • Instrumental

The div in the annotated example contains three paragraph tags (<p>) and each example paragraph is explained in the rows that follow.

Note: Currently, song parts are not displayed to the users.

<p begin="00:00:08.000" end="00:00:11.500" ttm:agent="voice1">

  I really can't stay

</p>

<p begin="00:00:09.000" end="00:00:14.000" ttm:agent="voice2">

  But, baby it's cold outside

</p>

<p begin="00:00:12.000" end="00:00:15.000" ttm:agent="voice1">

  I've got to go away

</p>

</div>

Example Paragraph 1 §7.1.5

Each <p> element defines a period of time in which the lyrics in the line are sung.

Timing:

Adding the timing attributes (begin and end) is optional. The begin attribute indicates the start time of the line of the lyrics and the end attribute indicates when that line ends. For time value details and rules, see Timing.

You can also add timing to a word or phrase by enclosing the word or phrase in a <span> element:

<p begin="00:01:22.000" end="00:01:24.000">

But, baby it's <span begin="00:01:23.200" end="00:01:23.620">cold</span> outside

</p>

Agent:

When the lyrics are sung by different artists, you can specify which artist sings the line using the ttm:agent attribute within a <p> or <span> element. Each artist can be referenced by the reference ID (xml:id) you assigned in the Agent element (ttm:agent) within the Head element (see annotation above). When using the agent attribute, the following timing rules apply:

  • Timed <p> and <span> elements for different agents can overlap (however, the begin times must always appear in sequence.)

  • Timing intervals for a single agent (artist) must not overlap.
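The two timing rules above can be checked mechanically before delivery. The following is a minimal sketch, assuming timestamps in the HH:MM:SS.fff form used in the examples and assuming "in sequence" means that begin times never decrease in document order. The names parse_time and check_agent_timing, and the tuple layout, are hypothetical and not part of any Apple tooling.

```python
def parse_time(value):
    """Convert an HH:MM:SS.fff clock string to seconds."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def check_agent_timing(lines):
    """lines: list of (begin, end, agent) tuples in document order.

    Returns a list of problems; an empty list means both rules hold.
    """
    problems = []
    last_begin = None
    last_end_by_agent = {}
    for index, (begin, end, agent) in enumerate(lines, start=1):
        b, e = parse_time(begin), parse_time(end)
        # Rule 1: begin times must appear in sequence, even across agents.
        if last_begin is not None and b < last_begin:
            problems.append(f"line {index}: begin {begin} is out of sequence")
        # Rule 2: intervals for the same agent must not overlap.
        previous_end = last_end_by_agent.get(agent)
        if previous_end is not None and b < previous_end:
            problems.append(f"line {index}: overlaps an earlier line for {agent}")
        last_begin = b
        last_end_by_agent[agent] = max(e, previous_end or 0.0)
    return problems


# The three <p> elements from the first annotated example.
example = [
    ("00:00:08.000", "00:00:11.500", "voice1"),
    ("00:00:09.000", "00:00:14.000", "voice2"),
    ("00:00:12.000", "00:00:15.000", "voice1"),
]
print(check_agent_timing(example))
```

In this example the voice1 and voice2 lines overlap, which is permitted, while the two voice1 lines do not overlap each other.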

<div itunes:song-part="Verse">

<p begin="00:01:22.000" end="00:01:31.500" ttm:agent="voice2">

  Ooo, baby you're so <span tts:fontStyle="italic">delicious</span>

</p>

<p begin="00:01:30.00" end="00:01:31.800" ttm:agent="voice1">

  Well maybe just one little <span itunes:explicit="true">explicit</span> more

</p>

<p begin="00:01:32.000" end="00:01:33.500" ttm:agent="voice2">

  Never such a <span itunes:unsure="true">buzzard</span> before

</p>

</div>

Example Paragraph 2 §7.1.5

This example illustrates how to use <span> elements to apply a condition on a single word or phrase within a <p>. Use <span> elements for the following:

  • (iTunes extension) apply bold (tts:fontStyle="bold") or italics (tts:fontStyle="italic")

  • (iTunes extension) indicate that a word or phrase contains explicit content (itunes:explicit="true")

  • (iTunes extension) indicate that a word or phrase isn’t clear (itunes:unsure="true")

 </body>

</tt>

Television Content Profiles

ProRes File Requirements

Important: All ProRes files must contain valid integers to define field handling. Only the following field ("fiel" is displayed in the ProRes header) pair values are accepted by Apple:

  • field count = 1, field ordering = 0

  • field count = 2, field ordering = 1

  • field count = 2, field ordering = 6

  • field count = 2, field ordering = 9

  • field count = 2, field ordering = 14

Field count is the “fields” value shown in Atom Inspector for the fiel atom. Field ordering is the “detail” value shown in Atom Inspector for the fiel atom.
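As an illustration, the accepted pairs listed above can be expressed as a simple lookup. This is a minimal sketch: the function name is hypothetical, and reading the two integers out of the fiel atom is assumed to happen elsewhere (for example, by copying the values shown in Atom Inspector).

```python
# Accepted (field count, field ordering) pairs, copied from the list above.
ACCEPTED_FIEL_PAIRS = {(1, 0), (2, 1), (2, 6), (2, 9), (2, 14)}


def is_accepted_fiel(field_count, field_ordering):
    """Return True if the pair is one of the combinations listed as accepted."""
    return (field_count, field_ordering) in ACCEPTED_FIEL_PAIRS


print(is_accepted_fiel(2, 9))
print(is_accepted_fiel(1, 1))
```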

HD TV Source Profile

  • Apple ProRes 422 (HQ) or 4444 or 4444 (XQ)

  • ITU-R BT.709 color space, file tagged correctly as 709

  • VBR expected at 88-220 Mbps

  • HD encoded dimensions accepted to support square pixel aspect ratios (PASP):

Encoded     | PASP | Converted to ProRes From
1920 x 1080 | 1:1  | HDCAM SR, D5, ATSC
1280 x 720  | 1:1  | ATSC progressive

  • HD encoded dimensions accepted to support non-square pixel aspect ratios (this allows you to send HD video in the native dimensions of your best original source, for example in HD broadcast dimensions*):

Encoded     | PASP      | Converted to ProRes From
1440 x 1080 | 1:1.33333 | XDCAM-HD, HDCAM
1280 x 1080 | 1:1.5     | DVCProHD interlaced
960 x 720   | 1:1.33333 | DVCProHD progressive

* If your original HD source pixel aspect ratio is non-square, contact your iTunes Technical Representative before delivery.
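To see how the non-square entries relate to standard HD display sizes, the sketch below multiplies the encoded width by the pixel aspect ratio. It assumes that the second number in each PASP value in the table (for example, 1.33333 in 1:1.33333) is the horizontal stretch factor. The function name is hypothetical.

```python
def display_width(encoded_width, horizontal_scale):
    """Approximate display width for a non-square-pixel encode.

    horizontal_scale is assumed to be the second number of the PASP value
    in the table above (for example, 1.33333 for "1:1.33333").
    """
    return round(encoded_width * horizontal_scale)


# Rows from the non-square table above: (encoded width, encoded height, scale).
for width, height, scale in [(1440, 1080, 1.33333), (1280, 1080, 1.5), (960, 720, 1.33333)]:
    print(f"{width} x {height} displays at {display_width(width, scale)} x {height}")
```

Under this assumption, the three encoded sizes map to 1920 x 1080, 1920 x 1080, and 1280 x 720 respectively.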

  • 23.976, 24, 25, 29.97 frame rates are supported

  • Native frame rate of original source: 

    • 29.97 interlaced frames per second video source can be delivered either interlaced or de-interlaced and properly tagged as progressive

    • 23.976, 24, and 25 frames per second must be delivered progressive

    • Field dominance must be properly tagged (top field first, bottom field first, or progressive)

    • Fields and frames may not be duplicated or eliminated to create a broadcast frame rate (for example, telecine, NTSC to PAL conversion)

    • For mixed frame rate material, contact your iTunes Technical Representative

  • Interlaced content must be correctly tagged as interlaced and field ordering must be defined in the QuickTime container.

  • Crop dimensions should be supplied in the metadata for content with inactive pixels due to letterbox, pillarbox, or windowbox. Refer to Basic TV Metadata Example in the iTunes Package TV Specification for further information.

  • Content upscaled from SD will be rejected.

SD TV Source Profile

NTSC

Apple ProRes:

  • Apple ProRes 422 (HQ)

  • VBR expected at 40-60 Mbps

  • 720 x 480 or 720 x 486 encoded pixels; for display at either 853 x 480 for 16:9 content or 640 x 480 for 4:3 content. Properly created 720 x 486 content will have a minimum of 4 pixels of black at the top and 2 at the bottom. Crop values for top and bottom must total at least 6 pixels.

  • All encoded content must include pixel aspect ratio (pasp), preferably one that results in a display aspect ratio of 4:3 or 16:9.

  • Native frame rate of original source:

    • 29.97 frames per second video source can be delivered interlaced.

    • 24 frames per second must be delivered progressive.

    • 23.976 frames per second for inverse telecine must be delivered progressive; must not be delivered interlaced or delivery will fail.

    • Telecine materials will not be accepted.

  • Crop dimensions should be supplied in the metadata for content with inactive pixels due to letterbox, pillarbox, or windowbox. Refer to Basic TV Metadata Example in the iTunes Package TV Specification for further information.

  • 4:3 standard definition video should not be delivered anamorphic 16:9 with matting.

MPEG-2:

  • MPEG-2 Program Stream Main Profile

  • 4:2:0 chroma sampling

  • ITU-R BT.601 color space

  • 15 Mbps minimum

  • Long GOP

  • 640 fixed horizontal dimension

  • Variable size vertical dimension depending on aspect ratio of source, maximum size of 480

  • Square pixel aspect ratio (1:1)

  • Native frame rate of original source:

    • 29.97 interlaced frames per second video source can be delivered either interlaced or de-interlaced and properly tagged as progressive

    • 24 frames per second must be delivered progressive

    • 23.976 frames per second for inverse telecine must be delivered progressive; must not be delivered interlaced or delivery will fail

    • Field dominance must be properly tagged (top field first, bottom field first, or progressive)

    • Fields and frames may not be duplicated or eliminated to create a broadcast frame rate (for example, telecine, NTSC to PAL conversion)

    • For mixed frame rate material, contact your iTunes Technical Representative

  • Interlaced content must be tagged non-progressive and field ordering must be defined in the stream.

  • Crop inactive pixels and maintain fields. All edges must have active pixels for greater than 90% of the duration of the video.

  • Content may NOT be delivered letterbox, pillarbox, or windowbox.

PAL

Apple ProRes:

  • Apple ProRes 422 (HQ)

  • VBR expected at 40-60 Mbps

  • 720 x 576 encoded pixels; for display at either 1024 x 576 for 16:9 content or 768 x 576 for 4:3 content.

  • All encoded content must include pixel aspect ratio (pasp), preferably one that results in a display aspect ratio of 4:3 or 16:9.

  • Native frame rate of original source:

    • 25 frames per second video source can be delivered interlaced or de-interlaced and properly tagged as progressive.

    • Telecine materials will not be accepted.

  • Crop dimensions should be supplied in the metadata for content with inactive pixels due to letterbox, pillarbox, or windowbox. Refer to Basic TV Metadata Example in the iTunes Package TV Specification for further information.

  • 4:3 standard definition video should not be delivered anamorphic 16:9 with matting.

MPEG-2:

  • MPEG-2 Program Stream Main Profile

  • 4:2:0 chroma sampling

  • ITU-R BT.601 color space

  • 15 Mbps minimum

  • Long GOP

  • 640 fixed horizontal dimension

  • Variable size vertical dimension depending on aspect ratio of source, maximum size of 480

  • Square pixel aspect ratio (1:1)

  • Native frame rate of original source:

    • 25 interlaced frames per second sourced from video must be delivered de-interlaced and properly tagged as progressive

    • 24 and 25 frames per second sourced from film must be delivered progressive

    • 23.976 frames per second for inverse telecine must be delivered progressive; must not be delivered interlaced or delivery will fail

    • Field dominance must be properly tagged (top field first, bottom field first, or progressive)

    • Interlaced materials will not be accepted

    • Fields and frames may not be duplicated or eliminated to create a broadcast frame rate (for example, telecine, NTSC to PAL conversion)

    • For mixed frame rate material, contact your iTunes Technical Representative

  • Crop inactive pixels. All edges must have active pixels for greater than 90% of the duration of the video.

  • Content may NOT be delivered letterbox, pillarbox, or windowbox.

Important: All video must begin and end with at least one black frame. In addition, videos can only have empty edits in the last edit of the edit list; videos with empty edits other than the last edit will be blocked.
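The empty-edit rule can be sketched as a check over the entries of an edit list. This sketch assumes a QuickTime edit list in which an empty edit is an entry whose media time is -1, and that the media-time values have already been extracted from the file. The function name is hypothetical.

```python
EMPTY_EDIT = -1  # media time value that marks an empty edit in a QuickTime edit list


def empty_edits_allowed(media_times):
    """media_times: media-time values of the edit list entries, in order.

    Returns True if no entry other than the last one is an empty edit.
    """
    return all(t != EMPTY_EDIT for t in media_times[:-1])


print(empty_edits_allowed([0, EMPTY_EDIT]))
print(empty_edits_allowed([EMPTY_EDIT, 0]))
```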

TV Audio Source Profile

MPEG-2 Program Stream Container

Stereo

  • MPEG-1 layer II

  • 384 kbps

  • 48 kHz

  • Included in the same file as the delivered video

QuickTime Container

5.1 Surround

Note: Audio track channels for 5.1 Surround can either be all 24 bit or all 16 bit. An audio track cannot be a combination of 16 bit and 24 bit channels.

  • LPCM in either Big Endian or Little Endian, 16-bit or 24-bit, at least 48kHz

  • Expected channels: L, R, C, LFE, Ls, Rs

Stereo

  • LPCM in either Big Endian or Little Endian, 16-bit or 24-bit, at least 48kHz

  • Expected Dolby Pro Logic channels: Lt, Rt or expected stereo channels: L, R

TV Audio/Video Container

MPEG-2 Program Stream Container

  • Deliver all content in an MPEG-2 Program Stream file container.

  • The .mpg file extension is expected for all MPEG-2 content.

  • Audio must be delivered muxed with the video stream.

QuickTime Container

  • Deliver all content in a QuickTime .mov file container.

  • The QuickTime .mov file extension is expected for all audio and video content.

  • For 5.1 surround audio, each audio channel must have an assignment. The channel assignments must match one of the options in the table below. For Option 1 (one track with all six channels), the order of the channel assignments can vary as noted in Option 1a, 1b, 1c, and 1d. Note that "Lt" and "Rt" are only used for Dolby matrix audio mixdown.

[Table: 5.1 audio channel assignments]

  • For all stereo tracks in all options listed, the channel assignments can be indicated using L and R, or Lt and Rt, but not L and Rt, or R and Lt.

Important: Refer to Audio Channel Assignments for instructions on applying audio channel assignments and label descriptions.

TV Closed Captioning Profile

Note: Closed captioning can be sent with ProRes and MPEG-2 files.

  • Text in EIA 608 format.

  • Delivered in the same package with the video it references.

  • In a Scenarist SCC formatted file, using .scc file extension.

  • The timecode frame rate can only be 29.97 and is independent of your video source frame rate. The timecode format, however, must match the timecode format of the source video, either drop frame (DF) or non-drop frame (NDF).

    • Drop frame format has colons for the first two time delimiters and a semi-colon for the last time delimiter (HH:MM:SS;FF)

    • Non-drop frame format has colons for all the time delimiters (HH:MM:SS:FF)

Source Video Frame Rate | Closed Caption Frame Rate | Description | Timecode Format | Timecode Example
29.97, 59.94i           | 29.97                     | NTSC Video  | DF              | HH:MM:SS;FF
25, 50i                 | 29.97                     | PAL Video   | NDF             | HH:MM:SS:FF
24                      | 29.97                     | Film        | NDF             | HH:MM:SS:FF
23.976                  | 29.97                     | NTSC Film   | NDF             | HH:MM:SS:FF

  • Captions should display and synchronize to within one second of the initial, audible dialog to be represented in text.

The timecodes of the captions are relative to the start of the program, and not the QuickTime movie's timecode track.
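The delimiter rule for drop frame and non-drop frame timecodes can be illustrated with a small classifier. This is a minimal sketch: the function names are hypothetical, and the mapping from source frame rate to expected format is copied from the table above as a default only, since the governing rule is that the caption timecode format must match that of the source video.

```python
import re

# HH:MM:SS:FF (non-drop frame) or HH:MM:SS;FF (drop frame).
TIMECODE = re.compile(r"^\d{2}:\d{2}:\d{2}([:;])\d{2}$")

# Expected timecode format per source frame rate, copied from the table above.
EXPECTED_FORMAT = {
    "29.97": "DF", "59.94i": "DF",
    "25": "NDF", "50i": "NDF",
    "24": "NDF", "23.976": "NDF",
}


def timecode_format(timecode):
    """Return 'DF', 'NDF', or None if the string is not a valid timecode."""
    match = TIMECODE.match(timecode)
    if match is None:
        return None
    return "DF" if match.group(1) == ";" else "NDF"


def caption_timecode_matches(source_frame_rate, timecode):
    """Check a caption timecode against the format the table lists for the source."""
    expected = EXPECTED_FORMAT.get(source_frame_rate)
    return expected is not None and timecode_format(timecode) == expected


print(timecode_format("01:00:00;00"))
print(caption_timecode_matches("25", "00:00:10:12"))
```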

Currently, iTunes does not support EIA 708 (ATSC closed captioning) or Teletext.

MacCaption is a tool you can use to create .scc files: http://www.telestream.net. (Note that this product is not endorsed by Apple. Apple cannot and does not provide support for third-party products.)

Notes:

  • To test the closed captioning before delivering a video, see Import and preview captions in Compressor.

  • If closed caption data is available for any broadcast or web delivery system, it must be supplied to iTunes.

TV Cover Art Profile

1:1 Cover Art

  • JPEG with .jpg extension (quality unconstrained), PNG with .png extension, or LSR with .lsr extension

  • Color space: RGB (screen standard)

  • LSR files:

    • must have a minimum size of 3000 x 3000 pixels

    • must have a minimum of two layers (five layers maximum)

    • each image within the layered LSR file must have a unique name

    • each image within the layered LSR file must be in PNG format

  • JPG or PNG files should be a minimum size of 3000 x 3000 pixels.

  • 72 dpi minimum resolution

  • 1:1 aspect ratio

  • No nudity or graphic material

  • No promotional material, including URLs or bugs

Do not increase the size of a smaller image to meet the minimum size standard. Excessively blurry or pixelated images will be rejected.

Important: CMYK (print standard) images will not be accepted.

Display P3 Cover Art

  • LSR with .lsr extension or PNG with .png extension (Display P3 Color Profile must be embedded)

  • Color space: Display P3

  • Color mode: RGB

  • Color depth: 16 bits per channel

  • Size: 3000 x 3000 pixels

  • Resolution: 72 dpi minimum

  • No nudity or graphic material

  • No promotional material, including URLs or bugs

16:9 Cover Art

Cover art for 16:9:

  • LSR with .lsr extension or PNG with .png extension

  • Minimum size: 1920 x 1080 pixels; 3840 x 2160 pixels preferred

  • Aspect ratio: 1.75d to 1.80d

  • No nudity or graphic material

  • No promotional material, including URLs or bugs
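The size and aspect ratio limits for 16:9 cover art can be checked with simple arithmetic. The sketch below assumes that the aspect ratio is width divided by height, that both range endpoints are inclusive, and that the minimum size applies to each dimension separately. The function name is hypothetical.

```python
def is_valid_16x9_cover(width, height):
    """Check pixel dimensions against the 16:9 cover art limits listed above."""
    if width < 1920 or height < 1080:
        return False  # below the minimum size
    ratio = width / height
    return 1.75 <= ratio <= 1.80


print(is_valid_16x9_cover(3840, 2160))
print(is_valid_16x9_cover(1920, 1200))
```

Both 1920 x 1080 and 3840 x 2160 have a ratio of about 1.778, which falls inside the stated range.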

TV Backdrop Art and Content Logos

Tall Backdrop Art

  • PNG with .png extension

  • Exact size: 1680 x 3636 pixels

Wide Backdrop Art

  • PNG with .png extension

  • Exact size: 4320 x 3240 pixels

Logo Art (Single-color or full-color)

  • PNG with .png extension

  • Exact size: 4320 x 1300 pixels

TV Content Considerations

  • No bugs or logos should be visible during the body of the video.

  • No tune-ins should be visible during the body of the video. Tune-ins are only acceptable at the end of the video.

  • No ratings or advisories should be displayed at any time during the video.

  • Network cards at the beginning and end of the video are accepted as long as they are visible for less than five (5) seconds.

  • Commercials or other promotional material, including URLs, are NOT accepted. For more details, please contact your iTunes Technical Representative.

  • Commercial black may be a maximum of 5 seconds.

  • Previews must contain content suitable for a general audience.

  • Previews must not have opening or ending credits and should not start on a black frame.

  • Previews should be unique for each episode in a season.

  • Previews should not contain any spoilers.

  • A minimum of 1 black frame at the beginning and end of each video is required.

Film Content Profiles

ProRes File Requirements

Important: All ProRes files must contain valid integers to define field handling. Only the following field ("fiel" is displayed in the ProRes header) pair values are accepted by Apple:

  • field count = 1, field ordering = 0

  • field count = 2, field ordering = 1

  • field count = 2, field ordering = 6

  • field count = 2, field ordering = 9

  • field count = 2, field ordering = 14

Field count is the “fields” value shown in Atom Inspector for the fiel atom. Field ordering is the “detail” value shown in Atom Inspector for the fiel atom.

Film HD Source Profile

  • Apple ProRes 422 (HQ) or 4444 or 4444 (XQ)

  • ITU-R BT.709 color space, file tagged correctly as 709

  • VBR expected at ~220 Mbps

  • HD encoded dimensions accepted to support square pixel aspect ratios (PASP):

Encoded     | PASP | Converted to ProRes From
1920 x 1080 | 1:1  | HDCAM SR, D5, ATSC

  • HD encoded dimensions accepted to support non-square pixel aspect ratios (this allows you to send HD video in the native dimensions of your best original source, for example in HD broadcast dimensions*):

Encoded     | PASP      | Converted to ProRes From
1440 x 1080 | 1:1.33333 | XDCAM-HD, HDCAM
1280 x 1080 | 1:1.5     | DVCProHD interlaced

* If your original HD source pixel aspect ratio is non-square, contact your iTunes Technical Representative before delivery.

  • Native frame rate of original source:

    • 29.97 or 25 interlaced frames per second for video-sourced material.

    • 23.976, 24, or 25 frames per second for digital-progressive or film-sourced material.

  • Content may be delivered matted: letterbox, pillarbox, or windowbox (with proper corresponding crop values in the metadata package).

  • Content upscaled from SD will be rejected.

Important: All video must begin and end with at least one black frame. In addition, videos can only have empty edits in the last edit of the edit list; videos with empty edits other than the last edit will be blocked.

Film HDR Source Profile

The HDR source video must meet the minimum requirements in the subsections below.

For both Dolby® Vision and HDR10

  • Display dimensions and PASP must match corresponding primary video display dimensions and PASP

  • HDR source video must have progressive scan and uniform frame rate

  • HDR source video track should only contain a single edit list entry

  • Duration must match corresponding primary video duration

  • Frame count must match corresponding primary video frame count

  • HDR format (Dolby® Vision or HDR10) specified as a source attribute

  • HDR video source that contains embedded audio will be accepted, but the audio will be ignored; the audio on the SDR source will be used for bundling.

For Dolby® Vision

  • Single Apple ProRes 4444 or 4444 XQ 12-bit file accompanied by a single Dolby® Vision CM metadata file (Dolby Vision CM version 2.9 and CM version 4.0 sidecar metadata files are supported)

  • Transfer function: SMPTE ST 2084 (PQ)

  • White point and color primaries: ITU-R BT.2020 or D65 P3

  • Transform matrix: BT.2020 (for BT.2020 primaries) or BT.709 (for D65 P3 primaries)

  • The Dolby® Vision CM metadata should not contain any gaps in the shots and the frames referenced in the shots should cover all frames in the video essence

For HDR10

  • Single Apple ProRes 4444 or 4444 XQ 12-bit file

  • Transfer function: SMPTE ST 2084 (PQ)

  • White point and color primaries: ITU-R BT.2020 or D65 P3

  • Transform matrix: BT.2020 (for BT.2020 primaries) or BT.709 (for D65 P3 primaries)

  • ST 2086 and MaxCLL / MaxFALL metadata provided as additional source attributes

4K Source Profile

  • Dimensions should be 3840 x 2160 (UHD) or 4096 x 2160 (DCI 4K). Any DCI 4K asset can have optional crop values (Apple strongly recommends sending crop values for DCI 4K).

  • Apple ProRes 422 HQ or 4444 or 4444 XQ

  • VBR expected at ~880 Mbps for 422 HQ, ~1320 Mbps for 4444 and ~2000 Mbps for 4444 XQ

  • Content should be encoded using ITU-R BT.709 color space. For more information see http://www.itu.int/rec/R-REC-BT.709/en

  • Content should be delivered in the original frame rate of the source

  • 4K source must be progressive scan and can be delivered in 23.976, 24, 25, 29.97, or 30 frames per second

  • If the 4K source file is not delivered matted or if there are no inactive pixels, Apple recommends setting all crop dimension attributes to '0' (zero).

  • All video must begin and end with at least one black frame. In addition, videos that begin with or contain empty edits will be blocked; the file can contain an empty edit in its edit list only if it is the last edit. For HDR source video in Dolby® Vision format, the sidecar metadata file should cover these black frames.
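For delivery planning, the expected data rates listed above translate into very large files. The following rough estimate assumes that Mbps means 10^6 bits per second, that sizes are in decimal gigabytes, and that the nominal rate holds for the whole program. Because the encode is VBR, actual sizes will differ. The names are hypothetical.

```python
# Nominal 4K data rates in Mbps, copied from the list above.
NOMINAL_MBPS = {"422 HQ": 880, "4444": 1320, "4444 XQ": 2000}


def approx_size_gb(codec, duration_minutes):
    """Rough file size in decimal gigabytes at the nominal data rate."""
    bits = NOMINAL_MBPS[codec] * 1_000_000 * duration_minutes * 60
    return bits / 8 / 1_000_000_000


for codec in NOMINAL_MBPS:
    print(f"{codec}: about {approx_size_gb(codec, 120):.0f} GB for a 120-minute program")
```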

Film SD Source Profile

NTSC

  • Apple ProRes 422 (HQ)

  • VBR expected at 40-60 Mbps

  • 720 x 480 or 720 x 486 encoded pixels; for display at either 853 x 480 for 16:9 content or 640 x 480 for 4:3 content. Properly created 720 x 486 content will have a minimum of 4 pixels of black at the top and 2 at the bottom. Crop values for top and bottom must total at least 6 pixels.

  • All encoded content must include pixel aspect ratio (pasp), preferably one that results in a display aspect ratio of 4:3 or 16:9.

  • Native frame rate of original source:

    • 29.97 frames per second video source can be delivered interlaced

    • 24 frames per second must be delivered progressive

    • 23.976 frames per second for inverse telecine must be delivered progressive; must not be delivered interlaced or delivery will fail

    • Telecine materials will not be accepted

  • Content may be delivered matted: letterbox, pillarbox, or windowbox.

  • 4:3 standard definition video should not be delivered anamorphic 16:9 with matting. For example, the minimum post-cropped display must equal or exceed 768 x 576.

PAL

  • Apple ProRes 422 (HQ)

  • VBR expected at 40-60 Mbps

  • 720 x 576 encoded pixels; for display at either 1024 x 576 for 16:9 content or 768 x 576 for 4:3 content

  • All encoded content must include pixel aspect ratio (pasp), preferably one that results in a display aspect ratio of 4:3 or 16:9.

  • Native frame rate of original source:

    • 24 and 25 frames per second sourced from film must be delivered progressive

    • 23.976 frames per second for inverse telecine must be delivered progressive; must not be delivered interlaced or delivery will fail

    • Telecine materials will not be accepted

  • Content may be delivered matted: letterbox, pillarbox, or windowbox.

  • 4:3 standard definition video should not be delivered anamorphic 16:9 with matting. For example, the minimum post-cropped display must equal or exceed 768 x 576.

Important: All video must begin and end with at least one black frame. In addition, videos that begin with or contain empty edits will be blocked; the file can contain an empty edit in its edit list only if it is the last edit.

Film Audio Source Profile

For every film for which 5.1 or 7.1 Surround audio is available in any competing format or market, that audio must be provided to Apple in addition to the stereo tracks.

Note: Audio track channels for 5.1 and 7.1 Surround can either be all 24 bit or all 16 bit. An audio track cannot be a combination of 16 bit and 24 bit channels.

7.1 Surround

  • LPCM in either Big Endian or Little Endian, 16-bit or 24-bit, at least 48kHz

  • Expected channels: L, R, C, LFE, Ls, Rs, Rls (or Lrs), Rrs

5.1 Surround

  • LPCM in either Big Endian or Little Endian, 16-bit or 24-bit, at least 48kHz

  • Expected channels: L, R, C, LFE, Ls, Rs

Stereo

  • LPCM in either Big Endian or Little Endian, 16-bit or 24-bit, at least 48kHz

  • Expected Dolby Pro Logic channels: Lt, Rt or expected stereo channels: L, R

Dolby Atmos Audio Source Profile

  • The audio mix must have been approved for home listening and monitored in a room with at least a 7.1.4 speaker layout.

  • If the Dolby Atmos master can be used for derivation of legacy audio assets (7.1ch, 5.1ch and/or stereo LtRt audio), these legacy renders must be approved as well.

  • All audio deliverables should be conformed and synced to final picture as long-play and not as separate reels.

  • Leader and sync pop should be removed from the Dolby Atmos master file.

  • The Dolby Atmos master file must be provided as a BWF ADM file.

  • All audio tracks in the file must be 24-bit LPCM audio at 48kHz.

  • There must not be more than 128 individual audio tracks.

  • Tracks 1-128 may be used for objects or beds.

  • Mastering information (for example, desired artistic compression profiles, downmix mix levels, etc.) should be correctly authored in the DBMD section of the BWF ADM file.

  • The average loudness of dialogue/speech of the Dolby Atmos master must be within the range of -31 LKFS to -10 LKFS, and should ideally be between -30 LKFS and -18 LKFS. The following measurement methodology should be used:

    • Run the loudness measurement on a 5.1 channel render of the full mix.

    • Use the measurement algorithm BS.1770 + Dialogue Intelligence (or other speech-gating algorithm) to measure the dialogue-gated loudness, integrated over the full duration of the asset, to verify it falls within the specified range indicated above.

    • Monitor the reported percentage of dialogue. If it is less than 10%, dialogue may not be the anchor element for loudness correction of this audio asset. You should instead follow the procedure in the paragraph below.

  • For Dolby Atmos masters where dialogue is not the anchor element (for example, music assets), the average audio loudness of the Dolby Atmos master must be within the range of -31 LKFS to -5 LKFS. The following measurement methodology should be used:

    • Run the loudness measurement on a 5.1 channel render of the full mix.

    • Use the BS.1770-4 (or BS.1770-3) measurement algorithm to measure the full-program mix (all channels over the entire length of the asset) to verify that loudness falls within the specified range indicated above.
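Once a loudness value has been measured with the methodology above, checking it against the stated ranges is straightforward. This is a minimal sketch, assuming the range endpoints are inclusive. The measurement itself is not shown, and whether dialogue is the anchor element is passed in as a flag because the text only says that a dialogue percentage below 10% may indicate a non-dialogue anchor. The function name and return labels are hypothetical.

```python
def check_atmos_loudness(loudness_lkfs, dialogue_anchored):
    """Classify a measured loudness as 'ideal', 'accepted', or 'out of range'."""
    if dialogue_anchored:
        # Dialogue-gated loudness: required -31 to -10 LKFS, ideally -30 to -18 LKFS.
        if -30 <= loudness_lkfs <= -18:
            return "ideal"
        if -31 <= loudness_lkfs <= -10:
            return "accepted"
        return "out of range"
    # Full-program loudness for masters where dialogue is not the anchor element.
    if -31 <= loudness_lkfs <= -5:
        return "accepted"
    return "out of range"


print(check_atmos_loudness(-24.0, dialogue_anchored=True))
print(check_atmos_loudness(-8.0, dialogue_anchored=True))
print(check_atmos_loudness(-8.0, dialogue_anchored=False))
```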

Film Audio/Video and Alt-Audio Container

  • Deliver all content in a QuickTime .mov file container.

  • The QuickTime .mov file extension is expected for all audio and video content.

  • For 7.1 surround audio, each audio channel must have an assignment, which must match one of the options in the table below. Option 1 shows one track with all eight channels and Option 2 shows one track for each channel. Note that “Rsl” (Rear Surround Left) can also be represented by “Rls” (Rear Left Surround).

    [Table: 7.1 audio channel assignments]

  • For 5.1 surround audio, each audio channel must have an assignment. The channel assignments must match one of the options in the table below. For Option 1 (one track with all six channels), the order of the channel assignments can vary as noted in Option 1a, 1b, 1c, and 1d. Note that "Lt" and "Rt" are only used for Dolby matrix audio mixdown.

    [Table: 5.1 audio channel assignments]

    • For all stereo tracks in all options listed, the channel assignments can be indicated using L and R, or Lt and Rt, but not L and Rt, or R and Lt.

    Important: Refer to Audio Channel Assignments for instructions on applying audio channel assignments and label descriptions. Refer to Table 1: ProRes Audio Channel Data Assignment and Levels for audio levels and channel assignments for music, sound effects, and dialogue.

    Note: For more information on alternate audio, see Assets and Data Files in the iTunes Package Film Specification.

Film Closed Captioning Profile

  • Text in EIA 608 format.

  • Delivered in the same package with the video it references.

  • In a Scenarist SCC formatted file, using .scc file extension.

  • The timecode frame rate can only be 29.97 and is independent of your video source frame rate. The timecode format, however, must match the timecode format of the source video, either drop frame (DF) or non-drop frame (NDF).

    • Drop frame format has colons for the first two time delimiters and a semi-colon for the last time delimiter (HH:MM:SS;FF)

    • Non-drop frame format has colons for all the time delimiters (HH:MM:SS:FF)

Source Video Frame Rate | Closed Caption Frame Rate | Description | Timecode Format | Timecode Example
29.97, 59.94i           | 29.97                     | NTSC Video  | DF              | HH:MM:SS;FF
25, 50i                 | 29.97                     | PAL Video   | NDF             | HH:MM:SS:FF
24                      | 29.97                     | Film        | NDF             | HH:MM:SS:FF
23.976                  | 29.97                     | NTSC Film   | NDF             | HH:MM:SS:FF

  • Captions should display and synchronize to within one second of the initial, audible dialog to be represented in text.

The timecodes of the captions are relative to the start of the program, and not the QuickTime movie's timecode track.

Currently, iTunes does not support EIA 708 (ATSC closed captioning) or Teletext.

MacCaption is a tool you can use to create .scc files: http://www.telestream.net. (Note that this product is not endorsed by Apple. Apple cannot and does not provide support for third-party products.)

Notes:

  • To test the closed captioning before delivering a video, see Import and preview captions in Compressor.

  • If closed caption data is available for any broadcast or web delivery system, it must be supplied to iTunes.

Film Audio Description (AD) Profile

  • Provides an alternate audio track for Audio Description (AD) and includes a description of visual elements that are important in understanding what is occurring at the time, as well as the plot, music, dialogue, and sound effects.

  • Delivered in pre-mixed format and in 2.0 stereo and optionally 5.1 surround (no 7.1 surround).

  • Audio Description (AD) files are accepted for all languages and iTunes includes the Audio Description when creating bundles. To be included in a bundle, the locale of the Audio Description must match the locale of a corresponding source audio or alternate audio file. If you send an Audio Description file that doesn’t have a corresponding audio file, it will not go live on iTunes.

Film iTunes Timed Text Profile

Below is a summary of delivery requirements for iTunes Timed Text. Refer to iTunes Timed Text profile in the iTunes Package Film Specification for complete details.

  • Delivered in an iTunes Timed Text (iTT) formatted file, using .itt file extension.

  • Delivered in the same package with the video it references as an asset in the <assets> block.

  • Only one div element is allowed in an iTT document.

  • timeBase must be set to smpte.

  • dropMode must be set to "dropNTSC" or "nonDrop"; iTunes Timed Text does not support dropPAL.

  • Only sansSerif may be specified as the typeface in fontFamily.

The iTT file format is a subset of the Timed Text Markup Language, Version 1 W3C Recommendation 24 September 2013 (TTML) (https://www.w3.org/TR/ttml1/) from the World Wide Web Consortium (W3C) (http://w3.org/). All iTT documents are TTML documents that use the restricted subset of TTML.
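As a rough pre-delivery check, the summary requirements above can be sketched in Python using only the standard library. This is an illustrative sketch, not Apple's validation: the function name `check_itt` is hypothetical, and it assumes the standard TTML namespaces and that `timeBase` and `dropMode` appear as `ttp:` attributes on the root `tt` element. Refer to the iTunes Package Film Specification for the complete rules.

```python
import xml.etree.ElementTree as ET

# Standard TTML namespaces (W3C TTML1).
TT = "http://www.w3.org/ns/ttml"
TTP = "http://www.w3.org/ns/ttml#parameter"
TTS = "http://www.w3.org/ns/ttml#styling"


def check_itt(xml_text):
    """Return a list of problems found in an iTT document (hypothetical helper)."""
    problems = []
    root = ET.fromstring(xml_text)

    # Only one div element is allowed.
    if len(root.findall(f".//{{{TT}}}div")) > 1:
        problems.append("more than one div element")

    # timeBase must be smpte.
    if root.get(f"{{{TTP}}}timeBase") != "smpte":
        problems.append("timeBase is not smpte")

    # dropMode must be dropNTSC or nonDrop (dropPAL is not supported).
    if root.get(f"{{{TTP}}}dropMode") not in ("dropNTSC", "nonDrop"):
        problems.append("dropMode is not dropNTSC or nonDrop")

    # Only sansSerif may be specified in fontFamily.
    for element in root.iter():
        family = element.get(f"{{{TTS}}}fontFamily")
        if family is not None and family != "sansSerif":
            problems.append(f"unsupported fontFamily: {family}")

    return problems
```

An empty list means none of these four checks failed; it does not mean the file is a valid iTT document.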

Film Dub Card Video Profile

The full feature-length video asset consists of a set of data files, each of which plays a specific role for the asset. The following table describes the optional data file for dub card video.

| Asset Type | Data File | Description |
| --- | --- | --- |
| Full | Role: video.end.dub_credits | An optional data file containing the credits associated with an audio track. A video-only sequence containing one or more still credits specific to the locale-matched audio. iTunes products will include dub credit video sequences for the associated audio dubs following the main program. Locale: Required |

Dub Card Video Profile

  • Apple ProRes 422 (HQ)

  • Movie correctly tagged with color parameter: ITU BT.709

  • Video dimensions, pixel aspect ratio, and frame rate must match full program video

  • Dub card time scale must match the time scale of the video source.

  • Minimum of 4 seconds per dub card

  • If a dub card video has a resolution that does not match the resolution of the video source, you may supply different crop dimension attributes for dub cards.

  • Sound tracks should not be supplied for dub card video — sound tracks will be ignored

  • Dub card video will be deinterlaced if necessary so the field order does not need to match — progressive is preferred

  • Dissolves and scrolling credits are not supported

  • First and last frames do not need to be black frames

Film Card Profile

The full feature-length video asset consists of a set of data files, each of which plays a specific role for the asset. The following table describes the optional data file for cards.

| Asset Type | Data File | Description |
| --- | --- | --- |
| Full | Role: card | An optional data file containing an image associated with the film. An image file containing information, such as a certificate, about the film. Apple will create a video using the image for playback on the Store. |

The card data file has a subtype attribute to distinguish the type of card that is delivered. Currently, there is only one subtype (certificate).

Card Image Profile

  • JPEG with .jpg extension or PNG with .png extension

  • Minimum size of 640 x 480 pixels

  • Color space: RGB (screen standard)

  • Only active pixel area may be included

  • Certificate images must be cropped (no letterbox, pillarbox, or windowbox)

  • Certificate images must contain only certificate content

  • Do not increase the size of a smaller image to meet the minimum size standard. Excessively blurry or pixelated images will be rejected.

Film Chapter Image Profile

  • JPEG with .jpg extension (quality unconstrained) or PNG with .png extension

  • RGB (screen standard)

  • Must be same aspect ratio as video source

  • 640 minimum horizontal dimension (larger for HD sourced)

  • Variable size vertical dimension (based on aspect ratio of video source)

  • Only active pixel area may be included (except where necessary to match the overall aspect ratio of the full program)

  • Chapter images must be cropped (no letterbox, pillarbox, or windowbox, except as noted above)

  • Chapter images must contain picture content

  • Chapter image files must be unique with different checksums

Important: CMYK (print standard) images will not be accepted.
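The requirement that chapter image files be unique with different checksums can be checked before delivery. The following Python sketch groups files by digest and reports any duplicates. The function name `find_duplicate_images` is hypothetical, and the use of SHA-256 is an assumption, since this guide does not state which checksum algorithm is applied.

```python
import hashlib
from collections import defaultdict


def find_duplicate_images(images):
    """Group file names whose contents are byte-for-byte identical.

    `images` maps a file name to the file's bytes. Hypothetical helper;
    SHA-256 is an assumed checksum algorithm.
    """
    names_by_digest = defaultdict(list)
    for name, data in images.items():
        names_by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Keep only digests shared by more than one file.
    return [sorted(names) for names in names_by_digest.values() if len(names) > 1]


duplicates = find_duplicate_images({
    "chapter01.jpg": b"image-bytes-a",
    "chapter02.jpg": b"image-bytes-a",
    "chapter03.jpg": b"image-bytes-b",
})
print(duplicates)  # [['chapter01.jpg', 'chapter02.jpg']]
```

In practice the bytes would be read from the image files on disk; the in-memory dictionary keeps the sketch self-contained.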

Film Poster Art Profile

2:3 Poster Art

  • JPEG with .jpg extension (quality unconstrained), PNG with .png extension, or LSR with .lsr extension

  • Color space: RGB (screen standard)

  • LSR files:

    • must have a minimum size of 2000 x 3000 pixels

    • must have a minimum of two layers (five layers maximum)

    • each image within the layered LSR file must have a unique name

    • each image within the layered LSR file must be in PNG format

  • JPG or PNG files should be a minimum size of 2000 x 3000 pixels.

  • 2:3 aspect ratio

  • Poster art (one-sheet) from film. Must contain key art and title. DVD cover, release date, website, or promotional tagging may not be included.

  • Poster art must not display film ratings.

Do not increase the size of a smaller image to meet the minimum size standard. Excessively blurry or pixelated images will be rejected.

Important: CMYK (print standard) images will not be accepted.

Display P3 Poster Art

  • LSR with .lsr extension or PNG with .png extension (Display P3 Color Profile must be embedded)

  • Color space: Display P3

  • Color mode: RGB

  • Color depth: 16 bits per channel

  • Size: 2000 x 3000 pixels

  • Resolution: 72 dpi minimum

  • Poster art (one-sheet) from film. Must contain key art and title. DVD cover, release date, website, or promotional tagging may not be included.

  • Poster art must not display film ratings.

16:9 Poster Art

  • Must be LSR (.lsr), PNG (.png), or JPG (.jpg, quality unconstrained)

  • Minimum size: 1920 x 1080 pixels; 3840 x 2160 pixels preferred

  • Aspect ratio: 1.75 to 1.80 (width divided by height)

  • Poster art (one-sheet) from film. Must contain key art and title. DVD cover, release date, website, or promotional tagging may not be included.

  • Poster art must not display film ratings.
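The size and aspect ratio rules for 2:3 and 16:9 poster art reduce to simple arithmetic on pixel dimensions. The Python sketch below is illustrative only: the function names are hypothetical, the 2:3 check assumes the ratio must be exact, and the 16:9 check assumes the range 1.75 to 1.80 refers to width divided by height.

```python
def check_2x3_poster(width, height):
    """Check 2:3 poster art dimensions (hypothetical helper)."""
    problems = []
    if width < 2000 or height < 3000:
        problems.append("smaller than 2000 x 3000 pixels")
    # Assumption: the 2:3 ratio must be exact (width:height == 2:3).
    if width * 3 != height * 2:
        problems.append("aspect ratio is not 2:3")
    return problems


def check_16x9_poster(width, height):
    """Check 16:9 poster art dimensions (hypothetical helper)."""
    problems = []
    if width < 1920 or height < 1080:
        problems.append("smaller than 1920 x 1080 pixels")
    # Assumption: the accepted range is width / height.
    if not 1.75 <= width / height <= 1.80:
        problems.append("aspect ratio is outside 1.75 to 1.80")
    return problems


print(check_2x3_poster(2000, 3000))   # []
print(check_16x9_poster(3840, 2160))  # []
print(check_16x9_poster(1600, 1200))
```

A 3840 x 2160 image has a ratio of about 1.78, so it falls inside the accepted range; a 1600 x 1200 image fails both the size and the ratio check.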

Film Backdrop Art and Content Logos

Tall Backdrop Art

  • PNG with .png extension

  • Exact size: 1680 x 3636 pixels

Wide Backdrop Art

  • PNG with .png extension

  • Exact size: 4320 x 3240 pixels

Logo Art (Single-color or full-color)

  • PNG with .png extension

  • Exact size: 4320 x 1300 pixels
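Because backdrop art and logo art require exact pixel dimensions, a lookup table is enough to check them. In this Python sketch the dictionary keys and the function name `has_exact_size` are hypothetical labels chosen for illustration; only the dimensions come from this guide.

```python
# Required (width, height) in pixels; the keys are hypothetical labels.
EXACT_SIZES = {
    "tall_backdrop": (1680, 3636),
    "wide_backdrop": (4320, 3240),
    "logo": (4320, 1300),
}


def has_exact_size(kind, width, height):
    """Return True if the image matches the required size for its kind."""
    return EXACT_SIZES[kind] == (width, height)


print(has_exact_size("wide_backdrop", 4320, 3240))  # True
print(has_exact_size("logo", 4320, 1296))           # False
```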

Film Content Considerations

  • The full movie asset should not contain FBI, MPAA, or release date tagging.

  • The trailer asset should not contain FBI, MPAA, or release date tagging.

  • A minimum of 1 black frame at the beginning and end of each video is required.

  • Trailer should be same aspect ratio as the full asset.

  • Promotional bumpers, including URLs, are NOT accepted. For more details, contact your iTunes Technical Representative.

  • For US and Canadian preview trailers, the contents of the preview must be appropriate for general audiences.

  • For preview trailers from all other countries, the contents of the preview cannot exceed the rating classification of the feature film for territories in which it is available. Previews being made available worldwide (WW) must be suitable for all territories that are cleared for sale.

  • Poster art should not contain DVD tagging, release date tagging, or website tagging.

XML

  • All XML must be encoded in UTF-8.

  • Byte order marks (BOM) must not be used.

  • There should be no null data or empty tags in the XML. If not used, elements should be removed.

  • The XML must be formatted to use line breaks and indentations.

For further information, refer to the appropriate media type metadata specification, or consult with your iTunes Technical Representative.
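The XML requirements listed in this section can be spot-checked with a short Python sketch using only the standard library. The function name `check_metadata_xml` is hypothetical. Two interpretations are assumptions: an element counts as empty only if it has no text, no child elements, and no attributes, and the formatting rule is approximated by checking that the file contains at least one line break.

```python
import codecs
import xml.etree.ElementTree as ET


def check_metadata_xml(data):
    """Return a list of problems found in XML bytes (hypothetical helper)."""
    problems = []

    # No byte order mark may be used.
    if data.startswith(codecs.BOM_UTF8):
        problems.append("byte order mark present")
        data = data[len(codecs.BOM_UTF8):]

    # All XML must be encoded in UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
        return problems

    # No empty tags (assumed: no text, no children, no attributes).
    root = ET.fromstring(data)
    for element in root.iter():
        has_text = bool((element.text or "").strip())
        if not has_text and len(element) == 0 and not element.attrib:
            problems.append(f"empty tag: {element.tag}")

    # Rough proxy for "formatted with line breaks and indentation".
    if b"\n" not in data:
        problems.append("no line breaks")

    return problems
```

The function expects the raw bytes of the file, for example as read with `open(path, "rb").read()`, so that the byte order mark and the encoding can be examined before parsing.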

Previous Guide Revisions

The following table lists the previously released versions of this guide and their revisions:

| Date/Version | Summary |
| --- | --- |
| July 19, 2021 - Version 5.3.7 | Removed references to CBFC for India. |
| June 03, 2021 - Version 5.3.6 | Revised source requirements for immersive audio. Clarified the difference between immersive and spatial audio. Updated film poster art requirements. |
| May 17, 2021 - Version 5.3.5 | Added source requirements and best practices for immersive audio. |
| August 31, 2020 - Version 5.3.4 | Dolby Vision CM 4 sidecar metadata files are now supported. Added a new data file role for delivering CBFC Certificates for content sold in India. Added Rrs (Rear Right Surround) as a label for audio channel assignments. |
| August 10, 2020 - Version 5.3.3 | Added gamma value to music video profile. Chapter images can be sent in PNG format. QuickTime Pro 7 has been deprecated. The iTunes Closed Captioning Testing Guide has been deprecated. |
| October 28, 2019 - Version 5.3.2 | Updated Apple Digital Masters requirements. Updated Music Audio Source requirements. |
| August 7, 2019 - Version 5.3.1 | Removed audiobooks requirements. Rebranded MFiT. |
| May 13, 2019 - Version 5.3 | Added requirements for 16:9 poster art, content logos, and backdrop art for both TV and Film. Changed the version number of this guide from 5.2 to 5.3 to keep the version number in sync with the new schema version. |
| January 30, 2019 - Version 5.2.14 | Added 30fps to HD and SD music video source profiles. Added audiobooks profile. |
| August 8, 2018 - Version 5.2.13 | Updated music video source profile. Added Dolby Atmos Audio source for film. Clarified pixel aspect ratio. |
| May 2, 2018 - Version 5.2.12 | Requirements for music video screen capture images have changed. Poster art and cover art requirements for P3 displays have been added for Film and TV. |
| February 21, 2018 - Version 5.2.11 | Updated screen capture image requirements for music video. HD content upscaled from SD will be rejected. Corrected a link. |
| October 18, 2017 - Version 5.2.10 | Added source asset requirements for HDR and 4K video. |
| July 12, 2017 - Version 5.2.9 | Added ability to set audio levels and channel assignments for music, sound effects, and dialogue. Audio Description (AD) files can be delivered in 5.1 surround audio. Crop dimensions are allowed for dub card video. Poster art and cover art size requirements have changed. |
| May 17, 2017 - Version 5.2.8 | The requirements for film trailers have changed. The requirements for HD Source for music video, TV, and film have been updated. |
| January 26, 2017 - Version 5.2.7 | Added a chapter to describe the file format used to deliver song lyrics for album song tracks. Clarified single-channel audio. |
| August 17, 2016 - Version 5.2.6 | Updates to Audio Description (AD). |
| May 26, 2016 - Version 5.2.5 | Audio track channels for 5.1 and 7.1 Surround can either be all 24 bit or all 16 bit. Dub card time scale must match the time scale of the video source. |
| January 15, 2016 - Version 5.2.4 | Additional formats added for HD source. Updated a URL link. |
| September 28, 2015 - Version 5.2.3 | Changed requirements for poster art for film and TV. Added requirements for layered images for film. Clarified closed captions. |
| July 16, 2015 - Version 5.2.2 | Added requirements for 7.1 surround audio for film. Changed requirements for album cover art. |
| January 8, 2015 - Version 5.2.1 | Added explanations of field count values for ProRes. Added display dimensions of HD source to accommodate video formats that use non-square pixels, for example, in broadcast dimensions. Added Audio Description (AD) requirements for film. Clarified music video screen capture image. |
| April 9, 2014 - Version 5.2 | Added closed caption requirements for music videos. Added a best practice for MFiT content. Added requirements for TV cover art. Added guidelines to TV content considerations. Clarified SCC files for both TV and Film. Changed the version number of this specification from 5.1 to 5.2 to keep the version number in sync with the new schema version. |
| March 20, 2013 - Version 5.1.1 | Corrected the ProRes SD Profiles for NTSC and PAL. |
| February 28, 2013 - Version 5.1 | Added best practices for delivering Mastered for iTunes content. Changed requirements for ringtone album cover art. Clarified acceptable frame rates for closed captioning for TV and Film. Videos with empty edits other than the last edit will be blocked. |
| November 7, 2012 - Version 5.0.1 | Clarified SD Source video for film. Added new video source validations. Added Apple ProRes to NTSC and PAL SD TV source profiles. Changed requirements for QuickTime audio channel assignments. Closed captions can be added to MPEG-2 sources. |
| May 30, 2012 - Version 5.0 | Album cover art and poster art requirements have changed. Removed TIFF from the list of recommended image formats and removed DPI requirements. Added delivery requirements for dub card video. 96 kHz audio is now supported. |
| September 22, 2011 - Version 4.8 | Added crop dimensions for TV. Clarified content considerations for TV. Clarified closed captioning for TV. Added delivery requirements for iTT files. |
| July 13, 2011 - Version 4.7.2 | Clarified delivery requirements for 5.1 audio and closed captioning. Added the profile for closed captioning for TV. Film poster art requirements have changed. |
| April 15, 2011 - Version 4.7 | Clarified HD cropping for TV. Added color space requirement for HD film source. Clarified closed captioning text for film. |
| February 9, 2011 - Version 4.6 | Removed asset specifications for books (a new iBooks Store asset guide has been created). Renamed this asset guide to: iTunes Video and Audio Asset Guide. |
| November 5, 2010 - Version 4.5 | Clarified surround sound for HD music video audio source profile. Clarified delivery of HD source for music videos. Added a chapter for book source profiles. Put back 25 fps in the HD TV source profile that was incorrectly removed. Added two new best practice items to the TV Content Considerations section. |
| August 5, 2010 - Version 4.4 | Added source profile for HD music video and cropping information. Clarified album cover art. Added surround sound to HD music video audio source profile. |
| February 3, 2010 - Version 4.3 | Clarified that ALAC in a CAF container is allowed. Added source profile for pre-cut ringtones. Clarified that film ratings should not appear on poster art. |
| December 18, 2009 - Version 4.2 | Clarified quality standards. Clarified closed captioning. |
| November 10, 2009 - Version 4.1 | Clarified audio requirements for music and film. |
| September 11, 2009 - Version 4.0 | Added best practices content for Film. Clarified requirements for SCC files. |
| July 1, 2009 - Version 3.3.2 | Clarified image and audio requirements. Clarified frame rate requirements for TV. |
| May 12, 2009 - Version 3.3.1 | Added support for PNG format images for cover art, poster art, and video screen captures. PNG images are not currently supported for chapter thumbnail images. |
| March 17, 2009 - Version 3.3 | Added updated PAL support for film. Added closed-captioning to Film Content Profile. Added 24-bit support for audio. Added best practices content for TV. Clarified how to send stereo sound for Film and TV. |
| October 1, 2008 - Version 3.2 | Added audio source specification to Music Audio Content Profile, added HD format to Television Content Profile and Appendix I, which provides audio channel assignments instructions. |
| May 8, 2008 - Version 3.1.1 | Complete reformatting of the Guide. Separation of content type profiles. Addition of Movie HD and SD specification. Addition of image specifications for TV and Film. |
| April 2, 2007 - Version 2.3 | Introduction of Asset Specification Guide. |

Audio Channel Assignments

Audio Channel Assignments

Previously, you could tag audio channel assignments using QuickTime Pro 7. However, QuickTime Pro 7 is a 32-bit application, and macOS Catalina supports only 64-bit applications.

Instead, you can use Compressor 4.4.5 to tag audio channel assignments for QuickTime movies using Compressor’s command-line interface. Use the command-line option -relabelaudiotracks followed by the channels you want to tag and their values. See Audio channel layouts in Compressor for more on audio channel layouts.

The supported values for audio tagging are:

| Label | Description |
| --- | --- |
| L | Left |
| R | Right |
| C | Center |
| LFE | LFE Screen |
| Ls | Left Surround |
| Rs | Right Surround |
| Lc | Left Center |
| Rc | Right Center |
| Rsl (or Rls) | Rear Surround Left (or Rear Left Surround) |
| Rsr (or Rrs) | Rear Surround Right (or Rear Right Surround) |
| Lt | Left Total |
| Rt | Right Total |
| LtRt | Matrix stereo (Lt Rt) |
| stereo | Stereo (L R) |

Examples:

  • Given a QuickTime Movie file with two audio tracks where the first audio track contains the Left audio channel and the second audio track contains the Right audio channel, type the following at the command line:

    /Applications/Compressor.app/Contents/MacOS/Compressor -relabelaudiotracks L R <path_to_Quicktime_Movie_source_file>

  • Similarly, given a QuickTime Movie file with one audio track where the track contains both the Lt and Rt audio channels (that is, matrix stereo), type the following at the command line:

    /Applications/Compressor.app/Contents/MacOS/Compressor -relabelaudiotracks LtRt <path_to_Quicktime_Movie_source_file>

The above commands overwrite the audio assignments in the original source file, which can speed up the process. To save to a new file, add -locationpath to the command, for example:

    /Applications/Compressor.app/Contents/MacOS/Compressor -relabelaudiotracks L R <path_to_Quicktime_Movie_source_file> -locationpath <path_to_output_file>

and

    /Applications/Compressor.app/Contents/MacOS/Compressor -relabelaudiotracks LtRt <path_to_Quicktime_Movie_source_file> -locationpath <path_to_output_file>

See Intro to shell commands in Compressor for more information about using the command line. To display help in the command-line tool, use: /Applications/Compressor.app/Contents/MacOS/Compressor -help.
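When many files need tagging, the command lines shown above can be assembled programmatically. The following Python sketch only builds the argument list and does not run Compressor. The function name `relabel_command` and the file names in the usage lines are hypothetical; the executable path, the -relabelaudiotracks and -locationpath options, and the label set are taken from this section.

```python
import shlex

COMPRESSOR = "/Applications/Compressor.app/Contents/MacOS/Compressor"

# Labels listed in the table of supported values in this section.
SUPPORTED_LABELS = {
    "L", "R", "C", "LFE", "Ls", "Rs", "Lc", "Rc",
    "Rsl", "Rls", "Rsr", "Rrs", "Lt", "Rt", "LtRt", "stereo",
}


def relabel_command(labels, source, output=None):
    """Build the Compressor argument list for relabeling audio tracks.

    Hypothetical helper: one label per audio track, in track order.
    """
    unknown = [label for label in labels if label not in SUPPORTED_LABELS]
    if unknown:
        raise ValueError(f"unsupported labels: {unknown}")
    argv = [COMPRESSOR, "-relabelaudiotracks", *labels, source]
    if output is not None:
        # Without -locationpath the source file itself is overwritten.
        argv += ["-locationpath", output]
    return argv


# Print the commands instead of running them (file names are placeholders).
print(shlex.join(relabel_command(["L", "R"], "feature.mov")))
print(shlex.join(relabel_command(["LtRt"], "feature.mov", "feature_tagged.mov")))
```

Returning a list rather than a single string keeps paths with spaces intact if the list is later passed to `subprocess.run`.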

Table 1: ProRes Audio Channel Data Assignment and Levels

| Label | Data and Levels |
| --- | --- |
| L | MnE |
| R | MnE |
| C | Dialogue |
| LFE | Effects |
| Ls | Music |
| Rs | Music |
| Lt | Mixdown of 5.1 (stereo) |
| Rt | Mixdown of 5.1 (stereo) |
| Max peak: | -6 dB |
| LKFS: | -24 dB |