Overview

Introduction

This document provides detailed delivery information for all accepted media and files, including music, music video, television, and movies. If further details are required, contact your technical representative.

Quality is important to us. We expect to receive the highest-quality assets available. Our product must meet or exceed the quality of the physical product already out in the marketplace. For example, if 5.1 surround sound or closed captions exist on the physical version of the product, those must be provided. If the physical product gives the chapters actual names (as opposed to Chapter 1, Chapter 2, and so on), then our product should have those same chapter titles. If the album is in stereo, stereo audio must be provided.

Changes Made in This Release

Date/Version | Changes Made
April 8, 2024 - Version 5.3.14 | Added stereoscopic video source file requirements. Renamed this asset guide to Apple Video and Audio Asset Guide.

For a complete history of changes, see Previous Guide Revisions.

What’s New in Apple Video and Audio Asset Guide 5.3.14?

Title

Renamed this asset guide to Apple Video and Audio Asset Guide.

Stereoscopic video source files

Added stereoscopic video source file requirements. See the Stereoscopic Video Source Profile section.

Music Audio Content Profiles

Music Audio Source Profile

Apple accepts audio with a sampling rate of 44.1, 48, 88.2, 96, 176.4, or 192 kHz with 16-bit or 24-bit resolution. Note that if stereo audio source exists, it must be used. See Apple Digital Masters Source Profile for audio requirements specific to Apple Digital Masters content.

Where stereo source is not available, as may be the case with certain vintage or field recordings, send audio source with two identical channels for left and right. Single-channel audio will not be accepted.
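The sample-rate, bit-depth, and channel rules above can be pre-checked before delivery. The following is a minimal sketch, not an Apple tool: the function names are hypothetical, and it reads only WAV headers through Python's standard wave module. It does not verify the CODEC or detect whether a stereo source exists.

```python
import io
import wave

# Values taken from the Music Audio Source Profile text above.
ACCEPTED_SAMPLE_RATES_HZ = {44100, 48000, 88200, 96000, 176400, 192000}
ACCEPTED_BIT_DEPTHS = {16, 24}


def check_music_audio(sample_rate_hz, bit_depth, channels):
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = []
    if sample_rate_hz not in ACCEPTED_SAMPLE_RATES_HZ:
        problems.append(f"unsupported sample rate: {sample_rate_hz} Hz")
    if bit_depth not in ACCEPTED_BIT_DEPTHS:
        problems.append(f"unsupported bit depth: {bit_depth}-bit")
    if channels != 2:
        # Single-channel audio is not accepted; mono material must be sent
        # as two identical left and right channels.
        problems.append(f"expected 2 channels, found {channels}")
    return problems


def check_wav(fileobj):
    """Hypothetical helper: read the WAV header and apply the checks."""
    with wave.open(fileobj, "rb") as wav:
        return check_music_audio(
            wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()
        )
```

For example, check_music_audio(96000, 24, 2) returns an empty list, while a mono 22.05 kHz file reports two problems.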

Uncompressed audio formats supported are:

Format

Container Type

Qualified CODEC

Pulse-Code Modulation (PCM)

WAV (.wav)

Apple Lossless (ALAC)

M4A (.m4a)

QuickTime https://www.apple.com/quicktime

CAF (.caf)

iTunes Producer

Free Lossless Audio Codec (FLAC)

FLAC (.flac)

FLAC https://xiph.org/flac/

All other audio formats will be rejected.

Important: All audio must be generated using a CODEC qualified and approved by Apple.

Apple Digital Masters Source Profile

The audio for Apple Digital Masters (formerly “Mastered for iTunes”) must follow these requirements to be badged as such in Apple's services:

  • Audio must be delivered at 24-bit resolution in an approved format.

  • Acceptable sample rates are 44.1, 48, 88.2, 96, 176.4, and 192 kHz.

For an in-depth description of Apple Digital Masters, refer to the technology brief here: Apple Digital Masters.

Best Practices for Apple Digital Masters Content

The following lists best practices for producing Apple Digital Masters content. Apple recommends communicating these best practices to your mastering house:

  • Source format must have been minimum 24-bit with a minimum sample rate of 44.1 kHz. (Up-sampling and/or bit-padding of 44.1 kHz/16-bit files is not allowed.)

  • All files must have been auditioned as encoded by the current Apple AAC encoder, using the "Apple Digital Masters Droplet", the "RoundTripAAC" plug-in, or the Sonnox "Pro-Codec V2" plug-in that includes Apple's "iTunes +" AAC CODEC. Use these tools to set an appropriate level so that the encode doesn't show clipping.

  • Although Apple doesn't reject files for specific numbers of clips, audible clipping caused by excessive levels to the encoder may be reason for tracks to not be badged and marketed as an "Apple Digital Master."

  • The format of the files must be 24-bit PCM at a sample rate of 44.1, 48, 88.2, 96, 176.4, or 192 kHz. Native resolution of project should be used — do not down-sample. (ALAC or FLAC lossless compression is acceptable.)

Hi-Res Lossless Profile

Requirements for Hi-Res Lossless Content

To be badged as Hi-Res Lossless in Apple's services, audio must meet these requirements:

  • 24-bit resolution in an approved format

  • Sample rates 88.2, 96, 176.4, or 192 kHz. Note that the 44.1 and 48 kHz sample rates don't qualify as "Hi-Res Lossless."

Follow the “Best Practices for Apple Digital Masters Content” described in Apple Digital Masters Source Profile.
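The format side of the two badges can be expressed as a pair of predicates. This is a sketch only: the function names are hypothetical, and passing these checks says nothing about the mastering practices (auditioning, clipping, no up-sampling) that the badges also depend on.

```python
# Sample rates listed in the Apple Digital Masters and Hi-Res Lossless profiles.
HI_RES_RATES_HZ = {88200, 96000, 176400, 192000}
ADM_RATES_HZ = {44100, 48000} | HI_RES_RATES_HZ


def meets_adm_format(bit_depth, sample_rate_hz):
    """True when the file format matches the Apple Digital Masters profile."""
    return bit_depth == 24 and sample_rate_hz in ADM_RATES_HZ


def meets_hi_res_lossless_format(bit_depth, sample_rate_hz):
    """True when the file format matches the Hi-Res Lossless profile."""
    return bit_depth == 24 and sample_rate_hz in HI_RES_RATES_HZ
```

A 24-bit/48 kHz file therefore satisfies the Apple Digital Masters format check but not the Hi-Res Lossless one.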

Immersive Audio Source Profile

Immersive audio source must meet the following requirements.

Dolby Atmos music deliverables

  • Dolby Atmos audio files generated from stereo mixes are not allowed. Specifically:

    • A Dolby Atmos track must be created from multitracks or stems created from multitracks.

    • Upmixing from a stereo release is not allowed.

    • Extracting stems (“de-mixing”) from a stereo release is not allowed.

    • A Dolby Atmos track consisting only of a stereo mix placed in the sound field with added ambience or reverb is not allowed.

  • Provide the Dolby Atmos file as a Broadcast Wave Format Audio Definition Model (BWF ADM) file.

  • All tracks within a project must be at the same frame rate.

  • All audio must be 24-bit linear pulse code modulation (LPCM) audio at 48 kHz.

  • You must conform and sync the Atmos files with the stereo reference files for the same project.

  • The integrated loudness value should not exceed -18 LKFS measured as per ITU-R BS.1770-4.

  • True-peak level should not exceed -1 dB TP measured as per ITU-R BS.1770-4.

  • Full-frequency content should not be present in the Low-Frequency Effects (LFE) channel of the BWF ADM.

  • For albums where gapless playback is intended between tracks:

    • Each album track must be delivered as an individual BWF ADM file.

    • Each track boundary must be no more than half a frame (1,000 samples @ 48 kHz) earlier or later than the same track boundary in the corresponding stereo deliverable.

    • There must be no additional silence at the end of each track when compared to the same track from the corresponding stereo deliverable.

Note: Immersive audio that does not meet the requirements may be removed from Apple Music.
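The loudness limits and the gapless boundary tolerance above reduce to simple comparisons once the measurements exist. The sketch below assumes the loudness figures were already measured per ITU-R BS.1770-4 by a separate meter, and that track boundaries are known as sample positions. The function names are hypothetical. Exact fractions are used so that a stereo deliverable at a different sample rate (for example 44.1 kHz) can be compared without rounding error.

```python
from fractions import Fraction

ATMOS_SAMPLE_RATE_HZ = 48_000
MAX_BOUNDARY_OFFSET_SAMPLES = 1_000   # tolerance quoted in the text, at 48 kHz
MAX_INTEGRATED_LOUDNESS_LKFS = -18.0
MAX_TRUE_PEAK_DBTP = -1.0


def loudness_ok(integrated_lkfs, true_peak_dbtp):
    """Check already-measured loudness values against the stated limits."""
    return (integrated_lkfs <= MAX_INTEGRATED_LOUDNESS_LKFS
            and true_peak_dbtp <= MAX_TRUE_PEAK_DBTP)


def boundary_ok(atmos_boundary_samples, stereo_boundary_samples, stereo_rate_hz):
    """Compare one track boundary in the Atmos and stereo deliverables."""
    atmos_seconds = Fraction(atmos_boundary_samples, ATMOS_SAMPLE_RATE_HZ)
    stereo_seconds = Fraction(stereo_boundary_samples, stereo_rate_hz)
    # Express the difference in 48 kHz samples, the unit used by the rule.
    offset = abs(atmos_seconds - stereo_seconds) * ATMOS_SAMPLE_RATE_HZ
    return offset <= MAX_BOUNDARY_OFFSET_SAMPLES
```

For a boundary at exactly 180 seconds in a 44.1 kHz stereo master (sample 7,938,000), an Atmos boundary at sample 8,641,000 is 1,000 samples late and still passes; one sample later fails.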

5.1 Surround and 7.1 Surround deliverables

Note: 5.1 and 7.1 audio files generated from stereo mixes are not allowed. Upmixing from 5.1 to 7.1 is not allowed. The 5.1 and 7.1 audio files must be delivered according to the guidelines below.

  • Audio files must be provided as LPCM in .mov containers with channel assignments.

  • Deliver files at 48 kHz with 16-bit or 24-bit resolution.

  • The integrated loudness value should not exceed -18 LKFS measured as per ITU-R BS.1770-4.

  • True-peak level should not exceed -1 dB TP measured as per ITU-R BS.1770-4.

  • You must conform and sync the deliverables with the stereo reference files for the same project.

Immersive Audio Channel Assignments

  • The QuickTime .mov file extension is expected for immersive 5.1 and 7.1 audio content.

  • For 7.1 surround audio, each audio channel must have an assignment, which must match one of the options in the table below. Option 1 shows one track with all eight channels and Option 2 shows one track for each channel. Note that “Rsl” (Rear Surround Left) can also be represented by “Rls” (Rear Left Surround).

    image of table showing audio channel assignments
  • For 5.1 surround audio, the channel assignments must match one of the options in the table below. For Option 1 (one track with all six channels), the order of the channel assignments can vary as noted in Option 1a, 1b, 1c, and 1d.

    Audio channel assignment table
    • For all stereo tracks in all options listed, the channel assignments can be indicated using L and R, or Lt and Rt, but not L and Rt, or R and Lt.

    Refer to Audio Channel Assignments for instructions on applying audio channel assignments and label descriptions.

Best Practices for Immersive Audio Content

Keep the following in mind when delivering immersive audio:

  • On initial delivery, you must deliver a data file with the role audio.2_0 containing the 2.0 stereo audio source unless you instruct Apple to downmix, using the audio.transform_to.2_0 attribute.

  • Only two audio sources per track (one stereo, one immersive) are allowed.

  • You cannot deliver two immersive audio sources or two stereo sources for the same track.

  • A single data file containing both 2.0 stereo source and 5.1 or 7.1 immersive source is not supported.

  • The difference between the most recently delivered (new or update) stereo source and immersive source durations must be less than or equal to 50 milliseconds.

  • All tracks in an album should be at the same frame rate.

Ringtone Source Profile

  • Sampling rate of 44.1 kHz with 16-bit or 24-bit resolution, or 96 kHz with 24-bit resolution

  • Must be lossless

  • WAV, FLAC, or ALAC format

  • Minimum length is 5 seconds and the maximum length is 30 seconds

See the table in Music Audio Source Profile for the uncompressed audio formats that are supported. All other audio formats will be rejected. Note that if stereo audio source exists, it must be used.

Important: All audio must be generated using a CODEC qualified and approved by Apple.
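The ringtone constraints combine a small set of allowed rate and bit-depth pairs with a duration window. A minimal sketch, with a hypothetical function name, that assumes those values have already been read from the file:

```python
# (sample rate in Hz, bit depth) pairs listed in the Ringtone Source Profile.
ALLOWED_RINGTONE_FORMATS = {(44100, 16), (44100, 24), (96000, 24)}
MIN_LENGTH_S = 5
MAX_LENGTH_S = 30


def check_ringtone(sample_rate_hz, bit_depth, duration_s):
    """Return a list of problems; an empty list means the checks pass."""
    problems = []
    if (sample_rate_hz, bit_depth) not in ALLOWED_RINGTONE_FORMATS:
        problems.append(
            f"unsupported format: {sample_rate_hz} Hz at {bit_depth}-bit"
        )
    if not MIN_LENGTH_S <= duration_s <= MAX_LENGTH_S:
        problems.append(f"length {duration_s} s is outside 5 to 30 seconds")
    return problems
```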

Music Album Cover Art Profile

  • JPEG with .jpg extension (quality unconstrained) or PNG with .png extension

  • Color space: RGB (screen standard)

  • Album cover art: recommended size of 3000 x 3000 pixels; minimum size of 1400 x 1400 pixels.

  • For ringtones, minimum size of 800 x 800 pixels. 1400 x 1400 pixels recommended for best results

  • Images must be square

  • File formats: JPEG or PNG (100% quality)

  • 1:1 aspect ratio

Do not increase the size of a smaller image to meet the minimum size standard. Excessively blurry or pixelated images will be rejected.

Important: CMYK (print standard) images will not be accepted.
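The size and shape rules for cover art can be checked from the image dimensions alone. The sketch below is illustrative: the function names are hypothetical, the dimension reader handles only PNG (it reads the width and height stored in the IHDR chunk), and it does not inspect JPEG files or detect CMYK data or upscaling.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Return (width, height) from the IHDR chunk at the start of PNG bytes."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", data[16:24])


def check_cover_art(width, height, ringtone=False):
    """Apply the minimum-size and square-image rules from the profile."""
    minimum = 800 if ringtone else 1400
    problems = []
    if width != height:
        problems.append("image must be square (1:1)")
    if min(width, height) < minimum:
        problems.append(f"image is smaller than {minimum} x {minimum} pixels")
    return problems
```

A 1000 x 1000 image passes for a ringtone but fails for an album; neither case reaches the recommended sizes, which the sketch does not enforce.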

Music Album Motion Art Profile

Music Album Motion Art must be delivered to these technical specifications. If the uploaded assets do not meet these specifications, then the bundled assets are rejected.

To deliver album motion art, you must include both formats:

  • Album Page Motion 3x4

  • Album Page Motion 1x1

Album Page Motion 3x4

This motion art is displayed on the Album Page on iPhone. This motion art format is required.

  • File format: MOV with .mov extension

  • Video codec: Apple ProRes 422 or Apple ProRes 4444

  • Resolution: 2048x2732 pixels (3x4) (square pixels 1:1)

  • Frame Rate: 23.976, 24, 25, 29.97, or 30 fps

  • Color space: Rec 709 or sRGB

  • Length: 15 to 35 seconds

Album Page Motion 1x1

This motion art is displayed on the Album Page on Mac, iPad, and smart TVs. This motion art format is required.

  • File format: MOV with .mov extension

  • Video codec: Apple ProRes 422 or Apple ProRes 4444

  • Resolution: 3840x3840 pixels (1x1) (square pixels 1:1)

  • Frame Rate: 23.976, 24, 25, 29.97, or 30 fps

  • Color space: Rec 709 or sRGB

  • Length: 15 to 35 seconds
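Because both motion art formats share every requirement except resolution, the two profiles fit in one lookup table. A sketch under the assumption that codec name, dimensions, frame rate, and duration have already been extracted by a media-inspection tool; all names are hypothetical, and frame rates are compared as strings to avoid floating-point equality issues.

```python
# Required resolutions for each motion art format, from the profiles above.
MOTION_ART_RESOLUTIONS = {"3x4": (2048, 2732), "1x1": (3840, 3840)}
ALLOWED_FRAME_RATES = {"23.976", "24", "25", "29.97", "30"}
ALLOWED_CODECS = {"Apple ProRes 422", "Apple ProRes 4444"}


def check_motion_art(kind, width, height, frame_rate, codec, duration_s):
    """Return a list of problems for one motion art file."""
    problems = []
    if (width, height) != MOTION_ART_RESOLUTIONS[kind]:
        problems.append(f"{kind} resolution must be "
                        f"{MOTION_ART_RESOLUTIONS[kind]}, got {(width, height)}")
    if frame_rate not in ALLOWED_FRAME_RATES:
        problems.append(f"unsupported frame rate: {frame_rate}")
    if codec not in ALLOWED_CODECS:
        problems.append(f"unsupported codec: {codec}")
    if not 15 <= duration_s <= 35:
        problems.append(f"length {duration_s} s is outside 15 to 35 seconds")
    return problems


def missing_formats(delivered_kinds):
    """Both formats are required; list any that were not delivered."""
    return sorted(set(MOTION_ART_RESOLUTIONS) - set(delivered_kinds))
```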

Music Digital Booklet Profile

  • PDF format with .pdf extension

  • Four-page minimum

  • No more than 10 MB in size

  • All fonts embedded

  • 11 in x 8.264 in (28 cm x 21 cm)

  • RGB color

  • Horizontal presentation

  • All images full-bleed as shown in sample pages

Important: These booklets are expressly designed for the Apple format, and cannot be reproductions of the liner notes with borders to increase their size.

Content Considerations

  • When saving as PDF, make sure the document opens full screen with no negative space surrounding the document.

  • If the digital booklet is many pages, consider using fewer images or optimizing images to achieve lower overall file size.

  • Printer’s marks are not allowed.

  • You cannot sell or advertise other products or services. No other promotional sites are allowed.

  • No links to anything outside of the booklet, except to the artist and/or label website(s).

  • No time-sensitive information (for example, a promotion or dates for an upcoming tour or concert).

Digital booklet examples

Music Video Content Profiles

Music Video SD Source Profile

Note: Chaptering is not supported for music videos.

NTSC and PAL

  • ITU-R BT.601 color space, Long GOP

  • Accepted formats and dimensions:

Format | Bitrate | Encoded Dimensions | Display Dimensions
Apple ProRes 422 (HQ) | VBR: 40-60 Mbps | NTSC: 720 x 480 or 720 x 486 (1) | 853 x 480 for 16:9 content; 640 x 480 for 4:3 content
Apple ProRes 422 (HQ) | VBR: 40-60 Mbps | PAL: 720 x 576 | 1024 x 576 for 16:9 content; 768 x 576 for 4:3 content
MPEG-2 Program Stream Main Profile | 15 Mbps minimum | NTSC or PAL: 640 x height (2) | 640 x height (2)

(1) Content that has 720 x 486 encoded pixels should have a minimum of 4 pixels of black at the top and 2 at the bottom; crop values for top and bottom must total at least 6 pixels.

(2) Height depends on the aspect ratio of the source, with a maximum of 480 pixels.

  • All encoded content must include pixel aspect ratio (pasp), preferably one that results in a display aspect ratio of 4:3 or 16:9.

  • Native frame rate of original source: 

    • 29.97 interlaced frames per second video source, delivered as either interlaced or de-interlaced properly tagged as progressive

    • 24, 25, or 30 frames per second sourced from film, delivered as progressive

    • 23.976 frames per second for inverse telecine, delivered as progressive; must not be delivered interlaced or delivery will fail

    • For mixed frame rate material, contact your technical representative

  • Interlaced content must be tagged non-progressive and field ordering must be defined in the stream

  • Field dominance must be properly tagged (top field first, bottom field first, or progressive)

  • Gamma values are accepted and the value must be between 2.15 and 2.25

  • Telecine materials will not be accepted

  • Video source may be delivered matted: letterbox, pillarbox, or windowbox.

    • If the SD source is matted, the SD source should be delivered in its full-frame state with metadata included to specify the crop rectangle. See Music Video Single Crop Dimensions Metadata Example in the Apple Music Specification for details.

    • If the SD source file is not delivered matted or if there are no inactive pixels, Apple recommends setting all crop dimension attributes to '0' (zero).

    • 4:3 standard definition video should not be delivered anamorphic 16:9 with matting.

Important: All video must begin and end with at least one black frame. In addition, videos can only have empty edits in the last edit of the edit list; videos with empty edits other than the last edit will be blocked.

Music Video HD Source Profile

Note: Chaptering is not supported for music videos.

  • Apple ProRes 422 (HQ) or 4444 or 4444 (XQ)

  • VBR expected at ~220 Mbps

  • HD encoded dimensions accepted to support square pixel aspect ratios (PASP):

Encoded | PASP | Converted to ProRes From
1920 x 1080 | 1:1 | HDCAM SR, D5, ATSC
1280 x 720 | 1:1 | ATSC progressive

  • HD encoded dimensions accepted to support non-square pixel aspect ratios (this allows you to send HD video in the native dimensions of your best original source, for example in HD broadcast dimensions*):

Encoded | PASP | Converted to ProRes From
1440 x 1080 | 1:1.33333 | XDCAM-HD, HDCAM
1280 x 1080 | 1:1.5 | DVCProHD interlaced
960 x 720 | 1:1.33333 | DVCProHD progressive

* If your original HD source pixel aspect ratio is non-square, contact your technical representative before delivery.
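The relationship between encoded width, PASP, and display width in the non-square table is a single multiplication. The sketch below reads the listed PASP values (1:1.33333 and 1:1.5) as horizontal stretch factors of 4/3 and 3/2; that reading is an assumption about the notation, although it reproduces the familiar 1920- and 1280-pixel display widths. The function name is hypothetical.

```python
from fractions import Fraction


def display_width(encoded_width, pasp):
    """Display width in square pixels for a given horizontal PASP factor."""
    return round(encoded_width * pasp)


# The three non-square HD formats from the table above.
hdcam = display_width(1440, Fraction(4, 3))        # 1440 x 1080
dvcpro_1080 = display_width(1280, Fraction(3, 2))  # 1280 x 1080
dvcpro_720 = display_width(960, Fraction(4, 3))    # 960 x 720
```

Under this reading the three formats display at 1920, 1920, and 1280 pixels wide respectively, which gives 16:9 frames at heights of 1080, 1080, and 720.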

  • Native frame rate of original source:

    • 29.97 or 25 interlaced frames per second for video-sourced material

    • 23.976, 24, 25, or 30 frames per second for digital-progressive or film-sourced material.

  • Gamma values are accepted and the value must be between 2.15 and 2.25

  • HD source may be delivered matted: letterbox, pillarbox, or windowbox.

    • The HD source may be delivered in its full-frame state with metadata included to specify the crop rectangle. See Music Video Single Crop Dimensions Metadata Example in the Apple Music Specification for details.

    • If the HD source file is not delivered matted or if there are no inactive pixels, Apple recommends setting all crop dimension attributes to '0' (zero).

Important: All video must begin and end with at least one black frame. In addition, videos that begin with or contain empty edits will be blocked; the file can contain an empty edit in its edit list only if it is the last edit.
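The edit-list rule above can be stated as a check over the edits in order. This sketch assumes the edit list has already been read from the QuickTime file by some other tool and is supplied as a list of media-time values, where -1 marks an empty edit (the convention used in QuickTime edit lists). The function name is hypothetical.

```python
EMPTY_EDIT = -1  # media time value that marks an empty edit


def edit_list_ok(media_times):
    """True when no edit other than the last one is an empty edit."""
    return all(media_time != EMPTY_EDIT for media_time in media_times[:-1])
```

An edit list of [0, -1] passes because the empty edit is last, while [-1, 0] and [0, -1, 100] do not.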

Music Video HDR Source Profile

The HDR source video must meet the minimum requirements in the subsections below.

For both Dolby® Vision and HDR10

  • Display dimensions and PASP must match corresponding primary video display dimensions and PASP

  • HDR source video must have progressive scan and uniform frame rate

  • HDR source video track should only contain a single edit list entry

  • Duration must match corresponding primary video duration

  • Frame count must match corresponding primary video frame count

  • HDR format (Dolby® Vision or HDR10) specified as a source attribute

  • HDR video source that contains embedded audio will be accepted, but the audio will be ignored; the audio on the SDR source will be used for bundling.

  • Gamma values are accepted and the value must be between 2.15 and 2.25

  • HDR source may be delivered matted: letterbox, pillarbox, or windowbox.

    • The HDR source may be delivered in its full-frame state with metadata included to specify the crop rectangle. See Music Video Single Crop Dimensions Metadata Example in the Apple Music Specification for details.

    • If the HDR source file is not delivered matted or if there are no inactive pixels, Apple recommends setting all crop dimension attributes to '0' (zero).

  • All video must begin and end with at least one black frame. In addition, videos that begin with or contain empty edits will be blocked; the file can contain an empty edit in its edit list only if it is the last edit. For HDR source video in Dolby® Vision format, the sidecar metadata file should cover these black frames.

For Dolby® Vision

  • Single Apple ProRes 4444 or 4444 XQ 12-bit file accompanied by a single Dolby® Vision CM metadata file (Dolby Vision CM version 2.9 and CM version 4.0 sidecar metadata files are supported)

  • Transfer function: SMPTE ST 2084 (PQ)

  • White point and color primaries: ITU-R BT.2020 or D65 P3

  • Transform matrix: BT.2020 (for BT.2020 primaries) or BT.709 (for D65 P3 primaries)

  • The Dolby® Vision CM metadata should not contain any gaps in the shots and the frames referenced in the shots should cover all frames in the video essence

For HDR10

  • Single Apple ProRes 4444 or 4444 XQ 12-bit file

  • Transfer function: SMPTE ST 2084 (PQ)

  • White point and color primaries: ITU-R BT.2020 or D65 P3

  • Transform matrix: BT.2020 (for BT.2020 primaries) or BT.709 (for D65 P3 primaries)

  • ST 2086 and MaxCLL / MaxFALL metadata provided as additional source attributes

Music Video 4K Source Profile

  • Dimensions should be 3840 x 2160 (UHD) or 4096 x 2160 (DCI 4K). Any DCI 4K asset can have optional crop values (Apple strongly recommends sending crop values for DCI 4K).

  • Apple ProRes 422 HQ or 4444 or 4444 XQ

  • VBR expected at ~880 Mbps for 422 HQ, ~1320 Mbps for 4444 and ~2000 Mbps for 4444 XQ

  • Content should be encoded using ITU-R BT.709 color space. For more information see https://www.itu.int/rec/R-REC-BT.709/en

  • Content should be delivered in the original frame rate of the source

  • 4K source must be progressive scan and can be delivered in 23.976, 24, 25, 29.97, or 30 frames per second

  • Gamma values are accepted and the value must be between 2.15 and 2.25

  • 4K source may be delivered matted: letterbox, pillarbox, or windowbox.

    • The 4K source may be delivered in its full-frame state with metadata included to specify the crop rectangle. See Music Video Single Crop Dimensions Metadata Example in the Apple Music Specification for details.

    • If the 4K source file is not delivered matted or if there are no inactive pixels, Apple recommends setting all crop dimension attributes to '0' (zero).

  • All video must begin and end with at least one black frame. In addition, videos that begin with or contain empty edits will be blocked; the file can contain an empty edit in its edit list only if it is the last edit.

Music Video Audio Source Profile

If 5.1 Surround is available for a music video audio source, the audio should be delivered in 5.1 Surround in addition to providing a stereo version; otherwise the audio may be delivered in Stereo only.

Surround

  • LPCM in either Big Endian or Little Endian, 16-bit or 24-bit, at least 48 kHz

  • Expected channels: L, R, C, LFE, Ls, Rs

Stereo

  • MPEG-1 layer II stereo

  • 384 kbps

  • 48 kHz

  • Included in the same file as the delivered video

Music Video Audio/Video Container

  • Deliver all content in an MPEG-2 Program Stream file container

  • The .mpg file extension is expected for all MPEG-2 content

  • Audio must be delivered muxed with the video stream

Music Video Closed Captioning Profile

Note: Closed captioning can be sent with ProRes and MPEG-2 files.

  • Text in EIA-608 format.

  • Delivered in the same package with the video it references.

  • In a Scenarist SCC formatted file, using .scc file extension.

  • The timecode frame rate can only be 29.97 and is independent of your video source frame rate. The timecode format, however, must match the timecode format of the source video, either drop frame (DF) or non-drop frame (NDF).

    • Drop frame format has colons for the first two time delimiters and a semi-colon for the last time delimiter (HH:MM:SS;FF)

    • Non-drop frame format has colons for all the time delimiters (HH:MM:SS:FF)

Source Video Frame Rate | Closed Caption Frame Rate | Description | Timecode Format | Timecode Example
29.97, 59.94i | 29.97 | NTSC Video | DF | HH:MM:SS;FF
25, 50i | 29.97 | PAL Video | NDF | HH:MM:SS:FF
24 | 29.97 | Film | NDF | HH:MM:SS:FF
23.976 | 29.97 | NTSC Film | NDF | HH:MM:SS:FF

  • Captions should display and synchronize to within one second of the initial, audible dialog to be represented in text.
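The drop frame and non-drop frame formats differ only in the last delimiter, so the format of an SCC timecode can be detected with a regular expression and compared against the frame-rate table above. A minimal sketch with hypothetical names; it treats the table as a fixed lookup, whereas the governing rule in the text is that the caption timecode format matches the source video's own timecode format.

```python
import re

# HH:MM:SS followed by ';FF' (drop frame) or ':FF' (non-drop frame).
SCC_TIMECODE = re.compile(r"^\d{2}:\d{2}:\d{2}([:;])\d{2}$")

# Timecode format listed for each source video frame rate in the table.
EXPECTED_FORMAT = {
    "29.97": "DF", "59.94i": "DF",
    "25": "NDF", "50i": "NDF",
    "24": "NDF", "23.976": "NDF",
}


def timecode_format(timecode):
    """Return 'DF' or 'NDF' for an SCC timecode string."""
    match = SCC_TIMECODE.match(timecode)
    if match is None:
        raise ValueError(f"not an SCC timecode: {timecode!r}")
    return "DF" if match.group(1) == ";" else "NDF"


def caption_timecode_ok(source_frame_rate, timecode):
    """Compare a caption timecode with the format listed for the source."""
    return timecode_format(timecode) == EXPECTED_FORMAT[source_frame_rate]
```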

Notes:

The timecodes of the captions are relative to the start of the program, and not the QuickTime movie's timecode track.

To create EIA-608 closed captions, use Final Cut Pro or a third-party captions-authoring app to output an SCC text file.

Music Video Screen Capture Image Profile

Note: Apple strongly recommends using the image_time attribute to deliver the timecode for the screen capture over sending an asset file. Apple will use the frame specified with the timecode to create the image for you. See Music Video Single Metadata Annotations in the Apple Music Specification for details.

  • Screen capture directly from delivered video (specifying the timecode with the image_time attribute in XML is preferred)

  • 16:9 or 4:3 (for legacy videos) aspect ratio

  • Preferred size: 3840 x 2160 - 4K resolution

  • Minimum size (below this size will not be accepted): at least 1920 wide or 1080 high

  • JPEG with .jpg extension (quality unconstrained) or PNG with .png extension

  • RGB color profile (screen standard)

  • Only the active pixel area may be included (no letterboxing / black bars on outer portions of image to fulfill required aspect ratio)

Video screen capture will be rejected if it includes any of the following:

  • Motion blurred imagery

  • Low quality or pixelated imagery

  • Harsh crops of faces, arms, shoulders

  • Text containing the artist name or music video title

  • Third-party trademarks without authorization or usage rights

  • Unintentionally closed eyes

Excessively blurry or pixelated images will be rejected.

Important:

  • Do not increase the size of a smaller image to meet the minimum size standard.

  • CMYK color profile images will not be accepted.

  • Images must be taken directly from the video.

TTML File Format

Overview

This chapter describes the file format used to deliver song lyrics for an album song track. (The file format does not apply to music video tracks.) The file format is a restricted dialect of TTML Version 1.0 (https://www.w3.org/TR/ttml1/) that also includes some extensions for the Apple implementation. The lyrics file must be delivered in the correct format with the .ttml extension. Once the TTML document has been created and saved with the .ttml extension, it is delivered as a file in the <lyrics_file> block for a track.

Note: To understand this document, you should have some familiarity with TTML.

Overview of TTML for Lyrics

Provide lyrics for songs in an album by delivering documents called lyrics files. Lyrics files are formatted according to the TTML format, with extensions for features specific to the Apple implementation. Each document contains the lyrics for one song. For each <track> tag in the metadata.xml file, specify the lyrics file using the <lyrics_file> tag. See Basic Music Album Metadata Example for information on how to deliver the lyrics file.

Note: You cannot deliver lyrics for music videos.

Lyrics files can specify line-by-line lyrics or beat-by-beat lyrics.

For annotations on the XML formats, see Lyrics Example Annotations.

Note: You can still send lyrics in plain text format using the <lyrics> tag, however, sending lyrics in HTML format is being deprecated.

The following sections briefly describe the standard TTML content elements for lyrics. Throughout this chapter, the marker "Apple extension symbol" indicates an Apple implementation or recommendation.

Lyrics Structure

The basic structure of a song is based on the standard TTML content elements.

Lyrics Content

TTML Element

Use

Song

<body>

The <body> element block encloses the tags used to deliver the lyrics.

Paragraphs

<div>

A <div> block encloses the tags used to deliver the lyrics for one paragraph (or line).

Apple extension symbol In the <div> block, use the itunes:song-part attribute to identify its compositional purpose in the song. See Apple Extension for Lyrics below.

Lines

<p>

A <p> tag encloses each line in the lyrics.

Apple extension symbol Do not use line breaks (<br>). Instead, use <p> tags to delimit lines.

  • In the <p> tag, use the ttm:agent attribute to indicate which artist is performing the line. See Agent below.

  • In the <p> tag, add begin and end attributes to apply line-by-line timing to lines. See Timing below.

Word or Phrase

<span>

A <span> tag is used within a <p> tag to apply special conditions to a word or phrase.

  • In the <span> tag, add the ttm:agent attribute to indicate which artist is performing the word or phrase. See Agent below.

  • In the <span> tag, add begin and end attributes to apply beat-by-beat timing to a word or phrase. See Timing below.

Apple Extension for Lyrics

Apple extension symbol Song Parts: To distinguish between song parts, add the itunes:song-part attribute to a <div> block to note its compositional purpose in the song.

Example: <div itunes:song-part="Chorus">

You can use any value you want for the itunes:song-part attribute, but Apple recommends the following values:

  • Verse

  • Chorus

  • PreChorus

  • Bridge

  • Intro

  • Outro

  • Refrain

  • Instrumental

Note: Currently, song parts are not displayed to the users.

Agent

Add the ttm:agent attribute to any <p> or <span> element to indicate which artist is performing the contents.

Timing

Optionally add timing to the lyrics so that the lyrics can be synchronized to the timing of the words sung by the singers, like the display of lyrics in a sing-along.

A time value is expressed as a Synchronized Multimedia Integration Language (SMIL) full or partial clock-value:

(Hours ":")? Minutes ":" Seconds ("." Fraction up to 3 digits)?

Specifying hours and fractions of seconds is optional (as indicated by the ?).
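The clock-value grammar above maps directly onto a regular expression. The following sketch converts a time value to integer milliseconds; the function name is hypothetical, and it accepts any number of digits for hours, minutes, and seconds because the grammar shown here does not state digit counts or range limits.

```python
import re

# (Hours ":")? Minutes ":" Seconds ("." Fraction up to 3 digits)?
CLOCK_VALUE = re.compile(r"^(?:(\d+):)?(\d+):(\d+)(?:\.(\d{1,3}))?$")


def parse_clock_value(text):
    """Convert a clock value such as '02:29.720' to integer milliseconds."""
    match = CLOCK_VALUE.match(text)
    if match is None:
        raise ValueError(f"invalid clock value: {text!r}")
    hours, minutes, seconds, fraction = match.groups()
    whole_seconds = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    # Pad the fraction on the right so that '.5' means 500 milliseconds.
    milliseconds = int((fraction or "").ljust(3, "0"))
    return whole_seconds * 1000 + milliseconds
```

With this sketch, "02:29.720" parses to 149720 ms and "1:02:03.5" to 3723500 ms, while a value with four fractional digits is rejected.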

Rules for timing

  • Where timing occurs, you must specify both the begin and end attributes.

  • All time codes must be valid.

  • The begin time code must be before the end time code in the same element.

  • Time codes of sub-elements must be within the time codes of the parent element.

  • Time codes must be within the duration of the song.

  • The timing of <div> elements must not overlap, regardless of any ttm:agent attribute.
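The timing rules above can be checked mechanically on a lyrics file. The sketch below uses Python's standard xml.etree.ElementTree; the function names are hypothetical, and it assumes the dur attribute on <body> gives the song duration, as in the examples in this chapter. It illustrates the rules and is not Apple's validator.

```python
import re
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"  # TTML namespace used by the examples
CLOCK = re.compile(r"^(?:(\d+):)?(\d+):(\d+)(?:\.(\d{1,3}))?$")


def to_ms(text):
    """Convert a clock value such as '02:29.720' to integer milliseconds."""
    match = CLOCK.match(text or "")
    if match is None:
        raise ValueError(f"invalid time code {text!r}")
    hours, minutes, seconds, fraction = match.groups()
    whole = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    return whole * 1000 + int((fraction or "").ljust(3, "0"))


def timed_range(element):
    """Return (begin_ms, end_ms), or None when the element is untimed."""
    begin, end = element.get("begin"), element.get("end")
    if begin is None and end is None:
        return None
    if begin is None or end is None:
        raise ValueError("begin and end must be specified together")
    begin_ms, end_ms = to_ms(begin), to_ms(end)
    if begin_ms >= end_ms:
        raise ValueError("begin must be before end")
    return begin_ms, end_ms


def walk(element, outer, problems):
    """Check one element against its enclosing range, then its children."""
    name = element.tag.replace(TT, "")
    try:
        own = timed_range(element)
    except ValueError as error:
        problems.append(f"<{name}>: {error}")
        own = None
    if own and not (outer[0] <= own[0] and own[1] <= outer[1]):
        problems.append(f"<{name}> {own} is outside {outer}")
    for child in element:
        walk(child, own or outer, problems)


def check_timing(ttml_text):
    """Return a list of timing problems found in a lyrics document."""
    body = ET.fromstring(ttml_text).find(TT + "body")
    problems = []
    # Assumption: body@dur is the song duration, as in the examples.
    walk(body, (0, to_ms(body.get("dur"))), problems)
    divs = []
    for div in body.findall(TT + "div"):
        try:
            div_range = timed_range(div)
        except ValueError:
            continue  # already reported by walk()
        if div_range:
            divs.append(div_range)
    divs.sort()
    for (_, previous_end), (next_begin, _) in zip(divs, divs[1:]):
        if next_begin < previous_end:
            problems.append("<div> ranges overlap")
    return problems
```

Note that only <div> ranges are tested for overlap; <p> elements sung by different agents may overlap in time, as the chorus of the line-by-line example shows.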

Text and Whitespace

White space in the document is processed according to rules defined by [XML 1.0], §2.10, White Space Handling.

Apple extension symbol Apple strongly recommends that you adhere to the xml:space="default" rules and refrain from specifying XML whitespace handling attributes.

Line-by-Line Lyrics Example XML Delivery

The following example shows how to deliver a line-by-line lyrics document file in TTML format.

<?xml version="1.0" encoding="UTF-8"?>
<tt xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3.org/ns/ttml#styling" xmlns:itunes="http://itunes.apple.com/lyric-ttml-extensions" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xml:lang="en-US">
  <head>
    <metadata>
      <ttm:title>City of Stars</ttm:title>
      <ttm:agent type="person" xml:id="v1">
        <ttm:name type="full">Ryan Gosling</ttm:name>
      </ttm:agent>
      <ttm:agent type="person" xml:id="v2">
        <ttm:name type="full">Emma Stone</ttm:name>
      </ttm:agent>
      <ttm:agent type="group" xml:id="v3"/>
    </metadata>
  </head>
  <body dur="02:29.720">
    <div begin="00:09.327" end="00:44.651" itunes:song-part="Verse">
      <p begin="00:09.327" end="00:12.109" ttm:agent="v1">City of stars</p>
      <p begin="00:12.426" end="00:15.906" ttm:agent="v1">Are you shining just for me?</p>
      <p begin="00:18.972" end="00:21.753" ttm:agent="v1">City of stars</p>
      <p begin="00:22.024" end="00:25.552" ttm:agent="v1">There's so much that I can't see</p>
      <p begin="00:28.052" end="00:30.752" ttm:agent="v1">Who knows?</p>
      <p begin="00:31.373" end="00:37.357" ttm:agent="v1">I felt it from the first embrace I shared with you</p>
      <p begin="00:37.803" end="00:41.859" ttm:agent="v2">That now our dreams</p>
      <p begin="00:41.957" end="00:44.651" ttm:agent="v2">May finally come true</p>
    </div>
    <div begin="00:48.026" end="01:14.408" itunes:song-part="Verse">
      <p begin="00:48.026" end="00:50.585" ttm:agent="v2">City of stars</p>
      <p begin="00:50.757" end="00:54.257" ttm:agent="v2">Just one thing everybody wants</p>
      <p begin="00:57.199" end="00:59.753" ttm:agent="v2">There in the bars</p>
      <p begin="00:59.753" end="01:05.401" ttm:agent="v2">And through the smokescreen of the crowded restaurants</p>
      <p begin="01:05.637" end="01:11.699" ttm:agent="v2">It's love, yes, all we're looking for is love</p>
      <p begin="01:11.699" end="01:14.408" ttm:agent="v2">From someone else</p>
    </div>
    <div begin="01:15.080" end="01:54.303" itunes:song-part="Chorus">
      <p begin="01:15.080" end="01:16.660" ttm:agent="v1">A rush</p>
      <p begin="01:16.159" end="01:17.555" ttm:agent="v2">A glance</p>
      <p begin="01:17.304" end="01:18.858" ttm:agent="v1">A touch</p>
      <p begin="01:18.357" end="01:19.705" ttm:agent="v2">A dance</p>
      <p begin="01:19.705" end="01:22.873" ttm:agent="v3">Look in somebody's eyes</p>
      <p begin="01:22.874" end="01:25.029" ttm:agent="v3">To light up the skies</p>
      <p begin="01:25.030" end="01:28.223" ttm:agent="v3">To open the world and send it reeling</p>
      <p begin="01:28.324" end="01:34.000" ttm:agent="v3">A voice that says, "I'll be here, and you'll be alright"</p>
      <p begin="01:37.001" end="01:39.905" ttm:agent="v3">I don't care if I know</p>
      <p begin="01:40.003" end="01:42.126" ttm:agent="v3">Just where I will go</p>
      <p begin="01:42.126" end="01:45.302" ttm:agent="v3">'Cause all that I need's this crazy feeling</p>
      <p begin="01:45.304" end="01:48.958" ttm:agent="v3">A rat, tat, tat on my heart</p>
      <p begin="01:49.703" end="01:54.303" ttm:agent="v1">Think I want it to stay</p>
    </div>
    <div begin="01:55.855" end="02:16.755" itunes:song-part="Outro">
      <p begin="01:55.855" end="01:58.807" ttm:agent="v1">City of stars</p>
      <p begin="01:59.104" end="02:02.506" ttm:agent="v1">Are you shining just for me?</p>
      <p begin="02:05.936" end="02:08.957" ttm:agent="v1">City of stars</p>
      <p begin="02:10.380" end="02:16.755" ttm:agent="v2">You never shined so brightly</p>
    </div>
  </body>
</tt>

Beat-by-Beat Lyrics Example XML Delivery

The following example shows how to deliver a beat-by-beat lyrics document file in TTML format.

<?xml version="1.0" encoding="UTF-8"?><tt xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3.org/ns/ttml#styling" xmlns:itunes="http://itunes.apple.com/lyric-ttml-extensions" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xml:lang="en-US"> <head> <metadata> <ttm:title>Dancing With A Stranger</ttm:title> <ttm:agent type="person" xml:id="v1"> <ttm:name type="full">Sam Smith</ttm:name> </ttm:agent> <ttm:agent type="person" xml:id="v2"> <ttm:name type="full">Normani</ttm:name> </ttm:agent> <ttm:agent type="group" xml:id="v3"/> </metadata> </head> <body dur="02:51.030"> <div begin="00:07.621" end="00:45.268" itunes:song-part="Verse"> <p begin="00:07.621" end="00:10.267" ttm:agent="v1"> <span begin="00:07.621" end="00:07.920">I</span> <span begin="00:07.920" end="00:08.253">don't</span> <span begin="00:08.253" end="00:08.739">wanna</span> <span begin="00:08.739" end="00:09.103">be</span> <span begin="00:09.103" end="00:09.655">alone</span> <span begin="00:09.655" end="00:10.267">tonight</span> </p> <p begin="00:10.961" end="00:15.046" ttm:agent="v1"> <span begin="00:10.961" end="00:12.244">It's</span> <span begin="00:12.244" end="00:12.828">pretty</span> <span begin="00:12.828" end="00:13.177">clear</span> <span begin="00:13.177" end="00:13.462">that</span> <span begin="00:13.462" end="00:13.761">I'm</span> <span begin="00:13.761" end="00:14.046">not</span> <span begin="00:14.046" end="00:14.494">over</span> <span begin="00:14.494" end="00:15.046">you</span> </p> <p begin="00:16.828" end="00:20.471" ttm:agent="v1"> <span begin="00:16.828" end="00:17.210">I'm</span> <span begin="00:17.210" end="00:17.527">still</span> <span begin="00:17.527" end="00:18.127">thinking</span> <span begin="00:18.127" end="00:18.394">'bout</span> <span begin="00:18.394" end="00:18.676">the</span> <span begin="00:18.676" end="00:19.178">things</span> <span begin="00:19.178" end="00:19.594">you</span> <span begin="00:19.594" end="00:20.471">do</span> </p> <p begin="00:21.271" 
end="00:24.175" ttm:agent="v1"> <span begin="00:21.271" end="00:21.588">So</span> <span begin="00:21.588" end="00:21.859">I</span> <span begin="00:21.859" end="00:22.188">don't</span> <span begin="00:22.188" end="00:22.722">wanna</span> <span begin="00:22.722" end="00:23.055">be</span> <span begin="00:23.055" end="00:23.620">alone</span> <span begin="00:23.620" end="00:24.175">tonight</span> </p> <p begin="00:24.289" end="00:26.656" ttm:agent="v1"> <span begin="00:24.289" end="00:24.705">Alone</span> <span begin="00:24.705" end="00:25.321">tonight,</span> <span begin="00:25.471" end="00:25.905">alone</span> <span begin="00:25.905" end="00:26.656">tonight</span> </p> <p begin="00:27.320" end="00:29.237" ttm:agent="v1"> <span begin="00:27.320" end="00:27.686">Can</span> <span begin="00:27.686" end="00:28.003">you</span> <span begin="00:28.003" end="00:28.304">light</span> <span begin="00:28.304" end="00:28.486">the</span> <span begin="00:28.486" end="00:29.237">fire?</span> </p> <p begin="00:30.545" end="00:34.297" ttm:agent="v1"> <span begin="00:30.545" end="00:30.894">I</span> <span begin="00:30.894" end="00:31.211">need</span> <span begin="00:31.211" end="00:32.078">somebody</span> <span begin="00:32.078" end="00:32.361">who</span> <span begin="00:32.361" end="00:32.662">can</span> <span begin="00:32.662" end="00:32.995">take</span> <span begin="00:32.995" end="00:34.297">control</span> </p> <p begin="00:35.177" end="00:39.449" ttm:agent="v1"> <span begin="00:35.177" end="00:35.529">I</span> <span begin="00:35.529" end="00:35.878">know</span> <span begin="00:35.878" end="00:36.761">exactly</span> <span begin="00:36.761" end="00:37.028">what</span> <span begin="00:37.028" end="00:37.326">I</span> <span begin="00:37.326" end="00:37.678">need</span> <span begin="00:37.678" end="00:38.177">to</span> <span begin="00:38.177" end="00:39.449">do</span> </p> <p begin="00:39.868" end="00:42.815" ttm:agent="v1"> <span begin="00:39.868" end="00:40.217">'Cause</span> <span 
begin="00:40.217" end="00:40.516">I</span> <span begin="00:40.516" end="00:40.833">don't</span> <span begin="00:40.833" end="00:41.401">wanna</span> <span begin="00:41.401" end="00:41.684">be</span> <span begin="00:41.684" end="00:42.268">alone</span> <span begin="00:42.268" end="00:42.815">tonight</span> </p> <p begin="00:42.935" end="00:45.268" ttm:agent="v1"> <span begin="00:42.935" end="00:43.401">Alone</span> <span begin="00:43.401" end="00:43.961">tonight,</span> <span begin="00:44.142" end="00:44.625">alone</span> <span begin="00:44.625" end="00:45.268">tonight</span> </p> </div> <div begin="00:46.537" end="01:10.357" itunes:song-part="Chorus"> <p begin="00:46.537" end="00:49.038" ttm:agent="v1"> <span begin="00:46.537" end="00:47.238">Look</span> <span begin="00:47.238" end="00:47.537">what</span> <span begin="00:47.537" end="00:47.820">you</span> <span begin="00:47.820" end="00:48.121">made</span> <span begin="00:48.121" end="00:48.406">me</span> <span begin="00:48.406" end="00:49.038">do</span> </p> <p begin="00:49.214" end="00:51.121" ttm:agent="v1"> <span begin="00:49.214" end="00:49.529">I'm</span> <span begin="00:49.529" end="00:49.846">with</span> <span begin="00:49.846" end="00:50.745">somebody</span> <span begin="00:50.745" end="00:51.121">new</span> </p> <p begin="00:51.228" end="00:55.675" ttm:agent="v1"> <span begin="00:51.228" end="00:51.879">Ooh,</span> <span begin="00:51.879" end="00:52.463">baby,</span> <span begin="00:52.463" end="00:53.012">baby,</span> <span begin="00:53.012" end="00:53.345">I'm</span> <span begin="00:53.345" end="00:54.180">dancing</span> <span begin="00:54.180" end="00:54.513">with</span> <span begin="00:54.513" end="00:54.831">a</span> <span begin="00:54.831" end="00:55.675">stranger</span> </p> <p begin="00:55.921" end="00:58.332" ttm:agent="v1"> <span begin="00:55.921" end="00:56.521">Look</span> <span begin="00:56.521" end="00:56.820">what</span> <span begin="00:56.820" end="00:57.153">you</span> <span 
begin="00:57.153" end="00:57.487">made</span> <span begin="00:57.487" end="00:57.769">me</span> <span begin="00:57.769" end="00:58.332">do</span> </p> <p begin="00:58.524" end="01:00.468" ttm:agent="v1"> <span begin="00:58.524" end="00:58.876">I'm</span> <span begin="00:58.876" end="00:59.209">with</span> <span begin="00:59.209" end="01:00.041">somebody</span> <span begin="01:00.041" end="01:00.468">new</span> </p> <p begin="01:00.572" end="01:05.714" ttm:agent="v1"> <span begin="01:00.572" end="01:01.153">Ooh,</span> <span begin="01:01.153" end="01:01.788">baby,</span> <span begin="01:01.788" end="01:02.321">baby,</span> <span begin="01:02.321" end="01:02.639">I'm</span> <span begin="01:02.639" end="01:03.505">dancing</span> <span begin="01:03.505" end="01:03.823">with</span> <span begin="01:03.823" end="01:04.137">a</span> <span begin="01:04.137" end="01:04.763">strang</span> <span begin="01:04.763" end="01:05.714">er</span> </p> <p begin="01:07.255" end="01:10.357" ttm:agent="v1"> <span begin="01:07.255" end="01:08.186">Dancing</span> <span begin="01:08.186" end="01:08.471">with</span> <span begin="01:08.471" end="01:08.805">a</span> <span begin="01:08.805" end="01:09.410">strang</span> <span begin="01:09.410" end="01:10.357">er</span> </p> </div> <div begin="01:12.395" end="01:31.914" itunes:song-part="Verse"> <p begin="01:12.395" end="01:15.728" ttm:agent="v2"> <span begin="01:12.395" end="01:12.795">I</span> <span begin="01:12.795" end="01:13.328">wasn't</span> <span begin="01:13.328" end="01:13.960">even</span> <span begin="01:13.960" end="01:14.576">going</span> <span begin="01:14.576" end="01:14.867">out</span> <span begin="01:14.867" end="01:15.728">tonight</span> </p> <p begin="01:17.141" end="01:20.976" ttm:agent="v2"> <span begin="01:17.141" end="01:17.491">But</span> <span begin="01:17.491" end="01:17.808">boy,</span> <span begin="01:17.808" end="01:18.141">I</span> <span begin="01:18.141" end="01:18.408">need</span> <span begin="01:18.408" 
end="01:18.691">to</span> <span begin="01:18.691" end="01:18.976">get</span> <span begin="01:18.976" end="01:19.259">you</span> <span begin="01:19.259" end="01:19.607">off</span> <span begin="01:19.607" end="01:19.887">of</span> <span begin="01:19.887" end="01:20.257">my</span> <span begin="01:20.257" end="01:20.976">mind</span> </p> <p begin="01:21.843" end="01:25.984" ttm:agent="v2"> <span begin="01:21.843" end="01:22.158">I</span> <span begin="01:22.158" end="01:22.443">know</span> <span begin="01:22.443" end="01:23.360">exactly</span> <span begin="01:23.360" end="01:23.643">what</span> <span begin="01:23.643" end="01:23.926">I</span> <span begin="01:23.926" end="01:24.344">have</span> <span begin="01:24.344" end="01:24.774">to</span> <span begin="01:24.774" end="01:25.984">do</span> </p> <p begin="01:26.678" end="01:29.484" ttm:agent="v2"> <span begin="01:26.678" end="01:27.062">I</span> <span begin="01:27.062" end="01:27.395">don't</span> <span begin="01:27.395" end="01:27.929">wanna</span> <span begin="01:27.929" end="01:28.262">be</span> <span begin="01:28.262" end="01:28.862">alone</span> <span begin="01:28.862" end="01:29.484">tonight</span> </p> <p begin="01:29.628" end="01:31.914" ttm:agent="v2"> <span begin="01:29.628" end="01:30.060">Alone</span> <span begin="01:30.060" end="01:30.575">tonight,</span> <span begin="01:30.746" end="01:31.212">alone</span> <span begin="01:31.212" end="01:31.914">tonight</span> </p> </div> <div begin="01:33.183" end="02:06.889" itunes:song-part="Chorus"> <p begin="01:33.183" end="01:35.516" ttm:agent="v2"> <span begin="01:33.183" end="01:33.799">Look</span> <span begin="01:33.799" end="01:34.116">what</span> <span begin="01:34.116" end="01:34.434">you</span> <span begin="01:34.434" end="01:34.767">made</span> <span begin="01:34.767" end="01:35.050">me</span> <span begin="01:35.050" end="01:35.516">do</span> </p> <p begin="01:35.841" end="01:37.660" ttm:agent="v2"> <span begin="01:35.841" end="01:36.174">I'm</span> <span 
begin="01:36.174" end="01:36.457">with</span> <span begin="01:36.457" end="01:37.324">somebody</span> <span begin="01:37.324" end="01:37.660">new</span> </p> <p begin="01:37.921" end="01:42.396" ttm:agent="v2"> <span begin="01:37.921" end="01:38.454">Oh,</span> <span begin="01:38.454" end="01:39.054">baby,</span> <span begin="01:39.054" end="01:39.604">baby,</span> <span begin="01:39.604" end="01:39.905">I'm</span> <span begin="01:39.905" end="01:40.804">dancing</span> <span begin="01:40.804" end="01:41.121">with</span> <span begin="01:41.121" end="01:41.406">a</span> <span begin="01:41.406" end="01:42.396">stranger</span> </p> <p begin="01:42.543" end="01:44.887" ttm:agent="v2"> <span begin="01:42.543" end="01:43.111">Look</span> <span begin="01:43.111" end="01:43.410">what</span> <span begin="01:43.410" end="01:43.746">you</span> <span begin="01:43.746" end="01:44.044">made</span> <span begin="01:44.044" end="01:44.343">me</span> <span begin="01:44.343" end="01:44.887">do</span> </p> <p begin="01:45.154" end="01:46.957" ttm:agent="v2"> <span begin="01:45.154" end="01:45.453">I'm</span> <span begin="01:45.453" end="01:45.754">with</span> <span begin="01:45.754" end="01:46.671">somebody</span> <span begin="01:46.671" end="01:46.957">new</span> </p> <p begin="01:47.178" end="01:51.922" ttm:agent="v2"> <span begin="01:47.178" end="01:47.778">Oh,</span> <span begin="01:47.778" end="01:48.362">baby,</span> <span begin="01:48.362" end="01:48.911">baby,</span> <span begin="01:48.911" end="01:49.245">I'm</span> <span begin="01:49.245" end="01:50.111">dancing</span> <span begin="01:50.111" end="01:50.445">with</span> <span begin="01:50.445" end="01:50.746">a</span> <span begin="01:50.746" end="01:51.922">stranger</span> </p> <p begin="01:53.860" end="01:57.321" ttm:agent="v3"> <span begin="01:53.860" end="01:54.727">Dancing</span> <span begin="01:54.727" end="01:55.092">with</span> <span begin="01:55.092" end="01:55.393">a</span> <span begin="01:55.393" 
end="01:56.106">strang</span> <span begin="01:56.106" end="01:57.321">er</span> </p> <p begin="01:58.560" end="02:01.904" ttm:agent="v3"> <span begin="01:58.560" end="01:59.424">Dancing</span> <span begin="01:59.424" end="01:59.757">with</span> <span begin="01:59.757" end="02:00.074">a</span> <span begin="02:00.074" end="02:00.823">strang</span> <span begin="02:00.823" end="02:01.904">er</span> </p> <p begin="02:03.210" end="02:05.197" ttm:agent="v1"> <span begin="02:03.210" end="02:04.461">Dancing,</span> <span begin="02:04.461" end="02:05.197">yeah</span> </p> <p begin="02:04.801" end="02:06.889" ttm:agent="v2"> <span begin="02:04.801" end="02:05.741">Ooh,</span> <span begin="02:05.741" end="02:06.440">ooh-</span> <span begin="02:06.440" end="02:06.889">yeah</span> </p> </div> <div begin="02:10.453" end="02:29.249" itunes:song-part="Outro"> <p begin="02:10.453" end="02:12.827" ttm:agent="v1"> <span begin="02:10.453" end="02:11.103">Look</span> <span begin="02:11.103" end="02:11.437">what</span> <span begin="02:11.437" end="02:11.738">you</span> <span begin="02:11.738" end="02:12.071">made</span> <span begin="02:12.071" end="02:12.370">me</span> <span begin="02:12.370" end="02:12.827">do</span> </p> <p begin="02:13.187" end="02:15.036" ttm:agent="v1"> <span begin="02:13.187" end="02:13.520">I'm</span> <span begin="02:13.520" end="02:13.822">with</span> <span begin="02:13.822" end="02:14.720">somebody</span> <span begin="02:14.720" end="02:15.036">new</span> </p> <p begin="02:15.179" end="02:19.646" ttm:agent="v1"> <span begin="02:15.179" end="02:15.830">Ooh,</span> <span begin="02:15.830" end="02:16.395">baby,</span> <span begin="02:16.395" end="02:16.896">baby,</span> <span begin="02:16.896" end="02:17.246">I'm</span> <span begin="02:17.246" end="02:18.080">dancing</span> <span begin="02:18.080" end="02:18.395">with</span> <span begin="02:18.395" end="02:18.712">a</span> <span begin="02:18.712" end="02:19.646">stranger</span> </p> <p begin="02:19.835" 
end="02:22.136" ttm:agent="v3"> <span begin="02:19.835" end="02:20.417">Look</span> <span begin="02:20.417" end="02:20.753">what</span> <span begin="02:20.753" end="02:21.051">you</span> <span begin="02:21.051" end="02:21.369">made</span> <span begin="02:21.369" end="02:21.683">me</span> <span begin="02:21.683" end="02:22.136">do</span> </p> <p begin="02:22.497" end="02:24.374" ttm:agent="v3"> <span begin="02:22.497" end="02:22.798">I'm</span> <span begin="02:22.798" end="02:23.065">with</span> <span begin="02:23.065" end="02:23.982">somebody</span> <span begin="02:23.982" end="02:24.374">new</span> </p> <p begin="02:24.510" end="02:29.249" ttm:agent="v3"> <span begin="02:24.510" end="02:25.059">Ooh,</span> <span begin="02:25.059" end="02:25.710">baby,</span> <span begin="02:25.710" end="02:26.209">baby,</span> <span begin="02:26.209" end="02:26.542">I'm</span> <span begin="02:26.542" end="02:27.411">dancing</span> <span begin="02:27.411" end="02:27.745">with</span> <span begin="02:27.745" end="02:28.043">a</span> <span begin="02:28.043" end="02:29.249">stranger</span> </p> </div> <div begin="02:29.648" end="02:48.133" itunes:song-part="Outro"> <p begin="02:29.648" end="02:32.602" ttm:agent="v3"> <span begin="02:29.648" end="02:30.000">I'm</span> <span begin="02:30.000" end="02:30.917">dancing,</span> <span begin="02:30.917" end="02:31.232">I'm</span> <span begin="02:31.232" end="02:32.602">dancing</span> </p> <p begin="02:34.324" end="02:39.579" ttm:agent="v1"> <span ttm:role="x-bg"> <span begin="02:34.324" end="02:34.690">(I'm</span> <span begin="02:34.690" end="02:35.572">dancing,</span> <span begin="02:35.572" end="02:35.874">I'm</span> <span begin="02:35.874" end="02:37.266">dancing)</span> </span> <span begin="02:35.747" end="02:36.677">Dancing</span> <span begin="02:36.677" end="02:37.027">with</span> <span begin="02:37.027" end="02:37.344">a</span> <span begin="02:37.344" end="02:38.253">strang</span> <span begin="02:38.253" end="02:39.579">er</span> </p> 
<p begin="02:38.994" end="02:43.927" ttm:agent="v1"> <span ttm:role="x-bg"> <span begin="02:38.994" end="02:39.413">(I'm</span> <span begin="02:39.413" end="02:40.178">dancing,</span> <span begin="02:40.178" end="02:40.562">I'm</span> <span begin="02:40.562" end="02:41.978">dancing)</span> </span> <span begin="02:40.407" end="02:41.325">Dancing</span> <span begin="02:41.325" end="02:41.671">with</span> <span begin="02:41.671" end="02:41.941">a</span> <span begin="02:41.941" end="02:42.860">strang</span> <span begin="02:42.860" end="02:43.927">er</span> </p> <p begin="02:43.690" end="02:48.133" ttm:agent="v2"> <span ttm:role="x-bg"> <span begin="02:43.690" end="02:44.039">(I'm</span> <span begin="02:44.039" end="02:44.874">dancing,</span> <span begin="02:44.874" end="02:45.191">I'm</span> <span begin="02:45.191" end="02:46.495">dancing)</span> </span> <span begin="02:45.163" end="02:46.061">Dancing</span> <span begin="02:46.061" end="02:46.411">with</span> <span begin="02:46.411" end="02:46.747">a</span> <span begin="02:46.747" end="02:47.558">strang</span> <span begin="02:47.558" end="02:48.133">er</span> </p> </div> </body></tt>

Lyrics Example Annotations

The Apple extension symbol in the annotations indicates an Apple-specific implementation or recommendation.

The reference numbers (for example, §7.2.2) refer to the section in the TTML specification in which the element is explained.

<?xml version="1.0" encoding="UTF-8"?>

XML Declaration (required)

The character encoding of your document must be defined.

Apple accepts only UTF-8 encoding as it efficiently encodes non-Roman characters.

Important: The TTML file must not contain a byte-order mark (BOM).
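The encoding rules above can be checked on the raw bytes of a file before delivery. The following is a minimal sketch, not an Apple tool; the function name check_ttml_encoding and the wording of the messages are hypothetical.

```python
import codecs


def check_ttml_encoding(data: bytes) -> list:
    """Return a list of encoding problems found in the raw bytes of a TTML file."""
    problems = []
    # A UTF-8 byte-order mark is the three bytes EF BB BF at the start of the file.
    if data.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 byte-order mark")
    # The whole file must decode as UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    return problems
```

An empty list means that neither problem was found. The check does not confirm that the XML declaration itself names UTF-8.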

<tt xmlns="http://www.w3.org/ns/ttml"

  xmlns:tts="http://www.w3.org/ns/ttml#styling"

  xmlns:itunes="http://itunes.apple.com/lyric-ttml-extensions"

  xmlns:ttm="http://www.w3.org/ns/ttml#metadata"

Document Container §7.1.1 (required)

The tt element begins the iTunes Timed Text (iTT) document container.

The xmlns (for XML namespace) attribute is required and is needed for schema validation. It is used to declare the namespace (and associated schema) to which the tags in the XML are expected to conform. The namespace must be http://www.w3.org/ns/ttml.

Apple extension symbol Supply the XML namespace to allow the use of Apple extensions: http://itunes.apple.com/lyric-ttml-extensions. This example uses the prefix itunes (declared as xmlns:itunes), but that prefix is not required; you can use any prefix.

xml:lang="en-US">

Language §7.2.2 (required)

The xml:lang attribute indicates the language and optional dialect used in the lyrics. It uses the same format that is described for the locale attribute in Language Codes.
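As an illustration of the two required items on the tt element, the sketch below parses a document with Python's standard xml.etree.ElementTree module and confirms the TTML namespace and the presence of xml:lang. The function name check_root is hypothetical, and the sketch does not validate the language code itself.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace bound to the xml: prefix


def check_root(ttml: bytes) -> list:
    """Return problems with the namespace and xml:lang of the root element."""
    root = ET.fromstring(ttml)
    problems = []
    # ElementTree expands prefixes, so the tag reads {namespace-URI}local-name.
    if root.tag != f"{{{TTML_NS}}}tt":
        problems.append("root element is not tt in the TTML namespace")
    if not root.get(f"{{{XML_NS}}}lang"):
        problems.append("xml:lang is missing on the tt element")
    return problems
```

Because the parser compares namespace URIs rather than prefixes, the same check works whichever prefix a document chooses for the Apple extensions.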

Head Element

<head>

Head §7.1.2 (required)

Begins the head element, which specifies document metadata, such as title, and other document-specific information, such as the artists performing.

Note: Apple extension symbol Any styling and layout elements included in the Head element are ignored by Apple processing.

<metadata>

Metadata §8.1.1 (required)

Begins the metadata block.

  <ttm:title>City of Stars</ttm:title>

Title §12.1.2 (required)

Supplies the title of the song for the lyrics you are delivering.

<ttm:agent type="person" xml:id="v1">

Agent §12.1.5 (optional)

The ttm:agent element is used to indicate all the artists who perform vocal parts of the lyrics. Later in the TTML document when you supply the lyrics, you can indicate which artist is performing a line or words by supplying the value provided with the xml:id attribute. See the annotations for the <p> and <span> elements for how to use the ttm:agent and xml:id attributes.

For example, if multiple agents are identified, the Apple Music Duet feature shows lyrics on opposite sides of the screen.

The attributes supplied with the ttm:agent element include the type and the xml:id.

The type attribute indicates the type of performer:

  • Use "person" for lines sung by an individual vocalist

  • Use "group" for lines sung by multiple vocalists

  • Use "other" for other cases

The xml:id attribute supplies the reference ID used to refer to the artist.

Apple extension symbol The ttm:agent attribute can be specified only at one element in the hierarchy. For example, if you specify ttm:agent in a parent element (div), you cannot specify ttm:agent in a child element (p).
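One way to check the single-level rule for the ttm:agent attribute is to walk the document and count elements that carry the attribute while an ancestor already carries it. This is a sketch under that reading of the rule; the function name nested_agent_conflicts is hypothetical.

```python
import xml.etree.ElementTree as ET

# Expanded name of the ttm:agent attribute (not the ttm:agent element in the head).
TTM_AGENT = "{http://www.w3.org/ns/ttml#metadata}agent"


def nested_agent_conflicts(ttml: bytes) -> int:
    """Count elements that carry ttm:agent while an ancestor already carries it."""
    root = ET.fromstring(ttml)
    conflicts = 0

    def walk(element, ancestor_has_agent):
        nonlocal conflicts
        has_agent = TTM_AGENT in element.attrib
        if has_agent and ancestor_has_agent:
            conflicts += 1
        for child in element:
            walk(child, ancestor_has_agent or has_agent)

    walk(root, False)
    return conflicts
```

A result of 0 means that no element repeats ttm:agent beneath an element that already specifies it.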

<ttm:name type="full">Ryan Gosling</ttm:name>

Name §12.1.6 (optional)

The ttm:name element supplies the name and name type of the performer.

The type attribute is required. The value must be "full".

</ttm:agent>

Ends the ttm:agent block.

</metadata>

Ends the metadata block.

Body Element

<body dur="02:29.720">

Body §7.1.3 (required)

Begins the body element, which is used to determine the lyrics and their timings. The <body> element is required, but the dur attribute is optional.

<div begin="00:09.327" end="00:44.651" itunes:song-part="Verse">

Div §7.1.4 (required)

Begins the section within the document where you enter the actual text of the lyrics and the timing.

Each paragraph is represented by a <div> element. The lyrics document can include multiple consecutive <div> elements.

Apple extension symbol Because paragraphs are identified with <div> tags, do not use <p></br></p> to indicate blank lines.

Apple extension symbol Add the optional itunes:song-part attribute to a <div> element to note its compositional purpose in the song. Apple recommends the following values:

  • Verse

  • Chorus

  • PreChorus

  • Bridge

  • Intro

  • Outro

  • Refrain

  • Instrumental

The div blocks in the annotated examples contain the following paragraph tags (<p>). The example paragraphs are explained in the rows that follow.

Note: Currently, song parts are not displayed to the users.
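The recommended itunes:song-part values can be compared against a document with a short script. The sketch below is illustrative only; the function name unrecognized_song_parts is hypothetical. Because the values are recommendations, a reported value is a prompt for review, not necessarily an error.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
ITUNES_NS = "http://itunes.apple.com/lyric-ttml-extensions"
RECOMMENDED_PARTS = {
    "Verse", "Chorus", "PreChorus", "Bridge",
    "Intro", "Outro", "Refrain", "Instrumental",
}


def unrecognized_song_parts(ttml: bytes) -> list:
    """List song-part values on <div> elements that are not in the recommended set."""
    root = ET.fromstring(ttml)
    found = []
    for div in root.iter(f"{{{TTML_NS}}}div"):
        part = div.get(f"{{{ITUNES_NS}}}song-part")
        # The attribute is optional, so a missing value is not reported.
        if part is not None and part not in RECOMMENDED_PARTS:
            found.append(part)
    return found
```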

Line-by-line timing example

<p begin="00:09.327" end="00:12.109" ttm:agent="v1">City of stars</p>

<p begin="00:12.426" end="00:15.906" ttm:agent="v1">Are you shining just for me?</p>

<p begin="00:18.972" end="00:21.753" ttm:agent="v1">City of stars</p>

<p begin="00:22.024" end="00:25.552" ttm:agent="v1">There's so much that I can't see</p>

<p begin="00:28.052" end="00:30.752" ttm:agent="v1">Who knows?</p>

<p begin="00:31.373" end="00:37.357" ttm:agent="v1">I felt it from the first embrace I shared with you</p>

<p begin="00:37.803" end="00:41.859" ttm:agent="v2">That now our dreams</p>

<p begin="00:41.957" end="00:44.651" ttm:agent="v2">May finally come true</p>

Example Paragraph §7.1.5

Each <p> tag defines a period of time in which the lyrics in the line are sung.

Timing (optional):

For line-by-line timing, add the timing attributes (begin and end) to the <p> tag.

  • The begin attribute indicates the start time of the line of the lyrics

  • The end attribute indicates when that line ends

For time value details and rules, see Timing.

Agent:

When the lyrics are sung by different artists, you can specify which artist sings the line using the ttm:agent attribute. Each artist can be referenced by the reference ID (xml:id) you assigned in the Agent element (ttm:agent) within the Head element (see annotation above). When using the agent attribute, the following timing rules apply:

  • Timed elements for different agents can overlap.

  • Timing intervals for a single agent must not overlap.
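The two agent timing rules above can be sketched as a per-agent overlap check. The sketch makes several assumptions: it handles only the mm:ss.mmm clock form used in the examples, it compares only the begin and end attributes on <p> elements, and it treats a line that starts exactly when the previous one ends as acceptable. How background-vocal spans count toward the rule is not stated here, so treat reported lines as candidates for review. The function names to_ms and overlapping_lines are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TTML_NS = "http://www.w3.org/ns/ttml"
TTM_AGENT = "{http://www.w3.org/ns/ttml#metadata}agent"


def to_ms(clock: str) -> int:
    """Convert 'mm:ss.mmm' (the form used in the examples) to milliseconds."""
    minutes, seconds = clock.split(":")
    whole, _, frac = seconds.partition(".")
    return (int(minutes) * 60 + int(whole)) * 1000 + int(frac.ljust(3, "0")[:3])


def overlapping_lines(ttml: bytes) -> list:
    """Return (agent, earlier_begin, later_begin) for same-agent lines that overlap."""
    root = ET.fromstring(ttml)
    by_agent = defaultdict(list)
    for p in root.iter(f"{{{TTML_NS}}}p"):
        agent, begin, end = p.get(TTM_AGENT), p.get("begin"), p.get("end")
        if agent and begin and end:
            by_agent[agent].append((to_ms(begin), to_ms(end), begin))
    overlaps = []
    for agent, lines in by_agent.items():
        lines.sort()
        latest_end, latest_begin = lines[0][1], lines[0][2]
        for start, end, begin in lines[1:]:
            # Lines from different agents are never compared, so they may overlap.
            if start < latest_end:
                overlaps.append((agent, latest_begin, begin))
            if end > latest_end:
                latest_end, latest_begin = end, begin
    return overlaps
```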

Beat-by-beat timing example

<p begin="02:38.994" end="02:43.927" ttm:agent="v1">

<span ttm:role="x-bg">

<span begin="02:38.994" end="02:39.413">(I'm</span>

<span begin="02:39.413" end="02:40.178">dancing,</span>

<span begin="02:40.178" end="02:40.562">I'm</span>

<span begin="02:40.562" end="02:41.978">dancing)</span>

</span>

<span begin="02:40.407" end="02:41.325">Dancing</span>

<span begin="02:41.325" end="02:41.671">with</span>

<span begin="02:41.671" end="02:41.941">a</span>

<span begin="02:41.941" end="02:42.860">strang</span>

<span begin="02:42.860" end="02:43.927">er</span>

</p>

<p begin="02:43.690" end="02:48.133" ttm:agent="v2">

<span ttm:role="x-bg">

<span begin="02:43.690" end="02:44.039">(I'm</span>

<span begin="02:44.039" end="02:44.874">dancing,</span>

<span begin="02:44.874" end="02:45.191">I'm</span>

<span begin="02:45.191" end="02:46.495">dancing)</span>

</span>

<span begin="02:45.163" end="02:46.061">Dancing</span>

<span begin="02:46.061" end="02:46.411">with</span>

<span begin="02:46.411" end="02:46.747">a</span>

<span begin="02:46.747" end="02:47.558">strang</span>

<span begin="02:47.558" end="02:48.133">er</span>

</p>

Paragraph §7.1.5

Each <p> tag defines a period of time in which the lyrics in the line are sung.

Span §7.1.6

Use the <span> elements to specify beat-by-beat timing and to inset background vocals.

Timing (required for beat-by-beat timing):

Add timing attributes to the <span> elements within a <p> tag. To apply beat-by-beat timing to the entire song, all lines (<p> tags) must have <span> elements with timing attributes.

  • The begin attribute indicates the start time of the <span> element

  • The end attribute indicates when that <span> element ends

There’s no requirement to use <span> elements on word breaks.

Role §12.2.2

The <span> elements can use the ttm:role="x-bg" attribute to inset background vocals.

Place the <span ttm:role="x-bg"> tag either first or last in the <p> tag:

  • If the background vocal starts before the main vocals come in, place the <span ttm:role="x-bg"> tag first, in front of the main vocals <span> tag.

  • Otherwise, place the <span ttm:role="x-bg"> tag last in the <p> tag.

  </div>

 </body>

</tt>
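The span rules described in the beat-by-beat annotations can be sketched as two checks on each <p> element: every line has at least one <span> with begin and end attributes, and a <span ttm:role="x-bg"> element is either the first or the last child of its line. This is an illustrative sketch; the function name check_beat_by_beat and the message wording are hypothetical, and the check does not decide whether first or last is the correct position for a given line.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
P = f"{{{TTML_NS}}}p"
SPAN = f"{{{TTML_NS}}}span"
TTM_ROLE = "{http://www.w3.org/ns/ttml#metadata}role"


def check_beat_by_beat(ttml: bytes) -> list:
    """Return problems that would prevent beat-by-beat timing for the whole song."""
    root = ET.fromstring(ttml)
    problems = []
    for p in root.iter(P):
        label = p.get("begin", "?")
        # Timed spans may sit directly in the line or inside an x-bg wrapper span.
        timed = [s for s in p.iter(SPAN) if s.get("begin") and s.get("end")]
        if not timed:
            problems.append(f"line at {label}: no timed <span> elements")
        children = list(p)
        for index, child in enumerate(children):
            is_bg = child.tag == SPAN and child.get(TTM_ROLE) == "x-bg"
            if is_bg and index not in (0, len(children) - 1):
                problems.append(f"line at {label}: x-bg span is neither first nor last")
    return problems
```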

Television Content Profiles

ProRes File Requirements

Important: All ProRes files must contain valid integers to define field handling. Apple accepts only the following field pair values (the field atom is displayed as "fiel" in the ProRes header):

  • field count = 1, field ordering = 0

  • field count = 2, field ordering = 1

  • field count = 2, field ordering = 6

  • field count = 2, field ordering = 9

  • field count = 2, field ordering = 14

Field count is the “fields” value shown in Atom Inspector for the fiel atom. Field ordering is the “detail” value shown in Atom Inspector for the same atom.
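The accepted pairs can be held in a small lookup, which makes a pre-delivery check a single membership test. This is a sketch only; reading the two values from the file is left to whatever inspection tool you use, and the names ACCEPTED_FIEL_PAIRS and is_accepted_fiel are hypothetical.

```python
# (field count, field ordering) pairs listed as accepted in this section.
ACCEPTED_FIEL_PAIRS = {(1, 0), (2, 1), (2, 6), (2, 9), (2, 14)}


def is_accepted_fiel(field_count: int, field_ordering: int) -> bool:
    """Return True when the pair matches one of the accepted fiel combinations."""
    return (field_count, field_ordering) in ACCEPTED_FIEL_PAIRS
```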

HD TV Source Profile

  • Apple ProRes 422 (HQ) or 4444 or 4444 (XQ)

  • ITU-R BT.709 color space, file tagged correctly as 709

  • VBR expected at 88-220 Mbps

  • HD encoded dimensions accepted to support square pixel aspect ratios (PASP):

Encoded | PASP | Converted to ProRes From
1920 x 1080 | 1:1 | HDCAM SR, D5, ATSC
1280 x 720 | 1:1 | ATSC progressive

  • HD encoded dimensions accepted to support non-square pixel aspect ratios (this allows you to send HD video in the native dimensions of your best original source, for example in HD broadcast dimensions*):

Encoded | PASP | Converted to ProRes From
1440 x 1080 | 1:1.33333 | XDCAM-HD, HDCAM
1280 x 1080 | 1:1.5 | DVCProHD interlaced
960 x 720 | 1:1.33333 | DVCProHD progressive

* If your original HD source pixel aspect ratio is non-square, contact your technical representative before delivery.
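As a worked check of the non-square dimensions, multiplying the encoded width by the PASP factor gives the square-pixel width of the picture. The sketch assumes that 1.33333 is a rounded 4/3 and that the second number in each PASP entry is a horizontal stretch factor; the function name square_pixel_width is hypothetical.

```python
from fractions import Fraction


def square_pixel_width(encoded_width: int, horizontal_factor: Fraction) -> int:
    """Return the width in square pixels after applying the PASP stretch factor."""
    return round(encoded_width * horizontal_factor)


# Rows from the non-square table; 1.33333 is assumed to mean exactly 4/3.
hdcam = square_pixel_width(1440, Fraction(4, 3))        # 1440 x 1080 source
dvcpro_1080 = square_pixel_width(1280, Fraction(3, 2))  # 1280 x 1080 source
dvcpro_720 = square_pixel_width(960, Fraction(4, 3))    # 960 x 720 source
```

Under these assumptions the three rows work out to 1920, 1920, and 1280 pixels wide, which matches the square-pixel HD dimensions of 1920 x 1080 and 1280 x 720.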

  • Supported frame rates: 23.976, 24, 25, and 29.97 frames per second

  • Native frame rate of original source: 

    • 29.97 interlaced frames per second video source can be delivered either interlaced or de-interlaced and properly tagged as progressive

    • 23.976, 24, and 25 frames per second must be delivered progressive

    • Field dominance must be properly tagged (top field first, bottom field first, or progressive)

    • Fields and frames may not be duplicated or eliminated to create a broadcast frame rate (for example, telecine, NTSC to PAL conversion)

    • For mixed frame rate material, contact your technical representative

  • Interlaced content must be correctly tagged as interlaced and field ordering must be defined in the QuickTime container.

  • Crop dimensions should be supplied in the metadata for content with inactive pixels due to letterbox, pillarbox, or windowbox. Refer to Basic TV Metadata Example in the Apple Transactional TV Specification for further information.

  • Content upscaled from SD will be rejected.
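The HD frame rate and scan rules above can be sketched as a lookup. Frame rates are passed as strings to avoid floating-point comparison. The function name hd_scan_problem and the message wording are hypothetical, and the sketch covers only the supported-rate and progressive-delivery rules, not field tagging or frame duplication.

```python
from typing import Optional

PROGRESSIVE_ONLY = {"23.976", "24", "25"}
INTERLACED_OR_PROGRESSIVE = {"29.97"}


def hd_scan_problem(frame_rate: str, interlaced: bool) -> Optional[str]:
    """Return a problem description for an HD source, or None if the pairing is allowed."""
    if frame_rate in PROGRESSIVE_ONLY:
        return "must be delivered progressive" if interlaced else None
    if frame_rate in INTERLACED_OR_PROGRESSIVE:
        return None
    return "frame rate is not in the supported list"
```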

SD TV Source Profile

NTSC

Apple ProRes:

  • Apple ProRes 422 (HQ)

  • VBR expected at 40-60 Mbps

  • 720 x 480 or 720 x 486 encoded pixels; for display at either 853 x 480 for 16:9 content or 640 x 480 for 4:3 content. Properly created 720 x 486 content will have a minimum of 4 pixels of black at the top and 2 at the bottom. Crop values for top and bottom must total at least 6 pixels.

  • All encoded content must include pixel aspect ratio (pasp), preferably one that results in a display aspect ratio of 4:3 or 16:9.

  • Native frame rate of original source:

    • 29.97 frames per second video source can be delivered interlaced.

    • 24 frames per second must be delivered progressive.

    • 23.976 frames per second for inverse telecine must be delivered progressive; must not be delivered interlaced or delivery will fail.

    • Telecine materials will not be accepted.

  • Crop dimensions should be supplied in the metadata for content with inactive pixels due to letterbox, pillarbox, or windowbox. Refer to Basic TV Metadata Example in the Apple Transactional TV Specification for further information.

  • 4:3 standard definition video should not be delivered anamorphic 16:9 with matting.
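The display sizes quoted for SD ProRes follow from the pixel aspect ratio. As a sketch, the pixel aspect ratio that yields a target display aspect ratio is the display aspect ratio multiplied by the encoded height and divided by the encoded width. The function names are hypothetical, and the result is the ratio implied by the display sizes in this guide, which may differ from the value a given encoder writes.

```python
from fractions import Fraction


def pasp_for_display(width: int, height: int, display_aspect: Fraction) -> Fraction:
    """Pixel aspect ratio that makes width x height encoded pixels display at the given aspect."""
    return display_aspect * height / width


def displayed_width(width: int, pasp: Fraction) -> int:
    """Width in square pixels after the pixel aspect ratio is applied."""
    return round(width * pasp)


# NTSC, 720 x 480 encoded pixels
ntsc_wide = pasp_for_display(720, 480, Fraction(16, 9))  # 32/27
ntsc_std = pasp_for_display(720, 480, Fraction(4, 3))    # 8/9
```

With these ratios, 720 encoded pixels display at 853 pixels wide for 16:9 content and 640 pixels wide for 4:3 content. The same arithmetic applied to 720 x 576 PAL gives 1024 and 768.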

MPEG-2:

  • MPEG-2 Program Stream Main Profile

  • 4:2:0 chroma sampling

  • ITU-R BT.601 color space

  • 15 Mbps minimum

  • Long GOP

  • 640 fixed horizontal dimension

  • Variable size vertical dimension depending on aspect ratio of source, maximum size of 480

  • Square pixel aspect ratio (1:1)

  • Native frame rate of original source:

    • 29.97 interlaced frames per second video source can be delivered either interlaced or de-interlaced and properly tagged as progressive

    • 24 frames per second must be delivered progressive

    • 23.976 frames per second for inverse telecine must be delivered progressive; must not be delivered interlaced or delivery will fail

    • Field dominance must be properly tagged (top field first, bottom field first, or progressive)

    • Fields and frames may not be duplicated or eliminated to create a broadcast frame rate (for example, telecine, NTSC to PAL conversion)

    • For mixed frame rate material, contact your technical representative

  • Interlaced content must be tagged non-progressive and field ordering must be defined in the stream.

  • Crop inactive pixels and maintain fields. All edges must have active pixels for greater than 90% of the duration of the video.

  • Content may NOT be delivered letterbox, pillarbox, or windowbox.

PAL

Apple ProRes:

  • Apple ProRes 422 (HQ)

  • VBR expected at 40-60 Mbps

  • 720 x 576 encoded pixels; for display at either 1024 x 576 for 16:9 content or 768 x 576 for 4:3 content.

  • All encoded content must include pixel aspect ratio (pasp), preferably one that results in a display aspect ratio of 4:3 or 16:9.

  • Native frame rate of original source:

    • 25 frames per second video source can be delivered interlaced or de-interlaced and properly tagged as progressive.

    • Telecine materials will not be accepted.

  • Crop dimensions should be supplied in the metadata for content with inactive pixels due to letterbox, pillarbox, or windowbox. Refer to Basic TV Metadata Example in the Apple Transactional TV Specification for further information.

  • 4:3 standard definition video should not be delivered anamorphic 16:9 with matting.

MPEG-2:

  • MPEG-2 Program Stream Main Profile

  • 4:2:0 chroma sampling

  • ITU-R BT.601 color space

  • 15 Mbps minimum

  • Long GOP

  • 640 fixed horizontal dimension

  • Variable size vertical dimension depending on aspect ratio of source, maximum size of 480

  • Square pixel aspect ratio (1:1)

  • Native frame rate of original source:

    • 25 interlaced frames per second sourced from video must be delivered de-interlaced and properly tagged as progressive

    • 24 and 25 frames per second sourced from film must be delivered progressive

    • 23.976 frames per second for inverse telecine must be delivered progressive; must not be delivered interlaced or delivery will fail

    • Field dominance must be properly tagged (top field first, bottom field first, or progressive)

    • Interlaced materials will not be accepted

    • Fields and frames may not be duplicated or eliminated to create a broadcast frame rate (for example, telecine, NTSC to PAL conversion)

    • For mixed frame rate material, contact your technical representative

  • Crop inactive pixels. All edges must have active pixels for greater than 90% of the duration of the video.

  • Content may NOT be delivered letterbox, pillarbox, or windowbox.

Important: All video must begin and end with at least one black frame. In addition, videos can only have empty edits in the last edit of the edit list; videos with empty edits other than the last edit will be blocked.
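
The empty-edit rule above can be expressed as a small check. This is a minimal sketch, not Apple's validation logic. It assumes the edit list has already been read from the file as (track duration, media time) pairs, and it relies on the QuickTime convention that a media time of -1 marks an empty edit. The function name is hypothetical.

```python
EMPTY_EDIT_MEDIA_TIME = -1  # QuickTime convention: media time -1 marks an empty edit


def empty_edits_allowed(edit_list):
    """Return True when no empty edit appears before the final entry.

    edit_list is an ordered sequence of (track_duration, media_time)
    tuples, one per edit list entry (hypothetical input shape).
    """
    return all(
        media_time != EMPTY_EDIT_MEDIA_TIME
        for _, media_time in edit_list[:-1]
    )
```

For example, an edit list whose only empty edit is the final entry passes, while one that starts with an empty edit does not.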

TV Audio Source Profile

MPEG-2 Program Stream Container

Stereo

  • MPEG-1 layer II

  • 384 kbps

  • 48 kHz

  • Included in the same file as the delivered video

QuickTime Container

5.1 Surround

Note: Audio track channels for 5.1 Surround can either be all 24 bit or all 16 bit. An audio track cannot be a combination of 16 bit and 24 bit channels.

  • LPCM in either Big Endian or Little Endian, 16-bit or 24-bit, at least 48kHz

  • Expected channels: L, R, C, LFE, Ls, Rs

Stereo

  • LPCM in either Big Endian or Little Endian, 16-bit or 24-bit, at least 48kHz

  • Expected Dolby Pro Logic channels: Lt, Rt or expected stereo channels: L, R

TV Audio/Video Container

MPEG-2 Program Stream Container

  • Deliver all content in an MPEG-2 Program Stream file container.

  • The .mpg file extension is expected for all MPEG-2 content.

  • Audio must be delivered muxed with the video stream.

QuickTime Container

  • Deliver all content in a QuickTime .mov file container.

  • The QuickTime .mov file extension is expected for all audio and video content.

  • For 5.1 surround audio, each audio channel must have an assignment. The channel assignments must match one of the options in the table below. For Option 1 (one track with all six channels), the order of the channel assignments can vary as noted in Option 1a, 1b, 1c, and 1d. Note that "Lt" and "Rt" are only used for Dolby matrix audio mixdown.

Audio channel assignments table
  • For all stereo tracks in all options listed, the channel assignments can be indicated using L and R, or Lt and Rt, but not L and Rt, or R and Lt.

Important: Refer to Audio Channel Assignments for instructions on applying audio channel assignments and label descriptions.

TV Closed Captioning Profile

Note: Closed captioning can be sent with ProRes and MPEG-2 files.

  • Text in EIA-608 format.

  • Delivered in the same package with the video it references.

  • In a Scenarist SCC formatted file, using .scc file extension.

  • The timecode frame rate can only be 29.97 and is independent of your video source frame rate. The timecode format, however, must match the timecode format of the source video, either drop frame (DF) or non-drop frame (NDF).

    • Drop frame format has colons for the first two time delimiters and a semi-colon for the last time delimiter (HH:MM:SS;FF)

    • Non-drop frame format has colons for all the time delimiters (HH:MM:SS:FF)

| Source Video Frame Rate | Closed Caption Frame Rate | Description | Timecode Format | Timecode Example |
| --- | --- | --- | --- | --- |
| 29.97, 59.94i | 29.97 | NTSC Video | DF | HH:MM:SS;FF |
| 25, 50i | 29.97 | PAL Video | NDF | HH:MM:SS:FF |
| 24 | 29.97 | Film | NDF | HH:MM:SS:FF |
| 23.976 | 29.97 | NTSC Film | NDF | HH:MM:SS:FF |

  • Captions should display and synchronize to within one second of the initial, audible dialog to be represented in text.

The timecodes of the captions are relative to the start of the program, and not the QuickTime movie's timecode track.

To create EIA-608 closed captions, use Final Cut Pro or a third-party captions-authoring app to output an SCC text file.
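
The delimiter rule for SCC timecodes can be checked mechanically. The sketch below classifies a timecode as DF or NDF from its last delimiter and compares it with the format that the table in this section lists for each source frame rate. The function and dictionary names are hypothetical, and the lookup covers only the frame rates that the table names.

```python
import re

# Source video frame rate -> timecode format listed in the table in this section.
EXPECTED_FORMAT = {
    "29.97": "DF", "59.94i": "DF",
    "25": "NDF", "50i": "NDF",
    "24": "NDF", "23.976": "NDF",
}

_TIMECODE = re.compile(r"^\d{2}:\d{2}:\d{2}([:;])\d{2}$")


def timecode_format(timecode):
    """Return "DF" for HH:MM:SS;FF and "NDF" for HH:MM:SS:FF."""
    match = _TIMECODE.match(timecode)
    if match is None:
        raise ValueError(f"not an SCC-style timecode: {timecode!r}")
    return "DF" if match.group(1) == ";" else "NDF"


def matches_source(timecode, source_frame_rate):
    """True when the timecode format is the one listed for the frame rate."""
    return timecode_format(timecode) == EXPECTED_FORMAT[source_frame_rate]
```

A caption file for a 25 fps source would therefore be expected to use timecodes such as 00:00:10:00 rather than 00:00:10;00.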


TV Cover Art Profile

1:1 Cover Art

  • JPEG with .jpg extension (quality unconstrained), PNG with .png extension, or LSR with .lsr extension

  • Color space: RGB (screen standard)

  • LSR files:

    • must have a minimum size of 3000 x 3000 pixels

    • must have a minimum of two layers (five layers maximum)

    • each image within the layered LSR file must have a unique name

    • each image within the layered LSR file must be in PNG format

  • JPG or PNG files should be a minimum size of 3000 x 3000 pixels.

  • 72 dpi minimum resolution

  • 1:1 aspect ratio

  • No nudity or graphic material

  • No promotional material, including URLs or bugs

Do not increase the size of a smaller image to meet the minimum size standard. Excessively blurry or pixelated images will be rejected.

Important: CMYK (print standard) images will not be accepted.

Display P3 Cover Art

  • LSR with .lsr extension or PNG with .png extension (Display P3 Color Profile must be embedded)

  • Color space: Display P3

  • Color mode: RGB

  • Color depth: 16 bits per channel

  • Size: 3000 x 3000 pixels

  • Resolution: 72 dpi minimum

  • No nudity or graphic material

  • No promotional material, including URLs or bugs

16:9 Cover Art

Cover art for 16:9:

  • LSR with .lsr extension or PNG with .png extension

  • Minimum size: 1920 x 1080 pixels; 3840 x 2160 pixels preferred

  • Aspect ratio: 1.75 to 1.80 (width divided by height)

  • No nudity or graphic material

  • No promotional material, including URLs or bugs
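
The size and aspect-ratio limits for 16:9 cover art lend themselves to a quick pre-delivery check. This is a minimal sketch that assumes the pixel dimensions are already known (for example, read with an image tool). The function name is hypothetical, and content rules such as the ban on promotional material are not covered.

```python
def check_16x9_cover_art(width, height):
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    if width < 1920 or height < 1080:
        problems.append("smaller than the 1920 x 1080 minimum")
    ratio = width / height
    if not 1.75 <= ratio <= 1.80:
        problems.append(f"aspect ratio {ratio:.3f} is outside 1.75 to 1.80")
    return problems
```

Both 1920 x 1080 and the preferred 3840 x 2160 have a ratio of about 1.778 and pass. A 1920 x 1200 image meets the minimum size but fails the ratio check.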

TV Backdrop Art and Content Logos

Tall Backdrop Art

  • PNG with .png extension

  • Exact size: 1680 x 3636 pixels

Wide Backdrop Art

  • PNG with .png extension

  • Exact size: 4320 x 3240 pixels

Full Color Content Logo

  • Transparent PNG with .png extension

  • Exact size: 4320 x 1300 pixels

Single Color Content Logo

  • Transparent PNG with .png extension

  • Exact size: 4320 x 1300 pixels

  • Text must be white

TV Content Considerations

  • No bugs or logos should be visible during the body of the video.

  • No tune-ins should be visible during the body of the video. Tune-ins are only acceptable at the end of the video.

  • No ratings or advisories should be displayed at any time during the video.

  • Network cards at the beginning and end of the video are accepted as long as they are visible for less than five (5) seconds.

  • Commercials or other promotional material, including URLs, are NOT accepted. For more details, please contact your technical representative.

  • Commercial black may be a maximum of 5 seconds.

  • Previews must contain content suitable for a general audience.

  • Previews must not have opening or ending credits and should not start on a black frame.

  • Previews should be unique for each episode in a season.

  • Previews should not contain any spoilers.

  • A minimum of 1 black frame at the beginning and end of each video is required.

Film Content Profiles

ProRes File Requirements

Important: All ProRes files must contain valid integers to define field handling. Only the following field ("fiel" is displayed in the ProRes header) pair values are accepted by Apple:

  • field count = 1, field ordering = 0

  • field count = 2, field ordering = 1

  • field count = 2, field ordering = 6

  • field count = 2, field ordering = 9

  • field count = 2, field ordering = 14

Field count is the “fields” value that Atom Inspector shows for the fiel atom. Field ordering is the “detail” value that Atom Inspector shows for the same atom.
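
The accepted value pairs can be checked before delivery. The sketch below assumes the fiel atom is available as its 10 raw bytes in the usual QuickTime layout (a 4-byte size, the 4-byte type "fiel", then one byte each for field count and field ordering). That layout is an assumption taken from the QuickTime File Format, not from this guide, and the function names are hypothetical.

```python
import struct

# (field count, field ordering) pairs listed as accepted in this section.
ACCEPTED_FIEL_PAIRS = {(1, 0), (2, 1), (2, 6), (2, 9), (2, 14)}


def parse_fiel_atom(atom):
    """Return (field_count, field_ordering) from a raw 10-byte fiel atom."""
    _size, kind, count, ordering = struct.unpack(">I4sBB", atom[:10])
    if kind != b"fiel":
        raise ValueError(f"not a fiel atom: {kind!r}")
    return count, ordering


def is_accepted_fiel(count, ordering):
    """True when the pair is one of the accepted combinations."""
    return (count, ordering) in ACCEPTED_FIEL_PAIRS
```

A progressive file would carry the pair (1, 0). A pair such as (2, 0) is not in the accepted list and would fail the check.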

Film HD Source Profile

  • Apple ProRes 422 (HQ) or 4444 or 4444 (XQ)

  • ITU-R BT.709 color space, file tagged correctly as 709

  • VBR expected at ~220 Mbps

  • HD encoded dimensions accepted to support square pixel aspect ratios (PASP):

| Encoded | PASP | Converted to ProRes From |
| --- | --- | --- |
| 1920 x 1080 | 1:1 | HDCAM SR, D5, ATSC |

  • HD encoded dimensions accepted to support non-square pixel aspect ratios (this allows you to send HD video in the native dimensions of your best original source, for example in HD broadcast dimensions*):

| Encoded | PASP | Converted to ProRes From |
| --- | --- | --- |
| 1440 x 1080 | 1:1.33333 | XDCAM-HD, HDCAM |
| 1280 x 1080 | 1:1.5 | DVCProHD interlaced |

* If your original HD source pixel aspect ratio is non-square, contact your technical representative before delivery.

  • Native frame rate of original source:

    • 29.97 interlaced frames per second video source can be delivered either interlaced or de-interlaced properly tagged as progressive.

    • 25 interlaced frames per second for video-sourced material.

    • 23.976, 24, 25, or 30 frames per second for digital-progressive or film-sourced material.

  • Content may be delivered matted: letterbox, pillarbox, or windowbox (with proper corresponding crop values in the metadata package).

  • Content upscaled from SD will be rejected.

Important: All video must begin and end with at least one black frame. In addition, videos can only have empty edits in the last edit of the edit list; videos with empty edits other than the last edit will be blocked.
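
The relationship between encoded dimensions, PASP, and display dimensions in the HD tables above can be worked through numerically. This sketch assumes that a PASP written as 1:1.33333 means each encoded pixel is displayed 1.33333 times as wide as it is tall, and it uses exact fractions (4/3 and 3/2) in place of the rounded decimals. The function name is hypothetical.

```python
from fractions import Fraction


def display_width(encoded_width, pasp):
    """Scale the encoded width by the pixel aspect ratio; height is unchanged."""
    return round(encoded_width * pasp)


# Encoded widths and PASP values from the HD tables in this section.
print(display_width(1920, Fraction(1, 1)))   # square pixels
print(display_width(1440, Fraction(4, 3)))   # 1:1.33333
print(display_width(1280, Fraction(3, 2)))   # 1:1.5
```

Under that assumption, all three encoded sizes resolve to a 1920-pixel-wide display at 1080 lines.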

Film HDR Source Profile

The HDR source video must meet the minimum requirements in the subsections below.

For both Dolby® Vision and HDR10

  • Display dimensions and PASP must match corresponding primary video display dimensions and PASP

  • HDR source video must have progressive scan and uniform frame rate

  • HDR source video track should only contain a single edit list entry

  • Duration must match corresponding primary video duration

  • Frame count must match corresponding primary video frame count

  • HDR format (Dolby® Vision or HDR10) specified as a source attribute

  • The HDR file must contain audio tracks. See Apple Transactional Film Specification for information on how to handle the audio: whether to ignore it, use it, or to set it as the primary audio.

For Dolby® Vision

  • Single Apple ProRes 4444 or 4444 XQ 12-bit file accompanied by a single Dolby® Vision CM metadata file (Dolby Vision CM version 2.9 and CM version 4.0 sidecar metadata files are supported)

  • Transfer function: SMPTE ST 2084 (PQ)

  • White point and color primaries: ITU-R BT.2020 or D65 P3

  • Transform matrix: BT.2020 (for BT.2020 primaries) or BT.709 (for D65 P3 primaries)

  • The Dolby® Vision CM metadata should not contain any gaps in the shots and the frames referenced in the shots should cover all frames in the video essence

For HDR10

  • Single Apple ProRes 4444 or 4444 XQ 12-bit file

  • Transfer function: SMPTE ST 2084 (PQ)

  • White point and color primaries: ITU-R BT.2020 or D65 P3

  • Transform matrix: BT.2020 (for BT.2020 primaries) or BT.709 (for D65 P3 primaries)

  • ST 2086 and MaxCLL / MaxFALL metadata provided as additional source attributes
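
The pairing of color primaries and transform matrix is the same for Dolby Vision and HDR10, so it can be captured in one lookup. This is a minimal sketch. The string labels are hypothetical stand-ins for whatever your inspection tool reports, and the check covers only the transfer function, primaries, and matrix.

```python
# Color primaries -> transform matrix expected for them, per the lists above.
EXPECTED_MATRIX = {"BT.2020": "BT.2020", "D65 P3": "BT.709"}
EXPECTED_TRANSFER = "SMPTE ST 2084 (PQ)"


def check_hdr_tags(transfer_function, primaries, matrix):
    """Return a list of problems found in the three color tags."""
    problems = []
    if transfer_function != EXPECTED_TRANSFER:
        problems.append(f"transfer function must be {EXPECTED_TRANSFER}")
    if primaries not in EXPECTED_MATRIX:
        problems.append("primaries must be BT.2020 or D65 P3")
    elif matrix != EXPECTED_MATRIX[primaries]:
        problems.append(
            f"{primaries} primaries expect the {EXPECTED_MATRIX[primaries]} matrix"
        )
    return problems
```

A file tagged with D65 P3 primaries and a BT.709 matrix passes. BT.2020 primaries combined with a BT.709 matrix are reported as a mismatch.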

4K Source Profile

  • Dimensions should be 3840 x 2160 (UHD) or 4096 x 2160 (DCI 4K). Any DCI 4K asset can have optional crop values (Apple strongly recommends sending crop values for DCI 4K).

  • Apple ProRes 422 HQ or 4444 or 4444 XQ

  • VBR expected at ~880 Mbps for 422 HQ, ~1320 Mbps for 4444 and ~2000 Mbps for 4444 XQ

  • Content should be encoded using ITU-R BT.709 color space. For more information see https://www.itu.int/rec/R-REC-BT.709/en

  • Content should be delivered in the original frame rate of the source

  • 4K source must be progressive scan and can be delivered in 23.976, 24, 25, 29.97, or 30 frames per second

  • If the 4K source file is not delivered matted or if there are no inactive pixels, Apple recommends setting all crop dimension attributes to '0' (zero).

  • All video must begin and end with at least one black frame. In addition, videos that begin with or contain empty edits will be blocked; the file can contain an empty edit in its edit list only if it is the last edit. For HDR source video in Dolby® Vision format, the sidecar metadata file should cover these black frames.
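
The expected bit rates above give a rough idea of delivery size. The following sketch estimates the video payload for a given running time. It assumes the average bit rate equals the approximate figure in the list, uses decimal units (1 GB = 1000 MB), and ignores audio and container overhead. The names are hypothetical.

```python
# Approximate expected bit rates in Mbps, from the 4K Source Profile list.
EXPECTED_MBPS = {"422 HQ": 880, "4444": 1320, "4444 XQ": 2000}


def estimated_size_gb(variant, duration_seconds):
    """Estimate the video payload size in decimal gigabytes."""
    megabits = EXPECTED_MBPS[variant] * duration_seconds
    return megabits / 8 / 1000  # megabits -> megabytes -> gigabytes
```

Under these assumptions, a 90-minute (5400-second) feature in ProRes 422 HQ comes to roughly 594 GB.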

Film SD Source Profile

NTSC

  • Apple ProRes 422 (HQ)

  • VBR expected at 40-60 Mbps

  • 720 x 480 or 720 x 486 encoded pixels; for display at either 853 x 480 for 16:9 content or 640 x 480 for 4:3 content. Properly created 720 x 486 content will have a minimum of 4 pixels of black at the top and 2 at the bottom. Crop values for top and bottom must total at least 6 pixels.

  • All encoded content must include pixel aspect ratio (pasp), preferably one that results in a display aspect ratio of 4:3 or 16:9.

  • Native frame rate of original source:

    • 29.97 frames per second video source can be delivered interlaced

    • 24 frames per second must be delivered progressive

    • 23.976 frames per second for inverse telecine must be delivered progressive; must not be delivered interlaced or delivery will fail

    • Telecine materials will not be accepted

  • Content may be delivered matted: letterbox, pillarbox, or windowbox.

  • 4:3 standard definition video should not be delivered anamorphic 16:9 with matting. For example, the minimum post-cropped display must equal or exceed 768 x 576.

PAL

  • Apple ProRes 422 (HQ)

  • VBR expected at 40-60 Mbps

  • 720 x 576 encoded pixels; for display at either 1024 x 576 for 16:9 content or 768 x 576 for 4:3 content

  • All encoded content must include pixel aspect ratio (pasp), preferably one that results in a display aspect ratio of 4:3 or 16:9.

  • Native frame rate of original source:

    • 24 and 25 frames per second sourced from film must be delivered progressive

    • 23.976 frames per second for inverse telecine must be delivered progressive; must not be delivered interlaced or delivery will fail

    • Telecine materials will not be accepted

  • Content may be delivered matted: letterbox, pillarbox, or windowbox.

  • 4:3 standard definition video should not be delivered anamorphic 16:9 with matting. For example, the minimum post-cropped display must equal or exceed 768 x 576.

Important: All video must begin and end with at least one black frame. In addition, videos that begin with or contain empty edits will be blocked; the file can contain an empty edit in its edit list only if it is the last edit.

Stereoscopic Video Source Profile

To deliver stereoscopic video content, you need to submit two video files:

  • Left-eye video file, optionally with an associated stereo audio track in a single QuickTime container

  • Right-eye video file with no audio track

These video files can be standard dynamic range (SDR) sources, or high dynamic range (HDR) sources, but not both for the same asset. If you use HDR sources, the HDR format type should be Dolby® Vision. Refer to Stereoscopic Video Delivery in Apple Transactional Film Specification for details.

In addition, the following formats are supported for video sources.

| Dynamic Range | Codec | Variant | Primaries | Transfer Function |
| --- | --- | --- | --- | --- |
| SDR | Apple ProRes | 422HQ | ITU-R BT.709-6 | ITU-R BT.709-6 |
| HDR | Apple ProRes | 4444XQ | ITU-R BT.2020 | SMPTE ST-2084 |
| HDR | Apple ProRes | 4444XQ | P3-D65 | SMPTE ST-2084 |

Film Audio Source Profile

Whenever 5.1 or 7.1 Surround audio is available for a film in any competing format or market, it must be provided to Apple in addition to the stereo tracks.

Note: Audio track channels for 5.1 and 7.1 Surround can either be all 24 bit or all 16 bit. An audio track cannot be a combination of 16 bit and 24 bit channels.

7.1 Surround

  • LPCM in either Big Endian or Little Endian, 16-bit or 24-bit, at least 48kHz

  • Expected channels: L, R, C, LFE, Ls, Rs, Rls (or Lrs), Rrs

5.1 Surround

  • LPCM in either Big Endian or Little Endian, 16-bit or 24-bit, at least 48kHz

  • Expected channels: L, R, C, LFE, Ls, Rs

Stereo

  • LPCM in either Big Endian or Little Endian, 16-bit or 24-bit, at least 48kHz

  • Expected Dolby Pro Logic channels: Lt, Rt or expected stereo channels: L, R
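
The channel count, bit depth, and sample rate rules above can be combined into one check per audio track. This is a minimal sketch, assuming the per-channel bit depths and the sample rate have already been read from the file. The function name and the layout keys are hypothetical.

```python
# Number of channels expected for each layout described above.
EXPECTED_CHANNELS = {"7.1": 8, "5.1": 6, "stereo": 2}


def check_audio_track(layout, bit_depths, sample_rate_hz):
    """Return a list of problems for one LPCM audio track.

    bit_depths holds one entry per channel in the track.
    """
    problems = []
    if len(bit_depths) != EXPECTED_CHANNELS[layout]:
        problems.append(f"{layout} expects {EXPECTED_CHANNELS[layout]} channels")
    if any(depth not in (16, 24) for depth in bit_depths):
        problems.append("channels must be 16-bit or 24-bit")
    if len(set(bit_depths)) > 1:
        problems.append("a track cannot mix 16-bit and 24-bit channels")
    if sample_rate_hz < 48000:
        problems.append("sample rate must be at least 48 kHz")
    return problems
```

A 5.1 track with three 24-bit and three 16-bit channels is flagged for mixing bit depths even though each depth is individually allowed.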

Dolby Atmos Audio Source Profile

  • The audio mix must be approved for home listening and monitored in a room with at least a 7.1.4 speaker layout.

  • If the Dolby Atmos file can be used for derivation of legacy audio assets (7.1ch, 5.1ch and stereo LtRt audio), these legacy renders must also be approved.

  • You must conform and sync all audio deliverables to final picture as long-play and not as separate reels.

  • Leader and sync pop should be removed from the Dolby Atmos file.

  • Provide the Dolby Atmos file as a Broadcast Wave Format Audio Definition Model (BWF ADM) file.

  • All audio tracks in the file must be 24-bit LPCM audio at 48kHz.

  • There must not be more than 128 individual audio tracks.

  • Tracks 1-128 may be used for objects or beds.

  • Other content metadata (for example, the desired artistic compression profiles or the downmix mix levels) should be correctly authored in the DBMD section of the BWF ADM file.

  • The average loudness of dialogue or speech in the Dolby Atmos file must be within the range of -31 LKFS to -10 LKFS, and should ideally be between -30 LKFS and -18 LKFS. The following measurement methodology should be used:

    • Run the loudness measurement on a 5.1 channel render of the full mix.

    • Use the measurement algorithm BS.1770 + Dialogue Intelligence (or other speech-gating algorithm) to measure the dialogue-gated loudness, integrated over the full duration of the asset, to verify it falls within the specified range indicated above.

    • Monitor the reported percentage of dialogue. If it is less than 10%, dialogue may not be the anchor element for loudness correction of this audio asset. You should instead follow the procedure in the paragraph below.

  • For Dolby Atmos files where dialogue is not the anchor element (for example, music assets), the average audio loudness of the Dolby Atmos file must be within the range of -31 LKFS to -5 LKFS. The following measurement methodology should be used:

    • Run the loudness measurement on a 5.1 channel render of the full mix.

    • Use the BS.1770-4 (or BS.1770-3) measurement algorithm to measure the full-program mix (all channels over the entire length of the asset) to verify that loudness falls within the specified range indicated above.
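
The two loudness procedures above can be summarized as a decision on the reported dialogue percentage. The sketch below assumes the dialogue-gated loudness, the dialogue percentage, and the full-program loudness have already been measured on a 5.1 render. It treats a dialogue percentage below 10% as the switch to the full-program range, which is a simplification of "dialogue may not be the anchor element". The function name is hypothetical.

```python
def check_atmos_loudness(dialogue_lkfs, dialogue_percent, program_lkfs):
    """Pick the anchor element and test the measured loudness against its range."""
    if dialogue_percent >= 10:
        return {
            "anchor": "dialogue",
            "in_range": -31 <= dialogue_lkfs <= -10,   # required range
            "ideal": -30 <= dialogue_lkfs <= -18,      # recommended range
        }
    return {
        "anchor": "program",
        "in_range": -31 <= program_lkfs <= -5,
        "ideal": None,  # no recommended sub-range is given for this case
    }
```

For a film measured at -27 LKFS dialogue with 40% dialogue, the result is in range and ideal. A music asset with 2% dialogue is judged on its full-program loudness instead.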

Film Audio/Video and Alt-Audio Container

  • Deliver all content in a QuickTime .mov file container.

  • The QuickTime .mov file extension is expected for all audio and video content.

  • For 7.1 surround audio, the channel assignments must match one of the options in the table below. Option 1 shows one track with all eight channels and Option 2 shows one track for each channel. Note that “Rsl” (Rear Surround Left) can also be represented by “Rls” (Rear Left Surround).

    image of table showing audio channel assignments
  • For 5.1 surround audio, the channel assignments must match one of the options in the table below. For Option 1 (one track with all six channels), the order of the channel assignments can vary as noted in Option 1a, 1b, 1c, and 1d. Note that "Lt" and "Rt" are only used for Dolby matrix audio mixdown.

    Audio channel assignment table
    • For all stereo tracks in all options listed, the channel assignments can be indicated using L and R, or Lt and Rt, but not L and Rt, or R and Lt.

    Important: Refer to Audio Channel Assignments for instructions on applying audio channel assignments and label descriptions. Refer to Table 1: ProRes Audio Channel Data Assignment and Levels for audio levels and channel assignments for music, sound effects, and dialogue.

    Note: For more information on alternate audio, see Assets and Data Files in Apple Transactional Film Specification.

Film Closed Captioning Profile

  • Text in EIA-608 format.

  • Delivered in the same package with the video it references.

  • In a Scenarist SCC formatted file, using .scc file extension.

  • The timecode frame rate can only be 29.97 and is independent of your video source frame rate. The timecode format, however, must match the timecode format of the source video, either drop frame (DF) or non-drop frame (NDF).

    • Drop frame format has colons for the first two time delimiters and a semi-colon for the last time delimiter (HH:MM:SS;FF)

    • Non-drop frame format has colons for all the time delimiters (HH:MM:SS:FF)

| Source Video Frame Rate | Closed Caption Frame Rate | Description | Timecode Format | Timecode Example |
| --- | --- | --- | --- | --- |
| 29.97, 59.94i | 29.97 | NTSC Video | DF | HH:MM:SS;FF |
| 25, 50i | 29.97 | PAL Video | NDF | HH:MM:SS:FF |
| 24 | 29.97 | Film | NDF | HH:MM:SS:FF |
| 23.976 | 29.97 | NTSC Film | NDF | HH:MM:SS:FF |

  • Captions should display and synchronize to within one second of the initial, audible dialog to be represented in text.

The timecodes of the captions are relative to the start of the program, and not the QuickTime movie's timecode track.

To create EIA-608 closed captions, use Final Cut Pro or a third-party captions-authoring app to output an SCC text file.
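
Because caption timecodes run at 29.97 and are relative to the start of the program, it can help to convert a timecode to an elapsed frame count when checking sync. The sketch below uses standard SMPTE drop-frame counting, which this guide does not spell out, so treat the arithmetic as an assumption to verify against your authoring tool. The function names are hypothetical.

```python
import re

_TIMECODE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})([:;])(\d{2})$")


def caption_frame_offset(timecode):
    """Count 29.97 fps frames from program start to an SCC timecode."""
    match = _TIMECODE.match(timecode)
    if match is None:
        raise ValueError(f"not an SCC-style timecode: {timecode!r}")
    hours, minutes, seconds, delimiter, frames = match.groups()
    hours, minutes, seconds, frames = (
        int(hours), int(minutes), int(seconds), int(frames)
    )
    count = (hours * 3600 + minutes * 60 + seconds) * 30 + frames
    if delimiter == ";":
        # Drop frame skips frame numbers 00 and 01 at the start of every
        # minute except minutes divisible by ten.
        total_minutes = hours * 60 + minutes
        count -= 2 * (total_minutes - total_minutes // 10)
    return count


def offset_seconds(timecode):
    """Elapsed real time in seconds at 30000/1001 frames per second."""
    return caption_frame_offset(timecode) * 1001 / 30000
```

With this counting, 01:00:00;00 (DF) falls almost exactly one hour into the program, while 01:00:00:00 (NDF) falls about 3.6 seconds later, which is why the format must match the source.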


Film Audio Description (AD) Profile

  • Provides an alternate audio track for Audio Description (AD) and includes a description of visual elements that are important in understanding what is occurring at the time, as well as the plot, music, dialogue, and sound effects.

  • Delivered in pre-mixed format and in 2.0 stereo and optionally 5.1 surround (no 7.1 surround).

  • Audio Description (AD) files are accepted for all languages and the Audio Description is included when creating bundles. To be included in a bundle, the locale of the Audio Description must match the locale of a corresponding source audio or alternate audio file. If you send an Audio Description file that doesn’t have a corresponding audio file, it will not go live.

Film iTunes Timed Text Profile

Below is a summary of delivery requirements for iTunes Timed Text files. Refer to iTunes Timed Text profile in the Apple Transactional Film Specification for complete details.

  • Delivered in an iTunes Timed Text (iTT) formatted file, using .itt file extension.

  • Delivered in the same package with the video it references as an asset in the <assets> block.

  • Only one div element is allowed in an ITT document.

  • timeBase must be set to smpte.

  • dropMode must be set to "dropNTSC" or "nonDrop"; iTunes Timed Text does not support dropPAL.

  • Only sansSerif may be specified as the typeface in fontFamily.

The iTT file format is a subset of the Timed Text Markup Language, Version 1 W3C Recommendation 24 September 2013 (TTML) (https://www.w3.org/TR/ttml1/) from the World Wide Web Consortium (W3C) (https://www.w3.org). All iTT documents are TTML documents that use the restricted subset of TTML.

To create iTT subtitles, use Final Cut Pro or a third-party captions-authoring app to output an iTT file.
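
Several of the iTT rules listed above can be checked with a standard XML parser. The sketch below is not a complete iTT validator. It assumes the standard TTML namespaces, checks only the timeBase, dropMode, div count, and fontFamily rules, and uses a deliberately minimal sample that omits attributes a real iTT file needs. The function name and sample are hypothetical.

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"
TTP = "http://www.w3.org/ns/ttml#parameter"
TTS = "http://www.w3.org/ns/ttml#styling"


def check_itt(xml_text):
    """Return a list of problems for the four rules this sketch covers."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.get(f"{{{TTP}}}timeBase") != "smpte":
        problems.append("timeBase must be smpte")
    if root.get(f"{{{TTP}}}dropMode") not in ("dropNTSC", "nonDrop"):
        problems.append("dropMode must be dropNTSC or nonDrop")
    if len(root.findall(f".//{{{TT}}}div")) != 1:
        problems.append("exactly one div element is allowed")
    for element in root.iter():
        family = element.get(f"{{{TTS}}}fontFamily")
        if family is not None and family != "sansSerif":
            problems.append(f"fontFamily {family!r} is not sansSerif")
    return problems


SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:tts="http://www.w3.org/ns/ttml#styling"
    ttp:timeBase="smpte" ttp:dropMode="nonDrop">
  <body><div>
    <p begin="00:00:01:00" end="00:00:03:00" tts:fontFamily="sansSerif">Hello</p>
  </div></body>
</tt>"""

print(check_itt(SAMPLE))
```

Changing dropMode in the sample to dropPAL makes the function report one problem.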

Film Dub Card Video Profile

The full feature-length video asset comprises a set of data files, each of which plays a specific role for the asset. The following table describes the optional data file for dub card video.

Asset Type

Data File

Description

Full

Role: video.end.dub_credits

An optional data file containing the credits associated with an audio track.

A video-only sequence containing one or more still credits specific to the locale-matched audio. Products include dub credit video sequences for the associated audio dubs following the main program.

Locale: Required

Dub Card Video Profile

  • Apple ProRes 422 (HQ)

  • Movie correctly tagged with color parameter: ITU BT.709

  • Video dimensions, pixel aspect ratio, and frame rate must match full program video

  • Dub card time scale must match the time scale of the video source.

  • Minimum of 4 seconds per dub card

  • If a dub card video has a resolution that does not match the resolution of the video source, you must supply different crop dimension attributes for dub cards to match the aspect ratio of the feature.

  • Sound tracks should not be supplied for dub card video — sound tracks will be ignored

  • Dub card video will be deinterlaced if necessary so the field order does not need to match — progressive is preferred

  • Dissolves and scrolling credits are not supported

  • First and last frames do not need to be black frames

Film Card Profile

The full feature-length video asset comprises a set of data files, each of which plays a specific role for the asset. The following table describes the optional data file for cards.

Asset Type

Data File

Description

Full

Role: card

An optional data file containing an image associated with the film.

An image file containing information, such as a certificate, about the film. Apple will create a video using the image for playback on the Store.

The card data file has a subtype attribute to distinguish the type of card that is delivered. Currently, there is only one subtype (certificate).

Card Image Profile

  • JPEG with .jpg extension or PNG with .png extension

  • Minimum size of 640 x 480 pixels

  • Color space: RGB (screen standard)

  • Only active pixel area may be included

  • Certificate images must be cropped (no letterbox, pillarbox, or windowbox)

  • Certificate images must contain only certificate content

  • Do not increase the size of a smaller image to meet the minimum size standard. Excessively blurry or pixelated images will be rejected.

Film Chapter Image Profile

  • JPEG with .jpg extension (quality unconstrained) or PNG with .png extension

  • RGB (screen standard)

  • Must be same aspect ratio as video source

  • 640 minimum horizontal dimension (larger for HD sourced)

  • Variable size vertical dimension (based on aspect ratio of video source)

  • Only active pixel area may be included (except where necessary to match the overall aspect ratio of the full program)

  • Chapter images must be cropped (no letterbox, pillarbox, or windowbox, except as noted above)

  • Chapter images must contain picture content

  • Chapter image files must be unique with different checksums

Important: CMYK (print standard) images will not be accepted.

Film Poster Art Profile

2:3 Poster Art

  • JPEG with .jpg extension (quality unconstrained), PNG with .png extension, or LSR with .lsr extension

  • Color space: RGB (screen standard)

  • LSR files:

    • must have a minimum size of 2000 x 3000 pixels

    • must have a minimum of two layers (five layers maximum)

    • each image within the layered LSR file must have a unique name

    • each image within the layered LSR file must be in PNG format

  • JPG or PNG files should be a minimum size of 2000 x 3000 pixels.

  • 2:3 aspect ratio

  • Poster art (one-sheet) from film. Must contain key art and title. DVD cover, release date, website, or promotional tagging may not be included.

  • Poster art must not display film ratings.

Do not increase the size of a smaller image to meet the minimum size standard. Excessively blurry or pixelated images will be rejected.

Important: CMYK (print standard) images will not be accepted.

Display P3 Poster Art

  • LSR with .lsr extension or PNG with .png extension (Display P3 Color Profile must be embedded)

  • Color space: Display P3

  • Color mode: RGB

  • Color depth: 16 bits per channel

  • Size: 2000 x 3000 pixels

  • Resolution: 72 dpi minimum

  • Poster art (one-sheet) from film. Must contain key art and title. DVD cover, release date, website, or promotional tagging may not be included.

  • Poster art must not display film ratings.

16:9 Poster Art

  • Must be LSR (.lsr), PNG (.png), or JPG (.jpg - quality unconstrained)

  • Minimum size for PNG and JPG: 1920 x 1080 pixels but 3840 x 2160 pixels preferred

  • Minimum size for LSR: 3840 x 2160 pixels

  • Resolution: 72 dpi

  • Aspect ratio: 1.75 to 1.80 (width divided by height)

  • Poster art (one-sheet) from film. Must contain key art and title. DVD cover, release date, website, or promotional tagging may not be included.

  • Poster art must not display film ratings.

Film Backdrop Art and Content Logos

Tall Backdrop Art

  • PNG with .png extension

  • Exact size: 1680 x 3636 pixels

Wide Backdrop Art

  • PNG with .png extension

  • Exact size: 4320 x 3240 pixels

Full Color Content Logo

  • Transparent PNG with .png extension

  • Exact size: 4320 x 1300 pixels

Single Color Content Logo

  • Transparent PNG with .png extension

  • Exact size: 4320 x 1300 pixels

  • Text must be white

Film Content Considerations

  • The full movie asset should not contain FBI, MPAA, or release date tagging.

  • The trailer asset should not contain FBI, MPAA, or release date tagging.

  • A minimum of 1 black frame at the beginning and end of each video is required.

  • Trailer should be same aspect ratio as the full asset.

  • Promotional bumpers, including URLs, are NOT accepted. For more details, contact your technical representative.

  • For US and Canadian preview trailers, the contents of the preview must be appropriate for general audiences.

  • For preview trailers from all other countries, the contents of the preview cannot exceed the rating classification of the feature film for territories in which it is available. Previews being made available worldwide (WW) must be suitable for all territories that are cleared for sale.

  • Poster art should not contain DVD tagging, release date tagging, or website tagging.

Epic Stage Video Profiles

Epic Stage Video Considerations

Epic stage videos are used for Epic Stage. They are supported for Apple TV Channels only.

Considerations

An epic stage video must ensure a smooth transition from static to video. In the first two seconds of video there should be:

  • No black first frame

  • No quick cuts

  • No abrupt/aggressive sound effects or music cues

  • No dialogue

Throughout the preview:

  • No title cards, studio logos, accolades or festival awards, credits, loglines or narrative title cards, or CTAs (calls to action)

For the end shot:

  • Give the end shot time to resolve or settle before transitioning back to the static. Recommendation is two seconds.

  • No black last frame

Note: Design for a sound-on experience.

Epic stage videos must be suitable for all ages and must not contain nudity, violence, blood, racist or hate imagery, graphic sexual material, or guns pointed at the viewer. Also, see Minimum Storefront Requirements to ensure that the quality of content offered on the Apple TV Channels subscription service meets the set of minimum requirements for each territory.

Epic Stage Video Wide Requirements

This is a summary of delivery requirements. See Epic Stage Video Considerations for additional considerations.

  • Platform: Apple TV, Desktop, iPad

  • Aspect Ratio: 16:9

  • Dimensions: 3840 × 2160 pixels

  • File Format: ProRes 4444, full frame

  • Frame Rate: Minimum 23.98 frames per second

  • Audio: 2.0 (required), 5.1 and Atmos (optional)

  • Duration: 0:30–0:40 seconds (recommended)

  • Color Profile: 8 bit

Epic Stage Video Tall Requirements

This is a summary of delivery requirements. See Epic Stage Video Considerations for additional considerations.

  • Platform: iPhone

  • Aspect Ratio: 3:4

  • Dimensions: 1620 x 2160 pixels

  • File Format: ProRes 4444, full frame

  • Frame Rate: Minimum 23.98 frames per second

  • Audio: 2.0 (required), 5.1 and Atmos (optional)

  • Duration: 0:15–0:20 seconds (recommended)

  • Color Profile: 8 bit

XML

  • All XML must be encoded in UTF-8.

  • Do not include a byte order mark (BOM).

  • There should be no null data or empty tags in the XML. If not used, elements should be removed.

  • The XML must be formatted to use line breaks and indentations.
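
Most of the XML rules above can be checked before delivery. This is a minimal sketch, assuming the metadata file has been read as raw bytes. It treats an element with no text, no children, and no attributes as an "empty tag", which is an interpretation rather than a definition from this guide, and it does not check line breaks or indentation. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

UTF8_BOM = b"\xef\xbb\xbf"


def check_metadata_xml(data):
    """Return a list of problems found in the raw bytes of an XML file."""
    problems = []
    if data.startswith(UTF8_BOM):
        problems.append("starts with a byte order mark")
        data = data[len(UTF8_BOM):]  # strip it so parsing can continue
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
        return problems
    root = ET.fromstring(data)
    for element in root.iter():
        has_text = bool(element.text and element.text.strip())
        if not has_text and len(element) == 0 and not element.attrib:
            problems.append(f"empty element <{element.tag}>")
    return problems
```

For example, a file containing an unused <title/> element is reported so that the element can be removed rather than delivered empty.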

For further information, refer to the appropriate media type metadata specification, or consult with your technical representative.

Audio Channel Assignments

Audio Channel Assignments

Previously, you could tag audio channel assignments using QuickTime Pro 7. However, QuickTime Pro 7 is a 32-bit application. Starting with the release of macOS Catalina, macOS supports 64-bit applications only.

Instead, you can use Compressor 4.4.5 to tag audio channel assignments for QuickTime movies using Compressor’s command-line interface. Use the command-line option -relabelaudiotracks followed by the channels you want to tag and their values. See Audio channel layouts in Compressor for more on audio channel layouts.

The supported values for audio tagging are:

Label            Description
L                Left
R                Right
C                Center
LFE              LFE Screen
Ls               Left Surround
Rs               Right Surround
Lc               Left Center
Rc               Right Center
Rsl (or Rls)     Rear Surround Left (or Rear Left Surround)
Rsr (or Rrs)     Rear Surround Right (or Rear Right Surround)
Lt               Left Total
Rt               Right Total
LtRt             Matrix stereo (Lt Rt)
stereo           Stereo (L R)

Examples:

  • Given a QuickTime Movie file with two audio tracks where the first audio track contains the Left audio channel and the second audio track contains the Right audio channel, type the following at the command line:

    /Applications/Compressor.app/Contents/MacOS/Compressor -relabelaudiotracks L R <path_to_Quicktime_Movie_source_file>

  • Similarly, given a QuickTime Movie file with one audio track where the track contains both the Lt and Rt audio channels (that is, matrix stereo), type the following at the command line:

    /Applications/Compressor.app/Contents/MacOS/Compressor -relabelaudiotracks LtRt <path_to_Quicktime_Movie_source_file>

The above commands overwrite the audio assignments in the original source file, which can speed up the process. To save to a new file, add -locationpath to the command, for example:

/Applications/Compressor.app/Contents/MacOS/Compressor -relabelaudiotracks L R <path_to_Quicktime_Movie_source_file> -locationpath <path_to_output_file>

and

/Applications/Compressor.app/Contents/MacOS/Compressor -relabelaudiotracks LtRt <path_to_Quicktime_Movie_source_file> -locationpath <path_to_output_file>

See Intro to shell commands in Compressor for more information about using the command line. To display help in the command-line tool, run: /Applications/Compressor.app/Contents/MacOS/Compressor -help.
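
When many files need relabeling, it can help to build the command programmatically and reject unsupported labels before Compressor is invoked. The sketch below only assembles the argument list. It uses the two options shown in the examples above (-relabelaudiotracks and -locationpath) and the labels from the table of supported values. The function name relabel_command is hypothetical, and the sketch does not run Compressor itself.

```python
COMPRESSOR = "/Applications/Compressor.app/Contents/MacOS/Compressor"

# Labels copied from the table of supported values for audio tagging.
SUPPORTED_LABELS = {
    "L", "R", "C", "LFE", "Ls", "Rs", "Lc", "Rc",
    "Rsl", "Rls", "Rsr", "Rrs", "Lt", "Rt", "LtRt", "stereo",
}


def relabel_command(source, labels, output=None):
    """Build the argument list for relabeling the audio tracks of a source file.

    If output is None, the command overwrites the assignments in the source
    file, as described above; otherwise -locationpath is appended.
    """
    if not labels:
        raise ValueError("at least one label is required")
    unknown = [label for label in labels if label not in SUPPORTED_LABELS]
    if unknown:
        raise ValueError(f"unsupported labels: {', '.join(unknown)}")
    command = [COMPRESSOR, "-relabelaudiotracks", *labels, source]
    if output is not None:
        command += ["-locationpath", output]
    return command
```

On a Mac with Compressor installed, the returned list can be passed to subprocess.run(command, check=True). Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.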

Table 1: ProRes Audio Channel Data Assignment and Levels

Label         Data and Levels
L             MnE
R             MnE
C             Dialogue
LFE           Effects
Ls            Music
Rs            Music
Lt            Mixdown of 5.1 (stereo)
Rt            Mixdown of 5.1 (stereo)
Max peak:     -6 dB
LKFS:         -24 dB
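
The maximum peak value in Table 1 can be checked with simple arithmetic. The sketch below assumes that the -6 dB figure is a sample peak measured relative to full scale and that samples are floating-point values in the range -1.0 to 1.0; Table 1 does not state either detail. The function names are hypothetical. Measuring loudness in LKFS requires the weighting and gating defined in ITU-R BS.1770 and is not attempted here.

```python
import math

MAX_PEAK_DB = -6.0  # "Max peak" value from Table 1


def peak_db(samples):
    """Peak level in dB relative to full scale for float samples in [-1.0, 1.0]."""
    peak = max((abs(sample) for sample in samples), default=0.0)
    if peak == 0.0:
        return float("-inf")  # digital silence has no finite peak level
    return 20.0 * math.log10(peak)


def within_peak_limit(samples, limit_db=MAX_PEAK_DB):
    """Return True when the sample peak does not exceed the limit."""
    return peak_db(samples) <= limit_db
```

A sample value of 0.5 corresponds to roughly -6.02 dB, so a track that peaks at 0.5 passes this check, while one that peaks at 0.6 (about -4.44 dB) does not.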

Previous Guide Revisions

The following table lists previously released versions of this guide and their revisions:

Date/Version

Summary

January 16, 2024 - Version 5.3.13

Updated the immersive audio source requirements. Renamed this asset guide to Video and Audio Asset Guide.

December 4, 2023 - Version 5.3.12

Corrections in the TTML file format specifications.

August 14, 2023 - Version 5.3.11

Updated the TTML file format specifications. Additional minor updates and clarifications.

February 27, 2023 - Version 5.3.10

Updates in the immersive audio, film HD, film HDR, and Dolby Atmos audio source profiles. A correction on the dimensions for tall epic stage videos.

January 17, 2023 - Version 5.3.9

Added the Hi-Res Lossless Profile, the Music Album Motion Art Profile, and the Epic Stage Video Profiles. Made some additional corrections.

October 4, 2021 - Version 5.3.8

Corrected accepted audio sample rates. Clarified Dolby Atmos audio.

July 19, 2021 - Version 5.3.7

Removed references to CBFC for India.

June 03, 2021 - Version 5.3.6

Revised source requirements for immersive audio. Clarified the difference between immersive and spatial audio. Updated film poster art requirements.

May 17, 2021 - Version 5.3.5

Added source requirements and best practices for immersive audio.

August 31, 2020 - Version 5.3.4

Dolby Vision CM 4 sidecar metadata files are now supported. Added a new data file role for delivering CBFC Certificates for content sold in India. Added Rrs (Rear Right Surround) as a label for audio channel assignments.

August 10, 2020 - Version 5.3.3

Added gamma value to music video profile. Chapter images can be sent in PNG format. QuickTime Pro 7 has been deprecated. The iTunes Closed Captioning Testing Guide has been deprecated.

October 28, 2019 - Version 5.3.2

Updated Apple Digital Masters requirements. Updated Music Audio Source requirements.

August 7, 2019 - Version 5.3.1

Removed audiobooks requirements. Rebranded MFiT.

May 13, 2019 - Version 5.3

Added requirements for 16:9 poster art, content logos, and backdrop art for both TV and Film. Changed the version number of this guide from 5.2 to 5.3 to keep the version number in sync with the new schema version.

January 30, 2019 - Version 5.2.14

Added 30fps to HD and SD music video source profiles. Added audiobooks profile.

August 8, 2018 - Version 5.2.13

Updated music video source profile. Added Dolby Atmos Audio source for film. Clarified pixel aspect ratio.

May 2, 2018 - Version 5.2.12

Requirements for music video screen capture images have changed.

Poster art and cover art requirements for P3 displays have been added for Film and TV.

February 21, 2018 - Version 5.2.11

Updated screen capture image requirements for music video. HD content upscaled from SD will be rejected. Corrected a link.

October 18, 2017 - Version 5.2.10

Added source asset requirements for HDR and 4K video.

July 12, 2017 - Version 5.2.9

Added ability to set audio levels and channel assignments for music, sound effects, and dialogue. Audio Description (AD) files can be delivered in 5.1 surround audio. Crop dimensions are allowed for dub card video. Poster art and cover art size requirements have changed.

May 17, 2017 - Version 5.2.8

The requirements for film trailers have changed. The requirements for HD Source for music video, TV, and film have been updated.

January 26, 2017 - Version 5.2.7

Added a chapter to describe the file format used to deliver song lyrics for album song tracks. Clarified single-channel audio.

August 17, 2016 - Version 5.2.6

Updates to Audio Description (AD).

May 26, 2016 - Version 5.2.5

Audio track channels for 5.1 and 7.1 Surround can either be all 24 bit or all 16 bit. Dub card time scale must match the time scale of the video source.

January 15, 2016 - Version 5.2.4

Additional formats added for HD source. Updated a URL link.

September 28, 2015 - Version 5.2.3

Changed requirements for poster art for film and TV. Added requirements for layered images for film. Clarified closed captions.

July 16, 2015 - Version 5.2.2

Added requirements for 7.1 surround audio for film. Changed requirements for album cover art.

January 8, 2015 - Version 5.2.1

Added explanations of field count values for ProRes. Added display dimensions of HD source to accommodate video formats that use non-square pixels, for example, in broadcast dimensions. Added Audio Description (AD) requirements for film. Clarified music video screen capture image.

April 9, 2014 - Version 5.2

Added closed caption requirements for music videos. Added a best practice for MFiT content. Added requirements for TV cover art. Added guidelines to TV content considerations. Clarified SCC files for both TV and Film. Changed the version number of this specification from 5.1 to 5.2 to keep the version number in sync with the new schema version.

March 20, 2013 - Version 5.1.1

Corrected the ProRes SD Profiles for NTSC and PAL.

February 28, 2013 - Version 5.1

Added best practices for delivering Mastered for iTunes content. Changed requirements for ringtone album cover art. Clarified acceptable frame rates for closed captioning for TV and Film. Videos with empty edits other than the last edit will be blocked.

November 7, 2012 - Version 5.0.1

Clarified SD Source video for film. Added new video source validations. Added Apple ProRes to NTSC and PAL SD TV source profiles. Changed requirements for QuickTime audio channel assignments. Closed captions can be added to MPEG-2 sources.

May 30, 2012 - Version 5.0

Album cover art and poster art requirements have changed. Removed TIFF from the list of recommended image formats and removed DPI requirements. Added delivery requirements for dub card video. 96 kHz audio is now supported.

September 22, 2011 - Version 4.8

Added crop dimensions for TV. Clarified content considerations for TV. Clarified closed captioning for TV. Added delivery requirements for iTT files.

July 13, 2011 - Version 4.7.2

Clarified delivery requirements for 5.1 audio and closed captioning. Added the profile for closed captioning for TV. Film poster art requirements have changed.

April 15, 2011 - Version 4.7

Clarified HD cropping for TV. Added color space requirement for HD film source. Clarified closed captioning text for film.

February 9, 2011 - Version 4.6

Removed asset specifications for books (a new iBooks Store asset guide has been created). Renamed this asset guide to: iTunes Video and Audio Asset Guide.

November 5, 2010 - Version 4.5

Clarified surround sound for HD music video audio source profile. Clarified delivery of HD source for music videos. Added a chapter for book source profiles. Put back 25 fps in the HD TV source profile that was incorrectly removed. Added two new best practice items to the TV Content Considerations section.

August 5, 2010 - Version 4.4

Added source profile for HD music video and cropping information. Clarified album cover art. Added surround sound to HD music video audio source profile.

February 3, 2010 - Version 4.3

Clarified that ALAC in a CAF container is allowed. Added source profile for pre-cut ringtones. Clarified that film ratings should not appear on poster art.

December 18, 2009 - Version 4.2

Clarified quality standards. Clarified closed captioning.

November 10, 2009 - Version 4.1

Clarified audio requirements for music and film.

September 11, 2009 - Version 4.0

Added best practices content for Film. Clarified requirements for SCC files.

July 1, 2009 - Version 3.3.2

Clarified image and audio requirements. Clarified frame rate requirements for TV.

May 12, 2009 - Version 3.3.1

Added support for PNG format images for cover art, poster art, and video screen captures. PNG images are not currently supported for chapter thumbnail images.

March 17, 2009 - Version 3.3

Added updated PAL support for film. Added closed-captioning to Film Content Profile. Added 24-bit support for audio. Added best practices content for TV. Clarified how to send stereo sound for Film and TV.

October 1, 2008 - Version 3.2

Added audio source specification to Music Audio Content Profile. Added HD format to Television Content Profile. Added Appendix I, which provides audio channel assignment instructions.

May 8, 2008 - Version 3.1.1

Complete reformatting of the Guide. Separation of content type profiles. Addition of Movie HD and SD specification. Addition of image specifications for TV and Film.

April 2, 2007 - Version 2.3

Introduction of Asset Specification Guide.