Media Overlays Structure

During read-aloud narration, the text being read can be highlighted word by word, sentence by sentence, or not at all. Highlighting is accomplished using Media Overlays, the EPUB 3 mechanism for synchronizing a segment of an audio file with the corresponding phrase of text. Each text phrase is identified by a standard HTML id attribute, and the corresponding audio is identified by a start time and an end time. A SMIL (XML) file pairs the identified text with its audio.

The SMIL file contains a series of <par> elements, each containing a <text> element and an <audio> element. Both elements require a src attribute. The src attribute of the <text> element is a URL with a fragment identifier (the part at the end of the URL that begins with # (hash)), which points to the identified word, phrase, or sentence. The src attribute of the <audio> element is a URL pointing to the location of the audio file within the EPUB bundle.

In other words, the fragment identifier in the <text> element determines which word or sentence is highlighted, and the clipBegin and clipEnd attributes of the <audio> element determine which portion of the audio is spoken while it is highlighted.
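The clipBegin and clipEnd values are SMIL clock values. The SMIL example in this section uses the timecount form ("5s"), but SMIL 3.0 also allows full clock ("00:00:05.000") and partial clock ("00:05") values, plus the h, min, and ms suffixes. When generating or checking overlays with a script, it can help to normalize these to seconds. The following is a minimal sketch: parse_clock_value and clip_duration are hypothetical helper names, and the parser is deliberately a little more lenient than the SMIL grammar (it does not enforce two-digit minutes and seconds, for example).

```python
import re

# Timecount values such as "5s", "1.5min", "200ms", "2h", or a bare "5".
# A bare number is read as seconds.
_TIMECOUNT = re.compile(r"^(\d+(?:\.\d+)?)(h|min|s|ms)?$")
_SCALE = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001, None: 1.0}


def parse_clock_value(value: str) -> float:
    """Convert a SMIL clock value (as used in clipBegin/clipEnd) to seconds.

    Hypothetical helper; lenient sketch of the SMIL 3.0 clock-value syntax.
    """
    value = value.strip()
    if ":" in value:
        # Full clock "hh:mm:ss.fraction" or partial clock "mm:ss.fraction".
        parts = value.split(":")
        if len(parts) not in (2, 3):
            raise ValueError(f"unsupported clock value: {value!r}")
        seconds = float(parts[-1])
        minutes = int(parts[-2])
        hours = int(parts[0]) if len(parts) == 3 else 0
        return hours * 3600 + minutes * 60 + seconds
    match = _TIMECOUNT.match(value)
    if match is None:
        raise ValueError(f"unsupported clock value: {value!r}")
    number, unit = match.groups()
    return float(number) * _SCALE[unit]


def clip_duration(clip_begin: str, clip_end: str) -> float:
    """Length in seconds of the audio clip paired with one <par>."""
    begin = parse_clock_value(clip_begin)
    end = parse_clock_value(clip_end)
    if end <= begin:
        raise ValueError("clipEnd must come after clipBegin")
    return end - begin
```

For example, clip_duration("5s", "15s") returns 10.0, the length of the first clip in the SMIL example.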

Note: Apple recommends that you start the audio on the title page and have the title and author read as part of the audio.

SMIL File Example

<?xml version="1.0" encoding="UTF-8"?>
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0" profile="http://www.idpf.org/epub/30/profile/content/">
    <body>
        <par id="par1">
            <text src="page1.xhtml#word0"/>
            <audio src="audio/page1.m4a" clipBegin="5s" clipEnd="15s"/>
        </par>
        <par id="par2">
            <text src="page1.xhtml#word2"/>
            <audio src="audio/page1.m4a" clipBegin="15s" clipEnd="25s"/>
        </par>
    </body>
</smil>
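Because every <par> follows the same pattern, SMIL files are usually generated rather than written by hand. The sketch below uses Python's standard xml.etree.ElementTree module to build the same structure as the SMIL example from a list of (fragment id, clipBegin, clipEnd) entries. The function name build_smil and its input format are hypothetical, the profile attribute is copied from the example, and ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"


def build_smil(xhtml_href, audio_href, clips):
    """Build a Media Overlay document with one <par> per (id, begin, end).

    Hypothetical helper: clips is a list of (fragment_id, clip_begin, clip_end).
    """
    # Serialize SMIL elements in the default namespace (xmlns="...").
    ET.register_namespace("", SMIL_NS)
    smil = ET.Element(f"{{{SMIL_NS}}}smil", {
        "version": "3.0",
        "profile": "http://www.idpf.org/epub/30/profile/content/",
    })
    body = ET.SubElement(smil, f"{{{SMIL_NS}}}body")
    for index, (fragment_id, clip_begin, clip_end) in enumerate(clips, start=1):
        par = ET.SubElement(body, f"{{{SMIL_NS}}}par", {"id": f"par{index}"})
        # The fragment identifier after "#" must match an id in the XHTML page.
        ET.SubElement(par, f"{{{SMIL_NS}}}text", {"src": f"{xhtml_href}#{fragment_id}"})
        # clipBegin/clipEnd select the portion of the audio file to play.
        ET.SubElement(par, f"{{{SMIL_NS}}}audio", {
            "src": audio_href,
            "clipBegin": clip_begin,
            "clipEnd": clip_end,
        })
    ET.indent(smil, space="    ")
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(smil, encoding="unicode")


print(build_smil("page1.xhtml", "audio/page1.m4a",
                 [("word0", "5s", "15s"), ("word2", "15s", "25s")]))
```

The XML declaration is written by hand because ElementTree's own declaration, when serializing to a string, names the platform's preferred encoding rather than UTF-8.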

HTML File Example

<p>
    <span id="word0">Shall</span>
    <span id="word1">I</span>
    <span id="word2">compare</span>
    <span id="word3">thee</span>
    <span id="word4">to</span>
    <span id="word5">a</span>
    <span id="word6">summer's</span>
    <span id="word7">day?</span>
</p>
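Wrapping every word in its own span by hand is tedious and error-prone. Assuming words are separated by whitespace and the wordN id scheme from the HTML example is used, a short script can produce the markup. wrap_words is a hypothetical helper; punctuation stays attached to its word, as "day?" does in the example. Because ids must be unique within the XHTML document, the first_index parameter (also an assumption of this sketch) lets numbering continue across paragraphs on the same page.

```python
from html import escape


def wrap_words(sentence: str, first_index: int = 0) -> str:
    """Wrap each whitespace-separated word in a <span> with a sequential id.

    Hypothetical helper for producing Media Overlay targets like "word0".
    """
    spans = [
        # quote=False: quotes and apostrophes in text content need no escaping,
        # but &, <, and > are still escaped.
        f'<span id="word{i}">{escape(word, quote=False)}</span>'
        for i, word in enumerate(sentence.split(), start=first_index)
    ]
    return "<p>" + " ".join(spans) + "</p>"


print(wrap_words("Shall I compare thee to a summer's day?"))
```

Each generated id can then be used as the fragment identifier in a <text> element's src attribute.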

Notes

Pages Without Audio

You can set the timing for page turning on pages that do not have audio. Apple Books has two default zoom levels in each orientation: page and spread. When a reader zooms in to a page, each page is focused independently during navigation. When a reader zooms out to a spread, the whole spread is treated as a single step during navigation.