During read aloud narration, the text being read can be highlighted word by word, sentence by sentence, or not at all. Highlighting is accomplished using Media Overlays, the EPUB 3 method of syncing a portion of an audio file to a phrase of corresponding text. Text phrases are identified using a standard HTML id attribute. The corresponding audio is referenced by a start time and an end time. The identified text and audio are paired in a SMIL XML file, which contains a series of <par> elements, each containing an <audio> element and a <text> element. Both the <text> and <audio> elements have a required src attribute. The src attribute of the <text> element is a URL with a fragment identifier (the part at the end of the URL that starts with a # (hash)) pointing to the identified word, phrase, or sentence. The src attribute of the <audio> element is a URL pointing to the location of the audio file within the EPUB bundle. The text to highlight is defined by the fragment identifier in the <text> element, and the corresponding spoken audio is defined by the clipBegin and clipEnd attributes of the <audio> element.
Note: Apple recommends that you start the audio on the title page and have the title and author read as part of the audio.
<?xml version="1.0" encoding="UTF-8"?>
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0"
      profile="http://www.idpf.org/epub/30/profile/content/">
  <body>
    <par id="par1">
      <text src="page1.xhtml#word0"/>
      <audio src="audio/page1.m4a" clipBegin="5s" clipEnd="15s"/>
    </par>
    <par id="par2">
      <text src="page1.xhtml#word2"/>
      <audio src="audio/page1.m4a" clipBegin="15s" clipEnd="25s"/>
    </par>
  </body>
</smil>
<p>
  <span id="word0">Shall</span>
  <span id="word1">I</span>
  <span id="word2">compare</span>
  <span id="word3">thee</span>
  <span id="word4">to</span>
  <span id="word5">a</span>
  <span id="word6">summer's</span>
  <span id="word7">day?</span>
</p>
All <par> elements must follow the narrative order of the book. (For example, <par id="par2"> must follow <par id="par1">.)
Highlighting during read aloud can be as fine-grained or as broad as the content creator chooses. For children's books, word-by-word highlighting is strongly preferred. Text id attributes can also be defined at the sentence level.
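As a rough sketch of sentence-level highlighting, the whole line could be wrapped in a single element and paired with one <par>. The id value sentence0 and the clip times below are illustrative assumptions, not values taken from this guide.

```xml
<!-- Hypothetical XHTML markup in page1.xhtml, one id for the whole sentence:
     <p><span id="sentence0">Shall I compare thee to a summer's day?</span></p> -->
<par id="par1">
  <text src="page1.xhtml#sentence0"/>
  <!-- Clip times are placeholders; use the sentence's real start and end -->
  <audio src="audio/page1.m4a" clipBegin="5s" clipEnd="9s"/>
</par>
```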
The highlight is styled using CSS. You can set the highlight color, or turn highlighting off by making the highlight color the same as the text color. See CSS Styling of Media Overlays.
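A minimal sketch of what that styling might look like, assuming the class name -epub-media-overlay-active is declared as the active class through the EPUB Media Overlays media:active-class metadata property. The color is an arbitrary example; the referenced section remains the authority on what Apple Books supports.

```xml
<!-- In the package document (OPF) metadata: name the class applied to the active text -->
<meta property="media:active-class">-epub-media-overlay-active</meta>

<!-- In the XHTML head (an equivalent rule in a linked stylesheet also works) -->
<style type="text/css">
  /* Highlight the word or sentence currently being read */
  .-epub-media-overlay-active { color: red; }
</style>
```

Setting the color in this rule to the same value as the body text would make the highlight invisible, which is the "turn off" case described above.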
Create one SMIL document per XHTML document.
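In EPUB 3, each content document is tied to its SMIL document in the package manifest through the media-overlay attribute. The following is a sketch under the assumption that the overlay for page1.xhtml is named page1.smil; the item id values are hypothetical.

```xml
<manifest>
  <!-- Content document; media-overlay points at the id of its SMIL item -->
  <item id="page1" href="page1.xhtml" media-type="application/xhtml+xml"
        media-overlay="page1-overlay"/>
  <!-- The Media Overlay document for page1.xhtml -->
  <item id="page1-overlay" href="page1.smil" media-type="application/smil+xml"/>
  <!-- The audio file referenced by the overlay -->
  <item id="page1-audio" href="audio/page1.m4a" media-type="audio/mp4"/>
</manifest>
```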
You can set the timing for page turning in pages that do not have audio. Apple Books has two default zoom levels in each orientation: page and spread. When a reader zooms into a page, each page is focused independently during navigation. When a reader zooms to a spread, the spread is treated as a single step during book navigation.
If Turn Pages is set to Automatically, Apple Books pauses reading for 3 seconds on any pages or spreads that do not have any associated audio. After 3 seconds, reading continues, and the reader is taken to the next page or spread.
If Turn Pages is set to Manually, Apple Books takes the reader to pages or spreads with no audio, and the corner of the page is immediately turned up, indicating to the reader that it is time for them to turn the page.
To override this behavior and skip the spread entirely, provide a <par> that corresponds to the skipped spread and give it a duration of 0s. If you want a pause longer than 3 seconds, build that time into the audio file.
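One way such a zero-duration <par> might be written is sketched below. It assumes the 0s duration is expressed by giving the audio clip equal clipBegin and clipEnd values; the file name page5.xhtml and the id values are hypothetical. The <par> would go in the SMIL document associated with the spread being skipped.

```xml
<!-- Hypothetical spread to skip: page5.xhtml with an element whose id is "skip" -->
<par id="par-skip">
  <text src="page5.xhtml#skip"/>
  <!-- Equal clipBegin and clipEnd give the clip a duration of 0s (assumed form) -->
  <audio src="audio/page1.m4a" clipBegin="0s" clipEnd="0s"/>
</par>
```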