Patrick Schmitz
November 29, 2000
Technical Report
MSR-TR-2000-114
Microsoft Research
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052
This note describes a model for unifying event-based indeterminate timing (also known as atemporal composition) and declarative, determinate timing. Background rationale is presented, and a mechanism is described for processing events in the timing model and scheduler. The specifics of integrating this model into the SMIL timing model are described.
This paper was originally published June 6, 1999 as a W3C (SYMM working group) internal note.
Introduction
Definition of Terms
Background and Rationale
SMIL 1.0 Approach
Unifying Schedulers and Generic Event-based Timing
Applying the unified model to SMIL
The first W3C Working Group on Synchronized Multimedia (SYMM) developed SMIL - Synchronized Multimedia Integration Language. This XML-based language is used to express synchronization relationships among media elements. SMIL 1.0 documents describe multimedia presentations that can be played in SMIL-conformant viewers.
As part of the current SYMM Activity, the Working Group is extending the SMIL Timing and Synchronization support, and generalizing the support provided in SMIL 1.0. Additional capabilities will be added to the timing model, as well as support for integration with HTML and XML languages. See also "Synchronized Multimedia Modules based upon SMIL 1.0".
Among the areas of interest is support for interactive timing, also described as atemporal composition. This document describes a model for unifying traditional scheduled time models and interactive event-based models for multimedia.
I present these terms for the sake of this discussion. These definitions may not necessarily apply to the more general context of multimedia - their significance here is only for this limited context.
endsync attribute). For file-based media, the intrinsic duration is often finite. For synthetic media and for timelines, this can be indefinite or infinite (depending upon the semantics of the model).
The model I present came about as an evolution of experience with a variety of multimedia runtimes and schedulers. Many of these were supported in popular multimedia authoring tools, video and audio editing tools and some platform APIs for multimedia. In general, authoring tools tend to present an authoring model that is closely aligned to the actual implementation model of the associated runtime engine. While this is not a requirement, a unified model for scheduled and event-based timing will benefit both the runtime implementation and the authoring model.
For a long time, multimedia runtimes supported one of two basic models:
SMIL 1.0 presents something of a hybrid of these models, but with some constraints. The issue of performance QOS is left to the implementation, and is described as either "hard" or "soft" synchronization. In the "hard" sync model, most of the SMIL time model is scheduled and determinate. In the "soft" sync model, it seems to be undefined.
One significant point in the SMIL 1.0 model, however, is the ability to handle indeterminate durations for some media elements. Some media have a finite duration, but this duration is not known until the presentation of the associated media element is complete (or at least the data has all been downloaded). Other elements that are defined relative to such an element duration (e.g. successive elements in a sequence timeline) also have indeterminate timing. The timing and sync relationships for these elements are resolved when the original indeterminate time is resolved (e.g. when the movie is fully downloaded and the duration becomes known).
Note that it is possible to construct a SMIL 1.0 runtime that essentially hands off the scheduling issues to a media server. In this case, the client runtime does not manage the inter-media sync relationships, but simply plays media as delivered by the server. The server could preclude all indeterminate timing by gathering duration information for media as the stream scheduler prepares a presentation. Nevertheless, in many runtimes (and even in some streaming servers), it will be a requirement that indeterminate timing be supported.
SMIL 1.0 syntax includes a means of defining a timing relationship to the begin or end of another element. The specification refers to this as event-based timing. However, there is no specified requirement on the implementation to actually use events. The semantics of the timing determine that when the "effective" time does not match the "desired" time, the time model is based upon the "effective" time. However, for many cases, this is really an issue of the QOS for synchronization in the presentation.
For any time-graph in which all durations are determinate (e.g. all durations are explicit), "hard" sync runtimes should not allow any variance between "desired" and "effective" times. In these cases, there is no need for an event-based model, and a pure scheduled runtime will suffice. Only in the cases where there is a specifically indeterminate time should the need for an event system arise. Therefore, in the case of determinate timing, the semantics of event-based timing and simple declared timing relationships (e.g. children of a <par> element) are equivalent. The only significant distinction between the simple timing relationships and event-based timing arises when indeterminate times are involved.
As an example, consider the following two descriptions in SMIL 1.0:
Sample 1)
<seq>
  <media id="m1.1" src="..." />
  <media id="m1.2" src="..." />
  <media id="m1.3" src="..." />
</seq>
Sample 2)
<par>
  <media id="m2.1" src="..." />
  <media id="m2.2" src="..." begin="id(m2.1)(end)" />
  <media id="m2.3" src="..." begin="id(m2.2)(end)" />
</par>
There should be no difference in the semantics of these two constructs. In practice, the runtime engine may have to use an event system for both cases, if the durations of the media elements cannot be determined when the time-graph is built. At the same time, if the durations of the elements can be determined, a runtime has no particular need to involve events at all. In this sense, the use of the term "event-based timing" can be seen as a means of describing the semantics, rather than any requirement on the implementation. Few authoring models would expose these kinds of timing relationships as events, in the same way that for example a mouse click event would be presented.
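The equivalence of the two samples for determinate durations can be illustrated with a small Python sketch. The function names and the durations used here are hypothetical and are not part of SMIL; the sketch only assumes that each media element has a known duration in seconds.

```python
from typing import List


def seq_begin_times(durations: List[float]) -> List[float]:
    """Begin times for children of a <seq>: each child starts when the previous one ends."""
    begins: List[float] = []
    clock = 0.0
    for duration in durations:
        begins.append(clock)
        clock += duration
    return begins


def par_chain_begin_times(durations: List[float]) -> List[float]:
    """Begin times for children of a <par> where child i declares begin = end of child i-1."""
    begins: List[float] = []
    for index in range(len(durations)):
        if index == 0:
            begins.append(0.0)  # the first child begins with the par container
        else:
            begins.append(begins[index - 1] + durations[index - 1])
    return begins


# Hypothetical durations for the three media elements in each sample.
durations = [5.0, 3.0, 4.0]
seq_result = seq_begin_times(durations)
par_result = par_chain_begin_times(durations)
```

With known durations both functions produce the same schedule, and no event machinery is involved in either computation.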
Nevertheless, the SMIL 1.0 syntax does not support timing descriptions relative to events that come from outside the timing model - in particular user interaction events and events associated with time-based media (e.g. events streamed with video). Given that the model already requires runtime support for indeterminate times, it should be a relatively simple extension to support generalized event-based timing. The key will be to incorporate user-interaction events in a manner that extends the current support for indeterminate timing.
More recently, some hybrid models have been developed that combine scheduling support and event-based declaration. In these models, there is some form of scheduled time graph that describes the presentation and the synchronization relationships among the media elements, but there is also support for event binding mechanisms. The challenge in these models is to unify the two models in a manner that is easy to author, flexible across a broad range of content and use-cases, and relatively simple to implement.
The approach is based upon a scheduled model, with extensions to support indeterminate timing in the general case. This builds upon a known model, and does not require significant changes to the runtime model. This section describes the changes to the runtime model, and the following section describes how this model is represented in an authoring model like SMIL.
To support interactive content, the scheduled model must be more flexible in a number of ways:
Start and End times for media elements are described by the model, and then computed and cached by the runtime engine. Representing the scheduled times makes it possible to implement a synchronization manager that can optimize the preparation of media, and ensure that the performance is closely tracking the sync relationships as described by the author. For simple scheduled elements, there is no event propagation delay or other overhead associated with an event-based scheme. At the same time, the model separates the description from the runtime values. This makes it possible to dynamically change values, and then propagate the effects of a change by recomputing all the dependent values.
Any cached time can have the special value indeterminate. This means that while there is a description for how the value is computed, the actual value is not currently known. For practical purposes, this can be thought of as equivalent to setting the value to infinite, which places the associated element at the theoretical end of the presentation timeline.
There are several cases that lead to indeterminate times:
Indeterminate times in the model are handled by deferring any scheduling activity for the associated element. In most simple models, the synchronization point is defined at the beginning of an element timeline. As such, an indeterminate end-time does not preclude the playback of the element in the presentation. However, an indeterminate begin-time also implies an indeterminate synchronization relationship. Thus, for an element with an indeterminate begin time, the scheduler and synchronization engine defer action for the media, and do not attempt to incorporate the element into the running presentation. Another way of thinking about this is that the associated element (which may be an entire timeline or subgraph of the presentation) is disconnected from the running presentation graph. I often say that it is floating above the running timeline (or more precisely above the parent timeline), in context but not attached.
An indeterminate time can at some point in the presentation become determinate. When this happens, all dependent times are re-evaluated, and the scheduler incorporates all newly determinate synchronization relationships into the running model.
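The following Python sketch is one possible way to picture cached times, the indeterminate value, and re-evaluation of dependents. It is an illustration under stated assumptions, not a description of any particular runtime: the class CachedTime, its fields, and the movie/caption example are all hypothetical, and infinity is used to stand in for "indeterminate" as suggested above.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List

INDETERMINATE = math.inf  # assumption: infinity stands in for "not yet known"


@dataclass
class CachedTime:
    """A time value: the rule that describes it plus the cached runtime value."""
    name: str
    rule: Callable[[], float]
    value: float = INDETERMINATE
    dependents: List["CachedTime"] = field(default_factory=list)

    def is_determinate(self) -> bool:
        return math.isfinite(self.value)

    def recompute(self) -> None:
        """Re-evaluate the rule; propagate to dependents only if the value changed."""
        new_value = self.rule()
        if new_value != self.value:
            self.value = new_value
            for dependent in self.dependents:
                dependent.recompute()


# Hypothetical example: a caption begins 2 s after a movie of unknown length ends.
movie_duration = {"seconds": INDETERMINATE}
movie_end = CachedTime("movie.end", lambda: 0.0 + movie_duration["seconds"])
caption_begin = CachedTime("caption.begin", lambda: movie_end.value + 2.0)
movie_end.dependents.append(caption_begin)

movie_end.recompute()
deferred_before = not caption_begin.is_determinate()  # the scheduler would defer the caption

movie_duration["seconds"] = 30.0  # the duration becomes known
movie_end.recompute()             # caption.begin is re-evaluated as a dependent
```

Keeping the rule separate from the cached value is what allows a change to be propagated by recomputation rather than by rebuilding the time graph.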
Considering the cases from above:
An additional variant can take advantage of the mechanism described. Broadcast and streaming media can often deliver events as part of, or associated with the other media. These events generally are fired immediately, in the same model as an event firing when a user clicks with the mouse. However, these events can also have an associated event-time. The events can be delivered in advance of the event-time. When a timed event is delivered, the scheduler can propagate the event to timing registrants just as though the event had fired. The event is marked with the scheduled time, rather than the current presentation time, but is otherwise handled by the standard mechanism. Any times defined relative to the event are resolved to determinate times, and the scheduler can then optimize the performance by cueing media, etc. This increases the fidelity and quality of the performance for applications like IP-based enhancement of television, HTML and XML enhancement documents associated with streaming media, etc.
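As a rough illustration of timed events, the sketch below resolves a begin time against either the current presentation time (for an event that fires immediately) or the event's own scheduled time (for an event delivered in advance). The names StreamEvent, resolve_begin and cue_lead are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamEvent:
    name: str
    event_time: Optional[float] = None  # None means the event fires immediately


def resolve_begin(event: StreamEvent, now: float, offset: float = 0.0) -> float:
    """Resolve a begin time that was declared relative to an event."""
    base = now if event.event_time is None else event.event_time
    return base + offset


def cue_lead(begin: float, now: float) -> float:
    """Seconds the scheduler has available to prepare (cue) media before it must start."""
    return max(0.0, begin - now)


now = 10.0
click_begin = resolve_begin(StreamEvent("click"), now)                 # immediate event
timed_begin = resolve_begin(StreamEvent("goal", 25.0), now, offset=1.0)  # delivered early
lead = cue_lead(timed_begin, now)
```

In this sketch the immediate event leaves no time for preparation, while the timed event gives the scheduler a lead of several seconds in which to cue media.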
The sync relationship, begin and end times are always subject to the semantics of the parent timeline. As such, any local timeline that has indeterminate sync will still be cropped by the begin and end of the parent. Other mechanisms can force the presentation timeline to reset if so desired, but this is orthogonal to the basic model of indeterminate times and synchronization relationships. E.g. the SMIL hyperlinking model specifies that when linking to an element within a document, the document timeline is advanced to the presentation time associated with that element. If that element had an indeterminate begin time, an implementation can seek the presentation timeline to a point on the parent timeline and then resolve the indeterminate time to be determinate (e.g. by emulating a user-click). How would SMIL handle a link to an element defined to follow an MPEG movie in a sequence? Since the begin-time for the link destination cannot be determined other than by fully loading the MPEG movie, what should an implementation do?
To have useful application to the current SYMM Activity, we must be able to apply this model to the SYMM timing model, originally described in SMIL 1.0. This is my understanding of the requirements for such an integration:
SMIL 1.0 has already introduced the notion of time-model events (i.e. begin and end events) and a syntax for defining a time relative to these events. As discussed above, the notion of indeterminate timing was also introduced in SMIL 1.0. Although the event model described herein extends these concepts, there is no reason not to leverage the SMIL 1.0 syntax and definitions.
Events are used in the model only to describe a means of activating a given element time. There are no additional semantics associated with the event timing model. In particular, event timing does not carry the semantics defined in SMIL for timed hyperlinking, in which the link activation can seek the presentation timeline.
This principle simplifies the event semantics, and reduces the impact of event timing on existing semantic constructs. In particular, event-timed elements are still subject to all the semantic constraints defined by timeline containers. These include:
<par> containers: children with event-based begin and/or end times are still constrained to begin no earlier than the par container begins, and to end no later than the par container ends. Activation of an event-based begin or end time does not directly affect the par container (other than via the definition of the endsync specifier, which remains unchanged from SMIL 1.0). See also the event routing discussion.
<seq> containers: children of a seq are constrained to begin no earlier than the preceding sibling element ends. Due to the indeterminate nature of event timing, it must be illegal to specify an event-timed begin for a child of a seq. A similar constraint is defined in SMIL 1.0. It is legal to specify an event-based end time for a child of a seq. Just as in SMIL 1.0, the effective end of such a child will be the earlier of the event time and an explicit end of the parent seq container. Activation of an event-based end time does not directly affect the seq container (other than the contribution to the implicit end of the seq, which remains unchanged from SMIL 1.0).
<choice> or <excl> containers (proposed for SYMM 2.0): These new timeline constructs support the semantic that only one child of the container may be active at one time. If a child is activated by any means, any sibling that is already active is made inactive. Children of a choice are subject to the same constraints as children of a par container. If a child element with an event-timed begin receives the associated event, the child begins and the container manages the specific choice/exclusive semantics.
Elements that define a begin or end time relative to an event outside the timing model (e.g. user events, stream events) are said to have an explicitly indeterminate time. If the time is a begin time, the element has an explicitly indeterminate sync relationship. If the time dependency chain for an element time traces through any time that is explicitly indeterminate, then the element time is also explicitly indeterminate. It may be desirable to exclude elements with explicitly indeterminate sync relationships from certain calculations, such as the implicit duration of a par container. Conversely, the current SMIL model for the definition of implicit end could be applied, and the implicit begin could be defined to be 0 (zero) on the parent timeline. This needs further discussion.
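The rule that explicit indeterminacy is inherited along the time dependency chain can be sketched as a simple graph walk. The representation below (a dictionary of dependencies and a set of external events) and all of the identifiers in it are assumptions made for illustration only.

```python
from typing import Dict, List, Set


def explicitly_indeterminate(time_id: str,
                             depends_on: Dict[str, List[str]],
                             external_events: Set[str]) -> bool:
    """True if the dependency chain for time_id reaches an event outside the timing model."""
    seen: Set[str] = set()
    stack = [time_id]
    while stack:
        current = stack.pop()
        if current in external_events:
            return True
        if current in seen:
            continue  # guard against cycles in a malformed graph
        seen.add(current)
        stack.extend(depends_on.get(current, []))
    return False


# Hypothetical time graph: a caption follows a video that starts on a button click,
# while a logo is timed relative to its par container only.
depends_on = {
    "caption.begin": ["video.begin"],
    "video.begin": ["button.click"],
    "logo.begin": ["par.begin"],
}
external_events = {"button.click"}

caption_flag = explicitly_indeterminate("caption.begin", depends_on, external_events)
logo_flag = explicitly_indeterminate("logo.begin", depends_on, external_events)
```

A runtime could use such a flag to decide which children to leave out of calculations like the implicit duration of a par container, if that option is adopted.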
If an event-timed element is active (i.e. it is playing), and it receives a (second) begin-event, two behaviors are possible, and are resolved with the use of an attribute: "eventRestart". If "eventRestart=true", then the element will restart. This is useful for running graphic animation in response to a button click. If "eventRestart=false", then the additional begin events are ignored. This is useful to prevent restart of things like audio, while they are playing.
Once an element completes (including any repeat behavior), it can be restarted by another begin event, subject to the normal parent-timing constraints.
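The restart behavior can be summarized as a small state machine. The sketch below is only one reading of the rules above; the class EventTimedElement and its members are hypothetical, with event_restart standing in for the proposed "eventRestart" attribute.

```python
from dataclasses import dataclass


@dataclass
class EventTimedElement:
    event_restart: bool   # mirrors the proposed "eventRestart" attribute
    active: bool = False
    begin_count: int = 0  # how many times the element has (re)started

    def on_begin_event(self) -> bool:
        """Handle a begin event; return True if the element started or restarted."""
        if self.active and not self.event_restart:
            return False  # additional begin events are ignored while playing
        self.active = True
        self.begin_count += 1
        return True

    def on_complete(self) -> None:
        """Called when the element finishes, including any repeat behavior."""
        self.active = False


animation = EventTimedElement(event_restart=True)
animation.on_begin_event()
animation.on_begin_event()      # restarts while active

audio = EventTimedElement(event_restart=False)
audio.on_begin_event()
ignored = not audio.on_begin_event()  # second click is ignored while playing
audio.on_complete()
audio.on_begin_event()          # once complete, a new begin event restarts it
```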
If an element with an indeterminate begin-time is activated (e.g. it received a begin event like a click), the begin time is resolved to a determinate begin time that matches the event-time (runtimes could use the current time in place of the event-time, but will slew as events propagate through the system). This determinate begin time is used to calculate the end time, and is propagated to all time dependents (e.g. to other elements specified relative to this one). This is described in somewhat more detail above.
For the case that an element is in a parent (or ancestor) timeline that repeats: for each iteration of the parent or ancestor, the element is played as though it were the first time the parent timeline was playing. This may require a reset of some sort in the implementation to ensure that the media is restarted, and that any event sensitivity is reset. Any state associated with eventRestart is reset. Any indeterminate begin-time (or end-time) that was resolved to a determinate time during the just-completed repeat iteration is reset to be indeterminate again.
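A minimal sketch of the per-iteration reset follows, assuming a hypothetical ChildState record in which infinity again stands in for an indeterminate time. Scheduled (determinate) children keep their declared begin times; only times that were resolved from events revert.

```python
import math
from dataclasses import dataclass
from typing import List

INDETERMINATE = math.inf  # assumption: infinity stands in for "indeterminate"


@dataclass
class ChildState:
    name: str
    event_timed: bool          # begin is declared relative to an external event
    begin: float = INDETERMINATE
    active: bool = False


def reset_for_iteration(children: List[ChildState]) -> None:
    """Prepare children for the next iteration of a repeating parent timeline."""
    for child in children:
        child.active = False             # media restarts and restart state is cleared
        if child.event_timed:
            child.begin = INDETERMINATE  # a resolved event time becomes indeterminate again


# Hypothetical state at the end of one iteration of the parent.
children = [
    ChildState("logo", event_timed=False, begin=0.0, active=True),
    ChildState("clip", event_timed=True, begin=12.5, active=True),
]
reset_for_iteration(children)
```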
According to the DOM event model, events are dispatched from the DOM root to the target node, following the DOM parent hierarchy. In general events may be captured during the dispatch phase, and can be cancelled, precluding further processing. Events also bubble up from the target node, again according to the DOM containment hierarchy.
The current discussion for a SYMM DOM includes the idea that the DOM will represent containment (i.e. parenting) based upon the timeline containers (e.g. par and seq). This model would support the semantics for event dispatch to event-timed elements in the time graph. In particular, a timeline container element that is not active can capture events targeted at any descendant, and cancel dispatch to any descendants. This enforces the semantic that an element cannot begin or end beyond the bounds of the parent timeline extent. It also makes explicit the semantic that an event-timed element is not sensitive to events if the parent timeline container is not active.
Another related semantic specifies that if an element is not currently active, and it receives an event included as part of an end-time specification, the event is ignored by the element.
This model is easy to implement based upon the DOM event model, and has minimum impact upon the existing time-graph semantics.
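The routing rules above can be sketched without a real DOM implementation. In the Python sketch below, TimeNode, path_from_root and dispatch are hypothetical names; the capture phase is modeled simply as a walk from the root to the target's parent, and cancellation as an early return.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimeNode:
    name: str
    active: bool = False
    parent: Optional["TimeNode"] = None


def path_from_root(target: TimeNode) -> List[TimeNode]:
    """Nodes from the root down to and including the target."""
    path: List[TimeNode] = []
    node: Optional[TimeNode] = target
    while node is not None:
        path.append(node)
        node = node.parent
    path.reverse()
    return path


def dispatch(target: TimeNode, kind: str) -> bool:
    """Route a 'begin' or 'end' event to target; return True if the target acted on it."""
    for container in path_from_root(target)[:-1]:
        if not container.active:
            return False  # an inactive container captures and cancels the event
    if kind == "end" and not target.active:
        return False      # end events are ignored by elements that are not active
    target.active = (kind == "begin")
    return True


par = TimeNode("par", active=True)
clip = TimeNode("clip", parent=par)

early_end = dispatch(clip, "end")    # ignored: clip is not active yet
started = dispatch(clip, "begin")    # accepted: the par container is active
par.active = False
blocked = dispatch(clip, "end")      # cancelled by the inactive par container
```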
Many common authoring use-cases associate a given interaction result with several different events. In some cases, the set of events includes different event types targeted at a given node (e.g. onClick and onDoubleClick). In other cases, the set of events includes a given event type targeted at one of a number of elements (e.g. button1.onclick, button2.onclick and button3.onclick). With event bubbling, it is possible to implicitly associate with an event on a number of elements, by specifying the focus as a parent (e.g. "parent.onClick" will actually associate with all click events on the descendants of "parent", unless event bubbling is cancelled).
It is a minor extension to support the explicit association of an element time with more than one event, by specifying a set of events. A possible syntax is a comma-separated list of qualified event names.
This functionality is also described in the note: "Event Handling in SMIL and XHTML" [SYMM-EVENTS].
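As an illustration of the comma-separated form only, the sketch below parses a list of qualified event names into (target, event) pairs. The exact syntax is not defined in this note, so the "target.event" shape and the function name parse_event_list are assumptions.

```python
from typing import List, Tuple


def parse_event_list(spec: str) -> List[Tuple[str, str]]:
    """Split a comma-separated list of qualified event names into (target, event) pairs."""
    pairs: List[Tuple[str, str]] = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas and surrounding whitespace
        target, separator, event = item.rpartition(".")
        if not separator or not target or not event:
            raise ValueError(f"not a qualified event name: {item!r}")
        pairs.append((target, event))
    return pairs


# Hypothetical attribute value naming three buttons that can all start the element.
events = parse_event_list("button1.onclick, button2.onclick, button3.onclick")
```

A runtime would then register the element time with each pair, so that any one of the events resolves the time.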