Proposal: Audio for Mobile

Author: Jaana Burcu Dogan

With input from David Crawshaw, Hyang-Ah Kim and Andrew Gerrand.

Last updated: November 30, 2015

Discussion at https://golang.org/issue/13432.

Abstract

This proposal suggests core abstractions to support audio decoding and playback on mobile devices.

Background

In the scope of the Go mobile project, an audio package that supports decoding and playback is a top priority. The current status of audio support under x/mobile is limited to OpenAL bindings and an experimental high-level audio player that is backed by OpenAL.

The experimental audio package fails to

In order to address these concerns, I am proposing core abstractions and a minimal set of features based on the proposed abstractions to provide decoding and playback support.

Proposal

I (Burcu Dogan) surveyed the top iOS and Android apps for audio features. The survey revealed three major categories with significantly different requirements. A good audio package shouldn't address the different classes of requirements with isolated audio APIs, but must introduce common concepts and types that can be the backbone of both high- and low-level audio packages. This is how we will enable users to expand their audio capabilities by partially delegating their work to lower-level layers of the audio package without having to rewrite their entire audio stack.

Features considered

This section briefly explains the features required to support the common audio requirements of mobile applications. The abstractions we introduce today should be extendable to meet a majority of the features listed below in the long run.

Playback

Single or multi-channel playback with player controls such as play, pause, stop, etc. Games use a looping sample as the background music, so looping functionality is also essential. Multiple playback instances are needed. Most games require a background audio track and one-shot audio effects in the foreground.

Decoding

Codec library and decoding support. Most radio-like apps and music players need to play a variety of audio sources. Codec support on par with AudioUnit on iOS and OpenMAX on Android is good to have.

Remote streaming

Audio players, radios and tools that stream audio need to be able to work with remote audio sources. HTTP Live Streaming works on both platforms but used to be inefficient on Android devices.

Synchronization and composition

Playlist features

Music players and radios require playlist features so that users can queue and unqueue tracks on the player. Players also need shuffling and repeating features.

More information on the classification of the audio apps based on the features listed above is available in Appendix: Audio Apps Classification.

Goals

Short-term goals

Longer-term goals

Non-goals

Core abstractions

This section proposes the core interfaces and abstractions to represent audio, audio sources and decoding primitives. The goal of introducing and agreeing on the core abstractions is to be able to extend the audio package in light of the features considered above without breaking the APIs.

Clip

The audio package will represent audio data as linear PCM formatted in-memory audio chunks. A fundamental interface, Clip, will define how to consume audio data and how audio attributes (such as bit depth and sample rate) are reported to the consumers of an audio media source.

Clip is a small window into the underlying audio data.

// FrameInfo represents the frame-level information.
type FrameInfo struct {
    // Channels is the number of audio channels
    // (e.g. 1 for mono, 2 for stereo).
    Channels int

    // BitDepth is the number of bits used to represent
    // a single sample.
    BitDepth int

    // SampleRate is the number of samples to be played
    // each second.
    SampleRate int64
}

// Clip represents linear PCM formatted audio.
// Clip can seek and read a small number of frames to allow users to
// consume a small section of the underlying audio data.
//
// Frames returns audio frames up to the number that can fit into buf.
// n is the total number of returned frames.
// err is io.EOF if there are no frames left to read.
//
// FrameInfo returns the basic frame information about the clip audio.
//
// Seek seeks (offset*framesize*channels) bytes into the source audio data.
// Seeking to a negative offset is illegal.
// An error is returned if the offset is out of the bounds of the
// audio data source.
//
// Size returns the total number of bytes of the underlying audio data.
// TODO(jbd): Support cases where size is unknown?
type Clip interface {
    Frames(buf []byte) (n int, err error)
    Seek(offset int64) error
    FrameInfo() FrameInfo
    Size() int64
}
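The relationship between Size, FrameInfo and playback time can be worked through with a little arithmetic. The sketch below is illustrative only: bytesPerFrame and clipDuration are hypothetical helpers that are not part of the proposal, and it assumes that one frame holds one sample per channel and that BitDepth is a multiple of 8.

```go
package main

import (
	"fmt"
	"time"
)

// FrameInfo mirrors the struct proposed above.
type FrameInfo struct {
	Channels   int
	BitDepth   int
	SampleRate int64
}

// bytesPerFrame is a hypothetical helper, not part of the proposal.
// It assumes a frame holds one sample per channel and that BitDepth
// is a multiple of 8.
func bytesPerFrame(fi FrameInfo) int64 {
	return int64(fi.Channels) * int64(fi.BitDepth/8)
}

// clipDuration is a hypothetical helper that estimates how long
// size bytes of audio described by fi take to play.
// It returns 0 if fi does not describe a usable format.
func clipDuration(size int64, fi FrameInfo) time.Duration {
	bpf := bytesPerFrame(fi)
	if bpf <= 0 || fi.SampleRate <= 0 {
		return 0
	}
	frames := size / bpf
	return time.Duration(frames) * time.Second / time.Duration(fi.SampleRate)
}

func main() {
	// CD-quality audio: stereo, 16-bit, 44.1 kHz.
	cd := FrameInfo{Channels: 2, BitDepth: 16, SampleRate: 44100}
	fmt.Println(bytesPerFrame(cd))        // 4
	fmt.Println(clipDuration(176400, cd)) // 1s
}
```

With these assumptions, a clip whose Size is 176400 bytes of 16-bit stereo audio at 44100 Hz contains 44100 frames, which is one second of playback.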

Decoders

Decoders take arbitrary input and are responsible for producing a clip. TODO(jbd): The proposal should also mention how the decoders will be organized, e.g. the image package's support for png, jpeg, gif, etc. decoders.

// Decode reads from r and converts the input
// to a PCM clip output.
func Decode(r io.ReadSeeker) (Clip, error) {
  panic("not implemented")
}

// DecodeWAVBytes decodes the given WAV byte slice
// into a PCM clip output. An error is returned if any of the decoding
// steps fail (e.g. FrameInfo cannot be determined from the WAV header).
func DecodeWAVBytes(data []byte) (Clip, error) {
  panic("not implemented")
}
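As an illustration of the header step that DecodeWAVBytes mentions, the sketch below derives a FrameInfo from a canonical WAV header. It is not the proposed implementation: parseWAVHeader is a hypothetical helper, and it assumes the "fmt " chunk immediately follows the RIFF/WAVE markers and that the data is uncompressed linear PCM. Real files may carry other chunks first, which a full decoder would have to skip.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// FrameInfo mirrors the struct proposed above.
type FrameInfo struct {
	Channels   int
	BitDepth   int
	SampleRate int64
}

// parseWAVHeader is a hypothetical helper, not part of the proposal.
// It reads the frame-level information from a canonical WAV header in
// which the "fmt " chunk starts at byte 12.
func parseWAVHeader(data []byte) (FrameInfo, error) {
	const minHeader = 36 // bytes up to and including bits per sample
	if len(data) < minHeader {
		return FrameInfo{}, errors.New("wav: header too short")
	}
	if string(data[0:4]) != "RIFF" || string(data[8:12]) != "WAVE" {
		return FrameInfo{}, errors.New("wav: missing RIFF/WAVE markers")
	}
	if string(data[12:16]) != "fmt " {
		return FrameInfo{}, errors.New("wav: fmt chunk is not first")
	}
	// All multi-byte WAV header fields are little-endian.
	if format := binary.LittleEndian.Uint16(data[20:22]); format != 1 {
		return FrameInfo{}, errors.New("wav: only linear PCM (format 1) is handled")
	}
	return FrameInfo{
		Channels:   int(binary.LittleEndian.Uint16(data[22:24])),
		SampleRate: int64(binary.LittleEndian.Uint32(data[24:28])),
		BitDepth:   int(binary.LittleEndian.Uint16(data[34:36])),
	}, nil
}

func main() {
	// A 44-byte header for stereo, 16-bit, 44100 Hz PCM with no samples.
	header := []byte("RIFF\x00\x00\x00\x00WAVEfmt \x10\x00\x00\x00" +
		"\x01\x00\x02\x00\x44\xac\x00\x00\x10\xb1\x02\x00\x04\x00\x10\x00" +
		"data\x00\x00\x00\x00")
	info, err := parseWAVHeader(header)
	fmt.Println(info, err)
}
```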

Clip sources

Any arbitrary valid audio data source can be converted into a clip. Examples of clip sources are networking streams, file assets and in-memory buffers.

// NewBufferClip converts a buffer to a Clip.
func NewBufferClip(buf []byte, info FrameInfo) Clip {
    panic("not implemented")
}

// NewRemoteClip converts the HTTP live streaming media
// source into a Clip.
func NewRemoteClip(url string) (Clip, error) {
    panic("not implemented")
}
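To make the Clip contract concrete, here is a minimal sketch of how NewBufferClip could be backed by an in-memory slice. This is one possible reading of the interface, not the proposed implementation: the bufferClip type is hypothetical, the Seek offset is interpreted as a frame index, a frame is assumed to hold one sample per channel, and returning io.ErrShortBuffer for a buffer smaller than one frame is an assumption the proposal does not specify.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// FrameInfo and Clip mirror the types proposed above.
type FrameInfo struct {
	Channels   int
	BitDepth   int
	SampleRate int64
}

type Clip interface {
	Frames(buf []byte) (n int, err error)
	Seek(offset int64) error
	FrameInfo() FrameInfo
	Size() int64
}

// bufferClip is a hypothetical in-memory Clip.
type bufferClip struct {
	data []byte
	info FrameInfo
	pos  int // read position in bytes
}

func NewBufferClip(buf []byte, info FrameInfo) Clip {
	return &bufferClip{data: buf, info: info}
}

// frameSize assumes one sample per channel in each frame.
func (c *bufferClip) frameSize() int {
	return c.info.Channels * c.info.BitDepth / 8
}

// Frames copies as many whole frames as fit into buf.
func (c *bufferClip) Frames(buf []byte) (int, error) {
	fs := c.frameSize()
	if fs <= 0 {
		return 0, errors.New("audio: invalid frame info")
	}
	avail := (len(c.data) - c.pos) / fs
	if avail == 0 {
		return 0, io.EOF
	}
	n := len(buf) / fs
	if n == 0 {
		return 0, io.ErrShortBuffer
	}
	if n > avail {
		n = avail
	}
	copy(buf, c.data[c.pos:c.pos+n*fs])
	c.pos += n * fs
	return n, nil
}

// Seek moves the read position to the frame at index offset.
func (c *bufferClip) Seek(offset int64) error {
	fs := int64(c.frameSize())
	if fs <= 0 {
		return errors.New("audio: invalid frame info")
	}
	if offset < 0 {
		return errors.New("audio: negative seek offset")
	}
	if offset > int64(len(c.data))/fs {
		return errors.New("audio: seek offset out of bounds")
	}
	c.pos = int(offset * fs)
	return nil
}

func (c *bufferClip) FrameInfo() FrameInfo { return c.info }
func (c *bufferClip) Size() int64          { return int64(len(c.data)) }

func main() {
	info := FrameInfo{Channels: 2, BitDepth: 16, SampleRate: 44100}
	clip := NewBufferClip(make([]byte, 12), info) // 3 stereo 16-bit frames
	buf := make([]byte, 8)                        // room for 2 frames
	for {
		n, err := clip.Frames(buf)
		if err == io.EOF {
			break
		}
		if err != nil {
			fmt.Println("read error:", err)
			return
		}
		fmt.Println("read", n, "frames")
	}
}
```

The loop in main shows the intended consumption pattern: call Frames repeatedly with a small buffer until it reports io.EOF.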

Players

A player plays a series of clips back-to-back and provides basic control functions (play, stop, pause, seek, etc).

Note: Currently, x/mobile/exp/audio package provides an experimental and highly immature player. With the introduction of the new core interfaces, we will break the API surface in order to bless the new abstractions.

// NewPlayer returns a new Player. It initializes the underlying
// audio devices and the related resources.
// A player can play multiple clips back-to-back. Players will begin
// prefetching the next clip to provide a smooth and uninterrupted
// playback.
func NewPlayer(c ...Clip) (*Player, error)

Compatibility

No compatibility issues.

Implementation

The current scope of the implementation will be restricted to meet the requirements listed in the "Short-term goals" section.

The interfaces will be contributed by Burcu Dogan. The implementation of the decoders and playback is a team effort and requires additional planning.

The audio package has no dependencies on upcoming Go releases and therefore doesn't have to fit in the Go release cycle.

Open issues

Appendix: Audio Apps Classification

The classification of the audio apps is based on the survey results mentioned above. This section summarizes which features are highly related to each other.

Class A

Class A mostly represents games. Games that need to play a background sound (in looping mode or not) and occasionally need to play one-shot audio effects fit in this category.

Class B

Class B represents games with advanced audio. Most apps that fit in this category use advanced audio engines as their audio backend.

Class C

Class C represents the media players.