AirTunes 2 protocol
===================
.. contents::
   :depth: 4
Introduction
------------
TODO
In the examples below, placeholders for values are enclosed in curly braces
(``{}``). The braces are not part of the value and must be removed when the
placeholder is replaced.
Credits
-------
* Apple Inc.
* Rogue Amoeba Software, LLC
Streaming audio to an AirTunes 2 compatible server
--------------------------------------------------
If encryption is required, generate a random 16-byte AES key and a random
16-byte IV (initialization vector).

Every stream has a timestamp (uint64; initially set to ``INITIAL_TIMESTAMP``,
see Constants_) and a sequence number (uint16; initially set to 0) attached to
it. Both are updated whenever an audio packet is sent. There are
``TIMESTAMPS_PER_SECOND`` timestamp ticks per second, i.e. one tick per audio
frame.

Up to ``PACKET_BACKLOG`` audio packets should be kept after encoding and
encryption so that they can be resent on request. After sending an audio
packet, the sender should check whether a sync packet must also be sent
(roughly every ``TIMESYNC_INTERVAL`` frames, and right after connecting).
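
The bookkeeping described above can be sketched in C. This is a minimal sketch
under stated assumptions, not taken from any reference implementation:
``struct stream_state`` and the ``stream_*`` functions are hypothetical names,
the sequence number starts at 0 as stated above, and "every
``TIMESYNC_INTERVAL`` frames" is interpreted as measured from the timestamp of
the previous sync packet::

    #include <stdbool.h>
    #include <stdint.h>

    /* Values from the Constants table. */
    #define FRAMES_PER_PACKET 352
    #define TIMESYNC_INTERVAL 44100
    #define INITIAL_TIMESTAMP 0x10000000u

    /* Hypothetical per-stream state. */
    struct stream_state {
        uint64_t timestamp;  /* in frames (TIMESTAMPS_PER_SECOND per second) */
        uint16_t seqnum;     /* wraps around after 65535 */
        uint64_t last_sync;  /* timestamp at which the last sync was sent */
        bool sync_sent;      /* false until the first sync after connecting */
    };

    void stream_init(struct stream_state *s) {
        s->timestamp = INITIAL_TIMESTAMP;
        s->seqnum = 0;
        s->last_sync = INITIAL_TIMESTAMP;
        s->sync_sent = false;
    }

    /* Call after an audio packet carrying `frames` frames has been sent. */
    void stream_advance(struct stream_state *s, uint32_t frames) {
        s->seqnum++;  /* uint16_t: 65535 + 1 wraps to 0 */
        s->timestamp += frames;
    }

    /* True right after connecting and once TIMESYNC_INTERVAL frames
     * have passed since the previous sync packet. */
    bool stream_needs_sync(const struct stream_state *s) {
        return !s->sync_sent || s->timestamp - s->last_sync >= TIMESYNC_INTERVAL;
    }

    /* Call after a sync packet has been sent. */
    void stream_mark_synced(struct stream_state *s) {
        s->sync_sent = true;
        s->last_sync = s->timestamp;
    }

The sender would call ``stream_needs_sync()`` after each audio packet and, if
it returns true, send a sync packet and then call ``stream_mark_synced()``.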
Connect
~~~~~~~
#. Establish a TCP connection to the RTSP port

   - IP address(es) from resolving the Zeroconf service

#. Send RTSP ``OPTIONS`` request
#. Send RTSP ``ANNOUNCE`` request

   - Use password authentication based on the Zeroconf TXT record (e.g. the
     ``pw`` field, see `Zeroconf TXT record`_) or after receiving status code
     401 (``401 Unauthorized``)

#. Send RTSP ``SETUP`` request
#. Set the sequence number of the connection to a random value between 0 and
   8192, and set the timestamp and normal play time to 0
#. Send RTSP ``RECORD`` request
#. Send initial volume (see `Setting volume`_)
#. Prepare RTP connection for audio packets
Disconnect
~~~~~~~~~~
#. Stop sending audio data
#. Close RTSP connection
Preferred TCP/UDP ports
-----------------------
=========== ====
Connection  Port
=========== ====
RTSP        5000
Audio data  6000
RTP control 6001
Timing      6002
=========== ====
Payload types
-------------
=============== ============
Packet          Payload type
=============== ============
Timing request  0x52
Timing response 0x53
Sync            0x54
Range resend    0x55
Audio data      0x60
=============== ============
Data types
----------
When transferred over the network, multi-byte values must be converted to
network byte order (big endian). The packet structures must not contain any
padding or alignment.
RtpHeader
~~~~~~~~~
::

    #include <stdint.h>

    /* RTP header bits */
    enum {
        RTP_HEADER_A_EXTENSION    = 0x10,
        RTP_HEADER_A_SOURCE       = 0x0f,
        RTP_HEADER_B_PAYLOAD_TYPE = 0x7f,
        RTP_HEADER_B_MARKER       = 0x80
    };

    /* sizeof(RtpHeader) == 4 */
    typedef struct RtpHeader {
        uint8_t a;
        uint8_t b;
        uint16_t seqnum;
        /* extension    = bool(a & RTP_HEADER_A_EXTENSION) */
        /* source       = a & RTP_HEADER_A_SOURCE */
        /* payload_type = b & RTP_HEADER_B_PAYLOAD_TYPE */
        /* marker       = bool(b & RTP_HEADER_B_MARKER) */
    } RtpHeader;
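
To illustrate the bit masks and network byte order, the following hypothetical
helpers (the names are not from any existing library) write and parse the
4-byte header directly as bytes. The first byte is passed through unchanged;
the audio packet header example uses ``0x80`` there, which in standard RTP
corresponds to version 2::

    #include <stdbool.h>
    #include <stdint.h>

    /* Masks from the RtpHeader definition above. */
    #define RTP_HEADER_A_EXTENSION    0x10
    #define RTP_HEADER_A_SOURCE       0x0f
    #define RTP_HEADER_B_PAYLOAD_TYPE 0x7f
    #define RTP_HEADER_B_MARKER       0x80

    /* Write byte a as given, byte b from payload type and marker, then
     * the sequence number in network byte order. */
    void rtp_header_write(uint8_t out[4], uint8_t a, uint8_t payload_type,
                          bool marker, uint16_t seqnum) {
        out[0] = a;
        out[1] = (uint8_t)((payload_type & RTP_HEADER_B_PAYLOAD_TYPE) |
                           (marker ? RTP_HEADER_B_MARKER : 0));
        out[2] = (uint8_t)(seqnum >> 8);  /* most significant byte first */
        out[3] = (uint8_t)(seqnum & 0xff);
    }

    bool rtp_extension(const uint8_t in[4]) {
        return (in[0] & RTP_HEADER_A_EXTENSION) != 0;
    }

    uint8_t rtp_source(const uint8_t in[4]) {
        return (uint8_t)(in[0] & RTP_HEADER_A_SOURCE);
    }

    uint8_t rtp_payload_type(const uint8_t in[4]) {
        return (uint8_t)(in[1] & RTP_HEADER_B_PAYLOAD_TYPE);
    }

    bool rtp_marker(const uint8_t in[4]) {
        return (in[1] & RTP_HEADER_B_MARKER) != 0;
    }

    uint16_t rtp_seqnum(const uint8_t in[4]) {
        return (uint16_t)((in[2] << 8) | in[3]);
    }

For example, a sync header with the marker bit set (payload type ``0x54``)
has ``0xd4`` as its second byte.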
RtpTime
~~~~~~~
::

    /* sizeof(RtpTime) == 8 */
    typedef struct RtpTime {
        /* Seconds since 1900-01-01 00:00:00 (TODO: Timezone?) */
        uint32_t integer;
        /* Fraction of a second in units of 2^-32 s (0 to 2^32 - 1) */
        uint32_t fraction;
    } RtpTime;
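
A minimal sketch of converting Unix time to an ``RtpTime``. It assumes that the
1900-based seconds follow the NTP convention (same epoch as NTP, UTC); this is
an assumption of the sketch and does not resolve the timezone TODO above.
``rtp_time_from_unix`` is a hypothetical name::

    #include <stdint.h>

    /* Seconds from 1900-01-01 to 1970-01-01 (the NTP-to-Unix epoch offset). */
    #define SECONDS_1900_TO_1970 2208988800u

    typedef struct RtpTime {
        uint32_t integer;   /* seconds since 1900-01-01 */
        uint32_t fraction;  /* units of 2^-32 seconds */
    } RtpTime;

    /* Convert Unix time (seconds and nanoseconds since 1970-01-01 UTC). */
    RtpTime rtp_time_from_unix(uint64_t sec, uint32_t nsec) {
        RtpTime t;
        /* Truncated to 32 bits: the 1900-based counter wraps in 2036. */
        t.integer = (uint32_t)(sec + SECONDS_1900_TO_1970);
        /* nsec / 10^9 of a second, scaled to 2^32 units (nsec < 10^9). */
        t.fraction = (uint32_t)(((uint64_t)nsec << 32) / 1000000000u);
        return t;
    }

Half a second, for instance, becomes a ``fraction`` of ``2^31``.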
TimingPacket
~~~~~~~~~~~~
::

    /* sizeof(TimingPacket) == 32 */
    typedef struct TimingPacket {
        RtpHeader header;
        uint32_t timestamp;  /* 4 bytes: 4 + 4 + 3 * 8 == 32 */
        RtpTime reference_time;
        RtpTime received_time;
        RtpTime send_time;
    } TimingPacket;
SyncPacket
~~~~~~~~~~
::

    /* sizeof(SyncPacket) == 20 */
    typedef struct SyncPacket {
        RtpHeader header;
        uint32_t timestamp;
        RtpTime some_time;
        uint32_t next_timestamp;
    } SyncPacket;
ResendPacket
~~~~~~~~~~~~
::

    /* sizeof(ResendPacket) == 8 */
    typedef struct ResendPacket {
        RtpHeader header;
        uint16_t missed_seqnum;
        uint16_t count;
    } ResendPacket;
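
A sketch of a resend buffer, assuming that ``missed_seqnum`` is the first
missing sequence number and ``count`` the number of consecutive packets
requested (the field names suggest this, but the document does not state it).
The ring buffer indexed by ``seqnum % PACKET_BACKLOG``, the 2048-byte maximum
packet size and all function names are assumptions of this sketch::

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define PACKET_BACKLOG 1000
    #define MAX_PACKET_SIZE 2048  /* assumed upper bound for one audio packet */

    struct backlog_entry {
        bool valid;
        uint16_t seqnum;
        size_t len;
        uint8_t data[MAX_PACKET_SIZE];
    };

    /* Ring buffer: packet `seqnum` lives in slot seqnum % PACKET_BACKLOG. */
    struct backlog {
        struct backlog_entry entries[PACKET_BACKLOG];
    };

    /* Store a sent packet; returns false if it is too large. */
    bool backlog_store(struct backlog *b, uint16_t seqnum,
                       const uint8_t *data, size_t len) {
        struct backlog_entry *e = &b->entries[seqnum % PACKET_BACKLOG];
        if (len > sizeof e->data)
            return false;
        memcpy(e->data, data, len);
        e->len = len;
        e->seqnum = seqnum;
        e->valid = true;
        return true;
    }

    /* Parse an 8-byte resend request (network byte order) and collect up
     * to `max` stored packets from the requested range. Returns how many
     * were found; packets already overwritten are skipped. */
    size_t resend_lookup(const struct backlog *b, const uint8_t req[8],
                         const struct backlog_entry **out, size_t max) {
        uint16_t missed = (uint16_t)((req[4] << 8) | req[5]);
        uint16_t count = (uint16_t)((req[6] << 8) | req[7]);
        size_t n = 0;
        for (uint32_t i = 0; i < count && n < max; i++) {
            uint16_t seq = (uint16_t)(missed + i);  /* wraps at 65536 */
            const struct backlog_entry *e = &b->entries[seq % PACKET_BACKLOG];
            if (e->valid && e->seqnum == seq)
                out[n++] = e;
        }
        return n;
    }

Slots are reused after ``PACKET_BACKLOG`` packets; comparing the stored
``seqnum`` ensures that an overwritten slot is never resent under the wrong
sequence number.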
Constants
---------
===================== ========================= ==============================================
Name                  Value                     Description
===================== ========================= ==============================================
FRAMES_PER_PACKET     352                       Audio frames per packet
SHORTS_PER_PACKET     2 * FRAMES_PER_PACKET     Shorts (16-bit samples) per packet
TIMESTAMPS_PER_SECOND 44100                     Timestamp ticks per second
TIMESYNC_INTERVAL     44100                     Frames between sync packets (once per second)
TIME_PER_PACKET       FRAMES_PER_PACKET / 44100 Seconds per packet (about 7.98 ms)
PACKET_BACKLOG        1000                      Packet resend buffer size (in packets)
INITIAL_TIMESTAMP     0x10000000                Initial stream timestamp
===================== ========================= ==============================================
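
As a worked example, the table values imply the following (plain arithmetic;
reading the backlog as a time window is an illustration, not a requirement
stated by the protocol):

.. math::

   \mathit{TIME\_PER\_PACKET} = \frac{352}{44100}\,\mathrm{s} \approx 7.98\,\mathrm{ms}

   \mathit{PACKET\_BACKLOG} \times \mathit{TIME\_PER\_PACKET} = 1000 \times \frac{352}{44100}\,\mathrm{s} \approx 7.98\,\mathrm{s}

   \frac{\mathit{TIMESYNC\_INTERVAL}}{\mathit{FRAMES\_PER\_PACKET}} = \frac{44100}{352} \approx 125.3\ \text{packets}

So the backlog covers roughly the last 8 seconds of audio, and a sync packet is
due about every 125 audio packets.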
RTSP
----
Common request headers
~~~~~~~~~~~~~~~~~~~~~~
.. _rtp-info:
================ =================================================
Client-Instance  64 random bytes in hex. Must be unique per
                 connection.
CSeq             Request sequence number. Either count it locally or
                 use the sequence number of the last response
                 increased by one.
RTP-Info         ``rtptime={RTP timestamp}``
Session          Server session ID (after ``SETUP``)
User-Agent       ``iTunes/{Version} (Windows; N;)``
                 (e.g. ``iTunes/7.6.2 (Windows; N;)``)
================ =================================================
Request URI
~~~~~~~~~~~
Unless specified otherwise, ``rtsp://{Local IP address}/{Client session ID}``
must be used as the request URI. The client session ID is a random unsigned
32-bit number (0 to 2^32-1) that is generated once per connection.
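
A minimal sketch of formatting the request line and the common headers in C.
``rtsp_format_request`` is a hypothetical helper; the ``RTSP/1.0`` version in
the request line is standard RTSP and assumed here, and the ``User-Agent``
value is the example from the table above. The caller is expected to append
method-specific headers, ``Content-Length`` if there is a body, and the empty
line that ends the header block. ``OPTIONS`` uses ``*`` as its URI and would
need a different request line::

    #include <stdint.h>
    #include <stdio.h>

    /* Format "METHOD rtsp://ip/session RTSP/1.0" plus common headers.
     * Returns the length written, or -1 if `buf` is too small. */
    int rtsp_format_request(char *buf, size_t size, const char *method,
                            const char *local_ip, uint32_t client_session_id,
                            unsigned cseq, const char *client_instance) {
        int n = snprintf(buf, size,
                         "%s rtsp://%s/%lu RTSP/1.0\r\n"
                         "CSeq: %u\r\n"
                         "User-Agent: iTunes/7.6.2 (Windows; N;)\r\n"
                         "Client-Instance: %s\r\n",
                         method, local_ip, (unsigned long)client_session_id,
                         cseq, client_instance);
        return (n < 0 || (size_t)n >= size) ? -1 : n;
    }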
ANNOUNCE
~~~~~~~~
======= ===========================================================
Headers ``Content-Type: application/sdp``
Body    | ``v=0\r\n``
        | ``o=iTunes {Client session ID} 0 IN IP4 {Local IP address}\r\n``
        | ``s=iTunes\r\n``
        | ``c=IN IP4 {Server IP address}\r\n``
        | ``t=0 0\r\n``
        | ``m=audio 0 RTP/AVP 96\r\n``
        | ``a=rtpmap:96 AppleLossless\r\n``
        | ``a=fmtp:96 {Frames per packet} 0 16 40 10 14 2 255 0 0 44100\r\n``
        | ``a=rsaaeskey:{RSA-encrypted AES key in base64 w/o padding}\r\n``
        | ``a=aesiv:{AES IV in base64 w/o padding}\r\n``
        | ``\r\n``
======= ===========================================================
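
A sketch that fills in the SDP template above with ``snprintf``. The key and
IV strings are passed in already base64-encoded without padding; producing
them is not shown. The ``0`` after the session ID is the SDP session version
field, which RFC 4566 requires to be a number. ``format_announce_sdp`` is a
hypothetical name::

    #include <stdint.h>
    #include <stdio.h>

    /* Returns the body length, or -1 if `buf` is too small. */
    int format_announce_sdp(char *buf, size_t size, uint32_t client_session_id,
                            const char *local_ip, const char *server_ip,
                            unsigned frames_per_packet,
                            const char *rsaaeskey_b64, const char *aesiv_b64) {
        int n = snprintf(buf, size,
                         "v=0\r\n"
                         "o=iTunes %lu 0 IN IP4 %s\r\n"
                         "s=iTunes\r\n"
                         "c=IN IP4 %s\r\n"
                         "t=0 0\r\n"
                         "m=audio 0 RTP/AVP 96\r\n"
                         "a=rtpmap:96 AppleLossless\r\n"
                         "a=fmtp:96 %u 0 16 40 10 14 2 255 0 0 44100\r\n"
                         "a=rsaaeskey:%s\r\n"
                         "a=aesiv:%s\r\n"
                         "\r\n",
                         (unsigned long)client_session_id, local_ip, server_ip,
                         frames_per_packet, rsaaeskey_b64, aesiv_b64);
        return (n < 0 || (size_t)n >= size) ? -1 : n;
    }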
FLUSH
~~~~~
======= =============================================
Headers ``RTP-Info: seq={Last RTP seqnum};rtptime=0``
======= =============================================
OPTIONS
~~~~~~~
======= ============================================================
URI     ``*``
Headers ``Apple-Challenge: {16 random bytes in base64 w/o padding}``
======= ============================================================
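
Several values in this document are "base64 w/o padding", i.e. standard base64
with the trailing ``=`` characters removed. A self-contained C sketch
(``base64_encode_nopad`` is a hypothetical name); the 16 random bytes of
``Apple-Challenge`` encode to 22 characters::

    #include <stddef.h>
    #include <stdint.h>

    static const char B64_ALPHABET[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /* Standard base64 alphabet, no '=' padding. `out` must hold at least
     * (4 * len + 2) / 3 + 1 bytes. Returns the number of characters
     * written, not counting the terminating NUL. */
    size_t base64_encode_nopad(const uint8_t *in, size_t len, char *out) {
        size_t o = 0, i = 0;
        for (; i + 2 < len; i += 3) {  /* full 3-byte groups */
            uint32_t v = ((uint32_t)in[i] << 16) | ((uint32_t)in[i + 1] << 8) | in[i + 2];
            out[o++] = B64_ALPHABET[(v >> 18) & 0x3f];
            out[o++] = B64_ALPHABET[(v >> 12) & 0x3f];
            out[o++] = B64_ALPHABET[(v >> 6) & 0x3f];
            out[o++] = B64_ALPHABET[v & 0x3f];
        }
        if (len - i == 1) {  /* 1 leftover byte -> 2 characters */
            uint32_t v = (uint32_t)in[i] << 16;
            out[o++] = B64_ALPHABET[(v >> 18) & 0x3f];
            out[o++] = B64_ALPHABET[(v >> 12) & 0x3f];
        } else if (len - i == 2) {  /* 2 leftover bytes -> 3 characters */
            uint32_t v = ((uint32_t)in[i] << 16) | ((uint32_t)in[i + 1] << 8);
            out[o++] = B64_ALPHABET[(v >> 18) & 0x3f];
            out[o++] = B64_ALPHABET[(v >> 12) & 0x3f];
            out[o++] = B64_ALPHABET[(v >> 6) & 0x3f];
        }
        out[o] = '\0';
        return o;
    }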
RECORD
~~~~~~
======= ===========================================
Headers | ``Range: npt={Note 1}``
        | ``RTP-Info: seq={Note 2};rtptime={Note 3}``
======= ===========================================

Note 1: Normal play time (apparently always 0), float, >= 0. (TODO)

Note 2: Apparently a random number between 0 and 8192. (TODO)

Note 3: Apparently always 0. (TODO)
SET_PARAMETER
~~~~~~~~~~~~~
Setting volume
``````````````
======= =================================
Headers ``Content-Type: text/parameters``
Body    ``volume: %f``
======= =================================

The volume is either -144.0 (muted) or a value from -30.0 to 0.0 (inclusive).
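
A minimal sketch of producing the request body for a given volume in dB.
``format_volume_body`` is a hypothetical helper, and mapping every value below
-30.0 to the mute value -144.0 is an assumption of this sketch, not documented
behavior::

    #include <stdio.h>

    /* Returns the body length, or -1 if `buf` is too small. */
    int format_volume_body(char *buf, size_t size, double volume) {
        if (volume < -30.0)
            volume = -144.0;  /* assumption: anything below -30 dB mutes */
        else if (volume > 0.0)
            volume = 0.0;     /* 0.0 is the maximum */
        int n = snprintf(buf, size, "volume: %f", volume);
        return (n < 0 || (size_t)n >= size) ? -1 : n;
    }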
Set progress
````````````
======= =================================
Headers ``Content-Type: text/parameters``
Body    ``progress: %u/%u/%u``
======= =================================

The values are RTP timestamps, formatted as unsigned integers (TODO).
Set DAAP metadata
`````````````````
======= ============================================
Headers | ``Content-Type: application/x-dmap-tagged``
        | RTP-Info_
Body    DAAP metadata
======= ============================================
SETUP
~~~~~
======= ====================================================
Headers ``Transport: RTP/AVP/UDP;unicast;interleaved=0-1;mode=record;control_port={Control port};timing_port={Timing port}``
======= ====================================================
Read ``server_port``, ``control_port`` and ``timing_port`` from the
``Transport`` response header, and use the ``Session`` response header as the
server session ID.
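
A minimal sketch of extracting the port numbers from the ``Transport``
response header value. ``transport_get_port`` is a hypothetical helper; it only
matches ``name=`` at the start of a ``;``-separated parameter, so looking up
``port`` does not accidentally match ``server_port``::

    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    /* Find `name=<number>` among the parameters of `transport` and store
     * the number in *port. Returns false if absent or not a valid port. */
    bool transport_get_port(const char *transport, const char *name, int *port) {
        size_t name_len = strlen(name);
        const char *p = transport;
        while (p != NULL && *p != '\0') {
            if (strncmp(p, name, name_len) == 0 && p[name_len] == '=') {
                char *end;
                long v = strtol(p + name_len + 1, &end, 10);
                if (end == p + name_len + 1 || v < 0 || v > 65535)
                    return false;
                *port = (int)v;
                return true;
            }
            p = strchr(p, ';');
            if (p != NULL)
                p++;  /* skip the ';' */
        }
        return false;
    }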
TEARDOWN
~~~~~~~~
No special headers or body are required.
Rogue Amoeba extensions
~~~~~~~~~~~~~~~~~~~~~~~
X_RA_SET_ALBUM_ART
``````````````````
Use this only if the server wants PList metadata. If the server requests DAAP
metadata, use the ``SET_PARAMETER`` method instead.

======= ========================================
Headers | ``Content-Type: {Image content type}``
        | RTP-Info_
Body    Image data
======= ========================================
X_RA_SET_PLIST_METADATA
```````````````````````
======= ===================================
Headers | ``Content-Type: application/xml``
        | RTP-Info_
Body    Metadata in PList format
======= ===================================
Authentication
~~~~~~~~~~~~~~
AirTunes 2 uses the HTTP Digest authentication method as described in
RFC 2617.
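
For reference, RFC 2617 computes the ``response`` parameter as follows when
the server's challenge contains no ``qop`` directive. Each MD5 value is
written as 32 lowercase hex digits; which username and URI an AirTunes 2
server expects is not covered by this document, so they remain inputs here:

.. math::

   \mathit{HA1} = \mathrm{MD5}(\mathit{username} : \mathit{realm} : \mathit{password})

   \mathit{HA2} = \mathrm{MD5}(\mathit{method} : \mathit{uri})

   \mathit{response} = \mathrm{MD5}(\mathit{HA1} : \mathit{nonce} : \mathit{HA2})

Here ``:`` is a literal colon between the concatenated strings, and ``realm``
and ``nonce`` come from the ``WWW-Authenticate`` header of the ``401``
response.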
Detect speaker type
~~~~~~~~~~~~~~~~~~~
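If the ``Audio-Jack-Status`` header is present in the response:

::

    #include <string.h>

    enum speaker_type {
        SPEAKER_UNKNOWN, SPEAKER_UNPLUGGED, SPEAKER_ANALOG, SPEAKER_DIGITAL
    };

    /* audio_jack_status: value of the Audio-Jack-Status response header.
     * "disconnected" is tested first because it contains "connected". */
    enum speaker_type speaker_type(const char *audio_jack_status) {
        if (strstr(audio_jack_status, "disconnected") != NULL)
            return SPEAKER_UNPLUGGED;
        if (strstr(audio_jack_status, "connected") != NULL) {
            if (strstr(audio_jack_status, "digital") != NULL)
                return SPEAKER_DIGITAL;
            return SPEAKER_ANALOG;
        }
        return SPEAKER_UNKNOWN;
    }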
Detect metadata and audio latency
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the ``Apple-Response``, ``Server`` or ``Audio-Latency`` header is present in the response:
::
if (Apple-Response in response) {
lowercase_password = False;
audio_format = EncryptedALAC;
wants_album_art = False;
wants_metadata = False;
wants_progress = False;
has_bad_latency_header = False;
}
if (Server in response) {
lowercase_password = True;
has_bad_latency_header = True;
if (not Apple-Response in response) {
audio_format = UnencryptedALAC;
wants_album_art = DAAP;
wants_metadata = DAAP;
wants_progress = True;
}
}
if (Audio-Latency in response) {
if (not has_bad_latency_header) {
audio_latency = Audio-Latency;
} else {
if (Audio-Latency == 322 or
Audio-Latency == 15049) {
audio_latency = 11025;
}
/* Why always 11025? */
audio_latency = 11025;
}
}
Timing
------
Replying to timing packet
~~~~~~~~~~~~~~~~~~~~~~~~~
::

    on_timing_packet(TimingPacket req) {
        RtpTime received = time_now();  /* taken as early as possible */
        assert(req.header.payload_type == PAYLOAD_TIMING_REQUEST);
        TimingPacket res;
        res.header = req.header;
        res.header.payload_type = PAYLOAD_TIMING_RESPONSE;
        res.reference_time = req.send_time;
        res.received_time = received;
        res.send_time = time_now();     /* taken just before sending */
        send(res);
    }
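
The same reply at the byte level, as a C sketch. It assumes the 32-byte layout
implied by ``sizeof(TimingPacket) == 32``: the 4-byte header, a 4-byte
``timestamp`` field, then ``reference_time``, ``received_time`` and
``send_time`` at byte offsets 8, 16 and 24. ``build_timing_response`` is a
hypothetical name, and the caller supplies both times packed as
``integer << 32 | fraction``::

    #include <stdint.h>
    #include <string.h>

    #define PAYLOAD_TIMING_RESPONSE 0x53

    /* Write a packed 64-bit RtpTime in network byte order. */
    static void put_rtp_time(uint8_t *p, uint64_t t) {
        for (int i = 7; i >= 0; i--) {
            p[i] = (uint8_t)(t & 0xff);
            t >>= 8;
        }
    }

    void build_timing_response(const uint8_t req[32], uint8_t res[32],
                               uint64_t received, uint64_t sent) {
        memcpy(res, req, 8);  /* header and 4-byte timestamp field */
        /* Keep the marker bit, replace the payload type (0xd2 -> 0xd3). */
        res[1] = (uint8_t)((req[1] & 0x80) | PAYLOAD_TIMING_RESPONSE);
        memcpy(res + 8, req + 24, 8);  /* reference_time = req.send_time */
        put_rtp_time(res + 16, received);
        put_rtp_time(res + 24, sent);
    }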
Sync
----
Sync packets are sent once per second or when adding a speaker.
TODO: More details such as timing adjustments.
Sending sync packet
~~~~~~~~~~~~~~~~~~~
::

    send_sync(uint32_t next_timestamp, bool first) {
        SyncPacket packet;
        packet.header.payload_type = PAYLOAD_SYNC;
        packet.header.marker = True;
        packet.header.seqnum = 7; /* Why fixed? */
        packet.header.extension = first;  /* set only on the first sync */
        packet.timestamp = /* TODO */;
        packet.some_time = /* TODO */;
        packet.next_timestamp = next_timestamp;
        send(packet);
    }
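
A byte-level C sketch of the 20-byte sync packet, following the ``SyncPacket``
layout. The first byte ``0x80`` is borrowed from the audio packet header and is
an assumption here; the extension bit (``0x10``) is added for the first sync
packet, and the marker bit plus payload type ``0x54`` give ``0xd4`` in the
second byte. The values of the two TODO fields are left to the caller, and
``build_sync_packet`` is a hypothetical name::

    #include <stdbool.h>
    #include <stdint.h>

    #define PAYLOAD_SYNC 0x54

    /* Store a 32-bit value in network byte order. */
    static void put_u32(uint8_t *p, uint32_t v) {
        p[0] = (uint8_t)(v >> 24);
        p[1] = (uint8_t)(v >> 16);
        p[2] = (uint8_t)(v >> 8);
        p[3] = (uint8_t)v;
    }

    void build_sync_packet(uint8_t out[20], bool first, uint32_t timestamp,
                           uint32_t some_time_integer, uint32_t some_time_fraction,
                           uint32_t next_timestamp) {
        out[0] = (uint8_t)(0x80 | (first ? 0x10 : 0x00)); /* assumed base, extension bit */
        out[1] = (uint8_t)(0x80 | PAYLOAD_SYNC);           /* marker + payload type */
        out[2] = 0x00;                                     /* seqnum fixed at 7 */
        out[3] = 0x07;
        put_u32(out + 4, timestamp);
        put_u32(out + 8, some_time_integer);
        put_u32(out + 12, some_time_fraction);
        put_u32(out + 16, next_timestamp);
    }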
Audio
-----
Audio packet
~~~~~~~~~~~~
Header::

    /* The first 4 bytes are an RtpHeader */
    { 0x80, 0x60, 0x00, 0x00,   /* RtpHeader; seqnum in bytes 2-3 */
      0x00, 0x00, 0x00, 0x00,   /* timestamp (bytes 4-7) */
      0x62, 0x74, 0x05, 0xb9 }  /* bytes 8-11, sent unchanged */
Audio codec
~~~~~~~~~~~
=========== =====================
Codec       Apple Lossless (ALAC)
Sample size 16 bit
Channels    2
Sample rate 44100 Hz
=========== =====================
Packetizing audio
~~~~~~~~~~~~~~~~~
#. Collect ``FRAMES_PER_PACKET`` frames from the input data (each frame is
   4 bytes: one 16-bit sample per channel)
#. Encode the input frames using the ALAC codec (ALAC formats only)
#. Encode the packet data according to the audio format:

   - Raw L16

     #. Convert the raw input data to big endian (it is an array of uint16)
     #. Copy the audio header and the converted audio data into one buffer
     #. Set the second byte of the buffer (the payload type byte) to ``0x0a``

   - Unencrypted ALAC

     #. Copy the audio header to the buffer
     #. Append the ALAC-encoded audio data to the buffer

   - Encrypted ALAC

     #. Encrypt the ALAC-encoded audio data (only complete 16-byte blocks; the
        remaining bytes stay unencrypted)
     #. Copy the audio header to the buffer
     #. Append the encrypted audio data to the buffer

#. Set bytes 2-3 (counting from 0) to the sequence number in big endian
#. Set bytes 4-7 to the timestamp in big endian
#. Increase the sequence number by one for the next packet
#. Increase the timestamp by the number of frames in this packet
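
The header-related steps above can be sketched in C as follows.
``write_audio_header`` and ``encrypted_length`` are hypothetical helpers; only
the low 32 bits of the (uint64) stream timestamp fit into bytes 4-7, so the
caller is assumed to pass it truncated. The AES encryption itself is not
shown::

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Header template from the "Audio packet" section. */
    static const uint8_t AUDIO_HEADER[12] = {
        0x80, 0x60, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00,
        0x62, 0x74, 0x05, 0xb9
    };

    /* Copy the template and fill in the big-endian sequence number
     * (bytes 2-3) and timestamp (bytes 4-7). */
    void write_audio_header(uint8_t buf[12], uint16_t seqnum, uint32_t timestamp) {
        memcpy(buf, AUDIO_HEADER, sizeof AUDIO_HEADER);
        buf[2] = (uint8_t)(seqnum >> 8);
        buf[3] = (uint8_t)seqnum;
        buf[4] = (uint8_t)(timestamp >> 24);
        buf[5] = (uint8_t)(timestamp >> 16);
        buf[6] = (uint8_t)(timestamp >> 8);
        buf[7] = (uint8_t)timestamp;
    }

    /* Number of leading payload bytes that get encrypted: only complete
     * 16-byte blocks; the last len % 16 bytes are sent unencrypted. */
    size_t encrypted_length(size_t len) {
        return len - (len % 16);
    }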
Metadata
--------
DAAP metadata
~~~~~~~~~~~~~
=============== =============================
Content-Type    ``application/x-dmap-tagged``
Item name field ``dmap.itemname``
Artist field    ``daap.songartist``
Album field     ``daap.songalbum``
=============== =============================
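
A minimal sketch of encoding one DMAP-tagged string item in C. DMAP items are
a four-character content code, a 4-byte big-endian length and the value; the
codes ``minm``, ``asar`` and ``asal`` are the usual DMAP codes for
``dmap.itemname``, ``daap.songartist`` and ``daap.songalbum``.
``dmap_append_string`` is a hypothetical helper, and any container item that
wraps these fields in the request body is not shown::

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Append code + length + value at `off`. Returns the new offset, or 0
     * if the item does not fit into `cap` bytes. */
    size_t dmap_append_string(uint8_t *buf, size_t cap, size_t off,
                              const char code[4], const char *value) {
        size_t len = strlen(value);
        if (off + 8 + len > cap)
            return 0;
        memcpy(buf + off, code, 4);
        buf[off + 4] = (uint8_t)(len >> 24);
        buf[off + 5] = (uint8_t)(len >> 16);
        buf[off + 6] = (uint8_t)(len >> 8);
        buf[off + 7] = (uint8_t)len;
        memcpy(buf + off + 8, value, len);  /* no NUL terminator */
        return off + 8 + len;
    }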
PList metadata
~~~~~~~~~~~~~~
=============== =============================
Content-Type    ``application/xml``
Title field     ``title``
Artist field    ``artist``
Album field     ``album``
=============== =============================
Zeroconf TXT record
-------------------
======= =======================================================
Field   Description
======= =======================================================
txtvers TXT record version (always ``1``)
pw      ``true`` if a password is required, ``false`` otherwise
sr      Audio sample rate (e.g. ``44100``)
ss      Audio sample size in bits (e.g. ``16``)
ch      Number of audio channels
tp      Transport protocol (``UDP`` [TODO: or ``TCP``?])
======= =======================================================
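
A sketch of reading TXT record fields, assuming the Zeroconf library delivers
the record as ``key=value`` strings (the exact API differs between Zeroconf
implementations). ``txt_get`` and ``txt_password_required`` are hypothetical
names::

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* Return the value of `key` among "key=value" entries, or NULL. */
    const char *txt_get(const char *const *entries, size_t count, const char *key) {
        size_t key_len = strlen(key);
        for (size_t i = 0; i < count; i++) {
            if (strncmp(entries[i], key, key_len) == 0 && entries[i][key_len] == '=')
                return entries[i] + key_len + 1;
        }
        return NULL;
    }

    /* True if the record says a password is required (pw=true). */
    bool txt_password_required(const char *const *entries, size_t count) {
        const char *pw = txt_get(entries, count, "pw");
        return pw != NULL && strcmp(pw, "true") == 0;
    }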
Rogue Amoeba extensions
~~~~~~~~~~~~~~~~~~~~~~~
============== =======================================
Field          Description
============== =======================================
rast           ``afs`` if Airfoil speaker
ramach         ``{Platform name}.{OS major version}``
raver          Library version
raAudioFormats ``ALAC`` or ``L16``
============== =======================================