AirTunes 2 protocol
===================

.. contents::
   :depth: 4

Introduction
------------

TODO

In the examples below, values to be replaced are enclosed in curly
braces (``{}``). The braces are not part of the value and must be
removed when the value is filled in.

Credits
-------

* `Apple Inc. <http://www.apple.com/>`_
* `Rogue Amoeba Software, LLC <http://www.rogueamoeba.com/>`_


Streaming audio to an AirTunes 2 compatible server
--------------------------------------------------

If the stream is encrypted, generate a random AES key and a random IV
(initialization vector), 16 bytes each.

Every stream has a timestamp (uint64; initially set to ``INITIAL_TIMESTAMP``,
see Constants_) and a sequence number (uint16; initially set to 0) attached to
it. Both are updated whenever an audio packet is sent.

There are ``TIMESTAMPS_PER_SECOND`` timestamp ticks per second (equivalent
to the number of frames per second).

Keep up to ``PACKET_BACKLOG`` of the most recently sent audio packets, already
encoded and encrypted, so they can be resent on request. After sending an audio
packet, check whether a sync packet is due as well (every
``TIMESYNC_INTERVAL`` frames and just after connecting).
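
A minimal sketch of the resend backlog and the sync check, assuming packets
are stored under their 16-bit sequence number and that the sender counts the
frames sent since the last sync packet. ``PacketBacklog`` and ``sync_due`` are
hypothetical names, not part of the protocol.

```python
from collections import OrderedDict

PACKET_BACKLOG = 1000
TIMESYNC_INTERVAL = 44100


class PacketBacklog:
    """Hypothetical resend buffer: most recent packets keyed by seqnum."""

    def __init__(self, size=PACKET_BACKLOG):
        self.size = size
        self.packets = OrderedDict()

    def add(self, seqnum, packet):
        self.packets[seqnum] = packet
        self.packets.move_to_end(seqnum)
        # Drop the oldest packets once the backlog is full.
        while len(self.packets) > self.size:
            self.packets.popitem(last=False)

    def get(self, seqnum):
        # Returns None if the packet has already been dropped.
        return self.packets.get(seqnum)


def sync_due(frames_since_sync, first_packet):
    """True if a sync packet should follow the audio packet just sent."""
    return first_packet or frames_since_sync >= TIMESYNC_INTERVAL
```

Keying by sequence number keeps lookups for resend requests cheap; since the
backlog is far smaller than the 65536 possible sequence numbers, wrap-around
does not cause collisions inside the window.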


Connect
~~~~~~~

#. Establish TCP connection to RTSP port

   - IP address(es) and port from Zeroconf service resolution

#. Send RTSP ``OPTIONS`` request
#. Send RTSP ``ANNOUNCE`` request

   - Use password authentication if the Zeroconf TXT record requires it
     (``pw`` field) or after receiving status code 401 (``401 Unauthorized``)

#. Send RTSP ``SETUP`` request
#. Set the connection's sequence number to a random value between 0 and 8192,
   and its timestamp and normal play time to 0
#. Send RTSP ``RECORD`` request
#. Send initial volume (see `Setting volume`_)
#. Prepare RTP connection for audio packets


Disconnect
~~~~~~~~~~

#. Stop sending audio data
#. Close RTSP connection


Preferred TCP/UDP ports
-----------------------

=========== ====
Connection  Port
=========== ====
RTSP        5000
Audio data  6000
RTP control 6001
Timing      6002
=========== ====


Payload types
-------------

=============== ====
Timing request  0x52
Timing response 0x53
Sync            0x54
Range resend    0x55
Audio data      0x60
=============== ====

Data types
----------

When transferred over the network, multi-byte values must be converted
to network byte order (big endian). The packet structures are packed:
no alignment padding is inserted between fields.

RtpHeader
~~~~~~~~~

::

  /* RTP header bits */
  RTP_HEADER_A_EXTENSION = 0x10;
  RTP_HEADER_A_SOURCE = 0x0f;

  RTP_HEADER_B_PAYLOAD_TYPE = 0x7f;
  RTP_HEADER_B_MARKER = 0x80;

  /* sizeof(RtpHeader) == 4 */
  struct RtpHeader {
    uint8_t a;
    uint8_t b;
    uint16_t seqnum;

    /* extension = bool(a & RTP_HEADER_A_EXTENSION) */
    /* source = a & RTP_HEADER_A_SOURCE */

    /* payload_type = b & RTP_HEADER_B_PAYLOAD_TYPE */
    /* marker = bool(b & RTP_HEADER_B_MARKER) */
  }
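
The header can be packed and unpacked with Python's ``struct`` module; the
``"!"`` prefix selects network byte order with no padding, matching the 4-byte
size above. This is a sketch: the ``0x80`` base value for ``a`` is an
assumption taken from the audio packet header bytes shown later (the top bits
are presumably the RTP version field), and the function names are
hypothetical.

```python
import struct

RTP_HEADER_A_EXTENSION = 0x10
RTP_HEADER_A_SOURCE = 0x0F
RTP_HEADER_B_PAYLOAD_TYPE = 0x7F
RTP_HEADER_B_MARKER = 0x80


def pack_rtp_header(payload_type, seqnum, marker=False, extension=False,
                    source=0, a_base=0x80):
    """Pack an RtpHeader; a_base=0x80 is an assumption from packet dumps."""
    a = a_base | (RTP_HEADER_A_EXTENSION if extension else 0)
    a |= source & RTP_HEADER_A_SOURCE
    b = payload_type & RTP_HEADER_B_PAYLOAD_TYPE
    b |= RTP_HEADER_B_MARKER if marker else 0
    # 1 + 1 + 2 bytes, big endian, no padding.
    return struct.pack("!BBH", a, b, seqnum & 0xFFFF)


def unpack_rtp_header(data):
    """Decode the first 4 bytes of a packet into the RtpHeader fields."""
    a, b, seqnum = struct.unpack_from("!BBH", data)
    return {
        "extension": bool(a & RTP_HEADER_A_EXTENSION),
        "source": a & RTP_HEADER_A_SOURCE,
        "payload_type": b & RTP_HEADER_B_PAYLOAD_TYPE,
        "marker": bool(b & RTP_HEADER_B_MARKER),
        "seqnum": seqnum,
    }
```

For example, the header of a first sync packet (payload type ``0x54``,
marker and extension set, seqnum 7) packs to ``90 d4 00 07``.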


RtpTime
~~~~~~~

::

  /* sizeof(RtpTime) == 8 */
  struct RtpTime {
    /* Seconds since 1900-01-01 00:00:00 UTC (the NTP epoch) */
    uint32_t integer;

    /* Fraction of a second, in units of 1/2^32 s (0..2^32-1) */
    uint32_t fraction;
  }
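
Since ``RtpTime`` counts from 1900 while Unix time counts from 1970, a
conversion only needs the fixed offset of 2208988800 seconds between the two
epochs. The sketch below (hypothetical helper names) converts a Unix
timestamp in float seconds, e.g. from ``time.time()``.

```python
import struct

# Seconds from 1900-01-01 (NTP epoch) to 1970-01-01 (Unix epoch).
NTP_UNIX_OFFSET = 2208988800


def unix_to_rtptime(unix_seconds):
    """Return (integer, fraction) for a non-negative Unix time in seconds."""
    integer = (int(unix_seconds) + NTP_UNIX_OFFSET) & 0xFFFFFFFF
    # Scale the sub-second part to units of 1/2^32 s.
    fraction = int((unix_seconds % 1) * 2**32) & 0xFFFFFFFF
    return integer, fraction


def pack_rtptime(unix_seconds):
    """Serialize an RtpTime as 8 bytes in network byte order."""
    return struct.pack("!II", *unix_to_rtptime(unix_seconds))
```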


TimingPacket
~~~~~~~~~~~~

::

  /* sizeof(TimingPacket) == 32 */
  struct TimingPacket {
    RtpHeader header;
    uint32_t timestamp;
    RtpTime reference_time;
    RtpTime received_time;
    RtpTime send_time;
  }


SyncPacket
~~~~~~~~~~

::

  /* sizeof(SyncPacket) == 20 */
  struct SyncPacket {
    RtpHeader header;
    uint32_t timestamp;
    RtpTime some_time;
    uint32_t next_timestamp;
  }


ResendPacket
~~~~~~~~~~~~

::

  /* sizeof(ResendPacket) == 8 */
  struct ResendPacket {
    RtpHeader header;
    uint16_t missed_seqnum;
    uint16_t count;
  }
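
A sketch of decoding a resend request, assuming it asks for ``count``
consecutive packets starting at ``missed_seqnum`` (the document does not spell
this out) and that sequence numbers wrap at 16 bits. ``parse_resend_request``
is a hypothetical name.

```python
import struct

PAYLOAD_RANGE_RESEND = 0x55


def parse_resend_request(data):
    """Return the sequence numbers requested by an 8-byte ResendPacket."""
    if len(data) < 8:
        raise ValueError("ResendPacket must be 8 bytes")
    _a, b, _seqnum, missed_seqnum, count = struct.unpack_from("!BBHHH", data)
    if (b & 0x7F) != PAYLOAD_RANGE_RESEND:
        raise ValueError("not a range resend packet")
    # Assumption: a contiguous range; sequence numbers wrap around at 2^16.
    return [(missed_seqnum + i) & 0xFFFF for i in range(count)]
```

Each returned sequence number can then be looked up in the packet backlog and
the stored packet sent again.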


Constants
---------

===================== ========================= ==========================
Name                  Value                     Description
===================== ========================= ==========================
FRAMES_PER_PACKET     352                       Audio frames per packet
SHORTS_PER_PACKET     2 * FRAMES_PER_PACKET     16-bit samples per packet (2 channels)
TIMESTAMPS_PER_SECOND 44100                     Timestamp ticks per second
TIMESYNC_INTERVAL     44100                     Frames between sync packets (once per second)
TIME_PER_PACKET       FRAMES_PER_PACKET / 44100 Seconds per packet (about 7.98 ms)
PACKET_BACKLOG        1000                      Packet resend buffer size
INITIAL_TIMESTAMP     0x10000000                Initial stream timestamp
===================== ========================= ==========================
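
The derived values in the table can be checked with a few lines of Python.
``BYTES_PER_PACKET`` is a hypothetical extra name for the size of the raw
16-bit PCM input consumed per packet.

```python
FRAMES_PER_PACKET = 352
SHORTS_PER_PACKET = 2 * FRAMES_PER_PACKET  # one 16-bit sample per channel
TIMESTAMPS_PER_SECOND = 44100
TIMESYNC_INTERVAL = 44100
TIME_PER_PACKET = FRAMES_PER_PACKET / TIMESTAMPS_PER_SECOND  # in seconds
PACKET_BACKLOG = 1000
INITIAL_TIMESTAMP = 0x10000000

# Hypothetical: raw input bytes per packet (2 bytes per 16-bit sample).
BYTES_PER_PACKET = SHORTS_PER_PACKET * 2
```

With these values a packet carries 1408 bytes of raw PCM and lasts about
7.98 ms, so roughly 125 packets are sent per second.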


RTSP
----

Common request headers
~~~~~~~~~~~~~~~~~~~~~~

.. _rtp-info:

================ =================================================
Client-Instance  | 64 random bits (8 bytes) in hex. Must be unique
                   per connection.
CSeq             | Request sequence number, incremented by one with
                   each request. It can be counted locally or taken
                   from the last response's ``CSeq`` plus one.
RTP-Info         ``rtptime={RTP timestamp}``
Session          Server session ID (after SETUP)
User-Agent       | ``iTunes/{Version} (Windows; N;)``
                   (e.g. ``iTunes/7.6.2 (Windows; N;)``)
================ =================================================


Request URI
~~~~~~~~~~~

Unless specified otherwise, ``rtsp://{Local IP address}/{Client session ID}``
must be used as the request URI. The client session ID is a random number
between 0 and 2^32 generated once per connection.
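
A minimal sketch of serializing a request with the common headers and the
request URI described above. ``build_rtsp_request`` is a hypothetical helper;
the fixed User-Agent version and the local IP address are illustrative
placeholders, and ``Content-Length`` is the standard RTSP header for request
bodies.

```python
import random


def build_rtsp_request(method, uri, cseq, client_instance,
                       session=None, headers=None, body=b""):
    """Serialize an RTSP request with the common AirTunes 2 headers."""
    lines = [
        f"{method} {uri} RTSP/1.0",
        f"CSeq: {cseq}",
        f"Client-Instance: {client_instance}",
        "User-Agent: iTunes/7.6.2 (Windows; N;)",  # example version
    ]
    if session is not None:
        lines.append(f"Session: {session}")  # only after SETUP
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # Header block ends with an empty line, followed by the optional body.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body


local_ip = "192.168.1.10"  # placeholder
client_session_id = random.randrange(2**32)  # once per connection
uri = f"rtsp://{local_ip}/{client_session_id}"
```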


ANNOUNCE
~~~~~~~~

======= ===========================================================
Headers | ``Content-Type: application/sdp``
Body    | ``v=0\r\n``
        | ``o=iTunes {Client session ID} 0 IN IP4 {Local IP address}\r\n``
        | ``s=iTunes\r\n``
        | ``c=IN IP4 {Server IP address}\r\n``
        | ``t=0 0\r\n``
        | ``m=audio 0 RTP/AVP 96\r\n``
        | ``a=rtpmap:96 AppleLossless\r\n``
        | ``a=fmtp:96 {Frames per packet} 0 16 40 10 14 2 255 0 0 44100\r\n``
        | ``a=rsaaeskey:{AES key in base64 w/o padding}\r\n``
        | ``a=aesiv:{AES IV in base64 w/o padding}\r\n``
        | ``\r\n``
======= ===========================================================
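
The SDP body can be assembled as below. Base64 "without padding" means
stripping the trailing ``=`` characters. This is a sketch with hypothetical
names: ``rsaaeskey`` is passed in as whatever bytes the server expects for
that field (the attribute name suggests an RSA-encrypted AES key; that step
is not covered here).

```python
import base64


def b64_nopad(data):
    """Base64-encode bytes and strip the trailing '=' padding."""
    return base64.b64encode(data).decode("ascii").rstrip("=")


def announce_body(client_session_id, local_ip, server_ip, rsaaeskey, aes_iv,
                  frames_per_packet=352):
    lines = [
        "v=0",
        # "0" is the SDP session version field.
        f"o=iTunes {client_session_id} 0 IN IP4 {local_ip}",
        "s=iTunes",
        f"c=IN IP4 {server_ip}",
        "t=0 0",
        "m=audio 0 RTP/AVP 96",
        "a=rtpmap:96 AppleLossless",
        f"a=fmtp:96 {frames_per_packet} 0 16 40 10 14 2 255 0 0 44100",
        f"a=rsaaeskey:{b64_nopad(rsaaeskey)}",
        f"a=aesiv:{b64_nopad(aes_iv)}",
    ]
    # Every line ends in CRLF, and the body ends with an empty line.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```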

FLUSH
~~~~~

======= =============================================
Headers ``RTP-Info: seq={Last RTP seqnum};rtptime=0``
======= =============================================

OPTIONS
~~~~~~~

======= ============================================================
URI     ``*``
Headers ``Apple-Challenge: {16 random bytes in base64 w/o padding}``
======= ============================================================
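
Generating the challenge value is a one-liner; ``apple_challenge`` is a
hypothetical helper name. Because 16 bytes always encode to 24 base64
characters ending in ``==``, the unpadded value is always 22 characters long.

```python
import base64
import os


def apple_challenge():
    """16 random bytes, base64-encoded without '=' padding."""
    return base64.b64encode(os.urandom(16)).decode("ascii").rstrip("=")
```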

RECORD
~~~~~~

======= =========================================
Headers | ``Range: npt={Note 1}``
        | ``RTP-Info: seq={Note 2};rtptime={Note 3}``
======= =========================================

Note 1: Normal play time (apparently always 0), float, >=0. (TODO)

Note 2: Apparently a random number between 0 and 8192. (TODO)

Note 3: Apparently always zero. (TODO)

SET_PARAMETER
~~~~~~~~~~~~~

Setting volume
``````````````

======= =================================
Headers ``Content-Type: text/parameters``
Body    ``volume: %f``
======= =================================

Volume is either -144.0 (muted) or (-30.0)..(0.0).
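
A sketch of producing the request body from a user-facing level. The
0.0..1.0 input and the linear mapping onto -30.0..0.0 are assumptions made
for illustration; the document only fixes the muted value and the range.
Python's ``:f`` format matches C's ``%f`` (six decimal places).

```python
VOLUME_MUTED = -144.0
VOLUME_MIN = -30.0
VOLUME_MAX = 0.0


def volume_body(level):
    """Return the SET_PARAMETER body for a hypothetical 0.0..1.0 level."""
    if level <= 0.0:
        value = VOLUME_MUTED
    else:
        # Assumption: linear mapping of (0.0, 1.0] onto (-30.0, 0.0].
        value = VOLUME_MIN + (VOLUME_MAX - VOLUME_MIN) * min(level, 1.0)
    return f"volume: {value:f}"
```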

Set progress
````````````

======= =================================
Headers ``Content-Type: text/parameters``
Body    ``progress: %u/%u/%u``
======= =================================

The values are RTP timestamps formatted as unsigned integers (TODO).

Set DAAP metadata
`````````````````

======= =================================
Headers | ``Content-Type: application/x-dmap-tagged``
        | RTP-Info_
Body    DAAP metadata
======= =================================

SETUP
~~~~~

======= ====================================================
Headers ``Transport: RTP/AVP/UDP;unicast;interleaved=0-1;mode=record;control_port={Control port};timing_port={Timing port}``
======= ====================================================

Get ``server_port``, ``control_port`` and ``timing_port`` from ``Transport``
response header. Get ``Session`` response header and use it as server session ID.
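
The ``Transport`` response header is a ``;``-separated list of flags and
``key=value`` pairs, so extracting the ports is a simple split. The helper
names below are hypothetical, and the header value in the usage line is made
up for illustration.

```python
def parse_transport(value):
    """Split a Transport header into a dict; bare flags map to True."""
    params = {}
    for part in value.split(";"):
        part = part.strip()
        if "=" in part:
            key, _, val = part.partition("=")
            params[key.strip()] = val.strip()
        elif part:
            params[part] = True
    return params


def setup_ports(transport_value):
    """Return (server_port, control_port, timing_port) as integers."""
    params = parse_transport(transport_value)
    return tuple(int(params[k])
                 for k in ("server_port", "control_port", "timing_port"))
```

Usage: ``setup_ports("RTP/AVP/UDP;unicast;mode=record;server_port=53561;control_port=63379;timing_port=50607")``
returns ``(53561, 63379, 50607)``.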

TEARDOWN
~~~~~~~~

No additional headers or body are required.

Rogue Amoeba extensions
~~~~~~~~~~~~~~~~~~~~~~~

X_RA_SET_ALBUM_ART
``````````````````

Use this only if the server wants plist metadata. If it requests DAAP
metadata, use the ``SET_PARAMETER`` method instead.

======= ========================================
Headers | ``Content-Type: {Image content type}``
        | RTP-Info_
Body    Image data
======= ========================================


X_RA_SET_PLIST_METADATA
```````````````````````

======= ===================================
Headers | ``Content-Type: application/xml``
        | RTP-Info_
Body    Metadata in PList format
======= ===================================


Authentication
~~~~~~~~~~~~~~

AirTunes 2 uses the HTTP Digest authentication method as described
in RFC2617.
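
A minimal sketch of the RFC 2617 Digest computation without the optional
``qop`` directive, assuming ``realm`` and ``nonce`` come from the server's
``WWW-Authenticate`` challenge in the 401 response. The username is not
specified in this document and is left as a parameter; the function names are
hypothetical.

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, nonce, method, uri):
    """RFC 2617 response value when no qop is used."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def authorization_header(username, realm, password, nonce, method, uri):
    """Value for the Authorization header of the retried request."""
    response = digest_response(username, realm, password, nonce, method, uri)
    return (f'Digest username="{username}", realm="{realm}", '
            f'nonce="{nonce}", uri="{uri}", response="{response}"')
```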


Detect speaker type
~~~~~~~~~~~~~~~~~~~

If the response contains an ``Audio-Jack-Status`` header, the speaker type is
determined as follows:

::

  speaker_type() {
    if ("disconnected" in Audio-Jack-Status) {
      return unplugged;
  
    } else if ("connected" in Audio-Jack-Status) {
      if ("digital" in Audio-Jack-Status) {
        return digital;
      }
  
      return analog;
    }
    
    return unknown;
  }
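
The same logic in Python, taking the header value as a string. Note that
``"disconnected"`` must be tested first because it contains the substring
``"connected"``. The function name and return strings are illustrative.

```python
def speaker_type(audio_jack_status):
    """Classify an Audio-Jack-Status header value."""
    # Check "disconnected" first: it also contains "connected".
    if "disconnected" in audio_jack_status:
        return "unplugged"
    if "connected" in audio_jack_status:
        return "digital" if "digital" in audio_jack_status else "analog"
    return "unknown"
```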


Detect metadata and audio latency
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the response contains an ``Apple-Response``, ``Server`` or ``Audio-Latency``
header:

::

  if (Apple-Response in response) {
    lowercase_password = False;
    audio_format = EncryptedALAC;
    wants_album_art = False;
    wants_metadata = False;
    wants_progress = False;
    has_bad_latency_header = False;
  }

  if (Server in response) {
    lowercase_password = True;
    has_bad_latency_header = True;

    if (not Apple-Response in response) {
      audio_format = UnencryptedALAC;
      wants_album_art = DAAP;
      wants_metadata = DAAP;
      wants_progress = True;
    }
  }
  
  if (Audio-Latency in response) {
    if (not has_bad_latency_header) {
      audio_latency = Audio-Latency;
    } else {
      /* Reported values (e.g. 322 or 15049) are ignored and 11025
         is used instead (TODO: why always 11025?) */
      audio_latency = 11025;
    }
  }
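
A Python sketch of the same decision logic, taking the response headers as a
dict. Settings that no header determines are simply left out of the result;
the function name and string values are hypothetical.

```python
def detect_server_features(headers):
    """Derive client settings from a dict of response headers."""
    features = {}
    has_bad_latency_header = False

    if "Apple-Response" in headers:
        features.update(lowercase_password=False,
                        audio_format="EncryptedALAC",
                        wants_album_art=False, wants_metadata=False,
                        wants_progress=False)

    if "Server" in headers:
        features["lowercase_password"] = True
        has_bad_latency_header = True
        if "Apple-Response" not in headers:
            features.update(audio_format="UnencryptedALAC",
                            wants_album_art="DAAP", wants_metadata="DAAP",
                            wants_progress=True)

    if "Audio-Latency" in headers:
        if has_bad_latency_header:
            features["audio_latency"] = 11025  # reported value ignored
        else:
            features["audio_latency"] = int(headers["Audio-Latency"])

    return features
```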


Timing
------

Replying to timing packet
~~~~~~~~~~~~~~~~~~~~~~~~~ 

::

  on_timing_packet(TimingPacket req) {
    assert req.header.payload_type == PAYLOAD_TIMING_REQUEST;

    TimingPacket res;
    res.header = req.header;
    res.header.payload_type = PAYLOAD_TIMING_RESPONSE;
    res.reference_time = req.send_time;
    res.received_time = time_now();
    res.send_time = time_now();

    send(res);
  }
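
A byte-level Python sketch of the reply, assuming the 32-byte layout implied
by ``sizeof(TimingPacket) == 32``: a 4-byte header, a 4-byte ``timestamp``
field (copied unchanged here, an assumption), then three 8-byte ``RtpTime``
values. ``ntp_now`` and ``timing_reply`` are hypothetical names; ``now`` is a
parameter so the clock can be replaced in tests.

```python
import struct
import time

# Seconds from the NTP epoch (1900) to the Unix epoch (1970).
NTP_UNIX_OFFSET = 2208988800
PAYLOAD_TIMING_REQUEST = 0x52
PAYLOAD_TIMING_RESPONSE = 0x53


def ntp_now():
    """Current time as an 8-byte RtpTime (seconds, fraction)."""
    t = time.time()
    seconds = (int(t) + NTP_UNIX_OFFSET) & 0xFFFFFFFF
    fraction = int((t % 1) * 2**32) & 0xFFFFFFFF
    return struct.pack("!II", seconds, fraction)


def timing_reply(request, now=ntp_now):
    """Build the 32-byte timing response for a 32-byte timing request."""
    if len(request) != 32 or (request[1] & 0x7F) != PAYLOAD_TIMING_REQUEST:
        raise ValueError("not a timing request")
    # Copy the header, swapping the payload type but keeping the marker bit.
    header = bytes([request[0], (request[1] & 0x80) | PAYLOAD_TIMING_RESPONSE])
    header += request[2:4]           # seqnum unchanged
    timestamp = request[4:8]         # copied unchanged (assumption)
    reference_time = request[24:32]  # the request's send_time
    received_time = now()
    send_time = now()
    return header + timestamp + reference_time + received_time + send_time
```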


Sync
----

Sync packets are sent once per second or when adding a speaker.

TODO: More details such as timing adjustments.

Sending sync packet
~~~~~~~~~~~~~~~~~~~

::
  
  send_sync(uint32_t timestamp, bool first) {
    SyncPacket packet;
    packet.header.payload_type = PAYLOAD_SYNC;
    packet.header.marker = True;
    packet.header.seqnum = 7; /* Why fixed? */

    if (first) {
      packet.header.extension = True;
    }

    packet.timestamp = /* TODO */;
    packet.some_time = /* TODO */;
    packet.next_timestamp = timestamp;

    send(packet);
  }
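
A byte-level sketch of building the 20-byte packet. Because the values of
``timestamp`` and ``some_time`` are still TODO above, the caller passes all
three values in; the ``0x80`` base for the first header byte is an assumption
taken from the audio header bytes, and ``sync_packet`` is a hypothetical name.

```python
import struct

PAYLOAD_SYNC = 0x54


def sync_packet(timestamp, some_time, next_timestamp, first):
    """Build a 20-byte SyncPacket.

    timestamp, next_timestamp: 32-bit RTP timestamps.
    some_time: an 8-byte RtpTime chosen by the caller.
    """
    a = 0x80 | (0x10 if first else 0x00)   # extension bit on the first sync
    b = 0x80 | PAYLOAD_SYNC                # marker bit always set
    header = struct.pack("!BBH", a, b, 7)  # seqnum fixed at 7
    return (header
            + struct.pack("!I", timestamp & 0xFFFFFFFF)
            + some_time
            + struct.pack("!I", next_timestamp & 0xFFFFFFFF))
```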


Audio
-----

Audio packet
~~~~~~~~~~~~

Header::

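  /* Bytes 0-3: RtpHeader (payload type 0x60; seqnum set per packet) */
  /* Bytes 4-7: timestamp (set per packet) */
  /* Bytes 8-11: constant (the RTP SSRC field) */
  { 0x80, 0x60, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00,
    0x62, 0x74, 0x05, 0xb9 }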


Audio codec
~~~~~~~~~~~

=============== =====================
Codec           Apple Lossless (ALAC)
Sample size     16 bits
Channels        2
Sample rate     44100 Hz
=============== =====================


Packetizing audio
~~~~~~~~~~~~~~~~~

#. Collect ``FRAMES_PER_PACKET`` frames from the input data (each frame is
   4 bytes: one 16-bit sample per channel)
#. For the ALAC formats, encode the input frames using the ALAC codec
#. Encode packet data

   - Raw L16

     #. Convert raw input data to big endian (it's an array of 16-bit samples)
     #. Copy audio header and converted audio data into one buffer
     #. Set the 2nd byte of the buffer (the payload type) to 0x0a

   - Unencrypted ALAC

     #. Copy audio header to buffer
     #. Append ALAC encoded audio data to buffer

   - Encrypted ALAC

     #. Encrypt the ALAC encoded audio data (only complete 16-byte blocks;
        the remaining bytes stay unencrypted)
     #. Copy audio header to buffer
     #. Append encrypted audio data to buffer

#. Set bytes 2 and 3 (counting from 0) to the sequence number in big endian
#. Set bytes 4 to 7 to the timestamp in big endian
#. Increase the sequence number by one (modulo 2^16) for the next packet
#. Increase timestamp by number of frames in this packet
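
The raw L16 path and the partial-block encryption rule can be sketched as
follows. This assumes native-endian signed 16-bit stereo PCM as input.
``packetize_l16`` and ``encrypt_full_blocks`` are hypothetical names, and the
AES function is supplied by the caller because the cipher mode is not
specified here.

```python
import struct
import sys
from array import array

AUDIO_HEADER = bytes([0x80, 0x60, 0x00, 0x00,
                      0x00, 0x00, 0x00, 0x00,
                      0x62, 0x74, 0x05, 0xB9])


def packetize_l16(pcm, seqnum, timestamp):
    """Return (packet, next_seqnum, next_timestamp) for raw L16 audio."""
    samples = array("h", pcm)           # native-endian 16-bit samples
    if sys.byteorder == "little":
        samples.byteswap()              # convert to big endian
    buf = bytearray(AUDIO_HEADER) + samples.tobytes()
    buf[1] = 0x0A                       # payload type for raw L16
    struct.pack_into("!H", buf, 2, seqnum & 0xFFFF)
    struct.pack_into("!I", buf, 4, timestamp & 0xFFFFFFFF)
    frames = len(samples) // 2          # two samples (L, R) per frame
    return bytes(buf), (seqnum + 1) & 0xFFFF, timestamp + frames


def encrypt_full_blocks(data, encrypt):
    """Encrypt only complete 16-byte blocks; the tail stays unencrypted.

    `encrypt` is a caller-supplied AES function (hypothetical here).
    """
    n = len(data) - len(data) % 16
    return encrypt(data[:n]) + data[n:]
```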


Metadata
--------

DAAP metadata
~~~~~~~~~~~~~

=============== =============================
Content-type    ``application/x-dmap-tagged``
Item name field ``dmap.itemname``
Artist field    ``daap.songartist``
Album field     ``daap.songalbum``
=============== =============================

PList metadata
~~~~~~~~~~~~~~

=============== =============================
Content-type    ``application/xml``
Title field     ``title``
Artist field    ``artist``
Album field     ``album``
=============== =============================


Zeroconf TXT record
-------------------

======= =======================================================
Field   Description
======= =======================================================
txtvers TXT record version (always ``1``)
pw      ``true`` if password required, ``false`` otherwise
sr      Audio sample rate
ss      Audio sample size in bits
ch      Number of audio channels
tp      Protocol (``UDP`` [TODO: or ``TCP``?])
======= =======================================================


Rogue Amoeba extensions
~~~~~~~~~~~~~~~~~~~~~~~

============== =======================================
Field          Description
============== =======================================
rast           ``afs`` if Airfoil speaker
ramach         ``{Platform name}.{OS major version}``
raver          Library version
raAudioFormats ``ALAC`` or ``L16``
============== =======================================