AirTunes 2 protocol
===================

.. contents::
   :depth: 4

Introduction
------------

TODO

In the examples below, values to be replaced are put into curly braces
("{}"). The braces are not part of the value and must be removed when the
value is filled in.

Credits
-------

* Apple Inc.
* Rogue Amoeba Software, LLC

Streaming audio to an AirTunes 2 compatible server
--------------------------------------------------

If encryption is necessary, generate a random AES key and IV (initialization
vector), 16 bytes each.

Every stream has a timestamp and a sequence number attached to it:

- The timestamp is a uint64, initially set to ``INITIAL_TIMESTAMP`` (see
  Constants_).
- The sequence number is a uint16, initially set to 0.

Both are updated when sending audio packets. There are
``TIMESTAMPS_PER_SECOND`` timestamp ticks per second, which equals the number
of frames per second.

Keep up to ``PACKET_BACKLOG`` audio packets after encoding and encryption, so
they can be resent if necessary.

After sending an audio packet, the sender should check whether a sync packet
is also due. Sync packets are due every ``TIMESYNC_INTERVAL`` frames (once per
second) and right after connecting.

Connect
~~~~~~~

#. Establish a TCP connection to the RTSP port

   - The IP address(es) and port come from resolving the Zeroconf service.
     The TXT record does not contain them.

#. Send RTSP ``OPTIONS`` request
#. Send RTSP ``ANNOUNCE`` request

   - Use password authentication in either of two cases:

     - the authentication type from the Zeroconf TXT record requires it
     - the server answers with status code 401 (``401 Unauthorized``)

#. Send RTSP ``SETUP`` request
#. Set the sequence number of the connection to a random value between 0 and
   8192. Set the timestamp and normal play time to 0 each.
#. Send RTSP ``RECORD`` request
#. Send the initial volume (see `Setting volume`_)
#. Prepare the RTP connection for audio packets

Disconnect
~~~~~~~~~~

#. Stop sending audio data
#. Close RTSP connection

Preferred TCP/UDP ports
-----------------------

=========== ====
Connection  Port
=========== ====
RTSP        5000
Audio data  6000
RTP control 6001
Timing      6002
=========== ====

Payload types
-------------

=============== ============
Packet type     Payload type
=============== ============
Timing request  0x52
Timing response 0x53
Sync            0x54
Range resend    0x55
Audio data      0x60
=============== ============

Data types
----------

Multi-byte values must be converted to network byte order (big endian) when
transferred over the network. The packet structures are packed, so there is
no alignment padding between fields.

RtpHeader
~~~~~~~~~

::

    /* RTP header bits */
    #define RTP_HEADER_A_EXTENSION    0x10
    #define RTP_HEADER_A_SOURCE       0x0f
    #define RTP_HEADER_B_PAYLOAD_TYPE 0x7f
    #define RTP_HEADER_B_MARKER       0x80

    /* sizeof(struct RtpHeader) == 4 */
    struct RtpHeader {
        uint8_t  a;
        uint8_t  b;
        uint16_t seqnum;

        /* extension    = bool(a & RTP_HEADER_A_EXTENSION) */
        /* source       = a & RTP_HEADER_A_SOURCE (RTP CSRC count) */
        /* payload_type = b & RTP_HEADER_B_PAYLOAD_TYPE */
        /* marker       = bool(b & RTP_HEADER_B_MARKER) */
    };

RtpTime
~~~~~~~

::

    /* sizeof(struct RtpTime) == 8; NTP timestamp format */
    struct RtpTime {
        /* Seconds since 1900-01-01 00:00:00 UTC (NTP epoch) */
        uint32_t integer;
        /* Fraction of a second in units of 1/2^32 s (0..2^32-1) */
        uint32_t fraction;
    };

TimingPacket
~~~~~~~~~~~~

::

    /* sizeof(struct TimingPacket) == 32 */
    struct TimingPacket {
        struct RtpHeader header;
        uint32_t         timestamp;
        struct RtpTime   reference_time;
        struct RtpTime   received_time;
        struct RtpTime   send_time;
    };

SyncPacket
~~~~~~~~~~

::

    /* sizeof(struct SyncPacket) == 20 */
    struct SyncPacket {
        struct RtpHeader header;
        uint32_t         timestamp;
        struct RtpTime   some_time;
        uint32_t         next_timestamp;
    };

ResendPacket
~~~~~~~~~~~~

::

    /* sizeof(struct ResendPacket) == 8 */
    struct ResendPacket {
        struct RtpHeader header;
        uint16_t         missed_seqnum;
        uint16_t         count;
    };

Constants
---------

===================== ========================= ==========================================
Name                  Value                     Description
===================== ========================= ==========================================
FRAMES_PER_PACKET     352                       Audio frames per packet
SHORTS_PER_PACKET     2 * FRAMES_PER_PACKET     16-bit samples per packet (2 channels)
TIMESTAMPS_PER_SECOND 44100                     Timestamps per second
TIMESYNC_INTERVAL     44100                     Frames between sync packets (once per second)
TIME_PER_PACKET       FRAMES_PER_PACKET / 44100 Packet duration in seconds (about 7.98 ms)
PACKET_BACKLOG        1000                      Packet resend buffer size
INITIAL_TIMESTAMP     0x10000000                Initial stream timestamp
===================== ========================= ==========================================

RTSP
----

Common request headers
~~~~~~~~~~~~~~~~~~~~~~

.. _rtp-info:

================ =================================================
Client-Instance  | 64 random bits as 16 hex digits.
                 | Must be unique per connection.
CSeq             | Request sequence number.
                 | Either count it locally or increase the sequence number of the last response by one.
RTP-Info         ``rtptime={RTP timestamp}``
Session          Server session ID (after ``SETUP``)
User-Agent       | ``iTunes/{Version} (Windows; N;)``
                 | (e.g. ``iTunes/7.6.2 (Windows; N;)``)
================ =================================================

Request URI
~~~~~~~~~~~

Unless specified otherwise, use ``rtsp://{Local IP address}/{Client session ID}``
as the request URI. The client session ID is a random unsigned 32-bit number
(0 to 2^32 - 1), generated once per connection.
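The request URI and the common request headers above can be sketched in C. This is a minimal illustration, not a reference implementation, and it makes the following assumptions:

- The helper names ``make_client_session_id``, ``build_request_uri`` and ``build_request_head`` are hypothetical.
- ``rand()`` only stands in for a proper random source.
- The ``{Method} {URI} RTSP/1.0`` request line is the generic RTSP request format.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: draws the 32-bit client session ID once per
 * connection. rand() is a placeholder; only 8 bits are taken per call
 * because RAND_MAX is only guaranteed to be at least 32767. */
uint32_t make_client_session_id(void)
{
    uint32_t id = 0;
    for (int i = 0; i < 4; i++)
        id = (id << 8) | (uint32_t)(rand() & 0xff);
    return id;
}

/* Hypothetical helper: formats rtsp://{Local IP address}/{Client session ID}.
 * Returns what snprintf returns (the untruncated length). */
int build_request_uri(char *buf, size_t len, const char *local_ip,
                      uint32_t client_session_id)
{
    return snprintf(buf, len, "rtsp://%s/%lu", local_ip,
                    (unsigned long)client_session_id);
}

/* Hypothetical helper: writes the request line and the common headers.
 * Request-specific headers and the empty line that ends the header
 * block are left to the caller. */
int build_request_head(char *buf, size_t len, const char *method,
                       const char *uri, unsigned cseq,
                       const char *client_instance)
{
    return snprintf(buf, len,
                    "%s %s RTSP/1.0\r\n"
                    "CSeq: %u\r\n"
                    "Client-Instance: %s\r\n"
                    "User-Agent: iTunes/7.6.2 (Windows; N;)\r\n",
                    method, uri, cseq, client_instance);
}
```

A client would call ``make_client_session_id()`` once per connection and reuse the result for every request URI. The ``OPTIONS`` request passes ``*`` as the URI instead.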
ANNOUNCE
~~~~~~~~

======= ===========================================================
Headers | ``Content-Type: application/sdp``
Body    | ``v=0\r\n``
        | ``o=iTunes {Client session ID} 0 IN IP4 {Local IP address}\r\n``
        | ``s=iTunes\r\n``
        | ``c=IN IP4 {Server IP address}\r\n``
        | ``t=0 0\r\n``
        | ``m=audio 0 RTP/AVP 96\r\n``
        | ``a=rtpmap:96 AppleLossless\r\n``
        | ``a=fmtp:96 {Frames per packet} 0 16 40 10 14 2 255 0 0 44100\r\n``
        | ``a=rsaaeskey:{RSA-encrypted AES key in base64 w/o padding}\r\n``
        | ``a=aesiv:{AES IV in base64 w/o padding}\r\n``
        | ``\r\n``
======= ===========================================================

The ``rsaaeskey`` and ``aesiv`` lines are only needed for encrypted streams.
Before base64 encoding, the AES key is encrypted with the server's RSA public
key (OAEP padding).

FLUSH
~~~~~

======= =============================================
Headers ``RTP-Info: seq={Last RTP seqnum};rtptime=0``
======= =============================================

OPTIONS
~~~~~~~

======= ============================================================
URI     ``*``
Headers ``Apple-Challenge: {16 random bytes in base64 w/o padding}``
======= ============================================================

RECORD
~~~~~~

======= =========================================
Headers | ``Range: ntp={Note 1}``
        | ``RTP-Info: seq={Note 2};rtptime={Note 3}``
======= =========================================

Note 1: Normal play time (apparently always 0), float, >= 0. (TODO)

Note 2: Apparently a random number between 0 and 8192. (TODO)

Note 3: Apparently always zero. (TODO)

SET_PARAMETER
~~~~~~~~~~~~~

Setting volume
``````````````

======= =================================
Headers ``Content-Type: text/parameters``
Body    ``volume: %f``
======= =================================

The volume is either -144.0 (muted) or a value from -30.0 to 0.0 (full
volume).

Set progress
````````````

======= =================================
Headers ``Content-Type: text/parameters``
Body    ``progress: %u/%u/%u``
======= =================================

The three values are RTP timestamps, formatted as unsigned integers (TODO).
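The two ``SET_PARAMETER`` bodies above are short formatted strings, so ``snprintf`` can build them. This is a minimal sketch with the following assumptions:

- ``airtunes_volume``, ``format_volume_body`` and ``format_progress_body`` are hypothetical names.
- The linear mapping from a 0.0 to 1.0 slider onto -30.0..0.0 is an assumption. The text above only fixes the range and the mute value.
- The start/current/end order of the progress values is also assumed.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mapping from a 0.0..1.0 slider to an AirTunes volume:
 * 0.0 (or less) mutes with -144.0, anything else is mapped linearly
 * onto -30.0..0.0. The linear curve is an assumption. */
double airtunes_volume(double level)
{
    if (level <= 0.0)
        return -144.0;
    if (level > 1.0)
        level = 1.0;            /* clamp to full volume */
    return -30.0 + 30.0 * level;
}

/* Body for "Setting volume": "volume: %f". */
int format_volume_body(char *buf, size_t len, double volume)
{
    return snprintf(buf, len, "volume: %f", volume);
}

/* Body for "Set progress": three RTP timestamps as unsigned integers,
 * assumed here to be in the order start/current/end. */
int format_progress_body(char *buf, size_t len, uint32_t start,
                         uint32_t current, uint32_t end)
{
    return snprintf(buf, len, "progress: %lu/%lu/%lu",
                    (unsigned long)start, (unsigned long)current,
                    (unsigned long)end);
}
```

Clamping levels above 1.0 keeps the result inside the documented -30.0..0.0 range. Only an explicit mute produces -144.0.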
Set DAAP metadata
`````````````````

======= ============================================
Headers | ``Content-Type: application/x-dmap-tagged``
        | RTP-Info_
Body    DAAP metadata
======= ============================================

SETUP
~~~~~

======= ====================================================
Headers ``Transport: RTP/AVP/UDP;unicast;interleaved=0-1;mode=record;control_port={Control port};timing_port={Timing port}``
======= ====================================================

Read ``server_port``, ``control_port`` and ``timing_port`` from the
``Transport`` response header. Use the ``Session`` response header as the
server session ID.

TEARDOWN
~~~~~~~~

Nothing special.

Rogue Amoeba extensions
~~~~~~~~~~~~~~~~~~~~~~~

X_RA_SET_ALBUM_ART
``````````````````

Use this only if the server wants PList metadata. If the server requests DAAP
metadata, use the ``SET_PARAMETER`` method instead.

======= ========================================
Headers | ``Content-Type: {Image content type}``
        | RTP-Info_
Body    Image data
======= ========================================

X_RA_SET_PLIST_METADATA
```````````````````````

======= ===================================
Headers | ``Content-Type: application/xml``
        | RTP-Info_
Body    Metadata in PList format
======= ===================================

Authentication
~~~~~~~~~~~~~~

AirTunes 2 uses HTTP Digest authentication as described in RFC 2617.
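The ``SETUP`` section above asks the client to read ``server_port``, ``control_port`` and ``timing_port`` from the ``Transport`` response header. The C sketch below shows that lookup. It assumes two things:

- The header value has already been extracted from the RTSP response.
- The value uses the same ``;``-separated ``name=value`` layout as the request.

``transport_param`` is a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the numeric value of `name=` in a
 * semicolon-separated Transport header value, or -1 if it is absent. */
long transport_param(const char *transport, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = transport;

    while (p != NULL && *p != '\0') {
        /* Match only at the start of a parameter, so that a name which is
         * merely a suffix of another parameter (e.g. "port" inside
         * "control_port") is not picked up by mistake. */
        if (strncmp(p, name, name_len) == 0 && p[name_len] == '=')
            return strtol(p + name_len + 1, NULL, 10);
        p = strchr(p, ';');
        if (p != NULL)
            p++;                /* step over the ';' */
    }
    return -1;
}
```

Take an illustrative value such as ``RTP/AVP/UDP;unicast;mode=record;server_port=53561;control_port=63379;timing_port=50607``. For that value, ``transport_param(value, "control_port")`` returns 63379.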
Detect speaker type
~~~~~~~~~~~~~~~~~~~

If the response contains an ``Audio-Jack-Status`` header::

    speaker_type() {
        /* Check "disconnected" first: it contains "connected" */
        if ("disconnected" in Audio-Jack-Status) {
            return unplugged;
        } else if ("connected" in Audio-Jack-Status) {
            if ("digital" in Audio-Jack-Status) {
                return digital;
            }
            return analog;
        }
        return unknown;
    }

Detect metadata and audio latency
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the response contains ``Apple-Response``, ``Server`` or ``Audio-Latency``::

    if (Apple-Response in response) {
        lowercase_password = False;
        audio_format = EncryptedALAC;
        wants_album_art = False;
        wants_metadata = False;
        wants_progress = False;
        has_bad_latency_header = False;
    }

    if (Server in response) {
        lowercase_password = True;
        has_bad_latency_header = True;

        if (not Apple-Response in response) {
            audio_format = UnencryptedALAC;
            wants_album_art = DAAP;
            wants_metadata = DAAP;
            wants_progress = True;
        }
    }

    if (Audio-Latency in response) {
        if (not has_bad_latency_header) {
            audio_latency = Audio-Latency;
        } else {
            if (Audio-Latency == 322 or Audio-Latency == 15049) {
                audio_latency = 11025;
            }
            /* Why always 11025? */
            audio_latency = 11025;
        }
    }

Timing
------

Replying to timing packet
~~~~~~~~~~~~~~~~~~~~~~~~~

::

    on_timing_packet(TimingPacket req) {
        assert req.header.payload_type == PAYLOAD_TIMING_REQUEST;

        TimingPacket res;
        res.header = req.header;
        res.header.payload_type = PAYLOAD_TIMING_RESPONSE;
        res.reference_time = req.send_time;
        res.received_time = time_now();
        res.send_time = time_now();

        send(res);
    }

Sync
----

Sync packets are sent once per second and when adding a speaker.

TODO: More details such as timing adjustments.

Sending sync packet
~~~~~~~~~~~~~~~~~~~

::

    send_sync(uint32_t timestamp, bool first) {
        SyncPacket packet;
        packet.header.payload_type = PAYLOAD_SYNC;
        packet.header.marker = True;
        packet.header.seqnum = 7; /* Why fixed? */

        if (first) {
            packet.header.extension = True;
        }

        packet.timestamp = /* TODO */;
        packet.next_timestamp = timestamp;
        packet.some_time = /* TODO */;

        send(packet);
    }

Audio
-----

Audio packet
~~~~~~~~~~~~

Header::

    /* Bytes 0-3 are an RtpHeader, bytes 4-7 the timestamp,
       bytes 8-11 the RTP SSRC */
    { 0x80, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x62, 0x74, 0x05, 0xb9 }

Audio codec
~~~~~~~~~~~

=============== =====================
Codec           Apple Lossless (ALAC)
Sample size     16 Bit
Channels        2
Sample rate     44100
=============== =====================

Packetizing audio
~~~~~~~~~~~~~~~~~

#. Collect ``FRAMES_PER_PACKET`` frames from the input data. Each frame is
   4 bytes: one 16-bit sample per channel.
#. For ALAC, encode the input frames using the ALAC codec
#. Build the packet data, depending on the audio format:

   - Raw L16

     #. Convert the raw input data to big endian (it is an array of 16-bit
        samples)
     #. Copy the audio header and the converted audio data into one buffer
     #. Set the second byte of the buffer (payload type) to 0x0a. This is
        RTP/AVP payload type 10, L16 stereo at 44100 Hz.

   - Unencrypted ALAC

     #. Copy the audio header to the buffer
     #. Append the ALAC-encoded audio data to the buffer

   - Encrypted ALAC

     #. Encrypt the ALAC-encoded audio data with AES-128 in CBC mode:

        - Use the key and IV sent in the ``ANNOUNCE`` request.
        - Restart from that IV for every packet.
        - Encrypt only complete 16-byte blocks. The rest stays unencrypted.

     #. Copy the audio header to the buffer
     #. Append the encrypted audio data to the buffer

#. Write the sequence number to bytes 2-3 (big endian)
#. Write the timestamp to bytes 4-7 (big endian)
#. Increase the sequence number by one for the next packet
#. Increase the timestamp by the number of frames in this packet

Metadata
--------

DAAP metadata
~~~~~~~~~~~~~

=============== =============================
Content-Type    ``application/x-dmap-tagged``
Item name field ``dmap.itemname``
Artist field    ``daap.songartist``
Album field     ``daap.songalbum``
=============== =============================

PList metadata
~~~~~~~~~~~~~~

=============== =============================
Content-Type    ``application/xml``
Title field     ``title``
Artist field    ``artist``
Album field     ``album``
=============== =============================

Zeroconf TXT record
-------------------

======= =======================================================
Field   Description
======= =======================================================
txtvers TXT record version (always ``1``)
pw      ``true`` if a password is required, ``false`` otherwise
sr      Audio sample rate
ss      Audio sample size in bits
ch      Number of audio channels
tp      Transport protocol (``UDP`` [TODO: or ``TCP``?])
======= =======================================================

Rogue Amoeba extensions
~~~~~~~~~~~~~~~~~~~~~~~

============== =======================================
Field          Description
============== =======================================
rast           ``afs`` if Airfoil speaker
ramach         ``{Platform name}.{OS major version}``
raver          Library version
raAudioFormats ``ALAC`` or ``L16``
============== =======================================
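The TXT record fields above arrive as DNS-SD TXT data (RFC 6763). That data is a sequence of strings, each preceded by a one-byte length and holding ``key=value``. The C sketch below looks up one key in the raw record bytes. It is only meant to show this layout, since Zeroconf libraries usually provide an equivalent lookup. ``txt_lookup`` is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: looks up `key` in raw DNS-SD TXT record data
 * (length-prefixed "key=value" strings) and copies the value into `out`
 * as a NUL-terminated string, truncated to fit. Returns 1 if found. */
int txt_lookup(const unsigned char *txt, size_t txt_len, const char *key,
               char *out, size_t out_len)
{
    size_t key_len = strlen(key);
    size_t pos = 0;

    if (out_len == 0)
        return 0;

    while (pos < txt_len) {
        size_t len = txt[pos++];          /* length byte of this entry */
        if (len > txt_len - pos)
            break;                        /* truncated record */
        const unsigned char *entry = txt + pos;
        pos += len;

        /* DNS-SD keys are case-insensitive; this sketch compares them
         * exactly for brevity. Entries without '=' are skipped. */
        if (len > key_len && memcmp(entry, key, key_len) == 0 &&
            entry[key_len] == '=') {
            size_t value_len = len - key_len - 1;
            if (value_len >= out_len)
                value_len = out_len - 1;
            memcpy(out, entry + key_len + 1, value_len);
            out[value_len] = '\0';
            return 1;
        }
    }
    return 0;
}
```

With this helper, a sender could check ``pw`` before the ``ANNOUNCE`` request. It could also read ``raAudioFormats`` to choose between ALAC and L16.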