Idea
Preface
The general idea behind UDP radio is to make a simple yet robust protocol for
sending audio data over a network.
Transmit scheme
The transmitter reads in x (about 1) seconds of 44100 Hz audio data. It splits this into
y (about 60) pieces, numbered 0 to y-1. Piece 0 contains samples 0, y, 2y, 3y, 4y etc.
of the data read. Piece 1 contains samples 1, y+1, 2y+1, 3y+1 etc. The same goes for piece 2 and so on.
The pieces are then sent over the network at a rate of y/x packets per second. The benefit of this scheme
is simple: if a packet (containing one piece) gets lost, there won't be a big gap in
the received audio; only the effective sample frequency, and thus the audio quality, decreases a little. In fact,
at least y consecutive packets would have to be lost before there is a chance of hearing a gap.
Furthermore, if a gateway on a high-bandwidth network receives the data and wants to retransmit
it over a low-bandwidth network, it only has to select half, one third or so of the packets; it
does not need to resample or do anything else to reduce the bandwidth.
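The splitting, the rebuilding after packet loss and the gateway's packet selection can be sketched in a few lines of Python. This is only a minimal sketch under some assumptions of mine: samples are unsigned 8 bit values, the block length is a multiple of y, and a lost sample is simply replaced by the previous one. The function names are made up for illustration.

```python
SAMPLE_RATE = 44100  # Hz, as in the text
Y_PIECES = 60        # y

def split_block(block: bytes, y: int = Y_PIECES) -> list[bytes]:
    """Piece k holds samples k, y+k, 2y+k, ... of the block."""
    return [block[k::y] for k in range(y)]

def rebuild_block(pieces: dict[int, bytes], y: int = Y_PIECES) -> bytes:
    """Re-interleave the pieces that arrived (piece number -> data).

    Samples of lost pieces are filled by repeating the previous sample.
    That concealment method is an assumption, not part of the scheme.
    """
    piece_len = len(next(iter(pieces.values())))
    out = bytearray(piece_len * y)
    for k, piece in pieces.items():
        out[k::y] = piece
    last = 128  # mid-scale for unsigned 8 bit audio (assumption)
    for i in range(len(out)):
        if i % y in pieces:
            last = out[i]
        else:
            out[i] = last
    return bytes(out)

def select_for_gateway(pieces: list[bytes], n: int) -> dict[int, bytes]:
    """Keep every n-th piece; n has to divide the number of pieces."""
    if len(pieces) % n != 0:
        raise ValueError("n must divide the number of pieces")
    return {k: p for k, p in enumerate(pieces) if k % n == 0}

def merge_selected(kept: dict[int, bytes]) -> bytes:
    """Interleave the kept pieces into one stream at the reduced rate."""
    ordered = [kept[k] for k in sorted(kept)]
    return bytes(s for group in zip(*ordered) for s in group)

if __name__ == "__main__":
    block = bytes(i % 256 for i in range(SAMPLE_RATE))  # 1 second of fake audio
    pieces = split_block(block)
    received = dict(enumerate(pieces))
    del received[7]  # pretend packet 7 got lost
    damaged = sum(a != b for a, b in zip(block, rebuild_block(received)))
    print(f"{len(pieces)} pieces of {len(pieces[0])} bytes, "
          f"{damaged} samples differ after losing one packet")
```

Keeping every second piece gives exactly the samples 0, 2, 4 etc. of the block, so the gateway ends up with a regular 22050 Hz stream without doing any arithmetic on the samples.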
The value of x should be low, so that prebuffering time and buffer sizes can be low. y should
be divisible by many small primes, so that it has many divisors and therefore many sample rates that
can be easily selected. Also, 44100*x/y should be close to but below 1500 bytes, the MTU of
most networks. Bigger packets mean greater efficiency, but packets that have to be
split into fragments have a bigger chance of getting lost and take more CPU time to reassemble.
I'd choose 60=2x2x3x5, which gives us a choice of approx. 44 kHz, 22 kHz, 11 kHz (very common) and 9 other frequencies.
Combined with the perfect 1 second slice of audio data, packets carry 735 bytes of audio data.
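The numbers for y=60 are easy to check. The sketch below lists the divisors of y, the sample rate that results from keeping one out of every d pieces, and the payload size for a whole number of seconds x; the function names are my own.

```python
SAMPLE_RATE = 44100  # Hz

def divisors(n: int) -> list[int]:
    """All positive divisors of n, in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]

def selectable_rates(y: int, rate: int = SAMPLE_RATE) -> list[float]:
    """Sample rates obtained by keeping one out of every d pieces."""
    return [rate / d for d in divisors(y)]

def payload_bytes(x: int, y: int, rate: int = SAMPLE_RATE) -> int:
    """Bytes of 8 bit mono audio per packet, for whole seconds x."""
    if (rate * x) % y != 0:
        raise ValueError("block length is not a multiple of y")
    return rate * x // y

if __name__ == "__main__":
    for d, r in zip(divisors(60), selectable_rates(60)):
        print(f"keep 1 in {d:2d} pieces: {r:8.1f} Hz")
    print("payload per packet:", payload_bytes(1, 60), "bytes")
```

60 has 12 divisors (1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30 and 60), which accounts for the three common rates plus the 9 other frequencies.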
We also need a small header for synchronisation info, and the UDP header adds to the size of the
final packet as well.
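What goes into the header is not decided here. Purely as an illustration, the sketch below assumes a hypothetical 6 byte header with a block sequence number, the piece number and the value of y, and checks that the result still fits in one packet. The only fixed facts in it are the sizes of the UDP header (8 bytes) and of an IPv4 header without options (20 bytes).

```python
import struct

# Hypothetical header layout (not defined by the text above):
#   block sequence number (32 bit), piece number (8 bit), y (8 bit)
HEADER = struct.Struct("!IBB")

UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20  # without options
MTU = 1500

def build_packet(block_seq: int, piece: int, y: int, payload: bytes) -> bytes:
    """Prepend the (assumed) synchronisation header to one piece."""
    return HEADER.pack(block_seq, piece, y) + payload

def parse_packet(packet: bytes) -> tuple[int, int, int, bytes]:
    """Split a packet back into header fields and audio payload."""
    block_seq, piece, y = HEADER.unpack_from(packet)
    return block_seq, piece, y, packet[HEADER.size:]

def size_on_the_wire(packet: bytes) -> int:
    """Size of the IPv4 datagram carrying this packet over UDP."""
    return IPV4_HEADER_BYTES + UDP_HEADER_BYTES + len(packet)

if __name__ == "__main__":
    packet = build_packet(block_seq=42, piece=7, y=60, payload=bytes(735))
    print(size_on_the_wire(packet), "bytes, fits in MTU:",
          size_on_the_wire(packet) <= MTU)
```

With a header this small, a 735 byte piece stays far below the MTU, so there is room to spare for extra fields.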
This scheme sends 8 bit mono data. It is very easy however to get 16 bit stereo:
we just send 4 times as many packets, each identified as left or right and msb or lsb.
However, if someone wants to receive or retransmit mono data, they have to choose either the left
or the right packets. There is another way of transmitting the data: we send 'mixed' packets
and 'separation' packets. Suppose L is the value of a left sample and R the value of the
corresponding right sample. Now we calculate M (the 'mixed' sample value) and S
(the 'separation' sample value): M=(L+R)/2 and S=M-L. The reconstruction formulas are
easily derived: M-S=L and M+S=(L+R)/2+(L+R)/2-L=L+R-L=R. This way, receivers that just want
mono data only need the mixed packets and don't have to distinguish between mono and stereo broadcasts. Maybe some support might be
nice for the future 128 bit hexadecaphonic broadcasts on the common 10 Terabit networks.
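A small sketch of the mixed/separation calculation, assuming unsigned 8 bit samples and integer arithmetic. Note that with integers the division by 2 rounds down, so when L+R is odd the reconstructed R comes out one step too low; L is always exact. The function names are mine.

```python
def encode_ms(left: int, right: int) -> tuple[int, int]:
    """M = (L+R)/2 and S = M-L, using integer (floor) division."""
    m = (left + right) // 2
    s = m - left  # lies in -128..127 for 8 bit unsigned L and R
    return m, s

def decode_ms(m: int, s: int) -> tuple[int, int]:
    """L = M-S and R = M+S."""
    return m - s, m + s

if __name__ == "__main__":
    for left, right in [(100, 50), (100, 51), (0, 255), (255, 0)]:
        m, s = encode_ms(left, right)
        print((left, right), "-> M, S =", (m, s), "->", decode_ms(m, s))
```

A mono receiver just plays the M values and never looks at S.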
Compression
The hard part is compression. The only way to compress the data stream without
losing one of the benefits of the scheme described above is to compress the data
of each packet separately. Each msb packet can be seen as 1 second of 8 bit mono 735 Hz
audio data, which already comes close to noise. The lsb packets are noise. Because
this scheme is so unusual, it is hard to believe someone has already made a compression technique
for this situation. Hmm, I foresee long and sleepless nights...
Copyright (C) Guus Sliepen. Last updated: 16 November 1998. Send comments to root@sliepen.eu.org