Jeff McAdams on 16 Dec 2004 13:16:54 -0000



Re: [PLUG] VOip


Doug Crompton wrote:
> I fully understand the concept of 2X or greater sample rate etc. But when
> WMA or MPG music can be 64K bit and have reasonably good fidelity, and 96K
> or greater and be damn good, then why can't we cram a measly 3K audio BW
> with lots of wasted space (silence) into MUCH less than the 90Kb they are
> spec'ing? Cell technology does not send silent periods but rather
> recreates them at the RX end. Silence isn't really silence but rather a
> perception that the line is not dead. I would guess that a channel
> contains less than 10% of useful audio at any given time.

Yup, you're describing several different issues and concepts, but all of
them have been done, can be done, and fairly often are done in the VoIP
world.  For the bandwidth savings, you end up using a different codec.
G.711 is 64kbps, but there are codecs in fairly common use ranging from
32kbps down to the ballpark of 4kbps!  These all vary, of course, in
the quality with which they reproduce the human voice.  Usually the
quality difference is fairly noticeable...just as you can usually tell
when you call someone whether they're on a cell phone rather than a land
line...that's because most cell phones and networks use codecs in the
8-16kbps bandwidth range...you can tell the difference.  Part of the
issue is that human perception doesn't scale linearly...so comparing the
compression of human voice at under 64kbps to music at 128kbps or more
doesn't work precisely.  You're right that there are compressing codecs
that can squeeze voice down into less bandwidth, but they do involve DSP
work at each end of the connection, whereas G.711 PCM audio is pure
audio samples, so no DSP processing is needed at all.  Yes, DSPs are
extremely powerful and inexpensive these days...but they do still
cost something, and the ability to essentially remove them completely is
still a win for hardware designers.  Also, you're not always dealing
with dedicated DSP resources; sometimes that work gets offloaded to
general-purpose CPUs, so removing DSP processing lowers the overall CPU
overhead, potentially letting you use a less powerful CPU in a system,
again saving cost.
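
As a quick back-of-envelope sketch of where that ~90kbps figure comes from
(assuming the usual 20ms packetization and Ethernet framing; exact numbers
vary with the link and packet size):

    # Back-of-envelope: why 64 kbps of G.711 shows up as ~90 kbps on the wire.
    # Assumes 20 ms packetization and Ethernet framing; real deployments vary.
    SAMPLE_RATE = 8000        # Hz, narrowband telephony
    BITS_PER_SAMPLE = 8       # G.711 PCM (mu-law / A-law)
    PTIME_MS = 20             # packetization interval, a common default

    payload_bytes = SAMPLE_RATE * PTIME_MS // 1000 * BITS_PER_SAMPLE // 8   # 160
    packets_per_sec = 1000 // PTIME_MS                                      # 50

    rtp, udp, ip, ethernet = 12, 8, 20, 18     # header/trailer bytes per packet
    overhead = rtp + udp + ip + ethernet       # 58 bytes added to every packet

    codec_kbps = SAMPLE_RATE * BITS_PER_SAMPLE / 1000
    wire_kbps = (payload_bytes + overhead) * 8 * packets_per_sec / 1000

    print(f"codec rate: {codec_kbps:.0f} kbps")   # 64 kbps
    print(f"wire rate : {wire_kbps:.1f} kbps")    # ~87 kbps; ~95 with preamble/gap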

The other concepts you mentioned are called VAD (Voice Activity
Detection) and "comfort noise".  Basically, VAD detects when there's no
useful audio being sampled and quits sending audio samples.  It doesn't
compress the silence down, it just quits sending it altogether.
"Comfort noise" is a feature where a system, when not receiving audio
from the other side (like when VAD kicks in), fills the audio output
with a low level of white noise.  These two features combine to reduce
overall bandwidth usage without significantly degrading how the call is
perceived.  When there is useful audio being sent, it's still going to
consume ~90kbps, or whatever the codec in use requires, but cumulative
bandwidth usage will drop because there will be times when nothing is
sent at all.  Your 10% estimate of useful audio may be a bit
optimistic, but I would say you're probably not all that far off!
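
Here's a toy sketch of the idea, assuming a simple energy-threshold
detector (the threshold and frame size are made-up numbers; real VAD such
as G.729 Annex B uses much smarter classifiers and sends comfort-noise
parameters rather than actual noise):

    import random

    THRESHOLD = 500.0   # hypothetical energy threshold, tuned per deployment

    def frame_energy(samples):
        return sum(s * s for s in samples) / len(samples)

    def vad_sender(frames):
        """Send only frames that look like speech; send nothing otherwise."""
        for frame in frames:
            if frame_energy(frame) >= THRESHOLD:
                yield frame        # useful audio: transmit the samples
            else:
                yield None         # silence: no packet goes on the wire

    def comfort_noise_receiver(packets, frame_len=160):
        """Fill gaps with low-level white noise so the line doesn't sound dead."""
        for pkt in packets:
            if pkt is not None:
                yield pkt
            else:
                yield [random.randint(-10, 10) for _ in range(frame_len)]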

You should be aware, though, that VAD isn't all goodness and light.  VAD
can "clip" speech when the system fails to catch the leading edge of
useful audio as it transitions out of suppression.  So there is a
perceptual difference when using VAD, and it varies depending on the
audio conditions at either end of the connection.  If there's a higher
level of ambient noise, either VAD won't engage reliably, or clipping
will end up being more severe because the useful audio is harder to
distinguish from the ambient noise.
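
One common mitigation, sketched under the same made-up energy-threshold
assumption as above: hold the channel open for a short "hangover" after
speech energy drops so word endings aren't chopped off.  Leading-edge
clipping generally needs a small lookahead buffer instead, which costs
extra latency.

    def vad_sender_with_hangover(frames, threshold=500.0, hangover_frames=5):
        hangover = 0
        for frame in frames:
            energy = sum(s * s for s in frame) / len(frame)
            if energy >= threshold:
                hangover = hangover_frames   # speech: reset the hold-open timer
                yield frame
            elif hangover > 0:
                hangover -= 1                # trailing edge: hold the channel open
                yield frame
            else:
                yield None                   # genuine silence: suppress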

> The concept of needing a 64Kb channel to send a 4K analog signal dates
> back probably 40 years, well before any of the technology we have today.
> Today a single chip would do the compression; back then it would have taken
> a whole rack per channel, so it would not even have been thought of.

Absolutely...the DSP technology is there to achieve some significant
bandwidth savings, but there still are real benefits, even today, to
using the G.711 codec.  The greatest is that it eliminates DSP
processing completely; when you're designing consumer electronics, even
removing a single chip from your design can make the difference between
profitability and dot.bomb.  The other point to remember is that these
other codecs do cut down the bandwidth usage, but it's not without
trade-offs...there *are* perceptual differences in the audio quality
with them, and of course, the more bandwidth you save, the more the
audio quality suffers.  Many of them are quite good, but as with cell
phones, the difference is noticeable, not just theoretical.
-- 
Jeff McAdams
"They that can give up essential liberty to obtain a
little temporary safety deserve neither liberty nor safety."
                                       -- Benjamin Franklin

