Jeff McAdams on 16 Dec 2004 13:16:54 -0000
Doug Crompton wrote:
> I fully understand the concept of 2X or greater sample rate, etc. But when
> WMA or MPG music can be 64K bit and have reasonably good fidelity, and 96K
> or greater and be damn good, then why can't we cram a measly 3K audio BW
> with lots of wasted space (silence) into MUCH less than the 90Kb they are
> spec'ing? Cell technology does not send silent periods but rather
> recreates them at the RX end. Silence isn't really silence but rather a
> perception that the line is not dead. I would guess that a channel
> contains less than 10% of useful audio at any given time.

Yup, you're describing several different issues and concepts, but all have been done, can be done, and fairly often are done in the VoIP world.

For the bandwidth savings, you end up using a different codec. G.711 is 64kbps; there are codecs in fairly common use that run from 32kbps down to the ballpark of 4kbps! These all vary, of course, in the quality with which they reproduce the human voice. Usually the quality difference is fairly noticeable...just as you can usually tell, when you call someone, whether they're on a cell phone rather than a land line. That's because most cell phones and networks use codecs in the 8-16kbps bandwidth range...you can tell the difference. Part of the issue is that human perception doesn't scale linearly, so comparing the compression of human voice at under 64kbps to music at 128kbps or more doesn't work precisely.

You're right that there are compressing codecs that can squeeze voice down into less bandwidth, but it does involve DSP work at each end of the connection, whereas G.711 PCM audio is pure audio samples, so no DSP processing is needed at all. Yes, DSPs are extremely powerful and inexpensive these days...but they do still cost something, and the ability to essentially remove them completely is still a win for hardware designers.
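To make the "pure audio samples" point concrete: G.711 mu-law encoding is a single logarithmic mapping applied independently to each sample — no frames, no filtering, no DSP. Here's a minimal Python sketch of the encoder (assuming 16-bit signed input; real implementations are usually table-driven, so treat this as illustrative rather than a reference implementation):

```python
def linear_to_ulaw(sample: int) -> int:
    """Encode one 16-bit signed PCM sample to an 8-bit G.711 mu-law byte."""
    BIAS, CLIP = 0x84, 32635
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS   # magnitude is now 132..32767
    # Segment number = position of the highest set bit, mapped to 0..7.
    exponent = magnitude.bit_length() - 8
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF

print(hex(linear_to_ulaw(0)))      # silence encodes to 0xff
print(hex(linear_to_ulaw(32767)))  # full-scale positive encodes to 0x80
```

At 8,000 samples per second times 8 bits per encoded sample, that's the 64kbps payload this thread is talking about.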
Also, you're not always dealing with dedicated DSP resources; sometimes that work gets offloaded to general CPUs in systems, so removing DSP processing lowers the overall CPU overhead, potentially letting you use a less powerful CPU in a system, again saving cost.

The other concepts you mentioned are called VAD (Voice Activity Detection) and "comfort noise". Basically, VAD detects when there's no useful audio being sampled and quits sending audio samples. It doesn't compress them down; it just quits sending them altogether. "Comfort noise" is a feature where a system, when not receiving audio signals from the other side (like when VAD kicks in), fills the audio output with a low level of white noise. These two features combine to reduce overall bandwidth usage without significant degradation of the perception of the call. When there is useful audio being sent, it's still going to consume ~90kbps (or whatever the codec in use uses), but cumulative bandwidth usage will drop because there will be times when nothing is sent at all. Your 10% estimate of useful audio may be a bit optimistic, but I would say you're probably not all that far off!

You should be aware, though, that VAD isn't all goodness and light. The use of VAD can result in "clipping" of speech when the system fails to pick up the leading edge of useful audio while transitioning out of suppression. So there is a perceptual difference in using VAD, and it varies depending on the audio conditions at either end of the connection. If there's a higher level of ambient noise, either VAD will not engage reliably, or clipping will end up being more severe, because the useful audio is more difficult to distinguish from the ambient noise.

> The concept of needing a 64Kb channel to send a 4K analog signal dates
> back probably 40 years, well before any of the technology we have today.
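The VAD/comfort-noise pair can be sketched in a few lines. This is a toy energy-threshold detector — real VADs are considerably smarter, and the threshold, frame size, and noise amplitude here are invented for illustration. It also hints at the clipping problem above: a frame that starts quiet and ends loud is judged on its average energy, so the leading edge can get lost.

```python
import random

FRAME = 160  # one 20 ms frame of 8 kHz samples

def frame_energy(samples):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in samples) / len(samples)

def vad_send(frames, threshold=1e4):
    """Sender side: suppress frames whose energy falls below threshold.
    Yields (sent, frame); frame is None when nothing goes on the wire."""
    for f in frames:
        if frame_energy(f) >= threshold:
            yield (True, f)
        else:
            yield (False, None)

def comfort_noise(n=FRAME, amplitude=20):
    """Receiver side: fill VAD gaps with low-level white noise so the
    line doesn't sound dead."""
    return [random.randint(-amplitude, amplitude) for _ in range(n)]

# Toy call: 1 s of speech-level frames followed by 1 s of near-silence.
speech = [[2000] * FRAME] * 50
near_silence = [[1] * FRAME] * 50
sent = sum(1 for ok, _ in vad_send(speech + near_silence) if ok)
print(f"frames sent: {sent}/100")  # only the 50 speech frames go out
```

(As for where the ~90kbps figure comes from: the 64kbps G.711 payload picks up 40 bytes of RTP/UDP/IP headers on every 20 ms packet, which adds 16kbps, and link-layer framing pushes the on-the-wire total toward 90.)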
> Today a single chip would do the compression; back then it would have taken
> a whole rack per channel, so it would not have even been thought of.

Absolutely...the DSP technology is there to do some significant bandwidth savings, but there still are real benefits, even today, to using the G.711 codec. The greatest is that it eliminates DSP processing completely, and when you're designing consumer electronics, even removing a single chip from your design can make the difference between profitability and dot.bomb.

The other point to remember is that these other codecs do cut down the bandwidth usage, but it's not without trade-offs...there *are* perceptual differences in the audio quality with them, and of course, the more bandwidth you save, the more the audio quality suffers. Many of them are quite good, but as with cell phones, the difference is noticeable, not just theoretical.

--
Jeff McAdams
"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."  -- Benjamin Franklin
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug