So what does it mean?
Before I can explain the title of my thesis, I'll have to explain data-reduced audio. (If that hasn't lost you already, you should be reading the technical summary of my research.) But for those of you whose eyes just glazed over, what we're trying to do is throw away most of the information in a sound signal, without making it sound any different, so that it takes up less storage space.
So how good does a system which throws away most of the data in an audio signal sound? Theoretically, as good as the original, as long as it only throws away components of the sound that we can't hear. Don't believe me? If you have an audio-capable PC, you can compare the following audio files for yourself:
- File 1 contains the original audio data. This takes up 1.4Mbits per second for CD quality sound.
- File 2 contains the data-reduced version. This sound file has been data reduced by a ratio of 15:1 by throwing away supposedly inaudible information, and takes up only about 10kB per second for CD quality sound.
- File 3 is simply the difference between the two files.
Each file is 172kB in length, and is a 44.1kHz, 8-bit, mono, dithered copy of a 16-bit stereo original. The data rates above refer to the original 16-bit files. The hiss is due to the reduction down to 8 bits and is not part of the demonstration!
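If you'd like to check the arithmetic behind those figures, here is a minimal sketch in Python. It assumes the standard CD format (44.1kHz, 16 bits, two channels) and that 1kB means 1000 bytes; the function names are made up for illustration. A 15:1 reduction of the CD rate works out to roughly 94kbits per second, or a little under 12kB per second, which is in the same ballpark as the round figure of 10kB quoted above, and a 172kB demo file lasts roughly four seconds.

```python
# Rough data-rate arithmetic for the figures quoted above.
# Assumptions: standard CD format, and 1kB = 1000 bytes.
SAMPLE_RATE_HZ = 44_100   # CD sample rate
BITS_PER_SAMPLE = 16      # CD word length
CHANNELS = 2              # stereo


def cd_bit_rate():
    """Bits per second for uncompressed CD-quality stereo."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def reduced_bit_rate(ratio):
    """Bits per second after data reduction by `ratio`:1."""
    return cd_bit_rate() / ratio


def demo_file_seconds(file_bytes):
    """Playing time of an 8-bit mono 44.1kHz file (one byte per sample)."""
    return file_bytes / SAMPLE_RATE_HZ


original = cd_bit_rate()
reduced = reduced_bit_rate(15)

print(f"original: {original / 1e6:.2f} Mbit/s")        # 1.41 Mbit/s
print(f"reduced:  {reduced / 1000:.1f} kbit/s")        # 94.1 kbit/s
print(f"reduced:  {reduced / 8 / 1000:.1f} kB/s")      # 11.8 kB/s
print(f"demo file: {demo_file_seconds(172_000):.1f} s")  # 3.9 s
```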
Can you hear the difference? To me it sounds like the transients (e.g. the hi-hat in the rhythm section) have been "softened" a little, but that's all. It's not terribly noticeable. Certainly when you play File 1, then File 2, you wouldn't believe that the difference was as large as subtracting them shows it to be (that difference is File 3). Most of the differences are hidden by the music itself. If we can throw away most of the data, but hide ALL these differences in the music, then we've got a perfect coding system.
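For the curious, the subtraction itself is simple. The sketch below is only an illustration, not the tool used to make File 3: it uses two made-up test tones in place of the real recordings, and the function names are hypothetical. It subtracts one signal from the other sample by sample and reports how loud the difference is relative to the original, in decibels. Note that a number like this says how big the difference is, not whether you can hear it, which is exactly why a model of hearing is needed.

```python
import math


def difference(original, coded):
    """Sample-by-sample difference between two time-aligned signals."""
    if len(original) != len(coded):
        raise ValueError("signals must be the same length and time-aligned")
    return [o - c for o, c in zip(original, coded)]


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def level_db(signal, reference):
    """Level of `signal` relative to `reference`, in decibels."""
    return 20 * math.log10(rms(signal) / rms(reference))


# Toy stand-ins for the real files (assumed, not the actual demo audio):
# a 1kHz tone, and a "coded" copy with a 5kHz error at a tenth of its amplitude.
FS = 44_100
N = 4_410  # 0.1 seconds
original = [math.sin(2 * math.pi * 1000 * i / FS) for i in range(N)]
coded = [x + 0.1 * math.sin(2 * math.pi * 5000 * i / FS)
         for i, x in enumerate(original)]

diff = difference(original, coded)
print(f"difference level: {level_db(diff, original):.1f} dB")  # -20.0 dB
```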
The point of all this is that by reducing the amount of data needed to store the sound by fifteen times, you can fit fifteen times as much music on a CD, fit fifteen times as many radio stations in a given frequency range, or download audio from the internet fifteen times faster.
This isn't my research. (!) The problem which I am trying to address is how to assess the quality of the data-reduced audio. More importantly, if there are two different ways of carrying out this data reduction, how do you know which is better? The data reduction is based on human hearing (or rather, what humans can't hear), so traditionally you have to ask a real person's opinion.
My task is to model human hearing, to predict what is and isn't audible. If there's a difference which IS audible, it would be useful to know just how audible, or annoying, the difference is.
Where my work really breaks new ground is in assessing how coding schemes affect sound localisation. In a stereo system, the general idea is that you have a speaker to your front-left, another to your front-right, and sound fills up the space in between, so the instruments sound like they come from the positions they occupied in the recording studio or concert. Sometimes the data reduction scheme is completely ignorant of this, and treats the signals going to each loudspeaker as being independent. So it's possible to throw away parts of the signal which would be inaudible on their own, but which contribute to the perceived position of a given instrument. Modelling how the human auditory system locates sound in space, so that this problem can be detected and assessed, is the most challenging part of my research.
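To make the problem concrete, here is a deliberately crude toy in Python. It is not the auditory model developed in this research, and real coders and real ears are far more subtle. It assumes an instrument placed between the speakers by a constant-power pan law, a hypothetical coder rule that discards anything below a fixed threshold in each channel separately, and a position estimate obtained by simply inverting the pan law. All names and numbers are invented for illustration.

```python
import math


def pan(amplitude, position):
    """Constant-power pan: position 0.0 = hard left, 1.0 = hard right."""
    angle = position * math.pi / 2
    return amplitude * math.cos(angle), amplitude * math.sin(angle)


def naive_channel_coder(level, threshold):
    """Hypothetical per-channel rule: discard anything below `threshold`."""
    return level if level >= threshold else 0.0


def apparent_position(left, right):
    """Invert the pan law to estimate position from the two channel levels."""
    return math.atan2(right, left) / (math.pi / 2)


# An instrument placed a little left of centre-left.
left, right = pan(1.0, 0.2)

# Each channel is coded on its own, with no knowledge of the other.
THRESHOLD = 0.35  # arbitrary illustrative value
coded_left = naive_channel_coder(left, THRESHOLD)
coded_right = naive_channel_coder(right, THRESHOLD)

print(f"before coding: {apparent_position(left, right):.2f}")              # 0.20
print(f"after coding:  {apparent_position(coded_left, coded_right):.2f}")  # 0.00
```

In this toy the quiet right-hand component falls below the threshold and is dropped, so the instrument jumps from its intended position to the left speaker, even though each channel was handled "correctly" when considered alone.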
There you go. There's a little more to it than that, but since you didn't click on the technical summary you're probably asleep by now! When, in ten to fifteen years' time, you have a radio set that offers you 100 CD-quality radio stations, you'll think "Oh yeah, David was trying to test all that stuff when he was at Essex."
So, the title means: "Perceptual model..." - it's a simulation of what humans perceive. "...for assessment of coded audio" - to determine how good the audio will sound to a human listener.
Thanks for reading.