Physics Factsheet
www.curriculum-press.co.uk
Number 121
Sound Compression
When we download tracks, or rip a CD onto our computer, we end
up with a data file from which music can be reproduced. The original
music file has been compressed, often by a factor of about ten.
There are many methods of sound compression – common
examples are MP3 and WMA. However, most methods use the same
basic ideas.
Example: If the analogue p.d. values ranged from –5 volts to
+5 volts, what would be the smallest voltage step which could
be recorded using 16 bit sample values?
Answer: The number of values possible in 16 bits is 2¹⁶ or
65 536. So the smallest step over this 10 volt range would be
about: 10 / 65 536 = 0.000153 volts or 0.15mV
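The same arithmetic can be repeated for other bit depths. The short Python sketch below is only an illustration: the function name smallest_step and its parameters are our own, and it assumes the voltage range is shared equally among the possible sample values, as in the answer above.

```python
def smallest_step(voltage_range, bits):
    """Return the smallest voltage step, in volts, for a given bit depth.

    Assumes the range is shared equally among the 2**bits possible values.
    """
    return voltage_range / (2 ** bits)


# The worked example: -5 V to +5 V (a 10 V range) recorded at 16 bits.
step_16 = smallest_step(10.0, 16)
print(f"16 bits: {step_16 * 1000:.2f} mV")

# For comparison, 8 bits gives much coarser steps.
step_8 = smallest_step(10.0, 8)
print(f"8 bits: {step_8 * 1000:.1f} mV")
```

Each extra bit halves the step size, which is why 16 bits gives steps of a fraction of a millivolt while 8 bits gives steps of tens of millivolts.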
Why compress the music?
First, let’s look at what happens when music is recorded these days.
(We have seen this in an earlier Factsheet on Compact Discs.)
These steps are so tiny that the original waveform should be
reproduced to a high degree of accuracy.
The old vinyl records stored the sound vibrations as analogue
waveforms. They looked like a sound pattern on an oscilloscope
screen. But nowadays, for many reasons, everything is digitised.
The sound waves are sampled many times each second, and the
sampled values are stored as digital numbers.
Example: Why not record values to 8 bits or 32 bits?
Answer: Eight bits would make the voltage steps too large,
and lose accuracy. Thirty-two bits would double the file size
(and we want to reduce it, if possible).
Quite often in Science we must make sensible decisions
with regard to accuracy. Sometimes accuracy can be sacrificed
to gain other advantages (in this case, smaller file size).
[Figure: waveform with sampled values]
We have already seen high frequencies eliminated and sampling
values limited to 16 bits. But we haven’t started proper sound
compression yet. This has all been part of the standard procedure
to turn an analogue waveform into a digital file. This file can then be
turned back into an analogue sound signal for playback.
Some decisions have to be made. How often should we sample the
waveform? To what degree of precision do we store each sample
value?
[Figure: Analogue Input → A to D → Digital File → D to A → Analogue Output]
Sampling rate: It can be shown mathematically that you can record
a waveform digitally if you sample at a rate of at least twice the
highest frequency it contains.
How many bits of data would be required for a one-minute stereo
song?
44 100 samples per second × 16 bits per sample × 2 channels × 60
seconds
Example: What is the highest frequency that humans can
hear?
Answer: Everyone is different, but few people can hear much
above 20kHz.
This comes to 8.5 × 10⁷ bits (or 1.1 × 10⁷ Bytes, where one Byte = 8
bits)
A standard sampling rate is 44.1kHz. This means we should be able
to accurately reproduce sound up to a frequency of about 22kHz.
And we can’t hear frequencies above this.
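As a rough check on these figures, the sketch below applies the "twice the highest frequency" rule in both directions. The function names are illustrative only, and the 20 kHz value is the approximate upper limit of human hearing quoted in this Factsheet.

```python
def highest_frequency(sampling_rate_hz):
    """Highest frequency (Hz) that can be reproduced at this sampling rate."""
    return sampling_rate_hz / 2


def minimum_sampling_rate(highest_frequency_hz):
    """Lowest sampling rate (Hz) needed to capture this frequency."""
    return 2 * highest_frequency_hz


print(highest_frequency(44_100))      # 22050.0 Hz, about 22 kHz
print(minimum_sampling_rate(20_000))  # 40000 Hz for a 20 kHz hearing limit
```

The standard 44.1kHz rate is therefore a little above the minimum needed for the range most people can hear.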
Example: How much sound could you store on a 1GB portable
music player? (1 Byte equals 8 bits, so 1GB = 8Gb)
Example: A bat has swapped his favourite vinyl records for
CDs of the same music. Would he be a happy bunny (or bat)?
But we know from experience that, using MP3 or WMA sound
compression, we can store much more than this. We can now see
the need for compression.
Answer: 8 × 10⁹ / (8.5 × 10⁷) = 94 minutes (less than 2 CDs)
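The one-minute calculation and the 1GB player estimate can be reproduced with a few lines of Python. This is a sketch under the Factsheet's own assumptions (44.1kHz sampling, 16 bits per sample, two channels, and 1GB taken as 10⁹ Bytes); the constant and function names are our own.

```python
SAMPLE_RATE = 44_100     # samples per second
BITS_PER_SAMPLE = 16
CHANNELS = 2             # stereo


def bits_per_minute():
    """Bits needed for one minute of uncompressed stereo sound."""
    return SAMPLE_RATE * BITS_PER_SAMPLE * CHANNELS * 60


def minutes_of_music(capacity_bytes):
    """Minutes of uncompressed sound that fit into the given memory."""
    capacity_bits = capacity_bytes * 8   # 1 Byte = 8 bits
    return capacity_bits / bits_per_minute()


print(bits_per_minute())              # 84672000, about 8.5 × 10^7 bits
print(round(minutes_of_music(1e9)))   # 94 minutes on a 1GB player
```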
Answer: No. All the sound above 22kHz would be distorted or
missing. And bats can hear well above this frequency.
Exam Hint: Practise expressing large (or very small) numbers
in powers, and use common sense in deciding how many
significant figures to use in your calculations. It is not
uncommon for some candidates to use all of the figures shown
on the calculator in their calculations.
In a way, we have already started compressing the sound file by
ignoring frequencies above 22kHz. But as we can’t hear these
frequencies, there is no trade-off of file size against quality (yet).
Sample values: The voltage values each time we sample the
waveform are often recorded as 16 bit digital numbers. The smallest
is 0000…0000, and the largest is 1111…1111.
Methods of Compression
Most of the compression is accomplished by the use of
psychoacoustics. This involves the study of what we can actually
hear. A sound we can hear in one situation may be inaudible in
another. We are not interested in the biology of this. But we can use
Physics to determine ways of decreasing the digital file size.
[Figure: loud sound and soft sound (nearby frequency) against time/s, with the interval T marked after the loud sound stops]
We can’t hear the soft sound until time T has passed. This might be
5 or 10ms. So our compression system can ignore the soft sound for
several milliseconds after the loud sound stops.
Most of these techniques are considered “lossy”. This
means that when the sound is compressed, some information is
lost forever. When the sound file is reconstructed, it is different
from the original, but it sounds the same.
[Figure: Analogue Input → A to D and Compressed → Compressed Digital File → Uncompressed and D to A → Analogue Output]
And even more surprisingly, this also happens before a loud sound.
[Figure: soft sound (nearby frequency) starting a time T before a loud sound; horizontal axis time/s]
(a) Simultaneous Masking
A loud sound can “hide” a soft sound at the same frequency, or a
nearby frequency. We are familiar with this when the Hoover drowns
out the television.
If T is only a few milliseconds, we don’t hear the soft sound at all,
even though it reached our ears before the loud sound started. So
our compression software ignores it.
Our ears have a hearing threshold curve over the audible frequency
range of roughly this shape. (We can only hear sounds above the
threshold.)
[Figure: hearing threshold curve, Loudness /dB against frequency]
Example: A loud sound is played for exactly 20ms. A soft
sound at an adjacent frequency starts 5ms before the loud
sound and finishes 5ms after it.
Can the soft sound be heard at all? Explain.
What percentage of the data can be ignored?
Answer: No, it cannot be heard. Simultaneous and temporal
masking hide all of it.
You can ignore 30ms of soft sound and only concern yourself
with the 20ms of loud sound. This is 60% of the data being
ignored.
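The percentage in this answer can be checked with a short Python sketch. It assumes, as the answer does, that the whole soft sound is masked and that both sounds produce data at the same rate per millisecond; the function name and parameters are illustrative only.

```python
def masked_fraction(loud_ms, soft_before_ms, soft_after_ms):
    """Fraction of the total sound data that can be ignored.

    The soft sound lasts for the time before the loud sound, the whole
    of the loud sound, and the time after it.
    """
    soft_ms = soft_before_ms + loud_ms + soft_after_ms
    return soft_ms / (soft_ms + loud_ms)


# Loud sound for 20 ms; soft sound starts 5 ms before and ends 5 ms after.
print(f"{masked_fraction(20, 5, 5):.0%}")   # 60%
```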
We can hear sound A because it is louder than the threshold.
However, a loud sound will distort the threshold over a range of
frequencies:
[Figure: distorted hearing threshold, Loudness /dB against Frequency /Hz, showing loud sound B and sound A]
While sound B is being played, sound A cannot be heard. Our
compression software can ignore sound A. The more data we can
ignore, the easier it is to compress (reduce the size of) the data file.
(c) Stereo
Much of the sound going into one ear also goes into the other ear.
The waveforms are often identical – just the amplitudes are different.
[Figure: Left channel and Right channel waveforms]
Rather than store almost identical sets of data for each channel, it is
often possible for the compression software to store the waveform
just once, and keep track of the different amplitudes for the left and
right channels.
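The stereo idea in (c) can be illustrated with a very simplified Python sketch. It assumes the two channels really are scaled copies of the same waveform and that the left channel is not silent; real compression software is more sophisticated than this, and the function names are our own.

```python
def encode_stereo(left, right):
    """Store one shared waveform plus an amplitude for each channel."""
    left_gain = max(abs(s) for s in left)
    right_gain = max(abs(s) for s in right)
    # The shared waveform is the left channel scaled to a peak of 1.
    shared = [s / left_gain for s in left]
    return shared, left_gain, right_gain


def decode_stereo(shared, left_gain, right_gain):
    """Rebuild both channels from the shared waveform and the amplitudes."""
    left = [s * left_gain for s in shared]
    right = [s * right_gain for s in shared]
    return left, right


left = [0.0, 0.4, 0.8, 0.4, 0.0, -0.4, -0.8]
right = [0.0, 0.2, 0.4, 0.2, 0.0, -0.2, -0.4]   # same shape, half the amplitude

shared, left_gain, right_gain = encode_stereo(left, right)
new_left, new_right = decode_stereo(shared, left_gain, right_gain)

stored = len(shared) + 2              # one waveform plus two amplitudes
original = len(left) + len(right)     # two full waveforms
print(stored, "numbers stored instead of", original)
```

For long recordings the two extra amplitude values are negligible, so the data for this part of the sound is roughly halved.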
(b) Temporal Masking
A very loud sound makes it difficult to hear a soft sound at a nearby
frequency for a short time after the loud sound stops. (An explosion
deafens us for a short time.)
(d) Variable bit rate (VBR)
What we have not said so far is that the analogue input signal is
chopped up into small time frames for compression, each lasting
perhaps 26ms. For each time frame the compression software decides
how best it can use the number of bits of data it is allowed.
But some time frames will have a simple waveform; others a more
complex sound. Some compression software can use a higher bitrate
when the sound is complex, and a lower bitrate for simpler regions,
making sure the average is the required rate.
Example: Suppose you chose an overall bitrate of 96kbs⁻¹ for compressing a piece of music lasting 2 seconds. The software
compresses the complex part of the sound lasting 1.2 seconds at 128kbs⁻¹. What bitrate would it have to use for the other 0.8
seconds?
Answer: The average must be 96kbs⁻¹. This means 192kb in 2 seconds.
Already used = 1.2 × 128 = 153.6kb.
Remaining = 192 – 153.6 = 38.4kb.
Rate over 0.8 seconds = 38.4 / 0.8 = 48kbs⁻¹.
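The same bit budget can be worked out in Python. This is a sketch of the arithmetic only, not of how any particular encoder allocates bits; the function name and parameters are illustrative.

```python
def remaining_bitrate(average_kb_per_s, total_s, used_kb_per_s, used_s):
    """Bitrate (kb per second) left for the rest of the recording."""
    total_kb = average_kb_per_s * total_s    # whole bit budget
    used_kb = used_kb_per_s * used_s         # spent on the complex part
    return (total_kb - used_kb) / (total_s - used_s)


# 2 s of music at an average of 96 kb per second, with 1.2 s at 128 kb per second.
rate = remaining_bitrate(96, 2.0, 128, 1.2)
print(f"{rate:.0f} kb per second")   # 48 kb per second
```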
Practice Questions
1. Why don’t we sample at a frequency higher than 44.1kHz? Why not take sample values at a frequency of 88.2kHz?
2. (a) If sample values were recorded to 8 bits, and the voltage range of the original waveform was 10V, what would be the smallest voltage
step that could be recorded?
(b) Find the smallest voltage step for recording values to 32 bits.
3. Without sound compression, estimate how many GB of memory would be required to store 100 CDs.
4. If you wanted to design an investigation into temporal masking, what variables would you have to consider (or control)?
5. (a) In a variable bitrate (VBR) recording of 30 seconds of music, 20 seconds are recorded at 80kbs⁻¹ and the other 10 seconds are
recorded at 128kbs⁻¹. Work out the average bit rate for the whole recording.
(b) Why is this method of compression superior to constant bitrate (CBR) recording?
Answers
1. The more sample values you record each second, the larger the data file will be, and the less music you could put onto a CD or store in
your MP3 player. A sampling rate of 44.1kHz should record all frequencies that humans can hear.
2. (a) 2⁸ = 256 steps.
10 / 256 = 0.039V or 39mV. Not very accurate.
(b) 2³² = 4.29 × 10⁹ steps.
10 / (4.29 × 10⁹) = 2.33 × 10⁻⁹ volts or 2.33 × 10⁻⁶ mV. Unnecessary accuracy.
3. Assume 60 minutes per CD.
For 100CDs, we need 6000 minutes recorded.
For each minute, we need 1.1 × 10⁷ Bytes.
Total required about 6.6 × 10¹⁰ Bytes, or 66GB. (a massive amount of memory)
4. The intensities of the two sounds, the frequency of the loud sound, the difference in frequencies between the two sounds, the time
before or after the loud sound that the soft sound begins or ends, background noises in the room, the quality of the observer’s hearing,
etc.
5. (a) Total bits recorded = (20 × 80) + (10 × 128) = 2880kb
Average bit rate = 2880 / 30 = 96kbs⁻¹
(b) More detail for the more complex passages of music.
Acknowledgements:
This Physics Factsheet was researched and written by Paul Freeman
The Curriculum Press, Bank House, 105 King Street, Wellington, Shropshire, TF1 1NU
Physics Factsheets may be copied free of charge by teaching staff or students, provided that their school is a registered subscriber.
No part of these Factsheets may be reproduced, stored in a retrieval system, or transmitted, in any other form or by any other means, without the prior permission of the publisher.
ISSN 1351-5136