Related Topics:
Watermark Robustness Against
Data Compression
Expect to see my formal paper on my algorithm soon.
With a lot of luck, it will be published at this year's data compression
conference, after which I will post it up here.
Table of Contents:
1. The DivX ;-) CoDec
2. MPEG Compression: What Does It Mean?
3. MP3s
4. TV vs. CRT
5. Flask Mpeg: Global Project Options
(Info here is only applicable to flaskmpeg and divx3.11)
6. Other Forms of Compression
7. MP3: The Nitty-Gritty
8. PCM: Wave Files
9. Number Systems
10. Text Compression
11. Run Length Encoder
12. Dictionary Based Encoding
13. Linear Predictors in Audio or Text
I've decided to create a kind of motley
crew of a resource for those of you out there who are interested in compression
techniques and the like. The primary focus is to give you the knowledge
necessary to make educated decisions if you ever do video editing.
Its second purpose is to go through certain software I find particularly
helpful and describe its use from my point of view. This page will
serve as a rough guide for those out there who are trying to understand compression
and the multimedia aspects related to it. First,
I apologize for my English; it is bad, I know. Well, enough said... on
to the knowledge. I also apologize for the seemingly disconnected
path this readme takes. Trust me, the result will explain Flask,
but I wish to give you the background information that will allow you to make
your own educated decisions about which settings to use now and in the future.
I am planning to add some examples of the topics discussed on this site.
I also plan to re-arrange this site with a table with targets. If after
reading this you have any questions about compression, just
drop me a line
and I will do my best to answer your question. This is good for me
because I can upload your question here to make the readme more thorough.
An Introduction to Flask Mpeg:
We will begin in the "Output format options"
selection on the menu. In here, you can choose between the available
codecs (COmpressor DECompressor) on your computer. For the video aspect,
I am assuming that everyone will be using the DivX codec, so I will begin
with some basic knowledge of audio and video compression.
Video Codec: The Two DivX Codecs
As you will see, there are two DivX codecs,
a low-motion and a high-motion codec. The codecs are not as clear-cut
as their names imply. If you have read the latest article on
Tom's Hardware,
it is shit... Here is my rebuttal
to his article. Anyways, I will continue my discussion here.
The major difference between the two codecs is that the low-motion codec uses
a static bitrate and the high-motion codec uses a variable bitrate. What this
means is that the high-motion codec will use fewer bits to encode a low-motion
scene than a high-motion scene. This improves quality for the high-motion
sequences because the bitrate you specify is conserved for use on
these rapidly changing scenes. However, two problems arise from
this. The first is that the low-motion scenes see the compromise, and
unfortunately, this is where quality counts too. A poor-looking
slow scene really shows, and while the high-motion scenes may look better, they
are usually few and far between; this is my personal view from my experience.
The second hit comes from the fact that if you set the bitrate too high for
the codec, it may go up to that upper bound, and then some slower computers
may not be able to keep up in processing all of this data and performance
will be sacrificed, recreating the problem you were trying to avoid in the
first place. Furthermore, the bitrate for the low-motion scenes is
calculated by an internal algorithm in the codec that encodes with
the smallest bitrate achievable for a certain threshold. So your low-motion
scenes will look like shit. You cannot specify this threshold in 3.11,
but you can in 4.01 of the DivX codec.
MPEG Compression: What Does it Mean?
MPEG is an acronym for the "Moving
Picture Experts Group"; this group goes about setting standards for digital
video and audio. Its standards include MPEG-1, MPEG-2 (DVD compression),
MPEG-4, MPEG-7, and MPEG-21. MPEG-4 is going to be very useful for
gamers as it is based on splitting the picture up into specific objects.
This feature I have not seen implemented anywhere yet, but expect to see
it later this year. MPEG-2 is the best quality but uses a high bandwidth,
and MPEG-4 was made to cover a broader scope than MPEG-2.
MPEG-1 video was typically limited to a size
of 352 x 240 pixels at 30 fps (frames per second), or 352 x 288 at 25 fps, of a
de-interlaced
signal. Because MPEG-2 was made specifically for high-bandwidth systems,
it has scalability built in and supports a wide range of resolutions. However,
for some odd reason (odd at least to me) the MPEG-2 standard specifies an
interlaced signal. Hardly what one would expect for a high bandwidth.
Though, the individual frames can be encoded with twice the de-interlaced
bitrate because there are half as many unique frames to split the overall
bitrate between. Here is the problem, though. When you do MPEG encoding
on an interlaced signal, MPEG compression just detects movement of a group
of pixels. But since an interlaced signal skips every other line, the difference
in pixel value (an integer specifying the color) will be greater when comparing
pixel-to-pixel neighbors that are not on the same line. This can cause some
error, resulting in less effective retention of the original image.
Either way, DVDs and HDTVs use either 60 or 120 half-frames per second, which I have been
told shows no artifacts in an interlaced signal.
Many people believe that if they change
the fps from NTSC 29.97 fps to PAL 25 fps (the original black and white
standards were 30 and 25 fps respectively; adding colour to NTSC required
slowing the rate by 0.1%, which is why its true frame rate is a little less) they
can compress the movie even more because there are fewer frames to be encoded.
This is not entirely true. When MPEG monitors the movement of a 16x16
block of pixels, it registers the movement as a "motion vector" and that is
what is stored. If you change the frame rate, this is what the computer
has to do. First, it needs to construct the 30 original frames, and
then it needs to divide those 30 frames into 25. It is easy to see that
30 divided by 25 is not a happy number, but this will be useful in a specific
case in MPEG-2 encoding. Anyways, changing the frame rate results
in bigger motion vectors (each block moves farther between frames) and hence
more information that needs to be stored.
You will also increase the amount of computation required on the decoding
end because there will be higher values for each frame, but you will be adding
a little to the amount of time the decoder has to make a frame, since it needs
to draw one only every 1/25 of a second instead of every 1/30.
Now here is some more of the nitty gritty
of MPEG video compression... what it is actually doing behind the scenes.
You would think that MPEG would emulate what a film projector does and actually
store every single frame of the video. It does, but not all are created
equal. There are three major types of frames (there are 1 or 2 others
that are rarely discussed): I, P and B. The I frame is the most
important and has maybe a 2:1 compression ratio, the P frames are the worst
quality as they have maybe 8:1 or 16:1, while B frames are only 4:1.
The reason for this is similar to differential encoding, where the next value
is stored in terms of how it differs from the previous. In this case
the I frame is the core image (also inserted to prevent errors propagating
through the entire video) and the P frames are the differences. The
way it works is in 16x16 blocks: it transforms the pixels in each block
using the Discrete Cosine Transform (DCT, applied to 8x8 pieces of the block)
and quantizes the result. This is a lossy form of compression
but it tries to take advantage of the peculiarities of what the human eye
can see (though not a psycho-physical model of the human visual system).
This is actually how JPEG pictures work, if you were wondering.
Anyways, that is how the first frame is encoded. The next frame is
a P-frame: the encoder looks for 16x16 blocks of pixels that have the
same values in the I-frame and creates a "motion vector" to tell the
decompressor how far each 16x16 block has moved from where it was.
The reason it is a vector is that it has a direction and a magnitude (number
of pixels). It then encodes just the motion vector for that block.
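To make the motion vector idea a little more concrete, here is a rough Python sketch of a brute-force block search. It is only an illustration under my own assumptions: frames are plain lists of rows of grey values, the function names and the `search` range are made up for this example, and real encoders use much smarter searches than trying every offset.

```python
def sad(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
    return total


def find_motion_vector(prev, cur, bx, by, size=16, search=4):
    """Return (dx, dy): the offset from the block at (bx, by) in `cur`
    back to the best-matching block in `prev`."""
    height, width = len(prev), len(prev[0])
    best = (0, 0)
    best_cost = sad(cur, prev, bx, by, bx, by, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = bx + dx, by + dy
            # Skip candidate blocks that fall outside the previous frame.
            if px < 0 or py < 0 or px + size > width or py + size > height:
                continue
            cost = sad(cur, prev, bx, by, px, py, size)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best
```

The vector returned points backwards, from where the block is now to where it was in the previous frame, which is all the decoder needs to copy the block into place.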
Audio Codec:
For audio, I recommend Radium's mp3 codec,
at 44.1KHz (or 48KHz if a DVD) at 128Kb/sec; notice that a lower case 'b'
means bits, not Bytes. The difference is that there are eight bits in
a Byte. Now here is a little information on the mp3 encoding standard
(good news, LSF is coming out with a newer version later this year, but it may
have digital rights protection built in).
MP3:
MP3 means MPEG-1 Layer-3. MPEG audio has 3
layers: Layer-3, which most people use, the less frequently used Layer-2
(MP2), which can be found with MPEG movies and VCDs, and lastly Layer-1.
The difference in the compression is this. Using mp3, the bitrate needed for the
"untrained ear" not to notice a difference is 128kbps, while for MP2 you
need 160kbps. That is the difference between the
layers. I find it incredibly wasteful to see a bitrate of 320kbps for
an mp3 when not even half of that is fine for most users. However,
most of you probably have not even heard of mp2s; well, it is the predecessor
to mp3. To continue, the technology built into the different layers
is such that a Layer-3 audio player can also play Layer-2 and Layer-1
files, though not the other way around. This
is the backward compatibility that we have all come to enjoy.
Is MP3 the best compression scheme to
use for all audio? No, MP3 is good for music where shit is all over the audio
spectrum. There are specific compressors that are made for voice, and
they do a hell of a lot better than mp3, compression wise. For example, your cell
phone uses this technology, and that is why you sound a little different on
a cell phone. It can only handle voice, and when there is background
noise it gets all fucked up.
For point of reference, the audio on
a DVD can use the MPEG-2 (note: not layer 2) audio standard. MPEG-1 audio
only supports two channels, which is enough to carry Dolby Surround (it is
matrixed into the two channels), while MPEG-2 audio supports 5.1 and actually
has multiple discrete channels.
TV vs. CRT: The Standards of Each and How this Affects Digital Compression
Well, because of limiting technology we are now faced with two different standards for how video is displayed. One is for Television, camcorders and DVDs and the other is for Cathode Ray Tubes (CRT (otherwise known as a computer monitor)) and HDTV.
TV: A Brief History
Well... we all know the first TV was
black and white, but I'm sure many of you have wondered how a black &
white TV and a color TV can work off the same signal. Well, here you go...
Back when the TV signal was being standardized in the 1930's, they chose
to send an image composed of 525 horizontal lines (if you look real hard
you can see 'em). The new DTV standard is 1080 scan lines with 1920
pixels per line. This is about 5.5x as many pixels as a standard TV.
The digital TV that most people have today, usually in conjunction with their
cable modem, is not HDTV as some would think. It is a lower subset of
DTV, and its resolution is 1280x720. Here is the kicker: since HDTV
uses MPEG-2 compression, the only difference between your regular TV and an
HDTV, besides the different aspect ratios, is that the HDTV has a built-in
MPEG-2 video decoder chip, which is also present in all set-top DVD players.
So I ask, why can't you use the same chip to do both (think of computers that
have both a DVD decoder card and a TV card, but have no HDTV support)?
This would also imply that HDTVs with built-in DVD players should be cheaper
than buying both individually because there are shared aspects of the hardware.
The glass of a black and white TV tube had a phosphor coating, and
when it was struck by the electron gun in the back of the tube the phosphor
would glow a certain shade from white to black. Now, when color TVs
came out the industry said that all new TVs must be compatible with the current
black and white signal. What came of this was a video format called
YUV. They discovered that if you take a black and white image you could
colorize it by adding a shade of purple/pink and another of blue. They
then realized that if they only encoded 1/4 of the original image color, there
would be no noticeable difference to the human eye. Therefore, what
they would do is take a 2x2 pixel block and average all the color values into
one value to get a decrease to 1/4 the original number of color samples. They
then encoded these two extra fields into the TV signal such that a normal
black and white TV would just ignore them. Color TVs were
also built with three electron guns, one each for red, green and blue; the
set converts the black and white signal plus the two new fields into those
three colors. The guns work in unison to recreate
a color image on the screen.
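Here is a small Python sketch of the two ideas above: turning a color pixel into one brightness value plus two color fields, and averaging a 2x2 block of a color field down to one value. The weights are the commonly quoted analog YUV coefficients; the function names and the list-of-rows layout are assumptions made for this example, not anything a TV or codec actually exposes.

```python
def rgb_to_yuv(r, g, b):
    """Split a pixel into brightness (Y) and two color fields (U, V)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # the black and white picture
    u = 0.492 * (b - y)                    # how far blue is from the brightness
    v = 0.877 * (r - y)                    # how far red is from the brightness
    return y, u, v


def subsample_2x2(plane):
    """Average each 2x2 block of a color plane into a single value."""
    out = []
    for row in range(0, len(plane) - 1, 2):
        out_row = []
        for col in range(0, len(plane[0]) - 1, 2):
            total = (plane[row][col] + plane[row][col + 1]
                     + plane[row + 1][col] + plane[row + 1][col + 1])
            out_row.append(total / 4)
        out.append(out_row)
    return out
```

A pure white or grey pixel gives U and V of (nearly) zero, which is why a black and white set can ignore the two extra fields and still show a sensible picture.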
Now, the AC outlets in our homes in the
US have a frequency of 60 Hz, so it is easiest to make the refresh rate a multiple
of this. That is why the standard frame rate in the US and
Japan is 30 fps (actually 29.97, and 30 for computers). This standard
was later called NTSC for "National Television System Committee".
Europe, on the other hand, has an AC frequency of 50 Hz, resulting in 25
fps, called PAL. France went off and made their own standard called
SECAM. Now, they were a little ambitious, and a problem arose with
bandwidth: they could not send a full frame on every refresh (60 a second);
they were limited to 30 fps.
The solution split the frame up into two parts.
The first part was composed of all the even lines and then half of the last
line (not definite that it was the last line) and the second part was all the
odd lines and the other half of that line; this is called an "interlaced
signal". Because the TV was drawing every other line, it was turning
the original 30 fps into 60 half-frames per second without the eye noticing. Now, monitors,
on the other hand, just move from the top of the frame to the bottom of the frame
drawing every line in order; this is called "Progressive Scanning".
This results in a finer image with better resolution than its interlaced counterpart
does.
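As a rough illustration of interlacing, the Python sketch below splits a frame into its even-line and odd-line halves and weaves them back together. It assumes a frame is just a list of rows with an even number of lines (so the half-line detail mentioned above is ignored), and the function names are my own.

```python
def split_fields(frame):
    """Split a frame (a list of rows) into its even-line and odd-line halves."""
    return frame[0::2], frame[1::2]


def weave(even_field, odd_field):
    """Interleave the two halves back into one full frame."""
    frame = []
    for even_row, odd_row in zip(even_field, odd_field):
        frame.append(even_row)
        frame.append(odd_row)
    return frame


frame = ["line 0", "line 1", "line 2", "line 3"]
even, odd = split_fields(frame)
rebuilt = weave(even, odd)  # identical to `frame` when nothing has moved
```

If the two halves come from slightly different moments in time and something moved in between, weaving them like this is exactly what makes the edges of moving objects look ragged on a progressive display.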
The dilemma: what happens when you output
an interlaced source to a non-interlaced display? Well, you get this
artifact called "combing effects". That is, if you look at the edges
of objects in motion you can actually see all the odd lines matching up and
all the even lines matching up, and it looks like a comb. Yeah, I know,
this is a little difficult to visualize, so here is an
example
of combing effects. You can look at the flag in the background to
see how a static object looks and how my roommate Mark looks when dancing. Fortunately
for us, we have DVDs, that great digital video. Well, unfortunately
the MPEG-2 standard for video is an interlaced signal, so if you buy a fancy
progressive scanning DVD player, it does not mean squat unless you have an HDTV.
That is why computer DVD decoder cards must have what is called a "combing
filter" that eliminates this artifact.
Now, the other problem with outputting
an NTSC signal to a digital screen, as in a monitor, is the actual resolution.
The standard dictates that there are 525 lines, but doesn't specify a horizontal
dimension. Nor is it safe to assume that there is just one pixel to
each line. The way the signal works out, it has a resolution of 720x485
at 30fps (CCIR-601). However, the MPEG standard works on a matrix
the size of 16x16 pixels. As you can clearly see, 485 is not divisible
by 16, so this presents a problem from the aspect of MPEG compression.
As a result, the resolution on a DVD is 720x480, which is divisible by 16.
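The arithmetic is easy to check. This tiny Python helper (the name is made up for the example) rounds a dimension down to the nearest multiple of the 16-pixel block size:

```python
def round_down_to_multiple(value, block=16):
    """Largest multiple of `block` that does not exceed `value`."""
    return (value // block) * block


height = round_down_to_multiple(485)  # 480, since 485 = 30 * 16 + 5
width = round_down_to_multiple(720)   # 720 is already 45 * 16
```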
Flask Mpeg: Global Project Options
This section will deal specifically with
setting up a good environment to encode a MPEG-2 video stream (DVD).
Each of the sections below refers to each of the option tabs.
Video:
Here are the optimal settings that I
have found through my and a friend's many DVD rips.
IEEE floating point - improves the quality of the video some,
but if you use this, you need to select 25fps and de-interlace and reconstruct
progressive images. For some reason, even after you de-interlace you
still get combing effects from the interlaced MPEG-2 signal, and changing the
frame rate helps eliminate this.
However, you probably want to go with MMX iDCT because it doesn't make that much
of a difference. Make sure the frame rate is at 29.97. Selecting
a lower frame rate does NOT improve compression or quality of the film; I
mention
why elsewhere in this document.
The following is the formula for the bitrate to use for video
encoding:
700 / filmlength * 1024 * 8
where filmlength is in seconds and 700 megs is the CD size.
You should get 1151;
then subtract 128 for audio, which leaves you with about 1000Kb/sec
(round down).
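The bitrate formula above, written out as a rough Python sketch. The function names are my own, and the 4980-second (83 minute) film in the usage lines is only a hypothetical length that happens to land near the 1151 figure; plug in your own film length.

```python
def total_bitrate_kbps(film_seconds, cd_megabytes=700):
    """Total bitrate (Kb/sec) that fills the CD exactly: megs -> K -> Kbits."""
    return cd_megabytes * 1024 * 8 / film_seconds


def video_bitrate_kbps(film_seconds, cd_megabytes=700, audio_kbps=128):
    """What is left for the video once the audio stream is subtracted."""
    return total_bitrate_kbps(film_seconds, cd_megabytes) - audio_kbps


total = total_bitrate_kbps(4980)   # a little over 1151 Kb/sec
video = video_bitrate_kbps(4980)   # a little over 1023 Kb/sec, so round down
```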
A button here says show output pad. What this does is show you an actual frame from the MPEG-2 stream. Since these are made for a TV, they include the widescreen bars. These are not necessary for digital versions. Therefore, you want to select crop and move the window around until there are no black bars on the top and bottom. Simplified: reduce the height and move the image up with the buttons on the left. Before you start doing any of this, make sure you hit the reset settings button. If you do not, settings left over from a previously encoded file could screw up the aspect ratio (ratio of width to height). Though cropping seems like a trivial task, it actually is a significant hit to the overall compressing stage, but then again you are not wasting bits on encoding the black bars.
Audio:
Not much to say here, it is straightforward.
Just for reference, unless you have a top of the line audio card, like the
Sound Blaster Live!, 48KHz will not be heard on your sound card. 48KHz
is better quality because it represents a larger range of frequencies.
I had confused the previous term with bit resolution. If the audio is
32-bit, it sounds truer to its analog original than 24 or 16 bit. The
way it works, it divides the vertical wave values up into 2^32 (in the case
of 32-bit) blocks. If you have ever taken calculus, it is equivalent
to a step function. What it is doing is finding the closest digital
step value to the analog one. As you can see, the fewer bits you use,
the larger the distance between the values and more of the signal is lost.
The bit resolution affects the total size of the wave in the respect that
a 32-bit number would be larger than a 16-bit number, and hence a larger file
size.
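The "step function" idea can be sketched in a few lines of Python. This assumes the analog sample has been scaled to the range -1.0 to 1.0 and that the steps are evenly spaced; the function name is made up for the example.

```python
def quantize(sample, bits):
    """Snap a sample in [-1.0, 1.0] to the nearest of 2**bits evenly spaced steps."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)           # distance between neighbouring steps
    index = round((sample + 1.0) / step)
    return index * step - 1.0


coarse = quantize(0.3, 2)    # only 4 steps available, lands on about 0.333
fine = quantize(0.3, 16)     # 65536 steps, lands very close to 0.3
```

The gap between `coarse` and the real value is the part of the signal that is lost, and it shrinks as the number of bits grows.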
If your sound card does not support 48KHz,
it will be down sampled to 44.1KHz.
Post-Processing:
I frankly have no idea, but choose
the one that says highest quality. The other buttons and tabs here relate
to the cropping of the video discussed in the video section.
Other Forms of Compression: What, DVD Video is not the only thing that is compressed?
This section will serve as a continuing study in Data Compression. The topics I will discuss in here are a little more complicated and actually pertain to other forms of compression, Lossy (MPEG) and Lossless (where no information is lost). Now, I also see the need to talk about other multimedia topics in general to give you an idea of exactly what is happening.
AUDIO:
Well, there are many different
kinds of audio compression, and I have already mentioned MPEG-1 and MPEG-2
audio. I will now discuss what is entailed in MPEG-1 audio,
more specifically layer-3. I will also discuss what I at least believe are
the three major audio formats out there: PCM, DPCM, and ADPCM.
MP3: The Nitty-Gritty
We'll begin by stating that mp3 is
a lossy compression technique, but the quality is so good that the average
listener cannot tell the difference between the original and the compressed version.
It works by removing any audio information you would not be able to hear anyways.
Hence, it is a compression model based on what the output is going to be.
There have been many tests on the psychoacoustics of the human ear that have
given rise to mp3. To begin, the ear is more sensitive to certain frequencies
than to others; what I mean is that you cannot hear a certain frequency
until it passes a certain threshold (amplitude). With anything below that threshold eliminated,
the compressor proceeds to analyze the signal for what are called "masked
frequencies". Since we are dealing with a biological analog to digital
converter, your ear and everything in it, it takes some time to reset itself.
During this time, if something of lower amplitude than what you
just heard comes along, you cannot hear it. Now here is the freaky thing: the masking
also happens before the louder sound is heard, but on a much smaller time
scale than the masking that occurs after the dominant sound. The compressor
analyzes the audio for these special occurrences and removes them, resulting
in a very compressed file with minimal loss in sound. Of course, the
obvious drawback here is the model of the ear: the encoder uses a general
model, when every ear is slightly different.
PCM: Wave Files
These are straightforward actually, except
for the last one. PCM stands for Pulse Code Modulation and it is the
typical format of audio on CDs and in standard wave files. A better version
is Differential PCM, or DPCM. DPCM works by taking the current value
and only storing the difference between itself and the number that precedes
it. For instance, if the string of numbers was 5 8 9, then the DPCM values
would be 5 3 1. Now, this results in a lossless compression scheme
where bits are saved by using only one bit for the nine instead of the normal
4. ADPCM is adaptive DPCM: it tries to predict what the next value is
going to be, and then stores how far off it actually was.
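The 5 8 9 example can be run through a minimal DPCM sketch in Python. The function names are my own, and it assumes the value "before" the first sample is 0, so the first sample is stored as-is.

```python
def dpcm_encode(samples):
    """Store each sample as its difference from the one before it."""
    encoded = []
    previous = 0  # assumed starting value, so the first sample passes through
    for sample in samples:
        encoded.append(sample - previous)
        previous = sample
    return encoded


def dpcm_decode(differences):
    """Rebuild the samples by adding the differences back up."""
    samples = []
    previous = 0
    for diff in differences:
        previous += diff
        samples.append(previous)
    return samples


encoded = dpcm_encode([5, 8, 9])   # [5, 3, 1]
decoded = dpcm_decode(encoded)     # back to [5, 8, 9], nothing lost
```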
Digital Number Systems:
You are probably very well acquainted
with the decimal system (0, 1, 2, 3 ... 9); this is a system comprised
of ten digits. In the digital world, the number system is a little
different. There are currently three predominant systems used: binary
(0, 1), octal (0, 1 ... 7), and hexadecimal (Hexa or Hex for short) (0, 1,
... 9, A, B ... , F). Of the three, binary is the most used, and the
reason for this number system comes from the constraints of the hardware.
Computers are built pretty much entirely of transistors, which act like little light switches.
These switches have two positions, on (1) and off (0). If you look at
the power button on your computer or your monitor, you will see the one inside
the zero, so now you know where that symbol comes from. Now, the way
the system works is powers of 2, so the binary number 101 equals 5:
1*2^2 + 0*2^1 + 1*2^0 = 4 + 0 + 1 = 5. This is the same for the other two number systems,
with the bases of 8 and 16 respectively.
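The same sum of powers works for any of the three bases, as this short Python sketch shows. The function name is made up for the example (Python's built-in `int("101", 2)` does the same job).

```python
def to_decimal(digits, base):
    """Add up digit * base**position, counting positions from the right."""
    total = 0
    for power, ch in enumerate(reversed(digits)):
        value = int(ch, 16)  # reads 0-9 and A-F as 0-15
        if value >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        total += value * base ** power
    return total


binary_example = to_decimal("101", 2)   # 1*2^2 + 0*2^1 + 1*2^0 = 5
octal_example = to_decimal("17", 8)     # 1*8^1 + 7*8^0 = 15
hex_example = to_decimal("FF", 16)      # 15*16^1 + 15*16^0 = 255
```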
Text in the computer is usually stored as ASCII,
in which every character, printable (like letters) and non-printable (tabs
and returns), is coded with a value from 0-255 in binary (strictly, standard
ASCII covers 0-127 and extended character sets use the rest). This means
each character is represented by 8 bits, since 2^8 = 256; a group of 8 bits is
called a "byte". In pictorial languages like Chinese, each
Chinese character is represented by more than one byte.
There also exists a system of names for
groups of more than one bit, to simplify things.
4 bits = 1 nibble
8 bits = 2 nibbles = 1 byte (aren't computer science people
funny?)
1024 bytes = 1 kilobyte
and so the system progresses, where the next tier is always
1024 of the previous tier. The next three progressions are megabyte,
gigabyte, and terabyte. Watch out here: hard drive companies have
gotten into the nasty habit of using the decimal meaning of the prefixes here
to make their hard drives look bigger than they are. For them 1 billion
bytes equals 1 gigabyte, which is about 7% less than 1024^3 bytes.
The reason it is 1024 is because it too is a binary
number, 2^10.
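To see how big the gap between the two meanings of "gigabyte" is, here is the arithmetic as a short Python sketch. The function name and the 40 GB drive size are just made up for the example.

```python
def advertised_to_binary_gb(advertised_gb):
    """Convert billions of bytes into gigabytes of 1024^3 bytes."""
    return advertised_gb * 10**9 / 2**30


one_gb = advertised_to_binary_gb(1)    # about 0.931
drive = advertised_to_binary_gb(40)    # a "40 GB" drive holds about 37.25
```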
Text Compression:
This was the first form of compression,
and it is designed to be a lossless technique, because what good is a paper if
it isn't an exact replica of the original? There are many forms of
text compression and they are relatively easier to understand than MPEG compression
schemes. Text compression is the root of everything: the last step in
any lossy compression technique is a lossless text compression scheme.
Run length encoding is just one example of a lossless compression scheme.
Run Length Encoder:
Well, way back when... someone noticed
that to write the number 1 as an 8-bit binary value took "00000001". As you can see,
the only important bit here is the least significant bit (the bit farthest right);
if only there was a way to chop off all those zeros. Well, there is,
and it is called Run Length Encoding. What happens is that the encoder
looks for repeats, at either the bit level or byte level. When it finds
a repeating character, it outputs some
flag or marker to tell the decoder it is not a standard number, then the number
of repetitions, and finally the repeated character. Therefore, the above
would change into something more like "F701", where 'F' is the flag, 7 is the
number of repeats, 0 is the repeated character, and the final 1 is stored as-is.
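Here is a minimal Python sketch of that flag / count / character layout, working on strings of characters. The details are my own assumptions for illustration: the flag is the letter "F" (so the input must not contain an "F"), runs are capped at 9 so the count fits in one digit, and runs shorter than 3 are left alone because flagging them would not save anything.

```python
FLAG = "F"  # assumed marker; the input text must not contain it


def rle_encode(text, min_run=3):
    """Replace runs of a repeated character with FLAG + count + character."""
    out = []
    i = 0
    while i < len(text):
        run = 1
        while i + run < len(text) and text[i + run] == text[i] and run < 9:
            run += 1
        if run >= min_run:
            out.append(f"{FLAG}{run}{text[i]}")
        else:
            out.append(text[i] * run)
        i += run
    return "".join(out)


def rle_decode(encoded):
    """Expand every FLAG + count + character triple back into the run."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == FLAG:
            out.append(encoded[i + 2] * int(encoded[i + 1]))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


packed = rle_encode("00000001")   # "F701"
unpacked = rle_decode(packed)     # "00000001"
```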
Dictionary Based Text Compression:
I will begin with the simplest of all
the dictionary types of encoding, static dictionary methods. The way
dictionary methods work in general is that the encoder sees a character and looks into
the dictionary to see if there is a match for just that first character. It
keeps extending the search until it reaches the longest possible string match
that is in the dictionary and encodes the value of that dictionary entry,
so that you get one number to represent an entire string.
In static or fixed dictionary methods, when the dictionary is filled you
stop adding entries into the dictionary. The dictionary is of a fixed
size for decoding purposes, and 4K (4096) is a common dictionary size because
it means you have 12-bit pointer lengths (2^12 = 4096).
There are many different schemes for
creating the dictionary. One popular method is to start the dictionary
with the alphabet (ASCII values 0-255 for example) and when you encode a
match you add the match plus the next character into the dictionary.
If the string to be encoded was "aba", the encoder would find that the
longest match would be "a" and add "ab" to the dictionary. The type
of dictionary created from this method is referred to as a prefix dictionary
because every word in the dictionary also has its prefix. For example,
if the string "hello" was in the dictionary, then "hell", "hel", "he", "h"
are also in the dictionary.
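The "start with the alphabet, add the match plus the next character" method is essentially what is known as LZW. Below is a rough Python sketch of the encoder side only. It assumes every character in the input has a value from 0-255, and the function name and the `max_entries` parameter are my own; once the dictionary is full it simply stops growing, as in the fixed dictionary case.

```python
def lzw_encode(text, max_entries=4096):
    """Encode text as a list of dictionary entry numbers (LZW style)."""
    # Start the dictionary with the single-character "alphabet".
    dictionary = {chr(code): code for code in range(256)}
    codes = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate          # keep extending the match
        else:
            codes.append(dictionary[current])
            if len(dictionary) < max_entries:
                dictionary[candidate] = len(dictionary)  # match + next char
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes, dictionary


codes, dictionary = lzw_encode("abab")
# codes is [97, 98, 256]: "a", "b", then entry 256 which stands for "ab"
```

Notice that the second "ab" costs a single number, because the first pass over "ab" already put it in the dictionary.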
Now in a dynamic dictionary method you
create the dictionary the same way, but when the dictionary is filled you have a removal
scheme that decides which entries to remove; Least Recently Used (LRU) is the
most common.
An interesting adaptation of dynamic
dictionary uses the text before the match as the dictionary. These
methods are called sliding window methods in which there is a window of size
4K. This means from the character you want to encode, the window extends
for 4K characters back in the file. You then search for the longest
match for the string in the window. What you encode is the offset,
to denote where in the window the match begins, and the length of the
match.
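The search step of a sliding window method can be sketched in Python as below. This is a brute-force illustration with made-up names, not how a fast compressor would do it; it returns the offset (how far back the match starts) and the length, with (0, 0) meaning no match was found in the window.

```python
def longest_match(data, pos, window=4096):
    """Find the longest match for data[pos:] that starts inside the
    previous `window` characters. Returns (offset, length)."""
    start = max(0, pos - window)
    best_offset, best_length = 0, 0
    for candidate in range(start, pos):
        length = 0
        # The match is allowed to run past `pos` into the text being encoded.
        while (pos + length < len(data)
               and data[candidate + length] == data[pos + length]):
            length += 1
        if length > best_length:
            best_offset, best_length = pos - candidate, length
    return best_offset, best_length


match = longest_match("abcabcabc", 3)   # (3, 6): go back 3, copy 6 characters
```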
Linear Predictors in Audio or Text:
A linear predictor is a means
by which you use previously seen data samples to make a "prediction"
of the next sample. The number of data samples you look at to make your
prediction is called the "order" of the prediction. For example, second
order prediction looks at the previous two samples to guess the next sample.
This specific example is called backwards prediction and is the most
commonly used in compression because it requires the least amount of side information
to be used during decompression. In this case the side information
is the "quantization error", or difference between the prediction and the
actual value. The way you derive the prediction from the previous values
is by assigning each value a "weight" that says how much influence it has in the
prediction. Weights can be calculated in 2 ways: global average weights,
where one set of weights is used for the whole signal, or per-sample weights,
where each sample gets its own set of N
weights, where N is the order of prediction. Obviously, the
second one produces better results, but requires more processing. In
the latter, the weights for each sample used in the prediction would all add
up to approximately 1. You then just multiply each sample's weight by
its value and add the products together to get the
prediction.
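Here is a small Python sketch of a backwards predictor using one global set of weights. The names are my own, and the weights 2 and -1 in the usage lines are just an assumed second order example (they add up to 1 and amount to "continue the straight line through the last two samples").

```python
def predict(history, weights):
    """Weighted sum of the most recent samples.
    weights[0] applies to the most recent sample."""
    recent = history[-len(weights):][::-1]
    return sum(w * s for w, s in zip(weights, recent))


def residuals(samples, weights):
    """Store the first `order` samples as-is, then only how far off
    each prediction was."""
    order = len(weights)
    out = list(samples[:order])
    for i in range(order, len(samples)):
        out.append(samples[i] - predict(samples[:i], weights))
    return out


errors = residuals([1, 2, 3, 4, 5], [2, -1])   # [1, 2, 0, 0, 0]
```

When the prediction is good, the stored errors are small numbers (here zeros), which take far fewer bits than the original samples.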
Last updated 2/6/01