Compression






Related Topics:
Watermark Robustness Against Data Compression
Expect to see my formal paper on my algorithm soon.  With a lot of luck, it will be published at this year's Data Compression Conference, after which I will post it up here.
 

Table of Contents:
1.  The DivX ;-) CoDec
2.  MPEG Compression: What Does It Mean?
3.  MP3s
4.  TV vs. CRT
5.  Flask Mpeg: Global Project Options (Info here is only applicable to flaskmpeg and divx3.11)
6.  Other Forms of Compression
7.  MP3: The Nitty-Gritty
8.  PCM: Wave Files
9.  Number Systems
10.  Text Compression
11.  Run Length Encoder
12.  Dictionary Based Encoding
13.  Linear Predictors in Audio or Text


    I've decided to create a kind of motley crew of a resource for those of you out there who are interested in compression techniques and the like.  The primary focus is to give you the knowledge necessary to make educated decisions if you ever do video editing.  Its second purpose is to go through certain software I find particularly helpful and describe its use from my point of view.  This page will serve as a rough guide for those out there who are trying to understand compression and the multimedia aspects related to it.  First of all, I apologize for my English; it is bad, I know.  Well, enough said... on to the knowledge.  I also apologize for the seemingly disconnected path this readme takes.  Trust me, the end result will explain Flask, but I wish to give you the background information that will allow you to make your own educated decisions about which settings to use now and in the future.  I am planning to add some examples of the topics discussed on this site.  I also plan to re-arrange this site with a table with targets.  If after reading this you have any questions about compression, just drop me a line and I will do my best to answer your question.  This is good for me because I can post your question here to make the readme more thorough.

An Introduction to Flask Mpeg:
    We will begin in the "Output format options" selection on the menu.  In here, you can choose between the available codecs (COmpressor DECompressor) on your computer.  For the video aspect, I am assuming that everyone will be using the DivX codec, so I will begin with some basic knowledge on audio and video compression.

Video Codec:  The Two DivX Codecs
    As you will see, there are two DivX codecs, a low-motion and a high-motion codec.  The codecs are not as clear-cut as their names imply.  If you have read the latest article on Tom's Hardware, it is shit... Here is my rebuttal to his article.  Anyways, I will continue my discussion here.  The major difference between the two codecs is that the low-motion codec uses a static bitrate and the high-motion codec uses a variable bitrate.  What this means is that the high-motion codec will use fewer bits to encode a low motion scene than a high motion scene.  This improves quality for the high motion sequences because the bitrate you specify is conserved for use on these rapidly changing scenes.  However, two problems arise from this.  The first is that the low motion scenes see the compromise, and unfortunately, this is where quality counts too.  A poor-looking slow scene really shows, and while the high motion scenes may look better, they are usually few and far between; this is my personal view from my experience.  The second hit comes from the fact that if you set the bitrate too high, the codec may go up to that upper bound, and then some slower computers may not be able to keep up in processing all of this data and performance will be sacrificed, recreating the problem you were trying to avoid in the first place.  Furthermore, the bitrate for the low motion scenes is calculated by an internal algorithm in the codec that encodes with the smallest bitrate achievable for a certain threshold.  So your low-motion scenes will look like shit.  You cannot specify this threshold in 3.11, but you can in 4.01 of the DivX codec.

MPEG Compression: What Does it Mean?

    MPEG is an acronym for the "Moving Picture Experts Group"; this group goes about setting standards for digital video and audio.  The group's standards include MPEG-1, MPEG-2 (DVD compression), MPEG-4, MPEG-7, and MPEG-21.  MPEG-4 is going to be very useful for gamers as it is based on splitting the picture up into specific objects.  I have not seen this feature implemented anywhere yet, but expect to see it later this year.  MPEG-2 is the best quality but uses a high bandwidth, and MPEG-4 was made to cover a broader scope than MPEG-2.
    MPEG-1 video was typically limited to 352 x 240 pixels at 30 fps (frames per second), or 352 x 288 pixels at 25 fps, of a de-interlaced signal.  Because MPEG-2 was made specifically for high bandwidth systems, it has scalability built in and supports any resolution.  However, for some odd reason (odd at least to me) the MPEG-2 standard specifies an interlaced signal.  Hardly what one would expect for a high bandwidth.  Though, these 15 individual frames can be encoded with twice the de-interlaced bitrate because there are half as many unique frames to share the overall bitrate.  Here is the problem, though.  When you do MPEG encoding on an interlaced signal, MPEG compression just detects movement of a group of pixels.  But since an interlaced signal skips every other line, the difference in pixel value (an integer specifying the color) will be greater when comparing pixel-to-pixel neighbors that are not on the same line.  This can cause some error, resulting in a less effective retention of the original image.  Either way, DVDs and HDTVs use either 60 or 120 half-frames per second, which I have been told shows no artifacts in an interlaced signal.
    Many people believe that if they change the frame rate from NTSC's 29.97 fps to PAL's 25 fps, they can compress the movie even more because there are fewer frames to be encoded.  (The original black and white standards were 30 and 25 fps respectively.  However, adding colour to NTSC required slowing the rate by about 0.1%, which is why its true frame rate is a little less than 30.)  This is not entirely true.  When MPEG monitors the movement of a 16x16 block of pixels, it registers the movement as a "motion vector" and that is what is stored.  If you change the frame rate, this is what the computer has to do.  First, it needs to construct the 30 original frames, and then it needs to divide those 30 frames into 25.  It is easy to see that 30 divided by 25 is not a happy number, but this will be useful in a specific case in MPEG-2 encoding.  Anyways, changing the frame rate results in bigger motion vectors, because objects move farther between frames, and hence more information that needs to be stored.  You will also increase the amount of computation required on the decoding end because there will be larger values for each frame, though you gain a little time for the decoder to make each frame, since it only needs to draw one every 1/25 of a second instead of every 1/30.
    Now here is some more of the nitty gritty of MPEG video compression... what it is actually doing behind the scenes.  You would think that MPEG would emulate what a film projector does and actually store every single frame of the video.  It does, but not all frames are created equal.  There are three major types of frames (there are one or two others that are rarely discussed): I, P, and B.  The I frame is the most important and has maybe a 2:1 compression ratio, the P frames are the worst quality as they have maybe 8:1 or 16:1, while B frames are only 4:1.  The reason for this is similar to differential encoding, where the next value is stored in terms of how it differs from the previous one.  In this case the I frame is the core image (also inserted to prevent errors from propagating through the entire video) and the P frames are the differences.  The way it works is in 16x16 blocks (macroblocks): the pixels in each block are transformed with the Discrete Cosine Transform (DCT), which is applied to 8x8 sub-blocks, and the resulting coefficients are quantized.  This is a lossy form of compression, but it tries to take advantage of the peculiarities of what the human eye can see (though it is not a psycho-physical model of the human visual system).  This is essentially how JPEG pictures work, if you were wondering.  Anyways, that is how the first frame is encoded.  The next frame is a P-Frame.  Here the encoder looks for 16x16 blocks whose values match (or nearly match) a block in the I-Frame and creates a "motion vector" to tell the decompressor how far this 16x16 block has moved from whence it was.  The reason it is a vector is that it has a direction and a magnitude (number of pixels).  It then encodes just the motion vector for that block.
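    To make the motion vector idea a little more concrete, here is a minimal sketch in Python.  It is not how any real MPEG encoder is written; it is just a brute-force search, assuming frames are plain lists of rows of grey values.  The names find_motion_vector, sad, and the search range are my own inventions for illustration.  The vector returned points from the block in the current frame back to where the best match sits in the reference frame.

```python
def sad(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
    return total


def find_motion_vector(reference, current, x, y, size=16, search=4):
    """Search the reference frame around (x, y) for the block that best
    matches the block at (x, y) in the current frame.

    Returns (dx, dy, error): the offset to the best match in the
    reference frame and its sum of absolute differences.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            # Skip candidate blocks that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            err = sad(current, reference, x, y, rx, ry, size)
            if best is None or err < best[2]:
                best = (dx, dy, err)
    return best
```

    An error of 0 means the block was found unchanged, so only the vector needs to be stored.  A real encoder would also store the leftover difference when the match is not exact.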

Audio Codec:
    For audio, I recommend Radium's mp3 codec, at 44.1KHz (or 48KHz if a DVD) at 128Kb/sec; notice that a lower case 'b' means bits, not Bytes.  The difference is that there are eight bits in a Byte.  Now here is a little information on the mp3 encoding standard (good news, LSF is coming out with a newer version later this year, but it may have digital rights protection built in).

MP3:
    MP3 means MPEG-1 Layer-3.  MPEG-1 audio has 3 layers: Layer-3, which most people use; the less frequently used Layer-2 (MP2), which can be found with MPEG movies and VCDs; and lastly Layer-1.  The difference in the compression is this.  Using mp3, the bitrate for the "untrained ear" not to notice a difference is 128kbps, while for MP2 you need 160kbps.  That is the difference between the layers.  I find it incredibly wasteful to see a bitrate of 320kbps for an mp3, when not even half of that is fine for most users.  However, most of you probably have not even heard of mp2s; well, it is the predecessor to mp3.  To continue, the technology built into the different layers is such that a player that handles Layer-3 can also play Layer-1 and Layer-2 audio.  This is the backward compatibility that we have all come to enjoy.
    Is MP3 the best compression scheme to use for all audio?  No, MP3 is good for music where shit is all over the audio spectrum.  There are specific compressors that are made for voice, and they do a hell of a lot better than mp3, compression wise.  For example, your cell phone uses this technology, and that is why you sound a little different on a cell phone.  It can only handle voice, and when there is background noise it gets all fucked up.
    For point of reference, the audio on a DVD uses the MPEG-2 (note: not layer 2) audio standard.  MPEG-1 audio only supports two channels, which is enough to carry matrixed Dolby Surround, while MPEG-2 audio supports multichannel (5.1) sound and actually has multiple discrete channels.

TV vs. CRT:  The Standards of Each and How this Affects Digital Compression

    Well, because of technological limitations we are now faced with two different standards for how video is displayed.  One is for television, camcorders and DVDs and the other is for computer monitors (CRTs, or Cathode Ray Tubes) and HDTV.

TV: A Brief History
    Well... we all know the first TV was black and white, but I'm sure many of you have wondered how a black & white TV and a color TV can work off the same signal.  Well, here you go... Back when the TV signal was being standardized in the 1930s, they chose to send an image composed of 525 horizontal lines (if you look real hard you can see 'em).  The new DTV standard is 1080 scan lines with 1920 pixels per line.  This is about 5.5x as many pixels as a standard TV.  The digital TV that most people have today, usually in conjunction with their cable modem, is not HDTV as some would think.  It is a lower subset of DTV, and its resolution is 1280x720.  Here is the kicker: since HDTV uses MPEG-2 compression, the only difference between your regular TV and an HDTV, besides the different aspect ratios, is that the HDTV has a built-in MPEG-2 video decoder chip, which is also present in all set-top DVD players.  So I ask, why can't you use the same chip to do both?  (Think of computers that have both a DVD decoder card and a TV card, but have no HDTV support.)  This would also imply that HDTVs with built-in DVD players should be cheaper than buying both individually because there are shared aspects of the hardware.
    The glass of the first TVs had a phosphor coating, and when it was struck by the beam from the electron gun in the back of the tube the phosphor would glow a certain shade from white to black.  Now, when color TVs came out, the industry said that all new TVs must be compatible with the current black and white signal.  What came of this was a video format called YUV.  They discovered that if you take a black and white image you could colorize it by adding a shade of purple/pink and another of blue.  They then realized that if they only encoded 1/4 of the original image color, there would be no noticeable difference to the human eye.  Therefore, what they would do is take a 2x2 pixel block and average all of its color values into one value, decreasing the color information to 1/4 of the original number of pixels.  They then encoded these two extra fields into the TV signal such that a normal black and white TV would just ignore them.  Color TVs were also built with three electron guns, one each for red, green, and blue; the set combines the black and white signal with the two new fields to work out how hard to drive each gun.  The guns work in unison to recreate a color image on the screen.
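    The 2x2 averaging step described above can be sketched in a few lines of Python.  This is only an illustration, assuming one color field is stored as a list of rows of integers with even width and height; the function name subsample_2x2 is made up for this example.

```python
def subsample_2x2(plane):
    """Average each 2x2 block of a colour plane into a single value,
    leaving 1/4 of the original number of values."""
    out = []
    for y in range(0, len(plane) - 1, 2):
        row = []
        for x in range(0, len(plane[0]) - 1, 2):
            total = (plane[y][x] + plane[y][x + 1]
                     + plane[y + 1][x] + plane[y + 1][x + 1])
            row.append(total // 4)  # integer average of the four pixels
        out.append(row)
    return out
```

    The black and white (brightness) field is left at full resolution; only the two color fields would be shrunk this way.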
    Now, the AC outlets in our homes in the US have a frequency of 60 Hz, so it is easiest to make the refresh rate a multiple of this.  That is why the standard frame rate in the US and Japan is 30 fps (actually 29.97, and 30 for computers).  This standard was later called NTSC for "National Television System Committee".  Europe, on the other hand, has an AC frequency of 50 Hz, resulting in a 25 fps standard called PAL.  France went off and made their own standard called SECAM.  Now, the designers were a little ambitious, and a problem arose with bandwidth: they could not send 30 fps; they were limited to 15 fps.  The solution split the frame up into two parts.  The first part was composed of all the even lines and then half of the last line (I am not definite that it was the last line) and the second part was all the odd lines and the other half of that line; this is called an "interlaced signal".  Because the TV was drawing every other line, it was turning an original 15 fps into 30 fps without the eye noticing.  Now, monitors on the other hand just move from the top of the frame to the bottom of the frame drawing every line in order; this is called "Progressive Scanning".  This results in a finer image with better resolution than its interlaced counterpart.
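    Here is a small Python sketch of the even/odd split, just to show the idea.  It assumes a frame is a list of scan lines with an even number of lines, and it ignores the half-line business mentioned above.  The names split_fields and weave are my own.

```python
def split_fields(frame):
    """Split a frame into its even-line field and its odd-line field."""
    even_field = frame[0::2]  # lines 0, 2, 4, ...
    odd_field = frame[1::2]   # lines 1, 3, 5, ...
    return even_field, odd_field


def weave(even_field, odd_field):
    """Put two fields back together into one progressive frame."""
    frame = []
    for even_line, odd_line in zip(even_field, odd_field):
        frame.append(even_line)
        frame.append(odd_line)
    return frame
```

    If the two fields were captured at different moments, weaving them back together like this is exactly what produces mismatched edges on anything that moved in between.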
    The dilemma: what happens when you output an interlaced source to a non-interlaced display?  Well, you get an artifact called "combing effects".  That is, if you look at the edges of objects in motion you can actually see all the odd lines matching up and all the even lines matching up, and it looks like a comb.  Yeah, I know, this is a little difficult to visualize, so here is an example of combing effects.  You can look at the flag in the background to see how a static object looks and at how my roommate Mark looks when dancing.  Fortunately for us, we have DVDs, that great digital video.  Well, unfortunately the MPEG-2 standard for video is an interlaced signal, so if you buy a fancy progressive scanning DVD player, it does not mean squat unless you have an HDTV.  That is why computer DVD decoder cards must have what is called a "combing filter" that eliminates this artifact.
    Now, the other problem with outputting an NTSC signal to a digital screen, as in a monitor, is the actual resolution.  The standard specifies that there are 525 lines, but doesn't specify a horizontal dimension.  Nor is it safe to assume that there is just one pixel to each line.  The way the signal works out, it has a resolution of 720x485 at 30fps (CCIR-601).  However, the MPEG standard works on a matrix of 16x16 pixels.  As you can clearly see, 485 is not divisible by 16, so this presents a problem from the aspect of MPEG compression.  As a result, the resolution on a DVD is 720x480, which is divisible by 16.

Flask Mpeg: Global Project Options
    This section will deal specifically with setting up a good environment to encode a MPEG-2 video stream (DVD).  Each of the sections below refers to each of the option tabs.

Video:
    Here are the optimal settings that I have found through my and a friend's many DVD rips.
IEEE floating point - improves the quality of the video some, but if you use this, you need to select 25fps and de-interlace and reconstruct progressive images.  For some reason, even after you de-interlace you still get combing effects from the interlaced MPEG-2 signal, and changing the frame rate helps eliminate this.
However, you probably want to go with MMX iDCT because it doesn't make that much of a difference.  Make sure the frame rate is at 29.97.  Selecting a lower frame rate does NOT improve compression or quality of the film; I mention why elsewhere in this document.

The following is the formula for the bitrate (in Kb/sec) to use for video encoding:
700 / filmlength * 1024 * 8

where filmlength is in seconds and 700 is the CD size in megabytes.
For a film of roughly 83 minutes (about 4980 seconds), you should get about 1151.
Then subtract 128 for audio, which leaves you with about 1000Kb/sec; round down...
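
If you would rather let the computer do the arithmetic, the formula above can be sketched in Python like this.  The function name video_bitrate_kbps and its default values are just my choices for illustration; the 4982 seconds in the usage line is an assumed film length that happens to give the 1151 figure.

```python
def video_bitrate_kbps(film_seconds, cd_megabytes=700, audio_kbps=128):
    """Video bitrate (Kb/sec) that fits a film plus audio on one CD.

    cd_megabytes * 1024 * 8 turns the CD size into kilobits; dividing
    by the film length gives the total bitrate, and the audio bitrate
    is then subtracted. The result is rounded down.
    """
    total_kbps = cd_megabytes * 1024 * 8 / film_seconds
    return int(total_kbps - audio_kbps)


# An assumed film of 4982 seconds (about 83 minutes):
print(video_bitrate_kbps(4982))
```

For a longer film the number drops quickly; a 90 minute film (5400 seconds) leaves only about 933Kb/sec for video.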

    A button here says show output pad.  What this does is show you an actual frame from the MPEG-2 stream.  Since these are made for a TV, they include the widescreen bars.  These are not necessary for digital versions.  Therefore, you want to select crop and move the window around until there are no black bars on the top and bottom.  Simplified, reduce the height and move the image up with the buttons on the left.  Before you start doing any of this, make sure you hit the reset settings button.  If you do not, settings left over from a previously encoded file could screw up the aspect ratio (ratio of width to height).  Though cropping seems like a trivial task, it is actually a significant hit to the overall compression stage, but then again you are not wasting bits on encoding the black bars.

Audio:
    Not much to say here, it is straightforward.  Just for reference, unless you have a top of the line audio card, like the Sound Blaster Live!, 48KHz will not be played back as such by your sound card.  48KHz is better quality because it represents a larger range of frequencies.  Do not confuse the sample rate with bit resolution, as I once did.  If the audio is 32-bit, it sounds truer to its analog original than 24 or 16 bit.  The way it works, it divides the vertical wave values (the amplitude) up into 2^32 (in the case of 32-bit) steps.  If you have ever taken calculus, it is equivalent to a step function.  What it is doing is finding the closest digital step value to the analog one.  As you can see, the fewer bits you use, the larger the distance between the values and the more of the signal is lost.  The bit resolution affects the total size of the wave file in that a 32-bit number takes up more space than a 16-bit number, hence a larger file size.
    If your sound card does not support 48KHz, it will be down sampled to 44.1KHz.
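    The bit resolution idea (finding the closest digital step to the analog value) can be sketched in Python as follows.  This is only an illustration, assuming samples are floating point numbers between -1.0 and 1.0 and that the steps are evenly spaced; the function name quantize is made up here.

```python
def quantize(sample, bits):
    """Snap a sample in [-1.0, 1.0] to the nearest of 2**bits
    evenly spaced step values."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)              # distance between neighbouring steps
    index = round((sample + 1.0) / step)   # which step is closest
    return -1.0 + index * step


# The same analog value at two bit resolutions:
print(quantize(0.3, 4))
print(quantize(0.3, 16))
```

    With 4 bits the steps are far apart, so the stored value lands noticeably off the original; with 16 bits the error is tiny.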

Post-Processing:
    I frankly have no idea, but choose the one that says highest quality.  The other buttons and tabs here relate to the cropping of the video discussed in the video section.

Other Forms of Compression: What, DVD Video is not the only thing that is compressed?

    This section will serve as a continuing study in Data Compression.  The topics I will discuss in here are a little more complicated and actually pertain to other forms of compression, both Lossy (like MPEG) and Lossless (where no information is lost).  Now, I also see the need to talk about other multimedia topics in general to give you an idea of exactly what is happening.

AUDIO:
    Well, there are many different kinds of audio compression, and I have already mentioned MPEG-1 and MPEG-2 audio.  I will now discuss what is entailed in MPEG-1 audio, more specifically layer-3.  I will also discuss the three major (at least I believe so) audio formats out there: PCM, DPCM, and ADPCM.

MP3:  The Nitty-Gritty
    We'll begin by stating that mp3 is a lossy compression technique, but the quality is so good that the average listener cannot tell the difference between the original and the compressed version.  It works by removing any audio information you would not be able to hear anyway.  Hence, it is a compression model based on what the output is going to be.  There have been many tests on the psychoacoustics of the human ear that have given rise to mp3.  To begin, the ear is more sensitive to certain frequencies than to others; what I mean is that you cannot hear a certain frequency until it passes a certain threshold (amplitude).  With the inaudible frequencies eliminated, the compressor proceeds to analyze the signal for what are called "masked frequencies".  Since we are dealing with a biological analog to digital converter, your ear and everything in it, it takes some time to reset itself.  During this time, if something of lower amplitude than what you just heard comes along, you cannot hear it.  Now here is the freaky thing: the masking also happens before the louder sound is heard, but on a much smaller time scale than the masking that occurs after the dominant sound.  The compressor analyzes the audio for these special occurrences and removes them, resulting in a very compressed file with minimal loss in sound.  Of course, the obvious drawback here is the model of the ear: the standard uses a general model, when every ear is slightly different.
 

PCM: Wave Files
    These are straightforward actually, except for the last one.  PCM stands for Pulse Code Modulation and it is the typical format of audio on CDs and in standard wave files.  A better version is Differential PCM or DPCM.  DPCM works by taking the current value and only storing the difference between it and the number that precedes it.  For instance, if the string of numbers was 5 8 9, then the DPCM values would be 5 3 1.  Now, this results in a Lossless compression scheme where bits are saved because the final value can be stored as a difference of 1, which needs only one bit, instead of as 9, which normally needs four.  ADPCM is adaptive DPCM; it tries to predict what the next value is going to be, and then stores how far off the prediction actually was.
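    The DPCM example above (5 8 9 becoming 5 3 1) can be sketched in Python like so.  The function names are mine, and real DPCM works on audio samples rather than a toy list, but the idea is the same.

```python
def dpcm_encode(samples):
    """Store each sample as its difference from the previous one.
    The first sample is stored as a difference from 0."""
    deltas = []
    previous = 0
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas


def dpcm_decode(deltas):
    """Rebuild the samples by adding the differences back up."""
    samples = []
    total = 0
    for delta in deltas:
        total += delta
        samples.append(total)
    return samples


print(dpcm_encode([5, 8, 9]))  # [5, 3, 1]
print(dpcm_decode([5, 3, 1]))  # [5, 8, 9]
```

    Since decoding gives back exactly what went in, nothing is lost; the savings come from the differences usually being smaller numbers than the samples themselves.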

Digital Number Systems:
    You are probably very well acquainted with the decimal system (0, 1, 2, 3 ... 9); this is a system comprised of ten digits.  In the digital world, the number system is a little different.  There are currently three predominant systems used: binary (0, 1), octal (0, 1 ... 7), and hexadecimal (Hexa or Hex for short) (0, 1, ... 9, A, B ... , F).  Of the three, binary is the most used, and the reason for this number system comes from the constraints of the hardware.  Computers are built pretty much entirely of transistors, which act like little light switches.  These switches have two positions, on (1) and off (0).  If you look at the power button on your computer or your monitor, you will see the one inside the zero, so now you know where that symbol comes from.  Now, the way the system works is in powers of 2, so the binary number 101 equals 5: 1*2^2 + 0*2^1 + 1*2^0.  The other two number systems work the same way, with bases of 8 and 16 respectively.
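    The powers-of-the-base idea can be sketched in Python.  Python already has a built-in int("101", 2) that does this, so the function below (to_decimal, a name I made up) is only there to show the arithmetic.

```python
def to_decimal(digits, base):
    """Convert a string of digits in the given base (2, 8, or 16)
    to an ordinary decimal integer."""
    value = 0
    for ch in digits:
        digit = int(ch, 16)  # handles 0-9 and A-F
        if digit >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        # Shift what we have so far up one place, then add the new digit.
        value = value * base + digit
    return value


print(to_decimal("101", 2))   # 5
print(to_decimal("17", 8))    # 15
print(to_decimal("FF", 16))   # 255
```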
    Text in the computer is stored as ASCII, in which every character, printable (like letters) and non-printable (tabs and returns), is coded with a value from 0-255 in binary.  This means each character is represented by 8 bits, since 2^8 = 256; a group of 8 bits is also called a "byte".  In pictorial languages like Chinese, each character is represented by more than one byte.
    There also exists a system for naming groups of more than one bit, to simplify things.
4 bits = 1 nibble
8 bits = 2 nibbles = 1 byte (aren't computer science people funny?)
1024 bytes = 1 kilobyte
and so the system progresses, where the next tier is always 1024 of the previous tier.  The next three progressions are megabyte, gigabyte, and terabyte.  Watch out here: hard drive companies have gotten into the nasty habit of using the decimal (metric) meaning of the prefixes to make their hard drives look bigger than they are.  For them 1 billion bytes equals 1 gigabyte; as you can see, there is a gross margin of error in their rounding (about 7% at the gigabyte level).  The reason it is 1024 is that it too is a binary number, 2^10.
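
To see how big the difference is, here is a tiny Python sketch.  The function name and the 40 gigabyte drive are just an assumed example, not any particular product.

```python
BINARY_GIGABYTE = 2 ** 30    # 1024 * 1024 * 1024 bytes
DECIMAL_GIGABYTE = 10 ** 9   # 1 billion bytes, as on the hard drive box


def advertised_to_binary_gb(advertised_gb):
    """Convert an advertised (decimal) gigabyte figure into
    gigabytes counted in tiers of 1024."""
    return advertised_gb * DECIMAL_GIGABYTE / BINARY_GIGABYTE


# A drive sold as "40 GB" holds noticeably less when counted in tiers of 1024.
print(round(advertised_to_binary_gb(40), 2))
```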

Text Compression:
    This was the first form of compression, and it is designed to be a lossless technique, because what good is a paper if it isn't an exact replica of the original?  There are many forms of text compression and they are relatively easier to understand than MPEG compression schemes.  Text compression is the root of everything; the last step in any lossy compression technique is a lossless text compression scheme.  Run length encoding is just one example of a lossless compression scheme.

Run Length Encoder:
    Well, way back when... someone noticed that the number 1 written out as a byte was "00000001".  As you can see, the only important bit here is the least significant bit (the bit farthest right); if only there were a way to chop off all those zeros.  Well, there is, and it is called Run Length Encoding.  What happens is that the encoder looks for repeats, at either the bit level or the byte level.  When it finds a repeating character, it outputs some flag or marker to tell the decoder that what follows is not a standard number, then the number of repetitions, and finally the repeated character.  Therefore, the above would change into something more like "F701", where 'F' is the flag.
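    Here is a rough Python sketch of that scheme working on a string of characters.  It assumes the flag character 'F' never appears in the input and caps runs at 9 so the count fits in a single digit; both are simplifications I made for the example, and real encoders handle these cases properly.

```python
FLAG = "F"  # assumed marker; must not occur in the input text


def rle_encode(text, min_run=3):
    """Replace runs of a repeated character with FLAG + count + character."""
    out = []
    i = 0
    while i < len(text):
        run = 1
        while i + run < len(text) and text[i + run] == text[i] and run < 9:
            run += 1
        if run >= min_run:
            out.append(f"{FLAG}{run}{text[i]}")
        else:
            out.append(text[i] * run)  # short runs are cheaper left alone
        i += run
    return "".join(out)


def rle_decode(encoded):
    """Expand FLAG + count + character back into the original run."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == FLAG:
            out.append(encoded[i + 2] * int(encoded[i + 1]))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


print(rle_encode("00000001"))  # F701
```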

Dictionary Based Text Compression:
    I will begin with the simplest of all the dictionary types of encoding, static dictionary methods.  The way dictionary methods work in general is that they see a character and look into the dictionary to see if there is a match, starting with just the first character.  The encoder keeps searching until it reaches the longest possible string match that is in the dictionary and encodes the value of that dictionary entry, such that you get one number to represent an entire string.  In static or fixed dictionary methods, when the dictionary is filled you stop adding entries into the dictionary.  The dictionary is of a fixed size for decoding purposes, and 4K (4096) is a common dictionary size because it means you have 12 bit pointer lengths.
    There are many different schemes for creating the dictionary.  One popular method is to start the dictionary with the alphabet (ASCII values 0-255 for example) and, when you encode a match, add the match plus the next character into the dictionary.  If the string to be encoded was "aba", the encoder would find that the longest match is "a" and add "ab" to the dictionary.  The type of dictionary created by this method is referred to as a prefix dictionary because for every word in the dictionary, all of its prefixes are in the dictionary too.  For example, if the string "hello" was in the dictionary, then "hell", "hel", "he", and "h" are also in the dictionary.
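    The scheme just described is essentially what is commonly known as LZW.  Here is a minimal Python sketch of the encoding side, assuming the input only contains characters with values 0-255 and using the fixed 4096 entry limit mentioned earlier.  The function name and details are mine, not taken from any particular program.

```python
def lzw_encode(text, max_entries=4096):
    """Encode text as a list of dictionary entry numbers.

    The dictionary starts with the 256 single characters. Each time a
    match ends, the match plus the next character is added, until the
    dictionary is full (static method: then we just stop adding).
    """
    dictionary = {chr(i): i for i in range(256)}
    codes = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate          # keep extending the match
        else:
            codes.append(dictionary[current])
            if len(dictionary) < max_entries:
                dictionary[candidate] = len(dictionary)
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes


print(lzw_encode("aba"))   # [97, 98, 97]
print(lzw_encode("abab"))  # [97, 98, 256]
```

    In the second example the entry 256 is "ab", which was added while encoding the first two characters, so the second "ab" costs only one number.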
    Now, in a dynamic dictionary method you create the dictionary in the same way, but when the dictionary is filled you have a removal scheme that decides which entries to remove; Least Recently Used (LRU) is the most common.
    An interesting adaptation of dynamic dictionary uses the text before the match as the dictionary.  These methods are called sliding window methods in which there is a window of size 4K.  This means from the character you want to encode, the window extends for 4K characters back in the file.  You then search for the longest match for the string in the window.  What you encode is the offset, to denote where in the window the match begins,  and the length of the match.
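    The search for the longest match in the window can be sketched in Python like this.  It is a slow brute-force version for illustration only; the name longest_match is made up, and I do not let a match run past the current position, which real implementations often allow.

```python
def longest_match(data, pos, window=4096):
    """Find the longest string starting at pos that also appears in
    the window of text before pos.

    Returns (offset, length), where offset counts back from pos.
    (0, 0) means no match was found.
    """
    start = max(0, pos - window)
    best_offset, best_length = 0, 0
    for candidate in range(start, pos):
        length = 0
        while (pos + length < len(data)
               and candidate + length < pos
               and data[candidate + length] == data[pos + length]):
            length += 1
        if length > best_length:
            best_offset, best_length = pos - candidate, length
    return best_offset, best_length


print(longest_match("abcabcabd", 3))  # (3, 3): "abc" was seen 3 characters back
```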

Linear Predictors in Audio or Text:

    A linear predictor is a means by which you use previously seen data samples to make a "prediction" of the next sample.  The number of data samples you look at to make your prediction is called the "order" of the prediction.  For example, second order prediction looks at the previous two samples to guess the next sample.  This specific example is called backwards prediction and is the most commonly used in compression because it requires the least amount of side information during decompression.  In this case the side information is the "quantization error", or difference between the prediction and the actual value.  The way you derive the prediction from the previous values is by assigning each value a "weight" that says how much influence it has on the prediction.  Weights can be calculated in two ways: as global average weights, where each position in the history has one fixed weight, or per sample, where each sample gets its own set of N weights, N being the order of prediction.  Obviously, the second one produces better results, but requires more processing.  In the latter, the weights used for each prediction would all add up to approximately 1.  You then just multiply each previous sample by its weight and add the products together to get the prediction.
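    Here is a small Python sketch of a second order predictor with one fixed set of weights (the global kind).  The weights -1 and 2 are just an example I picked: they add up to 1 and amount to assuming the signal keeps moving in a straight line.  The function names are made up for illustration.

```python
def predict(history, weights):
    """Weighted sum of the most recent len(weights) samples.
    weights[0] applies to the oldest of them, weights[-1] to the newest."""
    recent = history[-len(weights):]
    return sum(w * s for w, s in zip(weights, recent))


def prediction_errors(samples, weights):
    """Keep the first few samples as they are, then store only how far
    off each prediction was from the actual value."""
    order = len(weights)
    out = list(samples[:order])
    for i in range(order, len(samples)):
        out.append(samples[i] - predict(samples[:i], weights))
    return out


# Second order prediction: next = 2 * previous - the one before that.
print(prediction_errors([1, 2, 3, 4, 6], [-1, 2]))  # [1, 2, 0, 0, 1]
```

    When the prediction is good, the stored errors are mostly zeros and small numbers, which compress far better than the raw samples.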

Last updated 2/6/01