Thursday, July 21, 2011

Bitties and Byttes

After reading Cory Arcangel's explanation of compression I decided to have a deeper look.

As a regular destroyer of data, mangler of images, battler of video formatting (NNNNNNNNNNNGHAAARGH!!!!!) I found all of this helpful and interesting.

(92% annotated and mangled from wikipedia or dvdhq. Badly referenced. No cred. The other 8% is likely me getting it wrong. )

BITS/BYTES
A bit (binary digit) is the basic unit of information in computers. A bit exists in one of two possible distinct states. These may be the two stable states of a flip-flop, two positions of an electrical switch, two distinct voltage or current levels allowed by a circuit, two distinct levels of light intensity, two directions of magnetization or polarization, 0 or 1, black/white, on/off.

There are several units of information which are defined as multiples of bits, such as the byte (8 bits), the kilobit (either 1,000 or 2^10 = 1,024 bits), the megabyte (either 8,000,000 or 8×2^20 = 8,388,608 bits), etc.
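The decimal vs binary versions of those units are easy to sanity-check; a quick Python sketch:

```python
# Decimal (SI) vs binary interpretations of the common units.
kilobit_decimal = 1000
kilobit_binary = 2 ** 10             # 1024 bits
megabyte_bits_decimal = 8_000_000
megabyte_bits_binary = 8 * 2 ** 20   # 8388608 bits

print(kilobit_binary, megabyte_bits_binary)  # 1024 8388608
```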
            
 A BYTE consists of eight bits. Historically, a byte was the number of bits used to encode a single character of text in a computer.

The size of 8 bits was used because it was convenient for encoding the 26 alphabetic characters (uppercase only), 10 numerical digits, and from 11 to 25 special graphic symbols.

  • ASCII could work with 7 bits, for instance, but 8 is divisible into two equal parts if needed. i.e. more efficient, and it leaves the option of working with 4-bit units if needed.

These four-bit quantities were called nibbles, in homage to the then-common 8-bit bytes.
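Splitting a byte into its two nibbles is a one-liner with shifts and masks; a quick Python sketch:

```python
# Split an 8-bit value into its two 4-bit nibbles using shifts and masks.
def nibbles(byte):
    """Return the (high, low) 4-bit halves of an 8-bit value."""
    high = (byte >> 4) & 0x0F   # top four bits
    low = byte & 0x0F           # bottom four bits
    return high, low

print(nibbles(0xB7))  # (11, 7) -- i.e. 0xB and 0x7
```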

In computer architecture, 16-bit integers, memory addresses, and other data units are those that are at most 16 bits (2 octets) wide.
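The ranges that fit in 16 bits follow directly from the width; sketched in Python:

```python
# The ranges representable in a 16-bit data unit.
bits = 16
unsigned_max = 2 ** bits - 1                          # 65535
signed_min = -(2 ** (bits - 1))                       # -32768 (two's complement)
signed_max = 2 ** (bits - 1) - 1                      # 32767

print(unsigned_max, signed_min, signed_max)
```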

Colour Gamuts


8-bit color (2^8 = 256 colors): most early color Unix workstations, VGA at low resolution, Super VGA, color Macintoshes, Atari TT, Amiga AGA, Falcon030.


12-bit color (2^12 = 4,096 colors): some Silicon Graphics systems, Neo Geo, Color NeXTstation systems, and Amiga systems in HAM mode.


COMPRESSION!


Say hello to the mother of (nearly) all modern lossy compression algorithms.


Using a discrete Fourier transform limited to real numbers, we can translate our source data into frequency space, actually approaching the ideal Karhunen-Loève transform for signals based on limits of stochastic processes with the Markov property...
No, come back, I promise I won't do that again.
Let's start with something slightly simpler. Take these audio waveforms:

Fig. 15 - Three sinusoidal waveforms with different frequencies

Each of those three waveforms can be described using three values: its amplitude, its frequency and its phase. The amplitude is how high it goes, the frequency is how many cycles it goes through per second and the phase is what point of the cycle it starts at.

The fundamental point: describing those sounds by their "wave properties" takes up a lot less space than storing them as a series of PCM samples.

Now look at this waveform:

Fig. 16 - A complex waveform

Looks considerably more complex, right? But actually there is less to this waveform than meets the eye: it is simply the result of mixing the three signals shown above.
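A sketch of that mixing in Python (the frequencies, amplitudes and phases here are made up for the example):

```python
import math

def sine(amplitude, frequency, phase, sample_rate, n_samples):
    """Sample a single sinusoid, PCM-style: one value per sample instant."""
    return [amplitude * math.sin(2 * math.pi * frequency * t / sample_rate + phase)
            for t in range(n_samples)]

rate = 8000  # samples per second (an arbitrary choice for the example)
a = sine(1.0, 200.0, 0.0, rate, rate)
b = sine(0.5, 450.0, 1.0, rate, rate)
c = sine(0.3, 700.0, 2.0, rate, rate)

# The "complex" waveform is just the three simple ones added sample by sample.
mixed = [x + y + z for x, y, z in zip(a, b, c)]
```

Nine numbers (three amplitudes, frequencies and phases) describe what would otherwise take 8,000 samples per second to store.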

Wouldn't it be great if there was some way to transform a digitised audio file into a list of frequencies that, when mixed, could be used to rebuild the original sound?
This is where something called DCT comes in. DCT stands for "discrete cosine transform", and refers to a group of mathematical functions that can be used to do just that: express a signal in terms of a sum of sinusoids with different frequencies and amplitudes.
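A naive (and slow, but readable) DCT-II sketch shows the idea: feed it a mix of three cosines and the three frequencies fall straight back out. The frequencies (3, 7, 12) and amplitudes here are made up for the demo:

```python
import math

def dct(x):
    """Naive DCT-II: express x as a sum of cosines at increasing frequencies."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

n = 64
# A mix of three cosine "waveforms" at basis frequencies 3, 7 and 12,
# with amplitudes 1.0, 0.5 and 0.25.
signal = [1.0 * math.cos(math.pi * (t + 0.5) * 3 / n)
          + 0.5 * math.cos(math.pi * (t + 0.5) * 7 / n)
          + 0.25 * math.cos(math.pi * (t + 0.5) * 12 / n)
          for t in range(n)]

coeffs = dct(signal)
peaks = [k for k, c in enumerate(coeffs) if abs(c) > 1.0]
print(peaks)  # [3, 7, 12] -- the three frequencies recovered from the mix
```

Real codecs use a fast O(n log n) version of this transform, but the result is the same.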

I used audio as an example because it's easy to visualise audio waveforms as a 2D bitmap. It's not so easy to visualise the frequencies that make up a coloured, two-dimensional image, but images can also be expressed (or at least approximated) as a set of interfering frequencies. Here the frequency is not related to time (as in the case of audio signals), or to the wavelength of each colour, but rather to space. It refers to the way the image's channels (ex., brightness) vary along the axes, from one pixel to the next (and the next, and so on).
The JPEG (Joint Photographic Experts Group) image format, for example, follows these steps to encode an image:

First the image is converted to the YCbCr colour space, since this generally improves compressibility of "natural" images, and makes it possible to use chroma downsampling. Some encoders skip this stage and encode the image directly in RGB, but that generally results in poorer performance (a lower compression ratio, or lower quality for the same compression ratio).
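That first step is a fixed linear formula; a sketch of the standard JFIF (BT.601 full-range) conversion:

```python
def rgb_to_ycbcr(r, g, b):
    """JFIF/BT.601 full-range RGB -> YCbCr, as used by baseline JPEG.
    Y is brightness; Cb and Cr are the blue- and red-difference chroma."""
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

print(rgb_to_ycbcr(255, 255, 255))  # white: Y=255, neutral chroma (128, 128)
```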

Then the chroma channels are optionally downsampled using a 4:2:2 or 4:2:0 pattern, depending on the level of compression intended.
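A minimal 4:2:0-style sketch (one way to do it: average each 2x2 block of a chroma channel into a single sample):

```python
# Downsample a chroma channel 2x in both directions by averaging 2x2 blocks.
# Assumes the channel has even width and height, for simplicity.
def downsample_420(chroma):
    return [[(chroma[y][x] + chroma[y][x + 1]
              + chroma[y + 1][x] + chroma[y + 1][x + 1]) / 4
             for x in range(0, len(chroma[0]), 2)]
            for y in range(0, len(chroma), 2)]

print(downsample_420([[1, 3], [5, 7]]))  # [[4.0]]
```

This quarters the amount of chroma data before any frequency transform happens, which is why it's only done when the intended compression level justifies it.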

Each channel of the image is then divided into 8x8 pixel blocks, and each block is converted into frequencies by the discrete cosine transform (DCT). The encoder doesn't know in advance which frequencies are actually present in the block, though, and searching for every possible frequency can be very slow, so each block is encoded as a linear combination of 64 different patterns (seen in the following figure), each made up of progressively increasing horizontal and vertical frequencies.

Fig. 17 - JPEG DCT patterns (enlarged)
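Those 64 basis patterns are easy to generate; a Python sketch (each pattern is just the product of a horizontal and a vertical cosine):

```python
import math

def dct_pattern(u, v, n=8):
    """One of the 64 JPEG basis patterns: horizontal frequency u, vertical v.
    Returns an n x n grid of values in [-1, 1]."""
    return [[math.cos(math.pi * (x + 0.5) * u / n)
             * math.cos(math.pi * (y + 0.5) * v / n)
             for x in range(n)] for y in range(n)]

flat = dct_pattern(0, 0)     # (0,0) is the flat "DC" pattern: all ones
print(flat[0][:4])           # [1.0, 1.0, 1.0, 1.0]
```

Any 8x8 block is then encoded as 64 coefficients: block = sum over (u, v) of coefficient[u][v] times pattern(u, v).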

Since we are trying to encode 64 pixels (an 8x8 block) and there are 64 patterns, each of which must be assigned a coefficient, this does not actually reduce the amount of data necessary to represent the image. On the contrary: DCT coefficients will generally need more than 8 bits to represent the original information. Typical values are between 9 and 12 bits, depending on the precision of the encoder. This means that the data volume can actually increase by 50% at this stage.


That is reversed, however, when the DCT coefficients are quantised (More on this later! yay!). The global quantisation level is typically controlled by the "quality" or "compression level" slider in the software used to save the file. The fact that the image is represented by a series of frequency coefficients, however, means the encoder can use different quantisation levels for different frequencies. Human vision is pretty good at detecting subtle variations of colour or brightness across large areas (ex., the colour of the sky in a photograph of the sunset), corresponding to the lowest frequencies, but it's not so good at evaluating the exact magnitude of high frequency intensity variations (ex., it's very hard to distinguish a dark gray line from a black one if both are over a white background). 
(mine)

 The quantisation is very noticeable in areas with smooth colour gradients, but almost unnoticeable in areas with a lot of small detail. JPEG encoders take advantage of this fact and use higher quantisation for higher frequencies (essentially this means the DCT coefficients corresponding to higher frequencies are divided by bigger numbers).
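A sketch of quantisation and dequantisation. The quantisation matrix below is made up (real JPEG encoders use carefully tuned tables), but follows the same idea: bigger divisors, i.e. coarser steps, toward higher frequencies:

```python
# Hypothetical quantisation matrix: divisor grows with frequency (u + v).
quant = [[1 + u + v for u in range(8)] for v in range(8)]

def quantise(coeffs):
    """Divide each DCT coefficient by its matrix entry and round."""
    return [[round(coeffs[v][u] / quant[v][u]) for u in range(8)]
            for v in range(8)]

def dequantise(q):
    """Multiply back: the rounding error is where the 'loss' lives."""
    return [[q[v][u] * quant[v][u] for u in range(8)] for v in range(8)]

coeffs = [[100.0] * 8 for _ in range(8)]  # made-up coefficients
q = quantise(coeffs)
back = dequantise(q)
print(back[0][0], back[7][7])  # 100 survives at DC, but 100 -> 105 at (7,7)
```

Low frequencies come back (nearly) exactly; high frequencies come back only approximately, and many become plain zero.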
The quantised data is then sorted in a diagonal zig-zag order, starting at the top left corner of the matrix (where the highest values will usually be, since they used less quantisation) and ending at the bottom right corner (where most of the values are likely to be zero, or very close to zero). This reordering improves the next step, which is to apply an optimised RLE algorithm to the sequence of values. Finally, the data is entropy-encoded, either with the Huffman algorithm or using arithmetic coding (most software uses Huffman coding, which is much faster, although not quite as efficient).
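A sketch of the zig-zag scan plus a toy run-length encoder (JPEG's actual RLE is fancier, counting runs of zeros before each nonzero coefficient, but the payoff is the same: the trailing zeros clump together):

```python
# Zig-zag scan: walk the anti-diagonals of an 8x8 block, alternating direction,
# so low-frequency coefficients come first and zeros bunch up at the end.
def zigzag(block):
    coords = [(r, c) for r in range(8) for c in range(8)]
    coords.sort(key=lambda rc: (rc[0] + rc[1],
                                rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
    return [block[r][c] for r, c in coords]

def rle(seq):
    """Toy run-length encoding: a list of [value, run length] pairs."""
    out = []
    for v in seq:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out

block = [[0] * 8 for _ in range(8)]
block[0][0] = 5   # made-up surviving low-frequency coefficients
block[0][1] = 3
print(rle(zigzag(block)))  # [[5, 1], [3, 1], [0, 62]] -- 64 values in 3 pairs
```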


As you have probably noticed, JPEG images with high levels of compression sometimes display some "artifacts". These artifacts tend to be more evident near high-contrast edges, and they often have a "block-like" appearance, where each "block" is a square with 8x8 pixels. Now that you know how JPEG uses DCT, the appearance of those artifacts should make more sense. They are simply the result of high quantisation of high-frequency coefficients (more likely to occur in areas with finer detail, such as high-contrast edges).

If you look closely at those blocks, you might even notice that some of them look a lot like the patterns shown in the DCT pattern table, above (although most will be a mix of two or more of those patterns).
Good compressors will also take into account an effect known as "frequency masking", where some types of variations become less noticeable if they are near other types of variations (for example, after a loud bang, our ears are less sensitive to subtle sound variations, so the audio immediately following an explosion or gunshot can be stored with less precision, improving its compression, and similar optimisations can be used in the compression of images).
The image below is a 3:1 reproduction of a file saved in JPEG with the "quality" slider set to 7 in Adobe Photoshop (a typical value for photographs posted on the internet):

Fig. 18 - JPEG compression artifacts

Images with high-contrast edges or fine detail (such as letters, line drawings, certain types of logos, or images with dithering patterns) are not well-suited to DCT-based algorithms. Not only do they have a bigger tendency to show compression artifacts, but the files often end up being larger than if they had been saved in a lossless compression format, such as GIF or PNG. In other words, using JPEG for that kind of image is a lose-lose situation.
When DCT is used for audio (ex., MP3, Ogg Vorbis, etc.), the quantisation coefficients are distributed differently. Although we can see low (spatial) frequencies very well, we cannot hear low (temporal) frequencies, so both very high and very low audio frequencies can use higher quantisation than the midtones (we are most sensitive to frequencies around 3 kHz). DCT-based audio compression can generally achieve a reduction of 25% or more, compared to ADPCM, for the same sound quality. Even at a 10:1 compression ratio (compared to the source PCM data), DCT-compressed audio files can sound almost indistinguishable from the original.


Where do we go from here?
Two other compression techniques deserve to be mentioned, although they are not as common as DCT-based algorithms.
First, there is fractal compression, which works by looking for similar shapes (at different scales and angles) within an image. Despite much promise and the excitement always caused by the word "fractal", the fact is fractal compression has failed to deliver. Although very efficient in theory, real-world fractal compressors tend to produce worse results than JPEG, except at extremely high compression levels (where JPEG images look terrible and fractal-compressed images look slightly less terrible). In addition to that, fractal compression is slow and requires a great deal of human intervention. 

It is sometimes called "the graduate student algorithm", because the only way to get good compression is to lock a graduate student in a room with a computer until he manages to tweak the fractal compressor for the image you are trying to encode. The poor real-world performance of fractal compressors might be due to the fact that the process is covered by several patents, which limits the number of people and companies allowed to use and improve this technology.

The second method worth mentioning is wavelet-based compression. This encodes a signal as a series of scaled and translated copies of a finite (or fast-decaying) waveform. This concept actually has some similarities with the notion of a fractal, but it is different from fractal compression, since it doesn't look for visual self-similarity. It also has some similarities with DCT, but is a considerably more complex process, and therefore slower to compute. Wavelet compression is used by the JPEG-2000 format (file extension ".JP2" or ".J2C"), which was meant to replace the original JPEG format. However, most web browsers don't have built-in support for JPEG-2000, and the format does not have support for EXIF data (used by digital cameras to store information about each photo directly in the JPEG file), so it hasn't quite caught on. 
Typically, JPEG-2000 files are 20% to 50% smaller than JPEG images, for the same visual quality. They also tend to have less obvious compression artifacts at very high compression levels (they look blurry instead of blocky).

Lossless lossy compression
I mentioned that some of the techniques used in lossy compression can be also used to improve lossless compression. Naturally, I don't mean just those very lucky and very rare cases where the lossy compression algorithm manages to describe the source data perfectly.
A common way to evaluate the differences between a (lossily) compressed version of an image and its original is to subtract the pixel values of one from the other and display the result as a new image (sometimes called "the delta") (COOOOL!). In that image, darker areas indicate smaller differences, brighter areas indicate bigger differences. The following figure shows the difference between the original image and a JPEG-compressed version at a 25:1 ratio.

Fig. 20 - Lossy compression + delta = original data
The contrast of the delta image has been increased and all images are enlarged to 200% to improve visibility. Also, the delta image shown here is unsigned, since computer monitors can't display negative colours (the actual data would contain both positive and negative numbers). Notice how the differences are bigger near edges or areas with small detail (which correspond to higher frequencies), and smaller in low-contrast areas.

Using only the compressed data and the difference from the original (sometimes called "error data"), we could now rebuild the source data exactly (losslessly), simply by adding the two. Of course, adding the size of the compressed data to the size of the difference data, we actually end up with more than the original, so this isn't exactly a good idea.
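A sketch of the idea with tiny made-up one-channel data:

```python
# The per-pixel difference ("delta") between an original and its lossy version,
# and exact reconstruction by adding it back. Values are invented for the demo.
original = [52, 55, 61, 66, 70, 61, 64, 73]
lossy    = [54, 54, 60, 67, 69, 62, 64, 72]   # pretend lossy round-trip output

delta = [o - l for o, l in zip(original, lossy)]   # signed error data
rebuilt = [l + d for l, d in zip(lossy, delta)]

print(rebuilt == original)  # True: lossy data + delta = the original, exactly
```

Note the delta is signed, which is why the displayed delta image has to be an unsigned approximation.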
