We have all downloaded files from the internet. You may have noticed that they are very often in .rar or .zip format, i.e. they are compressed. Compression can shrink a file considerably, by 75% or even more for some kinds of data, though how much depends on the content.

Now, the question that arises is: how is it done? Let me explain.

There are 2 types of compression:

1. Lossless compression: In lossless compression, as the name suggests, there is no loss, i.e. the data is fully recovered on decompression. E.g. WinRAR files, WinZip files.

2. Lossy compression: In lossy compression, some data is lost during compression, i.e. the decompressed file is never exactly equal to the original file. But the loss is generally not noticeable. E.g. MP3 files, JPEG pictures.
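To make the difference concrete, here is a small Python sketch. The lossless half uses the standard `zlib` module. The lossy half is only a toy of my own (rounding numbers to the nearest multiple of 10), not how MP3 or JPEG actually work, and the sample values are made up for illustration.

```python
import zlib

original = b"Democracy is government of the people, by the people, for the people."

# Lossless: decompressing gives back exactly the bytes we started with.
packed = zlib.compress(original)
restored = zlib.decompress(packed)
lossless_ok = restored == original

# Lossy (toy illustration): keep only the nearest multiple of 10.
samples = [12, 47, 53, 98, 101]                 # made-up sample values
step = 10
quantized = [round(s / step) for s in samples]  # smaller numbers to store
approximated = [q * step for q in quantized]    # what we get back

print(lossless_ok)    # True
print(approximated)   # close to the samples, but not identical
```

The lossless round trip returns the exact input, while the rounded values only come close to the originals.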

How does lossless compression work?

The answer is just one word: "redundancy", i.e. repetition of the same data again and again.
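You can see the effect of redundancy for yourself with a few lines of Python. This is just a rough experiment using the standard `zlib` module; the repeated phrase and the use of `os.urandom` as "data with no repetition" are my own choices for the demonstration.

```python
import os
import zlib

repetitive = b"the people " * 100           # 1100 bytes, highly redundant
random_like = os.urandom(len(repetitive))   # same length, no pattern to exploit

small = len(zlib.compress(repetitive))
large = len(zlib.compress(random_like))

# The repetitive data shrinks dramatically; the random data barely changes
# (it can even grow by a few bytes).
print(len(repetitive), small, large)
```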

Let us consider an example:

"Democracy is government of the people, by the people, for the people."

If we consider a character to occupy 2 bytes in memory, then the sentence occupies 138 bytes. You might notice that the words "the" and "people" are each repeated 3 times. So, we create a dictionary by assigning a number to each of these words. Now we can use the numbers in place of the words. So the sentence becomes:

"Democracy is government of 1 2, by 1 2, for 1 2."

The dictionary contains: 1: "the", 2: "people".
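Here is a minimal Python sketch of this word substitution. The function names `encode` and `decode` are my own, and the way the dictionary's size is counted (only the characters of each number and word, with no separators) is an assumption, so the total it prints need not match the figure quoted in the text exactly.

```python
BYTES_PER_CHAR = 2  # the assumption made in the article

sentence = "Democracy is government of the people, by the people, for the people."
dictionary = {"1": "the", "2": "people"}  # number -> word

def encode(text, table):
    """Replace every dictionary word with its number."""
    for code, word in table.items():
        text = text.replace(word, code)
    return text

def decode(text, table):
    """Put the words back. Only safe because the text contains no digits."""
    for code, word in table.items():
        text = text.replace(code, word)
    return text

encoded = encode(sentence, dictionary)
original_bytes = len(sentence) * BYTES_PER_CHAR
encoded_bytes = len(encoded) * BYTES_PER_CHAR
# Assumed storage cost: just the characters of each number and word.
dictionary_bytes = sum(len(c) + len(w) for c, w in dictionary.items()) * BYTES_PER_CHAR

print(encoded)
print(original_bytes, encoded_bytes + dictionary_bytes)
assert decode(encoded, dictionary) == sentence  # nothing was lost
```

A real compressor would also have to store separators between dictionary entries, which is why the saving in practice is a little smaller than this toy count suggests.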

Now, the sentence along with the dictionary occupies only 124 bytes. But the compression program doesn't see this sentence as a group of words. It sees it as a collection of characters. So, it doesn't see which words are repeated. Instead, it sees which combinations of characters are repeated. So, it may take " the people" as a single unit, and the dictionary may contain 1: " the people".

Now the sentence becomes:

"Democracy is government of 1, by 1, for 1."

Now, the sentence along with the dictionary occupies only 104 bytes. Thus, we have achieved about 25% compression (104 bytes instead of 138). And this is just one sentence! Imagine the compression in an entire article, chapter or even an entire book!
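How might a program find such a repeated run of characters without knowing anything about words? Below is a brute-force Python sketch. The function name `best_repeated_substring` and the cost model (every occurrence shrinks to a 1-character code, and the dictionary stores the phrase once plus its code) are my own simplifications; real compressors use much faster methods.

```python
def best_repeated_substring(text, min_len=2):
    """Return the substring whose replacement by a 1-char code saves the most characters."""
    best, best_saving = "", 0
    n = len(text)
    for length in range(min_len, n // 2 + 1):
        for start in range(n - length + 1):
            chunk = text[start:start + length]
            count = text.count(chunk)  # non-overlapping occurrences
            if count < 2:
                continue
            # Each occurrence shrinks to 1 character; the dictionary
            # entry costs the chunk itself plus 1 character for its code.
            saving = count * (length - 1) - (length + 1)
            if saving > best_saving:
                best, best_saving = chunk, saving
    return best, best_saving

sentence = "Democracy is government of the people, by the people, for the people."
phrase, saved = best_repeated_substring(sentence)
print(repr(phrase), saved)  # ' the people' 18
```

Notice that the winner includes the leading space, exactly because the program is looking at characters, not words.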

The compression is often even greater for programs written in programming languages like C++, Java, Visual Basic, etc., where some keywords are repeated again and again. This was the simplest compression algorithm. There are many others, like Huffman coding and the LZW algorithm.
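For the curious, here is a compact Python sketch of LZW, which builds its dictionary on the fly instead of storing it alongside the data. It is a simplified teaching version under my own assumptions: the input contains only characters with code points below 256, and the output is kept as a plain list of integers rather than packed into bits.

```python
def lzw_compress(text):
    """Turn a string into a list of integer codes."""
    table = {chr(i): i for i in range(256)}  # start with all single characters
    next_code = 256
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate              # keep extending the match
        else:
            codes.append(table[current])     # emit the longest known match
            table[candidate] = next_code     # learn the new, longer string
            next_code += 1
            current = ch
    if current:
        codes.append(table[current])
    return codes

def lzw_decompress(codes):
    """Rebuild the string, recreating the same table as the compressor."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:              # code not yet in the table
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(out)

sentence = "Democracy is government of the people, by the people, for the people."
codes = lzw_compress(sentence)
print(len(sentence), "characters ->", len(codes), "codes")
assert lzw_decompress(codes) == sentence
```

The decompressor never receives the dictionary; it rebuilds it step by step from the codes themselves.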

Now, if you are really interested, try compressing this sentence:

"Ask not what your country can do for you, ask what you can do for your country."

This was about lossless compression.

Lossy compression works by removing information in such a way that it isn't noticeable. I may write about it next time.

Just for information, this article's size is 2,842 bytes in .txt format and 1,143 bytes in .rar format (60% compression).
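If you want to check percentages like these yourself, the arithmetic is simple. The helper name `saving` below is my own; it just computes the fraction of the original size that was removed.

```python
def saving(original_bytes, compressed_bytes):
    """Fraction of the original size removed by compression."""
    return 1 - compressed_bytes / original_bytes

article = saving(2842, 1143)   # this article: .txt versus .rar
sentence = saving(138, 104)    # the example sentence with its dictionary

print(f"{article:.1%}")    # 59.8%
print(f"{sentence:.1%}")   # 24.6%
```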

--Atul Barapatre

 

