Data Compression

In beginners' terms, 'data compression' just sounds complicated. Don't be afraid; compression is our good friend for many reasons. It saves hard drive space. It makes data files easier to handle. It also cuts those immense file download times from the Internet. Wouldn't it be nice if we could compress all files down to just a few bytes? Unfortunately, there is a limit to how much you can compress a file, and how random the file is determines how far it can be compressed. If the file is completely random and no pattern can be found, then the shortest representation of the file is the file itself (the actual proof of this is at the end of my paper). The key to compressing a file is to find some sort of exploitable pattern, and most of this paper explains the patterns that are commonly used.

Null suppression is the most primitive form of data compression that I could find. Basically, if your data is arranged in fields (possibly a spreadsheet) and any of the fields contain only zeros, the program simply eliminates that data and goes straight from the empty field to the next one.

Only one step up from null suppression is run-length encoding. Run-length encoding simply records how many of what you have in a row. It would change a set of binary data like {0011100001} into what the computer reads as (2) zeros, (3) ones, (4) zeros, (1) one. As you can see, it works on the same basic idea as null suppression: find a series of 0's, or in this case 1's too, and abbreviate it.

Once the whole idea of data compression caught on, more people started working on programs for it, and from those people we got some new premises to work with. Substitutional encoding is a big one. It was invented jointly by two people, Abraham Lempel and Jacob Ziv, which is why most compression algorithms (a big word meaning, roughly, 'program') that use substitutional encoding have names starting with 'LZ' for Lempel-Ziv. LZ-77 is a really neat compression method in which the program starts off just copying the source file over to the new target file; but when it recognizes a phrase of data that it has previously written, it replaces the second occurrence in the target file with directions on how to get back to the first occurrence and copy it in the directions' place. This is more commonly called sliding-window compression, because the focus of the program is always sliding around the file.

LZ-78 is the other main branch of the family, and its main idea is a 'dictionary'. It still works quite a bit like LZ-77: for every phrase it comes across, it indexes the string with a number and writes the pair into the dictionary, and when the program comes across the same string again, it writes the associated number instead of the string. Conveniently, the decoder can rebuild the exact same dictionary as it reads the compressed data, so the dictionary does not have to take up space alongside the file. (A note on the household names: archivers like ZIP, LHA, ARJ, ZOO, and GZIP are all substitutional coders, though most of them, ZIP and GZIP included, actually build on LZ-77; the best-known descendant of LZ-78 is LZW, which shows up in Unix compress and in GIF images.) There is also a combined version of LZ-77 and LZ-78 called LZFG. It writes to the dictionary only when it finds a repeated phrase, not on every phrase, and then, instead of replacing the second occurrence with directions back to the first one, it puts in the number reference for the dictionary's entry. Not only is it faster, it also compresses better, because it doesn't have as big a dictionary attached.

Statistical encoding is another of the newer compression concepts. It is an offshoot of the LZ family of compressors: it uses basically the same style as LZFG, but instead of assigning the numbers in the order that the strings come out of the source file, statistical compressors do some research first. The coder counts the number of times each string is used, then ranks the strings in a hash table (the hash table is where the ranking is kept), with the most-used string at the top and the least-used string at the bottom. The higher up a string is on the list, the smaller the reference number it gets, which minimizes the total bit usage. That gives statistical coders just a slight edge over the others, but every little bit helps (ha ha, 'bit').
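To make these ideas concrete, I will sketch each of them in Python. These little programs are my own toy illustrations of the concepts, not the actual code of any real compressor, and the details (pair formats, window sizes, and so on) are my own assumptions. First, run-length encoding, using the same {0011100001} example from above:

    def rle_encode(bits):
        # Walk the string and record each run as a (count, bit) pair.
        runs = []
        i = 0
        while i < len(bits):
            j = i
            while j < len(bits) and bits[j] == bits[i]:
                j += 1
            runs.append((j - i, bits[i]))
            i = j
        return runs

    def rle_decode(runs):
        # Expand each (count, bit) pair back into a run of bits.
        return "".join(bit * count for count, bit in runs)

    print(rle_encode("0011100001"))
    # [(2, '0'), (3, '1'), (4, '0'), (1, '1')]
    print(rle_decode(rle_encode("0011100001")) == "0011100001")   # True

Null suppression falls out as a special case here: a field of nothing but zeros collapses into the single pair (length, '0').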
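Next, the sliding-window idea behind LZ-77. This greedy sketch emits (distance back, match length, next character) triples; real implementations search the window far more cleverly, but the principle is the same:

    def lz77_encode(data, window=4096):
        # At each position, find the longest match earlier in the window, then
        # emit how far back it is, how long it is, and the next new character.
        out = []
        i = 0
        while i < len(data):
            best_off, best_len = 0, 0
            for off in range(max(0, i - window), i):
                length = 0
                while i + length < len(data) and data[off + length] == data[i + length]:
                    length += 1
                if length > best_len:
                    best_off, best_len = i - off, length
            nxt = data[i + best_len] if i + best_len < len(data) else ""
            out.append((best_off, best_len, nxt))
            i += best_len + 1
        return out

    def lz77_decode(triples):
        # Follow the 'directions': step back 'off' characters and copy 'length' of them.
        out = []
        for off, length, nxt in triples:
            for _ in range(length):
                out.append(out[-off])
            if nxt:
                out.append(nxt)
        return "".join(out)

    text = "abcabcabcx"
    print(lz77_encode(text))
    # [(0, 0, 'a'), (0, 0, 'b'), (0, 0, 'c'), (3, 6, 'x')]
    print(lz77_decode(lz77_encode(text)) == text)   # True

Notice the last triple, (3, 6, 'x'): the directions reach back only 3 characters yet copy 6, which works because the copy re-reads characters it has just finished writing.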
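The dictionary idea behind LZ-78 fits in a few lines as well. In this common formulation, each output pair is (dictionary entry number, one new character), with entry number 0 standing for the empty phrase, and the decoder grows an identical dictionary as it reads:

    def lz78_encode(data):
        dictionary = {}              # phrase -> entry number
        out = []
        phrase = ""
        for ch in data:
            if phrase + ch in dictionary:
                phrase += ch         # keep extending the longest phrase we know
            else:
                out.append((dictionary.get(phrase, 0), ch))
                dictionary[phrase + ch] = len(dictionary) + 1
                phrase = ""
        if phrase:                   # flush anything left over
            out.append((dictionary[phrase], ""))
        return out

    def lz78_decode(pairs):
        dictionary = {0: ""}
        out = []
        for index, ch in pairs:
            phrase = dictionary[index] + ch
            dictionary[len(dictionary)] = phrase
            out.append(phrase)
        return "".join(out)

    text = "ababababab"
    print(lz78_encode(text))
    # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a'), (2, 'a'), (2, '')]
    print(lz78_decode(lz78_encode(text)) == text)   # True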
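Finally, the ranking step that gives statistical encoding its edge: count how often each phrase occurs and hand out the reference numbers by rank, so the most common phrase gets the smallest, cheapest number. To keep the output readable, this sketch ranks words instead of binary strings:

    from collections import Counter

    def ranked_codes(phrases):
        # Most frequent phrase gets rank 0, the next gets 1, and so on;
        # small numbers can be written with fewer bits.
        counts = Counter(phrases)
        return {p: rank for rank, (p, _) in enumerate(counts.most_common())}

    words = "the cat sat on the mat and the cat ran".split()
    codes = ranked_codes(words)
    print(codes["the"], codes["cat"], codes["ran"])   # 0 1 6
    print([codes[w] for w in words])

Here 'the' occurs most often and gets number 0, which costs almost no bits at all, while the one-off words sit at the bottom of the table with the larger numbers.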
Beware! There are a few compression programs out there that claim wonderful compression ratios, ratios that beat the compression limit set by a file's randomness. These programs are not really compression programs at all. Two of them are OWS and WIC; never compress anything with these. What they really do is split up the file you wanted to compress and hide most of it on another part of your hard drive. OWS puts it in a specific spot on the physical hard disk; WIC puts the extra information in a hidden file called winfile.dll. The real problem with these programs is that if you don't have the winfile.dll file, or the information in that certain spot on your drive, then the program can't put your file back together.

My original intent with this project was to invent a new compression algorithm. I started with the idea that if you took a file in its pure binary form and laid it out in a matrix, there were certain rows and columns you could add up to get outputs that would be able to recreate the original matrix. I was close, too. I had four different outputs, which together would have made up the compressed file, and they combined to create one value for each bit; from that single value I could determine whether the bit was a 1 or a 0. It worked perfectly for matrices of 1x1, 2x2, and 3x3, except that with matrices that small I wasn't compressing anything at all. It was more of a coding system that took up more space than the original file did. I even found a way to shrink the size of the four outputs, but it was not enough to break even on bit count. When I got to the 4x4 matrices, I found an overlap. 'Overlap' is a term I made up for this algorithm: it means I got the same single value for a 1 as I did for a 0, and when that happens I can't figure out which one it is. When you can't recreate the original file, data compression has failed; it has become lossy. I would have needed a fifth output. If you want more information on how I thought the algorithm would have worked, please refer to the Inventor's Log that I included; it is far too much to re-type here, and it would serve no real purpose in this paper.
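Although the Inventor's Log holds the real details, a simplified sketch can show the kind of overlap I mean. The version below is not my four-output scheme; it keeps only plain row sums and column sums (my own simplification), and with that little information the overlaps already show up at 2x2. The failure mode is the same one I hit at 4x4: two different matrices that produce identical outputs, so the outputs cannot tell you which bits you started with:

    import itertools

    def row_col_sums(matrix):
        rows = tuple(sum(r) for r in matrix)
        cols = tuple(sum(c) for c in zip(*matrix))
        return rows, cols

    # Try every 2x2 binary matrix and look for two that share the same sums.
    seen = {}
    for bits in itertools.product([0, 1], repeat=4):
        m = [list(bits[0:2]), list(bits[2:4])]
        key = row_col_sums(m)
        if key in seen and seen[key] != m:
            print("overlap:", seen[key], "vs", m, "-> both give", key)
            break
        seen[key] = m
    # overlap: [[0, 1], [1, 0]] vs [[1, 0], [0, 1]] -> both give ((1, 1), (1, 1))

Once two different inputs share the same compressed output, no decoder can tell them apart, and the compression has gone lossy; that is exactly the wall my 4x4 matrices ran into.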
If you were paying attention earlier, you would be saying, "Why don't you find a pattern? Otherwise you can't compress it; you are treating it like a random file." I didn't find out that it was impossible to compress random data until about the time my algorithm was failing. Because of these setbacks, I started looking for an entirely new way to compress data, using a pattern of some sort. I got to thinking about all of the existing algorithms, and I wanted to combine a hash table, a statistical coder, and a run-length coder. The only hard part I could see in that would be getting the patent holders of each of those algorithms to allow me to combine them, and actually to modify them slightly: in its current form, the statistical coder only accepts alphanumeric phrases, and I would like to modify it to read not the characters that the binary code spells out, but the binary code itself.
