This is the main assignment for the information theory course. (You are not obliged to do this assignment, but I think you will find it educational and fun.) Work in groups of two if you wish.
Neil has created a huge file containing the results of ten thousand tosses of a bent coin with f=0.01. (Don't print out this file!)
Your task is: Compress Neil's file.
Or, to be more precise, write two programs: a compressor and an uncompressor. The compressor should read in a file like Neil's file (~mackay/compress/filep.01.10000) and should write out another file, which, for simplicity, I would recommend making a plain text file containing 0s and 1s. (But you can use other formats, such as binary files, if you wish.)
The uncompressor should have the property that when it reads in the compressed file, it writes out the original file.
If I give your program another file similar to Neil's, the compressor and uncompressor should both work on that file too. [So you can't make a fake uncompressor that simply prints out Neil's file!]
We would like the compressed file to be as small as possible.
You may write the programs in any language you wish. C, octave, perl, and python all have their advantages.
If it's helpful to you, I can supply a Huffman coding algorithm (in ~mackay/bin/huffman.p) that produces optimal symbol codes for probability distributions that you choose. [Further information.]
| ||||||||||||||||||||
I encourage you to make a web-page, describing the method you used, how you decided on the parameters of your method, and (if it works) how many BITS your method compresses Neil's file into.