This is the main assignment for the information theory course. You may work in pairs if you wish.
You have a choice: you can do the bent coin task described here, or, if you prefer, write a compression program and uncompression program to solve any other task you think is interesting (for example, text compression, or image compression).
Neil has created a huge file containing the results of ten thousand tosses of a bent coin with f=0.01. (Don't print out this file!)
For your convenience, there are two copies of Neil's file: one with each outcome (0/1) on a separate line; and one where all the outcomes are concatenated on a single line. (Don't print out either file!)
Your task is: Compress Neil's file.
Or, to be more precise, write two programs: a compressor and an uncompressor. The compressor should read in a file like Neil's file (~mackay/compress/filep.01.10000) and should write out another file, which, for simplicity, I would recommend making a plain text file containing 0s and 1s. (But you can use other formats, such as binary files, if you wish.)
The uncompressor should have the property that when it reads in the compressed file, it writes out the original file.
| ||||||||||||||||||||
If I give your program another file similar to Neil's, the compressor and uncompressor should both work on that file too. [So you can't make a fake uncompressor that simply prints out Neil's file!]
We would like the compressed file to be as small as possible.
You may write the programs in any language you wish; we recommend python.
If it's helpful to you, a Huffman coding algorithm (in ~mackay/python/compression/huffman/Huffman.py) can produce optimal symbol codes for probability distributions that you choose. [Further information.]
Students' solutions | My solutions written in python
This package illustrates how to have a test() function that is executed when you type C-c C-c in emacs. It sets itself some simple tests, and if it passes all of those, it continues and attempts a harder test, "compressing" the BentCoinFile into tmp.zip.
This package illustrates
Usage:
compressing:
python copy.py < /home/mackay/compress/BentCoinFile > tmp
uncompressing:
python copy.py --uncompress < tmp > tmp2
I encourage you to make a web-page, describing the method you used, how you decided on the parameters of your method, and (if it works) how many BITS your method compresses Neil's file into.