shannon-fano: Shannon-fano compression algorithm in Haskell

This is a package candidate release! Here you can preview how this package release will appear once published to the main package index (which can be accomplished via the 'maintain' link below). Please note that once a package has been published to the main package index it cannot be undone! Please consult the package uploading documentation for more information.

[maintain] [Publish]

Shannon-fano compression algorithm in Haskell program and API


[Skip to Readme]

Properties

Versions 0.1.0.0, 0.1.0.1, 1.0.0.0, 1.0.0.0
Change log CHANGELOG.md
Dependencies base (>=4.12.0.0 && <5), bytestring, optparse-generic, shannon-fano [details]
License MIT
Copyright 2020 Armando Santos
Author Armando Santos
Maintainer Armando Santos <armandoifsantos@gmail.com>
Category Codec
Home page https://github.com/bolt12/shannon-fano
Bug tracker https://github.com/bolt12/shannon-fano/issues
Source repo head: git clone https://github.com/bolt12/shannon-fano.git
Uploaded by bolt12 at 2020-06-02T17:43:18Z

Modules

[Index] [Quick Jump]

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees


Readme for shannon-fano-1.0.0.0

[back to package description]

Shannon-Fano compression algorithm library

Haskell implementation

GitHub CI Build status Hackage Stackage Lts Stackage Nightly

This library offers a set of functions to compress files into binary code applying the Shannon-Fano compression algorithm.

https://en.wikipedia.org/wiki/Shannon%E2%80%93Fano_coding

Installing / Getting started

Stack

This package is now availablein Hackage at https://hackage.haskell.org/package/shannon-fano-0.1.0.0

So you just need to:

$ stack install shannon-fano

Manually

$ git clone https://github.com/bolt12/shannon-fano.git
$ cd shannon-fano/
$ stack install

Build documentation

$ cd shannon-fano/
$ stack haddock

See for yourself

You can see if it's correctly installed by doing calling the program in the terminal. You should see the following:

> shannon-fano -h
Compress contents using the Shannon-Fano algorithm

Usage: shannon-fano [--decompress STRING] [--input STRING] [--output STRING]

Available options:
-h,--help                Show this help text
--decompress STRING      Decompress with decode table file name.
--input STRING           Input content file name. If not specified will be
read from stdin.
--output STRING          Output result file name. If not specified will be
'out.bin' or 'out.bin.dat' depending on the action

Use examples

The program is able to read from stdin if no input file is provided like so:

> shannon-fano
test
>

This will create a 'out.bin' file and a 'out.bin.tab' file (which contains the decode table), which you can decompress:

> shannon-fano --decompress out.bin.tab --input out.bin

If no output file name is provided, this should create a new file called 'out.dat':

> cat out.dat
test

Performance and compression

Testing the compressor program for a lorem ipsum text file of 1000 words:

> time shannon-fano --input test.txt

real    0m0.074s
user    0m0.060s
sys     0m0.025s

Compression:

> ls -lh out.bin test.txt | cut -d " " -f5
3.4K
6.4K

Total ~ 47%


Testing the compressor program with 1M of random data:

> base64 /dev/urandom | head -c 1000000 > test.txt
> time shannon-fano --input test.txt

real    0m2.648s
user    0m2.321s
sys     0m1.305s

Compression:

> ls -lh out.bin test.txt | cut -d " " -f5
737K
977K

Total ~ 15%


Testing the compressor program with a 2.1M file containing repetitions of 5 characters:

> time shannon-fano --input test.txt

real    0m2.356s
user    0m2.069s
sys     0m1.499s

Compression:

> ls -lh out.bin test.txt | cut -d " " -f5
734K
2.1M

Total ~ 65%

Decompression:

> time shannon-fano --decompress out.bin.tab --input out.bin

real    0m6.374s
user    0m6.252s
sys     0m1.394s

Conclusion

As you can see, this algorithm performs worse, in general, in terms of compression, with random and large data.

Contributing

If you'd like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.

Licensing

The code in this project is licensed under GPL3 license.