data-dispersal: Space-efficient and privacy-preserving data dispersal algorithms.
This is a package candidate release! Here you can preview how this package release will appear once published to the main package index (which can be accomplished via the 'maintain' link below). Please note that once a package has been published to the main package index it cannot be undone! Please consult the package uploading documentation for more information.
Warnings:
- Exposed modules use unallocated top-level names: Crypto
This library provides space-efficient (m,n)-information dispersal algorithms (IDAs).
Given a ByteString bstr
of length D
, we encode bstr
as a list fs
of n
Fragment
s, each containing a ByteString
of length O(D/m)
. Then, each fragment in fs
could be stored on a separate
machine for fault-tolerance.
Even if up to n-m
of these machines crash, we can still reconstruct the original
ByteString out of the remaining m fragments.
The total space required for the n fragments is O((n/m)*D)
.
Note that m
and n
are roughly in the same order, so the actual storage overhead
for getting good fault-tolerance increases only by a constant factor.
The module Data.IDA
contains the basic information dispersal algorithm. The module
Crypto.IDA
augments the dispersal scheme by combining it with secret sharing, i.e.,
the knowledge of up to m-1
fragments does not leak any information about
the original data. See Crypto.IDA for details.
GHCi Example:
> :m + Data.IDA > let msg = Data.ByteString.Char8.pack "my really important data" > let fragments = encode 5 15 msg -- Now we could distributed the fragments on different sites to add some -- fault-tolerance. > let frags' = drop 5 $ take 10 fragments -- let's pretend that 10 machines crashed > decode frags' "my really important data"
Fault-Tolerance:
Suppose that we have N
machines and encode our data as 2log(N)
fragments
with reconstruction threshold m = log(N)
.
Let's assume that we store each fragment on a separate machine and each
machine fails (independently) with probability at most 0.5.
What is the probability of our data being safe?
Pr[ at most n-m machines crash ] >= 1-0.5^(log(N)) = 1-N^(-1).
What is the overhead in terms of space that we pay for this level of fault-tolerance? We have n fragments, each of size D/m, so the total space is
n * D/ m = 2D.
In other words, we can guarantee that the data survives with high probability by increasing the required space by a constant factor.
This library is based on the following works:
"Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance", by Michael O. Rabin, JACM 1989.
"How to share a secret." by Adi Shamir. In Communications of the ACM 22 (11): 612–613, 1979.
"Secret Sharing Made Short" Hugo Krawczyk. CRYPTO 1993: 136-146
Properties
Versions | 1.0.0.0, 1.0.0.0, 1.0.0.1, 1.0.0.2 |
---|---|
Change log | None available |
Dependencies | AES (>=0.2.9), array (>=0.4.0.1), base (>=4.6 && <4.7), binary (>=0.7.2.1), bytestring (>=0.10.0.2), entropy (>=0.3.2), finite-field (>=0.8.0), matrix (>=0.3.4.0), secret-sharing (>=1.0.0.0), syb (>=0.4.0), vector (>=0.10.11.0) [details] |
License | LGPL-2.1-only |
Copyright | Peter Robinson 2014 |
Author | Peter Robinson <peter.robinson@monoid.at> |
Maintainer | peter.robinson@monoid.at |
Category | Data, Cryptography |
Home page | http://monoid.at/code |
Uploaded | by PeterRobinson at 2014-09-08T16:30:25Z |
Modules
- Crypto
- Crypto.IDA
- Data
- Data.IDA
- Data.IDA.FiniteField
- Data.IDA.Internal
- Data.IDA
Downloads
- data-dispersal-1.0.0.0.tar.gz [browse] (Cabal source package)
- Package description (as included in the package)
Maintainer's Corner
Package maintainers
For package maintainers and hackage trustees