streamly-lmdb-0.8.0: Stream data to or from LMDB databases using the streamly library.
Safe Haskell: Safe-Inferred
Language: Haskell2010

Streamly.External.LMDB

Acknowledgments

The functionality for the limits and getting the environment and database, in particular the idea of specifying the read-only or read-write mode at the type level, was mostly obtained from the lmdb-simple library.

Warning

Unless you know what you’re doing, please do not use other mechanisms (in addition to the public functionality of this library) in the same Haskell program to interact with your LMDB databases. (If you really want to do this, you should heed all the low-level requirements we have linked to in the source code of this library, and in general understand how the LMDB C API works with MDB_NOTLS enabled.)

Environments

With LMDB, one first creates a so-called “environment,” which one can think of as a file or directory on disk.

data Environment emode

openEnvironment :: forall emode. Mode emode => FilePath -> Limits -> IO (Environment emode)

Open an LMDB environment in either ReadWrite or ReadOnly mode. The FilePath argument may be either a directory or a regular file, but it must already exist; when creating a new environment, one should create an empty file or directory beforehand. If a regular file, an additional file with "-lock" appended to the name is automatically created for the reader lock table.

Note that an environment must have been opened in ReadWrite mode at least once before it can be opened in ReadOnly mode.

An environment opened in ReadOnly mode may still modify the reader lock table (except when the filesystem is read-only, in which case no locks are used).

To satisfy certain low-level LMDB requirements, please do not open the same environment (i.e., the same FilePath) more than once in the same process at the same time. Furthermore, please use the environment only in the process that opened it (not in a child process after forking).
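As a minimal sketch of the requirements above, the following program creates the directory before opening the environment and opens it exactly once. The path and the helper name openReadWrite are hypothetical and chosen only for illustration.

```haskell
import Streamly.External.LMDB
import System.Directory (createDirectoryIfMissing)

-- | Opens a read-write environment at the given path. The directory is
-- created first because openEnvironment expects the path to exist already.
-- (openReadWrite is a hypothetical helper, not part of the library.)
openReadWrite :: FilePath -> IO (Environment ReadWrite)
openReadWrite path = do
  createDirectoryIfMissing True path
  openEnvironment path defaultLimits

main :: IO ()
main = do
  -- Hypothetical path; open it only once per process.
  env <- openReadWrite "/tmp/example-lmdb"
  -- ... use env for the remainder of the program ...
  closeEnvironment env
```

The explicit result type on openReadWrite fixes the environment mode; without some such annotation or a mode-specific use site, emode would be ambiguous.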

isReadOnlyEnvironment :: forall emode. Mode emode => Bool

closeEnvironment :: forall emode. Mode emode => Environment emode -> IO ()

Closes the given environment.

If you have merely a few dozen environments at most, there should be no need for this. (It is a common practice with LMDB to create one’s environments once and reuse them for the remainder of the program’s execution.)

To satisfy certain low-level LMDB requirements:

  • Before calling this function, please call closeDatabase on all databases in the environment.
  • Before calling this function, close all cursors and commit/abort all transactions on the environment. To make sure this requirement is satisfied for read-only transactions, either (a) call waitReaders or (b) pass precreated cursors/transactions to readLMDB and unsafeReadLMDB.
  • After calling this function, do not use the environment or any related databases, transactions, and cursors.

Limits

data Limits

LMDB environments have various limits on the size and number of databases and concurrent readers.

Constructors

Limits 

Fields

  • mapSize :: !Int

    Memory map size, in bytes (also the maximum size of all databases).

  • maxDatabases :: !Int

    Maximum number of named databases.

  • maxReaders :: !Int

    Maximum number of concurrent ReadOnly transactions (also the number of slots in the lock table).

defaultLimits :: Limits

The default limits are 1 MiB map size, 0 named databases (see Databases), and 126 concurrent readers. These can be adjusted freely, and in particular the mapSize may be set very large (limited only by available address space). However, LMDB is not optimized for a large number of named databases so maxDatabases should be kept to a minimum.

The default mapSize is intentionally small, and should be changed to something appropriate for your application. It ought to be a multiple of the OS page size, and should be chosen as large as possible to accommodate future growth of the database(s). Once set for an environment, this limit cannot be reduced to a value smaller than the space already consumed by the environment; however, it can later be increased.

If you are going to use any named databases then you will need to change maxDatabases to the number of named databases you plan to use. However, you do not need to change this field if you are only going to use the single main (unnamed) database.
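For illustration, limits can be adjusted with ordinary record update syntax on defaultLimits. The concrete numbers here (10 GiB and two named databases) are arbitrary assumptions, not recommendations.

```haskell
import Streamly.External.LMDB

-- | Example limits: a larger map and room for two named databases.
exampleLimits :: Limits
exampleLimits =
  defaultLimits
    { mapSize = 10 * gibibyte, -- 10 GiB, a multiple of a typical 4 KiB page
      maxDatabases = 2 -- two named databases are planned
    }

main :: IO ()
main = do
  print (mapSize exampleLimits)
  print (maxReaders exampleLimits) -- left unchanged from defaultLimits
```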

Databases

After creating an environment, one creates within it one or more databases.

data Database emode

getDatabase :: forall emode. Mode emode => Environment emode -> Maybe String -> IO (Database emode)

Gets a database with the given name.

If only one database is desired within the environment, the name can be Nothing (known as the “unnamed database”).

If one or more named databases (a database with a Just name) are desired, the maxDatabases of the environment’s limits should have been adjusted accordingly. The unnamed database will in this case contain the names of the named databases as keys, which one is allowed to read but not write.

Warning: When getting a named database for the first time (i.e., creating it), one must do so in the ReadWrite environment mode. (This restriction does not apply for the unnamed database.) In this case, this function spawns a bound thread and creates a temporary read-write transaction under the hood; see Transactions.
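A small sketch of working with named databases follows. The path and the database names "users" and "orders" are hypothetical. Because creating a named database spawns a bound thread, the sketch assumes the program is linked with GHC's threaded runtime (-threaded).

```haskell
import Streamly.External.LMDB
import System.Directory (createDirectoryIfMissing)

-- Hypothetical location of the environment.
envPath :: FilePath
envPath = "/tmp/example-lmdb-named"

main :: IO ()
main = do
  createDirectoryIfMissing True envPath
  -- Two named databases are used, so maxDatabases is raised from its default.
  env <-
    openEnvironment envPath (defaultLimits {maxDatabases = 2}) ::
      IO (Environment ReadWrite)
  -- Creating a named database for the first time requires ReadWrite mode.
  users <- getDatabase env (Just "users")
  orders <- getDatabase env (Just "orders")
  -- ... read from and write to the databases ...
  -- Databases are closed before their environment.
  closeDatabase users
  closeDatabase orders
  closeEnvironment env
```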

closeDatabase :: forall emode. Mode emode => Database emode -> IO ()

Closes the given database.

If you have merely a few dozen databases at most, there should be no need for this. (It is a common practice with LMDB to create one’s databases once and reuse them for the remainder of the program’s execution.)

To satisfy certain low-level LMDB requirements:

  • Before calling this function, please make sure all read-write transactions that have modified the database have already been committed or aborted.
  • After calling this function, do not use the database or any of its cursors again. To make sure this requirement is satisfied for cursors on read-only transactions, either (a) call waitReaders or (b) pass precreated cursors/transactions to readLMDB and unsafeReadLMDB.

Transactions

In LMDB, there are two types of transactions: read-only transactions and read-write transactions. On a given environment, read-only transactions do not block other transactions and read-write transactions do not block read-only transactions, but read-write transactions are serialized and block other read-write transactions.

Read-only transactions attain a snapshot view of the environment; this view is not affected by newer read-write transactions.

Warning: Long-lived transactions are discouraged by LMDB, and it is your responsibility as a user of this library to avoid them as necessary. The reasons are twofold: (a) The first one we already mentioned: Read-write transactions block other read-write transactions. (b) The second is more insidious: Even though read-only transactions do not block read-write transactions, read-only transactions (since they attain a snapshot view of the environment) prevent the reuse of pages freed by newer read-write transactions, so the database can grow quickly.

data Transaction tmode emode

A read-only (tmode: ReadOnly) or read-write (tmode: ReadWrite) transaction.

emode: the environment’s mode. Note: ReadOnly environments can only have ReadOnly transactions; we enforce this at the type level.

Read-only transactions

beginReadOnlyTransaction :: forall emode. Mode emode => Environment emode -> IO (Transaction ReadOnly emode)

Begins an LMDB read-only transaction on the given environment.

For read-only transactions returned from this function, it is your responsibility to:

  • make sure the transaction only gets used by a single readLMDB, unsafeReadLMDB, or getLMDB at the same time;
  • use the transaction only on databases in the environment on which the transaction was begun;
  • make sure that those databases were already obtained before the transaction was begun;
  • dispose of the transaction with abortReadOnlyTransaction; and
  • be aware of the caveats regarding long-lived transactions; see Transactions.

To easily manage a read-only transaction’s lifecycle, we suggest using withReadOnlyTransaction.

abortReadOnlyTransaction :: forall emode. Mode emode => Transaction ReadOnly emode -> IO ()

Disposes of a read-only transaction created with beginReadOnlyTransaction.

It is your responsibility to not use the transaction or any of its cursors afterwards.

withReadOnlyTransaction :: forall m a emode. (Mode emode, MonadBaseControl IO m, MonadIO m) => Environment emode -> (Transaction ReadOnly emode -> m a) -> m a

Creates a temporary read-only transaction on which the provided action is performed, after which the transaction gets aborted. The transaction also gets aborted upon exceptions.

You have the same responsibilities as documented for beginReadOnlyTransaction (apart from the transaction disposal).
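As a sketch, a single read-only transaction can serve several lookups so that they all observe the same snapshot. The helper lookupPair, the path, and the keys are hypothetical; the lookups run one after another, so the transaction is never used by two operations at the same time.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import Streamly.External.LMDB
import System.Directory (createDirectoryIfMissing)

-- Hypothetical location of the environment.
envPath :: FilePath
envPath = "/tmp/example-lmdb-readonly"

-- | Looks up two keys under the same snapshot of the environment.
lookupPair ::
  Environment ReadWrite ->
  Database ReadWrite ->
  ByteString ->
  ByteString ->
  IO (Maybe ByteString, Maybe ByteString)
lookupPair env db k1 k2 =
  withReadOnlyTransaction env $ \txn -> do
    -- Sequential use only: one getLMDB at a time on this transaction.
    v1 <- getLMDB db (JustTxn txn) k1
    v2 <- getLMDB db (JustTxn txn) k2
    pure (v1, v2)

main :: IO ()
main = do
  createDirectoryIfMissing True envPath
  env <- openEnvironment envPath defaultLimits
  -- The database is obtained before the transaction is begun.
  db <- getDatabase env Nothing
  lookupPair env db "foo" "bar" >>= print
```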

waitReaders :: Mode emode => Environment emode -> IO ()

Waits for active read-only transactions on the given environment to finish. Note: This triggers garbage collection.

Read-write transactions

beginReadWriteTransaction :: Environment ReadWrite -> IO (Transaction ReadWrite ReadWrite)

Begins an LMDB read-write transaction on the given environment.

Unlike read-only transactions, a given read-write transaction is not allowed to stray from the OS thread on which it was begun, and it is your responsibility to make sure of this. You can achieve this with, e.g., runInBoundThread.

Additionally, for read-write transactions returned from this function, it is your responsibility to:

  • use the transaction only on databases in the environment on which the transaction was begun;
  • make sure that those databases were already obtained before the transaction was begun;
  • commit/abort the transaction with commitReadWriteTransaction/abortReadWriteTransaction; and
  • be aware of the caveats regarding long-lived transactions; see Transactions.

To easily manage a read-write transaction’s lifecycle, we suggest using withReadWriteTransaction.

abortReadWriteTransaction :: Transaction ReadWrite ReadWrite -> IO ()

Aborts a read-write transaction created with beginReadWriteTransaction.

It is your responsibility to not use the transaction afterwards.

commitReadWriteTransaction :: Transaction ReadWrite ReadWrite -> IO ()

Commits a read-write transaction created with beginReadWriteTransaction.

It is your responsibility to not use the transaction afterwards.
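To make the manual lifecycle concrete, here is a rough sketch that begins, uses, and commits a read-write transaction on a single bound thread, aborting it if the write throws. The helper writeManually and the path are hypothetical, the stream functions are assumed to come from streamly-core's Streamly.Data.Stream, and the exception handling is deliberately simple.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Concurrent (runInBoundThread)
import Control.Exception (onException)
import Data.ByteString (ByteString)
import qualified Streamly.Data.Stream as S
import Streamly.External.LMDB
import System.Directory (createDirectoryIfMissing)

-- Hypothetical location of the environment.
envPath :: FilePath
envPath = "/tmp/example-lmdb-manual"

-- | Writes the pairs in one manually managed read-write transaction.
-- Bound threads need the threaded runtime (link with -threaded).
writeManually ::
  Environment ReadWrite ->
  Database ReadWrite ->
  [(ByteString, ByteString)] ->
  IO ()
writeManually env db pairs =
  -- Keep the whole transaction on one OS thread.
  runInBoundThread $ do
    txn <- beginReadWriteTransaction env
    -- Abort if writing fails; the exception is then rethrown.
    S.fold (writeLMDB defaultWriteOptions db txn) (S.fromList pairs)
      `onException` abortReadWriteTransaction txn
    commitReadWriteTransaction txn

main :: IO ()
main = do
  createDirectoryIfMissing True envPath
  env <- openEnvironment envPath defaultLimits
  -- The database is obtained before the transaction is begun.
  db <- getDatabase env Nothing
  writeManually env db [("key", "value")]
```

In most programs withReadWriteTransaction covers the same ground with less room for error.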

withReadWriteTransaction :: forall m a. (MonadBaseControl IO m, MonadIO m) => Environment ReadWrite -> (Transaction ReadWrite ReadWrite -> m a) -> m a

Spawns a new bound thread and creates a temporary read-write transaction on which the provided action is performed, after which the transaction gets committed. The transaction gets aborted upon exceptions.

You have the same responsibilities as documented for beginReadWriteTransaction (apart from running it on a bound thread and committing/aborting it).

Cursors

data Cursor

A cursor.

openCursor :: forall emode tmode. (Mode emode, Mode tmode, SubMode emode tmode) => Transaction tmode emode -> Database emode -> IO Cursor

Opens a cursor for use with readLMDB or unsafeReadLMDB. It is your responsibility to (a) make sure the cursor only gets used by a single readLMDB or unsafeReadLMDB at the same time, (b) make sure the provided database is within the environment on which the provided transaction was begun, and (c) dispose of the cursor with closeCursor.

To easily manage a cursor’s lifecycle, we suggest using withCursor.

closeCursor :: Cursor -> IO ()

Disposes of a cursor created with openCursor.

withCursor :: forall m a emode tmode. (MonadBaseControl IO m, MonadIO m, Mode emode, Mode tmode, SubMode emode tmode) => Transaction tmode emode -> Database emode -> (Cursor -> m a) -> m a

Creates a temporary cursor on which the provided action is performed, after which the cursor gets closed. The cursor also gets closed upon exceptions.

You have the same responsibilities as documented for openCursor (apart from the cursor disposal).
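A sketch of reading with a precreated transaction and cursor is given here. The helper readAllPairs and the path are hypothetical, and the stream and fold functions are assumed to come from streamly-core (Streamly.Data.Stream and Streamly.Data.Fold). Collecting into a list is only sensible for small databases.

```haskell
import Data.ByteString (ByteString)
import qualified Streamly.Data.Fold as F
import qualified Streamly.Data.Stream as S
import Streamly.External.LMDB
import System.Directory (createDirectoryIfMissing)

-- Hypothetical location of the environment.
envPath :: FilePath
envPath = "/tmp/example-lmdb-cursor"

-- | Reads all key-value pairs using an explicit transaction and cursor.
readAllPairs ::
  Environment ReadWrite ->
  Database ReadWrite ->
  IO [(ByteString, ByteString)]
readAllPairs env db =
  withReadOnlyTransaction env $ \txn ->
    -- The cursor is used by exactly one readLMDB and closed afterwards.
    withCursor txn db $ \cursor ->
      S.fold F.toList $
        S.unfold readLMDB (defaultReadOptions, db, RightTxn (txn, cursor))

main :: IO ()
main = do
  createDirectoryIfMissing True envPath
  env <- openEnvironment envPath defaultLimits
  db <- getDatabase env Nothing
  readAllPairs env db >>= mapM_ print
```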

Reading

Stream-based reading

readLMDB :: forall m emode tmode. (MonadIO m, Mode emode, Mode tmode, SubMode emode tmode) => Unfold m (ReadOptions, Database emode, EitherTxn tmode (Maybe ChunkSize) (Transaction tmode emode, Cursor)) (ByteString, ByteString)

Creates an unfold with which we can stream key-value pairs from the given database.

If an existing transaction and cursor are not provided, there are two possibilities: (a) If a chunk size is not provided, a read-only transaction and cursor are automatically created for the entire duration of the unfold. (b) Otherwise, new transactions and cursors are automatically created according to the desired chunk size. In this case, each transaction (apart from the first one) starts, as expected, at the key adjacent to the previously encountered key: the smallest key greater than it when iterating forward, or the largest key less than it when iterating backward.

If you want to iterate through a large database while avoiding a long-lived transaction (see Transactions), it is your responsibility to either chunk up your usage of readLMDB (with which readStart can help) or specify a chunk size as described above.

Runtime consideration: If you call readLMDB very frequently without a precreated transaction and cursor, you might find upon profiling that a significant time is being spent at mdb_txn_begin, or find yourself having to increase maxReaders in the environment’s limits because the transactions and cursors are not being garbage collected fast enough. In this case, please consider precreating a transaction and cursor.

If you don’t want the overhead of intermediate ByteStrings (on your way to your eventual data structures), use unsafeReadLMDB instead.
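As an illustration of option (b), the following sketch counts the pairs in a database while letting readLMDB start a fresh transaction for every chunk. The chunk size of 1000 pairs, the helper countPairs, and the path are arbitrary assumptions; the fold and stream functions are assumed to come from streamly-core.

```haskell
import qualified Streamly.Data.Fold as F
import qualified Streamly.Data.Stream as S
import Streamly.External.LMDB
import System.Directory (createDirectoryIfMissing)

-- Hypothetical location of the environment.
envPath :: FilePath
envPath = "/tmp/example-lmdb-chunked-read"

-- | Counts all key-value pairs without holding one long-lived transaction.
countPairs :: Database ReadWrite -> IO Int
countPairs db =
  S.fold F.length $
    S.unfold
      readLMDB
      -- LeftTxn (Just ...) requests automatically created, chunked transactions.
      (defaultReadOptions, db, LeftTxn (Just (ChunkNumPairs 1000)))

main :: IO ()
main = do
  createDirectoryIfMissing True envPath
  env <- openEnvironment envPath defaultLimits
  db <- getDatabase env Nothing
  countPairs db >>= print
```

Since each chunk gets its own snapshot, writes committed between chunks can become visible partway through the iteration.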

unsafeReadLMDB :: forall m k v emode tmode. (MonadIO m, Mode emode, Mode tmode, SubMode emode tmode) => Unfold m (ReadOptions, Database emode, EitherTxn tmode (Maybe ChunkSize) (Transaction tmode emode, Cursor), CStringLen -> IO k, CStringLen -> IO v) (k, v)

Similar to readLMDB, except that the keys and values are not automatically converted into Haskell ByteStrings.

To ensure safety, please make sure that the memory pointed to by the CStringLen for each key/value mapping function call is (a) only read (and not written to); and (b) not used after the mapping function has returned. One way to transform the CStringLens to your desired data structures is to use unsafePackCStringLen.
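One conservative way to respect both safety conditions is to extract only plain Haskell values inside the mapping functions. This sketch sums the sizes of all keys and values by taking the length component of each CStringLen and never touching the pointer. The helper totalBytes and the path are hypothetical.

```haskell
import qualified Streamly.Data.Fold as F
import qualified Streamly.Data.Stream as S
import Streamly.External.LMDB
import System.Directory (createDirectoryIfMissing)

-- Hypothetical location of the environment.
envPath :: FilePath
envPath = "/tmp/example-lmdb-unsafe-read"

-- | Sums the byte sizes of all keys and values in the database.
totalBytes :: Database ReadWrite -> IO Int
totalBytes db =
  S.fold F.sum $
    fmap (uncurry (+)) $
      S.unfold
        unsafeReadLMDB
        -- Both mapping functions keep only the length; the pointer is
        -- neither written to nor retained after they return.
        (defaultReadOptions, db, LeftTxn Nothing, pure . snd, pure . snd)

main :: IO ()
main = do
  createDirectoryIfMissing True envPath
  env <- openEnvironment envPath defaultLimits
  db <- getDatabase env Nothing
  totalBytes db >>= print
```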

data ReadOptions

Instances

Show ReadOptions (defined in Streamly.External.LMDB.Internal)

defaultReadOptions :: ReadOptions

By default, we start reading from the beginning of the database (i.e., from the smallest key) and iterate in the forward direction.

data ReadDirection

Direction of key iteration.

Constructors

Forward 
Backward 

Instances

Show ReadDirection (defined in Streamly.External.LMDB.Internal)

data ReadStart

The key from which an iteration should start.

Constructors

ReadBeg

Start from the smallest key.

ReadEnd

Start from the largest key.

ReadGE !ByteString

Start from the smallest key that is greater than or equal to the given key.

ReadGT !ByteString

Start from the smallest key that is greater than the given key.

ReadLE !ByteString

Start from the largest key that is less than or equal to the given key.

ReadLT !ByteString

Start from the largest key that is less than the given key.

Instances

Show ReadStart (defined in Streamly.External.LMDB.Internal)

Direct reading

getLMDB :: forall emode tmode. (Mode emode, Mode tmode, SubMode emode tmode) => Database emode -> MaybeTxn tmode (Transaction tmode emode) -> ByteString -> IO (Maybe ByteString)

Looks up the value for the given key in the given database.

If an existing transaction is not provided, a read-only transaction is automatically created internally.

Runtime consideration: If you call getLMDB very frequently without a precreated transaction, you might find upon profiling that a significant time is being spent at mdb_txn_begin, or find yourself having to increase maxReaders in the environment’s limits because the transactions are not being garbage collected fast enough. In this case, please consider precreating a transaction.
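For occasional lookups, passing NoTxn lets getLMDB create its own read-only transaction, as in this short sketch. The path and key are hypothetical. The type annotation on the environment is needed here because nothing else in the program fixes its mode.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Streamly.External.LMDB
import System.Directory (createDirectoryIfMissing)

-- Hypothetical location of the environment.
envPath :: FilePath
envPath = "/tmp/example-lmdb-get"

main :: IO ()
main = do
  createDirectoryIfMissing True envPath
  env <- openEnvironment envPath defaultLimits :: IO (Environment ReadWrite)
  db <- getDatabase env Nothing
  -- NoTxn: a read-only transaction is created internally for this lookup.
  result <- getLMDB db NoTxn "foo"
  print result
```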

Writing

Stream-based writing

writeLMDB :: forall m a. (MonadIO m, MonadCatch m, MonadThrow m) => WriteOptions m a -> Database ReadWrite -> Transaction ReadWrite ReadWrite -> Fold m (ByteString, ByteString) a

Creates a fold that writes a stream of key-value pairs to the provided database using the provided transaction.

If you have a long stream of key-value pairs that you want to write to an LMDB database while avoiding a long-lived transaction (see Transactions), you can use the functions for chunked writing.
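The following sketch writes a small stream in one read-write transaction and then streams it back out. The path is hypothetical, the stream and fold functions are assumed to come from streamly-core, and the program is assumed to be linked with -threaded because withReadWriteTransaction spawns a bound thread.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Function ((&))
import qualified Streamly.Data.Fold as F
import qualified Streamly.Data.Stream as S
import Streamly.External.LMDB
import System.Directory (createDirectoryIfMissing)

-- Hypothetical location of the environment.
envPath :: FilePath
envPath = "/tmp/example-lmdb-write"

main :: IO ()
main = do
  createDirectoryIfMissing True envPath
  env <- openEnvironment envPath defaultLimits
  db <- getDatabase env Nothing

  -- One short read-write transaction for the whole (small) stream.
  withReadWriteTransaction env $ \txn ->
    S.fromList [("baz", "a"), ("foo", "b"), ("bar", "c")]
      & S.fold (writeLMDB defaultWriteOptions db txn)

  -- Read the pairs back, starting from the smallest key.
  S.unfold readLMDB (defaultReadOptions, db, LeftTxn Nothing)
    & S.mapM print
    & S.fold F.drain
```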

defaultWriteOptions :: WriteOptions m ()

By default, we allow overwriting.

data OverwriteOptions m a where

  • OverwriteAllow: When a key reoccurs, overwrite the value.
  • OverwriteDisallow: When a key reoccurs, don’t overwrite and hand the maladaptive key-value pair to the accumulator.
  • OverwriteAppend: Assume the input keys arrive in increasing order, which allows the use of MDB_APPEND under the hood and substantially improves write performance. Key-value pairs that arrive out of order are handed to the accumulator.

Accumulators

Accumulator types for OverwriteDisallow and OverwriteAppend. Various commonly used accumulators are provided as well.

type WriteAccum m a = Fold m (ByteString, ByteString) a

A fold for (key, new value).

type WriteAccumWithOld m a = Fold m (ByteString, ByteString, ByteString) a

A fold for (key, new value, old value). This has the overhead of getting the old value.

type ShowKey = ByteString -> String

A function that shows a database key.

type ShowValue = ByteString -> String

A function that shows a database value.

writeAccumThrow :: Monad m => Maybe (ShowKey, ShowValue) -> WriteAccum m ()

Throws upon the first maladaptive key. If desired, shows the maladaptive key-value pair in the exception.

writeAccumThrowAllowSameValue :: Monad m => Maybe (ShowKey, ShowValue) -> WriteAccumWithOld m ()

Throws upon the first maladaptive key where the old value differs from the new value. If desired, shows the maladaptive key-value pair with the old value in the exception.

writeAccumIgnore :: Monad m => WriteAccum m ()

Silently ignores maladaptive keys.

writeAccumStop :: Monad m => WriteAccum m ()

Gracefully stops upon the first maladaptive key.

Chunked writing

chunkPairs :: Monad m => ChunkSize -> Stream m (ByteString, ByteString) -> Stream m (Seq (ByteString, ByteString))

Chunks up the incoming stream of key-value pairs using the desired chunk size. One can try, e.g., ChunkBytes mebibyte (1 MiB chunks) and benchmark from there.

chunkPairsFold :: forall m a. Monad m => ChunkSize -> Fold m (Seq (ByteString, ByteString)) a -> Fold m (ByteString, ByteString) a

Chunks up the incoming stream of key-value pairs using the desired chunk size. One can try, e.g., ChunkBytes mebibyte (1 MiB chunks) and benchmark from there.

The chunks are processed using the desired fold.

writeLMDBChunk :: forall m a. (MonadBaseControl IO m, MonadIO m, MonadCatch m) => WriteOptions m a -> Database ReadWrite -> Seq (ByteString, ByteString) -> m a

Writes a chunk of key-value pairs to the given database. Under the hood, it uses writeLMDB surrounded with a withReadWriteTransaction.
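The chunked-writing functions can be combined as in this sketch, which writes each chunk in its own short read-write transaction. The helper writeChunked, the path, the generated data, and the 1 GiB map size are hypothetical choices; the stream and fold functions are assumed to come from streamly-core, and the program is assumed to be linked with -threaded.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B
import Data.Function ((&))
import qualified Streamly.Data.Fold as F
import qualified Streamly.Data.Stream as S
import Streamly.External.LMDB
import System.Directory (createDirectoryIfMissing)

-- Hypothetical location of the environment.
envPath :: FilePath
envPath = "/tmp/example-lmdb-chunked-write"

-- | Writes a stream of pairs in roughly 1 MiB chunks, one transaction each.
writeChunked :: Database ReadWrite -> S.Stream IO (ByteString, ByteString) -> IO ()
writeChunked db pairs =
  chunkPairs (ChunkBytes mebibyte) pairs
    & S.mapM (writeLMDBChunk defaultWriteOptions db)
    & S.fold F.drain

main :: IO ()
main = do
  createDirectoryIfMissing True envPath
  env <-
    openEnvironment envPath (defaultLimits {mapSize = gibibyte}) ::
      IO (Environment ReadWrite)
  db <- getDatabase env Nothing
  -- Generated sample data: decimal keys with a fixed value.
  writeChunked db $
    S.fromList [(B.pack (show i), B.pack "value") | i <- [1 :: Int .. 100000]]
```

If a later chunk fails, earlier chunks have already been committed, which is the trade-off for avoiding one long-lived transaction.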

Deleting

deleteLMDB :: DeleteOptions -> Database emode -> Transaction ReadWrite emode -> ByteString -> IO ()

Deletes the given key from the given database using the given transaction.

clearDatabase :: Database ReadWrite -> IO ()

Clears, i.e., removes all key-value pairs from, the given database.

Warning: Under the hood, this function spawns a bound thread and creates a potentially long-lived read-write transaction; see Transactions.

Mode

class Mode a

A type class for ReadOnly and ReadWrite environments and transactions.

Minimal complete definition

isReadOnlyMode

Instances

Mode ReadOnly (defined in Streamly.External.LMDB.Internal)
Mode ReadWrite (defined in Streamly.External.LMDB.Internal)

data ReadWrite

Instances

Mode ReadWrite (defined in Streamly.External.LMDB.Internal)

data ReadOnly

Instances

Mode ReadOnly (defined in Streamly.External.LMDB.Internal)

type family SubMode emode tmode where ...

Enforces at the type level that ReadWrite environments support both ReadWrite and ReadOnly transactions, but ReadOnly environments support only ReadOnly transactions.

Equations

SubMode ReadWrite _ = () 
SubMode ReadOnly ReadOnly = () 
SubMode ReadOnly ReadWrite = TypeError ('Text "ReadOnly environments only support ReadOnly transactions") 

Error types

Miscellaneous

data ChunkSize

A chunk size.

Constructors

ChunkNumPairs !Int

Chunk up key-value pairs by number of pairs. The final chunk can have fewer pairs.

ChunkBytes !Int

Chunk up key-value pairs by number of bytes. As soon as the byte count for the keys and values is reached, a new chunk is created (such that each chunk has at least one key-value pair and can end up with more than the desired number of bytes). The final chunk can have less than the desired number of bytes.
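To make the ChunkBytes rule easier to picture, here is a small pure model on lists. It is only one reading of the description above and not the library's implementation; the function chunkByBytes is hypothetical, and whether the library closes a chunk at "greater than or equal" or at "greater than" the limit is an assumption.

```haskell
import qualified Data.ByteString.Char8 as B

-- | Model of byte-based chunking: pairs are added to the current chunk
-- until its accumulated key and value bytes reach the limit, at which
-- point the chunk (including the pair that reached the limit) is closed.
chunkByBytes ::
  Int ->
  [(B.ByteString, B.ByteString)] ->
  [[(B.ByteString, B.ByteString)]]
chunkByBytes limit = go [] 0
  where
    -- End of input: emit the final, possibly undersized, chunk if non-empty.
    go acc _ [] = [reverse acc | not (null acc)]
    go acc n (p@(k, v) : rest)
      | n' >= limit = reverse (p : acc) : go [] 0 rest
      | otherwise = go (p : acc) n' rest
      where
        n' = n + B.length k + B.length v

main :: IO ()
main = do
  let pairs = [(B.pack "a", B.pack "b"), (B.pack "c", B.pack "d"), (B.pack "e", B.pack "f")]
  -- With a limit of 4 bytes, the first two pairs fill a chunk.
  print (map length (chunkByBytes 4 pairs)) -- prints [2,1]
```

In this model every chunk holds at least one pair, and a single large pair can push a chunk beyond the limit, matching the two properties stated above.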

Instances

Show ChunkSize (defined in Streamly.External.LMDB.Internal)

data MaybeTxn tmode a where

A type for an optional thing where we want to fix the transaction mode to ReadOnly in the nothing case. (Maybe isn’t powerful enough for this.)

Constructors

NoTxn :: MaybeTxn ReadOnly a 
JustTxn :: a -> MaybeTxn tmode a 

data EitherTxn tmode a b where

A type for an Either-like choice where we want to fix the transaction mode to ReadOnly in the Left case. (Either isn’t powerful enough for this.)

Constructors

LeftTxn :: a -> EitherTxn ReadOnly a b 
RightTxn :: b -> EitherTxn tmode a b 

kibibyte :: Num a => a

A convenience constant for obtaining 1 KiB.

mebibyte :: Num a => a

A convenience constant for obtaining 1 MiB.

gibibyte :: Num a => a

A convenience constant for obtaining 1 GiB.

tebibyte :: Num a => a

A convenience constant for obtaining 1 TiB.