tar-conduit-0.4.0: Extract and create tar files using conduit for streaming
Safe HaskellSafe-Inferred
LanguageHaskell2010

Data.Conduit.Tar

Description

This module is about stream-processing tar archives. It is currently not very well tested. See the documentation of withEntries for an usage sample.

Synopsis

Basic functions

tar :: MonadThrow m => ConduitM (Either FileInfo ByteString) ByteString m FileOffset Source #

Create a tar archive by suppying a stream of Left FileInfos. Whenever a file type is FTNormal, it must be immediately followed by its content as Right ByteString. The produced ByteString is in the raw tar format and is properly terminated at the end, therefore it can not be extended afterwards. Returned is the total size of the bytestring as a FileOffset.

Since: 0.2.0

tarEntries :: MonadThrow m => ConduitM (Either Header ByteString) ByteString m FileOffset Source #

Just like tar, except gives you the ability to work at a lower Header level, versus more user friendly FileInfo. A deeper understanding of tar format is necessary in order to work directly with Headers.

Since: 0.2.0

untar :: MonadThrow m => (FileInfo -> ConduitM ByteString o m ()) -> ConduitM ByteString o m () Source #

Just like withFileInfo, but works directly on the stream of bytes. Applies pax header blocks and extended headers. However, only the 'comment', 'gid', 'gname', 'linkpath', 'path', 'size', 'uid' and 'uname' pax keywords are supported.

Since: 0.2.0

untarRaw :: MonadThrow m => (FileInfo -> ConduitM ByteString o m ()) -> ConduitM ByteString o m () Source #

Like untar but does not apply pax header blocks and extended headers.

Since: 0.3.3

untarWithFinalizers :: (MonadThrow m, MonadIO m) => (FileInfo -> ConduitM ByteString (IO ()) m ()) -> ConduitM ByteString c m () Source #

Just like untar, except that each FileInfo handling function can produce a finalizing action, all of which will be executed after the whole tarball has been processed in the opposite order. Very useful with restoreFile and restoreFileInto, since they restore direcory modification timestamps only after files have been fully written to disk.

Since: 0.2.0

untarWithExceptions :: (MonadThrow m, MonadIO m) => (FileInfo -> ConduitM ByteString (IO (FileInfo, [SomeException])) m ()) -> ConduitM ByteString c m [(FileInfo, [SomeException])] Source #

Same as untarWithFinalizers, but will also produce a list of any exceptions that might have occured during restoration process.

Since: 0.2.5

restoreFile :: MonadResource m => FileInfo -> ConduitM ByteString (IO ()) m () Source #

Restore files onto the file system. Produces actions that will set the modification time on the directories, which can be executed after the pipeline has finished and all files have been written to disk.

restoreFileInto :: MonadResource m => FilePath -> FileInfo -> ConduitM ByteString (IO ()) m () Source #

Restore all files into a folder. Absolute file paths will be turned into relative to the supplied folder.

restoreFileIntoLenient :: MonadResource m => FilePath -> FileInfo -> ConduitM ByteString (IO (FileInfo, [SomeException])) m () Source #

Restore all files into a folder. Absolute file paths will be turned into relative to the supplied folder. Yields a list with exceptions instead of throwing them.

Since: 0.2.5

restoreFileWithErrors Source #

Arguments

:: MonadResource m 
=> Bool

Lenient flag, results in exceptions thrown instead of collected when set to False.

-> FileInfo 
-> ConduitM ByteString (IO (FileInfo, [SomeException])) m () 

Restore files onto the file system, much in the same way restoreFile does it, except with ability to ignore restoring problematic files and report errors that occured as a list of exceptions, which will be returned as a list when finilizer executed. If a list is empty, it means, that no errors occured and a file only had a finilizer associated with it.

Since: 0.2.4

Operate on Chunks

untarChunks :: Monad m => ConduitM ByteString TarChunk m () Source #

Convert a stream of raw bytes into a stream of TarChunks, after applying any pax header blocks and extended headers. This stream can further be passed into withFileInfo or withHeaders functions. Only the 'comment', 'gid', 'gname', 'linkpath', 'path', 'size', 'uid' and 'uname' pax keywords are supported. For a component that produces unprocessed TarChunks, see untarChunksRaw.

Since: 0.2.1

untarChunksRaw :: Monad m => ConduitM ByteString TarChunk m () Source #

Convert a stream of raw bytes into a stream of raw TarChunks. This stream can further be passed into withFileInfo or withHeaders functions. For a component that further processes raw TarChunks to apply pax header blocks and extended headers, see untarChunk.

Since: 0.3.3

applyPaxChunkHeaders :: Monad m => ConduitM TarChunk TarChunk (StateT PaxState m) () Source #

Applies tar chunks that are pax header blocks and extended headers to the tar chunks that follow. However, only the 'comment', 'gid', 'gname', 'linkpath', 'path', 'size', 'uid' and 'uname' pax keywords are supported.

withEntry :: MonadThrow m => (Header -> ConduitM ByteString o m r) -> ConduitM TarChunk o m r Source #

Process a single tar entry. See withEntries for more details.

Since: 0.1.0

withEntries :: MonadThrow m => (Header -> ConduitM ByteString o m ()) -> ConduitM TarChunk o m () Source #

This function handles each entry of the tar archive according to the behaviour of the function passed as first argument.

Here is a full example function, that reads a compressed tar archive and for each entry that is a simple file, it prints its file path and SHA256 digest. Note that this function can throw exceptions!

import qualified Crypto.Hash.Conduit as CH
import qualified Data.Conduit.Tar    as CT

import Conduit
import Crypto.Hash (Digest, SHA256)
import Control.Monad (when)
import Data.Conduit.Zlib (ungzip)
import Data.ByteString (ByteString)

filedigests :: FilePath -> IO ()
filedigests fp = runConduitRes (  sourceFileBS fp          -- read the raw file
                               .| ungzip                   -- gunzip
                               .| CT.untarChunks           -- decode the tar archive
                               .| CT.withEntries hashentry -- process each file
                               .| printC                   -- print the results
                               )
    where
        hashentry :: Monad m => CT.Header -> Conduit ByteString m (FilePath, Digest SHA256)
        hashentry hdr = when (CT.headerFileType hdr == CT.FTNormal) $ do
            hash <- CH.sinkHash
            yield (CT.headerFilePath hdr, hash)

The hashentry function handles a single entry, based on its first Header argument. In this example, a Consumer is used to process the whole entry.

Note that the benefits of stream processing are easily lost when working with a Consumer. For example, the following implementation would have used an unbounded amount of memory:

        hashentry hdr = when (CT.headerFileType hdr == CT.FTNormal) $ do
            content <- mconcat <$> sinkList
            yield (CT.headerFilePath hdr, hash content)

Since: 0.1.0

withFileInfo :: MonadThrow m => (FileInfo -> ConduitM ByteString o m ()) -> ConduitM TarChunk o m () Source #

Extract a tarball, similarly to withEntries, but instead of dealing directly with tar format, this conduit allows you to work directly on file abstractions FileInfo. For now support is minimal:

  • Old v7 tar format.
  • ustar: POSIX 1003.1-1988 format
  • and only some portions of GNU format:
  • Larger values for fileUserId, fileGroupId, fileSize and fileModTime.
  • L type - long file names, but only up to 4096 chars to prevent DoS attack
  • other types are simply discarded

Note - Here is a really good reference for specifics of different tar formats: https://github.com/libarchive/libarchive/wiki/ManPageTar5

Since: 0.2.2

Helper functions

headerFileType :: Header -> FileType Source #

Get Header file type.

Since: 0.1.0

Creation

tarFilePath :: (MonadThrow m, MonadResource m) => ConduitM FilePath ByteString m FileOffset Source #

Recursively tar all of the files and directories. There will be no conversion between relative and absolute paths, so just like with GNU tar cli tool, it may be necessary to setCurrentDirectory in order to get the paths relative. Using filePathConduit directly, while modifying the filePath, would be another approach to handling the file paths.

Since: 0.2.0

filePathConduit :: (MonadThrow m, MonadResource m) => ConduitM FilePath (Either FileInfo ByteString) m () Source #

Turn a stream of file paths into a stream of FileInfo and file content. All paths will be decended into recursively.

Since: 0.2.0

Directly on files

createTarball Source #

Arguments

:: FilePath

File name for the tarball

-> [FilePath]

List of files and directories to include in the tarball

-> IO () 

Uses tarFilePath to create a tarball, that will recursively include the supplied list of all the files and directories

Since: 0.2.0

writeTarball Source #

Arguments

:: Handle

Handle where created tarball will be written to

-> [FilePath]

List of files and directories to include in the tarball

-> IO () 

Take a list of files and paths, recursively tar them and write output into supplied handle.

Since: 0.2.0

extractTarball Source #

Arguments

:: FilePath

Filename for the tarball

-> Maybe FilePath

Folder where tarball should be extract to. Default is the current path

-> IO () 

Extract a tarball while using restoreFileInfo for writing files onto the file system. Restoration process is cross platform and should work concistently both on Windows and Posix systems.

Since: 0.2.0

extractTarballLenient Source #

Arguments

:: FilePath

Filename for the tarball

-> Maybe FilePath

Folder where tarball should be extract to. Default is the current path

-> IO [(FileInfo, [SomeException])] 

Same as extractTarball, but ignores possible extraction errors. It can still throw a TarException if the tarball is corrupt or malformed.

Since: 0.2.5

Types