newline-0.0.2.1: newline specifications as values
Safe Haskell: Safe-Inferred
Language: Haskell2010

Text.Newline.LineMap

Description

Create a map of the lines in the file to allow fast seeking later. Specifically, for each line, we output:

  • the byte offset from the start of the file of the start of the line
  • the length of the line in number of bytes (including the line terminator, if any)
  • the type of line terminator that ended the line, if any
  • the non-decoded bytes of that line.

There is an associated file format to serialize this data, based on CSV. See documentation for display.

Currently, we only support utf8-encoded text with Unix line-endings (LF).

Documentation

data Line a

Holds a detected line. The main result type for this module.

Constructors

Line 

Fields

  • startOffset :: !Int

    offset in bytes of the start of the line from the start of the input file

  • content :: a

    the content of the line; generally does not include the newline

  • nlType :: Maybe Newline

    the terminator for this line, if any

  • length :: !Int

    length of the line in bytes, including the line terminator
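As a rough illustration of how these fields support seeking, the sketch below mirrors the record locally (it does not import this package) and slices a line's raw bytes back out of the full input. The `Terminator` type and the `rawBytes` helper are hypothetical names introduced for this example; the library's own `Newline` type may look different.

```haskell
import Prelude hiding (length)
import qualified Data.ByteString.Char8 as BC

-- Hypothetical stand-in for the library's Newline type.
data Terminator = LF | CRLF deriving (Eq, Show)

-- Local mirror of the documented record, not the package's definition.
data Line a = Line
  { startOffset :: !Int
  , content     :: a
  , nlType      :: Maybe Terminator
  , length      :: !Int
  } deriving (Eq, Show)

-- Recover the raw bytes of a line (terminator included) from the whole input,
-- using only the offset and length stored in the record.
rawBytes :: BC.ByteString -> Line a -> BC.ByteString
rawBytes input l = BC.take (length l) (BC.drop (startOffset l) input)

-- The second line of "ab\ncd\n": it starts at byte 3 and is 3 bytes long
-- once its LF is counted.
secondLine :: Line BC.ByteString
secondLine = Line
  { startOffset = 3
  , content     = BC.pack "cd"
  , nlType      = Just LF
  , length      = 3
  }
```

With these definitions, `rawBytes (BC.pack "ab\ncd\n") secondLine` yields the bytes `cd\n`, that is, the content plus its terminator.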

Instances

Functor Line

Defined in Text.Newline.LineMap

Methods

fmap :: (a -> b) -> Line a -> Line b

(<$) :: a -> Line b -> Line a
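The Functor instance presumably maps over the content field only and leaves the byte offsets and lengths alone. A minimal sketch under that assumption, using a locally defined mirror of the record (the `Terminator` type and the `asString` helper are hypothetical and not part of this package):

```haskell
import Prelude hiding (length)
import qualified Data.ByteString.Char8 as BC

-- Hypothetical stand-in for the library's Newline type.
data Terminator = LF | CRLF deriving (Eq, Show)

-- Local mirror of the documented record, not the package's definition.
data Line a = Line
  { startOffset :: !Int
  , content     :: a
  , nlType      :: Maybe Terminator
  , length      :: !Int
  } deriving (Eq, Show)

-- Hand-written instance: only the content is transformed; the offset,
-- terminator, and byte length are carried over unchanged.
instance Functor Line where
  fmap f (Line off c nl len) = Line off (f c) nl len

-- View the content as a String while keeping the byte-level bookkeeping.
-- BC.unpack maps each byte to one Char, so this is only faithful for ASCII;
-- real UTF-8 content would need a proper decoder.
asString :: Line BC.ByteString -> Line String
asString = fmap BC.unpack
```

Because `length` counts bytes of the original input, it stays valid for seeking even after the content has been converted to another type.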

display :: [Line a] -> String

Render contents for a linemap file.

The format is simply a three-column CSV with a header row. The columns are offset, length, and terminator, as above. Offset and length are decimal-encoded unsigned integers. The terminator column must hold one of the following strings:

  • unix for LF (ASCII 0x0A),
  • dos for CRLF (ASCII 0x0D 0x0A),
  • eof for end of file/input.

The output CSV does not require quoting, so the output actually abides by RFC 4180 (with the exception that I'm using LF instead of CRLF, sigh).
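The format described above can be sketched in a few lines. This is not the library's `display`; it is a stand-alone approximation that works on plain tuples. The exact header text (`offset,length,terminator`), the trailing LF after the last row, and the names `Terminator`, `Row`, `terminatorName`, and `renderRows` are all assumptions made for this example.

```haskell
import Data.List (intercalate)

-- Hypothetical stand-in for the library's Newline type.
data Terminator = LF | CRLF deriving (Eq, Show)

-- One row of the linemap: (offset, length, terminator).
-- Nothing stands for a line ended by end of file/input.
type Row = (Int, Int, Maybe Terminator)

-- The three strings allowed in the terminator column.
terminatorName :: Maybe Terminator -> String
terminatorName (Just LF)   = "unix"
terminatorName (Just CRLF) = "dos"
terminatorName Nothing     = "eof"

-- Render a header row followed by one CSV row per line, LF-separated.
-- No field can contain a comma or a quote, so no quoting is needed.
renderRows :: [Row] -> String
renderRows rows = unlines (header : map renderRow rows)
  where
    header = "offset,length,terminator" -- assumed header text
    renderRow (off, len, t) =
      intercalate "," [show off, show len, terminatorName t]
```

For the two-line input `ab\ncd` (no final newline), `renderRows [(0, 3, Just LF), (3, 2, Nothing)]` produces the header followed by the rows `0,3,unix` and `3,2,eof`.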

breakLines_unixUtf8

Arguments

:: ByteString

all bytes of a file

-> [Line ByteString] 

Split input into lines. Assumes utf8-encoded text with LF (ASCII 0x0A) line terminators. See breakLine_unixUtf8 to take a single line.

Does not include newlines in any Line content.

breakLine_unixUtf8

Arguments

:: Int

byte offset within file of input

-> ByteString

non-empty input bytes

-> (Line ByteString, ByteString)

resulting line and remaining input

Take one line of input, and also return the remaining input. Assumes utf8-encoded text with LF (ASCII 0x0A) line terminators. See breakLines_unixUtf8 to produce a list of all lines.

Does not include newlines in any Line content.
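The relationship between the single-line and whole-input functions can be sketched as a step function plus an unfold. The code below is a stand-alone approximation built on `bytestring`, not the package's source. The names `Terminator`, `breakOne`, and `breakAll` are hypothetical, and the handling of a trailing LF (no extra empty line is emitted after it) is an assumption rather than documented behavior.

```haskell
import Prelude hiding (length)
import qualified Data.ByteString as BS
import Data.List (unfoldr)

-- Hypothetical stand-in for the library's Newline type.
data Terminator = LF | CRLF deriving (Eq, Show)

-- Local mirror of the documented record, not the package's definition.
data Line a = Line
  { startOffset :: !Int
  , content     :: a
  , nlType      :: Maybe Terminator
  , length      :: !Int
  } deriving (Eq, Show)

-- Take one line starting at the given byte offset. The input is assumed to
-- be non-empty. Splitting on byte 0x0A is safe for UTF-8 because that byte
-- never occurs inside a multi-byte sequence.
breakOne :: Int -> BS.ByteString -> (Line BS.ByteString, BS.ByteString)
breakOne off input =
  case BS.elemIndex 0x0A input of
    -- LF found at index i: content excludes it, length includes it.
    Just i  -> (Line off (BS.take i input) (Just LF) (i + 1), BS.drop (i + 1) input)
    -- No LF left: the rest of the input is a final, unterminated line.
    Nothing -> (Line off input Nothing (BS.length input), BS.empty)

-- Repeat the single-line step until the input is exhausted, threading the
-- running byte offset through the unfold.
breakAll :: BS.ByteString -> [Line BS.ByteString]
breakAll = unfoldr step . (,) 0
  where
    step (off, rest)
      | BS.null rest = Nothing
      | otherwise    =
          let (l, rest') = breakOne off rest
          in Just (l, (off + length l, rest'))
```

In this sketch each line's `startOffset` equals the previous line's `startOffset` plus its `length`, so the lines tile the input without gaps.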