pdftotext: Extracts text from PDF using poppler

[ bsd3, library, pdf, program, text ]

The pdftotext package provides functions for extracting plain text from PDF documents. It uses the C++ library Poppler, which must be installed on the system. The output of the Haskell pdftotext library is identical to the output of Poppler's pdftotext tool.


Versions 0.0.1.0, 0.0.2.0, 0.1.0.0, 0.1.0.1
Change log CHANGELOG.md
Dependencies aeson (>=1.4 && <1.6), ansi-wl-pprint (>=0.6 && <0.7), base (>=4.11 && <5), bytestring (>=0.10 && <0.11), optparse-applicative (>=0.15 && <0.16), pdftotext, range (>=0.3 && <0.4), text (>=1.2 && <1.3)
License BSD-3-Clause
Copyright 2020 G. Eyaeb
Author G. Eyaeb
Maintainer geyaeb@protonmail.com
Category Text, PDF
Home page https://sr.ht/~geyaeb/haskell-pdftotext/
Bug tracker https://todo.sr.ht/~geyaeb/haskell-pdftotext
Source repo head: hg clone https://hg.sr.ht/~geyaeb/haskell-pdftotext
Uploaded by geyaeb at 2020-12-01T00:45:55Z
Distributions NixOS:0.1.0.1
Executables pdftotext.hs
Downloads 790 total (16 in the last 30 days)

Readme for pdftotext-0.1.0.1

pdftotext

Usage

import qualified Data.Text.IO as T
import Pdftotext

main :: IO ()
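main = do
  -- openFile yields a Maybe; handle Nothing instead of failing on a pattern match
  result <- openFile "path/to/file.pdf"
  case result of
    Nothing  -> putStrLn "Could not open PDF"
    Just pdf -> T.putStrLn $ pdftotext Physical pdf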

Executable

The pdftotext package comes with an executable, pdftotext.hs, which prints the text extracted from a PDF as well as basic information about the document.

$> pdftotext.hs info test/simple.pdf
File      : test/simple.pdf
Pages     : 4
Properties
  Title   : Simple document for testing
  Author  : G. Eyaeb
  Subject : Testing
  Creator : pdflatex
  Producer: LaTeX with hyperref
  Keywords: haskell,pdf
$> pdftotext.hs text --pages 1,4 test/simple.pdf
Simple document for testing

                  deserve neither
liberty nor safety.

See help for more information:

$> pdftotext.hs --help
$> pdftotext.hs text --help
$> pdftotext.hs info --help

Internals

The library calls Poppler through the FFI, so internally every function has an IO type. The non-IO variants wrap these functions with unsafePerformIO and should nevertheless be safe to use. The module Pdftotext.Internal exposes all of the IO-typed functions.
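
The general pattern of pairing an IO-typed FFI call with a pure wrapper can be sketched as follows. This is only an illustration under stated assumptions: it binds C's strlen from libc as a stand-in for a Poppler call, and the names c_strlen, strlenIO and strlenPure are hypothetical and do not appear in this package.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize (..))
import System.IO.Unsafe (unsafePerformIO)

-- Stand-in foreign function: strlen from libc (NOT a Poppler binding).
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- IO-typed variant: marshals the Haskell String to a temporary C string
-- and calls the foreign function.
strlenIO :: String -> IO Int
strlenIO s = withCString s $ \cs -> fromIntegral <$> c_strlen cs

-- Pure variant: the unsafePerformIO wrapper is reasonable here only because
-- the result depends solely on the argument and the call has no visible
-- side effects.
strlenPure :: String -> Int
strlenPure = unsafePerformIO . strlenIO
{-# NOINLINE strlenPure #-}

main :: IO ()
main = do
  n <- strlenIO "poppler"
  print n                          -- prints 7
  print (strlenPure "pdftotext")   -- prints 9
```

The same reasoning would apply to any wrapper of this kind: the pure variant is convenient in ordinary code, while the IO variant remains available when explicit sequencing matters.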

Contribute

The project is hosted at https://sr.ht/~geyaeb/haskell-pdftotext/ . The homepage provides links to the Mercurial repository, the mailing list and the ticket tracker.

Patches, suggestions, questions and general discussions can be sent to the mailing list. Detailed information about sending patches by email can be found at https://man.sr.ht/hg.sr.ht/email.md .