hs-conllu: Conllu validating parser and utils.

[ lgpl, library, program, unclassified ] [ Propose Tags ] [ Report a vulnerability ]

utilities to parse, print, diff, and analyse data in CoNLL-U format.


[Skip to Readme]

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees

Candidates

  • No Candidates
Versions [RSS] 0.1.2, 0.1.5
Change log CHANGELOG
Dependencies base (>=4.9 && <5), containers (>=0.6 && <0.7), directory (>=1.3 && <1.4), filepath (>=1.4 && <1.5), hs-conllu, megaparsec (>=9 && <10), void (<1) [details]
Tested with ghc ==8.8.3
License LGPL-3.0-only
Copyright 2021 bruno cuconato
Author bruno cuconato
Maintainer bruno cuconato <bcclaro+haskell+hsconllu@gmail.com>
Home page https://github.com/odanoburu/hs-conllu
Bug tracker https://github.com/odanoburu/hs-conllu/issues
Source repo head: git clone https://github.com/odanoburu/hs-conllu.git
Uploaded by odanoburu at 2021-04-17T14:14:53Z
Distributions NixOS:0.1.5
Reverse Dependencies 1 direct, 0 indirect [details]
Executables hs-conllu
Downloads 1056 total (11 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs available [build log]
Last success reported on 2021-04-17 [all 1 reports]

Readme for hs-conllu-0.1.5

[back to package description]
# -*- mode: org -*-
#+TITLE: hs-conllu

[[https://travis-ci.org/odanoburu/hs-conllu][file:https://travis-ci.org/odanoburu/hs-conllu.svg?branch=master]]
[[http://hackage.haskell.org/package/hs-conllu][file:https://img.shields.io/hackage/v/hs-conllu.svg?style=flt]]

this package provides a validating[fn:1] parser of the [[http://universaldependencies.org/format.html][CoNLL-U format]], along
with a data model for its constituents. reading, pretty-printing, and
diffing functions are also provided.

further processing utilities are being developed and will be placed in
a separate package.

* installation
=hs-conllu= is available on [[http://hackage.haskell.org/package/hs-conllu][Hackage]], but if you prefer to install from
source:
  #+BEGIN_SRC sh
  cd /path/of/choice/
  git clone $REPO_URL
  #+END_SRC
  - using =cabal=:
    #+BEGIN_SRC sh
    cabal install
    #+END_SRC
  - using =stack=:
    #+BEGIN_SRC sh
    stack setup
    stack build
    stack install --system-ghc
    #+END_SRC

  the library is tested with multiple GHC versions, on Linux and on
  OSX (thanks Travis!).

  if you have problems with the dependency versions, you may try to
  alter them in the cabal file for the version you have. the version
  bounds were generated automatically by cabal, and are probably
  conservative -- the library probably will probably still work if you
  have the same major version. (if it does, make a PR!)

  if you don't want to have this kind of problem anymore, try [[https://docs.haskellstack.org/en/stable/README/][stack]]
  (see why [[https://www.fpcomplete.com/blog/2015/06/why-is-stack-not-cabal][here]]).

* usage
  if you would like to request features, please open an issue.

** hs-conllu, the executable
   this executable can be called using stack by
   : stack exec hs-conllu [subcommand] [args]
   it currently has two subcommands:
   - validate :: read and pretty-print the file given as argument.
   - diff :: diff the two CoNLL-U files provided as arguments, and
             print them. this assumes changes have only been made to
             word fields, not to sentence ordering, etc. if you'd like
             finer grained diffing, you will have to use the library.

** Reading CoNLL-U files
   the reading functions are in the =IO= module.
   #+BEGIN_SRC sh
   $ ghci
   > import Conllu.IO
   > d <- readConllu "path/to/conllu"
   #+END_SRC
   will read the file at the specified path, or all the =*.conllu=
   files in that path.

   if your CoNLL-U files don't stricly follow the specification or I
   got the parser wrong, please open an issue! aditionally, you may
   solve your problem if you take a look at the =Parser= module.

** Customizable parsers
   if you just want to tweak how a few fields of the CoNLL-U format
   are parsed, you may write a parser for that field and then
   customize the standard parser with it. see the Haddock
   documentation for the =Parse= module.

   I didn't make the parser as customizable as it could be, so if that
   bothers you, please create an issue or file a PR!

** Pretty-Printing
   the printing functions are in the =Print= module. see the Haddock
   documentation!

** Diffing
   see the =Diff= module Haddock documentation.

* contributing
  I'm a new haskeller, so any help will probably be useful -- even if
  its just a few pointers and comments on how I can improve the
  library or my code.

  if you want to contribute code, let me know, and go right on. you
  may want to look at the =TODO.org= file.

* Footnotes

[fn:1] it currently only validates the CoNLL-U syntax, not its
semantics (i.e., it will report an error if it finds a letter on the
ID field, but won't complain if you specified an inexisting word as
HEAD of another word).