duckling: A Haskell library for parsing text into structured data.

This is a package candidate release! Here you can preview how this package release will appear once published to the main package index (which can be accomplished via the 'maintain' link below). Please note that once a package has been published to the main package index it cannot be undone! Please consult the package uploading documentation for more information.

[maintain] [Publish]

Warnings:

Duckling is a library for parsing text into structured data.


[Skip to Readme]

Properties

Versions 0.1.0.0, 0.1.1.0, 0.1.1.0, 0.1.2.0, 0.1.3.0, 0.1.4.0, 0.1.5.0, 0.1.6.0, 0.1.6.1, 0.2.0.0
Change log None available
Dependencies aeson (>=0.11.3.0 && <1.1), array (>=0.5.1.1 && <0.6), attoparsec (>=0.13.1.0 && <0.14), base (>=4.8.2 && <5.0), bytestring (>=0.10.6.0 && <0.11), containers (>=0.5.6.2 && <0.6), deepseq (>=1.4.1.1 && <1.5), dependent-sum (>=0.3.2.2 && <0.5), directory (>=1.2.2.0 && <1.4), duckling, extra (>=1.4.10 && <1.6), filepath (>=1.4.0.0 && <1.5), hashable (>=1.2.4.0 && <1.3), haskell-src-exts (>=1.18 && <1.19), regex-base (>=0.93.2 && <0.94), regex-pcre (>=0.94.4 && <0.95), snap-core (>=1.0.2.0 && <1.1), snap-server (>=1.0.1.1 && <1.1), text (>=1.2.2.1 && <1.3), text-show (>=2.1.2 && <3.5), time (>=1.5.0.1 && <1.9), timezone-olson (>=0.1.7 && <0.2), timezone-series (>=0.1.5.1 && <0.2), unordered-containers (>=0.2.7.2 && <0.3) [details]
License LicenseRef-OtherLicense[multiple license files]
Copyright Copyright (c) 2014-present, Facebook, Inc.
Author Facebook, Inc.
Maintainer duckling-team@fb.com
Category Systems
Home page https://github.com/facebookincubator/duckling#readme
Bug tracker https://github.com/facebookincubator/duckling/issues
Source repo head: git clone https://github.com/facebookincubator/duckling
Uploaded by patapizza at 2017-06-19T23:02:40Z

Modules

[Index]

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees


Readme for duckling-0.1.1.0

[back to package description]

Duckling Logo

Duckling Build Status

Duckling is a Haskell library that parses text into structured data.

"the first Tuesday of October"
=> {"value":"2017-10-03T00:00:00.000-07:00","grain":"day"}

Requirements

A Haskell environment is required. We recommend using stack.

On macOS you'll need to install PCRE development headers. The easiest way to do that is with Homebrew:

brew install pcre

If that doesn't help, try running brew doctor and fix the issues it finds.

Quickstart

To compile and run the binary:

$ stack build
$ stack exec duckling-example-exe

The first time you run it, it will download all required packages.

This runs a basic HTTP server. Example request:

$ curl -XPOST http://0.0.0.0:8000/parse --data 'text=tomorrow at eight'

See exe/ExampleMain.hs for an example on how to integrate Duckling in your project.

Supported dimensions

Duckling supports many languages, but most don't support all dimensions yet (we need your help!).

Dimension Example input Example value output
AmountOfMoney "42€" {"value":42,"type":"value","unit":"EUR"}
Distance "6 miles" {"value":6,"type":"value","unit":"mile"}
Duration "3 mins" {"value":3,"minute":3,"unit":"minute","normalized":{"value":180,"unit":"second"}}
Email "duckling-team@fb.com" {"value":"duckling-team@fb.com"}
Numeral "eighty eight" {"value":88,"type":"value"}
Ordinal "33rd" {"value":33,"type":"value"}
PhoneNumber "+1 (650) 123-4567" {"value":"(+1) 6501234567"}
Quantity "3 cups of sugar" {"value":3,"type":"value","product":"sugar","unit":"cup"}
Temperature "80F" {"value":80,"type":"value","unit":"fahrenheit"}
Time "today at 9am" {"values":[{"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}],"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}
Url "https://api.wit.ai/message?q=hi" {"value":"https://api.wit.ai/message?q=hi","domain":"api.wit.ai"}
Volume "4 gallons" {"value":4,"type":"value","unit":"gallon"}

Extending Duckling

To regenerate the classifiers and run the test suite:

$ stack build :duckling-regen-exe && stack exec duckling-regen-exe && stack test

It's important to regenerate the classifiers after updating the code and before running the test suite.

To extend Duckling's support for a dimension in a given language, typically 2 files need to be updated:

Rules have a name, a pattern and a production. Patterns are used to perform character-level matching (regexes on input) and concept-level matching (predicates on tokens). Productions are arbitrary functions that take a list of tokens and return a new token.

The corpus (resp. negative corpus) is a list of examples that should (resp. shouldn't) parse. The reference time for the corpus is Tuesday Feb 12, 2013 at 4:30am.

Duckling.Debug provides a few debugging tools:

$ stack repl --no-load
> :l Duckling.Debug
> debug EN "in two minutes" [This Time]
in|within|after <duration> (in two minutes)
-- regex (in)
-- <integer> <unit-of-duration> (two minutes)
-- -- integer (0..19) (two)
-- -- -- regex (two)
-- -- minute (grain) (minutes)
-- -- -- regex (minutes)
[Entity {dim = "time", body = "in two minutes", value = "{\"values\":[{\"value\":\"2013-02-12T04:32:00.000-02:00\",\"grain\":\"second\",\"type\":\"value\"}],\"value\":\"2013-02-12T04:32:00.000-02:00\",\"grain\":\"second\",\"type\":\"value\"}", start = 0, end = 14}]

License

Duckling is BSD-licensed. We also provide an additional patent grant.