bloodhound-0.12.1.0: ElasticSearch client library for Haskell

Copyright(C) 2014 Chris Allen
LicenseBSD-style (see the file LICENSE)
MaintainerChris Allen <cma@bitemyapp.com
Stabilityprovisional
PortabilityOverloadedStrings
Safe HaskellNone
LanguageHaskell2010

Database.Bloodhound.Client

Contents

Description

Client side functions for talking to Elasticsearch servers.

Synopsis

Bloodhound client functions

The examples in this module assume the following code has been run. The :{ and :} will only work in GHCi. You'll only need the data types and typeclass instances for the functions that make use of them.

>>> :set -XOverloadedStrings
>>> :set -XDeriveGeneric
>>> import Database.Bloodhound
>>> let testServer = (Server "http://localhost:9200")
>>> let runBH' = withBH defaultManagerSettings testServer
>>> let testIndex = IndexName "twitter"
>>> let testMapping = MappingName "tweet"
>>> let defaultIndexSettings = IndexSettings (ShardCount 1) (ReplicaCount 0)
>>> data TweetMapping = TweetMapping deriving (Eq, Show)
>>> _ <- runBH' $ deleteIndex testIndex >> deleteMapping testIndex testMapping
>>> import GHC.Generics
>>> import           Data.Time.Calendar        (Day (..))
>>> import Data.Time.Clock (UTCTime (..), secondsToDiffTime)
>>> :{
instance ToJSON TweetMapping where
         toJSON TweetMapping =
           object ["properties" .=
             object ["location" .=
               object ["type" .= ("geo_point" :: Text)]]]
data Location = Location { lat :: Double
                        , lon :: Double } deriving (Eq, Generic, Show)
data Tweet = Tweet { user     :: Text
                   , postDate :: UTCTime
                   , message  :: Text
                   , age      :: Int
                   , location :: Location } deriving (Eq, Generic, Show)
exampleTweet = Tweet { user     = "bitemyapp"
                     , postDate = UTCTime
                                  (ModifiedJulianDay 55000)
                                  (secondsToDiffTime 10)
                     , message  = "Use haskell!"
                     , age      = 10000
                     , location = Location 40.12 (-71.34) }
instance ToJSON   Tweet where
 toJSON = genericToJSON defaultOptions
instance FromJSON Tweet where
 parseJSON = genericParseJSON defaultOptions
instance ToJSON   Location where
 toJSON = genericToJSON defaultOptions
instance FromJSON Location where
 parseJSON = genericParseJSON defaultOptions
data BulkTest = BulkTest { name :: Text } deriving (Eq, Generic, Show)
instance FromJSON BulkTest where
 parseJSON = genericParseJSON defaultOptions
instance ToJSON BulkTest where
 toJSON = genericToJSON defaultOptions
:}

withBH :: ManagerSettings -> Server -> BH IO a -> IO a Source

Convenience function that sets up a manager and BHEnv and runs the given set of bloodhound operations. Connections will be pipelined automatically in accordance with the given manager settings in IO. If you've got your own monad transformer stack, you should use runBH directly.

Indices

createIndex :: MonadBH m => IndexSettings -> IndexName -> m Reply Source

createIndex will create an index given a Server, IndexSettings, and an IndexName.

>>> response <- runBH' $ createIndex defaultIndexSettings (IndexName "didimakeanindex")
>>> respIsTwoHunna response
True
>>> runBH' $ indexExists (IndexName "didimakeanindex")
True

deleteIndex :: MonadBH m => IndexName -> m Reply Source

deleteIndex will delete an index given a Server, and an IndexName.

>>> _ <- runBH' $ createIndex defaultIndexSettings (IndexName "didimakeanindex")
>>> response <- runBH' $ deleteIndex (IndexName "didimakeanindex")
>>> respIsTwoHunna response
True
>>> runBH' $ indexExists testIndex
False

updateIndexSettings :: MonadBH m => NonEmpty UpdatableIndexSetting -> IndexName -> m Reply Source

updateIndexSettings will apply a non-empty list of setting updates to an index

>>> _ <- runBH' $ createIndex defaultIndexSettings (IndexName "unconfiguredindex")
>>> response <- runBH' $ updateIndexSettings (BlocksWrite False :| []) (IndexName "unconfiguredindex")
>>> respIsTwoHunna response
True

optimizeIndex :: MonadBH m => IndexSelection -> IndexOptimizationSettings -> m Reply Source

optimizeIndex will optimize a single index, list of indexes or all indexes. Note that this call will block until finishing but will continue even if the request times out. Concurrent requests to optimize an index while another is performing will block until the previous one finishes. For more information see https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-optimize.html. Nothing worthwhile comes back in the reply body, so matching on the status should suffice.

optimizeIndex with a maxNumSegments of 1 and onlyExpungeDeletes to True is the main way to release disk space back to the OS being held by deleted documents.

Note that this API was deprecated in ElasticSearch 2.1 for the almost completely identical forcemerge API. Adding support to that API would be trivial but due to the significant breaking changes, this library cannot currently be used with >= 2.0, so that feature was omitted.

>>> let ixn = IndexName "unoptimizedindex"
>>> _ <- runBH' $ deleteIndex ixn >> createIndex defaultIndexSettings ixn
>>> response <- runBH' $ optimizeIndex (IndexList (ixn :| [])) (defaultIndexOptimizationSettings { maxNumSegments = Just 1, onlyExpungeDeletes = True })
>>> respIsTwoHunna response
True

indexExists :: MonadBH m => IndexName -> m Bool Source

indexExists enables you to check if an index exists. Returns Bool in IO

>>> exists <- runBH' $ indexExists testIndex

openIndex :: MonadBH m => IndexName -> m Reply Source

openIndex opens an index given a Server and an IndexName. Explained in further detail at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-open-close.html

>>> reply <- runBH' $ openIndex testIndex

closeIndex :: MonadBH m => IndexName -> m Reply Source

closeIndex closes an index given a Server and an IndexName. Explained in further detail at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-open-close.html

>>> reply <- runBH' $ closeIndex testIndex

listIndices :: (MonadThrow m, MonadBH m) => m [IndexName] Source

listIndices returns a list of all index names on a given Server

waitForYellowIndex :: MonadBH m => IndexName -> m Reply Source

Block until the index becomes available for indexing documents. This is useful for integration tests in which indices are rapidly created and deleted.

Index Aliases

updateIndexAliases :: MonadBH m => NonEmpty IndexAliasAction -> m Reply Source

updateIndexAliases updates the server's index alias table. Operations are atomic. Explained in further detail at https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html

>>> let src = IndexName "a-real-index"
>>> let aliasName = IndexName "an-alias"
>>> let iAlias = IndexAlias src (IndexAliasName aliasName)
>>> let aliasCreate = IndexAliasCreate Nothing Nothing
>>> _ <- runBH' $ deleteIndex src
>>> respIsTwoHunna <$> runBH' (createIndex defaultIndexSettings src)
True
>>> runBH' $ indexExists src
True
>>> respIsTwoHunna <$> runBH' (updateIndexAliases (AddAlias iAlias aliasCreate :| []))
True
>>> runBH' $ indexExists aliasName
True

getIndexAliases :: (MonadBH m, MonadThrow m) => m (Either EsError IndexAliasesSummary) Source

Get all aliases configured on the server.

Index Templates

putTemplate :: MonadBH m => IndexTemplate -> TemplateName -> m Reply Source

putTemplate creates a template given an IndexTemplate and a TemplateName. Explained in further detail at https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-templates.html

>>> let idxTpl = IndexTemplate (TemplatePattern "tweet-*") (Just (IndexSettings (ShardCount 1) (ReplicaCount 1))) [toJSON TweetMapping]
>>> resp <- runBH' $ putTemplate idxTpl (TemplateName "tweet-tpl")

templateExists :: MonadBH m => TemplateName -> m Bool Source

templateExists checks to see if a template exists.

>>> exists <- runBH' $ templateExists (TemplateName "tweet-tpl")

deleteTemplate :: MonadBH m => TemplateName -> m Reply Source

deleteTemplate is an HTTP DELETE and deletes a template.

>>> let idxTpl = IndexTemplate (TemplatePattern "tweet-*") (Just (IndexSettings (ShardCount 1) (ReplicaCount 1))) [toJSON TweetMapping]
>>> _ <- runBH' $ putTemplate idxTpl (TemplateName "tweet-tpl")
>>> resp <- runBH' $ deleteTemplate (TemplateName "tweet-tpl")

Mapping

putMapping :: (MonadBH m, ToJSON a) => IndexName -> MappingName -> a -> m Reply Source

putMapping is an HTTP PUT and has upsert semantics. Mappings are schemas for documents in indexes.

>>> _ <- runBH' $ createIndex defaultIndexSettings testIndex
>>> resp <- runBH' $ putMapping testIndex testMapping TweetMapping
>>> print resp
Response {responseStatus = Status {statusCode = 200, statusMessage = "OK"}, responseVersion = HTTP/1.1, responseHeaders = [("Content-Type","application/json; charset=UTF-8"),("Content-Length","21")], responseBody = "{\"acknowledged\":true}", responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}

deleteMapping :: MonadBH m => IndexName -> MappingName -> m Reply Source

deleteMapping is an HTTP DELETE and deletes a mapping for a given index. Mappings are schemas for documents in indexes.

>>> _ <- runBH' $ createIndex defaultIndexSettings testIndex
>>> _ <- runBH' $ putMapping testIndex testMapping TweetMapping
>>> resp <- runBH' $ deleteMapping testIndex testMapping
>>> print resp
Response {responseStatus = Status {statusCode = 200, statusMessage = "OK"}, responseVersion = HTTP/1.1, responseHeaders = [("Content-Type","application/json; charset=UTF-8"),("Content-Length","21")], responseBody = "{\"acknowledged\":true}", responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}

Documents

indexDocument :: (ToJSON doc, MonadBH m) => IndexName -> MappingName -> IndexDocumentSettings -> doc -> DocId -> m Reply Source

indexDocument is the primary way to save a single document in Elasticsearch. The document itself is simply something we can convert into a JSON Value. The DocId will function as the primary key for the document.

>>> resp <- runBH' $ indexDocument testIndex testMapping defaultIndexDocumentSettings exampleTweet (DocId "1")
>>> print resp
Response {responseStatus = Status {statusCode = 201, statusMessage = "Created"}, responseVersion = HTTP/1.1, responseHeaders = [("Content-Type","application/json; charset=UTF-8"),("Content-Length","74")], responseBody = "{\"_index\":\"twitter\",\"_type\":\"tweet\",\"_id\":\"1\",\"_version\":1,\"created\":true}", responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}

updateDocument :: (ToJSON patch, MonadBH m) => IndexName -> MappingName -> IndexDocumentSettings -> patch -> DocId -> m Reply Source

updateDocument provides a way to perform an partial update of a an already indexed document.

getDocument :: MonadBH m => IndexName -> MappingName -> DocId -> m Reply Source

getDocument is a straight-forward way to fetch a single document from Elasticsearch using a Server, IndexName, MappingName, and a DocId. The DocId is the primary key for your Elasticsearch document.

>>> yourDoc <- runBH' $ getDocument testIndex testMapping (DocId "1")

documentExists :: MonadBH m => IndexName -> MappingName -> Maybe DocumentParent -> DocId -> m Bool Source

documentExists enables you to check if a document exists. Returns Bool in IO

>>> exists <- runBH' $ documentExists testIndex testMapping Nothing (DocId "1")

deleteDocument :: MonadBH m => IndexName -> MappingName -> DocId -> m Reply Source

deleteDocument is the primary way to delete a single document.

>>> _ <- runBH' $ deleteDocument testIndex testMapping (DocId "1")

Searching

searchAll :: MonadBH m => Search -> m Reply Source

searchAll, given a Search, will perform that search against all indexes on an Elasticsearch server. Try to avoid doing this if it can be helped.

>>> let query = TermQuery (Term "user" "bitemyapp") Nothing
>>> let search = mkSearch (Just query) Nothing
>>> reply <- runBH' $ searchAll search

searchByIndex :: MonadBH m => IndexName -> Search -> m Reply Source

searchByIndex, given a Search and an IndexName, will perform that search against all mappings within an index on an Elasticsearch server.

>>> let query = TermQuery (Term "user" "bitemyapp") Nothing
>>> let search = mkSearch (Just query) Nothing
>>> reply <- runBH' $ searchByIndex testIndex search

searchByType :: MonadBH m => IndexName -> MappingName -> Search -> m Reply Source

searchByType, given a Search, IndexName, and MappingName, will perform that search against a specific mapping within an index on an Elasticsearch server.

>>> let query = TermQuery (Term "user" "bitemyapp") Nothing
>>> let search = mkSearch (Just query) Nothing
>>> reply <- runBH' $ searchByType testIndex testMapping search

scanSearch :: (FromJSON a, MonadBH m, MonadThrow m) => IndexName -> MappingName -> Search -> m [Hit a] Source

scanSearch uses the 'scan&scroll' API of elastic, for a given IndexName and MappingName. Note that this will consume the entire search result set and will be doing O(n) list appends so this may not be suitable for large result sets. In that case, getInitialScroll and advanceScroll are good low level tools. You should be able to hook them up trivially to conduit, pipes, or your favorite streaming IO abstraction of choice. Note that ordering on the search would destroy performance and thus is ignored.

getInitialScroll :: MonadBH m => IndexName -> MappingName -> Search -> m (Maybe ScrollId) Source

For a given scearch, request a scroll for efficient streaming of search results. Note that the search is put into SearchTypeScan mode and thus results will not be sorted. Combine this with advanceScroll to efficiently stream through the full result set

advanceScroll Source

Arguments

:: (FromJSON a, MonadBH m, MonadThrow m) 
=> ScrollId 
-> NominalDiffTime

How long should the snapshot of data be kept around? This timeout is updated every time advanceScroll is used, so don't feel the need to set it to the entire duration of your search processing. Note that durations < 1s will be rounded up. Also note that NominalDiffTime is an instance of Num so literals like 60 will be interpreted as seconds. 60s is a reasonable default.

-> m (Either EsError (SearchResult a)) 

Use the given scroll to fetch the next page of documents. If there are no further pages, 'SearchResult.searchHits.hits' will be '[]'.

refreshIndex :: MonadBH m => IndexName -> m Reply Source

refreshIndex will force a refresh on an index. You must do this if you want to read what you wrote.

>>> _ <- runBH' $ createIndex defaultIndexSettings testIndex
>>> _ <- runBH' $ refreshIndex testIndex

mkSearch :: Maybe Query -> Maybe Filter -> Search Source

mkSearch is a helper function for defaulting additional fields of a Search to Nothing in case you only care about your Query and Filter. Use record update syntax if you want to add things like aggregations or highlights while still using this helper function.

>>> let query = TermQuery (Term "user" "bitemyapp") Nothing
>>> mkSearch (Just query) Nothing
Search {queryBody = Just (TermQuery (Term {termField = "user", termValue = "bitemyapp"}) Nothing), filterBody = Nothing, sortBody = Nothing, aggBody = Nothing, highlight = Nothing, trackSortScores = False, from = From 0, size = Size 10, searchType = SearchTypeQueryThenFetch, fields = Nothing, source = Nothing}

mkAggregateSearch :: Maybe Query -> Aggregations -> Search Source

mkAggregateSearch is a helper function that defaults everything in a Search except for the Query and the Aggregation.

>>> let terms = TermsAgg $ (mkTermsAggregation "user") { termCollectMode = Just BreadthFirst }
>>> terms
TermsAgg (TermsAggregation {term = Left "user", termInclude = Nothing, termExclude = Nothing, termOrder = Nothing, termMinDocCount = Nothing, termSize = Nothing, termShardSize = Nothing, termCollectMode = Just BreadthFirst, termExecutionHint = Nothing, termAggs = Nothing})
>>> let myAggregation = mkAggregateSearch Nothing $ mkAggregations "users" terms

mkHighlightSearch :: Maybe Query -> Highlights -> Search Source

mkHighlightSearch is a helper function that defaults everything in a Search except for the Query and the Aggregation.

>>> let query = QueryMatchQuery $ mkMatchQuery (FieldName "_all") (QueryString "haskell")
>>> let testHighlight = Highlights Nothing [FieldHighlight (FieldName "message") Nothing]
>>> let search = mkHighlightSearch (Just query) testHighlight

bulk :: MonadBH m => Vector BulkOperation -> m Reply Source

bulk uses Elasticsearch's bulk API to perform bulk operations. The BulkOperation data type encodes the index/update/delete/create operations. You pass a Vector of BulkOperations and a Server to bulk in order to send those operations up to your Elasticsearch server to be performed. I changed from [BulkOperation] to a Vector due to memory overhead.

>>> let stream = V.fromList [BulkIndex testIndex testMapping (DocId "2") (toJSON (BulkTest "blah"))]
>>> _ <- runBH' $ bulk stream
>>> _ <- runBH' $ refreshIndex testIndex

pageSearch Source

Arguments

:: From

The result offset

-> Size

The number of results to return

-> Search

The current seach

-> Search

The paged search

pageSearch is a helper function that takes a search and assigns the from and size fields for the search. The from parameter defines the offset from the first result you want to fetch. The size parameter allows you to configure the maximum amount of hits to be returned.

>>> let query = QueryMatchQuery $ mkMatchQuery (FieldName "_all") (QueryString "haskell")
>>> let search = mkSearch (Just query) Nothing
>>> search
Search {queryBody = Just (QueryMatchQuery (MatchQuery {matchQueryField = FieldName "_all", matchQueryQueryString = QueryString "haskell", matchQueryOperator = Or, matchQueryZeroTerms = ZeroTermsNone, matchQueryCutoffFrequency = Nothing, matchQueryMatchType = Nothing, matchQueryAnalyzer = Nothing, matchQueryMaxExpansions = Nothing, matchQueryLenient = Nothing, matchQueryBoost = Nothing})), filterBody = Nothing, sortBody = Nothing, aggBody = Nothing, highlight = Nothing, trackSortScores = False, from = From 0, size = Size 10, searchType = SearchTypeQueryThenFetch, fields = Nothing, source = Nothing}
>>> pageSearch (From 10) (Size 100) search
Search {queryBody = Just (QueryMatchQuery (MatchQuery {matchQueryField = FieldName "_all", matchQueryQueryString = QueryString "haskell", matchQueryOperator = Or, matchQueryZeroTerms = ZeroTermsNone, matchQueryCutoffFrequency = Nothing, matchQueryMatchType = Nothing, matchQueryAnalyzer = Nothing, matchQueryMaxExpansions = Nothing, matchQueryLenient = Nothing, matchQueryBoost = Nothing})), filterBody = Nothing, sortBody = Nothing, aggBody = Nothing, highlight = Nothing, trackSortScores = False, from = From 10, size = Size 100, searchType = SearchTypeQueryThenFetch, fields = Nothing, source = Nothing}

mkShardCount :: Int -> Maybe ShardCount Source

mkShardCount is a straight-forward smart constructor for ShardCount which rejects Int values below 1 and above 1000.

>>> mkShardCount 10
Just (ShardCount 10)

mkReplicaCount :: Int -> Maybe ReplicaCount Source

mkReplicaCount is a straight-forward smart constructor for ReplicaCount which rejects Int values below 0 and above 1000.

>>> mkReplicaCount 10
Just (ReplicaCount 10)

getStatus :: MonadBH m => m (Maybe Status) Source

getStatus fetches the Status of a Server

>>> serverStatus <- runBH' getStatus
>>> fmap status (serverStatus)
Just 200

Snapshot/Restore

Snapshot Repos

getSnapshotRepos :: (MonadBH m, MonadThrow m) => SnapshotRepoSelection -> m (Either EsError [GenericSnapshotRepo]) Source

getSnapshotRepos gets the definitions of a subset of the defined snapshot repos.

updateSnapshotRepo Source

Arguments

:: (MonadBH m, SnapshotRepo repo) 
=> SnapshotRepoUpdateSettings

Use defaultSnapshotRepoUpdateSettings if unsure

-> repo 
-> m Reply 

Create or update a snapshot repo

verifySnapshotRepo :: (MonadBH m, MonadThrow m) => SnapshotRepoName -> m (Either EsError SnapshotVerification) Source

Verify if a snapshot repo is working. NOTE: this API did not make it into ElasticSearch until 1.4. If you use an older version, you will get an error here.

Snapshots

getSnapshots :: (MonadBH m, MonadThrow m) => SnapshotRepoName -> SnapshotSelection -> m (Either EsError [SnapshotInfo]) Source

Get info about known snapshots given a pattern and repo name.

deleteSnapshot :: MonadBH m => SnapshotRepoName -> SnapshotName -> m Reply Source

Delete a snapshot. Cancels if it is running.

Restoring Snapshots

restoreSnapshot Source

Arguments

:: MonadBH m 
=> SnapshotRepoName 
-> SnapshotName 
-> SnapshotRestoreSettings

Start with defaultSnapshotRestoreSettings and customize from there for reasonable defaults.

-> m Reply 

Nodes

Request Utilities

encodeBulkOperations :: Vector BulkOperation -> ByteString Source

encodeBulkOperations is a convenience function for dumping a vector of BulkOperation into an ByteString

>>> let bulkOps = V.fromList [BulkIndex testIndex testMapping (DocId "2") (toJSON (BulkTest "blah"))]
>>> encodeBulkOperations bulkOps
"\n{\"index\":{\"_type\":\"tweet\",\"_id\":\"2\",\"_index\":\"twitter\"}}\n{\"name\":\"blah\"}\n"

encodeBulkOperation :: BulkOperation -> ByteString Source

encodeBulkOperation is a convenience function for dumping a single BulkOperation into an ByteString

>>> let bulkOp = BulkIndex testIndex testMapping (DocId "2") (toJSON (BulkTest "blah"))
>>> encodeBulkOperation bulkOp
"{\"index\":{\"_type\":\"tweet\",\"_id\":\"2\",\"_index\":\"twitter\"}}\n{\"name\":\"blah\"}"

Authentication

basicAuthHook :: Monad m => EsUsername -> EsPassword -> Request -> m Request Source

This is a hook that can be set via the bhRequestHook function that will authenticate all requests using an HTTP Basic Authentication header. Note that it is *strongly* recommended that this option only be used over an SSL connection.

> (mkBHEnv myServer myManager) { bhRequestHook = basicAuthHook (EsUsername "myuser") (EsPassword "mypass") }

Reply-handling tools

isVersionConflict :: Reply -> Bool Source

Was there an optimistic concurrency control conflict when indexing a document?

parseEsResponse :: (MonadThrow m, FromJSON a) => Reply -> m (Either EsError a) Source

Tries to parse a response body as the expected type a and failing that tries to parse it as an EsError. All well-formed, JSON responses from elasticsearch should fall into these two categories. If they don't, a EsProtocolException will be thrown. If you encounter this, please report the full body it reports along with your ElasticSearch verison.