silero-vad: Voice activity detection powered by SileroVAD.

Tags: audio, library, mit, sound

A Haskell implementation of SileroVAD, a pre-trained enterprise-grade voice activity detector.


Modules

  • Silero
    • Silero.Detector
    • Silero.Model

Flags

Automatic Flags

Name: build-readme
Description: Build the literate Haskell README example.
Default: Disabled

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.

Versions: 0.1.0.0, 0.1.0.1, 0.1.0.2, 0.1.0.3, 0.1.0.4, 0.1.0.5
Change log: CHANGELOG.md
Dependencies: base (>=4.14.3.0 && <5), derive-storable (>=0.3.0.0 && <1), unix (>=2 && <3), unliftio (>=0.2.20.0 && <1), vector (>=0.13.0.0 && <1), WAVE (>=0.1 && <0.2), Win32 (>=2 && <3)
Tested with: ghc ==9.8, ghc ==9.2.8, ghc ==8.10.7
License: MIT
Author: qwbarch
Maintainer: qwbarch <qwbarch@gmail.com>
Category: Audio, Sound
Home page: https://github.com/qwbarch/silero-vad-hs
Bug tracker: https://github.com/qwbarch/silero-vad-hs/issues
Uploaded: by qwbarch at 2024-12-26T10:39:14Z
Executables: readme
Downloads: 113 total (113 in the last 30 days)
Rating: (no votes yet)
Status: Docs not available. All reported builds failed as of 2024-12-26.

Readme for silero-vad-0.1.0.3

silero-vad-hs

Voice activity detection powered by SileroVAD.

Supported architectures

Tested on GHC 9.8, GHC 9.2.8, and GHC 8.10.7.

  • build-linux-x64
  • build-mac-arm64
  • build-mac-x64
  • build-windows-x64

Quick start

This README is a literate Haskell file. Run the example with:

nix develop --command bash -c '
  export LD_LIBRARY_PATH=lib:$(nix path-info .#stdenv.cc.cc.lib)/lib
  cabal run --flags="build-readme"
'

Imports for the example (the code shown here needs no language extensions):

import qualified Data.Vector.Storable as Vector
import Data.Function ((&))
import Data.WAVE (sampleToDouble, WAVE (waveSamples), getWAVEFile)
import Silero (withVad, withModel, detectSegments, detectSpeech, windowLength)

This example uses the WAVE library for simplicity.
Unfortunately, WAVE represents audio as a lazy linked list, which is slow for anything but small files.
Prefer the wave library for better performance.

main :: IO ()
main = do
  wav <- getWAVEFile "lib/jfk.wav"

The functions below expect a storable Vector Float. The following converts the samples to that format.

  let samples =
        concat (waveSamples wav)
          & Vector.fromList
          & Vector.map (realToFrac . sampleToDouble)

Use detectSegments to detect the start/end times of voice activity segments.

  withVad $ \vad -> do
    segments <- detectSegments vad samples
    print segments

Alternatively, use detectSpeech to get the speech probability for a single window.

  withModel $ \model -> do
    probability <- detectSpeech model $ Vector.take windowLength samples
    putStrLn $ "Probability: " <> show probability

Note: Audio passed to detectSegments and detectSpeech must meet the following requirements:

  • 16 kHz sample rate.
  • Mono (single channel).
  • 16-bit samples.
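
The requirements above can be checked before any audio is handed to the detector. The following is a minimal sketch using only base; AudioFormat and formatProblems are hypothetical names introduced for illustration and are not part of silero-vad. With the WAVE library, the three values should be obtainable from the file header (waveFrameRate, waveNumChannels and waveBitsPerSample), but treat that mapping as an assumption to verify against the WAVE documentation.

```haskell
-- Hypothetical record describing an audio clip's format.
-- Not part of silero-vad; introduced only for this sketch.
data AudioFormat = AudioFormat
  { sampleRate    :: Int -- samples per second
  , channels      :: Int -- number of interleaved channels
  , bitsPerSample :: Int -- bit depth of each sample
  } deriving (Show, Eq)

-- Collect a human-readable message for every requirement that is violated.
-- An empty list means the format matches what the README asks for.
formatProblems :: AudioFormat -> [String]
formatProblems fmt =
  [ "expected a 16000 Hz sample rate, got " ++ show (sampleRate fmt)
  | sampleRate fmt /= 16000
  ]
    ++ [ "expected 1 channel (mono), got " ++ show (channels fmt)
       | channels fmt /= 1
       ]
    ++ [ "expected 16-bit samples, got " ++ show (bitsPerSample fmt)
       | bitsPerSample fmt /= 16
       ]
```

For example, formatProblems (AudioFormat 16000 1 16) yields an empty list, while a 44.1 kHz stereo file yields one message per violated requirement. Returning all problems at once, rather than failing on the first, makes it easier to see how a file needs to be resampled or downmixed.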

When using detectSpeech, the input must contain exactly windowLength samples (defined as 512).
If length samples /= windowLength, the returned probability is always 0.
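
Because detectSpeech only accepts windows of exactly 512 samples, longer audio has to be cut into windows first. The sketch below shows one way to do that with Data.Vector.Storable. The names windowSize, toWindows and scoreWindows are hypothetical, zero-padding the final short window is a choice made for this sketch rather than documented library behaviour, and the score argument stands in for something like detectSpeech model under the assumption that it returns a Float in IO.

```haskell
import qualified Data.Vector.Storable as Vector
import Data.Vector.Storable (Vector)

-- Window size from the README (windowLength is defined as 512).
-- At 16 kHz, 512 samples correspond to 32 ms of audio.
windowSize :: Int
windowSize = 512

-- Split the samples into consecutive windows of exactly windowSize samples.
-- The last window is zero-padded if it is short (an assumption of this sketch).
toWindows :: Vector Float -> [Vector Float]
toWindows samples
  | Vector.null samples = []
  | otherwise =
      let (window, rest) = Vector.splitAt windowSize samples
          padding = Vector.replicate (windowSize - Vector.length window) 0
       in (window Vector.++ padding) : toWindows rest

-- Score every window with a caller-supplied function and collect the results
-- in order. The scoring function is a parameter so that this sketch does not
-- depend on the exact type of detectSpeech.
scoreWindows :: (Vector Float -> IO Float) -> Vector Float -> IO [Float]
scoreWindows score = mapM score . toWindows
```

Inside withModel, a call might look like scoreWindows (detectSpeech model) samples, provided the types line up as assumed. For whole-file segmentation, detectSegments is likely the more convenient entry point.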