symbolize: Efficient global Symbol table, with Garbage Collection.

Tags: bsd3, data, data-structures, library

Symbols, also known as Atoms or Interned Strings, are a common technique to reduce memory usage and improve performance when using many small strings:

A Symbol represents a string (any Textual, so String, Text, ShortText, ByteString, ShortByteString, etc.)

Just like ShortText, ShortByteString and ByteArray, a Symbol has an optimized memory representation, directly wrapping a primitive ByteArray#.

Furthermore, a global symbol table keeps track of which values currently exist, ensuring that symbols are always deduplicated. This allows us to:

  • Check for equality between symbols in constant-time (using pointer equality)

  • Calculate the hash in constant-time (using StableName)

  • Keep the memory footprint of repeatedly-seen strings low.
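To illustrate the idea behind the first two points, here is a minimal sketch that uses only `System.Mem.StableName` from `base`. It is not the library's implementation; the helper name `sameHeapObject` is hypothetical. It only shows that comparing and hashing a stable name does not depend on the length of the string it refers to.

```haskell
import Control.Exception (evaluate)
import System.Mem.StableName (hashStableName, makeStableName)

-- | Hypothetical helper: do both arguments refer to the same heap object?
-- The comparison looks at object identity only, never at the contents.
sameHeapObject :: a -> a -> IO Bool
sameHeapObject x y = do
  nameX <- makeStableName x
  nameY <- makeStableName y
  pure (nameX == nameY)

main :: IO ()
main = do
  -- Evaluate first: a StableName may change once its object is evaluated.
  shared <- evaluate (replicate 3 'a')
  copy <- evaluate (replicate 3 'a')
  -- Same object on both sides.
  sameHeapObject shared shared >>= print
  -- Equal contents but possibly distinct objects; the result is usually
  -- False, although an optimizing compiler is free to share the two values.
  sameHeapObject shared copy >>= print
  -- The hash is derived from the stable name, not from the characters.
  name <- makeStableName shared
  print (hashStableName name)
```

Because an interning table hands out a single shared object per distinct string, identity checks like these can stand in for content comparisons.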

This is very useful if you're frequently comparing strings and the same strings might come up many times. It also makes Symbol a great candidate for a key in e.g. a HashMap or HashSet.

The global symbol table is implemented using weak pointers, which means that unused symbols will be garbage collected. As such, you do not need to be concerned about memory leaks, which are a problem with many other symbol table implementations.
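The weak-pointer mechanism can be sketched with `base` alone. The example below is an illustration under assumptions, not the library's code: it uses an `IORef` as the tracked object (weak pointers to primitive mutable objects are reliable in GHC), and `isAlive` is a hypothetical helper.

```haskell
import Data.IORef (mkWeakIORef, newIORef, readIORef)
import Data.Maybe (isJust)
import System.Mem (performGC)
import System.Mem.Weak (Weak, deRefWeak)

-- | Hypothetical helper: is the object behind a weak pointer still alive?
isAlive :: Weak a -> IO Bool
isAlive weak = isJust <$> deRefWeak weak

main :: IO ()
main = do
  ref <- newIORef "hello"
  -- A table entry would hold only this weak pointer, so the table itself
  -- does not keep the value alive.
  weak <- mkWeakIORef ref (putStrLn "finalizer: the table entry can be dropped")
  -- The IORef is still used below, so it is reachable at this point.
  isAlive weak >>= print
  -- Last strong use of the IORef.
  readIORef ref >>= putStrLn
  performGC
  -- After a collection the object may be gone; this is not guaranteed.
  isAlive weak >>= print
```

Whether the finalizer has run by the time the program exits depends on the garbage collector, so the second result should not be relied upon.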

Please see the full README below or on GitHub at https://github.com/Qqwy/haskell-symbolize#readme


Versions: 0.1.0.0, 0.1.0.1, 0.1.0.2, 0.1.0.3, 1.0.0.0, 1.0.0.1, 1.0.0.2, 1.0.0.3, 1.0.0.4, 1.0.1.0, 1.0.2.0, 1.0.2.1, 1.0.2.2, 1.0.2.3, 1.0.2.4, 1.0.3.0, 1.0.3.1
Change log: CHANGELOG.md
Dependencies: base (>=4.7 && <5), binary (>=0.8.9 && <0.9), bytestring (>=0.11.0 && <0.13), deepseq (>=1.4.0 && <1.6), hashable (>=1.4.0 && <1.6), random (>=1.2 && <2), text (>=2.0 && <2.2), text-short (>=0.1.0 && <0.2), vector (>=0.12.0 && <0.14), vector-hashtables (>=0.1 && <0.2)
Tested with: ghc ==9.4.8 || ==9.6.6 || ==9.8.4 || ==9.10.1
License: BSD-3-Clause
Copyright: 2023-2025 Marten Wijnja
Author: Qqwy / Marten
Maintainer: qqwy@gmx.com
Category: Data, Data Structures
Home page: https://github.com/Qqwy/haskell-symbolize#readme
Bug tracker: https://github.com/Qqwy/haskell-symbolize/issues
Source repo: head: git clone https://github.com/Qqwy/haskell-symbolize
Uploaded: by qqwy at 2025-03-02T01:26:47Z
Distributions: NixOS:0.1.0.3, Stackage:1.0.3.1
Downloads: 350 total (58 in the last 30 days)
Status: Docs available
Last success reported on 2025-03-02

Readme for symbolize-1.0.3.1


Symbolize


Haskell library implementing a global Symbol Table, with garbage collection.

API Documentation


Two symbols are considered 'the same' whenever their contents are equal, regardless of whether they originate from a String, a Data.Text (lazy or strict, normal or short), a Data.ByteString (lazy or strict, normal or short), etc.
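As a small sketch of this property, the program below interns the same contents from three source types and compares the results. It assumes the `text` and `bytestring` packages are available and uses only `Symbolize.intern` as described in this README; per the statement above, both comparisons are expected to hold.

```haskell
import qualified Data.ByteString.Char8 as ByteString
import qualified Data.Text as Text
import qualified Symbolize

main :: IO ()
main = do
  let viaString = Symbolize.intern ("hello" :: String)
      viaText = Symbolize.intern (Text.pack "hello")
      -- ASCII-only contents, so the byte representation is unambiguous.
      viaBytes = Symbolize.intern (ByteString.pack "hello")
  -- Each comparison checks whether two source types yield the same symbol.
  print (viaString == viaText)
  print (viaText == viaBytes)
```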

The main advantages of Symbolize over other symbol table implementations are:

  • Garbage collection: Symbols which are no longer used are automatically cleaned up.
  • Support for any Textual type, including String, (strict and lazy) Data.Text, (strict and lazy) Data.ByteString, ShortText, ShortByteString, etc.
  • Great memory usage:
    • Symbols are simply a (lifted) wrapper around a ByteArray#, which is nicely unpacked by GHC.
    • The symbol table is an IntMap that contains weak pointers to these same ByteArray#s and their associated StableName#s
  • Great performance:
    • unintern is a simple pointer-dereference
    • calls to lookup are free of atomic memory barriers (and never have to wait on a concurrent thread running intern)
  • Thread-safe
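The thread-safety claim can be exercised with a short sketch: several threads intern the same string at once, and every thread should receive an equal symbol. The helper `internConcurrently` is hypothetical and exists only for this illustration; it relies on nothing beyond `Symbolize.intern` and the `Eq` instance of `Symbol`.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Control.Monad (forM)
import Symbolize (Symbol)
import qualified Symbolize

-- | Hypothetical helper: intern the same string from n threads and
-- collect the symbol each thread obtained.
internConcurrently :: Int -> String -> IO [Symbol]
internConcurrently n str = do
  boxes <- forM [1 .. n] $ \_ -> do
    box <- newEmptyMVar
    -- evaluate forces the interning to happen inside the forked thread.
    _ <- forkIO (evaluate (Symbolize.intern str) >>= putMVar box)
    pure box
  mapM takeMVar boxes

main :: IO ()
main = do
  symbols <- internConcurrently 8 "shared"
  -- Every thread should have obtained the same symbol.
  print (all (== Symbolize.intern "shared") symbols)
```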

Basic usage

This module is intended to be imported qualified, e.g.

import Symbolize (Symbol)
import qualified Symbolize

To intern a string, use intern:

>>> hello = Symbolize.intern "hello"
>>> world = Symbolize.intern "world"
>>> (hello, world)
(Symbolize.intern "hello",Symbolize.intern "world")

Interning supports any Textual type, so you can also use Data.Text or Data.ByteString etc.:

>>> import Data.Text (Text)
>>> niceCheeses = fmap Symbolize.intern (["Roquefort", "Camembert", "Brie"] :: [Text])
>>> niceCheeses
[Symbolize.intern "Roquefort",Symbolize.intern "Camembert",Symbolize.intern "Brie"]

And if you are using OverloadedStrings, you can use the IsString instance to intern constants:

>>> hello2 = ("hello" :: Symbol)
>>> hello2
Symbolize.intern "hello"

Comparisons between symbols run in O(1) time:

>>> hello == hello2
True
>>> hello == world
False

To get back the textual value of a symbol, use unintern:

>>> Symbolize.unintern hello
"hello"

If you only want to check whether a string is already interned, use lookup:

>>> Symbolize.lookup "hello"
Just (Symbolize.intern "hello")

Symbols make great keys for Data.HashMap and Data.HashSet. Hashing them is a no-op and they are guaranteed to be unique:

>>> import qualified Data.Hashable as Hashable
>>> Hashable.hash hello
0
>>> fmap Hashable.hash niceCheeses
[2,3,4]
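A typical use is counting occurrences with symbols as keys. The sketch below assumes the `unordered-containers` package for `Data.HashMap.Strict`; the function `wordCounts` is a hypothetical name chosen for this example.

```haskell
import qualified Data.HashMap.Strict as HashMap
import Symbolize (Symbol)
import qualified Symbolize

-- | Hypothetical helper: count how often each word occurs,
-- keyed by the interned symbol rather than by the string itself.
wordCounts :: [String] -> HashMap.HashMap Symbol Int
wordCounts ws = HashMap.fromListWith (+) [(Symbolize.intern w, 1) | w <- ws]

main :: IO ()
main = do
  let counts = wordCounts (words "the cat saw the other cat")
  -- "cat" occurs twice in the input, so this prints Just 2.
  print (HashMap.lookup (Symbolize.intern "cat") counts)
  -- "dog" does not occur, so this prints Nothing.
  print (HashMap.lookup (Symbolize.intern "dog") counts)
```

Repeated words are interned once, so the map stores one key per distinct word no matter how often it appears in the input.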

For introspection, you can look at how many symbols currently exist:

>>> Symbolize.globalSymbolTableSize
5
>>> [Symbolize.unintern (Symbolize.intern (show x)) | x <- [1..5]] :: [String]
["1","2","3","4","5"]
>>> Symbolize.globalSymbolTableSize
10

Unused symbols will be garbage-collected, so you don't have to worry about memory leaks:

>>> System.Mem.performGC
>>> Symbolize.globalSymbolTableSize
5

For deeper introspection, you can look at the Show instance of the global symbol table. (Note that the exact format is subject to change.)

>>> Symbolize.globalSymbolTable
GlobalSymbolTable { size = 5, symbols = ["Brie","Camembert","Roquefort","hello","world"] }