Safe Haskell | None |
---|---|
Language | Haskell98 |
Synopsis
- data Dendrogram a
- = Leaf a
- | Branch !Distance (Dendrogram a) (Dendrogram a)
- type Distance = Double
- elements :: Dendrogram a -> [a]
- cutAt :: Dendrogram a -> Distance -> [Dendrogram a]
- data Linkage
- dendrogram :: Linkage -> [a] -> (a -> a -> Distance) -> Dendrogram a
Dendrogram data type
data Dendrogram a Source #
Data structure for storing hierarchical clusters. The
distance between clusters is stored on the branches.
Distances between leafs are the distances between the elements
on those leafs, while distances between branches are defined
by the linkage used (see Linkage
).
Leaf a | The leaf contains the item |
Branch !Distance (Dendrogram a) (Dendrogram a) | Each branch connects two clusters/dendrograms that are
|
Instances
elements :: Dendrogram a -> [a] Source #
List of elements in a dendrogram.
cutAt :: Dendrogram a -> Distance -> [Dendrogram a] Source #
dendro `cutAt` threshold
cuts the dendrogram dendro
at
all branches which have distances strictly greater than
threshold
.
For example, suppose we have
dendro = Branch 0.8 (Branch 0.5 (Branch 0.2 (Leaf 'A') (Leaf 'B')) (Leaf 'C')) (Leaf 'D')
Then:
dendro `cutAt` 0.9 == dendro `cutAt` 0.8 == [dendro] -- no changes dendro `cutAt` 0.7 == dendro `cutAt` 0.5 == [Branch 0.5 (Branch 0.2 (Leaf 'A') (Leaf 'B')) (Leaf 'C'), Leaf 'D'] dendro `cutAt` 0.4 == dendro `cutAt` 0.2 == [Branch 0.2 (Leaf 'A') (Leaf 'B'), Leaf 'C', Leaf 'D'] dendro `cutAt` 0.1 == [Leaf 'A', Leaf 'B', Leaf 'C', Leaf 'D'] -- no branches at all
Linkage data type
The linkage type determines how the distance between clusters will be calculated. These are the linkage types currently available on this library.
SingleLinkage | The distance between two clusters |
CompleteLinkage | The distance between two clusters |
CLINK | The same as |
UPGMA | Unweighted Pair Group Method with Arithmetic mean, also
called "average linkage". The distance between two
clusters |
FakeAverageLinkage | This method is usually wrongly called "average linkage".
The distance between cluster
|
Instances
Enum Linkage Source # | |
Eq Linkage Source # | |
Ord Linkage Source # | |
Defined in Data.Clustering.Hierarchical.Internal.Types | |
Show Linkage Source # | |
Clustering function
:: Linkage | Linkage type to be used. |
-> [a] | Items to be clustered. |
-> (a -> a -> Distance) | Distance function between items. |
-> Dendrogram a | Complete dendrogram. |
Calculates a complete, rooted dendrogram for a list of items and a linkage type. The following are the time and space complexities for each linkage:
SingleLinkage
- O(n^2) time and O(n) space, using the SLINK algorithm. This algorithm is optimal in both space and time and gives the same answer as the naive algorithm using a distance matrix.
CompleteLinkage
- O(n^3) time and O(n^2) space, using
the naive algorithm with a distance matrix. Use
CLINK
if you need more performance. - Complete linkage with
CLINK
- O(n^2) time and O(n) space, using the CLINK algorithm. Note that this algorithm doesn't always give the same answer as the naive algorithm using a distance matrix, but it's much faster.
UPGMA
- O(n^3) time and O(n^2) space, using the naive algorithm with a distance matrix.
FakeAverageLinkage
- O(n^3) time and O(n^2) space, using the naive algorithm with a distance matrix.