Copyright | (c) Adam Conner-Sax 2019 |
---|---|
License | BSD |
Maintainer | adam_conner_sax@yahoo.com |
Stability | experimental |
Safe Haskell | None |
Language | Haskell2010 |
Frames.Aggregation.General contains types and functions to support a specific map/reduce operation. Frequently, data is given
with more specificity than required for downstream operations. Perhaps an age is given in years and we only need to know the
age-band. Assuming we know how to aggregagte data columns, we want to perform that aggregation on all the subsets required to
build the data-set with the simpler key, while perhaps leaving some other columns alone. aggregateFold
does this.
Synopsis
- type RecordKeyMap k k' = Record k -> Record k'
- combineKeyAggregations :: (a ⊆ (a ++ b), b ⊆ (a ++ b), Disjoint a' b' ~ True) => RecordKeyMap a a' -> RecordKeyMap b b' -> RecordKeyMap (a ++ b) (a' ++ b')
- keyMap :: forall a b. (KnownField a, KnownField b) => (Snd a -> Snd b) -> RecordKeyMap '[a] '[b]
- aggregateAllFold :: forall ak ak' d. ((ak' ++ d) ⊆ ((ak ++ d) ++ ak'), ak ⊆ (ak ++ d), ak' ⊆ (ak' ++ d), d ⊆ (ak' ++ d), Ord (Record ak'), Ord (Record ak), RecVec (ak' ++ d)) => RecordKeyMap ak ak' -> Fold (Record d) (Record d) -> Fold (Record (ak ++ d)) (FrameRec (ak' ++ d))
- aggregateFold :: forall k ak ak' d. ((ak' ++ d) ⊆ ((ak ++ d) ++ ak'), ak ⊆ (ak ++ d), ak' ⊆ (ak' ++ d), d ⊆ (ak' ++ d), Ord (Record ak'), RecVec (ak' ++ d), Ord (Record ak), (k ++ (ak' ++ d)) ~ ((k ++ ak') ++ d), Ord (Record k), k ⊆ ((k ++ ak') ++ d), k ⊆ ((k ++ ak) ++ d), (ak ++ d) ⊆ ((k ++ ak) ++ d), RecVec ((k ++ ak') ++ d)) => RecordKeyMap ak ak' -> Fold (Record d) (Record d) -> Fold (Record ((k ++ ak) ++ d)) (FrameRec ((k ++ ak') ++ d))
- mergeDataFolds :: Fold (Record d) (Record '[a]) -> Fold (Record d) (Record '[b]) -> Fold (Record d) (Record '[a, b])
Type-alias for maps from one record key to another
type RecordKeyMap k k' = Record k -> Record k' Source #
Type-alias for key aggregation functions.
Aggregation Function combinators
combineKeyAggregations :: (a ⊆ (a ++ b), b ⊆ (a ++ b), Disjoint a' b' ~ True) => RecordKeyMap a a' -> RecordKeyMap b b' -> RecordKeyMap (a ++ b) (a' ++ b') Source #
Combine 2 key aggregation functions over disjoint columns.
keyMap :: forall a b. (KnownField a, KnownField b) => (Snd a -> Snd b) -> RecordKeyMap '[a] '[b] Source #
Promote an ordinary function a -> b
to a RecordKeyMap aCol bCol
where
aCol
holds values of type a
and bCol
holds values of type b
.
aggregationFolds
:: ((ak' ++ d) ⊆ ((ak ++ d) ++ ak'), ak ⊆ (ak ++ d), ak' ⊆ (ak' ++ d), d ⊆ (ak' ++ d), Ord (Record ak'), Ord (Record ak), RecVec (ak' ++ d)) | |
=> RecordKeyMap ak ak' | get aggregated key from key |
-> Fold (Record d) (Record d) | aggregate data |
-> Fold (Record (ak ++ d)) (FrameRec (ak' ++ d)) |
Given some group keys in columns k, some keys to aggregate over in columns ak, some keys to aggregate into in (new) columns ak', a (hopefully surjective) map from records of ak to records of ak', and a fold over the data, in columns d, aggregating over the rows where ak was distinct but ak' is not, produce a fold to transform data keyed by k and ak to data keyed by k and ak' with appropriate aggregations done in the d. E.g., suppose you have voter turnout data for all 50 states in the US, keyed by state and age of voter in years. The data is two columns: total votes cast and turnout as a percentage. You want to aggregate the ages into two bands, over and under some age. So your k is the state column, ak is the age column, ak' is a new column with data type to indicate over/under. The Fold has to sum over the total votes and perform a weighted-sum over the percentages.
:: ((ak' ++ d) ⊆ ((ak ++ d) ++ ak'), ak ⊆ (ak ++ d), ak' ⊆ (ak' ++ d), d ⊆ (ak' ++ d), Ord (Record ak'), RecVec (ak' ++ d), Ord (Record ak), (k ++ (ak' ++ d)) ~ ((k ++ ak') ++ d), Ord (Record k), k ⊆ ((k ++ ak') ++ d), k ⊆ ((k ++ ak) ++ d), (ak ++ d) ⊆ ((k ++ ak) ++ d), RecVec ((k ++ ak') ++ d)) | |
=> RecordKeyMap ak ak' | get aggregated key from key |
-> Fold (Record d) (Record d) | aggregate data |
-> Fold (Record ((k ++ ak) ++ d)) (FrameRec ((k ++ ak') ++ d)) |
Aggregate key columns ak
into ak'
while leaving key columns k
along.
Allows aggregation over only some fields. Will often require a typeapplication
to specify what k
is.