Safe Haskell | None |
---|---|
Language | Haskell2010 |
A description of the operations that can be performed on nodes and columns.
- data TransformInvariant
- data Locality
- = Local
- | Distributed
- data StandardOperator = StandardOperator {}
- data ScalaStaticFunctionApplication = ScalaStaticFunctionApplication {
- sfaObjectName :: !Text
- sfaMethodName :: !Text
- }
- data ColOp
- = ColExtraction !FieldPath
- | ColFunction !Text !(Vector ColOp)
- | ColLit !DataType !Value
- | ColStruct !(Vector TransformField)
- data TransformField = TransformField {}
- data StructuredTransform
- = InnerOp !ColOp
- | InnerStruct !(Vector TransformField)
- data DatasetTransformDesc
- data UniversalAggregatorOp = UniversalAggregatorOp {}
- data NodeOp
- makeOperator :: Text -> SQLType a -> StandardOperator
Documentation
data TransformInvariant Source #
The invariant respected by a transform.
Depending on the value of the invariant, different optimizations may be available.
Opaque | This operator has no special property. It may depend on the partitioning layout, the number of partitions, the order of elements in the partitions, etc. This sort of operator is unwelcome in Krapsh... |
PartitioningInvariant | This operator respects the canonical partition order, but may not have the same number of elements. For example, this could be a flatMap on an RDD (filter, etc.). This operator can be used locally with the signature a -> [a] |
DirectPartitioningInvariant | The strongest invariant. It respects the canonical partition order and it outputs the same number of elements. This is typically a map. This operator can be used locally with the signature a -> a |
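The local signatures quoted above (`a -> [a]` and `a -> a`) can be illustrated with a small sketch. It uses a stand-in type with the three documented constructors. The derived `Ord` instance and the helpers `composeInvariant`, `keepEven`, `addOne` and `runPartition` are hypothetical names introduced for illustration and are not part of the library.

```haskell
-- Stand-in mirroring the three documented constructors.
-- Assumption: constructors are ordered from weakest to strongest.
data TransformInvariant
  = Opaque
  | PartitioningInvariant
  | DirectPartitioningInvariant
  deriving (Eq, Ord, Show)

-- Hypothetical helper: chaining two transforms keeps only the weaker
-- of the two invariants.
composeInvariant :: TransformInvariant -> TransformInvariant -> TransformInvariant
composeInvariant = min

-- Local shape of a partitioning-invariant operator: a -> [a].
-- This one behaves like a filter and may drop elements.
keepEven :: Int -> [Int]
keepEven x = [x | even x]

-- Local shape of a direct partitioning-invariant operator: a -> a.
-- This one behaves like a map and keeps the element count.
addOne :: Int -> Int
addOne = (+ 1)

-- Applies both operators to the elements of one partition.
-- The relative order of the surviving elements is preserved.
runPartition :: [Int] -> [Int]
runPartition = concatMap keepEven . map addOne
```

Under these assumptions, `runPartition [1, 2, 3, 4]` evaluates to `[2, 4]`, and composing a direct invariant with a partitioning invariant yields `PartitioningInvariant`.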
data Locality Source #
The dynamic value of locality. There is still a tag on it, but it can be easily dropped.
Local | The data associated with this node is local. It can be materialized and accessed by the user. |
Distributed | The data associated with this node is distributed or not accessible locally. It cannot be accessed by the user. |
PHYSICAL OPERATORS
data StandardOperator Source #
An operator defined by default in the release of Krapsh. All other physical operators can be converted to a standard operator.
data ScalaStaticFunctionApplication Source #
A Scala method of a singleton object.
data ColOp Source #
The different kinds of column operations. These operations describe the physical operations on columns as supported by Spark SQL. They can operate as column -> column, column -> row, or row -> row. Not every operator is valid in every configuration.
ColExtraction !FieldPath | A projection onto a single column. An extraction is always direct. |
ColFunction !Text !(Vector ColOp) | A function of other columns. In this case, the other columns may matter. TODO(kps): add whether this function is partition invariant. It should be the case most of the time. |
ColLit !DataType !Value | A constant defined for each element. The type should be the same as for the column. A literal is always direct. |
ColStruct !(Vector TransformField) | A structure. |
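To make the shape of a column expression concrete, here is a minimal sketch using simplified stand-ins. `String` replaces `Text`, lists replace `Vector`, and `FieldPath`, `DataType` and `Value` are reduced to toy definitions. The fields of `TransformField` are not shown in this documentation, so `tfName` and `tfValue` are hypothetical, as are the helper `inputPaths`, the value `example` and the function name `"sum"`.

```haskell
-- Simplified stand-ins for types defined elsewhere in the library.
type FieldPath = [String]

data DataType = IntType | StringType
  deriving (Eq, Show)

data Value = IntValue Int | StringValue String
  deriving (Eq, Show)

-- Hypothetical fields: the real record's fields are not documented here.
data TransformField = TransformField
  { tfName  :: !String
  , tfValue :: !ColOp
  } deriving (Eq, Show)

-- Same four constructors as the documented ColOp, over the stand-in types.
data ColOp
  = ColExtraction !FieldPath
  | ColFunction !String ![ColOp]
  | ColLit !DataType !Value
  | ColStruct ![TransformField]
  deriving (Eq, Show)

-- Collects every field path that an expression reads from its input,
-- in left-to-right order.
inputPaths :: ColOp -> [FieldPath]
inputPaths (ColExtraction path)  = [path]
inputPaths (ColFunction _ args)  = concatMap inputPaths args
inputPaths (ColLit _ _)          = []
inputPaths (ColStruct fields)    = concatMap (inputPaths . tfValue) fields

-- A struct with two fields: a function of a column and a literal,
-- and a nested extraction.
example :: ColOp
example = ColStruct
  [ TransformField "total"
      (ColFunction "sum" [ColExtraction ["price"], ColLit IntType (IntValue 1)])
  , TransformField "id" (ColExtraction ["order", "id"])
  ]
```

With these definitions, `inputPaths example` evaluates to `[["price"], ["order", "id"]]`. Literals contribute no input paths, which matches the description of a literal as a constant defined for each element.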
data StructuredTransform Source #
The content of a structured transform.
DATASET OPERATORS
data DatasetTransformDesc Source #
OBSERVABLE OPERATORS
AGGREGATION OPERATORS
data NodeOp Source #
NodeLocalOp StandardOperator | An operation between local nodes: [Observable] -> Observable |
NodeLocalLit !DataType !Value | An observable literal. |
NodeOpaqueAggregator StandardOperator | Some aggregator that does not respect any particular invariant. |
NodeUniversalAggregator UniversalAggregatorOp | A universal aggregator. |
NodeStructuredTransform !ColOp | A structured transform, performed either on a local node or a distributed node. |
NodeDistributedLit !DataType !(Vector Value) | A distributed dataset (with no partition information). |
NodeDistributedOp StandardOperator | An opaque distributed operator. |
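The constructor descriptions above suggest how a node operation relates to `Locality`. The sketch below encodes one possible reading. All supporting types are placeholders because the real records' fields are not shown here, and `outputLocality` is a hypothetical helper. Treating both aggregators as producing local values is an assumption of this sketch, not something the documentation states.

```haskell
data Locality = Local | Distributed
  deriving (Eq, Show)

-- Placeholders for types whose real definitions are not shown here.
newtype StandardOperator = StandardOperator String
  deriving (Eq, Show)

newtype UniversalAggregatorOp = UniversalAggregatorOp String
  deriving (Eq, Show)

data DataType = IntType
  deriving (Eq, Show)

newtype Value = IntValue Int
  deriving (Eq, Show)

newtype ColOp = ColExtraction [String]
  deriving (Eq, Show)

-- Same constructors as the documented NodeOp, over the placeholder types
-- (a list replaces Vector).
data NodeOp
  = NodeLocalOp StandardOperator
  | NodeLocalLit !DataType !Value
  | NodeOpaqueAggregator StandardOperator
  | NodeUniversalAggregator UniversalAggregatorOp
  | NodeStructuredTransform !ColOp
  | NodeDistributedLit !DataType ![Value]
  | NodeDistributedOp StandardOperator
  deriving (Eq, Show)

-- Hypothetical helper: the locality of a node's output when it can be read
-- off the operation alone. A structured transform runs on either kind of
-- node, so the answer is Nothing in that case.
outputLocality :: NodeOp -> Maybe Locality
outputLocality (NodeLocalOp _)             = Just Local
outputLocality (NodeLocalLit _ _)          = Just Local
outputLocality (NodeOpaqueAggregator _)    = Just Local  -- assumption
outputLocality (NodeUniversalAggregator _) = Just Local  -- assumption
outputLocality (NodeStructuredTransform _) = Nothing
outputLocality (NodeDistributedLit _ _)    = Just Distributed
outputLocality (NodeDistributedOp _)       = Just Distributed
```

For example, `outputLocality (NodeDistributedLit IntType [IntValue 1, IntValue 2])` evaluates to `Just Distributed`, while a `NodeStructuredTransform` yields `Nothing` because its locality follows the node it is applied to.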
makeOperator :: Text -> SQLType a -> StandardOperator Source #
Makes a standard operator with no extra value.