repa-3.4.0.1: High performance, regular, shape polymorphic parallel arrays.

Safe HaskellNone
LanguageHaskell98

Data.Array.Repa

Contents

Description

Repa arrays are wrappers around a linear structure that holds the element data.

The representation tag determines what structure holds the data.

Delayed Representations (functions that compute elements)

  • D -- Functions from indices to elements.
  • C -- Cursor functions.

Manifest Representations (real data)

  • U -- Adaptive unboxed vectors.
  • V -- Boxed vectors.
  • B -- Strict ByteStrings.
  • F -- Foreign memory buffers.

Meta Representations

  • P -- Arrays that are partitioned into several representations.
  • S -- Hints that computing this array is a small amount of work, so computation should be sequential rather than parallel to avoid scheduling overheads.
  • I -- Hints that computing this array will be an unbalanced workload, so computation of successive elements should be interleaved between the processors
  • X -- Arrays whose elements are all undefined.

Array fusion is achieved via the delayed (D) and cursored (C) representations. At compile time, the GHC simplifier combines the functions contained within D and C arrays without needing to create manifest intermediate arrays.

Advice for writing fast code:

  1. Repa does not support nested parallellism. This means that you cannot map a parallel worker function across an array and then call computeP to evaluate it, or pass a parallel worker to parallel reductions such as foldP. If you do then you will get a run-time warning and the code will run very slowly.
  2. Arrays of type (Array D sh a) or (Array C sh a) are not real arrays. They are represented as functions that compute each element on demand. You need to use computeS, computeP, computeUnboxedP and so on to actually evaluate the elements.
  3. Add INLINE pragmas to all leaf-functions in your code, expecially ones that compute numeric results. Non-inlined lazy function calls can cost upwards of 50 cycles each, while each numeric operator only costs one (or less). Inlining leaf functions also ensures they are specialised at the appropriate numeric types.
  4. Add bang patterns to all function arguments, and all fields of your data types. In a high-performance Haskell program, the cost of lazy evaluation can easily dominate the run time if not handled correctly. You don't want to rely on the strictness analyser in numeric code because if it does not return a perfect result then the performance of your program will be awful. This is less of a problem for general Haskell code, and in a different context relying on strictness analysis is fine.
  5. Scheduling an 8-thread parallel computation can take 50us on a Linux machine. You should switch to sequential evaluation functions like computeS and foldS for small arrays in inner loops, and at the bottom of a divide-and-conquer algorithm. Consider using a computeP that evaluates an array defined using computeS or foldS for each element.
  6. Compile the modules that use Repa with the following flags: -Odph -rtsopts -threaded -fno-liberate-case -funfolding-use-threshold1000 -funfolding-keeness-factor1000 -fllvm -optlo-O3 You don't want the liberate-case transform because it tends to duplicate too much intermediate code, and is not needed if you use bang patterns as per point 4. The unfolding flags tell the inliner to not to fool around with heuristics, and just inline everything. If the binaries become too big then split the array part of your program into separate modules and only compile those with the unfolding flags.
  7. Repa writes to the GHC eventlog at the start and end of each parallel computation. Use threadscope to see what your program is doing.
  8. When you're sure your program works, switch to the unsafe versions of functions like traverse. These don't do bounds checks.

Changes for Repa 3.2:

  1. Renamed some Repa 3.1 type classes to have more intuitive names: Repr -> Source, Fill -> Load, Fillable -> Target, Combine -> Structured.
  2. Also renamed MArray -> MVec to emphasise its linear structure.
  3. Made Array and MVec associated types of Source and Target respectively.
  4. Added the S (Smallness) and I (Interleave) hints.

Synopsis

Abstract array representation

class Source r e where Source

Class of array representations that we can read elements from.

Minimal complete definition

extent, linearIndex, deepSeqArray

Associated Types

data Array r sh e Source

Methods

extent :: Shape sh => Array r sh e -> sh Source

O(1). Take the extent (size) of an array.

index, unsafeIndex :: Shape sh => Array r sh e -> sh -> e Source

O(1). Shape polymorphic indexing.

linearIndex, unsafeLinearIndex :: Shape sh => Array r sh e -> Int -> e Source

O(1). Linear indexing into underlying, row-major, array representation.

deepSeqArray :: Shape sh => Array r sh e -> b -> b Source

Ensure an array's data structure is fully evaluated.

Instances

Source D a Source

Compute elements of a delayed array.

Source B Word8 Source

Read elements from a ByteString.

Storable a => Source F a Source

Read elements from a foreign buffer.

Unbox a => Source U a Source

Read elements from an unboxed vector array.

Source X e Source

Undefined array elements. Inspecting them yields error.

Source C a Source

Compute elements of a cursored array.

Source V a Source

Read elements from a boxed vector array.

Source r1 a => Source (S r1) a Source 
Source r1 a => Source (I r1) a Source 
(Source r1 e, Source r2 e) => Source (P r1 r2) e Source

Read elements from a partitioned array.

(!) :: Shape sh => Source r e => Array r sh e -> sh -> e Source

O(1). Alias for index

toList :: Shape sh => Source r e => Array r sh e -> [e] Source

O(n). Convert an array to a list.

deepSeqArrays :: Shape sh => Source r e => [Array r sh e] -> b -> b Source

Apply deepSeqArray to up to four arrays.

Computation

computeP :: (Load r1 sh e, Target r2 e, Source r2 e, Monad m) => Array r1 sh e -> m (Array r2 sh e) Source

Parallel computation of array elements.

  • The source array must have a delayed representation like D, C or P, and the result a manifest representation like U or F.
  • If you want to copy data between manifest representations then use copyP instead.
  • If you want to convert a manifest array back to a delayed representation then use delay instead.

computeS :: (Load r1 sh e, Target r2 e) => Array r1 sh e -> Array r2 sh e Source

Sequential computation of array elements.

copyP :: (Source r1 e, Source r2 e, Load D sh e, Target r2 e, Monad m) => Array r1 sh e -> m (Array r2 sh e) Source

Parallel copying of arrays.

  • This is a wrapper that delays an array before calling computeP.
  • You can use it to copy manifest arrays between representations.

copyS :: (Source r1 e, Load D sh e, Target r2 e) => Array r1 sh e -> Array r2 sh e Source

Sequential copying of arrays.

Concrete array representations

Delayed representation

data D Source

Delayed arrays are represented as functions from the index to element value.

Every time you index into a delayed array the element at that position is recomputed.

Instances

Source D a Source

Compute elements of a delayed array.

Elt e => LoadRange D DIM2 e Source

Compute a range of elements in a rank-2 array.

Shape sh => Load D sh e Source

Compute all elements in an array.

Structured D a b Source 
(Shape sh, Load D sh e) => Load (I D) sh e Source 
type TR D = D Source 
data Array D sh a = ADelayed !sh (sh -> a) Source 

fromFunction :: sh -> (sh -> a) -> Array D sh a Source

O(1). Wrap a function as a delayed array.

toFunction :: (Shape sh, Source r1 a) => Array r1 sh a -> (sh, sh -> a) Source

O(1). Produce the extent of an array, and a function to retrieve an arbitrary element.

delay :: Shape sh => Source r e => Array r sh e -> Array D sh e Source

O(1). Delay an array. This wraps the internal representation to be a function from indices to elements, so consumers don't need to worry about what the previous representation was.

Unboxed vector representation

data U Source

Unboxed arrays are represented as unboxed vectors.

The implementation uses Data.Vector.Unboxed which is based on type families and picks an efficient, specialised representation for every element type. In particular, unboxed vectors of pairs are represented as pairs of unboxed vectors. This is the most efficient representation for numerical data.

Instances

Unbox a => Source U a Source

Read elements from an unboxed vector array.

Unbox e => Target U e Source

Filling of unboxed vector arrays.

Unbox a => Structured U a b Source 
(Read sh, Read e, Unbox e) => Read (Array U sh e) Source 
(Show sh, Show e, Unbox e) => Show (Array U sh e) Source 
data MVec U = UMVec (IOVector e) Source 
type TR U = D Source 
data Array U sh a = AUnboxed !sh !(Vector a) Source 

computeUnboxedP :: (Shape sh, Load r1 sh e, Monad m, Unbox e) => Array r1 sh e -> m (Array U sh e) Source

Parallel computation of array elements.

  • This is an alias for computeP with a more specific type.

computeUnboxedS :: (Shape sh, Load r1 sh e, Unbox e) => Array r1 sh e -> Array U sh e Source

Sequential computation of array elements..

  • This is an alias for computeS with a more specific type.

fromListUnboxed :: (Shape sh, Unbox a) => sh -> [a] -> Array U sh a Source

O(n). Convert a list to an unboxed vector array.

  • This is an alias for fromList with a more specific type.

fromUnboxed :: (Shape sh, Unbox e) => sh -> Vector e -> Array U sh e Source

O(1). Wrap an unboxed vector as an array.

toUnboxed :: Unbox e => Array U sh e -> Vector e Source

O(1). Unpack an unboxed vector from an array.

Operators

Index space transformations

reshape :: (Shape sh1, Shape sh2, Source r1 e) => sh2 -> Array r1 sh1 e -> Array D sh2 e Source

Impose a new shape on the elements of an array. The new extent must be the same size as the original, else error.

append :: (Shape sh, Source r1 e, Source r2 e) => Array r1 (sh :. Int) e -> Array r2 (sh :. Int) e -> Array D (sh :. Int) e Source

Append two arrays.

(++) :: (Shape sh, Source r1 e, Source r2 e) => Array r1 (sh :. Int) e -> Array r2 (sh :. Int) e -> Array D (sh :. Int) e Source

Append two arrays.

extract Source

Arguments

:: (Shape sh, Source r e) 
=> sh

Starting index.

-> sh

Size of result.

-> Array r sh e 
-> Array D sh e 

Extract a sub-range of elements from an array.

transpose :: (Shape sh, Source r e) => Array r ((sh :. Int) :. Int) e -> Array D ((sh :. Int) :. Int) e Source

Transpose the lowest two dimensions of an array. Transposing an array twice yields the original.

backpermute Source

Arguments

:: (Shape sh1, Shape sh2, Source r e) 
=> sh2

Extent of result array.

-> (sh2 -> sh1)

Function mapping each index in the result array to an index of the source array.

-> Array r sh1 e

Source array.

-> Array D sh2 e 

Backwards permutation of an array's elements.

backpermuteDft Source

Arguments

:: (Shape sh1, Shape sh2, Source r1 e, Source r2 e) 
=> Array r2 sh2 e

Default values (arrDft)

-> (sh2 -> Maybe sh1)

Function mapping each index in the result array to an index in the source array.

-> Array r1 sh1 e

Source array.

-> Array D sh2 e 

Default backwards permutation of an array's elements. If the function returns Nothing then the value at that index is taken from the default array (arrDft)

Slice transformations

slice :: (Slice sl, Shape (FullShape sl), Shape (SliceShape sl), Source r e) => Array r (FullShape sl) e -> sl -> Array D (SliceShape sl) e Source

Take a slice from an array, according to a given specification.

For example, to take a row from a matrix use the following:

slice arr (Any :. (5::Int) :. All)

To take a column use:

slice arr (Any :. (5::Int))

extend :: (Slice sl, Shape (SliceShape sl), Shape (FullShape sl), Source r e) => sl -> Array r (SliceShape sl) e -> Array D (FullShape sl) e Source

Extend an array, according to a given slice specification.

For example, to replicate the rows of an array use the following:

extend arr (Any :. (5::Int) :. All)

Structure preserving operations

map :: (Shape sh, Source r a) => (a -> b) -> Array r sh a -> Array D sh b Source

Apply a worker function to each element of an array, yielding a new array with the same extent.

zipWith :: (Shape sh, Source r1 a, Source r2 b) => (a -> b -> c) -> Array r1 sh a -> Array r2 sh b -> Array D sh c Source

Combine two arrays, element-wise, with a binary operator. If the extent of the two array arguments differ, then the resulting array's extent is their intersection.

(+^) :: (Num c, Shape sh, Source r1 c, Source r2 c) => Array r1 sh c -> Array r2 sh c -> Array D sh c infixl 6 Source

(-^) :: (Num c, Shape sh, Source r1 c, Source r2 c) => Array r1 sh c -> Array r2 sh c -> Array D sh c infixl 6 Source

(*^) :: (Num c, Shape sh, Source r1 c, Source r2 c) => Array r1 sh c -> Array r2 sh c -> Array D sh c infixl 7 Source

(/^) :: (Fractional c, Shape sh, Source r1 c, Source r2 c) => Array r1 sh c -> Array r2 sh c -> Array D sh c infixl 7 Source

class Structured r1 a b where Source

Structured versions of map and zipWith that preserve the representation of cursored and partitioned arrays.

For cursored (C) arrays, the cursoring of the source array is preserved.

For partitioned (P) arrays, the worker function is fused with each array partition separately, instead of treating the whole array as a single bulk object.

Preserving the cursored and/or paritioned representation of an array is will make follow-on computation more efficient than if the array was converted to a vanilla Delayed (D) array as with plain map and zipWith.

If the source array is not cursored or partitioned then smap and szipWith are identical to the plain functions.

Associated Types

type TR r1 Source

The target result representation.

Methods

smap :: Shape sh => (a -> b) -> Array r1 sh a -> Array (TR r1) sh b Source

Structured map.

szipWith :: (Shape sh, Source r c) => (c -> a -> b) -> Array r sh c -> Array r1 sh a -> Array (TR r1) sh b Source

Structured zipWith. If you have a cursored or partitioned source array then use that as the third argument (corresponding to r1 here)

Instances

Structured D a b Source 
Structured B Word8 b Source 
Storable a => Structured F a b Source 
Unbox a => Structured U a b Source 
Structured X a b Source 
Structured C a b Source 
Structured r1 a b => Structured (S r1) a b Source 
Structured r1 a b => Structured (I r1) a b Source 
(Structured r1 a b, Structured r2 a b) => Structured (P r1 r2) a b Source 

Generic traversal

traverse Source

Arguments

:: (Source r a, Shape sh, Shape sh') 
=> Array r sh a

Source array.

-> (sh -> sh')

Function to produce the extent of the result.

-> ((sh -> a) -> sh' -> b)

Function to produce elements of the result. It is passed a lookup function to get elements of the source.

-> Array D sh' b 

Unstructured traversal.

traverse2 Source

Arguments

:: (Source r1 a, Source r2 b, Shape sh, Shape sh', Shape sh'') 
=> Array r1 sh a

First source array.

-> Array r2 sh' b

Second source array.

-> (sh -> sh' -> sh'')

Function to produce the extent of the result.

-> ((sh -> a) -> (sh' -> b) -> sh'' -> c)

Function to produce elements of the result. It is passed lookup functions to get elements of the source arrays.

-> Array D sh'' c 

Unstructured traversal over two arrays at once.

traverse3 :: forall r1 r2 r3 sh1 sh2 sh3 sh4 a b c d. (Source r1 a, Source r2 b, Source r3 c, Shape sh1, Shape sh2, Shape sh3, Shape sh4) => Array r1 sh1 a -> Array r2 sh2 b -> Array r3 sh3 c -> (sh1 -> sh2 -> sh3 -> sh4) -> ((sh1 -> a) -> (sh2 -> b) -> (sh3 -> c) -> sh4 -> d) -> Array D sh4 d Source

Unstructured traversal over three arrays at once.

traverse4 :: forall r1 r2 r3 r4 sh1 sh2 sh3 sh4 sh5 a b c d e. (Source r1 a, Source r2 b, Source r3 c, Source r4 d, Shape sh1, Shape sh2, Shape sh3, Shape sh4, Shape sh5) => Array r1 sh1 a -> Array r2 sh2 b -> Array r3 sh3 c -> Array r4 sh4 d -> (sh1 -> sh2 -> sh3 -> sh4 -> sh5) -> ((sh1 -> a) -> (sh2 -> b) -> (sh3 -> c) -> (sh4 -> d) -> sh5 -> e) -> Array D sh5 e Source

Unstructured traversal over four arrays at once.

Interleaving

interleave2 :: (Shape sh, Source r1 a, Source r2 a) => Array r1 (sh :. Int) a -> Array r2 (sh :. Int) a -> Array D (sh :. Int) a Source

Interleave the elements of two arrays. All the input arrays must have the same extent, else error. The lowest dimension of the result array is twice the size of the inputs.

 interleave2 a1 a2   b1 b2  =>  a1 b1 a2 b2
             a3 a4   b3 b4      a3 b3 a4 b4

interleave3 :: (Shape sh, Source r1 a, Source r2 a, Source r3 a) => Array r1 (sh :. Int) a -> Array r2 (sh :. Int) a -> Array r3 (sh :. Int) a -> Array D (sh :. Int) a Source

Interleave the elements of three arrays.

interleave4 :: (Shape sh, Source r1 a, Source r2 a, Source r3 a, Source r4 a) => Array r1 (sh :. Int) a -> Array r2 (sh :. Int) a -> Array r3 (sh :. Int) a -> Array r4 (sh :. Int) a -> Array D (sh :. Int) a Source

Interleave the elements of four arrays.

Reduction

foldP :: (Shape sh, Source r a, Elt a, Unbox a, Monad m) => (a -> a -> a) -> a -> Array r (sh :. Int) a -> m (Array U sh a) Source

Parallel reduction of the innermost dimension of an arbitray rank array.

The first argument needs to be an associative sequential operator. The starting element must be neutral with respect to the operator, for example 0 is neutral with respect to (+) as 0 + a = a. These restrictions are required to support parallel evaluation, as the starting element may be used multiple times depending on the number of threads.

Elements are reduced in the order of their indices, from lowest to highest. Applications of the operator are associatied arbitrarily.

>>> let c 0 x = x; c x 0 = x; c x y = y
>>> let a = fromListUnboxed (Z :. 2 :. 2) [1,2,3,4] :: Array U (Z :. Int :. Int) Int
>>> foldP c 0 a
AUnboxed (Z :. 2) (fromList [2,4])

foldS :: (Shape sh, Source r a, Elt a, Unbox a) => (a -> a -> a) -> a -> Array r (sh :. Int) a -> Array U sh a Source

Sequential reduction of the innermost dimension of an arbitrary rank array.

Combine this with transpose to fold any other dimension.

Elements are reduced in the order of their indices, from lowest to highest. Applications of the operator are associatied arbitrarily.

>>> let c 0 x = x; c x 0 = x; c x y = y
>>> let a = fromListUnboxed (Z :. 2 :. 2) [1,2,3,4] :: Array U (Z :. Int :. Int) Int
>>> foldS c 0 a
AUnboxed (Z :. 2) (fromList [2,4])

foldAllP :: (Shape sh, Source r a, Elt a, Unbox a, Monad m) => (a -> a -> a) -> a -> Array r sh a -> m a Source

Parallel reduction of an array of arbitrary rank to a single scalar value.

The first argument needs to be an associative sequential operator. The starting element must be neutral with respect to the operator, for example 0 is neutral with respect to (+) as 0 + a = a. These restrictions are required to support parallel evaluation, as the starting element may be used multiple times depending on the number of threads.

Elements are reduced in row-major order. Applications of the operator are associated arbitrarily.

foldAllS :: (Shape sh, Source r a, Elt a, Unbox a) => (a -> a -> a) -> a -> Array r sh a -> a Source

Sequential reduction of an array of arbitrary rank to a single scalar value.

Elements are reduced in row-major order. Applications of the operator are associated arbitrarily.

sumP :: (Shape sh, Source r a, Num a, Elt a, Unbox a, Monad m) => Array r (sh :. Int) a -> m (Array U sh a) Source

Parallel sum the innermost dimension of an array.

sumS :: (Shape sh, Source r a, Num a, Elt a, Unbox a) => Array r (sh :. Int) a -> Array U sh a Source

Sequential sum the innermost dimension of an array.

sumAllP :: (Shape sh, Source r a, Elt a, Unbox a, Num a, Monad m) => Array r sh a -> m a Source

Parallel sum all the elements of an array.

sumAllS :: (Shape sh, Source r a, Elt a, Unbox a, Num a) => Array r sh a -> a Source

Sequential sum of all the elements of an array.

equalsP :: (Shape sh, Eq sh, Source r1 a, Source r2 a, Eq a, Monad m) => Array r1 sh a -> Array r2 sh a -> m Bool Source

Check whether two arrays have the same shape and contain equal elements, in parallel.

equalsS :: (Shape sh, Eq sh, Source r1 a, Source r2 a, Eq a) => Array r1 sh a -> Array r2 sh a -> Bool Source

Check whether two arrays have the same shape and contain equal elements, sequentially.

Selection

selectP Source

Arguments

:: (Unbox a, Monad m) 
=> (Int -> Bool)

If the Int matches this predicate,

-> (Int -> a)

... then pass it to this fn to produce a value

-> Int

Range between 0 and this maximum.

-> m (Array U DIM1 a)

Array containing produced values.

Produce an array by applying a predicate to a range of integers. If the predicate matches, then use the second function to generate the element.

  • This is a low-level function helpful for writing filtering operations on arrays.
  • Use the integer as the index into the array you're filtering.