Safe Haskell | None |
---|---|
Language | Haskell98 |
Generic parallel array computation operators.
- fillChunked :: Gang -> (Int# -> a -> IO ()) -> (Int# -> a) -> Int# -> IO ()
- fillChunkedIO :: Gang -> (Int# -> a -> IO ()) -> (Int# -> IO (Int# -> IO a)) -> Int# -> IO ()
- fillBlock2 :: Elt a => Gang -> (Int# -> a -> IO ()) -> (Int# -> Int# -> a) -> Int# -> Int# -> Int# -> Int# -> Int# -> IO ()
- fillInterleaved :: Gang -> (Int# -> a -> IO ()) -> (Int# -> a) -> Int# -> IO ()
- fillCursoredBlock2 :: Elt a => Gang -> (Int# -> a -> IO ()) -> (Int# -> Int# -> cursor) -> (Int# -> Int# -> cursor -> cursor) -> (cursor -> a) -> Int# -> Int# -> Int# -> Int# -> Int# -> IO ()
- foldAll :: Gang -> (Int# -> a) -> (a -> a -> a) -> a -> Int# -> IO a
- foldInner :: Gang -> (Int# -> a -> IO ()) -> (Int# -> a) -> (a -> a -> a) -> a -> Int# -> Int# -> IO ()
Filling
fillChunked
:: Gang | Gang to run the operation on. |
-> (Int# -> a -> IO ()) | Update function to write into result buffer. |
-> (Int# -> a) | Function to get the value at a given index. |
-> Int# | Number of elements. |
-> IO () |
Fill something in parallel.
- The array is split into linear chunks, and each thread linearly fills one chunk.
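Here is a minimal sketch of the chunked strategy. It assumes boxed `Int` in place of `Int#`, and it uses a hypothetical `runGang` helper (one `forkIO` thread per worker) in place of a real `Gang`. `splitIx` and `fillChunkedSketch` are illustrative names. `splitIx` gives the first index of each thread's chunk, and the first few chunks absorb the remainder so that chunk sizes differ by at most one.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception (finally)
import Control.Monad (forM, forM_)

-- Hypothetical stand-in for a Gang: run the action once per worker, each on
-- its own thread, and block until every worker has signalled completion.
runGang :: Int -> (Int -> IO ()) -> IO ()
runGang workers action = do
  dones <- forM [0 .. workers - 1] $ \t -> do
    done <- newEmptyMVar
    _ <- forkIO (action t `finally` putMVar done ())
    pure done
  mapM_ takeMVar dones

-- First index of thread t's chunk when len elements are split into
-- 'workers' contiguous chunks; the first (len `rem` workers) chunks
-- are one element longer than the rest.
splitIx :: Int -> Int -> Int -> Int
splitIx workers len t
  | t < leftover = t * (chunkLen + 1)
  | otherwise    = t * chunkLen + leftover
  where
    chunkLen = len `quot` workers
    leftover = len `rem` workers

-- Thread t fills indices [splitIx t, splitIx (t + 1)) in increasing order.
fillChunkedSketch :: Int -> (Int -> a -> IO ()) -> (Int -> a) -> Int -> IO ()
fillChunkedSketch workers write getElem len =
  runGang workers $ \t ->
    forM_ [splitIx workers len t .. splitIx workers len (t + 1) - 1] $ \i ->
      write i (getElem i)
```

For example, with 3 workers and 10 elements the chunks start at 0, 4 and 7 and the last one ends at 10. The three threads therefore fill 4, 3 and 3 elements.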
fillChunkedIO
:: Gang | Gang to run the operation on. |
-> (Int# -> a -> IO ()) | Update function to write into result buffer. |
-> (Int# -> IO (Int# -> IO a)) | Create a function to get the value at a given index. The first argument is the thread number, so you can do some per-thread initialisation. |
-> Int# | Number of elements. |
-> IO () |
Fill something in parallel, using a separate IO action for each thread.
- The array is split into linear chunks, and each thread linearly fills one chunk.
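The sketch below shows the per-thread initialisation step. It makes the same assumptions as a plain chunked fill: boxed `Int`, plus hypothetical `runGang` and `splitIx` helpers built on `forkIO`. Each worker calls `mkGetElem` once with its own thread number and then reuses the getter it gets back for every index in its chunk. `fillChunkedIOSketch` is an illustrative name.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception (finally)
import Control.Monad (forM, forM_)

-- Hypothetical stand-in for a Gang (one forkIO thread per worker).
runGang :: Int -> (Int -> IO ()) -> IO ()
runGang workers action = do
  dones <- forM [0 .. workers - 1] $ \t -> do
    done <- newEmptyMVar
    _ <- forkIO (action t `finally` putMVar done ())
    pure done
  mapM_ takeMVar dones

-- First index of thread t's chunk (even split, first chunks one longer).
splitIx :: Int -> Int -> Int -> Int
splitIx workers len t
  | t < leftover = t * (chunkLen + 1)
  | otherwise    = t * chunkLen + leftover
  where
    chunkLen = len `quot` workers
    leftover = len `rem` workers

-- Each worker runs mkGetElem once with its own thread number, then reuses
-- the returned getter for every index in its chunk.
fillChunkedIOSketch
  :: Int -> (Int -> a -> IO ()) -> (Int -> IO (Int -> IO a)) -> Int -> IO ()
fillChunkedIOSketch workers write mkGetElem len =
  runGang workers $ \t -> do
    getElem <- mkGetElem t
    forM_ [splitIx workers len t .. splitIx workers len (t + 1) - 1] $ \i ->
      getElem i >>= write i
```

Because the setup runs once per thread and not once per element, `mkGetElem` is a natural place to allocate state such as a scratch buffer that only that thread touches.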
fillBlock2
:: Elt a | |
=> Gang | Gang to run the operation on. |
-> (Int# -> a -> IO ()) | Update function to write into result buffer. |
-> (Int# -> Int# -> a) | Function to evaluate the element at an (x, y) index. |
-> Int# | Width of the whole array. |
-> Int# | x0: x coordinate of the lower-left corner of the block to fill. |
-> Int# | y0: y coordinate of the lower-left corner of the block to fill. |
-> Int# | w0: width of the block to fill. |
-> Int# | h0: height of the block to fill. |
-> IO () |
Fill a block in a rank-2 array in parallel.
- Blockwise filling can be more cache-efficient than linear filling for rank-2 arrays.
- Coordinates given are of the filled edges of the block.
- We divide the block into columns, and give one column to each thread.
- Each column is filled in row major order from top to bottom.
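The following is a pure sketch of how the block might be split into columns. It assumes boxed `Int` coordinates and a row-major layout in which (x, y) lives at offset y * width + x. It also assumes the w0 columns are shared out as evenly as the elements of a linear chunked fill. `colSplit` and `blockOrder` are hypothetical names. `blockOrder` returns the offsets one thread would write, in the order it would write them.

```haskell
-- Hypothetical helper: offset (relative to x0) of the first column owned by
-- thread t when w0 columns are shared as evenly as possible among workers.
colSplit :: Int -> Int -> Int -> Int
colSplit workers w0 t
  | t < slack = t * (chunk + 1)
  | otherwise = t * chunk + slack
  where
    chunk = w0 `quot` workers
    slack = w0 `rem` workers

-- Offsets written by thread t, in order: the thread's column range is swept
-- one row at a time, from row y0 to row y0 + h0 - 1.
blockOrder :: Int -> Int -> Int -> Int -> Int -> Int -> Int -> [Int]
blockOrder workers width x0 y0 w0 h0 t =
  [ y * width + x
  | y <- [y0 .. y0 + h0 - 1]
  , x <- [x0 + colSplit workers w0 t .. x0 + colSplit workers w0 (t + 1) - 1]
  ]
```

Take a 4 x 2 block at (1, 1) in an array of width 6, split between 2 threads. `blockOrder 2 6 1 1 4 2 0` is `[7, 8, 13, 14]` and thread 1 gets `[9, 10, 15, 16]`. Each thread stays within a narrow strip of columns as it moves from row to row.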
fillInterleaved
:: Gang | Gang to run the operation on. |
-> (Int# -> a -> IO ()) | Update function to write into result buffer. |
-> (Int# -> a) | Function to get the value at a given index. |
-> Int# | Number of elements. |
-> IO () |
Fill something in parallel, using a round-robin order.
- Threads handle elements in row major, round-robin order.
- Using this method helps even out unbalanced workloads.
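Here is a pure sketch of the round-robin schedule, assuming boxed `Int` indices. `interleavedIxs` and `interleavedWrites` are hypothetical names. `interleavedWrites` only lists each thread's writes in order; it does not run the threads concurrently.

```haskell
-- Indices handled by thread t when 'workers' threads share len elements
-- round-robin: t, t + workers, t + 2 * workers, ... (empty if t >= len).
interleavedIxs :: Int -> Int -> Int -> [Int]
interleavedIxs workers len t = [t, t + workers .. len - 1]

-- Serial model of the schedule: each thread's (index, value) writes, in the
-- order that thread would perform them.
interleavedWrites :: Int -> (Int -> a) -> Int -> [[(Int, a)]]
interleavedWrites workers getElem len =
  [ [ (i, getElem i) | i <- interleavedIxs workers len t ] | t <- [0 .. workers - 1] ]
```

With 3 threads and 8 elements, thread 0 handles [0, 3, 6], thread 1 handles [1, 4, 7] and thread 2 handles [2, 5]. Suppose the expensive elements are clustered in one region of the array. A chunked split would hand that whole cluster to one thread, whereas the round-robin split gives every thread a share of it.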
fillCursoredBlock2
:: Elt a | |
=> Gang | Gang to run the operation on. |
-> (Int# -> a -> IO ()) | Update function to write into result buffer. |
-> (Int# -> Int# -> cursor) | Make a cursor from an (x, y) index. |
-> (Int# -> Int# -> cursor -> cursor) | Shift the cursor by an (x, y) offset. |
-> (cursor -> a) | Function to evaluate the element at an index. |
-> Int# | Width of the whole array. |
-> Int# | x0: x coordinate of the lower-left corner of the block to fill. |
-> Int# | y0: y coordinate of the lower-left corner of the block to fill. |
-> Int# | w0: width of the block to fill. |
-> Int# | h0: height of the block to fill. |
-> IO () |
Fill a block in a rank-2 array in parallel.
- Blockwise filling can be more cache-efficient than linear filling for rank-2 arrays.
- Using cursor functions can help to expose inter-element indexing computations to the GHC and LLVM optimisers.
- Coordinates given are of the filled edges of the block.
- We divide the block into columns, and give one column to each thread.
- We need the `Elt` constraint so that we can use its `touch` function to provide an order of evaluation amenable to the LLVM optimiser. You should compile your Haskell program with `-fllvm -optlo-O3` to enable LLVM's Global Value Numbering optimisation.
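The sketch below shows what the three cursor arguments might look like for a row-major array, assuming the cursor is simply the linear offset of an element. `Cursor`, `makeCursor`, `shiftCursor`, `horizontalSum` and `walkBlock` are hypothetical names. Partially applying `makeCursor` and `shiftCursor` to the width gives functions with the `Int -> Int -> cursor` and `Int -> Int -> cursor -> cursor` shapes used by `fillCursoredBlock2`, with `Int` in place of `Int#`.

```haskell
-- Hypothetical cursor: the row-major offset of an element.
newtype Cursor = Cursor Int
  deriving (Eq, Show)

-- Make a cursor from an (x, y) index in an array of the given width.
makeCursor :: Int -> Int -> Int -> Cursor
makeCursor width x y = Cursor (y * width + x)

-- Shift a cursor by an (dx, dy) offset.
shiftCursor :: Int -> Int -> Int -> Cursor -> Cursor
shiftCursor width dx dy (Cursor i) = Cursor (i + dy * width + dx)

-- Example element function: an element plus its left and right neighbours.
-- Both neighbours are reached by shifting the same base cursor, so the
-- y * width + x product is computed once rather than once per neighbour.
horizontalSum :: Int -> (Int -> Int) -> Cursor -> Int
horizontalSum width src c =
  at c + at (shiftCursor width (-1) 0 c) + at (shiftCursor width 1 0 c)
  where
    at (Cursor i) = src i

-- Serial walk over a w0 x h0 block at (x0, y0): each row starts from a fresh
-- cursor and steps right with the shift function.
walkBlock
  :: (Int -> Int -> c) -> (Int -> Int -> c -> c) -> (c -> e)
  -> Int -> Int -> Int -> Int -> [e]
walkBlock mk shift eval x0 y0 w0 h0 =
  concat
    [ map eval (take w0 (iterate (shift 1 0) (mk x0 y)))
    | y <- [y0 .. y0 + h0 - 1] ]
```

With width 4, `makeCursor 4 1 1` is `Cursor 5`, and stepping right with `shiftCursor 4 1 0` visits offsets 5, 6, 7 and so on. This is the kind of shared index arithmetic the cursor interface is meant to expose to the optimisers.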
Reduction
foldAll
:: Gang | Gang to run the operation on. |
-> (Int# -> a) | Function to get an element from the source. |
-> (a -> a -> a) | Binary associative combining function. |
-> a | Starting value. |
-> Int# | Number of elements. |
-> IO a |
Parallel tree reduction of an array to a single value. Each thread takes an equally sized chunk of the data and computes a partial sum. The main thread then reduces the array of partial sums to the final result.
We don't require that the initial value be a neutral element, so each thread computes a fold1 on its chunk of the data, and the seed element is only applied in the final reduction step.
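Here is a minimal sketch of this two-level scheme. It assumes boxed `Int` indices and a hypothetical `spawn` helper built on `forkIO`. Each worker applies `foldl1'` to its own chunk, so no seed is needed at that stage. Only the final left fold over the partial results starts from `z`. Empty chunks, which occur when there are more workers than elements, are skipped. The names `foldAllSketch`, `spawn` and `splitIx` are illustrative only.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception (SomeException, evaluate, throwIO, try)
import Control.Monad (forM)
import Data.List (foldl', foldl1')

-- Run an action on a new thread; the MVar receives its result or exception.
spawn :: IO b -> IO (MVar (Either SomeException b))
spawn act = do
  slot <- newEmptyMVar
  _ <- forkIO (try act >>= putMVar slot)
  pure slot

-- First index of thread t's chunk (even split, first chunks one longer).
splitIx :: Int -> Int -> Int -> Int
splitIx workers len t
  | t < leftover = t * (chunkLen + 1)
  | otherwise    = t * chunkLen + leftover
  where
    chunkLen = len `quot` workers
    leftover = len `rem` workers

foldAllSketch :: Int -> (Int -> a) -> (a -> a -> a) -> a -> Int -> IO a
foldAllSketch workers getElem f z len = do
  -- Phase 1: every worker reduces its own chunk without the seed.
  slots <- forM [0 .. workers - 1] $ \t ->
    spawn (chunkFold [splitIx workers len t .. splitIx workers len (t + 1) - 1])
  partials <- mapM (\s -> takeMVar s >>= either throwIO pure) slots
  -- Phase 2: the calling thread folds the seed over the partial results.
  pure (foldl' f z [p | Just p <- partials])
  where
    -- Empty chunks contribute nothing; non-empty ones are forced to WHNF
    -- on the worker thread so the work really happens there.
    chunkFold [] = pure Nothing
    chunkFold ixs = Just <$> evaluate (foldl1' f (map getElem ixs))
```

Take `(+)` with seed 100 over the elements 0 to 9, split across 3 workers. The partial sums are 6, 15 and 24, and the result is 100 + 6 + 15 + 24 = 145. The seed is applied exactly once however many workers there are, which is why it need not be neutral.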
foldInner
:: Gang | Gang to run the operation on. |
-> (Int# -> a -> IO ()) | Function to write into the result buffer. |
-> (Int# -> a) | Function to get an element from the source. |
-> (a -> a -> a) | Binary associative combination operator. |
-> a | Neutral starting value. |
-> Int# | Total length of source. |
-> Int# | Inner dimension (length to fold over). |
-> IO () |
Parallel reduction of a multidimensional array along the innermost dimension. Each output value is computed by a single thread, with the output values distributed evenly amongst the available threads.
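The following is a pure sketch of how the outputs could be shared out. It makes three assumptions: boxed `Int`s; a source laid out so that output o folds the `inner` consecutive elements starting at o * inner; and the same even split as a chunked fill, applied to the number of outputs. `foldInnerThread` is a hypothetical name. It returns the (output index, value) writes performed by thread t. In this sketch the starting value seeds the fold of every output, which is why it has to be neutral.

```haskell
import Data.List (foldl')

-- Writes performed by thread t: each output o in the thread's range is the
-- left fold of source elements o * inner .. o * inner + inner - 1 from z.
-- Assumes inner > 0 and that total is a multiple of inner.
foldInnerThread
  :: Int -> (Int -> a) -> (a -> a -> a) -> a -> Int -> Int -> Int -> [(Int, a)]
foldInnerThread workers getElem f z total inner t =
  [ (o, foldl' f z [getElem (o * inner + k) | k <- [0 .. inner - 1]])
  | o <- [start t .. start (t + 1) - 1] ]
  where
    outs     = total `quot` inner
    chunkLen = outs `quot` workers
    leftover = outs `rem` workers
    -- First output owned by thread u (the first chunks are one output longer).
    start u
      | u < leftover = u * (chunkLen + 1)
      | otherwise    = u * chunkLen + leftover
```

Consider a 3 x 4 source (total 12, inner 4) holding the elements 0 to 11, summed with `(+)` from 0 and split across 2 workers. Thread 0 produces outputs 0 and 1, with values 6 and 22. Thread 1 produces output 2, with value 38.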