Copyright | (c) 2009 Bryan O'Sullivan |
---|---|
License | BSD3 |
Maintainer | bos@serpentine.com |
Stability | experimental |
Portability | portable |
Safe Haskell | None |
Language | Haskell98 |
Functions for approximating quantiles, i.e. points taken at regular intervals from the cumulative distribution function of a random variable.
The number of quantiles is described below by the variable q, so with q=4, a 4-quantile (also known as a quartile) has 4 intervals, and contains 5 points. The parameter k describes the desired point, where 0 ≤ k ≤ q.
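As a small illustration of the k and q convention, the sketch below lists the probability levels of the q-quantiles. It assumes that the kth q-quantile corresponds to the level k/q; the name quantileLevels is hypothetical and is not part of this module.

```haskell
-- Hypothetical helper (not part of Statistics.Quantile): the probability
-- levels k/q for k = 0..q, assuming the kth q-quantile sits at level k/q.
quantileLevels :: Int -> [Double]
quantileLevels q = [fromIntegral k / fromIntegral q | k <- [0 .. q]]

main :: IO ()
main = print (quantileLevels 4)  -- prints [0.0,0.25,0.5,0.75,1.0]
```

With q=4 this yields the 5 points that bound the 4 intervals.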
Synopsis
- data ContParam = ContParam !Double !Double
- class Default a where
- quantile :: Vector v Double => ContParam -> Int -> Int -> v Double -> Double
- quantiles :: (Vector v Double, Foldable f, Functor f) => ContParam -> f Int -> Int -> v Double -> f Double
- quantilesVec :: (Vector v Double, Vector v Int) => ContParam -> v Int -> Int -> v Double -> v Double
- cadpw :: ContParam
- hazen :: ContParam
- spss :: ContParam
- s :: ContParam
- medianUnbiased :: ContParam
- normalUnbiased :: ContParam
- weightedAvg :: Vector v Double => Int -> Int -> v Double -> Double
- median :: Vector v Double => ContParam -> v Double -> Double
- mad :: Vector v Double => ContParam -> v Double -> Double
- midspread :: Vector v Double => ContParam -> Int -> v Double -> Double
- continuousBy :: Vector v Double => ContParam -> Int -> Int -> v Double -> Double
Quantile estimation functions
Below is a family of functions that use the same algorithm to estimate sample quantiles. It approximates the empirical CDF as a continuous piecewise-linear function that interpolates between the points \((X_k,p_k)\), where \(X_k\) is the k-th order statistic (the k-th smallest element) and \(p_k\) is the probability corresponding to it. ContParam determines how \(p_k\) is chosen. For a more detailed explanation see [Hyndman1996].
This is the method used by most statistical software, such as R, Mathematica, SPSS, and S.
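The interpolation can be sketched in plain Haskell as follows. This is a minimal sketch, not the library's implementation: it assumes the position formula h = α + (k/q)·(n + 1 − α − β) from [Hyndman1996] with 1-based order statistics, it works on lists rather than vectors, and it assumes a nonempty, NaN-free sample. Params and quantileSketch are hypothetical names.

```haskell
import Data.List (sort)

-- Hypothetical stand-in for ContParam: the pair (α, β).
data Params = Params Double Double

-- Sketch of the kth q-quantile by linear interpolation between
-- neighbouring order statistics. Assumes a nonempty sample without NaN.
quantileSketch :: Params -> Int -> Int -> [Double] -> Double
quantileSketch (Params a b) k q xs = (1 - g) * item (j - 1) + g * item j
  where
    sorted = sort xs
    n      = length sorted
    p      = fromIntegral k / fromIntegral q
    -- Fractional 1-based position of the requested quantile.
    h      = a + p * (fromIntegral n + 1 - a - b)
    j      = floor h :: Int
    g      = h - fromIntegral j
    -- 0-based lookup, clamped to the valid index range.
    item i = sorted !! max 0 (min (n - 1) i)

main :: IO ()
main = do
  let xs = [1, 1, 2, 2, 3]
      mu = Params (1 / 3) (1 / 3)  -- α = β = 1/3
  print (quantileSketch mu 1 4 xs)  -- first quartile
  print (quantileSketch mu 3 4 xs)  -- third quartile
```

The clamping in item keeps k=0 and k=q inside the sample range, so those calls return the minimum and maximum of the sample.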
data ContParam = ContParam !Double !Double
Parameters α and β to the continuousBy function. The exact meaning of the parameters is described in [Hyndman1996], in the section "Piecewise linear functions".
Instances
- Eq ContParam
- Data ContParam
- Ord ContParam
- Show ContParam
- Generic ContParam
- ToJSON ContParam
- FromJSON ContParam
- Binary ContParam
- Default ContParam
- type Rep ContParam
All instances are defined in Statistics.Quantile.
class Default a where
A class for types with a default value.
Instances
- Default ContParam
quantile
:: Vector v Double
=> ContParam | Parameters α and β.
-> Int | k, the desired quantile.
-> Int | q, the number of quantiles.
-> v Double | x, the sample data.
-> Double
O(n·log n). Estimate the kth q-quantile of a sample x, using the continuous sample method with the given parameters.
The following properties must hold; otherwise an error is thrown.
- the input sample must be nonempty
- the input must not contain NaN
- 0 ≤ k ≤ q
quantiles :: (Vector v Double, Foldable f, Functor f) => ContParam -> f Int -> Int -> v Double -> f Double
O(k·n·log n). Estimate a set of q-quantiles of a sample x, one for each requested k, using the continuous sample method with the given parameters. This is faster than calling quantile repeatedly because the sample is sorted only once.
The following properties must hold; otherwise an error is thrown.
- the input sample must be nonempty
- the input must not contain NaN
- for every k in the set of requested quantiles, 0 ≤ k ≤ q
quantilesVec :: (Vector v Double, Vector v Int) => ContParam -> v Int -> Int -> v Double -> v Double
Parameters for the continuous sample method
cadpw :: ContParam
California Department of Public Works definition, α=0, β=1. Gives a linear interpolation of the empirical CDF. This corresponds to method 4 in R and Mathematica.
hazen :: ContParam
Hazen's definition, α=0.5, β=0.5. This is claimed to be popular among hydrologists. This corresponds to method 5 in R and Mathematica.
spss :: ContParam
Definition used by the SPSS statistics application, with α=0, β=0 (also known as Weibull's definition). This corresponds to method 6 in R and Mathematica.
s :: ContParam
Definition used by the S statistics application, with α=1, β=1. The interpolation points divide the sample range into n-1 intervals. This corresponds to method 7 in R and Mathematica and is the default in R.
medianUnbiased :: ContParam
Median unbiased definition, α=1/3, β=1/3. The resulting quantile estimates are approximately median unbiased regardless of the distribution of x. This corresponds to method 8 in R and Mathematica.
normalUnbiased :: ContParam
Normal unbiased definition, α=3/8, β=3/8. An approximately unbiased estimate if the empirical distribution approximates the normal distribution. This corresponds to method 9 in R and Mathematica.
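To compare the six parameter sets, the sketch below prints the probability \(p_k\) that each one assigns to the k-th order statistic of a sample of size 5. It assumes the parameterization in [Hyndman1996], under which \(p_k = (k - α)/(n + 1 - α - β)\); plottingPosition and presets are hypothetical names, and the (α, β) pairs are copied from the definitions above.

```haskell
-- Hypothetical helper: probability assigned to the kth order statistic
-- (1-based) of a sample of size n, assuming p_k = (k - α)/(n + 1 - α - β).
plottingPosition :: Double -> Double -> Int -> Int -> Double
plottingPosition a b n k =
  (fromIntegral k - a) / (fromIntegral n + 1 - a - b)

-- (name, α, β) triples as listed in this section.
presets :: [(String, Double, Double)]
presets =
  [ ("cadpw", 0, 1)
  , ("hazen", 0.5, 0.5)
  , ("spss", 0, 0)
  , ("s", 1, 1)
  , ("medianUnbiased", 1 / 3, 1 / 3)
  , ("normalUnbiased", 3 / 8, 3 / 8)
  ]

main :: IO ()
main = mapM_ row presets
  where
    n = 5
    -- One line per preset: its name and the positions p_1 .. p_n.
    row (name, a, b) =
      putStrLn (name ++ ": " ++ show [plottingPosition a b n k | k <- [1 .. n]])
```

Under this assumption, s places the smallest and largest elements at probabilities 0 and 1, which is consistent with the remark that its interpolation points divide the sample range into n-1 intervals.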
Other algorithms
weightedAvg
:: Vector v Double
=> Int | k, the desired quantile.
-> Int | q, the number of quantiles.
-> v Double | x, the sample data.
-> Double
O(n·log n). Estimate the kth q-quantile of a sample, using the weighted average method. Up to rounding errors, it is the same as quantile s.
The following properties must hold; otherwise an error is thrown.
- the length of the input is greater than 0
- the input does not contain NaN
- k ≥ 0 and k ≤ q
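A minimal sketch of a weighted average estimate on lists is given below. It assumes the method interpolates linearly at the fractional 0-based index (n − 1)·k/q of the sorted sample, which is what the α=1, β=1 parameters reduce to; the library's actual code may differ. weightedAvgSketch is a hypothetical name, and the sample is assumed nonempty and NaN-free.

```haskell
import Data.List (sort)

-- Sketch: interpolate linearly at the fractional 0-based index
-- (n - 1) * k / q of the sorted sample. Assumes a nonempty sample.
weightedAvgSketch :: Int -> Int -> [Double] -> Double
weightedAvgSketch k q xs = xj + g * (xj1 - xj)
  where
    sorted = sort xs
    n      = length sorted
    idx    = fromIntegral (n - 1) * fromIntegral k / fromIntegral q :: Double
    j      = floor idx :: Int
    g      = idx - fromIntegral j
    xj     = sorted !! j
    -- Clamp the upper neighbour so that k == q stays in range.
    xj1    = sorted !! min (n - 1) (j + 1)

main :: IO ()
main = do
  let xs = [3, 1, 2, 5, 4]
  print (weightedAvgSketch 1 4 xs)  -- first quartile, prints 2.0
  print (weightedAvgSketch 2 4 xs)  -- median, prints 3.0
```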
Median & other specializations
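median :: Vector v Double => ContParam -> v Double -> Double
O(n·log n). Estimate the median of a sample.
mad :: Vector v Double => ContParam -> v Double -> Double
O(n·log n). Estimate the median absolute deviation (MAD) of a sample x using continuousBy. It is a robust estimate of the variability in a sample and is defined as: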
\[ MAD = \operatorname{median}(| X_i - \operatorname{median}(X) |) \]
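The definition can be sketched on plain lists as follows. This sketch uses the textbook median (the middle element, or the mean of the two middle elements) instead of a ContParam-based estimate, so it only illustrates the formula; medianSketch and madSketch are hypothetical names, and the sample is assumed nonempty.

```haskell
import Data.List (sort)

-- Textbook median: middle element, or mean of the two middle elements.
-- Assumes a nonempty sample.
medianSketch :: [Double] -> Double
medianSketch xs
  | odd n     = sorted !! mid
  | otherwise = (sorted !! (mid - 1) + sorted !! mid) / 2
  where
    sorted = sort xs
    n      = length sorted
    mid    = n `div` 2

-- MAD = median(|X_i - median(X)|).
madSketch :: [Double] -> Double
madSketch xs = medianSketch [abs (x - m) | x <- xs]
  where
    m = medianSketch xs

main :: IO ()
main = print (madSketch [1, 1, 2, 2, 4, 6, 9])  -- prints 1.0
```

Here the median is 2, the absolute deviations are 1, 1, 0, 0, 2, 4, 7, and their median is 1.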
midspread
:: Vector v Double
=> ContParam | Parameters α and β.
-> Int | q, the number of quantiles.
-> v Double | x, the sample data.
-> Double
O(n·log n). Estimate the range between q-quantiles 1 and q-1 of a sample x, using the continuous sample method with the given parameters.
For instance, the interquartile range (IQR) can be estimated as follows:
midspread medianUnbiased 4 (U.fromList [1,1,2,2,3]) ==> 1.333333
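The value 1.333333 can be checked by hand. The derivation below assumes the position formula \(h_k = α + \frac{k}{q}(n + 1 - α - β)\) from [Hyndman1996], with 1-based order statistics \(X_1 ≤ … ≤ X_5 = (1, 1, 2, 2, 3)\), \(n = 5\), \(q = 4\) and \(α = β = 1/3\); the symbols \(h_k\) and \(Q_k\) are introduced here for illustration only.

\[
\begin{aligned}
h_1 &= \tfrac{1}{3} + \tfrac{1}{4}\left(5 + 1 - \tfrac{2}{3}\right) = \tfrac{5}{3}, & Q_1 &= \tfrac{1}{3} X_1 + \tfrac{2}{3} X_2 = 1, \\
h_3 &= \tfrac{1}{3} + \tfrac{3}{4}\left(5 + 1 - \tfrac{2}{3}\right) = \tfrac{13}{3}, & Q_3 &= \tfrac{2}{3} X_4 + \tfrac{1}{3} X_5 = \tfrac{7}{3}, \\
Q_3 - Q_1 &= \tfrac{4}{3} \approx 1.333333.
\end{aligned}
\]

The integer part of each \(h_k\) selects the lower order statistic and the fractional part is the interpolation weight on the next one.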
Deprecated
continuousBy
:: Vector v Double
=> ContParam | Parameters α and β.
-> Int | k, the desired quantile.
-> Int | q, the number of quantiles.
-> v Double | x, the sample data.
-> Double
Deprecated: Use quantile instead
References
- Weisstein, E.W. Quantile. MathWorld. http://mathworld.wolfram.com/Quantile.html
- [Hyndman1996] Hyndman, R.J.; Fan, Y. (1996) Sample quantiles in statistical packages. American Statistician 50(4):361–365. http://www.jstor.org/stable/2684934