Classes

Several classes for estimating statistics and generating plots.

Dabest

 Dabest (data, idx, x, y, paired, id_col, ci, resamples, random_seed,
         proportional, delta2, experiment, experiment_label, x1_level,
         mini_meta)

Class for estimation statistics and plots.

Example: mean_diff

from scipy.stats import norm
import pandas as pd
import dabest

control = norm.rvs(loc=0, size=30, random_state=12345)
test    = norm.rvs(loc=0.5, size=30, random_state=12345)
my_df   = pd.DataFrame({"control": control,
                        "test": test})
my_dabest_object = dabest.load(my_df, idx=("control", "test"))
my_dabest_object.mean_diff
DABEST v2023.2.14
=================
                 
Good evening!
The current time is Fri Mar 31 19:41:17 2023.

The unpaired mean difference between control and test is 0.5 [95%CI -0.0412, 1.0].
The p-value of the two-sided permutation t-test is 0.0758, calculated for legacy purposes only. 

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
Any p-value reported is the probability of observing the effect size (or greater),
assuming the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.

To get the results of all valid statistical tests, use `.mean_diff.statistical_tests`

This is simply the mean of the control group subtracted from the mean of the test group.

\[\text{Mean difference} = \overline{x}_{Test} - \overline{x}_{Control}\]

where \(\overline{x}\) is the mean for the group \(x\).

Example: median_diff

control = norm.rvs(loc=0, size=30, random_state=12345)
test    = norm.rvs(loc=0.5, size=30, random_state=12345)
my_df   = pd.DataFrame({"control": control,
                        "test": test})
my_dabest_object = dabest.load(my_df, idx=("control", "test"))
my_dabest_object.median_diff
c:\users\zhang\desktop\vnbdev-dabest\dabest-python\dabest\effsize.py:72: UserWarning: Using median as the statistic in bootstrapping may result in a biased estimate and cause problems with BCa confidence intervals. Consider using a different statistic, such as the mean.
When plotting, please consider using percentile confidence intervals by specifying `ci_type='percentile'`. For detailed information, refer to https://github.com/ACCLAB/DABEST-python/issues/129 

  return func_difference(control, test, np.median, is_paired)
DABEST v2023.2.14
=================
                 
Good afternoon!
The current time is Thu Mar 30 17:07:33 2023.

The unpaired median difference between control and test is 0.5 [95%CI -0.0758, 0.991].
The p-value of the two-sided permutation t-test is 0.103, calculated for legacy purposes only. 

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
Any p-value reported is the probability of observing the effect size (or greater),
assuming the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.

To get the results of all valid statistical tests, use `.median_diff.statistical_tests`

This is the median difference between the control group and the test group.

If the comparison(s) are unpaired, median_diff is computed with the following equation:

\[\text{Median difference} = \widetilde{x}_{Test} - \widetilde{x}_{Control}\]

where \(\widetilde{x}\) is the median for the group \(x\).

If the comparison(s) are paired, median_diff is computed with the following equation:

\[\text{Median difference} = \widetilde{x}_{Test - Control}\]

Things to note

Using median difference as the statistic in bootstrapping may result in a biased estimate and cause problems with BCa confidence intervals. Consider using mean difference instead.

When plotting, consider using percentile confidence intervals instead of BCa confidence intervals by specifying ci_type = 'percentile' in .plot().

For detailed information, please refer to Issue 129 (https://github.com/ACCLAB/DABEST-python/issues/129).

Example: cohens_d

control = norm.rvs(loc=0, size=30, random_state=12345)
test    = norm.rvs(loc=0.5, size=30, random_state=12345)
my_df   = pd.DataFrame({"control": control,
                        "test": test})
my_dabest_object = dabest.load(my_df, idx=("control", "test"))
my_dabest_object.cohens_d
DABEST v2023.2.14
=================
                 
Good afternoon!
The current time is Thu Mar 30 17:07:39 2023.

The unpaired Cohen's d between control and test is 0.471 [95%CI -0.0843, 0.976].
The p-value of the two-sided permutation t-test is 0.0758, calculated for legacy purposes only. 

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
Any p-value reported is the probability of observing the effect size (or greater),
assuming the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.

To get the results of all valid statistical tests, use `.cohens_d.statistical_tests`

Cohen’s d is the standardized mean difference: the mean of the control group subtracted from the mean of the test group, divided by a standard deviation of the observations (defined below).

If paired is None, then the comparison(s) are unpaired; otherwise the comparison(s) are paired.

If the comparison(s) are unpaired, Cohen’s d is computed with the following equation:

\[d = \frac{\overline{x}_{Test} - \overline{x}_{Control}} {\text{pooled standard deviation}}\]

For paired comparisons, Cohen’s d is given by

\[d = \frac{\overline{x}_{Test} - \overline{x}_{Control}} {\text{average standard deviation}}\]

where \(\overline{x}\) is the mean of the respective group of observations and \({Var}_{x}\) denotes the variance of that group,

\[\text{pooled standard deviation} = \sqrt{ \frac{(n_{control} - 1) * {Var}_{control} + (n_{test} - 1) * {Var}_{test} } {n_{control} + n_{test} - 2} }\]

and

\[\text{average standard deviation} = \sqrt{ \frac{{Var}_{control} + {Var}_{test}} {2}}\]

The sample variance (and standard deviation) uses N - 1 degrees of freedom. This is an application of Bessel’s correction, and yields the unbiased sample variance.

References:

https://en.wikipedia.org/wiki/Effect_size#Cohen's_d

https://en.wikipedia.org/wiki/Bessel%27s_correction

https://en.wikipedia.org/wiki/Standard_deviation#Corrected_sample_standard_deviation

Example: cohens_h

from scipy.stats import randint

control = randint.rvs(0, 2, size=30, random_state=12345)
test    = randint.rvs(0, 2, size=30, random_state=12345)
my_df   = pd.DataFrame({"control": control,
                        "test": test})
my_dabest_object = dabest.load(my_df, idx=("control", "test"))
my_dabest_object.cohens_h
DABEST v2023.2.14
=================
                 
Good evening!
The current time is Mon Mar 27 00:48:59 2023.

The unpaired Cohen's h between control and test is 0.0 [95%CI -0.613, 0.429].
The p-value of the two-sided permutation t-test is 0.799, calculated for legacy purposes only. 

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
Any p-value reported is the probability of observing the effect size (or greater),
assuming the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.

To get the results of all valid statistical tests, use `.cohens_h.statistical_tests`

Cohen’s h uses the proportions in the control and test groups to calculate the distance between the two proportions.

It can be used to describe the difference between two proportions as “small”, “medium”, or “large”, and to judge whether that difference is “meaningful”.

A directional Cohen’s h is computed with the following equation:

\[h = 2 * \arcsin{\sqrt{proportion_{Test}}} - 2 * \arcsin{\sqrt{proportion_{Control}}}\]

For a non-directional Cohen’s h, the equation is:

\[h = |2 * \arcsin{\sqrt{proportion_{Test}}} - 2 * \arcsin{\sqrt{proportion_{Control}}}|\]

References:

https://en.wikipedia.org/wiki/Cohen%27s_h

Example: hedges_g

control = norm.rvs(loc=0, size=30, random_state=12345)
test    = norm.rvs(loc=0.5, size=30, random_state=12345)
my_df   = pd.DataFrame({"control": control,
                        "test": test})
my_dabest_object = dabest.load(my_df, idx=("control", "test"))
my_dabest_object.hedges_g
DABEST v2023.2.14
=================
                 
Good evening!
The current time is Mon Mar 27 00:50:18 2023.

The unpaired Hedges' g between control and test is 0.465 [95%CI -0.0832, 0.963].
The p-value of the two-sided permutation t-test is 0.0758, calculated for legacy purposes only. 

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
Any p-value reported is the probability of observing the effect size (or greater),
assuming the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.

To get the results of all valid statistical tests, use `.hedges_g.statistical_tests`

Hedges’ g is cohens_d corrected for bias via multiplication with the following correction factor:

\[\frac{ \Gamma( \frac{a} {2} )} {\sqrt{ \frac{a} {2} } \times \Gamma( \frac{a - 1} {2} )}\]

where

\[a = {n}_{control} + {n}_{test} - 2\]

and \(\Gamma(x)\) is the Gamma function.

References:

https://en.wikipedia.org/wiki/Effect_size#Hedges'_g

https://journals.sagepub.com/doi/10.3102/10769986006002107

Example: cliffs_delta

control = norm.rvs(loc=0, size=30, random_state=12345)
test    = norm.rvs(loc=0.5, size=30, random_state=12345)
my_df   = pd.DataFrame({"control": control,
                        "test": test})
my_dabest_object = dabest.load(my_df, idx=("control", "test"))
my_dabest_object.cliffs_delta
DABEST v2023.2.14
=================
                 
Good evening!
The current time is Mon Mar 27 00:53:30 2023.

The unpaired Cliff's delta between control and test is 0.28 [95%CI -0.0244, 0.533].
The p-value of the two-sided permutation t-test is 0.061, calculated for legacy purposes only. 

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
Any p-value reported is the probability of observing the effect size (or greater),
assuming the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.

To get the results of all valid statistical tests, use `.cliffs_delta.statistical_tests`

Cliff’s delta is a measure of ordinal dominance, i.e., how often values from the test sample are larger than values from the control sample.

\[\text{Cliff's delta} = \frac{\#({x}_{test} > {x}_{control}) - \#({x}_{test} < {x}_{control})} {{n}_{Test} \times {n}_{Control}}\]

where \(\#\) denotes the number of times a value from the test sample exceeds (or is less than) a value from the control sample.

Cliff’s delta ranges from -1 to 1; it can also be thought of as a measure of the degree of overlap between the two samples. An attractive aspect of this effect size is that it makes no assumptions about the underlying distributions from which the samples were drawn.

References:

https://en.wikipedia.org/wiki/Effect_size#Effect_size_for_ordinal_data

https://psycnet.apa.org/record/1994-08169-001


DeltaDelta

 DeltaDelta (effectsizedataframe, permutation_count, ci=95)

A class to compute and store the delta-delta statistics for experiments with a 2-by-2 arrangement, where two independent variables, A and B, each take two categorical values, 1 and 2. The data are divided into two pairs of groups, and a primary delta is first calculated as the mean difference within each pair:

\[\Delta_{1} = \overline{X}_{A_{2}, B_{1}} - \overline{X}_{A_{1}, B_{1}}\]

\[\Delta_{2} = \overline{X}_{A_{2}, B_{2}} - \overline{X}_{A_{1}, B_{2}}\]

where \(\overline{X}_{A_{i}, B_{j}}\) is the mean of the sample with A = i and B = j, and \(\Delta\) is the mean difference between the two samples.

A delta-delta value is then calculated as the difference between the two primary deltas:

\[\Delta_{\Delta} = \Delta_{2} - \Delta_{1}\]

and the standard deviation of the delta-delta value is calculated from the pooled variance of the four samples:

\[s_{\Delta_{\Delta}} = \sqrt{\frac{(n_{A_{2}, B_{1}}-1)s_{A_{2}, B_{1}}^2+(n_{A_{1}, B_{1}}-1)s_{A_{1}, B_{1}}^2+(n_{A_{2}, B_{2}}-1)s_{A_{2}, B_{2}}^2+(n_{A_{1}, B_{2}}-1)s_{A_{1}, B_{2}}^2}{(n_{A_{2}, B_{1}} - 1) + (n_{A_{1}, B_{1}} - 1) + (n_{A_{2}, B_{2}} - 1) + (n_{A_{1}, B_{2}} - 1)}}\]

where \(s\) is the standard deviation and \(n\) is the sample size.

Example: delta-delta

import numpy as np
import pandas as pd
from scipy.stats import norm
import dabest

np.random.seed(9999) # Fix the seed so the results are replicable.
N = 20
# Create samples
y = norm.rvs(loc=3, scale=0.4, size=N*4)
y[N:2*N] = y[N:2*N]+1
y[2*N:3*N] = y[2*N:3*N]-0.5
# Add a `Treatment` column
t1 = np.repeat('Placebo', N*2).tolist()
t2 = np.repeat('Drug', N*2).tolist()
treatment = t1 + t2 
# Add a `Rep` column as the first variable, marking the 2 experimental replicates
rep = []
for i in range(N*2):
    rep.append('Rep1')
    rep.append('Rep2')
# Add a `Genotype` column as the second variable
wt = np.repeat('W', N).tolist()
mt = np.repeat('M', N).tolist()
wt2 = np.repeat('W', N).tolist()
mt2 = np.repeat('M', N).tolist()
genotype = wt + mt + wt2 + mt2
# Add an `id` column for paired data plotting.
id = list(range(0, N*2))
id_col = id + id 
# Combine all columns into a DataFrame.
df_delta2 = pd.DataFrame({'ID'       : id_col,
                          'Rep'      : rep,
                          'Genotype' : genotype,
                          'Treatment': treatment,
                          'Y'        : y
                         })
unpaired_delta2 = dabest.load(data = df_delta2, x = ["Genotype", "Genotype"],
                              y = "Y", delta2 = True, experiment = "Treatment")
unpaired_delta2.mean_diff.plot();


MiniMetaDelta

 MiniMetaDelta (effectsizedataframe, permutation_count, ci=95)

A class to compute and store the weighted delta. A weighted delta is calculated if the argument mini_meta=True is passed to dabest.load().

The weighted delta is calculated as follows:

\[\theta_{\text{weighted}} = \frac{\sum \hat{\theta}_{i} w_{i}}{\sum w_{i}}\]

where:

\[\hat{\theta}_{i} = \text{mean difference for replicate } i\]

\[w_{i} = \text{weight for replicate } i = \frac{1}{s_{i}^2}\]

\[s_{i}^2 = \text{pooled variance for replicate } i = \frac{(n_{test}-1)s_{test}^2+(n_{control}-1)s_{control}^2}{n_{test}+n_{control}-2}\]

and \(n\) is the sample size and \(s^2\) the variance of the control/test group.

Example: mini-meta-delta

Ns = 20
c1 = norm.rvs(loc=3, scale=0.4, size=Ns)
c2 = norm.rvs(loc=3.5, scale=0.75, size=Ns)
c3 = norm.rvs(loc=3.25, scale=0.4, size=Ns)
t1 = norm.rvs(loc=3.5, scale=0.5, size=Ns)
t2 = norm.rvs(loc=2.5, scale=0.6, size=Ns)
t3 = norm.rvs(loc=3, scale=0.75, size=Ns)
my_df   = pd.DataFrame({'Control 1' : c1,     'Test 1' : t1,
                        'Control 2' : c2,     'Test 2' : t2,
                        'Control 3' : c3,     'Test 3' : t3})
my_dabest_object = dabest.load(my_df, idx=(("Control 1", "Test 1"),
                                           ("Control 2", "Test 2"),
                                           ("Control 3", "Test 3")),
                               mini_meta=True)
my_dabest_object.mean_diff.mini_meta_delta
DABEST v2023.2.14
=================
                 
Good morning!
The current time is Mon Mar 27 01:01:11 2023.

The weighted-average unpaired mean difference is 0.0336 [95%CI -0.137, 0.228].
The p-value of the two-sided permutation t-test is 0.736, calculated for legacy purposes only. 

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
Any p-value reported is the probability of observing the effect size (or greater),
assuming the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.

As of version 2023.02.14, weighted delta can only be calculated for mean difference, and not for standardized measures such as Cohen’s d.

Details about the calculated weighted delta can be accessed as attributes of the mini_meta_delta object; see the MiniMetaDelta documentation above for usage details.

Refer to Chapter 10 of the Cochrane handbook for further information on meta-analysis: https://training.cochrane.org/handbook/current/chapter-10


TwoGroupsEffectSize

 TwoGroupsEffectSize (control, test, effect_size, proportional=False,
                      is_paired=None, ci=95, resamples=5000,
                      permutation_count=5000, random_seed=12345)

A class to compute and store the results of bootstrapped mean differences between two groups.

Compute the effect size between two groups.

                    Type         Default  Details
control             array-like            These should be numerical iterables.
test                array-like            These should be numerical iterables.
effect_size         string                Any one of the following are accepted inputs:
                                          'mean_diff', 'median_diff', 'cohens_d',
                                          'hedges_g', or 'cliffs_delta'.
proportional        bool         False
is_paired           NoneType     None
ci                  int          95       The confidence interval width. The default of
                                          95 produces 95% confidence intervals.
resamples           int          5000     The number of bootstrap resamples to be taken
                                          for the calculation of the confidence
                                          interval limits.
permutation_count   int          5000     The number of permutations (reshuffles) to
                                          perform for the computation of the
                                          permutation p-value.
random_seed         int          12345    Seeds the random number generator during
                                          bootstrap resampling, ensuring that the
                                          confidence intervals reported are replicable.

Returns a TwoGroupsEffectSize object with the following attributes:

difference : float
    The effect size of the difference between the control and the test.
effect_size : string
    The type of effect size reported.
is_paired : string
    The type of repeated-measures experiment.
ci : float
    The width of the confidence interval, in percent.
alpha : float
    The significance level of the statistical test, as a float between 0 and 1.
resamples : int
    The number of resamples performed during the bootstrap procedure.
bootstraps : numpy ndarray
    The generated bootstraps of the effect size.
random_seed : int
    The number used to initialise the numpy random number generator, i.e.
    seed_value from numpy.random.seed(seed_value).
bca_low, bca_high : float
    The bias-corrected and accelerated confidence interval lower and upper
    limits, respectively.
pct_low, pct_high : float
    The percentile confidence interval lower and upper limits, respectively.

Example

np.random.seed(12345)
control = norm.rvs(loc=0, size=30)
test = norm.rvs(loc=0.5, size=30)
effsize = dabest.TwoGroupsEffectSize(control, test, "mean_diff")
effsize
The unpaired mean difference is -0.253 [95%CI -0.78, 0.25].
The p-value of the two-sided permutation t-test is 0.348, calculated for legacy purposes only. 

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
Any p-value reported is the probability of observing the effect size (or greater),
assuming the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.
effsize.to_dict()
{'alpha': 0.05,
 'bca_high': 0.24951887238295106,
 'bca_interval_idx': (125, 4875),
 'bca_low': -0.7801782111071534,
 'bootstraps': array([-0.3649424 , -0.45018155, -0.56034412, ..., -0.49805581,
        -0.25334475, -0.55206229]),
 'ci': 95,
 'difference': -0.25315417702752846,
 'effect_size': 'mean difference',
 'is_paired': None,
 'pct_high': 0.24951887238295106,
 'pct_interval_idx': (125, 4875),
 'pct_low': -0.7801782111071534,
 'permutation_count': 5000,
 'permutations': array([ 0.17221029,  0.03112419, -0.13911387, ..., -0.38007941,
         0.30261507, -0.09073054]),
 'permutations_var': array([0.07201642, 0.07251104, 0.07219407, ..., 0.07003705, 0.07094885,
        0.07238581]),
 'proportional_difference': nan,
 'pvalue_brunner_munzel': nan,
 'pvalue_kruskal': nan,
 'pvalue_mann_whitney': 0.5201446121616038,
 'pvalue_mcnemar': nan,
 'pvalue_paired_students_t': nan,
 'pvalue_permutation': 0.3484,
 'pvalue_students_t': 0.34743913903372836,
 'pvalue_welch': 0.3474493875548964,
 'pvalue_wilcoxon': nan,
 'random_seed': 12345,
 'resamples': 5000,
 'statistic_brunner_munzel': nan,
 'statistic_kruskal': nan,
 'statistic_mann_whitney': 494.0,
 'statistic_mcnemar': nan,
 'statistic_paired_students_t': nan,
 'statistic_students_t': 0.9472545159069105,
 'statistic_welch': 0.9472545159069105,
 'statistic_wilcoxon': nan}

EffectSizeDataFrame

 EffectSizeDataFrame (dabest, effect_size, is_paired, ci=95,
                      proportional=False, resamples=5000,
                      permutation_count=5000, random_seed=12345,
                      x1_level=None, x2=None, delta2=False,
                      experiment_label=None, mini_meta=False)

A class that generates and stores the results of bootstrapped effect sizes for several comparisons.

Example: plot

Create a Gardner-Altman estimation plot for the mean difference.

np.random.seed(9999) # Fix the seed so the results are replicable.
# pop_size = 10000 # Size of each population.
Ns = 20 # The number of samples taken from each population

# Create samples
c1 = norm.rvs(loc=3, scale=0.4, size=Ns)
c2 = norm.rvs(loc=3.5, scale=0.75, size=Ns)
c3 = norm.rvs(loc=3.25, scale=0.4, size=Ns)

t1 = norm.rvs(loc=3.5, scale=0.5, size=Ns)
t2 = norm.rvs(loc=2.5, scale=0.6, size=Ns)
t3 = norm.rvs(loc=3, scale=0.75, size=Ns)
t4 = norm.rvs(loc=3.5, scale=0.75, size=Ns)
t5 = norm.rvs(loc=3.25, scale=0.4, size=Ns)
t6 = norm.rvs(loc=3.25, scale=0.4, size=Ns)


# Add a `gender` column for coloring the data.
females = np.repeat('Female', Ns//2).tolist()
males = np.repeat('Male', Ns//2).tolist()
gender = females + males

# Add an `id` column for paired data plotting.
id_col = pd.Series(range(1, Ns+1))

# Combine samples and gender into a DataFrame.
df = pd.DataFrame({'Control 1' : c1,     'Test 1' : t1,
                   'Control 2' : c2,     'Test 2' : t2,
                   'Control 3' : c3,     'Test 3' : t3,
                   'Test 4'    : t4,     'Test 5' : t5, 'Test 6' : t6,
                   'Gender'    : gender, 'ID'     : id_col
                  })
my_data = dabest.load(df, idx=("Control 1", "Test 1"))
fig1 = my_data.mean_diff.plot();

Create a Gardner-Altman plot for the Hedges’ g effect size.

fig2 = my_data.hedges_g.plot();

Create a Cumming estimation plot for the mean difference.

fig3 = my_data.mean_diff.plot(float_contrast=False);

Create a paired Gardner-Altman plot.

my_data_paired = dabest.load(df, idx=("Control 1", "Test 1"),
                             id_col = "ID", paired='baseline')
fig4 = my_data_paired.mean_diff.plot();

Create a multi-group Cumming plot.

my_multi_groups = dabest.load(df, id_col = "ID",
                              idx=(("Control 1", "Test 1"),
                                   ("Control 2", "Test 2")))
fig5 = my_multi_groups.mean_diff.plot();

Create a shared control Cumming plot.

my_shared_control = dabest.load(df, id_col = "ID",
                                idx=("Control 1", "Test 1",
                                     "Test 2", "Test 3"))
fig6 = my_shared_control.mean_diff.plot();

Create a repeated-measures (against baseline) slope plot.

my_rm_baseline = dabest.load(df, id_col = "ID", paired = "baseline",
                             idx=("Control 1", "Test 1",
                                  "Test 2", "Test 3"))
fig7 = my_rm_baseline.mean_diff.plot();

Create a repeated-measures (sequential) slope plot.

my_rm_sequential = dabest.load(df, id_col = "ID", paired = "sequential",
                               idx=("Control 1", "Test 1",
                                    "Test 2", "Test 3"))
fig8 = my_rm_sequential.mean_diff.plot();


PermutationTest

 PermutationTest (control:np.array, test:np.array, effect_size:str,
                  is_paired:str=None, permutation_count:int=5000,
                  random_seed:int=12345, **kwargs)

A class to compute and report permutation tests.

                    Type      Default  Details
control             np.array           These should be numerical iterables.
test                np.array           These should be numerical iterables.
effect_size         str                Any one of the following are accepted inputs:
                                       'mean_diff', 'median_diff', 'cohens_d',
                                       'hedges_g', or 'cliffs_delta'.
is_paired           str       None
permutation_count   int       5000     The number of permutations (reshuffles)
                                       to perform.
random_seed         int       12345    Seeds the random number generator during
                                       permutation resampling, ensuring that the
                                       generated permutations are replicable.
kwargs

Returns a PermutationTest object with the following attributes:

difference : float
    The effect size of the difference between the control and the test.
effect_size : string
    The type of effect size reported.

Notes:

The basic concept of permutation tests is the same as that behind bootstrapping. In an “exact” permutation test, all possible reshuffles of the control and test labels are performed, and the proportion of effect sizes that equal or exceed the observed effect size is computed. This proportion is the probability, under the null hypothesis of zero difference between the test and control groups, of observing an effect size at least as extreme: the permutation p-value.

Exact permutation tests are impractical: computing the effect sizes for all reshuffles quickly becomes computationally prohibitive. A control group and a test group with 10 observations each would admit a total of \(20!\), or \(2.43 \times {10}^{18}\), reshuffles. Therefore, in practice, “approximate” permutation tests are performed, in which a sufficiently large number of reshuffles (typically 5,000 or 10,000) is performed and the p-value is computed from them.

More information can be found at https://en.wikipedia.org/wiki/Permutation_test.

Example: permutation test

control = norm.rvs(loc=0, size=30, random_state=12345)
test = norm.rvs(loc=0.5, size=30, random_state=12345)
perm_test = dabest.PermutationTest(control, test, 
                                   effect_size="mean_diff", 
                                   is_paired=None)
perm_test
5000 permutations were taken. The p-value is 0.0758.