Summarises a continuous variable, returning a tibble of descriptive statistics and a plot. When a grouping variable is supplied, results are stratified by group.
For two groups, a t-test and Wilcoxon rank-sum test are reported. For three or more groups, a one-way ANOVA and Kruskal-Wallis test are reported.
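The choice of tests by number of groups can be sketched in base R as below. This is an illustration under stated assumptions, not the package's implementation: `compare_groups()` is a hypothetical helper, and because the text above does not say whether the t-test is Welch's or Student's, the sketch simply uses the `t.test()` default (Welch).

```r
# Hypothetical helper (not part of the package): picks the pair of tests
# that matches the number of groups and returns their p-values.
compare_groups <- function(x, g) {
  keep <- !is.na(x) & !is.na(g)          # drop incomplete pairs
  x <- x[keep]
  g <- droplevels(factor(g[keep]))
  k <- nlevels(g)

  if (k == 2) {
    # Two groups: t-test (Welch by default) and Wilcoxon rank-sum test.
    c(p_ttest  = t.test(x ~ g)$p.value,
      p_wilcox = wilcox.test(x ~ g)$p.value)
  } else if (k >= 3) {
    # Three or more groups: one-way ANOVA F-test and Kruskal-Wallis test.
    c(p_anova   = summary(aov(x ~ g))[[1]][["Pr(>F)"]][1],
      p_kruskal = kruskal.test(x ~ g)$p.value)
  } else {
    numeric(0)                           # a single group: nothing to compare
  }
}

set.seed(1)
x <- rnorm(60, mean = 30, sd = 10)
two_groups   <- compare_groups(x, rep(c("a", "b"), 30))
three_groups <- compare_groups(x, rep(c("a", "b", "c"), 20))
```

Here `two_groups` carries the names `p_ttest` and `p_wilcox`, and `three_groups` carries `p_anova` and `p_kruskal`, mirroring the columns described under Value.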
Value
A tibble with one row per group (or one row when ungrouped) containing the following columns:
n: Number of non-missing observations.
n_miss: Number of missing (NA) values.
median: Median value.
p25, p75: 25th and 75th percentiles (interquartile range boundaries).
mean: Arithmetic mean.
sd: Standard deviation.
ci_lower, ci_upper: Lower and upper bounds of the 95% confidence interval for the mean. Uses the t-distribution when n < 30, and the Z-distribution when n >= 30.
min, max: Minimum and maximum observed values.
n_outliers: Count of values more than 1.5 x IQR below Q1 or above Q3 (Tukey fence method).
shapiro_p: P-value from the Shapiro-Wilk test of normality. Returns NA when n < 3 or n > 5000 (outside the valid range of the test).
normal: Logical. TRUE if shapiro_p > 0.05, indicating no significant departure from normality at the 5% level.
p_ttest: Shown when two groups are compared. P-value from an independent samples t-test, testing whether the means of the two groups differ. Assumes approximately normal distributions or large samples. All p-values are reported on the first row only; remaining rows contain NA.
p_wilcox: Shown when two groups are compared. P-value from the Wilcoxon rank-sum test (Mann-Whitney U test), a non-parametric alternative to the t-test. Preferred over p_ttest when data are skewed, ordinal, or contain outliers, as it compares ranks rather than means and makes no distributional assumptions.
p_anova: Shown when three or more groups are compared. P-value from a one-way analysis of variance (ANOVA) F-test, testing whether at least one group mean differs from the others. Assumes approximately normal distributions and equal variances across groups.
p_kruskal: Shown when three or more groups are compared. P-value from the Kruskal-Wallis test, a non-parametric alternative to one-way ANOVA. Preferred over p_anova when data are skewed or the normality assumption is not met, as it compares rank distributions and makes no distributional assumptions.
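The confidence interval, outlier count and normality columns can be reproduced for a single vector with base R, as in the sketch below. `describe_vector()` is a hypothetical helper written for illustration only. It assumes R's default quantile definition (type 7), which may differ from what `dist_sum()` uses internally.

```r
# Hypothetical helper (not part of the package): computes a subset of the
# documented columns for one numeric vector.
describe_vector <- function(x) {
  n_miss <- sum(is.na(x))
  x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  s <- sd(x)

  # 95% CI for the mean: t critical value when n < 30, normal (Z) otherwise.
  crit <- if (n < 30) qt(0.975, df = n - 1) else qnorm(0.975)
  half_width <- crit * s / sqrt(n)

  # Tukey fences: values more than 1.5 x IQR below Q1 or above Q3.
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  n_outliers <- sum(x < q[1] - fence | x > q[2] + fence)

  # shapiro.test() only accepts between 3 and 5000 non-missing values.
  shapiro_p <- if (n >= 3 && n <= 5000) shapiro.test(x)$p.value else NA_real_

  data.frame(n = n, n_miss = n_miss, mean = m, sd = s,
             ci_lower = m - half_width, ci_upper = m + half_width,
             n_outliers = n_outliers, shapiro_p = shapiro_p,
             normal = shapiro_p > 0.05)
}

res <- describe_vector(c(1, 2, 3, 4, 100, NA))
```

For this input, Q1 = 2 and Q3 = 4, so the fences sit at -1 and 7 and the value 100 is the only outlier. Note that `shapiro.test()` raises an error when all values are identical, a case the sketch does not handle.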
Examples
example_data <- dplyr::tibble(id = 1:100, age = rnorm(100, mean = 30, sd = 10),
                              group = sample(c("a", "b", "c", "d"),
                                             size = 100, replace = TRUE))
dist_sum(example_data, age, group)
#> # A tibble: 4 × 17
#>   group     n n_miss median   p25   p75  mean    sd   min   max n_outliers
#>   <chr> <int>  <int>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>      <int>
#> 1 a        33      0   32.6  23.6  39.6  31.4 10.6  12.1   55.6          0
#> 2 b        25      0   29.2  23.7  37.1  29.3  8.90  9.36  43.2          0
#> 3 c        20      0   33.8  23.7  38.4  31.7 11.9   5.53  56.5          0
#> 4 d        22      0   26.9  20.3  36.4  28.8 11.9  11.0   52.1          0
#> # ℹ 6 more variables: shapiro_p <dbl>, ci_lower <dbl>, ci_upper <dbl>,
#> #   normal <lgl>, p_anova <dbl>, p_kruskal <dbl>
example_data <- dplyr::tibble(id = 1:100, age = rnorm(100, mean = 30, sd = 10),
                              sex = sample(c("male", "female"),
                                           size = 100, replace = TRUE))
dist_sum(example_data, age, sex)
#> # A tibble: 2 × 17
#>   sex        n n_miss median   p25   p75  mean    sd   min   max n_outliers
#>   <chr>  <int>  <int>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>      <int>
#> 1 female    54      0   29.0  24.5  35.9  30.3  9.30  10.2  51.3          0
#> 2 male      46      0   34.0  25.9  40.9  33.4 10.9   11.8  50.9          0
#> # ℹ 6 more variables: shapiro_p <dbl>, ci_lower <dbl>, ci_upper <dbl>,
#> #   normal <lgl>, p_ttest <dbl>, p_wilcox <dbl>
# Save the summary statistics as a tibble (named to avoid masking base::summary).
age_by_sex <- dist_sum(example_data, age, sex)
