Statistics for Biology

4 minute read

Published:

Introduction to Statistics

Statistics helps us make sense of data by summarizing it, estimating unknowns, and testing ideas.

  • Descriptive statistics: Summarize data
  • Inferential statistics: Make predictions or generalizations
  • Applied in biology, medicine, engineering, psychology, economics, etc.

Describing Distributions

Measures of Location

  • Mean – average value
  • Median – middle value
  • Mode – most frequent value

These describe the “center” of the data.

Measures of Spread

  • Range – difference between max and min
  • Quartiles – divide data into four parts
  • Percentiles – value below which X% of data falls
  • Deviation – difference from the mean
  • Variance – average of squared deviations
  • Standard deviation – square root of variance

These show how scattered the data is.


Probability Foundations

What is Probability?

  • The likelihood of an event occurring
  • Expressed between 0 (impossible) and 1 (certain)

Law of Total Probability

If A can happen through exclusive events B and C:

P(A) = P(A|B) × P(B) + P(A|C) × P(C)

Bayesian Thinking

Bayes’ Theorem

Helps update belief in a hypothesis given new data.

P(H|D) = [P(D|H) × P(H)] / P(D)

Example: COVID-19 Testing

Even with a 99% accurate test, if prevalence is low (e.g. 1%), many positives may be false due to the base rate.


Probability Distributions

PMF (Probability Mass Function)

  • For discrete variables
  • Assigns probabilities to individual outcomes
  • Example: Coin toss

PDF (Probability Density Function)

  • For continuous variables
  • Probability = Area under the curve
  • Example: Human height

Normal Distribution

  • Symmetrical bell curve
  • Defined by mean (μ) and standard deviation (σ)
  • Most biological measurements follow this

Sampling: Population vs Sample

  • Population – entire group (e.g., all adults in India)
  • Sample – subset used to infer about the population

Sampling must be random and unbiased to ensure valid conclusions.


Estimating Parameters

Point Estimate

  • A single value (e.g., sample mean) used to estimate a population parameter

Central Limit Theorem

  • The distribution of the sample mean becomes normal as sample size increases, even if the original data isn’t normal

Confidence Intervals

Used to show a range where the population value likely falls

  • Confidence level – how sure we are (e.g., 95%)
  • Margin of error – how much the sample estimate may vary
CI = sample mean ± z × (σ / √n)

Hypothesis Testing

Goal:

Test if a difference/relationship is statistically significant

  • Null hypothesis (H₀) – no effect/difference
  • Alternative hypothesis (H₁) – there is an effect

Errors:

  • Type I (α): False positive (rejecting true H₀)
  • Type II (β): False negative (failing to reject false H₀)

P-value

  • Probability that observed data is due to chance
  • If p ≤ α (e.g. 0.05) → reject H₀

Parametric Tests

TestUse Case
Z-testKnown variance, large sample size
T-testUnknown variance
Paired T-testSame subjects before/after
Two-sample T-testCompare two independent groups
ANOVACompare 3+ group means
F-testCompare variances between 2 groups

ANOVA (Analysis of Variance)

Used to test differences across 3+ groups

Types:

  • One-Way ANOVA – one factor
  • Two-Way ANOVA – two factors, also tests interaction

Assumptions:

  • Independence
  • Normality
  • Equal variances

Key Terms:

  • SSB: Variation between groups
  • SSW: Variation within groups
  • F-ratio:
    F = MSB / MSW
    

If F is large, likely that group means differ significantly.


Non-Parametric Tests

Used when assumptions of normality or equal variance don’t hold

TestUsed For
Mann-Whitney UCompare two independent medians
Wilcoxon Signed RankPaired samples without normality
Chi-Square TestRelationship between categorical variables
Kruskal-WallisNon-parametric alternative to ANOVA

Real-Life Examples

  • Body Fat Study: Use 2-sample t-test to compare men vs women
  • Teaching Effectiveness: F-test to compare variance in student scores
  • Diet Study: One-way ANOVA to compare 3 diets’ weight loss
  • HIV Drug Trial: Mann-Whitney test for viral load comparison
  • Learning Preference vs Gender: Chi-square test for association

Practice Problems

  1. Normal range: What is the central 95% range for μ = 80, σ² = 144?
  2. Cholesterol study: Do Asian immigrants differ significantly in cholesterol?
  3. Normality check: Use a sample of weights to test for normal distribution.
  4. CLT demonstration: Use different sample sizes to show CLT in action.

✅ Summary Table: Test Selection

SituationTest to Use
Mean difference, large sample, known σZ-test
Mean difference, small sample, unknown σT-test
Difference between 3+ group meansANOVA
Comparison of variancesF-test
Categorical variable associationChi-square test
Median comparison (2 groups, non-normal)Mann-Whitney U Test
Repeated measurements on same subjectsPaired T-test / Wilcoxon

“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.” – H.G. Wells