# One- and Two-Sample Tests of Hypotheses

## Testing a Statistical Hypothesis

• Type I error: rejection of the null hypothesis when it is true
• Type II error: non-rejection of the null hypothesis when it is false
• Significance level: the probability of committing a type I error, denoted by the Greek letter $\alpha$.
• The probability of committing a type II error, denoted by $\beta$, cannot be computed unless a specific alternative hypothesis is assumed.
• The probabilities of both types of error can be reduced by increasing the sample size.
• $P$-value is the lowest level of significance at which the observed value of the test statistic is significant.

## Single Sample: Tests Concerning a Single Mean

Let two hypotheses be:

• $H_0: \mu = \mu_0$
• $H_1: \mu \ne \mu_0$

### Tests on a Single Mean (Variance Known)

For $\displaystyle z = {\bar x - \mu_0 \over \sigma / \sqrt n}$,

• If $-z_{\alpha/2} < z < z_{\alpha/2}$, do not reject $H_0$.
• Otherwise, reject $H_0$.

Note that $-z_{\alpha/2} \le z \le z_{\alpha/2}$ is equivalent to $\displaystyle \bar x - z_{\alpha/2} {\sigma \over \sqrt n} \le \mu_0 \le \bar x + z_{\alpha/2} {\sigma \over \sqrt n}$.
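This decision rule can be sketched in Python; the sample numbers and the critical value $z_{0.025} = 1.96$ are hypothetical illustrations, not from the text:

```python
import math

# Hypothetical example: H0: mu = 50 vs H1: mu != 50,
# known sigma = 4, sample of n = 16 with sample mean 52.
xbar, mu0, sigma, n = 52.0, 50.0, 4.0, 16
z = (xbar - mu0) / (sigma / math.sqrt(n))   # test statistic

z_crit = 1.96                               # z_{alpha/2} for alpha = 0.05
reject = not (-z_crit < z < z_crit)         # reject H0 outside the interval
```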

### Tests on a Single Mean (Variance Unknown)

For $\displaystyle t = {\bar x - \mu_0 \over s/ \sqrt n}$,

• If $-t_{\alpha/2, n-1} \le t \le t_{\alpha/2, n-1}$, do not reject $H_0$.
• Otherwise, reject $H_0$.
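The same rule with an estimated standard deviation, as a minimal sketch; the data and the table value $t_{0.025, 8} = 2.306$ are assumed for illustration:

```python
import math
import statistics

# Hypothetical sample; H0: mu = 10 vs H1: mu != 10.
data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3, 9.7]
n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)               # sample std. dev. (n - 1 divisor)
t = (xbar - 10.0) / (s / math.sqrt(n))

t_crit = 2.306                           # t_{0.025, 8} from a t-table
reject = not (-t_crit <= t <= t_crit)
```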

## Two Samples: Tests on Two Means

Let two hypotheses be:

• $H_0 : \mu_1 - \mu_2 = d_0$
• $H_1 : \mu_1 - \mu_2 \neq d_0$

### Variance Known

For $\displaystyle z = {(\bar x_1 - \bar x_2) - d_0 \over \sqrt{\sigma_1^2/n_1 + \sigma_2^2 /n_2}}$,

• If $-z_{\alpha/2} < z < z_{\alpha/2}$, do not reject $H_0$.
• Otherwise, reject $H_0$.

### Unknown But Equal Variances

Let $\displaystyle s_p^2 = {s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1) \over n_1 + n_2 - 2}$ be the pooled variance, and $s_p = \sqrt{s_p^2}$.

For $\displaystyle t = {(\bar x_1 - \bar x_2)-d_0 \over s_p \sqrt{1/n_1 + 1/n_2}}$,

• If $-t_{\alpha/2, n_1 + n_2 - 2} < t < t_{\alpha/2, n_1 + n_2 - 2}$, do not reject $H_0$.
• Otherwise, reject $H_0$.
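A pooled-variance sketch with hypothetical summary statistics; note the square root when forming $s_p$:

```python
import math

# Hypothetical summary statistics; H0: mu1 - mu2 = 0 (d0 = 0).
n1, xbar1, s1 = 12, 85.0, 4.0
n2, xbar2, s2 = 10, 81.0, 5.0
d0 = 0.0

# Pooled variance is the weighted average of the two sample variances.
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
sp = math.sqrt(sp2)
t = ((xbar1 - xbar2) - d0) / (sp * math.sqrt(1 / n1 + 1 / n2))
df = n1 + n2 - 2                 # compare t with t_{alpha/2, df}
```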

### Unknown and Unequal Variances

For $\displaystyle t = {(\bar x_1 - \bar x_2)-d_0 \over \sqrt{s_1^2/n_1 + s_2^2 /n_2}}$ and d.f. $\displaystyle v = {(s_1^2 /n_1 + s_2^2 / n_2)^2 \over (s_1^2/n_1)^2/(n_1 - 1) + (s_2^2 /n_2)^2 / (n_2 - 1)}$,

• If $-t_{\alpha/2, v} < t < t_{\alpha/2, v}$, do not reject $H_0$.
• Otherwise, reject $H_0$.
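A Welch-type sketch with hypothetical summary statistics, including the approximate degrees of freedom $v$:

```python
import math

# Hypothetical summary statistics; variances not assumed equal.
n1, xbar1, s1 = 15, 3.84, 3.07
n2, xbar2, s2 = 12, 1.49, 0.80
d0 = 0.0

v1, v2 = s1**2 / n1, s2**2 / n2
t = ((xbar1 - xbar2) - d0) / math.sqrt(v1 + v2)

# Welch-Satterthwaite approximate degrees of freedom (usually rounded down)
v = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
```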

### Paired Observations

Let $\bar D$ be the sample mean and $S_d$ the sample standard deviation of the differences between paired observations, and let $\mu_d$ be the hypothesized mean difference.

For $\displaystyle t = {\bar D - \mu_d \over S_d / \sqrt n}$,

• If $-t_{\alpha/2, n-1} < t < t_{\alpha/2, n-1}$, do not reject $H_0$.
• Otherwise, reject $H_0$.
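A paired-observations sketch; the before/after data are hypothetical:

```python
import math
import statistics

# Hypothetical paired (before/after) measurements; H0: mu_d = 0.
before = [72, 75, 80, 68, 74, 79]
after = [70, 74, 76, 69, 71, 75]
d = [b - a for b, a in zip(before, after)]

n = len(d)
dbar = statistics.mean(d)
sd = statistics.stdev(d)                 # std. dev. of the differences
t = (dbar - 0.0) / (sd / math.sqrt(n))   # compare with t_{alpha/2, n-1}
```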

## Choice of Sample Size for Testing Means

### Using $z$-value

• $H_0 : \mu = \mu_0$
• Alternative: $\mu = \mu_0 + \delta$

Note that $1 - \beta$ is the power of the test.

• One-tailed test: $\displaystyle n = {(z_{\alpha} + z_{\beta})^2 \sigma^2 \over \delta^2}$
• Two-tailed test: $\displaystyle n = {(z_{\alpha/2}+z_{\beta})^2 \sigma^2 \over \delta^2}$
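A sample-size sketch for the one-tailed case; the design values and the normal-table quantiles $z_{0.05} = 1.645$, $z_{0.10} = 1.282$ are assumed for illustration:

```python
import math

# Hypothetical design: one-tailed test, alpha = 0.05, power 0.90
# (beta = 0.10), sigma = 5, shift to detect delta = 2.
z_alpha, z_beta = 1.645, 1.282   # standard normal quantiles (from a table)
sigma, delta = 5.0, 2.0

n = (z_alpha + z_beta)**2 * sigma**2 / delta**2
n_required = math.ceil(n)        # round up so the power is at least 0.90
```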

### Two-Sample Case

Suppose $\sigma_1$ and $\sigma_2$ are known.

• One-tailed test: $\displaystyle n = {(z_\alpha + z_\beta)^2 (\sigma_1^2 + \sigma_2^2) \over \delta^2}$
• Two-tailed test: $\displaystyle n = {(z_{\alpha/2} + z_{\beta})^2 (\sigma_1^2 + \sigma_2^2) \over \delta^2}$

Suppose the variances are unknown.

• Sample-size determination then relies on the non-central $t$-distribution (in practice, power charts)
• The charts are entered with $\displaystyle \Delta = {|\delta| \over \sigma}$

## One Sample: Test on a Single Proportion

### Small Samples

• $H_0 : p = p_0$, $H_1 : p < p_0, p > p_0$, or $p \neq p_0$
• Choose $\alpha$, a level of significance
• Test statistic: binomial variable $X$ with $p = p_0$
• Computations: find $x$, the number of successes, and compute the appropriate $P$-value
• Make conclusions based on the $P$-value

If we use the normal approximation, the $z$-value for testing $p = p_0$ is $\displaystyle z = {\hat p - p_0 \over \sqrt{p_0q_0/n}}$, where $\hat p = x/n$ and $q_0 = 1 - p_0$.
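The exact small-sample procedure can be sketched by summing binomial tail probabilities; the counts and hypotheses here are hypothetical:

```python
from math import comb

# Hypothetical small-sample test: H0: p = 0.4 vs H1: p < 0.4,
# x = 3 successes in n = 15 trials.  P-value = P(X <= 3 | p = 0.4).
n, x, p0 = 15, 3, 0.4

p_value = sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x + 1))
reject = p_value < 0.05          # at significance level alpha = 0.05
```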

## Two Samples: Tests on Two Proportions

• $H_0: p_1 = p_2$
• pooled estimate of the proportion $\displaystyle \hat p = {x_1 + x_2 \over n_1 + n_2}$
• $\displaystyle z = {\hat p_1 - \hat p_2 \over \sqrt{\hat p \hat q (1 / n_1 + 1/n_2)}}$
• Two-tailed test: do not reject $H_0$ if $-z_{\alpha/2} < z < z_{\alpha/2}$
• One-tailed tests: do not reject $H_0$ if $z < z_{\alpha}$ (for $H_1: p_1 > p_2$) or if $z > -z_{\alpha}$ (for $H_1: p_1 < p_2$)
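A sketch of the two-proportion test with hypothetical counts:

```python
import math

# Hypothetical counts; H0: p1 = p2, two-tailed alternative.
x1, n1 = 120, 200
x2, n2 = 240, 500

p1_hat, p2_hat = x1 / n1, x2 / n2
p_hat = (x1 + x2) / (n1 + n2)            # pooled estimate of the proportion
q_hat = 1 - p_hat

z = (p1_hat - p2_hat) / math.sqrt(p_hat * q_hat * (1 / n1 + 1 / n2))
reject = abs(z) >= 1.96                  # z_{alpha/2} at alpha = 0.05
```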

## Concerning Variances

### Single Sample

Suppose we test $\sigma^2 = \sigma_0^2$.

• $\displaystyle \chi^2 = {(n-1)s^2 \over \sigma_0^2}$
• We do not reject $H_0$ if $\chi_{1-\alpha/2}^2 < \chi^2 < \chi_{\alpha/2}^2$
• For testing against $\sigma^2 < \sigma_0^2$, we do not reject if $\chi^2 > \chi_{1-\alpha}^2$
• For testing against $\sigma^2 > \sigma_0^2$, we do not reject if $\chi^2 < \chi_\alpha^2$
• All critical values have $n - 1$ degrees of freedom
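A sketch of the upper-tailed variance test; the data summary and the table value $\chi^2_{0.05, 19} = 30.144$ are assumed for illustration:

```python
# Hypothetical one-tailed test: H0: sigma^2 = 1.0 vs H1: sigma^2 > 1.0,
# n = 20, sample variance s^2 = 1.7.
n, s2, sigma0_2 = 20, 1.7, 1.0

chi2 = (n - 1) * s2 / sigma0_2   # 19 * 1.7 = 32.3
reject = chi2 > 30.144           # chi^2_{0.05, 19} from a chi-square table
```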

### Two Samples

Suppose we test $\sigma_1^2 = \sigma_2^2$

• $\displaystyle f = {s_1^2 \over s_2^2}$
• Then, under $H_0$, $f$ follows an $F$-distribution with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom
• We do not reject if $f_{1-\alpha/2}(v_1, v_2) < f < f_{\alpha/2}(v_1, v_2)$
• For $H_1: \sigma_1^2 < \sigma_2^2$, we do not reject if $f > f_{1-\alpha}(v_1, v_2)$
• For $H_1: \sigma_1^2 > \sigma_2^2$, we do not reject if $f < f_{\alpha}(v_1, v_2)$
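A two-tailed sketch with hypothetical sample variances; the table values $f_{0.025}(15, 20) = 2.57$ and $f_{0.025}(20, 15) = 2.76$ are assumed, using the identity $f_{1-\alpha/2}(v_1, v_2) = 1 / f_{\alpha/2}(v_2, v_1)$:

```python
# Hypothetical test of H0: sigma1^2 = sigma2^2 with n1 = 16, n2 = 21.
s1_sq, s2_sq = 8.0, 3.2

f = s1_sq / s2_sq                # F statistic with (15, 20) d.f.
f_upper = 2.57                   # f_{alpha/2}(v1, v2), assumed table value
f_lower = 1 / 2.76               # 1 / f_{alpha/2}(v2, v1)
reject = not (f_lower < f < f_upper)
```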

## Goodness of Fit Test

We want to test whether a sample follows a specific distribution.

Example: is a die balanced?

• $\displaystyle \chi^2 = \sum_{i=1}^k {(o_i - e_i)^2 \over e_i}$ where
• $o_i$: observed frequencies
• $e_i$: expected frequencies
• Reject at level $\alpha$ if $\chi^2 > \chi_\alpha^2$ with $k - 1$ degrees of freedom (one fewer for each parameter estimated from the data)
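The die example can be sketched directly; the roll counts and the table value $\chi^2_{0.05, 5} = 11.070$ are hypothetical illustrations:

```python
# Hypothetical die example: 120 rolls, so each face is expected 20 times.
observed = [20, 22, 17, 18, 19, 24]
expected = [120 / 6] * 6

chi2 = sum((o - e)**2 / e for o, e in zip(observed, expected))
reject = chi2 > 11.070           # chi^2_{0.05, 5}; k - 1 = 5 d.f.
```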

The same test can also be used to check a normality assumption by binning the data.

Note: Geary's test

• Suppose $X_1, \cdots, X_n$ is a sample from $N(\mu, \sigma^2)$
• Consider: $\displaystyle U = {\sqrt{\pi/2}\, \sum |X_i - \bar X| / n \over \sqrt{\sum(X_i - \bar X)^2/n}}$
• $\displaystyle z = {U - 1 \over 0.2661 / \sqrt n}$
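Geary's statistic can be sketched as follows; the sample is hypothetical, and under normality $U$ should be near 1:

```python
import math

# Hypothetical sample; z is approximately standard normal under H0,
# with standard error 0.2661 / sqrt(n).
xs = [4.9, 5.1, 5.0, 4.8, 5.2, 5.0, 4.7, 5.3]
n = len(xs)
xbar = sum(xs) / n

mad = sum(abs(v - xbar) for v in xs) / n             # mean absolute deviation
rms = math.sqrt(sum((v - xbar)**2 for v in xs) / n)  # RMS deviation
U = math.sqrt(math.pi / 2) * mad / rms
z = (U - 1) / (0.2661 / math.sqrt(n))
```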

## Test for Independence (Categorical Data)

To test for independence, the expected frequency of a cell is obtained as (row total $\times$ column total) / grand total.

• First, we calculate $\displaystyle \chi^2 = \sum_{i} {(o_i - e_i)^2 \over e_i}$ over all cells
• Then, reject independence at level $\alpha$ if $\chi^2 > \chi_{\alpha}^2$ with $v = (r-1)(c-1)$ d.f.
• Note that if $v = 1$, we should apply the continuity correction $\displaystyle \chi^2 = \sum_i {(|o_i - e_i| - 0.5)^2 \over e_i}$ instead.
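The expected-frequency computation can be sketched on a hypothetical $2 \times 3$ table; the critical value $\chi^2_{0.05, 2} = 5.991$ is a standard table value:

```python
# Hypothetical 2x3 contingency table; e_ij = row_i * col_j / grand total.
table = [[30, 20, 10],
         [20, 30, 40]]

row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]
grand = sum(row_tot)

chi2 = sum((table[i][j] - row_tot[i] * col_tot[j] / grand)**2
           / (row_tot[i] * col_tot[j] / grand)
           for i in range(len(row_tot))
           for j in range(len(col_tot)))
df = (len(row_tot) - 1) * (len(col_tot) - 1)   # (r - 1)(c - 1)
reject = chi2 > 5.991                           # chi^2_{0.05, 2}
```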

## Test for Homogeneity

We want to test whether each row (population) is homogeneous with respect to the column categories, i.e., whether the proportions are the same in every row.

• Compute $\chi^2$ as in the independence test and reject homogeneity if $\chi^2 > \chi_\alpha^2$ with $(r-1)(c-1)$ d.f.

### Testing for Several Proportions

Treat each sample as binomial and apply the homogeneity test to $H_0: p_1 = p_2 = \cdots = p_k$.
