WELCOME

Welcome to workshop number 4: More about inferential statistics with R.

Learning outcomes:

By the end of this assignment, you should be able to:

• Perform a t-test for paired data with t.test()

• Conduct and interpret a one-way ANOVA test

Paired t-test

As you already know, t-tests are a good way of testing whether two means are statistically different. This can be done by comparing a sample mean to a population value (one-sample) or by comparing the means of two different samples (two-sample).

Two-sample t-tests are further broken down into two categories: unpaired t-tests and paired t-tests. This demo will focus on the latter, where each observation in one group is matched with an observation in the other group (for example, the same patient measured twice).

Import Dataset:

To illustrate the paired t-test, we will use the Student’s Sleep Data.

Description: Data which show the effect of two soporific drugs (increase in hours of sleep compared to control) on 10 patients.

Format: A data frame with 20 observations on 3 variables.

  • [, 1]  extra  numeric  increase in hours of sleep
  • [, 2]  group  factor   drug given
  • [, 3]  ID     factor   patient ID

The sleep data set is built into R, so no file import is needed. Run the following R commands to copy it into your workspace and print it:

# import R data set 'sleep'
sleep <- sleep
sleep
##    extra group ID
## 1    0.7     1  1
## 2   -1.6     1  2
## 3   -0.2     1  3
## 4   -1.2     1  4
## 5   -0.1     1  5
## 6    3.4     1  6
## 7    3.7     1  7
## 8    0.8     1  8
## 9    0.0     1  9
## 10   2.0     1 10
## 11   1.9     2  1
## 12   0.8     2  2
## 13   1.1     2  3
## 14   0.1     2  4
## 15  -0.1     2  5
## 16   4.4     2  6
## 17   5.5     2  7
## 18   1.6     2  8
## 19   4.6     2  9
## 20   3.4     2 10

Visualisation

boxplot(sleep$extra ~ sleep$group,
        col = c("red", "blue"),
        ylab = 'extra sleep',
        xlab = 'groups',
        main = 'formula = extra ~ groups')

DEMO 1:

Q1. How would you create the same box plot using the ggplot2 library? How would you fix this error?
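One possible answer to the first part of Q1, as a minimal sketch. It assumes the ggplot2 package is installed; the object name p is only illustrative. The error mentioned in the question is not reproduced here. A common pitfall (an assumption about what was intended) is a grouping column that is numeric rather than a factor, which would need factor(); in sleep, group is already a factor.

```r
# Assumes ggplot2 is installed: install.packages("ggplot2")
library(ggplot2)

# Same plot as the base R boxplot above: extra sleep by drug group
p <- ggplot(sleep, aes(x = group, y = extra, fill = group)) +
  geom_boxplot() +
  scale_fill_manual(values = c("red", "blue")) +   # same colours as before
  labs(x = "groups", y = "extra sleep",
       title = "formula = extra ~ groups") +
  theme(legend.position = "none")                  # legend repeats the x axis

p
```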

Q2. Research question: Is there a statistically significant effect of the drug on sleep hours?

## 
##  Paired t-test
## 
## data:  sleep$extra[sleep$group == 1] and sleep$extra[sleep$group == 2]
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.4598858 -0.7001142
## sample estimates:
## mean of the differences 
##                   -1.58
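The output above can be reproduced with a call along the following lines. This is a sketch based on the "data:" line of the output; the object names g1, g2 and res are illustrative. The last lines show that a paired t-test is the same as a one-sample t-test on the per-patient differences.

```r
# Extra sleep under each drug; rows are ordered by patient ID within group
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]

# Paired, two-sided t-test (two-sided is the default alternative)
res <- t.test(g1, g2, paired = TRUE)
res

# Equivalent view: one-sample t-test on the paired differences
d <- g1 - g2
mean(d)                 # the "mean of the differences" in the output
t.test(d, mu = 0)
```

Since the p-value is below 0.05, we reject the null hypothesis that the mean difference is zero.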

Q3. What if we want to test the hypothesis that the mean of group 1 is less than the mean of group 2? Which argument of the t.test() function do you need to set?

## 
##  Paired t-test
## 
## data:  sleep$extra[sleep$group == 1] and sleep$extra[sleep$group == 2]
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.4598858 -0.7001142
## sample estimates:
## mean of the differences 
##                   -1.58
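A sketch of the one-sided version of the test, using the alternative argument of t.test(). The object name res_less is illustrative. With alternative = "less", the alternative hypothesis is that the mean of the first vector is smaller than the mean of the second, so the reported alternative becomes one-sided and the confidence interval has no lower bound.

```r
# One-sided paired t-test: is mean(group 1) < mean(group 2)?
res_less <- t.test(sleep$extra[sleep$group == 1],
                   sleep$extra[sleep$group == 2],
                   paired = TRUE,
                   alternative = "less")
res_less

# The t statistic is unchanged; because it is negative, the one-sided
# p-value is half of the two-sided p-value
res_less$p.value
```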

Analysis of Variance (ANOVA)

What is the one-way ANOVA test?

The one-way analysis of variance (ANOVA), also known as one-factor ANOVA, is a statistical technique commonly used to compare the means of two or more groups. In one-way ANOVA, the data are organized into several groups based on a single grouping variable (also called a factor variable). This section describes the basic principle of the one-way ANOVA test and provides practical ANOVA examples in R.

ANOVA test hypotheses:

  • \(H_0\) (null hypothesis): the means of the different groups are the same, \(\mu_1 = \mu_2 = \dots = \mu_k\).

  • \(H_a\) (alternative hypothesis): at least one group mean is not equal to the others.

General steps to conduct an ANOVA

Here are the 2 steps you should follow to conduct a standard ANOVA in R:

  1. Create an ANOVA object using the aov() function. In the aov() function, specify the dependent and independent variable(s) with a formula of the form y ~ x1, where y is the dependent variable and x1 is one (or more) factor independent variable(s).
# Step 1: Create an object (you can call it mod.aov)
mod.aov <- aov(formula = y ~ x1,
               data = data)
  2. Create a summary ANOVA table by applying the summary() function to the ANOVA object you created in Step 1.
# Step 2: Look at a summary of the aov object
summary(mod.aov)
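The two steps can be tried on a small simulated data frame before moving to real data. This is only a sketch: the data frame toy, its column values and the group labels "a", "b", "c" are made up for illustration.

```r
# Hypothetical toy data: three made-up groups of 10 measurements each
set.seed(42)
toy <- data.frame(
  y  = c(rnorm(10, mean = 5), rnorm(10, mean = 5.5), rnorm(10, mean = 6)),
  x1 = factor(rep(c("a", "b", "c"), each = 10))
)

# Step 1: create the ANOVA object
toy.aov <- aov(formula = y ~ x1, data = toy)

# Step 2: look at the ANOVA table (Df, Sum Sq, Mean Sq, F value, Pr(>F))
summary(toy.aov)
```

With 3 groups and 30 observations, the table has 2 degrees of freedom for x1 and 27 for the residuals.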

Import dataset

Here, we’ll use the built-in R data set named PlantGrowth. It contains the weight of plants obtained under a control and two different treatment conditions.

plants <- PlantGrowth # copy the built-in PlantGrowth data set
# show the levels (categories) of the grouping factor
levels(plants$group)
## [1] "ctrl" "trt1" "trt2"

Check your data

To get an idea of what the data look like, we use the function head(). The head() function prints the first few rows of the data frame (six by default):

head(plants)
##   weight group
## 1   4.17  ctrl
## 2   5.58  ctrl
## 3   5.18  ctrl
## 4   6.11  ctrl
## 5   4.50  ctrl
## 6   4.61  ctrl

Note: In R terminology, the column “group” is called a factor and the different categories (“ctrl”, “trt1”, “trt2”) are named factor levels. The levels are ordered alphabetically.

# Show the levels
levels(plants$group)
## [1] "ctrl" "trt1" "trt2"

DEMO 2:

Q1: Create summary statistics by group. Why does this error appear, and how would you solve it?

  • Visualise the data with boxplot

  • Compute one-way ANOVA test
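A minimal base R sketch for the summary statistics and the boxplot. The demo itself may use a different approach (for example a package such as dplyr), so treat the choice of tapply() and the object names here as assumptions.

```r
plants <- PlantGrowth

# Mean and standard deviation of weight within each group
group_means <- tapply(plants$weight, plants$group, mean)
group_sds   <- tapply(plants$weight, plants$group, sd)
group_stats <- data.frame(mean = group_means, sd = group_sds)
group_stats

# Boxplot of weight by experimental condition
boxplot(weight ~ group, data = plants,
        xlab = "group", ylab = "weight",
        main = "PlantGrowth: weight by group")
```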

Q2: Is there any significant difference between the average weights of plants in the 3 experimental conditions?

To see the full ANOVA table, apply the summary() function to the ANOVA object from Step 1.
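Applied to the PlantGrowth data, the two steps can be sketched as follows. The object names plants.aov and plants.summary are illustrative.

```r
plants <- PlantGrowth

# Step 1: one-way ANOVA of weight on the grouping factor
plants.aov <- aov(weight ~ group, data = plants)

# Step 2: ANOVA table
plants.summary <- summary(plants.aov)
plants.summary

# Extract the p-value of the F test for 'group'
p_value <- plants.summary[[1]][["Pr(>F)"]][1]
p_value
```

The F test for group gives a p-value of about 0.016, which is below 0.05, so we reject the null hypothesis that all three group means are equal. The test does not say which groups differ from each other.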


This work is licensed under the CC BY-NC 4.0 Creative Commons License.