Welcome to workshop number 4: More about inferential statistics with R.
Learning outcomes:
By the end of this assignment, you should be able to:
• Perform a t-test for paired data with t.test()
• Conduct and interpret a one-way ANOVA test
As you already know, t-tests are a good way to check whether two means differ significantly. The comparison can be between a sample mean and a known or hypothesised population mean (one-sample) or between the means of two samples (two-sample).
T-tests are further broken down into two categories: unpaired t-tests and paired t-tests. This demo will focus on the latter.
To illustrate the paired t-test, we will use the Student’s Sleep Data.
Description: Data which show the effect of two soporific drugs (increase in hours of sleep compared to control) on 10 patients.
Format: A data frame with 20 observations on 3 variables.
The sleep data set is built into R, so no import step is needed. Please run the following R commands to copy it into your workspace and print it:
# import R data set 'sleep'
sleep <- sleep
sleep
## extra group ID
## 1 0.7 1 1
## 2 -1.6 1 2
## 3 -0.2 1 3
## 4 -1.2 1 4
## 5 -0.1 1 5
## 6 3.4 1 6
## 7 3.7 1 7
## 8 0.8 1 8
## 9 0.0 1 9
## 10 2.0 1 10
## 11 1.9 2 1
## 12 0.8 2 2
## 13 1.1 2 3
## 14 0.1 2 4
## 15 -0.1 2 5
## 16 4.4 2 6
## 17 5.5 2 7
## 18 1.6 2 8
## 19 4.6 2 9
## 20 3.4 2 10
Visualisation
boxplot(sleep$extra ~ sleep$group,
col = c("red", "blue"),
ylab = 'extra sleep',
xlab = 'groups',
main = 'formula = extra ~ groups')
Q1. How would you create the same box plot using the ggplot2 library? How would you fix this error?
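One possible answer to the first part of Q1 is sketched below. It assumes the ggplot2 package is installed; the object name p and the choice to hide the legend are illustrative, not part of the original exercise.

```r
library(ggplot2)  # assumption: ggplot2 is installed

# 'group' is already a factor in the sleep data, so it can be mapped to x directly
p <- ggplot(sleep, aes(x = group, y = extra, fill = group)) +
  geom_boxplot() +
  scale_fill_manual(values = c("red", "blue")) +   # same colours as the base R plot
  labs(x = "groups", y = "extra sleep",
       title = "formula = extra ~ group") +
  theme(legend.position = "none")                  # the x axis already labels the groups

print(p)
```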
Q2. Research question: Is there a statistically significant effect of the drug on sleep hours?
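A call along the following lines reproduces the paired t-test output reported for this question. The object name paired_res is only illustrative; the two vectors match the "data:" line of the output.

```r
# Paired t-test: every patient (ID 1 to 10) has one measurement under each drug,
# and the rows of 'sleep' are ordered by ID within each group, so the vectors line up.
paired_res <- t.test(sleep$extra[sleep$group == 1],
                     sleep$extra[sleep$group == 2],
                     paired = TRUE)
paired_res
```

Since the p-value is below 0.05, the null hypothesis of no difference between the two drugs would be rejected at the 5% level.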
##
## Paired t-test
##
## data: sleep$extra[sleep$group == 1] and sleep$extra[sleep$group == 2]
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.4598858 -0.7001142
## sample estimates:
## mean of the differences
## -1.58
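To see what "paired" means in practice, note that a paired t-test is equivalent to a one-sample t-test on the within-patient differences. The short sketch below (the variable names diffs and diff_res are illustrative) checks this on the sleep data.

```r
# Difference in extra sleep for each patient: drug 1 minus drug 2
diffs <- sleep$extra[sleep$group == 1] - sleep$extra[sleep$group == 2]

mean(diffs)                       # matches the "mean of the differences" in the output

# One-sample t-test of the differences against 0
diff_res <- t.test(diffs, mu = 0)
diff_res
```

The t statistic, degrees of freedom and p-value agree with the paired test, because both tests operate on the same 10 differences.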
Q3. What if we want to test the hypothesis that the mean of group 1 is less than the mean of group 2? Which argument of the t.test() function do you need to set?
##
## Paired t-test
##
## data: sleep$extra[sleep$group == 1] and sleep$extra[sleep$group == 2]
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.4598858 -0.7001142
## sample estimates:
## mean of the differences
## -1.58
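For Q3, one option is the alternative argument of t.test(), which accepts "two.sided" (the default), "less" or "greater". A minimal sketch, with less_res as an illustrative object name:

```r
# One-sided paired t-test: is the mean of group 1 less than the mean of group 2?
less_res <- t.test(sleep$extra[sleep$group == 1],
                   sleep$extra[sleep$group == 2],
                   paired = TRUE,
                   alternative = "less")
less_res
```

With alternative = "less" the output reports the alternative hypothesis as "less than 0", and because the observed t statistic is negative, the one-sided p-value is half the two-sided one (about 0.0014).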
The one-way analysis of variance (ANOVA), also known as one-factor ANOVA, is a statistical technique commonly used to compare the means of two or more groups. In one-way ANOVA, the data are organised into several groups based on a single grouping variable (also called a factor variable). This section describes the basic principle of the one-way ANOVA test and provides practical ANOVA examples in R.
\(H_0\) (null hypothesis): the means of the different groups are the same.
\(H_a\) (alternative hypothesis): at least one group mean is not equal to the others.
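For reference, the statistic used to decide between these hypotheses is the F ratio of between-group to within-group variability. The symbols \(k\) (number of groups) and \(N\) (total number of observations) are introduced here for illustration and do not appear elsewhere in this handout.

```latex
F = \frac{\mathrm{SS}_{\text{between}} / (k - 1)}{\mathrm{SS}_{\text{within}} / (N - k)}
```

Under \(H_0\), this ratio follows an F distribution with \(k - 1\) and \(N - k\) degrees of freedom, so a large value of F is evidence against equal group means.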
Here are the 2 steps you should follow to conduct a standard ANOVA in R:
Step 1. Create an ANOVA object with the aov() function. In the aov() function, specify the dependent and independent variable(s) with a formula of the form y ~ x1, where y is the dependent variable and x1 is the independent factor variable (for one-way ANOVA there is exactly one).
# Step 1: Create an object (you can call it mod.aov)
mod.aov <- aov(formula = y ~ x1,
               data = data)
Step 2. Apply the summary() function to the ANOVA object you created in Step 1.
# Step 2: Look at a summary of the aov object
summary(mod.aov)
Here, we’ll use the built-in R data set named PlantGrowth. It contains the weight of plants obtained under a control and two different treatment conditions.
plants <- PlantGrowth # copy the built-in PlantGrowth data set
# show the levels (categories) of the grouping variable
levels(plants$group)
## [1] "ctrl" "trt1" "trt2"
To get an idea of what the data look like, we use the function head(). By default, head() prints the first six rows of the data frame:
head(plants)
## weight group
## 1 4.17 ctrl
## 2 5.58 ctrl
## 3 5.18 ctrl
## 4 6.11 ctrl
## 5 4.50 ctrl
## 6 4.61 ctrl
Note: In R terminology, the column “group” is called a factor and its different categories (“ctrl”, “trt1”, “trt2”) are called factor levels. By default, the levels are ordered alphabetically.
# Show the levels
levels(plants$group)
## [1] "ctrl" "trt1" "trt2"
Q1: Create summary statistics by group. Why does this error appear, and how would you solve it?
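One way to compute per-group summary statistics with base R alone is sketched below. The choice of tapply() and the object names are assumptions made for illustration; packages such as dplyr offer alternatives.

```r
plants <- PlantGrowth

# Apply a summary function to 'weight' separately within each level of 'group'
group_n     <- tapply(plants$weight, plants$group, length)
group_means <- tapply(plants$weight, plants$group, mean)
group_sds   <- tapply(plants$weight, plants$group, sd)

# Collect the results in one table, one row per group
group_stats <- data.frame(n = group_n, mean = group_means, sd = group_sds)
group_stats
```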
Visualise the data with boxplot
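A box plot of weight by group can be drawn in the same way as for the sleep data. The colours and title below are illustrative choices.

```r
plants <- PlantGrowth

# One box per treatment group; the formula reads "weight explained by group"
bp <- boxplot(weight ~ group, data = plants,
              col = c("grey", "red", "blue"),
              xlab = "group",
              ylab = "weight",
              main = "Plant weight by treatment group")
```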
Compute one-way ANOVA test
Q2: We want to know if there is any significant difference between the average weights of plants in the 3 experimental conditions
Now, to see a full ANOVA summary table, apply the summary() function to the ANOVA object created in Step 1.
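Applied to the PlantGrowth data, the two steps might look like the sketch below. The object name mod.aov follows the template given earlier.

```r
plants <- PlantGrowth

# Step 1: fit the one-way ANOVA, with weight as the response and group as the factor
mod.aov <- aov(weight ~ group, data = plants)

# Step 2: print the ANOVA table (Df, Sum Sq, Mean Sq, F value, Pr(>F))
summary(mod.aov)
```

The table reports an F value of about 4.85 on 2 and 27 degrees of freedom with a p-value of about 0.016, so at the 5% level the null hypothesis of equal mean weights is rejected. The ANOVA itself does not say which groups differ.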
This work is licensed under the CC BY-NC 4.0 Creative Commons License.