Get the metaphor data.csv file from https://osf.io/qrc6b/ and create an object named metaphor, which is the result of calling read_csv() on metaphor data.csv.
library(tidyverse)
metaphor <- read_csv('https://www.stephenskalicky.com/r_data/metaphor_data.csv')
## Rows: 1304 Columns: 28
## ── Column specification ────────────────────────────────────────────
## Delimiter: ","
## chr (6): metaphor_id, response, met_type, sex, hand, language_group
## dbl (22): subject, conceptual, nm, trial_order, met_stim, met_RT, age, colle...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Create a new object named met.small from metaphor. Using dplyr::select(), choose the following columns: subject, met_type, met_RT, conceptual, nm, and NFC.
met.small <- metaphor %>%
  dplyr::select(subject, met_type, met_RT, conceptual, nm, NFC)
Next, mutate the RT (the time spent writing the metaphor) into seconds. The current measurement is in milliseconds and we want seconds, so we need to divide met_RT by 1000. Using mutate, create a new variable in met.small named RT, which is the result of dividing met_RT by 1000. (Note that I am going to extend the pipe from the original creation of met.small each time.)
met.small <- metaphor %>%
  dplyr::select(subject, met_type, met_RT, conceptual, nm, NFC) %>%
  mutate(RT = met_RT/1000)
Next, we need to remove outliers. We will define an outlier as a response whose writing time is 2.5 or more standard deviations above the mean writing time. We will use z-scores to help us with this (don’t worry if you do not know what that is).
Using mutate, create a new variable in met.small named zRT, which is the result of calling the function scale on met_RT.
met.small <- metaphor %>%
  dplyr::select(subject, met_type, met_RT, conceptual, nm, NFC) %>%
  mutate(RT = met_RT/1000) %>%
  mutate(zRT = scale(met_RT))
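If z-scores are new to you, the following minimal sketch may help. It uses a small made-up vector of writing times (the values and the names rt, z_manual, and z_scale are assumptions for illustration, not part of the metaphor data) to show that scale() subtracts the mean and divides by the standard deviation.

```r
# Hypothetical writing times in milliseconds (not from the metaphor data)
rt <- c(900, 1000, 1100, 1200, 4800)

# A z-score is the distance from the mean, measured in standard deviations
z_manual <- (rt - mean(rt)) / sd(rt)

# scale() performs the same centering and scaling by default;
# as.numeric() turns its result back into a plain numeric vector
z_scale <- as.numeric(scale(rt))

# The two approaches should agree
all.equal(z_manual, z_scale)
```

A z-score of 2.5 therefore means a value that sits 2.5 standard deviations above the mean of the column.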
Add another mutate call which creates a new variable named outliers. The value of outliers will be 1 if zRT is greater than or equal to 2.5; otherwise it will be 0. To do this, we can use the if_else function in our mutate call (we could also use the case_when function).
The basic syntax for if_else is if_else(condition, A, B): if condition is TRUE, do A; otherwise, do B. You can write your mutate call like this:
mutate(outliers = if_else(condition, A, B))
It is up to you to write the correct values for condition, A, and B.
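To see the if_else pattern on something unrelated to the exercise, here is a small sketch using a made-up tibble. The object scores and its columns are hypothetical names chosen for illustration, and the case_when line shows one possible equivalent.

```r
library(dplyr)

# Hypothetical toy data, not part of the metaphor dataset
scores <- tibble(score = c(2, 5, 8))

scores <- scores %>%
  # if_else(condition, A, B): "high" when the condition is TRUE, "low" otherwise
  mutate(size = if_else(score > 4, "high", "low")) %>%
  # case_when checks conditions in order; TRUE acts as the fallback case
  mutate(size2 = case_when(score > 4 ~ "high", TRUE ~ "low"))

scores
```

Both new columns should hold the same values, so which function to use is mostly a matter of taste when there are only two outcomes.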
Below is the final pipe with all the previous commands in one pipe. This is again why pipes are cool: you can add each line, step by step, as part of your data cleaning / wrangling process. You could easily put all the mutate functions into one call to mutate(), but this method has the advantage of being a bit easier to read and of showing how the steps link to one another.
met.small <- metaphor %>%
  dplyr::select(subject, met_type, met_RT, conceptual, nm, NFC) %>%
  mutate(RT = met_RT/1000) %>%
  mutate(zRT = scale(met_RT)) %>%
  # this solution uses base R's ifelse(), which takes the same (condition, A, B) arguments
  mutate(outliers = ifelse(zRT >= 2.5, 1, 0))
How many outliers are in met.small?
sum(met.small$outliers)
## [1] 34
Create a new object named met.trim, which is the result of removing the outliers from met.small. Use the filter() function.
met.trim <- met.small %>%
  dplyr::filter(outliers == 0)
## Warning: Using one column matrices in `filter()` was deprecated in dplyr
## 1.1.0.
## ℹ Please use one dimensional logical vectors instead.
## ℹ The deprecated feature was likely used in the dplyr package.
## Please report the issue at
## <https://github.com/tidyverse/dplyr/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this
## warning was generated.
# sanity check
sum(met.trim$outliers)
## [1] 0
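The deprecation warning above most likely appears because scale() returns a one-column matrix rather than a plain vector, so zRT and outliers carry matrix dimensions into filter(). The sketch below, which uses made-up data (toy and toy.trim are hypothetical names), shows one way to avoid this by wrapping scale() in as.numeric(). It is an optional variation, not the solution used in this exercise.

```r
library(dplyr)

# Hypothetical toy data standing in for met.small
toy <- tibble(met_RT = c(900, 950, 1000, 1000, 1050, 1100, 1100, 1150, 1200, 9000))

# scale() on its own returns a matrix with one column
is.matrix(scale(toy$met_RT))

toy <- toy %>%
  # as.numeric() drops the matrix dimensions, leaving a plain numeric vector
  mutate(zRT = as.numeric(scale(met_RT))) %>%
  mutate(outliers = if_else(zRT >= 2.5, 1, 0))

# filter() now receives an ordinary logical vector
toy.trim <- toy %>%
  filter(outliers == 0)

# sanity check
sum(toy.trim$outliers)
```

With zRT stored as a plain vector, the same filter(outliers == 0) step should run without the one-column matrix warning.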
Create an object named met.violin, which is the result of calling the ggplot() function. Inside the ggplot() call, set the data argument to met.trim, and make an aes() call with the correct x and y axes in order to replicate the chart above. View your plot by running the name of the object (i.e., run met.violin by itself). Compare your figure to the one above: is it correct? What else are we missing?
met.violin <- ggplot(data = met.trim, aes(x = met_type, y = RT)) +
  geom_violin()
met.violin
After geom_violin, add a new line with only the function theme_base(). You will need the package ggthemes to do this.
Inside the geom_violin() function, add aes(fill = met_type).
Use the + sign to add a new function to your ggplot, which is scale_x_discrete(). This function can rename the values on your x axis using the labels argument. Use the template scale_x_discrete(labels = c()) to rename the values of met_type to match the figure above.
Use the + sign to add a new function to your ggplot, which is labs. This function will rename your axes and figure. Use x =, y =, and title = inside labs to create new labels. We want NO label for the x axis; what can you type to make that happen?
Remove the legend with theme(legend.position = 'none').
Add a geom_jitter() call. Using this information, what do violin plots show us? How can you interpret this data?
library(ggthemes)
# with a final geom_jitter() - the original figure does not have these points though.
met.violin <- ggplot(data = met.trim, aes(x = met_type, y = RT)) +
  geom_violin(aes(fill = met_type)) +
  theme_base() +
  scale_x_discrete(labels = c('Conventional', 'Novel')) +
  labs(y = 'Production Time (seconds)', x = '', title = 'Metaphor Production Times') +
  theme(legend.position = 'none') +
  # alpha is a fixed value, so it is set outside aes()
  geom_jitter(alpha = .5)
met.violin
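One thing to keep in mind with scale_x_discrete(labels = c()) is that unnamed labels are applied in the order of the values on the axis, so it is easy to swap them by accident. The sketch below uses made-up data in which the raw values of met_type are assumed to be 'conv' and 'nov' (the real values in the metaphor data may differ) and passes a named vector so that each label is tied to a specific value.

```r
library(ggplot2)

# Hypothetical toy data; the level names 'conv' and 'nov' are assumptions
toy <- data.frame(
  met_type = rep(c("conv", "nov"), each = 20),
  RT = c(seq(5, 24), seq(10, 48, by = 2))
)

toy.violin <- ggplot(data = toy, aes(x = met_type, y = RT)) +
  geom_violin(aes(fill = met_type)) +
  # a named vector matches labels to values by name rather than by position
  scale_x_discrete(labels = c(conv = "Conventional", nov = "Novel")) +
  # alpha is a constant here, so it goes outside aes()
  geom_jitter(alpha = .5, width = .1) +
  theme(legend.position = "none")

toy.violin
```

Running table() on the real met_type column first would show which raw values need to be named.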
Can you create a plot named ratings which matches the right panel of the figure above?
ratings <- ggplot(data = met.trim, aes(x = met_type, y = nm)) +
  theme_base() +
  geom_violin(aes(fill = met_type)) +
  scale_x_discrete(labels = c('Conventional', 'Novel')) +
  labs(y = 'Ratings (1-5)', x = '', title = 'Metaphor Novelty/Mirth Ratings') +
  theme(legend.position = 'none') +
  # alpha is a fixed value, so it is set outside aes()
  geom_jitter(alpha = .5)
ratings
The two plots can be placed side by side with the package gridExtra. Install the package and then use the grid.arrange() function to join the two figures.
gridExtra::grid.arrange(met.violin, ratings, nrow = 1)