1. Download the metaphor_data.csv file from https://osf.io/qrc6b/
  2. Create a new object in R named metaphor which is the result of calling read_csv() on metaphor_data.csv
library(tidyverse)
metaphor <- read_csv('https://www.stephenskalicky.com/r_data/metaphor_data.csv')
## Rows: 1304 Columns: 28
## ── Column specification ────────────────────────────────────────────
## Delimiter: ","
## chr  (6): metaphor_id, response, met_type, sex, hand, language_group
## dbl (22): subject, conceptual, nm, trial_order, met_stim, met_RT, age, colle...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
  1. Using a pipe, make a new object called met.small from metaphor. Using dplyr::select(), choose the following columns: subject, met_type, met_RT, conceptual, nm, and NFC.
met.small <- metaphor %>%
  dplyr::select(subject, met_type, met_RT, conceptual, nm, NFC)
  1. We want to recreate this figure:

  1. First we need to convert the RT (the time spent writing the metaphor) from milliseconds to seconds, which means dividing met_RT by 1000. Using mutate, create a new variable in met.small named RT, which is the result of dividing met_RT by 1000. (Note that I am going to extend the pipe from the original creation of met.small each time.)
met.small <- metaphor %>%
  dplyr::select(subject, met_type, met_RT, conceptual, nm, NFC) %>%
  mutate(RT = met_RT/1000)
  1. Next, we need to remove outliers. We will define an outlier as a response whose writing time is 2.5 or more standard deviations above the mean writing time. We will use z-scores to help us with this (don’t worry if you do not know what that is).

  2. Using mutate, create a new variable in met.small named zRT, which is the result of calling the function scale on met_RT.

met.small <- metaphor %>%
  dplyr::select(subject, met_type, met_RT, conceptual, nm, NFC) %>%
  mutate(RT = met_RT/1000) %>%
  mutate(zRT = scale(met_RT))
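
For readers unfamiliar with z-scores, the sketch below shows what scale() computes, using a small made-up vector (rt_ms is hypothetical and not taken from the metaphor data). Each value has the mean subtracted and is then divided by the standard deviation, so a z-score of 2.5 means "2.5 standard deviations above the mean".

```r
# Hypothetical response times in milliseconds (illustration only)
rt_ms <- c(1200, 1500, 1800, 2100, 9000)

# z-score by hand: distance from the mean in standard deviation units
z_manual <- (rt_ms - mean(rt_ms)) / sd(rt_ms)

# scale() does the same centering and scaling, but returns a one-column matrix
z_scale <- scale(rt_ms)

all.equal(z_manual, as.numeric(z_scale))  # TRUE
is.matrix(z_scale)                        # TRUE
```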
  1. Then, extend the pipe to a new mutate call which creates a new variable named outliers. The value of outliers will be 1 if zRT is greater than or equal to 2.5; otherwise it will be 0. To do this, we can use the if_else function in our mutate call (base R's ifelse, which the solution below uses, or the case_when function would also work).

The basic syntax for if_else is if_else(condition, A, B): where condition is TRUE the result is A; otherwise it is B. You can write your mutate call like this:

mutate(outliers = if_else(condition, A, B))

It is up to you to write the correct values for condition, A, and B.
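
As an illustration of the syntax only (not the answer to this step), here is a minimal sketch of if_else() on a made-up tibble. The names toy, score, and passed are hypothetical.

```r
library(dplyr)

# Hypothetical data, unrelated to the metaphor data set
toy <- tibble(score = c(40, 55, 70, 90))

# condition: score >= 60; A: "yes" where TRUE; B: "no" where FALSE
toy <- toy %>%
  mutate(passed = if_else(score >= 60, "yes", "no"))

toy$passed
# "no"  "no"  "yes" "yes"
```

Unlike base R's ifelse(), if_else() expects A and B to be of compatible types, so mixing a string and a number in the two branches raises an error.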

Below is the final pipe with all the previous commands in one pipe. This is again why pipes are cool - you can add each line, step-by-step, as part of your data cleaning / wrangling process. You could easily put all the mutate functions into one call to mutate(), but this method has the advantage of being a bit easier to read, and it shows how the steps link to one another.

met.small <- metaphor %>%
  dplyr::select(subject, met_type, met_RT, conceptual, nm, NFC) %>%
  mutate(RT = met_RT/1000) %>%
  mutate(zRT = scale(met_RT)) %>%
  mutate(outliers = ifelse(zRT >= 2.5, 1, 0))
  1. How many outliers are there? How can you easily find out using one R function applied to met.small?
sum(met.small$outliers)
## [1] 34
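
sum() works as a counter here because outliers is coded as 0/1, so adding up the values counts the 1s. A small sketch with a made-up vector (flags is hypothetical) shows this alongside two related base R summaries:

```r
# Hypothetical 0/1 outlier flags (illustration only)
flags <- c(0, 1, 0, 0, 1, 0)

sum(flags)    # 2: adding the 1s counts the flagged rows
table(flags)  # tabulates how many 0s and how many 1s there are
mean(flags)   # proportion of rows that are flagged
```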
  1. Create a new object named met.trim which is the result of removing the outliers from met.small. Use the filter() function.
met.trim <- met.small %>%
  dplyr::filter(outliers == 0)
## Warning: Using one column matrices in `filter()` was deprecated in dplyr
## 1.1.0.
## ℹ Please use one dimensional logical vectors instead.
## ℹ The deprecated feature was likely used in the dplyr package.
##   Please report the issue at
##   <https://github.com/tidyverse/dplyr/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this
## warning was generated.
# sanity check
sum(met.trim$outliers)
## [1] 0
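
The deprecation warning shown for the filter() call above most likely appears because scale() returns a one-column matrix rather than a plain vector, so zRT and outliers are stored as matrix columns and filter() receives a matrix condition. One possible way to avoid it, sketched here on made-up data (toy, rt, z_matrix, and z_vector are hypothetical names), is to wrap scale() in as.numeric():

```r
library(dplyr)

# Hypothetical response times in seconds (illustration only)
toy <- tibble(rt = c(1.2, 1.5, 1.8, 2.1, 9.0))

toy <- toy %>%
  mutate(z_matrix = scale(rt),              # one-column matrix
         z_vector = as.numeric(scale(rt)))  # plain numeric vector

is.matrix(toy$z_matrix)  # TRUE
is.matrix(toy$z_vector)  # FALSE

# Filtering on the plain vector gives filter() an ordinary logical vector
kept <- toy %>% filter(z_vector < 1)
nrow(kept)  # 4
```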
  1. Create a new object named met.violin which is the result of calling the ggplot() function. Inside the ggplot() call, set the data argument to met.trim, and make an aes() call with the correct x and y axes in order to replicate the chart above. View your plot by running the name of the object (i.e., run met.violin by itself). Compare your figure to the one above - is it correct? What else are we missing?
met.violin <- ggplot(data = met.trim, aes(x = met_type, y = RT)) +
  geom_violin()

met.violin

  1. Add the necessary ggplot information (go through this list one-at-a-time)
library(ggthemes)

# with a final geom_jitter() layer of points - the original figure does not have these though.
met.violin <- ggplot(data = met.trim, aes(x = met_type, y = RT)) +
  geom_violin(aes(fill = met_type)) +
  theme_base() +
  scale_x_discrete(labels = c('Conventional', 'Novel')) +
  labs(y = 'Production Time (seconds)', x = '', title = 'Metaphor Production Times') +
  theme(legend.position = 'none') +
  geom_jitter(alpha = .5)

met.violin
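
In ggplot2, aes() is for mapping an aesthetic to a column of the data, while a fixed value that applies to every point (such as a transparency of .5) is usually passed to the geom directly. A minimal sketch with made-up data (toy, group, and value are hypothetical names):

```r
library(ggplot2)

# Hypothetical data (illustration only)
toy <- data.frame(group = rep(c("a", "b"), each = 20),
                  value = c(rnorm(20, 5), rnorm(20, 7)))

# Mapping: fill varies with a column, so it goes inside aes()
# Setting: alpha is one fixed value for every point, so it goes outside aes()
p <- ggplot(toy, aes(x = group, y = value)) +
  geom_violin(aes(fill = group)) +
  geom_jitter(alpha = .5, width = .1)

p
```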

  1. Using this information, can you make a new ggplot object named ratings which matches the right panel of the figure above?
ratings <- ggplot(data = met.trim, aes(x = met_type, y = nm)) +
  theme_base() +
  geom_violin(aes(fill = met_type)) + 
  scale_x_discrete(labels = c('Conventional', 'Novel')) +
  labs(y = 'Ratings (1-5)', x = '', title = 'Metaphor Novelty/Mirth Ratings') +
  theme(legend.position = 'none') +
  geom_jitter(alpha = .5)
ratings

  1. Do you want to glue the figures together? You can, using the gridExtra package. Install the package and then use the grid.arrange() function to join the two figures.
gridExtra::grid.arrange(met.violin, ratings, nrow = 1)