pivoting data

Published

February 6, 2025

Pivoting data using tidyverse functions

The goal of this activity is to see how to convert data from wide and long format using tidyverse pivot functions.

First let’s consider the difference between long data and wide data. Wide data is where each row contains all the observations for a particular subject. Long data is where each row is an individual observation per subject. What is the difference?

Code
library(tidyverse)

Wide Data - each row contains three observations per subjects

# A tibble: 10 × 4
   subject   pre post1 post2
     <int> <dbl> <dbl> <dbl>
 1       1  23.9  49.7  40.6
 2       2  24.6  49.7  41.3
 3       3  25.3  53.5  40.0
 4       4  24.2  49.8  40.1
 5       5  23.6  54.2  37.9
 6       6  24.5  50.3  40.0
 7       7  24.5  51.8  40.2
 8       8  24.3  48.5  39.5
 9       9  24.5  48.3  40.4
10      10  24.6  47.7  39.9

Long data - each row containts one observation per subject (for the first three participants)

# A tibble: 9 × 3
  subject test  score
    <dbl> <chr> <dbl>
1       1 pre    56.2
2       1 post1  81.6
3       1 post2  29.6
4       2 pre    59.1
5       2 post1  24.0
6       2 post2  43.4
7       3 pre    79.6
8       3 post1  28.0
9       3 post2  63.3

Long data is typically the format you will want, but not always. Fortunately there is a method to swap back and forth between the two formats.

pivot_longer() and pivot_wider() are two methods to swap between these formats. They turn wide data to long, or long data to wide.

Let’s try to turn the object wide into long format using pivot_longer()
There are four crucial arguments:

  • data (the data object you are changing)
  • cols (the columns you want to manipulate)
  • names_to (the new name for the single column created by joining the columns named by cols)
  • values_to (the new name for the column which includes the values from the columns named in cols)

It is a bit confusing to wrap your head around at first. Basically, you are telling R to combine some number of columns into a single column.

Let’s practice.

1. Create a tibble named emotions with four columns and three rows.

  • subject with the values 1, 2, 3
  • first with the values “happy”, “sad” “happy”
  • second with the values “sad”, “happy”, “happy”
  • third with the values “sad” , “sad”, “sad”
Code
emotions <- tibble(subject = c(1,2,3),
                   first = c('happy', 'sad', 'happy'),
                   second = c('sad', 'happy', 'happy'),
                   third = c('sad', 'sad', 'sad'))

Your emotion tibble should look like this:

# A tibble: 3 × 4
  subject first second third
    <dbl> <chr> <chr>  <chr>
1       1 happy sad    sad  
2       2 sad   happy  sad  
3       3 happy happy  sad  

2. Change emotions to long format using pivot_longer()

  • Your goal is to have a single tibble with three columns
  • the columns will be named subject, sequence, and emotions
  • the new data will be 9 rows long, with three rows per subject
  • sequence will contain the values first, second, or third
  • emotions will contain the values happy or sad

How to do this?

  • Create a new tibble named emotions_long from emotions.
  • Use a single pipe into pivot_longer to combine the columns named first, second, and third into a single column.
  • You need to pass the names of all three columns to cols wrapped within c().
    • You do not need to put quotes around the names of the column
  • You need to choose which labels (‘sequence’ or ‘emotion’) to give to values_to and names_to
Code
emotions_long <- emotions %>%
  pivot_longer(cols = c(first, second, third), 
               values_to = "emotion", 
               names_to = "sequence")

If successful, your data should look like this

# A tibble: 9 × 3
  subject sequence emotion
    <dbl> <chr>    <chr>  
1       1 first    happy  
2       1 second   sad    
3       1 third    sad    
4       2 first    sad    
5       2 second   happy  
6       2 third    sad    
7       3 first    happy  
8       3 second   happy  
9       3 third    sad    

3. Now create wide data from long data using pivot_wider()

This function is similar to pivot_longer except now are going in the reverse direction - we want to take a single column and break it up into new columns. The crucial arguments are:

  • names_from (where to get the names of the new columns from?)

  • values_from (where to get the for the new columns from?)

  • Create a new tibble named emotions_wide from emotions_long

  • Using a single pipe, transform the data into wide using pivot_wider

  • You only need to use the two arguments listed above. Which column should be passed to which argument to get the correct output?

Code
emotions_wide <- emotions_long %>%
  pivot_wider(names_from = sequence, 
              values_from = emotion)

You should see this:

# A tibble: 3 × 4
  subject first second third
    <dbl> <chr> <chr>  <chr>
1       1 happy sad    sad  
2       2 sad   happy  sad  
3       3 happy happy  sad  

4. A realistic example - converting test data to wide for Chronbach’s alpha

Let’s say that you have some data in long format. These data are responses to a test, such as a c-test or test of vocabulary knowledge. While you need long data for your statistical analysis, you also want to run internal consistency using Chronbach’s alpha or some other similar measure. To do so, you need your data to be in wide format, where each row is a participant, and each column is a test item.

Let’s simulate some data for this, assuming 10 subjects each taking a test with 25 items:

Code
# list of subjects
subjects <-  rep(paste0("S", 1:10), each = 25)
subjects
  [1] "S1"  "S1"  "S1"  "S1"  "S1"  "S1"  "S1"  "S1"  "S1"  "S1"  "S1"  "S1" 
 [13] "S1"  "S1"  "S1"  "S1"  "S1"  "S1"  "S1"  "S1"  "S1"  "S1"  "S1"  "S1" 
 [25] "S1"  "S2"  "S2"  "S2"  "S2"  "S2"  "S2"  "S2"  "S2"  "S2"  "S2"  "S2" 
 [37] "S2"  "S2"  "S2"  "S2"  "S2"  "S2"  "S2"  "S2"  "S2"  "S2"  "S2"  "S2" 
 [49] "S2"  "S2"  "S3"  "S3"  "S3"  "S3"  "S3"  "S3"  "S3"  "S3"  "S3"  "S3" 
 [61] "S3"  "S3"  "S3"  "S3"  "S3"  "S3"  "S3"  "S3"  "S3"  "S3"  "S3"  "S3" 
 [73] "S3"  "S3"  "S3"  "S4"  "S4"  "S4"  "S4"  "S4"  "S4"  "S4"  "S4"  "S4" 
 [85] "S4"  "S4"  "S4"  "S4"  "S4"  "S4"  "S4"  "S4"  "S4"  "S4"  "S4"  "S4" 
 [97] "S4"  "S4"  "S4"  "S4"  "S5"  "S5"  "S5"  "S5"  "S5"  "S5"  "S5"  "S5" 
[109] "S5"  "S5"  "S5"  "S5"  "S5"  "S5"  "S5"  "S5"  "S5"  "S5"  "S5"  "S5" 
[121] "S5"  "S5"  "S5"  "S5"  "S5"  "S6"  "S6"  "S6"  "S6"  "S6"  "S6"  "S6" 
[133] "S6"  "S6"  "S6"  "S6"  "S6"  "S6"  "S6"  "S6"  "S6"  "S6"  "S6"  "S6" 
[145] "S6"  "S6"  "S6"  "S6"  "S6"  "S6"  "S7"  "S7"  "S7"  "S7"  "S7"  "S7" 
[157] "S7"  "S7"  "S7"  "S7"  "S7"  "S7"  "S7"  "S7"  "S7"  "S7"  "S7"  "S7" 
[169] "S7"  "S7"  "S7"  "S7"  "S7"  "S7"  "S7"  "S8"  "S8"  "S8"  "S8"  "S8" 
[181] "S8"  "S8"  "S8"  "S8"  "S8"  "S8"  "S8"  "S8"  "S8"  "S8"  "S8"  "S8" 
[193] "S8"  "S8"  "S8"  "S8"  "S8"  "S8"  "S8"  "S8"  "S9"  "S9"  "S9"  "S9" 
[205] "S9"  "S9"  "S9"  "S9"  "S9"  "S9"  "S9"  "S9"  "S9"  "S9"  "S9"  "S9" 
[217] "S9"  "S9"  "S9"  "S9"  "S9"  "S9"  "S9"  "S9"  "S9"  "S10" "S10" "S10"
[229] "S10" "S10" "S10" "S10" "S10" "S10" "S10" "S10" "S10" "S10" "S10" "S10"
[241] "S10" "S10" "S10" "S10" "S10" "S10" "S10" "S10" "S10" "S10"
Code
# list of items
items <- rep(c(1:25), 10)
items
  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
 [26]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
 [51]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
 [76]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[101]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[126]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[151]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[176]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[201]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[226]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Code
# list of test scores
test_scores <- as.vector(replicate(10, sample(c(0,1), size = 25, replace = T)))
test_scores
  [1] 1 0 0 1 1 1 1 1 0 0 0 0 1 1 0 0 0 1 0 1 1 0 1 0 1 1 1 1 0 1 1 1 0 0 1 1 1
 [38] 1 0 1 0 0 1 1 0 0 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 0 1 0 0 1 0 0 1
 [75] 1 0 1 1 0 0 1 0 1 1 1 1 1 1 0 1 0 1 1 1 1 0 0 0 0 1 1 1 1 0 1 1 0 1 0 1 0
[112] 1 0 1 1 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 0 1 1 1 1 0 1 1 1 0 0 1 0 0 1 0 0 1
[149] 0 0 1 0 1 0 1 1 0 0 0 1 0 0 1 1 1 1 0 0 0 1 0 1 1 1 0 1 1 0 0 0 0 1 0 1 0
[186] 1 1 1 1 1 0 0 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 0 1 1 0 1 0 0 1 0 1 1 0 0 1 1
[223] 1 0 0 0 1 0 1 1 1 0 0 1 1 0 1 1 1 0 1 0 0 1 1 1 1 0 0 0
Code
# combine into a df
fake_data <- tibble(subjects, items, test_scores)
fake_data
# A tibble: 250 × 3
   subjects items test_scores
   <chr>    <int>       <dbl>
 1 S1           1           1
 2 S1           2           0
 3 S1           3           0
 4 S1           4           1
 5 S1           5           1
 6 S1           6           1
 7 S1           7           1
 8 S1           8           1
 9 S1           9           0
10 S1          10           0
# ℹ 240 more rows

You need to pivot this so that each column is an item, and each row is a participant. Your new data frame should be thus ten rows long (10 participants) and 27 columns wide (one column per item, plus a column for subjects).

Code
fake_data_wide <- fake_data %>%
  pivot_wider(names_from = items, 
              values_from = test_scores)

It should look like this:

Code
fake_data_wide
# A tibble: 10 × 26
   subjects   `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`  `10`  `11`
   <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 S1           1     0     0     1     1     1     1     1     0     0     0
 2 S2           1     1     1     0     1     1     1     0     0     1     1
 3 S3           1     1     1     1     1     1     1     1     1     1     1
 4 S4           0     1     1     0     0     1     0     1     1     1     1
 5 S5           1     1     1     0     1     1     0     1     0     1     0
 6 S6           0     0     0     0     0     0     1     1     1     1     0
 7 S7           1     0     1     0     1     1     0     0     0     1     0
 8 S8           1     1     0     0     0     0     1     0     1     0     1
 9 S9           0     0     0     0     1     1     1     0     1     1     0
10 S10          0     1     0     1     1     1     0     0     1     1     0
# ℹ 14 more variables: `12` <dbl>, `13` <dbl>, `14` <dbl>, `15` <dbl>,
#   `16` <dbl>, `17` <dbl>, `18` <dbl>, `19` <dbl>, `20` <dbl>, `21` <dbl>,
#   `22` <dbl>, `23` <dbl>, `24` <dbl>, `25` <dbl>

Now you can run an internal consistency analysis using psych::alpha() or some other calculation! The best part is, you didn’t have to change it all manually in excel :-)

Code
psych::alpha(fake_data_wide[2:ncol(fake_data_wide)], check.keys = T)$total
 raw_alpha std.alpha   G6(smc) average_r      S/N       ase  mean        sd
  0.769761 0.7670732 0.7363903 0.1163953 3.293195 0.1060218 0.512 0.1966836
  median_r
 0.1020621