Let’s focus again on the pipe operator and how it relates to data wrangling.

Do you remember our favourite package? It’s tidyverse. Load it now:

library(tidyverse)

But first, we need to review functions.

A function in R is a command: a pre-made set of instructions that you can call on by typing the function’s name. Function names in R are followed by parentheses. For example, sum() is a function.

The parentheses contain the arguments for a function. Functions can have more than one argument. How many arguments can sum take?

One?

sum(1)
## [1] 1

Two?

sum(1,2)
## [1] 3

More?

sum(1,2,3,4,5,6,7,8,9,10)
## [1] 55
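Because sum() adds up every value it is given, those values can also arrive packed into vectors instead of being typed one by one. Here is a small sketch using only base R:

```r
# sum() adds every value in every argument, whether the values
# are typed individually or stored in vectors
sum(1:10)             # one vector holding 1 through 10
## [1] 55
sum(c(1, 2, 3), 4:10) # two vectors mixed together
## [1] 55
```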

The help() function gives you lots of information about a function, including details about each of its arguments.

help(sum)
Description
sum returns the sum of all the values present in its arguments.

Arguments
...    numeric or complex or logical vectors.

na.rm     logical. Should missing values (including NaN) be removed?

help() and ? are two ways to open the help page for something in R, so help(sum) and ?sum do the same thing.

?sum
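The na.rm argument listed in the help page matters when your data contain missing values (NA). Here is a quick base-R check:

```r
# A missing value makes the whole total unknown...
sum(1, 2, NA)
## [1] NA
# ...unless we ask sum() to drop missing values first
sum(1, 2, NA, na.rm = TRUE)
## [1] 3
# an unnamed TRUE is just another value to add (TRUE counts as 1)
sum(1, 2, TRUE)
## [1] 4
```

na.rm has to be written out by name because it comes after ... in the argument list. Without the name, R treats TRUE as one more number to add.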

Most functions aren’t as forgiving as sum(). They expect specific arguments, and leaving out a required one will cause an error or an unexpected result. It’s always a good idea to read a function’s help page and scroll down to the Examples section to see how it is used.
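round() is a good example of a function with one required argument and one optional argument. It needs the number to round, while its digits argument defaults to 0. The sketch below uses base R and wraps the failing call in tryCatch() so that the error message is returned instead of stopping the script. The exact wording of that message may vary between R versions.

```r
# digits is optional: it defaults to 0
round(3.14159)
## [1] 3
# supplying digits changes the result
round(3.14159, digits = 2)
## [1] 3.14
# leaving out the required number is an error;
# tryCatch() catches it and returns the error message as text
tryCatch(round(), error = function(e) conditionMessage(e))
```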

Let’s look at another function.

help(seq)
Description
Generate regular sequences...

Arguments
...  arguments passed to or from methods.

from, to    the starting and (maximal) end values of the sequence. Of length 1 unless just from is supplied as an unnamed argument.

by  number: increment of the sequence.

length.out  desired length of the sequence. A non-negative number, which for seq and seq.int will be rounded up if fractional.

along.with  take the length from the length of this argument.

The seq() function creates a sequence of numbers and has several arguments that control it. The first two are from and to, which define where the sequence starts and ends.

seq(from = 1, to = 8)
## [1] 1 2 3 4 5 6 7 8

Notice that you can type the names of the arguments and use = to give them values. The seq() function also has a by argument, which sets the size of the sequence’s increments:

seq(from = 10, to = 100, by = 5)
##  [1]  10  15  20  25  30  35  40  45  50  55  60  65  70  75  80  85  90  95 100
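The help page also lists length.out. Instead of fixing the step size with by, length.out fixes how many numbers you get, and seq() works out the step for you. Here is a short base-R sketch:

```r
# five evenly spaced numbers from 0 to 1; seq() picks the step (0.25)
seq(from = 0, to = 1, length.out = 5)
## [1] 0.00 0.25 0.50 0.75 1.00

# the by = 5 example above, rewritten with length.out (19 numbers)
seq(from = 10, to = 100, length.out = 19)
```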

Arguments also have a default position in a function. If you give the values in that order, you can leave out the argument names. Compare:

# not calling arguments
seq(1,10)
##  [1]  1  2  3  4  5  6  7  8  9 10
# calling arguments 
seq(from = 1, to = 10)
##  [1]  1  2  3  4  5  6  7  8  9 10
# calling arguments in a different order
seq(to = 10, from = 1)
##  [1]  1  2  3  4  5  6  7  8  9 10

What is the default argument order for seq()?
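One way to test your answer is to compare an unnamed call with a named one using identical(), which returns TRUE when two results are exactly the same. The sketch below uses base R only and assumes the first three positions of seq() are from, to, and by. It also shows that named and unnamed arguments can be mixed: R matches the named ones first, then fills the remaining slots in order. The variable names are just for illustration.

```r
# the third unnamed value lands in the third slot (by)
positional <- seq(1, 10, 2)
named      <- seq(from = 1, to = 10, by = 2)
# by is matched by name first; 1 and 10 then fill from and to
mixed      <- seq(by = 2, 1, 10)

identical(positional, named)
## [1] TRUE
identical(positional, mixed)
## [1] TRUE
positional
## [1] 1 3 5 7 9
```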

Let’s make a variable

A variable is an R object that you create. It can be the result of a function, the result of data being loaded in, or the result of you manually typing the values. A variable has the name of the variable on the left side, followed by a <-, followed by the value of the variable.

# value is text
dogs <- 'cool'
cats <- 'drool'

# type the variable name to see the value. 
dogs
## [1] "cool"
cats
## [1] "drool"
# value is numbers
new.zealand <- 1
australia <- 2

new.zealand
## [1] 1
australia
## [1] 2
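Once a variable exists, its name can stand in for its value anywhere in later code. Here is a small sketch reusing the two numeric variables above, redefined so the block runs on its own:

```r
new.zealand <- 1
australia <- 2

# the names are replaced by their values: 1 + 2
new.zealand + australia
## [1] 3

# assigning again overwrites the old value
australia <- 20
australia
## [1] 20
```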

Can you use seq() to make a variable containing ten evenly spaced numbers from 2 to 20? Save the result as a variable named ten.twenty.

ten.twenty <- seq(2,20,2)

ten.twenty
##  [1]  2  4  6  8 10 12 14 16 18 20

Check the help for the c() function.

?c

Can you use the c() function to make a variable containing the first ten letters of the English alphabet (a through j)? Save the result as a variable named ten.letters. Remember to put quotes around each letter.

ten.letters <- c('a','b','c','d','e','f','g','h','i','j')

ten.letters
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
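Typing the letters out works fine. As a shortcut, R also ships with a built-in constant called letters that holds the 26 lowercase letters. Its first ten elements can be taken with square brackets. Here is a base-R sketch that checks the two approaches match:

```r
ten.letters <- c('a','b','c','d','e','f','g','h','i','j')

# letters is built into R; [1:10] keeps elements 1 through 10
identical(letters[1:10], ten.letters)
## [1] TRUE
```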

Search the help for the length() function.

?length

What is the length of ten.twenty and ten.letters?

length(ten.twenty)
## [1] 10
length(ten.letters)
## [1] 10

Using JUST the length function and the ten.twenty and ten.letters variables, can you create a new variable named one.hundred which is the value 100?

one.hundred <- length(ten.twenty) * length(ten.letters)

one.hundred
## [1] 100

Ok, what about pipes again?

Pipes are that weird %>% thing. We use them to chain functions together, so the output of one function becomes the input of the next. This makes code more readable, saves time, avoids having to create and name intermediate variables, and helps with debugging. Knowing how arguments work should help you understand the pipe a bit better now.
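To see what chaining buys us, compare three ways of doing the same two steps. You can use an intermediate variable, a nested call (read inside out), or a pipe (read left to right). This sketch assumes the tidyverse, which provides %>%, is installed. The name one.to.ten is made up for illustration.

```r
library(tidyverse)  # provides the %>% pipe

# intermediate variable: works, but needs an extra name
one.to.ten <- seq(from = 1, to = 10)
sum(one.to.ten)
## [1] 55

# nested: seq() runs first, then sum(), but we read sum() first
sum(seq(from = 1, to = 10))
## [1] 55

# piped: the same steps, written in the order they happen
seq(from = 1, to = 10) %>%
  sum()
## [1] 55
```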

Let’s create a tibble of our variables. Name the tibble lals.pipes. Then look at the tibble with the glimpse() function.

lals.pipes <- tibble(numbers = ten.twenty, letters = ten.letters)

glimpse(lals.pipes)
## Rows: 10
## Columns: 2
## $ numbers <dbl> 2, 4, 6, 8, 10, 12, 14, 16, 18, 20
## $ letters <chr> "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"

Now let’s repeat the same procedure but with a pipe. What is different about this code? What does this tell us about what pipes “do”?

lals.pipes <- tibble(numbers = ten.twenty, letters = ten.letters) %>%
  glimpse()
## Rows: 10
## Columns: 2
## $ numbers <dbl> 2, 4, 6, 8, 10, 12, 14, 16, 18, 20
## $ letters <chr> "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"
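Part of why the piped version still works is that glimpse() prints its summary and then hands back the data it received, invisibly, so the data is not printed a second time. That means lals.pipes still ends up holding the tibble, not the printed summary. The sketch below assumes the tidyverse is installed and rebuilds the variables so it runs on its own:

```r
library(tidyverse)

ten.twenty <- seq(2, 20, 2)
ten.letters <- c('a','b','c','d','e','f','g','h','i','j')

# glimpse() prints the summary, then returns the tibble unchanged
lals.pipes <- tibble(numbers = ten.twenty, letters = ten.letters) %>%
  glimpse()

# still a 10-row, 2-column tibble
dim(lals.pipes)
## [1] 10  2
```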

So, a pipe takes the R object on its left and passes it, as the first argument, to the function on its right. The first argument of many R functions is the data object to work on, and the tidyverse is designed to take advantage of this.
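In other words, x %>% f(y) means the same as f(x, y). You can check this with seq(), whose first argument is from. This sketch assumes the tidyverse is installed:

```r
library(tidyverse)

# 1 is piped in as the first argument (from), so this is seq(1, 10)
1 %>% seq(10)
##  [1]  1  2  3  4  5  6  7  8  9 10

# the piped and unpiped forms give identical results
identical(1 %>% seq(10), seq(1, 10))
## [1] TRUE
```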

To update an object with a pipe, start the pipe with the object and assign the result back to the same name, like this:

object <- object %>%
  some_function()   # placeholder: your functions go here

Let’s add a column to our data. The mutate() function adds variables (columns) to a tibble. Its first argument is the data object. Each following argument gives the name of a new variable, then =, then the values for that variable. For example: mutate(data, variable1 = 1). You can also call a function inside mutate(), as in mutate(data, variable_1 = sum(1,2)), which shows how powerful mutate() is.
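The two examples from that paragraph can be run on a tiny tibble, with and without a pipe. In this sketch, the tibble called data and its column x are made up for illustration, and the tidyverse is assumed to be installed:

```r
library(tidyverse)

# a small, made-up tibble to practise on
data <- tibble(x = c(10, 20, 30))

# without a pipe: data is written as the first argument
mutate(data, variable1 = 1)

# with a pipe: data is passed in as the first argument automatically
data %>%
  mutate(variable_1 = sum(1, 2))
```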

Create a new variable in lals.pipes called colour and assign it the value “blue”. Then run the glimpse() function on lals.pipes.

lals.pipes <- lals.pipes %>%
  mutate(colour = 'blue')

glimpse(lals.pipes)
## Rows: 10
## Columns: 3
## $ numbers <dbl> 2, 4, 6, 8, 10, 12, 14, 16, 18, 20
## $ letters <chr> "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"
## $ colour  <chr> "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue"…

What happened to our “colour” variable?

You can make multiple variables with one mutate() call:

d1 <- lals.pipes %>%
  mutate(variable1 = seq(1,10),
         variable2 = seq(11,20))

glimpse(d1)
## Rows: 10
## Columns: 5
## $ numbers   <dbl> 2, 4, 6, 8, 10, 12, 14, 16, 18, 20
## $ letters   <chr> "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"
## $ colour    <chr> "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blu…
## $ variable1 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
## $ variable2 <int> 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
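mutate() accepts two kinds of values. A value of length 1 is repeated down every row, and a vector must otherwise have exactly as many elements as there are rows. Any other length is an error. This sketch uses a made-up three-row tibble called small, assumes the tidyverse is installed, and catches the error so the script keeps running:

```r
library(tidyverse)

# a made-up three-row tibble
small <- tibble(id = c(1, 2, 3))

# length 1: recycled to every row
small %>% mutate(colour = 'blue')

# length 3 (same as the number of rows): one value per row
small %>% mutate(colour = c('red', 'green', 'blue'))

# length 2: neither 1 nor 3, so mutate() stops with an error
result <- tryCatch(
  small %>% mutate(colour = c('red', 'green')),
  error = function(e) "error: wrong length"
)
result
## [1] "error: wrong length"
```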

Now for your challenge…

  1. Learn how to use the rep() command.
  2. Save a new variable named pipes.rule which is the value of lals.pipes.
  3. Mutate colour so that instead of ten values of “blue”, it alternates between “yellow” and “green” (please use the rep() function). You will have to use c() inside the rep() function.
  4. Create a fourth variable named numbers2 in pipes.rule which is the same as numbers but in the opposite order (use the seq() function inside a mutate() function). You will have to use a negative by value.
  5. Create a fifth variable named numbers3 in pipes.rule which is the result of multiplying numbers and numbers2.
  6. Do all of this with pipes, using only the variables and functions listed above and only one mutate() function.
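The challenge depends on two tools: rep(), and seq() counting downwards. Here is a short base-R sketch of both, using toy values that are separate from the challenge data:

```r
# times: repeat the whole vector
rep(c('x', 'y'), times = 3)
## [1] "x" "y" "x" "y" "x" "y"
# each: repeat every element before moving on
rep(c('x', 'y'), each = 3)
## [1] "x" "x" "x" "y" "y" "y"
# length.out: keep repeating until the result has this many elements
rep(c('x', 'y'), length.out = 5)
## [1] "x" "y" "x" "y" "x"

# a negative by makes seq() count down from `from` to `to`
seq(from = 9, to = 1, by = -2)
## [1] 9 7 5 3 1
```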

Here is the answer with three mutate() calls

# Look up help for rep, it stands for "replicate" 
?rep


pipes.rule <- lals.pipes %>%
  mutate(colour = rep(c('yellow','green'),5)) %>%
  mutate(numbers2 = seq(20,2,-2)) %>%
  mutate(numbers3 = numbers * numbers2)
  
pipes.rule
## # A tibble: 10 × 5
##    numbers letters colour numbers2 numbers3
##      <dbl> <chr>   <chr>     <dbl>    <dbl>
##  1       2 a       yellow       20       40
##  2       4 b       green        18       72
##  3       6 c       yellow       16       96
##  4       8 d       green        14      112
##  5      10 e       yellow       12      120
##  6      12 f       green        10      120
##  7      14 g       yellow        8      112
##  8      16 h       green         6       96
##  9      18 i       yellow        4       72
## 10      20 j       green         2       40

Here is the answer with one mutate() call

pipes.rule <- lals.pipes %>%
  mutate(colour = rep(c('yellow','green'),5),
         numbers2 = seq(20,2,-2),
         numbers3 = numbers * numbers2)

pipes.rule
## # A tibble: 10 × 5
##    numbers letters colour numbers2 numbers3
##      <dbl> <chr>   <chr>     <dbl>    <dbl>
##  1       2 a       yellow       20       40
##  2       4 b       green        18       72
##  3       6 c       yellow       16       96
##  4       8 d       green        14      112
##  5      10 e       yellow       12      120
##  6      12 f       green        10      120
##  7      14 g       yellow        8      112
##  8      16 h       green         6       96
##  9      18 i       yellow        4       72
## 10      20 j       green         2       40
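The one-call version works because mutate() creates its new columns in order, so numbers3 can already use numbers2 even though both are made in the same call. The sketch below assumes the tidyverse is installed. It rebuilds lals.pipes from scratch and checks that the two answers are identical. The names three.calls and one.call are made up for illustration.

```r
library(tidyverse)

# rebuild lals.pipes as it stands after adding colour = 'blue'
lals.pipes <- tibble(numbers = seq(2, 20, 2),
                     letters = c('a','b','c','d','e','f','g','h','i','j')) %>%
  mutate(colour = 'blue')

three.calls <- lals.pipes %>%
  mutate(colour = rep(c('yellow','green'),5)) %>%
  mutate(numbers2 = seq(20,2,-2)) %>%
  mutate(numbers3 = numbers * numbers2)

# numbers3 can refer to numbers2 because it is created later in the same call
one.call <- lals.pipes %>%
  mutate(colour = rep(c('yellow','green'),5),
         numbers2 = seq(20,2,-2),
         numbers3 = numbers * numbers2)

identical(three.calls, one.call)
## [1] TRUE
```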