# Create sample data
df <- data.frame(
  a = c(1, 2, 3, 4, 5),
b = c(10, 20, 30, 40, 50),
c = c(100, 200, 300, 400, 500)
)
Quick introduction to map() and apply()
Quick comparison: apply() vs. purrr::map()
Both apply() and map() let you apply functions to data without writing loops, but they work differently:
- apply(): Base R function for matrices and data frames
- map(): tidyverse function for vectors and lists
Example 1: Calculate Mean of Each Column
df
a b c
1 1 10 100
2 2 20 200
3 3 30 300
4 4 40 400
5 5 50 500
Using apply()
apply() is one member of the larger apply family of functions and is meant for matrices and data frames. A data frame is converted to a matrix before the function is applied.
# MARGIN = 2 sets this to be column-wise
apply(X = df, MARGIN = 2, FUN = mean)
a b c
3 30 300
# compare to MARGIN = 1
apply(X = df, MARGIN = 1, FUN = mean)
[1] 37 74 111 148 185
We get an error using apply() on a single column, because df$a is a plain vector with no dimensions:
apply(X = df$a, MARGIN = 2, FUN = mean)
Error in apply(X = df$a, MARGIN = 2, FUN = mean): dim(X) must have a positive length
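The error above comes from the missing dim attribute on the extracted column. A minimal sketch of two ways around it, assuming the same df as above (the names mean_direct and mean_apply are only for illustration):

```r
# Recreate the sample data so this block runs on its own
df <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(10, 20, 30, 40, 50),
  c = c(100, 200, 300, 400, 500)
)

# A column extracted with $ is a plain vector, so it has no dimensions
dim(df$a)

# Option 1: call the function on the vector directly
mean_direct <- mean(df$a)

# Option 2: keep the column as a one-column data frame (drop = FALSE),
# so apply() still sees two dimensions
mean_apply <- apply(X = df[, "a", drop = FALSE], MARGIN = 2, FUN = mean)
```

For a single column, calling the function directly is usually the simpler choice; the drop = FALSE form is mainly useful when the column selection is computed and may return one or many columns.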
Using map()
library(purrr)
We can accomplish similar things with map(), from the purrr package.
The variants of map() let us specify the type of value we want returned. For example, map_dbl() requires each iteration to return a single double and collects the results into a numeric vector:
Calculate mean of all the columns using map_dbl()
map_dbl(df, mean)
a b c
3 30 300
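The other typed variants follow the same pattern as map_dbl(). The sketch below, which assumes the same df as above, shows map_chr(), map_int(), and map_lgl(), plus what happens when the function returns more than one value per element (the variable names are only for illustration):

```r
library(purrr)

# Recreate the sample data so this block runs on its own
df <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(10, 20, 30, 40, 50),
  c = c(100, 200, 300, 400, 500)
)

# map_chr() expects one character string per element
col_classes <- map_chr(df, class)

# map_int() expects one integer per element
col_lengths <- map_int(df, length)

# map_lgl() expects one logical value per element
col_is_numeric <- map_lgl(df, is.numeric)

# range() returns two values per column, so map_dbl() raises an error;
# tryCatch() captures it here so the script keeps running
range_attempt <- tryCatch(
  map_dbl(df, range),
  error = function(e) "error"
)
```

The strictness is the point of the typed variants: a result of the wrong type or length fails immediately instead of silently changing the shape of the output.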
What happens if we use map() by itself? We see a different structure being returned:
map(df, mean)
$a
[1] 3
$b
[1] 30
$c
[1] 300
map() always returns a list, with one element per input element. Because a list can hold anything, each result can be a complex object rather than a single value.
test <- map(df, mean)
We can see that it returns a list:
str(test)
List of 3
$ a: num 3
$ b: num 30
$ c: num 300
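As a small illustration of returning more than one value per element, the sketch below maps range() and summary() over the same df. The names ranges and summaries are only for illustration.

```r
library(purrr)

# Recreate the sample data so this block runs on its own
df <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(10, 20, 30, 40, 50),
  c = c(100, 200, 300, 400, 500)
)

# Each element of the result is a length-2 vector: the min and max of one column
ranges <- map(df, range)

# Each element can also be a richer object, such as the output of summary()
summaries <- map(df, summary)

# Individual results are pulled out by name, like any other list
ranges$b
summaries[["c"]]
```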
More complex functions
# create a list of vectors
my_list <- list(
  x = 1:9,
y = 10:19,
z = 20:42
)
Create a simple function:
# What does this function do?
get_stats <- function(x) {
  c(min = min(x), mean = mean(x), max = max(x))
}
get_stats(c(1,2,3))
min mean max
1 2 3
We can use lapply(), the list version of apply(), to apply the function to our list:
lapply(my_list, get_stats)
$x
min mean max
1 5 9
$y
min mean max
10.0 14.5 19.0
$z
min mean max
20 31 42
map() takes the same list and function and returns the same result:
# Using map()
map(my_list, get_stats)
$x
min mean max
1 5 9
$y
min mean max
10.0 14.5 19.0
$z
min mean max
20 31 42
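Both calls return a list of named vectors. When every result has the same length, it is often handier to collect them into one table. The following is one possible sketch using only base R for the combining step; stats_matrix and stats_rows are names chosen for illustration.

```r
library(purrr)

# Recreate the list and function so this block runs on its own
my_list <- list(
  x = 1:9,
  y = 10:19,
  z = 20:42
)

get_stats <- function(x) {
  c(min = min(x), mean = mean(x), max = max(x))
}

# sapply() simplifies equal-length results into a matrix:
# one column per list element, one row per statistic
stats_matrix <- sapply(my_list, get_stats)

# Row-binding the list returned by map() gives the transposed layout:
# one row per list element, one column per statistic
stats_rows <- do.call(rbind, map(my_list, get_stats))
```

Which layout is better depends on what comes next; one row per input tends to be easier to convert into a data frame.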
map2()
map2() works like map(), but iterates over two inputs in parallel. With this, we can run a binomial test for each pair of values, using v1 as the number of successes and v2 as the number of trials:
# create data for binomial test
set.seed(42)
v1 <- sample(x = seq(0, 25), 10)  # number of successes for each test
v2 <- rep(50, 10)                 # number of trials for each test
map2(v1, v2, ~ binom.test(.x, .y, p = .5))
[[1]]
Exact binomial test
data: .x and .y
number of successes = 16, number of trials = 50, p-value = 0.01535
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.1952042 0.4669938
sample estimates:
probability of success
0.32
[[2]]
Exact binomial test
data: .x and .y
number of successes = 4, number of trials = 50, p-value = 4.462e-10
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.02222796 0.19234278
sample estimates:
probability of success
0.08
[[3]]
Exact binomial test
data: .x and .y
number of successes = 0, number of trials = 50, p-value = 1.776e-15
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.00000000 0.07112174
sample estimates:
probability of success
0
[[4]]
Exact binomial test
data: .x and .y
number of successes = 9, number of trials = 50, p-value = 5.614e-06
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.08576208 0.31436941
sample estimates:
probability of success
0.18
[[5]]
Exact binomial test
data: .x and .y
number of successes = 3, number of trials = 50, p-value = 3.708e-11
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.01254859 0.16548195
sample estimates:
probability of success
0.06
[[6]]
Exact binomial test
data: .x and .y
number of successes = 17, number of trials = 50, p-value = 0.03284
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.2120547 0.4876525
sample estimates:
probability of success
0.34
[[7]]
Exact binomial test
data: .x and .y
number of successes = 25, number of trials = 50, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.355273 0.644727
sample estimates:
probability of success
0.5
[[8]]
Exact binomial test
data: .x and .y
number of successes = 14, number of trials = 50, p-value = 0.002602
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.1623106 0.4249054
sample estimates:
probability of success
0.28
[[9]]
Exact binomial test
data: .x and .y
number of successes = 6, number of trials = 50, p-value = 3.244e-08
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.04533532 0.24310132
sample estimates:
probability of success
0.12
[[10]]
Exact binomial test
data: .x and .y
number of successes = 21, number of trials = 50, p-value = 0.3222
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.2818822 0.5679396
sample estimates:
probability of success
0.42
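Printing ten full test results is hard to scan. A minimal sketch of pulling out only the p-values follows. To avoid depending on the random seed, it hard-codes the success counts shown in the output above, and the names tests, p_values, and p_values_direct are only for illustration.

```r
library(purrr)

# Successes copied from the printed results above; every test has 50 trials
v1 <- c(16, 4, 0, 9, 3, 17, 25, 14, 6, 21)
v2 <- rep(50, 10)

# Run the tests once and keep the full result objects
tests <- map2(v1, v2, ~ binom.test(.x, .y, p = .5))

# Extract the p.value element from each result as a plain numeric vector
p_values <- map_dbl(tests, "p.value")

# Or compute and extract in a single pass with map2_dbl()
p_values_direct <- map2_dbl(v1, v2, ~ binom.test(.x, .y, p = .5)$p.value)
```

Keeping the full list in tests is useful when other parts of each result, such as the confidence interval, are needed later; the single-pass form is enough when only one number per test matters.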