group_by() and summarize() in further detail,
n() and n_distinct(),
across() columns.

dplyr verbs
Recall the standard dplyr verb form:
The first argument of a dplyr verb is the data (a tibble or data.frame), and the next arguments specify how we are using the verb.
dplyr functions can easily be “piped” together to create a pipeline from one verb to another.
The focus here is on group_by() and summarize().

more_pets
library(tidyverse)
more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  type_of_animal = c("dog", rep("cat", 3), "sheep/ram", "cat", "dog")
)
more_pets
# A tibble: 7 × 4
  names   ages meals_per_day type_of_animal
  <chr>  <dbl>         <dbl> <chr>
1 Dude       6             2 dog
2 Pickle     5             3 cat
3 Kyle       3             3 cat
4 Nubs      11             3 cat
5 Marvin    11             1 sheep/ram
6 Figaro     3             2 cat
7 Slim       6             2 dog
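As a reminder of the verb form recalled earlier, here is a minimal sketch using more_pets. The verbs filter() and arrange(), and the result names older_pets and older_pets_sorted, are chosen only for illustration and are not part of this section's material.

```r
library(dplyr)

more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  type_of_animal = c("dog", rep("cat", 3), "sheep/ram", "cat", "dog")
)

# Verb form: the data comes first, then arguments describing what to do
older_pets <- filter(more_pets, ages > 5)

# The same call written with a pipe, then chained into a second verb
older_pets_sorted <- more_pets %>%
  filter(ages > 5) %>%
  arrange(desc(ages))
```

Both versions keep the same four pets; the piped version reads top to bottom as a sequence of steps.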
dplyr provides intuitive and powerful ways to summarize data.

summarize()
The summarize() function takes a very similar form to the other dplyr functions.
# A tibble: 1 × 1
  years_lived
        <dbl>
1          45
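The call that produced this output is not shown in the text. Since the ages in more_pets add up to 45, one call that reproduces it is sketched below; the use of sum() is an assumption, and the result name years is hypothetical.

```r
library(dplyr)

more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  type_of_animal = c("dog", rep("cat", 3), "sheep/ram", "cat", "dog")
)

# summarize() collapses the whole tibble to one row;
# each argument is new_column_name = summary_expression
years <- more_pets %>%
  summarize(years_lived = sum(ages))
```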
summarize()
Recall the quadratic mean:
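The formula itself does not appear in this text. For reference, the usual definition of the quadratic mean of values x_1, ..., x_n is the square root of the mean of the squared values; the symbol Q is just a label chosen here.

```latex
Q(x) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^{2}}
```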
# Quadratic mean: square root of the mean of the squared values
quad_mean <- function(x) {
  sqrt(sum(x^2) / length(x))
}
more_pets %>%
  summarize(q_mean = quad_mean(ages),
            a_mean = mean(ages))
# A tibble: 1 × 2
  q_mean a_mean
   <dbl>  <dbl>
1   7.14   6.43
group_by()
The group_by() function allows us to group our tibble by a variable of interest.
group_by() on its own does not change the rows or columns of the tibble; it just makes it “grouped”.
When we summarize() “grouped” data, we get the results for each group.
# A tibble: 3 × 2
  type_of_animal number_of_types
  <chr>                    <int>
1 cat                          4
2 dog                          2
3 sheep/ram                    1
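The code behind this output is not shown. The sketch below first checks that group_by() alone leaves the rows and columns untouched, then reproduces the per-group counts. Counting with n() is an assumption about how the original was written, and the names grouped and type_counts are hypothetical.

```r
library(dplyr)

more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  type_of_animal = c("dog", rep("cat", 3), "sheep/ram", "cat", "dog")
)

# Grouping only attaches group information; the data is unchanged
grouped <- more_pets %>% group_by(type_of_animal)
dim(grouped)         # still 7 rows and 4 columns
group_vars(grouped)  # "type_of_animal"

# Summarizing grouped data returns one row per group
type_counts <- grouped %>%
  summarize(number_of_types = n())
```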
n()
The n() function counts for us!
With n(), nothing ever goes in the parentheses.
We use n() within a summarize() or mutate() on a grouped dataset. n() counts the rows in each group for us.
# A tibble: 7 × 5
# Groups:   type_of_animal [3]
  names   ages meals_per_day type_of_animal other_of_same
  <chr>  <dbl>         <dbl> <chr>                  <dbl>
1 Dude       6             2 dog                        1
2 Pickle     5             3 cat                        3
3 Kyle       3             3 cat                        3
4 Nubs      11             3 cat                        3
5 Marvin    11             1 sheep/ram                  0
6 Figaro     3             2 cat                        3
7 Slim       6             2 dog                        1
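The call that created other_of_same is not shown. A sketch that reproduces the column, assuming it was computed as the group size minus one (the result name with_others is hypothetical):

```r
library(dplyr)

more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  type_of_animal = c("dog", rep("cat", 3), "sheep/ram", "cat", "dog")
)

# Inside mutate() on grouped data, n() is the size of the current row's group,
# so n() - 1 is the number of other pets of the same type
with_others <- more_pets %>%
  group_by(type_of_animal) %>%
  mutate(other_of_same = n() - 1)
```

Unlike summarize(), mutate() keeps all seven rows and leaves the tibble grouped.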
# A tibble: 1 × 1
  n_types
    <int>
1       3
The n_distinct() function
Similar to n(), but takes columns as inputs, and counts the number of distinct values in them.
# A tibble: 3 × 2
  type_of_animal unique_ages
  <chr>                <int>
1 cat                      3
2 dog                      1
3 sheep/ram                1
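The output above is consistent with counting the distinct ages within each animal type. A minimal sketch, assuming that is what the original call did (the result name age_counts is hypothetical):

```r
library(dplyr)

more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  type_of_animal = c("dog", rep("cat", 3), "sheep/ram", "cat", "dog")
)

# n_distinct() takes a column and counts its unique values;
# on grouped data the count is done separately for each group
age_counts <- more_pets %>%
  group_by(type_of_animal) %>%
  summarize(unique_ages = n_distinct(ages))
```

The four cats have ages 5, 3, 11 and 3, so they contribute three distinct ages; the two dogs are both 6, so they contribute one.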
across() columns
across() takes two main arguments: .cols and .fns.
The .cols argument specifies the columns we’d like to apply our function, .fns, to.
We use across() within a mutate() or a summarize().
# A tibble: 7 × 4
  names   ages meals_per_day type_of_animal
  <chr>  <int>         <int> <chr>
1 Dude       6             2 dog
2 Pickle     5             3 cat
3 Kyle       3             3 cat
4 Nubs      11             3 cat
5 Marvin    11             1 sheep/ram
6 Figaro     3             2 cat
7 Slim       6             2 dog
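In the output above, ages and meals_per_day have type <int> rather than <dbl>. The call that produced it is not shown; one sketch that gives the same result, assuming as.integer() was the function applied (the result name pets_int is hypothetical):

```r
library(dplyr)

more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  type_of_animal = c("dog", rep("cat", 3), "sheep/ram", "cat", "dog")
)

# Apply one function (.fns) to several columns (.cols) inside mutate();
# the selected columns are overwritten with the converted values
pets_int <- more_pets %>%
  mutate(across(.cols = c(ages, meals_per_day), .fns = as.integer))
```

Without across(), the same conversion would need one mutate() argument per column.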
more_pets %>%
  summarize(
    across(.cols = c(ages, meals_per_day),
           .fns = c(mean = mean, standard_deviation = sd))
  )
# A tibble: 1 × 4
  ages_mean ages_standard_deviation meals_per_day_mean meals_per_day_standard_…¹
      <dbl>                   <dbl>              <dbl>                     <dbl>
1      6.43                    3.36               2.29                     0.756
# ℹ abbreviated name: ¹meals_per_day_standard_deviation
more_pets %>%
  group_by(type_of_animal) %>%
  summarize(
    across(.cols = c(ages, meals_per_day),
           .fns = c(mean = mean, standard_deviation = sd))
  )
# A tibble: 3 × 5
  type_of_animal ages_mean ages_standard_deviation meals_per_day_mean
  <chr>              <dbl>                   <dbl>              <dbl>
1 cat                  5.5                    3.79               2.75
2 dog                  6                      0                  2
3 sheep/ram           11                     NA                  1
# ℹ 1 more variable: meals_per_day_standard_deviation <dbl>
across()
Consider the toy dataset:
# A tibble: 1 × 2
    dbh height
  <dbl>  <dbl>
1    NA     39
The dbh result is NA :-(
# A tibble: 1 × 2
    dbh height
  <dbl>  <dbl>
1    12     39
Before, we wrote .fns = mean, but with lambda syntax we can specify additional arguments in the newfound parentheses.
Use ~ to specify lambda syntax.
After the ~, we can now put parentheses after the function and specify additional arguments.
# A tibble: 1 × 2
    dbh height
  <dbl>  <dbl>
1    12     39
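The toy dataset itself is not shown in this text, so the sketch below uses a hypothetical tibble named trees whose values were picked to give the same summaries (dbh is NA, then 12; height is 39). It compares passing mean directly with the lambda form that adds na.rm = TRUE.

```r
library(dplyr)

# Hypothetical stand-in for the toy dataset: one dbh value is missing
trees <- tibble(
  dbh = c(10, NA, 14),
  height = c(35, 39, 43)
)

# Passing the bare function: mean() returns NA when a value is missing
plain <- trees %>%
  summarize(across(.cols = c(dbh, height), .fns = mean))

# Lambda syntax: ~ starts the lambda, .x stands for the current column,
# and extra arguments such as na.rm go inside the parentheses
fixed <- trees %>%
  summarize(across(.cols = c(dbh, height), .fns = ~ mean(.x, na.rm = TRUE)))
```

With the bare function, across() fills in the column for us; with the lambda we write .x ourselves, which is what makes room for the extra argument.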
Use .x in the place where the columns go (before, this happened implicitly).
dplyr