dplyr verbs and pipes (%>%)

Recall the standard dplyr verb form: the first argument of a dplyr verb is the data (a tibble or data.frame), and the next arguments specify how we are using the verb().

library(tidyverse)
more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)
more_pets
# A tibble: 7 × 4
names ages meals_per_day is_dog
<chr> <dbl> <dbl> <lgl>
1 Dude 6 2 TRUE
2 Pickle 5 3 FALSE
3 Kyle 3 3 FALSE
4 Nubs 11 3 FALSE
5 Marvin 11 1 FALSE
6 Figaro 3 2 FALSE
7 Slim 6 2 TRUE
We’ll use more_pets throughout the lecture today.
relocate()

relocate() moves columns around in a tibble.

Move the ages column to the front of the tibble:

# A tibble: 7 × 4
ages names meals_per_day is_dog
<dbl> <chr> <dbl> <lgl>
1 6 Dude 2 TRUE
2 5 Pickle 3 FALSE
3 3 Kyle 3 FALSE
4 11 Nubs 3 FALSE
5 11 Marvin 1 FALSE
6 3 Figaro 2 FALSE
7 6 Slim 2 TRUE
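The call that produced the output above is not shown. A minimal sketch that reproduces it, assuming more_pets is defined as at the start of the lecture: with no other arguments, relocate() moves the selected column to the front. The name ages_first is only used here for illustration.

```r
library(tidyverse)

# Same data as the lecture's more_pets tibble
more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# With no .before/.after argument, the selected column moves to the front
ages_first <- relocate(more_pets, ages)
names(ages_first)
```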
Move the ages column to the end of the tibble. Hint: this is similar to how we removed columns with select().

Move the names column after ages:
# A tibble: 7 × 4
ages names meals_per_day is_dog
<dbl> <chr> <dbl> <lgl>
1 6 Dude 2 TRUE
2 5 Pickle 3 FALSE
3 3 Kyle 3 FALSE
4 11 Nubs 3 FALSE
5 11 Marvin 1 FALSE
6 3 Figaro 2 FALSE
7 6 Slim 2 TRUE
Note that the argument is .after, not after.
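As a sketch of the .after argument in use, the following places names immediately after ages. It assumes the original more_pets tibble; names_after_ages is an illustrative name.

```r
library(tidyverse)

# Same data as the lecture's more_pets tibble
more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# .after (with the leading dot) names the column that `names` should follow
names_after_ages <- relocate(more_pets, names, .after = ages)
names(names_after_ages)
```

The leading dot marks .after as an argument of relocate() itself rather than a column selection.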
Move the is_dog column before ages.
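One possible solution sketch for the two remaining relocate() exercises, assuming the original more_pets tibble. The result names are illustrative, and the two ways of moving ages to the end are offered as options rather than as the lecture's official answer.

```r
library(tidyverse)

# Same data as the lecture's more_pets tibble
more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Move is_dog so that it sits immediately before ages
is_dog_before_ages <- relocate(more_pets, is_dog, .before = ages)
names(is_dog_before_ages)

# Move ages to the end: select everything except ages (as with select()),
# so all the other columns move to the front
ages_last_minus <- relocate(more_pets, -ages)

# An alternative that states the target position explicitly
ages_last_explicit <- relocate(more_pets, ages, .after = last_col())
names(ages_last_explicit)
```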
mutate()

mutate() creates new columns and adds them to the right side of an existing tibble.

Add a column called birth_year to more_pets:
# A tibble: 7 × 5
names ages meals_per_day is_dog birth_year
<chr> <dbl> <dbl> <lgl> <dbl>
1 Dude 6 2 TRUE 2018
2 Pickle 5 3 FALSE 2019
3 Kyle 3 3 FALSE 2021
4 Nubs 11 3 FALSE 2013
5 Marvin 11 1 FALSE 2013
6 Figaro 3 2 FALSE 2021
7 Slim 6 2 TRUE 2018
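A short sketch of how the table above can be produced, and of the fact that mutate() returns a new tibble rather than modifying its input. It assumes the original more_pets tibble and the reference year 2024 used in this lecture; with_birth_year is an illustrative name.

```r
library(tidyverse)

# Same data as the lecture's more_pets tibble
more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# mutate() returns a new tibble with birth_year added on the right
with_birth_year <- mutate(more_pets, birth_year = 2024 - ages)
names(with_birth_year)

# The original tibble is untouched because the result was stored elsewhere
names(more_pets)
```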
Does this change more_pets? Not until we assign the result back to more_pets:

more_pets <- mutate(more_pets,
                    birth_year = 2024 - ages)
# now the column is added to more_pets
more_pets
# A tibble: 7 × 5
names ages meals_per_day is_dog birth_year
<chr> <dbl> <dbl> <lgl> <dbl>
1 Dude 6 2 TRUE 2018
2 Pickle 5 3 FALSE 2019
3 Kyle 3 3 FALSE 2021
4 Nubs 11 3 FALSE 2013
5 Marvin 11 1 FALSE 2013
6 Figaro 3 2 FALSE 2021
7 Slim 6 2 TRUE 2018
# A tibble: 7 × 6
names ages meals_per_day is_dog birth_year meals_per_year
<chr> <dbl> <dbl> <lgl> <dbl> <dbl>
1 Dude 6 2 TRUE 2018 730
2 Pickle 5 3 FALSE 2019 1095
3 Kyle 3 3 FALSE 2021 1095
4 Nubs 11 3 FALSE 2013 1095
5 Marvin 11 1 FALSE 2013 365
6 Figaro 3 2 FALSE 2021 730
7 Slim 6 2 TRUE 2018 730
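The code behind the meals_per_year table is not shown. The values are consistent with multiplying meals_per_day by 365, so the sketch below assumes that formula. It rebuilds more_pets with birth_year so that it runs on its own.

```r
library(tidyverse)

# more_pets as it stands at this point in the lecture (birth_year included)
more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
) %>%
  mutate(birth_year = 2024 - ages)

# Assumed formula: 365 days per year, inferred from the printed values
with_meals <- mutate(more_pets, meals_per_year = meals_per_day * 365)
with_meals
```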
dplyr’s pipe (%>%)

The pipe (%>%) puts the dataset before it into the first argument of the following function. It is commonly used to chain dplyr verbs.

Example:
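The original example did not survive, so here is a minimal sketch of the idea using the lecture's data: a piped call and the equivalent nested call give the same result, and pipes can be chained so that each verb receives the output of the previous one. The names piped, nested, and chained are illustrative.

```r
library(tidyverse)

# Same data as the lecture's more_pets tibble
more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# These two calls are equivalent: the pipe supplies more_pets
# as the first argument of mutate()
piped <- more_pets %>% mutate(birth_year = 2024 - ages)
nested <- mutate(more_pets, birth_year = 2024 - ages)
all.equal(piped, nested)

# Chaining: the result of mutate() becomes the first argument of relocate()
chained <- more_pets %>%
  mutate(birth_year = 2024 - ages) %>%
  relocate(birth_year, .after = ages)
names(chained)
```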
mutate() and transmute()

mutate() adds a column and keeps all the previous columns from the tibble. transmute(), on the other hand, adds a column and removes all the other columns from the tibble.
to add a logical column called age_dogyears
to more_pets
transmute()
to add a column called age_dogyears
to more_pets
transmute()
to add a column called age_dogyears
to more_pets
, but keep the names
.What do you think will happen?
What do you think will happen?
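A possible sketch for these exercises. The lecture does not give the dog-years conversion, so the factor of 7 below is an assumption made only for illustration, as are the result names.

```r
library(tidyverse)

# Same data as the lecture's more_pets tibble
more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Assumed conversion: 1 year = 7 "dog years" (not stated in the lecture)
only_dogyears <- more_pets %>%
  transmute(age_dogyears = ages * 7)
names(only_dogyears)

# Listing an existing column inside transmute() keeps it in the result
names_and_dogyears <- more_pets %>%
  transmute(names, age_dogyears = ages * 7)
names(names_and_dogyears)
```

Only the columns mentioned inside transmute() survive, which is why names has to be listed explicitly in the second call.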
Suppose we want to add a logical column called post_pandemic that indicates whether or not a pet was born after 2020.

# A tibble: 7 × 5
names ages meals_per_day is_dog birth_year
<chr> <dbl> <dbl> <lgl> <dbl>
1 Dude 6 2 TRUE 2018
2 Pickle 5 3 FALSE 2019
3 Kyle 3 3 FALSE 2021
4 Nubs 11 3 FALSE 2013
5 Marvin 11 1 FALSE 2013
6 Figaro 3 2 FALSE 2021
7 Slim 6 2 TRUE 2018
case_when()

The case_when() function allows us to add a column based on logical conditions. When using case_when() within a mutate(), each case takes the form condition ~ value, where the conditions (condition_1, condition_2, and so on) are logical statements.

Using case_when() to add the post_pandemic column:

more_pets %>%
  mutate(post_pandemic = case_when(
    birth_year > 2020 ~ TRUE,
    birth_year <= 2020 ~ FALSE
  ))
# A tibble: 7 × 6
names ages meals_per_day is_dog birth_year post_pandemic
<chr> <dbl> <dbl> <lgl> <dbl> <lgl>
1 Dude 6 2 TRUE 2018 FALSE
2 Pickle 5 3 FALSE 2019 FALSE
3 Kyle 3 3 FALSE 2021 TRUE
4 Nubs 11 3 FALSE 2013 FALSE
5 Marvin 11 1 FALSE 2013 FALSE
6 Figaro 3 2 FALSE 2021 TRUE
7 Slim 6 2 TRUE 2018 FALSE
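Because both branches here return a logical value, the same column can also be written as a plain comparison. The sketch below checks that the two approaches agree on this data; the result names are illustrative, and more_pets is rebuilt with birth_year so the block runs on its own.

```r
library(tidyverse)

# more_pets as it stands at this point in the lecture (birth_year included)
more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
) %>%
  mutate(birth_year = 2024 - ages)

# case_when() version, as in the lecture
with_case_when <- more_pets %>%
  mutate(post_pandemic = case_when(
    birth_year > 2020 ~ TRUE,
    birth_year <= 2020 ~ FALSE
  ))

# Equivalent shortcut: the comparison is already a logical vector
with_comparison <- more_pets %>%
  mutate(post_pandemic = birth_year > 2020)

all.equal(with_case_when, with_comparison)
```

case_when() becomes more useful once there are more than two outcomes or the values are not logical. Rows that match none of the conditions get NA.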
Add a column called type_of_animal that tells us what type of animal each of the pets is.

# A tibble: 7 × 6
names ages meals_per_day is_dog birth_year type_of_animal
<chr> <dbl> <dbl> <lgl> <dbl> <chr>
1 Dude 6 2 TRUE 2018 dog
2 Pickle 5 3 FALSE 2019 cat
3 Kyle 3 3 FALSE 2021 cat
4 Nubs 11 3 FALSE 2013 cat
5 Marvin 11 1 FALSE 2013 sheep/ram
6 Figaro 3 2 FALSE 2021 cat
7 Slim 6 2 TRUE 2018 dog
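The code for type_of_animal is not shown, and more_pets has no column that identifies Marvin as a sheep/ram. The sketch below therefore assumes one way to reproduce the table: use is_dog for the dogs, match Marvin by name, and treat every remaining pet as a cat.

```r
library(tidyverse)

# more_pets as it stands at this point in the lecture (birth_year included)
more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
) %>%
  mutate(birth_year = 2024 - ages)

# Conditions are checked in order; the first TRUE one decides the value.
# The Marvin rule and the "cat" fallback are assumptions for illustration.
more_pets <- more_pets %>%
  mutate(type_of_animal = case_when(
    is_dog ~ "dog",
    names == "Marvin" ~ "sheep/ram",
    TRUE ~ "cat"
  ))
more_pets
```

The final TRUE ~ "cat" line acts as a catch-all for any row that none of the earlier conditions matched.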
Summarizing data with dplyr

dplyr provides intuitive and powerful ways to summarize data.

summarize()

The summarize() function takes a very similar form to the other dplyr functions.

group_by()

The group_by() function allows us to group our tibble by a variable of interest. On its own, group_by() does not change the rows or columns of the tibble; it just makes it “grouped”. When we summarize() “grouped” data, we get the results for each group.

# A tibble: 7 × 6
names ages meals_per_day is_dog birth_year type_of_animal
<chr> <dbl> <dbl> <lgl> <dbl> <chr>
1 Dude 6 2 TRUE 2018 dog
2 Pickle 5 3 FALSE 2019 cat
3 Kyle 3 3 FALSE 2021 cat
4 Nubs 11 3 FALSE 2013 cat
5 Marvin 11 1 FALSE 2013 sheep/ram
6 Figaro 3 2 FALSE 2021 cat
7 Slim 6 2 TRUE 2018 dog
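As a sketch of summarize() on the ungrouped tibble shown above: the data comes first, followed by name = expression pairs, and the result collapses to a single row. The summary names mean_age, total_meals_per_day, and n_pets are chosen here for illustration.

```r
library(tidyverse)

# Same data as the lecture's more_pets tibble (only the columns needed here)
more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Without groups, summarize() returns one row for the whole tibble
pet_summary <- more_pets %>%
  summarize(
    mean_age = mean(ages),
    total_meals_per_day = sum(meals_per_day),
    n_pets = n()
  )
pet_summary
```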
# A tibble: 7 × 6
# Groups: type_of_animal [3]
names ages meals_per_day is_dog birth_year type_of_animal
<chr> <dbl> <dbl> <lgl> <dbl> <chr>
1 Dude 6 2 TRUE 2018 dog
2 Pickle 5 3 FALSE 2019 cat
3 Kyle 3 3 FALSE 2021 cat
4 Nubs 11 3 FALSE 2013 cat
5 Marvin 11 1 FALSE 2013 sheep/ram
6 Figaro 3 2 FALSE 2021 cat
7 Slim 6 2 TRUE 2018 dog
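The grouped tibble above can be passed straight to summarize() to get one row per group. A minimal sketch, with type_of_animal entered directly from the table above so the block runs on its own, and with illustrative summary names mean_age and n_pets:

```r
library(tidyverse)

# Lecture data with type_of_animal copied from the printed table
more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  type_of_animal = c("dog", "cat", "cat", "cat", "sheep/ram", "cat", "dog")
)

# group_by() only marks the groups; summarize() then works within each group
by_type <- more_pets %>%
  group_by(type_of_animal) %>%
  summarize(
    mean_age = mean(ages),
    n_pets = n()
  )
by_type
```

The result has three rows, one each for cat, dog, and sheep/ram, instead of the single row that an ungrouped summarize() would return.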
dplyr