dplyr verbs and pipes (%>%)

Recall the standard dplyr verb form, verb(data, ...). The first argument of a dplyr verb is the data (a tibble or data.frame), and the next arguments specify how we are using the verb.

library(tidyverse)
more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)
more_pets
# A tibble: 7 × 4
names ages meals_per_day is_dog
<chr> <dbl> <dbl> <lgl>
1 Dude 6 2 TRUE
2 Pickle 5 3 FALSE
3 Kyle 3 3 FALSE
4 Nubs 11 3 FALSE
5 Marvin 11 1 FALSE
6 Figaro 3 2 FALSE
7 Slim 6 2 TRUE
We’ll use more_pets throughout today’s lecture.
relocate()

relocate() moves columns around in a tibble.

Move the ages column to the front of the tibble:

# A tibble: 7 × 4
ages names meals_per_day is_dog
<dbl> <chr> <dbl> <lgl>
1 6 Dude 2 TRUE
2 5 Pickle 3 FALSE
3 3 Kyle 3 FALSE
4 11 Nubs 3 FALSE
5 11 Marvin 1 FALSE
6 3 Figaro 2 FALSE
7 6 Slim 2 TRUE
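A minimal sketch of a call that produces the output above, assuming only that dplyr is installed (the lecture loads all of tidyverse). The data goes first and the columns to move come next. relocate() also accepts several columns at once and places them in the order listed. The object names ages_first and dog_then_ages are just for illustration.

```r
library(dplyr)

# Recreate more_pets so this block runs on its own
more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Move ages to the front; the data is the first argument
ages_first <- relocate(more_pets, ages)
names(ages_first)
# ages, names, meals_per_day, is_dog

# Several columns can be moved together, in the order they are listed
dog_then_ages <- relocate(more_pets, is_dog, ages)
names(dog_then_ages)
# is_dog, ages, names, meals_per_day
```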
Try it with relocate(): move the ages column to the end of the tibble.
Hint: this is similar to how we removed columns with select().
Move the names column after ages:

# A tibble: 7 × 4
ages names meals_per_day is_dog
<dbl> <chr> <dbl> <lgl>
1 6 Dude 2 TRUE
2 5 Pickle 3 FALSE
3 3 Kyle 3 FALSE
4 11 Nubs 3 FALSE
5 11 Marvin 1 FALSE
6 3 Figaro 2 FALSE
7 6 Slim 2 TRUE
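Possible answers to the relocate() exercises, written as a sketch rather than the lecture's own code. Following the select() hint, -ages picks every column except ages, and relocate() moves that selection to the front, so ages ends up last. The .after and .before arguments place the moved columns next to a given column; last_col() is a tidyselect helper that dplyr re-exports.

```r
library(dplyr)

more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# ages to the end: move "everything except ages" to the front
ages_last <- relocate(more_pets, -ages)

# Same result, stated explicitly with .after and last_col()
ages_last2 <- relocate(more_pets, ages, .after = last_col())

# names placed directly after ages (matches the output above)
names_after_ages <- relocate(more_pets, names, .after = ages)

# is_dog placed directly before ages
is_dog_before_ages <- relocate(more_pets, is_dog, .before = ages)
names(is_dog_before_ages)
# names, is_dog, ages, meals_per_day
```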
Note: the argument is .after, not after.

Move the is_dog column before ages.

mutate()

mutate() creates new columns and adds them to the right side of an existing tibble.

Add a column called birth_year to more_pets:

# A tibble: 7 × 5
names ages meals_per_day is_dog birth_year
<chr> <dbl> <dbl> <lgl> <dbl>
1 Dude 6 2 TRUE 2018
2 Pickle 5 3 FALSE 2019
3 Kyle 3 3 FALSE 2021
4 Nubs 11 3 FALSE 2013
5 Marvin 11 1 FALSE 2013
6 Figaro 3 2 FALSE 2021
7 Slim 6 2 TRUE 2018
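A quick check, as a sketch: calling mutate() without saving the result returns a new tibble and leaves the original untouched. Like the lecture, this uses 2024 as the current year; with_birth_year is an illustrative name.

```r
library(dplyr)

more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# mutate() returns a new tibble with the extra column...
with_birth_year <- mutate(more_pets, birth_year = 2024 - ages)

# ...but more_pets itself still has its original 4 columns
ncol(more_pets)        # 4
ncol(with_birth_year)  # 5
```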
Did this change more_pets? Not yet: to keep birth_year, we assign the result back to more_pets.

more_pets <- mutate(more_pets,
                    birth_year = 2024 - ages)
# now the column is added to more_pets
more_pets
# A tibble: 7 × 5
names ages meals_per_day is_dog birth_year
<chr> <dbl> <dbl> <lgl> <dbl>
1 Dude 6 2 TRUE 2018
2 Pickle 5 3 FALSE 2019
3 Kyle 3 3 FALSE 2021
4 Nubs 11 3 FALSE 2013
5 Marvin 11 1 FALSE 2013
6 Figaro 3 2 FALSE 2021
7 Slim 6 2 TRUE 2018
Adding a meals_per_year column (meals_per_day * 365):

# A tibble: 7 × 6
names ages meals_per_day is_dog birth_year meals_per_year
<chr> <dbl> <dbl> <lgl> <dbl> <dbl>
1 Dude 6 2 TRUE 2018 730
2 Pickle 5 3 FALSE 2019 1095
3 Kyle 3 3 FALSE 2021 1095
4 Nubs 11 3 FALSE 2013 1095
5 Marvin 11 1 FALSE 2013 365
6 Figaro 3 2 FALSE 2021 730
7 Slim 6 2 TRUE 2018 730
dplyr’s pipe (%>%)

We can use dplyr’s pipe (%>%) to chain together dplyr verbs. The pipe puts the dataset before it into the first argument of the following function.
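A sketch of the pipe in action, assuming meals_per_year is meals_per_day * 365 (which matches the meals_per_year output shown earlier). The two calls below build the same tibble; the pipe simply fills in mutate()'s first argument. The last call chains two verbs so the code reads top to bottom. The object names are illustrative.

```r
library(dplyr)

more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)
more_pets <- mutate(more_pets, birth_year = 2024 - ages)

# Without the pipe: more_pets is passed explicitly as the first argument
no_pipe <- mutate(more_pets, meals_per_year = meals_per_day * 365)

# With the pipe: %>% passes more_pets into mutate()'s first argument
with_pipe <- more_pets %>% mutate(meals_per_year = meals_per_day * 365)

identical(no_pipe, with_pipe)  # TRUE

# Chaining: the output of each step becomes the input to the next
chained <- more_pets %>%
  mutate(meals_per_year = meals_per_day * 365) %>%
  relocate(meals_per_year, .after = names)
names(chained)
# names, meals_per_year, ages, meals_per_day, is_dog, birth_year
```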
A close relative of mutate(): transmute()

mutate() adds a column and keeps all the previous columns from the tibble.
transmute(), on the other hand, adds a column and drops every existing column that is not named in the call.

Use transmute() to add a column called age_dogyears to more_pets.

Use transmute() to add a column called age_dogyears to more_pets, but keep the names.

What do you think will happen?
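One way the two transmute() exercises could be answered, as a sketch. The factor of 7 dog years per year is an assumption for illustration only; the lecture does not say which conversion it uses. Listing names on its own inside transmute() is what keeps that column. Note that dplyr 1.1.0 marks transmute() as superseded in favour of mutate(.keep = "none"), although it still works.

```r
library(dplyr)

more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Only the new column survives; every existing column is dropped
# (7 is an assumed dog-years factor, not taken from the lecture)
dogyears_only <- transmute(more_pets, age_dogyears = ages * 7)
names(dogyears_only)
# age_dogyears

# Naming an existing column keeps it alongside the new one
dogyears_with_names <- transmute(more_pets, names, age_dogyears = ages * 7)
names(dogyears_with_names)
# names, age_dogyears
```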
Add a column called post_pandemic that indicates whether or not a pet was born after 2020.

# A tibble: 7 × 5
names ages meals_per_day is_dog birth_year
<chr> <dbl> <dbl> <lgl> <dbl>
1 Dude 6 2 TRUE 2018
2 Pickle 5 3 FALSE 2019
3 Kyle 3 3 FALSE 2021
4 Nubs 11 3 FALSE 2013
5 Marvin 11 1 FALSE 2013
6 Figaro 3 2 FALSE 2021
7 Slim 6 2 TRUE 2018
case_when()

The case_when() function allows us to add a column based on logical conditions.

When we use case_when() within a mutate(), the form is something like mutate(new_column = case_when(condition_1 ~ value_1, condition_2 ~ value_2)), where condition_1 and condition_2 are logical statements, and each row gets the value paired with the first condition that is TRUE for it.

Add a column called post_pandemic that indicates whether or not a pet was born after 2020:

more_pets %>%
  mutate(post_pandemic = case_when(
    birth_year > 2020 ~ TRUE,
    birth_year <= 2020 ~ FALSE
  ))
# A tibble: 7 × 6
names ages meals_per_day is_dog birth_year post_pandemic
<chr> <dbl> <dbl> <lgl> <dbl> <lgl>
1 Dude 6 2 TRUE 2018 FALSE
2 Pickle 5 3 FALSE 2019 FALSE
3 Kyle 3 3 FALSE 2021 TRUE
4 Nubs 11 3 FALSE 2013 FALSE
5 Marvin 11 1 FALSE 2013 FALSE
6 Figaro 3 2 FALSE 2021 TRUE
7 Slim 6 2 TRUE 2018 FALSE
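Because each case_when() branch above just returns TRUE or FALSE, the comparison birth_year > 2020 already produces the same logical column. A sketch confirming that the two versions match; case_when() earns its place when there are more than two outcomes or the outcomes are not simply TRUE/FALSE.

```r
library(dplyr)

more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
) %>%
  mutate(birth_year = 2024 - ages)

# The case_when() version from the lecture
via_case_when <- more_pets %>%
  mutate(post_pandemic = case_when(
    birth_year > 2020 ~ TRUE,
    birth_year <= 2020 ~ FALSE
  ))

# A comparison already returns TRUE/FALSE for each row
via_comparison <- more_pets %>%
  mutate(post_pandemic = birth_year > 2020)

identical(via_case_when, via_comparison)  # TRUE
```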
Add a column called type_of_animal that tells us what type of animal each pet is.

# A tibble: 7 × 6
names ages meals_per_day is_dog birth_year type_of_animal
<chr> <dbl> <dbl> <lgl> <dbl> <chr>
1 Dude 6 2 TRUE 2018 dog
2 Pickle 5 3 FALSE 2019 cat
3 Kyle 3 3 FALSE 2021 cat
4 Nubs 11 3 FALSE 2013 cat
5 Marvin 11 1 FALSE 2013 sheep/ram
6 Figaro 3 2 FALSE 2021 cat
7 Slim 6 2 TRUE 2018 dog
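A sketch of one way to build type_of_animal. The data has no species column, so this assigns the non-dog species by hand: Marvin is hard-coded as "sheep/ram" and every other non-dog is treated as a cat. case_when() checks the conditions in order and uses the first one that is TRUE; the final TRUE ~ "cat" acts as a catch-all (dplyr 1.1.0 and later also offer a .default argument for this).

```r
library(dplyr)

more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

more_pets <- more_pets %>%
  mutate(birth_year = 2024 - ages) %>%
  mutate(type_of_animal = case_when(
    names == "Marvin" ~ "sheep/ram",  # hard-coded: no species column exists
    is_dog ~ "dog",                   # is_dog is already TRUE/FALSE
    TRUE ~ "cat"                      # catch-all for every remaining row
  ))

more_pets$type_of_animal
# dog, cat, cat, cat, sheep/ram, cat, dog
```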
dplyr provides intuitive and powerful ways to summarize data.

summarize()

The summarize() function takes a very similar form to the other dplyr functions: the data comes first, followed by the summaries we want to compute.

group_by()

The group_by() function allows us to group our tibble by a variable of interest.
On its own, group_by() does not change the rows or columns of the tibble; it just makes it “grouped”.
When we summarize() “grouped” data, we get the results for each group.

# A tibble: 7 × 6
names ages meals_per_day is_dog birth_year type_of_animal
<chr> <dbl> <dbl> <lgl> <dbl> <chr>
1 Dude 6 2 TRUE 2018 dog
2 Pickle 5 3 FALSE 2019 cat
3 Kyle 3 3 FALSE 2021 cat
4 Nubs 11 3 FALSE 2013 cat
5 Marvin 11 1 FALSE 2013 sheep/ram
6 Figaro 3 2 FALSE 2021 cat
7 Slim 6 2 TRUE 2018 dog
# A tibble: 7 × 6
# Groups: type_of_animal [3]
names ages meals_per_day is_dog birth_year type_of_animal
<chr> <dbl> <dbl> <lgl> <dbl> <chr>
1 Dude 6 2 TRUE 2018 dog
2 Pickle 5 3 FALSE 2019 cat
3 Kyle 3 3 FALSE 2021 cat
4 Nubs 11 3 FALSE 2013 cat
5 Marvin 11 1 FALSE 2013 sheep/ram
6 Figaro 3 2 FALSE 2021 cat
7 Slim 6 2 TRUE 2018 dog
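A sketch of summarize() with and without group_by(). To run on its own, it rebuilds type_of_animal to match the output above, with Marvin hard-coded as "sheep/ram" and the other non-dogs as cats (an assumption, since the data has no species column). mean() and n() are standard summaries; mean_age and n_pets are names chosen for illustration.

```r
library(dplyr)

more_pets <- tibble(
  names = c("Dude", "Pickle", "Kyle", "Nubs", "Marvin", "Figaro", "Slim"),
  ages = c(6, 5, 3, 11, 11, 3, 6),
  meals_per_day = c(2, 3, 3, 3, 1, 2, 2),
  is_dog = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
) %>%
  mutate(
    birth_year = 2024 - ages,
    type_of_animal = case_when(
      names == "Marvin" ~ "sheep/ram",  # assumption: hard-coded species
      is_dog ~ "dog",
      TRUE ~ "cat"
    )
  )

# Ungrouped: a single row summarizing all seven pets
overall <- more_pets %>% summarize(mean_age = mean(ages), n_pets = n())

# group_by() alone keeps the same 7 rows and 6 columns; it only adds grouping
grouped <- more_pets %>% group_by(type_of_animal)
n_groups(grouped)  # 3

# Grouped: one row per type_of_animal (cat, dog, sheep/ram)
by_type <- grouped %>% summarize(mean_age = mean(ages), n_pets = n())
by_type
```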
dplyr