Data Wrangling with dplyr

FOR 128: Lab 7

Published

October 17, 2024

Welcome

Welcome to Lab 7! Today, we’ll focus on writing dplyr code. In particular, we will both use the verbs individually and “write a sentence” with the verbs by stringing them together with pipes.

Learning objectives

  • Use dplyr verbs together with pipes.

Deliverables (i.e., what to put in the lab drop box)

Upload your rendered PDF (lab_07.pdf) and Quarto (lab_07.qmd) documents to the lab drop box. Make sure the Quarto document renders properly to PDF.

Exercise 0

Load any packages you’ll need for this lab below.

Exercise 1

Create the following dataset and call it plots. The resulting tibble should look like this when printed:

plots
# A tibble: 5 × 6
   plot  tree   dbh  logs type  live 
  <dbl> <dbl> <dbl> <dbl> <chr> <lgl>
1     1     1  20.2   2   D     TRUE 
2     1     2  10.4   1   D     TRUE 
3     2     1   5     0.5 D     TRUE 
4     2     2  18    NA   C     FALSE
5     2     3  10.5   1.5 C     TRUE 
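
If building a tibble by hand is new to you, here is a minimal sketch of the pattern on made-up data. The seedlings tibble and its columns are hypothetical and not part of this lab; the point is only that each argument to tibble() becomes one column, and all columns need the same length.

```r
library(dplyr)  # dplyr re-exports tibble() from the tibble package

# Hypothetical data for illustration only (not the lab's plots tibble).
seedlings <- tibble(
  stand = c(1, 1, 2),                  # numeric column (<dbl>)
  height_cm = c(125, 80, NA),          # NA marks a missing measurement
  species = c("oak", "pine", "oak"),   # character column (<chr>)
  planted = c(TRUE, FALSE, TRUE)       # logical column (<lgl>)
)

seedlings
```

Printing the object by name, as on the last line, shows the dimensions and column types above the data.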

Exercise 2

Write some code to figure out the following features of plots:

  1. How many rows and columns?
  2. What are the column names?
  3. What is the data type of each column?
  4. Are there any NA values? If so, in which column?
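
The four questions above map onto a handful of inspection functions. The sketch below shows one possible set on a hypothetical seedlings tibble (made up for illustration); other functions, such as nrow(), ncol(), or str(), would work too.

```r
library(dplyr)

# Hypothetical data for illustration only (not the lab's plots tibble).
seedlings <- tibble(
  stand = c(1, 1, 2),
  height_cm = c(125, 80, NA),
  species = c("oak", "pine", "oak")
)

dims <- dim(seedlings)                   # number of rows, then columns
col_names <- names(seedlings)            # column names
glimpse(seedlings)                       # column types with a preview of values
na_counts <- colSums(is.na(seedlings))   # number of NA values in each column

dims
col_names
na_counts
```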

Exercise 3

Use a dplyr function to print all trees in plot 2.

# A tibble: 3 × 6
   plot  tree   dbh  logs type  live 
  <dbl> <dbl> <dbl> <dbl> <chr> <lgl>
1     2     1   5     0.5 D     TRUE 
2     2     2  18    NA   C     FALSE
3     2     3  10.5   1.5 C     TRUE 

Exercise 4

Use a dplyr function to print all trees in plot 2 that have dbh less than or equal to 10.

# A tibble: 1 × 6
   plot  tree   dbh  logs type  live 
  <dbl> <dbl> <dbl> <dbl> <chr> <lgl>
1     2     1     5   0.5 D     TRUE 
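
As a reminder of how row filtering on more than one condition works, here is a small sketch on a hypothetical seedlings tibble (not the lab data). Conditions separated by commas inside filter() must all be true for a row to be kept. I use the native pipe |>, which assumes R 4.1 or later; the %>% pipe works the same way here.

```r
library(dplyr)

# Hypothetical data for illustration only (not the lab's plots tibble).
seedlings <- tibble(
  stand = c(1, 1, 2, 2),
  height_cm = c(125, 80, 60, 150),
  crew = c("A", "B", "A", "B")
)

# Keep rows where BOTH conditions hold.
tall_stand2 <- seedlings |>
  filter(stand == 2, height_cm >= 100)

tall_stand2
```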

Exercise 5

Use a dplyr function to print the tree with the largest dbh.

# A tibble: 1 × 6
   plot  tree   dbh  logs type  live 
  <dbl> <dbl> <dbl> <dbl> <chr> <lgl>
1     1     1  20.2     2 D     TRUE 

Exercise 6

Use a series of piped dplyr functions to find the largest dbh tree on plot 2.

# A tibble: 1 × 6
   plot  tree   dbh  logs type  live 
  <dbl> <dbl> <dbl> <dbl> <chr> <lgl>
1     2     2    18    NA C     FALSE
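
"Writing a sentence" with pipes means each verb receives the result of the one before it. A minimal sketch on hypothetical data follows (seedlings is made up); it uses slice_max() to pick the top row, which is one of several options, since filter() with max() or arrange() followed by slice() would also do the job.

```r
library(dplyr)

# Hypothetical data for illustration only (not the lab's plots tibble).
seedlings <- tibble(
  stand = c(1, 1, 2, 2),
  height_cm = c(125, 80, 60, 150),
  crew = c("A", "B", "A", "B")
)

# Read as: take seedlings, then keep stand 1, then keep the tallest row.
tallest_stand1 <- seedlings |>
  filter(stand == 1) |>
  slice_max(height_cm, n = 1)

tallest_stand1
```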

Exercise 7

Use a series of piped dplyr functions to find the largest dbh live tree on plot 2.

# A tibble: 1 × 6
   plot  tree   dbh  logs type  live 
  <dbl> <dbl> <dbl> <dbl> <chr> <lgl>
1     2     3  10.5   1.5 C     TRUE 

Exercise 8

Use a series of piped dplyr functions to find the largest dbh dead tree on plot 2.

# A tibble: 1 × 6
   plot  tree   dbh  logs type  live 
  <dbl> <dbl> <dbl> <dbl> <chr> <lgl>
1     2     2    18    NA C     FALSE

Exercise 9

Use a series of piped dplyr functions to find the largest dbh live tree on plot 2 of type D.

# A tibble: 1 × 6
   plot  tree   dbh  logs type  live 
  <dbl> <dbl> <dbl> <dbl> <chr> <lgl>
1     2     1     5   0.5 D     TRUE 

Exercise 10

Use a series of piped dplyr functions to find the smallest dbh tree on plot 1.

# A tibble: 1 × 6
   plot  tree   dbh  logs type  live 
  <dbl> <dbl> <dbl> <dbl> <chr> <lgl>
1     1     2  10.4     1 D     TRUE 

Exercise 11

Use a dplyr function to add a new column to plots to hold each tree’s basal area (ft²). This new column should be called ba with values equal to 0.005454*dbh^2 (assuming dbh is in inches).

# A tibble: 5 × 7
   plot  tree   dbh  logs type  live     ba
  <dbl> <dbl> <dbl> <dbl> <chr> <lgl> <dbl>
1     1     1  20.2   2   D     TRUE  2.23 
2     1     2  10.4   1   D     TRUE  0.590
3     2     1   5     0.5 D     TRUE  0.136
4     2     2  18    NA   C     FALSE 1.77 
5     2     3  10.5   1.5 C     TRUE  0.601
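
For a reminder of how a new column is computed from an existing one, here is a sketch on a hypothetical seedlings tibble (the data and the height_m column are made up). Note that mutate() returns a new tibble, so the result has to be assigned if you want to keep it.

```r
library(dplyr)

# Hypothetical data for illustration only (not the lab's plots tibble).
seedlings <- tibble(
  stand = c(1, 1, 2, 2),
  height_cm = c(125, 80, 60, 150)
)

# Add height in meters, computed row by row from height_cm.
seedlings <- seedlings |>
  mutate(height_m = height_cm / 100)

seedlings
```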

Exercise 12

Use a dplyr function to move your newly created column ba to between the dbh and logs columns.

# A tibble: 5 × 7
   plot  tree   dbh    ba  logs type  live 
  <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <lgl>
1     1     1  20.2 2.23    2   D     TRUE 
2     1     2  10.4 0.590   1   D     TRUE 
3     2     1   5   0.136   0.5 D     TRUE 
4     2     2  18   1.77   NA   C     FALSE
5     2     3  10.5 0.601   1.5 C     TRUE 
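
Column order can be changed without touching the values. The sketch below uses relocate() with its .after argument on a hypothetical seedlings tibble (made up for illustration); the .before argument works the same way in the other direction.

```r
library(dplyr)

# Hypothetical data for illustration only (not the lab's plots tibble).
seedlings <- tibble(
  stand = c(1, 1, 2, 2),
  height_cm = c(125, 80, 60, 150),
  crew = c("A", "B", "A", "B"),
  height_m = height_cm / 100
)

# Move height_m so it sits immediately after stand.
reordered <- seedlings |>
  relocate(height_m, .after = stand)

names(reordered)
```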

Exercise 13

Use a series of piped dplyr functions to compute the mean dbh of the trees on each plot (plots 1 and 2). Note, I called my mean mean_dbh.

# A tibble: 2 × 2
   plot mean_dbh
  <dbl>    <dbl>
1     1     15.3
2     2     11.2

Exercise 14

Use a series of piped dplyr functions to compute plot-specific mean dbh and logs for trees. Exclude NA values from the mean calculations (hint, use the na.rm argument in mean()). Note, I called my means mean_dbh and mean_logs.

# A tibble: 2 × 3
   plot mean_dbh mean_logs
  <dbl>    <dbl>     <dbl>
1     1     15.3       1.5
2     2     11.2       1  
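
To see what na.rm changes in a grouped summary, here is a sketch on a hypothetical seedlings tibble with one missing height (the data are made up). Without na.rm = TRUE, any group containing an NA gets an NA mean.

```r
library(dplyr)

# Hypothetical data for illustration only (not the lab's plots tibble).
seedlings <- tibble(
  stand = c(1, 1, 2, 2),
  height_cm = c(125, 80, NA, 60)
)

stand_means <- seedlings |>
  group_by(stand) |>
  summarize(
    mean_with_na = mean(height_cm),                # NA for stand 2
    mean_height = mean(height_cm, na.rm = TRUE)    # NA dropped before averaging
  )

stand_means
```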

Exercise 15

Use a series of piped dplyr functions to compute plot-specific mean dbh and logs for live trees. Note, I called my means mean_dbh and mean_logs. Why did only the plot 2 mean dbh change from your solution to Exercise 14?

# A tibble: 2 × 3
   plot mean_dbh mean_logs
  <dbl>    <dbl>     <dbl>
1     1    15.3        1.5
2     2     7.75       1  

Exercise 16

Sort plots by increasing plot number and increasing dbh within plot.

# A tibble: 5 × 7
   plot  tree   dbh    ba  logs type  live 
  <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <lgl>
1     1     2  10.4 0.590   1   D     TRUE 
2     1     1  20.2 2.23    2   D     TRUE 
3     2     1   5   0.136   0.5 D     TRUE 
4     2     3  10.5 0.601   1.5 C     TRUE 
5     2     2  18   1.77   NA   C     FALSE
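
Sorting on two columns works from left to right: the second column only breaks ties in the first. A sketch on a hypothetical seedlings tibble (made up for illustration) is below; I wrap the second column in desc() to show how a decreasing sort is requested, which is a different ordering from the one this exercise asks for.

```r
library(dplyr)

# Hypothetical data for illustration only (not the lab's plots tibble).
seedlings <- tibble(
  stand = c(2, 1, 2, 1),
  height_cm = c(60, 80, 150, 125)
)

# Increasing stand, then decreasing height within each stand.
sorted <- seedlings |>
  arrange(stand, desc(height_cm))

sorted
```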

Exercise 17

The type column holds values “D” and “C” which stand for deciduous and conifer, respectively. Use mutate() and the case_when() function to change values in the type column from “D” and “C” to “deciduous” and “conifer”.

plots
# A tibble: 5 × 7
   plot  tree   dbh    ba  logs type      live 
  <dbl> <dbl> <dbl> <dbl> <dbl> <chr>     <lgl>
1     1     1  20.2 2.23    2   deciduous TRUE 
2     1     2  10.4 0.590   1   deciduous TRUE 
3     2     1   5   0.136   0.5 deciduous TRUE 
4     2     2  18   1.77   NA   conifer   FALSE
5     2     3  10.5 0.601   1.5 conifer   TRUE 
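
The general shape of case_when() inside mutate() is a list of condition ~ value pairs, checked in order. Here is a sketch on a hypothetical seedlings tibble (the crew codes and labels are made up). Rows that match no condition get NA, so it is worth checking that every code is covered.

```r
library(dplyr)

# Hypothetical data for illustration only (not the lab's plots tibble).
seedlings <- tibble(
  stand = c(1, 1, 2, 2),
  crew = c("A", "B", "A", "B")
)

# Overwrite crew with a longer label for each code.
seedlings <- seedlings |>
  mutate(
    crew = case_when(
      crew == "A" ~ "alpha",
      crew == "B" ~ "bravo"
    )
  )

seedlings
```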

Exercise 18

Use a series of piped dplyr functions to compute type-specific mean dbh and logs. More specifically, I want you to use a grouped summarize(), where you group by type. As in Exercise 14, exclude NA values from the mean calculations. Note, I called my means mean_dbh and mean_logs.

# A tibble: 2 × 3
  type      mean_dbh mean_logs
  <chr>        <dbl>     <dbl>
1 conifer       14.2      1.5 
2 deciduous     11.9      1.17

Exercise 19

Use a series of piped dplyr functions to count the number of trees by type. Hint, use n() within a grouped summarize(). I called my count n_trees.

# A tibble: 2 × 2
  type      n_trees
  <chr>       <int>
1 conifer         2
2 deciduous       3
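
The n() helper takes no arguments and returns the number of rows in the current group. A minimal sketch on a hypothetical seedlings tibble (made up for illustration):

```r
library(dplyr)

# Hypothetical data for illustration only (not the lab's plots tibble).
seedlings <- tibble(
  stand = c(1, 1, 2, 2),
  crew = c("A", "B", "A", "A")
)

# One output row per crew, holding that crew's row count.
crew_counts <- seedlings |>
  group_by(crew) |>
  summarize(n_seedlings = n())

crew_counts
```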

Exercise 20

Use a series of piped dplyr functions to print the trees with the largest basal area within each plot.

# A tibble: 2 × 7
# Groups:   plot [2]
   plot  tree   dbh    ba  logs type      live 
  <dbl> <dbl> <dbl> <dbl> <dbl> <chr>     <lgl>
1     1     1  20.2  2.23     2 deciduous TRUE 
2     2     2  18    1.77    NA conifer   FALSE
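
The "# Groups:" line in the output above signals that the result is still grouped. The sketch below shows one way to pick the top row per group on a hypothetical seedlings tibble (made up for illustration), using slice_max() after group_by(). I add ungroup() at the end, which is optional and removes the grouping so later verbs act on the whole tibble.

```r
library(dplyr)

# Hypothetical data for illustration only (not the lab's plots tibble).
seedlings <- tibble(
  stand = c(1, 1, 2, 2),
  height_cm = c(125, 80, 60, 150)
)

# slice_max() is applied separately within each stand.
tallest_per_stand <- seedlings |>
  group_by(stand) |>
  slice_max(height_cm, n = 1) |>
  ungroup()

tallest_per_stand
```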

Wrap up

Congratulations! You’ve made it to the end of Lab 7. Make sure to render your final document and submit both the .pdf and .qmd files to D2L.