spp <- c("tsugca", "tsugca", "betual", "acerru", "pinust", "pinust", "betual", "acerru")
dbh <- c(15, 12, 6.6, 9.3, 28.1, 9.23, 15.3, 11.1)
qual <- c("ugs", "ags", "ags", "ugs", "ags", "ags", "ugs", "ags")
live <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
plt <- data.frame(spp, dbh, qual, live)
Logical subsetting vectors and data frames
Learning objectives
- create, subset, and manipulate vectors and data frames
- use comparison and logical operators
- practice combining logical tests to extract information from vectors and data frames
- start thinking “inside out”
Class exercise
Let’s again work with a data frame comprising four columns and eight rows.
Start a new script and copy the code above to create the plt data frame.
Logical subsetting
So far we have directly specified the elements of vectors that we want to extract, for example spp[1], dbh[length(dbh)], or spp[c(1,3,5)]. We have also subsetted data frame rows and columns, for example plt[c(2,5),], plt[nrow(plt),], plt[,1:2], or plt[,c("spp","dbh")]. More commonly we want to extract elements that meet a condition, such as all trees greater than some minimum DBH or all trees of a given species. For this we use subsetting with logical vectors; see Section 6.6 in the course book.
Here are the comparison operators:
- Equal: ==
- Not equal: !=
- Greater than: >
- Less than: <
- Greater than or equal to: >=
- Less than or equal to: <=
Let’s give these operators a spin using the plt data frame.
First on a single column, i.e., a vector subset operation.
plt$qual[plt$qual == "ags"] # Understand this statement from the inside out!
[1] "ags" "ags" "ags" "ags" "ags"
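Reading the statement above "inside out" can be sketched in two steps. This is only an illustration: the names is_ags and ags_only are made up for the sketch and do not appear in the exercise.

```r
# Recreate the example data from the top of this exercise.
spp <- c("tsugca", "tsugca", "betual", "acerru", "pinust", "pinust", "betual", "acerru")
dbh <- c(15, 12, 6.6, 9.3, 28.1, 9.23, 15.3, 11.1)
qual <- c("ugs", "ags", "ags", "ugs", "ags", "ags", "ugs", "ags")
live <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
plt <- data.frame(spp, dbh, qual, live)

# Step 1 (innermost): the comparison returns one TRUE/FALSE per row.
is_ags <- plt$qual == "ags"   # is_ags is a hypothetical helper name
print(is_ags)

# Step 2 (outer): the logical vector keeps only the positions that are TRUE.
ags_only <- plt$qual[is_ags]  # ags_only is a hypothetical helper name
print(ags_only)
```

The one-line version does the same thing without storing the intermediate logical vector.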
Get all trees (i.e., rows) for species acerru.
plt[plt$spp == "acerru", ]
spp dbh qual live
4 acerru 9.3 ugs FALSE
8 acerru 11.1 ags TRUE
Get all trees (i.e., rows) except acerru.
plt[plt$spp != "acerru", ]
spp dbh qual live
1 tsugca 15.00 ugs TRUE
2 tsugca 12.00 ags TRUE
3 betual 6.60 ags TRUE
5 pinust 28.10 ags TRUE
6 pinust 9.23 ags TRUE
7 betual 15.30 ugs FALSE
Get all trees (i.e., rows) with DBH greater than 10.
plt[plt$dbh > 10, ]
spp dbh qual live
1 tsugca 15.0 ugs TRUE
2 tsugca 12.0 ags TRUE
5 pinust 28.1 ags TRUE
7 betual 15.3 ugs FALSE
8 acerru 11.1 ags TRUE
Get the species of all trees with DBH greater than 10.
plt[plt$dbh > 10, "spp"]
[1] "tsugca" "tsugca" "pinust" "betual" "acerru"
Find all live trees. Why do I use plt$live and not plt$live == TRUE in the code below?
plt[plt$live, ]
spp dbh qual live
1 tsugca 15.00 ugs TRUE
2 tsugca 12.00 ags TRUE
3 betual 6.60 ags TRUE
5 pinust 28.10 ags TRUE
6 pinust 9.23 ags TRUE
8 acerru 11.10 ags TRUE
A bit on logical operators and subsetting
There are some logical operators we haven’t seen yet, including the “and” operator and the “or” operator.
- and: &
- or: |
The & operator compares the vector elements on its left and right. If both are TRUE, then & returns TRUE, otherwise FALSE. The | operator checks whether either element is TRUE. If at least one is TRUE, then | returns TRUE; if both are FALSE, then FALSE is returned. These operations are applied to each element pair along the vectors. For example:
c(FALSE, TRUE, FALSE) | c(TRUE, FALSE, FALSE)
[1] TRUE TRUE FALSE
c(FALSE, TRUE, FALSE) & c(TRUE, TRUE, FALSE)
[1] FALSE TRUE FALSE
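As a small supplementary sketch, the four possible pairings of TRUE and FALSE can be laid out side by side. The vector names left, right, and_result, and or_result are invented for this illustration.

```r
# Every pairing of TRUE and FALSE, lined up element by element.
left  <- c(TRUE, TRUE, FALSE, FALSE)
right <- c(TRUE, FALSE, TRUE, FALSE)

and_result <- left & right   # TRUE only where both sides are TRUE
or_result  <- left | right   # TRUE where at least one side is TRUE

# Show the inputs and both results as a small truth table.
print(data.frame(left, right, and_result, or_result))
```

Only the first pairing passes the "and" test, and only the last pairing fails the "or" test.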
So, say you want all acerru with DBH greater than 10.
plt[plt$spp == "acerru" & plt$dbh > 10, ]
spp dbh qual live
8 acerru 11.1 ags TRUE
Another useful logical operator is the ! (i.e., the exclamation point, referred to as the “bang” in coding slang), which negates or flips the logical value, so for example !FALSE is TRUE and !TRUE is FALSE (or !“I know what I’m talking about”).
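The ! also works element by element on a whole logical vector. The sketch below uses the qual values from this exercise; is_ags and not_ags are names made up for the illustration.

```r
# Quality codes from the example data.
qual <- c("ugs", "ags", "ags", "ugs", "ags", "ags", "ugs", "ags")

is_ags  <- qual == "ags"   # one TRUE/FALSE per element
not_ags <- !is_ags         # ! flips every element
print(not_ags)

# Negating the == test gives the same answer as the != test.
print(identical(not_ags, qual != "ags"))
```

Because the result is still a logical vector, it can go inside the square brackets just like any other condition.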
Yet another very handy operator is %in%, which tests whether each element of the vector on its left occurs in the vector on its right. It can substitute for a series of “or” statements. Consider the example below and consult the manual page via help("%in%").
dbh == 15 | dbh == 6 | dbh == 11
[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
dbh %in% c(15, 6, 11)
[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
letters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
letters %in% c("a", "m", "q", "s")
[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] TRUE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE
# letters == "a" | letters == "m" | letters = "q" | letters == "s" ## Find the error.
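The sketch below checks that the chained "or" test and %in% agree for the dbh values, and then shows why == with a multi-element vector on the right is not a membership test: R compares position by position, recycling the shorter vector. The values 12 and 15 and the names chained, with_in, and recycled are chosen only for this illustration.

```r
# DBH values from the example data.
dbh <- c(15, 12, 6.6, 9.3, 28.1, 9.23, 15.3, 11.1)

# The two approaches shown above give the same logical vector.
chained <- dbh == 15 | dbh == 6 | dbh == 11
with_in <- dbh %in% c(15, 6, 11)
print(identical(chained, with_in))

# Caution: == compares element by element, so c(12, 15) is recycled to
# 12, 15, 12, 15, ... and lined up against dbh position by position.
recycled <- dbh == c(12, 15)
print(recycled)

# %in% asks the question we actually mean: is each dbh value in the set?
print(dbh %in% c(12, 15))
```

Here the == version finds no matches even though 15 and 12 are both present in dbh, while %in% flags the first two trees.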
Say you want acerru and tsugca with DBH greater than 10.
plt[plt$spp %in% c("acerru", "tsugca") & plt$dbh > 10,]
spp dbh qual live
1 tsugca 15.0 ugs TRUE
2 tsugca 12.0 ags TRUE
8 acerru 11.1 ags TRUE
Now, say you want acerru and tsugca with DBH greater than 10 and acceptable growing stock.
plt[plt$spp %in% c("acerru", "tsugca") & plt$dbh > 10 & plt$qual == "ags",]
spp dbh qual live
2 tsugca 12.0 ags TRUE
8 acerru 11.1 ags TRUE
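When a condition grows this long, one option is to build it from named pieces and combine them at the end. This is a sketch of that idea, not the only way to write it; is_target_spp, is_large, is_ags, keep, and result are hypothetical names.

```r
# Recreate the example data from the top of this exercise.
spp <- c("tsugca", "tsugca", "betual", "acerru", "pinust", "pinust", "betual", "acerru")
dbh <- c(15, 12, 6.6, 9.3, 28.1, 9.23, 15.3, 11.1)
qual <- c("ugs", "ags", "ags", "ugs", "ags", "ags", "ugs", "ags")
live <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
plt <- data.frame(spp, dbh, qual, live)

# Each piece is a logical vector with one element per row of plt.
is_target_spp <- plt$spp %in% c("acerru", "tsugca")
is_large      <- plt$dbh > 10
is_ags        <- plt$qual == "ags"

# A row is kept only if all three tests are TRUE for it.
keep <- is_target_spp & is_large & is_ags

result <- plt[keep, ]
print(result)
```

Printing each piece on its own is a quick way to see which test excluded a given row.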
Your turn!
- Find all trees with DBH less than 9.5.
- Find all trees that are not tsugca (hint: use the !).
- Find all dead trees (hint: use the !).
- Find all live trees of unacceptable growing stock.
- Find all live betual and acerru that are of acceptable growing stock.
- Make up your own subsetting criteria.