Functions

Practical Computing and Data Science Tools

Announcements

  • Midterm 1 is in two weeks, Oct 3, during lab time.
  • Lab 1 & 2 grades will be released soon on D2L.

Agenda

  • Review Data Structures
  • Functions

Data Structures

Vectors

Consider the vectors:

names <- c("Western red cedar", "Douglas-fir", "Pacific madrone")
ages <- c(NA, 120, 82)
conifer <- c(TRUE, TRUE, FALSE)

What are their types?

typeof(names)
[1] "character"
typeof(ages)
[1] "double"
typeof(conifer)
[1] "logical"

Why are these different? Aren’t they all vectors?
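A hint for the question above: each of these is an atomic vector, and an atomic vector holds only one type, so typeof() reports the type its elements share. When c() gets mixed inputs, it coerces them to the most flexible type (logical, then integer, then double, then character). A small sketch; mixed_num and mixed_chr are made-up names:

```r
# An atomic vector stores a single type, so c() coerces mixed inputs
mixed_num <- c(TRUE, 2)        # logical TRUE becomes the number 1
mixed_chr <- c(82, "unknown")  # the number 82 becomes the string "82"

typeof(mixed_num)   # "double"
typeof(mixed_chr)   # "character"
mixed_num           # 1 2
mixed_chr           # "82" "unknown"

# NA is logical on its own, but takes on the type of the values around it
typeof(NA)          # "logical"
typeof(c(NA, 120))  # "double"
```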

Vectors

What’s the average age?

mean(ages)
[1] NA
mean(ages, na.rm = TRUE)
[1] 101
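The na.rm argument is not special to mean(). A sketch, using the same ages vector, of a few other base functions that accept it, plus is.na() for finding the missing values directly:

```r
ages <- c(NA, 120, 82)

# Which elements are missing?
is.na(ages)               # TRUE FALSE FALSE

# Other summary functions share the na.rm argument
sum(ages, na.rm = TRUE)   # 202
max(ages, na.rm = TRUE)   # 120

# Same result as mean(ages, na.rm = TRUE): drop the NA first, then average
mean(ages[!is.na(ages)])  # 101
```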

Vectors

How do we get the second element of conifer?

conifer[2]
[1] TRUE
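Square brackets accept more than a single position. A quick sketch with the same vectors (the comments show what each line returns):

```r
conifer <- c(TRUE, TRUE, FALSE)
names <- c("Western red cedar", "Douglas-fir", "Pacific madrone")

conifer[c(1, 3)]  # several positions at once: TRUE FALSE
conifer[-2]       # a negative index drops that position: TRUE FALSE
names[2]          # same idea for a character vector: "Douglas-fir"
conifer[4]        # asking past the end gives NA
```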

Dataframes

Consider the dataframe:

my_df <- data.frame(
  names = names,
  ages = ages,
  conifer = conifer
)

my_df
              names ages conifer
1 Western red cedar   NA    TRUE
2       Douglas-fir  120    TRUE
3   Pacific madrone   82   FALSE

Dataframes

How do we access the third row of my_df?

my_df[3, ]
            names ages conifer
3 Pacific madrone   82   FALSE

Dataframes

How do we access the age in the third row of my_df?

my_df[3, "ages"]
[1] 82
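There is more than one way to reach the same value. A sketch of a few equivalent forms, rebuilding my_df so the block runs on its own:

```r
my_df <- data.frame(
  names = c("Western red cedar", "Douglas-fir", "Pacific madrone"),
  ages = c(NA, 120, 82),
  conifer = c(TRUE, TRUE, FALSE)
)

my_df[3, "ages"]    # row 3, column "ages": 82
my_df$ages[3]       # take the ages column, then its 3rd element: 82
my_df[["ages"]][3]  # same, using [[ ]] with the column name: 82
my_df[3, 2]         # same, by position (ages is the 2nd column): 82
```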

Comparison and logical operators

You learned about some very important operators in Chapter 4:

  • > and <: greater than and less than
  • >= and <=: greater than or equal to and less than or equal to
  • == and !=: equal to and not equal to
  • &: and
  • |: or
  • %in%: in

Comparison and logical operators

Remember, we can use these operators to subset vectors and dataframes:

In ages, get all ages that are greater than 82

ages[ages > 82]
[1]  NA 120

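What’s going on with that NA? Comparing NA with 82 gives NA (an unknown age is neither bigger nor smaller), and indexing a vector with NA returns NA. Try excluding missing values explicitly: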

ages[ages > 82 & !is.na(ages)]
[1] 120
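To see where the NA comes from, it helps to look at the logical vector inside the brackets. A sketch with the same ages vector; which() is one alternative to adding !is.na() by hand:

```r
ages <- c(NA, 120, 82)

# Comparing NA to a number gives NA, not TRUE or FALSE
ages > 82                 # NA TRUE FALSE

# Indexing with an NA returns an NA element
ages[c(NA, TRUE, FALSE)]  # NA 120

# which() returns only the positions that are TRUE, skipping NA
which(ages > 82)          # 2
ages[which(ages > 82)]    # 120
```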

Comparison and logical operators

Remember, we can use these operators to subset vectors and dataframes:

In my_df, get all rows that are not conifers

my_df[!my_df$conifer, ]
            names ages conifer
3 Pacific madrone   82   FALSE

Comparison and logical operators

Remember, we can use these operators to subset vectors and dataframes:

In my_df, get all rows that are Pacific madrone

my_df[my_df$names == "Pacific madrone", ]
            names ages conifer
3 Pacific madrone   82   FALSE

Comparison and logical operators

Remember, we can use these operators to subset vectors and dataframes:

In my_df, get all rows that are either Pacific madrone or Douglas-fir.

my_df[my_df$names %in% c("Pacific madrone", "Douglas-fir"), ]
            names ages conifer
2     Douglas-fir  120    TRUE
3 Pacific madrone   82   FALSE
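%in% works like chaining several == tests together with |. A sketch checking that the two forms pick the same rows (in_set and either are illustrative names):

```r
my_df <- data.frame(
  names = c("Western red cedar", "Douglas-fir", "Pacific madrone"),
  ages = c(NA, 120, 82),
  conifer = c(TRUE, TRUE, FALSE)
)

# %in% checks each name against a set of allowed values
in_set <- my_df$names %in% c("Pacific madrone", "Douglas-fir")

# The same test spelled out with == and |
either <- my_df$names == "Pacific madrone" | my_df$names == "Douglas-fir"

identical(in_set, either)  # TRUE
in_set                     # FALSE TRUE TRUE
my_df[either, ]            # rows 2 and 3, as with %in%
```

With many allowed values, %in% stays short while the | version keeps growing.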

Functions

What are some functions we’ve used in this class?

Example: write.csv()

write.csv(
  x = dataset,
  file = "path/to/output.csv"
)
  • x and file are arguments supplied to the write.csv() function: x is the data to write, and file is the path of the output file.

  • Functions in R fall into three categories:

    • built-in functions like read.csv(),
    • functions from packages that you install onto your computer, and
    • entirely custom functions built by you!
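A runnable version of the write.csv() call above, assuming my_df from earlier stands in for dataset and a temporary file from tempfile() stands in for path/to/output.csv. The row.names = FALSE argument is optional; without it, write.csv() adds an extra column of row names:

```r
my_df <- data.frame(
  names = c("Western red cedar", "Douglas-fir", "Pacific madrone"),
  ages = c(NA, 120, 82),
  conifer = c(TRUE, TRUE, FALSE)
)

# A temporary file stands in for "path/to/output.csv"
out_path <- tempfile(fileext = ".csv")

# Write the data frame without the extra row-name column
write.csv(x = my_df, file = out_path, row.names = FALSE)

# Read it back with read.csv(): same columns and values
# (ages may come back as integer rather than double)
my_df_again <- read.csv(out_path)
my_df_again

# Clean up the temporary file
unlink(out_path)
```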

Built-in ‘base’ functions

Functions from packages you install

Custom functions built by you!

Writing a function

hello_world <- function() {
  
}

hello_world()
NULL

Writing a function

hello_world <- function() {
  return("hello!")
}

hello_world()
[1] "hello!"
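return() is optional in R: a function returns the value of the last expression it evaluates, which is also why the empty body earlier gave NULL. A sketch of that, plus a variant that takes an argument (hello_world2 and hello_name are hypothetical names):

```r
# Without return(), the last evaluated expression is the result
hello_world2 <- function() {
  "hello!"
}

# A variant with one argument, pasted into the greeting
hello_name <- function(name) {
  greeting <- paste0("hello, ", name, "!")
  return(greeting)
}

hello_world2()       # "hello!"
hello_name("cedar")  # "hello, cedar!"
```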

Writing a(n interesting) function

pow <- function(x, v) {
  result <- x^v
  return(result)
}
  • What does the pow() function do?
  • What arguments does pow() take?
  • What will pow(5, 2) return?
  • What will pow(v = 5, x = 2) return?
  • What will pow("ten", "two") return?
  • What will pow(x = 5, z = 2) return?

pow() test

pow(5, 2)
[1] 25
pow(v = 5, x = 2)
[1] 32
pow("ten", "two")
Error in x^v: non-numeric argument to binary operator
pow(x = 5, z = 2)
Error in pow(x = 5, z = 2): unused argument (z = 2)
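Because ^ works element-wise, pow() already handles whole vectors, not just single numbers. Arguments can also have default values, which makes them optional. A sketch; pow_default is a hypothetical name:

```r
# pow() as defined above
pow <- function(x, v) {
  result <- x^v
  return(result)
}

# ^ is applied element by element, so vectors work too
pow(c(2, 3, 4), 2)  # 4 9 16

# v = 2 is used whenever v is not supplied
pow_default <- function(x, v = 2) {
  x^v
}
pow_default(5)     # 25
pow_default(5, 3)  # 125
```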

Errors

Consider the following errors:

pow("ten", "two")
Error in x^v: non-numeric argument to binary operator
pow(x = 5, z = 2)
Error in pow(x = 5, z = 2): unused argument (z = 2)
  • What do/don’t you like about each error?

  • Which error message is better? Why?

Informative Error Messages

new_pow <- function(x, v) {
  stopifnot(
    "The x argument value should be numeric." = is.numeric(x),
    "The v argument value should be numeric." = is.numeric(v)
  )
  
  result <- x^v
  return(result)
}

Testing pow() vs. new_pow()

pow("ten", "two")
Error in x^v: non-numeric argument to binary operator
new_pow("ten", "two")
Error in new_pow("ten", "two"): The x argument value should be numeric.
new_pow(10, "two")
Error in new_pow(10, "two"): The v argument value should be numeric.
new_pow(10, 2)
[1] 100
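One way to check which message a call produces, without the error stopping your script, is base R's tryCatch() together with conditionMessage(). This goes a little beyond the slides; get_error is a hypothetical helper:

```r
# new_pow() as defined on the slides
new_pow <- function(x, v) {
  stopifnot(
    "The x argument value should be numeric." = is.numeric(x),
    "The v argument value should be numeric." = is.numeric(v)
  )
  result <- x^v
  return(result)
}

# Evaluate expr; if it errors, return the error message as a string
get_error <- function(expr) {
  tryCatch(expr, error = function(e) conditionMessage(e))
}

get_error(new_pow("ten", "two"))  # "The x argument value should be numeric."
get_error(new_pow(10, "two"))     # "The v argument value should be numeric."
new_pow(c(2, 3), 2)               # numeric vectors pass the checks: 4 9
```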

Difference function

Write a function called diff() that takes the difference of its first and second arguments (subtracts the second from the first).

Difference function

A possible solution:

diff <- function(a, b) {
  result <- a - b
  return(result)
}

Difference function

A possible solution with informative error messages:

diff <- function(a, b) {
  stopifnot(
    "The a argument value should be numeric." = is.numeric(a),
    "The b argument value should be numeric." = is.numeric(b)
  )
  result <- a - b
  return(result)
}

Difference function

Testing:

diff(5, 2)
[1] 3
diff(2, 5)
[1] -3
diff("two", 5)
Error in diff("two", 5): The a argument value should be numeric.
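A caution about the name: base R already has a diff() function, which computes lagged differences between consecutive elements of a vector. Defining our own diff() masks it. A sketch of how to still reach the original, or remove our version:

```r
# The diff() from the slides
diff <- function(a, b) {
  stopifnot(
    "The a argument value should be numeric." = is.numeric(a),
    "The b argument value should be numeric." = is.numeric(b)
  )
  result <- a - b
  return(result)
}

# base:: reaches base R's diff() even while ours masks it
base::diff(c(1, 4, 9))  # 3 5

# Removing our version makes diff() refer to base R's again
rm(diff)
diff(c(1, 4, 9))        # 3 5
```

A name such as subtract() would avoid the clash entirely.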

Next Week

  • More functions!
  • Control statements: if, else
  • Loops!

See you upstairs in lab!