R is a functional programming language.
class(log)
[1] "function"
Functions take some input and return some output. The input is a collection of arguments to the function, and the output is the return value.
log(10)
[1] 2.302585
log(x = 10)
[1] 2.302585
log(10, base = exp(1))
[1] 2.302585
log(10, base = 10)
[1] 1
log(x = 10, base = 10)
[1] 1
Take a look at the arguments.
args(log)
function (x, base = exp(1))
NULL
In the log function, the default value for the base argument is exp(1), so log() computes the natural logarithm unless you supply a different base.
all.equal(
log(10),
log(10, base = exp(1))
)
[1] TRUE
log(10, exp(1))
[1] 2.302585
log(exp(1), 10)
[1] 0.4342945
log(x = 10, base = exp(1))
[1] 2.302585
log(base = exp(1), x = 10)
[1] 2.302585
log(10, b = exp(1))
[1] 2.302585
log(10, ba = exp(1))
[1] 2.302585
log(10, bas = exp(1))
[1] 2.302585
log(10, base = exp(1))
[1] 2.302585
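The calls above work because R matches arguments in stages: exact names first, then unique partial (prefix) names, then position for whatever is left. Here is a small sketch with a made-up function, scale_value(), that shows each stage; the function exists only for illustration.

```r
# Hypothetical function used only to illustrate argument matching
scale_value <- function(value, factor = 1) {
  value * factor
}

# Exact names: the order of named arguments does not matter
exact <- scale_value(factor = 2, value = 10)

# Partial names: "fac" is a unique prefix of "factor"
partial <- scale_value(10, fac = 3)

# Position: unnamed arguments fill the remaining arguments in order
positional <- scale_value(10, 4)

c(exact, partial, positional)
```

Partial matching is convenient at the console but fragile in scripts: if two arguments share the prefix you typed, R stops with an error about matching multiple formal arguments. Setting options(warnPartialMatchArgs = TRUE) makes R warn whenever partial matching happens.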
y <- 100
log(y)
[1] 4.60517
class(log(10))
[1] "numeric"
class(as.data.frame(10))
[1] "data.frame"
class(all.equal(1,1))
[1] "logical"
class(all.equal(1,2))
[1] "character"
m <- lm(len ~ dose, data = ToothGrowth)
class(m)
[1] "lm"
class(summary(m))
[1] "summary.lm"
# Create a function
add <- function(x, y) {
  x + y
}
# Argument by order
add(1, 2)
[1] 3
# Argument by name
add(x = 1, y = 2)
[1] 3
# Vector arguments
add(1:2, 3:4)
[1] 4 6
# Vector and scalar argument
add(1:2, 3)
[1] 4 5
add(1:2, 3:5)
Warning in x + y: longer object length is not a multiple of shorter object
length
[1] 4 6 6
# Define function
add <- function(x = 1, y = 2) {
  x + y
}
# Uses both default arguments
add()
[1] 3
# Only uses second default argument
add(3)
[1] 5
# Specify argument by name
add(y = 5)
[1] 6
R functions return the value of the last expression evaluated, but it is better practice to return explicitly using the return() function.
# Create function with explicit return
add <- function(x, y) {
  return(x + y)
}
# Run function
add(1, 2)
[1] 3
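To make the difference concrete, here is a small sketch comparing an implicit and an explicit return (the function names are made up for illustration). It also shows one surprise: if the last expression is an assignment, its value is still returned, but invisibly, so nothing prints at the console.

```r
# Implicit return: the value of the last expression evaluated is returned
add_implicit <- function(x, y) {
  total <- x + y
  total
}

# Explicit return: return() states the returned value directly
add_explicit <- function(x, y) {
  total <- x + y
  return(total)
}

# Last expression is an assignment: the value is returned invisibly
add_assign_last <- function(x, y) {
  total <- x + y
}

identical(add_implicit(1, 2), add_explicit(1, 2))
add_assign_last(1, 2)    # prints nothing
(add_assign_last(1, 2))  # wrapping the call in () forces printing
```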
Suppose you want to return TRUE or FALSE depending on whether a specific character is in a string. As soon as you find the character, you can immediately return TRUE. If you don’t find the character, you can return FALSE.
# Define function to check string for a character
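is_char_in_string <- function(string, char) {
  # Check each character in turn; seq_len() avoids looping over 1:0 for ""
  for (i in seq_len(nchar(string))) {
    if (char == substr(string, i, i))
      return(TRUE)  # found it, so stop immediately
  }
  return(FALSE)  # checked every character without a match
}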
# Examples
is_char_in_string("this is my string", "a")
[1] FALSE
is_char_in_string("this is my string", "s")
[1] TRUE
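The loop is there to illustrate early return; in practice, base R’s grepl() can do the same check without a loop. A sketch, assuming you want a literal match: fixed = TRUE turns off regular-expression interpretation, so a character such as "." is matched literally. Note that grepl() also matches longer substrings, which the loop version does not handle.

```r
# Same check using fixed (non-regex) pattern matching
is_char_in_string2 <- function(string, char) {
  grepl(char, string, fixed = TRUE)
}

is_char_in_string2("this is my string", "a")
is_char_in_string2("this is my string", "s")
```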
Errors from the underlying functions you use are reported automatically.
# Error since "a" is not numeric
add(1, "a")
Error in x + y: non-numeric argument to binary operator
There are a variety of ways to communicate with the user. Use message() to pass along information.
# Create function with message
add <- function(x, y) {
  message(paste(x, "+", y, "="))
  return(x + y)
}
# Example message
add(1, 2)
1 + 2 =
[1] 3
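One advantage of message() over printing is that the caller can turn messages off without touching the function. A minimal sketch using base R’s suppressMessages() with the same add() function:

```r
add <- function(x, y) {
  message(paste(x, "+", y, "="))
  return(x + y)
}

# suppressMessages() silences the message but keeps the return value
result <- suppressMessages(add(1, 2))
result
```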
Use warning() when you think the results may not be what the user is expecting.
add <- function(x, y) {
  if (length(x) != length(y))
    warning("'x' and 'y' have unequal length.")
  return(x + y)
}
# No warning
add(1, 2)
[1] 3
# Warning
add(1:2, 3)
Warning in add(1:2, 3): 'x' and 'y' have unequal length.
[1] 4 5
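Callers can decide how to handle a warning. A sketch of two base R options, using the same add() function: suppressWarnings() hides the warning and keeps the result, while tryCatch() intercepts it, stops the evaluation, and returns whatever the handler returns.

```r
add <- function(x, y) {
  if (length(x) != length(y))
    warning("'x' and 'y' have unequal length.")
  return(x + y)
}

# Option 1: silence the warning and keep the result
quiet <- suppressWarnings(add(1:2, 3))

# Option 2: catch the warning; the handler's value replaces the result
msg <- tryCatch(add(1:2, 3), warning = function(w) conditionMessage(w))

quiet
msg
```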
Use stop() when it is clear the user is not doing what they intended.
# Define function with an input check
add <- function(x, y) {
  if (!is.numeric(x) || !is.numeric(y))
    stop("Either 'x' or 'y' or both are not numeric!")
  return(x + y)
}
# No Error
add(1, 2)
[1] 3
# Error
add("a", 2)
Error in add("a", 2): Either 'x' or 'y' or both are not numeric!
As shown previously, this input would also have caused an error in our original add() function. Perhaps this version of the error message is more helpful.
The stopifnot() function can be used to construct reasonably informative error messages.
# stopifnot() example
add <- function(x, y) {
  stopifnot(is.numeric(x))
  stopifnot(is.numeric(y))
  return(x + y)
}
# No error
add(1, 2)
[1] 3
Now with an error.
add(1, "b")
Error in add(1, "b"): is.numeric(y) is not TRUE
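If the default "is.numeric(y) is not TRUE" wording is too terse, stopifnot() also accepts named expressions, and the name becomes the error message. A sketch, assuming R 4.0.0 or later (where this feature was added); tryCatch() is used here only to capture the message as a value.

```r
# Requires R >= 4.0.0: the argument names become the error messages
add <- function(x, y) {
  stopifnot(
    "'x' must be numeric" = is.numeric(x),
    "'y' must be numeric" = is.numeric(y)
  )
  return(x + y)
}

add(1, 2)

# Capture the error message instead of stopping
err <- tryCatch(add(1, "b"), error = function(e) conditionMessage(e))
err
```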
Here are some pitfalls I want you to be aware of so that (I hope) you avoid them in the future.
Named argument values must be specified using =. If you use <- inside a function call instead, you simultaneously define an R object with that name and pass its value to the function by position. Generally, this should be avoided.
# Define function
my_fancy_function <- function(x, y) {
  return(x + y * 100)
}
What is the result of the following?
# Weird assignment
my_fancy_function(y <- 5, x <- 4)
[1] 405
What happened? We assigned y the value 5 and x the value 4 outside the function. Then, we passed y (5) as the first argument of the function and x (4) as the second argument of the function. Inside the function, x was therefore 5 and y was 4, giving 5 + 4*100 = 405.
This was equivalent to
# Prefer assignment outside the function
y <- 5
x <- 4
my_fancy_function(x = y, y = x)
[1] 405
So, when specifying function arguments, use =. It is also helpful to avoid giving objects the same names as a function’s arguments.
If an object is not defined inside a function, R will look for it outside the function.
# Define function with missing object
f <- function() {
  return(y) # y was never defined inside f()
}
What is the result of the following?
# What will the result be?
f()
[1] 5
R searches through a series of environments to find the variable called y. Here, it finds the y we set to 5 earlier, outside the function.
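The series of environments follows where a function was defined (lexical scoping), not where it is called. A small sketch with a hypothetical make_multiplier() function: the inner function has no factor argument, so R looks factor up in the environment of the make_multiplier() call that created it.

```r
# Hypothetical function factory used to illustrate lexical scoping
make_multiplier <- function(factor) {
  # The returned function uses `factor`, which it finds in the
  # enclosing environment created by this call to make_multiplier()
  function(x) x * factor
}

double_it <- make_multiplier(2)
triple_it <- make_multiplier(3)

double_it(10)
triple_it(10)
```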
However, if you change an object’s value inside a function, the change is not retained outside the function.
# Create function
f <- function() {
  a <- a + 1 # change object's value
  print(a)
}
# Create an object
a <- 1
f()
[1] 2
# object's value is not changed outside the function
a
[1] 1
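If you do want to keep a change, the usual approach is to return the new value from the function and reassign it in the calling code. A sketch with a hypothetical increment() function:

```r
# Hypothetical function: compute the new value and return it
increment <- function(a) {
  a + 1
}

a <- 1
a <- increment(a)  # reassign the returned value to keep the change
a
```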
Sometimes you get baffling error messages that refer to objects of type closure or special.
mean[1]
Error in mean[1]: object of type 'closure' is not subsettable
log[1]
Error in log[1]: object of type 'special' is not subsettable
This is related to functions having a typeof of closure or special: functions cannot be subset with [ ] the way vectors can.
# typeof
typeof(mean)
[1] "closure"
typeof(log)
[1] "special"
You will see closure errors much more commonly than special errors, because most R functions, including the ones you write, are closures. Both of these errors indicate that a function is being used as if it were data.
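A common way to hit a closure error is to use the name of a data object you forgot to create, when that name happens to belong to a function. A sketch, assuming a fresh session with no object named df (so df refers to the F-distribution density function from the stats package):

```r
# `df` here is stats::df(), a function, not a data frame,
# so subsetting it fails just like mean[1] did
msg <- tryCatch(df$len, error = function(e) conditionMessage(e))
msg
```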
Generic functions in R use the class() of the first argument to determine which specific version of the function (the method) is used.
The mean() function is an example of a generic function.
# UseMethod() indicates a generic function
print(mean)
function (x, ...)
UseMethod("mean")
<bytecode: 0x103f19788>
<environment: namespace:base>
Take a look at the help file.
# Look at the help file
?mean
Notice the words “Generic function”. This means what the function does will depend on the class of the first argument to the function.
# Using mean() on different object types
mean(1:5)
[1] 3
mean(as.Date(c("2023-01-01","2022-01-01")))
[1] "2022-07-02"
I bring up generic functions primarily to point out that it can be hard to track down the appropriate help file. Generally, you will look up <function>.<class>.
For example,
# Determine the class
class(as.Date(c("2023-01-01","2022-01-01")))
[1] "Date"
So you need to look up the help file for mean.Date().
# Look up the function
?mean.Date
This didn’t provide the actual help information. Since it did open a help page, this appears to be the intended behavior; why it isn’t documented, I have no idea.
# Integer
class(1:5)
[1] "integer"
# Try mean.integer help
?mean.integer
There is no mean.integer() method, so there is no help file for it. Instead, there is typically a default method that is used when a class-specific method can’t be found.
# Try mean.default
?mean.default
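To find out which methods actually exist for a generic (and therefore which <function>.<class> help pages to look for), you can use methods() from the utils package. A sketch; the exact list depends on your R version and attached packages, but in a standard session you should find mean.Date and mean.default and no mean.integer.

```r
# List the S3 methods available for the generic mean()
mean_methods <- as.character(utils::methods(mean))
mean_methods

# Check for specific methods by name
c("mean.Date", "mean.default", "mean.integer") %in% mean_methods
```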
Another function that we have used for multiple different data types is summary().
# Various uses of the generic summary function
summary(ToothGrowth$len) # numeric
Min. 1st Qu. Median Mean 3rd Qu. Max.
4.20 13.07 19.25 18.81 25.27 33.90
summary(ToothGrowth$supp) # factor
OJ VC
30 30
summary(ToothGrowth) # data.frame
len supp dose
Min. : 4.20 OJ:30 Min. :0.500
1st Qu.:13.07 VC:30 1st Qu.:0.500
Median :19.25 Median :1.000
Mean :18.81 Mean :1.167
3rd Qu.:25.27 3rd Qu.:2.000
Max. :33.90 Max. :2.000
summary(lm(len ~ supp, data = ToothGrowth)) # lm object
Call:
lm(formula = len ~ supp, data = ToothGrowth)
Residuals:
Min 1Q Median 3Q Max
-12.7633 -5.7633 0.4367 5.5867 16.9367
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.663 1.366 15.127 <2e-16 ***
suppVC -3.700 1.932 -1.915 0.0604 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.482 on 58 degrees of freedom
Multiple R-squared: 0.05948, Adjusted R-squared: 0.04327
F-statistic: 3.668 on 1 and 58 DF, p-value: 0.06039
Take a look at the help files for the different summary functions.
# Summary function help files
?summary
?summary.default # numeric vectors are handled by the default method
?summary.factor
?summary.data.frame
?summary.lm
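Because summary() dispatches on class, you can add a method for your own class simply by naming a function summary.<class>. A minimal sketch with a made-up temperature class; the class, its constructor, and the output format are all hypothetical.

```r
# Hypothetical S3 class "temperature": a numeric vector tagged with a unit
temperature <- function(x, unit = "C") {
  structure(x, unit = unit, class = "temperature")
}

# Method for the existing generic summary(), named <generic>.<class>
summary.temperature <- function(object, ...) {
  vals <- unclass(object)
  paste0("min ", min(vals), " / max ", max(vals), " ", attr(object, "unit"))
}

temps <- temperature(c(18.5, 22.1, 25.3))
summary(temps)  # dispatches to summary.temperature()
```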
Some functions have a ... argument. This argument collects any number of additional arguments, which the underlying code then handles appropriately.
# Sum helpfile
?sum
For the sum() function, everything passed through ... is added together.
# Sum scalars
sum(1,2,3)
[1] 6
# Sum a vector
sum(5:6)
[1] 11
# Sum scalars and vector
sum(1,2,3,5:6)
[1] 17
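You can use ... in your own functions too. A sketch with a hypothetical sum_with_count() function that collects everything passed through ... and reports both the total and how many arguments and values it received:

```r
# Hypothetical function that accepts any number of arguments through ...
sum_with_count <- function(...) {
  n_args <- length(list(...))  # how many arguments were supplied
  values <- c(...)             # combine them into a single vector
  c(total = sum(values), n_args = n_args, n_values = length(values))
}

sum_with_count(1, 2, 3, 5:6)
```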
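Typos in argument names get ignored. Because sum() collects extra arguments through ..., the misspelled na.mr = TRUE is treated as another value to add (TRUE counts as 1) instead of triggering an error, so the NA is not removed.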
# Typo in argument name
sum(c(1,2,NA), na.mr = TRUE) # vs
[1] NA
sum(c(1,2,NA), na.rm = TRUE)
[1] 3
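If you write your own wrapper, you can guard against this kind of silent typo by rejecting named arguments that land in .... A sketch with a hypothetical strict_sum() function; this check is an illustration, not something sum() itself does.

```r
# Hypothetical stricter wrapper: named arguments in ... are treated as typos
strict_sum <- function(..., na.rm = FALSE) {
  dot_names <- names(list(...))
  if (!is.null(dot_names) && any(dot_names != "")) {
    stop("Unexpected named argument(s): ",
         paste(dot_names[dot_names != ""], collapse = ", "))
  }
  sum(..., na.rm = na.rm)
}

strict_sum(c(1, 2, NA), na.rm = TRUE)
try(strict_sum(c(1, 2, NA), na.mr = TRUE))  # reports na.mr instead of returning NA
```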