4  Getting Started

Author

Jarad Niemi

4.1 Console

4.1.1 R interface

In contrast to many other statistical software packages that use a point-and-click interface, e.g. SPSS, JMP, and Stata, R has a command-line interface. The command line has a command prompt, e.g. >, as shown below.

>

This means that you will enter commands on this command line and press Enter to execute them, e.g.

help()

Use the up arrow to recover past commands.

hepl()
help() # Use up arrow and fix

4.1.2 R GUI (or RStudio)

Most likely, you are using a graphical user interface (GUI). Therefore, in addition to the command line, you also have a windowed version of R with some point-and-click options, e.g. File, Edit, and Help.

In particular, there is an editor to create a new R script. So rather than entering commands on the command line, you will write commands in a script and then send those commands to the command line using Ctrl-R (PC) or Command-Enter (Mac). In RStudio on a PC, the shortcut is Ctrl-Enter.

a = 1 
b = 2
a + b
[1] 3

Multiple lines can be run in sequence by selecting them and then using Ctrl-R (PC) or Command-Enter (Mac).

4.1.3 Intro Activity

One of the most effective ways to use this documentation is to copy-and-paste the commands into a script and then execute them.

Copy-and-paste the following commands into a new script and then run those commands directly from the script using Ctrl-R (PC) or Command-Enter (Mac).

x <- 1:10
y <- rep(c(1,2), each=5)
m <- lm(y~x)
s <- summary(m)

Now, look at the result of each line.

x
y
m
s
s$r.squared
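
As a hedged illustration of what these objects contain, the sketch below repeats the commands and then inspects the results with a few base R helpers. The choice of helpers (class, names, coef, cor) is an assumption made here and is not part of the original activity.

```r
# Recreate the objects from the activity so this block runs on its own
x <- 1:10
y <- rep(c(1, 2), each = 5)
m <- lm(y ~ x)
s <- summary(m)

class(m)   # the kind of object lm() returns
names(s)   # components that can be extracted from the summary with $
coef(m)    # estimated intercept and slope

# For a regression with one explanatory variable, R-squared equals the
# squared correlation between x and y
all.equal(s$r.squared, cor(x, y)^2)
```

Any component listed by names(s) can be extracted in the same way as s$r.squared.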

4.2 Calculator

4.2.1 Basic calculator operations

All basic calculator operations can be performed in R.

1+2
[1] 3
1-2
[1] -1
1/2
[1] 0.5
1*2
[1] 2
2^3 # same as 2**3
[1] 8

For now, you can ignore the [1] at the beginning of the line; we’ll learn about that when we get to vectors.

4.2.2 Advanced calculator operations

Many advanced calculator operations are also available.

(1+3)*2 + 100^2  # standard order of operations (PEMDAS)
[1] 10008
sin(2*pi)        # the result is in scientific notation, i.e. -2.449294 x 10^-16 
[1] -2.449294e-16
sqrt(4)
[1] 2
log(10)          # the default is base e
[1] 2.302585
log(10, base = 10)
[1] 1
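
A few more operations, as a small sketch rather than a complete list. The specific functions chosen here are my own selection from base R. The last two lines show why a result like sin(2*pi) should be compared to 0 with a tolerance rather than with ==.

```r
7 %/% 2        # integer division
7 %% 2         # remainder (modulo)
exp(1)         # e, the base of the natural logarithm
log10(100)     # shortcut for log(100, base = 10)
abs(-3)        # absolute value
round(pi, 2)   # round to 2 decimal places
factorial(5)   # 5 * 4 * 3 * 2 * 1
choose(5, 2)   # number of ways to choose 2 items from 5

# sin(2*pi) is not exactly 0 because of floating point arithmetic,
# so compare using a tolerance rather than ==
sin(2*pi) == 0
isTRUE(all.equal(sin(2*pi), 0))
```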

4.2.3 Using variables

A real advantage to using R rather than a calculator (or calculator app) is the ability to store quantities using variables.

a = 1
b = 2
a + b
[1] 3
a - b
[1] -1
a / b
[1] 0.5
a * b
[1] 2
b ^ 3
[1] 8

4.2.3.1 Case sensitive

R is a case-sensitive language and therefore you need to be careful about capitalization.

ThisObjectExists <- 3
ThisObjectExists
[1] 3
thisobjectexists # no it doesn't
Error in eval(expr, envir, enclos): object 'thisobjectexists' not found
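
If you would like to check for an object without triggering an error, one option is the base R function exists(), which takes the object name as a character string. This is a minimal sketch, and the second result assumes you have not created an object with the lowercase name.

```r
ThisObjectExists <- 3

exists("ThisObjectExists")   # the name matches exactly
exists("thisobjectexists")   # different capitalization, so a different name
```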

4.2.3.2 Valid object names

A valid object name “consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number”.

# Valid object names
a  = 1
.b = 2
# Invalid object names
2a  = 3
.2a = 4
_c  = 5
Error: <text>:2:2: unexpected symbol
1: # Invalid object names
2: 2a
    ^

You cannot use any reserved names as object names, i.e. these names cannot be overwritten.

?Reserved
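
One way to check names without producing a syntax error is the base R function make.names(), which converts its input into syntactically valid names. The helper is_valid_name() below is hypothetical (it is not part of R) and is only a sketch of the idea: a name is valid if make.names() leaves it unchanged.

```r
# Hypothetical helper (not part of base R): a name is syntactically valid
# if make.names() returns it unchanged
is_valid_name <- function(x) make.names(x) == x

is_valid_name(c("a", ".b"))           # the valid names from above
is_valid_name(c("2a", ".2a", "_c"))   # the invalid names from above
is_valid_name(c("if", "TRUE"))        # reserved words are not valid names

make.names("2a")                      # a valid alternative constructed by R
```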

4.2.3.3 Tab auto-complete

In RStudio, typing the first few letters of an object or function name and then pressing Tab will complete the name or list the names that match.

4.2.4 Assignment operators =, <-, and ->

When assigning values to variables, you can also use the arrows <- and ->, which you will often see in code, e.g.

a <- 1 # recommended
2 -> b # uncommon, but sometimes useful
c = 3  # similar to other languages

Now print them.

a
[1] 1
b
[1] 2
c
[1] 3
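
One place where = and <- behave differently is inside a function call. The sketch below uses a hypothetical function double_it() and a hypothetical argument name sideLengths, both invented for this illustration. The first exists() result assumes a fresh session in which sideLengths has not been created.

```r
# Hypothetical function used only for this illustration
double_it <- function(sideLengths) 2 * sideLengths

double_it(sideLengths = c(3, 4))    # = names the argument; no variable is created
exists("sideLengths")               # FALSE in a fresh session

double_it(sideLengths <- c(3, 4))   # <- creates the variable, then passes its value
exists("sideLengths")               # TRUE
sideLengths
```

This is one reason the code here uses <- for assignment and reserves = for function arguments.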

4.2.5 Use informative variable names

While using variables alone is useful, it is much more useful to use informative variable names.

# Rectangle
length <- 4
width  <- 3

area <- length * width
area
[1] 12
perimeter <- 2 * (length + width)



# (Right) Triangle
opposite     <- 1
angleDegrees <- 30
angleRadians <- angleDegrees * pi/180

(adjacent     <- opposite / tan(angleRadians)) # = sqrt(3)
[1] 1.732051
(hypotenuse   <- opposite / sin(angleRadians)) # = 2
[1] 2
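
Informative names also make it easy to check your work. As a small sketch, the code below recomputes the triangle and verifies the Pythagorean theorem. The variable triangleArea is a name introduced here for illustration.

```r
# (Right) Triangle, recreated so this block runs on its own
opposite     <- 1
angleRadians <- 30 * pi/180
adjacent     <- opposite / tan(angleRadians)
hypotenuse   <- opposite / sin(angleRadians)

# Pythagorean theorem: hypotenuse^2 = opposite^2 + adjacent^2
# all.equal() compares with a tolerance, which is appropriate for decimals
all.equal(hypotenuse^2, opposite^2 + adjacent^2)

# Area of the triangle
triangleArea <- opposite * adjacent / 2
triangleArea
```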

4.3 Packages

When you install R, you actually install several R packages. When you start R, the following packages are loaded: stats, graphics, grDevices, utils, datasets, methods, and base.

You can find this information by running the following code and looking at the attached base packages.

sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Ventura 13.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.4 compiler_4.4.0    fastmap_1.2.0     cli_3.6.3        
 [5] tools_4.4.0       htmltools_0.5.8.1 rstudioapi_0.16.0 rmarkdown_2.27   
 [9] knitr_1.47        jsonlite_1.8.8    xfun_0.45         digest_0.6.36    
[13] rlang_1.1.4       evaluate_0.24.0  

These packages provide a lot of functionality and have been in existence (almost as they currently are) from the early days of R.
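
If you only want the list of attached base packages rather than the full output, a minimal sketch is to extract that component from the result of sessionInfo(). The search() function shows the same packages in the order R looks through them, along with anything else you have attached.

```r
# Base packages attached in the current session
sessionInfo()$basePkgs

# Everything on the search path, in the order R searches for objects
search()
```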

While a lot of functionality exists in these packages, much more functionality exists in user-contributed packages. On the Comprehensive R Archive Network (CRAN), there are (as of 2023/01/29) 19,122 packages available for download. On Bioconductor, there are an additional 2,183. There are also additional packages that exist outside of these repositories.

4.3.1 Install packages

To install packages from CRAN, use the install.packages function. For example,

install.packages("tidyverse")

or, to install all the packages needed for this class,

install.packages(c("tidyverse",
                   "gridExtra",
                   "rmarkdown",
                   "knitr",
                   "hexbin"))

R packages almost always depend on other packages. When you use the install.packages() function, R will automatically install these dependencies.

You may run into problems during this installation process. Sometimes the dependencies will fail. If this occurs, try to install just that dependency using install.packages().

Sometimes you will be asked whether you want to install a newer version of a package from source. My general advice (for those new to R) is to say no and instead install the older version of the package. If you want to install from source, you will need Rtools (Windows) or Xcode (Mac). Alternatively, you can wait a couple of days for the newer version to be pre-compiled.
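
If you are unsure which packages are already installed, the following sketch checks before installing. The helper missing_packages() is hypothetical (written for this example, not part of R), and the interactive() guard is a design choice of mine so that nothing is installed when the code is run from a non-interactive script.

```r
# Hypothetical helper: return the packages in pkgs that are not yet installed
missing_packages <- function(pkgs) {
  pkgs[!pkgs %in% rownames(installed.packages())]
}

needed     <- c("tidyverse", "gridExtra", "rmarkdown", "knitr", "hexbin")
to_install <- missing_packages(needed)
to_install   # empty if everything is already installed

# Install only what is missing, and only when running R interactively
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```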

4.3.2 Load packages

The installation only needs to be done once, but we will need to load the packages in every R session where we want to use them. To load the packages, use

library("dplyr")
library("tidyr")
library("ggplot2")

Alternatively, you can load the core tidyverse packages all at once.

library("tidyverse")
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4.9000     ✔ readr     2.1.5     
✔ forcats   1.0.0          ✔ stringr   1.5.1     
✔ ggplot2   3.5.1          ✔ tibble    3.2.1     
✔ lubridate 1.9.3          ✔ tidyr     1.3.1     
✔ purrr     1.0.2          
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
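
The conflicts above mean that, after loading dplyr, filter() refers to the dplyr version. A masked function is still available through the package::function notation. The sketch below uses only the stats package so that it runs without the tidyverse installed. The name movingAverage is introduced here for illustration.

```r
# stats::filter() applies a linear filter to a series;
# here it computes a 3-point moving average of 1, 2, ..., 5
movingAverage <- stats::filter(1:5, rep(1/3, 3))
movingAverage

# After library("dplyr"), filter() alone refers to dplyr::filter(), which
# keeps rows of a data frame. The stats version remains reachable through
# the stats:: prefix, as above.
```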

4.4 Getting help for R

4.4.1 Learning R

4.4.1.1 learnr

The Tutorial tab in RStudio is powered by the learnr package. Since it is built by RStudio, it will provide an introduction to the tidyverse approach to data science. Click on “Install learnr package” in the RStudio Tutorial tab or run the code below.

install.packages("learnr")

4.4.1.2 swirl

To learn R, you may want to try the swirl package. To install, use

install.packages("swirl")

After installation, use the following to get started

library("swirl")
swirl()

Also, the R for Data Science (R4DS) book and its second edition, R4DS (2e), are extremely helpful.

4.4.2 General help

As you work with R, there will be many times when you need to get help.

My basic approach is

  1. Use the help contained within R.
  2. Perform an internet search for an answer.
  3. Find somebody else who knows.

In all cases, knowing the R keywords, e.g. a function name, will be extremely helpful.

4.4.3 Help within R I

If you know the function name, then you can use ?<function>, e.g.

?mean

The structure of help is

  • Description: quick description of what the function does
  • Usage: the arguments, their order, and default values (if any)
  • Arguments: more thorough description about the arguments
  • Value: what the function returns
  • See Also: similar functions
  • Examples: examples of how to use the function
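
Parts of a help page can also be accessed directly from the command line. As a brief sketch using functions from base R and its default packages: args() shows the information in Usage, and example() runs the code in Examples.

```r
args(sd)            # arguments, their order, and default values (as in Usage)
formals(sd)$na.rm   # the default value of a single argument
example(mean)       # run the code from the Examples section of ?mean
```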

4.4.4 Help within R II

If you cannot remember the function name, then you can use help.search("<something>"), e.g.

help.search("mean")

Depending on how many packages you have installed, you will find a lot or a little here.
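
Two related tools may also help: ??mean is shorthand for help.search("mean"), and apropos() searches the names of the objects currently available rather than the help pages. A minimal sketch of apropos():

```r
apropos("mean")    # object names that contain "mean" (ignoring case)
apropos("^mean")   # a regular expression: names that start with "mean"
```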

4.4.5 Internet search for R help

I google for <something> R, e.g. 

calculate mean R

Some useful sites are

4.4.6 Getting help on ggplot2

The general R help can still be used, e.g.

?ggplot
?geom_point

However, it is often much more helpful to google for an answer, e.g.

geom_point 
ggplot2 line colors

The top hits will all have the code along with what the code produces.

4.4.7 Helpful sites

These sites all provide code. The first two also provide the plots that are produced.