Basics in R

Dates

x <- "02/10/2017"
x
## [1] "02/10/2017"
x <- as.Date(x, format = "%d/%m/%Y")
x
## [1] "2017-10-02"

Date components

Day

  • %a Abbreviated day of week
  • %A Full day of week
  • %d Numeric day of the month (01-31)
  • %e Numeric day of month with leading space for single digits
  • %u Numeric day of week, Monday = 1

Month

  • %b Abbreviated month
  • %B Full month name
  • %m Numeric month (01-12)

Year

  • %y Year without century
  • %Y Year with century

These codes can be passed to format() to display a date in any layout:

format(x, "%a")
## [1] "Mon"
format(x, "%A, %d %B %Y")
## [1] "Monday, 02 October 2017"
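
As a further sketch of the format codes listed above, the snippet below uses only numeric codes so that the results do not depend on the locale. The variable names d and us are illustrative only. It also shows why the format string matters: the same text read month-first yields a different date.

```r
# ISO dates (YYYY-MM-DD) are parsed without an explicit format
d <- as.Date("2017-10-02")

format(d, "%d/%m/%y")         # "02/10/17" (two-digit year)
format(d, "%u")               # "1" (Monday = 1)
as.numeric(format(d, "%Y"))   # 2017, as a number rather than text

# The same string read as month/day/year is a different date
us <- as.Date("02/10/2017", format = "%m/%d/%Y")
format(us, "%Y-%m-%d")        # "2017-02-10"
```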

Logical operators

x <- 5
x == 5
## [1] TRUE
x < 1
## [1] FALSE
x > 2
## [1] TRUE
x < 5
## [1] FALSE
x <= 5
## [1] TRUE
x %in% c(1, 2, 3, 4, 5)
## [1] TRUE
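
The comparisons above each return TRUE or FALSE. As a small sketch, they can be combined with R's logical operators & (and), | (or) and ! (not), and %in% also works element by element on a vector:

```r
x <- 5

x > 2 & x < 10    # TRUE: both conditions hold
x < 1 | x == 5    # TRUE: at least one condition holds
!(x == 5)         # FALSE: negation
x != 5            # FALSE: "not equal", same result as the line above

# %in% is vectorised: one result per element on the left-hand side
c(1, 5, 9) %in% c(1, 2, 3, 4, 5)   # TRUE TRUE FALSE
```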

Data types

Data frames

Data frames are the bread-and-butter data type where you’ll do most of your work. They hold rectangular data in which each row is an observation and each column is a variable. Technically, a data frame is a list of equal-length vectors.

Making a data frame

x <- data.frame(id = c(1,2,3,4,5),
                x_var = c(4,2,9,4,7), 
                height = c("Tall", "Short", "Medium", "Tall", "Short"), 
                stringsAsFactors = FALSE)
x
##   id x_var height
## 1  1     4   Tall
## 2  2     2  Short
## 3  3     9 Medium
## 4  4     4   Tall
## 5  5     7  Short
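
The statement that a data frame is a list of equal-length vectors can be checked directly. The sketch below uses a small illustrative data frame named df (not part of the example above):

```r
df <- data.frame(id = c(1, 2, 3),
                 height = c("Tall", "Short", "Medium"),
                 stringsAsFactors = FALSE)

is.list(df)           # TRUE: a data frame is a list
length(df)            # 2: one list element per column
nrow(df)              # 3: the common length of the columns
sapply(df, length)    # id and height both have length 3

# Columns can be pulled out like list elements
identical(df$height, df[["height"]])   # TRUE
```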

Accessing elements of a data frame

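Square-bracket indexing is used to access elements of a data frame, in the form x[row, column]. Leaving the row or column position blank selects all rows or all columns.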

# First row
x[1, ]
##   id x_var height
## 1  1     4   Tall
# First column
x[ , 1]
## [1] 1 2 3 4 5
# Second row, third col
x[2, 3]
## [1] "Short"
# all cols for 'short' observations
x[x$height == "Short", ]
##   id x_var height
## 2  2     2  Short
## 5  5     7  Short
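
Indexing also works with column names and with combined conditions. The following is a minimal sketch on a copy of the data above; the names df, sub and tall_ids are illustrative only:

```r
df <- data.frame(id = c(1, 2, 3, 4, 5),
                 x_var = c(4, 2, 9, 4, 7),
                 height = c("Tall", "Short", "Medium", "Tall", "Short"),
                 stringsAsFactors = FALSE)

# A single column by name, returned as a vector
df$x_var

# Rows where x_var exceeds 3, keeping only two named columns
sub <- df[df$x_var > 3, c("id", "height")]
sub

# Two conditions combined with &: ids of tall observations with x_var > 3
tall_ids <- df[df$height == "Tall" & df$x_var > 3, "id"]
tall_ids   # 1 4
```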

Missing values

Missing values are represented as NA in R. These are handled slightly differently to missing values in other statistical packages: any comparison involving NA returns NA rather than TRUE or FALSE, so use is.na() to test for missing values.

x <- NA
class(x)
## [1] "logical"
x < 1
## [1] NA
x > 1
## [1] NA
is.na(x)
## [1] TRUE
dat <- data.frame(x = c(1,2, NA), y = c("a", "b", "c"))
dat[is.na(dat$x), ]
##    x y
## 3 NA c
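
Because NA propagates through comparisons and arithmetic, summaries of a vector containing NA are themselves NA unless the missing values are dropped. A short sketch, using an illustrative vector v:

```r
v <- c(1, 2, NA)

v == NA                  # NA NA NA: even comparing with NA gives NA
mean(v)                  # NA: the missing value propagates
mean(v, na.rm = TRUE)    # 1.5: missing values removed first
sum(is.na(v))            # 1: number of missing values
v[!is.na(v)]             # 1 2: keep only the non-missing values
```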

Factors

Text can be stored in two ways: as character strings or as factors. When sorted, character strings are ordered alphabetically. Factors, however, are ordered as set by the levels of the factor.

dat <- data.frame(x = c("triangle", "triangle", "triangle", "square", "square",
                        "circle"), 
                  stringsAsFactors = FALSE)

table(dat$x) # alphabetical sorting
## 
##   circle   square triangle 
##        1        2        3
dat$x <- factor(dat$x, levels = c("square", "circle", "triangle"))
table(dat$x) # sorting as determined by levels of factor.
## 
##   square   circle triangle 
##        2        1        3

Setting the levels of a factor therefore controls the order in which groups appear in tables and plots.
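
The difference between alphabetical and level-based ordering can be seen with sort(). This is a small sketch using illustrative vectors shapes and f:

```r
shapes <- c("triangle", "square", "circle", "square")

# Character strings sort alphabetically
sort(shapes)             # "circle" "square" "square" "triangle"

# Factors sort by the order of their levels
f <- factor(shapes, levels = c("square", "circle", "triangle"))
levels(f)                # "square" "circle" "triangle"
as.character(sort(f))    # "square" "square" "circle" "triangle"

# Internally each value is stored as the position of its level
as.integer(f)            # 3 1 2 1
```

A call such as barplot(table(f)) would draw the bars in the same level order.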