Basics in R
Dates
x <- "02/10/2017"
x
## [1] "02/10/2017"
x <- as.Date(x, format = "%d/%m/%Y")
x
## [1] "2017-10-02"
Date components: Day
%a  Abbreviated day of week
%A  Full day of week
%d  Numeric day of the month (01-31)
%e  Numeric day of month with leading space for single digits
%u  Numeric day of week, Monday = 1
Month
%b  Abbreviated month
%B  Full month name
%m  Numeric month (01-12)
Year
%y  Year without century
%Y  Year with century
These codes can be combined in format() to print a date in any layout:
format(x, "%a")
## [1] "Mon"
format(x, "%A, %d %B %Y")
## [1] "Monday, 02 October 2017"
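The same codes can pull out individual components. The sketch below uses only the numeric codes, because day and month names depend on the locale; the variable names d and bad are chosen here for illustration.

```r
# Parse a day/month/year string, as in the example above
d <- as.Date("02/10/2017", format = "%d/%m/%Y")

# Numeric components do not depend on the locale
format(d, "%d")   # day of the month
## [1] "02"
format(d, "%m")   # month
## [1] "10"
format(d, "%y")   # year without century
## [1] "17"
format(d, "%u")   # day of week, Monday = 1
## [1] "1"

# format() returns character strings; convert when a number is needed
as.integer(format(d, "%Y"))
## [1] 2017

# A string that does not match the format gives NA rather than an error
bad <- as.Date("2017-10-02", format = "%d/%m/%Y")
is.na(bad)
## [1] TRUE
```

Checking for NA after as.Date() is a cheap way to catch rows whose dates were written in an unexpected layout.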
Logical operators
x <- 5
x == 5
## [1] TRUE
x < 1
## [1] FALSE
x > 2
## [1] TRUE
x < 5
## [1] FALSE
x <= 5
## [1] TRUE
x %in% c(1, 2, 3, 4, 5)
## [1] TRUE
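Comparisons like these can be combined with & (and), | (or) and ! (not), and they work element by element on vectors. A minimal sketch, where the vector v is made up for illustration:

```r
x <- 5

# Combine comparisons: & is AND, | is OR, ! is NOT
x > 2 & x < 10
## [1] TRUE
x < 1 | x == 5
## [1] TRUE
!(x == 5)
## [1] FALSE
x != 5            # "not equal to"
## [1] FALSE

# Comparisons are vectorised: one result per element
v <- c(1, 5, 8)
v >= 5
## [1] FALSE  TRUE  TRUE
v %in% c(5, 8, 13)
## [1] FALSE  TRUE  TRUE
```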
Data types
Data frames
Data frames are the bread-and-butter data type in which you’ll do most of your work.
They are rectangular data where each row forms an observation and each column is a variable.
Technically, a dataframe is a list of equal-length vectors.
Making a dataframe
x <- data.frame(id = c(1, 2, 3, 4, 5),
                x_var = c(4, 2, 9, 4, 7),
                height = c("Tall", "Short", "Medium", "Tall", "Short"),
                stringsAsFactors = FALSE)
x
##   id x_var height
## 1  1     4   Tall
## 2  2     2  Short
## 3  3     9 Medium
## 4  4     4   Tall
## 5  5     7  Short
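The statement that a data frame is a list of equal-length vectors can be checked directly. This is a small sketch with a throwaway data frame named df (a name chosen here, not one used elsewhere in this document):

```r
df <- data.frame(id = c(1, 2, 3),
                 height = c("Tall", "Short", "Medium"),
                 stringsAsFactors = FALSE)

is.list(df)         # a data frame is a list under the hood
## [1] TRUE
length(df)          # number of list elements, i.e. columns
## [1] 2
dim(df)             # rows, then columns
## [1] 3 2
sapply(df, class)   # each column keeps its own type
```

Because each column is its own vector, one data frame can mix numeric and character columns, which a matrix cannot do.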
Accessing elements of a dataframe
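Square-bracket indexing, written as x[row, column], is used to access elements of a dataframe. Leaving the row or column position empty selects all rows or all columns.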
# First row
x[1, ]
##   id x_var height
## 1  1     4   Tall
# First column
x[ , 1]
## [1] 1 2 3 4 5
# Second row, third col
x[2, 3]
## [1] "Short"
# All columns for the rows where height is "Short"
x[x$height == "Short", ]
##   id x_var height
## 2  2     2  Short
## 5  5     7  Short
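Columns can also be selected by name, which is usually easier to read than a position. The sketch below rebuilds the same data under the name df so that it runs on its own:

```r
df <- data.frame(id = c(1, 2, 3, 4, 5),
                 x_var = c(4, 2, 9, 4, 7),
                 height = c("Tall", "Short", "Medium", "Tall", "Short"),
                 stringsAsFactors = FALSE)

# A single column as a vector, by name
df$x_var
## [1] 4 2 9 4 7
df[["x_var"]]
## [1] 4 2 9 4 7

# Several columns by name; the result is still a data frame
df[, c("id", "height")]

# Combine a row condition with a column selection
df[df$x_var > 3, "id"]
## [1] 1 3 4 5

# drop = FALSE keeps a one-column result as a data frame
df[, "id", drop = FALSE]
```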
Missing values
Missing values are represented as NA in R.
These are handled slightly differently to missing values in other statistical packages.
Any comparison involving NA returns NA rather than TRUE or FALSE, so use is.na() to test for missing values.
x <- NA
class(x)
## [1] "logical"
x < 1
## [1] NA
x > 1
## [1] NA
is.na(x)
## [1] TRUE
dat <- data.frame(x = c(1,2, NA), y = c("a", "b", "c"))
# Rows where x is missing; is.na() already returns TRUE/FALSE
dat[is.na(dat$x), ]
##    x y
## 3 NA c
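Missing values also propagate through calculations. The following sketch, using a made-up vector v, shows the usual ways of dealing with that:

```r
v <- c(1, 2, NA)

NA == NA                # even NA compared with NA is unknown
## [1] NA
mean(v)                 # NA propagates through most calculations
## [1] NA
mean(v, na.rm = TRUE)   # drop missing values before averaging
## [1] 1.5
sum(is.na(v))           # count the missing values
## [1] 1
v[!is.na(v)]            # keep only the non-missing values
## [1] 1 2
```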
Factors
Text can be stored in two ways: as character strings or as factors. When sorted, character strings are ordered alphabetically. Factors, however, are ordered as set by the levels of the factor.
dat <- data.frame(x = c("triangle", "triangle", "triangle", "square", "square",
                        "circle"),
                  stringsAsFactors = FALSE)
table(dat$x) # alphabetical sorting
##
##   circle   square triangle
##        1        2        3
dat$x <- factor(dat$x, levels = c("square", "circle", "triangle"))
table(dat$x) # sorting as determined by levels of factor.
##
##   square   circle triangle
##        2        1        3
Setting the levels explicitly is therefore a way to control the order in which groups appear in tables and plots.
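The difference shows up in sort() as well as table(). A minimal sketch, with the names shapes and f chosen here for illustration:

```r
shapes <- c("triangle", "square", "circle", "square")

sort(shapes)            # character strings: alphabetical order
## [1] "circle"   "square"   "square"   "triangle"

f <- factor(shapes, levels = c("square", "circle", "triangle"))
levels(f)
## [1] "square"   "circle"   "triangle"
as.character(sort(f))   # factor: sorted in level order
## [1] "square"   "square"   "circle"   "triangle"

# Internally a factor stores integer codes that index into its levels
as.integer(f)
## [1] 3 1 2 1
```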