Vectors and data types
Overview
Teaching: 15 min
Exercises: 20 min
Questions
What are data types in R?
Objectives
Describe four data types in R.
Vectors and data types
A vector is the most common and basic data type in R, and is pretty much the workhorse of R. A vector is composed of a series of values, which can be either numbers or characters. We can assign a series of values to a vector using the c() function. For example, we can create a vector of animal weights and assign it to a new object weight_g:
weight_g <- c(50, 60, 65, 82)
weight_g
[1] 50 60 65 82
A vector can also contain characters:
animals <- c("mouse", "rat", "dog")
animals
[1] "mouse" "rat" "dog"
The quotes around "mouse", "rat", etc. are essential here. Without the quotes, R will assume that objects called mouse, rat and dog have been created. As these objects don't exist in R's memory, there will be an error message.
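To see the difference in practice, the sketch below wraps the unquoted call in tryCatch() so that the error message is captured as text rather than stopping the script. It assumes a fresh session in which no objects named mouse, rat or dog exist; err_msg is an illustrative name, not part of the lesson.

```r
# Quoted values are character strings, so this works.
animals <- c("mouse", "rat", "dog")

# Unquoted names are looked up as objects. Assuming no such objects exist,
# R signals an error, which tryCatch() captures as text here.
err_msg <- tryCatch(
  c(mouse, rat, dog),
  error = function(e) conditionMessage(e)
)
err_msg
```

The captured message names the first object that R failed to find.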
There are many functions that allow you to inspect the content of a vector. length() tells you how many elements are in a particular vector:
length(weight_g)
[1] 4
length(animals)
[1] 3
An important feature of a vector is that all of its elements are the same type of data. The function class() indicates what kind of object you are working with:
class(weight_g)
[1] "numeric"
class(animals)
[1] "character"
The function str() provides an overview of the structure of an object and its elements. It is a useful function when working with large and complex objects:
str(weight_g)
num [1:4] 50 60 65 82
str(animals)
chr [1:3] "mouse" "rat" "dog"
You can use the c() function to add other elements to your vector:
weight_g <- c(weight_g, 90) # add to the end of the vector
weight_g <- c(30, weight_g) # add to the beginning of the vector
weight_g
[1] 30 50 60 65 82 90
In the first line, we take the original vector weight_g, add the value 90 to the end of it, and save the result back into weight_g. Then we add the value 30 to the beginning, again saving the result back into weight_g.
We can do this over and over again to grow a vector, or assemble a dataset. As we program, this may be useful to add results that we are collecting or calculating.
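As a minimal sketch of collecting results this way, the loop below appends one computed value per iteration. The name squares and the calculation itself are illustrative assumptions, not part of the lesson's dataset.

```r
# Start from an empty numeric vector and append one result per iteration.
squares <- numeric(0)
for (i in 1:5) {
  squares <- c(squares, i^2)  # save the grown vector back into squares
}
squares
length(squares)
```

Each call to c() builds a new vector, so for very large results it is usually faster to create a vector of the final length first and fill it in.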
An atomic vector is the simplest R data type and is a linear vector of a single type. Above, we saw 2 of the 6 main atomic vector types that R uses: "character" and "numeric" (or "double"). These are the basic building blocks that all R objects are built from. The other 4 atomic vector types are:
"logical" for TRUE and FALSE (the boolean data type)
"integer" for integer numbers (e.g., 2L, the L indicates to R that it's an integer)
"complex" to represent complex numbers with real and imaginary parts (e.g., 1 + 4i) and that's all we're going to say about them
"raw" for bitstreams that we won't discuss further
You can check the type of your vector using the typeof() function and inputting your vector as the argument.
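For instance, the sketch below calls typeof() on the two vectors created earlier and on one small value of each remaining atomic type. The vectors are redefined so the block runs on its own, and as.raw() is used only to produce a raw value for the demonstration.

```r
weight_g <- c(50, 60, 65, 82)
animals <- c("mouse", "rat", "dog")

typeof(weight_g)   # "double"
typeof(animals)    # "character"
typeof(TRUE)       # "logical"
typeof(2L)         # "integer"
typeof(1 + 4i)     # "complex"
typeof(as.raw(1))  # "raw"
```

Note that class(weight_g) reports "numeric" while typeof(weight_g) reports "double": both describe the same vector.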
Vectors are one of the many data structures that R uses. Other important ones are lists (list), matrices (matrix), data frames (data.frame), factors (factor) and arrays (array).
Notice that vectors are one-dimensional containers for data that are all of the same type. The elements of a vector must all be of one type (for example character, numeric, logical or complex); mixing data types is not permitted. A data frame is a two-dimensional data structure. Each column of a data frame is a vector, so all elements in a column must be of the same data type. Lists, like vectors, are one-dimensional; however, they permit mixing of data types. Each row of a data frame behaves like a list: a one-dimensional mix of different kinds of data.
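The following sketch contrasts the three structures just described. The object names (weights, record, survey) and their values are illustrative assumptions; stringsAsFactors = FALSE is set explicitly so that the character column stays a character vector in any R version.

```r
# A vector holds one type only.
weights <- c(50, 60, 65)

# A list may mix types.
record <- list("mouse", 50, TRUE)

# A data frame: each column is a vector of a single type.
survey <- data.frame(
  animal = c("mouse", "rat", "dog"),
  weight_g = c(50, 60, 65),
  stringsAsFactors = FALSE
)

class(survey$weight_g)  # "numeric"
class(survey$animal)    # "character"
is.list(survey[1, ])    # TRUE: a single row is list-like and mixes types
sapply(record, class)   # "character" "numeric" "logical"
```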
Here is a summary table of R data structures, their dimensions, and the kind of data they permit.
Exercise 1
We’ve seen that atomic vectors can be of type character, numeric (or double), integer, and logical. But what happens if we try to mix these types in a single vector?
Solution
R implicitly converts them so that they are all of the same type.
Exercise 2
What will happen in each of these examples? (hint: use class() to check the data type of your objects):
num_char <- c(1, 2, 3, "a")
num_logical <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky <- c(1, 2, 3, "4")
Why do you think it happens?
Solution
Vectors can be of only one data type. R tries to convert (coerce) the content of this vector to find a “common denominator” that doesn’t lose any information.
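One way to check this is to run class() on each object from the exercise, as sketched below.

```r
num_char <- c(1, 2, 3, "a")
num_logical <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky <- c(1, 2, 3, "4")

class(num_char)      # "character": the numbers become "1", "2", "3"
class(num_logical)   # "numeric": TRUE becomes 1
class(char_logical)  # "character": TRUE becomes "TRUE"
class(tricky)        # "character": "4" stays text, so the numbers become text too
```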
Exercise 3
How many values in combined_logical are "TRUE" (as a character) in the following example (reusing the 2 ..._logicals from above):
combined_logical <- c(num_logical, char_logical)
Solution
Only one. There is no memory of past data types, and the coercion happens the first time the vector is evaluated. Therefore, the TRUE in num_logical gets converted into a 1 before it gets converted into "1" in combined_logical.
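The count can be confirmed with a short check. The two input vectors are redefined here so the block runs on its own.

```r
num_logical <- c(1, 2, 3, TRUE)         # numeric: TRUE is stored as 1
char_logical <- c("a", "b", "c", TRUE)  # character: TRUE is stored as "TRUE"

combined_logical <- c(num_logical, char_logical)
combined_logical
sum(combined_logical == "TRUE")  # 1
```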
Exercise 4
You’ve probably noticed that objects of different types get converted into a single, shared type within a vector. In R, we call converting objects from one class into another class coercion. These conversions happen according to a hierarchy, whereby some types get preferentially coerced into other types. Can you draw a diagram that represents the hierarchy of how these data types are coerced?
Solution
logical → numeric → character ← logical
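Each arrow in the diagram can be checked with a pair of values, as in the sketch below. The integer case is an addition not shown in the diagram: integer sits between logical and numeric (double).

```r
class(c(TRUE, 2L))   # "integer": logical is coerced to integer
class(c(TRUE, 2.5))  # "numeric": logical is coerced to numeric
class(c(2.5, "a"))   # "character": numeric is coerced to character
class(c(TRUE, "a"))  # "character": logical is coerced straight to character

c(TRUE, "a")         # "TRUE" "a"
```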
Key Points