This lesson is in the early stages of development (Alpha version)

Vectors and data types

Overview

Teaching: 15 min
Exercises: 20 min
Questions
  • What are data types in R?

Objectives
  • Describe four data types in R.

Vectors and data types

A vector is the most common and basic data type in R, and is pretty much the workhorse of R. A vector is composed of a series of values, such as numbers or characters. We can assign a series of values to a vector using the c() function. For example, we can create a vector of animal weights and assign it to a new object weight_g:

weight_g <- c(50, 60, 65, 82)
weight_g
[1] 50 60 65 82

A vector can also contain characters:

animals <- c("mouse", "rat", "dog")
animals
[1] "mouse" "rat"   "dog"  

The quotes around “mouse”, “rat”, etc. are essential here. Without the quotes, R will assume that mouse, rat and dog are the names of objects that have already been created. As these objects don’t exist in R’s memory, R will return an error message.
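To see that error without stopping a script, the unquoted version can be wrapped in tryCatch(). This is a minimal sketch that assumes a fresh R session with no objects named mouse, rat or dog; the name result is only illustrative.

```r
animals <- c("mouse", "rat", "dog")  # quoted: three character values

# Unquoted names are looked up as objects. tryCatch() catches the
# resulting error and returns its message instead of halting.
result <- tryCatch(
  c(mouse, rat, dog),
  error = function(e) conditionMessage(e)
)
print(result)  # the message reports that object 'mouse' was not found
```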

There are many functions that allow you to inspect the content of a vector. length() tells you how many elements are in a particular vector:

length(weight_g)
[1] 4
length(animals)
[1] 3

An important feature of a vector is that all of its elements are the same type of data. The function class() indicates what kind of object you are working with:

class(weight_g)
[1] "numeric"
class(animals)
[1] "character"

The function str() provides an overview of the structure of an object and its elements. It is a useful function when working with large and complex objects:

str(weight_g)
 num [1:4] 50 60 65 82
str(animals)
 chr [1:3] "mouse" "rat" "dog"

You can use the c() function to add other elements to your vector:

weight_g <- c(weight_g, 90) # add to the end of the vector
weight_g <- c(30, weight_g) # add to the beginning of the vector
weight_g
[1] 30 50 60 65 82 90

In the first line, we take the original vector weight_g, add the value 90 to the end of it, and save the result back into weight_g. Then we add the value 30 to the beginning, again saving the result back into weight_g.

We can do this over and over again to grow a vector or assemble a dataset. As we program, this can be a useful way to add results that we are collecting or calculating.
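As a sketch of collecting results this way, the loop below appends one calculated value per iteration with c(). The vector name squares and the calculation are hypothetical, chosen only to illustrate the pattern.

```r
squares <- c()                # start empty (c() with no arguments is NULL)
for (i in 1:5) {
  squares <- c(squares, i^2)  # append the new result to the end
}
squares
# [1]  1  4  9 16 25
```

For long loops, creating the vector at its full length first (for example with numeric(5)) and filling it by position is usually faster, because each c() call copies the whole vector.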

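An atomic vector is the simplest R data type and is a one-dimensional vector of a single type. Above, we saw two of the six main atomic vector types that R uses: "character" and "numeric" (or "double"). These are the basic building blocks that all R objects are built from. The other four atomic vector types are:

  • "logical" for TRUE and FALSE (the Boolean data type)
  • "integer" for whole numbers (e.g., 2L; the L suffix tells R that it is an integer)
  • "complex" for complex numbers with real and imaginary parts (e.g., 1 + 4i)
  • "raw" for raw bytes, which we won't discuss further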

You can check the type of your vector using the typeof() function, passing your vector as the argument.
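The sketch below calls typeof() on one small example of each of the six atomic types. The list name examples is illustrative. Note that class() reports "numeric" for a double vector, while typeof() reports the underlying storage type "double".

```r
examples <- list(
  double    = c(50, 60, 65, 82),
  character = c("mouse", "rat", "dog"),
  logical   = c(TRUE, FALSE),
  integer   = c(2L, 4L),          # the L suffix marks integer values
  complex   = c(1 + 4i),
  raw       = as.raw(c(1, 255))   # raw bytes
)
types <- sapply(examples, typeof)  # apply typeof() to each element
types

class(c(50, 60, 65, 82))  # "numeric", although typeof() gives "double"
```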

Vectors are one of the many data structures that R uses. Other important ones are lists (list), matrices (matrix), data frames (data.frame), factors (factor) and arrays (array).
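Each of these structures has its own constructor function. The following is a minimal sketch with made-up values and illustrative object names, showing one of each and checking its class.

```r
m  <- matrix(1:6, nrow = 2)            # 2 rows x 3 columns
a  <- array(1:24, dim = c(2, 3, 4))    # a 3-dimensional array
l  <- list("mouse", 50, TRUE)          # elements may differ in type
df <- data.frame(animal = c("mouse", "rat"), weight = c(50, 60))
f  <- factor(c("low", "high", "low"))  # categorical values

class(m)   # "matrix" "array" (in R 4.0.0 and later)
class(a)   # "array"
class(l)   # "list"
class(df)  # "data.frame"
class(f)   # "factor"
levels(f)  # "high" "low": levels are sorted alphabetically by default
```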

Data Structure: First Steps in R by Maite Ceballos and Nicolás Cardiel. 2013. https://web.archive.org/web/20200621022950/http://venus.ifca.unican.es/Rintro/dataStruct.html

Notice that vectors are one-dimensional containers for data that are all of the same type: all character, all numeric, all logical, and so on. No mixing of data types is permitted. A data frame is a two-dimensional data structure. Each column of a data frame is a vector, so all elements in a given column must be of the same data type. Lists, like vectors, are one-dimensional; however, they permit mixing of data types. Each row of a data frame can be thought of as a list: a one-dimensional mix of different kinds of data.
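These points can be checked directly. In this sketch (object names are illustrative), each data frame column reports a single type, a list keeps each element's own type, and c() with the same values coerces everything to one type. stringsAsFactors = FALSE is set explicitly because R versions before 4.0.0 turned text columns into factors by default.

```r
df <- data.frame(animal = c("mouse", "rat", "dog"),
                 weight = c(50, 60, 65),
                 stringsAsFactors = FALSE)
class(df$animal)   # "character": each column is a vector
class(df$weight)   # "numeric"

mixed <- list("mouse", 50, TRUE)  # a list keeps each element's type
sapply(mixed, class)              # "character" "numeric" "logical"

same <- c("mouse", 50, TRUE)      # a vector must share one type
class(same)                       # "character"
```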

Here is a summary table of R data structures, their dimensions, and the kind of data they permit.

          Homogeneous      Heterogeneous
    1d    Atomic vector    List
    2d    Matrix           Data frame
    nd    Array

Advanced R: Data Structures by Hadley Wickham. http://adv-r.had.co.nz/Data-structures.html
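The dimensions in the summary table can be inspected with dim(), and homogeneity can be seen in how a matrix coerces mixed input. This is a small sketch with illustrative names and values.

```r
v <- c(50, 60, 65)
m <- matrix(1:6, nrow = 2)
a <- array(1:24, dim = c(2, 3, 4))

dim(v)     # NULL: a plain vector has no dim attribute
length(v)  # 3, so use length() for vectors
dim(m)     # 2 3
dim(a)     # 2 3 4

# A matrix is homogeneous: mixed input is coerced to one type.
hm <- matrix(c(1, "a", TRUE, 2), nrow = 2)
typeof(hm)  # "character"
```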

Exercise 1

We’ve seen that atomic vectors can be of type character, numeric (or double), integer, and logical. But what happens if we try to mix these types in a single vector?

Solution

R implicitly converts them all to the same type.

Exercise 2

What will happen in each of these examples? (Hint: use class() to check the data type of your objects.)

    num_char <- c(1, 2, 3, "a")  
    num_logical <- c(1, 2, 3, TRUE)   
    char_logical <- c("a", "b", "c", TRUE)    
    tricky <- c(1, 2, 3, "4")    

Why do you think it happens?

Solution

Vectors can be of only one data type. R tries to convert (coerce) the content of this vector to find a “common denominator” that doesn’t lose any information.
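Running the four examples and inspecting the results shows the coercion at work. The comments describe what R does to each vector's values.

```r
num_char     <- c(1, 2, 3, "a")
num_logical  <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky       <- c(1, 2, 3, "4")

class(num_char)      # "character": 1, 2, 3 become "1", "2", "3"
class(num_logical)   # "numeric": TRUE becomes 1
class(char_logical)  # "character": TRUE becomes "TRUE"
class(tricky)        # "character": "4" is text, so 1, 2, 3 become text
num_logical          # [1] 1 2 3 1
```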

Exercise 3

How many values in combined_logical are "TRUE" (as a character) in the following example, which reuses the two ..._logical vectors from above?

    combined_logical <- c(num_logical, char_logical)

Solution

Only one. There is no memory of past data types: the coercion happens when each vector is created. Therefore, the TRUE in num_logical is converted into 1 when num_logical is created, and that 1 then becomes "1" in combined_logical.
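The answer can be confirmed by counting the elements equal to the string "TRUE"; comparing with == gives a logical vector, and sum() counts its TRUE values.

```r
num_logical  <- c(1, 2, 3, TRUE)        # TRUE is coerced to 1 here
char_logical <- c("a", "b", "c", TRUE)  # TRUE is coerced to "TRUE" here
combined_logical <- c(num_logical, char_logical)

combined_logical[4]              # "1": the former TRUE from num_logical
sum(combined_logical == "TRUE")  # 1
```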

Exercise 4

You’ve probably noticed that objects of different types get converted into a single, shared type within a vector. In R, we call converting objects from one class into another class coercion. These conversions happen according to a hierarchy, whereby some types get preferentially coerced into other types. Can you draw a diagram that represents the hierarchy of how these data types are coerced?

Solution

logical → numeric → character ← logical
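Each arrow in the diagram can be tested by combining two values and checking the resulting type. This sketch also separates integer from double (which the diagram folds into numeric) and adds complex, which sits between double and character.

```r
typeof(c(TRUE, 2L))  # "integer":   logical -> integer
typeof(c(2L, 3.5))   # "double":    integer -> double (numeric)
typeof(c(3.5, 1i))   # "complex":   double  -> complex
typeof(c(3.5, "a"))  # "character": double  -> character
typeof(c(TRUE, "a")) # "character": logical -> character directly
c(TRUE, "a")         # "TRUE" "a" (not "1")
```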

Key Points