2.1 Vectors

The vector is the basic unit of information in R. Pretty much everything else we’ll concern ourselves with is made of vectors and can be contained within one. Wow, what an existential paradox that is.

Let’s take a look at how this works and why it matters. Here, we have defined a as a variable with the value of 1.

a <- 1

…or have we?

print(a)
## [1] 1

What is the square bracket in the output here? It’s an index. The index is telling us that the first element of a is 1. This means that a is actually a “vector”, not a “scalar” or singular value as you may have been thinking about it. You can think of a vector as a column in an Excel spreadsheet or an analogous data table. By treating every object (loosely) as a vector, or an element thereof, the language becomes much more general.

So, even if we define something with a single value, it is still just a vector with one element. For us, this is important because of the way that it lets us do math. It makes vector operations so easy that we don’t even need to think about them when we start to make statistical models. It makes working through the math a zillion times easier than on paper! In terms of programming, it can make a lot of things easier, too.
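
To make the point about easy vector math concrete, here is a minimal sketch. The vector x is a hypothetical example (it is not one of the objects used elsewhere in this section), and c() simply combines the values into one vector.

```r
# x is a hypothetical example vector used only for illustration
x <- c(1, 2, 3, 4, 5)

# Arithmetic is applied to every element at once, no loop required
doubled <- x * 2

# Two vectors of the same length are combined element by element
summed <- x + x

print(doubled)
## [1]  2  4  6  8 10
print(summed)
## [1]  2  4  6  8 10
```

On paper we would have to repeat the same calculation five times. Here, one line does it.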

An atomic vector is a vector that can hold one and only one kind of data. These can include:

  • Character
  • Numeric
  • Integer
  • Logical
  • Factor
  • Date/time

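And some others, but none with which we’ll concern ourselves here. Strictly speaking, factors and date/times are not atomic types of their own: R builds them on top of integer and numeric vectors. For our purposes, we can treat them as their own kinds of data.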
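
As a rough sketch of what these kinds of data look like in practice, the code below makes one small example of each and asks R what it is. All of the object names and values here are hypothetical and used only for illustration. class() reports how R treats an object, and typeof() reports how it is stored.

```r
# Hypothetical one-element examples of each kind of vector listed above
chr <- "trout"
num <- 2.5
int <- 2L                      # the L suffix asks R for an integer
lgl <- TRUE
fct <- factor("trout")
dat <- as.Date("2020-01-31")

# class() reports how R treats each object
class(chr)   # "character"
class(num)   # "numeric"
class(int)   # "integer"
class(lgl)   # "logical"
class(fct)   # "factor"
class(dat)   # "Date"

# typeof() reports the underlying storage
typeof(fct)  # "integer"
typeof(dat)  # "double"
```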

If you are ever curious about what kind of object you are working with, you can find out by exposing the data structure with str():

str(a)
##  num 1

Let’s go play with some!

Examples of atomic vectors follow. Run the code to see what it does.

Integers and numerics

First, we demonstrate one way to make a vector in R. The c() function (“combine”) is our friend here for the quick-and-dirty approach.

In this case, we are making an object that contains a sequence of whole numbers, or integers.

# Make a vector of integers 1-5
a <- c(1, 2, 3, 4, 5)

# One way to look at our vector
print(a)

Here is another way to make the same vector, but we need to pay attention to how R sees the data type. A closer look shows that both methods produce a numeric vector (num) rather than an integer vector (int). For the most part, this won’t make a huge difference, but it can become important when writing or debugging statistical models.

# Define the same vector using a sequence
a <- seq(from = 1, to = 5, by = 1)
str(a)
##  num [1:5] 1 2 3 4 5

We can change this by explicitly telling R how to build our vector:

a <- as.vector(x = seq(1, 5, 1), mode = "integer")

Notice that I did not include the argument names in the call to seq(). The first three arguments (from, to, and by) are so commonly used that we can let R match them by position. You can find out what they are by running ?seq.
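
The sketch below checks that named and positional arguments give the same result in seq(). It also uses the colon operator (1:5), which is not introduced in the text above, as a shortcut that returns integers. The object names are hypothetical.

```r
# Named and positional arguments give the same result
named <- seq(from = 1, to = 5, by = 1)
positional <- seq(1, 5, 1)
identical(named, positional)
## [1] TRUE

# Both are stored as numeric vectors
str(positional)

# The colon operator returns an integer vector instead
shortcut <- 1:5
str(shortcut)

# Same values, but not the same storage type
all(positional == shortcut)
## [1] TRUE
identical(positional, shortcut)
## [1] FALSE
```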

Characters and factors

Characters are anything that is represented as text strings. If I want to make a vector of character strings, I need to enclose the elements in quotes like I do below. Otherwise, R will go look for objects with these names.

b <- c("a", "b", "c", "d", "e") # Make a character vector
b # Print it to the console
## [1] "a" "b" "c" "d" "e"
str(b) # Now it's a character vector
##  chr [1:5] "a" "b" "c" "d" "e"

They are readily converted (sometimes automatically) to factors:

b <- as.factor(b) # But we can change if we want
b
## [1] a b c d e
## Levels: a b c d e
str(b) # Look at the data structure
##  Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5

Factors are a special kind of data type in R that we may run across from time to time. They have levels that can be ordered numerically. By default, R assigns factor levels (1, 2, 3 …) in alphanumeric order, not by the order in which levels first appear in the data. This is not important except that it becomes useful for coding variables used in statistical models. But even then R does most of this behind the scenes and we won’t have to worry about it for the most part. In fact, in a lot of cases we will want to change factors to numerics or characters so they are easier to manipulate.
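
Here is a minimal sketch of how factor levels get ordered. The vector sizes is hypothetical and used only for illustration. The levels argument of factor() is one way to set the order ourselves when the default is not what we want.

```r
# sizes is a hypothetical character vector used only for illustration
sizes <- c("small", "large", "medium", "small")

# Default: levels are sorted, not taken in order of appearance
sizes_default <- factor(sizes)
levels(sizes_default)
## [1] "large"  "medium" "small"

# Supplying levels lets us set the order ourselves
sizes_ordered <- factor(sizes, levels = c("small", "medium", "large"))
levels(sizes_ordered)
## [1] "small"  "medium" "large"

# The integer codes behind each factor follow the level order
as.integer(sizes_default)
## [1] 3 1 2 3
as.integer(sizes_ordered)
## [1] 1 3 2 1
```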

This is what it looks like when we code a factor as a number:

as.numeric(b)
# What did that do?
?as.numeric
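
One thing to watch for when changing factors to numerics: as.numeric() returns the level codes (1, 2, 3 …), not the labels. The sketch below uses a hypothetical factor, yrs, whose labels happen to look like numbers, and shows one common way to recover them by converting to character first.

```r
# yrs is a hypothetical factor whose labels look like numbers
yrs <- factor(c("2010", "2005", "2010", "1999"))

# as.numeric() returns the level codes, not the years
as.numeric(yrs)
## [1] 3 2 3 1

# Converting to character first recovers the labels as numbers
yrs_num <- as.numeric(as.character(yrs))
yrs_num
## [1] 2010 2005 2010 1999
```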

Aside: we can ask R what functions mean by adding a question mark as we do above in a couple of instances. And not just functions: we can ask it about pretty much any built-in object. The help pages take a little getting used to, but once you get the hang of it… In the meantime, the internet is your friend and you will find a multitude of online groups and forums with a quick search.

Logical vectors

Most of the logical vectors we deal with are yes/no or comparisons to determine whether a given piece of information matches a condition. Here, we use a logical check to see if the object a we created earlier is the same as object b. If we store the results of this check to a new object c, we get a new logical vector filled with TRUE and FALSE, one for each element in a and b.

# The "==" compares the numeric vector to the factor one
c <- a == b
c
## [1] FALSE FALSE FALSE FALSE FALSE
str(c)
##  logi [1:5] FALSE FALSE FALSE FALSE FALSE
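
Every element is FALSE here because a holds numbers and b holds the labels "a" through "e", so no pair matches. For contrast, here is a small sketch with two hypothetical numeric vectors, p and q, where some elements match and some do not.

```r
# p and q are hypothetical vectors used only for illustration
p <- c(1, 2, 3, 4, 5)
q <- c(1, 0, 3, 0, 5)

# The comparison is made one pair of elements at a time
same <- p == q
same
## [1]  TRUE FALSE  TRUE FALSE  TRUE
str(same)
```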

We now have a logical vector. For the sake of demonstration, we could perform any number of logical checks on a vector using built-in R functions (it does not need to be a logical like c above).

We can check for missing values.

is.na(a)
## [1] FALSE FALSE FALSE FALSE FALSE

We can make sure that all values are finite.

is.finite(a)
## [1] TRUE TRUE TRUE TRUE TRUE

The exclamation point (!) means “not” in R, as it does in many programming languages.

!is.na(a)
## [1] TRUE TRUE TRUE TRUE TRUE

We can see if specific elements meet a criterion.

a == 3
## [1] FALSE FALSE  TRUE FALSE FALSE
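
Logical vectors like this one are useful for more than looking at. As a hedged sketch, the code below uses a hypothetical vector v to count how many elements meet a criterion, find where they are, and pull them out with square brackets (which are not covered in the text above).

```r
# v is a hypothetical vector used only for illustration
v <- c(4, 8, 15, 16, 23, 42)

# Check each element against a criterion
big <- v > 15

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches
sum(big)
## [1] 3

# which() returns the positions of the TRUE elements
which(big)
## [1] 4 5 6

# A logical vector inside square brackets keeps only the TRUE elements
v[big]
## [1] 16 23 42
```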

We can just look at unique values.

unique(b)
## [1] a b c d e
## Levels: a b c d e

The examples above are all simple vector operations. These form the basis for data manipulation and analysis in R.