2.1 Vectors
The vector is the basic unit of information in R. Pretty much everything else we’ll concern ourselves with is made of vectors and can be contained within one. Wow, what an existential paradox that is.
Let’s take a look at how this works and why it matters. Here, we have defined `a` as a variable with the value of `1`.
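The code for this step does not appear in this section. A minimal sketch, assuming the usual assignment arrow was used, might look like this:

```r
# Assign the value 1 to an object named a (assumed reconstruction)
a <- 1
```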
…or have we?
## [1] 1
What is the square bracket in the output here? It’s an index. The index is telling us that the first element of `a` is `1`. This means that `a` is actually a “vector”, not a “scalar” or singular value as you may have been thinking about it. You can think of a vector as a column in an Excel spreadsheet or an analogous data table. By treating every object (loosely) as a vector, or an element thereof, the language becomes much more general.
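To see the index at work, the sketch below prints `a` and then asks for its first element by position. The calls to `length()` and `is.vector()` are additions for illustration and are not part of the original example.

```r
a <- 1

# Printing a shows the index [1] in front of the first element
print(a)

# Ask for the first element by position
a[1]

# Even a single value is a vector with one element
length(a)
is.vector(a)
```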
So, even if we define something with a single value, it is still just a vector with one element. For us, this is important because of the way that it lets us do math. It makes vector operations so easy that we don’t even need to think about them when we start to make statistical models. It makes working through the math a zillion times easier than on paper! In terms of programming, it can make a lot of things easier, too.
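As a rough illustration of what “vector operations” means here, the sketch below does arithmetic on whole vectors at once. The objects `x` and `y` are hypothetical and exist only for this example.

```r
# A hypothetical vector of four values
x <- c(2, 4, 6, 8)

# Multiply every element by 10 in one step, no loop needed
x_times_ten <- x * 10
x_times_ten

# Add two vectors of the same length, element by element
y <- c(1, 1, 1, 1)
x_plus_y <- x + y
x_plus_y

# Summary functions take the whole vector at once, too
x_mean <- mean(x)
x_mean
```

The same line of code works whether the vector has one element or a million, which is a big part of why models are easy to write in R.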
An atomic vector is a vector that can hold one and only one kind of data. These can include:
- Character
- Numeric
- Integer
- Logical
- Factor
- Date/time
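And some others, but none with which we’ll concern ourselves here. Strictly speaking, factors and date/times are not atomic types of their own: R builds them on top of integer and numeric vectors. They still hold only one kind of data, so we treat them alongside the others here.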
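One way to convince yourself that a vector holds only one kind of data is to try mixing types. In the sketch below (the object names are made up for illustration), R quietly converts everything to a single type rather than keeping the mix.

```r
# Mixing a number, text, and a logical forces everything to character
mixed <- c(1, "a", TRUE)
str(mixed)

# Mixing numbers and logicals forces everything to numeric
num_logical <- c(1.5, TRUE, FALSE)
str(num_logical)
```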
If you are ever curious about what kind of object you are working with, you can find out by exposing the data structure with `str()`:
Let’s go play with some!
## num 1
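The `num 1` output above is what `str()` reports for a numeric vector holding the single value 1. A sketch that would produce it, assuming `a` is still the object defined at the start of the section:

```r
a <- 1

# Show the data type and contents of a
str(a)

# class() reports just the kind of object (an extra for illustration)
a_class <- class(a)
a_class
```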
Examples of atomic vectors follow. Run the code to see what it does:
Integers and numerics
First, we demonstrate one way to make a vector in R. The `c()` function (“combine”) is our friend here for the quick-and-dirty approach.
In this case, we are making an object that contains a sequence of whole numbers, or integers.
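The original code is not shown here, so take this as a sketch of the quick-and-dirty approach, assuming the vector holds the whole numbers 1 through 5 and is stored as `a`:

```r
# Combine five whole numbers into a single vector
a <- c(1, 2, 3, 4, 5)

# Print the vector to the console
a
```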
Here is another way to make the same vector, but we need to pay attention to how R sees the data type. A closer look shows that these methods produce a numeric vector (`num`) instead of an integer vector (`int`). For the most part, this one won’t make a huge difference, but it can become important when writing or debugging statistical models.
## num [1:5] 1 2 3 4 5
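A sketch of the second approach, assuming it used `seq()` with its `from`, `to`, and `by` arguments, followed by `str()` to take the closer look:

```r
# Build the same sequence with seq(), naming each argument
a <- seq(from = 1, to = 5, by = 1)

# Inspect the structure: R reports num, not int
str(a)
```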
We can change this by explicitly telling R how to build our vector:
Notice that I did not include the argument names in the call to `seq()`. Its first three arguments (`from`, `to`, and `by`) are used so commonly that we can leave the names off and let R match them by position.
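The exact call used here is not shown, so the first line below is only one plausible version: it wraps `seq()` in `as.vector()` and asks for integer mode. The two shortcuts after it are additions for illustration.

```r
# Explicitly ask for an integer vector; seq(1, 5, 1) matches from, to, by by position
a <- as.vector(x = seq(1, 5, 1), mode = "integer")
str(a)

# Two common shortcuts that also give integers
a_colon <- 1:5
a_convert <- as.integer(c(1, 2, 3, 4, 5))
```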
Characters and factors
Characters are anything that is represented as text strings.
## [1] "a" "b" "c" "d" "e"
## chr [1:5] "a" "b" "c" "d" "e"
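Output like the two lines above would come from something along these lines. The object name `b` is taken from the logical-vector example later in the section; the rest is an assumed reconstruction.

```r
# Make a character vector from five text strings
b <- c("a", "b", "c", "d", "e")

# Print it, then look at its structure
b
str(b)
```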
They are readily converted (sometimes automatically) to factors:
## [1] a b c d e
## Levels: a b c d e
## Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
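A sketch of the conversion, assuming `as.factor()` was the function used (`factor()` would behave the same way for this vector):

```r
b <- c("a", "b", "c", "d", "e")

# Convert the character vector to a factor
b <- as.factor(b)

# Print it, then look at its structure and its levels
b
str(b)
levels(b)
```

Notice in the `str()` output that each element is stored as a whole number (1 through 5) pointing to one of the five levels.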
Factors are a special kind of data type in R that we may run across from time to time. They have levels that can be ordered, and R stores each value as an integer code for its level. This is not important except that it becomes useful for coding variables used in statistical models. R does most of this behind the scenes and we won’t have to worry about it for the most part. In fact, in a lot of cases we will want to change factors to numerics or characters so they are easier to manipulate.
This is what it looks like when we code a factor as a number:
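The sketch below assumes the conversion was done with `as.numeric()`. The second half, with the hypothetical factor `f`, is an addition showing a common trap: when the labels themselves look like numbers, `as.numeric()` still returns the level codes, not the labels.

```r
b <- as.factor(c("a", "b", "c", "d", "e"))

# Each element becomes the integer code of its level
b_codes <- as.numeric(b)
b_codes

# Going back to text is done with as.character()
b_text <- as.character(b)

# A hypothetical factor whose labels look like numbers
f <- factor(c("10", "20", "10"))
f_codes <- as.numeric(f)                 # level codes: 1 2 1
f_values <- as.numeric(as.character(f))  # the labels as numbers: 10 20 10
```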
Aside: we can ask R what a function does by putting a question mark in front of its name (for example, `?as.numeric`). And not just functions: we can ask it about pretty much any built-in object. The help pages take a little getting used to, but once you get the hang of it… In the meantime, the internet is your friend and you will find a multitude of online groups and forums with a quick search.
Logical vectors
Most of the `logical` vectors we deal with are yes/no checks or comparisons that determine whether a given piece of information matches a condition. Here, we use a logical check to see if the object `a` we created earlier is the same as object `b`. If we store the results of this check to a new object `c`, we get a new logical vector filled with `TRUE` and `FALSE`, one for each element in `a` and `b`.
## [1] FALSE FALSE FALSE FALSE FALSE
## logi [1:5] FALSE FALSE FALSE FALSE FALSE
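A sketch of the check described above, assuming `a` is the integer vector 1 through 5 and `b` is the factor of letters from the previous examples. Every comparison comes back `FALSE` because none of the numbers in `a` matches the corresponding letter in `b`.

```r
a <- 1:5
b <- as.factor(c("a", "b", "c", "d", "e"))

# Compare a and b element by element and store the result.
# Naming an object c works (R can still find the c() function),
# but it is easy to confuse the two, so use with care.
c <- a == b

# Print the result, then look at its structure
c
str(c)
```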
We now have a logical vector. For the sake of demonstration, we could perform any number of logical checks on a vector using built-in R functions (it does not need to be a logical like `c` above).
We can check for missing values.
## [1] FALSE FALSE FALSE FALSE FALSE
We can make sure that all values are finite.
## [1] TRUE TRUE TRUE TRUE TRUE
The exclamation point `!` means “not” to computers.
## [1] TRUE TRUE TRUE TRUE TRUE
We can see if specific elements meet a criterion.
## [1] FALSE FALSE TRUE FALSE FALSE
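The calls behind the four outputs above are not shown. One set of checks that would give the same results, assuming they were run on the integer vector `a`, is sketched here. The names on the left are hypothetical.

```r
a <- 1:5

# Which elements are missing?
a_missing <- is.na(a)
a_missing

# Which elements are finite?
a_finite <- is.finite(a)
a_finite

# Which elements are NOT missing?
a_present <- !is.na(a)
a_present

# Which elements are equal to 3?
a_is_three <- a == 3
a_is_three
```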
We can just look at unique values.
## [1] a b c d e
## Levels: a b c d e
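The output above is what `unique()` returns for the factor `b`, where every value was already different. The second vector below, `d`, is a hypothetical one with repeats, added to make the effect easier to see.

```r
b <- as.factor(c("a", "b", "c", "d", "e"))

# Every value in b is already unique, so nothing is dropped
b_unique <- unique(b)
b_unique

# A hypothetical vector with repeated values
d <- c(1, 1, 2, 3, 3)
d_unique <- unique(d)
d_unique
```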
The examples above are all simple vector operations. These form the basis for data manipulation and analysis in R.