1.4 Programming conventions
Style and organization
Learning to write code will be easier if you bite the bullet early on and adopt some kind of organization that allows you to interact with it (read, write, run, stare aimlessly, debug) more efficiently.
There are a lot of different ways to write computer code. All of them are intended to increase efficiency and readability. Some rules are more hard-coded and program-specific than others. For example, students in this class will notice that none of my code goes beyond a certain vertical line in the editor. That is to make it so that people don’t have to scroll over to the right of the editor to see what I have written when I email them code. When I share code with students I tend to justify everything really far to the left because everyone works on tiny laptops with multiple windows open and none of them maximized [shudders].
I suppose there is no “right” way to edit your code, but it will make your life easier if you find a style you like and stick to those conventions. If you are the kind of person who needs order in your life, you can check out the tidyverse style guide for tips. If you’re thinking that may be a lot of work to remember on the front-end, you can check code style with the lintr package or interactively re-style your code with the styler package.
Regardless of how you end up styling your code, here are a few helpful hints that ought to help you get comfortable with your keyboard. I guess these are probably generally applicable to programming and not specific to R.
Some handy coding tips
Get to know your keyboard and your speed keys for code execution and completion. Use the mouse to navigate the GUI, not to write code. Here is a fairly comprehensive list of speed-key combinations for all of the major operating systems from the RStudio website. You don’t need to know them all, but learning a few can save you a ton of time.
File management is wicked important. This is probably one of the primary struggles folks have with starting to learn R and other languages. At the same time, it is a big part of the secret sauce behind good programming. For this class, I will assume that you are working out of a single working directory (call it something like “quant_bio” or “biol217”). That means I will assume your scripts (.R files) for each chapter are in the same folder on your computer as the folder that contains your data.
An example of your class folder might look like this:
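As a rough sketch only, here is one way such a folder could be mocked up and inspected from R. The folder name, the "data" subfolder, and the file names are all made up for illustration, and everything is built in a temporary folder so nothing on your computer is touched:

```r
# Hypothetical class folder, created inside R's temporary directory
class_dir <- file.path(tempdir(), "biol217")
dir.create(file.path(class_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# Empty placeholder files standing in for chapter scripts and a data file
file.create(file.path(class_dir, "chapter_01.R"))
file.create(file.path(class_dir, "chapter_02.R"))
file.create(file.path(class_dir, "data", "example_data.csv"))

# List everything in the class folder, including the data subfolder
contents <- list.files(class_dir, recursive = TRUE)
print(contents)
```

With a layout like this, a script in the class folder can reach its data with a short relative path such as "data/example_data.csv".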
Save early and often. In general, RStudio is really good about keeping track of things for you, and it is more and more foolproof these days. However, there are still times when it will crash and there is nothing you can do to get your work back unless it has been saved to a file. So, whenever you write code, write it in a source file that is saved in a place you know you can find it. It is the first thing I do when I start a script, and the last thing I do before I run any code.
Please go check out the supplemental materials on the course website or check out the YouTube video linked above for more help getting started in R if you have no idea what I am talking about at this point.
Commenting code is helpful, and I will require that you do it, at least to start. Comments are a way for you to explain what your code does and why. This is useful for sharing code or just figuring out what you did six months ago. It could also be that critical piece of clarity that makes me say “Oh, I see what you did there, +1” on your homework.
# This is a comment.
# We know because it is preceded
# by a hashtag, or "octothorpe".
# R ignores comments so you have
# a way to write down what you have
# done or what you are doing.
Section breaks help organization
I like to use the built-in heading style. It works really well for code-folding in RStudio, and when I’ve written a script that is several hundred lines long, sometimes all I want to see are the section headings. Go ahead and type the code below into a source file (File > New File > R Script or Ctrl+Shift+N) and save it (File > Save As or Ctrl+S). Press the little upside-down triangle to the left of the line to see what it does.
# Follow a comment with four dashes or hashes
# to insert a section heading
# Section heading ----
# Also a section heading ####
This is really handy for organizing sections in your homework or for breaking code up into smaller sections when you get started. You’ll later learn that when you have to do this a lot, there are usually ways you can reduce your code or split it up more efficiently into other files.
Stricter R programming rules
For the next section, open RStudio if it is not already open and type the code into a new source file (Ctrl+Shift+N).
All code in R is case sensitive.
Run the following lines (with the Run button or Ctrl+Enter). If you highlight all of them, they will all be run in sequence from top to bottom. Or, you can manually run each line. Running each line can be helpful for learning how to debug code early on.
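A minimal sketch of lines that fit the output and explanation that follow, assuming a second object named `A` with a value of 2 (the name `A` and its value are assumptions for illustration):

```r
a <- 1   # Lowercase a gets the value 1
A <- 2   # Uppercase A is a different object (name and value assumed)
a == A   # Are the two values equal?
```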
## [1] FALSE
So, what just happened? A few things are going on here.
We’ve defined a couple of objects for the first time. If we translate the first line of code, we are saying, “Hey R, assign the value of 1 to an object named a for me.” Note that the two objects are not the same, and R knows this.
The == that we typed is a logical test that checks to see if the two objects are equal. If they were, then it would have returned a TRUE instead of FALSE. This operator is very useful, and is more or less ubiquitous in programming languages. We will use it extensively for data queries and conditional indexing (ooooh, I know!).
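To give a taste of conditional indexing, here is a small sketch. The vector `x` and the other object names are made up for illustration:

```r
# Hypothetical vector of values
x <- c(1, 2, 3, 2)

# The test is applied to each element, giving one TRUE/FALSE per value
is_two <- x == 2

# Use the logical results to keep only the matching elements
matches <- x[is_two]

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches
n_matches <- sum(is_two)
```

The same pattern carries over to picking rows out of a data set based on the values in one of its columns.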
R will overwrite objects sequentially, so don’t name two things the same, unless you don’t need the first.
a <- 1
a <- 2
a # a takes on the second value here
print(a) # This is another way to look at the value of an object
show(a) # And, here is one more
Names should be short and meaningful. a is a terrible name, even for a temporary object in most cases.
Cheesy, but better…
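For instance, a sketch with invented names and numbers (the objects and values here are hypothetical, not from the course data):

```r
# Hypothetical fish lengths in millimeters
a <- c(254, 310, 287)                # Says nothing about the contents
fish_length_mm <- c(254, 310, 287)   # Short, and says what it holds

# A name that describes the result makes later code easier to read
mean_length_mm <- mean(fish_length_mm)
```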
Punctuation and special symbols are important, and they are annoying to type in names. Avoid them in object names where you can, except for underscores “_”. I try to stick with lowercase for everything I do except built-in data and data from external files because it is a pain to change everything.
myobject <- 1 # Illegible
my.Object <- 1 # Annoying to type
myObject <- 1 # Better, but still annoying
my_object <- 1 # Same: maybe find a less annoying name?
Importantly, R doesn’t really care and would treat all of these as unique but equivalent objects in all regards. It is worth noting that most R style recommendations are moving toward the last example above.
Some symbol combinations are not allowed in object names. But these are usually bad names or temporary objects that create junk in your workspace anyway.
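One way to check whether a name is allowed without breaking anything is to compare it with what `make.names()` would turn it into. This is only a sketch, and the helper `is_valid_name()` and the candidate names are made up for illustration:

```r
# Hypothetical helper: a name is allowed if make.names() leaves it unchanged
is_valid_name <- function(x) make.names(x) == x

# Candidate names to test (invented for this example)
candidates <- c("1a", "a1", "my_object", "my object")
valid <- is_valid_name(candidates)

# Label each result with the name that was tested
names(valid) <- candidates
print(valid)
```

Names that start with a number or contain a space come back as FALSE here, while "a1" and "my_object" pass.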
# In RStudio there are nifty
# little markers to show this
# is broken
# 1a <- 1
# This one works (try it by typing
# "a1" in the console)
a1 <- 1
a2 <- a1 + 1
a3 <- a2 + 1
We’ll see later that sequential operations that require creation of redundant objects (that require memory) are usually better replaced by over-writing objects in place or using functions like the pipe %>% from the magrittr package that help us keep a “tidy” workspace.
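As a rough sketch of the difference, the same arithmetic is written three ways below. To avoid needing any packages, this uses R's native pipe `|>` (which assumes R 4.1 or later) in place of the magrittr `%>%`, and the names `total` and `piped` are made up for illustration:

```r
# Three objects, two of which we never need again
a1 <- 1
a2 <- a1 + 1
a3 <- a2 + 1

# Same arithmetic, over-writing one object in place
total <- 1
total <- total + 1
total <- total + 1

# Same arithmetic with the native pipe, equivalent to sum(sum(1, 1), 1)
piped <- 1 |> sum(1) |> sum(1)
```

All three approaches end at the same value, but only the first leaves extra objects behind in the workspace.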
Some things can be expressed in multiple ways. Both T and TRUE can be used to indicate a logical that evaluates as being TRUE. But t is used to transpose data.
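A quick sketch of the difference, using a small made-up matrix `m`:

```r
T == TRUE        # T is a built-in shortcut for TRUE
is.function(t)   # Lowercase t is a function

# Hypothetical matrix with 2 rows and 3 columns
m <- matrix(1:6, nrow = 2)
dim(m)           # 2 rows, 3 columns
dim(t(m))        # 3 rows, 2 columns after transposing
```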
Some names are “reserved”, “built-in”, or pre-defined. Did you notice that R already knew what T and TRUE were? We will talk more about this later in the course if we need to.
Other examples include in, if, else, for, and function(), and a mess of others also have special uses.
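The difference between “built-in” and “reserved” can be sketched like this. The failing line is wrapped in `tryCatch()` so the script keeps running, and the object name `outcome` and its message are made up for illustration:

```r
# T is an ordinary built-in object, so R lets us over-write it (a bad idea!)
T <- 0
T == TRUE    # Now FALSE
rm(T)        # Remove our copy so the built-in T is visible again

# TRUE is a reserved word, so assigning to it is an error
outcome <- tryCatch(
  eval(parse(text = "TRUE <- 0")),
  error = function(e) "assignment failed"
)
outcome
```

This is one reason to spell out TRUE and FALSE in your own code rather than relying on T and F.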
Some symbols are also reserved for special use as “operators”, like: +, -, *, % %, &, /, <, (, {, [, "", '', ..., and a bunch of others. We will use basically all of these in just the first couple of chapters.
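A few of these in action, as a quick sketch with arbitrary numbers:

```r
5 %% 2              # Remainder after division
5 %/% 2             # Integer division
3 %in% c(1, 2, 3)   # Is 3 one of these values?

# Operators are functions underneath: backticks let us call one by name
`+`(1, 2)
```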