As a psychologist, I have, of course, used SPSS to analyse my data for much of my career. Its simplicity has made SPSS quite a good friend over the years. However, the friendship is (almost) over. There’s a new “best friend” for this researcher: R.
R is a free (yep, FREE) software environment for statistical computing. Although that description might sound intimidating, do not let it put you off. I love R. No, I REALLY mean it. Since I’ve been using R, my statistical life has become a dream. The beauty of R is that you can do almost ANYTHING you want to with it (it doesn’t make coffee, although a package might be available for that someday; hey, I can dream, can’t I?).
Want to sort your data? R can do it. Want to run every conceivable statistical test? R can do it. Want publication-quality figures? R can do it. Want to run cognitive model simulations? R can do it. You get the idea.
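Here is a flavour of the first and third of these. The minimal sketch below sorts a small data frame and saves a figure straight to a PDF file. The data and the file name are hypothetical and exist only for illustration.

```r
# Hypothetical data: mean reaction times (ms) for five participants
scores <- data.frame(participant = c("P1", "P2", "P3", "P4", "P5"),
                     rt = c(612, 487, 530, 701, 455),
                     stringsAsFactors = FALSE)

# Sort the rows by reaction time, fastest first
sorted <- scores[order(scores$rt), ]
print(sorted)

# Draw a bar plot and write it to a PDF file (hypothetical file name)
pdf(file.path(tempdir(), "rt_barplot.pdf"), width = 5, height = 4)
barplot(sorted$rt, names.arg = sorted$participant, ylab = "Mean RT (ms)")
invisible(dev.off())  # close the PDF device so the file is written to disk
```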
This post will highlight three steps to get started in the wonderful world of R. But first, a warning: R does have quite a steep learning curve. Fact. I won’t lie: for quite a while I used R thinking “I could do this quicker in SPSS” or “Nice plots, but it’s easier in Excel”. But after a (short) while, something clicked, and I’ve never looked back.
For me, the beauty of R is that you can store all of your analyses as scripts (like SPSS syntax files). Sure, it might take you a while to work out how to run a repeated measures ANOVA in R the first time. Once you have done so, you can save the script in an appropriately named folder and go back to it any time. It means that I now do all of my data work in R. I used to trim data in Excel, calculate the means using a pivot table, copy & paste the means into SPSS to run my analysis, and then go BACK to Excel to create my graphs. With R, I just execute ONE script, and it does all of this for me. If I want to analyse a different experiment, I usually don’t need to start my script from scratch; I just tweak a few bits of code here and there.
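The sketch below is a rough illustration of that one-script workflow. It simulates some hypothetical reaction-time data and trims it. It then averages the data per participant and condition, which is the pivot-table step. Finally, it runs a repeated measures ANOVA with base R’s aov() and plots the result. The column names, trimming cut-offs, and simulated effect are all assumptions made for the example, not a recommended analysis.

```r
set.seed(42)  # make the simulated data reproducible

# Hypothetical long-format data: 20 participants x 2 conditions x 50 trials
raw <- expand.grid(trial = 1:50,
                   condition = factor(c("congruent", "incongruent")),
                   participant = factor(1:20))
# Simulated reaction times (ms), slower in the incongruent condition
raw$rt <- rnorm(nrow(raw),
                mean = ifelse(raw$condition == "congruent", 500, 550),
                sd = 80)

# 1) Trim: drop implausibly fast or slow trials (cut-offs are assumptions)
trimmed <- subset(raw, rt > 150 & rt < 1500)

# 2) Mean RT per participant and condition (replaces the pivot table)
cell_means <- aggregate(rt ~ participant + condition, data = trimmed, FUN = mean)

# 3) Repeated measures ANOVA: condition varies within participants
model <- aov(rt ~ condition + Error(participant / condition), data = cell_means)
print(summary(model))

# 4) Plot the distribution of participant means in each condition
boxplot(rt ~ condition, data = cell_means, ylab = "Mean RT (ms)")
```

You could save this as a script under a hypothetical name such as analysis.R. The whole pipeline can then be rerun with source("analysis.R"). Adapting it to a new experiment mostly means swapping in the new data and column names.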
I also use R exclusively for my simulation work, be it statistical simulations (e.g., this post on the dynamics of p-values under the null hypothesis) or cognitive simulations (e.g., fitting cognitive models to task switching data).
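The sketch below gives a taste of what a statistical simulation looks like. It repeatedly runs a two-sample t-test (R’s default Welch test) on two groups drawn from the same normal distribution, so the null hypothesis is true by construction. Under the null, p-values are approximately uniformly distributed, so roughly 5% of them should fall below .05. The group size and number of simulations are arbitrary choices.

```r
set.seed(1)  # reproducible simulation

n_sims <- 5000      # number of simulated experiments (arbitrary)
n_per_group <- 20   # participants per group (arbitrary)

# Each simulated experiment: two groups from the SAME distribution,
# compared with a t-test; keep only the p-value
p_values <- replicate(n_sims, {
  group_a <- rnorm(n_per_group, mean = 0, sd = 1)
  group_b <- rnorm(n_per_group, mean = 0, sd = 1)
  t.test(group_a, group_b)$p.value
})

# Proportion of "significant" results: should be close to 0.05
print(mean(p_values < 0.05))

# Histogram of p-values: roughly flat under the null hypothesis
hist(p_values, breaks = 20, main = "p-values under the null", xlab = "p")
```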
Heck, I even THINK in R now. Everything looks like vectors and data frames to me.
Follow these three steps, and get started along a road that will lead to statistical salvation.
1) Install R.
Go to http://www.r-project.org/ and navigate to the downloads page. The rest is straightforward, and this will install R onto your machine. As R is really a programming language as well as statistical software, you need somewhere to type your commands so that R can do its stuff. The installation comes with a GUI, but it is not very good and only really supports single-line entries. That is all you will need when you start out, but very soon you will want to run multi-line scripts (for example, to run an ANOVA). Therefore, I recommend you use R-Studio. On to step 2…
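Once the installer has finished, you can confirm that R is working by typing a couple of lines at its prompt and checking the version it reports. The minimum version checked below is an arbitrary example, not a requirement from this post.

```r
# Print the full version string of the running R installation
print(R.version.string)

# Compare against an example minimum version (arbitrary threshold)
print(getRversion() >= "3.0.0")
```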
2) Install R-Studio.
In my opinion, it was installing R-Studio that really kick-started my love affair with R. It’s a very nice-looking front-end to R, and let’s face it, the front-end that comes with the base application is ugly. Go to http://www.rstudio.com/ and download the appropriate version. As with pretty much all things R, it’s FREE. (God, I love R. Have I mentioned that yet?) Once it is installed, open it up, and you should see something like the image below.
OK, so your version won’t have a plot on the screen (yet!). The lower-left panel acts just like the command prompt that was installed with base R in step 1. Try the following by typing it straight into the command line and hitting Return:
314 * 456
You should get the response 143184! R prints it as [1] 143184, where the [1] simply marks the first element of the result. However, the command line is only really suited to single-line entries. Typing all of your commands this way is not very economical, and chances are you will soon have multi-line code. This is where the upper-left panel comes in handy. Here you can write your script line by line and then execute all of the lines at once. For example, to replicate the plot in the above image, select the upper-left panel and type (on separate lines, as below):
x <- rnorm(1000, mean = 1, sd = 0.5)
hist(x)
The top line creates a variable called ‘x’ and asks it to hold numbers sampled randomly from a normal distribution (this is what rnorm means). We ask for 1000 numbers, drawn from a normal distribution with a mean of 1 and a standard deviation (SD) of 0.5. The second line then asks for a histogram of these numbers (hist(x)).
Select all of this script (click & drag your mouse over it), then select Code from the upper menu and choose “Run Line(s)” from the drop-down menu. Alternatively, with the script highlighted, press and hold “Ctrl” (“Cmd” on a Mac) and hit Return. You should now see the plot in the bottom-right panel.
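Because rnorm() draws a random sample, the mean and SD of x will be close to, but not exactly, 1 and 0.5. These values also change every time you rerun the script. You can see this by typing the lines below into the console. Here, set.seed() (not used in the original script) fixes the random draws so the results are reproducible; the seed value 123 is arbitrary.

```r
set.seed(123)  # arbitrary seed: makes the random draws reproducible

x <- rnorm(1000, mean = 1, sd = 0.5)  # same call as in the script above

length(x)  # how many numbers were drawn
mean(x)    # sample mean: near 1, but not exactly 1
sd(x)      # sample SD: near 0.5, but not exactly 0.5
```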
3) Learn R!
This is just a basic overview of R, and you will obviously want to learn much more than this. I learned much of my R via trial and error: I would hit a snag, and then trawl the internet for a solution. R has a fantastic community, and there are lots of helpful resources online.
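Before trawling the internet, it is worth knowing that R ships with its own documentation, which you can reach from the console. The functions below are part of base R. The examples use rnorm and hist only because they appeared earlier in this post.

```r
# Open the help page for rnorm (equivalent to typing ?rnorm)
help("rnorm")

# List a function's arguments and their default values
args(rnorm)

# Run the worked examples from the bottom of hist's help page
example("hist")
```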
However, the other day I came across a superb resource for getting started with R: Try R by Code School (http://tryr.codeschool.com/). It appears to introduce all of R’s key functions from the ground up in an easily accessible manner. In fact, working through some of the early stages myself, I have learned quite a few new tricks.
So, go ahead and get stuck in with R. I promise you will not look back. (Sorry, SPSS, but you really do suck.)