# R on the Mac

R is a comprehensive statistical programming language that is cooperatively developed on the Internet as an open source project. It is often referred to as the “GNU S,” because it almost completely emulates the S programming language. It has packages to do regression, ANOVA, general linear models, hazard models and structural equations. Graphical output can be created using a TeX plug-in to convert the standard ASCII-based output.

R has a *massive* range of tests, PDF and PostScript output, a function to expand zip archives, and numerous other unexpected features. R programs and algorithms are distributed by the Comprehensive R Archive Network (CRAN). A **simple** graphical user interface is included for Mac users; R Commander can be installed using the built-in package installer, which can also install file import features (which aren't installed by default). R Commander is an X11 program, which means it uses an alien interface and has odd open/save dialogues. If you get past that, it offers menu-driven commands not dissimilar from, say, SPSS, though a lot more awkward to use and without an output or data window.

Like many open source projects, R is exceedingly capable but has a steep learning curve. Some believe this is for the best because people will get a deeper understanding of the statistics they generate with a program such as R, versus one which allows the rapid creation of scads of irrelevant statistics leading to incorrect conclusions. Those who expect even a basic graphical interface (e.g. SPSS 4) may be disappointed by the R community’s definition of a GUI.

Version 2.11 adds support for bitmap rendering, a new function for drawing raster images, and tweaks and bug fixes.
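As a rough illustration of the raster support mentioned above, the sketch below paints a small greyscale image into a plot with `rasterImage()`. The gradient data, the output file name, and the plot coordinates are all invented for the example.

```r
# Build a 10 x 10 greyscale gradient; the values are arbitrary demo data.
shades <- matrix(seq(0, 1, length.out = 100), nrow = 10)
img <- as.raster(shades)

# Hypothetical output file in the session's temporary directory.
raster_file <- file.path(tempdir(), "raster-demo.pdf")

pdf(raster_file)
# Set up an empty plotting region, then paint the image into it.
plot(c(0, 10), c(0, 10), type = "n", xlab = "", ylab = "")
rasterImage(img, xleft = 1, ybottom = 1, xright = 9, ytop = 9,
            interpolate = FALSE)
invisible(dev.off())
```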

Ashish Ranpura wrote:

Last week I finally put R through its paces on two recent experiments from our lab. It performed spectacularly. It's pretty easy to learn using online tutorials, in particular John Verzani's tutorial which is a course in introductory statistics using R.

The highlight: figuring out the 15 or so commands to import, parse, slice and graph a 3-way comparison of control subjects using a scatterplot and a violin plot. Then using BBEdit to search and replace the word "control" with my two experimental conditions, pasting that back into R, and generating a report with all 6 graphs in about 3 keystrokes! Now that's how a program ought to work.

But the major advantages of R are that it is absolutely cross-platform (Linux, MacOS, Windows) and that it's open source. You've a good chance of accessing your data 10 years from now, which I wouldn't say with the commercial packages. The user base is large, active, and productive. The S language on which it's based is a well-accepted standard in statistics. R has stood the test of time and is likely to continue to do so.

There is one significant caveat: R is relentlessly command-line driven, and even the graphs cannot be edited with mouse clicks. It's trivial to take the PDF graphs into Illustrator, though, so this limitation hasn't been a problem for me.
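The kind of scripted workflow described in the quote can be sketched roughly as follows. This is not Ranpura's actual script: the data are simulated, every name is hypothetical, a function argument stands in for the search-and-replace step, and a box plot stands in for the violin plot because base R has no violin plot function.

```r
# Simulated data standing in for a real experiment (all names are hypothetical).
set.seed(1)
trials <- data.frame(
  condition = rep(c("control", "drugA", "drugB"), each = 30),
  dose      = rep(seq(1, 30), times = 3),
  response  = rnorm(90, mean = 50, sd = 10),
  stringsAsFactors = FALSE
)

# Draw a scatterplot and a box plot for one condition into a single PDF.
plot_condition <- function(data, cond, out_dir = tempdir()) {
  subset_data <- data[data$condition == cond, ]
  out_file <- file.path(out_dir, paste0(cond, ".pdf"))
  pdf(out_file)
  plot(response ~ dose, data = subset_data,
       main = paste("Scatterplot:", cond))
  boxplot(subset_data$response, main = paste("Distribution:", cond))
  dev.off()
  out_file
}

# One call per condition: three PDFs holding six graphs in total.
report_files <- vapply(unique(trials$condition), plot_condition,
                       character(1), data = trials)
```

Passing the condition as an argument gives the same effect as editing the script text for each condition, without leaving R.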

Some resources include:

- JGR - a graphical user interface for R which provides the R.app functions plus a simple text editor, split input/output screen, and spreadsheet view
- R for Mac FAQ
- The R project home page (with download links)
- This web page on R, S and S/Plus statistics systems, which provides a background on the software and summarizes available packages
- Using R for structural equation modeling

The current version is 2.11 (as of 4-28-10). This is a Universal Binary version which runs on OS X; prior versions ran under OS 8 and OS 9. It installs R.app, a simple menu system / R launcher that allows for easy browsing of datasets, installation of additional modules including JGR, and viewing/editing of graphs; it also houses the console that lets you interact with R via the command line. There is also an X11 version for those who like X11. Do not try using this without reading any instructions.

R has a massive range of tests and now has Matrix as a recommended package, a useKerning argument for PDF and PostScript output, a recursive argument for file.copy(), an unzip function to expand or list zip archives, and other changes.
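Two of these additions can be tried with a short sketch: the `useKerning` argument to `pdf()` and the `recursive` argument to `file.copy()`. The directory and file names below are hypothetical, and `mtcars` is one of R's built-in example datasets.

```r
# Hypothetical working directories inside the session's temp directory.
src_dir  <- file.path(tempdir(), "plots_src")
dest_dir <- file.path(tempdir(), "plots_backup")
dir.create(src_dir, showWarnings = FALSE)
dir.create(dest_dir, showWarnings = FALSE)

# Write a PDF with kerning of text enabled.
pdf_file <- file.path(src_dir, "histogram.pdf")
pdf(pdf_file, useKerning = TRUE)
hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")
invisible(dev.off())

# Copy the whole source directory into the existing destination directory.
copied <- file.copy(src_dir, dest_dir, recursive = TRUE, overwrite = TRUE)
backup_file <- file.path(dest_dir, "plots_src", "histogram.pdf")
```

With `recursive = TRUE`, the source directory is copied as a subdirectory of the destination, which is why the backup path includes `plots_src`.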

There is an R for Mac Special Interest Group, called R-Sig-Mac. The group is run as an e-mail list. You can subscribe to the list or browse the archives on its official web page: http://www.stat.math.ethz.ch/mailman/listinfo/r-sig-mac

## S and R Programming Languages

Beginning in 1976, the S programming language was developed at Bell Labs (whose statistics department employed John Tukey and Joseph Kruskal) by John Chambers and others. Version 1 required Honeywell mainframes, Version 2 (1980) added Unix support, Version 3 (1988) added functions and objects, and Version 4 (1998) added full support for object-oriented design. In 1993, Bell Labs issued an exclusive license to StatSci (later MathSoft). S-Plus is MathSoft’s commercial implementation of S, and the only way the language is available outside Lucent.

R was begun by Robert Gentleman and Ross Ihaka of the University of Auckland. It is now an open source project staffed by volunteers from around the world, with development coordinated through the Comprehensive R Archive Network (CRAN). Source code, binaries, and documentation are at the CRAN web site.

Documentation that compares R and S includes:

- The R and S discussion in CRAN’s FAQ.
- The online supplement to Venables and Ripley (1999).
- The published text of Venables and Ripley (2000), and its online errata.

The lists below, adapted from an August 2000 Academy of Management workshop on stat packages, show which R commands and packages cover analyses common in management research:

Base package commands:

- anova: analysis of variance
- glm: generalized linear model, including logit, probit and Poisson models
- lm/lsfit: fit an OLS or WLS regression model
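A minimal sketch of these base commands, using R's built-in `mtcars` dataset, might look like the following. The choice of variables is arbitrary and only meant to show the calling conventions.

```r
# OLS regression of fuel economy on weight and horsepower (formula interface).
ols_fit <- lm(mpg ~ wt + hp, data = mtcars)

# lsfit() fits the same model from a predictor matrix and a response vector.
ls_fit <- lsfit(x = as.matrix(mtcars[, c("wt", "hp")]), y = mtcars$mpg)

# Analysis of variance table for the fitted regression.
anova_table <- anova(ols_fit)

# Logit model: probability of a manual transmission as a function of weight.
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

# Poisson model: number of carburetors as a function of horsepower.
pois_fit <- glm(carb ~ hp, data = mtcars, family = poisson)

# Print coefficient estimates, standard errors, and fit statistics.
summary(ols_fit)
```

For a weighted (WLS) fit, `lm()` accepts a `weights` argument and `lsfit()` a `wt` argument.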

Built-in packages:

- ts package:
  - arima: ARIMA time series models
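As a small example of `arima()`, the sketch below fits a first-order autoregressive model to `lh`, a built-in time series of 48 hormone measurements. The model order is chosen only for illustration. In current R releases `arima()` is part of the `stats` package, so nothing needs to be loaded first.

```r
# Fit an AR(1) model, i.e. ARIMA with order (p = 1, d = 0, q = 0).
ar1_fit <- arima(lh, order = c(1, 0, 0))

# Forecast the next five observations with their standard errors.
forecast <- predict(ar1_fit, n.ahead = 5)

# Show the estimated coefficients and the point forecasts.
print(ar1_fit)
print(forecast$pred)
```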

Contributed R packages and their capabilities:

- boot: bootstrapping and jackknifing
- coda: analysis and diagnostics for Markov Chain Monte Carlo simulation
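- fracdiff: fractionally differenced ARIMA time series models
- Matrix: matrix math
- cmdscale: multi-dimensional scaling (a function included in the standard distribution rather than a separate contributed package)
- multiv: cluster analysis, correspondence analysis, principal component factor analysis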
- pls: Partial Least Squares structural equation modeling
- survival5: survival analysis (hazard models)
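Of the entries above, `cmdscale()` can be tried without installing anything, since it ships with R. The sketch below applies classical multi-dimensional scaling to `eurodist`, a built-in table of road distances between European cities. Reducing to two dimensions is an arbitrary choice for plotting.

```r
# Classical (metric) multi-dimensional scaling down to two dimensions.
mds_coords <- cmdscale(eurodist, k = 2)

# Plot the cities at their recovered coordinates, labelled by name.
mds_file <- file.path(tempdir(), "mds-demo.pdf")  # hypothetical output file
pdf(mds_file)
plot(mds_coords, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(mds_coords, labels = rownames(mds_coords), cex = 0.7)
invisible(dev.off())
```

The recovered layout matches the real map only up to rotation and reflection, because the scaling works from distances alone.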