Stata for the Macintosh: statistics software review (April 2007, Stata 8)
SPSS and SAS are the big statistical dogs of business; their focus on business makes them relatively easy for the beginner. There are several reasons why SPSS for the Mac is not a viable long-term option for many people, even though it’s easier to explore than most stats programs. Mac versions lag behind Windows versions, and the user interface has quirks, bugs, odd crashes and pauses, and problems working with other programs. The price is absurd: on top of the excessive cost of the base package, most users will need extra modules, each of which costs about as much as Stata, and SPSS charges for module upgrades, too. Finally, there might not even be another version of SPSS for the Mac, and if there is, it might not work with new or old computers. Even now, there is no version of SPSS for Intel Macs.
Stata, like SPSS, has menu access to most or all commands, batch-language and interactive-command control, the ability to click a button and have a menu command appear in your batch file (so you will know how to write it in the future), and separate windows for syntax, data, and output. But the two programs are night and day, partly because SPSS is designed primarily for big-business users and Stata for serious statistics users. (Stata, incidentally, was originally developed in the 1980s for personal computers, while SPSS was designed earlier, for mainframes, and only slowly moved to personal computers. The “real” SPSS first showed up on OS/2 and, believe it or not, the Mac; DOS users got SPSS/PC, a somewhat different program.)
Stata has full Mac support, with the Mac treated as a roughly equal citizen; it was ready for Intel Macs with amazing speed. The price of most Stata versions is well below that of SPSS’s base package, and just about any version of Stata is comparable in statistical capability to a full, complete version of SPSS, with two key caveats: Stata cannot produce what SPSS calls “camera ready” output, and it offers no easy copy-and-paste of output into Excel.
Stata is solid, stable, and extremely fast, partly because it keeps data in RAM. For extremely large data sets, you either need more RAM or must enable disk caching, which slows things down. Some highly iterative operations can be slow, but an upgrade to the multiprocessor version of Stata should cure that; on a four-core Mac Pro it should roughly triple performance, and a two-core or two-processor machine should see about a 1.7-fold increase in speed. Stata Intercooled, which we tested, offers a good balance of cost and speed, but uses only a single core of a single processor. Our test machines were an iBook G4 and a Mac Pro, and both were very fast on all but heavily iterative processes; indeed, on simple runs, results were almost instant.
It took us a while to figure out that our problem was not, as Stata’s message suggested, a failure to respond to the “More” prompt, but a lack of memory allocated to the program; once we discovered that, it was easy to change the setting (in Stata 8, with the set memory command).
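Because Stata holds the whole dataset in RAM, it helps to estimate how much memory a dataset needs before changing the allocation. Below is a minimal Python sketch, assuming the per-value sizes of Stata’s storage types (byte 1, int 2, long and float 4, double 8, strN N bytes) and ignoring Stata’s own overhead, so the result is a lower bound. The helper name and the example dataset are hypothetical.

```python
# Rough lower-bound estimate of the RAM a Stata dataset needs.
# Assumption: each value takes the bytes of its storage type
# (byte 1, int 2, long 4, float 4, double 8, strN N bytes);
# Stata's own per-observation and per-variable overhead is ignored.

STORAGE_BYTES = {"byte": 1, "int": 2, "long": 4, "float": 4, "double": 8}


def dataset_bytes(n_obs: int, var_types: dict) -> int:
    """Return approximate bytes for n_obs observations.

    var_types maps a storage type ("float", "str20", ...) to the number of
    variables stored with that type. This helper is hypothetical.
    """
    per_obs = 0
    for vtype, count in var_types.items():
        if vtype.startswith("str"):
            width = int(vtype[3:])  # strN holds N bytes per value
        else:
            width = STORAGE_BYTES[vtype]
        per_obs += width * count
    return n_obs * per_obs


if __name__ == "__main__":
    # Hypothetical survey: one million respondents with
    # 80 float, 15 byte, and 5 str20 variables.
    size = dataset_bytes(1_000_000, {"float": 80, "byte": 15, "str20": 5})
    print(f"about {size / 1024**2:.0f} MB before overhead")
```

For that example, the data alone come to roughly 415 MB, so the memory allocation would need to be set comfortably above that figure.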
Stata has an incredible range of statistics, and users can and do write programs to add more features. There are programs for odd little statistical procedures, for handling different input, for changing output, for manipulating variables, and all sorts of other things. These programs are easy to install, and if you have problems, Stata tech support staff are willing and able to help.
Stata does, however, have quite a learning curve, even for people who aren’t trying to unlearn SPSS at the same time; SPSS, with its more modern GUI, is far easier to get used to. That learning curve is probably one reason, along with brand recognition, why not everyone in the statistics world uses Stata.
Stata does have a menu-driven system, a spreadsheet view of data, and the ability to paste from menus into syntax files, which you can run line by line, save, and reuse; also, like SPSS, Stata lets you assign labels to variables and import data from tab-delimited or fixed-length files. There are, however, so many differences that it’s hard to believe both programs essentially do the same things.
Stata maintains a much tighter connection to its DOS past. While SPSS has largely jettisoned “mainframe style” output that relies on spaces and pipes (|) for formatting, letting you easily copy and paste tables into Excel, Stata provides nothing but that kind of output (unless you install and use one of the user-written commands that change the output of specific commands). This does greatly increase the program’s speed, but without output designed for the purpose, copying and pasting from Stata was difficult at best. With either program, users who want to change the output will probably be frustrated, as will users who hope to get rid of their separate graphing software: both programs can generate very nice graphs, but neither makes it easy to modify them, and SPSS has its own copy-and-paste issues at times. We have to wonder whether anyone at either company puts new users into the labs and watches what they do, and what they expect the program to do.
Stata is a bit awkward in its presentation overall. Keyboard commands are mostly well done, so you can bring up the spreadsheet view, command line, or other important windows easily with command-key shortcuts, but there are notable quirks, including not being able to easily bring the active “Do” (syntax) file to the front, and not being able to type any commands while the data window is open. The spreadsheet doesn’t allow as much manipulation as SPSS’s does; the view promises more than it can deliver.
One nice feature is the Review window, which records the commands you’ve put in; it’s similar to the SPSS Journal, except that you don’t need to dig it out and open it. On the other hand, it’s far more difficult to take out groups of commands in Stata.
The Variables window, which lists variables, is normally open as well, but it doesn’t let you select multiple variables at once; you have to pick them one at a time. Fortunately, you can easily specify multiple variables in the syntax (Do file), and because Stata supports both ranges (var1-var40) and wildcards (a fairly flexible set of them), it’s easy to code in groups of variables. A little forethought in variable names can make Stata very quick and powerful indeed.
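To make the difference between ranges and wildcards concrete, here is a small Python emulation of how such a variable list expands. It assumes the Stata convention that a range like var1-var40 takes every variable positioned between the two names in the dataset (so it depends on variable order, not on the numbers in the names), and that * matches any run of characters while ? matches exactly one. The function name and sample variables are hypothetical, and this is an illustration, not Stata’s own parser.

```python
import re


def expand_varlist(spec, variables):
    """Expand a Stata-style varlist against variable names in dataset order.

    Handles "first-last" ranges (inclusive, by position in the dataset),
    "*" (any run of characters) and "?" (exactly one character). Plain names
    must match exactly; Stata's name abbreviations and its "~" wildcard are
    not modelled. This is an illustrative emulation, not Stata's parser.
    """
    result = []
    for token in spec.split():
        if "-" in token:
            first, last = token.split("-", 1)
            i, j = variables.index(first), variables.index(last)
            if i > j:
                raise ValueError(f"{first} comes after {last} in the dataset")
            result.extend(variables[i : j + 1])
        elif "*" in token or "?" in token:
            # Turn the wildcard into an anchored regular expression.
            pattern = re.escape(token).replace(r"\*", ".*").replace(r"\?", ".")
            result.extend(v for v in variables if re.fullmatch(pattern, v))
        elif token in variables:
            result.append(token)
        else:
            raise ValueError(f"variable {token} not found")
    return result


if __name__ == "__main__":
    names = ["id", "age", "var1", "var2", "var3", "inc_2005", "inc_2006", "region"]
    print(expand_varlist("var1-var3 inc_* region", names))
```

Because ranges follow dataset order, naming and ordering related variables consistently (a numbered series, or a shared prefix such as inc_) is what lets one short token stand for dozens of variables.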
Help is very fast. Type help, and a new help viewer comes up instantly, even on an old G4. On the other hand, help isn’t necessarily helpful, and the extensive manuals, though chock full of information, are organized around program commands more than user needs. There are some reputable books providing more task-oriented introductions to Stata, selling for $50 and up, which may help, and they wouldn’t be that expensive compared with buying SPSS. The tech support people say that the manuals also have a learning curve, but on the plus side, they include a full description of each command’s syntax (but not how to access it from the menus!), theoretical aspects of the procedure, and examples, not to mention summaries of algebraic formulations and a number of references. There are separate manuals for time series, longitudinal or panel data, survival analysis, survey data, and multivariate analysis, and full manuals devoted to data management, programming, and graphics. (If that seems intimidating, it is no more so than SPSS, which has a few big books for the main program plus manuals for each module.) Stata itself also offers online courses in the software.
Alan Riley in tech support advised us that there are also help shortcuts, which appear to be unique among stats programs:
The help file for each command has a link (or links) to the dialog (or multiple dialogs, such as with -ttest-) for that command in the upper right corner of the help file. So, all a user has to do to find the dialog(s) for -ttest- is to look at the help file for -ttest- and click on the links to the dialogs at the upper right.
We introduced a very short and simple command named -db-. If a user wants to quickly pop up the dialog for a given command rather than finding it in the menus, all they need to do is type -db regress- or -db ttest-, for example. This will immediately pop up the dialog for the command. In the case of a command like -ttest- which has multiple dialogs, the user is taken to an intermediate page which shows them a listing of all available dialogs for that command. I believe that this easy command access to any dialog is also unique among statistical packages.
We quickly discovered that both companies have very dedicated tech support groups, though we’d give the edge to Stata, especially since it is far more supportive of the Linux and Mac platforms. At SPSS, the bureaucracy has grown with the company’s fortunes, and we’ve found that they not only demand our serial number but then can’t find it in their database. Consistently. Across four separate licenses. At Stata, not only is tech support quick and helpful, but it is fully behind the cross-platform idea. To quote Alan Riley from a personal e-mail:
Cross-platform compatibility is extremely important to us, and we work very hard to release new versions of Stata on all platforms (including Mac) simultaneously as well as to make sure that all datasets, graphs, do-files, ado-files and other file types are completely compatible no matter what operating system a user may have. The only time we want to allow an incompatibility is when a particular operating system supports something that another does not...The only other potential incompatibility between Stata on different operating systems should be when a user refers to something like a file path that is platform specific.
In our time with Stata, we were impressed with the user community, especially as we ended up using user-contributed statistical modules to do things that were not programmed into Stata. The ease of adding these modules is surprising; and they can include help files that integrate right into the program.
More than anything, we came to the conclusion that Stata is the program for scientists and statisticians – people who work the numbers for their own use, and only rarely take numbers out of the program for publication. Getting Stata to do a new (to the user) procedure could be a rough task, with a steep learning curve and many alternative ways of doing things. These problems have to be weighed against our experience that, once we learned how to do something, Stata could be more flexible and more powerful than SPSS.
By commercial omnibus statistical package standards, Stata is downright cheap; it has a huge amount of support; the Mac version does nearly everything the Windows and Linux versions do (a small number of syntax commands are not supported, which can cause trouble with user-written files); and the Mac is not treated as a third-rate platform. It is a strong environment for the serious statistician, or the serious user of statistics, who can “get into it” and have the time spent learning the system paid back in greater effectiveness. For casual or occasional statistics users, SPSS is far easier to learn and use, if you happen to have a computer SPSS supports at the moment, if you bought all the modules you need, and if what you’re doing is officially supported. In the long run, however, the money demanded by SPSS may well outweigh the time savings, and you may find that Stata is a better investment when you consider the cost of (and need for) upgrades.
Additional Stata notes
This was our original 2007 writeup:
The philosophy of the program is quite different from SPSS’s: you give a little, you get a little (you give a command, you get a little back, you run another command, and so on) instead of getting huge blocks of stats all at once. That way you remain in control and stay careful.
Dr. Evan Stark wrote: “Stata takes fastidious care of their product and their care about it is obvious. You get the sense that at Stata they thought of everything, and when they or a user points out that they didn’t, they quickly provide a fix or new functionality. Although it did take me a while to understand its syntax [switching from SPSS], I did master it and statistical life became thereafter very enjoyable. But buying Stata is like buying SPSS or Base (old for Mac) SAS for a 1/3 the price. It is undoubtedly the best, full-featured Mac statistics package. My sense is that Stata is the up and coming - with the exception of corporations and government organizations that run millions of records with thousands of variables, which will always be SAS’s domain. SPSS ought to be brought up on charges for its student/teacher price value.”
The dark side of Stata is that it doesn't do as well at hiding what we assume to be its mainframe roots; it is relentlessly character-driven. Though the basic idea is similar to SPSS (you either use it interactively via menus, type in individual commands, or write batch files and run them), the output window is text-only, as in DOS or mainframe-style output. There are no pivot tables as in SPSS, and no simple copy-and-paste into Word or Excel. If you're the kind of person who loves LaTeX you may love that, but most people will find it painful. (We'll be experimenting with copy-pasting into BBEdit and replacing the pipes, |, with tabs.)
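The BBEdit pipe-replacement experiment can also be scripted. The Python sketch below assumes output laid out like a Stata one-way tabulate table (the sample is illustrative, not captured from a real session): it drops the rule lines, splits each line on the pipes, and treats runs of two or more spaces as column breaks, producing tab-separated text that Excel accepts on paste. The function name is hypothetical, and the heuristic will misalign tables that have blank cells in the middle of a row.

```python
import re

# A rule line contains only dashes, plus signs, pipes and spaces.
RULE_LINE = re.compile(r"[\s\-+|]+")


def stata_table_to_tsv(text):
    """Convert pipe-formatted Stata table text into tab-separated rows.

    Heuristic: drop blank and rule lines, split each line on "|", and treat
    runs of two or more spaces as column breaks. Single spaces stay inside
    a cell, so value labels such as "Not employed" survive. Tables with empty
    cells in the middle of a row will come out misaligned.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip() or RULE_LINE.fullmatch(line):
            continue
        cells = []
        for part in line.split("|"):
            cells.extend(c for c in re.split(r"\s{2,}", part.strip()) if c)
        rows.append("\t".join(cells))
    return "\n".join(rows)


SAMPLE = """\
      group |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         12       40.00       40.00
          2 |         18       60.00      100.00
------------+-----------------------------------
      Total |         30      100.00
"""

if __name__ == "__main__":
    print(stata_table_to_tsv(SAMPLE))  # paste the result into Excel
```

Run on the sample, this yields four tab-separated rows (header, two categories, and the total), with the rule lines gone; the same idea carries over to a BBEdit grep replacement if you prefer to stay in the editor.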
There are lots of quirks in Stata, but some research got us past most of them, we think. The big ones are that you can’t have the variables showing when you run commands, and that you have to issue the clear command before loading new data or quitting. These took us a while to figure out. Stata is not a Mac program; it's a DOS program (or UNIX, if you prefer) that happens to run extremely well on Macs.
So far we have also found that Stata seems to provide far more flexibility than SPSS in statistical procedures, and relatively good explanations of what they do in its capacious manuals. Again, what SPSS requires a module for, Stata probably does on primary install. Then there is a huge base of user-written software that is easily available and that Stata encourages you to install. And there are frequent updates, which SPSS doesn't provide, even for Windows. (A new copy of Stata costs less than an SPSS upgrade.)
SPSS is practically a file-format standard, but Stata won’t import SPSS files; you can get around that by buying Stat/Transfer and Stata for less than the price of SPSS. Stat/Transfer is currently $295, or $179 academic (with even cheaper student pricing). We just got a copy but haven't put it through its paces yet.
To use multiple cores of your Mac’s processor (assuming you have a dual-core or quad-core processor), you need the pricey MP version (which, to be fair, is priced about the same as SPSS). On the plus side, Stata is afflicted with neither SPSS-like stability problems nor SPSS-like slowness. Stats run almost instantly in most cases; the speed difference is amazing. Large data sets require more RAM, since all data is held in RAM.
If you use statistical programs mainly for analysis, our quick look at Stata (we’ll be doing an in-depth review) shows that it is quite probably a superior alternative to SPSS, at a much lower price, and with consistent, apparently permanent Mac support. If you use statistical programs partly for analysis and partly for tabulation, or often paste your results into Word, etc., and are allergic to LaTeX and other such programming chores, it may not do what you want.