Tools for statistical analysis

Model posts Sep 21, 2011 Statistics

For me, the choice of platform came down to R or SciPy. I am aware of Sage, but it seemed like such a huge and sprawling collection that I found it a little off-putting (and I suspect it greatly exceeds my needs). In the end, I chose R because I had (very limited) experience with it, and I figured I’d enjoy using a statistical language more than a general-purpose language for this kind of work. That’s not to say I didn’t get thoroughly confused at times, of course. I found the following R packages and documentation very useful:

  • The R manuals and FAQs are a good place to start.
  • ggplot2 is a very flexible (and pretty!) graphing package.
  • ROCR is a visualisation package for evaluating the performance of scoring classifiers (e.g., logistic-regression GLMs).
  • Writing R Extensions (in HTML and PDF) describes how to write your own packages (such as my clumsy attempt as part of a sensitivity analysis of the Guyton model).
  • If you're having difficulties, the R Inferno may be useful (from the preface: "If you are using R and you think you're in hell, this is a map for you").
  • The R community has recently started a wiki book, whose sections are currently in varying stages of completeness.
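
To give a feel for what "evaluating classifiers" with ROCR looks like, here is a minimal sketch. It is not taken from my own analysis: the built-in mtcars data set and the choice of model (transmission type predicted from weight) are assumptions made purely for illustration, and it assumes the ROCR package is installed.

```r
library(ROCR)

# Illustrative model only: predict transmission type
# (am: 0 = automatic, 1 = manual) from car weight with a
# logistic-regression GLM on the built-in mtcars data.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Predicted probability of a manual transmission for each car.
# Scoring the training data keeps the sketch short, but it gives
# an optimistic picture of how well the classifier performs.
scores <- predict(fit, type = "response")

# ROCR works in two steps: bundle the scores with the true labels,
# then ask for the performance measures of interest.
pred <- prediction(scores, mtcars$am)
roc  <- performance(pred, measure = "tpr", x.measure = "fpr")
auc  <- performance(pred, measure = "auc")@y.values[[1]]

# Plot the ROC curve with a reference line for a random classifier.
plot(roc, main = sprintf("ROC curve (AUC = %.2f)", auc))
abline(0, 1, lty = 2)
```

The same prediction object can be reused for other measures (precision and recall, for example) by changing the arguments to performance().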

In addition, there are several other tools that can be very handy at certain times:

  • g3data is an excellent tool for extracting data points from published graphs.
  • GGobi is a visualisation tool for exploring high-dimensional data.
  • Mondrian is an interactive data visualisation tool.

But despite all the time I’ve spent with R and other statistical tools, I’m still very much aware that I am a complete novice when it comes to statistics. I really need to start reading some good foundational texts, although I always end up digging through the proofs and derivations, because I’m rarely comfortable unless I’m convinced I understand the how and the why.