Tools for statistical analysis
For me, the choice of platform came down to R or SciPy. I am aware of Sage, but it seemed like such a huge and sprawling collection that I found it a little off-putting (and I suspect it greatly exceeds my needs). In the end, I chose R because I had (very limited) experience with it, and I figured I'd enjoy using a statistical language more than a general-purpose language for this kind of work. That's not to say I didn't get thoroughly confused at times, of course. I found the following R packages and documentation very useful:
- The R manuals and FAQs are a good place to start.
- ggplot2 is a very flexible (and pretty!) graphing package.
- ROCR is a visualisation package for evaluating classifiers (e.g., GLMs).
- Writing R Extensions (available in HTML and PDF) describes how to write your own packages (such as my clumsy attempt, written as part of a sensitivity analysis of the Guyton model).
- If you're having difficulties, the R Inferno may be useful (from the preface: "If you are using R and you think you're in hell, this is a map for you").
- The R community has recently started a wiki book, whose sections are currently in varying stages of completeness.
In addition, there are several other tools that can be very handy at certain times:
- g3data is an excellent tool for extracting data points from published graphs.
- GGobi is a visualisation tool for exploring high-dimensional data.
- Mondrian is an interactive data visualisation tool.
But despite all the time I've spent with R and other statistical tools, I'm still very much aware that I am a complete novice when it comes to statistics. I really need to start reading some good foundational texts, although I always end up digging through the proofs and derivations, because I'm rarely comfortable unless I'm convinced I understand the how and the why.