After reading many articles across areas related to statistics, a common theme emerges: the authors lament that statistical analyses are suboptimal because of the limitations of the tools available.

I know R is “the language by statisticians, for statisticians”, but there are many fields where R will not be adopted, particularly engineering and chemistry, where a mix of Fortran, C, C++, Matlab, and Python reigns.

There are at least two initiatives in different programming language communities to develop a standard library of mathematical and numerical functions, including statistical routines.

The Node.js community is also working on a standard library that would permit analysis and visualization anywhere Node.js/JavaScript can run (link).

I’ve looked at several of them and wondered how statistical experts would design a library that can serve a wide variety of statistical outlooks while also encouraging non-statisticians to engage in rigorous statistical thinking and modelling.

An added benefit of coding a good set of statistical methods in a language like modern Fortran: its interface to C is standardized, so the routines can be ported easily to whatever scripting language or platform becomes fashionable.

At the very least, I think the following are primitives:

- Probability Distributions
- Matrix Operations
- Combinatorial / Permutational operations
- Numerical integration and differentiation
- Random Number Generation (Don’t know how I didn’t list this initially!)
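To make the list above concrete, here is a minimal sketch of three of those primitives in plain Python (no third-party libraries; the function names and data are my own, purely illustrative): a probability density, a numerical integrator, and a seeded random number generator. Integrating the density checks it against the sampler.

```python
import math
import random

# Primitive: a probability distribution (normal pdf)
def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Primitive: numerical integration (composite trapezoid rule)
def trapezoid(f, a, b, n=10_000):
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

# Primitive: random number generation (seeded for reproducibility)
rng = random.Random(42)
draws = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# The pdf integrates to ~1 over a wide interval, and the sample
# mean of the draws sits near the distribution's mean of 0.
area = trapezoid(normal_pdf, -8.0, 8.0)
sample_mean = sum(draws) / len(draws)
```

The point of treating these as shared primitives is exactly this kind of composability: the integrator knows nothing about distributions, yet together they let you validate a sampler.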

From there, regression models could be derived, with statistical tests treated as special cases of the regression model.
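As a worked illustration of tests being special cases of regression: the pooled two-sample t-test is identical to simple OLS with a 0/1 group indicator, since the slope equals the difference in group means and its standard error equals the pooled-variance standard error. A pure-Python sketch (the data are made up for illustration):

```python
import math

# Two small samples (hypothetical data)
group_a = [4.1, 3.9, 4.5, 4.3, 4.0]
group_b = [5.0, 5.4, 4.8, 5.2, 5.1]

def mean(xs):
    return sum(xs) / len(xs)

# --- Classical two-sample t-test (pooled variance) ---
na, nb = len(group_a), len(group_b)
ma, mb = mean(group_a), mean(group_b)
ssa = sum((x - ma) ** 2 for x in group_a)
ssb = sum((x - mb) ** 2 for x in group_b)
pooled_var = (ssa + ssb) / (na + nb - 2)
t_classical = (mb - ma) / math.sqrt(pooled_var * (1 / na + 1 / nb))

# --- Same test as OLS regression: y = b0 + b1 * indicator ---
y = group_a + group_b
x = [0.0] * na + [1.0] * nb          # group indicator
n = len(y)
mx, my = mean(x), mean(y)
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
b1 = sxy / sxx                       # slope = difference in group means
b0 = my - b1 * mx
resid_ss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_b1 = math.sqrt(resid_ss / (n - 2) / sxx)
t_regression = b1 / se_b1

# The two t statistics agree to floating-point precision.
print(abs(t_classical - t_regression) < 1e-9)
```

The same pattern extends to ANOVA (multiple indicators) and ANCOVA (indicators plus covariates), which is why regression is a natural layer to build directly on the primitives rather than implementing each test separately.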

Any specific guidance on what these language communities should standardize on, so that applied scientists can develop the most appropriate tools for their disciplines?