Advice needed on standardizing statistical libraries for general-purpose languages

After reading a large number of articles in areas related to statistics, I keep seeing the same theme: the authors lament that statistical analyses are suboptimal because of the limitations of the tools available.

I know R is “the language by statisticians, for statisticians”, but there are a large number of areas where R will not be adopted – particularly in engineering and chemistry where a mix of Fortran, C, C++, Matlab, and Python reigns.

There are at least 2 initiatives in different programming language communities to develop a standard library of mathematical and numerical functions, including statistical routines.

The Node.js community is also working on a standard library that will permit analysis and visualization anywhere node/javascript is able to run (link).

I’ve looked at a number of them and wondered how statistical experts would design a library that accommodates a wide variety of statistical outlooks while also encouraging non-statisticians to engage in rigorous statistical thinking and modelling.

An added benefit of coding a good set of statistical methods in a language like modern Fortran is that it has a standardized interface to C, so the methods can be ported easily to whatever scripting language or platform becomes fashionable.

At the very least, I think the following are primitives:

  1. Probability Distributions
  2. Matrix Operations
  3. Combinatorial / Permutational operations
  4. Numerical integration and differentiation
  5. Random Number Generation (Don’t know how I didn’t list this initially!)

From there, regression models could be derived, with tests being a special case of the regression model.
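
To make the "tests as a special case of regression" point concrete, here is a minimal sketch in plain Python (chosen only for illustration; it is not taken from any of the libraries mentioned above). It assumes the equal-variance two-sample t-test and a simple regression on a 0/1 group indicator. The function names `pooled_t` and `slope_t` are hypothetical, and the data are simulated.

```python
import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def pooled_t(a, b):
    """Classic two-sample t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    sp2 = ss / (na + nb - 2)  # pooled variance estimate
    return (mb - ma) / math.sqrt(sp2 * (1 / na + 1 / nb))


def slope_t(x, y):
    """t statistic for the slope in simple OLS regression y = b0 + b1*x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return b1 / se_b1


# Simulated data for illustration only (fixed seed for reproducibility).
rng = random.Random(0)
group_a = [rng.gauss(10.0, 2.0) for _ in range(12)]
group_b = [rng.gauss(11.5, 2.0) for _ in range(15)]

# Encode group membership as a 0/1 indicator and stack the responses.
x = [0.0] * len(group_a) + [1.0] * len(group_b)
y = group_a + group_b

t_test = pooled_t(group_a, group_b)
t_reg = slope_t(x, y)
print(t_test, t_reg)  # the two statistics agree up to floating-point error
```

With a 0/1 indicator, the fitted slope is the difference in group means and the residual variance is the pooled variance, so a library that offers the regression already offers this test.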

Is there any specific guidance on what these language communities should standardize on, so that applied scientists can develop the most appropriate tools for their disciplines?
