Using R Statistical and Graphics Tools

Why R for Natural Resource Stewardship Science?

R is an open-source implementation of the S language for statistical computing. For almost 30 years, applied statisticians have been submitting implementations of their new techniques to StatLib. When those implementations were written as libraries for the commercial S-PLUS implementation, statisticians were providing software for free, but users (including those same statisticians) had to pay a third party to run it. A very small group of statisticians took it upon themselves to write a complete open-source implementation of S that would run under most operating systems, which they called R. Since then, the vast majority of implementations of new statistical techniques have been made available as R packages, which include the code as a library of functions and at least some documentation.

Because R is very useful for "computing with data," experts in many fields use it for their work. Because R is open source, many of those experts make their field-specific code and functions freely available as packages. For example, EPA folks use R, and make their tools for generating spatial survey samples (and for analyzing the resulting data!) available as package spsurvey. Both Helsel's and Millard's tools for water quality are available as well-documented R packages. Climate researchers use R with netCDF files, so there are packages for reading and writing netCDF files (netCDF, ncdf4) as well as for generating standard climate diagrams, imputing missing weather data, downscaling from coarse data, etc. (climatol, clim.pact, seas, anm, zyp). Phenology researchers provide packages bise and pheno. Jari Oksanen (with help from others) provides package vegan for vegetation analysis (ordination, classification, analysis of similarity). Ecologists working on habitat analysis and spatial prediction provide adehabitat, grasp, BIOMOD, and ModelMap. Wildlife ecologists and statisticians from FWS, USGS, and elsewhere provide packages for estimating abundance and occupancy, including unmarked, mra, Rcapture, secr, PresenceAbsence, trip, and tripEstimation. Social scientists provide a set of packages for using the 2000 census data. Bioconductor is a project with many packages for analyzing microarray data, DNA and protein sequence data, and other molecular biology and bioinformatics data. Researchers working on visual display of information, reproducible research, and accessibility also build and share tools in R. rOpenSci is a consortium of scientists building R packages to make science more powerful, reproducible, and transparent through open data and open software. When this page was last updated, the main CRAN repository included over 13,000 such packages, Bioconductor had 1,560, and many others were available on R-Forge or GitHub.

By learning how to use R, we can leverage all of those efforts and expertise, and not reinvent those wheels. If we improve a wheel or write one for a different need, we can in turn make our improved wheel available to others as a new package, or work with the authors of the original package and let them incorporate our additions and improvements.

Last updated: September 19, 2018