R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians for developing statistical software and for data analysis.
R is an implementation of the S programming language combined with lexical scoping semantics inspired by Scheme. S was created by John Chambers while at Bell Labs. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is now developed by the R Development Core Team, of which Chambers is a member. R is named partly after the first names of the first two R authors (Robert Gentleman and Ross Ihaka), and partly as a play on the name of S.
R is part of the GNU project. Its source code is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems. R uses a command line interface; however, several graphical user interfaces are available for use with R.
Statistical features
R provides a wide variety of statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and others. R is easily extensible through functions and extensions, and the R community is noted for its active contributions in terms of packages. There are some important differences between R and S, but much code written for S runs unaltered. Many of R's standard functions are written in R itself, which makes it easy for users to follow the algorithmic choices made. For computationally intensive tasks, C, C++, and Fortran code can be linked and called at run time. Advanced users can write C or Java code to manipulate R objects directly.
R is highly extensible through the use of user-submitted packages for specific functions or specific areas of study. Due to its S heritage, R has stronger object-oriented programming facilities than most statistical computing languages. Extending R is also eased by its permissive lexical scoping rules.
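As a minimal sketch of what lexical scoping makes possible, the following defines a function that returns another function; the inner function keeps access to the variables of the environment in which it was created. The names make_counter, counter_a and counter_b are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: returns a counter function with its own private state.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # modifies `count` in the enclosing environment
    count
  }
}

counter_a <- make_counter()
counter_b <- make_counter()

counter_a()   # first call on counter_a
counter_a()   # second call on counter_a
counter_b()   # counter_b has its own, independent `count`
```

Each call to make_counter creates a fresh environment, so the two counters do not share state.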
According to Rexer's Annual Data Miner Survey in 2010, R has become the data mining tool used by more data miners (43%) than any other.
Another strength of R is static graphics, which can produce publication-quality graphs, including mathematical symbols. Dynamic and interactive graphics are available through additional packages such as RGL.
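A small sketch of a static plot with mathematical annotation, using only base graphics. The output file is a temporary PDF and the choice of the normal density curve is an assumption made purely for illustration.

```r
# Write the plot to a temporary PDF file (hypothetical output location).
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

x <- seq(-3, 3, length.out = 200)

# expression() lets axis labels and titles contain mathematical notation.
plot(x, dnorm(x), type = "l",
     xlab = expression(x),
     ylab = expression(phi(x)),
     main = expression(phi(x) == frac(1, sqrt(2 * pi)) * e^{-x^2 / 2}))

dev.off()   # close the device so the file is written
```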
R has its own LaTeX-like documentation format, which is used to supply comprehensive documentation, both on-line in a number of formats and in hard copy.
Programming features
R is an interpreted language typically used through a command line interpreter. If one types "2+2" at the command prompt and presses enter, the computer replies with "4".
> 2+2
[1] 4
Like many other languages, R supports matrix arithmetic. R's data structures include scalars, vectors, matrices, data frames (similar to tables in a relational database) and lists. The R object system has been extended by package authors to define objects for regression models, time-series and geo-spatial coordinates.
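The data structures listed above can be sketched as follows. All variable names are hypothetical; note that a scalar in R is simply a vector of length one.

```r
v   <- c(1.5, 2.5, 4.0)                    # numeric vector
s   <- 5                                   # "scalar": a vector of length 1
m   <- matrix(1:6, nrow = 2, ncol = 3)     # 2 x 3 matrix, filled column by column
df  <- data.frame(id = 1:3,
                  name = c("a", "b", "c"),
                  stringsAsFactors = FALSE) # table-like: one column per variable
lst <- list(numbers = v, table = df)       # list: components of different types

m2 <- m * 2          # element-wise arithmetic
mm <- m %*% t(m)     # matrix product of m with its transpose (2 x 2)
```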
R supports procedural programming with functions and, for some functions, object-oriented programming with generic functions. A generic function acts differently depending on the type of arguments it is passed. In other words the generic function dispatches the function (method) specific to that type of object. For example, R has a generic print function that can print almost every type of object in R with a simple "print(objectname)" syntax.
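A minimal sketch of generic-function dispatch using the S3 mechanism. The generic describe and its methods are hypothetical names introduced here for illustration; only UseMethod is part of R itself.

```r
# Hypothetical generic: UseMethod() selects a method based on the class of x.
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no more specific method exists.
describe.default <- function(x, ...) "some R object"

# Methods for particular classes.
describe.numeric <- function(x, ...) {
  paste("numeric vector of length", length(x))
}
describe.data.frame <- function(x, ...) {
  paste("data frame with", nrow(x), "rows")
}

describe(c(1, 2, 3))             # dispatches to describe.numeric
describe(data.frame(a = 1:2))    # dispatches to describe.data.frame
describe("text")                 # falls back to describe.default
```

The built-in print function works along the same lines: calling print on an object selects a method appropriate to the object's class.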
Although R is mostly used by statisticians and other practitioners requiring an environment for statistical computation and software development, it can also be used as a general matrix calculation toolbox with performance benchmarks comparable to GNU Octave or MATLAB.
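As a small illustration of R used as a matrix calculation tool, the sketch below solves a 2 x 2 linear system with base functions. The matrix A and vector b are made-up values, and the example says nothing about performance.

```r
# Made-up system:  2x +  y = 3
#                   x + 3y = 5
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
b <- c(3, 5)

x        <- solve(A, b)     # solve A %*% x = b
A_inv    <- solve(A)        # matrix inverse
residual <- A %*% x - b     # should be numerically zero
```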
Example 1
The following examples illustrate the basic syntax of the language and use of the command-line interface.
In R and S, the assignment operator is an arrow made from two characters "<-".
> x <- c(1,2,3,4,5,6) # Create ordered collection (vector)
> y <- x^2 # Square the elements of x
> print(y) # print (vector) y
[1] 1 4 9 16 25 36
> mean(y) # Calculate average (arithmetic mean) of (vector) y; result is scalar
[1] 15.16667
> var(y) # Calculate sample variance
[1] 178.9667
> lm_1 <- lm(y ~ x) # Fit a linear regression model "y = f(x)" or "y = B0 + (B1 * x)"
# store the results as lm_1
> print(lm_1) # Print the model from the (linear model object) lm_1
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
-9.333 7.000
> summary(lm_1) # Compute and print statistics for the fit of the (linear model object) lm_1
Call:
lm(formula = y ~ x)
Residuals:
1 2 3 4 5 6
3.3333 -0.6667 -2.6667 -2.6667 -0.6667 3.3333
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.3333 2.8441 -3.282 0.030453 *
x 7.0000 0.7303 9.585 0.000662 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.055 on 4 degrees of freedom
Multiple R-squared: 0.9583, Adjusted R-squared: 0.9478
F-statistic: 91.88 on 1 and 4 DF, p-value: 0.000662
> par(mfrow=c(2, 2)) # Request 2x2 plot layout
> plot(lm_1) # Diagnostic plot of regression model
Example 2
Short R code calculating the Mandelbrot set through the first 20 iterations of the equation z = z² + c, plotted for different complex constants c. This example demonstrates:
- use of community developed external libraries (called packages), in this case caTools package
- handling of complex numbers
- multidimensional arrays of numbers used as basic data type, see variables C, Z and X
library(caTools)         # external package providing write.gif function
jet.colors <- colorRampPalette(c("#00007F", "blue", "#007FFF", "cyan", "#7FFF7F",
                                 "yellow", "#FF7F00", "red", "#7F0000"))
m <- 1200                # define size
C <- complex( real=rep(seq(-1.8,0.6, length.out=m), each=m ),
              imag=rep(seq(-1.2,1.2, length.out=m), m ) )
C <- matrix(C,m,m)       # reshape as square matrix of complex numbers
Z <- 0                   # initialize Z to zero
X <- array(0, c(m,m,20)) # initialize output 3D array
for (k in 1:20) {        # loop with 20 iterations
  Z <- Z^2+C             # the central difference equation
  X[,,k] <- exp(-abs(Z)) # capture results
}
write.gif(X, "Mandelbrot.gif", col=jet.colors, delay=100)
Packages
The capabilities of R are extended through user-created packages, which allow specialized statistical techniques, graphical devices, import/export capabilities, reporting tools, etc. These packages are developed primarily in R, and sometimes in Java, C and Fortran. A core set of packages is included with the installation of R, with more than 4300 available at the Comprehensive R Archive Network (CRAN), Bioconductor, and other repositories.
The "Task Views" page (subject list) on the CRAN website lists the wide range of applications (Finance, Genetics, Machine Learning, Medical Imaging, Social Sciences and Spatial statistics) to which R has been applied and for which packages are available.
Other R package resources include Crantastic, a community site for rating and reviewing all CRAN packages, and R-Forge, a central platform for the collaborative development of R packages, R-related software, and projects. R-Forge hosts many unpublished beta packages and development versions of CRAN packages.
The Bioconductor project provides R packages for the analysis of genomic data, such as Affymetrix and cDNA microarray object-oriented data handling and analysis tools, and has started to provide tools for analysis of data from next-generation high-throughput sequencing methods.
Reproducible research and automated report generation can be accomplished with packages such as Sweave and odfWeave that support execution of R code embedded within LaTeX, OpenDocument format and other markups.
Milestones
The full list of changes is maintained in the NEWS file. Some highlights are listed below.
- Version 0.16 – This is the last alpha version developed primarily by Ihaka and Gentleman. Much of the basic functionality from the "White Book" (see S history) was implemented. The mailing lists commenced on April 1, 1997.
- Version 0.49 – April 23, 1997 – This is the oldest available source release, and compiles on a limited number of Unix-like platforms. CRAN is started on this date, with 3 mirrors that initially hosted 12 packages. Alpha versions of R for Microsoft Windows and Mac OS are made available shortly after this version.
- Version 0.60 – December 5, 1997 – R becomes an official part of the GNU Project. The code is hosted and maintained on CVS.
- Version 1.0.0 – February 29, 2000 – Considered by its developers stable enough for production use.
- Version 1.4.0 – S4 methods are introduced and the first version for Mac OS X is made available soon after.
- Version 2.0.0 – October 4, 2004 – Introduced lazy loading, which enables fast loading of data with minimal expense of system memory.
- Version 2.1.0 – Support for UTF-8 encoding, and the beginnings of internationalization and localization for different languages.
- Version 2.11.0 – April 22, 2010 – Support for Windows 64 bit systems.
- Version 2.13.0 – April 14, 2011 – Adding a new compiler function that allows speeding up functions by converting them to byte-code.
- Version 2.14.0 – October 31, 2011 – Added mandatory namespaces for packages. Added a new parallel package.
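The byte-code compiler mentioned for version 2.13.0 can be tried with the cmpfun function from the compiler package. The following is only a sketch: sum_squares is a hypothetical example function, and more recent R releases byte-compile functions automatically, so an explicit call may make no visible difference there.

```r
library(compiler)   # the package providing cmpfun()

# Hypothetical example function: sum of squares computed with an explicit loop.
sum_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i^2
  total
}

# Byte-compiled version of the same function; it returns the same results.
sum_squares_bc <- cmpfun(sum_squares)

sum_squares(10)
sum_squares_bc(10)
```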
Graphical user interfaces
- RGUI – comes with the pre-compiled version of R
- Java Gui for R – cross-platform stand-alone R terminal and editor based on Java (also known as JGR)
- Deducer – GUI for menu driven data analysis (similar to SPSS/JMP/Minitab).
- Rattle GUI – cross-platform GUI based on RGtk2 and specifically designed for data mining
- R Commander – cross-platform menu-driven GUI based on tcltk (several plug-ins to Rcmdr are also available)
- RapidMiner
- RExcel – using R and Rcmdr from within Microsoft Excel
- Red-R – visual analysis interface that uses R for statistics
- RKWard – extensible GUI and IDE for R
- R AnalyticFlow – analysis flowcharts with R (freeware)
- RStudio – cross-platform open source IDE (which can also be run on a remote Linux server)
- Weka allows for the use of the data mining capabilities in Weka and statistical analysis in R.
Editors and IDEs
Text editors and integrated development environments (IDEs) with some support for R include: Bluefish, Crimson Editor, RStudio, ConTEXT, Eclipse, Emacs (Emacs Speaks Statistics), Vim, Tinn-R, Geany, jEdit, Kate, R Productivity Environment (part of Revolution R Enterprise), TextMate, gedit, SciTE, WinEdt (R Package RWinEdt), and Notepad++.
Scripting languages
R functionality has been made accessible from several scripting languages such as Python (by the RPy interface package), Perl (by the Statistics::R module) and Ruby (with the rsruby rubygem).
Scripting in R itself is possible via littler as well as via Rscript.
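As a rough illustration of driving R from another scripting language, the following Python sketch calls the Rscript front end through a subprocess. It does not use RPy or any of the interface packages named above. The helper names `build_rscript_command` and `run_r_expression` are hypothetical, and the sketch assumes Rscript is on the PATH, returning None when it is not.

```python
import shutil
import subprocess


def build_rscript_command(expression):
    """Return the argument list that asks Rscript to evaluate one R expression."""
    # The -e option passes an R expression on the command line
    # instead of naming a script file.
    return ["Rscript", "-e", expression]


def run_r_expression(expression):
    """Evaluate an R expression via Rscript.

    Returns whatever R wrote to standard output, or None when
    Rscript cannot be found (assumption: R may not be installed).
    """
    if shutil.which("Rscript") is None:
        return None
    result = subprocess.run(
        build_rscript_command(expression),
        capture_output=True,
        text=True,
        check=True,  # raise if R exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # Ask R for the mean of three numbers and print its output.
    print(run_r_expression("cat(mean(c(1, 2, 3)))"))
```

Launching a fresh R process for every call is simple but slow. An interface package such as RPy would normally be the better choice when many values have to be exchanged between the two languages.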
Comparison with SAS, SPSS and Stata
The general consensus is that R compares well with other popular statistical packages, such as SAS, SPSS and Stata. In January 2009, The New York Times ran an article about R gaining acceptance among data analysts and posing a potential threat to the market share held by commercial statistical packages, such as SAS.
Commercial support for R
In 2007, Revolution Analytics was founded to provide commercial support for Revolution R, its distribution of R, which also includes components developed by the company. Major additional components include ParallelR, the R Productivity Environment IDE, RevoScaleR (for big data analysis), RevoDeployR, a web services framework, and the ability to read and write data in the SAS file format.
In October 2011, Oracle announced the Big Data Appliance, which integrates R, Apache Hadoop, Oracle Enterprise Linux, and a NoSQL database with the Exadata hardware.
Other major commercial software systems supporting connections to R include Spotfire, SPSS, STATISTICA, Platform Symphony, and SAS.
See also
- List of statistical packages
- Comparison of statistical packages
- List of numerical analysis software
- Comparison of numerical analysis software
- Free statistical software
- Sweave
- ggplot2
External links
- Official website of the R project
- The R wiki, a community wiki for R
- R books, an extensive list (with brief comments) of R-related books
- The R Graphical Manual, a collection of R graphics from all R packages, and an index to all functions in all R packages
- R seek, a custom front end to the Google search engine that helps find results related to the R language