YALE

RapidMiner, formerly YALE (Yet Another Learning Environment), is an environment for machine learning, data mining, text mining, predictive analytics, and business analytics. It is used for research, education, training, rapid prototyping, application development, and industrial applications. In a poll by KDnuggets, a data-mining newspaper, RapidMiner ranked second among data mining/analytic tools used for real projects in 2009 and first in 2010. It is distributed under the AGPL open source license and has been hosted by SourceForge since 2004.

The RapidMiner project was started in 2001 by Ralf Klinkenberg, Ingo Mierswa, and Simon Fischer at the Artificial Intelligence Unit of the University of Dortmund. In 2006, Ingo Mierswa and Ralf Klinkenberg founded the company Rapid-I, which is now the main contributor among the more than 30 international developers working on RapidMiner.

Purpose

RapidMiner provides data mining and machine learning procedures, including data loading and transformation (ETL), data preprocessing and visualization, modelling, evaluation, and deployment. Data mining processes are built from arbitrarily nestable operators, described in XML files and created in RapidMiner's graphical user interface (GUI). RapidMiner is written in the Java programming language. It also integrates learning schemes and attribute evaluators of the Weka machine learning environment and statistical modelling schemes of the R-Project.
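To make the idea of nested operators described in XML more concrete, the following minimal Python sketch parses a small process description and prints the operator hierarchy. The element names, attribute names, and operator classes in the XML are assumptions chosen for illustration; they are not guaranteed to match RapidMiner's actual file schema.

```python
import xml.etree.ElementTree as ET

# Illustrative process description. The element and attribute names are
# assumptions made for this sketch, not RapidMiner's exact schema.
PROCESS_XML = """
<operator name="Root" class="Process">
  <operator name="LoadData" class="ExampleSource">
    <parameter key="file" value="data.csv"/>
  </operator>
  <operator name="Validation" class="CrossValidation">
    <parameter key="folds" value="10"/>
    <operator name="Learner" class="DecisionTree"/>
    <operator name="Evaluation" class="Performance"/>
  </operator>
</operator>
"""


def outline(element, depth=0):
    """Return one indented line per operator, in document order."""
    # Collect only the parameters that belong directly to this operator.
    params = {p.get("key"): p.get("value") for p in element.findall("parameter")}
    lines = ["  " * depth + f"{element.get('name')} ({element.get('class')}) {params}"]
    # Recurse into directly nested operators; nesting depth is arbitrary.
    for child in element.findall("operator"):
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(PROCESS_XML)
for line in outline(root):
    print(line)
```

Because each operator element may contain further operator elements, the same recursive function handles any nesting depth.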

The Community Edition of RapidMiner is a toolkit for data mining. It can define analytical steps (similar to R) and generate graphs, much as MS Excel does. It is also used for analyzing data generated by high-throughput instruments used in processes such as genotyping, proteomics, and mass spectrometry.

Example applications:
  • Bypassing its data mining functions and having RapidMiner generate figures.
  • Exploring data in Microsoft Excel fashion ("knowledge discovery").
  • Constructing custom data analysis workflows.
  • Calling RapidMiner functions from programs written in other languages/systems (e.g. Perl).


Features:
  • Broad collection of data mining algorithms such as decision trees and self-organizing maps.
  • Overlapping histograms, tree charts and 3D scatter plots.
  • Many varied plugins, such as a text plugin for doing text analysis.

Applications

RapidMiner can be used for text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining. RapidMiner was rated as the fifth most used text mining software (6%) by Rexer's Annual Data Miner Survey in 2010.

RapidMiner is used in the electronics industry, energy industry, automobile industry, commerce, aviation, telecommunications, banking and insurance, production, IT industry, market research, pharmaceutical industry, and other fields.

Properties

Some properties of RapidMiner are:
  • written in Java
  • knowledge discovery processes are modeled as operator trees
  • internal XML representation ensures standardized interchange format of data mining experiments
  • scripting language allows for automatic large-scale experiments
  • multi-layered data view concept ensures efficient and transparent data handling
  • graphical user interface, command line mode (batch mode), and Java API for using RapidMiner from other programs
  • plugin and extension mechanisms; several plugins already exist
  • plotting facility offering a large set of high-dimensional visualization schemes for data and models
  • applications include text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining.
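The operator-tree property listed above can be pictured with a small toy model in Python. This is only a sketch of the general idea (inner operators run their children in order and pass data along); the class names `Operator` and `Chain` and the three example steps are hypothetical and do not correspond to RapidMiner's Java classes.

```python
class Operator:
    """A leaf operator: applies one function to the data it receives."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def apply(self, data):
        return self.func(data)


class Chain(Operator):
    """An inner node: runs its children in order, piping data through."""

    def __init__(self, name, children):
        self.name = name
        self.children = children

    def apply(self, data):
        for child in self.children:
            data = child.apply(data)
        return data


# Hypothetical process: drop missing values, rescale, then summarize.
process = Chain("Root", [
    Chain("Preprocessing", [
        Operator("DropMissing", lambda rows: [r for r in rows if r is not None]),
        Operator("Scale", lambda rows: [r / 10 for r in rows]),
    ]),
    Operator("Mean", lambda rows: sum(rows) / len(rows)),
])

result = process.apply([10, None, 20, 30])
print(result)
```

Since a `Chain` is itself an `Operator`, chains can be nested inside other chains, which is what makes the tree arbitrarily deep.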

GUI

RapidMiner provides a GUI to design an analytical pipeline (the "operator tree"). The GUI generates an XML (eXtensible Markup Language) file that defines the analytical processes the user wishes to apply to the data. This file is then read by RapidMiner to run the analyses automatically.

While the analyses are running, the GUI can also be used to control and inspect them interactively.

RapidMiner can also be called from other programs and processes, for example from a Perl program. The Java application programming interface (API) provides interfaces for applying operators individually, without first building an operator tree, which makes it possible to bypass the GUI and control analytical processes directly. Individual RapidMiner functions can also be called directly from the command line.
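As a rough illustration of driving a batch run from another program, the sketch below uses Python rather than Perl to start an external command and collect its output. The name of the launcher script and the way it accepts the process file are assumptions here; to keep the example runnable anywhere, a stand-in launcher (a Python one-liner that only echoes its argument) takes the place of the real tool.

```python
import subprocess
import sys


def run_batch(launcher, process_file, timeout=600):
    """Run a process description through an external launcher command.

    `launcher` is a list such as ["/path/to/launcher-script"]. The real
    script name and its argument convention are assumptions in this sketch.
    """
    completed = subprocess.run(
        launcher + [process_file],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if completed.returncode != 0:
        raise RuntimeError(f"batch run failed: {completed.stderr.strip()}")
    return completed.stdout


# Stand-in launcher so the sketch runs without any installation: a Python
# one-liner that just echoes the file name it was given.
fake_launcher = [sys.executable, "-c", "import sys; print('ran', sys.argv[1])"]
output = run_batch(fake_launcher, "process.xml")
print(output)
```

Checking the return code and raising on failure lets the calling program distinguish a failed run from an empty result.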

Software Versions

RapidMiner is open source and is offered free of charge as a Community Edition; there is also an Enterprise Edition, which offers more functions. The RapidMiner source code is also offered under a proprietary commercial license to allow integration into closed-source solutions.

Extensions

RapidMiner can be extended with additional plugins. The program suite contains around 15 extensions, which extend its applicability to text mining, image processing, time series processing, web mining, statistics, visualization, semantics, parallelization of computation, automatic process design (PaREn Automatic System Construction Wizard), and others.

Several of the extensions are available directly within the application through an extension manager. The other extensions can be downloaded from their respective developers.

See also

  • Weka - machine learning algorithms that can be integrated into RapidMiner
  • R-Project - statistical framework that can be integrated into RapidMiner

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 