KNIME
KNIME, the Konstanz Information Miner, is a user-friendly open source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining concept. The graphical user interface allows nodes to be assembled quickly for data preprocessing (ETL: extraction, transformation, loading), for modeling, and for data analysis and visualization. Since 2006, KNIME has been used in pharmaceutical research, and it is also used in other areas such as CRM customer data analysis, business intelligence and financial data analysis.

History

The development of KNIME was started in January 2004 by a team of software engineers at Konstanz University as a proprietary product. The original developer team, headed by Michael Berthold, came from a company in Silicon Valley providing software for the pharmaceutical industry. KNIME has been developed from day one using rigorous professional software engineering processes, since it was clear from the beginning that it was to be used in large-scale enterprises. The initial goal was to create a modular, highly scalable and open data processing platform that allowed for the easy integration of different data loading, processing, transformation, analysis and visual exploration modules without a focus on any particular application area. The platform was intended to be a collaboration and research platform and to serve as an integration platform for various other data analysis projects.
In 2006 the first version of KNIME was released. Several pharmaceutical companies started using it, and a number of life science software vendors began integrating their tools into KNIME. Later that year, after an article in the German magazine c't, users from a number of other areas joined. As of fall 2010, KNIME is in use by over 5,000 active users (counting not downloads but users who regularly retrieve updates when they become available), not only in the life sciences but also at banks, publishers, consulting firms and various other industries, as well as at a large number of research groups worldwide. In comparison to other open source data mining tools, KNIME differentiates itself by its user-friendliness and usability.

Internals

KNIME allows users to visually create data flows (or pipelines), selectively execute some or all analysis steps, and later inspect the results, models, and interactive views. KNIME is written in Java and based on Eclipse, and it makes use of Eclipse's extension mechanism to add plugins providing additional functionality. The core version already includes hundreds of modules for data integration (file I/O, database nodes supporting all common database management systems), data transformation (filter, converter, combiner) as well as the commonly used methods for data analysis and visualization. With the free Report Designer extension, KNIME workflows can be used as data sets to create report templates that can be exported to document formats such as doc, ppt, xls, pdf and others. Other capabilities of KNIME are:
  • KNIME's core architecture allows processing of large data volumes that are limited only by the available hard disk space (most other open source data analysis tools work in main memory and are therefore limited by the available RAM). For example, KNIME allows analysis of 300 million customer addresses, 20 million cell images and 10 million molecular structures.
  • Additional plugins allow the integration of methods for text mining, image mining, and time series analysis.
  • KNIME integrates various other open source projects, e.g. machine learning algorithms from Weka, the statistics package R, as well as LibSVM, JFreeChart, ImageJ, and the Chemistry Development Kit.
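
The data flow idea described above, with nodes that can be executed selectively and inspected afterwards, can be illustrated with a small toy model. This is only a sketch in plain Python and does not use or reproduce KNIME's actual node API; the names `Node`, `read_rows`, `keep_large` and `total_revenue`, as well as the sample data, are hypothetical.

```python
from typing import Callable, List, Optional

Table = List[dict]  # toy stand-in for a data table: a list of row dictionaries


class Node:
    """One processing step that caches its output once it has been executed."""

    def __init__(self, name: str, func: Callable[..., Table],
                 inputs: Optional[List["Node"]] = None):
        self.name = name
        self.func = func
        self.inputs = inputs or []
        self.output: Optional[Table] = None  # None means "not yet executed"

    def execute(self) -> Table:
        # Execute upstream nodes first and reuse any cached results.
        if self.output is None:
            upstream = [node.execute() for node in self.inputs]
            self.output = self.func(*upstream)
        return self.output


def read_rows() -> Table:
    # Hypothetical source node returning made-up sample data.
    return [
        {"customer": "A", "revenue": 120.0},
        {"customer": "B", "revenue": 40.0},
        {"customer": "C", "revenue": 310.0},
    ]


def keep_large(rows: Table) -> Table:
    return [row for row in rows if row["revenue"] >= 100.0]


def total_revenue(rows: Table) -> Table:
    return [{"total": sum(row["revenue"] for row in rows)}]


reader = Node("reader", read_rows)
row_filter = Node("filter", keep_large, [reader])
aggregate = Node("aggregate", total_revenue, [row_filter])

# Selective execution: run the flow only up to the filter node.
filtered = row_filter.execute()

# Names of the nodes that hold a result at this point.
executed = [n.name for n in (reader, row_filter, aggregate) if n.output is not None]
```

After `row_filter.execute()`, only the reader and the filter hold results; the aggregate node stays unexecuted until its own `execute()` is called, and it then reuses the cached upstream output instead of recomputing it.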


KNIME is implemented in Java but also allows for wrappers calling other code, in addition to providing nodes that run Java, Python, Perl and other code fragments.
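
To give a rough idea of what a node that runs a user-supplied code fragment does, the following sketch evaluates a Python expression once per row and appends the result as a new column. It is an illustration under simplifying assumptions, not KNIME's scripting node implementation: the function `snippet_node`, its parameters and the sample rows are hypothetical, and emptying the builtins is not a real security sandbox.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def snippet_node(rows: List[Row], expression: str, new_column: str) -> List[Row]:
    """Evaluate a user-supplied expression per row and append the result."""
    compiled = compile(expression, "<snippet>", "eval")
    result = []
    for row in rows:
        # Column names are exposed to the snippet as variables.
        # Builtins are emptied to limit the snippet (assumption: trusted input).
        value = eval(compiled, {"__builtins__": {}}, dict(row))
        result.append({**row, new_column: value})
    return result


# Hypothetical input table with two numeric columns.
rows = [{"mass": 2.0, "volume": 4.0}, {"mass": 9.0, "volume": 3.0}]
with_density = snippet_node(rows, "mass / volume", "density")
```

The input rows are left unchanged; each output row is a copy with the extra `density` column.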

License

As of version 2.1, KNIME is released under GPLv3 with an exception that allows others to use the well-defined node API to add proprietary extensions. This also allows commercial software vendors to add wrappers that call their tools from KNIME.

See also

  • Weka - machine learning algorithms that can be integrated in KNIME
  • ELKI - data mining framework with many clustering algorithms

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.