Feature Selection Toolbox
Feature Selection Toolbox (FST) is machine learning software focused primarily on the feature selection problem. It is written in C++ and was developed at the Institute of Information Theory and Automation (UTIA) of the Czech Academy of Sciences.

Feature Selection Toolbox 1

The first generation of the software (FST1) is a Windows application with a user interface that lets users apply several sub-optimal, optimal and mixture-based feature selection methods to data stored in a simple proprietary plain-text flat file format. FST1 is publicly available and free for non-commercial use.

Feature Selection Toolbox 3

The third generation of the software (Feature Selection Toolbox 3, FST3) is a library without a user interface, written to be more efficient and versatile than the original FST1. FST3 is publicly available and free for non-commercial use.

FST3 supports several standard data mining tasks, specifically data preprocessing and classification, but its main focus is on feature selection. For feature selection it implements several common as well as less usual techniques, with particular emphasis on threaded implementations of various sequential search methods (a form of hill-climbing). Implemented methods include individual feature ranking, floating search, oscillating search (suitable for very-high-dimensional problems) in randomized or deterministic form, optimal methods of the branch and bound type, probabilistic class distance criteria, various classifier accuracy estimators, feature subset size optimization, feature selection with pre-specified feature weights, criteria ensembles, hybrid methods, detection of all equivalent solutions, and two-criterion optimization. FST3 is more narrowly specialized than popular software such as WEKA, RapidMiner or PRTools.
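
The simplest member of the sequential search family mentioned above is sequential forward selection: starting from an empty subset, it repeatedly adds the one feature that gives the best criterion value. The following is only a minimal sketch in Python, not FST3 code or its API; the function names and the toy criterion are hypothetical, and it omits the backtracking step that floating search adds.

```python
from typing import Callable, FrozenSet, List


def sequential_forward_selection(
    n_features: int,
    criterion: Callable[[FrozenSet[int]], float],
    target_size: int,
) -> List[int]:
    """Greedy hill-climbing over feature subsets (illustrative, not FST3's API).

    Returns feature indices in the order they were added.
    """
    if not 0 <= target_size <= n_features:
        raise ValueError("target_size must be between 0 and n_features")
    selected: List[int] = []
    remaining = set(range(n_features))
    while len(selected) < target_size:
        # Score every one-feature extension of the current subset and keep
        # the best; ties go to the lowest feature index.
        best = max(
            sorted(remaining),
            key=lambda f: criterion(frozenset(selected + [f])),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


def toy_criterion(subset: FrozenSet[int]) -> float:
    """Hypothetical criterion: per-feature relevance minus a redundancy penalty."""
    relevance = [0.9, 0.8, 0.3, 0.1]
    score = sum(relevance[f] for f in subset)
    if 0 in subset and 1 in subset:
        score -= 0.7  # assume features 0 and 1 carry nearly the same information
    return score


if __name__ == "__main__":
    print(sequential_forward_selection(4, toy_criterion, 2))
```

With this toy criterion, individual ranking would pick features 0 and 1, while the sequential search skips feature 1 because it adds little once feature 0 is selected. In practice the criterion would be a class distance measure or an estimated classifier accuracy, as listed above.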

By default, FST's techniques assume that the data is available as a single flat file, either in a simple proprietary format or in the WEKA ARFF format, where each data point is described by a fixed number of numeric attributes. FST3 is provided without a user interface and is intended for users familiar with both machine learning and C++ programming. The older FST1 software is better suited to simple experimentation or educational purposes because it can be used without writing C++ code.
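
To make the flat-file assumption concrete, the sketch below reads a small subset of ARFF in Python: `%` comments, `@attribute` declarations, and comma-separated rows after `@data`. It is an illustration only, not FST3's reader; the function name `parse_simple_arff` is hypothetical, and quoting, missing values and the sparse ARFF format are deliberately not handled.

```python
from typing import List, Tuple, Union

# ARFF type keywords that denote numeric attributes.
NUMERIC_TYPES = {"numeric", "real", "integer"}

Value = Union[float, str]


def parse_simple_arff(text: str) -> Tuple[List[str], List[List[Value]]]:
    """Parse a simplified ARFF subset into (attribute names, data rows)."""
    names: List[str] = []
    is_numeric: List[bool] = []
    rows: List[List[Value]] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if not in_data:
            lower = line.lower()  # ARFF keywords are case-insensitive
            if lower.startswith("@attribute"):
                parts = line.split(None, 2)
                if len(parts) != 3:
                    raise ValueError(f"malformed attribute line: {line!r}")
                names.append(parts[1])
                is_numeric.append(parts[2].strip().lower() in NUMERIC_TYPES)
            elif lower.startswith("@data"):
                in_data = True
            continue  # @relation and other header lines are ignored
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(names):
            # Every data point must have the same fixed number of attributes.
            raise ValueError(f"expected {len(names)} values, got {len(values)}")
        rows.append([float(v) if num else v for v, num in zip(values, is_numeric)])
    return names, rows


SAMPLE = """% toy data set
@relation toy
@attribute width numeric
@attribute height real
@attribute label {a,b}
@data
1.0,2.5,a
3.0,4.5,b
"""

if __name__ == "__main__":
    print(parse_simple_arff(SAMPLE))
```

Numeric attributes are converted to floats, while any other attribute (here the nominal class label) is kept as a string.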

History

  • In 1999, development of the first Feature Selection Toolbox version started at UTIA as part of a Ph.D. thesis. It was originally developed in the Optima++ (later renamed Power++) RAD C++ environment.
  • In 2002, development of the first FST generation was suspended, mainly because Sybase ended support for the development environment then in use.
  • In 2002-2008, the FST kernel was re-coded and used for research experimentation within UTIA only.
  • In 2009, the third FST kernel re-coding, from scratch, started.
  • In 2010, FST3 was made publicly available as a C++ library without a GUI. The accompanying web page collects feature selection related links, references, documentation and the original FST1, available for download.
  • In 2011, FST3 was updated to version 3.1, which added new methods (in particular a novel dependency-aware feature ranking suitable for very-high-dimensional recognition problems) and core code improvements.

See also

  • Feature selection
  • Pattern recognition
  • Machine learning
  • Data mining
  • WEKA (comprehensive and popular Java open-source software from the University of Waikato)
  • RapidMiner (formerly YALE, Yet Another Learning Environment), an open-source machine learning framework implemented in Java that fully integrates WEKA
  • PRTools of the Delft University of Technology
  • Infosel++, specialized in information-theory-based feature selection
  • Tooldiag, a C++ pattern recognition toolbox

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 