Shogun (toolbox)
Encyclopedia
Shogun is an Free software
Free software
Free software, software libre or libre software is software that can be used, studied, and modified without restriction, and which can be copied and redistributed in modified or unmodified form either without restriction, or with restrictions that only ensure that further recipients can also do...

, open source
Open source
The term open source describes practices in production and development that promote access to the end product's source materials. Some consider open source a philosophy, others consider it a pragmatic methodology...

 toolbox written in C++
C++
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...

. It offers numerous algorithms and data structures for machine learning
Machine learning
Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases...

 problems.

Shogun is licensed under the terms of the GNU General Public License
GNU General Public License
The GNU General Public License is the most widely used free software license, originally written by Richard Stallman for the GNU Project....

 version 3 or later.

Description

The focus of Shogun is on kernel machines such as support vector machine
Support vector machine
A support vector machine is a concept in statistics and computer science for a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis...

s for regression
Regression analysis
In statistics, regression analysis includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables...

 and classification problems. Shogun also offers a full implementation of Hidden Markov model
Hidden Markov model
A hidden Markov model is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved states. An HMM can be considered as the simplest dynamic Bayesian network. The mathematics behind the HMM was developed by L. E...

s.
The core of Shogun is written in C++ and offers interfaces for MATLAB
MATLAB
MATLAB is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages,...

, Octave
GNU Octave
GNU Octave is a high-level language, primarily intended for numerical computations. It provides a convenient command-line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that is mostly compatible with MATLAB...

, Python
Python (programming language)
Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...

, R
R (programming language)
R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians for developing statistical software, and R is widely used for statistical software development and data analysis....

, Java
Java (programming language)
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...

, Lua, Ruby
Ruby (programming language)
Ruby is a dynamic, reflective, general-purpose object-oriented programming language that combines syntax inspired by Perl with Smalltalk-like features. Ruby originated in Japan during the mid-1990s and was first developed and designed by Yukihiro "Matz" Matsumoto...

 and C#.
Shogun has been under active development since 1999. Today there is a vibrant user community all over the world using Shogun as a base for research and education, and contributing to the core package.

Supported algorithms

Currently Shogun supports the following algorithms:
  • Support vector machines
    Support vector machine
    A support vector machine is a concept in statistics and computer science for a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis...

  • Dimensionality reduction algorithms, such as PCA, Kernel PCA, Locally Linear Embedding, Hessian Locally Linear Embedding, Local Tangent Space Alignment, Kernel Locally Linear Embedding, Kernel Local Tangent Space Alignment, Multidimensional Scaling, Isomap, Diffusion Maps, Laplacian Eigenmaps
  • Online learning algorithms such as SGD-QN, Vowpal Wabbit
  • Clustering algorithms: k-means and GMM
  • Kernel Ridge Regression, Support Vector Regression
  • Hidden Markov Models
  • K-Nearest Neighbors
    K-nearest neighbor algorithm
    In pattern recognition, the k-nearest neighbor algorithm is a method for classifying objects based on closest training examples in the feature space. k-NN is a type of instance-based learning, or lazy learning where the function is only approximated locally and all computation is deferred until...

  • Linear discriminant analysis
    Linear discriminant analysis
    Linear discriminant analysis and the related Fisher's linear discriminant are methods used in statistics, pattern recognition and machine learning to find a linear combination of features which characterizes or separates two or more classes of objects or events...

  • Kernel Perceptrons.


Many different kernels are implemented, ranging from kernels for numerical data (such as gaussian or linear kernels) to kernels on special data (such as strings over certain alphabets). The currently implemented kernels for numeric data include:
  • linear
  • gaussian
  • polynomial
  • sigmoid kernels


The supported kernels for special data include:
  • Spectrum
  • Weighted Degree
  • Weighted Degree with Shifts


The latter group of kernels allows processing of arbitrary sequences over fixed alphabets such as DNA sequence
DNA sequence
The sequence or primary structure of a nucleic acid is the composition of atoms that make up the nucleic acid and the chemical bonds that bond those atoms. Because nucleic acids, such as DNA and RNA, are unbranched polymers, this specification is equivalent to specifying the sequence of...

s as well as whole e-mail texts.

Special features

As Shogun was developed with bioinformatics
Bioinformatics
Bioinformatics is the application of computer science and information technology to the field of biology and medicine. Bioinformatics deals with algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, software...

applications in mind it is capable of processing huge datasets consisting of up to 10 million samples.
Shogun supports the use of pre-calculated kernels. It is also possible to use a combined kernel i.e. a kernel consisting of a linear combination of arbitrary kernels over different domains. The coefficients or weights of the linear combination can be learned as well. For this purpose Shogun offers a multiple kernel learning functionality.

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK