SimMetrics
SimMetrics is an open source, extensible library of algorithms for calculating string metrics: measures of similarity or dissimilarity between two text strings. SimMetrics was developed and released by Dr Sam Chapman at the University of Sheffield.

SimMetrics is licensed under the terms of the GNU General Public License.

The SimMetrics open source library includes the following metrics:
  • Levenshtein distance,
  • Block distance, also called city block distance or L1 distance,
  • Cosine similarity,
  • Jaccard index,
  • Needleman–Wunsch algorithm or Sellers algorithm,
  • Smith–Waterman algorithm,
  • Gotoh distance or Smith–Waterman–Gotoh distance,
  • Monge–Elkan distance,
  • Jaro–Winkler distance,
  • Soundex,
  • Matching coefficient,
  • Dice's coefficient,
  • Jaccard similarity or Jaccard coefficient or Tanimoto coefficient,
  • Overlap coefficient,
  • Euclidean distance,
  • Q-gram distance,
  • and more.
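
Several of the listed measures (Jaccard, Dice's and overlap coefficients) compare two sets of tokens. The following is a minimal Python sketch of that idea, not the SimMetrics API: the helper names `qgrams`, `jaccard`, `dice` and `overlap` are hypothetical, and the choice of character bigrams as tokens is an assumption made for illustration. The library's own tokenisation may differ.

```python
def qgrams(text, q=2):
    """Return the set of character q-grams of text (hypothetical helper)."""
    if len(text) < q:
        # Strings shorter than q are treated as a single token (assumption).
        return {text} if text else set()
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def jaccard(a, b):
    """Jaccard coefficient: size of intersection over size of union."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def dice(a, b):
    """Dice's coefficient: twice the intersection over the sum of sizes."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def overlap(a, b):
    """Overlap coefficient: intersection over the smaller set's size."""
    if not a or not b:
        return 1.0 if a == b else 0.0
    return len(a & b) / min(len(a), len(b))


x, y = qgrams("night"), qgrams("nacht")
print(jaccard(x, y), dice(x, y), overlap(x, y))
```

Here "night" and "nacht" each yield four bigrams and share only "ht", so the Jaccard coefficient is 1/7 while Dice's and the overlap coefficient are both 0.25.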


SimMetrics provides a library of similarity measures between pairs of strings. Each measure returns a normalised floating-point score between 0.0 and 1.0, and the unnormalised metric output is also available.
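
The difference between an unnormalised metric and a 0.0 to 1.0 similarity score can be sketched with Levenshtein distance. This is an illustrative Python sketch under stated assumptions, not SimMetrics code: the function names are hypothetical, and dividing by the length of the longer string is one common normalisation convention that the library is not confirmed to use.

```python
def levenshtein(s, t):
    """Unnormalised edit distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn s into t."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(s, t):
    """Map the distance onto 0.0-1.0 (assumed convention: divide by the
    length of the longer string; 1.0 means the strings are identical)."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(s, t) / longest


print(levenshtein("kitten", "sitting"))
print(levenshtein_similarity("kitten", "sitting"))
```

For "kitten" and "sitting" the unnormalised distance is 3, and with a longer-string length of 7 the similarity under this convention is 1 - 3/7, roughly 0.571.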

SimMetrics has been reimplemented and expanded by the original authors as a new tool, K-Integrate. K-Integrate is part of a commercial venture by the company Knowledge Now Limited. Unlike SimMetrics, this tool is available under a commercial license for commercial use.


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 