Statistically Improbable Phrases
Statistically Improbable Phrases (SIPs, also called Statimprophrases) are a system developed by Amazon.com to compare all of the books indexed in its Search Inside! program and to find, in each book, the phrases least likely to appear in any other indexed book. The system identifies the most nearly unique portions of a book for use as a summary or as keywords.
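
The text above does not say how the comparison is computed, so the following is only a minimal sketch of the general idea. It assumes that a phrase counts as "improbable" when its relative frequency in one book is much higher than in the rest of the collection. The function names `ngrams` and `improbability_scores`, the choice of word bigrams, the add-one smoothing, and the toy texts are all illustrative assumptions, not Amazon's method.

```python
from collections import Counter
import re

def ngrams(text, n):
    """Return the lowercase word n-grams of `text` as a list of strings."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def improbability_scores(book, other_books, n=2):
    """Score each n-gram of `book` by the ratio of its relative frequency
    in `book` to its (add-one smoothed) relative frequency elsewhere."""
    in_book = Counter(ngrams(book, n))
    elsewhere = Counter()
    for other in other_books:
        elsewhere.update(ngrams(other, n))
    book_total = sum(in_book.values()) or 1
    else_total = sum(elsewhere.values())
    return {
        phrase: (count / book_total)
        / ((elsewhere[phrase] + 1) / (else_total + 1))
        for phrase, count in in_book.items()
    }

# Hypothetical toy corpus, for illustration only.
scores = improbability_scores(
    "the red kite flew and the red kite fell",
    ["the cat sat and the dog sat"],
)
for phrase in sorted(scores, key=scores.get, reverse=True):
    print(f"{scores[phrase]:.2f}  {phrase}")
```

A phrase that is frequent in one book but absent elsewhere gets a high score, while a phrase that is common everywhere scores low.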

The term SIP is also used more generally for a search string likely to generate meaningful results from a search engine; that is, a string whose chance of occurring in a desirable result is much greater than its chance of occurring in an undesirable result.
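
This broader definition can be read as a likelihood ratio: the probability that the string occurs in a desirable result divided by the probability that it occurs in an undesirable one. The sketch below estimates that ratio from two small labeled sets of documents. The function name `likelihood_ratio`, the substring matching, the add-one smoothing, and the sample documents are assumptions made for illustration.

```python
def likelihood_ratio(query, desirable, undesirable):
    """Estimate P(query occurs | desirable) / P(query occurs | undesirable).

    Add-one smoothing keeps the ratio finite when the query never
    occurs in one of the two sets.
    """
    q = query.lower()
    hits_good = sum(q in doc.lower() for doc in desirable)
    hits_bad = sum(q in doc.lower() for doc in undesirable)
    p_good = (hits_good + 1) / (len(desirable) + 2)
    p_bad = (hits_bad + 1) / (len(undesirable) + 2)
    return p_good / p_bad

# Hypothetical labeled results, for illustration only.
wanted = ["tf-idf weighting in retrieval", "tf-idf for text mining"]
unwanted = ["cooking with garlic", "retrieval of garden tools", "weather today"]

for query in ("tf-idf", "retrieval"):
    print(query, likelihood_ratio(query, wanted, unwanted))
```

A ratio well above 1 suggests the string behaves like a SIP for the desired results; a ratio near 1 means it does little to separate the two sets.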

Example

  • Book 1


The big brown fox jumped over the lazy dogs. The lazy dog did not like the fact that the big brown fox jumped over him, so the lazy dog ran after him.

  • Book 2


You never have to log in to read Wikipedia. You do not have to log in even to edit articles on Wikipedia—anyone can edit almost any article, even without logging in. Nevertheless, creating an account is quick, free and non-intrusive, and it's generally considered a good idea to do so, for a variety of reasons.

  • Book 3


If you create an account, you can pick a username. Edits you make while logged in will be assigned to that name. That means you get full credit for your contributions in the page history (when not logged in, the edits are just assigned to your (potentially random) IP address). You can also view all your contributions by clicking the "My contributions" link, which is only visible when you are logged in.

SIPs

For Book 1, the SIPs would most likely be "big brown fox" and "lazy dog".

For Book 2, the SIP would most likely be "Wikipedia", but not "account", because that word also appears in Book 3.

For Book 3, the SIPs would most likely be "contributions" and "logged in".
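
The three-book example can be checked with a crude filter: keep every phrase of one to three words that occurs at least twice in a book and never in the other two. This is a sketch under those assumptions rather than a description of the real system; the names `ngrams` and `unique_repeated_phrases` and the thresholds `max_n` and `min_count` are hypothetical.

```python
from collections import Counter
import re

def ngrams(text, n):
    """Return the lowercase word n-grams of `text` as a list of strings."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def unique_repeated_phrases(books, max_n=3, min_count=2):
    """For each book, map every phrase of 1..max_n words that occurs at
    least `min_count` times in it, and in no other book, to its count."""
    counts = [
        Counter(g for n in range(1, max_n + 1) for g in ngrams(text, n))
        for text in books
    ]
    result = []
    for i, own in enumerate(counts):
        elsewhere = set()
        for j, other in enumerate(counts):
            if j != i:
                elsewhere.update(other)
        result.append({p: c for p, c in own.items()
                       if c >= min_count and p not in elsewhere})
    return result

book1 = ("The big brown fox jumped over the lazy dogs. The lazy dog did not "
         "like the fact that the big brown fox jumped over him, so the lazy "
         "dog ran after him.")
book2 = ("You never have to log in to read Wikipedia. You do not have to log "
         "in even to edit articles on Wikipedia\u2014anyone can edit almost "
         "any article, even without logging in. Nevertheless, creating an "
         "account is quick, free and non-intrusive, and it's generally "
         "considered a good idea to do so, for a variety of reasons.")
book3 = ('If you create an account, you can pick a username. Edits you make '
         'while logged in will be assigned to that name. That means you get '
         'full credit for your contributions in the page history (when not '
         'logged in, the edits are just assigned to your (potentially '
         'random) IP address). You can also view all your contributions by '
         'clicking the "My contributions" link, which is only visible when '
         'you are logged in.')

sips = unique_repeated_phrases([book1, book2, book3])
for number, phrases in enumerate(sips, start=1):
    print(f"Book {number}:", sorted(phrases, key=phrases.get, reverse=True))
```

The filter recovers the phrases listed above, but it also returns fragments such as "big brown" and other word sequences that merely happen to be unique, so a practical system would need further ranking or pruning.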

See also

  • Googlewhack: a pair of words occurring on a single webpage, as indexed by Google
  • tf–idf: a similar weight often used in information retrieval and text mining
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 