Adversarial information retrieval
Adversarial information retrieval (adversarial IR) is a topic in information retrieval related to strategies for working with a data source where some portion of it has been manipulated maliciously. Tasks can include gathering, indexing, filtering, retrieving and ranking information from such a data source. Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation.

On the Web, the predominant form of such manipulation is search engine spamming (also known as spamdexing), which involves employing various techniques to disrupt the activity of web search engines, usually for financial gain. Examples of spamdexing are link-bombing, comment or referrer spam, spam blogs (splogs), and malicious tagging. Reverse engineering of ranking algorithms, advertisement blocking, and web content filtering may also be considered forms of adversarial data manipulation.

Activities intended to poison the supply of useful data make search engines less useful for users. If search engines respond by becoming more exclusionary, they risk becoming more like directories and less dynamic.

Topics

Topics related to Web spam (spamdexing):
  • Link spam
  • Keyword spamming
  • Cloaking
  • Malicious tagging
  • Spam related to blogs, including comment spam, splogs, and ping spam

Other topics:
  • Click fraud detection
  • Reverse engineering of a search engine's ranking algorithm
  • Web content filtering
  • Advertisement blocking
  • Stealth crawling
  • Malicious tagging or voting in social networks
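
As an illustration of one topic listed above, cloaking means serving a search engine crawler different content from what a human visitor's browser receives. A minimal sketch of one possible detection heuristic follows. It assumes the two versions of a page have already been fetched (one as a crawler, one as a browser) and compares their vocabularies. The function names and the 0.5 threshold are hypothetical choices for illustration, not an established method or a value taken from the literature.

```python
from __future__ import annotations

import re


def tokens(text: str) -> set[str]:
    """Return the set of lowercase alphanumeric word tokens in a page body."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_cloaked(crawler_body: str, browser_body: str,
                  threshold: float = 0.5) -> bool:
    """Flag a page when the crawler and browser versions share little vocabulary.

    The threshold is an illustrative assumption and would need tuning on
    labelled data before any real use.
    """
    return jaccard(tokens(crawler_body), tokens(browser_body)) < threshold
```

A heuristic like this would also flag legitimately dynamic pages (rotating advertisements, personalised content), so in practice it could only serve as one signal among several.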

History

The term "adversarial information retrieval" was coined in 2000 by Andrei Broder (then Chief Scientist at AltaVista) during the Web plenary session at the TREC-9 conference.

External links

  • AIRWeb: series of workshops on Adversarial Information Retrieval on the Web
  • Web Spam Challenge: competition for researchers on Web Spam Detection
  • Web Spam Datasets: datasets for research on Web Spam Detection

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.