Change Detection and Notification

Internet change detection and notification (CDN) refers to automatic detection of changes made to World Wide Web pages and notification to interested users by email or other means. Whereas search engines are designed to find web pages, CDN systems are designed to monitor changes to web pages. Before change detection and notification, it was necessary for users to manually check for web page changes, either by revisiting web sites or periodically searching again. Efficient and effective change detection and notification is hampered by the fact that most servers do not accurately track content changes through Last-Modified or ETag headers.
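
As an illustration of how a client might use these headers, the following Python sketch builds a conditional request and falls back to hashing the page body when the server's validators cannot be trusted. The function names (conditional_headers, content_fingerprint, has_changed) are hypothetical, and the choice of SHA-256 as the fallback is an assumption, not something prescribed by any particular CDN service.

```python
import hashlib


def conditional_headers(etag=None, last_modified=None):
    """Build request headers for a conditional GET from stored validators."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def content_fingerprint(body: bytes) -> str:
    """Hash the page body so two fetches can be compared cheaply."""
    return hashlib.sha256(body).hexdigest()


def has_changed(status: int, body: bytes, previous_fingerprint) -> bool:
    """Decide whether a page changed, given the response to a conditional GET."""
    if status == 304:
        # Not Modified: the server reports that the stored copy is current.
        return False
    # A 200 response does not prove a change, because many servers ignore
    # or mishandle validators. Compare a hash of the body instead.
    return content_fingerprint(body) != previous_fingerprint
```

A status of 304 lets the monitor skip the download entirely, while the hash comparison covers servers that always answer 200 regardless of the validators sent.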

History

In 1996, NetMind developed the first change detection and notification tool, known as Mind-it, which ran for six years. This spawned new services such as ChangeDetection.com (1999), ChangeDetect (2002) and Google Alerts (2004). Historically, change polling has been done either by a server that sent email notifications or by a desktop program that audibly alerted the user to a change.

The prevalence of cloud computing and smartphones is changing the CDN market, namely how polling is done and how notifications are sent. A mobile CDN device with a cloud back end does not suffer from limited bandwidth, storage or processing power, and notifications are delivered to wherever the device is. One such service is dasPing (2011).

Three approaches

  • A local application with a graphical user interface polls and tracks changes.
  • A server polls, tracks changes and sends email notifications; users manage it through a web browser interface.
  • A mobile device connects to a cloud server and can be notified in real time by the server when a change is detected.
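
All three approaches share the same core loop: fetch each monitored page, compare it with the previous version, and notify on a difference. The Python sketch below shows that loop in a minimal form. The names poll_once and run are hypothetical, and the fetch and notify callables are assumptions standing in for whatever transport (HTTP client, email, push message) a given system uses.

```python
import hashlib
import time


def poll_once(urls, fetch, notify, seen):
    """Fetch every URL once and notify for each page whose content changed.

    fetch(url) must return the page body as bytes; notify(url) delivers the
    alert. seen maps each URL to the hash recorded on the previous poll.
    """
    changed = []
    for url in urls:
        digest = hashlib.sha256(fetch(url)).hexdigest()
        previous = seen.get(url)
        seen[url] = digest
        # The first poll only records a baseline, so it never notifies.
        if previous is not None and previous != digest:
            notify(url)
            changed.append(url)
    return changed


def run(urls, fetch, notify, interval_seconds=3600.0, rounds=None):
    """Poll repeatedly; rounds=None means poll until interrupted."""
    seen = {}
    done = 0
    while rounds is None or done < rounds:
        poll_once(urls, fetch, notify, seen)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_seconds)
```

Passing fetch and notify as parameters keeps the loop independent of where it runs: a desktop program could pass a function that plays a sound, while a server could pass one that sends email.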

Considerations

Some web pages change on every visit because the presented page includes advertisements or feeds. This can trigger false positives in change detection, since users are often interested only in changes to the main content. Approaches to mitigate this issue include:
  • Create a metric of difference between two versions of a page (calculated for example from change in total size, changes in HTML file, or changes in the DOM tree) and ignore changes below some threshold. The threshold may be set by the user, or estimated automatically by comparing some early versions of the page.

  • Content extraction. For popular sites, or sites running popular software, content may be actively separated from chaff by selecting a sub-tree of the DOM, for example using XPath.
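
The two mitigations can be combined: extract the sub-tree of interest first, then apply a difference threshold to its text. The Python sketch below is one possible illustration, under several assumptions: the page is well-formed XHTML (the standard-library parser used here rejects typical real-world HTML), the path uses only the limited XPath subset that xml.etree.ElementTree supports, and the default threshold of 0.05 is an arbitrary example value. All function names are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET


def extract_text(xhtml: str, path: str) -> str:
    """Return the whitespace-normalised text of the first node matching path."""
    root = ET.fromstring(xhtml)
    node = root.find(path)
    if node is None:
        return ""
    return " ".join("".join(node.itertext()).split())


def difference(old: str, new: str) -> float:
    """Difference metric in [0, 1]: 0 means identical text."""
    return 1.0 - difflib.SequenceMatcher(None, old, new).ratio()


def is_significant_change(old_page, new_page, path, threshold=0.05):
    """Report a change only if the extracted content differs enough."""
    old_text = extract_text(old_page, path)
    new_text = extract_text(new_page, path)
    return difference(old_text, new_text) > threshold
```

For example, with the path ".//div[@id='main']", a rotating advertisement in a sibling element does not affect the result, because only the text under the selected element is compared.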


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.