Content filtering
Content filtering is the technique whereby content is blocked or allowed based on analysis of its content, rather than its source or other criteria. It is most widely used on the internet to filter email and web access.

Content filtering of email

Content filtering is the most commonly used group of methods to filter spam. Content filters act either on the content (the information contained in the mail body) or on the mail headers (such as "Subject:") to classify, accept or reject a message.

The most popular filter is the Bayesian filter, which is a statistical filter.
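
To make "statistical" concrete, the following is a minimal sketch of a naive Bayes classifier trained on a handful of invented messages. The class name `TinyBayesFilter`, the training sentences and the use of add-one smoothing are assumptions made for illustration; they are not taken from any particular mail filter.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyBayesFilter:
    """Hypothetical minimal naive Bayes spam filter (illustration only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one message under 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, tokens, label):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_messages = sum(self.message_counts.values())
        # Prior: fraction of training messages carrying this label.
        score = math.log(self.message_counts[label] / total_messages)
        for token in tokens:
            # Add-one smoothing avoids log(0) for words unseen in this class.
            count = self.word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        return score

    def is_spam(self, text):
        """Compare the log-probability of the text under each class."""
        tokens = tokenize(text)
        return self._log_score(tokens, "spam") > self._log_score(tokens, "ham")


# Invented training data, far too small for real use.
spam_filter = TinyBayesFilter()
spam_filter.train("win cash now claim your free prize", "spam")
spam_filter.train("free prize waiting click now", "spam")
spam_filter.train("meeting agenda for monday attached", "ham")
spam_filter.train("lunch on monday with the project team", "ham")
```

With this toy training set, `spam_filter.is_spam("claim your free cash prize")` is true and `spam_filter.is_spam("agenda for the team meeting")` is false. A practical filter would be trained on thousands of messages and would usually report a probability rather than a yes/no answer.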

Anti-virus methods can usually be classified as content filters too, since they scan simplified versions of either the binary attachments of mail or the HTML contents. Content filters can also refer to parental controls software that analyzes data and either restricts the data or changes it, as with chat filtering.

Depending on where content or packets are filtered in the OSI or Internet model, content filtering refers to technologies designed to ascertain the logic of the data. What is looked for depends on the application and on user requirements: spam, viruses, computer worms, denial-of-service attacks, trojans and spyware, or the human-understandable subject of the data, such as hate websites, swear words or the subject matter of a chat application.

The Internet does not have a clear security model standard designed to limit the extent of security incidents such as worms, which could potentially overload the Internet and cause a global denial of service. Developing intelligent and sophisticated content filtering technology, with standards and cooperation among ISPs, may be the solution.

Content filtering of web content

Content filtering is commonly used by organizations such as offices and schools to prevent computer users from viewing inappropriate web sites or content, or as a pre-emptive security measure to prevent access to known malware hosts. Filtering rules are typically set by a central IT department and may be implemented via software on individual computers or at a central point on the network such as the proxy server or internet router. Depending on the sophistication of the system used, it may be possible for different computer users to have different levels of internet access.
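
As a rough sketch of how a central filter might give different users different levels of access, the example below looks up a site's category and checks it against a per-group block list. The host names, categories, group names and the function `is_allowed` are all hypothetical; real products use much larger category databases and their own policy formats.

```python
from urllib.parse import urlsplit

# Hypothetical category database; real products ship much larger lists.
HOST_CATEGORIES = {
    "malware.example": "malware",
    "games.example": "games",
    "news.example": "news",
}

# Hypothetical per-group policy: categories each group may NOT visit.
BLOCKED_BY_GROUP = {
    "students": {"malware", "games"},
    "staff": {"malware"},
}


def is_allowed(url, group):
    """Return True if a user in `group` may fetch `url` under the policy."""
    host = (urlsplit(url).hostname or "").lower()
    category = HOST_CATEGORIES.get(host, "uncategorized")
    # Unknown groups get a strict default: every known category is blocked.
    blocked = BLOCKED_BY_GROUP.get(group, set(HOST_CATEGORIES.values()))
    return category not in blocked
```

Under this made-up policy, `is_allowed("http://games.example/play", "staff")` is true while the same URL is refused for `"students"`, and the malware host is refused for both groups.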

Content filtering software is sometimes also used on home computers in order to restrict access to inappropriate websites for children using the computer. Such software is typically described as parental control software.

Filtering methods

Common content filtering methods include:
  • Attachment - The blocking of certain types of file (e.g. executable programs).
  • Bayesian
  • DNS-based filtering - e.g. www.opendns.com
  • Char-set
  • Content-encoding
  • Heuristic - Filtering based on heuristic scoring of the content against multiple criteria.
  • HTML anomalies
  • Language
  • Mail header - Filtering based solely on the analysis of e-mail headers. Made less effective by the ease of message header forgery.
  • Mailing List - Used to detect mailing list messages and file them in appropriate folders.
  • Phrases - Filtering based on detecting phrases in the content text.
  • Proximity - Filtering based on detecting words or phrases when used in proximity.
  • Regular Expression - Filtering based on rules written as regular expressions.
  • URL - Filtering based on the URL. Suitable for blocking websites or sections of websites.
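
Several of the methods listed above can be sketched in a few lines each. The example below covers attachment, phrase, proximity, regular expression and URL filtering; every rule value (extensions, phrases, patterns, hosts) and every function name is invented for illustration and is not a recommended policy.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# All rule values below are made-up examples, not a recommended policy.
BLOCKED_EXTENSIONS = {".exe", ".scr", ".bat"}
BLOCKED_PHRASES = ["free prize", "act now"]
BLOCKED_PATTERNS = [re.compile(r"\bv[i1]agra\b", re.IGNORECASE)]
BLOCKED_URL_PREFIXES = [("intranet.example", "/games/")]


def attachment_blocked(filename):
    """Attachment filtering: reject files by extension."""
    return PurePosixPath(filename).suffix.lower() in BLOCKED_EXTENSIONS


def phrase_blocked(text):
    """Phrase filtering: look for fixed phrases in the text."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)


def proximity_blocked(text, first, second, max_gap=3):
    """Proximity filtering: both words occur within `max_gap` words."""
    words = re.findall(r"\w+", text.lower())
    first_at = [i for i, w in enumerate(words) if w == first]
    second_at = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= max_gap for i in first_at for j in second_at)


def regex_blocked(text):
    """Regular expression filtering: any rule pattern matches."""
    return any(p.search(text) for p in BLOCKED_PATTERNS)


def url_blocked(url):
    """URL filtering: block a site section by host and path prefix."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return any(host == h and parts.path.startswith(prefix)
               for h, prefix in BLOCKED_URL_PREFIXES)
```

For instance, `proximity_blocked("send your bank account password today", "bank", "password")` is true because the two words are two positions apart, whereas a plain phrase rule for "bank password" would miss that sentence.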


Most content filtering systems use a combination of techniques.
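
One common way to combine techniques is heuristic scoring: each rule that fires adds a weight, and the total is compared with a threshold. The sketch below assumes a message represented as a plain dictionary; the rule names, weights and threshold are arbitrary values chosen for illustration.

```python
import re

# Hypothetical rules: (name, weight, test function). Weights are arbitrary.
RULES = [
    ("subject_all_caps", 1.5, lambda m: m["subject"].isupper()),
    ("suspicious_phrase", 2.0, lambda m: "free prize" in m["body"].lower()),
    ("executable_attachment", 3.0,
     lambda m: any(n.lower().endswith(".exe") for n in m["attachments"])),
    ("many_links", 1.0,
     lambda m: len(re.findall(r"https?://", m["body"])) > 3),
]

THRESHOLD = 3.0  # assumed cut-off; real systems tune this on labelled mail


def score_message(message):
    """Return (total score, names of the rules that fired)."""
    fired = [(name, weight) for name, weight, test in RULES if test(message)]
    return sum(w for _, w in fired), [name for name, _ in fired]


def classify(message):
    """Reject the message when its combined score reaches the threshold."""
    score, _ = score_message(message)
    return "reject" if score >= THRESHOLD else "accept"
```

A message with the subject "YOU WON" and a body containing "free prize" fires two rules for a score of 3.5 and is rejected, although neither rule alone would reach the threshold.
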
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 