Wordfilter
A wordfilter is a script, typically used on Internet forums or chat rooms, that automatically scans users' posts or comments as they are submitted and changes or censors particular words or phrases.

The most primitive wordfilters search only for a specific string and replace it regardless of context. More advanced wordfilters distinguish between words, for example filtering "ass" but not "grass". The most advanced wordfilters may use regular expressions.
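The difference between the primitive and the more advanced approach can be illustrated with a minimal Python sketch. The function names are hypothetical and do not come from any particular forum package; the second function assumes that "whole word" can be approximated by the regular-expression word boundary `\b`.

```python
import re

def naive_filter(text: str, banned: str, replacement: str) -> str:
    """Replace every occurrence of `banned`, even inside longer words."""
    return text.replace(banned, replacement)

def word_filter(text: str, banned: str, replacement: str) -> str:
    """Replace `banned` only where it stands alone as a whole word."""
    pattern = re.compile(r"\b" + re.escape(banned) + r"\b", re.IGNORECASE)
    return pattern.sub(replacement, text)

sample = "Keep off the grass, you ass."
print(naive_filter(sample, "ass", "***"))  # the word "grass" is damaged too
print(word_filter(sample, "ass", "***"))   # only the standalone word changes
```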

Removal of vulgar language

Most commonly, wordfilters are used to censor language considered inappropriate by the operators of the forum or chat room. Expletives are typically partially replaced ('f*ck'), completely replaced ('****'), or replaced by nonsense words ('fark'). This relieves the administrators or moderators of the task of constantly patrolling the board to watch for such language. It may also help the message board avoid being blocked by content-control software installed on users' computers or networks, since such software often blocks access to Web pages that contain vulgar language.
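The three replacement styles mentioned above could be sketched in Python as follows. This is an illustration under stated assumptions, not the behaviour of any specific forum software: the banned word "darn", the substitute "dang", and all function names are hypothetical, and "partial replacement" is assumed to mean starring out the first vowel.

```python
import re

# Hypothetical banned word and nonsense substitute, for illustration only.
NONSENSE = {"darn": "dang"}

def partial_mask(word: str) -> str:
    """Star out the first vowel only, e.g. 'darn' -> 'd*rn'."""
    return re.sub(r"[aeiou]", "*", word, count=1, flags=re.IGNORECASE)

def full_mask(word: str) -> str:
    """Replace every character with an asterisk."""
    return "*" * len(word)

def nonsense_word(word: str) -> str:
    """Swap in a configured nonsense word, falling back to a full mask."""
    return NONSENSE.get(word.lower(), full_mask(word))

def apply_filter(text: str, banned: list[str], style) -> str:
    """Apply the chosen replacement `style` to each banned whole word."""
    alternatives = "|".join(re.escape(w) for w in banned)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: style(m.group(0)), text)

for style in (partial_mask, full_mask, nonsense_word):
    print(apply_filter("Well, darn it.", ["darn"], style))
```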

Filtered phrases may be permanently replaced when the post is saved (for example, in phpBB 1.x), or the original phrase may be saved but displayed as the censored text. In some software, users can view the text behind the wordfilter by quoting the post.
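The two storage strategies can be contrasted with a small Python sketch. It is not taken from phpBB or any other package; the class names, the banned word "darn", and the `quote` method are assumptions made for illustration.

```python
import re

# Hypothetical banned word, for illustration only.
BANNED = re.compile(r"\bdarn\b", re.IGNORECASE)

def censor(text: str) -> str:
    return BANNED.sub("****", text)

class FilterOnSave:
    """Censors before storing, so the original wording is lost."""
    def __init__(self):
        self.stored = []
    def submit(self, text: str) -> None:
        self.stored.append(censor(text))
    def render(self, index: int) -> str:
        return self.stored[index]

class FilterOnDisplay:
    """Stores the original and censors only when the post is shown."""
    def __init__(self):
        self.stored = []
    def submit(self, text: str) -> None:
        self.stored.append(text)
    def render(self, index: int) -> str:
        return censor(self.stored[index])
    def quote(self, index: int) -> str:
        # If quoting bypasses the filter, the raw text becomes visible.
        return self.stored[index]
```

With the second strategy, changing the filter later also changes how old posts are displayed, whereas the first strategy alters them permanently.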

Cliché control

Clichés, particular words or phrases constantly reused in posts and also known as "memes", often develop on forums. Some users find that these clichés add to the fun, but other users find them tedious, especially when overused. Administrators may configure the wordfilter to replace the annoying cliché with a more embarrassing phrase, or to remove it altogether.

Vandalism control

Internet forums are sometimes attacked by vandals who try to fill the forum with repeated nonsense messages, or by spammers who try to insert links to their commercial web sites. The site's wordfilter may be configured to remove the nonsense text used by the vandals, or to remove all links to particular websites from posts.
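Removing links to particular websites might look like the following Python sketch. The blocked domain `spam.example`, the placeholder text, and the function names are hypothetical; the URL pattern is deliberately simple and would, for instance, treat trailing punctuation as part of the link.

```python
import re
from urllib.parse import urlparse

# Hypothetical blocklist, for illustration only.
BLOCKED_DOMAINS = {"spam.example"}
URL_RE = re.compile(r"https?://\S+")  # simplistic: any run of non-space characters

def is_blocked(url: str) -> bool:
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)

def strip_blocked_links(text: str) -> str:
    """Replace links to blocked domains with a placeholder; keep other links."""
    def replace(match: re.Match) -> str:
        url = match.group(0)
        return "[link removed]" if is_blocked(url) else url
    return URL_RE.sub(replace, text)

print(strip_blocked_links(
    "Buy now at http://shop.spam.example/deal and see https://example.org/faq"
))
```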

Lameness filter

Lameness filters are text-based wordfilters used by Slash-based websites to stop junk comments from being posted in response to stories. Some of the things they are designed to filter include:
  • Too many capital letters
  • Too much repetition
  • ASCII art
  • Comments which are too short or long
  • Use of HTML tags that try to break web pages
  • Comment titles consisting solely of "first post"
  • Any occurrence of the word "gay" or other terms deemed (by the programmers) to be offensive/vulgar
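A few of the checks in the list above can be approximated with simple heuristics. The Python sketch below is not Slash's actual code; every threshold and name in it is an assumption chosen for illustration, and it covers only the capital-letter, repetition, length, and "first post" checks.

```python
import re

# Hypothetical thresholds, for illustration only.
MAX_CAPS_RATIO = 0.5   # share of letters allowed to be upper case
MAX_REPEAT_RUN = 5     # longest allowed run of one repeated character
MIN_LENGTH = 10
MAX_LENGTH = 5000

def lameness_reasons(title: str, body: str) -> list[str]:
    """Return the reasons a comment would be rejected (empty list if none)."""
    reasons = []
    letters = [c for c in body if c.isalpha()]
    if letters and sum(c.isupper() for c in letters) / len(letters) > MAX_CAPS_RATIO:
        reasons.append("too many capital letters")
    # One character followed by MAX_REPEAT_RUN or more copies of itself.
    if re.search(r"(.)\1{%d,}" % MAX_REPEAT_RUN, body):
        reasons.append("too much repetition")
    if len(body) < MIN_LENGTH:
        reasons.append("too short")
    elif len(body) > MAX_LENGTH:
        reasons.append("too long")
    if title.strip().lower() == "first post":
        reasons.append("first-post title")
    return reasons

print(lameness_reasons("FIRST POST", "W" + "O" * 9 + "!!!"))
```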

Circumventing filters

Since wordfilters are automated and look only for particular sequences of characters, users aware of the filters will sometimes try to circumvent them by changing their lettering just enough to avoid a match. A user trying to avoid a vulgarity filter might use "shi-" instead of "shit", enter "f.uck" instead of "fuck", or use leet. Some administrators respond by revising the wordfilters to catch common substitutions; others may make filter evasion a punishable offense of its own. More advanced evasion techniques include the use of images, hidden tags (such as fu[i][/i]ck), or Cyrillic characters.

Another method is to use a soft hyphen. A soft hyphen only indicates where a word can be split when breaking text lines and is not otherwise displayed. When one is placed in the middle of a word, the word is broken up and will in some cases not be recognised by the wordfilter.
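One way a filter might counter several of these tricks is to normalize a copy of the text before matching. The Python sketch below is an assumption about how this could be done, not a description of any real filter: the function name is hypothetical, the Cyrillic look-alike table is deliberately partial, and the normalized copy is meant for matching only, since it would mangle legitimate hyphenated words if displayed.

```python
import re

SOFT_HYPHEN = "\u00ad"
# An opening tag immediately followed by its closing tag, e.g. "[i][/i]".
EMPTY_TAG_PAIR = re.compile(r"\[(\w+)\]\[/\1\]")
# Partial, illustrative map of Cyrillic letters that resemble Latin ones.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0441": "c",
    "\u0440": "p", "\u0445": "x", "\u0443": "y",
})

def normalize_for_matching(text: str) -> str:
    """Return a lower-case copy with common evasion tricks undone."""
    text = text.replace(SOFT_HYPHEN, "")
    text = EMPTY_TAG_PAIR.sub("", text)
    text = text.translate(HOMOGLYPHS)
    # Drop a single dot or hyphen wedged between two letters.
    text = re.sub(r"(?<=[A-Za-z])[.\-](?=[A-Za-z])", "", text)
    return text.lower()
```

Text embedded in images is out of reach of this kind of normalization.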

Some more advanced filters, such as those in the online game RuneScape, can detect bypass attempts such as "sh1t" instead of "shit". However, the downside of sensitive wordfilters is that legitimate phrases get filtered out as well.
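Both the detection and its downside can be seen in a short Python sketch that maps common leet substitutions back to letters before matching. This is an assumed approach for illustration only and says nothing about how RuneScape's filter actually works; the substitution table and function name are hypothetical.

```python
import re

# Hypothetical leet table: digits and symbols mapped to the letters they imitate.
LEET = str.maketrans("013457@$", "oieastas")

def contains_banned(text: str, banned: list[str]) -> bool:
    """True if any banned whole word appears once leet characters are undone."""
    normalized = text.lower().translate(LEET)
    return any(
        re.search(r"\b" + re.escape(word) + r"\b", normalized)
        for word in banned
    )

print(contains_banned("a55", ["ass"]))                  # leet spelling is caught
print(contains_banned("Meet me in room 455", ["ass"]))  # an innocent number is caught too
```

The second call shows the trade-off: the more substitutions the filter undoes, the more legitimate text it is likely to flag.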

Censorship aspects

Wordfilters are coded into Internet forums or chat rooms, and operate only on material submitted to the forum or chat room in question. This distinguishes wordfilters from content-control software, which is typically installed on an end user's PC or computer network, and which can filter all Internet content sent to or from the PC or network in question. Since wordfilters alter a user's words without his or her consent, some users still consider them to be censorship, while others consider them an acceptable part of a forum operator's right to control the contents of the forum.

Cultural significance

Some wordfilters originally implemented for their humorous value became Internet memes. One example was 4chan's wordfilter, which replaced "wapanese" (a slang term for a Westerner obsessed with Japanese culture) with the initially nonsensical "weeaboo", a word taken from a strip of the webcomic The Perry Bible Fellowship. "Weeaboo" then became a popular synonym for "wapanese" (for example, as of November 2009 it had seven times as many Google hits).

False positives

The Scunthorpe problem occurs when a wordfilter, spam filter, or search engine blocks content because the text contains a string of letters that is shared with an obscene word.
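A filter operator could audit a word list for this kind of false positive with a check like the Python sketch below. The function name and sample sentence are hypothetical; the sketch simply reports words that contain a banned string without being that string.

```python
def false_positives(text: str, banned: list[str]) -> list[str]:
    """Return words that contain a banned string but are not equal to it."""
    hits = []
    for raw in text.split():
        word = raw.strip(".,;:!?\"'()").lower()
        if any(b in word and b != word for b in banned):
            hits.append(word)
    return hits

print(false_positives("A classic passport photo of an ass.", ["ass"]))
```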
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 