SpamBayes
SpamBayes is a Bayesian spam filter written in Python that uses techniques laid out by Paul Graham in his essay "A Plan for Spam". It has since been improved by Gary Robinson and Tim Peters, among others.

The most notable difference between a conventional Bayesian filter and the filter used by SpamBayes is that SpamBayes uses three classifications rather than two: spam, non-spam (called ham in SpamBayes), and unsure. The user trains the filter by marking messages as either ham or spam. When filtering a message, the filter generates one score for ham and another for spam.
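The training step can be illustrated with a minimal Python sketch. It assumes each message has already been split into tokens. The sketch counts how many ham and spam messages contain each token. It then turns those counts into a per-token spam probability: a frequency ratio normalised by corpus size, smoothed with the adjustment proposed by Gary Robinson so that rarely seen tokens stay close to a neutral prior. The class name `TokenCounts` and the prior values 0.5 and 0.45 are illustrative assumptions, not SpamBayes' actual API.

```python
from collections import Counter

# Assumed prior values for illustration (hypothetical constants, not SpamBayes' API).
UNKNOWN_WORD_PROB = 0.5       # x: probability assigned to a token never seen in training
UNKNOWN_WORD_STRENGTH = 0.45  # s: how much weight that prior carries


class TokenCounts:
    """Hypothetical trainer: counts how many ham/spam messages contain each token."""

    def __init__(self):
        self.nham = 0
        self.nspam = 0
        self.hamcounts = Counter()
        self.spamcounts = Counter()

    def train(self, tokens, is_spam):
        # Each distinct token is counted at most once per message.
        unique = set(tokens)
        if is_spam:
            self.nspam += 1
            self.spamcounts.update(unique)
        else:
            self.nham += 1
            self.hamcounts.update(unique)

    def spamprob(self, token):
        hamcount = self.hamcounts[token]
        spamcount = self.spamcounts[token]
        n = hamcount + spamcount
        if n == 0:
            return UNKNOWN_WORD_PROB
        # Normalise by corpus size so unequal amounts of ham and spam do not skew the ratio.
        hamratio = hamcount / max(self.nham, 1)
        spamratio = spamcount / max(self.nspam, 1)
        p = spamratio / (hamratio + spamratio)
        # Robinson's adjustment: (s*x + n*p) / (s + n) pulls rare tokens toward the prior x.
        s, x = UNKNOWN_WORD_STRENGTH, UNKNOWN_WORD_PROB
        return (s * x + n * p) / (s + n)


if __name__ == "__main__":
    counts = TokenCounts()
    counts.train("cheap pills buy now".split(), is_spam=True)
    counts.train("meeting notes for monday".split(), is_spam=False)
    print(round(counts.spamprob("pills"), 3))   # 0.845
    print(round(counts.spamprob("monday"), 3))  # 0.155
    print(counts.spamprob("hello"))             # 0.5 (never seen)
```

After a single training message, a token is not driven all the way to 0 or 1. The smoothing keeps the filter from being overconfident about words it has seen only once.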

If the spam score is high and the ham score is low, the message will be classified as spam.
If the spam score is low and the ham score is high, the message will be classified as ham.
If the scores are both high or both low, the message will be classified as unsure.
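The three-way rule above can be sketched in Python. The sketch takes per-token spam probabilities, assumed to lie strictly between 0 and 1, such as those produced by a trained filter. It computes a spam score and a ham score with Fisher's chi-squared combining, the method associated with Gary Robinson's work on SpamBayes. The two scores are then folded into one indicator, `(spam - ham + 1) / 2`. That indicator is near 1 when only the spam score is high, near 0 when only the ham score is high, and near 0.5 when both are high or both are low. The token-selection limits and the cutoffs 0.20 and 0.90 are assumed values chosen for illustration. `score` and `classify` are hypothetical helper names.

```python
import math

# Assumed parameters for illustration (hypothetical, not SpamBayes' option names).
MIN_PROB_STRENGTH = 0.1   # ignore tokens whose probability is within 0.1 of neutral
MAX_DISCRIMINATORS = 150  # use at most this many of the strongest tokens
HAM_CUTOFF = 0.20         # indicator below this -> ham
SPAM_CUTOFF = 0.90        # indicator at or above this -> spam


def chi2q(x2, v):
    """Survival function P(chi-squared with v degrees of freedom >= x2), for even v."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def score(token_probs):
    """Return (spam_score, ham_score, indicator) from per-token spam probabilities."""
    # Clamp so the logarithms below stay finite, then keep only informative tokens.
    probs = [min(max(p, 1e-6), 1.0 - 1e-6) for p in token_probs]
    strong = [p for p in probs if abs(p - 0.5) >= MIN_PROB_STRENGTH]
    strong.sort(key=lambda p: abs(p - 0.5), reverse=True)
    strong = strong[:MAX_DISCRIMINATORS]
    if not strong:
        return 0.0, 0.0, 0.5  # no evidence: both scores low, indicator neutral
    n = len(strong)
    # Sum logarithms instead of multiplying probabilities, to avoid underflow.
    log_not_spam = sum(math.log(1.0 - p) for p in strong)
    log_spam = sum(math.log(p) for p in strong)
    spam_score = 1.0 - chi2q(-2.0 * log_not_spam, 2 * n)
    ham_score = 1.0 - chi2q(-2.0 * log_spam, 2 * n)
    return spam_score, ham_score, (spam_score - ham_score + 1.0) / 2.0


def classify(token_probs):
    """Three-way decision: 'spam', 'ham', or 'unsure'."""
    _, _, indicator = score(token_probs)
    if indicator >= SPAM_CUTOFF:
        return "spam"
    if indicator < HAM_CUTOFF:
        return "ham"
    return "unsure"
```

For example, `classify([0.99] * 5)` yields `"spam"` and `classify([0.01] * 5)` yields `"ham"`. A message with strong evidence in both directions, such as `classify([0.99, 0.01] * 3)`, has a high spam score and a high ham score, so it lands in the unsure band.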

This approach keeps the numbers of false positives and false negatives low, but it may leave a number of messages classified as unsure, which then need a human decision.

Web filtering

Some work has gone into applying SpamBayes to filter internet content via a proxy web server.


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 