The Extended Boolean Model was described in a 1983 Communications of the ACM article by Gerard Salton, Edward A. Fox, and Harry Wu. Its goal is to overcome the drawbacks of the Boolean model used in information retrieval. The Boolean model does not consider term weights in queries, and the result set of a Boolean query is often either too small or too big. The idea of the extended model is to make use of partial matching and term weights, as in the vector space model. It combines the characteristics of the vector space model with the properties of Boolean algebra and ranks documents by their similarity to the query. A document that matches only some of the queried terms can thus be judged somewhat relevant and returned as a result, whereas the standard Boolean model would not return it.
Thus, the extended Boolean model can be considered a generalization of both the Boolean and vector space models; those two are special cases if suitable settings and definitions are employed. Further, research has shown that retrieval effectiveness improves relative to Boolean query processing. Other research has shown that relevance feedback and query expansion can be integrated with extended Boolean query processing.
Definitions
In the Extended Boolean model, a document is represented as a vector, as in the vector space model. Each dimension i corresponds to a separate term associated with the document.
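The weight of term $K_x$ associated with document $d_j$ is measured by its normalized term frequency and can be defined as:

$$w_{x,j} = f_{x,j} \cdot \frac{Idf_x}{\max_i Idf_i}$$

where $f_{x,j}$ is the normalized frequency of term $K_x$ in document $d_j$ and $Idf_x$ is the inverse document frequency of $K_x$.

The weight vector associated with document $d_j$ can be represented as:

$$\mathbf{v}_{d_j} = [w_{1,j}, w_{2,j}, \ldots, w_{i,j}]$$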
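The weighting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `term_weights` is hypothetical, term frequency is assumed to be normalized by the most frequent term in each document, and the inverse document frequency is assumed to be `log(N / n_x)`. The text itself does not fix either choice.

```python
import math
from collections import Counter


def term_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed form: w[x, j] = f[x, j] * idf[x] / max_i idf[i], where
    f[x, j] is the count of x in document j divided by the largest
    term count in that document (an assumption), and
    idf[x] = log(N / n_x) (also an assumption).
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for c in counts for term in c)
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    # Fall back to 1.0 when every idf is zero, to avoid dividing by zero.
    max_idf = max(idf.values(), default=0.0) or 1.0

    weights = []
    for c in counts:
        max_count = max(c.values(), default=1)
        weights.append(
            {t: (n / max_count) * idf[t] / max_idf for t, n in c.items()}
        )
    return weights


docs = [["boolean", "model", "model"], ["vector", "model"]]
weights = term_weights(docs)
```

With this normalization every weight falls in the interval [0, 1], which the similarity formulas of the model rely on. A term that occurs in every document, such as "model" here, receives weight 0.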
The 2 Dimensions Example
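Consider the space composed of only two terms, $K_x$ and $K_y$, with corresponding term weights $w_1$ and $w_2$. For the query $q_{or} = (K_x \lor K_y)$, the similarity is the normalized distance of the document from the point (0, 0), where neither term is present:

$$sim(q_{or}, d) = \sqrt{\frac{w_1^2 + w_2^2}{2}}$$

For the query $q_{and} = (K_x \land K_y)$, the similarity is one minus the normalized distance from the point (1, 1), where both terms are fully present:

$$sim(q_{and}, d) = 1 - \sqrt{\frac{(1 - w_1)^2 + (1 - w_2)^2}{2}}$$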
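A short Python sketch of the two-term case may help make the geometry concrete. It assumes the usual Euclidean form of the model, in which the OR similarity is the normalized distance from the point (0, 0) and the AND similarity is one minus the normalized distance from the point (1, 1). The function names are illustrative only.

```python
import math


def sim_or_2d(w1, w2):
    """Similarity for the query (K_x OR K_y): distance from (0, 0), scaled to [0, 1]."""
    return math.sqrt((w1 ** 2 + w2 ** 2) / 2)


def sim_and_2d(w1, w2):
    """Similarity for the query (K_x AND K_y): 1 minus the scaled distance from (1, 1)."""
    return 1 - math.sqrt(((1 - w1) ** 2 + (1 - w2) ** 2) / 2)


# A document that contains only one of the two terms, with full weight.
partial_or = sim_or_2d(1.0, 0.0)
partial_and = sim_and_2d(1.0, 0.0)
```

A document containing only one of the two terms scores about 0.71 for the OR query and about 0.29 for the AND query. A strict Boolean model would assign 1 and 0 respectively.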
Generalizing the idea and P-norms
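We can generalize the previous 2D extended Boolean model example to a higher, t-dimensional space using Euclidean distances.
This can be done using P-norms, which extend the notion of distance to include p-distances, where $1 \le p \le \infty$ is a new parameter.
- A generalized conjunctive query is given by:

$$q_{and} = k_1 \land^p k_2 \land^p \cdots \land^p k_t$$

- The similarity of $q_{and}$ and $d_j$ can be defined as:

$$sim(q_{and}, d_j) = 1 - \sqrt[p]{\frac{(1 - w_1)^p + (1 - w_2)^p + \cdots + (1 - w_t)^p}{t}}$$

- A generalized disjunctive query is given by:

$$q_{or} = k_1 \lor^p k_2 \lor^p \cdots \lor^p k_t$$

- The similarity of $q_{or}$ and $d_j$ can be defined as:

$$sim(q_{or}, d_j) = \sqrt[p]{\frac{w_1^p + w_2^p + \cdots + w_t^p}{t}}$$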
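The p-norm similarities can be sketched as below, assuming the standard p-norm form of the model with term weights in [0, 1] and a finite p of at least 1. The function names `sim_or` and `sim_and` are illustrative only.

```python
def sim_or(weights, p=2.0):
    """p-norm similarity for a disjunctive (OR) query over t term weights."""
    t = len(weights)
    return (sum(w ** p for w in weights) / t) ** (1 / p)


def sim_and(weights, p=2.0):
    """p-norm similarity for a conjunctive (AND) query over t term weights."""
    t = len(weights)
    return 1 - (sum((1 - w) ** p for w in weights) / t) ** (1 / p)


weights = [0.2, 0.9, 0.4]
mean_like_or = sim_or(weights, p=1)     # arithmetic mean of the weights
mean_like_and = sim_and(weights, p=1)   # the same mean: AND and OR coincide
strict_or = sim_or(weights, p=200)      # approaches max(weights)
strict_and = sim_and(weights, p=200)    # approaches min(weights)
```

The parameter p controls how strictly the Boolean operators are interpreted. With p = 1 both operators reduce to the average of the term weights, as in a simple vector-style ranking. As p grows, OR approaches the maximum weight and AND approaches the minimum, which recovers Boolean behaviour for binary weights.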
Examples
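Consider the query $q = (K_1 \land K_2) \lor K_3$. With $p = 2$, the similarity between query $q$ and document $d$ can be computed by first evaluating the conjunction and then using its result as one operand of the disjunction:

$$sim(q, d) = \sqrt{\frac{\left(1 - \sqrt{\frac{(1 - w_1)^2 + (1 - w_2)^2}{2}}\right)^2 + w_3^2}{2}}$$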
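Nested queries such as this one can be evaluated recursively, from the innermost operator outwards. The sketch below assumes a hypothetical query representation chosen only for illustration: a term is a string, and a compound query is a tuple whose first element is "and" or "or". Terms missing from the document are assumed to have weight 0.

```python
def similarity(query, weights, p=2.0):
    """Evaluate a nested extended Boolean query against one document.

    query   -- a term name (str) or a tuple (op, operand, operand, ...)
    weights -- dict mapping term names to weights in [0, 1]
    """
    if isinstance(query, str):
        return weights.get(query, 0.0)  # absent terms count as weight 0

    op, *operands = query
    scores = [similarity(operand, weights, p) for operand in operands]
    t = len(scores)
    if op == "or":
        return (sum(s ** p for s in scores) / t) ** (1 / p)
    if op == "and":
        return 1 - (sum((1 - s) ** p for s in scores) / t) ** (1 / p)
    raise ValueError(f"unknown operator: {op!r}")


# The query (K1 AND K2) OR K3 for a document that fully matches K1 and K2 only.
query = ("or", ("and", "k1", "k2"), "k3")
score = similarity(query, {"k1": 1.0, "k2": 1.0, "k3": 0.0})
```

Here the conjunction evaluates to 1 and the disjunction of 1 and 0 gives roughly 0.71, so the document is ranked as a strong but not perfect match.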
Improvements over the Standard Boolean Model
Lee and Fox compared the Standard and Extended Boolean models with three test collections, CISI, CACM and INSPEC.
Using P-norms they obtained an average precision improvement of 79%, 106% and 210% over the Standard model, for the CISI, CACM and INSPEC collections, respectively.
The P-norm model is computationally expensive because of the number of exponentiation operations that it requires, but it achieves much better results than the Standard model and even fuzzy retrieval techniques. The Standard Boolean model is still the most efficient.
Further reading