Doug Cutting
Encyclopedia
Douglass Read Cutting is an advocate and creator of open-source search technology. He originated Lucene
Lucene
Apache Lucene is a free/open source information retrieval software library, originally created in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License....

 and, with Mike Cafarella, Nutch
Nutch
Nutch is an effort to build an open source web search engine based on Lucene Java for the search and index component.- Features :Nutch is coded entirely in the Java programming language, but data is written in language-independent formats...

, both open-source search technology projects which are now managed through the Apache Software Foundation
Apache Software Foundation
The Apache Software Foundation is a non-profit corporation to support Apache software projects, including the Apache HTTP Server. The ASF was formed from the Apache Group and incorporated in Delaware, U.S., in June 1999.The Apache Software Foundation is a decentralized community of developers...

. He holds a bachelor's degree from Stanford University. Prior to developing Lucene, Doug held search technology positions at Excite
Excite
Excite is a collection of Internet sites and services owned by IAC Search & Media, which is a subsidiary of InterActive Corporation . Launched in 1994, it is an online service offering a variety of content, including an Internet portal, a search engine, a web-based email, instant messaging, stock...

, Apple Inc. and Xerox PARC
Xerox PARC
PARC , formerly Xerox PARC, is a research and co-development company in Palo Alto, California, with a distinguished reputation for its contributions to information technology and hardware systems....

. Lucene
Lucene
Apache Lucene is a free/open source information retrieval software library, originally created in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License....

, a search indexer, and Nutch
Nutch
Nutch is an effort to build an open source web search engine based on Lucene Java for the search and index component.- Features :Nutch is coded entirely in the Java programming language, but data is written in language-independent formats...

, a spider or crawler, are the two key components of an open-source general search platform, which first crawls the Web for content, and then structures it into a searchable index. Cutting's leadership of these two projects extended the concepts and capabilities of general open-source software projects such as Linux
Linux
Linux is a Unix-like computer operating system assembled under the model of free and open source software development and distribution. The defining component of any Linux system is the Linux kernel, an operating system kernel first released October 5, 1991 by Linus Torvalds...

 and MySQL
MySQL
MySQL officially, but also commonly "My Sequel") is a relational database management system that runs as a server providing multi-user access to a number of databases. It is named after developer Michael Widenius' daughter, My...

 into the important vertical domain of search. While it is difficult to track the total number of installations of these platforms, public announcements of the use of Lucene
Lucene
Apache Lucene is a free/open source information retrieval software library, originally created in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License....

 and its direct descendant Solr
Solr
Solr is an open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document handling...

 by various venture-backed startups indicate a significant level of adoption. Perhaps the most significant deployment of Lucene
Lucene
Apache Lucene is a free/open source information retrieval software library, originally created in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License....

 is Wikipedia
Wikipedia
Wikipedia is a free, web-based, collaborative, multilingual encyclopedia project supported by the non-profit Wikimedia Foundation. Its 20 million articles have been written collaboratively by volunteers around the world. Almost all of its articles can be edited by anyone with access to the site,...

, where it powers search for the entire site.

In December 2004, Google
Google
Google Inc. is an American multinational public corporation invested in Internet search, cloud computing, and advertising technologies. Google hosts and develops a number of Internet-based services and products, and generates profit primarily from advertising through its AdWords program...

 Labs published a paper on the MapReduce
MapReduce
MapReduce is a software framework introduced by Google in 2004 to support distributed computing on large data sets on clusters of computers. Parts of the framework are patented in some countries....

 algorithm, which allows very large scale computations to be trivially parallelized across large clusters of servers. Cutting, realizing the importance of this paper to extending Lucene
Lucene
Apache Lucene is a free/open source information retrieval software library, originally created in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License....

 into the realm of extremely large (web-scale) search problems, created the open-source Hadoop
Hadoop
Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data...

 framework
Software framework
In computer programming, a software framework is an abstraction in which software providing generic functionality can be selectively changed by user code, thus providing application specific software...

 that allows applications based on the MapReduce
MapReduce
MapReduce is a software framework introduced by Google in 2004 to support distributed computing on large data sets on clusters of computers. Parts of the framework are patented in some countries....

 paradigm to be run on large clusters of commodity hardware. Cutting was an employee of Yahoo!
Yahoo!
Yahoo! Inc. is an American multinational internet corporation headquartered in Sunnyvale, California, United States. The company is perhaps best known for its web portal, search engine , Yahoo! Directory, Yahoo! Mail, Yahoo! News, Yahoo! Groups, Yahoo! Answers, advertising, online mapping ,...

, where he led the Hadoop
Hadoop
Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data...

 project full-time; he has since moved on to Cloudera
Cloudera
Cloudera Inc. is a Palo Alto-based enterprise software company which provides Apache Hadoop-based software and services. It contributes to Hadoop and related Apache projects and provides a distribution for Hadoop for the enterprise. Cloudera has two products: and...

.

In July 2009, Doug Cutting was elected to the board of directors of the Apache Software Foundation, and in September 2010, he was elected its chairman.

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK