CiteSeer



 
 








CiteSeer is a public search engine and digital library for scientific and academic papers. It was created by researchers Steve Lawrence, Kurt Bollacker and Lee Giles while they were at the NEC Research Institute (now NEC Labs), Princeton, New Jersey, USA. CiteSeer's goal was to actively crawl and harvest academic and scientific documents on the web and use autonomous citation indexing to permit querying by citation or by document, ranking them by citation impact. It is hosted on the World Wide Web at the College of Information Sciences and Technology, The Pennsylvania State University, and has over 700,000 documents, primarily in the fields of computer and information science and engineering.
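
The idea of a citation index, and of ranking documents by how often they are cited, can be illustrated with a minimal Python sketch. It is not CiteSeer's implementation: the document identifiers and reference lists below are hypothetical sample data, and a raw citation count stands in for whatever measure of citation impact the real system uses.

```python
from collections import defaultdict


def build_citation_index(references):
    """Invert {citing_doc: [cited_doc, ...]} into {cited_doc: {citing_doc, ...}}."""
    cited_by = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            if cited != citing:  # ignore self-references in this toy model
                cited_by[cited].add(citing)
    return cited_by


def rank_by_citations(cited_by):
    """Return (doc, citation_count) pairs, most cited first; ties broken by name."""
    counts = {doc: len(citers) for doc, citers in cited_by.items()}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data: each document ID maps to the documents it cites.
references = {
    "paper_a": ["paper_c", "paper_d"],
    "paper_b": ["paper_c"],
    "paper_c": ["paper_d"],
    "paper_e": ["paper_c", "paper_d", "paper_a"],
}

index = build_citation_index(references)
ranking = rank_by_citations(index)

for doc, count in ranking:
    print(doc, count, sorted(index[doc]))
```

Querying "by citation" then amounts to looking up a cited document in the index to find the later documents that cite it, while querying "by document" starts from the citing side.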

CiteSeer freely provides Open Archives Initiative metadata of all indexed documents and links indexed documents when possible to other sources of metadata such as DBLP and the ACM portal.
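
As a rough illustration of what harvesting Open Archives Initiative metadata involves, the following Python sketch builds an OAI-PMH `ListRecords` request URL and parses Dublin Core fields out of a response. The endpoint `http://example.org/oai` and the sample record are placeholders invented for the example, not CiteSeer's actual address or data; only the verb, the `oai_dc` metadata prefix and the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Extract identifier, title and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": record.findtext(".//dc:title", default="", namespaces=NS),
            "creators": [c.text for c in record.iterfind(".//dc:creator", NS)],
        })
    return records


# Hypothetical response body; a real harvester would fetch it over HTTP.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Paper</dc:title>
          <dc:creator>A. Author</dc:creator>
          <dc:creator>B. Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://example.org/oai")  # placeholder endpoint
records = parse_records(SAMPLE_RESPONSE)
print(url)
print(records)
```

A complete harvester would also follow the protocol's resumption tokens to page through large result sets; that step is omitted here.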

CiteSeer's goal was to improve the dissemination and access of academic and scientific literature. As a non-profit service that can be freely used by anyone, it has been considered as part of the open access movement that is attempting to change academic and scientific publishing to allow greater access to scientific literature.

The name can be read in at least two ways. As a pun on 'sightseer', a tourist who looks at the sights, a 'cite seer' would be a researcher who looks at cited papers. Alternatively, a 'seer' is a prophet, so a 'cite seer' is a prophet of citations.

CiteSeer has not been comprehensively updated since 2005 due to limitations in its architectural design. It has a representative sampling of research documents in computer and information science but is limited in coverage because it only has access to papers that are publicly available, usually at an author's homepage.

A new version and design of CiteSeer can be found at the Next Generation CiteSeer, CiteSeerx, website. It's important to note that CiteSeer-like engines and archives usually only harvest documents from publicly available websites and do not crawl publisher websites. As such, authors whose documents are freely available are more likely to be represented in the index.

Compared to DBLP

In a comparison of DBLP references with those in CiteSeer, CiteSeer will always be found lacking, since DBLP is a manually maintained bibliography. As an example, consider the references in DBLP for well known authors such as Alex Pentland (MIT) or Ramesh Jain (UCI) (DBLP listings for Alex Pentland - http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/p/Pentland:Alex.html or Ramesh Jain - http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/j/Jain:Ramesh.html). DBLP shows a steady number of publications (~9) each year through 2007, while CiteSeer has only one of their publications after 2000. DBLP, however, holds none of the actual publications; it only links to them on publisher websites.

Recent developments


Other CiteSeer Engines

The CiteSeer model was extended to cover academic documents in business with SmealSearch and in e-business with eBizSearch. However, these were not maintained by their sponsors. An older version of both of these can be found at BizSeer.IST. For enhanced access and performance, similar versions of CiteSeer were supported at universities such as the Massachusetts Institute of Technology, University of Zürich and the National University of Singapore. However, these versions of CiteSeer proved difficult to maintain.

Versions of CiteSeer have been or are available at the following links:


Other Seer-like search and repository systems have been built for chemistry, ChemXSeer, and for archaeology, ArchSeer. Another has been built for robots.txt file search, BotSeer. All of these are built on the open source indexer Lucene.
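
For readers unfamiliar with the robots.txt files that BotSeer searches, the sketch below shows what such a file expresses, using the `urllib.robotparser` module from Python's standard library. The rules, the bot name `examplebot` and the URLs are made up for illustration; nothing here reflects how BotSeer itself is built.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everyone is kept out of /private/,
# and the crawler named "examplebot" is excluded from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: examplebot
Disallow: /
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent, url in [
    ("otherbot", "http://example.org/papers/a.pdf"),
    ("otherbot", "http://example.org/private/notes.txt"),
    ("examplebot", "http://example.org/papers/a.pdf"),
]:
    print(agent, url, allowed(ROBOTS_TXT, agent, url))
```

A crawler that harvests papers from author homepages would be expected to run a check of this kind before fetching each document.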

Next Generation CiteSeer (CiteSeerx)

The Next Generation CiteSeer project, CiteSeerx, funded by the National Science Foundation and Microsoft Research, enhances CiteSeer both as a search engine and as a digital library. As an example, it extends CiteSeer's notion of "contribution" to acknowledgments in addition to citations, which would make it the first automatically generated acknowledgment index. CiteSeerx is designed differently from CiteSeer, with new algorithms for entity extraction and a modular, expandable, robust, scalable architecture based on open source tools such as Lucene and many Apache projects. As such, CiteSeerx will promote the creation of other Seer-like systems.

The Next Generation CiteSeer, CiteSeerx, is now available in beta with over one million documents indexed and constantly growing.

See also

  • Citation index
  • CiteULike
  • The Collection of Computer Science Bibliographies
  • DBLP (Digital Bibliography & Library Project)
  • getCITED
  • Google Scholar
  • Institute for Scientific Information's Web of Science
  • Libra (Academic Search)
  • List of academic databases and search engines
  • Scirus
  • Scopus
  • SmealSearch


External links

  • (includes among other collections also CiteSeer and DBLP)
  • by Steve Lawrence, C. Lee Giles and Kurt Bollacker