DBpedia Spotlight
DBpedia Spotlight is a tool for annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through DBpedia. DBpedia Spotlight performs named entity extraction, including entity detection and name resolution (also known as disambiguation). It can also be used as a building block for named entity recognition, among other information extraction tasks.
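
The two steps named above, entity detection and disambiguation, can be illustrated with a toy sketch. This is not DBpedia Spotlight's actual algorithm: the candidate table, its context words, and the annotate function are hypothetical and exist only to show how a surface form found in text might be resolved to one of several DBpedia URIs by looking at the surrounding words.

```python
import re

# Hypothetical mini knowledge base: surface form -> candidate DBpedia URIs,
# each paired with a few hand-picked context words (an assumption for
# illustration; a real system would derive such data from Wikipedia).
CANDIDATES = {
    "berlin": [
        ("http://dbpedia.org/resource/Berlin", {"germany", "city", "capital"}),
        ("http://dbpedia.org/resource/Berlin,_New_Hampshire", {"hampshire", "usa"}),
    ],
    "java": [
        ("http://dbpedia.org/resource/Java_(programming_language)",
         {"code", "programming", "api"}),
        ("http://dbpedia.org/resource/Java", {"island", "indonesia"}),
    ],
}


def annotate(text):
    """Return (surface form, character offset, URI) for each known mention."""
    # The context is simply the set of lowercased words in the whole text.
    context = set(re.findall(r"\w+", text.lower()))
    annotations = []
    for match in re.finditer(r"\w+", text):
        surface = match.group(0)
        # Entity detection: is this word a known surface form?
        candidates = CANDIDATES.get(surface.lower())
        if not candidates:
            continue
        # Disambiguation: keep the candidate whose context words overlap
        # most with the text (ties go to the first candidate listed).
        uri, _ = max(candidates, key=lambda c: len(c[1] & context))
        annotations.append((surface, match.start(), uri))
    return annotations


if __name__ == "__main__":
    for item in annotate("The Java API is used by programming teams in Berlin, Germany."):
        print(item)
```

In this sketch, "Java" resolves to the programming language because "API" and "programming" appear nearby, and "Berlin" resolves to the German capital because "Germany" does.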

The project was started in June 2010 at the Free University of Berlin by researchers from the Web Based Systems Group. The objective of DBpedia Spotlight is to provide a flexible solution that can be customized for many use cases. Instead of focusing on a few entity types, the project strives to support the annotation of all 3.5M entities and concepts from more than 320 classes in DBpedia.

DBpedia Spotlight is publicly available as a Web service for testing purposes, or as a Java/Scala API licensed under the Apache License. The DBpedia Spotlight distribution also includes a jQuery plugin that allows developers to annotate pages anywhere on the Web by adding one line to their page. Clients are also available in Java and PHP.
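
As a rough sketch of how a client might talk to the Web service, the Python snippet below builds an annotation request URL and extracts annotations from a JSON response. The endpoint path (/rest/annotate), the text and confidence parameters, and the JSON field names (Resources, @URI, @surfaceForm, @offset) are assumptions here and should be checked against the project documentation. The sample response is hand-written for illustration, not real service output, and no network call is made.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint; consult the project documentation for the current address.
ANNOTATE_URL = "http://spotlight.dbpedia.org/rest/annotate"


def build_annotate_url(text, confidence=0.5, base=ANNOTATE_URL):
    """Build a GET URL asking the service to annotate `text`."""
    query = urlencode({"text": text, "confidence": confidence})
    return base + "?" + query


def parse_resources(response_body):
    """Extract (surface form, offset, URI) triples from a JSON response."""
    data = json.loads(response_body)
    return [
        (r["@surfaceForm"], int(r["@offset"]), r["@URI"])
        for r in data.get("Resources", [])
    ]


# Hand-written sample in the assumed response shape (not real output).
SAMPLE_RESPONSE = json.dumps({
    "@text": "Berlin is in Germany",
    "Resources": [
        {"@URI": "http://dbpedia.org/resource/Berlin",
         "@surfaceForm": "Berlin", "@offset": "0"},
        {"@URI": "http://dbpedia.org/resource/Germany",
         "@surfaceForm": "Germany", "@offset": "13"},
    ],
})

if __name__ == "__main__":
    print(build_annotate_url("Berlin is in Germany"))
    for annotation in parse_resources(SAMPLE_RESPONSE):
        print(annotation)
```

To query a live service, the URL from build_annotate_url could be fetched with urllib.request, sending an Accept: application/json header, and the response body passed to parse_resources.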

The tool currently handles only English, although developers have reported (http://answers.semanticweb.com/questions/4075/dbpedia-spotlight-in-other-languages) that work on the internationalization of DBpedia Spotlight has started.

The documentation of the tool is divided into a Users Manual, Technical Documentation and Installation. All pages can be accessed from http://spotlight.dbpedia.org.

Known Uses

Although DBpedia Spotlight is still a relatively young project, Web developers and researchers across the world have started adopting the tool.
  • EuropeanaConnect Media Annotation Prototype
  • Twitter Swarm NLP Chrome Extension
  • Simple API Test Tool
  • Topica: University of Sheffield's entry to AIMashup2011
  • RDFaCE, an RDFa Content Editor based on TinyMCE
  • NERD, a framework for comparing Named Entity Recognition and Disambiguation tools

  • Simon, R., Jung, J., Haslhofer, B. The YUMA Media Annotation Framework. Research and Advanced Technology for Digital Libraries. http://dx.doi.org/10.1007/978-3-642-24469-8_43
  • Fabian Abel, Ilknur Celik, Geert-Jan Houben and Patrick Siehndel. Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter. In the 10th International Semantic Web Conference (ISWC2011), Bonn, Germany. http://fabianabel.de/papers/2011-wis-twitter-faceted-search-iswc.pdf
  • Kristian Slabbekoorn, Laura Hollink, Geert-Jan Houben. Domain-aware Matching of Events to DBpedia. Accepted for publication at the DeRiVE workshop on Detection, Representation, and Exploitation of Events in the Semantic Web (ISWC2011), Bonn, Germany. http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/DeRiVE/derive2011_submission_15.pdf

External links

  • Project Website: http://spotlight.dbpedia.org
  • Source code: http://sourceforge.net/projects/dbp-spotlight/
  • Zaino, Jennifer. The Spotlight’s on DBpedia. Article on SemanticWeb.com. February 21, 2011.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 