Cyc
Cyc is an artificial intelligence project that attempts to assemble a comprehensive ontology and knowledge base of everyday common sense knowledge, with the goal of enabling AI applications to perform human-like reasoning.
The project was started in 1984 by Douglas Lenat at MCC (the Microelectronics and Computer Technology Corporation) and is developed by the company Cycorp.
Parts of the project are released as OpenCyc, which provides an API, an RDF endpoint, and a data dump under an open-source license.

Overview

The project was started in 1984 as part of the Microelectronics and Computer Technology Corporation (MCC). The objective was to codify, in machine-usable form, the millions of pieces of knowledge that make up human common sense. CycL, the project's representation language, was a proprietary knowledge representation scheme that used first-order relationships. In 1986, Doug Lenat estimated that completing Cyc would require 250,000 rules and 350 man-years of effort.
The Cyc project was spun off into Cycorp, Inc., in Austin, Texas, in 1994.

The name "Cyc" (from "encyclopedia", pronounced [saɪk], like "syke") is a registered trademark owned by Cycorp. The full knowledge base is proprietary, but a smaller version of it, intended to establish a common vocabulary for automatic reasoning, was released as OpenCyc under an open-source (Apache) license. More recently, Cyc has been made available to AI researchers under a research-purposes license as ResearchCyc.

Typical pieces of knowledge represented in the database are "Every tree is a plant" and "Plants die eventually". When asked whether trees die, the inference engine can draw the obvious conclusion and answer the question correctly. The knowledge base (KB) contains over one million human-defined assertions, rules, and common-sense ideas. These are formulated in the language CycL, which is based on predicate calculus and has a syntax similar to that of the Lisp programming language.
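The tree example can be pictured as property inheritance along subcollection links. The Python sketch below is a toy, not Cyc's inference engine: the dictionaries `GENLS` and `PROPERTIES`, the property name `diesEventually`, and both helper functions are hypothetical, chosen only to mirror the two facts quoted above.

```python
# Toy illustration only: GENLS, PROPERTIES and "diesEventually" are hypothetical
# names, not Cyc constants or an actual Cyc API.

# Subcollection links: "Every tree is a plant".
GENLS = {"Tree-ThePlant": {"Plant"}}

# Properties asserted directly of a collection: "Plants die eventually".
PROPERTIES = {"Plant": {"diesEventually"}}


def supercollections(coll):
    """Return coll plus every collection reachable from it via GENLS links."""
    seen, stack = set(), [coll]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(GENLS.get(current, ()))
    return seen


def has_property(coll, prop):
    """True if prop is asserted of coll or of any of its supercollections."""
    return any(prop in PROPERTIES.get(c, ()) for c in supercollections(coll))


print(has_property("Tree-ThePlant", "diesEventually"))  # True
```

Because the lookup walks the subcollection links transitively, a longer chain of links would still let the query reach a property asserted at the top of the chain.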

Much of the current work on the Cyc project continues to be knowledge engineering: representing facts about the world by hand and implementing efficient inference mechanisms over that knowledge. Increasingly, however, work at Cycorp involves giving the Cyc system the ability to communicate with end users in natural language and to assist with the knowledge formation process via machine learning.

Like many companies, Cycorp aims to use Cyc's natural language understanding tools to parse the entire internet and extract structured data.

In 2008, Cyc resources were mapped to many Wikipedia articles, potentially making it easier to connect Cyc with other open datasets such as DBpedia and Freebase.

Knowledge base

The concept names in Cyc are known as constants. Constants start with an optional "#$" and are case-sensitive. There are constants for:
  • Individual items known as individuals, such as #$BillClinton or #$France.
  • Collections, such as #$Tree-ThePlant (containing all trees) or #$EquivalenceRelation (containing all equivalence relations). A member of a collection is called an instance of that collection.
  • Truth functions, which can be applied to one or more other concepts and return either true or false. For example, #$siblings is the sibling relationship, true if the two arguments are siblings. By convention, truth function constants start with a lower-case letter. Truth functions can be divided into logical connectives (such as #$and, #$or, #$not, #$implies), quantifiers (#$forAll, #$thereExists, etc.) and predicates.
  • Functions, which produce new terms from given ones. For example, #$FruitFn, when provided with an argument describing a type (or collection) of plants, will return the collection of its fruits. By convention, function constants start with an upper-case letter and end with the string "Fn".
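The naming conventions above are regular enough to check mechanically. The following Python sketch guesses a constant's kind from its name alone; `strip_prefix` and `guess_kind` are hypothetical helpers, and because these are only conventions (individuals and collections, for instance, are not distinguished by name), the result is a heuristic rather than a lookup in the knowledge base.

```python
def strip_prefix(name):
    """Drop the optional "#$" prefix; the rest is case-sensitive and kept as-is."""
    return name[2:] if name.startswith("#$") else name


def guess_kind(name):
    """Guess a constant's kind from the naming conventions alone (a heuristic)."""
    base = strip_prefix(name)
    if base[:1].islower():
        # Truth functions (connectives, quantifiers, predicates) start lower-case.
        return "truth function"
    if base[:1].isupper() and base.endswith("Fn"):
        # Functions start upper-case and end with "Fn".
        return "function"
    # Individuals and collections are not distinguished by their names.
    return "individual or collection"


for constant in ["#$siblings", "#$FruitFn", "#$France", "#$Tree-ThePlant", "implies"]:
    print(constant, "->", guess_kind(constant))
```

Note that "implies" and "#$implies" are classified identically, since the prefix is optional.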


The most important predicates are #$isa and #$genls. The first states that an item is an instance of some collection; the second, that one collection is a subcollection of another. Facts about concepts are asserted using CycL sentences, in which the predicate is written before its arguments and the whole sentence is enclosed in parentheses:
(#$isa #$BillClinton #$UnitedStatesPresident)
"Bill Clinton belongs to the collection of U.S. presidents."
(#$genls #$Tree-ThePlant #$Plant)
"All trees are plants."
(#$capitalCity #$France #$Paris)
"Paris is the capital of France."
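Because CycL sentences use Lisp-style prefix notation, a few lines of code suffice to read them into a data structure. The Python sketch below is a minimal, hypothetical reader, not Cycorp's parser: it handles only parentheses and whitespace-separated atoms (no strings, numbers, or comments), and it drops the optional "#$" prefix so that each sentence becomes a nested tuple with the predicate first.

```python
import re

# Tokens are "(", ")" or runs of characters that are neither whitespace nor parentheses.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse(text):
    """Parse one CycL-style sentence into nested tuples (predicate first)."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok != "(":
            # An atom: strip the optional "#$" prefix; variables such as ?X stay as-is.
            return tok[2:] if tok.startswith("#$") else tok
        items = []
        while True:
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            if tokens[pos] == ")":
                pos += 1
                return tuple(items)
            items.append(read())

    sentence = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after sentence")
    return sentence


print(parse("(#$isa #$BillClinton #$UnitedStatesPresident)"))
# ('isa', 'BillClinton', 'UnitedStatesPresident')
print(parse("(#$genls #$Tree-ThePlant #$Plant)"))
# ('genls', 'Tree-ThePlant', 'Plant')
```

Nested sentences come back as nested tuples, so the same reader also handles sentences built from connectives such as #$and and #$implies.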

Sentences can also contain variables, which are strings starting with "?". Such sentences are called "rules". One important rule asserted about the #$isa predicate reads:
(#$implies
  (#$and
    (#$isa ?OBJ ?SUBSET)
    (#$genls ?SUBSET ?SUPERSET))
  (#$isa ?OBJ ?SUPERSET))
with the interpretation "if OBJ is an instance of the collection SUBSET and SUBSET is a subcollection of SUPERSET, then OBJ is an instance of the collection SUPERSET". Another typical example is
(#$relationAllExists #$biologicalMother #$ChordataPhylum #$FemaleAnimal)
which means that for every instance of the collection #$ChordataPhylum (i.e., for every chordate), there exists a female animal (instance of #$FemaleAnimal) which is its mother (described by the predicate #$biologicalMother).
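One way to picture what the #$isa rule does is naive forward chaining: apply it to the known facts, add whatever it concludes, and repeat until nothing new appears. The Python sketch below does this for that single rule, with facts stored as pairs. It is an illustration, not how Cyc's inference engine is implemented; the individual `Oak1` and the collection `Organism` are hypothetical, added only to show that repeated application follows chains of #$genls links.

```python
def forward_chain(isa, genls):
    """Close a set of (obj, collection) isa facts under the rule
    (isa ?OBJ ?SUBSET) and (genls ?SUBSET ?SUPERSET) => (isa ?OBJ ?SUPERSET)."""
    derived = set(isa)
    while True:
        # Every conclusion the rule yields from the current facts, minus known ones.
        new = {
            (obj, superset)
            for (obj, subset) in derived
            for (sub, superset) in genls
            if subset == sub
        } - derived
        if not new:
            return derived  # fixpoint: nothing new can be derived
        derived |= new


# Oak1 and Organism are hypothetical; Tree-ThePlant and Plant come from the text.
isa_facts = {("Oak1", "Tree-ThePlant")}
genls_facts = {("Tree-ThePlant", "Plant"), ("Plant", "Organism")}

for fact in sorted(forward_chain(isa_facts, genls_facts)):
    print(fact)
```

Each pass only combines existing facts, so the loop stops once the finite set of possible (object, collection) pairs is exhausted; a practical implementation would typically index facts instead of rescanning them on every pass.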

The knowledge base is divided into microtheories (Mt), collections of concepts and facts typically pertaining to one particular realm of knowledge. Unlike the knowledge base as a whole, each microtheory is required to be free from contradictions. Each microtheory is named by an ordinary constant; by convention, microtheory constants contain the string "Mt". An example is #$MathMt, the microtheory containing mathematical knowledge. Microtheories can inherit from each other and are organized in a hierarchy: one specialization of #$MathMt is #$GeometryGMt, the microtheory about geometry.
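The inheritance between microtheories can be sketched as a visibility rule: a query posed in a microtheory sees its own assertions plus those of every microtheory it specializes, directly or indirectly. The Python below is a hypothetical model of that idea, not Cyc's actual mechanism; `GENL_MT`, `ASSERTIONS` and the placeholder fact names are invented for illustration, while #$MathMt and #$GeometryGMt come from the text.

```python
# Hypothetical model: each microtheory maps to the microtheories it specializes.
GENL_MT = {"GeometryGMt": {"MathMt"}, "MathMt": set()}

# Placeholder assertions stored directly in each microtheory.
ASSERTIONS = {
    "MathMt": {"math-fact-1"},
    "GeometryGMt": {"geometry-fact-1"},
}


def visible_mts(mt):
    """Return mt plus every microtheory it inherits from, transitively."""
    seen, stack = set(), [mt]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(GENL_MT.get(current, ()))
    return seen


def visible_assertions(mt):
    """Union of the assertions stored in every microtheory visible from mt."""
    return set().union(*(ASSERTIONS.get(m, set()) for m in visible_mts(mt)))


print(sorted(visible_assertions("GeometryGMt")))  # ['geometry-fact-1', 'math-fact-1']
print(sorted(visible_assertions("MathMt")))       # ['math-fact-1']
```

In this model inheritance runs one way: the geometry microtheory sees the general mathematical facts, but #$MathMt does not see facts asserted only in #$GeometryGMt.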

Inference engine

An inference engine is a computer program that tries to derive answers from a knowledge base.
The Cyc inference engine performs general logical deduction (including modus ponens, modus tollens, universal quantification and existential quantification).
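Two of the named inference patterns can be illustrated on ground (variable-free) implications, that is, instances of a quantified rule such as "every tree is a plant" applied to one particular object. The Python sketch below represents sentences as tuples and negation as a `("not", sentence)` wrapper; these representations and the individuals `Oak1` and `Rock1` are hypothetical, and the sketch makes no claim about how Cyc's engine performs these steps.

```python
def modus_ponens(rule, fact):
    """From (antecedent => consequent) and antecedent, derive consequent."""
    antecedent, consequent = rule
    return consequent if fact == antecedent else None


def modus_tollens(rule, fact):
    """From (antecedent => consequent) and (not consequent), derive (not antecedent)."""
    antecedent, consequent = rule
    return ("not", antecedent) if fact == ("not", consequent) else None


# Ground instances of "every tree is a plant" (Oak1 and Rock1 are hypothetical).
oak_rule = (("isa", "Oak1", "Tree-ThePlant"), ("isa", "Oak1", "Plant"))
rock_rule = (("isa", "Rock1", "Tree-ThePlant"), ("isa", "Rock1", "Plant"))

print(modus_ponens(oak_rule, ("isa", "Oak1", "Tree-ThePlant")))
# ('isa', 'Oak1', 'Plant')
print(modus_tollens(rock_rule, ("not", ("isa", "Rock1", "Plant"))))
# ('not', ('isa', 'Rock1', 'Tree-ThePlant'))
```

Both functions return None when the pattern does not apply; in particular, knowing only that Oak1 is a plant does not license concluding that it is a tree, since that would be the fallacy of affirming the consequent.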

OpenCyc

The latest version of OpenCyc, 2.0, was released in July 2009. OpenCyc 1.0 includes the entire Cyc ontology containing hundreds of thousands of terms, along with millions of assertions relating the terms to each other; however, these are mainly taxonomic assertions, not the complex rules available in Cyc. The knowledge base contains 47,000 concepts and 306,000 facts and can be browsed on the OpenCyc website.

The first version of OpenCyc was released in spring 2002 and contained only 6,000 concepts and 60,000 facts. The knowledge base is released under the Apache License. Cycorp has stated its intention to release OpenCyc under parallel, unrestricted licenses to meet the needs of its users. The CycL and SubL interpreter (the program that lets users browse and edit the database and draw inferences) is released free of charge, but only as a binary, without source code. It is available for Linux and Microsoft Windows. The open-source Texai project has released the RDF-compatible content extracted from OpenCyc.

ResearchCyc

In July 2006, Cycorp released the binaries of ResearchCyc 1.0, a version of Cyc aimed at the research community, at no charge. (ResearchCyc was in the beta stage of development throughout 2004; a beta version was released in February 2005.) In addition to the taxonomic information contained in OpenCyc, ResearchCyc includes significantly more semantic knowledge (i.e., additional facts) about the concepts in its knowledge base, as well as a large lexicon, English parsing and generation tools, and Java-based interfaces for knowledge editing and querying.

Terrorism Knowledge Base

The comprehensive Terrorism Knowledge Base is an application of Cyc, still in development, that aims ultimately to contain all relevant knowledge about "terrorist" groups, their members, leaders, ideology, founders, sponsors, affiliations, facilities, locations, finances, capabilities, intentions, behaviors, tactics, and full descriptions of specific terrorist events. The knowledge is stored as statements in mathematical logic, suitable for computer understanding and reasoning.

Cyclopedia

Cyclopedia, which is under development, superimposes Cyc keywords on pages taken from Wikipedia.

Cleveland Clinic Foundation

The Cleveland Clinic has used Cyc to develop a natural language query interface for biomedical information.
A query is parsed into a set of CycL (higher-order logic) fragments with open variables; various constraints (medical domain knowledge, common sense, discourse pragmatics, syntax) are then applied to determine how those fragments fit together into one semantically meaningful formal query.

Criticisms of the Cyc Project

The Cyc project has been described as "one of the most controversial endeavors of the artificial intelligence history", and it has inevitably garnered its share of criticism. Criticisms include:
  • The complexity of the system, arguably necessitated by its encyclopedic ambitions, and the consequent difficulty of adding to the system by hand
  • Scalability problems from widespread reification, especially as constants
  • Unsatisfactory treatment of the concept of substance and the related distinction between intrinsic and extrinsic properties
  • The lack of any meaningful benchmark or comparison for the efficiency of Cyc's inference engine
  • The current incompleteness of the system in both breadth and depth and the related difficulty in measuring its completeness
  • Limited documentation
  • The lack of up-to-date online training material, which makes it difficult for newcomers to learn the system
  • Many gaps in the ontology of ordinary objects, and an almost complete lack of relevant assertions describing such objects

See also

  • Categorical logic
  • Chinese room
  • DARPA Agent Markup Language
  • Mindpixel
  • Never-Ending Language Learning
  • Open Mind Common Sense
  • Semantic Web
  • UMBEL
  • SHRDLU
  • DBpedia
  • Freebase (database)
  • YAGO
  • Wolfram Alpha
  • True Knowledge
  • Fifth generation computer


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 