Backus–Naur form
Encyclopedia
In computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique
Metasyntax
A metasyntax describes the allowable structure and composition of phrases and sentences of a metalanguage, which is used to describe either a natural language or a computer programming language...

 for context-free grammar
Context-free grammar
In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....

s, often used to describe the syntax
Syntax
In linguistics, syntax is the study of the principles and rules for constructing phrases and sentences in natural languages....

 of languages used in computing, such as computer programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

s, document formats, instruction set
Instruction set
An instruction set, or instruction set architecture , is the part of the computer architecture related to programming, including the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O...

s and communication protocols.
It is applied wherever exact descriptions of languages are needed, for instance, in official language specifications, in manuals, and in textbooks on programming language theory.

Many extensions and variants of the original notation are used; some are exactly defined, including Extended Backus–Naur Form
Extended Backus–Naur form
In computer science, Extended Backus–Naur Form is a family of metasyntax notations used for expressing context-free grammars: that is, a formal way to describe computer programming languages and other formal languages. They are extensions of the basic Backus–Naur Form metasyntax notation.The...

(EBNF) and Augmented Backus–Naur Form
Augmented Backus–Naur form
In computer science, Augmented Backus–Naur Form is a metalanguage based on Backus–Naur Form , but consisting of its own syntax and derivation rules. The motive principle for ABNF is to describe a formal system of a language to be used as a bidirectional communications protocol...

(ABNF).

History

The idea of describing the structure of language with rewriting rules can be traced back to at least the work of Pāṇini (about the 4th century BC), who used it in his description of Sanskrit
Sanskrit
Sanskrit , is a historical Indo-Aryan language and the primary liturgical language of Hinduism, Jainism and Buddhism.Buddhism: besides Pali, see Buddhist Hybrid Sanskrit Today, it is listed as one of the 22 scheduled languages of India and is an official language of the state of Uttarakhand...

 word structure. American linguists
Linguistics
Linguistics is the scientific study of human language. Linguistics can be broadly broken into three categories or subfields of study: language form, language meaning, and language in context....

 such as Leonard Bloomfield
Leonard Bloomfield
Leonard Bloomfield was an American linguist who led the development of structural linguistics in the United States during the 1930s and the 1940s. His influential textbook Language, published in 1933, presented a comprehensive description of American structural linguistics...

 and Zellig Harris
Zellig Harris
Zellig Sabbettai Harris was a renowned American linguist, mathematical syntactician, and methodologist of science. Originally a Semiticist, he is best known for his work in structural linguistics and discourse analysis and for the discovery of transformational structure in language...

 took this idea a step further by attempting to formalize language and its study in terms of formal definitions and procedures (around 1920-1960).

Meanwhile, string rewriting rules as formal, abstract systems were introduced and studied by mathematicians such as Axel Thue
Semi-Thue system
In theoretical computer science and mathematical logic a string rewriting system , historically called a semi-Thue system, is a rewriting system over strings from a alphabet...

 (in 1914), Emil Post
Post canonical system
A Post canonical system, as created by Emil Post, is a string-manipulation system that starts with finitely-many strings and repeatedly transforms them by applying a finite set of specified rules of a certain form, thus generating a formal language...

 (1920s-1940s) and Alan Turing
Turing machine
A Turing machine is a theoretical device that manipulates symbols on a strip of tape according to a table of rules. Despite its simplicity, a Turing machine can be adapted to simulate the logic of any computer algorithm, and is particularly useful in explaining the functions of a CPU inside a...

 (1936). Noam Chomsky
Noam Chomsky
Avram Noam Chomsky is an American linguist, philosopher, cognitive scientist, and activist. He is an Institute Professor and Professor in the Department of Linguistics & Philosophy at MIT, where he has worked for over 50 years. Chomsky has been described as the "father of modern linguistics" and...

, teaching linguistics to students of information theory
Information theory
Information theory is a branch of applied mathematics and electrical engineering involving the quantification of information. Information theory was developed by Claude E. Shannon to find fundamental limits on signal processing operations such as compressing data and on reliably storing and...

 at MIT, combined linguistics and mathematics, by taking what is essentially Thue's formalism as the basis for the description of the syntax of natural language
Natural language
In the philosophy of language, a natural language is any language which arises in an unpremeditated fashion as the result of the innate facility for language possessed by the human intellect. A natural language is typically used for communication, and may be spoken, signed, or written...

; he also introduced a clear distinction between generative rules (those of context-free grammar
Context-free grammar
In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....

s) and transformation rules (1956).

John Backus, a programming language designer at IBM
IBM
International Business Machines Corporation or IBM is an American multinational technology and consulting corporation headquartered in Armonk, New York, United States. IBM manufactures and sells computer hardware and software, and it offers infrastructure, hosting and consulting services in areas...

, proposed "metalinguistic formulas"
to describe the syntax of the new programming language IAL, known today as ALGOL 58
ALGOL 58
ALGOL 58, originally known as IAL, is one of the family of ALGOL computer programming languages. It was an early compromise design soon superseded by ALGOL 60...

 (1959),
using the BNF notation.

Further development of ALGOL led to ALGOL 60
ALGOL 60
ALGOL 60 is a member of the ALGOL family of computer programming languages. It gave rise to many other programming languages, including BCPL, B, Pascal, Simula, C, and many others. ALGOL 58 introduced code blocks and the begin and end pairs for delimiting them...

; in its report (1963), Peter Naur
Peter Naur
Peter Naur is a Danish pioneer in computer science and Turing award winner. His last name is the N in the BNF notation , used in the description of the syntax for most programming languages...

 named Backus's notation Backus Normal Form, and simplified it to minimize the character set used.
However, Donald Knuth
Donald Knuth
Donald Ervin Knuth is a computer scientist and Professor Emeritus at Stanford University.He is the author of the seminal multi-volume work The Art of Computer Programming. Knuth has been called the "father" of the analysis of algorithms...

 argued that BNF should rather be read as Backus–Naur Form, as it is
"not a normal form
Normal form (term rewriting)
In abstract rewriting, a normal form is an element of the system which cannot be rewritten any further. Stated formally, for some reduction relation ⋅ → ⋅ over X a term t in X is a normal form if there does not exist a term t′ in X such that t → t′.Consider...

 in any sense",
unlike, for instance, Chomsky Normal Form
Chomsky normal form
In computer science, a context-free grammar is said to be in Chomsky normal form if all of its production rules are of the form:Every grammar in Chomsky normal form is context-free, and conversely, every context-free grammar can be transformed into an equivalent one which is in Chomsky normal form....

.

Introduction

A BNF specification is a set of derivation rules, written as

::= __expression__


where <symbol
Symbol
A symbol is something which represents an idea, a physical entity or a process but is distinct from it. The purpose of a symbol is to communicate meaning. For example, a red octagon may be a symbol for "STOP". On a map, a picture of a tent might represent a campsite. Numerals are symbols for...

> is a nonterminal, and the expression
Expression (mathematics)
In mathematics, an expression is a finite combination of symbols that is well-formed according to rules that depend on the context. Symbols can designate numbers , variables, operations, functions, and other mathematical symbols, as well as punctuation, symbols of grouping, and other syntactic...

 consists of one or more sequences of symbols; more sequences are separated by the vertical bar
Vertical bar
The vertical bar is a character with various uses in mathematics, where it can be used to represent absolute value, among others; in computing and programming and in general typography, as a divider not unlike the interpunct...

, '|', indicating a choice
Choice
Choice consists of the mental process of judging the merits of multiple options and selecting one of them. While a choice can be made between imagined options , often a choice is made between real options, and followed by the corresponding action...

, the whole being a possible substitution
Substitution
Substitution may refer to:- Sciences :* Substitution , a syntactic transformation on strings of symbols of a formal language* Substitution of variables* Substitution cipher, a method of encryption...

 for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are non-terminals and are always enclosed between the pair <>.

Example

As an example, consider this possible BNF for a U.S.
United States
The United States of America is a federal constitutional republic comprising fifty states and a federal district...

 postal address
Address (geography)
An address is a collection of information, presented in a mostly fixed format, used for describing the location of a building, apartment, or other structure or a plot of land, generally using political boundaries and street names as references, along with other identifiers such as house or...

:

::=

::=
|

::= | "."

::=

::= ","

::= "Sr." | "Jr." | | ""

This translates into English as:
  • A postal address consists of a name-part, followed by a street-address
    Street name
    A street name or odonym is an identifying name given to a street. The street name usually forms part of the address...

     part, followed by a zip-code
    ZIP Code
    ZIP codes are a system of postal codes used by the United States Postal Service since 1963. The term ZIP, an acronym for Zone Improvement Plan, is properly written in capital letters and was chosen to suggest that the mail travels more efficiently, and therefore more quickly, when senders use the...

     part.
  • A name-part consists of either: a personal-part followed by a last name
    Last Name
    "Last Name" is the title of a song composed by country singer Carrie Underwood, Hillary Lindsey and Luke Laird. It is the third single from Underwood's second studio album, Carnival Ride. It was released in the United States on April 7, 2008, by which point the song had already charted...

     followed by an optional suffix
    Suffix (name)
    A name suffix, in the Western English-language naming tradition, follows a person's full name and provides additional information about the person. Post-nominal letters indicate that the individual holds a position, educational degree, accreditation, office, or honor.- Academic :Academic suffixes...

     (Jr., Sr., or dynastic number) and end-of-line, or a personal part followed by a name part (this rule illustrates the use of recursion
    Recursion (computer science)
    Recursion in computer science is a method where the solution to a problem depends on solutions to smaller instances of the same problem. The approach can be applied to many types of problems, and is one of the central ideas of computer science....

     in BNFs, covering the case of people who use multiple first and middle names and/or initials).
  • A personal-part consists of either a first name or an initial
    Initial
    In a written or published work, an initial is a letter at the beginning of a work, a chapter, or a paragraph that is larger than the rest of the text. The word is derived from the Latin initialis, which means standing at the beginning...

     followed by a dot.
  • A street address consists of a house number, followed by a street name, followed by an optional apartment
    Apartment
    An apartment or flat is a self-contained housing unit that occupies only part of a building...

     specifier, followed by an end-of-line.
  • A zip-part consists of a town
    Town
    A town is a human settlement larger than a village but smaller than a city. The size a settlement must be in order to be called a "town" varies considerably in different parts of the world, so that, for example, many American "small towns" seem to British people to be no more than villages, while...

    -name, followed by a comma, followed by a state code, followed by a ZIP-code followed by an end-of-line.
  • A opt-suffix-part consists of a suffix, such as "Sr.", "Jr." or a roman-numeral
    Roman numerals
    The numeral system of ancient Rome, or Roman numerals, uses combinations of letters from the Latin alphabet to signify values. The numbers 1 to 10 can be expressed in Roman numerals as:...

    , or an empty string (i.e. nothing).


Note that many things (such as the format of a first-name, apartment specifier, ZIP-code, and Roman numeral) are left unspecified here. If necessary, they may be described using additional BNF rules.

Further examples

BNF's syntax itself may be represented with a BNF like the following:

::= |
::= "<" ">" "::="

::= " " | ""
::= | "|"
::= |
::= |
::= | "<" ">"
::= '"' '"' | "'" "'"

This assumes that no whitespace
Whitespace (computer science)
In computer science, whitespace is any single character or series of characters that represents horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visual mark, but typically does occupy an area on a page...

 is necessary for proper interpretation of the rule. represents the appropriate line-end
Newline
In computing, a newline, also known as a line break or end-of-line marker, is a special character or sequence of characters signifying the end of a line of text. The name comes from the fact that the next character after the newline will appear on a new line—that is, on the next line below the...

 specifier (in ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...

, carriage-return and/or line-feed, depending on the operating system
Operating system
An operating system is a set of programs that manage computer hardware resources and provide common services for application software. The operating system is the most important type of system software in a computer system...

). and are to be substituted with a declared rule's name/label or literal text, respectively.

In the U.S. postal address example above, the entire block-quote is a syntax. Each line or unbroken grouping of lines is a rule; for example one rule begins with " ::=". The other part of that rule (aside from a line-end) is an expression, which consists of two lists separated by a pipe "|". These two lists consists of some terms (three terms and two terms, respectively). Each term in this particular rule is a rule-name.

Variants

There are many variants and extensions of BNF, generally either for the sake of simplicity and succinctness, or to adapt it to a specific application. One common feature of many variants is the use of regular expression
Regular expression
In computing, a regular expression provides a concise and flexible means for "matching" strings of text, such as particular characters, words, or patterns of characters. Abbreviations for "regular expression" include "regex" and "regexp"...

 repetition operators such as * and +. The Extended Backus–Naur Form
Extended Backus–Naur form
In computer science, Extended Backus–Naur Form is a family of metasyntax notations used for expressing context-free grammars: that is, a formal way to describe computer programming languages and other formal languages. They are extensions of the basic Backus–Naur Form metasyntax notation.The...

 (EBNF) is a common one. In fact the example above is not the pure form invented for the ALGOL 60 report. The bracket notation "[ ]" was introduced a few years later in IBM's PL/I
PL/I
PL/I is a procedural, imperative computer programming language designed for scientific, engineering, business and systems programming applications...

 definition but is now universally recognised. ABNF
Augmented Backus–Naur form
In computer science, Augmented Backus–Naur Form is a metalanguage based on Backus–Naur Form , but consisting of its own syntax and derivation rules. The motive principle for ABNF is to describe a formal system of a language to be used as a bidirectional communications protocol...

 and RBNF are other extensions commonly used to describe Internet Engineering Task Force (IETF)
Internet Engineering Task Force
The Internet Engineering Task Force develops and promotes Internet standards, cooperating closely with the W3C and ISO/IEC standards bodies and dealing in particular with standards of the TCP/IP and Internet protocol suite...

 protocols.

Parsing expression grammar
Parsing expression grammar
A parsing expression grammar, or PEG, is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language...

s build on the BNF and regular expression
Regular expression
In computing, a regular expression provides a concise and flexible means for "matching" strings of text, such as particular characters, words, or patterns of characters. Abbreviations for "regular expression" include "regex" and "regexp"...

 notations to form an alternative class of formal grammar
Formal grammar
A formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...

, which is essentially analytic rather than generative
Generative grammar
In theoretical linguistics, generative grammar refers to a particular approach to the study of syntax. A generative grammar of a language attempts to give a set of rules that will correctly predict which combinations of words will form grammatical sentences...

 in character.

Many BNF specifications found online today are intended to be human readable and are non-formal. These often include many of the following syntax rules and extensions:
  • Optional items enclosed in square brackets. E.g. [].
  • Items repeating 0 or more times are enclosed in curly brackets or suffixed with an asterisk ('*'), such as " ::= {}" or " ::= *", respectively.
  • Items repeating 1 or more times are suffixed with an addition (plus) symbol ('+').
  • Terminals may appear in bold and NonTerminals in plain text rather than using italics and angle brackets.
  • Alternative choices in a production are separated by the ‘|’ symbol. E.g., | .
  • Where items need to be grouped they are enclosed in simple parentheses.

See also

  • Extended Backus–Naur Form
    Extended Backus–Naur form
    In computer science, Extended Backus–Naur Form is a family of metasyntax notations used for expressing context-free grammars: that is, a formal way to describe computer programming languages and other formal languages. They are extensions of the basic Backus–Naur Form metasyntax notation.The...

    .
  • Augmented Backus–Naur Form
    Augmented Backus–Naur form
    In computer science, Augmented Backus–Naur Form is a metalanguage based on Backus–Naur Form , but consisting of its own syntax and derivation rules. The motive principle for ABNF is to describe a formal system of a language to be used as a bidirectional communications protocol...

    .
  • Syntax diagram (Railroad diagram).
  • Definite clause grammar
    Definite clause grammar
    A definite clause grammar is a way of expressing grammar, either for natural or formal languages, in a logic programming language such as Prolog. DCGs are usually associated with Prolog, but similar languages such as Mercury also include DCGs...

     A more expressive alternative to BNF used in Prolog
    Prolog
    Prolog is a general purpose logic programming language associated with artificial intelligence and computational linguistics.Prolog has its roots in first-order logic, a formal logic, and unlike many other programming languages, Prolog is declarative: the program logic is expressed in terms of...

  • Wirth syntax notation
    Wirth syntax notation
    Wirth syntax notation is a metasyntax, that is, a formal way to describe formal languages. Originally proposed by Niklaus Wirth in 1977 as an alternative to Backus-Naur form , it has several advantages over BNF in that it can be defined using itself, it contains an explicit iteration construct,...

     An alternative to BNF from 1977.
  • Pseudocode
    Pseudocode
    In computer science and numerical computation, pseudocode is a compact and informal high-level description of the operating principle of a computer program or other algorithm. It uses the structural conventions of a programming language, but is intended for human reading rather than machine reading...

     A similar concept

Software using BNF

  • ANTLR
    ANTLR
    In computer-based language recognition, ANTLR , or ANother Tool for Language Recognition, is a parser generator that uses LL parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set , first developed in 1989, and is under active development...

     Another parser generator written in Java
    Java (programming language)
    Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...

    .
  • BNF Converter (BNFC) (BNFC) .
  • Coco/R
    Coco/R
    Coco/R is a compiler generator that takes an L-attributed Extended Backus–Naur Form grammar of a source language and generates a scanner and a parser for that language....

     Compiler generator accepting an attributed grammar in EBNF
  • GOLD
    GOLD (parser)
    GOLD is a freeware parsing system that is designed to support multiple programming languages.- Design :The system uses a DFA for lexical analysis and the LALR algorithm for parsing. Both of these algorithms are state machines that use tables to determine actions...

     BNF parser.
  • GNU bison
    GNU bison
    GNU bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification of a context-free language, warns about any parsing ambiguities, and generates a parser which reads sequences of tokens and decides whether the sequence conforms to the syntax...

     GNU version of yacc.
  • RPA BNF parser. Online(PHP) demo parsing: JavaScript, XML
  • Yacc
    Yacc
    The computer program yacc is a parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system. The name is an acronym for "Yet Another Compiler Compiler." It generates a parser based on an analytic grammar written in a notation similar to BNF.Yacc used to be available as...

     parser generator (used with Lex pre-processor).
  • bnfparser2, a universal syntax verification utility

External links


Language grammars

  • Algol-60 BNF, the original BNF.
  • BNF grammars for SQL-92, SQL-99 and SQL-2003, Freely available BNF grammars for SQL
    SQL
    SQL is a programming language designed for managing data in relational database management systems ....

    .
  • BNF Web Club, Freely available BNF grammars for SQL, Ada
    Ada (programming language)
    Ada is a structured, statically typed, imperative, wide-spectrum, and object-oriented high-level computer programming language, extended from Pascal and other languages...

    , Java
    Java (programming language)
    Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...

    .
  • BNF Examples and Full Programming Languages, Freely available BNF grammars for programming languages (C, Java, JavaScript
    JavaScript
    JavaScript is a prototype-based scripting language that is dynamic, weakly typed and has first-class functions. It is a multi-paradigm language, supporting object-oriented, imperative, and functional programming styles....

    , C#, VB.NET, SQL-89), academic languages, file formats (HTML
    HTML
    HyperText Markup Language is the predominant markup language for web pages. HTML elements are the basic building-blocks of webpages....

    , XML
    XML
    Extensible Markup Language is a set of rules for encoding documents in machine-readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards....

    , CSS
    CSS
    -Computing:*Cascading Style Sheets, a language used to describe the style of document presentations in web development*Central Structure Store in the PHIGS 3D API*Closed source software, software that is not distributed with source code...

    ) and others (regular expression
    Regular expression
    In computing, a regular expression provides a concise and flexible means for "matching" strings of text, such as particular characters, words, or patterns of characters. Abbreviations for "regular expression" include "regex" and "regexp"...

    s, YACC
    Yacc
    The computer program yacc is a parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system. The name is an acronym for "Yet Another Compiler Compiler." It generates a parser based on an analytic grammar written in a notation similar to BNF.Yacc used to be available as...

    ). These are written in the GOLD
    GOLD (parser)
    GOLD is a freeware parsing system that is designed to support multiple programming languages.- Design :The system uses a DFA for lexical analysis and the LALR algorithm for parsing. Both of these algorithms are state machines that use tables to determine actions...

     Meta-Language, consisting of BNF and Regular Expressions.
  • Free Programming Language Grammars for Compiler Construction, Freely available BNF/EBNF
    Extended Backus–Naur form
    In computer science, Extended Backus–Naur Form is a family of metasyntax notations used for expressing context-free grammars: that is, a formal way to describe computer programming languages and other formal languages. They are extensions of the basic Backus–Naur Form metasyntax notation.The...

     grammars for C/C++, Pascal
    Pascal (programming language)
    Pascal is an influential imperative and procedural programming language, designed in 1968/9 and published in 1970 by Niklaus Wirth as a small and efficient language intended to encourage good programming practices using structured programming and data structuring.A derivative known as Object Pascal...

    , COBOL
    COBOL
    COBOL is one of the oldest programming languages. Its name is an acronym for COmmon Business-Oriented Language, defining its primary domain in business, finance, and administrative systems for companies and governments....

    , Ada 95
    Ada (programming language)
    Ada is a structured, statically typed, imperative, wide-spectrum, and object-oriented high-level computer programming language, extended from Pascal and other languages...

    , PL/I
    PL/I
    PL/I is a procedural, imperative computer programming language designed for scientific, engineering, business and systems programming applications...

    .
  • BNF files related to the STEP standard. Includes Parts 11, 14, and 21 of the ISO 10303 (STEP)
    ISO 10303
    ISO 10303 is an ISO standard for the computer-interpretable representation and exchange of product manufacturing information. Its official title is: Automation systems and integration — Product data representation and exchange...

    standard
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK