All Topics  
Syntax of programming languages

 

   Email Print
   Bookmark   Link






 

Syntax of programming languages



 
 
In computer science
Computer science

Computer science is the study of the theoretical foundations of information and computation, and of practical techniques for their implementation and application in computer systems....
, the syntax of a programming language
Programming language

A programming language is a machine-readable artificial language designed to express computations that can be performed by a machine, particularly a computer....
 is the set of rules that define the combinations of symbols that are considered to be syntactically correct program
Computer program

Computer programs are Instruction for a computer. A computer requires programs to function. Moreover, a computer program does not run unless its instructions are executed by a Central processing unit; however, a program may communicate an Algorithm#Formalization of algorithms to people without running....
s in that language. The syntax of a language defines its surface form. Text-based programming languages are based on sequences of characters, while visual programming languages are based on the spatial layout and connections between symbols (which may be textual or graphical).

The lexical grammar
Lexical grammar

In computer science, a lexical grammar can be thought of as the syntax of Token . That is, the rules governing how a character sequence is divided up into subsequences of characters, each of which represents an individual token....
 of a textual language specifies how characters must be chunked into tokens.






Discussion
Ask a question about 'Syntax of programming languages'
Start a new discussion about 'Syntax of programming languages'
Answer questions from other users
Full Discussion Forum



Encyclopedia


In computer science
Computer science

Computer science is the study of the theoretical foundations of information and computation, and of practical techniques for their implementation and application in computer systems....
, the syntax of a programming language
Programming language

A programming language is a machine-readable artificial language designed to express computations that can be performed by a machine, particularly a computer....
 is the set of rules that define the combinations of symbols that are considered to be syntactically correct program
Computer program

Computer programs are Instruction for a computer. A computer requires programs to function. Moreover, a computer program does not run unless its instructions are executed by a Central processing unit; however, a program may communicate an Algorithm#Formalization of algorithms to people without running....
s in that language. The syntax of a language defines its surface form. Text-based programming languages are based on sequences of characters, while visual programming languages are based on the spatial layout and connections between symbols (which may be textual or graphical).

The lexical grammar
Lexical grammar

In computer science, a lexical grammar can be thought of as the syntax of Token . That is, the rules governing how a character sequence is divided up into subsequences of characters, each of which represents an individual token....
 of a textual language specifies how characters must be chunked into tokens. Other syntax rules specify the permissible sequences of these tokens and the process of assigning meaning to these token sequences is part of semantics
Semantics

Semantics is the study of meaning in communication. The word is derived from the Greek language word s??a?t???? , "significant", from s??a??? , "to signify, to indicate" and that from s??a , "sign, mark, token"....
.

The syntactic analysis of source code usually entails the transformation of the linear sequence of tokens into a hierarchical syntax tree (abstract syntax tree
Abstract syntax tree

In computer science, an abstract syntax tree , or just syntax tree, is a directed tree representation of the abstract syntactic structure of source code written in a certain programming language....
s are one convenient form of syntax tree). This process is called parsing
Parsing

In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a sequence of lexical analysis#Token to determine their grammatical structure with respect to a given formal grammar....
, as it is in syntactic analysis
Syntax

In linguistics, syntax is the study of the principles and rules for constructing Sentence s in natural languages. In addition to referring to the discipline, the term syntax is also used to refer directly to the rules and principles that govern the sentence structure of any individual language, as in "the Irish syntax"....
 in linguistics
Linguistics

Linguistics is the science study of natural language. Linguistics encompasses a number of sub-fields. An important topical division is between the study of language structure and the study of Meaning ....
. Tools have been written that automatically generate parsers from a specification of a language grammar written in Backus-Naur form, e.g., Yacc
Yacc

The computer program yacc is a parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system. The name is an acronym for "Yet Another Compiler Compiler." It generates a parsing based on an Formal grammar written in a notation similar to Backus-Naur form....
 (yet another compiler compiler).

Syntax definition


The syntax of textual programming languages is usually defined using a combination of regular expression
Regular expression

In computing, regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters....
s (for lexical
Lexical analysis

In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. Programs performing lexical analysis are called lexical analyzers or lexers....
 structure) and Backus-Naur Form (for grammatical
Context-free grammar

In formal language theory, a context-free grammar is a formal grammar in which every Production rule is of the formwhere V is a single nonterminal symbol, and w is a string of Terminal and nonterminal symbolss and/or nonterminals ....
 structure) to inductively specify syntactic categories
Syntactic category

A syntactic category is either a phrasal category, such as noun phrase or verb phrase, which can be decomposed into smaller syntactic categories, or a lexical category, such as noun or verb, which cannot be further decomposed....
 (nonterminals) and terminal symbols. Syntactic categories are defined by rules called productions, which specify the values that belong to a particular syntactic category. Terminal symbols are the concrete characters or strings of characters (for example keyword
Keyword (computer programming)

In computer programming, a keyword is a word or identifier that has a particular meaning to the programming language. The meaning of keywords ? and, indeed, the meaning of the notion of keyword ? differs widely from language to language....
s such as define, if, let, or void) from which syntactically valid programs are constructed.

Below is a simple grammar, based on Lisp
Lisp programming language

Lisp is a family of computer programming languages with a long history and a distinctive, fully parenthesized syntax. Originally specified in 1958, Lisp is the second-oldest high-level programming language in widespread use today; only Fortran is older....
, which defines productions for the syntactic categories expression, atom, number, symbol, and list:

expression = atom | list
atom = number | symbol
number = [+-]?['0'-'9']+
symbol = ['A'-'Z''a'-'z'].*
list = '(' expression* ')'


This grammar specifies the following:
  • an expression is either an atom or a list;
  • an atom is either a number or a symbol;
  • a number is an unbroken sequence of one or more decimal digits, optionally preceded by a plus or minus sign;
  • a symbol is a letter followed by zero or more of any characters (excluding whitespace); and
  • a list is a matched pair of parentheses, with zero or more expressions inside it.
Here the decimal digits, upper- and lower-case characters, and parentheses are terminal symbols.

The following are examples of well-formed token sequences in this grammar: '12345', '', '(a b c232 (1))'

The grammar needed to specify a programming language can be classified by its position in the Chomsky hierarchy
Chomsky hierarchy

Within the field of computer science, specifically in the area of formal languages, the Chomsky hierarchy is a containment hierarchy of classes of formal grammars....
. The syntax of most programming languages can be specified using a Type-2 grammar, i.e., they are context-free grammar
Context-free grammar

In formal language theory, a context-free grammar is a formal grammar in which every Production rule is of the formwhere V is a single nonterminal symbol, and w is a string of Terminal and nonterminal symbolss and/or nonterminals ....
s.

Syntax versus semantics


The syntax of a language describes the form of a valid program, but does not provide any information about the meaning of the program or the results of executing that program. The meaning given to a combination of symbols is handled by semantics (either formal
Formal semantics of programming languages

In theoretical computer science, formal semantics is the field concerned with the rigorous mathematical study of the meaning of programming languages and models of computation....
 or hard-coded in a reference implementation
Reference implementation (computing)

In computing, a reference implementation is a software example of a specification. These are intended to help others implement their own version of the specification or find problems during the creation of a specification....
). Not all syntactically correct programs are semantically correct. Many syntactically correct programs are nonetheless ill-formed, per the language's rules; and may (depending on the language specification and the soundness of the implementation) result in an error on translation or execution. In some cases, such programs may exhibit undefined behavior. Even when a program is well-defined within a language, it may still have a meaning that is not intended by the person who wrote it.

Using natural language
Natural language

In the philosophy of language, a natural language is a language that is spoken, Sign language, or writing by humans for general-purpose communication, as distinguished from formal languages and from constructed languages....
 as an example, it may not be possible to assign a meaning to a grammatically correct sentence or the sentence may be false:
  • "Colorless green ideas sleep furiously
    Colorless green ideas sleep furiously

    "Colorless green ideas sleep furiously" is a sentence composed by Noam Chomsky in 1957 as an example of a sentence whose grammar is correct but whose meaning is Nonsense....
    ." is grammatically well-formed but has no generally accepted meaning.
  • "John is a married bachelor." is grammatically well-formed but expresses a meaning that cannot be true.


The following C language fragment is syntactically correct, but performs an operation that is not semantically defined (because p is a null pointer, the operations p->real and p->im have no meaning):

complex *p = NULL; complex abs_p = sqrt (p->real * p->real + p->im * p->im);