Syntax highlighting
Encyclopedia
Syntax highlighting is a feature of some text editor
Text editor
A text editor is a type of program used for editing plain text files.Text editors are often provided with operating systems or software development packages, and can be used to change configuration files and programming language source code....

s that display text—especially source code
Source code
In computer science, source code is text written using the format and syntax of the programming language that it is being written in. Such a language is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source...

—in different colors and font
Typeface
In typography, a typeface is the artistic representation or interpretation of characters; it is the way the type looks. Each type is designed and there are thousands of different typefaces in existence, with new ones being developed constantly....

s according to the category of terms. This feature eases writing in a structured language such as a programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

 or a markup language
Markup language
A markup language is a modern system for annotating a text in a way that is syntactically distinguishable from that text. The idea and terminology evolved from the "marking up" of manuscripts, i.e. the revision instructions by editors, traditionally written with a blue pencil on authors' manuscripts...

 as both structures and syntax error
Syntax error
In computer science, a syntax error refers to an error in the syntax of a sequence of characters or tokens that is intended to be written in a particular programming language....

s are visually distinct. Highlighting does not affect the meaning of the text itself; it's made only for human readers/editors.

Syntax highlighting is a form of secondary notation
Secondary notation
In design, a secondary notation is defined as "visual cues which are not part of formal notation". Properties like position, indentation, color, symmetry, when used to convey information, are secondary notation....

, since the highlights are not part of the text meaning, but serve to reinforce it. Some editors also integrate syntax highlighting with other features, such as spell checking or code folding
Code folding
Code folding is a feature of some text editors, source code editors and IDEs that allows the user to selectively hide and display sections of a currently-edited file as a part of routine edit operations...

, as aids to editing which are external to the language.

Practical considerations

Syntax highlighting is one strategy to improve the readability and context of the text; especially for code that spans several pages. The reader can easily ignore large sections of comments or code, depending on what one desires.

Syntax highlighting also helps programmers find errors in their program. For example, most editors highlight string literal
String literal
A string literal is the representation of a string value within the source code of a computer program. There are numerous alternate notations for specifying string literals, and the exact notation depends on the individual programming language in question...

s in a different color. Consequently, spotting a missing delimiter is much easier because of the contrasting color of the text. Brace matching is another important feature with many popular editors. This makes it simple to see if a brace has been left out or locate the match of the brace the cursor is on by highlighting the pair in a different color.

Some text editors can also export the color markup in a format that is suitable for printing or for importing into word-processing or other kinds of text-formatting software; for instance an HTML, colorized LaTeX, PostScript
PostScript
PostScript is a dynamically typed concatenative programming language created by John Warnock and Charles Geschke in 1982. It is best known for its use as a page description language in the electronic and desktop publishing areas. Adobe PostScript 3 is also the worldwide printing and imaging...

 or RTF
Rich Text Format
The Rich Text Format is a proprietary document file format with published specification developed by Microsoft Corporation since 1987 for Microsoft products and for cross-platform document interchange....

 version of its syntax highlighting.

Multi-document editors

For editors that support more than one language, the user can usually specify the language of the text, such as C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

, LaTeX
LaTeX
LaTeX is a document markup language and document preparation system for the TeX typesetting program. Within the typesetting system, its name is styled as . The term LaTeX refers only to the language in which documents are written, not to the editor used to write those documents. In order to...

, HTML
HTML
HyperText Markup Language is the predominant markup language for web pages. HTML elements are the basic building-blocks of webpages....

, or the text editor can automatically recognize it based on the file extension or by scanning contents of the file. This automatic language detection presents potential problems. For example, a user may want to edit a document containing:
  • more than one language (for example when editing an HTML
    HTML
    HyperText Markup Language is the predominant markup language for web pages. HTML elements are the basic building-blocks of webpages....

     file that contains embedded JavaScript
    JavaScript
    JavaScript is a prototype-based scripting language that is dynamic, weakly typed and has first-class functions. It is a multi-paradigm language, supporting object-oriented, imperative, and functional programming styles....

     code).
  • a language that is not recognized (for example when editing source code for an obscure or relatively new programming language).
  • a language that differs from the file type (for example when editing source code in an extension-less file in an editor that uses file extensions to detect the language)


In these cases, it is not clear what language to use, and a document may not be highlighted or be highlighted incorrectly.

Syntax elements

Most editors with syntax highlighting allow different colors and text styles to be given to dozens of different lexical sub-elements of syntax. These include keywords, comments, control-flow statements, variables, and other elements. Programmers often heavily customize their settings in an attempt to show as much useful information as possible without making the code difficult to read.

Examples

Below is a snippet
Snippet (programming)
Snippet is a programming term for a small region of re-usable source code, machine code or text. Ordinarily, these are formally-defined operative units to incorporate into larger programming modules...

 of syntax highlighted C
C
Ĉ or ĉ is a consonant in Esperanto orthography, representing the sound .Esperanto orthography uses a diacritic for all four of its postalveolar consonants, as do the Latin-based Slavic alphabets...

 code:

/* Hello World */
  1. include
  2. include


int main
{
printf("Hello World\n");
return 0;
}


Below is another snippet
Snippet (programming)
Snippet is a programming term for a small region of re-usable source code, machine code or text. Ordinarily, these are formally-defined operative units to incorporate into larger programming modules...

 of syntax highlighted C++
C++
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...

 code:

// Create "windowCount" Window objects:
int windowCount = 10;
Window **windows = new Window *[max];
for (int i = 0; i < windowCount; ++i) {
windows[i] = new Window;
}


In the C++
C++
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...

 example, the editor has recognized the keyword
Keyword (computer programming)
In computer programming, a keyword is a word or identifier that has a particular meaning to the programming language. The meaning of keywords — and, indeed, the meaning of the notion of keyword — differs widely from language to language....

s int, new, and for. The comment
Comment (computer programming)
In computer programming, a comment is a programming language construct used to embed programmer-readable annotations in the source code of a computer program. Those annotations are potentially significant to programmers but typically ignorable to compilers and interpreters. Comments are usually...

 at the beginning is also highlighted in a specific manner to distinguish it from working code.

History and limitations

The Live Parsing Editor (LEXX) was written for the VM
VM (operating system)
VM refers to a family of IBM virtual machine operating systems used on IBM mainframes System/370, System/390, zSeries, System z and compatible systems, including the Hercules emulator for personal computers. The first version, released in 1972, was VM/370, or officially Virtual Machine Facility/370...

 operating system for the computerization of the Oxford English Dictionary
Oxford English Dictionary
The Oxford English Dictionary , published by the Oxford University Press, is the self-styled premier dictionary of the English language. Two fully bound print editions of the OED have been published under its current name, in 1928 and 1989. The first edition was published in twelve volumes , and...

 in 1985 and was one of the first to use color syntax highlighting. Its live parsing capability allowed user-supplied parsers to be added to the editor, for text, programs, data file, etc. See: LEXX – A programmable structured editor, Cowlishaw, M. F., IBM Journal of Research and Development, Vol 31, No. 1, 1987, IBM Reprint order number G322-0151

Since most text editors highlight syntax based on complex pattern matching
Pattern matching
In computer science, pattern matching is the act of checking some sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact. The patterns generally have the form of either sequences or tree structures...

 heuristics rather than actually implementing a parser
Parsing
In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens , to determine its grammatical structure with respect to a given formal grammar...

 for each possible language, which could be prohibitively complex, the highlighting is almost never completely accurate. Moreover, depending on the pattern matching algorithms, the highlighting "engine" can become very slow for certain types of language structures. Some editors overcome this problem by not always parsing the whole file but rather just the visible area, sometimes scanning backwards in the text up to a limited number of lines for "syncing".

However, modern language-specific IDE
Integrated development environment
An integrated development environment is a software application that provides comprehensive facilities to computer programmers for software development...

s (in contrast to text editors) generally perform actual language parsing so they can be completely accurate.

See the Programming features section of the Comparison of text editors
Comparison of text editors
This article provides basic comparisons for common text editors. More feature details for text editors are available from the Category of text editor features and from the individual products' articles...

 article for a list of some editors that have syntax highlighting.

Syntax highlighting engines

Some popular syntax highlighting engines are:
  • vi IMproved
    Vim (text editor)
    Vim is a text editor written by Bram Moolenaar and first released publicly in 1991. Based on the vi editor common to Unix-like systems, Vim is designed for use both from a command line interface and as a standalone application in a graphical user interface...

     syntax files (.vim)
  • GeSHi
    GeSHi
    GeSHi or Generic Syntax Highlighter is a free software library that allows syntax highlighting of source code for several markup and programming languages. The program is written in PHP and is bundled or available as an add-on in popular web-based applications, such as Dokuwiki, Mambo, MediaWiki ,...

     written in PHP
  • Kate
    Kate (text editor)
    In computing, Kate is a text editor by KDE. The name Kate is an acronym for KDE Advanced Text Editor.-History:Kate has been part of KDE Software Compilation since release 2.2 in 2001. Because of the KParts technology, it is possible to embed Kate as an editing component in other KDE applications...

     syntax highlighting system (has also been ported to Perl)
  • TextMate
    TextMate
    TextMate is a general-purpose GUI text editor for Mac OS X created by Allan Odgaard. Popular with programmers, some notable features include declarative customizations, tabs for open documents, recordable macros, folding sections and snippets, shell integration, and an extensible bundle...

    syntax files (has also been ported to Ruby)
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK