Literate programming
Encyclopedia
Literate programming is an approach to programming
Computer programming
Computer programming is the process of designing, writing, testing, debugging, and maintaining the source code of computer programs. This source code is written in one or more programming languages. The purpose of programming is to create a program that performs specific operations or exhibits a...

 introduced by Donald Knuth
Donald Knuth
Donald Ervin Knuth is a computer scientist and Professor Emeritus at Stanford University.He is the author of the seminal multi-volume work The Art of Computer Programming. Knuth has been called the "father" of the analysis of algorithms...

 as an alternative to the structured programming
Structured programming
Structured programming is a programming paradigm aimed on improving the clarity, quality, and development time of a computer program by making extensive use of subroutines, block structures and for and while loops - in contrast to using simple tests and jumps such as the goto statement which could...

 paradigm of the 1970s.

The literate programming paradigm
Programming paradigm
A programming paradigm is a fundamental style of computer programming. Paradigms differ in the concepts and abstractions used to represent the elements of a program and the steps that compose a computation A programming paradigm is a fundamental style of computer programming. (Compare with a...

, as conceived by Knuth, represents a move away from writing programs
Computer program
A computer program is a sequence of instructions written to perform a specified task with a computer. A computer requires programs to function, typically executing the program's instructions in a central processor. The program has an executable form that the computer can use directly to execute...

 in the manner and order imposed by the computer, and instead enables programmer
Programmer
A programmer, computer programmer or coder is someone who writes computer software. The term computer programmer can refer to a specialist in one area of computer programming or to a generalist who writes code for many kinds of software. One who practices or professes a formal approach to...

s to develop programs in the order demanded by the logic and flow of their thoughts. Literate programs are written as an uninterrupted exposition of logic in an ordinary human language
Natural language
In the philosophy of language, a natural language is any language which arises in an unpremeditated fashion as the result of the innate facility for language possessed by the human intellect. A natural language is typically used for communication, and may be spoken, signed, or written...

, much like the text of an essay
Essay
An essay is a piece of writing which is often written from an author's personal point of view. Essays can consist of a number of elements, including: literary criticism, political manifestos, learned arguments, observations of daily life, recollections, and reflections of the author. The definition...

, in which macros which hide abstractions and traditional source code
Source code
In computer science, source code is text written using the format and syntax of the programming language that it is being written in. Such a language is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source...

 are included.

Literate programming tools are used to obtain two representations from a literate source file: one suitable for further compilation
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...

 or execution by a computer
Executable
In computing, an executable file causes a computer "to perform indicated tasks according to encoded instructions," as opposed to a data file that must be parsed by a program to be meaningful. These instructions are traditionally machine code instructions for a physical CPU...

, the "tangled" code, and another for viewing as formatted documentation
Documentation
Documentation is a term used in several different ways. Generally, documentation refers to the process of providing evidence.Modules of Documentation are Helpful...

, which is said to be "woven" from the literate source. While the first generation of literate programming tools were computer language-specific, the later ones are language-agnostic and exist above the programming languages.

Concept

A literate program is an explanation of the program logic in a natural language
Natural language
In the philosophy of language, a natural language is any language which arises in an unpremeditated fashion as the result of the innate facility for language possessed by the human intellect. A natural language is typically used for communication, and may be spoken, signed, or written...

, such as English, interspersed with snippets of macros and traditional source code. Macros in a literate source file are simply title-like or explanatory phrases in a human language that describe human abstractions created while solving the programming problem, and hiding chunks of code or lower-level macros. These macros are similar to the algorithm
Algorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...

s in pseudocode
Pseudocode
In computer science and numerical computation, pseudocode is a compact and informal high-level description of the operating principle of a computer program or other algorithm. It uses the structural conventions of a programming language, but is intended for human reading rather than machine reading...

 typically used in teaching computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

. These arbitrary explanatory phrases become precise new operators, created on the fly by the programmer, forming a meta-language on top of the underlying programming language.

A preprocessor
Preprocessor
In computer science, a preprocessor is a program that processes its input data to produce output that is used as input to another program. The output is said to be a preprocessed form of the input data, which is often used by some subsequent programs like compilers...

 is used to substitute arbitrary hierarchies, or rather "interconnected 'webs' of macros", to produce the compilable source code with one command ("tangle"), and documentation with another ("weave"). The preprocessor also provides an ability to write out the content of the macros and to add to already created macros in any place in the text of the literate program source file, thereby disposing of the need to keep in mind the restrictions imposed by traditional programming languages or to interrupt the flow of thought.

Advantages

According to Knuth, literate programming provides for higher-quality programs, since it forces programmers to explicitly state the thoughts behind the program, making poorly thought-out design decisions more obvious. Knuth also claims that literate programming provides a first-rate documentation system, which is not an add-on, but is grown naturally in the process of exposition of one's thoughts during a program creation. The resulting documentation allows authors to restart their own thought processes at any later time, and allows other programmers to understand the construction of the program more easily. This differs from traditional documentation, in which a programmer is presented with source code that follows a compiler-imposed order, and must decipher the thought process behind the program from the code and its associated comments. The meta-language capabilities of literate programming are also claimed to facilitate thinking in general, giving a higher "bird's eye view" of the code and increasing the number of concepts the mind can successfully retain and process. Applicability of the concept to programming on a large scale, that of commercial-grade programs is proven by an edition of TeX
TeX
TeX is a typesetting system designed and mostly written by Donald Knuth and released in 1978. Within the typesetting system, its name is formatted as ....

 code as a literate program.

Misconceptions

Literate programming is very often misunderstood to refer only to formatted documentation produced from a common file with both source code and comments, or to voluminous commentaries included with code. This misconception has led to claims that comment-extraction tools, such as the Perl
Perl
Perl is a high-level, general-purpose, interpreted, dynamic programming language. Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Since then, it has undergone many changes and revisions and become widely popular...

 Plain Old Documentation
Plain Old Documentation
Plain Old Documentation, abbreviated pod, is a lightweight markup language used to document the Perl programming language.-Design:pod is designed to be a simple, clean language with just enough syntax to be useful. It purposefully does not include mechanisms for fonts, images, colors or tables...

 system, are "literate programming tools". However, because these tools do not implement the "web of abstract concepts" hiding behind the system of natural-language macros, or provide an ability to change the order of the source code from a machine-imposed sequence to one convenient to the human mind, they cannot properly be called literate programming tools in the sense intended by Knuth.

Example

A classic example of literate programming is the literate implementation of the standard Unix
Unix
Unix is a multitasking, multi-user computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Douglas McIlroy, and Joe Ossanna...

 wc
Wc (Unix)
wc is a command in Unix-like operating systems.The program reads either standard input or a list of files and generates one or more of the following statistics: number of bytes, number of words, and number of lines...

word counting program. Knuth presented a CWEB
CWEB
CWEB is a computer programming system created by Donald Knuth and Silvio Levy as a follow up to Knuth's WEB literate programming system, using the C programming language instead of Pascal....

 version of this example in Chapter 12 of his Literate Programming book. The same example was later rewritten for the noweb
Noweb
noweb is a literate programming tool, created in 1989–1999 by Norman Ramsey, and designed to be simple, easily extensible and language independent....

 literate programming tool. This example provides a good illustration of the basic elements of literate programming.

Creation of macros
The following snippet of the wc literate program shows how arbitrary descriptive phrases in a natural language are used in a literate program to create macros, which act as new "operators" in the literate programming language, and hide chunks of code or other macros. The mark-up notation consists of double angle brackets ("<<...>>") that indicate macros, the "@" symbol which indicates the end of the code section in a noweb file. The "<<*>>" symbol stands for the "root", topmost node the literate programming tool will start expanding the web of macros from. Actually, writing out the expanded source code can be done from any section or subsection (i.e. a piece of code designated as "<>=", with the equal sign), so one literate program file can contain several files with machine source code.

The purpose of wc is to count lines, words, and/or characters in a list of files. The
number of lines in a file is ......../more explanations/

Here, then, is an overview of the file wc.c that is defined by the noweb program wc.nw:
<<*>>=
<
>
<>
<>
<>
<>
@

We must include the standard I/O definitions, since we want to send formatted output
to stdout and stderr.
<
>=
#include
@

Note also that the unraveling of the chunks can be done in any place in the literate program text file, not necessarily in the order they are sequenced in the enclosing chunk, but as is demanded by the logic reflected in the explanatory text that envelops the whole program.

Program as a web – macros are not just section names
Macros are not the same as "section names" in standard documentation. Literate programming macros can hide any chunk of code behind themselves, and be used inside any low-level machine language operators, often inside logical operators such as "if", "while" or "case". This is illustrated by the following snippet of the wc literate program.


The present chunk, which does the counting, was actually one of
the simplest to write. We look at each character and change state if it begins or ends
a word.

<>=
while (1) {
<>
c = *ptr++;
if (c > ' ' && c < 0177) {
/* visible ASCII codes */
if (!in_word) {
word_count++;
in_word = 1;
}
continue;
}
if (c '\n') line_count++;
else if (c != ' ' && c != '\t') continue;
in_word = 0;
/* c is newline, space, or tab */
}
@


In fact, macros can stand for any arbitrary chunk of code or other macros, and are thus more general than top-down or bottom-up "chunking", or than subsectioning. Knuth says that when he realized this, he began to think of a program as a web of various parts.

Order of human logic, not that of the compiler
In a noweb literate program besides the free order of their exposition, the chunks behind macros, once introduced with "<<...>>=", can be grown later in any place in the file by simply writing "<>=" and adding more content to it, as the following snippet illustrates ("plus" is added by the document formatter for readability, and is not in the code).

The grand totals must be initialized to zero at the beginning of the program.
If we made these variables local to main, we would have to do this initialization
explicitly; however, C globals are automatically zeroed. (Or rather,``statically
zeroed. (Get it?)

<>+=
long tot_word_count, tot_line_count,
tot_char_count;
/* total number of words, lines, chars */
@


Record of the train of thought
The documentation for a literate program is produced as part of writing the program. Instead of comments provided as side notes to source code a literate program contains the explanation of concepts on each level, with lower level concepts deferred to their appropriate place, which allows for better communication of thought. The snippets of the literate wc above show how an explanation of the program and its source code are interwoven. Such exposition of ideas creates the flow of thought that is like a literary work. Knuth wrote a "novel" which explains the code of the computer strategy game Colossal Cave Adventure
Colossal Cave Adventure
Colossal Cave Adventure gave its name to the computer adventure game genre . It was originally designed by Will Crowther, a programmer and caving enthusiast who based the layout on part of the Mammoth Cave system in Kentucky...

.
Tools
The first published literate programming environment was WEB
WEB
WEB is a computer programming system created by Donald E. Knuth as the first implementation of what he called "literate programming": the idea that one could create software as works of literature, by embedding source code inside descriptive text, rather than the reverse , in an order that is...

, introduced by Donald Knuth
Donald Knuth
Donald Ervin Knuth is a computer scientist and Professor Emeritus at Stanford University.He is the author of the seminal multi-volume work The Art of Computer Programming. Knuth has been called the "father" of the analysis of algorithms...

 in 1981 for his TeX
TeX
TeX is a typesetting system designed and mostly written by Donald Knuth and released in 1978. Within the typesetting system, its name is formatted as ....

 typesetting system; it uses Pascal
Pascal (programming language)
Pascal is an influential imperative and procedural programming language, designed in 1968/9 and published in 1970 by Niklaus Wirth as a small and efficient language intended to encourage good programming practices using structured programming and data structuring.A derivative known as Object Pascal...

 as its underlying programming language and TeX for typesetting of the documentation. The complete commented TeX source code was published in Knuth's TeX: The program, volume B of his 5-volume Computers and Typesetting
Computers and Typesetting
Computers and Typesetting is a 5-volume set of books by Donald Knuth published 1986 describing the TeX and Metafont systems for digital typography. Knuth's computers and typesetting project was the result of his frustration with the lack of decent software for the typesetting of mathematical and...

. Knuth had privately used a literate programming system called DOC as early as 1979. He was inspired by the ideas of Pierre-Arnoul de Marneffe. The free CWEB
CWEB
CWEB is a computer programming system created by Donald Knuth and Silvio Levy as a follow up to Knuth's WEB literate programming system, using the C programming language instead of Pascal....

, written by Knuth and Silvio Levy, is WEB adapted for C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

 and C++
C++
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...

, runs on most operating systems and can produce TeX and PDF
Portable Document Format
Portable Document Format is an open standard for document exchange. This file format, created by Adobe Systems in 1993, is used for representing documents in a manner independent of application software, hardware, and operating systems....

 documentation.

Other implementations of the literate programming concept are noweb
Noweb
noweb is a literate programming tool, created in 1989–1999 by Norman Ramsey, and designed to be simple, easily extensible and language independent....

 and FunnelWeb, both of which are independent of the programming language of the source code. Noweb is well known for its simplicity: just 2 text markup conventions and 2 tool invocations are needed to use it, and it allows for text formatting in HTML rather than going through the TeX system.

FunnelWeb is another LP tool that can produce HTML documentation output. It has more complicated markup (with "@" escaping any FunnelWeb command), but has many more flexible options (see).

Nuweb can translate a single LP source into any number of code files in any mix of languages together with documentation in LaTeX
LaTeX
LaTeX is a document markup language and document preparation system for the TeX typesetting program. Within the typesetting system, its name is styled as . The term LaTeX refers only to the language in which documents are written, not to the editor used to write those documents. In order to...

. It does it in a single invocation; it does not have separate weave and tangle commands. It does not have the extensibility of noweb
Noweb
noweb is a literate programming tool, created in 1989–1999 by Norman Ramsey, and designed to be simple, easily extensible and language independent....

, but it can use the listings package of LaTeX
LaTeX
LaTeX is a document markup language and document preparation system for the TeX typesetting program. Within the typesetting system, its name is styled as . The term LaTeX refers only to the language in which documents are written, not to the editor used to write those documents. In order to...

 to provide pretty-printing and the hyperref package to provide hyperlinks in PDF output. It also has extensive indexing and cross-referencing facilities including cross-references from the generated code back to the documentation, both as automatically generated comments and as strings that the code can use to report its behaviour. Vimes is a type-checker for Z notation
Z notation
The Z notation , named after Zermelo–Fraenkel set theory, is a formal specification language used for describing and modelling computing systems. It is targeted at the clear specification of computer programs and computer-based systems in general.-History:...

 which shows the use of nuweb in a practical application. Around 15,000 lines of nuweb source are translated into nearly 15,000 lines of C/C++ code and over 460 pages of documentation. See External links.

Molly is a LP tool written in Perl
Perl
Perl is a high-level, general-purpose, interpreted, dynamic programming language. Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Since then, it has undergone many changes and revisions and become widely popular...

, which aims to modernize and scale it with "folding HTML" and "virtual views" on code. It uses "noweb" markup for the literate source files see External links.

Codnar is an inverse literate programming tool available as a Ruby Gem
RubyGems
RubyGems is a package manager for the Ruby programming language that provides a standard format for distributing Ruby programs and libraries , a tool designed to easily manage the installation of gems, and a server for distributing them. It is analogous to EasyInstall for the Python programming...

 (see External links). Instead of the machine-readable source code being extracted out of the literate documentation sources, the literate documentation is extracted out of the normal machine-readable source code files. This allows these source code files to be edited and maintained as usual. The approach is similar to that used by popular API documentation tools, such as JavaDoc
Javadoc
Javadoc is a documentation generator from Sun Microsystems for generating API documentation in HTML format from Java source code.The "doc comments" format used by Javadoc is the de facto industry standard for documenting Java classes. Some IDEs, such as Netbeans and Eclipse automatically generate...

. Such tools, however, generate API reference documentation, while Codnar generates a linear narrative describing the code, similar to that created by classical LP tools. Codnar can co-exist with API documentation tools, allowing both a reference manual and a linear narrative to be generated from the same set of source code files.

The Leo text editor
Leo (text editor)
Leo is a text editorthat features outlines with clones as its central tool oforganization, navigation, customization and scripting.-Language:Leo is written in Python and use the Qt Gui toolkit...

 is an outlining editor which supports optional noweb and CWEB markup. The author of Leo mixes two different approaches: first, Leo is an outlining editor, which helps with management of large texts; second, Leo incorporates some of the ideas of literate programming, which in its pure form (i.e. the way it is used by Knuth Web tool and/or tools like "noweb") is possible only with some degree of inventiveness and the use of the editor in a way not exactly envisioned by its author (in modified @root nodes). However, this and other extensions (@file nodes) make outline programming and text management successful and easy and in some ways similar to literate programming.

The Haskell
Haskell (programming language)
Haskell is a standardized, general-purpose purely functional programming language, with non-strict semantics and strong static typing. It is named after logician Haskell Curry. In Haskell, "a function is a first-class citizen" of the programming language. As a functional programming language, the...

 programming language has native support for semi-literate programming, inspired by CWEB but with a simpler implementation. When aiming for TeX output, one writes a plain LaTeX
LaTeX
LaTeX is a document markup language and document preparation system for the TeX typesetting program. Within the typesetting system, its name is styled as . The term LaTeX refers only to the language in which documents are written, not to the editor used to write those documents. In order to...

 file where source code is marked by a given surrounding environment; LaTeX can be set up to handle that environment, while the Haskell compiler looks for the right markers to identify Haskell statements to compile, removing the TeX documentation as if they were comments. However, as described above, this is not literate programming in the sense intended by Knuth. Haskell's functional, modular nature makes literate programming directly in the language somewhat easier, but it is not nearly as powerful as one of the WEB tools where "tangle" can reorganize in arbitrary ways.
See also

  • Sweave
    Sweave
    Sweave is a function in the statistical programming language R that enables integration of R code into LaTeX or LyX documents. The purpose is "to create dynamic reports, which can be updated automatically if data or analysis change"....

     – an example of use of the "noweb"-like Literate Programming tool inside the R language for creation of dynamic statistical reports
  • "Self-documenting
    Self-documenting
    In computer programming, self-documenting is a common descriptor for source code that follows certain loosely-defined conventions for naming and structure...

    "

External links
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK