Whitespace (computer science)



 
 








In computer science, whitespace is any single character or series of characters that represents horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visual mark, but typically does occupy an area on a page. For example, the common whitespace symbol " " (the Unicode character U+0020, decimal code point 32) represents a blank space, as used between words and sentences in Western scripts.

The term "whitespace" assumes that rendered text appears on a white background, so it can be confusing when the background is another color.

As is common in technical literature, the two words "white space" have found widespread usage as the single term "whitespace", especially when used as an adjective, as in "whitespace character". Some specifications refer to "white space" while others refer to "whitespace"; there is no difference between the terms, although exactly which characters are meant varies from context to context. For example, in HTML, "whitespace" includes the form feed character, while in XML, "white space" does not.
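The HTML/XML difference can be sketched in Python. The two sets below are an assumption based on HTML's "ASCII whitespace" definition and the white space production of XML 1.0; the names `HTML_WHITESPACE`, `XML_WHITESPACE`, and `is_blank` are illustrative only.

```python
# Whitespace sets as assumed from the two specifications
# (HTML "ASCII whitespace" and the XML 1.0 white space production).
HTML_WHITESPACE = frozenset("\t\n\f\r ")  # tab, line feed, form feed, carriage return, space
XML_WHITESPACE = frozenset("\t\n\r ")     # tab, line feed, carriage return, space


def is_blank(text: str, whitespace: frozenset) -> bool:
    """Return True if every character of text belongs to the given whitespace set."""
    return all(ch in whitespace for ch in text)


# Characters that count as whitespace in HTML but not in XML.
only_in_html = HTML_WHITESPACE - XML_WHITESPACE

if __name__ == "__main__":
    print(sorted(hex(ord(ch)) for ch in only_in_html))
    print(is_blank(" \f\t", HTML_WHITESPACE), is_blank(" \f\t", XML_WHITESPACE))
```

Under these assumptions, a string containing a form feed is blank by the HTML set but not by the XML set.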

The most common whitespace characters may be typed via the space bar or the Tab key. Depending on context, a line break generated by the Return key (Enter key) may be considered whitespace as well.

Runs of whitespace occurring within source code written in computer programming languages are generally ignored; such languages are free-form. In Haskell and Python, however, whitespace and indentation are used for syntactical purposes. In the esoteric language Whitespace, whitespace characters are the only valid characters for programming, and all other characters are ignored.
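A small Python sketch can illustrate how indentation carries syntactic meaning. The two snippets contain the same tokens and differ only in the indentation of the last line; the helper names `run` and `is_valid` are hypothetical and exist only for this illustration.

```python
# Two snippets with identical tokens; only the indentation of the last line differs.
INSIDE_LOOP = "total = 0\nfor n in (1, 2, 3):\n    total += n\n    total *= 2\n"
AFTER_LOOP = "total = 0\nfor n in (1, 2, 3):\n    total += n\ntotal *= 2\n"


def run(source: str) -> int:
    """Execute a snippet in a fresh namespace and return its `total` variable."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"]


def is_valid(source: str) -> bool:
    """Return True if Python accepts the source, False on a syntax or indentation error."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


if __name__ == "__main__":
    print(run(INSIDE_LOOP), run(AFTER_LOOP))
    print(is_valid("for n in (1, 2):\npass\n"))  # loop body is not indented
```

In the first snippet the doubling happens on every iteration; in the second it happens once after the loop, so the results differ even though no visible token changed.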

Still, for most programming languages, excessive whitespace, especially trailing whitespace at the ends of lines, is considered a nuisance. In interpreted languages, parsing of unnecessary whitespace may affect the speed of execution. In markup languages like HTML, unnecessary whitespace increases the file size and may thus affect the speed of transfer over a network. On the other hand, unnecessary whitespace can also inconspicuously mark code, in a way that is similar to, but less obvious than, comments in code. This can be desirable for proving an infringement of license or copyright that was committed by copying and pasting.
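As a minimal sketch of how trailing whitespace might be found and removed, the following Python functions treat only spaces and tabs at the end of a line as trailing whitespace. That restriction, and the function names, are assumptions made for this example.

```python
def trailing_whitespace_lines(text: str) -> list:
    """Return the 1-based numbers of lines that end in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if line != line.rstrip(" \t")
    ]


def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs at the end of every line, keeping the line breaks."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


if __name__ == "__main__":
    page = "<p>Hello</p>  \n<p>World</p>\t\n<p>!</p>\n"
    cleaned = strip_trailing_whitespace(page)
    print(trailing_whitespace_lines(page))
    print(len(page) - len(cleaned), "characters saved")
```

The size difference between `page` and `cleaned` is the kind of overhead the paragraph above refers to for markup sent over a network.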

The C language defines whitespace to be "... space, horizontal tab, new-line, vertical tab, and form-feed". The HTTP network protocol has very strict requirements about what type of whitespace can occur in the control structures (such as the header fields) and where it must and must not occur.
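The five characters in the C quotation can be written down as a set and used to split text into whitespace-separated pieces. This is only a sketch in Python, not a C lexer; the mapping of the names to escape sequences and the helper `split_tokens` are assumptions for illustration.

```python
import string

# The five characters named in the quotation, written as Python escapes:
# space, horizontal tab, new-line, vertical tab, form-feed.
C_SOURCE_WHITESPACE = frozenset(" \t\n\v\f")


def split_tokens(source: str) -> list:
    """Split text on runs of the five whitespace characters listed above."""
    tokens, current = [], []
    for ch in source:
        if ch in C_SOURCE_WHITESPACE:
            if current:  # a run of whitespace ends the current piece
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    print(split_tokens("int\tx =\n 1;"))
    # Python's own ASCII whitespace constant additionally contains carriage return.
    print(sorted(map(ord, set(string.whitespace) - C_SOURCE_WHITESPACE)))
```

The comparison with `string.whitespace` shows once more that the exact set of whitespace characters depends on the context in which the term is used.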

On some occasions it is necessary to show an explicit symbol for a space character. A textbook on the Modula-2 computer language published ca. 1985 by Springer-Verlag, for example, used the symbol ␣ (Unicode U+2423, decimal 9251, OPEN BOX) for this purpose. (In case it does not render well on screen: it looks like a closing square bracket, ], rotated a quarter-turn clockwise, although not as wide, and placed below the writing line. Some fonts render it too narrowly.)
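Making spaces visible in this way is easy to sketch in Python. The function name `show_spaces` is hypothetical, and only U+0020 is replaced; other whitespace characters are left untouched.

```python
OPEN_BOX = "\u2423"  # U+2423 OPEN BOX


def show_spaces(text: str) -> str:
    """Replace every U+0020 space with the visible OPEN BOX symbol."""
    return text.replace(" ", OPEN_BOX)


if __name__ == "__main__":
    print(show_spaces("x := y + 1"))
    print(ord(OPEN_BOX))  # decimal value of the code point
```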

Such usage is similar to multiword file names written for operating systems and applications that are confused by embedded space codes. Such file names instead use a low line (_) as a word separator, as_in_this_phrase.

Another such symbol was ␢ (Unicode U+2422, decimal 9250, BLANK SYMBOL), which looks like a lowercase b with a stroke through it. It was used in the early years of computer programming (reportedly especially at IBM) when writing on coding forms. Keypunch operators immediately recognized the symbol as an "explicit space".

Unicode


In Unicode (specifically, the Unicode Character Database) the following code points are defined as whitespace:

  • U+0009–U+000D (control characters, including Tab, LF, and CR)
  • U+0020 SPACE
  • U+0085 NEL (control character next line)
  • U+00A0 NBSP (NO-BREAK SPACE)
  • U+1680 OGHAM SPACE MARK
  • U+180E MONGOLIAN VOWEL SEPARATOR (classified as whitespace only in Unicode versions before 6.3)
  • U+2000–U+200A (different sorts of spaces)
  • U+2028 LS (LINE SEPARATOR)
  • U+2029 PS (PARAGRAPH SEPARATOR)
  • U+202F NNBSP (NARROW NO-BREAK SPACE)
  • U+205F MMSP (MEDIUM MATHEMATICAL SPACE)
  • U+3000 IDEOGRAPHIC SPACE
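
The list above can be inspected with Python's standard `unicodedata` module. This is a sketch under two assumptions: `str.isspace` follows Python's own rule (based on general category and bidirectional class) rather than the Unicode White_Space property itself, and the answers depend on the Unicode database version bundled with the interpreter, so U+180E in particular may be reported differently from the other entries. The names `LISTED_WHITESPACE` and `describe` are illustrative.

```python
import unicodedata

# Code points from the list above, with the two ranges expanded.
LISTED_WHITESPACE = (
    list(range(0x0009, 0x000E))          # U+0009 to U+000D
    + [0x0020, 0x0085, 0x00A0, 0x1680, 0x180E]
    + list(range(0x2000, 0x200B))        # U+2000 to U+200A
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def describe(code_point: int) -> tuple:
    """Return (U+XXXX label, general category, result of str.isspace)."""
    ch = chr(code_point)
    return (f"U+{code_point:04X}", unicodedata.category(ch), ch.isspace())


if __name__ == "__main__":
    for code_point in LISTED_WHITESPACE:
        print(*describe(code_point))
```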


See also

  • Programming style
  • Indent style
  • Space (punctuation)
  • Trim (programming)