Space (punctuation)

 



In writing, a space is a blank area, devoid of content, that separates words, letters, numbers, and punctuation. Conventions for interword and intersentence spaces vary among languages, and in some cases the spacing rules are quite complex.

The Latin alphabet, used for English, was originally written in scripta continua, without any word separators. Later, interpuncts (centred dots) were added to make reading easier, and these were replaced with spaces after roughly AD 600–800. In typesetting, spaces have historically been of multiple lengths, with particular space-lengths being used for specific typographic purposes, such as separating words, separating sentences, or separating punctuation from words. Following the invention of the typewriter and the subsequent overlap of designer style-preferences and computer-technology limitations, much of this reader-centric variation has been lost in normal use.

In computer representation of text, spaces of various sizes, styles, or language characteristics (different space characters) are indicated with unique code points.

Use of the space in natural languages


Spaces between words


Modern English uses a space to separate words, but not all languages follow this practice. Spaces were not used to separate words in Latin until roughly AD 600–800. Ancient Hebrew and Arabic did use spaces, partly to compensate in clarity for the lack of vowels. Traditionally, the CJK languages (Chinese, Japanese, and Korean) were written without spaces: modern Chinese and Japanese (except when written with little or no kanji) still are, but modern Korean uses spaces.

Spaces between sentences



There are three main conventions relating to the number of spaces used to separate sentences within the same paragraph:
  • one widened space, typically two to three times wider than an inter-word space (traditional typography)
  • two spaces (English spacing or American typewriter spacing)
  • one space (French spacing)


Double spacing can also refer to a style of line spacing: the insertion of a full additional empty line between lines of text. This is commonly used for text which may incorporate later markup or modifications, such as proof-readers' copies, legal documents, or academic assignments for correction.

Space characters and digital typography


The variable-width general-purpose space

In computer character encodings, there is a normal general-purpose space (Unicode character U+0020; 32 decimal) whose width varies according to the design of the typeface. Typical values range from 1/5 em to 1/3 em (in digital typography an em is equal to the nominal size of the font, so for a 10-point font the space will probably be between 2 and 3.3 points). Sophisticated fonts may have differently sized spaces for bold, italic, and small-caps faces, and compositors will often manually adjust the width of the space depending on the size and prominence of the text.
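As a rough illustration of the arithmetic above, the following Python sketch computes the likely range of space widths for a given font size. The function name `space_width_range` and its default fractions (1/5 and 1/3 of an em) are assumptions for illustration, taken from the typical values quoted in the text rather than from any particular font.

```python
from fractions import Fraction

def space_width_range(font_size_pt, narrow=Fraction(1, 5), wide=Fraction(1, 3)):
    """Return (min, max) word-space width in points for a font size.

    One em equals the nominal font size, so the width is the em
    fraction multiplied by the size in points.
    """
    return (float(narrow * font_size_pt), float(wide * font_size_pt))

low, high = space_width_range(10)
print(f"{low:.1f} to {high:.1f} points")  # 2.0 to 3.3 points
```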

In addition to this general-purpose space, it is possible to encode a space of a specific width. See the table below for a complete list.

(In monospaced proofreading copy, only em- and en-spaces are represented using this character (which is called an em-quad or an en-quad), while other types of spaces are represented with a number sign, '#'.)


Breaking and non-breaking spaces

When rendered, the generic Unicode space is often considered insignificant when it appears at the end of a line of text or as part of a sequence of whitespace characters, so it may be omitted or "collapsed" in such circumstances. The non-breaking space (U+00A0; 160 decimal) renders the same as a normal space but is expressly non-collapsible. It is often used to prevent line wrapping or to indent text, though best practice on the World Wide Web prescribes using CSS for the latter purpose.
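The effect of a non-breaking space on line wrapping can be sketched with Python's standard `textwrap` module, which treats only ASCII whitespace as a break opportunity. The sample sentence and the width of 18 are arbitrary choices for illustration; real layout engines apply more elaborate line-breaking rules.

```python
import textwrap

NBSP = "\u00a0"  # NO-BREAK SPACE, 160 decimal

plain = "The distance is 10 km exactly"
bound = f"The distance is 10{NBSP}km exactly"

# With an ordinary space, the line may break between "10" and "km".
plain_lines = textwrap.wrap(plain, width=18)

# With a non-breaking space, "10" and "km" stay on the same line.
bound_lines = textwrap.wrap(bound, width=18)

print(plain_lines)
print(bound_lines)
```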

Hair spaces around dashes

Typically, both en dashes and em dashes are set continuous with the text (as illustrated by use in the Chicago Manual of Style, 6.80, 6.83–86). However, an em dash can optionally be surrounded with a so-called hair space (U+200A; 8202 decimal). This space should be much thinner than a normal space, and is seldom used on its own. It can be written in HTML by using the numeric character reference &#8202; or &#x200A;. Very few user agents are able to render a hair space correctly: in most cases the result is an unwanted symbol or a question mark on the screen, depending on the font and renderer capabilities.
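As a small check of the two notations, the sketch below decodes the decimal and hexadecimal numeric character references with Python's standard `html.unescape` and confirms that both yield the same code point. The variable names are illustrative only.

```python
import html

# Decode the decimal and hexadecimal forms of the reference.
decimal_ref = html.unescape("&#8202;")
hex_ref = html.unescape("&#x200A;")

print(decimal_ref == hex_ref)         # True
print(ord(decimal_ref))               # 8202
print(f"U+{ord(hex_ref):04X}")        # U+200A

# An em dash (U+2014) surrounded by hair spaces.
spaced_dash = f"left{decimal_ref}\u2014{hex_ref}right"
```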

Normal space versus hair space
  • Normal space: left right
  • Normal space with em dash: left — right
  • Hair space with em dash: left — right
  • No space with em dash: left—right


Table of spaces

Unicode defines several space characters with specific semantics and rendering characteristics, as shown in the table below.

Space characters defined in Unicode
Code | Name | Block | Description
U+0020 | Space | Basic Latin | Normal space, same as ASCII character 0x20.
U+00A0 | No-Break Space | Latin-1 Supplement | Identical to U+0020, but not a point at which a line may be broken.
U+1680 | Ogham Space Mark | Ogham | Used for interword separation in Ogham text. Normally a vertical line in vertical text or a horizontal line in horizontal text, but may also be a blank space in "stemless" fonts. Requires an Ogham font.
U+180E | Mongolian Vowel Separator, or MVS | Mongolian | A narrow space character (not to be confused with "thin space", below) used in Mongolian to cause the final two characters of a word to take on different shapes.
U+2002 | En Space, or Nut | General Punctuation | Width of one en (half of one em). U+2000 En Quad is canonically equivalent to this character (En Space is preferred).
U+2003 | Em Space, or Mutton | General Punctuation | Width of one em. U+2001 Em Quad is canonically equivalent to this character (Em Space is preferred).
U+2004 | Three-Per-Em Space, or Thick Space | General Punctuation | One third of an em wide.
U+2005 | Four-Per-Em Space, or Mid Space | General Punctuation | One fourth of an em wide.
U+2006 | Six-Per-Em Space | General Punctuation | One sixth of an em wide. In computer typography sometimes equated to U+2009.
U+2007 | Figure Space | General Punctuation | In fonts with monospaced digits, equal to the width of one digit.
U+2008 | Punctuation Space | General Punctuation | As wide as the narrow punctuation in a font.
U+2009 | Thin Space | General Punctuation | One fifth (sometimes one sixth) of an em wide. Recommended for use as a thousands separator for measures made with SI units. Unlike U+2002 to U+2008, its width may get adjusted in typesetting.
U+200A | Hair Space | General Punctuation | Thinner than a thin space.
U+200B | Zero Width Space, or ZWSP | General Punctuation | Used to indicate word boundaries to text processing systems when using scripts that do not use explicit spacing.
U+200C | Zero Width Non-Joiner, or ZWNJ | General Punctuation | When placed between two characters that would otherwise be connected, a ZWNJ causes them to be printed in their final and initial forms, respectively.
U+200D | Zero Width Joiner, or ZWJ | General Punctuation | When placed between two characters that would otherwise not be connected, a ZWJ causes them to be printed in their connected forms.
U+202F | Narrow No-Break Space | General Punctuation | Similar in function to U+00A0 No-Break Space. Introduced in Unicode 3.0 for Mongolian, to separate a suffix from the word stem without indicating a word boundary. When used with Mongolian, its width is usually one third of the normal space; in other contexts, its width resembles that of the Thin Space (U+2009), at least with some fonts.
U+205F | Medium Mathematical Space | General Punctuation | Used in mathematical formulae.
U+2060 | Word Joiner | General Punctuation | Identical to U+200B, but not a point at which a line may be broken. Introduced in Unicode 3.2 to replace the deprecated "zero width no-break space" function of the U+FEFF character.
U+3000 | Ideographic Space | CJK Symbols and Punctuation | As wide as a CJK character cell (fullwidth).
U+FEFF | Zero Width No-Break Space = Byte Order Mark (BOM) | Arabic Presentation Forms-B | Used primarily as a Byte Order Mark character. Use as an indication of non-breaking is deprecated as of Unicode 3.2. See U+2060 instead.
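Some of the properties in the table can be inspected with Python's standard `unicodedata` module. The sketch below is illustrative only: the helper name `describe` and the selection of code points are assumptions, and the reported values depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# A sample of the code points listed in the table.
SPACES = [0x0020, 0x00A0, 0x2002, 0x2003, 0x2009, 0x200A, 0x200B, 0x3000]

def describe(code_point):
    """Return (code, Unicode name, general category, str.isspace()) for a code point."""
    ch = chr(code_point)
    return (
        f"U+{code_point:04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        ch.isspace(),
    )

for cp in SPACES:
    print(describe(cp))

# Canonical equivalence: normalization maps En Quad to En Space
# and Em Quad to Em Space.
en_quad_normalized = unicodedata.normalize("NFC", "\u2000")
em_quad_normalized = unicodedata.normalize("NFC", "\u2001")
```

Note that the zero-width characters are format characters (category `Cf`) rather than space separators (`Zs`), so `str.isspace()` reports `False` for U+200B.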


Unicode also provides some visible characters to stand in for space when necessary in the "Control Pictures" block: the Symbol For Space (U+2420), the Blank Symbol (U+2422), and the Open Box (U+2423). The interpunct (·) is also often used to represent a space in word processing programs such as Microsoft Word.
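A minimal sketch of this kind of substitution in Python follows. The function name `show_spaces` and the mapping itself are hypothetical choices: ordinary spaces are shown as an interpunct and no-break spaces as the Open Box, but any of the stand-in characters mentioned above could be used instead.

```python
# Mapping from invisible spaces to visible stand-ins (an arbitrary choice).
VISIBLE = {
    "\u0020": "\u00b7",  # ordinary space shown as an interpunct
    "\u00a0": "\u2423",  # no-break space shown as OPEN BOX
}

def show_spaces(text):
    """Replace selected space characters with visible stand-ins."""
    return text.translate(str.maketrans(VISIBLE))

print(show_spaces("10\u00a0km to go"))  # 10␣km·to·go
```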

Use of the space in computing


In programming language syntax, spaces are frequently used to explicitly separate tokens. Aside from this use, spaces and other whitespace characters are usually ignored by modern programming languages. Exceptions are Haskell, ABC, and Python, which use the amount of whitespace in indentation to indicate the bounds of a block, and a whimsical language called Whitespace, where whitespace is the only meaningful syntactical element.
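Both behaviours can be observed in Python itself, using only the standard library. In this sketch, the helper names `token_strings` and `compiles` are illustrative assumptions: the first shows that spacing between tokens does not change the token stream, and the second shows that indentation does change whether a program is valid.

```python
import io
import tokenize

# Token types that only describe layout, not program content.
_LAYOUT = {
    tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
    tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT,
}

def token_strings(source):
    """Return the text of each non-layout token in a piece of Python source."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type not in _LAYOUT]

def compiles(source):
    """Return True if the source compiles, False on a syntax or indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False

# Spaces between tokens are ignored once the tokens are separated.
print(token_strings("x=1+2\n") == token_strings("x = 1  +  2\n"))  # True

# Indentation, however, marks the bounds of a block.
print(compiles("if True:\n    x = 1\n"))  # True
print(compiles("if True:\nx = 1\n"))      # False
```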

Text editors, word processors, and desktop publishing software differ in how they represent whitespace on the screen, and in how they represent spaces at the ends of lines longer than the screen or column width. In some cases, spaces are shown simply as blank space; in other cases they may be represented by an interpunct or other symbols. Many different characters (see the table of spaces above) can be used to produce spaces, and non-character functions (such as margins and tab settings) can also affect whitespace.

Space characters in markup languages

Generalised markup languages, such as SGML, do not treat space characters differently from other characters.

However, special-purpose markup languages may. In particular, web markup languages such as XML and HTML treat whitespace characters, including space characters, specially for programmers' convenience. One or more space characters read by conforming display-time processors of those markup languages are collapsed to zero or one space, depending on their semantic context. For example, double (or more) spaces within text are collapsed to a single space, and spaces which appear on either side of the "=" that separates an attribute name from its value have no effect on the interpretation of the document. Element end tags can contain trailing spaces, and empty-element tags in XML can contain spaces before the "/>".

In XML attribute values, each whitespace character (tab, line feed, or carriage return) is normalized to a space when the document is read by a parser; for attributes declared with a type other than CDATA, sequences of spaces are additionally collapsed to a single space and leading and trailing spaces are removed. Whitespace in XML element content is not changed in this way by the parser, but an application receiving information from the parser may choose to apply similar rules to element content. An XML document author can use the xml:space="preserve" attribute on an element to signal that downstream applications should not alter whitespace in that element's content.
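The difference between attribute values and element content can be observed with the expat-based parser in Python's standard library. This is a minimal sketch: the element name `note` and attribute name `label` are invented for illustration, and other parsers or applications may process element content further.

```python
import xml.etree.ElementTree as ET

# The attribute value contains a literal tab and a literal line feed;
# the element content contains runs of several spaces.
doc = '<note label="a\tb\nc">  two   spaces  </note>'
elem = ET.fromstring(doc)

# Whitespace in the attribute value is normalized to spaces by the parser.
print(repr(elem.get("label")))  # 'a b c'

# Whitespace in element content is passed through unchanged.
print(repr(elem.text))          # '  two   spaces  '
```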

In most HTML elements, a sequence of whitespace characters is treated as a single inter-word separator, which may manifest as a single space character when rendering text in a language that normally inserts such a space between words. Conforming HTML renderers are required to apply a more literal treatment of whitespace within a few prescribed elements, such as the pre element and any element to which CSS has been used to apply pre-like whitespace processing. In such elements, space characters are not "collapsed" into inter-word separators.

In both XML and HTML, the non-breaking space character, along with other non-"standard" spaces, is not treated as collapsible "whitespace", so it is not subject to the rules above.
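The collapsing rule can be approximated in a few lines of Python. This is a simplified model and not a browser's actual algorithm: the function name `collapse_whitespace` is hypothetical, and the set of collapsible characters is assumed to be tab, line feed, form feed, carriage return, and space.

```python
import re

# Characters assumed collapsible: space, tab, line feed, form feed, carriage return.
_COLLAPSIBLE = re.compile(r"[ \t\n\f\r]+")

def collapse_whitespace(text):
    """Replace each run of collapsible whitespace with a single space."""
    return _COLLAPSIBLE.sub(" ", text)

# Runs of ordinary whitespace become one space.
print(repr(collapse_whitespace("one   two\n\tthree")))  # 'one two three'

# No-break spaces (U+00A0) are not in the collapsible set and survive intact.
print(repr(collapse_whitespace("10\u00a0\u00a0km")))    # '10\xa0\xa0km'
```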

See also

  • Hard space
  • Hyphenation
  • Internal field separator
  • Non-breaking space

