Soft hyphen
Encyclopedia
In computing and typesetting, a soft hyphen is a type of hyphen
Hyphen
The hyphen is a punctuation mark used to join words and to separate syllables of a single word. The use of hyphens is called hyphenation. The hyphen should not be confused with dashes , which are longer and have different uses, or with the minus sign which is also longer...

 used to specify a place in text where a hyphenated break is allowed without forcing a line break in an inconvenient place if the text is re-flowed.

Additional semantics associated with the soft hyphen vary. According to the Unicode
Unicode
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...

 standard, a soft hyphen is not displayed if the line is not broken at that point. HTML4 describes it as a "hyphenation hint," though it suggests that that interpretation is not universal:
In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent
User agent
In computing, a user agent is a client application implementing a network protocol used in communications within a client–server distributed computing system...

 as just another character. The soft hyphen tells the user agent where a line break can occur. Those browsers that interpret soft hyphens must observe the following semantics: If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. For operations such as searching and sorting, the soft hyphen should always be ignored.


ISO 8859-1 specifies that it is always visible. EBCDIC
EBCDIC
Extended Binary Coded Decimal Interchange Code is an 8-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems....

 has a SHY character, with "SHY" an abbreviation for "syllable
Syllable
A syllable is a unit of organization for a sequence of speech sounds. For example, the word water is composed of two syllables: wa and ter. A syllable is typically made up of a syllable nucleus with optional initial and final margins .Syllables are often considered the phonological "building...

 hyphen," which is defined by IBM to mean a "hyphen used to divide a word at the end of a line [that] may be removed when a program adjusts lines."

In most parts of ISO-8859 the soft hyphen is at position 0xAD (hexadecimal
Hexadecimal
In mathematics and computer science, hexadecimal is a positional numeral system with a radix, or base, of 16. It uses sixteen distinct symbols, most often the symbols 0–9 to represent values zero to nine, and A, B, C, D, E, F to represent values ten to fifteen...

), and since the first 256 positions in Unicode are taken from ISO-8859-1, it has a Unicode codepoint of U+00AD. HTML
HTML
HyperText Markup Language is the predominant markup language for web pages. HTML elements are the basic building-blocks of webpages....

 3.2 introduced a character entity for the soft hyphen, "­". In TeX
TeX
TeX is a typesetting system designed and mostly written by Donald Knuth and released in 1978. Within the typesetting system, its name is formatted as ....

 and LaTeX
LaTeX
LaTeX is a document markup language and document preparation system for the TeX typesetting program. Within the typesetting system, its name is styled as . The term LaTeX refers only to the language in which documents are written, not to the editor used to write those documents. In order to...

 the soft hyphen is represented by the command \- .

To show the effect of a soft hyphen, the following “wocka”s have been separated with soft hyphens

On browsers supporting soft hyphens, resizing the window will hyphenate the above text only as “wocka-”s (and never with, for example, “wock-”). On browsers not supporting soft hyphens, the above might appear as one very long line, or as several lines but without hyphens at the end of each broken line.

Compare soft hyphen's semantics and HTML implementation with the zero-width space
Zero-width space
The zero-width space is a non-printing character used in computerized typesetting to indicate word boundaries to text processing systems when using scripts that do not use explicit spacing, or after characters that are not followed by a visible space but after which there may nevertheless be a...

.

Accessibility issues

Soft hyphens are known to cause some text-to-speech systems to mispronounce words.

Security issues

Soft hyphens have been used to obscure
Obfuscated code
Obfuscated code is source or machine code that has been made difficult to understand for humans. Programmers may deliberately obfuscate code to conceal its purpose or its logic to prevent tampering, deter reverse engineering, or as a puzzle or recreational challenge for someone reading the source...

 malicious domains
Domain name
A domain name is an identification string that defines a realm of administrative autonomy, authority, or control in the Internet. Domain names are formed by the rules and procedures of the Domain Name System ....

 or URLs in E-mail spam
E-mail spam
Email spam, also known as junk email or unsolicited bulk email , is a subset of spam that involves nearly identical messages sent to numerous recipients by email. Definitions of spam usually include the aspects that email is unsolicited and sent in bulk. One subset of UBE is UCE...

.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK