A string literal is the representation of a string value within the source code of a computer program. There are numerous alternate notations for specifying string literals, and the exact notation depends on the individual programming language in question. Nevertheless, there are some general guidelines that most modern programming languages follow.
Specifically, most string literals can be specified using:
declarative notation
whitespace delimiters (indentation)
bracketed delimiters (quoting)
escape characters
or a combination of some or all of the above.
Declarative notation
In early versions of the FORTRAN programming language, for example, string literals were written in so-called Hollerith notation: a decimal count of the number of characters, followed by the letter H, and then the characters of the string:
27HAn example Hollerith string
This declarative notation style is contrasted with bracketed delimiter quoting, because it does not require the use of balanced "bracketed" characters on either side of the string.
Advantages:
eliminates text searching (for the delimiter character) and therefore requires significantly less overhead
avoids the problem of delimiter collision
enables the inclusion of metacharacters that might otherwise be mistaken as commands
can be used for quite effective data compression of plain text strings
Drawbacks:
this type of notation is error-prone if used for manual entry by programmers
This is not a drawback, however, when the prefix is generated by an algorithm, as is most likely the case.
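The following is a minimal Python sketch of what such an algorithm might look like. It shows both generating a Hollerith prefix and reading a literal back by its count alone. The helper names `to_hollerith` and `read_hollerith` are hypothetical and do not come from any FORTRAN toolchain. The reader never searches for a closing delimiter, which is why quotes inside the text need no escaping.

```python
DIGITS = "0123456789"


def to_hollerith(text: str) -> str:
    """Hypothetical helper: encode text as <length>H<text>."""
    return f"{len(text)}H{text}"


def read_hollerith(source: str, pos: int = 0) -> tuple[str, int]:
    """Hypothetical helper: read one Hollerith literal starting at source[pos].

    Returns the string and the index just past it. No delimiter search is
    needed; the declared count says exactly how many characters to take.
    """
    start = pos
    while pos < len(source) and source[pos] in DIGITS:
        pos += 1
    if pos == start or pos >= len(source) or source[pos] != "H":
        raise ValueError(f"no Hollerith literal at index {start}")
    count = int(source[start:pos])
    body_start = pos + 1                      # skip the 'H'
    end = body_start + count
    if end > len(source):
        raise ValueError("declared length runs past the end of the input")
    return source[body_start:end], end


literal = to_hollerith('He said "Hi"')    # quotes need no escaping
print(literal)                            # 12HHe said "Hi"
print(read_hollerith(literal + "3HABC"))  # ('He said "Hi"', 15)
```

Because the count is computed rather than typed, the error-prone step noted above disappears.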
Whitespace delimiters
In YAML, string literals may be specified by the relative positioning of whitespace and indentation.
- title: An example multi-line string in YAML
  body: |
    This is a multi-line string.
    "special" metacharacters may
    appear here. The extent of this string is
    indicated by indentation.
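The following simplified Python sketch shows how indentation alone can mark where such a literal ends. The hypothetical helper `read_literal_block` takes the lines after a `|` indicator. It treats the indentation of the first non-empty line as the block's indentation and stops at the first line indented less than that. It is an illustration under these assumptions, not a YAML parser: chomping indicators, explicit indentation indicators and tabs are ignored.

```python
def read_literal_block(lines: list[str]) -> str:
    """Hypothetical, simplified reader for a YAML-style literal block ('|')."""
    indent = None
    body = []
    for line in lines:
        if not line.strip():          # blank lines stay inside the block
            body.append("")
            continue
        width = len(line) - len(line.lstrip(" "))
        if indent is None:
            indent = width            # first content line fixes the indent
        elif width < indent:
            break                     # a dedent ends the string
        body.append(line[indent:])
    while body and body[-1] == "":   # keep a single final newline ("clip")
        body.pop()
    return "\n".join(body) + "\n" if body else ""


doc = """\
- title: An example multi-line string in YAML
  body: |
    This is a multi-line string.
    "special" metacharacters may
    appear here. The extent of this string is
    indicated by indentation.
- title: next item
"""
lines = doc.splitlines()
print(read_literal_block(lines[2:]), end="")
```

The quote characters in the body need no escaping, because no character, only the dedent, ends the literal.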
Bracketed delimiters
Most modern programming languages use bracketed delimiters (also called balanced delimiters, or quoting) to specify string literals. Double quotes are the most common quoting delimiters used:
"Hi There!"
Some languages also allow the use of single quotes as an alternative to double quotes (though the string must begin and end with the same kind of quotation mark):
'Hi There!'
Note that these quotation marks are unpaired (the same character is used as an opener and a closer), which is a hangover from the typewriter technology that was the precursor of the earliest computer input and output devices. The Unicode character set includes paired (separate opening and closing) versions of both single and double quotes:
“Hi There!”
‘Hi There!’
The paired double quotes can be used in Visual Basic .NET.
The PostScript programming language uses parentheses, with embedded newlines allowed, and also embedded unescaped parentheses provided they are properly paired:
(The quick
(brown fox))
The Tcl programming language uses braces (embedded newlines allowed, embedded unescaped braces allowed provided they are properly paired):
{The quick
{brown fox}}
This practice derives on one hand from the single quotes in Unix shells (these are raw strings) and on the other from the use of braces in C for compound statements, since blocks of code in Tcl are syntactically the same thing as string literals. That the delimiters are paired is essential for making this feasible.
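Paired delimiters let a scanner find the end of the literal by counting nesting depth rather than by stopping at the first closing character. The following is a minimal Python sketch of that idea. The function `read_braced` is hypothetical, and for simplicity it ignores the backslash escapes that real Tcl and PostScript also recognize.

```python
def read_braced(source: str, pos: int = 0,
                open_ch: str = "{", close_ch: str = "}") -> tuple[str, int]:
    """Hypothetical scanner for a paired-delimiter literal.

    Nested pairs are kept verbatim in the result; the literal ends only
    when the nesting depth returns to zero. Escapes are not handled.
    """
    if source[pos] != open_ch:
        raise ValueError(f"expected {open_ch!r} at index {pos}")
    depth = 0
    for i in range(pos, len(source)):
        ch = source[i]
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:
                return source[pos + 1:i], i + 1
    raise ValueError("unbalanced delimiters")


print(read_braced("{The quick\n{brown fox}} jumps"))
print(read_braced("(The quick\n(brown fox))", 0, "(", ")"))
```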
Delimiter collision
Delimiter collision is a common problem for string literal notations that use
balanced delimiters and quoting. The problem occurs when a programmer attempts to use a quoting character as part of the string literal itself. Because this is a very common problem, a number of methods for avoiding delimiter collision have been invented.
Dual quoting
Some languages (e.g., Modula-2, JavaScript) attempt to avoid the delimiter collision problem by allowing a dual quoting style. Typically, this consists of allowing the programmer to use either single quotes or double quotes interchangeably.
"This is John's apple."
'I said, "Can you hear me?"'
One problem with dual quoting is that it doesn't allow for the inclusion of both styles
of quotes at once within the same literal (unless escaped, see below).
Some programming languages allow subtle variations on dual quoting, treating single quotes and double quotes slightly differently (e.g., sh, Perl).
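Python is one language with plain dual quoting, and it serves here as an illustration. Its two quote characters behave identically, unlike sh and Perl. The sketch also shows the limitation noted above: a literal containing both kinds of quote still needs an escape.

```python
# Either quote character may delimit the literal, so the other one
# can appear inside it unescaped.
apple = "This is John's apple."
heard = 'I said, "Can you hear me?"'

# Both kinds at once: one of them must be escaped.
both = 'John said, "It\'s my apple."'

print(apple)
print(heard)
print(both)   # John said, "It's my apple."
```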
Escape character
One method for avoiding delimiter collision is to use escape characters:
"I said, \"Can you hear me?\""
The most commonly used escape character for this purpose is the backslash "\", the tradition for which originated on Unix. From a language design standpoint, this approach is adequate, but there are drawbacks:
text can be rendered unreadable when littered with numerous escape characters
escape characters themselves must be escaped when not intended as escape characters
although easy to type, they can be cryptic to someone unfamiliar with the language
"I said, \"The Windows path is C:\\Foo\\Bar\\Baz\""
The confusing presence of too many escape and slash characters in a string is commonly disparaged as leaning toothpick syndrome.
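The second drawback above, escaping the escape character itself, matters most when a program writes literals for another program. The following Python sketch uses a hypothetical helper, `to_double_quoted_literal`, and assumes only backslashes and double quotes need escaping; newlines and other control characters are not handled. The backslash must be replaced first, or the backslashes added for the quotes would be doubled again. `ast.literal_eval` is used only to check that Python reads the result back as the original text.

```python
import ast


def to_double_quoted_literal(text: str) -> str:
    """Hypothetical helper: quote text with backslash escapes.

    Assumes the text contains no newlines or other control characters.
    Order matters: escape existing backslashes before adding new ones.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


original = 'I said, "The Windows path is C:\\Foo\\Bar\\Baz"'
literal = to_double_quoted_literal(original)
print(literal)   # "I said, \"The Windows path is C:\\Foo\\Bar\\Baz\""

# Round trip through Python's own literal parser.
print(ast.literal_eval(literal) == original)   # True
```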
Escape sequence
An extended concept of the escape character, an escape sequence is also a means of avoiding
delimiter collision. An escape sequence consists of two or more consecutive characters that can have
special meaning when used in the context of a string literal.
"I said, \x22Can you hear me?\x22"
Escape sequences can also be used for purposes other than avoiding delimiter collision, and can also include metacharacters (see Metacharacters below).
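Python string literals accept the same backslash sequences as C for these cases, so they serve as a quick way to try them out. The following sketch checks the `\x22` example above, an octal code, and a few nonprinting characters. One difference worth hedging: Python's `\x` takes exactly two hex digits, whereas C's `\x` consumes as many hex digits as follow.

```python
s = "I said,\t\t\x22Can you hear me?\x22\n"
print(repr(s))                     # 'I said,\t\t"Can you hear me?"\n'

print("\x22" == '"')               # True: hex escape for the double quote
print("\101" == "A" == "\x41")     # True: octal and hex codes for 'A'
print([ord(c) for c in "\b\t\n"])  # [8, 9, 10]: backspace, tab, newline
```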
Double-up escape sequence
Some languages (e.g., Pascal, BASIC and DCL) avoid delimiter collision by doubling up on the quotation marks that are intended to be part of the string literal itself:
'This Pascal string'' contains two apostrophes'''
"I said, ""Can you hear me?"""
Extended quoting styles
Some languages extend the previously-mentioned quoting conventions even further. These extended approaches provide an even more flexible style of notation for avoiding delimiter collision.
Triple quoting:
One such extension, triple quoting, is used in Python:
'''This is John's apple.'''
"""John is Nancy's so-called "boyfriend"."""
Triple-quoted string literals may be delimited by """ or '''. Triple quoting in Python also has the added benefit of allowing string literals to span more than one physical line of source code.
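The following is a short Python check of the triple-quoted examples above. It shows that single quotes and double quotes can both appear unescaped, and that a line break inside the literal becomes part of the string.

```python
apple = '''This is John's apple.'''
boyfriend = """John is Nancy's so-called "boyfriend"."""

# A triple-quoted literal may span physical lines; the break is kept.
two_lines = """first line
second line"""

print(boyfriend)
print(two_lines.splitlines())   # ['first line', 'second line']
```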
Multiple quoting:
Another such extension is the use of multiple quoting, which allows the author to choose which characters should specify the bounds of a string literal.
For example, in Perl:
qq^I said, "Can you hear me?"^
qq@I said, "Can you hear me?"@
qq§I said, "Can you hear me?"§
all produce the desired result.
Although this notation is more flexible, few languages support it; Perl and Ruby are two that do.
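Because the author chooses the delimiter, a code generator can simply pick one that does not occur in the text. The following Python sketch emits Perl `qq` literals that way. The helper `perl_qq` and its candidate list are hypothetical. It tries only non-bracketing delimiters, so Perl's nesting rules for bracket pairs never apply. It also does not escape `$`, `@` or backslashes, which `qq` would still interpolate.

```python
def perl_qq(text: str, candidates: str = "^@~|!%") -> str:
    """Hypothetical helper: wrap text in a Perl qq literal whose
    delimiter does not occur in the text.

    Does not escape $, @ or backslashes, which Perl's qq would interpret.
    """
    for delim in candidates:
        if delim not in text:
            return f"qq{delim}{text}{delim}"
    raise ValueError("every candidate delimiter occurs in the text")


print(perl_qq('I said, "Can you hear me?"'))  # qq^I said, "Can you hear me?"^
print(perl_qq("2^3 @ 5"))                     # qq~2^3 @ 5~
```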
Here documents
A Here document is an alternate quoting notation that allows the programmer
to specify an arbitrary unique identifier as a content boundary for a string literal.
This avoids delimiter collision, and also preserves newlines in the source code
as newlines in the string literal itself.
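The following Python sketch builds a POSIX shell here document. It shows the one rule that makes the notation safe: the terminator must not appear as a whole line of the text. The helper `make_heredoc` and its numeric-suffix strategy are hypothetical. Quoting the terminator (`'EOF'`) is standard shell behavior that disables `$` and backquote expansion in the body.

```python
def make_heredoc(text: str, command: str = "cat", tag: str = "EOF") -> str:
    """Hypothetical helper: wrap text in a shell here document.

    A numeric suffix is added to the terminator until no line of the
    text equals it. The quoted terminator turns off shell expansion.
    """
    lines = text.split("\n")
    unique, n = tag, 0
    while unique in lines:
        n += 1
        unique = f"{tag}{n}"
    return f"{command} <<'{unique}'\n{text}\n{unique}\n"


print(make_heredoc('He said "hi"\nEOF\nand left.'), end="")
# cat <<'EOF1'
# He said "hi"
# EOF
# and left.
# EOF1
```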
Metacharacters
Many languages support the use of metacharacters inside string literals. Metacharacters have varying interpretations depending on the context and language, but are generally a kind of 'processing command' for representing printing or nonprinting characters.
For instance, in a C string literal, if the backslash is followed by a letter such as "b", "n" or "t", then this represents a nonprinting backspace, newline or tab character respectively. If the backslash is followed by one to three octal digits, then this sequence is interpreted as representing the arbitrary character with the specified character code (typically ASCII). This was later extended to allow more modern hexadecimal character code notation:
"I said,\t\t\x22Can you hear me?\x22\n"
Raw strings
A few languages provide a method of specifying that a literal is to be processed without any language specific interpretation.
In Python, 'raw strings' are preceded by an r. In such strings, backslashes are not interpreted as escape sequences, making it simpler to write DOS/Windows paths and regular expressions:
r"The Windows path is C:\Foo\Bar\Baz\ "
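The following Python sketch compares a raw literal with its escaped equivalent and uses one for a regular expression. It also bears on the example above: a Python raw literal cannot end in a single backslash (`r"C:\Foo\"` is a syntax error), which is presumably why that example places a space before the closing quote.

```python
import re

# Raw literal: backslashes are ordinary characters.
path = r"C:\Foo\Bar\Baz"
# The same text as an ordinary literal needs every backslash doubled.
escaped = "C:\\Foo\\Bar\\Baz"

# Regular expressions use backslashes heavily, so raw literals suit them.
number = re.compile(r"\d+\.\d+")

print(path == escaped)                  # True
print(bool(number.fullmatch("3.14")))   # True
```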
C#'s notation is called @-quoting:
@"C:\Foo\Bar\Baz\"
This notation also allows doubled-up quotes:
@"I said, ""Hello there."""
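The doubled-up quote convention used by Pascal, BASIC and DCL, and inside C#'s @-quoted strings, is easy to generate and to undo. The following Python sketch uses two hypothetical helpers, `quote_doubled` and `unquote_doubled`; Python itself does not use this convention. The reader assumes a well-formed literal in which every embedded quote is doubled.

```python
def quote_doubled(text: str, quote: str = "'") -> str:
    """Hypothetical helper: quote text, writing each embedded quote twice."""
    return quote + text.replace(quote, quote * 2) + quote


def unquote_doubled(literal: str, quote: str = "'") -> str:
    """Hypothetical helper: strip outer quotes and collapse doubled ones.

    Assumes a well-formed literal (every embedded quote is doubled).
    """
    if len(literal) < 2 or literal[0] != quote or literal[-1] != quote:
        raise ValueError("not a quoted literal")
    return literal[1:-1].replace(quote * 2, quote)


print(quote_doubled('I said, "Can you hear me?"', '"'))
# "I said, ""Can you hear me?"""
print("@" + quote_doubled('I said, "Hello there."', '"'))
# @"I said, ""Hello there."""
```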
In XML, CDATA sections allow the use of characters such as & and < without an XML parser attempting to interpret them as part of the structure of the document itself. This can be useful when including literal text and scripting code, to keep the document well-formed.
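The following Python sketch uses the standard library's `xml.etree.ElementTree` to show the effect. Script text inside a CDATA section comes back verbatim, and it equals the same text written outside CDATA with entity references. The element name `script` and its contents are arbitrary illustrations.

```python
import xml.etree.ElementTree as ET

with_cdata = "<script><![CDATA[if (a < b && b > c) { run(); }]]></script>"
root = ET.fromstring(with_cdata)
print(root.text)   # if (a < b && b > c) { run(); }

# Without CDATA, the same characters must be written as entities.
with_entities = "<script>if (a &lt; b &amp;&amp; b &gt; c) { run(); }</script>"
print(ET.fromstring(with_entities).text == root.text)   # True
```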
Variable interpolation
Languages differ on whether and how to interpret string literals as either
'raw' or 'variable interpolated'. Variable interpolation is the process
of evaluating an expression containing one or more variables, and returning
output where the variables are replaced with their corresponding values in
memory.
In sh-compatible Unix shells, quote-delimited (") strings are interpolated, while apostrophe-delimited (') strings are not.
For example, the following Perl code:
$sName = "Nancy";
$sGreet = "Hello World";
print "$sName said $sGreet to the crowd of people.";
produces the output:
Nancy said Hello World to the crowd of people.
The sigil character ($) is interpreted to indicate variable interpolation.
Similarly, the printf function produces the same output using notation such as:
printf "%s said %s to the crowd of people.", ($sName,$sGreet);
The metacharacters (%s) indicate variable interpolation.
This is contrasted with "raw" strings:
print '$sName said $sGreet to the crowd of people.';
which produces the output:
$sName said $sGreet to the crowd of people.
Here the $ characters are not sigils, and are not interpreted to have any meaning other than plain text.
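For comparison, the following Python sketch shows the same three cases as the Perl examples above. An f-string interpolates its `{...}` fields, `%`-formatting plays the role of printf, and a plain literal leaves the placeholder text alone. Python marks interpolation with a literal prefix rather than with the choice of quote character.

```python
name = "Nancy"
greet = "Hello World"

# f-string: the {...} fields are evaluated and interpolated.
interpolated = f"{name} said {greet} to the crowd of people."
# printf-style: %s placeholders are filled from a tuple.
formatted = "%s said %s to the crowd of people." % (name, greet)
# Plain literal: the braces are just text.
plain = "{name} said {greet} to the crowd of people."

print(interpolated)   # Nancy said Hello World to the crowd of people.
print(formatted)      # Nancy said Hello World to the crowd of people.
print(plain)          # {name} said {greet} to the crowd of people.
```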
REXX uses suffix characters to specify characters or strings using their hexadecimal or binary code. For example,
'20'x
"0010 0000"b
"00100000"b
all yield the space character, avoiding the function call X2C(20).
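For contrast, the following Python sketch builds the same space character three ways. Only the `\x20` escape is part of the literal syntax. The hex and binary forms need run-time conversion calls, which is the kind of call REXX's suffix notation avoids.

```python
# Literal syntax: a hex escape inside the string.
from_hex = "\x20"
# Run-time conversions, analogous to calling X2C in REXX.
from_hex_text = bytes.fromhex("20").decode("ascii")
from_bin_text = chr(int("0010 0000".replace(" ", ""), 2))

print(from_hex == from_hex_text == from_bin_text == " ")   # True
```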
Embedding source code in string literals
Languages that lack flexibility in specifying string literals make
it particularly cumbersome to write programming code that generates
other programming code. This is particularly true when the generation
language is the same or similar to the output language.
For example:
writing code to produce quines;
generating an output language from within a web template;
using SQL to generate more SQL;
generating a PostScript representation of a document for printing purposes, from within a document-processing application written in C or some other language.
Nevertheless, some languages are particularly well-adapted to produce
this sort of self-similar output, especially those that support multiple options
for avoiding delimiter collision.
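A well-known Python quine illustrates the point. In the two statements below, `%r` inserts the `repr` of the string. That representation picks a suitable quote character and escapes the newline, so the program never has to handle delimiter collision in its own source by hand.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Run on their own, these two lines print exactly themselves.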
Using string literals as code that generates
other code may have adverse security implications, especially if the output is based at least partially on untrusted
user input. This is particularly acute in the case of Web-based applications, where malicious users can take advantage of such weaknesses to subvert the operation of the application, for example by mounting an SQL injection attack.
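The following minimal sketch uses Python's standard `sqlite3` module and an in-memory database. The table, rows and hostile input are invented for illustration. Pasting untrusted input inside an SQL string literal lets its quote character end the literal early. A parameter placeholder keeps the input as data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a1"), ("bob", "b2")])

user_input = "nobody' OR '1'='1"   # closes the quote, then adds SQL code

# Unsafe: the input becomes part of the SQL string literal, so its
# apostrophe ends the literal and the rest is executed as code.
unsafe_sql = f"SELECT name FROM users WHERE name = '{user_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the ? placeholder passes the input as a value, never as SQL.
safe_rows = conn.execute("SELECT name FROM users WHERE name = ?",
                         (user_input,)).fetchall()

print(len(unsafe_rows), len(safe_rows))   # 2 0
```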