Newline

 









In computing, a newline (also known as a line break or end-of-line / EOL character) is a special character or sequence of characters signifying the end of a line of text. The name comes from the fact that the next character after the newline will appear on a new line, that is, on the line below the text immediately preceding the newline. The actual codes representing a newline vary across hardware platforms and operating systems, which can be a problem when exchanging data between systems with different representations.

There is also some confusion about whether newlines terminate or separate lines. If a newline is considered a separator, there will be no newline after the last line of a file. The general convention on most systems is to add a newline even after the last line, i.e., to treat newline as a line terminator. Some programs have problems processing the last line of a file if it is not newline-terminated. Conversely, programs that expect newline to be used as a separator will interpret a final newline as starting a new (empty) line. This can result in a different line count being reported for the file, but is otherwise generally harmless.
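The difference in reported line counts can be sketched in a few lines of Python. The two function names are hypothetical and exist only for this illustration; the sketch assumes LF is the newline character.

```python
def count_lines_as_terminator(text: str) -> int:
    """Count lines when each newline ends (terminates) a line."""
    # Every "\n" closes one line; trailing text without "\n" counts as
    # one more, unterminated, line.
    count = text.count("\n")
    if text and not text.endswith("\n"):
        count += 1
    return count


def count_lines_as_separator(text: str) -> int:
    """Count lines when each newline sits between (separates) two lines."""
    # n separators always delimit n + 1 lines, even if the last is empty.
    return len(text.split("\n"))


sample = "first\nsecond\n"
print(count_lines_as_terminator(sample))
print(count_lines_as_separator(sample))
```

For text that ends in a newline the two interpretations differ by one; for text without a final newline they agree.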

In text intended primarily to be read by humans using software which implements the word wrap feature, a newline character typically only needs to be stored if a line break is required independent of whether the next word would fit on the same line, such as between paragraphs and in vertical lists. See hard return and soft return.

Representations

Software applications and operating systems usually represent a newline with one or two control characters:

  • Systems based on ASCII or a compatible character set use either LF (Line Feed, 0x0A) or CR (Carriage Return, 0x0D) individually, or CR followed by LF (CR+LF, 0x0D 0x0A); see below for the historical reason for the CR+LF convention. These characters are based on printer commands: the line feed indicated that one line of paper should feed out of the printer, and a carriage return indicated that the printer carriage should return to the beginning of the current line.
    • LF:    Multics, Unix and Unix-like systems (GNU/Linux, AIX, Xenix, Mac OS X, FreeBSD, etc.), BeOS, Amiga, RISC OS, and others
    • CR+LF: DEC RT-11 and most other early non-Unix, non-IBM OSes, CP/M, MP/M, DOS, OS/2, Microsoft Windows, Symbian OS
    • CR:    Commodore machines, Apple II family, Mac OS up to version 9 and OS-9
  • EBCDIC systems, mainly IBM mainframe systems including z/OS (OS/390) and i5/OS (OS/400), use NEL (Next Line, 0x15) as the newline character. Note that EBCDIC also has control characters called CR and LF, but the numerical value of LF (0x25) differs from the one used by ASCII (0x0A). Additionally, there are some EBCDIC variants that also use NEL but assign a different numeric code to the character.
  • Operating systems for the CDC 6000 series defined a newline as two or more zero-valued six-bit characters at the end of a 60-bit word. Some configurations also defined a zero-valued character as a colon character, with the result that multiple colons could be interpreted as a newline depending on position.
  • Many older systems stored the characters for each line in a separate "record". There was thus no line terminator character.
    • Many old mainframe operating systems added a carriage control character to the start of the next record. This character could indicate whether the next record was a continuation of the line started by the previous record, a new line, or a line that should overprint the previous one (similar to a CR). Often this was a normal printing character such as '#', which therefore could not be used as the first character in a line. Some early line printers interpreted these characters directly in the records sent to them.
    • OpenVMS uses a record-based file system, which stores text files as one record per line. In most file formats, no line terminators are actually stored, but the Record Management Services facility can transparently add a terminator to each line when it is retrieved by an application.
    • Fixed line length was used by some early mainframe operating systems. In such a system, an implicit end-of-line was assumed every 80 characters, for example. No newline character was stored. If a file was imported from the outside world, lines shorter than the line length had to be padded with spaces, while lines longer than the line length had to be truncated. This mimicked the use of punched cards, on which each line was stored on a separate card, usually with 80 columns on each card.
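As a rough illustration of the three ASCII conventions listed above, the following Python sketch tallies which newline sequences occur in a block of bytes. The function name `count_newline_styles` is hypothetical, and the sketch assumes ASCII-compatible data; it does not cover EBCDIC or record-based formats.

```python
from collections import Counter


def count_newline_styles(data: bytes) -> Counter:
    """Tally CR+LF, lone LF and lone CR sequences in ASCII-compatible bytes."""
    counts = Counter({"CRLF": 0, "LF": 0, "CR": 0})
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == 0x0D:  # CR
            if i + 1 < len(data) and data[i + 1] == 0x0A:
                counts["CRLF"] += 1
                i += 2  # consume the LF that belongs to this CR
                continue
            counts["CR"] += 1
        elif byte == 0x0A:  # LF not preceded by CR
            counts["LF"] += 1
        i += 1
    return counts


print(count_newline_styles(b"a\r\nb\nc\rd\r\n"))
```

A file in which more than one of the three counters is non-zero has probably been edited on systems with different conventions.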


Most textual Internet protocols (including HTTP, SMTP, FTP, IRC and many others) mandate the use of ASCII CR+LF (0x0D 0x0A) on the protocol level, but recommend that tolerant applications recognize lone LF as well. In practice, there are many applications that erroneously use the C newline character '\n' instead (see section Newline in programming languages below). This leads to problems when trying to communicate with systems adhering to a stricter interpretation of the standards; one such system is the qmail MTA, which actively refuses to accept messages from systems that send bare LF instead of the required CR+LF.
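The two attitudes described above, tolerant and strict, can be sketched in Python as follows. Both function names are hypothetical, and the strict check only mimics the kind of test a strict receiver might apply; it is not taken from qmail or any other real implementation.

```python
def split_protocol_lines(data: bytes) -> list:
    """Tolerant receiver: accept CR+LF or a lone LF as the line ending."""
    lines = []
    for raw in data.split(b"\n"):
        # Drop at most one CR directly before the LF.
        lines.append(raw[:-1] if raw.endswith(b"\r") else raw)
    # A terminated final line leaves an empty trailing piece; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines


def is_strictly_terminated(data: bytes) -> bool:
    """Strict receiver: True only if every LF is preceded by a CR."""
    return data.count(b"\n") == data.count(b"\r\n")


received = b"HELO a\r\nMAIL b\nQUIT\r\n"
print(split_protocol_lines(received))
print(is_strictly_terminated(received))
```

The sample input contains one bare LF, so the tolerant splitter still yields three lines while the strict check fails.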

Unicode

The Unicode standard addresses the problem by defining a large number of characters that conforming applications should recognize as line terminators:

 LF:    Line Feed, U+000A
 CR:    Carriage Return, U+000D
 CR+LF: CR followed by LF, U+000D followed by U+000A
 NEL:   Next Line, U+0085
 FF:    Form Feed, U+000C
 LS:    Line Separator, U+2028
 PS:    Paragraph Separator, U+2029

This may seem overly complicated compared to an approach such as converting all line terminators to a single character, for example LF. The simple approach breaks down, however, when trying to convert a text file from an encoding like EBCDIC to Unicode and back. When converting to Unicode, NEL would have to be replaced by LF, but when converting back it would be impossible to decide if an LF should be mapped to an EBCDIC LF or NEL. The approach taken in the Unicode standard allows round-trip transformation to be information-preserving while still enabling applications to recognize all possible types of line terminators.
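A minimal Python sketch of a splitter that recognizes exactly the terminators listed above might look like this. The names `LINE_TERMINATOR` and `split_unicode_lines` are hypothetical. Python's built-in `str.splitlines()` behaves similarly but also treats a few further control characters as line boundaries.

```python
import re

# The terminators listed above: LF, CR, NEL, FF, LS and PS.
# CR+LF comes first in the pattern so that it matches as a single unit.
LINE_TERMINATOR = re.compile("\r\n|[\n\r\x85\x0c\u2028\u2029]")


def split_unicode_lines(text: str) -> list:
    """Split text at any of the listed Unicode line terminators."""
    pieces = LINE_TERMINATOR.split(text)
    # A terminated final line leaves an empty trailing piece; drop it.
    if pieces and pieces[-1] == "":
        pieces.pop()
    return pieces


print(split_unicode_lines("a\r\nb\nc\rd\x85e\x0cf\u2028g\u2029h"))
```

Because the splitter leaves the original string untouched, the terminators themselves are not normalized, which is what keeps a round trip information-preserving.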

History

ASCII was developed simultaneously by the ISO and the ASA, the predecessor organization to ANSI. During the period of 1963–1968, the ISO draft standards supported the use of either CR+LF or LF alone as a newline, while the ASA drafts supported only CR+LF. The Multics operating system began development in 1964 and used LF alone as its newline. Unix followed the Multics practice, and later systems followed Unix.

The sequence CR+LF was in common use on many early computer systems that had adopted teletype machines, typically an ASR33, as a console device, because this sequence was required to position those printers at the start of a new line. On these systems, text was often routinely composed to be compatible with these printers, since the concept of device drivers hiding such hardware details from the application was not yet well developed; applications had to talk directly to the teletype machine and follow its conventions. The separation of the two functions concealed the fact that the print head could not return from the far right to the beginning of the next line in one-character time. That is why the sequence was always sent with the CR first. In fact, it was often necessary to send extra characters (extraneous CRs or NULs, which are ignored) to give the print head time to move to the left margin. Even after teletypes were replaced by computer terminals with higher baud rates, many operating systems still supported automatic sending of these fill characters, for compatibility with cheaper terminals that required multiple character times to scroll the display.

MS-DOS, built upon a CP/M clone called 86-DOS (which Microsoft purchased and renamed), adopted CP/M's CR+LF; CP/M's use of CR+LF made sense for using computer terminals via serial lines. This convention was inherited by Microsoft's later Windows operating system.

Newline in programming languages

To facilitate the creation of portable programs, programming languages provide some abstractions to deal with the different types of newline sequences used in different environments.

The C programming language provides the escape sequences '\n' (newline) and '\r' (carriage return). However, these are not required to be equivalent to the ASCII LF and CR control characters. The C standard only guarantees two things:
  1. Each of these escape sequences maps to a unique implementation-defined number that can be stored in a single char value.
  2. When writing a file in text mode, '\n' is transparently translated to the native newline sequence used by the system, which may be longer than one character. (Note that a C implementation is allowed not to store newline characters in files. For example, the lines of a text file could be stored as rows of a SQL table or as fixed-length records.) When reading in text mode, the native newline sequence is translated back to '\n'. In binary mode, the second mode of I/O supported by the C library, no translation is performed, and the internal representation of any escape sequence is output directly.
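The text-mode translation described above can be observed with a comparable mechanism in Python, whose `open()` function takes a `newline` argument. This is an analogy rather than C itself: in C the native sequence is fixed by the implementation, whereas the sketch below passes it explicitly (an assumption made so that the result is the same on every platform). The function name `round_trip` is hypothetical.

```python
import os
import tempfile


def round_trip(text: str, native_newline: str):
    """Write text in text mode, then read it back in binary and text mode."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text mode: each "\n" is translated to native_newline on output.
        with open(path, "w", encoding="ascii", newline=native_newline) as f:
            f.write(text)
        # Binary mode: no translation, the stored bytes come back unchanged.
        with open(path, "rb") as f:
            raw = f.read()
        # Text mode: the stored sequence is translated back to "\n" on input.
        with open(path, "r", encoding="ascii", newline=None) as f:
            back = f.read()
        return raw, back
    finally:
        os.remove(path)


print(round_trip("one\ntwo\n", "\r\n"))
```

The program only ever handles "\n"; the two-byte sequence exists solely in the stored file.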


On Unix platforms, where C originated, the native newline sequence is ASCII LF (0x0A), so '\n' was simply defined to be that value. With the internal and external representation being identical, the translation performed in text mode effectively turns into a no-op, making text mode and binary mode behave the same. This has caused many programmers who developed their software on Unix systems simply to ignore the distinction completely, resulting in code that is not portable to different platforms.

Another common problem is the use of '\n' when communicating using an Internet protocol that mandates the use of ASCII CR+LF for ending lines. Writing '\n' to a text mode stream works correctly on Windows systems, but produces only LF on Unix, and something completely different on more exotic systems. Using "\r\n" in binary mode is slightly better, as it works on many ASCII-compatible systems, but still fails in the general case. One approach is to use binary mode and specify the numeric values of the control sequence directly, "\x0D\x0A".
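The last approach, writing the numeric values to a binary stream, can be sketched in Python as follows. The helper name `write_protocol_lines` and the sample commands are illustrative assumptions; an in-memory `io.BytesIO` stands in for a network connection.

```python
import io

# Numeric values given explicitly, so nothing depends on what "\n" means
# on the platform or on any text-mode translation.
CRLF = b"\x0d\x0a"


def write_protocol_lines(stream, lines):
    """Write each line to a binary stream, terminated by the CR+LF bytes."""
    for line in lines:
        stream.write(line.encode("ascii") + CRLF)


buffer = io.BytesIO()
write_protocol_lines(buffer, ["HELO example.org", "QUIT"])
print(buffer.getvalue())
```

Because the stream is binary, the bytes reach the peer exactly as written on any platform.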

Many languages, such as C++ and Perl, provide the same interpretation of '\n' as C.

Java, PHP and Python also provide '\n' and '\r' escape sequences. In contrast to C, these are guaranteed to represent the values U+000A and U+000D, respectively. The Java I/O libraries do not transparently translate these into platform-dependent newline sequences on input or output. Instead, they provide functions for writing a full line that automatically add the native newline sequence, and functions for reading lines that accept any of CR, LF, or CR+LF as a line terminator. The method System.getProperty("line.separator") can be used to retrieve the underlying line separator.

Example:

    String newline = System.getProperty("line.separator");
    String lineColor = "Color: Red" + newline;
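The line-reading behaviour described above (accepting any of CR, LF, or CR+LF as a terminator) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Java library's implementation; the function name split_any_newline is hypothetical.

```python
import re

# CR+LF must be tried first so that it counts as one terminator, not two.
_TERMINATOR = re.compile(r"\r\n|\r|\n")


def split_any_newline(text):
    """Split text into lines, treating CR, LF and CR+LF as terminators."""
    lines = _TERMINATOR.split(text)
    # A trailing terminator ends the last line rather than starting a new one.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


if __name__ == "__main__":
    print(split_any_newline("first\r\nsecond\rthird\nfourth"))
```

Python's built-in str.splitlines does something similar but also splits on several other characters (for example form feed and U+2028), which is why the sketch uses an explicit pattern.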

Some languages provide special variables, constants and subroutines to facilitate newline handling during program execution. One example is the PHP constant PHP_EOL, which holds the newline sequence appropriate to the operating system the program is executed on: '\r\n' on Windows and '\n' on Unix-like systems. Though special newline handling facilities can help at run time, they do not ensure the validity of newlines in the source code itself.
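Python has a comparable facility in os.linesep, and its built-in open function takes a newline parameter that controls translation in text mode. The sketch below uses hypothetical helper names (write_lines, read_raw) and a temporary file to show that the bytes on disk depend on that parameter.

```python
import os
import tempfile


def write_lines(path, lines, newline=None):
    """Write lines in text mode; newline=None means the platform default."""
    # With newline=None, each "\n" written is translated to os.linesep.
    # Passing "\r\n" or "\n" forces that sequence regardless of platform.
    with open(path, "w", newline=newline, encoding="ascii") as f:
        for line in lines:
            f.write(line + "\n")


def read_raw(path):
    """Return the file's bytes exactly as stored, with no translation."""
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.txt")
        write_lines(path, ["Color: Red"])
        print(repr(os.linesep), read_raw(path))
```

Forcing newline="\r\n" or newline="\n" is the safer choice when the file format, rather than the local platform, dictates the line ending.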

Common problems

The different newline conventions often cause text files that have been transferred between systems of different types to be displayed incorrectly. For example, files originating on Unix or Apple Macintosh systems may appear as a single long line on a Windows system. Conversely, when viewing a file from a Windows computer on a Unix system, the extra CR may be displayed as ^M at the end of each line or as a second line break.

The problem can be hard to spot if some programs handle the foreign newlines properly while others don't. For example, a compiler may fail with obscure syntax errors even though the source file looks correct when displayed on the console or in an editor. Modern text editors generally recognize all flavours of CR / LF newlines and allow the user to convert between the different standards. Web browsers are usually also capable of displaying text files of different types.
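When a file looks correct in one program but fails in another, counting the terminators it actually contains can settle the question. The following Python sketch does that for a bytes object; the function name count_newline_styles is hypothetical.

```python
import re


def count_newline_styles(data):
    """Count CR+LF, lone LF and lone CR terminators in a bytes object."""
    counts = {"CRLF": 0, "LF": 0, "CR": 0}
    names = {b"\r\n": "CRLF", b"\n": "LF", b"\r": "CR"}
    # CR+LF is listed first so a pair is never counted as CR plus LF.
    for match in re.finditer(rb"\r\n|\n|\r", data):
        counts[names[match.group()]] += 1
    return counts


if __name__ == "__main__":
    print(count_newline_styles(b"one\r\ntwo\nthree\r"))
```

A file with more than one non-zero count has mixed line endings, which is a common cause of the inconsistent behaviour described above.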

The File Transfer Protocol can automatically convert newlines in files being transferred between systems with different newline representations when the transfer is done in "ASCII mode". However, transferring binary files in this mode usually has disastrous results: any occurrence of the newline byte sequence (which has no line-terminator semantics in this context, but is just part of a normal sequence of bytes) is translated to whatever newline representation the other system uses, effectively corrupting the file. FTP clients often employ heuristics (for example, inspection of filename extensions) to select binary or ASCII mode automatically, but in the end it is up to the user to make sure that files are transferred in the correct mode. If there is any doubt as to the correct mode, binary mode should be used, as FTP then alters no files, though text files may display incorrectly.
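The corruption can be illustrated with a deliberately simplified model of a text-mode transfer, in which every LF byte is rewritten as CR+LF. This is not how any particular FTP implementation works; the function name ascii_mode_to_crlf is hypothetical, and the sample bytes are the eight-byte PNG file signature, which contains both a CR+LF pair and a lone LF.

```python
def ascii_mode_to_crlf(data):
    """Mimic a text-mode transfer to a CR+LF system: every LF becomes CR+LF."""
    return data.replace(b"\n", b"\r\n")


if __name__ == "__main__":
    # Binary data in which 0x0A is an ordinary value, not a line terminator.
    binary = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
    damaged = ascii_mode_to_crlf(binary)
    print(len(binary), len(damaged))
    print(binary == damaged)
```

Both LF bytes gain an extra CR, so the data grows and no longer matches the original, which is enough to make most binary formats unreadable.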

Conversion utilities

Text editors are often used for converting a text file between different newline formats; most modern editors can read and write files using at least the different ASCII CR/LF conventions. Older versions of the standard Windows editor Notepad are not among them (though WordPad is).

On Windows systems without a better editor, the old MS-DOS editor EDIT that still ships with modern Windows versions is often used to convert a Unix text file to DOS/Windows newlines. This is done by creating a shortcut to EDIT on the desktop (context menu / New / Shortcut / "edit" / Next / Finish), dragging the text file in question onto it, and then saving the file again (File / Save).

Editors are often unsuitable for converting larger files. For larger files (on Windows NT/2000/XP) the following command is often used:

    TYPE unix_file | FIND "" /V > dos_file
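For large files, the same Unix-to-DOS conversion can be done one line at a time so that the whole file never has to fit in memory. The Python sketch below is one possible approach; the function name unix_to_dos_stream is hypothetical, and it works on any pair of binary file objects.

```python
import io


def unix_to_dos_stream(src, dst):
    """Copy src to dst, ending every LF-terminated line with CR+LF."""
    # Iterating a binary file yields chunks that end at each LF byte,
    # so only one line is held in memory at a time.
    for line in src:
        if line.endswith(b"\r\n"):
            dst.write(line)                 # already CR+LF, leave as is
        elif line.endswith(b"\n"):
            dst.write(line[:-1] + b"\r\n")  # lone LF becomes CR+LF
        else:
            dst.write(line)                 # final line without terminator


if __name__ == "__main__":
    source = io.BytesIO(b"first\nsecond\r\nthird")
    target = io.BytesIO()
    unix_to_dos_stream(source, target)
    print(target.getvalue())
```

With real files, src and dst would be opened with open(path, "rb") and open(path, "wb"). Lines that already end in CR+LF are left alone, so running the conversion twice does not double the CRs.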

On many Unix systems, the dos2unix (sometimes named fromdos or d2u) and unix2dos (sometimes named todos or u2d) utilities are used to translate between ASCII CR+LF (DOS/Windows) and LF (Unix) newlines. Different versions of these commands vary slightly in their syntax. However, the tr command is available on virtually every Unix-like system and is used to perform arbitrary replacement operations on single characters. A DOS/Windows text file can be converted to Unix format by simply removing all ASCII CR characters with

    tr -d '\r' < inputfile > outputfile

or, if the text has only CRs, by converting CRs to LFs with

    tr '\r' '\n' < inputfile > outputfile
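Where tr is not available, the same two single-character operations are easy to reproduce. The following Python sketch mirrors them on bytes objects; the function names strip_cr and cr_to_lf are hypothetical.

```python
def strip_cr(data):
    """Like tr -d '\\r': delete every CR byte."""
    return data.replace(b"\r", b"")


def cr_to_lf(data):
    """Like tr '\\r' '\\n': turn every CR byte into LF."""
    return data.translate(bytes.maketrans(b"\r", b"\n"))


if __name__ == "__main__":
    print(strip_cr(b"dos line\r\n"))
    print(cr_to_lf(b"old mac line\r"))
```

As with tr, strip_cr removes every CR, including any that are not part of a line terminator, so it is only appropriate for files known to use CR+LF newlines.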

The same tasks are sometimes performed with sed, or in Perl if the platform has a Perl interpreter. The sed commands below rely on the \r escape, which is a GNU sed extension:

    sed -e 's/$/\r/' inputfile > outputfile                  # UNIX to DOS (adding CRs)
    sed -e 's/\r$//' inputfile > outputfile                  # DOS to UNIX (removing CRs)
    perl -pe 's/\r\n|\n|\r/\r\n/g' inputfile > outputfile    # Convert to DOS
    perl -pe 's/\r\n|\n|\r/\n/g' inputfile > outputfile      # Convert to UNIX
    perl -pe 's/\r\n|\n|\r/\r/g' inputfile > outputfile      # Convert to old Mac
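The substitution used in the Perl one-liners (match CR+LF, LF, or CR, and replace each with a single target sequence) can be sketched in Python as follows. The function name normalize_newlines and its target parameter are hypothetical.

```python
import re

# CR+LF comes first so that a pair is replaced once, not twice.
_ANY_NEWLINE = re.compile(rb"\r\n|\n|\r")


def normalize_newlines(data, target=b"\n"):
    """Replace every CR+LF, LF or CR in data with the target sequence."""
    # A function replacement avoids backslash-escape processing of target.
    return _ANY_NEWLINE.sub(lambda _match: target, data)


if __name__ == "__main__":
    mixed = b"one\r\ntwo\nthree\rfour"
    print(normalize_newlines(mixed, b"\r\n"))  # convert to DOS
    print(normalize_newlines(mixed, b"\n"))    # convert to UNIX
    print(normalize_newlines(mixed, b"\r"))    # convert to old Mac
```

Because all three styles are matched in one pass, the function also cleans up files with mixed line endings.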

To identify what type of line breaks a text file contains, the file command can be used. Moreover, the editor vim can be convenient for making a file compatible with the Windows Notepad text editor. For example:

    [prompt] > file myfile.txt
    myfile.txt: ASCII English text
    [prompt] > vim myfile.txt

within vim:

    :set fileformat=dos
    :wq

    [prompt] > file myfile.txt
    myfile.txt: ASCII English text, with CRLF line terminators

The following grep commands print the filename (in this case myfile.txt) to the command line if the file is of the specified style. The -l option lists files that contain a match and -L lists files that contain none, so both commands search for the same pattern, a CR at the end of a line:

    grep -PL '\r$' myfile.txt    # show UNIX style file (LF terminated)
    grep -Pl '\r$' myfile.txt    # show DOS style file (CRLF terminated)

For Debian-based systems, these commands are used:

    egrep -L $'\r$' myfile.txt   # show UNIX style file (LF terminated)
    egrep -l $'\r$' myfile.txt   # show DOS style file (CRLF terminated)

The above grep commands work under Unix systems or in Cygwin under Windows. Note that these commands make some assumptions about the kinds of files that exist on the system: specifically, they assume only UNIX and DOS-style files, and no Mac OS 9-style files. Consult the grep documentation for the -P, -L, and -l options to see how the commands work.

This technique is often combined with find to list files recursively. For instance, the following command checks all "regular files" (i.e. it excludes directories, symbolic links, etc.) to find all UNIX-style files in a directory tree, starting from the current directory (.), and saves the results in the file unix_files.txt, overwriting it if the file already exists:

find . -type f -exec grep -PL '\r$' {} \; > unix_files.txt
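On systems without find and grep, the same recursive search can be sketched in Python. The function name find_files_without_crlf is hypothetical, and the sketch makes the same assumption as the grep commands: a file with no CR+LF pair is treated as UNIX-style. It reads each file into memory in full, which is a simplification.

```python
import os


def find_files_without_crlf(root="."):
    """Yield paths of regular files under root that contain no CR+LF pair."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                if b"\r\n" not in f.read():
                    yield path


if __name__ == "__main__":
    for found in find_files_without_crlf("."):
        print(found)
```

Redirecting the printed output to a file gives the same kind of list that the find command above writes to unix_files.txt.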

The file command also detects the type of EOL used:

    file myfile.txt
    myfile.txt: ASCII text, with CRLF line terminators

Other tools permit the user to visualise the EOL characters:

    od -a myfile.txt
    cat -e myfile.txt
    hexdump -c myfile.txt
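A similar view can be produced without these tools. The Python sketch below imitates the way cat -e marks line endings, showing each CR as ^M and each LF as $; the function name show_line_endings is hypothetical, and non-ASCII bytes are simply replaced rather than rendered.

```python
def show_line_endings(data):
    """Render bytes as text with CR shown as ^M and each LF shown as $."""
    text = data.decode("ascii", errors="replace")
    # Mark CR first, then mark LF while keeping the actual line break.
    return text.replace("\r", "^M").replace("\n", "$\n")


if __name__ == "__main__":
    print(show_line_endings(b"dos line\r\nunix line\n"), end="")
```

A DOS-style line therefore ends in ^M$, while a UNIX-style line ends in $ alone.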

The utilities dos2unix, unix2dos, mac2unix, unix2mac, mac2dos and dos2mac can perform conversions. The flip command is also often used.

See also

  • ASA carriage control characters
  • C0 and C1 control codes
  • Page break


External links

  • The Unicode reference, see section 5.8 of the Unicode 4.0 standard (PDF)
  • - software for Unix that converts to and from DOS newlines
  • : a Windows shell extension that is able to convert multiple files from DOS to UNIX (and vice-versa) line endings right from the context menu.