Home      Discussion      Topics      Dictionary      Almanac
Signup       Login
Transmission Control Protocol

Transmission Control Protocol

Overview
The Transmission Control Protocol (TCP) is one of the core protocols
Communications protocol
A communications protocol is a system of digital message formats and rules for exchanging those messages in or between computing systems and in telecommunications...

 of the Internet Protocol Suite
Internet protocol suite
The Internet protocol suite is the set of communications protocols used for the Internet and other similar networks. It is commonly known as TCP/IP from its most important protocols: Transmission Control Protocol and Internet Protocol , which were the first networking protocols defined in this...

. TCP is one of the two original components of the suite, complementing the Internet Protocol
Internet Protocol
The Internet Protocol is the principal communications protocol used for relaying datagrams across an internetwork using the Internet Protocol Suite...

 (IP), and therefore the entire suite is commonly referred to as TCP/IP. TCP provides reliable, ordered delivery of a stream of bytes from a program on one computer to another program on another computer. TCP is the protocol that major Internet applications such as the World Wide Web
World Wide Web
The World Wide Web is a system of interlinked hypertext documents accessed via the Internet...

, email
Email
Electronic mail, commonly known as email or e-mail, is a method of exchanging digital messages from an author to one or more recipients. Modern email operates across the Internet or other computer networks. Some early email systems required that the author and the recipient both be online at the...

, remote administration
Remote administration
Remote administration refers to any method of controlling a computer from a remote location.Software that allows remote administration is becoming increasingly common and is often used when it is difficult or impractical to be physically near a system in order to use it, or in order to access web...

 and file transfer
File transfer
File transfer is a generic term for the act of transmitting files over a computer network or the Internet. There are numerous ways and protocols to transfer files over a network. Computers which provide a file transfer service are often called file servers. Depending on the client's perspective the...

 rely on. Other applications, which do not require reliable data stream service, may use the User Datagram Protocol
User Datagram Protocol
The User Datagram Protocol is one of the core members of the Internet Protocol Suite, the set of network protocols used for the Internet. With UDP, computer applications can send messages, in this case referred to as datagrams, to other hosts on an Internet Protocol network without requiring...

 (UDP), which provides a datagram
Datagram
A datagram is a basic transfer unit associated with a packet-switched network in which the delivery, arrival time, and order are not guaranteed....

 service that emphasizes reduced latency
Latency (engineering)
Latency is a measure of time delay experienced in a system, the precise definition of which depends on the system and the time being measured. Latencies may have different meaning in different contexts.-Packet-switched networks:...

 over reliability.
Discussion
Ask a question about 'Transmission Control Protocol'
Start a new discussion about 'Transmission Control Protocol'
Answer questions from other users
Full Discussion Forum
 
Unanswered Questions
Encyclopedia
The Transmission Control Protocol (TCP) is one of the core protocols
Communications protocol
A communications protocol is a system of digital message formats and rules for exchanging those messages in or between computing systems and in telecommunications...

 of the Internet Protocol Suite
Internet protocol suite
The Internet protocol suite is the set of communications protocols used for the Internet and other similar networks. It is commonly known as TCP/IP from its most important protocols: Transmission Control Protocol and Internet Protocol , which were the first networking protocols defined in this...

. TCP is one of the two original components of the suite, complementing the Internet Protocol
Internet Protocol
The Internet Protocol is the principal communications protocol used for relaying datagrams across an internetwork using the Internet Protocol Suite...

 (IP), and therefore the entire suite is commonly referred to as TCP/IP. TCP provides reliable, ordered delivery of a stream of bytes from a program on one computer to another program on another computer. TCP is the protocol that major Internet applications such as the World Wide Web
World Wide Web
The World Wide Web is a system of interlinked hypertext documents accessed via the Internet...

, email
Email
Electronic mail, commonly known as email or e-mail, is a method of exchanging digital messages from an author to one or more recipients. Modern email operates across the Internet or other computer networks. Some early email systems required that the author and the recipient both be online at the...

, remote administration
Remote administration
Remote administration refers to any method of controlling a computer from a remote location.Software that allows remote administration is becoming increasingly common and is often used when it is difficult or impractical to be physically near a system in order to use it, or in order to access web...

 and file transfer
File transfer
File transfer is a generic term for the act of transmitting files over a computer network or the Internet. There are numerous ways and protocols to transfer files over a network. Computers which provide a file transfer service are often called file servers. Depending on the client's perspective the...

 rely on. Other applications, which do not require reliable data stream service, may use the User Datagram Protocol
User Datagram Protocol
The User Datagram Protocol is one of the core members of the Internet Protocol Suite, the set of network protocols used for the Internet. With UDP, computer applications can send messages, in this case referred to as datagrams, to other hosts on an Internet Protocol network without requiring...

 (UDP), which provides a datagram
Datagram
A datagram is a basic transfer unit associated with a packet-switched network in which the delivery, arrival time, and order are not guaranteed....

 service that emphasizes reduced latency
Latency (engineering)
Latency is a measure of time delay experienced in a system, the precise definition of which depends on the system and the time being measured. Latencies may have different meaning in different contexts.-Packet-switched networks:...

 over reliability.

Historical origin


In May 1974 the Institute of Electrical and Electronic Engineers (IEEE) published a paper entitled "A Protocol for Packet Network Interconnection." The paper's authors, Vint Cerf
Vint Cerf
Vinton Gray "Vint" Cerf is an American computer scientist, who is recognized as one of "the fathers of the Internet", sharing this title with American computer scientist Bob Kahn...

 and Bob Kahn
Bob Kahn
Robert Elliot Kahn is an American Internet pioneer, engineer and computer scientist, who, along with Vinton G. Cerf, invented the Transmission Control Protocol and the Internet Protocol , the fundamental communication protocols at the heart of the Internet.-Career:After receiving a B.E.E...

 described an internetworking protocol for sharing resources using packet-switching among the nodes. A central control component of this model was the Transmission Control Program that incorporated both connection-oriented links and datagram
Datagram
A datagram is a basic transfer unit associated with a packet-switched network in which the delivery, arrival time, and order are not guaranteed....

 services between hosts. The monolithic Transmission Control Program was later divided into a modular architecture consisting of the Transmission Control Protocol at the connection-oriented layer and the Internet Protocol at the internetworking (datagram) layer. The model became known informally as TCP/IP, although formally it was henceforth called the Internet Protocol Suite.

Network function


TCP provides a communication service at an intermediate level between an application program and the Internet Protocol
Internet Protocol
The Internet Protocol is the principal communications protocol used for relaying datagrams across an internetwork using the Internet Protocol Suite...

 (IP). That is, when an application program
Application software
Application software, also known as an application or an "app", is computer software designed to help the user to perform specific tasks. Examples include enterprise software, accounting software, office suites, graphics software and media players. Many application programs deal principally with...

 desires to send a large chunk of data across the Internet using IP, instead of breaking the data into IP-sized pieces and issuing a series of IP requests, the software can issue a single request to TCP and let TCP handle the IP details.

IP works by exchanging pieces of information called packets. A packet is a sequence of octets
Octet (computing)
An octet is a unit of digital information in computing and telecommunications that consists of eight bits. The term is often used when the term byte might be ambiguous, as there is no standard for the size of the byte.-Overview:...

 and consists of a header followed by a body. The header describes the packet's destination and, optionally, the routers to use for forwarding until it arrives at its destination. The body contains the data IP is transmitting.

Due to network congestion, traffic load balancing, or other unpredictable network behavior, IP packets can be lost
Packet loss
Packet loss occurs when one or more packets of data travelling across a computer network fail to reach their destination. Packet loss is distinguished as one of the three main error types encountered in digital communications; the other two being bit error and spurious packets caused due to noise.-...

, duplicated, or delivered out of order
Out-of-order delivery
In computer networking, out-of-order delivery is the delivery of data packets in a different order from which they were sent. Out-of-order delivery can be caused by packets following multiple paths through a network, or via parallel processing paths within network equipment that are not designed to...

. TCP detects these problems, requests retransmission of lost data, rearranges out-of-order data, and even helps minimize network congestion to reduce the occurrence of the other problems. Once the TCP receiver has reassembled the sequence of octets originally transmitted, it passes them to the application program. Thus, TCP abstracts the application's communication from the underlying networking details.

TCP is utilized extensively by many of the Internet's most popular applications, including the World Wide Web (WWW)
World Wide Web
The World Wide Web is a system of interlinked hypertext documents accessed via the Internet...

, E-mail
E-mail
Electronic mail, commonly known as email or e-mail, is a method of exchanging digital messages from an author to one or more recipients. Modern email operates across the Internet or other computer networks. Some early email systems required that the author and the recipient both be online at the...

, File Transfer Protocol
File Transfer Protocol
File Transfer Protocol is a standard network protocol used to transfer files from one host to another host over a TCP-based network, such as the Internet. FTP is built on a client-server architecture and utilizes separate control and data connections between the client and server...

, Secure Shell
Secure Shell
Secure Shell is a network protocol for secure data communication, remote shell services or command execution and other secure network services between two networked computers that it connects via a secure channel over an insecure network: a server and a client...

, peer-to-peer
Peer-to-peer
Peer-to-peer computing or networking is a distributed application architecture that partitions tasks or workloads among peers. Peers are equally privileged, equipotent participants in the application...

 file sharing
File sharing
File sharing is the practice of distributing or providing access to digitally stored information, such as computer programs, multimedia , documents, or electronic books. It may be implemented through a variety of ways...

, and some streaming media
Streaming media
Streaming media is multimedia that is constantly received by and presented to an end-user while being delivered by a streaming provider.The term "presented" is used in this article in a general sense that includes audio or video playback. The name refers to the delivery method of the medium rather...

 applications.

TCP is optimized for accurate delivery rather than timely delivery, and therefore, TCP sometimes incurs relatively long delays (in the order of seconds) while waiting for out-of-order messages or retransmissions of lost messages. It is not particularly suitable for real-time applications such as Voice over IP
Voice over IP
Voice over Internet Protocol is a family of technologies, methodologies, communication protocols, and transmission techniques for the delivery of voice communications and multimedia sessions over Internet Protocol networks, such as the Internet...

. For such applications, protocols like the Real-time Transport Protocol
Real-time Transport Protocol
The Real-time Transport Protocol defines a standardized packet format for delivering audio and video over IP networks. RTP is used extensively in communication and entertainment systems that involve streaming media, such as telephony, video teleconference applications, television services and...

 (RTP) running over the User Datagram Protocol
User Datagram Protocol
The User Datagram Protocol is one of the core members of the Internet Protocol Suite, the set of network protocols used for the Internet. With UDP, computer applications can send messages, in this case referred to as datagrams, to other hosts on an Internet Protocol network without requiring...

 (UDP) are usually recommended instead.

TCP is a reliable stream delivery service that guarantees delivery of a data stream sent from one host to another without duplication or losing data. Since packet transfer is not reliable, a technique known as positive acknowledgment with retransmission is used to guarantee reliability of packet transfers. This fundamental technique requires the receiver to respond with an acknowledgment message as it receives the data. The sender keeps a record of each packet it sends, and waits for acknowledgment before sending the next packet. The sender also keeps a timer from when the packet was sent, and retransmits a packet if the timer expires. The timer is needed in case a packet gets lost or corrupted.

TCP consists of a set of rules: for the protocol, that are used with the Internet Protocol, and for the IP, to send data "in a form of message units" between computers over the Internet. While IP handles actual delivery of the data, TCP keeps track of the individual units of data transmission, called segments, that a message is divided into for efficient routing through the network. For example, when an HTML file is sent from a Web server, the TCP software layer of that server divides the sequence of octets of the file into segments and forwards them individually to the IP software layer (Internet Layer
Internet layer
The internet layer or IP layer is a group of internetworking methods in the Internet protocol suite, commonly also called TCP/IP, which is the foundation of the Internet...

). The Internet Layer encapsulates each TCP segment into an IP packet by adding a header that includes (among other data) the destination IP address
IP address
An Internet Protocol address is a numerical label assigned to each device participating in a computer network that uses the Internet Protocol for communication. An IP address serves two principal functions: host or network interface identification and location addressing...

. Even though every packet has the same destination address, they can be routed on different paths through the network. When the client program on the destination computer receives them, the TCP layer (Transport Layer
Transport layer
In computer networking, the transport layer or layer 4 provides end-to-end communication services for applications within a layered architecture of network components and protocols...

) reassembles the individual segments and ensures they are correctly ordered and error free as it streams them to an application.

TCP segment structure


Transmission Control Protocol accepts data from a data stream, segments it into chunks, and adds a TCP header creating a TCP segment. The TCP segment is then encapsulated
Encapsulation (networking)
In computer networking, encapsulation is a method of designing modular communication protocols in which logically separate functions in the network are abstracted from their underlying structures by inclusion or information hiding within higher level objects....

 into an Internet Protocol
Internet Protocol
The Internet Protocol is the principal communications protocol used for relaying datagrams across an internetwork using the Internet Protocol Suite...

 (IP) datagram. A TCP segment is "the packet of information that TCP uses to exchange data with its peers."

The term TCP packet, though sometimes informally used, is not in line with current terminology, where segment refers to the TCP PDU (Protocol Data Unit), datagram to the IP PDU and frame to the data link layer PDU:
Processes transmit data by calling on the TCP and passing buffers of data as arguments. The TCP packages the data from these buffers into segments and calls on the internet module [e.g. IP] to transmit each segment to the destination TCP.


A TCP segment consists of a segment header and a data section. The TCP header contains 10 mandatory fields, and an optional extension field (Options, pink background in table).

The data section follows the header. Its contents are the payload data carried for the application. The length of the data section is not specified in the TCP segment header. It can be calculated by subtracting the combined length of the TCP header and the encapsulating IP header from the total IP datagram length (specified in the IP header).
TCP Header
Offsets Octet
Octet (computing)
An octet is a unit of digital information in computing and telecommunications that consists of eight bits. The term is often used when the term byte might be ambiguous, as there is no standard for the size of the byte.-Overview:...

0 1 2 3
Octet Bit
Bit
A bit is the basic unit of information in computing and telecommunications; it is the amount of information stored by a digital device or other physical system that exists in one of two possible distinct states...

 0 1 2 3 4 5 6 7 8 910111213141516171819202122232425262728293031
0   0 Source port Destination port
4  32 Sequence number
8  64 Acknowledgment number (if ACK set)
12  96 Data offset Reserved N
S
tt style="font-size:8pt; line-height:1">C
W
R
tt style="font-size:8pt; line-height:1">E
C
E
tt style="font-size:8pt; line-height:1">U
R
G
tt style="font-size:8pt; line-height:1">A
C
K
tt style="font-size:8pt; line-height:1">P
S
H
tt style="font-size:8pt; line-height:1">R
S
T
tt style="font-size:8pt; line-height:1">S
Y
N
tt style="font-size:8pt; line-height:1">F
I
N
Window Size
16 128 Checksum Urgent pointer (if URG set)
20
...
160
...
Options (if Data Offset > 5,padded at end with "0" bytes if necessary)
...

  • Source port (16 bits) – identifies the sending port
  • Destination port (16 bits) – identifies the receiving port
  • Sequence number (32 bits) – has a dual role:
  • If the SYN flag is set (1), then this is the initial sequence number. The sequence number of the actual first data byte and the acknowledged number in the corresponding ACK are then this sequence number plus 1.
  • If the SYN flag is clear (0), then this is the accumulated sequence number of the first data byte of this packet for the current session.
  • Acknowledgment number (32 bits) – if the ACK flag is set then the value of this field is the next sequence number that the receiver is expecting. This acknowledges receipt of all prior bytes (if any). The first ACK sent by each end acknowledges the other end's initial sequence number itself, but no data.
  • Data offset (4 bits) – specifies the size of the TCP header in 32-bit words. The minimum size header is 5 words and the maximum is 15 words thus giving the minimum size of 20 bytes and maximum of 60 bytes, allowing for up to 40 bytes of options in the header. This field gets its name from the fact that it is also the offset from the start of the TCP segment to the actual data.
  • Reserved (3 bits) – for future use and should be set to zero
  • Flags (9 bits) (aka Control bits) – contains 9 1-bit flags
  • NS (1 bit) – ECN-nonce concealment protection (added to header by RFC 3540).
  • CWR (1 bit) – Congestion Window Reduced (CWR) flag is set by the sending host to indicate that it received a TCP segment with the ECE flag set and had responded in congestion control mechanism (added to header by RFC 3168).
  • ECE (1 bit) – ECN-Echo indicates
  • If the SYN flag is set (1), that the TCP peer is ECN
    Explicit Congestion Notification
    Explicit Congestion Notification is an extension to the Internet Protocol and to the Transmission Control Protocol and is defined in RFC 3168 . ECN allows end-to-end notification of network congestion without dropping packets. ECN is an optional feature that is only used when both endpoints...

     capable.
  • If the SYN flag is clear (0), that a packet with Congestion Experienced flag in IP header set is received during normal transmission (added to header by RFC 3168).
    • URG (1 bit) – indicates that the Urgent pointer field is significant
    • ACK (1 bit) – indicates that the Acknowledgment field is significant. All packets after the initial SYN packet sent by the client should have this flag set.
    • PSH (1 bit) – Push function. Asks to push the buffered data to the receiving application.
    • RST (1 bit) – Reset the connection
    • SYN (1 bit) – Synchronize sequence numbers. Only the first packet sent from each end should have this flag set. Some other flags change meaning based on this flag, and some are only valid for when it is set, and others when it is clear.
    • FIN (1 bit) – No more data from sender
    • Window size (16 bits) – the size of the receive window, which specifies the number of bytes (beyond the sequence number in the acknowledgment field) that the receiver is currently willing to receive (see Flow control and Window Scaling)
    • Checksum (16 bits) – The 16-bit checksum
      Checksum
      A checksum or hash sum is a fixed-size datum computed from an arbitrary block of digital data for the purpose of detecting accidental errors that may have been introduced during its transmission or storage. The integrity of the data can be checked at any later time by recomputing the checksum and...

       field is used for error-checking of the header and data
    • Urgent pointer (16 bits) – if the URG flag is set, then this 16-bit field is an offset from the sequence number indicating the last urgent data byte
    • Options (Variable 0-320 bits, divisible by 32) – The length of this field is determined by the data offset field. Options have up to three fields: Option-Kind (1 byte), Option-Length (1 byte), Option-Data (variable). The Option-Kind field indicates the type of option, and is the only field that is not optional. Depending on what kind of option we are dealing with, the next two fields may be set: the Option-Length field indicates the total length of the option, and the Option-Data field contains the value of the option, if applicable. For example, an Option-Kind byte of 0x01 indicates that this is a No-Op option used only for padding, and does not have an Option-Length or Option-Data byte following it. An Option-Kind byte of 0 is the End Of Options option, and is also only one byte. An Option-Kind byte of 0x02 indicates that this is the Maximum Segment Size option, and will be followed by a byte specifying the length of the MSS field (should be 0x04). Note that this length is the total length of the given options field, including Option-Kind and Option-Length bytes. So while the MSS value is typically expressed in two bytes, the length of the field will be 4 bytes (+2 bytes of kind and length). In short, an MSS option field with a value of 0x05B4 will show up as (0x02 0x04 0x05B4) in the TCP options section.
    • Padding – The TCP header padding is used to ensure that the TCP header ends and data begins on a 32 bit boundary. The padding is composed of zeros.

Some options may only be sent when SYN is set; they are indicated below as [SYN] Option-Kind and standard lengths given as (Option-Kind,Option-Length).
  • 0 (8 bits) - End of options list
  • 1 (8 bits) - No operation (NOP, Padding) This may be used to align option fields on 32-bit boundaries for better performance.
  • 2,4,SS (32 bits) - Maximum segment size (see maximum segment size) [SYN]
  • 3,3,S (24 bits) - Window scale (see window scaling for details) [SYN]
  • 4,2 (16 bits) - Selective Acknowledgement permitted. [SYN] (See selective acknowledgments for details)
  • 5,N,BBBB,EEEE,... (variable bits, N is either 10, 18, 26, or 34)- Selective ACKnowledgement (SACK) These first two bytes are followed by a list of 1-4 blocks being selectively acknowledged, specified as 32-bit begin/end pointers.
  • 8,10,TTTT,EEEE (80 bits)- Timestamp and echo of previous timestamp (see TCP timestamps for details)
  • 14,3,S (24 bits) - TCP Alternate Checksum Request. [SYN]
  • 15,N,... (variable bits) - TCP Alternate Checksum Data.

Protocol operation


TCP protocol operations may be divided into three phases. Connections must be properly established in a multi-step handshake process (connection establishment) before entering the data transfer phase. After data transmission is completed, the connection termination closes established virtual circuits and releases all allocated resources.

A TCP connection is managed by an operating system through a programming interface that represents the local end-point for communications, the Internet socket
Internet socket
In computer networking, an Internet socket or network socket is an endpoint of a bidirectional inter-process communication flow across an Internet Protocol-based computer network, such as the Internet....

. During the lifetime of a TCP connection it undergoes a series of state
State (computer science)
In computer science and automata theory, a state is a unique configuration of information in a program or machine. It is a concept that occasionally extends into some forms of systems programming such as lexers and parsers....

 changes:

  1. LISTENING : In case of a server, waiting for a connection request from any remote client.
  2. SYN-SENT : waiting for the remote peer to send back a TCP segment with the SYN and ACK flags set. (usually set by TCP clients)
  3. SYN-RECEIVED : waiting for the remote peer to send back an acknowledgment after having sent back a connection acknowledgment to the remote peer. (usually set by TCP servers)
  4. ESTABLISHED : the port is ready to receive/send data from/to the remote peer.
  5. FIN-WAIT-1 :
  6. FIN-WAIT-2 :Indicates that the client is waiting for the server's fin segment (which indicates the server's application process is ready to close and the server is ready to initiate its side of the connection termination)
  7. CLOSE-WAIT :
  8. LAST-ACK : indicates that the server is in the process of sending its own fin segment (which indicates the server's application process is ready to close and the server is ready to initiate it's side of the connection termination )
  9. TIME-WAIT : represents waiting for enough time to pass to be sure the remote peer received the acknowledgment of its connection termination request. According to RFC 793 a connection can stay in TIME-WAIT for a maximum of four minutes known as a MSL (maximum segment lifetime).
  10. CLOSED : connection is closed

Connection establishment


To establish a connection, TCP uses a three-way handshake
Handshaking
In information technology, telecommunications, and related fields, handshaking is an automated process of negotiation that dynamically sets parameters of a communications channel established between two entities before normal communication over the channel begins...

.

Before a client attempts to connect with a server, the server must first bind to a port to open it up for connections: this is called a passive open.
Once the passive open is established, a client may initiate an active open.
To establish a connection, the three-way (or 3-step) handshake occurs:
  1. SYN: The active open is performed by the client sending a SYN to the server. It sets the segment's sequence number to a random value A.
  2. SYN-ACK: In response, the server replies with a SYN-ACK. The acknowledgment number is set to one more than the received sequence number (A + 1), and the sequence number that the server chooses for the packet is another random number, B.
  3. ACK: Finally, the client sends an ACK back to the server. The sequence number is set to the received acknowledgement value i.e. A + 1, and the acknowledgement number is set to one more than the received sequence number i.e. B + 1.


At this point, both the client and server have received an acknowledgment of the connection.

Resource usage


Most implementations allocate an entry in a table that maps a session to a running operating system process. Because TCP packets do not include a session identifier, both endpoints identify the session using the client's address and port. Whenever a packet is received, the TCP implementation must perform a lookup on this table to find the destination process.

The number of sessions in the server side is limited only by memory and can grow as new connections arrive, but the client must allocate a random port before sending the first SYN to the server. This port remains allocated during the whole conversation, and effectively limits the number of outgoing connections from each of the client's IP addresses. If an application fails to properly close unrequired connections, a client can run out of resources and become unable to establish new TCP connections, even from other applications.

Both endpoints must also allocate space for unacknowledged packets and received (but unread) data.

Data transfer


There are a few key features that set TCP apart from User Datagram Protocol
User Datagram Protocol
The User Datagram Protocol is one of the core members of the Internet Protocol Suite, the set of network protocols used for the Internet. With UDP, computer applications can send messages, in this case referred to as datagrams, to other hosts on an Internet Protocol network without requiring...

:
  • Ordered data transfer - the destination host rearranges according to sequence number
  • Retransmission of lost packets - any cumulative stream not acknowledged is retransmitted
  • Error-free data transfer
  • Flow control - limits the rate a sender transfers data to guarantee reliable delivery. The receiver continually hints the sender on how much data can be received (controlled by the sliding window). When the receiving host's buffer fills, the next acknowledgment contains a 0 in the window size, to stop transfer and allow the data in the buffer to be processed.
  • Congestion control

Reliable transmission


TCP uses a sequence number to identify each byte of data. The sequence number identifies the order of the bytes sent from each computer so that the data can be reconstructed in order, regardless of any fragmentation, disordering, or packet loss
Packet loss
Packet loss occurs when one or more packets of data travelling across a computer network fail to reach their destination. Packet loss is distinguished as one of the three main error types encountered in digital communications; the other two being bit error and spurious packets caused due to noise.-...

 that may occur during transmission.
For every payload byte transmitted, the sequence number must be incremented. In the first two steps of the 3-way handshake, both computers exchange an initial sequence number (ISN). This number can be arbitrary, and should in fact be unpredictable to defend against TCP Sequence Prediction Attack
TCP Sequence Prediction Attack
A TCP sequence prediction attack is an attempt to predict the sequence number used to identify the packets in a TCP connection, which can be used to counterfeit packets.The attacker hopes to correctly guess the sequence number to be used by the sending host...

s.

TCP primarily uses a cumulative acknowledgment scheme, where the receiver sends an acknowledgment signifying that the receiver has received all data preceding the acknowledged sequence number. The sender sets the sequence number field to the sequence number of the first payload byte in the segment's data field, and the receiver sends an acknowledgment specifying the sequence number of the next byte they expect to receive. For example, if a sending computer sends a packet containing four payload bytes with a sequence number field of 100, then the sequence numbers of the four payload bytes are 100, 101, 102 and 103. When this packet arrives at the receiving computer, it would send back an acknowledgment number of 104 since that is the sequence number of the next byte it expects to receive in the next packet.

In addition to cumulative acknowledgments, TCP receivers can also send selective acknowledgments to provide further information.

If the sender infers that data has been lost in the network, it retransmits
Retransmission (data networks)
Retransmission, essentially identical with Automatic repeat request , is the resending of packets which have been either damaged or lost. It is a term that refers to one of the basic mechanisms used by protocols operating over a packet switched computer network to provide reliable communication...

 the data.

Error detection


Sequence numbers and acknowledgments cover discarding duplicate packets, retransmission of lost packets, and ordered-data transfer.
To assure correctness a checksum
Checksum
A checksum or hash sum is a fixed-size datum computed from an arbitrary block of digital data for the purpose of detecting accidental errors that may have been introduced during its transmission or storage. The integrity of the data can be checked at any later time by recomputing the checksum and...

 field is included (see TCP segment structure for details on checksumming).

The TCP checksum is a weak check by modern standards. Data Link Layers with high bit error rates may require additional link error correction/detection capabilities. The weak checksum is partially compensated for by the common use of a CRC or better integrity check at layer 2, below both TCP and IP, such as is used in PPP
Point-to-Point Protocol
In networking, the Point-to-Point Protocol is a data link protocol commonly used in establishing a direct connection between two networking nodes...

 or the Ethernet
Ethernet
Ethernet is a family of computer networking technologies for local area networks commercially introduced in 1980. Standardized in IEEE 802.3, Ethernet has largely replaced competing wired LAN technologies....

 frame. However, this does not mean that the 16-bit TCP checksum is redundant: remarkably, introduction of errors in packets between CRC-protected hops is common, but the end-to-end
End-to-end principle
The end-to-end principle is a classic design principle of computer networking which states that application specific functions ought to reside in the end hosts of a network rather than in intermediary nodes, provided they can be implemented "completely and correctly" in the end hosts...

 16-bit TCP checksum catches most of these simple errors. This is the end-to-end principle
End-to-end principle
The end-to-end principle is a classic design principle of computer networking which states that application specific functions ought to reside in the end hosts of a network rather than in intermediary nodes, provided they can be implemented "completely and correctly" in the end hosts...

 at work.

Flow control


TCP uses an end-to-end flow control
Flow control
In data communications, flow control is the process of managing the pacing of data transmission between two nodes to prevent a fast sender from outrunning a slow receiver. It provides a mechanism for the receiver to control the transmission speed, so that the receiving node is not overwhelmed with...

 protocol to avoid having the sender send data too fast for the TCP receiver to receive and process it reliably. Having a mechanism for flow control is essential in an environment where machines of diverse network speeds communicate. For example, if a PC sends data to a hand-held PDA that is slowly processing received data, the PDA must regulate data flow so as not to be overwhelmed.

TCP uses a sliding window
Sliding Window Protocol
A sliding window protocol is a feature of packet-based data transmission protocols. Sliding window protocols are used where reliable in-order delivery of packets is required, such as in the Data Link Layer as well as in the Transmission Control Protocol .Conceptually, each portion of the...

 flow control protocol. In each TCP segment, the receiver specifies in the receive window field the amount of additional received data (in bytes) that it is willing to buffer for the connection. The sending host can send only up to that amount of data before it must wait for an acknowledgment and window update from the receiving host.

When a receiver advertises a window size of 0, the sender stops sending data and starts the persist timer. The persist timer is used to protect TCP from a deadlock
Deadlock
A deadlock is a situation where in two or more competing actions are each waiting for the other to finish, and thus neither ever does. It is often seen in a paradox like the "chicken or the egg"...

 situation that could arise if a subsequent window size update from the receiver is lost, and the sender cannot send more data until receiving a new window size update from the receiver. When the persist timer expires, the TCP sender attempts recovery by sending a small packet so that the receiver responds by sending another acknowledgement containing the new window size.

If a receiver is processing incoming data in small increments, it may repeatedly advertise a small receive window. This is referred to as the silly window syndrome
Silly window syndrome
Silly window syndrome is a problem in computer networking caused by poorly-implemented TCP flow control. If a server with this problem is unable to process all incoming data, it requests that its clients reduce the amount of data they send at a time...

, since it is inefficient to send only a few bytes of data in a TCP segment, given the relatively large overhead of the TCP header. TCP senders and receivers typically employ flow control logic to specifically avoid repeatedly sending small segments. The sender-side silly window syndrome avoidance logic is referred to as Nagle's algorithm
Nagle's algorithm
Nagle's algorithm, named after John Nagle, is a means of improving the efficiency of TCP/IP networks by reducing the number of packets that need to be sent over the network....

.

Congestion control


The final main aspect of TCP is congestion control. TCP uses a number of mechanisms to achieve high performance and avoid congestion collapse, where network performance can fall by several orders of magnitude. These mechanisms control the rate of data entering the network, keeping the data flow below a rate that would trigger collapse. They also yield an approximately max-min fair
Max-min fairness
In communication networks and multiplexing, a division of the bandwidth resources is said to be max-min fair when: firstly, the minimum data rate that a dataflow achieves is maximized; secondly, the second lowest data rate that a dataflow achieves is maximized, etc.In best-effort statistical...

 allocation between flows.

Acknowledgments for data sent, or lack of acknowledgments, are used by senders to infer network conditions between the TCP sender and receiver. Coupled with timers, TCP senders and receivers can alter the behavior of the flow of data. This is more generally referred to as congestion control and/or network congestion avoidance.

Modern implementations of TCP contain four intertwined algorithms: Slow-start
Slow-start
Slow-start is part of the congestion control strategy used by TCP, the data transmission protocol used by many Internet applications. Slow-start is used in conjunction with other algorithms to avoid sending more data than the network is capable of transmitting, that is, to avoid causing network...

, congestion avoidance
TCP congestion avoidance algorithm
Transmission Control Protocol uses a network congestion avoidance algorithm that includes various aspects of an additive increase/multiplicative decrease scheme, with other schemes such as slow-start in order to achieve congestion avoidance....

, fast retransmit
Fast retransmit
Fast Retransmit is an enhancement to TCP which reduces the time a sender waits before retransmitting a lost segment.A TCP sender uses a timer to recognize lost segments...

, and fast recovery (RFC 5681).

In addition, senders employ a retransmission timeout (RTO) that is based on the estimated round-trip time (or RTT) between the sender and receiver, as well as the variance in this round trip time. The behavior of this timer is specified in RFC 2988. There are subtleties in the estimation of RTT. For example, senders must be careful when calculating RTT samples for retransmitted packets; typically they use Karn's Algorithm
Karn's Algorithm
Karn's Algorithm addresses the problem of getting accurate estimates of the round-trip time for messages when using TCP. The algorithm was proposed by Phil Karn in 1987....

 or TCP timestamps (see RFC 1323). These individual RTT samples are then averaged over time to create a Smoothed Round Trip Time (SRTT) using Jacobson
Van Jacobson
Van Jacobson is one of the primary contributors to the TCP/IP protocol stack which is the technological foundation of today’s Internet. He is renowned for his pioneering achievements in network performance and scaling....

's algorithm. This SRTT value is what is finally used as the round-trip time estimate.

Enhancing TCP to reliably handle loss, minimize errors, manage congestion and go fast in very high-speed environments are ongoing areas of research and standards development. As a result, there are a number of TCP congestion avoidance algorithm
TCP congestion avoidance algorithm
Transmission Control Protocol uses a network congestion avoidance algorithm that includes various aspects of an additive increase/multiplicative decrease scheme, with other schemes such as slow-start in order to achieve congestion avoidance....

 variations.

Maximum segment size


The maximum segment size
Maximum segment size
The maximum segment size is a parameter of the TCP protocol that specifies the largest amount of data, specified in octets, that a computer or communications device can receive in a single TCP segment, and therefore in a single IP datagram. It does not count the TCP header or the IP header...

 (MSS) is the largest amount of data, specified in bytes, that TCP is willing to receive in a single segment. For best performance, the MSS should be set small enough to avoid IP fragmentation
IP fragmentation
The Internet Protocol implements datagram fragmentation, so that packets may be formed that can pass through a link with a smaller maximum transmission unit than the original datagram size....

, which can lead to packet loss and excessive retransmissions. To try to accomplish this, typically the MSS is announced by each side using the MSS option when the TCP connection is established, in which case it is derived from the maximum transmission unit (MTU) size of the data link layer
Data link layer
The data link layer is layer 2 of the seven-layer OSI model of computer networking. It corresponds to, or is part of the link layer of the TCP/IP reference model....

 of the networks to which the sender and receiver are directly attached. Furthermore, TCP senders can use path MTU discovery
Path MTU discovery
Path MTU Discovery is a standardized technique in computer networking for determining the maximum transmission unit size on the network path between two Internet Protocol hosts, usually with the goal of avoiding IP fragmentation...

 to infer the minimum MTU along the network path between the sender and receiver, and use this to dynamically adjust the MSS to avoid IP fragmentation within the network.

MSS announcement is also often called "MSS negotiation". Strictly speaking, the MSS is not "negotiated" between the originator and the receiver, because that would imply that both originator and receiver will negotiate and agree upon a single, unified MSS that applies to all communication in both directions of the connection. In fact, two completely independent values of MSS are permitted for the two directions of data flow in a TCP connection. This situation may arise, for example, if one of the devices participating in a connection has an extremely limited amount memory reserved (perhaps even smaller than the overall discovered Path MTU) for processing incoming TCP segments.

Selective acknowledgments


Relying purely on the cumulative acknowledgment scheme employed by the original TCP protocol can lead to inefficiencies when packets are lost. For example, suppose 10,000 bytes are sent in 10 different TCP packets, and the first packet is lost during transmission. In a pure cumulative acknowledgment protocol, the receiver cannot say that it received bytes 1,000 to 9,999 successfully, but failed to receive the first packet, containing bytes 0 to 999. Thus the sender may then have to resend all 10,000 bytes.

To solve this problem TCP employs the selective acknowledgment (SACK) option, defined in RFC 2018, which allows the receiver to acknowledge discontinuous blocks of packets that were received correctly, in addition to the sequence number of the last contiguous byte received successively, as in the basic TCP acknowledgment. The acknowledgement can specify a number of SACK blocks, where each SACK block is conveyed by the starting and ending sequence numbers of a contiguous range that the receiver correctly received. In the example above, the receiver would send SACK with sequence numbers 1000 and 9999. The sender thus retransmits only the first packet, bytes 0 to 999.

An extension to the SACK option is the duplicate-SACK option, defined in RFC 2883. An out-of-order packet delivery can often falsely indicate the TCP sender of lost packet and, in turn, the TCP sender retransmits the suspected-to-be-lost packet and slow down the data delivery to prevent network congestion. The TCP sender undoes the action of slow-down, that is a recovery of the original pace of data transmission, upon receiving a D-SACK that indicates the retransmitted packet is duplicate.

The SACK option is not mandatory and it is used only if both parties support it. This is negotiated when connection is established. SACK uses the optional part of the TCP header (see TCP segment structure for details). The use of SACK is widespread - all popular TCP stacks support it. Selective acknowledgment is also used in Stream Control Transmission Protocol
Stream Control Transmission Protocol
In computer networking, the Stream Control Transmission Protocol is a Transport Layer protocol, serving in a similar role to the popular protocols Transmission Control Protocol and User Datagram Protocol...

 (SCTP).

Window scaling



For more efficient use of high bandwidth networks, a larger TCP window size may be used. The TCP window size field controls the flow of data and its value is limited to between 2 and 65,535 bytes.

Since the size field cannot be expanded, a scaling factor is used. The TCP window scale option
TCP window scale option
The TCP window scale option is an option to increase the TCP receive window size above its maximum value of 65,535 bytes.This TCP option, along with several others, is defined in IETF RFC 1323 which deals with Long-Fat Networks, or LFN....

, as defined in RFC 1323, is an option used to increase the maximum window size from 65,535 bytes to 1 gigabyte. Scaling up to larger window sizes is a part of what is necessary for TCP Tuning
TCP tuning
TCP tuning techniques adjust the network congestion avoidance parameters of TCP connections over high-bandwidth, high-latency networks. Well-tuned networks can perform up to 10 times faster in some cases.- Bandwidth-delay product :...

.

The window scale option is used only during the TCP 3-way handshake. The window scale value represents the number of bits to left-shift the 16-bit window size field. The window scale value can be set from 0 (no shift) to 14 for each direction independently. Both sides must send the option in their SYN segments to enable window scaling in either direction.

Some routers and packet firewalls rewrite the window scaling factor during a transmission. This causes sending and receiving sides to assume different TCP window sizes. The result is non-stable traffic that may be very slow. The problem is visible on some sending and receiving sites behind the path of defective routers.

TCP timestamps


TCP timestamps, defined in RFC 1323, can help TCP determine in which order packets were sent.
TCP timestamps are not normally aligned to the system clock and start at some random value. Many operating systems will increment the timestamp for every elapsed milisecond; however the RFC only states that the tics should be proportional.

There are two timestamp fields:
a 4-byte sender timestamp value (my timestamp)
a 4-byte echo reply timestamp value (the most recent timestamp received from you).

TCP timestamps are used in an algorithm known as Protection Against Wrapped Sequence numbers, or PAWS (see RFC 1323 for details). PAWS is used when the TCP window size exceeds the possible numbers of sequence numbers (2^32 or 4 billion/gig). In the case where a packet was potentially retransmitted it answers the question: "Is this sequence number in the first 4GB or the second?" And the timestamp is used to break the tie.

RFC 1323 incorrectly states in section 2.2 that the window scale must be limited to 2^14 to remain under 1GB (which is correct, but the sequence number limit is 4GB); however a scale of 16 and a window size of 65535 would be 65536 less than the 2^32 possible sequence numbers and thus an acceptable yet excessive value. Because of this error many systems have limited the max scale to 2^14 to "follow the RFC".

Also, the Eifel detection algorithm (RFC 3522) uses TCP timestamps to determine if retransmissions are occurring because packets are lost or simply out of order.

Out of band data


One is able to interrupt or abort the queued stream instead of waiting for the stream to finish. This is done by specifying the data as urgent. This tells the receiving program to process it immediately, along with the rest of the urgent data. When finished, TCP informs the application and resumes back to the stream queue.
An example is when TCP is used for a remote login session, the user can send a keyboard sequence that interrupts or aborts the program at the other end. These signals are most often needed when a program on the remote machine fails to operate correctly. The signals must be sent without waiting for the program to finish its current transfer. name="comer" />

TCP OOB data was not designed for the modern Internet. The urgent pointer only alters the processing on the remote host and doesn't expedite any processing on the network itself. When it gets to the remote host there are two slightly different interpretations of the protocol, which means only single bytes of OOB data are reliable. This is assuming it's reliable at all as it's one of the least commonly used protocol elements and tends to be poorly implemented.

Forcing data delivery


Normally, TCP waits for 200 ms or for a full packet of data to send (Nagle's Algorithm = tries to group small messages into a single packet). This creates minor, but potentially serious delays if repeated constantly during a file transfer. For example a typical send block would be 4KB, a typical MSS is 1460, so 2 packets go out on a 10Mb/s ethernet taking ~1.2 ms each followed by a third carrying the remaining 1176 after a 197ms pause because TCP is waiting for a full buffer.

In the case of telnet, each user keystroke is echoed back by the server before the user can see it on the screen. This delay would become very annoying.

Setting the socket option TCP_NODELAY overrides the default 200ms send delay. Application programs use this socket option to force output to be sent after writing a character or line of characters.

The RFC defines the PSH push bit as "a message to the receiving TCP stack to send this data immediately up to the receiving application". There is no way to indicate or control it in User space
User space
A conventional computer operating system usually segregates virtual memory into kernel space and user space. Kernel space is strictly reserved for running the kernel, kernel extensions, and most device drivers...

 using Berkeley sockets
Berkeley sockets
The Berkeley sockets application programming interface comprises a library for developing applications in the C programming language that perform inter-process communication, most commonly for communications across a computer network....

 and it is controlled by Protocol stack
Protocol stack
The protocol stack is an implementation of a computer networking protocol suite. The terms are often used interchangeably. Strictly speaking, the suite is the definition of the protocols, and the stack is the software implementation of them....

 only.

Connection termination


The connection termination phase uses, at most, a four-way handshake, with each side of the connection terminating independently. When an endpoint wishes to stop its half of the connection, it transmits a FIN packet, which the other end acknowledges with an ACK. Therefore, a typical tear-down requires a pair of FIN and ACK segments from each TCP endpoint. After both FIN/ACK exchanges are concluded, the terminating side waits for a timeout before finally closing the connection, during which time the local port is unavailable for new connections; this prevents confusion due to delayed packets being delivered during subsequent connections.

A connection can be "half-open"
TCP half-open
The term half-open refers to TCP connections which state is out of synchronization between the two communicating hosts, possibly due to crash of one side...

, in which case one side has terminated its end, but the other has not. The side that has terminated can no longer send any data into the connection, but the other side can. The terminating side should continue reading the data until the other side terminates as well.

It is also possible to terminate the connection by a 3-way handshake, when host A sends a FIN and host B replies with a FIN & ACK (merely combines 2 steps into one) and host A replies with an ACK. This is perhaps the most common method.

It is possible for both hosts to send FINs simultaneously then both just have to ACK. This could possibly be considered a 2-way handshake since the FIN/ACK sequence is done in parallel for both directions.

Some host TCP stacks may implement a half-duplex close sequence, as Linux
Linux
Linux is a Unix-like computer operating system assembled under the model of free and open source software development and distribution. The defining component of any Linux system is the Linux kernel, an operating system kernel first released October 5, 1991 by Linus Torvalds...

 or HP-UX
HP-UX
HP-UX is Hewlett-Packard's proprietary implementation of the Unix operating system, based on UNIX System V and first released in 1984...

 do. If such a host actively closes a connection but still has not read all the incoming data the stack already received from the link, this host sends a RST instead of a FIN (Section 4.2.2.13 in RFC 1122). This allows a TCP application to be sure the remote application has read all the data the former sent—waiting the FIN from the remote side, when it actively closes the connection. However, the remote TCP stack cannot distinguish between a Connection Aborting RST and this Data Loss RST. Both cause the remote stack to throw away all the data it received, but that the application still didn't read.

Some application protocols may violate the OSI model layers
OSI model
The Open Systems Interconnection model is a product of the Open Systems Interconnection effort at the International Organization for Standardization. It is a prescription of characterizing and standardizing the functions of a communications system in terms of abstraction layers. Similar...

, using the TCP open/close handshaking for the application protocol open/close handshaking - these may find the RST problem on active close. As an example:
s = connect(remote);
send(s, data);
close(s);
For a usual program flow like above, a TCP/IP stack like that described above does not guarantee that all the data arrives to the other application.

Vulnerabilities


TCP may be attacked in a variety of ways. The results of a thorough security assessment of TCP, along with possible mitigations for the identified issues, was published in 2009, and is currently being pursued within the IETF.

Denial of service


By using a spoofed IP
IP address spoofing
In computer networking, the term IP address spoofing or IP spoofing refers to the creation of Internet Protocol packets with a forged source IP address, called spoofing, with the purpose of concealing the identity of the sender or impersonating another computing system.-Background:The basic...

 address and repeatedly sending purposely assembled
Mangled packet
In computer networking, a mangled or invalid packet is a packet — especially IP packet — that either lacks order or self-coherence, or contains code aimed to confuse or disrupt computers, firewalls, routers, or any service present on the network....

 SYN packets, attackers can cause the server to consume large amounts of resources keeping track of the bogus connections. This is known as a SYN flood
SYN flood
A SYN flood is a form of denial-of-service attack in which an attacker sends a succession of SYN requests to a target's system in an attempt to consume enough server resources to make the system unresponsive to legitimate traffic.-Technical details:...

 attack. Proposed solutions to this problem include SYN cookies
SYN cookies
SYN Cookies are the key element of a technique used to guard against SYN flood attacks. Daniel J. Bernstein, the technique's primary inventor, defines SYN Cookies as "particular choices of initial TCP sequence numbers by TCP servers." In particular, the use of SYN Cookies allows a server to avoid...

 and Cryptographic puzzles. Sockstress
Sockstress
Sockstress is a method that is used to attack servers on the Internet and other networks utilizing TCP, including Windows, Mac, Linux, BSD and any router or other internet appliance that accepts TCP connections...

 is a similar attack, that might be mitigated with system resource management. An advanced DoS attack involving the exploitation of the TCP Persist Timer was analyzed at Phrack
Phrack
Phrack is an ezine written by and for hackers first published November 17, 1985. Described by Fyodor as "the best, and by far the longest running hacker zine," the magazine is open for contributions by anyone who desires to publish remarkable works or express original ideas on the topics of interest...

 #66.

Connection hijacking



An attacker who is able to eavesdrop a TCP session and redirect packets can hijack a TCP connection. To do so, the attacker learns the sequence number from the ongoing communication and forges a false segment that looks like the next segment in the stream. Such a simple hijack can result in one packet being erroneously accepted at one end. When the receiving host acknowledges the extra segment to the other side of the connection, synchronization is lost. Hijacking might be combined with ARP or routing attacks that allow taking control of the packet flow, so as to get permanent control of the hijacked TCP connection.

Impersonating a different IP address was not difficult prior to RFC 1948, when the initial sequence number was easily guessable. That allowed an attacker to blindly send a sequence of packets that the receiver would believe to come from a different IP address, without the need to deploy ARP or routing attacks: it is enough to ensure that the legitimate host of the impersonated IP address is down, or bring it to that condition using denial of service attacks. This is why the initial sequence number is chosen at random.

TCP ports



TCP uses port number
TCP and UDP port
In computer networking, a port is an application-specific or process-specific software construct serving as a communications endpoint in a computer's host operating system. A port is associated with an IP address of the host, as well as the type of protocol used for communication...

s to identify sending and receiving application end-points on a host, or Internet socket
Internet socket
In computer networking, an Internet socket or network socket is an endpoint of a bidirectional inter-process communication flow across an Internet Protocol-based computer network, such as the Internet....

s
. Each side of a TCP connection has an associated 16-bit unsigned port number (0-65535) reserved by the sending or receiving application. Arriving TCP data packets are identified as belonging to a specific TCP connection by its sockets, that is, the combination of source host address, source port, destination host address, and destination port. This means that a server computer can provide several clients with several services simultaneously, as long as a client takes care of initiating any simultaneous connections to one destination port from different source ports.

Port numbers are categorized into three basic categories: well-known, registered, and dynamic/private. The well-known ports are assigned by the Internet Assigned Numbers Authority
Internet Assigned Numbers Authority
The Internet Assigned Numbers Authority is the entity that oversees global IP address allocation, autonomous system number allocation, root zone management in the Domain Name System , media types, and other Internet Protocol-related symbols and numbers...

 (IANA) and are typically used by system-level or root processes. Well-known applications running as servers and passively listening for connections typically use these ports. Some examples include: FTP
File Transfer Protocol
File Transfer Protocol is a standard network protocol used to transfer files from one host to another host over a TCP-based network, such as the Internet. FTP is built on a client-server architecture and utilizes separate control and data connections between the client and server...

 (20 and 21), SSH
Secure Shell
Secure Shell is a network protocol for secure data communication, remote shell services or command execution and other secure network services between two networked computers that it connects via a secure channel over an insecure network: a server and a client...

 (22), TELNET
TELNET
Telnet is a network protocol used on the Internet or local area networks to provide a bidirectional interactive text-oriented communications facility using a virtual terminal connection...

 (23), SMTP (25) and HTTP (80). Registered ports are typically used by end user applications as ephemeral
Ephemeral port
An ephemeral port is a short-lived transport protocol port for Internet Protocol communications allocated automatically from a predefined range by the TCP/IP software...

 source ports when contacting servers, but they can also identify named services that have been registered by a third party. Dynamic/private ports can also be used by end user applications, but are less commonly so. Dynamic/private ports do not contain any meaning outside of any particular TCP connection.

Development


TCP is a complex protocol. However, while significant enhancements have been made and proposed over the years, its most basic operation has not changed significantly since its first specification RFC 675 in 1974, and the v4 specification RFC 793, published in September 1981. RFC 1122, Host Requirements for Internet Hosts, clarified a number of TCP protocol implementation requirements. RFC 2581, TCP Congestion Control, one of the most important TCP-related RFCs in recent years, describes updated algorithms that avoid undue congestion. In 2001, RFC 3168 was written to describe explicit congestion notification
Explicit Congestion Notification
Explicit Congestion Notification is an extension to the Internet Protocol and to the Transmission Control Protocol and is defined in RFC 3168 . ECN allows end-to-end notification of network congestion without dropping packets. ECN is an optional feature that is only used when both endpoints...

 (ECN
Explicit Congestion Notification
Explicit Congestion Notification is an extension to the Internet Protocol and to the Transmission Control Protocol and is defined in RFC 3168 . ECN allows end-to-end notification of network congestion without dropping packets. ECN is an optional feature that is only used when both endpoints...

), a congestion avoidance signaling mechanism.

The original TCP congestion avoidance algorithm
TCP congestion avoidance algorithm
Transmission Control Protocol uses a network congestion avoidance algorithm that includes various aspects of an additive increase/multiplicative decrease scheme, with other schemes such as slow-start in order to achieve congestion avoidance....

 was known as "TCP Tahoe", but many alternative algorithms have since been proposed (including TCP Reno, TCP Vegas
TCP Vegas
TCP Vegas is a TCP congestion avoidance algorithm that emphasizes packet delay, rather than packet loss, as a signal to help determine the rate at which to send packets. It was developed at the University of Arizona by Lawrence Brakmo and Larry L...

, FAST TCP
FAST TCP
FAST TCP is a TCP congestion avoidance algorithm especially targeted at long-distance, high latency links, developed at the , California Institute of Technology and now being commercialized by...

, TCP New Reno, and TCP Hybla).

TCP Interactive (iTCP) is a research effort into TCP extensions that allows applications to subscribe to TCP events and register handler components that can launch applications for various purposes, including application-assisted congestion control.

Multipath TCP (MPTCP) is an ongoing effort within the IETF that aims at allowing a TCP connection to use multiple paths to maximise resource usage and increase redundancy.
The redundancy offered by Multipath TCP in the context of wireless networks enables statistical multiplexing of resources, and thus increases TCP throughput dramatically. Multipath TCP also brings performance benefits in datacenter environments and an implementation in the Linux kernel is being developed.

TCP Cookie Transactions
TCP Cookie Transactions
In computer networking, TCP Cookie Transactions is an extension of Transmission Control Protocol intended to secure it against denial-of-service attacks, such as resource exhaustion by SYN flooding and malicious connection termination by third parties...

 (TCPCT) is an extension proposed in December 2009 to secure servers against denial-of-service attacks. Unlike SYN cookies
SYN cookies
SYN Cookies are the key element of a technique used to guard against SYN flood attacks. Daniel J. Bernstein, the technique's primary inventor, defines SYN Cookies as "particular choices of initial TCP sequence numbers by TCP servers." In particular, the use of SYN Cookies allows a server to avoid...

, TCPCT does not conflict with other TCP extensions such as window scaling. TCPCT was designed due to necessities of DNSSEC
DNSSEC
The Domain Name System Security Extensions is a suite of Internet Engineering Task Force specifications for securing certain kinds of information provided by the Domain Name System as used on Internet Protocol networks...

, where servers have to handle large numbers of short-lived TCP connections.

tcpcrypt
Tcpcrypt
In computer networking, tcpcrypt is a transport layer communication encryption protocol. Unlike prior protocols like TLS , tcpcrypt is implemented as a TCP extension. It was designed by a team of six security and networking experts: Andrea Bittau, Mike Hamburg, Mark Handley, David Mazières, Dan...

 is an extension proposed in July 2010 to provide transport-level encryption directly in TCP itself. It's designed to work transparently and not require any configuration. Unlike TLS
Transport Layer Security
Transport Layer Security and its predecessor, Secure Sockets Layer , are cryptographic protocols that provide communication security over the Internet...

 (SSL), tcpcrypt itself does not provide authentication, but provides simple primitives down to the application to do that. , the first tcpcrypt IETF draft has been published and implementations exist for several major platforms.

TCP over wireless networks


TCP has been optimized for wired networks. Any packet loss
Packet loss
Packet loss occurs when one or more packets of data travelling across a computer network fail to reach their destination. Packet loss is distinguished as one of the three main error types encountered in digital communications; the other two being bit error and spurious packets caused due to noise.-...

 is considered to be the result of network congestion
Network congestion
In data networking and queueing theory, network congestion occurs when a link or node is carrying so much data that its quality of service deteriorates. Typical effects include queueing delay, packet loss or the blocking of new connections...

 and the congestion window size is reduced dramatically as a precaution. However, wireless links are known to experience sporadic and usually temporary losses due to fading, shadowing, hand off, and other radio effects, that cannot be considered congestion. After the (erroneous) back-off of the congestion window size, due to wireless packet loss, there can be a congestion avoidance phase with a conservative decrease in window size. This causes the radio link to be underutilized. Extensive research has been done on the subject of how to combat these harmful effects. Suggested solutions can be categorized as end-to-end solutions (which require modifications at the client or server), link layer solutions (such as RLP
Radio Link Protocol
Radio Link Protocol is an automatic repeat request fragmentation protocol used over a wireless air interface. Most wireless air interfaces are tuned to provide 1% packet loss, and most Vocoders are mutually tuned to sacrifice very little voice quality at 1% packet loss...

 in cellular networks), or proxy based solutions (which require some changes in the network without modifying end nodes.

Hardware implementations


One way to overcome the processing power requirements of TCP is to build hardware implementations of it, widely known as TCP Offload Engine
TCP Offload Engine
TCP offload engine or TOE is a technology used in network interface cards to offload processing of the entire TCP/IP stack to the network controller...

s (TOE). The main problem of TOEs is that they are hard to integrate into computing systems, requiring extensive changes in the operating system of the computer or device. One company to develop such a device was Alacritech
Alacritech
Alacritech is a Silicon Valley manufacturer of storage network acceleration solutions that improve the performance of the existing enterprise network storage infrastructure without having to replace the investments made in the NFS-based storage network...

.

Debugging


A packet sniffer
Packet sniffer
A packet analyzer is a computer program or a piece of computer hardware that can intercept and log traffic passing over a digital network or part of a network...

, which intercepts TCP traffic on a network link, can be useful in debugging networks, network stacks and applications that use TCP by showing the user what packets are passing through a link. Some networking stacks support the SO_DEBUG socket option, which can be enabled on the socket using setsockopt. That option dumps all the packets, TCP states, and events on that socket, which is helpful in debugging. Netstat
Netstat
netstat is a command-line tool that displays network connections , routing tables, and a number of network interface statistics...

 is another utility that can be used for debugging.

Alternatives


For many applications TCP is not appropriate. One big problem (at least with normal implementations) is that the application cannot get at the packets coming after a lost packet until the retransmitted copy of the lost packet is received. This causes problems for real-time applications such as streaming media
Streaming media
Streaming media is multimedia that is constantly received by and presented to an end-user while being delivered by a streaming provider.The term "presented" is used in this article in a general sense that includes audio or video playback. The name refers to the delivery method of the medium rather...

, real-time multiplayer games and voice over IP
Voice over IP
Voice over Internet Protocol is a family of technologies, methodologies, communication protocols, and transmission techniques for the delivery of voice communications and multimedia sessions over Internet Protocol networks, such as the Internet...

 (VoIP) where it is sometimes more useful to get most of the data in a timely fashion than it is to get all of the data in order.

For both historical and performance reasons, most storage area network
Storage area network
A storage area network is a dedicated network that provides access to consolidated, block level data storage. SANs are primarily used to make storage devices, such as disk arrays, tape libraries, and optical jukeboxes, accessible to servers so that the devices appear like locally attached devices...

s (SANs) prefer to use Fibre Channel
Fibre Channel
Fibre Channel, or FC, is a gigabit-speed network technology primarily used for storage networking. Fibre Channel is standardized in the T11 Technical Committee of the InterNational Committee for Information Technology Standards , an American National Standards Institute –accredited standards...

 protocol (FCP) instead of TCP/IP.

Also for embedded systems, network booting
Network booting
Network booting is the process of booting a computer from a network rather than a local drive. This method of booting can be used by routers, diskless workstations and centrally managed computers such as public computers at libraries and schools...

 and servers that serve simple requests from huge numbers of clients (e.g. DNS
Domain name system
The Domain Name System is a hierarchical distributed naming system for computers, services, or any resource connected to the Internet or a private network. It associates various information with domain names assigned to each of the participating entities...

 servers) the complexity of TCP can be a problem. Finally, some tricks such as transmitting data between two hosts that are both behind NAT
Network address translation
In computer networking, network address translation is the process of modifying IP address information in IP packet headers while in transit across a traffic routing device....

 (using STUN
STUN
STUN is a standardized set of methods, including a network protocol, used in NAT traversal for applications of real-time voice, video, messaging, and other interactive IP communications....

 or similar systems) are far simpler without a relatively complex protocol like TCP in the way.

Generally, where TCP is unsuitable, the User Datagram Protocol
User Datagram Protocol
The User Datagram Protocol is one of the core members of the Internet Protocol Suite, the set of network protocols used for the Internet. With UDP, computer applications can send messages, in this case referred to as datagrams, to other hosts on an Internet Protocol network without requiring...

 (UDP) is used. This provides the application multiplexing
Multiplexing
The multiplexed signal is transmitted over a communication channel, which may be a physical transmission medium. The multiplexing divides the capacity of the low-level communication channel into several higher-level logical channels, one for each message signal or data stream to be transferred...

 and checksums that TCP does, but does not handle building streams or retransmission, giving the application developer the ability to code them in a way suitable for the situation, or to replace them with other methods like forward error correction
Forward error correction
In telecommunication, information theory, and coding theory, forward error correction or channel coding is a technique used for controlling errors in data transmission over unreliable or noisy communication channels....

 or interpolation
Interpolation (computer programming)
In the context of computer animation, interpolation refers to inbetweening, or filling in frames between the key frames. It typically calculates the inbetween frames through use of piecewise polynomial interpolation to draw images semi-automatically....

.

SCTP
Stream Control Transmission Protocol
In computer networking, the Stream Control Transmission Protocol is a Transport Layer protocol, serving in a similar role to the popular protocols Transmission Control Protocol and User Datagram Protocol...

 is another IP protocol that provides reliable stream oriented services similar to TCP. It is newer and considerably more complex than TCP, and has not yet seen widespread deployment. However, it is especially designed to be used in situations where reliability and near-real-time considerations are important.

Venturi Transport Protocol
Venturi Transport Protocol
Venturi Transport Protocol is a patented proprietary transport layer protocol that is designed to transparently replace TCP in order to overcome inefficiencies in the design of TCP related to wireless data transport...

 (VTP) is a patented proprietary protocol
Proprietary protocol
In telecommunications, a proprietary protocol is a communications protocol owned by a single organization or individual.-Enforcement:Proprietors may enforce restrictions through patents and by keeping the protocol specification a trade secret...

 that is designed to replace TCP transparently to overcome perceived inefficiencies related to wireless data transport.

TCP also has issues in high bandwidth environments. The TCP congestion avoidance algorithm
TCP congestion avoidance algorithm
Transmission Control Protocol uses a network congestion avoidance algorithm that includes various aspects of an additive increase/multiplicative decrease scheme, with other schemes such as slow-start in order to achieve congestion avoidance....

 works very well for ad-hoc environments where the data sender is not known in advance, but if the environment is predictable, a timing based protocol such as Asynchronous Transfer Mode
Asynchronous Transfer Mode
Asynchronous Transfer Mode is a standard switching technique designed to unify telecommunication and computer networks. It uses asynchronous time-division multiplexing, and it encodes data into small, fixed-sized cells. This differs from approaches such as the Internet Protocol or Ethernet that...

 (ATM) can avoid TCP's retransmits overhead.

Multipurpose Transaction Protocol
Multipurpose Transaction Protocol
Multipurpose Transaction Protocol software is a proprietary transport protocol developed and marketed by Data Expedition, Inc. . DEI claims that MTP offers superior performance and reliability when compared to the Transmission Control Protocol transport protocol.-General:MTP is implemented using...

 (MTP/IP) is patented proprietary software that is designed to adaptively achieve high throughput and transaction performance in a wide variety of network conditions, particularly those where TCP is perceived to be inefficient.

TCP checksum for IPv4


When TCP runs over IPv4
IPv4
Internet Protocol version 4 is the fourth revision in the development of the Internet Protocol and the first version of the protocol to be widely deployed. Together with IPv6, it is at the core of standards-based internetworking methods of the Internet...

, the method used to compute the checksum is defined in RFC 793:

The checksum field is the 16 bit one's complement of the one's complement sum of all 16-bit words in the header and text. If a segment contains an odd number of header and text octets to be checksummed, the last octet is padded on the right with zeros to form a 16-bit word for checksum purposes. The pad is not transmitted as part of the segment. While computing the checksum, the checksum field itself is replaced with zeros.


In other words, after appropriate padding, all 16-bit words are added using one's complement arithmetic. The sum is then bitwise complemented and inserted as the checksum field. A pseudo-header that mimics the IPv4 packet header used in the checksum computation is shown in the table below.
TCP pseudo-header for checksum computation (IPv4)
Bit offset 0–3 4–7 8–15 16–31
0 Source address
32 Destination address
64 Zeros Protocol TCP length
96 Source port Destination port
128 Sequence number
160 Acknowledgement number
192 Data offset Reserved Flags Window
224 Checksum Urgent pointer
256 Options (optional)
256/288+  
Data
 


The source and destination addresses are those of the IPv4 header. The protocol value is 6 for TCP (cf. List of IP protocol numbers). The TCP length field is the length of the TCP header and data.

TCP checksum for IPv6


When TCP runs over IPv6
IPv6
Internet Protocol version 6 is a version of the Internet Protocol . It is designed to succeed the Internet Protocol version 4...

, the method used to compute the checksum is changed, as per RFC 2460:
Any transport or other upper-layer protocol that includes the addresses from the IP header in its checksum computation must be modified for use over IPv6, to include the 128-bit IPv6 addresses instead of 32-bit IPv4 addresses.


A pseudo-header that mimics the IPv6 header for computation of the checksum is shown below.
TCP pseudo-header for checksum computation (IPv6)
Bit offset 0 - 7 8–15 16–23 24–31
0 Source address
32
64
96
128 Destination address
160
192
224
256 TCP length
288 Zeros Next header
320 Source port Destination port
352 Sequence number
384 Acknowledgement number
416 Data offset Reserved Flags Window
448 Checksum Urgent pointer
480 Options (optional)
480/512+  
Data
 

  • Source address – the one in the IPv6 header
  • Destination address – the final destination; if the IPv6 packet doesn't contain a Routing header, TCP uses the destination address in the IPv6 header, otherwise, at the originating node, it uses the address in the last element of the Routing header, and, at the receiving node, it uses the destination address in the IPv6 header.
  • TCP length – the length of the TCP header and data
  • Next Header – the protocol value for TCP

Checksum offload


Many TCP/IP software stack implementations provide options to use hardware assistance to automatically compute the checksum in the network adapter prior to transmission onto the network or upon reception from the network for validation. This may relieve OS to use precious CPU cycles calculating checksum. Hence, overall network performance is increased.

This feature may cause packet analyzers detecting outbound network traffic upstream of the network adapter and unaware or uncertain about the use of checksum offload to report invalid checksum in outbound packets.

See also



  • Connection-oriented protocol
    Connection-oriented protocol
    A connection-oriented networking protocol is one that establishes a communication session, then delivers a stream of data in the same order as it was sent. It may be a circuit switched connection, or a virtual circuit connection in a packet switched network...

  • T/TCP
    T/TCP
    T/TCP is a variant of the TCP protocol.It is an experimental TCP extension for efficient transaction-oriented service.It was developed to fill the gap between TCP and UDP, by Bob Braden in 1994....

     variant of TCP
  • TCP and UDP port
    TCP and UDP port
    In computer networking, a port is an application-specific or process-specific software construct serving as a communications endpoint in a computer's host operating system. A port is associated with an IP address of the host, as well as the type of protocol used for communication...

  • TCP and UDP port numbers for a long list of ports/services
  • TCP congestion avoidance algorithms
  • Nagle's algorithm
    Nagle's algorithm
    Nagle's algorithm, named after John Nagle, is a means of improving the efficiency of TCP/IP networks by reducing the number of packets that need to be sent over the network....

  • Karn's Algorithm
    Karn's Algorithm
    Karn's Algorithm addresses the problem of getting accurate estimates of the round-trip time for messages when using TCP. The algorithm was proposed by Phil Karn in 1987....

  • Maximum transmission unit
    Maximum transmission unit
    In computer networking, the maximum transmission unit of a communications protocol of a layer is the size of the largest protocol data unit that the layer can pass onwards. MTU parameters usually appear in association with a communications interface...

  • IP fragmentation
    IP fragmentation
    The Internet Protocol implements datagram fragmentation, so that packets may be formed that can pass through a link with a smaller maximum transmission unit than the original datagram size....

  • Maximum segment size
    Maximum segment size
    The maximum segment size is a parameter of the TCP protocol that specifies the largest amount of data, specified in octets, that a computer or communications device can receive in a single TCP segment, and therefore in a single IP datagram. It does not count the TCP header or the IP header...

  • Silly window syndrome
    Silly window syndrome
    Silly window syndrome is a problem in computer networking caused by poorly-implemented TCP flow control. If a server with this problem is unable to process all incoming data, it requests that its clients reduce the amount of data they send at a time...

  • TCP segment
  • TCP Sequence Prediction Attack
    TCP Sequence Prediction Attack
    A TCP sequence prediction attack is an attempt to predict the sequence number used to identify the packets in a TCP connection, which can be used to counterfeit packets.The attacker hopes to correctly guess the sequence number to be used by the sending host...

  • SYN flood
    SYN flood
    A SYN flood is a form of denial-of-service attack in which an attacker sends a succession of SYN requests to a target's system in an attempt to consume enough server resources to make the system unresponsive to legitimate traffic.-Technical details:...

  • SYN cookies
    SYN cookies
    SYN Cookies are the key element of a technique used to guard against SYN flood attacks. Daniel J. Bernstein, the technique's primary inventor, defines SYN Cookies as "particular choices of initial TCP sequence numbers by TCP servers." In particular, the use of SYN Cookies allows a server to avoid...

  • TCP Tuning
    TCP tuning
    TCP tuning techniques adjust the network congestion avoidance parameters of TCP connections over high-bandwidth, high-latency networks. Well-tuned networks can perform up to 10 times faster in some cases.- Bandwidth-delay product :...

     for high performance networks
  • Path MTU discovery
    Path MTU discovery
    Path MTU Discovery is a standardized technique in computer networking for determining the maximum transmission unit size on the network path between two Internet Protocol hosts, usually with the goal of avoiding IP fragmentation...

  • tcphdr
    Tcphdr
    tcphdr is a struct in the C programming language. The tcphdr struct is used as a template to form a TCP header in a raw socket. The structure can be found in the default include files of most Unix distributions. It is most commonly located in the header file.The tcphdr struct is unique in that it...

     - the Unix TCP header structure in the C programming language
    C (programming language)
    C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

  • Stream Control Transmission Protocol
    Stream Control Transmission Protocol
    In computer networking, the Stream Control Transmission Protocol is a Transport Layer protocol, serving in a similar role to the popular protocols Transmission Control Protocol and User Datagram Protocol...

     (SCTP)
  • Multipurpose Transaction Protocol
    Multipurpose Transaction Protocol
    Multipurpose Transaction Protocol software is a proprietary transport protocol developed and marketed by Data Expedition, Inc. . DEI claims that MTP offers superior performance and reliability when compared to the Transmission Control Protocol transport protocol.-General:MTP is implemented using...

     (MTP/IP)
  • Transport protocol comparison table
  • Sockstress
    Sockstress
    Sockstress is a method that is used to attack servers on the Internet and other networks utilizing TCP, including Windows, Mac, Linux, BSD and any router or other internet appliance that accepts TCP connections...



Further reading

  • W. Richard Stevens
    W. Richard Stevens
    William Richard Stevens was one of the most famous and widely acclaimed authors of UNIX and TCP/IP books.-Biography:...

    . TCP/IP Illustrated, Volume 1: The Protocols. ISBN 0-201-63346-9
  • W. Richard Stevens
    W. Richard Stevens
    William Richard Stevens was one of the most famous and widely acclaimed authors of UNIX and TCP/IP books.-Biography:...

     and Gary R. Wright. TCP/IP Illustrated, Volume 2: The Implementation. ISBN 0-201-63354-X
  • W. Richard Stevens
    W. Richard Stevens
    William Richard Stevens was one of the most famous and widely acclaimed authors of UNIX and TCP/IP books.-Biography:...

    . TCP/IP Illustrated, Volume 3: TCP for Transactions
    T/TCP
    T/TCP is a variant of the TCP protocol.It is an experimental TCP extension for efficient transaction-oriented service.It was developed to fill the gap between TCP and UDP, by Bob Braden in 1994....

    , HTTP, NNTP
    Network News Transfer Protocol
    The Network News Transfer Protocol is an Internet application protocol used for transporting Usenet news articles between news servers and for reading and posting articles by end user client applications...

    , and the UNIX Domain Protocols. ISBN 0-201-63495-3

RFC

  • RFC 675 - Specification of Internet Transmission Control Program, December 1974 Version
  • RFC 793 - TCP v4
  • RFC 1122 - includes some error corrections for TCP
  • RFC 1323 - TCP-Extensions
  • RFC 1379 - Extending TCP for Transactions—Concepts
  • RFC 1948 - Defending Against Sequence Number Attacks
  • RFC 2018 - TCP Selective Acknowledgment Options
  • RFC 2988 - Computing TCP's Retransmission Timer
  • RFC 4614 - A Roadmap for TCP Specification Documents
  • RFC 5681 - TCP Congestion Control

Others