This article covers the basic principles of TCP, its applications, and its architecture, and explains how to use TCP to build high-performance servers.

This blog is a translation from the English version. You can check the original from here. We use some machine translation. We would appreciate it if you could point out any translation errors.

TCP features

TCP is a connection-oriented protocol that provides reliable, full-duplex communication to user processes. It delivers data packets reliably and in order, and it supports traffic control. This article explains why and how TCP does this, covering the following topics:

1. Why the IP network layer cannot ensure the reliability of data packets
2. How TCP ensures that data packets arrive and stay in order
3. How TCP supports traffic control
4. TCP states and applications

OSI network layer

To understand why data packets are not reliable at the IP network layer, let's first look at the OSI network layers. In this model, TCP sits in the transport layer, where it provides reliability and ordering. Since the actual sending and receiving of packets is handled by the layers below it, down to the link layer and the physical layer, TCP works by improving on what those layers provide.

Communication between the client and the server uses an application protocol. The application protocol communicates over TCP at the transport layer, TCP uses IP at the layer below, and IP communicates using some form of data link layer.

Data on a network is ultimately transmitted across multiple routers. The underlying Ethernet protocol defines how electrical signals form data packets, which solves point-to-point communication within a local area network (LAN), but it cannot solve communication between multiple LANs.

The IP protocol used at the network layer defines its own set of addressing rules. It mainly solves addressing and routing: finding the best route for sending information based on the IP address of the other party. LANs are connected through routers, which forward each packet to a particular interface according to the IP protocol. However, the IP protocol does not guarantee that packets arrive, or that they arrive intact. Some packets are dropped to keep data transmission efficient, especially when the network is congested.

Ensuring the integrity, ordering, and reliability of data packets is therefore left to TCP.

Deep dive into TCP

TCP packet configuration

Many networks have a maximum transmission unit (MTU), which limits the size of a data frame at the link layer. For example, the MTU on Ethernet is 1,500 bytes. IP datagrams are sent over Ethernet. If a datagram is longer than the MTU, it must be split into fragments so that no fragment is longer than the MTU.

In addition to its own TCP header, a data packet also carries IP header information and Ethernet header information. The IP header takes at least 20 bytes of the Ethernet payload, so the maximum payload of an IP datagram is 1,480 bytes.

So what is the size of a TCP packet?

To answer this, you need the MSS (maximum segment size). MSS is a TCP concept, carried in the options field of the TCP header. It is the maximum amount of data that one TCP packet can carry. If the data is longer than the MSS, it must be sent segment by segment. If no MSS is configured, the default value is 536 bytes, which means that one TCP packet carries about 500 bytes by default.
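The size arithmetic can be sketched in Python. This is a minimal illustration, assuming 20-byte IP and TCP headers with no options. The function names `max_mss` and `segment` are made up for this example and are not part of any library.

```python
ETHERNET_MTU = 1500   # bytes, as stated in the text
IP_HEADER_MIN = 20    # minimum IPv4 header, no options
TCP_HEADER_MIN = 20   # minimum TCP header, no options (assumption of this sketch)
DEFAULT_MSS = 536     # used when no MSS option is configured


def max_mss(mtu=ETHERNET_MTU):
    """Largest TCP payload that fits into one unfragmented IP datagram."""
    return mtu - IP_HEADER_MIN - TCP_HEADER_MIN


def segment(data, mss=DEFAULT_MSS):
    """Split application data into chunks of at most `mss` bytes."""
    return [data[i:i + mss] for i in range(0, len(data), mss)]


if __name__ == "__main__":
    print(max_mss())                                   # 1460
    print([len(c) for c in segment(b"x" * 1200)])      # [536, 536, 128]
```

With the default MSS, 1,200 bytes of application data leave the sender as three segments.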

Ensuring reliability

As mentioned above, the underlying router does not ensure the reliability or order of packets when forwarding them.

First, to ensure packet integrity, TCP splits data larger than the MSS into segments of at most MSS bytes. The default MSS is 536 bytes, which is smaller than the MTU, so that a segment does not have to be fragmented again at the network layer.

Next, SEQ and ACK are added, and a timeout retransmission mechanism is adopted to ensure packet reliability.

SEQ

To keep packets in order, TCP assigns a sequence number (SEQ) to each packet. This allows the receiver to restore the packets to their original order, and if packets are lost, it shows which ones are missing. In general, the SEQ of the first packet is a random number, though for simplicity it can be treated as starting at 1.
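As a rough sketch of how sequence numbers let a receiver restore order and spot a gap, the Python function below rebuilds a byte stream from segments that arrive out of order. It assumes, as real TCP does, that the SEQ of a segment is the number of its first byte. The name `reassemble` is hypothetical.

```python
def reassemble(segments, initial_seq):
    """Rebuild the in-order byte stream from (seq, payload) pairs.

    Returns the contiguous data starting at `initial_seq` and the next
    sequence number still missing. Segments beyond a gap are held back.
    """
    # Index payloads by their first byte number; empty payloads are ignored.
    by_seq = {seq: payload for seq, payload in segments if payload}
    data = bytearray()
    expected = initial_seq
    while expected in by_seq:
        payload = by_seq[expected]
        data += payload
        expected += len(payload)
    return bytes(data), expected


if __name__ == "__main__":
    arrived = [(1006, b"world"), (1000, b"hello "), (1017, b"!")]
    print(reassemble(arrived, 1000))   # (b'hello world', 1011)
```

In the example, the segment starting at 1017 cannot be delivered yet because bytes 1011 to 1016 are missing, which is exactly the information the receiver needs to report back.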

ACK

Now that each packet has a SEQ, how do you make sure that the packet arrives?

This is done with ACK. Each time a packet is received, the receiver must return an ACK so that the sender can confirm that the packet arrived. The receiver also validates each packet. If validation finds an error, no ACK is sent, which causes the sender to time out and resend.

The ACK contains the following information:

-- The SEQ of the next packet that the receiver expects
-- The remaining capacity of the receiver's receive window
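The two fields of an ACK can be illustrated with a toy receiver in Python. This is only a sketch: the class `Receiver` and its methods are hypothetical, it accepts in-order segments only, and it ignores checksums and the other header fields.

```python
class Receiver:
    """Toy receiver that answers every segment with (ack, window)."""

    def __init__(self, initial_seq, buffer_size):
        self.next_expected = initial_seq   # SEQ of the next byte we expect
        self.buffer_size = buffer_size     # total receive buffer in bytes
        self.buffered = 0                  # bytes not yet read by the application

    def on_segment(self, seq, payload):
        """Accept the segment if it is in order and fits, then build the ACK."""
        free = self.buffer_size - self.buffered
        if seq == self.next_expected and len(payload) <= free:
            self.next_expected += len(payload)
            self.buffered += len(payload)
        # The ACK carries the next expected SEQ and the remaining window.
        return self.next_expected, self.buffer_size - self.buffered

    def app_read(self, n):
        """The application consumes up to n bytes, which frees buffer space."""
        n = min(n, self.buffered)
        self.buffered -= n
        return n


if __name__ == "__main__":
    r = Receiver(initial_seq=1000, buffer_size=10)
    print(r.on_segment(1000, b"abcd"))    # (1004, 6)
    print(r.on_segment(1010, b"zz"))      # (1004, 6): out of order, ACK repeats
    print(r.on_segment(1004, b"efghij"))  # (1010, 0): buffer is full
```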

I used Wireshark to capture the packets exchanged with oschina and looked at the data from the three-way handshake.

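Native IP: 192.168.1.103
oschina IP: 116.211.174.177

Three-way handshake process:

1. me->osChina: syn=1 seq=x ack=0
2. osChina->me: syn=1 seq=y ack=x+1
3. me->osChina: seq=x+1 ack=y+1

Captured packets:

1. me->osChina: syn=1 seq=0 ack=0
2. osChina->me: syn=1 seq=0 ack=0+1
3. me->osChina: seq=0+1 ack=0+1

Comparing the two, the captured values match the three-way handshake process with x=0 and y=0. (Wireshark displays relative sequence numbers by default, which is why both sides appear to start at 0.)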
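The rule behind the handshake numbers is short enough to write down in Python. The function `three_way_handshake` and its dictionary layout are made up for this illustration. Here `x` and `y` are the initial sequence numbers of the client and the server, and the values wrap around at 32 bits as TCP sequence numbers do.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(x, y):
    """Return the three handshake segments for initial sequence numbers x and y."""
    syn = {"from": "client", "syn": 1, "seq": x, "ack": 0}
    syn_ack = {"from": "server", "syn": 1, "seq": y,
               "ack": (x + 1) % SEQ_SPACE}
    ack = {"from": "client", "syn": 0,
           "seq": (x + 1) % SEQ_SPACE, "ack": (y + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(0, 0):
        print(segment)
```

Called with `x=0` and `y=0`, it produces the same seq and ack values as the capture: 0/0, then 0/1, then 1/1.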

Retransmission timeout

We know that networks are unreliable. Even if SEQ and ACK are added to data packets to ensure order, packet loss and timeouts can still occur. What happens if the data sent by the sender, or the ACK returned by the receiver, is lost or delayed on the network?

This is handled with the RTO (retransmission timeout). A measurement is needed to decide whether a packet has timed out: the RTT (round-trip time) measures the round trip on a given connection. As network traffic changes, the RTT changes with it, so TCP needs to track these changes and adjust the RTO dynamically.

If the sender does not receive an ACK for the packet within a certain amount of time, it is determined that the packet has been lost in the network and the packet is automatically retransmitted. This mechanism is called a retransmission timeout.

If the ACK from the receiver is lost, the sender does not receive it within this period and resends the packet. When the receiver gets the retransmitted packet, it returns the ACK again. If the original ACK was only delayed and reaches the sender after the timer has expired and the packet has already been retransmitted, the sender does not process that ACK and simply discards it.
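The text does not say how the RTO is derived from RTT measurements. One widely used method is the smoothed estimator standardized in RFC 6298, sketched below in Python. The class name `RtoEstimator` is hypothetical, and real stacks add details (for example, which samples are allowed to be measured) that this sketch leaves out.

```python
class RtoEstimator:
    """Smoothed RTT and RTO estimation following the formulas of RFC 6298."""

    ALPHA = 1 / 8    # weight of a new sample in the smoothed RTT
    BETA = 1 / 4     # weight of a new sample in the RTT variance
    K = 4
    MIN_RTO = 1.0    # seconds, lower bound recommended by the RFC

    def __init__(self, clock_granularity=0.001):
        self.g = clock_granularity
        self.srtt = None      # smoothed round-trip time
        self.rttvar = None    # round-trip time variation
        self.rto = 1.0        # initial RTO before any measurement

    def on_rtt_sample(self, r):
        """Update the estimate with a new RTT measurement r (seconds)."""
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = max(self.MIN_RTO, self.srtt + max(self.g, self.K * self.rttvar))
        return self.rto

    def on_timeout(self):
        """Back off exponentially after a retransmission timeout."""
        self.rto *= 2
        return self.rto
```

The estimator reacts to both the average RTT and how much it fluctuates, so a connection with a jittery RTT gets a more generous timeout than a stable one with the same average.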

Traffic control

From the above, we can see that TCP can ensure the reliability of data, but efficiency must also be considered. There are three things to consider:

Sending packets in batches
Controlling congestion according to network conditions
Tracking the state of the receiver to reduce the burden on it

Based on these three requirements, TCP takes the following measures.

Sliding window

Sending TCP packets and waiting for confirmation one at a time is too inefficient: it ensures reliability, but not efficiency. What is needed is a way to send and confirm packets in batches, which is the sliding window.

Sliding send window:

Looking at the send window from left to right: the data before the window has been sent and confirmed by the receiver, the data inside the window is what the sender is currently allowed to send, and the data after the window cannot be sent yet.

Two solutions have been proposed in the event of a timeout or loss.

1. Go-Back-N: all packets whose SEQ follows the SEQ of the lost packet are retransmitted.
2. Selective Repeat ARQ: only the lost packets are retransmitted, which is more efficient and avoids sending duplicate packets.
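The difference between the two strategies shows up in which packets are sent again. The Python sketch below uses whole packet numbers instead of byte-based SEQ values for readability. The function names are made up for this example.

```python
def go_back_n(sent, lost_seq):
    """Go-Back-N: resend the lost packet and everything sent after it."""
    return [seq for seq in sent if seq >= lost_seq]


def selective_repeat(sent, acked):
    """Selective repeat: resend only the packets that were never acknowledged."""
    return [seq for seq in sent if seq not in acked]


if __name__ == "__main__":
    sent = [1, 2, 3, 4, 5]
    # Packet 2 is lost, packets 3 to 5 arrive fine.
    print(go_back_n(sent, lost_seq=2))                 # [2, 3, 4, 5]
    print(selective_repeat(sent, acked={1, 3, 4, 5}))  # [2]
```

Go-Back-N keeps the receiver simple because it never has to buffer out-of-order packets, at the cost of resending data that already arrived.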

The sliding window also lets the receiver inform the sender of its processing status. Suppose the receiver's cache is full and it cannot process any more data, but the sender does not know this. The receiver can tell the sender the current size of its window, and the sender then sends no more data than that. This works as long as the receiver reports the current window size every time it sends a packet.

In this situation the receiver can send an ACK immediately after receiving the data and, at the same time, declare a window size of 0 to the sender.
Alternatively, the receiver can hold back the ACK when a packet arrives until there is enough free space in the cache, which prevents the sender from sliding its window. This approach has a problem, however: the delay before the receiver sends the ACK must not exceed the timeout period. If it does, the sender may wrongly conclude that the data was lost and resend it.
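From the sender's point of view, the advertised window translates into a simple limit on how much more data may be sent. The helper below is a minimal Python sketch. The name `usable_window` and its parameters are assumptions of this example.

```python
def usable_window(advertised_window, last_byte_sent, last_byte_acked):
    """Bytes the sender may still send without overrunning the receiver."""
    in_flight = last_byte_sent - last_byte_acked   # sent but not yet confirmed
    return max(0, advertised_window - in_flight)


if __name__ == "__main__":
    # 2,000 bytes are in flight against a 4,000-byte window.
    print(usable_window(4000, last_byte_sent=3000, last_byte_acked=1000))  # 2000
    # The receiver declared a window of 0: the sender has to stop.
    print(usable_window(0, last_byte_sent=3000, last_byte_acked=3000))     # 0
```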

Congestion control

We know that network conditions are unstable. When conditions are good, more packets can be sent. When conditions are bad, keeping the same transmission rate not only increases the network load but also causes packet loss, more timeout retransmissions, and lower communication efficiency.

For this reason, both parties of a TCP connection maintain a value called the congestion window (cwnd), which depends on how congested the network is. The sender's send window is bounded by the congestion window (in practice, the sender uses the smaller of the congestion window and the window advertised by the receiver). If the network is not congested, the congestion window can grow so that the sender puts more data on the network. Otherwise, the congestion window shrinks so as not to make the congestion worse.

TCP currently has four main algorithms for congestion control:

1. Slow start
2. Congestion avoidance
3. Fast retransmit
4. Fast recovery

This article does not go into the details of each algorithm. Roughly speaking, they find a suitable transmission rate for the current network conditions so that the network is not overloaded. For example, slow start means that transmission begins slowly and the rate is adjusted based on packet loss: if no packets are lost, the rate increases, and if packets are lost, the rate decreases.
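As a rough illustration only, the Python sketch below models how a congestion window might evolve over successive round trips. It is a simplified textbook model, not the behavior of any particular TCP stack: the window is counted in segments, it doubles per round trip during slow start, grows by one segment per round trip afterwards, and falls back to one segment when a loss is detected. The function `simulate_cwnd` and the starting threshold are assumptions of this example.

```python
def simulate_cwnd(loss_rounds, rounds, ssthresh=16):
    """Return the congestion window (in segments) at the start of each round."""
    cwnd = 1
    history = []
    for rnd in range(rounds):
        history.append(cwnd)
        if rnd in loss_rounds:
            # Loss: remember half the window as the new threshold and restart.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            # Slow start: exponential growth up to the threshold.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: linear growth.
            cwnd += 1
    return history


if __name__ == "__main__":
    print(simulate_cwnd(loss_rounds={6}, rounds=10, ssthresh=8))
    # [1, 2, 4, 8, 9, 10, 11, 1, 2, 4]
```

The output shows the characteristic shape: fast growth at first, slower probing near the threshold, and a sharp drop after the loss in round 6.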

TCP status

As any TCP user knows, a three-way handshake takes place when TCP establishes a connection, and a four-way handshake takes place when the connection is closed. So what states does a connection go through?

The figure above is worth remembering. With it in mind, let's look at the figure below and at the states as they appear in practice.

When a connection is established successfully, its state is ESTABLISHED. If the receiving side is in SYN_RECV, it has answered the first handshake message with the second one and is waiting for the final confirmation from the sender. If your network is under a SYN attack, there will be many connections in SYN_RECV. In that case, identifying the source IP addresses and filtering them with a firewall can remove many of these fake connections.
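To spot a large number of SYN_RECV connections, it helps to count connections per state. The Python sketch below does this for text in the column layout printed by `netstat -ant` on Linux (protocol, two queue columns, local address, foreign address, state). The function `count_states` is hypothetical and the sample data is made up, using documentation-only IP addresses.

```python
from collections import Counter

# Made-up sample in the layout of `netstat -ant` on Linux.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.10:80           198.51.100.7:51234      ESTABLISHED
tcp        0      0 192.0.2.10:80           198.51.100.8:40000      SYN_RECV
tcp        0      0 192.0.2.10:80           198.51.100.9:40001      SYN_RECV
tcp        0      0 192.0.2.10:80           198.51.100.7:51200      TIME_WAIT
"""


def count_states(netstat_output):
    """Count TCP connections per state; the state is the sixth column."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


if __name__ == "__main__":
    for state, n in count_states(SAMPLE).most_common():
        print(state, n)
```

A SYN_RECV count that keeps growing while ESTABLISHED stays flat is the pattern described above.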

Closing a connection: TIME_WAIT

On a network, it is common for one side to close a connection actively. When that happens, is anything left of the channel that TCP established? How long does it take until it is fully closed? The TCP state of the actively closing side at this point is TIME_WAIT. In practice this situation comes up often, because many connections are ended by one side actively closing them. Once a connection is in this state, can the previous TCP channel be reused, or does a new connection have to be established?

Every TCP implementation must choose a value for the MSL (maximum segment lifetime), which is the longest time an IP data packet can survive in the network. The recommended value is 2 minutes, but some implementations use 30 seconds. TIME_WAIT lasts twice the MSL, which means between 1 and 4 minutes.

TIME_WAIT exists for two reasons:

1. To terminate the TCP full-duplex connection reliably
2. To allow old duplicate packets to expire in the network

After a connection is closed, TCP must prevent old duplicate packets from that connection from reappearing later and being mistaken for packets of a new incarnation of the same connection. Because TIME_WAIT lasts twice the MSL, a packet in one direction can survive for at most one MSL before it is dropped, and any reply to it can survive for at most another MSL, so all packets of the old connection have disappeared by the time TIME_WAIT ends.

The transition from TIME_WAIT to CLOSED is governed by a timeout of 2 * MSL (RFC 793 defines the MSL as 2 minutes, while Linux uses 30 seconds). Once this time has passed, the TCP channel is considered closed.
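The 1 to 4 minute range follows directly from the 2 * MSL rule, as this small Python calculation shows. The MSL values are the two quoted in the text. The names `MSL_SECONDS` and `time_wait_seconds` are made up for the example.

```python
# MSL values quoted in the text, in seconds.
MSL_SECONDS = {
    "RFC 793": 120,
    "Linux": 30,
}


def time_wait_seconds(msl_seconds):
    """TIME_WAIT lasts twice the maximum segment lifetime."""
    return 2 * msl_seconds


if __name__ == "__main__":
    for name, msl in MSL_SECONDS.items():
        print(f"{name}: MSL={msl}s, TIME_WAIT={time_wait_seconds(msl)}s")
```

An MSL of 30 seconds gives a TIME_WAIT of 1 minute, and an MSL of 2 minutes gives 4 minutes.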


TCP: Basic Principles and Application Architecture