- I have a 1 Gbps fiber-optic contract at home, but when I downloaded a large file (such as a Linux ISO) over HTTP, I got nowhere near that speed.
- There is a limit to the throughput a single TCP connection can achieve.
- How to check the TCP receive buffer size on Linux (strictly speaking, this is the kernel's buffer size, not the receive window itself):
```
$ cat /proc/sys/net/ipv4/tcp_rmem
4096 87380 6291456
```
From left to right, the values are [min, default, max] in bytes.
The maximum throughput of a connection is bounded by the window size win divided by the round-trip time:

$$T_{max} = \frac{win}{RTT}$$
Therefore, for a single TCP connection to a server with RTT = 30 ms at the default setting, you might expect only about 87380 [byte] × 8 / 30 [ms] ≈ 23.3 [Mbps]. In fact, **as long as there is no congestion**, the window keeps growing, so the connection gets faster than that.
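The arithmetic above can be checked with a few lines of Python. This is just a sketch of T_max = win / RTT using the default and max values of tcp_rmem shown earlier and the assumed RTT of 30 ms; it also computes the window that 1 Gbps would need at that RTT (the bandwidth-delay product).

```python
def max_throughput_bps(window_bytes, rtt_s):
    """Upper bound on single-connection throughput: T_max = win / RTT."""
    return window_bytes * 8 / rtt_s


def required_window_bytes(bandwidth_bps, rtt_s):
    """Window needed to fill a link: the bandwidth-delay product, in bytes."""
    return bandwidth_bps * rtt_s / 8


RTT = 0.030  # 30 ms, as in the example above

default_mbps = max_throughput_bps(87380, RTT) / 1e6   # default tcp_rmem value
max_mbps = max_throughput_bps(6291456, RTT) / 1e6     # max tcp_rmem value
need = required_window_bytes(1e9, RTT)                # window for 1 Gbps

print(f"default window: {default_mbps:.1f} Mbps")
print(f"max window:     {max_mbps:.1f} Mbps")
print(f"window needed for 1 Gbps: {need:.0f} bytes")
```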
Of course, TCP also has a window scale option to support high-bandwidth networks, which lets the window be expanded up to about 1 GB (I don't know whether windows that large are actually used in practice).
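As a rough illustration of where the 1 GB figure comes from: the 16-bit window field in the TCP header is shifted left by the negotiated scale factor, and RFC 7323 caps that shift at 14. The helper below is a hypothetical sketch of that calculation only.

```python
MAX_WINDOW_FIELD = 0xFFFF  # largest value of the 16-bit window field
MAX_SHIFT = 14             # largest window scale shift allowed by RFC 7323


def scaled_window(field, shift):
    """Effective window in bytes for a given window field and scale shift."""
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("window scale shift must be between 0 and 14")
    return field << shift


largest = scaled_window(MAX_WINDOW_FIELD, MAX_SHIFT)
print(largest, "bytes =", largest / 2**30, "GiB")
```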
In theory, bundling N connections gives N times the bandwidth.
wget and curl are the well-known command-line download tools, and tools such as aria2 can use multiple connections (see the Qiita article "Use explosive downloader aria2, which is several times faster than curl and wget"). When opening multiple connections, be careful not to overload the remote server.
HTTP supports Range Requests (RFC 7233, HTTP/1.1: Range Requests). For now, I implemented this in Python, since it is popular.
- For example, suppose you request a 1000-byte file.
- If you send a GET request with 'Range: bytes=0-499' in the header,
- the server adds 'Content-Range: bytes 0-499/1000' to the response header and returns only the first 500 bytes of the file in the body.
- The status code is '206 Partial Content'.
However, some servers do not support Range headers.
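A minimal sketch of the exchange above using only the standard library (urllib.request). The helper names range_header, parse_content_range and fetch_range are hypothetical and not taken from rangedl. A server that ignores Range answers 200 with the whole body, so the sketch treats anything other than 206 as "not supported".

```python
import re
import urllib.request


def range_header(start, end):
    """Value for a Range header requesting bytes start..end (inclusive)."""
    return f"bytes={start}-{end}"


def parse_content_range(value):
    """Parse 'bytes 0-499/1000' into (start, end, total); total is None for '*'."""
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if m is None:
        raise ValueError(f"unexpected Content-Range: {value!r}")
    start, end, total = m.groups()
    return int(start), int(end), None if total == "*" else int(total)


def fetch_range(url, start, end):
    """GET one byte range; raise if the server ignored the Range header."""
    req = urllib.request.Request(url, headers={"Range": range_header(start, end)})
    with urllib.request.urlopen(req) as resp:
        if resp.status != 206:  # 200 means the whole file came back
            raise RuntimeError("server does not support Range requests")
        return parse_content_range(resp.headers["Content-Range"]), resp.read()
```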
Using this feature, we request different parts of the file over multiple TCP connections at the same time.
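To give each connection its own part, the file only has to be cut into inclusive byte ranges matching the Range header format. A sketch, where split_ranges is a hypothetical helper and the chunk size is a free parameter:

```python
def split_ranges(file_size, chunk_size):
    """Split [0, file_size) into inclusive (start, end) byte ranges."""
    if file_size <= 0 or chunk_size <= 0:
        raise ValueError("file_size and chunk_size must be positive")
    return [(start, min(start + chunk_size, file_size) - 1)
            for start in range(0, file_size, chunk_size)]


print(split_ranges(1000, 500))  # [(0, 499), (500, 999)]
print(split_ranges(1000, 300))  # the last range is shorter: (900, 999)
```

Each range can then be queued and handed to whichever connection becomes free next.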
Python's standard library (yes, the standard library!) has a module called selectors that handles the select system call at a higher level; see "18.4. selectors: High-level I/O multiplexing" in the Python 3.6.1 documentation. It monitors and multiplexes multiple sockets. It is used like this:
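```python
# Imagine connections to two TCP echo servers, A and B
import selectors
import socket

# (omitted) address_A and address_B are the (host, port) tuples of the servers

sel = selectors.DefaultSelector()
sock_A = socket.create_connection(address_A)
sock_B = socket.create_connection(address_B)
sel.register(sock_A, selectors.EVENT_READ)
sel.register(sock_B, selectors.EVENT_READ)

sock_A.sendall('Hello'.encode())  # send something to A
sock_B.sendall('Hello'.encode())  # send something to B

# Block until any registered socket is readable, then read from it
while sel.get_map():  # stop once every socket has been unregistered
    events = sel.select()
    for key, mask in events:
        message = key.fileobj.recv(512)
        if not message:  # b'' means the server closed the connection
            sel.unregister(key.fileobj)
            key.fileobj.close()
            continue
        print(message.decode())
```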
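Because the server addresses are omitted above, here is a self-contained variant that runs anywhere: socket.socketpair() stands in for the two echo servers (an assumption made purely so the example needs no network). The data argument of register() tags each socket so the loop knows which "server" replied.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
pairs = {name: socket.socketpair() for name in ("A", "B")}

# Register the reading end of each pair, tagged with its name
for name, (reader, _writer) in pairs.items():
    sel.register(reader, selectors.EVENT_READ, data=name)

# The writing ends play the role of servers replying in arbitrary order
pairs["B"][1].sendall(b"Hello from B")
pairs["A"][1].sendall(b"Hello from A")

received = {}
while len(received) < len(pairs):
    for key, mask in sel.select(timeout=1):
        received[key.data] = key.fileobj.recv(512)

for name in sorted(received):
    print(name, received[name].decode())

sel.close()
for reader, writer in pairs.values():
    reader.close()
    writer.close()
```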
- The pieces of the file come back separately and cannot all be kept in memory, so write them to the file sequentially, starting from the part where the order is already complete.
- It is not wise to keep using a poorly performing TCP connection, so evaluate each connection, discard the poor performers, replace them with new connections, and resend their requests.
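The first point can be sketched as a small reorder buffer: chunks that arrive ahead of their turn wait in memory, and everything contiguous with what has already been written is flushed at once. OrderedWriter is a hypothetical class for illustration, not rangedl's implementation. Memory use is bounded by how far ahead of the slowest pending chunk the other connections get, which is likely one reason replacing slow connections matters.

```python
import io


class OrderedWriter:
    """Buffer out-of-order chunks and write them once they are contiguous."""

    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.next_offset = 0  # first byte not yet written to the file
        self.pending = {}     # offset -> bytes that arrived ahead of order

    def add(self, offset, data):
        self.pending[offset] = data
        # Flush every chunk that now starts exactly at next_offset
        while self.next_offset in self.pending:
            chunk = self.pending.pop(self.next_offset)
            self.fileobj.write(chunk)
            self.next_offset += len(chunk)


buf = io.BytesIO()  # stands in for the output file
w = OrderedWriter(buf)
w.add(5, b"world")  # arrives first, held in memory
w.add(0, b"hello")  # fills the gap, so both chunks are written
print(buf.getvalue())
```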
https://github.com/johejo/rangedl (it still has some bugs)
Environment: Python 3.6.1
```
$ pip install git+http://github.com/johejo/rangedl.git
$ rangedl [URL] -n [NUM_OF_CONNECTION] -s [SPLIT_SIZE_MB]
```
- By default, tqdm displays a progress bar; the -p option hides it.
- For safety, the number of connections cannot exceed 10.
- If the split size given by the option is smaller than 'file size / number of connections', 'file size / number of connections' is forcibly used as the split size instead.
- Depending on the line conditions, I was able to download at about 200 Mbps.
- With split_size set to 1 MB, memory usage is about 30-80 MB.
- CPU usage is high; I wonder whether that is unavoidable...