Recently at work I have been building a slightly unusual mechanism that uses socket communication to send data from a PC to an Android device acting as a server.
Until now, "communication" meant only HTTP(S) to me, but I wanted to take this opportunity to learn more about it, so for now I am studying from the level of "What is TCP/IP?".
As part of that study, I tried implementing an HTTP client using socket communication, so I would like to introduce it here. The language is Python.
For what socket communication is in the first place, please refer to the following page.
Socket Programming HOWTO | python
As that page does, this article assumes that "socket communication means TCP, for now".
I will also briefly summarize what I learned along the way.
Before explaining TCP and HTTP, I will briefly touch on the word "protocol" (because I didn't understand it well myself).
According to the Eijiro dictionary, a protocol is "rules for sending and receiving data between computers".
The communication devices that exist all over the world, and the software that runs on them (including the OS), are of course manufactured and developed by many different companies and people. And each device and piece of software is developed without coordinating specifications with the others.
In such a situation, even if someone says "let's exchange data between machines all over the world", nobody can start working without a common specification, because the information needed for implementation is missing: what kind of machine sends the signal, how it sends it, and what data the signal represents.
That is why the set of "rules" called TCP/IP was born. As long as a product is developed according to the rules described in TCP/IP, it can send and receive data without its maker having to hold meetings with every other company.
In IT terms, such "rules" are called a "protocol".
As for how such a universal protocol was created and spread, I will skip that question because it would get long. It was easier to understand when I looked it up together with the term "OSI reference model".
TCP/IP is a collection of the individual protocols that devices around the world need in order to communicate.
There are several kinds of protocols depending on the communication method and purpose, and they are grouped into four layers, from the physical level up to the software level. The first layer, at the bottom, is at the level of "what kind of machine sends what kind of signal".
TCP is located in the third of those layers, and is a protocol that defines "rules for reliably transmitting and receiving communication contents between two machines, regardless of what the data is".
The names look alike, but "TCP/IP" and "TCP" sit at different levels: TCP is one of the protocols in the collection called TCP/IP.
HTTP, in turn, is located in the fourth layer, and is "a set of rules built on top of TCP that fixes the format and the send/receive timing of data for browsing websites".
As I wrote above, TCP is a protocol for exchanging data between two machines without loss or duplication, so it does not care what format the data has or what application uses it. Those rules are decided by HTTP and by the other fourth-layer protocols, such as SSH and FTP.
In TCP communication, the "client" and the "server" are not very different. The side that initiates the connection is the "client" and the side that accepts it is the "server", but once connected, both can send and receive data in the same way.
From this alone you can see that HTTP is not all there is to "communication". The rest of this article may be easier to follow with that in mind.
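To make the client/server symmetry a little more concrete, here is a minimal sketch that runs both sides on one machine. The names `run_server` and `demo` are made up for this illustration, and it assumes the loopback address 127.0.0.1 is available. The only difference between the two sides is who calls `accept()` and who calls `connect()`; after that, both simply send and receive.

```python
import socket
import threading


def run_server(listener, received):
    # The "server" is just the side that waits for a connection
    conn, _ = listener.accept()
    with conn:
        received.append(conn.recv(1024))    # receive from the client
        conn.sendall(b'hello from server')  # and send, exactly as the client does


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    received = []
    t = threading.Thread(target=run_server, args=(listener, received))
    t.start()

    # The "client" is the side that initiates the connection
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(('127.0.0.1', port))
    client.sendall(b'hello from client')
    reply = client.recv(1024)
    client.close()

    t.join()
    listener.close()
    return received[0], reply
```

Calling `demo()` returns what the server received and what the client received, in that order.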
Next, how I am going to proceed with the actual implementation.
Strictly speaking, the right approach would probably be to read the RFCs properly, or to design while studying how other HTTP clients are implemented. But I figured I would learn more by guessing and working through trial and error without looking at anything first. So this time I start from the HTTP I already know and try to build a client while thinking it through on my own. This article covers that first part.
So let's start implementing it. The code can also be found on GitHub.
http_client.py
if __name__ == '__main__':
    resp = ChooyanHttpClient.request('127.0.0.1', 8010)
    if resp.responce_code == 200:
        print(resp.body)
First, here is how I imagine using my own HTTP client. You pass the host and port, and the goal is to get back an object that holds the response data.
http_client.py
class ChooyanHttpClient:

    @staticmethod
    def request(host, port=80):
        # Nothing is sent yet; just return an empty response object
        response = ChooyanResponse()
        return response


class ChooyanResponse:
    def __init__(self):
        self.responce_code = None
        self.body = None


if __name__ == '__main__':
    ...  # the rest is omitted
Next, I add the ChooyanHttpClient and ChooyanResponse classes to match the usage shown above.
They are there now, but they do not do anything yet.
This time, the goal is to get the response code and body of the request result into this response object.
Then I add the socket module to communicate.
http_client.py
import socket

class ChooyanHttpClient:

    @staticmethod
    def request(host, port=80):
        response = ChooyanResponse()
        # Open a TCP (SOCK_STREAM) socket over IPv4 (AF_INET) and connect
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, port))
        return response

class ChooyanResponse:
    ...  # the rest is omitted
As explained earlier, the fourth-layer HTTP protocol is built by using the third-layer TCP protocol and adding more rules on top.
Since the goal this time is to implement a library that communicates according to HTTP, I import the socket module, which handles the underlying TCP communication.
How to use the socket module is briefly described on the [Socket Programming HOWTO](https://docs.python.jp/3/howto/sockets.html) page. This article follows it as a reference while implementing.
What I added here uses the socket module to establish a connection to the machine at the specified host and port.
When you actually run this, it connects to the specified server. (It is hard to tell, though, because nothing appears on the screen.)
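Since nothing appears on the screen, one way to confirm that the connection really was established is to ask the socket who it is connected to. The following is a small sketch under that assumption; `connect_and_describe` is a hypothetical helper that is not part of the article's client, and the 5-second timeout is an arbitrary choice.

```python
import socket


def connect_and_describe(host, port=80, timeout=5.0):
    """Open a TCP connection and return the (address, port) of the peer we reached."""
    # create_connection resolves the host and performs the TCP handshake;
    # it raises OSError if nothing is listening or the timeout expires
    with socket.create_connection((host, port), timeout=timeout) as s:
        return s.getpeername()[:2]
```

For example, `print(connect_and_describe('127.0.0.1', 8010))` prints the address and port only if a server is actually listening there.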
Well, the hard part starts here.
Earlier I was able to connect to the machine at the specified host and port using the socket module.
However, as it stands, no data comes back yet. The connection is established, but that is only natural because we have not sent any "request" data.
So now I'll write the code to send the request to the server.
http_client.py
import socket

class ChooyanHttpClient:

    @staticmethod
    def request(host, port=80):
        response = ChooyanResponse()
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, port))

        # Every line of an HTTP request ends with CRLF; a blank line ends the headers
        request_str = 'GET / HTTP/1.1\r\nHost: %s\r\n\r\n' % (host)
        s.send(request_str.encode('utf-8'))
        return response

class ChooyanResponse:
    ...  # the rest is omitted
I added a line that calls the send() function and, before it, a line that assembles the string to pass to it.
Now you can send (fixed) data to the server.
When you run it, the request should appear in the access log on the server side.
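The request string is easy to get subtly wrong, because HTTP/1.1 expects every line to end with CRLF (`\r\n`) and the header block to end with an empty line. As a sketch, the assembly could be pulled out into a small function; `build_get_request` and its `path` parameter are hypothetical and not part of the article's code.

```python
def build_get_request(host, path='/'):
    """Assemble a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        'GET %s HTTP/1.1' % path,  # request line: method, path, version
        'Host: %s' % host,         # the Host header is mandatory in HTTP/1.1
        '',                        # empty line: end of the header block
        '',
    ]
    return '\r\n'.join(lines).encode('ascii')
```

It would be used as `s.sendall(build_get_request(host))`. Unlike `send()`, which may transmit only part of the data, `sendall()` keeps sending until everything has been written or an error occurs.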
Now you can send (data representing) a GET request to the specified host, but this alone is not enough to communicate with the server. There is no code yet for the "receive" side.
If you are thinking, "I sent a request, so a response comes back, right?", that is because off-the-shelf HTTP client libraries are built to work that way. Mine is not finished yet, so it cannot receive a response.
TCP itself has no particular rules about when the server and client send data. In other words, each side can send whatever data it likes, whenever it likes.
But if that were all, servers and clients written by different people would have no way of knowing what data will arrive at what time, so that freedom has to be restricted to some extent and a shared understanding laid down as rules. One such shared understanding is the HTTP protocol.
In other words, HTTP has the rule that "when you send a request, a response is returned", so the client must implement "after sending a request, wait to receive the response".
The code looks like this.
http_client.py
import socket

class ChooyanHttpClient:

    @staticmethod
    def request(host, port=80):
        ...  # omitted
        s.connect((host, port))
        request_str = 'GET / HTTP/1.1\r\nHost: %s\r\n\r\n' % (host)
        s.send(request_str.encode('utf-8'))
        # Block until the server sends something (at most 4096 bytes)
        response.body = s.recv(4096)
        ...  # the rest is omitted
I added one line that calls the recv() function. This line blocks the program until the server sends data.
However, this still has problems.
I will skip the details (because I do not understand them properly), but with sockets you cannot count on receiving the data all at once. As mentioned above, socket communication lets each side "send whatever it likes whenever it likes", so there is no fixed notion of how much data makes up "one" message.
Therefore, as it stands, the program does not know where one chunk of data ends, and does not know when to disconnect. [^1]
The recv() function mentioned earlier also returns once it has received some data (at most the number of bytes given as its argument), not "all" of it, and the program moves on to the next step.
In other words, this code can only accept a response of up to 4096 bytes. So I modify the code so that it can receive enough data.
http_client.py
import socket

class ChooyanHttpClient:

    @staticmethod
    def request(host, port=80):
        ...  # omitted
        s.send(request_str.encode('utf-8'))

        data = []
        while True:
            # Receive up to 4096 bytes at a time and keep every chunk
            chunk = s.recv(4096)
            data.append(chunk)

        response.body = b''.join(data)
        return response

...  # the rest is omitted
It keeps receiving up to 4096 bytes at a time in an infinite loop and appends each chunk to a list. Joining the chunks at the end gives all the data from the server, with nothing missing.
However, this is still incomplete. When I run this code, it never leaves the infinite loop, so it cannot return the result to the caller.
As I wrote earlier, socket communication has no concept of "one message", and nothing marks the end of the communication. This means the program does not know when to leave the loop.
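The loop becomes easy to end as soon as the expected size is known in advance. As a sketch of that idea only (`recv_exactly` is a hypothetical helper, not part of the article's client), the following keeps calling `recv()` until a given number of bytes has arrived:

```python
import socket


def recv_exactly(sock, size):
    """Keep calling recv() until exactly `size` bytes have arrived."""
    chunks = []
    remaining = size
    while remaining > 0:
        # Never ask for more than is still missing
        chunk = sock.recv(min(4096, remaining))
        if not chunk:
            # An empty result means the other side closed the connection
            raise ConnectionError('connection closed with %d bytes missing' % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)
```

The open question is then only where the size comes from, which is the gap that a protocol on top of TCP has to fill.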
That makes HTTP's characteristic behavior, "send once, get one reply", impossible to implement, so HTTP specifies the size of the data (the body part) in the Content-Length header. The following code adds a mechanism to read it.
http_client.py
import socket

class ChooyanHttpClient:

    @staticmethod
    def request(host, port=80):
        ...  # omitted
        s.send(request_str.encode('utf-8'))

        headerbuffer = ResponseBuffer()
        allbuffer = ResponseBuffer()
        while True:
            chunk = s.recv(4096)
            if not chunk:
                # The server closed the connection; nothing more will arrive
                break
            allbuffer.append(chunk)

            if response.content_length == -1:
                # Keep looking for the Content-Length header
                headerbuffer.append(chunk)
                response.content_length = ChooyanHttpClient.parse_contentlength(headerbuffer)

            if response.content_length != -1:
                # Stop once the whole body has arrived
                body = allbuffer.get_body()
                if body is not None and len(body) >= response.content_length:
                    break

        response.body = allbuffer.get_body()
        response.responce_code = 200
        s.close()
        return response

    @staticmethod
    def parse_contentlength(buffer):
        while True:
            line = buffer.read_line()
            if line is None:
                # No more lines to read yet
                return -1
            if line.startswith('Content-Length'):
                return int(line.replace('Content-Length: ', ''))

class ChooyanResponse:
    def __init__(self):
        self.responce_code = None
        self.body = None
        self.content_length = -1

class ResponseBuffer:
    def __init__(self):
        self.data = b''

    def append(self, data):
        self.data += data

    def read_line(self):
        # Return the next line (without CRLF) and remove it from the buffer
        if self.data == b'':
            return None
        end_index = self.data.find(b'\r\n')
        if end_index == -1:
            ret = self.data
            self.data = b''
        else:
            ret = self.data[:end_index]
            self.data = self.data[end_index + len(b'\r\n'):]
        return ret.decode('utf-8')

    def get_body(self):
        # The body starts after the first blank line (two consecutive CRLFs)
        body_index = self.data.find(b'\r\n\r\n')
        if body_index == -1:
            return None
        else:
            return self.data[body_index + len(b'\r\n\r\n'):]

...  # the rest is omitted
That was long, so let me explain step by step what it is doing.
The format of an HTTP response is fixed: the first line carries the response code, the header lines (including the Content-Length line) follow, and the body comes after a blank line.
Therefore, each time data is received, the code first reads it line by line from the front, looks for the line that starts with Content-Length, and extracts only the numeric part from it. That gives us the Content-Length value.
However, Content-Length only describes the size of the body. The headers and the first line with the response code are not included.
So, using all the data received so far, the code compares the size of the data after the two consecutive line breaks (that is, the body after the first blank line) with Content-Length.
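The comparison described here can also be written as a standalone check on the raw bytes received so far. This is only a sketch: `parse_content_length` and `is_complete` are hypothetical names, and unlike the listing above it matches the header name case-insensitively and waits for the blank line before trusting any header value.

```python
def parse_content_length(head):
    """Return the Content-Length value from a complete header block, or -1."""
    for line in head.split(b'\r\n')[1:]:  # skip the first (status) line
        name, sep, value = line.partition(b':')
        # Header names are case-insensitive
        if sep and name.strip().lower() == b'content-length':
            return int(value.strip())
    return -1


def is_complete(raw):
    """True once the headers and Content-Length bytes of body are all in `raw`."""
    head, sep, body = raw.partition(b'\r\n\r\n')
    if not sep:
        return False  # the blank line has not arrived yet
    length = parse_content_length(head)
    return length != -1 and len(body) >= length
```

A receive loop could then call `is_complete()` on the accumulated bytes after every `recv()` and stop as soon as it returns `True`.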
With this, once the size of the body matches Content-Length (in the code, "greater than or equal to", just to be safe), the loop can exit and the data can be returned to the caller.
We can finally send a request and receive a response, but this is still of little use as an HTTP client.
The request side is poor: it is limited to the GET method and the root path, with no request headers other than Host. And the response side just hands back raw bytes, without properly interpreting the response code or the headers.
There is still a lot to do, such as formatting this data, changing behavior according to the headers, fine-tuning the send/receive timing, and handling timeouts, but this article has already become quite long, so I will leave that for next time.
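As a rough idea of what "formatting the data" could look like, here is a sketch that splits a complete raw response into its three parts. `parse_response` is a hypothetical function, not the implementation planned for the next article, and it assumes the whole response has already been received.

```python
def parse_response(raw):
    """Split a complete raw response into (status_code, headers, body)."""
    head, _, body = raw.partition(b'\r\n\r\n')
    lines = head.decode('iso-8859-1').split('\r\n')

    # The first line looks like: HTTP/1.1 200 OK
    version, status_code, *reason = lines[0].split(' ')

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(':')
        # Store names in lower case because header names are case-insensitive
        headers[name.strip().lower()] = value.strip()

    return int(status_code), headers, body
```

For the timeout item, the standard library already offers `socket.settimeout()`, which makes a blocked `recv()` raise an exception after the given number of seconds.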
For now I have only implemented something HTTP-client-like, but I feel that even this deepened my understanding of TCP and HTTP. Making an HTTP client library is hard... I wonder how libraries such as requests and urllib are implemented.
So, to be continued next time.
For reference: I decided to study this way after reading this article. That article builds an HTTP "server", but much of its content was very helpful for building a client too.
This explains socket communication in a very light and easy-to-understand way, which helped me a lot while studying. Although it is a Python document, it is useful regardless of language.
[^1]: I believe this is because of keep-alive (persistent connections), which is the default behavior in HTTP/1.1. If you disable it, the server closes the connection once it has sent all the data, and the client's recv() returns an empty byte string (length 0), so you can detect that and break out of the loop.
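The approach from the footnote can be sketched as follows, assuming the request carries a `Connection: close` header so that the server closes the connection after its response. `recv_until_closed` is a hypothetical helper and is not part of the article's client.

```python
import socket


def recv_until_closed(sock):
    """Read until the peer closes the connection, then return everything."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # b'' means the other side has closed the connection
            break
        chunks.append(chunk)
    return b''.join(chunks)
```

This needs no Content-Length at all, at the cost of opening a new connection for every request.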