Recently at work I have been building a slightly unusual system in which an Android device acts as a server and a PC sends data to it over socket communication.

Until now, "communication" had only ever meant HTTP(S) to me. I wanted to take this opportunity to learn more about it, so for the time being I am studying from the level of "What is TCP/IP?".

As part of that study, I tried implementing an HTTP client on top of socket communication, so I would like to introduce it here. The language is Python.

Before that

For what socket communication is in the first place, please refer to the following page.

Socket programming HOWTO | python

As that page does, this article proceeds on the premise that "socket communication means TCP, at least for now".

I will also briefly summarize what I learned while studying.

A brief description of the TCP and HTTP protocols

Protocol

Before explaining TCP and HTTP, I will briefly touch on the word "protocol" (because I did not understand it well myself).

According to "Professor" Eijiro (the Eijiro dictionary), a protocol is "rules for sending and receiving data between computers".

Communication devices all over the world, and the software that runs on them (including operating systems), are of course manufactured and developed by many different companies and people. Each device and piece of software is developed without coordinating its specifications with the others.

In such a situation, even if someone says "let's exchange data between machines all over the world", nobody can start implementing anything without a common specification. The information needed for implementation is missing: "with what kind of machine?", "how are the radio waves sent?", "what data do the radio waves represent?".

That is where the "rules" called "TCP/IP" came in. As long as a product is developed according to the rules described in TCP/IP, it can send and receive data without the companies having to hold meetings with one another.

And in IT terms, such "rules" are called a "protocol".

By the way, the question of how such a universal protocol was created and spread would make this article long, so I will omit it. It was easy to understand when I looked it up together with the term "OSI reference model".

TCP and HTTP

TCP/IP is a collection of the individual protocols required for devices around the world to communicate.

There are several protocols depending on the communication method and purpose, and they are classified into four layers, from the physical level up to the software level. The first layer, at the bottom, is at the level of "what kind of machine sends the radio waves".

TCP sits in the third of these layers and is a protocol that defines "rules for reliably sending and receiving communication contents between two machines, regardless of what the data is".

The names look alike, but "TCP/IP" and "TCP" refer to different things: TCP is one of the protocols in the collection called TCP/IP.

HTTP, in turn, sits in the fourth layer and is "a set of rules built on top of TCP that further fixes the format and the send/receive timing of data so that they suit website browsing".

As I wrote a little earlier, TCP is a protocol for exchanging data between two machines without loss or excess, so it has no rules about what format the data takes or what application it is used for. Those rules are set by HTTP and by the other protocols in the fourth layer, such as SSH and FTP.

Also, in TCP communication the "client" and the "server" are not so different. The side that first initiates the connection is the "client" and the side that is connected to is the "server", but once connected, both can send and receive data in the same way.
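The symmetry between the two sides can be seen in a small sketch. This is only an illustration and not part of the client built in this article. It assumes Python 3.8 or later (for `socket.create_server`) and uses a throwaway listener on the loopback address with a port chosen by the OS. The names `listener`, `client_side`, and `server_side` are made up for this example.

```python
import socket

# Listen on the loopback address; port 0 lets the OS pick a free port.
listener = socket.create_server(('127.0.0.1', 0))
host, port = listener.getsockname()

# The side that calls connect() is the "client" ...
client_side = socket.create_connection((host, port))
# ... and the side that accepts the connection is the "server".
server_side, _ = listener.accept()

# Once connected, either side may send first, and both can send and receive.
server_side.sendall(b'hello from server')
from_server = client_side.recv(4096)

client_side.sendall(b'hello from client')
from_client = server_side.recv(4096)

print(from_server, from_client)

client_side.close()
server_side.close()
listener.close()
```

Nothing in TCP itself says who speaks first or what the bytes mean. Here the "server" happens to send first, which is something HTTP would not allow.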

This alone shows that HTTP is not all there is to "communication". The rest of this article may be easier to follow with that in mind.

How to proceed

Next, how to proceed with the concrete implementation.

Strictly speaking, I think the right way is to read the RFCs properly, or to think about the design while looking at the implementations of other HTTP clients. But I suspected I would understand it better by proceeding through trial and error and thinking for myself, without looking at anything first. So this time I am proceeding as follows.

Implement a client that can be used in an HTTP-like way, relying only on the HTTP I already know.
Bring it closer to a decent HTTP client while looking at the RFCs and the implementations of existing HTTP client libraries.

This article covers the first part, where I build something while thinking for myself.

Try to implement

So let's start implementing it. The code can also be found on GitHub.

ChooyanHttp - GitHub

Usage image

`http_client.py`


if __name__ == '__main__':
    resp = ChooyanHttpClient.request('127.0.0.1', 8010)
    if resp.responce_code == 200:
        print(resp.body)

First, an image of how the homemade HTTP client will be used. After the host and port are passed in, the aim is to get back an object that holds the response data.

Create a class that does nothing

`http_client.py`


class ChooyanHttpClient:

    def request(host, port=80):
        response = ChooyanResponse()
        return response

class ChooyanResponse:
    def __init__(self):
        self.responce_code = None
        self.body = None

if __name__ == '__main__':

...The following is omitted

Click here for diff

Next, add the ChooyanHttpClient and ChooyanResponse classes to match the usage image above.

They have been added, but they do not do anything yet.

The aim this time is to put the response code and body that result from the request into this response object.

Use the socket module

Next, import the socket module in order to communicate.

`http_client.py`


import socket

class ChooyanHttpClient:

    def request(host, port=80):
        response = ChooyanResponse()

        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, port))

        return response

class ChooyanResponse:

...The following is omitted

Click here for diff

As explained earlier, the fourth-layer HTTP protocol is built by taking the third-layer TCP protocol and adding more rules to it.

The purpose this time is to implement a library that communicates according to the HTTP protocol, so we import the socket module, which communicates over the underlying TCP.

How to use the socket module is briefly described on the [Socket Programming HOWTO](https://docs.python.jp/3/howto/sockets.html) page. This article also proceeds with the implementation while referring to it.

What was added here uses the socket module to establish a connection with the machine at the specified host and port.

When you actually run this, it connects to the specified server. (It is hard to tell, because nothing appears on the screen.)

Try to request

Now, this is where it gets hard.

Earlier, I was able to connect to the machine at the specified host and port using the socket module.

However, as it stands, no data comes back yet. The connection is established, but that is only natural, because the "request" data has not been sent yet.

So next I will write the code that sends the request to the server.

`http_client.py`


import socket

class ChooyanHttpClient:

    def request(host, port=80):
        response = ChooyanResponse()

        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, port))
        # Every line of an HTTP request ends with CRLF; a blank line ends the headers
        request_str = 'GET / HTTP/1.1\r\nHost: %s\r\n\r\n' % (host)
        s.send(request_str.encode('utf-8'))

        return response

class ChooyanResponse:

...The following is omitted

Click here for diff

I added a line that assembles the string to send and a line that passes it to the send() function.

Now (fixed) data can be sent to the server.

When this is run, the request should appear in the access log on the server side.
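As a sketch of what the request has to look like on the wire: each line of an HTTP/1.1 request ends with CRLF (`\r\n`), and an empty line marks the end of the headers. The helper name `build_get_request` and its `path` parameter are hypothetical, introduced only for this illustration.

```python
def build_get_request(host, path='/'):
    """Assemble a minimal HTTP/1.1 GET request as bytes (illustrative helper)."""
    lines = [
        'GET %s HTTP/1.1' % path,  # request line: method, path, version
        'Host: %s' % host,         # the Host header is required in HTTP/1.1
        '',                        # empty line: end of the header section
        '',
    ]
    # Every line is terminated by CRLF, not by a bare LF.
    return '\r\n'.join(lines).encode('utf-8')


request_bytes = build_get_request('127.0.0.1')
print(request_bytes)
```

Building the request from a list of lines makes it harder to mix up `\n` and `\r\n` by hand, and leaves an obvious place to add more headers later.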

Receive a response

Now the client can send (data representing) a GET request to the specified host, but this alone is not yet enough to communicate with the server, because there is no code that corresponds to "receiving".

If you are thinking "I sent a request, so a response will come back, right?", that is because the HTTP client libraries out in the world are built that way. Mine is not built properly yet, so it cannot receive a response.

TCP has no particular rules about when the server and the client send data. In other words, each side can "send whatever data it likes, whenever it likes".

However, if that were all that is decided, servers and clients written by different people would have no way of knowing what kind of data the other side will send at what timing. So that freedom has to be limited to some extent, and both sides need a shared understanding in the form of rules. One such shared understanding is the HTTP protocol.

In other words, HTTP has the rule that "when a request is sent, a response comes back", so the client must implement "after sending a request, wait to receive the response".

The code looks like this.

`http_client.py`


import socket

class ChooyanHttpClient:

    def request(host, port=80):

...abridgement

        s.connect((host, port))
        request_str = 'GET / HTTP/1.1\r\nHost: %s\r\n\r\n' % (host)
        s.send(request_str.encode('utf-8'))
        # Blocks until the server sends some data (at most 4096 bytes are read)
        response.body = s.recv(4096)

...The following is omitted

Click here for diff

I added one line that calls the recv() function. This line blocks the program until the server sends data.
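The blocking behavior of recv() can be observed with a small experiment. This is a sketch under a few assumptions: Python 3.8 or later, a throwaway listener on the loopback address, and a 0.2 second timeout chosen arbitrarily so that the script does not hang. The names `listener`, `client`, and `peer` are made up for this example.

```python
import socket

listener = socket.create_server(('127.0.0.1', 0))
client = socket.create_connection(listener.getsockname())
peer, _ = listener.accept()

# Without a timeout, recv() would wait here forever, because the peer
# has not sent anything. A timeout turns the endless wait into an exception.
client.settimeout(0.2)
try:
    client.recv(4096)
    timed_out = False
except socket.timeout:
    timed_out = True

# As soon as the peer has sent something, recv() returns it.
peer.sendall(b'ping')
received = client.recv(4096)

print(timed_out, received)

for sock in (client, peer, listener):
    sock.close()
```

The client in this article sets no timeout, so its recv() simply waits until the server answers.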

However, this still has problems.

I will omit the details (because I do not understand them properly myself), but in socket communication, data cannot always be received all at once. As mentioned above, socket communication lets you "send whatever data you like, whenever you like", so there is no definition of how much data counts as "one time".

Therefore, as it stands, the program does not know how much data makes up one unit, nor when to disconnect. [^ 1]

The recv() function mentioned earlier does not wait for "everything" either. Once it has received data up to a convenient point (or up to the number of bytes given as its argument), it returns and processing moves on.
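A small experiment makes this visible. It is a sketch, assuming Python 3.8 or later and a throwaway loopback connection. The 10-byte message and the buffer size of 4 are arbitrary values chosen to keep the output short.

```python
import socket

listener = socket.create_server(('127.0.0.1', 0))
client = socket.create_connection(listener.getsockname())
peer, _ = listener.accept()

message = b'0123456789'
peer.sendall(message)

# recv(4) returns at most 4 bytes, even though 10 bytes were sent.
first = client.recv(4)

# The rest has to be collected with further recv() calls.
chunks = [first]
while sum(len(c) for c in chunks) < len(message):
    chunks.append(client.recv(4))
received = b''.join(chunks)

print(first, len(chunks), received)

for sock in (client, peer, listener):
    sock.close()
```

This loop can stop only because the example knows the message length in advance. A real client has no such knowledge, which is the problem the rest of this section deals with.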

In other words, the client code so far can only receive responses of up to 4096 bytes. So let's modify it so that it can receive enough data.

`http_client.py`


import socket

class ChooyanHttpClient:

    def request(host, port=80):
...abridgement

        s.send(request_str.encode('utf-8'))

        data = []
        while True:
            chunk = s.recv(4096)
            data.append(chunk)

        response.body = b''.join(data)
        return response

...The following is omitted

Click here for diff

This receives up to 4096 bytes at a time in an infinite loop and keeps appending the chunks to a list. Concatenating them at the end gives all the data from the server with nothing missing.

However, this is still incomplete. When this code runs, it never leaves the infinite loop, so the result cannot be returned to the caller.

As I wrote earlier, socket communication has no concept of "one time", and nothing marks the end of the communication. This means the program does not know where to end the infinite loop.

As it is, the HTTP characteristic of "send once, get one reply" cannot be realized, so HTTP specifies the size of the data (the body part) with the Content-Length header. The following code adds a mechanism to read it.

`http_client.py`


import socket

class ChooyanHttpClient:

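    def request(host, port=80):

...abridgement
        s.send(request_str.encode('utf-8'))

        headerbuffer = ResponseBuffer()
        allbuffer = ResponseBuffer()
        while True:
            chunk = s.recv(4096)
            if chunk == b'':
                # The server closed the connection; nothing more will arrive
                break
            allbuffer.append(chunk)

            if response.content_length == -1:
                # Content-Length not found yet: keep scanning the header lines
                headerbuffer.append(chunk)
                response.content_length = ChooyanHttpClient.parse_contentlength(headerbuffer)

            # Check after every chunk (not in an else branch), so that a response
            # that arrives in a single recv() does not block on the next recv()
            body = allbuffer.get_body()
            if (response.content_length != -1 and body is not None
                    and len(body) >= response.content_length):
                break

        response.body = allbuffer.get_body()
        response.responce_code = 200  # Fixed for now; the status line is not parsed yet

        s.close()
        return response

    def parse_contentlength(buffer):
        while True:
            line = buffer.read_line()
            # Check for None first: calling startswith() on None would raise
            if line is None:
                return -1
            if line.startswith('Content-Length'):
                return int(line.replace('Content-Length: ', ''))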

class ChooyanResponse:
    def __init__(self):
        self.responce_code = None
        self.body = None
        self.content_length = -1

class ResponseBuffer:
    def __init__(self):
        self.data = b''

    def append(self, data):
        self.data += data

    def read_line(self):
        if self.data == b'':
            return None

        end_index = self.data.find(b'\r\n')
        if end_index == -1:
            ret = self.data
            self.data = b''
        else:
            ret = self.data[:end_index]
            self.data = self.data[end_index + len(b'\r\n'):]
        return ret.decode('utf-8')

    def get_body(self):
        body_index = self.data.find(b'\r\n\r\n')
        if body_index == -1:
            return None
        else:
            return self.data[body_index + len(b'\r\n\r\n'):]

...The following is omitted

Click here for diff

It has gotten long, but I will explain what it does in order.

Find the `Content-Length` line

The HTTP response format specifies the following.

One response header per line
A blank line before the body part

Therefore, each time data is received, the code does the following, in order from the front.

Extract one line (from one line break to the next)
Check whether it starts with Content-Length
If it does, extract only the numeric part

This is how the Content-Length is retrieved.

However, Content-Length describes only the size of the body part. The headers and the first line with the response code are not included.

Therefore, using all the data received so far, the code compares the size of the data after two consecutive line breaks (that is, the body part after the first blank line) with Content-Length.

When the size of the body part matches Content-Length (in the code the check is "greater than or equal to Content-Length", just in case), the loop can exit and the data can be returned to the caller.
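The same check can also be written against the raw bytes alone, which makes it easy to try without a server. The following is a sketch rather than the code in the repository: the function names `split_response`, `find_content_length`, and `is_complete` are hypothetical. Unlike the code above, it compares the header name case-insensitively, because HTTP header names are case-insensitive.

```python
def split_response(raw):
    """Split raw response bytes into (header lines, body).

    Returns (None, None) while the blank line that ends the headers
    has not been received yet.
    """
    head, separator, body = raw.partition(b'\r\n\r\n')
    if not separator:
        return None, None
    return head.decode('iso-8859-1').split('\r\n'), body


def find_content_length(header_lines):
    """Return the Content-Length value, or -1 if the header is absent."""
    for line in header_lines[1:]:  # skip the status line
        name, _, value = line.partition(':')
        if name.strip().lower() == 'content-length':
            return int(value.strip())
    return -1


def is_complete(raw):
    """True once the whole body announced by Content-Length has arrived."""
    header_lines, body = split_response(raw)
    if header_lines is None:
        return False
    length = find_content_length(header_lines)
    return length != -1 and len(body) >= length


sample = b'HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello'
print(is_complete(sample[:-2]), is_complete(sample))
```

In a receive loop, `is_complete` would be called on everything received so far after each recv(), and the loop would stop once it returns True. Responses without a Content-Length header are not handled by this sketch.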

Improve

It can finally send requests and receive responses, but it is still nowhere near usable as an HTTP client.

The request side is terrible: GET method only, root path only, and no request headers apart from Host. The response side just returns the body as a raw byte string, with the response code hard-coded to 200 and the headers not interpreted beyond Content-Length.
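As one example of what is missing, the response code could be read from the first line of the response instead of being fixed. The following is only a sketch of that idea. The function name `parse_status_line` is hypothetical, and it assumes the line has already been decoded to a string and stripped of its trailing CRLF.

```python
def parse_status_line(line):
    """Parse a status line such as 'HTTP/1.1 200 OK' (illustrative helper)."""
    # Split into at most three parts: the reason phrase may contain spaces
    # or be missing entirely.
    parts = line.split(' ', 2)
    version = parts[0]
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ''
    return version, code, reason


print(parse_status_line('HTTP/1.1 404 Not Found'))
```

A malformed status line would raise an exception here. A real client would have to decide how to report that to the caller.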

There is still a lot to do, such as shaping this data properly, changing behavior according to the headers, fine-tuning the send/receive timing, and handling timeouts. But this article has already become quite long, so I would like to do that next time.

Once summarized

For the time being I have only implemented HTTP-client-like processing, but I feel that even this much deepened my understanding of TCP and HTTP. Making an HTTP client library looks like hard work... I wonder how libraries such as `requests` and `urllib` are implemented.

So, to be continued next time.

Reference

Create a simple HTTP server - Qiita

I decided to study in a similar way after reading this article. That article builds an HTTP "server", but much of it was very helpful for building a client as well.

Socket Programming HOWTO | python

It explains socket communication in a very light and easy-to-understand way, which helped me a lot while studying. Although it is a Python document, it is useful regardless of the language.

[^ 1]: I believe this is due to keep-alive (persistent connections), which is the default behavior in HTTP/1.1. If it is disabled, the server disconnects once it has sent all of its data, and the recv() function on the client side returns an empty byte string (length 0), so the client can detect this and break out of the loop.
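The approach in this footnote can be sketched as follows. It is an illustration under stated assumptions: Python 3.8 or later, and a stand-in "server" on the loopback address that sends a fixed response and then closes the connection, which is what a real server is expected to do when the request carries a `Connection: close` header. The function name `recv_until_closed` is hypothetical.

```python
import socket


def recv_until_closed(sock):
    """Collect data until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if chunk == b'':
            # An empty result means the peer has closed its side.
            break
        chunks.append(chunk)
    return b''.join(chunks)


# Stand-in for a server that answers and then disconnects.
listener = socket.create_server(('127.0.0.1', 0))
client = socket.create_connection(listener.getsockname())
peer, _ = listener.accept()
peer.sendall(b'HTTP/1.1 200 OK\r\nConnection: close\r\n\r\nbye')
peer.close()

data = recv_until_closed(client)
print(data)

client.close()
listener.close()
```

This is simpler than counting bytes against Content-Length, but it costs one connection per request and cannot tell a complete response from one that was cut off midway.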

[TCP / IP] After studying, try to make an HTTP client-like with Python
