Overview
- I wanted to record Radiko, and when I looked into it, I found that various people had already written code for it.
- I could have just run git clone and used one of those, but I wanted to know what processing was involved, so I wrote my own code while reading my predecessors' code.
- I learned several things along the way, so I summarized them here:
  - HLS (HTTP Live Streaming), the protocol used by Radiko
  - What happens when you listen to a program on Radiko
  - How to investigate processing whose specification is not published
  - What I gained from writing up how I investigated
Environment
- OS: macOS 10.14.6
- Docker Desktop Community: 2.1.0.5
- Docker image: python:3.7.5-buster
- Google Chrome: 78.0.3904.108 (Official Build, 64-bit)
What I made
https://github.com/1021ky/radiko_recorder
What I found
HLS, the protocol used by Radiko
- Radiko uses HLS (short for HTTP Live Streaming), a protocol for distributing media such as video and audio.
- It seems that Radiko used Flash until a few years ago, but moved to HLS because Flash was scheduled to be discontinued in 2020.
- HLS is specified in RFC 8216.
- The server provides media files split into segments, plus playlists that describe how to play them. There are two types of playlist:
  - Master Playlist, which describes the distribution as a whole
  - Media Playlist, which describes the individual segment files
- A Playlist is a UTF-8 text file with the extension .m3u8 or .m3u and can be referenced by URI.
- The client reads the Playlist, obtains from it the URIs of the media files and the information needed to play them, and then plays them.
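The client-side flow described above can be sketched with the standard library alone. This is a minimal illustration, not Radiko's actual playlists: the playlist contents, the example.com URIs, and the function names are all assumptions made up for the example. It relies only on the RFC 8216 rule that lines starting with `#` are tags or comments and every other non-blank line is a URI.

```python
from urllib.parse import urljoin

# Hypothetical playlists for illustration; real ones are fetched from the server.
MASTER_PLAYLIST = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=52973,CODECS="mp4a.40.5"
media/playlist.m3u8
"""

MEDIA_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:5
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:5.0,
segment0.aac
#EXTINF:5.0,
segment1.aac
#EXT-X-ENDLIST
"""


def is_master_playlist(playlist_text):
    """A Master Playlist contains EXT-X-STREAM-INF tags; a Media Playlist does not."""
    return "#EXT-X-STREAM-INF" in playlist_text


def list_uris(playlist_text, base_uri):
    """Return the absolute URIs listed in a playlist.

    Lines starting with '#' are tags or comments; every other
    non-blank line is a URI, possibly relative to the playlist's own URI.
    """
    lines = [line.strip() for line in playlist_text.splitlines()]
    if not lines or lines[0] != "#EXTM3U":
        raise ValueError("not an M3U8 playlist")
    return [
        urljoin(base_uri, line)
        for line in lines
        if line and not line.startswith("#")
    ]


# Step 1: the Master Playlist points at one or more Media Playlists.
master_uri = "https://example.com/stream/master.m3u8"  # hypothetical URI
media_uris = list_uris(MASTER_PLAYLIST, master_uri)

# Step 2: the Media Playlist points at the media segments to download and play.
segment_uris = list_uris(MEDIA_PLAYLIST, media_uris[0])
print(segment_uris)
```

A real client would fetch each playlist over HTTP and, for a live stream, reload the Media Playlist periodically; both are left out here to keep the sketch self-contained.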
Outline of the processing performed when listening to a program on Radiko
It turned out that the client and the server communicate over HTTP as shown in the figure below.
How I found out
Roughly, it went like this. I couldn't proceed smoothly in this order, and I went back and forth several times.
- Read code written by other people → learned that authorization and audio file retrieval are required
- Wrote the code → calling Radiko's API returned an HTTP 40X status code, so I knew something was missing
- Inspected the actual communication with Google Chrome's developer tools → saw the concrete requests and understood what was missing
- After writing the code, wrote up what I had done → there were terms I could not explain well → learned that HLS and m3u8 are specified by an RFC, not by Radiko's own specification
I read the code in the following articles.
1. Simple Radiko Recording Script
2. [Python] Play Radiko
3. I made software that automatically searches and records radiko, super A & G, sound spring and sound with Python3
4. Road to radiko recording (download) Part 1
Since 1 was written as a shell script, I googled the commands I didn't understand and quickly learned which packages I needed.
- rtmpdump, to fetch the audio files played in Flash
- ffmpeg, to convert the fetched file to MP3
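As a rough sketch of the conversion step, ffmpeg can be driven from Python with `subprocess`. The file names and function names here are hypothetical, and the sketch assumes an ffmpeg build that includes the `libmp3lame` encoder; it is not taken from the scripts in the articles above.

```python
import shutil
import subprocess


def build_ffmpeg_command(source, destination):
    """Build an ffmpeg command line that re-encodes the audio of source as MP3."""
    return [
        "ffmpeg",
        "-y",                   # overwrite the output file if it already exists
        "-i", source,           # input file or URL
        "-vn",                  # ignore any video stream
        "-c:a", "libmp3lame",   # encode audio with the LAME MP3 encoder
        destination,
    ]


def convert_to_mp3(source, destination):
    """Run ffmpeg if it is installed; raise a clear error otherwise."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg is not installed or not on PATH")
    subprocess.run(build_ffmpeg_command(source, destination), check=True)


# Hypothetical file names, for illustration only.
command = build_ffmpeg_command("program.aac", "program.mp3")
print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a file name contains spaces.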
However, I wasn't sure why the preprocessing in that shell script was done or what it was doing. (My shell skills are lacking.)
Articles 2 and 3 were written in Python, which I usually use, so they were easy to understand, and I found that the part of the shell script I hadn't understood was the authorization process.
From 4, I learned how to work out the logic used to generate the partial key, which is specific to Radiko.
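This article does not spell out the partial key logic, so the following is only a sketch under a stated assumption: that the partial key is a base64-encoded slice of a longer key, selected by an offset and a length supplied by the server. The function name, the key, and the numbers are all hypothetical.

```python
import base64


def make_partial_key(full_key, offset, length):
    """Base64-encode the slice full_key[offset:offset + length].

    Assumption: the server tells the client which offset and length to use.
    """
    if offset < 0 or length <= 0 or offset + length > len(full_key):
        raise ValueError("offset/length fall outside the key")
    return base64.b64encode(full_key[offset:offset + length]).decode("ascii")


# Hypothetical values, for illustration only.
example_key = b"0123456789abcdef"
partial_key = make_partial_key(example_key, offset=4, length=8)
print(partial_key)
```

Validating the offset and length up front is a deliberate choice: a silent short slice would produce a key the server rejects, which is harder to debug than an explicit error.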
Now that I had the packages and libraries I needed, I wrote the code.
Then, as mentioned above, calling Radiko's API returned an HTTP 40X status code, so I used Chrome's developer tools to see what the browser does at the point where my code failed.
The following is from when I investigated the authorization process.
Looking at the request contents, I could see when and where request headers starting with X-Radiko-*** were needed. I had overlooked them when I skimmed the code. Once I added them, my code finally started to work.
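One way to make this comparison less error-prone is to copy the request headers from the developer tools' Network tab and diff them against what the script sends. The sketch below does that; the header names and values are placeholders invented for the example, not Radiko's real headers.

```python
def missing_headers(browser_headers, script_headers):
    """Return the names of headers the browser sent but the script did not.

    HTTP header names are case-insensitive, so they are compared lowercased.
    """
    sent = {name.lower() for name in script_headers}
    return sorted(name for name in browser_headers if name.lower() not in sent)


# Hypothetical header names and values, copied by hand from the Network tab.
browser_headers = {
    "User-Agent": "Mozilla/5.0",
    "X-Radiko-Example-A": "value-a",
    "X-Radiko-Example-B": "value-b",
}
# Headers the script currently sends.
script_headers = {"user-agent": "python-requests"}

missing = missing_headers(browser_headers, script_headers)
print(missing)
```

Any name in the result is a candidate explanation for a 40X response and is worth adding to the script's request one at a time.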
What I gained from writing this article
While writing this up, I wondered again: what is HLS, and what is m3u8? When I looked them up, I found that they are defined in an RFC. And if they are defined in an RFC, I figured there might be a library that makes them easy to handle.
The m3u8 library was easy to use and cleaned up my messy code.
Also, reading the RFC and understanding the terms made it easier to name methods and variables. For example, before I read the RFC, the variables for the Master Playlist and Media Playlist had haphazard names that were unclear when I read them again.
A book I read recently, Isao Ueda's "Principles of Programming: 101 principles you want to learn by your third year that will be useful for a lifetime", contains the saying, "Good programmers write good code, great programmers borrow good code."
When I first read it, I only vaguely understood it, but this time I feel I have come to understand more deeply that knowing what is standard leads to leveling up.
Summary
- I learned about HLS and how it is implemented by writing a Radiko recording script
- I experienced first-hand the benefits of knowing what is standard
  - Libraries are easy to find (no need to build extra things yourself)
  - Implementation becomes easier
- Reading and comparing code that implements the same function was a good opportunity to think about what makes code easy to read.