Preface

There are many articles on how to extract the video ID from a YouTube URL, but I felt that none of them covered everything: shortened URLs starting with https://youtu.be/ (generated when you press the "Share" button), and URLs that include query parameters (for example, t=15, which specifies the start time, or feature=youtu.be, which indicates a redirect from the shortened URL). So I am writing it down here as a memo.

By the way, the YouTube query parameter t, which indicates the playback start position, can of course be given in seconds, as in https://youtu.be/r4Mkv-q4NmQ?t=5437 or https://youtu.be/r4Mkv-q4NmQ?t=5437s. But it can also be written like https://youtu.be/r4Mkv-q4NmQ?t=1h30m37s: if you specify ◯h△m□s, playback starts from "◯ hours △ minutes □ seconds"!

Of course, you can also omit ◯h and write just △m□s.
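To make the t formats above concrete, here is a rough sketch that converts the value of t into seconds. The helper name start_seconds and the regular expression are assumptions made up for this illustration; they are not part of the code in this article, and they only cover the three notations shown above.

```python
import re
import urllib.parse

# Hypothetical helper (not part of the article's code).
# Accepts "5437", "5437s", "30m37s" and "1h30m37s" style values.
_T_PATTERN = re.compile(r'(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s?)?')

def start_seconds(url):
    """Return the start position given by the t parameter in seconds, or None."""
    query = urllib.parse.urlparse(url).query
    values = urllib.parse.parse_qs(query).get('t')
    if not values:
        return None  # no t parameter in the URL
    match = _T_PATTERN.fullmatch(values[0])
    if match is None or not any(match.groups()):
        return None  # a notation this sketch does not cover
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds

for url in ['https://youtu.be/r4Mkv-q4NmQ?t=5437',
            'https://youtu.be/r4Mkv-q4NmQ?t=5437s',
            'https://youtu.be/r4Mkv-q4NmQ?t=1h30m37s']:
    print(start_seconds(url))
```

All three URLs should print 5437, since 1 hour 30 minutes 37 seconds is 3600 + 1800 + 37 seconds.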

The YouTube URLs in this article are basically my own posted videos or my own channel!

Source code

Works with Python 3. Python 2 has no `urllib.parse` module; the same functions live in the `urlparse` module there.

import urllib.parse
import re

##############################################################
## Extract YouTube video IDs from a list of URLs
## Supports regular URLs and shortened URLs. An error message is printed for unsupported URLs
## Argument: list of URLs
## Return value: list of extracted video IDs
##############################################################
def pick_up_vid_list(url_list):
  vid_list = []
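  # re.escape makes '.' and '?' match literally instead of acting as regex metacharacters
  pattern_watch = re.escape('https://www.youtube.com/watch?')
  pattern_short = re.escape('https://youtu.be/')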

  for i, url in enumerate(url_list):
    #When using a normal URL
    if re.match(pattern_watch,url):
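      yturl_qs = urllib.parse.urlparse(url).query
      # parse_qs returns a dict of lists; a URL without "v" would otherwise raise KeyError
      v_values = urllib.parse.parse_qs(yturl_qs).get('v')
      if v_values:
        vid_list.append(v_values[0])
      else:
        print('error:\n  no "v" parameter in item ' + str(i+1) + ': ' + url)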

    #For shortened URLs
    elif re.match(pattern_short,url):
      # The 11 characters following "https://youtu.be/" (17 characters long) are the video ID
      vid = url[17:28]
      vid_list.append(vid)

    else:
      print('error:\n  The URL must start with "https://www.youtube.com/watch?" or')
      print('  "https://youtu.be/".')
      print('  - item ' + str(i+1) + ': ' + url)
  return vid_list

Brief commentary

For regular URLs that start with https://www.youtube.com/watch?, the video ID is the value of the v parameter in the URL query, so I extract it with urllib.parse! For shortened URLs that start with https://youtu.be/, the 11 characters following https://youtu.be/ are always the video ID, so I slice them out by position! Any query string such as ?t=283 comes after those 11 characters, so it is simply ignored.
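As a variation on the approach described above, here is a rough sketch that lets urllib.parse split both kinds of URL instead of slicing by position. The function name pick_up_vid is a placeholder made up for this example, and the host check assumes that only www.youtube.com and youtu.be need to be handled.

```python
import urllib.parse

def pick_up_vid(url):
    """Return the video ID from a watch URL or a shortened URL, else None.

    Hypothetical single-URL variant; not the function used in this article.
    """
    parts = urllib.parse.urlparse(url)
    if parts.netloc == 'www.youtube.com' and parts.path == '/watch':
        # Regular URL: the ID is the first value of the "v" query parameter.
        return urllib.parse.parse_qs(parts.query).get('v', [None])[0]
    if parts.netloc == 'youtu.be':
        # Shortened URL: the ID is the path without its leading slash.
        # The query string (?t=..., ?feature=...) is already split off.
        return parts.path[1:] or None
    return None

print(pick_up_vid('https://www.youtube.com/watch?v=2k-uF-QPcEM&t=5'))
print(pick_up_vid('https://youtu.be/biaC_2Mx7Mw?t=283'))
```

This prints 2k-uF-QPcEM and biaC_2Mx7Mw. Because the shortened form takes the whole path, this variant does not rely on the ID being exactly 11 characters long.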

I was worried that IDs might someday carry over to 12 characters, and thought I would have to find the ID with a regular expression, but apparently that is not a concern. → [About the risk of the v value of YouTube being carried - Nipotan Research Institute](http://blog.livedoor.jp/nipotan/archives/50588074.html)

Also, according to that article, video IDs seem to be made up of [0-9], [a-z], [A-Z], - and _. According to "[Characters that can be used in URLs, characters that cannot be used](https://www.ipentec.com/document/web-url-invalid-char)", it seems that characters other than these cannot be used, so I expect that YouTube will not add new character types and, if IDs ever run short, will increase the number of digits instead.
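If you want to check that assumption in code, a sketch like the following could validate an extracted ID. It assumes, per the article linked above, that an ID is exactly 11 characters drawn from 0-9, a-z, A-Z, - and _; the names VID_PATTERN and looks_like_vid are made up for this example.

```python
import re

# Assumption: exactly 11 characters from 0-9, a-z, A-Z, "-" and "_".
VID_PATTERN = re.compile(r'[0-9A-Za-z_-]{11}')

def looks_like_vid(candidate):
    """Return True if the whole string has the assumed video ID shape."""
    return VID_PATTERN.fullmatch(candidate) is not None

print(looks_like_vid('k3nPaVj8-3w'))   # an ID from this article
print(looks_like_vid('k3nPaVj8-3w?'))  # 12 characters, with a "?"
```

The first call prints True and the second prints False. If YouTube ever did add digits, only the {11} in the pattern would need to change.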

Example of use

url_list = [
'https://www.youtube.com/watch?v=k3nPaVj8-3w',
'https://www.youtube.com/watch?v=2k-uF-QPcEM&t=5',
'https://www.youtube.com/watch?v=5_Vy0ZtPo_w',
'https://youtu.be/_t-i0KLiJBk',
'https://youtu.be/tfIvsrRxaXg',
'https://youtu.be/biaC_2Mx7Mw?t=283',
'https://www.youtube.com/',
'https://www.youtube.com/channel/UCDWM7dKT5vLXqSi_YljdlBw']
vid_list = pick_up_vid_list(url_list)

for vid in vid_list:
  print(vid)

Execution result:

error:
  The URL must start with "https://www.youtube.com/watch?" or
  "https://youtu.be/".
  - item 7: https://www.youtube.com/
error:
  The URL must start with "https://www.youtube.com/watch?" or
  "https://youtu.be/".
  - item 8: https://www.youtube.com/channel/UCDWM7dKT5vLXqSi_YljdlBw
k3nPaVj8-3w
2k-uF-QPcEM
5_Vy0ZtPo_w
_t-i0KLiJBk
tfIvsrRxaXg
biaC_2Mx7Mw

Afterword

Python's standard library can parse query parameters out of the box! Very convenient! In JavaScript I could not do this without using purl.js! Well, of course you could implement it yourself, but ... it's a hassle.

References

How to use regular expressions in Python - Qiita

How to use Python's regular expression module re (match, search, sub, etc.) | note.nkmk.me

Get / create / change URL query string (parameter) in Python | note.nkmk.me