Here is the regular expression for "date" in Python.
The Western calendar version is as follows.
pattern = r'[12]\d{3}[/\-年](0?[1-9]|1[0-2])[/\-月](0?[1-9]|[12][0-9]|3[01])日?$'
# OK
# 2020年2月22日
# 2020-2-22
# 2020/2/22
# 1985/01/12
# 2010/12/11
# 2022/02/22
# NG
# 9999/99/99
The Japanese calendar (era name) version is as follows.
pattern = r'(明治|大正|昭和|平成|令和)\d{1,2}年(0?[1-9]|1[0-2])月(0?[1-9]|[12][0-9]|3[01])日'
# OK
# 令和02年2月22日
# 令和2年2月22日
# 平成2年2月22日
# 昭和20年2月22日
# 大正7年2月22日
# 明治30年2月22日
# NG
# 令和99年9月99日
The environment is Google Colaboratory. The Python version is shown below.
import platform
print("python " + platform.python_version())
# python 3.6.9
The regular expression checking tool used is https://regex101.com/. We will build each regular expression while checking it there, and then implement it in code.
For Python regular expressions in general, this article is easy to follow: https://qiita.com/luohao0404/items/7135b2b96f9b0b196bf3
Let's write the code right away. First, import the library for using regular expressions.
import re
First of all, let's create a regular expression that matches the string "2022/02/22".
pattern = r'2022/02/22'
Since the pattern is the string itself, it naturally matches. Let's check with the code.
pattern = r'2022/02/22'
string = r'2022/02/22'
prog = re.compile(pattern)  # compile the pattern once
result = prog.match(string)  # match() tries to match from the start of the string
if result:
    print(result.group())
# 2022/02/22
The matched string is displayed. From here on, for simplicity, only the regular expression pattern is shown.
In addition to "2022/02/22", there are other dates such as "1985/01/12" and "2010/12/11". The regular expressions that match these are as follows.
pattern = r'\d\d\d\d/\d\d/\d\d'
The regular expression used is:
Pattern | Description |
---|---|
\d | Any single digit |

Example | Matching string |
---|---|
\d\d\d\d | 2022 |
\d\d | 02, 22 |
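As a quick check of the pattern above, here is a minimal sketch that runs it against the three sample dates mentioned in the text. The variable names `dates` and `matched` are only illustrative.

```python
import re

# Four digits, a slash, two digits, a slash, two digits.
pattern = r'\d\d\d\d/\d\d/\d\d'
prog = re.compile(pattern)

dates = ['2022/02/22', '1985/01/12', '2010/12/11']

# Keep only the strings that the pattern matches from the start.
matched = [s for s in dates if prog.match(s)]
print(matched)
# ['2022/02/22', '1985/01/12', '2010/12/11']
```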
The regular expression above can be written more concisely.
pattern = r'\d{4}/\d{2}/\d{2}'
The newly used regular expression is:
Pattern | Description |
---|---|
{m} | Repeat the preceding element exactly m times |

Example | Matching string |
---|---|
\d{4} | 2022 |
\d{2} | 02, 22 |
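To confirm that the `{m}` form behaves the same as the repeated `\d` form, the following sketch compares both patterns on a few strings. The sample strings are chosen here for illustration and are not from a real data set.

```python
import re

long_form = re.compile(r'\d\d\d\d/\d\d/\d\d')
short_form = re.compile(r'\d{4}/\d{2}/\d{2}')

samples = ['2022/02/22', '1985/01/12', '22/02/2022', '9999/99/99']

# Both patterns should agree on every sample.
for s in samples:
    assert bool(long_form.match(s)) == bool(short_form.match(s))

results = {s: bool(short_form.match(s)) for s in samples}
print(results)
# {'2022/02/22': True, '1985/01/12': True, '22/02/2022': False, '9999/99/99': True}
```

Note that "9999/99/99" still matches at this stage, because `\d` accepts any digit.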
However, this also matches impossible date strings such as "9999/99/99". This time, we will accept only strings in YYYY/MM/DD format where the year is 1000 to 2999, the month is 01 to 12, and the day is 01 to 31.
The modified regular expression is as follows.
pattern = r'[12]\d{3}/(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])'
The newly used regular expression is:
Pattern | Description |
---|---|
[abc] | Any one of the characters a, b, or c |

Example | Matching string |
---|---|
[12]\d{3} | 1000~2999 |
0[1-9] | 01~09 |
1[0-2] | 10~12 |
[12][0-9] | 10~29 |
3[01] | 30, 31 |
We also used the following regular expression.
Pattern | Description |
---|---|
(abc|efg) | Either the string abc or the string efg |

Example | Matching string |
---|---|
(0[1-9]|1[0-2]) | 01~09 or 10~12, that is, 01~12 |
(0[1-9]|[12][0-9]|3[01]) | 01~09, 10~29, or 30, 31, that is, 01~31 |
You now have a regular expression that matches only strings satisfying the above conditions.
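The range restrictions can be checked with a short sketch like the one below. The extra sample strings ("2022/13/01", "2022/12/32", "2022/02/31") are assumptions added here to probe the boundaries.

```python
import re

pattern = r'[12]\d{3}/(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])'
prog = re.compile(pattern)

samples = ['2022/02/22', '1985/01/12', '9999/99/99',
           '2022/13/01', '2022/12/32', '2022/02/31']

results = {s: bool(prog.match(s)) for s in samples}
for s, ok in results.items():
    print(s, ok)
# 2022/02/22 True
# 1985/01/12 True
# 9999/99/99 False
# 2022/13/01 False
# 2022/12/32 False
# 2022/02/31 True
```

The last line shows a limitation: the pattern only checks numeric ranges, so a date such as February 31 still passes. If real calendar validity matters, the matched string would need a further check, for example with `datetime`.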
However, with this pattern, dates without zero padding, such as "2020/2/22", are not matched. The modified regular expression is as follows.
pattern = r'[12]\d{3}\/(0?[1-9]|1[0-2])\/(0?[1-9]|[12][0-9]|3[01])$'
The newly used regular expression is:
Pattern | Description |
---|---|
? | Zero or one occurrence of the preceding element |

Example | Matching string |
---|---|
0?[1-9] | 1~9 or 01~09 |
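We also used the following regular expression.
Pattern | Description |
---|---|
$ | End of string |
Without this, the match for "2022/02/22" stops at "2022/02/2", because the first alternative 0?[1-9] already succeeds on the single digit "2" and nothing forces the longer alternatives to be tried.
With this, dates without zero padding can also be handled.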
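The effect of `$` is easy to see by running the same pattern with and without it. This is a minimal sketch; the names `without_anchor` and `with_anchor` are illustrative only.

```python
import re

body = r'[12]\d{3}/(0?[1-9]|1[0-2])/(0?[1-9]|[12][0-9]|3[01])'
without_anchor = re.compile(body)
with_anchor = re.compile(body + r'$')

s = '2022/02/22'

# Without "$", the day part is satisfied by the first alternative 0?[1-9].
partial = without_anchor.match(s).group()
print(partial)
# 2022/02/2

# With "$", the engine has to backtrack until the whole string is consumed.
full = with_anchor.match(s).group()
print(full)
# 2022/02/22

# Dates without zero padding are matched as well.
unpadded = with_anchor.match('2020/2/22').group()
print(unpadded)
# 2020/2/22
```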
Furthermore, let's modify it so that it accepts not only "/" (slash) but also "-" (hyphen) and the characters "年", "月", "日" (year, month, day) as separators.
pattern = r'[12]\d{3}[/\-年](0?[1-9]|1[0-2])[/\-月](0?[1-9]|[12][0-9]|3[01])日?$'
Here, "\-" is an escape: inside a character class "[...]" a hyphen normally denotes a range, so escaping it makes "-" (hyphen) an ordinary character.
Now you have a regular expression that matches not only "/" (slash) but also "-" (hyphen) and "年", "月", "日".
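The following sketch tries the separator-aware pattern on several inputs. It assumes the separators are the Japanese characters 年, 月, and 日; the string "2020.2.22" and the mixed-separator string "2020/2-22" are examples added here for illustration.

```python
import re

pattern = r'[12]\d{3}[/\-年](0?[1-9]|1[0-2])[/\-月](0?[1-9]|[12][0-9]|3[01])日?$'
prog = re.compile(pattern)

ok = ['2020年2月22日', '2020-2-22', '2020/2/22', '1985/01/12']
ng = ['9999/99/99', '2020.2.22']

ok_results = [bool(prog.match(s)) for s in ok]
ng_results = [bool(prog.match(s)) for s in ng]
print(ok_results)
# [True, True, True, True]
print(ng_results)
# [False, False]

# Each separator is an independent character class, so mixed separators also match.
mixed = bool(prog.match('2020/2-22'))
print(mixed)
# True
```

If mixed separators should be rejected, the pattern would need separate alternatives per style; that is outside what the pattern above does.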
Dates are written not only in the Western calendar but also in the Japanese calendar, such as "令和2年2月22日" (February 22, Reiwa 2), so let's create a regular expression for that as well.
Consider the following as conditions for dates in the Japanese calendar.
- The string starts with one of the era names 明治 (Meiji), 大正 (Taisho), 昭和 (Showa), 平成 (Heisei), or 令和 (Reiwa).
- The year is a one- or two-digit number. A year such as "99年" does not exist in any era, but it is accepted this time.
- The numbers are separated only by "年", "月", and "日". "/" (slash) and "-" (hyphen) are not accepted.
The regular expression is:
pattern = r'(明治|大正|昭和|平成|令和)\d{1,2}年(0?[1-9]|1[0-2])月(0?[1-9]|[12][0-9]|3[01])日'
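Here is a minimal sketch that checks the Japanese calendar pattern, assuming the era names and separators are written in kanji. It also reads the era name back from the first capture group; the sample strings are illustrative.

```python
import re

pattern = r'(明治|大正|昭和|平成|令和)\d{1,2}年(0?[1-9]|1[0-2])月(0?[1-9]|[12][0-9]|3[01])日'
prog = re.compile(pattern)

samples = ['令和2年2月22日', '平成2年2月22日', '昭和20年2月22日',
           '令和99年9月99日', '令和2/2/22']

# Map each sample to the captured era name, or None when there is no match.
results = {}
for s in samples:
    m = prog.match(s)
    results[s] = m.group(1) if m else None

for s, era in results.items():
    print(s, era)
# 令和2年2月22日 令和
# 平成2年2月22日 平成
# 昭和20年2月22日 昭和
# 令和99年9月99日 None
# 令和2/2/22 None
```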
This time, I used Python to create a regular expression for "date".
Strings that follow a fixed pattern, such as dates, times, and amounts, are well suited to regular expressions. Try extracting various kinds of strings with regular expressions.
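As one way to try extraction, the sketch below pulls dates out of a longer sentence. It is a variation on the pattern above, not the pattern from this article: the `$` anchor is replaced by the lookarounds `(?<!\d)` and `(?!\d)` so that a date can appear in the middle of text, and the groups are made non-capturing so that `findall` returns whole strings. The sample sentence is made up for illustration.

```python
import re

# Assumed variant: digit guards instead of "$", non-capturing groups for findall().
pattern = (r'(?<!\d)[12]\d{3}[/\-]'
           r'(?:0?[1-9]|1[0-2])[/\-]'
           r'(?:0?[1-9]|[12][0-9]|3[01])(?!\d)')
prog = re.compile(pattern)

text = 'Released on 2020/2/22, updated on 2022-02-22, ticket 9999/99/99.'

found = prog.findall(text)
print(found)
# ['2020/2/22', '2022-02-22']
```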