https://qiita.com/jnchito/items/893c887fbf19e17d3ff9 https://qiita.com/jnchito/items/64c3fdc53766ac6f2008
To be honest, I owe a great deal to the author of the articles above. This post is my own write-up of what I learned from them, so you may be better off reading the originals than reading this.
"A mini language for efficiently searching and replacing character strings by specifying a pattern"
That is how the article above describes regular expressions. Hmm. It is certainly vague, but I will remember the keywords "pattern", "search", and "replace".
I will play around on https://rubular.com/ to learn regular expressions.
The test string looks like this.
Name: Onikan
Phone: 03-1234-5678
Address: 1 Chuo-ku, Tokyo-2-3
When I typed `\d`, only the digits were highlighted in blue. (By the way, a backslash can be typed with option + \ on a Mac.)
In other words, `\d` represents one half-width digit (0123456789). `\d` is also called a **metacharacter**, and because it stands for a set of characters, it is also called a **character class**. Hmm.
This is part of the **"mini language for efficiently searching and replacing character strings"** mentioned earlier, and I suppose it is one of the "patterns".
Anyway, remember that `\d` represents one half-width digit.
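To check the same thing outside Rubular, here is a minimal Ruby sketch. The one-line sample string and the variable names are my own choices for illustration; `String#scan` returns every match as an array.

```ruby
# Sample string chosen for illustration (one line of the test text)
text = "Phone: 03-1234-5678"

# \d matches exactly one half-width digit, so scan yields one element per digit
digits = text.scan(/\d/)

p digits       # => ["0", "3", "1", "2", "3", "4", "5", "6", "7", "8"]
p digits.size  # => 10
```

The hyphens and letters are skipped because `\d` only ever matches a single digit.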
Since `\d` represents one half-width digit, let's look at how to represent two or three digits in a row.
I typed `\d\d-\d\d\d\d-\d\d\d\d`. This time, the whole phone number, hyphens included, is highlighted.
`\d\d` represents two consecutive half-width digits (like 12 or 34).
text = <<-TEXT
Name: Onikan
Phone: 03-1234-5678
Address: 1 Chuo-ku, Tokyo-2-3
TEXT
text.scan(/\d\d-\d\d\d\d-\d\d\d\d/)
# => ["03-1234-5678"]
By the way, if you are not familiar with the `text = <<-TEXT` part, you may want to search for "Ruby here document".
const text = "Name: Onikan\nPhone: 03-1234-5678\nAddress: 1 Chuo-ku, Tokyo-2-3";
text.match(/\d\d-\d\d\d\d-\d\d\d\d/g);
// => ["03-1234-5678"]
`\n` is a line feed (newline) character. `g` is called the global option. It makes the following difference:
- With `g`: every matching string is extracted.
- Without `g`: the search ends as soon as the first match is found.
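Ruby has no `g` option, but a similar difference can be seen between taking only the first match and scanning for all matches. The sketch below assumes a made-up string containing two numbers; `String#[]` with a regex returns the first match, and `String#scan` returns all of them.

```ruby
# Made-up sample with two phone-like numbers (not from the original article)
text = "Tel: 03-1234-5678, Fax: 03-8765-4321"
pattern = /\d\d-\d\d\d\d-\d\d\d\d/

first_match = text[pattern]      # stops at the first match (like no g in JavaScript)
all_matches = text.scan(pattern) # collects every match (like g in JavaScript)

p first_match  # => "03-1234-5678"
p all_matches  # => ["03-1234-5678", "03-8765-4321"]
```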
`/\d\d-\d\d\d\d-\d\d\d\d/` does not cover every phone number, because the number of digits in each group can vary. Let's learn regular expressions that can handle this. The important thing here is to find the **pattern** to search for. In this case, it is as follows.
- 2 to 5 half-width digits
- Hyphen
- 1 to 4 half-width digits
- Hyphen
- 4 half-width digits
These appear in this order. The new knowledge here is the metacharacters `{n,m}` and `{n}`. Because they specify how many times something repeats, they are called quantifiers.
`{n,m}` means "the immediately preceding element is repeated at least n and at most m times". Write it without a space after the comma. For example, `\d{1,4}` represents 1 to 4 half-width digits.
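As a quick check of how the quantifiers behave, here is a small Ruby sketch. The sample string of digit runs is my own invention for illustration.

```ruby
# Made-up sample: runs of 1, 2, 3, 4, and 5 digits
text = "1 22 333 4444 55555"

# {1,4} takes as many digits as it can, up to four at a time
one_to_four = text.scan(/\d{1,4}/)
# {4} takes exactly four digits
exactly_four = text.scan(/\d{4}/)

p one_to_four   # => ["1", "22", "333", "4444", "5555", "5"]
p exactly_four  # => ["4444", "5555"]
```

Note that the five-digit run is split into "5555" and "5" by `\d{1,4}`, because a single match cannot be longer than four digits.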
Applying this to the pattern above gives `\d{2,5}-\d{1,4}-\d{4}`.
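To compare the fixed-length pattern with the quantifier version, here is a Ruby sketch. The phone numbers are sample values I made up to fit the "2 to 5 digits, 1 to 4 digits, 4 digits" shape, and comparing the match with the whole string is just one way to check that the entire number was matched.

```ruby
# Sample numbers made up for illustration
numbers = ["03-1234-5678", "090-1234-5678", "0795-12-3456", "04992-1-2345"]

fixed    = /\d\d-\d\d\d\d-\d\d\d\d/
flexible = /\d{2,5}-\d{1,4}-\d{4}/

# Keep only the numbers that the pattern matches in full
fixed_hits    = numbers.select { |n| n[fixed] == n }
flexible_hits = numbers.select { |n| n[flexible] == n }

p fixed_hits     # => ["03-1234-5678"]
p flexible_hits  # => ["03-1234-5678", "090-1234-5678", "0795-12-3456", "04992-1-2345"]
```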
The example above cannot handle a number such as "03(1234)5678". So **I want to change it so that it accepts either hyphens or parentheses**.
The new patterns are "hyphen or (" and "hyphen or )". New knowledge comes in here.
"Any one character out of A or B" is written `[AB]`. (Because it represents a set of characters, it is a kind of character class.) By the way, there is no limit on the number of characters inside `[]`; `[ABC]` represents any one of A, B, or C.
Therefore, "hyphen or (" is expressed as `[-(]`, and "hyphen or )" as `[-)]`. Let's write the whole thing.
\d{2,5}[-(]\d{1,4}[-)]\d{4}
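Here is a Ruby sketch that tries this pattern on a few inputs. The sample strings other than "03(1234)5678" are my own additions for illustration.

```ruby
pattern = /\d{2,5}[-(]\d{1,4}[-)]\d{4}/

# Samples for illustration: hyphens, parentheses, spaces, and a mixed form
samples = ["03-1234-5678", "03(1234)5678", "03 1234 5678", "03-1234)5678"]

hits = samples.select { |s| s.match?(pattern) }

p hits  # => ["03-1234-5678", "03(1234)5678", "03-1234)5678"]
```

One thing to be aware of: the two character classes are independent of each other, so a mixed form such as "03-1234)5678" also matches.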
Inside `[]`, a hyphen can have a special meaning. For example, `[A-Z]` stands for "A or B or C or ... Z"; in other words, it represents one half-width uppercase letter. That is, a hyphen can represent a **range of characters**.
If the hyphen comes at the beginning or end of `[]`, as in `[-AB]` or `[AB-]`, it is treated as a literal hyphen.
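The difference between a range and a literal hyphen is easy to see in a small Ruby sketch. The sample string and the letters A and C are arbitrary choices of mine.

```ruby
# Arbitrary sample containing A, a hyphen, C, and B
text = "A-C: B"

as_range      = text.scan(/[A-C]/)  # hyphen in the middle: the range A through C
hyphen_first  = text.scan(/[-AC]/)  # hyphen at the start: a literal hyphen, A, or C
hyphen_last   = text.scan(/[AC-]/)  # hyphen at the end: also a literal hyphen

p as_range      # => ["A", "C", "B"]
p hyphen_first  # => ["A", "-", "C"]
p hyphen_last   # => ["A", "-", "C"]
```

With the range, "B" matches and the hyphen does not; with the hyphen at either edge, it is the other way around.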
- `\d` represents one half-width digit.
- `{n,m}` means the immediately preceding element is repeated at least n and at most m times.
- `{n}` means the immediately preceding element is repeated exactly n times.
- `[ab]` represents one character, either a or b.
- `[a-z]` represents one character out of a, b, c, ..., z.
- `[-az]` represents one character out of -, a, or z.
As I said at the beginning, this post is a write-up of what I learned, so I recommend referring to the original article.
https://qiita.com/jnchito/items/893c887fbf19e17d3ff9