Don't use `\d` in Python 3 regular expressions!

It seems to be surprisingly little known, but in regular expressions of the Python 3 standard re module, `\d` also matches so-called full-width digits.

Let's actually try it.

```python
>>> import re
>>> re.findall(r"\d", "012０１２")
['0', '1', '2', '０', '１', '２']
>>>
```

`\d` also matches the full-width '０', '１', and '２'.
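
As a rough illustration of why this happens: for str patterns, the re module documentation describes `\d` as matching any Unicode decimal digit (general category Nd), not only `[0-9]`. The sketch below checks a few sample characters with the standard unicodedata module. The sample characters are chosen here purely for illustration and do not come from the original text.

```python
import re
import unicodedata

# Sample characters picked for illustration (an assumption of this sketch):
# ASCII zero, full-width zero, Arabic-Indic three, superscript two, kanji "one".
samples = ["0", "０", "٣", "²", "一"]

# For each character, record its Unicode general category
# and whether \d matches it in the default (Unicode) mode.
results = [
    (ch, unicodedata.category(ch), re.fullmatch(r"\d", ch) is not None)
    for ch in samples
]

for ch, category, matched in results:
    print(f"U+{ord(ch):04X} category={category} matches \\d: {matched}")
```

Only the characters in category Nd match; the superscript digit and the kanji numeral do not.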

Possible reasons why this behavior is not well known include:

- The explanation in the "Regular Expression HOWTO" of the official Python 3 documentation has become quite misleading.
- Few people bother to test whether `\d` matches so-called full-width digits.
- Devout Pythonistas tend to prefer approaches that avoid regular expressions.

There may be other reasons as well.

The re module documentation also suggests writing an explicit `[0-9]` instead of `\d` when you want to match only `[0-9]`. If you still want to keep `\d`, for example because you want to reuse a long regular expression written for another language as is, you can either pass `flags=re.ASCII` as an argument or put `(?a)` at the beginning of the regular expression.

```python
>>> import re
>>> re.findall(r"\d", "012０１２", flags=re.ASCII)
['0', '1', '2']
>>> re.findall(r"(?a)\d", "012０１２")
['0', '1', '2']
>>>
```

However, these flags affect the entire regular expression. For more information, please read the re module documentation.
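
To make the "entire regular expression" point concrete, here is a small sketch. The sample string is made up for illustration. With `re.ASCII`, `\w` also stops matching non-ASCII characters, which may not be what you want, whereas an explicit `[0-9]` restricts only the digits.

```python
import re

# Made-up sample text mixing Japanese, full-width digits, and ASCII.
text = "価格１２３ price 456"

# Default Unicode mode: \w matches kanji and full-width digits too.
unicode_words = re.findall(r"\w+", text)

# re.ASCII applies to the whole pattern, so \w becomes [a-zA-Z0-9_] as well.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

# An explicit [0-9] limits only the digits and needs no flag.
ascii_digits = re.findall(r"[0-9]+", text)

print(unicode_words)
print(ascii_words)
print(ascii_digits)
```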

Note that `flags=` can be omitted: you can also write `re.findall(r"\d", "012０１２", re.ASCII)`. However, omitting it carelessly can trip you up, so I strongly recommend always writing it out.
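
One place where omitting `flags=` can bite is `re.sub`, whose fourth positional parameter is `count`, not `flags`. Whether this is the exact pitfall the note above has in mind is an assumption; the sketch below only shows that the positional form behaves differently from the keyword form.

```python
import re
import warnings

text = "012０１２"

# Keyword form: flags really is re.ASCII, so only ASCII digits are removed.
with_keyword = re.sub(r"\d", "", text, flags=re.ASCII)

# Positional form: the fourth positional parameter of re.sub is count,
# so re.ASCII is taken as a replacement limit and the pattern
# still runs in the default Unicode mode.
with warnings.catch_warnings():
    # Newer Python versions emit a DeprecationWarning for a positional count.
    warnings.simplefilter("ignore", DeprecationWarning)
    positional = re.sub(r"\d", "", text, re.ASCII)

print(repr(with_keyword))
print(repr(positional))
```

In the positional call the full-width digits are removed as well, because no ASCII flag was actually applied.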

By the way, in my own case:

- I don't write code shared between Python 2 and 3.
- I want to use the regex module instead of the standard re module.
- When the regular expression is long, it is better to have `(?a)` and the like at the beginning.

For reasons like these, I write code such as the following.

```python
import regex

RE_DIGITS = regex.compile(r"""(?xa)
    \A\d+\Z""")


def is_digits(digits):
    # True only when the whole string consists of ASCII digits.
    return RE_DIGITS.match(digits) is not None
```

I like writing it this way, but I still feel uneasy about using `\d`, so I try to write `[0-9]` wherever possible. (`is_digits()` is only an example, just to be clear.)
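
For comparison, here is a minimal sketch of the `[0-9]` style using only the standard re module. The names `RE_ASCII_DIGITS` and `is_ascii_digits` are hypothetical, chosen just for this example.

```python
import re

# Hypothetical names for illustration; [0-9] needs no ASCII flag.
RE_ASCII_DIGITS = re.compile(r"[0-9]+")


def is_ascii_digits(text):
    # fullmatch requires the whole string to match, so no anchors are needed.
    return RE_ASCII_DIGITS.fullmatch(text) is not None


print(is_ascii_digits("012"))     # ASCII digits only
print(is_ascii_digits("０１２"))  # full-width digits
print(is_ascii_digits("012\n"))   # ASCII digits plus a trailing newline
```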