ChangeLog
- 2020-04-18: Corrected the description of the behavior of the `__main__` module.
Let's take a quick look at Python packages and modules, which many people seem to use with only a vague understanding. This write-up is not very organized, so I'll rewrite it someday.
I have some Python machine learning code written by someone else.
The file I want to execute is at `./a/b/c.py`, relative to the current directory.
It is written in the common style that behaves in two ways depending on the value of `__name__`.
Inside it, `./a/b/d.py` is imported with an **absolute import**, `import d`.
Assuming `PYTHONPATH` is left at its default, whether this works depends on whether the file is run as `python ./a/b/c.py` or imported from another Python file with `import a.b.c`. The former works but the latter does not; and if I rewrite `c.py` to use a relative import so that the latter works, the former stops working.
Working out what on earth is going on was the motivation for this study.
Let me explain the latter situation a little more. I am running `pytest ./test/<test file>.py` or `python -m pytest ./test/<test file>.py` for unit testing. The point is that the test file is not in the current directory.
I consulted the v3.8.2 versions of the following documents.
- Python Tutorial, 6. Modules: https://docs.python.org/ja/3/tutorial/modules.html
- Python Setup and Usage, 1. Command line and environment: https://docs.python.org/ja/3/using/cmdline.html
- Python Language Reference, 5. The import system: https://docs.python.org/ja/3/reference/import.html
Note that the tutorial refers to top-level files, that is, files launched with `python <file>`, as **scripts**.
The simplest understanding is:
- The import statement `import m` imports the **module** `m`.
- The module `m` is a **Python file** called `m.py`.
- This lets you access the **names** (variable names, function names, class names, etc.) defined in `m.py` as `m.n`.
In practice:
- You often need to include the package path in addition to the module name (see below).
- You can also import packages (described later).
- A module can refer to its own name with `__name__`. In a module `m` imported as above, `__name__` is `"m"`.
- However, at the top level, that is, when the module is the file given as the argument of the `python` command, `__name__` is `"__main__"`. Think of the top-level environment as running as the `__main__` module.
- Modules can import other modules.
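The two values of `__name__` described above can be checked with a small experiment. This is only a sketch: it writes a throwaway `m.py` into a temporary directory (the file name and helper names are made up for illustration) and runs it once as a script and once through `import m`.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

def run(args, cwd):
    """Run a fresh interpreter in cwd, with PYTHONPATH left unset."""
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
    result = subprocess.run([sys.executable, *args], cwd=cwd, env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module that only reports its own name.
    Path(tmp, "m.py").write_text("print(__name__)\n")
    name_as_script = run(["m.py"], tmp)            # python m.py
    name_as_module = run(["-c", "import m"], tmp)  # import m

print(name_as_script, name_as_module)  # __main__ m
```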
Module search (`sys.path`)
In the above example, the module `m` to be imported, that is, the file `m.py`, is searched for in the directories listed in `sys.path` (the `path` variable of the `sys` module), in order from the beginning of the list. By default, they are arranged in the following order:
- The directory containing the executed file (the file given to the `python` command)
- The directories in the `PYTHONPATH` environment variable
- The installation-dependent defaults
Here, the first entry is the tricky one: **the directory containing the executed file is searched, not the current directory.** The two coincide only when you execute a file that is in the current directory.
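To see that the first entry is the script's directory rather than the current directory, one can run a script that lives two levels below the working directory. The sketch below assumes a temporary layout `a/b/c.py` created just for this check; the script prints the first `sys.path` entry and the current directory, both normalized with `os.path.realpath`.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# The script reports sys.path[0] and the working directory it was started from.
SCRIPT = (
    "import os, sys\n"
    "print(os.path.realpath(sys.path[0]))\n"
    "print(os.path.realpath(os.getcwd()))\n"
)

env = {k: v for k, v in os.environ.items()
       if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}

with tempfile.TemporaryDirectory() as tmp:
    script_dir = Path(tmp, "a", "b")
    script_dir.mkdir(parents=True)
    (script_dir / "c.py").write_text(SCRIPT)
    # Equivalent to running `python a/b/c.py` from the temporary root.
    output = subprocess.run(
        [sys.executable, os.path.join("a", "b", "c.py")],
        cwd=tmp, env=env, capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    first_entry, working_dir = output[0], output[1]
    expected_dir = os.path.realpath(script_dir)

print(first_entry == expected_dir)  # True: the script's directory comes first
print(first_entry == working_dir)   # False: it is not the current directory
```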
This almost explains the problem described in "Motivation." If you run `python ./a/b/c.py` as in the former case, the first rule puts `./a/b` at the beginning of `sys.path`. `import d` then searches this directory first, and the import succeeds because `./a/b/d.py` actually exists there.
On the other hand, when pytest is run, `sys.path` starts with `./test`, so `import a.b.c` fails in the first place; and even if you work around that by setting `PYTHONPATH`, the `import d` inside `c.py` then fails.
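The situation can be reproduced in miniature. The following sketch builds a temporary tree with `a/b/c.py` containing `import d` and an empty `a/b/d.py` (the layout mirrors the article; the helper names are made up), then runs `c.py` as a script and imports it as `a.b.c` from the root directory.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

def run(args, cwd):
    """Run a fresh interpreter in cwd, with PYTHONPATH left unset."""
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
    return subprocess.run([sys.executable, *args], cwd=cwd, env=env,
                          capture_output=True, text=True)

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "a", "b")
    pkg.mkdir(parents=True)
    Path(tmp, "a", "__init__.py").write_text("")
    (pkg / "__init__.py").write_text("")
    (pkg / "d.py").write_text("")
    (pkg / "c.py").write_text("import d\n")  # absolute import of a sibling file

    as_script = run([os.path.join("a", "b", "c.py")], tmp)  # python a/b/c.py
    as_import = run(["-c", "import a.b.c"], tmp)            # import a.b.c

script_rc = as_script.returncode
import_rc = as_import.returncode
import_err = as_import.stderr

print(script_rc)  # 0: a/b is first on sys.path, so d is found
print(import_err.strip().splitlines()[-1])
# ModuleNotFoundError: No module named 'd'
```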
If `import d` is changed to a **relative import** such as `from . import d` (if I remember the syntax correctly), the former pattern will certainly fail [confirmation required].
`sys.path` anti-pattern
When the module directory structure is complicated, or when you want dynamic imports,
- trying to get things working by dynamically rewriting `sys.path` inside your Python program
is said to be **a bad move** [citation needed]. Making full use of `importlib` seems to be the right approach.
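As one possible illustration of the `importlib` route, the sketch below loads a module from an explicit file path with `importlib.util.spec_from_file_location` and `importlib.util.module_from_spec`, without modifying `sys.path`. The file `plugin.py` and the helper `load_module_from_path` are hypothetical names used only for this example, not something the article prescribes.

```python
import importlib.util
import tempfile
from pathlib import Path

def load_module_from_path(name, path):
    """Load the file at `path` as a module called `name`, leaving sys.path untouched."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # actually executes the file's code
    return module

with tempfile.TemporaryDirectory() as tmp:
    plugin_path = Path(tmp, "plugin.py")  # hypothetical file
    plugin_path.write_text("def greet():\n    return 'hello'\n")
    plugin = load_module_from_path("plugin", plugin_path)
    greeting = plugin.greet()

print(greeting)  # hello
```

Note that a module loaded this way is not registered in `sys.modules` unless you add it yourself, so a later `import plugin` elsewhere would not find it.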
A ** package ** is a collection of the above modules in a directory. The directory name becomes the package name. Directories can have a hierarchical structure, which corresponds to the package path (dot-separated sequence of package names).
- `import a.b.c` imports the module `c` from the package `a.b`. That is, it imports `a/b/c.py` under one of the module search directories.
- The directory `p` must contain a **`p/__init__.py` file** to be recognized as the regular package `p`. (Since Python 3.3, a directory without `__init__.py` can still be imported as a namespace package.)
From here on, many people seem to have only a vague understanding. So do I.
- **You can import the package itself.** That is, you can write `import a.b`.
- When a package itself, say `p`, is imported, the result of executing `p/__init__.py` as a module goes into the namespace of `p`.
- When the package `a.b` is imported, `a` is imported first and then `a.b`. That is, `a/__init__.py` is executed, then `a/b/__init__.py`.
- When the module `a.b.c` is imported, those files are executed as above, and then `a/b/c.py` is executed.
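The execution order can be observed directly. In this sketch, every file in a temporary `a/b/c.py` tree simply prints its own `__name__`, and a fresh interpreter started in the root directory runs `import a.b.c`. The layout is created only for this check.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

REPORT = "print(__name__)\n"  # each file announces itself when executed

env = {k: v for k, v in os.environ.items()
       if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "a", "b")
    pkg.mkdir(parents=True)
    Path(tmp, "a", "__init__.py").write_text(REPORT)
    (pkg / "__init__.py").write_text(REPORT)
    (pkg / "c.py").write_text(REPORT)
    execution_order = subprocess.run(
        [sys.executable, "-c", "import a.b.c"],
        cwd=tmp, env=env, capture_output=True, text=True, check=True,
    ).stdout.split()

print(execution_order)  # ['a', 'a.b', 'a.b.c']
```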
When the package `p` is imported, the variable `__path__` is available in `p/__init__.py`; it holds a list of strings naming the directories of `p`.
`from ... import ...` statement
- If you run `import a.b.c` with a plain import statement, you must write `a.b.c.n` to refer to the name `n` of the module `c`.
- However, with `from a.b import c` you can refer to it as `c.n`.
- What `from ... import` can import:
  - A name `n`. After `from a.b.c import n`, you can refer to it simply as `n`.
  - A module `c`. As explained above.
  - A package `b`. After `from a import b`, the package `b` can be referred to simply as `b` instead of `a.b`.
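The same three forms can be tried with a standard-library package. In the sketch below, `xml` plays the role of `a`, `xml.etree` the role of `a.b`, `ElementTree` the role of `c`, and the function `fromstring` the role of the name `n`; this mapping is only an illustration.

```python
import xml.etree.ElementTree                  # plain import: full dotted path needed
from xml import etree                         # import a package
from xml.etree import ElementTree             # import a module
from xml.etree.ElementTree import fromstring  # import a name

# All of these spellings reach the same function object.
same_function = (
    xml.etree.ElementTree.fromstring is ElementTree.fromstring
    and ElementTree.fromstring is fromstring
)
same_package = etree is xml.etree

root_tag = fromstring("<doc><item/></doc>").tag

print(same_function, same_package, root_tag)  # True True doc
```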
When you import the package `b` as above, `b/__init__.py` is executed. This file is typically empty, and an empty file makes the interpreter do nothing, so **importing `b` alone does not give you access to `a.b.c`.**
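A quick check of this point, under the assumption of a temporary tree with empty `__init__.py` files: after `from a import b`, the package object has no attribute `c` until the submodule `a.b.c` is imported explicitly.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Code run in a fresh interpreter whose working directory is the tree root.
CHECK = (
    "from a import b\n"
    "print(hasattr(b, 'c'))\n"  # submodule not loaded yet
    "import a.b.c\n"
    "print(hasattr(b, 'c'))\n"  # importing it binds c on the package
)

env = {k: v for k, v in os.environ.items()
       if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "a", "b")
    pkg.mkdir(parents=True)
    Path(tmp, "a", "__init__.py").write_text("")  # empty, as in the article
    (pkg / "__init__.py").write_text("")
    (pkg / "c.py").write_text("n = 1\n")
    submodule_visible = subprocess.run(
        [sys.executable, "-c", CHECK],
        cwd=tmp, env=env, capture_output=True, text=True, check=True,
    ).stdout.split()

print(submodule_visible)  # ['False', 'True']
```

If a package should expose its submodules on import, its `__init__.py` has to import them itself.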
When importing other modules, the absolute import described above is normally used. The module is searched for in `sys.path`.
The `from ... import` statement above can also perform a **relative import**.
Within an imported module (that is, a `.py` file), statements such as the following are allowed:
from . import m
from .. import m
from ..p import q
These import packages, modules, names, and so on by relative position. Here, `.` and `..` are resolved against the **package path** in which the current module lives.
That is, if the current module is `a.b.m`, so that the package it belongs to is `a.b`, then `.` is `a.b` and `..` is `a`.
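The resolution rule can be tried without creating any files, using `importlib.util.resolve_name`, which turns a relative name plus a package path into an absolute name. The package path `a.b` below is the one from the example; nothing is actually imported.

```python
from importlib.util import resolve_name

current_package = "a.b"  # package of the hypothetical module a.b.m

# Map each relative name to the absolute name it denotes inside a.b.
resolved = {
    relative: resolve_name(relative, current_package)
    for relative in (".", "..", ".m", "..p")
}

print(resolved)
# {'.': 'a.b', '..': 'a', '.m': 'a.b.m', '..p': 'a.p'}
```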
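Note that the `__main__` module of a file run directly as `python <file>` has no package path. An important consequence is that **you cannot do relative imports from a module that is running as `__main__` in this way.** (When a module is started with `python -m a.b.c`, the package path is set and relative imports do work.)
In addition, it feels as if something like `import ..p.q.m` should be allowed, but it is not: relative imports are only available in the `from ... import` form.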
This answers the second question. Rewriting `a/b/c.py` to use a relative import such as `from . import d` works fine when `c` is imported with `import a.b.c`, but when it is run as `python a/b/c.py`, `c.py` behaves as the module `__main__` and relative imports are unavailable. Hence the disappointing behavior.
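This behavior can be confirmed with a temporary tree in which `a/b/c.py` contains `from . import d`. The sketch runs it three ways from the root directory: as a script, through `import a.b.c`, and, as an addition not covered in the text above, through `python -m a.b.c`. File contents and helper names are made up for the check.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

def run(args, cwd):
    """Run a fresh interpreter in cwd, with PYTHONPATH left unset."""
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
    return subprocess.run([sys.executable, *args], cwd=cwd, env=env,
                          capture_output=True, text=True)

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "a", "b")
    pkg.mkdir(parents=True)
    Path(tmp, "a", "__init__.py").write_text("")
    (pkg / "__init__.py").write_text("")
    (pkg / "d.py").write_text("")
    (pkg / "c.py").write_text("from . import d\n")  # relative import

    as_script = run([os.path.join("a", "b", "c.py")], tmp)  # python a/b/c.py
    as_import = run(["-c", "import a.b.c"], tmp)            # import a.b.c
    as_dash_m = run(["-m", "a.b.c"], tmp)                   # python -m a.b.c

print(as_script.stderr.strip().splitlines()[-1])
# ImportError: attempted relative import with no known parent package
print(as_import.returncode, as_dash_m.returncode)  # 0 0
```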
Furthermore, as a consequence, the modules imported by code that you want to use both as a script and as a module, such as `a/b/c.py` in the example, need to be installed with `pip install`. That way you can always use absolute imports, so either way of running works.
I also confirmed that the import behavior differs depending on how pytest is started.
I put a print statement in the test script (`-s` is the option that stops pytest from capturing standard output):
- With `pytest -s test/<test file.py>`, `sys.path` starts with the `test` directory, followed by the system defaults.
- With `python -m pytest -s test/<test file.py>`, however, the empty string `""` is added after the `test` directory. This presumably stands for the current directory.
Part of the reason is known. When you specify a module with `-m` when starting the Python interpreter, the current directory is added to the beginning of `sys.path`. In this case, it is probably pytest, not the interpreter, that adds `test` to `sys.path`, because here `-s test/<test file.py>` is merely an argument, not the name of the script to execute [to be examined].
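The interpreter-side half of this explanation can be checked without pytest. The sketch below puts a hypothetical module `showpath.py` in a temporary directory, starts it with `python -m showpath` from that directory, and compares the first `sys.path` entry with the working directory. It says nothing about what pytest itself adds.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical module that reports the first sys.path entry, normalized.
SHOWPATH = "import os, sys\nprint(os.path.realpath(sys.path[0]))\n"

env = {k: v for k, v in os.environ.items()
       if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "showpath.py").write_text(SHOWPATH)
    # With -m, the interpreter puts the current directory first on sys.path.
    first_entry_with_m = subprocess.run(
        [sys.executable, "-m", "showpath"],
        cwd=tmp, env=env, capture_output=True, text=True, check=True,
    ).stdout.strip()
    working_dir = os.path.realpath(tmp)

print(first_entry_with_m == working_dir)  # True
```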
Also, for a command installed with pip, I still need to check settings such as `sys.path` when the command starts.
Corrections of any mistakes are welcome.