About the Python import system
--Module is a .py
file
--The package is the directory where you put __init__.py
You may understand that, By moving away from physical files and thinking a bit more abstractly, I think we can get a better view of the entire system, such as module names, relative imports, initialization timing, and name visibility.
Note that this article does not cover the ** namespace package [^ 1] ** (because I don't understand it).
I browsed the v3.8.2 version of both.
--Python language reference, 7.11. import statement: https://docs.python.org/ja/3/reference/simple_stmts.html#the-import-statement --Python Language Reference, 5. Import System: https://docs.python.org/ja/3/reference/import.html --Python Tutorial, 6. Modules: https://docs.python.org/ja/3/tutorial/modules.html
I confirmed the operation with Python v3.8.2. Verify the operation as a test case of pytest.
pip install pytest
See the appendix for checking the operation of pytest itself.
--A module is a namespace unit that can be imported.
--All code runs in a module (namespace). The top level module name is __main__
.
--When you import a module, you can see the module and its namespace from the current namespace.
--A package is a module that can have submodules.
--Once imported modules are cached globally.
A brief summary of the attributes:
__name__ |
__path__ |
__package__ |
|
---|---|---|---|
module: | module名 | None | Parent package name |
package: | package名 | packageのパス | package名 |
This is a commonly used concept in programming languages
** This is a function that allows you to use "names" such as variable names x
in different places in a program with different meanings without fear of collision. ** **
For example, builtins.open
and ʻos.open` are both function names, but because they belong to different namespaces.
It can be defined and used as different.
The "module" described below is this ** namespace unit **.
There is a glossary in the Python docs with a "namespace" entry: https://docs.python.org/ja/3/glossary.html#term-namespace
As mentioned in the overview, a module is a ** namespace unit that can be imported **.
First, because modules are namespace units, for modules m
and n
, the names in m`` x
(= mx
) and the names in n
x
(=nx
) Is distinguished.
Second, less consciously, all the code runs in a module, that is, in a namespace.
When the code declares and defines a name, it is registered in that namespace.
Immediately after the Python interpreter is launched, the code is running in the ** __ main__
module namespace **.
This is often used in ** a way to tell if a .py
file was launched directly or imported **.
The module can then be imported. Importing a module means
--Find and cache the module and --Load and initialize, --Binding a module to the current namespace
That is. The document says 2 steps, but if you think of it as 3 steps, the processing unit is easy to understand.
Simply put, by importing the module m
as ʻimport m`
--Search for module m
in sys.path
and register the module object in the global cache sys.modules
.
--Register a name in that namespace by loading and initializing the module m
. For example, m.f
or m.C
.
--Register the name m
in the current namespace so that it points to the module m
.
Start the Python interpreter in interactive mode and see for yourself. (Partially shaped for readability)
[.../module-experiments]$ python
Python 3.8.2 (default, Apr 18 2020, 18:08:23)
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print(__name__)
__main__
>>> import m
>>> sys.modules['m']
<module 'm' from '.../module-experiments/m.py'>
>>> m
<module 'm' from '.../module-experiments/m.py'>
>>> m.f
<function f at 0x108452b80>
>>> m.C
<class 'm.C'>
You can see that the module immediately after starting is __main__
.
You can also see that by importing m
, you can access m
, m.f
and m.C
.
The imported modules are cached in the local namespace. Therefore, like any other name, modules imported within a function cannot be accessed by other functions. However, since the cache is global, it can be accessed.
A package is just like a module, except that you can search for and import submodules within it. By using packages, you can create a hierarchical structure of modules.
The name of the module (or package) will then be something like ʻa.b.c separated by dots. By importing the module ʻa.b.c
with ʻimport a.b.c`
--Import module (package) ʻa --Import the submodule (package)
b (ʻa.b
) in the namespace of module ʻa --Import the submodule
c (ʻa.b.c
) in the namespace of module ʻa.b`
This allows you to access the module ʻa.b.c with ʻa.b.c
.
** Note: You will not be able to access it with c
. ** **
You can also see that this rule allows you to import packages and modules in the same way.
Of course, ʻa.d and ʻa.d.e
cannot be accessed immediately after importing ʻa.b.c. If you do ʻimport a.d.e
here, ʻa` exists in the cache, so
--Import the submodule (package) d
(ʻa.d) in the namespace of module ʻa
--Import the submodule ʻe (ʻa.d.e
) in the namespace of module ʻa.d`
Only happens, so at these points the module is initialized.
Technically, a package is a module that has the __path__
attribute.
This attribute is used when looking for subpackages.
In other words, in ʻimport a.b.c`,
--Find ʻa in
sys.path and import --In the namespace of module ʻa
, find ʻa.b from ʻa.__path__
and import it.
--In the namespace of module ʻa.b, find ʻa.b.c
in ʻa.b.path` and import it.
That is happening.
Note that the module containing the package has the __name__
attribute.
For imported modules, this value is the fully qualified name of the module (such as ʻa.b.c`).
Since a package is a module, it can be imported with the import statement like a module.
import ... as ...
If the import statement has a ʻas clause, the imported module itself is bound to the name below ʻas
.
For example, ʻimport a.b as b allows you to access the module ʻa.b
directly by name b
.
At this time, since the module ʻa is loaded and initialized, the information about the module ʻa
(and of course, ʻa.b) is cached. However, ʻa
is not bound to the namespace.
>>> import sys
>>> import a.b as b
['/Users/shinsa/git/python-experiments/module-experiments/a']
a
>>> sys.modules['a']
<module 'a' from '/Users/shinsa/git/python-experiments/module-experiments/a/__init__.py'>
>>> sys.modules['a.b']
<module 'a.b' from '/Users/shinsa/git/python-experiments/module-experiments/a/b/__init__.py'>
>>> b
<module 'a.b' from '/Users/shinsa/git/python-experiments/module-experiments/a/b/__init__.py'>
>>> a
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
The module name comes immediately after ʻimport, and you cannot use the defined name in this place. Therefore, just because ʻimport a.b as b
is used, ʻimport a.b.c cannot be written as ʻimport b.c
.
from
clauseThis form of import takes the case of from a.b import c
. c
can be a module, a package, or a name.
The following processing is performed.
--Import ʻa.b. However, the name is not bound. --Attempts to bind
c. --If
c is the name defined in ʻa.b
, bind ʻa.b.c to the name
c. --If
c is a module or package, import ʻa.b.c
(but do not bind the name) and bind ʻa.b.c to the name
c`.
If the ʻasclause is specified, instead of binding
c in the above example, it is bound to the name specified in the ʻas
clause.
from ... import *
If the name to import is *
, that is, from M import *
, then in the module M
--The name specified by __all__
if defined
--Otherwise, names defined with M
that do not start with _
Are all bound in the current namespace.
In the from ... import ...
statement, you can use relative package representations in the from
clause.
For example, in the initialization script for package ʻa.b, or in the script for module ʻa.b.c
.
--from .
means from a.b
,
--from ..
means from a
,
--from .d
is from a.d
,
Represent each. You may have more dots.
PEP 366
Relative package calculations were based on the __name__
attribute.
Then, for example, when you start the interpreter with python -m a.b.c
, ʻa / b / c.py operates as a
main module. Due to this, there was a problem that, for example, ʻa.b.d
could not be relatively imported.
Newer versions of Python now calculate relative imports based on the __package__
attribute, which resolves the above call-time issue (PEP 366). However, if you start it with python <path> /a/b/c.py
, you need to set the attributes manually.
So far, modules, packages, and __path__
have been explained abstractly.
In other words, I'm not talking about a specific file system at all.
In recent versions, Python's import system has become a highly abstract and powerful system.
By default, the import system operates on a file system.
By default, modules are represented by .py
files and packages are represented by directories.
The details are as follows, and it can be seen that it fits the abstract data structure so far.
--The contents of sys.path
are regarded as the file path and searched. ʻImport m searches the directories contained in
sys.pathin order, and
m.py in one of the directories is imported. --Package ʻa.b
is loaded and initialized by executing ʻa / b / __ init__.py. --Loading and initializing the module ʻa.b.c
is done by executing ʻa / b / c.py. --The
path attribute value of package ʻa.b
is the directory path of b
.
Conversely, for example, a URI-based import system can be implemented, and the service discovery mechanism can be implemented using Python's import system.
The Python documentation is confusing because the information is scattered all over the place. It's still understandable if you do so to avoid duplication of information, Instead, each piece of information is subtly duplicated but scattered. In addition, some information is out of date.
Recommended Posts