Introduction

You may understand the Python import system as follows:

- A module is a `.py` file.
- A package is a directory that contains `__init__.py`.

By stepping away from physical files and thinking a little more abstractly, you can get a clearer view of the whole system: module names, relative imports, initialization timing, and name visibility.

Note that this article does not cover **namespace packages**[^1] (because I don't understand them).

References

I consulted the v3.8.2 version of each.

- Python Language Reference, 7.11. The import statement: https://docs.python.org/ja/3/reference/simple_stmts.html#the-import-statement
- Python Language Reference, 5. The import system: https://docs.python.org/ja/3/reference/import.html
- Python Tutorial, 6. Modules: https://docs.python.org/ja/3/tutorial/modules.html

Environment

I verified the behavior with Python v3.8.2. The behavior is checked with pytest test cases.

pip install pytest

See the appendix for how to check that pytest itself works.

Overview

- A module is a namespace unit that can be imported.
- All code runs in some module (namespace). The name of the top-level module is `__main__`.
- When you import a module, the module and its namespace become visible from the current namespace.
- A package is a module that can have submodules.
- Once imported, a module is cached globally.

A brief summary of the attributes:

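| | `__name__` | `__path__` | `__package__` |
|---|---|---|---|
| module | module name | not defined | parent package name (empty string for a top-level module) |
| package | package name | package path | package name |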

What is a namespace?

A namespace is a concept common to many programming languages. **It is a mechanism that lets you use a "name", such as the variable name `x`, in different places in a program with different meanings without fear of collision.** For example, `builtins.open` and `os.open` are both functions named `open`, but because they belong to different namespaces they can be defined and used as different things. The "module" described below is this **namespace unit**.
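As a small illustration of the point above, the following sketch checks that the two `open` functions share a name but are different objects. The variable names are chosen only for this example.

```python
import builtins
import os

# Both functions are called "open", but they live in different namespaces.
same_name = builtins.open.__name__ == os.open.__name__ == "open"
different_objects = builtins.open is not os.open

print(same_name)          # True
print(different_objects)  # True
```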

There is a glossary in the Python docs with a "namespace" entry: https://docs.python.org/ja/3/glossary.html#term-namespace

What is a module? What is an import?

As mentioned in the overview, a module is a **namespace unit that can be imported**.

Namespace unit

First, because modules are namespace units, for modules `m` and `n`, the name `x` in `m` (= `m.x`) is distinguished from the name `x` in `n` (= `n.x`).

All code works in a namespace

Second, and this is something people are less conscious of, all code runs in some module, that is, in some namespace. When code declares or defines a name, the name is registered in that namespace. Immediately after the Python interpreter starts, code runs in the **namespace of the `__main__` module**. This is often used as **a way to tell whether a `.py` file was launched directly or imported**.
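The usual idiom for this check can be sketched as follows. The helper `describe_launch` is a hypothetical name introduced only for this example; the comparison against `"__main__"` is the actual technique.

```python
def describe_launch(module_name: str) -> str:
    """Report how a file was started, judging by its __name__ value."""
    if module_name == "__main__":
        return "run directly"
    return f"imported as {module_name}"


if __name__ == "__main__":
    # This branch runs only when the file is launched directly,
    # not when it is imported from another module.
    print(describe_launch(__name__))
```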

Module import

Next, a module can be imported. Importing a module means:

- finding the module and caching it,
- loading and initializing it, and
- binding the module to a name in the current namespace.

The documentation describes this as two steps, but thinking of it as three steps makes the units of processing easier to understand. Put simply, importing the module `m` with `import m` does the following:

- Search for module `m` in `sys.path` and register the module object in the global cache `sys.modules`.
- Load and initialize module `m`, which registers names such as `m.f` or `m.C` in its namespace.
- Register the name `m` in the current namespace so that it refers to module `m`.
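The three steps can be spelled out by hand with `importlib`. This is only a rough sketch of what `import` does, not the interpreter's actual code path; the module name `m_demo` and its contents are hypothetical and written to a temporary directory for the demonstration.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# Hypothetical module created just for this demonstration.
tmp = tempfile.mkdtemp()
Path(tmp, "m_demo.py").write_text("def f():\n    return 'hello'\n")
sys.path.insert(0, tmp)
importlib.invalidate_caches()

# Step 1: find the module via sys.path and register it in the global cache.
spec = importlib.util.find_spec("m_demo")
module = importlib.util.module_from_spec(spec)
sys.modules["m_demo"] = module

# Step 2: load and initialize, i.e. run the module's code in its own namespace.
spec.loader.exec_module(module)

# Step 3: bind a name in the current namespace to the module object.
m_demo = module

print(m_demo.f())                      # hello
print(sys.modules["m_demo"] is m_demo)  # True
```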

Start the Python interpreter in interactive mode and see for yourself. (The output has been lightly reformatted for readability.)

[.../module-experiments]$ python
Python 3.8.2 (default, Apr 18 2020, 18:08:23) 
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> print(__name__)
__main__

>>> import sys
>>> import m
>>> sys.modules['m']
<module 'm' from '.../module-experiments/m.py'>
>>> m
<module 'm' from '.../module-experiments/m.py'>
>>> m.f
<function f at 0x108452b80>
>>> m.C
<class 'm.C'>

You can see that the module right after startup is `__main__`. You can also see that importing `m` gives you access to `m`, `m.f`, and `m.C`.

An imported module is bound to a name in the local namespace. Therefore, like any other name, a module imported inside a function cannot be accessed by that name from other functions. However, because the cache `sys.modules` is global, the module object itself can still be reached through it.
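A minimal sketch of this difference between the local binding and the global cache follows. The function names are hypothetical, and `colorsys` is used only because it is a small standard-library module that is assumed not to be imported at module level here.

```python
import sys


def import_locally():
    import colorsys  # the name "colorsys" is bound only inside this function
    return colorsys.__name__


def try_other_function():
    try:
        return colorsys.__name__  # the name is not bound here or at module level
    except NameError:
        return "NameError"


print(import_locally())           # colorsys
print(try_other_function())       # NameError
print("colorsys" in sys.modules)  # True: the cache is global
```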

What is a package?

A package is just like a module, except that you can search for and import submodules within it. By using packages, you can create a hierarchical structure of modules.

Put simply

The name of a module (or package) is then a dot-separated name such as `a.b.c`. Importing the module `a.b.c` with `import a.b.c` does the following:

- Import the module (package) `a`.
- Import the submodule (package) `b` (`a.b`) in the namespace of module `a`.
- Import the submodule `c` (`a.b.c`) in the namespace of module `a.b`.

This lets you access the module `a.b.c` as `a.b.c`. **Note: you cannot access it as `c`.**

You can also see that this rule allows you to import packages and modules in the same way.

Of course, `a.d` and `a.d.e` are not accessible immediately after importing `a.b.c`. If you then run `import a.d.e`, `a` is already in the cache, so only the following steps happen:

- Import the submodule (package) `d` (`a.d`) in the namespace of module `a`.
- Import the submodule `e` (`a.d.e`) in the namespace of module `a.d`.

Each of these modules is initialized at that point.
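To watch this happen, the sketch below builds a throwaway package tree and inspects `sys.modules` after each import. The package name `demo_a` stands in for `a` and is hypothetical, as is the helper `cached_demo_modules`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package tree (hypothetical names) out of empty files.
root = Path(tempfile.mkdtemp())
for rel in ("demo_a/__init__.py", "demo_a/b/__init__.py", "demo_a/b/c.py",
            "demo_a/d/__init__.py", "demo_a/d/e.py"):
    target = root / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("")
sys.path.insert(0, str(root))
importlib.invalidate_caches()


def cached_demo_modules():
    """List the cached modules that belong to the demo_a package."""
    return sorted(name for name in sys.modules if name.split(".")[0] == "demo_a")


import demo_a.b.c
after_first = cached_demo_modules()
d_visible_early = hasattr(demo_a, "d")  # demo_a.d has not been imported yet

import demo_a.d.e
after_second = cached_demo_modules()

print(after_first)      # ['demo_a', 'demo_a.b', 'demo_a.b.c']
print(d_visible_early)  # False
print(after_second)     # the list above plus 'demo_a.d' and 'demo_a.d.e'
```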

A little more detail

Technically, a package is a module that has the `__path__` attribute. This attribute is used when searching for submodules and subpackages. In other words, `import a.b.c` does the following:

- Find `a` in `sys.path` and import it.
- In the namespace of module `a`, find `a.b` in `a.__path__` and import it.
- In the namespace of module `a.b`, find `a.b.c` in `a.b.__path__` and import it.

Note that every module, including packages, has the `__name__` attribute. For an imported module, its value is the fully qualified name of the module (such as `a.b.c`).
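The attributes mentioned here can be inspected directly. The following sketch uses a hypothetical package `demo_p` created in a temporary directory; it shows that the package carries `__path__` while the plain module does not.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway package tree with hypothetical names.
root = Path(tempfile.mkdtemp())
for rel in ("demo_p/__init__.py", "demo_p/q/__init__.py", "demo_p/q/r.py"):
    target = root / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_p.q.r

package = demo_p.q    # a package: has __path__
module = demo_p.q.r   # a plain module: no __path__

print(package.__name__, package.__package__)   # demo_p.q demo_p.q
print(list(package.__path__))                  # one entry: the directory demo_p/q
print(module.__name__, module.__package__)     # demo_p.q.r demo_p.q
print(hasattr(module, "__path__"))             # False
```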

Package import

Since a package is a module, it can be imported with the import statement like a module.

import ... as ...

If the import statement has an `as` clause, the imported module itself is bound to the name that follows `as`. For example, `import a.b as b` lets you access the module `a.b` directly by the name `b`.

In this case, module `a` is still loaded and initialized, so both `a` and, of course, `a.b` are cached. However, the name `a` is not bound in the current namespace.

>>> import sys
>>> import a.b as b
['/Users/shinsa/git/python-experiments/module-experiments/a']
a

>>> sys.modules['a']
<module 'a' from '/Users/shinsa/git/python-experiments/module-experiments/a/__init__.py'>
>>> sys.modules['a.b']
<module 'a.b' from '/Users/shinsa/git/python-experiments/module-experiments/a/b/__init__.py'>

>>> b
<module 'a.b' from '/Users/shinsa/git/python-experiments/module-experiments/a/b/__init__.py'>

>>> a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined

Module names and bound names are different

What comes immediately after `import` is a module name; a name bound in the current namespace cannot be used there. Therefore, even after `import a.b as b`, you cannot write `import a.b.c` as `import b.c`.
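A short sketch of this pitfall, using a hypothetical package `demo_x` and alias `y_alias` created only for the example:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway package tree with hypothetical names.
root = Path(tempfile.mkdtemp())
for rel in ("demo_x/__init__.py", "demo_x/y/__init__.py", "demo_x/y/z.py"):
    target = root / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_x.y as y_alias

try:
    import y_alias.z  # "y_alias" is a bound name, not a module name
except ModuleNotFoundError as exc:
    error_message = str(exc)
else:
    error_message = None

import demo_x.y.z  # the fully qualified module name works

print(error_message)                           # No module named 'y_alias'
print(y_alias.z is sys.modules["demo_x.y.z"])  # True
```

Once `demo_x.y.z` has been imported under its real name, it is reachable as an attribute of the aliased package, which is why the last line prints `True`.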

Import with `from` clause

Consider this form of import using the example `from a.b import c`. Here `c` can be a module, a package, or a name defined in `a.b`. The following steps are performed:

- Import `a.b`. However, no name is bound for it.
- Try to bind `c`.
    - If `c` is a name defined in `a.b`, bind `a.b.c` to the name `c`.
    - If `c` is a module or package, import `a.b.c` (without binding a name for it) and bind `a.b.c` to the name `c`.

If an `as` clause is specified, the object is bound to the name given in the `as` clause instead of `c`.
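Both cases can be tried with a small hypothetical package: `demo_f.g` defines the name `value` in its `__init__.py` and also has a submodule `sub`. All of these names are assumptions made for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway package tree with hypothetical names and contents.
root = Path(tempfile.mkdtemp())
files = {
    "demo_f/__init__.py": "",
    "demo_f/g/__init__.py": "value = 1\n",
    "demo_f/g/sub.py": "",
}
for rel, source in files.items():
    target = root / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from demo_f.g import value     # a name defined in demo_f.g
from demo_f.g import sub as s  # a submodule, bound under the name given by "as"

package_bound = "demo_f" in globals() or "demo_f" in locals()

print(value)                      # 1
print(s.__name__)                 # demo_f.g.sub
print(package_bound)              # False: demo_f itself is not bound
print("demo_f.g" in sys.modules)  # True: but it was imported and cached
```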

from ... import *

If the name to import is `*`, that is, `from M import *`, then the following names of module `M` are all bound in the current namespace:

- the names listed in `__all__`, if it is defined;
- otherwise, the names defined in `M` that do not start with `_`.
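The two rules can be compared side by side. In this sketch the modules `demo_star_all` and `demo_star_plain` and the helper `star_import` are hypothetical; the star import is executed with `exec` into a fresh dictionary so that the bound names are easy to list.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two hypothetical modules: one defines __all__, the other does not.
root = Path(tempfile.mkdtemp())
(root / "demo_star_all.py").write_text(
    "__all__ = ['listed', '_also_listed']\n"
    "listed = 1\n_also_listed = 2\nunlisted = 3\n"
)
(root / "demo_star_plain.py").write_text("visible = 1\n_hidden = 2\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()


def star_import(module_name):
    """Run 'from <module> import *' in a fresh namespace and list what was bound."""
    namespace = {}
    exec(f"from {module_name} import *", namespace)
    return sorted(name for name in namespace if name != "__builtins__")


print(star_import("demo_star_all"))    # ['_also_listed', 'listed']
print(star_import("demo_star_plain"))  # ['visible']
```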

Relative import

In a `from ... import ...` statement, you can write the package in the `from` clause in relative form. For example, in the initialization script of package `a.b`, or in the script of module `a.b.c`:

- `from .` means `from a.b`,
- `from ..` means `from a`,
- `from .d` means `from a.b.d`, and `from ..d` means `from a.d`.

Further dots go further up the package hierarchy.
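The mapping from relative to absolute names can be checked with `importlib.util.resolve_name`, which performs the same name calculation without importing anything. The package name `a.b` matches the example in the text; no such package needs to exist for this sketch.

```python
from importlib.util import resolve_name

package = "a.b"  # the package that contains the importing code

# Map each relative form to the absolute module name it stands for.
resolved = {relative: resolve_name(relative, package)
            for relative in (".", "..", ".d", "..d")}

for relative, absolute in resolved.items():
    print(f"from {relative} -> from {absolute}")
```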

PEP 366

Relative imports used to be resolved from the `__name__` attribute. But when you start the interpreter with `python -m a.b.c`, for example, `a/b/c.py` runs as the main module, so its `__name__` is `__main__`. As a result, a relative import of, say, `a.b.d` did not work.

Newer versions of Python resolve relative imports from the `__package__` attribute, which fixes the problem for `python -m` (PEP 366). However, if you start the script as `python <path>/a/b/c.py`, you need to set the attribute manually.
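The effect can be approximated from inside Python with `runpy.run_module`, the standard-library function behind the `-m` switch. This is a sketch under the assumption that it behaves closely enough to `python -m` for illustration; the package `demo_m` and its contents are hypothetical.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Hypothetical package: demo_m.b.c relatively imports a value from demo_m.b.d.
root = Path(tempfile.mkdtemp())
files = {
    "demo_m/__init__.py": "",
    "demo_m/b/__init__.py": "",
    "demo_m/b/d.py": "VALUE = 42\n",
    "demo_m/b/c.py": "from .d import VALUE\n",
}
for rel, source in files.items():
    target = root / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Run demo_m.b.c as if it were the main module and collect its globals.
result = runpy.run_module("demo_m.b.c", run_name="__main__")

print(result["__name__"])     # __main__
print(result["__package__"])  # demo_m.b
print(result["VALUE"])        # 42: the relative import worked
```

Even though `__name__` is `__main__`, the relative import succeeds because `__package__` still names the containing package.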

Default implementation: Import system for file system

So far, modules, packages, and `__path__` have been explained abstractly; nothing has been said about a concrete file system. In recent versions, Python's import system has become a highly abstract and powerful mechanism.

By default, the import system works on the file system: modules are represented by `.py` files and packages by directories. The details are as follows, and you can see that they fit the abstract structure described so far.

- The entries of `sys.path` are treated as file paths and searched. `import m` searches the directories in `sys.path` in order and imports `m.py` from the first directory that contains it.
- Package `a.b` is loaded and initialized by executing `a/b/__init__.py`.
- Module `a.b.c` is loaded and initialized by executing `a/b/c.py`.
- The `__path__` attribute of package `a.b` holds the path of the directory `b`.
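The correspondence with files can be confirmed through the `__file__` and `__path__` attributes. The package `demo_fs` below is hypothetical and lives in a temporary directory; paths are printed relative to that directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway package tree with hypothetical names.
root = Path(tempfile.mkdtemp())
for rel in ("demo_fs/__init__.py", "demo_fs/b/__init__.py", "demo_fs/b/c.py"):
    target = root / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_fs.b.c

# Where each module object came from, relative to the temporary root.
package_file = Path(demo_fs.b.__file__).relative_to(root).as_posix()
module_file = Path(demo_fs.b.c.__file__).relative_to(root).as_posix()
package_dir = Path(list(demo_fs.b.__path__)[0]).relative_to(root).as_posix()

print(package_file)  # demo_fs/b/__init__.py
print(module_file)   # demo_fs/b/c.py
print(package_dir)   # demo_fs/b
```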

Conversely, you could implement, for example, a URI-based import system, or build a service discovery mechanism on top of Python's import system.
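As a much smaller stand-in for such a system, the sketch below imports a module from an in-memory dictionary instead of the file system. The class `DictFinder`, the dictionary `SOURCES`, and the module name `demo_virtual` are all hypothetical; a URI-based importer would fetch the source from elsewhere but could plug into `sys.meta_path` in the same way.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical in-memory "storage" standing in for a non-file source.
SOURCES = {"demo_virtual": "def greet():\n    return 'hi from memory'\n"}


class DictFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader in one object, backed by the SOURCES dictionary."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname in SOURCES:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # let the other finders handle everything else

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        # Initialize the module by running its source in the module's namespace.
        exec(SOURCES[module.__name__], module.__dict__)


sys.meta_path.append(DictFinder())

import demo_virtual

print(demo_virtual.greet())  # hi from memory
```

Appending to `sys.meta_path` means the standard finders are consulted first, so this finder only handles names that nothing else can find.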

Finally: a small complaint

The Python documentation is confusing because the information is scattered all over the place. That would be understandable if it were done to avoid duplicating information, but instead each piece of information is subtly duplicated and still scattered. In addition, some of the information is out of date.

Get an abstract understanding of Python modules and packages