--Explain how to create a Python package --Explain what you want to be aware of when creating a Python package ――This article is Poem
We are planning to have an intern this summer, but they have never made a package.
On the other hand, as a company, if you do not package it, it will take time to use it in practice, which is difficult.
So I wrote a document for internal use called "How to make a Python package". Let the intern read this and make a nice package! It is a convenient plan.
However, I haven't studied the packaging know-how properly and systematically, so I'm worried about it. So, I decided to post a poem on Qiita and have the poem corrected.
Please forgive that Japanese is strange in some places. I am inconvenient in Japanese. [^ 1]
In other words, it is the prerequisite knowledge that the intern student has.
--I use Python for research on a daily basis. --You can do pre-processing work such as machine learning and natural language processing. (100 knocks of natural language processing seems to have been cleared) --I haven't made a package yet. --Git can be used.
↓ Poem from here
In fact, there is no official directory structure in the Python world. The code works even if you have a sloppy directory structure. However, that would make the code very confusing to other developers. That's why you need an easy-to-understand directory structure.
So I recommend following the this blog method. This blog clones the Github repository. Again, let's clone this repository.
MacBook-Pro% git clone [email protected]:kennethreitz/samplemod.git
Python's package naming convention is __ all lowercase __ and __ underscore delimited __ is recommended. Google's Great People and [People in Python](https://www.python.org/dev/peps/ pep-0008 / # package-and-module-names) says so, so it's safer to follow.
This time, let's create a package called hoge_hoge
. It's a good idea to match the cloned directory with the package name.
Also, the cloned directory has the original git file. So, let's delete the git file first.
After that, make the directory where the script is placed the same name as the package name. In other words, it's hoge_hoge
.
MacBook-Pro% mv samplemod hoge_hoge
MacBook-Pro% cd hoge_hoge
MacBook-Pro% rm -rf .git
MacBook-Pro% mv sample hoge_hoge
After running the above command, your directory structure should look like this.
(Changed in response to @ shn's suggestion. Requirement.txt
has been deleted.)
MacBook-Pro% tree
.
├── LICENSE
├── Makefile
├── README.rst
├── docs
│ ├── Makefile
│ ├── conf.py
│ ├── index.rst
│ └── make.bat
├── hoge_hoge
│ ├── __init__.py
│ ├── core.py
│ └── helpers.py
├── setup.py
└── tests
├── __init__.py
├── context.py
├── test_advanced.py
└── test_basic.py
Let's initialize a new git for your next package.
MacBook-Pro% git init
MacBook-Pro% git add .
MacBook-Pro% git commit -m 'first commit'
Now you are ready.
There are several tools that initialize the directory structure of python packages. Like cookiecutter. But I don't recommend it very much. This is because too many extra files are created.
setup.py
is a required file for the package. In this file, write the dependency information, version information, and package name of the package to be created.
This poem describes only the minimum fields. If you're interested in other fields, take a look at The Scriptures of People in Python.
The contents of the original setup.py
should look like this.
setup(
name='sample',
version='0.0.1',
description='Sample package for Python-Guide.org',
long_description=readme,
author='Kenneth Reitz',
author_email='[email protected]',
url='https://github.com/kennethreitz/samplemod',
license=license,
packages=find_packages(exclude=('tests', 'docs'))
)
Field name | Description | More detailed explanation |
---|---|---|
name | Write the name of the package | |
version | Write the current version information | |
description | Let's briefly write "What does this package do?" | |
install_requires | Write dependent package information | this |
dependency_links | Write when the dependent package does not exist on the pypi. | this |
__ In particular, what you absolutely must write __ is ʻinstall_requires. Developers using your packages don't know "Which package do you depend on?" Without this information, other developers will be in trouble. (
Dependency_links` will be removed by [pip update] around 2019/1 (https://github.com/pypa/pip/pull/6060). Alternative methods will be described later.)
Maybe you've done python setup.py install
before. This command works because the dependency information is written properly.
ʻList the dependent packages in the install_requiresfield. For example, if you need
numpy, write
numpy`.
setup(
name='sample',
version='0.0.1',
description='Sample package for Python-Guide.org',
long_description=readme,
author='Kenneth Reitz',
author_email='[email protected]',
install_requires=['numpy'],
url='https://github.com/kennethreitz/samplemod',
license=license,
packages=find_packages(exclude=('tests', 'docs'))
)
If you are a general company, you probably operate a package that you have developed in-house. There is no such package on Pypi, so you have to tell setup.py where the package is located. dependency_links
is needed at such times.
You may not want to use dependency_links
in the future. The reason is that dependency_links
is obsolete in pip.
An alternative is to write it in requirements.txt.
requirements.txt
has long been used by ʻinstall_requires` as an alternative package-dependent description.
It's a selfish opinion, but I think it's a fairly common method.
Describe the dependency information in requirements.txt
like this.
#You can write the package name that exists in pypi as it is
numpy
#Use equations when you want to specify the version
scipy == 1.2.2
#git for packages that don't exist on pypi://Repository URL.git
git://[email protected]/foo/foo.git
#For private repositories+Add ssh git+ssh://[email protected]/foo/foo.git
git+ssh://[email protected]/foo/foo.git
Then, write the process to read requirements.txt in setup.py.
This code is just right.
import os, sys
from setuptools import setup, find_packages
def read_requirements():
"""Parse requirements from requirements.txt."""
reqs_path = os.path.join('.', 'requirements.txt')
with open(reqs_path, 'r') as f:
requirements = [line.rstrip() for line in f]
return requirements
setup(
..., # Other stuff here
install_requires=read_requirements(),
)
(Important) This story is no longer available at least on pip. setuptools may soon remove this argument. Therefore, please consider it deprecated.
(In the case of Hayashi) Since I am using Github, I will write only the example of git. For other version control systems, StackOverflow article Let's see.
The basic formula is
git+ssh://git@URL-TO-THE-PACKAGE#egg=PACKAGE-NAME
is. For example, suppose you want to write dependency information for a package called ʻunko`. At that time
git+ssh://[email protected]/our-company-internal/unko.git#egg=unko
is.
If you want to fix it in a specific version, you can specify the branch name or tag name after @
.
git+ssh://[email protected]/our-company-internal/[email protected]#egg=unko-0.1
@ v0.1
is a github branch name or tag name. -0.1
is the package version.
Finally, rewrite the information such as name
, ʻauthor`.
Writing your name will probably make you more attached;)
The final setup.py
looks like this.
import os, sys
from setuptools import setup, find_packages
def read_requirements():
"""Parse requirements from requirements.txt."""
reqs_path = os.path.join('.', 'requirements.txt')
with open(reqs_path, 'r') as f:
requirements = [line.rstrip() for line in f]
return requirements
setup(
name='hoge_hoge',
version='0.0.1',
description='Sample package for Python-Guide.org',
long_description=readme,
author='Kensuke Mitsuzawa',
author_email='[email protected]',
install_requires=read_requirements(),
url='https://github.com/kennethreitz/samplemod',
license=license,
packages=find_packages(exclude=('tests', 'docs'))
)
There are many ways to write code, but I won't go into details this time. Please write as before. But [__ "What I want you to remember" __](http://qiita.com/Kensuke-Mitsuzawa/items/7717f823df5a30c27077#%E8%A6%9A%E3%81%88%E3%81%A6% E3% 81% 8A% E3% 81% 84% E3% 81% A6% E3% 81% BB% E3% 81% 97% E3% 81% 84% E3% 81% 93% E3% 81% A8) Therefore I want it.
Be sure to write the test. Maybe you've written the code to check the operation before. However, in addition to checking the operation, please consider the test case yourself as much as possible and verify the output.
Here, we will only explain how to write test cases. In Python there is a standard test framework called ʻunit test`. Let's use this framework. I will omit the details, but see This article-JP and This article-EN. Let's master how to use it.
The place to put the test code is under the tests /
directory.
├── setup.py
└── tests
├── __init__.py
├── context.py
├── test_advanced.py
└── test_basic.py
Once you've created one module script for your package, create one test script as well. That way, the correspondence is clearly understood and maintainability is improved. Now let's say you created hoge_hoge / core.py
as a module script.
├── hoge_hoge
│ ├── __init__.py
│ ├── core.py
Place the test script for core.py
undertests /
. It will be easier to understand if you add the prefix test_
to test_core.py
.
└── tests
├── __init__.py
├── test_core.py
There are two things to keep in mind when writing a test script.
--The test class name is camel case
--The prefix test_
is required before the method name. Without this prefix, test would be ignored.
Once tested, let's run the script. python test_core.py
.
Or if it is pycharm, press the normal run button and the test will run.
import unittest
class TestCore(unittest.TestCase):
@classmethod
def setUpClass(cls):
# procedures before tests are started. This code block is executed only once
pass
@classmethod
def tearDownClass(cls):
# procedures after tests are finished. This code block is executed only once
pass
def setUp(self):
# procedures before every tests are started. This code block is executed every time
pass
def tearDown(self):
# procedures after every tests are finished. This code block is executed every time
pass
def test_core(self):
# one test case. here.
# You must “test_” prefix always. Unless, unittest ignores
pass
if __name__ == '__main__':
unittest.main()
(Updated in response to @ podhmo's comment. Thanks!)
Once you've written some test scripts, let's make python setup.py test
executable.
Be sure to add a habit of executing this command if you make a major change. I often find problems that are out of my mind. When I am working on improving the package, I often forget to run the test by changing it to "Oh, change this too".
It's also very helpful as other developers can easily test it in their own environment.
The current test directory has this structure.
└── tests
├── __init__.py
├── test_all.py
├── test_advanced.py
└── test_basic.py
Specify the location of the test directory in setup.py. setup.py will automatically load the test class and run the test for you.
setup(
name='hoge_hoge',
version='0.0.1',
description='Sample package for Python-Guide.org',
long_description=readme,
author='Kensuke Mitsuzawa',
author_email='[email protected]',
install_requires=['numpy', 'unko'],
dependency_links=['git+ssh://[email protected]/our-company-internal/unko.git#egg=unko'],
url='https://github.com/kennethreitz/samplemod',
license=license,
packages=find_packages(exclude=('tests', 'docs')),
test_suite='tests'
)
Run python setup.py test
. You can see how all the tests are running.
If there is an error, it will be displayed at the end.
Mechanko awesome algorithm may be implemented in your package. However, other developers do not have the time to check the contents one by one. (Unfortunately)
The interface is needed to make your mechanko awesome algorithm ready for monkey developers.
Even with a simple interface, monkey developers complain that they don't know how to use it. Let's write a sample code and teach you how to use it.
I think the place to write the interface depends on your preference. If you're writing object-oriented code, you can write interface methods inside your class. If you are writing a function object in a script file like Python, you can prepare a script file for the interface. (By the way, I am the latter.)
In any case, make sure that __ input / output is clarified __. Take the request package interface (https://github.com/kennethreitz/requests/blob/fb014560611f6ebb97e7deb03ad8336c3c8f2db1/requests/api.py) as an example. [^ 2]
The important thing in the sample code is to clarify __ input / output __. Monkey developers often copy and paste sample code. The sample code that allows you to understand the input and output immediately even when using copy is the best. As an example, jaconv usage is mentioned.
If your team has a README writing style, be sure to follow that style __. (Unfortunately, there is no such method in Heisha)
The principle when writing a README is that "other developers can understand the installation method and the execution method just by reading the README".
In most cases, you should have the following information.
--How to install (usually python setup.py install
should be fine)
--How to run the test (usually python setup.py test
should be fine)
――How do you use it? (In general, "Look at the sample code" should be fine)
If there are any items that need to be prepared in advance, they must be written in the README. For example
--Mecab must be installed in advance --You have to set up RedisDB
In this case, you must write it in the README. If you can afford it, write a Make file. Other developers cry and rejoice.
One thing to keep in mind is that "other developers can't understand I / O just by looking at the code." That's why you need sample code or you have to write doc comments firmly. However, it is difficult to write polite comments on all methods and functions (the time available for internships is finite). At the very least, write the __ interface function input / output __ with a polite explanation.
Use github to manage your code. So (at least at our company), your package may soon be used by other developers.
That's why you should cherish the meaning of branch. In most cases, the master
branch is the" stable version ". The development version must be pushed to another branch.
As an example, this is my development style.
--Push to master until the first alpha version (always state at the beginning of the README "This is under development") --After the alpha version, create another development branch. How to handle branches follows git-flow [^ 3].
git-flow
is a rule that determines how to handle branches. This rule gives teams flexibility in development.
Follow git-flow, even if you're developing it by yourself. This is because other developers can participate in the development smoothly. For the flow of git-flow, see This good article.
Do the same for other developers to get the same results. Don't hard code information that can only be executed in your environment. Common hard-coded information
--Unix command execution path --Absolute path to external file
If this information is hard coded, it cannot be executed by other developers. I'm in trouble. For example, here is a bad example
def do_something():
path_to_external_unix_command = '/usr/local/bin/command'
path_to_input_file = '/Users/kensuke-mi/Desktop/codes/hoge_hoge/input.txt'
return call_some_function(path_to_input_file)
The contents of "input.txt" remain unknown to anyone. In this case, this is good.
--If you need an external file, push it to the same repository. Use relative path from script --Allow information to be passed as an argument to the function. --If unix command is required, describe it in the configuration file.
Here's a good example above.
def do_something(path_to_external_unix_command, path_to_input_file):
return call_some_function(path_to_input_file)
path_to_input_file = './input.txt'
path_to_external_unix_command = load_from_setting_file()
do_something(path_to_external_unix_command, path_to_input_file)
I wrote "I / O clearly" several times, but to be honest, it's troublesome. Moreover, it is easy to forget when you change it. In this case, it doesn't make sense to write a clear comment. So here is the type hint. It is a mechanism that can describe the type of an object like Java. However, the difference from Java is that Python's "type hint has no effect on execution". [^ 4]
--Easy for other developers to understand I / O types --Pycharm (as well as other IDEs) warns you when there is an I / O mismatch, so it's easy to notice mistakes. --Pycharm (and other IDEs) will suggest methods from type information, speeding up development
――The amount of typing characters increases a little
I'll give you an example.
Who are the names
and persons
in this implementation? It is not well understood. And what is greeting
a function that returns? Is also unknown.
def greeting(names, persons):
target = decide_target_person(persons)
greet = generate(target, names)
return greet
Now, let's add a type hint here.
def greeting(names: List[str], persons: Dict[str, str])->str:
target = decide_target_person(persons)
greet = generate(target, names)
return greet
Now the I / O is clear!
Pycharm fully supports type hints.
--Suggest of the method of the object --Warning when there is a type mismatch
The image below is the development screen of Pycharm. I am getting a warning because I / O of target
is wrong.
This area is a good article.
-PEP 0484 type hint (JP) -About Python 3.5 Type hint (JP)
In Python2, the type hint of Python3 expression cannot be interpreted and an error occurs. However, PEP offers a great way to "write a type hint as a comment". Pycharm is only partially supported yet, but it should be fully supported in the near future.
↑ So far Poem
It feels messy, but is it okay? Is it possible to make a package with only such information? I am anxious. Tsukkomi is welcome.
[^ 1]: Actually, the original document itself is written in English. Some international students come to the internship.
[^ 2]: The request package has a reputation for being "very easy to read code".
[^ 3]: Actually git-flow
is the name of the tool, but it often indicates the development style (only around me ??)
[^ 4]: Isn't that against Python's philosophy? I hear a few voices saying, but it is an item that is ʻAccepted` in the PEP where people in Python are active. Therefore, there are some things that you should keep. (In the first place, the member who made this proposal is Guido van Rossum)
Recommended Posts