Hello everyone. This is the day-23 article of the "ABEJA Advent Calendar 2020".
I am a lawyer at a law firm, and I also support legal affairs at ABEJA.
I also hold the JDLA E certificate, so I do a little engineering on the side and call myself a machine learning engineer.
At work I often talk about AI and law, and when I do, people often tell me things like:
① There are so many kinds of rights, such as ownership, copyright, and patent rights, that I'm not sure which of them matter for AI.
② I'm not sure how to judge whether a particular use of data is legally OK or not.
③ I don't understand the issue of model rights when commissioning or accepting AI development.
And so on.
So I've written this article to explain all three at once, as plainly as I can! Along the way, I touch a little on the Ministry of Economy, Trade and Industry's (METI) "Contract Guidelines on Utilization of AI and Data (AI Section)" (called the "AI Contract Guidelines" in this article).
It's a long article, so feel free to read only the parts that interest you!
(By the way, ABEJA Legal includes a lawyer who has set up an R&D team and builds machine learning models himself, and his article on "AI and Fairness" has been very well received. Do take a look!)
So let's get into the main subject right away!
The AI development and usage stages can be illustrated as follows (quoted from page 12 of the AI Contract Guidelines).
In addition, various kinds of know-how are shared during AI development, and new know-how is created.
So the items likely to become issues in a development contract are raw data, learning datasets, learning programs, inference programs, learned parameters, know-how, and so on.
When you hear the word "right," do you think of ownership, copyright, and patent rights? (Some of you may have thought of "intellectual property rights," but that is an umbrella term for rights such as copyrights, patent rights, and trademark rights, so I won't use it here.)
If you can fill in the blanks in the following table, you can answer the question "which rights matter?" Do you know what goes in the blanks?
Let me briefly explain ownership, copyright, and patent rights so we can fill them in.
The answer: ownership does not arise for anything in the table! Ownership can only exist in "tangible things" (see Article 206 of the Civil Code, read together with Article 85). Data and programs are not tangible, so nobody owns them in this sense. Even if data is stored on a recording medium that you own, that does not mean you own the data on it. So the ownership column is filled in as follows:
Unlike ownership, copyright can arise in something intangible. Copyright arises in "a creative expression of thoughts or sentiments."
Copyright can occur in creative texts, photographs, and programs, as well as in paintings and music.
So, for example, if the raw data used for training consists of text someone wrote or photographs someone took, the data may be copyrighted. (On the other hand, copyright basically does not arise in numerical data obtained mechanically or in photographs taken mechanically, and it does not arise in parameters.)
The source code of learning programs and inference programs can also be copyrighted.
So, the answer for the copyright part is as follows.
(The learning dataset is marked △. I'll skip the details here, because explaining "database works" gets complicated.)
Let me explain a little what "copyright arises" means. Copyright is often described as "a bundle of rights." If you hold the copyright, you can copy the work (reproduction right), post it on Qiita (public transmission right), edit it (adaptation right), and so on, and you can also demand that others stop doing these things without your permission. Copyright is a bundle of these various rights.
You can think of it this way: ownership is the right to control a tangible "thing" that you own, while copyright is a right to an intangible creation. Take a painting. The physical canvas is subject to ownership, and the picture as an intangible creation (visually inseparable from the canvas) is subject to copyright. If someone steals the canvas, you demand its return on the basis of ownership. But if someone is selling a similar picture and you want them to stop, ownership basically doesn't help (your physical canvas wasn't taken). In that case, you demand that they stop on the basis of copyright.
A patent right arises when an invention is applied for and registered.
A key difference is that copyright arises automatically the moment a work is created, whereas a patent right does not arise unless it is applied for and registered.
So where can patent rights arise? A patent requires an invention, so patent rights usually do not arise in simple data such as raw data. For learning programs, if we consider developing a pure AI model (excluding peripheral systems and so on), a patent may be granted for an invention relating to an algorithm. (A famous example is Google's batch normalization, which has already been registered as a patent in Japan, the United States, and elsewhere. There's a lot I'd like to say about GAFA and others acquiring important patents, but I can't cover it here, so I'll save it for another occasion.)
So, the answer to the patent right part has been filled in!
You may have noticed something worrying here: some rows are × across the board (no ownership, no copyright, no patent right), for example non-creative raw data and learned parameters. Learned parameters are valuable information, but because they are just matrices of numbers with no creativity, it is generally considered that neither copyright nor patent rights arise in them. Is there any protection for this kind of "information in which no rights arise"? The answer is basically "no" (with exceptions I won't cover here, such as "trade secrets" and "limited-provision data").
In other words, if you carelessly hand such information to a third party, you basically have no grounds to complain, however that third party uses it.
That's a problem! This is why non-disclosure agreements (NDAs) and confidentiality clauses exist. A confidentiality clause usually provides:
・ Do not use the information for any purpose other than ●● (limiting the purpose)
・ Do not disclose it to third parties (limiting who receives it)
・ Do not copy it (limiting how it is used)
These restrictions also protect information in which no rights arise. (Concluding an NDA before exchanging information is a hassle, but it really is important, and I hope this makes clear why.)
So, I will summarize the points so far.
First of all, this figure is important.
In addition, the following points are important.
- **Ownership arises only in tangible things, not in data or programs.**
- **Copyright and patent rights can arise in intangible things.**
- **Copyright arises as soon as a work is created, but a patent right does not arise until it is registered.**
- **Copyright is a bundle of rights. The copyright holder can copy, publish, edit, and so on, and others cannot do these things without permission.**
- **Some information carries no rights at all (no ownership, copyright, or patent right). To protect such information, you need to conclude a contract that restricts its disclosure and use.**
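If it helps to see this summary as data, here is a toy Python sketch that encodes the rights table as I've simplified it in this article. It is an illustration under stated assumptions, not a legal test: the artifact keys, the three status labels, and the helpers `rights_that_may_arise` and `needs_contract_protection` are hypothetical names. The learning dataset is left out because its △ needs the "database work" discussion, and the inference program's patent cell is marked "unknown" because the text above doesn't discuss it.

```python
"""Toy lookup of which rights may arise in AI development artifacts.

Simplified from this article's explanation of Japanese law; illustration only.
"maybe"   -> the right can arise depending on the facts
"no"      -> the right basically does not arise
"unknown" -> not discussed in the text, left open here
"""

# Ownership needs a tangible thing (Civil Code Art. 206 with Art. 85),
# so it is "no" in every row. Artifact keys are hypothetical labels.
RIGHTS = {
    "raw_data_creative":   {"ownership": "no", "copyright": "maybe", "patent": "no"},
    "raw_data_mechanical": {"ownership": "no", "copyright": "no", "patent": "no"},
    "learning_program":    {"ownership": "no", "copyright": "maybe", "patent": "maybe"},
    "inference_program":   {"ownership": "no", "copyright": "maybe", "patent": "unknown"},
    "learned_parameters":  {"ownership": "no", "copyright": "no", "patent": "no"},
}


def rights_that_may_arise(artifact: str) -> list[str]:
    """Return the rights marked "maybe" for the given artifact."""
    return [right for right, status in RIGHTS[artifact].items() if status == "maybe"]


def needs_contract_protection(artifact: str) -> bool:
    """True when no right arises at all, so only a contract (e.g. an NDA) protects it."""
    return all(status == "no" for status in RIGHTS[artifact].values())


if __name__ == "__main__":
    for name in RIGHTS:
        print(f"{name:20} may arise: {rights_that_may_arise(name)}, "
              f"needs NDA: {needs_contract_protection(name)}")
```

In this sketch, learned parameters and mechanically captured raw data are the rows flagged as needing contract protection, which mirrors the NDA point above.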
In machine learning, you first need to collect data. Typical ways to obtain it include (1) preparing it in-house, (2) buying it from a specific company, (3) using a public dataset, and (4) collecting it from an unspecified number of people. Let's look at each in turn.
(1) There are many examples of preparing data in-house. For instance, to build an AI that detects defective products, you might take a large number of photos of your own products. Such images are usually not creative (they are taken mechanically), so copyright is unlikely to arise in them (as I explained earlier, whether raw data is copyrighted depends on whether it is creative). There seems to be no legal obstacle to using them.
(2) When buying data from a specific company, it should be fine if you confirm that the seller holds the rights and have those rights transferred to you along with the data. Alternatively, the seller may keep the rights and give you permission to use the data for training. The point is that it depends on the contract you conclude. In fact, METI has published guidelines for data transactions ("Contract Guidelines on Utilization of AI and Data (Data Section)"), so take a look if you are interested.
(3) Publicly available datasets often come with terms of use. For example, the well-known image dataset COCO adopts the Creative Commons Attribution 4.0 License, whose BY (Attribution) element requires you to credit the original author (name, title of the work, etc.). You are generally understood to agree to (that is, to contract for) such terms when you download the data, so the data provider and the recipient are bound by them as a contract. This structure, in which terms of use apply as contractual obligations, is the same as with an NDA (and the same applies to (2), buying data from a specific company). The point is that you need to check the terms of use that have been set.
(4) What are the issues when collecting data from an unspecified number of people? For example, suppose you want to collect landscape images by crawling and scraping and use them for training.
The collected landscape images were taken by various people, so they are copyrighted. To use them for training, you need to copy them, preprocess them (for example, resize them), and annotate them, which looks like it would infringe the reproduction right and the adaptation right. And even if you wanted to get the copyright holders' permission, you wouldn't know who they are.
In fact, however, you can do this without infringing copyright. You may have heard Japan's copyright law called a "machine learning paradise": Article 30-4 of the Copyright Act provides that, in certain cases, copyrighted works may be used for purposes of information analysis such as machine learning. Article 30-4 took its current form in the amended Copyright Act that came into effect on January 1, 2019, which made Japan even more of a machine learning paradise (it was already called one before the amendment).
Incidentally, in February 2019 ABEJA partnered with RPA Technologies to develop a one-stop dataset crawling service that the amended law made possible for the first time. The press release announcing the service explains the amended law in unusual detail for a press release, so if you're interested in Article 30-4 of the Copyright Act, please read it.
Another case of collecting data from an unspecified number of people is installing cameras on the street to collect images of people's faces. Images captured mechanically by a camera are basically not copyrighted works, but in this case you need to watch for restrictions from other angles, such as the Act on the Protection of Personal Information and privacy and ethics.
In this way, collecting and using data may require you to consider laws such as the Act on the Protection of Personal Information, as well as privacy and ethics.
- **Is the use of the data restricted by a contract (including terms of use)?**
- **Is the use of the data restricted by laws such as the Act on the Protection of Personal Information, or by privacy and ethics?**
- **Even if the data contains third parties' copyrighted works, it can be used lawfully in certain cases (Article 30-4 of the Copyright Act).**
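The three checks above can be sketched as a small checklist function. This is a hypothetical Python illustration, assuming the four collection routes from this section (labelled `in_house`, `purchased`, `public_dataset`, `unspecified_people`) and two yes/no facts about the data; all names are made up for the example, and it only lists what to look at rather than deciding whether a use is lawful.

```python
"""Toy checklist for the data-collection routes in this section (illustration only)."""
from dataclasses import dataclass

# Routes where a purchase contract or dataset terms of use usually apply.
CONTRACT_ROUTES = {"purchased", "public_dataset"}

CHECK_NOTES = {
    "contract_terms": "Is use restricted by the contract or terms of use?",
    "personal_info": "Is use restricted by the Act on the Protection of Personal "
                     "Information, or by privacy and ethics?",
    "art_30_4": "Contains third-party works: check whether Copyright Act "
                "Art. 30-4 (information analysis) applies.",
}


@dataclass
class DataSource:
    route: str                        # "in_house", "purchased", "public_dataset", "unspecified_people"
    contains_personal_info: bool      # e.g. faces captured by a street camera
    contains_third_party_works: bool  # e.g. photos crawled from the web


def checks(source: DataSource) -> list[str]:
    """Return the keys of CHECK_NOTES that apply to this data source."""
    items = []
    if source.route in CONTRACT_ROUTES:
        items.append("contract_terms")
    if source.contains_personal_info:
        items.append("personal_info")
    if source.contains_third_party_works:
        items.append("art_30_4")
    return items


if __name__ == "__main__":
    crawled_landscapes = DataSource("unspecified_people", False, True)
    for key in checks(crawled_landscapes):
        print(CHECK_NOTES[key])
```

Under these assumptions, in-house product photos produce an empty checklist, while a public dataset of photos such as COCO triggers both the terms-of-use check and the Article 30-4 check.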
In contracts for commissioning or accepting model development, the learning program, inference program, learned parameters, and so on are often delivered in the development phase. As the figure below shows, the learning program and inference program are copyrighted, but the learned parameters carry no such rights.
In this article, I'll focus on the most important items: the learning program and the inference program, in which copyright arises.
When a contract deals with the copyright in a delivered learning program or inference program, you need to consider two steps: (1) who the copyright belongs to (attribution of rights), and (2) under what conditions each party may use it (terms of use).
First, you need to decide whether the copyright belongs to the user or the vendor.
The options are attribution to the user, attribution to the vendor, or sharing between the user and the vendor. As I'll explain under terms of use, there is no hard-and-fast rule for who "should" own it, but there are rough tendencies:
・ If the vendor's business model relies on license fees, the copyright often belongs to the vendor.
・ If the user leads the development (for example, by deciding the development method) and the vendor has little discretion, the copyright is often attributed to the user.
・ If the user and the vendor work together to sell the finished model, the copyright is often shared.
By the way, if the contract says nothing about attribution, the copyright basically belongs to the vendor that created the program.
Next, the terms of use. A thing subject to ownership (a tangible thing) can basically be used by only one person at a time, whereas a thing subject to copyright (an intangible creation) can be used by many people at the same time.
For example, a house is a tangible thing subject to ownership, so the options are limited to things like:
・ The owner lives in it
・ The owner doesn't live in it and lets a tenant live there instead
By contrast, something subject to copyright (a program, for example) can be used in all sorts of ways:
・ Only the copyright holder uses it
・ The copyright holder uses it, and also publishes it so that people all over the world can use it
・ The copyright holder doesn't use it but lets only an acquaintance use it exclusively, and allows not just use but also improvement; when the acquaintance wants to publish an improved version, the holder allows that too
Earlier, I explained that copyright is a "bundle of rights."
You are free to decide which parts of this bundle to permit (a). For example:
・ Reproduction is freely permitted
・ Lending is permitted, but on conditions such as payment
・ Adaptation (editing) is permitted, but only within a limited scope
・ Everything other than reproduction, lending, and adaptation is prohibited
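Point (a) can be pictured as a license that decides each strand of the bundle separately. In this minimal Python sketch, `Permission`, `LICENSE`, and `permission_for` are hypothetical names, the conditions are free text taken from the example list, and any right not listed (such as the public transmission right) is treated as prohibited.

```python
"""Copyright as a bundle: a license decides each strand separately (illustration only)."""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Permission:
    allowed: bool
    condition: Optional[str] = None  # free-text condition, e.g. payment or a scope limit


# Hypothetical license mirroring the example list: right name -> permission.
LICENSE = {
    "reproduction": Permission(True),
    "lending": Permission(True, "licensee pays a fee"),
    "adaptation": Permission(True, "edits only within the agreed scope"),
}


def permission_for(right: str) -> Permission:
    """Rights missing from LICENSE are prohibited, like the last item in the list."""
    return LICENSE.get(right, Permission(False))


if __name__ == "__main__":
    for right in ("reproduction", "lending", "adaptation", "public_transmission"):
        print(right, permission_for(right))
```

Keeping a default of "prohibited" for unlisted rights is a design choice in this sketch that matches the catch-all last item; a real license might instead enumerate every right explicitly.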
Another important point is that the copyright holder's own rights can also be restricted by contract (b).
If you restrict the copyright holder's rights heavily while granting a third party a broad license, you can end up in a state that is not much different from the licensed third party owning the copyright.
Who the rights belong to matters, but I hope you can see that how you set the terms of use matters just as much.
So what factors should you focus on when setting terms of use? The following table from METI's AI Contract Guidelines (page 31) is a useful reference.
For example, you could set conditions such as:
・ The copyright belongs to the vendor
・ The user receives a free, non-exclusive license with no time limit, to the extent necessary for its own business
・ The vendor may basically use it freely, but must not roll it out to the user's competitors
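The example conditions above could be written down as a small record plus a check, which is roughly what filling in the table on page 31 amounts to. This Python sketch is hypothetical: `TermsOfUse`, `may_use`, the purpose label `own_business`, and the competitor list are assumptions for illustration, and real terms would need far more detail.

```python
"""Example terms of use: vendor owns copyright, user gets a limited license (illustration only)."""
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TermsOfUse:
    holder: str = "vendor"                 # copyright belongs to the vendor
    user_purpose: str = "own_business"     # user's non-exclusive license covers only this
    user_fee: int = 0                      # recorded for completeness: the license is free
    user_term_years: Optional[int] = None  # recorded for completeness: None = no time limit
    vendor_blocked: set[str] = field(default_factory=set)  # the user's competitors


def may_use(terms: TermsOfUse, party: str, purpose: str = "",
            customer: Optional[str] = None) -> bool:
    """Check one proposed use against the example conditions."""
    if party == "user":
        return purpose == terms.user_purpose
    if party == terms.holder:
        # The vendor may basically use it freely, except for the user's competitors.
        return customer not in terms.vendor_blocked
    return False


if __name__ == "__main__":
    terms = TermsOfUse(vendor_blocked={"CompetitorCo"})
    print(may_use(terms, "user", purpose="own_business"))     # True
    print(may_use(terms, "vendor", customer="CompetitorCo"))  # False
```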
Once you have decided on these conditions, you can attach the table above to the contract as an appendix and add a clause to the effect that "the deliverables shall be used under the conditions set out in the appendix"! (If you fill in this table properly and bring it to your legal team, it should get incorporated into the contract!) Below is a sample clause based on the AI Contract Guidelines (see pages 114 to 118).
Article ●● (Copyright in the Deliverables)
Setting aside difficult terms such as "moral rights of authors," I think this gives you a rough picture.
Finally, I would like to introduce two pitfalls that are often encountered when concluding a contract.
On attribution of copyright, if neither the user nor the vendor will back down and no agreement can be reached, one compromise is to "share" the copyright rather than give it to one party. Copyright may also be shared in cases such as joint development.
In that case, do you think each party can use it freely because it is shared?
Let's look at Article 65, paragraphs 1 and 2 of the Copyright Act.
(Exercise of Shared Copyright)
Article 65 (1) A co-owner of the copyright in a joint work or of any other jointly held copyright (referred to as "shared copyright" in this Article) may not transfer its share or make it the subject of a pledge without the consent of the other co-owners.
(2) A shared copyright may not be exercised without the agreement of all of its co-owners.
As the text shows, exercising the copyright requires the agreement of the co-owners. In other words, you can't do anything with it unless you have agreed that, say, you may use it for ●●. So when sharing copyright, it's important to write that agreement ("you may use it for ●●") into the same contract.
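The practical effect can be sketched in code: a co-owner may exercise the shared copyright for a purpose only if the contract already records that agreement, or if every co-owner agrees at the time. This Python sketch is a simplification with hypothetical names (`SharedCopyright`, `agreed_uses`, `may_exercise`) and covers exercise under paragraph (2) only, not the transfer or pledge of shares under paragraph (1).

```python
"""Shared copyright under Art. 65(2), simplified: exercise needs every co-owner's agreement."""
from dataclasses import dataclass, field


@dataclass
class SharedCopyright:
    co_owners: frozenset[str]
    # Uses agreed by all co-owners in the contract itself: party -> allowed purposes.
    agreed_uses: dict[str, set[str]] = field(default_factory=dict)

    def may_exercise(self, party: str, purpose: str,
                     agreeing_now: frozenset[str] = frozenset()) -> bool:
        """Allowed if the contract records the agreement, or all co-owners agree now."""
        if party not in self.co_owners:
            return False
        if purpose in self.agreed_uses.get(party, set()):
            return True
        return self.co_owners <= agreeing_now


if __name__ == "__main__":
    work = SharedCopyright(frozenset({"user", "vendor"}),
                           {"user": {"internal_use"}, "vendor": {"improve_model"}})
    print(work.may_exercise("user", "internal_use"))              # True
    print(work.may_exercise("vendor", "license_to_third_party"))  # False
```

The second call fails because nothing in the contract covers licensing to a third party, which is exactly the situation the article warns about when the agreement is left out.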
The model contract in the AI Contract Guidelines also includes a provision on this "agreement" for shared copyright (page 114). It's a little long, but here is the text.
Article ●● (Copyright of the Deliverables, etc.)
When neither the user nor the vendor will give way on attribution of the copyright, the contract may instead state that attribution "shall be determined by consultation." However, this is a risky clause for both parties.
If nothing is specified about the attribution of the copyright, the copyright belongs to the developer, that is, the vendor.
So for the user, if the consultation under a "determined by consultation" clause never concludes, it never obtains the rights. And without a license, it cannot use any of the models that were created.
The vendor is exposed too. If, for example, it planned to profit by licensing the model to the user for a fee, an unresolved "consultation" would defeat that plan, and it could end up earning nothing from the work it did.
So you should avoid leaving attribution of rights pending with a "determined by consultation" clause.
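The default rules behind this pitfall fit in a few lines. In this hypothetical Python sketch, `copyright_holder` and its clause labels are made up for illustration; it only encodes the two points above: with no provision, or with a consultation that never concludes, the copyright stays with the vendor that developed the program.

```python
"""Who ends up holding copyright in a delivered program (simplified illustration)."""
from typing import Optional


def copyright_holder(clause: Optional[str], consultation_result: Optional[str] = None) -> str:
    """clause: "user", "vendor", "shared", "consultation", or None for no provision.

    consultation_result: the holder agreed in consultation, or None if it never concluded.
    """
    if clause in ("user", "vendor", "shared"):
        return clause
    if clause == "consultation" and consultation_result is not None:
        return consultation_result
    # No provision, or a consultation that never concluded: the developer (vendor) keeps it.
    return "vendor"


if __name__ == "__main__":
    print(copyright_holder(None))            # vendor
    print(copyright_holder("consultation"))  # vendor: the user never obtains the rights
```

In this sketch, a stalled consultation looks exactly like having no clause at all from the user's side, which is why settling attribution up front is the safer choice.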
A similar problem can arise when there is no provision on attribution at the start of the development project. For example, the parties may have concluded separate contracts for the assessment and the PoC, only to find, on reaching the development phase, that they cannot agree on attribution of the model rights. The only way to prevent this is to conclude a "basic contract" covering the whole development project at the start of the assessment, and to stipulate attribution of the model rights there. However, in many cases the development target is not clearly defined at the start of the project, and then it may be better not to force a discussion of attribution. It's best to decide what kind of contract to conclude case by case.
- **Copyright arises in learning programs and inference programs.**
- **You need to decide two things: attribution of rights and terms of use.**
- **When considering terms of use, the table on page 31 of the AI Contract Guidelines is helpful.**
- **If the copyright will be shared, agree on how it may be exercised.**
- **In many cases, it is better to avoid provisions such as "determined by consultation."**
What did you think?
This is my first article on Qiita. Having reviewed many AI development contracts, I've often thought, "If engineers and business teams understood legal matters better, and legal understood them better, contract negotiations would go more smoothly..." I've also seen engineers with legal questions who had no suitable person around to consult.
I hope this article will help you solve such a problem!
It's been a long article, and I wonder whether anyone has read all the way to the end... I may update the article in the future, so if you have any questions, please feel free to comment!
Thank you for reading to the end.
※※※※※※※
・ Please note that the opinions in this article are my personal views, not those of the organization I belong to.
・ This article is a general discussion, so please consult an expert when deciding a specific case.
・ This article prioritizes clarity and may be inaccurate in the strict sense.
※※※※※※※