I tried to extend Python's Active Record framework

When I joined, the codebase was built on Active Record

In the summer of 2015, I joined an AI startup. It was my first serious development work in five years.

The stack was Python with Django (and later Google App Engine for Python). Django is very similar to Rails, which I had used about 7 years earlier, so I could follow the ideas behind the framework.

Incompetence everywhere, anti-patterns everywhere

However, the code actually running in production was terrible. Reading the code of a server engineer who had joined about a month earlier was like being hit with Medapani, the confusion spell from Dragon Quest.

Below is a list of the symptoms that I noticed.

I was confused.

Why did this happen?

I thought about the cause.

Cause 1

No one can do design or analysis in the first place

--There are no documents at all, "because things will change later"
  --This attitude is unkind to whoever takes over
--There is no basic system design
  --I wanted at least the batch data flow and the major screen transitions
--Saying "you can understand it by reading the code" is the first stage of shit-engineer certification
--Even when a design document exists, its granularity is too coarse
--No one has ever seen UML
  --On the server side, table design and data flow matter most, but people react as if they had never heard of them ("Is that something you can eat?")
--Naming conventions for tables, classes, methods, and variables lack common sense and are painful
  --Please stop using Set or List in Model class names
  --Method names don't start with a verb

Cause 2

No one knows enterprise application architecture patterns

--There was a fucking engineer who thought knowing Rails was the same as knowing architecture
  --No, I know there are good Rails engineers...
--At a minimum, implementing an architecture is tough if you know nothing about microservices architecture or Domain Driven Design
--The AI code was just the intern's code, copied and pasted into a utils file
  --If it's an app or service the company releases, it has to be reviewed...

Cause 3

Management comes from sales and knows far too little about building and shipping IT products. One startup anti-pattern is business owners starting a company without a good understanding of the tech business.

--Product management is far too amateurish
  --There has been no CTO or product manager since the company was founded (a CTO has since joined and is struggling to improve things)
  --User interviews were assigned to people without the skills for them
  --The CEO comes from sales. He doesn't write code and doesn't do user interviews, yet he gets media coverage and takes the stage at events.
    --He did a user interview only once, and spent it explaining and persuading. No, the purpose is to listen to users and dig into the problems they are having...
    --Raising funds may be necessary, but what convinces people most is the product, not the presentation material. What is the plan if no one works on improving its quality...?
  --The COO also comes from sales. I built the product on the premise that we would keep improving it, but he is trying to scale it.
  --They are trying to scale while skipping the steps and validations between the problem hypothesis, solution hypothesis, product hypothesis, and scale hypothesis
  --Opinions based on the Entrepreneur's textbook and Inspired get confused with high-speed PDCA
--Project management is far too amateurish
  --I was told the development period was one month, but only three weeks were actually left.
  --I planned to finish another task first and develop in the remaining two weeks, but then a customer presentation came up. One week left.
  --Three days before, I learned that the last day of development was the day of the customer presentation at the company-wide meeting. Apparently a milestone can be squeezed into the development period... The schedule was pulled forward, leaving 2 days.
  --When I pointed this out at the company-wide meeting, I was told that "the server side just has to look at the DB". I hadn't received the AI code yet and had no idea what it would look like, including its data structures.
  --In the end, I made the deadline by staying up all night integrating the AI code and the GCE environment.
--Product quality is neglected
  --Delivery dates come first; just release something for now
  --Even after I give man-hour estimates, the release date gets pulled forward by internal bargaining power
  --Weekend work and all-nighters to recover
  --When you get sick, you're told you aren't working well and aren't delivering value
--Specifications are decided by internal bargaining power
  --Even when measures to improve the process are agreed on, they have never been followed

Countermeasures

For cause 1, I held a study session on the ICONIX process, based on a reference book.

Let's talk about the solution for cause 2.

A note on the Active Record pattern

It is a common misconception, but the Active Record pattern did not originate with Rails. It is described in Martin Fowler's Patterns of Enterprise Application Architecture.

My understanding is that views map 1:1 to DB tables, or View:Model:Table maps 1:1:1 if you include the model in between. The pattern suits simple configurations and simple services to which no complicated features will be added.
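To make the 1:1:1 mapping concrete, here is a minimal, framework-free sketch of an Active Record built on the standard-library sqlite3 module. The Item class, its columns, and its method names are hypothetical; Django's models.Model offers far more, but the shape is the same: one object per row, with the persistence logic living on the object itself.

```python
import sqlite3


class Item:
    """Minimal Active Record: one instance wraps one row of the item table
    and knows how to load and save itself (hypothetical example)."""

    def __init__(self, conn, name, description="", pk=None):
        self.conn = conn
        self.id = pk  # None until the row is inserted
        self.name = name
        self.description = description

    @classmethod
    def create_table(cls, conn):
        conn.execute(
            "CREATE TABLE IF NOT EXISTS item ("
            "id INTEGER PRIMARY KEY, name TEXT, description TEXT)"
        )

    @classmethod
    def find(cls, conn, item_id):
        # Finder on the class: returns an object for the row, or None
        row = conn.execute(
            "SELECT id, name, description FROM item WHERE id = ?", (item_id,)
        ).fetchone()
        if row is None:
            return None
        return cls(conn, name=row[1], description=row[2], pk=row[0])

    def save(self):
        # INSERT for a new object, UPDATE for an existing one
        if self.id is None:
            cur = self.conn.execute(
                "INSERT INTO item (name, description) VALUES (?, ?)",
                (self.name, self.description),
            )
            self.id = cur.lastrowid
        else:
            self.conn.execute(
                "UPDATE item SET name = ?, description = ? WHERE id = ?",
                (self.name, self.description, self.id),
            )


conn = sqlite3.connect(":memory:")
Item.create_table(conn)
item = Item(conn, name="pen")
item.save()
item.description = "blue"
item.save()
loaded = Item.find(conn, item.id)
print(loaded.name, loaded.description)  # pen blue
```

As soon as behavior spanning several tables is bolted onto classes like this, the 1:1:1 assumption breaks down, which is where the problems described next come from.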

However, as the system grows more complex and the number of tables and models increases, this becomes hard to maintain and easily turns into a mess.

Although this overlaps with the symptoms above, I think the problems fall into a few patterns. Below, I describe each problem pattern and what I did about it.

Problem pattern 1: Logic is embedded in the presentation layer

The view and batch interfaces have only three things to do.

--Validate the request and argument parameters
  --In some cases, I created input validators and delegated validation to them
  --But there are too many to fix...
--Parse the request and argument parameters, and hand the processing off to another layer
--For views, repack the returned object into a response
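A framework-free sketch of those three responsibilities. The function names (validate, parse, find_related_items, item_view) and the dict-based request and response are hypothetical stand-ins for Django's request and response objects; the point is only that the view validates, parses, delegates, and repacks, and contains no business logic of its own.

```python
from dataclasses import dataclass


@dataclass
class ItemRequest:
    """Parsed, validated parameters handed to another layer."""
    item_id: int
    limit: int


def validate(params):
    """Validate raw request/argument parameters; return a list of errors."""
    errors = []
    if not str(params.get("item_id", "")).isdecimal():
        errors.append("item_id must be a non-negative integer")
    limit = params.get("limit", "10")
    if not str(limit).isdecimal() or not 1 <= int(limit) <= 100:
        errors.append("limit must be between 1 and 100")
    return errors


def parse(params):
    """Convert the validated raw strings into a typed request object."""
    return ItemRequest(item_id=int(params["item_id"]),
                       limit=int(params.get("limit", "10")))


def find_related_items(req):
    """Hypothetical stand-in for a call into the service layer."""
    return [{"id": req.item_id + i} for i in range(1, req.limit + 1)]


def item_view(params):
    # 1. Validation
    errors = validate(params)
    if errors:
        return {"status": 400, "errors": errors}
    # 2. Parse and hand the processing off to another layer
    result = find_related_items(parse(params))
    # 3. Repack the returned objects into a response
    return {"status": 200, "items": result}
```

A batch interface would follow the same three steps, with command-line arguments in place of request parameters.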

Problem pattern 2: Processing that combines multiple models

Introducing a service layer

Here, "service" is used in the ordinary service-oriented architecture sense. The common kind.

[Figure: ServiceLayerSketch, a sketch of the service layer]

Views and batch interfaces were changed to call service layer methods instead of doing the work themselves.
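A minimal sketch of what that change means: the same hypothetical ItemService method is called from both a view function and a batch entry point, so the fetching logic exists in exactly one place. A dict stands in for the database, and all names here are illustrative, not taken from the original project.

```python
class ItemService:
    """Hypothetical service: the only place that knows how items are fetched."""

    def __init__(self, repository):
        self.repository = repository  # a dict standing in for the DB

    def get_item_names(self, ids):
        # Skip ids that do not exist instead of raising
        return [self.repository[i] for i in ids if i in self.repository]


service = ItemService({1: "pen", 2: "ink"})


def item_names_view(params):
    """View: parse the request, call the service, repack the result."""
    ids = [int(x) for x in params.get("ids", "").split(",") if x]
    return {"names": service.get_item_names(ids)}


def export_item_names_batch(argv):
    """Batch entry point: parse arguments, call the same service method."""
    ids = [int(x) for x in argv]
    return "\n".join(service.get_item_names(ids))
```

Because neither caller touches storage directly, changing how items are fetched only requires changing ItemService.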

Arrangement of terms

In Domain Driven Design, this corresponds to the application controller. Note that this is not what Domain Driven Design calls a service (see Evans' classification).

Patterns of Enterprise Application Architecture also has an Application Controller, but there it is one of the web presentation patterns.

Oh, it's confusing.

Naming convention

The service layer provides a coarse-grained API for views. Class names follow one of two patterns:

--XxxService: CRUD on a single model (table)
--YyyEngine: CRUD that combines multiple models (tables)
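A sketch of the two naming patterns, with in-memory dicts standing in for tables. ItemService, ItemFeatureService, ItemFeatureEngine, and their methods are hypothetical names chosen to follow the XxxService / YyyEngine convention. One design choice worth noting: the Engine only talks to the single-table services, never to storage directly, so each table's access logic still lives in one place.

```python
class ItemService:
    """XxxService: CRUD against a single model (table) only."""

    def __init__(self):
        self._rows = {}  # stand-in for the item table

    def create(self, item_id, name):
        self._rows[item_id] = {"id": item_id, "name": name}

    def get(self, item_id):
        return self._rows.get(item_id)


class ItemFeatureService:
    """XxxService for the item feature table."""

    def __init__(self):
        self._rows = {}  # stand-in for the item feature table

    def create(self, item_id, feature):
        self._rows[item_id] = {"item_id": item_id, "feature": feature}

    def get(self, item_id):
        return self._rows.get(item_id)


class ItemFeatureEngine:
    """YyyEngine: combines several models (tables) behind one coarse-grained API."""

    def __init__(self, items, features):
        self.items = items
        self.features = features

    def create_with_feature(self, item_id, name, feature):
        self.items.create(item_id, name)
        self.features.create(item_id, feature)

    def get_with_feature(self, item_id):
        item = self.items.get(item_id)
        feature = self.features.get(item_id)
        if item is None or feature is None:
            return None
        return {**item, "feature": feature["feature"]}
```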

Singleton

Instantiating a service on every call from a view is wasteful, so I adopted a recipe that turns a class into a singleton with a class decorator.
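The exact recipe is not reproduced here; the following is one widely circulated form of a singleton class decorator, shown as a sketch. ItemService and its created_count attribute are hypothetical, used only to show that state is shared between calls.

```python
def singleton(cls):
    """Class decorator: every call returns the same instance of cls."""
    instances = {}

    def get_instance(*args, **kwargs):
        # Create the instance on the first call only
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance


@singleton
class ItemService:
    def __init__(self):
        self.created_count = 0  # hypothetical state, to show it is shared
```

One trade-off of this recipe: after decoration, ItemService is a function rather than a class, so isinstance(obj, ItemService) no longer works, and constructor arguments passed on later calls are silently ignored.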

Customized QuerySet

When a JOIN is faster in batch processing, you can extend a model with a QuerySet that performs the join and then use it freely. The following describes the case where a ForeignKey is defined.

Assumed ER diagram

Sometimes I later created models that add extra information to Item.

[Figure: assumed ER diagram]

Treat the JOIN data as if it were a model.
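Stripped of Django, the idea is: run one SELECT with JOINs and pack each result row into an object that looks like a model. The sketch below uses the standard-library sqlite3 module; the table names (item, item_feature_1, item_extra_info), the del_flg column, and the JoinedItemFeature dataclass are hypothetical and only mirror the models that follow. Inner JOINs drop items without joined rows, much like the isnull=False filters used later.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class JoinedItemFeature:
    """Model-like object holding Item columns plus the joined columns."""
    id: int
    name: str
    feature_1: str
    info: str


def fetch_joined(conn):
    # One query with two inner JOINs; rows without a match are excluded
    rows = conn.execute(
        "SELECT item.id, item.name, f.feature_1, e.info "
        "FROM item "
        "JOIN item_feature_1 AS f ON f.item_id = item.id "
        "JOIN item_extra_info AS e ON e.item_id = item.id "
        "WHERE item.del_flg = 0 "
        "ORDER BY item.id"
    )
    return [JoinedItemFeature(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, del_flg INTEGER DEFAULT 0);
CREATE TABLE item_feature_1 (id INTEGER PRIMARY KEY, item_id INTEGER REFERENCES item(id), feature_1 TEXT);
CREATE TABLE item_extra_info (id INTEGER PRIMARY KEY, item_id INTEGER REFERENCES item(id), info TEXT);
INSERT INTO item (id, name) VALUES (1, 'pen'), (2, 'ink');
INSERT INTO item_feature_1 (item_id, feature_1) VALUES (1, 'blue'), (2, 'black');
INSERT INTO item_extra_info (item_id, info) VALUES (1, 'refillable');
""")
for joined in fetch_joined(conn):
    print(joined)
```

Item 2 has no extra info row, so only item 1 comes back.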

Basic model

It looks like this.

models/item.py


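from django.db import models


class Item(models.Model):
    name = models.CharField(max_length=128, blank=True)
    description = models.CharField(max_length=1024, blank=True)
    image = models.ImageField(upload_to='hoge', blank=True, null=True)
    del_flg = models.BooleanField(default=False)  # Logical-delete flag (assumed; the manager below filters on it)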

models/item_feature_1.py


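from django.db import models

from .item import Item  # Assumes models/ is a package


class ItemFeature1(models.Model):
    item = models.ForeignKey(Item, on_delete=models.CASCADE)
    feature_1 = models.TextField()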

models/item_extra_info.py


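from django.db import models

from .item import Item  # Assumes models/ is a package


class ItemExtraInfo(models.Model):
    item = models.ForeignKey(Item, on_delete=models.CASCADE)
    info = models.TextField()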

QuerySet, Manager, Model for join

The QuerySet for joining looks like this. It assumes the join is only used for SELECT; other operations are not considered.

joined_query_set.py


from django.db import models


class JoinedItemFeatureQuerySet(models.QuerySet):
    def __iter__(self):
        # Fetch the Item columns plus the joined columns in one SELECT ... JOIN
        queryset = self.values(
            'id', 'name', 'description', 'image',
            'itemfeature1__feature_1',
            'itemextrainfo__info')

        # In recent Django versions iterator() does not call __iter__, so this does not recurse
        results = list(queryset.iterator())

        instances = []
        for result in results:
            item_feature = self.model(
                id=result['id'],
                name=result['name'],
                description=result['description'],
                image=result['image'],
            )
            # Attach the joined columns as extra attributes on the model instance
            item_feature.feature_1 = result['itemfeature1__feature_1']
            item_feature.info = result['itemextrainfo__info']

            instances.append(item_feature)

        return iter(instances)  # Return an iterator to satisfy the iterator protocol

custom_manager.py


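from django.db import models

from .joined_query_set import JoinedItemFeatureQuerySet  # Module path assumed


class JoinedItemFeatureManager(models.Manager):
    def get_queryset(self):
        queryset = JoinedItemFeatureQuerySet(self.model, using=self._db)
        # Exclude logically deleted items and items without joined rows
        queryset = queryset.filter(del_flg=False, itemfeature1__isnull=False, itemextrainfo__isnull=False)
        return queryset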

joined_item_domain.py


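from .custom_manager import JoinedItemFeatureManager  # Module paths assumed
from .models.item import Item


class JoinedItemFeatureDomain(Item):
    objects = JoinedItemFeatureManager()

    class Meta:
        proxy = True  # Proxy model: no new table is created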

You should then be able to use the joined data freely with joined_item_features = JoinedItemFeatureDomain.objects.filter(...).

Problem pattern 3: Model processing is too much / complex

Abstract reusable methods to a reasonable degree (machine learning code, etc.)

They can be cut out as strategies or reused as calculation-only classes.

--For code that can be reused later or turned into the company's own library
--Trained machine learning models are large. I didn't want to load them many times, so I made the class holding them a singleton.
  --Memory used by intermediate variables during calculation may have stayed resident
  --In some cases, only the parameters needed for prediction were saved with pickle.dump
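A sketch of the last two points under simple assumptions: a calculation-only class that holds just the parameters needed for prediction, a save function that pickles only those parameters, and a loader cached with functools.lru_cache so the file is read once per process. LinearPredictor, save_params, and load_predictor are hypothetical, and lru_cache is used here as a lightweight stand-in for the singleton, not as the author's implementation.

```python
import os
import pickle
import tempfile
from functools import lru_cache


class LinearPredictor:
    """Calculation-only class: holds just the parameters needed to predict."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


def save_params(predictor, path):
    # Save only the prediction parameters, not the whole training state
    with open(path, "wb") as f:
        pickle.dump({"weights": predictor.weights, "bias": predictor.bias}, f)


@lru_cache(maxsize=None)
def load_predictor(path):
    # Cached per path: the (possibly large) file is loaded only once per process
    with open(path, "rb") as f:
        params = pickle.load(f)
    return LinearPredictor(params["weights"], params["bias"])


path = os.path.join(tempfile.mkdtemp(), "params.pkl")
save_params(LinearPredictor([2.0, 1.0], 0.5), path)
predictor = load_predictor(path)
print(predictor.predict([1.0, 3.0]))  # 5.5
```

Keeping training and prediction apart like this also means the intermediate variables used during training never live in the serving process.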

Models are only mapped to tables; processing is implemented in subclasses

Django has a feature called proxy models, and I used it for these subclasses.

Below is an excerpt of the code for my personal project.

intro/models/abstract_model.py


from django.contrib.auth.models import User
from django.db import models


class AbstractModel(models.Model):
    registered_at = models.DateTimeField(auto_now_add=True)  # Set once, when the row is created
    registered_by = models.ForeignKey(User, on_delete=models.CASCADE, related_name='%(class)s_registered_by')

    updated_at = models.DateTimeField(auto_now=True)  # Refreshed on every save()
    updated_by = models.ForeignKey(User, on_delete=models.CASCADE, related_name='%(class)s_updated_by')

    class Meta:
        abstract = True

intro/models/article.py



from django.db import models

from intro.models.abstract_model import AbstractModel


class Article(AbstractModel):
    title = models.CharField(max_length=128)
    description = models.CharField(max_length=2048)
    author = models.ForeignKey('Author', on_delete=models.CASCADE)  # The Author class is omitted here.
    categories = models.ForeignKey('Category', on_delete=models.CASCADE, null=True, blank=True)  # The Category class is omitted here.
    url = models.URLField()

intro/models/article_domain.py


from intro.consts import DEFAULT_MECAB_PATH
from intro.models.article import Article


class ArticleDomain(Article):
    class Meta:
        proxy = True

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Do initialize...

    def __call__(self, mecab_path=None):
        # Use the given MeCab path, falling back to the default
        self.mecab_path = mecab_path or DEFAULT_MECAB_PATH

    def parse_morpheme(self):
        # Do morpheme analysis
        raise NotImplementedError

    @classmethod
    def train(cls, filter_params):
        authors = filter_params.get('author')
        articles = cls.objects.all()
        if authors:
            articles = articles.filter(author__in=authors)
        # Do some extraction

        # Do training

    def predict(self, texts):
        # Do prediction
        raise NotImplementedError

Introducing test code

While refactoring, I wrote unit tests using from django.test import TestCase. As you would expect, the code became much easier to follow.
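django.test.TestCase ultimately subclasses unittest.TestCase, so the structure of a test is the same as in plain unittest. This sketch uses plain unittest so it runs without a Django project; ItemFeatureEngine is a hypothetical, dict-backed unit under test, not code from the original project.

```python
import unittest


class ItemFeatureEngine:
    """Hypothetical unit under test: combines item and feature data."""

    def __init__(self, items, features):
        self.items = items
        self.features = features

    def get_with_feature(self, item_id):
        if item_id not in self.items or item_id not in self.features:
            return None
        return {"id": item_id, "name": self.items[item_id],
                "feature": self.features[item_id]}


class ItemFeatureEngineTest(unittest.TestCase):
    def setUp(self):
        # Fresh fixture for every test method
        self.engine = ItemFeatureEngine({1: "pen", 2: "ink"}, {1: "blue"})

    def test_joins_item_and_feature(self):
        self.assertEqual(
            self.engine.get_with_feature(1),
            {"id": 1, "name": "pen", "feature": "blue"},
        )

    def test_missing_feature_returns_none(self):
        self.assertIsNone(self.engine.get_with_feature(2))
```

Run it with python -m unittest; in a Django project, swapping the base class for django.test.TestCase adds a test database and wraps each test in a transaction.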

Implementation of subclasses with different data extraction conditions (with customized QuerySet)

Each proxy class holds its own custom Manager, and the data extraction condition is changed in that Manager's get_queryset.

This worked well for the domain models used in batch processing, which had become complicated.
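Outside Django, the same idea reduces to "the base class owns the query, subclasses only override the condition". This sqlite3 sketch is an analogy, not Django's Manager API: ItemQuery, BatchTargetItemQuery, the where class attribute, and the processed column are all hypothetical. The condition is interpolated into the SQL only because it is a fixed class constant; user input must never be formatted into SQL this way.

```python
import sqlite3


class ItemQuery:
    """Base class: the WHERE clause is the only thing subclasses change."""

    where = "del_flg = 0"  # fixed constant, never user input

    def __init__(self, conn):
        self.conn = conn

    def all(self):
        sql = f"SELECT id, name FROM item WHERE {self.where} ORDER BY id"
        return self.conn.execute(sql).fetchall()


class BatchTargetItemQuery(ItemQuery):
    # Hypothetical batch condition: live items that are not yet processed
    where = "del_flg = 0 AND processed = 0"


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, del_flg INTEGER, processed INTEGER);
INSERT INTO item VALUES (1, 'pen', 0, 0), (2, 'ink', 0, 1), (3, 'cap', 1, 0);
""")
print(ItemQuery(conn).all())             # [(1, 'pen'), (2, 'ink')]
print(BatchTargetItemQuery(conn).all())  # [(1, 'pen')]
```

In Django, the equivalent is a Manager subclass whose get_queryset adds the filter, assigned as objects on the proxy class, exactly as JoinedItemFeatureManager does above.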

Adopting hexagonal architecture for sending email and integrating with external services

Writing email-sending code directly in the model, or hard-coding urllib2 calls to integrate with external services, is awkward.

[Figure: hexagonal architecture, with the Application surrounded by adapters]

I created a Gateway class for integrating with external services (the adapter in the figure above), and had the domain model classes and model classes (the Application in the figure above) inherit from it and use it.

Model classes normally use convenience methods inherited from their superclasses. Following that convention, I had them inherit the Gateway class and call its convenience methods.
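A minimal sketch of that arrangement, assuming a pluggable transport: the Gateway exposes convenience methods (send_mail, post_json), a model-like class inherits them, and the actual I/O goes through a transport callable that could wrap smtplib or urllib.request in production and is replaced by a fake here. Gateway's method names, the transport attribute, and the Article stand-in are all hypothetical, not the original project's API.

```python
import json


class Gateway:
    """Adapter for external services, inherited by model classes.

    transport is a callable(kind, payload) -> response dict; injecting it
    keeps network and SMTP I/O out of the models and makes them testable.
    """

    transport = None

    def send_mail(self, to, subject, body):
        return self.transport("mail", {"to": to, "subject": subject, "body": body})

    def post_json(self, url, data):
        return self.transport("http", {"url": url, "body": json.dumps(data)})


class Article(Gateway):
    """Plain-Python stand-in for a domain model using Gateway methods."""

    def __init__(self, title, author_email):
        self.title = title
        self.author_email = author_email

    def notify_published(self):
        return self.send_mail(self.author_email, "Published",
                              f"'{self.title}' is live.")


sent = []


def fake_transport(kind, payload):
    # Records calls instead of talking to a real mail server or API
    sent.append((kind, payload))
    return {"ok": True}


# staticmethod prevents the function from being bound as an instance method
Article.transport = staticmethod(fake_transport)
```

Inheritance mirrors how model classes already pick up convenience methods from models.Model; passing a Gateway instance in as a collaborator would be the more conventional hexagonal choice, at the cost of departing from that convention.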

Finally

As for cause 3, I gave up on the management and decided to change jobs (although the CTO, at least, is decent).

Given enough time, I might have been able to improve the company. However, considering how long it takes people to change, how fast technology advances, and the conditions other companies offered, I concluded that staying would waste my time and my life. I'm not young enough to tolerate such waste. I'm not young enough to deal with the memories of others who are lonely.

I came to recognize that management is extremely important, and not only at startups.
