I was told about the PHP version of Faker that creates dummy data on Twitter, so when I searched for the Pyhton version, I had a plan, so I installed it. ..

Installation

% pip install faker
% faker --version
faker 4.0.2

Trial data creation

`make_fake_data.py`


from faker.factory import Factory

Faker = Factory.create
fake = Faker()
fake.seed(0)
fake = Faker("ja_JP")

print(
    fake.csv(
        header=None,
        data_columns=("{{name}}", "{{zipcode}}", "{{address}}", "{{phone_number}}"),
        num_rows=10,
        include_row_ids=False,
    )
)

Try running it in VS Code debug mode.

 %  env PTVSD_LAUNCHER_PORT=53546 /usr/local/opt/python/bin/python3.7 /Users/nandymak/.vscode/extensions/ms-python.python-2020.2.64397/pythonFiles/lib/python/new_ptvsd/wheels/ptvsd/launcher /Users/nandymak/dev/fake-data/make_fake_data.py 
"Naoko Fujimoto","265-8376","23-19-7 Misuji, Nishi-ku, Yokohama-shi, Oita Gomigaya Corp. 948","92-4115-7815"
"Ryosuke Nagisa","989-9052","36-3-1 Tomihisa-cho, Shirako-cho, Chosei-gun, Saga Platinum Urban 097","53-5139-3328"
"Sotaro Ito","520-8016","35-7-20 Kudanminami, Mizuho-cho, Nishitama-gun, Kyoto Prefecture","090-4719-6593"
"Kenichi Kato","627-4260","3-27-3 Kaminarimon, Niijima Village, Akita Prefecture Senzoku Court 684","090-3396-9477"
"Yoko Watanabe","812-5855","13-8-1, Marunouchi JP Tower, Edogawa-ku, Nara Prefecture","090-1352-5601"
"Kumiko Yamagishi","836-9402","8-21-7 Nagahata, Kokubunji City, Miyazaki Prefecture Hitotsubashi Park 510","090-3217-3008"
"Shota Inoue","226-1179","3-20-4 Gomigaya, Sakae-cho, Inba-gun, Ishikawa Prefecture Shibaura Urban 792","090-3022-5841"
"Kana Sasada","482-6715","25-27-9, Rokubancho, Seya-ku, Yokohama-shi, Nagasaki Heights Konan 150","090-2375-9459"
"Mai Nakatsugawa","732-5083","13-23-11 Maeyaroku Corp. 960, Higashikurume City, Nagano Prefecture","080-9602-7142"
"Ryosuke Yamada","618-0001","27-7-18 Hirasuka, Chiyoda-ku, Mie Court Marunouchi JP Tower 206","65-0300-8913"

That kind of data was created. It's so much like that, when using it in a company, if you do not say in advance that it is dummy data generated by Faker, personal information may be leaked and it may cause a fuss.

Besides CSV

TSV or DSV? And so on. I have to sort out the functions that can be used (TODO).

`faker.py`


fake.tsv(header=None, data_columns=('{{name}}', '{{address}}'), num_rows=10, include_row_ids=False)

What I noticed

Zip code and address are not associated and generated (obviously)
"Match the Kanji name and Kana name of Faker's dummy data" Some people are thinking about it.
It seems that it can be done by devising the zip code and address.
The phone number is similar to that of a mobile phone, but the fixed-line (poi) number is a little strange because the leading zero is dropped.

Items that can be specified

I tried to summarize it in a table, but I gave up because it seems that there are more than 200. How to generate test data using Faker in Python

For the time being, here are some things that you might use often. You can probably find the full list by looking at the official website Docs »Locales» Language ja_JP.

Address system (faker.providers.address)

Method name	meaning	sample
address	Street address	161 Chizuka Palace, 38-9-5 Hirasuka, Hachijo-cho, Hachijojima, Kumamoto
ban	address	No. 6
building_name	Building name	Park
building_number	Building number?	263
chome	Chome	1-chome
city	Municipalities	Komae City
city_suffix	Municipalities(Fixed value?)	Ville
country	Country	New Caledonia
gou	No.	No. 15
postcode	Postal code	288-2290
prefecture	Name of prefectures	Tochigi Prefecture
street_address	address	215 Kimura Street
street_name	Street name	Sasaki Street
street_suffix	Street suffix(※1)	Street
town	Town name	Odaiba
zipcode	Postal code	149-3866

1: Words following the street name to further explain the street

Personal name system (faker.providers.person)

Method name	meaning	sample
name	Full name(Chinese characters)	Yui Aoyama
last_name	Last name(Chinese characters)	Takahashi
first_name	name(Chinese characters)	Yumiko
name_female	Female name(Chinese characters)	Tomomi Tanabe
name_male	Male name(Chinese characters)	Yoichi Fujimoto
last_name_male	Male surname(Chinese characters)？	Nishinoen
first_name_male	Male name(Chinese characters)	Atsushi
last_name_female	Female surname(Chinese characters)？	Yoshida
first_name_female	Female name(Chinese characters)	Tomomi
romanized_name	Full name(Romaji)	Akira Sasada
last_romanized_name	Last name(Romaji)	Ogaki
first_romanized_name	name(Romaji)	Naoki
first_romanized_name_male	Male name(Romaji)	Manabu
first_romanized_name_female	Female name(Romaji)	Rei
kana_name	Full name(Kana)	Takahashi Miki
last_kana_name	Last name(Kana)	Saito
first_kana_name	name(Kana)	Yoichi
first_kana_name_male	Male name(Kana)	Naoto
first_kana_name_female	Female name(Kana)	My

I wonder if there is a distinction between men and women in the surname. .. ..

Check item length

I checked the item length of the data generated to plunge into the RDB. I confirmed it at Colaboratory.

# !pip install faker                         #Run only for the first time
import numpy as np
import pandas as pd
from faker.factory import Factory
Faker = Factory.create
fake = Faker()
fake = Faker("ja_JP")
test_data = []
x = 1000000                              #Number of measurements

%timeit 
for i in range(0, x):
    test_data.append(len(fake.address()))  #Specify the item you want to measure(fake.xxxxx())

a=np.mean(test_data)
b=np.max(test_data)
print('mean={}、max={}'.format(a,b))

It is about 50 characters at the maximum in 1 million cases.

mean=26.205511、max=53

I tried it several times, but 55 was the maximum, so it seems good to think about 64 characters. ** Please note that it is not the number of bytes. ** **

Generate SQL for Insert (TODO)

As for my homework, I would like to create a "CREATE TABLE" statement and a WRAPPER that generates an "INSERT" statement so that I can create a table in the RDB after specifying the required items. ~~ * You need to find out the number of digits and attributes of the item name that Faker spits out for each method. ~~

It is necessary to investigate how much dialect of each RDB can be absorbed. I think First Target will be MySQL.
You have to think of some standard patterns that make it easy to select items.

For the time being, I should have forgiven around here for today.

Create test data like that with Python (Part 1)