Someone on Twitter told me about the PHP version of Faker, which generates dummy data. I searched for a Python version, found one, and installed it right away.
% pip install faker
% faker --version
faker 4.0.2
make_fake_data.py
from faker.factory import Factory

Faker = Factory.create
fake = Faker()
fake.seed(0)  # fix the seed so that repeated runs are reproducible
fake = Faker("ja_JP")  # use the Japanese locale
print(
    fake.csv(
        header=None,
        data_columns=("{{name}}", "{{zipcode}}", "{{address}}", "{{phone_number}}"),
        num_rows=10,
        include_row_ids=False,
    )
)
I ran it in VS Code's debug mode.
% env PTVSD_LAUNCHER_PORT=53546 /usr/local/opt/python/bin/python3.7 /Users/nandymak/.vscode/extensions/ms-python.python-2020.2.64397/pythonFiles/lib/python/new_ptvsd/wheels/ptvsd/launcher /Users/nandymak/dev/fake-data/make_fake_data.py
"Naoko Fujimoto","265-8376","23-19-7 Misuji, Nishi-ku, Yokohama-shi, Oita Gomigaya Corp. 948","92-4115-7815"
"Ryosuke Nagisa","989-9052","36-3-1 Tomihisa-cho, Shirako-cho, Chosei-gun, Saga Platinum Urban 097","53-5139-3328"
"Sotaro Ito","520-8016","35-7-20 Kudanminami, Mizuho-cho, Nishitama-gun, Kyoto Prefecture","090-4719-6593"
"Kenichi Kato","627-4260","3-27-3 Kaminarimon, Niijima Village, Akita Prefecture Senzoku Court 684","090-3396-9477"
"Yoko Watanabe","812-5855","13-8-1, Marunouchi JP Tower, Edogawa-ku, Nara Prefecture","090-1352-5601"
"Kumiko Yamagishi","836-9402","8-21-7 Nagahata, Kokubunji City, Miyazaki Prefecture Hitotsubashi Park 510","090-3217-3008"
"Shota Inoue","226-1179","3-20-4 Gomigaya, Sakae-cho, Inba-gun, Ishikawa Prefecture Shibaura Urban 792","090-3022-5841"
"Kana Sasada","482-6715","25-27-9, Rokubancho, Seya-ku, Yokohama-shi, Nagasaki Heights Konan 150","090-2375-9459"
"Mai Nakatsugawa","732-5083","13-23-11 Maeyaroku Corp. 960, Higashikurume City, Nagano Prefecture","080-9602-7142"
"Ryosuke Yamada","618-0001","27-7-18 Hirasuka, Chiyoda-ku, Mie Court Marunouchi JP Tower 206","65-0300-8913"
This is the kind of data it generates. It looks so realistic that, if you use it at work without saying up front that it is dummy data generated by Faker, people may think personal information has leaked and cause a fuss.
There are also TSV and DSV variants, among others. I still need to sort out which functions are available (TODO).
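If you want to work with the rows instead of just printing them, the CSV text can be parsed back with Python's standard csv module. The following is a minimal sketch: `text` is a stand-in for the string returned by `fake.csv()`, built from two lines of the output above, so it runs without Faker installed.

```python
import csv
import io

# Stand-in for the string returned by fake.csv():
# two lines copied from the output above.
text = (
    '"Naoko Fujimoto","265-8376","23-19-7 Misuji, Nishi-ku, Yokohama-shi, Oita Gomigaya Corp. 948","92-4115-7815"\r\n'
    '"Sotaro Ito","520-8016","35-7-20 Kudanminami, Mizuho-cho, Nishitama-gun, Kyoto Prefecture","090-4719-6593"\r\n'
)

# csv.reader keeps the commas inside quoted fields,
# so every row still has exactly four fields.
rows = list(csv.reader(io.StringIO(text, newline="")))
for name, zipcode, address, phone in rows:
    print(name, zipcode, phone)
```

Because every field is quoted, the commas inside the address do not split it into extra columns.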
faker.py
fake.tsv(header=None, data_columns=('{{name}}', '{{address}}'), num_rows=10, include_row_ids=False)
I tried to summarize everything in a table, but gave up because there seem to be more than 200 methods. Reference: How to generate test data using Faker in Python
For now, here are the ones you are likely to use often. The full list should be in the official documentation under Docs » Locales » Language ja_JP.
Method name | Meaning | Sample |
---|---|---|
address | Address | 161 Chizuka Palace, 38-9-5 Hirasuka, Hachijo-cho, Hachijojima, Kumamoto |
ban | Block number (ban) | No. 6 |
building_name | Building name | Park |
building_number | Building number? | 263 |
chome | District number (chome) | 1-chome |
city | Municipality | Komae City |
city_suffix | Municipality suffix (fixed value?) | Ville |
country | Country | New Caledonia |
gou | House number (gou) | No. 15 |
postcode | Postal code | 288-2290 |
prefecture | Prefecture name | Tochigi Prefecture |
street_address | Street address | 215 Kimura Street |
street_name | Street name | Sasaki Street |
street_suffix | Street suffix (※1) | Street |
town | Town name | Odaiba |
zipcode | Postal code | 149-3866 |
Method name | Meaning | Sample |
---|---|---|
name | Full name (kanji) | Yui Aoyama |
last_name | Last name (kanji) | Takahashi |
first_name | First name (kanji) | Yumiko |
name_female | Female full name (kanji) | Tomomi Tanabe |
name_male | Male full name (kanji) | Yoichi Fujimoto |
last_name_male | Male last name (kanji)? | Nishinoen |
first_name_male | Male first name (kanji) | Atsushi |
last_name_female | Female last name (kanji)? | Yoshida |
first_name_female | Female first name (kanji) | Tomomi |
romanized_name | Full name (romaji) | Akira Sasada |
last_romanized_name | Last name (romaji) | Ogaki |
first_romanized_name | First name (romaji) | Naoki |
first_romanized_name_male | Male first name (romaji) | Manabu |
first_romanized_name_female | Female first name (romaji) | Rei |
kana_name | Full name (kana) | Takahashi Miki |
last_kana_name | Last name (kana) | Saito |
first_kana_name | First name (kana) | Yoichi |
first_kana_name_male | Male first name (kana) | Naoto |
first_kana_name_female | Female first name (kana) | My |
Since I want to load the generated data into an RDB, I checked how long the generated values are. I ran this in Colaboratory.
# !pip install faker  # run only the first time
import numpy as np
import pandas as pd
from faker.factory import Factory

Faker = Factory.create
fake = Faker()
fake = Faker("ja_JP")
test_data = []
x = 1000000  # number of samples to measure
%timeit
for i in range(0, x):
    # Replace fake.address() with the item you want to measure (fake.xxxxx())
    test_data.append(len(fake.address()))
a = np.mean(test_data)
b = np.max(test_data)
print('mean={}、max={}'.format(a, b))
Across 1 million samples, the maximum is a little over 50 characters.
mean=26.205511、max=53
I ran it several times and the maximum was 55, so allowing for 64 characters seems reasonable. **Note that this is the number of characters, not the number of bytes.**
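To make the difference between characters and bytes concrete, here is a small sketch. The sample string is made up for illustration (it is not Faker output), and UTF-8 is an assumption about how the database stores text.

```python
# Character count vs. byte count for a Japanese string.
# The sample value is made up for illustration; it is not Faker output.
sample = "東京都千代田区1-2-3"

char_count = len(sample)                  # number of characters
byte_count = len(sample.encode("utf-8"))  # number of bytes in UTF-8 (assumed encoding)

print(char_count, byte_count)

# Kanji and kana take 3 bytes each in UTF-8, so a 64-character value
# can need up to this many bytes:
max_bytes = 64 * 3
print(max_bytes)
```

Whether a column length such as 64 is counted in characters or in bytes depends on the database and its settings, so it is worth checking before sizing the column.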
As homework for myself, I would like to write a wrapper that generates a "CREATE TABLE" statement and "INSERT" statements, so that I can create a table in the RDB just by specifying the items I need. ~~* I need to find out the length and data type of the values that Faker returns for each method.~~
That's enough for today.
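One possible shape for such a wrapper is sketched below. This is not a finished design, only an illustration under several assumptions: the function names, the table name `people`, the column lengths, and the use of SQLite are all hypothetical. Stand-in generator functions are used so that the sketch runs without Faker; with Faker you would pass methods such as `fake.name`, `fake.zipcode`, and `fake.address` instead.

```python
import sqlite3


def create_table_sql(table, columns):
    """Build a CREATE TABLE statement from {column name: max length}."""
    cols = ", ".join(f"{name} VARCHAR({length})" for name, length in columns.items())
    return f"CREATE TABLE {table} ({cols})"


def insert_sql(table, columns):
    """Build a parameterized INSERT statement for the same columns."""
    names = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({names}) VALUES ({marks})"


def make_rows(generators, num_rows):
    """Call every generator once per row; generators maps column name -> callable."""
    return [tuple(gen() for gen in generators.values()) for _ in range(num_rows)]


# Stand-in generators (hypothetical) so the sketch runs without Faker.
# With Faker: {"name": fake.name, "zipcode": fake.zipcode, "address": fake.address}
generators = {
    "name": lambda: "dummy name",
    "zipcode": lambda: "000-0000",
    "address": lambda: "dummy address",
}
# Column lengths are assumptions; 64 follows the measurement above.
columns = {"name": 64, "zipcode": 8, "address": 64}

conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql("people", columns))
conn.executemany(insert_sql("people", columns), make_rows(generators, 3))
row_count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(row_count)
```

The values go through `?` placeholders, while the table and column names are interpolated directly into the SQL, so the names should come only from your own code and never from user input.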