The day before yesterday I discovered SKK and was impressed (this is day 3 as a user, lol). I don't want to rely on the Google-based CGI server, so I decided to live off a local dictionary instead. The Hatena keyword list looked like a good source.
- There are plenty of sources in Ruby, but none in Python
- Wrote the Python code in under 10 minutes
- ~~Loaded it with CorvusSKK → character-code error → suffering~~
- ~~Built the dictionary at home and brought it over, but got an error~~
- ~~CorvusSKK? Windows? Some problem there?~~
- SKK FEP apparently makes it easy to build a dictionary
- Managed to register it with SKKFEP!
- Living with SKKFEP!
- Was told how to import it into CorvusSKK (the dictionary format is sketched right after this list)
- Thank you, @corvussolis.
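For reference, the output is just a plain SKK-JISYO-style text file: one entry per line in the form `reading /candidate/`, optionally preceded by the `;; okuri-ari entries.` and `;; okuri-nasi entries.` header lines that the postscript in the script below enables for CorvusSKK. A minimal sketch of what the generated file looks like (the two entries are made-up examples, not real Hatena keywords):

```
;; okuri-ari entries.
;; okuri-nasi entries.
ぎじゅつ /技術/
ほげ /ホゲ/
```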
By the way, I had never really dealt with this before, but here is the source:
make_skk_dic.py
# coding=utf-8
import codecs
import re

import numpy as np
import pandas as pd


def furi_del_norm(txt):
    # Drop readings that cannot be used as an SKK key.
    # NOTE: the character class in the original post was garbled in translation;
    # the assumption here is that it rejects readings that do not start with hiragana.
    r = re.match(r"[^ぁ-ん]", txt)
    if r:
        return np.nan
    return txt


def main():
    # The Hatena keyword list (keywordlist_furigana.csv) is an EUC-JP TSV
    # of "reading <TAB> word" pairs.
    with codecs.open("keywordlist_furigana.csv", 'r', "euc_jp", "ignore") as file:
        df = pd.read_table(file, delimiter="\t")
    df.columns = ["furi", "word"]
    df = df.dropna()
    # Strip stray carriage returns left over from the source file.
    df["word"] = df["word"].str.replace('\r', '')
    df["furi"] = df["furi"].str.replace('\r', '')
    df["furi"] = df["furi"].apply(furi_del_norm)
    df = df.dropna()
    # SKK dictionaries are sorted by reading.
    df = df.sort_values(by=["furi"], ascending=True)
    # to_csv was no good, so write the entries by hand.
    TMP_FILE_PATH = "SKK-JISHO.hatena"
    with codecs.open(TMP_FILE_PATH, 'w', "utf-8", "ignore") as file:
        # With CorvusSKK, enable the following (postscript: 2017/03/03)
        # file.write(";; okuri-ari entries.\n")
        # file.write(";; okuri-nasi entries.\n")
        for i, row in df.iterrows():
            # Each SKK entry is "reading /candidate/" on its own line.
            file.write("%s /%s/" % (str(row["furi"]), str(row["word"])))
            file.write("\n")


if __name__ == "__main__":
    main()
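The script writes the dictionary as UTF-8. Since the notes above mention a character-code error, it may also be handy to keep an EUC-JP copy, the encoding classic SKK dictionaries such as SKK-JISYO.L use. A minimal re-encoding sketch, assuming the file name the script writes and a hypothetical `SKK-JISHO.hatena.euc` output name:

```python
# coding=utf-8
import codecs

# Re-encode the generated UTF-8 dictionary as EUC-JP, silently dropping
# any characters that EUC-JP cannot represent.
with codecs.open("SKK-JISHO.hatena", "r", "utf-8") as src:
    text = src.read()

with codecs.open("SKK-JISHO.hatena.euc", "w", "euc_jp", "ignore") as dst:
    dst.write(text)
```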