[Language processing 100 knocks 2020] Chapter 6: Machine learning

Introduction

The 2020 edition of 100 Language Processing Knocks, a well-known collection of natural language processing exercises, has been released. This article summarizes my solutions to Chapter 6: Machine Learning, one of the ten chapters listed below.

- Chapter 1: Preparatory Movement
- Chapter 2: UNIX Commands
- Chapter 3: Regular Expressions
- Chapter 4: Morphological Analysis
- Chapter 5: Dependency Analysis
- Chapter 6: Machine Learning
- Chapter 7: Word Vector
- Chapter 8: Neural Net
- Chapter 9: RNN, CNN
- Chapter 10: Machine Translation

Advance preparation

The answers were written and run on Google Colaboratory. For details on how to set up and use Google Colaboratory, see this article. A notebook containing the execution results of the answers below is available on GitHub.

Chapter 6: Machine Learning

In this chapter, we will use the News Aggregator Data Set published by Fabio Gasparetti to work on the task (category classification) of classifying news article headlines into the categories of "business", "science and technology", "entertainment", and "health".

50. Obtaining and shaping data

Download the News Aggregator Data Set and create training data (train.txt), verification data (valid.txt), and evaluation data (test.txt) as follows.

  1. Unzip the downloaded zip file and read the explanation of readme.txt.
  2. Extract only cases (articles) whose information sources (publishers) are “Reuters”, “Huffington Post”, “Businessweek”, “Contactmusic.com”, and “Daily Mail”.
  3. Randomly sort the extracted cases.
  4. Use 80% of the extracted cases as training data and split the remaining 20% evenly into verification data and evaluation data (10% each), and save them with the file names train.txt, valid.txt, and test.txt, respectively. Write one case per line in the file, in tab-delimited format with the category name followed by the article headline (these files will be reused later in Problem 70).

After creating the training data and evaluation data, check the number of cases in each category.

First, download the specified data.

!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00359/NewsAggregatorDataset.zip
!unzip NewsAggregatorDataset.zip
#Check the number of lines
!wc -l ./newsCorpora.csv

output


422937 ./newsCorpora.csv
#Check the first 10 lines
!head -10 ./newsCorpora.csv

output


1	Fed official says weak data caused by weather, should not slow taper	http://www.latimes.com/business/money/la-fi-mo-federal-reserve-plosser-stimulus-economy-20140310,0,1312750.story\?track=rss	Los Angeles Times	b	ddUyU0VZz0BRneMioxUPQVP6sIxvM	www.latimes.com	1394470370698
2	Fed's Charles Plosser sees high bar for change in pace of tapering	http://www.livemint.com/Politics/H2EvwJSK2VE6OF7iK1g3PP/Feds-Charles-Plosser-sees-high-bar-for-change-in-pace-of-ta.html	Livemint	b	ddUyU0VZz0BRneMioxUPQVP6sIxvM	www.livemint.com	1394470371207
3	US open: Stocks fall after Fed official hints at accelerated tapering	http://www.ifamagazine.com/news/us-open-stocks-fall-after-fed-official-hints-at-accelerated-tapering-294436	IFA Magazine	b	ddUyU0VZz0BRneMioxUPQVP6sIxvM	www.ifamagazine.com	1394470371550
4	Fed risks falling 'behind the curve', Charles Plosser says	http://www.ifamagazine.com/news/fed-risks-falling-behind-the-curve-charles-plosser-says-294430	IFA Magazine	b	ddUyU0VZz0BRneMioxUPQVP6sIxvM	www.ifamagazine.com	1394470371793
5	Fed's Plosser: Nasty Weather Has Curbed Job Growth	http://www.moneynews.com/Economy/federal-reserve-charles-plosser-weather-job-growth/2014/03/10/id/557011	Moneynews	b	ddUyU0VZz0BRneMioxUPQVP6sIxvM	www.moneynews.com	1394470372027
6	Plosser: Fed May Have to Accelerate Tapering Pace	http://www.nasdaq.com/article/plosser-fed-may-have-to-accelerate-tapering-pace-20140310-00371	NASDAQ	b	ddUyU0VZz0BRneMioxUPQVP6sIxvM	www.nasdaq.com	1394470372212
7	Fed's Plosser: Taper pace may be too slow	http://www.marketwatch.com/story/feds-plosser-taper-pace-may-be-too-slow-2014-03-10\?reflink=MW_news_stmp	MarketWatch	b	ddUyU0VZz0BRneMioxUPQVP6sIxvM	www.marketwatch.com	1394470372405
8	Fed's Plosser expects US unemployment to fall to 6.2% by the end of 2014	http://www.fxstreet.com/news/forex-news/article.aspx\?storyid=23285020-b1b5-47ed-a8c4-96124bb91a39	FXstreet.com	b	ddUyU0VZz0BRneMioxUPQVP6sIxvM	www.fxstreet.com	1394470372615
9	US jobs growth last month hit by weather:Fed President Charles Plosser	http://economictimes.indiatimes.com/news/international/business/us-jobs-growth-last-month-hit-by-weatherfed-president-charles-plosser/articleshow/31788000.cms	Economic Times	b	ddUyU0VZz0BRneMioxUPQVP6sIxvM	economictimes.indiatimes.com	1394470372792
10	ECB unlikely to end sterilisation of SMP purchases - traders	http://www.iii.co.uk/news-opinion/reuters/news/152615	Interactive Investor	b	dPhGU51DcrolUIMxbRm0InaHGA2XM	www.iii.co.uk	1394470501265
#Replaced double quotes with single quotes to avoid errors when reading
!sed -e 's/"/'\''/g' ./newsCorpora.csv > ./newsCorpora_re.csv

Next, read the file into a pandas data frame and create the data sets following the problem statement. scikit-learn's `train_test_split` is used to split the data. With the `stratify` option, the split preserves the class proportions of the specified column in each resulting subset. Here we specify `CATEGORY`, the target variable for classification, so that the category distribution is not biased in any of the splits.

import pandas as pd
from sklearn.model_selection import train_test_split

#Data reading
df = pd.read_csv('./newsCorpora_re.csv', header=None, sep='\t', names=['ID', 'TITLE', 'URL', 'PUBLISHER', 'CATEGORY', 'STORY', 'HOSTNAME', 'TIMESTAMP'])

#Data extraction
df = df.loc[df['PUBLISHER'].isin(['Reuters', 'Huffington Post', 'Businessweek', 'Contactmusic.com', 'Daily Mail']), ['TITLE', 'CATEGORY']]

#Data split
train, valid_test = train_test_split(df, test_size=0.2, shuffle=True, random_state=123, stratify=df['CATEGORY'])
valid, test = train_test_split(valid_test, test_size=0.5, shuffle=True, random_state=123, stratify=valid_test['CATEGORY'])

#Data storage
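#The problem asks for "category<TAB>headline", one case per line, so write no header row
train.to_csv('./train.txt', sep='\t', index=False, header=False, columns=['CATEGORY', 'TITLE'])
valid.to_csv('./valid.txt', sep='\t', index=False, header=False, columns=['CATEGORY', 'TITLE'])
test.to_csv('./test.txt', sep='\t', index=False, header=False, columns=['CATEGORY', 'TITLE'])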

#Confirmation of the number of cases
print('[Learning data]')
print(train['CATEGORY'].value_counts())
print('[Verification data]')
print(valid['CATEGORY'].value_counts())
print('[Evaluation data]')
print(test['CATEGORY'].value_counts())

output


[Learning data]
b    4501
e    4235
t    1220
m     728
Name: CATEGORY, dtype: int64
[Verification data]
b    563
e    529
t    153
m     91
Name: CATEGORY, dtype: int64
[Evaluation data]
b    563
e    530
t    152
m     91
Name: CATEGORY, dtype: int64

51. Feature extraction

Extract the features from the training data, verification data, and evaluation data, and save them with the file names train.feature.txt, valid.feature.txt, and test.feature.txt, respectively. Feel free to design the features that are likely to be useful for categorization. The minimum baseline would be an article headline converted to a word string.

Here we compute TF-IDF over the words obtained by splitting each headline on spaces and use those values as features. TF-IDF is computed not only for single words (uni-grams) but also for pairs of consecutive words (bi-grams). Before computing it, three text preprocessing steps are applied: (1) replace punctuation with spaces, (2) convert letters to lowercase, and (3) replace each run of digits with 0.

import string
import re

def preprocessing(text):
  table = str.maketrans(string.punctuation, ' '*len(string.punctuation))
  text = text.translate(table)  #Replace symbols with spaces
  text = text.lower()  #Lowercase
  text = re.sub('[0-9]+', '0', text)  #Replace digit string with 0

  return text
#Data recombination
df = pd.concat([train, valid, test], axis=0)
df.reset_index(drop=True, inplace=True)  #Reassign the index

#Apply preprocessing to every headline
df['TITLE'] = df['TITLE'].map(preprocessing)

print(df.head())

output


                                               TITLE CATEGORY
0  refile update 0 european car sales up for sixt...        b
1  amazon plans to fight ftc over mobile app purc...        t
2  kids still get codeine in emergency rooms desp...        m
3  what on earth happened between solange and jay...        e
4  nato missile defense is flight tested over hawaii        b
from sklearn.feature_extraction.text import TfidfVectorizer

#Data split
train_valid = df[:len(train) + len(valid)]
test = df[len(train) + len(valid):]

# TfidfVectorizer
vec_tfidf = TfidfVectorizer(min_df=10, ngram_range=(1, 2))  # ngram_range=(1, 2): compute TF-IDF for uni-grams and bi-grams; min_df=10 drops terms that appear in fewer than 10 headlines

#Vectorization
X_train_valid = vec_tfidf.fit_transform(train_valid['TITLE'])  #Do not use test information
X_test = vec_tfidf.transform(test['TITLE'])

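#Convert vector to data frame
#Note: get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() is available from 1.0
X_train_valid = pd.DataFrame(X_train_valid.toarray(), columns=vec_tfidf.get_feature_names_out())
X_test = pd.DataFrame(X_test.toarray(), columns=vec_tfidf.get_feature_names_out())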

#Data split
X_train = X_train_valid[:len(train)]
X_valid = X_train_valid[len(train):]

#Data storage
X_train.to_csv('./X_train.txt', sep='\t', index=False)
X_valid.to_csv('./X_valid.txt', sep='\t', index=False)
X_test.to_csv('./X_test.txt', sep='\t', index=False)

print(X_train.head())

output


    0m  0million  0nd   0s  0st  ...  yuan  zac  zac efron  zendaya  zone
0  0.0       0.0  0.0  0.0  0.0  ...   0.0  0.0        0.0      0.0   0.0
1  0.0       0.0  0.0  0.0  0.0  ...   0.0  0.0        0.0      0.0   0.0
2  0.0       0.0  0.0  0.0  0.0  ...   0.0  0.0        0.0      0.0   0.0
3  0.0       0.0  0.0  0.0  0.0  ...   0.0  0.0        0.0      0.0   0.0
4  0.0       0.0  0.0  0.0  0.0  ...   0.0  0.0        0.0      0.0   0.0

[5 rows x 2815 columns]
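
To make the numbers in this table less opaque, here is a hand-rolled sketch of TF-IDF over uni-grams and bi-grams. It assumes the smoothed IDF `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization, which to my understanding are the `TfidfVectorizer` defaults. It simplifies by splitting on whitespace only and applying no `min_df` cut, so it is not a drop-in replacement. The function names and the three toy headlines are made up for illustration.

```python
import math
from collections import Counter

def ngrams(tokens, n_min=1, n_max=2):
    """Return all n-grams (joined by spaces) for n_min <= n <= n_max."""
    return [' '.join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def tfidf(docs):
    """TF-IDF with smoothed IDF and L2 normalization (assumed defaults)."""
    term_lists = [ngrams(doc.split()) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for terms in term_lists for term in set(terms))
    vectors = []
    for terms in term_lists:
        tf = Counter(terms)
        weights = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1)
                   for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors

# Toy headlines (already preprocessed)
docs = ['fed raises rates', 'fed holds rates', 'apple unveils phone']
vectors = tfidf(docs)
for term, weight in sorted(vectors[0].items()):
    print(f'{term}: {weight:.3f}')
```

In the first headline, "raises" appears in only one document while "fed" appears in two, so "raises" receives the larger weight.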

[Technical explanation] Measure the importance of words? What is the calculation method of TF-IDF and Okapi BM25

52. Learning

Learn the logistic regression model using the training data constructed in Problem 51.

We continue to use scikit-learn to train the logistic regression model.

from sklearn.linear_model import LogisticRegression

#Model learning
lg = LogisticRegression(random_state=123, max_iter=10000)
lg.fit(X_train, train['CATEGORY'])

output


LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=123, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

53. Prediction

Implement a program that calculates the category and its prediction probability from the given article headline using the logistic regression model learned in Problem 52.

We define a function that takes as input a dataset that has already gone through the Problem 51 pipeline, from text preprocessing to TF-IDF vectorization, and returns the maximum class probability and the predicted category for each row.

import numpy as np

def score_lg(lg, X):
  return [np.max(lg.predict_proba(X), axis=1), lg.predict(X)]
train_pred = score_lg(lg, X_train)
test_pred = score_lg(lg, X_test)

print(train_pred)

output


[array([0.8402725 , 0.67906432, 0.55642575, ..., 0.86051523, 0.61362406,
       0.90827641]), array(['b', 't', 'm', ..., 'b', 'm', 'e'], dtype=object)]
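
The function above expects features that are already vectorized. To predict directly from a raw headline, the preprocessing and the fitted vectorizer have to be applied first. Below is a minimal self-contained sketch of that idea. The helper `predict_headline`, the four toy headlines, and the names `toy_vec` and `toy_lg` are assumptions made for illustration and are not part of the solution above.

```python
import re
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def preprocessing(text):
    """Same three steps as in Problem 51."""
    table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    text = text.translate(table).lower()
    return re.sub('[0-9]+', '0', text)

def predict_headline(headline, vectorizer, model):
    """Return (predicted category, probability) for a single raw headline."""
    X = vectorizer.transform([preprocessing(headline)])
    proba = model.predict_proba(X)[0]
    best = proba.argmax()
    return model.classes_[best], proba[best]

# Toy training set (made up for illustration, not the real corpus)
titles = ['Fed raises rates', 'Stocks fall on Fed news',
          'Kardashian wedding photos', 'New movie star film']
categories = ['b', 'b', 'e', 'e']

toy_vec = TfidfVectorizer()
toy_lg = LogisticRegression(random_state=123, max_iter=10000)
toy_lg.fit(toy_vec.fit_transform([preprocessing(t) for t in titles]), categories)

category, probability = predict_headline('Fed stocks', toy_vec, toy_lg)
print(category, round(probability, 3))
```

With the real model, the same pattern would pass `vec_tfidf` and `lg` instead of the toy objects.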

54. Measurement of correct answer rate

Measure the correct answer rate of the logistic regression model learned in Problem 52 on the training data and evaluation data.

We use scikit-learn's `accuracy_score` to calculate the correct answer rate (accuracy).

from sklearn.metrics import accuracy_score

train_accuracy = accuracy_score(train['CATEGORY'], train_pred[1])
test_accuracy = accuracy_score(test['CATEGORY'], test_pred[1])
print(f'Correct answer rate (learning data): {train_accuracy:.3f}')
print(f'Correct answer rate (evaluation data): {test_accuracy:.3f}')

output


Correct answer rate (learning data): 0.927
Correct answer rate (evaluation data): 0.885

55. Creating a confusion matrix

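Create a confusion matrix of the logistic regression model learned in Problem 52 on the training data and evaluation data.

The confusion matrix is also calculated using scikit-learn and then visualized using seaborn. Rows correspond to the true categories and columns to the predicted categories; since no label order is given, both follow the sorted label order b, e, m, t.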

from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

#Training data
train_cm = confusion_matrix(train['CATEGORY'], train_pred[1])
print(train_cm)
sns.heatmap(train_cm, annot=True, cmap='Blues')
plt.show()

output


[[4344   93    8   56]
 [  52 4173    2    8]
 [  96  125  494   13]
 [ 192  133    7  888]]

55_train.png

#Evaluation data
test_cm = confusion_matrix(test['CATEGORY'], test_pred[1])
print(test_cm)
sns.heatmap(test_cm, annot=True, cmap='Blues')
plt.show()

output


[[528  20   2  13]
 [ 12 516   1   1]
 [ 11  26  52   2]
 [ 38  26   1  87]]

55_test.png

56. Measurement of precision, recall, F1 score

Measure the precision, recall, and F1 score of the logistic regression model learned in Problem 52 on the evaluation data. Obtain the precision, recall, and F1 score for each category, and integrate the per-category performance with the micro-average and the macro-average.

from sklearn.metrics import precision_score, recall_score, f1_score

def calculate_scores(y_true, y_pred):
  labels = ['b', 'e', 't', 'm']

  #Precision
  precision = precision_score(y_true, y_pred, average=None, labels=labels)  #With average=None, the per-class precision is returned as an ndarray
  precision = np.append(precision, precision_score(y_true, y_pred, average='micro'))  #Append the micro average
  precision = np.append(precision, precision_score(y_true, y_pred, average='macro'))  #Append the macro average

  #Recall
  recall = recall_score(y_true, y_pred, average=None, labels=labels)
  recall = np.append(recall, recall_score(y_true, y_pred, average='micro'))
  recall = np.append(recall, recall_score(y_true, y_pred, average='macro'))

  #F1 score
  f1 = f1_score(y_true, y_pred, average=None, labels=labels)
  f1 = np.append(f1, f1_score(y_true, y_pred, average='micro'))
  f1 = np.append(f1, f1_score(y_true, y_pred, average='macro'))

  #Combine results into a data frame
  scores = pd.DataFrame({'Precision': precision, 'Recall': recall, 'F1 score': f1},
                        index=['b', 'e', 't', 'm', 'Micro average', 'Macro average'])

  return scores
print(calculate_scores(test['CATEGORY'], test_pred[1]))

output


               Precision  Recall  F1 score
b                  0.896   0.938     0.917
e                  0.878   0.974     0.923
t                  0.845   0.572     0.682
m                  0.929   0.571     0.707
Micro average      0.885   0.885     0.885
Macro average      0.887   0.764     0.807
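
As a cross-check, the same scores can be recomputed by hand from the evaluation-data confusion matrix printed in Problem 55. The sketch below uses only NumPy. It assumes that rows are true labels and columns are predicted labels in the order b, e, m, t, which is the sorted label order.

```python
import numpy as np

# Evaluation-data confusion matrix from Problem 55
# (rows: true, columns: predicted; assumed order b, e, m, t)
cm = np.array([[528,  20,   2,  13],
               [ 12, 516,   1,   1],
               [ 11,  26,  52,   2],
               [ 38,  26,   1,  87]])

tp = np.diag(cm)                 # correctly classified cases per class
precision = tp / cm.sum(axis=0)  # column sums = number of predictions per class
recall = tp / cm.sum(axis=1)     # row sums = number of true cases per class
f1 = 2 * precision * recall / (precision + recall)

# Macro average: unweighted mean of the per-class scores
macro_precision, macro_recall, macro_f1 = precision.mean(), recall.mean(), f1.mean()

# Micro average: pool all cases first. For single-label classification,
# micro precision, micro recall and micro F1 all equal the accuracy.
micro = tp.sum() / cm.sum()

print(np.round(precision, 3), np.round(recall, 3), np.round(f1, 3))
print(round(micro, 3), round(macro_precision, 3), round(macro_recall, 3), round(macro_f1, 3))
```

The macro average weights every category equally, so the weak recall on 'm' and 't' pulls it well below the micro average, which is dominated by the large 'b' and 'e' categories.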

57. Confirmation of feature weights

Check the top 10 features with high weights and the top 10 features with low weights in the logistic regression model learned in Problem 52.

The learned weight of each feature is stored per class in `coef_`.

features = X_train.columns.values
index = [i for i in range(1, 11)]
for c, coef in zip(lg.classes_, lg.coef_):
  print(f'【category】{c}')
  best10 = pd.DataFrame(features[np.argsort(coef)[::-1][:10]], columns=['Higher importance'], index=index).T
  worst10 = pd.DataFrame(features[np.argsort(coef)[:10]], columns=['Lower importance'], index=index).T
  display(pd.concat([best10, worst10], axis=0))
  print('\n')

output


[Category] b
                   1      2      3      4    5       6     7          8       9       10
Higher importance  bank   fed    china  ecb  stocks  euro  obamacare  oil     yellen  dollar
Lower importance   video  ebola  the    her  and     she   apple      google  star    microsoft


[Category] e
                   1           2       3       4      5      6     7     8         9        10
Higher importance  kardashian  chris   her     movie  star   film  paul  he        wedding  she
Lower importance   us          update  google  study  china  gm    ceo   facebook  apple    says


[Category] m
                   1         2      3       4      5     6     7       8      9        10
Higher importance  ebola     study  cancer  drug   mers  fda   cases   cdc    could    cigarettes
Lower importance   facebook  gm     ceo     apple  bank  deal  google  sales  climate  twitter


[Category] t
                   1       2         3      4          5        6         7       8        9        10
Higher importance  google  facebook  apple  microsoft  climate  gm        nasa    tesla    comcast  heartbleed
Lower importance   stocks  fed       her    percent    drug     american  cancer  ukraine  still    shares
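
The ranking relies on `np.argsort`, which returns the indices that would sort the weights in ascending order. Here is a small sketch of the same indexing trick on hypothetical feature names and weights (not values taken from the trained model):

```python
import numpy as np

# Hypothetical feature names and weights for one class
features = np.array(['bank', 'movie', 'fed', 'star', 'oil'])
coef = np.array([1.8, -2.1, 1.2, -0.9, 0.4])

order = np.argsort(coef)            # indices from smallest to largest weight
top2 = features[order[::-1][:2]]    # reverse, then take the two largest weights
bottom2 = features[order[:2]]       # the two smallest (most negative) weights
print(top2, bottom2)
```

A large positive weight pushes a headline toward the category, and a large negative weight pushes it away, which is why names such as "google" and "apple" appear at the bottom for category b.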

58. Change regularization parameters

When training a logistic regression model, the degree of overfitting during learning can be controlled by adjusting the regularization parameters. Learn the logistic regression model with different regularization parameters and find the accuracy rate on the training data, validation data, and evaluation data. Summarize the results of the experiment in a graph with the regularization parameters on the horizontal axis and the accuracy rate on the vertical axis.

In scikit-learn, `C` is the inverse of the regularization strength. When regularization is too strong (small C), the model underfits and accuracy is low on all three data sets; when it is too weak (large C), the model overfits and the gap between training accuracy and evaluation accuracy widens. This confirms that choosing an appropriate C matters.

from tqdm import tqdm

result = []
for C in tqdm(np.logspace(-5, 4, 10, base=10)):
  #Model learning
  lg = LogisticRegression(random_state=123, max_iter=10000, C=C)
  lg.fit(X_train, train['CATEGORY'])

  #Get predicted value
  train_pred = score_lg(lg, X_train)
  valid_pred = score_lg(lg, X_valid)
  test_pred = score_lg(lg, X_test)

  #Calculation of correct answer rate
  train_accuracy = accuracy_score(train['CATEGORY'], train_pred[1])
  valid_accuracy = accuracy_score(valid['CATEGORY'], valid_pred[1])
  test_accuracy = accuracy_score(test['CATEGORY'], test_pred[1])

  #Storage of results
  result.append([C, train_accuracy, valid_accuracy, test_accuracy])

output


100%|██████████| 10/10 [07:26<00:00, 44.69s/it]  #Show progress using tqdm
#Visualization
result = np.array(result).T
plt.plot(result[0], result[1], label='train')
plt.plot(result[0], result[2], label='valid')
plt.plot(result[0], result[3], label='test')
plt.ylim(0, 1.1)
plt.ylabel('Accuracy')
plt.xscale('log')
plt.xlabel('C')
plt.legend()
plt.show()

58.png
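
One way to see why C has this effect is to look at the size of the learned weight vector. The sketch below uses synthetic data from `make_classification` (an assumption for illustration, unrelated to the news corpus) and prints the L2 norm of `coef_` for three values of C.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (illustration only)
X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=123)

norms = []
for C in [0.01, 1.0, 100.0]:
    model = LogisticRegression(C=C, max_iter=10000, random_state=123)
    model.fit(X, y)
    norms.append(np.linalg.norm(model.coef_))  # size of the learned weight vector
    print(f'C={C}: ||w|| = {norms[-1]:.3f}')
```

A small C keeps the weights close to zero, so the model cannot fit the data well. A large C lets the weights grow, which allows the model to fit the training data, including its noise, more closely.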

59. Searching for hyperparameters

Learn the categorization model while changing the learning algorithm and learning parameters. Find the learning algorithm parameter that has the highest accuracy rate on the verification data. Also, find the correct answer rate on the evaluation data when the learning algorithm and parameters are used.

Here we search over `C`, which sets the strength of regularization, and `l1_ratio`, which sets the balance between L1 regularization and L2 regularization. Optuna is used for the optimization.
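
To illustrate what `l1_ratio` balances, the sketch below evaluates an elastic-net penalty of the form `l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2`, which is my reading of the scikit-learn user guide. The function `elastic_net_penalty` is a hypothetical helper written for this example, not a library call.

```python
import numpy as np

def elastic_net_penalty(w, l1_ratio):
    """Elastic-net penalty on a weight vector (intercept excluded)."""
    l1 = np.abs(w).sum()        # L1 norm: sum of absolute values
    l2 = 0.5 * np.dot(w, w)     # half of the squared L2 norm
    return l1_ratio * l1 + (1 - l1_ratio) * l2

w = np.array([1.0, -2.0, 0.0])
pure_l2 = elastic_net_penalty(w, 0.0)   # 0.5 * (1 + 4)
pure_l1 = elastic_net_penalty(w, 1.0)   # 1 + 2
mixed = elastic_net_penalty(w, 0.5)     # halfway between the two
print(pure_l2, pure_l1, mixed)
```

With `l1_ratio=0` the penalty is pure L2 and with `l1_ratio=1` it is pure L1, which tends to drive some weights exactly to zero. In scikit-learn this combination needs `penalty='elasticnet'` together with `solver='saga'`, as in the code that follows.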

!pip install optuna
import optuna

#Specify the optimization target with a function
def objective_lg(trial):
  #Set of parameters to be tuned
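  #suggest_uniform / suggest_loguniform are deprecated since Optuna 3.0; suggest_float covers both
  l1_ratio = trial.suggest_float('l1_ratio', 0, 1)
  C = trial.suggest_float('C', 1e-4, 1e4, log=True)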

  #Model learning
  lg = LogisticRegression(random_state=123, 
                          max_iter=10000, 
                          penalty='elasticnet', 
                          solver='saga', 
                          l1_ratio=l1_ratio, 
                          C=C)
  lg.fit(X_train, train['CATEGORY'])

  #Get predicted value
  valid_pred = score_lg(lg, X_valid)

  #Calculation of correct answer rate
  valid_accuracy = accuracy_score(valid['CATEGORY'], valid_pred[1])    

  return valid_accuracy 
#Optimization
study = optuna.create_study(direction='maximize')
study.optimize(objective_lg, timeout=3600)

#View results
print('Best trial:')
trial = study.best_trial
print('  Value: {:.3f}'.format(trial.value))
print('  Params: ')
for key, value in trial.params.items():
  print('    {}: {}'.format(key, value))

output


Best trial:
  Value: 0.892
  Params: 
    l1_ratio: 0.23568685768996045
    C: 4.92280374981671

Automatic tuning of hyperparameters with Optuna -Pytorch Lightning edition-

Retrain the model with the best parameters found by the search and check the correct answer rate.

#Parameter setting
l1_ratio = trial.params['l1_ratio']
C = trial.params['C']

#Model learning
lg = LogisticRegression(random_state=123, 
                        max_iter=10000, 
                        penalty='elasticnet', 
                        solver='saga', 
                        l1_ratio=l1_ratio, 
                        C=C)
lg.fit(X_train, train['CATEGORY'])

#Get predicted value
train_pred = score_lg(lg, X_train)
valid_pred = score_lg(lg, X_valid)
test_pred = score_lg(lg, X_test)

#Calculation of correct answer rate
train_accuracy = accuracy_score(train['CATEGORY'], train_pred[1]) 
valid_accuracy = accuracy_score(valid['CATEGORY'], valid_pred[1]) 
test_accuracy = accuracy_score(test['CATEGORY'], test_pred[1]) 

print(f'Correct answer rate (learning data): {train_accuracy:.3f}')
print(f'Correct answer rate (verification data): {valid_accuracy:.3f}')
print(f'Correct answer rate (evaluation data): {test_accuracy:.3f}')

output


Correct answer rate (learning data): 0.966
Correct answer rate (verification data): 0.892
Correct answer rate (evaluation data): 0.895

With the default parameters, the correct answer rate on the evaluation data was 0.885, so tuning the parameters improved it to 0.895.

We also try XGBoost. No parameter search is performed for it; the model is trained with fixed parameters.

!pip install xgboost
import xgboost as xgb

params={'objective': 'multi:softmax', 
        'num_class': 4,
        'eval_metric': 'mlogloss',
        'colsample_bytree': 1.0, 
        'colsample_bylevel': 0.5,
        'min_child_weight': 1,
        'subsample': 0.9, 
        'eta': 0.1, 
        'max_depth': 5,
        'gamma': 0.0,
        'alpha': 0.0,
        'lambda': 1.0,
        'num_round': 1000,
        'early_stopping_rounds': 50,
        'verbosity': 0
        }

#Format conversion for XGBoost
category_dict = {'b': 0, 'e': 1, 't':2, 'm':3}
y_train = train['CATEGORY'].map(lambda x: category_dict[x])
y_valid = valid['CATEGORY'].map(lambda x: category_dict[x])
y_test = test['CATEGORY'].map(lambda x: category_dict[x])
dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)
dtest = xgb.DMatrix(X_test, label=y_test)

#Model learning
num_round = params.pop('num_round')
early_stopping_rounds = params.pop('early_stopping_rounds')
watchlist = [(dtrain, 'train'), (dvalid, 'eval')]
model = xgb.train(params, dtrain, num_round, evals=watchlist, early_stopping_rounds=early_stopping_rounds)
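#Get predicted value (use only the trees up to the best iteration found by early stopping)
#Note: ntree_limit / best_ntree_limit were removed in XGBoost 2.0; iteration_range is available from 1.4
best_range = (0, model.best_iteration + 1)
train_pred = model.predict(dtrain, iteration_range=best_range)
valid_pred = model.predict(dvalid, iteration_range=best_range)
test_pred = model.predict(dtest, iteration_range=best_range)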

#Calculation of correct answer rate
train_accuracy = accuracy_score(y_train, train_pred) 
valid_accuracy = accuracy_score(y_valid, valid_pred) 
test_accuracy = accuracy_score(y_test, test_pred) 

print(f'Correct answer rate (learning data): {train_accuracy:.3f}')
print(f'Correct answer rate (verification data): {valid_accuracy:.3f}')
print(f'Correct answer rate (evaluation data): {test_accuracy:.3f}')

output


Correct answer rate (learning data): 0.963
Correct answer rate (verification data): 0.873
Correct answer rate (evaluation data): 0.873
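
Because the categories were mapped to integers for XGBoost, the predictions come back as class IDs rather than letters (with `multi:softmax` they are typically returned as floats such as 2.0). A small sketch for converting them back, using the same mapping as `category_dict`, follows. The `decode` helper is hypothetical and added only for illustration.

```python
# Same mapping as category_dict in the XGBoost code, inverted
category_dict = {'b': 0, 'e': 1, 't': 2, 'm': 3}
id_to_category = {v: k for k, v in category_dict.items()}

def decode(pred_ids):
    """Convert class IDs (possibly floats such as 2.0) back to category names."""
    return [id_to_category[int(i)] for i in pred_ids]

decoded = decode([0.0, 2.0, 3.0, 1.0])
print(decoded)
```

This makes the XGBoost output directly comparable with the letter labels produced by the logistic regression model.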

In conclusion

100 Language Processing Knocks is designed so that you learn not only natural language processing itself but also basic data processing and general-purpose machine learning. It is also good hands-on practice for those studying machine learning through online courses, so please give it a try.
