Kaggle Quora Question Pairs[^1] 3rd place solution[^2] research article.
[3rd place] Overview of 3rd Place Solution
Author: Jared Turkewitz
Discussion URL: https://www.kaggle.com/c/quora-question-pairs/discussion/34288
- Uses neural networks, LightGBM, and XGBoost
- The first layer of the model stack is built on roughly 1,300 features
- LightGBM is about 5x faster than XGBoost, at slightly lower accuracy
- Stacks 15 models in total
- XGBoost gives the best single model (CV = 0.185); a stacking sketch follows this list
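As a rough illustration of the first stacking layer, the sketch below trains one LightGBM model per fold and collects out-of-fold predictions for the next layer. The function name, hyperparameters, and the assumption that `X` holds the ~1,300 engineered features as a NumPy array are illustrative, not taken from the author's code.

```python
import numpy as np
from sklearn.model_selection import KFold
from lightgbm import LGBMClassifier

def first_layer_oof(X, y, X_test, n_splits=5, seed=42):
    """Train one LightGBM model per fold; return out-of-fold predictions
    on the training set and fold-averaged predictions on the test set,
    both usable as inputs to the next stacking layer."""
    oof = np.zeros(len(X))
    test_pred = np.zeros(len(X_test))
    for train_idx, valid_idx in KFold(n_splits, shuffle=True,
                                      random_state=seed).split(X):
        model = LGBMClassifier(n_estimators=500, learning_rate=0.05)
        model.fit(X[train_idx], y[train_idx])
        oof[valid_idx] = model.predict_proba(X[valid_idx])[:, 1]
        test_pred += model.predict_proba(X_test)[:, 1] / n_splits
    return oof, test_pred
```

Swapping `LGBMClassifier` for `XGBClassifier` would give the slower but slightly more accurate variant noted above.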
- Natural language processing features: word matches, similar-word matches, etc.
- Distances computed in TF-IDF and LDA spaces
- Word co-occurrence (pointwise mutual information[^4])[^7]
- Number of matching words
- Fuzzy word-matching measures (edit distance, character n-gram distance); a sketch of a few such features follows this list
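A few of these similarity features can be sketched with the standard library alone. The implementations below (word match share, character n-gram Jaccard similarity, Levenshtein edit distance) are common formulations and illustrative function names, not the author's exact definitions.

```python
def word_match_share(q1, q2):
    """Fraction of words shared between the two questions."""
    w1, w2 = set(q1.lower().split()), set(q2.lower().split())
    if not w1 or not w2:
        return 0.0
    return 2.0 * len(w1 & w2) / (len(w1) + len(w2))

def char_ngram_jaccard(q1, q2, n=3):
    """Jaccard similarity over character n-grams (a fuzzy-match proxy)."""
    g1 = {q1[i:i + n] for i in range(len(q1) - n + 1)}
    g2 = {q2[i:i + n] for i in range(len(q2) - n + 1)}
    return len(g1 & g2) / max(len(g1 | g2), 1)

def edit_distance(a, b):
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]
```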
- Shared word count, word frequency, the frequency of question 1 on its own, the frequency of question 2 on its own, etc. (a frequency-feature sketch follows)
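The question-frequency features can be sketched as below, assuming a pandas DataFrame `df` with the competition's `question1`/`question2` columns; the derived column names are illustrative.

```python
import pandas as pd

def add_frequency_features(df):
    """Count how often each question string appears anywhere in the data,
    then attach per-pair frequency features."""
    counts = pd.concat([df["question1"], df["question2"]]).value_counts()
    df["q1_freq"] = df["question1"].map(counts)
    df["q2_freq"] = df["question2"].map(counts)
    df["min_freq"] = df[["q1_freq", "q2_freq"]].min(axis=1)
    df["freq_diff"] = (df["q1_freq"] - df["q2_freq"]).abs()
    return df
```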
- Bidirectional LSTM
- Distributed representations: pretrained GloVe embeddings, part-of-speech embeddings, named-entity embeddings, dependency-parse embeddings[^6]
- Siamese network[^3] (a minimal sketch follows this list)
- Attention components: softmax matching and max-pool matching
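As a minimal PyTorch sketch of the siamese setup, the model below encodes both questions with one shared bidirectional LSTM and classifies the combined pair representation. The POS/NER/dependency embeddings and the softmax/max-pool attention matching are omitted; all dimensions and layer choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SiameseBiLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # GloVe weights would be loaded here
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        self.classifier = nn.Sequential(
            nn.Linear(4 * 2 * hidden_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def encode(self, tokens):
        out, _ = self.encoder(self.embed(tokens))
        return out.max(dim=1).values  # max-pool over the time dimension

    def forward(self, q1_tokens, q2_tokens):
        # The same encoder processes both questions (the siamese property).
        v1, v2 = self.encode(q1_tokens), self.encode(q2_tokens)
        pair = torch.cat([v1, v2, (v1 - v2).abs(), v1 * v2], dim=-1)
        return self.classifier(pair).squeeze(-1)  # logit for "duplicate"
```

The absolute difference and element-wise product are a common choice of pair features for siamese models, since both are invariant to the order of the two questions.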
- Selectively adjusts predictions according to question frequency
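One common form of post-hoc adjustment in this competition was rescaling predicted probabilities from the training-set positive rate to an estimated test-set rate. The sketch below applies that rescaling only to pairs whose rarer question is infrequent; the cutoff, the rate estimates, and the idea of gating on frequency this way are assumptions for illustration, not the author's exact rule.

```python
import numpy as np

def rescale(p, train_pos_rate=0.37, test_pos_rate=0.165):
    """Convert probabilities from the train prior to an assumed test prior."""
    a = test_pos_rate / train_pos_rate
    b = (1 - test_pos_rate) / (1 - train_pos_rate)
    return a * p / (a * p + b * (1 - p))

def adjust_by_frequency(preds, min_freq, freq_cutoff=2):
    """Only rescale pairs whose rarer question appears infrequently."""
    preds = np.asarray(preds, dtype=float)
    rare = np.asarray(min_freq) <= freq_cutoff
    preds[rare] = rescale(preds[rare])
    return preds
```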
References