Kaggle Quora Question Pairs[^1] 3rd place solution[^2] research article.
[3rd place] Overview of 3rd Place Solution
Author: Jared Turkewitz
Discussion URL: https://www.kaggle.com/c/quora-question-pairs/discussion/34288
- Uses neural networks, LightGBM, and XGBoost
- The first layer of the model stack is built on roughly 1,300 features
- LightGBM is about 5x faster than XGBoost, at slightly lower accuracy
- Stacks 15 models in total
- XGBoost gives the best single model (CV = 0.185); a stacking sketch follows this list
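As a rough illustration of the first stacking layer, the sketch below trains one LightGBM model per fold and collects out-of-fold predictions for the next layer. The function name, hyperparameters, and the assumption that `X` holds the ~1,300 engineered features as a NumPy array are illustrative, not taken from the author's code.

```python
import numpy as np
from sklearn.model_selection import KFold
from lightgbm import LGBMClassifier

def first_layer_oof(X, y, X_test, n_splits=5, seed=42):
    """Train one LightGBM model per fold; return out-of-fold predictions
    on the training set and fold-averaged predictions on the test set,
    both usable as inputs to the next stacking layer."""
    oof = np.zeros(len(X))
    test_pred = np.zeros(len(X_test))
    for train_idx, valid_idx in KFold(n_splits, shuffle=True,
                                      random_state=seed).split(X):
        model = LGBMClassifier(n_estimators=500, learning_rate=0.05)
        model.fit(X[train_idx], y[train_idx])
        oof[valid_idx] = model.predict_proba(X[valid_idx])[:, 1]
        test_pred += model.predict_proba(X_test)[:, 1] / n_splits
    return oof, test_pred
```

Swapping `LGBMClassifier` for `XGBClassifier` would give the slower but slightly more accurate variant noted above.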
- Natural language processing features: word matches, similar-word matches, etc.
- Distances computed in TF-IDF and LDA spaces
- Word co-occurrence (pointwise mutual information[^4])[^7]
- Number of matching words
- Fuzzy word-matching measures (edit distance, character n-gram distance); a sketch of a few such features follows this list
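A few of these similarity features can be sketched with the standard library alone. The implementations below (word match share, character n-gram Jaccard similarity, Levenshtein edit distance) are common formulations and illustrative function names, not the author's exact definitions.

```python
def word_match_share(q1, q2):
    """Fraction of words shared between the two questions."""
    w1, w2 = set(q1.lower().split()), set(q2.lower().split())
    if not w1 or not w2:
        return 0.0
    return 2.0 * len(w1 & w2) / (len(w1) + len(w2))

def char_ngram_jaccard(q1, q2, n=3):
    """Jaccard similarity over character n-grams (a fuzzy-match proxy)."""
    g1 = {q1[i:i + n] for i in range(len(q1) - n + 1)}
    g2 = {q2[i:i + n] for i in range(len(q2) - n + 1)}
    return len(g1 & g2) / max(len(g1 | g2), 1)

def edit_distance(a, b):
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]
```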
- Shared word count, word frequency, the frequency of question 1 on its own, the frequency of question 2 on its own, etc. (a frequency-feature sketch follows)
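The question-frequency features can be sketched as below, assuming a pandas DataFrame `df` with the competition's `question1`/`question2` columns; the derived column names are illustrative.

```python
import pandas as pd

def add_frequency_features(df):
    """Count how often each question string appears anywhere in the data,
    then attach per-pair frequency features."""
    counts = pd.concat([df["question1"], df["question2"]]).value_counts()
    df["q1_freq"] = df["question1"].map(counts)
    df["q2_freq"] = df["question2"].map(counts)
    df["min_freq"] = df[["q1_freq", "q2_freq"]].min(axis=1)
    df["freq_diff"] = (df["q1_freq"] - df["q2_freq"]).abs()
    return df
```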
- Bidirectional LSTM
- Distributed representations: pretrained GloVe embeddings, part-of-speech embeddings, named-entity embeddings, dependency-parse embeddings[^6]
- Siamese network[^3] (a minimal sketch follows this list)
- Attention components: softmax matching and max-pool matching
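As a minimal PyTorch sketch of the siamese setup, the model below encodes both questions with one shared bidirectional LSTM and classifies the combined pair representation. The POS/NER/dependency embeddings and the softmax/max-pool attention matching are omitted; all dimensions and layer choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SiameseBiLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # GloVe weights would be loaded here
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        self.classifier = nn.Sequential(
            nn.Linear(4 * 2 * hidden_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def encode(self, tokens):
        out, _ = self.encoder(self.embed(tokens))
        return out.max(dim=1).values  # max-pool over the time dimension

    def forward(self, q1_tokens, q2_tokens):
        # The same encoder processes both questions (the siamese property).
        v1, v2 = self.encode(q1_tokens), self.encode(q2_tokens)
        pair = torch.cat([v1, v2, (v1 - v2).abs(), v1 * v2], dim=-1)
        return self.classifier(pair).squeeze(-1)  # logit for "duplicate"
```

The absolute difference and element-wise product are a common choice of pair features for siamese models, since both are invariant to the order of the two questions.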
- Selectively adjusts predictions according to question frequency
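One common form of post-hoc adjustment in this competition was rescaling predicted probabilities from the training-set positive rate to an estimated test-set rate. The sketch below applies that rescaling only to pairs whose rarer question is infrequent; the cutoff, the rate estimates, and the idea of gating on frequency this way are assumptions for illustration, not the author's exact rule.

```python
import numpy as np

def rescale(p, train_pos_rate=0.37, test_pos_rate=0.165):
    """Convert probabilities from the train prior to an assumed test prior."""
    a = test_pos_rate / train_pos_rate
    b = (1 - test_pos_rate) / (1 - train_pos_rate)
    return a * p / (a * p + b * (1 - p))

def adjust_by_frequency(preds, min_freq, freq_cutoff=2):
    """Only rescale pairs whose rarer question appears infrequently."""
    preds = np.asarray(preds, dtype=float)
    rare = np.asarray(min_freq) <= freq_cutoff
    preds[rare] = rescale(preds[rare])
    return preds
```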
References