2nd place solution

This article introduces the approach of the runner-up. 2nd place solution | team brain-afk: feature generation, multiple models, stacking, and FFM customization.

Most important features

In some places my understanding of the English original is doubtful, so if you are interested, please check the original text. Some open questions are noted in the discussion part at the bottom.

Features created by rcarson ("rcarson's features"). These are very powerful and can be strengthened further by adding timestamp buckets: just before, within 1 hour, within 1 day, and more than 1 day after the page_view of the document.
Competing ads: for each row, the hashes of the individual competing ads were used as FFM features.
Hashes of all document / traffic_source combinations that the user clicked on in page_views. In other words, a user who reached a document from search can be treated separately from a user who reached it from internal traffic. Anything with fewer than 80 events was dropped.
Hashes of all page view documents for each user, using documents_meta.csv. This slightly improves accuracy, probably because rare documents are included.
1 hour after click: hashes of the documents the user viewed within 1 hour after clicking the ad (I am not sure I understand the original English here). This is a form of leak. After clicking on an ad, the user tends to land on other documents that carry many ads, or to look for content related to the ad.
Interaction of ad_doc_category and doc_category, plus weekday (which worked well in the stack) and hour.
Time difference between the display document's creation time and the current time. In addition, the time difference between the display document's creation time and the ad document's creation time, when the user views the ad within the same publisher or the same source.
Flag: whether the user saw the same ad in a similar category or topic.
Flag: whether the user saw the ad in the past, or clicked on it in the past. The same flags were built for publishers, sources, categories, and topics.
Flag: whether the user saw the ad in the future, and whether the user saw an ad from a similar campaign that they did not see in the future.
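Several of the items above are "hash features" fed to FFM. As a rough illustration only, the sketch below hashes categorical values into a fixed number of bins and writes one row in the libffm text format (`label field:index:value ...`). The function names, the field names, the bin count, and the use of MD5 are all assumptions made for this example; the team's actual hashing scheme is not described in the post.

```python
import hashlib


def hash_feature(field, value, num_bins=2 ** 20):
    """Map a (field, value) pair to a stable bin index (hashing trick)."""
    digest = hashlib.md5(f"{field}={value}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_bins


def to_libffm_line(label, fields):
    """Build one libffm row from {field_name: [values]}.

    Field ids are assigned in insertion order, so every competing ad in
    the same list shares one field id.
    """
    parts = [str(label)]
    for field_id, (name, values) in enumerate(fields.items()):
        for value in values:
            parts.append(f"{field_id}:{hash_feature(name, value)}:1")
    return " ".join(parts)


# Hypothetical row: two competing ads and one document / traffic_source pair.
example = to_libffm_line(
    1,
    {
        "competing_ad": ["ad_101", "ad_202"],
        "doc_x_source": ["doc_55|search"],
    },
)
```

Hashing keeps memory bounded no matter how many distinct documents or ads appear, at the cost of occasional collisions between unrelated values.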

Model used in Layer 1

Apart from LibFFM and FTRL, the lineup resembles an ordinary classification competition, although Liblinear is quite rare to see. The Keras model is presumably just a multi-layer perceptron; the original post does not say.

LibFFM (described later)
FTRL, implemented with Vowpal Wabbit and with tingrtu's program (described later)
Liblinear (described later)
XGBoost
Keras
Logistic regression
SVC

Alexey customized FFM, which made it faster and reduced memory consumption. The code is to be published on GitHub soon.

CV & Meta Modeling

The terms "blend" and "stacking" used here refer to ensemble learning.

We used 6M rows as our own validation set, sampled to match the structure of the test set (2 future days; 50% of rows from common days and 50% from future days). In addition, a subset of about 14M rows was used for training, apparently so that new ideas could be tested faster (the translation is uncertain here).
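One possible reading of this validation scheme is sketched below: the "future" days are removed from training entirely, and the validation set takes half of its rows from those days and half from days that remain in training. The function name, the `(row_id, day)` input layout, and row-level sampling are assumptions made for illustration; the post does not say how the team actually sampled.

```python
import random


def make_validation_split(rows, future_days, n_valid, seed=0):
    """rows: list of (row_id, day) pairs; future_days: set of held-out days.

    Returns (train_ids, valid_ids). Half of the validation rows come from
    the held-out "future" days and half from days that also stay in training.
    """
    rng = random.Random(seed)
    future = [row_id for row_id, day in rows if day in future_days]
    common = [row_id for row_id, day in rows if day not in future_days]
    half = n_valid // 2
    valid = set(rng.sample(future, min(half, len(future))))
    valid |= set(rng.sample(common, min(half, len(common))))
    # Future days never enter training, so they stay unseen like in the test set.
    train = [row_id for row_id in common if row_id not in valid]
    return train, sorted(valid)
```

With real data the sampling unit would more likely be a whole display (all ads shown together) rather than a single row, so that one display is never split between training and validation.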

Before Alexey joined, the team had 20 Layer 1 models on the 6M set, which gave a 0.003 improvement on the public leaderboard. They then trained a blend of XGBoost and Keras on the 6M set (with no common days / future days separation; I am not sure what this remark means).

In the last week before the deadline, one model improved a lot, and the stacking result improved with it. Alongside the Layer 1 predictions, a generalized time value was used as a feature for Layer 2, which improved the score by 0.00020. After Alexey joined, the final submission was completed by merging and blending the stack data he had.
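To make the Layer 1 to Layer 2 hand-off concrete, here is a minimal sketch of how a Layer 2 feature matrix could be assembled from Layer 1 predictions on the validation set plus one time column. The function name and the model names are hypothetical, and how the time value is computed is not specified in the post, so it is simply passed in.

```python
def build_layer2_features(layer1_preds, time_feature):
    """Stack Layer 1 predictions column-wise and append a time column.

    layer1_preds: {model_name: [prediction per row]} on the hold-out set.
    time_feature: [time value per row]; its definition is an assumption.
    Returns (header, rows) ready to feed a Layer 2 model.
    """
    names = sorted(layer1_preds)
    n_rows = len(time_feature)
    for name in names:
        if len(layer1_preds[name]) != n_rows:
            raise ValueError(f"{name} has a different number of rows")
    header = names + ["time"]
    rows = [
        [layer1_preds[name][i] for name in names] + [time_feature[i]]
        for i in range(n_rows)
    ]
    return header, rows
```

The Layer 2 model (XGBoost or Keras in the post) is then trained on these rows against the true labels of the same hold-out set.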

Final solution

We submitted the geometric mean of the bagged results of Alexey's meta stack, XGBoost, and Keras. The weights were chosen by intuition, with reference to the LB score.
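A weighted geometric mean of several prediction vectors can be sketched as follows. The function name and the clipping constant are assumptions for this example, and the weights shown in the usage note are made up, since the post does not give the values the team used.

```python
import math


def weighted_geometric_mean(predictions, weights, eps=1e-15):
    """Blend prediction lists row by row with a weighted geometric mean.

    predictions: list of equally long lists of probabilities, one per model.
    weights: one non-negative weight per model.
    """
    total = float(sum(weights))
    n_rows = len(predictions[0])
    blended = []
    for r in range(n_rows):
        # Clip to avoid log(0); average in log space, then exponentiate.
        log_sum = sum(
            w * math.log(min(max(p[r], eps), 1.0))
            for p, w in zip(predictions, weights)
        )
        blended.append(math.exp(log_sum / total))
    return blended
```

For instance, `weighted_geometric_mean([stack, xgb, keras], [3, 1, 1])` would lean on the stack three times as heavily as on each of the other two models. Compared with an arithmetic mean, the geometric mean pulls the blend towards the lower prediction when the models disagree.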

Best single model

Alexey's custom FFM implementation gave the best accuracy: 0.70017 on the public leaderboard.

rcarson features

This section walks through the feature-generation code published by rcarson, [Extract leak in 30 mins with small memory](https://www.kaggle.com/jiweiliu/outbrain-click-prediction/extract-leak-in-30-mins-with-small-memory/). (Click here for gist)

Two files are used: page_views.csv and promoted_content.csv. In short, page_views holds the ids of the web pages each user visited, and promoted_content holds the details of each ad id.

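import csv

# Mark every document that appears as promoted content with the placeholder 1.
leak = {}
for c, row in enumerate(csv.DictReader(open('../input/promoted_content.csv'))):
    if row['document_id'] != '':
        leak[row['document_id']] = 1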

Every document id that appears in promoted_content is flagged in leak.

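count = 0             # number of distinct (document_id, uuid) pairs collected so far
limit = float('inf')  # assumption: no cap; the original value is not shown in this excerpt
filename = '../input/page_views.csv'
filename = '../input/page_views_sample.csv' # comment this out locally
for c, row in enumerate(csv.DictReader(open(filename))):
    if count > limit:
        break
    if c % 1000000 == 0:
        print(c, count)  # progress report
    if row['document_id'] not in leak:
        continue  # this page never appears as promoted content
    if leak[row['document_id']] == 1:
        leak[row['document_id']] = set()  # first visitor: replace the placeholder
    lu = len(leak[row['document_id']])
    leak[row['document_id']].add(row['uuid'])
    if lu != len(leak[row['document_id']]):
        count += 1  # a new uuid was added for this document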

Next, whenever a page that appears in promoted_content shows up in page_views, the visiting user's id (uuid) is added to that page's set.

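with open('leak.csv', 'w') as fo:  # assumption: the output filename is not shown in this excerpt
    for i in leak:
        if leak[i] != 1:  # skip documents that no user visited
            tmp = list(leak[i])
            fo.write('%s,%s\n' % (i, ' '.join(tmp)))
            del tmp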

Finally, every document in leak that has one or more uuids is written to a file.

Looking at the exported file, it looks like this.

The file linking document_id to uuid is now complete. In other words, it records which users visited each unique web page.
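The feature list earlier mentions strengthening this leak with timestamp buckets. A minimal sketch of such a bucketing is shown below. It assumes that timestamps are in milliseconds and that the bucket is defined by when the user viewed the ad's document relative to when the ad was displayed; the function name, the bucket labels, and the exact boundaries are assumptions, not the team's code.

```python
HOUR_MS = 60 * 60 * 1000
DAY_MS = 24 * HOUR_MS


def leak_time_bucket(view_ts, display_ts):
    """Bucket the gap between a page view of the ad document and the ad display."""
    delta = view_ts - display_ts
    if delta < 0:
        return "before"      # the user had already seen the document
    if delta <= HOUR_MS:
        return "within_1h"   # viewed shortly after the display
    if delta <= DAY_MS:
        return "within_1d"
    return "after_1d"
```

To use it, leak.csv would need to store the page view timestamp next to each uuid, which the script above does not do.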

About libffm

libffm implements field-aware factorization machines (FFM), an advanced version of FM, a model family associated with collaborative filtering. It has been used frequently since the middle of last year in competitions dealing with this type of data.

As of March 9, 2017, my attempts to install libffm on Linux and Mac run into bugs. The problem is that the latest SDK does not include nanosocket, while import ffm fails if an earlier version is installed. The issue has been raised several times in turi-code issues and on Stack Overflow, but it does not seem to be resolved. Therefore the libffm sample program that competition participants often use is not run here. That is a shame, because the library produced very good results in this competition.

Once the problem is resolved, I'll cover the details in another article.

FTRL-Proximal

SRK has released Python code using FTRL. Click here for gist.

FTRL is the algorithm Google used for CTR prediction; this is the original paper.

The team reportedly used both Vowpal Wabbit and this code. I have not actually used Vowpal Wabbit myself, but top Kagglers use it quite a bit, so I am thinking of implementing it together with FTRL and writing it up in another article. Its predictions seem to have low correlation with those of a classifier built in something like TensorFlow, which makes it useful as a component in ensemble learning.
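For readers who have not seen the algorithm, the per-coordinate FTRL-Proximal update for logistic regression can be sketched in pure Python as below. This is a minimal sketch for sparse binary features, not SRK's or tingrtu's code; the class name is hypothetical, and the hyperparameter names (alpha, beta, l1, l2) and default values are assumptions that follow the usual convention.

```python
import math


class FTRLProximal:
    """Per-coordinate FTRL-Proximal logistic regression on binary features."""

    def __init__(self, alpha=0.1, beta=1.0, l1=0.1, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # adjusted gradient sums, keyed by feature index
        self.n = {}  # squared gradient sums, keyed by feature index

    def _weight(self, i):
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # L1 keeps weakly supported weights at exactly zero
        n = self.n.get(i, 0.0)
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / (
            (self.beta + math.sqrt(n)) / self.alpha + self.l2
        )

    def predict(self, features):
        """features: iterable of active feature indices (each has value 1)."""
        wtx = sum(self._weight(i) for i in features)
        wtx = max(min(wtx, 35.0), -35.0)  # guard against overflow in exp
        return 1.0 / (1.0 + math.exp(-wtx))

    def update(self, features, label):
        """One online step on a single example; returns the prior prediction."""
        p = self.predict(features)
        g = p - label  # log-loss gradient for every active feature
        for i in features:
            n = self.n.get(i, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * self._weight(i)
            self.n[i] = n + g * g
        return p
```

Because weights are derived lazily from `z` and `n`, only features that actually occur consume memory, which is what makes the method practical for hashed click data with millions of possible indices.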

liblinear

A tool for linear prediction on large datasets. It is rare to see it used in competitions, but it seems it can lead to unexpectedly high scores. Reference site (English)
