We, Team AI, hold machine learning study sessions and data analysis hackathons every day in Shibuya. Our goal is to build a community of 1 million people, centered on Tokyo.
I hope this data analysis movement spreads throughout Japan and around the world. We have compiled a tutorial that is useful for running a data analysis hackathon. It's a lot of fun, so everyone, especially those of you nearby, should definitely give it a try! Team AI is happy to help, too.
Watch this first! Ishii's walkthrough of the Kaggle Kernels feature (boosts productivity!) => https://www.youtube.com/watch?v=HkJmnpBjiI0
https://www.codexa.net/what-is-kaggle/
http://luvtome.blog5.fc2.com/blog-entry-644.html
Lots of datasets here. Click on an interesting dataset with many upvotes; datasets can also be searched by keyword. https://www.kaggle.com/datasets
Full-time Kaggler Curry-chan's detailed Kaggle guide: https://note.mu/currypurin/n/nf390914c721e
Curry-chan also posts Kaggle information on Twitter: https://twitter.com/currypurin
2018/9/6: Google announced Dataset Search, a search engine that works across many dataset sources. It's very convenient. https://toolbox.google.com/datasetsearch
Getting started with Kaggle http://qiita.com/taka4sato/items/802c494fdebeaa7f43b7
If you want to become a data scientist, start with Kaggle
http://qiita.com/KIKUYA-Takumi/items/13ac849582318f559271
Kaggle Slack groups
Global group (about 3,000 people) https://kagglenoobs.herokuapp.com/
Japanese-language group (about 400 people, high level) http://kaggler-ja.herokuapp.com/
Fintech Data Hackathon
Bitcoin Price Prediction (lightweight CSV) https://www.kaggle.com/team-ai/bitcoin-price-prediction
Uniqlo (FastRetailing) Stock Price Prediction
https://www.kaggle.com/daiearth22/uniqlo-fastretailing-stock-price-prediction
Foreign Exchange (FX) Prediction - USD/JPY https://www.kaggle.com/team-ai/foreign-exchange-fx-prediction-usdjpy
Foreign Exchange (FX) Prediction - EUR/USD https://www.kaggle.com/meehau/EURUSD/kernels A fairly carefully written kernel reports 99.7% prediction accuracy. Is that really true?? https://www.kaggle.com/daiearth22/eurusd-15-minute-interval-price-prediction?scriptVersionId=8708587
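When a price-prediction kernel reports a figure like 99.7% accuracy, one quick sanity check is to compare it with a naive forecast that simply repeats the previous price. Suppose "accuracy" is computed as 100% minus the mean absolute percentage error (MAPE) on the price level. Then even this naive forecast can score above 99%. The sketch below demonstrates this on a synthetic random walk. The data and this definition of accuracy are assumptions for illustration; the linked kernel may define accuracy differently.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Hypothetical price series: a random walk around 1.10 (an EUR/USD-like level)
rng = np.random.default_rng(0)
prices = 1.10 + np.cumsum(rng.normal(0, 0.0005, size=1000))

# Naive "persistence" forecast: the prediction for step t is the price at t-1
naive_pred = prices[:-1]
actual = prices[1:]

naive_error = mape(actual, naive_pred)
print(f"naive MAPE: {naive_error:.3f}% -> 'accuracy' {100 - naive_error:.2f}%")
```

A high accuracy figure says little about predictive power in two cases: when the model does not clearly beat this baseline, or when it is evaluated on price levels rather than on returns or direction.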
Kaggle datasets in the finance category (competition data is heavy) https://www.kaggle.com/tags/finance
Credit Card Fraud: credit card fraud detection data (66MB, heavy) https://www.kaggle.com/mlg-ulb/creditcardfraud
StockPrice and News: correlation analysis of news and stock prices (6MB) https://www.kaggle.com/aaron7sun/stocknews
Loan Data for risk analysis: lending risk calculation data (6KB, light) https://www.kaggle.com/zhijinzhai/loandata
Loan Data for risk analysis (heavy data): loan risk calculation data (240MB, very heavy) https://www.kaggle.com/wendykan/lending-club-loan-data
A story about predicting exchange rates with Deep Learning http://qiita.com/ognek/items/1b776d504d20bd6f6d7d
I tried to verify a paper on stock price forecasting with Twitter sentiment analysis and was able to predict up/down movements with about 70% accuracy http://qiita.com/ryo_grid/items/5a5ecc602186a3381c87
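"Predicting up and down with about 70% accuracy" refers to directional accuracy: the share of periods where the predicted direction matches the actual price move. Below is a minimal sketch of how such a number is computed. It assumes a list of closing prices and one predicted direction per price change; both are hypothetical, not data from the linked article.

```python
import numpy as np

def directional_accuracy(prices, predicted_up):
    """Fraction of steps where the predicted direction (True = up) matches
    the actual direction of the next price change. A zero change counts as 'not up'."""
    prices = np.asarray(prices, dtype=float)
    actual_up = np.diff(prices) > 0               # True where the price rose
    predicted_up = np.asarray(predicted_up, dtype=bool)
    return float(np.mean(actual_up == predicted_up))

prices = [100, 101, 100, 102, 103, 101]           # hypothetical closing prices
predicted_up = [True, False, False, True, True]   # hypothetical signal, one per change
acc = directional_accuracy(prices, predicted_up)
print(acc)
```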
Plotting time series data with different scales and units using Python and Matplotlib http://qiita.com/zaburo/items/00f364422ef3fe64f156
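One common way to plot two time series with different scales and units on one chart is a second y-axis created with Matplotlib's `twinx()`. The sketch below uses invented price and volume values, and the linked article may use a different approach.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily series with very different scales and units
dates = pd.date_range("2018-01-01", periods=5, freq="D")
price_usd = pd.Series([13500, 14200, 15100, 14800, 16000], index=dates)
volume_btc = pd.Series([2.1, 3.5, 1.8, 2.9, 4.2], index=dates)

fig, ax_price = plt.subplots(figsize=(8, 4))
ax_price.plot(price_usd.index, price_usd.values, color="tab:blue")
ax_price.set_ylabel("Price (USD)", color="tab:blue")

# twinx() adds a second y-axis that shares the same x-axis (the dates)
ax_volume = ax_price.twinx()
ax_volume.plot(volume_btc.index, volume_btc.values, color="tab:orange")
ax_volume.set_ylabel("Volume (BTC)", color="tab:orange")

fig.autofmt_xdate()  # tilt the date labels so they don't overlap
```

Call `fig.savefig("chart.png")` to write the chart to a file, or `plt.show()` in an interactive backend such as a notebook.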
Financial data provider Quandl: https://www.quandl.com/
Some useful information I received from a day trader:
AlphaAI: an open-source stock price forecasting project covering everything from data preprocessing to LSTM training (claims 98% accuracy) https://github.com/VivekPa/AlphaAI
Finance x Python self-study meetup (mokumokukai) FinPy https://fin-py.connpass.com/
Quantopian self-study meetup (mokumokukai) https://quantopian-tokyo.connpass.com/
Commission-free stock trading app STREAM https://smartplus-sec.com/stream/
Python day trader Doriran on Twitter https://twitter.com/patraqushe?lang=en
Day-trading engineer Shinseitaro on Twitter https://twitter.com/shinseitaro
MyTrade, a free app for investors https://mytrade.jp/
Dragon King theory, which predicts economic crises using the concept of anomaly detection (compare the Black Swan idea) https://www.ted.com/talks/didier_sornette_how_we_can_predict_the_next_financial_crisis/transcript?language=ja#t-6583
Dragon King theory paper https://arxiv.org/abs/0907.4290
I tried analyzing card payment default data with Excel (statistics basics you can't ask about anymore) https://medium.com/team-ai-math/data-analysis-by-excel-b90fcbd7f4fe
A survey of 25 overseas FinTech investments, January 2018 https://medium.com/team-ai-fintech/fintech-investment-jan-35d2424f22f4
20 notable overseas FinTech services https://medium.com/team-ai-fintech/fintech-startups-20-2c21b27ea003
Medical Data Hackathon
Synchronized brainwave dataset EEG https://www.kaggle.com/berkeley-biosense/synchronized-brainwave-dataset
Breast Cancer Wisconsin (Diagnostic) Data Set Breast Cancer https://www.kaggle.com/uciml/breast-cancer-wisconsin-data
Hospital General Information Hospital https://www.kaggle.com/cms/hospital-general-information
Zika Virus Epidemic Zika fever https://www.kaggle.com/cdc/zika-virus-epidemic
Cervical Cancer Risk Classification Cervical Cancer https://www.kaggle.com/loveall/cervical-cancer-risk-classification
Medical Appointment No Shows: analysis of patients who miss their appointments https://www.kaggle.com/joniarroba/noshowappointments
Mental Health in Tech Survey: a survey of mental health in the tech industry https://www.kaggle.com/osmi/mental-health-in-tech-survey
Google's cool data visualization tool FACETS https://pair-code.github.io/facets/
scikit-learn's RandomForestRegressor gives a rough estimate of how important each variable is (useful!) http://scikit-learn.org/…/sklearn.ensemble.RandomForestRegr…
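Here is a minimal sketch of reading variable importance from scikit-learn's `RandomForestRegressor` through its `feature_importances_` attribute. The data is synthetic, so it is an assumption: the target depends strongly on `x0`, weakly on `x1`, and not at all on `x2`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 3))
# Hypothetical target: strong effect of x0, weak effect of x1, none of x2
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances: non-negative and summing to 1 across features
for name, score in zip(["x0", "x1", "x2"], model.feature_importances_):
    print(f"{name}: {score:.3f}")
```

Impurity-based importances are "rough" in the sense that they can favor features with many distinct values. For a second opinion, scikit-learn also provides `sklearn.inspection.permutation_importance`.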
pandas-profiling, for getting an overview of newly acquired data https://wonderwall.hatenablog.com/entry/2018/02/12/171500
Pharmaceutical open data DrugBank https://www.drugbank.ca/
Open protein data: RCSB Protein Data Bank https://www.rcsb.org/
Google's free GPU cloud Colaboratory is super convenient http://itsukara.hateblo.jp/entry/2018/02/05/214949
NASA/Space Data Hackathon
Exoplanet Hunting in Deep Space: Kepler exoplanet search data https://www.kaggle.com/keplersmachines/kepler-labelled-time-series-data
Solar Radiation Prediction Solar Radiation Data https://www.kaggle.com/dronio/SolarEnergy
Climate Change: Earth Surface Temperature Data https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data
Meteorite Landings Meteorite impact data https://www.kaggle.com/nasa/meteorite-landings
UFO Sightings: UFO sighting report data https://www.kaggle.com/NUFORC/ufo-sightings
Open Exoplanet Catalog exoplanet data https://www.kaggle.com/mrisdal/open-exoplanet-catalogue
Kepler Exoplanet Search Results Exoplanet data 2 https://www.kaggle.com/nasa/kepler-exoplanet-search-results/kernels
NASA Exoplanet Exploration Kepler Space Telescope Mission Details https://japanese.engadget.com/2018/03/15/9-4500/
Tellus, Sakura Internet's platform for using satellite data https://www.sakura.ad.jp/information/pressreleases/2018/07/31/1968197591/
Google Earth Engine API https://developers.google.com/earth-engine/
Marketing/Retail Data Hackathon
Springleaf Marketing Response: direct mail response analysis (150MB) https://www.kaggle.com/c/springleaf-marketing-response/kernels
Coupon Purchase Prediction: Recruit Ponpare coupon data https://www.kaggle.com/c/coupon-purchase-prediction
Airbnb New User Bookings: Airbnb booking data analysis. Where will a new guest book their first travel experience? https://www.kaggle.com/c/airbnb-recruiting-new-user-bookings
Rossmann Store Sales Retail Sales Forecast https://www.kaggle.com/c/rossmann-store-sales/data
Home Depot Product Search Relevance Predict the relevance of search results on homedepot.com https://www.kaggle.com/c/home-depot-product-search-relevance
Acquire Valued Shoppers Challenge Predict which shoppers will become repeat buyers https://www.kaggle.com/c/acquire-valued-shoppers-challenge
Getting real about fake news https://www.kaggle.com/mrisdal/fake-news
Starbucks Locations Worldwide https://www.kaggle.com/starbucks/store-locations
Retail rocket recommendation system dataset https://www.kaggle.com/retailrocket/ecommerce-dataset
Grupo Bimbo Inventory Demand: maximize sales and minimize returns of bakery goods (about 3GB of training data available) https://www.kaggle.com/c/grupo-bimbo-inventory-demand
Innerwear Data from Victoria's Secret https://www.kaggle.com/PromptCloudHQ/innerwear-data-from-victorias-secret-and-others
Natural language processing tutorial => https://qiita.com/daisuke-team-ai/items/d2e18f07a08d9b4cb783
https://www.kaggle.com/abhishek/approaching-almost-any-nlp-problem-on-kaggle
NLP data:
Shinzo Abe Twitter Data (Prime Minister Abe's Twitter data) https://www.kaggle.com/team-ai/shinzo-abe-japanese-prime-minister-twitter-nlp/version/1
World News on Reddit: analysis of news posts on the Reddit message board https://www.kaggle.com/rootuser/worldnews-on-reddit
South Park Dialogue: identify the speaker from lines in the animated show's scripts https://www.kaggle.com/tovarischsukhov/southparklines
Deep NLP Analysis of Chatbot and resume data https://www.kaggle.com/samdeeplearning/deepnlp
Python Questions from Stack Overflow: analysis of Python questions on the programming Q&A site https://www.kaggle.com/stackoverflow/pythonquestions
Japanese English Bilingual Corpus (Wikipedia Corpus in Japanese and English) https://www.kaggle.com/team-ai/japaneseenglish-bilingual-corpus
Japanese Lemma Frequency: a list of the 15,000 most common word forms in Japanese https://www.kaggle.com/rtatman/japanese-lemma-frequency
Japanese Whisky Review Dataset: 1,000+ reviews of Japanese whisky (written in English) https://www.kaggle.com/koki25ando/japanese-whisky-review
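The South Park Dialogue task in the list above (guess who said a line) is a text classification problem. A common baseline is TF-IDF features plus logistic regression. The sketch below uses scikit-learn on a few invented lines rather than the real dataset, so the data and labels are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy dialogue lines (NOT from the real dataset), labeled by speaker
lines = [
    "I want cheesy poofs right now",
    "give me more cheesy poofs",
    "cheesy poofs are the best snack ever",
    "come on guys we should do the right thing",
    "guys that is not the right thing to do",
    "doing the right thing matters guys",
]
speakers = ["Cartman", "Cartman", "Cartman", "Kyle", "Kyle", "Kyle"]

# TF-IDF turns each line into a weighted bag-of-words vector;
# logistic regression then learns which words point to which speaker
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(lines, speakers)

new_lines = ["more cheesy poofs please", "guys do the right thing"]
print(clf.predict(new_lines))
print(clf.predict_proba(new_lines).round(2))  # columns follow clf.classes_
```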
(For advanced users) Quora Question Pairs: a competition to identify whether two questions on the Q&A site Quora ask the same thing https://www.kaggle.com/c/quora-question-pairs
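A simple baseline for question-pair tasks like this, far from competitive, is to score each pair by the cosine similarity of the two questions' TF-IDF vectors. That score can then be used as a feature or compared against a threshold. Below is a sketch with scikit-learn; the example questions are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented example pairs: the first is a likely duplicate, the second is not
pairs = [
    ("How do I learn Python quickly?", "What is the fastest way to learn Python?"),
    ("How do I learn Python quickly?", "What is the capital of Japan?"),
]

# Fit the vocabulary and IDF weights on all questions
vectorizer = TfidfVectorizer()
vectorizer.fit([q for pair in pairs for q in pair])

def pair_similarity(q1, q2):
    """Cosine similarity of the TF-IDF vectors of two questions (0 to 1)."""
    v = vectorizer.transform([q1, q2])
    return float(cosine_similarity(v[0], v[1])[0, 0])

scores = [pair_similarity(q1, q2) for q1, q2 in pairs]
print(scores)
```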
Bonus: a President Trump Twitter AI. Talk to it and it replies right away! https://twitter.com/TrumpSidekik
HR Data
Kaggle ML and Data Science Survey, 2017: an industry-wide view of data analysis. A big picture view of the state of data science and machine learning. https://www.kaggle.com/kaggle/kaggle-survey-2017
U.S. Incomes by Occupation and Gender: analyze the gender gap and differences in incomes across industries https://www.kaggle.com/jonavery/incomes-by-career-and-gender
Daily Happiness & Employee Turnover: is there a relationship between employee happiness and job turnover? https://www.kaggle.com/harriken/employeeturnover
IBM HR Analytics Employee Attrition & Performance: IBM turnover analysis. Predict attrition of your valuable employees. https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset
Human Resources Analytics: why are our best and most experienced employees leaving prematurely? https://www.kaggle.com/ludobenistant/hr-analytics
2016 New Coder Survey: a survey of 15,000+ people who are new to software development https://www.kaggle.com/freecodecamp/2016-new-coder-survey-
Get time series data from k-db.com in Python
http://qiita.com/sawadybomb/items/03c3814268d3e2904e6c
Quora has a lot of know-how on time series forecasting (useful for FinTech): https://www.google.co.jp/search?q=how+to+predict+time+series+quora
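Whatever model comes next, a common first step in time series forecasting is to turn past values into features such as lags and moving averages. Below is a minimal pandas sketch with invented prices. `shift(1)` is applied before `rolling()` so that each row only uses information available before that day.

```python
import pandas as pd

# Hypothetical daily closing prices
close = pd.Series(
    [100.0, 101.0, 102.5, 101.5, 103.0, 104.0, 103.5],
    index=pd.date_range("2018-01-01", periods=7, freq="D"),
    name="close",
)

features = pd.DataFrame({
    "lag_1": close.shift(1),                          # yesterday's price
    "lag_2": close.shift(2),                          # price two days ago
    "ma_3": close.shift(1).rolling(window=3).mean(),  # mean of the previous 3 days
    "target": close,                                  # value to predict
})
# Drop the first rows, where lags and rolling windows are not yet defined
features = features.dropna()
print(features)
```

When training on features like these, split train and test by time rather than shuffling randomly. For example, train on earlier dates and test on later ones.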
(Must-read, for beginners) Team AI's list of machine learning / data analysis articles to read
http://qiita.com/daisuke-team-ai/items/68f82f6502e06678c660
Pandas
Official site http://pandas.pydata.org/
Loose, fluffy pandas cheat sheet
http://qiita.com/tanemaki/items/2ed05e258ef4c9e6caac
Remember just this much and you can get by with Pandas
http://qiita.com/kojim/items/c56ec63063bec62bc5ed
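Here are a handful of everyday pandas operations as a quick taste before reading the articles above. This is not a reproduction of their content, and the sales table is invented.

```python
import pandas as pd

# Invented sales table
df = pd.DataFrame({
    "store": ["A", "A", "B", "B", "C"],
    "month": ["Jan", "Feb", "Jan", "Feb", "Jan"],
    "sales": [120, 150, 90, 110, 200],
})

print(df.head())      # first rows
print(df.describe())  # summary statistics for numeric columns

big = df[df["sales"] >= 110]                         # boolean filtering
by_store = df.groupby("store")["sales"].sum()        # aggregate per group
df["sales_share"] = df["sales"] / df["sales"].sum()  # new derived column
```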
Seaborn
Official Site https://seaborn.pydata.org/
Beautiful graphs with Python: seaborn makes data analysis and visualization easier, Part 1
http://qiita.com/hik0107/items/3dc541158fceb3156ee0
Beautiful graphs with Python: seaborn makes data analysis and visualization easier, Part 2
http://qiita.com/hik0107/items/7233ca334b2a5e1ca924
Japanese settings for matplotlib and Seaborn axes
http://qiita.com/kshigeru/items/0cfc0778bab197687967
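One common way to make Matplotlib render Japanese axis labels is to put a Japanese font at the front of its font list. This also applies to seaborn, which draws with Matplotlib. The font names below are assumptions, so use whichever Japanese font is actually installed on your machine. The linked article may use a different method.

```python
import matplotlib.pyplot as plt

# Font names are assumptions: keep the ones installed on your machine
japanese_fonts = ["IPAexGothic", "Noto Sans CJK JP", "Hiragino Sans", "Yu Gothic"]

# Matplotlib picks from this list in order, skipping fonts that are not installed
plt.rcParams["font.family"] = "sans-serif"
plt.rcParams["font.sans-serif"] = japanese_fonts + plt.rcParams["font.sans-serif"]

# Some Japanese fonts lack the Unicode minus sign, so use an ASCII hyphen on axes
plt.rcParams["axes.unicode_minus"] = False
```

Seaborn's `set_theme()` overwrites these font settings. If you call it afterwards, pass the font there as well, e.g. `sns.set_theme(font="IPAexGothic")`.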