Instacart Market Basket Analysis

Which products will an Instacart consumer purchase again?

Image Courtesy: https://www.slideshare.net/JeremyStanley4/deep-learning-at-instacart

Table of Contents

1. Business/Real-world Problem

2. Business objectives and constraints

3. Data overview and Data set column analysis

Data Source:

4. Mapping the real world problem to a Machine Learning Problem

5. Performance metric:

6. Exploratory Data Analysis

Basic sanity checks on the Data

df_prior_final is obtained by joining the order_products_train.csv file with orders.csv, aisles.csv, and departments.csv.
df_orders is the DataFrame built from the orders.csv file.
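
A minimal sketch of this kind of join, using tiny in-memory frames instead of the real CSV files. The column names follow the public Instacart dataset, but the rows are made up, and the `products` bridge table (which carries `aisle_id` and `department_id`) is an assumption about how the aisle and department names get attached. The name `df_joined` is hypothetical.

```python
import pandas as pd

# Tiny stand-ins for the CSV files (made-up rows, dataset-style column names).
order_products = pd.DataFrame({
    "order_id": [1, 1, 2],
    "product_id": [10, 11, 10],
    "add_to_cart_order": [1, 2, 1],
    "reordered": [0, 1, 1],
})
orders = pd.DataFrame({
    "order_id": [1, 2],
    "user_id": [100, 101],
    "order_dow": [0, 3],
    "order_hour_of_day": [9, 17],
})
# Assumed bridge table: maps each product to its aisle and department ids.
products = pd.DataFrame({
    "product_id": [10, 11],
    "product_name": ["Organic Banana", "Whole Milk"],
    "aisle_id": [24, 84],
    "department_id": [4, 16],
})
aisles = pd.DataFrame({"aisle_id": [24, 84], "aisle": ["fresh fruits", "milk"]})
departments = pd.DataFrame(
    {"department_id": [4, 16], "department": ["produce", "dairy eggs"]}
)

# Left joins keep every order line even if a lookup table is missing a key.
df_joined = (
    order_products
    .merge(orders, on="order_id", how="left")
    .merge(products, on="product_id", how="left")
    .merge(aisles, on="aisle_id", how="left")
    .merge(departments, on="department_id", how="left")
)

print(df_joined[["order_id", "user_id", "product_name", "aisle", "department"]])
```

Left joins are used here so that the number of order lines never shrinks during the join; checking the row count before and after is a cheap sanity check.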

Data distribution in different files

Total number of Orders/Reorders per day

Total number of orders per hour

Total number of orders vs days_since_prior_order

Total orders placed by user vs Order count

Bucket size of users vs Order count

Mean_add_to_cart_order vs Reorder

Word Cloud of Products Ordered

Top 10 department based on orders and their reorder count

Top 10 aisle based on orders and their reorder count

Contribution of each department and aisle to Total Reorders

7. Existing Solutions/Approaches

MF stands for matrix factorization. Surprise is a Python library for building and evaluating recommender systems.

8. First cut approach

The feature_engineering function extracts features from the prior data for each row in the train data. The complete code can be found in my GitHub repository.

Working Approach:

Process Flow

9. Feature Engineering

User features

Product features

User X product Features

Aisle features

Department

Word2Vec on products

From the above, we can see that choosing n_components=40 preserves 80% of the variance.
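
A threshold like this can also be computed rather than read off a plot. The sketch below assumes the reduction is done with scikit-learn's `PCA` and uses random data as a hypothetical stand-in for the product embedding matrix, so it will not reproduce the value 40. The names `embeddings` and `n_components_80` are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical stand-in for the embedding matrix: one row per product.
embeddings = rng.normal(size=(500, 100))

# Fit PCA with all components to get the full explained-variance spectrum.
pca = PCA().fit(embeddings)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# First index where the cumulative ratio reaches 0.80, converted to a count.
n_components_80 = int(np.searchsorted(cumulative, 0.80) + 1)

print(n_components_80, cumulative[n_components_80 - 1])
```

scikit-learn also accepts a float, as in `PCA(n_components=0.80)`, which selects the smallest number of components whose cumulative explained variance exceeds that fraction.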

Encoding cyclic features

One common way to encode cyclic features is to use sine and cosine transformations.
We can do that using the following transformation:
x_sin = sin(2π · x / max(x))
x_cos = cos(2π · x / max(x))
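
The transformation can be sketched in a few lines of NumPy and pandas. The helper `encode_cyclic` and the sample values are hypothetical. By default it divides by max(x), as in the formula. The optional `period` argument is an assumption added here, because dividing by max(x) maps the smallest and largest values (for example hour 0 and hour 23) to the same point, while dividing by the cycle length (24 for hours) keeps them distinct.

```python
import numpy as np
import pandas as pd

def encode_cyclic(values, period=None):
    """Return (sin, cos) encodings of a cyclic feature.

    If period is None, max(values) is used, matching the formula in the text.
    """
    values = np.asarray(values, dtype=float)
    if period is None:
        period = values.max()
    angle = 2 * np.pi * values / period
    return np.sin(angle), np.cos(angle)

# Made-up sample of an hour-of-day column.
df = pd.DataFrame({"order_hour_of_day": [0, 6, 12, 18, 23]})

# Encoding exactly as in the formula (divides by max(x) = 23).
x_sin, x_cos = encode_cyclic(df["order_hour_of_day"])
df["hour_sin"], df["hour_cos"] = x_sin, x_cos

# Alternative, assuming the true cycle length of 24 hours is known.
s24, c24 = encode_cyclic(df["order_hour_of_day"], period=24)

print(df)
```

Each value becomes a point on the unit circle, so hours that are close on the clock stay close in feature space.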

Whether a product is Organic or not
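
One plausible way to build such a flag is a case-insensitive substring match on the product name. This is an assumption, since the exact rule is not shown here. The column name `is_organic` and the sample rows are hypothetical.

```python
import pandas as pd

# Made-up rows using the dataset's product_name column.
products = pd.DataFrame({
    "product_id": [1, 2, 3],
    "product_name": [
        "Organic Baby Spinach",
        "Whole Milk",
        "Bag of organic bananas",
    ],
})

# 1 if the lowercased product name contains "organic", else 0.
products["is_organic"] = (
    products["product_name"]
    .str.lower()
    .str.contains("organic", regex=False)
    .astype(int)
)

print(products)
```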

10. Model building

Hyper-parameter tuning LGBM using RandomizedSearchCV
Feature importance of final Model

11. Inference Pipeline

12. Summary

Kaggle Submission Score
Success

13. Future Work

14. Profile

15. References

Software Engineer with interest in Data driven analysis using Machine Learning and Deep Learning techniques.
