KaggleX Final Project Submission by Nupur Gokhale | Kaggle

I worked with my mentor and competed in the OTTO competition on Kaggle. Through this experience, I learned how to build a robust local validation framework and the building blocks of recommender systems. I used libraries like Polars, pandas, and NumPy for data manipulation and trained my ranker model with XGBoost. Our team moved up 167 places on the final leaderboard, finishing 477th out of around 2,500 participants. My key takeaway is that the best learning happens by doing, and I'm grateful for my mentor's guidance throughout the project. 🌟🚀📈

Mentorship Program Project 🎓

In this presentation, I will highlight the details of my participation in the KaggleX BIPOC mentorship program. My mentor, Mensur Blakik, and I formed a team to compete in the OTTO competition, a multi-objective recommender system challenge.

Background:

  • Earned an MBA from Syracuse University 🎓
  • Currently working as a data scientist at Cassette Media 📊
  • Aspiring to work as a data scientist in a data-driven organization
  • Interested in expanding ML knowledge and working as an ML engineer

The Project Objective 🧠

The main goal of this project was to predict clicks, carts, and orders for every session in the test set, in the absence of provided features. The explicit recommender system was used for this purpose.

Lessons Learned:

  • Building robust local validation frameworks
  • Understanding retrieval and ranking in recommender systems
  • Using powerful libraries such as Polars, pandas, and NumPy
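
A local validation framework for session data can be sketched roughly as follows: hold out the most recent sessions, then cut each held-out session at a random point so that the earlier events are the model input and the later events are the labels. This is a minimal illustration, not the team's actual code. The event layout, the cutoff timestamp, and the function names `split_sessions_by_time` and `truncate_session` are all assumptions made for the example.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical event layout: (item id, timestamp, event type)
Event = Tuple[int, int, str]


def split_sessions_by_time(
    sessions: Dict[int, List[Event]], cutoff_ts: int
) -> Tuple[Dict[int, List[Event]], Dict[int, List[Event]]]:
    """Sessions starting before cutoff_ts go to train, the rest to validation.

    Assumes each session's events are already sorted by timestamp.
    """
    train, valid = {}, {}
    for session_id, events in sessions.items():
        target = train if events[0][1] < cutoff_ts else valid
        target[session_id] = events
    return train, valid


def truncate_session(
    events: List[Event], rng: random.Random
) -> Tuple[List[Event], List[Event]]:
    """Cut a session at a random point (needs at least 2 events).

    The head is what the model sees; the tail holds the events to predict.
    """
    cut = rng.randint(1, len(events) - 1)
    return events[:cut], events[cut:]
```

Splitting by time rather than at random keeps the local setup close to a leaderboard where the test sessions come after the training period, which is why a local score can track the public score reasonably well.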

Competition Results 🏆

After participating in the competition, our team moved up 167 places in the final leaderboard, achieving a rank of 477 out of approximately 2500 participants.

Detailed Blog Post 📝

Here, I have provided a comprehensive background of the entire project, outlining the various steps involved and the ultimate goal of the proof of concept.

Features and Training

  • Implementing surrogate datasets for easier demonstration
  • Creating a local validation framework for the training data
  • Identifying user behavior using co-visitation matrices
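
To make the co-visitation idea concrete, here is a small sketch under simple assumptions: two items "co-visit" when they appear in the same session, each pair is counted once per session, and the most frequent partners of an item become its candidates. Real pipelines often add time windows and event-type weights, which this example leaves out. The names `build_covisitation` and `candidates_for` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List


def build_covisitation(
    sessions: Iterable[List[int]], top_k: int = 20
) -> Dict[int, List[int]]:
    """Map each item to the top_k items that most often share a session with it."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for items in sessions:
        # Deduplicate so each pair is counted once per session.
        for a, b in combinations(set(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return {
        item: [other for other, _ in partners.most_common(top_k)]
        for item, partners in counts.items()
    }


def candidates_for(
    session_items: List[int], covisit: Dict[int, List[int]], n: int = 20
) -> List[int]:
    """Retrieve candidates by voting over the partners of every item in the session."""
    votes: Counter = Counter()
    for item in session_items:
        for other in covisit.get(item, []):
            votes[other] += 1
    return [item for item, _ in votes.most_common(n)]
```

In a retrieve-then-rank setup, lists like these would form the candidate pool that a ranker model then reorders.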

Training the Model 🤖

  • Using matrix factorization and Word2Vec techniques for model training
  • Generating and analyzing historical e-commerce data features
  • Achieving a local recall score of 65.64%
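
A recall score like the one above can be computed locally along these lines. This is a hedged sketch, not the official scoring code: it assumes up to 20 predictions per session and caps the denominator at `k` per session. The per-type weights in `ASSUMED_WEIGHTS` (0.10 for clicks, 0.30 for carts, 0.60 for orders) are my understanding of the competition metric and should be checked against the competition page.

```python
from typing import Dict, List

# Assumed per-type weights for the combined score; verify before relying on them.
ASSUMED_WEIGHTS = {"clicks": 0.10, "carts": 0.30, "orders": 0.60}


def recall_at_k(
    predictions: Dict[int, List[int]],
    ground_truth: Dict[int, List[int]],
    k: int = 20,
) -> float:
    """Share of ground-truth items recovered in the top-k predictions, over all sessions."""
    hits, total = 0, 0
    for session_id, truth in ground_truth.items():
        truth_set = set(truth)
        predicted = set(predictions.get(session_id, [])[:k])
        hits += len(predicted & truth_set)
        total += min(k, len(truth_set))
    return hits / total if total else 0.0


def weighted_recall(recalls: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-type recall values into a single score."""
    return sum(weights[event_type] * recalls[event_type] for event_type in weights)
```

Computing recall separately for clicks, carts, and orders before combining them makes it easier to see which objective a new feature actually helps.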

Final Model Results

  • Successfully training the model using the entire dataset
  • Validating the model with a test score of 64%

Conclusion and Takeaways 📌

The most valuable learning experiences come from practical application. I am incredibly grateful for the invaluable guidance of my mentor, Mensur, throughout this project.

Key Takeaways:

Learnings | Key Points
Robust local validation framework | Important for data validation
Understanding retrieval and ranking | Vital for creating an effective model
Matrix factorization and Word2Vec | Powerful techniques for model training

Thank you for the opportunity to present this project, and I look forward to future learning and growth as a data scientist.
