November 2023

Taxi Fare Prediction

An extensive end-to-end machine learning pipeline that predicts taxi fares from a range of qualitative and quantitative factors.
  • scikit-learn
  • Python
  • Matplotlib
  • Pandas
  • NumPy

This competition was part of an academic project for the course Machine Learning Project (CS2008P) at the IITM BS in Data Science Program.

  • A Kaggle Competition to build the most accurate models for predicting the total amount paid by travelers for taxi rides.
  • Predicted taxi fares with an R^2 score of 0.946 (94.6%), ranking 63rd out of 714 participants.
  • Performed exploratory data analysis and handled missing values through imputation.
  • Developed a robust system for conveniently testing various scikit-learn estimators. Tested 8+ estimators, including Linear Regression, Decision Trees, Random Forests, and Extra Trees, and visualized key statistics.
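
To make the imputation step concrete, here is a minimal sketch of how missing values could be handled inside a scikit-learn preprocessing pipeline. It is not the project's actual code: the column names (`trip_distance`, `passenger_count`, `payment_type`), the toy data, and the choice of median and most-frequent strategies are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the real competition data may differ.
NUMERIC_COLS = ["trip_distance", "passenger_count"]
CATEGORICAL_COLS = ["payment_type"]


def build_preprocessor():
    """Impute, then scale numeric columns and one-hot encode categorical ones."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    # sparse_threshold=0 forces a dense NumPy array as output.
    return ColumnTransformer(
        [("num", numeric, NUMERIC_COLS), ("cat", categorical, CATEGORICAL_COLS)],
        sparse_threshold=0,
    )


# Toy data with one missing value per column (made up for this sketch).
toy = pd.DataFrame({
    "trip_distance": [1.2, np.nan, 3.5, 0.8],
    "passenger_count": [1, 2, np.nan, 1],
    "payment_type": ["card", "cash", np.nan, "card"],
})
features = build_preprocessor().fit_transform(toy)
```

Keeping the imputers inside the pipeline means their statistics are learned only from the training folds during cross-validation, which avoids leaking information from the validation data.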

This competition interested me because, with the rise of on-demand taxi applications such as Ola and Uber, it’s not always obvious how a fare is calculated from a mix of qualitative and quantitative factors. Working through it was a valuable learning experience.

Additionally, scikit-learn is an indispensable library for developers and analysts alike. However, while training a single model is convenient and straightforward, training a variety of models with different pipelines and hyperparameters is not. In this project, I developed custom functions to train multiple models, compare them, and choose the best among them.
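
A helper for comparing several estimators might look roughly like the sketch below. The function name `compare_estimators`, the synthetic data from `make_regression`, and the use of 5-fold cross-validated R^2 as the ranking metric are assumptions for illustration, not the functions written for this project.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor


def compare_estimators(estimators, X, y, cv=5):
    """Return (name, mean cross-validated R^2) pairs, best first."""
    results = []
    for name, estimator in estimators.items():
        scores = cross_val_score(estimator, X, y, cv=cv, scoring="r2")
        results.append((name, scores.mean()))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Synthetic stand-in for the fare data (hypothetical, for demonstration only).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

candidates = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "extra_trees": ExtraTreesRegressor(n_estimators=50, random_state=0),
}

ranking = compare_estimators(candidates, X, y)
best_name, best_score = ranking[0]
for name, score in ranking:
    print(f"{name}: mean R^2 = {score:.3f}")
```

Because every candidate goes through the same cross-validation splits and the same metric, the resulting ranking is a like-for-like comparison, and adding another estimator only requires one more dictionary entry.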