Highest Demanded Location Prediction System for Taxi Drivers

Highest Demanded Location Prediction System for Taxi Drivers 🚖 | Optimized Fleet Management 📈 | Enhanced Customer Experience 🌟

<a href="https://github.com/cepdnaclk/e19-co544-Demand-Location-Prediction-For-Taxis/issues">
    <img src="https://img.shields.io/github/issues/cepdnaclk/e19-co544-Demand-Location-Prediction-For-Taxis" alt="GitHub issues">
</a>
<a href="https://github.com/cepdnaclk/e19-co544-Demand-Location-Prediction-For-Taxis/pulls">
    <img src="https://img.shields.io/github/issues-pr/cepdnaclk/e19-co544-Demand-Location-Prediction-For-Taxis" alt="GitHub pull requests">
</a>
<a href="https://github.com/cepdnaclk/e19-co544-Demand-Location-Prediction-For-Taxis/releases">
    <img src="https://img.shields.io/github/downloads/cepdnaclk/e19-co544-Demand-Location-Prediction-For-Taxis/total" alt="GitHub downloads">
</a>
<a href="https://github.com/cepdnaclk/e19-co544-Demand-Location-Prediction-For-Taxis/releases">
    <img src="https://img.shields.io/github/v/release/cepdnaclk/e19-co544-Demand-Location-Prediction-For-Taxis" alt="GitHub release">
</a>

</br>

Table of Contents

Introduction
Problem
Opportunity in Domain
Detailed Solution
Data Collection
Exploratory Data Analysis
Getting Started
Contributors
Links
Suggested Approaches

📚 Introduction

This project uses machine learning to predict taxi ride demand across different regions. By forecasting the number of rides for specific locations and times, we aim to create a more efficient and profitable ecosystem for taxi services.

🚧 Problem

For Taxi Drivers:

Less Availability: Long waits without passengers.
Less Profit: Inefficient positioning reduces earnings.

For Customers:

High Stand Time: Long wait times for taxis.
Time Wastage: Delays due to inefficient taxi distribution.
Unfair Pricing: Surge pricing during peak demand.

🛠️ Detailed Solution

We are interested in 265 regions (PUlocations) in New York.
A separate machine learning model is trained for each region. The training data for a model are the dates and the number of rides on each date in that region.
The model can then predict the number of rides expected over the next few days (for example 1, 2, or 3 days ahead) for that particular region.
We do the same for the other 264 regions where data is available. Once each model is given its region's data, we can predict the number of rides for the next day. With next-day predictions for all 265 regions, we can show them on a heatmap.
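
The one-model-per-region flow described above can be sketched as follows. This is only an illustration: `naive_forecast` (a 7-day average) is a stand-in for whichever trained model is used, and the region IDs and ride counts are made-up sample values.

```python
from statistics import mean


def naive_forecast(daily_rides, window=7):
    """Stand-in forecaster: average of the last `window` daily ride counts."""
    return mean(daily_rides[-window:])


def predict_next_day(history_by_region, forecaster=naive_forecast):
    """Apply one forecaster per region and return {region_id: predicted rides}."""
    predictions = {}
    for region_id, daily_rides in history_by_region.items():
        if not daily_rides:
            # Regions without data are skipped rather than predicted.
            continue
        predictions[region_id] = forecaster(daily_rides)
    return predictions


# Hypothetical daily ride counts per PUlocation, oldest day first.
history_by_region = {
    1: [10, 12, 11, 13, 12, 14, 12],
    2: [200, 220, 210, 230, 240, 250, 230],
    3: [],
}

predictions = predict_next_day(history_by_region)

# Regions ordered by predicted demand, highest first.
# These per-region values are what a heatmap would colour.
ranking = sorted(predictions, key=predictions.get, reverse=True)
```

Because the forecaster is passed in as a function, a trained model's prediction routine could be swapped in later without changing the loop over regions.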

📊 Data Collection

Our primary data source is the New York City Taxi and Limousine Commission website. The data is organized by month, with approximately 20 features, and spans over 20 years.

🔍 Exploratory Data Analysis

Steps Involved

Remove columns with a high percentage of null values.
Remove any rows from the DataFrame that contain missing values (NaN).
Calculate trip times and speeds, then remove the rows with unusual speeds or trip times.
Visualize the parameters of interest in box plots and check for outliers.
Convert the raw pickup times into datetime objects. Since we are not interested in the time of day, we can drop it.
Finally, keep only the date of the ride (based on the pickup time) and the PUlocation.
Save the results as .csv files by month, giving 12 CSV files per year named in the format YYYY-MM.
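
A minimal sketch of the row-level steps above (dropping missing values, filtering by trip time and speed, keeping only date and pickup location, and grouping by month), using only the standard library. The column names and the speed and duration thresholds are assumptions chosen for illustration, not values taken from this project. The column-removal and box-plot steps are not covered here.

```python
from collections import Counter
from datetime import datetime

# Assumed column names and hypothetical thresholds (not specified in this README).
PICKUP, DROPOFF = "tpep_pickup_datetime", "tpep_dropoff_datetime"
DISTANCE, LOCATION = "trip_distance", "PULocationID"
MAX_SPEED_MPH = 80.0
MAX_TRIP_HOURS = 3.0
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def clean_and_count(rows):
    """Return a Counter mapping (date string, pickup location) to ride count."""
    counts = Counter()
    required = (PICKUP, DROPOFF, DISTANCE, LOCATION)
    for row in rows:
        # Drop rows with any missing value.
        if any(row.get(col) in (None, "") for col in required):
            continue
        pickup = datetime.strptime(row[PICKUP], TIME_FORMAT)
        dropoff = datetime.strptime(row[DROPOFF], TIME_FORMAT)
        hours = (dropoff - pickup).total_seconds() / 3600
        # Drop trips with unusual durations.
        if hours <= 0 or hours > MAX_TRIP_HOURS:
            continue
        # Drop trips with unusual average speeds.
        if float(row[DISTANCE]) / hours > MAX_SPEED_MPH:
            continue
        # Keep only the date (time of day is discarded) and the location.
        counts[(pickup.date().isoformat(), int(row[LOCATION]))] += 1
    return counts


def split_by_month(counts):
    """Group the counts under 'YYYY-MM' keys, one group per monthly CSV file."""
    by_month = {}
    for (date, location), rides in sorted(counts.items()):
        by_month.setdefault(date[:7], []).append((date, location, rides))
    return by_month


# Made-up sample rows for illustration.
rows = [
    {PICKUP: "2023-01-05 08:00:00", DROPOFF: "2023-01-05 08:30:00", DISTANCE: "5.0", LOCATION: "132"},
    {PICKUP: "2023-01-05 09:00:00", DROPOFF: "2023-01-05 09:15:00", DISTANCE: "2.0", LOCATION: "132"},
    {PICKUP: "2023-01-05 10:00:00", DROPOFF: "2023-01-05 10:01:00", DISTANCE: "30.0", LOCATION: "132"},
    {PICKUP: "2023-02-01 12:00:00", DROPOFF: "", DISTANCE: "1.0", LOCATION: "48"},
    {PICKUP: "2023-02-01 13:00:00", DROPOFF: "2023-02-01 13:20:00", DISTANCE: "3.0", LOCATION: "48"},
]

counts = clean_and_count(rows)
by_month = split_by_month(counts)
```

Each key of `by_month` matches the YYYY-MM naming format, so writing one CSV file per key would give 12 files for a full year.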

🚀 Getting Started

Anybody can explore this project and gain insights. It’s easy.

Clone the repository to the htdocs folder inside the XAMPP installation location.

Now we can consider one pu_location at a time, with its respective dates, and train the machine learning model.

💡 Suggested Approaches

Autoregression (AR) model
ARIMA model
Neural network
LSTM model
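
To illustrate the simplest of these approaches, here is a minimal sketch of a first-order autoregression, y[t] = c + phi * y[t-1], fitted by ordinary least squares in plain Python. It is an illustrative example only: the function names are hypothetical, the sample series is synthetic, and a real model would likely use more lags and a dedicated library.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    x = series[:-1]  # previous-day values
    y = series[1:]   # current-day values
    n = len(x)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        # Constant series: predict the mean, with no dependence on the lag.
        return mean_y, 0.0
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    return c, phi


def forecast(series, steps=3):
    """Forecast `steps` days ahead by feeding each prediction back in."""
    c, phi = fit_ar1(series)
    last = series[-1]
    predictions = []
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions


# Synthetic daily ride counts that follow y[t] = 10 + 0.5 * y[t-1] exactly.
daily_rides = [100.0, 60.0, 40.0, 30.0, 25.0, 22.5]
c, phi = fit_ar1(daily_rides)
next_days = forecast(daily_rides, steps=3)
```

Forecasting several days ahead reuses each prediction as the next input, so errors accumulate with the horizon.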

👥 Contributors

E/19/034, H.M.K.D. Bambaragama, email
E/19/226, K.G.M. Madushanka, email
E/19/278, A.P.T.T. Perera, email
E/19/409, D.P. Udugamasooriya, email
E/19/432, U.I. Wickramaarachchi, email
