Stock Mentor

Team

E/18/013, Abhilash R.
E/18/058, De Alwis K. K. M.
E/18/115, Gowsigan A.


Background
Problem
Solution
Impact
User view
Timeline
Links

Background

As the image above illustrates, a stock market position falls into one of three categories:
- Long: buying stocks at a lower price and later selling them at a higher price.
- Short: selling stocks at a higher price first and later buying them back at a lower price.
- Hold: neither buying nor selling stocks.
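To make the three positions concrete, the hypothetical helper below computes the profit or loss of each one. It is a simplified sketch that assumes a fixed number of shares and ignores fees, taxes, and the borrowing costs involved in real short selling.

```python
def position_pnl(position, entry_price, exit_price, shares=1):
    """Profit (positive) or loss (negative) for one simplified position.

    Long:  buy at entry_price, sell later at exit_price.
    Short: sell at entry_price first, buy back later at exit_price.
    Hold:  no trade, so no profit or loss.
    Fees and short-selling borrowing costs are deliberately ignored.
    """
    if position == "Long":
        return (exit_price - entry_price) * shares
    if position == "Short":
        return (entry_price - exit_price) * shares
    if position == "Hold":
        return 0.0
    raise ValueError(f"unknown position: {position!r}")


if __name__ == "__main__":
    print(position_pnl("Long", 100.0, 110.0, 10))   # 100.0: price rose after buying
    print(position_pnl("Short", 110.0, 100.0, 10))  # 100.0: price fell after selling
    print(position_pnl("Hold", 100.0, 110.0, 10))   # 0.0: no trade made
```

A Long position profits when the price rises and a Short position profits when it falls, which is why both appear as separate classes alongside Hold.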

Problem

- Investors decide daily whether to buy, sell, or hold stocks.
- Predicting market movements accurately is challenging, which exposes investors to risk and potential losses.
- Investors should therefore conduct thorough research and seek professional advice before making investment decisions.

Solution

We developed a web application that provides investors with recommendations on stock trading decisions for different companies. The application offers daily suggestions on whether to buy, sell, or hold a stock, helping investors make informed choices and navigate the dynamic stock market effectively.

The following factors were taken into consideration when designing the solution:

Classification

Three key categories
- Long
- Hold
- Short
Imbalanced data

An imbalanced data set refers to a dataset where the distribution of classes or labels is heavily skewed.
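A quick way to measure how skewed a label column is would be to count each class and compare the largest and smallest counts. The sketch below does this with the standard library; the function names and the toy labels are hypothetical and are not the project's data.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the dataset as a fraction of all rows."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Largest class count divided by smallest class count (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    # Toy labels for illustration only, not the project's dataset.
    labels = ["Hold"] * 80 + ["Long"] * 12 + ["Short"] * 8
    print(class_distribution(labels))  # {'Hold': 0.8, 'Long': 0.12, 'Short': 0.08}
    print(imbalance_ratio(labels))     # 10.0
```

A ratio far above 1.0 signals that a classifier could score well simply by predicting the majority class, which motivates resampling techniques such as SMOTE.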

Impact

Hundreds of stock prediction models are available, but more than 70% of them make incorrect predictions for the stock market. This is primarily because these models rely solely on software knowledge and ignore the crucial aspect of business knowledge. Our application stands out by incorporating business insights from the project owner, which we expect to yield higher accuracy and a greater impact.

Investor Confidence: Accurate predictions can enhance investor confidence in the stock market. If investors have access to reliable predictions of hold, buy, or sell periods, they may make more informed decisions, leading to increased participation and potentially higher trading volumes.
Increased Accuracy in Trading Decisions: Algorithmic trading systems can potentially make more precise, data-driven choices when executing trades.

Challenges & remedies

Difficulty finding a suitable dataset for the project

Used the ParseHub web scraping tool.
Obtained historical data from Yahoo Finance.

The dataset contained unclear data, unwanted columns, and duplicate values.

Steps taken to resolve this:

Explored the data to see what story it tells about the stocks.
Checked for null values and imputed them.
Checked each feature to determine whether it is useful.
Plotted the features to understand the relationships between them.
Checked whether the features are correlated with one another.
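The cleaning steps above could look roughly like the following standard-library sketch. It assumes a Yahoo-Finance-style CSV header (Date, Open, High, Low, Close, Adj Close, Volume); the kept columns (`KEEP`), the forward-fill imputation strategy, and the sample data are hypothetical choices, since the project's actual tooling and feature set are not stated.

```python
import csv
import io
import math

# Hypothetical feature selection: the project's real column choices are not documented.
KEEP = ["Date", "Open", "High", "Low", "Close", "Volume"]
MISSING = {"", "null", "NaN"}


def load_and_clean(csv_text):
    """Parse Yahoo-Finance-style CSV text, drop unwanted columns,
    remove exact duplicate rows, and forward-fill missing numeric values."""
    rows, seen = [], set()
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = {col: (record.get(col) or "").strip() for col in KEEP}  # drop other columns
        key = tuple(row.values())
        if key in seen:  # skip rows that duplicate an earlier row
            continue
        seen.add(key)
        rows.append(row)

    last_seen = {}
    for row in rows:
        for col in KEEP[1:]:  # every kept column except Date is numeric
            if row[col] in MISSING:
                # Impute with the previous valid value (None if there is none yet).
                row[col] = last_seen.get(col)
            else:
                row[col] = float(row[col])
                last_seen[col] = row[col]
    return rows


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (fails if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Made-up sample rows for illustration only.
SAMPLE = """Date,Open,High,Low,Close,Adj Close,Volume
2023-01-02,10.0,11.0,9.5,10.5,10.5,1000
2023-01-02,10.0,11.0,9.5,10.5,10.5,1000
2023-01-03,10.5,12.0,10.0,,11.5,1500
2023-01-04,11.5,12.5,11.0,12.0,12.0,1800
"""

if __name__ == "__main__":
    rows = load_and_clean(SAMPLE)
    print(len(rows))         # 3: the duplicated 2023-01-02 row is dropped
    print(rows[1]["Close"])  # 10.5: missing Close forward-filled from the previous day
    opens = [r["Open"] for r in rows]
    closes = [r["Close"] for r in rows]
    print(pearson(opens, closes))
```

Forward-filling is only one reasonable imputation choice for daily price series; mean or interpolation-based imputation would slot into the same place.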

Unable to obtain the target class for the dataset

Leveraged predefined stock market indicators.
Used them as the basis for determining the target class of each data entry.
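The project does not name the indicators it used, so the sketch below illustrates the idea with one common indicator, a simple moving average (SMA). The labelling rule, the `window` and `band` parameters, and the function names are all hypothetical and are not the team's actual method: a row is labelled Long if its close sits more than `band` above its SMA, Short if more than `band` below, and Hold otherwise.

```python
def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def label_rows(closes, window=3, band=0.01):
    """Hypothetical indicator-based labelling rule (not the project's own):
    Long if close > SMA * (1 + band), Short if close < SMA * (1 - band), else Hold."""
    labels = []
    for close, avg in zip(closes, sma(closes, window)):
        if avg is None:
            labels.append(None)  # not enough history to compute the indicator
        elif close > avg * (1 + band):
            labels.append("Long")
        elif close < avg * (1 - band):
            labels.append("Short")
        else:
            labels.append("Hold")
    return labels


if __name__ == "__main__":
    print(label_rows([10, 10, 10, 12, 9, 9]))
    # [None, None, 'Hold', 'Long', 'Short', 'Short']
```

The `band` threshold keeps small fluctuations in the Hold class; a wider band produces more Hold labels and therefore a more imbalanced dataset.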

User view

[Screenshots: user view of the application]

Project Progression

[Figure: project timeline]

We completed our project following this workflow:

In the 7th week, we identified the problem.
From the 8th to the 11th week, we defined the ML model and dataset, while simultaneously developing the web application.
During weeks 12 and 13, we created the ML model and integrated it into the application.
Finally, we completed the project in the 14th and final week.

Throughout each development phase, we adhered to the Agile methodology and conducted testing every week.

Process

We divided the project into smaller tasks and allocated them among team members using a Kanban board. For discussions, we used a Slack workspace.

ML Evaluation

Our dataset has an imbalanced class distribution, which is typical of stock market data. The image below shows this distribution:

[Screenshot: class distribution of the dataset]

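We apply SMOTE (Synthetic Minority Over-sampling Technique) to the training data. SMOTE generates synthetic samples using the k-nearest neighbours (KNN) algorithm: for a sample of a minority class, it finds that sample's nearest neighbours within the same class and creates new rows by interpolating between them. In this way it captures the local behaviour of each class label and produces synthetic rows that follow that behaviour.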
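The SMOTE procedure described above can be sketched in plain Python as follows. This is a minimal illustration of the technique, not the implementation the team used (which is not stated); in practice, the imbalanced-learn library provides `SMOTE(k_neighbors=...).fit_resample(X, y)`. The function names and toy data here are hypothetical.

```python
import math
import random
from collections import Counter


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic new points for one class, SMOTE-style.

    Each new point lies on the segment between a randomly chosen sample
    and one of its k nearest neighbours (Euclidean) from the same class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two samples in the class")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def oversample_training_set(X_train, y_train, k=5, seed=0):
    """Oversample every minority class up to the majority-class count.

    Apply this to the training split only; the test split stays untouched
    so that evaluation reflects the real class distribution.
    """
    counts = Counter(y_train)
    target = max(counts.values())
    X_out = [list(x) for x in X_train]
    y_out = list(y_train)
    for label, count in counts.items():
        if count < target:
            members = [x for x, y in zip(X_train, y_train) if y == label]
            new_points = smote(members, target - count, k=k, seed=seed)
            X_out.extend(new_points)
            y_out.extend([label] * len(new_points))
    return X_out, y_out


if __name__ == "__main__":
    # Toy two-feature data, not the project's dataset.
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5]]
    y = ["Hold", "Hold", "Hold", "Hold", "Long", "Long", "Long"]
    X_bal, y_bal = oversample_training_set(X, y, k=2)
    print(sorted(Counter(y_bal).items()))  # [('Hold', 4), ('Long', 4)]
    print(X_bal[-1])  # a synthetic Long point between two original Long points
```

Resampling only after the train/test split matters: oversampling before splitting would leak synthetic copies of test-set neighbours into training and inflate the measured scores.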

Since this is a classification problem, we used a random forest model trained on the SMOTE-resampled training data. The following shows the model's performance on the training and test datasets:

[Screenshot: model performance on the training and test datasets]

Testing

We tested the API endpoints using the Postman application. We also unit-tested each frontend component, both manually and by writing test cases.

Model Selection: We evaluated multiple machine learning models and selected the one with the highest F1 score (the harmonic mean of precision and recall), which indicates the best overall performance.
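The F1-based model selection could be sketched as below. The document does not say how per-class scores were averaged, so macro averaging (each of Long, Hold, and Short weighted equally) is an assumption here, chosen because it keeps minority classes from being drowned out on imbalanced data. The model names and predictions are hypothetical.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 score for one class: harmonic mean of its precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so minority classes count equally."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


def select_best_model(y_true, predictions_by_model):
    """Return (model_name, score) for the model with the highest macro F1."""
    scores = {name: macro_f1(y_true, preds) for name, preds in predictions_by_model.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical test labels and predictions, for illustration only.
    y_test = ["Long", "Hold", "Short", "Hold"]
    candidates = {
        "random_forest": ["Long", "Hold", "Short", "Hold"],
        "always_hold": ["Hold", "Hold", "Hold", "Hold"],
    }
    print(select_best_model(y_test, candidates))  # ('random_forest', 1.0)
```

The "always_hold" baseline reaches 50% accuracy here but only about 0.22 macro F1, which illustrates why F1 is a better selection criterion than accuracy for imbalanced stock labels.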

[Screenshot: testing results]