Stock Mentor
Team
Background
As the figure above illustrates, a person trading in the stock market takes one of three positions:
- Long: buy stocks at a lower price and sell them later at a higher price.
- Short: sell stocks first at a higher price and buy them back later at a lower price.
- Hold: neither buy nor sell stocks.
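To make the three positions concrete, here is a minimal Python sketch of the profit from one round trip under each position. It assumes a fixed number of shares and ignores fees, borrowing costs, and taxes; the function name `position_profit` is hypothetical, for illustration only.

```python
def position_profit(position, entry_price, exit_price, shares=1):
    """Profit (ignoring fees) for one round trip under a given position.

    Long:  buy at entry_price, sell at exit_price -> gains when the price rises.
    Short: sell at entry_price, buy back at exit_price -> gains when the price falls.
    Hold:  no trade, so no profit or loss.
    """
    if position == "Long":
        return (exit_price - entry_price) * shares
    if position == "Short":
        return (entry_price - exit_price) * shares
    if position == "Hold":
        return 0.0
    raise ValueError(f"unknown position: {position!r}")


# The price moves from 100 to 110: Long profits, Short loses, Hold is flat.
for pos in ("Long", "Short", "Hold"):
    print(pos, position_profit(pos, 100.0, 110.0))  # Long 10.0 / Short -10.0 / Hold 0.0
```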
Problem
In the stock market, investors buy, sell, or hold stocks on a daily basis.
- Predicting market movements accurately is challenging, which exposes investors to risk and potential losses.
- Investors should conduct thorough research and seek professional advice before making investment decisions.
Solution
We developed a web application that provides investors with recommendations on stock trading decisions for different companies. The application offers daily suggestions on whether to buy, sell, or hold a stock, helping investors make informed choices and navigate the dynamic stock market more effectively.
The following facts were taken into consideration when designing the solution:
Classification
Three key categories
- Long
- Hold
- Short
Imbalanced data
An imbalanced dataset is one in which the distribution of classes (labels) is heavily skewed, so that some classes have far more samples than others.
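To check how skewed the labels are, one can count each class and compute the imbalance ratio (largest class size divided by smallest class size). A minimal sketch using only the standard library; the function name and the sample labels are hypothetical.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the dataset and the imbalance ratio
    (size of the largest class divided by size of the smallest)."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: count / total for label, count in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return shares, ratio


# Hypothetical labels with one dominant class.
labels = ["Hold"] * 70 + ["Long"] * 20 + ["Short"] * 10
shares, ratio = class_distribution(labels)
print(shares)  # {'Hold': 0.7, 'Long': 0.2, 'Short': 0.1}
print(ratio)   # 7.0
```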
Impact
Hundreds of stock prediction models are available, but more than 70% of them make incorrect predictions. This is primarily because these models are built solely on software knowledge, without considering the crucial aspect of business knowledge. Our application stands out by incorporating business insights from the project owner, with the aim of achieving higher accuracy and greater impact.
- Investor Confidence: Accurate predictions can enhance investor confidence in the stock market. If investors have access to reliable predictions of buy, hold, or sell periods, they may make more informed decisions, leading to increased participation and potentially higher trading volumes.
- Increased Accuracy in Trading Decisions: Algorithmic trading systems can make more precise, data-driven choices when executing trades.
Challenges & remedies
Difficulties in finding a suitable dataset for the project
- Used the ParseHub web scraping tool.
- Obtained historical data from Yahoo Finance.
The dataset contained unclear data, unwanted columns, and duplicate values.
Steps taken to resolve this:
- Explored the data to understand what it reveals about the stocks.
- Checked for null values and imputed them where possible.
- Examined each feature to decide whether it was useful.
- Created plots to understand the relationships between features.
- Checked for correlations between features.
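The cleaning and correlation checks listed above can be sketched in plain Python so that the example runs without extra packages. This is a minimal sketch under stated assumptions: rows are dictionaries, `None` marks a missing value, and missing values are filled with the column median (the slides do not say which imputation method was used). The function names, column names, and sample rows are hypothetical.

```python
import math
from statistics import median


def clean_rows(rows, keep_columns):
    """Keep only `keep_columns`, drop exact duplicate rows, and fill missing
    values (None) with the column median."""
    seen, cleaned = set(), []
    for row in rows:
        trimmed = tuple(row.get(col) for col in keep_columns)
        # Rows that differ only in dropped columns are treated as duplicates.
        if trimmed not in seen:
            seen.add(trimmed)
            cleaned.append(dict(zip(keep_columns, trimmed)))
    for col in keep_columns:
        if any(r[col] is None for r in cleaned):
            # Median imputation; assumes the column is numeric and not entirely empty.
            fill = median(r[col] for r in cleaned if r[col] is not None)
            for r in cleaned:
                if r[col] is None:
                    r[col] = fill
    return cleaned


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


raw = [  # hypothetical scraped rows
    {"Date": "2023-01-02", "Open": 10.0, "Close": 11.0, "Junk": "x"},
    {"Date": "2023-01-02", "Open": 10.0, "Close": 11.0, "Junk": "x"},  # duplicate
    {"Date": "2023-01-03", "Open": None, "Close": 12.0, "Junk": "y"},  # missing Open
    {"Date": "2023-01-04", "Open": 12.0, "Close": 13.0, "Junk": "z"},
]
rows = clean_rows(raw, ["Date", "Open", "Close"])
opens = [r["Open"] for r in rows]
closes = [r["Close"] for r in rows]
print(len(rows), opens)               # 3 [10.0, 11.0, 12.0]
print(round(pearson(opens, closes), 3))  # 1.0
```

Median imputation is used here only because it is less sensitive to price spikes than the mean; any other imputation rule would slot into the same place.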
Unable to obtain the target class for the dataset
- Leveraged predefined stock market indicators.
- These indicators served as the basis for determining the target class (Long, Hold, or Short) of each data entry.
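The slides do not say which indicators were used. As a hypothetical illustration only, the sketch below labels each day with the Relative Strength Index (RSI) and the conventional 30/70 oversold/overbought thresholds. It uses the simple-average (Cutler) form of RSI rather than Wilder's smoothing; the function names and thresholds are assumptions, not the project's actual labeling rule.

```python
def rsi(closes, period=14):
    """Simple-average RSI (Cutler's variant) per day; None until a full
    window of `period` price changes is available."""
    changes = [closes[i] - closes[i - 1] for i in range(1, len(closes))]
    values = [None] * len(closes)
    for end in range(period, len(closes)):
        window = changes[end - period:end]  # the `period` changes ending at day `end`
        gain = sum(c for c in window if c > 0) / period
        loss = sum(-c for c in window if c < 0) / period
        if loss == 0:
            values[end] = 100.0 if gain > 0 else 50.0  # all gains, or a flat window
        else:
            values[end] = 100 - 100 / (1 + gain / loss)
    return values


def label_from_rsi(value, oversold=30, overbought=70):
    """Hypothetical rule mapping an RSI value to a target class."""
    if value is None:
        return None      # not enough history to compute RSI yet
    if value < oversold:
        return "Long"    # oversold: a rebound is expected
    if value > overbought:
        return "Short"   # overbought: a pullback is expected
    return "Hold"


closes = [float(p) for p in range(100, 120)]  # hypothetical steadily rising prices
labels = [label_from_rsi(v) for v in rsi(closes)]
print(labels[-1])  # Short (RSI is 100 when every change is a gain)
```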
User view
Project Progression
We completed our project following this workflow:
- In the 7th week, we identified the problem.
- From the 8th to the 11th week, we defined the ML model and dataset, while simultaneously developing the web application.
- During weeks 12 and 13, we built the ML model and integrated it into the application.
- Finally, we completed the project in the 14th and final week.
Throughout each development phase, we followed the Agile methodology and tested every week.
Process
We divided the project into smaller tasks, allocated them among team members, and tracked them on a Kanban board. We used a Slack workspace for team discussions.
ML Evaluation
Our dataset has an imbalanced class distribution, which is typical of stock market datasets, as the figure shows.
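To address this, we apply SMOTE (Synthetic Minority Over-sampling Technique) to the training data. SMOTE generates synthetic samples using the k-nearest neighbours (KNN) algorithm: for a minority-class sample, it picks one of its nearest neighbours from the same class and creates a new point on the line segment between them. In this way, it captures the local behaviour of a class and produces synthetic rows that follow that behaviour.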
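A minimal pure-Python sketch of the SMOTE interpolation step for a single minority class, to show how the k-nearest-neighbour idea produces new rows. The function name, the seed, and the sample feature vectors are hypothetical; this is an illustration of the technique, not the project's code.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows from minority-class feature vectors.

    Each synthetic row: pick a random minority sample, find its k nearest
    minority-class neighbours (Euclidean distance), pick one of them at random,
    and place a new point at a random position on the segment between the two.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class, excluding itself
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority class (e.g. scaled indicator values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_rows = smote(minority, n_new=4, k=2)
print(len(new_rows))  # 4
```

Resampling only the training split keeps synthetic points out of the test set, so test scores still reflect real data. In practice, a library such as imbalanced-learn provides this as `SMOTE(k_neighbors=5).fit_resample(X_train, y_train)`, which handles every minority class at once.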
Because this is a classification problem, we use a random forest model trained on the SMOTE-resampled training data. The figure shows the model's performance on the training and test datasets.
Testing
We conducted API testing with the Postman application, testing each endpoint. We also unit-tested each frontend component, both manually and with test cases. Model selection: we tested multiple machine learning models and selected the one with the highest F1 score. Because F1 is the harmonic mean of precision and recall, it reflects overall performance on imbalanced data better than accuracy alone.
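The slides do not say which F1 variant was used. With three imbalanced classes, macro-averaged F1 is a common choice, so the sketch below assumes it and picks the candidate model with the highest score. The model names and predictions are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so minority classes count as much
    as the majority class."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


def select_best(predictions, y_true):
    """Return (model_name, macro F1) for the highest-scoring model."""
    scored = {name: macro_f1(y_true, y_pred) for name, y_pred in predictions.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


y_true = ["Hold", "Hold", "Hold", "Hold", "Long", "Short"]
predictions = {  # hypothetical test-set predictions from two candidate models
    "always_hold": ["Hold"] * 6,
    "model_b": ["Hold", "Hold", "Hold", "Long", "Long", "Short"],
}
print(select_best(predictions, y_true)[0])  # model_b
```

The `always_hold` baseline is right on 4 of 6 days (about 67% accuracy) but scores a macro F1 of only about 0.27, because it never predicts Long or Short. If the project already depends on scikit-learn, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same quantity.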