Document Tag Generator



Table of Contents

  1. Introduction
  2. Local Installation
  3. Links


✨ Problem

The project website of the Department of Computer Engineering currently has nearly 150 projects. These projects are categorized only by batch and by subject. Some projects have tags, but not all of those tags are relevant to the project, and some projects have no tags at all. Users can currently search projects by keyword, but those keywords are derived only from the project descriptions.


✨ Our goal

Our goal is to generate relevant tags for each project from its description and other relevant data available on its project page.

✨ Solution

Our plan is to build an ML model that generates relevant tags. The data needed to train the model is collected from the project pages and the project repositories. The website's API is used to obtain the details of each project (links to its repository and project page, plus other metadata), and a scraping tool will then be used to extract the data from the project pages.
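The scraping step could look roughly like the sketch below. It is only an illustration, assuming the useful text on a project page sits inside `<p>` elements; the class and function names are hypothetical, and it uses Python's standard `html.parser` rather than whatever scraping tool the team ends up choosing.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Hypothetical helper: collects the text inside <p> elements of a page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_paragraph = False
        self._buffer: list[str] = []
        self.paragraphs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            # Collapse whitespace and newlines; skip empty paragraphs.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        # Text from nested tags such as <b> is kept while inside a <p>.
        if self._in_paragraph:
            self._buffer.append(data)


def extract_page_text(html: str) -> str:
    """Return the paragraph text of a project page, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)
```

In practice the HTML would come from a request such as `requests.get(page_url, timeout=10).text`, where `page_url` is the project page link returned by the website's API.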

Project Owner: Mr. Nuwan Jaliyagoda
Scrum Master: Mr. Thushara Bandara

✨ Our Team

E/17/100 - Gunathilaka R.M.S.M

E/17/246 - Perera K.S.D

E/17/284 - Rathnayaka R.L.D.A.S

Local Installation

The site is built with Jekyll and hosted on GitHub Pages.

The current Ruby version is 2.7.1. Install it with rbenv:

rbenv install 2.7.1
rbenv global 2.7.1

For the API, install this additional Python package:

pip install requests
cd ./python_scripts/



The frontend and backend of the department projects website are already implemented. In the current implementation, users can search projects using tags, but the tagging is done by a simple algorithm that only checks whether the project description contains the search tag. Our goal is to implement a machine learning model that tags projects much more accurately.
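A substring check of the kind described above can be sketched in a few lines. This is an illustrative reconstruction, not the website's actual code: the function name is hypothetical and the case-insensitive matching is an assumption.

```python
def naive_tags(description: str, candidate_tags: list[str]) -> list[str]:
    """Hypothetical sketch: return every candidate tag that appears
    as a substring of the project description (case-insensitive)."""
    text = description.lower()
    return [tag for tag in candidate_tags if tag.lower() in text]


# "Java" matches inside "JavaScript", a false positive, while a project
# about machine learning that never uses that exact phrase gets no such tag.
print(naive_tags("A JavaScript dashboard for IoT sensors",
                 ["Java", "IoT", "Machine Learning"]))
```

This illustrates why pure substring matching is a weak tagger: it cannot tell whole words from fragments and it cannot infer a tag that is implied but not written verbatim.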

To train the machine learning model we need a dataset that contains the details of the projects. We plan to build this dataset from the project descriptions, project repositories, and project pages, and then use it to train an ML model that tags the projects on the department website more accurately.
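Before a trained model exists, a simple keyword baseline is useful for comparison. The sketch below scores words with TF-IDF over a hypothetical dictionary of `project_id -> text` (for example, description plus scraped page text). It is a swapped-in technique for illustration, not the model the team will train, and the stopword list and tokenizer are assumptions.

```python
import math
import re
from collections import Counter

# Assumed minimal stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "for", "of", "to", "in", "on",
             "with", "using", "is", "this", "that", "by"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep word-like tokens (e.g. 'c++', 'c#', 'iot')."""
    return re.findall(r"[a-z][a-z0-9+#]*", text.lower())


def tfidf_tags(documents: dict[str, str], top_k: int = 3) -> dict[str, list[str]]:
    """Return the top_k highest TF-IDF terms for each project as candidate tags."""
    tokenized = {
        pid: [t for t in tokenize(text) if t not in STOPWORDS]
        for pid, text in documents.items()
    }
    n_docs = len(tokenized)

    # Document frequency: in how many projects each term appears.
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    result = {}
    for pid, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores = {
            # Normalized term frequency times smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        # Sort by score (descending), then alphabetically for deterministic ties.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result[pid] = ranked[:top_k]
    return result
```

Because the smoothed IDF never reaches zero, very frequent words can still rank highly; a trained model should be able to do better, which is the point of comparing against this baseline.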

After implementing the ML model, we will integrate it with the backend of the department website. The model is then run to generate tags, which are stored in a JSON file inside the backend repository.
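One possible way to produce that JSON file is sketched below, assuming the model outputs a mapping from project ID to a list of tags and that the file should be indexed by tag so searches are a single lookup. The function names, the file name `tags.json`, and the file layout are all hypothetical.

```python
import json
from pathlib import Path


def build_tag_index(project_tags: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert {project_id: [tags]} into {tag: [project_ids]} (hypothetical layout)."""
    index: dict[str, list[str]] = {}
    for project_id, tags in project_tags.items():
        for tag in tags:
            index.setdefault(tag, []).append(project_id)
    # Sort tags and project IDs so the file diffs cleanly between runs.
    return {tag: sorted(ids) for tag, ids in sorted(index.items())}


def write_tags_file(project_tags: dict[str, list[str]], path: Path) -> None:
    """Write the tag index as pretty-printed JSON, e.g. to tags.json."""
    index = build_tag_index(project_tags)
    path.write_text(json.dumps(index, indent=2, sort_keys=True) + "\n",
                    encoding="utf-8")
```

Keeping the output sorted means that re-running the model only changes the file when the tags actually change, which keeps the backend repository's history readable.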

The backend of the department projects website can be accessed through an API. It provides an endpoint that serves this JSON file, which contains all the generated tags and their corresponding projects. When a user searches for projects by tag, the relevant projects are looked up in this tags file and shown to the user.
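A tag search over that file could work as in the following sketch. It assumes the JSON maps each tag to a list of project IDs; the function name is hypothetical, and the JSON text would in practice be fetched from the backend endpoint (for example with `requests.get(...)`), whose URL is not given here.

```python
import json


def search_by_tag(tags_json: str, query: str) -> list[str]:
    """Return the project IDs listed under a tag, ignoring case and
    surrounding whitespace. Assumes a {tag: [project_ids]} layout."""
    index = json.loads(tags_json)
    # Normalize tag keys once; if two tags differ only by case,
    # the later one in the file wins.
    normalized = {tag.casefold(): projects for tag, projects in index.items()}
    return normalized.get(query.strip().casefold(), [])


tags_json = '{"IoT": ["p1", "p2"], "deep learning": ["p1"]}'
print(search_by_tag(tags_json, " iot "))
```

An unknown tag simply returns an empty list, so the frontend can show "no results" without special error handling.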

When a new project is added to the department projects website, the ML model must be run again to update the tags file. GitHub Actions can be used to automate this.
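A scheduled job could decide whether a rerun is needed by comparing the project list from the API with the projects already present in the tags file. The helper below is a hypothetical sketch of that check, assuming the same `{tag: [project_ids]}` layout; a GitHub Actions step could call it and only rerun the model when the result is non-empty.

```python
def projects_needing_tags(current_project_ids: list[str],
                          tag_index: dict[str, list[str]]) -> list[str]:
    """Return IDs of projects that appear in the API listing but not in
    any tag's project list (hypothetical helper)."""
    tagged = {pid for projects in tag_index.values() for pid in projects}
    return sorted(set(current_project_ids) - tagged)
```

Note that a project for which the model produced no tags will be reported on every run; that is acceptable if the job reruns the model anyway, but a stricter check would also record which projects have already been processed.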

Since project pages and project repositories are updated regularly, we plan to run the ML model weekly.