ABOUT OUR PRODUCT
While DNA sequencing is an integral part of genetics research, there is a significant lack of fully-functional tools that provide easy access to DNA sequencing in a cost-effective manner.
Our goal is to minimize the inefficiencies faced by professionals who conduct NGS for Chloroplast, Mitochondria and plant ITS regions by developing a complete solution that incorporates existing DNA sequencing pipelines into a single environment that is both intuitive and accessible.
- Provides a GUI environment integrating all the necessary tools for the pipelines.
- Automates many of the intermediate steps while providing the user with the relevant information and statistics for each step.
- Provides suggestions to the user on possible next steps to take in case there are multiple possibilities.
Our product focuses on Next Generation Sequencing (NGS) and the skim sequencing technique.
DNA Sequencing is the process of determining the order of nucleotide bases A, G, T, C in a DNA molecule. Knowledge of DNA sequences has become indispensable for biological research in a wide variety of fields ranging from diagnosis of diseases in medicine to improvement of crops in the field of agriculture.
Next Generation Sequencing is a series of highly-scalable methods and techniques that allow for high throughput sequencing of DNA by fragmenting the genome and sequencing randomly sampled fragments.
Skim sequencing is a cost-effective technique that generates fragments of DNA known as genome skims. Because genome skims are sequenced at shallow depth, computationally efficient tools are required to assemble them back into the complete genome.
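To make "shallow depth" concrete, the following is a minimal sketch that samples read positions at random from a toy genome and reports the resulting coverage. The genome size, read length and read count are illustrative assumptions, not values used by our software, and real sequencing data is not uniformly distributed.

```python
import random


def simulate_skim(genome_length, read_length, n_reads, seed=0):
    """Sample read start positions uniformly and return the per-base depth."""
    rng = random.Random(seed)
    depth = [0] * genome_length
    for _ in range(n_reads):
        # Every read lies fully inside the genome.
        start = rng.randrange(genome_length - read_length + 1)
        for pos in range(start, start + read_length):
            depth[pos] += 1
    return depth


# Illustrative values only (assumptions, not project settings).
genome_length = 10_000  # toy genome size in bases
read_length = 100
n_reads = 50            # deliberately few reads: shallow sequencing

depth = simulate_skim(genome_length, read_length, n_reads)

# Mean depth is n_reads * read_length / genome_length.
mean_depth = sum(depth) / genome_length
uncovered = sum(1 for d in depth if d == 0) / genome_length

print(f"mean depth: {mean_depth:.2f}x")
print(f"fraction of bases with no read: {uncovered:.2%}")
```

At such a low mean depth a large share of positions receives no read at all, which is why the assembly step has to work with sparse data.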
Chloroplast genes are involved in plastid gene expression and photosynthesis. Chloroplast genes and genomes also play an important role in plant phylogeny and species identification, and several pipelines exist for assembling a whole chloroplast genome.
Mitochondria are structures within cells that convert the energy from food into a form that cells can use. Mitochondrial genes and genomes are among the most important genomic data for animal phylogeny and species identification.
The nuclear ribosomal Internal Transcribed Spacer (ITS) region is one of the most suitable genomic regions for molecular barcoding of plant species. ITS is the spacer DNA situated between the small-subunit ribosomal RNA (rRNA) and large-subunit rRNA genes in the chromosome.
Our goal is to minimize the inefficiencies and problems faced by professionals who conduct NGS research on Chloroplast, Mitochondria and plant ITS regions. We plan to combine the functionality of each stage of the NGS process, through a well-planned combination of pipelines, into open-source software with additional features. Users will be able to carry out NGS with ease and make their own decisions, guided by our recommendations, in order to achieve the accuracy the process requires.
Minimize the inefficiencies and problems faced by researchers, while optimizing processing time through multi-threading.
Provide the combined functionality of each stage of the NGS process, through a well-planned combination of pipelines.
Provide open-source software with additional features at each step of the pipeline.
Provide a platform for users to carry out skim sequencing without having to manually choose the best-performing tools.
Provide a single package that includes its dependencies and resolves issues with different file formats.
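As an illustration of the kind of file-format handling mentioned in the last objective, here is a minimal sketch that converts FASTQ records to FASTA. It assumes the simple four-line FASTQ layout (header, sequence, separator, quality), and the function name `fastq_to_fasta` is hypothetical rather than part of our codebase.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Convert four-line FASTQ records to FASTA; return the record count."""
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = fastq_handle.readline().rstrip("\r\n")
        separator = fastq_handle.readline().rstrip("\r\n")
        quality = fastq_handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # FASTA keeps the identifier and the bases, and drops the qualities.
        fasta_handle.write(f">{header[1:]}\n{sequence}\n")
        count += 1
    return count


# In-memory example so the sketch runs without any input files.
fastq_text = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nFFFF\n"
fasta_out = io.StringIO()
n_records = fastq_to_fasta(io.StringIO(fastq_text), fasta_out)
print(fasta_out.getvalue(), end="")
```

The same function works on real files by passing handles opened with `open(path)` instead of the `io.StringIO` objects.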
Additional functionalities include providing recommendations to the user in order to improve performance. This is handled by an ML component.
Users can select one of the three pipelines (Chloroplast, Mitochondria and ITS region) and then an assembly tool of their choice from the ones provided in the toolkit. The run itself is automated, along with many of the configuration decisions that would otherwise need to be made before running. Once the assembly process is over, users can view the relevant output logs along with the annotated genomes and necessary statistics.
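The selection step described above could be modelled roughly as follows. This is only a sketch: the mapping of tools to pipelines, the default thread count and the names `AVAILABLE_TOOLS`, `RunRequest` and `build_request` are assumptions made for illustration and may not match the toolkit.

```python
from dataclasses import dataclass

# Hypothetical mapping of pipelines to assembly tools (an assumption for
# illustration; the tools actually offered per pipeline may differ).
AVAILABLE_TOOLS = {
    "chloroplast": ("Fast-Plast", "NOVOPlasty", "GetOrganelle"),
    "mitochondria": ("NOVOPlasty", "GetOrganelle"),
    "its": ("GetOrganelle",),
}


@dataclass(frozen=True)
class RunRequest:
    pipeline: str
    assembler: str
    threads: int


def build_request(pipeline, assembler=None, threads=4):
    """Validate the user's choices and fill in defaults for the rest."""
    key = pipeline.strip().lower()
    if key not in AVAILABLE_TOOLS:
        raise ValueError(f"unknown pipeline: {pipeline!r}")
    tools = AVAILABLE_TOOLS[key]
    # Fall back to the first listed tool when the user makes no choice.
    chosen = assembler if assembler is not None else tools[0]
    if chosen not in tools:
        raise ValueError(f"{chosen!r} is not available for the {key} pipeline")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return RunRequest(pipeline=key, assembler=chosen, threads=threads)


request = build_request("Mitochondria", assembler="GetOrganelle")
print(request)
```

Validating the choice before anything runs lets the GUI report an unsupported combination immediately instead of failing partway through an assembly.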
Integrate the pipelines
The solution will contain a GUI environment that integrates all the necessary tools for the pipelines into a single space.
Automate the Workflow
The software will automate many of the intermediate steps while providing the user with the relevant information and statistics for each step.
The solution will attempt to provide suggestions to the user on possible next steps to take in case there are multiple possibilities.
In the general workflow of the system, the user can choose among the existing pipelines for Chloroplast, Mitochondria and ITS regions. The implementation steps for each pipeline are given above. We will be using several open-source assembly tools, such as Fast-Plast, NOVOPlasty and GetOrganelle, along with other tools for data pre-processing and annotation. The outputs of these tools will be presented to the user at each step of the pipeline.
We have used the MVC architecture for the development of our software. The user provides input directly through the App GUI, which displays all the necessary information. The GUI is updated directly by the Controllers. Several Controllers call the methods defined in the Services and the Tool Wrappers in order to provide the pipeline functionality. Multiple Services handle tasks such as running CLI commands, file-format handling, dependency installation, storing user preferences, checking connectivity and resolving file paths for input files. All the open-source tools we use to implement the pipelines are handled through Tool Wrappers for ease of implementation. The recommendation engine will run as a separate part of the Services provided by the software.
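The relationship between a Controller, a Service and a Tool Wrapper can be sketched as below. All class names are hypothetical, and the wrapper invokes the Python interpreter as a stand-in for a real assembler so that the sketch runs without any bioinformatics tools installed. It does not reproduce the command line of any actual tool.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CommandResult:
    returncode: int
    stdout: str
    stderr: str


class CommandService:
    """Service: runs a CLI command and captures its output."""

    def run(self, argv):
        done = subprocess.run(argv, capture_output=True, text=True, check=False)
        return CommandResult(done.returncode, done.stdout, done.stderr)


class EchoAssemblerWrapper:
    """Hypothetical Tool Wrapper: builds the command line for one tool.

    A stand-in only: it calls the Python interpreter instead of a real
    assembler, so no assembler-specific flags are assumed.
    """

    name = "echo-assembler"

    def build_command(self, reads_path):
        script = "import sys; print('assembling', sys.argv[1])"
        return [sys.executable, "-c", script, reads_path]


class ConsoleView:
    """Stand-in for the GUI: records and prints status messages."""

    def __init__(self):
        self.messages = []

    def show(self, message):
        self.messages.append(message)
        print(message)


class AssemblyController:
    """Controller: connects the view to the service and the tool wrapper."""

    def __init__(self, service, wrapper, view):
        self.service = service
        self.wrapper = wrapper
        self.view = view

    def start(self, reads_path):
        result = self.service.run(self.wrapper.build_command(reads_path))
        status = "finished" if result.returncode == 0 else "failed"
        self.view.show(f"{self.wrapper.name} {status}: {result.stdout.strip()}")
        return result


view = ConsoleView()
controller = AssemblyController(CommandService(), EchoAssemblerWrapper(), view)
result = controller.start("sample_reads.fastq")
```

Because the Controller only depends on `build_command` and `run`, supporting another assembler in this sketch means adding one wrapper class without touching the GUI or the command-running code.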
Given below is the GUI that our team is currently working on.
The implementation of the Fast-Plast tool on the server and the results obtained for selected input files are shown below.