Enhancing Multimodal Fusion Techniques for Depression Detection

Dual Objectives

Primary Research Goal

Develop an advanced multimodal deep learning framework for accurate depression detection, combining CNN architectures with sophisticated fusion techniques across multiple data modalities.

Clinical Application

Build a comprehensive counselor channeling web platform integrated with intelligent chatbot-based depression screening to bridge the gap between detection and treatment.

Research Team

Team Members

Supervisors

Table of Contents

  1. Executive Summary
  2. Literature Review
  3. Research Methodology
  4. System Architecture
  5. Implementation Details
  6. Experimental Results
  7. Clinical Impact
  8. Future Work
  9. Publications
  10. Links

Executive Summary

Problem Statement

Current depression detection systems suffer from limited accuracy, poor generalization across diverse populations, and insufficient integration of multimodal data sources. Existing approaches fail to meet clinical requirements for reliable mental health assessment.

Innovation Approach

This research introduces a novel hybrid multimodal fusion framework that intelligently combines textual, audio, visual, and clinical data streams through advanced deep learning architectures. Our approach addresses these limitations through the multi-level fusion strategies detailed in the Research Methodology and System Architecture sections.

Expected Impact

Literature Review

Current State of Research

Multimodal Detection Approaches

Modality-Specific Techniques

| Modality | Current Methods | Limitations |
|----------|-----------------|-------------|
| Text | LSTM, BERT, GPT-based models | Limited emotional context understanding |
| Audio | CNN, LSTM, Wav2Vec approaches | Poor generalization across demographics |
| Video | CNN, ResNet-50 architectures | Insufficient temporal feature capture |
| Clinical | MIMIC-III dataset utilization | Bias handling and data scarcity issues |

Identified Research Gaps

Research Methodology

Comprehensive Multimodal Framework

Our methodology employs a deep learning architecture that combines CNN-based encoders with multi-level fusion techniques (early, late, and hybrid), described in the subsections that follow.

Data Acquisition and Preprocessing

Primary Datasets:

Data Modalities:

Feature Extraction Pipeline

| Modality | Extraction Method | Key Features |
|----------|-------------------|--------------|
| Text | LIWC + HuggingFace Transformers | Sentiment analysis, linguistic patterns, semantic embeddings |
| Audio | EDIAOZ Framework | Pitch variation, MFCCs, jitter, shimmer, prosodic features |
| Video | OpenFace Toolkit | Facial Action Units (FAUs), eye gaze patterns, head pose dynamics |
| Clinical | BioBERT Embeddings | Medical terminology understanding, clinical note analysis |
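
To make the audio row concrete, here is a minimal sketch of frame-level extraction and utterance-level aggregation using librosa. This is an illustrative stand-in, not the EDIAOZ pipeline itself; the function name, sampling rate, and pitch range are assumptions.

# Illustrative audio feature extraction (librosa-based stand-in for the
# audio pipeline; sampling rate and pitch range are assumed values)
import numpy as np
import librosa

def extract_audio_features(path, sr=16000, n_mfcc=13):
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)         # (n_mfcc, n_frames)
    f0, voiced, _ = librosa.pyin(y, fmin=65.0, fmax=400.0, sr=sr)  # frame-level pitch (Hz)
    f0 = f0[~np.isnan(f0)]                                         # keep voiced frames only
    # Aggregate frame-level features into one fixed-length utterance vector
    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),  # spectral shape statistics
        [f0.mean(), f0.std()],                # pitch level and variation
    ])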

Advanced Fusion Strategies

Multi-Level Fusion Approach:

Model Architecture Design

Core Components:

System Architecture

Early Fusion Architecture

[Figure: Early Fusion Model Architecture]

Design Philosophy: The Early Fusion model creates a unified multimodal representation by combining features from all input modalities at the initial processing stage. This architecture lets the model learn complex inter-modal relationships from the ground up, capturing subtle correlations between data streams that late-fusion approaches might miss.
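
A minimal PyTorch sketch of this idea follows; the per-modality feature dimensions, hidden size, and class count are illustrative placeholders, not the project's actual configuration.

# Minimal early-fusion sketch (PyTorch assumed; dimensions illustrative)
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    def __init__(self, dims=(768, 128, 64, 768), hidden=256, n_classes=2):
        super().__init__()
        # A single shared network over the concatenated multimodal vector
        self.classifier = nn.Sequential(
            nn.Linear(sum(dims), hidden), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, text, audio, video, clinical):
        # Fuse at the input level: one joint representation from the start
        x = torch.cat([text, audio, video, clinical], dim=-1)
        return self.classifier(x)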

Key Advantages:

Late Fusion Architecture

[Figure: Late Fusion Model Architecture]

Design Philosophy: The Late Fusion model processes each modality through specialized, independent neural networks before combining their outputs at the decision level. This approach allows for modality-specific optimization and specialized feature extraction, with each pathway contributing its expertise to the final diagnostic decision.
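
A corresponding late-fusion sketch is shown below, with independent per-modality encoders whose logits are averaged at the decision level; as above, the dimensions are illustrative assumptions.

# Minimal late-fusion sketch: one specialized pathway per modality,
# combined only at the decision level (dimensions illustrative)
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    def __init__(self, dims=(768, 128, 64, 768), hidden=128, n_classes=2):
        super().__init__()
        # Independent encoder + classifier head for each modality
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, n_classes))
            for d in dims
        )

    def forward(self, *modalities):
        # Each branch votes; the final decision averages per-modality logits
        logits = [branch(x) for branch, x in zip(self.branches, modalities)]
        return torch.stack(logits).mean(dim=0)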

Key Advantages:

Hybrid Fusion Framework

Advanced Integration Strategy: Our novel hybrid approach combines the strengths of both early and late fusion through an adaptive weighting mechanism that learns optimal combination strategies based on input characteristics and modality reliability.
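
The sketch below illustrates one plausible reading of this mechanism, reusing the EarlyFusion and LateFusion sketches above: a small gating network maps the concatenated inputs to per-sample mixing weights over the two branches' logits. The exact gating design is a design assumption, not the project's confirmed implementation.

# Minimal hybrid-fusion sketch: a gating network blends early- and
# late-fusion logits per sample (gating design is an assumption)
import torch
import torch.nn as nn

class HybridFusion(nn.Module):
    def __init__(self, dims=(768, 128, 64, 768), n_classes=2):
        super().__init__()
        self.early = EarlyFusion(dims, n_classes=n_classes)
        self.late = LateFusion(dims, n_classes=n_classes)
        # Gate sees the concatenated inputs and emits two mixing weights
        self.gate = nn.Sequential(nn.Linear(sum(dims), 2), nn.Softmax(dim=-1))

    def forward(self, text, audio, video, clinical):
        w = self.gate(torch.cat([text, audio, video, clinical], dim=-1))
        e = self.early(text, audio, video, clinical)
        l = self.late(text, audio, video, clinical)
        # Per-sample convex combination of the two fusion strategies
        return w[..., :1] * e + w[..., 1:] * l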

Implementation Details

Technology Stack

Core ML Framework

Web Application Stack

Development Workflow

# Environment Setup
conda create -n lumithrive python=3.10
conda activate lumithrive
pip install -r requirements.txt

# Model Training Pipeline
python train_multimodal_model.py --config configs/hybrid_fusion.yaml

# Web Application Deployment
docker-compose up --build

Model Training Pipeline

Training Configuration:

Data Augmentation Strategies:

Experimental Results

Performance Metrics

Note: Results will be updated following completion of the experimental phase.

Evaluation Framework:

Expected Performance Targets:

Ablation Studies

Planned Experiments:

  1. Modality Importance Analysis: Individual and combined modality performance (a harness sketch follows this list)
  2. Fusion Strategy Comparison: Early vs. Late vs. Hybrid fusion effectiveness
  3. Architecture Sensitivity: Impact of network depth and attention mechanisms
  4. Dataset Generalization: Cross-dataset validation and transfer learning
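
The sketch below enumerates every modality subset for experiment 1; train_and_evaluate is a hypothetical stub standing in for the project's actual training and evaluation pipeline.

# Modality-importance ablation harness (train_and_evaluate is a
# hypothetical placeholder to be replaced by the real pipeline)
from itertools import combinations

MODALITIES = ("text", "audio", "video", "clinical")

def train_and_evaluate(subset):
    # Placeholder: train a model restricted to `subset` and return its F1
    return 0.0

for r in range(1, len(MODALITIES) + 1):
    for subset in combinations(MODALITIES, r):
        f1 = train_and_evaluate(subset)
        print(f"{'+'.join(subset):<30} F1={f1:.3f}")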

Clinical Impact

Healthcare Integration

Clinical Workflow Enhancement:

Accessibility Improvements

Democratizing Mental Health Care:

Ethical Considerations

Responsible AI Development:

Future Work

Research Extensions

Technical Innovations

Advanced Modeling Approaches:

Publications

Publications will be added upon completion of the experimental phases and the peer-review process.