SD19 Handwritten Character Recognition System

A deep learning system that recognizes handwritten characters from NIST Special Database 19 (SD19) using Convolutional Neural Networks (CNNs). It includes model implementations in both PyTorch and TensorFlow/Keras, plus an interactive GUI application for drawing or uploading characters.

Team

Table of Contents

  1. Introduction
  2. System Architecture
  3. Dataset
  4. Model Implementation
  5. Results and Performance
  6. GUI Application
  7. Installation and Usage
  8. Technical Specifications
  9. Future Work
  10. Links

Introduction

Handwritten character recognition is a fundamental problem in computer vision and machine learning with numerous practical applications.

This project implements a solution using the NIST Special Database 19 (SD19), which contains isolated handwritten characters from forms. The system reaches a validation accuracy of 93.1% with the PyTorch model and 92.7% with the TensorFlow model.

Key Features

Problem Statement

The challenge of recognizing handwritten characters involves several complexities:

System Architecture

Our system follows a modular architecture with clear separation of concerns:

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Input Layer   │────│ Preprocessing    │────│  CNN Model      │
│                 │    │ • Resize         │    │ • Feature       │
│ • Image File    │    │ • Normalize      │    │   Extraction    │
│ • Canvas Draw   │    │ • Augmentation   │    │ • Classification│
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                         │
┌─────────────────┐    ┌──────────────────┐             │
│   Output        │────│ Post-processing  │─────────────┘
│                 │    │ • Softmax        │
│ • Predicted     │    │ • Confidence     │
│   Character     │    │ • Class Mapping  │
│ • Confidence    │    │                  │
└─────────────────┘    └──────────────────┘

Component Overview

  1. Input Module: Handles various input sources (files, canvas drawings)
  2. Preprocessing Module: Standardizes images for model consumption
  3. Model Module: CNN implementations in both PyTorch and TensorFlow
  4. Prediction Module: Unified interface for making predictions
  5. GUI Module: Interactive application for user interaction
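
The post-processing stage in the diagram (softmax, confidence, class mapping) can be sketched in a few lines. This is a minimal illustration, not the project's own code: the names `CLASS_LABELS`, `softmax`, and `postprocess` are hypothetical, and the label order (digits, then uppercase, then the 11 distinct lowercase classes) is an assumption that would have to match the order used during training.

```python
import math
import string

# Assumed label order: 10 digits, 26 uppercase letters, 11 lowercase letters.
CLASS_LABELS = list(string.digits + string.ascii_uppercase + "abdefghnqrt")


def softmax(logits):
    """Turn raw class scores into probabilities (max-shifted for stability)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits, labels=CLASS_LABELS):
    """Return (predicted character, confidence) for one vector of class scores."""
    if len(logits) != len(labels):
        raise ValueError("expected one score per class")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

Calling `postprocess(scores)` on the 47 raw outputs of the model would yield the character and the probability assigned to it, which is the pair shown in the Output box of the diagram.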

Dataset

SD19 (NIST Special Database 19)

Character Classes

The dataset includes 47 distinct character classes:

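Category  | Characters | Count
Digits    | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 | 10
Uppercase | A, B, C/c, D, E, F, G, H, I/i, J/j, K/k, L/l, M/m, N, O/o, P/p, Q, R, S/s, T, U/u, V/v, W/w, X/x, Y/y, Z/z | 26
Lowercase | a, b, d, e, f, g, h, n, q, r, t | 11

Entries written as C/c mark uppercase classes that also cover the matching lowercase letter, which is why only 11 lowercase letters have classes of their own.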

Data Preprocessing Pipeline

  1. Image Loading: Read images from directory structure
  2. Grayscale Conversion: Ensure single-channel input
  3. Resizing: Standardize to 128×128 pixels
  4. Normalization: Scale pixel values to [0, 1]
  5. Augmentation: Rotation, shifting, and zooming for training
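
Steps 2 to 4 of the pipeline can be sketched without any imaging library. This is a dependency-free illustration under stated assumptions: the input is a nested list of 8-bit RGB triples, resizing uses nearest-neighbour sampling (the interpolation actually used by the project is not stated), and all function names are hypothetical. Image loading and augmentation are left out.

```python
def to_grayscale(pixels):
    """Collapse RGB triples to one channel using ITU-R BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in pixels]


def resize_nearest(gray, size=128):
    """Resize a 2-D grid to size x size with nearest-neighbour sampling."""
    src_h, src_w = len(gray), len(gray[0])
    return [
        [gray[y * src_h // size][x * src_w // size] for x in range(size)]
        for y in range(size)
    ]


def normalize(gray):
    """Scale 8-bit intensities to [0, 1], clamping floating-point overshoot."""
    return [[min(1.0, max(0.0, v / 255.0)) for v in row] for row in gray]


def preprocess(pixels, size=128):
    """Grayscale -> resize -> normalize, in the order listed above."""
    return normalize(resize_nearest(to_grayscale(pixels), size))
```

In practice a library such as Pillow or OpenCV would handle loading and resizing; the sketch only shows what each step does to the pixel values.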

Model Implementation

Convolutional Neural Network Architecture

The CNN stacks three convolutional blocks with increasing filter counts, followed by a fully connected classifier:

Input (128×128×1)
      ↓
Conv2D (32, 3×3) → ReLU → BatchNorm → MaxPool(2×2)
      ↓
Conv2D (64, 3×3) → ReLU → BatchNorm → MaxPool(2×2)
      ↓
Conv2D (128, 3×3) → ReLU → BatchNorm → MaxPool(2×2)
      ↓
Flatten
      ↓
Dense (512) → ReLU → Dropout(0.5)
      ↓
Dense (47) → Softmax
      ↓
Output (47 classes)
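
To see how the spatial size shrinks through the stack, the layer arithmetic can be traced by hand. The sketch below assumes stride-1 convolutions and non-overlapping 2×2 pooling; the padding mode is not stated in the diagram, so it is a parameter here (`padding=1` corresponds to "same" padding for a 3×3 kernel, `padding=0` to "valid"). Function names are illustrative.

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Output width/height of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2):
    """Output width/height of non-overlapping max pooling."""
    return size // window


def trace_shapes(input_size=128, filters=(32, 64, 128), padding=1):
    """Return the (height, width, channels) shape after each conv block."""
    size = input_size
    shapes = []
    for channels in filters:
        size = pool_out(conv_out(size, padding=padding))
        shapes.append((size, size, channels))
    return shapes


shapes = trace_shapes()
height, width, channels = shapes[-1]
flatten_features = height * width * channels
print(shapes, flatten_features)
```

With "same" padding the feature maps go 128 → 64 → 32 → 16, so Flatten produces 16 × 16 × 128 = 32,768 values; with "valid" padding the last map is 14 × 14 and Flatten produces 25,088.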

Key Design Decisions

  1. Multiple Convolutional Layers: Extract hierarchical features
  2. Batch Normalization: Accelerate training and improve stability
  3. Dropout Regularization: Prevent overfitting
  4. Progressive Filter Increase: Capture complex patterns at different scales

Framework Implementations

PyTorch Implementation
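
The architecture above could be expressed in PyTorch roughly as follows. This is a sketch under assumptions, not the repository's actual model: the class name `SD19Net` is hypothetical, "same" padding is assumed for the convolutions, and the network returns raw logits, leaving softmax to the loss function or to post-processing at inference time.

```python
import torch
from torch import nn


class SD19Net(nn.Module):
    """Three Conv -> ReLU -> BatchNorm -> MaxPool blocks plus a dense classifier."""

    def __init__(self, num_classes: int = 47):
        super().__init__()

        def block(c_in: int, c_out: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),  # assumed "same" padding
                nn.ReLU(inplace=True),
                nn.BatchNorm2d(c_out),
                nn.MaxPool2d(2),
            )

        self.features = nn.Sequential(block(1, 32), block(32, 64), block(64, 128))
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 16 * 16, 512),  # 128x128 input halved three times -> 16x16
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes),    # logits; apply softmax at inference
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```

A batch of grayscale images shaped `(N, 1, 128, 128)` produces an output shaped `(N, 47)`.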

TensorFlow/Keras Implementation

Results and Performance

Training Performance

Metric                    | PyTorch Model | TensorFlow Model
Training Accuracy         | 96.2%         | 95.8%
Validation Accuracy       | 93.1%         | 92.7%
Training Time (50 epochs) | 2.3 hours     | 2.1 hours
Model Size                | 15.2 MB       | 14.8 MB

Per-Class Performance

The model performs well across most character classes.

Confusion Matrix Analysis

Common misclassifications include:
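
One way to surface the most frequently confused pairs is to count the off-diagonal entries of the confusion matrix directly from true and predicted labels. The helper below is a hypothetical sketch; the function name and the toy labels in the usage note are illustrative, not results from this project.

```python
from collections import Counter


def top_confusions(y_true, y_pred, k=5):
    """Return the k most frequent (true, predicted) pairs where the model was wrong."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(k)
```

For example, `top_confusions(["O", "O", "1"], ["0", "0", "1"], k=1)` reports the pair `("O", "0")` with a count of 2 on that made-up input.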

GUI Application

Features

The interactive GUI application provides:

  1. Drawing Canvas:
    • Draw characters with mouse/stylus
    • Real-time stroke rendering
    • Clear canvas functionality
  2. File Upload:
    • Support for PNG, JPG, TIFF formats
    • Automatic preprocessing
    • Batch processing capability
  3. Model Management:
    • Load different trained models
    • Switch between PyTorch and TensorFlow
    • Model performance information
  4. Prediction Display:
    • Character prediction with confidence score
    • Visual feedback
    • Prediction history

User Interface Design

The GUI follows modern design principles:

Installation and Usage

Quick Start

# Clone repository
git clone https://github.com/cepdnaclk/e20-co542-handwritten-character-recognition.git
cd e20-co542-handwritten-character-recognition

# Install dependencies
pip install -r requirements.txt

# Run GUI application
python src/gui/application.py

Training Your Own Model

# For PyTorch
python scripts/train_pytorch.py

# For TensorFlow
python scripts/train_keras.py

API Usage

from src.utils.prediction import ModelPredictor
from src.models.pytorch_model import load_model

# Load and use model
model = load_model('saved_models/sd19_model.pth')
predictor = ModelPredictor(model, 'pytorch')
result = predictor.predict_image('character.png')

Technical Specifications

System Requirements

Minimum Requirements:

Recommended Requirements:

Dependencies

Core Libraries:

Development Tools:

Performance Benchmarks

Operation         | CPU Time | GPU Time
Single Prediction | 45 ms    | 12 ms
Batch (32 images) | 980 ms   | 156 ms
Model Loading     | 1.2 s    | 0.8 s
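
The benchmark figures can be turned into per-image cost and GPU speedup with a little arithmetic. The snippet below only reuses the numbers from the table; the helper names are illustrative.

```python
# Timings copied from the benchmark table, in milliseconds.
BENCH_MS = {
    "single": {"cpu": 45.0, "gpu": 12.0},
    "batch32": {"cpu": 980.0, "gpu": 156.0},
}
BATCH_SIZE = 32


def per_image_ms(device):
    """Average time per image when predicting a batch of 32."""
    return BENCH_MS["batch32"][device] / BATCH_SIZE


def gpu_speedup(operation):
    """How many times faster the GPU is than the CPU for one operation."""
    row = BENCH_MS[operation]
    return row["cpu"] / row["gpu"]


for device in ("cpu", "gpu"):
    ms = per_image_ms(device)
    print(f"{device}: {ms:.2f} ms/image in a batch, {1000.0 / ms:.0f} images/s")
```

Batching brings the CPU cost down from 45 ms to about 30.6 ms per image and the GPU cost from 12 ms to about 4.9 ms per image. The GPU speedup grows from 3.75× for a single prediction to roughly 6.3× for a batch of 32.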

Future Work

Planned Enhancements

  1. Extended Dataset Support:
    • EMNIST integration
    • Multi-language character sets
    • Cursive handwriting recognition
  2. Architecture Improvements:
    • Attention mechanisms
    • Transformer-based models
    • Ensemble methods
  3. Deployment Options:
    • Web application
    • Mobile app (iOS/Android)
    • Edge device deployment
  4. Advanced Features:
    • Real-time video recognition
    • Sequence recognition (words/sentences)
    • Style transfer capabilities

Research Directions


Acknowledgments

We thank the Department of Computer Engineering, University of Peradeniya, for providing the resources and guidance for this project. Special thanks to NIST for making the SD19 dataset publicly available for research purposes.

