SD19 Handwritten Character Recognition System
A deep learning system for recognizing handwritten characters using convolutional neural networks (CNNs). The project applies modern machine learning techniques to the classic problem of isolated handwritten character recognition, with models trained on the NIST Special Database 19 (SD19).

Team
- E/20/284, Sandalu Umayanga, e20284@eng.pdn.ac.lk
- E/20/286, Milinda Perera, e20286@eng.pdn.ac.lk
- E/19/490, Sampath K.G.H., e19490@eng.pdn.ac.lk
- E/20/078, Nadeeka Dissanayake, e20078@eng.pdn.ac.lk
Table of Contents
- Introduction
- System Architecture
- Dataset
- Model Implementation
- Results and Performance
- GUI Application
- Installation and Usage
- Technical Specifications
- Future Work
- Links
Introduction
Handwritten character recognition is a fundamental problem in computer vision and machine learning with numerous practical applications including:
- Document Digitization: Converting handwritten documents to digital text
- Postal Automation: Automated sorting of mail based on handwritten addresses
- Educational Technology: Automated grading systems for handwritten assignments
- Mobile Applications: Text input through handwriting on touchscreen devices
This project implements a comprehensive solution using the NIST Special Database 19 (SD19), which contains isolated handwritten characters extracted from forms. Our system achieves high accuracy through the key features listed below.
Key Features
- Dual Framework Implementation: Both PyTorch and TensorFlow/Keras models
- Real-time Recognition: Interactive GUI for immediate character prediction
- Robust Preprocessing: Grayscale conversion, resizing, normalization, and augmentation pipeline
- High Accuracy: Competitive performance on standard benchmarks
- Modular Design: Well-structured, maintainable codebase
Problem Statement
The challenge of recognizing handwritten characters involves several complexities:
- Variability in Writing Styles: Different people write the same character differently
- Image Quality: Variations in contrast, lighting, and noise
- Character Similarity: Some characters look very similar (e.g., ‘O’ vs ‘0’)
- Scale and Rotation: Characters may appear at different sizes and orientations
System Architecture
Our system follows a modular architecture with clear separation of concerns:
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Input Layer   │────│  Preprocessing   │────│    CNN Model    │
│                 │    │ • Resize         │    │ • Feature       │
│ • Image File    │    │ • Normalize      │    │   Extraction    │
│ • Canvas Draw   │    │ • Augmentation   │    │ • Classification│
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
┌─────────────────┐    ┌──────────────────┐             │
│     Output      │────│ Post-processing  │─────────────┘
│                 │    │ • Softmax        │
│ • Predicted     │    │ • Confidence     │
│   Character     │    │ • Class Mapping  │
│ • Confidence    │    │                  │
└─────────────────┘    └──────────────────┘
```
Component Overview
- Input Module: Handles various input sources (files, canvas drawings)
- Preprocessing Module: Standardizes images for model consumption
- Model Module: CNN implementations in both PyTorch and TensorFlow
- Prediction Module: Unified interface for making predictions
- GUI Module: Interactive application for user interaction
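The prediction module's post-processing step (softmax, confidence, and class mapping in the diagram above) can be illustrated with a short sketch. The function and the `CLASS_NAMES` list below are illustrative placeholders, not names taken from the project code:
```python
import numpy as np

# Placeholder label list; the real ordering comes from the training data pipeline.
CLASS_NAMES = [f"class_{i}" for i in range(47)]

def postprocess(logits):
    """Map a 47-element logit vector to (predicted character, confidence)."""
    exp = np.exp(logits - logits.max())         # numerically stable softmax
    probs = exp / exp.sum()
    idx = int(probs.argmax())                   # most likely class index
    return CLASS_NAMES[idx], float(probs[idx])  # class label and its probability
```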
Dataset
SD19 (NIST Special Database 19)
- Source: National Institute of Standards and Technology (NIST)
- Content: Isolated handwritten characters
- Size: Over 814,000 character images
- Classes: 47 different character types
- Format: Grayscale images
- Resolution: Variable, normalized to 128×128 pixels
Character Classes
The dataset includes 47 distinct character classes. Classes written as pairs (e.g., C.c) merge the upper- and lowercase forms of letters whose handwritten shapes are essentially identical, which is why only 11 lowercase letters remain as separate classes:
| Category | Characters | Count |
|---|---|---|
| Digits | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 | 10 |
| Uppercase | A, B, C.c, D, E, F, G, H, I.i, J.j, K.k, L.l, M.m, N, O.o, P.p, Q, R, S.s, T, U.u, V.v, W.w, X.x, Y.y, Z.z | 26 |
| Lowercase | a, b, d, e, f, g, h, n, q, r, t | 11 |
Data Preprocessing Pipeline
- Image Loading: Read images from directory structure
- Grayscale Conversion: Ensure single-channel input
- Resizing: Standardize to 128×128 pixels
- Normalization: Scale pixel values to [0, 1]
- Augmentation: Rotation, shifting, and zooming for training
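A minimal sketch of this pipeline for a single inference-time image, assuming Pillow and NumPy (the function name is illustrative; augmentation is applied only during training, e.g. via `tf.keras.preprocessing.image.ImageDataGenerator` or `torchvision.transforms`):
```python
import numpy as np
from PIL import Image

def preprocess_image(path):
    """Load an image and prepare it as 128x128 grayscale model input."""
    img = Image.open(path).convert("L")               # load + grayscale conversion
    img = img.resize((128, 128))                      # standardize resolution
    arr = np.asarray(img, dtype=np.float32) / 255.0   # normalize to [0, 1]
    return arr.reshape(1, 128, 128, 1)                # add batch and channel dimensions
```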
Model Implementation
Convolutional Neural Network Architecture
Our CNN architecture is designed for optimal performance on character recognition:
```
Input (128×128×1)
        ↓
Conv2D (32, 3×3) → ReLU → BatchNorm → MaxPool(2×2)
        ↓
Conv2D (64, 3×3) → ReLU → BatchNorm → MaxPool(2×2)
        ↓
Conv2D (128, 3×3) → ReLU → BatchNorm → MaxPool(2×2)
        ↓
Flatten
        ↓
Dense (512) → ReLU → Dropout(0.5)
        ↓
Dense (47) → Softmax
        ↓
Output (47 classes)
```
Key Design Decisions
- Multiple Convolutional Layers: Extract hierarchical features
- Batch Normalization: Accelerate training and improve stability
- Dropout Regularization: Prevent overfitting
- Progressive Filter Increase: Capture complex patterns at different scales
Framework Implementations
PyTorch Implementation
- Flexibility: Dynamic computation graphs
- Research-Friendly: Easy to experiment with architectures
- GPU Support: Efficient CUDA integration
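A minimal PyTorch sketch of the CNN described above (the class name is illustrative; the project's own implementation in `src.models.pytorch_model` may differ in detail):
```python
import torch
import torch.nn as nn

class SD19Net(nn.Module):
    """Conv(32) -> Conv(64) -> Conv(128) blocks, then Dense(512) -> Dense(47)."""
    def __init__(self, num_classes=47):
        super().__init__()
        def block(in_ch, out_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.BatchNorm2d(out_ch),
                nn.MaxPool2d(2),
            )
        self.features = nn.Sequential(block(1, 32), block(32, 64), block(64, 128))
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 16 * 16, 512),   # 128x128 input halved three times -> 16x16
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):                    # x: (N, 1, 128, 128)
        return self.classifier(self.features(x))  # raw logits; softmax applied at inference
```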
TensorFlow/Keras Implementation
- Production-Ready: Stable and well-optimized
- High-Level API: Rapid prototyping with Keras
- Deployment: TensorFlow Serving and TensorFlow Lite support
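The equivalent Keras sketch, again as an illustration of the architecture rather than the project's exact code:
```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(num_classes=47):
    """Keras version of the same Conv(32/64/128) + Dense(512) architecture."""
    model = models.Sequential([
        layers.Input(shape=(128, 128, 1)),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```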
Results and Performance
Training Performance
| Metric | PyTorch Model | TensorFlow Model |
|---|---|---|
| Training Accuracy | 96.2% | 95.8% |
| Validation Accuracy | 93.1% | 92.7% |
| Training Time (50 epochs) | 2.3 hours | 2.1 hours |
| Model Size | 15.2 MB | 14.8 MB |
Per-Class Performance
The model shows excellent performance across most character classes:
- Best Performance: Digits (0-9) with >95% accuracy
- Good Performance: Most uppercase letters with >90% accuracy
- Challenging Cases: Similar characters like ‘O’ vs ‘0’, ‘I’ vs ‘l’
Confusion Matrix Analysis
Common misclassifications include:
- ‘O’ ↔ ‘0’ (geometric similarity)
- ‘I’ ↔ ‘l’ (vertical line similarity)
- ‘S’ ↔ ‘s’ (case sensitivity)
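A confusion matrix like the one behind this analysis can be computed with scikit-learn (listed under development tools). In the sketch below, `y_true` and `y_pred` are placeholder arrays of validation-set class indices:
```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Placeholder class indices (0..46); in practice these come from running the
# trained model over the validation split.
y_true = np.array([0, 3, 14, 14, 7])
y_pred = np.array([3, 3, 14, 0, 7])

cm = confusion_matrix(y_true, y_pred, labels=np.arange(47))
# Off-diagonal entries cm[i, j] count samples of class i predicted as class j;
# visually similar pairs such as 'O' vs '0' show up as large off-diagonal values.
```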
GUI Application
Features
The interactive GUI application provides:
- Drawing Canvas:
- Draw characters with mouse/stylus
- Real-time stroke rendering
- Clear canvas functionality
- File Upload:
- Support for PNG, JPG, TIFF formats
- Automatic preprocessing
- Batch processing capability
- Model Management:
- Load different trained models
- Switch between PyTorch and TensorFlow
- Model performance information
- Prediction Display:
- Character prediction with confidence score
- Visual feedback
- Prediction history
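One common way to wire a drawing canvas to the classifier is to mirror each stroke onto an off-screen Pillow image that can then go through the same preprocessing as an uploaded file. The sketch below illustrates the idea with Tkinter; the widget and variable names are illustrative and not taken from the project's GUI module:
```python
import tkinter as tk
from PIL import Image, ImageDraw

root = tk.Tk()
canvas = tk.Canvas(root, width=256, height=256, bg="white")
canvas.pack()

# Off-screen image mirroring what the user draws, so it can be fed to the model.
buffer = Image.new("L", (256, 256), color=255)
draw = ImageDraw.Draw(buffer)

def paint(event):
    x, y, r = event.x, event.y, 4
    canvas.create_oval(x - r, y - r, x + r, y + r, fill="black", outline="black")
    draw.ellipse([x - r, y - r, x + r, y + r], fill=0)

canvas.bind("<B1-Motion>", paint)
root.mainloop()
# buffer.resize((128, 128)) can then go through the same preprocessing as file input.
```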
User Interface Design
The GUI follows modern design principles:
- Intuitive Layout: Logical arrangement of controls
- Visual Feedback: Clear indication of system status
- Error Handling: Graceful handling of invalid inputs
- Accessibility: Keyboard shortcuts and clear labeling
Installation and Usage
Quick Start
```bash
# Clone repository
git clone https://github.com/cepdnaclk/e20-co542-handwritten-character-recognition.git
cd e20-co542-handwritten-character-recognition

# Install dependencies
pip install -r requirements.txt

# Run GUI application
python src/gui/application.py
```
Training Your Own Model
```bash
# For PyTorch
python scripts/train_pytorch.py

# For TensorFlow
python scripts/train_keras.py
```
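These scripts wrap a standard supervised training loop. A minimal PyTorch sketch of such a loop is shown below; `SD19Net` refers to the illustrative model class sketched earlier, and the dummy tensors stand in for the real SD19 training split:
```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy data with the expected shapes; replace with the real SD19 dataset loader.
images = torch.randn(64, 1, 128, 128)
labels = torch.randint(0, 47, (64,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SD19Net().to(device)                  # illustrative model class from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()             # expects raw logits + integer class labels

model.train()
for epoch in range(50):                       # 50 epochs, matching the results table
    for batch_images, batch_labels in train_loader:
        batch_images = batch_images.to(device)
        batch_labels = batch_labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(batch_images), batch_labels)
        loss.backward()
        optimizer.step()
```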
API Usage
```python
from src.utils.prediction import ModelPredictor
from src.models.pytorch_model import load_model

# Load and use model
model = load_model('saved_models/sd19_model.pth')
predictor = ModelPredictor(model, 'pytorch')
result = predictor.predict_image('character.png')
```
Technical Specifications
System Requirements
Minimum Requirements:
- Python 3.7+
- 4GB RAM
- 2GB free disk space
- CPU: Multi-core processor
Recommended Requirements:
- Python 3.8+
- 8GB RAM
- GPU with CUDA support
- SSD storage
Dependencies
Core Libraries:
- TensorFlow 2.x
- PyTorch 1.x
- NumPy
- Pillow (PIL)
- Tkinter
Development Tools:
- pytest (testing)
- matplotlib (visualization)
- scikit-learn (metrics)
Performance Benchmarks
| Operation | CPU Time | GPU Time |
|---|---|---|
| Single Prediction | 45ms | 12ms |
| Batch (32 images) | 980ms | 156ms |
| Model Loading | 1.2s | 0.8s |
Future Work
Planned Enhancements
- Extended Dataset Support:
- EMNIST integration
- Multi-language character sets
- Cursive handwriting recognition
- Architecture Improvements:
- Attention mechanisms
- Transformer-based models
- Ensemble methods
- Deployment Options:
- Web application
- Mobile app (iOS/Android)
- Edge device deployment
- Advanced Features:
- Real-time video recognition
- Sequence recognition (words/sentences)
- Style transfer capabilities
Research Directions
- Few-shot Learning: Adapt to new writing styles with minimal data
- Adversarial Robustness: Improve resilience to input perturbations
- Interpretability: Visualize what the model learns
- Efficiency: Model compression and quantization
Links
- Repository
- Department of Computer Engineering
- University of Peradeniya
- NIST SD19 Dataset
- Project Documentation
Acknowledgments
We thank the Department of Computer Engineering, University of Peradeniya, for providing the resources and guidance for this project. Special thanks to NIST for making the SD19 dataset publicly available for research purposes.