Neuromorphic Memory Optimization for Edge AI Accelerators

50% Power Reduction • Q12.4 + TTFS • 82.4% Accuracy

1. Abstract

Neuromorphic computing mimics the brain's neural structure, using event-driven, spike-based communication to sidestep the traditional von Neumann memory bottleneck. However, this architecture introduces its own memory challenge: synaptic memory typically occupies over 70% of total chip area and accounts for more than 80% of power consumption.

In the Cerebra-H accelerator of the SNAP-V SoC, weight memory alone consumes 95.97% (479.95 mW) of total system power. This work targets that bottleneck by optimizing memory across three dimensions: representation, organization, and access mechanisms.

Key hyperparameters investigated include synaptic weight bit-width, quantization strategy, and fixed-point (QM.N) format. Using a Bayesian optimization loop to navigate this design space, we identified a Q12.4 fixed-point configuration with Time-To-First-Spike (TTFS) encoding as the best power-accuracy trade-off: 82.4% inference accuracy on MNIST at 0.248 W, a 50.4% power reduction relative to the SNAP-V baseline (96.69% accuracy at 0.5001 W), at a cost of 14.29 percentage points of accuracy.
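The headline figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this abstract; the variable names are ours, and treating 0.5001 W as the total system power behind the 95.97% weight-memory share is an assumption that the numbers appear to support.

```python
# Figures quoted in the abstract (variable names are illustrative).
baseline_power_w = 0.5001
optimized_power_w = 0.248
baseline_accuracy_pct = 96.69
optimized_accuracy_pct = 82.4
weight_memory_mw = 479.95

# Relative power saving of the optimized configuration.
power_reduction_pct = (baseline_power_w - optimized_power_w) / baseline_power_w * 100

# Accuracy cost, in percentage points.
accuracy_drop_points = baseline_accuracy_pct - optimized_accuracy_pct

# Assumption: 0.5001 W is the total system power the weight memory is compared against.
weight_memory_share_pct = weight_memory_mw / (baseline_power_w * 1000) * 100

print(f"Power reduction:     {power_reduction_pct:.1f}%")           # 50.4%
print(f"Accuracy drop:       {accuracy_drop_points:.2f} points")    # 14.29 points
print(f"Weight memory share: {weight_memory_share_pct:.2f}%")       # 95.97%
```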

3. Methodology

We optimized synaptic memory through three main pillars:

1. Representation Optimization

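Hardware-aware quantization of 32-bit floating-point weights to 8-, 16-, and 32-bit weight representations. Exploration of fixed-point QM.N formats (M = integer bits, N = fractional bits) to balance clipping (too few integer bits for large weights) against underflow (too few fractional bits for small weights).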
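To make the clipping-versus-underflow trade-off concrete, here is a minimal sketch of QM.N quantization. It assumes a signed two's-complement word of M + N bits in which M includes the sign bit, with round-to-nearest and saturation; the project's actual convention and rounding mode may differ, and `quantize_q` is a hypothetical helper, not a function from the Cerebra-H toolchain.

```python
def quantize_q(value: float, m: int, n: int) -> float:
    """Quantize `value` to a signed QM.N fixed-point number.

    Assumption: the word is m + n bits wide, two's complement,
    and m counts the sign bit.
    """
    scale = 1 << n                      # one LSB represents 2**-n
    raw_min = -(1 << (m + n - 1))       # most negative raw integer
    raw_max = (1 << (m + n - 1)) - 1    # most positive raw integer
    raw = round(value * scale)          # round to nearest (ties to even)
    raw = max(raw_min, min(raw_max, raw))  # saturate instead of wrapping
    return raw / scale


# Q12.4: wide range, coarse resolution (LSB = 0.0625).
print(quantize_q(0.03, 12, 4))    # 0.0 (underflow: below half an LSB)
print(quantize_q(3000.0, 12, 4))  # 2047.9375 (clipped at the positive limit)

# Q4.12: same 16-bit word, narrow range, fine resolution.
print(quantize_q(0.03, 4, 12))    # 0.030029296875
print(quantize_q(3000.0, 4, 12))  # 7.999755859375 (heavily clipped)
```

Both formats occupy the same 16 bits; moving the binary point only changes which kind of error dominates for a given weight distribution.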

2. Organization & Access Mechanisms

Modifications to Cerebra-H's clustered architecture and weight memory subsystem. RTL-level changes to support different bit-widths and encoding schemes.

3. Automated Design Space Exploration

A Bayesian optimization loop integrating high-level simulation (TENNLab), RTL generation (Verilog), memory macro generation (OpenRAM 6T SRAM), and gate-level power analysis (Synopsys VCS + RTL Compiler + PrimePower).
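As a rough picture of what such a loop searches over, the sketch below enumerates one possible design space built from the hyperparameters named in this document: word width, QM.N split, and spike encoding. This is only the search space, not the Bayesian optimizer itself, which would sample candidates and send each through the simulation and power-analysis flow. The exact bounds (M from 1 to the full word width, M counting the sign bit) are assumptions, and all names are hypothetical.

```python
WORD_WIDTHS = (8, 16, 32)        # bit-widths listed in the methodology
ENCODINGS = ("rate", "ttfs")     # encoding schemes evaluated


def q_formats(width: int) -> list[tuple[int, int]]:
    """All (M, N) splits of a word, assuming M >= 1 and M + N == width."""
    return [(m, width - m) for m in range(1, width + 1)]


def design_space() -> list[dict]:
    """Enumerate every (width, M, N, encoding) candidate configuration."""
    return [
        {"width": width, "m": m, "n": n, "encoding": encoding}
        for width in WORD_WIDTHS
        for (m, n) in q_formats(width)
        for encoding in ENCODINGS
    ]


space = design_space()
print(len(space))  # 112 candidates under these assumptions

# The configuration reported as the best trade-off is one point in this space.
reported = {"width": 16, "m": 12, "n": 4, "encoding": "ttfs"}
print(reported in space)  # True
```

Even this small space is costly to sweep exhaustively when every point requires memory macro generation and gate-level power analysis, which is the usual motivation for a sample-efficient optimizer.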

Encoding Schemes Evaluated: Rate Encoding vs. Time-To-First-Spike (TTFS)
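The difference between the two schemes can be illustrated with simple textbook formulations. The sketch below assumes 8-bit pixel intensities (as in MNIST) and a discrete time window; the linear mappings and function names are illustrative assumptions, not the encoders used in Cerebra-H.

```python
def rate_encode(pixel: int, window: int) -> list[int]:
    """Rate code: spike count grows with intensity, spikes evenly spaced."""
    count = pixel * window // 255
    return [i * window // count for i in range(count)]


def ttfs_encode(pixel: int, window: int) -> list[int]:
    """TTFS code: at most one spike, and brighter pixels fire earlier."""
    if pixel == 0:
        return []  # a zero input never fires
    return [(255 - pixel) * (window - 1) // 255]


WINDOW = 16  # time steps per input presentation (assumed)

print(rate_encode(255, WINDOW))  # 16 spikes, one per time step
print(rate_encode(128, WINDOW))  # [0, 2, 4, 6, 8, 10, 12, 14]
print(ttfs_encode(255, WINDOW))  # [0]
print(ttfs_encode(128, WINDOW))  # [7]
```

Because TTFS emits at most one spike per input, it generates far fewer spike events than a rate code over the same window, which may in turn reduce the number of synaptic weight lookups.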

SNAP-V SoC Architecture

Figure 1: Baseline SNAP-V SoC microarchitecture showing integration with the Cerebra-H neuromorphic accelerator.

4. Experiment Setup and Implementation

Synaptic Weight Memory Isolation

Figure 2: Architectural breakdown and hardware profiling of the synaptic weight memory allocation subsystem.

5. Results and Analysis

Power Reduction: 50.4% (from 0.5001 W to 0.248 W)

Accuracy: 82.4% (MNIST dataset)

Optimal Config: Q12.4 + TTFS (best power-accuracy trade-off)

Key Achievement Comparison

Configuration      | Accuracy | Power (W) | Power change
Baseline (SNAP-V)  | 96.69%   | 0.5001    | Reference
Optimized (Ours)   | 82.40%   | 0.2480    | -50.4%

Accuracy vs Bit-Width Quantization

Figure 3: Impact of varying quantization bit-widths and numerical representations on classification accuracy.

MNIST Power and Accuracy Trade-offs

Figure 4: Pareto frontier analysis highlighting the system-level power-performance tradeoffs on the MNIST benchmark dataset.
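For readers unfamiliar with Pareto analysis, the sketch below shows how a frontier is extracted from (accuracy, power) pairs. It uses only the two configurations reported in this document; the type and function names are hypothetical. A design point is on the frontier if no other point is at least as good on both axes and strictly better on one.

```python
from typing import NamedTuple


class DesignPoint(NamedTuple):
    name: str
    accuracy_pct: float  # higher is better
    power_w: float       # lower is better


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if `a` is no worse than `b` on both axes and better on one."""
    no_worse = a.accuracy_pct >= b.accuracy_pct and a.power_w <= b.power_w
    better = a.accuracy_pct > b.accuracy_pct or a.power_w < b.power_w
    return no_worse and better


def pareto_front(points: list[DesignPoint]) -> list[DesignPoint]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# The two configurations reported in the comparison table.
points = [
    DesignPoint("Baseline (SNAP-V)", 96.69, 0.5001),
    DesignPoint("Q12.4 + TTFS", 82.40, 0.2480),
]
front = pareto_front(points)
print([p.name for p in front])  # both points: neither dominates the other
```

Neither reported configuration dominates the other: the baseline wins on accuracy and the optimized design wins on power, so the choice between them depends on the power budget of the target device.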

6. Research Team

Malinga G.A.I.

E/20/242

Ariyarathna D.B.S.

E/20/024

Panawennage L.S.

E/20/279

Supervisors

Prof. Roshan G. Ragel  •  Dr. Isuru Nawinne

Department of Computer Engineering, University of Peradeniya

7. Conclusion

This research demonstrates that systematic optimization of synaptic memory representation and access mechanisms is critical for ultra-low-power neuromorphic hardware. By combining hardware-aware quantization, fixed-point arithmetic, and Bayesian optimization, we achieved a 50.4% power reduction (0.5001 W to 0.248 W) while retaining 82.4% MNIST accuracy, compared with 96.69% for the baseline. The identified error patterns (clipping vs. underflow) and the automated workflow provide a practical framework for future neuromorphic co-design efforts targeting edge AI devices.