NeuroMemOpt | Neuromorphic Memory Optimization

1. Abstract

Neuromorphic computing mimics the brain's neural structure, enabling event-driven, spike-based communication that overcomes the traditional Von Neumann memory bottleneck. However, this architecture introduces its own memory challenge: synaptic memory typically occupies over 70% of total chip area and accounts for more than 80% of power consumption.

In the Cerebra-H accelerator of the SNAP-V SoC, weight memory alone consumes 95.97% (479.95 mW) of total system power. This work targets that bottleneck by optimizing memory across three dimensions: representation, organization, and access mechanisms.

Key hyperparameters investigated include synaptic weight bit-width, quantization strategy, and fixed-point (QM.N) format. Using a Bayesian optimization loop to navigate this design space, we identified a Q12.4 fixed-point configuration with Time-To-First-Spike (TTFS) encoding as the best power-accuracy operating point: 82.4% inference accuracy at 0.248 W, a 50.4% reduction in power relative to the SNAP-V baseline (96.69% accuracy at 0.5001 W), at the cost of a 14.29 percentage-point drop in accuracy.

2. Related Works

MorphIC [Frenkel et al.] – Binary-weight digital neuromorphic processor with stochastic online learning.
TrueNorth [IBM] – Landmark 1 million neuron programmable neurosynaptic chip.
CyNAPSE – Adaptive caching for power savings in event-driven accelerators.
SpiDR & Compute-in-Memory designs – Demonstrated high efficiency through sparsity exploitation.
Q-SpiNN & other quantization frameworks – Focused on low-precision SNN weights.
SNAP-V SoC (our baseline) – RISC-V based neuromorphic platform with Cerebra-H accelerator.

This work builds upon these by introducing a systematic Bayesian Optimization workflow that co-optimizes hardware-aware quantization, fixed-point format selection, and the spike encoding scheme.

3. Methodology

We optimized synaptic memory through three main pillars:

1. Representation Optimization

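Hardware-aware quantization of 32-bit floating-point weights to fixed-point bit-widths (8/16/32-bit). Exploration of fixed-point QM.N formats (M = integer bits, N = fractional bits) to balance clipping (too few integer bits, so large weights saturate) against underflow (too few fractional bits, so small weights round to zero).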
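The clipping versus underflow tradeoff can be illustrated with a minimal sketch of QM.N quantization. The function name quantize_qmn is hypothetical, and the conventions used here (sign bit counted within the M integer bits, round-to-nearest, saturation at the range limits) are assumptions; the summary does not state which conventions the Cerebra-H RTL uses.

```python
def quantize_qmn(value: float, m: int, n: int) -> float:
    """Quantize a real value to a signed Qm.n fixed-point number.

    Assumption: the word is m + n bits wide and the sign bit is counted
    among the m integer bits. Out-of-range values saturate (clipping);
    values smaller than half a step round to zero (underflow).
    """
    scale = 1 << n                   # one unit in the last place is 2**-n
    lo = -(1 << (m + n - 1))         # most negative integer code
    hi = (1 << (m + n - 1)) - 1      # most positive integer code
    code = round(value * scale)      # round to the nearest representable code
    code = max(lo, min(hi, code))    # saturate instead of wrapping around
    return code / scale


# Same 16-bit budget, two different splits between integer and fractional bits.
small_weight, large_weight = 0.03, 9.5
print(quantize_qmn(small_weight, 12, 4))   # 0.0: underflow, step is 1/16
print(quantize_qmn(small_weight, 4, 12))   # close to 0.03, step is 1/4096
print(quantize_qmn(large_weight, 12, 4))   # 9.5: fits the integer range
print(quantize_qmn(large_weight, 4, 12))   # just below 8.0: clipped
```

With a fixed word length, every bit moved from N to M widens the representable range and coarsens the step size, which is why the format has to be matched to the trained weight distribution.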

2. Organization & Access Mechanisms

Modifications to Cerebra-H's clustered architecture and weight memory subsystem. RTL-level changes to support different bit-widths and encoding schemes.

3. Automated Design Space Exploration

Bayesian Optimization loop integrating: High-level simulation (TENNLab), RTL generation (Verilog), Memory macro generation (OpenRAM 6T SRAM), and Gate-level power analysis (Synopsys VCS + RTL Compiler + PrimePower).
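As a rough illustration of how such a loop proceeds, the sketch below runs Bayesian optimization with a small Gaussian Process surrogate over a discrete set of 16-bit QM.N formats and two encodings. Everything in it is an assumption made for illustration: the candidate grid, the RBF kernel, the upper-confidence-bound acquisition rule, and above all evaluate(), which is a synthetic stand-in for the expensive simulate, synthesize, and power-analysis step and does not reproduce any measured result.

```python
import math
import random

# Hypothetical design space: (integer bits M, fractional bits N, encoding flag).
CANDIDATES = [(m, 16 - m, e) for m in range(2, 15) for e in (0, 1)]


def evaluate(config):
    """Synthetic placeholder score (higher is better); NOT measured data."""
    m, _n, e = config
    return -((m - 9) ** 2) / 20.0 + 0.3 * e


def kernel(a, b, length=2.0):
    """RBF kernel between two configurations."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of the GP at x_new."""
    k = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    ks = [kernel(a, x_new) for a in xs]
    mu0 = sum(ys) / len(ys)                      # constant prior mean
    alpha = solve(k, [y - mu0 for y in ys])
    beta = solve(k, ks)
    mean = mu0 + sum(a * b for a, b in zip(ks, alpha))
    var = max(1.0 - sum(a * b for a, b in zip(ks, beta)), 0.0)
    return mean, math.sqrt(var)


def bayes_opt(n_init=3, n_iter=8, kappa=2.0, seed=0):
    """Evaluate n_init random configs, then n_iter configs chosen by UCB."""
    rng = random.Random(seed)
    tried = rng.sample(CANDIDATES, n_init)
    scores = [evaluate(c) for c in tried]
    for _ in range(n_iter):
        def ucb(c):
            mean, std = gp_posterior(tried, scores, c)
            return mean + kappa * std            # explore where uncertain
        nxt = max((c for c in CANDIDATES if c not in tried), key=ucb)
        tried.append(nxt)
        scores.append(evaluate(nxt))
    best = max(range(len(tried)), key=scores.__getitem__)
    return tried[best], scores[best], tried


best_config, best_score, history = bayes_opt()
print(best_config, round(best_score, 3), len(history))
```

The point of the surrogate is that each real evaluation involves RTL generation and gate-level power analysis, so the loop spends its limited evaluation budget on configurations the model considers promising or uncertain rather than sweeping the whole grid.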

Encoding Schemes Evaluated: Rate Encoding vs. Time-To-First-Spike (TTFS)
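The difference between the two encodings can be sketched as follows. This is a simplified illustration, assuming inputs normalized to [0, 1], Bernoulli sampling for rate encoding, and a linear intensity-to-latency mapping for TTFS; the function names and the exact mappings are hypothetical and may differ from those used in the TENNLab and snnTorch setup.

```python
import random


def rate_encode(intensity, steps, rng):
    """Rate coding: at every time step, spike with probability = intensity."""
    return [1 if rng.random() < intensity else 0 for _ in range(steps)]


def ttfs_encode(intensity, steps):
    """TTFS coding: at most one spike; stronger inputs fire earlier."""
    train = [0] * steps
    if intensity > 0.0:
        # Linear latency: intensity 1.0 fires at step 0, weak inputs fire late.
        train[int((1.0 - intensity) * (steps - 1))] = 1
    return train


rng = random.Random(0)
for x in (1.0, 0.5, 0.1):
    rate = rate_encode(x, 8, rng)
    ttfs = ttfs_encode(x, 8)
    print(x, "rate spikes:", sum(rate), "ttfs spikes:", sum(ttfs), ttfs)
```

Under these assumptions a TTFS input emits at most one spike per sample, while a rate-coded input emits a number of spikes that grows with intensity and window length.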

Figure 1: Baseline SNAP-V SoC microarchitecture showing integration with the Cerebra-H neuromorphic accelerator.

4. Experiment Setup and Implementation

Baseline Platform: SNAP-V SoC with Cerebra-H NoC-based neuromorphic accelerator (1024 neurons).
Datasets: Iris & MNIST.
Neuron Model: Leaky Integrate-and-Fire (LIF).
Simulation Framework: TENNLab + snnTorch (for training).
Hardware Flow: RTL modifications for quantized weights and fixed-point arithmetic. OpenRAM for accurate 6T SRAM macro modeling. Full Synopsys tool flow for power estimation at 45 nm.
Optimization: Bayesian Optimization (Gaussian Process surrogate) over bit-width, QM.N format, and encoding scheme.
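For readers unfamiliar with the Leaky Integrate-and-Fire neuron model listed above, a minimal discrete-time sketch is given here. The leak factor, threshold, and reset value are illustrative assumptions, not Cerebra-H parameters, and the hardware operates on fixed-point values rather than Python floats.

```python
def lif_run(input_currents, leak=0.9, threshold=1.0, v_reset=0.0):
    """Simulate one LIF neuron and return its spike train (list of 0/1)."""
    v = 0.0                      # membrane potential
    spikes = []
    for current in input_currents:
        v = leak * v + current   # decay the old potential, integrate new input
        if v >= threshold:       # fire once the threshold is reached
            spikes.append(1)
            v = v_reset          # reset after the spike
        else:
            spikes.append(0)
    return spikes


print(lif_run([0.6, 0.6, 0.0, 0.0]))        # [0, 1, 0, 0]
print(lif_run([0.5] * 4, leak=0.5))          # [0, 0, 0, 0]: leak outpaces input
```

In this model the input current at each step is the sum of the synaptic weights of the presynaptic neurons that spiked, which is where the quantized weight memory enters the computation.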

Figure 2: Architectural breakdown and hardware profiling of the synaptic weight memory allocation subsystem.

5. Results and Analysis

Power Reduction: 50.4% (from 0.5001 W to 0.248 W)
Accuracy: 82.4% (MNIST dataset)
Optimal Config: Q12.4 + TTFS (best power-accuracy tradeoff)

Key Achievement Comparison

Configuration	Accuracy	Power (W)	Power Change vs. Baseline
Baseline (SNAP-V)	96.69%	0.5001	Reference
Optimized (Ours)	82.40%	0.2480	-50.4%
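The headline figures follow directly from the table. The short check below recomputes them from the reported values only; the variable names are chosen for illustration.

```python
# Values copied from the comparison table above.
baseline_power_w, optimized_power_w = 0.5001, 0.2480
baseline_acc_pct, optimized_acc_pct = 96.69, 82.40

# Relative power reduction and absolute accuracy change.
power_reduction_pct = 100.0 * (baseline_power_w - optimized_power_w) / baseline_power_w
accuracy_drop_pts = baseline_acc_pct - optimized_acc_pct

print(f"Power reduction: {power_reduction_pct:.1f}%")
print(f"Accuracy drop:   {accuracy_drop_pts:.2f} percentage points")
```

Reporting both numbers together makes the tradeoff explicit: the power saving is obtained alongside a drop in classification accuracy.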

Highlights:

Lowest Power: Q10.6 + TTFS → 0.246 W (79.0% accuracy)
Best Accuracy-Power Balance: Q12.4 + TTFS
Balanced QM.N formats (sufficient integer + fractional bits) effectively prevent clipping and underflow.
Significant reduction in both static (leakage) and dynamic power from smaller, more efficient weight memory.

Figure 3: Impact of varying quantization bit-widths and numerical representations on classification accuracy.

Figure 4: Pareto frontier analysis highlighting the system-level power-performance tradeoffs on the MNIST benchmark dataset.
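To make the notion of a Pareto frontier concrete, the sketch below applies a dominance check to the three operating points reported in this summary. It is only an illustration: the full set of configurations plotted in Figure 4 is not reproduced here, and the helper names are hypothetical.

```python
# (power in W, accuracy in %) for the operating points reported in this summary.
points = {
    "Baseline (SNAP-V)": (0.5001, 96.69),
    "Q12.4 + TTFS": (0.2480, 82.40),
    "Q10.6 + TTFS": (0.2460, 79.00),
}


def dominates(a, b):
    """True if a uses no more power and is no less accurate than b, and differs."""
    return a[0] <= b[0] and a[1] >= b[1] and a != b


def pareto_front(configs):
    """Names of the configurations that no other configuration dominates."""
    return [name for name, p in configs.items()
            if not any(dominates(q, p) for q in configs.values())]


print(pareto_front(points))
```

None of the three points dominates another, since each lower-power point is also less accurate, so all of them lie on the frontier of this small set. Which one counts as the best balance depends on how accuracy is weighed against power.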

6. Research Team

Malinga G.A.I. (E/20/242)
Ariyarathna D.B.S. (E/20/024)
Panawennage L.S. (E/20/279)

Supervisors: Prof. Roshan G. Ragel, Dr. Isuru Nawinne

Department of Computer Engineering, University of Peradeniya

7. Conclusion

This research demonstrates that systematic optimization of synaptic memory representation and access mechanisms is critical for ultra-low-power neuromorphic hardware. By combining hardware-aware quantization, fixed-point arithmetic, and Bayesian optimization, we achieved a 50.4% power reduction (0.5001 W to 0.248 W) at 82.4% accuracy, compared with 96.69% for the baseline. The identified patterns (clipping versus underflow in QM.N format selection) and the automated workflow provide a practical framework for future neuromorphic co-design efforts targeting edge AI devices.