Neuromorphic computing mimics the brain's neural structure, enabling event-driven, spike-based communication that overcomes the traditional von Neumann memory bottleneck. However, this architecture introduces its own memory challenge: synaptic memory typically occupies over 70% of total chip area and accounts for more than 80% of power consumption.
In the Cerebra-H accelerator of the SNAP-V SoC, weight memory alone consumes 95.97% (479.95 mW) of total system power. This work targets that bottleneck by optimizing memory across three dimensions: representation, organization, and access mechanisms.
Key hyperparameters investigated include synaptic weight bit-width, quantization strategy, and fixed-point (QM.N) format selection. Using a Bayesian optimization loop to navigate this design space, we identified a Q12.4 fixed-point configuration with Time-To-First-Spike (TTFS) encoding as the optimal operating point. It achieves 82.4% inference accuracy at 0.248 W, a power reduction of roughly 50% relative to the SNAP-V baseline (96.69% accuracy at 0.5001 W), at the cost of a 14.29 percentage point drop in accuracy.
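To make the QM.N notation concrete, the following is a minimal sketch of how a floating-point synaptic weight could be mapped onto a Q12.4 grid. It is an illustration under stated assumptions, not the quantizer used in this work: the function name `quantize_qmn` is hypothetical, M is assumed to count the integer bits including the sign bit (so Q12.4 is a 16-bit word), and round-to-nearest with saturation is assumed as the rounding policy.

```python
import numpy as np

def quantize_qmn(weights, m=12, n=4):
    """Quantize weights to a signed QM.N fixed-point grid (hypothetical helper).

    Assumption: M counts integer bits including the sign bit, so the stored
    word is M + N bits wide (16 bits for Q12.4). Other conventions exist.
    """
    scale = 1 << n                       # 2**N steps per unit, i.e. 16 for N = 4
    total_bits = m + n
    q_min = -(1 << (total_bits - 1))     # most negative two's-complement code
    q_max = (1 << (total_bits - 1)) - 1  # most positive two's-complement code

    scaled = np.asarray(weights, dtype=np.float64) * scale
    # Round to the nearest representable step, then saturate instead of wrapping.
    codes = np.clip(np.round(scaled), q_min, q_max).astype(np.int32)
    # Return the integer codes (what memory would hold) and their real values.
    return codes, codes / scale

codes, values = quantize_qmn([0.3, -1.26, 5000.0])
print(codes)   # integer codes as they would be stored in weight memory
print(values)  # dequantized weights, multiples of 1/16
```

Under these assumptions the resolution is 1/16 = 0.0625 and the representable range is [-2048, 2047.9375], so the out-of-range input 5000.0 saturates at the upper bound. Moving bits from M to N would trade range for resolution at the same word width, which is the kind of choice the format search has to make.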