A RISC-V SoC with Configurable Neuromorphic Acceleration for Small-Scale Spiking Neural Networks
Team
- E/19/129, Gunawardana K.H., e19129@eng.pdn.ac.lk
- E/19/275, Peeris M.S., e19275@eng.pdn.ac.lk
- E/19/309, Rambukwella H.M.W.K.G., e19309@eng.pdn.ac.lk
Supervisors
- Dr. Isuru Nawinne, isurunawinne@eng.pdn.ac.lk
- Prof. Roshan G. Ragel, roshanr@eng.pdn.ac.lk
Table of Contents
- Abstract
- Related Works
- Methodology
- Experiment Setup and Implementation
- Results and Analysis
- Conclusion
- Links
Abstract
This project focuses on designing a RISC-V System-on-Chip (SoC) integrated with a neuromorphic accelerator optimized for small-scale Spiking Neural Network (SNN) applications of approximately 1,000 neurons. The RISC-V processor manages the accelerator to enable low-power, low-latency edge computing, addressing the limitations of existing neuromorphic hardware, which is often resource-intensive and tailored to large-scale applications. The proposed architecture incorporates highly parallel processing nodes, a distributed memory system, and an efficient Network-on-Chip (NoC) for seamless communication, reducing latency and enhancing performance. The system aims to provide efficient, real-time AI capabilities for embedded systems and edge devices, with applications in robotics, image processing, natural language processing, and signal processing. By integrating on-chip learning with general-purpose computing, the design enhances versatility and energy efficiency, making SNNs more viable in resource-constrained environments.
Related Works
The development of Spiking Neural Networks (SNNs) and neuromorphic hardware has been extensively explored, with various learning techniques and architectures proposed to enhance their efficiency. Below is a summary of key findings from the literature review:
Learning Techniques for SNNs
- Supervised Learning: Methods like SpikeProp, SuperSpike, SLAYER, and surrogate-gradient techniques address the non-differentiable nature of spikes. For instance, SuperSpike uses membrane-potential derivatives for gradient-based learning, while SLAYER distributes error credit over time and supports axonal-delay learning. However, these methods face high computational complexity and memory pressure because they process multiple timesteps.
- ANN-to-SNN Conversion: This approach maps trained Artificial Neural Networks (ANNs) to SNNs, adjusting neuron firing thresholds to approximate ANN behavior. It is energy-efficient but sacrifices temporal dynamics, trading accuracy against inference latency.
- Unsupervised Learning: Spike-Timing-Dependent Plasticity (STDP) adjusts synaptic weights based on relative spike timing, with stable STDP (S-STDP) incorporating spike traces to prevent catastrophic forgetting. This method is memory-efficient but less accurate than supervised approaches (a minimal trace-based update is sketched after this list).
- Reinforcement Learning: Reward-modulated STDP (R-STDP) integrates reward signals, achieving up to 25-fold energy-efficiency improvements in robotics and edge applications.
- Hybrid Learning: Approaches like DECOLLE use local error functions for deep continuous learning, improving scalability and efficiency by avoiding global backpropagation.
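For concreteness, the fragment below sketches a pair-based STDP update with exponential spike traces, the mechanism that S-STDP builds on. The amplitudes, time constants, and clipping range are illustrative assumptions, not values from any cited work.

```python
import math

# Minimal pair-based STDP with exponential spike traces. All parameter
# values are illustrative assumptions, not taken from any cited system.
A_PLUS, A_MINUS = 0.01, 0.012    # potentiation / depression amplitudes
TAU_PRE, TAU_POST = 20.0, 20.0   # trace time constants (ms)
DT = 1.0                         # simulation timestep (ms)

def stdp_step(w, pre_spike, post_spike, pre_trace, post_trace):
    """One timestep of a single synapse's trace-based STDP update."""
    # Traces decay exponentially, then jump when their neuron spikes.
    pre_trace = pre_trace * math.exp(-DT / TAU_PRE) + pre_spike
    post_trace = post_trace * math.exp(-DT / TAU_POST) + post_spike
    if post_spike:                   # pre-before-post: potentiate
        w += A_PLUS * pre_trace
    if pre_spike:                    # post-before-pre: depress
        w -= A_MINUS * post_trace
    return min(max(w, 0.0), 1.0), pre_trace, post_trace

# Example: a presynaptic spike followed one step later by a postsynaptic
# spike nudges the weight upward.
w, pre, post = 0.5, 0.0, 0.0
w, pre, post = stdp_step(w, 1, 0, pre, post)
w, pre, post = stdp_step(w, 0, 1, pre, post)
```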
Hardware Architectures
- ASIC-Based Designs: Systems like Intel’s Loihi, IBM’s TrueNorth, and SpiNNaker offer high parallelism and energy efficiency but are optimized for large-scale applications, making them inefficient for small-scale SNNs. For example, Loihi provides 128 cores with 1,024 neurons each, while TrueNorth achieves ultra-low power for 1 million neurons.
- FPGA-Based Designs: Implementations like SpikeHard and Caspian provide reconfigurability for smaller-scale applications but often lack on-chip learning or flexibility for diverse neuron models. SpikeHard supports 259K neurons with a reported 170x energy-efficiency improvement, while Caspian operates at 10-20 mW.
- Software Frameworks: Tools like Nengo, Lava, and Brian 2 support SNN simulation and integration with hardware. Nengo excels in large-scale modeling, Lava in real-time processing, and Brian 2 in rapid prototyping for small-scale SNNs.
RISC-V SoC Designs
RISC-V’s open-source, modular Instruction Set Architecture (ISA) is ideal for neuromorphic SoCs. Designs like GAP-8 and Marsellus integrate accelerators for AI and IoT, achieving high efficiency (e.g., 12.4 TOPS/W for Marsellus). However, dedicated neuromorphic SoCs for small-scale SNNs remain underexplored, highlighting a gap that this project addresses.
On-Chip Communication
Network-on-Chip (NoC) architectures, such as hierarchical NoC (H-NoC) and mixed-mode routing, optimize sparse, event-driven SNN communication. For example, Carrillo et al.’s H-NoC supports up to 400 neurons per cluster with low power overhead, while Fang et al.’s GALS-based NoC reduces congestion using a 2D mesh topology.
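Routing details differ across these designs. Purely as an illustration of event-driven spike traffic, the sketch below encodes spikes as address events and forwards them hop by hop with dimension-ordered (XY) routing on a 2D mesh; the packet fields and the routing policy are assumptions made for this sketch, not specifics of the H-NoC or GALS designs above.

```python
from collections import namedtuple

# Illustrative address-event packet for a 2D-mesh spike NoC. The field
# names and XY routing policy are assumptions, not details of cited work.
SpikeEvent = namedtuple("SpikeEvent", ["src_neuron", "dst_x", "dst_y", "timestamp"])

def next_hop(cur_x, cur_y, ev):
    """Dimension-ordered (XY) routing: correct X first, then Y."""
    if cur_x != ev.dst_x:
        return cur_x + (1 if ev.dst_x > cur_x else -1), cur_y
    if cur_y != ev.dst_y:
        return cur_x, cur_y + (1 if ev.dst_y > cur_y else -1)
    return None  # the event has reached its destination router

# Example: trace an event from router (0, 0) to (2, 1).
pos, ev = (0, 0), SpikeEvent(src_neuron=7, dst_x=2, dst_y=1, timestamp=42)
while pos is not None:
    print(pos)
    pos = next_hop(*pos, ev)
```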
The literature underscores the need for configurable, resource-efficient neuromorphic hardware tailored for small-scale SNNs, integrated with general-purpose computing capabilities, which this project aims to address.
Methodology
The research methodology is structured into five phases, as outlined in the original proposal, to develop and validate the proposed RISC-V SoC with a neuromorphic accelerator. The approach is iterative, allowing refinements based on findings from each phase.
Phase 1: System Design and High-Level Modeling
Objective: Define the system architecture, including neuron structures, the neuromorphic accelerator, spike communication, and RISC-V processor integration.
Activities:
- Design a high-level system with a RISC-V-based SoC, incorporating peripheral interfaces and communication protocols.
- Conduct simulations using Python-based Nengo to model SNN behavior, defining parameters like neuron models, synaptic weights, and thresholds (a minimal model is sketched after this list).
- Establish performance metrics (e.g., neuron capacity, power consumption, latency) based on real-world applications like robotics and edge computing.
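As an example of the Phase 1 modeling flow, the fragment below builds a minimal Nengo model with LIF neurons, probes both raw spikes and the decoded output, and runs it for one second. The population size, input signal, and time constants are placeholders, not the project's chosen parameters.

```python
import nengo

# Minimal Nengo model of the kind used for high-level SNN exploration.
# All sizes and constants here are illustrative placeholders.
with nengo.Network() as model:
    stim = nengo.Node(lambda t: 0.5)              # constant test input
    ens = nengo.Ensemble(n_neurons=100, dimensions=1,
                         neuron_type=nengo.LIF(tau_rc=0.02))
    nengo.Connection(stim, ens)
    spikes = nengo.Probe(ens.neurons)             # raw spike trains
    decoded = nengo.Probe(ens, synapse=0.01)      # filtered decoded value

with nengo.Simulator(model) as sim:
    sim.run(1.0)                                  # simulate 1 s of activity

print(sim.data[spikes].shape, sim.data[decoded].shape)
```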
Phase 2: Hardware Implementation
Objective: Develop the neuromorphic accelerator and RISC-V SoC hardware.
Activities:
- Implement the accelerator with 1,024 neurons, a spike interconnect, and a NoC for initialization.
- Design an RV32IM-based RISC-V processor with an accelerator controller for SNN execution.
- Develop memory interfaces and sensor connectivity.
- Simulate and functionally verify the design, inspecting waveforms with GTKWave, with unit tests for individual modules checked against a software reference model (see the sketch below).
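One way to back such unit tests is a software golden model of the neuron datapath (accumulate, decay, threshold; see Implementation Details below). The sketch that follows is a minimal floating-point reference with assumed decay, threshold, and reset behavior, not the hardware's fixed-point implementation.

```python
# Software "golden model" of a leaky integrate-and-fire neuron step:
# accumulate weighted spikes, apply decay, compare against a threshold.
# The decay factor, threshold, and reset-to-zero scheme are assumptions.
def lif_step(potential, weighted_spikes_in, decay=0.9, threshold=100.0):
    """One timestep of the neuron: returns (new_potential, spike_out)."""
    potential = potential * decay + sum(weighted_spikes_in)
    if potential >= threshold:
        return 0.0, 1                    # fire and reset
    return potential, 0

# Tiny unit test of the kind a Verilog testbench would mirror.
pot, spikes = 0.0, []
for _ in range(20):
    pot, s = lif_step(pot, [12.0, 8.0])  # two active synapses per step
    spikes.append(s)
assert sum(spikes) > 0, "neuron should fire under sustained input"
```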
Phase 3: FPGA Prototyping
Objective: Validate the design on hardware.
Activities:
- Synthesize the design on an Altera DE2-115 FPGA board.
- Test real-time behavior, measuring energy consumption, execution latency, and SNN performance under real workloads.
- Conduct scalability testing to identify bottlenecks.
Phase 4: ASIC Design and Analysis
Objective: Synthesize and optimize the design for ASIC implementation.
Activities:
- Use Synopsys Design Compiler for ASIC synthesis, analyzing area, power, and timing.
- Perform power analysis with Synopsys PrimePower and timing verification with Synopsys PrimeTime.
- Optimize design parameters for performance and efficiency.
Phase 5: Iterative Refinement and Optimization
Objective: Enhance the design based on FPGA and ASIC evaluations.
Activities:
- Apply optimizations to neuron models, memory access, and power efficiency.
- Iterate the design cycle to address bottlenecks and meet target specifications.
- Validate the design against real-world applications, comparing metrics like latency and throughput with existing solutions.
The methodology focuses on five key areas: neuromorphic accelerator optimizations, accelerator initialization, spike I/O, RISC-V processor and SoC design, and on-chip learning (STDP, ANN-to-SNN conversion, spike-based backpropagation, and hybrid learning).
Experiment Setup and Implementation
Hardware and Software Tools
Languages:
- Verilog: Used for hardware design of the neuromorphic accelerator and RISC-V SoC.
- Python: Used with the Nengo framework for SNN modeling, simulation, and parameter extraction.
Simulation:
- GTKWave: For visualizing and debugging Verilog simulation outputs.
Hardware:
- Altera DE2-115 FPGA Board: For prototyping and testing the design.
- FPGA-Based Server: For additional simulations and scalability testing.
Tools:
- Quartus II: For FPGA synthesis, programming, and resource utilization analysis.
- Synopsys Design Compiler: For ASIC synthesis and generation of time, power, and area metrics.
- Synopsys PrimePower: For accurate power analysis.
- Synopsys PrimeTime: For timing analysis.
Implementation Details
- Neuron Design: Comprises control logic, a network interface, an accumulator, a potential adder, and a potential decay unit. The accumulator sums incoming weighted spikes, the decay unit applies leak to the membrane potential, and the updated potential is compared against a threshold to generate output spikes.
- Neuromorphic Accelerator: Includes a spike interconnect for neuron communication and a hierarchical NoC for management and initialization. It supports 1,024 neurons with distributed memory for weights and parameters.
- Accelerator Controller: Mediates communication between the RISC-V processor and the accelerator using custom instructions, featuring a DMA controller, a network interface, and a spike I/O unit (an illustrative host-side control flow is sketched after this list).
- RISC-V SoC: Consists of two RV32IM cores: one for neuromorphic tasks (initialization, inference, learning) and one for general-purpose tasks (sensor interfacing, control decisions). The SoC includes on-chip memory, a ROM for the BIOS, and peripherals (I2C, SPI, GPIO) connected via an APB bus.
- On-Chip Learning: Supports STDP for unsupervised learning, ANN-to-SNN conversion for supervised learning, spike-based backpropagation, and hybrid learning for optimized performance.
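To illustrate how the neuromorphic core might drive the accelerator controller, the sketch below performs a DMA-based weight load followed by an inference run. The register map, control bits, and polling scheme are invented for illustration only and do not describe the project's actual interface.

```python
# Hypothetical memory-mapped register interface for the accelerator
# controller. Every offset and bit below is an invented placeholder.
REG_CTRL, REG_STATUS, REG_DMA_ADDR, REG_DMA_LEN = 0x00, 0x04, 0x08, 0x0C
CTRL_LOAD_WEIGHTS, CTRL_START, STATUS_DONE = 0x1, 0x2, 0x1

class FakeMmio:
    """Stand-in for memory-mapped I/O so the sketch runs standalone."""
    def __init__(self):
        self.regs = {}
    def write(self, off, val):
        self.regs[off] = val
        if off == REG_CTRL:              # pretend the work finishes at once
            self.regs[REG_STATUS] = STATUS_DONE
    def read(self, off):
        return self.regs.get(off, 0)

def run_inference(mmio, weights_addr, weights_len):
    """DMA the weights into the accelerator, then start spike processing."""
    mmio.write(REG_DMA_ADDR, weights_addr)
    mmio.write(REG_DMA_LEN, weights_len)
    mmio.write(REG_CTRL, CTRL_LOAD_WEIGHTS)
    while not mmio.read(REG_STATUS) & STATUS_DONE:
        pass                             # poll for DMA completion
    mmio.write(REG_CTRL, CTRL_START)
    while not mmio.read(REG_STATUS) & STATUS_DONE:
        pass                             # poll for end of inference

run_inference(FakeMmio(), weights_addr=0x20000000, weights_len=4096)
```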
The implementation will be validated through simulations, FPGA prototyping, and ASIC synthesis, with performance metrics collected for latency, power consumption, and computational accuracy.
Results and Analysis
As the project is in the proposal stage, specific results are not yet available. However, the expected outcomes and analysis plan are outlined below:
Performance Metrics:
- Latency: Measure execution latency for SNN workloads, comparing with existing neuromorphic systems like Loihi and SpiNNaker.
- Power Consumption: Evaluate energy efficiency using Synopsys PrimePower, targeting lower power than large-scale ASIC designs (a common normalization is sketched after this list).
- Resource Utilization: Assess FPGA and ASIC resource usage (area, memory) using Quartus II and Synopsys Design Compiler.
- Accuracy: Test SNN model accuracy in applications like robotics and edge computing, comparing STDP, ANN-to-SNN conversion, and hybrid learning approaches.
- Scalability: Analyze performance under varying workloads to ensure the design scales efficiently for small-scale SNNs.
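For cross-platform comparison, neuromorphic energy results are commonly normalized to energy per synaptic operation (pJ/SOP). The snippet below shows that normalization with placeholder numbers; no measured results exist yet.

```python
# Normalize a power measurement to energy per synaptic operation (pJ/SOP),
# a common figure of merit for neuromorphic hardware. All values below are
# placeholders, not measured results.
avg_power_mw = 15.0    # average power during the workload (mW)
runtime_s = 2.0        # workload duration (s)
synaptic_ops = 1.2e9   # synaptic events processed in that window

energy_pj_per_sop = (avg_power_mw * 1e-3 * runtime_s) / synaptic_ops * 1e12
print(f"{energy_pj_per_sop:.1f} pJ/SOP")   # -> 25.0 pJ/SOP
```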
The analysis will use quantitative methods, comparing metrics against existing solutions and baseline models. Real-world applicability will be evaluated through test cases in robotics and edge computing, ensuring the design meets resource-constrained requirements.
Conclusion
This research proposes a novel RISC-V SoC with a configurable neuromorphic accelerator tailored for small-scale SNNs, addressing the inefficiencies of existing large-scale neuromorphic hardware. By integrating a RISC-V processor, a hierarchical NoC, and on-chip learning capabilities (STDP, ANN-to-SNN conversion, hybrid learning), the design ensures low-power, low-latency performance for edge computing applications. The methodology, spanning system design, hardware implementation, FPGA prototyping, ASIC synthesis, and iterative refinement, provides a robust framework for developing and validating the system. Expected contributions include a resource-efficient, versatile neuromorphic platform that enhances real-time AI capabilities in robotics, image processing, and signal processing. Future work will focus on optimizing the design based on FPGA and ASIC results, exploring additional learning algorithms, and expanding application scenarios.