Deriving inspiration from the human brain: neuromorphic computing

Author: David Pollington, Head of Research


AI has been on a phenomenal rise, but its capabilities come at a cost: the compute required for training and inference, and the associated energy consumption. Whilst numerous techniques are being devised to shrink AI/ML models for deployment on edge devices, realising AI’s full potential will require a step-change in compute efficiency (performance per watt).

The conventional approach to boosting compute performance is to pack more transistors into each new generation of chips (silicon scaling). This has worked well over the past few decades, but it is becoming increasingly difficult and is likely to hit a wall as energy consumption and heat dissipation become limiting factors.

Whilst the energy efficiency of computers has improved by a remarkable 15 orders of magnitude over the past 80 years, the energy required to flip a bit has plateaued. It is still around 1,000x above the Landauer limit, which is the theoretical minimum energy needed to erase a bit of information. Sending that bit across a chip consumes a further 10,000x in energy.

Switching energy per transistor
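The snippet below gives the Landauer limit a concrete value. It evaluates the limit as k_B·T·ln 2 at an assumed room temperature of 300 K, then scales it by the 1,000x and 10,000x factors quoted above. The code reads "a further 10,000x" as a multiplier on the bit-flip energy, which is an interpretation. The outputs are order-of-magnitude illustrations, not measurements.

```python
import math

# Boltzmann constant in J/K (exact in the 2019 SI redefinition)
K_B = 1.380649e-23


def landauer_limit(temperature_k: float) -> float:
    """Minimum energy in joules to erase one bit at a given temperature: k_B * T * ln 2."""
    return K_B * temperature_k * math.log(2)


e_min = landauer_limit(300.0)   # assumed room temperature of 300 K
e_flip = 1_000 * e_min          # article: flipping a bit is ~1,000x above the limit
e_move = 10_000 * e_flip        # assumption: moving the bit costs a further 10,000x

for label, energy in [("Landauer limit", e_min), ("bit flip", e_flip), ("move across chip", e_move)]:
    print(f"{label:>16}: {energy:.1e} J")
```

At 300 K the limit works out at roughly 2.9 × 10⁻²¹ J per bit.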

With physics (silicon scaling) delivering progressively slower gains, researchers are exploring novel architectures that go beyond the limitations of conventional processors to profoundly shift the trajectory for compute efficiency.  In particular, researchers have been taking inspiration from the human brain.

The human brain uses roughly 100 billion neurons, interconnected by more than 100 trillion synapses, to simultaneously compute, reason and store information. Yet it consumes a mere 20 W. This equates to a power density of around 10 mW/cm². A modern processor running an artificial neural network (ANN) requires 100 W/cm², four orders of magnitude more.
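The four-orders-of-magnitude gap follows directly from the two power densities quoted above. The quick check below confirms it. It also divides the brain’s 20 W budget evenly across ~100 billion neurons; that per-neuron figure is a crude average, included purely for illustration.

```python
import math

# Figures quoted in the article
brain_power_w = 20.0
brain_neurons = 100e9
brain_density_w_per_cm2 = 10e-3      # 10 mW/cm²
processor_density_w_per_cm2 = 100.0  # 100 W/cm²

gap = processor_density_w_per_cm2 / brain_density_w_per_cm2
print(f"power-density gap: {gap:,.0f}x ({math.log10(gap):.0f} orders of magnitude)")

# Crude average power per neuron, purely illustrative
print(f"average power per neuron: {brain_power_w / brain_neurons:.1e} W")
```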

At a structural level, ANNs comprise a series of nodes interconnected by weighted links that resemble the neurons and synapses of the brain, but they’re still far from a biological brain in architecture and function.

An important difference is in how the architecture treats storage and compute. The majority of computers follow the von Neumann architecture. In this design, memory and processing are separated, and data is fetched from memory when required for processing.

But as noted previously, shuttling data to and from the processor consumes energy and introduces unwanted latency, the so-called von Neumann bottleneck. The brain sidesteps this issue by co-locating storage and compute and treating them as one.

Another major difference is in how information is encoded and processed.  Within ANN implementations, input data is encoded as discrete scalars (e.g., pixel values in an image) and clocked through the ANN in a deterministic, synchronous manner to produce an output. 

The brain, though, operates on continuous input values (shaped waves) and is stochastic and asynchronous in nature. It encodes information through the timing, magnitude and shape of the spikes that fire between neurons.
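Rate coding is one common way for SNNs to turn a scalar such as a pixel value into spikes. Here the value sets the probability of a spike at each time step. The sketch below is a minimal, assumed illustration in plain Python. It captures only spike-count statistics, not the magnitude or shape of biological spikes, and the helper `rate_encode` is hypothetical. An ANN, by contrast, would simply take the normalised pixel value as a single input.

```python
import random


def rate_encode(value: float, num_steps: int, rng: random.Random) -> list[int]:
    """Encode a scalar in [0, 1] as a spike train: at each time step the
    neuron spikes (1) with probability equal to the value, otherwise stays silent (0)."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be in [0, 1]")
    return [1 if rng.random() < value else 0 for _ in range(num_steps)]


rng = random.Random(42)                 # fixed seed so the run is repeatable
pixel = 200 / 255                       # an 8-bit pixel intensity, normalised
spikes = rate_encode(pixel, num_steps=100, rng=rng)
print("".join("|" if s else "." for s in spikes))
print(f"firing rate = {sum(spikes) / len(spikes):.2f} (target {pixel:.2f})")
```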

Achieving a similar energy efficiency to the human brain requires a change in both compute architecture and information encoding that more closely mimics the structure and function of the human brain – neuromorphic computing.

Neuromorphic computing

Whilst not a new concept, neuromorphic computing has been receiving a lot of attention lately. It promises to reduce the energy, latency and learning complexity of ANNs. In the last three years alone, there have been over 4.1 million patents filed on neuromorphic computing, with IBM being one of the leaders.

A neuromorphic approach typically incorporates several of the following architecture design principles:

Non-von Neumann & distributed (neurons)

Memory and compute are combined in individual, highly distributed processing units analogous to neurons, which are highly interconnected like the brain’s synapses – mitigating the von Neumann bottleneck.

Inherently parallel

All processing units can be operated simultaneously – massive parallel computation.

Inherently scalable

Blocks of processing units can be combined to create a single large system for running larger and larger ANNs – inherently scalable.

Event-driven

Individual processing units are inherently idle until there is work to be done – event-driven. In conventional ANNs, the neurons and associated processor logic are continuously on. Power gating can switch off parts of the chip, but this doesn’t fully exploit the temporally sparse activity of ANNs. An event-driven approach keeps processing units idle until they have work to do, and this can reduce energy consumption by ~2 orders of magnitude compared with typical ANN implementations.
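The toy model below illustrates the event-driven principle by counting operations when only a small fraction of inputs are active. A clocked update touches every input on every step. An event-driven update touches only inputs that produced an event. The 2% activity level, the weights and the function names are hypothetical. The model counts operations rather than energy and makes no claim about how any particular chip handles events.

```python
def dense_update(inputs, weights):
    """Clocked style: every input is processed on every step, even when it is zero."""
    total, ops = 0.0, 0
    for x, w in zip(inputs, weights):
        total += x * w
        ops += 1
    return total, ops


def event_driven_update(events, weights):
    """Event-driven style: only inputs that produced an event trigger any work.
    `events` is a list of (input_index, value) pairs, e.g. from an event-based sensor."""
    total, ops = 0.0, 0
    for idx, x in events:
        total += x * weights[idx]
        ops += 1
    return total, ops


n_inputs = 1_000
weights = [((i % 7) - 3) * 0.1 for i in range(n_inputs)]

# Hypothetical sparse activity: only 2% of inputs are active this step
events = [(i, 1.0) for i in range(0, n_inputs, 50)]
inputs = [0.0] * n_inputs
for idx, x in events:
    inputs[idx] = x

dense_total, dense_ops = dense_update(inputs, weights)
event_total, event_ops = event_driven_update(events, weights)
print(f"dense: {dense_ops} ops, event-driven: {event_ops} ops, "
      f"same result: {abs(dense_total - event_total) < 1e-12}")
```

Both functions compute the same result, but the event-driven version does 20 operations instead of 1,000 because silent inputs cost nothing.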

Neuromorphic compute also takes a different approach to information encoding.  Today’s ANN implementations are typically state-based and synchronous.  The brain is different, conveying information through the use of spikes.

Many neuromorphic implementations therefore employ Spiking Neural Networks (SNNs) that emulate the brain by encoding information similarly through the timing, magnitude, and shape of spikes.  

Moreover, they adopt an event-driven methodology in which neurons fire only when needed, rather than at every propagation cycle as ANN neurons do. When they do fire, they trigger a huge number of parallel operations via the distributed and highly connected processing units described earlier.
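The leaky integrate-and-fire (LIF) model is a widely used simplified spiking neuron in SNN research. It shows how a neuron can integrate input over time and fire only when a threshold is crossed. The sketch below is a minimal discrete-time version. The decay factor `beta` and the threshold are illustrative assumptions, and the code does not reproduce the neuron model of any specific chip such as Loihi.

```python
def lif_neuron(input_current, beta=0.9, threshold=1.0):
    """Discrete-time leaky integrate-and-fire (LIF) neuron.

    Each step the membrane potential leaks (multiplied by beta), integrates the
    input, and if it reaches the threshold the neuron emits a spike and resets to 0.
    Returns the spike train and the membrane potential after each step.
    """
    v = 0.0
    spikes, potentials = [], []
    for current in input_current:
        v = beta * v + current   # leak, then integrate
        if v >= threshold:
            spikes.append(1)     # fire...
            v = 0.0              # ...and reset
        else:
            spikes.append(0)
        potentials.append(v)
    return spikes, potentials


steady_spikes, _ = lif_neuron([0.3] * 20)   # constant drive: fires periodically
silent_spikes, _ = lif_neuron([0.0] * 20)   # no input: no spikes, so no events downstream
print("".join("|" if s else "." for s in steady_spikes))
print("".join("|" if s else "." for s in silent_spikes))
```

With a constant input of 0.3, the potential crosses the threshold every fourth step. With no input, the neuron never fires and so generates no downstream work.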

Mechanisms such as spike-timing-dependent plasticity (STDP) adjust the strength of connections between neurons based on the relative timing of their spikes. This enables the SNN to learn from temporal patterns in the data, much as humans learn from experiences over time.
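One standard textbook form of STDP is the pair-based rule with exponential timing windows. If the presynaptic spike arrives shortly before the postsynaptic spike, the connection is strengthened; if it arrives shortly after, the connection is weakened. The sketch below implements that rule. The amplitudes, time constants and weight bounds are illustrative assumptions, not values from any particular neuromorphic platform.

```python
import math


def stdp_delta_w(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight change for one pre/post spike pair (times in ms).

    If the presynaptic spike precedes the postsynaptic one, the connection is
    strengthened; if it follows, the connection is weakened. The effect decays
    exponentially with the time difference.
    """
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)     # causal pair: potentiate
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)   # anti-causal pair: depress
    return 0.0


def apply_stdp(weight, pre_times, post_times, w_min=0.0, w_max=1.0):
    """Accumulate the STDP update over all spike pairs, then clip to [w_min, w_max]."""
    for t_pre in pre_times:
        for t_post in post_times:
            weight += stdp_delta_w(t_pre, t_post)
    return min(max(weight, w_min), w_max)


print(apply_stdp(0.5, pre_times=[10.0, 50.0], post_times=[15.0, 55.0]))  # pre leads post
print(apply_stdp(0.5, pre_times=[15.0, 55.0], post_times=[10.0, 50.0]))  # post leads pre
```

When the presynaptic neuron consistently fires just before the postsynaptic one, the weight rises above 0.5. Reversing the order lowers it.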

The key differences between today’s von Neumann compute architecture and the neuromorphic alternative are summarised below:

Source: https://www.nature.com/articles/s43588-021-00184-y#Fig1

Neuromorphic implementation options

An easy way to explore and implement a neuromorphic approach is through software. Intel’s open-source Lava framework, for example, is designed for asynchronous, event-based processing and enables a trained ANN to be converted to an SNN for execution on standard GPUs.

Such an approach promises gains in performance and energy. The SNN only needs to perform accumulate operations, integrating incoming spikes up to the firing threshold, rather than the multiply-accumulate (MAC) operations intrinsic to an ANN.
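The accumulate-versus-MAC distinction can be seen in a few lines. An ANN neuron multiplies each activation by its weight and accumulates the products. With binary spikes, a spike simply adds its weight and silence adds nothing, so the multiplication disappears. The sketch below is a generic illustration with hypothetical function names and weights. It does not use Lava’s API and does not perform ANN-to-SNN conversion.

```python
def ann_weighted_sum(activations, weights):
    """ANN neuron input: one multiply-accumulate (MAC) per connection."""
    total = 0.0
    for a, w in zip(activations, weights):
        total += a * w                 # multiply, then accumulate
    return total


def snn_weighted_sum(spikes, weights):
    """SNN neuron input for binary spikes: a spike adds its weight, silence adds
    nothing, so no multiplications are needed, only accumulates."""
    total = 0.0
    for s, w in zip(spikes, weights):
        if s:
            total += w                 # accumulate only
    return total


weights = [0.2, -0.5, 0.4, 0.1, 0.9]
spikes = [1, 0, 1, 1, 0]              # which inputs spiked this time step

print(ann_weighted_sum(spikes, weights), snn_weighted_sum(spikes, weights))
```

Both give the same result for binary inputs. For real-valued activations, the ANN still needs the multiply.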

Having said that, this software approach doesn’t leverage all the inherent computational capabilities of SNNs, in particular the temporal aspect. It is also ultimately limited by the von Neumann architecture of the GPU it runs on.

Efforts are therefore being made to move instead to purpose-built neuromorphic silicon.

Intel, for instance, have developed the Loihi neuromorphic processor, which in its latest guise (Loihi 2) provides a million computational neurons for complex tasks such as pattern recognition and sensory data processing.

Loihi takes an event-based, asynchronous approach in which neurons carry information in both the timing and magnitude of digitally represented spikes. This makes it extremely energy efficient.

Intel “Hala Point” neuromorphic system

With Hala Point, Intel have created a neuromorphic system comprising over a thousand Loihi 2 processors. Together they provide over a billion neurons, equivalent in size to an owl’s brain. The system can solve optimisation problems 50x faster than classical compute while using 100x less energy.

SpiNNcloud, a spinout from Technische Universität (TU) Dresden, is another player in the neuromorphic space. Building on research carried out at TU Dresden within the EU Human Brain Project, the company is developing a low-latency, energy-efficient cognitive AI platform. The platform combines deep learning, symbolic AI and neuromorphic computing for a range of AI applications, with the aim of emulating at least 5 billion neurons.

In a similar vein, BrainChip’s Akida neuromorphic chip combines event-based processing with near-memory compute to target edge applications including advanced driver assistance systems, drones and IoT devices.

Polyn takes a different route, combining a fixed analog stage that pre-processes sensor input signals with a digital stage providing application-dependent processing to target ultra-low power devices such as wearables.

And finally, IBM’s NorthPole chip is an inference-only accelerator comprising 256 computing cores, each with its own memory (memory near compute). It minimises data movement and uses lower-precision parameters (e.g., INT8, 4-bit and 2-bit), dispensing with the high precision typically required for ML training. As a result, it has shown high energy efficiency in image classification.

Source: https://modha.org
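The article doesn’t describe how NorthPole quantises its parameters. As a generic illustration of what 8-, 4- and 2-bit parameters mean, the sketch below applies plain symmetric, per-tensor uniform quantisation. This is an assumed scheme, not IBM’s, and the example weights are hypothetical. With this simple symmetric scheme, 2 bits leave only the codes −1, 0 and +1.

```python
def quantize_symmetric(weights, bits):
    """Symmetric uniform quantisation of floats to signed `bits`-bit integer codes.

    Returns the codes and the scale factor that maps a code back to a float.
    """
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8-bit, 1 for 2-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [q * scale for q in codes]


weights = [0.82, -0.41, 0.05, -0.97, 0.33]
for bits in (8, 4, 2):
    codes, scale = quantize_symmetric(weights, bits)
    error = max(abs(w - r) for w, r in zip(weights, dequantize(codes, scale)))
    print(f"{bits}-bit codes {codes}, max error {error:.3f}")
```

Fewer bits mean smaller parameters and less data to move, at the cost of a larger approximation error.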

Notwithstanding their differences, these approaches all fit under the neuromorphic banner, and all set out to improve performance and energy efficiency compared to existing solutions.

In practice, though, implementation complexities can negate some of the theoretical benefits. For instance, SNNs use a simpler accumulate operation than the MAC in conventional ANNs. However, the overhead of supporting sparse activation in time- and event-driven computations can result in greater energy usage than today’s highly optimised GPUs and ANN accelerators.

Cost is another important factor. The tight integration of memory and compute in neuromorphic architectures minimises data transfer, thereby reducing energy. On the flip side, it requires the memory to be fabricated using the same expensive logic processes as the processing units, which can cost 100x more than the off-chip DRAM used in conventional architectures.

Given these considerations, neuromorphic computing is unlikely to outperform GPUs and ANN accelerators at large scale in data centres. It might, though, be well suited to small-scale, edge-based applications such as voice and gesture recognition. It could also suit mission-critical sensing and robotics applications, where its low energy consumption and real-time capabilities would be a real benefit.

Other areas being explored include its use within software-defined radio for power-efficient edge devices, and leveraging the massively parallel, event-driven nature of neuromorphic for graph algorithms and optimisation tasks in real-world data modelling.

Neuromorphic is certainly not a panacea for delivering a fundamental step-change in compute efficiency, but in specific targeted applications it shows a lot of promise, and hence represents an interesting area for startups to explore.

Future articles will explore other new compute architectures including in-memory, analog, and of course quantum computing, as well as a few others on the horizon.
