AI has been on a phenomenal rise.  But this capability comes at a cost in terms of the compute required for training and inference, and the associated energy consumption.  Whilst numerous techniques are being devised for shrinking AI/ML for deployment on edge devices, realising AI’s full potential will require a step-change in compute efficiency (performance/Watt).

The conventional approach of packing more transistors into each new generation of chip (silicon scaling) to boost compute performance has worked well over the past few decades, but it is becoming increasingly challenging and is likely to hit a wall, with energy consumption and heat dissipation becoming the limiting factors.

Whilst the energy efficiency of computers has improved by a remarkable 15 orders of magnitude over the past 80 years, the energy required to flip a bit has plateaued and remains roughly 1,000x above the Landauer limit, the theoretical minimum energy for computation derived from information theory. Sending that bit across a chip consumes around another 10,000x on top.

Figure: Switching energy per transistor

With physics (silicon scaling) delivering progressively slower gains, researchers are exploring novel architectures that go beyond the limitations of conventional processors to profoundly shift the trajectory for compute efficiency.  In particular, researchers have been taking inspiration from the human brain.

The human brain operates 100 billion neurons interconnected by more than 100 trillion synapses to simultaneously compute, reason and store information, and yet consumes a mere 20W. This equates to a power density of around 10mW/cm2, compared with the 100W/cm2 of a modern processor running an artificial neural network (ANN) – 4 orders of magnitude more.

At a structural level, ANNs comprise a series of nodes interconnected by weighted links that resemble the neurons and synapses of the brain, but they’re still far from a biological brain in architecture and function.

An important difference is in how the architecture treats storage and compute.  The majority of computers follow the von Neumann architecture in which data and compute are logically separated and data is fetched when required for processing. 

But as noted previously, shuttling data to and from the processor consumes energy and introduces unwanted latency – the so-called von Neumann bottleneck. The brain sidesteps this issue by co-locating storage and compute and treating them as one and the same.

Another major difference is in how information is encoded and processed.  Within ANN implementations, input data is encoded as discrete scalars (e.g., pixel values in an image) and clocked through the ANN in a deterministic, synchronous manner to produce an output. 

The brain though operates on continuous input values (shaped waves), is stochastic and asynchronous in nature, and effectively encodes information through the timing, magnitude, and shape of spikes that fire between the neurons.

Achieving a similar energy efficiency to the human brain requires a change in both compute architecture and information encoding that more closely mimics the structure and function of the human brain – neuromorphic computing.

Neuromorphic computing

Whilst not a new concept, neuromorphic computing has been receiving a lot of attention lately due to its promise of reducing computational energy, latency and learning complexity in ANNs. The last few years alone have seen a surge in patent filings on neuromorphic computing, with IBM being one of the leaders.

A neuromorphic approach typically incorporates several of the following architecture design principles:

Non von Neumann & distributed (neurons)

Memory and compute are combined in individual, highly distributed processing units analogous to neurons, highly interconnected in a manner similar to the brain's synapses – mitigating the von Neumann bottleneck.

Inherently parallel

All processing units can operate simultaneously, delivering massively parallel computation.

Inherently scalable

Blocks of processing units can be combined to create a single large system capable of running ever larger ANNs.

Event-driven

Individual processing units remain idle until there is work to be done. In conventional ANN implementations, the neurons and associated processor logic are continuously on, and whilst power gating can be used to switch off parts of the chip, this doesn't fully exploit the temporally sparse nature of ANNs. With an event-driven approach, the individual processing units are only active when needed, reducing energy consumption by around 2 orders of magnitude compared with typical ANN implementations.

Neuromorphic compute also takes a different approach to information encoding.  Today’s ANN implementations are typically state-based and synchronous.  The brain is different, conveying information through the use of spikes.

Many neuromorphic implementations therefore employ Spiking Neural Networks (SNNs) that emulate the brain by encoding information similarly through the timing, magnitude, and shape of spikes.  

Moreover, they adopt an event-driven methodology whereby the neurons only fire when needed rather than at every propagation cycle as ANN neurons do, and when they do fire, they trigger a huge number of parallel operations via the distributed and highly connected processing units described earlier.

Mechanisms such as spike-timing-dependent plasticity (STDP) adjust the strength of connections between neurons based on the timing of their spikes thereby enabling the SNN to learn from temporal patterns in the data, mimicking the way humans learn from experiences over time.
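As a rough illustration of the idea (and not the learning rule of any particular chip), a pair-based STDP update strengthens a synapse when the pre-synaptic spike precedes the post-synaptic spike and weakens it when the order is reversed, with an exponential dependence on the time difference. A minimal sketch in Python with illustrative time constants and learning rates:

```python
import numpy as np

# Minimal pair-based STDP sketch; learning rates and time constant are
# illustrative, not the rule used by any particular neuromorphic chip.
A_PLUS, A_MINUS = 0.01, 0.012   # potentiation / depression learning rates
TAU = 20.0                      # time constant (ms)

def stdp_update(weight, t_pre, t_post):
    """Return the updated synaptic weight for one pre/post spike pair."""
    dt = t_post - t_pre
    if dt > 0:                  # pre fired before post: strengthen (causal)
        weight += A_PLUS * np.exp(-dt / TAU)
    else:                       # post fired before (or with) pre: weaken
        weight -= A_MINUS * np.exp(dt / TAU)
    return float(np.clip(weight, 0.0, 1.0))

# Example: a synapse repeatedly seeing pre->post spikes 5 ms apart is strengthened.
w = 0.5
for _ in range(10):
    w = stdp_update(w, t_pre=0.0, t_post=5.0)
```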

The key differences between today’s von Neumann compute architecture and the neuromorphic alternative are summarised below:

Source: https://www.nature.com/articles/s43588-021-00184-y#Fig1

Neuromorphic implementation options

An easy way to explore and implement a neuromorphic approach is through software.  Intel’s open source Lava framework, for example, is designed for asynchronous event-based processing and enables a trained ANN to be converted to an SNN for execution on standard GPUs.

Such an approach offers promising performance and energy reduction by only needing to perform accumulate computations (at the spiking threshold) rather than the multiply and accumulate (MAC) computations intrinsic within an ANN.
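To make the accumulate-versus-MAC distinction concrete, here is a minimal, illustrative sketch of a spiking (integrate-and-fire) layer in Python: because incoming spikes are binary, each active synapse simply adds its weight to the neuron's membrane potential, with no multiplications over activations required. The sizes, leak and threshold below are arbitrary placeholders.

```python
import numpy as np

# Illustrative integrate-and-fire layer: because input spikes are binary, each
# active synapse simply accumulates its weight -- no multiply-and-accumulate.
rng = np.random.default_rng(0)
weights = rng.normal(size=(784, 128))    # synaptic weights (inputs x neurons)
THRESHOLD, LEAK = 1.0, 0.9               # arbitrary placeholder values

def step(v, input_spikes):
    """Advance the layer one timestep; returns (new potentials, output spikes)."""
    v = LEAK * v + weights[input_spikes.astype(bool)].sum(axis=0)   # accumulate only
    out_spikes = v >= THRESHOLD          # fire where the threshold is crossed
    v = np.where(out_spikes, 0.0, v)     # reset the neurons that fired
    return v, out_spikes

# One timestep with ~5% of inputs spiking (temporally sparse input).
v = np.zeros(128)
spikes_in = rng.random(784) < 0.05
v, spikes_out = step(v, spikes_in)
```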

Having said that, this software-level approach doesn't exploit all of the inherent computational capabilities of SNNs (in particular the temporal aspect), and it is ultimately limited by the von Neumann architecture of the GPU it runs on.

Efforts are therefore being made to move instead to purpose-built neuromorphic silicon.

Intel, for instance, have developed the Loihi neuromorphic processor, which in its latest guise (Loihi 2) provides a million computational neurons for complex tasks such as pattern recognition and sensory data processing.

By taking an event-based, asynchronous approach in which neurons carry information in both the timing and the magnitude of digitally-represented spikes, Loihi is extremely energy efficient.

Intel “Hala Point” neuromorphic system

With Hala Point, Intel have created a neuromorphic system comprising over a thousand Loihi 2 processors to achieve a billion neurons – equivalent in size to an owl's brain – able to solve optimisation problems 50x faster than classical compute whilst using 100x less energy.

SpiNNcloud, a spinout from TU Dresden, is another player in the neuromorphic space. Leveraging research developed at the university within the EU Human Brain Project, it is building a low-latency, energy-efficient cognitive AI platform that combines deep learning, symbolic AI and neuromorphic computing for a range of AI applications, with the aim of emulating at least 5 billion neurons.

In a similar vein, Brainchip’s Akida neuromorphic chip combines event-based processing with near-memory compute to target edge applications including advanced driver assistance systems, drones, and IoT devices.

Polyn takes a different route, combining a fixed analog stage that pre-processes sensor input signals with a digital stage providing application-dependent processing to target ultra-low power devices such as wearables.

And finally, IBM's NorthPole chip is an inference-only accelerator comprising 256 computing cores, each of which contains its own memory (memory near compute). By minimising data movement and using lower precision (e.g., 8-, 4- and 2-bit parameters), it has demonstrated high energy efficiency on image classification whilst dispensing with the high precision typically required for ML training.

Source: https://modha.org

Notwithstanding their differences, these approaches all fit under the neuromorphic banner, and all set out to improve performance and energy efficiency compared to existing solutions.

In practice though, complexities in implementation can negate some of the theoretical benefits. For instance, whilst SNNs involve a simpler accumulation operation than the MAC used in conventional ANNs, the overhead of supporting sparse activations in time- and event-driven computation can result in greater energy usage than today's highly optimised GPUs and ANN accelerators.

Cost is another important factor.  The tight integration of memory and compute in neuromorphic architectures minimises data transfer thereby reducing energy, but on the flip side requires the memory to be fabricated using the same expensive logic processes as the processing units, and this can be 100x the cost of the off-chip DRAM used in conventional architectures.

Given these considerations, neuromorphic is unlikely to outperform GPUs and ANN accelerators at large scale in data centres. It might though be well suited to small-scale, edge-based applications such as voice and gesture recognition, and to mission-critical sensing and robotic applications where its low energy consumption and real-time capabilities would be a real benefit.

Other areas being explored include its use within software-defined radio for power-efficient edge devices, and leveraging the massively parallel, event-driven nature of neuromorphic for graph algorithms and optimisation tasks in real-world data modelling.

Neuromorphic is certainly not a panacea for delivering a fundamental step-change in compute efficiency, but in specific targeted applications it shows a lot of promise, and hence represents an interesting area for startups to explore.

Future articles will explore other new compute architectures including in-memory, analog, and of course quantum computing, as well as a few others on the horizon.

The focus of AI and ML innovation to date has understandably been in those areas characterised by an abundance of labelled data, with the goal of deriving insights, making recommendations and automating processes.

But not every potential application of AI produces enough labelled data to use such techniques – spotting manufacturing defects on a production line is a good example, where images of defects (for training purposes) are scarce and a different approach is therefore needed.

Interest within academia and the AI labs is now turning to the harder class of problems in which data is limited or more variable in nature, requiring a different approach. Techniques include leveraging datasets in a similar domain (few-shot learning), auto-generating labels (semi-supervised learning), exploiting the underlying structure of the data (self-supervised learning), and synthesising data to fill in for missing data (data augmentation).

Characterising limited-data problems

Deep learning using neural networks has become increasingly adept at performing tasks such as image classification and natural language processing (NLP), and seen widespread adoption across many industries and diverse sectors.

Machine learning is a data-driven approach, with deep learning models typically requiring thousands of labelled images to build predictive models that are accurate and robust. And whilst it's generally true that more data is better, it can take a great deal more data to deliver relatively marginal improvements in performance.

Figure 1: Diminishing returns of two example AI algorithms [Source: https://medium.com/@charlesbrun]

Manually gathering and labelling data to train ML models is expensive and time consuming.  To address this, the commercial world has built large sets of labelled data, often through crowd-sourcing and through specialists like iMerit offering data labelling and annotation services.

But such data libraries and collection techniques are best suited to generalist image classification. For manufacturing, and in particular spotting defects on a production line, the 10,000+ images required per defect to achieve sufficient performance are unlikely to exist: with a typical defect rate of less than 1%, gathering 10,000 defect images would mean inspecting and labelling well over a million items. This is a good example of a 'limited-data' problem, and in such circumstances ML models tend to overfit (over-optimise) to the sparse training data, struggle to generalise to new (unseen) images, and deliver poor overall performance as a result.

So what can be done for limited-data use cases?

A number of different techniques can be used to address these limited-data problems, depending on the circumstances, the type of data and the number of training examples available.

Few-shot learning is a set of techniques for situations where there are only a few example images (shots) in the training data for each class of image (e.g. dogs, cats). The fewer the examples, the greater the risk of the model overfitting (leading to poor performance) or of bias being introduced into the model's predictions. To address this, few-shot learning leverages a separate but related, larger dataset to (pre-)train the target model.

Three of the more popular approaches are meta-learning (training a meta-learner to extract generalisable knowledge), transfer learning (utilising shared knowledge between source and target domains) and metric learning (classifying an unseen sample based on its similarity to labelled samples).

Once a human has seen one or two pictures of a new animal species, they're pretty good at recognising that species in other images. Meta-learning aims to give ML models a similar ability: the model consecutively learns how to solve lots of different tasks, and in doing so becomes better at learning how to handle new tasks – in essence, 'learning how to learn', much like a human – as illustrated below:

Figure 2: Meta-learning [Source: www.borealisai.com]
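For readers who prefer code, the 'learning to learn' loop can be sketched in a few lines: an inner loop adapts the model to each sampled task with a single gradient step, and an outer loop updates the shared starting point so that this adaptation works well across tasks. The toy linear-regression task family and hyperparameters below are purely illustrative (a MAML-style flavour of meta-learning, not the only one):

```python
import torch

# MAML-style meta-learning sketch on a toy family of tasks y = a*x + b.
# The task distribution and hyperparameters are illustrative only.
torch.manual_seed(0)
w = torch.zeros(2, requires_grad=True)            # shared meta-parameters [slope, bias]
meta_opt = torch.optim.SGD([w], lr=1e-2)

def predict(params, x):
    return params[0] * x + params[1]

for step in range(500):
    meta_opt.zero_grad()
    meta_loss = torch.zeros(())
    for _ in range(4):                            # sample a small batch of tasks
        a, b = torch.randn(()), torch.randn(())
        x_support, x_query = torch.randn(5), torch.randn(5)
        # Inner loop: adapt to this task with one gradient step on its support set.
        support_loss = ((predict(w, x_support) - (a * x_support + b)) ** 2).mean()
        grad = torch.autograd.grad(support_loss, w, create_graph=True)[0]
        w_adapted = w - 0.1 * grad
        # Outer objective: how well the *adapted* parameters do on the query set.
        meta_loss = meta_loss + ((predict(w_adapted, x_query) - (a * x_query + b)) ** 2).mean()
    meta_loss.backward()                          # backprop through the adaptation step
    meta_opt.step()
```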

Transfer learning takes a different approach. Part of the effort in training an ML model goes into learning how to extract features from the data; this feature-extraction part of the neural network will be very similar for problems in similar domains (such as recognising different animal species), and so can be reused in instances where data is limited.
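A minimal sketch of this reuse in PyTorch, assuming a generic ImageNet-pre-trained backbone and a hypothetical 5-class defect dataset (the backbone, class count and input sizes are illustrative placeholders):

```python
import torch
import torch.nn as nn
from torchvision import models

# Transfer learning sketch: reuse a pre-trained feature extractor and train only
# a new task-specific head on the small target dataset (5 classes is hypothetical).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in backbone.parameters():
    param.requires_grad = False                       # freeze the shared features
backbone.fc = nn.Linear(backbone.fc.in_features, 5)   # new head, trained from scratch

optimiser = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(images, labels):                       # images: (B, 3, 224, 224)
    optimiser.zero_grad()
    loss = loss_fn(backbone(images), labels)
    loss.backward()
    optimiser.step()
    return loss.item()
```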

Metric learning (or distance metric learning) determines similarity between images based on a distance metric and decides whether two images are sufficiently similar to be considered the same. Deep metric learning takes the approach one step further by using neural networks to automatically learn discriminative features from the images and compute the distance metric based on these features – very similar in fact to how a human learns to differentiate animal species.
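The inference side of (deep) metric learning can be sketched as follows: embed the handful of labelled examples and the unseen query with the same network, then assign the label of the most similar embedding. The tiny embedding network and 28x28 inputs below are illustrative stand-ins for a properly trained model:

```python
import torch
import torch.nn.functional as F

# Metric-learning sketch: classify an unseen image by the similarity of its
# embedding to those of a few labelled examples (the embedding net is a placeholder).
embed = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 64))

def classify(query, support_images, support_labels):
    q = F.normalize(embed(query), dim=-1)              # (1, 64) query embedding
    s = F.normalize(embed(support_images), dim=-1)     # (N, 64) labelled embeddings
    similarity = q @ s.T                               # cosine similarity to each example
    return support_labels[similarity.argmax(dim=-1)]   # label of the nearest example

# Example with random placeholder data: 10 labelled images across 5 classes.
support = torch.randn(10, 1, 28, 28)
labels = torch.randint(0, 5, (10,))
prediction = classify(torch.randn(1, 1, 28, 28), support, labels)
```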

Techniques such as few-shot learning can work well in situations where there is a larger labelled dataset (or pre-trained model) in a similar domain, but this won’t always be the case.

Semi-supervised learning can address this lack of sufficient data by using the data that is labelled to predict labels for the rest, thereby creating a larger labelled dataset for training. But what if there isn't any labelled data at all? In such circumstances, self-supervised learning is an emerging technique that sidesteps the lack of labels by obtaining supervisory signals from the data itself, such as its underlying structure.

Figure 3: Predicting hidden parts of the input (in grey) from visible parts (in green) using self-supervised learning [Source: Meta AI]
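Returning to the semi-supervised case, its simplest and most common form is pseudo-labelling: a model trained on the small labelled set predicts labels for the unlabelled pool, and only its confident predictions are added to the training data. A minimal sketch (the confidence threshold and model are illustrative assumptions):

```python
import torch

# Pseudo-labelling sketch: keep only confident predictions on unlabelled data
# and fold them back into the training set (threshold and model are illustrative).
@torch.no_grad()
def pseudo_label(model, unlabelled_batches, threshold=0.95):
    model.eval()
    kept_images, kept_labels = [], []
    for images in unlabelled_batches:                 # iterable of image tensors
        probs = torch.softmax(model(images), dim=-1)
        confidence, predictions = probs.max(dim=-1)
        mask = confidence > threshold                 # only trust confident outputs
        kept_images.append(images[mask])
        kept_labels.append(predictions[mask])
    return torch.cat(kept_images), torch.cat(kept_labels)
```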

An alternative approach is simply to fill the gap through data augmentation: simulating real-world events and synthesising data samples to create a sufficiently large dataset for training. Such an approach has been used by Tesla to complement the billions of real-world images captured via its fleet of autonomous vehicles for training its AI algorithms, and by Amazon within its Amazon Go stores to determine which products each customer takes from the shelves.

Figure 4: An Amazon Go store [Source: https://www.aboutamazon.com/what-we-do]
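Full simulation pipelines like these are bespoke, but the simpler end of data augmentation is easy to sketch: randomly perturb the few original images each epoch so the model never sees exactly the same sample twice. The specific transforms below are illustrative choices, not a recommended recipe.

```python
from torchvision import transforms

# Simple data augmentation sketch: each training epoch sees randomly perturbed
# variants of the same limited set of originals (transform choices are illustrative).
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
# Usage: pass `augment` as the transform of an image dataset so every sampled
# image is perturbed on the fly, e.g. ImageFolder("defects", transform=augment)
# where "defects" is a hypothetical folder of labelled images.
```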

Whilst synthetic data might seem like a panacea for any limited-data problem, it’s too costly to simulate for every eventuality, and it’s impractical to predict anomalies or defects a system may face when put into operation.

Data augmentation also has the potential to reinforce any biases present in the limited amount of original labelled data, and/or to cause overfitting by creating too much similarity within the training samples, such that the model struggles to generalise to the real world.

Applying these techniques to computer vision

Mindtrace is utilising the unsupervised and few-shot learning techniques described previously to deliver a computer vision system that is especially adept in environments characterised by limited input data and where models need to adapt to changing real-life conditions.

Pre-trained models bring knowledge from different domains to create a base AI solution, which is then fine-tuned using limited (few-shot) or unlabelled data to deliver state-of-the-art performance for asset inspection and defect detection.

Figure 6: Mindtrace [Source: https://www.mindtrace.ai]

This approach enables efficient learning from limited data, drastically reducing the need for labelled data (by up to 90%) and the time / cost of model development (by a factor of 6x) whilst delivering high accuracy.

Furthermore, the approach is auto-adaptive: the models continuously learn and adapt after deployment without needing to be retrained, and so are better able to react to changing circumstances – new conditions in asset inspection, or new cameras on a production line for detecting defects, for example.

The solution is also specifically designed for deployment at the edge by reducing the size of the model through pruning (optimal feature selection) and reducing the processing and memory overhead via quantisation (reducing the precision using lower bitwidths). 
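As a generic illustration of these two steps using standard PyTorch utilities (Mindtrace's own tooling is not shown here, so treat this purely as a sketch of the ideas):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Generic sketch of edge optimisation: magnitude pruning then INT8 quantisation.
# The toy model and 50% sparsity target are illustrative only.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)  # zero smallest 50%
        prune.remove(module, "weight")                            # bake the sparsity in

# Dynamic quantisation: weights stored as INT8, reducing model size and memory traffic.
quantised = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
```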

Furthermore, through a process of swarm learning, insights and learnings can be shared between edge devices without having to share the data itself or process it centrally, enabling all devices to feed off one another to improve performance and quickly learn to perform new tasks (Bloc invested in Mindtrace in 2021).
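The details of swarm learning protocols vary, but the core idea of sharing learnings rather than data can be sketched as devices exchanging model weights and averaging them, in the spirit of federated averaging (a generic illustration, not Mindtrace's actual protocol):

```python
import torch

# Generic sketch: devices share model weights (never raw data) and average them,
# so every device benefits from what the others have learned locally.
def average_state_dicts(state_dicts):
    averaged = {}
    for key in state_dicts[0]:
        averaged[key] = torch.stack(
            [sd[key].float() for sd in state_dicts]
        ).mean(dim=0)
    return averaged

# Usage: each edge device trains locally, then loads the averaged weights, e.g.
# merged = average_state_dicts([device_a.state_dict(), device_b.state_dict()])
```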

In summary

The focus of AI and ML innovation to date has understandably been in areas characterised by an abundance of labelled data to derive insights, make recommendations or automate processes.

Increasingly though, interest is turning to the harder class of problems with data that is limited and dynamic in nature such as the asset inspection examples discussed. Within Industry 4.0, limited-data ML techniques can be used by autonomous robots to learn a new movement or manipulation action in a similar way to a human with minimal training, or to auto-navigate around a new or changing environment without needing to be re-programmed.

Limited-data ML is now being trialled across cyber threat intelligence, visual security (people and things), scene processing within military applications, medical imaging (e.g., to detect rare pathologies) and smart retail applications.

Mindtrace has developed a framework that can deliver across a multitude of corporate needs.


Figure 7: Example Autonomous Mobile Robots from Panasonic [Source: Panasonic]