The mobile industry has been marching to the constant drum beat of defining, standardising and launching the next iteration of mobile technology every ten years or so – 3G launching in 2001, 4G in 2009, 5G in 2019, and now 6G mooted for 2030 if not before.
The ITU have laid out a set of 6G goals for the industry to work towards in the IMT-2030 framework. It builds on the existing 5G capabilities, with the aim of 6G delivering immerse, massive, hyper-reliable and low latency communications as well as increasing reach and facilitating connected intelligence & AI services.
In the past, each successive iteration has demanded wholesale upgrades of the network infrastructure – great for the telco equipment manufacturers, but ruinous for the mobile network operators (MNOs).
The GSMA and McKinsey estimate that MNOs worldwide have invested somewhere between $480 billion and $1 trillion respectively to fund the rollout of 5G. And whilst 5G has delivered faster network speeds, there are precious few examples of where 5G has delivered incremental value for the MNOs.
MNOs remain unconvinced that there are any new apps on the horizon that need ‘a new G’, and hence the likes of AT&T, Orangeand SKT amongst others have been vocal about an evolution of 5G rather than yet another step-change with 6G.
This sentiment has been further reinforced by the wider MNO community through NGMN with a clearly stated position that 6G should entail software-based feature upgrades rather than “inherently triggering a hardware refresh”.
The NGMN 6G Position Statement goes on to set a number of MNO-endorsed priorities focused primarily on network simplification with the goal of lowering operational costs, rather than the introduction of whizzy new features.
In particular, it emphasises “automating network operations and orchestration to enable efficient, dynamic service provisioning”, and “proactive network management to predict and address issues before they impact user experience”.
AI will play a pivotal role in achieving these aims. Machine learning (ML) models, for instance, could be trained on a digital twin of the physical environment and then transferred to the network to optimise it for each deployment site, radio condition, and device/user context, thereby improving performance, quality of service (QoS), and network robustness.
AI could also be used to address power consumption, another concern for MNOs, with energy costs representing as much as 40% of a network‘s OPEX according to the GSMA, and now outpacing MNO sales growth by over 50%.
By analysing traffic patterns and other factors, AI is able to make predictions on future traffic demand and thereby identify where parts of the network could be temporarily shut down, reducing energy consumption by 25% and with no adverse impact on the perceived network performance. Turkcell, for example, found that AI was able to reduce their energy consumption by ~63GWh, equivalent to the energy required by OpenAI to train GPT-4.
Startups such as Net.AI have been pioneering in this space, applying AI to improve the energy efficiency of network infrastructure, whilst EkkoSense have recently helped VM O2 save over £1 million a year in the cost of cooling its data centres, and Three have been working with Ericsson to improve energy efficiency by ~70% through the use of AI, data analytics and a ‘Micro Sleep’ feature.
Progressive MNOs such as SKT and Verizon have set out clear strategies for embracing AI, with SKT also teaming up with DT, e&, and Singtel to develop large language models for automating aspects of customer services (Telco LLM) as part of the Global Telco AI Alliance.
Given its staid image, it may be surprising to learn that telecoms is leading the adoption of GenAI (70%) ahead of both the retail (66%) and banking & insurance (60%) sectors. 89% of telcos plan to invest in GenAI next financial year – the joint highest along with the insurance industry – with application across marketing, sales and IT.
Wider adoption though of AI across all aspects of mobile won’t be easy, and certainly the integration of AI natively into future mobile networks to achieve the desired performance and energy efficiency benefits will present a few challenges.
Naysayers argue that there will be insufficient data to model all network eventualities, and that energy optimisation will be hampered given that much of the passive infrastructure at basestations (power management; air conditioning etc.) is not IP-connected.
Interoperability is another concern, particularly in the early days where AI systems are likely to be developed and deployed independently, hence creating a risk of inadvertently working against each other (e.g., maximising network performance vs shutting down parts of the network to conserve energy).
Other MNOs including Orange baulk at the cost of acquiring sufficient compute to support AI given the current purchasing frenzy around GPUs, and the associated energy cost.
This is likely to be mitigated in the longer term through combining telco and AI compute in an AI-native virtualised network infrastructure. Ericsson, T-Mobile, Softbank, NVIDIA, AWS, ARM, Microsoft, and Samsung have recently joined forces to form the AI-RAN Alliance with the goal of developing AI-integrated RAN solutions, running AI and virtualised network functions on the same compute to boost network efficiency and performance.
NVIDIA and Softbank have already run a successful pilot, demonstrating carrier-grade 5G performance on an Nvidia-accelerated AI-RAN solution whilst using the network’s excess capacity to run AI interference workloads concurrently. In doing so, they were able to unlock the two thirds of network capacity typically running idle outside of peak hours for additional monetisation, effectively turning network base stations from cost centres into AI revenue-producing assets – Nvidia and SoftBank estimate that an MNO could earn ~$5 in AI inference revenue for every $1 capex invested in new AI-RAN infrastructure.
As mentioned earlier, many MNOs are already experimenting with AI across different aspects of their network and operations. To-date, this has typically involved employing in-house resources in limited pilots.
Going forward, it’s likely that AI will only realise its true potential through fostering a closer collaboration between the telecoms and AI sectors (cf. GSMA/ATI 2019), and opening up the network to embrace 3rd party innovation for tackling operational efficiency, a market potentially worth $20 billion by 2030.
This is a contentious issue though; whilst MNOs have been moving gradually towards a more open network architecture through adoption of network function virtualisation and participation in industry initiatives such as O-RAN that target network disaggregation, at an operational level many MNOs still prefer procuring from incumbent suppliers and resist any pressure to diversify their supply chain.
Taking such a stance presents a number of hurdles for startups hoping to innovate in network AI, and worst still, the lack of a validated channel-to-market disincentivises any funding of said startups by VCs.
Elsewhere, there may be other opportunities; for instance, in private 5G where solutions can be curated and optimised for individual Enterprise customers, or in the deployment of neutral host networks at public hotspots run by local councils or at stadiums – a £40m UK Gov initiative is currently underway to fund pilots in these areas.
There may also be opportunities to innovate on top of public 5G, leveraging new capabilities such as network slicing to deliver targeted solutions to particular verticals such as broadcasting and gaming, although this hinges on MNO rollout of 5G Standalone, which has been a long time coming, and making these capabilities available to developers via APIs.
MNOs have tried exposing network capabilities commercially many times in the past, either individually or through industry initiatives such as One API Exchange and the Operator Platform Group, but with little success.
MobiledgeX, an edge computing platform spanning 25 MNOs across Europe, APAC and North America was arguably more successful, but eventually acquired and subsumed into Google Cloud in 2022.
Since then, MNOs have progressively come to terms with the need to partner with the Hyperscalers, and more recently Vodafoneand BT amongst others have signed major deals with the likes of Google, Microsoft and AWS.
More interestingly, a collection of leading MNOs (Vodafone; AT&T; Verizon; Bharti Airtel; DT; Orange; Telefonica; Singtel; Reliance Jio; Telstra; T-Mobile US; and America Movil) have recently teamed up with Ericsson and Google Cloud to form a joint venture for exposing industry-wide CAMARA APIs to foster innovation over mobile networks.
If they get it right this time, they may succeed in attracting startups and developers to the telecoms sector, and in doing so unlock a potential $300bn in monetising the network as a platform, as well as expanding their share of the worldwide $5 trillion ICT spend.
The New Space era
The space industry has seen rapid change over the past decade with increased involvement from the private sector dramatically reducing costs through innovation, and democratising access for new commercial customers and market entrants – the New Space era.
Telecoms in particular has seen recent growth with the likes of Starlink, Kuiper, Globalstar, Iridium and others all vying to offer a mix of telecoms, messaging and internet services either to dedicated IoT devices and consumer equipment in the home or direct to mobile phones through 5G non-terrestrial networks (NTN).
Earth Observation (EO) is also going through a period of transformation. Whilst applications such as weather forecasting, precision farming and environmental monitoring have traditionally involved data being sent back to Earth for processing, the latest EO satellites and hyperspectral sensors gather far more data than can be quickly downlinked and hence there is a movement towards processing the data in orbit, and only relaying the most important and valuable insights back to Earth; for instance, using AI to spot ships and downlink their locations, sizes and headings rather than transferring enormous logs of ocean data.
Both applications are dependent on higher levels of compute to move signal processing up into the satellites, and rapid development and launch cycles to gain competitive advantage whilst delivering on a constrained budget shaped by commercial opportunity rather than government funding.
Space is harsh
The space environment though is harsh, with exposure to ionising radiation degrading satellite electronics over time, and high-energy particles from Galactic Cosmic Rays (GCR) and solar flares striking the satellite causing single event effects (SEE) that can ‘upset’ circuits and potentially lead to catastrophic failure.
The cumulative damage is known as the total ionising dose (TID) and has the effect of gradually shifting semiconductor gate biasing over time to a point where the transistors remain permanently open (or closed) leading to component failure.
In addition, the high-energy particles cause ‘soft errors’ such as a glitch in output or a bit flip in memory which corrupts data, potentially causing a crash that requires a system reboot; worst case, they can cause a destructive latch-up and burnout due to overcurrent.
Radiation-hardening
The preferred method for reducing these risks is to use components that have been radiation-hardened (‘rad-hard’).
Historically this involved using exotic process technologies such as silicon carbide or gallium nitride to provide high levels of immunity, but such an approach results in slower development time and significantly higher cost that can’t be justified beyond mission-critical and deep-space missions.
Radiation-Hardening By Design (RHBD) is now more common for developing space-grade components and involves a number of steps, such as changes in the substrate for chip fabrication to lower manufacturing costs, switching from ceramic to plastic packaging to reduce size and cost, and specific design & layout changes to improve tolerance to radiation. One such design change is the adoption of triple modular redundancy (TMR) in which mission critical data is stored and processed in three separate locations on the chip, with the output only being used if two or more responses agree.
These measures in combination can provide protection against radiation dose levels (TID) in the 100-300 krad range – not as high as for fully rad-hard mission-critical components (1000 krad+), but more than sufficient for enabling multi-year operation with manageable risk, and at lower cost.
ESCC radiation hardness assurance (RHA) qualification levels
Having said that, they still cost much more than their commercial-off-the-shelf (COTS) equivalents due to larger die sizes, numerous design & test cycles, and smaller manufacturing volumes. RHBD components also tend to shy away from employing the latest semiconductor technology (smaller CMOS feature sizes, higher clock rates etc.) in order to maximise radiation tolerance, but in doing so sacrifice compute performance and power efficiency.
For today’s booming space industry, it will be challenging for space-grade components alone to keep pace with the required speed of innovation, high levels of compute, and constrained budgets driven by many New Space applications.
The challenge therefore is to design systems using COTS components that are sufficiently radiation-tolerant for the intended satellite lifespan, yet still able to meet the performance and cost goals. But this is far from easy, and the march to ever smaller process nodes (e.g., <7nm) to increase compute performance and lower power consumption are making COTS components increasingly vulnerable to the effects of radiation.
Radiation-tolerance
Over the past decade, the New Space industry has experimented with numerous approaches, but in essence the path most commonly taken to achieve the desired level of radiation tolerance for small satellites is through a combination of upscreening COTS components, dedicated shielding, and novel system design.
Upscreening
The radiation-tolerance of COTS components varies based on their internal design & process node, intended usage, and variations and imperfections in the chip-manufacturing process.
Typically withstanding radiation dose levels (TID) in the 5-10 krad range before malfunction, some components, such as those designed for automotive or medical applications and manufactured using larger process nodes (e.g., >28nm), may be tolerant up to 30-50 krad or even higher whilst also being better able at coping with the vibration, shock and temperature variations of space.
A process of upscreening can therefore be used to select components with better intrinsic radiation tolerance, and then filter out those with the highest tolerances within a batch.
Whilst this addresses the degradation over time due to ionising radiation, the components will still be vulnerable to single event effects (SEE) caused by high-energy particles, and even more so if they are based on smaller process nodes and pack billions of transistors into each chip.
Shielding
Shielding would seem the obvious answer, and can reduce the radiation dose levels whilst protecting to some extent against the particles that cause SEE soft errors.
Typically achieved using aluminium, shield performance can be improved by increasing its thickness and/or coating with a high-density metal such as tantalum. However, whilst aluminium shielding is generally effective, it’s less useful with some high-energy particles that can scatter on impact and shower the component with more radiation.
The industry has more recently been experimenting with novel materials, one example being the 3D-printed nanocomposite metal material pioneered by Cosmic Shielding Corporation that is capable of reducing the radiation dose rate by 60-70% whilst also being more effective at limiting SEE.
Concept image of a custom plasteel shielding solution for a space-based computer (CSC)
System design
Careful component selection and shielding can go a long way towards achieving radiation tolerance, but equally important is to design in a number of fault management mechanisms within the overall system to cope with the different types of soft error.
Redundancy
Similar to TMR, triple redundancy can be employed at the system level to spot and correct processing errors, and shut down a faulty board where needed. SpaceX follows such an approach, using triple-redundancy within its Merlin engine computers, with each constantly checking on the others as part of a fault-tolerant design.
System reset
More widely within the system, background analysis and watchdog timers can be used to spot when a part of the system has hung or is acting erroneously and reboot accordingly to (hopefully) clear the error.
Error correction
To mitigate the risk of data becoming corrupted in memory due to a single event upset (SEU), “scrubber” circuits utilising error-detection and correction codes (EDAC) can be used to continuously sweep the memory, checking for errors and writing back any corrections.
Latch-up protection
Latch-ups are a particular issue and can result in a component malfunctioning or even being permanently damaged due to overcurrent, but again can be mitigated to some extent through the inclusion of latch-up protection circuitry within the system to monitor and reset the power when a latch-up is detected.
This is a vibrant area for innovation within the New Space industry with a number of companies developing novel solutions including Zero Error Systems, Ramon Space, Resilient Computing and AAC Clyde Space amongst others.
Design trade-offs
Whilst the New Space industry has shown that it’s possible to leverage cheaper and higher performing COTS components when they’re employed in conjunction with various fault-mitigation measures, the effectiveness of this approach will ultimately be dependent on the satellite’s orbit, the associated levels of radiation it will be exposed to, and its target lifespan.
For the New Space applications mentioned earlier, a low earth orbit (LEO) is commonly needed to improve image resolution (EO), and optimise transmission latency and link budget (telecoms).
Orbiting at this altitude though results in a shorter satellite lifespan:
- More residual atmosphere and microscopic debris slow the satellite down and require it to burn more propellant to maintain orbit
- Debris also gradually degrades the performance of the solar panels
- More time is spent in the Earth’s shadow, thereby causing batteries to go through multiple discharge/recharge cycles per day which reduces their lifetime
Because of this degradation, LEO satellites may experience a useful lifespan limited to ~5-7yrs (Starlink 5yrs; Kuiper 7yrs), whilst smaller CubeSats typically used for research purposes may only last a matter of months.
Given these short periods in service, the cumulative radiation exposure for LEO satellites will be relatively low (<30 krad), making it possible to design them with upscreened COTS components combined with fault management and some degree of shielding without too much risk of failure (and in the case of telecoms, the satellite constellations are complemented with spares should any fail).
With CubeSats, where lifetime and hence radiation exposure is lower still (<10 krad), and the emphasis is often on minimising size and weight to reduce costs in manufacture and launch, dedicated shielding may not be used at all, with reliance being placed instead on radiation-tolerant design at the system level… although by taking such risks, CubeSats tend to be prone to more SEE-related errors and repeated reboots (possibly as much as every 3-6 weeks).
Spotlight on telecoms
In the telecoms industry, there is growing interest in deploying services that support direct-to-device (D2D) 5G connectivity between satellites and mobile phones to provide rural coverage and emergency support, similar to Apple’s Emergency SOS service delivered via Globalstar‘s satellite constellation.
To-date, constellations such as Globalstar’s have relied on simple bent-pipe (‘transparent’) transponders to relay signals between basestations on the ground and mobile devices. Going forward, there is growing interest in moving the basestations up into the satellites themselves to deliver higher bandwidth connectivity (a ‘regenerative’ approach). Lockheed Martin, in conjunction with Accelercomm, Radisys and Keysight, have already demonstrated technical feasibility of a 5G system and plan a live trial in orbit later in 2024.
Source: Lockheed Martin
The next few years will see further deployments of 5G non-terrestrial networks (NTN) for IoT and consumer applications as suitable space-grade and space-tolerant processors for running gNodeB basestation software and the associated inter-satellite link (ISL) processing become more widely available, and the 3GPP R19 standards for the NTN regenerative payload approach are finalised.
Takeaways
New Space is driving a need for high levels of compute performance and rapid development using the latest technology, but is limited by strict budgetary constraints to achieve commercial viability. The space environment though is harsh, and delivering on these goals is challenging.
Radiation-hardening of electronic components is the gold standard, but expensive due to the lengthier design & testing cycles, and higher marginal cost of manufacturing due to the smaller volumes, and such an approach often needs to sacrifice compute performance to achieve higher levels of radiation immunity and longer lifespan.
COTS components can unlock higher levels of compute performance and rapid development, but are typically too vulnerable to radiation exposure to be used in space applications on their own for any appreciable length of time without risk of failure.
Radiation-tolerance can be improved through a process of upscreening components, dedicated shielding, and fault management measures within the system design. And this approach is typically sufficient for those New Space applications targeting LEO orbits which, due to the limited satellite lifespans at this altitude, and with some protection from the Earth’s atmosphere, experience doses of radiation low enough during their lifetime to be adequately mitigated by the rad-tolerant approach.
Satellites are set to become more intelligent, whether that be through running advanced communications software, or deriving insights from imaging data, or the use of AI for other New Space applications. This thirst for compute performance is being met by a mix of rad-tolerant (~30-50 krad) and rad-hard (>100 krad) processors and single-board computers (SBC) coming to market from startups (Aethero), more established companies (Vorago Technologies; Coherent Logix), and industry stalwarts such as BAE, as well as NASA’s own HPSC (in partnership with Microchip) offering both rad-tolerant (for LEO) and rad-hard RISC-V options launching in 2025.
The small satellite market is set to expand significantly over the coming decade with the growing democratisation of space, and the launch, and subsequent refresh, of the telecoms constellations – Starlink alone will likely refresh 2400 satellites a year (assuming 12k constellation; 5yr satellite lifespan), and could need even more if permitted to expand their constellation with another 30,000 satellites over the coming years.
As interest from the private sector in the commercialisation of space grows, it’s opening up a number of opportunities for startups to innovate and bring solutions to market that boost performance, increase radiation tolerance, and reduce cost.
Some such as EdgX are seeking to revolutionise compute and power efficiency through the use of neuromorphic computing, whilst others (Zero Error Systems; Apogee Semiconductor) are focused on developing software and IP libraries that can be integrated into future designs for radiation-tolerant components and systems, and others (Little Place Labs) are utilising onboard AI to transform satellite data near-realtime into actionable insights for rapid relay back to Earth.
Space has had many false dawns in the past, but the future for New Space, and the opportunities it presents for startups to innovate and disrupt the norms, are numerous.
Introduction
The global economy is fuelled by knowledge and information, the digital sector growing 6x faster than the economy as a whole. AI will act as a further accelerant in boosting the economy and has the potential of being transformative across many sectors.
Figure 1: GenAI wide application
However, in realising this promise, GenAI faces a number of challenges: 1) foundational LLMs have a great set of skills and generalist knowledge, but know nothing about individual companies’ products & services; 2) whilst LLMs are great at providing an instant answer, they often misunderstand the question or simply hallucinate in their response; equally, 3) LLMs need to improve in their reasoning capabilities to understand and solve complex problems; and finally, 4) LLMs are compute intensive, inhibiting their widespread adoption.
This article looks at how these challenges are being addressed.
Tailoring LLMs: Fine-tuning
First and foremost, LLMs need customisation to understand company data & processes and best deliver on the task at hand. Pre-training, in which a company develops its own LLM trained on their data might seem like the obvious choice but is not easy, requiring massive amounts of data, expensive compute, and a dedicated team.
Figure 2: LLM pre-training
A simpler approach takes an existing foundational model and fine-tunes it by adapting its parameters to obtain the knowledge, skills and behaviour needed.
Figure 3: LLM fine-tuning
Fine-tuning though updates all the parameters, so can quickly become costly and slow if adapting a large LLM, and keeping it current will probably require re-tuning on a regular basis. Parameter-efficient fine-tuning (PEFT) techniques such as Low-rank Adaptation (LoRA) address this issue by targeting a subset of the parameters, speeding up the process and reducing cost.
Rather than fine-tuning the model, an alternate is to provide guidance within the prompt, an approach known as in-context learning.
In-context Learning
Typically, LLMs are prompted in a zero-shot manner:
Figure 4: Zero-shot prompting [source: https://www.oreilly.com]
LLMs do surprisingly well when prompted in this way, but perform much better if provided with more guidance via a few input/output examples:
Figure 5: Few-shot ICL [source: https://www.oreilly.com]
As is often the case with LLMs, more examples tends to improve performance, but not always – irrelevant material included in the prompt can drastically deteriorate the LLM’s performance. In addition, many LLMs pay most attention to the beginning and end of their context leaving everything else “lost in the middle”.
Figure 6: Accuracy doesn’t scale with ICL volume
In-context learning neatly side-steps the effort of fine-tuning but results in lengthier prompts which increases compute and cost; a few-shot PEFT approach may produce better results, and be cheaper in the long run.
Whilst fine-tuning and in-context learning hones the LLM’s skills to particular tasks, it doesn’t acquire new knowledge. For tasks such as answering customer queries on a company’s products and services, the LLM needs to be provided with supplementary knowledge at runtime. Retrieval-Augmented Generation (RAG) is one such mechanism for doing that.
Retrieval-Augmented Generation (RAG)
The RAG approach introduces a knowledge store from which relevant text chunks can be retrieved and appended to the prompt fed into the LLM. A keyword search would suffice, but it’s much better to retrieve information based on vector similarity via the use of embeddings:
Figure 7: RAG system
RAG boosts the knowledge available to the LLM (hence reducing hallucinations), prevents the LLM’s knowledge becoming stale, and also gives access to the source for fact-checking.
RAPTOR goes one step further by summarising the retrieved text chunks at different abstraction levels to fit the target LLM’s maximum context length (or to fit within a given token budget), whilst Graph RAG uses a graph database to provide contextual information to the LLM on the retrieved text chunk thus enabling deeper insights.
Figure 8: GraphRAG
Small Language Models (SLMs)
Foundational LLMs incorporate a broad set of skills but their sheer size precludes them from being run locally on devices such as laptops and phones. For singular tasks, such as being able to summarise a pdf, or proof-read an essay, a Small Language Model (SLM) that has been specially trained for the task will often suffice. Microsoft’s Phi-3, for example, has demonstrated impressive capabilities in a 3.8B parameter model small enough to fit on a smartphone. The recently announced Copilot+ PCs will include a number of these SLMs, each capable of performing different functions to aid the user.
Figure 9: Microsoft SLM Phi-3
Apple have also adopted the SLM approach to host LLMs on Macs and iPhones, but interestingly have combined the SLM with different sets of LoRA weights loaded on-demand to adapt it to different tasks rather than needing to deploy separate models. Combined with parameter quantisation, they’ve squeezed the performance of an LLM across a range of tasks into an iPhone 15 Pro whilst still generating an impressive 30 tokens/s output speed.
Figure 10: Apple Intelligence
Reasoning
Whilst LLMs are surprisingly good across a range of tasks, they still struggle with reasoning and navigating complex problems.
Using the cognitive model devised by the late Daniel Kahneman, LLMs today arguably resemble the Fast-Thinking system, and especially so in zero-shot mode, trying to produce an instant response, and often getting it wrong. For AI to be truly intelligent, it needs to improve its Slow Thinking and reasoning capabilities.
Figure 11: Daniel Kahneman’s cognitive system model
Instruction Prompting
Experimentation has found, rather surprisingly, that including simple instructions in the prompt can encourage the LLM to take a more measured approach in producing its answer.
Chain of Thought (CoT) is one such technique in which the model is instructed with a phrase such as “Let’s consider step by step…” to encourage it to break down the task into steps, whilst adding a simple “be concise” instruction can shorten responses by 50% with minimal impact on accuracy. Tree of Thoughts (ToT) is a similar technique, but goes further by allowing multiple reasoning paths to be explored in turn before settling on a final answer.
Figure 12: CoT and ToT instruction prompting
Self-ask goes further still by asking the model to break down the input query into sub-questions and answer each of these first (and with the option of retrieving up-to-date information from external sources), before using this knowledge to compile the final answer – essentially a combination of CoT and RAG.
Figure 13: Self-ask instruction prompting
Agentic Systems
Agentic systems combine the various planning & knowledge-retrieval techniques outlined so far with additional capabilities for handling more complex tasks. Most importantly, agentic systems examine their own work and identify ways to improve it, rather than simply generating output in a single pass. This self-reflection is further enhanced by giving the AI Agent tools to evaluate its output, such as running the code through unit tests to check for correctness, style, and efficiency, and then refining accordingly.
Figure 14: Agentic system
More sophisticated agentic workflows might separate the data gathering, reasoning and action taking components across multiple agents that act autonomously and work collaboratively to complete the task. Some tasks might even be farmed out to multiple agents to see which comes back with the best outcome; a process known as response diversity.
Compared to the usual zero shot approach, agentic workflows have demonstrated significant increases in performance – an agentic workflow using the older GPT-3.5 model (or perhaps one of the SLMs mentioned earlier) can outperform a much larger and sophisticated model such as GPT-4.
Figure 15: Agentic system performance
It’s still early days though, and there remain a number of areas that need further optimisation. LLM context length, for instance, can place limits on the amount of historical information, detailed instructions, input from other agents, and own self-reflection that can be accommodated in the reasoning process. Separately, the iterative nature of agentic workflows can result in cascading errors, hence requiring intermediate quality control steps to monitor and rectify errors as they happen.
Takeaways
LLMs burst onto the scene in 2022 with impressive fast-thinking capabilities albeit plagued with hallucinations and knowledge constraints. Hallucinations remain an area not fully solved, but progress in other areas has been rapid: expanding the capabilities with multi-modality, addressing the knowledge constraints with RAG, and integrating LLMs at an application and device level via CoPilot, CoPilot+ PCs & Apple Intelligence using SLMs and LoRA.
2024 will likely see agentic workflows driving massive progress in automation, and LLMs gradually improving their capabilities in slow-thinking for complex problem solving.
And after that? Who knows, but as the futurologist Roy Amara sagely pointed out, “We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run”… so let’s see.
An overview of GNSS
Global Navigation Satellite Systems (GNSS) such as GPS, Galileo, GLONASS and BeiDou are constellations of satellites that transmit positioning and timing data. This data is used across an ever-widening set of consumer and commercial applications ranging from precision mapping, navigation and logistics (aerial, ground and maritime) to enable autonomous unmanned flight and driving. Such data is also fundamental to critical infrastructure through the provision of precise time (UTC) for synchronisation of telecoms, power grids, the internet and financial networks.
Whilst simple SatNav applications can be met with relatively cheap single-band GNSS receivers and patch antennas in smartphones, their positioning accuracy is limited to ~5m. Much more commercial value can be unlocked by increasing the accuracy to cm-level, thereby enabling new applications such as high precision agriculture and by improving resilience in the face of multi-path interference or poor satellite visibility to increase availability. Advanced Driver Assistance Systems (ADAS), in particular, are dependent on high precision to assist in actions such as changing lanes. And in the case of Unmanned Aerial Vehicles (UAVs) aka drones, especially those operating beyond visual line of sight (BVLoS), GNSS accuracy and resilience are equally mission critical.
GNSS receivers need unobstructed line-of-sight to at least four satellites to generate a location fix, and even more for cm-level positioning. Buildings, bridges and trees can either block signals or cause multi-path interference thereby forcing the receiver to fallback to less-precise GNSS modes and may even lead to complete loss of signal tracking and positioning. Having access to more signals by utilising multiple GNSS bands and/or more than one GNSS constellation can make a big difference in reducing position acquisition time and improving accuracy.
Source: https://www.mdpi.com/1424-8220/22/9/3384
Interference and GNSS spoofing
Interference in the form of jamming and spoofing can be generated intentionally by bad actors, and is a growing threat. In the case of UAVs, malicious interference to jam the GNSS receiver can force the UAV to land; whilst GNSS spoofing, in which a fake signal is generated to offset the measured position, directs the UAV off its planned course to another location; the intent being to hijack the payload.
The sophistication of such attacks has historically been limited to military scenarios and been mitigated by expensive protection systems. But with the growing availability of cheap and powerful software defined radio (SDR) equipment, civil applications and critical infrastructure reliant on precise timing are equally becoming vulnerable to such attacks. As such, being able to determine the direction of incoming signals and reject those not originating from the GNSS satellites is an emerging challenge.
Importance of GNSS antennas and design considerations
GNSS signals are extremely weak, and can be as low as 1/1000th of the thermal noise. Therefore, the performance of the antenna being first in the signal processing chain, is of paramount importance to achieving high signal fidelity. This is even more important where the aim is to employ phase measurements of the carrier signal to further increase GNSS accuracy.
However, more often than not the importance of the antenna is overlooked or compromised by cost pressures, often resulting in the use of simple patch antennas. This is a mistake, as doing so dramatically increases the demands on the GNSS receiver, resulting in higher power consumption and lower performance.
In comparison to patch antennas, a balanced helical design with its circular polarisation is capable of delivering much higher signal quality and with superior rejection of multi-path interference and common mode noise. These factors, combined with the design’s independence of ground plane effects, results in a superior antenna plus an ability to employ carrier phase measurements resulting in positional accuracy down to 10cm or better. This is simply not possible with patch antennas in typical scenarios such as on a metal vehicle roof or a drone’s wing.
The SWAP-C challenge
Size is also becoming a key issue – as GNSS becomes increasingly employed across a variety of consumer and commercial applications, there is growing demand to miniaturise GNSS receivers. Consequently, Size, Weight and Power (SWAP-C) are becoming the key factors for the system designer, as well as cost.
UAV GNSS receivers, for example, often need to employ multiple antennas to accurately determine position and heading, and support multi-band/multi-constellation to improve resiliency. But in parallel they face size and weight constraints, plus the antennas must be deployed in close proximity without suffering from near-field coupling effects or in-band interference from nearby high-speed digital electronics on the UAV. As drones get smaller (e.g. quadcopter delivery or survey drones) the SWAP-C challenge only increases further.
Aesthetics also comes into play – very similar to the mobile phone evolution from whip-lash antenna to highly integrated smartphone, the industrial designer is increasingly influencing the end design of GNSS equipment. This influence places further challenges on the RF design and particularly where design aesthetics require the antenna to be placed inside an enclosure.
The manufacturing challenge
Optimal GNSS performance is achieved through the use of ceramic cores within the helical antenna. However doing so raises a number of manufacturing yield challenges. As a result of material and dimensional tolerances, the cores have a “relative dielectric mass” and so the resonance frequency can vary between units. This can lead to excessive binning in order to deliver antennas that meet the final desired RF performance. Production yield could be improved with high precision machining and tuning, but such an approach results in significantly higher production costs.
Many helical antenna manufacturers have therefore limited their designs to using air-cores to simplify production, whereby each antenna is 2D printed and folded into the 3D form of a helical antenna. This is then manually soldered and tuned, an approach which could be considered ‘artisanal’ by modern manufacturing standards. This also results in an antenna that is larger, less mechanically robust, and more expensive, and generally less performant in diverse usage scenarios.
Introducing Helix Geospace
Helix Geospace excels in meeting these market needs with its 3D printed helical antennas using novel ceramics to deliver a much smaller form factor that is also electrically small thereby mitigating coupling issues. The ceramic dielectric loaded design delivers a much tighter phase centre for precise position measurements and provides a consistent radiation pattern to ensure signal fidelity and stability regardless of the satellites’ position relative to the antenna and its surroundings.
In Helix’s case, they overcome the ‘artisanal’ challenge of manufacturing through the development of an AI-driven 3D manufacturing process which automatically assesses the material variances of each ceramic core. The adaptive technology compensates by altering the helical pattern that is printed onto it, thereby enabling the production of high-performing helical antennas at volume.
This proprietary technique makes it possible to produce complex multi-band GNSS antennas on a single common dielectric core – multi-band antennas are increasingly in demand as designers seek to produce universal GNSS receiver systems for global markets. The ability to produce multi-band antennas is transformative, enabling a marginal cost of production that meets the high performance requirements of the market, but at a price point that opens up the use of helical antennas to a much wider range of price sensitive consumer and commercial applications.
In summary
As end-users and industry seek higher degrees of automation, there is growing pressure on GNSS receiver systems to evolve towards cm-level accuracy whilst maintaining high levels of resiliency, and in form factors small and light enough for widespread adoption.
By innovating in its manufacturing process, Helix is able to meet this demand with GNSS antennas that deliver a high level of performance and resilience at a price point that unlocks the widest range of consumer and commercial opportunities be that small & high precision antennas for portable applications, making cm-level accuracy a possibility within ADAS solutions, or miniaturised anti-jamming CRPA arrays suitable for mission critical UAVs as small as quadcopters (or smaller!).
Hello Tomorrow Global Summit is an annual event – hosted in Paris – bringing together deep tech startups, investors and ecosystem supporters. The conference always provides a great opportunity to meet like-minded members of the deep tech community and learn more about the cutting-edge technologies being built today.
The mood this year was positive, perhaps not surprising given that deep tech is now the biggest sector of VC in Europe, although much debate ensued on how to provide an environment for long-term growth before Europe can consider itself comparable with the US.
Here are five of the most prominent technology areas being discussed and some interesting startups we met within each category.
Data centre tech
There were a number of companies tackling the demand for higher bandwidth and more energy-efficient optical interconnect for data centres: innovative modulation schemes (Phanofi), all-optical transceiver chips (NEW Photonics) and novel end-end optical networking (Astrape Networks).
Novel compute
Whilst there were a handful of companies within the quantum compute space, there were surprisingly few tackling novel compute. Two of note were Literal Labs (formerly Mignon) with their Tsetlin machine approach to AI, and Nanomation, a spinout from Cambridge that has developed novel software for utilising nanomaterials in next-gen chipmaking, neuromorphic computing, quantum applications and biosensing.
Outside of hardware, Embedl are an interesting Swedish startup utilising neural search, pruning and other mechanisms to reduce ML model size for more efficient deployment (and performance) on edge processors.
AR/VR/XR
The AR/VR/XR space remains a target for startups and spinouts from academic research, despite the lacklustre performance of the smartglasses sector to-date (albeit with renewed interested following the launch of Apple’s Vision Pro).
Propositions ranged from Holographic Extended Reality (HXR) projection chips (Swave Photonics) for spatial computing, through ultra-bright laser-based displays enabling better use of smartglasses in sunny conditions (VitreaLab), to SPAE sensors for energy-efficient 3D scene scanning and eye/gaze tracking (VoxelSensors).
Sensing
Sensing in general was a surprisingly well-attended space with a number of deep tech startups covering a wide gamut of industrial and consumer applications.
Multispectral imaging has emerged over the past few years covering a broad range of applications based on the wavelengths supported, and the size and cost of the imaging system.
In many cases, the focus is on industrial applications and replacing (or at least complementing) existing sensing systems (LiDAR, radar, cameras) to improve performance in adverse weather conditions or harsh environments (such as mining), or for use on production lines for spotting anomalies and defects.
Spectricity is unique in this regard by targeting consumer applications in smartphones (e.g., skin health, improved colour photography etc.) as well as industrial applications in agritech and manufacturing.
Elsewhere in sensing, simple RGB camera systems are getting a boost through AI (Tripleye), and companies such as Calyo are demonstrating an ability to deliver ultra-low SWAP-C 3D imaging through novel use of Ultrasound.
3D digital twins
And finally, startups such as Blackshark.AI and AVES Reality are using satellite and other imaging data for generating hyper-realistic 3D representations of the physical world for use across a range of applications including training for autonomous vehicles, digital representations & planning for utility companies, and a number of dual use civil/military applications.
Executive summary
AI has exploded onto the scene, capturing the imagination of the public and investment community alike, and with many seeing it as transformative in driving future economic growth. Certainly, it’s finding its way into all manner of devices and services.
But its thirst for data and generative capabilities across all modalities (text, images, audio, video etc.) will drive a need for a more hyperconnected compute fabric encompassing sensing, connectivity, processing and flexible storage. Addressing these needs will likely be dependent on a paradigm shift away from just connecting things, to enabling “connected intelligence”. Future networks will need to get ‘smarter’, and will likely only achieve this by embracing AI throughout.
Emergence of AI
Whilst AI has been around for 70 years, recent advances in compute combined with the invention of the Transformer ML architecture and an abundance of Internet-generated data has enabled AI performance to advance rapidly; the launch of ChatGPT in particular capturing worldwide public interest in November 2022 and heralding AI’s breakout year in 2023.
Many now regard AI as facilitating the next technological leap forward, and it has been recognised as a crucial technology within the UK Science & Technology Framework.
But how transformative will AI really be?
Sundar Pichai, CEO Alphabet believes “AI will have a more profound impact on humanity than fire, electricity and the internet”, whilst Bill Gates sees it as “the most important advance in technology since the graphical user interface”.
Investor sentiment has also been very positive, AI being the only sector within Deep Tech to increase over the past year, spurred on by the emergence of Generative AI (GenAI).
But is GenAI just another bubble?
The autonomous vehicle sector was in a similar position back in 2021, attracting huge investment at the time but ultimately failing to deliver on expectations.
In comparison, commercialisation of GenAI has gotten off to an impressive start with OpenAI surpassing $2bn of annualised revenue and potentially reaching $5 billion+ this year, despite dropping its prices massively as performance improved – GPT3 by 40x; GPT3.5 by 10x with another price reduction recently announced, the third in a year.
On the back of this stellar performance, OpenAI is raising new funding at a $100bn valuation ~62x its forward revenues.
Looking more broadly, the AI industry is forecast to reach $2tn in value by 2030, and contribute more than $15 trillion to the global economy, fuelled to a large extent by the rise of GenAI.
Whilst undeniably value creating, there is concern around AI’s future impact on jobs with the IMF predicting that 40% may be affected, and even higher in advanced economies. Whether this results in substantial job losses remains a point of debate, the European Central Bank concluding that in the face of ongoing development and adoption, “most of AI’s impact on employment and wages – and therefore on growth and equality – has yet to be seen”.
Future services enabled/enhanced by AI
Knowledge workers
GenAI has already shown an impressive ability to create new content (text, pictures, code) thereby automating, augmenting, and accelerating the activities of ‘knowledge workers’.
These capabilities will be applied more widely to the Enterprise in 2024, as discussed in Microsoft’s Future of Work report, and also extend to full multimodality (text, images, video, audio etc.), stimulating an uptick in more spatial and immersive experiences.
Spatial & immersive experiences (XR)
The market for AR/VR headsets has been in decline but likely to get a boost this year with the launch of Meta’s Quest 3 and Apple’s Vision Pro ‘spatial computer’.
Such headsets, combined with AI, enable a range of applications including:
- Enabling teams to collaborate virtually on product development
- Providing information for engineers in the field – e.g., Siemens and Sony
- Simulating realistic scenarios for training purposes (such as healthcare)
- Showcasing products (retail) and enabling new entertainment & gaming experiences
The Metaverse admittedly was over-hyped, but enabling users to “see the world differently” through MR/AR or “see a different world” through VR is expected to boost the global economy by $1.5tn.
Cyber-physical & autonomous systems
Just as AI can help bridge the gap for humans between the physical and the digital, GenAI can also be used to create digital twins for monitoring, simulating and potentially controlling complex physical systems such as machinery.
AI will also be used extensively in robotics and other autonomous systems, enhancing the computer vision and positioning & navigation (SLAM) of smart robots in factories, warehouses, ports, and smart homes with predictions that 80% of humans will engage with them daily by 2030.
Personal devices
And finally, AI is already prevalent in many personal devices; in future, “Language models running on your personal device will become your very personal AI assistant”.
Connected Intelligence
These future services will drive a shift to more data being generated at the edge within end-user devices or in the local networks they interface to.
As the volume and velocity increases, relaying all this data via the cloud for processing becomes inefficient, costly and reduces AI’s effectiveness. Moving AI processing at or near the source of data makes more sense, and brings a number of advantages over cloud-based processing:
- improved responsiveness – vital in smart factories or autonomous vehicles where fast reaction time is mission critical
- increased data autonomy – by retaining data locally thereby complying with data residency laws and mitigating many privacy and security concerns
- minimising data-transfer costs of moving data in/out of the cloud
- performing continuous learning & model adaptation in-situ
Moving AI processing completely into the end-user device may seem ideal, but presents a number of challenges given the high levels of compute and memory required, especially when utilising LLMs to offer personal assistants in-situ.
Running AI on end-user devices may therefore not always be practical, or even desirable if the aim is to maximise battery life or support more user-friendly designs, hence AI workloads may need to be offloaded to the best source of compute, or perhaps distributed across several, such as a wearable offloading AI processing to an interconnected smartphone, or a smartphone leveraging compute at the network edge (MEC).
Future networks, enabled by AI
Future networks will need to evolve to support these new compute and connectivity paradigms and the diverse requirements of all the AI-enabled services outlined.
5G Advanced will meet some of the connectivity needs in the near-term, such as low latency performance, precise timing, and simultaneous multi-bearer connectivity. But going forward, telecoms networks will need to become ‘smarter’ as part of a more hyperconnected compute fabric encompassing sensing, connectivity, processing and flexible storage.
Natively supporting AI within the network will be essential to achieving this ‘smartness’, and is also the stated aim for 6G. A few examples:
Hardware design
AI/ML is already used to enhance 5G Advanced baseband processing, but for 6G could potentially design the entire physical layer.
Network planning & configuration
Network planning & configuration is increasing in complexity as networks become more dense and heterogeneous with the move to mmWave, small cells, and deployment of neutral host and private networks. AI can speed up the planning process, and potentially enable administration via natural language prompts (Optimisation by Prompting (OPRO); Google DeepMind).
Network management & optimisation
Network management is similarly challenging given the increasing network density and diversity of application requirements, so will need to evolve from existing rule-based methods to an intent-based approach using AI to analyse traffic data, foresee needs, and manage network infrastructure accordingly with minimal manual intervention.
Net AI, for example, use specialised AI engines to analyse network traffic accurately in realtime, enabling early anomaly detection and efficient control of active radio resources with an ability to optimise energy consumption without compromising customer experience.
AI-driven network analysis can also be used as a more cost-effective alternative to drive-testing, or for asset inspection to predict system failures, spot vulnerabilities, and address them via root cause analysis and resolution.
In future, a cognitive network may achieve a higher level of automation where the human network operator is relieved from network management and configuration tasks altogether.
Network security
With the attack surface potentially increasing massively to 10 million devices/km2 in 6G driven by IoT deployments, AI will be key to effective monitoring of the network for anomalous and suspicious behaviour. It can also perform source code analysis to unearth vulnerabilities prior to release, and thereby help mitigate against supply chain attacks such as experienced by SolarWinds in 2020.
Energy efficiency
Energy consumption can be as high as 40% of a network‘s OPEX, and contribute significantly to an MNO’s overall carbon footprint. With the mobile industry committing to reducing carbon emissions by 2030 in alignment with UN SDG 9, AI-based optimisation in conjunction with renewables is seen as instrumental to achieving this.
Basestations are the ‘low-hanging fruit’, accounting for 70-80% of total network energy consumption. Through network analysis, AI is able predict future demand and thereby identify where and when parts of the RAN can be temporarily shut down, maintaining the bare-minimum network coverage and reducing energy consumption in the process by 25% without adversely impacting on perceived network performance.
Turkcell, for example, determined that AI was able to reduce network energy consumption by ~63GWh – to put that into context, it’s equivalent to the energy required to train OpenAI’s GPT-4.
Challenges
Applying AI to the operations of the network is not without its challenges.
Insufficient data is one of the biggest constraints, often because passive infrastructure such as diesel generators, rectifiers and AC, and even some network equipment, are not IP-connected to allow access to system logs and external control. Alternatively, there may not be enough data to model all eventualities, or some of the data may remain too privacy sensitive even when anonymised – synthesising data using AI to fill these gaps is one potential solution.
Deploying the AI/ML models themselves also throws up a number of considerations.
Many AI systems are developed and deployed in isolation and hence may inadvertently work against one another; moving to multi-vendor AI-native networks within 6G may compound this issue.
The AI will also need to be explainable, opening up the AI/ML models’ black-box behaviour to make it more intelligible to humans. The Auric framework that AT&T use for automatically configuring basestations is based on decision trees that provide a good trade-off between accuracy and interpretability. Explainability will also be important in uncovering adversarial attacks, either on the model itself, or in attempts to pollute the data to gain some kind of commercial advantage.
Skillset is another issue. Whilst AI skills are transferable into the telecoms industry, system performance will be dependent on deep telco domain knowledge. Many MNOs are experimenting in-house, but it’s likely that AI will only realise its true potential through greater collaboration between the Telco and AI industries; a GSMA project bringing together the Alan Turing Institute and Telenor to improve Telenor’s energy efficiency being a good example.
Perhaps the biggest risk is that the ecosystem fails to become sufficiently open to accommodate 3rd party innovation. A shift to Open RAN principles combined with 6G standardisation, if truly AI-native, may ultimately address this issue and democratise access. Certainly there are a number of projects sponsored within the European Smart Networks and Services Joint Undertaking (SNS JU) such as ORIGAMI that have this open ecosystem goal in mind.
Takeaways
The global economy today is fuelled by knowledge and information, the digital ICT sector growing six times faster than the rest of the economy in the UK – AI will act as a further accelerant.
Achieving its full potential will be dependent on a paradigm shift away from just connecting things, to enabling “connected intelligence”. Future networks will need to get ‘smarter’, and will likely only achieve this by embracing AI throughout.
The UK is a leader in AI investment within Europe; but can it harness this competence to successfully deliver the networks of tomorrow?
The popularity of ChatGPT has introduced the world to large language models (LLMs) and their extraordinary abilities in performing natural language tasks.
According to Accenture, such tasks account for 62% of office workers’ time, and 65% of that could be made more productive through using LLMs to automate or augment Enterprise working practises thereby boosting productivity, innovation, and customer engagement.
To give some examples, LLMs could be integrated into Customer Services to handle product queries, thereby improving response times and customer satisfaction. Equally, LLMs could assist in drafting articles, scripts, or promotional materials, or be used by analysts for summarising vast amounts of information, or gauging market sentiment by analysing customer reviews and feedback.
Whilst potentially disruptive and likely to lead to some job losses (by the mid-2030s, up to 30% of jobs could be automated), this disruption and new way of working is also forecast to grow global revenues by 9%.
It’s perhaps not surprising then that Enterprise executives are showing a keen interest in LLMs and the role they could play in their organisations’ strategies over the next 3 to 5 years.
Large language models such as OpenAI’s GPT-4 or GPT-3.5 (upon which ChatGPT is based) or open source alternatives such as Meta’s recently launched Llama2, are what’s known as foundation models.
Such models are pre-trained on a massive amount of textual data and then tuned through a process of alignment to be performant across a broad range of natural language tasks. Crucially though, their knowledge is limited by the extent of the data they were trained on, and their behaviour is dictated by the approach and objectives employed during the alignment phase.
To put it bluntly, a foundational LLM, whilst exhibiting a dazzling array of natural language skills, is less adept at generating legal documents or summarising medical information, and may be inadequate for those Customer Support applications requiring more empathy, and will certainly lack detailed knowledge on a particular product or business.
To be truly useful therefore, LLMs need to be adapted to the domain and particular use cases where they’ll be employed.
Domain-specific pre-training
One approach would be to collect domain-specific data and train a new model.
However, pre-training your own LLM from scratch is not easy, requiring massive amounts of data, lots of expensive compute hours for training the model, and a dedicated team working on it for weeks or even months. As a result, very few organisations choose this path, although notable examples include BloombergGPT (finance) and Med-PaLM 2 (medicine) and Nvidia have recently launched the NeMo framework to lend a helping hand.
Nonetheless, training a dedicated model is a serious undertaking and only open to those with the necessary resources. For everyone else, an alternate (and arguably easier) approach is to start with an existing foundational model such as GPT-3.5 and fine-tune from there.
Fine-tuning
As a form of transfer learning, fine-tuning adapts the parameters within a foundational model to better perform particular tasks.
Guidance from OpenAI for gpt-3.5-turbo indicates that 50-100 well-crafted examples is usually sufficient to fine-tune a model, although the amount will ultimately depend on the use case.
In comparison to domain-specific pre-trained models which require lots of resource, fine-tuning a foundational model requires less data, costs less, and can be completed in days, putting it well within the reach of many companies.
But it’s not without its drawbacks…
A common misconception is that fine-tuning enables the model to acquire new information, but in reality it only teaches it to perform better within particular tasks, a goal which can also be achieved through careful prompting as we’ll see later.
Fine-tuning also won’t prevent hallucinations that undermine the reliability and trustworthiness of the model’s output; and there is always a risk of introducing biases or inaccuracies into the model via the examples chosen, or inadvertently training it with sensitive information which subsequently leaks out (hence consideration should be given to using synthetic data).
Where support is required for a diverse set of tasks or edge cases within a given domain, relying on fine-tuning alone might result in a model that is too generic, performing poorly against each subtask. In such a situation, individual models may need to be created for each task and updated frequently to stay current and relevant as new knowledge becomes available, hence becoming a resource-intensive and cumbersome endeavour.
Fortunately, there are other techniques that can be employed, either in concert with or replacing fine-tuning entirely – prompt engineering.
Few-shot prompting
Irrespective of how a language model has been pre-trained and whether or not it’s been fine-tuned, the usefulness of its output is directly related to the quality of the prompt it receives. As so aptly put by OpenAI, “GPTs can’t read your mind“.
Although models can perform relatively well when prompted in a zero-shot manner (i.e., comprising just the task description and any input data), they can also be inconsistent, and may try to answer a question by regurgitating random facts or making something up from their training data (i.e., hallucinating) – they might know how words relate statistically, but they don’t know what they mean.
Output can be improved by supplementing the prompt with one or more input/output examples (few-shot) that provide context to the instruction as well as guidance on desired format, style of response and length; this is known as in-context learning (ICL); see below:
The order in which examples are provided can impact a model’s performance, as can the format used. Diversity is also incredibly important, models prompted with a diverse set of examples tending to perform better (although only the larger foundational models such as GPT-4 cope well with examples that diverge too far from what the model was originally pre-trained with).
Retrieval Augmented Generation
A good way of achieving this diversity is to retrieve task-specific examples from domain-specific knowledge sources using frameworks such as LlamaIndex, LangChain, HoneyHive, Lamini or Microsoft’s LLM-AUGMENTER.
Commonly referred to as Retrieval Augmented Generation, this approach ensures that the model has access to the most current and reliable domain-specific facts (rather than the static corpus it was pre-trained with), and users have visibility of the model’s sources thereby enabling its responses to be checked for accuracy.
As so aptly put by IBM Research, “It’s the difference between an open-book and a closed-book exam“, and hence it’s not surprising that LLMs perform much better when provided with external information sources to draw upon.
A straightforward way of implementing the RAG method is via a keyword search to retrieve relevant text chunks from external documentation, but a better approach is to use embeddings.
Put simply, embedding is a process by which the text is tokenised and passed through the LLM to create a numerical representation of the semantic meaning of the words and phrases within the text, and this representation is then be stored in a vector database (such as Pinecone, Weaviate or Chroma).
Upon receiving a query, the RAG system conducts a vector search of the database based on an embedding of the user query, retrieves relevant text chunks based on similarity and appends them to the prompt for feeding into the LLM:
Care though is needed to not overload the prompt with too much information as any increase in the prompt size directly increases the compute, time and cost for the LLM to derive an output (computation increasing quadratically with input length), and also risks exceeding the foundation model’s max prompt window size (and especially so in the case of open source models which typically have much smaller windows).
Whilst providing additional context and task-specific data should reduce the instances of hallucinations, LLMs still struggle with complex arithmetic, common sense, or symbolic reasoning, hence attention is also needed to the way the LLM is instructed to perform the task, an approach known as instruction prompting.
Instruction prompting
Chain of Thought (CoT) is one such technique, explored by Google and OpenAI amongst others, in which the model is directly instructed to follow smaller, intermediate steps towards deriving the final answer. Extending the prompt instruction with a phrase as simple as “Let’s consider step by step…” can have a surprising effect in helping the model to break down the task into steps rather than jumping in with a quick, and often incorrect, answer.
Self-ask is a similar approach in which the model is asked to generate and then answer sub-questions about the input query first (and with the option of farming out these sub-questions to Google Search to retrieve up-to-date answers), before then using this knowledge to compile the final answer (essentially a combination of CoT and RAG).
Yet another technique, Tree of Thoughts (ToT) is similar in generating a solution based on a sequence of individual thoughts, but goes further by allowing multiple reasoning paths to be considered simultaneously (forming a tree of potential thoughts) and exploring each in turn before settling on a final answer.
Whilst proven to be effective, these various instruction prompting techniques take a linear approach that progresses from one thought to the next. Humans think a little differently, following and sometimes combining insights from different chains of thought to arrive at the final answer. This reasoning process can be modelled as a graph structure and forms yet another area of research.
A final technique, which might seem even more peculiar than asking the model to take a stepwise approach (CoT and ToT) is to assign it a “role” or persona within the prompt such as “You are a famous and brilliant mathematician”. Whilst this role based prompting may seem bizarre, it’s actually providing the model with additional context to better understand the question, and has been found surprisingly to produce better answers.
Options & considerations
The previous sections have identified a range of techniques that can be employed to contextualise an LLM to Enterprise tasks, but which should you choose?
The first step is to choose whether to generate your own domain pre-trained model, fine-tune an existing foundational model, or simply rely on prompting at runtime:
There’s more discussion later on around some of the criteria to consider when selecting which foundational model to use…
Fine-tuning may at first seem the most logical path, but requires a careful investment of time and effort, hence sticking with a foundational model and experimenting with the different prompting techniques is often the best place to start, a sentiment echoed by OpenAI in their guidance for GPT.
Choice of which techniques to try will be dependent on the nature of the task:
Good results can often be achieved by employing different prompting techniques in combination:
- Detailed instructions (instruction prompting) – especially where the task involves complex reasoning
- Carefully chosen set of examples (few-shot learning) – to demonstrate the tone, format and length of output that is required
- Supplementary information (in-context learning, RAG & embeddings) – retrieved from domain-specific knowledge sources to provide more context
It’s also about balance – few-shot learning typically consumes a lot of tokens which can be problematic given the limited window size of many LLMs. So rather than guiding the model in terms of desired behaviour via a long set of examples, this can be offset by incorporating a more precise, textual description of what’s required via instruction prompting.
Prompt window size can also be a limitation in domains such as medical and legal which are more likely to require large amounts of information to be provided in the prompt; for instance most research papers (~5-8k tokens) would exceed the window size of the base GPT-3.5 model as well as many of the open source LLMs which typically only support up to 2,000 tokens (~1,500 words).
Choosing a different LLM with a larger window is certainly an option (GPT-4 can extend to 32k tokens), but as mentioned earlier will quadratically increase the amount of compute, time and cost needed to complete the task, hence in such applications it may be more appropriate to fine-tune the LLM, despite the initial outlay.
Model size is yet another factor that needs to be considered. Pre-training a domain-specific LLM, or fine-tuning a small foundational model (such as GPT-3.5 Turbo) can often match or even outperform prompting a larger foundation equivalent (such as GPT-4) whilst being smaller and requiring fewer examples to contextualise the prompt (by up to 90%) and hence cheaper to run.
Of course, fine-tuning and prompt engineering are not mutually exclusive, so there may be some benefit in fine-tuning a model generically for the domain, and then using it to develop solutions for each task via a combination of in-context learning and instruction prompting.
In particular, fine-tuning doesn’t increase domain-level knowledge, so reducing hallucinations might require adopting techniques such as instruction prompting, in-context learning and RAG/embedding, the latter also being beneficial where responses need to be verifiable for legal or regulatory reasons.
Essentially, the choice of approach will very much come down to use case. If the aim is to deliver a natural language search/recommendation capability for use with Enterprise data, a good approach would be to employ semantic embeddings within a RAG framework. Such an approach is highly scalable for dealing with a large database of documents, and able to retrieve more relevant content (via vector search) as well as being more cost-effective than fine-tuning.
Conversely, in the case of a Customer Support chatbot, fine-tuning the model to exhibit the right behaviours and tone of voice will be important, and could then be combined with in-context learning/RAG to ensure the information it has access to is up-to-date.
Choosing a foundational LLM
There are a range of foundational models to choose from with well-known examples coming from OpenAI (GPT-3.5), Google (PaLM 2), Meta (LLama2), Anthropic (Claude 2), Cohere (Command), Databricks (Dolly 2.0), and Israel’s AI21 Labs, plus an increasingly large array of open source variants that have often been fine-tuned towards particular skillsets.
Deployment on-prem provides the Enterprise with more control and privacy, but increasingly a number of players are launching cloud-based solutions that enable Enterprises to fine-tune a model without comprising the privacy of their data (in contrast to the public use of ChatGPT, for example).
OpenAI, for instance, have recently announced availability for fine-tuning on GPT-3.5 Turbo, with GPT-4 coming later this year. For a training file with 100,000 tokens (e.g., 50 examples each with 2000 tokens), the expected cost might be as little as ~$2.40, so experimenting with fine-tuning is certainly within the reach of most Enterprises albeit with the ongoing running costs of using OpenAI’s APIs for utilising the GPT model.
If an Enterprise doesn’t need to fine-tune, OpenAI now offer ChatGPT Enterprise, based on GPT-4, and with an expanded context window (32k tokens), better performance (than the public ChatGPT) and guaranteed security for protecting the Enterprise’s data.
Alternatively, Microsoft have teamed up with Meta to support Llama 2 on Azure and Windows, and for those that prefer more flexibility, Hugging Face has become by far the most popular open source library to train and fine-tune LLMs (and other modalities).
As mentioned previously, players are also bringing to market LLMs pre-trained for use within a particular domain; for example: BloombergGPT for finance; Google’s Med-PaLM-2 for helping clinicians determine medical issues within X-rays and Sec-PaLM which was tweaked for cybersecurity use cases; Salesforce’s XGen-7B family of LLMs for sifting through lengthy documents to extract data insights, or their Einstein GPT (based on ChatGPT) for use with CRM; IBM’s watsonx.ai geospatial foundation model for Earth observation data; AI21 Labs hyper-optimized task-specific models for content management or expert knowledge systems; Harvey AI for generating legal documents etc.
‘Agents’ take the capabilities of LLMs further still by taking a stated goal from the user and combining LLM capabilities with search and other functionality to complete the task – there are a number of open source projects innovating in this area (AutoGPT, AgentGPT, babyagi, JARVIS, HuggingGPT), but also commercial propositions such as Dust.
It’s a busy space… so what are the opportunities (if any) for startups to innovate and claim a slice of the pie?
Uncovering the opportunities
Perhaps not surprisingly given the rapid advancements that have been achieved over the past 12mths, attention in the industry has very much focused on deriving better foundational models and delivering the immense compute resources and data needed to train them, and consequently has created eye-wateringly high barriers for new entrants (Inflection AI recently raising $1.3bn to join the race).
Whilst revenues from offering foundational models and associated services look promising (if you believe the forecasts that OpenAI is set to earn $1bn over the next 12mths), value will also emerge higher up the value stack, building on the shoulders of giants so to speak, and delivering solutions and tools targeted towards specific domains and use cases.
Success at this level will be predicated on domain experience as well as delivering a toolchain or set of SaaS capabilities that enable Enterprises to quickly embrace LLMs, combine them with their data, and generate incremental value and a competitive advantage in their sector.
In stark contrast to the Big Data and AI initiatives in the past that have delivered piecemeal ‘actionable insights’, LLMs have the potential of unlocking comprehensive intelligence, drawing on every documented aspect of a given business, and making it searchable and accessible through natural language by any employee rather than being constrained by the resources of corporate Business Intelligence functions.
But where might startups go hunting for monetisable opportunities?
One potential option is around embeddings – noisy, biased, or poorly-formatted data can lead to suboptimal embeddings resulting in reduced performance, so is a potential micro-area for startups to address: developing a proposition, backed-up with domain-specific experience, and crafting an attractive niche in the value chain helping businesses in targeted sectors.
Another area is around targeted, and potentially personalised, augmentation tools. Whilst the notion of GenAI/LLMs acting as copilots to augment and assist humans is often discussed in relation to software development (GitHub Copilot; StarCoder), it could equally assist workers across a multitude of everyday activities. Language tasks are estimated to account for 62% of office workers’ time, and hence there is in theory huge scope for decomposing these tasks and automating or assisting them using LLM copilots. And just as individuals personalise and customise their productivity tools to best match their individual workflows and sensibilities, the same is likely to apply for LLM copilots.
Many expect that it will turn into an AI gold rush, with those proving commercial value (finding the gold) or delivering the tools to help businesses realise this value (picks & shovels) earning early success and with a chance of selling out to one of the bigger players keen to do a land grab (e.g., Salesforce, Oracle, Microsoft, GCP, AWS etc.) and before the competition catches up.
Defensibility though is likely to be a challenge, at least in the pure sense of protecting IP, and perhaps is reserved for those with access to domain-specific data sets that gives them an edge – Bloomberg, for instance, had the luxury of training their GPT model using their own repository of financial documents spanning forty years (a massive 363 billion tokens).
Takeaways
Foundational LLMs have come a long way, and can be used across a dazzling array of natural language tasks.
And yet when it comes to Enterprise applications, their knowledge is static and therefore out of date, they’re unable to state their source (given the statistical nature of how LLMs produce outputs), and they’re liable to deliver incorrect factual responses (hallucinations).
To do well in an Enterprise setting they need to be provided with detailed and appropriate context, and adequately guided.
Industry and academia are now working diligently to address these needs, and this article has outlined some of the different techniques being developed and employed.
But LLMs are still an immature technology, hence developers and startups that understand the technology in depth are likely to be the ones best able to build more effective applications, and build them faster and more easily, than those who don’t – this, perhaps, is the opportunity for startups.
As stated by OpenAI’s CEO Sam Altman, “Writing a really great prompt for a chatbot persona is an amazingly high-leverage skill and an early example of programming in a little bit of natural language”.
We’re entering the dawn of natural language programming…
Random numbers are used everywhere, from facilitating lotteries, to simulating the weather or behaviour of materials, and ensuring secure data exchange between infrastructure and vehicles in intelligent transport systems (C-ITS).
Perhaps most importantly though, randomness is at the core of the security we all rely upon for transacting safely on the internet. Specifically, random numbers are used in the creation of secure cryptographic keys for encrypting data to safeguard its confidentiality, integrity and authenticity.
Random numbers are provided via random number generators (RNGs) that utilise a source of entropy (randomness) and an algorithm to generate random numbers. A number of RNG types exist based on the method of implementation (e.g. hardware and/or software).
Pseudo-random number generators (PRNGs)
PRNGs that rely purely on software can be cost-effective, but are intrinsically deterministic and given the same seed will produce the exact same sequence of random numbers (and due to memory constraints, this sequence will eventually repeat) – whilst the output of the PRNG may be statistically random, its behaviour is entirely predictable.
An attacker able to guess which PRNG is being used can deduce its state by observing the output sequence and thereby predict each random number – and this can be as small as 624 observations in the case of common PRNGs such as the Mersenne Twister MT19937. Moreover, an unfortunate choice of seed can lead to a short cycle length before the number sequence repeats which again opens the PRNG to attack. In short, PRNGs are inherently vulnerable and far from ideal for cryptography.
True random number generators (TRNGs)
TRNGs extract randomness (entropy) from a physical source and use this to generate a sequence of random numbers that in theory are highly unpredictable. Certainly they address many of the shortcomings of PRNGs, but they’re not perfect either.
In many TRNGs, the entropy source is based on thermal or electrical noise, or jitter in an oscillator, any of which can be manipulated by an attacker able to control the environment in which the TRNG is operating (e.g., temperature, EMF noise or voltage modulation).
Given the need to sample the entropy source, a TRNG can be slow in operation, and fundamentally limited by the nature of the entropy pool – a poorly designed implementation or choice of entropy can result in the entropy pool quickly becoming exhausted. In such a scenario, the TRNG has the choice of either reducing the amount of entropy used for generating each random number (compromising security) or scaling back the number of random numbers it generates.
Either situation could result in the same or similar random numbers being output until the entropy pool is replenished – a serious vulnerability that can be addressed through in-built health checks, but with the risk that output is ceased hence opening the TRNG up to denial-of-service attacks. Ring-oscillator based TRNGs, for instance, have a hard limit in how fast they can be run, and if more entropy is sought by combining a few in parallel they can produce similar outputs hence undermining their usefulness.
IoT devices in particular often have difficulty gathering sufficient entropy during initialisation to generate strong cryptographic keys given the lack of entropy sources in these simple devices, hence can be forced to use hard-coded keys, or seed the RNG from unique (but easy to guess) identifiers such as the device’s MAC address, both of which seriously undermines security robustness.
Ideally, RNGs need an entropy source that is completely unpredictable and chaotic, not influenced by external environmental factors, able to provide random bits in abundance to service a large volume of requests and facilitate stronger keys, and service these requests quickly and at high volume.
Quantum random number generators (QRNGs)
QRNGs are a special class of RNG that utilise Heisenberg’s Uncertainty Principle (an inability to know the position and speed of a photon or electron with perfect accuracy) to provide a pure source of entropy and therefore address all the aforementioned requirements.
Not only do they provide a provably random entropy source based on the laws of physics, QRNGs are also intrinsically high entropy hence able to deliver truly random bit sequences and at high speed thereby enabling QRNGs to run much faster than other TRNGs, and more efficiently than PRNGs across high volume applications.
They are also more resistant to environmental factors, and thereby at less risk from external manipulation, whilst also being able to operate reliably in EMF noisy environments such as data centres for serving random numbers to thousands of servers realtime.
But not all QRNGs are created equal; poor design in the physical construction and/or processing circuitry can compromise randomness or reduce the level of entropy resulting in system failure at high volumes. QRNG designs embracing sophisticated silicon photonics in an attempt to create high entropy sources can become cost prohibitive in comparison to established RNGs, whilst other designs often have size and heat constraints.
Introducing Crypta Labs
Careful design and robustness in implementation is therefore vital – Crypta Labs have been pioneering in quantum technology since 2014 and through their research have developed a unique solution utilising readily available components that makes use of quantum photonics as a source of entropy to produce a state-of-the-art QRNG capable of delivering quantum random numbers at very high speeds and easily integrated into existing systems. Blueshift Memory is an early adopter of the technology, creating a cybersecurity memory solution that will be capable of countering threats from quantum computing.
Rapid advances in compute power are undermining traditional cryptographic approaches and exploiting any weakness; even a slight imperfection in the random number generation can be catastrophic. Migrating to QRNGs reduces this threat and provides resiliency against advances in classical compute and the introduction of basic quantum computers expected over the next few years. In time, quantum computers will advance sufficiently to break the encryption algorithms themselves, but such computers will require tens of millions of physical qubits and therefore are unlikely to materialise for another 10 years or more.
Post-quantum cryptography (PQC) algorithms
In preparation for this quantum future, an activity spearheaded by NIST in the US with input from academia and the private sector (e.g., IBM, ARM, NXP, Infineon) is developing a set of PQC algorithms that will be safe against this threat. A component part of ensuring these PQC algorithms are quantum-safe involves moving to much larger key sizes, hence a dependency on QRNGs able to deliver sufficiently high entropy at scale.
In summary, a transition by hardware manufacturers (servers, firewalls, routers etc.) to incorporate QRNGs at the board level addresses the shortcomings of existing RNGs whilst also providing quantum resilience for the coming decade. Not only are they being adopted by large corporates such as Alibaba, they also form a component part of the White House’s strategy to combat the quantum threat in the US.
Given that QRNGs are superior to other TRNGs, can contribute to future-proofing cryptography for the next decade, and are now cost effective and easy to implement in the case of Crypta Lab’s solution, they really are a no-brainer.
Foreword from David Leftley (CTO)
Light and its compositions has fascinated scientists for more than 2,000 years – from Aristotle to Newton, Young, Hertz and Maxwell et.al. But it was ultimately Einstein, with his photoelectric effect theory, who postulated that light is made up of particles called photons. Einstein is famous for his research on the theory of relativity yet it was his work on theoretically revealing the photoelectric effect based on the light quantum hypothesis that won him the Nobel Prize in physics in 1921.
The photon has many mysterious physical properties such as possessing the dual properties of a wave and a particle. And it was Einstein himself who described things such as entanglement as “spooky”. Learning more about these properties is allowing us to use light more effectively than ever before and applying this knowledge allows us to make significant advances in many new areas of science and technology. To date, photonics has been predominantly applied to communications with the introduction of optical fibre transforming networks as early as the 1960’s. But now we are seeing photonics taking hold in other areas such as computing, AI, sensing and quantum communications.
In the following brief, David Pollington explores the challenges and exposes some of the early stage innovation opportunities that exist in Photonics and these application areas.
The introduction of PICs
Until now, photonics has focused predominantly on enabling high-speed communications, a lucrative market that now tops $10bn just for optical transceivers. But photonics also has application in many other areas ranging from solid-state LiDAR to inertial navigation sensors, spectrometers, refractive index biosensors, quantum computers, and accelerating AI.
This article discusses the merits of using photons rather than electrons with an especial focus on photonic integrated circuits (PICs), the wide range of integrated photonics use cases, the current industry landscape of PICs, and the opportunities for startups to innovate in this space.
A PIC is a microchip containing photonic components which form a circuit that processes photons as opposed to electrons to perform a function.
In comparison with digital microelectronic circuits in which the majority of functions are performed using transistors, with photonics it’s a little more complex; there’s no single dominant device but rather a variety of components which can be either passive (e.g. couplers, switches, modulators, multiplexers) or active (e.g., amplifiers, detectors, lasers) and then interconnected via waveguides to form a circuit.
Figure 1: Electronic and photonic circuit building blocks [AIP Publishing]
Similar to digital microelectronics, photonic circuits can be fabricated on silicon, enabling high-density PICs to be made in existing CMOS foundries and co-integrated with transistor-based electronics. Silicon on insulator (SOI) has been the most widely used process within silicon photonics (SiPh) but is likely to be replaced in certain applications by Silicon Nitride (SiN) due to its wider spectral range, low propagation loss and higher power handling (particularly useful for detectors, spectrometers, biosensors, and quantum computers).
However, neither of the silicon-based processes can generate light on their own nor allow for the integration of active components, hence a separate process, Indium Phosphide (InP), is commonly used for fabricating high-performance amplifiers, lasers, and detectors.
Figure 2: Number of photonic components/PIC [Chao Xiang, University of California]
The design challenges with PICs
Photonic circuit design is complex; individual components need to be tailored to the target application requirements (wavelengths; linearities; power levels), process types (SiPh, InP, SiN) and fabs (characterisation), and hence often need to be designed from scratch, requiring large design teams and/or a dependency on independent design houses.
Once the photonic circuit has been designed and verified there are still hurdles in packaging, laser integration, testing and additional processing steps – whilst packaging, assembly, and testing are only a small part of the cost for digital microelectronics (10%), the reverse is true for photonics, and can be as much as 80% of the total cost for InP photonic devices.
Fabs are starting to address this issue by providing process design kits (PDKs) that designers can use to design, simulate, and verify designs before handing them back to the foundry for fabrication. These PDKs include a base set of photonic building blocks (BBs) to bootstrap the design process but are often limited to particular wavelengths and applications (e.g., telecoms).
A market opportunity therefore exists for design houses and other 3rd parties to license out BBs and even entire circuits that fit the broader set of application requirements. LioniX, for example, offers a range of SiN PIC modules, whilst PhotonDelta, a growth accelerator for the Dutch integrated photonics industry, offers a number of design libraries.
EDA tools can then be used for combining these BBs into photonic circuits whilst facilitating seamless integration of electronic and photonic components in IC designs where needed.
However, as mentioned earlier, these designs still then need to be optimised/characterised for the target process/fab, as imperfections and fluctuations of even a few nanometres can cause scattering or reflections and affect performance. In many respects, the photonic circuit design process is more akin to RF and PCB design than digital microelectronics – mostly analogue, and needing careful selection and qualification of components.
A number of parties are exploring ways of addressing some of these issues and accelerating photonic design. Researchers in Canada, for instance, are using machine learning to predict the precise structure of a photonic component once it’s fabricated to a particular process/fab thereby enabling accurate simulation and optimisation to circumvent the ‘trial and error’ nature of photonic design. Similarly, a startup in the UK, Wave Photonics, has pioneered computational techniques to auto-adapt functional building blocks for different wavelengths, fabs and performance design trade-offs.
Nevertheless, the fabrication process still involves a degree of trial and error today, and it may well be 3-5yrs and require a large number of wafer runs and assemblies before the process is perfected sufficiently to deliver the predictable outcomes required to scale up to larger circuits and high volumes.
The use of photonics in communications & networking
Digital microelectronics has become pervasive, but with the demand for ever-faster compute and higher-bandwidth networking, the interaction of electrons with other particles in the copper wires at these speeds is resulting in higher energy consumption, more heat, and restrictions to performance. Photons don’t suffer from any of these constraints being virtually frictionless and able to travel faster, support higher bandwidths, and be more energy efficient, hence present an intriguing alternative.
Whilst optical links have been introduced within data centres to form high-speed clusters, the wiring within the racks is typically copper, and as processing demands continue to rise, this is creating a bottleneck and issues around both energy consumption and cooling.
The answer is likely to be through Co-Packaged Optics (CPO) in which the switch ASICs and optical engines are integrated on a single packaged substrate to move the optical connection as close as possible to the switching chip. Doing so enables higher density integration and improves cost effectiveness and energy efficiency with savings of up to 30% of the total system power. Ayar Labs, for example, integrate the optical/electrical components of a transceiver bar the laser inside an optical I/O chiplet.
Figure 3: Ayar Labs optical I/O chiplet [Ayar Labs]
In a similar vein, Nvidia and TSMC are interconnecting multiple AI GPUs via a chip-on-wafer-on-substrate (CoWoS) 2.5D package, and Lightmatter’s Passage enables chiplets to be interconnected via nanophotonic waveguides.
The demand for low-power optical transceivers within data centres, and in particular for Co-Packaged Optics (CPO), will be a key driver in the growth of the silicon photonics market over the next 3-5yrs ($3-4 billion by 2025).
Accelerating AI
With AI compute requirements doubling every 3.4mths (c.f. Moore’s Law which had a 2-year doubling period) [OpenAI] fuelled most recently by the race to generative AI, there is a growing need to develop novel computing systems capable of fast and energy-efficient computation.
Figure 4: Computational requirements for training transformer models [Nvidia]
Silicon photonics may provide an answer, utilising the unique properties of light to solve complex mathematical problems and meet today’s AI compute demands but with energy consumption as low as one photon per operation and performed at the speed of light hence orders of magnitude faster and more energy efficient than digital computation (although getting data efficiently in/out the photonic chip remains a challenge).
To give an example, detecting edges in an image is of great use in the world of computer vision (e.g., for feature extraction or pattern detection) but requires a lot of compute to perform the CNN multiplication operations.
Figure 5: Example of image edge detection [Brighton Nkomo]
Fourier Transforms (FFT) represent a faster method, enabling the image data to be converted from the spatial domain to the frequency domain where edges will be represented by high frequencies which can be captured via a high pass filter. An inverse FFT then transforms the data back into an image showing just the edges of objects in the original image.
The downside is that FFTs themselves are computationally intensive, so this approach presents only a marginal improvement when using digital computation.
Light though has unique properties, and its interference behaviour can be used to perform FFT operations in a massively parallel way that is not only incredibly fast, but also tremendously energy efficient compared to digital computation.
Figure 6: Solving a complex mathematic equation with light [Ella Maru studio]
In practise though, there remain a few obstacles.
Optical components can’t be packed nearly as tightly as transistors hence chip size can be an issue, although membrane-based nanophotonic technologies should in future enable tens of thousands of components per chip, and new approaches such as the use of very thin nanostructured surfaces combined with semi-transparent mirrors are being explored for performing the matrix multiplications in AI/ML inference.
Another issue is around accuracy. Today’s implementations are mainly targeted at performing inference on ML models trained using digital systems. Physical imperfections in the PIC fabrication, and quantisation noise introduced through the optical/electrical converters for getting data in and out of the photonic chip, can result in a ‘reality gap’ between the trained model and inference output that adversely affects accuracy and energy efficiency.
These challenges though present huge opportunity for innovation, whether that be through improving PIC density, optimising the optical/electrical interface to improve precision, or harnessing the unique properties of light to deliver a step-change in AI inference performance and energy efficiency.
Salience Labs for instance are pioneering a novel ‘on-memory compute’ architecture and using different wavelengths of light to facilitate highly parallelised operation and boost performance, whilst Lumai are exploring the application of photonics for more efficient ML model training.
With the AI chip market projected to be worth a colossal $309bn by 2030 [Verified Market Research], the application of integrated photonics to AI acceleration is likely to attract a lot more investor interest and innovation going forward.
Integrated photonics in sensors
At a component level, integrated photonics is being employed in inertial sensors to achieve ultra-precise positioning/indoor navigation [Zero Point Motion], and separately is enabling laser diodes to be integrated with micro-optics and electrical interfaces on a millimetre-sized chip for use in AR/VR glasses as demonstrated in this YouTube video.
Figure 7: Fully integrated RGB laser light engine for retinal projection in AR [LionIX]
Integrated photonics also opens up the prospect of lab-on-a-chip (LOC) biosensors through a combination of miniaturisation, extreme sensitivity, supporting multiple simultaneous tests, and enabling mass production at low cost. The Point of Care (PoC) market is expected to double in the next few years to $10.1B by 2025 [Yole Development].
Figure 8: Diagnostics platform providing Point of Care (POC) tests [SurfiX Diagnostics]
And finally, the intrinsic benefits in photonics for computing FFTs can also be used to provide the massive vector transforms needed for fast and efficient fully homomorphic encryption (FHE) to enable secure processing in the cloud or by 3rd parties without the data ever being in the clear.
Can integrated photonics reach scale?
The opportunity is clear. But for integrated photonics to thrive and reach million-scale volumes across multiple sectors there will need to be a more comprehensive set of pre-validated design libraries and tools that decouple design from the underlying fabrication and packaging technology to enable a fabless model that attracts new entrants and innovation.
The opportunity for startups is therefore twofold 1) innovating within the design process & toolchain to reduce lead times and improve performance, 2) applying integrated photonics within new products & services in networking, AI acceleration, ultra-sensitive sensors, and healthcare.
Europe has a heritage in photonics, so it’s perhaps not surprising that European research organisations, spinouts and startups are leading the industry.
Figure 9: Value chain companies by geography [PhotonDelta: SiN; InP]
In the photonics design, packaging and testing space, example European companies include Alcyon Photonics, Wave Photonics, Bright Photonics, VLC Photonics, Photon Design, FiconTEC, PhotonDelta and LioniX.
Companies developing photonic chips to accelerate AI include Optalysys, Salience Labs and Lumai whilst those using photonics to produce ultra-sensitive sensors include Zero Point Motion, Miraex and PhotonFirst; SMART Photonics and EFFECT Photonics are addressing the telecoms/networking space, and organisations such as PhotonDelta and JePPIX are helping to coordinate the growth of integrated photonics across Europe.
Integrated photonics faces many challenges, but there is increasing evidence that the technology is set to follow the same trajectory as microelectronics over the coming years. The potential upside is therefore huge, both in terms of market value but also in the opportunity this presents for innovative startups.
If you’re a startup in this space, we’d love to hear from you.
An introduction from our CTO
Whilst security is undoubtedly important, fundamentally it’s a business case based on the time-value depreciation of the asset being protected, which in general leads to a design principle of “it’s good enough” and/or “it will only be broken in a given timeframe”.
At the other extreme, history has given us many examples where reliance on theoretical certainty fails due to unknowns. One such example being the Titanic which was considered by its naval architects as unsinkable. The unknown being the iceberg!
It is a simple fact that weaker randomness leads to weaker encryption, and with the inexorable rise of compute power due to Moore’s law, the barriers to breaking encryption are eroding. And now with the advent of the quantum-era, cyber-crime is about to enter an age in which encryption when done less than perfectly (i.e. lacking true randomness) will no longer be ‘good enough’ and become ever more vulnerable to attack.
In the following, Bloc’s Head of Research David Pollington takes a deeper dive into the landscape of secure communications and how it will need to evolve to combat the threat of the quantum-era. Bloc’s research findings inform decisions on investment opportunities.
Setting the scene
Much has been written on quantum computing’s threat to encryption algorithms used to secure the Internet, and the robustness of public-key cryptography schemes such as RSA and ECDH that are used extensively in internet security protocols such as TLS.
These schemes perform two essential functions: securely exchanging keys for encrypting internet session data, and authenticating the communicating partners to protect the session against Man-in-the-Middle (MITM) attacks.
The security of these approaches relies on either the difficulty of factoring integers (RSA) or calculating discrete logarithms (ECDH). Whilst safe against the ‘classical’ computing capabilities available today, both will succumb to Shor’s algorithm on a sufficiently large quantum computer. In fact, a team of Chinese scientists have already demonstrated an ability to factor integers of 48bits with just 10 qubits using Schnorr’s algorithm in combination with a quantum approximate optimization to speed-up factorisation – projecting forwards, they’ve estimated that 372 qubits may be sufficient to crack today’s RSA-2048 encryption, well within reach over the next few years.
The race is on therefore to find a replacement to the incumbent RSA and ECDH algorithms… and there are two schools of thought: 1) Symmetric encryption + Quantum Key Distribution (QKD), or 2) Post Quantum Cryptography (PQC).
Quantum Key Distribution (QKD)
In contrast to the threat to current public-key algorithms, most symmetric cryptographic algorithms (e.g., AES) and hash functions (e.g., SHA-2) are considered to be secure against attacks by quantum computers.
Whilst Grover’s algorithm running on a quantum computer can speed up attacks against symmetric ciphers (reducing the security robustness by a half), an AES block-cipher using 256-bit keys is currently considered by the UK’s security agency NCSC to be safe from quantum attack, provided that a secure mechanism is in place for sharing the keys between the communicating parties – Quantum Key Distribution (QKD) is one such mechanism.
Rather than relying on the security of underlying mathematical problems, QKD is based on the properties of quantum mechanics to mitigate tampering of the keys in transit. QKD uses a stream of single photons to send each quantum state and communicate each bit of the key.
However, there are a number of implementation considerations that affect its suitability:
Integration complexity & cost
- QKD transmits keys using photons hence is reliant on a suitable optical fibre or free-space (satellite) optical link – this adds complexity and cost, precludes use in resource-constrained edge devices (such as mobile phones and IoT devices), and reduces flexibility in applying upgrades or security patches
Distance constraints
- A single QKD link over optical fibre is typically limited to a few 100 km’s with a sweet spot in the 20–50 km range
- Range can be extended using quantum repeaters, but doing so entails additional cost, security risks, and threat of interception as the data is decoded to classical bits before re-encrypting and transmitting via another quantum channel; it also doesn’t scale well for constructing multi-user group networks
- Alternative, greater range can be achieved via satellite links, but at significant additional cost
- A fully connected entanglement-based quantum communication network is theoretically possible without requiring trusted nodes, but is someway off commercialisation and will be dependent again on specialist hardware
Authentication
- A key tenet of public-key schemes is mutual authentication of the communicating parties – QKD doesn’t inherently include this, and hence is reliant on either encapsulating the symmetric key using RSA or ECDH (which, as already discussed, isn’t quantum-safe), or using pre-shared keys exchanged offline (which adds complexity)
- Given that the resulting authenticated channel could then be used in combination with AES for encrypting the session data, to some extent this negates the need for QKD
DoS attack
- The sensitivity of QKD channels to detecting eavesdropping makes them more susceptible to denial of service (DoS) attacks
Post-Quantum Cryptography (PQC)
Rather than replacing existing public key infrastructure, an alternative is to develop more resilient cryptographic algorithms.
With that in mind, NIST have been running a collaborative activity with input from academia and the private sector (e.g., IBM, ARM, NXP, Infineon) to develop and standardise new algorithms deemed to be quantum-safe.
A number of mathematical approaches have been explored with a large variation in performance. Structured lattice-based cryptography algorithms have emerged as prime candidates for standardisation due to a good balance between security, key sizes, and computational efficiency. Importantly, it has been shown that lattice-based algorithms can be implemented on low-power IoT edge devices (e.g., using Cortex M4) whilst maintaining viable battery runtimes.
Four algorithms have been short-listed by NIST: CRYSTALS-Kyber for key establishment, CRYSTALS-Dilithium for digital signatures, and then two additional digital signature algorithms as fallback (FALCON, and SPHINCS+). SPHINCS+ is a hash-based backup in case serious vulnerabilities are found in the lattice-based approach.
NIST aims to have the PQC algorithms fully standardised by 2024, but have released technical details in the meantime so that security vendors can start working towards developing end-end solutions as well as stress-testing the candidates for any vulnerabilities. A number of companies (e.g., ResQuant, PQShield and those mentioned earlier) have already started developing hardware implementations of the two primary algorithms.
Commercial landscape
QKD has made slow progress in achieving commercial adoption, partly because of the various implementation concerns outlined above. China has been the most active, the QUESS project in 2016 creating an international QKD satellite channel between China and Vienna, and in 2017 the completion of a 2000km fibre link between Beijing and Shanghai. The original goal of commercialising a European/Asian quantum-encrypted network by 2020 hasn’t materialised, although the European Space Agency is now aiming to launch a quantum satellite in 2024 that will spend three years in orbit testing secure communications technologies.
BT has recently teamed up with EY (and BT’s long term QKD tech partner Toshiba) on a two year trial interconnecting two of EY’s offices in London, and Toshiba themselves have been pushing QKD in the US through a trial with JP Morgan.
Other vendors in this space include ID Quantique (tech provider for many early QKD pilots), UK-based KETS, MagiQ, Qubitekk, Quintessance Labs and QuantumCtek (commercialising QKD in China). An outlier is Arqit; a QKD supporter and strong advocate for symmetric encryption that addresses many of the QKD implementation concerns through its own quantum-safe network and has partnered with Virgin Orbit to launch five QKD satellites, beginning in 2023.
Given the issues identified with QKD, both the UK (NCSC) and US (NSA) security agencies have so far discounted QKD for use in government and defence applications, and instead are recommending post-quantum cryptography (PQC) as the more cost effective and easily maintained solution.
There may still be use cases (e.g., in defence, financial services etc.) where the parties are in fixed locations, secrecy needs to be guaranteed, and costs are not the primary concern. But for the mass market where public-key solutions are already in widespread use, the best approach is likely to be adoption of post-quantum algorithms within the existing public-key frameworks once the algorithms become standardised and commercially available.
Introducing the new cryptographic algorithms though will have far reaching consequences with updates needed to protocols, schemes, and infrastructure; and according to a recent World Economic Forum report, more than 20 billion digital devices will need to be upgraded or replaced.
Widespread adoption of the new quantum-safe algorithms may take 10-15 years, but with the US, UK, French and German security agencies driving the use of post-quantum cryptography, it’s likely to become defacto for high security use cases in government and defence much sooner.
Organisations responsible for critical infrastructure are also likely to move more quickly – in the telco space, the GSMA, in collaboration with IBM and Vodafone, have recently launched the GSMA Post-Quantum Telco Network Taskforce. And Cloudflare has also stepped up, launching post-quantum cryptography support for all websites and APIs served through its Content Delivery Network (19+% of all websites worldwide according to W3Techs).
Importance of randomness
Irrespective of which encryption approach is adopted, their efficacy is ultimately dependent on the strength of the cryptographic key used to encrypt the data. Any weaknesses in the random number generators used to generate the keys can have catastrophic results, as was evidenced by the ROCA vulnerability in an RSA key generation library provided by Infineon back in 2017 that resulted in 750,000 Estonian national ID cards being compromised.
Encryption systems often rely upon Pseudo Random Number Generators (PRNG) that generate random numbers using mathematical algorithms, but such an approach is deterministic and reapplication of the seed generates the same random number.
True Random Number Generators (TRNGs) utilise a physical process such as thermal electrical noise that in theory is stochastic, but in reality is somewhat deterministic as it relies on post-processing algorithms to provide randomness and can be influenced by biases within the physical device. Furthermore, by being based on chaotic and complex physical systems, TRNGs are hard to model and therefore it can be hard to know if they have been manipulated by an attacker to retain the “quality of the randomness” but from a deterministic source.Ultimately, the deterministic nature of PRNGs and TRNGs opens them up to quantum attack.
A further problem with TRNGs for secure comms is that they are limited to either delivering high entropy (randomness) or high throughput (key generation frequency) but struggle to do both. In practise, as key requests ramp to serve ever-higher communication data rates, even the best TRNGs will reach a blocking rate at which the randomness is exhausted and keys can no longer be served. This either leads to downtime within the comms system, or the TRNG defaults to generating keys of 0 rendering the system highly insecure; either eventuality results in the system becoming highly susceptible to denial of service attacks.
Quantum Random Number Generators (QRNGs) are a new breed of RNGs that leverage quantum effects to generate random numbers. Not only does this achieve full entropy (i.e., truly random bit sequences) but importantly can also deliver this level of entropy at a high throughput (random bits per second) hence ideal for high bandwidth secure comms.
Having said that, not all QRNGs are created equal – in some designs, the level of randomness can be dependent on the physical construction of the device and/or the classical circuitry used for processing the output, either of which can result in the QRNG becoming deterministic and vulnerable to quantum attack in a similar fashion to the PRNG and TRNG. And just as with TRNGs, some QRNGs can run out of entropy at high data rates leading to system failure or generation of weak keys.
Careful design and robustness in implementation is therefore vital – Crypta Labs have been pioneering in quantum tech since 2014 and through their research have designed a QRNG that can deliver hundreds of megabits per second of full entropy whilst avoiding these implementation pitfalls.
Summary
Whilst time estimates vary, it’s considered inevitable that quantum computers will eventually reach sufficient maturity to beat today’s public-key algorithms – prosaically dubbed Y2Q. The Cloud Security Alliance (CSA) have started a countdown to April 14 2030 as the date by which they believe Y2Q will happen.
QKD was the industry’s initial reaction to counter this threat, but whilst meeting the security need at a theoretical level, has arguably failed to address implementation concerns in a way which is cost effective, scalable and secure for the mass market, at least to the satisfaction of NCSC and NSA.
Proponents of QKD believe key agreement and authentication mechanisms within public-key schemes can never be fully quantum-safe, and to a degree they have a point given the recent undermining of Rainbow, one of the short-listed PQC candidates. But QKD itself is only a partial solution.
The collaborative project led by NIST is therefore the most likely winner in this race, and especially given its backing by both the NSA and NCSC. Post-quantum cryptography (PQC) appears to be inherently cheaper, easier to implement, and deployable on edge devices, and can be further strengthened through the use of advanced QRNGs. Deviating away from the current public-key approach seems unnecessary compared to swapping out the current algorithms for the new PQC alternatives.
Looking to the future
Setting aside the quantum threat to today’s encryption algorithms, an area ripe for innovation is in true quantum communications, or quantum teleportation, in which information is encoded and transferred via the quantum states of matter or light.
It’s still early days, but physicists at QuTech in the Netherlands have already demonstrated teleportation between three remote, optically connected nodes in a quantum network using solid-state spin qubits.
Longer term, the goal is to create a ‘quantum internet’ – a network of entangled quantum computers connected with ultra-secure quantum communication guaranteed by the fundamental laws of physics.
When will this become a reality? Well, as with all things quantum, the answer is typically ‘sometime in the next decade or so’… let’s see.