Google Lengthens, Mixes, Broadens AI With Gemini Toolset


AI is growing. The push to develop, refine, segment and interconnect Large Language Models (LLMs) as they become essential components of Artificial Intelligence (AI)-centric enterprise software applications is never far from the news.

Google DeepMind CEO Demis Hassabis has unveiled the next version of Google’s Gemini LLM, now at version 1.5. The model formerly known as Bard has been described as a ‘step change’ in technology advancement and its Pro version is available as a developer preview.

Long-context understanding

Hassabis has called out Gemini 1.5 for its ability to deliver ‘long-context understanding’ - a term used to describe an AI model’s ability to track vector relationships across longer pieces of text and, as we move to multi-modal AI ingestion of images, video, sound files and more, across other data sources as well.

Because performance naturally degrades as the amount of data in a piece of information increases, long-context understanding technology is engineered to create relationships between data points that are further apart and, crucially, not necessarily at the start or end of a piece of information.
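To make that failure mode concrete, here is a minimal Python sketch of what happens when a model’s context window is smaller than the information fed to it. The count_tokens helper is a crude stand-in for a real tokenizer, not any particular vendor’s API:

```python
# A minimal sketch of why context-window size matters.
# count_tokens() is a hypothetical stand-in for a real tokenizer.

def count_tokens(text: str) -> int:
    # Crude approximation: real tokenizers (BPE etc.) differ.
    return len(text.split())

def fit_to_context(chunks: list[str], budget: int) -> list[str]:
    """Keep the most recent chunks that fit inside the token budget.

    With a small budget, earlier chunks are silently dropped - exactly
    the failure mode long-context models are meant to avoid, where
    relationships anchored at the start of a document get cut away.
    """
    kept: list[str] = []
    used = 0
    for chunk in reversed(chunks):   # newest first
        cost = count_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return list(reversed(kept))

history = ["intro " * 400, "middle " * 400, "latest question " * 10]
print(len(fit_to_context(history, budget=500)))   # the oldest chunk is lost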

Having first emerged in December last year, this version of the Gemini line is positioned as a ‘research release’ exclusively for software application developers and Google cloud customers. In some ways redolent of the developer preview system Redmond used in the Microsoft Developer Network (MSDN), the company appears to be aligning more closely with the programming community (or at least putting it first) than some of the open source approaches seen elsewhere. Whether Google is trying to win hearts and minds, put a tighter strap on AI safety, or simply looking to exert more overall control and steerage is open to discussion.

Mixture-of-Experts

Hassabis has also explained how Gemini 1.5 offers what is known as a new Mixture-of-Experts (MoE) architecture. An approach designed to divide neural network architecture logic into smaller incremental ‘expert’ networks, MoE is emblematic of the way we are now looking to develop AI into more specialized component model structures, each really good at one thing (or at least at fewer things) than their larger generalist counterparts.

Because there is such a vast quantity of information available in any given ‘training corpus’ (body of knowledge or work), if AI models are allowed to focus in one direction or another they can make more sense of what is happening around them. Think of it like a mixed group of specialists sitting in a room, some of whom understand food science and gastronomy, while others have an innate appreciation for rocket science. If a video is played showing how to make a perfect omelette, the food specialists light up and absorb the information while the rocket scientists mostly switch off or start to think about lunch. MoE models are built to selectively come to life, using the relevant expert pathways in a neural network architecture only when it matters.

“Depending on the type of input given, MoE models learn to selectively activate only the most relevant expert pathways in its neural network. This specialization massively enhances the model’s efficiency. Google has been an early adopter and pioneer of the MoE technique for deep learning through research,” explained Hassabis, in a Google AI blog post. “Our latest innovations in model architecture allow Gemini 1.5 to learn complex tasks more quickly and maintain quality while being more efficient to train and serve. These efficiencies are helping our teams iterate, train and deliver more advanced versions of Gemini faster than ever before… and we’re working on further optimizations.”
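For readers who want to see the gating idea Hassabis describes in code, below is a toy top-k Mixture-of-Experts layer in Python/NumPy. It is a sketch of the general technique with made-up dimensions, not a representation of Gemini’s actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS, D_IN, D_OUT, TOP_K = 4, 8, 8, 2

# Each 'expert' is just a small linear layer in this toy example.
experts = [rng.normal(size=(D_IN, D_OUT)) for _ in range(N_EXPERTS)]
router = rng.normal(size=(D_IN, N_EXPERTS))   # learned in a real model

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route the input to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]          # most relevant experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over chosen experts
    # Only the selected experts run; the rest stay idle - the efficiency win.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

out = moe_forward(rng.normal(size=D_IN))
print(out.shape)   # (8,)
```

The key point is in the loop at the end: inputs about omelettes never pay the compute cost of the rocket-science experts.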

Token offerings

In this next-generation version of Gemini, Google has increased the amount of information its AI models can process, to the point of running up to 1 million tokens consistently. As previously explained here, tokens are a core AI technique used for segmenting, defining and classifying words, parts of words (even individual letters) or word components so that AI models can be taught to place relationships and values on pieces of information. Tokens can also represent images, videos, audio or code. The more tokens an AI model can handle, the more knowledge it can potentially have.
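A toy example helps here. The sketch below assigns integer token IDs to words; real systems use learned subword vocabularies (BPE, SentencePiece and the like) rather than this naive whitespace split:

```python
# A toy illustration of tokenization; real models use learned subword
# vocabularies, and multi-modal models tokenize images, audio and video too.

def toy_tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedily map words to integer token IDs, growing the vocabulary."""
    ids = []
    for word in text.lower().split():
        ids.append(vocab.setdefault(word, len(vocab)))
    return ids

vocab: dict[str, int] = {}
ids = toy_tokenize("Gemini handles text, images, video and audio", vocab)
print(ids)          # [0, 1, 2, 3, 4, 5, 6]
print(len(vocab))   # what a model 'knows' scales with what it can tokenize
```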

As Google’s Chaim Gartenberg notes, “The full 1 million token context window is computationally intensive and still requires further optimizations to improve latency, which we’re actively working on as we scale it out.”

Is it safe then? According to Hassabis, this latest technology update has been built according to and in line with Google’s own AI Principles and robust safety policies.

“We’re ensuring our models undergo extensive ethics and safety tests. We then integrate these research learnings into our governance processes and model development and evaluations to continuously improve our AI systems,” he said. “Since introducing [Gemini] 1.0 Ultra in December, our teams have continued refining the model, making it safer for a wider release. We’ve also conducted novel research on safety risks and developed red-teaming techniques to test for a range of potential harms.”

Gemini family

There’s a whole ‘family’ of Google Gemini options, from the company’s Nano product, which is engineered for the mobile phone space, upward to Gemini Pro for developers and onward to the premium Gemini Ultra version. Whether the price and service capability differentiation Google currently offers remains part of the way the product is channelled in future is not known, but it makes reasonable sense to diversify its delivery given the wider trend for AI itself to diversify and specialize as showcased here.
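For developers, selecting a tier is typically just a model identifier passed to Google’s generative AI Python SDK (google-generativeai). The snippet below is illustrative; model names, tiers and availability change over time, so treat the identifiers as examples and check the official documentation:

```python
# A minimal sketch of picking a Gemini tier via the google-generativeai SDK.
# The model name below is illustrative of the developer-facing Pro tier;
# Nano targets on-device use and Ultra sits at the premium end.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")   # placeholder credential

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Summarize Mixture-of-Experts in one line.")
print(response.text)
```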

As we now appreciate the need to include Small Language Models (SLMs), also known as ‘private AI’ among a range of other names, in our LLM universe, the need to mix, broaden and lengthen our use of AI with fluid tokenization control and Mixture-of-Experts (MoE) architectures is very of the moment.

Did Google name Gemini because it wanted us to think of its AI as a ‘twin’ to our own human existence? Sadly not; most sources think it was the coming together of two internal Google AI teams (Google Brain & Google DeepMind), but the astrological connection doesn’t hurt either.
