2022: We reviewed this year’s AI breakthroughs

Linking to demos so that you can also review them yourself

Eleni Nisioti
10 min read · Dec 23, 2022


Have you found the leaps AI has made in the past few years impressive? Just wait until you hear what happened in 2022.

In our review of 2019 we talked a lot about reinforcement learning and Generative Adversarial Networks (GANs), in 2020 we focused on Natural Language Processing (NLP) and algorithmic bias, and in 2021 Transformers stole the spotlight.

This year was intense: we got, among other things, a new kind of generative model that beats GANs, an AI-powered chatbot that talked with more than 1 million people in a week, and prompt engineering, a job that did not exist a year ago.

To cover as many breakthroughs as possible we have broken down our review into four parts:

🎨 Text-to-Image generation

💬 Language generation

🎲 Games

🧬 Biology

We provide links to all currently available demos: many of this year’s inventions come with a demo that allows you to personally interact with a model. This is a great and fun way to form your own opinions about what these models can and can’t do.

Some of this year’s breakthroughs created controversies. As people played around with them, problems such as bias and misinformation became apparent. We believe that these controversies are part of the breakthroughs and will discuss them alongside each one.

🎨 Text-to-Image generation

Text-to-Image generation is the automated generation of images based on a prompt provided by human users. Remember OpenAI’s Dall-e and its avocado chair? This was one of the first appearances of an AI model used for Text-to-Image generation. The world was hooked. We, humans, can easily imagine weird concepts using language, but seeing what we imagine rendered as images is another thing.

Dall-e, and pre-2022 tools in general, owed their success either to the Transformer or to Generative Adversarial Networks. The former is a powerful architecture for artificial neural networks that was originally introduced for language tasks (you’ve probably heard of GPT-3?) and the latter is a technique for generating data (such as very realistic photographs).

What happened?

In 2022 we got diffusion models (NeurIPS paper). These are generative models (as are GANs and auto-encoders) but they are trained following a very different logic. The idea is (as most successful ideas in machine learning are) rather simple: these models slowly destroy the original images by adding random noise to them and then learn how to remove this noise. In this way, they learn what matters about the data.
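
To make this concrete, here is a minimal, hypothetical sketch of that idea in PyTorch: a toy denoiser is trained to predict the noise that was mixed into the data. The real systems condition a large U-Net on the text prompt and add far more engineering, so treat this purely as an illustration of the training logic.

```python
# Toy sketch of diffusion training (illustrative only, not DALL-E 2 / Imagen / Stable Diffusion code).
import torch
import torch.nn as nn

T = 1000                                  # number of noising steps
betas = torch.linspace(1e-4, 0.02, T)     # noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def add_noise(x0, t):
    """Forward process: destroy the data by mixing in Gaussian noise at step t."""
    noise = torch.randn_like(x0)
    keep = alphas_bar[t].sqrt().view(-1, 1)
    destroy = (1.0 - alphas_bar[t]).sqrt().view(-1, 1)
    return keep * x0 + destroy * noise, noise

# A toy denoiser; in practice this is a large text-conditioned U-Net.
denoiser = nn.Sequential(nn.Linear(64 + 1, 128), nn.ReLU(), nn.Linear(128, 64))
optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

for step in range(100):
    x0 = torch.rand(32, 64)                    # stand-in for a batch of flattened images
    t = torch.randint(0, T, (32,))
    xt, noise = add_noise(x0, t)
    pred = denoiser(torch.cat([xt, t.float().unsqueeze(1) / T], dim=1))
    loss = ((pred - noise) ** 2).mean()        # learn to predict (and hence remove) the noise
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```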

Here are some of the diffusion models this year gave us: Dall-e 2 by OpenAI, Imagen by Google and Stable Diffusion by Stability AI (openly distributed through Hugging Face). Compared to the first version of Dall-e, the new models can create more complex images and understand more natural language:

Examples of what different Text-to-Image models output for different prompts
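
If you would rather experiment from code than from the demos, here is a hedged sketch using the Hugging Face diffusers library; it assumes the library is installed and that the publicly hosted Stable Diffusion weights are available under the model id shown.

```python
# Hedged sketch: text-to-image generation with the diffusers library (assumed setup).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # assumed model id for the public weights
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # a GPU turns minutes into seconds

image = pipe("an armchair in the shape of an avocado").images[0]
image.save("avocado_chair.png")
```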

How is this even possible?

When we compare the output of this year’s models to that of Dall-e, we can talk about a new generation of Text-to-Image models. What is rather dizzying is that the leap to the new generation happened within a year. We suspect the reasons are: diffusion models are more stable and powerful than the generative models of the past, the neural networks are getting bigger and, last but not least, additional engineering ensures that the images look more photo-realistic.

Who should I follow?

These models have immense training costs and large commercial value (think content generation for websites and artwork), so it makes sense that progress is driven by a handful of large companies: OpenAI, Google and Hugging Face are some examples. But the wider community has been quick to respond: as companies in this area tend not to publish about or openly share their models, independent research labs such as Midjourney are creating their own alternatives.

The elephant in the room?

These models are disrupting many industries: from online content generation to art, today’s Text-to-Image models can for the first time compete with humans in ingenuity and complexity. Perhaps these models will obliterate many jobs while at the same time creating new jobs, such as prompt engineering. What is the role of humans in this new world and what do artists have to say?


💬 Language generation

Machines that can understand and communicate in our own language have always been hard to construct. As humans we do not know exactly how we learn language: it just happens. The first computational linguistics methods tried to bypass the immense complexity of human language learning by hard-coding syntax and grammar rules into their models. Then came the first deep learning approaches, which saw language generation simply as a prediction task: if you can predict the next word of every sentence then you are effectively speaking the _ .
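
As a toy illustration of language generation as prediction, here is a tiny bigram model over a made-up corpus; it simply picks the most frequent next word. Modern models do the same thing in spirit, only with billions of parameters and much longer contexts.

```python
# Toy "predict the next word" model (hypothetical mini-corpus, illustrative only).
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept on the mat".split()

# Count which word tends to follow which.
counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    counts[current][nxt] += 1

def predict_next(word):
    """Return the most frequent follower of `word`, or None if we never saw it."""
    return counts[word].most_common(1)[0][0] if counts[word] else None

# Generate by repeatedly predicting the next word.
word = "the"
for _ in range(5):
    print(word, end=" ")
    word = predict_next(word)
```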

These early systems had difficulties learning long-range dependencies but were effective enough to be used in tasks such as translation. Then came the Transformer architecture, which solved the issue of long-range dependencies, and, along with it, the family of BERT models, GPT-2 and its larger successor, GPT-3. The debate was on again: maybe language generation is really just a prediction task?

What happened?

This year there was a new trend in language models: instead of getting bigger they started specialising.

  • ChatGPT is a smaller cousin of GPT-3 customised for chatting. It was created by fine-tuning GPT-3 with supervised and reinforcement learning: humans provided dialogues or rated ChatGPT’s dialogues in order to guide it towards not just writing human language, but also chatting in a believable way.
  • Galactica is a language model by Meta AI customised for scientific research. It has been trained on a large corpus of scientific papers and can respond to questions with technical explanations and relevant papers you should look into.
  • GitHub introduced its Copilot plugin and Amazon its CodeWhisperer for automatic code generation. With these tools you can describe in natural language what functionality you want your program to have. This trend started in 2021, with OpenAI Codex, a GPT-3-based tool.

How is this even possible?

Reducing the range of applications a model can be used for can have a similar effect to increasing its size: the model has more computational power to devote to the concepts it needs to learn. Although we do not know exactly how ChatGPT works, we also expect that there are a lot of hard-coded behaviours: ChatGPT is censored, biased towards positive interactions, and dialogues with it involve many disclaimers.
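
To illustrate the specialisation idea, here is a minimal sketch of supervised fine-tuning: a small pretrained causal language model is trained a little further on dialogue-formatted text using the Hugging Face transformers library. This is only the supervised part and only a toy version of it; it is not OpenAI’s actual pipeline, which also uses reinforcement learning from human feedback at much larger scale.

```python
# Minimal supervised fine-tuning sketch (toy dialogues, illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

dialogues = [
    "User: How do diffusion models work?\nAssistant: They learn to remove noise from data...",
    "User: Write a short poem about code.\nAssistant: Bugs at midnight, tests at dawn...",
]

model.train()
for text in dialogues:
    batch = tokenizer(text, return_tensors="pt")
    # Causal LM objective: the labels are the inputs themselves (predict the next token).
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```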

GPT-3 writes a poem
ChatGPT writes a poem

Who should I follow?

We expect that more companies will join this market to offer automated solutions, but the largest advances will be made by those who own large amounts of specialised data, like GitHub, and are able to spend hundreds of thousands of dollars a day to keep their models running.

The elephant in the room?

As the whole world was beta testing these models many problems came out, among which misinformation and bias were the most prominent. It is telling that Galactica was taken down by Meta AI after only three days of use. Reading this post, you too know that all language models do is predict the most plausible next thing to say: plausible does not mean true, it simply means believable to a user interacting with them. You should take whatever these models say with a grain of salt: ChatGPT is less your all-knowing assistant and more your uncle John, who has a reply to everything and never fact-checks. This does not mean these models are useless: they are fun to have around and, once specialised and correctly engineered, can prove useful to conscientious users.

Another issue with models injudiciously scraping the web is copyright: this is most prevalent for licensed content, such as online code. Is the AI that scrapes millions of repositories on GitHub reusing and modifying the code? Or is it “just learning” from it? We in fact know that many of the repositories scraped for OpenAI Codex and GitHub Copilot were under restrictive licences, which puts these tools in a legal grey area.


🎲 Games

We pause our discussion of ground-breaking AI applications to look at which games the AI community solved this year. Games are fun, but this is only part of the reason why AI researchers are obsessed with them. In contrast to our real world, where biases and problems are lurking even after the heaviest data engineering, games are test-beds with well-defined rules. This is why some of the biggest AI feats after the deep learning boom of 2006 involved playing chess, Go and StarCraft.

When an AI research group chooses a game to work on, they look for a property of the problem that AI is currently struggling with: for example, we started with large search spaces (Go), then moved to cooperation and competition (StarCraft), then tackled delayed and sparse rewards (the Atari game Montezuma’s Revenge).

What happened?

A focus of recent years has been partially observable problems: in some games, like Hanabi, you need to make choices while some important information is hidden from you. Most multi-player games are actually partially observable.

This year we saw success in two such games:

  • Stratego, where DeepMind’s DeepNash agent reached expert-level play in a game of imperfect information.
  • Diplomacy, where Meta AI’s CICERO agent combined strategic reasoning with natural-language negotiation to play at a human level.

How is this even possible?

These works are important feats of coupling reinforcement learning with game theory for strategic reasoning in multi-player set-ups. The agent playing Stratego learned how to bluff against human players on many occasions, and the Diplomacy agent learned how to communicate about its intentions in order to cooperate and compete. The neural networks employed in such set-ups are much smaller than the ones used for language models, and they train on immense amounts of free unlabelled data using self-play: the algorithm plays against itself so that no human-labelled data are necessary.
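
As a toy illustration of self-play (and emphatically not the actual DeepNash or CICERO algorithms), here is an agent that improves its rock-paper-scissors policy with regret matching by playing against a copy of itself. No human-labelled data is involved, and the average policy drifts towards the game-theoretic equilibrium.

```python
# Toy self-play with regret matching on rock-paper-scissors (illustrative only).
import numpy as np

# Payoff for player 1: rows = my action, columns = opponent action (R, P, S).
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])

def policy_from_regrets(regrets):
    positive = np.maximum(regrets, 0)
    return positive / positive.sum() if positive.sum() > 0 else np.ones(3) / 3

regrets = np.zeros(3)
avg_policy = np.zeros(3)

for _ in range(10_000):
    policy = policy_from_regrets(regrets)
    my_action = np.random.choice(3, p=policy)
    opp_action = np.random.choice(3, p=policy)   # the "opponent" is a copy of the agent
    # Regret: how much better each alternative action would have done this round.
    regrets += PAYOFF[:, opp_action] - PAYOFF[my_action, opp_action]
    avg_policy += policy

print(avg_policy / avg_policy.sum())  # approaches the (1/3, 1/3, 1/3) equilibrium
```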

Who should I follow?

DeepMind is traditionally the company announcing big feats in AI playing games, but there are many other companies and universities working in this area. Perhaps we should expect a similar project by OpenAI soon?

The elephant in the room?

As AI is entering the realm of social games, it is becoming obvious that the way humans play is not necessarily optimal: our emotions and social norms come into play. So caution is now required whenever we hear that some algorithm achieved super-human performance in a game. If we want to drive this field forward, should we instead make AI that loses like humans?

A lot of human players will soften their approach or they’ll start getting motivated by revenge and CICERO never does that. It just plays the situation as it sees it. So it’s ruthless in executing to its strategy, but it’s not ruthless in a way that annoys or frustrates other players — Andrew Goff, Diplomacy World Champion

🧬 AI for Biology

Biology is a field with many pressing questions, a lot of unlabelled data and hard computational problems. It is not surprising that it has become a major application area for deep learning. DeepMind’s AlphaFold was an impressive first step in this direction: Transformers managed to predict protein structures from their amino-acid sequences, a task immensely useful for drug discovery, where protein structures are important for understanding the functionality of certain chemicals but are costly to determine in a lab.

What happened?

This year Meta AI presented its Metagenomic Atlas project: the objective here is to create a database that reveals the structure of the metagenomic world. This world is vast but hidden to us (they call it the dark matter of the protein universe): each time you grab a handful of soil, you are holding millions of DNA samples corresponding to proteins we have never seen before. What proteins is this DNA coding for and what is their purpose? Analysing this dark matter will enable us to understand nature and evolution better, with some anticipated applications being the discovery of new treatments for diseases and the production of cleaner energy.

How is this even possible?

This project builds upon knowledge acquired through AlphaFold: it uses a neural network with the Transformer architecture, but one order of magnitude larger than the one used for AlphaFold and heavily engineered to learn quicker.
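
For the curious, here is a hedged sketch of how a practitioner might fold a sequence with the underlying ESMFold model, assuming Meta AI’s fair-esm package is installed and that its interface matches the published README; treat it as illustrative rather than as the Atlas pipeline itself.

```python
# Hedged sketch: predicting a protein structure with ESMFold via fair-esm (assumed setup).
import torch
import esm

model = esm.pretrained.esmfold_v1()    # large Transformer-based structure predictor
model = model.eval()                   # move to a GPU with .cuda() if one is available

# A short amino-acid sequence (illustrative input, not a protein of special interest).
sequence = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"

with torch.no_grad():
    pdb_string = model.infer_pdb(sequence)   # predicted 3D structure in PDB format

with open("prediction.pdb", "w") as f:
    f.write(pdb_string)
```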

Live interaction with the MetaGenomic Atlas and paper analysis

Who should I follow?

The market for AI in synthetic biology is in constant bloom. In addition to AI behemoths, many AI start-ups and biotech companies are springing up, battling certain diseases with AI tools or developing novel ways in which prediction models can benefit biology.

The elephant in the room?

We have a word of caution for almost every application we describe in this post. But biology is different: problems are well-defined here and the users of these tools are trained experts rather than the public. As long as these AI discoveries are accompanied by meaningful solutions that practitioners can make sense of, we believe that this line of research will not create any controversies.


We hope you enjoyed this post as much as we enjoyed writing it. Preparing these reviews makes us realise: no matter how closely you follow AI news throughout the year, you can’t help but be surprised at the end by the sheer amount of progress in this field.

As we close out another year and look forward to all the possibilities in the field of AI, we wish you a Merry Christmas and a happily exciting new year.

Applied Data Science Partners is a London-based consultancy that implements end-to-end data science solutions for businesses, delivering measurable value. If you’re looking to do more with your data, please get in touch via our website. Follow us on LinkedIn for more AI and data science stories!


Eleni Nisioti

PhD student in AI. Deep learning is not just for machines. I like my coffee like I like my code. Without bugs.