I previously published a blog post that was a glossary of terms related to artificial intelligence. It included this brief definition of "generative AI":
Generative AI: Generative AI refers to artificial intelligence systems that are designed to generate new content, whether it's text, images, music or other creative outputs. These systems often use techniques like generative adversarial networks (GANs) or autoregressive models to create content that is original and, in some cases, indistinguishable from content created by humans.
I expect that for someone learning about AI, it's frustrating to read definitions that include other terms you may not understand. In this case, "generative adversarial networks" (GANs) is probably a new term for many readers. This post explains what GANs are for that reason, and also because they're super cool.
GANs are a class of machine learning models, built from neural networks, that are typically used for unsupervised learning. They were introduced in 2014 by Ian Goodfellow and colleagues and have since become a powerful tool in many domains, particularly for generating realistic, high-quality data such as images, text and audio.
The core concept of GANs revolves around a game-theoretic framework, where two neural networks, the generator and the discriminator, are pitted against each other in a kind of "adversarial" competition. Here's how GANs work:
- Generator: The generator network's primary task is to create data samples that resemble the target data. For example, it might generate images that look like real photographs. It takes random noise as input and transforms it into a data sample; early in training, these samples are of poor quality.
- Discriminator: The discriminator network is a binary classifier. Its job is to evaluate data samples and distinguish between real data (e.g., actual photos) and fake data produced by the generator. It assigns a probability that a given data sample is real.
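For readers who like to see the math, the original 2014 GAN paper writes this game as a single minimax objective. Here G is the generator, D the discriminator, p_data the distribution of real data, p_z the random-noise distribution the generator draws its inputs from, and D(x) the probability the discriminator assigns to x being real:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to make V as large as possible by pushing D(x) toward 1 on real samples and D(G(z)) toward 0 on generated ones. The generator tries to make V as small as possible by pushing D(G(z)) toward 1.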
The training process in generative adversarial networks involves a competitive interplay between the generator and discriminator:
- The generator attempts to create data that is so realistic that the discriminator cannot distinguish it from genuine data.
- The discriminator, in turn, tries to improve its ability to tell the real data from the generated data.
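The two goals in the list above are usually expressed as loss functions that each network tries to minimize. A common choice is binary cross-entropy, sketched here in plain Python; the function names are hypothetical, and each function takes the discriminator's output probabilities rather than real networks. The generator loss uses the "non-saturating" form, the mean of -log D(G(z)), which the original GAN paper recommends over literally minimizing log(1 - D(G(z))).

```python
import math


def bce(p, label):
    """Binary cross-entropy of one predicted probability p against a 0/1 label."""
    eps = 1e-12                      # keep p away from 0 and 1 so log() is defined
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """D's outputs on a batch of real samples and a batch of generated samples.

    Low when D says "real" (close to 1) for real data and "fake" (close to 0)
    for generated data.
    """
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: the mean of -log D(G(z)).

    Low when the discriminator is fooled into calling generated samples real.
    """
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)


# A discriminator that easily spots fakes: low loss for D, high loss for G.
print(discriminator_loss([0.9, 0.95], [0.1, 0.05]), generator_loss([0.1, 0.05]))
# A discriminator reduced to guessing 0.5 everywhere.
print(discriminator_loss([0.5], [0.5]), generator_loss([0.5]))
```

When the discriminator can only guess 0.5, its loss is 2·ln 2 (about 1.39) and the generator's is ln 2 (about 0.69). Each network's improvement shows up as a lower value of its own loss and a higher value of its opponent's.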
Are you team generator or team discriminator?
This competition continues over many training iterations, usually alternating between an update to the discriminator and an update to the generator. Over time, the generator's samples become more convincing, and the discriminator has to keep improving its ability to tell real data from generated data.
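To make the alternating updates concrete, here is a deliberately tiny, hypothetical toy in plain Python, not how real GANs are built. The assumptions: the "real data" are numbers drawn from a normal distribution centred at 4, the generator is G(z) = z + b with a single learnable shift b, and the discriminator is a logistic function D(x) = sigmoid(w·x + c). Each iteration takes one hand-derived gradient step for the discriminator (cross-entropy loss) and then one for the generator (the non-saturating loss -log D(G(z))).

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function: maps any real number into [0, 1]."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, lr=0.01, real_mean=4.0, seed=0):
    """Hypothetical toy GAN on 1-D data; returns (b, w, c) after every step.

    Real data:      x ~ Normal(real_mean, 1)
    Generator:      G(z) = z + b, with z ~ Normal(0, 1)
    Discriminator:  D(x) = sigmoid(w * x + c), the probability that x is real
    """
    rng = random.Random(seed)
    b = 0.0          # generator's only parameter (starts far from real_mean)
    w, c = 0.0, 0.0  # discriminator parameters (D starts out outputting 0.5)
    history = []
    for _ in range(steps):
        # 1) Discriminator step: lower -log D(x_real) - log(1 - D(x_fake)).
        x_real = rng.gauss(real_mean, 1.0)
        x_fake = rng.gauss(0.0, 1.0) + b
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        grad_w = -(1.0 - d_real) * x_real + d_fake * x_fake
        grad_c = -(1.0 - d_real) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # 2) Generator step (w and c held fixed): lower -log D(G(z)),
        #    i.e. make a fresh fake sample look more "real" to D.
        z = rng.gauss(0.0, 1.0)
        d_fake = sigmoid(w * (z + b) + c)
        grad_b = -(1.0 - d_fake) * w
        b -= lr * grad_b

        history.append((b, w, c))
    return history


history = train_toy_gan()
b, w, c = history[-1]
print(f"generator shift b = {b:.2f} (real data mean is 4.0)")
print(f"D at the real mean: {sigmoid(w * 4.0 + c):.2f}")
```

Because b, w and c are updated in turn, you can plot `history` to watch the two players react to each other. In a model this small the parameters may keep circling around the point where b equals the real mean instead of settling there, so treat the loop as an illustration of the update order, not a recipe. Real GANs use deep networks, mini-batches and automatic differentiation instead of hand-derived gradients.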
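In theory, training continues until an equilibrium is reached: the generator creates data that is virtually indistinguishable from real data, and the discriminator can do no better than guessing, assigning a probability of about 0.5 to every sample. In practice, this equilibrium is rarely reached exactly, and training is usually stopped once the generated samples are good enough.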
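What does equilibrium mean for the discriminator? A standard result from the original GAN paper: if the generator is held fixed, the best possible discriminator outputs D*(x) = p_data(x) / (p_data(x) + p_g(x)), where p_data and p_g are the densities of real and generated data. When the generator's distribution matches the real one, this is exactly 0.5 everywhere, a coin flip. The sketch below evaluates that formula assuming, for illustration only, that both distributions are one-dimensional normals with the same spread; the helper names are hypothetical.

```python
import math


def normal_pdf(x, mean, std=1.0):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def optimal_discriminator(x, data_mean, gen_mean, std=1.0):
    """Best possible D(x) for a fixed generator: p_data / (p_data + p_g).

    Assumes (for illustration only) that real and generated data are both
    1-D normal distributions with the same standard deviation.
    """
    p_data = normal_pdf(x, data_mean, std)
    p_gen = normal_pdf(x, gen_mean, std)
    return p_data / (p_data + p_gen)


# Real data centred at 0; move the generator's distribution closer and closer.
for gen_mean in [3.0, 1.0, 0.1, 0.0]:
    print(gen_mean, round(optimal_discriminator(0.0, data_mean=0.0, gen_mean=gen_mean), 3))
```

As gen_mean approaches data_mean, the printed value at x = 0 falls toward 0.5; once the two distributions coincide, D*(x) is 0.5 for every x, and no discriminator can do better than guessing.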
Applications of generative adversarial networks are widespread and include:
- Image Generation: GANs are famous for creating photorealistic images, such as faces of people who don't exist, high-quality artwork, or even convincing deepfake videos.
- Data Augmentation: GANs can be used to generate additional data for training machine learning models, especially when the available data is limited.
- Style Transfer: GANs can be employed to transform the style of images, converting a photograph into the style of a famous painting, for example.
- Super Resolution: GANs can be used to increase the resolution and quality of images, which is valuable in applications like medical imaging and enhancing video quality.
- Text-to-Image Synthesis: GANs can generate images from textual descriptions, such as producing a picture of a scene described in a sentence.
- Image-to-Image Translation: GANs can convert images from one domain to another, like turning satellite images into maps or black-and-white photos into color.
GANs have made significant contributions to the field of AI and have opened up exciting possibilities in image generation, data synthesis and more. However, they also pose ethical challenges, particularly in the context of deepfakes and manipulated content, where the generated data can be used for misleading or harmful purposes. Let’s hope for everyone’s sake that the discriminators can stay one step ahead of the generators.