{% extends "layout.html" %} {% block content %}
Story-style intuition: The Artist vs. The Art Critic
Imagine two types of AI that both study thousands of cat photos.
• The Discriminative Model is like an art critic. Its only job is to learn the difference between a cat photo and a dog photo. If you show it a new picture, it can tell you, "That's a cat," but it can't create a cat picture of its own. It learns a decision boundary.
• The Generative Model is like an artist. It studies the cat photos so deeply that it understands the "essence" of what makes a cat a cat: the patterns, the textures, the shapes. It learns the underlying distribution of "cat-ness." Because it has this deep understanding, it can then be asked to create a brand-new, never-before-seen picture of a cat from scratch.
Generative Models are a class of statistical models that learn the underlying probability distribution of a dataset. Their primary goal is to understand the data so well that they can "generate" new data samples that are similar to the ones they were trained on.
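To make the distinction concrete, here is a minimal sketch using scikit-learn (an assumed dependency; any generative/discriminative pair would illustrate the same point). GaussianNB fits a Gaussian distribution per class, so new points can be sampled from it; LogisticRegression only learns the boundary.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_blobs(n_samples=200, centers=2, random_state=0)

critic = LogisticRegression().fit(X, y)  # discriminative: learns P(Y|X)
artist = GaussianNB().fit(X, y)          # generative: fits a Gaussian per class

print(critic.predict(X[:1]))  # the critic can label a picture...
# ...but only the artist stores a distribution it can sample from
# (theta_ holds the per-class means, var_ the per-class variances):
new_point = np.random.normal(artist.theta_[0], np.sqrt(artist.var_[0]))
print(new_point)              # a never-before-seen "class 0" point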
Generative models come in several powerful flavors, each with a different approach to learning and creating.
Analogy: The Master Forger. A VAE is like a forger who learns to create masterpieces. It first "compresses" a real painting into a secret recipe (a condensed set of characteristics called the latent space). It then learns to "decompress" that recipe back into a painting. By learning this process, it can later create new recipes and generate new, unique paintings.
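As a hedged sketch of that compress/decompress loop (assuming flattened 784-pixel images and a 16-number "recipe"; both sizes are illustrative, not prescriptive):

import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, img_dim=784, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)      # the recipe's center
        self.to_logvar = nn.Linear(256, latent_dim)  # the recipe's spread
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample a recipe while keeping gradients
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

# After training, brand-new paintings come from brand-new recipes:
vae = TinyVAE()
new_image = vae.decoder(torch.randn(1, 16))  # untrained here, so just noise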
Analogy: The Artist and Critic Game. A GAN consists of two competing neural networks: a Generator (the artist) that tries to create realistic images, and a Discriminator (the critic) that tries to tell the difference between real images and the artist's fakes. They train together in a game where the artist gets better at fooling the critic, and the critic gets better at catching fakes. This competition pushes the artist to create incredibly realistic images.
Analogy: The Sculptor. A Diffusion Model is like a sculptor who starts with a random block of marble (pure noise) and slowly chisels away the noise, step by step, until a clear statue (a realistic image) emerges. It learns this "denoising" process by first practicing in reverse: taking a perfect statue and systematically adding noise to it until it becomes a random block.
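Here is a minimal sketch of that "practice in reverse" step, under a common assumed schedule (1,000 linearly spaced noise levels); a closed form lets us jump from a clean image straight to any noise level t:

import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # noise added at each step
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # signal surviving to step t

def add_noise(x0, t):
    # Closed form: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise
    noise = torch.randn_like(x0)
    a_bar = alphas_bar[t]
    return torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * noise, noise

# A training pair: the model sees x_t and learns to predict `noise`;
# at generation time it runs the learned denoising loop from pure noise.
x0 = torch.randn(1, 3, 32, 32)   # stand-in for a batch of real images
x_t, target_noise = add_noise(x0, t=500)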
Training large generative models from scratch is a major undertaking. Here are conceptual sketches of what the code looks like using popular frameworks.
import torch.nn as nn
import numpy as np

class Generator(nn.Module):
    def __init__(self, latent_dim, img_shape):
        super().__init__()
        self.img_shape = img_shape
        self.model = nn.Sequential(
            # Takes a random noise vector (latent_dim) and upsamples it
            nn.Linear(latent_dim, 128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(128, 256),
            nn.BatchNorm1d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(256, 512),
            nn.BatchNorm1d(512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, int(np.prod(self.img_shape))),
            nn.Tanh()  # Scales output to be between -1 and 1
        )

    def forward(self, z):
        img = self.model(z)
        img = img.view(img.size(0), *self.img_shape)  # flat vector -> image
        return img
import tensorflow as tf
from tensorflow.keras import layers
import numpy as np

def build_generator(latent_dim, img_shape):
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(latent_dim,)))  # random noise vector in
    model.add(layers.Dense(256))
    model.add(layers.LeakyReLU(alpha=0.2))
    model.add(layers.BatchNormalization(momentum=0.8))
    model.add(layers.Dense(512))
    model.add(layers.LeakyReLU(alpha=0.2))
    model.add(layers.BatchNormalization(momentum=0.8))
    model.add(layers.Dense(1024))
    model.add(layers.LeakyReLU(alpha=0.2))
    model.add(layers.BatchNormalization(momentum=0.8))
    model.add(layers.Dense(np.prod(img_shape), activation='tanh'))  # scale to [-1, 1]
    model.add(layers.Reshape(img_shape))
    return model
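Both sketches above define only the Generator, the artist half of the game. For completeness, here is a hedged PyTorch sketch of the other player, the Discriminator, plus one adversarial round; the layer sizes are illustrative, and `G`, `real_imgs`, and the two optimizers are assumed to exist.

import numpy as np
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, img_shape):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(int(np.prod(img_shape)), 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(256, 1),
            nn.Sigmoid(),  # probability that the input image is real
        )

    def forward(self, img):
        # Flatten the image before scoring it
        return self.model(img.view(img.size(0), -1))

# One adversarial round: the critic learns real=1 / fake=0, then the
# artist is rewarded when the critic calls its fakes "real".
bce = nn.BCELoss()

def train_step(G, D, real_imgs, opt_g, opt_d, latent_dim=100):
    b = real_imgs.size(0)
    real, fake = torch.ones(b, 1), torch.zeros(b, 1)

    z = torch.randn(b, latent_dim)
    d_loss = bce(D(real_imgs), real) + bce(D(G(z).detach()), fake)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    g_loss = bce(D(G(z)), real)  # generator wants the critic fooled
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()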
# Easiest way to get started with powerful generative models!
from transformers import pipeline
# Initialize a text generation pipeline with a pre-trained model
generator = pipeline('text-generation', model='gpt2')
# Generate text
prompt = "In a world where AI could dream,"
generated_text = generator(prompt, max_length=50, num_return_sequences=1)
print(generated_text[0]['generated_text'])
1. A generative model (the artist) learns the underlying distribution of the data, P(X), and can create new samples. A discriminative model (the critic) learns the decision boundary between classes, P(Y|X), and can only classify existing data.
2. The Generator tries to create fake data that looks real. The Discriminator tries to distinguish between real data and the Generator's fake data.
3. The core idea is to learn to reverse a process of gradually adding noise to an image. By mastering this "denoising" process, the model can start with pure noise and denoise it step-by-step into a coherent new image.
4. Your model with an FID score of 5 is much better. For Fréchet Inception Distance (FID), a lower score is better, as it indicates that the statistical distribution of your generated images is closer to the distribution of the real images; the formula below makes this precise.
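For reference, FID compares the means and covariances of Inception-network features computed on the real and generated image sets; lower means the two feature distributions overlap more:

\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)

where (\mu_r, \Sigma_r) and (\mu_g, \Sigma_g) are the feature mean and covariance of the real and generated images, respectively.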