{% extends "layout.html" %}

{% block content %}

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Study Guide: Generative Models</title>

<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>

<style>
body {
    background-color: #ffffff;
    color: #000000;
    font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, sans-serif;
    font-weight: normal;
    line-height: 1.8;
    margin: 0;
    padding: 20px;
}

.container {
    max-width: 800px;
    margin: 0 auto;
    padding: 20px;
}

h1, h2, h3 {
    color: #000000;
    border: none;
    font-weight: bold;
}

h1 {
    text-align: center;
    border-bottom: 3px solid #000;
    padding-bottom: 10px;
    margin-bottom: 30px;
    font-size: 2.5em;
}

h2 {
    font-size: 1.8em;
    margin-top: 40px;
    border-bottom: 1px solid #ddd;
    padding-bottom: 8px;
}

h3 {
    font-size: 1.3em;
    margin-top: 25px;
}

strong {
    font-weight: 900;
}

p, li {
    font-size: 1.1em;
    border-bottom: 1px solid #e0e0e0;
    padding-bottom: 10px;
    margin-bottom: 10px;
}

li:last-child {
    border-bottom: none;
}

ol {
    list-style-type: decimal;
    padding-left: 20px;
}

ol li {
    padding-left: 10px;
}

ul {
    list-style-type: none;
    padding-left: 0;
}

ul li::before {
    content: "•";
    color: #000;
    font-weight: bold;
    display: inline-block;
    width: 1em;
    margin-left: 0;
}

pre {
    background-color: #f4f4f4;
    border: 1px solid #ddd;
    border-radius: 5px;
    padding: 15px;
    white-space: pre-wrap;
    word-wrap: break-word;
    font-family: "Courier New", Courier, monospace;
    font-size: 0.95em;
    font-weight: normal;
    color: #333;
    border-bottom: none;
}

.story-gen {
    background-color: #f5f3ff;
    border-left: 4px solid #4a00e0;
    margin: 15px 0;
    padding: 10px 15px;
    font-style: italic;
    color: #555;
    font-weight: normal;
    border-bottom: none;
}

.story-gen p, .story-gen li {
    border-bottom: none;
}

.example-gen {
    background-color: #f8f7ff;
    padding: 15px;
    margin: 15px 0;
    border-radius: 5px;
    border-left: 4px solid #8e2de2;
}

.example-gen p, .example-gen li {
    border-bottom: none !important;
}

.quiz-section {
    background-color: #fafafa;
    border: 1px solid #ddd;
    border-radius: 5px;
    padding: 20px;
    margin-top: 30px;
}

.quiz-answers {
    background-color: #f8f7ff;
    padding: 15px;
    margin-top: 15px;
    border-radius: 5px;
}

table {
    width: 100%;
    border-collapse: collapse;
    margin: 25px 0;
}

th, td {
    border: 1px solid #ddd;
    padding: 12px;
    text-align: left;
}

th {
    background-color: #f2f2f2;
    font-weight: bold;
}

@media (max-width: 768px) {
    body, .container {
        padding: 10px;
    }
    h1 { font-size: 2em; }
    h2 { font-size: 1.5em; }
    h3 { font-size: 1.2em; }
    p, li { font-size: 1em; }
    pre { font-size: 0.85em; }
    table, th, td { font-size: 0.9em; }
}
</style>
</head>
<body>

<div class="container">
<h1>🌌 Study Guide: Generative Models</h1>

<h2>🔹 Core Concepts</h2>
<div class="story-gen">
<p><strong>Story-style intuition: The Artist vs. The Art Critic</strong></p>
<p>Imagine two types of AI that both study thousands of cat photos.
<br>• The <strong>Discriminative Model</strong> is like an <strong>art critic</strong>. Its only job is to learn the difference between a cat photo and a dog photo. If you show it a new picture, it can tell you, "That's a cat," but it can't create a cat picture of its own. It learns a decision boundary.
<br>• The <strong>Generative Model</strong> is like an <strong>artist</strong>. It studies the cat photos so deeply that it understands the "essence" of what makes a cat a cat—the patterns, the textures, the shapes. It learns the underlying distribution of "cat-ness." Because it has this deep understanding, it can then be asked to create a brand new, never-before-seen picture of a cat from scratch.</p>
</div>
<p><strong>Generative Models</strong> are a class of statistical models that learn the underlying probability distribution of a dataset. Their primary goal is to understand the data so well that they can "generate" new data samples that are similar to the ones they were trained on.</p>

<h2>🔹 Types of Generative Models</h2>
<p>Generative models come in several powerful flavors, each with a different approach to learning and creating.</p>
<ul>
<li>
<strong>Probabilistic Models:</strong> These models explicitly learn a probability distribution P(X). Examples include Naïve Bayes and Gaussian Mixture Models (GMMs). They are often easy to interpret but less powerful for complex data like images (see the short sampling sketch after this list).
</li>
<li>
<strong>Variational Autoencoders (VAEs):</strong>
<div class="story-gen"><p><strong>Analogy: The Master Forger.</strong> A VAE is like a forger who learns to create masterpieces. It first "compresses" a real painting into a secret recipe (a condensed set of characteristics called the latent space). It then learns to "decompress" that recipe back into a painting. By learning this process, it can later create new recipes and generate new, unique paintings.</p></div>
</li>
<li>
<strong>Generative Adversarial Networks (GANs):</strong>
<div class="story-gen"><p><strong>Analogy: The Artist and Critic Game.</strong> A GAN consists of two competing neural networks: a <strong>Generator</strong> (the artist) that tries to create realistic images, and a <strong>Discriminator</strong> (the critic) that tries to tell the difference between real images and the artist's fakes. They train together in a game where the artist gets better at fooling the critic, and the critic gets better at catching fakes. This competition pushes the artist to create incredibly realistic images.</p></div>
</li>
<li>
<strong>Diffusion Models:</strong>
<div class="story-gen"><p><strong>Analogy: The Sculptor.</strong> A Diffusion Model is like a sculptor who starts with a random block of marble (pure noise) and slowly chisels away the noise, step by step, until a clear statue (a realistic image) emerges. It learns this "denoising" process by first practicing in reverse: taking a perfect statue and systematically adding noise to it until it becomes a random block.</p></div>
</li>
</ul>
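<div class="example-gen">
<p>As a concrete, minimal illustration of the "probabilistic" flavor above, the sketch below fits a Gaussian Mixture Model with scikit-learn and then samples new points from the learned distribution P(X). The two-cluster dataset is synthetic and used purely for illustration.</p>
<pre><code>
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in "real" data: two Gaussian clusters in 2-D
real_data = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(500, 2)),
    rng.normal(loc=[3.0, 3.0], scale=0.5, size=(500, 2)),
])

# Fit the generative model: it learns means, covariances, and mixing weights
gmm = GaussianMixture(n_components=2, random_state=0).fit(real_data)

# "Generate" ten brand-new samples by drawing from the learned distribution
new_samples, _ = gmm.sample(10)
print(new_samples)
</code></pre>
<p>The same two-step pattern (fit a distribution, then sample from it) is what VAEs, GANs, and diffusion models scale up to images and text.</p>
</div>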

<h2>🔹 Mathematical Foundations</h2>
<ul>
<li>
<strong>Joint Probability P(X, Y):</strong> Generative models often learn the joint probability of features X and labels Y. This allows them to generate new pairs of (X, Y).
</li>
<li>
<strong>Maximum Likelihood Estimation (MLE):</strong> This is the principle most generative models use for training. They adjust their parameters to maximize the probability (likelihood) that the observed training data was generated by the model (the standard forms of these objectives are written out after this list).
</li>
<li>
<strong>ELBO (for VAEs):</strong> VAEs optimize a lower bound on the data likelihood called the Evidence Lower Bound. It's a clever way to make an otherwise intractable optimization problem solvable.
</li>
<li>
<strong>Adversarial Loss (for GANs):</strong> This is the "minimax" game objective, where the Generator tries to minimize the loss while the Discriminator tries to maximize it.
</li>
</ul>
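<div class="example-gen">
<p>For reference, here are the standard textbook forms of these objectives (notation: \(x\) is a data point, \(z\) a latent vector, \(\theta\) and \(\phi\) model parameters). These are the usual formulations, not something specific to any single library or paper in this guide.</p>
<p><strong>Maximum Likelihood Estimation:</strong></p>
\[ \hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{N} \log p_{\theta}(x_i) \]
<p><strong>Evidence Lower Bound (VAEs):</strong></p>
\[ \log p_{\theta}(x) \;\ge\; \mathbb{E}_{q_{\phi}(z\mid x)}\big[\log p_{\theta}(x\mid z)\big] \;-\; \mathrm{KL}\big(q_{\phi}(z\mid x)\,\|\,p(z)\big) \]
<p><strong>Adversarial (minimax) objective (GANs):</strong></p>
\[ \min_{G}\max_{D}\; \mathbb{E}_{x\sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z\sim p(z)}\big[\log\big(1 - D(G(z))\big)\big] \]
</div>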

<h2>🔹 Workflow of Generative Models</h2>
<ol>
<li><strong>Collect Data:</strong> Gather a large, high-quality dataset of the thing you want to generate (e.g., thousands of celebrity faces).</li>
<li><strong>Choose a Model:</strong> Select the right type of generative model for your task (e.g., a GAN or Diffusion Model for realistic images).</li>
<li><strong>Train the Model:</strong> This is the most computationally expensive step, where the model learns the underlying patterns and distribution of the training data.</li>
<li><strong>Generate New Samples:</strong> After training, you can use the model to generate new, synthetic data by sampling from its learned distribution.</li>
<li><strong>Evaluate Quality:</strong> Assess the quality of the generated samples using both quantitative metrics (like FID) and human evaluation.</li>
</ol>

<h2>🔹 Applications</h2>
<ul>
<li><strong>Image Generation and Editing:</strong> Creating photorealistic faces, art, or modifying existing images (e.g., DALL-E, Midjourney, Stable Diffusion).</li>
<li><strong>Text Generation:</strong> Powering chatbots, writing articles, and generating code (e.g., GPT-4).</li>
<li><strong>Data Augmentation:</strong> Creating more training data for other machine learning models, which is especially useful for rare events or imbalanced datasets.</li>
<li><strong>Drug Discovery and Design:</strong> Generating new molecular structures with desired properties to accelerate scientific research.</li>
<li><strong>Music and Art Creation:</strong> Composing new melodies or creating novel artistic styles.</li>
</ul>

<h2>🔹 Advantages &amp; Disadvantages</h2>
<h3>Advantages:</h3>
<ul>
<li>✅ <strong>Creative and Powerful:</strong> Can generate novel, high-quality data that has never been seen before.</li>
<li>✅ <strong>Unsupervised Learning:</strong> Can learn from vast amounts of unlabeled data.</li>
<li>✅ <strong>Data Augmentation:</strong> Eases the problem of limited training data by creating realistic synthetic samples.</li>
</ul>
<h3>Disadvantages:</h3>
<ul>
<li>❌ <strong>Computationally Expensive:</strong> Training large generative models requires significant GPU resources and time.</li>
<li>❌ <strong>Training Instability:</strong> GANs, in particular, can be notoriously difficult to train, suffering from problems like mode collapse.</li>
<li>❌ <strong>Difficult to Evaluate:</strong> How do you objectively measure "creativity" or "realism"? Evaluating the quality of generated content is often subjective.</li>
</ul>

<h2>🔹 Key Evaluation Metrics</h2>
<ul>
<li><strong>Inception Score (IS):</strong> Measures how diverse and clear the generated images are. A higher score is better.</li>
<li><strong>Fréchet Inception Distance (FID):</strong> Compares the statistical distribution of generated images to real images. It's considered a more reliable metric than IS. A lower score is better (see the sketch after this list).</li>
<li><strong>Perplexity (for text):</strong> Measures how well a language model predicts a sample of text. A lower perplexity indicates the model is less "surprised" by the text, meaning it's a better fit.</li>
</ul>
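<div class="example-gen">
<p>A minimal sketch of how FID is computed, assuming you already have feature vectors for real and generated images (in practice these come from a pretrained Inception-v3 network, and libraries such as <code>torchmetrics</code> handle the full pipeline). The random arrays below are placeholders standing in for those features.</p>
<pre><code>
import numpy as np
from scipy import linalg

def frechet_distance(real_feats, gen_feats):
    # Each input: array of shape (n_samples, feature_dim)
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_g = np.cov(gen_feats, rowvar=False)

    # FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2))
    covmean = linalg.sqrtm(sigma_r @ sigma_g)
    covmean = covmean.real  # discard tiny imaginary parts from numerical error
    diff = mu_r - mu_g
    return diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean)

# Placeholder features standing in for Inception activations
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(1000, 64))
gen_feats = rng.normal(loc=0.1, size=(1000, 64))

print(f"FID (toy example): {frechet_distance(real_feats, gen_feats):.2f}")
</code></pre>
</div>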

<h2>🔹 Python Implementation (Conceptual Sketches)</h2>
<div class="story-gen">
<p>Training large generative models from scratch is a major undertaking. Here are conceptual sketches of what the code looks like using popular frameworks.</p>
</div>
<div class="example-gen">
<h3>Simple GAN Generator in PyTorch</h3>
<pre><code>
import torch.nn as nn
import numpy as np

class Generator(nn.Module):
    def __init__(self, latent_dim, img_shape):
        super(Generator, self).__init__()
        self.img_shape = img_shape
        self.model = nn.Sequential(
            # Takes a random noise vector (latent_dim) and upsamples it
            nn.Linear(latent_dim, 128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(128, 256),
            nn.BatchNorm1d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(256, 512),
            nn.BatchNorm1d(512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, int(np.prod(self.img_shape))),
            nn.Tanh()  # Scales output to be between -1 and 1
        )

    def forward(self, z):
        img = self.model(z)
        img = img.view(img.size(0), *self.img_shape)
        return img
</code></pre>
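<h3>Minimal GAN Training Loop in PyTorch (Sketch)</h3>
<p>To connect the Generator above to the adversarial "minimax" objective from the Mathematical Foundations section, here is a deliberately minimal training-loop sketch. The small MLP Discriminator, the hyperparameters, and the random placeholder batches are illustrative choices, not a recipe for good results; real training would use an image DataLoader with inputs scaled to [-1, 1].</p>
<pre><code>
import torch
import torch.nn as nn
import numpy as np

latent_dim, img_shape = 100, (1, 28, 28)
generator = Generator(latent_dim, img_shape)  # class defined above

# The "critic": flattens an image and outputs the probability that it is real
discriminator = nn.Sequential(
    nn.Flatten(),
    nn.Linear(int(np.prod(img_shape)), 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),
)

adversarial_loss = nn.BCELoss()
opt_G = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))

# Placeholder "dataset": random tensors standing in for real image batches
dataloader = [torch.randn(64, *img_shape) for _ in range(10)]

for epoch in range(2):
    for real_imgs in dataloader:
        batch_size = real_imgs.size(0)
        valid = torch.ones(batch_size, 1)   # label: real
        fake = torch.zeros(batch_size, 1)   # label: fake

        # --- Train the Generator: try to make the critic say "real" ---
        opt_G.zero_grad()
        z = torch.randn(batch_size, latent_dim)
        gen_imgs = generator(z)
        g_loss = adversarial_loss(discriminator(gen_imgs), valid)
        g_loss.backward()
        opt_G.step()

        # --- Train the Discriminator: separate real from fake ---
        opt_D.zero_grad()
        real_loss = adversarial_loss(discriminator(real_imgs), valid)
        fake_loss = adversarial_loss(discriminator(gen_imgs.detach()), fake)
        d_loss = 0.5 * (real_loss + fake_loss)
        d_loss.backward()
        opt_D.step()

    print(f"epoch {epoch}: g_loss={g_loss.item():.3f}, d_loss={d_loss.item():.3f}")
</code></pre>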

<h3>Simple GAN Generator in TensorFlow/Keras</h3>
<pre><code>
import tensorflow as tf
from tensorflow.keras import layers
import numpy as np

def build_generator(latent_dim, img_shape):
    model = tf.keras.Sequential()

    model.add(layers.Dense(256, input_dim=latent_dim))
    model.add(layers.LeakyReLU(alpha=0.2))
    model.add(layers.BatchNormalization(momentum=0.8))

    model.add(layers.Dense(512))
    model.add(layers.LeakyReLU(alpha=0.2))
    model.add(layers.BatchNormalization(momentum=0.8))

    model.add(layers.Dense(1024))
    model.add(layers.LeakyReLU(alpha=0.2))
    model.add(layers.BatchNormalization(momentum=0.8))

    model.add(layers.Dense(np.prod(img_shape), activation='tanh'))
    model.add(layers.Reshape(img_shape))

    return model
</code></pre>

<h3>Using a Pre-trained Model from Hugging Face</h3>
<pre><code>
# Easiest way to get started with powerful generative models!
from transformers import pipeline

# Initialize a text generation pipeline with a pre-trained model
generator = pipeline('text-generation', model='gpt2')

# Generate text
prompt = "In a world where AI could dream,"
generated_text = generator(prompt, max_length=50, num_return_sequences=1)

print(generated_text[0]['generated_text'])
</code></pre>
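<h3>Sampling from a Pre-trained Diffusion Model (Sketch)</h3>
<p>For the "sculptor" style of model, the Hugging Face <code>diffusers</code> library offers a similarly high-level interface. This is a rough sketch: the model identifier shown is only an example and may change over time, and running it realistically requires a GPU.</p>
<pre><code>
import torch
from diffusers import StableDiffusionPipeline

# Load a pre-trained text-to-image diffusion pipeline
# (model id shown as an example; substitute any compatible checkpoint)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Start from pure noise and iteratively denoise it into an image
image = pipe("a photorealistic portrait of a cat, studio lighting").images[0]
image.save("generated_cat.png")
</code></pre>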
</div>

<div class="quiz-section">
<h2>📝 Quick Quiz: Test Your Knowledge</h2>
<ol>
<li><strong>What is the key difference between a generative model and a discriminative model?</strong></li>
<li><strong>In a GAN, what are the roles of the Generator and the Discriminator?</strong></li>
<li><strong>What is the core idea behind Diffusion Models?</strong></li>
<li><strong>You have trained a GAN to generate images of cats. You calculate the FID score and get a value of 5. Your colleague trains another model and gets an FID score of 45. Which model is better, and why?</strong></li>
</ol>
<div class="quiz-answers">
<h3>Answers</h3>
<p><strong>1.</strong> A generative model (the artist) learns the underlying distribution of the data, P(X), and can create new samples. A discriminative model (the critic) learns the decision boundary between classes, P(Y|X), and can only classify existing data.</p>
<p><strong>2.</strong> The <strong>Generator</strong> tries to create fake data that looks real. The <strong>Discriminator</strong> tries to distinguish between real data and the Generator's fake data.</p>
<p><strong>3.</strong> The core idea is to learn to reverse a process of gradually adding noise to an image. By mastering this "denoising" process, the model can start with pure noise and denoise it step by step into a coherent new image.</p>
<p><strong>4.</strong> Your model with an FID score of 5 is much better. For Fréchet Inception Distance (FID), a <strong>lower score</strong> is better, as it indicates that the statistical distribution of your generated images is closer to the distribution of the real images.</p>
</div>
</div>

<h2>🔹 Key Terminology Explained</h2>
<div class="story-gen">
<p><strong>The Story: Decoding the AI Artist's Toolkit</strong></p>
</div>
<ul>
<li>
<strong>Latent Space:</strong>
<br>
<strong>What it is:</strong> A lower-dimensional, compressed representation of the data. It's where the model captures the essential features or "essence" of the data (see the interpolation sketch after this list).
<br>
<strong>Story Example:</strong> Imagine a "face space." In this latent space, one axis might represent "age," another "smile intensity," and another "hair color." By picking a point in this space, the model can generate a face with those specific attributes.
</li>
<li>
<strong>Minimax Game:</strong>
<br>
<strong>What it is:</strong> A concept from game theory used to describe the GAN training process. It's a two-player game where one player's gain is the other player's loss.
<br>
<strong>Story Example:</strong> The Generator wants to <strong>mini</strong>mize the probability that the Discriminator catches its fakes. The Discriminator wants to <strong>max</strong>imize its ability to correctly identify fakes. This push-and-pull is the <strong>minimax</strong> game that forces both to improve.
</li>
<li>
<strong>Mode Collapse (in GANs):</strong>
<br>
<strong>What it is:</strong> A common failure case in GAN training where the Generator finds a single "safe" output that can fool the Discriminator and only produces that one output, instead of a diverse range of samples.
<br>
<strong>Story Example:</strong> The artist discovers that drawing one specific, very realistic-looking cat is enough to always fool the critic. So it stops learning and only ever produces that single cat image. It has "collapsed" to a single mode.
</li>
</ul>
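<div class="example-gen">
<p>To make "latent space" concrete, the sketch below interpolates between two random latent vectors and decodes each intermediate point with the (here untrained, purely illustrative) Generator class from the PyTorch example above. With a trained model, this kind of walk produces a smooth morph between two generated images.</p>
<pre><code>
import torch

latent_dim, img_shape = 100, (1, 28, 28)
G = Generator(latent_dim, img_shape).eval()  # assumes the class defined earlier

# Two random points in latent space
z1, z2 = torch.randn(1, latent_dim), torch.randn(1, latent_dim)

with torch.no_grad():
    for step in range(8):
        alpha = step / 7.0
        z = (1 - alpha) * z1 + alpha * z2  # move along a straight line in latent space
        img = G(z)                         # each point decodes to one image
        print(f"alpha={alpha:.2f} -> generated image tensor of shape {tuple(img.shape)}")
</code></pre>
</div>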

</div>

</body>
</html>

{% endblock %}