{% extends "layout.html" %}
{% block content %}
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Study Guide: Generative Models</title>
<!-- MathJax for rendering mathematical formulas -->
<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
<style>
/* General Body Styles */
body {
background-color: #ffffff; /* White background */
color: #000000; /* Black text */
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, sans-serif;
font-weight: normal;
line-height: 1.8;
margin: 0;
padding: 20px;
}
/* Container for centering content */
.container {
max-width: 800px;
margin: 0 auto;
padding: 20px;
}
/* Headings */
h1, h2, h3 {
color: #000000;
border: none;
font-weight: bold;
}
h1 {
text-align: center;
border-bottom: 3px solid #000;
padding-bottom: 10px;
margin-bottom: 30px;
font-size: 2.5em;
}
h2 {
font-size: 1.8em;
margin-top: 40px;
border-bottom: 1px solid #ddd;
padding-bottom: 8px;
}
h3 {
font-size: 1.3em;
margin-top: 25px;
}
/* Main words are even bolder */
strong {
font-weight: 900;
}
/* Paragraphs and List Items with a line below */
p, li {
font-size: 1.1em;
border-bottom: 1px solid #e0e0e0; /* Light gray line below each item */
padding-bottom: 10px; /* Space between text and the line */
margin-bottom: 10px; /* Space below the line */
}
/* Remove bottom border from the last item in a list for cleaner look */
li:last-child {
border-bottom: none;
}
/* Ordered lists */
ol {
list-style-type: decimal;
padding-left: 20px;
}
ol li {
padding-left: 10px;
}
/* Unordered Lists */
ul {
list-style-type: none;
padding-left: 0;
}
ul li::before {
content: "•";
color: #000;
font-weight: bold;
display: inline-block;
width: 1em;
margin-left: 0;
}
/* Code block styling */
pre {
background-color: #f4f4f4;
border: 1px solid #ddd;
border-radius: 5px;
padding: 15px;
white-space: pre-wrap;
word-wrap: break-word;
font-family: "Courier New", Courier, monospace;
font-size: 0.95em;
font-weight: normal;
color: #333;
border-bottom: none;
}
/* Generative Models Specific Styling */
.story-gen {
background-color: #f5f3ff;
border-left: 4px solid #4a00e0; /* Deep purple accent */
margin: 15px 0;
padding: 10px 15px;
font-style: italic;
color: #555;
font-weight: normal;
border-bottom: none;
}
.story-gen p, .story-gen li {
border-bottom: none;
}
.example-gen {
background-color: #f8f7ff;
padding: 15px;
margin: 15px 0;
border-radius: 5px;
border-left: 4px solid #8e2de2; /* Lighter purple accent */
}
.example-gen p, .example-gen li {
border-bottom: none !important;
}
/* Quiz Styling */
.quiz-section {
background-color: #fafafa;
border: 1px solid #ddd;
border-radius: 5px;
padding: 20px;
margin-top: 30px;
}
.quiz-answers {
background-color: #f8f7ff;
padding: 15px;
margin-top: 15px;
border-radius: 5px;
}
/* Table Styling */
table {
width: 100%;
border-collapse: collapse;
margin: 25px 0;
}
th, td {
border: 1px solid #ddd;
padding: 12px;
text-align: left;
}
th {
background-color: #f2f2f2;
font-weight: bold;
}
/* --- Mobile Responsive Styles --- */
@media (max-width: 768px) {
body, .container {
padding: 10px;
}
h1 { font-size: 2em; }
h2 { font-size: 1.5em; }
h3 { font-size: 1.2em; }
p, li { font-size: 1em; }
pre { font-size: 0.85em; }
table, th, td { font-size: 0.9em; }
}
</style>
</head>
<body>
<div class="container">
<h1>Study Guide: Generative Models</h1>
<h2>🔹 Core Concepts</h2>
<div class="story-gen">
<p><strong>Story-style intuition: The Artist vs. The Art Critic</strong></p>
<p>Imagine two types of AI that both study thousands of cat photos.
<br>• The <strong>Discriminative Model</strong> is like an <strong>art critic</strong>. Its only job is to learn the difference between a cat photo and a dog photo. If you show it a new picture, it can tell you, "That's a cat," but it can't create a cat picture of its own. It learns a decision boundary.
<br>• The <strong>Generative Model</strong> is like an <strong>artist</strong>. It studies the cat photos so deeply that it understands the "essence" of what makes a cat a cat: the patterns, the textures, the shapes. It learns the underlying distribution of "cat-ness." Because it has this deep understanding, it can then be asked to create a brand new, never-before-seen picture of a cat from scratch.</p>
</div>
<p><strong>Generative Models</strong> are a class of statistical models that learn the underlying probability distribution of a dataset. Their primary goal is to understand the data so well that they can "generate" new data samples that are similar to the ones they were trained on.</p>
<h2>🔹 Types of Generative Models</h2>
<p>Generative models come in several powerful flavors, each with a different approach to learning and creating.</p>
<ul>
<li>
<strong>Probabilistic Models:</strong> These models explicitly learn a probability distribution, either P(X) or the joint P(X, Y). Examples include Naïve Bayes and Gaussian Mixture Models (GMMs). They are often easy to interpret but less powerful for complex data like images.
</li>
<li>
<strong>Variational Autoencoders (VAEs):</strong>
<div class="story-gen"><p><strong>Analogy: The Master Forger.</strong> A VAE is like a forger who learns to create masterpieces. It first "compresses" a real painting into a secret recipe (a condensed set of characteristics called the latent space). It then learns to "decompress" that recipe back into a painting. By learning this process, it can later create new recipes and generate new, unique paintings.</p></div>
</li>
<li>
<strong>Generative Adversarial Networks (GANs):</strong>
<div class="story-gen"><p><strong>Analogy: The Artist and Critic Game.</strong> A GAN consists of two competing neural networks: a <strong>Generator</strong> (the artist) that tries to create realistic images, and a <strong>Discriminator</strong> (the critic) that tries to tell the difference between real images and the artist's fakes. They train together in a game where the artist gets better at fooling the critic, and the critic gets better at catching fakes. This competition pushes the artist to create incredibly realistic images.</p></div>
</li>
<li>
<strong>Diffusion Models:</strong>
<div class="story-gen"><p><strong>Analogy: The Sculptor.</strong> A Diffusion Model is like a sculptor who starts with a random block of marble (pure noise) and slowly chisels away the noise, step by step, until a clear statue (a realistic image) emerges. It learns this "denoising" process by first practicing in reverse: taking a perfect statue and systematically adding noise to it until it becomes a random block.</p></div>
</li>
</ul>
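<div class="example-gen">
<p>The "practice in reverse" step of the Sculptor analogy can be sketched in a few lines of NumPy. This is a minimal illustration, not a full diffusion model: the linear noise schedule and its values (<code>beta_start</code>, <code>beta_end</code>, 1000 steps) are assumptions borrowed from common DDPM-style setups, and the helper names are hypothetical. It only shows how a clean sample is progressively mixed with Gaussian noise; the denoising network that learns to undo this is not included.</p>
<pre><code>
import numpy as np

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal variance kept at each step (assumed linear schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bars, rng):
    """Jump directly to step t of the forward (noising) process."""
    noise = rng.standard_normal(x0.shape)
    signal_scale = np.sqrt(alpha_bars[t])
    noise_scale = np.sqrt(1.0 - alpha_bars[t])
    return signal_scale * x0 + noise_scale * noise, noise

rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
x0 = np.ones((8, 8))  # toy stand-in for a clean image (the "perfect statue")
x_early, _ = add_noise(x0, 10, alpha_bars, rng)   # still mostly signal
x_late, _ = add_noise(x0, 999, alpha_bars, rng)   # almost pure noise (the "random block")
print(np.sqrt(alpha_bars[10]), np.sqrt(alpha_bars[999]))
</code></pre>
<p>The two printed numbers are the signal scales at an early and a late step: the first is close to 1, the second close to 0. During training, a network would be asked to predict <code>noise</code> from the noisy sample and the step index.</p>
</div>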
<h2>🔹 Mathematical Foundations</h2>
<ul>
<li>
<strong>Joint Probability P(X, Y):</strong> Generative models often learn the joint probability of features X and labels Y. This allows them to generate new pairs of (X, Y).
</li>
<li>
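<strong>Maximum Likelihood Estimation (MLE):</strong> This is the principle many generative models use for training (GMMs and autoregressive language models directly; VAEs and diffusion models through a lower bound). They adjust their parameters to maximize the probability (likelihood) of the observed training data under the model. GANs are a notable exception: they train with an adversarial loss instead.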
</li>
<li>
<strong>ELBO (for VAEs):</strong> VAEs optimize a lower bound on the data likelihood called the Evidence Lower Bound. It's a clever way to make an otherwise intractable optimization problem solvable.
</li>
<li>
<strong>Adversarial Loss (for GANs):</strong> This is the "minimax" game objective where the Generator tries to minimize the loss while the Discriminator tries to maximize it.
</li>
</ul>
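<div class="example-gen">
<p>For reference, the three objectives named above are usually written as follows. This is a sketch in standard textbook notation, not notation defined elsewhere in this guide: \(\theta\) and \(\phi\) are model parameters, \(q_\phi(z \mid x)\) is the VAE encoder, \(p(z)\) is the prior over the latent space, \(D\) and \(G\) are the Discriminator and Generator, and \(p_z\) is the noise distribution the Generator samples from.</p>
<p><strong>MLE:</strong> \[ \hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{N} \log p_\theta(x_i) \]</p>
<p><strong>ELBO:</strong> \[ \log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big) \]</p>
<p><strong>Adversarial (minimax) objective:</strong> \[ \min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big] \]</p>
<p>In the ELBO, the first term rewards accurate reconstruction and the KL term keeps the encoder's output close to the prior. In the adversarial objective, the Discriminator pushes the value up by scoring real data near 1 and fakes near 0, while the Generator pushes it down by making \(D(G(z))\) large.</p>
</div>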
<h2>🔹 Workflow of Generative Models</h2>
<ol>
<li><strong>Collect Data:</strong> Gather a large, high-quality dataset of the thing you want to generate (e.g., thousands of celebrity faces).</li>
<li><strong>Choose a Model:</strong> Select the right type of generative model for your task (e.g., a GAN or Diffusion Model for realistic images).</li>
<li><strong>Train the Model:</strong> This is the most computationally expensive step, where the model learns the underlying patterns and distribution of the training data.</li>
<li><strong>Generate New Samples:</strong> After training, you can use the model to generate new, synthetic data by sampling from its learned distribution.</li>
<li><strong>Evaluate Quality:</strong> Assess the quality of the generated samples using both quantitative metrics (like FID) and human evaluation.</li>
</ol>
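<div class="example-gen">
<p>The five steps above can be walked through end to end on a toy problem. The sketch below is only an illustration under strong assumptions: the "dataset" is synthetic 1-D data, the "model" is a single Gaussian (far simpler than a GAN or Diffusion Model), and average held-out log-likelihood stands in for a real quality metric such as FID. All variable and function names are hypothetical.</p>
<pre><code>
import numpy as np

rng = np.random.default_rng(42)

# 1. Collect data: a toy 1-D dataset drawn from a process we pretend not to know.
data = rng.normal(loc=5.0, scale=2.0, size=10_000)

# 2. Choose a model: a single Gaussian with parameters (mu, sigma).
# 3. Train: for a Gaussian, the maximum-likelihood estimates have a closed form.
mu_hat = data.mean()
sigma_hat = data.std()  # np.std defaults to the 1/N estimator, which is the MLE

# 4. Generate new samples by sampling from the learned distribution.
samples = rng.normal(loc=mu_hat, scale=sigma_hat, size=5)

# 5. Evaluate: average log-likelihood of held-out data under the fitted model.
def gaussian_log_likelihood(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

held_out = rng.normal(loc=5.0, scale=2.0, size=1_000)
avg_ll = gaussian_log_likelihood(held_out, mu_hat, sigma_hat).mean()
print(mu_hat, sigma_hat, samples, avg_ll)
</code></pre>
<p>Real generative models replace step 3 with gradient-based training over many iterations, but the shape of the workflow is the same.</p>
</div>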
<h2>🔹 Applications</h2>
<ul>
<li><strong>Image Generation and Editing:</strong> Creating photorealistic faces, art, or modifying existing images (e.g., DALL-E, Midjourney, Stable Diffusion).</li>
<li><strong>Text Generation:</strong> Powering chatbots, writing articles, and generating code (e.g., GPT-4).</li>
<li><strong>Data Augmentation:</strong> Creating more training data for other machine learning models, which is especially useful for rare events or imbalanced datasets.</li>
<li><strong>Drug Discovery and Design:</strong> Generating new molecular structures with desired properties to accelerate scientific research.</li>
<li><strong>Music and Art Creation:</strong> Composing new melodies or creating novel artistic styles.</li>
</ul>
<h2>🔹 Advantages &amp; Disadvantages</h2>
<h3>Advantages:</h3>
<ul>
<li>✅ <strong>Creative and Powerful:</strong> Can generate novel, high-quality data that has never been seen before.</li>
<li>✅ <strong>Unsupervised Learning:</strong> Can learn from vast amounts of unlabeled data.</li>
<li>✅ <strong>Data Augmentation:</strong> Solves the problem of limited training data by creating realistic synthetic samples.</li>
</ul>
<h3>Disadvantages:</h3>
<ul>
<li>❌ <strong>Computationally Expensive:</strong> Training large generative models requires significant GPU resources and time.</li>
<li>❌ <strong>Training Instability:</strong> GANs, in particular, can be notoriously difficult to train, suffering from problems like mode collapse.</li>
<li>❌ <strong>Difficult to Evaluate:</strong> How do you objectively measure "creativity" or "realism"? Evaluating the quality of generated content is often subjective.</li>
</ul>
<h2>🔹 Key Evaluation Metrics</h2>
<ul>
<li><strong>Inception Score (IS):</strong> Measures how diverse and clear the generated images are. A higher score is better.</li>
<li><strong>Frechet Inception Distance (FID):</strong> Compares the statistical distribution of generated images to real images. It's considered a more reliable metric than IS. A lower score is better.</li>
<li><strong>Perplexity (for text):</strong> Measures how well a language model predicts a sample of text. A lower perplexity indicates the model is less "surprised" by the text, meaning it's a better fit.</li>
</ul>
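<div class="example-gen">
<p>Two of these metrics reduce to short formulas once their inputs are available. The sketch below assumes those inputs are already computed: per-token probabilities from a language model for perplexity, and the mean and covariance of image features for the Frechet distance. Real FID extracts those features with a pre-trained Inception network, which is omitted here. The function names are hypothetical, and the eigenvalue shortcut for the matrix square root's trace is one of several ways to compute it.</p>
<pre><code>
import numpy as np

def perplexity(token_probs):
    """exp of the average negative log-probability the model gave to the observed tokens."""
    token_probs = np.asarray(token_probs, dtype=float)
    return float(np.exp(-np.mean(np.log(token_probs))))

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # trace(sqrt(sigma1 @ sigma2)) equals the sum of square roots of the product's eigenvalues
    eigvals = np.linalg.eigvals(sigma1 @ sigma2).real
    trace_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)

# A model that spreads its belief evenly over 4 choices at every token
print(perplexity([0.25, 0.25, 0.25, 0.25]))

# Identical feature statistics versus statistics with a shifted mean
identity = np.eye(3)
print(frechet_distance(np.zeros(3), identity, np.zeros(3), identity))
print(frechet_distance(np.zeros(3), identity, np.array([1.0, 2.0, 2.0]), identity))
</code></pre>
<p>For both metrics, lower is better: a perplexity of 4 means the model is as uncertain as a uniform guess among four tokens, and a Frechet distance of 0 means the two feature distributions have identical mean and covariance.</p>
</div>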
<h2>🔹 Python Implementation (Conceptual Sketches)</h2>
<div class="story-gen">
<p>Training large generative models from scratch is a major undertaking. Here are conceptual sketches of what the code looks like using popular frameworks.</p>
</div>
<div class="example-gen">
<h3>Simple GAN Generator in PyTorch</h3>
<pre><code>
import numpy as np
import torch.nn as nn


class Generator(nn.Module):
    def __init__(self, latent_dim, img_shape):
        super().__init__()
        self.img_shape = img_shape
        self.model = nn.Sequential(
            # Takes a random noise vector (latent_dim) and expands it layer by layer
            nn.Linear(latent_dim, 128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(128, 256),
            nn.BatchNorm1d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(256, 512),
            nn.BatchNorm1d(512),
            nn.LeakyReLU(0.2, inplace=True),
            # Final layer outputs one value per pixel of the flattened image
            nn.Linear(512, int(np.prod(self.img_shape))),
            nn.Tanh(),  # Scales output to be between -1 and 1
        )

    def forward(self, z):
        img = self.model(z)
        # Reshape the flat output to (batch_size, *img_shape)
        img = img.view(img.size(0), *self.img_shape)
        return img
</code></pre>
<h3>Simple GAN Generator in TensorFlow/Keras</h3>
<pre><code>
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers


def build_generator(latent_dim, img_shape):
    model = tf.keras.Sequential()
    # Input is a random noise vector of length latent_dim
    model.add(layers.Input(shape=(latent_dim,)))
    model.add(layers.Dense(256))
    # Note: newer Keras releases name this argument `negative_slope` instead of `alpha`
    model.add(layers.LeakyReLU(alpha=0.2))
    model.add(layers.BatchNormalization(momentum=0.8))
    model.add(layers.Dense(512))
    model.add(layers.LeakyReLU(alpha=0.2))
    model.add(layers.BatchNormalization(momentum=0.8))
    model.add(layers.Dense(1024))
    model.add(layers.LeakyReLU(alpha=0.2))
    model.add(layers.BatchNormalization(momentum=0.8))
    # One tanh output per pixel, then reshape to the image dimensions
    model.add(layers.Dense(int(np.prod(img_shape)), activation='tanh'))
    model.add(layers.Reshape(img_shape))
    return model
</code></pre>
<h3>Using a Pre-trained Model from Hugging Face</h3>
<pre><code>
# Easiest way to get started with powerful generative models!
from transformers import pipeline
# Initialize a text generation pipeline with a pre-trained model
generator = pipeline('text-generation', model='gpt2')
# Generate text
prompt = "In a world where AI could dream,"
generated_text = generator(prompt, max_length=50, num_return_sequences=1)
print(generated_text[0]['generated_text'])
</code></pre>
</div>
<div class="quiz-section">
<h2>Quick Quiz: Test Your Knowledge</h2>
<ol>
<li><strong>What is the key difference between a generative model and a discriminative model?</strong></li>
<li><strong>In a GAN, what are the roles of the Generator and the Discriminator?</strong></li>
<li><strong>What is the core idea behind Diffusion Models?</strong></li>
<li><strong>You have trained a GAN to generate images of cats. You calculate the FID score and get a value of 5. Your colleague trains another model and gets an FID score of 45. Which model is better, and why?</strong></li>
</ol>
<div class="quiz-answers">
<h3>Answers</h3>
<p><strong>1.</strong> A generative model (the artist) learns the underlying distribution of the data, P(X), and can create new samples. A discriminative model (the critic) learns the decision boundary between classes, P(Y|X), and can only classify existing data.</p>
<p><strong>2.</strong> The <strong>Generator</strong> tries to create fake data that looks real. The <strong>Discriminator</strong> tries to distinguish between real data and the Generator's fake data.</p>
<p><strong>3.</strong> The core idea is to learn to reverse a process of gradually adding noise to an image. By mastering this "denoising" process, the model can start with pure noise and denoise it step-by-step into a coherent new image.</p>
<p><strong>4.</strong> Your model with an FID score of 5 is much better. For Frechet Inception Distance (FID), a <strong>lower score</strong> is better, as it indicates that the statistical distribution of your generated images is closer to the distribution of the real images.</p>
</div>
</div>
<h2>🔹 Key Terminology Explained</h2>
<div class="story-gen">
<p><strong>The Story: Decoding the AI Artist's Toolkit</strong></p>
</div>
<ul>
<li>
<strong>Latent Space:</strong>
<br>
<strong>What it is:</strong> A lower-dimensional, compressed representation of the data. It's where the model captures the essential features or "essence" of the data.
<br>
<strong>Story Example:</strong> Imagine a "face space." In this latent space, one axis might represent "age," another "smile intensity," and another "hair color." By picking a point in this space, the model can generate a face with those specific attributes.
</li>
<li>
<strong>Minimax Game:</strong>
<br>
<strong>What it is:</strong> A concept from game theory used to describe the GAN training process. It's a two-player game where one player's gain is the other player's loss.
<br>
<strong>Story Example:</strong> The Generator wants to <strong>mini</strong>mize the probability that the Discriminator catches its fakes. The Discriminator wants to <strong>max</strong>imize its ability to correctly identify fakes. This push-and-pull is the <strong>minimax</strong> game that forces both to improve.
</li>
<li>
<strong>Mode Collapse (in GANs):</strong>
<br>
<strong>What it is:</strong> A common failure case in GAN training where the Generator finds a single "safe" output that can fool the Discriminator and only produces that one output, instead of a diverse range of samples.
<br>
<strong>Story Example:</strong> The artist discovers that drawing one specific, very realistic-looking cat is enough to always fool the critic. So, it stops learning and only ever produces that single cat image. It has "collapsed" to a single mode.
</li>
</ul>
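<div class="example-gen">
<p>The push-and-pull of the minimax game can be made concrete by computing the game's value for two situations. This is a small numeric sketch, not a training loop: the Discriminator outputs below are made-up probabilities that an input is real, and <code>gan_value</code> is a hypothetical helper implementing the usual value function, mean(log D(x)) + mean(log(1 - D(G(z)))).</p>
<pre><code>
import numpy as np

def gan_value(d_real, d_fake):
    """V(D, G) = mean(log D(x)) + mean(log(1 - D(G(z))))."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# Critic is winning: real images score near 1, fakes score near 0.
sharp_critic = gan_value(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])

# Critic is fooled: it can only guess 0.5 for everything.
fooled_critic = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])

print(sharp_critic, fooled_critic)
</code></pre>
<p>The value is higher when the critic is winning and drops to -2 log 2 (about -1.386) when the critic is reduced to guessing. The Discriminator trains to raise this number, and the Generator trains to lower it.</p>
</div>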
</div>
</body>
</html>
{% endblock %}