What is cGAN?
Translating an image given a "condition": learning the probability distribution \(P(X|condition)\) that follows the given condition.
Generative model vs Conditional generative model
- Generative model generates a random sample
- Conditional generative model generates a random sample under the given "condition"
Example of conditional generative model
\(P(\text{high resolution audio} \mid \text{low resolution audio})\),
\(P(\text{English sentence} \mid \text{Chinese sentence})\),
\(P(\text{a full article} \mid \text{an article's title and subtitle})\), etc.
cGAN model structure: GAN model structure + C (condition); it inherits the advantages of GAN and extends them.
EXAMPLE : Image-to-Image translation (style transfer, colorizing black-and-white images, super resolution, etc.)
Difference between regression and conditional GAN for SR
A model trained with MAE (L1) or MSE (L2) loss is called a regression model. Its results are only moderately good (not satisfying): predicting a safely averaged image makes these losses easy to minimize, so such plain pixel-wise losses alone should not be used.
Toy example)
conditions
- Task : colorizing the given image
- The real image contains only two colors, "black" or "white"
L1 loss generates a "gray" output, the average of "black" and "white" (both possible solutions)
GAN loss generates a "black" or "white" output, since a gray image is caught by the discriminator (see the sketch below)
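A quick numeric check of this toy example (a hypothetical NumPy sketch, not from the original post): for a pixel whose real value is black (0.0) or white (1.0) with equal probability, the gray "average" prediction is strictly optimal under L2 and no worse than committing to a real color under L1, so a regression model has no reason to avoid it.

```python
import numpy as np

# Toy pixel: the real value is black (0.0) or white (1.0) with equal probability.
targets = np.array([0.0, 1.0])
candidates = {"black": 0.0, "gray": 0.5, "white": 1.0}

for name, pred in candidates.items():
    l1 = np.mean(np.abs(pred - targets))   # expected L1 (MAE) loss
    l2 = np.mean((pred - targets) ** 2)    # expected L2 (MSE) loss
    print(f"{name:5s}  L1={l1:.3f}  L2={l2:.3f}")

# black  L1=0.500  L2=0.500
# gray   L1=0.500  L2=0.250   <- the "average" image is the safe choice
# white  L1=0.500  L2=0.500
```

A discriminator trained on the real, purely black-or-white images would immediately flag the gray output as fake, which is why the GAN loss pushes the generator toward one of the real modes.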
Applications
1. Pix2Pix
- Translating an image to a corresponding image in another domain (e.g., style)
- Example of a conditional GAN where the condition is given as an input image
Loss function of Pix2Pix
- If only L1 loss is used, blurry images are generated (L1 acts as a guide in the early phase of training)
- GAN loss induces more realistic outputs close to the real distribution
Pix2Pix loss : total loss (GAN loss + L1 loss)
\(G^* = \arg\min_G\max_D L_{cGAN}(G, D) + \lambda L_{L1}(G)\)
\(L_{cGAN}(G, D) = E_{x, y}[\log D(x, y)] + E_{x, z}[\log(1 - D(x, G(x, z)))]\)
\(L_{L1}(G) = E_{x, y, z}[\lVert y - G(x, z) \rVert_1]\)
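A minimal PyTorch sketch of the generator side of this objective (hypothetical names: a generator G(x), a discriminator D(x, y) that ends in a sigmoid, and a paired batch (x, y); in Pix2Pix the noise z is realized as dropout inside G, and the L1 term is weighted heavily, e.g. λ = 100):

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()   # adversarial term (D outputs a probability)
l1 = nn.L1Loss()     # reconstruction term

def pix2pix_generator_loss(G, D, x, y, lam=100.0):
    fake = G(x)                                        # G(x, z): z comes from dropout inside G
    pred_fake = D(x, fake)                             # D sees (condition, output) pairs
    adv = bce(pred_fake, torch.ones_like(pred_fake))   # try to fool the discriminator
    recon = l1(fake, y)                                # stay close to the paired target
    return adv + lam * recon                           # total loss = GAN loss + lambda * L1 loss
```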
2. CycleGAN
- In Pix2Pix, we need "pairwise data" to learn the translation between two domains (supervised)
- e.g., "sketch" and "real image" pairs are required to translate a sketch into a real image
- However, it is hard or impossible to get a pairwise dataset
- CycleGAN enables the translation between domains with non-pairwise datasets
- Does not require direct correspondences between (Monet painting, real photo)
CycleGAN loss = GAN loss (in both directions) + cycle-consistency loss
\(L_{GAN}(X \to Y) + L_{GAN}(Y \to X) + L_{cycle}(G, F)\), where G and F are the generators
- GAN loss : translates an image in domain X to Y, and vice versa
- Cycle-consistency loss : enforces that an image and its version translated back and forth should be the same
GAN loss in CycleGAN
- GAN loss does translation
- CycleGAN has two GAN losses, one for each direction (X→Y, Y→X)
- GAN loss : \(L(D_X) + L(D_Y) + L(G) + L(F)\)
- G, F : generators
- \(D_X, D_Y\) : discriminators
If we solely use GAN loss
- (Mode collapse) Regardless of the input, the generator could always output the same image!
- The content of the input is not properly reflected in the output!
Solution : Cycle-consistency loss to preserve contents
- Translate an image in X to Y, and translate the output image back into X again
- The recovered image should be the same as the original image!
- The content of the image should be preserved for this to hold
- No supervision (i.e., self-supervision)
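Putting the two directions together, a rough PyTorch sketch of the CycleGAN generator objective (hypothetical names: generators G: X→Y and F: Y→X, discriminators D_X and D_Y ending in sigmoids, image batches real_x and real_y; the actual paper additionally uses a least-squares GAN loss and an identity term):

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()
l1 = nn.L1Loss()

def cyclegan_generator_loss(G, F, D_X, D_Y, real_x, real_y, lam=10.0):
    fake_y = G(real_x)                       # X -> Y translation
    fake_x = F(real_y)                       # Y -> X translation

    # GAN losses in both directions: each generator tries to fool its discriminator.
    pred_y = D_Y(fake_y)
    pred_x = D_X(fake_x)
    adv = bce(pred_y, torch.ones_like(pred_y)) + bce(pred_x, torch.ones_like(pred_x))

    # Cycle-consistency: X -> Y -> X (and Y -> X -> Y) should recover the input,
    # which preserves the content of the input without paired supervision.
    cycle = l1(F(fake_y), real_x) + l1(G(fake_x), real_y)

    return adv + lam * cycle
```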
3. Perceptual loss
Perceptual loss, yet another approach for achieving high-quality output
- Adversarial loss
- Relatively hard to train and code (the generator and discriminator must improve adversarially)
- Does not require any pre-trained networks
- Since no pre-trained network is required, it can be applied to various applications
- Perceptual loss
- Simple to train and code (trained only with simple forward & backward computation)
- Requires a pre-trained network to measure a learned loss
- Feature reconstruction loss : separate losses are used for the style target and the content target (see the sketch below)
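A rough sketch of the content (feature reconstruction) half of a perceptual loss, measured with a frozen, ImageNet-pre-trained VGG-16 from torchvision (the layer cut-off and the choice of VGG-16 are illustrative assumptions; the style target is handled analogously with Gram matrices of the same feature maps):

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class FeatureReconstructionLoss(nn.Module):
    """Compare intermediate VGG activations instead of raw pixels."""
    def __init__(self, layer_index=16):                  # features[:16] ~ up to relu3_3
        super().__init__()
        self.features = vgg16(weights="IMAGENET1K_V1").features[:layer_index].eval()
        for p in self.features.parameters():
            p.requires_grad = False                      # the loss network is never updated

    def forward(self, generated, target):
        # Measure how different the two images look in feature space.
        return nn.functional.mse_loss(self.features(generated), self.features(target))
```

In tasks such as super resolution or style transfer, this term replaces or complements the plain pixel-wise L1/L2 loss, so the output is judged by how close it looks in feature space rather than pixel by pixel.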
Various GAN applications
- Deepfake; there is also a deepfake detection challenge
- Face de-identification, face anonymization with a passcode
- Video translation (manipulation)