Week 4 Summary

Transfer Learning

1. Update only the FC layer and keep the Conv layers frozen (mainly when data is scarce)

2. Use a low learning rate for the Conv layers and a higher learning rate for the FC layer (mainly when enough data is available)
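
The two strategies differ only in the learning rate given to each layer group. A minimal sketch in plain Python, assuming toy weights and gradients (the values, the group names `conv` / `fc`, and the helper `sgd_step` are hypothetical and only for illustration):

```python
# Toy parameters standing in for a pretrained network (hypothetical values).
params = {
    "conv": [0.5, -0.3],   # pretrained convolutional backbone
    "fc": [0.1, 0.2],      # newly attached classifier head
}
# Toy gradients, as if returned by backpropagation.
grads = {
    "conv": [1.0, 1.0],
    "fc": [1.0, 1.0],
}


def sgd_step(params, grads, lrs):
    """Apply one SGD step with a separate learning rate per layer group.

    A learning rate of 0.0 leaves that group frozen.
    """
    return {
        name: [w - lrs[name] * g for w, g in zip(weights, grads[name])]
        for name, weights in params.items()
    }


# Strategy 1 (little data): freeze the conv layers, update only the FC layer.
fc_only = sgd_step(params, grads, lrs={"conv": 0.0, "fc": 0.1})

# Strategy 2 (enough data): small LR for conv layers, larger LR for the FC layer.
fine_tune = sgd_step(params, grads, lrs={"conv": 0.001, "fc": 0.1})
```

In PyTorch the same idea is usually expressed with per-parameter-group options in a `torch.optim` optimizer, or by setting `requires_grad = False` on the frozen layers.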

Knowledge distillation 

Useful for model compression.

A method that uses the predictions (inference output) of a pretrained model as pseudo-labels for unlabeled data

1. Unsupervised Version (training on unlabeled data)

For the same input, the outputs of the Teacher Model and the Student Model are compared, and only the Student Model is trained

2. Supervised Version (training on labeled data)

A weighted sum of the Distillation Loss and the Student Loss is used, and again only the Student Model is trained.

Distillation Loss: computed against the Teacher Model (pre-trained)

Student Loss: computed from the Student Model (not yet trained)

Distillation Loss = KL Divergence Loss

Student Loss = Cross Entropy Loss
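
The weighted sum of the two losses can be sketched in plain Python as follows. This is only an illustration under assumptions: the weight `alpha`, the softmax `temperature`, and all function names are hypothetical and not taken from the notes, and the logits are toy values.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(probs, label):
    """Cross entropy between predicted probabilities and a hard label index."""
    return -math.log(probs[label])


def distillation_objective(teacher_logits, student_logits, label,
                           alpha=0.5, temperature=2.0):
    # Distillation loss: KL divergence between softened teacher and student outputs.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    distill_loss = kl_divergence(teacher_soft, student_soft)
    # Student loss: cross entropy between the student prediction and the true label.
    student_loss = cross_entropy(softmax(student_logits), label)
    # Weighted sum; only the student's parameters would be updated with this loss.
    return alpha * distill_loss + (1.0 - alpha) * student_loss


# Toy logits for a 3-class problem (hypothetical values).
loss = distillation_objective([2.0, 1.0, 0.1], [1.5, 0.5, 0.3], label=0)
```

With `alpha=1.0` only the distillation term remains, which corresponds to the unsupervised version where no labels are available.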

Self-training (Augmentation + Teacher-Student networks + Semi-supervised learning)

= Self-training with noisy student

1. Train teacher model with labeled data

2. Predict labels for unlabeled data using the trained teacher model

3. Train a student model on the labeled and pseudo-labeled data with noise (e.g. RandAugment)

4. Set the trained student as the new teacher model, and repeat Steps 2 and 3 with a new student model
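
The four steps can be sketched with a toy 1-D threshold classifier. This is only an illustration under assumptions: the classifier, the uniform input jitter (a stand-in for an augmentation such as RandAugment), and all function names are hypothetical, and the labeled set is assumed to contain both classes.

```python
import random


def train_threshold(samples):
    """Fit a 1-D threshold classifier: midpoint between the two class means."""
    zeros = [x for x, y in samples if y == 0]
    ones = [x for x, y in samples if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2.0


def predict(threshold, x):
    """Predict class 1 above the threshold, class 0 otherwise."""
    return 1 if x > threshold else 0


def add_noise(x, rng, scale=0.1):
    """Toy input noise standing in for an augmentation such as RandAugment."""
    return x + rng.uniform(-scale, scale)


def noisy_student(labeled, unlabeled, iterations=3, seed=0):
    rng = random.Random(seed)
    # Step 1: train the teacher on labeled data only.
    teacher = train_threshold(labeled)
    for _ in range(iterations):
        # Step 2: the teacher predicts pseudo-labels for the unlabeled data.
        pseudo = [(x, predict(teacher, x)) for x in unlabeled]
        # Step 3: train a new student on labeled + pseudo-labeled data with noise.
        noisy = [(add_noise(x, rng), y) for x, y in labeled + pseudo]
        student = train_threshold(noisy)
        # Step 4: the trained student becomes the teacher for the next round.
        teacher = student
    return teacher


labeled = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1)]
unlabeled = [0.5, 1.5, 3.5, 4.5]
final_threshold = noisy_student(labeled, unlabeled)
```

The noise is applied only when training the student; the teacher produces pseudo-labels from the clean inputs.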

Layer Norm vs Batch Norm

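Layer Norm : normalizes each sample independently, over all channels (and all pixels) of that input (image / feature map)

Batch Norm : normalizes each channel independently, over all samples in the batch and all pixels of that channel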
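
The difference between the two comes down to which axes the mean and variance are computed over. The following sketch uses the standard definitions on nested lists of shape [N][C][P] (samples, channels, flattened pixels). It is an assumption-laden illustration: the learnable scale and shift and the running statistics used at inference time are omitted, and the helper names are hypothetical.

```python
import math


def _normalize(values, eps=1e-5):
    """Shift and scale a flat list to zero mean and roughly unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def layer_norm(batch):
    """Normalize each sample over all of its channels and pixels."""
    out = []
    for sample in batch:
        n_pixels = len(sample[0])
        flat = _normalize([v for channel in sample for v in channel])
        # Reshape the flat result back to [C][P].
        out.append([flat[c * n_pixels:(c + 1) * n_pixels]
                    for c in range(len(sample))])
    return out


def batch_norm(batch):
    """Normalize each channel over all samples in the batch and all pixels."""
    n_channels = len(batch[0])
    n_pixels = len(batch[0][0])
    out = [[None] * n_channels for _ in batch]
    for c in range(n_channels):
        flat = _normalize([v for sample in batch for v in sample[c]])
        # Scatter the flat result back to each sample's channel c.
        for n in range(len(batch)):
            out[n][c] = flat[n * n_pixels:(n + 1) * n_pixels]
    return out


# Two samples, two channels, two pixels each (toy values).
batch = [[[1.0, 2.0], [3.0, 4.0]],
         [[5.0, 6.0], [7.0, 8.0]]]
ln = layer_norm(batch)
bn = batch_norm(batch)
```

Layer Norm statistics do not depend on the other samples in the batch, whereas Batch Norm statistics do.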

Problem Solver