Object detection

TASKS

Image

Semantic segmentation

Instance segmentation (can objects of the same class be told apart from one another?)

Panoptic segmentation (instance segmentation plus semantic labels for the remaining background regions)

Object detection = classification + box localization (labels take the form (P_class, x_min, y_min, x_max, y_max))
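
To make the label format concrete, here is a minimal sketch of one label as a Python tuple. The `BoxLabel` name, the integer class index, and the `to_center_format` helper are illustrative assumptions, not part of any particular library. The helper converts corner coordinates to the (x, y, w, h) center form that box regression layers often use.

```python
from typing import NamedTuple


class BoxLabel(NamedTuple):
    """One ground-truth label (hypothetical container for illustration)."""
    p_class: int   # class index; the encoding is an assumption
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def to_center_format(label: BoxLabel):
    """Convert corner coordinates to (center_x, center_y, width, height)."""
    width = label.x_max - label.x_min
    height = label.y_max - label.y_min
    return (label.x_min + width / 2, label.y_min + height / 2, width, height)


label = BoxLabel(p_class=0, x_min=10, y_min=20, x_max=50, y_max=60)
center_box = to_center_format(label)
```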

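Applications include autonomous vehicles, OCR, and so on

History of object detection

- Approaches that approximate image edges (gradients) and classify with an SVM

- Selective search: box proposals (candidate objects)

1) Over-segmentation: first split the whole image into many small regions by color

2) Iteratively merging similar regions: repeat the step of merging similar regions

3) Extracting candidate boxes from all remaining segmentations: extract the boxes that become object candidates from the segments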
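
The three steps above can be sketched as a toy greedy merge. This is a simplification under stated assumptions: each region is a hypothetical `(mean_color, box, size)` tuple, similarity is just the difference in mean color, and adjacency is ignored, whereas the real selective search only merges neighboring regions and combines several similarity measures.

```python
def merge_boxes(a, b):
    """Smallest box (x_min, y_min, x_max, y_max) that encloses both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def selective_search_toy(regions):
    """regions: list of (mean_color, box, size) from an over-segmentation.

    Returns every box seen during merging as a candidate object box.
    """
    regions = list(regions)
    candidates = [box for _, box, _ in regions]      # step 1: initial segments
    while len(regions) > 1:
        # Step 2: find the most similar pair (smallest color difference).
        best = None
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                diff = abs(regions[i][0] - regions[j][0])
                if best is None or diff < best[0]:
                    best = (diff, i, j)
        _, i, j = best
        (ci, bi, si), (cj, bj, sj) = regions[i], regions[j]
        merged = ((ci * si + cj * sj) / (si + sj), merge_boxes(bi, bj), si + sj)
        regions = [r for k, r in enumerate(regions) if k not in (i, j)]
        regions.append(merged)
        candidates.append(merged[1])                 # step 3: keep its box
    return candidates


segments = [
    (0.10, (0, 0, 10, 10), 100),
    (0.15, (10, 0, 20, 10), 100),
    (0.90, (0, 10, 20, 20), 200),
]
proposals = selective_search_toy(segments)
```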

Two-stage detector

R-CNN debuted with overwhelmingly better performance than earlier methods: it finds bounding boxes first and then classifies them

1. Input image

2. Extract region proposals (up to about 2k)

3. Compute CNN features

4. Classify regions

Fast R-CNN: extracts features for the whole image once and reuses them for every box; the boxes themselves are still produced by selective search or a similar method

1. Input image

2. Produce a feature map with conv layers

3. ROI pooling: extract a fixed-size feature for each ROI from the shared feature map

4. Class and box prediction for each ROI
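
Step 3 can be sketched as follows. This is a minimal single-channel version on nested lists, assuming the ROI is already given in feature-map cell coordinates and covers at least one cell; the function name `roi_max_pool` and the 2 x 2 output size are illustrative choices.

```python
def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool the ROI region of a 2D feature map into out_size x out_size bins.

    feature_map: list of rows (H x W)
    roi: (x_min, y_min, x_max, y_max) in feature-map cells, max is exclusive
    """
    x0, y0, x1, y1 = roi
    height, width = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_size):
        y_start = y0 + (i * height) // out_size
        # Each bin covers at least one row, even when the ROI is small.
        y_end = y0 + max(((i + 1) * height) // out_size, (i * height) // out_size + 1)
        row = []
        for j in range(out_size):
            x_start = x0 + (j * width) // out_size
            x_end = x0 + max(((j + 1) * width) // out_size, (j * width) // out_size + 1)
            row.append(max(feature_map[y][x]
                           for y in range(y_start, y_end)
                           for x in range(x_start, x_end)))
        pooled.append(row)
    return pooled


fmap = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15]]
pooled_full = roi_max_pool(fmap, (0, 0, 4, 4))
pooled_part = roi_max_pool(fmap, (1, 1, 4, 4))
```

Whatever the ROI size, the output has the same shape, which is what lets one classifier head process every proposal.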

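-> Faster than R-CNN

Faster R-CNN: replaces region proposal with a neural network (RPN), completing an end-to-end architecture

Anchor boxes: a set of candidate boxes, defined in advance, for what is likely to be present at each location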

- A set of pre-defined bounding boxes

- IoU with GT > 0.7 => positive sample

- IoU with GT < 0.3 => negative sample
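
A minimal sketch of how such a pre-defined set might be generated: k = len(scales) x len(ratios) anchors centered on every feature-map cell. The stride, scales, and aspect ratios used as defaults here are assumptions chosen for illustration, and `generate_anchors` is a hypothetical helper name.

```python
import math


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x_min, y_min, x_max, y_max) in image pixels.

    Each anchor has area scale**2 and width/height equal to the ratio.
    """
    anchors = []
    for gy in range(feat_h):
        for gx in range(feat_w):
            # Center of this feature-map cell, mapped back to the image.
            cx = (gx + 0.5) * stride
            cy = (gy + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    w = scale * math.sqrt(ratio)
                    h = scale / math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = generate_anchors(feat_h=2, feat_w=3)
```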

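RPN (Region Proposal Network)

The candidates output by the RPN are pooled from the feature map computed by the conv layers and then classified

The RPN slides a window over the feature map with k anchor boxes at each position

and produces outputs through a classification layer (object vs. no object) and a regression layer (x, y, w, h) -> these outputs only score and refine the proposals;

the final class and box predictions come from a separate layer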

IoU (Intersection over Union) = Area of Overlap / Area of Union; the higher the value, the better the predicted box matches.
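
The formula can be written directly for axis-aligned boxes in (x_min, y_min, x_max, y_max) form. The sketch below also applies the 0.7 / 0.3 thresholds from the anchor box notes; the `label_anchor` helper and the choice to return -1 for anchors that fall between the thresholds (treated here as ignored) are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    overlap = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - overlap
    return overlap / union if union > 0 else 0.0


def label_anchor(anchor, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Return 1 (positive), 0 (negative), or -1 (in between, assumed ignored)."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best > pos_thr:
        return 1
    if best < neg_thr:
        return 0
    return -1


gt = [(0, 0, 64, 64)]
labels = [label_anchor(a, gt) for a in [(0, 0, 64, 64), (16, 16, 48, 48), (0, 0, 64, 40)]]
```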

Single-stage detection: gives up a little accuracy in exchange for speed, so that real-time inference becomes possible

Box regression and classification are carried out at the same time, without region proposals

YOLO

Divides the input image into an S x S grid, predicts for each grid cell (1) B boxes with confidence scores and (2) class scores, and then combines them
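
As a rough sketch of what "combining" can mean, assume each cell's prediction is laid out as B groups of (x, y, w, h, confidence) followed by C class scores, and that a box's class-specific score is its confidence multiplied by the class score. The flat layout and the helper names are assumptions for illustration.

```python
def yolo_output_size(S, B, C):
    """Number of predicted values for an S x S grid, B boxes and C classes per cell."""
    return S * S * (B * 5 + C)


def decode_cell(cell, B, C):
    """Split one cell's prediction vector into B scored detections.

    Returns a list of ((x, y, w, h), best_class, class_specific_score).
    """
    boxes = [cell[b * 5:(b + 1) * 5] for b in range(B)]
    class_scores = cell[B * 5:B * 5 + C]
    detections = []
    for x, y, w, h, confidence in boxes:
        scores = [confidence * p for p in class_scores]
        best = max(range(C), key=scores.__getitem__)
        detections.append(((x, y, w, h), best, scores[best]))
    return detections


cell = [0.5, 0.5, 0.2, 0.3, 0.9,   # box 1: x, y, w, h, confidence
        0.4, 0.4, 0.1, 0.1, 0.2,   # box 2
        0.1, 0.8, 0.1]             # class scores shared by the cell
detections = decode_cell(cell, B=2, C=3)
```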

SSD (Single Shot MultiBox Detector)

Produces multi-scale output from multiple feature maps

Runs inference using all the anchor boxes generated on every feature map; reported to beat the original YOLO in both accuracy and speed

Two-stage detector vs Single-stage detector

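Single-stage detectors have no ROI pooling, so the loss is computed over every location, which produces unnecessary gradients

-> In detection there are few positive samples and very many negative examples, so there is a class imbalance problem

-> Focal loss was proposed to address this: an extension of cross-entropy loss that attempts to solve the class imbalance problem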
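
For reference, the binary focal loss is commonly written as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The sketch below is a per-sample version; the default values gamma = 2 and alpha = 0.25 are assumptions used for illustration. Setting gamma = 0 recovers alpha-weighted cross entropy.

```python
import math


def cross_entropy(p, y):
    """Binary cross entropy for predicted foreground probability p and label y."""
    p_t = p if y == 1 else 1 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights examples that are already classified well."""
    p_t = p if y == 1 else 1 - p
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


easy_negative = focal_loss(0.01, 0)   # confident, correct background prediction
hard_positive = focal_loss(0.10, 1)   # object that the model mostly missed
```

The easy negative contributes a loss several orders of magnitude smaller than the hard positive, so the many background locations no longer dominate the gradient.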

FPN(Feature Pyramid Networks) + class/box prediction branches

Structurally similar to U-Net, but in the expanding path features are fused by element-wise addition (+) rather than concatenation
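
A minimal single-channel sketch of that fusion step: the coarser map is upsampled 2x with nearest-neighbor interpolation and added to the finer map of the same size. The convolutions that a real FPN applies around this step (to match channel counts and smooth the result) are left out, and the helper names are illustrative.

```python
def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a 2D list."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse_add(top, lateral):
    """Top-down fusion by element-wise addition (not concatenation)."""
    upsampled = upsample2x(top)
    return [[u + l for u, l in zip(up_row, lat_row)]
            for up_row, lat_row in zip(upsampled, lateral)]


coarse = [[1, 2]]                       # 1 x 2 map from a deeper layer
fine = [[10, 10, 10, 10],
        [20, 20, 20, 20]]               # 2 x 4 map from a shallower layer
fused = fuse_add(coarse, fine)
```

Because addition keeps the channel count unchanged, every pyramid level can share the same prediction branches, whereas concatenation would grow the feature size at each level.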

DETR (DEtection TRansformer)

Detecting objects as keypoints: research is exploring approaches that find an object's center point and expand outward from it to locate the object

CornerNet (similar to single-stage detectors: fast but lower accuracy)

- Bounding box = {top-left, bottom-right} corners

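Each corner is located with a heat map, and top-left and bottom-right corners are then paired by matching their embeddings

CenterNet (improved accuracy)

1 - Bounding box = {top-left, bottom-right, center} points

2 - Bounding box = {width, height, center} point ~ this variant is fast and also performs well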
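
The second formulation can be sketched as a decoding step: take local peaks of a center heat map and read the predicted width and height at each peak. This is a simplified single-class version under stated assumptions: a 3 x 3 neighborhood for peak detection, a hypothetical score threshold of 0.5, coordinates in heat-map cells, and no sub-pixel offset correction.

```python
def decode_centers(heatmap, sizes, threshold=0.5):
    """Turn a center heat map plus per-cell (w, h) predictions into boxes.

    Returns a list of (x_min, y_min, x_max, y_max, score).
    """
    rows, cols = len(heatmap), len(heatmap[0])
    boxes = []
    for y in range(rows):
        for x in range(cols):
            score = heatmap[y][x]
            if score < threshold:
                continue
            neighbourhood = [heatmap[j][i]
                             for j in range(max(0, y - 1), min(rows, y + 2))
                             for i in range(max(0, x - 1), min(cols, x + 2))]
            if score < max(neighbourhood):
                continue                      # not a local peak
            w, h = sizes[y][x]
            boxes.append((x - w / 2, y - h / 2, x + w / 2, y + h / 2, score))
    return boxes


heat = [[0.1, 0.6, 0.1],
        [0.2, 0.9, 0.3],
        [0.1, 0.2, 0.1]]
sizes = [[(0, 0)] * 3 for _ in range(3)]
sizes[1][1] = (2, 2)                          # size predicted at the peak
found = decode_centers(heat, sizes)
```

The 0.6 cell passes the threshold but is suppressed because its neighbor scores higher, so only the box around the 0.9 peak is returned.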
