1. object detection and segmentation on COCO도 잘하고
2. Semantic segmentation on ADE20K 도 잘하고
3. model efficiency도 낫다
In addition,
Hybrid model = CNN + self attention이 연구되고 있다.
Prior to ViT, the focus was on augmenting a ConvNet with self-attention/non-local modules [8, 55, 66, 79] to capture long-range dependencies
The original ViT [20] first studied a hybrid configuration, and a large body of follow-up works focused on reintroducing convolutional priors to ViT, either in an explicit [15, 16, 21, 82, 86, 88] or implicit [45] fashion.
최근 동향
결론
여전히 ConvNet의 구조는 competible하다!
댓글