https://github.com/apple/ml-fastvit?s=09
https://arxiv.org/pdf/2303.14189.pdf
In this work, we introduce FastViT, a hybrid vision transformer architecture that obtains the state-of-the-art latency-accuracy trade-off.
To this end, we introduce a novel token mixing operator, RepMixer, a building block of FastViT, that uses structural reparameterization to lower the memory access cost by removing skip-connections in the network.
We further apply train-time overparametrization and large kernel convolutions to boost accuracy, and empirically show that these choices have minimal effect on latency.
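The core idea behind RepMixer's reparameterization can be illustrated with a minimal sketch: at train time the mixer computes `y = x + DWConv(x)` (a skip connection around a depthwise convolution), and at inference the skip is folded into the kernel by adding an identity (centered delta) kernel, leaving a single branch with lower memory access cost. This is a simplified toy in NumPy, not the paper's implementation: the real RepMixer also folds BatchNorm statistics into the kernel, which is omitted here.

```python
import numpy as np

def depthwise_conv3x3(x, k):
    # x: (C, H, W) feature map, k: (C, 3, 3) per-channel kernels, zero padding 1
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((C, H, W))
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + 3, j:j + 3] * k[c])
    return out

def reparameterize(k):
    # Fold the identity skip connection into the kernel:
    # adding 1 at the kernel center makes conv(x, k') == x + conv(x, k).
    k2 = k.copy()
    k2[:, 1, 1] += 1.0
    return k2

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 4))
k = rng.standard_normal((2, 3, 3))

train_out = x + depthwise_conv3x3(x, k)          # two-branch train-time form
infer_out = depthwise_conv3x3(x, reparameterize(k))  # single-branch inference form
assert np.allclose(train_out, infer_out)
```

The equivalence holds exactly because convolving with a centered delta kernel is the identity map, so the skip-connection branch can be absorbed into the conv weights after training with no change in the function computed.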