
[FastViT] Vision Transformer from Apple

by 에아오요이가야 2023. 8. 18.

https://github.com/apple/ml-fastvit?s=09 

 

https://arxiv.org/pdf/2303.14189.pdf

 

In this work, we introduce FastViT, a hybrid vision transformer architecture that obtains the state-of-the-art latency-accuracy trade-off.

 

To this end, we introduce a novel token mixing operator, RepMixer, a building block of FastViT, that uses structural reparameterization to lower the memory access cost by removing skip-connections in the network.
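The key trick behind RepMixer is that a skip connection around a depthwise convolution can be folded into the convolution kernel itself at inference time, so the branch (and its memory traffic) disappears. This is not the paper's actual implementation, just a minimal NumPy sketch of the reparameterization identity on a 1-D depthwise convolution: adding the input back is equivalent to adding an identity (delta) tap to the kernel center.

```python
import numpy as np

def dwconv1d(x, k):
    """Depthwise 1-D convolution with 'same' zero padding.
    x: (C, L) input, k: (C, K) one kernel per channel."""
    C, L = x.shape
    K = k.shape[1]
    pad = K // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(L):
            out[c, i] = np.dot(xp[c, i:i + K], k[c])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))   # 4 channels, length 16
k = rng.standard_normal((4, 3))    # 3-tap depthwise kernels

# Train time: depthwise conv plus a skip connection (two branches).
y_train = dwconv1d(x, k) + x

# Inference: fold the skip into the kernel by adding 1 to the center
# tap (a delta kernel is the identity for 'same'-padded convolution).
k_rep = k.copy()
k_rep[:, 1] += 1.0
y_infer = dwconv1d(x, k_rep)       # single branch, no skip connection

assert np.allclose(y_train, y_infer)
```

Because the two forms compute identical outputs, the network can train with the skip connection but deploy without it, which is where the memory-access savings come from.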

 

We further apply train-time overparameterization and large kernel convolutions to boost accuracy, and empirically show that these choices have minimal effect on latency.
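Train-time overparameterization follows the same logic: extra linear branches give the optimizer more capacity during training, but since composing linear maps is itself a linear map, they collapse algebraically into one operator before deployment. A minimal sketch of that folding (plain matrices standing in for the paper's convolutional branches, which are an assumption here):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((8,))

# Train time: an overparameterized stack of two linear layers.
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((8, 16))
y_train = W2 @ (W1 @ x)

# Inference: fuse the stack into a single dense matrix once,
# so the deployed model pays for only one matmul.
W_fused = W2 @ W1
y_infer = W_fused @ x

assert np.allclose(y_train, y_infer)
```

This is why the accuracy gains come "for free" at inference: the fused model is structurally identical to a non-overparameterized one.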

 

 
