On Going/Deep Learning
[FastVit] Vision Transformer from APPLE
에아오요이가야
2023. 8. 18. 14:09
https://github.com/apple/ml-fastvit?s=09
https://arxiv.org/pdf/2303.14189.pdf
In this work, we introduce FastViT, a hybrid vision transformer architecture that obtains the state-of-the-art latency-accuracy trade-off.
To this end, we introduce a novel token mixing operator, RepMixer, a building block of FastViT, that uses structural reparameterization to lower the memory access cost by removing skip-connections in the network.
We further apply train-time overparameterization and large kernel convolutions to boost accuracy and empirically show that these choices have minimal effect on latency.
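The core idea behind RepMixer is that a train-time skip connection, y = x + DWConv(x), can be folded into a single depthwise convolution at inference by adding an identity kernel to the learned kernel, removing the skip connection and its memory access cost. The following is a minimal NumPy sketch of that folding for one channel with a 3x3 kernel (the function and variable names are my own illustration, not the paper's code):

```python
import numpy as np

def depthwise_conv3x3(x, k):
    """Single-channel 3x3 convolution with zero padding 1 (naive loop)."""
    H, W = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))   # one spatial channel
k = rng.standard_normal((3, 3))   # learned depthwise kernel

# Train-time branch: skip connection, y = x + conv(x)
y_train = x + depthwise_conv3x3(x, k)

# Inference: fold the identity into the kernel center
# (a 3x3 kernel with center 1 is the identity map), then drop the skip
k_rep = k.copy()
k_rep[1, 1] += 1.0
y_infer = depthwise_conv3x3(x, k_rep)

print(np.allclose(y_train, y_infer))  # → True
```

The same folding logic (with batch-norm fusion on top) is what lets FastViT train with skip connections and overparameterized branches but run a plain convolution at inference.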