
arxiv:2111.06377

Masked Autoencoders Are Scalable Vision Learners

Published on Nov 11, 2021

Abstract

Masked autoencoders, using an asymmetric architecture, efficiently train large models for computer vision that achieve high accuracy and superior transfer performance.

This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core designs. First, we develop an asymmetric encoder-decoder architecture, with an encoder that operates only on the visible subset of patches (without mask tokens), along with a lightweight decoder that reconstructs the original image from the latent representation and mask tokens. Second, we find that masking a high proportion of the input image, e.g., 75%, yields a nontrivial and meaningful self-supervisory task. Coupling these two designs enables us to train large models efficiently and effectively: we accelerate training (by 3x or more) and improve accuracy. Our scalable approach allows for learning high-capacity models that generalize well: e.g., a vanilla ViT-Huge model achieves the best accuracy (87.8%) among methods that use only ImageNet-1K data. Transfer performance in downstream tasks outperforms supervised pre-training and shows promising scaling behavior.
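The masking step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names (`patchify`, `random_masking`, `masked_mse`), the 16-pixel patch size, and the choice to average the reconstruction error over masked patches only are assumptions made for the example. Only the 75% mask ratio and the idea that the encoder sees just the visible patches come from the abstract.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into N flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    # (gh, ps, gw, ps, C) -> (gh, gw, ps, ps, C) -> (N, ps * ps * C)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def random_masking(patches, mask_ratio=0.75, rng=None):
    """Keep a random subset of patches; return it with its indices and a mask.

    mask[i] is True when patch i is hidden from the encoder.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], keep_idx, mask


def masked_mse(pred, target, mask):
    """Mean squared pixel error, averaged over masked patches only (assumed)."""
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return per_patch[mask].mean()


# Hypothetical usage: a 224x224 RGB image with 16x16 patches gives 196 patches,
# of which the encoder would receive only 49 at a 75% mask ratio.
image = np.random.default_rng(0).random((224, 224, 3))
patches = patchify(image, 16)
visible, keep_idx, mask = random_masking(patches, 0.75, np.random.default_rng(0))
print(patches.shape, visible.shape, int(mask.sum()))
```

Because the encoder input shrinks to a quarter of the patch sequence, most of the per-image compute is spent in the lightweight decoder and on a short encoder sequence, which is consistent with the training speedup the abstract reports.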


