3-line summary

Introduction

  1. presenting a simple yet surprisingly powerful general approach to visual prompting (MAE-VQGAN).
  2. providing a new dataset that lets a model learn grid-structured images (e.g. example input-output pairs and a query arranged in one image) without any labels, task descriptions, or additional information about the grid layout.
  3. showing that while using our new dataset for training is essential, adding more generic image data from other sources further improves the results.
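In this approach, a visual prompt places task examples and a query in a single image, and the missing output is filled in as an inpainting problem. The following is a minimal sketch of how such a grid prompt could be assembled. It assumes a 2x2 layout with one example pair, 111x111 cells and a 2-pixel separator. These sizes and the helper name `make_visual_prompt` are illustrative assumptions, not details taken from these notes. The model that inpaints the masked cell (e.g. MAE-VQGAN) is not shown.

```python
import numpy as np


def make_visual_prompt(example_in, example_out, query, pad=2):
    """Arrange one example pair and a query into a 2x2 canvas (hypothetical helper).

    Assumed layout: top-left = example input, top-right = example output,
    bottom-left = query, bottom-right = empty cell to be inpainted.
    Returns the canvas and a boolean mask that is True on the cell to inpaint.
    """
    h, w, c = query.shape
    for img in (example_in, example_out):
        if img.shape != query.shape:
            raise ValueError("all cells must share the same shape")

    H, W = 2 * h + pad, 2 * w + pad
    canvas = np.zeros((H, W, c), dtype=query.dtype)  # separator pixels stay 0
    mask = np.zeros((H, W), dtype=bool)

    # Top row: the example input/output pair that defines the task.
    canvas[:h, :w] = example_in
    canvas[:h, w + pad:] = example_out
    # Bottom row: the query, plus the empty target cell marked in the mask.
    canvas[h + pad:, :w] = query
    mask[h + pad:, w + pad:] = True
    return canvas, mask


# Usage with random stand-in images (assumed 111x111 RGB cells).
rng = np.random.default_rng(0)
cells = [rng.integers(0, 256, size=(111, 111, 3), dtype=np.uint8) for _ in range(3)]
canvas, mask = make_visual_prompt(*cells)
print(canvas.shape, mask.sum())  # (224, 224, 3) 12321
```

The mask marks exactly the region an inpainting model would be asked to predict, so the "task" is conveyed purely by the image layout rather than by labels or text.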

MAE-VQGAN model