3-line summary
Features learned via self-supervision are not 'ready for use': they must be adapted to a given downstream task by fine-tuning.
In NLP, prompting lets a model be employed for a new task without any fine-tuning (additional training).
→ Is it possible to build a single general model that can perform a wide range of tasks without any fine-tuning?
Visual prompt task → look at the two images on top, infer the rule, then apply that rule to the bottom-left image and inpaint the hole in the bottom-right
MAE (Masked Autoencoder), VQGAN (a transformer-based image generation model)
image inpainting model using visual prompting
input: image $(x)$ and binary mask $(m)$
output: new image with the masked regions filled $(y)$
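The visual prompt itself can be sketched as a 2×2 grid: the example pair on top, the query on the bottom-left, and a masked hole on the bottom-right that the inpainting model fills. Below is a minimal sketch in numpy of building the input pair $(x, m)$; `make_visual_prompt` is a hypothetical helper, not the paper's actual code.

```python
import numpy as np

def make_visual_prompt(example_in, example_out, query, h=224, w=224):
    """Hypothetical sketch: arrange three images into a 2x2 canvas x
       [example_in | example_out]
       [query      |   (hole)   ]
    and return a binary mask m that marks the bottom-right quadrant,
    the region the inpainting model must fill to produce y."""
    canvas = np.zeros((2 * h, 2 * w, 3), dtype=np.float32)
    canvas[:h, :w] = example_in    # top-left: task input example
    canvas[:h, w:] = example_out   # top-right: task output example
    canvas[h:, :w] = query         # bottom-left: new query image
    mask = np.zeros((2 * h, 2 * w), dtype=np.uint8)
    mask[h:, w:] = 1               # bottom-right: masked hole
    return canvas, mask

# Usage: random stand-ins for real images
h = w = 224
imgs = [np.random.rand(h, w, 3).astype(np.float32) for _ in range(3)]
x, m = make_visual_prompt(*imgs, h=h, w=w)
```

An inpainting model would then take `(x, m)` and predict pixels for the masked quadrant; reading that quadrant back out gives the task result for the query.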