SEMINAR

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

Joohoon Lee
2023.03.21
Self-supervised Learning
Multi-Modal
VENUE: 2021 NeurIPS
PAPER LINK: OpenReview