SEMINAR

End-to-End Test-Time Training for Long Context

Hansol Jeong

2026.04.30

Natural Language Processing

VENUE: 2025 arXiv

PAPER LINK: arXiv

Overview

The model is updated continuously at test time so that it learns the information in its context.
Test-Time Training
- Parameters are updated at every step using the next-token prediction loss
- Context information accumulates in the model parameters
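
The per-step update above can be sketched on a toy model. This is a minimal illustration, not the paper's implementation: the "model" is assumed to be a single vector of context-independent logits, and `softmax`, `ttt_step`, `run_ttt`, and the learning rate are hypothetical names and values chosen for the example.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ttt_step(logits, target, lr):
    """One gradient step on the next-token loss -log p[target]."""
    probs = softmax(logits)
    loss = -math.log(probs[target])
    # For softmax + cross-entropy: d(loss)/d(logit_i) = p_i - 1[i == target]
    new_logits = [
        z - lr * (p - (1.0 if i == target else 0.0))
        for i, (z, p) in enumerate(zip(logits, probs))
    ]
    return new_logits, loss

def run_ttt(tokens, vocab_size, lr=0.5):
    logits = [0.0] * vocab_size  # toy parameters (assumption: no context input)
    losses = []
    for tok in tokens:
        # Predict the token, record the loss, then update on it.
        logits, loss = ttt_step(logits, tok, lr)
        losses.append(loss)
    return logits, losses

context = [2, 2, 1, 2, 2, 2, 1, 2]
logits, losses = run_ttt(context, vocab_size=4)
```

As the loop consumes the context, the logits drift toward the token statistics of that context, which is the sense in which information is stored in the parameters.
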
TTT-E2E Architecture
- outer loop: learns the initial parameters (meta-learning)
- inner loop: sequential updates at test time
- Trained end-to-end with the NTP (next-token prediction) loss, with no separate auxiliary loss
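
The relationship between the two loops can be sketched with a scalar toy problem. Everything here is an assumption made for illustration: a single parameter `w`, a squared-error loss standing in for the NTP loss, and hand-derived gradients instead of automatic differentiation. The point it shows is that the outer loop differentiates the test-time losses with respect to the initial parameter, through the inner updates.

```python
def inner_loop(w0, xs, inner_lr):
    """Run the test-time updates from w0.

    Returns the summed per-step loss and its derivative with respect to w0.
    """
    w, dw_dw0 = w0, 1.0
    total_loss, grad_w0 = 0.0, 0.0
    for x in xs:
        err = w - x
        total_loss += 0.5 * err * err   # loss measured before updating on x
        grad_w0 += err * dw_dw0         # chain rule through earlier updates
        w -= inner_lr * err             # inner (test-time) gradient step
        dw_dw0 *= 1.0 - inner_lr        # d(w_next)/d(w) = 1 - inner_lr
    return total_loss, grad_w0

def meta_train(sequences, w0=0.0, inner_lr=0.3, outer_lr=0.05, epochs=200):
    """Outer loop: adjust the initial parameter so the inner loop does well."""
    for _ in range(epochs):
        for xs in sequences:
            _, grad = inner_loop(w0, xs, inner_lr)
            w0 -= outer_lr * grad
    return w0

sequences = [[1.0, 1.2, 0.9], [1.1, 0.8, 1.0]]
w0_learned = meta_train(sequences)
```

The same loss drives both loops in this sketch, so no auxiliary objective is needed; in a real model the hand-written derivative would be replaced by automatic differentiation through the update steps.
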
Mini-batch TTT
- Updates are applied per mini-batch of tokens rather than per token
- Improves stability and efficiency
- Structured around a sliding window
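
A mini-batch update can be sketched as one step on the gradient averaged over a block of tokens. This is a toy under stated assumptions: the parameters are context-independent logits, the attention component is omitted, and `batch_grad`, `minibatch_ttt`, and the hyperparameters are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def batch_grad(logits, batch):
    """Average cross-entropy gradient over the tokens of one mini-batch."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for tok in batch:
        for i, p in enumerate(probs):
            grad[i] += (p - (1.0 if i == tok else 0.0)) / len(batch)
    return grad

def minibatch_ttt(tokens, vocab_size, batch_size, lr=0.5):
    logits = [0.0] * vocab_size
    num_updates = 0
    for start in range(0, len(tokens), batch_size):
        batch = tokens[start:start + batch_size]
        # All tokens in the batch are scored with the same parameters,
        # then a single update is applied.
        grad = batch_grad(logits, batch)
        logits = [z - lr * g for z, g in zip(logits, grad)]
        num_updates += 1
    return logits, num_updates

context = [2, 2, 1, 2, 2, 2, 1, 2]
mb_logits, mb_updates = minibatch_ttt(context, vocab_size=4, batch_size=4)
```

With `batch_size=1` the loop reduces to the per-token update; larger batches mean fewer, less noisy updates, and the gradients inside a batch can be computed in parallel.
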
Sliding Window Attention
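- Attention is computed only within a limited window
- The window size is set larger than the batch size (window size > batch size) to preserve information flow
- Enables stable processing even for long contexts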
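
The window constraint can be checked with a small mask sketch. One reading of the condition, assumed here, is that tokens in the current mini-batch have not yet been written into the weights, so each token should still reach the earlier tokens of its own mini-batch through attention. The helper names are hypothetical.

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when query position i may attend to key position j."""
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]

def sees_own_batch(seq_len, window, batch_size):
    """True if every token can attend to all earlier tokens of its mini-batch."""
    mask = sliding_window_mask(seq_len, window)
    for start in range(0, seq_len, batch_size):
        end = min(start + batch_size, seq_len)
        for i in range(start, end):
            if not all(mask[i][j] for j in range(start, i + 1)):
                return False
    return True

ok_wide = sees_own_batch(seq_len=12, window=5, batch_size=4)    # window > batch
ok_narrow = sees_own_batch(seq_len=12, window=3, batch_size=4)  # window < batch
```

In this sketch a window wider than the batch keeps the whole current mini-batch visible, while a narrower window drops the earliest tokens of the batch before they have been absorbed by any parameter update.
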
Efficient Update Strategy
- Only the MLP layers are updated at test time
- Only a subset of the blocks (about 1/4) is selectively updated
- Some MLPs are kept frozen to mitigate catastrophic forgetting
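
Selective updating can be sketched as a boolean flag per parameter group. The summary does not say which quarter of the blocks is chosen; picking the last quarter, the parameter naming scheme, and the helper functions below are all assumptions for illustration.

```python
def trainable_flags(num_blocks, fraction=0.25):
    """Mark which parameter groups receive test-time updates.

    Assumption: the updated blocks are the last `fraction` of the stack.
    """
    num_updated = max(1, round(num_blocks * fraction))
    first_updated = num_blocks - num_updated
    flags = {}
    for b in range(num_blocks):
        flags[f"block{b}.attn"] = False              # attention is never updated
        flags[f"block{b}.mlp"] = b >= first_updated  # only selected MLPs
    return flags

def apply_update(params, grads, flags, lr):
    """Gradient step on the flagged parameters; all others stay frozen."""
    return {
        name: (value - lr * grads[name]) if flags[name] else value
        for name, value in params.items()
    }

flags = trainable_flags(num_blocks=8)
params = {name: 1.0 for name in flags}   # scalar stand-ins for weight tensors
grads = {name: 1.0 for name in flags}
updated = apply_update(params, grads, flags, lr=0.1)
```

Frozen groups keep their pretrained values exactly, so whatever they encode cannot be overwritten by the context, which is the intuition behind limiting forgetting.
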