SEMINAR

SHARP: Steering Hallucination in LVLMs via Representation Engineering

Daeun Moon

2026.04.17

LVLM

Multi-Modal

VENUE: EMNLP 2025

Overview

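LVLM hallucination has several distinct causes, but they are often handled without distinction
The main causes fall into two categories
- Over-reliance on the textual prior
- Conflict between vision and context
The core problem is to distinguish these causes in the model's internal representations and to exploit that distinction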

Proposes a method that separates hallucination by cause and controls it at the representation level
Cause-specific Analysis
- Decompose hallucination into two causes: textual prior and vision-context conflict
- Construct faithful and hallucinated data for each cause
- Analyze whether the two cases are distinguishable in the representations
Representation Separability
- Linear probing on hidden states confirms that both the presence of hallucination and its cause can be distinguished
- The most discriminative features appear in the middle layers
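
The probing step above can be sketched as a small logistic-regression classifier trained on hidden-state vectors. This is only an illustrative sketch, not the paper's implementation: the hidden states here are synthetic Gaussian vectors, and the helper names (`train_linear_probe`, `probe_accuracy`, `make_toy_hidden_states`) and all hyperparameters are assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    # Clamp the logit so math.exp cannot overflow on extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_linear_probe(features, labels, lr=0.1, epochs=100):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss w.r.t. the logit
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, features, labels):
    """Fraction of examples whose predicted label matches the true label."""
    correct = 0
    for x, y in zip(features, labels):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((z > 0) == bool(y))
    return correct / len(labels)


def make_toy_hidden_states(n, dim, shift, seed):
    """Synthetic stand-in for hidden states: label 1 = hallucinated, 0 = faithful."""
    rng = random.Random(seed)
    features, labels = [], []
    for i in range(n):
        y = i % 2
        center = shift if y == 1 else -shift
        features.append([rng.gauss(center, 1.0) for _ in range(dim)])
        labels.append(y)
    return features, labels


train_x, train_y = make_toy_hidden_states(n=200, dim=8, shift=2.0, seed=0)
test_x, test_y = make_toy_hidden_states(n=100, dim=8, shift=2.0, seed=1)
w, b = train_linear_probe(train_x, train_y)
accuracy = probe_accuracy(w, b, test_x, test_y)
print(f"held-out probe accuracy: {accuracy:.2f}")
```

To compare layers, the same probe would be trained once per layer on that layer's hidden states, and the held-out accuracies compared across layers.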
Steering Vector Extraction
- Define the vector as the difference between the mean truthful representation and the mean hallucinated representation
- Generate a separate steering vector for each cause
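
The mean-difference definition above can be written out as a short sketch. It assumes the sign convention truthful minus hallucinated, so that adding the vector moves a representation toward the truthful side; the slides state only that the vector is the difference of the two means. The toy 2-dimensional values and the names `mean_vector`, `steering_vector`, and `steering_vectors` are made up for illustration.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_vector(truthful_states, hallucinated_states):
    """Mean truthful representation minus mean hallucinated representation.

    The sign convention (truthful - hallucinated) is an assumption of this sketch.
    """
    mu_truthful = mean_vector(truthful_states)
    mu_hallucinated = mean_vector(hallucinated_states)
    return [t - h for t, h in zip(mu_truthful, mu_hallucinated)]


# Toy hidden states (2 dimensions), one truthful/hallucinated pair of sets per cause.
toy_states = {
    "textual_prior": {
        "truthful": [[1.0, 0.0], [3.0, 2.0]],
        "hallucinated": [[0.0, 1.0], [0.0, 3.0]],
    },
    "vision_context_conflict": {
        "truthful": [[0.0, 4.0], [2.0, 2.0]],
        "hallucinated": [[1.0, 1.0], [1.0, 1.0]],
    },
}

# One steering vector per cause.
steering_vectors = {
    cause: steering_vector(sets["truthful"], sets["hallucinated"])
    for cause, sets in toy_states.items()
}
print(steering_vectors)
```

In practice the input sets would be hidden states collected at the chosen layer from the faithful and hallucinated examples built for each cause.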
Inference-time Intervention
- Add the steering vector directly to the hidden state at a specific layer
- Applied only at inference time, with no additional training
- Controlled through the intervention strength and the vector weights
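
One way to read the intervention above is h' = h + strength * sum_c(weight_c * v_c), where v_c is the steering vector for cause c. The slides do not give the exact combination rule, so this formula, the function name `steer_hidden_state`, and the toy values are assumptions of this sketch rather than the paper's definition.

```python
def steer_hidden_state(hidden, vectors, weights, strength):
    """Add a weighted combination of per-cause steering vectors to a hidden state.

    hidden:   hidden state at the chosen layer (list of floats)
    vectors:  mapping from cause name to steering vector
    weights:  mapping from cause name to vector weight (missing causes get 0)
    strength: global intervention strength
    """
    combined = [0.0] * len(hidden)
    for cause, vector in vectors.items():
        weight = weights.get(cause, 0.0)
        for i, value in enumerate(vector):
            combined[i] += weight * value
    # The model's parameters are untouched; only the activation is shifted.
    return [h + strength * c for h, c in zip(hidden, combined)]


vectors = {
    "textual_prior": [2.0, -1.0],
    "vision_context_conflict": [0.0, 2.0],
}
weights = {"textual_prior": 0.5, "vision_context_conflict": 0.5}

hidden = [1.0, 1.0]
steered = steer_hidden_state(hidden, vectors, weights, strength=2.0)
print(steered)
```

Setting `strength` to 0 recovers the original hidden state, so the intervention can be switched off without changing the model. In a real model this shift would typically be applied inside the forward pass of the selected layer, for example through a forward hook in PyTorch.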