Lost in Translation: Do LVLM Judges Generalize Across Languages?

Paper overview

Automatic evaluators such as reward models play a central role in the alignment and evaluation of large vision-language models (LVLMs). Despite their growing importance, these evaluators are almost exclusively assessed…

Canonical key

arxiv:2604.19405

Merged sources

arXiv

Authors

Md Tahmid Rahman Laskar, Mohammed Saidul Islam, Mir Tafseer Nayeem, Amran Bhuiyan, Mizanur Rahman, Shafiq Joty, Enamul Hoque, Jimmy Huang

Categories

cs.CL

Tags

Evaluation / Methods

Topics

Benchmark / Evaluation

First seen

2026-04-22 11:37:03 (UTC+08:00)

Sources and external links

arXiv PDF

Feed history

LLM

2026-04-22

2026-04-22 11:37:03 (Asia/Shanghai)

Score 98 · summary matched "reasoning"; summary matched "alignment"; summary matched "benchmark"

Related recommendations

Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents

General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

Beyond Rating: A Comprehensive Evaluation and Benchmark for AI Reviews

A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding

Multi-modal Reasoning with LLMs for Visual Semantic Arithmetic

QuantCode-Bench: A Benchmark for Evaluating the Ability of Large Language Models to Generate Executable Algorithmic Trading Strategies