Keyword Tracking

Keyword tracking: computer-use agent

This page tracks the keywords in your configuration over the long term and archives the matching papers by date.

Recent trend

The most recent hit came from LM: Uncertainty Quantification for Computer-Use Agents: A Benchmark across Vision-Language Models and GUI Grounding Datasets

(Trend chart, date axis from 2026-06-15 to 2026-06-28.)

Hit details

Review, by date, the titles of papers that matched this keyword, with the source feed information preserved.

2026-06-25

2026-06-25 13:11:21 (Asia/Shanghai)

Uncertainty Quantification for Computer-Use Agents: A Benchmark across Vision-Language Models and GUI Grounding Datasets

Computer-use agents turn vision-language model (VLM) predictions into executable GUI clicks, so reliable uncertainty estimates are essential for rejection, calibration, miss-sever…

2026-06-24

2026-06-24 13:06:49 (Asia/Shanghai)

Reinforcement Learning for Computer-Use Agents with Autonomous Evaluation

Computer-Use Agents (CUAs) execute high-level user goals by perceiving and acting directly within graphical user interfaces. However, reinforcement learning for CUAs remains diffi…

2026-06-23

2026-06-23 13:10:02 (Asia/Shanghai)

Agent Runtime Security

Capable but Careless: Do Computer-Use Agents Follow Contextual Integrity?

Computer-use agents (CUAs) now act on a user's behalf across personal applications such as email, calendars, and to-do lists. This cross-application access is useful, but it also…

2026-06-16

2026-06-16 14:38:43 (Asia/Shanghai)

Agent Runtime Security

MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents

Current benchmarks for computer-use agents evaluate models in impersonal environments. This leaves a gap between evaluation and deployment where personal assistants are expected t…

2026-06-12

2026-06-12 13:55:02 (Asia/Shanghai)

Agent Runtime Security

ComAct: Reframing Professional Software Manipulation via COM-as-Action Paradigm

Existing computer-use agents remain fundamentally limited in professional software manipulation: GUI-based agents suffer from fragile visual grounding and long-horizon error accum…

2026-06-10

2026-06-10 13:25:04 (Asia/Shanghai)

Agent Runtime Security

Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields

Recent years have witnessed the rapid evolution of AI agents toward handling increasingly complex, real-world tasks. However, existing benchmarks rarely evaluate whether agents ca…

Agent Runtime Security

Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories

Data tells stories that shape society; the data journalist's job is to turn raw information into stories non-experts can trust. A high-quality news feature takes a newsroom team w…

2026-06-09

2026-06-09 13:12:49 (Asia/Shanghai)

Agent Runtime Security

WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces

Computer-use agents (CUAs) increasingly operate in runtimes that combine visual desktop control, command-line execution, code editing, browsers, and external tools. Existing bench…

2026-06-03

2026-06-03 14:09:56 (Asia/Shanghai)

Agent Runtime Security

MedCUA-Bench: A Screenshot-Only Benchmark for Clinical Computer-Use Agents

Computer-use agents could automate repetitive screen-based clinical work, but their reliability in medical graphical user interfaces remains largely unvalidated. Existing benchmar…

2026-05-28

2026-05-28 13:15:52 (Asia/Shanghai)

Agent Runtime Security

Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents

Computer-use agents (CUAs) have recently made substantial progress, but deploying a separate large expert for each software domain remains expensive. Small open computer-use agent…

2026-05-26

2026-05-26 13:09:24 (Asia/Shanghai)

Agent Runtime Security

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

Reinforcement learning with verifiable rewards (RLVR) has driven breakthroughs in domains such as math, tool-use, and software engineering, yet its extension to computer-use agent…

Agent Runtime Security

AgentHijack: Benchmarking Computer Use Agent Robustness to Common Environment Corruptions

Autonomous computer use agents that powered by multimodal large language models (MLLMs) are emerging as capable assistants for completing complex digital workflows. However, real-…

2026-05-21

2026-05-21 13:14:24 (Asia/Shanghai)

Agent Runtime Security

Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling

Computer-use agents (CUA) automate tasks specified with natural language such as "order the cheapest item from Taco Bell" by generating sequences of calls to tools such as click,…

2026-05-20

2026-05-20 13:10:58 (Asia/Shanghai)

Agent Runtime Security

OpenComputer: Verifiable Software Worlds for Computer-Use Agents

We present OpenComputer, a verifier-grounded framework for constructing verifiable software worlds for computer-use agents. OpenComputer integrates four components: (1) app-specif…

Agent Runtime Security

TwinRouterBench: Fast Static and Live Dynamic Evaluation for Realistic Agentic LLM Routing

LLM routing matters most in long-horizon applications such as coding agents, deep research systems, and computer-use agents, where a single user request triggers many model calls.…

2026-05-08

2026-05-08 14:15:32 (Asia/Shanghai)

Agent Runtime Security

Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

Self-hosted computer-use agents (SHCUAs), such as OpenClaw, combine natural-language interaction with direct access to host-side resources, including browsers, files, scripts, sys…

2026-05-01

2026-05-01 12:53:56 (Asia/Shanghai)

Exploring Interaction Paradigms for LLM Agents in Scientific Visualization

This paper examines how different types of large language model (LLM) agents perform on scientific visualization (SciVis) tasks, where users generate visualization workflows from…

2026-04-08

2026-04-08 17:10:24 (Asia/Shanghai)

LLM

Gym-Anything: Turn any Software into an Agent Environment

Computer-use agents hold the promise of assisting in a wide range of digital economic activities. However, current research has largely focused on short-horizon tasks over a limit…