Latest hit from Agent Runtime Security: Jailbreaking for the Average Jane: Choosing Optimal Jailbreaks via Bandit Algorithms for Automatically Enhanced Queries

7 / 7d | 26 / 30d | 42 / all

Topic

guardrail

Latest hit from Agent Runtime Security: Do Safety Guardrails Need to Reason? LeanGuard: A Fast and Light Approach for Robust Moderation

8 / 7d | 24 / 30d | 40 / all

Topic

prompt injection

Latest hit from LM: Prompt Injection in Automated Résumé Screening with Large Language Models: Single and Multi-Injection Settings

5 / 7d | 21 / 30d | 31 / all

Topic

SWE-bench

Latest hit from Terminal and SWE Agents: To Run or Not to Run: Analyzing the Cost-Effectiveness of Code Execution in LLM-Based Program Repair

3 / 7d | 16 / 30d | 29 / all

Topic

computer-use agent

Latest hit from LM: Uncertainty Quantification for Computer-Use Agents: A Benchmark across Vision-Language Models and GUI Grounding Datasets

3 / 7d | 9 / 30d | 18 / all

Topic

Terminal-Bench

Latest hit from Terminal and SWE Agents: LemonHarness Technical Report

2 / 7d | 7 / 30d | 12 / all

Topic

code agent

Latest hit from Terminal and SWE Agents: How Much Static Structure Do Code Agents Need? A Study of Deterministic Anchoring

1 / 7d | 10 / 30d | 11 / all

Topic

instruction tuning

Latest hit from LM: SARA: Unlocking Multilingual Knowledge in Mixture-of-Experts via Semantically Anchored Routing Alignment

2 / 7d | 8 / 30d | 11 / all

Topic

repository-level

Latest hit from Terminal and SWE Agents: Evaluating LLMs on Real-World Software Performance Optimization

2 / 7d | 7 / 30d | 10 / all

Topic

in-context learning

Latest hit from LM: MedGuards: Multi-Agent System for Reliable Medical Error Detection and Correction

2 / 7d | 7 / 30d | 9 / all

Topic

agent runtime

Latest hit from Agent Runtime Security: Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents

0 / 7d | 3 / 30d | 8 / all

Topic

indirect prompt injection

Latest hit from Agent Runtime Security: CodeSentinel: A Three-Layer Defense Against Indirect Prompt Injection in Code Contexts

0 / 7d | 3 / 30d | 6 / all

Topic

policy enforcement

Latest hit from LM: A Technical Taxonomy of LLM Agent Communication Protocols

0 / 7d | 2 / 30d | 5 / all

Topic

code editing

Latest hit from Agent Runtime Security: WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces

0 / 7d | 1 / 30d | 4 / all

Topic

code generation benchmark

Latest hit from Terminal and SWE Agents: VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination

0 / 7d | 3 / 30d | 4 / all

Topic

issue resolution

Latest hit from Terminal and SWE Agents: Unlocking Model Potentials Through Adaptive Multi-Agent Scaffolding for Efficient Issue Resolution

1 / 7d | 2 / 30d | 4 / all

Topic

program repair

Latest hit from Terminal and SWE Agents: Smaller Models, Unexpected Costs: Trade-offs in LLM Quantization for Automated Program Repair

2 / 7d | 3 / 30d | 4 / all

Topic

sandboxing

Latest hit from Agent Runtime Security: Burnyard: Future of Malware Analysis

1 / 7d | 3 / 30d | 4 / all

Topic

agent security

Latest hit from Agent Runtime Security: Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

0 / 7d | 3 / 30d | 3 / all

Topic

automated program repair

Latest hit from Terminal and SWE Agents: Smaller Models, Unexpected Costs: Trade-offs in LLM Quantization for Automated Program Repair

1 / 7d | 2 / 30d | 3 / all

Topic

data exfiltration

Latest hit from Agent Runtime Security: Securing LLM-Agent Long-Term Memory Against Poisoning: Non-Malleable, Origin-Bound Authority with Machine-Checked Guarantees

1 / 7d | 1 / 30d | 3 / all

Topic

secure agent

Latest hit from Agent Runtime Security: Provably Secure Agent Guardrail

0 / 7d | 0 / 30d | 3 / all

Topic

SWE bench

Latest hit from Terminal and SWE Agents: Exploration Structure in LLM Agents for Multi-File Change Localization

0 / 7d | 2 / 30d | 3 / all

Topic

terminal agent

Latest hit from Terminal and SWE Agents: Tmax: A simple recipe for terminal agents

1 / 7d | 2 / 30d | 3 / all

Topic

test generation

Latest hit from Terminal and SWE Agents: Knowledge Matters: Injecting Project and Testing Knowledge into LLM-based Unit Test Generation

0 / 7d | 2 / 30d | 3 / all

Topic

privilege escalation

Latest hit from Agent Runtime Security: Seeing Is Not Screening: Multimodal Hidden Instruction Attacks on Agent Skill Scanners

0 / 7d | 1 / 30d | 2 / all

Topic

retrieval augmented generation

Latest hit from LM: Probabilistic Agents in Deterministic Audits: Evaluating Multi-Agent Systems for Automated Audits Based on the German IT-Grundschutz

1 / 7d | 2 / 30d | 2 / all

Topic

agent attack

Latest hit from Agent Runtime Security: Secure UAV Swarms in Low-Altitude Wireless Networks: Challenges and Solutions

0 / 7d | 0 / 30d | 1 / all

Topic

agent defense

Latest hit from Agent Runtime Security: SafeMCP: Proactive Power Regulation for LLM Agent Defense via Environment-Grounded Look-Ahead Reasoning

0 / 7d | 1 / 30d | 1 / all

Topic

agent isolation

Latest hit from LLM: RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents

0 / 7d | 0 / 30d | 1 / all

Topic

agent sandbox

Latest hit from Agent Runtime Security: DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback

0 / 7d | 0 / 30d | 1 / all

Topic

bug fixing

Latest hit from Terminal and SWE Agents: DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch

0 / 7d | 1 / 30d | 1 / all

Topic

code repair

Latest hit from Terminal and SWE Agents: SHERLOC: Structured Diagnostic Localization for Code Repair Agents

1 / 7d | 1 / 30d | 1 / all

Topic

LLM agent security

Latest hit from Agent Runtime Security: Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

0 / 7d | 1 / 30d | 1 / all

Topic

malicious tool

Latest hit from Agent Runtime Security: From Control Boundary to Insurance Claim: Reconstructing AI-Mediated Losses Through the CER Framework

0 / 7d | 1 / 30d | 1 / all

Topic

multimodal language model

Latest hit from LLM: Don't Show Pixels, Show Cues: Unlocking Visual Tool Reasoning in Language Models via Perception Programs

0 / 7d | 0 / 30d | 1 / all

Topic

patch generation

Latest hit from Agent Runtime Security: EviACT: An Evidence-to-Action Framework for Agentic Program Repair

0 / 7d | 0 / 30d | 1 / all

Topic

repository level

Latest hit from Terminal and SWE Agents: Dependency-Guided Repository-Level C-to-Rust Translation with Reinforcement Alignment

0 / 7d | 1 / 30d | 1 / all

Topic

runtime security

Latest hit from LLM: ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection

0 / 7d | 0 / 30d | 1 / all

Topic

shell agent

Latest hit from Agent Runtime Security: How Agentic AI Coding Assistants Become the Attacker's Shell

0 / 7d | 0 / 30d | 1 / all

Topic

software engineering agent

Latest hit from Terminal and SWE Agents: Same Signal, Different Semantics: A Cross-Framework Behavioral Analysis of Software Engineering Agents

0 / 7d | 0 / 30d | 1 / all

Topic

terminal bench

Latest hit from LM: What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design

0 / 7d | 0 / 30d | 1 / all

Topic

AI agent security

No hits yet; the page will keep tracking future archives.

0 / 7d | 0 / 30d | 0 / all

Topic

autonomous agent security

No hits yet; the page will keep tracking future archives.

0 / 7d | 0 / 30d | 0 / all

Topic

browser agent security

No hits yet; the page will keep tracking future archives.

0 / 7d | 0 / 30d | 0 / all

Topic

code agent security

No hits yet; the page will keep tracking future archives.

0 / 7d | 0 / 30d | 0 / all

Topic

command line agent

No hits yet; the page will keep tracking future archives.

0 / 7d | 0 / 30d | 0 / all

Topic

function calling security

No hits yet; the page will keep tracking future archives.

0 / 7d | 0 / 30d | 0 / all

Topic

MCP security

No hits yet; the page will keep tracking future archives.

0 / 7d | 0 / 30d | 0 / all

Topic

model context protocol security

No hits yet; the page will keep tracking future archives.

0 / 7d | 0 / 30d | 0 / all

Topic

software engineering benchmark

No hits yet; the page will keep tracking future archives.

0 / 7d | 0 / 30d | 0 / all

Topic

terminal benchmark

No hits yet; the page will keep tracking future archives.

0 / 7d | 0 / 30d | 0 / all

Topic

tool calling security

No hits yet; the page will keep tracking future archives.

0 / 7d | 0 / 30d | 0 / all

Topic

tool-use security

No hits yet; the page will keep tracking future archives.

0 / 7d | 0 / 30d | 0 / all

Topic

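Papers Gaining Momentum

These papers appear repeatedly across multiple dates or multiple feeds, which makes them better candidates for a long-term watchlist.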

Momentum

Lightweight Multimodal Adaptation of Vision Language Models for Species Recognition and Habitat Context Interpretation in Drone Thermal Imagery

This study proposes a lightweight multimodal adaptation framework to bridge the representation gap between RGB-pretrained VLMs and thermal infrared imagery, and demonstrates its practical utility using a real drone-coll…

1 day | 2 feeds | 2 hits

First seen: 2026-04-08 17:10:24 (UTC+08:00) | Last seen: 2026-04-08 17:10:24 (UTC+08:00)

Momentum

MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control

MLLMs have been successfully applied to multimodal embedding tasks, yet their generative reasoning capabilities remain underutilized. Directly incorporating chain-of-thought reasoning into embedding learning introduces…

1 day | 2 feeds | 2 hits

First seen: 2026-04-08 17:10:24 (UTC+08:00) | Last seen: 2026-04-08 17:10:24 (UTC+08:00)


Trends and Subscriptions Overview

Feed trend subscriptions

LM

LLM

Agent Runtime Security

Vision

Terminal and SWE Agents

PubMed AI

OpenAlex AI

Long-term keyword tracking

LLM

language model

benchmark

large language model

agent

evaluation

reasoning

RAG

alignment

coding agent

jailbreak

guardrail

prompt injection

SWE-bench

computer-use agent

Terminal-Bench

code agent

instruction tuning

repository-level

in-context learning

agent runtime

indirect prompt injection

policy enforcement

code editing

code generation benchmark

issue resolution

program repair

sandboxing

agent security

automated program repair

data exfiltration

secure agent

SWE bench

terminal agent

test generation

privilege escalation

retrieval augmented generation

agent attack

agent defense

agent isolation

agent sandbox

bug fixing

code repair

LLM agent security

malicious tool

multimodal language model

patch generation

repository level

runtime security

shell agent

software engineering agent

terminal bench

AI agent security

autonomous agent security

browser agent security

code agent security

command line agent

function calling security

MCP security

model context protocol security

software engineering benchmark

terminal benchmark

tool calling security

tool-use security

untrusted tool

Papers gaining momentum

Lightweight Multimodal Adaptation of Vision Language Models for Species Recognition and Habitat Context Interpretation in Drone Thermal Imagery

MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control