{
  "generated_at": "2026-05-12T12:42:08.140141+08:00",
  "timezone": "Asia/Shanghai",
  "lookback_hours": 24,
  "sorting": {
    "default_sort_by": "hybrid",
    "summary": "hybrid (relevance first, published_at tie-break)",
    "weights": {
      "title_match_weight": 40,
      "summary_match_weight": 18,
      "doi_weight": 12,
      "pdf_weight": 8,
      "rich_summary_weight": 6,
      "metadata_weight": 4,
      "multi_source_weight": 10,
      "freshness_weight_cap": 24
    },
    "feeds": [
      {
        "name": "LM",
        "sort_by": "hybrid"
      },
      {
        "name": "Agent Runtime Security",
        "sort_by": "hybrid"
      }
    ]
  },
  "highlights": [
    "主题「LLM」：命中 19 篇，覆盖 LM、Agent Runtime Security，代表论文包括 《WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation》、《ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox》。",
    "主题「Language Model」：命中 16 篇，覆盖 LM、Agent Runtime Security，代表论文包括 《WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation》、《Threat Modelling using Domain-Adapted Language Models: Empirical Evaluation and Insights》。",
    "主题「Benchmark」：命中 5 篇，覆盖 LM、Agent Runtime Security，代表论文包括 《ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox》、《AssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agents》。",
    "主题「Large Language Model」：命中 1 篇，覆盖 LM，代表论文包括 《Conformity Generates Collective Misalignment in AI Agents Societies》。",
    "主题「RAG」：命中 1 篇，覆盖 Agent Runtime Security，代表论文包括 《MATRA: Modeling the Attack Surface of Agentic AI Systems -- OpenClaw Case Study》。"
  ],
  "focus_items": [],
  "action_items": [],
  "topic_sections": [
    {
      "name": "LLM",
      "paper_count": 19,
      "feed_names": [
        "LM",
        "Agent Runtime Security"
      ],
      "paper_titles": [
        "WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation",
        "ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox",
        "AssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agents",
        "LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments",
        "Threat Modelling using Domain-Adapted Language Models: Empirical Evaluation and Insights",
        "Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge",
        "LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges",
        "ConQuR: Corner Aligned Activation Quantization via Optimized Rotations for LLMs",
        "Dynamic Cross-Modal Prompt Generation for Multimodal Continual Instruction Tuning",
        "DGPO: Beyond Pairwise Preferences with Directional Consistent Groupwise Optimization",
        "From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World",
        "Geospatial-Temporal Sensemaking of Remote Sensing Activity Detections with Multimodal Large Language Model",
        "BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD",
        "Grounded Satirical Generation with RAG",
        "Intrinsic Guardrails: How Semantic Geometry of Personality Interacts with Emergent Misalignment in LLMs",
        "Re-Triggering Safeguards within LLMs for Jailbreak Detection",
        "Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing",
        "RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems",
        "MATRA: Modeling the Attack Surface of Agentic AI Systems -- OpenClaw Case Study"
      ],
      "key_points": [
        "《WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation》〔评测 / 方法〕：Large language and vision-language models increasingly power agents that act on a user's behalf through command-line interface (CLI) harnesses. However, most a…",
        "《ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox》〔评测 / 数据 / 应用 / 方法〕：Current LLM agents are proficient at calling isolated APIs but struggle with the \"last mile\" of commercial software automation. In real-world scenarios, tools…"
      ]
    },
    {
      "name": "Language Model",
      "paper_count": 16,
      "feed_names": [
        "LM",
        "Agent Runtime Security"
      ],
      "paper_titles": [
        "WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation",
        "Threat Modelling using Domain-Adapted Language Models: Empirical Evaluation and Insights",
        "Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge",
        "Conformity Generates Collective Misalignment in AI Agents Societies",
        "LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges",
        "ConQuR: Corner Aligned Activation Quantization via Optimized Rotations for LLMs",
        "Dynamic Cross-Modal Prompt Generation for Multimodal Continual Instruction Tuning",
        "DGPO: Beyond Pairwise Preferences with Directional Consistent Groupwise Optimization",
        "Geospatial-Temporal Sensemaking of Remote Sensing Activity Detections with Multimodal Large Language Model",
        "BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD",
        "Grounded Satirical Generation with RAG",
        "Break the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximization",
        "Intrinsic Guardrails: How Semantic Geometry of Personality Interacts with Emergent Misalignment in LLMs",
        "Re-Triggering Safeguards within LLMs for Jailbreak Detection",
        "Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing",
        "RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems"
      ],
      "key_points": [
        "《WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation》〔评测 / 方法〕：Large language and vision-language models increasingly power agents that act on a user's behalf through command-line interface (CLI) harnesses. However, most a…",
        "《Threat Modelling using Domain-Adapted Language Models: Empirical Evaluation and Insights》〔评测 / 应用 / 方法〕：Large Language Models(LLMs) are increasingly explored for cybersecurity applications such as vulnerability detection. In the domain of threat modelling, prior…"
      ]
    },
    {
      "name": "Benchmark",
      "paper_count": 5,
      "feed_names": [
        "LM",
        "Agent Runtime Security"
      ],
      "paper_titles": [
        "ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox",
        "AssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agents",
        "LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments",
        "From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World",
        "Break the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximization"
      ],
      "key_points": [
        "《ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox》〔评测 / 数据 / 应用 / 方法〕：Current LLM agents are proficient at calling isolated APIs but struggle with the \"last mile\" of commercial software automation. In real-world scenarios, tools…",
        "《AssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agents》〔评测 / 应用 / 方法〕：Recent advances in machine learning and large-scale biological data collections have revived the prospect of building a virtual cell, a computational model of…"
      ]
    },
    {
      "name": "Large Language Model",
      "paper_count": 1,
      "feed_names": [
        "LM"
      ],
      "paper_titles": [
        "Conformity Generates Collective Misalignment in AI Agents Societies"
      ],
      "key_points": [
        "《Conformity Generates Collective Misalignment in AI Agents Societies》〔评测 / 应用 / 方法〕：Artificial intelligence safety research focuses on aligning individual language models with human values, yet deployed AI systems increasingly operate as inter…"
      ]
    },
    {
      "name": "RAG",
      "paper_count": 1,
      "feed_names": [
        "Agent Runtime Security"
      ],
      "paper_titles": [
        "MATRA: Modeling the Attack Surface of Agentic AI Systems -- OpenClaw Case Study"
      ],
      "key_points": [
        "《MATRA: Modeling the Attack Surface of Agentic AI Systems -- OpenClaw Case Study》〔应用 / 方法〕：LLMs are increasingly deployed as autonomous agents with access to tools, databases, and external services, yet practitioners (across different sectors) lack s…"
      ]
    }
  ],
  "template": "zh_daily_brief",
  "feeds": [
    {
      "name": "LM",
      "key_points": [
        "《WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation》〔评测 / 方法〕：Large language and vision-language models increasingly power agents that act on a user's behalf through command-line interface (CLI) harnesses. However, most a…",
        "《ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox》〔评测 / 数据 / 应用 / 方法〕：Current LLM agents are proficient at calling isolated APIs but struggle with the \"last mile\" of commercial software automation. In real-world scenarios, tools…",
        "《AssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agents》〔评测 / 应用 / 方法〕：Recent advances in machine learning and large-scale biological data collections have revived the prospect of building a virtual cell, a computational model of…",
        "《LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments》〔评测 / 方法〕：The rapid proliferation of LLM-based autonomous agents in real operating system environments introduces a new category of safety risk beyond content safety: be…",
        "《Threat Modelling using Domain-Adapted Language Models: Empirical Evaluation and Insights》〔评测 / 应用 / 方法〕：Large Language Models(LLMs) are increasingly explored for cybersecurity applications such as vulnerability detection. In the domain of threat modelling, prior…"
      ],
      "sort_by": "hybrid",
      "papers": [
        {
          "title": "WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation",
          "summary": "Large language and vision-language models increasingly power agents that act on a user's behalf through command-line interface (CLI) harnesses. However, most agent benchmarks still rely on synthetic sandboxes, short-horizon tasks, mock-service APIs, and final-answer checks, leaving open whether agents can complete realistic long-horizon work in the runtimes where they are deployed. This work presents WildClawBench, a native-runtime benchmark of 60 human-authored, bilingual, multimodal tasks spanning six thematic categories. Each task averages roughly 8 minutes of wall-clock time and over 20 tool calls, and runs inside a reproducible Docker container hosting an actual CLI agent harness (OpenClaw, Claude Code, Codex, or Hermes Agent) with access to real tools rather than mock services. Grading is hybrid, combining deterministic rule-based checks, environment-state auditing of side effects, and an LLM/VLM judge for semantic verification. Across 19 frontier models, the best, Claude Opus 4.7, reaches only 62.2% overall under OpenClaw, while every other model stays below 60%, and switching harness alone shifts a single model by up to 18 points. These results show that long-horizon, native-runtime agent evaluation remains a far-from-resolved task for current frontier models. We release the tasks, code, and containerized tooling to support reproducible evaluation.",
          "authors": [
            "Shuangrui Ding",
            "Xuanlang Dai",
            "Long Xing",
            "Shengyuan Ding",
            "Ziyu Liu",
            "Yang JingYi",
            "Penghui Yang",
            "Zhixiong Zhang",
            "Xilin Wei",
            "Xinyu Fang",
            "Yubo Ma",
            "Haodong Duan",
            "Jing Shao",
            "Jiaqi Wang",
            "Dahua Lin",
            "Kai Chen",
            "Yuhang Zang"
          ],
          "categories": [
            "cs.CL"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10912v1",
          "abstract_url": "https://arxiv.org/abs/2605.10912v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10912v1",
          "published_at": "2026-05-11T17:49:43+00:00",
          "updated_at": "2026-05-11T17:49:43+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "评测",
            "方法"
          ],
          "topics": [
            "LLM",
            "Language Model"
          ],
          "doi": null,
          "arxiv_id": "2605.10912",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10912v1"
          },
          "relevance_score": 205,
          "match_reasons": [
            "title matched \"agent\"",
            "title matched \"benchmark\"",
            "title matched \"evaluation\"",
            "summary matched \"language model\"",
            "summary matched \"LLM\"",
            "summary matched \"RAG\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10912"
        },
        {
          "title": "ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox",
          "summary": "Current LLM agents are proficient at calling isolated APIs but struggle with the \"last mile\" of commercial software automation. In real-world scenarios, tools are not independent; they are atomic, interdependent, and prone to environmental noise. We introduce $\\textbf{ComplexMCP}$, a benchmark designed to evaluate agents in these rigorous conditions. Built on the Model Context Protocol (MCP), $\\textbf{ComplexMCP}$ provides over 300 meticulously tested tools derived from 7 stateful sandboxes, ranging from office suites to financial systems. Unlike existing datasets, our benchmark utilizes a seed-driven architecture to simulate dynamic environment states and unpredictable API failures, ensuring a deterministic yet diverse evaluation. We evaluate various LLMs across full-context and RAG paradigms, revealing a stark performance gap: even top-tier models fail to exceed a 60% success rate, far trailing human performance 90%. Granular trajectory analysis identifies three fundamental bottlenecks: (1) $\\textbf{tool retrieval saturation}$ as action spaces scale; (2) $\\textbf{over-confidence}$, where agents skip essential environment verifications; and (3) $\\textbf{strategic defeatism}$, a tendency to rationalize failure rather than pursuing recovery. These findings underscore the insufficiency of current agents for interdependent workflows, positioning $\\textbf{ComplexMCP}$ as a critical testbed for the next generation of resilient autonomous systems.",
          "authors": [
            "Yuanyang Li",
            "Xue Yang",
            "Longyue Wang",
            "Weihua Luo",
            "Hongyang Chen"
          ],
          "categories": [
            "cs.AI",
            "cs.SE"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10787v1",
          "abstract_url": "https://arxiv.org/abs/2605.10787v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10787v1",
          "published_at": "2026-05-11T16:20:51+00:00",
          "updated_at": "2026-05-11T16:20:51+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "评测",
            "数据",
            "应用",
            "方法"
          ],
          "topics": [
            "LLM",
            "Benchmark"
          ],
          "doi": null,
          "arxiv_id": "2605.10787",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10787v1"
          },
          "relevance_score": 185,
          "match_reasons": [
            "title matched \"LLM\"",
            "title matched \"agent\"",
            "title matched \"evaluation\"",
            "summary matched \"RAG\"",
            "summary matched \"benchmark\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10787"
        },
        {
          "title": "AssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agents",
          "summary": "Recent advances in machine learning and large-scale biological data collections have revived the prospect of building a virtual cell, a computational model of cellular behavior that could accelerate biological discovery. One of the most compelling promises of this vision is the ability to perform in silico phenotypic screens, in which a model predicts the effects of cellular perturbations in unseen biological contexts. This task combines heterogeneous textual inputs with diverse phenotypic outputs, making it particularly well-suited to LLMs and agentic systems. Yet, no standard benchmark currently exists for this task, as existing efforts focus on narrower molecular readouts that are only indirectly aligned with the phenotypic endpoints driving many real-world drug discovery workflows. In this work, we present AssayBench, a benchmark for phenotypic screen prediction, built from 1,920 publicly available CRISPR screens spanning five broad classes of cellular phenotypes. We formulate the screen prediction task as a gene rank prediction for each screen and introduce the adjusted nDCG, a continuous metric for comparing performance across heterogeneous assays. Our extensive evaluation shows that existing methods remain far from empirically estimated performance ceilings and zero-shot generalist LLMs outperform biology-specific LLMs and trainable baselines. Optimization techniques such as fine-tuning, ensembling, and prompt optimization can further improve LLM performance on this task. Overall, AssayBench offers a practical testbed for measuring progress toward in silico phenotypic screening and, more broadly, virtual cell models.",
          "authors": [
            "Edward De Brouwer",
            "Carl Edwards",
            "Alexander Wu",
            "Jenna Collier",
            "Graham Heimberg",
            "Xiner Li",
            "Meena Subramaniam",
            "Ehsan Hajiramezanali",
            "David Richmond",
            "Jan-Christian Hütter",
            "Sara Mostafavi",
            "Gabriele Scalia"
          ],
          "categories": [
            "cs.LG",
            "cs.AI",
            "q-bio.QM"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10876v1",
          "abstract_url": "https://arxiv.org/abs/2605.10876v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10876v1",
          "published_at": "2026-05-11T17:27:16+00:00",
          "updated_at": "2026-05-11T17:27:16+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "评测",
            "应用",
            "方法"
          ],
          "topics": [
            "LLM",
            "Benchmark"
          ],
          "doi": null,
          "arxiv_id": "2605.10876",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10876v1"
          },
          "relevance_score": 168,
          "match_reasons": [
            "title matched \"LLM\"",
            "title matched \"agent\"",
            "title matched \"benchmark\"",
            "summary matched \"evaluation\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10876"
        },
        {
          "title": "LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments",
          "summary": "The rapid proliferation of LLM-based autonomous agents in real operating system environments introduces a new category of safety risk beyond content safety: behavior jailbreak, where an adversary induces an agent to execute dangerous OS-level operations with irreversible consequences. Existing benchmarks either evaluate safety at the semantic layer alone, missing physical-layer harms, or fail to isolate test cases, letting earlier runs contaminate later ones. We present LITMUS (LLM-agents In-OS Testing for Measuring Unsafe Subversion), a benchmark addressing both gaps via a semantic-physical dual verification mechanism and OS-level state rollback. LITMUS comprises 819 high-risk test cases organized into one harmful seed subset and six attack-extended subsets covering three adversarial paradigms (jailbreak speaking, skill injection, and entity wrapping), plus a fully automated multi-agent evaluation framework judging behavior at both conversational and OS-level physical layers. Evaluation across frontier agents reveals three findings: (1) current agents lack effective safety awareness, with strong models (e.g., Claude Sonnet 4.6) still executing 40.64% of high-risk operations; (2) agents exhibit pervasive Execution Hallucination (EH), verbally refusing a request while the dangerous operation has already completed at the system level, invisible to every prior semantic-only framework; and (3) skill injection and entity wrapping attacks achieve high success rates, exposing pronounced agent vulnerabilities. LITMUS provides the first standardized platform for reproducible, physically grounded behavioral safety evaluation of LLM agents in real OS environments.",
          "authors": [
            "Chiyu Zhang",
            "Huiqin Yang",
            "Bendong Jiang",
            "Xiaolei Zhang",
            "Yiran Zhao",
            "Ruyi Chen",
            "Lu Zhou",
            "Xiaogang Xu",
            "Jiafei Wu",
            "Liming Fang",
            "Zhe Liu"
          ],
          "categories": [
            "cs.CR",
            "cs.CL"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10779v1",
          "abstract_url": "https://arxiv.org/abs/2605.10779v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10779v1",
          "published_at": "2026-05-11T16:14:04+00:00",
          "updated_at": "2026-05-11T16:14:04+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "评测",
            "方法"
          ],
          "topics": [
            "LLM",
            "Benchmark"
          ],
          "doi": null,
          "arxiv_id": "2605.10779",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10779v1"
          },
          "relevance_score": 167,
          "match_reasons": [
            "title matched \"LLM\"",
            "title matched \"agent\"",
            "title matched \"benchmark\"",
            "summary matched \"evaluation\"",
            "has PDF",
            "has rich summary",
            "has complete metadata",
            "title matched \"jailbreak\""
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10779"
        },
        {
          "title": "Threat Modelling using Domain-Adapted Language Models: Empirical Evaluation and Insights",
          "summary": "Large Language Models(LLMs) are increasingly explored for cybersecurity applications such as vulnerability detection. In the domain of threat modelling, prior work has primarily evaluated a number of general-purpose Large Language Models under limited prompting settings. In this study, we extend the research area of structured threat modelling by systematically evaluating domain-adapted language models of different sizes to their general counterparts. We use both LLMs and Small Language Models(SLMs) that were domain adapted to telecommunications and cybersecuirty. For the structured threat modelling, we selected the widely used STRIDE approach and the application area is 5G security. We present a comprehensive empirical evaluation using 52 different configurations (on 8 different language models) to analyze the impact of 1) domain adaptation, 2) model scale, 3) decoding strategies (greedy vs. stochastic sampling), and 4) prompting technique on STRIDE threat classification. Our results show that domain-adapted models do not consistently outperform their general-purpose counterparts, and decoding strategies significantly affect model behavior and output validity. They also show that while larger models generally achieve higher performance, these gains are neither consistent nor sufficient for reliable threat modelling. These findings highlight fundamental limitations of current LLMs for structured threat modelling tasks and suggest that improvements require more than additional training data or model scaling, motivating the need for incorporating more task-specific reasoning and stronger grounding in security concepts. We present insights on invalid outputs encountered and present suggestions for prompting tailored specifically for STRIDE threat modelling.",
          "authors": [
            "Saba Pourhanifeh",
            "AbdulAziz AbdulGhaffar",
            "Ashraf Matrawy"
          ],
          "categories": [
            "cs.CR",
            "cs.AI"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10808v1",
          "abstract_url": "https://arxiv.org/abs/2605.10808v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10808v1",
          "published_at": "2026-05-11T16:31:25+00:00",
          "updated_at": "2026-05-11T16:31:25+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "评测",
            "应用",
            "方法"
          ],
          "topics": [
            "LLM",
            "Language Model"
          ],
          "doi": null,
          "arxiv_id": "2605.10808",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10808v1"
          },
          "relevance_score": 163,
          "match_reasons": [
            "title matched \"language model\"",
            "title matched \"evaluation\"",
            "summary matched \"large language model\"",
            "summary matched \"LLM\"",
            "summary matched \"reasoning\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10808"
        },
        {
          "title": "Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge",
          "summary": "Reasoning-capable large language models (LLMs) have recently been adopted as automated judges, but their benefits and costs in LLM-as-a-Judge settings remain unclear. Through controlled comparisons between reasoning and non-reasoning judges, we show that explicit reasoning substantially improves judgment accuracy on tasks requiring structured verification (e.g., math and coding), while offering limited or even negative gains on simpler evaluations and incurring significantly higher computational cost. These findings motivate that reasoning should be used selectively rather than universally, with awareness of possible distribution shift. We propose a Robust Adaptive Cost-Efficient Routing (RACER), which dynamically selects between reasoning and non-reasoning judges under a fixed budget by formulating routing as a constrained distributionally robust optimization problem. RACER explicitly accounts for distribution shift via a KL-divergence uncertainty set, admits an efficient primal--dual algorithm, and enjoys theoretical guarantees including uniqueness of the optimal policy and linear convergence. Extensive experiments show that RACER achieves superior accuracy--cost trade-offs under distribution shift.",
          "authors": [
            "Wenbo Zhang",
            "Lijinghua Zhang",
            "Liner Xiang",
            "Hengrui Cai"
          ],
          "categories": [
            "cs.AI",
            "cs.CL",
            "stat.ML"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10805v1",
          "abstract_url": "https://arxiv.org/abs/2605.10805v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10805v1",
          "published_at": "2026-05-11T16:30:20+00:00",
          "updated_at": "2026-05-11T16:30:20+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "评测",
            "方法"
          ],
          "topics": [
            "LLM",
            "Language Model"
          ],
          "doi": null,
          "arxiv_id": "2605.10805",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10805v1"
          },
          "relevance_score": 163,
          "match_reasons": [
            "title matched \"LLM\"",
            "title matched \"reasoning\"",
            "summary matched \"language model\"",
            "summary matched \"large language model\"",
            "summary matched \"evaluation\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10805"
        },
        {
          "title": "Conformity Generates Collective Misalignment in AI Agents Societies",
          "summary": "Artificial intelligence safety research focuses on aligning individual language models with human values, yet deployed AI systems increasingly operate as interacting populations where social influence may override individual alignment. Here we show that populations of individually aligned AI agents can be driven into stable misaligned states through conformity dynamics. Simulating opinion dynamics across nine large language models and one hundred opinion pairs, we find that each agent's behavior is governed by two competing forces: a tendency to follow the majority and an intrinsic bias toward specific positions. Using tools from statistical physics, we derive a quantitative theory that predicts when populations become trapped in long-lived misaligned configurations, and identifies predictable tipping points where small numbers of adversarial agents can irreversibly shift population-level alignment even after manipulation ceases. These results demonstrate that individual-level alignment provides no guarantee of collective safety, calling for evaluation frameworks that account for emergent behavior in AI populations.",
          "authors": [
            "Giordano De Marzo",
            "Alessandro Bellina",
            "Claudio Castellano",
            "Viola Priesemann",
            "David Garcia"
          ],
          "categories": [
            "physics.soc-ph",
            "cs.CL",
            "cs.MA"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10721v1",
          "abstract_url": "https://arxiv.org/abs/2605.10721v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10721v1",
          "published_at": "2026-05-11T15:30:48+00:00",
          "updated_at": "2026-05-11T15:30:48+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "评测",
            "应用",
            "方法"
          ],
          "topics": [
            "Language Model",
            "Large Language Model"
          ],
          "doi": null,
          "arxiv_id": "2605.10721",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10721v1"
          },
          "relevance_score": 162,
          "match_reasons": [
            "title matched \"agent\"",
            "title matched \"alignment\"",
            "summary matched \"language model\"",
            "summary matched \"large language model\"",
            "summary matched \"evaluation\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10721"
        },
        {
          "title": "LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges",
          "summary": "The integration of Large Language Models (LLMs) into Electronic Design Automation (EDA) and hardware security is rapidly reshaping the semiconductor industry. While LLMs offer unprecedented capabilities in generating Register Transfer Level (RTL) code, automating testbenches, and bridging the semantic gap between high-level specifications and silicon, they simultaneously introduce severe vulnerabilities. This comprehensive review provides an in-depth analysis of the state-of-the-art in LLM-driven hardware design, organized around key advancements in EDA synthesis, hardware trust, design for security, and education. We systematically expand on the methodologies of recent breakthroughs -- from reasoning-driven synthesis and multi-agent vulnerability extraction to data contamination and adversarial machine learning (ML) evasion. We integrate general discussions on critical countermeasures, such as dynamic benchmarking to combat data memorization and aggressive red-teaming for robust security assessment. Finally, we synthesize cross-cutting lessons learned to guide future research toward secure, trustworthy, and autonomous design ecosystems.",
          "authors": [
            "Johann Knechtel",
            "Ozgur Sinanoglu",
            "Ramesh Karri"
          ],
          "categories": [
            "cs.CR",
            "cs.AR",
            "cs.LG"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10807v1",
          "abstract_url": "https://arxiv.org/abs/2605.10807v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10807v1",
          "published_at": "2026-05-11T16:31:14+00:00",
          "updated_at": "2026-05-11T16:31:14+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "评测",
            "应用",
            "方法"
          ],
          "topics": [
            "LLM",
            "Language Model"
          ],
          "doi": null,
          "arxiv_id": "2605.10807",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10807v1"
          },
          "relevance_score": 159,
          "match_reasons": [
            "title matched \"LLM\"",
            "summary matched \"language model\"",
            "summary matched \"large language model\"",
            "summary matched \"reasoning\"",
            "summary matched \"agent\"",
            "summary matched \"benchmark\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10807"
        },
        {
          "title": "ConQuR: Corner Aligned Activation Quantization via Optimized Rotations for LLMs",
          "summary": "Large language models (LLMs) are costly to deploy due to their large memory footprint and high inference cost. Weight-activation quantization can reduce these costs, but low-bit activation quantization remains difficult because activation outliers induce large quantization error. Recent rotation-based methods address this by applying orthogonal transformations that redistribute activation magnitude across dimensions, but existing approaches either require expensive end-to-end rotation training or rely on stored activation corpora, introducing significant compute or storage overhead. We propose a lightweight post-training rotation calibration method for LLM activation quantization. Our method learns orthogonal rotations that align normalized activations with the corners of an inscribed hypercube, encouraging activation energy to be distributed more evenly across dimensions. This objective admits an efficient closed-form update via the orthogonal Procrustes problem, avoiding gradient-based optimization over the orthogonal group. We further introduce an online calibration procedure that updates rotations as calibration samples are processed, eliminating the need to store activations on disk and allowing rotations to adapt to quantized activation distributions during calibration. Experiments on Llama-2 and Llama-3 models from 3B to 70B parameters show that our method achieves competitive or improved performance across perplexity benchmarks and common sense reasoning tasks while avoiding both costly end-to-end training and large offline activation storage.",
          "authors": [
            "Chayne Thrash",
            "Ali Abbasi",
            "Soheil Kolouri"
          ],
          "categories": [
            "cs.LG"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10793v1",
          "abstract_url": "https://arxiv.org/abs/2605.10793v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10793v1",
          "published_at": "2026-05-11T16:23:10+00:00",
          "updated_at": "2026-05-11T16:23:10+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "评测",
            "方法"
          ],
          "topics": [
            "LLM",
            "Language Model"
          ],
          "doi": null,
          "arxiv_id": "2605.10793",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10793v1"
          },
          "relevance_score": 159,
          "match_reasons": [
            "title matched \"LLM\"",
            "summary matched \"language model\"",
            "summary matched \"large language model\"",
            "summary matched \"reasoning\"",
            "summary matched \"RAG\"",
            "summary matched \"benchmark\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10793"
        },
        {
          "title": "Dynamic Cross-Modal Prompt Generation for Multimodal Continual Instruction Tuning",
          "summary": "Multimodal Large Language Models (MLLMs) achieve strong performance through instruction tuning, yet real-world deployment often requires continual capability expansion across sequential tasks. In such scenarios, Multimodal Continual Instruction Tuning (MCIT) aims to acquire new capabilities while limiting catastrophic forgetting. Existing methods mainly follow a module-composition paradigm: they maintain task-level prompts or LoRA experts and dynamically route or aggregate a subset of them at inference. However, samples within the same task can still differ substantially in visual scenes, question intents, and reasoning demands. This motivates instance-level adaptation to individual query-image pairs rather than only selecting or combining task-level modules. To this end, we propose DRAPE (Dynamic Cross-Modal Prompt Generation), a prompt-learning framework that synthesizes continuous instance-specific soft prompts for MCIT. Instead of selecting prompts from a fixed pool, DRAPE derives prompt queries from the textual instruction and cross-attends to visual patch features, producing query-image conditioned prompts that are prepended to the frozen LLM. To mitigate forgetting during sequential updates, DRAPE applies null-space gradient projection to the shared projector and uses CLIP-based prototype routing for task-label-free generator selection at inference. Extensive experiments on MCIT benchmarks show that DRAPE achieves state-of-the-art performance among representative prompt-based and LoRA-based continual-learning baselines.",
          "authors": [
            "Tao Hu",
            "Da-Wei Zhou"
          ],
          "categories": [
            "cs.CV",
            "cs.AI",
            "cs.LG"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10765v1",
          "abstract_url": "https://arxiv.org/abs/2605.10765v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10765v1",
          "published_at": "2026-05-11T15:59:06+00:00",
          "updated_at": "2026-05-11T15:59:06+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "评测",
            "应用",
            "方法"
          ],
          "topics": [
            "LLM",
            "Language Model"
          ],
          "doi": null,
          "arxiv_id": "2605.10765",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10765v1"
          },
          "relevance_score": 159,
          "match_reasons": [
            "title matched \"instruction tuning\"",
            "summary matched \"language model\"",
            "summary matched \"large language model\"",
            "summary matched \"LLM\"",
            "summary matched \"reasoning\"",
            "summary matched \"benchmark\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10765"
        },
        {
          "title": "DGPO: Beyond Pairwise Preferences with Directional Consistent Groupwise Optimization",
          "summary": "Although Large Language Models (LLMs) have made remarkable progress, current preference optimization methods still struggle to align directional consistency while preserving reasoning diversity. To address this limitation, we propose Directional-Groupwise Preference Optimization (DGPO), a lightweight framework that aggregates supervision signals at the group level and explicitly models direction-aware alignment through multi-candidate comparisons. DGPO organizes forward and reverse question-answer instances into structured sets and optimizes a margin-based likelihood objective that separates coherent reasoning paths from inconsistent alternatives. This group-wise formulation captures richer relative information than pairwise objectives and reinforces consistency across diverse reasoning pathways. Empirical results show that our constructed reverse data yields a 3.2% average improvement across five benchmarks, while DGPO further delivers consistent gains across multiple datasets and model families, achieving average accuracy improvements of up to 3.6%.",
          "authors": [
            "Mengyi Deng",
            "Zhiwei Li",
            "Xin Li",
            "Tingyu Zhu",
            "Yulan Yuan",
            "Zhijiang Guo",
            "Wei Wang"
          ],
          "categories": [
            "cs.CL"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10863v1",
          "abstract_url": "https://arxiv.org/abs/2605.10863v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10863v1",
          "published_at": "2026-05-11T17:10:44+00:00",
          "updated_at": "2026-05-11T17:10:44+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "评测",
            "数据",
            "方法"
          ],
          "topics": [
            "LLM",
            "Language Model"
          ],
          "doi": null,
          "arxiv_id": "2605.10863",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10863v1"
          },
          "relevance_score": 156,
          "match_reasons": [
            "summary matched \"language model\"",
            "summary matched \"large language model\"",
            "summary matched \"LLM\"",
            "summary matched \"reasoning\"",
            "summary matched \"alignment\"",
            "summary matched \"RAG\"",
            "summary matched \"benchmark\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10863"
        },
        {
          "title": "From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World",
          "summary": "AI pentesting agents are increasingly credible as offensive security systems, but current benchmarks still provide limited guidance on which will perform best in real-world targets. Existing evaluation protocols assess and optimize for predefined goals such as capture-the-flag, remote code execution, exploit reproduction, or trajectory similarity, in simplified or narrow settings. These tools are valuable for measuring bounded capabilities, yet they do not adequately capture the complexity, open-ended exploration, and strategic decision-making required in realistic pentesting. In this paper, we present a practical evaluation protocol that shifts assessment from task completion to validated vulnerability discovery, allowing evaluation in sufficiently complex targets spanning multiple attack surfaces and vulnerability classes. The protocol combines structured ground-truth with LLM-based semantic matching to identify vulnerabilities, bipartite resolution to score findings under realistic ambiguity, continuous ground-truth maintenance, repeated and cumulative evaluation of stochastic agents, efficiency metrics, and reduced-suite selection for sustainable experimentation. This protocol extends the state of the art by enabling a more realistic, operationally informative comparison of AI pentesting agents. To enable reproducibility, we also release expert-annotated ground truth and code for the proposed evaluation protocol: https://github.com/jd0965199-oss/ethibench.",
          "authors": [
            "Pedro Conde",
            "Henrique Branquinho",
            "Valerio Mazzone",
            "Bruno Mendes",
            "André Baptista",
            "Nuno Moniz"
          ],
          "categories": [
            "cs.AI",
            "cs.CR"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10834v1",
          "abstract_url": "https://arxiv.org/abs/2605.10834v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10834v1",
          "published_at": "2026-05-11T16:50:00+00:00",
          "updated_at": "2026-05-11T16:50:00+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "评测",
            "方法"
          ],
          "topics": [
            "LLM",
            "Benchmark"
          ],
          "doi": null,
          "arxiv_id": "2605.10834",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10834v1"
          },
          "relevance_score": 146,
          "match_reasons": [
            "title matched \"agent\"",
            "title matched \"evaluation\"",
            "summary matched \"LLM\"",
            "summary matched \"benchmark\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10834"
        },
        {
          "title": "Geospatial-Temporal Sensemaking of Remote Sensing Activity Detections with Multimodal Large Language Model",
          "summary": "We introduce SMART-HC-VQA, a Sentinel-2-based visual question answering dataset derived from the IARPA SMART Heavy Construction dataset, designed for spatiotemporal analysis of human activity. The dataset transforms construction-site annotations, construction-type labels, temporal-phase labels, geographic metadata, and observation relationships into natural language question-answer triplets. This approach redefines the existing dataset as a temporally extended automatic target recognition and visual question answering (VQA) challenge, considering a fixed geospatial site as a target whose attributes and activity states evolve across sparse satellite observations. Currently, SMART-HC-VQA comprises 21,837 accessible Sentinel-2 image chips, 65,511 single-image VQA examples, and approximately 2.3 million two-image temporal comparison examples generated via our novel Image-Pairwise Combinatorial Augmentation. We detail the workflow for retrieving and processing Sentinel-2 imagery, segmenting large satellite tiles into site-centered images, maintaining traceability to SMART-HC annotations, and analyzing the distributions of site size, observation count, temporal coverage, construction type, and phase labels. Additionally, we describe an implemented multi-image MLLM training framework based on LLaVA-NeXT Mistral-7B, adapted to accept multiple dated image inputs and train on metadata-derived VQA examples. This work offers a reproducible foundation for understanding language-guided remote sensing activities, aiming not only to detect change but also to reason about the ongoing processes, their progression, and potential future developments.",
          "authors": [
            "David F. Ramirez",
            "Tim Overman",
            "Kristen Jaskie",
            "Andreas Spanias"
          ],
          "categories": [
            "eess.IV",
            "cs.AI",
            "cs.CV"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10739v1",
          "abstract_url": "https://arxiv.org/abs/2605.10739v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10739v1",
          "published_at": "2026-05-11T15:42:09+00:00",
          "updated_at": "2026-05-11T15:42:09+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "数据",
            "应用",
            "方法"
          ],
          "topics": [
            "LLM",
            "Language Model"
          ],
          "doi": null,
          "arxiv_id": "2605.10739",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10739v1"
          },
          "relevance_score": 145,
          "match_reasons": [
            "title matched \"language model\"",
            "title matched \"large language model\"",
            "summary matched \"LLM\"",
            "summary matched \"RAG\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10739"
        },
        {
          "title": "BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD",
          "summary": "Industrial Computer-Aided Design (CAD) code generation requires models to produce executable parametric programs from visual or textual inputs. Beyond recognizing the outer shape of a part, this task involves understanding its 3D structure, inferring engineering parameters, and choosing CAD operations that reflect how the part would be designed and manufactured. Despite the promise of Multimodal large language models (MLLMs) for this task, they are rarely evaluated on whether these capabilities jointly hold in realistic industrial CAD settings. We present BenchCAD, a unified benchmark for industrial CAD reasoning. BenchCAD contains 17,900 execution-verified CadQuery programs across 106 industrial part families, including bevel gears, compression springs, twist drills, and other reusable engineering designs. It evaluates models through visual question answering, code question answering, image-to-code generation, and instruction-guided code editing, enabling fine-grained analysis across perception, parametric abstraction, and executable program synthesis. Across 10+ frontier models, BenchCAD shows that current systems often recover coarse outer geometry but fail to produce faithful parametric CAD programs. Common failures include missing fine 3D structure, misinterpreting industrial design parameters, and replacing essential operations such as sweeps, lofts, and twist-extrudes with simpler sketch-and-extrude patterns. Fine-tuning and reinforcement learning improve in-distribution performance, but generalization to unseen part families remains limited. These results position BenchCAD as a benchmark for measuring and improving the industrial readiness of multimodal CAD automation.",
          "authors": [
            "Haozhe Zhang",
            "Kaichen Liu",
            "Miaomiao Chen",
            "Lei Li",
            "Shaojie Yang",
            "Cheng Peng",
            "Hanjie Chen"
          ],
          "categories": [
            "cs.AI",
            "cs.CV",
            "cs.SE"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10865v1",
          "abstract_url": "https://arxiv.org/abs/2605.10865v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10865v1",
          "published_at": "2026-05-11T17:13:36+00:00",
          "updated_at": "2026-05-11T17:13:36+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "评测",
            "方法"
          ],
          "topics": [
            "LLM",
            "Language Model"
          ],
          "doi": null,
          "arxiv_id": "2605.10865",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10865v1"
          },
          "relevance_score": 142,
          "match_reasons": [
            "title matched \"benchmark\"",
            "summary matched \"language model\"",
            "summary matched \"large language model\"",
            "summary matched \"LLM\"",
            "summary matched \"reasoning\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10865"
        },
        {
          "title": "Grounded Satirical Generation with RAG",
          "summary": "Humor generation remains challenging task for Large Language Models (LLMs), due to their subjective nature. We focus on satire, a form of humor strongly shaped by context. In this work, we present a novel pipeline for grounded satire generation that uses Retrieval-Augmented Generation (RAG) over current news to produce satirical dictionary definitions in the Finnish context. We also introduce a new task-specific evaluation framework and annotate 100 generated definitions with six human annotators, enabling analysis across multiple experimental conditions, including cultural background, source-word type, and the presence or absence of RAG. Our results show that the generated definitions are perceived as more political than humorous. Both topic-based word selection and RAG improve the political relevance of the outputs, but neither yields clear gains in humor generation. In addition, our LLM-as-a-judge evaluation of five state-of-the-art models indicates that LLMs correlate well with human judgments on political relevance, but perform poorly on humor. We release our code and annotated dataset to support further research on grounded satire generation and evaluation.",
          "authors": [
            "Oona Itkonen",
            "Yuxin Su",
            "Linyao Du",
            "Ona De Gibert"
          ],
          "categories": [
            "cs.CL"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10853v1",
          "abstract_url": "https://arxiv.org/abs/2605.10853v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10853v1",
          "published_at": "2026-05-11T17:00:51+00:00",
          "updated_at": "2026-05-11T17:00:51+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "评测",
            "数据",
            "应用",
            "方法"
          ],
          "topics": [
            "LLM",
            "Language Model"
          ],
          "doi": null,
          "arxiv_id": "2605.10853",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10853v1"
          },
          "relevance_score": 142,
          "match_reasons": [
            "title matched \"RAG\"",
            "summary matched \"language model\"",
            "summary matched \"large language model\"",
            "summary matched \"LLM\"",
            "summary matched \"evaluation\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10853"
        }
      ]
    },
    {
      "name": "Agent Runtime Security",
      "key_points": [
        "《Break the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximization》〔评测 / 方法〕：Recent studies show that gradient-based universal image jailbreaks on vision-language models (VLMs) exhibit little or no cross-model transferability, casting d…",
        "《Intrinsic Guardrails: How Semantic Geometry of Personality Interacts with Emergent Misalignment in LLMs》〔评测 / 方法〕：Fine-tuning Large Language Models (LLMs) on benign narrow data can sometimes induce broad harmful behaviors, a vulnerability termed emergent misalignment (EM).…",
        "《Re-Triggering Safeguards within LLMs for Jailbreak Detection》〔应用 / 方法〕：This paper proposes a jailbreaking prompt detection method for large language models (LLMs) to defend against jailbreak attacks. Although recent LLMs are equip…",
        "《Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing》〔方法〕：This paper proposes a guaranteed defense method for large language models (LLMs) to safeguard against jailbreaking attacks. Drawing inspiration from the denois…",
        "《RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems》〔应用 / 方法〕：This paper demonstrates RUBEN, an interactive tool for discovering minimal rules to explain the outputs of retrieval-augmented large language models (LLMs) in…"
      ],
      "sort_by": "hybrid",
      "papers": [
        {
          "title": "Break the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximization",
          "summary": "Recent studies show that gradient-based universal image jailbreaks on vision-language models (VLMs) exhibit little or no cross-model transferability, casting doubt on the feasibility of transferable multimodal jailbreaks. We revisit this conclusion under a strictly untargeted threat model without enforcing a fixed prefix or response pattern. Our preliminary experiment reveals that refusal behavior concentrates at high-entropy tokens during autoregressive decoding, and non-refusal tokens already carry substantial probability mass among the top-ranked candidates before attack. Motivated by this finding, we propose Untargeted Jailbreak via Entropy Maximization(UJEM)-KL, a lightweight attack that maximizes entropy at these decision tokens to flip refusal outcomes, while stabilizing the remaining low-entropy positions to preserve output quality. Across three VLMs and two safety benchmarks, UJEM-KL achieves competitive white-box attack success rates and consistently improves transferability, while remaining effective under representative defenses. Our experimental results indicate that the limited transferability primarily stems from overly constrained optimization objectives.",
          "authors": [
            "Mengqi He",
            "Xinyu Tian",
            "Xin Shen",
            "Shu Zou",
            "Jinhong Ni",
            "Zhaoyuan Yang",
            "Weikang Li",
            "Xuesong Li",
            "Jing Zhang"
          ],
          "categories": [
            "cs.CV",
            "cs.AI"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10764v1",
          "abstract_url": "https://arxiv.org/abs/2605.10764v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10764v1",
          "published_at": "2026-05-11T15:59:02+00:00",
          "updated_at": "2026-05-11T15:59:02+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "评测",
            "方法"
          ],
          "topics": [
            "Language Model",
            "Benchmark"
          ],
          "doi": null,
          "arxiv_id": "2605.10764",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10764v1"
          },
          "relevance_score": 69,
          "match_reasons": [
            "title matched \"jailbreak\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10764"
        },
        {
          "title": "Intrinsic Guardrails: How Semantic Geometry of Personality Interacts with Emergent Misalignment in LLMs",
          "summary": "Fine-tuning Large Language Models (LLMs) on benign narrow data can sometimes induce broad harmful behaviors, a vulnerability termed emergent misalignment (EM). While prior work links these failures to specific directions in the activation space, their relationship to the model's broader persona remains unexplored. We map the latent personality space of LLMs through established psychometric profiles like the Big Five, Dark Triad, and LLM-specific behaviors (e.g. evil, sycophancy), and show that the semantic geometry is highly stable across aligned models and their corrupted fine-tunes. Through causal interventions, we find that directions isolating social valence, such as the 'Evil' persona vector, and a Semantic Valence Vector (SVV) that we introduce, function as intrinsic guardrails: ablating them drives the misalignment rates above $40$%, while amplifying them suppresses the failure mode to less than $3$%. Leveraging the structural stability of the personality space, we show that vectors extracted $\\textit{a priori}$ from an instruct-tuned model transfer zero-shot to successfully regulate EM in corrupted fine-tunes. Overall, our findings suggest that harmful fine-tuning does not overwrite a model's internal representation of personality, allowing conserved representations to serve as robust, cross-distribution guardrails.",
          "authors": [
            "Krishak Aneja",
            "Manas Mittal",
            "Anmol Goel",
            "Ponnurangam Kumaraguru",
            "Vamshi Krishna Bonagiri"
          ],
          "categories": [
            "cs.CL",
            "cs.AI"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10633v1",
          "abstract_url": "https://arxiv.org/abs/2605.10633v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10633v1",
          "published_at": "2026-05-11T14:21:57+00:00",
          "updated_at": "2026-05-11T14:21:57+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "评测",
            "方法"
          ],
          "topics": [
            "LLM",
            "Language Model"
          ],
          "doi": null,
          "arxiv_id": "2605.10633",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10633v1"
          },
          "relevance_score": 67,
          "match_reasons": [
            "title matched \"guardrail\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10633"
        },
        {
          "title": "Re-Triggering Safeguards within LLMs for Jailbreak Detection",
          "summary": "This paper proposes a jailbreaking prompt detection method for large language models (LLMs) to defend against jailbreak attacks. Although recent LLMs are equipped with built-in safeguards, it remains possible to craft jailbreaking prompts that bypass them. We argue that such jailbreaking prompts are inherently fragile, and thus introduce an embedding disruption method to re-activate the safeguards within LLMs. Unlike previous defense methods that aim to serve as standalone solutions, our approach instead cooperates with the LLM's internal defense mechanisms by re-triggering them. Moreover, through extensive analysis, we gain a comprehensive understanding of the disruption effects and develop an efficient search algorithm to identify appropriate disruptions for effective jailbreak detection. Extensive experiments demonstrate that our approach effectively defends against state-of-the-art jailbreak attacks in white-box and black-box settings, and remains robust even against adaptive attacks.",
          "authors": [
            "Zheng Lin",
            "Zhenxing Niu",
            "Haoxuan Ji",
            "Yuzhe Huang",
            "Haichang Gao"
          ],
          "categories": [
            "cs.CR",
            "cs.AI"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10611v1",
          "abstract_url": "https://arxiv.org/abs/2605.10611v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10611v1",
          "published_at": "2026-05-11T14:09:31+00:00",
          "updated_at": "2026-05-11T14:09:31+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "应用",
            "方法"
          ],
          "topics": [
            "LLM",
            "Language Model"
          ],
          "doi": null,
          "arxiv_id": "2605.10611",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10611v1"
          },
          "relevance_score": 67,
          "match_reasons": [
            "title matched \"jailbreak\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10611"
        },
        {
          "title": "Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing",
          "summary": "This paper proposes a guaranteed defense method for large language models (LLMs) to safeguard against jailbreaking attacks. Drawing inspiration from the denoised-smoothing approach in the adversarial defense domain, we propose a novel smoothing-based defense method, termed Disrupt-and-Rectify Smoothing (DR-Smoothing). Specifically, we integrate a two-stage prompt processing scheme-first disrupting the input prompt, then rectifying it-into the conventional smoothing defense framework. This disrupt-and-rectify approach improves upon previous disrupt-only approaches by restoring out-of-distribution disrupted prompts to an in-distribution form, thereby reducing the risk of unpredictable LLM behavior. In addition, this two-stage scheme offers a distinct advantage in striking a balance between harmlessness and helpfulness in jailbreaking defense. Notably, we present a theoretical analysis for generic smoothing framework, offering a tight bound for the defense success probability and the requirements on the disruption strength. Our approach can defend against both token-level and prompt-level jailbreaking attacks, under both established and adaptive attacking scenarios. Extensive experiments demonstrate that our approach surpasses current state-of-the-art defense methods in terms of both harmlessness and helpfulness.",
          "authors": [
            "Zheng Lin",
            "Zhenxing Niu",
            "Haoxuan Ji",
            "Haichang Gao"
          ],
          "categories": [
            "cs.CR",
            "cs.AI"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10582v1",
          "abstract_url": "https://arxiv.org/abs/2605.10582v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10582v1",
          "published_at": "2026-05-11T13:54:26+00:00",
          "updated_at": "2026-05-11T13:54:26+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "方法"
          ],
          "topics": [
            "LLM",
            "Language Model"
          ],
          "doi": null,
          "arxiv_id": "2605.10582",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10582v1"
          },
          "relevance_score": 67,
          "match_reasons": [
            "title matched \"jailbreak\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10582"
        },
        {
          "title": "RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems",
          "summary": "This paper demonstrates RUBEN, an interactive tool for discovering minimal rules to explain the outputs of retrieval-augmented large language models (LLMs) in data-driven applications. We leverage novel pruning strategies to efficiently identify a minimal set of rules that subsume all others. We further demonstrate novel applications of these rules for LLM safety, specifically to test the resiliency of safety training and effectiveness of adversarial prompt injections.",
          "authors": [
            "Joel Rorseth",
            "Parke Godfrey",
            "Lukasz Golab",
            "Divesh Srivastava",
            "Jarek Szlichta"
          ],
          "categories": [
            "cs.CL"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10862v1",
          "abstract_url": "https://arxiv.org/abs/2605.10862v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10862v1",
          "published_at": "2026-05-11T17:10:35+00:00",
          "updated_at": "2026-05-11T17:10:35+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "应用",
            "方法"
          ],
          "topics": [
            "LLM",
            "Language Model"
          ],
          "doi": null,
          "arxiv_id": "2605.10862",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10862v1"
          },
          "relevance_score": 48,
          "match_reasons": [
            "summary matched \"prompt injection\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10862"
        },
        {
          "title": "MATRA: Modeling the Attack Surface of Agentic AI Systems -- OpenClaw Case Study",
          "summary": "LLMs are increasingly deployed as autonomous agents with access to tools, databases, and external services, yet practitioners (across different sectors) lack systematic methods to assess how known threat classes translate into concrete risks within a specific agentic deployment. We present MATRA, a pragmatic threat modeling framework for agentic AI systems that adapts established risk assessment methodology to systematically assess how known LLM threats translate into deployment-specific risks. MATRA begins with an asset-based impact assessment and utilizes attack trees to determine the likelihood of these impacts occurring within the system architecture. We demonstrate MATRA on a personal AI agent deployment using OpenClaw, quantifying how architectural controls such as network sandboxing and least-privilege access reduce risk by limiting the blast radius of successful injections.",
          "authors": [
            "Tim Van hamme",
            "Thomas Vissers",
            "Javier Carnerero-Cano",
            "Mario Fritz",
            "Emil C. Lupu",
            "Lieven Desmet",
            "Dinil Mon Divakaran"
          ],
          "categories": [
            "cs.AI",
            "cs.CR"
          ],
          "paper_id": "http://arxiv.org/abs/2605.10763v1",
          "abstract_url": "https://arxiv.org/abs/2605.10763v1",
          "pdf_url": "https://arxiv.org/pdf/2605.10763v1",
          "published_at": "2026-05-11T15:58:37+00:00",
          "updated_at": "2026-05-11T15:58:37+00:00",
          "source": "arxiv",
          "date_label": "Published",
          "analysis": null,
          "tags": [
            "应用",
            "方法"
          ],
          "topics": [
            "LLM",
            "RAG"
          ],
          "doi": null,
          "arxiv_id": "2605.10763",
          "source_variants": [
            "arxiv"
          ],
          "source_urls": {
            "arxiv": "https://arxiv.org/abs/2605.10763v1"
          },
          "relevance_score": 47,
          "match_reasons": [
            "summary matched \"sandboxing\"",
            "has PDF",
            "has rich summary",
            "has complete metadata"
          ],
          "feedback_status": null,
          "feedback_note": null,
          "feedback_next_action": null,
          "feedback_due_date": null,
          "feedback_snoozed_until": null,
          "feedback_review_interval_days": null,
          "canonical_id": "arxiv:2605.10763"
        }
      ]
    }
  ]
}