<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>code generation benchmark Topic Archive</title>
<link>code-generation-benchmark.html</link>
<description>Long-term tracking RSS feed for the keyword "code generation benchmark", aggregating all previously matched papers.</description>
<language>en</language>
<lastBuildDate>Sun, 28 Jun 2026 05:24:06 +0000</lastBuildDate>
<item>
<title>VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination</title>
<link>../papers/arxiv-026cd79abeae.html</link>
<guid>https://arxiv.org/abs/2606.17999v1#2026-06-17#code-generation-benchmark</guid>
<pubDate>Wed, 17 Jun 2026 14:22:19 +0800</pubDate>
<description>MDLMs generate text by denoising a preallocated masked response canvas, making response-length modeling central to instruction tuning. Existing MDLMs often inherit the autoregressive convention of using repeated [EOS] tokens for padding during instruction tuning, giving [EOS] a dual role as both a semantic terminator and a padding token. We show that this dual role is a root cause of [EOS] overflow under large-block decoding. To decouple these roles, we propose VoidPa…</description>
</item>
<item>
<title>No Resource, No Benchmarks, No Problem? Evaluating and Improving LLMs for Code Generation in No-Resource Languages</title>
<link>../papers/arxiv-3ac1fcf1ccb2.html</link>
<guid>https://arxiv.org/abs/2606.16827v1#2026-06-16#code-generation-benchmark</guid>
<pubDate>Tue, 16 Jun 2026 14:38:43 +0800</pubDate>
<description>Large Language Models (LLMs) have significantly advanced the automation of software engineering tasks. One prominent example is code generation, where an LLM produces code in a specified programming language based on a natural language description. Most research in this area has focused on high-resource languages, such as Python or Java, which benefit from abundant training data. A smaller body of work has explored low-resource languages, which are underrepresented in training corpora. In contr…</description>
</item>
<item>
<title>Closing the Loop on Latent Reasoning via Test-Time Reconstruction</title>
<link>../papers/arxiv-d8f49ccdc82d.html</link>
<guid>https://arxiv.org/abs/2606.06252#2026-06-05#code-generation-benchmark</guid>
<pubDate>Fri, 05 Jun 2026 13:25:00 +0800</pubDate>
<description>Recent work moves intermediate reasoning from natural-language traces into latent or cache-level representations to reduce token overhead and avoid a discrete communication bottleneck. However, this shift also removes a key advantage of textual reasoning: intermediate states are no longer inspectable, making it difficult to determine whether a latent state still preserves the constraints of the original query. As a result, latent reasoning typically operates in an open loop, where a latent stat…</description>
</item>
<item>
<title>Syntax Without Semantics: Teaching Large Language Models to Code in an Unseen Language</title>
<link>../papers/arxiv-ddd1be9c8e89.html</link>
<guid>https://arxiv.org/abs/2605.15607#2026-05-18#code-generation-benchmark</guid>
<pubDate>Mon, 18 May 2026 13:13:17 +0800</pubDate>
<description>Large language models (LLMs) achieve high pass rates on code generation benchmarks, yet whether they can transfer this ability to languages absent from pretraining remains poorly understood. We introduce PyLang, a minimal imperative language absent from all pretraining corpora, and evaluate frontier models zero-shot and fine-tuned Qwen3 (4B, 8B, 32B) on 352 problems. We find that fine-tuning quickly teaches syntax but fails to transfer semantic competence: Python outperforms PyLang by up to 19%…</description>
</item>
</channel>
</rss>
