Trajectory-Refined Distillation
On-policy distillation (OPD) has become a central post-training tool for large language models (LLMs), providi
- Use case
- Technical validation and paper-reading aid
- Difficulty
- Easy
- Cost
- High
Search results for "RAG"
51 results
On-policy distillation (OPD) has become a central post-training tool for large language models (LLMs), providi
Repository-level coding benchmarks such as SWE-bench have driven a rapid surge in the capabilities of coding a
Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this
Large language models exhibit impressive zero-shot capabilities across a wide range of downstream tasks. Howev
We introduce MMAE, a Massive Multitask Audio Editing benchmark, serving as the first comprehensive evaluation
We present dots.tts, a 2B-parameter continuous autoregressive text-to-speech (TTS) foundation model that model
Retrieval-augmented QA pipelines often route retrieved passages through an LLM rewriter before a smaller reade
Image-to-Video diffusion models leverage input images to generate visually stunning content, yet frequently pr
Despite the rapid progress of Vision-Language Models (VLMs), the field lacks benchmarks that rigorously diagno
Reasoning models produce long chain-of-thought traces that are costly to distill and encourage verbose student
Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing
Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery
Video generation models have made impressive strides in synthesizing visually compelling content, yet their ou
Inference-time skill augmentation provides a lightweight way to improve data-analytic agents by injecting reus
Large language models can reproduce training data, but existing memorization evaluations mostly measure whethe
Temporal Grounding (TG) aims to localize video segments corresponding to a textual query. Prior research predo
Video event prediction (VEP) requires models to infer unobserved future states from partial video evidence. Ex
Large language models are increasingly used to simulate social media users and infer how individuals may respo
A situated query like "where is Lin Wei?" often encodes more than its literal content: the user may also want
AI research often requires decisions before future evidence exists: which bottleneck to attack, which directio
Recent AI systems have achieved strong results on a wide range of benchmarks, yet these gains have not transla
Muon improves training efficiency over Adam in large language-model training by about two times, but the local
Progress in genomic foundation models is difficult to assess due to fragmented benchmarks, incompatible evalua
Agents are widely deployed as assistants over documents, tools, and code. However, they typically act only on
World-action models (WAMs) jointly generate future video and robot actions through iterative diffusion, achiev
System prompt optimization improves agent behavior without modifying the underlying model, yielding human-read
As multi-modal models advance towards long-form video understanding, memory emerges as a critical capability.
Instruction-guided speech editing requires a model to modify specified speech attributes while preserving unre
Equipping Large Language Models (LLMs) to execute reliable multi-step workflows has become a central challenge
3D vision has rapidly evolved, driven by increasingly diverse data representations, learning paradigms, and mo
Selection is a core operation in interactive image editing. To be practical, a user should be able to specify
Recent progress in Large Language Model (LLM) agents has enabled promising advances in automated data science.
Large Reasoning Models (LRMs) have achieved remarkable progress thanks to Reinforcement Learning with Verifiab
High-quality pretraining data is a central ingredient in modern language models, but German-language resources
Memory is an indispensable capability for long-horizon LLM agents, enabling them to preserve and utilize infor
Existing benchmarks for MLLM-generated web artifacts assess interaction through local evidence and miss the re
Computer-use agents extend language models from text generation to sustained interaction with files, terminals
Video is temporally redundant: adjacent frames usually share most objects, background, and layout. Yet existin
On-policy distillation (OPD) in large language models is shifting from full-trace KL supervision toward more s
Abundant procedural knowledge on the Web holds great potential for helping agents solve long-horizon tasks. Ho
The rapid progress of frontier large language models has led to widespread benchmark saturation, limiting the
Equivariance theory predicts that an architectural symmetry prior reduces sample complexity by a factor of |G|
Reinforcement learning with verifiable rewards has rapidly advanced reasoning in vision-language models. Howe
Agentic search systems iteratively interact with retrieval models to answer complex queries. Despite substanti
Transfer learning aims to facilitate the learning of a target domain by transferring knowledge from a source d
Large Language Models exhibit paradoxical fragility in fundamental arithmetic, implying a disconnect between i
AI coding agents are increasingly used for scientific work, but their end-to-end autonomous research capabilit
Mixture-of-Experts (MoE) is now the dominant architecture for frontier language models, yet it requires all ex
Language models can use verifiable rewards to improve at a wide variety of reasoning tasks. However, both para
We present DEI: Diversity in Evolutionary Inference, a distributed Quality-Diversity (QD) search framework tha
Off-policy reinforcement learning of pretrained flow policies remains challenging due to the instability of op