On the Geometry of On-Policy Distillation
On-policy distillation (OPD) is increasingly used to improve large language model reasoning, but its training…
- Use case: Detection
- Difficulty: Easy
- Cost: High
Search results for "Fine-tuning" (21 results)
On-policy distillation (OPD) is increasingly used to improve large language model reasoning, but its training…
Whisper, a widely adopted ASR model, is known to suffer from hallucinations: coherent transcriptions generate…
Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this…
Confidence-based loss weighting is usually avoided in generative models because it accelerates errors when the…
Hard-negative source selection for dense retrieval is usually decided only after fine-tuning and downstream…
Harmony is a compact symbolic layer where mathematical pitch relations, acoustic consonance, and musical…
Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing…
Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story progresses…
Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by…
Automatic Speech Recognition (ASR) has become a key technology for human-AI interaction. However, code-switch…
We introduce VideoKR, the first large-scale training corpus specifically designed to strengthen knowledge- and…
System prompt optimization improves agent behavior without modifying the underlying model, yielding human-readable…
Processing video in vision-language models is expensive: each frame occupies hundreds of tokens, and inference…
Reward models (RMs) provide critical feedback signals for LLM post-training, notably in reinforced fine-tuning…
Agentic language model systems alternate between two structurally distinct step types: structured tool calls…
How can a population of agents self-orchestrate and self-adapt into stronger collective intelligence without…
Reinforcement learning with verifiable rewards has rapidly advanced reasoning in vision-language models. However…
We present Stable-Layers, a reinforcement learning framework that eliminates the need for paired supervision…
Off-policy reinforcement learning of pretrained flow policies remains challenging due to the instability of…
Recent video-based world models have made pixel-space environments interactive at the camera level: users can…
Vision-Language Models (VLMs) are increasingly deployed in embodied environments, where they need to produce…