On the Geometry of On-Policy Distillation
On-policy distillation (OPD) is increasingly used to improve large language model reasoning, but its training
- Use case
- Detection
- Difficulty
- Easy
- Cost
- High
Search results for "supervised"
14 results
On-policy distillation (OPD) is increasingly used to improve large language model reasoning, but its training
Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this
Confidence-based loss weighting is usually avoided in generative models because it accelerates errors when the
Latent visual reasoning (LVR) inserts supervised latent tokens between perception and answer generation in vis
Persistent AI assistants, such as OpenClaw, accumulate large collections of related memories over long-term in
Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by under
Inference-time skill augmentation provides a lightweight way to improve data-analytic agents by injecting reus
3D vision has rapidly evolved, driven by increasingly diverse data representations, learning paradigms, and mo
Memory is an indispensable capability for long-horizon LLM agents, enabling them to preserve and utilize infor
Training accurate medical image segmentation models requires large amounts of densely annotated data, which is
Weak-to-strong generalization studies how to improve a strong student using supervision from a weaker teacher
Reinforcement learning with verifiable rewards has rapidly advanced reasoning in vision-language models. Howe
Transfer learning aims to facilitate the learning of a target domain by transferring knowledge from a source d
Memory-augmented LLM agents tackle complex long-horizon tasks by recursively summarizing interaction trajector