Trajectory-Refined Distillation
On-policy distillation (OPD) has become a central post-training tool for large language models (LLMs), providi
- Use: technical validation and paper-reading aid
- Difficulty: Easy
- Cost: High
Search results for "LoRA"
20 results
On-policy distillation (OPD) has become a central post-training tool for large language models (LLMs), providi
Repository-level coding benchmarks such as SWE-bench have driven a rapid surge in the capabilities of coding a
Confidence-based loss weighting is usually avoided in generative models because it accelerates errors when the
Harmony is a compact symbolic layer where mathematical pitch relations, acoustic consonance, and musical conve
Agent systems increasingly use textual skills to encode reusable task procedures, but injecting these skills i
While Vision-Language Models (VLMs) have shown strong visual reasoning capabilities, their spatial reasoning a
Reasoning models produce long chain-of-thought traces that are costly to distill and encourage verbose student
Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing
Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery
Inference-time skill augmentation provides a lightweight way to improve data-analytic agents by injecting reus
Processing video in vision-language models is expensive: each frame occupies hundreds of tokens, and inference
Large Reasoning Models (LRMs) have achieved remarkable progress thanks to Reinforcement Learning with Verifiab
Agentic language model systems alternate between two structurally distinct step types: structured tool calls (
Deep-research agents solve tasks through long trajectories of search, tool use, evidence inspection, and answe
How can a population of agents self-orchestrate and self-adapt into stronger collective intelligence without c
Equivariance theory predicts that an architectural symmetry prior reduces sample complexity by a factor of |G|
We present Stable-Layers, a reinforcement learning framework that eliminates the need for paired supervision b
Recent video-based world models have made pixel-space environments interactive at the camera level: users can
Vision-Language Models (VLMs) are increasingly deployed in embodied environments, where they need to produce nume
Diffusion-based image editing has achieved strong visual fidelity under natural language instructions, yet mos