MLinfo | Machine Learning and AI Paper Summaries

On the Geometry of On-Policy Distillation

On-policy distillation (OPD) is increasingly used to improve large language model reasoning, but its training

Deep learning, Model compression and quantization, Detection, Generation, Text

Use case: Detection
Difficulty: Easy
Cost: High

Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

Whisper, a widely adopted ASR model, is known to suffer from hallucinations - coherent transcriptions generate

Natural language processing, Fine-tuning, Detection, Audio

Use case: Detection
Difficulty: Easy
Cost: Medium

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this

Deep learning, Transformer, Reinforcement learning

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

Sensor/time series, Natural language processing, Fine-tuning, Generation, Text, Audio

Entropy as a Structural Prior: How a Log-Barrier on DiT Belief Space Drives Musical Diversity and Development

Confidence-based loss weighting is usually avoided in generative models because it accelerates errors when the

Use case: Generation
Difficulty: Easy
Cost: High

ECI_{sem}: Semantic Residual Effective Contrastive Information for Evaluating Hard Negatives

Hard-negative source selection for dense retrieval is usually decided only after fine-tuning and downstream ev

Deep learning, Transformer, Search, Text

Use case: Search
Difficulty: Easy
Cost: High

Explainability, Sensor/time series, Quality prediction/anomaly detection, Deep learning, Transformer, Classification, Text, Audio

How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity? Capabilities and Boundaries in Multi-Genre Chord-Symbol Modeling

Harmony is a compact symbolic layer where mathematical pitch relations, acoustic consonance, and musical conve

Use case: Classification
Difficulty: Easy
Cost: Low

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing

Deep learning, RNN / LSTM, Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?

Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story prog

Natural language processing, Fine-tuning, Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: Low

Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation

Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by under

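Deep learning, Model compression and quantization, Text, Reinforcement learning

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High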

Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs

Automatic Speech Recognition (ASR) has become a key technology for human-AI interaction. However, code-switch

Natural language processing, Fine-tuning, Classification, Generation, Audio

Use case: Classification
Difficulty: Easy
Cost: Low

huggingface, Hugging Face available, 2026-06-03

VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding

We introduce VideoKR, the first large-scale training corpus specifically designed to strengthen knowledge- and

Natural language processing, Fine-tuning, Generation, Text, Video

Use case: Generation
Difficulty: Easy
Cost: High

huggingface, Hugging Face available, 2026-06-03

SePO: Self-Evolving Prompt Agent for System Prompt Optimization

System prompt optimization improves agent behavior without modifying the underlying model, yielding human-read

Natural language processing, RAG, Generation, Text

Use case: Generation
Difficulty: Easy
Cost: High

huggingface, Hugging Face available, 2026-06-03

Video2LoRA: Parametric Video Internalization for Vision-Language Models

Processing video in vision-language models is expensive: each frame occupies hundreds of tokens, and inference

Natural language processing, Fine-tuning, Summarization, QA, Image

Use case: Summarization
Difficulty: Easy
Cost: High

huggingface, GitHub available, Hugging Face available, 2026-06-02

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

Reward models (RMs) provide critical feedback signals for LLM post-training, notably in reinforced fine-tuning

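Natural language processing, Large language models, Reinforcement learning

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High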

huggingface, Hugging Face available, 2026-06-01

LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models

Agentic language model systems alternate between two structurally distinct step types: structured tool calls (

Quality prediction/anomaly detection, Deep learning, Transformer, Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface, Hugging Face available, 2026-06-01

Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions

How can a population of agents self-orchestrate and self-adapt into stronger collective intelligence without c

Natural language processing, Fine-tuning

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: Medium

huggingface, Hugging Face available, 2026-05-30

SDR: Set-Distance Rewards for Radiology Report Generation

Reinforcement learning with verifiable rewards has rapidly advanced reasoning in vision-language models. Howe

Quality prediction/anomaly detection, Deep learning, Transformer, Generation, Text, Reinforcement learning

Use case: Generation
Difficulty: Easy
Cost: High

huggingface, Hugging Face available, 2026-05-28

Stable-Layers: Fine-Tuning Image Layer Decomposition Models with VLM-Scored Reinforcement Learning

We present Stable-Layers, a reinforcement learning framework that eliminates the need for paired supervision b

Natural language processing, Fine-tuning, Image, Text, Multimodal

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface, Hugging Face available, 2026-05-26

Trust Region Q Adjoint Matching

Off-policy reinforcement learning of pretrained flow policies remains challenging due to the instability of op

Natural language processing, RAG, Reinforcement learning

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface, Hugging Face available, 2026-05-24

WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models

Recent video-based world models have made pixel-space environments interactive at the camera level: users can

Natural language processing, Fine-tuning, Generation, Image, Video

Use case: Generation
Difficulty: Easy
Cost: High

huggingface, Hugging Face available, 2026-05-22

SPACENUM: Revisiting Spatial Numerical Understanding in VLMs

Vision-Language Models (VLMs) are increasingly deployed in embodied environments, where they need to produce nume

Natural language processing, Fine-tuning, Image, Text, Multimodal

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High