MLinfo | Machine Learning & AI Paper Roundup

Use case: AI engineering
Difficulty: Easy
Cost: High

gradio — Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

A library for building and sharing machine learning apps entirely in Python.

Tags: Reinforcement learning · Policy gradient (PPO / A3C) · Image

Use case: Building machine learning apps
Difficulty: Easy
Cost: Medium

Tags: Reinforcement learning · Policy gradient (PPO / A3C) · Classification · Text

paperless-ngx — A community-supported supercharged document management system: scan, index and archive all your documents

paperless-ngx is a community-supported, supercharged document management system that lets you scan, index, and archive all your documents.

Use case: Document management
Difficulty: Easy
Cost: Low

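MaaAssistantArknights — Arknights (《明日方舟》) assistant: clear all daily tasks with one click! | A one-click tool for the daily tasks of Arknights, supporting all clients.

A helper tool for the game Arknights that runs all daily tasks in one go.

Tags: Reinforcement learning · Policy gradient (PPO / A3C)

Use case: Game helper tool
Difficulty: Easy
Cost: Medium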

Gymnasium — An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)

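Gymnasium is an API standard for single-agent reinforcement learning that provides simulated environments.

Use case: Providing simulated environments
Difficulty: Easy
Cost: Medium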
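Gymnasium's standard interface is `reset(seed=...)`, which returns `(observation, info)`, and `step(action)`, which returns `(observation, reward, terminated, truncated, info)`. The sketch below mimics those signatures with a hypothetical toy environment (`CorridorEnv` is not part of Gymnasium) in plain Python, so it runs without installing the library; the corridor layout and reward are illustrative assumptions.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class CorridorEnv:
    """Hypothetical 1-D corridor that mimics Gymnasium's reset/step signatures.

    The agent starts at position 0 and must reach `goal`. Action 1 moves right,
    action 0 moves left (position is clamped at 0). Episodes are truncated
    after `max_steps` steps.
    """

    def __init__(self, goal: int = 4, max_steps: int = 20) -> None:
        self.goal = goal
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self, seed: Optional[int] = None) -> Tuple[int, Dict[str, Any]]:
        # Returns (observation, info). This env is deterministic, so `seed` is
        # accepted only for signature parity with Gymnasium.
        self.pos = 0
        self.steps = 0
        return self.pos, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        # Returns (observation, reward, terminated, truncated, info).
        self.steps += 1
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        terminated = self.pos == self.goal        # task-defined end: goal reached
        truncated = self.steps >= self.max_steps  # external end: time limit hit
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}


def run_episode(env: CorridorEnv, policy: Callable[[int], int], seed: int = 0) -> float:
    """The usual agent-environment loop: act until terminated or truncated."""
    obs, info = env.reset(seed=seed)
    total_reward = 0.0
    while True:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += reward
        if terminated or truncated:
            return total_reward


if __name__ == "__main__":
    print(run_episode(CorridorEnv(), policy=lambda obs: 1))  # always move right -> 1.0
```

With Gymnasium installed, `gymnasium.make("CartPole-v1")` returns an environment with the same `reset`/`step` signatures, so the same loop applies. Keeping `terminated` and `truncated` separate matters for learning: a time-limit cutoff is not a true terminal state, so value bootstrapping should usually continue through it.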

ART — Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen3.6, GPT-OSS, Llama, and more!

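ART is a trainer for multi-step agents: it uses GRPO to train agents with reinforcement learning on multi-step, real-world tasks.

Use case: Reinforcement learning for multi-step agents
Difficulty: Easy
Cost: High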
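GRPO (Group Relative Policy Optimization) does without a learned value baseline: it samples a group of rollouts for the same task and uses each rollout's reward relative to the group, normalized by the group's standard deviation, as its advantage. A minimal sketch of that advantage computation in plain Python; it illustrates the general technique, is not ART's API, and the `eps` value is an assumption.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages for one group of rollouts sampled for the same task.

    Each reward is centered on the group mean and scaled by the group's
    (population) standard deviation, so no separate value network is needed.
    `eps` avoids division by zero when every rollout gets the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four trajectories of one agent on the same task, scored 1 (success) or 0 (failure).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # approximately [1, -1, -1, 1]
```

Rollouts that beat their group's average get positive advantages and are reinforced; the rest are pushed down. Implementations differ on details such as sample vs. population standard deviation, so `pstdev` here is one choice among several.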

PufferLib — Puffing up reinforcement learning

Use case: Reinforcement learning library
Difficulty: Easy
Cost: Medium

RLinf — RLinf: Reinforcement Learning Infrastructure for Embodied and Agentic AI

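RLinf provides reinforcement learning infrastructure for embodied and agentic AI.

Use case: RL infrastructure for embodied and agentic AI
Difficulty: Easy
Cost: Medium

Tags: Quality prediction / anomaly detection · Natural language processing · Large language models · Video · Reinforcement learning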

Claw-R1: A Step-Level Data Middleware System for Agentic Reinforcement Learning

Agentic reinforcement learning (RL) has become an important post-training paradigm for turning LLMs from stati

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Stabilizing On-Policy Distillation for MLLM Reasoning with Global Normalization

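On-policy distillation has recently become an important area of post-training research. It uses a strong teacher model to give dense, fine-grained supervision on the student's own trajectories. However, at the token level, …

Tags: Deep learning · Model compression / quantization · Multimodal · Reinforcement learning

Use case: On-policy distillation
Difficulty: Hard
Cost: High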

Collaborative Human-Agent Protocol (CHAP)

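This paper proposes a new framework that accounts for decentralized communities in human-agent collaboration, so that decentralized human-agent collaboration can be designed more effectively.

Tags: Reinforcement learning · Multi-agent · Generation

Use case: Decentralized human-agent collaboration
Difficulty: Hard
Cost: Medium

Tags: Reinforcement learning · Policy gradient (PPO / A3C) · Image · Text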

PhysScene: A Scene Graph Dataset for Scientific Visual Reasoning in Physics Experiments

Scene Graphs (SGs) provide structured representations of visual scenes by modeling objects and their pairwise

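Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Tags: Quality prediction / anomaly detection · Deep learning · Transformer · Detection · Image · Text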

Temporal-Aware Reasoning Optimization for Video Temporal Grounding

Multi-modal Large Language Models (MLLMs) have achieved remarkable progress in video temporal grounding with r

Use case: Detection
Difficulty: Hard
Cost: High

github · GitHub available · 2026-06-08

ml-agents — The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.

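A tool for training machine learning agents with Unity.

Tags: Computer vision · 3D / point clouds · 3D · Reinforcement learning

Use case: Training machine learning agents in Unity
Difficulty: Easy
Cost: High

Tags: Explainable · Quality prediction / anomaly detection · Reinforcement learning · Model-free (DQN / SAC)

arxiv · GitHub available · 2026-06-07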

Scaling Decision-Focused Learning to Large Problems with Lagrangian Decomposition

Decision-focused learning has shown great promise for addressing predict-then-optimize problems, particularly

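Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Tags: Explainable · Quality prediction / anomaly detection · Natural language processing · Large language models · Image · Text · Multimodal

arxiv · GitHub available · 2026-06-06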

Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?

Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in visual understanding, yet the

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

github · GitHub available · 2026-06-06

Awesome-Process-Reward-Models — A comprehensive collection of process reward models.

A comprehensive collection of process reward models.

Tags: Reinforcement learning · RLHF

Use case: Surveying process reward models
Difficulty: Easy
Cost: Medium
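A process reward model (PRM) scores each intermediate step of a solution rather than only the final answer. To rank whole solutions, the step scores are usually aggregated; the sketch below shows two common aggregations (minimum and product), assuming each step already has a score in [0, 1]. The PRM that would produce those scores is not modeled here, and the example scores are made up.

```python
import math
from typing import Callable, List


def aggregate_min(step_scores: List[float]) -> float:
    """A solution is only as good as its weakest step."""
    return min(step_scores)


def aggregate_product(step_scores: List[float]) -> float:
    """Treat step scores as independent probabilities that each step is correct."""
    return math.prod(step_scores)


def best_of_n(solutions: List[List[float]],
              aggregate: Callable[[List[float]], float] = aggregate_min) -> int:
    """Return the index of the candidate solution with the highest aggregated score."""
    return max(range(len(solutions)), key=lambda i: aggregate(solutions[i]))


if __name__ == "__main__":
    candidates = [
        [0.9, 0.8, 0.2],  # strong start, one bad step at the end
        [0.7, 0.7, 0.7],  # uniformly decent steps
    ]
    print(best_of_n(candidates))                     # min: 0.2 vs 0.7 -> 1
    print(best_of_n(candidates, aggregate_product))  # product: 0.144 vs 0.343 -> 1
```

The minimum is strict about a single faulty step, while the product also penalizes long chains of merely adequate steps; which works better is an empirical question per task.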

huggingface · Hugging Face available · 2026-06-05

On the Geometry of On-Policy Distillation

On-policy distillation (OPD) is increasingly used to improve large language model reasoning, but its training

Tags: Deep learning · Model compression / quantization · Detection · Generation · Text

Use case: Detection
Difficulty: Easy
Cost: High
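In on-policy distillation, the student generates its own sequence, and at each generated position its next-token distribution is pulled toward the teacher's; a common choice of divergence is the reverse KL, KL(student ‖ teacher). The sketch below computes a mean per-token reverse KL over toy distributions in plain Python. It is a generic formulation assumed for illustration, not necessarily the objective analyzed in this paper, and the probabilities are made up.

```python
import math
from typing import List, Sequence


def reverse_kl(student: Sequence[float], teacher: Sequence[float]) -> float:
    """KL(student || teacher) for one next-token distribution over the vocabulary.

    Tokens the student gives zero mass contribute nothing (0 * log 0 = 0).
    The teacher must give nonzero mass wherever the student does, otherwise
    the KL is infinite (here: ZeroDivisionError).
    """
    return sum(p * math.log(p / q) for p, q in zip(student, teacher) if p > 0)


def opd_loss(student_dists: List[Sequence[float]],
             teacher_dists: List[Sequence[float]]) -> float:
    """Mean per-token reverse KL along one student-generated sequence.

    Both lists hold one distribution per position of a sequence the *student*
    sampled (that is what makes the distillation on-policy); the teacher only
    scores those positions.
    """
    kls = [reverse_kl(s, t) for s, t in zip(student_dists, teacher_dists)]
    return sum(kls) / len(kls)


if __name__ == "__main__":
    # Toy 3-token vocabulary, 2 generated positions.
    student = [[0.7, 0.2, 0.1], [0.5, 0.5, 0.0]]
    teacher = [[0.7, 0.2, 0.1], [0.9, 0.05, 0.05]]
    print(opd_loss(student, teacher))  # first position matches, so only the second contributes
```

Reverse KL is mode-seeking: it penalizes the student for putting mass where the teacher puts little, rather than for failing to cover everything the teacher considers plausible.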

huggingface · Hugging Face available · 2026-06-05

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this

Tags: Deep learning · Transformer · Reinforcement learning

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-06-05

DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning

Deep Research (DR) has emerged as a new agentic paradigm to tackle complex, open-ended research tasks, demandi

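Tags: Quality prediction / anomaly detection · Reinforcement learning · Multi-agent · Generation

Use case: Generation
Difficulty: Easy
Cost: Low

Tags: Easy to try on CPU · Reinforcement learning · Policy gradient (PPO / A3C) · Regression · Text

arxiv · GitHub available · 2026-06-04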

TorchKM: A GPU-Oriented Library for Kernel Learning and Model Selection

TorchKM is an open-source library for kernel machines, including support vector machines, kernel logistic regr

Use case: Regression
Difficulty: Hard
Cost: High
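Kernel machines such as SVMs and kernel logistic regression work with a Gram matrix K[i][j] = k(x_i, x_j) instead of raw features, and building that dense n × n matrix is the kind of workload a GPU-oriented library speeds up. A pure-Python sketch of an RBF (Gaussian) Gram matrix for illustration; `gamma` is an assumed hyperparameter, and nothing here uses TorchKM's actual API.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(points: List[Sequence[float]], gamma: float = 0.5) -> List[List[float]]:
    """Symmetric n x n matrix of pairwise kernel values consumed by kernel machines."""
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # exploit symmetry: compute the upper triangle only
            K[i][j] = K[j][i] = rbf_kernel(points[i], points[j], gamma)
    return K


if __name__ == "__main__":
    for row in gram_matrix([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]):
        print([round(v, 4) for v in row])
```

The cost grows as O(n² · d) for n points in d dimensions, which is why batched tensor implementations on a GPU pay off as datasets grow.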

huggingface · Hugging Face available · 2026-06-04

Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation

Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by under

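Tags: Deep learning · Model compression / quantization · Text · Reinforcement learning

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-06-03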

Self-Evaluation Is Already There: Eliciting Latent Judge Calibration in Base LLMs with Minimal Data

Large language models are increasingly evaluated by other models, raising a natural question: can a model pred

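Tags: Suited to small datasets · Quality prediction / anomaly detection · Deep learning · Model compression / quantization · Text · Reinforcement learning

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-06-03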

Reinforcement Learning from Rich Feedback with Distributional DAgger

Reasoning models have advanced rapidly, but the dominant reinforcement learning from verifiable rewards (RLVR)

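Tags: Deep learning · Model compression / quantization · Reinforcement learning

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: Medium

huggingface · Hugging Face available · 2026-06-03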

Audio Interaction Model

Audio is an inherently interactive modality, yet today's Large Audio Language Models (LALMs) are offline, and

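Tags: Reinforcement learning · Multi-agent · Text · Audio

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

huggingface · GitHub available · Hugging Face available · 2026-06-03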

Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning

Rubric-based reinforcement learning (RL) uses an LLM-as-a-Judge (LaaJ) to score model outputs according to rub

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High
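In rubric-based RL, a judge model checks a response against a list of rubric criteria, and the per-criterion verdicts are combined into a scalar reward. The sketch below shows only that aggregation step; `keyword_judge` is a hypothetical stand-in for the LLM-as-a-Judge, and the criteria, keywords, and weights are illustrative assumptions. A judge this shallow also shows how reward hacking arises: a response can satisfy the checks without being a good answer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    description: str  # what the judge is asked to check
    keyword: str      # used only by the toy judge below (hypothetical)
    weight: float


def rubric_reward(response: str, rubric: List[Criterion],
                  judge: Callable[[str, Criterion], float]) -> float:
    """Weighted average of per-criterion judge scores, each assumed to lie in [0, 1]."""
    total_weight = sum(c.weight for c in rubric)
    return sum(c.weight * judge(response, c) for c in rubric) / total_weight


def keyword_judge(response: str, criterion: Criterion) -> float:
    """Hypothetical stand-in for an LLM judge: passes if the criterion's keyword appears."""
    return 1.0 if criterion.keyword.lower() in response.lower() else 0.0


if __name__ == "__main__":
    rubric = [
        Criterion("Gives the final answer with units", "m/s", 2.0),
        Criterion("Cites a source", "source:", 1.0),
    ]
    print(rubric_reward("The speed is 12 m/s.", rubric, keyword_judge))  # 2/3
    # A degenerate response that games the shallow judge gets full marks:
    print(rubric_reward("m/s source:", rubric, keyword_judge))           # 1.0
```

Detecting this kind of hacking typically means comparing the rubric reward against an independent quality signal and flagging responses where the two diverge.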

github · GitHub available · 2026-06-03

dm_control — Google DeepMind's software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.

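A tool that provides physics-based simulation and reinforcement learning environments built on MuJoCo.

Use case: Physics-based simulation and RL environments
Difficulty: Easy
Cost: Medium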

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

Reward models (RMs) provide critical feedback signals for LLM post-training, notably in reinforced fine-tuning

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management

Recent progress in Large Language Model (LLM) agents has enabled promising advances in automated data science.

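Tags: Deep learning · Model compression / quantization · Text · Reinforcement learning

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High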

ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning

Large Reasoning Models (LRMs) have achieved remarkable progress thanks to Reinforcement Learning with Verifiab

Tags: Deep learning · Transformer · Reinforcement learning

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: Low

Self-Distilled Policy Gradient

On-policy self-distillation, where a language model conditions on privileged context to supervise its own gene

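Tags: Deep learning · Model compression / quantization · Generation · Text · Reinforcement learning

Use case: Generation
Difficulty: Easy
Cost: Medium

Tags: Quality prediction / anomaly detection · Natural language processing · Large language models · Text · Self-supervised · Reinforcement learning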

MemTrain: Self-Supervised Context Memory Training

Memory is an indispensable capability for long-horizon LLM agents, enabling them to preserve and utilize infor

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching

Wide-baseline matching (WBM) requires integrating geometric understanding, viewpoint changes, fine-grained per

Tags: Natural language processing · Large language models · Generation · Image · Text

Use case: Generation
Difficulty: Easy
Cost: High

Large Language Models Hack Rewards, and Society

Reinforcement learning (RL) has become a dominant post-training paradigm, enabling large language models (LLMs

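Tags: Natural language processing · Large language models · Generation · Text · Reinforcement learning

Use case: Generation
Difficulty: Easy
Cost: High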

Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning

Large language models improve final-answer accuracy through extended chain-of-thought reasoning, but often spe

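Tags: Deep learning · Model compression / quantization · Generation · Text · Reinforcement learning

Use case: Generation
Difficulty: Easy
Cost: High

github · GitHub available · 2026-06-01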

FinGPT — FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.

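This repository provides open-source financial large language models, with the trained models released on Hugging Face.

Tags: Natural language processing · Large language models · Text

Use case: Financial large language models
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-05-30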

SDR: Set-Distance Rewards for Radiology Report Generation

Reinforcement learning with verifiable rewards has rapidly advanced reasoning in vision-language models. Howe

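Tags: Quality prediction / anomaly detection · Deep learning · Transformer · Generation · Text · Reinforcement learning

Use case: Generation
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-05-29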

Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination

Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as the cornerstone for shaping the

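Tags: Quality prediction / anomaly detection · Natural language processing · Large language models · Generation · Text · Reinforcement learning

Use case: Generation
Difficulty: Easy
Cost: High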
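In code RLVR, the reward is verifiable because it comes from executing the policy's program against test cases rather than from a learned reward model. A minimal sketch of such a reward, assuming the candidate program defines a function named `solve` and that a binary all-tests-pass reward is used; both are illustrative assumptions, not this paper's setup. A real pipeline would also run untrusted code in a sandbox with a timeout instead of calling `exec` directly.

```python
from typing import Any, List, Tuple


def verifiable_reward(program: str, tests: List[Tuple[tuple, Any]],
                      entry_point: str = "solve") -> float:
    """Return 1.0 if `program` defines `entry_point` and passes every test, else 0.0.

    WARNING: exec() runs arbitrary code with no isolation and no timeout;
    this is for illustration only.
    """
    namespace: dict = {}
    try:
        exec(program, namespace)
        fn = namespace[entry_point]
        for args, expected in tests:
            if fn(*args) != expected:
                return 0.0  # any failing test makes the whole rollout fail
    except Exception:
        return 0.0  # syntax errors, a missing function, and runtime errors all score zero
    return 1.0


if __name__ == "__main__":
    tests = [((2, 3), 5), ((-1, 1), 0)]
    print(verifiable_reward("def solve(a, b):\n    return a + b\n", tests))  # 1.0
    print(verifiable_reward("def solve(a, b):\n    return a * b\n", tests))  # 0.0
```

A binary reward is hard to game but sparse; partial credit (fraction of tests passed) is a common alternative when most early rollouts fail.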

github · GitHub available · 2026-05-29

PaLM-rlhf-pytorch — Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

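This repository implements RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. It is basically ChatGPT, but with PaLM.

Tags: Deep learning · Transformer · Reinforcement learning

Use case: Training LLMs with human feedback
Difficulty: Easy
Cost: High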
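RLHF typically first trains a reward model on pairs of responses ranked by humans, then optimizes the policy against that reward model (often with PPO). A common reward-model objective is the pairwise Bradley-Terry loss, -log σ(r_chosen − r_rejected). The sketch below computes it on scalar rewards in plain Python; it illustrates the general technique and is not taken from the PaLM-rlhf-pytorch code.

```python
import math
from typing import List, Tuple


def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry reward-model loss: -log(sigmoid(r_chosen - r_rejected)).

    Computed as softplus(-margin) = log(1 + exp(-margin)), branching on the sign
    of the margin so exp() never overflows.
    """
    margin = r_chosen - r_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(pairs: List[Tuple[float, float]]) -> float:
    """Average loss over (preferred-response reward, rejected-response reward) pairs."""
    return sum(pairwise_preference_loss(c, r) for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    print(pairwise_preference_loss(2.0, 0.0))  # small: preferred response already ranked higher
    print(pairwise_preference_loss(0.0, 2.0))  # large: ranking is the wrong way round
```

Minimizing this loss pushes the reward model to score the human-preferred response above the rejected one; the trained reward model then supplies the reward signal for the RL stage.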

huggingface · Hugging Face available · 2026-05-28

Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents

Memory-augmented LLM agents tackle complex long-horizon tasks by recursively summarizing interaction trajector

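Tags: Quality prediction / anomaly detection · Natural language processing · Large language models · Text · Self-supervised · Reinforcement learning

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-05-28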

Stable-Layers: Fine-Tuning Image Layer Decomposition Models with VLM-Scored Reinforcement Learning

We present Stable-Layers, a reinforcement learning framework that eliminates the need for paired supervision b

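Tags: Natural language processing · Fine-tuning · Image · Text · Multimodal

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-05-26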

Trust Region Q Adjoint Matching

Off-policy reinforcement learning of pretrained flow policies remains challenging due to the instability of op

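Tags: Natural language processing · RAG · Reinforcement learning

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

github · GitHub available · 2026-05-26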

Book-Mathematical-Foundation-of-Reinforcement-Learning — This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."

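Mathematical Foundations of Reinforcement Learning is a book on the mathematical foundations of reinforcement learning; this repository is its homepage.

Use case: Studying the mathematical foundations of reinforcement learning
Difficulty: Easy
Cost: Medium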
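For a sense of the material such a book formalizes: the Bellman expectation equation writes the value of a state s under a policy π as the expected immediate reward plus the discounted value of the next state. It is shown below in standard textbook notation (π the policy, p the environment dynamics, γ the discount factor), which may differ from the book's own notation.

```latex
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s',\, r} p(s', r \mid s, a)\,\bigl[\, r + \gamma\, v_\pi(s') \,\bigr]
```

Replacing the average over actions with a maximum gives the Bellman optimality equation for v_*, the basis of value iteration.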

github · GitHub available · 2026-05-26

deep-rl-class — This repo contains the Hugging Face Deep Reinforcement Learning Course.

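This repository contains the Hugging Face Deep Reinforcement Learning Course.

Use case: Learning deep reinforcement learning
Difficulty: Easy
Cost: Medium

github · GitHub available · 2026-05-23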

open_spiel — OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.

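A collection of environments and algorithms for research in general reinforcement learning and search/planning in games.

Use case: Reinforcement learning and search/planning research in games
Difficulty: Easy
Cost: Medium

github · GitHub available · 2026-05-20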

awesome-RLHF — A curated list of reinforcement learning with human feedback resources (continually updated)

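A curated, continually updated list of resources on reinforcement learning from human feedback (RLHF).

Tags: Reinforcement learning · RLHF

Use case: Surveying RLHF resources
Difficulty: Easy
Cost: Medium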