MLinfo | Machine Learning and AI Paper Digest

Reinforcement learning, Policy gradient (PPO / A3C), Classification, Text

paperless-ngx — A community-supported supercharged document management system: scan, index and archive all your documents

paperless-ngx is a community-supported document management system for scanning, indexing, and archiving documents.

Use: Document management
Difficulty: Easy
Cost: Low

gradio — Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

A Python library for building and sharing machine learning apps.

Reinforcement learning, Policy gradient (PPO / A3C), Image

Use: Building machine learning apps
Difficulty: Easy
Cost: Medium

MaaAssistantArknights — 《明日方舟》小助手，全日常一键长草！| A one-click tool for the daily tasks of Arknights, supporting all clients.

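An assistant tool for the game Arknights (明日方舟) that runs all daily tasks in one click.

Reinforcement learning, Policy gradient (PPO / A3C)

Use: Game assistant tool
Difficulty: Easy
Cost: Medium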

machine-learning-for-trading — Code for Machine Learning for Trading, 3rd edition — from data sourcing to live execution.

Code accompanying the book Machine Learning for Trading (3rd edition), covering the workflow from data sourcing to live execution.

Use: Machine learning for trading
Difficulty: Easy
Cost: High

stable-baselines3 — PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

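This repository is the PyTorch version of Stable Baselines and provides reliable implementations of reinforcement learning algorithms.

Use: Reinforcement learning algorithm implementations
Difficulty: Easy
Cost: High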

ART — Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen3.6, GPT-OSS, Llama, and more!

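ART is a trainer for multi-step agents. It uses GRPO to train agents with reinforcement learning on real-world tasks.

Natural language processing, Large language models, Reinforcement learning

Use: Reinforcement learning trainer for multi-step agents
Difficulty: Easy
Cost: High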

PufferLib — Puffing up reinforcement learning

Use: Reinforcement learning library
Difficulty: Easy
Cost: Medium

rllm — Democratizing Reinforcement Learning for LLMs

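This repository aims to make reinforcement learning for LLMs more accessible.

Natural language processing, Large language models, Reinforcement learning

Use: Reinforcement learning for LLMs
Difficulty: Easy
Cost: High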

arxiv | GitHub available | 2026-07-23

Workflow-Localized Mechanism Learning: Attribution-Guided Repair and Knowledge Reuse for Structured Agent Skills

Agent Skills package reusable procedural knowledge as external artifacts for frozen language-model agents, yet

MI-oriented, Reinforcement learning, Policy gradient (PPO / A3C)

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

github | GitHub available | 2026-07-23

qlib — Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, including supervised learning, market dynamics modeling, and RL, and is now equipped with https://github.com/microsoft/RD-Agent to automate R&D process.

Uses AI technology to build a quantitative investment platform.

Reinforcement learning, Policy gradient (PPO / A3C), Supervised learning

Use: Quantitative investment platform
Difficulty: Easy
Cost: Medium

github | GitHub available | 2026-07-23

FinGPT — FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.

This repository provides open-source financial large language models, with the trained model released on HuggingFace.

Natural language processing, Large language models, Text

Use: Open-source financial LLMs
Difficulty: Easy
Cost: High

github | GitHub available | 2026-07-23

ml-agents — The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.

A toolkit for training machine learning agents in Unity.

Computer vision, 3D / point cloud, 3D, Reinforcement learning

Use: Machine learning agents in Unity
Difficulty: Easy
Cost: High

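Explainable, Quality prediction / anomaly detection, Natural language processing, Large language models, Generation, Text, Reinforcement learning

arxiv | GitHub available | 2026-07-21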

Beyond Score Prediction: LLM-Based Essay Scoring and Feedback Generation via Reinforcement Learning with Rubric Rewards

Large language models (LLMs) have been widely applied to automated essay scoring (AES) and automated feedback

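Use: Generation
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Natural language processing, Fine-tuning, Translation, Text, Reinforcement learning

arxiv | GitHub available | 2026-07-21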

Reasoning Before Translation: Enhancing Legal Machine Translation with Structured Reasoning

This work applies structured reasoning before translation to improve legal machine translation.

Use: Legal machine translation
Difficulty: Hard
Cost: High

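MI-oriented, Quality prediction / anomaly detection, Natural language processing, Large language models, Image, Audio, Video

arxiv | GitHub available | 2026-07-21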

OmniReasoner: Thinking with Long Audio-Video via Native Tool Use

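Proposes OmniReasoner, a method that combines the original data with a Zoom-In tool. This improves the reasoning of omni-modal LLMs over long audio-video inputs.

Use: Improving reasoning over long audio-video
Difficulty: Hard
Cost: High

huggingface | Hugging Face available | 2026-07-21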

ISO: An RLVR-Native Optimization Stack

Reinforcement learning with verifiable rewards (RLVR) is rapidly advancing the reasoning capabilities of langu

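Deep learning, Normalization and optimization methods, Text, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

github | GitHub available | 2026-07-21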

Book-Mathematical-Foundation-of-Reinforcement-Learning — This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."

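Mathematical Foundations of Reinforcement Learning is a book covering the mathematical foundations of reinforcement learning.

Use: Book on the mathematical foundations of reinforcement learning
Difficulty: Easy
Cost: Medium

MI-oriented, Reinforcement learning, Policy gradient (PPO / A3C), Generation

arxiv | GitHub available | 2026-07-20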

MEVION: Low-Cost Open-Source Data Collection System for Powerful and High-Speed Dual-Arm Manipulation

The global competition for developing robotic foundation models is intensifying. Among the data collection sys

Use: Generation
Difficulty: Hard
Cost: Medium

huggingface | GitHub available | Hugging Face available | 2026-07-20

Differentiable Logic Gate Networks for Low-Latency EEG Classification on Edge Devices

Real-time EEG classification on edge devices is bottlenecked by the floating-point arithmetic of conventional

Easy to try on CPU, Reinforcement learning, Multi-agent, Classification, Detection

Use: Classification
Difficulty: Easy
Cost: Low

huggingface | Hugging Face available | 2026-07-20

ConsiSpace: Learning Geometric Consistency Matters for Video Spatial Reasoning

Video spatial reasoning is essential for navigation-oriented perception and long-video question answering, whe

Deep learning, Model compression and quantization, QA, Text, Video

Use: QA
Difficulty: Easy
Cost: High

github | GitHub available | 2026-07-20

Gymnasium — A standard API for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)

Gymnasium is an API that provides simulated environments for single-agent RL.

Use: Providing simulated RL environments
Difficulty: Easy
Cost: Medium
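
As a rough illustration of what a single-agent environment API of this kind looks like, the sketch below defines a toy environment whose `reset` and `step` methods return tuples in the shape Gymnasium uses: `(observation, info)` from `reset`, and `(observation, reward, terminated, truncated, info)` from `step`. The `CountdownEnv` class, the `run_episode` helper, and the reward values are hypothetical and do not come from the library. The code deliberately does not import Gymnasium, so it shows only the interaction pattern.

```python
class CountdownEnv:
    """Hypothetical toy environment: drive a counter down to zero."""

    def __init__(self, start=5, max_steps=20):
        self.start = start
        self.max_steps = max_steps
        self.state = start
        self.steps = 0

    def reset(self, seed=None):
        # seed is accepted only to mirror the interface; this toy env is deterministic
        self.state = self.start
        self.steps = 0
        return self.state, {}

    def step(self, action):
        # action 1 decrements the counter, any other action leaves it unchanged
        if action == 1:
            self.state -= 1
        self.steps += 1
        terminated = self.state == 0  # goal reached
        truncated = (not terminated) and self.steps >= self.max_steps  # step limit hit
        reward = 1.0 if terminated else -0.1
        return self.state, reward, terminated, truncated, {}


def run_episode(env, policy, seed=0):
    """Run one episode and return the sum of rewards."""
    obs, info = env.reset(seed=seed)
    total = 0.0
    done = False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total
```

Keeping `terminated` (the task ended) separate from `truncated` (a step limit cut the episode short) lets the caller tell a real terminal state apart from an artificial cutoff.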

huggingface | Hugging Face available | 2026-07-19

TimeLens2: Generalist Video Temporal Grounding with Multimodal LLMs

Video multimodal large language models (MLLMs) can describe what happens in a video, but rarely identify when

Natural language processing, Large language models, Detection, Text, Video

Use: Detection
Difficulty: Easy
Cost: High

huggingface | GitHub available | Hugging Face available | 2026-07-19

Distilled Reinforcement Learning for LLM Post-training

Large language model (LLM) post-training is essential for improving reasoning, adaptation, and alignment. Exis

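Explainable, Quality prediction / anomaly detection, Deep learning, Model compression and quantization, Text, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

Explainable, Reinforcement learning, Policy gradient (PPO / A3C), Image

arxiv | GitHub available | 2026-07-18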

SinD 2.0: A Multi-City UAV Dataset with Semantic Risk Annotations for SOTIF-Oriented Safety Validation at Signalized Intersections

Safety validation at signalized intersections remains a critical bottleneck for the deployment of autonomous d

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

huggingface | Hugging Face available | 2026-07-18

Group Entropy-Controlled Policy Optimization

Entropy control has become an effective tool in reinforcement learning (RL) of large language models (LLMs), h

Deep learning, Model compression and quantization, Generation, Text, Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: High

SeerGuard: A Safety Framework for Mobile GUI Agents via World Model Prediction

Mobile graphical user interface (GUI) agents have demonstrated remarkable capabilities in automating complex t

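Reinforcement learning, Model-based

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: Medium

Easy to try on CPU, Deep learning, Model compression and quantization, Multimodal, Reinforcement learning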

JoyNexus: Service-Oriented Multi-Tenant Post-Training for VLA Models

The post-training of Vision-Language-Action (VLA) models is essential due to the diversity of simulators, robo

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

Understanding Reasoning from Pretraining to Post-Training

Reinforcement learning (RL) has become central to improving large language models (LLMs) on complex reasoning

Natural language processing, Large language models, Text, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

When Does Muon Help Agentic Reinforcement Learning?

Muon is competitive with AdamW in large-scale pre-training, but its value for reinforcement-learning (RL) post

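Deep learning, Normalization and optimization methods, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High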

DSWorld: A Data Science World Model for Efficient Autonomous Agents

Despite strong capabilities in data understanding and decision-making, autonomous data science agents still he

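Deep learning, Model compression and quantization, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

github | GitHub available | 2026-07-17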

open_spiel — OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.

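A collection of environments and algorithms for general reinforcement learning in games.

Use: Environments and algorithms for reinforcement learning in games
Difficulty: Easy
Cost: Medium

huggingface | GitHub available | Hugging Face available | 2026-07-16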

On-Policy Delta Distillation

On-policy distillation is an alternative post-training method in reinforcement learning that alleviates the co

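Deep learning, Model compression and quantization, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-16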

Beyond Entropy: Correctness-Aware Advantage Shaping via Contrastive Policy Optimization

Reinforcement learning with verifiable rewards (RLVR) commonly uses entropy for advantage shaping. However, en

Deep learning, Model compression and quantization, Generation, Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: Medium

github | GitHub available | 2026-07-15

vowpal_wabbit — Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

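Vowpal Wabbit is a machine learning system built around techniques such as online learning, hashing, and allreduce. It is intended to be applicable to a wide range of problems.

Reinforcement learning, Text

Use: Machine learning system for online and interactive learning
Difficulty: Easy
Cost: Medium

huggingface | Hugging Face available | 2026-07-13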

SVR-R1: Bootstrapping Multi-modal Reasoning with Self-verification in Reinforcement Learning

We introduce Self-Verified Reasoner (SVR-R1), a multi-turn RL framework that turns a model's own verification

Computer vision, Segmentation, Generation, Multimodal, Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-12

Predictive Divergence Masks for LLM RL

Reinforcement learning for large language models (LLMs) typically relies on trust-region masks to stabilize of

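Deep learning, Model compression and quantization, Text, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-11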

Beyond Euclidean Clipping: Overcoming Exploration Collapse in LLM RL via Riemannian Isometric Policy Optimization

Reinforcement learning (RL) has become a dominant paradigm for enhancing LLMs' reasoning capabilities. However

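Natural language processing, Large language models, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-08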

DeepSearch-World: Self-Distillation for Deep Search Agents in a Verifiable Environment

Training tool-use agents to improve from their own experience remains challenging, as supervised fine-tuning r

Deep learning, Model compression and quantization, Generation, Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-08

Agon: Competitive Cross-Model RL with Implicit Rival Grading of Reasoning

Reinforcement learning from verifiable rewards (e.g. GRPO) is the engine behind today's reasoning models, yet

Computer vision, Segmentation, Text, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

github | GitHub available | 2026-07-08

deep-reinforcement-learning — Repo for the Deep Reinforcement Learning Nanodegree program

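A learning repository for the Deep Reinforcement Learning Nanodegree program.

Reinforcement learning, Model-free (DQN / SAC)

Use: Implementation and validation platform
Difficulty: Easy
Cost: Medium

arxiv | GitHub available | 2026-07-07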

FootsiesGym: A Fighting Game Benchmark for Two-Player Zero-Sum Imperfect-Information Games

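Treats neutral play in fighting games as an imperfect-information game and develops FootsiesGym, an open-source environment for imperfect-information games.

Deep learning, Model compression and quantization, Reinforcement learning

Use: Fighting game environment
Difficulty: Hard
Cost: High

arxiv | GitHub available | 2026-06-18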

Evolutionary Discovery of Developmental Reward Schedules in Deep Reinforcement Learning

The temporal structure of reward composition in reinforcement learning (RL) is typically hand-designed and hel

MI-oriented, Deep learning, Transformer, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

arxiv | GitHub available | 2026-06-18

Provably Sub-Linear Two-Timescale NeuroEvolution with Online Plasticity

NeuroEvolution of Augmenting Topologies (NEAT) is a widely used neuroevolution algorithm for learning neural n

Computer vision, Segmentation, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

huggingface | Hugging Face available | 2026-05-07

Masked Diffusion Language Models are Strong and Steerable Text-Based World Models for Agentic RL

Recent growth in reinforcement learning (RL) has surfaced a need for diverse, specialized training environment

Natural language processing, Large language models, Text, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High