MLinfo | Machine Learning and AI Paper Digest

Reinforcement learning, Policy gradient (PPO / A3C), Classification, Text

paperless-ngx — A community-supported supercharged document management system: scan, index and archive all your documents

paperless-ngx is a community-supported, supercharged document management system that can scan, index, and archive documents.

Use: Document management
Difficulty: Easy
Cost: Low

gradio — Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

A Python library for building and sharing machine learning apps.

Reinforcement learning, Policy gradient (PPO / A3C), Image

Use: Building machine learning apps
Difficulty: Easy
Cost: Medium

MaaAssistantArknights — 《明日方舟》小助手，全日常一键长草！| A one-click tool for the daily tasks of Arknights, supporting all clients.

A helper tool for the game Arknights (明日方舟) that runs all daily tasks in one go.

Use: Game helper tool
Difficulty: Easy
Cost: Medium

machine-learning-for-trading — Code for Machine Learning for Trading, 3rd edition — from data sourcing to live execution.

Code for the book Machine Learning for Trading, 3rd edition, covering the workflow from data sourcing to live execution.

Use: Machine learning for trading
Difficulty: Easy
Cost: High

stable-baselines3 — PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

This repository provides reliable PyTorch implementations of reinforcement learning algorithms.

Use: Reinforcement learning algorithm implementations
Difficulty: Easy
Cost: High

ART — Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen3.6, GPT-OSS, Llama, and more!

ART is a reinforcement learning trainer for multi-step agents. It uses GRPO to train agents on real-world tasks.

Use: Training multi-step agents with reinforcement learning
Difficulty: Easy
Cost: High

PufferLib — Puffing up reinforcement learning

Use: Reinforcement learning library
Difficulty: Easy
Cost: Medium

rllm — Democratizing Reinforcement Learning for LLMs

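This repository aims to make reinforcement learning for LLMs broadly accessible.

Use: Reinforcement learning for LLMs
Difficulty: Easy
Cost: High

Quality prediction / anomaly detection, Natural language processing, Large language models, Image, Text, Multimodal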

MIRROR: Learning from the Other View for Multi-Modal Reasoning

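Proposes MIRROR (Learning from the Other View), a new approach to multimodal understanding. MIRROR provides equivalent views from text, figures, and combinations of text and figures.

Use: Developing multimodal understanding techniques
Difficulty: Hard
Cost: High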

Compact Latent Coordination for Autonomous Vehicles at Unsignalized Intersections

Coordinating autonomous vehicles at unsignalized intersections remains a critical challenge for multi-agent re

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Low

How Many Bits Can an Adapter Write? Measuring the Capacity and Memorization of Parameter-Efficient Fine-Tuning

Use: Measuring the capacity of parameter-efficient fine-tuning
Difficulty: Hard
Cost: Medium

Approximate Quantum State Preparation Through Proximal Policy Optimization

This study proposes an approach that uses deep reinforcement learning to learn approximate quantum state preparation and to explore optimal ways of operating quantum systems.

Use: Quantum state preparation
Difficulty: Hard
Cost: Medium

Relative Value Learning

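Using an antisymmetric function, this study proposes relative value (RV) learning, in which a model predicts the difference in value between one state and another. This may improve control and estimation.

Use: Predicting value differences
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Natural language processing, Fine-tuning, Reinforcement learning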

TOUR: A Trajectory-Level Unlearning Benchmark for Offline Reinforcement Learning

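This study proposes TOUR, a benchmark for evaluating data removal (unlearning) in offline reinforcement learning agents trained on fixed behavior trajectories, with the aim of making offline RL safer.

Use: Data unlearning in offline reinforcement learning
Difficulty: Hard
Cost: High

Natural language processing, Large language models, Anomaly detection, Text, Reinforcement learning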

Training Large Language Models for Self-Explanation Faithfulness

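This study proposes an RL method for the faithfulness of self-explanations, exploring a new approach that optimizes faithfulness directly.

Use: Faithfulness of self-explanations
Difficulty: Hard
Cost: High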

From Evaluation to Optimisation: Hierarchy-Aware Training Signals for CWE Prediction in Python

The original ALPHA benchmark introduced a taxonomy-aware penalty for evaluating CWE-level vulnerability predic

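Natural language processing, Fine-tuning, Classification, Reinforcement learning

Use: Classification
Difficulty: Hard
Cost: High

MI-oriented, Natural language processing, Fine-tuning, Text, Reinforcement learning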

The Weight of Silence: A Causal Case for Weights Over the Scratchpad in Latent Chess Reasoning

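With latent language models, the model's internal computation can be analyzed. That computation is intermediate work carried out in a continuous vector space, and analyzing it reveals how the model arrives at its results.

Use: Analyzing intermediate computation in latent language models
Difficulty: Hard
Cost: High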

Multi-turn RL with Structural and Performance Aware Rewards for CUDA Kernel Generation

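This study proposes CudaPerf to support CUDA kernel generation; the method makes it possible to generate high-performance CUDA kernels efficiently.

Natural language processing, Large language models, Generation, Reinforcement learning

Use: Supporting CUDA kernel generation
Difficulty: Hard
Cost: High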

Offline RL with Hierarchical Action Chunking

This study proposes Offline RL with Hierarchical Action Chunking, which supports task decomposition in offline RL.

Use: Task decomposition in offline RL
Difficulty: Hard
Cost: High

Robust Asynchronous Q-Learning under Reward and State Corruption via Batching

Motivated by reinforcement learning in harsh environments, we consider the problem of learning an optimal poli

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Reinforcement learning, Multi-agent, Text

AREX: Towards a Recursively Self-Improving Agent for Deep Research

Deep research requires agents to find answers that jointly satisfy multiple constraints. Discovering such answ

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

PATS: Policy-Aware Training Scaffolding for Agentic Reinforcement Learning

In long-horizon LLM agent reinforcement learning, weak policies often repeat similar failures, producing uninf

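Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Generation, Reinforcement learning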

Expert Behavior Prior Reinforcement Learning

Behavior prior reinforcement learning (BPRL) has emerged as a promising paradigm to improve sample efficiency

Use: Generation
Difficulty: Hard
Cost: High

A New Well-Supported Semantics for Description Logic Programs

This study proposes a new well-supported semantics for description logic programs.

Use: Semantics for description logic programs
Difficulty: Hard
Cost: Medium

Hybrid MKNF with Classical Negation in the Rule Component

This study considers hybrid MKNF with classical negation allowed in the rule component.

Use: Classical negation in hybrid MKNF rules
Difficulty: Hard
Cost: Medium

Chess_db: A framework for working with large chess game datasets

Chess is a two player strategic game that is embedded in classical AI culture as it was once the frontier for

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

EmoAgent-R1: Towards Multimodal Emotion Understanding with Reinforcement Learning-based Dynamic Agent Specialization

Multimodal large language models (MLLMs) have achieved impressive performance in multimodal emotion recognitio

Natural language processing, Large language models, Classification, Text, Video

Use: Classification
Difficulty: Hard
Cost: High

arxiv, GitHub available, 2026-07-23

Workflow-Localized Mechanism Learning: Attribution-Guided Repair and Knowledge Reuse for Structured Agent Skills

Agent Skills package reusable procedural knowledge as external artifacts for frozen language-model agents, yet

MI-oriented, Reinforcement learning, Policy gradient (PPO / A3C)

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Sample-Efficient Learning from Agent Experience

Real-world agent learning is often constrained by costly environment interactions, such as running time-consum

Deep learning, Model compression / quantization, Text

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

FORGE-plus: Force-Budgeted Recovery for Contact-Rich Assembly with a Frozen LLM Supervisor

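Proposes force-constrained reinforcement learning for robot control that enables low-cost, high-precision assembly while letting the robot recover safely when an assembly attempt fails.

Use: Contact-rich robotic assembly
Difficulty: Hard
Cost: High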

Deep Reinforcement-Learning-Guided Model Predictive Control for Preventing Overtakes in Autonomous Racing

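Proposes a hybrid framework that combines reinforcement learning and model predictive control to defend against overtakes in autonomous racing.

Natural language processing, RAG

Use: Overtake prevention in autonomous racing
Difficulty: Hard
Cost: Low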

Advances in STV Margin Computation

Single transferable vote (STV) is a multi-winner preferential proportional electoral system. The margin is the

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Discrete Truthful Heterogeneous Two-Facility Location: The Line and Beyond

We study deterministic strategyproof mechanisms for discrete heterogeneous two-facility location. In our model

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

github, GitHub available, 2026-07-23

qlib — Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, including supervised learning, market dynamics modeling, and RL, and is now equipped with https://github.com/microsoft/RD-Agent to automate R&D process.

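Uses AI technology to build a quantitative investment platform.

Reinforcement learning, Policy gradient (PPO / A3C), Supervised learning

Use: Quantitative investment platform
Difficulty: Easy
Cost: Medium

github, GitHub available, 2026-07-23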

FinGPT — FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.

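This repository provides open-source financial large language models, with the trained model released on Hugging Face.

Natural language processing, Large language models, Text

Use: Financial large language models
Difficulty: Easy
Cost: High

github, GitHub available, 2026-07-23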

ml-agents — The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.

A toolkit for training machine learning agents in Unity.

Computer vision, 3D / point clouds, 3D, Reinforcement learning

Use: Training machine learning agents in Unity
Difficulty: Easy
Cost: High

Perspective Latents as an Architectural Condition for Causal Emergence in Active Inference Agents

A recent line of work measures causal emergence in reinforcement learning agents through Integrated Informatio

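Computer vision, Video recognition, Reinforcement learning

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Explainable, Quality prediction / anomaly detection, Reinforcement learning, Policy gradient (PPO / A3C), Classification, Audio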

Explanation-Based Runtime Verification for Trustworthy ML-driven Optical Networks

Machine learning (ML) models are increasingly integrated into optical network automation frameworks to support

Use: Classification
Difficulty: Hard
Cost: Low

Adaptive Multi-Horizon Reinforcement Learning

Effective decision-making in complex and changing environments requires balancing short-term and long-term con

Computer vision, Video recognition, Reinforcement learning

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

SalesLoop: Reinforcement Learning from Performance Feedback for Sales Lead Ranking

Lead ranking in Customer Relationship Management (CRM) systems faces a persistent challenge: models achieving

Computer vision, Video recognition, Reinforcement learning

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Towards Miniature Humanoid Tele-Loco-Manipulation Using Virtual Reality and Reinforcement Learning

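This study proposes combining virtual reality and reinforcement learning to enable human teleoperation. Following the operator's input, the robot can manipulate with its body and move around.

Use: Human teleoperation of a robot
Difficulty: Hard
Cost: Low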

PG-KINN: A Physics-Informed Petrov-Galerkin Kolmogorov-Arnold Network for Solving Forward and Inverse PDEs

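This study proposes a Physics-Informed Petrov-Galerkin Kolmogorov-Arnold Network, which builds physical knowledge into the learning architecture.

Use: Improving learning-based solution of equations
Difficulty: Hard
Cost: Medium

MI-oriented, Deep learning, Transformer, Generation, Text, Reinforcement learning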

OLEDLM: A Unified Language Model for OLED Molecular Design

Proposes a framework that uses causal language models to predict optoelectronic properties, a new approach aimed at developing OLED materials.

Use: OLED materials development
Difficulty: Hard
Cost: High

Active Inference as a Convex Markov Decision Process

A study of epistemic objectives that uses active inference to formulate them.

Use: Epistemic objectives
Difficulty: Hard
Cost: Low

Generalized Kalman filter based temporal difference reinforcement learning

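This study proposes a framework that treats the value function and the action-value (Q) function of reinforcement learning as conditional expectations and casts their estimation as probabilistic inference.

Deep learning, Transformer, Reinforcement learning

Use: Using conditional expectations in reinforcement learning
Difficulty: Hard
Cost: Medium

MI-oriented, Sensor / time series, Reinforcement learning, Policy gradient (PPO / A3C), Text, Time series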

Post-Training in Time Series Foundation Models: A Unifying Framework

This study proposes a way to adapt pretrained time series foundation models to target tasks through post-training.

Use: Post-training adaptation of time series foundation models
Difficulty: Hard
Cost: High

Fisher Widths: Local Learning Geometry and Anisotropic Recovery

We study Gaussian-width complexity on statistical manifolds through a pair of functionals: the primal Fisher w

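Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Suited to tabular data, Quality prediction / anomaly detection, Natural language processing, Fine-tuning, Text, Tabular, Reinforcement learning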

Asymptotically Optimal Regret for Reinforcement Learning without Horizon Dependence

We study horizon-free regret minimization for finite-horizon time-homogeneous tabular Markov decision processe

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Low

Dreamer-CPC: Message Learning with World Models for Decentralized Multi-agent Reinforcement Learning

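Proposes a method for decentralized multi-agent reinforcement learning in which each agent exchanges messages based on its local observations and learns messages that take long-horizon experience into account.

Reinforcement learning, Policy gradient (PPO / A3C), Embeddings

Use: Decentralized multi-agent reinforcement learning
Difficulty: Hard
Cost: Low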

The World Model Remembers, the Actor Forgets: Dream Rehearsal for Continual Model-Based RL

Model-based reinforcement-learning agents of the DreamerV3 family forget catastrophically when trained on task

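Reinforcement learning, Model-based

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Explainable, Natural language processing, Prompt engineering, Reinforcement learning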

SLPO: Scaling Latent Reasoning via a Surrogate Policy

Proposes SLPO, which scales latent reasoning by means of a surrogate policy.

Use: Scaling latent reasoning
Difficulty: Hard
Cost: Medium

Sensor / time series, Deep learning, Model compression / quantization, Image, Text, Multimodal

Robostral Navigate

Deploying navigation systems at scale requires a recipe that minimizes sensor assumptions, generalizes across

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Courteous Anticipation: Improving Long-Lived Task Planning in Persistent Shared Environments

We consider a task planning scenario in which robots sharing a persistent environment are assigned tasks one a

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Reinforcement Learning for Large Language Model Selective Evidence Adoption from Contaminated Retrieval Results

Retrieval-augmented large language models frequently face contexts that interleave useful evidence with mislea

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

TRUST-ESD: A Risk-Calibrated and Governance-Aware AI Framework for Enterprise Strategic Decision Support Under Uncertainty

Enterprise strategic decision support requires AI systems that are not only accurate, but also uncertainty-awa

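Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Low

Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Generation, Image, Text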

Learning to Detect UI Principle Violations via Reinforcement Learning

Small language models and coding agents increasingly generate web front-end code, yet their outputs are typica

Use: Generation
Difficulty: Hard
Cost: High

Notes to Self: Can LLMs Benefit from Experiential Abstractions?

Humans distill experience into reusable abstractions, e.g., strategies and cautionary reminders, and apply the

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Explainable, Reinforcement learning, Policy gradient (PPO / A3C), Classification, Text

Two-Step Occupation Coding

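Occupation coding identifies an occupational classification from a job title. The study compares two approaches, each carried out in two steps, and reports which of them is more effective.

Use: Two-step approach to occupation coding
Difficulty: Hard
Cost: Low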

Rewarding Better Thinking for LLM Preference Alignment

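Accumulated bias is a problem in many LLMs. This study proposes a new approach to resolving that bias.

Use: Resolving bias in LLMs
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Natural language processing, Large language models, Generation, Video, Reinforcement learning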

PercepCap: Video Captioner with Structured Spatio-Temporal Perception

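Video captioning depends on understanding space and time. PercepCap decomposes the video input into spatio-temporal perception, which improves the generated captions and allows spatio-temporal errors to be detected more accurately.

Use: Structured spatio-temporal understanding for video captioning
Difficulty: Hard
Cost: High

Natural language processing, Fine-tuning, Image, Video, Multimodal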

EA-Nav: Learning Safe Visual Navigation Policies with Embodiment Awareness

Cross-embodiment navigation is a key challenge in embodied intelligence. Due to differences in embodiment, the

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Trace: A Taxonomy-Guided Environment for Multidomain Visual Reasoning

Presents Trace, a taxonomy-guided environment for multidomain visual reasoning.

Natural language processing, RAG, Image, Text, Multimodal

Use: Multidomain visual reasoning
Difficulty: Hard
Cost: High

Safe and Scalable Multi-Drone Payload Transport via CBF-based Reinforcement Learning with Zero-Shot Sim-to-Real Transfer

Multi-drone payload transportation has emerged as a promising research paradigm with potential applications in

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Sensor / time series, Reinforcement learning, Policy gradient (PPO / A3C), Detection, Audio

Distributed Acoustic Localization Array Deployed Using a Soft Everting Vine Robot

Soft robot exteroception is increasingly being explored for a variety of field applications. In this work, we

Use: Detection
Difficulty: Hard
Cost: Medium

Digital Twin Modeling of a Highly Automated Agricultural Tractor

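This project builds a digital twin model of a highly automated agricultural tractor. The digital twin uses CAN communication to mimic the tractor's motion and simulate the behavior of the real machine.

Reinforcement learning, Image

Use: Digital twin modeling of an automated agricultural tractor
Difficulty: Hard
Cost: Medium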

Contact-Persistent Full Actuation for Aerial Physical Interaction

Fully actuated unmanned aerial vehicles (UAVs) are usually certified through rank conditions on a control-allo

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Improved Lower Bounds and Output Augmentation for Facility Location Mechanisms

We study the strategic facility location problem under the egalitarian objective, where a mechanism uses the r

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Identity-Truthful Online Decision-Making

In Bayesian online selection, a decision-maker observes a sequence of stochastic rewards and must immediately

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Boundary-Adapted PINNs for Elliptic Dirichlet Problems: $H^2(Ω)$ A Priori Error Bounds with Application to Mean Escape Time Computation

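Presents a priori error bounds for boundary-adapted PINNs on elliptic Dirichlet problems, with an application to computing mean escape times.

Reinforcement learning, Policy gradient (PPO / A3C), Text

Use: PINNs for elliptic Dirichlet problems
Difficulty: Hard
Cost: Medium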

Copy Less, Ground More: Overcoming Repetitive Copying in Long-Context Reasoning via Evidence-Aware Reinforcement Learning

Large language models that generate step-by-step reasoning traces have achieved strong performance on complex

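Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Natural language processing, Large language models, Translation, Text, Reinforcement learning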

The Price of Reasoning: Cost-Quality Tradeoffs in Reinforcement Learning for Neural Machine Translation

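Examines the cost-quality tradeoffs of reasoning in reinforcement learning for neural machine translation.

Use: Neural machine translation
Difficulty: Hard
Cost: High

Explainable, Quality prediction / anomaly detection, Natural language processing, Large language models, Generation, Text, Reinforcement learning

arxiv, GitHub available, 2026-07-21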

Beyond Score Prediction: LLM-Based Essay Scoring and Feedback Generation via Reinforcement Learning with Rubric Rewards

Large language models (LLMs) have been widely applied to automated essay scoring (AES) and automated feedback

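Use: Generation
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Natural language processing, Fine-tuning, Translation, Text, Reinforcement learning

arxiv, GitHub available, 2026-07-21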

Reasoning Before Translation: Enhancing Legal Machine Translation with Structured Reasoning

Proposes adding structured reasoning before translation to enhance legal machine translation.

Use: Legal machine translation
Difficulty: Hard
Cost: High

Measuring Reward-Seeking via Contrastive Belief Updates

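This study proposes a new method for quantifying reward-seeking in reinforcement learning. The method can show how a model tries to act in order to obtain reward.

Use: Measuring reward-seeking in reinforcement learning
Difficulty: Hard
Cost: High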

H$^2$SD: Hybrid Hindsight Self-Distillation

Reinforcement learning with verifiable rewards (RLVR) provides reliable outcome supervision for language model

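Deep learning, Model compression / quantization, Text, Reinforcement learning

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Explainable, Deep learning, Transformer, Generation, Reinforcement learning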

Stale but Stable: Staleness-Adaptive Trust Regions for Stabilizing Asynchronous Reinforcement Learning

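Proposes staleness-adaptive trust regions for stabilizing asynchronous reinforcement learning.

Use: Stabilizing asynchronous reinforcement learning
Difficulty: Hard
Cost: High

MI-oriented, Quality prediction / anomaly detection, Natural language processing, Large language models, Image, Audio, Video

arxiv, GitHub available, 2026-07-21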

OmniReasoner: Thinking with Long Audio-Video via Native Tool Use

Proposes OmniReasoner, a method that combines the original data with a zoom-in tool. It improves reasoning over long audio-video by omni-modal LLMs.

Use: Improving reasoning over long audio-video
Difficulty: Hard
Cost: High

CRB-Driven Beamforming and Trajectory Optimization for UAV-assisted ISAC System

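Builds a UAV-assisted ISAC system and proposes CRB-driven beamforming and trajectory optimization to optimize its operation.

Sensor / time series, Natural language processing, RAG, Reinforcement learning

Use: UAV-assisted ISAC systems
Difficulty: Hard
Cost: Low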

Emergent Autonomous Drifting for Collision Avoidance in Real-World Winter Driving Scenarios

Real-world collision avoidance is a core motivation for studying the dynamics and control of high sideslip dri

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Computing on the Fly: Navigating a Vision for the Future of Drone Computing

The report envisions a decade in which drones move goods, medical supplies, and information at a scale compara

Reinforcement learning, Detection, Generation

Use: Detection
Difficulty: Hard
Cost: High

The Twist Decomposition of Serial Robots Under Lower-Mobility Tasks

This paper introduces a twist decomposition framework for serial manipulators performing lower mobility tasks.

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Fabric Pneumatic Artificial Muscles Based on the Drawstring Principle

Pneumatic artificial muscles have wide applications in robotics and industrial fields. Conventional pneumatic

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

End-to-end Conditional Diffusion for Realistic and Controllable Visual Traffic Scenario Generation

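This paper proposes E2E-CDiff, an end-to-end conditional diffusion approach to closed-loop traffic scenario generation. It can generate realistic traffic scenarios and lets the generated scenarios be controlled.

Generative AI, Diffusion models, Generation, Image

Use: Generating autonomous-driving data
Difficulty: Hard
Cost: High

Suited to tabular data, Deep learning, Model compression / quantization, Text, 3D, Reinforcement learning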

Intelligent Multi-UAV Navigation in ITNTNs: A Hierarchical LLM Approach

The deployment of high-speed Uncrewed Aerial Vehicles (UAVs) in 3D aerial highways necessitates robust coordin

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Packing Linear Programs and Fractional Knapsack using Comparison Oracles

We study the problem of recovering the objective of a packing linear program when the algorithm accesses only

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Hospitals/Residents with Inseparable Couples: Finding a Coalition-Stable Assignment Is NP-Hard

In recent work on course allocation, Rodríguez and Manlove consider the complexity of finding a stable assignm

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

huggingface, Hugging Face available, 2026-07-21

ISO: An RLVR-Native Optimization Stack

Reinforcement learning with verifiable rewards (RLVR) is rapidly advancing the reasoning capabilities of langu

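Deep learning, Normalization / optimization methods, Text, Reinforcement learning

Use: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

github, GitHub available, 2026-07-21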

Book-Mathematical-Foundation-of-Reinforcement-Learning — This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."

Mathematical Foundations of Reinforcement Learning is a book covering the mathematical foundations of reinforcement learning.

Use: Book on the mathematical foundations of reinforcement learning
Difficulty: Easy
Cost: Medium

Search-on-Graph-R1: Training Large Language Models to Search Knowledge Graphs with Reinforcement Learning

Knowledge graph question answering (KGQA) requires navigating from topic entities to an answer several relatio

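Natural language processing, Large language models, QA, Text, Reinforcement learning

Use: QA
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Text, Reinforcement learning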

LLM-as-a-Coach: Experiential Learning for Non-Verifiable Tasks

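This study targets the optimization of non-verifiable tasks, including those scored with rubrics. Conventional RL uses only the evaluation signal, and the model itself never reflects or self-improves. Here, an LLM is treated as a coach that prompts the model to reflect.

Use: Optimizing non-verifiable tasks, including rubric-based evaluation
Difficulty: Hard
Cost: High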

Integrity-Gated Eco-CACC: Epistemic Admissibility for Cooperative Driving at Signalized Intersections

Eco-Cooperative Adaptive Cruise Control (Eco-CACC) systems rely on accurate localization, signal timing, and i

Sensor / time series, Reinforcement learning, Model-based, Detection

Use: Detection
Difficulty: Hard
Cost: Medium

The Open Ant: A Robot Platform for Reinforcement Learning Research

Reinforcement learning (RL) research has demonstrated success in both physical and simulated domains; however,

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Towards Torque-Driven Reinforcement Learning for Quadruped Locomotion

Reinforcement learning (RL) for legged robots is advancing locomotion, demonstrating its ability to adapt to n

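Sensor / time series, Deep learning, Model compression / quantization, Reinforcement learning

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium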

Isaac Sim-to-Real: Reinforcement Learning based Locomotion for Quadrupeds

Proposes Isaac Sim-to-Real, a reinforcement-learning-based locomotion method that addresses the limitations of existing locomotion methods.

Use: Autonomous legged locomotion for robots
Difficulty: Hard
Cost: High

Importance Sampling and PCA for Finding Failures in Commercial Autonomous Vehicles

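Proposes an adaptive stress testing method that addresses the limits of existing fault detection methods, with the aim of reducing failure rates in commercial autonomous driving systems.

Computer vision, Segmentation, Reinforcement learning

Use: Failure detection in autonomous driving systems
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Natural language processing, Large language models, Generation, Text, Video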

FARO: Feasibility-Aware Robot Motion Optimization

Fast planning of novel behaviors in unseen scenarios remains a fundamental challenge in robotics. The high-dim

Use: Generation
Difficulty: Hard
Cost: High

MI-oriented, Reinforcement learning, Policy gradient (PPO / A3C), Generation

arxiv, GitHub available, 2026-07-20

MEVION: Low-Cost Open-Source Data Collection System for Powerful and High-Speed Dual-Arm Manipulation

The global competition for developing robotic foundation models is intensifying. Among the data collection sys

Use: Generation
Difficulty: Hard
Cost: Medium

RT-SHCUA: Real-Time Self-Hosted Computer-Use Agent for UAV Control

Natural-language control offers a promising interface for unmanned aerial vehicles (UAVs), but directly applyi

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Value-Aware Prediction for Robust Multi-Agent Coordination Under Communication Loss

Robust multi-agent coordination relies heavily on inter-agent communication, which is frequently disrupted by

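Deep learning, Normalization / optimization methods, Text, Reinforcement learning

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Low

Explainable, Quality prediction / anomaly detection, Reinforcement learning, Policy gradient (PPO / A3C), Image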

ConceptTree: Bringing Semantic Transparency to Black-Box Decision Making for Robotic Manipulation

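This paper proposes ConceptTree, a framework that expresses high-level skill selection in manipulation using human-interpretable concepts, which improves transparency.

Use: Transparent high-level skill selection for manipulation
Difficulty: Hard
Cost: High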

Generalize and Guide: Decomposing Rewards for Few-Shot Inverse Reinforcement Learning

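Inverse reinforcement learning that provides explainability across multiple tasks helps with solving complex tasks. This study proposes a new approach to it.

Suited to small data, Natural language processing, RAG, Reinforcement learning

Use: A new approach to inverse reinforcement learning for explainability across tasks
Difficulty: Hard
Cost: Low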

Lifelong Multi-Subsystem Pickup and Delivery with Buffer-Limited Handover Stations

Load management is a major problem in pickup-and-delivery systems. This study proposes a new handover-aware approach that accounts for offload management in such systems.

Use: Offload management in pickup-and-delivery systems
Difficulty: Hard
Cost: Medium

Stability and Comfort in Mobile Robot-Pedestrian Interactions

Mobile robots in public spaces must ensure pedestrians' comfort, and yet empirical studies of walkers' subject

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Predictive Training with Latent Imagination for Visual Quadruped Navigation

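A predictive method is proposed for quadruped robot navigation. The robot selects actions from its current observation and short-term memory, but it cannot anticipate how obstacles will evolve, which limits that approach.

Deep learning, Transformer, Image

Use: Robot navigation
Difficulty: Hard
Cost: High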

Predicting Grasping Compliance in Robotic Hands through Analytical-Model-Informed Neural Networks

In robotic manipulation studies, grasping is often treated as a binary success or failure problem, usually def

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Disturbance-Aware Flight for Aerial Robots in Narrow Space

Autonomous flight of aerial robots in narrow space remains challenging due to strong aerodynamic disturbances

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

1-out-of-5 Maximin-Share Allocations Always Exist for Four Agents

For four agents with nonnegative additive valuations, a complete 1-out-of-5 maximin-share allocation always ex

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

MI-oriented, Sensor / time series, Reinforcement learning, Multi-agent, Anomaly detection

Compositional Semantic Communication for Physical AI: Category Theory Meets Game Theory

Physical artificial intelligence (AI) systems involve distributed sensing agents with embedded AI models that

Use: Anomaly detection
Difficulty: Hard
Cost: Medium

huggingface, GitHub available, Hugging Face available, 2026-07-20

Differentiable Logic Gate Networks for Low-Latency EEG Classification on Edge Devices

Real-time EEG classification on edge devices is bottlenecked by the floating-point arithmetic of conventional

Easy to try on CPU, Reinforcement learning, Multi-agent, Classification, Detection

Use: Classification
Difficulty: Easy
Cost: Low

huggingface, Hugging Face available, 2026-07-20

ConsiSpace: Learning Geometric Consistency Matters for Video Spatial Reasoning

Video spatial reasoning is essential for navigation-oriented perception and long-video question answering, whe

Deep learning, Model compression / quantization, QA, Text, Video

Use: QA
Difficulty: Easy
Cost: High

github, GitHub available, 2026-07-20

Gymnasium — A standard API for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)

Gymnasium is an API that provides simulated environments for single-agent RL.

Use: Providing simulated RL environments
Difficulty: Easy
Cost: Medium
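
As a rough illustration of what a standard environment API buys you, the sketch below mimics the call shape Gymnasium uses: `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`. It deliberately does not import Gymnasium; `CountdownEnv` and `run_episode` are hypothetical names made up for this example, so treat it as a stand-in rather than the library's own code.

```python
class CountdownEnv:
    """Hypothetical toy environment that follows the Gymnasium call shape."""

    def __init__(self, start=5, max_steps=20):
        self.start = start
        self.max_steps = max_steps
        self.state = start
        self.steps = 0

    def reset(self, seed=None):
        # Gymnasium-style reset returns (observation, info).
        self.state = self.start
        self.steps = 0
        return self.state, {}

    def step(self, action):
        # Action 1 decrements the counter, any other action does nothing.
        if action == 1:
            self.state -= 1
        self.steps += 1
        terminated = self.state <= 0  # task finished
        truncated = (not terminated) and self.steps >= self.max_steps  # time limit hit
        reward = 1.0 if terminated else 0.0
        # Gymnasium-style step returns a 5-tuple.
        return self.state, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Run one episode; works with any object exposing reset/step as above."""
    obs, info = env.reset(seed=0)
    total_reward, steps = 0.0, 0
    while True:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += reward
        steps += 1
        if terminated or truncated:
            return total_reward, steps
```

Because `run_episode` only relies on the `reset`/`step` contract, the same loop should carry over to a real Gymnasium environment with little change, which is the main point of having a shared interface.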

Suited to tabular data, Deep learning, Transformer, Tabular, Reinforcement learning

Non-Asymptotic Best Policy Identification Guarantees in Online Reinforcement Learning

In this work we study the Best Policy Identification (BPI) problem in online, tabular Reinforcement Learning.

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Low

Rethinking the Suitability of Reinforcement Learning Algorithms Under Practical Transfer Constraints

Transfer-oriented reinforcement learning requires evaluating algorithms along dimensions that go beyond standa

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Retriever: Composing Closed-Loop Asynchronous Robot Programs

Building long-horizon robot agents requires composing closed-loop pipelines -- perception, belief update, plan

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

VIDAR: Visual-Inertial Dense Alignment and Reconstruction via a Geometric Foundation Model

Monocular foundation models provide dense geometry but usually lack a stable metric scale. This paper presents

Reinforcement learning, Image

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Optimal Safety Control using High-Order Control Barrier Functions

This paper investigates the optimal safety control problem of nonlinear control systems by proposing novel hig

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Equilibrium analysis of three-player General Lotto game with leader-follower framework

In this paper, we introduce the General Lotto game with a regulator (R-Lotto), a leader-follower extension of

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

huggingface, Hugging Face available, 2026-07-19

TimeLens2: Generalist Video Temporal Grounding with Multimodal LLMs

Video multimodal large language models (MLLMs) can describe what happens in a video, but rarely identify when

Natural language processing, Large language models, Detection, Text, Video

Use: Detection
Difficulty: Easy
Cost: High

huggingface, GitHub available, Hugging Face available, 2026-07-19

Distilled Reinforcement Learning for LLM Post-training

Large language model (LLM) post-training is essential for improving reasoning, adaptation, and alignment. Exis

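Explainable, Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Text, Reinforcement learning

Use: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

arxiv, Paper only, 2026-07-18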

Value-Monotonicity Matters: A Concordance Loss for Deep Survival Prediction

Deep survival models are evaluated almost exclusively by the concordance index (C-index), yet they are commonl

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High
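
For readers unfamiliar with the concordance index named in the abstract above, here is a minimal sketch of the basic Harrell-style definition: the share of comparable pairs in which the subject who failed earlier was given the higher risk score. It is not the paper's loss or code; the function name and argument layout are assumptions for illustration, and ties in event time are simply treated as not comparable.

```python
def concordance_index(times, events, risks):
    """Basic Harrell-style C-index.

    times:  observed time per subject (event or censoring time)
    events: 1 if the event was observed, 0 if the subject was censored
    risks:  predicted risk scores, higher meaning earlier expected event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # ordering agrees with the outcome
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 is what constant or random scores give, and 0.0 means the ranking is fully reversed.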

Explainable, Reinforcement learning, Policy gradient (PPO / A3C), Image

arxiv, GitHub available, 2026-07-18

SinD 2.0: A Multi-City UAV Dataset with Semantic Risk Annotations for SOTIF-Oriented Safety Validation at Signalized Intersections

Safety validation at signalized intersections remains a critical bottleneck for the deployment of autonomous d

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv, Paper only, 2026-07-18

Censorship Resistance and Throughput with Multiple Concurrent Proposers

Censorship resistance is the defining advantage of blockchains over their centralized counterparts. Yet block

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv, Paper only, 2026-07-18

Audited Auctions: Reducing Harms in Advertising

Although standard auction mechanisms help truthfully reveal preferences of bidders, they can inadvertently res

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

huggingface, Hugging Face available, 2026-07-18

Group Entropy-Controlled Policy Optimization

Entropy control has become an effective tool in reinforcement learning (RL) of large language models (LLMs), h

Deep learning, Model compression / quantization, Generation, Text, Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: High

From Optimal Policies to Individual Differences: Rethinking Reinforcement Learning for Biology

Reinforcement learning (RL) is primarily known as a computational method for optimizing control tasks, but it

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Low

Differentiable Reinforcement Learning for Path Tracking by an Agile Fish-Like Robot

Fish-like swimming has inspired the design of several dozens if not hundreds of bioinspired robots in the last

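Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Computer vision, Multimodal, Image, Reinforcement learning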

Foresight Residual RL for Long-Horizon Robot Manipulation with Vision-Language-Action Models

Vision-Language-Action (VLA) policies offer strong general-purpose manipulation priors, but often fail on tigh

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Certifiable Safe Model-Based Reinforcement Learning with Control-Affine Dynamics Approximation

Safe model-based reinforcement learning (RL) often bridges control-theoretic analysis and RL for robots to saf

Deep learning, Model compression / quantization, Generation, 3D, Reinforcement learning

Use: Generation
Difficulty: Hard
Cost: High

Linear Stability Analysis of an INDI Pitch-Rate Controller under Model Mismatch for a Tilt-Rotor VTOL UAV

Incremental Nonlinear Dynamic Inversion (INDI) is attractive for unmanned aerial vehicle (UAV) flight control

Explainable, Reinforcement learning

Use: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Computer vision, Segmentation, Generation, Text

Handroid: Bridging Dexterous Hand and Humanoid

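This study develops Handroid, a technology that builds both a dexterous hand and a humanoid into a single robot and lets it switch between the two functions.

Use: Combining a dexterous hand and a humanoid in one robot
Difficulty: Hard
Cost: Medium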

A New Implementation of NeoSLAM and a Comparative Evaluation with RatSLAM

This study compares and evaluates the SLAM systems NeoSLAM and RatSLAM, improves NeoSLAM, and proposes a dataset to serve as a baseline for the comparison.

Use: Comparative evaluation of SLAM systems
Difficulty: Hard
Cost: Medium

Let the Body Follow: Coupled Egocentric Control for Whole-Body Robot Teleoperation

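This study proposes a system that makes robot control egocentric, coupling visual information with body information to control the robot's movement and posture.

Use: Coupled egocentric whole-body robot control
Difficulty: Hard
Cost: Medium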

Data and Learning Where it Matters for Contact-Rich Manipulation

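This study proposes improved data collection and learning for contact-rich manipulation, bringing the accuracy of robot control

Natural language processing | RAG | Anomaly detection | Reinforcement learning

Use: Data collection and learning for contact-rich manipulation
Difficulty: Hard
Cost: Low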

Vessel Trajectory Prediction using COLREGs-aware Optimal Planning

This paper presents a trajectory prediction method for marine vessels based on optimal planning. Crude initial

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

Fair Allocation of Divisible Goods under Non-Linear Valuations

We study the problem of dividing homogeneous divisible goods among agents with non-linear valuations. Specific

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

SeerGuard: A Safety Framework for Mobile GUI Agents via World Model Prediction

Mobile graphical user interface (GUI) agents have demonstrated remarkable capabilities in automating complex t

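Reinforcement learning | Model-based

Use: Technical verification and paper-reading aid
Difficulty: Easy
Cost: Medium

Easy to try on CPU | Deep learning | Model compression/quantization | Multimodal | Reinforcement learning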

JoyNexus: Service-Oriented Multi-Tenant Post-Training for VLA Models

The post-training of Vision-Language-Action (VLA) models is essential due to the diversity of simulators, robo

Use: Technical verification and paper-reading aid
Difficulty: Easy
Cost: High

Understanding Reasoning from Pretraining to Post-Training

Reinforcement learning (RL) has become central to improving large language models (LLMs) on complex reasoning

Use: Technical verification and paper-reading aid
Difficulty: Easy
Cost: High

When Does Muon Help Agentic Reinforcement Learning?

Muon is competitive with AdamW in large-scale pre-training, but its value for reinforcement-learning (RL) post

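Deep learning | Normalization/optimization methods | Reinforcement learning

Use: Technical verification and paper-reading aid
Difficulty: Easy
Cost: High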

DSWorld: A Data Science World Model for Efficient Autonomous Agents

Despite strong capabilities in data understanding and decision-making, autonomous data science agents still he

Use: Technical verification and paper-reading aid
Difficulty: Easy
Cost: High

github | GitHub available | 2026-07-17

open_spiel — OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.

A collection of environments and algorithms for general reinforcement learning in games.

Use: General reinforcement learning in games
Difficulty: Easy
Cost: Medium

arxiv | Paper only | 2026-07-16

Proactive Inpatient Bed Requests for Emergency Department Admissions

Emergency department (ED) boarding occurs when admitted patients remain in the ED while awaiting inpatient bed

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Low

arxiv | Paper only | 2026-07-16

SMC-ES: Automated synthesis of formally verified control policies

The deployment of autonomous cyber-physical systems in safety-critical environments requires closed-loop contr

Reinforcement learning | Model-free (DQN / SAC) | Generation

Use: Automatically generating safe control policies
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-16

PAC Learning in Turn-Based Stochastic Games with Reachability Objectives: A Decentralized Private Approach via Expected Conditional Distance

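This study is concerned with PAC learning for multi-player stochastic games. PAC learning raises a machine learning model's confidence while lowering its error.

Computer vision | Segmentation | Reinforcement learning

Use: Studying PAC learning for multi-player stochastic games
Difficulty: Hard
Cost: Medium

huggingface | GitHub available | Hugging Face available | 2026-07-16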

On-Policy Delta Distillation

On-policy distillation is an alternative post-training method in reinforcement learning that alleviates the co

Use: Technical verification and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-16

Beyond Entropy: Correctness-Aware Advantage Shaping via Contrastive Policy Optimization

Reinforcement learning with verifiable rewards (RLVR) commonly uses entropy for advantage shaping. However, en

Deep learning | Model compression/quantization | Generation | Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: Medium

NeuralChaos: Optimal Adapted Approximation of Square Integrable Predictable Processes

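Approximating predictable processes is an important problem in fields such as mathematical finance, machine learning, control theory, and physics. This work proposes a new NeuralChaos-based method for approximating predictable processes.

Use: Approximation of predictable processes
Difficulty: Hard
Cost: Medium

Suited to small data | Reinforcement learning | Multi-agent | Regression | Text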

Operator-Informed Gaussian Processes for Complex Helmholtz Wavefields: From Synthetic Benchmarks to In Vivo Brain Elastography

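The Helmholtz equation is an important equation describing the propagation of time-harmonic waves, and it has complex coefficients when the medium is lossy. Here, to infer the wave equation from spatial wavefields, a physics-informed Gaussian Process (GP

Use: Physics-informed Gaussian Processes for complex Helmholtz wavefields
Difficulty: Hard
Cost: Medium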

Algebraic Representability as the Limiting Regime of Grokking: An Exactly Solvable Model with Holomorphic Activations

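Shows that, in the limit of partitioned computation, the class of representable functions degenerates to a finite-dimensional algebraic variety, and finds that increasing model capacity promotes generalization.

Use: Representability of algorithms
Difficulty: Hard
Cost: High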

DAGR: State-Conditioned Goal Representations via Difference-Aware Goal Cross-Attention

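This study establishes goal representations in which the goal depends on the current state. The authors update a static goal representation into a state-conditioned one, so that the goal is adjusted to the current situation.

Deep learning | Attention mechanisms | Reinforcement learning

Use: State-conditioned goal representations
Difficulty: Hard
Cost: Low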

Stable Voting is PSPACE-Complete

Stable Voting and Simple Stable Voting, introduced by Holliday and Pacuit, are Condorcet-consistent voting rul

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

The Dynamic Verifiable Multi-Agent Human Agentic Loyalty Loop (DVM-HALL) Model and the Net Human-Agent Score (NHAS) in Autonomous Commerce

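The trustworthiness of AI robots that interact with customers in automated stores needs to be established. This model aims to build trust between customers and robots and to support customers' shopping.

Reinforcement learning | RLHF

Use: Establishing the trustworthiness of AI robots that interact with customers in automated stores
Difficulty: Hard
Cost: Medium

github | GitHub available | 2026-07-15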

vowpal_wabbit — Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

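Vowpal Wabbit is a system that includes powerful techniques such as online learning, hashing, and reduce for advancing machine learning. As a result, it can provide high-quality solutions for a variety of problems.

Use: A system for running strong machine learning algorithms to solve complex problems
Difficulty: Easy
Cost: Medium

arxiv | Paper only | 2026-07-14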

A Better-than-$e^{1/e}$ Approximation Algorithm for Nash Social Welfare under Additive Valuations

We present an $(e^{1/e} - c)$-approximation algorithm for maximizing Nash social welfare under additive valuat

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-14

Cycles in Liquid Democracy: A Game-Theoretic Justification

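The concept of representation is important in political and economic systems. Delegation, the granting of representation rights, appears in a variety of systems. To operate delegation safely, efficiently, and effectively, decision makers designing a delegation scheme take into account

Quality prediction/anomaly detection | Reinforcement learning

Use: Securing representation
Difficulty: Hard
Cost: Medium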

Disentangling Forced and Internal Climate Variability in Single Realizations using Dynamic Mode Decomposition with Control

We show that a single climate realization can be decomposed into forced and internal components by treating ex

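Explainable | Reinforcement learning | Model-based | Detection | Regression

Use: Detection
Difficulty: Hard
Cost: Medium

Explainable | Reinforcement learning | Model-free (DQN / SAC) | Text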

Auditing the Risk Claims of Distributional Reinforcement Learning

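Analyzes risk evaluation in distributional reinforcement learning in order to make such evaluation easier.

Use: Risk evaluation in distributional reinforcement learning
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Reinforcement learning | Policy gradient (PPO / A3C) | Detection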

Removable Defects: The Economics and Limits of Deliberate Deficiency

A specialist tolerates blind spots that a generalist does not. Usually this is treated as a cost to be minimiz

Use: Detection
Difficulty: Hard
Cost: High

Actor-Critic Learning for Extended Mean Field Control with Deterministic Policies

This paper develops a model-free reinforcement learning framework for continuous--time extended mean field con

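Deep learning | Transformer | Reinforcement learning

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

Suited to tabular data | Quality prediction/anomaly detection | Deep learning | Transformer | Detection | Tabular | Reinforcement learning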

Transformer-Guided Swarm Intelligence for Frugal Neural Architecture Search

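This study develops a method for reducing the cost of conventional NAS methods. The method runs NAS using a Transformer.

Use: Developing a method to reduce the cost of NAS (Neural Architecture Search)
Difficulty: Hard
Cost: Low

Quality prediction/anomaly detection | Reinforcement learning | Policy gradient (PPO / A3C)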

Philosopher and Prophet Inequalities for Divisible Items

We study online welfare maximization with divisible resources. A sequence of $n$ players arrive one by one; up

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

huggingface | Hugging Face available | 2026-07-13

SVR-R1: Bootstrapping Multi-modal Reasoning with Self-verification in Reinforcement Learning

We introduce Self-Verified Reasoner (SVR-R1), a multi-turn RL framework that turns a model's own verification

Computer vision | Segmentation | Generation | Multimodal | Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: High

arxiv | Paper only | 2026-07-12

Reinforcement Learning for Execution under Dynamic Fees in a Closed-Loop DEX Simulator

Trader-facing dynamic fees are increasingly proposed for automated market makers (AMMs), but historical data d

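Suited to tabular data | Natural language processing | RAG | Tabular | Reinforcement learning

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Low

arxiv | Paper only | 2026-07-12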

The Complexity of Computing Coarse Correlated Equilibria in Markov Games with a Single Controller

We study the complexity of computing stationary Markov coarse correlated equilibria (CCE) in discounted single

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-12

Which Wallpaper Groups Arise from Tiled Games?

Which discrete symmetry groups can arise from strategic interaction? We tile the plane with copies of a bimatr

Reinforcement learning | Policy gradient (PPO / A3C) | Classification

Use: Classification
Difficulty: Hard
Cost: Low

huggingface | Hugging Face available | 2026-07-12

Predictive Divergence Masks for LLM RL

Reinforcement learning for large language models (LLMs) typically relies on trust-region masks to stabilize of

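Deep learning | Model compression/quantization | Text | Reinforcement learning

Use: Technical verification and paper-reading aid
Difficulty: Easy
Cost: High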

Double elimination formats for a 64-team FIFA World Cup

The recent expansion of the FIFA World Cup to 48 teams has prompted discussions regarding a potential further

Quality prediction/anomaly detection | Reinforcement learning | Multi-agent

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

Best-of-Both-Worlds Fairness for Mixed Goods and Chores

We study the fundamental problem of fairly dividing indivisible items among agents with additive utilities. In

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

Optimal Subsidy Bounds for Goods and Chores: One Dollar Each Suffices

We study the fair allocation of $m$ indivisible items to $n$ agents with additive utilities. In our setting, e

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

Fair Division with Binary Valuations: Characterizations

We consider the fair allocation of indivisible goods with binary valuations. In this setting, the maximum Nash

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

huggingface | Hugging Face available | 2026-07-11

Beyond Euclidean Clipping: Overcoming Exploration Collapse in LLM RL via Riemannian Isometric Policy Optimization

Reinforcement learning (RL) has become a dominant paradigm for enhancing LLMs' reasoning capabilities. However

Use: Technical verification and paper-reading aid
Difficulty: Easy
Cost: High

arxiv | Paper only | 2026-07-10

A Symbolic Neural CPU for Quantization-Simulated Writeback and Interpretable Program Execution

Neural networks can learn algorithmic input-output mappings, but trusting a learned executor requires more tha

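Explainable | Easy to try on CPU | Deep learning | Transformer | Reinforcement learning

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-10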

Beyond Bayesian Nash: Learning Minimax-Regret Equilibria for Adversarial Team Games under Asymmetric Information

Adversarial team games (ATGs) with asymmetric information, such as adversarial path-finding, goal search, and

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-10

Implicit Midpoint Gradient Descent: Fast and Learning rate free convergence for Zero-Sum Games

We study unconstrained bilinear zero-sum games, a fundamental model in online learning, adversarial optimizati

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-09

Offline Nash Solvers Meet Online Tree Search in Multi-Agent Games on Graphs

Proposes a Primitive-Guided Tree Search algorithm for solving for Nash equilibria in multi-agent games.

Use: Solving multi-agent games
Difficulty: Hard
Cost: Medium

Pure Nash Equilibria in Graphical Games of Bounded Width Revisited

We revisit the complexity of deciding whether a graphical game admits a pure Nash equilibrium (PNE) parameteri

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

Positional Determinacy with Colored Vertices: a 1-to-2-Player Lift

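To guarantee positional determinacy, the authors study symmetric games with colored vertices. They find that positional determinacy is also guaranteed for the corresponding paired colored games.

Use: The positional determinacy problem
Difficulty: Hard
Cost: Medium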

Eigenmanifold in Game: Evidence from human continuous strategy game experiments

In evolutionary game dynamics, there exists a hypothesis, which states that, the dynamic structure of the game

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

Stable Matchings with Minimum Utility Gap

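The stable matching problem considered here asks for a matching in which agents receive balanced utilities. To solve it, the authors propose a framework for matchings in which more than one partner can be selected, and use two metrics to evaluate how balanced the utilities are

Use: The stable matching problem
Difficulty: Hard
Cost: Medium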

Simple Nash Equilibria for Qualitative Multiplayer Games

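This study examines the problem of incomplete information in stochastic game theory. Incomplete information involves uncertainty about the outcome of the game. In such situations, game theorists acquire information in order to predict the outcome.

Use: The incomplete-information problem in game theory
Difficulty: Hard
Cost: Medium

huggingface | Hugging Face available | 2026-07-08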

DeepSearch-World: Self-Distillation for Deep Search Agents in a Verifiable Environment

Training tool-use agents to improve from their own experience remains challenging, as supervised fine-tuning r

Deep learning | Model compression/quantization | Generation | Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-08

Agon: Competitive Cross-Model RL with Implicit Rival Grading of Reasoning

Reinforcement learning from verifiable rewards (e.g. GRPO) is the engine behind today's reasoning models, yet

Computer vision | Segmentation | Text | Reinforcement learning

Use: Technical verification and paper-reading aid
Difficulty: Easy
Cost: High

github | GitHub available | 2026-07-08

deep-reinforcement-learning — Repo for the Deep Reinforcement Learning Nanodegree program

A learning repository for Deep Reinforcement Learning.

Use: Implementation and verification platform
Difficulty: Easy
Cost: Medium

A Gold-Standard Study of What Makes a Lightweight Game-Playing Agent Strong

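This work concerns the enforcement and parody of win conditions in games that players win. It is particularly interested in card-player games.

Deep learning | CNN | Text | Reinforcement learning

Use: Determining winning approaches for computer games
Difficulty: Hard
Cost: High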

Quantum combinatorial games

This study proposes a new approach to game theory using quantum theory. It examines existing theory on games without chance played between two players.

Use: Quantum combinatorial games
Difficulty: Hard
Cost: Medium

arxiv | GitHub available | 2026-07-07

FootsiesGym: A Fighting Game Benchmark for Two-Player Zero-Sum Imperfect-Information Games

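Addresses imperfect-information games arising in the neutral play of fighting games and develops FootsiesGym, an open-source environment for imperfect-information games.

Use: Building a fighting game environment
Difficulty: Hard
Cost: High

Sensor/time series | Deep learning | Model compression/quantization | Detection | Generation | Reinforcement learning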

6G Sensing Security: Distributed Game-Theoretic RL for Urban Beamforming and Attacker Detection

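Studies 6G security using distributed game theory in next-generation wireless networks. Distributed game theory is required for 6G communication systems to realize both environment sensing and data transmission

Use: Distributed game theory in 6G
Difficulty: Hard
Cost: Medium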

Strategic Bargaining in Multi-Buyer Markets: Reinforcement Learning from Verifiable Rewards for LLM Negotiations

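Builds a negotiation system for markets with multiple buyers. When the size of the market is not fully known, the seller incurs losses. The seller needs to gauge the market size, which is difficult when there are multiple buyers.

Use: Negotiation in markets with multiple buyers
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-07-06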

Game Conductors of Finite Groups: Determinantal Torsion from Structured Payoff Probes

We attach to a finite group $G$ and a structured payoff probe $φ$ an integer \emph{payoff-difference lattice}

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-06

Dynamics and Convergences for Markov Coevolutionary Opinion Formation Games in Dynamic Social Networks

While deterministic variants of the coevolutionary opinion formation games such as the K-Nearest Neighbor (K-N

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-06

Multi Choice Min Prophet

We study the minimization counterpart of the classic prophet inequality, often termed the min prophet or cost

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-05

Mechanism Design for Locating a Bridge Between Regions with Prelocated Facilities

In many urban planning projects, social planners require the construction of a bridge to connect two regions s

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-03

New bounds on randomized metric distortion of top-$k$ voting

We prove new upper and lower bounds on metric distortion for randomized social choice mechanisms. Under first-

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-03

Random Serial Dictatorship is $\sqrt{2}$-Envy-Free

We analyze the house allocation problem, in which a set of agents must be matched to a set of objects for whic

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-03

A Tractable Continuous-Time Model for Designing Interventions for Time-Inconsistent Agents

Designing effective goals and rewards for time-inconsistent agents is a central problem in many long-term task

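Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

Suited to tabular data | Quality prediction/anomaly detection | Natural language processing | RAG | Generation | Tabular | Reinforcement learning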

Hybridizing a Grouping Metaheuristic with Reinforcement Learning for the One-Dimensional Bin Packing Problem

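The one-dimensional bin packing problem (1D-BPP) is an NP-hard combinatorial optimization problem over indivisible items with many practical applications. In this study, Falkenauer's Hybrid Grouping Genetic Algorithm (

Use: One-dimensional bin packing
Difficulty: Hard
Cost: Low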

Complex dynamics in the Sherrington-Kirkpatrick game

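Presents an approach for determining the optimal location of station points, in order to solve a new location game based on a fair-allocation notion called the envy ratio.

Use: Location game based on the envy ratio
Difficulty: Hard
Cost: Medium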

Facility Location Game with Envy Ratio

Presents an approach for solving a two-level location problem based on the max equation.

Use: Two-level location problem based on the max equation
Difficulty: Hard
Cost: Medium

Deep Reinforcement Learning to Master the Asymmetric Strategy of Baghchal

Baghchal is a two-player asymmetric board game with Nepali origins where four tigers are to capture goats and

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: High

MMAO-Dyn: A Metabolic Multi-Agent Optimizer for Dynamic Optimization

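This study set out to make the Metabolic Multi-Agent Optimizer (MMAO) applicable to dynamic optimization. MMAO-Dyn addresses non-stationary settings in which changes in the environment invalidate previously valid local structure

Reinforcement learning | Multi-agent | Text

Use: Dynamic optimization
Difficulty: Hard
Cost: Medium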

Asymmetric Trading Prophets

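This study examines the behavior of traders and prophets. Traders predict price changes and maximize profit. Prophets know that price changes are predictable. The study analyzes the competing behavior of traders and prophets, and the trader's profit

Use: Traders and prophets
Difficulty: Hard
Cost: Medium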

Multiwinner Voting with Spatial Preferences under Incomplete Information

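This study examines voting problems with many candidates. Voters can support multiple candidates and rate or reject them. The study proposes a way to take fairness into account in voting.

Use: Voting methods with many candidates
Difficulty: Hard
Cost: Medium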

Which Voting Rules Are More Resilient to Coalitional Manipulation?

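This study examines the resilience of voting rules with many candidates. Voting rules can prevent voters from manipulating the vote. The study evaluates the resilience of such rules.

Use: Resilience of voting rules with many candidates
Difficulty: Hard
Cost: Medium

Natural language processing | Fine-tuning | Classification | Embeddings | Reinforcement learning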

Diffusing Blame: Task-Dependent Credit Assignment in Biologically Plausible Dual-Stream Networks

Biological neural circuits obey Dale's principle: each neuron's synapses are uniformly excitatory or inhibitor

Use: Classification
Difficulty: Hard
Cost: High

A Large-Scale Empirical Evaluation of MMAO Under Fair-Budget Continuous and Discrete Benchmarks

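This study evaluates the suitability of the Metabolic Multi-Agent Optimizer (MMAO) on a diverse set of benchmarks. MMAO is a closed-loop system for distributing resources among multiple agents.

Use: Optimizing resource distribution using appropriate methods
Difficulty: Hard
Cost: Medium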

Knowing Who, Not How Much: Learning-Augmented Mechanisms for Consumer Utility Maximization

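A study of mechanism design that respects individual values. It examines the relationship between individual values and mechanism design, and designs mechanisms that assist individual decision-making.

Use: Research on mechanism design to assist individual decision-making
Difficulty: Hard
Cost: Medium

Quality prediction/anomaly detection | Reinforcement learning | Multi-agent | Text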

A Contextual-Bandit Oversight Game with Two-Sided Informational Asymmetry

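A study of oversight by decision makers who assist AI. It investigates how to realize oversight in which the decision maker and the AI exchange information in order to evaluate and decide on actions proposed by the AI.

Use: Research on oversight by decision makers who assist AI
Difficulty: Hard
Cost: Medium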

Learning Fair Allocation of Indivisible Items from Limited Feedback

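An algorithm for deciding an allocation of items that respects individual values. This study develops an algorithm that decides allocations respecting each individual's valuation of the items.

Use: An algorithm for deciding item allocations that respect individual values
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-27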

Reaching as Cheap as Possible in 1-clock Robust Weighted Timed Games

The value problem for 2-player games on graph generally consists in determining the minimal value Min can ensu

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-26

Neuromorphic Energy-Aware Learning for Adaptive Deep Brain Stimulation

Neuromorphic and edge computing research has focused on reducing the inference cost of neural network controll

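Deep learning | Model compression/quantization | Audio | Reinforcement learning

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-26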

GTI-mSEMP Framework : A Proposed Framework to Simulate Malware Propagation with Inclusion of Attacker-Defender Strategy

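Malware infections can spread across an entire network. Existing models treat defensive strategies against threats as static parameters, but in practice these depend on the competition between attack and defense strategies. For this reason, game theory is used

Sensor/time series | Reinforcement learning

Use: Simulating malware infection and proposing defense strategies
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-26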

Characterisation of reactive Nash equilibria in repeated additive games

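In joint-task games, two players cooperate or compete

Use: Analysis of reactive strategies in joint-task games
Difficulty: Hard
Cost: Medium

Explainable | Suited to MI | Quality prediction/anomaly detection | Deep learning | Model compression/quantization | Generation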

Multi-Objective Molecular Generation with Frequency-Controlled Evolutionary Dynamics

Molecule generation methods that leverage generative models have been successfully applied to drug discovery.

Use: Generation
Difficulty: Hard
Cost: High

Pick Two: An Adversarial Animal Survival Game

The "Pick Two" animal selection puzzle is a popular thought experiment in which two animal species must defend

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

Almost EFX in Hypergraphs

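This study proposes a method for allocating divisible goods based on individual valuations. The method aims for an efficient allocation while taking individual valuations into account.

Use: Allocation of divisible goods
Difficulty: Hard
Cost: Medium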

Existence of Pure Strategy Nash Equilibria in Finite Noncooperative Games

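This study proposes conditions for pure-strategy equilibria in noncooperative games. The conditions assess the necessity of equilibrium while taking the outcomes of individual games into account.

Use: Pure-strategy equilibria in noncooperative games
Difficulty: Hard
Cost: Medium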

EvoFlock: evolved inverse design of multi-agent motion

Tuning multi-agent models supports the realization of realistic simulations. In this work, a newly developed model makes such tuning possible.

Use: Tuning multi-agent models
Difficulty: Hard
Cost: Medium

Hotelling-Downs with Facility Synergy: The Mall Effect

This project searches for the optimal placement of multi-station endpoints serving multiple stations.

Use: Developing optimal placement of multi-station endpoints for positioning
Difficulty: Hard
Cost: Medium

Restoring Incentive Compatibility in Two-Stage Energy Markets with Prosumers

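The electricity market problem based on distributed control can represent situations in which supply exceeds demand, rather than situations in which supply and demand are balanced.

Reinforcement learning | Multi-agent | Generation

Use: Resolving imbalance problems in the decentralization of electricity supply
Difficulty: Hard
Cost: Medium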

How to program a never-losing chess engine

This article proposes a model, based on graph theory, to represent a variety of two-player games of perfect in

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

Equilibrium and Infeasibility: A new solution concept for games

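This study proposes a new solution concept for games that accounts for non-commonality, together with a polynomial-time solution that does not rely on constraints for a common solution.

Use: Research on handling infeasibility in joint games
Difficulty: Hard
Cost: Medium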

Strict Fairness at What Cost? Envy-Free Contracts with Subsidies

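Joint contract design matters in that a principal allocates multiple tasks among agents.

Use: Research on designing unbiased, fair contracts in joint contract design
Difficulty: Hard
Cost: Medium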

Decidability and Undecidability Results for LIA-Definable Impartial Combinatorial Games

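This study presents quantitative results on the decidability and undecidability of nondeterministic games.

Use: Studying decidability and undecidability of nondeterministic games defined by linear integer equations concerning boundedness
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-22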

Rationalizing collective revealed preferences with an application in fair resource allocation

This paper presents a revealed preference approach for rationalizing collective consumption behavior. We intro

Explainable | Reinforcement learning

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-22

Flow Games with Public Arcs: the Least Core and the Nucleolus

We study flow games with public arcs, an extension of classical cooperative flow games that allows players to

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-21

A Note on Learnable Nash Equilibrium

A Nash equilibrium is learnable if there exists a myopic adjustment dynamic for which it is asymptotically sta

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-21

Fundamental market design as a layer of AI-agent alignment

This paper argues that AI-agent alignment in markets should not be understood only as a property of agents, bu

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-21

Stationary Robust Mean-Field Games under Model Mismatches

Deploying multi-agent reinforcement learning (MARL) in the real world is often limited by model mismatches bet

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-06-20

Physics-Informed Eikonal Caging for Whole-Arm Manipulation Planning

Planning contact-rich whole-arm manipulation is challenging because interactions that involve extended robot g

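Quality prediction/anomaly detection | Reinforcement learning | Policy gradient (PPO / A3C) | Video

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-06-20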

Game-Theoretic Framework for Private Data Sharing in Vehicular Networks

We present a novel game-theoretic framework designed to enhance privacy and scalability in decentralized vehic

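Sensor/time series | Reinforcement learning | Policy gradient (PPO / A3C)

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | GitHub available | 2026-06-18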

Evolutionary Discovery of Developmental Reward Schedules in Deep Reinforcement Learning

The temporal structure of reward composition in reinforcement learning (RL) is typically hand-designed and hel

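Suited to MI | Deep learning | Transformer | Reinforcement learning

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: High

arxiv | GitHub available | 2026-06-18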

Provably Sub-Linear Two-Timescale NeuroEvolution with Online Plasticity

NeuroEvolution of Augmenting Topologies (NEAT) is a widely used neuroevolution algorithm for learning neural n

Computer vision | Segmentation | Reinforcement learning

Use: Technical verification and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-15

Evolutionary Bilevel Reward Shaping for Generalization in Reinforcement Learning

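Proposes an algorithm that enables robot learning in mobile environments.

Machine learning | Feature engineering | Reinforcement learning

Use: Robot learning in mobile environments
Difficulty: Hard
Cost: High

huggingface | Hugging Face available | 2026-05-07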

Masked Diffusion Language Models are Strong and Steerable Text-Based World Models for Agentic RL

Recent growth in reinforcement learning (RL) has surfaced a need for diverse, specialized training environment

Use: Technical verification and paper-reading aid
Difficulty: Easy
Cost: High