Reinforcement Learning

Tags: Reinforcement learning, Policy gradient (PPO / A3C), Classification, Text

paperless-ngx — A community-supported supercharged document management system: scan, index and archive all your documents

Use: Document management
Difficulty: Easy
Cost: Low

gradio — Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

A Python library for building and sharing machine learning apps.

Tags: Reinforcement learning, Policy gradient (PPO / A3C), Image

Use: Building machine learning apps
Difficulty: Easy
Cost: Medium

MaaAssistantArknights — 《明日方舟》小助手，全日常一键长草！| A one-click tool for the daily tasks of Arknights, supporting all clients.

A support tool for the game Arknights (《明日方舟》). It can run all daily tasks in one go.

Use: Game support tool
Difficulty: Easy
Cost: Medium

machine-learning-for-trading — Code for Machine Learning for Trading, 3rd edition — from data sourcing to live execution.

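To improve the transparency of LLM inference, this work separates the computation of DiffusionGemma and evaluates Variable Transparency and Algorithmic Transparency.

Use: Understanding LLM transparency, misuse, and over-stabilization
Difficulty: Easy
Cost: High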

stable-baselines3 — PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

This repository provides a reinforcement learning bridge for LLM-based agent applications.

Use: A bridge that simplifies reinforcement learning
Difficulty: Easy
Cost: High

PufferLib — Puffing up reinforcement learning

Use: Reinforcement learning library
Difficulty: Easy
Cost: Medium

Approximate Quantum State Preparation Through Proximal Policy Optimization

This study proposes a new approach that uses deep reinforcement learning to learn approximate quantum state preparation and to examine optimal ways of operating quantum systems.

Use: Quantum state preparation
Difficulty: Hard
Cost: Medium

Tags: Quality prediction / anomaly detection, Reinforcement learning, Multi-agent, Text

AREX: Towards a Recursively Self-Improving Agent for Deep Research

Deep research requires agents to find answers that jointly satisfy multiple constraints. Discovering such answ

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

A New Well-Supported Semantics for Description Logic Programs

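This study used large language models to investigate extensions of description logic, and found that large language models make such extensions possible.

Use: Extending description logic
Difficulty: Hard
Cost: Medium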

Hybrid MKNF with Classical Negation in the Rule Component

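This study used large language models to develop a control logic programming language that permits both kinds of negation, and found that large language models make such a language possible.

Use: Permitting both kinds of negation
Difficulty: Hard
Cost: Medium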

Chess_db: A framework for working with large chess game datasets

Chess is a two player strategic game that is embedded in classical AI culture as it was once the frontier for

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

arxiv | GitHub available | 2026-07-23

Workflow-Localized Mechanism Learning: Attribution-Guided Repair and Knowledge Reuse for Structured Agent Skills

Agent Skills package reusable procedural knowledge as external artifacts for frozen language-model agents, yet

Tags: MI-oriented, Reinforcement learning, Policy gradient (PPO / A3C)

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Advances in STV Margin Computation

Single transferable vote (STV) is a multi-winner preferential proportional electoral system. The margin is the

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Discrete Truthful Heterogeneous Two-Facility Location: The Line and Beyond

We study deterministic strategyproof mechanisms for discrete heterogeneous two-facility location. In our model

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

github | GitHub available | 2026-07-23

qlib — Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, including supervised learning, market dynamics modeling, and RL, and is now equipped with https://github.com/microsoft/RD-Agent to automate R&D process.

Uses AI technology to build a quantitative investment platform.

Tags: Reinforcement learning, Policy gradient (PPO / A3C), Supervised

Use: Quantitative investment platform
Difficulty: Easy
Cost: Medium

Tags: Explainable, Quality prediction / anomaly detection, Reinforcement learning, Policy gradient (PPO / A3C), Classification, Audio

Explanation-Based Runtime Verification for Trustworthy ML-driven Optical Networks

Machine learning (ML) models are increasingly integrated into optical network automation frameworks to support

Use: Classification
Difficulty: Hard
Cost: Low

PG-KINN: A Physics-Informed Petrov-Galerkin Kolmogorov-Arnold Network for Solving Forward and Inverse PDEs

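This study concerns a Petrov-Galerkin Kolmogorov-Arnold network that builds physics knowledge into the learning architecture (Physics-Informed Petrov-Galerkin Kolmogorov-A

Use: Improving learning for solving equations
Difficulty: Hard
Cost: Medium

Tags: MI-oriented, Sensor / time series, Reinforcement learning, Policy gradient (PPO / A3C), Text, Time series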

Post-Training in Time Series Foundation Models: A Unifying Framework

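This study proposes a way to adapt pretrained time series foundation models to target tasks through post-training adaptation.

Use: Post-training adaptation of time series foundation models
Difficulty: Hard
Cost: High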

Fisher Widths: Local Learning Geometry and Anisotropic Recovery

We study Gaussian-width complexity on statistical manifolds through a pair of functionals: the primal Fisher w

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Dreamer-CPC: Message Learning with World Models for Decentralized Multi-agent Reinforcement Learning

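Proposes a method for decentralized multi-agent reinforcement learning in distributed systems. Each agent exchanges messages based on its local observations and learns messages that take long-term experience into account, so that the decentralized

Tags: Reinforcement learning, Policy gradient (PPO / A3C), Embedding

Use: Decentralized multi-agent reinforcement learning
Difficulty: Hard
Cost: Low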

The World Model Remembers, the Actor Forgets: Dream Rehearsal for Continual Model-Based RL

Model-based reinforcement-learning agents of the DreamerV3 family forget catastrophically when trained on task

Tags: Reinforcement learning, Model-based

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Courteous Anticipation: Improving Long-Lived Task Planning in Persistent Shared Environments

We consider a task planning scenario in which robots sharing a persistent environment are assigned tasks one a

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

TRUST-ESD: A Risk-Calibrated and Governance-Aware AI Framework for Enterprise Strategic Decision Support Under Uncertainty

Enterprise strategic decision support requires AI systems that are not only accurate, but also uncertainty-awa

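Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Low

Tags: Explainable, Reinforcement learning, Policy gradient (PPO / A3C), Classification, Text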

Two-Step Occupation Coding

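Occupation coding is the task of identifying an occupational classification from a job title. The study shows that one of two two-step approaches is the most effective.

Use: A two-step approach to occupation coding
Difficulty: Hard
Cost: Low

Tags: Sensor / time series, Reinforcement learning, Policy gradient (PPO / A3C), Detection, Audio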

Distributed Acoustic Localization Array Deployed Using a Soft Everting Vine Robot

Soft robot exteroception is increasingly being explored for a variety of field applications. In this work, we

Use: Detection
Difficulty: Hard
Cost: Medium

Digital Twin Modeling of a Highly Automated Agricultural Tractor

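This project built a digital twin model of an automated tractor for agriculture and forestry. The digital twin uses CAN communication to mimic the tractor's movements and simulate the behavior of the real tractor.

Tags: Reinforcement learning, Image

Use: Digital twin modeling of an automated agricultural and forestry tractor
Difficulty: Hard
Cost: Medium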

Contact-Persistent Full Actuation for Aerial Physical Interaction

Fully actuated unmanned aerial vehicles (UAVs) are usually certified through rank conditions on a control-allo

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Improved Lower Bounds and Output Augmentation for Facility Location Mechanisms

We study the strategic facility location problem under the egalitarian objective, where a mechanism uses the r

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Identity-Truthful Online Decision-Making

In Bayesian online selection, a decision-maker observes a sequence of stochastic rewards and must immediately

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Boundary-Adapted PINNs for Elliptic Dirichlet Problems: $H^2(Ω)$ A Priori Error Bounds with Application to Mean Escape Time Computation

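This study uses an Ocean model to evaluate the feasibility of using incomplete observations of the ocean, and the ability to learn directly from incomplete observations with a generative state-space model and an optimization framework.

Tags: Reinforcement learning, Policy gradient (PPO / A3C), Text

Use: Using incomplete observations in an Ocean model
Difficulty: Hard
Cost: Medium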

Emergent Autonomous Drifting for Collision Avoidance in Real-World Winter Driving Scenarios

Real-world collision avoidance is a core motivation for studying the dynamics and control of high sideslip dri

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Computing on the Fly: Navigating a Vision for the Future of Drone Computing

The report envisions a decade in which drones move goods, medical supplies, and information at a scale compara

Tags: Reinforcement learning, Detection, Generation

Use: Detection
Difficulty: Hard
Cost: High

The Twist Decomposition of Serial Robots Under Lower-Mobility Tasks

This paper introduces a twist decomposition framework for serial manipulators performing lower mobility tasks.

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Fabric Pneumatic Artificial Muscles Based on the Drawstring Principle

Pneumatic artificial muscles have wide applications in robotics and industrial fields. Conventional pneumatic

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Packing Linear Programs and Fractional Knapsack using Comparison Oracles

We study the problem of recovering the objective of a packing linear program when the algorithm accesses only

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Hospitals/Residents with Inseparable Couples: Finding a Coalition-Stable Assignment Is NP-Hard

In recent work on course allocation, Rodríguez and Manlove consider the complexity of finding a stable assignm

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

github | GitHub available | 2026-07-21

Book-Mathematical-Foundation-of-Reinforcement-Learning — This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."

Mathematical Foundations of Reinforcement Learning covers the mathematical foundations of reinforcement learning.

Use: Homepage for a book on reinforcement learning
Difficulty: Easy
Cost: Medium

Integrity-Gated Eco-CACC: Epistemic Admissibility for Cooperative Driving at Signalized Intersections

Eco-Cooperative Adaptive Cruise Control (Eco-CACC) systems rely on accurate localization, signal timing, and i

Tags: Sensor / time series, Reinforcement learning, Model-based, Detection

Use: Detection
Difficulty: Hard
Cost: Medium

The Open Ant: A Robot Platform for Reinforcement Learning Research

Reinforcement learning (RL) research has demonstrated success in both physical and simulated domains; however,

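Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Tags: MI-oriented, Reinforcement learning, Policy gradient (PPO / A3C), Generation

arxiv | GitHub available | 2026-07-20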

MEVION: Low-Cost Open-Source Data Collection System for Powerful and High-Speed Dual-Arm Manipulation

The global competition for developing robotic foundation models is intensifying. Among the data collection sys

Use: Generation
Difficulty: Hard
Cost: Medium

RT-SHCUA: Real-Time Self-Hosted Computer-Use Agent for UAV Control

Natural-language control offers a promising interface for unmanned aerial vehicles (UAVs), but directly applyi

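Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Tags: Explainable, Quality prediction / anomaly detection, Reinforcement learning, Policy gradient (PPO / A3C), Image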

ConceptTree: Bringing Semantic Transparency to Black-Box Decision Making for Robotic Manipulation

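This paper proposes a framework called ConceptTree. It represents high-level skill selection in manipulation using human-interpretable concepts, which improves transparency.

Use: Making high-level skill selection in manipulation transparent
Difficulty: Hard
Cost: High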

Lifelong Multi-Subsystem Pickup and Delivery with Buffer-Limited Handover Stations

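In Pickup and Delivery systems, load management is a major problem. This study concerns a new approach that accounts for offload management in Pickup and Delivery systems, Handover-Awa

Use: Offload management in Pickup and Delivery systems
Difficulty: Hard
Cost: Medium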

Stability and Comfort in Mobile Robot-Pedestrian Interactions

Mobile robots in public spaces must ensure pedestrians' comfort, and yet empirical studies of walkers' subject

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Predicting Grasping Compliance in Robotic Hands through Analytical-Model-Informed Neural Networks

In robotic manipulation studies, grasping is often treated as a binary success or failure problem, usually def

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Disturbance-Aware Flight for Aerial Robots in Narrow Space

Autonomous flight of aerial robots in narrow space remains challenging due to strong aerodynamic disturbances

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

1-out-of-5 Maximin-Share Allocations Always Exist for Four Agents

For four agents with nonnegative additive valuations, a complete 1-out-of-5 maximin-share allocation always ex

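Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Tags: MI-oriented, Sensor / time series, Reinforcement learning, Multi-agent, Anomaly detection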

Compositional Semantic Communication for Physical AI: Category Theory Meets Game Theory

Physical artificial intelligence (AI) systems involve distributed sensing agents with embedded AI models that

Use: Anomaly detection
Difficulty: Hard
Cost: Medium

huggingface | GitHub available | Hugging Face available | 2026-07-20

Differentiable Logic Gate Networks for Low-Latency EEG Classification on Edge Devices

Real-time EEG classification on edge devices is bottlenecked by the floating-point arithmetic of conventional

Tags: Easy to try on CPU, Reinforcement learning, Multi-agent, Classification, Detection

Use: Classification
Difficulty: Easy
Cost: Low

github | GitHub available | 2026-07-20

Gymnasium — A standard API for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)

Gymnasium is an API that provides simulated environments for single-agent RL.

Use: Providing simulated environments
Difficulty: Easy
Cost: Medium

Retriever: Composing Closed-Loop Asynchronous Robot Programs

Building long-horizon robot agents requires composing closed-loop pipelines -- perception, belief update, plan

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

VIDAR: Visual-Inertial Dense Alignment and Reconstruction via a Geometric Foundation Model

Monocular foundation models provide dense geometry but usually lack a stable metric scale. This paper presents

Tags: Reinforcement learning, Image

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Optimal Safety Control using High-Order Control Barrier Functions

This paper investigates the optimal safety control problem of nonlinear control systems by proposing novel hig

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Equilibrium analysis of three-player General Lotto game with leader-follower framework

In this paper, we introduce the General Lotto game with a regulator (R-Lotto), a leader-follower extension of

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-18

Value-Monotonicity Matters: A Concordance Loss for Deep Survival Prediction

Deep survival models are evaluated almost exclusively by the concordance index (C-index), yet they are commonl

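Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Tags: Explainable, Reinforcement learning, Policy gradient (PPO / A3C), Image

arxiv | GitHub available | 2026-07-18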

SinD 2.0: A Multi-City UAV Dataset with Semantic Risk Annotations for SOTIF-Oriented Safety Validation at Signalized Intersections

Safety validation at signalized intersections remains a critical bottleneck for the deployment of autonomous d

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-18

Censorship Resistance and Throughput with Multiple Concurrent Proposers

Censorship resistance is the defining advantage of blockchains over their centralized counterparts. Yet block

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-18

Audited Auctions: Reducing Harms in Advertising

Although standard auction mechanisms help truthfully reveal preferences of bidders, they can inadvertently res

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Differentiable Reinforcement Learning for Path Tracking by an Agile Fish-Like Robot

Fish-like swimming has inspired the design of several dozens if not hundreds of bioinspired robots in the last

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Linear Stability Analysis of an INDI Pitch-Rate Controller under Model Mismatch for a Tilt-Rotor VTOL UAV

Incremental Nonlinear Dynamic Inversion (INDI) is attractive for unmanned aerial vehicle (UAV) flight control

Tags: Explainable, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

A New Implementation of NeoSLAM and a Comparative Evaluation with RatSLAM

This study compares and evaluates two SLAM applications, NeoSLAM and RatSLAM, improves NeoSLAM, and proposes a reference dataset for the comparison.

Use: Comparative evaluation of SLAM applications
Difficulty: Hard
Cost: Medium

Let the Body Follow: Coupled Egocentric Control for Whole-Body Robot Teleoperation

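This study proposes a system that makes robot control egocentric and couples visual and body information to control the robot's movement and posture.

Use: Coupled egocentric whole-body robot control
Difficulty: Hard
Cost: Medium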

Vessel Trajectory Prediction using COLREGs-aware Optimal Planning

This paper presents a trajectory prediction method for marine vessels based on optimal planning. Crude initial

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Fair Allocation of Divisible Goods under Non-Linear Valuations

We study the problem of dividing homogeneous divisible goods among agents with non-linear valuations. Specific

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

huggingface | Hugging Face available | 2026-07-17

SeerGuard: A Safety Framework for Mobile GUI Agents via World Model Prediction

Mobile graphical user interface (GUI) agents have demonstrated remarkable capabilities in automating complex t

Tags: Reinforcement learning, Model-based

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: Medium

github | GitHub available | 2026-07-17

open_spiel — OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.

A collection of environments and algorithms for general reinforcement learning in games.

Use: Environments for general reinforcement learning in games
Difficulty: Easy
Cost: Medium

arxiv | Paper only | 2026-07-16

SMC-ES: Automated synthesis of formally verified control policies

The deployment of autonomous cyber-physical systems in safety-critical environments requires closed-loop contr

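Tags: Reinforcement learning, Model-free (DQN / SAC), Generation

Use: Automatically generating safe control policies
Difficulty: Hard
Cost: Medium

Tags: Suited to small data, Reinforcement learning, Multi-agent, Regression, Text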

Operator-Informed Gaussian Processes for Complex Helmholtz Wavefields: From Synthetic Benchmarks to In Vivo Brain Elastography

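The Helmholtz equation is a key equation describing the propagation of time-harmonic waves, and it has complex coefficients when the medium is lossy. Here, to infer the wave equation from wavefields in space, a physics-informed Gaussian Process (GP

Use: Physics-informed Gaussian processes for complex Helmholtz wavefields
Difficulty: Hard
Cost: Medium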

Algebraic Representability as the Limiting Regime of Grokking: An Exactly Solvable Model with Holomorphic Activations

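Shows that, in the limit of partitioned computation, the class of representable functions degenerates into a finite-dimensional algebraic variety, and that increasing model capacity promotes generalization.

Use: Representability of algorithms
Difficulty: Hard
Cost: High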

Stable Voting is PSPACE-Complete

Stable Voting and Simple Stable Voting, introduced by Holliday and Pacuit, are Condorcet-consistent voting rul

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

The Dynamic Verifiable Multi-Agent Human Agentic Loyalty Loop (DVM-HALL) Model and the Net Human-Agent Score (NHAS) in Autonomous Commerce

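Trust needs to be established for AI robots that interact with customers in automated stores. This model aims to build a trust relationship between customers and robots and to support customers' shopping.

Tags: Reinforcement learning, RLHF

Use: Establishing trust in AI robots that interact with customers in automated stores
Difficulty: Hard
Cost: Medium

github | GitHub available | 2026-07-15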

vowpal_wabbit — Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

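Vowpal Wabbit is a system that includes powerful techniques for advancing machine learning, such as online learning, hashing, and reductions. As a result, it can provide high-quality solutions for a wide range of problems.

Use: A system for running powerful machine learning algorithms to solve complex problems
Difficulty: Easy
Cost: Medium

arxiv | Paper only | 2026-07-14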

A Better-than-$e^{1/e}$ Approximation Algorithm for Nash Social Welfare under Additive Valuations

We present an $(e^{1/e} - c)$-approximation algorithm for maximizing Nash social welfare under additive valuat

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-14

Cycles in Liquid Democracy: A Game-Theoretic Justification

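The concept of representation is important in political and economic systems. Delegation, the granting of representation, appears in many systems. To operate delegation safely, efficiently, and effectively, decision-makers designing delegation must consider

Tags: Quality prediction / anomaly detection, Reinforcement learning

Use: Securing representation
Difficulty: Hard
Cost: Medium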

Disentangling Forced and Internal Climate Variability in Single Realizations using Dynamic Mode Decomposition with Control

We show that a single climate realization can be decomposed into forced and internal components by treating ex

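Tags: Explainable, Reinforcement learning, Model-based, Detection, Regression

Use: Detection
Difficulty: Hard
Cost: Medium

Tags: Explainable, Reinforcement learning, Model-free (DQN / SAC), Text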

Auditing the Risk Claims of Distributional Reinforcement Learning

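Analyzes risk assessment in distributional reinforcement learning in order to make that assessment easier.

Use: Risk assessment in distributional reinforcement learning
Difficulty: Hard
Cost: High

Tags: Quality prediction / anomaly detection, Reinforcement learning, Policy gradient (PPO / A3C), Detection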

Removable Defects: The Economics and Limits of Deliberate Deficiency

A specialist tolerates blind spots that a generalist does not. Usually this is treated as a cost to be minimiz

Use: Detection
Difficulty: Hard
Cost: High

Tags: Quality prediction / anomaly detection, Reinforcement learning, Policy gradient (PPO / A3C)

Philosopher and Prophet Inequalities for Divisible Items

We study online welfare maximization with divisible resources. A sequence of $n$ players arrive one by one; up

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-12

The Complexity of Computing Coarse Correlated Equilibria in Markov Games with a Single Controller

We study the complexity of computing stationary Markov coarse correlated equilibria (CCE) in discounted single

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-12

Which Wallpaper Groups Arise from Tiled Games?

Which discrete symmetry groups can arise from strategic interaction? We tile the plane with copies of a bimatr

Tags: Reinforcement learning, Policy gradient (PPO / A3C), Classification

Use: Classification
Difficulty: Hard
Cost: Low

Double elimination formats for a 64-team FIFA World Cup

The recent expansion of the FIFA World Cup to 48 teams has prompted discussions regarding a potential further

Tags: Quality prediction / anomaly detection, Reinforcement learning, Multi-agent

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Best-of-Both-Worlds Fairness for Mixed Goods and Chores

We study the fundamental problem of fairly dividing indivisible items among agents with additive utilities. In

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Optimal Subsidy Bounds for Goods and Chores: One Dollar Each Suffices

We study the fair allocation of $m$ indivisible items to $n$ agents with additive utilities. In our setting, e

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Fair Division with Binary Valuations: Characterizations

We consider the fair allocation of indivisible goods with binary valuations. In this setting, the maximum Nash

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-10

Beyond Bayesian Nash: Learning Minimax-Regret Equilibria for Adversarial Team Games under Asymmetric Information

Adversarial team games (ATGs) with asymmetric information, such as adversarial path-finding, goal search, and

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-10

Implicit Midpoint Gradient Descent: Fast and Learning rate free convergence for Zero-Sum Games

We study unconstrained bilinear zero-sum games, a fundamental model in online learning, adversarial optimizati

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-09

Offline Nash Solvers Meet Online Tree Search in Multi-Agent Games on Graphs

Proposes the Primitive-Guided Tree Search algorithm for solving Nash equilibria in multi-agent games.

Use: Solving multi-agent games
Difficulty: Hard
Cost: Medium

Pure Nash Equilibria in Graphical Games of Bounded Width Revisited

We revisit the complexity of deciding whether a graphical game admits a pure Nash equilibrium (PNE) parameteri

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Positional Determinacy with Colored Vertices: a 1-to-2-Player Lift

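To guarantee positional determinacy, the study examined symmetric games with colored vertices. It found that positional determinacy is also guaranteed for the paired coloring games.

Use: The positional determinacy problem
Difficulty: Hard
Cost: Medium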

Eigenmanifold in Game: Evidence from human continuous strategy game experiments

In evolutionary game dynamics, there exists a hypothesis, which states that, the dynamic structure of the game

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Stable Matchings with Minimum Utility Gap

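The stable matching problem considered here is to find a matching in which agents obtain equal utility. To solve it, the study proposes a framework around matchings in which agents can choose two or more partners, and uses two metrics to evaluate how balanced the utilities are

Use: Stable matching problems
Difficulty: Hard
Cost: Medium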

Simple Nash Equilibria for Qualitative Multiplayer Games

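This study examined incomplete-information problems in stochastic game theory. Incomplete information involves uncertainty about the outcome of a game. In such situations, game theorists acquire information in order to predict the outcome.

Use: Incomplete-information problems in game theory
Difficulty: Hard
Cost: Medium

github | GitHub available | 2026-07-08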

deep-reinforcement-learning — Repo for the Deep Reinforcement Learning Nanodegree program

A learning repository for Deep Reinforcement Learning.

Use: Implementation and validation platform
Difficulty: Easy
Cost: Medium

arxiv | Paper only | 2026-07-07

Quantum combinatorial games

This study proposes a new approach to game theory using quantum theory. It examines existing theory on games without chance played between two players.

Use: Quantum combinatorial games
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-06

Game Conductors of Finite Groups: Determinantal Torsion from Structured Payoff Probes

We attach to a finite group $G$ and a structured payoff probe $φ$ an integer payoff-difference lattice

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-06

Dynamics and Convergences for Markov Coevolutionary Opinion Formation Games in Dynamic Social Networks

While deterministic variants of the coevolutionary opinion formation games such as the K-Nearest Neighbor (K-N

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-06

Multi Choice Min Prophet

We study the minimization counterpart of the classic prophet inequality, often termed the min prophet or cost

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-05

Mechanism Design for Locating a Bridge Between Regions with Prelocated Facilities

In many urban planning projects, social planners require the construction of a bridge to connect two regions s

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-03

New bounds on randomized metric distortion of top-$k$ voting

We prove new upper and lower bounds on metric distortion for randomized social choice mechanisms. Under first-

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-03

Random Serial Dictatorship is $\sqrt{2}$-Envy-Free

We analyze the house allocation problem, in which a set of agents must be matched to a set of objects for whic

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-03

A Tractable Continuous-Time Model for Designing Interventions for Time-Inconsistent Agents

Designing effective goals and rewards for time-inconsistent agents is a central problem in many long-term task

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-02

Complex dynamics in the Sherrington-Kirkpatrick game

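Presents an approach for determining the optimal location of station points in order to solve a new location game based on a fair-allocation measure called the envy ratio.

Use: Location game based on the envy ratio
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-02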

Facility Location Game with Envy Ratio

Presents an approach for solving a two-level location problem based on the max equation.

Use: Two-level location problem based on the max equation
Difficulty: Hard
Cost: Medium

MMAO-Dyn: A Metabolic Multi-Agent Optimizer for Dynamic Optimization

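This study needed to make the Metabolic Multi-Agent Optimizer (MMAO) applicable to dynamic optimization. MMAO-Dyn, in nonstationary settings where environmental change has invalidated previously valid local structure

Tags: Reinforcement learning, Multi-agent, Text

Use: Dynamic optimization
Difficulty: Hard
Cost: Medium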

Asymmetric Trading Prophets

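This study examines the behavior of traders and prophets. The trader predicts price changes and maximizes profit. The prophet knows that price changes can be predicted. The study analyzes the competing behavior of the trader and the prophet, and the trader's

Use: Traders and prophets
Difficulty: Hard
Cost: Medium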

Multiwinner Voting with Spatial Preferences under Incomplete Information

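This study examines voting problems with many candidates. Voters can support multiple candidates and can rate or reject them. The study proposes a way to take fairness into account in voting.

Use: Voting methods with many candidates
Difficulty: Hard
Cost: Medium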

Which Voting Rules Are More Resilient to Coalitional Manipulation?

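This study examines the resilience of voting rules with many candidates. A voting rule can prevent voters from manipulating the vote. The study evaluates the resilience of voting rules.

Use: Resilience of voting rules with many candidates
Difficulty: Hard
Cost: Medium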

A Large-Scale Empirical Evaluation of MMAO Under Fair-Budget Continuous and Discrete Benchmarks

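This study evaluates the suitability of the Metabolic Multi-Agent Optimizer (MMAO) on a variety of benchmarks. MMAO is a closed-loop system for distributing resources among multiple agents.

Use: Optimizing resource allocation with an appropriate method
Difficulty: Hard
Cost: Medium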

Knowing Who, Not How Much: Learning-Augmented Mechanisms for Consumer Utility Maximization

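A study of mechanism design that respects individual values. It examines the relationship between individual values and mechanism design, and designs mechanisms that support individual decision-making.

Use: Research on mechanism design to support individual decision-making
Difficulty: Hard
Cost: Medium

Tags: Quality prediction / anomaly detection, Reinforcement learning, Multi-agent, Text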

A Contextual-Bandit Oversight Game with Two-Sided Informational Asymmetry

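A study of oversight by decision-makers who assist AI. It studies how to realize oversight in which the decision-maker and the AI exchange information in order to evaluate and decide on actions proposed by the AI.

Use: Research on oversight by decision-makers who assist AI
Difficulty: Hard
Cost: Medium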

Learning Fair Allocation of Indivisible Items from Limited Feedback

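An algorithm for deciding how to allocate items in a way that respects individual values. The study develops such an allocation algorithm.

Use: An algorithm for allocating items in a way that respects individual values
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-27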

Reaching as Cheap as Possible in 1-clock Robust Weighted Timed Games

The value problem for 2-player games on graph generally consists in determining the minimal value Min can ensu

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-26

GTI-mSEMP Framework : A Proposed Framework to Simulate Malware Propagation with Inclusion of Attacker-Defender Strategy

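Malware infections can spread across an entire network. Existing models treat the defense strategy against threats as a static parameter, but in practice it depends on the competition between attack and defense policies. For this reason, using game theory

Tags: Sensor / time series, Reinforcement learning

Use: Simulating malware infection and proposing defense strategies
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-26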

Characterisation of reactive Nash equilibria in repeated additive games

In cooperative games, two players cooperate or compete

Use: Analyzing reactive strategies in cooperative games
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-25

Pick Two: An Adversarial Animal Survival Game

The "Pick Two" animal selection puzzle is a popular thought experiment in which two animal species must defend

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-25

Almost EFX in Hypergraphs

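This study proposes a method for distributing divisible goods based on individual valuations. The method aims for efficient distribution while taking individual valuations into account.

Use: Distribution of divisible goods
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-25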

Existence of Pure Strategy Nash Equilibria in Finite Noncooperative Games

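This study proposes conditions for pure-strategy equilibria in noncooperative games. The conditions assess the necessity of equilibrium while considering the outcomes of individual games.

Use: Pure-strategy equilibria in noncooperative games
Difficulty: Hard
Cost: Medium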

EvoFlock: evolved inverse design of multi-agent motion

Tuning multi-agent models helps achieve realistic simulations. In this study, a newly developed model makes such tuning possible.

Use: Tuning multi-agent models
Difficulty: Hard
Cost: Medium

Hotelling-Downs with Facility Synergy: The Mall Effect

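This project searches for the optimal placement of multi-station endpoints that serve multiple stations.

Use: Developing an optimal placement of multi-station endpoints for positioning
Difficulty: Hard
Cost: Medium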

Restoring Incentive Compatibility in Two-Stage Energy Markets with Prosumers

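Electricity market problems based on distributed control can represent situations in which supply exceeds demand, rather than situations in which supply and demand are balanced.

Tags: Reinforcement learning, Multi-agent, Generation

Use: Resolving imbalance problems in decentralized electricity supply
Difficulty: Hard
Cost: Medium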

How to program a never-losing chess engine

This article proposes a model, based on graph theory, to represent a variety of two-player games of perfect in

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Equilibrium and Infeasibility: A new solution concept for games

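This study proposes a new solution concept that accounts for the non-commonality of games, together with a polynomial-time solution that does not rely on constraints for a common solution.

Use: Research on handling infeasibility in cooperative games
Difficulty: Hard
Cost: Medium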

Strict Fairness at What Cost? Envy-Free Contracts with Subsidies

Joint contract design matters because multiple tasks are distributed among agents.

Use: Research on designing unbiased, fair contracts in joint contract design
Difficulty: Hard
Cost: Medium

Decidability and Undecidability Results for LIA-Definable Impartial Combinatorial Games

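This study presents quantitative results on the decidability and undecidability of nondeterministic games.

Use: Research on the decidability and undecidability of nondeterministic games defined by linear integer equations concerning boundedness
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-22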

Rationalizing collective revealed preferences with an application in fair resource allocation

This paper presents a revealed preference approach for rationalizing collective consumption behavior. We intro

Tags: Explainable, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-22

Flow Games with Public Arcs: the Least Core and the Nucleolus

We study flow games with public arcs, an extension of classical cooperative flow games that allows players to

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-21

A Note on Learnable Nash Equilibrium

A Nash equilibrium is learnable if there exists a myopic adjustment dynamic for which it is asymptotically sta

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-21

Fundamental market design as a layer of AI-agent alignment

This paper argues that AI-agent alignment in markets should not be understood only as a property of agents, bu

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-20

Physics-Informed Eikonal Caging for Whole-Arm Manipulation Planning

Planning contact-rich whole-arm manipulation is challenging because interactions that involve extended robot g

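Tags: Quality prediction / anomaly detection, Reinforcement learning, Policy gradient (PPO / A3C), Video

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-06-20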

Game-Theoretic Framework for Private Data Sharing in Vehicular Networks

We present a novel game-theoretic framework designed to enhance privacy and scalability in decentralized vehic

Tags: Sensor / time series, Reinforcement learning, Policy gradient (PPO / A3C)

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium