MLinfo | Machine Learning and AI Paper Roundup

OpenBB — Open Data Platform for analysts, quants and AI agents.

OpenBB provides a financial data platform for analysts, quants, and AI agents.

Machine learning, Supervised learning

Use case: Data for financial analysis
Difficulty: Easy
Cost: Medium

NLP, LLMs, Text, Audio, Multimodal

screenpipe — YC (S26) | Record your screen 24/7 and plug into your agents. Local, private, secure. Connect to OpenClaw, Hermes agent and 100+ apps

A tool that captures user activity on screen and uses it to build automated agents.

Use case: Building automated agents
Difficulty: Easy
Cost: High

NLP, LLMs, Text, Multimodal

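ai-agent-book — Main open-source repository for the book 《深入理解 AI Agent：设计原理与工程实践》 (Understanding AI Agents in Depth: Design Principles and Engineering Practice) by 李博杰 (Li Bojie): the full text, a compiled PDF, and companion code for each chapter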

The open-source repository for a book on the design principles and engineering practice of AI agents, containing the full text, a compiled PDF, and per-chapter code.

Use case: Learning AI agent design and engineering
Difficulty: Easy
Cost: High

ART — Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen3.6, GPT-OSS, Llama, and more!

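ART is a reinforcement learning trainer for multi-step agents. It uses GRPO to train agents on real-world tasks.

NLP, LLMs, Reinforcement learning

Use case: Reinforcement learning trainer for multi-step agents
Difficulty: Easy
Cost: High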

AReaL — The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.

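This repository provides a reinforcement learning framework for LLM-based agent applications, designed to be simple and flexible.

Use case: Reinforcement learning for LLM-based agents
Difficulty: Easy
Cost: High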

mlflow — The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.

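This repository provides MLflow, an open-source AI engineering platform for agents, LLMs, and ML models. It helps teams debug, evaluate, monitor, and optimize AI applications.

Quality prediction/Anomaly detection, NLP, LLMs

Use case: Debugging, evaluating, and monitoring AI applications
Difficulty: Easy
Cost: High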

zenml — ZenML 🙏: One AI Platform from Pipelines to Agents. https://zenml.io.

ZenML is a single AI platform that spans everything from pipelines to agents.

Use case: AI platform
Difficulty: Easy
Cost: High

haystack — Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.

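An open-source AI orchestration framework. It lets you design the pipelines and agent workflows needed to build LLM applications.

Deep learning, Transformer, Generation, Summarization, Text

Use case: Building LLM applications
Difficulty: Easy
Cost: High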

DocsGPT — Private AI platform for agents, assistants and enterprise search. Built-in Agent Builder, Deep research, Document analysis, Multi-model support, and API connectivity for agents.

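This repository provides a private AI platform for agents, assistants, and enterprise search, with a built-in agent builder, deep research, document analysis, and multi-model support.

Deep learning, Transformer, Text

Use case: Private AI platform for agents and enterprise search
Difficulty: Easy
Cost: Medium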

botpress — The open-source hub to build & deploy GPT/LLM Agents ⚡️

An open-source tool for building GPT/LLM agents.

Use case: Building GPT/LLM agents
Difficulty: Easy
Cost: High

Compact Latent Coordination for Autonomous Vehicles at Unsignalized Intersections

Coordinating autonomous vehicles at unsignalized intersections remains a critical challenge for multi-agent re

NLP, RAG, Reinforcement learning

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Low

The Dark Room in the Reward Channel: Dense Prediction Rewards Collapse GRPO-Trained LLM Agents -- and What Actually Works

Dense per-step supervision is an appealing remedy for sparse-reward, long-horizon LLM agents: reward the agent

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Approximate Quantum State Preparation Through Proximal Policy Optimization

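This study proposes a new approach that uses deep reinforcement learning to learn approximate quantum state preparation and to explore how best to manipulate quantum systems.

Use case: Quantum state preparation
Difficulty: Hard
Cost: Medium

Quality prediction/Anomaly detection, NLP, Fine-tuning, Reinforcement learning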

TOUR: A Trajectory-Level Unlearning Benchmark for Offline Reinforcement Learning

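This study proposes TOUR, a benchmark for evaluating data removal (unlearning) in offline reinforcement learning agents trained on fixed trajectories, with the aim of making offline reinforcement learning safer.

Use case: Data removal in offline reinforcement learning
Difficulty: Hard
Cost: High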

Multi-turn RL with Structural and Performance Aware Rewards for CUDA Kernel Generation

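A study proposing CudaPerf, a method that supports CUDA kernel generation so that high-performance CUDA kernels can be produced efficiently.

NLP, LLMs, Generation, Reinforcement learning

Use case: Supporting CUDA kernel generation
Difficulty: Hard
Cost: High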

OpenForgeRL: Train Harness-native Agents in Any Environment

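OpenForgeRL provides a framework for training harness-native agents. With it, agents can use complex traditional harnesses to work with external systems and solve multiple tasks at once...

Deep learning, Compression and quantization, Multimodal

Use case: Training harness-native agents
Difficulty: Hard
Cost: High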

GS-Agent: Creating 4D Physical Worlds With Generative Simulation

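GS-Agent generates physically plausible 4D worlds from natural language. To preserve physical correctness, it applies physical reasoning during generation.

MI-oriented, NLP, RAG, Generation, Image, Text

Use case: Generating 4D physical worlds
Difficulty: Hard
Cost: High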

Same Dangerous Objective, Opposite Advice: Direct Exposure versus Multi-Agent Mediation

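This study examines LLM safety. It reports that models were able to give safe advice when directly confronted with a dangerous objective.

Use case: Comparing direct exposure with multi-agent mediation
Difficulty: Hard
Cost: High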

Agentic Context Management: Solving Agent Memory and Cost by Treating Them as Lifecycle and Architecture Problems

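Agentic Context Management makes agent memory and cost manageable. The approach lets agents manage their own context while remaining controllable by the trainer.

NLP, RAG, Summarization, Text

Use case: Solving agent memory and cost problems
Difficulty: Hard
Cost: Low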

Toward Continuous Assurance for the Democratization of AI Agent Creation in Industry

Democratizing AI agent creation lets organizations build AI agents openly. The approach concerns the reliability of those agents...

Use case: Democratizing AI agents
Difficulty: Hard
Cost: Low

arXiv | GitHub available | 2026-07-23

Agentic coding without the cloud: evaluating open-weight large language models on longitudinal data preparation tasks

Large language models (LLMs) and agents are now widely used tools in code development, with data typically sen

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Quality prediction/Anomaly detection, Reinforcement learning, Multi-agent, Text

AREX: Towards a Recursively Self-Improving Agent for Deep Research

Deep research requires agents to find answers that jointly satisfy multiple constraints. Discovering such answ

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Agent-Guided Relational Concept Discovery: Toward Interpretable Surgical Margin Assessment

Deep learning models can effectively use Rapid Evaporative Ionization Mass Spectrometry (REIMS) data for surgi

Explainable, Deep learning, Transformer

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

PATS: Policy-Aware Training Scaffolding for Agentic Reinforcement Learning

In long-horizon LLM agent reinforcement learning, weak policies often repeat similar failures, producing uninf

NLP, LLMs, Text, Reinforcement learning

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Tabular-data-oriented, NLP, LLMs, Generation, Text

Euclid-MCP: A Model Context Protocol Server for Deterministic Logical Reasoning via Prolog

Large Language Models (LLMs) excel at natural language understanding and generation but remain unreliable for

Use case: Generation
Difficulty: Hard
Cost: High

Sensor/Time series, NLP, LLMs, Image, Text, 3D

VoLN: Vision-Only Long-Horizon Navigation---Paradigm, Benchmark, and Method

Vision-and-Language Navigation (VLN) enables embodied agents to follow natural-language instructions. However,

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Regulating autonomous and agentic AI

Regulating activities where regulatees use autonomous and agentic AI is challenging. Regulatory assumptions ab

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Toward cryptographically verifiable authorization for autonomous AI agents: A security hypothesis, preliminary formal model, and proof-of-concept implementation

Autonomous AI agents increasingly execute actions, invoke tools, and operate on protected resources with limit

MLOps, Model deployment, Text, Audio

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

GRADRAG: Cross-Component Prompt Adaptation for Coordinated Multi-Agent RAG

Retrieval-Augmented Generation (RAG) systems increasingly employ multiple LLM agents. Yet, most prior work opt

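NLP, LLMs, Generation, Text

Use case: Generation
Difficulty: Hard
Cost: High

Quality prediction/Anomaly detection, Deep learning, Compression and quantization, Generation, Reinforcement learning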

Expert Behavior Prior Reinforcement Learning

Behavior prior reinforcement learning (BPRL) has emerged as a promising paradigm to improve sample efficiency

Use case: Generation
Difficulty: Hard
Cost: High

arXiv | GitHub available | 2026-07-23

pAI-Econ-claude: A Gated Human-in-the-Loop Multi-Agent Architecture for AI-Assisted Economic Theory Development

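This study develops a system that uses large language models to support research in economics. The system lets researchers automate theoretical model development.

Use case: Research support system for economics
Difficulty: Hard
Cost: High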

ICAE-Bench: Evaluating Coding Agents as Interactive Project Builders

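This study evaluates LLM-based agents that automate code generation. It reports that using large language models improved agent performance.

Quality prediction/Anomaly detection, Machine learning, Supervised learning

Use case: Evaluating coding agents
Difficulty: Hard
Cost: Medium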

Explainable Belief Harmonization under Dynamic Epistemic Partitions

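This study uses large language models to develop a model that combines shared beliefs. Using large language models made it possible to infer those shared beliefs.

Explainable, NLP, RAG, Detection

Use case: A model for combining shared beliefs
Difficulty: Hard
Cost: Low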

Explainability Framework for Policy-Aware Autonomous Agents

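This study uses large language models to investigate the explainability of autonomous agents. Using large language models made it possible to infer agent behavior.

Use case: Explainability of autonomous agents
Difficulty: Hard
Cost: Low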

AttriMem: Attribution-Guided Process Feedback for Agent Memory Learning

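Agent memory learning concerns how effectively an LLM can retain, update, and process information. This study proposes optimizing agent memory with attribution-guided process feedback.

NLP, LLMs, QA

Use case: Agent memory learning
Difficulty: Hard
Cost: High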

HiMe: Real-Time Self-Hosted Personal Agent Platform for Health Insights with Wearable Devices

Traditional approaches to wearable health signal analysis, such as smartwatches, are constrained by rigid anal

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

EmoAgent-R1: Towards Multimodal Emotion Understanding with Reinforcement Learning-based Dynamic Agent Specialization

Multimodal large language models (MLLMs) have achieved impressive performance in multimodal emotion recognitio

NLP, LLMs, Classification, Text, Video

Use case: Classification
Difficulty: Hard
Cost: High

arXiv | GitHub available | 2026-07-23

Workflow-Localized Mechanism Learning: Attribution-Guided Repair and Knowledge Reuse for Structured Agent Skills

Agent Skills package reusable procedural knowledge as external artifacts for frozen language-model agents, yet

MI-oriented, Reinforcement learning, Policy gradient (PPO / A3C)

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

GuardianAgentBench: Where Agents Fail and How to Guard Them

GuardianAgentBench evaluates 580 scenarios across six domains, using three production frameworks: LangChain, LlamaIndex, and Vectara. This benchmark...

Use case: Ensuring the safety and reliability of machine learning agents
Difficulty: Hard
Cost: High

Delivery, Not Storage: Cue-Anchored Working Memory as a Harness Property for Coding Agents

Coding agents ship with one kind of memory: documents. Instruction files, plan artifacts, and auto-written mem

MI-oriented, NLP, RAG, Text

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Low

SciExplore: Evaluating Autonomous Agents from Scientific Navigation to Information Integration

Scientific research involves complex information-seeking and reasoning workflows across heterogeneous sources.

NLP, LLMs, Generation, QA, Text

Use case: Generation
Difficulty: Hard
Cost: High

Traceable Scholarship: Page Anchors and Ariadne's Thread for Humanistic Inquiry in the Age of Generative AI

Generative AI lets large language models produce scholarly-looking text within seconds, yet fluency does not e

NLP, LLMs, Generation, Text

Use case: Generation
Difficulty: Hard
Cost: High

Is Deep Research Reliable? Misleading Knowledge Induces False Conclusions

Deep Research agents extend LLM-based assistants into long-horizon workflows involving planning, retrieval, ev

Use case: Generation
Difficulty: Hard
Cost: High

Auditing Provenance Sensitivity in LLM Agent Action Selection

LLM agents choose tools and arguments from context that mixes user requests, tool outputs, retrieved records,

Deep learning, Transformer, Detection, Text

Use case: Detection
Difficulty: Hard
Cost: High

MemTools: A Unified Research Framework for Interoperable Agent Memory

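This paper builds MemTools, a framework for agent memory systems that aims to make such systems easier to develop. It makes each component of a memory system easier for developers to build and test, and design and...

NLP, RAG, Multimodal

Use case: Building a framework that supports agent memory
Difficulty: Hard
Cost: High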

Sample-Efficient Learning from Agent Experience

Real-world agent learning is often constrained by costly environment interactions, such as running time-consum

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Tencent WorkBuddy Bench: A Multi-Domain Coding-Agent Benchmark with Contamination-Resistant Task Construction

Introduces an evaluation benchmark for coding agents, with tasks built from real-world commits and pull requests.

NLP, LLMs, Image, Text

Use case: Evaluating coding agents
Difficulty: Hard
Cost: High

LegalCiteTrust: Benchmarking Citation Trustworthiness in Chinese Long-Form Legal Research Reports

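Proposes LegalCiteTrust to evaluate the trustworthiness of citations in Chinese long-form legal research reports and to detect and assess unreliable citations.

Use case: Improving the reliability of legal research reports
Difficulty: Hard
Cost: High

Quality prediction/Anomaly detection, Deep learning, Transformer, Generation, Image, Text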

Streaming Multi-Agent Autoregressive Diffusion Model with World State Registers

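For multi-agent simulation, the work assumes that a shared world state is maintained across agents and that this state is reflected in their observations.

Use case: Multi-agent simulation
Difficulty: Hard
Cost: High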

Causal-AgentIR: Self-Evolving Causal Memory for Adaptive Image Restoration Agents

Image restoration agents have recently emerged as a flexible paradigm for handling diverse and unpredictable d

Quality prediction/Anomaly detection, Generative AI, GAN, Image, Text

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Low

Show, Don't Tell: Evaluating Spatial Cognition in Generative Pixels Rather Than LLM Text

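Spatial understanding is essential for bridging the physical world and static semantic understanding. In many spatial tasks, locations, regions, and paths are most naturally expressed in the continuous visual scene, for example by pointing or marking, but current spatial reasoning...

NLP, LLMs, Generation, Image, Text

Use case: Spatial understanding
Difficulty: Hard
Cost: High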

Engine-Native Editable 3D World Reconstruction with Objects and Lighting

This paper proposes a method called Lumera. Lumera is used for engine-native 3D world reconstruction and for detecting lights.

NLP, LLMs, Detection, Generation, Image

Use case: 3D world reconstruction
Difficulty: Hard
Cost: High

Agentic Designer: Progressive Multi-Agent Collaboration for Structure-Aware Interior Layout Generation

Generating realistic interior furniture layouts that strictly adhere to architectural constraints (e.g., walls

Use case: Generation
Difficulty: Hard
Cost: High

Computer vision, Segmentation, QA, Image, Text

Beyond Episodic Evaluation: Memory Architectural Bottlenecks in Sequential Embodied Question Answering

Embodied question answering (EQA) is traditionally evaluated under an episodic formulation, where agents solve

Use case: QA
Difficulty: Hard
Cost: High

Discrete Truthful Heterogeneous Two-Facility Location: The Line and Beyond

We study deterministic strategyproof mechanisms for discrete heterogeneous two-facility location. In our model

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

qlib — Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, including supervised learning, market dynamics modeling, and RL, and is now equipped with https://github.com/microsoft/RD-Agent to automate R&D process.

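Qlib applies AI technology to build a quantitative investment platform.

Reinforcement learning, Policy gradient (PPO / A3C), Supervised

Use case: Quantitative investment platform
Difficulty: Easy
Cost: Medium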

AgentsMeetRL — Awesome List for Agentic RL

An awesome list of resources on agentic RL.

Use case: Agentic RL
Difficulty: Easy
Cost: High

ml-agents — The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.

A toolkit for training machine learning agents with Unity.

Computer vision, 3D/Point clouds, 3D, Reinforcement learning

Use case: Machine learning agents in Unity
Difficulty: Easy
Cost: High

giskard-oss — 🐢 Open-Source Evaluation & Testing library for LLM Agents

giskard-oss provides an evaluation and testing library for LLM agents.

Use case: Evaluation and testing library for LLM agents
Difficulty: Easy
Cost: High

LLMs Get Lost in Evolving User Intent

As LLMs become more capable, they are increasingly deployed as collaborative agents, taking on user-delegated

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Perspective Latents as an Architectural Condition for Causal Emergence in Active Inference Agents

A recent line of work measures causal emergence in reinforcement learning agents through Integrated Informatio

Computer vision, Video recognition, Reinforcement learning

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

End-to-End Learning of Safe Optimal Feedback Control in High Dimensions with Control Barrier Function Layers

We consider the problem of learning high-dimensional semi-global feedback controllers under hard safety constr

NLP, Embedding and retrieval

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Adaptive Multi-Horizon Reinforcement Learning

Effective decision-making in complex and changing environments requires balancing short-term and long-term con

Computer vision, Video recognition, Reinforcement learning

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Frontier Financial Judgement: Can agents tell what might move a stock?

We introduce Frontier Financial Judgement, a challenging new benchmark developed in collaboration with profess

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Algorithmic Approaches to Sequential Decision-Making and Social Epistemology

As humans, we face many decisions that require us to choose between sticking to something and giving up. This

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Sensor/Time series, NLP, Fine-tuning, Classification

Autonomous Collaborative Learning Among an Ensemble of Tsetlin Machines with Consensus-Based Inference

A study on collaborative learning that proposes an approach based on Autonomous Collaborative Learning.

Use case: Collaborative learning
Difficulty: Hard
Cost: Low

Bayesian uncertainty estimation improves clinical decision making in medical AI agents

Machine learning models for medical image analysis typically lack a reliable measure of confidence, limiting t

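Deep learning, Normalization and optimization methods, Classification, Detection, Image

Use case: Classification
Difficulty: Hard
Cost: High

Tabular-data-oriented, Easy to try on CPU, Computer vision, Segmentation, Detection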

Harnessing Disagreement: Detecting Correlated Agreement Blindness in Multi-Agent Triage

In this study, multi-agent triage...

Use case: Evaluating the safety of multi-agent triage
Difficulty: Hard
Cost: Medium

DocOps: A Verifiable Benchmark for Autonomous Agents in Complex Document Operations

As autonomous agents rapidly evolve, their ability to reliably manipulate ubiquitous digital documents has bec

Machine learning, Supervised learning, Text

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Know Your Agent: Reconnaissance-Driven Pentesting of AI Agents

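Advocates testing that targets AI agents: searching for latent, easily missed weaknesses and then carrying out attack strategies strengthened by what was found.

Use case: Countermeasures against the exploitation of weaknesses
Difficulty: Hard
Cost: Low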

Dreamer-CPC: Message Learning with World Models for Decentralized Multi-agent Reinforcement Learning

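Proposes a method for decentralized multi-agent reinforcement learning. Individual agents exchange messages based on their local observations and learn messages that take long-term experience into account, so that decentralized...

Reinforcement learning, Policy gradient (PPO / A3C), Embedding

Use case: Decentralized multi-agent reinforcement learning
Difficulty: Hard
Cost: Low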

TriAgent: Divergence-Aware Multi-Agent Committees for Cost-Efficient Financial Sentiment Analysis

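Proposes a method for financial sentiment analysis with generative language models. It uses a committee of multiple agents and, to handle text at different granularities, a word-level rule-based approach, a phrase- and clause-level...

Deep learning, Transformer, Detection, Text

Use case: Sentiment analysis in finance
Difficulty: Hard
Cost: High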

The World Model Remembers, the Actor Forgets: Dream Rehearsal for Continual Model-Based RL

Model-based reinforcement-learning agents of the DreamerV3 family forget catastrophically when trained on task

Reinforcement learning, Model-based

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

HARP: The Human-AI Research Platform

Large language models (LLMs) have shifted human-computer interaction from "traditional" interface journeys t

MI-oriented, NLP, LLMs, Text

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

ArbiGraph: Arbitrarily Scalable Verifiable Task Graphs for Evaluating Context Management

We introduce ARBIGRAPH, a benchmark generator for evaluating whether tool-assisted language agents can retain,

MLOps, Model deployment, Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: Medium

IssueTrojanBench: Benchmarking AI Coding Agents Against Malicious Issue Requests

AI coding agents powered by LLMs are increasingly integrated into real-world software development, where they

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

NVIDIA-labs OO Agents: Native Python Object-Oriented Agents

Traditional agent development is split across prompt templates, tool schemas, callback code, and workflow grap

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

From Agent Failures to Text Policies: What Works and What Breaks

TextGrad improves language-model systems by revising text from feedback. Its core thesis is that natural-langu

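Sensor/Time series, Machine learning, Time series, Text

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Quality prediction/Anomaly detection, Image inspection, Deep learning, Compression and quantization, Generation, Image, Text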

Demonstrating GenDB: Instance-Optimized and Customized Query Processing Code Generation via LLM Agents

Traditional query processing engines require continuous development and extensions to support new techniques a

Use case: Generation
Difficulty: Hard
Cost: High

Tabular-data-oriented, NLP, LLMs, Generation, Text

PoTRE: Test-Time Reasoning inspired by Cognitive Heterogeneity

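To address model brittleness, introduces PoTRE, a heterogeneous framework split across four agents. It strengthens the model's reasoning so that it holds up against complex theoretical constraints and abstraction better than a single-stream approach...

Use case: Solving tasks that demand complex reasoning
Difficulty: Hard
Cost: High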

The Ethics of Autonomous AI Agents for Offensive Security

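Unlike conventional penetration testing tools, which are deterministic, narrowly scoped, and operated by specialists, LLM-driven autonomous security tools show uncertainty along three dimensions: decisions that are hard to explain, open-ended impact, and...

Use case: Ethical considerations for autonomous security tools
Difficulty: Hard
Cost: High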

Small, Free, and Effective: Orchestrating Open-Weight Small Language Models to Outperform Single LLM for Malware Analysis

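Malware analysis often calls for rapid interpretation of analysis reports, yet closed-weight large language models frequently go unused for it. Open-weight language models offer language ability adequate for malware analysis, and closed-weight large language...

Use case: Small language models for malware analysis
Difficulty: Hard
Cost: High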

PRO-LONG: Programmatic Memory Enables Long-Horizon Reasoning

Long-horizon tasks require sustained perception, reasoning, and exploration, and are a persistent challenge fo

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

EvoDRC: A Self-Evolving Agentic Framework for Automated DRC Violation Repair

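Develops EvoDRC, an automated repair framework that speeds up Design Rule Check (DRC) closure and performs repairs that account for complex geometric interactions.

Use case: Automating design rule violation repair
Difficulty: Hard
Cost: High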

Drift-Aware RL-based Wavelet Denoising for Network-Traffic Anomaly Detection

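Implements a wavelet denoising algorithm for network traffic data that accounts for both noise and drift, and points out that static wavelet denoising loses effectiveness in scenarios with drift.

Quality prediction/Anomaly detection, NLP, RAG, Detection, Anomaly detection

Use case: Improving the accuracy of network traffic anomaly detection
Difficulty: Hard
Cost: Low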

Safe Remediation as Risk-Constrained Intervention Decision in Microservice Systems

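Formulates safe remediation in IT operations as a risk-constrained intervention decision problem and, to make remediation reliably safe, a Constrained Markov Decision Proces

Explainable, Machine learning, Supervised learning, Text

Use case: Ensuring safety in IT operations
Difficulty: Hard
Cost: Medium

Quality prediction/Anomaly detection, Deep learning, Compression and quantization, Generation, Image, Text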

Learning to Detect UI Principle Violations via Reinforcement Learning

Small language models and coding agents increasingly generate web front-end code, yet their outputs are typica

Use case: Generation
Difficulty: Hard
Cost: High

OpenSkillRisk: Benchmarking Agent Safety When Using Real-World Risky Third-Party Skills

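Evaluates the ability of large language model agents to recognize and avoid real-world risks posed by third-party skills.

Use case: Assessing the risk of unsafe behavior when using third-party skills
Difficulty: Hard
Cost: High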

Solar Open 2 Technical Report

We present Solar Open 2, a 250B-A15B Mixture-of-Experts language model built for long-horizon agentic tasks, s

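Quality prediction/Anomaly detection, Deep learning, Compression and quantization, Text

Use case: Language model suited to long-horizon agentic tasks
Difficulty: Hard
Cost: High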

JANUS: Foreseeing Latent Risk for Long-Horizon Agent Safety

Agent safety is moving from content moderation toward preventing operational failures before tool-using agents

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Low

Beyond Relevance-Centric Retrieval: Rubric-Oriented Document Set Selection and Ranking

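3D occupancy prediction requires visual methods for interpreting the placement and density of objects. Earlier methods were too computationally expensive, whereas the newly proposed GaussianSeed algorithm organizes its layers hierarchically so that the computational co...

Use case: Predicting the placement and density of objects in 3D space
Difficulty: Hard
Cost: High

Sensor/Time series, Deep learning, Compression and quantization, QA, Image, Text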

Multimodal Large Language Models for Remote Sensing Image Understanding: Domain-Specific or General-Purpose?

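Multimodal large language models for image understanding are powerful, but their capabilities and limits are still not clearly understood. This paper analyzes how capable and how limited multimodal large language models are at image understanding, and...

Use case: Capabilities and limits of multimodal large language models in image understanding
Difficulty: Hard
Cost: High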

Unified Prediction and Planning via Conflict-Aware Disjoint Parameter Training

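This study proposes a unified framework for behavior prediction and safe motion planning, aimed at automating both the prediction of moving humans' behavior and safe motion planning in robot navigation.

Use case: Automating human motion prediction and safe motion planning
Difficulty: Hard
Cost: High

MI-oriented, Quality prediction/Anomaly detection, NLP, LLMs, Generation, Image, Text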

ETPDesigner: Multi-Agent Orchestration for Interactive Multimodal Electronic Theater Program

ETPDesigner proposes a framework that automates the design of multimodal electronic theater programs.

Use case: Generation
Difficulty: Hard
Cost: High

NLP, Fine-tuning, Image, Video, Multimodal

EA-Nav: Learning Safe Visual Navigation Policies with Embodiment Awareness

Cross-embodiment navigation is a key challenge in embodied intelligence. Due to differences in embodiment, the

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Quality prediction/Anomaly detection, Computer vision, Multimodal, QA, Image

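Silent Failures in Multimodal Agentic Search: A Diagnostic Taxonomy and Cross-Judge Evaluation

This study proposes a new method for evaluating responses to visual questions. It assesses not only the accuracy of the answers but also their patterns and characteristics.

Use case: Evaluating responses to visual questions
Difficulty: Hard
Cost: High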

SafeGen: Goal-Conditioned Video Diffusion of Safety-Critical Scenarios for VLM-Based Autonomous Driving

VLMs are increasingly deployed in AD systems, creating an urgent need for rigorous safety evaluation under rar

NLP, RAG, Generation, Image, Text

Use case: Generation
Difficulty: Hard
Cost: High

NavVerse: Benchmarking Indoor-to-Outdoor Embodied Navigation in Continuous Robot Simulation

Robots deployed in delivery, campus, and emergency-response settings often need to navigate from buildings to

NLP, Prompt engineering, Text

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Distributed Motion Planning with Safety Guarantees for Self-Reconfiguring Robotic Boats

Aquatic self-reconfigurable robots must assemble into desired shapes while ensuring safe interactions among mu

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Low

NLP, Prompt engineering, Detection, Image, Text

ReferTrack: Referring Then Tracking for Embodied Visual Tracking

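ReferTrack is a system that uses natural language to make a vehicle follow a nearby target vehicle. It first recognizes the vehicle near the target and then predicts its motion.

Use case: A system for having a vehicle follow a target vehicle
Difficulty: Hard
Cost: High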

Defer to Plan: Adaptive Multi-Agent Fusion for End-to-End V2X Driving

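Defer to Plan is a system that lets vehicles exchange information in order to drive safely. Vehicles share information with one another, and each vehicle selects a route on which it can drive safely.

Use case: A system that lets vehicles exchange information to drive safely
Difficulty: Hard
Cost: Medium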

Remote ID Spoofing-Aware Trajectory Planning for Small Unmanned Aerial Systems

This work presents a decentralized, spoofing-aware trajectory planning framework for small unmanned aerial sys

MLOps, Model deployment

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Improved Lower Bounds and Output Augmentation for Facility Location Mechanisms

We study the strategic facility location problem under the egalitarian objective, where a mechanism uses the r

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Identity-Truthful Online Decision-Making

In Bayesian online selection, a decision-maker observes a sequence of stochastic rewards and must immediately

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

huggingface | Hugging Face available | 2026-07-22

NexForge: Scaling Agent Capabilities through Requirement-Driven Task Synthesis for LLMs

Scaling executable agent training data for LLM post-training is bottlenecked by substrate-bound methods that t

Use case: Generation
Difficulty: Easy
Cost: High

github | GitHub available | 2026-07-22

atomic-agents — Building AI agents, atomically

A library for assembling AI agents.

Use case: Building AI agents
Difficulty: Easy
Cost: High

github | GitHub available | 2026-07-22

Finance-LLMs — Comprehensive Compilation of Real-World LLM & AI Agent Use Cases in Financial Services

A compilation of real-world LLM and AI agent use cases in financial services.

Use case: Surveying LLM and AI agent use cases in financial services
Difficulty: Easy
Cost: High

Fundamental limits of distributed multiclass classification from simple binary decisions

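Proposes an algorithm that accounts for the separation between distributed clusters in order to optimize classification performance.

Machine learning, Supervised learning, Classification

Use case: Optimizing performance on classification tasks
Difficulty: Hard
Cost: Low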

Twin Agent: Context Residual Compression for Privilege Separated Agents

Large language model (LLM) agents are vulnerable to security risks, such as prompt injection attacks from untr

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

arXiv | GitHub available | 2026-07-21

Knowledge-Centric Self-Improvement

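Studies knowledge-centric self-improvement and proposes a way to make self-improvement more effective by centering it on knowledge.

Use case: Knowledge-centric self-improvement
Difficulty: Hard
Cost: High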

Agents in the Wild: Where Research Meets Deployment

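Agents that draw on large language models (LLMs) and context are being used in product development and finance. Putting agents into practice requires ensuring robustness, safety, and reliability. In this tutorial, agents...

Use case: Agents in practice
Difficulty: Hard
Cost: High

Quality prediction/Anomaly detection, NLP, Embedding and retrieval, Classification, Generation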

Supra Cognitive Modes: A Routed Architecture for Agent Memory

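This study develops Supra Cognitive Modes on the premise that agent memory workloads combine direct fact lookup, reasoning over relational chains and current state, and synthesis across long histories. In this architecture...

Use case: Memory architecture design
Difficulty: Hard
Cost: Low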

MedDDC-Eval: Diagnosis-Decoupled Evaluation of Multi-Turn Medical Consultation Agents

Multi-turn medical consultation agents must decide what to ask, adapt to patient responses, and determine when

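Explainable, Quality prediction/Anomaly detection, NLP, RAG, Generation

Use case: Generation
Difficulty: Hard
Cost: Low

Explainable, Quality prediction/Anomaly detection, Deep learning, Compression and quantization, Text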

Verifiable Self-Evolution for Open-Ended Dialogue Skills via Future-Feedback Prediction

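This study develops self-evolving dialogue skills to strengthen a frozen language model. In this setting, user responses are not affected by the model's evolution, so the symmetry of the dialogue must be maintained.

Use case: Self-evolving dialogue skills
Difficulty: Hard
Cost: Medium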

AI Tour Meeting: Group Travel Planning by LLM Agents

This paper proposes AI Tour Meeting, a group travel planning framework powered by multiple Large Language Mode

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

RF-Agent: A Practical Framework for Building Language Agents for RFIC Design

Large language models (LLMs) have driven rapid progress in electronic design automation (EDA), yet their appli

Deep learning, Compression and quantization, Generation, Text

Use case: Generation
Difficulty: Hard
Cost: High

AgentDebugX: An Open-Source Toolkit for Failure Observability, Attribution, and Recovery in LLM Agents

LLM agent failures are difficult to debug because the step where an error surfaces is often not the one that c

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

arXiv | GitHub available | 2026-07-21

AutoIndex: Learning Representation Programs for Retrieval

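Proposes a framework that learns representation programs for retrieval, and uses those programs to build a search system that attaches labels to documents.

Quality prediction/Anomaly detection, NLP, RAG, Text

Use case: Learning programs for retrieval
Difficulty: Easy
Cost: Low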

PathAgentBench: Benchmarking Evidence-Seeking Vision-Language Models on Whole-Slide Pathology Image

Whole-slide image (WSI) diagnosis requires identifying diagnostically relevant regions, examining them across

NLP, Fine-tuning, Detection, Generation, Image

Use case: Detection
Difficulty: Hard
Cost: High

Quality prediction/Anomaly detection, Deep learning, Transformer, Segmentation, Image, Text

IGGT4D: Streaming 4D Instance-Grounded Geometry Transformer

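Real-world spatial intelligence requires understanding continuously streaming video of a space. To address this, the paper proposes a model capable of understanding 4D space.

Use case: Understanding continuously streaming video of a space
Difficulty: Hard
Cost: High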

Stochastic Multi-Objective Kinodynamic Planning Against Adversaries

This study addresses kinodynamic planning in complex environments.

Use case: Kinodynamic planning
Difficulty: Hard
Cost: Low

Computing on the Fly: Navigating a Vision for the Future of Drone Computing

The report envisions a decade in which drones move goods, medical supplies, and information at a scale compara

Reinforcement learning, Detection, Generation

Use case: Detection
Difficulty: Hard
Cost: High

Agentic Real2Sim: Physics-based World Modeling with Vision-Language Agents

Real-to-sim conversion for robotic interaction with objects remains labor-intensive because it requires more t

Computer vision, Multimodal, Image, Text

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Beyond Transformers: Linear Attention Policy for Open-Vocabulary Object Goal Navigation

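In open-vocabulary object goal navigation, the agent receives only partial observations. Updating its internal state is important for better performance, and achieving this requires updating the policy network. In recent approaches, Transfor...

Deep learning, Transformer, Text, 3D

Use case: Solving the open-vocabulary navigation problem
Difficulty: Hard
Cost: High

Tabular-data-oriented, Deep learning, Compression and quantization, Text, 3D, Reinforcement learning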

Intelligent Multi-UAV Navigation in ITNTNs: A Hierarchical LLM Approach

The deployment of high-speed Uncrewed Aerial Vehicles (UAVs) in 3D aerial highways necessitates robust coordin

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Stability in Combinatorial Markets with Side Payments

Combinatorial markets provide a general framework for trading bundles of indivisible goods. Building on the co

Machine learning, Supervised learning, Classification

Use case: Classification
Difficulty: Hard
Cost: Low

huggingface | Hugging Face available | 2026-07-21

FinanceComplexQA: Benchmarking Agentic Reasoning on Industrial-grade Financial Documents

Agentic Reasoning has become a transformative force in financial analysis due to its ability to integrate larg

Quality prediction/Anomaly detection, NLP, RAG, Generation, Summarization, Text

Use case: Generation
Difficulty: Easy
Cost: Low

huggingface | Hugging Face available | 2026-07-21

ABot-World-0: Infinite Interactive World Rollout on a Single Desktop GPU

We present ABot-World-0, an action-conditioned video world model for real-time, long-horizon closed-loop inter

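Quality prediction/Anomaly detection, Deep learning, Compression and quantization, Text, Video

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

github | GitHub available | 2026-07-21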

agent-starter-pack — Ship AI Agents to Google Cloud in minutes, not months. Production-ready templates with built-in CI/CD, evaluation, and observability.

Deploys AI agents to Google Cloud, with production-ready templates that come with CI/CD, evaluation, and observability built in.

Use case: Deploying AI agents to Google Cloud
Difficulty: Easy
Cost: High

github | GitHub available | 2026-07-21

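BettaFish — 微舆: a multi-agent public opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the true picture of public opinion, predicts future trends, and supports decision-making. Built from scratch, with no dependency on any framework.

微舆 is a multi-agent public opinion analysis assistant that anyone can use. It breaks through information cocoons to restore the true picture of public opinion, predicts where things are heading, and supports decision-making.

Use case: Public opinion analysis
Difficulty: Easy
Cost: High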

The Story Shapes the Agent: Narrative Priors in LLM Behavior

Persona prompting is widely used to steer LLM agent behavior, yet the narrative framing of a task can matter m

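Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Quality prediction/Anomaly detection, Deep learning, Transformer, Classification, Text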

SWE-Pruner Pro: The Coder LLM Already Knows What to Prune

Pruning long context for coding agents has been a vital technology for efficient context management. While exi

Use case: Classification
Difficulty: Hard
Cost: High

Operational Hallucination and Safety Drift in AI Agents

Large language models (LLMs) serving as planners in tool-using autonomous agents introduce dynamic reliability

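Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Explainable, Quality prediction/Anomaly detection, NLP, Fine-tuning, Detection, Anomaly detection, Text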

O-VAD: Industrial Video Anomaly Detection through Object-Centric Tracking and Reasoning

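Proposes a machine learning model designed to detect anomalies in factories. Conventional approaches consider everything in the video, which makes complex cases hard to handle. The proposed approach detects objects and...

Use case: Anomaly detection in industrial video
Difficulty: Hard
Cost: High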

FinSAgent: Corpus-Aligned Multi-Agent RAG Framework for Evidence-Grounded SEC Filing Question Answering

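Financial question answering requires retrieving evidence scattered across long, standardized, highly redundant Securities and Exchange Commission (SEC) filings. For many existing retrieval-augmented and multi-agent systems, the model's priors and the target filing

Deep learning | Model compression/quantization | Generation | QA | Text

Use case: Financial question answering
Difficulty: Hard
Cost: Low

Quality prediction/anomaly detection | Computer vision | Multimodal | Image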

MAGE: Human-Like Macro Placement via Agentic Multimodal Reasoning

Macro placement still requires substantial manual refinement in industrial physical design flows. We present M

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Beyond Fixed Goal Delivery: Online POMDP Planning for Target Interception in Crowds

Target interception in crowded environments requires reaching a moving objective while navigating among multip

Machine learning | Feature engineering

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Towards Torque-Driven Reinforcement Learning for Quadruped Locomotion

Reinforcement learning (RL) for legged robots is advancing locomotion, demonstrating its ability to adapt to n

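Sensor/time series | Deep learning | Model compression/quantization | Reinforcement learning

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium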

arxiv | GitHub available | 2026-07-20

UniETP: Unifying Environments for Generalizable Embodied Task Planning

This paper focuses on the problem of Embodied Task Planning, where an agent is required to execute a sequence

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

RoboHarness: Memory-Driven Orchestration of Heterogeneous Robot Policies for Long-Horizon Planning

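Proposes RoboHarness, a memory-driven orchestration method that addresses the limitations of existing robot control methods and enables long-horizon planning.

Natural language processing | Prompt engineering | Anomaly detection

Use case: Long-horizon planning for robots
Difficulty: Hard
Cost: High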

RT-SHCUA: Real-Time Self-Hosted Computer-Use Agent for UAV Control

Natural-language control offers a promising interface for unmanned aerial vehicles (UAVs), but directly applyi

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Value-Aware Prediction for Robust Multi-Agent Coordination Under Communication Loss

Robust multi-agent coordination relies heavily on inter-agent communication, which is frequently disrupted by

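Deep learning | Normalization/optimization methods | Text | Reinforcement learning

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Low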

Generalize and Guide: Decomposing Rewards for Few-Shot Inverse Reinforcement Learning

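Inverse reinforcement learning that provides explainability across multiple tasks contributes to solving complex tasks. In this work, a new approach to inverse reinforcement learning for cross-task explainability

Suited to small data | Natural language processing | RAG | Reinforcement learning

Use case: New approach to inverse reinforcement learning for cross-task explainability
Difficulty: Hard
Cost: Low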

Lifelong Multi-Subsystem Pickup and Delivery with Buffer-Limited Handover Stations

Load management is a major problem in Pickup and Delivery systems. In this work, a new approach that accounts for offload management in Pickup and Delivery systems, Handover-Awa

Use case: Offload management in Pickup and Delivery systems
Difficulty: Hard
Cost: Medium

GeoWorldAD: Geometry World Action Model for Autonomous Driving

Autonomous driving requires both safe and efficient planning decisions in dynamic 3D environments. Although re

Deep learning | Transformer | Image | Video | 3D

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Nonexistence of Simultaneously EF1 and Pareto Optimal Allocations for Submodular Valuations

The existence of allocations of indivisible goods that are simultaneously fair (envy-free up to one item (EF1)

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Low

1-out-of-5 Maximin-Share Allocations Always Exist for Four Agents

For four agents with nonnegative additive valuations, a complete 1-out-of-5 maximin-share allocation always ex

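Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Suited to MI | Sensor/time series | Reinforcement learning | Multi-agent | Anomaly detection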

Compositional Semantic Communication for Physical AI: Category Theory Meets Game Theory

Physical artificial intelligence (AI) systems involve distributed sensing agents with embedded AI models that

Use case: Anomaly detection
Difficulty: Hard
Cost: Medium

The Shared Discovery Paradox: How a One-Answer Rule Turns Better Information into Worse Search

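Within organizations, information is shared and colleagues often act on it. The researchers note that such sharing can make search more accurate than it was before sharing. However, through such sharing, search

Use case: Search problems
Difficulty: Hard
Cost: Low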

Beyond Stability: Improved Efficiency Guarantees for $α$-Stable Matchings

Stable matching mechanisms are fundamental to market design but face an inherent tension between stability and

Machine learning | Feature engineering

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

When One Good Is Not Enough: EF1 and Pareto Optimality Are Not Compatible for Submodular Valuations

One of the central questions in discrete fair division is whether fairness and efficiency can be achieved simu

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Low

huggingface | GitHub available | Hugging Face available | 2026-07-20

Differentiable Logic Gate Networks for Low-Latency EEG Classification on Edge Devices

Real-time EEG classification on edge devices is bottlenecked by the floating-point arithmetic of conventional

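Easy to try on CPU | Reinforcement learning | Multi-agent | Classification | Detection

Use case: Classification
Difficulty: Easy
Cost: Low

Explainable | Quality prediction/anomaly detection | Natural language processing | Large language models | Video | Multimodal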

EduPanel: A Three-Agent LLM Judge for Teaching Videos -- Reliability, Complementarity, and Human Trust Calibration

Teaching videos are becoming a major medium for education, creating a growing need for scalable evaluation of

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

FlashRT: Agent Harness for Guiding Agents to Deploy Real-Time Multimodal Applications

Real-time multimodal applications, including voice agents and interactive video generation, compose heterogene

Deep learning | Model compression/quantization | Generation | Text | Audio

Use case: Generation
Difficulty: Easy
Cost: High

huggingface | GitHub available | Hugging Face available | 2026-07-20

WorldCupArena: Fine-Grained Evaluation of Language Models and Deep-Research Agents on Football Forecasting

Predicting a football match before kickoff requires more than knowing past results: a model must use changing

Computer vision | Segmentation | Forecasting | Text

Use case: Forecasting
Difficulty: Easy
Cost: Low

Self-State Attacks on Self-Hosted AI Agents: How Far Can OS Defenses Go?

Self-hosted AI agents read and write their own memory and configuration files to function. An agent may get co

Deep learning | Transformer | Detection

Use case: Detection
Difficulty: Easy
Cost: Medium

Coercion and Deception in AI-to-AI Management: An Agentic Benchmark of Unprompted Escalation

Multi-agent systems routinely place one AI agent in authority over another. When a subordinate refuses a task,

Natural language processing | Large language models | Classification | Text

Use case: Classification
Difficulty: Easy
Cost: High

github | GitHub available | 2026-07-20

Gymnasium — A standard API for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)

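Gymnasium is an API that provides simulated environments for single-agent RL.

Use case: Providing simulated environments
Difficulty: Easy
Cost: Medium

Suited to tabular data | Deep learning | Transformer | Tabular | Reinforcement learning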

Non-Asymptotic Best Policy Identification Guarantees in Online Reinforcement Learning

In this work we study the Best Policy Identification (BPI) problem in online, tabular Reinforcement Learning.

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Low

Retriever: Composing Closed-Loop Asynchronous Robot Programs

Building long-horizon robot agents requires composing closed-loop pipelines -- perception, belief update, plan

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Temporal Fair Division of Indivisible Goods with Structured Constraints

This paper investigates temporal fair division, a setting where items are allocated over multiple rounds and a

Computer vision | Video recognition | Text

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Equilibrium analysis of three-player General Lotto game with leader-follower framework

In this paper, we introduce the General Lotto game with a regulator (R-Lotto), a leader-follower extension of

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

huggingface | Hugging Face available | 2026-07-19

TimeLens2: Generalist Video Temporal Grounding with Multimodal LLMs

Video multimodal large language models (MLLMs) can describe what happens in a video, but rarely identify when

Natural language processing | Large language models | Detection | Text | Video

Use case: Detection
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-19

EvolvingWorld: An Open-Schema Framework for Co-Evolving Role-Play Agents and World Model in Interactive Literary World

This paper introduces EvolvingWorld, a framework and benchmark for character and world co-evolution in interac

Use case: Generation
Difficulty: Easy
Cost: High

ADMM-Based Safety-Critical Distributed NMPC for Cooperative Transportation by Quadrupedal Robots

This paper presents a safety-critical distributed nonlinear model predictive control (DNMPC) framework for coo

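Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Low

Explainable | Deep learning | Model compression/quantization | Generation | Text | Multimodal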

G2-Nav: Grounded and Guarded Vision-Language Costmaps for Robot Social Navigation

Social navigation requires the robot to reason and respond in complex real-world environments. While recent wo

Use case: Generation
Difficulty: Hard
Cost: High

A BIM-enabled, Agent-based Discrete-event Simulation Platform for Robotic Studies: A Method based on Graph Theory

Indoor robots are increasingly employed for facility management tasks such as cleaning and inspection. These a

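Quality prediction/anomaly detection | Deep learning | Model compression/quantization | Detection

Use case: Detection
Difficulty: Hard
Cost: Medium

Sensor/time series | Computer vision | Segmentation | Classification | Detection | 3D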

InLiER: Learning-Free Heterogeneous LiDAR Place Recognition via Intermediate Mixed-Radix Structural Keypoint Tokenization

LiDAR place recognition supports loop closure, relocalization, and multi-agent map management. As robotic plat

Use case: Classification
Difficulty: Hard
Cost: High

PhyAgentOS: A Self-Evolving Operating System for Embodied Agents with Decoupled Cognitive Planning and Physical Execution

Vision-language-action models, world models, and agentic planners each advance physical intelligence, yet thei

Suited to MI | Computer vision | Multimodal

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

AI-Augmented Model Predictive Control for Safe and Adaptive Rendezvous and Proximity Operations

Autonomous rendezvous and proximity operations (RPO) in adversarial orbital environments require guidance arch

Explainable | Deep learning | Model compression/quantization

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

SAGE: A Socially-Aware Generative Engine for Heterogeneous Multi-Agent Navigation

Safe and socially compliant navigation in open human-robot environments requires robots to reason about hetero

Deep learning | Transformer | Generation | Text

Use case: Generation
Difficulty: Hard
Cost: High

huggingface | Hugging Face available | 2026-07-18

DataFlow-Harness: A Grounded Code-Agent Platform for Constructing Editable LLM Data Pipelines

Large language models (LLMs) are increasingly used to automate data-processing workflows, yet coding agents ty

Natural language processing | Large language models | Generation | Image | Text

Use case: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-18

Environment-free Synthetic Data Generation for API-Calling Agents

Training API-calling large language model (LLM) agents demands massive amounts of high-quality trajectories. H

Use case: Generation
Difficulty: Easy
Cost: High

From Optimal Policies to Individual Differences: Rethinking Reinforcement Learning for Biology

Reinforcement learning (RL) is primarily known as a computational method for optimizing control tasks, but it

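Natural language processing | RAG | Reinforcement learning

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Low

Suited to small data | Condition optimization | Natural language processing | RAG | Detection | Image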

Embodied Active Learning under Limited Annotation and Navigation Budget for Object Detection

Proposes an object detection framework that accounts for constraints on the robot's navigation time and annotation time.

Use case: Adapting object detection
Difficulty: Hard
Cost: Low

Network-Induced Strategic Communication in Opinion Dynamics

Classical opinion dynamics typically assume a fixed mapping from private opinions to public signals, such as l

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Low

Fair Allocation of Divisible Goods under Non-Linear Valuations

We study the problem of dividing homogeneous divisible goods among agents with non-linear valuations. Specific

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

SeerGuard: A Safety Framework for Mobile GUI Agents via World Model Prediction

Mobile graphical user interface (GUI) agents have demonstrated remarkable capabilities in automating complex t

Reinforcement learning | Model-based

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: Medium

Nonuniformity Principle in Human-AI Coworking

As generative AI is increasingly applied to automate multi-step and high-stake workflows, human judgment and i

Quality prediction/anomaly detection | Machine learning | Supervised learning | Generation

Use case: Generation
Difficulty: Easy
Cost: Medium

RecGPT-V3 Technical Report

Large language models (LLMs) are transforming recommender systems from matching co-occurrence patterns in hist

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

Recursive Harness Self-Improvement

Under model--harness co-evolution, harnesses are not merely inference-time scaffolds but data-generating compo

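Quality prediction/anomaly detection | Deep learning | Model compression/quantization | Text

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High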

When Does Muon Help Agentic Reinforcement Learning?

Muon is competitive with AdamW in large-scale pre-training, but its value for reinforcement-learning (RL) post

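Deep learning | Normalization/optimization methods | Reinforcement learning

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High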

DSWorld: A Data Science World Model for Efficient Autonomous Agents

Despite strong capabilities in data understanding and decision-making, autonomous data science agents still he

Deep learning | Model compression/quantization | Reinforcement learning

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

Beyond Success Rate: Cost-Aware Evaluation of Offensive and Defensive Security Agents

Security-agent evaluations commonly measure peak offensive capability under generous inference budgets, emphas

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: Medium

arxiv | Paper only | 2026-07-16

Compensation Design

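Proposes a new field called compensation design to achieve reliable allocation of computing resources.

Quality prediction/anomaly detection | Deep learning | Model compression/quantization

Use case: Reliable allocation of computing resources
Difficulty: Hard
Cost: Medium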

huggingface | Hugging Face available | 2026-07-16

Multi-Turn On-Policy Distillation with Prefix Replay

We study on-policy distillation (OPD) for agentic tasks, where an LLM agent interacts with an environment over

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-16

RESOURCE2SKILL: Distilling Executable Agent Skills from Human-Created Multimodal Resources

Skills are a useful abstraction for software agents, turning human and agent experience into reusable procedur

Natural language processing | RAG | Image | Text | Video

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

github | GitHub available | 2026-07-16

agent-lightning — The absolute trainer to light up AI agents.

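An authoring tool for efficiently training optimal AI models. With Agent Lightning, you can set up a trainer and train a model on your data.

Use case: Easily setting up a trainer for AI agents
Difficulty: Easy
Cost: High

Suited to small data | Reinforcement learning | Multi-agent | Regression | Text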

Operator-Informed Gaussian Processes for Complex Helmholtz Wavefields: From Synthetic Benchmarks to In Vivo Brain Elastography

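The Helmholtz equation is a key equation describing the propagation of time-harmonic waves, and it has complex coefficients when the medium is lossy. Here, to infer the wave equation from spatial wavefields, a physics-informed Gaussian Process (GP

Use case: Physics-informed Gaussian Process for complex Helmholtz wavefields
Difficulty: Hard
Cost: Medium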

Analogical Deep Research: Retrieving and Integrating Historical Analogies for Foresight Analysis

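Proposes a new task, Analogical Deep Research, for inferring and evaluating historical analogies in foresight analysis, and that historical analogies in foresight analysis play an important

Natural language processing | Large language models | Generation | Text

Use case: Historical analogies in foresight analysis
Difficulty: Hard
Cost: High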

When Is Delegated Play Truthful? Within-Range Regret and the Trilemma of Aligned Delegation

Advertisers delegate bidding to autobidders; users delegate tasks to language-model agents. A person describes

Computer vision | Segmentation | Text

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

The Dynamic Verifiable Multi-Agent Human Agentic Loyalty Loop (DVM-HALL) Model and the Net Human-Agent Score (NHAS) in Autonomous Commerce

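The trustworthiness of AI robots that interact with customers in automated stores needs to be established. The model aims to build trust between customers and robots and to support customers' shopping.

Reinforcement learning | RLHF

Use case: Establishing the trustworthiness of AI robots that interact with customers in automated stores
Difficulty: Hard
Cost: Medium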

Tighter Bounds for the Random-Offerer Mechanism in Bilateral Trade

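Studies the random-offerer mechanism in bilateral trade, with the aim of determining the optimal price at which the selected trader executes the trade.

Use case: Random-offerer mechanism in bilateral trade
Difficulty: Hard
Cost: Medium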

Auctions with Contract Design

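Evaluates auction participants based on the characteristics of the qualifications obtained through work, and describes a complex auction framework that includes contract design.

Quality prediction/anomaly detection | Natural language processing | RAG

Use case: Auctions that include contract design
Difficulty: Hard
Cost: Low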

huggingface | Hugging Face available | 2026-07-15

Diagnosing and Calibrating Tool-Call Boundary Drift in Multi-Teacher On-Policy Distillation

Agentic language models must learn when to call tools, when to consume tool responses, and when to answer dire

Deep learning | Model compression/quantization | Generation | Text

Use case: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-15

Cura 1T: Specialized Model for Agentic Healthcare

Healthcare spans high-stakes communication, expert reasoning, and workflow execution, yet specialized LLMs tha

Natural language processing | Large language models | Image | Text

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

github | GitHub available | 2026-07-15

ai-engineering-hub — In-depth tutorials on LLMs, RAGs and real-world AI agent applications.

This repository contains in-depth tutorials for AI engineering, covering LLMs, RAG, and real-world AI agent applications.

Use case: Text analysis tool for understanding articles
Difficulty: Easy
Cost: High

arxiv | Paper only | 2026-07-14

Stability Buys Time: A Re-Keying Game for Encrypted Multi-Agent Control

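In encrypted control systems, the cloud operates on homomorphically encrypted states and can manage the agents' behavior while preserving privacy. To ensure security, the risk of side-channel attacks is taken into account while assuming the controller is trusted

Deep learning | Model compression/quantization | Detection

Use case: Encrypted control
Difficulty: Hard
Cost: Low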

arxiv | Paper only | 2026-07-14

On Incentivized Exploration beyond Bayesianism and Full-Information

We extend Incentive Compatible Exploration beyond the Bayesian full-information setting of Kremer et al. [2014

Natural language processing | Fine-tuning

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

huggingface | Hugging Face available | 2026-07-14

ReflectWorld-MM: An Entity-Oriented Multimodal Memory System for Open-Ended Video Streams

Building assistants that can continually watch the world, remember what they see, and reason over their accumu

Computer vision | Multimodal | Image | Text | Audio

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-14

From Human-Centric to Agentic Code Review: The Impact of Different Generations of Generative AI Technology on Review Quality

Code review helps maintain software quality before code integration, but it also imposes a substantial workloa

Quality prediction/anomaly detection | Deep learning | Transformer | Generation | Text

Use case: Generation
Difficulty: Easy
Cost: High

Awesome-Embodied-Robotics-and-Agent — This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥

A repository of research combining Embodied AI or robots with Large Language Models.

Use case: Embodied AI and robotics research
Difficulty: Easy
Cost: High

OpenRLHF — An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)

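OpenRLHF is a reinforcement learning framework built on Ray. It supports a range of reinforcement learning algorithms, including PPO, DAPO, and REINFORCE++.

Deep learning | Transformer | Image

Use case: Reinforcement learning framework
Difficulty: Easy
Cost: High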

agents-towards-production — End-to-end, code-first tutorials for building production-grade GenAI agents. From prototype to enterprise deployment.

End-to-end, code-first tutorials for developing and implementing AI agents.

Use case: Developing and implementing AI agents
Difficulty: Easy
Cost: High

memvid — Memory layer for AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer. Give your agents instant retrieval and long-term memory.

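MemVid is a serverless, single-file memory layer that gives AI agents instant retrieval and long-term memory.

Natural language processing | Large language models | Generation | Text | Video

Use case: Managing memory for AI agents
Difficulty: Easy
Cost: High

Explainable | Reinforcement learning | Model-free (DQN / SAC) | Text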

Auditing the Risk Claims of Distributional Reinforcement Learning

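Analyzes risk evaluation in distributional reinforcement learning in order to make that evaluation easier.

Use case: Risk evaluation for distributional reinforcement learning
Difficulty: Hard
Cost: High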

Forgetting Our Way to Shared Meaning: Effects of Forgetting on Conceptual Alignment in a Non-Partnership Coordination Game

Shared meaning in language requires people to learn and agree on categories. We ask how characteristics of age

Math/theory | Optimization

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Paradoxes of Game Theoretic Equilibria and Price of Anarchy

This work develops a method for understanding game-theoretic equilibria.

Use case: Developing a method for understanding game-theoretic equilibria
Difficulty: Hard
Cost: Low

Efficient Online Proportional Sampling with Applications to Smoothed Online Learning

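Proposes a new algorithm that speeds up online proportional sampling. By taking the partition structure of the objects into account, it builds an efficient data structure that makes the sampling faster.

Use case: Online proportional sampling
Difficulty: Hard
Cost: Medium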

arxiv | Paper only | 2026-07-12

Reinforcement Learning for Execution under Dynamic Fees in a Closed-Loop DEX Simulator

Trader-facing dynamic fees are increasingly proposed for automated market makers (AMMs), but historical data d

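Suited to tabular data | Natural language processing | RAG | Tabular | Reinforcement learning

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Low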

arxiv | Paper only | 2026-07-12

Representation theorems for actual and alpha powers over general concurrent game frames without assuming independence of agents

Concurrent game frames are a standard semantic framework for logics of strategic reasoning. Two notions of coa

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Double elimination formats for a 64-team FIFA World Cup

The recent expansion of the FIFA World Cup to 48 teams has prompted discussions regarding a potential further

Quality prediction/anomaly detection | Reinforcement learning | Multi-agent

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Best-of-Both-Worlds Fairness for Mixed Goods and Chores

We study the fundamental problem of fairly dividing indivisible items among agents with additive utilities. In

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Optimal Subsidy Bounds for Goods and Chores: One Dollar Each Suffices

We study the fair allocation of $m$ indivisible items to $n$ agents with additive utilities. In our setting, e

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Fair Division with Binary Valuations: Characterizations

We consider the fair allocation of indivisible goods with binary valuations. In this setting, the maximum Nash

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-10

A Knowledge-Based Multi-Agent Framework for Security Control Recommendation

Hardening IT on-premises environments can be a daunting task for teams without access to adequate cybersecurit

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Low

arxiv | Paper only | 2026-07-10

Implicit Midpoint Gradient Descent: Fast and Learning rate free convergence for Zero-Sum Games

We study unconstrained bilinear zero-sum games, a fundamental model in online learning, adversarial optimizati

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-10

Optimal Metric Distortion for Learning-Augmented Matching on the Line

Quality prediction/anomaly detection | MLOps | Model deployment

Use case: Matching on the line
Difficulty: Hard
Cost: Medium

github | GitHub available | 2026-07-10

multimind-sdk — Your SDK solves all of this. One interface. Unified logic. Local + hosted models. Fine-tuning. Agent tools. Enterprise-ready. Hybrid RAG.Star 🌟 if you like it!

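Develops a framework for GUI automation that addresses the problems of stop detection, recovery, and re-search that arise when automating GUI operations.

Natural language processing | Large language models | Multimodal

Use case: GUI automation tool
Difficulty: Easy
Cost: High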

arxiv | Paper only | 2026-07-09

Offline Nash Solvers Meet Online Tree Search in Multi-Agent Games on Graphs

Proposes a Primitive-Guided Tree Search algorithm for solving for Nash equilibria in multi-agent games.

Use case: Solving multi-agent games
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-09

Provably Optimal Learning Algorithms for Assistance Games

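This paper studies an online version of assistance games, in which a human and an assistant cooperate to solve a task. In this setting the human can observe the state of the world, but the assistant

Use case: Humans and assistants cooperating to solve tasks
Difficulty: Hard
Cost: Medium

Quality prediction/anomaly detection | Deep learning | Transformer | Image | Text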

Social-spatial dependencies for learning visual navigation

Proposes Social-spatial dependencies, a new framework for predicting social behavior that improves individual agents' ability to learn social signals.

Use case: Predicting social behavior
Difficulty: Hard
Cost: Low

Dynamic neural manifolds for flexible closed-loop control on neuromorphic hardware

This work realizes dynamic neural manifolds by coupling a brain model with a SpiNNaker2 chip: Dynamic neural manifolds for flexible closed-loop control on neuromor

Explainable | Deep learning | Transformer

Use case: Flexible closed-loop control
Difficulty: Hard
Cost: Medium

Institutional Red-Teaming: Deployment Rules, Not Just Models, Causally Shape Multi-Agent AI Safety

Proposes a method for analyzing the behavior of multiple agents. The behavior of multiple agents

Use case: Analyzing the behavior of multiple agents
Difficulty: Hard
Cost: High

Stable Matchings with Minimum Utility Gap

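This stable matching problem asks for a matching in which agents receive equal utility. To solve it, the work proposes a framework around matchings in which two or more partners can be chosen, and uses two metrics to evaluate how balanced the utilities are

Use case: Stable matching problems
Difficulty: Hard
Cost: Medium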

huggingface | Hugging Face available | 2026-07-08

DeepSearch-World: Self-Distillation for Deep Search Agents in a Verifiable Environment

Training tool-use agents to improve from their own experience remains challenging, as supervised fine-tuning r

Deep learning | Model compression/quantization | Generation | Reinforcement learning

Use case: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-08

Agon: Competitive Cross-Model RL with Implicit Rival Grading of Reasoning

Reinforcement learning from verifiable rewards (e.g. GRPO) is the engine behind today's reasoning models, yet

Computer vision | Segmentation | Text | Reinforcement learning

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

arxiv | Paper only | 2026-07-07

A Gold-Standard Study of What Makes a Lightweight Game-Playing Agent Strong

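This work targets the enforcement and parody of win conditions in games that players win, with a particular interest in card games.

Deep learning | CNN | Text | Reinforcement learning

Use case: Identifying winning approaches in computer games
Difficulty: Hard
Cost: High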

arxiv | Paper only | 2026-07-07

Slack and Budget Breaking in Threshold Team Production

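In a threshold system, carrying out a task requires $N^{\star} = \kappa + \Delta$ certificates. Here $\Delta$ is the number of surplus certificates, and task delay occurs when the certificates $N^{\star} - \kappa$

Use case: Ways to ensure task completion
Difficulty: Hard
Cost: Medium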

arxiv | Paper only | 2026-07-07

Strategic Bargaining in Multi-Buyer Markets: Reinforcement Learning from Verifiable Rewards for LLM Negotiations

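Builds a negotiation system for markets with multiple buyers. A seller who does not fully know the size of the market incurs losses. The seller needs to gauge the market's size, which is difficult when there are multiple buyers.

Natural language processing | Large language models | Text | Reinforcement learning

Use case: Negotiation in markets with multiple buyers
Difficulty: Hard
Cost: High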

huggingface | Hugging Face available | 2026-07-07

Behavioral Privacy Leakage in Agentic Negotiation: Formalizing and Mitigating Inference Attacks via Randomized Policies

Autonomous negotiation agents are increasingly deployed in high-stakes settings such as insurance and procurem

Sensor/time series | Machine learning | Time series

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: Medium

github | GitHub available | 2026-07-07

DATAGEN — DATAGEN: AI-driven multi-agent research assistant automating hypothesis generation, data analysis, and report writing.

An AI-driven multi-agent research assistant that automates hypothesis generation, data analysis, and report writing.

Use case: AI research assistant
Difficulty: Easy
Cost: High

Heaviside Continuity of Rolling Coefficients for Eliminating Epistemic Entropy in Large Language Models

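Proposes taking Heaviside discontinuity into account in order to verify the reasoning process. This makes it possible to detect potential mistakes in the reasoning process and then generate the correct output.

Use case: Verifying reasoning in large language models
Difficulty: Hard
Cost: High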

Dynamics and Convergences for Markov Coevolutionary Opinion Formation Games in Dynamic Social Networks

While deterministic variants of the coevolutionary opinion formation games such as the K-Nearest Neighbor (K-N

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Multi Choice Min Prophet

We study the minimization counterpart of the classic prophet inequality, often termed the min prophet or cost

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Strategic Buying Agents

Presents the Strategic Buying Agents framework, which aims to optimize online purchasing.

Computer vision | Segmentation | Text

Use case: Optimizing online purchasing
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-05

Beyond Self-Resolution: Settlement Factorization for Robust Natural Language Mechanism

Language models increasingly mediate paid advice: agents submit open-ended forecasts, recommendations, plans,

Deep learning | Transformer | Text

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-05

Mechanism Design for Locating a Bridge Between Regions with Prelocated Facilities

In many urban planning projects, social planners require the construction of a bridge to connect two regions s

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-07-04

Towards Self-Evolving Agents: A Human-Inspired Adaptive Exploration-Exploitation Framework for Genetic Network Programming

Recent advancements in agentic AI have increasingly moved toward graph-based methods, driven by the demand for

Explainable | Natural language processing | Fine-tuning

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

SeqGPT: A Constrained Transformer Agent for the Inverse Design of Multi-Panel Composite Structures

Optimizing composite stacking sequences to match continuous targets (e.g., Lamination or Buckling Parameters)

Quality prediction/anomaly detection | Deep learning | Transformer

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Congestion Games with Heterogeneous Valuations: An Optimal Transport Approach

In emerging urban mobility and logistics applications, such as advanced air mobility, electric vehicle chargin

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Low

Random Serial Dictatorship is $\sqrt{2}$-Envy-Free

We analyze the house allocation problem, in which a set of agents must be matched to a set of objects for whic

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Teaming Up with AI: Coordination and Cooperation

Successful diffusion of AI in the workforce hinges on the economic value that AI brings to human endeavors. Br

Generative AI | Diffusion models

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

A Tractable Continuous-Time Model for Designing Interventions for Time-Inconsistent Agents

Designing effective goals and rewards for time-inconsistent agents is a central problem in many long-term task

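Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: Medium

Suited to tabular data | Quality prediction/anomaly detection | Natural language processing | RAG | Generation | Tabular | Reinforcement learning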

Hybridizing a Grouping Metaheuristic with Reinforcement Learning for the One-Dimensional Bin Packing Problem

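The one-dimensional bin packing problem (1D-BPP) is an NP-hard combinatorial optimization problem over indivisible items with many applications. In this work, Falkenauer's Hybrid Grouping Genetic Algorithm (

Use case: 1D bin packing
Difficulty: Hard
Cost: Low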

Mechanism and Stability Analysis of Metabolic Closed-Loop Metaheuristics

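This paper studies the framework-level interpretation of metabolic systems. It centers on the question of whether the resource-loop interpretation of a metabolic system exists at the framework level as well, rather than only as a symbolic device for narrative

Deep learning | Transformer | Generation

Use case: Stability analysis of metabolic systems
Difficulty: Hard
Cost: Medium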

Epistemic Horizon Minority Games: When Abundance Reduces Strategic Value

Strategic value can fall when an option becomes visible. A route, signal, bet, or opportunity may be attractiv

Deep learning | Transformer | Classification | Image

Use case: Classification
Difficulty: Hard
Cost: Low

Complex dynamics in the Sherrington-Kirkpatrick game

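Presents an approach for determining the optimal position of a station point, in order to solve a new location game based on fair allocation and built around the envy ratio.

Use case: Location game based on the envy ratio
Difficulty: Hard
Cost: Medium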

Facility Location Game with Envy Ratio

Presents an approach to solving a two-facility location problem based on a max equation.

Use case: Two-facility location problem based on a max equation
Difficulty: Hard
Cost: Medium

Constrained Distributed Heterogeneous Two-Facility Location Problems with Max-Variant Cost

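Presents an approach to minimizing the max-variant cost in constrained distributed heterogeneous two-facility location problems.

Use case: Max-variant cost minimization in constrained distributed heterogeneous two-facility location problems
Difficulty: Hard
Cost: Low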

Congestion-Based Slot Pricing in a Railway Auction Game

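Presents an approach to slot pricing in a railway auction game.

Use case: Slot pricing in a railway auction game
Difficulty: Hard
Cost: Medium

Suited to tabular data | Computer vision | Segmentation | Classification | Tabular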

MMAO-Cls: Metabolic Multi-Agent Optimization for Joint Feature Selection and Classifier Tuning

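Proposes MMAO-Cls, which uses multi-agent optimization for feature selection and classifier tuning.

Use case: Metabolic multi-agent optimization for feature selection and classifier tuning
Difficulty: Hard
Cost: Low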

MMAO-Dyn: A Metabolic Multi-Agent Optimizer for Dynamic Optimization

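This work extends Metabolic Multi-Agent Optimization (MMAO) so that it can be applied to dynamic optimization. For MMAO-Dyn, non-stationary settings in which changes in the environment invalidate previously valid local structure

Reinforcement learning | Multi-agent | Text

Use case: Dynamic optimization
Difficulty: Hard
Cost: Medium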

Online Fair Division Meets Reordering Buffers

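Studies the problem of fairly dividing items that carry individual valuations. Because individuals value items differently, an allocation may not be fair or equitable. In this work, a method for taking fairness into account in the allocation

Use case: Fairness in allocation problems
Difficulty: Hard
Cost: Medium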

Fair Allocation under Conflict Constraints via Strong Colorability

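Studies fairness in allocation on graphs. Under the conflict constraints, adjacent vertices must not be assigned to the same agent. The work proposes a graph allocation that takes fairness into account.

Natural language processing | Fine-tuning

Use case: Fairness of graph allocation
Difficulty: Hard
Cost: Medium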

Positive and Negative Determinant Strategies in Repeated Games with Behavior-Value Inconsistency

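Studies direct interaction, in which an individual can influence the actions that other individuals take. The work proposes a game-theoretic framework that accounts for direct interaction.

Use case: Direct interaction
Difficulty: Hard
Cost: Low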

A Large-Scale Empirical Evaluation of MMAO Under Fair-Budget Continuous and Discrete Benchmarks

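Evaluates the suitability of the Metabolic Multi-Agent Optimizer (MMAO) on a diverse set of benchmarks. MMAO is a closed-loop system for allocating resources among multiple agents.

Reinforcement learning | Model-free (DQN / SAC)

Use case: Optimizing resource allocation using appropriate methods
Difficulty: Hard
Cost: Medium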

Knowing Who, Not How Much: Learning-Augmented Mechanisms for Consumer Utility Maximization

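A study of mechanism design that respects individual valuations. It examines the relationship between individual valuations and mechanism design, and designs mechanisms that support individual decision-making.

Use case: Mechanism design to support individual decision-making
Difficulty: Hard
Cost: Medium

Quality prediction/anomaly detection | Reinforcement learning | Multi-agent | Text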

A Contextual-Bandit Oversight Game with Two-Sided Informational Asymmetry

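A study of oversight by a decision-maker assisting an AI. It examines how to realize oversight in which the decision-maker and the AI exchange information in order to evaluate and decide on the actions the AI proposes.

Use case: Oversight by decision-makers assisting AI
Difficulty: Hard
Cost: Medium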

A Lifecycle and Application-Stack Survey of Large Language Model Vulnerabilities: Attacks, Risks, Defenses, and Open Problems

Defenses against LLM misuse. The work develops a defense framework for preventing LLM misuse and analyzes the risks that misuse poses.

Use case: Defending against LLM misuse
Difficulty: Hard
Cost: High

Learning Fair Allocation of Indivisible Items from Limited Feedback

Develops an algorithm for deciding an allocation of items that respects each individual's valuations.

Use case: Algorithm for allocating items while respecting individual valuations
Difficulty: Hard
Cost: Medium

github | GitHub available | 2026-06-30

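CV — ✅ (Completed) Extremely comprehensive deep learning notes [土堆 PyTorch] [李沐 Dive into Deep Learning] [吴恩达 Deep Learning] [大飞 Large-Model Agent]

Deep learning notes. They cover 土堆's PyTorch course, 李沐's Dive into Deep Learning, 吴恩达's Deep Learning, and 大飞's large-model agent course.

Use case: Deep learning notes
Difficulty: Easy
Cost: High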

arxiv | Paper only | 2026-06-29

Minimal MMAO: A Resource-Closed-Loop Framework for Adaptive Metaheuristic Search

This paper presents the Metabolic Multi-Agent Optimizer (MMAO) as an adaptive metaheuristic built around endog

Quality prediction/anomaly detection | Deep learning | Model compression/quantization

Use case: Automatic tuning of metaheuristics
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-29

From Detecting Agency to Doing Work: Self-Caused Credit Builds a Durable Behavioral Self in a Minimal Spiking Agent

How does an agent that can tell self from world come to be durably shaped by that distinction? Recent work sho

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-29

A Tunable Incentive Mechanism for Binary Aggregation Without Verification

Binary aggregation without verifiable ground truth arises when agents' reports must be aggregated without acce

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | GitHub available | 2026-06-28

When LLMs Develop Languages: Symbolic Communication for Efficient Multi-Agent Reasoning

Chain-of-Thought (CoT) improves large language models (LLMs) on difficult reasoning tasks, but it often incurs

MI-oriented, Deep learning, Model compression / quantization, Text

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-06-28

Improved Multi-Dimensional Forecasting for Swap Regret

We study the problem of forecasting for an arbitrary number of downstream agents with unknown objectives, each

Sensor / time series, Machine learning, Time-series forecasting

Use case: Forecasting
Difficulty: Hard
Cost: Low

arxiv | Paper only | 2026-06-28

Optimism as a Vulnerability: Deceptive Stackelberg Control of UCB Bandit Followers

Upper Confidence Bound (UCB) algorithms guarantee sublinear regret for agents learning unknown stochastic envi

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-27

LLM Semantic Signaling Game and Mechanism Design: Systematic Blindness, Awareness Shaping, and Mindset Dynamics

Large language models (LLMs) increasingly mediate strategic interactions through natural language, making sema

Natural language processing, Large language models, Detection, Text

Use case: Detection
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-06-27

Exit-and-Join Dynamics and Equilibrium in Continuum Cooperative Games

This paper develops a continuum theory of exit-and-join coalition dynamics in nonatomic cooperative games. We

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-27

The Two Genie Game: Adoption and Welfare in Audit-Grounded AI Governance

We ask under what conditions an agent with a harm-minimizing policy can displace an approval-seeking (RLHF) ag

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-26

MMAO: A Metabolic Multi-Agent Optimizer with Endogenous Resource Allocation for Continuous and Discrete Optimization

Traditional meta-heuristics often rely on fixed population sizes, manually chosen search scales, and externall

Sensor / time series, Deep learning, Model compression / quantization, Text

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-26

Triadic Werewolf: A Jester Role for Multi-Hop Theory of Mind in LLMs

This work extends theory-of-mind evaluation for large language models (LLMs) by adding a triadic Werewolf game.

Use case: Triadic Werewolf game
Difficulty: Hard
Cost: High

Surviving by Serving: Functional Relevance Drives Self-Organization in Complex Adaptive Systems

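This study analyzes complex adaptive systems. By analyzing a system's structure, it aims to understand how systematic mechanisms emerge.

Natural language processing, Fine-tuning, Generation

Use case: Analysis of complex adaptive systems
Difficulty: Hard
Cost: Medium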

Pick Two: An Adversarial Animal Survival Game

The "Pick Two" animal selection puzzle is a popular thought experiment in which two animal species must defend

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Parametric Open Source Games

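Open-source games involve players that are agents acting according to their decision procedures. A parametric model of open-source games is proposed, and a theoretical framework for spontaneous gradients is established.

Use case: Analysis of open-source games
Difficulty: Hard
Cost: Medium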

Almost EFX in Hypergraphs

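This study proposes a method for allocating indivisible goods based on individual valuations. The method aims for an efficient allocation while taking each individual's valuations into account.

Use case: Allocation of indivisible goods
Difficulty: Hard
Cost: Medium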

Learning Anonymous Pricing for Online Resource Allocation

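This study proposes an algorithm for online resource allocation. The algorithm aims for an efficient allocation while accounting for the balance between resource supply and demand.

Use case: Online resource allocation
Difficulty: Hard
Cost: Medium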

Existence of Pure Strategy Nash Equilibria in Finite Noncooperative Games

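This study proposes conditions for pure-strategy equilibria in noncooperative games. The conditions assess whether an equilibrium is necessary while taking the outcomes of individual games into account.

Use case: Pure-strategy equilibria in noncooperative games
Difficulty: Hard
Cost: Medium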

The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators

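This paper proposes a new method for co-evolving the evaluation of agents over the course of their development.

Use case: Co-evolving the evaluation of agents over the course of their development
Difficulty: Hard
Cost: Medium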

EvoFlock: evolved inverse design of multi-agent motion

Tuning multi-agent models helps achieve realistic simulations. In this study, a newly developed model makes such tuning possible.

Use case: Tuning multi-agent models
Difficulty: Hard
Cost: Medium

Strategyproof Facility Location and Committee Selection with Mixed Max and Sum Agent Types

This study

Use case: Design of shared task assignment
Difficulty: Hard
Cost: Low

Restoring Incentive Compatibility in Two-Stage Energy Markets with Prosumers

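Electricity-market problems based on distributed control can represent situations in which supply exceeds demand, not only situations in which supply and demand are balanced.

Reinforcement learning, Multi-agent, Generation

Use case: Resolving imbalance problems in decentralized power supply
Difficulty: Hard
Cost: Medium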

Equilibrium and Infeasibility: A new solution concept for games

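This study proposes a new solution concept for games that accounts for non-commonality, together with a polynomial-time solution that does not rely on constraints for a common solution.

Use case: Research on handling infeasibility in cooperative games
Difficulty: Hard
Cost: Medium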

SidConArena: An Environment Evaluating Agents in Open-Ended,Positive-Sum Bargaining Game

Evaluating LLM agents requires dynamic environments that go beyond static reasoning and zero-sum games. Real-w

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Strict Fairness at What Cost? Envy-Free Contracts with Subsidies

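Joint contract design is important in that multiple tasks are allocated among agents.

Use case: Research on designing unbiased, fair contracts in joint contract design
Difficulty: Hard
Cost: Medium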

Decomposing Financial Market Dynamics via Mechanism Analysis in an Evolutionary Multi-Agent Simulation

Evolutionary agent-based markets (ABMs) couple several mechanisms -- who reproduces, how price forms, how bias

Quality prediction / anomaly detection, Natural language processing, RAG

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Low

EMAgnet: Parameter-Space EMA Regularization for Policy Gradient Self-Play in Large Games

Recent work has established that regularized policy gradient methods such as PPO, when used in self-play, can

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Low

Flow Games with Public Arcs: the Least Core and the Nucleolus

We study flow games with public arcs, an extension of classical cooperative flow games that allows players to

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Neural Parameter Calibration for Finite-State Mean Field Games

Mean field games efficiently approximate a very large population of strategic agents. While these games can ai

Use case: Learning the parameters of mean field games
Difficulty: Hard
Cost: Medium

YUKTI: From Natural-Language Situations to Robust, Verifiable Decisions An Uncertainty-Typed Proposition IR, Assumption-Robust Pareto Frontiers, and a Regret Certificate

Language models turn a worded situation into a numeric plan, and the dominant pipelines (NL4Opt, OptiMUS, ORLM

Deep learning, Model compression / quantization, Text, Audio

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Emergent Culture in Minimal LLM Systems

What happens when LLM agents operate with no context outside a turn, minimal prompting, and simple tools? Insp

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Fundamental market design as a layer of AI-agent alignment

This paper argues that AI-agent alignment in markets should not be understood only as a property of agents, bu

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

Stationary Robust Mean-Field Games under Model Mismatches

Deploying multi-agent reinforcement learning (MARL) in the real world is often limited by model mismatches bet

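Natural language processing, Large language models, Reinforcement learning

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High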

Theorist Toolbox: Tools for Agent Based LLM-assisted economic theory Research

Empirical economists often start their projects with a toolbox. Shared packages, replication archives, and cir

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-06-20

Quantifying Theoretical AI Alignment Guarantees: Receiver-Utility Bounds in Bayesian Persuasion

Misalignment can change how information moves from an AI agent to a human user. We model this as an informatio

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-19

Simultaneously Efficient Allocation of Indivisible Items Across Multiple Dimensions

Many allocation problems are intrinsically multidimensional, since an item may contribute differently to sever

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

arxiv | Paper only | 2026-06-15

Evolution & Foundation: AI Shares Creative Control

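Proposes a new method for evaluating ideas that AI creates in collaboration with humans, improving the evaluation of creativity.

Natural language processing, Fine-tuning, Generation, Image, 3D

Use case: A new method for evaluating AI creativity
Difficulty: Hard
Cost: High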

arxiv | Paper only | 2026-06-15

Evolutionary Bilevel Reward Shaping for Generalization in Reinforcement Learning

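An algorithm is proposed that enables robot learning in mobile environments.

Machine learning, Feature engineering, Reinforcement learning

Use case: Robot learning in mobile environments
Difficulty: Hard
Cost: High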

arxiv | Paper only | 2026-06-12

Co-Evolved Spiking Neural Network Ensembles via Marginal Contribution Fitness

Evolutionary optimization of spiking neural networks (SNNs) becomes increasingly difficult as task complexity

Natural language processing, RAG, Classification, Regression

Use case: Classification
Difficulty: Hard
Cost: Low

huggingface | Hugging Face available | 2026-05-07

Masked Diffusion Language Models are Strong and Steerable Text-Based World Models for Agentic RL

Recent growth in reinforcement learning (RL) has surfaced a need for diverse, specialized training environment

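Natural language processing, Large language models, Text, Reinforcement learning

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High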