MLinfo | Machine Learning and AI Paper Digest

MLinfo | Catch up on and search daily-updated technology

Search results for "Agent"

30 results


huggingface | Hugging Face available | 2026-07-22

NexForge: Scaling Agent Capabilities through Requirement-Driven Task Synthesis for LLMs

Scaling executable agent training data for LLM post-training is bottlenecked by substrate-bound methods that t

Natural language processing, Large language models, Generation

Use: Generation
Difficulty: Easy
Cost: High


huggingface | Hugging Face available | 2026-07-21

FinanceComplexQA: Benchmarking Agentic Reasoning on Industrial-grade Financial Documents

Agentic Reasoning has become a transformative force in financial analysis due to its ability to integrate larg

Quality prediction/anomaly detection, Natural language processing, RAG, Generation, Summarization, Text

Use: Generation
Difficulty: Easy
Cost: Low


huggingface | Hugging Face available | 2026-07-21

ABot-World-0: Infinite Interactive World Rollout on a Single Desktop GPU

We present ABot-World-0, an action-conditioned video world model for real-time, long-horizon closed-loop inter

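Quality prediction/anomaly detection, Deep learning, Model compression and quantization, Text, Video

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High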


huggingface | GitHub available | Hugging Face available | 2026-07-20

Differentiable Logic Gate Networks for Low-Latency EEG Classification on Edge Devices

Real-time EEG classification on edge devices is bottlenecked by the floating-point arithmetic of conventional

Easy to try on CPU, Reinforcement learning, Multi-agent, Classification, Detection

Use: Classification
Difficulty: Easy
Cost: Low


huggingface | Hugging Face available | 2026-07-20

EduPanel: A Three-Agent LLM Judge for Teaching Videos -- Reliability, Complementarity, and Human Trust Calibration

Teaching videos are becoming a major medium for education, creating a growing need for scalable evaluation of

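Explainable, Quality prediction/anomaly detection, Natural language processing, Large language models, Video, Multimodal

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High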


huggingface | Hugging Face available | 2026-07-20

FlashRT: Agent Harness for Guiding Agents to Deploy Real-Time Multimodal Applications

Real-time multimodal applications, including voice agents and interactive video generation, compose heterogene

Deep learning, Model compression and quantization, Generation, Text, Audio

Use: Generation
Difficulty: Easy
Cost: High


huggingface | GitHub available | Hugging Face available | 2026-07-20

WorldCupArena: Fine-Grained Evaluation of Language Models and Deep-Research Agents on Football Forecasting

Predicting a football match before kickoff requires more than knowing past results: a model must use changing

Computer vision, Segmentation, Prediction, Text

Use: Prediction
Difficulty: Easy
Cost: Low


huggingface | Hugging Face available | 2026-07-20

Self-State Attacks on Self-Hosted AI Agents: How Far Can OS Defenses Go?

Self-hosted AI agents read and write their own memory and configuration files to function. An agent may get co

Deep learning, Transformer, Detection

Use: Detection
Difficulty: Easy
Cost: Medium


huggingface | Hugging Face available | 2026-07-20

Coercion and Deception in AI-to-AI Management: An Agentic Benchmark of Unprompted Escalation

Multi-agent systems routinely place one AI agent in authority over another. When a subordinate refuses a task,

Natural language processing, Large language models, Classification, Text

Use: Classification
Difficulty: Easy
Cost: High


huggingface | Hugging Face available | 2026-07-19

TimeLens2: Generalist Video Temporal Grounding with Multimodal LLMs

Video multimodal large language models (MLLMs) can describe what happens in a video, but rarely identify when

Natural language processing, Large language models, Detection, Text, Video

Use: Detection
Difficulty: Easy
Cost: High


huggingface | Hugging Face available | 2026-07-19

EvolvingWorld: An Open-Schema Framework for Co-Evolving Role-Play Agents and World Model in Interactive Literary World

This paper introduces EvolvingWorld, a framework and benchmark for character and world co-evolution in interac

Natural language processing, Large language models, Generation

Use: Generation
Difficulty: Easy
Cost: High


huggingface | Hugging Face available | 2026-07-18

DataFlow-Harness: A Grounded Code-Agent Platform for Constructing Editable LLM Data Pipelines

Large language models (LLMs) are increasingly used to automate data-processing workflows, yet coding agents ty

Natural language processing, Large language models, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High


huggingface | Hugging Face available | 2026-07-18

Environment-free Synthetic Data Generation for API-Calling Agents

Training API-calling large language model (LLM) agents demands massive amounts of high-quality trajectories. H

Quality prediction/anomaly detection, Natural language processing, Large language models, Generation, Text

Use: Generation
Difficulty: Easy
Cost: High


huggingface | Hugging Face available | 2026-07-17

SeerGuard: A Safety Framework for Mobile GUI Agents via World Model Prediction

Mobile graphical user interface (GUI) agents have demonstrated remarkable capabilities in automating complex t

Reinforcement learning, Model-based

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: Medium


huggingface | Hugging Face available | 2026-07-17

Nonuniformity Principle in Human-AI Coworking

As generative AI is increasingly applied to automate multi-step and high-stake workflows, human judgment and i

Quality prediction/anomaly detection, Machine learning, Supervised learning, Generation

Use: Generation
Difficulty: Easy
Cost: Medium


huggingface | Hugging Face available | 2026-07-17

RecGPT-V3 Technical Report

Large language models (LLMs) are transforming recommender systems from matching co-occurrence patterns in hist

Deep learning, Model compression and quantization, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High


huggingface | Hugging Face available | 2026-07-17

Recursive Harness Self-Improvement

Under model-harness co-evolution, harnesses are not merely inference-time scaffolds but data-generating compo

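Quality prediction/anomaly detection, Deep learning, Model compression and quantization, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High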


huggingface | Hugging Face available | 2026-07-17

When Does Muon Help Agentic Reinforcement Learning?

Muon is competitive with AdamW in large-scale pre-training, but its value for reinforcement-learning (RL) post

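Deep learning, Normalization and optimization methods, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High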


huggingface | Hugging Face available | 2026-07-17

DSWorld: A Data Science World Model for Efficient Autonomous Agents

Despite strong capabilities in data understanding and decision-making, autonomous data science agents still he

Deep learning, Model compression and quantization, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High


huggingface | Hugging Face available | 2026-07-17

Beyond Success Rate: Cost-Aware Evaluation of Offensive and Defensive Security Agents

Security-agent evaluations commonly measure peak offensive capability under generous inference budgets, emphas

Computer vision, Segmentation

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: Medium


huggingface | Hugging Face available | 2026-07-16

Multi-Turn On-Policy Distillation with Prefix Replay

We study on-policy distillation (OPD) for agentic tasks, where an LLM agent interacts with an environment over

Deep learning, Model compression and quantization

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High


huggingface | Hugging Face available | 2026-07-16

RESOURCE2SKILL: Distilling Executable Agent Skills from Human-Created Multimodal Resources

Skills are a useful abstraction for software agents, turning human and agent experience into reusable procedur

Natural language processing, RAG, Image, Text, Video

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High


huggingface | Hugging Face available | 2026-07-15

Diagnosing and Calibrating Tool-Call Boundary Drift in Multi-Teacher On-Policy Distillation

Agentic language models must learn when to call tools, when to consume tool responses, and when to answer dire

Deep learning, Model compression and quantization, Generation, Text

Use: Generation
Difficulty: Easy
Cost: High


huggingface | Hugging Face available | 2026-07-15

Cura 1T: Specialized Model for Agentic Healthcare

Healthcare spans high-stakes communication, expert reasoning, and workflow execution, yet specialized LLMs tha

Natural language processing, Large language models, Image, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High


huggingface | Hugging Face available | 2026-07-14

ReflectWorld-MM: An Entity-Oriented Multimodal Memory System for Open-Ended Video Streams

Building assistants that can continually watch the world, remember what they see, and reason over their accumu

Computer vision, Multimodal, Image, Text, Audio

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High


huggingface | Hugging Face available | 2026-07-14

From Human-Centric to Agentic Code Review: The Impact of Different Generations of Generative AI Technology on Review Quality

Code review helps maintain software quality before code integration, but it also imposes a substantial workloa

Quality prediction/anomaly detection, Deep learning, Transformer, Generation, Text

Use: Generation
Difficulty: Easy
Cost: High


huggingface | Hugging Face available | 2026-07-08

DeepSearch-World: Self-Distillation for Deep Search Agents in a Verifiable Environment

Training tool-use agents to improve from their own experience remains challenging, as supervised fine-tuning r

Deep learning, Model compression and quantization, Generation, Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: High


huggingface | Hugging Face available | 2026-07-08

Agon: Competitive Cross-Model RL with Implicit Rival Grading of Reasoning

Reinforcement learning from verifiable rewards (e.g. GRPO) is the engine behind today's reasoning models, yet

Computer vision, Segmentation, Text, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High


huggingface | Hugging Face available | 2026-07-07

Behavioral Privacy Leakage in Agentic Negotiation: Formalizing and Mitigating Inference Attacks via Randomized Policies

Autonomous negotiation agents are increasingly deployed in high-stakes settings such as insurance and procurem

Sensor/time series, Machine learning, Time series

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: Medium


huggingface | Hugging Face available | 2026-05-07

Masked Diffusion Language Models are Strong and Steerable Text-Based World Models for Agentic RL

Recent growth in reinforcement learning (RL) has surfaced a need for diverse, specialized training environment

Natural language processing, Large language models, Text, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High
