MLinfo | Machine Learning & AI Paper Summaries

Tags: Quality prediction/anomaly detection, Natural language processing, Large language models, Image, Text, Multimodal

MIRROR: Learning from the Other View for Multi-Modal Reasoning

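Proposes MIRROR (Learning from the Other View), a new approach to multimodal understanding. MIRROR provides equivalent views of the same content from text, from figures, and from text-figure combinations, thereby…

Use case: Developing multimodal understanding techniques
Difficulty: Hard
Cost: High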

Windowed-MTP: Removing the Full-Context Draft-KV Tax at Million-Token Context

Speculative decoding accelerates autoregressive generation by having a cheap draft propose tokens that a target…

Use case: Generation
Difficulty: Hard
Cost: High
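The summary describes the generic draft-then-verify idea behind speculative decoding. The sketch below shows that loop in Python under several assumptions. The "models" are toy deterministic functions that map a token prefix to the next token id. Acceptance is greedy only, with no sampling and no KV cache. Nothing here is specific to Windowed-MTP.

```python
from typing import Callable, List

# A toy greedy "model": token prefix -> next token id (an assumption for illustration).
NextToken = Callable[[List[int]], int]

def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Generic greedy speculative decoding sketch (not the Windowed-MTP method)."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1) The cheap draft proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2) The target checks each drafted position in order; a real system
        #    scores all k positions in one batched forward pass.
        accepted = 0
        for i in range(k):
            if target(out + proposal[:i]) == proposal[i]:
                accepted += 1
            else:
                break
        out.extend(proposal[:accepted])
        # 3) The target always contributes one token: the correction at the
        #    first mismatch, or a bonus token if all k drafts were accepted.
        out.append(target(out))
    return out[: len(prompt) + max_new]
```

Every emitted token is one the target would have chosen greedily, so the output matches plain greedy decoding with the target. In a real system the speedup comes from verifying the k drafted positions in a single target pass. This sequential loop does not model that.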

Tags: Sensor/time series, Natural language processing, Large language models, Classification, Detection, Embeddings

Toward Generalizable Cognitive Impairment Detection with Speech-Based Multimodal Large Language Models

Cognitive impairment (CI) is a growing public health concern. Early and accurate diagnosis is critical for…

Use case: Classification
Difficulty: Hard
Cost: High

Test-Time Scaling via Error Localization

Scaling inference-time computation has emerged as a reliable method to improve the performance of large language…

Tags: Natural language processing, Large language models, Detection, Generation, Text

Use case: Detection
Difficulty: Hard
Cost: High

AI Assistants Overassist

Large language models (LLMs) are increasingly used as tutors and thought partners, helping users reason through…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

Adaptive Depth Sparse Framework: Similarity-Driven Resource Allocation for Pre-Trained LLMs

Large language models (LLMs) achieve strong generation and reasoning performance, but the Transformer architecture…

Tags: Deep learning, Transformer, Generation, Text

Use case: Generation
Difficulty: Hard
Cost: High

The Dark Room in the Reward Channel: Dense Prediction Rewards Collapse GRPO-Trained LLM Agents -- and What Actually Works

Dense per-step supervision is an appealing remedy for sparse-reward, long-horizon LLM agents: reward the agent…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

Automated Synthesis and Adversarial Validation of Executable Causal Research Pipelines

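This study uses machine learning models to predict changes in blood glucose and emphasizes that preprocessing blood glucose data is important for diabetes management.

Use case: Disease prediction
Difficulty: Hard
Cost: High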

Relative Value Learning

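This study proposes Relative Value Learning (RV), which uses antisymmetric functions so that a machine learning model can predict the difference in value between any two states. The approach may improve control and estimation.

Use case: Predicting value differences
Difficulty: Hard
Cost: High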
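One standard way to make a pairwise predictor antisymmetric, as the summary describes, is to wrap an unconstrained scorer g(s, s') as f(s, s') = g(s, s') - g(s', s). Then f(s, s') = -f(s', s) and f(s, s) = 0 hold by construction. The sketch shows only this construction. The scorer g is a hypothetical stand-in for a learned network, and the paper's actual architecture and training objective are not reproduced.

```python
import math
from typing import Callable, Sequence

State = Sequence[float]
PairScorer = Callable[[State, State], float]

def make_antisymmetric(g: PairScorer) -> PairScorer:
    """Wrap any pairwise scorer so that f(a, b) == -f(b, a) by construction."""
    def f(a: State, b: State) -> float:
        return g(a, b) - g(b, a)
    return f

# Hypothetical unconstrained scorer standing in for a learned network.
def g(a: State, b: State) -> float:
    return math.tanh(0.5 * b[0] - 0.2 * a[1]) + 0.3 * a[0] * b[1]

# f(s, s') is read as the predicted value difference from s to s'.
f = make_antisymmetric(g)
```

If f(s, s') is the value gained by moving from s to s', antisymmetry guarantees that the reverse move scores exactly the opposite gain.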

Tags: Natural language processing, Large language models, Anomaly detection, Text, Reinforcement learning

Training Large Language Models for Self-Explanation Faithfulness

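This study proposes an RL method for verifying the faithfulness of self-explanations and explores a new approach that directly optimizes self-explanation faithfulness.

Use case: Self-explanation faithfulness
Difficulty: Hard
Cost: High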

Multi-turn RL with Structural and Performance Aware Rewards for CUDA Kernel Generation

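Proposes CudaPerf, which supports CUDA kernel generation. The method can efficiently generate high-performance CUDA kernels.

Tags: Natural language processing, Large language models, Generation, Reinforcement learning

Use case: Supporting CUDA kernel generation
Difficulty: Hard
Cost: High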

Position Bias is Hidden Behind Ceiling Effects: A Permutation Diagnostic for LLM Benchmarks

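Proposes a method for analyzing position bias in LLM (large language model) evaluation. The method shows how position bias affects evaluation results.

Tags: Natural language processing, Large language models, Detection, Generation

Use case: Analyzing position bias in LLM evaluation
Difficulty: Hard
Cost: High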
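The sketch below illustrates a permutation-style diagnostic. It is an assumption about the general idea, not the paper's exact protocol or metric. The same multiple-choice item is re-asked under every ordering of its options, and the check is whether the chosen content stays fixed. The model interface and both toy choosers are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: (question, options in display order) -> chosen index.
Chooser = Callable[[str, Sequence[str]], int]

def permutation_diagnostic(model: Chooser, question: str,
                           options: List[str]) -> Tuple[float, Counter]:
    """Re-ask one item under every option ordering.

    Returns (consistency, position_counts). Consistency is the share of
    orderings in which the model picks its most common *content* answer.
    position_counts tallies which display slot was chosen.
    """
    content_picks: Counter = Counter()
    position_picks: Counter = Counter()
    orders = list(permutations(options))
    for order in orders:
        idx = model(question, order)
        content_picks[order[idx]] += 1
        position_picks[idx] += 1
    consistency = content_picks.most_common(1)[0][1] / len(orders)
    return consistency, position_picks

always_first: Chooser = lambda q, opts: 0                            # pure position bias
by_content: Chooser = lambda q, opts: opts.index(max(opts, key=len))  # content-driven
```

A model that always picks the correct option looks perfectly consistent whatever its positional preference. That is one plausible way high accuracy, a ceiling effect, could hide position bias.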

Offline RL with Hierarchical Action Chunking

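Proposes Offline RL with Hierarchical Action Chunking, which supports task decomposition in offline RL (offline reinforcement learning). With this method, task decomposition…

Use case: Task decomposition in offline RL
Difficulty: Hard
Cost: High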

Robust Asynchronous Q-Learning under Reward and State Corruption via Batching

Motivated by reinforcement learning in harsh environments, we consider the problem of learning an optimal policy…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

The Geometry of Personality: Activation Steering with Jungian Cognitive Functions

Activation steering enables control and interpretation of LLMs, yet existing work primarily models personality…

Tags: Explainable, Deep learning, Transformer

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

Beyond Sycophancy: Structured Resistance and Compliance in LLM Moral Reasoning

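This study investigates how language models can make socially appropriate judgments. The models became better at holding their position under disagreement and more receptive to other people's perspectives.

Use case: Improving social and moral judgment
Difficulty: Hard
Cost: High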

From Resource Flow to Executable Tests: Petri-Net-Guided LLM Test Generation for Concurrent Stateful Rust APIs

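This study proposes a method that uses Petri nets modeling resource-flow behavior to automatically generate tests that exercise APIs. The method generates scenarios for testing API functionality and ensures that the generated tests run correctly.

Use case: Testing concurrent stateful APIs
Difficulty: Hard
Cost: High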

Same Dangerous Objective, Opposite Advice: Direct Exposure versus Multi-Agent Mediation

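This study examines LLM safety. When confronted with a dangerous objective, the models were able to give safe advice.

Use case: Comparing direct exposure with multi-agent mediation
Difficulty: Hard
Cost: High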

Improved lower bounds for the Shannon capacity of odd cycles

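This study examines lower bounds on the Shannon capacity of odd cycles. The lower bounds are computed from the sizes of independent sets in graphs.

Use case: Lower bounds on Shannon capacity
Difficulty: Hard
Cost: High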
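The standard route to lower bounds of this kind is background here, not the paper's new construction. It uses Θ(G) ≥ α(G^k)^(1/k), where G^k is the k-fold strong product of G and α is the independence number. For C5 and k = 2, the classic set {(i, 2i mod 5)} gives Θ(C5) ≥ √5, which beats α(C5) = 2. Lovász later showed that Θ(C5) = √5 exactly, while the capacity of larger odd cycles remains open. The brute-force sketch below only checks that small case, and all names in it are illustrative.

```python
from itertools import combinations
from typing import Callable, Iterable, Sequence

Adj = Callable[[object, object], bool]

def cycle_adj(n: int) -> Adj:
    """Adjacency test for the cycle C_n on vertices 0..n-1."""
    return lambda u, v: (u - v) % n in (1, n - 1)

def strong_product_adj(adj: Adj) -> Adj:
    """Strong product of a graph with itself: distinct tuples are adjacent iff
    every coordinate pair is equal or adjacent."""
    def close(a, b):
        return a == b or adj(a, b)
    return lambda x, y: x != y and all(close(a, b) for a, b in zip(x, y))

def is_independent(vertices: Sequence, adj: Adj) -> bool:
    return all(not adj(x, y) for x, y in combinations(vertices, 2))

def independence_number(vertices: Iterable, adj: Adj) -> int:
    """Brute force over subsets; exponential, so only for tiny graphs."""
    vs = list(vertices)
    for size in range(len(vs), 0, -1):
        if any(is_independent(s, adj) for s in combinations(vs, size)):
            return size
    return 0

n = 5
adj = cycle_adj(n)
alpha_c5 = independence_number(range(n), adj)     # alpha(C5)
adj_sq = strong_product_adj(adj)                  # adjacency in C5 (strong product) C5
witness = [(i, (2 * i) % n) for i in range(n)]    # classic size-5 independent set
lower_bound = len(witness) ** 0.5 if is_independent(witness, adj_sq) else None
```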

Agentic coding without the cloud: evaluating open-weight large language models on longitudinal data preparation tasks

Large language models (LLMs) and agents are now widely used tools in code development, with data typically sent…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

Thinkink: 2D Spatial Ink-native Interaction with LLMs

People often use handwritten notes and sketches to externalize ideas for ideation. To integrate large language…

Tags: Deep learning, Compression/quantization, Image, Text

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

Detecting LLM-Generated Tokens in Human--LLM Coauthored Text

The rise of human-AI collaborative writing has created a growing need for fine-grained detection methods that…

Tags: Natural language processing, Large language models, Classification, Detection, Text

Use case: Classification
Difficulty: Hard
Cost: High

RUMBA: Russian User Memory Benchmark

The ability to handle long-term memory in LLMs is becoming increasingly critical, yet existing benchmarks remain…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

PATS: Policy-Aware Training Scaffolding for Agentic Reinforcement Learning

In long-horizon LLM agent reinforcement learning, weak policies often repeat similar failures, producing uninformative…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

Tags: Suited to tabular data, Natural language processing, Large language models, Generation, Text

Euclid-MCP: A Model Context Protocol Server for Deterministic Logical Reasoning via Prolog

Large Language Models (LLMs) excel at natural language understanding and generation but remain unreliable for…

Use case: Generation
Difficulty: Hard
Cost: High

Tags: Sensor/time series, Natural language processing, Large language models, Image, Text, 3D

VoLN: Vision-Only Long-Horizon Navigation---Paradigm, Benchmark, and Method

Vision-and-Language Navigation (VLN) enables embodied agents to follow natural-language instructions. However,…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

SPORD: A Simulation-Propose-then-OR-Dispose Approach for Supply Chain Planning

For years, supply chain planning at e-commerce firms has operated as a collection of isolated projects. Each…

Tags: Easy to try on CPU, Natural language processing, Large language models

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

From Static Bibliometrics to Dynamic Knowledge Graphs: An LLM-Powered Framework for Modernizing Science, Technology, and Innovation (STI) Analytics

Bibliometric indicators (citation counts, h-indexes, co-authorship networks) have long anchored science, technology…

Tags: Natural language processing, Large language models, Detection, Text

Use case: Detection
Difficulty: Hard
Cost: High

GRADRAG: Cross-Component Prompt Adaptation for Coordinated Multi-Agent RAG

Retrieval-Augmented Generation (RAG) systems increasingly employ multiple LLM agents. Yet, most prior work…

Use case: Generation
Difficulty: Hard
Cost: High

Scaling Up Formal Representation of Clinical Trial Protocols in Ensemble Logic Using LLMs: A Preliminary Study

The reliance on unstructured free text for documenting clinical trial protocols creates a significant barrier…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

Tags: Quality prediction/anomaly detection, Natural language processing, Large language models, QA, Image, Text

Unlearning Under Imbalance: Benchmarking Fairness in Multimodal LLM Unlearning

LLMs can remove personal data or imbalanced data by simulating human identities, but these approaches have limitations.

Use case: Removing personal data from models
Difficulty: Hard
Cost: High

Tags: Suited to small data, Easy to try on CPU, Condition optimization, Natural language processing, Large language models, Generation

An LLM-Driven Workflow for Automated Process Control Strategy Generation and Tuning from Dynamic Process Models

This workflow uses large language models to generate automated control strategies from dynamic process models.

Use case: Generating automated control strategies
Difficulty: Hard
Cost: High

A Comparative Evaluation of Embeddings and LLMs in a Greek Book Publisher Setting - The CUP Dataset

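This study uses large language models to evaluate a Greek-language book search system. Using large language models improved search accuracy.

Tags: Deep learning, Transformer, Summarization

Use case: Evaluating book search systems
Difficulty: Hard
Cost: High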

pAI-Econ-claude: A Gated Human-in-the-Loop Multi-Agent Architecture for AI-Assisted Economic Theory Development

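This study develops a system that uses large language models to support economics research. The system lets scholars automate the development of theoretical models.

Use case: Research-support system for economics
Difficulty: Hard
Cost: High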

slang.gr as a Large-Scale Crowdsourced Resource for Non-Standard Greek

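This study uses large language models to study Greek slang. The meaning of slang terms could be inferred with the help of large language models.

Tags: Explainable, Deep learning, Transformer

Use case: Slang research
Difficulty: Hard
Cost: High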

Case study: solving P-99 with LPTP and an LLM

Ninety-Nine Prolog Problems (P-99) is a famous set of Prolog exercises. We solved the first thirty-three just…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

Case study: proving sqrt(2) irrational with LPTP and an LLM

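Combines an LLM (large language model) with LP (logic programming) to prove that √2 is irrational. In this proof, the LLM generates logical formulas and LP carries out the proof in a process that…

Use case: Proving the irrationality of √2
Difficulty: Hard
Cost: High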

Safeguards for Speech2Speech LLM-Assistants: A Case Study in Automotive Applications

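S2S (speech-to-speech) LLM assistants can speak in a human-like way, but implementing safeguards for them is difficult. This study implements safeguards for S2S LLM assistants using two approaches and…

Tags: Natural language processing, Large language models, Text, Speech/audio

Use case: Safeguards for S2S LLM assistants
Difficulty: Hard
Cost: High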

SafeStep: AI-powered Travel Assistance for Elderly People with Frailty or Dementia

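Older adults often have difficulty getting around, so this study develops a system that supports safe travel for them. The system combines LLMs with predictive models to help older adults travel safely.

Use case: Safe travel support for older adults
Difficulty: Hard
Cost: High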

CRAG-MM-Diagnostics: Enabling Stage-Wise Analysis of Knowledge-Intensive VQA

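Proposes new evaluation criteria for analyzing knowledge-intensive visual question answering (KI-VQA) systems. The criteria allow each stage of a VLM pipeline to be evaluated separately.

Tags: Natural language processing, Large language models, Classification, QA, Image

Use case: Analyzing knowledge-intensive VQA systems
Difficulty: Hard
Cost: High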

V-DEAL: Diagnosing Video Safety De-Calibration as an Understanding-Refusal Coupling Failure

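Proposes a new diagnostic framework for checking the safety of video LMMs. The framework jointly considers model behavior, understanding, and semantics.

Tags: Natural language processing, Large language models, Image, Text, Video

Use case: Diagnosing video safety de-calibration
Difficulty: Hard
Cost: High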

One More Turn, Less Regret: A Regret-Based Multi-Turn Benchmark for LLMs' Clarification Policies

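Proposes RegretBench, a benchmark for evaluating conversational assistants. It assesses how well they minimize regret over multi-turn interactive decisions.

Tags: Quality prediction/anomaly detection, Deep learning, Compression/quantization

Use case: Regret-based evaluation of conversational assistants
Difficulty: Hard
Cost: High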

AttriMem: Attribution-Guided Process Feedback for Agent Memory Learning

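Agent memory learning aims to let an LLM retain, update, and process information effectively. This study proposes a method that optimizes agent memory using attribution-guided feedback.

Tags: Natural language processing, Large language models, QA

Use case: Agent memory learning
Difficulty: Hard
Cost: High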

Tags: Quality prediction/anomaly detection, Deep learning, Transformer, Generation, Text, Speech/audio

Faster IndexTTS-2: Accelerating and Streaming Autoregressive Zero-Shot Text-to-Speech Synthesis on GPUs

Autoregressive text-to-speech models achieve strong naturalness but suffer from slow inference due to sequential…

Use case: Generation
Difficulty: Hard
Cost: High

HiMe: Real-Time Self-Hosted Personal Agent Platform for Health Insights with Wearable Devices

Traditional approaches to wearable health signal analysis, such as smartwatches, are constrained by rigid…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

EmoAgent-R1: Towards Multimodal Emotion Understanding with Reinforcement Learning-based Dynamic Agent Specialization

Multimodal large language models (MLLMs) have achieved impressive performance in multimodal emotion recognition…

Tags: Natural language processing, Large language models, Classification, Text, Video

Use case: Classification
Difficulty: Hard
Cost: High

Reexamining zero-shot summarization: Empirical investigation of trustworthiness of LLM-summarizers

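Zero-shot summarization using Large Language Models (LLMs) has significantly advanced the abstractive summarization…

Tags: Suited to MI, Deep learning, Compression/quantization, Classification, Generation, Summarization

Use case: Classification
Difficulty: Hard
Cost: High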

GuardianAgentBench: Where Agents Fail and How to Guard Them

GuardianAgentBench evaluates 580 scenarios across 6 domains using three production frameworks: LangChain, LlamaIndex, and Vectara. This benchmark…

Use case: Ensuring the safety and reliability of ML agents
Difficulty: Hard
Cost: High

SciExplore: Evaluating Autonomous Agents from Scientific Navigation to Information Integration

Scientific research involves complex information-seeking and reasoning workflows across heterogeneous sources.

Tags: Natural language processing, Large language models, Generation, QA, Text

Use case: Generation
Difficulty: Hard
Cost: High

Scientific exploration, collaboration and labor division in the large language model era

Large language models (LLMs) have rapidly and significantly entered scientific workflows, but it remains unclear…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

Traceable Scholarship: Page Anchors and Ariadne's Thread for Humanistic Inquiry in the Age of Generative AI

Generative AI lets large language models produce scholarly-looking text within seconds, yet fluency does not…

Use case: Generation
Difficulty: Hard
Cost: High

Is Deep Research Reliable? Misleading Knowledge Induces False Conclusions

Deep Research agents extend LLM-based assistants into long-horizon workflows involving planning, retrieval,…

Use case: Generation
Difficulty: Hard
Cost: High

Code Monitor Red Teaming for Public-Test-Passing Code

Visible tests are a common gate for LLM-generated code, but passing them does not certify specification correctness…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

Auditing Evidence Use in Medical LLM Diagnosis

Medical LLMs are often evaluated by whether they select the correct diagnosis, but diagnostic accuracy alone…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

Auditing Provenance Sensitivity in LLM Agent Action Selection

LLM agents choose tools and arguments from context that mixes user requests, tool outputs, retrieved records,…

Tags: Deep learning, Transformer, Detection, Text

Use case: Detection
Difficulty: Hard
Cost: High

Tags: Explainable, Suited to MI, Quality prediction/anomaly detection, Deep learning, Transformer, Classification, Generation, Image

Enhancing Explainable Cardiac Diagnosis with Guide-Grounded Multimodal LLMs

The electrocardiogram (ECG) is a cornerstone of cardiac assessment, yet clinical deployment of deep learning…

Use case: Classification
Difficulty: Hard
Cost: High

Profiling Lightweight Large Language Models

Lightweight large language models (LLMs) are increasingly being deployed locally on personal computers and are…

Use case: Generation
Difficulty: Hard
Cost: High

Search Hardness-Aware LLM-Based Problem Formulation for Expensive Simulation-Driven Design

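Simulation-driven design aims to reach good designs with as few high-fidelity simulations as possible. Existing methods have tackled this by improving optimization algorithms, but the problem formulation itself has not been examined. This paper…

Tags: Deep learning, Compression/quantization, Generation

Use case: Cost-efficient simulation-driven design
Difficulty: Hard
Cost: High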

MedGame: Storytelling Gamification Empowered by Large Language Models for Medical Education

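Large Language Models (LLMs) have great potential in medical education, but current systems only answer questions or give one-off feedback. Meanwhile, clinical cases…

Tags: Natural language processing, Large language models, Generation, QA, Text

Use case: Applying large language models to medical education
Difficulty: Hard
Cost: High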

When Trivia Is Not Trivial: Everyday Knowledge Failures in Multilingual LLMs

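This paper focuses on how well large language models (LLMs) handle everyday cultural knowledge. It poses quiz-style questions, TriviaRoomQA, to examine how LLMs deal with everyday cultural knowledge…

Use case: Evaluating everyday knowledge in large language models
Difficulty: Hard
Cost: High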

Tags: Sensor/time series, Natural language processing, Large language models, Generation, Text, Speech/audio

An Evaluation Framework for Structured Audio Captions Validated by Controlled Perturbations

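This paper proposes an evaluation method for audio captions that aims to overcome the limitations of existing methods. The proposed framework evaluates each aspect of an audio caption. Instead of relying on question-answering-style evaluation, it assesses the neutrality of captions…

Use case: Building an evaluation framework for audio captions
Difficulty: Hard
Cost: High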

Capital Markets LLM Reliability Score (CM-LRS): From Plausible to Bankable

In capital-markets workflows the question is rarely whether a large language model can produce a fluent draft,…

Use case: Generation
Difficulty: Hard
Cost: High

Tags: Quality prediction/anomaly detection, Natural language processing, Large language models, Text

news-crawler-LM: A Small Long-Context Model For High-Quality News Crawling

Extracting structured content from news pages remains challenging due to heterogeneous HTML layouts, inconsistent…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

A Unified Moral-Value Dataset for Instruction Tuning

Large language models (LLMs) have developed rapidly and become valuable tools in everyday life. However, how…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

Tags: Explainable, Quality prediction/anomaly detection, Natural language processing, Large language models, Generation, Text

PrefReward: Learning User Preference Matrix for Personalized Text Generation

Large Language Models (LLMs) have demonstrated remarkable ability in generating personalized content by leveraging…

Use case: Generation
Difficulty: Hard
Cost: High

QuantiBias: Benchmarking Quantization-Induced Bias in LLMs

Almost every large language model that reaches a broad audience is quantized: trained in full precision, then…

Use case: Generation
Difficulty: Hard
Cost: High

CultureTalk-ID: A Multi-Task Dialogue Benchmark for Cultural Commonsense in Indonesian Local Languages

Culture is lived through conversation, yet existing Indonesian cultural commonsense benchmarks evaluate LLMs…

Tags: Natural language processing, Large language models, Translation, Text

Use case: Translation
Difficulty: Hard
Cost: High

Where Animacy Lives in Large Language Models: Tracing the Circuits of the Animacy Concept

Distinguishing animate from inanimate concepts in written language requires more than shallow text processing,…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

Tags: Quality prediction/anomaly detection, Deep learning, Transformer, Generation, Text

Transformer-Assisted LLM-Based Source Code Summarisation: to Enable More Secure Software Development

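A study that aims to improve models that generate natural-language explanations of source code during the maintenance phase of software development.

Use case: Speeding up software development
Difficulty: Hard
Cost: High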

Tencent WorkBuddy Bench: A Multi-Domain Coding-Agent Benchmark with Contamination-Resistant Task Construction

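Introduces a benchmark for evaluating coding agents, with tasks built from real-world commits and pull requests.

Tags: Natural language processing, Large language models, Image, Text

Use case: Evaluating coding agents
Difficulty: Hard
Cost: High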

LegalCiteTrust: Benchmarking Citation Trustworthiness in Chinese Long-Form Legal Research Reports

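Proposes LegalCiteTrust to assess the trustworthiness of citations in long-form Chinese-language legal research reports, and to detect and evaluate unreliable citations.

Use case: Improving the reliability of legal research reports
Difficulty: Hard
Cost: High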

REFACT: Adaptive Fact Restatement for Compact and Faithful Chain-of-Thought Reasoning

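Points out that language models performing long-form reasoning may produce reasoning that drifts away from the provided context. To integrate the context with the reasoning more closely, proposes REFACT (REstating Facts in Adapti…

Use case: Improving chain-of-thought (CoT) reasoning
Difficulty: Hard
Cost: High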

Out of Sight, Still in Mind: Token Compression for Omni-LLMs

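The goal of this paper is to reduce the input token cost of Omni-modal large language models (Omni-LLMs) at inference…

Tags: Natural language processing, Large language models, Image, Text, Speech/audio

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High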

HalluScope: Fine-grained Hallucination Diagnosis for Multimodal Large Language Models

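Multimodal large language models perform well at turning a wide range of images into text, but diagnosing the hallucinations they produce still lacks a solution. To address the shortcomings of mainstream coarse-grained detection methods, this study proposes a fine-grained diagnosis method.

Tags: Explainable, Natural language processing, Large language models, Classification, Detection, Generation

Use case: Hallucination diagnosis
Difficulty: Hard
Cost: High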

Geo3R: Mitigating Spatial Reasoning Hallucination in Multimodal Large Language Models

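When multimodal large language models reason about the 3D spatial relations of objects, the lack of visualization leads to hallucinations. This study proposes an approach to mitigate these hallucinations.

Tags: Natural language processing, Large language models, Image, Text, 3D

Use case: Mitigating hallucination in 3D spatial reasoning
Difficulty: Hard
Cost: High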

Tags: Deep learning, Transformer, Text, Multimodal

C-PTQ: Fisher-weighted Channel-wise Sensitivity for Post-training Quantization of MLLMs

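Compressing large language models can degrade their performance, so care is needed during quantization. This study proposes C-PTQ, which uses Fisher-weighted channel-wise sensitivity to stabilize post-training quantization of MLLMs.

Use case: Compressing large language models
Difficulty: Hard
Cost: High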
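The summary does not give the exact C-PTQ formulation. As a hedged illustration of what "Fisher-weighted channel-wise sensitivity" can mean, one common recipe scores each output channel by its quantization error, weighted by a diagonal empirical-Fisher estimate (mean squared gradients collected on calibration data). The round-to-nearest quantizer, the 4-bit default, and the function names below are assumptions.

```python
from typing import List

Matrix = List[List[float]]

def quantize_symmetric(w: float, scale: float, bits: int = 4) -> float:
    """Uniform symmetric round-to-nearest quantization, dequantized back to float."""
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax - 1, min(qmax, round(w / scale)))
    return q * scale

def fisher_channel_sensitivity(weights: Matrix, grads_sq: Matrix, bits: int = 4) -> List[float]:
    """Score each output channel (row) by its Fisher-weighted quantization error.

    grads_sq holds a diagonal empirical-Fisher estimate (mean squared gradient)
    for each weight, assumed to come from a small calibration set.
    """
    qmax = 2 ** (bits - 1) - 1
    scores = []
    for row, fisher_row in zip(weights, grads_sq):
        scale = max(abs(x) for x in row) / qmax or 1.0   # per-channel scale; 1.0 if all zero
        err = sum(f * (w - quantize_symmetric(w, scale, bits)) ** 2
                  for w, f in zip(row, fisher_row))
        scores.append(err)
    return scores
```

Channels with the highest scores are natural candidates for protection, for example by keeping them in higher precision. How C-PTQ actually uses the scores is not stated in the summary.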

Show, Don't Tell: Evaluating Spatial Cognition in Generative Pixels Rather Than LLM Text

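Spatial understanding is essential for connecting the physical world with static semantic understanding. In many spatial tasks, locations, regions, and paths are most naturally expressed within the continuous visual scene, for example by pointing or marking, yet current spatial reasoning…

Use case: Spatial understanding
Difficulty: Hard
Cost: High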

Tags: Natural language processing, Large language models, Image, Text, Multimodal

Do Pathology Vision-Language Models Truly See Pathology?

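Vision-language models are now widely evaluated on pathology recognition, but this study questions whether their visual perception actually does the work in recognizing pathology.

Use case: Pathology recognition
Difficulty: Hard
Cost: High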

Tags: Deep learning, Transformer, Image, Text, Multimodal

MVEI & EmObserver: Empowering MLLM-Oriented Visual Emotional Intelligence via Emotion Statement Judgement

Emotion recognition is essential for advancing modern AGI, but large…

Use case: Emotion recognition
Difficulty: Hard
Cost: High

Engine-Native Editable 3D World Reconstruction with Objects and Lighting

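This paper proposes Lumera, a method for engine-native 3D world reconstruction and for detecting lights.

Tags: Natural language processing, Large language models, Detection, Generation, Image

Use case: 3D world reconstruction
Difficulty: Hard
Cost: High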

Tags: Quality prediction/anomaly detection, Natural language processing, Large language models, Image, Text, Video

ViSTR-Bench: Can MLLMs Reason from Continuous Visual Cues in Dynamic Scenes?

This paper proposes ViSTR-Bench, a benchmark that evaluates whether MLLMs can extract information from dynamic scenes.

Use case: Analyzing dynamic scenes
Difficulty: Hard
Cost: High

Agentic Designer: Progressive Multi-Agent Collaboration for Structure-Aware Interior Layout Generation

Generating realistic interior furniture layouts that strictly adhere to architectural constraints (e.g., walls…

Use case: Generation
Difficulty: Hard
Cost: High

FORGE-plus: Force-Budgeted Recovery for Contact-Rich Assembly with a Frozen LLM Supervisor

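Proposes reinforcement learning for robot control based on force constraints. It enables low-cost, high-precision assembly while letting the robot recover safely when an assembly attempt fails.

Use case: Contact-rich robotic assembly
Difficulty: Hard
Cost: High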

Are Diversity Metrics Measuring Diversity? A Capability-Controlled Audit of Majority-Vote Gain in LLM Ensembles

Majority voting over LLMs is widely assumed to benefit from diversity, and diversity measures are used to choose…

Tags: Natural language processing, Large language models, Regression

Use case: Regression
Difficulty: Hard
Cost: High
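Majority voting and a diversity metric are easy to conflate, and a small example shows the distinction. The sketch below computes majority-vote gain over the best single member alongside pairwise disagreement, one common diversity measure. It is a generic illustration, not the paper's capability-controlled audit, and the toy predictions in the usage check are invented.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence

def majority_vote(answers: Sequence[str]) -> str:
    """Most common answer; ties go to the answer encountered first."""
    return Counter(answers).most_common(1)[0][0]

def pairwise_disagreement(preds: List[Sequence[str]]) -> float:
    """Mean fraction of items on which two ensemble members disagree."""
    pairs = list(combinations(preds, 2))
    n_items = len(preds[0])
    return sum(sum(a != b for a, b in zip(p, q)) / n_items for p, q in pairs) / len(pairs)

def ensemble_gain(preds: List[Sequence[str]], gold: Sequence[str]) -> float:
    """Majority-vote accuracy minus the best single member's accuracy."""
    def acc(p: Sequence[str]) -> float:
        return sum(x == y for x, y in zip(p, gold)) / len(gold)
    votes = [majority_vote(column) for column in zip(*preds)]
    return acc(votes) - max(acc(p) for p in preds)
```

Consider members that each err on a different item. Voting fixes every error, so the gain is large. Members that are all wrong in different ways show maximal disagreement but zero gain. Disagreement alone therefore does not predict voting benefit.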

GaugeQuant: Online Learning of Quantization-Optimal Bases from LLM Symmetries

Transformers are known to have internal continuous symmetries that leave outputs invariant, while modifying quantization…

Tags: Deep learning, Transformer, Text

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High
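The simplest instance of such a symmetry is an orthogonal change of basis between a layer's input and its weights. For any orthogonal R, (xR)(RᵀW) = xW, so the output is unchanged while the weight values a quantizer sees are not. This is a generic illustration, not GaugeQuant's online learning procedure, and the matrices are made up.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]

def rotation(theta: float) -> Matrix:
    """2x2 orthogonal matrix, so R @ R^T = I."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

x = [[3.0, -1.0]]               # one activation row vector
W = [[10.0, 0.2], [0.1, 0.3]]   # weights with one large "outlier" entry
R = rotation(math.pi / 4)

y_original = matmul(x, W)
W_rotated = matmul(transpose(R), W)           # what a quantizer would actually see
y_rotated = matmul(matmul(x, R), W_rotated)   # (xR)(R^T W) equals xW
```

Here the rotation spreads the 10.0 outlier across two entries and shrinks the largest weight magnitude. Rotation-based quantization methods exploit this kind of effect.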

LLMs Get Lost in Evolving User Intent

As LLMs become more capable, they are increasingly deployed as collaborative agents, taking on user-delegated…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

Frontier Financial Judgement: Can agents tell what might move a stock?

We introduce Frontier Financial Judgement, a challenging new benchmark developed in collaboration with professional…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

Are Single-Token Sparse Autoencoder Features Causally Necessary? Layer-Depth and SAE-Family Effects

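Sparse autoencoder (SAE) features are used to interpret and steer large language models, yet whether a feature…

Tags: Explainable, Natural language processing, Large language models, Text

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High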

The Blessing of Dimensionality: How Near-Orthogonality in High-Dimensional Spaces Explains Temporal Portability

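This study examines adaptability over time when the data distribution does not change…

Use case: Temporal adaptability without distribution shift
Difficulty: Hard
Cost: High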

Tags: Suited to MI, Deep learning, Transformer, Generation, Text, Reinforcement learning

OLEDLM: A Unified Language Model for OLED Molecular Design

Proposes a new approach to developing OLED materials: a framework that uses causal language models to predict optoelectronic properties.

Use case: Developing OLED materials
Difficulty: Hard
Cost: High

Tags: Quality prediction/anomaly detection, Natural language processing, Large language models, Text

Co-Evolving LLM Evaluators and Policies via DynamicRubric

Post-training with evaluator feedback on policy-induced samples serves as a major mechanism for improving large…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

HijackKV: New Threat in Position-Independent KV Cache Reuse

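This study proposes a method for improving position-independent KV (key-value) caching in deep learning models used for malware detection.

Use case: Improving position-independent KV caching in malware detection
Difficulty: Hard
Cost: High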

Tags: Suited to tabular data, Natural language processing, Large language models, Text, Tabular

Auto-Fill: Learning to Predict Missing Values Accurately with Specialist Language Models

Predicting missing cell values in tabular data is a fundamental problem in data cleaning. While state-of-the-art…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

TriAgent: Divergence-Aware Multi-Agent Committees for Cost-Efficient Financial Sentiment Analysis

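Proposes a method for financial sentiment analysis using generative language models. It uses a committee of multiple agents and, to handle text at different granularities, combines a word-level rule-based approach, a phrase…

Tags: Deep learning, Transformer, Detection, Text

Use case: Sentiment analysis in finance
Difficulty: Hard
Cost: High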

AlphaRoute: Large Language Models as Semantic Optimizers for Multi-Objective Routing

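VLSI global routing assigns signal nets to paths on a 3D grid, and signal delay and…

Tags: Explainable, Natural language processing, Large language models, Text, 3D

Use case: Multi-objective routing
Difficulty: Hard
Cost: High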

Efficient Clustering with Provable Guardrails for LLM Inference at Scale

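Scaling LLM-based applications to millions of users is bottlenecked by the inference cost and latency of modern…

Tags: Quality prediction/anomaly detection, Deep learning, Compression/quantization

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High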

Refusal-Gated Decoding: Preserving Refusal Behavior Under High-Temperature Sampling

High-temperature sampling is one of the primary mechanisms for increasing diversity in LLMs. Recent advances in…

Use case: Generation
Difficulty: Hard
Cost: High
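A short sketch of temperature sampling shows why high temperatures can erode refusals. Dividing logits by T > 1 moves probability mass from the dominant token to the tail. The dominant token here is assumed to be a refusal-opening token. This shows only standard temperature sampling, not the paper's refusal-gated decoding, and the logits are invented.

```python
import math
import random
from typing import List, Sequence

def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Divide logits by T before softmax: T > 1 flattens, T < 1 sharpens."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                                 # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p: Sequence[float]) -> float:
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def sample(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Index 0 stands in for a hypothetical refusal-opening token.
logits = [5.0, 2.0, 1.0, 0.5]
p_low = softmax_with_temperature(logits, 0.7)
p_high = softmax_with_temperature(logits, 2.0)
```

With these logits, p_high[0] is noticeably smaller than p_low[0]. Repeated high-temperature sampling therefore starts responses with the refusal token less often, while the entropy of the distribution rises.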

HARP: The Human--AI Research Platform

Large language models (LLMs) have shifted human-computer interaction from "traditional" interface journeys…

Tags: Suited to MI, Natural language processing, Large language models, Text

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

IssueTrojanBench: Benchmarking AI Coding Agents Against Malicious Issue Requests

AI coding agents powered by LLMs are increasingly integrated into real-world software development, where they…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

GPE: Evaluating Robust Evidence Aggregation for Fact Verification under Controllable GEO-Style Poisoning

Large language models increasingly use search tools to retrieve up-to-date information, introducing a new attack…

Use case: Generation
Difficulty: Hard
Cost: High

NVIDIA-labs OO Agents: Native Python Object-Oriented Agents

Traditional agent development is split across prompt templates, tool schemas, callback code, and workflow graphs…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

WaveformQA: Benchmarking LLM Temporal Reasoning on Digital Waveforms

Large Language Models (LLMs) have demonstrated strong capabilities in code generation and reasoning, yet their…

Use case: Generation
Difficulty: Hard
Cost: High

Tags: Quality prediction/anomaly detection, Image inspection, Deep learning, Compression/quantization, Generation, Image, Text

Demonstrating GenDB: Instance-Optimized and Customized Query Processing Code Generation via LLM Agents

Traditional query processing engines require continuous development and extensions to support new techniques…

Use case: Generation
Difficulty: Hard
Cost: High

Understanding Generative AI-mediated User Engagement with Academic Library Resources

This study empirically analyzed generative AI as an emerging discovery pathway to academic library resources.

Use case: Generation
Difficulty: Hard
Cost: High

Sound Probabilistic Safety Bounds for Large Language Models

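Proposes a new framework for computing sound safety bounds that guard large language models (LLMs) against harmful generations. Applying Clopper-Pearson confidence intervals in a new way, it obtains PAC (probably approximately correct) bounds…

Tags: Deep learning, Compression/quantization, Generation, Text, Speech/audio

Use case: Bounding the risk of harmful generations
Difficulty: Hard
Cost: High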
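The following sketch gives intuition for how a Clopper-Pearson interval yields a PAC-style guarantee. Suppose k harmful generations are observed in n independent samples. The one-sided upper bound p_u satisfies P[X ≤ k] = α under Binomial(n, p_u). The true harmful-generation rate is then at most p_u with confidence 1 - α; a two-sided interval would use α/2 instead. This is the textbook interval only. The paper's framework and its sampling procedure are not reproduced.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """P[X <= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson_upper(k: int, n: int, alpha: float = 0.05, iters: int = 100) -> float:
    """One-sided (1 - alpha) Clopper-Pearson upper bound on a failure rate.

    Finds p_u with P[X <= k | n, p_u] = alpha by bisection. The CDF is
    decreasing in p, so the root is unique.
    """
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid      # CDF still above alpha: the bound lies higher
        else:
            hi = mid
    return hi
```

With zero harmful outputs in 100 samples, the 95% bound is 1 - 0.05^(1/100), about 0.0295. This is close to the familiar "rule of three" estimate of 3/n.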

Tags: Suited to tabular data, Natural language processing, Large language models, Generation, Text

PoTRE: Test-Time Reasoning inspired by Cognitive Heterogeneity

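Introduces PoTRE, a heterogeneous framework split into four agents, to address weaknesses in model reasoning. It strengthens reasoning so that, compared with a single-stream approach, the model copes better with complex theoretical constraints and abstraction…

Use case: Solving complex reasoning tasks
Difficulty: Hard
Cost: High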

The Ethics of Autonomous AI Agents for Offensive Security

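Conventional penetration-testing tools are deterministic, have narrowly defined scopes, and are operated by skilled experts. LLM-driven autonomous security tools differ from them and introduce uncertainty along three dimensions: decisions that are hard to explain, open-ended impact, and…

Use case: Ethical considerations for autonomous security tools
Difficulty: Hard
Cost: High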

Tags: Quality prediction/anomaly detection, Natural language processing, Large language models, Translation, Text

On the Systematic Challenges of Culturally Loaded Machine Translation: Dream of the Red Chamber as the Cultural Lens

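Points out that translating culturally loaded expressions requires a translation system to take each expression's cultural background into account in order to understand its meaning. Translating such expressions poses several challenges, and LLM-based translation…

Use case: Translating culturally loaded meaning
Difficulty: Hard
Cost: High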

Tags: Quality prediction/anomaly detection, Natural language processing, Large language models, Generation

DQAOA-GPT: AI-Accelerated Distributed Quantum Optimization for Combinatorial Problems

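Presents a new framework for solving combinatorial optimization problems. Distributed quantum algorithms face local limitations. To still find good solutions, the framework combines a distributed quantum approximate optimization approach with deep learning.

Use case: Combinatorial optimization
Difficulty: Hard
Cost: High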

Small, Free, and Effective: Orchestrating Open-Weight Small Language Models to Outperform Single LLM for Malware Analysis

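Malware analysis often demands rapid interpretation of analysis reports, yet closed-weight large language models frequently cannot be used for it. Open-weight language models offer language capabilities suited to malware analysis and, compared with closed-weight large language…

Use case: Small language models for malware analysis
Difficulty: Hard
Cost: High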

SLAI T-Rex: Full-Parameter Post-training of the DeepSeek-V4 Family on Ascend SuperPOD

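Full-parameter post-training of trillion-parameter-scale MoE models introduces substantial system-level challenges…

Tags: Quality prediction/anomaly detection, Deep learning, Compression/quantization, Text

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High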

Reinforcement Learning for Large Language Model Selective Evidence Adoption from Contaminated Retrieval Results

Retrieval-augmented large language models frequently face contexts that interleave useful evidence with misleading…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

PRO-LONG: Programmatic Memory Enables Long-Horizon Reasoning

Long-horizon tasks require sustained perception, reasoning, and exploration, and are a persistent challenge for…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

Reading and Steering Representations of Materials-Science Mechanisms in an Open-Weight Language Model

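Large language models can answer scientific questions, yet a correct output does not reveal whether the model…

Tags: Suited to MI, Natural language processing, Large language models, Text

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High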

Language-Specific versus Cross-Lingual Knowledge Graphs for Implicit Aspect Identification in Arabic: A Comparative Study of Reasoning and Adaptation Strategies

Aspect-based sentiment analysis (ABSA) in Arabic must recover both explicitly stated aspects and implicit aspects…

Use case: Generation
Difficulty: Hard
Cost: High

Geometric Configurations of Perturbed Jailbreak Prompts

Perturbation techniques that turn unsuccessful jailbreak prompts into successful ones are continuously evolving…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

EvoDRC: A Self-Evolving Agentic Framework for Automated DRC Violation Repair

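Develops EvoDRC, an automated repair framework that accelerates Design Rule Check (DRC) closure by performing fixes that account for complex geometric interactions.

Tags: Deep learning, Transformer

Use case: Automating design-rule-violation repair
Difficulty: Hard
Cost: High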

Rushes: A Human Preference Dataset for Pluralistic Alignment

We introduce Rushes, a dataset and benchmark for studying revealed human engagement preferences in interactive…

Use case: Generation
Difficulty: Hard
Cost: High

REGARD: Regional Affective Differences in Large Language Models

Large language models trained and aligned within different linguistic and regional ecosystems may frame the same…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

Tags: Quality prediction/anomaly detection, Deep learning, Compression/quantization, Generation, Image, Text

Learning to Detect UI Principle Violations via Reinforcement Learning

Small language models and coding agents increasingly generate web front-end code, yet their outputs are typically…

Use case: Generation
Difficulty: Hard
Cost: High

LKValues: Aligning Large Language Models with Sri Lankan Societal Values

Value alignment of Large Language Models (LLMs) has been shown to be culturally biased toward Western norms…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

Notes to Self: Can LLMs Benefit from Experiential Abstractions?

Humans distill experience into reusable abstractions, e.g., strategies and cautionary reminders, and apply…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

PyroDash: Cost-Efficient Token-Level Small-Large Language Model Collaborative Inference

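Develops a technique in which a large language model, which gives correct answers to risky questions, collaborates with a cost-efficient small language model.

Use case: Making small and large language models collaborate efficiently and safely
Difficulty: Hard
Cost: High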
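One common way to split work at the token level between a small and a large model is confidence-threshold routing. The small model emits a token when it is confident and defers to the large model otherwise. The summary does not say whether PyroDash uses this criterion. The model interface, the 0.6 threshold, and the function name below are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: token prefix -> next-token probability distribution.
Model = Callable[[List[str]], Dict[str, float]]

def collaborative_decode(small: Model, large: Model, prompt: List[str],
                         max_new: int, threshold: float = 0.6) -> Tuple[List[str], int]:
    """Greedy decoding where the small model keeps confident tokens and
    escalates low-confidence tokens to the large model.

    Returns the full token sequence and the number of tokens the large model produced.
    """
    out, large_calls = list(prompt), 0
    for _ in range(max_new):
        dist = small(out)
        token, conf = max(dist.items(), key=lambda kv: kv[1])
        if conf < threshold:                  # small model is unsure: ask the large one
            large_dist = large(out)
            token = max(large_dist, key=large_dist.get)
            large_calls += 1
        out.append(token)
        if token == "<eos>":
            break
    return out, large_calls
```

The large model is called only for uncertain positions, so cost scales with how often the small model hesitates. The threshold trades that cost against quality.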

Which Values Do LLMs Confuse? A Schwartz-Based Recognition Study

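This study investigates whether LLMs can identify the values at stake in a given situation. The LLMs were able to infer the underlying values from the situation.

Use case: Verifying whether LLMs correctly recognize values
Difficulty: Hard
Cost: High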

Exposure is Optional: Learning Unlike Coordination in Language Models

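Challenging the view that only elements of the same category can be coordinated, this study investigates whether language models can coordinate elements of different categories.

Use case: Verifying whether different categories can be coordinated
Difficulty: Hard
Cost: High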

Evaluating the Effectiveness of Persona Simulation in Opinion Prediction with GPT-4.1

Persona simulation involves utilizing large language models (LLMs) to anticipate human choices or interactions…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

HalluTruthQA: A Fine-Grained Benchmark for Hallucination Detection, Localization, and Explanation in Arabic Question Answering

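Large language models appear to provide truthful information but often produce false statements. HalluTruthQA was developed as a benchmark for detecting, localizing, and explaining such hallucinations.

Tags: Natural language processing, Large language models, Detection, QA, Text

Use case: Benchmarking the detection, localization, and explanation of hallucinated answers
Difficulty: Hard
Cost: High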

surprisal is Not a Theory

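Large language models can respond differently to different values; this study examines what effect these responses have.

Use case: Examining how LLMs respond to different values
Difficulty: Hard
Cost: High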

Gotta Catch them all: the modes of Sycophancy

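Large language models tend to align with users' beliefs at the expense of factual correctness. This study shows that this tendency takes many different forms.

Use case: Examining the different types of sycophancy in large language models
Difficulty: Hard
Cost: High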

Tags: Suited to MI, Natural language processing, Large language models, Generation, Image, Text

Back to Back with a Copy: A Computational Analysis of AI-Generated Visual Contemporary Art Pastiches

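AI is particularly good at creating pastiches of contemporary artworks. This study examines how closely these pastiches resemble the original works.

Use case: Measuring similarity between AI-generated artworks and originals
Difficulty: Hard
Cost: High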

OpenSkillRisk: Benchmarking Agent Safety When Using Real-World Risky Third-Party Skills

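Evaluates the ability of LLM agents to recognize and avoid real-world risks posed by third-party skills.

Use case: Evaluating the risk of unsafe behavior when using third-party skills
Difficulty: Hard
Cost: High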

Understanding the Impact of Linguistic Realization Choices on LLM Stance with Causal Tracing

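Finds that the answers of large language models tend to vary with how the question or input is phrased.

Use case: Examining how linguistic realization choices affect LLM stance
Difficulty: Hard
Cost: High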

TINY_SCHILLER: A Drop-In German Drama Corpus for Small Language Models

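Introduces tiny_schiller, a drop-in corpus for small language models. It is distributed as a single file so that it can easily be used for prototyping, fine-tuning, teaching, and research.

Use case: Providing a drop-in corpus for small language models
Difficulty: Hard
Cost: High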

Overview of FinMMEval 2026 Task 1: Multilingual Financial Multiple-Choice Question Answering

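FinMMEval 2026 Task 1 evaluates answering multilingual financial questions in English, Chinese, Arabic, and Hindi.

Tags: Natural language processing, Large language models, QA, Text

Use case: Answering financial questions
Difficulty: Hard
Cost: High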

D2VBench: Benchmarking Large Language Models with Value Dilemmas in Daily Scenarios

With the wide application of large language models (LLMs) in real-world scenarios, the value implication of…

Use case: Technical validation / paper-reading support
Difficulty: Hard
Cost: High

VizRAG: Enhancing Retrieval-Augmented Generation with Hypergraph Visualization

Hypergraph-based RAG systems surpass traditional graph-based approaches by organizing complex n-ary atomic facts…

Use case: Generation
Difficulty: Hard
Cost: High

Rewarding Better Thinking for LLM Preference Alignment

Accumulated bias is a problem in many LLMs. This study proposes a new approach to addressing bias in LLMs.

Use case: Addressing bias in LLMs
Difficulty: Hard
Cost: High

Beyond Relevance-Centric Retrieval: Rubric-Oriented Document Set Selection and Ranking

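3D occupancy prediction requires visual methods that interpret where objects are and how densely they occupy space. Conventional methods were computationally too expensive, but the newly proposed GaussianSeed algorithm organizes its layers hierarchically, reducing computational cost…

Use case: Predicting object placement and density in 3D space
Difficulty: Hard
Cost: High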

Reference-Free Evaluation of Reasoning in Open-Ended Question Answering

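This study stresses that evaluating the reasoning in AI-generated outputs requires making explicit how an output arrives at its result. It decomposes outputs and uses natural language inference to understand their logical structure, and the outputs

NLP | LLM | QA | Text

Use case: Evaluating the reasoning of AI-generated outputs
Difficulty: Hard
Cost: High

Quality prediction/Anomaly detection | NLP | LLM | Generation | Video | Reinforcement learning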

PercepCap: Video Captioner with Structured Spatio-Temporal Perception

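Spatial and temporal understanding is essential for video captioning. By decomposing video input into spatio-temporal perception, the PercepCap algorithm improves the understanding reflected in generated captions and detects spatio-temporal errors more accurately, and the

Use case: Structured spatio-temporal understanding for video captioning
Difficulty: Hard
Cost: High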

Look Less, Think Faster: Joint Token-Compute Adaptation for Multimodal LLMs

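Multimodal large language models are strong at vision-language tasks, but their high inference cost is a problem. By optimizing each unit dimension individually, the Look Less, Think Faster algorithm makes multimodal large language

Deep learning | Compression/Quantization | Image | Text | Multimodal

Use case: Reducing the cost of vision-language tasks with multimodal large language models
Difficulty: Hard
Cost: High

NLP | LLM | Image | Text | Multimodal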

Diverse-Intent Multi-Turn Fashion Image Retrieval

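Multi-turn fashion image retrieval is an important task in real-world fashion search. The Diverse-Intent Multi-Turn Fashion Image Retrieval approach handles different search intents

Use case: Multi-turn fashion image retrieval
Difficulty: Hard
Cost: High

Sensor/Time series | Deep learning | Compression/Quantization | QA | Image | Text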

Multimodal Large Language Models for Remote Sensing Image Understanding: Domain-Specific or General-Purpose?

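Multimodal large language models for image understanding are powerful, but their capabilities and limitations are still not clearly understood. This paper analyzes how capable these models are in image understanding and where their limits lie, and

Use case: Capabilities and limitations of multimodal large language models in image understanding
Difficulty: Hard
Cost: High

Sensor/Time series | Quality prediction/Anomaly detection | NLP | LLM | Generation | Image | Text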

RS-RIE-Bench: Benchmarking Reasoning-Guided Remote Sensing Image Editing

Remote sensing image editing aims to modify remote sensing images according to natural language instructions w

Use case: Generation
Difficulty: Hard
Cost: High

NLP | LLM | Image | Text | Multimodal

Development of an automated, reliable, and clinically meaningful artificial intelligence (AI) tool for diagnosing cardiac disease from conventional cardiovascular magnetic resonance (CMR) images

Aims: Cardiovascular magnetic resonance (CMR) imaging enables non-invasive assessment of myocardial structure,

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

MI-oriented | Quality prediction/Anomaly detection | NLP | LLM | Generation | Image | Text

ETPDesigner: Multi-Agent Orchestration for Interactive Multimodal Electronic Theater Program

ETPDesigner proposes a framework that automates the design of multimodal electronic theater programs.

Use case: Generation
Difficulty: Hard
Cost: High

MV-Bench: Benchmarking Multimodal Large Language Models for Coordinated Multi-View Interface Construction

Multimodal large language models (MLLMs) are increasingly expected to automate visualization development by ge

Use case: Generation
Difficulty: Hard
Cost: High

NLP | LLM | Segmentation | Image | Text

Memory-Augmented Multimodal Large Language Models for Small Object Understanding in Streaming Aerial Videos

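This study develops a memory-augmented large language model for recognizing small objects from drones. The model can identify objects in complex drone scenes by following user instructions.

Use case: Object recognition from drones
Difficulty: Hard
Cost: High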

LENS: LLM-guided Environment Simplification for Planning and Control in Clutter

Despite recent advances in general-purpose robotic manipulation, real-world multi-object clutter remains chall

Deep learning | Compression/Quantization | Multimodal

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Total Variation Distance Estimation in Autoregressive Models

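A study of total variation distance estimation in autoregressive models, useful for evaluating how accurately LLMs can be identified. It proposes three access models and different estimation methods, and in experiments, the estimation

Use case: Estimating TV distance to evaluate the accuracy of LLM identification
Difficulty: Hard
Cost: High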
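
For readers new to the quantity in the entry above: total variation distance between two sequence distributions is TV(p, q) = 0.5 * sum_x |p(x) - q(x)|, which equals the expectation over x ~ p of max(0, 1 - q(x)/p(x)). The sketch below is a minimal, hypothetical illustration of that identity as a Monte Carlo estimator. It assumes we can sample from p and compute exact sequence probabilities under both models. The toy Markov-chain "models", their probabilities, and all function names are invented for illustration. This is not the paper's estimator, nor any of its three access models.

```python
import itertools
import math
import random

# Hypothetical toy "autoregressive models": first-order Markov chains over a 3-token vocabulary.
# init[a] = P(x_1 = a); trans[a][b] = P(x_t = b | x_{t-1} = a). All probabilities are made up.
VOCAB = [0, 1, 2]
P = {"init": [0.5, 0.3, 0.2],
     "trans": [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]]}
Q = {"init": [0.4, 0.4, 0.2],
     "trans": [[0.5, 0.3, 0.2], [0.3, 0.4, 0.3], [0.3, 0.2, 0.5]]}


def seq_logprob(model, seq):
    """Log-probability of a whole sequence via the chain rule."""
    lp = math.log(model["init"][seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        lp += math.log(model["trans"][prev][cur])
    return lp


def sample(model, length, rng):
    """Ancestral (token-by-token) sampling from the model."""
    seq = [rng.choices(VOCAB, weights=model["init"])[0]]
    for _ in range(length - 1):
        seq.append(rng.choices(VOCAB, weights=model["trans"][seq[-1]])[0])
    return seq


def exact_tv(p, q, length):
    """TV(p, q) = 0.5 * sum_x |p(x) - q(x)|, enumerating every sequence (tiny cases only)."""
    total = 0.0
    for seq in itertools.product(VOCAB, repeat=length):
        total += abs(math.exp(seq_logprob(p, seq)) - math.exp(seq_logprob(q, seq)))
    return 0.5 * total


def mc_tv(p, q, length, n, seed=0):
    """Unbiased Monte Carlo estimate of TV(p, q) = E_{x~p}[max(0, 1 - q(x)/p(x))].
    Needs samples from p and sequence probabilities under both p and q (q > 0 assumed)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        x = sample(p, length, rng)
        ratio = math.exp(seq_logprob(q, x) - seq_logprob(p, x))
        acc += max(0.0, 1.0 - ratio)
    return acc / n


if __name__ == "__main__":
    print("exact:", exact_tv(P, Q, 4))
    print("Monte Carlo:", mc_tv(P, Q, 4, 20000))
```

Exact enumeration costs |V|^T sequences, which is why sampling-based estimates matter for real LLMs. The Monte Carlo estimate is unbiased, so the practical concern is its variance.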

Optimizing Regret

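By exploiting the covariance between the decision function and the cost function, optimizing the loss function can enable appropriate action decisions. Building on this, it considers directions for optimizing the covariance trend, which help derive models with accurately predicted outcomes

Use case: Optimizing loss functions for appropriate action decisions
Difficulty: Hard
Cost: High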

Adaptive Capitulation: A Structural Failure Mode of LLM Responses in Vulnerability Contexts

Large language models operating in emotionally sensitive contexts face a structural trilemma: when users in vu

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Scaling Laws for Hypernetwork-Based Knowledge Injection in Large Language Models

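Proposes a hypernetwork-based knowledge injection method and examines how to reliably inject knowledge into large language models.

NLP | LLM | Anomaly detection | Text

Use case: Injecting knowledge into LLMs
Difficulty: Hard
Cost: High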

Twin Agent: Context Residual Compression for Privilege Separated Agents

Large language model (LLM) agents are vulnerable to security risks, such as prompt injection attacks from untr

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Knowledge-Centric Self-Improvement

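Studies knowledge-centric self-improvement and proposes a method that makes self-improvement more effective by centering it on knowledge.

Use case: Knowledge-centric self-improvement
Difficulty: Hard
Cost: High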

When Reasoning Narrows the Move: Diversity Collapse in LLM Game Play

Supervised fine-tuning (SFT) is widely used to adapt large language models to downstream tasks, but its effect

Use case: Generation
Difficulty: Hard
Cost: High

Copy Less, Ground More: Overcoming Repetitive Copying in Long-Context Reasoning via Evidence-Aware Reinforcement Learning

Large language models that generate step-by-step reasoning traces have achieved strong performance on complex

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Agents in the Wild: Where Research Meets Deployment

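Agents that leverage large language models (LLMs) and context are being used in product development and finance. Putting agents into practice requires ensuring robustness, safety, and reliability. In this tutorial, agent

Use case: Deploying agents in practice
Difficulty: Hard
Cost: High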

Two-Level Meta-Rubrics for Evaluating Open-Ended Generation: GAMUT, a Benchmark for Factual Completeness

Evaluating the factuality of long-form generations has focused predominantly on precision, measuring whether t

Use case: Generation
Difficulty: Hard
Cost: High

Tabular-oriented | NLP | LLM | Text | Tabular

Prompt Design at Scale: How Format, Instruction Count, and Context Length Shape Instruction Adherence and Hallucination in Large Language Models

Practitioners make three prompt-design decisions with almost no controlled evidence behind them: how to format

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Inference-Time Steering for Cross-Lingual Factual Consistency in LLMs

Although Large Language Models (LLMs) demonstrate remarkable multilingual fluency, their internal knowledge re

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

MeetingToM: Evaluating Multimodal LLMs on Theory-of-Mind Reasoning in Multi-Party Meetings

Theory of Mind (ToM), the ability to infer other's beliefs, intentions, and states of knowledge, is central to

NLP | LLM | QA | Text | Audio

Use case: QA
Difficulty: Hard
Cost: High

Quality prediction/Anomaly detection | NLP | LLM | Translation | Text | Reinforcement learning

The Price of Reasoning: Cost-Quality Tradeoffs in Reinforcement Learning for Neural Machine Translation

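This study proposes an evaluation method for student teams in tabletop exercises (TTX). It uses a TTX learning platform that can record team behavior and communication in complex, open-ended situations.

Use case: Assessing team problem-solving skills in computing education
Difficulty: Hard
Cost: High

Explainable | Quality prediction/Anomaly detection | NLP | LLM | Generation | Text | Reinforcement learning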

Beyond Score Prediction: LLM-Based Essay Scoring and Feedback Generation via Reinforcement Learning with Rubric Rewards

Large language models (LLMs) have been widely applied to automated essay scoring (AES) and automated feedback

Use case: Generation
Difficulty: Hard
Cost: High

Computational Humor with Multimodal LLMs: Methods, Datasets, Evaluation, and Challenges

Multimodal humor in memes, cartoons, and comics remains difficult for AI systems because intended meaning depe

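NLP | LLM | Classification | Generation | Image

Use case: Classification
Difficulty: Hard
Cost: High

Quality prediction/Anomaly detection | NLP | LLM | Classification | Detection | Generation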

AutoJourn: Multi-Perspective Summarisation, Bias Detection and Bias Neutralisation for LLM-Generated News in Automated Journalism

We present AutoJourn, a demonstration system for multi-perspective news generation and bias-aware evaluation u

Use case: Classification
Difficulty: Hard
Cost: High

Measuring Reward-Seeking via Contrastive Belief Updates

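This study proposes a new measurement method for quantifying reward-seeking in reinforcement learning. The method can show what manipulation a model attempts when obtaining reward.

Use case: Measuring reward-seeking in reinforcement learning
Difficulty: Hard
Cost: High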

Reasoning Error from Known Fact: Step-Level Self-Consistency Group Relative Policy Optimization for LLM

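People use large language models to carry out long chains of logical reasoning, but the results of such reasoning may be incorrect. This work proposes a method for verifying these statements.

Use case: Verifying logical reasoning within large language models
Difficulty: Hard
Cost: High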

HindsightBench: A Black-Box Behavioral Audit Protocol for Parametric Hindsight in Time-Indexed LLM Decision Tasks

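Large language models tend to leak parametric knowledge, including knowledge of events that have already occurred, while carrying out decision tasks. Although it is difficult to verify what decision-making an LLM actually performed, that this is indeed the

Use case: Verifying LLM-performed financial decision tasks
Difficulty: Hard
Cost: High

Quality prediction/Anomaly detection | NLP | LLM | Generation | QA | Text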

AILQA: Evaluating AI-Driven Legal Question Answering Systems for the Indian Legal System

This comprehensive study introduces an advanced Artificial Intelligence for Indian Legal Question Answering (A

Use case: Generation
Difficulty: Hard
Cost: High

CASE: Causal Alignment and Structural Enforcement for Improving Chain-of-Thought Faithfulness

Chain-of-thought (CoT) reasoning is widely used to improve both the performance and interpretability of large

Explainable | NLP | LLM | Generation | Text

Use case: Generation
Difficulty: Hard
Cost: High

AI Tour Meeting: Group Travel Planning by LLM Agents

This paper proposes AI Tour Meeting, a group travel planning framework powered by multiple Large Language Mode

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

RF-Agent: A Practical Framework for Building Language Agents for RFIC Design

Large language models (LLMs) have driven rapid progress in electronic design automation (EDA), yet their appli

Use case: Generation
Difficulty: Hard
Cost: High

BaseRT: Advancing Best-in-Class LLM Inference with Apple M5 Neural Accelerators

Apple's M5 generation introduces a redesigned GPU architecture in which every core carries a dedicated Neural

Use case: Generation
Difficulty: Hard
Cost: High

AgentDebugX: An Open-Source Toolkit for Failure Observability, Attribution, and Recovery in LLM Agents

LLM agent failures are difficult to debug because the step where an error surfaces is often not the one that c

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Find Before You Fine-Tune: A Diagnostic Study of Small LLMs for Cybersecurity QA

Large Language Models (LLMs) are increasingly fine-tuned for critical-domain Question-Answering (QA), yet choo

Use case: Classification
Difficulty: Hard
Cost: High

Semantic Primes as Explanans for Emotion in Large Language Models

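Studies how emotion is interpreted in large language models (LLMs), asking how emotional expressions can be explained by underlying subjective variables.

Use case: Interpreting emotion
Difficulty: Hard
Cost: High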

Fusion Embedding: A Unified Embedding Space for Text, Image, Video, and Audio

A single embedding space that covers text, images, video, and audio lets one index serve every query a user ca

Use case: Generation
Difficulty: Hard
Cost: High

Quality prediction/Anomaly detection | Deep learning | Transformer | Text

Mark, Don't Erase: Token Inoculation for Dual-Use Knowledge in LLMs

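This work adds control tokens to a model that holds hazardous knowledge. The goal is for the model to handle that knowledge according to the control tokens.

Use case: Safety management of dual-use knowledge
Difficulty: Hard
Cost: High

Sensor/Time series | NLP | LLM | Image | Text | Video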

D3VL: Understanding Driving Scenes from 3D Time Series Data and Video with Language Models

Recent advances in Multimodal Large Language Models (MLLMs) have triggered the development of end-to-end MLLMs

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

MI-oriented | Quality prediction/Anomaly detection | NLP | LLM | Image | Audio | Video

OmniReasoner: Thinking with Long Audio-Video via Native Tool Use

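Proposes OmniReasoner, a method that combines original data with a Zoom-In tool. It improves omni-modal LLMs' reasoning over long audio-video.

Use case: Improving reasoning over long audio-video
Difficulty: Hard
Cost: High

Tabular-oriented | Deep learning | Compression/Quantization | Text | 3D | Reinforcement learning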

Intelligent Multi-UAV Navigation in ITNTNs: A Hierarchical LLM Approach

The deployment of high-speed Uncrewed Aerial Vehicles (UAVs) in 3D aerial highways necessitates robust coordin

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Quality prediction/Anomaly detection | NLP | LLM | Detection

LLM Detection as an Intervention: Downstream Impact under Strategic User Behavior

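As LLMs have become widely used, tools for identifying LLM-generated text have been developed. However, these detection systems influence user behavior. When a detection system does not work, users tend to switch to other systems, and the final

Use case: LLM detection
Difficulty: Hard
Cost: High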

Program Synthesis for Simulation-Based Inference: Joint Model Selection and Parameter Estimation

Neural simulation-based inference enables parameter estimation for complex models, but typically requires the

Deep learning | Transformer | Generation | Image | Text

Use case: Generation
Difficulty: Hard
Cost: High

The Story Shapes the Agent: Narrative Priors in LLM Behavior

Persona prompting is widely used to steer LLM agent behavior, yet the narrative framing of a task can matter m

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Search-on-Graph-R1: Training Large Language Models to Search Knowledge Graphs with Reinforcement Learning

Knowledge graph question answering (KGQA) requires navigating from topic entities to an answer several relatio

NLP | LLM | QA | Text | Reinforcement learning

Use case: QA
Difficulty: Hard
Cost: High

Tabular-oriented | MI-oriented | NLP | LLM | Text

Structured Output Collapses Answer Diversity Across 44 Language Models

When a language model must choose one answer from a large space of equally valid options, a format clause -- "

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Using Fine-Tuned LLMs to Identify Indicators of Vulnerability in UK Police Incident Logs

Purpose: Understanding how much of routine policing involves vulnerable people could inform resourcing, traini

NLP | LLM | Classification

Use case: Classification
Difficulty: Hard
Cost: High

Relay-Bench: Evaluating LLMs on Multi-Domain Reasoning Chains

Introducing Relay-Bench, an unsaturated, holistic, text-only benchmark that measures LLMs' ability to complete

NLP | LLM | Image | Text

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Building a European Multilingual Evaluation Dataset: The MMLU Localisation Project within the EMT Network

This paper reports on a collaboration between the Directorate-General for Translation (DGT) and the European M

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Enabling Multilingual Privacy Policy Audits: Large-Scale Analysis of Spanish Mobile Apps

Automated analyses of privacy policies enable large-scale assessments of transparency in digital ecosystems, y

Use case: Classification
Difficulty: Hard
Cost: High

Convolution for Large Language Models

Large language models (LLMs) largely rely on Transformers, where self-attention provides global token interact

Deep learning | Transformer | Text

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Automated Discovery Has No Universally Superior Harness

Autonomous discovery systems such as OpenEvolve and TTT-Discover are often used as general-purpose harnesses.

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

It's Not What You Say, It's How You Say It: Evaluating LLM Responses to Expressions of Belief

Users frequently express their beliefs to large language models (LLMs). In some situations, the LLM should acc

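Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Quality prediction/Anomaly detection | Deep learning | Transformer | Classification | Text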

SWE-Pruner Pro: The Coder LLM Already Knows What to Prune

Pruning long context for coding agents has been a vital technology for efficient context management. While exi

Use case: Classification
Difficulty: Hard
Cost: High

MI-oriented | Sensor/Time series | NLP | LLM

VEHBench: A Stage-Local Diagnostic Benchmark for LLM-Assisted Vibration Energy Harvester Design

Battery-free Internet of Things (IoT) requires iterative design of vibration energy harvesters (VEHs) under co

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Operational Hallucination and Safety Drift in AI Agents

Large language models (LLMs) serving as planners in tool-using autonomous agents introduce dynamic reliability

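Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

MI-oriented | Quality prediction/Anomaly detection | Deep learning | Compression/Quantization | Generation | Text | 3D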

Do Language Models Dream of Binding Molecules? Benchmarking LLMs under Spatial Constraints

Structure-based drug design (SBDD) leverages the 3D structure of protein targets, often complemented by other

Use case: Generation
Difficulty: Hard
Cost: High

How Does Alignment Tuning Shape Representations of Sycophancy and Related Cue-Induced Biases in LLMs?

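This study investigates the root causes of incorrect LLM answers. Examining models from five families across seven types of BCT bias, it finds specific patterns within the models, and these patterns are the root cause of the incorrect answers.

Small-data-oriented | NLP | LLM

Use case: Identifying the root causes of incorrect LLM answers
Difficulty: Hard
Cost: High

Quality prediction/Anomaly detection | Deep learning | Compression/Quantization | Text | Reinforcement learning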

LLM-as-a-Coach: Experiential Learning for Non-Verifiable Tasks

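This study targets the optimization of non-verifiable tasks, including rubric-based evaluation. Conventional RL uses only the evaluation signal, and the model itself never reflects or self-improves. Here, treating an LLM as a coach, the model reflects

Use case: Optimizing non-verifiable tasks, including rubric-based evaluation
Difficulty: Hard
Cost: High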

VDAR-Router: Adaptive LLMs Routing via Verbalized Query Difficulty Analysis Retrieval

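As large language models are increasingly used in production systems, selecting cost-effective models has become important. LLM routing has been proposed for assigning models. However, existing routing methods select a model based on the input question, and are not adapted to the model

Use case: Solving the model routing problem
Difficulty: Hard
Cost: High

Quality prediction/Anomaly detection | NLP | LLM | Generation | Text | Video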

FARO: Feasibility-Aware Robot Motion Optimization

Fast planning of novel behaviors in unseen scenarios remains a fundamental challenge in robotics. The high-dim

Use case: Generation
Difficulty: Hard
Cost: High

Receiver-Centered Robot-to-Human Handover with Grasp-Aware Object Orientation

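Collaborative robots share a workspace with human operators, where safety-critical micro-events such as robot-to-human handovers occur frequently. However, conventional static handovers of asymmetric industrial tools result in unnatural grasps

NLP | LLM | Classification | 3D

Use case: Tool handover
Difficulty: Hard
Cost: High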

arxiv | Paper only | 2026-07-19

Kernelized Linear Attention: Breaking the Capacity Wall with Symmetric Cones

Linear attention promises constant-time recurrent inference but degrades sharply on associative recall. We for

Deep learning | RNN / LSTM | Anomaly detection

Use case: Anomaly detection
Difficulty: Hard
Cost: High
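
The entry above contrasts linear attention's constant-time recurrent inference with its weakness on associative recall. As a hedged illustration of the first point only, the sketch below implements standard causal linear attention twice: once in parallel form and once as a recurrence over a fixed-size state. It uses the elu(x)+1 feature map popularized by Katharopoulos et al. (2020). That feature map is an assumption here, not the paper's kernel or symmetric-cone construction, and the function names are hypothetical.

```python
import numpy as np


def feature_map(x):
    """elu(x) + 1: a positive feature map commonly used for linear attention (assumed here).
    np.minimum keeps the unused exp branch from overflowing for large positive inputs."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention_parallel(Q, K, V):
    """Causal linear attention over the whole sequence; loops over t for readability."""
    phi_q, phi_k = feature_map(Q), feature_map(K)
    out = np.zeros((Q.shape[0], V.shape[1]))
    for t in range(Q.shape[0]):
        w = phi_k[: t + 1] @ phi_q[t]        # similarity of query t to keys 0..t
        out[t] = (w @ V[: t + 1]) / w.sum()  # normalized weighted sum of values
    return out


def linear_attention_recurrent(Q, K, V):
    """Same outputs, but only a fixed-size state (S, z) is carried between tokens."""
    phi_q, phi_k = feature_map(Q), feature_map(K)
    S = np.zeros((Q.shape[1], V.shape[1]))  # running sum of outer(phi(k_s), v_s)
    z = np.zeros(Q.shape[1])                # running sum of phi(k_s), for normalization
    out = np.zeros((Q.shape[0], V.shape[1]))
    for t in range(Q.shape[0]):
        S += np.outer(phi_k[t], V[t])
        z += phi_k[t]
        out[t] = (phi_q[t] @ S) / (phi_q[t] @ z)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q, K, V = rng.normal(size=(8, 4)), rng.normal(size=(8, 4)), rng.normal(size=(8, 3))
    print(np.allclose(linear_attention_parallel(Q, K, V), linear_attention_recurrent(Q, K, V)))
```

Between tokens, the recurrent version keeps only S (d x d_v) and z (d), so per-token cost does not grow with sequence length. One commonly cited intuition for the recall problem the entry mentions is that every past key-value pair must be squeezed into that fixed state.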

arxiv | Paper only | 2026-07-19

Efficient Sequential Evaluation of Large Language Models

We study the problem of sequentially evaluating a new large language model (LLM) on a fixed question set using

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Explainable | Quality prediction/Anomaly detection | NLP | LLM | Generation | Text

arxiv | GitHub available | 2026-07-19

CoEvoP&R: Co-Evolving Placement Objectives with Routing Feedback via Large Language Models

Analytical placers rely on differentiable objective functions to guide placement, typically combining intermed

Use case: Generation
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-07-18

A Causal Markov Condition for Value

This paper proposes a causal independence principle for value -- the value Causal Markov Condition (v-CMC) --

NLP | LLM | Text | Audio

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-07-18

How to Build Marcus's Algebraic Mind: From Minsky's Emotion-Machine Viewpoint

In The Algebraic Mind, Marcus identified three cognitive components: operations over variables, recursively st

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-07-18

How to Build Marcus's Algebraic Mind: From Thagard's Brain--Mind Viewpoint

Two critiques of connectionist cognition converge on one missing capacity. In The Algebraic Mind, Marcus isola

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Dimension-invariant uniform consistency of the empirical spatial distribution function and its associated spatial depth estimator

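The spatial distribution function is an important tool in data analysis, but a method for accurately evaluating it is needed. To address this, the study proposes a method for evaluating the spatial distribution function.

Use case: Evaluating the spatial distribution function
Difficulty: Hard
Cost: High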

ASK-NN: An Asymmetric Nearest-Neighbor Test that detects Distribution Drifts in Natural Language

Hallucinations and artificial text in LLM-generated outputs often appear as distributional deviations between

NLP | LLM | Detection | Text

Use case: Detection
Difficulty: Hard
Cost: High

Evolutionary Algorithm-Guided LLMs for Physics-Informed Neural Network Design

Physics-informed neural networks (PINNs) are unusually sensitive to interacting choices of architecture, activ

Use case: Generation
Difficulty: Hard
Cost: High

Vision-Language-Motion Maps: An Open-Vocabulary, Uncertainty-Aware, Queryable Motion Attribute for 3D Scene Maps

To analyze dynamic scenarios, this study attaches a Motion Attribute to visualized maps. Analysis then uses Motion Attribute filters driven by language queries.

NLP | LLM | 3D | Multimodal

Use case: Analyzing dynamic scenarios on visualized maps
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-07-16

Sharp Stability Threshold and Certification for Designing Stable Residual Architectures

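A new principle is proposed for addressing stability problems in residual deep networks. It yields a stability threshold based on the exponential of the input quantity. This stability threshold, for the velocity field of each residual block, the exponential of the input quantity

MI-oriented | Deep learning | Transformer

Use case: Solving stability problems
Difficulty: Hard
Cost: High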

arxiv | Paper only | 2026-07-15

Supervised Fine-Tuning vs. In-Context Learning: An Equilibrium Analysis of LLM Personalization under Congestion

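Personalizing large language models (LLMs) lets a model be adapted, but when compute resources are limited, either the costly Supervised Fine-Tuning approach or the lightweight In-Contex

Deep learning | Compression/Quantization | Regression | Text

Use case: Strategies for LLM personalization
Difficulty: Hard
Cost: High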

arxiv | Paper only | 2026-07-15

Analogical Deep Research: Retrieving and Integrating Historical Analogies for Foresight Analysis

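Infers historical analogies in predicate learning and proposes a new task, analogical deep research, for evaluating them; historical analogies in predicate learning play an important

Use case: Historical analogy in predicate learning
Difficulty: Hard
Cost: High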

arxiv | Paper only | 2026-07-15

How to Guide LLM Generation: Dual-Surrogate Guided Search for Automated Heuristic Design

Large language models (LLMs) have made automated heuristic design (AHD) increasingly practical by generating e

Explainable | NLP | LLM | Generation | Text

Use case: Generation
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-07-13

Are we Merging the Right Models? Impact of Expert Training Duration on Model Merging for LLMs

Multi-task model merging combines separately trained expert models into a single model that handles all tasks

Quality prediction/Anomaly detection | NLP | LLM

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High
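
The entry above defines multi-task model merging as combining separately trained experts into one model. The minimal sketch below shows two common baseline merge rules: uniform parameter averaging, and task arithmetic, which adds scaled "task vectors" (theta_i - theta_base) back onto a shared base. It assumes every model is a dict of NumPy arrays with matching names and shapes, all fine-tuned from one base. These are generic baselines with hypothetical function names, not the paper's procedure; the paper studies how long experts are trained before merging.

```python
import numpy as np


def merge_average(experts):
    """Uniform parameter averaging: theta = mean_i theta_i (parameter names must match)."""
    return {name: sum(e[name] for e in experts) / len(experts) for name in experts[0]}


def merge_task_arithmetic(base, experts, lam=1.0):
    """Task arithmetic: theta = theta_base + lam * sum_i (theta_i - theta_base)."""
    merged = {}
    for name, base_param in base.items():
        task_vector_sum = sum(e[name] - base_param for e in experts)
        merged[name] = base_param + lam * task_vector_sum
    return merged


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = {"w": rng.normal(size=(3, 3)), "b": rng.normal(size=3)}
    # Hypothetical "experts": the base model plus different random updates.
    experts = [{k: v + rng.normal(scale=0.1, size=v.shape) for k, v in base.items()}
               for _ in range(3)]
    avg = merge_average(experts)
    ta = merge_task_arithmetic(base, experts, lam=1 / 3)
    print(all(np.allclose(avg[k], ta[k]) for k in base))
```

With lam = 1/n the two rules coincide, because theta_base + (1/n) * sum_i (theta_i - theta_base) = (1/n) * sum_i theta_i. Larger lam pushes further along the task vectors.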

arxiv | Paper only | 2026-07-13

Long-Memory Reservoir Computing for Data-Scarce Dengue Forecasting

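Large language models (LLMs) have recently spread rapidly, but their inference requires AI accelerators. The token phase is an area handled by neural networks such as LSTMs, and improving the efficiency of this area on current AI accelerators

Sensor/Time series | Deep learning | RNN / LSTM | Regression | Prediction | Time series

Use case: Optimizing the LLM token phase on AI accelerators
Difficulty: Hard
Cost: High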

arxiv | Paper only | 2026-07-10

Deep Learning for Dynamic Programming with Recursive Utility Using First-order Conditions

This paper proposes the certainty-equivalent first-order learning (CEFOL) algorithm, a deep learning algorithm

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-07-08

Institutional Red-Teaming: Deployment Rules, Not Just Models, Causally Shape Multi-Agent AI Safety

Proposes a method for analyzing the behavior of multiple agents. The behavior of multiple agents

Use case: Analyzing the behavior of multiple agents
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-07-07

Do You Remember? Toward Memory-Centric Multimodal AI

Human memory is reconstructive, not a faithful recording. Current multimodal LLMs (MLLMs) lack this capability

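Quality prediction/Anomaly detection | Deep learning | Compression/Quantization | Image | Text | 3D

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High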

arxiv | Paper only | 2026-07-07

Strategic Bargaining in Multi-Buyer Markets: Reinforcement Learning from Verifiable Rewards for LLM Negotiations

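Builds a negotiation system for markets with multiple buyers. When the size of the market is not fully known, the seller incurs losses. The seller needs to gauge market size, which is difficult when there are multiple buyers.

Use case: Negotiation in multi-buyer markets
Difficulty: Hard
Cost: High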

LLM for the development of FCM

This article is about the development of a fuzzy cognitive map using a local large language model. In the ligh

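Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Small-data-oriented | Easy to try on CPU | Condition optimization | Deep learning | Compression/Quantization | Generation | Text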

LLM-Driven Evolutionary Generation of Multi-Objective Bayesian Optimization Algorithms

Designing effective multi-objective Bayesian optimization (MOBO) algorithms requires balancing many interdepen

Use case: Generation
Difficulty: Hard
Cost: High

Quality prediction/Anomaly detection | Deep learning | Compression/Quantization | Generation | Text

QDEvo: A Multi-Objective Quality-Diversity Framework for Automated Heuristic Design

The integration of Large Language Models (LLMs) with evolutionary computation has emerged as a powerful paradi

Use case: Generation
Difficulty: Hard
Cost: High

Heaviside Continuity of Rolling Coefficients for Eliminating Epistemic Entropy in Large Language Models

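This study proposes accounting for Heaviside discontinuity in order to verify the reasoning process. This makes it possible to detect latent mistakes in the reasoning process and then generate correct output.

Use case: Verifying reasoning in large language models
Difficulty: Hard
Cost: High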

arxiv | Paper only | 2026-07-05

Decentralized Aggregation of LLM Predictions via Wagering Mechanisms

It is increasingly common to aggregate predictions from multiple LLMs, each with domain expertise or access to

NLP | LLM | Prediction

Use case: Prediction
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-07-03

Rank-Order N-of-M Codes for Sparse Distributed Memory: Disentangling Representation and Learning Effects in Noise Robustness Against Contemporary Neuromorphic Architectures

Large language models remain limited as continual learning systems, motivating renewed interest in Sparse Dist

Tabular-oriented | NLP | LLM | Embedding | Text | Tabular

Use case: Embedding
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-07-03

Scaffolding the Strategist: Architecture-Dependent Reasoning Interventions in Hotelling Spatial Markets

We investigate whether structured reasoning interventions improve the strategic economic reasoning of large la

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-06-30

A Lifecycle and Application-Stack Survey of Large Language Model Vulnerabilities: Attacks, Risks, Defenses, and Open Problems

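Defending against LLM misuse. This study develops a defense framework for preventing LLM misuse and analyzes the risks that such misuse poses.

Use case: Defending against LLM misuse
Difficulty: Hard
Cost: High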

arxiv | Paper only | 2026-06-30

Incentivizing Data Trading via Profit Reallocation

Promoting data trading in data markets. This study develops economic incentives to encourage data trading.

Use case: Promoting data trading in data markets
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-06-29

Semantics-Aware Bilevel Co-Evolution: Towards Automated Multicomponent Algorithm Design

LLM-assisted evolutionary search (LES) has emerged as a promising paradigm for automated algorithm design. How

Quality prediction/Anomaly detection | NLP | LLM | Generation

Use case: Generation
Difficulty: Hard
Cost: High

arxiv | GitHub available | 2026-06-28

When LLMs Develop Languages: Symbolic Communication for Efficient Multi-Agent Reasoning

Chain-of-Thought (CoT) improves large language models (LLMs) on difficult reasoning tasks, but it often incurs

MI-oriented | Deep learning | Compression/Quantization | Text

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-06-28

Travel-Oriented Reasoning Large Language Model via Domain-Specific Knowledge Graphs

Large language models (LLMs) demonstrate broad reasoning abilities but struggle with accuracy and reliability

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-06-27

LLM Semantic Signaling Game and Mechanism Design: Systematic Blindness, Awareness Shaping, and Mindset Dynamics

Large language models (LLMs) increasingly mediate strategic interactions through natural language, making sema

NLP | LLM | Detection | Text

Use case: Detection
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-06-26

Triadic Werewolf: A Jester Role for Multi-Hop Theory of Mind in LLMs

This work extends theory-of-mind evaluation of large language models (LLMs) by adding a triadic Werewolf game.

Use case: A triadic Werewolf game
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-06-24

SidConArena: An Environment Evaluating Agents in Open-Ended,Positive-Sum Bargaining Game

Evaluating LLM agents requires dynamic environments that go beyond static reasoning and zero-sum games. Real-w

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-06-23

Distributed Quality-Diversity Search for Toxicity in Large Language Models

This study searches for diverse toxicity test cases.

Use case: Searching for diverse toxicity tests
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-06-23

Age of LLM: A Strategic 1v1 Benchmark for Reasoning, Diplomacy and Reliability of Large Language Models under Fog of War

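Introduces Age of LLM, a strategic 1v1 benchmark for large language models. Framed as a Minesweeper game, it sets three stressors: fog of war, head-to-head Minesweeper play, and adherence to a JSON schema.

Use case: Benchmark for Minesweeper games
Difficulty: Hard
Cost: High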

arxiv | Paper only | 2026-06-22

YUKTI: From Natural-Language Situations to Robust, Verifiable Decisions An Uncertainty-Typed Proposition IR, Assumption-Robust Pareto Frontiers, and a Regret Certificate

Language models turn a worded situation into a numeric plan, and the dominant pipelines (NL4Opt, OptiMUS, ORLM

Deep learning | Compression/Quantization | Text | Audio

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-06-22

Each Judge Its Own Yardstick: Discovering Per-VLM Taxonomies for Physical Video Evaluation

Maintaining physical consistency in video generators and world models increasingly relies on vision-language m

NLP | LLM | Text | Video | Multimodal

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-06-22

Measuring Behavior Portability in Large Language Models

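This study develops the Behavioral Portability Test, a method that analyzes model behavior and evaluates how well that behavior can be adapted to other environments.

Explainable | Deep learning | Transformer | Text

Use case: Characterizing resilience and behavior portability
Difficulty: Hard
Cost: High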

Multi-Level Resistive Synapses for On-Chip Neural Networks: A Physics-Based Design of a Memristive Crossbar Fabric with Quasi-Continuous Conductance States

Building on resistive communication, this paper presents a physics-based design of an on-chip neural network w

Deep learning | Transformer | Text

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Emergent Culture in Minimal LLM Systems

What happens when LLM agents operate with no context outside a turn, minimal prompting, and simple tools? Insp

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Stationary Robust Mean-Field Games under Model Mismatches

Deploying multi-agent reinforcement learning (MARL) in the real world is often limited by model mismatches bet

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Theorist Toolbox: Tools for Agent Based LLM-assisted economic theory Research

Empirical economists often start their projects with a toolbox. Shared packages, replication archives, and cir

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-06-18

Formally Verified Code Synthesis for Structured Data Translation in a Medical Internet of Things

In this work we present a LLM powered, evolutionary code synthesis system for structured data translation in a

Tabular-oriented | NLP | LLM | Generation | Tabular

Use case: Generation
Difficulty: Hard
Cost: High

arxiv | Paper only | 2026-06-13

Large Language Model-Driven Cooperative Operator Ensemble Evolution for Permutation Flow Shop Scheduling

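To improve the performance of the Iterated Greedy (IG) algorithm for the PFSP, this study uses Large Language Model-Driven Cooperative Operator Ensem

Small-data-oriented | Easy to try on CPU | Quality prediction/Anomaly detection | NLP | LLM | Text

Use case: Improving the performance of the Iterated Greedy (IG) algorithm for the PFSP
Difficulty: Hard
Cost: High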
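
Some context for the PFSP entry above. In a permutation flow shop, every job visits machines 1..m in the same order, and the goal is to find the job permutation that minimizes makespan. The sketch below is a simplified, textbook-style Iterated Greedy baseline: NEH initialization, random removal of d jobs, then greedy best-position reinsertion. A plain "accept if not worse" rule stands in for the temperature-based acceptance often used in IG. The instance format p[j][m] and all function names are assumptions. The paper's LLM-driven cooperative operator ensemble is not shown.

```python
import itertools
import random


def makespan(perm, p):
    """Completion time of the last job on the last machine.
    p[j][m] is the processing time of job j on machine m; every job visits machines in order."""
    finish = [0] * len(p[0])  # finish[m]: completion time of the latest scheduled job on machine m
    for j in perm:
        for m in range(len(finish)):
            ready = finish[m - 1] if m > 0 else 0  # job j already finished on the previous machine
            finish[m] = max(finish[m], ready) + p[j][m]
    return finish[-1]


def best_insertion(partial, job, p):
    """Try `job` at every position of `partial`; return the best sequence and its makespan."""
    best_seq, best_val = None, float("inf")
    for i in range(len(partial) + 1):
        cand = partial[:i] + [job] + partial[i:]
        val = makespan(cand, p)
        if val < best_val:
            best_seq, best_val = cand, val
    return best_seq, best_val


def neh(p):
    """NEH construction: jobs by decreasing total processing time, each greedily inserted."""
    seq = []
    for j in sorted(range(len(p)), key=lambda job: -sum(p[job])):
        seq, _ = best_insertion(seq, j, p)
    return seq


def iterated_greedy(p, d=2, iters=200, seed=0):
    """Simplified IG: remove d random jobs, greedily reinsert, accept if not worse (needs d >= 1)."""
    rng = random.Random(seed)
    current = neh(p)
    cur_val = makespan(current, p)
    best, best_val = current[:], cur_val
    for _ in range(iters):
        partial = current[:]
        removed = [partial.pop(rng.randrange(len(partial))) for _ in range(d)]
        for j in removed:
            partial, val = best_insertion(partial, j, p)
        if val <= cur_val:
            current, cur_val = partial, val
            if val < best_val:
                best, best_val = partial[:], val
    return best, best_val


def brute_force_optimum(p):
    """Exact optimum by enumeration; only sensible for a handful of jobs."""
    return min(makespan(list(perm), p) for perm in itertools.permutations(range(len(p))))
```

The destruction and reconstruction steps are the natural places where alternative operators plug in. How the paper's ensemble chooses among operators is not covered by this sketch.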

arxiv | Paper only | 2026-06-12

MeEvo: Metacognitive Evolution Combined with Natural Evolution for Automatic Heuristic Design

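This study addresses automatic heuristic design (AHD). AHD was studied long before machine learning became practical, and machine learning has made it even more applicable. In this study, within AHD, metacog

Use case: Automatic heuristic design (AHD)
Difficulty: Hard
Cost: High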