MLinfo | Machine Learning and AI Paper Digest

MLinfo | Catch up on and search daily-updated technologies

Search results for "text"

107 results


huggingface | GitHub available | Hugging Face available | 2026-06-07

Trajectory-Refined Distillation

On-policy distillation (OPD) has become a central post-training tool for large language models (LLMs), providi

Deep learning, Model compression/quantization, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-05

SWE-Explore: Benchmarking How Coding Agents Explore Repositories

Repository-level coding benchmarks such as SWE-bench have driven a rapid surge in the capabilities of coding a

Deep learning, Model compression/quantization, Detection, Text

Use: Detection
Difficulty: Easy
Cost: Low

huggingface | Hugging Face available | 2026-06-05

On the Geometry of On-Policy Distillation

On-policy distillation (OPD) is increasingly used to improve large language model reasoning, but its training

Deep learning, Model compression/quantization, Detection, Generation, Text

Use: Detection
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-05

SigmaScale: LLM Compression with SVD-based Low-Rank Decomposition and Learned Scaling Matrices

We present SigmaScale, a method for learning auxiliary scaling matrices S to aid truncated Singular Value Deco

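Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | GitHub available | Hugging Face available | 2026-06-05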

Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

Large language models exhibit impressive zero-shot capabilities across a wide range of downstream tasks. Howev

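Quality prediction/anomaly detection, Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-05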

MMAE: A Massive Multitask Audio Editing Benchmark

We introduce MMAE, a Massive Multitask Audio Editing benchmark, serving as the first comprehensive evaluation

Suited to MI, Natural language processing, Large language models, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-05

AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization

Despite being a pivotal frontier, interactive world modeling remains underexplored in terms of the versatile c

Computer vision, 3D/point cloud, Text, 3D

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | GitHub available | Hugging Face available | 2026-06-05

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

Video understanding is being rapidly transformed by multimodal large language models (MLLMs), as research move

Deep learning, Model compression/quantization, Image, Text, Audio

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-05

dots.tts Technical Report

We present dots.tts, a 2B-parameter continuous autoregressive text-to-speech (TTS) foundation model that model

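Sensor/time series, Quality prediction/anomaly detection, Deep learning, Model compression/quantization, Generation, Text, Audio

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-05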

Towards Retrieving Interaction Spaces for Agentic Search

Retrieval for search agents is still inherited from non-agentic information retrieval: a retriever ranks the c

Natural language processing, Large language models, Retrieval, Text

Use: Retrieval
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-05

Stream3D-VLM: Online 3D Spatial Understanding with Incremental Geometry Priors

Despite advances in 3D scene understanding, existing 3D Large Multimodal Models operate in offline settings, r

Deep learning, Model compression/quantization, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-05

Entropy as a Structural Prior: How a Log-Barrier on DiT Belief Space Drives Musical Diversity and Development

Confidence-based loss weighting is usually avoided in generative models because it accelerates errors when the

Sensor/time series, Natural language processing, Fine-tuning, Generation, Text, Audio

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-05

Empirical Study on the Characteristics and Evolution of AI-usage in GitHub Repositories: Evidence from Code Comments

Developers increasingly use AI tools such as ChatGPT, Copilot, and Claude in everyday software workflows, but

Deep learning, Transformer, Classification, Generation, Text

Use: Classification
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-05

ECI_{sem}: Semantic Residual Effective Contrastive Information for Evaluating Hard Negatives

Hard-negative source selection for dense retrieval is usually decided only after fine-tuning and downstream ev

Deep learning, Transformer, Retrieval, Text

Use: Retrieval
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-05

How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity? Capabilities and Boundaries in Multi-Genre Chord-Symbol Modeling

Harmony is a compact symbolic layer where mathematical pitch relations, acoustic consonance, and musical conve

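Explainable, Sensor/time series, Quality prediction/anomaly detection, Deep learning, Transformer, Classification, Text, Audio

Use: Classification
Difficulty: Easy
Cost: Low

huggingface | Hugging Face available | 2026-06-04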

LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents

Agent systems increasingly use textual skills to encode reusable task procedures, but injecting these skills i

Suited to MI, Deep learning, Model compression/quantization, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04

A Geometric Account of Activation Steering through Angle-Norm Decomposition

Linear activation steering has gained popularity as a simple and empirically effective way to control language

Explainable, Deep learning, Model compression/quantization, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: Medium

huggingface | Hugging Face available | 2026-06-04

Answer Presence Drives RAG Rewriting Gains

Retrieval-augmented QA pipelines often route retrieved passages through an LLM rewriter before a smaller reade

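Quality prediction/anomaly detection, Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04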

Cosine Misleads: Auxiliary Losses Reshape Vision Language Models, Not Their Latents

Latent visual reasoning (LVR) inserts supervised latent tokens between perception and answer generation in vis

Quality prediction/anomaly detection, Computer vision, Multimodal, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04

SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations

Evaluating LLM mediators remains challenging, as mediation unfolds as a real-time trajectory shaped by disputa

Suited to MI, Deep learning, Transformer, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04

Direct 3D-Aware Object Insertion via Decomposed Visual Proxies

Object insertion aims to seamlessly composite a reference object into a specified region of a background image

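Suited to MI, Quality prediction/anomaly detection, Computer vision, 3D/point cloud, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04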

OpenSkill: Open-World Self-Evolution for LLM Agents

Self-evolving agents requires adaptation after deployment, but existing approaches assume a usable learning lo

Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04

SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents

Persistent AI assistants, such as OpenClaw, accumulate large collections of related memories over long-term in

Machine learning, Supervised learning, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: Low

huggingface | Hugging Face available | 2026-06-04

UnpredictaBench: A Benchmark for Evaluating Distributional Randomness in LLMs

We introduce UnpredictaBench, an evaluation that tests the ability of large language models (LLMs) to capture

Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04

Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

While Vision-Language Models (VLMs) have shown strong visual reasoning capabilities, their spatial reasoning a

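Natural language processing, Large language models, Image, Text, Multimodal

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04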

LLM Explainability with Counterfactual Chains and Causal Graphs

Causal graphs provide a high-level language for making mechanisms transparent. Recent work uses Large Language

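Explainable, Natural language processing, Large language models, Classification, Text

Use: Classification
Difficulty: Easy
Cost: High

huggingface | GitHub available | Hugging Face available | 2026-06-04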

Almieyar-Oryx-BloomBench: A Bilingual Multimodal Benchmark for Cognitively Informed Evaluation of Vision-Language Models

Despite the rapid progress of Vision-Language Models (VLMs), the field lacks benchmarks that rigorously diagno

Quality prediction/anomaly detection, Deep learning, Transformer, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04

Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation

We study the transformation of autoregressive models (ARLMs) into diffusion language models (DLMs). Rather tha

Deep learning, Model compression/quantization, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04

WorldBench: A Challenging and Visually Diverse Multimodal Reasoning Benchmark

In real-world applications, models are expected to perform reliably across diverse settings. Yet, many existin

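Natural language processing, Large language models, Image, Text, Multimodal

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04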

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing

Deep learning, RNN / LSTM, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04

ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?

Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story prog

Natural language processing, Fine-tuning, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: Low

huggingface | Hugging Face available | 2026-06-04

AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints

Planning for real-world problems by language models often involves both world and user constraints, which may

Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04

LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing

Developing unified video generation and editing models capable of interpreting interleaved multimodal inputs i

Deep learning, Transformer, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04

Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation

Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by under

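Deep learning, Model compression/quantization, Text, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | GitHub available | Hugging Face available | 2026-06-04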

MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery

Natural language processing, Large language models, Generation, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04

LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs

Large language models can reproduce training data, but existing memorization evaluations mostly measure whethe

Deep learning, Model compression/quantization, Generation, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04

Towards One-to-Many Temporal Grounding

Temporal Grounding (TG) aims to localize video segments corresponding to a textual query. Prior research predo

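Quality prediction/anomaly detection, Natural language processing, Large language models, Text, Video

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04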

Latent Reasoning with Normalizing Flows

Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the

Natural language processing, Large language models, Generation, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04

Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction

Video event prediction (VEP) requires models to infer unobserved future states from partial video evidence. Ex

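Natural language processing, Large language models, Image, Text, Video

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04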

Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) excel at 2D semantic understanding but lack intrinsic 3D awareness, r

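Suited to tabular data, Natural language processing, Large language models, Text, Video, 3D

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04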

Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions

Large language models are increasingly used to simulate social media users and infer how individuals may respo

Deep learning, Transformer, Text, Multimodal

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-04

Benchmark Everything Everywhere All at Once

Benchmarks are fundamental for evaluating and advancing LLMs and MLLMs by providing standardized and explicit

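Quality prediction/anomaly detection, Natural language processing, Large language models, Text, Multimodal

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-03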

Why Muon Outperforms Adam: A Curvature Perspective

Muon improves training efficiency over Adam in large language-model training by about two times, but the local

Deep learning, Normalization/optimization methods, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-03

Self-Evaluation Is Already There: Eliciting Latent Judge Calibration in Base LLMs with Minimal Data

Large language models are increasingly evaluated by other models, raising a natural question: can a model pred

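Suited to small data, Quality prediction/anomaly detection, Deep learning, Model compression/quantization, Text, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-03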

Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models

Vision language models (VLMs) excel at many tasks but still struggle with spatial reasoning when critical info

Suited to tabular data, Explainable, Computer vision, Multimodal, Image, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-03

TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration

Agents are widely deployed as assistants over documents, tools, and code. However, they typically act only on

Natural language processing, RAG, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: Low

huggingface | Hugging Face available | 2026-06-03

VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding

We introduce VideoKR, the first large-scale training corpus specifically designed to strengthen knowledge- and

Natural language processing, Fine-tuning, Generation, Text, Video

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-03

Rethinking Continual Experience Internalization for Self-Evolving LLM Agents

Experience internalization converts contextual experience from past interactions into reusable parametric capa

Quality prediction/anomaly detection, Deep learning, Transformer, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-03

Personal AI Agent for Camera Roll VQA

We study the personal camera roll visual question answering setting. In this setting, a conversational AI assi

Deep learning, Model compression/quantization, QA, Image, Text

Use: QA
Difficulty: Easy
Cost: Medium

huggingface | Hugging Face available | 2026-06-03

SePO: Self-Evolving Prompt Agent for System Prompt Optimization

System prompt optimization improves agent behavior without modifying the underlying model, yielding human-read

Natural language processing, RAG, Generation, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-03

Video2LoRA: Parametric Video Internalization for Vision-Language Models

Processing video in vision-language models is expensive: each frame occupies hundreds of tokens, and inference

Natural language processing, Fine-tuning, Summarization, QA, Image

Use: Summarization
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-03

BRepCLIP: Contrastive Multimodal Pretraining on BRep Primitives for CAD Understanding

Learning representations of CAD models is a largely open problem. While 3D representation learning has flouris

Deep learning, Transformer, Classification, Generation, Embedding

Use: Classification
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-03

Audio Interaction Model

Audio is an inherently interactive modality, yet today's Large Audio Language Models (LALMs) are offline, and

Reinforcement learning, Multi-agent, Text, Audio

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-03

ZipSplat: Fewer Gaussians, Better Splats

Feed-forward 3D Gaussian Splatting methods reconstruct a scene from posed or pose-free images in a single forw

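Quality prediction/anomaly detection, Deep learning, Transformer, Image, Text, 3D

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-03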

MeshWeaver: Sparse-Voxel-Guided Surface Weaving for Autoregressive Mesh Generation

Autoregressive mesh generation has gained attention by tokenizing meshes into sequences and training models in

Deep learning, Attention mechanism, Generation, Text, 3D

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-03

STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations

Training Data Attribution (TDA) seeks to trace a model's predictions back to its training data. The gold stand

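Sensor/time series, Deep learning, Model compression/quantization, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-03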

Evaluating Large Language Models in Dynamic Clinical Decision-Making with Standardized Patient Cases

Large language models (LLMs) are increasingly proposed as clinical agents, yet static, single-turn benchmarks

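Suited to MI, Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | GitHub available | Hugging Face available | 2026-06-03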

SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing

Instruction-guided speech editing requires a model to modify specified speech attributes while preserving unre

Natural language processing, Large language models, Generation, Text, Audio

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-02

Text-to-Image Models Need Less from Text Encoders Than You Think

Text-to-image models rely on text prompts as their primary interface to human intent. Prompts are encoded by a

Quality prediction/anomaly detection, Deep learning, Transformer, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-02

Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory

Equipping Large Language Models (LLMs) to execute reliable multi-step workflows has become a central challenge

Natural language processing, Large language models, Detection, Text

Use: Detection
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-02

MAOAM: Unified Object and Material Selection with Vision-Language Models

Selection is a core operation in interactive image editing. To be practical, a user should be able to specify

Suited to MI, Natural language processing, RAG, Generation, Segmentation, Image

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-02

The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs

Inference-time scaling has emerged as a critical avenue for enhancing Large Language Models' performance, yet

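Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | GitHub available | Hugging Face available | 2026-06-02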

EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management

Recent progress in Large Language Model (LLM) agents has enabled promising advances in automated data science.

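Deep learning, Model compression/quantization, Text, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-02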

Qwen-Image-Flash: Beyond Objective Design

Few-step distillation has become an effective strategy for accelerating advanced visual generative models, yet

Suited to MI, Deep learning, Model compression/quantization, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-02

OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs

Multimodal agents in robotics, AR, and autonomous driving must reason about places and layouts from continuous

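Quality prediction/anomaly detection, Natural language processing, Large language models, Generation, Text, Video

Use: Generation
Difficulty: Easy
Cost: High

huggingface | GitHub available | Hugging Face available | 2026-06-02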

Self-Distilled Policy Gradient

On-policy self-distillation, where a language model conditions on privileged context to supervise its own gene

Deep learning, Model compression/quantization, Generation, Text, Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: Medium

huggingface | Hugging Face available | 2026-06-02

KletterMix: Climbing Toward High-Quality German Pretraining Data

High-quality pretraining data is a central ingredient in modern language models, but German-language resources

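Suited to MI, Quality prediction/anomaly detection, Natural language processing, RAG, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-02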

MemTrain: Self-Supervised Context Memory Training

Memory is an indispensable capability for long-horizon LLM agents, enabling them to preserve and utilize infor

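Quality prediction/anomaly detection, Natural language processing, Large language models, Text, Self-supervised, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-02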

Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching

Wide-baseline matching (WBM) requires integrating geometric understanding, viewpoint changes, fine-grained per

Natural language processing, Large language models, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-02

AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation

We present AAD-1, an Asymmetric Adversarial Distillation framework for One-step autoregressive image-to-video

Deep learning, Model compression/quantization, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-02

Large Language Models Hack Rewards, and Society

Reinforcement learning (RL) has become a dominant post-training paradigm, enabling large language models (LLMs

Natural language processing, Large language models, Generation, Text, Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-02

WebRISE: Requirement-Induced State Evaluation for MLLM-Generated Web Artifacts

Existing benchmarks for MLLM-generated web artifacts assess interaction through local evidence and miss the re

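Quality prediction/anomaly detection, Natural language processing, Large language models, Image, Text, Video

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-02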

AUDITFLOW: Executable Symbolic Environments for Structured Financial Reporting Verification

Structured financial audit verification is difficult for language-model agents because correctness depends on

Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-02

BraveGuard: From Open-World Threats to Safer Computer-Use Agents

Computer-use agents extend language models from text generation to sustained interaction with files, terminals

Natural language processing, Large language models, Detection, Generation, Text

Use: Detection
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-02

Agent libOS: A Library-OS-Inspired Runtime for Long-Running, Capability-Controlled LLM Agents

Large language model (LLM) agents are evolving from request-response assistants into long-running software act

Natural language processing, Large language models, Regression, Image, Text

Use: Regression
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-02

When Graph Tokens Sink: A Mechanistic Analysis of Graph Language Models

Graph Language Models (GLMs) have become a promising direction for adapting Large Language Models (LLMs) to gr

Deep learning, Model compression/quantization, Text, Multimodal

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-02

Unlocking Feature Learning in Gated Delta Networks at Scale

Training and scaling Large Language Models demand enormous computational resources, motivating both efficient

Deep learning, Transformer, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | GitHub available | Hugging Face available | 2026-06-02

Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning

Large language models improve final-answer accuracy through extended chain-of-thought reasoning, but often spe

Deep learning, Model compression/quantization, Generation, Text, Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-01

LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models

Agentic language model systems alternate between two structurally distinct step types: structured tool calls (

Quality prediction/anomaly detection, Deep learning, Transformer, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-01

Parametric Social Identity Injection and Diversification in Public Opinion Simulation

Large language models (LLMs) have recently been adopted as synthetic agents for public opinion simulation, off

Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-01

Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents

Financial AI agents often fail for a simple reason: they make users carry the complexity. A user must repeated

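Quality prediction/anomaly detection, Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-01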

AdaCodec: A Predictive Visual Code for Video MLLMs

Video is temporally redundant: adjacent frames usually share most objects, background, and layout. Yet existin

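Natural language processing, Large language models, Image, Text, Video

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-01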

LLM Anonymization Against Agentic Re-Identification

Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become c

Natural language processing, Large language models, Detection, Text

Use: Detection
Difficulty: Easy
Cost: High

huggingface | GitHub available | Hugging Face available | 2026-06-01

Cosmos 3: Omnimodal World Models for Physical AI

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, i

Deep learning, Transformer, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | GitHub available | Hugging Face available | 2026-06-01

Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation

On-Policy distillation (OPD) in large language models is shifting from full-trace KL supervision toward more s

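Quality prediction/anomaly detection, Deep learning, Model compression/quantization, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-06-01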

MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

Abundant procedural knowledge on the Web holds great potential for helping agents solve long-horizon tasks. Ho

Natural language processing, RAG, Regression, Text, Multimodal

Use: Regression
Difficulty: Easy
Cost: High

huggingface | GitHub available | Hugging Face available | 2026-05-31

SABER: Benchmarking Operational Safety of LLM Coding Agents in Stateful Project Workspaces

Large language models are increasingly deployed as coding agents, shifting safety from individual responses to

Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-05-31

BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution

The rapid progress of frontier large language models has led to widespread benchmark saturation, limiting the

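Quality prediction/anomaly detection, Natural language processing, Large language models, Generation, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-05-30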

SDR: Set-Distance Rewards for Radiology Report Generation

Reinforcement learning with verifiable rewards has rapidly advanced reasoning in vision--language models. Howe

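Quality prediction/anomaly detection, Deep learning, Transformer, Generation, Text, Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-05-30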

Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

Agentic search systems iteratively interact with retrieval models to answer complex queries. Despite substanti

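Quality prediction/anomaly detection, Natural language processing, RAG, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-05-30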

SuperMemory-VQA: An Egocentric Visual Question-Answering Benchmark for Long-Horizon Memory

AI glasses present a compelling platform for AI agents to serve as personalized memory assistants. To be genui

Deep learning, Transformer, Classification, QA, Image

Use: Classification
Difficulty: Easy
Cost: High

huggingface | GitHub available | Hugging Face available | 2026-05-29

The Shape of Addition: Geometric Structures of Arithmetic in Large Language Models

Large Language Models exhibit paradoxical fragility in fundamental arithmetic, implying a disconnect between i

Deep learning, Model compression/quantization, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-05-29

MechVQA: Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding

Multimodal Large Language Models (MLLMs) have demonstrated significant achievements in general visual question

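Quality prediction/anomaly detection, Natural language processing, Large language models, Classification, QA, Image

Use: Classification
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-05-29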

Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination

Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as the cornerstone for shaping the

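Quality prediction/anomaly detection, Natural language processing, Large language models, Generation, Text, Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: High

huggingface | GitHub available | Hugging Face available | 2026-05-29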

OpenSTBench: Beyond Semantic Evaluation for Speech Translation

Speech translation systems increasingly span speech-to-text translation (S2TT), speech-to-speech translation (

Quality prediction/anomaly detection, Computer vision, Video recognition, Generation, Text, Audio

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-05-29

SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes

Humans can effortlessly perceive spatial layouts, form cognitive representations, reason about spatial relatio

Computer vision, 3D/point cloud, Detection, Text, 3D

Use: Detection
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-05-28

Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents

Memory-augmented LLM agents tackle complex long-horizon tasks by recursively summarizing interaction trajector

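Quality prediction/anomaly detection, Natural language processing, Large language models, Text, Self-supervised, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-05-28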

Multimodal Music Recommendation System using LLMs

Music recommendation systems typically treat songs as opaque tokens, relying on collaborative interaction hist

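Sensor/time series, Quality prediction/anomaly detection, Deep learning, Transformer, Text, Audio, Multimodal

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-05-28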

Stable-Layers: Fine-Tuning Image Layer Decomposition Models with VLM-Scored Reinforcement Learning

We present Stable-Layers, a reinforcement learning framework that eliminates the need for paired supervision b

Natural language processing, Fine-tuning, Image, Text, Multimodal

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-05-27

Pruning and Distilling Mixture-of-Experts into Dense Language Models

Mixture-of-Experts (MoE) is now the dominant architecture for frontier language models, yet it requires all ex

Deep learning, Model compression/quantization, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-05-27

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

Language models can use verifiable rewards to improve at a wide variety of reasoning tasks. However, both para

Explainable, Deep learning, Model compression/quantization, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-05-27

Augmenting Attention with Exponentially Decaying Memory Improves Query-Aware KV Sparsity

Efficient inference is critical for long-context language models, where attention computation and KV-cache acc

Deep learning, Model compression/quantization, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-05-26

DEI: Diversity in Evolutionary Inference for Quality-Diversity Search

We present DEI: Diversity in Evolutionary Inference, a distributed Quality-Diversity (QD) search framework tha

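Quality prediction/anomaly detection, Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-05-25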

When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple eva

Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-05-22

SPACENUM: Revisiting Spatial Numerical Understanding in VLMs

Vision-Language Models (VLMs) are increasingly deployed in embodied environments, where they need produce nume

Natural language processing, Fine-tuning, Image, Text, Multimodal

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-05-04

Liberating LLM Capabilities in Full-Duplex Speech Models

Speech-based large language models are typically constrained to spoken replies, which limits their user-facing

Natural language processing, Large language models, Generation, Text, Audio

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-04-16

Is This Edit Correct? A Multi-Dimensional Benchmark for Reasoning-Aware Image Editing

Diffusion-based image editing has achieved strong visual fidelity under natural language instructions, yet mos

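Quality prediction/anomaly detection, Deep learning, Model compression/quantization, Image, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High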