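MLinfo | Machine Learning and AI Paper Digest

MLinfo | Catch up on and search continuously updated technology

Search results for "text"

190 results

All | arxiv | github | huggingface | With implementation

github | GitHub available | 2026-06-10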

screenpipe — YC (S26) | AI that knows what you've seen, said, or heard. Records everything you do, say, hear 24/7, local, private, secure

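A tool for recognizing user activity and building automated agents.

Natural language processing, Large language models, Text, Audio, Multimodal

Use: Building automated agents
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-06-09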

transformers — 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

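🤗 Transformers is a model-definition framework for text, vision, audio, and multimodal models that can be used for both inference and training.

Deep learning, Transformer, Classification, Text, Audio

Use: Machine learning model definition
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-06-09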

paperless-ngx — A community-supported supercharged document management system: scan, index and archive all your documents

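paperless-ngx is a community-supported, supercharged document management system that can scan, index, and archive documents.

Reinforcement learning, Policy gradient (PPO / A3C), Classification, Text

Use: Document management
Difficulty: Easy
Cost: Low

→

github | GitHub available | 2026-06-09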

diffusers — 🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

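A library of diffusion models. It can be used for image, video, and audio generation.

Generative AI, Diffusion models, Generation, Image, Text

Use: Image, video, and audio generation
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-06-09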

label-studio — Label Studio is a multi-type data labeling and annotation tool with standardized output format

A tool for data labeling and annotation.

Computer vision, Object detection, Classification, Segmentation, Image

Use: Data labeling tool
Difficulty: Easy
Cost: Low

→

github | GitHub available | 2026-06-09

cs249r_book — Machine Learning Systems

A book on the theory and implementation of machine learning systems.

Deep learning, Text

Use: Machine learning systems
Difficulty: Easy
Cost: Medium

→

github | GitHub available | 2026-06-09

awesome-llm-unlearning — A resource repository for machine unlearning in large language models

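This repository collects resources on machine unlearning in large language models.

Natural language processing, Large language models, Text

Use: Machine unlearning for large language models
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-06-09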

Meshroom — Node-based Visual Programming Toolbox

A node-based visual programming tool.

Computer vision, 3D / point clouds, Image, Text, 3D

Use: Visual programming tool
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-06-09

unsloth — Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.

Unsloth Studio is a web UI for training and running open models locally, including Gemma 4 and Qwen3.6.

Natural language processing, Large language models, Text, Audio

Use: Training and running open models
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-06-09

sglang — SGLang is a high-performance serving framework for large language models and multimodal models.

SGLang is a high-performance serving framework for large language models.

Deep learning, Transformer, Image, Text, Multimodal

Use: Serving framework for large language models
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-06-09

Sana — SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

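This work introduces SANA, a high-resolution image generation model that produces high-quality, high-resolution images at low computational cost.

Deep learning, Transformer, Generation, Image, Text

Use: High-resolution image synthesis
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-06-09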

Helios — Helios: Real Real-Time Long Video Generation Model

Presents a model for generating long videos.

Deep learning, Model compression / quantization, Generation, Image, Text

Use: Video generation
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-06-09

haystack — Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.

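An open-source AI orchestration framework. It lets you design the pipelines and agent workflows needed to build LLM applications.

Deep learning, Transformer, Generation, Summarization, Text

Use: Building LLM applications
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-06-09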

DocsGPT — Private AI platform for agents, assistants and enterprise search. Built-in Agent Builder, Deep research, Document analysis, Multi-model support, and API connectivity for agents.

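A private AI platform for agents, assistants, and enterprise search, with a built-in agent builder, document analysis, and multi-model support.

Deep learning, Transformer, Text

Use: Agents, assistants, and enterprise search
Difficulty: Easy
Cost: Medium

→

github | GitHub available | 2026-06-09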

FunASR — Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

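An industrial-grade speech recognition toolkit with speaker diarization, emotion detection, streaming, and an OpenAI-compatible API.

Deep learning, Transformer, Classification, Detection, Text

Use: Speech recognition
Difficulty: Easy
Cost: Low

→

github | GitHub available | 2026-06-09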

unstructured — Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise grade Platform product for production grade workflows, partitioning, enrichments, chunking and embedding.

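An open-source ETL solution for converting documents into structured data.

Suited to tabular data, Natural language processing, Large language models, Image, Text, Tabular

Use: Document structuring
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-06-09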

txtai — 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

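A framework for working with LLMs that provides semantic search, LLM orchestration, and related workflows.

Deep learning, Transformer, Generation, Text

Use: Semantic search
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-06-09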

TextBlob — Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

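A library for text analysis, including sentiment analysis and tokenization.

Natural language processing, Text, Audio

Use: Text analysis
Difficulty: Easy
Cost: Medium

→

arxiv | GitHub available | 2026-06-08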

Internalizing Geometric Law: Learning from Solver Residuals for Precision-Critical Generation

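We developed a system that produces precise constructions, such as mechanical designs and technical drawings, from natural language. To produce precise constructions that satisfy geometric constraints, the system … Constraint DSL (D

Natural language processing, Large language models, Generation, Text

Use: Generating mechanical designs and technical drawings
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-08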

Beyond FLOPs: Benchmarking Real Inference Acceleration of LLM Pruning under a GEMM-Centric Taxonomy

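Pruning, which removes tokens, layers, heads, dimensions, or attention patterns to speed up LLM inference, has grown into a broad paradigm. Depending on how a pruning method is implemented, the degree of acceleration actually achieved … hardware

Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Text

Use: LLM inference acceleration
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-08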

INFUSER: Influence-Guided Self-Evolution Improves Reasoning

Self-evolution offers a scalable path to stronger reasoning: a pretrained language model improves itself with

Machine learning, Unsupervised learning, Text, Unsupervised

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-08

TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs

Clinical early warning systems built on electronic health records, in which clinical observations are recorded

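Explainable, Sensor / time series, Quality prediction / anomaly detection, Natural language processing, Large language models, Text, Time series

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-08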

An 84-Format Numeric Catalog with Bit-Exact Conformance Vectors: A Vendor-Neutral Reference for FP8, BF16, MXFP4, and Microscaling Formats

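This paper proposes standardizing numeric formats, so that numbers can be interpreted and manipulated more efficiently.

Machine learning, Supervised learning, Text

Use: Standardizing numeric formats
Difficulty: Easy
Cost: Medium

→

arxiv | GitHub available | 2026-06-08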

From Rigid to Dynamic: Entropy-Guided Adaptive Inference for Long-Context LLMs

Existing sparse attention and KV cache compression methods for long-context LLM inference typically apply fixe

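Quality prediction / anomaly detection, Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-08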

PhysScene: A Scene Graph Dataset for Scientific Visual Reasoning in Physics Experiments

Scene Graphs (SGs) provide structured representations of visual scenes by modeling objects and their pairwise

Reinforcement learning, Policy gradient (PPO / A3C), Image, Text

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: Medium

→

arxiv | GitHub available | 2026-06-08

TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders

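Proposes TRL-Bench, a benchmark that makes it easier to evaluate tabular representation models trained under different paradigms.

Suited to tabular data, Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Embedding, Text, Tabular

Use: Standardizing the evaluation of representation models for tabular data
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-08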

DynaOD: Dynamic Origin-Destination Flow Generation with Discrete-to-Continuous Temporal Semantic Modeling

Dynamic origin-destination (OD) flow generation seeks to synthesize realistic mobility dynamics from temporal

Deep learning, Transformer, Generation, Text

Use: Generation
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-08

FAME: Forecastability-Aware Mixture of Experts for Heterogeneous Time Series Forecasting

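This study proposes FAME, a model that combines multiple time series forecasters and accounts for the characteristics of each individual series. Taking those characteristics into account enables more accurate forecasts.

Suited to tabular data, Easy to try on CPU, Sensor / time series, Deep learning, Transformer, Forecasting, Text, Time series

Use: Heterogeneous time series forecasting
Difficulty: Easy
Cost: Low

→

arxiv | GitHub available | 2026-06-08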

Gradient-Guided Reward Optimization for Inference-time Alignment

Ensuring the reliability of Large Language Models (LLMs) under distribution drift requires inference-time adap

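Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Detection, Generation, Text

Use: Detection
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-08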

Civil Court Simulation with Large Language Models

Court simulation bridges legal education and judicial practice, yet human-based simulations are costly and dif

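Quality prediction / anomaly detection, Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-08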

MUDIDI: A Two-Stage Framework for Multilingual Dictionary Digitization with Language Models

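Digitizing dictionaries of low-resource and extinct languages is important, but digitizing multimodal dictionaries has been difficult until now. This study uses recent vision-language models to make dictionary digitization easier, and the characters in the dictionary

Quality prediction / anomaly detection, Natural language processing, Large language models, Classification, Segmentation, Text

Use: Multilingual dictionary digitization
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-08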

What Should a Skill Remember? Quality-Cost Trade-offs in Cost-Aware Skill Rewriting for Language Model Agents

Large language model agents increasingly rely on skills: reusable procedural documents encoding workflows, too

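Quality prediction / anomaly detection, Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-08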

LexRubric: A Rubric-Guided Diagnostic Benchmark for Open-Ended Legal Tasks

As large language models (LLMs) are increasingly applied to real-world legal tasks, evaluating the reliability

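Quality prediction / anomaly detection, Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-08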

Language-Aware Token Boosting: LLM Language Confusion Reduction Without Tuning

Large language models (LLMs) sometimes exhibit language confusion when generating non-English text. Existing a

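Quality prediction / anomaly detection, Natural language processing, Large language models, Summarization, Text

Use: Summarization
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-08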

Are Reasoning Vision-Language Models Robust to Semantic Visual Distractions?

Reasoning Vision-Language Models (VLMs) achieve strong performance on complex multimodal tasks, but reliable r

Computer vision, Multimodal, Image, Text

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-08

Echo-DM: Ultrasound Marker Removal via Conditional Latent Diffusion and Region-Aware Fusion

Clinical ultrasound images often contain artificial markers, such as measurement calipers and text, to assist

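Quality prediction / anomaly detection, Natural language processing, RAG, Image, Text, Audio

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-08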

Temporal-Aware Reasoning Optimization for Video Temporal Grounding

Multi-modal Large Language Models (MLLMs) have achieved remarkable progress in video temporal grounding with r

Quality prediction / anomaly detection, Deep learning, Transformer, Detection, Image, Text

Use: Detection
Difficulty: Hard
Cost: High

→

github | GitHub available | 2026-06-08

mxcp — Model eXecution + Context Protocol: Enterprise-Grade Data-to-AI Infrastructure

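Building data-to-AI infrastructure helps solve business problems. This project presents MXCP (Model eXecution + Context Protocol), which simplifies data transformation and … AI app

Natural language processing, Large language models, Text

Use: Improving business operations by building data-to-AI infrastructure
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-06-08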

VoxCPM — VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

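A framework for tokenizer-free TTS, supporting multilingual speech generation, creative voice design, and true-to-life voice cloning.

Generative AI, Speech / music generation, Generation, Text, Audio

Use: Multilingual speech generation
Difficulty: Easy
Cost: Medium

→

arxiv | GitHub available | 2026-06-07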

BLM-SGAN: Bidirectional Language Modeling for Semantic-Spatial Text-to-Image Generation

Despite the success of image generation from text descriptions, it still faces challenges that are difficult t

Deep learning, Transformer, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: Low

→

arxiv | GitHub available | 2026-06-07

IR-SIM: A Lightweight Skill-Native Simulator for Navigation, Learning, and Benchmarking

Simulation plays a key role in automated robotics research supported by large language models (LLMs). However,

Sensor / time series, Deep learning, Model compression / quantization, Generation, Image, Text

Use: Generation
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-07

Artificial Intelligence for Mathematical Reasoning: An Integrated Survey of Language Models, Neuro-symbolic Systems, and Verified Discovery

Mathematical reasoning has long served as a stringent test of machine intelligence; over the past decade, it h

Suited to MI, Natural language processing, Large language models, Generation, Text, Multimodal

Use: Generation
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-07

Lost in the Non-convex Loss Landscape: How to Fine-tune the Large Time Series Model?

Recently, large time series models (LTSMs) have gained increasing attention due to their similarities to large

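Sensor / time series, Natural language processing, Large language models, Text, Time series

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-07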

Can LLMs understand LilyPond? A benchmark for symbolic music generation and understanding

Symbolic music evaluation for large language models remains fragmented across representations, datasets, and m

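Quality prediction / anomaly detection, Deep learning, Transformer, Classification, Generation, Text

Use: Classification
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-07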

TVI-CoT: Text-Visual Interleaved Chain-of-Thought Reasoning for Multimodal Understanding

Chain-of-thought (CoT) reasoning has proven effective for enhancing problem-solving in large language models.

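Natural language processing, Large language models, Image, Text, Multimodal

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-07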

Segmentation-Assisted Brain MRI Synthesis with Cross-Image Multi-Contrast Feature Memory Bank Retrieval Augmentation

Multi-contrast brain MRI provide complementary soft-tissue characteristics that aid in the screening and diagn

Suited to tabular data, Computer vision, Segmentation, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: Low

→

huggingface | GitHub available | Hugging Face available | 2026-06-07

Trajectory-Refined Distillation

On-policy distillation (OPD) has become a central post-training tool for large language models (LLMs), providi

Deep learning, Model compression / quantization, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-06-07

presidio — An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.

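presidio is an open-source framework for detecting, redacting, masking, and anonymizing sensitive data across text, images, and structured data. It supports NLP, pattern matching, and customizable pipelines.

Suited to tabular data, Deep learning, Transformer, Classification, Detection, Image

Use: Protecting data privacy
Difficulty: Easy
Cost: Low

→

arxiv | GitHub available | 2026-06-06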

Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses

LLM agents increasingly rely on external inference conditions: prompts, tools, memory, SOPs, skills, and harne

Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-06

Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?

Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in visual understanding, yet the

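Explainable, Quality prediction / anomaly detection, Natural language processing, Large language models, Image, Text, Multimodal

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-06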

Diffusion Language Model Parallel Decoding via Product-of-Experts Bridge

Diffusion language models (DLMs) offer substantial speed advantages through parallel decoding, but the lack of

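Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Generation, Text

Use: Generation
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-06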

Defending Against Malicious Finetuning by Scaling Train-time Adversarial Attacks

Current open-weight large language models (LLMs) are prone to malicious finetuning attacks, which could compro

Deep learning, Model compression / quantization, Text

Use: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-06

Illusions of the Gold Standard: A Large-scale Analysis of Human Evaluation Protocols for Long-form Text Generation

Human evaluation plays a critical role in assessing the quality of generated text. However, the reliability an

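Quality prediction / anomaly detection, Natural language processing, Large language models, Generation, Text

Use: Generation
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-06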

How Much MRI Preprocessing Is Enough? A Cost-Utility Study for Brain MRI Foundation Models

MRI preprocessing defines the input distribution seen by brain MRI foundation models, yet it is usually treate

Deep learning, Transformer, Classification, Segmentation, Regression

Use: Classification
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-06

Property-Informed Diffusion-Based Text-to-Microstructure Generation

Designing 3D metamaterial microstructures that meet the intended functions remains a major challenge, as it ty

Natural language processing, RAG, Generation, Text, 3D

Use: Generation
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-06

VideoWeaver: Evaluating and Evolving Skills for Agentic Long Video Generation

Recent agent frameworks such as Claude Code, Codex, and OpenClaw are strong at tool use and orchestration, but

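Suited to MI, Quality prediction / anomaly detection, Natural language processing, Large language models, Generation, Image, Text

Use: Generation
Difficulty: Hard
Cost: High

→

github | GitHub available | 2026-06-06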

testtimescaling.github.io — "what, how, where, and how well? a survey on test-time scaling in large language models" repository

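Repository for a survey on test-time scaling in large language models.

Natural language processing, Large language models, Text

Use: Test-time scaling for large language models
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-06-06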

DiT-Extrapolation — Official implementation for "RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers" (ICML 2025) , UltraViCo (ICLR 2026) and UltraImage

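Official implementation of RIFLEx, a method for length extrapolation in video diffusion transformers, together with UltraViCo and UltraImage.

Deep learning, Transformer, Generation, Image, Video

Use: Length extrapolation for video diffusion transformers
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-06-06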

awesome-nlp — :book: A curated list of resources dedicated to Natural Language Processing (NLP)

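This repository is a curated collection of resources on natural language processing (NLP).

Natural language processing, Text

Use: Collection of NLP resources
Difficulty: Easy
Cost: Medium

→

arxiv | GitHub available | 2026-06-05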

LLM-Guided Evolution for Medical Decision Pipelines

Adapting large language models (LLMs) to clinical workflows often requires costly fine-tuning or manual prompt

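Explainable, Natural language processing, Large language models, Classification, Image, Text

Use: Classification
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-05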

RhinoVLA Technical Report

This paper proposes a method, in the form of a framework, for deploying VLA models on edge hardware. The method uses edge hardware to V

Deep learning, Model compression / quantization, Image, Text, Multimodal

Use: Deploying VLA models on edge hardware
Difficulty: Hard
Cost: High

→

huggingface | Hugging Face available | 2026-06-05

SWE-Explore: Benchmarking How Coding Agents Explore Repositories

Repository-level coding benchmarks such as SWE-bench have driven a rapid surge in the capabilities of coding a

Deep learning, Model compression / quantization, Detection, Text

Use: Detection
Difficulty: Easy
Cost: Low

→

huggingface | Hugging Face available | 2026-06-05

On the Geometry of On-Policy Distillation

On-policy distillation (OPD) is increasingly used to improve large language model reasoning, but its training

Deep learning, Model compression / quantization, Detection, Generation, Text

Use: Detection
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-05

SigmaScale: LLM Compression with SVD-based Low-Rank Decomposition and Learned Scaling Matrices

We present SigmaScale, a method for learning auxiliary scaling matrices S to aid truncated Singular Value Deco

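Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | GitHub available | Hugging Face available | 2026-06-05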

Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

Large language models exhibit impressive zero-shot capabilities across a wide range of downstream tasks. Howev

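Quality prediction / anomaly detection, Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-05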

MMAE: A Massive Multitask Audio Editing Benchmark

We introduce MMAE, a Massive Multitask Audio Editing benchmark, serving as the first comprehensive evaluation

Suited to MI, Natural language processing, Large language models, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-05

AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization

Despite being a pivotal frontier, interactive world modeling remains underexplored in terms of the versatile c

Computer vision, 3D / point clouds, Text, 3D

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | GitHub available | Hugging Face available | 2026-06-05

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

Video understanding is being rapidly transformed by multimodal large language models (MLLMs), as research move

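Deep learning, Model compression / quantization, Image, Text, Audio

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-05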

dots.tts Technical Report

We present dots.tts, a 2B-parameter continuous autoregressive text-to-speech (TTS) foundation model that model

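Sensor / time series, Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Generation, Text, Audio

Use: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-05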

Towards Retrieving Interaction Spaces for Agentic Search

Retrieval for search agents is still inherited from non-agentic information retrieval: a retriever ranks the c

Natural language processing, Large language models, Search, Text

Use: Search
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-05

Stream3D-VLM: Online 3D Spatial Understanding with Incremental Geometry Priors

Despite advances in 3D scene understanding, existing 3D Large Multimodal Models operate in offline settings, r

Deep learning, Model compression / quantization, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-05

Entropy as a Structural Prior: How a Log-Barrier on DiT Belief Space Drives Musical Diversity and Development

Confidence-based loss weighting is usually avoided in generative models because it accelerates errors when the

Sensor / time series, Natural language processing, Fine-tuning, Generation, Text, Audio

Use: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-05

Empirical Study on the Characteristics and Evolution of AI-usage in GitHub Repositories: Evidence from Code Comments

Developers increasingly use AI tools such as ChatGPT, Copilot, and Claude in everyday software workflows, but

Deep learning, Transformer, Classification, Generation, Text

Use: Classification
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-05

ECI_{sem}: Semantic Residual Effective Contrastive Information for Evaluating Hard Negatives

Hard-negative source selection for dense retrieval is usually decided only after fine-tuning and downstream ev

Deep learning, Transformer, Search, Text

Use: Search
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-05

How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity? Capabilities and Boundaries in Multi-Genre Chord-Symbol Modeling

Harmony is a compact symbolic layer where mathematical pitch relations, acoustic consonance, and musical conve

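Explainable, Sensor / time series, Quality prediction / anomaly detection, Deep learning, Transformer, Classification, Text, Audio

Use: Classification
Difficulty: Easy
Cost: Low

→

github | GitHub available | 2026-06-05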

Causal-Forcing — [ICML 2026] Official codebase for "Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation" & Causal Forcing++

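In this paper, Causal-Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive

Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Generation, Text, Video

Use: High-quality video generation
Difficulty: Easy
Cost: High

→

arxiv | GitHub available | 2026-06-04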

TorchKM: A GPU-Oriented Library for Kernel Learning and Model Selection

TorchKM is an open-source library for kernel machines, including support vector machines, kernel logistic regr

Easy to try on CPU, Reinforcement learning, Policy gradient (PPO / A3C), Regression, Text

Use: Regression
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-06-04

A Conversational Framework for Human-Robot Collaborative Manipulation with Distributed Generative AI models

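This study proposes a distributed conversational framework for human-robot collaboration.

Natural language processing, Large language models, Generation, Image, Text

Use: Human-robot collaboration
Difficulty: Hard
Cost: High

→

huggingface | Hugging Face available | 2026-06-04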

LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents

Agent systems increasingly use textual skills to encode reusable task procedures, but injecting these skills i

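Suited to MI, Deep learning, Model compression / quantization, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04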

A Geometric Account of Activation Steering through Angle-Norm Decomposition

Linear activation steering has gained popularity as a simple and empirically effective way to control language

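Explainable, Deep learning, Model compression / quantization, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: Medium

→

huggingface | Hugging Face available | 2026-06-04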

Answer Presence Drives RAG Rewriting Gains

Retrieval-augmented QA pipelines often route retrieved passages through an LLM rewriter before a smaller reade

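Quality prediction / anomaly detection, Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04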

Cosine Misleads: Auxiliary Losses Reshape Vision Language Models, Not Their Latents

Latent visual reasoning (LVR) inserts supervised latent tokens between perception and answer generation in vis

Quality prediction / anomaly detection, Computer vision, Multimodal, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04

SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations

Evaluating LLM mediators remains challenging, as mediation unfolds as a real-time trajectory shaped by disputa

Suited to MI, Deep learning, Transformer, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04

Direct 3D-Aware Object Insertion via Decomposed Visual Proxies

Object insertion aims to seamlessly composite a reference object into a specified region of a background image

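Suited to MI, Quality prediction / anomaly detection, Computer vision, 3D / point clouds, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04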

OpenSkill: Open-World Self-Evolution for LLM Agents

Self-evolving agents requires adaptation after deployment, but existing approaches assume a usable learning lo

Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04

SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents

Persistent AI assistants, such as OpenClaw, accumulate large collections of related memories over long-term in

Machine learning, Supervised learning, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: Low

→

huggingface | Hugging Face available | 2026-06-04

UnpredictaBench: A Benchmark for Evaluating Distributional Randomness in LLMs

We introduce UnpredictaBench, an evaluation that tests the ability of large language models (LLMs) to capture

Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04

Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

While Vision-Language Models (VLMs) have shown strong visual reasoning capabilities, their spatial reasoning a

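Natural language processing, Large language models, Image, Text, Multimodal

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04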

LLM Explainability with Counterfactual Chains and Causal Graphs

Causal graphs provide a high-level language for making mechanisms transparent. Recent work uses Large Language

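Explainable, Natural language processing, Large language models, Classification, Text

Use: Classification
Difficulty: Easy
Cost: High

→

huggingface | GitHub available | Hugging Face available | 2026-06-04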

Almieyar-Oryx-BloomBench: A Bilingual Multimodal Benchmark for Cognitively Informed Evaluation of Vision-Language Models

Despite the rapid progress of Vision-Language Models (VLMs), the field lacks benchmarks that rigorously diagno

Quality prediction / anomaly detection, Deep learning, Transformer, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04

Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation

We study the transformation of autoregressive models (ARLMs) into diffusion language models (DLMs). Rather tha

Deep learning, Model compression / quantization, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04

WorldBench: A Challenging and Visually Diverse Multimodal Reasoning Benchmark

In real-world applications, models are expected to perform reliably across diverse settings. Yet, many existin

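Natural language processing, Large language models, Image, Text, Multimodal

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04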

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing

Deep learning, RNN / LSTM, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04

ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?

Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story prog

Natural language processing, Fine-tuning, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: Low

→

huggingface | Hugging Face available | 2026-06-04

AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints

Planning for real-world problems by language models often involves both world and user constraints, which may

Natural language processing, Large language models, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04

LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing

Developing unified video generation and editing models capable of interpreting interleaved multimodal inputs i

Deep learning, Transformer, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04

Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation

Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by under

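Deep learning, Model compression / quantization, Text, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | GitHub available | Hugging Face available | 2026-06-04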

MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery

Natural language processing, Large language models, Generation, Text

Use: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04

LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs

Large language models can reproduce training data, but existing memorization evaluations mostly measure whethe

Deep learning, Model compression / quantization, Generation, Text

Use: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04

Towards One-to-Many Temporal Grounding

Temporal Grounding (TG) aims to localize video segments corresponding to a textual query. Prior research predo

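Quality prediction / anomaly detection, Natural language processing, Large language models, Text, Video

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04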

Latent Reasoning with Normalizing Flows

Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the

Natural language processing, Large language models, Generation, Text

Use: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04

Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction

Video event prediction (VEP) requires models to infer unobserved future states from partial video evidence. Ex

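Natural language processing, Large language models, Image, Text, Video

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04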

Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) excel at 2D semantic understanding but lack intrinsic 3D awareness, r

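Suited to tabular data, Natural language processing, Large language models, Text, Video, 3D

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04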

Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions

Large language models are increasingly used to simulate social media users and infer how individuals may respo

Deep learning, Transformer, Text, Multimodal

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-04

Benchmark Everything Everywhere All at Once

Benchmarks are fundamental for evaluating and advancing LLMs and MLLMs by providing standardized and explicit

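Quality prediction / anomaly detection, Natural language processing, Large language models, Text, Multimodal

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-06-04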

Irodori-TTS — A Flow Matching-based Text-to-Speech Model with Emoji-driven Style Control

A flow matching-based text-to-speech model that uses emoji-driven style control to convert text into expressive speech.

Generative AI, Diffusion models, Generation, Text, Audio

Use: Text-to-speech
Difficulty: Easy
Cost: High

→

arxiv | GitHub available | 2026-06-03

HyFAD: Hybrid Time-Frequency Diffusion with Frequency-Aware Embedding for Time Series Imputation

Diffusion models have demonstrated strong performance in time series modeling due to their ability to progress

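Sensor / time series, Natural language processing, Embedding / retrieval, Generation, Text, Time series

Use: Generation
Difficulty: Hard
Cost: High

→

huggingface | Hugging Face available | 2026-06-03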

Why Muon Outperforms Adam: A Curvature Perspective

Muon improves training efficiency over Adam in large language-model training by about two times, but the local

Deep learning, Normalization / optimization methods, Text

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-03

Self-Evaluation Is Already There: Eliciting Latent Judge Calibration in Base LLMs with Minimal Data

Large language models are increasingly evaluated by other models, raising a natural question: can a model pred

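Suited to small data, Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Text, Reinforcement learning

Use: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-03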

Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models

Vision language models (VLMs) excel at many tasks but still struggle with spatial reasoning when critical info

Suited to tabular data | Explainable | Computer vision | Multimodal | Image | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-03

TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration

Agents are widely deployed as assistants over documents, tools, and code. However, they typically act only on

Natural language processing | RAG | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: Low

→

huggingface | Hugging Face available | 2026-06-03

VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding

We introduce VideoKR, the first large-scale training corpus specifically designed to strengthen knowledge- and

Natural language processing | Fine-tuning | Generation | Text | Video

Use case: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-03

Rethinking Continual Experience Internalization for Self-Evolving LLM Agents

Experience internalization converts contextual experience from past interactions into reusable parametric capa

Quality prediction/anomaly detection | Deep learning | Transformer | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-03

Personal AI Agent for Camera Roll VQA

We study the personal camera roll visual question answering setting. In this setting, a conversational AI assi

Deep learning | Model compression and quantization | QA | Image | Text

Use case: QA
Difficulty: Easy
Cost: Medium

→

huggingface | Hugging Face available | 2026-06-03

SePO: Self-Evolving Prompt Agent for System Prompt Optimization

System prompt optimization improves agent behavior without modifying the underlying model, yielding human-read

Natural language processing | RAG | Generation | Text

Use case: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-03

Video2LoRA: Parametric Video Internalization for Vision-Language Models

Processing video in vision-language models is expensive: each frame occupies hundreds of tokens, and inference

Natural language processing | Fine-tuning | Summarization | QA | Image

Use case: Summarization
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-03

BRepCLIP: Contrastive Multimodal Pretraining on BRep Primitives for CAD Understanding

Learning representations of CAD models is a largely open problem. While 3D representation learning has flouris

Deep learning | Transformer | Classification | Generation | Embedding

Use case: Classification
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-03

Audio Interaction Model

Audio is an inherently interactive modality, yet today's Large Audio Language Models (LALMs) are offline, and

Reinforcement learning | Multi-agent | Text | Audio

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-03

ZipSplat: Fewer Gaussians, Better Splats

Feed-forward 3D Gaussian Splatting methods reconstruct a scene from posed or pose-free images in a single forw

Quality prediction/anomaly detection | Deep learning | Transformer | Image | Text | 3D

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-03

MeshWeaver: Sparse-Voxel-Guided Surface Weaving for Autoregressive Mesh Generation

Autoregressive mesh generation has gained attention by tokenizing meshes into sequences and training models in

Deep learning | Attention mechanism | Generation | Text | 3D

Use case: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-03

STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations

Training Data Attribution (TDA) seeks to trace a model's predictions back to its training data. The gold stand

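Sensor/time series | Deep learning | Model compression and quantization | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High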

→

huggingface | Hugging Face available | 2026-06-03

Evaluating Large Language Models in Dynamic Clinical Decision-Making with Standardized Patient Cases

Large language models (LLMs) are increasingly proposed as clinical agents, yet static, single-turn benchmarks

Suited to MI | Natural language processing | Large language models | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | GitHub available | Hugging Face available | 2026-06-03

SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing

Instruction-guided speech editing requires a model to modify specified speech attributes while preserving unre

Natural language processing | Large language models | Generation | Text | Audio

Use case: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-02

Text-to-Image Models Need Less from Text Encoders Than You Think

Text-to-image models rely on text prompts as their primary interface to human intent. Prompts are encoded by a

Quality prediction/anomaly detection | Deep learning | Transformer | Generation | Image | Text

Use case: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-02

Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory

Equipping Large Language Models (LLMs) to execute reliable multi-step workflows has become a central challenge

Natural language processing | Large language models | Detection | Text

Use case: Detection
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-02

MAOAM: Unified Object and Material Selection with Vision-Language Models

Selection is a core operation in interactive image editing. To be practical, a user should be able to specify

Suited to MI | Natural language processing | RAG | Generation | Segmentation | Image

Use case: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-02

The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs

Inference-time scaling has emerged as a critical avenue for enhancing Large Language Models' performance, yet

Natural language processing | Large language models | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | GitHub available | Hugging Face available | 2026-06-02

EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management

Recent progress in Large Language Model (LLM) agents has enabled promising advances in automated data science.

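Deep learning | Model compression and quantization | Text | Reinforcement learning

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High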

→

huggingface | Hugging Face available | 2026-06-02

Qwen-Image-Flash: Beyond Objective Design

Few-step distillation has become an effective strategy for accelerating advanced visual generative models, yet

Suited to MI | Deep learning | Model compression and quantization | Generation | Image | Text

Use case: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-02

OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs

Multimodal agents in robotics, AR, and autonomous driving must reason about places and layouts from continuous

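Quality prediction/anomaly detection | Natural language processing | Large language models | Generation | Text | Video

Use case: Generation
Difficulty: Easy
Cost: High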

→

huggingface | GitHub available | Hugging Face available | 2026-06-02

Self-Distilled Policy Gradient

On-policy self-distillation, where a language model conditions on privileged context to supervise its own gene

Deep learning | Model compression and quantization | Generation | Text | Reinforcement learning

Use case: Generation
Difficulty: Easy
Cost: Medium

→

huggingface | Hugging Face available | 2026-06-02

KletterMix: Climbing Toward High-Quality German Pretraining Data

High-quality pretraining data is a central ingredient in modern language models, but German-language resources

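Suited to MI | Quality prediction/anomaly detection | Natural language processing | RAG | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High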

→

huggingface | Hugging Face available | 2026-06-02

MemTrain: Self-Supervised Context Memory Training

Memory is an indispensable capability for long-horizon LLM agents, enabling them to preserve and utilize infor

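Quality prediction/anomaly detection | Natural language processing | Large language models | Text | Self-supervised | Reinforcement learning

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High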

→

huggingface | Hugging Face available | 2026-06-02

Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching

Wide-baseline matching (WBM) requires integrating geometric understanding, viewpoint changes, fine-grained per

Natural language processing | Large language models | Generation | Image | Text

Use case: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-02

AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation

We present AAD-1, an Asymmetric Adversarial Distillation framework for One-step autoregressive image-to-video

Deep learning | Model compression and quantization | Generation | Image | Text

Use case: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-02

Large Language Models Hack Rewards, and Society

Reinforcement learning (RL) has become a dominant post-training paradigm, enabling large language models (LLMs

Natural language processing | Large language models | Generation | Text | Reinforcement learning

Use case: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-02

WebRISE: Requirement-Induced State Evaluation for MLLM-Generated Web Artifacts

Existing benchmarks for MLLM-generated web artifacts assess interaction through local evidence and miss the re

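Quality prediction/anomaly detection | Natural language processing | Large language models | Image | Text | Video

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High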

→

huggingface | Hugging Face available | 2026-06-02

AUDITFLOW: Executable Symbolic Environments for Structured Financial Reporting Verification

Structured financial audit verification is difficult for language-model agents because correctness depends on

Natural language processing | Large language models | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-02

BraveGuard: From Open-World Threats to Safer Computer-Use Agents

Computer-use agents extend language models from text generation to sustained interaction with files, terminals

Natural language processing | Large language models | Detection | Generation | Text

Use case: Detection
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-02

Agent libOS: A Library-OS-Inspired Runtime for Long-Running, Capability-Controlled LLM Agents

Large language model (LLM) agents are evolving from request-response assistants into long-running software act

Natural language processing | Large language models | Regression | Image | Text

Use case: Regression
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-02

When Graph Tokens Sink: A Mechanistic Analysis of Graph Language Models

Graph Language Models (GLMs) have become a promising direction for adapting Large Language Models (LLMs) to gr

Deep learning | Model compression and quantization | Text | Multimodal

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-02

Unlocking Feature Learning in Gated Delta Networks at Scale

Training and scaling Large Language Models demand enormous computational resources, motivating both efficient

Deep learning | Transformer | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | GitHub available | Hugging Face available | 2026-06-02

Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning

Large language models improve final-answer accuracy through extended chain-of-thought reasoning, but often spe

Deep learning | Model compression and quantization | Generation | Text | Reinforcement learning

Use case: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-01

LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models

Agentic language model systems alternate between two structurally distinct step types: structured tool calls (

Quality prediction/anomaly detection | Deep learning | Transformer | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-01

Parametric Social Identity Injection and Diversification in Public Opinion Simulation

Large language models (LLMs) have recently been adopted as synthetic agents for public opinion simulation, off

Natural language processing | Large language models | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-01

Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents

Financial AI agents often fail for a simple reason: they make users carry the complexity. A user must repeated

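Quality prediction/anomaly detection | Natural language processing | Large language models | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High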

→

huggingface | Hugging Face available | 2026-06-01

AdaCodec: A Predictive Visual Code for Video MLLMs

Video is temporally redundant: adjacent frames usually share most objects, background, and layout. Yet existin

Natural language processing | Large language models | Image | Text | Video

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-06-01

LLM Anonymization Against Agentic Re-Identification

Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become c

Natural language processing | Large language models | Detection | Text

Use case: Detection
Difficulty: Easy
Cost: High

→

huggingface | GitHub available | Hugging Face available | 2026-06-01

Cosmos 3: Omnimodal World Models for Physical AI

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, i

Deep learning | Transformer | Generation | Image | Text

Use case: Generation
Difficulty: Easy
Cost: High

→

huggingface | GitHub available | Hugging Face available | 2026-06-01

Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation

On-Policy distillation (OPD) in large language models is shifting from full-trace KL supervision toward more s

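Quality prediction/anomaly detection | Deep learning | Model compression and quantization | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High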

→

huggingface | Hugging Face available | 2026-06-01

MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

Abundant procedural knowledge on the Web holds great potential for helping agents solve long-horizon tasks. Ho

Natural language processing | RAG | Regression | Text | Multimodal

Use case: Regression
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-06-01

FinGPT — FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.

This repository provides open-source financial large language models; the trained model is released on Hugging Face.

Natural language processing | Large language models | Text

Use case: Financial large language models
Difficulty: Easy
Cost: High

→

huggingface | GitHub available | Hugging Face available | 2026-05-31

SABER: Benchmarking Operational Safety of LLM Coding Agents in Stateful Project Workspaces

Large language models are increasingly deployed as coding agents, shifting safety from individual responses to

Natural language processing | Large language models | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-05-31

BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution

The rapid progress of frontier large language models has led to widespread benchmark saturation, limiting the

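Quality prediction/anomaly detection | Natural language processing | Large language models | Generation | Text

Use case: Generation
Difficulty: Easy
Cost: High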

→

github | GitHub available | 2026-05-31

Open-dLLM — Open diffusion language model for code generation — releasing pretraining, evaluation, inference, and checkpoints.

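Open-dLLM is an open diffusion language model for code generation, releasing pretraining, evaluation, inference, and checkpoints.

Natural language processing | Large language models | Generation | Text

Use case: Code generation
Difficulty: Easy
Cost: High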

→

huggingface | Hugging Face available | 2026-05-30

SDR: Set-Distance Rewards for Radiology Report Generation

Reinforcement learning with verifiable rewards has rapidly advanced reasoning in vision--language models. Howe

Quality prediction/anomaly detection | Deep learning | Transformer | Generation | Text | Reinforcement learning

Use case: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-05-30

Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

Agentic search systems iteratively interact with retrieval models to answer complex queries. Despite substanti

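Quality prediction/anomaly detection | Natural language processing | RAG | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High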

→

huggingface | Hugging Face available | 2026-05-30

SuperMemory-VQA: An Egocentric Visual Question-Answering Benchmark for Long-Horizon Memory

AI glasses present a compelling platform for AI agents to serve as personalized memory assistants. To be genui

Deep learning | Transformer | Classification | QA | Image

Use case: Classification
Difficulty: Easy
Cost: High

→

arxiv | GitHub available | 2026-05-29

Welfare, Improvability, and Variance: A Principal-Agent Approach to Optimal Benchmark Item Aggregation

AI benchmarks have well-documented limitations, with prior work examining contamination, saturation, and const

Natural language processing | RAG | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: Low

→

huggingface | GitHub available | Hugging Face available | 2026-05-29

The Shape of Addition: Geometric Structures of Arithmetic in Large Language Models

Large Language Models exhibit paradoxical fragility in fundamental arithmetic, implying a disconnect between i

Deep learning | Model compression and quantization | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-05-29

MechVQA: Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding

Multimodal Large Language Models (MLLMs) have demonstrated significant achievements in general visual question

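Quality prediction/anomaly detection | Natural language processing | Large language models | Classification | QA | Image

Use case: Classification
Difficulty: Easy
Cost: High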

→

huggingface | Hugging Face available | 2026-05-29

Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination

Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as the cornerstone for shaping the

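Quality prediction/anomaly detection | Natural language processing | Large language models | Generation | Text | Reinforcement learning

Use case: Generation
Difficulty: Easy
Cost: High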

→

huggingface | GitHub available | Hugging Face available | 2026-05-29

OpenSTBench: Beyond Semantic Evaluation for Speech Translation

Speech translation systems increasingly span speech-to-text translation (S2TT), speech-to-speech translation (

Quality prediction/anomaly detection | Computer vision | Video recognition | Generation | Text | Audio

Use case: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-05-29

SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes

Humans can effortlessly perceive spatial layouts, form cognitive representations, reason about spatial relatio

Computer vision | 3D and point clouds | Detection | Text | 3D

Use case: Detection
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-05-29

prompt-in-context-learning — Awesome resources for in-context learning and prompt engineering: Mastery of the LLMs such as ChatGPT, GPT-3, and FlanT5, with up-to-date and cutting-edge updates.

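This repository collects resources on in-context learning and prompt engineering for LLMs such as ChatGPT, GPT-3, and FlanT5.

Natural language processing | Large language models | Text

Use case: Resources for mastering LLMs
Difficulty: Easy
Cost: High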

→

arxiv | GitHub available | 2026-05-28

PokerSkill: LLMs Can Play Expert-Level Poker without Training or Solvers

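Poker is a classic problem in AI. Reaching strong, expert-level play, however, has required lengthy training and solving. Using LLMs removes the need for training or solvers, making it possible to play poker

Explainable | Natural language processing | Large language models | Text

Use case: Poker play
Difficulty: Hard
Cost: High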

→

huggingface | Hugging Face available | 2026-05-28

Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents

Memory-augmented LLM agents tackle complex long-horizon tasks by recursively summarizing interaction trajector

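Quality prediction/anomaly detection | Natural language processing | Large language models | Text | Self-supervised | Reinforcement learning

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High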

→

huggingface | Hugging Face available | 2026-05-28

Multimodal Music Recommendation System using LLMs

Music recommendation systems typically treat songs as opaque tokens, relying on collaborative interaction hist

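Sensor/time series | Quality prediction/anomaly detection | Deep learning | Transformer | Text | Audio | Multimodal

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High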

→

huggingface | Hugging Face available | 2026-05-28

Stable-Layers: Fine-Tuning Image Layer Decomposition Models with VLM-Scored Reinforcement Learning

We present Stable-Layers, a reinforcement learning framework that eliminates the need for paired supervision b

Natural language processing | Fine-tuning | Image | Text | Multimodal

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-05-27

Pruning and Distilling Mixture-of-Experts into Dense Language Models

Mixture-of-Experts (MoE) is now the dominant architecture for frontier language models, yet it requires all ex

Deep learning | Model compression and quantization | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-05-27

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

Language models can use verifiable rewards to improve at a wide variety of reasoning tasks. However, both para

Explainable | Deep learning | Model compression and quantization | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-05-27

Augmenting Attention with Exponentially Decaying Memory Improves Query-Aware KV Sparsity

Efficient inference is critical for long-context language models, where attention computation and KV-cache acc

Deep learning | Model compression and quantization | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-05-27

FlowEdit — Official implementation of the paper: "FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models"

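FlowEdit is the official implementation of an inversion-free, text-based image editing method that uses pre-trained flow models.

Generative AI | Diffusion models | Generation | Image | Text

Use case: Text-based image editing
Difficulty: Easy
Cost: High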

→

github | GitHub available | 2026-05-27

memvid — Memory layer for AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer. Give your agents instant retrieval and long-term memory.

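memvid is a serverless, single-file memory layer that gives AI agents instant retrieval and long-term memory.

Natural language processing | Large language models | Generation | Text | Video

Use case: Memory management for AI agents
Difficulty: Easy
Cost: High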

→

huggingface | Hugging Face available | 2026-05-26

DEI: Diversity in Evolutionary Inference for Quality-Diversity Search

We present DEI: Diversity in Evolutionary Inference, a distributed Quality-Diversity (QD) search framework tha

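Quality prediction/anomaly detection | Natural language processing | Large language models | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High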

→

huggingface | Hugging Face available | 2026-05-25

When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple eva

Natural language processing | Large language models | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-05-25

Matcha-TTS — [ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

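Matcha-TTS is a fast TTS architecture based on conditional flow matching that takes speaker characteristics into account.

Generative AI | Diffusion models | Text | Audio

Use case: TTS architecture design
Difficulty: Easy
Cost: High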

→

github | GitHub available | 2026-05-24

custom-diffusion — Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)

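Custom Diffusion, presented at CVPR 2023, is a method for multi-concept customization of text-to-image diffusion models. Because the requirements for text-to-image generation can be specified, the flexibility of image generation

Natural language processing | Fine-tuning | Generation | Image | Text

Use case: Customizing image generation
Difficulty: Easy
Cost: High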

→

github | GitHub available | 2026-05-23

PaddleNLP — Easy-to-use and powerful LLM and SLM library with awesome model zoo.

PaddleNLP is an easy-to-use, powerful library for large and small language models (LLMs and SLMs), with an extensive model zoo.

Deep learning | Transformer | Text

Use case: Large and small language models
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-05-22

SPACENUM: Revisiting Spatial Numerical Understanding in VLMs

Vision-Language Models (VLMs) are increasingly deployed in embodied environments, where they need produce nume

Natural language processing | Fine-tuning | Image | Text | Multimodal

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High

→

github | GitHub available | 2026-05-22

rasa — 💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

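rasa is an open-source machine learning framework for automating text- and voice-based conversations. It offers a broad range of features, including natural language understanding (NLU), dialogue management, and connectors to Slack, Facebook, and more.

Natural language processing | Text

Use case: Building chatbots
Difficulty: Easy
Cost: Medium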

→

github | GitHub available | 2026-05-21

langextract — A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.

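A Python library that uses LLMs to extract structured information from unstructured text.

Natural language processing | Large language models | Image | Text

Use case: Information extraction from text
Difficulty: Easy
Cost: High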

→

arxiv | GitHub available | 2026-05-19

optimize_anything: A Universal API for Optimizing any Text Parameter

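The paper proposes a system that uses LLMs (large language models) to optimize text parameters. A single system handles a variety of settings (single-task, multi-task, unseen inputs, and so on). The system also

Natural language processing | Large language models | Text

Use case: Optimizing arbitrary text parameters
Difficulty: Hard
Cost: High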

→

github | GitHub available | 2026-05-19

spaCy — 💫 Industrial-strength Natural Language Processing (NLP) in Python

💫 Industrial-strength Natural Language Processing (NLP) in Python

Machine learning | Supervised learning | Classification | Text

Use case: Classification
Difficulty: Easy
Cost: Low

→

github | GitHub available | 2026-05-14

VidCom2 — [EMNLP 2025 Main] Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models

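VidCom2 (Video Compression Commander) is a plug-and-play inference acceleration method for video large language models.

Deep learning | Model compression and quantization | Text | Video | Multimodal

Use case: Inference acceleration for video LLMs
Difficulty: Easy
Cost: High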

→

github | GitHub available | 2026-05-13

maths-cs-ai-compendium — Become a cracked AI/ML Research Engineer

This compendium presents material for building the skills and knowledge of an AI/ML research engineer.

Computer vision | Multimodal | Text | Audio

Use case: Training AI/ML research engineers
Difficulty: Easy
Cost: High

→

arxiv | GitHub available | 2026-05-09

ARES-LSHADE: Autoresearch-Enhanced LSHADE with Memetic Polish for the GNBG Benchmark

We present ARES-LSHADE, a memetic differential-evolution variant submitted to the GECCO 2026 competition on LL

Natural language processing | Large language models | Text

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

→

arxiv | GitHub available | 2026-05-07

CoupleEvo: Evolving Heuristics for Coupled Optimization Problems Using Large Language Models

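CoupleEvo proposes an LLM-based approach to automated heuristic design for coupled optimization problems. It presents three evolutionary coordination strategies.

Quality prediction/anomaly detection | Natural language processing | Large language models | Generation | Text

Use case: Solving coupled optimization problems
Difficulty: Hard
Cost: High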

→

huggingface | Hugging Face available | 2026-05-04

Liberating LLM Capabilities in Full-Duplex Speech Models

Speech-based large language models are typically constrained to spoken replies, which limits their user-facing

Natural language processing | Large language models | Generation | Text | Audio

Use case: Generation
Difficulty: Easy
Cost: High

→

huggingface | Hugging Face available | 2026-04-16

Is This Edit Correct? A Multi-Dimensional Benchmark for Reasoning-Aware Image Editing

Diffusion-based image editing has achieved strong visual fidelity under natural language instructions, yet mos

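Quality prediction/anomaly detection | Deep learning | Model compression and quantization | Image | Text

Use case: Technical validation and paper-reading aid
Difficulty: Easy
Cost: High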

→