MLinfo | Machine Learning & AI Paper Digest

transformers — 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

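🤗 Transformers is a framework for defining text, vision, audio, and multimodal models, usable for both inference and training.

Deep Learning · Transformer · Classification · Text · Audio

Use case: Defining machine learning models
Difficulty: Easy
Cost: High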

Reinforcement Learning · Policy Gradient (PPO / A3C) · Classification · Text

paperless-ngx — A community-supported supercharged document management system: scan, index and archive all your documents

paperless-ngx is a community-supported, "supercharged" document management system that can scan, index, and archive all your documents.

Use case: Document management
Difficulty: Easy
Cost: Low

diffusers — 🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

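A library of diffusion models for image, video, and audio generation.

Generative AI · Diffusion Models · Generation · Image · Text

Use case: Image, video, and audio generation
Difficulty: Easy
Cost: High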
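For readers new to the diffusers entry above, the standard DDPM-style forward noising process and noise-prediction objective are shown below. This is the textbook formulation with an assumed variance schedule β_t and noise-prediction network ε_θ. It is one of several model families and schedulers such a library covers, not a description of diffusers' internals.

```latex
\begin{aligned}
\bar\alpha_t &= \prod_{s=1}^{t} (1 - \beta_s) \\
x_t &= \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1 - \bar\alpha_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I) \\
\mathcal{L}(\theta) &= \mathbb{E}_{t,\,x_0,\,\epsilon}\bigl\lVert \epsilon - \epsilon_\theta(x_t, t) \bigr\rVert^2
\end{aligned}
```

Generation then runs the learned denoiser backwards from pure noise x_T, one scheduler step at a time.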

Computer Vision · Object Detection · Classification · Segmentation · Image

label-studio — Label Studio is a multi-type data labeling and annotation tool with standardized output format

A multi-type tool for data labeling and annotation with a standardized output format.

Use case: Data labeling
Difficulty: Easy
Cost: Low

cs249r_book — Machine Learning Systems

A book on the theory and implementation of machine learning systems.

Deep Learning · Text

Use case: Machine learning systems
Difficulty: Easy
Cost: Medium

Medical_Image_Analysis — Foundation models based medical image analysis

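Medical image analysis is the field of extracting information from image data to support diagnosis and treatment. This work proposes new approaches to medical image analysis based on foundation models.

NLP · Large Language Models · Generation · Image · Text

Use case: Medical image analysis
Difficulty: Easy
Cost: High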

NLP · Large Language Models · Text · Audio · Multimodal

screenpipe — YC (S26) | Record your screen 24/7 and plug into your agents. Local, private, secure. Connect to OpenClaw, Hermes agent and 100+ apps

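A tool that continuously records your screen and user activity and makes that context available to AI agents, running locally and privately.

Use case: Building agents on top of recorded user activity
Difficulty: Easy
Cost: High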

Meshroom — Node-based Visual Programming Toolbox

A node-based visual programming toolbox.

Computer Vision · 3D / Point Clouds · Image · Text · 3D

Use case: Visual programming
Difficulty: Easy
Cost: High

unsloth — Unsloth is a local UI for training and running Gemma 4, Qwen3.6, DeepSeek, Kimi, GLM and other models.

Unsloth Studio is a local web UI for training and running open models such as Gemma 4 and Qwen3.6, for example to test and fine-tune them.

NLP · Large Language Models · Text · Audio

Use case: Training and running open models
Difficulty: Easy
Cost: High

Deep Learning · Transformer · Image · Text · Multimodal

sglang — SGLang is a high-performance serving framework for large language models and multimodal models.

SGLang is a high-performance serving framework for large language models and multimodal models.

Use case: Serving large language models
Difficulty: Easy
Cost: High

NLP · Large Language Models · Text · Multimodal

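ai-agent-book — "Understanding AI Agents in Depth: Design Principles and Engineering Practice" (by Li Bojie), main open-source repository: full text, compiled PDF, and per-chapter companion code

The main open-source repository for Li Bojie's book on AI agent design principles and engineering practice, containing the full text, a compiled PDF, and companion code for each chapter.

Use case: Learning AI agent design and engineering
Difficulty: Easy
Cost: High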

Sana — SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

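SANA is a high-resolution image synthesis model that generates high-quality, high-resolution images at low computational cost.

Use case: High-resolution image synthesis
Difficulty: Easy
Cost: High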

haystack — Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.

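An open-source AI orchestration framework for designing the pipelines and agent workflows needed to build LLM applications.

Deep Learning · Transformer · Generation · Summarization · Text

Use case: Building LLM applications
Difficulty: Easy
Cost: High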
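The retrieval step at the heart of a RAG pipeline like those haystack orchestrates can be sketched in plain Python. This is a hypothetical illustration, not Haystack's API: a bag-of-words cosine similarity stands in for an embedding model, and build_prompt simply packs the top-ranked documents into a prompt that a generator LLM would then answer.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list, top_k: int = 2) -> list:
    """Rank documents by similarity to the query and keep the top_k."""
    q = bag_of_words(query)
    return sorted(docs, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)[:top_k]


def build_prompt(query: str, docs: list, top_k: int = 2) -> str:
    """Pack the retrieved context into a prompt for a (hypothetical) generator LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "Haystack builds modular LLM pipelines.",
    "Paris is the capital of France.",
    "Bananas are rich in potassium.",
]
print(build_prompt("What is the capital of France?", docs, top_k=1))
```

In a real pipeline the retriever, prompt builder, and generator are separate, swappable components; keeping them as separate functions here mirrors that split.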

FunASR — Open-source speech recognition toolkit for training, inference, streaming ASR, VAD, punctuation, speaker diarization pipelines, and OpenAI-compatible/MCP serving.

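An open-source speech recognition toolkit covering training, inference, streaming ASR, voice activity detection (VAD), punctuation restoration, speaker diarization pipelines, and OpenAI-compatible/MCP serving.

Deep Learning · Transformer · Classification · Detection · Text

Use case: Speech recognition
Difficulty: Easy
Cost: High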

DocsGPT — Private AI platform for agents, assistants and enterprise search. Built-in Agent Builder, Deep research, Document analysis, Multi-model support, and API connectivity for agents.

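A private AI platform for agents, assistants, and enterprise search, with a built-in agent builder, deep research, document analysis, multi-model support, and API connectivity for agents.

Use case: Private AI agents, assistants, and enterprise search
Difficulty: Easy
Cost: Medium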

rasa — 💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

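Rasa is an open-source machine learning framework for automating text- and voice-based conversations. It offers natural language understanding (NLU), dialogue management, and connectors to Slack, Facebook, and other channels.

NLP · Text

Use case: Building chatbots and voice assistants
Difficulty: Easy
Cost: Medium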

Suited to Tabular Data · Deep Learning · Transformer · Classification · Detection · Image

presidio — An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.

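Presidio is an open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) in text, images, and structured data. It supports NLP, pattern matching, and customizable pipelines.

Use case: Protecting data privacy
Difficulty: Easy
Cost: Low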
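The detect-then-mask idea behind presidio can be sketched with two regex recognizers. Both patterns (EMAIL, PHONE) are hypothetical and deliberately simple; Presidio itself combines NLP models, pattern matching, and context, and this is not its API.

```python
import re

# Hypothetical recognizers: real PII detection also uses NLP models,
# checksums, and context words; plain regexes will miss many cases.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def mask_pii(text: str) -> str:
    """Replace every match of each pattern with a <LABEL> placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


print(mask_pii("Mail jane.doe@example.com or call 555-123-4567."))
# Mail <EMAIL> or call <PHONE>.
```

Replacing with a typed placeholder rather than deleting keeps the text readable and records which kind of entity was removed.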

3D-Aware VLMs with Implicit and Explicit Geometries

Proposes VLM-IE3D (Vision-Language Models with Implicit and Explicit 3D geometry), a new approach to 3D spatial understanding. …

Computer Vision · 3D / Point Clouds · Detection · Image · Text

Use case: Developing 3D spatial understanding
Difficulty: Hard
Cost: High

Agentic coding without the cloud: evaluating open-weight large language models on longitudinal data preparation tasks

Large language models (LLMs) and agents are now widely used tools in code development, with data typically …

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

When Are Reasoning-Based Guardrails Not Efficient? ResponseGuard: A Fast Vision-Language Guard for Real-Time Moderation

A vision-language AI assistant returns its answer as a stream of generated tokens. Therefore, a safety guard …

Deep Learning · Compression / Quantization · Detection · Image · Text

Use case: Detection
Difficulty: Hard
Cost: High

DINOde: Continuous Vision-Text Alignment for Open-Vocabulary Semantic Segmentation

Open-vocabulary semantic segmentation (OVSS) leverages textual semantics to segment objects beyond predefined …

NLP · RAG · Segmentation · Image · Text

Use case: Segmentation
Difficulty: Hard
Cost: High

From a Word-Level Dictionary to Sentence-Level Semantics: Multilingual Grievance Labelling with Contextual Models

Grievance is one of the warning signs analysts look for when assessing threats of violence. It is increasingly …

NLP · RAG · Text

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: Low

REFACT: Adaptive Fact Restatement for Compact and Faithful Chain-of-Thought Reasoning

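Points out that language models performing long-form reasoning can produce reasoning that drifts away from the provided context, and, to integrate the context more tightly into the reasoning, proposes REFACT (REstating Facts in Adapti…

Use case: Improving Chain-of-Thought (CoT) reasoning
Difficulty: Hard
Cost: High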

Quality Prediction / Anomaly Detection · Deep Learning · Compression / Quantization · Generation · Image · Text

Inference-Time Scaling of Diffusion Models via Progressive Seed Pruning

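Shows that the initial noise seed of a diffusion model strongly affects the quality of the generated image, and proposes a method to reduce the time cost of searching over seeds.

Use case: Inference-time scaling of diffusion models
Difficulty: Hard
Cost: High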
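One generic way to cut the cost of seed search, as in the Progressive Seed Pruning entry, is successive halving: score all seeds after a few denoising steps, keep the best fraction, and spend more steps only on the survivors. The sketch assumes a hypothetical score_fn(seed, step) that rates a partially denoised sample. It illustrates progressive pruning in general and is not the paper's exact procedure.

```python
def progressive_seed_pruning(seeds, score_fn, total_steps=8, keep_ratio=0.5):
    """Successive halving over candidate noise seeds.

    score_fn(seed, step) is a hypothetical callable that rates the sample
    obtained from `seed` after `step` denoising steps (higher is better).
    """
    survivors = list(seeds)
    step = 1
    while len(survivors) > 1 and step < total_steps:
        scores = {seed: score_fn(seed, step) for seed in survivors}
        keep = max(1, int(len(survivors) * keep_ratio))
        survivors = sorted(survivors, key=scores.__getitem__, reverse=True)[:keep]
        step *= 2  # survivors get a larger denoising budget in the next round
    if len(survivors) > 1:
        # Rank the remaining seeds with a full-budget evaluation.
        survivors.sort(key=lambda seed: score_fn(seed, total_steps), reverse=True)
    return survivors[0]


# Toy scorer: pretend seed 5 produces the best image at every step.
best = progressive_seed_pruning(range(16), lambda seed, step: -abs(seed - 5))
print(best)  # 5
```

Most of the compute goes to a few promising seeds, which is where the saving over fully denoising every candidate comes from.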

Computer Vision · Segmentation · Generation · Text · Video

T-STAR: A Large-Scale Benchmark for Spatio-Temporal Panoptic Scene Graph Generation in Satellite Video

Structured understanding of satellite video is essential for advancing dynamic geospatial scene analysis from …

Use case: Generation
Difficulty: Hard
Cost: High

Deep Learning · Transformer · Image · Text · Multimodal

MVEI & EmObserver: Empowering MLLM-Oriented Visual Emotional Intelligence via Emotion Statement Judgement

Emotion recognition is essential for advancing modern AGI, but large-scale …

Use case: Emotion recognition
Difficulty: Hard
Cost: High

huggingface · Hugging Face available · 2026-07-23

K12-KGraph: A Curriculum-Aligned Knowledge Graph for Benchmarking and Training Educational LLMs

Large language models are increasingly used in K-12 education, but existing benchmarks mainly test exam …

NLP · Large Language Models · QA · Image · Text

Use case: QA
Difficulty: Easy
Cost: High

awesome-llm-unlearning — A resource repository for machine unlearning in large language models

This repository collects resources on machine unlearning for large language models.

Use case: Machine unlearning for large language models
Difficulty: Easy
Cost: High

FinGPT — FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.

FinGPT provides open-source financial large language models, with the trained models released on Hugging Face.

Use case: Financial large language models
Difficulty: Easy
Cost: High

Quality Prediction / Anomaly Detection · Deep Learning · Compression / Quantization · Generation · Text · Video

Causal-Forcing — [ICML 2026] Official codebase for "Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation" & Causal Forcing++

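Official codebase for the ICML 2026 paper "Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation" and its follow-up, Causal Forcing++.

Use case: High-quality real-time interactive video generation
Difficulty: Easy
Cost: High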

Suited to Tabular Data · NLP · Large Language Models · Image · Text · Tabular

unstructured — Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise grade Platform product for production grade workflows, partitioning, enrichments, chunking and embedding.

An open-source ETL solution for converting documents into structured data.

Use case: Structuring documents
Difficulty: Easy
Cost: High

txtai — 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

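A framework for working with LLMs that covers semantic search, LLM orchestration, and language model workflows.

Deep Learning · Transformer · Generation · Text

Use case: Semantic search
Difficulty: Easy
Cost: High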

GaugeQuant: Online Learning of Quantization-Optimal Bases from LLM Symmetries

Transformers are known to have internal continuous symmetries that leave outputs invariant, while modifying …

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High
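The output-preserving symmetries mentioned in the GaugeQuant abstract can be seen in a single linear layer. For any orthogonal matrix Q, rotating the weights by Q and the input by Qᵀ leaves the output unchanged, while the rotated weights can have smaller outliers, which matters for quantization. The 2×2 rotation and the toy weights below are illustrative assumptions, not GaugeQuant's learned bases.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a, x):
    return [sum(a[i][k] * x[k] for k in range(len(x))) for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def rotation(theta):
    """2x2 orthogonal matrix, so Q @ Q^T = I."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def max_abs(m):
    return max(abs(v) for row in m for v in row)


W = [[1.0, 100.0], [2.0, -3.0]]  # toy weights with one large outlier
x = [0.5, 0.25]
Q = rotation(math.pi / 4)

y_ref = matvec(W, x)
# (W Q)(Q^T x) = W (Q Q^T) x = W x, so the layer output is unchanged.
W_rot = matmul(W, Q)
y_rot = matvec(W_rot, matvec(transpose(Q), x))

print(y_ref, y_rot)
print(max_abs(W), max_abs(W_rot))  # the outlier shrinks from 100 to about 71.4
```

A smaller maximum magnitude means a finer quantization step for the same bit width, which is why choosing the basis well can reduce quantization error without changing the model's function.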

Antigen-specific Antibody Multi-modal Foundation Model for Functional Antibody Design

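This work proposes an antigen-specific antibody multimodal foundation model (AAMFM) for designing antigen-specific antibodies, taking into account that such design requires epitope-level pairing between antigen and antibody.

NLP · RAG · Classification · Generation · Text

Use case: Antigen-specific antibody design
Difficulty: Hard
Cost: High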

Suited to Tabular Data · NLP · Large Language Models · Text · Tabular

Auto-Fill: Learning to Predict Missing Values Accurately with Specialist Language Models

Predicting missing cell values in tabular data is a fundamental problem in data cleaning. While state-of-the-art …

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

ArbiGraph: Arbitrarily Scalable Verifiable Task Graphs for Evaluating Context Management

We introduce ARBIGRAPH, a benchmark generator for evaluating whether tool-assisted language agents can retain, …

MLOps · Model Deployment · Text

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: Medium

PRO-LONG: Programmatic Memory Enables Long-Horizon Reasoning

Long-horizon tasks require sustained perception, reasoning, and exploration, and are a persistent challenge for …

Deep Learning · Compression / Quantization · Text

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Rushes: A Human Preference Dataset for Pluralistic Alignment

We introduce Rushes, a dataset and benchmark for studying revealed human engagement preferences in interactive …

Use case: Generation
Difficulty: Hard
Cost: High

LKValues: Aligning Large Language Models with Sri Lankan Societal Values

Value alignment of Large Language Models (LLMs) has been shown to be culturally biased toward Western norms. …

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

emb-diversity: A Tool for Embedding-Based Measurement of Data Diversity

There is growing evidence that data diversity is crucial for developing fair and robust NLP models. However, …

NLP · RAG · Text

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: Low
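As a concrete example of an embedding-based diversity measure of the kind emb-diversity targets, one simple score is the mean pairwise cosine distance between embeddings. This metric is an assumption chosen for illustration; the tool may offer different or additional measures, and real embeddings would come from a sentence encoder rather than the toy 2-D vectors used here.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_pairwise_cosine_distance(embeddings):
    """Average of (1 - cosine similarity) over all unordered pairs.

    0 means every embedding points the same way (no diversity); larger
    values mean the set is more spread out in embedding space.
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine_similarity(u, v) for u, v in pairs) / len(pairs)


redundant = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
diverse = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(mean_pairwise_cosine_distance(redundant), mean_pairwise_cosine_distance(diverse))
```

Cosine distance ignores vector length, so the "redundant" set scores 0 even though its vectors have different magnitudes.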

D2VBench: Benchmarking Large Language Models with Value Dilemmas in Daily Scenarios

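With the wide application of large language models (LLMs) in real-world scenarios, the value implication of …

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Quality Prediction / Anomaly Detection · Deep Learning · Transformer · Generation · Image · Text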

SHFormer: Dynamic Spectral Filtering Convolutional Neural Network and High-pass Kernel Generation Transformer for Adaptive MRI Reconstruction

Attention Mechanism (AM) selectively focuses on essential information for imaging tasks and captures relations …

Use case: Generation
Difficulty: Hard
Cost: High

NLP · Large Language Models · Image · Text · Multimodal

Development of an automated, reliable, and clinically meaningful artificial intelligence (AI) tool for diagnosing cardiac disease from conventional cardiovascular magnetic resonance (CMR) images

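Aims: Cardiovascular magnetic resonance (CMR) imaging enables non-invasive assessment of myocardial structure, …

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Quality Prediction / Anomaly Detection · Computer Vision · Segmentation · Generation · Image · Text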

OSVE: One Step Video Editing with One Step Diffusion Models

Text-guided video editing with diffusion models is impractically slow, hindered by costly multi-step sampling …

Use case: Generation
Difficulty: Hard
Cost: High

Deep Learning · Attention Mechanisms · Segmentation · Image · Text

Lean-SAM2: Target-Anchored Memory and Encoder Acceleration for SAM2

The Segment Anything Model 2 (SAM2) has advanced temporal promptable segmentation, yet its deployment remains …

Use case: Segmentation
Difficulty: Easy
Cost: Medium

NLP · Prompt Engineering · Detection · Image · Text

ReferTrack: Referring Then Tracking for Embodied Visual Tracking

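ReferTrack is a system that follows a target vehicle specified in natural language: it first identifies the referred vehicle and then predicts its motion.

Use case: Following a target specified in natural language
Difficulty: Hard
Cost: High

Suited to Tabular Data · Explainable · Deep Learning · Compression / Quantization · Text · Tabular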

CircuitKIT : Circuit Discovery, Evaluation, and Application Toolkit for Mechanistic Interpretability

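Proposes a toolkit for interpreting machine learning models, making it possible to understand how a model works internally.

Use case: Interpreting machine learning models
Difficulty: Easy
Cost: Low

Suited to Tabular Data · NLP · Large Language Models · Text · Tabular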

Prompt Design at Scale: How Format, Instruction Count, and Context Length Shape Instruction Adherence and Hallucination in Large Language Models

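Practitioners make three prompt-design decisions with almost no controlled evidence behind them: how to format …

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Explainable · Quality Prediction / Anomaly Detection · NLP · Large Language Models · Generation · Text · Reinforcement Learning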

Beyond Score Prediction: LLM-Based Essay Scoring and Feedback Generation via Reinforcement Learning with Rubric Rewards

Large language models (LLMs) have been widely applied to automated essay scoring (AES) and automated feedback …

Use case: Generation
Difficulty: Hard
Cost: High

Quality Prediction / Anomaly Detection · NLP · Fine-tuning · Translation · Text · Reinforcement Learning

Reasoning Before Translation: Enhancing Legal Machine Translation with Structured Reasoning

Aims to improve legal machine translation by having the model carry out structured reasoning before it translates.

Use case: Legal machine translation
Difficulty: Hard
Cost: High

CASE: Causal Alignment and Structural Enforcement for Improving Chain-of-Thought Faithfulness

Chain-of-thought (CoT) reasoning is widely used to improve both the performance and interpretability of large …

Explainable · NLP · Large Language Models · Generation · Text

Use case: Generation
Difficulty: Hard
Cost: High

BaseRT: Advancing Best-in-Class LLM Inference with Apple M5 Neural Accelerators

Apple's M5 generation introduces a redesigned GPU architecture in which every core carries a dedicated Neural …

Use case: Generation
Difficulty: Hard
Cost: High

Is EEG-to-Text Feasible in Real-World Scenarios? An In-Depth Analysis Using a Neuropsychology-Inspired Benchmark

Translating brain signals into text could restore communication for people with severe paralysis, yet …

MLOps · Model Deployment · Text

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: Medium

AutoIndex: Learning Representation Programs for Retrieval

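Proposes a framework that learns representation programs for retrieval and uses those programs to build a search system that assigns labels to documents.

Quality Prediction / Anomaly Detection · NLP · RAG · Text

Use case: Learning representation programs for retrieval
Difficulty: Easy
Cost: Low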

Detect Early, Escalate Rarely: Anytime Detection of AI-Generated Video from the Compressed Bitstream

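Detectors for AI-generated video are evaluated offline. A clip is decoded to pixels and scored once, …

Easy to Try on CPU · Deep Learning · CNN · Detection · Image · Text

Use case: Detection
Difficulty: Hard
Cost: High

Quality Prediction / Anomaly Detection · NLP · RAG · Generation · Summarization · Text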

FinanceComplexQA: Benchmarking Agentic Reasoning on Industrial-grade Financial Documents

Agentic Reasoning has become a transformative force in financial analysis due to its ability to integrate …

Use case: Generation
Difficulty: Easy
Cost: Low

Quality Prediction / Anomaly Detection · NLP · Fine-tuning · Classification · Generation · Text

Moving Alphabet: A Controlled Study of Training Data for Text-to-Video Generation

Text-to-video generation has advanced significantly over the past five years through scaling of model size, …

Use case: Classification
Difficulty: Easy
Cost: High

Quality Prediction / Anomaly Detection · Deep Learning · Compression / Quantization · Text · Video

ABot-World-0: Infinite Interactive World Rollout on a Single Desktop GPU

We present ABot-World-0, an action-conditioned video world model for real-time, long-horizon closed-loop …

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

Generative World Renderer at the Speed of Play

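Generative world renderer AlayaRenderer receives structured world states exported from physics engines and …

Deep Learning · Compression / Quantization · Generation · Text

Use case: Generation
Difficulty: Easy
Cost: Medium

Explainable · Deep Learning · Transformer · Generation · Image · Text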

Text Template Tokens Are Implicit Semantic Registers in Diffusion Transformers

Text-to-image diffusion transformers (DiTs) jointly process text and image tokens, yet their internal …

Use case: Generation
Difficulty: Easy
Cost: High

Quality Prediction / Anomaly Detection · Deep Learning · Transformer · Generation · Image · Text

Mage-Flow: An Efficient Native-Resolution Foundation Model for Image Generation and Editing

Large-scale visual generators are increasingly capable but costly to train, fine-tune, and deploy. We …

Use case: Generation
Difficulty: Easy
Cost: High

ISO: An RLVR-Native Optimization Stack

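Reinforcement learning with verifiable rewards (RLVR) is rapidly advancing the reasoning capabilities of …

Deep Learning · Normalization / Optimization Methods · Text · Reinforcement Learning

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High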

Where Should Optimizer State Live? Tiered State Allocation for Memory-Efficient Mixture-of-Experts Training

Optimizer state is the largest single line item in the memory budget of mixture-of-experts (MoE) training: …

Deep Learning · Normalization / Optimization Methods · Text

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High
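A back-of-the-envelope calculation shows why optimizer state dominates training memory, as the optimizer-state entry above notes. Assuming Adam with fp32 first and second moments plus an fp32 master copy of the weights (a common mixed-precision setup, not necessarily the paper's), each parameter carries 12 bytes of optimizer-related state, versus 2 bytes for bf16 weights.

```python
def adam_state_bytes(num_params: int, master_weights: bool = True) -> int:
    """Optimizer-related bytes under an assumed mixed-precision Adam setup."""
    moments = 4 + 4                      # fp32 first and second moments
    master = 4 if master_weights else 0  # optional fp32 master weights
    return num_params * (moments + master)


def bf16_weight_bytes(num_params: int) -> int:
    """Bytes for the model weights themselves stored in bf16."""
    return num_params * 2


n = 1_000_000_000  # a hypothetical 1B-parameter model
print(adam_state_bytes(n) / 1e9, "GB of optimizer state")  # 12.0 GB of optimizer state
print(bf16_weight_bytes(n) / 1e9, "GB of bf16 weights")    # 2.0 GB of bf16 weights
```

Under these assumptions the optimizer state is six times the size of the weights, which is why deciding where that state lives (GPU, CPU, or another tier) has a large effect on memory budgets.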

huggingface · GitHub available · Hugging Face available · 2026-07-21

Delineate Anything v2: A Global Foundation Model for Field Delineation

Accurate agricultural field boundary delineation at large scale is a foundational task for food security, …

NLP · RAG · Image · Text

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: Low

github · GitHub available · 2026-07-21

TextBlob — Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

A library for text analysis such as sentiment analysis and word tokenization.

NLP · Text · Audio

Use case: Text analysis
Difficulty: Easy
Cost: Medium

arxiv · GitHub available · 2026-07-20

For What Reason? Interpreting Models' Encoding of Causation and Antithesis

Discourse relations provide document structure, critical to language understanding and enabling language model …

Explainable · Deep Learning · Transformer · Text

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: Medium

AlayaWorld: Interactive Long-Horizon World Modeling -- Full Technical Report

Unlike conventional video game development, which relies on labor-intensive pipelines for asset production, …

Use case: Generation
Difficulty: Easy
Cost: High

Subliminal Clocks: Latent Time Modelling in Diffusion Language Models

Diffusion Language Models (DLMs) have recently emerged as a promising alternative to autoregressive models. …

Explainable · Generative AI · Diffusion Models · Text

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

huggingface · GitHub available · Hugging Face available · 2026-07-20

SciForma: Structure-Faithful Generation of Scientific Diagrams

Structural fidelity is essential to scientific methodology diagrams. To communicate research logic, these …

Quality Prediction / Anomaly Detection · NLP · Large Language Models · Generation · Image · Text

Use case: Generation
Difficulty: Easy
Cost: High

ConsiSpace: Learning Geometric Consistency Matters for Video Spatial Reasoning

Video spatial reasoning is essential for navigation-oriented perception and long-video question answering, …

Deep Learning · Compression / Quantization · QA · Text · Video

Use case: QA
Difficulty: Easy
Cost: High

HOMIE: Human-object Centric Video Personalization via Multimodal Intelligent Enchancement

Human-object centric video personalization (HOCVP) is a core task within subject-driven video generation. …

Use case: Generation
Difficulty: Easy
Cost: High

Explainable · NLP · Fine-tuning · Classification · Generation · Anomaly Detection

Token-Level Off-Policy Learning for Faithful Generation Under Distribution Shift

We propose Token-Level Off-Policy Labeling (TOPL), an off-policy training paradigm that reframes post-training …

Use case: Classification
Difficulty: Easy
Cost: High

FlashRT: Agent Harness for Guiding Agents to Deploy Real-Time Multimodal Applications

Real-time multimodal applications, including voice agents and interactive video generation, compose …

Deep Learning · Compression / Quantization · Generation · Text · Audio

Use case: Generation
Difficulty: Easy
Cost: High

huggingface · GitHub available · Hugging Face available · 2026-07-20

WorldCupArena: Fine-Grained Evaluation of Language Models and Deep-Research Agents on Football Forecasting

Predicting a football match before kickoff requires more than knowing past results: a model must use changing …

Computer Vision · Segmentation · Prediction · Text

Use case: Prediction
Difficulty: Easy
Cost: Low

Coercion and Deception in AI-to-AI Management: An Agentic Benchmark of Unprompted Escalation

Multi-agent systems routinely place one AI agent in authority over another. When a subordinate refuses a task, …

NLP · Large Language Models · Classification · Text

Use case: Classification
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-20

Open-dLLM — Open diffusion language model for code generation — releasing pretraining, evaluation, inference, and checkpoints.

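Open-dLLM releases an open diffusion language model for code generation, together with its pretraining, evaluation, and inference code and checkpoints.

Use case: Code generation
Difficulty: Easy
Cost: High

Explainable · Quality Prediction / Anomaly Detection · NLP · Large Language Models · Generation · Text

arxiv · GitHub available · 2026-07-19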

CoEvoP&R: Co-Evolving Placement Objectives with Routing Feedback via Large Language Models

Analytical placers rely on differentiable objective functions to guide placement, typically combining …

Use case: Generation
Difficulty: Hard
Cost: High

huggingface · Hugging Face available · 2026-07-19

TimeLens2: Generalist Video Temporal Grounding with Multimodal LLMs

Video multimodal large language models (MLLMs) can describe what happens in a video, but rarely identify when …

NLP · Large Language Models · Detection · Text · Video

Use case: Detection
Difficulty: Easy
Cost: High

huggingface · GitHub available · Hugging Face available · 2026-07-19

Distilled Reinforcement Learning for LLM Post-training

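Large language model (LLM) post-training is essential for improving reasoning, adaptation, and alignment. …

Explainable · Quality Prediction / Anomaly Detection · Deep Learning · Compression / Quantization · Text · Reinforcement Learning

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-07-19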

The Geometry of Semantic Space: A Continuous Geometric Framework for the Transformer Architecture

We present a continuous geometric framework that models the discrete algebraic operations of the Transformer …

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-19

testtimescaling.github.io — "what, how, where, and how well? a survey on test-time scaling in large language models" repository

Repository for a survey on test-time scaling in large language models.

Use case: Test-time scaling of large language models
Difficulty: Easy
Cost: High
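Two of the simplest test-time scaling strategies covered by surveys like the one above fit in a few lines: sample several answers and take a majority vote (self-consistency), or pick the candidate a scoring function likes best (best-of-N). The score callable is a hypothetical stand-in for a verifier or reward model, and the sampled answers are made up for illustration.

```python
from collections import Counter


def majority_vote(answers):
    """Self-consistency: return the most frequent final answer.

    Ties go to the answer seen first, because Counter.most_common keeps
    insertion order among equal counts.
    """
    return Counter(answers).most_common(1)[0][0]


def best_of_n(candidates, score):
    """Best-of-N: return the candidate with the highest score.

    `score` is a hypothetical verifier or reward-model callable.
    """
    return max(candidates, key=score)


samples = ["42", "41", "42", "40", "42"]
print(majority_vote(samples))                    # 42
print(best_of_n(["a", "bbb", "cc"], score=len))  # bbb
```

Both spend extra inference compute (more samples) instead of extra training; voting needs no scorer, while best-of-N is only as good as its scorer.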

huggingface · GitHub available · Hugging Face available · 2026-07-18

Dataset Distillation by Influence Matching

We revisit dataset distillation from an outcome-centric perspective. Rather than aligning process surrogates …

Deep Learning · Compression / Quantization · Classification · Image · Text

Use case: Classification
Difficulty: Easy
Cost: High

DataFlow-Harness: A Grounded Code-Agent Platform for Constructing Editable LLM Data Pipelines

Large language models (LLMs) are increasingly used to automate data-processing workflows, yet coding agents …

NLP · Large Language Models · Generation · Image · Text

Use case: Generation
Difficulty: Easy
Cost: High

Group Entropy-Controlled Policy Optimization

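Entropy control has become an effective tool in reinforcement learning (RL) of large language models (LLMs), …

Deep Learning · Compression / Quantization · Generation · Text · Reinforcement Learning

Use case: Generation
Difficulty: Easy
Cost: High

Quality Prediction / Anomaly Detection · NLP · Large Language Models · Generation · Text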

Environment-free Synthetic Data Generation for API-Calling Agents

Training API-calling large language model (LLM) agents demands massive amounts of high-quality trajectories. …

Use case: Generation
Difficulty: Easy
Cost: High

Quality Prediction / Anomaly Detection · NLP · Large Language Models · Classification · QA · Image

Can Multimodal Large Language Models Understand OCT?

Optical coherence tomography (OCT) imaging is essential for the diagnosis and treatment of retinal diseases. …

Use case: Classification
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-18

maths-cs-ai-compendium — Become a cracked AI/ML Research Engineer

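A compendium of material for building the skills and knowledge of a strong AI/ML research engineer.

Computer Vision · Multimodal · Text · Audio

Use case: Training as an AI/ML research engineer
Difficulty: Easy
Cost: High

NLP · Large Language Models · Image · Text · Multimodal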

An Exam for Active Observers

Human vision is a closed loop: gaze is continuously redirected by intermediate hypotheses rather than a single …

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

RecGPT-V3 Technical Report

Large language models (LLMs) are transforming recommender systems from matching co-occurrence patterns in …

Deep Learning · Compression / Quantization · Text

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

Understanding Reasoning from Pretraining to Post-Training

Reinforcement learning (RL) has become central to improving large language models (LLMs) on complex reasoning …

NLP · Large Language Models · Text · Reinforcement Learning

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

Recursive Harness Self-Improvement

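Under model–harness co-evolution, harnesses are not merely inference-time scaffolds but data-generating …

Quality Prediction / Anomaly Detection · Deep Learning · Compression / Quantization · Text

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

Suited to MI · NLP · Large Language Models · Generation · Image · Text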

S1-Omni: A Unified Multimodal Reasoning Model for Scientific Understanding, Prediction, and Generation

We present S1-Omni, a unified multimodal reasoning model for scientific understanding, prediction, and generation …

Use case: Generation
Difficulty: Easy
Cost: High

Explainable · NLP · Large Language Models · Image · Text · Audio

Audio-Visual Flamingo: Open Audio-Visual Intelligence for Long and Complex Videos

We present Audio-Visual Flamingo (AV-Flamingo), a fully open state-of-the-art audio-visual large language model …

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

NLP · Large Language Models · Generation · Text · Multimodal

github · GitHub available · 2026-07-17

generative-ai — Comprehensive resources on Generative AI, including a detailed roadmap, projects, use cases, interview preparation, and coding preparation.

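A collection of generative AI resources, including a roadmap, projects, use cases, and interview and coding preparation.

Use case: Learning generative AI
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-07-16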

Trajectory-aware Cross-view Geo-localization with Sequential Observations

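Cross-view geo-localization matches ground-level observations against geo-tagged satellite imagery. Recent …

Quality Prediction / Anomaly Detection · Deep Learning · Compression / Quantization · Detection · Image · Text

Use case: Detection
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-07-16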

RESOURCE2SKILL: Distilling Executable Agent Skills from Human-Created Multimodal Resources

Skills are a useful abstraction for software agents, turning human and agent experience into reusable …

NLP · RAG · Image · Text · Video

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-07-16

Xiaomi-Robotics-1: Scaling Vision-Language-Action Models with over 100K Hours of Real-World Trajectories

We present Xiaomi-Robotics-1, a foundational vision-language-action (VLA) model capable of (1) following …

Deep Learning · Compression / Quantization · Generation · Text · Multimodal

Use case: Generation
Difficulty: Easy
Cost: High

Computer Vision · Segmentation · Image · Text · Multimodal

Generalizable VLA Finetuning via Representation Anchoring and Language-Action Alignment

Finetuning a pretrained vision-language model (VLM) on robot demonstrations via behavior cloning (BC) has …

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

Computer Vision · Segmentation · Image · Text · Video

Open-AoE: An Open Egocentric Manipulation Dataset and Toolchain for Embodied Learning

Egocentric videos of human manipulation provide scalable supervision for embodied intelligence, yet existing …

Use case: Segmentation
Difficulty: Easy
Cost: High

Diagnosing and Calibrating Tool-Call Boundary Drift in Multi-Teacher On-Policy Distillation

Agentic language models must learn when to call tools, when to consume tool responses, and when to answer …

Deep Learning · Compression / Quantization · Generation · Text

Use case: Generation
Difficulty: Easy
Cost: High

Cura 1T: Specialized Model for Agentic Healthcare

Healthcare spans high-stakes communication, expert reasoning, and workflow execution, yet specialized LLMs …

NLP · Large Language Models · Image · Text

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

VideoRAE: Taming Video Foundation Models for Generative Modeling via Representation Autoencoders

Video generative models commonly rely on latent spaces learned by 3D Variational Autoencoders (3D-VAEs). …

Use case: Generation
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-15

vowpal_wabbit — Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

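Vowpal Wabbit is a machine learning system that pushes the field forward with techniques such as online learning, feature hashing, allreduce, reductions, learning to search, and active and interactive learning.

Reinforcement Learning · Text

Use case: Online and interactive machine learning
Difficulty: Easy
Cost: Medium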
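Two of the techniques named in the Vowpal Wabbit entry, feature hashing and online learning, can be combined in a short sketch: string features are hashed into a fixed-size weight table, and a logistic model is updated one example at a time with SGD. This illustrates the techniques only; it is not Vowpal Wabbit's implementation or input format, and the table size (2^18) and learning rate are assumptions.

```python
import math
import zlib


def hash_features(tokens, num_bits):
    """Hashing trick: map each string feature to a slot in a 2**num_bits table."""
    mask = (1 << num_bits) - 1
    return [zlib.crc32(token.encode("utf-8")) & mask for token in tokens]


class OnlineLogisticRegression:
    """Logistic regression trained one example at a time with plain SGD."""

    def __init__(self, num_bits=18, learning_rate=0.5):
        self.num_bits = num_bits
        self.learning_rate = learning_rate
        self.weights = [0.0] * (1 << num_bits)  # fixed size, no feature dictionary

    def predict(self, tokens):
        z = sum(self.weights[i] for i in hash_features(tokens, self.num_bits))
        z = max(-35.0, min(35.0, z))  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, tokens, label):
        """One SGD step on log loss; label is 0 or 1. Returns the pre-update prediction."""
        p = self.predict(tokens)
        gradient = p - label  # d(log loss)/dz; each present feature has value 1
        for i in hash_features(tokens, self.num_bits):
            self.weights[i] -= self.learning_rate * gradient
        return p


model = OnlineLogisticRegression()
stream = [(["good", "great"], 1), (["bad", "awful"], 0)] * 20
for tokens, label in stream:
    model.update(tokens, label)
print(round(model.predict(["good"]), 3), round(model.predict(["awful"]), 3))
```

Hashing keeps memory fixed no matter how many distinct features appear, at the price of occasional collisions; online updates mean the data never has to fit in memory.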

huggingface · Hugging Face available · 2026-07-14

ReflectWorld-MM: An Entity-Oriented Multimodal Memory System for Open-Ended Video Streams

Building assistants that can continually watch the world, remember what they see, and reason over their …

Computer Vision · Multimodal · Image · Text · Audio

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-07-14

From Human-Centric to Agentic Code Review: The Impact of Different Generations of Generative AI Technology on Review Quality

Code review helps maintain software quality before code integration, but it also imposes a substantial …

Quality Prediction / Anomaly Detection · Deep Learning · Transformer · Generation · Text

Use case: Generation
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-14

Awesome-Embodied-Robotics-and-Agent — This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥

A curated repository of research combining embodied AI and robots with large language models.

Use case: Embodied AI and robotics research
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-14

LakonLab — Official implementation of AsymFlow, pi-Flow, GMFlow

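LakonLab is the official open-source implementation of AsymFlow, pi-Flow, and GMFlow, which are flow-based generative models.

Deep Learning · Compression / Quantization · Generation · Image · Text

Use case: Implementing flow-based generative models
Difficulty: Easy
Cost: Medium

github · GitHub available · 2026-07-14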

memvid — Memory layer for AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer. Give your agents instant retrieval and long-term memory.

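Memvid is a serverless, single-file memory layer that gives AI agents instant retrieval and long-term memory, replacing complex RAG pipelines.

NLP · Large Language Models · Generation · Text · Video

Use case: Managing AI agent memory
Difficulty: Easy
Cost: High

huggingface · GitHub available · Hugging Face available · 2026-07-13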

RAGU: A Multi-Step GraphRAG Engine with a Compact Domain-Adapted LLM

Graph retrieval-augmented generation (GraphRAG) enhances large language models with structured knowledge, yet …

NLP · Large Language Models · Detection · Generation · Summarization

Use case: Detection
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-07-13

Qwen-Music Technical Report

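In this report, we introduce Qwen-Music, a powerful music generation model capable of producing highly musical …

Sensors / Time Series · Quality Prediction / Anomaly Detection · Deep Learning · Transformer · Generation · Text · Audio

Use case: Generation
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-13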

Matcha-TTS — [ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

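Matcha-TTS is a fast TTS architecture based on conditional flow matching that takes speaker characteristics into account.

Generative AI · Diffusion Models · Text · Audio

Use case: TTS architecture design
Difficulty: Easy
Cost: High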
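Conditional flow matching, which Matcha-TTS builds on, trains a network to regress a simple target velocity field and then generates by integrating an ODE. The standard optimal-transport formulation is shown below. Here x_0 is Gaussian noise, x_1 a target mel-spectrogram, μ the conditioning produced by the text encoder, and σ_min a small constant; this notation is assumed for the sketch.

```latex
\begin{aligned}
x_t &= \bigl(1 - (1 - \sigma_{\min})\,t\bigr)\,x_0 + t\,x_1, \qquad x_0 \sim \mathcal{N}(0, I),\; t \sim \mathcal{U}[0, 1] \\
u_t &= \frac{d x_t}{d t} = x_1 - (1 - \sigma_{\min})\,x_0 \\
\mathcal{L}(\theta) &= \mathbb{E}_{t,\,x_0,\,x_1}\bigl\lVert v_\theta(x_t, t \mid \mu) - u_t \bigr\rVert^2 \\
x_{t + \Delta t} &\approx x_t + \Delta t\, v_\theta(x_t, t \mid \mu)
\end{aligned}
```

Because the target paths are straight lines, a few Euler steps of the last line are often enough at synthesis time, which is where the speed of this family of TTS models comes from.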

github · GitHub available · 2026-07-13

Irodori-TTS — A Flow Matching-based Text-to-Speech Model with Emoji-driven Style Control

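Converts text to speech using emoji-driven style control, making it possible to render emotionally expressive text as expressive speech.

Generative AI · Diffusion Models · Generation · Text · Audio

Use case: Text-to-speech
Difficulty: Easy
Cost: High

arxiv · GitHub available · 2026-07-12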

Beyond Looking Up, Try Looking Around: Harmonizing Global Structure and Local Consistency in Optimal Transport for Short Text Clustering

Pseudo-labeling based on Optimal Transport (OT) has become an effective mechanism for enhancing short text clustering …

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: Medium
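OT-based pseudo-labeling, the starting point of the short-text-clustering paper above, usually solves an entropy-regularized transport problem with Sinkhorn-Knopp iterations so that cluster assignments respect size constraints instead of collapsing into one cluster. The sketch below assumes uniform cluster marginals and ε = 0.1. It shows only this baseline step, not the paper's method for harmonizing global structure and local consistency.

```python
import math


def sinkhorn(scores, epsilon=0.1, n_iters=50):
    """Entropy-regularized OT between N texts and K clusters.

    Returns a transport plan whose rows each sum to 1/N and whose columns
    approach 1/K, i.e. clusters are pushed toward equal size
    (uniform marginals are an assumption of this sketch).
    """
    n, k = len(scores), len(scores[0])
    plan = [[math.exp(s / epsilon) for s in row] for row in scores]
    for _ in range(n_iters):
        for j in range(k):  # scale each column to total mass 1/K
            col = sum(plan[i][j] for i in range(n))
            for i in range(n):
                plan[i][j] /= col * k
        for i in range(n):  # scale each row to total mass 1/N
            row = sum(plan[i])
            plan[i] = [v / (row * n) for v in plan[i]]
    return plan


def pseudo_labels(scores, **kwargs):
    """Hard pseudo-label = argmax of each row of the transport plan."""
    plan = sinkhorn(scores, **kwargs)
    return [max(range(len(row)), key=row.__getitem__) for row in plan]


# Every text slightly prefers cluster 0, so a plain argmax collapses to one cluster.
scores = [[0.9, 0.1], [0.7, 0.3], [0.6, 0.4], [0.55, 0.45]]
print(pseudo_labels(scores))  # [0, 0, 1, 1]
```

The texts with the weakest preference for cluster 0 are the ones moved to cluster 1, so the balance constraint is met while the relative preferences are still respected.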

huggingface · Hugging Face available · 2026-07-12

Predictive Divergence Masks for LLM RL

Reinforcement learning for large language models (LLMs) typically relies on trust-region masks to stabilize …

Deep Learning · Compression / Quantization · Text · Reinforcement Learning

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High
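The trust-region masks mentioned in the Predictive Divergence Masks abstract are typified by PPO's clipped surrogate, which in effect drops the gradient of tokens whose probability ratio has already moved outside [1 − ε, 1 + ε] in the direction the advantage favours. The sketch shows that standard baseline with an assumed ε = 0.2; it is not the paper's predictive divergence mask.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantage, eps=0.2):
    """Per-token PPO clipped surrogate (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)


def passes_trust_region(logp_new, logp_old, advantage, eps=0.2):
    """True if the token's gradient survives clipping (unclipped branch active).

    Tokens whose ratio has already moved too far in the direction the
    advantage favours are effectively masked out of the update.
    """
    ratio = math.exp(logp_new - logp_old)
    if advantage > 0:
        return ratio <= 1.0 + eps
    if advantage < 0:
        return ratio >= 1.0 - eps
    return False  # zero advantage: the term is identically zero


print(ppo_clipped_objective(math.log(1.5), 0.0, advantage=1.0))  # 1.2
print(passes_trust_region(math.log(1.5), 0.0, advantage=1.0))    # False
print(passes_trust_region(math.log(1.5), 0.0, advantage=-1.0))   # True
```

Note the asymmetry: a token that has become more likely is still updated if its advantage is negative, because pulling it back toward the old policy stays inside the trust region.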

huggingface · Hugging Face available · 2026-07-11

GigaChat Audio: Time-aware Large Audio Language Model

Temporal grounding in long recordings remains challenging for audio-conditioned LLMs. We present a time-aware …

NLP · Large Language Models · Text · Audio

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-11

awesome-nlp — :book: A curated list of resources dedicated to Natural Language Processing (NLP)

This repository collects resources on natural language processing (NLP).

NLP · Text

Use case: Collection of NLP resources
Difficulty: Easy
Cost: Medium

huggingface · Hugging Face available · 2026-07-10

REBASE: Reference-Background Subspace Elimination for Training-Free In-Context Segmentation

Training-free in-context segmentation enables new object categories to be introduced at inference time from a …

Quality Prediction / Anomaly Detection · NLP · Prompt Engineering · Detection · Segmentation · Image

Use case: Detection
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-07-08

Agon: Competitive Cross-Model RL with Implicit Rival Grading of Reasoning

Reinforcement learning from verifiable rewards (e.g. GRPO) is the engine behind today's reasoning models, yet …

Computer Vision · Segmentation · Text · Reinforcement Learning

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High
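The Agon abstract names GRPO as the engine behind current reasoning models. Its core step, computing group-relative advantages, is short enough to sketch: several responses are sampled for one prompt, and each reward is normalized by the group's mean and standard deviation, so no learned critic is needed. Using the population standard deviation and ε = 1e-6 are assumptions of this sketch.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for a group of responses to the same prompt.

    Each reward is centred on the group mean and scaled by the group's
    standard deviation, replacing a learned value function (critic).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption here
    return [(r - mean) / (std + eps) for r in rewards]


# Verifiable rewards: 1 if the final answer checks out, else 0.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # approximately [1, -1, 1, -1]
print(group_relative_advantages([1.0, 1.0, 1.0]))       # [0.0, 0.0, 0.0]
```

When every response in a group gets the same reward, all advantages are zero and the group contributes no learning signal, one reason prompt difficulty matters for this family of methods.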

github · GitHub available · 2026-07-08

VoxCPM — VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

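A framework for state-of-the-art tokenizer-free TTS, supporting multilingual speech generation, creative voice design, and true-to-life voice cloning.

Generative AI · Speech / Music Generation · Generation · Text · Audio

Use case: Multilingual speech generation
Difficulty: Easy
Cost: Medium

huggingface · Hugging Face available · 2026-07-07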

UI2App: Benchmarking Visual Interaction Inference in Executable Web Application Generation

Large language models (LLMs) have demonstrated growing competence in web page generation. However, existing …

Use case: Generation
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-07

enchanted — Enchanted is iOS and macOS app for chatting with private self hosted language models such as Llama2, Mistral or Vicuna using Ollama.

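Enchanted is an iOS and macOS app for chatting with privately self-hosted language models such as Llama 2, Mistral, and Vicuna via Ollama.

Use case: Chatting with self-hosted language models on iOS and macOS
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-02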

langextract — A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.

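A Python library that uses LLMs to extract structured information from unstructured text.

NLP · Large Language Models · Image · Text

Use case: Information extraction from text
Difficulty: Easy
Cost: High

github · GitHub available · 2026-06-30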

mxcp — Model eXecution + Context Protocol: Enterprise-Grade Data-to-AI Infrastructure

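Building infrastructure that turns data into AI-ready inputs can help solve business problems. This project proposes MXCP (Model eXecution + Context Protocol), which simplifies data transformation, and …

Use case: Improving business operations with data-to-AI infrastructure
Difficulty: Easy
Cost: High

github · GitHub available · 2026-06-30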

ComfyUI-LTXVideo — LTX-Video Support for ComfyUI

Adds LTX-Video support to ComfyUI.

Generative AI · Diffusion Models · Generation · Image · Text

Use case: Video generation with LTX-Video in ComfyUI
Difficulty: Easy
Cost: High

arxiv · GitHub available · 2026-06-28

When LLMs Develop Languages: Symbolic Communication for Efficient Multi-Agent Reasoning

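Chain-of-Thought (CoT) improves large language models (LLMs) on difficult reasoning tasks, but it often incurs …

Suited to MI · Deep Learning · Compression / Quantization · Text

Use case: Technical validation / paper-reading aid
Difficulty: Hard
Cost: High

Quality Prediction / Anomaly Detection · Deep Learning · Compression / Quantization · Generation · Image · Text

github · GitHub available · 2026-06-25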

ml-mdm — Train high-quality text-to-image diffusion models in a data & compute efficient manner

Train high-quality text-to-image diffusion models in a data & compute efficient manner

Use case: Generation
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-05-07

Masked Diffusion Language Models are Strong and Steerable Text-Based World Models for Agentic RL

Recent growth in reinforcement learning (RL) has surfaced a need for diverse, specialized training environment …

NLP · Large Language Models · Text · Reinforcement Learning

Use case: Technical validation / paper-reading aid
Difficulty: Easy
Cost: High