MLinfo | Machine Learning and AI Paper Roundup

diffusers — 🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

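A library of diffusion models that can be used for image, video, and audio generation.

Generative AI | Diffusion models | Generation | Image | Text

Use case: Image, video, and audio generation
Difficulty: Easy
Cost: High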

rig — ⚙️🦀 Build modular and scalable LLM Applications in Rust

A library for building modular LLM applications in Rust.

Use case: Building modular LLM applications
Difficulty: Easy
Cost: High

Medical_Image_Analysis — Foundation models based medical image analysis

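Medical image analysis is a research field that extracts information from image data to support medical diagnosis and treatment. This work proposes a new approach to medical image analysis using foundation models.

Use case: Medical image analysis
Difficulty: Easy
Cost: High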

Sana — SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

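This work introduces SANA, a high-resolution image generation model that produces high-quality, high-resolution images at low computational cost.

Use case: High-resolution image synthesis
Difficulty: Easy
Cost: High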

qdrant — Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

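A high-performance, large-scale vector database and vector search engine for AI applications, also available as a cloud service.

NLP | Embeddings & search | Generation | Image

Use case: Vector database and vector search
Difficulty: Easy
Cost: Low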

weaviate — Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.

A vector database that combines vector search with structured filtering.

MLOps | Generation | Image

Use case: Vector database
Difficulty: Easy
Cost: Medium

metaflow — Build, Manage and Deploy AI/ML Systems

A tool for building, managing, and deploying AI/ML systems.

Use case: Building, managing, and deploying AI/ML systems
Difficulty: Easy
Cost: High

kserve — Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

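A standardized, distributed inference platform for generative and predictive AI, designed for scalable multi-framework deployment on Kubernetes.

Use case: Generative and predictive AI inference on Kubernetes
Difficulty: Easy
Cost: High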

openvino — OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

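An open-source toolkit for optimizing and deploying AI inference.

Deep learning | Transformer | Classification | Generation | Audio

Use case: Optimizing and deploying AI inference
Difficulty: Easy
Cost: Low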

Awesome-Video-Diffusion — A curated list of recent diffusion models for video generation, editing, and various other applications.

Awesome-Video-Diffusion is a curated list of recent diffusion models for video generation, editing, and other applications.

Generative AI | Diffusion models | Generation | Video

Use case: Video generation and editing
Difficulty: Easy
Cost: High

FastVideo — A unified inference and post-training framework for accelerated video generation.

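FastVideo is a unified inference and post-training framework for accelerated video generation.

Deep learning | Compression & quantization | Generation | Video

Use case: Accelerating video generation
Difficulty: Easy
Cost: High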

LightX2V — Lightweight Image Video Action Generation Inference Framework

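LightX2V is a lightweight inference framework for image, video, and action generation.

Deep learning | Compression & quantization | Generation | Image | Video

Use case: Lightweight inference infrastructure for generation
Difficulty: Easy
Cost: High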

FastGen — NVIDIA FastGen: Fast Generation from Diffusion Models

NVIDIA FastGen aims to accelerate diffusion models, enabling fast generation from them.

Use case: Accelerating diffusion models
Difficulty: Easy
Cost: High

haystack — Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.

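An open-source AI orchestration framework for designing the pipelines and agent workflows needed to build LLM applications.

Deep learning | Transformer | Generation | Summarization | Text

Use case: Building LLM applications
Difficulty: Easy
Cost: High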

RAG_Techniques — This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. Each technique has a detailed notebook tutorial.

A collection of advanced techniques for Retrieval-Augmented Generation (RAG) systems, each with a detailed notebook tutorial.

Use case: RAG techniques and tutorials
Difficulty: Easy
Cost: High

Expanding Flow Maps

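This work proposes Expanding Flow Maps, a new approach to flow-based generative models. Compared with conventional parameterizations restricted to a constant dimension or a constant sequence length, Expanding Flow Maps

Use case: Developing flow-based generation techniques
Difficulty: Hard
Cost: Medium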

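Quality prediction/anomaly detection | Image inspection | Deep learning | Transformer | Detection | Generation | Image

Synthetic data generation framework for quality control automation in gravure printing

This work proposes a synthetic data generation framework as a new approach to print quality control. By generating synthetic data for quality control in rotogravure printing, the framework

Use case: Developing print quality control techniques
Difficulty: Hard
Cost: High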

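Sensor/time series | Computer vision | Segmentation | Classification | Generation | Text

Beyond Sufficiency: Time Series Explanation with Counterfactual Necessity

This work proposes TimePNS (Time Series Explanation with Counterfactual Necessity), a new approach to time series data analysis. For time series data, TimePNS

Use case: Developing time series analysis techniques
Difficulty: Hard
Cost: Low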

Windowed-MTP: Removing the Full-Context Draft-KV Tax at Million-Token Context

Speculative decoding accelerates autoregressive generation by having a cheap draft propose tokens that a targe

Use case: Generation
Difficulty: Hard
Cost: High

Finite-Sample Coverage Audits for High-Recall Candidate Generation: Certification and Learning-Theoretic Design

An initial high-recall stage in an empirical pipeline decides which items pass to later review, labelling, or

NLP | RAG | Generation | Text

Use case: Generation
Difficulty: Hard
Cost: Low

Test-Time Scaling via Error Localization

Scaling inference-time computation has emerged as a reliable method to improve the performance of large langua

NLP | Large language models | Detection | Generation | Text

Use case: Detection
Difficulty: Hard
Cost: High

Token Budget Saturation and Mechanistic Early Detection of Reasoning Non-Convergence in Chain-of-Thought Models

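A study of reasoning non-convergence in chain-of-thought models. Whether a model fails to converge depends on the number of tokens it generates, and a model that does not converge cannot solve the problem. To address this, the study looks at terminating generation

NLP | Prompt engineering | Detection | Generation

Use case: Deciding when to terminate generation in chain-of-thought reasoning models
Difficulty: Hard
Cost: High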

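Quality prediction/anomaly detection | Computer vision | Segmentation | Generation | Text

Context-weighted Discrete Flow Matching

This work examines how to make proper use of context in discrete flow matching and proposes a method for using context appropriately to improve the accuracy of discrete flow matching models.

Use case: Use of context in discrete flow matching
Difficulty: Hard
Cost: High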

Mean-to-Score Discrete Diffusion: Posterior-Mean Denoisers for Score Entropy

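This work advances the analysis of a Bayesian interpretation of discrete probabilistic models. To improve accuracy, it proposes a discrete probabilistic model that restricts negative score ratios.

Use case: Analysis of the Bayesian interpretation of discrete probabilistic models
Difficulty: Hard
Cost: High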

Emergent Misalignment Recruits a Pre-existing Persona Subspace

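This work advances the understanding of biased representations in aligned language models. By analyzing the models' representations, it argues that these biased representations can be understood and then corrected.

NLP | Fine-tuning | Generation | Text

Use case: Understanding biased representations in aligned language models
Difficulty: Hard
Cost: High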

M$^3$-Gen: Interpretable Multimodal Generation of Gene Expression Profiles Using Clinical and Imaging Data

Integrating heterogeneous biomedical data, including clinical metadata, histopathology images, and molecular p

Explainable | NLP | RAG | Generation | Image | Multimodal

Use case: Generation
Difficulty: Hard
Cost: High

Adaptive Depth Sparse Framework: Similarity-Driven Resource Allocation for Pre-Trained LLMs

Large language models (LLMs) achieve strong generation and reasoning performance, but the Transformer architec

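Use case: Generation
Difficulty: Hard
Cost: High

Sensor/time series | Quality prediction/anomaly detection | Deep learning | Transformer | Generation | Forecasting | Text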

Transformer-based Diffusion models for Hydrological Time Series Probabilistic Imputation and Forecasting

The modeling of hydrometeorological time series with limited observations is a key challenge in the monitoring

Use case: Generation
Difficulty: Hard
Cost: High

Automated Synthesis and Adversarial Validation of Executable Causal Research Pipelines

This work addresses the automated synthesis and adversarial validation of executable causal research pipelines.

Use case: Causal research pipelines
Difficulty: Hard
Cost: High

CASC: Causal Adversarial Subspace Clustering for Multivariate Spatiotemporal Data

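This work proposes the CASC framework, which performs subspace clustering with a graph neural network that can handle diverse data, including multivariate spatiotemporal data.

Use case: Clustering multivariate spatiotemporal data
Difficulty: Hard
Cost: Low

Quality prediction/anomaly detection | Computer vision | Segmentation | Generation | Multimodal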

Best-of-Evidence: Best-of-N Selection under Partial Verification

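Applies best-of-N (BoN) selection of model outputs to vision-language tasks where only partial verification is available. The method makes output selection more efficient.

Use case: Making vision-language tasks with partial verification more efficient
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | NLP | Fine-tuning | Detection | Generation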

RadioTrace: Transmitter-Aware Diffusion for Radio Map Estimation without Deployment-Time Fine-Tuning

Proposes transmitter-aware diffusion for estimating radio maps (radio-frequency maps), enabling efficient radio map estimation.

Use case: Radio map estimation
Difficulty: Hard
Cost: High

Multi-turn RL with Structural and Performance Aware Rewards for CUDA Kernel Generation

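Proposes CudaPerf, which supports CUDA kernel generation and enables efficient generation of high-performance CUDA kernels.

NLP | Large language models | Generation | Reinforcement learning

Use case: Supporting CUDA kernel generation
Difficulty: Hard
Cost: High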

Position Bias is Hidden Behind Ceiling Effects: A Permutation Diagnostic for LLM Benchmarks

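Proposes a method for analyzing position bias in the evaluation of large language models (LLMs), revealing how position bias affects evaluation results.

NLP | Large language models | Detection | Generation

Use case: Analyzing position bias in LLM evaluation
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Generative AI | Video generation | Generation | Image | Text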

GraphVid: Interactive Graph-Controllable Video Generation

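GraphVid generates video from a graph and text and can precisely control the motion of multiple objects. The graph stores information describing object motion, while the text specifies constraints on generation.

Use case: Controllable video generation
Difficulty: Hard
Cost: High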

From Resource Flow to Executable Tests: Petri-Net-Guided LLM Test Generation for Concurrent Stateful Rust APIs

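This work proposes a method that uses Petri nets modeling resource flow to automatically generate tests that exercise an API. The method generates scenarios for testing API functionality and ensures that the tests execute correctly.

Use case: Testing concurrent APIs
Difficulty: Hard
Cost: High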

ElasticTTT: Prior-Preserving Test-Time Tuning for Video Editing

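ElasticTTT enables a model to adapt its behavior at test time. At test time, the model combines information from previous samples with current information so that it behaves correctly when editing video.

Generative AI | Diffusion models | Generation | Text | Video

Use case: Test-time tuning for video editing
Difficulty: Hard
Cost: High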

GS-Agent: Creating 4D Physical Worlds With Generative Simulation

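GS-Agent generates physically plausible four-dimensional worlds from natural language. To preserve physical correctness, it applies physical reasoning during generation.

MI-oriented | NLP | RAG | Generation | Image | Text

Use case: Generating 4D physical worlds
Difficulty: Hard
Cost: High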

Artificial Epanorthosis: Why large language models overuse a classical rhetorical figure, and how to mitigate it

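This work examines the tendency of large language models to overuse a classical rhetorical figure. The results indicate that the shape of the models' training data influences this tendency.

Deep learning | Compression & quantization | Classification | Generation | Text

Use case: Epanorthosis in large language models
Difficulty: Hard
Cost: High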

Bridging the Gap Between Plausibility and Admissibility: Constraint-Aware Flow Maps for Dynamic Graph Systems

Generative models can support decision-making under uncertainty by producing ensembles of plausible future sys

Computer vision | Segmentation | Generation

Use case: Generation
Difficulty: Hard
Cost: High

Tabular-oriented | NLP | Large language models | Generation | Text

Euclid-MCP: A Model Context Protocol Server for Deterministic Logical Reasoning via Prolog

Large Language Models (LLMs) excel at natural language understanding and generation but remain unreliable for

Use case: Generation
Difficulty: Hard
Cost: High

GRADRAG: Cross-Component Prompt Adaptation for Coordinated Multi-Agent RAG

Retrieval-Augmented Generation (RAG) systems increasingly employ multiple LLM agents. Yet, most prior work opt

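Use case: Generation
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Deep learning | Compression & quantization | Generation | Reinforcement learning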

Expert Behavior Prior Reinforcement Learning

Behavior prior reinforcement learning (BPRL) has emerged as a promising paradigm to improve sample efficiency

Use case: Generation
Difficulty: Hard
Cost: High

For small data | Easy to try on CPU | Condition optimization | NLP | Large language models | Generation

An LLM-Driven Workflow for Automated Process Control Strategy Generation and Tuning from Dynamic Process Models

This workflow uses a large language model to automatically generate control strategies from dynamic process models.

Use case: Automated generation of control strategies
Difficulty: Hard
Cost: High

pAI-Econ-claude: A Gated Human-in-the-Loop Multi-Agent Architecture for AI-Assisted Economic Theory Development

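This work develops a system that uses large language models to support research in economics. The system helps scholars automate theoretical model development.

Use case: Research support system for economics
Difficulty: Hard
Cost: High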

SafeStep: AI-powered Travel Assistance for Elderly People with Frailty or Dementia

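Because older adults often have difficulty getting around, this work develops a system that supports safe travel for them. The system combines an LLM with predictive models.

Use case: Safe travel assistance for older adults
Difficulty: Hard
Cost: High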

Can Generative Recommendation Reach Cold Items? A Temporal Perspective on Semantic-ID Generation

Semantic-ID-based generative recommendation represents items as sequences of shared semantic tokens, enabling

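Computer vision | Video recognition | Generation | Text

Use case: Cold items in generative recommender systems
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Deep learning | Transformer | Generation | Text | Audio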

Faster IndexTTS-2: Accelerating and Streaming Autoregressive Zero-Shot Text-to-Speech Synthesis on GPUs

Autoregressive text-to-speech models achieve strong naturalness but suffer from slow inference due to sequenti

Use case: Generation
Difficulty: Hard
Cost: High

Reexamining zero-shot summarization: Empirical investigation of trustworthiness of LLM-summarizers

Zero-shot summarization using Large Language Models (LLMs) has significantly advanced the abstractive summariz

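MI-oriented | Deep learning | Compression & quantization | Classification | Generation | Summarization

Use case: Classification
Difficulty: Hard
Cost: High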

SciExplore: Evaluating Autonomous Agents from Scientific Navigation to Information Integration

Scientific research involves complex information-seeking and reasoning workflows across heterogeneous sources.

NLP | Large language models | Generation | QA | Text

Use case: Generation
Difficulty: Hard
Cost: High

Traceable Scholarship: Page Anchors and Ariadne's Thread for Humanistic Inquiry in the Age of Generative AI

Generative AI lets large language models produce scholarly-looking text within seconds, yet fluency does not e

Use case: Generation
Difficulty: Hard
Cost: High

Source-Prior-Driven Selective Adaptation for Efficient Diffusion Model Finetuning

Fine-tuning large diffusion models for new domains or styles involves a trade-off: improving target-specific g

Use case: Generation
Difficulty: Hard
Cost: High

Is Deep Research Reliable? Misleading Knowledge Induces False Conclusions

Deep Research agents extend LLM-based assistants into long-horizon workflows involving planning, retrieval, ev

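Use case: Generation
Difficulty: Hard
Cost: High

Explainable | MI-oriented | Quality prediction/anomaly detection | Deep learning | Transformer | Classification | Generation | Image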

Enhancing Explainable Cardiac Diagnosis with Guide-Grounded Multimodal LLMs

The electrocardiogram (ECG) is a cornerstone of cardiac assessment, yet clinical deployment of deep learning

Use case: Classification
Difficulty: Hard
Cost: High

Profiling Lightweight Large Language Models

Lightweight large language models (LLMs) are increasingly being deployed locally on personal computers and are

Use case: Generation
Difficulty: Hard
Cost: High

Search Hardness-Aware LLM-Based Problem Formulation for Expensive Simulation-Driven Design

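Simulation-driven design aims to reach a design using as few high-fidelity simulations as possible. Existing work has improved optimization algorithms to address this, but has not examined the problem formulation itself. This paper

Use case: Cost-saving simulation-driven design
Difficulty: Hard
Cost: High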

MedGame: Storytelling Gamification Empowered by Large Language Models for Medical Education

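Large language models (LLMs) hold great promise for medical education, but current systems only answer questions or give one-off feedback. Meanwhile, turning clinical cases into decision-centered learning

NLP | Large language models | Generation | QA | Text

Use case: Applying large language models to medical education
Difficulty: Hard
Cost: High

Sensor/time series | NLP | Large language models | Generation | Text | Audio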

An Evaluation Framework for Structured Audio Captions Validated by Controlled Perturbations

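This paper proposes an evaluation method for audio captions that aims to overcome the limitations of existing approaches. The proposed framework evaluates each aspect of an audio caption and, rather than relying on question-answering-style evaluation, evaluates the neutrality of the caption

Use case: Building an evaluation framework for audio captions
Difficulty: Hard
Cost: High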

Capital Markets LLM Reliability Score (CM-LRS): From Plausible to Bankable

In capital-markets workflows the question is rarely whether a large language model can produce a fluent draft,

Use case: Generation
Difficulty: Hard
Cost: High

Progressive Cramming: Reliable Token Compression and What It Reveals

Token cramming compresses sequences into learned embeddings with near-perfect reconstruction, but fixed token

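NLP | Embeddings & search | Generation | Text

Use case: Generation
Difficulty: Hard
Cost: Low

Explainable | Quality prediction/anomaly detection | NLP | Large language models | Generation | Text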

PrefReward: Learning User Preference Matrix for Personalized Text Generation

Large Language Models (LLMs) have demonstrated remarkable ability in generating personalized content by levera

Use case: Generation
Difficulty: Hard
Cost: High

QuantiBias: Benchmarking Quantization-Induced Bias in LLMs

Almost every large language model that reaches a broad audience is quantized: trained in full precision, then

Use case: Generation
Difficulty: Hard
Cost: High

Chemical Chain-of-Thought Functions as a Hallucination-Prone Molecular Scratchpad

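Points out that language models predicting chemical structures tend to generate unreliable information, and examines the causes and possible remedies.

MI-oriented | NLP | RAG | Generation | Text

Use case: Chemical structure prediction
Difficulty: Hard
Cost: Low

Quality prediction/anomaly detection | Deep learning | Transformer | Generation | Text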

Transformer-Assisted LLM-Based Source Code Summarisation: to Enable More Secure Software Development

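A study aimed at improving models that generate natural-language explanations of source code during the maintenance phase of software development.

Use case: Speeding up software development
Difficulty: Hard
Cost: High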

LegalCiteTrust: Benchmarking Citation Trustworthiness in Chinese Long-Form Legal Research Reports

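Proposes LegalCiteTrust to assess the trustworthiness of citations in Chinese long-form legal research reports and to detect and evaluate untrustworthy citations.

Use case: Improving the trustworthiness of legal research reports
Difficulty: Hard
Cost: High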

REFACT: Adaptive Fact Restatement for Compact and Faithful Chain-of-Thought Reasoning

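Points out that language models performing long-form reasoning can generate logic that diverges from the provided context. To better fuse the context with the reasoning, REFACT (REstating Facts in Adapti

Use case: Improving chain-of-thought (CoT) reasoning
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Deep learning | Transformer | Generation | Image | Text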

Streaming Multi-Agent Autoregressive Diffusion Model with World State Registers

For multi-agent simulation, this work assumes that a shared world state is maintained across agents and is reflected in their observations.

Use case: Multi-agent simulation
Difficulty: Hard
Cost: High

Inference-Time Scaling of Diffusion Models via Progressive Seed Pruning

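Shows that the initial noise seed strongly affects the quality of the images a diffusion model generates, and proposes a method to reduce the time cost of seed search.

Use case: Scaling diffusion models
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Deep learning | Transformer | Generation | Video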

SANA-Video 2.0: Hybrid Linear Attention with Attention Residuals for Efficient Video Generation

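Proposes a new method for improving the efficiency and quality of video generation models.

Use case: Video generation
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Computer vision | 3D & point clouds | Generation | 3D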

Future Rendering $\neq$ Future Surface: A Benchmark and Dataset for Dynamic Surface Reconstruction Beyond the Observed Window

Dynamic-scene reconstruction is almost always evaluated inside the observed time window, yet deployment settin

Use case: Generation
Difficulty: Hard
Cost: High

GrainGS: Gradient-Decoupled Gaussian Splatting for Efficient Dynamic Novel View Synthesis

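Dynamic scene reconstruction with 3D Gaussian Splatting must balance dynamic motion modeling, structural stability, and compact representation. In practice, existing methods implemented per primitive

Quality prediction/anomaly detection | Deep learning | Compression & quantization | Generation | 3D

Use case: Dynamic scene reconstruction with 3D Gaussian Splatting
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Deep learning | Transformer | Generation | Image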

SlerpFlow: Spherical Trajectory Correction for Rectified Flow Inversion

Rectified-flow-based diffusion transformers, particularly FLUX, have demonstrated outstanding performance in h

Use case: Generation
Difficulty: Hard
Cost: High

Computer vision | Segmentation | Generation | Text | Video

T-STAR: A Large-Scale Benchmark for Spatio-Temporal Panoptic Scene Graph Generation in Satellite Video

Structured understanding of satellite video is essential for advancing dynamic geospatial scene analysis from

Use case: Generation
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Generative AI | GAN | Generation | Image | Multimodal

Physics-Informed Deep Learning Model for Cross-Modality Super-Resolution in Fluorescence Microscopy

Cross-modality image translation offers a route to super-resolution fluorescence microscopy from low-resolutio

Use case: Generation
Difficulty: Hard
Cost: High

Decoupling Cross-Modality Manifold Discrepancy: Leveraging Visible Diffusion Priors for Infrared Super-Resolution

Infrared image super-resolution (IISR) mitigates the limitations imposed by low spatial resolution. Existing m

NLP | RAG | Generation | Image | Multimodal

Use case: Generation
Difficulty: Hard
Cost: High

HalluScope: Fine-grained Hallucination Diagnosis for Multimodal Large Language Models

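Multimodal large language models perform well at turning images into text, but diagnosing the hallucinations they produce remains an open problem. To address the shortcomings of mainstream coarse-grained detection methods, this work proposes a fine-grained diagnosis method.

Explainable | NLP | Large language models | Classification | Detection | Generation

Use case: Hallucination diagnosis
Difficulty: Hard
Cost: High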

Show, Don't Tell: Evaluating Spatial Cognition in Generative Pixels Rather Than LLM Text

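Spatial understanding is essential for bridging the physical world and static semantic understanding. In many spatial tasks, locations, regions, and paths are most naturally expressed by pointing or marking within a continuous visual scene, yet current spatial reasoning

Use case: Spatial understanding
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Deep learning | Transformer | Detection | Generation | Image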

GroupVideo: Multi-Identity Customized Text-to-Video Generation

Current identity customized video generation methodologies are predominantly limited to single-identity scenar

Use case: Detection
Difficulty: Hard
Cost: High

Explainable Deepfake Detection Challenge

Deepfake detection is moving beyond binary classification decisions toward systems that can also explain the v

Explainable | Computer vision | Image classification | Classification | Detection | Generation

Use case: Classification
Difficulty: Easy
Cost: Low

Distribution-Alignment Bridge for Uncertainty-Aware Text-to-Video Retrieval

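This paper proposes the Distribution-Alignment Bridge (DAB), which aligns text with video. DAB represents text and video entities as probability distributions and resolves the distributional discrepancy between them.

NLP | Embeddings & search | Generation | Text | Video

Use case: Text-to-video retrieval
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | NLP | RAG | Generation | Image | Unsupervised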

Unsupervised Metal Artifact Reduction in Dental CBCT using Fine-tuned Cycle-Consistent Adversarial Networks

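This work proposes using a cycle-consistent adversarial network (CycleGAN) to remove metal artifacts from dental CBCT images. With CycleGAN, after the metal artifacts are removed, the CBCT

Use case: Metal artifact removal
Difficulty: Hard
Cost: Low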

Ms. Forcing: Efficient Streaming Video Generation with Multi-Scale Patchification and Attention

This paper proposes Ms. Forcing, an efficient streaming video generation method that combines multi-scale patchification with attention.

Deep learning | Transformer | Generation | Video

Use case: Streaming video generation
Difficulty: Hard
Cost: High

MagicMakeup: A Region-Controllable Diffusion Transformer for High-Fidelity Makeup-Transfer

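To improve makeup transfer, this work proposes MagicMakeup, a region-controllable diffusion transformer that accounts for the strongly regional nature of makeup.

Use case: Makeup transfer
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Deep learning | Transformer | Generation | 3D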

FA-LAM: Focus-Aware Large Avatar Model for One-Shot 4D Animatable Gaussian Head

This paper proposes the Focus-Aware Large Avatar Model (FA-LAM), a model suited to one-shot generation of animatable Gaussian heads.

Use case: One-shot Gaussian head generation
Difficulty: Hard
Cost: High

Engine-Native Editable 3D World Reconstruction with Objects and Lighting

This paper proposes Lumera, a method used for engine-native 3D world reconstruction and for detecting lights.

NLP | Large language models | Detection | Generation | Image

Use case: 3D world reconstruction
Difficulty: Hard
Cost: High

WhereEdit: Mask-aware Local Latent Editing for One-Step Image Editing

This work proposes WhereEdit, which uses mask-aware local latent editing to perform one-step image editing.

Use case: Image editing
Difficulty: Hard
Cost: Medium

Agentic Designer: Progressive Multi-Agent Collaboration for Structure-Aware Interior Layout Generation

Generating realistic interior furniture layouts that strictly adhere to architectural constraints (e.g., walls

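Use case: Generation
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Deep learning | Compression & quantization | Generation | Image | 3D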

SubSplat: High-Resolution Pixel-aligned 3DGS via Sub-pixel Gaussian Reparameterization

Pixel-aligned Gaussian splatting enables efficient and generalizable novel-view synthesis. However, high-resol

Use case: Generation
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | NLP | RAG | Generation | Image | Text

TableVerse: A Large-scale Tabletop Dataset with Real-world Grounded Layouts for Generalizable Manipulation

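Proposes TableVerse, a large-scale tabletop dataset aimed at automated manipulation. The dataset includes a practical method for generating physically feasible real-world layouts

Use case: Generating tabletop environments for automated manipulation
Difficulty: Hard
Cost: Low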

Distributed Model-Based Diffusion For Scalable Multi-Robot Trajectory Optimization

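Proposes distributed model-based diffusion for multi-robot trajectory optimization. The framework supports efficient trajectory generation while handling non-convex, nonlinear, and non-differentiable settings.

Use case: Multi-robot trajectory optimization
Difficulty: Hard
Cost: High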

nestia — NestJS Helper + AI Chatbot Development

A NestJS-based tool for AI chatbot development.

Use case: Building AI chatbots
Difficulty: Easy
Cost: High

xtuner — A Next-Generation Training Engine Built for Ultra-Large MoE Models

xtuner is a training engine for fast training of ultra-large MoE models.

NLP | Large language models | Generation | Multimodal

Use case: Fast training of MoE models
Difficulty: Easy
Cost: High

remove-ai-watermarks — AI watermark remover. CLI and Python library to strip visible and invisible AI watermarks (Gemini / Nano Banana sparkle, SynthID) and provenance metadata (C2PA, EXIF, IPTC) from images.

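A CLI and Python library that strips visible and invisible AI watermarks and provenance metadata from images.

NLP | Large language models | Generation | Image

Use case: Removing AI watermarks from images
Difficulty: Easy
Cost: High

Quality prediction/anomaly detection | Deep learning | Compression & quantization | Generation | Text | Video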

Causal-Forcing — [ICML 2026] Official codebase for "Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation" & Causal Forcing++

In this paper, Causal-Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive

Use case: High-quality video generation
Difficulty: Easy
Cost: High

txtai — 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

A framework for working with LLMs that provides semantic search, LLM orchestration, and more.

Use case: Semantic search
Difficulty: Easy
Cost: High

Self-Supervised Bio-Inspired Robotic Trajectory Planning with Obstacle Avoidance

Trajectory planning is a fundamental problem in robotics, requiring the generation of collision-free and effic

Deep learning | Compression & quantization | Generation | Supervised | Self-supervised

Use case: Generation
Difficulty: Hard
Cost: High

Leaky Language Models: Stealing Architecture and Inference Optimizations via Per-Token Timing

This work presents LeakyLMs, a set of attacks that leak proprietary model, architecture, and deployment inform

Use case: Generation
Difficulty: Hard
Cost: Medium

Label-Free Finite-Volume-Residual Training of Attention Graph Neural Networks for Coupled Thermo-Fluid Fields

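This work develops a graph neural network with an attention mechanism (Attention Graph Neural Network) and improves prediction accuracy for fluid fields.

Deep learning | Graph neural networks | Generation | 3D

Use case: Using attention mechanisms for fluid field prediction
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Deep learning | Compression & quantization | Detection | Generation | Anomaly detection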

Classical Hardware Acceleration of Quantum Autoencoders for Real-Time Anomaly Detection in Collider Experiments

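This work studies classical hardware acceleration of quantum autoencoders for real-time anomaly detection in collider experiments.

Use case: Real-time anomaly detection in collider experiments
Difficulty: Hard
Cost: Low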

Statistical Inference for Rank Allocation in Low-Rank Adaptation

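A study of how to allocate rank in Low-Rank Adaptation (LoRA), with the goal of parameter efficiency.

Deep learning | Transformer | Generation | QA | Text

Use case: Parameter efficiency
Difficulty: Hard
Cost: High

MI-oriented | Deep learning | Transformer | Generation | Text | Reinforcement learning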

OLEDLM: A Unified Language Model for OLED Molecular Design

Proposes a new approach to OLED materials development: a framework that uses causal language models to predict optoelectronic properties.

Use case: OLED materials development
Difficulty: Hard
Cost: High

SPECTRA: State-Space Exogenous Context and Temporal-Frequency Resolution Architecture for Probabilistic Energy Forecasting

Modern power systems increasingly require probabilistic forecasts amid interacting uncertainties from renewabl

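NLP | RAG | Generation | Forecasting | Text

Use case: Generation
Difficulty: Hard
Cost: Low

Quality prediction/anomaly detection | Deep learning | Compression & quantization | Classification | Generation | Video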

HeadCast: Casting Attention Heads for Efficient Autoregressive Video Generation

A study of video generation that proposes HeadCast for efficient autoregressive video generation.

Use case: Video generation
Difficulty: Hard
Cost: High

Machine-Learned Compact Subspace Generation for Quantum Selected Configuration Interaction within Density Matrix Embedding Framework

Sample-based Quantum Diagonalization (SQD), an extension of Quantum Selected Configuration Interaction (QSCI),

NLP | RAG | Generation | Text

Use case: Generation
Difficulty: Hard
Cost: Low

Antigen-specific Antibody Multi-modal Foundation Model for Functional Antibody Design

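This work proposes an antigen-specific antibody multimodal foundation model (AAMFM) that accounts for the epitope-level pairing between antigen and antibody needed to design antigen-specific antibodies.

NLP | RAG | Classification | Generation | Text

Use case: Antigen-specific antibody design
Difficulty: Hard
Cost: High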

Diffusion ReRoll: Revisable Denoising for Robotic Sequential Prediction

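This work proposes a diffusion-based framework for sequential prediction on real-world robots.

NLP | RAG | Generation | Anomaly detection | Text

Use case: Sequential prediction for real-world robots
Difficulty: Hard
Cost: High

MI-oriented | NLP | Fine-tuning | Generation | Text | Multimodal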

Hypothesis-and-Refinement Learning of Organic Structures from Multimodal Spectroscopic Data

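Proposes a method for automated structure elucidation from spectroscopic data to determine molecular structures. The method determines the molecular structure by repeatedly hypothesizing and refining based on the spectroscopic data, within the vast structural space of possible molecules

Use case: Molecular structure analysis
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Computer vision | Segmentation | Classification | Generation | Image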

Analytic Distribution of Classifier-Free Guidance for Schedule Design

Classifier-free guidance (CFG) is the default mechanism for conditional generation in diffusion models, but th

Use case: Classification
Difficulty: Hard
Cost: High

Easy to try on CPU | Computer vision | Segmentation | Generation

How Fast Can Reward Models Score? A Systems Study of C++ and PyTorch Inference Runtimes for RLHF

In RLHF pipelines, reward scoring blocks policy updates. Slow scoring bottlenecks the entire loop, since no up

Use case: Generation
Difficulty: Hard
Cost: Medium

Quality prediction/anomaly detection | Deep learning | Compression & quantization | Generation | Text

Multi-Mask Diffusion Language Models for Few-Step Generation

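This work proposes multi-mask diffusion language models. The model has multiple masks, each of which performs a different generation task. This increases the diversity of generation tasks, and the generated text is more diverse

Use case: Few-step generation
Difficulty: Hard
Cost: High

MI-oriented | Computer vision | Segmentation | Generation | Text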

Nuclear Quantum Effects as a Denoising Problem

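This work proposes a diffusion-based machine learning algorithm that uses imaginary-time path integrals to simulate nuclear quantum effects. The algorithm successfully simulates nuclear quantum effects, and problems related to nuclear quantum effects

Use case: Simulating nuclear quantum effects
Difficulty: Hard
Cost: High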

Can an AI System Be Creative? A Critical Perspective from Art and Engineering

This paper examines the question of whether artificial intelligence (AI) systems can be creative, approached f

Deep learning | Transformer | Classification | Generation | Image

Use case: Classification
Difficulty: Hard
Cost: Low

Refusal-Gated Decoding: Preserving Refusal Behavior Under High-Temperature Sampling

High-temperature sampling is one of the primary mechanisms for increasing diversity in LLMs. Recent advances i

Use case: Generation
Difficulty: Hard
Cost: High

GPE: Evaluating Robust Evidence Aggregation for Fact Verification under Controllable GEO-Style Poisoning

Large language models increasingly use search tools to retrieve up-to-date information, introducing a new atta

Use case: Generation
Difficulty: Hard
Cost: High

DS@GT ARC at ImageCLEFmed GANs 2026: Geometric Filtering for Privacy-Preserving CT Slice Generation

We present a privacy-preserving framework for synthetic lung CT slice generation developed for the Image-CLEFm

NLP | Embeddings & search | Generation | Image

Use case: Generation
Difficulty: Hard
Cost: High

WaveformQA: Benchmarking LLM Temporal Reasoning on Digital Waveforms

Large Language Models (LLMs) have demonstrated strong capabilities in code generation and reasoning, yet their

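Use case: Generation
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Image inspection | Deep learning | Compression & quantization | Generation | Image | Text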

Demonstrating GenDB: Instance-Optimized and Customized Query Processing Code Generation via LLM Agents

Traditional query processing engines require continuous development and extensions to support new techniques a

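Use case: Generation
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Deep learning | Compression & quantization | Generation | Text | Video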

RealVDeblur: One-Step Diffusion for Generalizable Real-World Video Deblurring

Real-world video deblurring remains challenging due to diverse motion patterns, complex degradations, and the

Use case: Generation
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Deep learning | Transformer | Classification | Generation | Image

Persian Pixel: A large-scale synthetic OCR dataset for Persian language

Optical Character Recognition (OCR) for Persian remains substantially less mature than for Latin-script langua

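Use case: Classification
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Generative AI | Diffusion models | Detection | Generation | Text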

Generative AI floods and dilutes the market for books

Generative AI can produce book-length works of fiction at near-zero cost. These books are often dismissed as l

Use case: Detection
Difficulty: Hard
Cost: High

Understanding Generative AI-mediated User Engagement with Academic Library Resources

This study empirically analyzed generative AI as an emerging discovery pathway to academic library resources.

Use case: Generation
Difficulty: Hard
Cost: High

Sound Probabilistic Safety Bounds for Large Language Models

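Proposes a new framework for computing sound probabilistic safety bounds that guard against harmful generations from large language models (LLMs). As a novel application of the Clopper-Pearson confidence interval for obtaining PAC (probably approximately correct) bounds

Deep learning | Compression & quantization | Generation | Text | Audio

Use case: Limiting the risk of harmful generations
Difficulty: Hard
Cost: High

Tabular-oriented | NLP | Large language models | Generation | Text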

PoTRE: Test-Time Reasoning inspired by Cognitive Heterogeneity

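To address model brittleness, this work introduces PoTRE, a heterogeneous framework split into four agents. It strengthens the model's reasoning ability so that it withstands complex theoretical constraints and abstraction better than a single-stream approach

Use case: Solving tasks that require complex reasoning
Difficulty: Hard
Cost: High

For small data | Deep learning | Transformer | Generation | Text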

The Maskability Index: Predicting Task-Objective Alignment in Pretrained Language Models

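Introduces the Maskability Index (MI), a metric for whether a given knowledge relation is well suited to a mask-style parameterization. Using differences in DepthRank, under a given parameterization

Use case: Improving performance on knowledge-intensive tasks
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | Deep learning | Compression & quantization | Generation | Text | Audio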

Pushing the Frontier of Full-Song Generation: Hierarchical Autoregressive Planning Meets Flow-Matching Rendering

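Presents a song generation framework that supports three tasks. Drawing on lyrics, text descriptions, and musical attributes, these tasks include lyric generation, band music generation, and cover song generation.

Use case: Music generation
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection | NLP | Large language models | Generation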

DQAOA-GPT: AI-Accelerated Distributed Quantum Optimization for Combinatorial Problems

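Presents a new framework for solving combinatorial optimization problems. To find solutions despite the local limitations of distributed quantum algorithms, it combines a distributed quantum approximate optimization approach with deep learning.

Use case: Combinatorial optimization
Difficulty: Hard
Cost: High

MI-oriented | Quality prediction/anomaly detection | Deep learning | Transformer | Generation | Image | Video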

StreamHOI: Interaction-aware Temporal Memory Adaptation for Streaming HOI Video Generation

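Human interaction video is typically generated offline in short clips, an approach that does not scale to practical long-duration generation. StreamHOI generates interaction video using a number of previously generated frames

Use case: Interaction-driven video generation
Difficulty: Hard
Cost: High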

CUSUM-Shaped Inference-Time Monitoring and Targeted Re-Decoding for Quantized Small Language Model Reasoning

Quantized small autoregressive reasoning models can enter long, repetitive, or unproductive trajectories, yet

NLP | RAG | Detection | Generation | Regression

Use case: Detection
Difficulty: Hard
Cost: Low

Language-Specific versus Cross-Lingual Knowledge Graphs for Implicit Aspect Identification in Arabic: A Comparative Study of Reasoning and Adaptation Strategies

Aspect-based sentiment analysis (ABSA) in Arabic must recover both explicitly stated aspects and implicit aspe

Use case: Generation
Difficulty: Hard
Cost: High

Global Difference Constraint Propagation for Constraint Programming

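Proposes a global propagator for difference constraints, contributing to more efficient finite-domain propagation algorithms.

Use case: Making constraint programming more efficient
Difficulty: Hard
Cost: Medium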

Rushes: A Human Preference Dataset for Pluralistic Alignment

We introduce Rushes, a dataset and benchmark for studying revealed human engagement preferences in interactive

Use case: Generation
Difficulty: Hard
Cost: High

Learning to Detect UI Principle Violations via Reinforcement Learning

Small language models and coding agents increasingly generate web front-end code, yet their outputs are typica

Use case: Generation
Difficulty: Hard
Cost: High

PyroDash: Cost-Efficient Token-Level Small-Large Language Model Collaborative Inference

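Develops a technique in which a large language model, which supplies correct answers on critical questions, collaborates with a cost-efficient small language model.

Use: efficient and safe collaboration between small and large language models
Difficulty: Hard
Cost: High

MI-oriented · NLP · Large language models · Generation · Image · Text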

Back to Back with a Copy: A Computational Analysis of AI-Generated Visual Contemporary Art Pastiches

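AI is notably capable of producing pastiches of contemporary artworks; this study examines how closely those pastiches resemble the real works.

Use: measuring the similarity between AI-generated artworks and the originals
Difficulty: Hard
Cost: High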

Overview of FinMMEval 2026 Task 2: Multilingual Financial Short-Answer Question Answering

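FinMMEval 2026 Task 2 targets short-answer financial questions posed in English. Evidence in languages other than English is also used.

NLP · RAG · Generation · QA · Retrieval

Use: answering financial questions
Difficulty: Hard
Cost: Low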

Sentence Splitter: Uncovering Latent Factual Structure for Self-Supervised Learning

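Proposes the Sentence Splitter system, which is reported to improve the accuracy of natural language processing. The system splits natural-language text at sentence-ending periods.

Deep learning · Compression & quantization · Generation · Segmentation · QA

Use: improving natural language processing
Difficulty: Hard
Cost: High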

VizRAG: Enhancing Retrieval-Augmented Generation with Hypergraph Visualization

Hypergraph-based RAG systems surpass traditional graph-based approaches by organizing complex n-ary atomic fac

Use: Generation
Difficulty: Hard
Cost: High

Beyond Relevance-Centric Retrieval: Rubric-Oriented Document Set Selection and Ranking

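3D occupancy prediction requires visual methods that interpret the placement and density of objects. Earlier methods were too computationally expensive; the newly proposed GaussianSeed algorithm uses a hierarchy of layers so that the computational co

Use: predicting the placement and density of objects in 3D space
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection · Computer vision · Multimodal · Classification · Generation · Image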

Ocular Verification for Virtual Reality

Virtual reality (VR) headsets (e.g., Meta Quest, Apple Vision Pro) provide a seamless user experience due to t

Use: Classification
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection · NLP · RAG · Generation · Text · 3D

3D-GIMP: When 3D Gaussian Inpainting Meets PatchMatch

Recent advances in 3D scene editing have leveraged iterative diffusion models to update input views. However,

Use: Generation
Difficulty: Hard
Cost: High

Computer vision · Segmentation · Generation · Image · 3D

A real-time RGB-D perception pipeline for autonomous impact hammers in mining: self-filtering, rock segmentation and rock-breaking poses generation

Impact hammers, also known as rock-breakers, are essential machines in mining operations, where they perform s

Use: Generation
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection · Computer vision · Segmentation · Generation · Image · 3D

Axolotl3D: a Unified Framework for Faithful 3D Shape Completion

Recent 3D generative models produce high-quality geometry from a single image using large-scale priors and dif

Use: Generation
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection · Computer vision · 3D & point clouds · Generation · Image · 3D

ATSplat: Compact Feed-forward 3D Gaussian Splatting with Adaptive Token Expansion

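Novel view synthesis is the task of generating images from new viewpoints given input images. ATSplat adapts 3D Gaussian splatting to a feed-forward setting. As a result, ATSp

Use: Novel View Synthesis
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection · NLP · Large language models · Generation · Video · Reinforcement learning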

PercepCap: Video Captioner with Structured Spatio-Temporal Perception

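Video captioning depends on spatial and temporal understanding. PercepCap decomposes the video input into structured spatio-temporal perception, which improves the quality of the generated captions and allows spatio-temporal errors to be detected more accurately,

Use: structured spatio-temporal understanding for video captioning
Difficulty: Hard
Cost: High

Computer vision · Segmentation · Generation · Text · Video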

Self Gradient Forcing: Native Long Video Extrapolation

Long video extrapolation demands advanced visual intelligence. Self Gradient Forcing trains a student model on history generated by a teacher model, so that long video extrapola

Use: Self Gradient Forcing for long video extrapolation
Difficulty: Hard
Cost: High

Evolving Cache Schedules for Fast Diffusion Policy Inference

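Diffusion inference is costly, as in high-resolution video generation. Evolving Cache Schedules optimizes the trade-off between cost and efficiency and uses caching to cut inference cost

Use: evolving cache schedules for diffusion inference
Difficulty: Hard
Cost: High

Computer vision · Segmentation · Generation · Image · Video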

Vera: Identity-Faithful Human Subject-to-Video Generation

Subject-to-video (S2V) generation has made substantial progress in preserving reference subjects across divers

Use: Generation
Difficulty: Hard
Cost: High

Sensors / time series · Quality prediction / anomaly detection · NLP · Large language models · Generation · Image · Text

RS-RIE-Bench: Benchmarking Reasoning-Guided Remote Sensing Image Editing

Remote sensing image editing aims to modify remote sensing images according to natural language instructions w

Use: Generation
Difficulty: Hard
Cost: High

PerceptDrive: Perception Prior World-Action Modeling with Adaptive Expert Routing for End-to-End Autonomous Driving

Frozen perception foundation models encode rich geometric, semantic, and dynamic knowledge. Yet narrow conditi

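Deep learning · Compression & quantization · Generation · Video · Self-supervised

Use: Generation
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection · Deep learning · Transformer · Generation · Image · Text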

SHFormer: Dynamic Spectral Filtering Convolutional Neural Network and High-pass Kernel Generation Transformer for Adaptive MRI Reconstruction

Attention Mechanism (AM) selectively focuses on essential information for imaging tasks and captures relations

Use: Generation
Difficulty: Hard
Cost: High

Importance-Aware OBS Pruning for Diffusion Models

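Aiming to improve segmentation performance while cutting computational resources, Lean-SAM2 applies targeted changes to the memory and encoder of SAM2

Deep learning · Compression & quantization · Generation · Image

Use: efficient image segmentation
Difficulty: Hard
Cost: High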

STEREOFLOW: Progressive Stereo Matching with StereoDiT and Transition Flow Matching

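Stereo matching is a key task in 3D reconstruction. This work combines stereo matching with probabilistic generative modeling and, aiming to improve object detection, proposes a method that unifies a stereo matching framework with latent distributions

Deep learning · Transformer · Generation · Regression · Image

Use: improving object detection
Difficulty: Hard
Cost: High

MI-oriented · Quality prediction / anomaly detection · NLP · Large language models · Generation · Image · Text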

ETPDesigner: Multi-Agent Orchestration for Interactive Multimodal Electronic Theater Program

ETPDesigner proposes a framework that automates the design of multimodal electronic theater programs.

Use: Generation
Difficulty: Hard
Cost: High

G-MAD: A Game-Based Data Generation Framework for Multi-View RGB-T Aerial Object Detection

This work introduces G-MAD, an open-source framework that uses Arma3 to generate synchronized multi-view RGB-T

Computer vision · Object detection · Detection · Generation

Use: Detection
Difficulty: Hard
Cost: Medium

WearWow: Native 2K Multi-Garment Virtual Try-On via Adaptive Token Packing and Preference Alignment

Synthesizing native 2K multi-garment virtual try-on is a formidable frontier in digital fashion, critically bo

Quality prediction / anomaly detection · NLP · RAG · Generation · Text

Use: Generation
Difficulty: Hard
Cost: High

MV-Bench: Benchmarking Multimodal Large Language Models for Coordinated Multi-View Interface Construction

Multimodal large language models (MLLMs) are increasingly expected to automate visualization development by ge

Use: Generation
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection · Computer vision · Segmentation · Generation · Image · Text

OSVE: One Step Video Editing with One Step Diffusion Models

Text-guided video editing with diffusion models is impractically slow, hindered by costly multi-step sampling

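Use: Generation
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection · Deep learning · Attention mechanisms · Classification · Generation · Image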

MTVDiff: Multimodal Conditional Latent Diffusion for Enhanced Thermal-to-Visible Face Translation

Thermal-to-visible face translation presents fundamental challenges including geometric discontinuities, seman

Use: Classification
Difficulty: Hard
Cost: High

KineBench: Benchmarking Embodied World Models via IDM-Free Kinematic Grounding

Evaluating the physical consistency of embodied world models(EWMs) is a critical open challenge. While closed-

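Computer vision · 3D & point clouds · Generation · Anomaly detection · Image

Use: Generation
Difficulty: Hard
Cost: High

Small-data friendly · Easy to try on CPU · Condition optimization · NLP · Fine-tuning · Detection · Generation · Image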

PRISM-DR: Per-lesion Retinal Inference with Specialist Models for Diabetic Retinopathy

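Develops PRISM-DR, a system for detecting diabetic retinopathy lesions. It helps find small, low-contrast lesions that clinicians might miss.

Use: detecting diabetic retinopathy lesions
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection · NLP · Fine-tuning · Generation · Segmentation · Image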

Extending a Large View Synthesis Model for Multi-view Panoptic Segmentation

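Autonomous robots need the ability to avoid obstacles and accidents. The stronger that ability, the more effective the countermeasures against obstacles and accidents become. With stronger avoidance, the robot can stay safe from obstacles and accidents

Use: enabling autonomous robots to avoid obstacles and accidents
Difficulty: Hard
Cost: High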

Point-Selection Fine-Tuning Framework for Robust Point Cloud Classification

Noisy and corrupted points can substantially degrade point cloud recognition performance, especially under cha

Deep learning · Compression & quantization · Classification · Generation · 3D

Use: Classification
Difficulty: Hard
Cost: High

SafeGen: Goal-Conditioned Video Diffusion of Safety-Critical Scenarios for VLM-Based Autonomous Driving

VLMs are increasingly deployed in AD systems, creating an urgent need for rigorous safety evaluation under rar

NLP · RAG · Generation · Image · Text

Use: Generation
Difficulty: Hard
Cost: High

SeededGrasp: Language-Guided Grasping in Complex Scenes with Multiple Embodiments

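Language-guided grasping uses a vision-language model (VLM) to grasp objects in complex scenes. In this approach, the VLM does not predict the grasp directly; instead it indicates the grasp location in 3D space

Deep learning · Compression & quantization · Generation · Text · 3D

Use: grasping objects in complex scenes
Difficulty: Hard
Cost: High

Sensors / time series · Quality prediction / anomaly detection · MLOps · Model deployment · Generation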

Morphing MILR: Design and control of a cable-driven limbless robot with rolling joints for maneuvering in complex environments

Limbless robots offer exceptional mobility in confined and cluttered environments due to their slender bodies

Use: design and control of a limbless robot
Difficulty: Hard
Cost: Medium

huggingface · Hugging Face available · 2026-07-22

NexForge: Scaling Agent Capabilities through Requirement-Driven Task Synthesis for LLMs

Scaling executable agent training data for LLM post-training is bottlenecked by substrate-bound methods that t

Use: Generation
Difficulty: Easy
Cost: High

picollm — On-device LLM Inference Powered by X-Bit Quantization

On-device LLM inference using X-bit quantization.

Use: on-device LLM inference with quantization
Difficulty: Easy
Cost: High

Finance-LLMs — Comprehensive Compilation of Real-World LLM & AI Agent Use Cases in Financial Services

A compilation of real-world LLM and AI agent use cases in financial services.

Use: surveying LLM and AI agent use cases in financial services
Difficulty: Easy
Cost: High

OpenWorldLib — Unified Codebase for Advanced World Models.

OpenWorldLib is a unified codebase for advanced world models.

Computer vision · 3D & point clouds · Generation · Video · 3D

Use: providing world models
Difficulty: Easy
Cost: High

Awesome-CVPR2026-CVPR2025-ICCV2025-CVPR2024-ECCV2026-ECCV2024-AIGC — A Collection of Papers and Codes for CVPR2026/CVPR2025/ICCV2025/CVPR2024/ECCV2026/ECCV2024 AIGC

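A collection of AIGC research papers and code from CVPR 2026, CVPR 2025, ICCV 2025, CVPR 2024, ECCV 2026, and ECCV 2024.

Computer vision · 3D & point clouds · Generation · Image · Video

Use: surveying AIGC papers and code from CVPR, ICCV, and ECCV
Difficulty: Easy
Cost: High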

Boltzmann-Expected Molecular Design with Decoupled Annealing Flows

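Proposes Boltzmann-Expected Molecular Design with Decoupled Annealing Flows (DECAF), a method for automating molecular design. The 3D-structure properties that matter in molecular design are treated in terms of probability

Computer vision · 3D & point clouds · Generation · Text · 3D

Use: automating molecular design
Difficulty: Hard
Cost: High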

Strong Gravitational Lensing Posterior Sampling in Pixel-Space Using Diffusion Models and Recurrent Inference Machines

Modeling galaxy-galaxy strong gravitational lenses to infer the brightness of the source galaxy and the mass d

Deep learning · Transformer · Generation · Image

Use: Generation
Difficulty: Hard
Cost: High

When Reasoning Narrows the Move: Diversity Collapse in LLM Game Play

Supervised fine-tuning (SFT) is widely used to adapt large language models to downstream tasks, but its effect

Use: Generation
Difficulty: Hard
Cost: High

Two-Level Meta-Rubrics for Evaluating Open-Ended Generation: GAMUT, a Benchmark for Factual Completeness

Evaluating the factuality of long-form generations has focused predominantly on precision, measuring whether t

Use: Generation
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection · Deep learning · Compression & quantization · Generation · Text

AdaFlash: Adaptive Speculative Decoding via On-Policy Distilled Diffusion Drafters

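In offline reinforcement learning (RL), uses action preference queries and expert feedback to improve the policy.

Use: effective use of action preference queries in offline RL
Difficulty: Hard
Cost: High

Explainable · Quality prediction / anomaly detection · NLP · Large language models · Generation · Text · Reinforcement learning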

Beyond Score Prediction: LLM-Based Essay Scoring and Feedback Generation via Reinforcement Learning with Rubric Rewards

Large language models (LLMs) have been widely applied to automated essay scoring (AES) and automated feedback

Use: Generation
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection · NLP · Embeddings & retrieval · Classification · Generation

Supra Cognitive Modes: A Routed Architecture for Agent Memory

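Agent memory workloads combine direct fact lookup, reasoning over relation chains and current state, and synthesis across long histories; this work develops Supra Cognitive Modes to handle them. In this architecture

Use: memory architecture design
Difficulty: Hard
Cost: Low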

Computational Humor with Multimodal LLMs: Methods, Datasets, Evaluation, and Challenges

Multimodal humor in memes, cartoons, and comics remains difficult for AI systems because intended meaning depe

NLP · Large language models · Classification · Generation · Image

Use: Classification
Difficulty: Hard
Cost: High

MedDDC-Eval: Diagnosis-Decoupled Evaluation of Multi-Turn Medical Consultation Agents

Multi-turn medical consultation agents must decide what to ask, adapt to patient responses, and determine when

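Explainable · Quality prediction / anomaly detection · NLP · RAG · Generation

Use: Generation
Difficulty: Hard
Cost: Low

Quality prediction / anomaly detection · NLP · Large language models · Classification · Detection · Generation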

AutoJourn: Multi-Perspective Summarisation, Bias Detection and Bias Neutralisation for LLM-Generated News in Automated Journalism

We present AutoJourn, a demonstration system for multi-perspective news generation and bias-aware evaluation u

Use: Classification
Difficulty: Hard
Cost: High

From a Multilingual Streaming ASR Backbone to Kenyan-Language Systems: Data-Centric Adaptation of Nemotron 3.5 for Kikuyu, Dholuo, and Kalenjin

Automatic speech recognition (ASR) for African languages is constrained by orthographic inconsistency, annotat

Deep learning · RNN / LSTM · Classification · Generation · Text

Use: Classification
Difficulty: Hard
Cost: Low

HindsightBench: A Black-Box Behavioral Audit Protocol for Parametric Hindsight in Time-Indexed LLM Decision Tasks

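When carrying out decision tasks, large language models tend to leak parametric knowledge that includes facts about what actually happened. Although it is hard to verify how an LLM actually carried out a decision task, this is indeed

Use: auditing LLMs on financial decision tasks
Difficulty: Hard
Cost: High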

HPD-Parsing: Hierarchical Parallel Document Parsing

Efficient teamwork typically combines global coordination with parallel execution, a principle not yet fully r

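Deep learning · Compression & quantization · Generation · Text · Multimodal

Use: Generation
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection · NLP · Large language models · Generation · QA · Text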

AILQA: Evaluating AI-Driven Legal Question Answering Systems for the Indian Legal System

This comprehensive study introduces an advanced Artificial Intelligence for Indian Legal Question Answering (A

Use: Generation
Difficulty: Hard
Cost: High

CASE: Causal Alignment and Structural Enforcement for Improving Chain-of-Thought Faithfulness

Chain-of-thought (CoT) reasoning is widely used to improve both the performance and interpretability of large

Explainable · NLP · Large language models · Generation · Text

Use: Generation
Difficulty: Hard
Cost: High

RF-Agent: A Practical Framework for Building Language Agents for RFIC Design

Large language models (LLMs) have driven rapid progress in electronic design automation (EDA), yet their appli

Use: Generation
Difficulty: Hard
Cost: High

BaseRT: Advancing Best-in-Class LLM Inference with Apple M5 Neural Accelerators

Apple's M5 generation introduces a redesigned GPU architecture in which every core carries a dedicated Neural

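Use: Generation
Difficulty: Hard
Cost: High

Easy to try on CPU · Quality prediction / anomaly detection · Deep learning · Compression & quantization · Generation · Text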

RAGAL: A Frugal, Fully Local Retrieval-Augmented Assistant for Technical Support at a Government Agency

Public institutions hold large volumes of sensitive documents and support tickets that cannot leave the premis

Use: Generation
Difficulty: Hard
Cost: High

Explainable · Deep learning · Transformer · Generation · Reinforcement learning

Stale but Stable: Staleness-Adaptive Trust Regions for Stabilizing Asynchronous Reinforcement Learning

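Discrete RL is effective for optimizing complex ranking objectives that involve trade-offs. However, its computational cost is usually high, and it requires complex gradient computations such as automatic differentiation. This paper addresses ranking that involves trade-offs

Use: improving the performance of discrete RL algorithms
Difficulty: Hard
Cost: High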

Fusion Embedding: A Unified Embedding Space for Text, Image, Video, and Audio

A single embedding space that covers text, images, video, and audio lets one index serve every query a user ca

Use: Generation
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection · Deep learning · Transformer · Generation · Retrieval · Text

PLAID-PRF: Pseudo-Relevance Feedback with Centroid-like Tokens in PLAID

Multi-vector dense retrieval models, such as ColBERT, achieve strong retrieval effectiveness by modelling fine

Use: Generation
Difficulty: Hard
Cost: Low

Deep learning · Transformer · Classification · Generation · Segmentation

Pathologist Attention-Aligned Report Generation for Prostate Histopathology

The allocation of visual attention by pathologists during cancer diagnosis is a highly selective process that

Use: Classification
Difficulty: Hard
Cost: High

Geospatial Diffusion-based Evolution Synthesis (GeoDES) for Storm-Centered Weather Augmentation

While machine learning-based weather models hold significant promise, they struggle to predict the detailed st

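Deep learning · Compression & quantization · Generation · Image · Video

Use: Generation
Difficulty: Hard
Cost: High

MI-oriented · Deep learning · Transformer · Generation · Image · Text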

Appearance Pointers -- Multimodal Region Control of Diffusion Transformers

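In image generation, controlling materials, objects, and regions is difficult. Diffusion Transformers can process text and images jointly, but they have lacked a mechanism for deciding how much each should influence the result. To that end,

Use: multimodal image control
Difficulty: Hard
Cost: High

MI-oriented · NLP · Fine-tuning · Generation · Image · Text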

ExpertVerse: A General-Purpose Benchmark for Expert-Level Reasoning in Knowledge-Intensive Visual Synthesis

Recent advances in multimodal generative models have enabled instruction-based image generation to move beyond

Use: Generation
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection · Deep learning · Transformer · Generation · Image

ROMS-IMLE: A Minimalist Approach to Competitive Single-Step Generative Modelling

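Proposes a new approach to building generative models. It makes building generative models more efficient while retaining strong expressive power.

Use: building generative models
Difficulty: Hard
Cost: High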

InstructMixup: Instruction-Guided Salient Patch Editing for Robust Data Augmentation

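Proposes InstructMixup, which extends mixup, a technique that blends image or video samples, by following textual instructions. It augments the data while preserving its content and labels.

Deep learning · Transformer · Classification · Detection · Generation

Use: extending mixup for data augmentation
Difficulty: Hard
Cost: High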

PathAgentBench: Benchmarking Evidence-Seeking Vision-Language Models on Whole-Slide Pathology Image

Whole-slide image (WSI) diagnosis requires identifying diagnostically relevant regions, examining them across

NLP · Fine-tuning · Detection · Generation · Image

Use: Detection
Difficulty: Hard
Cost: High

Computing on the Fly: Navigating a Vision for the Future of Drone Computing

The report envisions a decade in which drones move goods, medical supplies, and information at a scale compara

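Reinforcement learning · Detection · Generation

Use: Detection
Difficulty: Hard
Cost: High

Sensors / time series · Computer vision · Video recognition · Detection · Generation · Image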

NGPS: GPS-Denied Aerial Geo-Localization and 2.5D Reconstruction via Deep Satellite Image Matching and Multi-Rate Sensor Fusion

Proposes NGPS (Next-Generation Positioning System), a framework for GPS-denied aerial localization. NGPS enables position estimation without using GPS signals. N

Use: GPS-denied aerial localization
Difficulty: Hard
Cost: High

Correct-by-Construction Behavior Tree Synthesis from Signal Temporal Logic Specifications with Application to Robotic Missions

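Behavior trees are widely adopted for executing complex robot tasks and provide modular, reactive control. However, existing correct-by-construction synthesis methods are limited to linear temporal logic (LTL) and cannot express quantitative timing constraints. This paper

Use: correct-by-construction synthesis of behavior trees
Difficulty: Hard
Cost: Low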

End-to-end Conditional Diffusion for Realistic and Controllable Visual Traffic Scenario Generation

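Proposes E2E-CDiff, a diffusion-based approach to closed-loop traffic scenario generation. It can generate traffic scenarios close to the real world and control them.

Generative AI · Diffusion models · Generation · Image

Use: generating autonomous-driving data
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection · NLP · RAG · Generation · Summarization · Text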

FinanceComplexQA: Benchmarking Agentic Reasoning on Industrial-grade Financial Documents

Agentic Reasoning has become a transformative force in financial analysis due to its ability to integrate larg

Use: Generation
Difficulty: Easy
Cost: Low

Quality prediction / anomaly detection · NLP · Fine-tuning · Classification · Generation · Text

Moving Alphabet: A Controlled Study of Training Data for Text-to-Video Generation

Text-to-video generation has advanced significantly over the past five years through scaling of model size, da

Use: Classification
Difficulty: Easy
Cost: High

Generative World Renderer at the Speed of Play

Generative world renderer AlayaRenderer receives structured world states exported from physics engines and syn

Use: Generation
Difficulty: Easy
Cost: Medium

Explainable · Deep learning · Transformer · Generation · Image · Text

Text Template Tokens Are Implicit Semantic Registers in Diffusion Transformers

Text-to-image diffusion transformers (DiTs) jointly process text and image tokens, yet their internal computat

Use: Generation
Difficulty: Easy
Cost: High

Quality prediction / anomaly detection · Deep learning · Transformer · Generation · Image · Text

Mage-Flow: An Efficient Native-Resolution Foundation Model for Image Generation and Editing

Large-scale visual generators are increasingly capable but costly to train, fine-tune, and deploy. We introduc

Use: Generation
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-21

agent-starter-pack — Ship AI Agents to Google Cloud in minutes, not months. Production-ready templates with built-in CI/CD, evaluation, and observability.

Deploys AI agents to Google Cloud, with production-ready templates that include built-in CI/CD, evaluation, and observability.

Use: deploying AI agents to Google Cloud
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-21

DNA-Diffusion — 🧬 Generative modeling of regulatory DNA sequences with diffusion probabilistic models 💨

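Proposes a model that generates synthetic DNA sequences, with the goal of developing machine learning methods that can handle DNA sequences.

Generative AI · Diffusion models · Generation

Use: generative modeling of DNA sequences
Difficulty: Easy
Cost: High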

Vector Search As Nearest Neighbor Matching: RAG-based Policy Learning in Causal Inference

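Proposes policy learning based on causal inference, in which the effectiveness of an action is evaluated from the nearest, most similar evidence when choosing a policy.

Use: policy learning in causal inference
Difficulty: Hard
Cost: Low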

Program Synthesis for Simulation-Based Inference: Joint Model Selection and Parameter Estimation

Neural simulation-based inference enables parameter estimation for complex models, but typically requires the

Use: Generation
Difficulty: Hard
Cost: High

Scalable and Efficient Joint Spiking Embedding Predictive Architecture for Large-Scale Dynamic Graphs

Dynamic graph learning aims to capture evolving structural and semantic patterns in real-world systems, such a

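Deep learning · Compression & quantization · Classification · Detection · Generation

Use: Classification
Difficulty: Hard
Cost: High

Explainable · Quality prediction / anomaly detection · NLP · RAG · Generation · Image · Text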

PathReportEval: A Systematic Benchmark for Pathology Report Generation

Pathology report generation from whole-slide images (WSIs) is a rapidly growing multimodal learning problem, y

Use: Generation
Difficulty: Hard
Cost: High

Computational models of pragmatic reasoning with flexible generation of meaning and expression alternatives

Pragmatic language use requires reasoning about alternatives: the alternative expressions a speaker might have

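NLP · RAG · Generation · Text

Use: Generation
Difficulty: Hard
Cost: Low

MI-oriented · Quality prediction / anomaly detection · Deep learning · Compression & quantization · Generation · Text · 3D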

Do Language Models Dream of Binding Molecules? Benchmarking LLMs under Spatial Constraints

Structure-based drug design (SBDD) leverages the 3D structure of protein targets, often complemented by other

Use: Generation
Difficulty: Hard
Cost: High

FinSAgent: Corpus-Aligned Multi-Agent RAG Framework for Evidence-Grounded SEC Filing Question Answering

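Financial question answering requires retrieving evidence scattered across long, standardized, highly redundant Securities and Exchange Commission (SEC) filings. For many choices in existing retrieval-augmented and multi-agent systems, model priors and the target filing

Deep learning · Compression & quantization · Generation · QA · Text

Use: financial question answering
Difficulty: Hard
Cost: Low

Explainable · NLP · RAG · Generation · Text · Multimodal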

STeP: Signal Temporal Logic for Precise Specifications for Action Generation with Vision Language Models

Vision-language-action (VLA) models have shown impressive generalization, but often lack interpretability and

Use: Generation
Difficulty: Hard
Cost: High

Computer vision · Segmentation · Generation · 3D · Multimodal

Closing the Loop in Humanoid VLA: Persistent 3D Object Tokens for Verifiable Loco-Manipulation

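Proposes a persistent object token method that addresses limitations of existing VLA methods, making robot control more practical.

Use: humanoid robot control
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection · NLP · Large language models · Generation · Text · Video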

FARO: Feasibility-Aware Robot Motion Optimization

Fast planning of novel behaviors in unseen scenarios remains a fundamental challenge in robotics. The high-dim

Use: Generation
Difficulty: Hard
Cost: High

MI-oriented · Reinforcement learning · Policy gradient (PPO / A3C) · Generation

arxiv · GitHub available · 2026-07-20

MEVION: Low-Cost Open-Source Data Collection System for Powerful and High-Speed Dual-Arm Manipulation

The global competition for developing robotic foundation models is intensifying. Among the data collection sys

Use: Generation
Difficulty: Hard
Cost: Medium

Computer vision · Segmentation · Generation · Image · Video

Does Robust VIO Need More Learning? Geometry-Verified Visual Measurements under Distribution Shift

Learning is increasingly introduced into visual-inertial odometry (VIO), ranging from learned feature front-en

Use: Generation
Difficulty: Hard
Cost: High

From Sign Language Generation to Humanoid Execution: Vision-Language Guided Retargeting with Collision Mitigation

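Aims at autonomous action generation for humanoid robots, showing that a robot can act autonomously from vision-language-guided instructions.

Computer vision · 3D & point clouds · Generation · Image · 3D

Use: autonomous action generation for humanoid robots
Difficulty: Hard
Cost: High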

AlayaWorld: Interactive Long-Horizon World Modeling -- Full Technical Report

Unlike conventional video game development, which relies on labor-intensive pipelines for asset production, an

Use: Generation
Difficulty: Easy
Cost: High

huggingface · GitHub available · Hugging Face available · 2026-07-20

SciForma: Structure-Faithful Generation of Scientific Diagrams

Structural fidelity is essential to scientific methodology diagrams. To communicate research logic, these diag

Quality prediction / anomaly detection · NLP · Large language models · Generation · Image · Text

Use: Generation
Difficulty: Easy
Cost: High

HOMIE: Human-object Centric Video Personalization via Multimodal Intelligent Enchancement

Human-object centric video personalization (HOCVP) is a core task within subject-driven video generation. Howe

Use: Generation
Difficulty: Easy
Cost: High

Quality prediction / anomaly detection · NLP · Large language models · Detection · Generation · Segmentation

FlowMimic: Mask-free Visual Editing and Generation with Pixel-pair Warped Flow Field for Online Video Editing Data Generation and Modality Mimicry

In line with the prevailing direction of vision research, we explore the integration of both generation and ed

Use: Detection
Difficulty: Easy
Cost: High

Explainable · NLP · Fine-tuning · Classification · Generation · Anomaly detection

Token-Level Off-Policy Learning for Faithful Generation Under Distribution Shift

We propose Token-Level Off-Policy Labeling (TOPL), an off-policy training paradigm that reframes post-training

Use: Classification
Difficulty: Easy
Cost: High

FlashRT: Agent Harness for Guiding Agents to Deploy Real-Time Multimodal Applications

Real-time multimodal applications, including voice agents and interactive video generation, compose heterogene

Deep learning · Compression & quantization · Generation · Text · Audio

Use: Generation
Difficulty: Easy
Cost: High

DiFA: Inference-Time Forward-Process Alignment for Diffusion Models

The prevailing inference framework for diffusion models formulates generation fundamentally as a problem of nu

Computer vision · Image classification · Generation · Image

Use: Generation
Difficulty: Easy
Cost: High

ShotPlan: Cinematic Video Generation with Learnable Planning Token

Current video generation models achieve impressive results in single-shot generation, yet remain limited in ci

MI-oriented · NLP · Embeddings & retrieval · Generation · Video

Use: Generation
Difficulty: Easy
Cost: High

ReViV: Reconstructing the Viewer and the View in 4D from Monocular Egocentric Video

Egocentric devices, such as wearable front-facing cameras, provide a unique perspective for capturing the cont

Deep learning · Transformer · Generation · Video · 3D

Use: Generation
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-20

BentoML — The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

A library for serving models.

NLP · Large language models · Generation · Multimodal

Use: model serving
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-20

Open-dLLM — Open diffusion language model for code generation — releasing pretraining, evaluation, inference, and checkpoints.

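Open-dLLM releases an open diffusion language model for code generation, including pretraining, evaluation, inference, and checkpoints.

Use: code generation
Difficulty: Easy
Cost: High

Explainable · Quality prediction / anomaly detection · NLP · Large language models · Generation · Text

arxiv · GitHub available · 2026-07-19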

CoEvoP&R: Co-Evolving Placement Objectives with Routing Feedback via Large Language Models

Analytical placers rely on differentiable objective functions to guide placement, typically combining intermed

Use: Generation
Difficulty: Hard
Cost: High

huggingface · Hugging Face available · 2026-07-19

EvolvingWorld: An Open-Schema Framework for Co-Evolving Role-Play Agents and World Model in Interactive Literary World

This paper introduces EvolvingWorld, a framework and benchmark for character and world co-evolution in interac

Use: Generation
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-07-19

HarmoHOI: Harmonizing Appearance and 3D Motion for Multi-view Hand-Object Interaction Synthesis

Hand-Object Interaction (HOI) synthesis is a cornerstone for animation production and embodied AI. Despite the

Quality prediction / anomaly detection · Deep learning · Transformer · Generation · Image · Video

Use: Generation
Difficulty: Easy
Cost: High

arxiv · GitHub available · 2026-07-18

Twisted Schrödinger Bridge Matching

Over the past few years, diffusion-based Schrödinger bridge models have been proposed to approximate optimal t

Generative AI · Diffusion models · Generation

Use: Generation
Difficulty: Hard
Cost: High

Semi-Supervised Conditional Generative Learning through Stochastic Interpolation and Sufficient Representations

Conditional generative modeling remains a challenging problem in semi-supervised settings where labeled data i

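NLP · RAG · Generation · Supervised · Semi-supervised

Use: Generation
Difficulty: Hard
Cost: Low

Tabular-friendly · Quality prediction / anomaly detection · Computer vision · Segmentation · Generation · Image · Tabular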

Semi-Supervised Conditional Diffusion via Label Augmentation

Conditional diffusion models have become a powerful and flexible framework for learning complex conditional di

Use: Generation
Difficulty: Hard
Cost: High

Decision Variable Analysis-Guided Differentiated Fuzzy Search for Large-Scale Multi-Objective Optimization

Large-scale multi-objective optimization problems (LSMOPs) are challenging due to their high-dimensional decis

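Condition optimization · Deep learning · Transformer · Generation

Use: Generation
Difficulty: Hard
Cost: Medium

Explainable · Deep learning · Compression & quantization · Generation · Text · Multimodal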

G2-Nav: Grounded and Guarded Vision-Language Costmaps for Robot Social Navigation

Social navigation requires the robot to reason and respond in complex real-world environments. While recent wo

Use: Generation
Difficulty: Hard
Cost: High

Explainable · Sensors / time series · Computer vision · Multimodal · Generation · Image

What Do They See? Interpreting Complex Road Scenarios Through the Eyes of Vision-Language-Action Models for Safe and Trustworthy Autonomous Vehicle Learning

End-to-end autonomous driving models are now able to navigate complex road scenarios, mapping raw sensor obser

Use: Generation
Difficulty: Hard
Cost: High

Token-Wise Latent Streaming from Slow Reasoners to Fast Planners for Dynamic Vision Language Navigation

Vision-Language Navigation in dynamic, human-centric environments exposes a fundamental tension: linguistic re

Computer vision · Multimodal · Generation

Use: Generation
Difficulty: Hard
Cost: High

SAGE: A Socially-Aware Generative Engine for Heterogeneous Multi-Agent Navigation

Safe and socially compliant navigation in open human-robot environments requires robots to reason about hetero

Use: Generation
Difficulty: Hard
Cost: High

huggingface · Hugging Face available · 2026-07-18

DataFlow-Harness: A Grounded Code-Agent Platform for Constructing Editable LLM Data Pipelines

Large language models (LLMs) are increasingly used to automate data-processing workflows, yet coding agents ty

Use: Generation
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-07-18

Group Entropy-Controlled Policy Optimization

Entropy control has become an effective tool in reinforcement learning (RL) of large language models (LLMs), h

Deep learning · Compression & quantization · Generation · Text · Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-07-18

Environment-free Synthetic Data Generation for API-Calling Agents

Training API-calling large language model (LLM) agents demands massive amounts of high-quality trajectories. H

Use: Generation
Difficulty: Easy
Cost: High

Tabular-friendly · Computer vision · Segmentation · Generation · Tabular

Do Generative Models Keep Time? A Time-Aware Evaluation of Synthetic Sequential Tabular Data

Synthetic sequential tabular data are increasingly used for privacy-preserving data sharing, yet a generator c

Use: Generation
Difficulty: Hard
Cost: Low

Evolutionary Algorithm-Guided LLMs for Physics-Informed Neural Network Design

Physics-informed neural networks (PINNs) are unusually sensitive to interacting choices of architecture, activ

Use: Generation
Difficulty: Hard
Cost: High

Certifiable Safe Model-Based Reinforcement Learning with Control-Affine Dynamics Approximation

Safe model-based reinforcement learning (RL) often bridges control-theoretic analysis and RL for robots to saf

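Deep learning · Compression & quantization · Generation · 3D · Reinforcement learning

Use: Generation
Difficulty: Hard
Cost: High

Computer vision · Segmentation · Generation · Text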

Handroid: Bridging Dexterous Hand and Humanoid

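Develops Handroid, a technique that builds both a dexterous hand and a humanoid into a single robot and lets it switch between the two functions.

Use: developing a combined dexterous hand and humanoid
Difficulty: Hard
Cost: Medium

Quality prediction / anomaly detection · Deep learning · Transformer · Generation · Video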

FVAttn: Adaptive Sparse Attention with Runtime Load Balancing for Video Generation

Video Diffusion Transformers process long spatio-temporal sequences, making self-attention the main bottleneck

Use: Generation
Difficulty: Easy
Cost: High

Apple-π: Benchmarking Thinking with Video Towards Law-Grounded Physical Intelligence

Modern video generation models are increasingly hailed as emerging world models with an internalized grasp of

NLP · Large language models · Generation · Video

Use: Generation
Difficulty: Easy
Cost: High

Nonuniformity Principle in Human-AI Coworking

As generative AI is increasingly applied to automate multi-step and high-stake workflows, human judgment and i

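Quality prediction / anomaly detection · Machine learning · Supervised learning · Generation

Use: Generation
Difficulty: Easy
Cost: Medium

MI-oriented · NLP · Large language models · Generation · Image · Text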

S1-Omni: A Unified Multimodal Reasoning Model for Scientific Understanding, Prediction, and Generation

We present S1-Omni, a unified multimodal reasoning model for scientific understanding, prediction, and generat

Use: Generation
Difficulty: Easy
Cost: High

NLP · Large language models · Generation · Text · Multimodal

github · GitHub available · 2026-07-17

generative-ai — Comprehensive resources on Generative AI, including a detailed roadmap, projects, use cases, interview preparation, and coding preparation.

A list of resources on generative AI.

Use: generative AI
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-17

Awesome-Model-Merging-Methods-Theories-Applications — Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. ACM Computing Surveys, 2026.

A guide to model merging for LLMs, with overviews of theories, methods, and applications.

Use: merging LLMs
Difficulty: Easy
Cost: High

Diffusion models recover accurate mixture weights despite score function insensitivity

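A study aimed at improving mode resolution in score-based generative models. It shows that mode resolution does not depend on the score function and that mixture weights can be inferred from generated samples.

Deep learning · Transformer · Generation · Multimodal

Use: improving mode resolution in score-based generative models
Difficulty: Hard
Cost: High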

Optimal Self-Distillation for Rectified Flow via Linear Probing

Modern generative models are increasingly trained using model-generated signals, creating both opportunities f

Deep learning · Compression/quantization · Generation · Image

Use: Model improvement
Difficulty: Hard
Cost: Medium

NeuronSoup: Evolving Asynchronous, Shared-Neuron Temporal Graphs without Backpropagation

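This study develops NeuronSoup, a method that learns temporal graphs using shared neurons. In NeuronSoup, the signal on each path is propagated through shared neurons as it passes through a variable number of neurons

Deep learning · Transformer · Classification · Generation

Use: Learning temporal graphs with shared neurons in neural networks
Difficulty: Hard
Cost: Low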

SMC-ES: Automated synthesis of formally verified control policies

The deployment of autonomous cyber-physical systems in safety-critical environments requires closed-loop contr

Reinforcement learning · Model-free (DQN / SAC) · Generation

Use: Automatically generating safe control policies
Difficulty: Hard
Cost: Medium

huggingface · Hugging Face available · 2026-07-16

Xiaomi-Robotics-1: Scaling Vision-Language-Action Models with over 100K Hours of Real-World Trajectories

We present Xiaomi-Robotics-1, a foundational vision-language-action (VLA) model capable of (1) following diver

Deep learning · Compression/quantization · Generation · Text · Multimodal

Use: Generation
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-07-16

xHC: Expanded Hyper-Connections

Hyper-Connections (HC) expand the residual stream of Transformers into N parallel streams, providing a form of

Use: Generation
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-07-16

Beyond Entropy: Correctness-Aware Advantage Shaping via Contrastive Policy Optimization

Reinforcement learning with verifiable rewards (RLVR) commonly uses entropy for advantage shaping. However, en

Deep learning · Compression/quantization · Generation · Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: Medium

github · GitHub available · 2026-07-16

TurboDiffusion — TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

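TurboDiffusion accelerates video diffusion models by 100–200×.

Deep learning · Compression/quantization · Generation · Video

Use: Accelerating video diffusion models
Difficulty: Easy
Cost: High

Explainable · Quality prediction/anomaly detection · Deep learning · Transformer · Generation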

A Temporal Machine Learning-Based Time-to-Event Model for Predicting ALS Progression and Healthcare Utilization

Amyotrophic lateral sclerosis (ALS) is a progressive and heterogeneous neurodegenerative disease in which pred

Use: Generation
Difficulty: Hard
Cost: Medium

Analogical Deep Research: Retrieving and Integrating Historical Analogies for Foresight Analysis

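Proposes a new task, Analogical Deep Research, for inferring and evaluating historical analogies in foresight analysis, in which historical analogies play an important

Use: Historical analogies for foresight analysis
Difficulty: Hard
Cost: High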

SPECS: Speciated Evolutionary Circuit Synthesis

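Develops a genetic algorithm for electronic circuit synthesis that helps designers implement circuits more efficiently.

Quality prediction/anomaly detection · Math/theory · Optimization · Generation

Use: Genetic algorithm for electronic circuit synthesis
Difficulty: Hard
Cost: Medium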

How to Guide LLM Generation: Dual-Surrogate Guided Search for Automated Heuristic Design

Large language models (LLMs) have made automated heuristic design (AHD) increasingly practical by generating e

Explainable · Natural language processing · Large language models · Generation · Text

Use: Generation
Difficulty: Hard
Cost: High

huggingface · Hugging Face available · 2026-07-15

DiffGI: Differentiable Geometry Images for High-Fidelity Thin-Shell 3D Generation

Existing 3D generative models predominantly rely on implicit volumetric representations, which enforce waterti

Deep learning · Transformer · Generation · Image · 3D

Use: Generation
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-07-15

Diagnosing and Calibrating Tool-Call Boundary Drift in Multi-Teacher On-Policy Distillation

Agentic language models must learn when to call tools, when to consume tool responses, and when to answer dire

Use: Generation
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-07-15

VideoRAE: Taming Video Foundation Models for Generative Modeling via Representation Autoencoders

Video generative models commonly rely on latent spaces learned by 3D Variational Autoencoders (3D-VAEs). Howev

Use: Generation
Difficulty: Easy
Cost: High

arxiv · Paper only · 2026-07-14

ANGLE: Angular Neural Generative Learning via Engression

Circular data, representing angles or directions, are frequently encountered in computer vision, biology, geol

Deep learning · Compression/quantization · Generation · Regression · Image

Use: Generation
Difficulty: Hard
Cost: High

arxiv · Paper only · 2026-07-14

What Does Goodness Measure? A Likelihood-Ratio Account of Forward-Forward Learning

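Analyzes what goodness measures in the Forward-Forward method, using log-ratio estimation to obtain an accurate estimate.

Quality prediction/anomaly detection · Deep learning · Normalization/optimization methods · Generation

Use: Improving the reliability of the Forward-Forward method
Difficulty: Hard
Cost: Medium

huggingface · Hugging Face available · 2026-07-14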

From Human-Centric to Agentic Code Review: The Impact of Different Generations of Generative AI Technology on Review Quality

Code review helps maintain software quality before code integration, but it also imposes a substantial workloa

Quality prediction/anomaly detection · Deep learning · Transformer · Generation · Text

Use: Generation
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-14

agents-towards-production — End-to-end, code-first tutorials for building production-grade GenAI agents. From prototype to enterprise deployment.

End-to-end, code-first tutorials for developing and implementing AI agents.

Use: AI agent development and implementation
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-14

LakonLab — Official implementation of AsymFlow, pi-Flow, GMFlow

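LakonLab is an open-source project providing the official implementation of AsymFlow, pi-Flow, and GMFlow.

Deep learning · Compression/quantization · Generation · Image · Text

Use: Implementation of AsymFlow, pi-Flow, and GMFlow
Difficulty: Easy
Cost: Medium

github · GitHub available · 2026-07-14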

memvid — Memory layer for AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer. Give your agents instant retrieval and long-term memory.

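memvid is a serverless, single-file memory layer that gives AI agents instant retrieval and long-term memory.

Natural language processing · Large language models · Generation · Text · Video

Use: Managing AI agent memory
Difficulty: Easy
Cost: High

arxiv · Paper only · 2026-07-13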

CDFM: Towards a General-Purpose Causal Discovery Foundation Model

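This study proposes the Causal Discovery Foundation Model, which aims to recover the underlying causal structure from observational data.

Use: Causal structure discovery from observational data
Difficulty: Hard
Cost: High

arxiv · Paper only · 2026-07-13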

Representing the Non-dominated Set of Multi-objective Network Problems by Supported Non-dominated Points

In multi-objective combinatorial optimization, unsupported non-dominated points typically outnumber supported

Quality prediction/anomaly detection · Computer vision · Segmentation · Generation

Use: Generation
Difficulty: Hard
Cost: Medium

huggingface · GitHub available · Hugging Face available · 2026-07-13

RAGU: A Multi-Step GraphRAG Engine with a Compact Domain-Adapted LLM

Graph retrieval-augmented generation (GraphRAG) enhances large language models with structured knowledge, yet

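Natural language processing · Large language models · Detection · Generation · Summarization

Use: Detection
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-07-13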

Qwen-Music Technical Report

In this report, we introduce Qwen-Music, a powerful music generation model capable of producing highly musical

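Sensor/time series · Quality prediction/anomaly detection · Deep learning · Transformer · Generation · Text · Audio

Use: Generation
Difficulty: Easy
Cost: High

huggingface · Hugging Face available · 2026-07-13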

SVR-R1: Bootstrapping Multi-modal Reasoning with Self-verification in Reinforcement Learning

We introduce Self-Verified Reasoner (SVR-R1), a multi-turn RL framework that turns a model's own verification

Computer vision · Segmentation · Generation · Multimodal · Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-13

Irodori-TTS — A Flow Matching-based Text-to-Speech Model with Emoji-driven Style Control

A flow matching-based text-to-speech model that converts text to speech with emoji-driven style control.

Generative AI · Diffusion models · Generation · Text · Audio

Use: Text-to-speech
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-13

UniPic — Open-source SOTA multi-image editing model

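UniPic is an implementation of an open-source, state-of-the-art multi-image editing model.

Computer vision · Multimodal · Generation · Image

Use: Implementation of a multi-image editing model
Difficulty: Easy
Cost: High

arxiv · Paper only · 2026-07-12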

An Extreme Value Perspective on Learning Stress Laws

We introduce Self-Similar Generative Estimation (SS-GEN), a method for simulating multivariate tail events and

Use: Generation
Difficulty: Hard
Cost: Low

github · GitHub available · 2026-07-11

LLMs-from-scratch — Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

A step-by-step implementation of a ChatGPT-like LLM in PyTorch from scratch.

Use: Implementing an LLM from scratch
Difficulty: Easy
Cost: High

arxiv · Paper only · 2026-07-10

Manifold Constrained Conformal Prediction for Spatial Events

We introduce a new conformal prediction method that constructs calibrated prediction sets over collections of

Natural language processing · RAG · Generation · Prediction · 3D

Use: Generation
Difficulty: Hard
Cost: High

huggingface · Hugging Face available · 2026-07-10

OpenLongTail: Generative Scaling of Long-Tail Driving Data

Scaling robust driving policies is fundamentally bottlenecked by the scarcity of edge cases in curated dataset

Natural language processing · RAG · Generation · Image · Video

Use: Generation
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-09

Awesome-Item-ID-Gen-RecSys — Updating curated list of research advancements on item identification and item tokenization in generative recommender systems. The survey is titled "A Survey of Item Identifiers in Generative Recommendation: Construction, Alignment, and Generation"

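A curated list of research on constructing, aligning, and generating item identifiers in generative recommender systems.

Use: Item identifiers in generative recommender systems
Difficulty: Easy
Cost: High

arxiv · Paper only · 2026-07-08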

Intrinsic-Noise Consolidation: A Doob-Barrier-Conditioned Diffusion Turns Analog Device Noise into a Continual-Learning Resource

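Develops a new equation that stabilizes learned memories in computing hardware, allowing those memories to be consolidated accurately.

Use: Consolidation of learned memories in computing hardware
Difficulty: Hard
Cost: High

huggingface · Hugging Face available · 2026-07-08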

DeepSearch-World: Self-Distillation for Deep Search Agents in a Verifiable Environment

Training tool-use agents to improve from their own experience remains challenging, as supervised fine-tuning r

Deep learning · Compression/quantization · Generation · Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-08

VoxCPM — VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

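A framework for tokenizer-free TTS covering multilingual speech generation, creative voice design, and true-to-life cloning.

Generative AI · Speech/music generation · Generation · Text · Audio

Use: Multilingual speech generation
Difficulty: Easy
Cost: Medium

arxiv · Paper only · 2026-07-07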

6G Sensing Security: Distributed Game-Theoretic RL for Urban Beamforming and Attacker Detection

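Studies 6G security using distributed game theory in next-generation wireless networks. Distributed game theory is needed for 6G communication systems to achieve both environment sensing and data transmission

Sensor/time series · Deep learning · Compression/quantization · Detection · Generation · Reinforcement learning

Use: Distributed game theory for 6G
Difficulty: Hard
Cost: Medium

huggingface · Hugging Face available · 2026-07-07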

UI2App: Benchmarking Visual Interaction Inference in Executable Web Application Generation

Large language models (LLMs) have demonstrated growing competence in web page generation. However, existing te

Use: Generation
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-07

DATAGEN — DATAGEN: AI-driven multi-agent research assistant automating hypothesis generation, data analysis, and report writing.

An AI-driven multi-agent research assistant that automates hypothesis generation, data analysis, and report writing.

Use: AI research assistant
Difficulty: Easy
Cost: High

arxiv · Paper only · 2026-07-06

LLM-Driven Evolutionary Generation of Multi-Objective Bayesian Optimization Algorithms

Designing effective multi-objective Bayesian optimization (MOBO) algorithms requires balancing many interdepen

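Suited to small data · Easy to try on CPU · Condition optimization · Deep learning · Compression/quantization · Generation · Text

Use: Generation
Difficulty: Hard
Cost: High

arxiv · Paper only · 2026-07-06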

A Large-Scale Sparse Multiobjective Optimization Algorithm Based on Optimal Performance Scores

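To tackle large-scale sparse multiobjective optimization, this paper proposes a new adaptive initialization algorithm and evaluates its efficiency and performance.

Quality prediction/anomaly detection · Computer vision · Segmentation · Generation

Use: Large-scale sparse multiobjective optimization
Difficulty: Hard
Cost: Medium

arxiv · Paper only · 2026-07-06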

QDEvo: A Multi-Objective Quality-Diversity Framework for Automated Heuristic Design

The integration of Large Language Models (LLMs) with evolutionary computation has emerged as a powerful paradi

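Quality prediction/anomaly detection · Deep learning · Compression/quantization · Generation · Text

Use: Generation
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection · Computer vision · 3D/point clouds · Generation · Image · 3D

github · GitHub available · 2026-07-06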

Magic123 — [ICLR'24] Official PyTorch Implementation of Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

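Magic123 takes a single image as input and uses both 2D and 3D diffusion priors to generate a high-quality 3D object.

Use: High-quality 3D object generation
Difficulty: Easy
Cost: High

github · GitHub available · 2026-07-05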

llm-app — Ready-to-run cloud templates for RAG, AI pipelines, and enterprise search with live data. 🐳Docker-friendly.⚡Always in sync with Sharepoint, Google Drive, S3, Kafka, PostgreSQL, real-time data APIs, and more.

llm-app is an application that provides cloud templates for RAG, AI pipelines, and enterprise search. It runs on Docker, with Sharepoint, Google Dr

Use: Building AI pipelines
Difficulty: Easy
Cost: High

arxiv · Paper only · 2026-07-02

Hybridizing a Grouping Metaheuristic with Reinforcement Learning for the One-Dimensional Bin Packing Problem

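The one-dimensional bin packing problem (1D-BPP) is a widely applied NP-hard combinatorial optimization problem. In this study, Falkenauer's Hybrid Grouping Genetic Algorithm (

Suited to tabular data · Quality prediction/anomaly detection · Natural language processing · RAG · Generation · Tabular · Reinforcement learning

Use: One-dimensional bin packing
Difficulty: Hard
Cost: Low

arxiv · Paper only · 2026-07-02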

Evolutionary Wave Function Collapse

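Wave Function Collapse (WFC) is a popular method for procedural content generation that learns local adjacency constraints from an example input and generates larger outputs. Combining WFC with evolutionary search makes it possible to evaluate the generated levels

Quality prediction/anomaly detection · Deep learning · Transformer · Generation

Use: Evolutionary Wave Function Collapse for procedural content generation
Difficulty: Hard
Cost: Medium

arxiv · Paper only · 2026-07-02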

Mechanism and Stability Analysis of Metabolic Closed-Loop Metaheuristics

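This paper studies the framework-level interpretation of metabolic metaheuristics. It centers on the question of whether the resource loop is more than a symbolic representation for narrative purposes and also exists at the framework level

Use: Stability analysis of metabolic closed-loop metaheuristics
Difficulty: Hard
Cost: Medium

github · GitHub available · 2026-07-02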

learning — A log of things I'm learning

A journal for organizing ideas and knowledge while learning.

Use: Learning log
Difficulty: Easy
Cost: High

arxiv · Paper only · 2026-07-01

From Consistency to Collaborative Discovery: MFEA-CoD for Multitask Novelty Search

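This study applies evolutionary multitasking (EMT) to multitask novelty search. EMT has focused on objective-driven optimization, but its ability to solve multiple optimization problems simultaneously by exploiting shared structure

Use: Multitask novelty search
Difficulty: Hard
Cost: Low

github · GitHub available · 2026-07-01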

MeanFlow — PyTorch implementation of MeanFlow & iMF (one-step generative modeling).

A PyTorch implementation of MeanFlow and iMF for one-step generative modeling.

Generative AI · Diffusion models · Generation

Use: One-step generative modeling
Difficulty: Easy
Cost: High

arxiv · Paper only · 2026-06-30

Evaluation of Population Initialization Methods for Genetic Programming-based Symbolic Regression

We analyze the effect of optimizing the initial population of genetic programming (GP) for symbolic regression

Machine learning · Supervised learning · Generation · Regression

Use: Generation
Difficulty: Hard
Cost: Medium

arxiv · Paper only · 2026-06-30

Distributed Hierarchical Temporal Memory with Shared Associative Memory for Cross-Entity Preemptive Warning

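Develops an anomaly detection system based on distributed Hierarchical Temporal Memory. The system stores the precursor behavior of related entities in a shared memory space, offering a new approach to anomaly detection.

Sensor/time series · Quality prediction/anomaly detection · Natural language processing · RAG · Detection · Generation · Anomaly detection

Use: Anomaly detection with distributed Hierarchical Temporal Memory
Difficulty: Hard
Cost: Low

arxiv · Paper only · 2026-06-30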

Data Sharing and Competition in Learning-by-Deploying Industries: Insights from Robotics and Beyond

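Analyzes the economic effects of data sharing on competition. The study examines whether competition decreases or increases when firms share data.

Use: Economic analysis of data sharing and competition
Difficulty: Hard
Cost: Low

github · GitHub available · 2026-06-30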

ComfyUI-LTXVideo — LTX-Video Support for ComfyUI

Adds LTX-Video support to ComfyUI.

Generative AI · Diffusion models · Generation · Image · Text

Use: LTX-Video support for ComfyUI
Difficulty: Easy
Cost: High

arxiv · Paper only · 2026-06-29

Semantics-Aware Bilevel Co-Evolution: Towards Automated Multicomponent Algorithm Design

LLM-assisted evolutionary search (LES) has emerged as a promising paradigm for automated algorithm design. How

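Quality prediction/anomaly detection · Natural language processing · Large language models · Generation

Use: Generation
Difficulty: Hard
Cost: High

github · GitHub available · 2026-06-29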

HunyuanVideo — HunyuanVideo: A Systematic Framework For Large Video Generation Model

Presents HunyuanVideo, a video generation model capable of generating complex sequences.

Deep learning · Transformer · Generation · Video

Use: Video generation
Difficulty: Easy
Cost: High

arxiv · Paper only · 2026-06-28

Generalized Bidding Games: Where Bidding and Stochastic Games Meet

Two-player games on graphs are a classical framework for analyzing strategic decision making. In turn-based ga

Computer vision · Segmentation · Generation

Use: Generation
Difficulty: Hard
Cost: Medium

github · GitHub available · 2026-06-28

awesome-japanese-llm — 日本語LLMまとめ - Overview of Japanese LLMs

An overview of Japanese LLMs.

Natural language processing · Large language models · Generation · Multimodal

Use: Overview of Japanese LLMs
Difficulty: Easy
Cost: High

github · GitHub available · 2026-06-28

LanPaint — High quality training free inpaint for every stable diffusion model. Supports ComfyUI

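Provides high-quality, training-free inpainting for image generation. It works with Stable Diffusion models and also supports ComfyUI.

Quality prediction/anomaly detection · Generative AI · Diffusion models · Generation · Image · Video

Use: Image generation
Difficulty: Easy
Cost: High

arxiv · Paper only · 2026-06-27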

Unified Complex-valued Neural Network: A Magnitude-Phase Computational Model for Event-Driven Neuromorphic Learning

Artificial neural networks (ANN) provide accurate continuous-valued representation, whereas spiking neural net

Explainable · Deep learning · CNN · Generation

Use: Generation
Difficulty: Hard
Cost: High

arxiv · Paper only · 2026-06-25

Multi-Objective Molecular Generation with Frequency-Controlled Evolutionary Dynamics

Molecule generation methods that leverage generative models have been successfully applied to drug discovery.

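Explainable · Suited to MI · Quality prediction/anomaly detection · Deep learning · Compression/quantization · Generation

Use: Generation
Difficulty: Hard
Cost: High

arxiv · Paper only · 2026-06-25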

Surviving by Serving: Functional Relevance Drives Self-Organization in Complex Adaptive Systems

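This study analyzes complex adaptive systems, examining system structure to understand how systematic mechanisms emerge.

Natural language processing · Fine-tuning · Generation

Use: Analysis of complex adaptive systems
Difficulty: Hard
Cost: Medium

github · GitHub available · 2026-06-25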

ai-engineering-from-scratch — Learn it. Build it. Ship it for others.

A repository for learning AI engineering from scratch: learn it, build it, and ship it for others.

Use: Learning AI engineering
Difficulty: Easy
Cost: Medium

github · GitHub available · 2026-06-25

ml-mdm — Train high-quality text-to-image diffusion models in a data & compute efficient manner

Train high-quality text-to-image diffusion models in a data & compute efficient manner

Use: Generation
Difficulty: Easy
Cost: High

arxiv · Paper only · 2026-06-24

Restoring Incentive Compatibility in Two-Stage Energy Markets with Prosumers

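Electricity market problems based on distributed control can represent situations in which supply exceeds demand, rather than only situations in which supply and demand are balanced.

Reinforcement learning · Multi-agent · Generation

Use: Resolving imbalances in decentralized electricity supply
Difficulty: Hard
Cost: Medium

arxiv · Paper only · 2026-06-23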

Distributed Quality-Diversity Search for Toxicity in Large Language Models

This study searches for diverse toxicity test cases for large language models.

Use: Searching for diverse toxicity tests
Difficulty: Hard
Cost: High

arxiv · Paper only · 2026-06-21

Design and Development of a Neuromorphic Silicon Suite: PVT Sensing, Stochastic LIF Inference, On-Chip STDP Learning, and Crossbar Programming

Edge neuromorphic systems need compact, configurable hardware that combines probabilistic inference, local lea

Sensor/time series · Deep learning · Transformer · Generation

Use: Generation
Difficulty: Hard
Cost: Medium

arxiv · Paper only · 2026-06-21

Mitigating Polycentric Conflict-Trap Risk in Mali via Intergenerational Volterra Mean-Field-Type Games

Persistent instability in Mali and neighboring countries is not a temporary security crisis but a self-reprodu

Machine learning · Feature engineering · Generation

Use: Generation
Difficulty: Hard
Cost: Medium

arxiv · Paper only · 2026-06-20

Evolutional Math: Cross-Validated Island-Model Genetic Programming for Interpretable Symbolic Regression on Small, Wide Datasets

Symbolic regression via genetic programming routinely fails on small, wide datasets - a regime common in clini

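Explainable · Quality prediction/anomaly detection · Machine learning · Supervised learning · Generation · Regression

Use: Generation
Difficulty: Hard
Cost: High

arxiv · Paper only · 2026-06-20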

Distilling a Modular Reservoir Through a Genomic Bottleneck

The intricate structures of biological neural networks largely emerge during development, guided by a comparat

Use: Generation
Difficulty: Hard
Cost: High

arxiv · Paper only · 2026-06-19

On the Use of Survival Selection Methods for Evolutionary Diversity Optimisation

Generating a diverse set of high quality solutions for an optimisation problem has been studied extensively in

Quality prediction/anomaly detection · Deep learning · Compression/quantization · Generation

Use: Generation
Difficulty: Hard
Cost: Medium

arxiv · Paper only · 2026-06-18

Formally Verified Code Synthesis for Structured Data Translation in a Medical Internet of Things

In this work we present a LLM powered, evolutionary code synthesis system for structured data translation in a

Suited to tabular data · Natural language processing · Large language models · Generation · Tabular

Use: Generation
Difficulty: Hard
Cost: High

arxiv · Paper only · 2026-06-15

Evolution & Foundation: AI Shares Creative Control

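Proposes a new method for evaluating ideas that AI creates in collaboration with humans, improving the assessment of creativity.

Natural language processing · Fine-tuning · Generation · Image · 3D

Use: A new method for evaluating AI creativity
Difficulty: Hard
Cost: High

arxiv · Paper only · 2026-06-12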

Genetic Algorithm Based Coordination and Optimization Model for Generation Grid Load Storage in Active Distribution Networks

Create an optimization framework that combines fuzzy logic and genetic algorithms for risk assessment and coor

Use: Generation
Difficulty: Hard
Cost: Low

arxiv · Paper only · 2026-06-12

MeEvo: Metacognitive Evolution Combined with Natural Evolution for Automatic Heuristic Design

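This study addresses automatic heuristic design (AHD). AHD was studied before machine learning became practical, and machine learning has made it more widely usable. Here, metacognition in AHD

Use: Automatic heuristic design
Difficulty: Hard
Cost: High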