MLinfo | Machine Learning and AI Paper Summaries

MLinfo | Catch up on and search technology that is updated daily

Search results for "generation"

37 results

huggingface | Hugging Face available | 2026-07-22

NexForge: Scaling Agent Capabilities through Requirement-Driven Task Synthesis for LLMs

Scaling executable agent training data for LLM post-training is bottlenecked by substrate-bound methods that t

Natural language processing, Large language models, Generation

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-21

FinanceComplexQA: Benchmarking Agentic Reasoning on Industrial-grade Financial Documents

Agentic Reasoning has become a transformative force in financial analysis due to its ability to integrate larg

Quality prediction/anomaly detection, Natural language processing, RAG, Generation, Summarization, Text

Use: Generation
Difficulty: Easy
Cost: Low

huggingface | Hugging Face available | 2026-07-21

Moving Alphabet: A Controlled Study of Training Data for Text-to-Video Generation

Text-to-video generation has advanced significantly over the past five years through scaling of model size, da

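Quality prediction/anomaly detection, Natural language processing, Fine-tuning, Classification, Generation, Text

Use: Classification
Difficulty: Easy
Cost: High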

huggingface | Hugging Face available | 2026-07-21

Generative World Renderer at the Speed of Play

Generative world renderer AlayaRenderer receives structured world states exported from physics engines and syn

Deep learning, Model compression/quantization, Generation, Text

Use: Generation
Difficulty: Easy
Cost: Medium

huggingface | Hugging Face available | 2026-07-21

Text Template Tokens Are Implicit Semantic Registers in Diffusion Transformers

Text-to-image diffusion transformers (DiTs) jointly process text and image tokens, yet their internal computat

Explainability, Deep learning, Transformer, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-21

Mage-Flow: An Efficient Native-Resolution Foundation Model for Image Generation and Editing

Large-scale visual generators are increasingly capable but costly to train, fine-tune, and deploy. We introduc

Quality prediction/anomaly detection, Deep learning, Transformer, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-20

AlayaWorld: Interactive Long-Horizon World Modeling -- Full Technical Report

Unlike conventional video game development, which relies on labor-intensive pipelines for asset production, an

Deep learning, Transformer, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | GitHub available | Hugging Face available | 2026-07-20

SciForma: Structure-Faithful Generation of Scientific Diagrams

Structural fidelity is essential to scientific methodology diagrams. To communicate research logic, these diag

Quality prediction/anomaly detection, Natural language processing, Large language models, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-20

HOMIE: Human-object Centric Video Personalization via Multimodal Intelligent Enhancement

Human-object centric video personalization (HOCVP) is a core task within subject-driven video generation. Howe

Deep learning, Transformer, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-20

FlowMimic: Mask-free Visual Editing and Generation with Pixel-pair Warped Flow Field for Online Video Editing Data Generation and Modality Mimicry

In line with the prevailing direction of vision research, we explore the integration of both generation and ed

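Quality prediction/anomaly detection, Natural language processing, Large language models, Detection, Generation, Segmentation

Use: Detection
Difficulty: Easy
Cost: High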

huggingface | Hugging Face available | 2026-07-20

Token-Level Off-Policy Learning for Faithful Generation Under Distribution Shift

We propose Token-Level Off-Policy Labeling (TOPL), an off-policy training paradigm that reframes post-training

Explainability, Natural language processing, Fine-tuning, Classification, Generation, Anomaly detection

Use: Classification
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-20

FlashRT: Agent Harness for Guiding Agents to Deploy Real-Time Multimodal Applications

Real-time multimodal applications, including voice agents and interactive video generation, compose heterogene

Deep learning, Model compression/quantization, Generation, Text, Audio

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-20

DiFA: Inference-Time Forward-Process Alignment for Diffusion Models

The prevailing inference framework for diffusion models formulates generation fundamentally as a problem of nu

Computer vision, Image classification, Generation, Image

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-20

ShotPlan: Cinematic Video Generation with Learnable Planning Token

Current video generation models achieve impressive results in single-shot generation, yet remain limited in ci

MI-oriented, Natural language processing, Embedding/retrieval, Generation, Video

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-20

ReViV: Reconstructing the Viewer and the View in 4D from Monocular Egocentric Video

Egocentric devices, such as wearable front-facing cameras, provide a unique perspective for capturing the cont

Deep learning, Transformer, Generation, Video, 3D

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-19

EvolvingWorld: An Open-Schema Framework for Co-Evolving Role-Play Agents and World Model in Interactive Literary World

This paper introduces EvolvingWorld, a framework and benchmark for character and world co-evolution in interac

Natural language processing, Large language models, Generation

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-19

HarmoHOI: Harmonizing Appearance and 3D Motion for Multi-view Hand-Object Interaction Synthesis

Hand-Object Interaction (HOI) synthesis is a cornerstone for animation production and embodied AI. Despite the

Quality prediction/anomaly detection, Deep learning, Transformer, Generation, Image, Video

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-18

DataFlow-Harness: A Grounded Code-Agent Platform for Constructing Editable LLM Data Pipelines

Large language models (LLMs) are increasingly used to automate data-processing workflows, yet coding agents ty

Natural language processing, Large language models, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-18

Group Entropy-Controlled Policy Optimization

Entropy control has become an effective tool in reinforcement learning (RL) of large language models (LLMs), h

Deep learning, Model compression/quantization, Generation, Text, Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-18

Environment-free Synthetic Data Generation for API-Calling Agents

Training API-calling large language model (LLM) agents demands massive amounts of high-quality trajectories. H

Quality prediction/anomaly detection, Natural language processing, Large language models, Generation, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-17

FVAttn: Adaptive Sparse Attention with Runtime Load Balancing for Video Generation

Video Diffusion Transformers process long spatio-temporal sequences, making self-attention the main bottleneck

Quality prediction/anomaly detection, Deep learning, Transformer, Generation, Video

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-17

Apple-π: Benchmarking Thinking with Video Towards Law-Grounded Physical Intelligence

Modern video generation models are increasingly hailed as emerging world models with an internalized grasp of

Natural language processing, Large language models, Generation, Video

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-17

Nonuniformity Principle in Human-AI Coworking

As generative AI is increasingly applied to automate multi-step and high-stake workflows, human judgment and i

Quality prediction/anomaly detection, Machine learning, Supervised learning, Generation

Use: Generation
Difficulty: Easy
Cost: Medium

huggingface | Hugging Face available | 2026-07-17

S1-Omni: A Unified Multimodal Reasoning Model for Scientific Understanding, Prediction, and Generation

We present S1-Omni, a unified multimodal reasoning model for scientific understanding, prediction, and generat

MI-oriented, Natural language processing, Large language models, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-16

Xiaomi-Robotics-1: Scaling Vision-Language-Action Models with over 100K Hours of Real-World Trajectories

We present Xiaomi-Robotics-1, a foundational vision-language-action (VLA) model capable of (1) following diver

Deep learning, Model compression/quantization, Generation, Text, Multimodal

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-16

xHC: Expanded Hyper-Connections

Hyper-Connections (HC) expand the residual stream of Transformers into N parallel streams, providing a form of

Deep learning, Transformer, Generation

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-16

Beyond Entropy: Correctness-Aware Advantage Shaping via Contrastive Policy Optimization

Reinforcement learning with verifiable rewards (RLVR) commonly uses entropy for advantage shaping. However, en

Deep learning, Model compression/quantization, Generation, Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: Medium

huggingface | Hugging Face available | 2026-07-15

DiffGI: Differentiable Geometry Images for High-Fidelity Thin-Shell 3D Generation

Existing 3D generative models predominantly rely on implicit volumetric representations, which enforce waterti

Deep learning, Transformer, Generation, Image, 3D

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-15

Diagnosing and Calibrating Tool-Call Boundary Drift in Multi-Teacher On-Policy Distillation

Agentic language models must learn when to call tools, when to consume tool responses, and when to answer dire

Deep learning, Model compression/quantization, Generation, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-15

VideoRAE: Taming Video Foundation Models for Generative Modeling via Representation Autoencoders

Video generative models commonly rely on latent spaces learned by 3D Variational Autoencoders (3D-VAEs). Howev

Deep learning, Transformer, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-14

From Human-Centric to Agentic Code Review: The Impact of Different Generations of Generative AI Technology on Review Quality

Code review helps maintain software quality before code integration, but it also imposes a substantial workloa

Quality prediction/anomaly detection, Deep learning, Transformer, Generation, Text

Use: Generation
Difficulty: Easy
Cost: High

huggingface | GitHub available | Hugging Face available | 2026-07-13

RAGU: A Multi-Step GraphRAG Engine with a Compact Domain-Adapted LLM

Graph retrieval-augmented generation (GraphRAG) enhances large language models with structured knowledge, yet

Natural language processing, Large language models, Detection, Generation, Summarization

Use: Detection
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-13

Qwen-Music Technical Report

In this report, we introduce Qwen-Music, a powerful music generation model capable of producing highly musical

Sensor/time series, Quality prediction/anomaly detection, Deep learning, Transformer, Generation, Text, Audio

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-13

SVR-R1: Bootstrapping Multi-modal Reasoning with Self-verification in Reinforcement Learning

We introduce Self-Verified Reasoner (SVR-R1), a multi-turn RL framework that turns a model's own verification

Computer vision, Segmentation, Generation, Multimodal, Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-10

OpenLongTail: Generative Scaling of Long-Tail Driving Data

Scaling robust driving policies is fundamentally bottlenecked by the scarcity of edge cases in curated dataset

Natural language processing, RAG, Generation, Image, Video

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-08

DeepSearch-World: Self-Distillation for Deep Search Agents in a Verifiable Environment

Training tool-use agents to improve from their own experience remains challenging, as supervised fine-tuning r

Deep learning, Model compression/quantization, Generation, Reinforcement learning

Use: Generation
Difficulty: Easy
Cost: High

huggingface | Hugging Face available | 2026-07-07

UI2App: Benchmarking Visual Interaction Inference in Executable Web Application Generation

Large language models (LLMs) have demonstrated growing competence in web page generation. However, existing te

Deep learning, Transformer, Generation, Image, Text

Use: Generation
Difficulty: Easy
Cost: High
