MLinfo | Machine Learning and AI Paper Summaries

Quality prediction/anomaly detection, Computer vision, Segmentation, Generation, Image, Text

Echo-Memory: A Controlled Study of Memory in Action World Models

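This study designed and evaluated an episodic memory model in order to control episodic memory. The model can store important information within an episode and identify correlations between episodes.

Use case: Episodic memory
Difficulty: Hard
Cost: High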

Hybrid Robustness Verification for Spatio-Temporal Neural Networks

With AI increasingly deployed in safety-critical systems, providing formal robustness guarantees for the under

Deep learning, Transformer, Classification, Video, 3D

Use case: Classification
Difficulty: Hard
Cost: High

Do Video Foundation Models Understand Intuitive Physics? A Layerwise Probing Analysis

We study whether pretrained video foundation models encode intuitive-physics information in their frozen repre

Natural language processing, Embedding and retrieval, Video

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Streaming Interventions: Can Video Large Language Models Correct Mistakes as They Occur?

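This work studies question answering with video large language models. It investigates the models' capabilities and limitations and proposes a method for generating answers to questions.

Deep learning, Compression/quantization, Text, Video, Multimodal

Use case: Question answering with video large language models
Difficulty: Hard
Cost: High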

Conan-embedding-v3: Fusing Modality-Specific Models for Omni-Modal Embedding

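This study builds an omni-modal retrieval system that integrates data from different modalities such as text, images, video, and audio.

Natural language processing, Fine-tuning, Regression, Retrieval, Image

Use case: Omni-modal retrieval
Difficulty: Hard
Cost: High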

Counterfactual Reasoning for Fine-Grained Evidence Disentanglement in VideoQA

In this paper, VideoQA's excessively plausible

Computer vision, Multimodal, Detection, Image, Video

Use case: Counterfactual reasoning for VideoQA
Difficulty: Hard
Cost: High

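Quality prediction/anomaly detection, Natural language processing, Large language models, Video, Reinforcement learning

arXiv, GitHub available, 2026-06-08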

Claw-R1: A Step-Level Data Middleware System for Agentic Reinforcement Learning

Agentic reinforcement learning (RL) has become an important post-training paradigm for turning LLMs from stati

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Driving Video Retrieval for Complex Queries with Structured Grounding

Video retrieval at scale is central to data curation and safety validation in autonomous driving, where users

Computer vision, Multimodal, Text, Video

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

MilliVid: Hierarchical Latents for Long-Range Consistency in Video Generation

Video generative models have become increasingly powerful, but long-range consistency remains challenging to a

Deep learning, Transformer, Generation, Text, Video

Use case: Generation
Difficulty: Hard
Cost: High

Computer vision, Segmentation, Video, Multimodal

C$^3$ache: Accelerating World Action Models with Cross Inference Chunk Cache

To accelerate world action models, this paper proposes caching and propagating information.

Use case: Caching and propagation for accelerating world action models
Difficulty: Hard
Cost: High

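AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing

To improve control for robotic surgery, this paper proposes a method that simultaneously models the dynamics of the robot's visual scene and its manipulation.

Deep learning, Transformer, Image, Text, Video

Use case: Remote handling control
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection, Computer vision, Video recognition, Detection, Image, Text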

ArtiFact: A Large-Scale Multi-Modal Cultural Heritage Dataset

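To support drafting clinical research papers with LLMs, this work designs an architecture for verifying generated text. This prevents fabricated citations, inaccurately recorded numbers, and guideline violations.

Use case: Support for medical paper writing
Difficulty: Hard
Cost: High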

I Was Scrolling and Then I Saw a Pregnant Strawberry

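AI mini-dramas (or fruit dramas) are short, algorithmic, and distributed generative-AI video series that have recently spread on social media platforms. The visuals in these videos depict fruit that appears sexualized

Deep learning, Transformer, Generation, Image, Video

Use case: AI mini-dramas
Difficulty: Hard
Cost: High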

EgoTactile: Learning Grasp Pressure for Everyday Objects from Egocentric Video

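This paper proposes EgoTactile, a model that estimates hand pressure from egocentric video.

Sensor/time series, Natural language processing, RAG, Image, Video, 3D

Use case: Estimating hand pressure from egocentric video
Difficulty: Hard
Cost: High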

Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

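Uses egocentric vision to predict whether a pedestrian will cross the road. By formulating the task as a closed-ended visual question answering (VQA) problem, it uses vision language models

Deep learning, Transformer, QA, Image, Text

Use case: Predicting whether a pedestrian will cross the road
Difficulty: Hard
Cost: High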

See More, Think Deeper: Query-Expanded Visual Evidence and Answer-Clue Guided Reflection for Long Video Understanding

Recent advances in Video Large Language Models (Video-LLMs) have enabled performance on long-video understandi

Natural language processing, Large language models, Generation, Image, Text

Use case: Generation
Difficulty: Hard
Cost: High

Cross-Modal Masking for Robust Silent Speech Synthesis Using sEMG and Lipreading

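This study develops a framework for silent speech synthesis. The framework can improve the synthesis and accuracy of silent speech.

Sensor/time series, Natural language processing, RAG, Generation, Audio, Video

Use case: Silent speech synthesis
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection, Natural language processing, RAG, Generation, Image, Text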

Latent Spatial Memory for Video World Models

Video world models that maintain 3D spatial consistency across generated frames typically rely on explicit poi

Use case: Generation
Difficulty: Hard
Cost: High

Computer vision, Video recognition, Text, Multimodal

MemoryVLA++: Temporal Modeling via Memory and Imagination in Vision-Language-Action Models

Temporal modeling is essential for robotic manipulation, as effective control requires both memory of past int

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

GenEyePose: Patient-Free, Knowledge-Based Saccadic Eye Movement Modeling for Digital Neurophysiologic Biomarker Development

Eye movements, including saccades, are widely regarded as highly sensitive and objective biomarkers of neuroph

Deep learning, Transformer, Classification, Detection, Generation

Use case: Classification
Difficulty: Hard
Cost: High

Explainable, Deep learning, Transformer, Text, Video

MAVIS: Multi-Agent Video Retrieval via Structured Video Understanding

The dominant paradigm in video retrieval relies on embedding-based full-corpus scanning, which suffers from in

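Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection, Natural language processing, RAG, Generation, Text, Audio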

CineDance: Towards Next-Generation Multi-Shot Long-Form Cinematic Audio-Video Generation

The fidelity and structural diversity of training datasets fundamentally determine the capabilities of video g

Use case: Generation
Difficulty: Hard
Cost: High

arXiv, GitHub available, 2026-06-08

A VideoMAE-v2 Approach to Zero-Shot Traffic Accident Anticipation

Traffic accident anticipation -- predicting the likelihood of an imminent collision at every frame of a dashca

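Natural language processing, Prompt engineering, Video

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection, Deep learning, Transformer, Generation, Video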

SwiftVR: Real-Time One-Step Generative Video Restoration

Real-time video restoration (VR) for live streams requires high-resolution outputs under strict per-frame late

Use case: Generation
Difficulty: Hard
Cost: High

Computer vision, Segmentation, Generation, Image, Video

Prisma-World: Camera-Controllable Multi-Agent Video World Model

Video world models have made rapid progress in generating controllable visual experiences, but most of them st

Use case: Generation
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection, Natural language processing, Large language models, Image, Text, Video

CapRL++: Unified Reinforcement Learning with Verifiable Rewards for Dense Image and Video Captioning

Image and video captioning are fundamental tasks that bridge the visual and linguistic domains, playing a crit

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Sensor/time series, Computer vision, Video recognition, Image, Text, Multimodal

IB-HFN: Information Bottleneck-Driven SAR-Optical Fusion Network for High-Fidelity Cloud Removal

Synthetic aperture radar (SAR)-assisted optical cloud removal aims to recover surface information obscured by

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Self-supervised Learning Matters: A Simple Ensemble Solution for Micro-Gesture Recognition

In this paper, we present XInsight Lab's solution to the micro-gesture classification track of the 4th MiGA Ch

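Natural language processing, Fine-tuning, Classification, Embedding, Video

Use case: Classification
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection, Deep learning, Transformer, Generation, Text, Video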

LiteVSR: Lightweight Adaptation of Frozen Diffusion Transformers for Video Super-Resolution

Adapting large-scale pre-trained video generators for Video Super-Resolution (VSR) in novel domains remains co

Use case: Generation
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection, Deep learning, Transformer, Detection, Image, Text

arXiv, GitHub available, 2026-06-08

Temporal-Aware Reasoning Optimization for Video Temporal Grounding

Multi-modal Large Language Models (MLLMs) have achieved remarkable progress in video temporal grounding with r

Use case: Detection
Difficulty: Hard
Cost: High

CP4D: Compositional Physics-aware 4D Scene Generation

4D generation (i.e., dynamic 3D generation) has recently emerged as a rapidly growing research fronti

MI-oriented, Natural language processing, RAG, Generation, Image, Text

Use case: Generation
Difficulty: Hard
Cost: High

Vision-Language Guided Hyperspectral Object Tracking via Semantics Fusion and Contextual Template Updating

Hyperspectral object tracking (HOT) leverages the rich spectral information provided by hyperspectral videos (

Deep learning, Compression/quantization, Image, Text, Video

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Zero-Parameter Geometric Gating for Temporally Stable Low-Altitude UAV Video Semantic Segmentation

Video semantic segmentation for low-altitude UAVs requires temporal consistency, yet dense optical flow introd

Computer vision, Segmentation, Image, Video

Use case: Segmentation
Difficulty: Hard
Cost: High

Computer vision, Segmentation, Generation, Image, Text

OmniGen-AR: AutoRegressive Any-to-Image Generation

Autoregressive (AR) models have demonstrated strong potential in visual generation, offering superior performa

Use case: Generation
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection, Deep learning, Compression/quantization, Generation, Image, Video

Ultra Flash: Scaling Real-Time Streaming Video Generation to High Resolutions

While recent autoregressive video diffusion models achieve remarkable streaming quality, they remain confined

Use case: Generation
Difficulty: Hard
Cost: High

$ω$-EVA: Envision, Verify, and Act with Latent Interactive World Models

Embodied policies typically map current observations directly to actions, leaving candidate-action consequence

Reinforcement learning, Model-based, Generation, Image, Video

Use case: Generation
Difficulty: Hard
Cost: High

MotionWAM: Towards Foundation World Action Models for Real-Time Humanoid Loco-Manipulation

World Action Models (WAMs) couple a video dynamics prior to the policy and have shown encouraging results on t

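Natural language processing, RAG, Image, Video, Multimodal

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Suited to tabular data, Computer vision, Video recognition, Text, Video, Multimodal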

Harnessing Streaming Video in the Wild

Vision-Language Models (VLMs) are increasingly required to process unbounded video streams in applications suc

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Beyond Consistency: Preserving Temporal Structure in Zero-Shot Video Editing

Existing zero-shot video editing methods rely on pre-trained diffusion models, successfully achieving spatial

Natural language processing, RAG, Image, Video

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

BioVid: Autoregressive Video Generation with Biological Behavior Semantic Comprehension

Existing video generation frameworks treat sequence duration as an externally prescribed parameter -- fixed fr

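Deep learning, Transformer, Generation, Text, Video

Use case: Generation
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection, Natural language processing, Large language models, Image, Text, Audio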

OmniCap-IF: Benchmarking and Improving Instruction Following Abilities for Omni-Video Captioning

While Omni-modal Large Language Models (OLLMs) have demonstrated impressive capabilities in jointly processing

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Towards Accurate Emotion-Attributed Video Captioning via Fine-grained Emotion-Cause Pair Extraction

Emotional Video Captioning (EVC) is a challenging task that aims to generate factually accurate and emotionall

Explainable, Natural language processing, RAG, Generation, Image, Video

Use case: Generation
Difficulty: Hard
Cost: High

When Video Misreads: Closed-Loop Distillation of Reading Heuristics for Exploratory Manipulation Trace QA

Exploratory manipulation often turns an apparent failed attempt into the key evidence for what to do next. For

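Deep learning, Compression/quantization, Classification, Video, Multimodal

Use case: Classification
Difficulty: Hard
Cost: High

Suited to tabular data, Computer vision, Video recognition, Generation, Image, Text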

DriveReward: A Comprehensive Dataset and Generative Vision-Language Reward Model for Autonomous Driving

Reward models play a pivotal role in reinforcement learning (RL) and multi-modal trajectory selection for auto

Use case: Generation
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection, Deep learning, Transformer, Generation, Image, Video

arXiv, GitHub available, 2026-06-07

OmniTryOn: Video Try-On Anything at Once!

Although video virtual try-on (VVT) has achieved significant progress, existing methods still exhibit two fund

Use case: Generation
Difficulty: Hard
Cost: High

Reinforcing Temporal Answer Grounding in Instructional Video via Candidate-Aware Causal Reasoning

The task of temporal answer grounding in instructional video (TAGV), which aims to locate precise video segmen

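Deep learning, Transformer, Image, Text, Video

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Suited to tabular data, Quality prediction/anomaly detection, Natural language processing, Large language models, Text, Video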

CoVEBench: Can Video Editing Models Handle Complex Instructions?

While recent text-guided video editing models excel at elementary tasks (e.g., style transfer, object insertio

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Video2Sim2Real: Full-Stack Autonomous Dexterous Skill Acquisition from a Single Human Video

Human manipulation videos are a convenient and intuitive source for robot learning. However, directly transfer

Natural language processing, RAG, Video, 3D

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

When No Answer Is Correct: Diagnosing Absent Answer Detection for MLLMs in Video Understanding

Multimodal large language models (MLLMs) have made substantial advancements in video understanding, yet the re

Natural language processing, Large language models, Detection, Generation, Text

Use case: Detection
Difficulty: Hard
Cost: High

Decoupling Semantics and Logic: A Training-Free Coarse-to-Fine Pipeline for Video Retrieval-Augmented Generation

This paper presents our system description for the 2nd Workshop on Multimodal Augmented Generation via Multimo

Deep learning, Compression/quantization, Generation, Retrieval, Image

Use case: Generation
Difficulty: Hard
Cost: High

Programmable Silicon Retina on Pixel Processor Array

Standard dynamic vision sensors approximate retinal processing by detecting temporal contrast changes, offerin

Deep learning, Compression/quantization, Image, Video

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

TIDE: Task-Isolated Diffusion for Unified Video Editing and Generation

Recent advances in Diffusion Transformers have driven rapid progress in video generation and editing, yet thes

Deep learning, Transformer, Generation, Image, Text

Use case: Generation
Difficulty: Hard
Cost: High

Light-WAM: Efficient World Action Models with State-Fusion Action Decoding

World Action Models (WAMs) extend robot policy learning by incorporating future prediction as an additional tr

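Deep learning, Compression/quantization, Generation, Embedding, Video

Use case: Generation
Difficulty: Hard
Cost: High

MI-oriented, Computer vision, Multimodal, Image, Text, Video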

IMAGINE: Adaptive Schema-Imagery Enhanced Composition for Composed Video Retrieval

Composed Video Retrieval (CVR) is designed to retrieve a target video that matches a reference video modified

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Trustworthy Visual Predicates for Robust Manipulation Understanding under Degradation

Manipulation understanding requires reliable relational evidence, such as contact, support, containment, motio

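Deep learning, Transformer, Detection, Image, Video

Use case: Detection
Difficulty: Hard
Cost: High

MI-oriented, Quality prediction/anomaly detection, Natural language processing, Large language models, Generation, Image, Text

arXiv, GitHub available, 2026-06-06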

VideoWeaver: Evaluating and Evolving Skills for Agentic Long Video Generation

Recent agent frameworks such as Claude Code, Codex, and OpenClaw are strong at tool use and orchestration, but

Use case: Generation
Difficulty: Hard
Cost: High

Uncertainty-Aware Intention Prediction for Human-to-Robot Assembly Teleoperation

In assisted teleoperation for human-robot collaboration, accurate intention prediction is critical for enablin

Natural language processing, RAG, Classification, Detection, Segmentation

Use case: Classification
Difficulty: Hard
Cost: High

MotionVLA: Injecting Geometric Motion into Vision-Language-Action Model

Vision-language-action (VLA) models increasingly condition robot policies on history, depth, or 4D features to

Natural language processing, RAG, Generation, Image, Text

Use case: Generation
Difficulty: Hard
Cost: High

Continual Quadruped Robots Coordination via Semantic Skill Discovery

Multi-quadruped coordination has attracted increasing attention due to its enhanced payload capacity, broader

Natural language processing, RAG, Text, Video, Reinforcement learning

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

vla.cpp: A Unified Inference Runtime for Vision-Language-Action Models

Vision-Language-Action (VLA) policies are typically shipped as Python/PyTorch stacks that assume a workstation

Natural language processing, Large language models, Video, Multimodal

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

EgoAERO: Learning Dexterous Manipulation from a Single Egocentric Video without Object Assets

Egocentric RGB-D videos offer a natural source of human dexterous manipulation demonstrations, but existing da

Quality prediction/anomaly detection, Math/theory, Optimization, Video

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

MinNav: Minimalist Navigation Using Optical Flow For Active Tiny Aerial Robots

Navigation using a monocular camera is pivotal for autonomous operation on tiny aerial robots due to their per

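Natural language processing, Fine-tuning, Video

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Sensor/time series, Quality prediction/anomaly detection, Computer vision, 3D/point cloud, Generation, Text, Video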

Dash2Sim: Closed-Loop Driving Simulation from in-the-wild Dashcam Videos

This paper proposes a machine learning framework for driving simulation. The framework needs to handle large amounts of data

Use case: Framework for driving simulation
Difficulty: Hard
Cost: High

Natural language processing, Fine-tuning, Video, Multimodal

Robotic Policy Adaptation via Weight-Space Meta-Learning

Vision-Language-Action (VLA) models are emerging as a promising paradigm for robotic manipulation, enabling ge

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

QuadVerse: An Integrated Framework Aligning Visual-Physical Reality for Quadruped Simulation

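This paper proposes the QuadVerse framework for quadruped robot simulation. QuadVerse uses simulation that accounts for visual, physical, and dynamic gaps, integrating the quadruped robot's experimental environment with simulation.

Quality prediction/anomaly detection, Natural language processing, RAG, Image, Video, 3D

Use case: Quadruped robot simulation
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection, Natural language processing, RAG, Image, Video, Multimodal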

LARA: Latent Action Representation Alignment for Vision-Language-Action Models

Visual-language action (VLA) models enable robots to predict actions directly from observations and language i

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Dreaming when Necessary: Advancing World Action Models with Adaptive Multi-Modal Reasoning

World Action Models (WAMs) offer a promising approach to embodied intelligence, yet existing methods rely heav

Deep learning, Compression/quantization, Image, Text, Video

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Think Like a Pilot: Fine-Grained Long-Horizon UAV Navigation

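VLN benchmarks use discrete or coarse actions, and UAV vision-language-action (VLA) tasks center on short maneuvers; a fine-grained UAV navigation (FLIGHT) benchmark that can handle long-duration flight

Computer vision, Multimodal, Text, Video

Use case: Long-duration drone flight
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection, Computer vision, 3D/point cloud, Video, 3D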

What Matters When Cotraining Robot Manipulation Policies on Everyday Human Videos?

Human video datasets used for cotraining robot manipulation policies largely consist of curated demonstrations

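Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Quality prediction/anomaly detection, Natural language processing, RAG, Segmentation, Text, Video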

VOLT: Vision and Language Trajectory Segmentation for Faster-than-Demonstration Policies

In this study, faster autonomous driving

Use case: High-speed operation for faster autonomous driving
Difficulty: Hard
Cost: High

ActiveMimic: Egocentric Video Pretraining with Active Perception

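Recognizing the importance of learning while people change their viewpoint and produce camera motion as they actually perform manipulation, this study proposes a pretraining framework called ActiveMimic.

Natural language processing, Fine-tuning, Video

Use case: Egocentric video
Difficulty: Hard
Cost: High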

MotionDisco: Motion Discovery for Extreme Humanoid Loco-Manipulation

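This study proposes MotionDisco for humanoid robot loco-manipulation, enabling the robot to detect contact and act autonomously.

Deep learning, Compression/quantization, Text, Video, Reinforcement learning

Use case: Humanoid robot loco-manipulation
Difficulty: Hard
Cost: High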

Robots Need More than VLA and World Models

Generalist robot intelligence is often framed as a policy-scaling problem: collect more robot demonstrations,

Computer vision, 3D/point cloud, Generation, Video, 3D

Use case: Generation
Difficulty: Hard
Cost: High

World-Language-Action Model for Unified World Modeling, Language Reasoning, and Action Synthesis

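Proposes a unified vision-language-action model that can improve performance on tasks that use it.

Deep learning, Transformer, Generation, Image, Text

Use case: Unified vision-language-action model
Difficulty: Hard
Cost: High

Computer vision, Multimodal, Anomaly detection, Text, Video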

Towards a Data Flywheel for Embodied Intelligence in Logistics

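In autonomous driving, a robot must decide its actions based on visually perceived information, but spatial models built from past data make it difficult to predict the robot's behavior, so by building a spatial model

Use case: Building a space suited to robot behavior prediction
Difficulty: Hard
Cost: High

arXiv, Paper only, 2026-06-01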

Identifiable Markov Switching Models with Instantaneous Effects and Exponential Families

Temporal systems often exhibit non-stationary behaviour, such as seasonal climate variation or glucose fluctua

Sensor/time series, Computer vision, Video recognition, Detection, Time series

Use case: Detection
Difficulty: Hard
Cost: High

arXiv, Paper only, 2026-06-01

A Sheaf Framework for Strategic Multi-Agent Systems: From Consensus to Nash Equilibria

The coordination of heterogeneous autonomous agents in dynamic, adversarial environments requires simultaneous

Computer vision, Video recognition, Generation

Use case: Generation
Difficulty: Hard
Cost: High

arXiv, Paper only, 2026-05-31

Functional Clustering of Survival Data via Smoothed Log-Hazard Trajectories: A Risk-Dynamics Perspective

This paper investigates clustering in survival data by shifting the analytical focus from cumulative survival

Explainable, Computer vision, Video recognition

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

arXiv, Paper only, 2026-05-28

Deep Binarized Photonic Reservoir Computing for Ultrafast Multimedia Signal Processing

We present a deep photonic neural network architecture based on ultrafast binary optical modulation from a dig

Sensor/time series, Computer vision, Video recognition, Classification, Detection, Image

Use case: Classification
Difficulty: Hard
Cost: High

arXiv, Paper only, 2026-05-27

CLANE: Continual Learning of Actions on Neuromorphic Hardware from Event Cameras

Recognizing and continuously learning novel human actions without forgetting prior classes is a requirement fo

Sensor/time series, Deep learning, CNN, Classification, Image, Video

Use case: Classification
Difficulty: Hard
Cost: High

arXiv, Paper only, 2026-05-21

Temporal Coding as a Substrate for Sensorimotor Object Inference: A Spiking Reinterpretation of Thousand Brains Architecture

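To develop a preferred substrate for spatiotemporal object recognition, this study proposed a method called Spiking Reinterpretation of Thousand Brains Theory. This

Sensor/time series, Computer vision, Video recognition, Classification

Use case: Developing a preferred substrate for spatiotemporal object recognition
Difficulty: Hard
Cost: High

arXiv, Paper only, 2026-05-19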

Scalable, Energy-Efficient Optical-Neural Architecture for Multiplexed Deepfake Video Detection

The rapid proliferation of AI-generated visual media has created an urgent need for efficient, trustworthy dee

Deep learning, Transformer, Detection, Image, Video

Use case: Detection
Difficulty: Hard
Cost: High

arXiv, Paper only, 2026-05-16

A Truthful Multiunit Profit-Optimal Mechanism for Synthesizing Social Laws

This paper studies Social Law Synthesis (SLS) in strategic multi-agent environments as a new multi-unit mechan

Computer vision, Video recognition, Generation, Text

Use case: Generation
Difficulty: Hard
Cost: High

arXiv, Paper only, 2026-05-12

Self-organized MT Direction Maps Emerge from Spatiotemporal Contrastive Optimization

The spatial and functional organization of the primate visual cortex is a fundamental problem in neuroscience.

Deep learning, Transformer, Image, Video, 3D

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High