MLinfo | Machine Learning and AI Paper Digest

3D-Aware VLMs with Implicit and Explicit Geometries

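Proposes VLM-IE3D (Vision-Language Models with Implicit and Explicit 3D geometry), a new approach to 3D spatial understanding. VLM-IE3

Computer vision, 3D / point clouds, Detection, Image, Text

Use case: Developing 3D spatial understanding technology
Difficulty: Hard
Cost: High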

GS-Agent: Creating 4D Physical Worlds With Generative Simulation

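GS-Agent generates 4D worlds from natural language that behave in a physically correct way. To preserve physical correctness, the method applies physical reasoning during generation.

MI-oriented, Natural language processing, RAG, Generation, Image, Text

Use case: Generating 4D physical worlds
Difficulty: Hard
Cost: High

Sensor / time series, Natural language processing, Large language models, Image, Text, 3D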

VoLN: Vision-Only Long-Horizon Navigation---Paradigm, Benchmark, and Method

Vision-and-Language Navigation (VLN) enables embodied agents to follow natural-language instructions. However,

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Natural language processing, Prompt engineering, Classification, Image, Text

Sparse Concept Channels in Frozen 3D CT Vision Encoders

Large vision-language models are becoming increasingly dominant in 3D medical image interpretation, but we rar

Use case: Classification
Difficulty: Hard
Cost: High

Sensor / time series, Deep learning, Model compression / quantization, Image, 3D, Self-supervised

Boosting Robustness for All-Weather Self-Supervised Depth Estimation in Autonomous Driving

Self-supervised depth estimation is challenging for safe autonomous driving under various adverse weather cond

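Use case: Obstacle recognition for autonomous vehicles
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Deep learning, Transformer, Image, Text, Video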

Texture++: Elevating 3D Asset Texture Resolution with a Region-Aware Diffusion Model

Numerous 3D assets are discarded due to low texture resolution, while current super-resolution models ignore t

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Recurrent Sinusoidal INRs for Efficient High-Fidelity Representation

We study sinusoidal recurrence as an iterative mechanism for harmonic spectral enrichment in implicit neural r

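Deep learning, RNN / LSTM, Image, 3D

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Computer vision, 3D / point clouds, Generation, 3D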

Future Rendering $\neq$ Future Surface: A Benchmark and Dataset for Dynamic Surface Reconstruction Beyond the Observed Window

Dynamic-scene reconstruction is almost always evaluated inside the observed time window, yet deployment settin

Use case: Generation
Difficulty: Hard
Cost: High

GrainGS: Gradient-Decoupled Gaussian Splatting for Efficient Dynamic Novel View Synthesis

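Dynamic scene reconstruction with 3D Gaussian Splatting must balance dynamic motion modeling, structural stability, and compact representation. In practice, with existing per-primitive methods, local

Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Generation, 3D

Use case: Dynamic scene reconstruction with 3D Gaussian Splatting
Difficulty: Hard
Cost: High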

DAPM: UAV Monocular Depth Estimation from Any Height, Pitch, Roll and FOV

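UAVs operate under widely varying camera poses, including changes in height, pitch, roll, and FOV, so monocular depth estimation on diverse aerial imagery with skewed depth distributions requires sophisticated estimation methods. Most estimation methods

Deep learning, Model compression / quantization, Image, 3D

Use case: Monocular depth estimation for UAVs
Difficulty: Hard
Cost: High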

GLAM-SLAM: Real-time Gaussian Large-scale Mapping via Flow Densification and Spatial Decomposition

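Some Gaussian splatting-based SL

Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Detection, 3D

Use case: Simple, practical SLAM
Difficulty: Hard
Cost: High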

Learning-based Seam Correspondence Reconstruction in Sewing Patterns

Digital sewing patterns typically consist of disjoint 2D panels without explicit stitch annotations, making do

Deep learning, Transformer, Text, 3D

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

DTIF: Robust Loop Closure Detection via Delaunay Triangle Topology in Complex Forests

Accurate forest inventory and large-scale mapping are essential for ecosystem monitoring and sustainable fores

Deep learning, Transformer, Detection, 3D

Use case: Detection
Difficulty: Hard
Cost: High

Computer vision, Segmentation, Text, 3D

Loss Landscape Topology Reveals Why Simple Baselines are Competitive at 3D Point Cloud Segmentation Under Class Imbalance

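Class imbalance arises in 3D point cloud segmentation and calls for effective remedies. This study evaluates 11 imbalance-mitigation methods in the 3D setting, which differs from 2D computer vision, and standard cross-entropy and balanced weighting prove competitive

Use case: 3D point cloud segmentation
Difficulty: Hard
Cost: High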

Geo3R: Mitigating Spatial Reasoning Hallucination in Multimodal Large Language Models

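When large language models reason about the 3D spatial relationships of objects, the lack of visual grounding leads to hallucination. This study proposes an approach to mitigate these hallucinations.

Natural language processing, Large language models, Image, Text, 3D

Use case: Diagnosing hallucinations in 3D spatial reasoning
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Computer vision, Segmentation, Image, 3D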

TransBiolab: A Real-World Multi-View Dataset of Cluttered Transparent Biomedical Objects

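Automated biomedical laboratories need visual perception to recognize, localize, and manipulate transparent plasticware, but high-quality real-world datasets for this setting are currently scarce. This study handles cluttered multi-object scenes

Use case: Visual perception of transparent objects
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Image, Video, 3D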

WAT3R: Feedforward Underwater 3D Reconstruction

Reliable feedforward underwater 3D reconstruction remains challenging due to severe light attenuation and back

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

RECO: Region-Aware Compensation for Extrinsic Perturbations in Roadside 3D Detection

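This study proposes region-aware compensation for extrinsic perturbations to improve roadside 3D object detection.

Deep learning, Transformer, Detection, 3D

Use case: Roadside object detection
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Deep learning, Transformer, Generation, 3D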

FA-LAM: Focus-Aware Large Avatar Model for One-Shot 4D Animatable Gaussian Head

This paper proposes the Focus-Aware Large Avatar Model (FA-LAM), a model for generating one-shot 4D animatable Gaussian heads.

Use case: One-shot generation of animatable Gaussian heads
Difficulty: Hard
Cost: High

Engine-Native Editable 3D World Reconstruction with Objects and Lighting

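This paper proposes a method called Lumera, which is used for engine-native 3D world reconstruction and for detecting lights.

Natural language processing, Large language models, Detection, Generation, Image

Use case: 3D world reconstruction
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Natural language processing, Large language models, Image, Text, Video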

ViSTR-Bench: Can MLLMs Reason from Continuous Visual Cues in Dynamic Scenes?

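This paper proposes ViSTR-Bench, a benchmark that evaluates whether MLLMs can extract information from dynamic scenes.

Use case: 3D scene analysis
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Generation, Image, 3D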

SubSplat: High-Resolution Pixel-aligned 3DGS via Sub-pixel Gaussian Reparameterization

Pixel-aligned Gaussian splatting enables efficient and generalizable novel-view synthesis. However, high-resol

Use case: Generation
Difficulty: Hard
Cost: High

Computer vision, Segmentation, QA, Image, Text

Beyond Episodic Evaluation: Memory Architectural Bottlenecks in Sequential Embodied Question Answering

Embodied question answering (EQA) is traditionally evaluated under an episodic formulation, where agents solve

Use case: QA
Difficulty: Hard
Cost: High

Easy to try on CPU, Deep learning, Model compression / quantization, Detection, 3D, Multimodal

Factorized Spatio-Temporal Convolutions for Human Pose Estimation from Planar Lidar

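This paper proposes a family of networks for human pose estimation and robot motion control, aimed at safe human-robot interaction.

Use case: Safe human-robot interaction
Difficulty: Hard
Cost: High

Natural language processing, Fine-tuning, Text, 3D, Multimodal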

ZONDA: Zero-shot Object Navigation with Dynamic Avoidance in Multi-floor Environments

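Proposes a zero-shot object navigation framework for object-goal navigation that accounts for dynamic avoidance and multi-floor environments. While accounting for moving people and multi-floor environments, the framework

Use case: Object-goal navigation in multi-floor environments
Difficulty: Hard
Cost: High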

Label-Free Finite-Volume-Residual Training of Attention Graph Neural Networks for Coupled Thermo-Fluid Fields

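This study develops an attention graph neural network (a graph neural network combined with an attention mechanism) and improves prediction accuracy for fluid fields.

Deep learning, Graph neural networks, Generation, 3D

Use case: Applying attention mechanisms to fluid field prediction
Difficulty: Hard
Cost: High

Explainable, MI-oriented, Sensor / time series, Deep learning, Transformer, 3D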

AI-Driven Surrogate Models for Predicting Electrode-Scale Discharge Behavior in Lithium-Ion Batteries

Physics-based simulations are essential for understanding the electrode-scale discharge behavior of lithium-io

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

AlphaRoute: Large Language Models as Semantic Optimizers for Multi-Objective Routing

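VLSI global routing assigns signal nets on a 3D grid, with signal delay and

Explainable, Natural language processing, Large language models, Text, 3D

Use case: Multi-objective routing
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Computer vision, Segmentation, 3D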

Koopman Dreamer: Spectrally Constrained Latent Dynamics for Stable World-Model Imagination

Latent world models improve sample efficiency in continuous control by optimizing policies over imagined laten

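Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Generation, Text, Video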

RealVDeblur: One-Step Diffusion for Generalizable Real-World Video Deblurring

Real-world video deblurring remains challenging due to diverse motion patterns, complex degradations, and the

Use case: Generation
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Computer vision, 3D / point clouds, 3D, Self-supervised

PRIME-SVR: Physics-infoRmed Implicit Multi-Echo Slice-to-Volume Reconstruction for Fetal T2 mapping

Slice-to-volume reconstruction (SVR) is the standard method for obtaining high-resolution (HR) 3D fetal brain

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

A Systematic Benchmark of Intensity Normalisation Methods for 3D Knee MRI Segmentation and Cross-Domain Generalisability

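Compares seven intensity normalization methods for MRI and evaluates meniscus segmentation accuracy with a 3D U-Net model.

Computer vision, Segmentation, Image, 3D

Use case: Intensity normalization for MRI images
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Natural language processing, Large language models, Generation, Text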

Beyond Relevance-Centric Retrieval: Rubric-Oriented Document Set Selection and Ranking

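3D occupancy prediction requires vision-based methods that interpret the placement and density of objects. Conventional methods were too computationally expensive, but the newly proposed GaussianSeed algorithm uses hierarchical layers, and the computational cost

Use case: Predicting object placement and density in 3D space
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Natural language processing, RAG, Generation, Text, 3D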

3D-GIMP: When 3D Gaussian Inpainting Meets PatchMatch

Recent advances in 3D scene editing have leveraged iterative diffusion models to update input views. However,

Use case: Generation
Difficulty: Hard
Cost: High

Computer vision, Segmentation, Generation, Image, 3D

A real-time RGB-D perception pipeline for autonomous impact hammers in mining: self-filtering, rock segmentation and rock-breaking poses generation

Impact hammers, also known as rock-breakers, are essential machines in mining operations, where they perform s

Use case: Generation
Difficulty: Hard
Cost: High

ODeform: Learning Continuous 4D Motion for Shape Deformation with Neural ODEs

Modeling continuous object deformation is important for many computer vision and robotics tasks, such as manip

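Natural language processing, Embeddings / retrieval, 3D

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Computer vision, Segmentation, Generation, Image, 3D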

Axolotl3D: a Unified Framework for Faithful 3D Shape Completion

Recent 3D generative models produce high-quality geometry from a single image using large-scale priors and dif

Use case: Generation
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Computer vision, 3D / point clouds, Generation, Image, 3D

ATSplat: Compact Feed-forward 3D Gaussian Splatting with Adaptive Token Expansion

Novel view synthesis is the task of generating images from new viewpoints given input images. The ATSplat algorithm adapts 3D Gaussian Splatting to a feed-forward setting. As a result, ATSp

Use case: Novel view synthesis
Difficulty: Hard
Cost: High

RIM: A Retrieval-In-Matching Framework for Cross-Domain Global Visual Localization of UAVs

Global visual localization of unmanned aerial vehicles (UAVs) using remote-sensing reference maps has attracte

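Sensor / time series, Deep learning, Model compression / quantization, Detection, Image, 3D

Use case: Detection
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Computer vision, 3D / point clouds, Text, 3D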

GaussianSeed: Hierarchical Gaussian Seeding for High-Resolution 3D Occupancy Prediction

Vision-centric 3D occupancy prediction provides dense scene representations essential for autonomous driving a

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

STEREOFLOW: Progressive Stereo Matching with StereoDiT and Transition Flow Matching

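Stereo matching is a key task in 3D reconstruction. This study combines stereo matching with a probabilistic generative task and, aiming to improve object detection, proposes a method that integrates a stereo matching framework with latent distributions

Deep learning, Transformer, Generation, Regression, Image

Use case: Improving object detection
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Computer vision, Segmentation, 3D, Self-supervised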

SIINR: Structurally Informed Implicit Neural Representations for super-resolution with uncertainty quantification of clinical quality diffusion MRI datasets

Diffusion Magnetic Resonance Imaging (dMRI) is a powerful tool for probing brain microstructure, but clinical

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Easy to try on CPU, Deep learning, Model compression / quantization, Segmentation, 3D

StrokeSeg2: Stroke Lesion Segmentation in Clinical Research Workflows

Deep learning frameworks like nnU-Net achieve state-of-the-art brain lesion segmentation performance but remain

Use case: Segmentation
Difficulty: Hard
Cost: High

KineBench: Benchmarking Embodied World Models via IDM-Free Kinematic Grounding

Evaluating the physical consistency of embodied world models (EWMs) is a critical open challenge. While closed-

Computer vision, 3D / point clouds, Generation, Anomaly detection, Image

Use case: Generation
Difficulty: Hard
Cost: High

Computer vision, Segmentation, Classification, Text

Frequency-Hierarchical Active k-Space Sampling for Diagnostic MRI

3D Gaussian Splatting (3DGS) is used to join 3D segments and is essential for text-driven 3D scene editing. Current methods, working from fixed camera positions, 2D

Use case: Improving 3D Gaussian Splatting (3DGS) editing
Difficulty: Hard
Cost: High

Look Before You Edit: Attention-Guided Camera Placement and Multi-View Alignment for 3D Gaussian Splatting Editing

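DRGBT tracking requires following targets under varying sensing modalities and observation-platform conditions. Changes in drift, line of sight, and time conditions also need to be considered. However, current benchmarks

Deep learning, Transformer, Text, 3D

Use case: Improving DRGBT tracking
Difficulty: Hard
Cost: High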

Global Building Area Estimation Products: How Accurate Are They?

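Large view synthesis models infer 3D scenes from unseen viewpoints by applying cross-attention across views. Because such models have been shown to learn 3D spatial relationships from RGB information alone, researchers have recently, for 3D segmentation,

Computer vision

Use case: Inferring 3D scenes from novel viewpoints
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Natural language processing, Fine-tuning, Generation, Segmentation, Image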

Extending a Large View Synthesis Model for Multi-view Panoptic Segmentation

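Autonomous robots must be able to avoid obstacles and accidents. The stronger this avoidance capability, the more effective a robot's countermeasures become, and the safer the robot is from obstacles and accidents

Use case: Enabling autonomous robots to avoid obstacles and accidents
Difficulty: Hard
Cost: High

Easy to try on CPU, Deep learning, Transformer, Classification, Video, 3D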

A Unified Tokenization Framework for Pain Recognition using Heterogeneous 3D Modalities

Pain is a complex and pervasive phenomenon affecting a large percentage of the population, and accurate assess

Use case: Classification
Difficulty: Hard
Cost: High

arXiv, GitHub available, 2026-07-22

Point-Selection Fine-Tuning Framework for Robust Point Cloud Classification

Noisy and corrupted points can substantially degrade point cloud recognition performance, especially under cha

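Deep learning, Model compression / quantization, Classification, Generation, 3D

Use case: Classification
Difficulty: Hard
Cost: High

Sensor / time series, Computer vision, 3D / point clouds, Text, 3D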

Scalable Low-Cost Laboratory Automation: A Digital Twin-Integrated Robotic Platform for Autonomous Liquid Handling (RAINBOT™)

Laboratory automation accelerates discovery, yet its adoption is constrained by the high cost, proprietary des

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

SeededGrasp: Language-Guided Grasping in Complex Scenes with Multiple Embodiments

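Language-guided grasping uses a vision-language model (VLM) to grasp objects in complex scenes. In this approach, the VLM does not predict grasps directly but instead indicates grasp locations in 3D space

Deep learning, Model compression / quantization, Generation, Text, 3D

Use case: Grasping objects in complex scenes
Difficulty: Hard
Cost: High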

Boltzmann-Expected Molecular Design with Decoupled Annealing Flows

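Proposes Boltzmann-Expected Molecular Design with Decoupled Annealing Flows (DECAF), a method for automating molecular design. The 3D structural properties important to molecular design are probabilistically

Computer vision, 3D / point clouds, Generation, Text, 3D

Use case: Automating molecular design
Difficulty: Hard
Cost: High

Sensor / time series, Natural language processing, Large language models, Image, Text, Video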

D3VL: Understanding Driving Scenes from 3D Time Series Data and Video with Language Models

Recent advances in Multimodal Large Language Models (MLLMs) have triggered the development of end-to-end MLLMs

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Crowd4D: Scene-Aware Monocular 4D Crowd Reconstruction

Recovering scene-consistent 4D crowd motion from monocular video in large-scale scenes remains challenging due

Natural language processing, RAG, Image, Video, 3D

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

From Distances to Trajectories: Real-Time Signed Distance Function Mapping and Distance-Accelerated Motion Planning for UAVs

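To operate in challenging environments, unmanned aerial vehicles (UAVs) must determine their actual distance to obstacles and plan safe trajectories. To this end, the mapping and planning stages are chained together using a signed distance

Computer vision, Segmentation, Detection, 3D

Use case: Safe UAV operation
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Deep learning, Transformer, Segmentation, Image, Text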

IGGT4D: Streaming 4D Instance-Grounded Geometry Transformer

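Real-world spatial intelligence requires understanding continuously streaming video of a space. To address this, the paper proposes a model capable of 4D spatial understanding.

Use case: Understanding continuously streaming video of a space
Difficulty: Hard
Cost: High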

Anatomy-Aware 3D Mesh Refinement of Pericardium Segmentations on Computed Tomography

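Segmenting the pericardium is important for measuring esophageal thickening, but doing so accurately is difficult. To address this, the paper proposes a method that uses surrounding anatomical structures to refine pericardium segmentation.

Natural language processing, RAG, Segmentation, Image, Text

Use case: Accurate pericardium segmentation from cardiac CT images
Difficulty: Hard
Cost: High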

Milo, a Fully Autonomous Indoor/Outdoor Robotic Guide Dog

Many Blind and Low-Vision (BLV) people rely on guide dogs for moment-to-moment navigation, such as staying on

Computer vision, 3D / point clouds, Detection, 3D

Use case: Detection
Difficulty: Hard
Cost: High

Beyond Transformers: Linear Attention Policy for Open-Vocabulary Object Goal Navigation

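Open-vocabulary object goal navigation involves partial observability for the agent. Updating internal state is important for performance, which in turn requires updates to the policy network. In recent approaches, Transformers

Deep learning, Transformer, Text, 3D

Use case: Open-vocabulary object goal navigation
Difficulty: Hard
Cost: High

Sensor / time series, Computer vision, 3D / point clouds, Classification, Image, Video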

MVP-Tac: A Miniaturized Dual-Modal Vision and Photoelastic Tactile Sensor for Robot-Assisted Minimally Invasive Surgery

Robot-assisted minimally invasive surgery (RMIS) offers major benefits over open and conventional laparoscopic

Use case: Classification
Difficulty: Hard
Cost: High

Tabular-oriented, Deep learning, Model compression / quantization, Text, 3D, Reinforcement learning

Intelligent Multi-UAV Navigation in ITNTNs: A Hierarchical LLM Approach

The deployment of high-speed Uncrewed Aerial Vehicles (UAVs) in 3D aerial highways necessitates robust coordin

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Tabular-oriented, Deep learning, Transformer, Text, Tabular, 3D

Topological Signatures of Context-Level Reliability in TabPFN

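TabPFN, a tabular prediction model, performs inference from a conditioning support set and input queries without task-specific training. By using zigzag persistent homology to understand its internal behavior at run time, Tab

Use case: Prediction and coordination
Difficulty: Hard
Cost: High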

Calibrated Alzheimer's Conversion Risk in Mild Cognitive Impairment: Persistent Homology of Clinical Trajectories with Conformal Guarantees

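This paper proposes a method based on persistent homology with conformal guarantees for predicting Alzheimer's disease. To make this prediction, the method analyzes temporal trajectories

Explainable, Natural language processing, RAG, 3D

Use case: Predicting Alzheimer's disease
Difficulty: Hard
Cost: High

MI-oriented, Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Generation, Text, 3D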

Do Language Models Dream of Binding Molecules? Benchmarking LLMs under Spatial Constraints

Structure-based drug design (SBDD) leverages the 3D structure of protein targets, often complemented by other

Use case: Generation
Difficulty: Hard
Cost: High

Quality prediction / anomaly detection, Computer vision, 3D / point clouds, 3D

Two-Stage Extrinsic Calibration of a Static Line-Scanning Lidar with a Rotary Platform

A line-scanning lidar yields range and azimuth values in a fixed plane. To perceive surrounding objects in 3D,

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Learning Adaptive Safety Margins for Visual Navigation

Robots in cluttered indoor spaces often fail not because they cannot generate collision-free paths, but becaus

Computer vision, 3D / point clouds, Image, Text, 3D

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Imitation of Arm Gestures by the Semi-Humanoid Robot NICO

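Proposes a gesture imitation method that addresses the limitations of existing HRI methods and achieves robust gesture recognition.

Computer vision, 3D / point clouds, 3D

Use case: Imitating human gestures
Difficulty: Hard
Cost: High

Computer vision, Segmentation, Generation, 3D, Multimodal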

Closing the Loop in Humanoid VLA: Persistent 3D Object Tokens for Verifiable Loco-Manipulation

Proposes a persistent object token method that addresses the limitations of existing VLA methods, making robot control more practical.

Use case: Humanoid robot control
Difficulty: Hard
Cost: High

Distilling Global Traversability Priors for Image-based Affordance Prediction in Off-road Environments

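Proposes a method for extracting global traversability priors that addresses the limitations of existing robot navigation methods, toward robot mobility in off-road environments

Sensor / time series, Natural language processing, RAG, Image, 3D

Use case: Robot mobility in off-road environments
Difficulty: Hard
Cost: High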

RynnBrain 1.1: Towards More Capable and Generalizable Embodied Foundation Model

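Proposes contact-point prediction and native 3D grounding methods that address the limitations of existing embodied foundation models, for greater capability and

Computer vision, Segmentation, Detection, 3D

Use case: Addressing the limitations of embodied foundation models
Difficulty: Hard
Cost: High

Computer vision, Segmentation, Generation, Image, Video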

Does Robust VIO Need More Learning? Geometry-Verified Visual Measurements under Distribution Shift

Learning is increasingly introduced into visual-inertial odometry (VIO), ranging from learned feature front-en

Use case: Generation
Difficulty: Hard
Cost: High

Lifelong Localization in Dynamic Indoor Environments Combining Odometry with Sparse Distance Sampling

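Localization is a core task in autonomous robot navigation. Robots often encounter unpredictable, non-static obstacles or enter unknown environments. This study combines robot odometry with distance sampling to

Sensor / time series, Natural language processing, RAG, Detection, 3D

Use case: Localization for autonomous robots
Difficulty: Hard
Cost: High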

Receiver-Centered Robot-to-Human Handover with Grasp-Aware Object Orientation

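Collaborative robots share a workspace with human operators, and safety-critical micro-events such as robot-to-human handovers occur frequently. However, conventional static handovers result in unnatural grasps when handling asymmetric industrial tools

Natural language processing, Large language models, Classification, 3D

Use case: Tool handover
Difficulty: Hard
Cost: High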

A2RL V\textsubscript{max}: The A2RL autonomous racing dataset for long-range, high-speed perception and multi-vehicle interaction

In autonomous driving development, a perception dataset is crucial, as it provides fundamental data for traini

Computer vision, 3D / point clouds, Detection, Text, 3D

Use case: Detection
Difficulty: Hard
Cost: High

From Sign Language Generation to Humanoid Execution: Vision-Language Guided Retargeting with Collision Mitigation

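This paper aims to achieve autonomous action generation for humanoid robots and shows that robots can act autonomously under vision-language-guided instructions.

Computer vision, 3D / point clouds, Generation, Image, 3D

Use case: Autonomous action generation for humanoid robots
Difficulty: Hard
Cost: High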

SLAM in Low-Light Environments: Project Report

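This paper targets SLAM in low-light conditions and improves SLAM by incorporating auxiliary sensors such as LiDAR, depth, or thermal sensors.

Sensor / time series, Machine learning, Time series, Detection, 3D

Use case: SLAM in low-light conditions
Difficulty: Hard
Cost: High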

GeoWorldAD: Geometry World Action Model for Autonomous Driving

Autonomous driving requires both safe and efficient planning decisions in dynamic 3D environments. Although re

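Deep learning, Transformer, Image, Video, 3D

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Sensor / time series, Computer vision, Segmentation, Detection, Image, 3D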

DeeperRadar: End-to-End MIMO Radar Design and Multi-Modal Fusion for Autonomous Vehicle Perception

DeeperRadar is a radar-centric, sensor-stack-conditioned framework that co-designs radar sensing and multi-mod

Use case: Detection
Difficulty: Hard
Cost: High

Multi-Resolution Voxelized Map-Based Stereo Visual-Inertial Odometry

Incorporating prior maps significantly enhances the accuracy and robustness of pose estimation in visual-inert

Computer vision, 3D / point clouds, Image, 3D

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Move First, Commit Later: Selective LiDAR-to-BIM Global Initialization via Sequential Consensus with Symmetry-Aware Abstention

Global LiDAR-to-BIM initialization must place a robot within an as-designed building model without a prior pos

Computer vision, Segmentation, 3D

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

Sensor / time series, Deep learning, RNN / LSTM, Image, 3D

DROID-ANCHOR: Odometry-Anchored Recurrent Metric Depth Estimation

Precise metric depth estimation is fundamental for autonomous robot navigation, yet monocular systems inherent

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

arXiv, Paper only, 2026-07-18

User-Driven Learning from Demonstration: A Trajectory and Impedance Learning Method

This paper presents a method for user-driven robot Learning from Demonstration (LfD) that reduces user effort

Computer vision, Segmentation, 3D

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

arXiv, Paper only, 2026-07-18

InLiER: Learning-Free Heterogeneous LiDAR Place Recognition via Intermediate Mixed-Radix Structural Keypoint Tokenization

LiDAR place recognition supports loop closure, relocalization, and multi-agent map management. As robotic plat

Sensor / time series, Computer vision, Segmentation, Classification, Detection, 3D

Use case: Classification
Difficulty: Hard
Cost: High

Cluster-Aware Matching via Laplacian Optimal Transport

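This paper proposes PRISA, a framework for assessing traffic safety from the perspective of infrastructure LiDAR. The framework provides road-condition observation and traffic-safety evaluation, and works toward safe traffic by predicting congestion and accidents

Explainable, Quality prediction / anomaly detection, Natural language processing, RAG, 3D

Use case: Developing infrastructure LiDAR monitoring for predicting traffic congestion and accidents
Difficulty: Hard
Cost: High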

Certifiable Safe Model-Based Reinforcement Learning with Control-Affine Dynamics Approximation

Safe model-based reinforcement learning (RL) often bridges control-theoretic analysis and RL for robots to saf

Deep learning, Model compression / quantization, Generation, 3D, Reinforcement learning

Use case: Generation
Difficulty: Hard
Cost: High

Vision-Language-Motion Maps: An Open-Vocabulary, Uncertainty-Aware, Queryable Motion Attribute for 3D Scene Maps

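This study attaches a motion attribute to visualized maps for analyzing dynamic scenarios, which can then be analyzed by filtering on the motion attribute with language queries.

Natural language processing, Large language models, 3D, Multimodal

Use case: Analyzing dynamic scenarios on visualized maps
Difficulty: Hard
Cost: High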

VTLoc: Learning-based Tactile Contact Localization in Visual Point Clouds

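The VTLoc framework fuses visual and tactile information to estimate the position of the robot hand, enabling robot hand localization and manipulation.

Computer vision, 3D / point clouds, Detection, Image, Text

Use case: Robot hand position estimation
Difficulty: Hard
Cost: High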

PIXIE: A Zero-Shot texture-invariant 6D pose estimation framework for unseen objects with assembly defects

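The PIXIE framework performs 6D object pose estimation, enabling robot hand control and object manipulation.

Deep learning, Transformer, Image, Text, 3D

Use case: 6D object pose estimation
Difficulty: Hard
Cost: High

Deep learning, Transformer, Segmentation, Video, 3D

arXiv, GitHub available, 2026-07-17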

DPNeXt: A Lightweight Multi-Scale Feature Fusion Framework for Efficient ViT-Based Multi-Task Dense Prediction

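Multi-task learning supports unified semantic segmentation and depth estimation in robotic visual understanding systems. Vision foundation models (VFMs) are widely adopted as powerful feature encoders, but existing decoding strategies are a critical bottleneck

Use case: 3D spatial understanding via multi-task learning in robotics
Difficulty: Hard
Cost: High

arXiv, Paper only, 2026-07-15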

PiVoT: A Variational Solution for Real-time Large-scale Multi-object Detection and Tracking under Heavy Clutter

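Develops PiVoT, which detects and tracks multiple objects in challenging environments, and proposes it as a practical solution.

Deep learning, Model compression / quantization, Detection, Image, 3D

Use case: Multi-object detection and tracking
Difficulty: Hard
Cost: High

arXiv, Paper only, 2026-07-10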

Manifold Constrained Conformal Prediction for Spatial Events

We introduce a new conformal prediction method that constructs calibrated prediction sets over collections of

Natural language processing, RAG, Generation, Prediction, 3D

Use case: Generation
Difficulty: Hard
Cost: High

arXiv, Paper only, 2026-07-07

Do You Remember? Toward Memory-Centric Multimodal AI

Human memory is reconstructive, not a faithful recording. Current multimodal LLMs (MLLMs) lack this capability

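Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Image, Text, 3D

Use case: Technical validation and paper-reading aid
Difficulty: Hard
Cost: High

arXiv, Paper only, 2026-06-15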

Evolution & Foundation: AI Shares Creative Control

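Proposes a new method for evaluating ideas that AI creates in collaboration with humans, improving the assessment of creativity.

Natural language processing, Fine-tuning, Generation, Image, 3D

Use case: A new method for evaluating AI creativity
Difficulty: Hard
Cost: High

arXiv, Paper only, 2026-06-15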

Wavelength-Multiplexed 2D Beam Steering via a Passive Diffractive Network

Uses wavelength

Sensor / time series, Computer vision, 3D / point clouds, 3D

Use case: Wavelength-based 2D beam steering
Difficulty: Hard
Cost: High