Hugging Face · Available · 2026-06-04
Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models
Multimodal Large Language Models (MLLMs) excel at 2D semantic understanding but lack intrinsic 3D awareness, r…
Suited for tabular format, Natural language processing, Large language models, Text, Video, 3D
- Use case: Technical validation and paper-reading aid
- Difficulty: Easy
- Cost: High