PhysScene: A Scene Graph Dataset for Scientific Visual Reasoning in Physics Experiments
Scene Graphs (SGs) provide structured representations of visual scenes by modeling objects and their pairwise
- Use case
- Technical verification / paper-reading aid
- Difficulty
- Hard
- Cost
- Medium
Search results for "image"
25 results
Scene Graphs (SGs) provide structured representations of visual scenes by modeling objects and their pairwise
Reasoning Vision-Language Models (VLMs) achieve strong performance on complex multimodal tasks, but reliable r
Diffusion-based generative models have achieved remarkable success in real-world image super-resolution (SR).
Clinical ultrasound images often contain artificial markers, such as measurement calipers and text, to assist
Multi-modal Large Language Models (MLLMs) have achieved remarkable progress in video temporal grounding with r
Source detection in modern observational astronomy is a cornerstone for localizing and identifying stellar sou
The rapid adoption of diffusion and large-scale generative models has made it increasingly challenging to dist
Despite the success of image generation from text descriptions, it still faces challenges that are difficult t
Simulation plays a key role in automated robotics research supported by large language models (LLMs). However,
Data pruning (DP), as an oft-stated strategy to alleviate heavy training burdens, reduces the volume of traini
Temporary work-zone speed limits are communicated through visually inconsistent signage and are often missing
Abnormality detection is a crucial yet challenging task in medical image analysis. Distinguishing abnormalitie
Although video virtual try-on (VVT) has achieved significant progress, existing methods still exhibit two fund
Chain-of-thought (CoT) reasoning has proven effective for enhancing problem-solving in large language models.
Palmprint modality offers a privacy-preserving biometric solution, yet its deployment is hindered by the domai
Multi-contrast brain MRI provide complementary soft-tissue characteristics that aid in the screening and diagn
World action models inherit the predictive capability of world models, enabling action generation to be guided
Vision-Language-Action (VLA) models achieve strong benchmark performance but still struggle in real-world depl
Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in visual understanding, yet the
Recovering the relative 6-DoF pose between two image groups underlies cross-sequence relocalization and multi-
Recent agent frameworks such as Claude Code, Codex, and OpenClaw are strong at tool use and orchestration, but
Variational autoencoders (VAEs) learn low-dimensional latent representations of high-dimensional data. When th
Adapting large language models (LLMs) to clinical workflows often requires costly fine-tuning or manual prompt
This paper proposes a method for deploying VLA models on edge hardware. The method is a framework for deploying VLA models on edge hardware. The method uses edge hardware to V
This study proposes a Distributed Conversational Framework for human-robot collaboration.