PriFT: Prior-Support Guided Supervised Fine-Tuning
This study proposes PriFT, a prior-support guided supervised fine-tuning method for adapting pretrained models to low-level tasks.
- Use case
- Adapting to low-level tasks
- Difficulty
- Hard
- Cost
- High
Search results for "qa"
40 results
This study proposes PriFT, a prior-support guided supervised fine-tuning method for adapting pretrained models to low-level tasks.
In this paper, where VideoQA's credibility is excessively
Self-evolution offers a scalable path to stronger reasoning: a pretrained language model improves itself with
Post-training quantization (PTQ) converts a trained full-precision model into low-bit weights without task-lev
Spatial reasoning is a foundational capability for multimodal large language models (MLLMs) to perceive and op
Large Language Models (LLMs) and Vision-Language Models (VLMs) are increasingly evaluated on table reasoning t
Medical agent systems are increasingly expected to support interactive clinical decision making rather than on
Uses egocentric vision to predict whether pedestrians will cross the road. By formulating the task as a closed-ended visual question answering (VQA) problem, uses vision-language models
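Introduces Baichuan-M4, a clinical-grade medical LLM suited to continuous care. Baichuan-M4, a clinical medical agent system, is based on an integrated medical agent system, with medical agents and medical agent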
Multimodal large language models (MLLMs) achieve strong results on visual reasoning benchmarks, but answer acc
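This study proposes a proxy-judge theory for verifying the correctness of complex reasoning tasks via autoformalization, and develops a method that performs autoformalization based on this theory.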
We introduce ChinaHeritaQA, a multimodal benchmark dataset for evaluating the cultural reasoning abilities of
Agent skills extend language-model agents with task-specific procedures, scripts, and references, but the task
Large language models answer knowledge-intensive questions using both parametric memory and retrieved evidence
The processing of gigapixel whole slide images within vision language models faces a major difficulty due to a
Exploratory manipulation often turns an apparent failed attempt into the key evidence for what to do next. For
Chain-of-thought (CoT) reasoning has proven effective for enhancing problem-solving in large language models.
AgriGov is a curated, trilingual (English-Hindi-Marathi) dataset designed to address the scarcity of domain-gr
Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in visual understanding, yet the
Understanding customer shopping trajectories is essential for enabling personalized shopping experiences. Howe
Long-horizon robot operation requires spatio-temporal memory to record the environment state and recall it for
Despite advances in 3D scene understanding, existing 3D Large Multimodal Models operate in offline settings, r
Addresses the difficulty end-to-end autonomous driving models have in balancing multi-modal maneuver generation with real-time inference, di
Agent systems increasingly use textual skills to encode reusable task procedures, but injecting these skills i
Retrieval-augmented QA pipelines often route retrieved passages through an LLM rewriter before a smaller reade
Traffic sign recognition is crucial for intelligent transportation and autonomous driving, as it can improve d
We study the personal camera roll visual question answering setting. In this setting, a conversational AI assi
System prompt optimization improves agent behavior without modifying the underlying model, yielding human-read
Processing video in vision-language models is expensive: each frame occupies hundreds of tokens, and inference
Selection is a core operation in interactive image editing. To be practical, a user should be able to specify
Memory is an indispensable capability for long-horizon LLM agents, enabling them to preserve and utilize infor
Detecting coordination among unmanned aerial vehicle (UAV) fleets operating in shared airspace and identifying
Agentic search systems iteratively interact with retrieval models to answer complex queries. Despite substanti
AI glasses present a compelling platform for AI agents to serve as personalized memory assistants. To be genui
The emergence of specialized, domain-tuned Large Language Models (LLMs) has demonstrated that smaller models c
Multimodal Large Language Models (MLLMs) have demonstrated significant achievements in general visual question
Evolutionary systems have demonstrated remarkable results in creative domains, with recent applications in gen
Off-policy reinforcement learning of pretrained flow policies remains challenging due to the instability of op
Darwin Family
Variational Quantum Algorithms (VQAs) are a leading approach to exploiting near-term quantum hardware, leverag