PriFT: Prior-Support Guided Supervised Fine-Tuning
この研究では、プレトレーニング済みモデルを低レベルタスクに向けて適応化するためのPrior-Support ガイドされた超視覚的フィニートゥニング方法であるPriFT を提案しました。
- 用途
- 低レベルタスクの適応化
- 難易度
- Hard
- コスト
- High
「qa」の検索結果
28 件この研究では、プレトレーニング済みモデルを低レベルタスクに向けて適応化するためのPrior-Support ガイドされた超視覚的フィニートゥニング方法であるPriFT を提案しました。
この論文では、VideoQA が過度に信憑性の
Self-evolution offers a scalable path to stronger reasoning: a pretrained language model improves itself with
Post-training quantization (PTQ) converts a trained full-precision model into low-bit weights without task-lev
Spatial reasoning is a foundational capability for multimodal large language models (MLLMs) to perceive and op
Large Language Models (LLMs) and Vision-Language Models (VLMs) are increasingly evaluated on table reasoning t
Medical agent systems are increasingly expected to support interactive clinical decision making rather than on
Egocentric visionを使用して、ペダストリアンの歩く道に渡るのを予測する。Closed-ended visual question answering(VQA)問題に形式することで、ビジョン言語モデルを使用
連続的な治療に適した臨床級LLM医系であるBaichuan-M4を導入。臨床的な医療エージェントシステムであるBaichuan-M4は、統合的な医療エージェントシステムをベースとし、医療エージェントと医療エージェントの連
Multimodal large language models (MLLMs) achieve strong results on visual reasoning benchmarks, but answer acc
この研究では、複雑な推論タスクにおいて、自動フォーマル化を用いて、推論タスクの正しさを検証するためのproxy-judge理論を提案し、この理論を用いて自動フォーマル化が行える方法を開発した。
We introduce ChinaHeritaQA, a multimodal benchmark dataset for evaluating the cultural reasoning abilities of
Agent skills extend language-model agents with task-specific procedures, scripts, and references, but the task
Large language models answer knowledge-intensive questions using both parametric memory and retrieved evidence
The processing of gigapixel whole slide images within vision language models faces a major difficulty due to a
Exploratory manipulation often turns an apparent failed attempt into the key evidence for what to do next. For
Chain-of-thought (CoT) reasoning has proven effective for enhancing problem-solving in large language models.
AgriGov is a curated, trilingual (English-Hindi-Marathi) dataset designed to address the scarcity of domain-gr
Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in visual understanding, yet the
Understanding customer shopping trajectories is essential for enabling personalized shopping experiences. Howe
Long-horizon robot operation requires spatio-temporal memory to record the environment state and recall it for
End-to-end autonomous driving modelsがmulti-modal maneuver generationとreal-time inferenceをバランスすることが難しい問題を解決し、di
Traffic sign recognition is crucial for intelligent transportation and autonomous driving, as it can improve d
Detecting coordination among unmanned aerial vehicle (UAV) fleets operating in shared airspace and identifying
The emergence of specialized, domain-tuned Large Language Models (LLMs) has demonstrated that smaller models c
Evolutionary systems have demonstrated remarkable results in creative domains, with recent applications in gen
Darwin Family
Variational Quantum Algorithms (VQAs) are a leading approach to exploiting near-term quantum hardware, leverag