SWE-Explore: Benchmarking How Coding Agents Explore Repositories
Repository-level coding benchmarks such as SWE-bench have driven a rapid surge in the capabilities of coding a
- Use: Detection
- Difficulty: Easy
- Cost: Low
Search results for "retrieval": 15 results
Repository-level coding benchmarks such as SWE-bench have driven a rapid surge in the capabilities of coding a
Deep Research (DR) has emerged as a new agentic paradigm to tackle complex, open-ended research tasks, demandi
Large language models exhibit impressive zero-shot capabilities across a wide range of downstream tasks. Howev
Retrieval for search agents is still inherited from non-agentic information retrieval: a retriever ranks the c
Hard-negative source selection for dense retrieval is usually decided only after fine-tuning and downstream ev
Retrieval-augmented QA pipelines often route retrieved passages through an LLM rewriter before a smaller reade
Persistent AI assistants, such as OpenClaw, accumulate large collections of related memories over long-term in
Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story prog
Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery
Temporal Grounding (TG) aims to localize video segments corresponding to a textual query. Prior research predo
Learning representations of CAD models is a largely open problem. While 3D representation learning has flouris
Structured financial audit verification is difficult for language-model agents because correctness depends on
Financial AI agents often fail for a simple reason: they make users carry the complexity. A user must repeated
Agentic search systems iteratively interact with retrieval models to answer complex queries. Despite substanti
AI glasses present a compelling platform for AI agents to serve as personalized memory assistants. To be genui