Failure is Feedback | History-Aware Backtracking for Agentic Traversal in Multimodal Graphs

Abstract

Open-domain multimodal document retrieval must navigate a graph of paragraphs, tables, and images where meaning depends on local context and hyperlinks. Existing graph-based retrievers rely on hop-agnostic similarity scores and rigid plans, so they struggle to recover when early hops are misleading or underspecified.

Failure is Feedback (FiF) reframes traversal as a sequential decision process. An orchestrator maintains a structured memory of successes and failures, escalates from lightweight vector matching to LLM reasoning only when a hop is ambiguous, and performs history-aware backtracking that re-anchors search using failure traces. Experiments on MultimodalQA, MMCoQA, and WebQA show state-of-the-art retrieval while keeping API usage cost-aware.

Highlights

History-aware backtracking

Failed hops are logged as actionable traces; FiF re-anchors to promising prior contexts instead of simply undoing steps, preventing repeated dead ends.

Cost-aware strategy escalation

Traversal begins with cheap vector matching and escalates to LLM reasoning only when ambiguity or past failures justify the extra cost.

Structured decision process

An orchestrator alternates Traverse, Plan, and Stop actions over a layered component graph, guided by a memory of subqueries and outcomes.

Why Failure is Feedback?

Motivating failures for graph-based retrieval

Motivating examples: (a) similarity-only traversal follows a spurious cue; (b) a fixed plan issues an underspecified hop and cannot recover. FiF turns such failures into guidance.

Method Overview

Overview of the FiF orchestrator and actions

The orchestrator decides when to Traverse (choose anchors, subquery, and strategy), when to Plan (revise subqueries), and when to Stop to rerank accumulated evidence. Memory keeps successes and failures, enabling history-aware re-anchoring and strategy escalation.

Layered Component Graph

FiF operates on a layered component graph that mixes navigational and hierarchical edges across documents, components, and subcomponents, supporting multi-hop and multi-granularity retrieval.

Experiments

Benchmarks: MultimodalQA, MMCoQA, and WebQA reconstructed as webpage-style corpora with paragraphs, tables, and images; evaluated with Recall@k and MRR@10, plus downstream QA (EM/F1).
Performance: FiF achieves state-of-the-art retrieval across all three benchmarks. Removing history-aware backtracking drops R@10 from 85.82 → 79.26 on MultimodalQA and 81.91 → 77.45 on WebQA, highlighting the value of failure traces.
Efficiency: Cost-aware escalation keeps average API usage moderate (≈$0.11/query on MultimodalQA with 7–8 LLM calls), balancing accuracy and latency.
Coming soon: Code and dataset links will be added once released.

BibTeX

@misc{yun2026failurefeedbackhistoryawarebacktracking,
      title={Failure is Feedback: History-Aware Backtracking for Agentic Traversal in Multimodal Graphs}, 
      author={Joohyung Yun and Doyup Lee and Wook-Shin Han},
      year={2026},
      eprint={2602.03432},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2602.03432}, 
}

Failure is Feedback:History-Aware Backtracking for Agentic Traversal in Multimodal Graphs