TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation

EPISODE · Mar 24, 2026 · 25 MIN

from Daily Paper Cast · hosts Jingwen Liang, Gengyu Wang

🤗 Upvotes: 42 | cs.CV
Authors: Yan Shu, Bin Ren, Zhitong Xiong, Xiao Xiang Zhu, Begüm Demir, Nicu Sebe, Paolo Rota
Title: TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation
Arxiv: http://arxiv.org/abs/2603.19039v1

Abstract: Vision-language models (VLMs) have shown promise in earth observation (EO), yet they struggle with tasks that require grounding complex spatial reasoning in precise pixel-level visual representations. To address this problem, we introduce TerraScope, a unified VLM that delivers pixel-grounded geospatial reasoning with two key capabilities: (1) modality-flexible reasoning: it handles single-modality inputs (optical or SAR) and adaptively fuses different modalities into the reasoning process when both are available; (2) multi-temporal reasoning: it integrates temporal sequences for change analysis across multiple time points. In addition, we curate Terra-CoT, a large-scale dataset containing 1 million samples with pixel-level masks embedded in reasoning chains across multiple sources. We also propose TerraScope-Bench, the first benchmark for pixel-grounded geospatial reasoning with six sub-tasks that evaluates both answer accuracy and mask quality to ensure authentic pixel-grounded reasoning. Experiments show that TerraScope significantly outperforms existing VLMs on pixel-grounded geospatial reasoning while providing interpretable visual evidence.
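The abstract says TerraScope-Bench scores both answer accuracy and mask quality, but it does not state which mask metric or combination rule is used. The following is a minimal sketch under stated assumptions: mask quality is measured as intersection-over-union (IoU) between binary masks, and a sample counts as "grounded-correct" only when the answer matches and the IoU clears a threshold. The names `mask_iou` and `score_sample`, the exact-match answer check, and the 0.5 threshold are all hypothetical illustrations, not the benchmark's actual protocol.

```python
import numpy as np


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two binary masks of the same shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks are empty: treat this as a perfect match (an assumption).
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def score_sample(
    pred_answer: str,
    gt_answer: str,
    pred_mask: np.ndarray,
    gt_mask: np.ndarray,
    iou_threshold: float = 0.5,  # hypothetical threshold, not from the paper
) -> dict:
    """Score one answer+mask prediction on correctness and grounding quality."""
    # Hypothetical answer check: case-insensitive exact string match.
    correct = pred_answer.strip().lower() == gt_answer.strip().lower()
    iou = mask_iou(pred_mask, gt_mask)
    return {
        "answer_correct": correct,
        "mask_iou": iou,
        # Correct answer alone is not enough: the mask must also overlap enough.
        "grounded_correct": correct and iou >= iou_threshold,
    }
```

For example, a 4×4 ground-truth mask covering the top-left 2×2 block (4 pixels) and a prediction covering a 2×3 block (6 pixels) overlap on 4 pixels out of a union of 6, giving an IoU of about 0.67. The point of combining the two scores is that a model cannot earn credit for a right answer backed by the wrong pixels.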
