We argue that existing training-free segmentation methods rely on an implicit and limiting assumption: that segmentation is a spectral graph partitioning problem over diffusion-derived affinities. Such approaches, based on global graph partitioning and eigenvector-based formulations of affinity matrices, suffer from several fundamental drawbacks: they require pre-selecting the number of clusters, oversmooth boundaries because of spectral relaxation, and remain highly sensitive to noisy or multi-modal affinity distributions. Moreover, many prior works neglect local neighborhood structure, which plays a crucial role in stabilizing affinity propagation and preserving fine-grained contours. To address these limitations, we reformulate training-free segmentation as a stochastic flow equilibrium problem over diffusion-induced affinity graphs. Segmentation emerges from a stochastic propagation process that integrates global diffusion attention with local neighborhoods extracted from Stable Diffusion, yielding a sparse yet expressive affinity structure. Building on this formulation, we introduce a Markov propagation scheme that performs random-walk-based label diffusion with an adaptive pruning strategy, which suppresses unreliable transitions while reinforcing confident affinity paths. Experiments across seven widely used semantic segmentation benchmarks demonstrate that our method achieves state-of-the-art zero-shot performance, producing sharper boundaries, more coherent regions, and significantly more stable masks than prior spectral-clustering-based approaches.
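
To make the propagation idea concrete, the following is a minimal sketch of random-walk label diffusion over a pruned affinity graph. It is not the released implementation. The function names, the relative-threshold pruning rule (`keep`), the restart weight `alpha`, and the toy affinity matrix are all illustrative assumptions. In the actual method the affinities would come from diffusion attention combined with local neighborhoods.

```python
import numpy as np


def row_normalize(a):
    """Turn a nonnegative affinity matrix into a row-stochastic transition matrix."""
    sums = a.sum(axis=1, keepdims=True)
    return a / np.maximum(sums, 1e-12)


def prune_transitions(p, keep=0.5):
    """Hypothetical pruning rule (an assumption, not the paper's criterion):
    drop transitions weaker than `keep` times the row maximum, then renormalize."""
    thresh = keep * p.max(axis=1, keepdims=True)
    return row_normalize(np.where(p >= thresh, p, 0.0))


def propagate_labels(affinity, seeds, steps=20, alpha=0.9, keep=0.5):
    """Random-walk label diffusion with restart.

    affinity: (n, n) nonnegative affinities between nodes.
    seeds:    (n, k) one-hot rows for seeded nodes, all-zero rows elsewhere.
    Returns the index of the highest-scoring label for each node.
    """
    p = prune_transitions(row_normalize(affinity), keep)
    y = seeds.astype(float)
    for _ in range(steps):
        # One Markov step, blended with the original seeds.
        y = alpha * (p @ y) + (1.0 - alpha) * seeds
    return y.argmax(axis=1)


# Toy graph: two groups of three nodes, strong affinity within a group
# and weak affinity across groups.
affinity = np.full((6, 6), 0.1)
affinity[:3, :3] = 1.0
affinity[3:, 3:] = 1.0

# One seed per group: node 0 carries label 0, node 5 carries label 1.
seeds = np.zeros((6, 2))
seeds[0, 0] = 1.0
seeds[5, 1] = 1.0

labels = propagate_labels(affinity, seeds)
print(labels)  # [0 0 0 1 1 1]
```

In this toy case pruning removes every cross-group transition, so each seed's label spreads only within its own group. Unlike spectral clustering, the sketch never computes eigenvectors; the number of labels here is set by the seeds.
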
| Model | VOC | Context | COCO-Object | COCO-Stuff-27 | Cityscapes | ADE20K |
|---|---|---|---|---|---|---|
| ReCO (NeurIPS'22) | 25.1 | 19.9 | 15.7 | 26.3 | 19.3 | 11.2 |
| MaskCLIP (ICML'23) | 38.8 | 23.6 | 20.6 | 19.6 | 10.0 | 9.8 |
| MaskCut (CVPR'23) | 53.8 | 43.4 | 30.1 | 41.7 | 18.7 | 35.7 |
| iSeg (arXiv'24) | × | × | × | 45.2 | 25.0 | × |
| DiffSeg (CVPR'24) | 49.8 | 48.8 | 23.2 | 44.2 | 16.8 | 37.7 |
| DiffCut (NeurIPS'24) | 65.2 | 56.5 | 34.1 | 49.1 | 30.6 | 44.3 |
| Seg4Diff (NeurIPS'25) | 54.9 | 52.6 | 38.5 | 49.7 | 24.2 | 44.9 |
| SPARK | 66.9 (+1.7) | 57.7 (+1.2) | 42.7 (+4.2) | 52.0 (+2.3) | 33.9 (+3.5) | 48.0 (+3.1) |

@article{mahatha2025nerve,
  title={NERVE: Neighbourhood \& Entropy-guided Random-walk for training free open-Vocabulary sEgmentation},
  author={Mahatha, Kunal and Dolz, Jose and Desrosiers, Christian},
  journal={arXiv preprint arXiv:2511.08248},
  year={2025}
}