The figure above illustrates the noise distribution of Monte Carlo estimation, where \( t_{pred} \) denotes the annotated error location (label) and \( t_{true} \) denotes the ground-truth error location (label).
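To make the two quantities concrete, here is a minimal sketch of how an annotated error location can be obtained from Monte Carlo rollouts. It is not the SCAN implementation. It assumes a common convention in which each step prefix is scored by the fraction of `k` sampled completions that reach the correct answer, and the first step with a score of zero is taken as \( t_{pred} \). The names `mc_step_scores`, `annotate_error_location`, and `make_toy_rollout` are hypothetical, and the toy completer stands in for a real language model.

```python
import random
from typing import Callable, List, Optional, Sequence

# A rollout takes a prefix of reasoning steps and returns True if the
# sampled completion reaches the correct final answer.
Rollout = Callable[[Sequence[str]], bool]


def mc_step_scores(steps: Sequence[str], rollout: Rollout, k: int = 8) -> List[float]:
    """Score each prefix steps[:t] by the fraction of k rollouts that succeed."""
    scores = []
    for t in range(1, len(steps) + 1):
        prefix = steps[:t]
        wins = sum(1 for _ in range(k) if rollout(prefix))
        scores.append(wins / k)
    return scores


def annotate_error_location(scores: Sequence[float]) -> Optional[int]:
    """Return the 1-based index of the first step with score 0 (assumed rule).

    Returns None when every prefix has at least one successful rollout.
    """
    for t, score in enumerate(scores, start=1):
        if score == 0.0:
            return t
    return None


def make_toy_rollout(t_true: int, p_success: float, seed: int = 0) -> Rollout:
    """Toy completer (assumption): a prefix containing the true error never
    succeeds; a clean prefix succeeds with probability p_success."""
    rng = random.Random(seed)

    def rollout(prefix: Sequence[str]) -> bool:
        if len(prefix) >= t_true:
            return False
        return rng.random() < p_success

    return rollout


if __name__ == "__main__":
    steps = ["step 1", "step 2", "step 3", "step 4"]
    for p in (1.0, 0.3, 0.0):
        scores = mc_step_scores(steps, make_toy_rollout(t_true=3, p_success=p), k=4)
        print(p, scores, annotate_error_location(scores))
```

In this toy setting, a perfect completer recovers \( t_{pred} = t_{true} \), while a completer that never succeeds flags the first step, so \( t_{pred} \) falls before \( t_{true} \). Intermediate completer strengths and small `k` give annotations that vary from run to run, which is the kind of label noise the figure describes.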
Here we list important observations on the noise distribution:
Building upon these insights, we propose the SCAN framework, which consists of two modules: (1) an efficient data synthesis framework that reduces the substantial inference cost of annotation, and (2) robust training methods that mitigate the high noise ratio in synthetic data and enable robust learning with noisy labels.
@article{ding2025scan,
title={SCAN: Self-Denoising Monte Carlo Annotation for Robust Process Reward Learning},
author={Ding, Yuyang and Shi, Xinyu and Li, Juntao and Liang, Xiaobo and Tu, Zhaopeng and Zhang, Min},
journal={arXiv preprint arXiv:2509.16548},
year={2025}
}