<< Neural networks learn features that reflect the hierarchical, multi-scale structure of natural data. Synthetic datasets used to evaluate interpretability methods typically lack this structure, limiting their value as realistic toy models. >>
<< To close this gap, (AA) introduce a family of synthetic datasets consisting of hierarchical functions defined on critical mean-field percolation clusters embedded in a high-dimensional data space. The percolation data consists of sparse, low-dimensional fractal clusters with a power-law size distribution. Latent variables modeling a taxonomic hierarchy generate each data point's target value. >>
<< The data model is analytically tractable with known critical exponents that fix its properties without requiring hyperparameter tuning. (They) leverage a mapping between percolation clusters, random trees, and additive coalescence to propose an almost linear-time algorithm to jointly sample a random tree and its hierarchical latent decomposition, enabling data generation at arbitrary scale. >>
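To make the cluster statistics concrete, here is a minimal sketch, not the authors' algorithm. It assumes the standard mean-field setting of an Erdős–Rényi random graph at its critical point (mean degree 1) and merges clusters with a union-find structure, which runs in almost linear time. The function name `sample_critical_clusters` and all parameters are hypothetical, and the sketch does not build the hierarchical latent decomposition or the high-dimensional embedding described above.

```python
import random
from collections import Counter


def sample_critical_clusters(n, seed=0):
    """Add n // 2 uniformly random edges among n nodes (mean degree ~1, the
    critical point of the Erdos-Renyi random graph) and return the cluster
    sizes in descending order. Illustrative only."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _ in range(n // 2):
        a, b = find(rng.randrange(n)), find(rng.randrange(n))
        if a == b:
            continue  # edge falls inside an existing cluster
        if size[a] < size[b]:
            a, b = b, a
        parent[b] = a  # union by size: attach the smaller cluster
        size[a] += size[b]

    return sorted((size[r] for r in range(n) if parent[r] == r), reverse=True)


sizes = sample_critical_clusters(100_000)
histogram = Counter(sizes)  # histogram[s] = number of clusters of size s
```

Plotting `histogram` on log-log axes should show an approximately straight line over intermediate sizes, consistent with a power-law size distribution (the textbook mean-field exponent is 5/2), with a cutoff set by the finite number of nodes.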
<< Using probing experiments, (AA) find that the model's ground-truth latent variables can be linearly decoded from neural network activations. Together, sparsity, self-similarity, power-law statistics, and analytical tractability make critical percolation a principled testbed for interpretability research. >>
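A linear probe of the kind mentioned above can be sketched as an ordinary least-squares fit from activations to latent variables, scored on held-out data. The abstract does not describe the probing setup, so everything here is an assumption: the "activations" are synthetic stand-ins (a noisy linear mixture of the latents) rather than the outputs of a trained network, and all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_latents, n_units = 2000, 4, 64

# Hypothetical ground-truth latents and stand-in "activations" (assumption:
# a noisy linear mixture; real activations would come from a trained network).
latents = rng.normal(size=(n_samples, n_latents))
mixing = rng.normal(size=(n_latents, n_units))
activations = latents @ mixing + 0.1 * rng.normal(size=(n_samples, n_units))

train, test = slice(0, 1500), slice(1500, None)


def add_bias(x):
    """Append a constant column so the probe can learn an intercept."""
    return np.hstack([x, np.ones((x.shape[0], 1))])


# Fit the linear probe on the training split.
weights, *_ = np.linalg.lstsq(add_bias(activations[train]), latents[train], rcond=None)
pred = add_bias(activations[test]) @ weights

# Held-out R^2 per latent variable: values near 1 mean linearly decodable.
ss_res = ((latents[test] - pred) ** 2).sum(axis=0)
ss_tot = ((latents[test] - latents[test].mean(axis=0)) ** 2).sum(axis=0)
r2 = 1.0 - ss_res / ss_tot
```

Scoring on a held-out split matters because a probe with many input units can fit the training set well even when the latent is not actually encoded in the activations.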
Aryeh Brill, Tom Ingebretsen Carlson. Critical Percolation as a Synthetic Data Model for Interpretability. arXiv: 2606.20347v1 [cs.LG]. Jun 18, 2026.
Also: ai, artificial intell, bot, network, in https://www.inkgmr.net/kwrds.html
Keywords: gst, ai, artificial intell, bot, networks, neural networks, percolation, criticality, critical percolation, critical mean-field percolation clusters, interpretability research, random trees, fractals, hierarchical latent decomposition, sparsity, self-similarity, power-law statistics.