🧬 MIST: Mutual Information estimation via Supervised Training
Overview
MIST is a framework for fully data-driven mutual information (MI) estimation. It leverages neural networks trained on large meta-datasets of distributions to learn flexible, differentiable MI estimators that generalize across sample sizes, dimensions, and modalities. The framework supports uncertainty quantification via quantile regression and provides fast, well-calibrated inference suitable for integration into modern ML pipelines.
💡 Notice: We present two models trained using our framework. This page provides a description and usage instructions for MIST, a model that delivers point estimates of mutual information. If you're interested in MIST-QR, which is trained with a quantile regression loss to approximate the full sampling distribution of mutual information rather than a single point estimate, please visit the MIST-QR model page. 💡
Using MIST to Estimate Mutual Information
To use MIST in your own project, install the mist_statinf package via pip install mist-statinf (or follow the installation instructions in the MIST GitHub repository), then use the model like a standard PyTorch model.
from mist_statinf import MISTForHF
# Load the model from Hugging Face
model = MISTForHF.from_pretrained("grgera/MIST")
model.eval()
# Replace with your data: X and Y are paired samples, each of shape (n_samples, dim)
X, Y = ([[ 0.49671415], [-0.1382643 ]], [[ 0.22494338], [ 0.08878913]])
mi = model.estimate_point(X, Y)
print("MIST estimate:", mi)
Intended uses
Our model is intended to be used as a fast, data-driven, and differentiable mutual information estimator for research and applied machine learning tasks. It is designed to provide accurate point estimates of mutual information (MI) between two sets of random variables; the companion MIST-QR model additionally offers calibrated uncertainty quantification via quantile regression, enabling users to approximate the full sampling distribution of MI rather than relying on a single scalar value.
The model is particularly well-suited for settings involving variable sample sizes, high-dimensional data, or multimodal inputs, and can be readily integrated into larger end-to-end trainable pipelines (e.g., representation learning, model selection, or dependency detection). Thanks to its training on a diverse synthetic meta-dataset and its invariance-aware architecture, it generalizes reliably to distributions not seen during training.
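For example, in a dependency-detection setting you can score candidate variable pairs by their estimated MI and rank them. The sketch below is a minimal illustration of that pattern; it relies only on the estimate_point interface shown above, the data and feature names are placeholders, and it assumes the estimate can be cast to a Python float.

import numpy as np
from mist_statinf import MISTForHF

model = MISTForHF.from_pretrained("grgera/MIST")
model.eval()

# Placeholder data: four features, with f1 constructed to depend on f0
rng = np.random.default_rng(1)
data = rng.normal(size=(2000, 4))
data[:, 1] = 0.9 * data[:, 0] + 0.1 * rng.normal(size=2000)
names = ["f0", "f1", "f2", "f3"]

# Score every feature pair by its estimated mutual information
scores = {}
for i in range(data.shape[1]):
    for j in range(i + 1, data.shape[1]):
        X, Y = data[:, i:i + 1], data[:, j:j + 1]
        scores[(names[i], names[j])] = float(model.estimate_point(X, Y))

# Rank pairs from most to least dependent
for pair, mi in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, mi)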
Training procedure
We also provide the full training framework to adapt our models to custom statistical inference tasks. The project repository includes detailed instructions on how to reproduce our experiments or train a model from scratch, including tools to generate synthetic meta-datasets tailored to your specific data characteristics and estimation goals.
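As a conceptual illustration (not the repository's actual tooling), a meta-dataset entry pairs sampled data with its ground-truth MI. The sketch below builds such entries from jointly Gaussian pairs of varying dimension, sample size, and dependence strength, where the true MI is -0.5 * dim * log(1 - rho^2) when each coordinate of X is correlated only with the matching coordinate of Y.

import numpy as np

def gaussian_pair(dim, n_samples, rho, rng):
    # Joint covariance: identity, plus correlation rho between X_k and Y_k
    cov = np.eye(2 * dim)
    for k in range(dim):
        cov[k, dim + k] = cov[dim + k, k] = rho
    z = rng.multivariate_normal(np.zeros(2 * dim), cov, size=n_samples)
    true_mi = -0.5 * dim * np.log(1 - rho**2)
    return z[:, :dim], z[:, dim:], true_mi

# Build a small meta-dataset spanning several dimensions and sample sizes
rng = np.random.default_rng(0)
meta_dataset = []
for dim in (1, 2, 5):
    for n_samples in (100, 1000):
        rho = rng.uniform(-0.9, 0.9)
        X, Y, mi = gaussian_pair(dim, n_samples, rho, rng)
        meta_dataset.append({"X": X, "Y": Y, "mi": mi})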
Citation
If you find our framework or MIST / MIST-QR estimators useful in your research, please cite the following paper:
@misc{gritsai2025mistmutualinformationsupervised,
  title={MIST: Mutual Information Via Supervised Training},
  author={German Gritsai and Megan Richards and Maxime Méloux and Kyunghyun Cho and Maxime Peyrard},
  year={2025},
  eprint={2511.18945},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2511.18945},
}