Improve model card: Add pipeline tag, library, links, usage & citation
#1
by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,3 +1,72 @@

The previous README contained only the front matter with `license: cc-by-nc-4.0`; it is replaced by the following model card:

---
license: cc-by-nc-4.0
pipeline_tag: image-to-image
library_name: pytorch
---

# Flow Poke Transformer (FPT)

[Project Page](https://compvis.github.io/flow-poke-transformer/) | [Paper](https://huggingface.co/papers/2510.12777) | [Code](https://github.com/CompVis/flow-poke-transformer) | [Model](https://huggingface.co/CompVis/flow-poke-transformer)

## Paper and Abstract

The Flow Poke Transformer (FPT) was presented in the paper [What If: Understanding Motion Through Sparse Interactions](https://huggingface.co/papers/2510.12777).

FPT is a novel framework for directly predicting the distribution of local motion, conditioned on sparse interactions termed "pokes". Unlike traditional methods that typically only enable dense sampling of a single realization of scene dynamics, FPT provides an interpretable, directly accessible representation of multi-modal scene motion, its dependency on physical interactions, and the inherent uncertainties of scene dynamics. The model has been evaluated on several downstream tasks, demonstrating competitive performance in dense face motion generation, articulated object motion estimation, and moving part segmentation from pokes.

## Project Page and Code

*   **Project Page:** [https://compvis.github.io/flow-poke-transformer/](https://compvis.github.io/flow-poke-transformer/)
*   **GitHub Repository:** [https://github.com/CompVis/flow-poke-transformer](https://github.com/CompVis/flow-poke-transformer)

_FPT predicts distributions of potential motion for sparse points. Left: the paw pushing the hand down forces the hand downwards, resulting in a unimodal distribution. Right: the hand moving down results in two modes, with the paw either following along or staying put._

## Usage

The easiest way to try FPT is via our interactive demo:

```shell
python -m scripts.demo.app --compile True --warmup_compiled_paths True
```

Compilation is optional but recommended for a better user experience. A checkpoint will be downloaded from Hugging Face by default if not explicitly specified via the CLI.
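
If you are starting from scratch, a rough sketch of the surrounding setup follows; the demo script lives in the GitHub repository, and installing its dependencies is described there rather than here:

```shell
# Sketch only: run the demo from a checkout of the GitHub repository.
# Dependency installation follows the repository's own instructions.
git clone https://github.com/CompVis/flow-poke-transformer
cd flow-poke-transformer
python -m scripts.demo.app --compile True --warmup_compiled_paths True
```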

For programmatic usage, the simplest way to use FPT is via `torch.hub`:

```python
import torch

model = torch.hub.load("CompVis/flow_poke_transformer", "fpt_base")
```
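
Assuming the hub entry point returns a regular PyTorch `nn.Module`, the usual inference-time handling applies. The following is a generic sketch that uses only standard `torch` calls, nothing FPT-specific:

```python
import torch

# Load via torch.hub as above, then put the model into inference mode.
model = torch.hub.load("CompVis/flow_poke_transformer", "fpt_base")
model = model.eval().requires_grad_(False)

# Optional: move to GPU if available; device and precision choices are up to you.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
```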

If you wish to integrate FPT into your own codebase, you can copy `model.py` and `dinov2.py` from the [GitHub repository](https://github.com/CompVis/flow-poke-transformer). The model can then be instantiated as follows:

```python
import torch
from flow_poke.model import FlowPokeTransformer_Base

model: FlowPokeTransformer_Base = FlowPokeTransformer_Base()
state_dict = torch.load("fpt_base.pt")  # You would need to download the weights separately
model.load_state_dict(state_dict)
model.requires_grad_(False)
model.eval()
```
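
The snippet above assumes the weights are already on disk. One way to fetch them is via `huggingface_hub`; note that the checkpoint filename below is an assumption and should be checked against this repository's file listing:

```python
import torch
from huggingface_hub import hf_hub_download

# Assumption: the checkpoint is hosted in this model repo as "fpt_base.pt".
# Verify the actual filename under the repository's "Files" tab before relying on this.
ckpt_path = hf_hub_download(repo_id="CompVis/flow-poke-transformer", filename="fpt_base.pt")
state_dict = torch.load(ckpt_path, map_location="cpu")
```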

The `FlowPokeTransformer` class contains all necessary methods for various applications. For high-level usage, refer to the `FlowPokeTransformer.predict_*()` methods. For low-level usage, the module's `forward()` can be used.
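
Since the concrete `predict_*()` names and signatures are defined in `model.py`, a quick way to enumerate them on a loaded model is plain Python introspection (nothing FPT-specific is assumed here):

```python
import inspect

# List the high-level predict_* entry points the loaded model exposes, with their signatures.
for name in dir(model):
    if not name.startswith("predict_"):
        continue
    attr = getattr(model, name)
    if callable(attr):
        print(name, inspect.signature(attr))
```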

## Citation

If you find our model or code useful, please cite our paper:

```bibtex
@inproceedings{baumann2025whatif,
    title={What If: Understanding Motion Through Sparse Interactions},
    author={Stefan Andreas Baumann and Nick Stracke and Timy Phan and Bj{\"o}rn Ommer},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    year={2025}
}
```
