Model Card for PULSAR-pbmc

PULSAR (Patient Understanding Leveraging Single-cell universAl Representation) is a multi-scale, multi-cellular foundation model for human peripheral blood mononuclear cells (PBMCs). It transforms a set of single-cell transcriptomes into an interpretable donor-level embedding that preserves single-cell resolution while capturing multicellular composition and coordination.

This repo hosts the zero-shot PBMC model (PULSAR-pbmc) used to produce donor embeddings without task-specific fine-tuning. A disease-aligned variant is also available (see Model Sources).

Model Details

Model Description

PULSAR is a hierarchical, multi-scale foundation model for PBMC scRNA-seq that converts an unordered set of single cells into a 512-d donor embedding while preserving single-cell resolution. It integrates three levels of representation: molecular priors from ESM2 protein embeddings, cellular representations from Universal Cell Embeddings (UCE, 1,280-d), and a Multicellular Transformer encoder–decoder trained with a masked cell modeling objective at a high masking ratio.

Pretraining proceeds in two stages: a pan-tissue CELLxGENE corpus (≈36.2M cells; 6,807 samples), followed by continual pretraining on blood (≈8.74M cells; 2,588 samples).

The resulting donor embeddings support downstream tasks either zero-shot or with lightweight heads, including:
  • large-scale reference mapping for disease classification (state-of-the-art accuracy with strong external generalization)
  • regression of plasma proteomics from transcriptomes
  • forecasting of future outcomes (e.g., RA conversion in ACPA+ individuals and influenza vaccine responsiveness)
  • individualized cytokine perturbation modeling at the donor, cell, and gene levels

For perturbation modeling, a "virtual instrument" conditions on cytokine protein embeddings to transform baseline donor states; combined with the decoder and an optional UCE→expression head, it generates perturbed cell distributions and gene programs. Attention over cells provides mechanistic interpretability: it highlights disease- and severity-relevant cell subsets, and high-attention cells are enriched for antigen-specific clonotypes in viral infection. PULSAR thus operationalizes the AI Virtual Cell vision by linking molecular, cellular, and multicellular organization into a unified, transferable representation for precision immunology.
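To make the masked cell modeling objective more concrete, here is a minimal numpy sketch of the masking step applied to one donor's set of cell embeddings. It is an illustration under stated assumptions, not PULSAR's implementation: the mask ratio of 0.75, the zero vector standing in for a learned mask token, the mean-squared-error reconstruction loss, and the function names `mask_cells` and `masked_reconstruction_loss` are all hypothetical (the card says only "high masking").

```python
import numpy as np

def mask_cells(cell_emb: np.ndarray, mask_ratio: float = 0.75, seed: int = 0):
    """Randomly hide a fraction of cells in one donor's set of embeddings.

    cell_emb: (n_cells, d) array, e.g. d = 1280 for UCE embeddings.
    Returns (visible_input, mask), where mask[i] is True if cell i was hidden.
    Hypothetical sketch: a zero vector stands in for a learned mask token.
    """
    rng = np.random.default_rng(seed)
    n_cells = cell_emb.shape[0]
    n_masked = int(round(mask_ratio * n_cells))
    # Choose which cells to hide, sampling without replacement.
    masked_idx = rng.choice(n_cells, size=n_masked, replace=False)
    mask = np.zeros(n_cells, dtype=bool)
    mask[masked_idx] = True
    visible_input = cell_emb.copy()
    visible_input[mask] = 0.0  # placeholder for a learned [MASK] embedding
    return visible_input, mask

def masked_reconstruction_loss(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared error over the masked cells only (MSE is an assumption)."""
    diff = pred[mask] - target[mask]
    return float(np.mean(diff ** 2))

# Example: 1,000 cells with 1,280-d embeddings; 750 are hidden at ratio 0.75.
cells = np.random.default_rng(1).normal(size=(1000, 1280))
visible, mask = mask_cells(cells)
```

Scoring the loss only on hidden cells forces the encoder to infer them from the visible context, which is what makes a high masking ratio a demanding pretraining signal.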

  • Developed by: Kuan Pang (Stanford University, kuanpang@stanford.edu)
  • Model type: Transformer
  • License: MIT

Model Sources

Uses

Direct Use

  • Generate 512-d donor embeddings from PBMC scRNA-seq to:
    • Perform reference mapping/retrieval (kNN) for disease phenotypes
    • Build lightweight predictors for clinical variables (e.g., plasma proteomics, vaccine response)
    • Support in-silico perturbation pipelines (with the provided virtual instrument and decoders)
    • Enable interpretability via attention over single cells and cell types
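Two of these uses can be sketched with plain numpy, assuming donor embeddings and per-cell attention weights have already been exported from the model. The functions `knn_predict` and `attention_by_cell_type` are hypothetical helpers, not part of the PULSAR codebase. Cosine similarity, the default k = 5, and summing attention within each cell type are illustrative assumptions; the card does not specify these details.

```python
import numpy as np
from collections import Counter

def knn_predict(query: np.ndarray, ref: np.ndarray, ref_labels: list[str], k: int = 5) -> list[str]:
    """Assign each query donor the majority label of its k nearest reference donors.

    query: (n_query, 512) donor embeddings; ref: (n_ref, 512) reference embeddings.
    Uses cosine similarity (an assumption; the card does not name the metric).
    """
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    r = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    sims = q @ r.T  # (n_query, n_ref) cosine similarities
    top_k = np.argsort(-sims, axis=1)[:, :k]  # indices of the k most similar references
    preds = []
    for row in top_k:
        votes = Counter(ref_labels[j] for j in row)
        preds.append(votes.most_common(1)[0][0])
    return preds

def attention_by_cell_type(attn: np.ndarray, cell_types: list[str]) -> dict[str, float]:
    """Sum per-cell attention weights within each annotated cell type.

    attn: (n_cells,) nonnegative weights for one donor (e.g. averaged over heads).
    """
    totals: dict[str, float] = {}
    for w, ct in zip(attn, cell_types):
        totals[ct] = totals.get(ct, 0.0) + float(w)
    return totals
```

Because kNN needs no training, a new cohort can be labeled against a reference atlas simply by embedding it; aggregating attention by annotated cell type then shows which populations the model weighted most for a given donor.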

Downstream Use

  • Fine-tune/align the embedding space for a labeled task (e.g., contrastive alignment by disease label).
  • Integrate with perturbation modules to predict donor-, cell-, and gene-level responses to cytokines.
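One way to align the embedding space by disease label is a supervised contrastive loss, which pulls together donors that share a label and pushes apart donors that do not. The card does not state which loss the disease-aligned variant uses, so the SupCon-style objective below is a swapped-in illustration. The name `supcon_loss` and the temperature of 0.1 are hypothetical choices.

```python
import numpy as np

def supcon_loss(emb: np.ndarray, labels: np.ndarray, temperature: float = 0.1) -> float:
    """Supervised contrastive loss over a batch of donor embeddings.

    emb: (n, d) donor embeddings; labels: (n,) integer disease labels.
    Donors sharing a label act as positives for each other.
    """
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    logits = (z @ z.T) / temperature
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)  # exclude self-pairs
    # Row-wise log-softmax over all other donors, shifted for numerical stability.
    row_max = np.max(logits, axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = pos_mask.sum(axis=1)
    valid = n_pos > 0  # only anchors with at least one positive contribute
    summed = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    mean_log_prob_pos = summed[valid] / n_pos[valid]
    return float(-mean_log_prob_pos.mean())
```

The loss is lower when same-label donors already lie close together, so minimizing it through a small projection head on top of the donor embeddings reshapes the space around disease labels.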

Out-of-Scope Use

The model may not perform well on tissue types other than PBMC, nor on cell-sorted samples (e.g., isolated subpopulations), whose cellular composition differs from that of whole PBMC.

How to Get Started with the Model

Refer to the project's GitHub repository for code to get started with the model.
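Because PULSAR consumes an unordered set of cells per donor, inputs are typically batched as padded per-donor arrays with a mask marking real cells. The following numpy sketch shows one common way to do this from per-cell embeddings such as UCE. It is a hedged illustration of the input format implied by the model description, not the official preprocessing: the function name `pad_donor_sets`, the `max_cells` cap, and random subsampling of large donors are all hypothetical.

```python
import numpy as np

def pad_donor_sets(cell_emb: np.ndarray, donor_ids: list[str], max_cells: int = 1024, seed: int = 0):
    """Group cell embeddings by donor into a padded batch with a validity mask.

    cell_emb: (n_cells, d) per-cell embeddings (e.g. d = 1280 for UCE).
    donor_ids: donor identifier for each cell, length n_cells.
    Donors with more than max_cells cells are randomly subsampled (hypothetical cap).
    Returns (donors, batch, mask): batch is (n_donors, max_cells, d); mask marks real cells.
    """
    rng = np.random.default_rng(seed)
    ids = np.asarray(donor_ids)
    donors = sorted(set(donor_ids))
    d = cell_emb.shape[1]
    batch = np.zeros((len(donors), max_cells, d), dtype=cell_emb.dtype)
    mask = np.zeros((len(donors), max_cells), dtype=bool)
    for i, donor in enumerate(donors):
        idx = np.flatnonzero(ids == donor)
        if len(idx) > max_cells:
            # Cells form an unordered set, so a uniform random subsample
            # preserves the donor's cell-type composition in expectation.
            idx = rng.choice(idx, size=max_cells, replace=False)
        batch[i, : len(idx)] = cell_emb[idx]
        mask[i, : len(idx)] = True
    return donors, batch, mask
```

The mask lets attention ignore padded slots, so donors with different cell counts can share one batch without the padding influencing the donor embedding.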

Training Details

Training Data

Stage-1 pretraining corpus: CZ CELLxGENE Census (LTS 2023-07-25), 36.2M cells, 6,807 samples across 53 tissues and 69 conditions.

Stage-2 continual pretraining (blood focus): 8.736M cells, 2,588 blood/PBMC samples (balanced by sex; broad age range).

More details can be found in the paper and the GitHub repository.

Citation

BibTeX:

@article{pang2025pulsar,
  title={PULSAR: a Foundation Model for Multi-scale and Multicellular Biology},
  author={Pang, Kuan and Rosen, Yanay and Kedzierska, Kasia and He, Ziyuan and Rajagopal, Abhe and Gustafson, Claire E and Huynh, Grace and Leskovec, Jure},
  journal={bioRxiv},
  pages={2025--11},
  year={2025},
  publisher={Cold Spring Harbor Laboratory}
}
Model size: 87.4M parameters (Safetensors, tensor type F32)

Model tree for KuanP/PULSAR-pbmc: 1 fine-tuned model