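// Mean score for a method: missing or NaN scores count as 0 and every score is clipped to [0, 1].
function aggregate_scores(rows) {
  return d3.mean(rows.map(row => {
    if (row.score === undefined || isNaN(row.score)) return 0;
    return Math.min(1, Math.max(0, row.score));
  }));
}
// Turn an array of row objects into an object of column arrays, using the keys of the first row.
function transpose_list_of_objects(list) {
  if (list.length === 0) return {};
  return Object.fromEntries(Object.keys(list[0]).map(key => [key, list.map(d => d[key])]));
}
// Human-readable label for a duration given in seconds.
function label_time(time) {
  if (time < 1e-5) return "0s";
  if (time < 1) return "<1s";
  if (time < 60) return `${Math.floor(time)}s`;
  if (time < 3600) return `${Math.floor(time / 60)}m`;
  if (time < 3600 * 24) return `${Math.floor(time / 3600)}h`;
  if (time < 3600 * 24 * 7) return `${Math.floor(time / 3600 / 24)}d`;
  // Durations of 7 days or more end up here. NaN (missing values) also falls
  // through to this line, because every comparison with NaN is false.
  return ">7d";
}
// Human-readable label for a memory size given in megabytes.
// With include_mb = false, anything below 1 GB is shown as "<1G".
function label_memory(x_mb, include_mb = true) {
  if (!include_mb && x_mb < 1e3) return "<1G";
  if (x_mb < 1) return "<1M";
  if (x_mb < 1e3) return `${Math.round(x_mb)}M`;
  if (x_mb < 1e6) return `${Math.round(x_mb / 1e3)}G`;
  if (x_mb < 1e9) return `${Math.round(x_mb / 1e6)}T`;
  // 1 PB or more; NaN also falls through to here.
  return ">1P";
}
// Mean of x after dropping NaN values (isNaN also drops values that cannot be coerced to numbers).
function mean_na_rm(x) {
  return d3.mean(x.filter(d => !isNaN(d)));
}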
Foundation models
Modelling of single cells to perform multiple tasks.
4 datasets · 8 methods · 2 control methods · 2 metrics
License: MIT
Recent developments in deep learning have led to the creation of several ‘foundation models’ for single-cell data. These are large models that have been trained on data from millions of cells and aim to capture the full variability of the single-cell landscape. Typically, they use a transformer architecture (Szałata et al. 2024) and undergo self-supervised pre-training in which parts of the input data are masked and the model learns to predict them. Trained foundation models can then be applied to a variety of downstream tasks, either by directly feeding new data into the model or by fine-tuning it to better fit a new dataset or to produce a specific output. The general nature of single-cell foundation models and the large amount of data they have been trained on make them potentially powerful tools for single-cell analysis, but their performance is yet to be fully established.
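Masked pre-training can be illustrated with a small sketch. Assuming a cell is represented as an array of non-negative expression values, the hypothetical `maskExpression` helper hides a random fraction of positions behind a sentinel value and keeps the hidden values as training targets. Real models differ in the details (for example, how genes are tokenised and what fraction is masked), so this is a simplified illustration rather than any specific model's procedure.

```javascript
// Hypothetical sketch of the masking step in self-supervised pre-training.
// MASK and maskFraction are illustrative choices, not taken from any model.
const MASK = -1; // sentinel for a hidden value (expression values are >= 0)

function maskExpression(values, maskFraction, rand = Math.random) {
  const nMask = Math.max(1, Math.round(values.length * maskFraction));
  // Fisher-Yates shuffle of the positions; the first nMask are masked.
  const positions = values.map((_, i) => i);
  for (let i = positions.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [positions[i], positions[j]] = [positions[j], positions[i]];
  }
  const masked = positions.slice(0, nMask).sort((a, b) => a - b);
  const input = values.slice();
  const targets = {};
  for (const i of masked) {
    targets[i] = values[i]; // the model is trained to predict these values
    input[i] = MASK;
  }
  return { input, targets, masked };
}

// Example: hide 15% of a toy expression profile of 20 genes.
const cell = Array.from({ length: 20 }, (_, g) => g % 5);
const { input, targets, masked } = maskExpression(cell, 0.15);
```

Keeping the original values in `targets` is what makes the objective self-supervised: the labels come from the data itself rather than from annotation.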
Open Problems builds on existing evaluations of foundation models (Boiarsky et al. 2023; Liu et al. 2024) by incorporating these models into our continuous benchmarking framework.
This overview combines results from the following benchmarks for individual tasks:
- Label projection
- Batch integration
This benchmark is a work in progress. If you are interested in evaluating foundation models for single-cell data, please fill in the form below to get in touch.
Foundation models contact form
Interpretation
The foundation models task combines results from multiple analysis tasks and should therefore be interpreted differently from other tasks. We treat each task as a metric that represents the overall performance of a foundation model on that type of analysis. For more information about how foundation models perform on specific aspects of each task, refer to that task's results page. As well as comparing foundation models with each other, we provide additional context by including representative standard methods for each task, which shows how foundation models compare to more established methods.
The high hardware and computational requirements of some foundation models present additional challenges for benchmarking and make it difficult to obtain results in some cases.
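As a minimal sketch of treating each task as a single metric, assume each task's own benchmark provides one already-scaled score per metric for a method, and that these are combined with a plain unweighted mean. The metric names are illustrative, the numbers are made up, and the averaging rule is an assumption rather than the documented aggregation.

```javascript
// Hypothetical sketch: collapse a method's scaled metric scores from each
// task benchmark into one "task metric" for the foundation models overview.
// Assumes a plain (unweighted) mean over the task's metrics.
function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// taskResults: { taskName: { metricName: scaledScore, ... }, ... }
function taskMetrics(taskResults) {
  const out = {};
  for (const [task, metrics] of Object.entries(taskResults)) {
    out[task] = mean(Object.values(metrics));
  }
  return out;
}

// Made-up scores for one method.
const perTask = taskMetrics({
  label_projection: { accuracy: 0.8, f1_weighted: 0.6 },
  batch_integration: { asw_batch: 0.5, asw_label: 0.7, graph_connectivity: 0.9 },
});
```

Each key of `perTask` then plays the role of one metric column in the overview, regardless of how many metrics the underlying task benchmark uses.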
Results
Results table of the scores per method, dataset and metric (after scaling). The “Overall mean” dataset is the mean value across all datasets.
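The scaling step is not spelled out on this page. Assuming each raw score is min-max scaled between the negative and positive control scores for the same dataset and task (see the control method info below), and that the “Overall mean” follows the same rules as the `aggregate_scores` helper at the top of the page, the computation can be sketched as follows. The function names and numbers are hypothetical.

```javascript
// Hypothetical sketch, assuming min-max scaling between the negative and
// positive control scores for the same dataset and task.
function scaleScore(raw, negative, positive) {
  return (raw - negative) / (positive - negative);
}

// Same rules as aggregate_scores: missing (undefined/NaN) scores count as 0
// and every scaled score is clipped to [0, 1] before averaging.
function overallMean(scaled) {
  const clipped = scaled.map((s) =>
    s === undefined || Number.isNaN(s) ? 0 : Math.min(1, Math.max(0, s))
  );
  return clipped.reduce((sum, s) => sum + s, 0) / clipped.length;
}

// Made-up raw scores for one method on four datasets, controls at 0.5 and 1.0.
const scaled = [
  scaleScore(0.9, 0.5, 1.0), // about 0.8
  scaleScore(0.4, 0.5, 1.0), // below the negative control, clipped to 0
  NaN, // missing result, counts as 0
  scaleScore(1.1, 0.5, 1.0), // above the positive control, clipped to 1
];
const overall = overallMean(scaled);
```

Counting missing results as 0 means a method that fails to run on a dataset (for example, because of the hardware requirements mentioned above) is penalised in the overall mean rather than silently skipped.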
Dataset info
GTEx v9
Source dataset · Data source · 23-01-2025 · 196.56 MiB
Single-nucleus cross-tissue molecular reference maps to decipher disease gene function (Eraslan et al. 2022).
Understanding the function of genes and their regulation in tissue homeostasis and disease requires knowing the cellular context in which genes are expressed in tissues across the body. Single cell genomics allows the generation of detailed cellular atlases in human tissues, but most efforts are focused on single tissue types. Here, we establish a framework for profiling multiple tissues across the human body at single-cell resolution using single nucleus RNA-Seq (snRNA-seq), and apply it to 8 diverse, archived, frozen tissue types (three donors per tissue). We apply four snRNA-seq methods to each of 25 samples from 16 donors, generating a cross-tissue atlas of 209,126 nuclei profiles, and benchmark them vs. scRNA-seq of comparable fresh tissues. We use a conditional variational autoencoder (cVAE) to integrate an atlas across tissues, donors, and laboratory methods. We highlight shared and tissue-specific features of tissue-resident immune cells, identifying tissue-restricted and non-restricted resident myeloid populations. These include a cross-tissue conserved dichotomy between LYVE1- and HLA class II-expressing macrophages, and the broad presence of LAM-like macrophages across healthy tissues that is also observed in disease. For rare, monogenic muscle diseases, we identify cell types that likely underlie the neuromuscular, metabolic, and immune components of these diseases, and biological processes involved in their pathology. For common complex diseases and traits analyzed by GWAS, we identify the cell types and gene modules that potentially underlie disease mechanisms. The experimental and analytical frameworks we describe will enable the generation of large-scale studies of how cellular and molecular processes vary across individuals and populations.
Tabula Sapiens
Source dataset · Data source · 23-01-2025 · 23.61 MiB
A multiple-organ, single-cell transcriptomic atlas of humans (Jones et al. 2022).
Tabula Sapiens is a benchmark, first-draft human cell atlas of nearly 500,000 cells from 24 organs of 15 normal human subjects. This work is the product of the Tabula Sapiens Consortium. Taking the organs from the same individual controls for genetic background, age, environment, and epigenetic effects and allows detailed analysis and comparison of cell types that are shared between tissues. Our work creates a detailed portrait of cell types as well as their distribution and variation in gene expression across tissues and within the endothelial, epithelial, stromal and immune compartments.
Immune Cell Atlas
Source dataset · Data source · 23-01-2025 · 117.72 MiB
Cross-tissue immune cell analysis reveals tissue-specific features in humans (Domínguez Conde et al. 2022).
Despite their crucial role in health and disease, our knowledge of immune cells within human tissues remains limited. We surveyed the immune compartment of 16 tissues from 12 adult donors by single-cell RNA sequencing and VDJ sequencing generating a dataset of ~360,000 cells. To systematically resolve immune cell heterogeneity across tissues, we developed CellTypist, a machine learning tool for rapid and precise cell type annotation. Using this approach, combined with detailed curation, we determined the tissue distribution of finely phenotyped immune cell types, revealing hitherto unappreciated tissue-specific features and clonal architecture of T and B cells. Our multitissue approach lays the foundation for identifying highly resolved immune cell types by leveraging a common reference dataset, tissue-integrated expression analysis, and antigen receptor sequencing.
Diabetic Kidney Disease
Source dataset · Data source · 23-01-2025 · 151.83 MiB
Multimodal single cell sequencing implicates chromatin accessibility and genetic background in diabetic kidney disease progression (Wilson et al. 2022).
Multimodal single cell sequencing is a powerful tool for interrogating cell-specific changes in transcription and chromatin accessibility. We performed single nucleus RNA (snRNA-seq) and assay for transposase accessible chromatin sequencing (snATAC-seq) on human kidney cortex from donors with and without diabetic kidney disease (DKD) to identify altered signaling pathways and transcription factors associated with DKD. Both snRNA-seq and snATAC-seq had an increased proportion of VCAM1+ injured proximal tubule cells (PT_VCAM1) in DKD samples. PT_VCAM1 has a pro-inflammatory expression signature and transcription factor motif enrichment implicated NFkB signaling. We used stratified linkage disequilibrium score regression to partition heritability of kidney-function-related traits using publicly-available GWAS summary statistics. Cell-specific PT_VCAM1 peaks were enriched for heritability of chronic kidney disease (CKD), suggesting that genetic background may regulate chromatin accessibility and DKD progression. snATAC-seq found cell-specific differentially accessible regions (DAR) throughout the nephron that change accessibility in DKD and these regions were enriched for glucocorticoid receptor (GR) motifs. Changes in chromatin accessibility were associated with decreased expression of insulin receptor, increased gluconeogenesis, and decreased expression of the GR cytosolic chaperone, FKBP5, in the diabetic proximal tubule. Cleavage under targets and release using nuclease (CUT&RUN) profiling of GR binding in bulk kidney cortex and an in vitro model of the proximal tubule (RPTEC) showed that DAR co-localize with GR binding sites. CRISPRi silencing of GR response elements (GRE) in the FKBP5 gene body reduced FKBP5 expression in RPTEC, suggesting that reduced FKBP5 chromatin accessibility in DKD may alter cellular response to GR. 
We developed an open-source tool for single cell allele specific analysis (SALSA) to model the effect of genetic background on gene expression. Heterozygous germline single nucleotide variants (SNV) in proximal tubule ATAC peaks were associated with allele-specific chromatin accessibility and differential expression of target genes within cis-coaccessibility networks. Partitioned heritability of proximal tubule ATAC peaks with a predicted allele-specific effect was enriched for eGFR, suggesting that genetic background may modify DKD progression in a cell-specific manner.
Method info
Best standard method
Baseline best standard method
A combined ‘best’ standard method, constructed from the scores of the best overall standard method on each task.
The selected methods are:
Median standard method
Baseline median standard method
A combined ‘median’ standard method, constructed from the scores of the standard method with the median overall score on each task.
The selected methods are:
Geneformer
Geneformer is a foundational transformer model pretrained on a large-scale corpus of single cell transcriptomes to enable context-aware predictions in settings with limited data in network biology (Theodoris et al. 2023; Chen et al. 2024)
Geneformer is a context-aware, attention-based deep learning model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology.
scGPT (fine-tuned)
A fine-tuned version of the scGPT foundation model (Cui et al. 2024)
scGPT is a foundation model for single-cell biology based on a generative pre-trained transformer and trained on a repository of over 33 million cells. Here we fine-tune a pre-trained model for each task.
scGPT (zero shot)
A zero-shot version of the scGPT foundation model (Cui et al. 2024)
scGPT is a foundation model for single-cell biology based on a generative pre-trained transformer and trained on a repository of over 33 million cells. Here we perform zero-shot inference using a pre-trained model.
SCimilarity
SCimilarity provides a unifying representation of single cell expression profiles (Heimberg et al. 2023)
SCimilarity is a unifying representation of single cell expression profiles that quantifies similarity between expression states and generalizes to represent new studies without additional training.
scPRINT
scPRINT is a large transformer model built for the inference of gene networks (Kalfon et al. 2024)
scPRINT is a large transformer model built for the inference of gene networks (connections between genes explaining the cell’s expression profile) from scRNAseq data. It uses novel encoding and decoding of the cell expression profile and new pre-training methodologies to learn a cell model. scPRINT can be used to perform the following analyses:
- expression denoising: increase the resolution of your scRNAseq data
- cell embedding: generate a low-dimensional representation of your dataset
- label prediction: predict the cell type, disease, sequencer, sex, and ethnicity of your cells
- gene network inference: generate a gene network from any cell or cell cluster in your scRNAseq dataset
UCE
UCE offers a unified biological latent space that can represent any cell (Rosen et al. 2023)
Universal Cell Embedding (UCE) is a single-cell foundation model that offers a unified biological latent space that can represent any cell, regardless of tissue or species.
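The two combined standard-method baselines at the start of this section can be sketched as follows. The input layout (one overall score per standard method per task), the method names and the scores are hypothetical, and the rule used for the median when a task has an even number of standard methods is an assumption, since the page does not specify it.

```javascript
// Hypothetical sketch: build the combined "best" and "median" baselines.
// overall: { task: { methodName: overallScore, ... }, ... } for standard methods.
function pickPerTask(overall, chooser) {
  const combined = {};
  for (const [task, byMethod] of Object.entries(overall)) {
    // Rank this task's standard methods by overall score, lowest first.
    const ranked = Object.entries(byMethod).sort((a, b) => a[1] - b[1]);
    const [method, score] = chooser(ranked);
    combined[task] = { method, score };
  }
  return combined;
}

const best = (ranked) => ranked[ranked.length - 1];
// With an even number of methods this takes the lower of the two middle
// methods (an assumption; the page does not say how this case is handled).
const median = (ranked) => ranked[Math.floor((ranked.length - 1) / 2)];

// Made-up overall scores for hypothetical standard methods.
const overall = {
  label_projection: { method_a: 0.62, method_b: 0.71, method_c: 0.68 },
  batch_integration: { method_d: 0.55, method_e: 0.66, method_f: 0.41, method_g: 0.58 },
};
const bestStandard = pickPerTask(overall, best);
const medianStandard = pickPerTask(overall, median);
```

Because the choice is made separately for each task, a combined baseline can draw on a different standard method for each task.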
Control method info
Positive control
Baseline positive control
Baseline positive control constructed by taking the highest mean score by a control method on each dataset for each task.
Negative control
Baseline negative control
Baseline negative control constructed by taking the lowest mean score by a control method on each dataset for each task.
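The construction of the two control baselines can be sketched as a group-by over (task, dataset) pairs. This is a minimal sketch that assumes each input row already holds one control method's mean score on one dataset of one task; the row layout, names and numbers are hypothetical.

```javascript
// Hypothetical sketch: each row holds one control method's mean score on one
// dataset of one task. The row layout and names are assumptions.
function buildControls(rows) {
  const groups = new Map();
  for (const { task, dataset, meanScore } of rows) {
    const key = `${task}/${dataset}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(meanScore);
  }
  const positive = {};
  const negative = {};
  for (const [key, scores] of groups) {
    positive[key] = Math.max(...scores); // highest mean score of any control
    negative[key] = Math.min(...scores); // lowest mean score of any control
  }
  return { positive, negative };
}

// Made-up control results for one task on two datasets.
const controls = buildControls([
  { task: "label_projection", dataset: "dataset_1", control: "control_a", meanScore: 0.95 },
  { task: "label_projection", dataset: "dataset_1", control: "control_b", meanScore: 0.1 },
  { task: "label_projection", dataset: "dataset_2", control: "control_a", meanScore: 0.9 },
  { task: "label_projection", dataset: "dataset_2", control: "control_b", meanScore: 0.2 },
]);
```

Taking the extremes per dataset, rather than fixing one control method, keeps the baselines meaningful even when a different control performs best or worst on different datasets.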
Metric info
Label projection
Automated cell type annotation from rich, labeled reference data.
A major challenge for integrating single cell datasets is creating matching cell type annotations for each cell. One of the most common strategies for annotating cell types is referred to as “cluster-then-annotate” whereby cells are aggregated into clusters based on feature similarity and then manually characterized based on differential gene expression or previously identified marker genes. Recently, methods have emerged to build on this strategy and annotate cells using known marker genes. However, these strategies pose a difficulty for integrating atlas-scale datasets as the particular annotations may not match.
To ensure that the cell type labels in newly generated datasets match existing reference datasets, some methods align cells to a previously annotated reference dataset and then project labels from the reference to the new dataset.
Here, we compare methods for annotation based on a reference dataset. The datasets consist of two or more samples of single cell profiles that have been manually annotated with matching labels. These datasets are then split into training and test batches, and the task of each method is to train a cell type classifier on the training set and project those labels onto the test set.
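The train-then-project setup can be sketched with a k-nearest-neighbour vote in an embedding space, such as one produced by a foundation model. kNN is an illustrative choice here, not a method required by the benchmark, and the toy two-dimensional embeddings and labels are made up.

```javascript
// Minimal sketch: project labels from a training batch onto a test batch
// with a k-nearest-neighbour majority vote in an embedding space.
function euclidean(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return Math.sqrt(s);
}

function knnPredict(trainX, trainY, testX, k = 3) {
  return testX.map((x) => {
    const nearest = trainX
      .map((t, i) => ({ d: euclidean(x, t), label: trainY[i] }))
      .sort((p, q) => p.d - q.d)
      .slice(0, k);
    const votes = new Map();
    for (const { label } of nearest) votes.set(label, (votes.get(label) || 0) + 1);
    // On a tie, the label met first while walking from nearest to farthest wins.
    let best = null;
    let bestCount = -1;
    for (const [label, count] of votes) {
      if (count > bestCount) {
        best = label;
        bestCount = count;
      }
    }
    return best;
  });
}

function accuracy(pred, truth) {
  return pred.filter((p, i) => p === truth[i]).length / truth.length;
}

// Toy 2-D embeddings for two cell types.
const trainX = [[0, 0], [0.1, 0.2], [0.2, 0.1], [5, 5], [5.1, 4.9], [4.8, 5.2]];
const trainY = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell"];
const testX = [[0.05, 0.1], [5, 5.1]];
const testY = ["T cell", "B cell"];
const predicted = knnPredict(trainX, trainY, testX);
```

In a real evaluation the predicted labels for the test batch would be compared with the held-out manual annotations using the task's metrics; plain accuracy is shown here only as the simplest example.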
Batch Integration
Remove unwanted batch effects from scRNA-seq data while retaining biologically meaningful variation.
As single-cell technologies advance, single-cell datasets are growing both in size and complexity. Especially in consortia such as the Human Cell Atlas, individual studies combine data from multiple labs, each sequencing multiple individuals, possibly with different technologies. This gives rise to complex batch effects in the data that must be computationally removed to perform a joint analysis. These batch integration methods must remove the batch effect while not removing relevant biological information. Currently, over 200 tools exist that aim to remove batch effects from scRNA-seq datasets (Zappia, Phipson, and Oshlack 2018). These methods balance the removal of batch effects with the conservation of nuanced biological information in different ways. This abundance of tools has complicated batch integration method choice, leading to several benchmarks on this topic (Luecken et al. 2021; Tran et al. 2020; Chazarra-Gil et al. 2021; Mereu et al. 2020). Yet, benchmarks use different metrics, method implementations and datasets. Here we build a living benchmarking task for batch integration methods with the vision of improving the consistency of method evaluation.
In this task we evaluate batch integration methods on their ability to remove batch effects in the data while conserving variation attributed to biological effects. As input, methods require either normalised or unnormalised data with multiple batches and consistent cell type labels. The batch integrated output can be a feature matrix, a low dimensional embedding and/or a neighbourhood graph. The respective batch-integrated representation is then evaluated using sets of metrics that capture how well batch effects are removed and whether biological variance is conserved. We have based this particular task on the latest and most extensive benchmark of single-cell data integration methods (Luecken et al. 2021).
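One family of batch-removal metrics uses the silhouette width computed on batch labels: when batches are well mixed, a cell is no closer to its own batch than to the others, so its silhouette is near 0 and the score 1 - |s| is near 1. The sketch below is a simplified, hypothetical version of such a score on toy embeddings. It omits the cell-type stratification used by published versions and is not the metric implementation this task uses.

```javascript
// Simplified sketch of a silhouette-based batch-mixing score: 1 - |s| per
// cell, averaged over cells. No cell-type stratification (a simplification).
function dist(a, b) {
  return Math.sqrt(a.reduce((s, ai, k) => s + (ai - b[k]) ** 2, 0));
}

function batchMixingAsw(embedding, batches) {
  const labels = [...new Set(batches)];
  if (labels.length < 2) throw new Error("need at least two batches");
  const perCell = embedding.map((x, i) => {
    // Mean distance from cell i to the cells of each batch, excluding cell i.
    const meanTo = new Map();
    for (const lab of labels) {
      let sum = 0;
      let n = 0;
      embedding.forEach((y, j) => {
        if (j !== i && batches[j] === lab) {
          sum += dist(x, y);
          n += 1;
        }
      });
      meanTo.set(lab, n > 0 ? sum / n : NaN);
    }
    const a = meanTo.get(batches[i]);
    if (Number.isNaN(a)) return 1; // only cell in its batch: s = 0
    const b = Math.min(...labels.filter((l) => l !== batches[i]).map((l) => meanTo.get(l)));
    const denom = Math.max(a, b);
    const s = denom > 0 ? (b - a) / denom : 0;
    return 1 - Math.abs(s); // 1 = well mixed, near 0 = separated by batch
  });
  return perCell.reduce((sum, v) => sum + v, 0) / perCell.length;
}

// Toy example: interleaved batches versus batches far apart.
const mixed = batchMixingAsw([[0, 0], [1, 1], [1, 0], [0, 1]], ["A", "A", "B", "B"]);
const separated = batchMixingAsw([[0, 0], [0, 1], [10, 0], [10, 1]], ["A", "A", "B", "B"]);
```

A batch-removal score like this rewards mixing only; it says nothing about whether biological variation survived, which is why integration benchmarks pair it with biology-conservation metrics.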
References
Boiarsky, Rebecca, Nalini Singh, Alejandro Buendia, Gad Getz, and David Sontag. 2023. “A deep dive into single-cell RNA sequencing foundation models.” bioRxiv, 2023.10.19.563100. https://doi.org/10.1101/2023.10.19.563100.
Chazarra-Gil, Ruben, Stijn van Dongen, Vladimir Yu Kiselev, and Martin Hemberg. 2021. “Flexible Comparison of Batch Correction Methods for Single-Cell RNA-Seq Using BatchBench.” Nucleic Acids Research 49 (7): e42–42. https://doi.org/10.1093/nar/gkab004.
Chen, Han, Madhavan S Venkatesh, Javier Gomez Ortega, Siddharth V Mahesh, Tarak N Nandi, Ravi K Madduri, Karin Pelka, and Christina V Theodoris. 2024. “Quantized Multi-Task Learning for Context-Specific Representations of Gene Network Dynamics,” August. https://doi.org/10.1101/2024.08.16.608180.
Cui, Haotian, Chloe Wang, Hassaan Maan, Kuan Pang, Fengning Luo, Nan Duan, and Bo Wang. 2024. “scGPT: Toward Building a Foundation Model for Single-Cell Multi-Omics Using Generative AI.” Nature Methods 21 (8): 1470–80. https://doi.org/10.1038/s41592-024-02201-0.
Domínguez Conde, C., C. Xu, L. B. Jarvis, D. B. Rainbow, S. B. Wells, T. Gomes, S. K. Howlett, et al. 2022. “Cross-Tissue Immune Cell Analysis Reveals Tissue-Specific Features in Humans.” Science 376 (6594). https://doi.org/10.1126/science.abl5197.
Eraslan, Gökcen, Eugene Drokhlyansky, Shankara Anand, Evgenij Fiskin, Ayshwarya Subramanian, Michal Slyper, Jiali Wang, et al. 2022. “Single-Nucleus Cross-Tissue Molecular Reference Maps Toward Understanding Disease Gene Function.” Science 376 (6594). https://doi.org/10.1126/science.abl4290.
Heimberg, Graham, Tony Kuo, Daryle DePianto, Tobias Heigl, Nathaniel Diamant, Omar Salem, Gabriele Scalia, et al. 2023. “Scalable Querying of Human Cell Atlases via a Foundational Model Reveals Commonalities Across Fibrosis-Associated Macrophages,” July. https://doi.org/10.1101/2023.07.18.549537.
Jones, Robert C., Jim Karkanias, Mark A. Krasnow, Angela Oliveira Pisco, Stephen R. Quake, Julia Salzman, Nir Yosef, et al. 2022. “The Tabula Sapiens: A Multiple-Organ, Single-Cell Transcriptomic Atlas of Humans.” Science 376 (6594). https://doi.org/10.1126/science.abl4896.
Kalfon, Jérémie, Jules Samaran, Gabriel Peyré, and Laura Cantini. 2024. “scPRINT: Pre-Training on 50 Million Cells Allows Robust Gene Network Predictions,” July. https://doi.org/10.1101/2024.07.29.605556.
Liu, Tianyu, Kexing Li, Yuge Wang, Hongyu Li, and Hongyu Zhao. 2024. “Evaluating the utilities of foundation models in single-cell data analysis.” bioRxiv.org: The Preprint Server for Biology, 2023.09.08.555192. https://doi.org/10.1101/2023.09.08.555192.
Luecken, Malte D., M. Büttner, K. Chaichoompu, A. Danese, M. Interlandi, M. F. Mueller, D. C. Strobl, et al. 2021. “Benchmarking Atlas-Level Data Integration in Single-Cell Genomics.” Nature Methods 19 (1): 41–50. https://doi.org/10.1038/s41592-021-01336-8.
Mereu, Elisabetta, Atefeh Lafzi, Catia Moutinho, Christoph Ziegenhain, Davis J McCarthy, Adrian Alvarez-Varela, Eduard Batlle, et al. 2020. “Benchmarking Single-Cell RNA-Sequencing Protocols for Cell Atlas Projects.” Nature Biotechnology 38 (6): 747–55. https://doi.org/10.1038/s41587-020-0469-4.
Rosen, Yanay, Yusuf Roohani, Ayush Agrawal, Leon Samotorcan, Tabula Sapiens Consortium, Stephen R. Quake, and Jure Leskovec. 2023. “Universal Cell Embeddings: A Foundation Model for Cell Biology,” November. https://doi.org/10.1101/2023.11.28.568918.
Szałata, Artur, Karin Hrovatin, Sören Becker, Alejandro Tejada-Lapuerta, Haotian Cui, Bo Wang, and Fabian J Theis. 2024. “Transformers in single-cell omics: a review and new perspectives.” Nature Methods 21 (8): 1430–43. https://doi.org/10.1038/s41592-024-02353-z.
Theodoris, Christina V., Ling Xiao, Anant Chopra, Mark D. Chaffin, Zeina R. Al Sayed, Matthew C. Hill, Helene Mantineo, et al. 2023. “Transfer Learning Enables Predictions in Network Biology.” Nature 618 (7965): 616–24. https://doi.org/10.1038/s41586-023-06139-9.
Tran, Hoa Thi Nhu, Kok Siong Ang, Marion Chevrier, Xiaomeng Zhang, Nicole Yee Shin Lee, Michelle Goh, and Jinmiao Chen. 2020. “A Benchmark of Batch-Effect Correction Methods for Single-Cell RNA Sequencing Data.” Genome Biology 21 (1). https://doi.org/10.1186/s13059-019-1850-9.
Wilson, Parker C., Yoshiharu Muto, Haojia Wu, Anil Karihaloo, Sushrut S. Waikar, and Benjamin D. Humphreys. 2022. “Multimodal Single Cell Sequencing Implicates Chromatin Accessibility and Genetic Background in Diabetic Kidney Disease Progression.” Nature Communications 13 (1). https://doi.org/10.1038/s41467-022-32972-z.
Zappia, Luke, Belinda Phipson, and Alicia Oshlack. 2018. “Exploring the Single-Cell RNA-Seq Analysis Landscape with the scRNA-Tools Database.” Edited by Dina Schneidman. PLOS Computational Biology 14 (6): e1006245. https://doi.org/10.1371/journal.pcbi.1006245.