This page was generated from a Jupyter notebook. You can download the notebook here to run it locally in Jupyter.

Human heart failure

In this notebook, we show how to infer cell-level spatial transcriptome and how to perform analyses on the tissue samples from human heart failure patients with myocardial infarction.

  • Thor inference

  • Pathway enrichment analysis

  • Transcription factor analyses

  • Semi-supervised annotation of fibrotic cells (link to external video tutorials to Mjolnir platform, which is to be launched)

The input data and Thor processed data can be downloaded from our google drive or from the original Nature paper. Aliases (in parentesis) are used in our manuscript of the sample names appeared in the original paper: RZ_GT_P2(RZ1), RZ_P11(RZ2), GT_IZ_P9(IZ1), IZ_P10(IZ2), FZ_GT_P19(FZ1), FZ_P18(FZ2).

For installation of Thor, please refer to this installation guide.

Import the packages

[1]:
import os
import logging
import datetime
import numpy as np

logger = logging.getLogger()
logger.setLevel(logging.INFO)
logging.basicConfig(format='%(name)s - %(levelname)s - %(message)s')

now = datetime.datetime.now()
logger.info(f"Current Time: {now}")
root - INFO - Current Time: 2023-12-13 00:13:40.927335
[2]:
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import scanpy as sc
import decoupler as dc
import warnings

warnings.filterwarnings("ignore")
sc.set_figure_params(scanpy=True, dpi=150, dpi_save=300)
sc.settings.verbosity = 'error'

from thor.analy import get_pathway_score, get_tf_activity
from thor.finest import fineST
from thor.pl import colors

Predicting cell-level gene expression using Markov graph diffusion

There are in total of six samples. We use sample “RZ_GT_P2” as an exmaple. All the process and parameters were kept the same.

We also skip the preprocessing of the images and the spatial transcriptome data in this tutorial. Please refer to the other two tutorials for preprocessing. The preprocessed data can be downloaded from the same google drive link.

[3]:
samples = ['RZ_GT_P2', 'RZ_P11', 'GT_IZ_P9', 'IZ_P10', 'FZ_GT_P19', 'FZ_P18']

library_names = {
    'FZ_P20': 'Visium_16_CK294',
    'FZ_GT_P19': 'Visium_14_CK292',
    'GT_IZ_P9': 'Visium_9_CK287',
    'IZ_P10': 'Visium_7_CK285',
    'RZ_P11': 'Visium_8_CK286',
    'RZ_GT_P2': 'Visium_5_CK283'
    }
[4]:
sn = samples[0]

# Below are the files you needed to run Thor
image_path = os.path.join("HE_image", sn, f"{library_names[sn]}.tif")
cell_features_csv_path = os.path.join("WSI_HF", sn, f"features_{sn}.csv")
cell_mask_path = os.path.join("WSI_HF", sn, f"nucleis_{sn}_mask.npz")
spot_adata_path = os.path.join("Spatial_HF", sn, f"{sn}_adata_spots_thor_processed.h5ad")

rz = fineST(
    image_path,
    sn,
    spot_adata_path=spot_adata_path,
    cell_features_csv_path=cell_features_csv_path,
    recipe='gene'

    )
rz.prepare_input(mapping_margin=10)

[12/13/23 00:13:46] INFO     Thor: Please check alignment of cells and spots
                    INFO     Thor: The first two columns in the node_feat DataFrame need to match the spatial coordinates from obsm['spatial'].
                    INFO     Thor: Mapping cells to the closest spots within 10 x the spot radius
../_images/notebooks_run_thor_analyses_HF_7_1.png
[5]:
rz.adata
[5]:
AnnData object with n_obs × n_vars = 32994 × 14467
    obs: 'n_counts', 'n_genes', 'percent.mt', 'Adipocyte', 'Cardiomyocyte', 'Endothelial', 'Fibroblast', 'Lymphoid', 'Mast', 'Myeloid', 'Neuronal', 'Pericyte', 'Cycling.cells', 'vSMCs', 'cell_type_original', 'assay_ontology_term_id', 'cell_type_ontology_term_id', 'development_stage_ontology_term_id', 'disease_ontology_term_id', 'ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sex_ontology_term_id', 'tissue_ontology_term_id', 'spot_barcodes', 'x', 'y', 'mean_gray', 'std_gray', 'entropy_img', 'mean_r', 'mean_g', 'mean_b', 'std_r', 'std_g', 'std_b', 'spot_heterogeneity'
    var: 'features'
    uns: 'X_approximate_distribution', 'X_normalization', 'default_embedding', 'schema_version', 'spatial', 'title', 'cell_image_props'
    obsm: 'X_pca', 'X_spatial', 'X_umap', 'spatial'

Run time ~ 10 mins for 32994 cells and 14467 genes on a MacbookPro M2 laptop.

[6]:
rz.set_genes_for_prediction(genes_selection_key='all')

rz.set_params(
    out_prefix="fineST",
    is_rawCounts=False,
    layer=None,
    is_rawCount=True,
    write_freq=20,
    n_iter=20,
    smoothing_scale=0.8,
    node_features_obs_list=['spot_heterogeneity'],
    n_neighbors=5,
    geom_morph_ratio=1,
    inflation_percentage=None,
    regulate_expression_mean=False,
    stochastic_expression_neighbors_level=None,
    n_jobs=8)

rz.predict_gene_expression()

[12/12/23 23:00:47] WARNING  Thor: Using all the genes. This may not be optimal and can be slow.
                    INFO     Thor: Using mode gene
                    INFO     Thor: Construct SNN with morphological features: ['mean_gray', 'std_gray', 'entropy_img', 'mean_r', 'mean_g', 'mean_b',
                             'std_r', 'std_g', 'std_b'].
[12/12/23 23:00:48] INFO     Thor: Finish constructing SNN
                    INFO     Thor: Add adata.obsp["snn_connectivities"]
                    INFO     Thor: Add adata.obsp["snn_knn_connectivities"]
                    INFO     Thor: Add adata.uns["snn"]
                    INFO     Thor: Weigh cells according to the spot heterogeneity.
                    INFO     Thor: Promote flow of information from more homogeneous cells to less.
                    INFO     Thor: Eliminate low quality edges (<0.1) between cells.
                    INFO     Thor: Compute transition matrix
                    INFO     Thor: self weight scale is set to: 0.200
                    INFO     Thor: Added transition matrix to adata.obsp["snn_transition_matrix"]
... storing 'spot_barcodes' as categorical
[12/12/23 23:00:49] INFO     Thor: fineST estimation starts.
Creating piles: 100%|██████████| 8/8 [00:20<00:00,  2.60s/it]
MCMC Iteration: 100%|██████████| 20/20 [07:06<00:00, 21.31s/it]
MCMC Iteration: 100%|██████████| 20/20 [07:09<00:00, 21.46s/it]
MCMC Iteration: 100%|██████████| 20/20 [07:08<00:00, 21.43s/it]
MCMC Iteration: 100%|██████████| 20/20 [07:04<00:00, 21.22s/it]
MCMC Iteration: 100%|██████████| 20/20 [07:12<00:00, 21.63s/it]
MCMC Iteration: 100%|██████████| 20/20 [07:10<00:00, 21.50s/it]
MCMC Iteration: 100%|██████████| 20/20 [07:03<00:00, 21.20s/it]
MCMC Iteration: 100%|██████████| 20/20 [07:01<00:00, 21.08s/it]
[12/12/23 23:10:33] INFO     Thor: fineST estimation finished.
                    
                    
[7]:
ad_thor = rz.load_result("fineST_20.npz")
sc.pp.normalize_total(ad_thor)
sc.pp.log1p(ad_thor)
ad_thor
[7]:
AnnData object with n_obs × n_vars = 32994 × 14467
    obs: 'n_counts', 'n_genes', 'percent.mt', 'Adipocyte', 'Cardiomyocyte', 'Endothelial', 'Fibroblast', 'Lymphoid', 'Mast', 'Myeloid', 'Neuronal', 'Pericyte', 'Cycling.cells', 'vSMCs', 'cell_type_original', 'assay_ontology_term_id', 'cell_type_ontology_term_id', 'development_stage_ontology_term_id', 'disease_ontology_term_id', 'ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sex_ontology_term_id', 'tissue_ontology_term_id', 'spot_barcodes', 'x', 'y', 'mean_gray', 'std_gray', 'entropy_img', 'mean_r', 'mean_g', 'mean_b', 'std_r', 'std_g', 'std_b', 'spot_heterogeneity', 'node_weights'
    var: 'features', 'used_for_prediction', 'used_for_reduced', 'used_for_vae'
    uns: 'X_approximate_distribution', 'X_normalization', 'default_embedding', 'schema_version', 'spatial', 'title', 'cell_image_props', 'snn', 'log1p'
    obsm: 'X_pca', 'X_spatial', 'X_umap', 'spatial'
    obsp: 'snn_connectivities', 'snn_knn_connectivities', 'snn_transition_matrix'
[8]:
palette = colors.continuous_palettes['blueYellow']
cmap = LinearSegmentedColormap.from_list('my_cmap', palette, N=256)
[9]:
sc.pl.spatial(ad_thor, color=['PDGFRA', 'CASQ2', 'FBLN2', 'PKP2'], spot_size=40, use_raw=False, ncols=4, cmap=cmap, vmin='p1', vmax='p99')
../_images/notebooks_run_thor_analyses_HF_13_0.png

Pathway enrichment analyses

Get the gene sets for the GO_BP pathways

[10]:
import decoupler as dc

msigdb = dc.get_resource('MSigDB')
msigdb = msigdb[msigdb['collection'] == 'go_biological_process']
msigdb = msigdb[~msigdb.duplicated(['geneset', 'genesymbol'])]
msigdb['geneset'] = [name.split('GOBP_')[1] for name in msigdb['geneset']]
msigdb
root - INFO - Downloading data from `https://omnipathdb.org/queries/enzsub?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/queries/interactions?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/queries/complexes?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/queries/annotations?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/queries/intercell?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/about?format=text`
root - INFO - Downloading annotations for all proteins from the following resources: `['MSigDB']`
[10]:
genesymbol collection geneset
33 MAFF go_biological_process EMBRYO_DEVELOPMENT
44 MAFF go_biological_process POSITIVE_REGULATION_OF_RNA_METABOLIC_PROCESS
82 MAFF go_biological_process REGULATION_OF_EPITHELIAL_CELL_DIFFERENTIATION
94 MAFF go_biological_process EMBRYO_DEVELOPMENT_ENDING_IN_BIRTH_OR_EGG_HATC...
108 MAFF go_biological_process IN_UTERO_EMBRYONIC_DEVELOPMENT
... ... ... ...
3838543 PRAMEF22 go_biological_process POSITIVE_REGULATION_OF_CELL_POPULATION_PROLIFE...
3838544 PRAMEF22 go_biological_process APOPTOTIC_PROCESS
3838545 PRAMEF22 go_biological_process REGULATION_OF_CELL_DEATH
3838546 PRAMEF22 go_biological_process NEGATIVE_REGULATION_OF_DEVELOPMENTAL_PROCESS
3838547 PRAMEF22 go_biological_process NEGATIVE_REGULATION_OF_CELL_DEATH

620627 rows × 3 columns

[11]:
ptw_list = [
    'POSITIVE_REGULATION_OF_T_CELL_PROLIFERATION',
    'POSITIVE_REGULATION_OF_INFLAMMATORY_RESPONSE',
    'POSITIVE_REGULATION_OF_IMMUNE_EFFECTOR_PROCESS',
    'POSITIVE_REGULATION_OF_B_CELL_ACTIVATION',
    'POSITIVE_REGULATION_OF_FIBROBLAST_PROLIFERATION',
    'COLLAGEN_FIBRIL_ORGANIZATION',
    'CARDIAC_MUSCLE_CONTRACTION'
    ]
msigdb = msigdb[msigdb.geneset.isin(ptw_list)]
msigdb
[11]:
genesymbol collection geneset
3181 GPR183 go_biological_process POSITIVE_REGULATION_OF_B_CELL_ACTIVATION
5388 RIPK2 go_biological_process POSITIVE_REGULATION_OF_IMMUNE_EFFECTOR_PROCESS
5673 RIPK2 go_biological_process POSITIVE_REGULATION_OF_T_CELL_PROLIFERATION
12886 CEBPB go_biological_process POSITIVE_REGULATION_OF_INFLAMMATORY_RESPONSE
14576 SPHK1 go_biological_process POSITIVE_REGULATION_OF_FIBROBLAST_PROLIFERATION
... ... ... ...
3815299 SASH3 go_biological_process POSITIVE_REGULATION_OF_IMMUNE_EFFECTOR_PROCESS
3815331 SASH3 go_biological_process POSITIVE_REGULATION_OF_T_CELL_PROLIFERATION
3815499 SASH3 go_biological_process POSITIVE_REGULATION_OF_B_CELL_ACTIVATION
3826247 TAFAZZIN go_biological_process CARDIAC_MUSCLE_CONTRACTION
3838037 HLA-DRB3 go_biological_process POSITIVE_REGULATION_OF_IMMUNE_EFFECTOR_PROCESS

843 rows × 3 columns

Calculate the enrichment score using decoupler

[12]:
acts = get_pathway_score(ad_thor, layer=None, net_df=msigdb)
acts
Running ora on mat with 32994 samples and 14467 targets for 7 sources.
100%|██████████| 32994/32994 [00:28<00:00, 1175.86it/s]
[12]:
AnnData object with n_obs × n_vars = 32994 × 7
    obs: 'n_counts', 'n_genes', 'percent.mt', 'Adipocyte', 'Cardiomyocyte', 'Endothelial', 'Fibroblast', 'Lymphoid', 'Mast', 'Myeloid', 'Neuronal', 'Pericyte', 'Cycling.cells', 'vSMCs', 'cell_type_original', 'assay_ontology_term_id', 'cell_type_ontology_term_id', 'development_stage_ontology_term_id', 'disease_ontology_term_id', 'ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sex_ontology_term_id', 'tissue_ontology_term_id', 'spot_barcodes', 'x', 'y', 'mean_gray', 'std_gray', 'entropy_img', 'mean_r', 'mean_g', 'mean_b', 'std_r', 'std_g', 'std_b', 'spot_heterogeneity', 'node_weights'
    uns: 'X_approximate_distribution', 'X_normalization', 'cell_image_props', 'default_embedding', 'log1p', 'schema_version', 'snn', 'spatial', 'title'
    obsm: 'X_pca', 'X_spatial', 'X_umap', 'ora_estimate', 'ora_pvals', 'spatial', 'msigdb_ora_estimate', 'msigdb_ora_pvals'
    layers: 'smoothed'

The pathway enrichment scores can be visualized on our interactive browser Mjolnir or using SCANPY as follows

[13]:
for pathway in ptw_list:
    sc.pl.spatial(acts, color=pathway, layer='smoothed', spot_size=30, use_raw=False, ncols=3, cmap='rainbow', frameon=False, title=pathway.lower(), colorbar_loc=None, alpha=0.5, alpha_img=0.7)
../_images/notebooks_run_thor_analyses_HF_21_0.png
../_images/notebooks_run_thor_analyses_HF_21_1.png
../_images/notebooks_run_thor_analyses_HF_21_2.png
../_images/notebooks_run_thor_analyses_HF_21_3.png
../_images/notebooks_run_thor_analyses_HF_21_4.png
../_images/notebooks_run_thor_analyses_HF_21_5.png
../_images/notebooks_run_thor_analyses_HF_21_6.png

Transcription factor activity inference

Download the CollecTRI database.

[14]:
net = dc.get_collectri(organism='human', split_complexes=False)
net
root - INFO - Downloading data from `https://omnipathdb.org/queries/enzsub?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/queries/interactions?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/queries/complexes?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/queries/annotations?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/queries/intercell?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/about?format=text`
[14]:
source target weight PMID
0 MYC TERT 1 10022128;10491298;10606235;10637317;10723141;1...
1 SPI1 BGLAP 1 10022617
2 SMAD3 JUN 1 10022869;12374795
3 SMAD4 JUN 1 10022869;12374795
4 STAT5A IL2 1 10022878;11435608;17182565;17911616;22854263;2...
... ... ... ... ...
43173 NFKB hsa-miR-143-3p 1 19472311
43174 AP1 hsa-miR-206 1 19721712
43175 NFKB hsa-miR-21-5p 1 20813833;22387281
43176 NFKB hsa-miR-224-5p 1 23474441;23988648
43177 AP1 hsa-miR-144 1 23546882

43178 rows × 4 columns

Calculate the TF activity based on regulated gene expression using decoupler

[15]:
acts = get_tf_activity(ad_thor, layer=None, net_df=net)
acts
Running ulm on mat with 32994 samples and 14467 targets for 659 sources.
100%|██████████| 4/4 [00:03<00:00,  1.20it/s]
[15]:
AnnData object with n_obs × n_vars = 32994 × 659
    obs: 'n_counts', 'n_genes', 'percent.mt', 'Adipocyte', 'Cardiomyocyte', 'Endothelial', 'Fibroblast', 'Lymphoid', 'Mast', 'Myeloid', 'Neuronal', 'Pericyte', 'Cycling.cells', 'vSMCs', 'cell_type_original', 'assay_ontology_term_id', 'cell_type_ontology_term_id', 'development_stage_ontology_term_id', 'disease_ontology_term_id', 'ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sex_ontology_term_id', 'tissue_ontology_term_id', 'spot_barcodes', 'x', 'y', 'mean_gray', 'std_gray', 'entropy_img', 'mean_r', 'mean_g', 'mean_b', 'std_r', 'std_g', 'std_b', 'spot_heterogeneity', 'node_weights'
    uns: 'X_approximate_distribution', 'X_normalization', 'cell_image_props', 'default_embedding', 'log1p', 'schema_version', 'snn', 'spatial', 'title'
    obsm: 'X_pca', 'X_spatial', 'X_umap', 'ora_estimate', 'ora_pvals', 'spatial', 'ulm_estimate', 'ulm_pvals', 'msigdb_ulm_estimate', 'msigdb_ulm_pvals'
    layers: 'smoothed'

The TF activity score can be visualized on our interactive browser Mjolnir or using SCANPY as follows

[16]:
tf_list = ['MYB', 'STAT4', 'SMAD3']
for tf in tf_list:
    sc.pl.spatial(acts, color=tf, layer='smoothed', spot_size=30, use_raw=False, ncols=3, cmap='rainbow', frameon=False, title=tf, alpha=0.5, alpha_img=0.7)
../_images/notebooks_run_thor_analyses_HF_28_0.png
../_images/notebooks_run_thor_analyses_HF_28_1.png
../_images/notebooks_run_thor_analyses_HF_28_2.png

Semi-supervised annotation

The semi-supervised annotation of similar cells can be performed on our web-based platform Mjolnir. Below are three video tips for how to use mjolnir.