This page was generated from a Jupyter notebook. You can download the notebook here to run it locally in Jupyter.

Human heart failure

In this notebook, we show how to infer cell-level spatial transcriptome and how to perform analyses on the tissue samples from human heart failure patients with myocardial infarction.

Thor inference
Pathway enrichment analysis
Transcription factor analyses
Semi-supervised annotation of fibrotic cells (link to external video tutorials to Mjolnir platform, which is to be launched)

The input data and Thor processed data can be downloaded from our google drive or from the original Nature paper. Aliases (in parentesis) are used in our manuscript of the sample names appeared in the original paper: RZ_GT_P2(RZ1), RZ_P11(RZ2), GT_IZ_P9(IZ1), IZ_P10(IZ2), FZ_GT_P19(FZ1), FZ_P18(FZ2).

For installation of Thor, please refer to this installation guide.

Import the packages

[1]:

import os
import logging
import datetime
import numpy as np

logger = logging.getLogger()
logger.setLevel(logging.INFO)
logging.basicConfig(format='%(name)s - %(levelname)s - %(message)s')

now = datetime.datetime.now()
logger.info(f"Current Time: {now}")

root - INFO - Current Time: 2023-12-13 00:13:40.927335

[2]:

%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import scanpy as sc
import decoupler as dc
import warnings

warnings.filterwarnings("ignore")
sc.set_figure_params(scanpy=True, dpi=150, dpi_save=300)
sc.settings.verbosity = 'error'

from thor.analy import get_pathway_score, get_tf_activity
from thor.finest import fineST
from thor.pl import colors

Predicting cell-level gene expression using Markov graph diffusion

There are in total of six samples. We use sample “RZ_GT_P2” as an exmaple. All the process and parameters were kept the same.

We also skip the preprocessing of the images and the spatial transcriptome data in this tutorial. Please refer to the other two tutorials for preprocessing. The preprocessed data can be downloaded from the same google drive link.

[3]:

samples = ['RZ_GT_P2', 'RZ_P11', 'GT_IZ_P9', 'IZ_P10', 'FZ_GT_P19', 'FZ_P18']

library_names = {
    'FZ_P20': 'Visium_16_CK294',
    'FZ_GT_P19': 'Visium_14_CK292',
    'GT_IZ_P9': 'Visium_9_CK287',
    'IZ_P10': 'Visium_7_CK285',
    'RZ_P11': 'Visium_8_CK286',
    'RZ_GT_P2': 'Visium_5_CK283'
    }

[4]:

sn = samples[0]

# Below are the files you needed to run Thor
image_path = os.path.join("HE_image", sn, f"{library_names[sn]}.tif")
cell_features_csv_path = os.path.join("WSI_HF", sn, f"features_{sn}.csv")
cell_mask_path = os.path.join("WSI_HF", sn, f"nucleis_{sn}_mask.npz")
spot_adata_path = os.path.join("Spatial_HF", sn, f"{sn}_adata_spots_thor_processed.h5ad")

rz = fineST(
    image_path,
    sn,
    spot_adata_path=spot_adata_path,
    cell_features_csv_path=cell_features_csv_path,
    recipe='gene'

    )
rz.prepare_input(mapping_margin=10)

[12/13/23 00:13:46] INFO     Thor: Please check alignment of cells and spots
                    INFO     Thor: The first two columns in the node_feat DataFrame need to match the spatial coordinates from obsm['spatial'].
                    INFO     Thor: Mapping cells to the closest spots within 10 x the spot radius

../_images/notebooks_run_thor_analyses_HF_7_1.png

[5]:

rz.adata

[5]:

AnnData object with n_obs × n_vars = 32994 × 14467
    obs: 'n_counts', 'n_genes', 'percent.mt', 'Adipocyte', 'Cardiomyocyte', 'Endothelial', 'Fibroblast', 'Lymphoid', 'Mast', 'Myeloid', 'Neuronal', 'Pericyte', 'Cycling.cells', 'vSMCs', 'cell_type_original', 'assay_ontology_term_id', 'cell_type_ontology_term_id', 'development_stage_ontology_term_id', 'disease_ontology_term_id', 'ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sex_ontology_term_id', 'tissue_ontology_term_id', 'spot_barcodes', 'x', 'y', 'mean_gray', 'std_gray', 'entropy_img', 'mean_r', 'mean_g', 'mean_b', 'std_r', 'std_g', 'std_b', 'spot_heterogeneity'
    var: 'features'
    uns: 'X_approximate_distribution', 'X_normalization', 'default_embedding', 'schema_version', 'spatial', 'title', 'cell_image_props'
    obsm: 'X_pca', 'X_spatial', 'X_umap', 'spatial'

Run time ~ 10 mins for 32994 cells and 14467 genes on a MacbookPro M2 laptop.

[6]:

rz.set_genes_for_prediction(genes_selection_key='all')

rz.set_params(
    out_prefix="fineST",
    is_rawCounts=False,
    layer=None,
    is_rawCount=True,
    write_freq=20,
    n_iter=20,
    smoothing_scale=0.8,
    node_features_obs_list=['spot_heterogeneity'],
    n_neighbors=5,
    geom_morph_ratio=1,
    inflation_percentage=None,
    regulate_expression_mean=False,
    stochastic_expression_neighbors_level=None,
    n_jobs=8)

rz.predict_gene_expression()

[12/12/23 23:00:47] WARNING  Thor: Using all the genes. This may not be optimal and can be slow.
                    INFO     Thor: Using mode gene
                    INFO     Thor: Construct SNN with morphological features: ['mean_gray', 'std_gray', 'entropy_img', 'mean_r', 'mean_g', 'mean_b',
                             'std_r', 'std_g', 'std_b'].
[12/12/23 23:00:48] INFO     Thor: Finish constructing SNN
                    INFO     Thor: Add adata.obsp["snn_connectivities"]
                    INFO     Thor: Add adata.obsp["snn_knn_connectivities"]
                    INFO     Thor: Add adata.uns["snn"]
                    INFO     Thor: Weigh cells according to the spot heterogeneity.
                    INFO     Thor: Promote flow of information from more homogeneous cells to less.
                    INFO     Thor: Eliminate low quality edges (<0.1) between cells.
                    INFO     Thor: Compute transition matrix
                    INFO     Thor: self weight scale is set to: 0.200
                    INFO     Thor: Added transition matrix to adata.obsp["snn_transition_matrix"]

... storing 'spot_barcodes' as categorical

[12/12/23 23:00:49] INFO     Thor: fineST estimation starts.

Creating piles: 100%|██████████| 8/8 [00:20<00:00,  2.60s/it]
MCMC Iteration: 100%|██████████| 20/20 [07:06<00:00, 21.31s/it]
MCMC Iteration: 100%|██████████| 20/20 [07:09<00:00, 21.46s/it]
MCMC Iteration: 100%|██████████| 20/20 [07:08<00:00, 21.43s/it]
MCMC Iteration: 100%|██████████| 20/20 [07:04<00:00, 21.22s/it]
MCMC Iteration: 100%|██████████| 20/20 [07:12<00:00, 21.63s/it]
MCMC Iteration: 100%|██████████| 20/20 [07:10<00:00, 21.50s/it]
MCMC Iteration: 100%|██████████| 20/20 [07:03<00:00, 21.20s/it]
MCMC Iteration: 100%|██████████| 20/20 [07:01<00:00, 21.08s/it]

[12/12/23 23:10:33] INFO     Thor: fineST estimation finished.

[7]:

ad_thor = rz.load_result("fineST_20.npz")
sc.pp.normalize_total(ad_thor)
sc.pp.log1p(ad_thor)
ad_thor

[7]:

AnnData object with n_obs × n_vars = 32994 × 14467
    obs: 'n_counts', 'n_genes', 'percent.mt', 'Adipocyte', 'Cardiomyocyte', 'Endothelial', 'Fibroblast', 'Lymphoid', 'Mast', 'Myeloid', 'Neuronal', 'Pericyte', 'Cycling.cells', 'vSMCs', 'cell_type_original', 'assay_ontology_term_id', 'cell_type_ontology_term_id', 'development_stage_ontology_term_id', 'disease_ontology_term_id', 'ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sex_ontology_term_id', 'tissue_ontology_term_id', 'spot_barcodes', 'x', 'y', 'mean_gray', 'std_gray', 'entropy_img', 'mean_r', 'mean_g', 'mean_b', 'std_r', 'std_g', 'std_b', 'spot_heterogeneity', 'node_weights'
    var: 'features', 'used_for_prediction', 'used_for_reduced', 'used_for_vae'
    uns: 'X_approximate_distribution', 'X_normalization', 'default_embedding', 'schema_version', 'spatial', 'title', 'cell_image_props', 'snn', 'log1p'
    obsm: 'X_pca', 'X_spatial', 'X_umap', 'spatial'
    obsp: 'snn_connectivities', 'snn_knn_connectivities', 'snn_transition_matrix'

[8]:

palette = colors.continuous_palettes['blueYellow']
cmap = LinearSegmentedColormap.from_list('my_cmap', palette, N=256)

[9]:

sc.pl.spatial(ad_thor, color=['PDGFRA', 'CASQ2', 'FBLN2', 'PKP2'], spot_size=40, use_raw=False, ncols=4, cmap=cmap, vmin='p1', vmax='p99')

../_images/notebooks_run_thor_analyses_HF_13_0.png

Pathway enrichment analyses

Get the gene sets for the GO_BP pathways

[10]:

import decoupler as dc

msigdb = dc.get_resource('MSigDB')
msigdb = msigdb[msigdb['collection'] == 'go_biological_process']
msigdb = msigdb[~msigdb.duplicated(['geneset', 'genesymbol'])]
msigdb['geneset'] = [name.split('GOBP_')[1] for name in msigdb['geneset']]
msigdb

root - INFO - Downloading data from `https://omnipathdb.org/queries/enzsub?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/queries/interactions?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/queries/complexes?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/queries/annotations?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/queries/intercell?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/about?format=text`
root - INFO - Downloading annotations for all proteins from the following resources: `['MSigDB']`

[10]:

	genesymbol	collection	geneset
33	MAFF	go_biological_process	EMBRYO_DEVELOPMENT
44	MAFF	go_biological_process	POSITIVE_REGULATION_OF_RNA_METABOLIC_PROCESS
82	MAFF	go_biological_process	REGULATION_OF_EPITHELIAL_CELL_DIFFERENTIATION
94	MAFF	go_biological_process	EMBRYO_DEVELOPMENT_ENDING_IN_BIRTH_OR_EGG_HATC...
108	MAFF	go_biological_process	IN_UTERO_EMBRYONIC_DEVELOPMENT
...	...	...	...
3838543	PRAMEF22	go_biological_process	POSITIVE_REGULATION_OF_CELL_POPULATION_PROLIFE...
3838544	PRAMEF22	go_biological_process	APOPTOTIC_PROCESS
3838545	PRAMEF22	go_biological_process	REGULATION_OF_CELL_DEATH
3838546	PRAMEF22	go_biological_process	NEGATIVE_REGULATION_OF_DEVELOPMENTAL_PROCESS
3838547	PRAMEF22	go_biological_process	NEGATIVE_REGULATION_OF_CELL_DEATH

620627 rows × 3 columns

[11]:

ptw_list = [
    'POSITIVE_REGULATION_OF_T_CELL_PROLIFERATION',
    'POSITIVE_REGULATION_OF_INFLAMMATORY_RESPONSE',
    'POSITIVE_REGULATION_OF_IMMUNE_EFFECTOR_PROCESS',
    'POSITIVE_REGULATION_OF_B_CELL_ACTIVATION',
    'POSITIVE_REGULATION_OF_FIBROBLAST_PROLIFERATION',
    'COLLAGEN_FIBRIL_ORGANIZATION',
    'CARDIAC_MUSCLE_CONTRACTION'
    ]
msigdb = msigdb[msigdb.geneset.isin(ptw_list)]
msigdb

[11]:

	genesymbol	collection	geneset
3181	GPR183	go_biological_process	POSITIVE_REGULATION_OF_B_CELL_ACTIVATION
5388	RIPK2	go_biological_process	POSITIVE_REGULATION_OF_IMMUNE_EFFECTOR_PROCESS
5673	RIPK2	go_biological_process	POSITIVE_REGULATION_OF_T_CELL_PROLIFERATION
12886	CEBPB	go_biological_process	POSITIVE_REGULATION_OF_INFLAMMATORY_RESPONSE
14576	SPHK1	go_biological_process	POSITIVE_REGULATION_OF_FIBROBLAST_PROLIFERATION
...	...	...	...
3815299	SASH3	go_biological_process	POSITIVE_REGULATION_OF_IMMUNE_EFFECTOR_PROCESS
3815331	SASH3	go_biological_process	POSITIVE_REGULATION_OF_T_CELL_PROLIFERATION
3815499	SASH3	go_biological_process	POSITIVE_REGULATION_OF_B_CELL_ACTIVATION
3826247	TAFAZZIN	go_biological_process	CARDIAC_MUSCLE_CONTRACTION
3838037	HLA-DRB3	go_biological_process	POSITIVE_REGULATION_OF_IMMUNE_EFFECTOR_PROCESS

843 rows × 3 columns

Calculate the enrichment score using decoupler

[12]:

acts = get_pathway_score(ad_thor, layer=None, net_df=msigdb)
acts

Running ora on mat with 32994 samples and 14467 targets for 7 sources.

100%|██████████| 32994/32994 [00:28<00:00, 1175.86it/s]

[12]:

AnnData object with n_obs × n_vars = 32994 × 7
    obs: 'n_counts', 'n_genes', 'percent.mt', 'Adipocyte', 'Cardiomyocyte', 'Endothelial', 'Fibroblast', 'Lymphoid', 'Mast', 'Myeloid', 'Neuronal', 'Pericyte', 'Cycling.cells', 'vSMCs', 'cell_type_original', 'assay_ontology_term_id', 'cell_type_ontology_term_id', 'development_stage_ontology_term_id', 'disease_ontology_term_id', 'ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sex_ontology_term_id', 'tissue_ontology_term_id', 'spot_barcodes', 'x', 'y', 'mean_gray', 'std_gray', 'entropy_img', 'mean_r', 'mean_g', 'mean_b', 'std_r', 'std_g', 'std_b', 'spot_heterogeneity', 'node_weights'
    uns: 'X_approximate_distribution', 'X_normalization', 'cell_image_props', 'default_embedding', 'log1p', 'schema_version', 'snn', 'spatial', 'title'
    obsm: 'X_pca', 'X_spatial', 'X_umap', 'ora_estimate', 'ora_pvals', 'spatial', 'msigdb_ora_estimate', 'msigdb_ora_pvals'
    layers: 'smoothed'

The pathway enrichment scores can be visualized on our interactive browser Mjolnir or using SCANPY as follows

[13]:

for pathway in ptw_list:
    sc.pl.spatial(acts, color=pathway, layer='smoothed', spot_size=30, use_raw=False, ncols=3, cmap='rainbow', frameon=False, title=pathway.lower(), colorbar_loc=None, alpha=0.5, alpha_img=0.7)

../_images/notebooks_run_thor_analyses_HF_21_0.png

../_images/notebooks_run_thor_analyses_HF_21_1.png

../_images/notebooks_run_thor_analyses_HF_21_2.png

../_images/notebooks_run_thor_analyses_HF_21_3.png

../_images/notebooks_run_thor_analyses_HF_21_4.png

../_images/notebooks_run_thor_analyses_HF_21_5.png

../_images/notebooks_run_thor_analyses_HF_21_6.png

Transcription factor activity inference

Download the CollecTRI database.

[14]:

net = dc.get_collectri(organism='human', split_complexes=False)
net

root - INFO - Downloading data from `https://omnipathdb.org/queries/enzsub?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/queries/interactions?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/queries/complexes?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/queries/annotations?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/queries/intercell?format=json`
root - INFO - Downloading data from `https://omnipathdb.org/about?format=text`

[14]:

	source	target	weight	PMID
0	MYC	TERT	1	10022128;10491298;10606235;10637317;10723141;1...
1	SPI1	BGLAP	1	10022617
2	SMAD3	JUN	1	10022869;12374795
3	SMAD4	JUN	1	10022869;12374795
4	STAT5A	IL2	1	10022878;11435608;17182565;17911616;22854263;2...
...	...	...	...	...
43173	NFKB	hsa-miR-143-3p	1	19472311
43174	AP1	hsa-miR-206	1	19721712
43175	NFKB	hsa-miR-21-5p	1	20813833;22387281
43176	NFKB	hsa-miR-224-5p	1	23474441;23988648
43177	AP1	hsa-miR-144	1	23546882

43178 rows × 4 columns

Calculate the TF activity based on regulated gene expression using decoupler

[15]:

acts = get_tf_activity(ad_thor, layer=None, net_df=net)
acts

Running ulm on mat with 32994 samples and 14467 targets for 659 sources.

100%|██████████| 4/4 [00:03<00:00,  1.20it/s]

[15]:

AnnData object with n_obs × n_vars = 32994 × 659
    obs: 'n_counts', 'n_genes', 'percent.mt', 'Adipocyte', 'Cardiomyocyte', 'Endothelial', 'Fibroblast', 'Lymphoid', 'Mast', 'Myeloid', 'Neuronal', 'Pericyte', 'Cycling.cells', 'vSMCs', 'cell_type_original', 'assay_ontology_term_id', 'cell_type_ontology_term_id', 'development_stage_ontology_term_id', 'disease_ontology_term_id', 'ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sex_ontology_term_id', 'tissue_ontology_term_id', 'spot_barcodes', 'x', 'y', 'mean_gray', 'std_gray', 'entropy_img', 'mean_r', 'mean_g', 'mean_b', 'std_r', 'std_g', 'std_b', 'spot_heterogeneity', 'node_weights'
    uns: 'X_approximate_distribution', 'X_normalization', 'cell_image_props', 'default_embedding', 'log1p', 'schema_version', 'snn', 'spatial', 'title'
    obsm: 'X_pca', 'X_spatial', 'X_umap', 'ora_estimate', 'ora_pvals', 'spatial', 'ulm_estimate', 'ulm_pvals', 'msigdb_ulm_estimate', 'msigdb_ulm_pvals'
    layers: 'smoothed'

The TF activity score can be visualized on our interactive browser Mjolnir or using SCANPY as follows

[16]:

tf_list = ['MYB', 'STAT4', 'SMAD3']
for tf in tf_list:
    sc.pl.spatial(acts, color=tf, layer='smoothed', spot_size=30, use_raw=False, ncols=3, cmap='rainbow', frameon=False, title=tf, alpha=0.5, alpha_img=0.7)

../_images/notebooks_run_thor_analyses_HF_28_0.png

../_images/notebooks_run_thor_analyses_HF_28_1.png

../_images/notebooks_run_thor_analyses_HF_28_2.png

Semi-supervised annotation

The semi-supervised annotation of similar cells can be performed on our web-based platform Mjolnir. Below are three video tips for how to use mjolnir.