This page was generated from a Jupyter notebook. You can download the notebook here to run it locally in Jupyter.

Mouse olfactory bulb

Thor reveals cell-resolution mouse olfactory bulb structure.

In this tutorial, we show inference of cell-level spatial gene expression from the Visium spot-level spatial transcriptome and a whole slide image (H&E staining) using Thor.

The spatial transcriptome dataset is the Adult Mouse Olfactory Bulb. The input data can be downloaded from our google drive or 10x Genomics website.

For installation of Thor, please refer to this installation guide.

Import the packages

[1]:

import sys
import os
import logging
import datetime

logger = logging.getLogger()
logger.setLevel(logging.INFO)
logging.basicConfig(format='%(name)s - %(levelname)s - %(message)s')

now = datetime.datetime.now()
logger.info(f"Current Time: {now}")

root - INFO - Current Time: 2023-12-13 12:49:20.893931

[2]:

%config InlineBackend.figure_format = 'retina'

import numpy as np
import pandas as pd
import scanpy as sc
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

import warnings

warnings.filterwarnings("ignore")
sc.set_figure_params(scanpy=True, dpi=80, dpi_save=300)
sc.settings.verbosity = 'error'


from thor.pp import WholeSlideImage, Spatial
from thor.finest import fineST
from thor.pl import colors

numexpr.utils - INFO - Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
numexpr.utils - INFO - NumExpr defaulting to 8 threads.

Preprocessing

Preprocessing the whole slide image, including cell segmentation and morphological feature extraction. If not specified, the segmentation data will be saved in a directory (created by Thor) WSI_{sn}.

The WSI can be downloaded directly from google drive.

The outcomes of image preprocessing are two files: - cell mask - cell features

[3]:

sn = 'MOB_10x'
image_path = "Visium_Mouse_Olfactory_Bulb_image.tif"

wsi = WholeSlideImage(image_path, name=sn)

Here we use Cellpose to segment the nuclei. Notice the segmentation can take a long time without GPU. The output file is a n_pixel_row x n_pixel_col matrix.

For the sake of reproducing the results in the paper, we’ll skip this step and load the pre-computed segmentation results from here.

In the downloaded sub-folder (“WSI_MOB_10x”), both the cell segmentation and cell features files are included.

[4]:

%%script echo "Skip cell segmentation"

wsi.process(method='cellpose')

Skip cell segmentation

Preprocessing spatial transcriptome. We used SCANPY to generate an adata for the spots including QC.

We follow the standarized steps used by SCANPY to create the spot adata from the space ranger output directory (st_dir).

The spot adata contains the expression matrix, the location of the spots as pixel positions on the WSI, and the hires, lowres images with scalefactors. - spot.h5ad

You can skip this step and download the files here.The output is in the folder “Spatial_MOB_10x”.

[5]:

st_dir = "10xOlfactory"

trans = Spatial(sn, st_dir)
trans.process_transcriptome(perform_QC=False)

... storing 'feature_types' as categorical
... storing 'genome' as categorical

We can customize the transcriptome preprocessing if needed. Here we do log-normalization.

[6]:

spot_adata_path = trans.spot_adata_path
adata_spot = sc.read_h5ad(spot_adata_path)
sc.pp.normalize_total(adata_spot, target_sum=1e4)
sc.pp.log1p(adata_spot)

adata_spot.write_h5ad(spot_adata_path)

Predicting cell-level gene expression using Markov graph diffusion

After finishing the preprocessing, you should have those files: - The original WSI (image_path) - The cell(nuclei) mask and features (in directory “./WSI_MOB_10x”) - The spot-level gene expression (in directory “./Spatial_MOB_10x”)

[7]:

outdir = os.getcwd()

image_process_dir = os.path.join(outdir, "WSI_MOB_10x")
cell_mask_path = os.path.join(image_process_dir, "cell_mask.npz")
cell_feature_path = os.path.join(image_process_dir, "cell_features.csv")

The first step is to map the spot gene expression to the segmented cells. We use the nearest neighbors approach. This cell-level gene expression is the starting point for the diffusion process.

[8]:

MOB = fineST(
    image_path,
    name=sn,
    spot_adata_path=spot_adata_path,
    cell_features_csv_path=cell_feature_path,
    cell_features_list=['x', 'y', 'mean_gray', 'std_gray', 'entropy_img', 'mean_r', 'mean_g', 'mean_b', 'std_r', 'std_g', 'std_b'],
)

MOB.prepare_input(mapping_margin=2)

[12/13/23 12:49:29] INFO     Thor: Please check alignment of cells and spots
                    INFO     Thor: The first two columns in the node_feat DataFrame need to match the spatial coordinates from obsm['spatial'].
                    INFO     Thor: Mapping cells to the closest spots within 2 x the spot radius

../_images/notebooks_runThor_MOB_18_1.png

[9]:

MOB.adata

[9]:

AnnData object with n_obs × n_vars = 40543 × 32285
    obs: 'in_tissue', 'array_row', 'array_col', 'spot_barcodes', 'x', 'y', 'mean_gray', 'std_gray', 'entropy_img', 'mean_r', 'mean_g', 'mean_b', 'std_r', 'std_g', 'std_b', 'spot_heterogeneity'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'log1p', 'spatial', 'cell_image_props'
    obsm: 'spatial'

Second step is set up genes for prediction. For the sake of time, we’ll show prediction of a few genes. The same Markov transition matrix can be applied to all genes. The user-defined genes can be input by either in a 1-column text file or directly as the attribute genes of the MOB object

[10]:

MOB.genes = ['Eomes', 'Uchl1', 'Pcp4l1', 'Col6a1', 'Tbx21', 'Slc17a6', 'Bmp7', 'Camk2a', 'Cystm1', 'Penk', 'Gria1', 'Nap1l5', 'Serpine2', 'Kctd12', 'Pde5a', 'Syt7', 'Vtn']
MOB.set_genes_for_prediction(genes_selection_key=None)
MOB.adata.var[MOB.adata.var['used_for_prediction']]

[10]:

	gene_ids	feature_types	genome	used_for_prediction
Serpine2	ENSMUSG00000026249	Gene Expression	mm10	True
Pcp4l1	ENSMUSG00000038370	Gene Expression	mm10	True
Bmp7	ENSMUSG00000008999	Gene Expression	mm10	True
Pde5a	ENSMUSG00000053965	Gene Expression	mm10	True
Penk	ENSMUSG00000045573	Gene Expression	mm10	True
Uchl1	ENSMUSG00000029223	Gene Expression	mm10	True
Nap1l5	ENSMUSG00000055430	Gene Expression	mm10	True
Slc17a6	ENSMUSG00000030500	Gene Expression	mm10	True
Eomes	ENSMUSG00000032446	Gene Expression	mm10	True
Col6a1	ENSMUSG00000001119	Gene Expression	mm10	True
Gria1	ENSMUSG00000020524	Gene Expression	mm10	True
Vtn	ENSMUSG00000017344	Gene Expression	mm10	True
Tbx21	ENSMUSG00000001444	Gene Expression	mm10	True
Kctd12	ENSMUSG00000098557	Gene Expression	mm10	True
Cystm1	ENSMUSG00000046727	Gene Expression	mm10	True
Camk2a	ENSMUSG00000024617	Gene Expression	mm10	True
Syt7	ENSMUSG00000024743	Gene Expression	mm10	True

In this case study, we include the effect of input spatial transcriptome in constructing the cell-cell network.

Burn in 5 steps using only histology features (vanilla version) for all highly variable genes.
Load the burnin result, perform PCA, and use the transcriptome to adjust the cell-cell network.
The cell-cell network with transcriptome PCA information will be used to perform the production diffusion.

Run time (~ 3 mins on MacbookPro M2)

[11]:

MOB.recipe = 'gene'
MOB.set_params(
    is_rawCount=False,
    out_prefix="fineST",
    write_freq=40,
    n_iter=40,
    conn_csr_matrix="force",
    smoothing_scale=0.8,
    node_features_obs_list=['spot_heterogeneity'],
    n_neighbors=5,
    geom_morph_ratio=0.5,
    geom_constraint=0,
    inflation_percentage=1,
    regulate_expression_mean=False,
    stochastic_expression_neighbors_level='spot',
    smooth_predicted_expression_steps=10,
    reduced_dimension_transcriptome_obsm_key='X_pca',
    adjust_cell_network_by_transcriptome_scale=1,
    n_jobs=4)

MOB.predict_gene_expression()

[12/13/23 12:49:30] INFO     Thor: Using mode gene

Creating piles: 100%|██████████| 4/4 [00:01<00:00,  2.67it/s]

[12/13/23 12:52:17] INFO     Thor: Forcing to recalculate the connectivities.
                    INFO     Thor: Construct SNN with morphological features: ['mean_gray', 'std_gray', 'entropy_img', 'mean_r', 'mean_g', 'mean_b',
                             'std_r', 'std_g', 'std_b'].
                    INFO     Thor: Incorporate the effect of transcriptome.
                    INFO     Thor: Finish constructing SNN
                    INFO     Thor: Add adata.obsp["snn_connectivities"]
                    INFO     Thor: Add adata.obsp["snn_knn_connectivities"]
                    INFO     Thor: Add adata.uns["snn"]
                    INFO     Thor: Weigh cells according to the spot heterogeneity.
                    INFO     Thor: Promote flow of information from more homogeneous cells to less.
                    INFO     Thor: Eliminate low quality edges (<0.1) between cells.
                    INFO     Thor: Compute transition matrix
                    INFO     Thor: self weight scale is set to: 0.200
                    INFO     Thor: Compute transition matrix
                    INFO     Thor: self weight scale is set to: 1.808
[12/13/23 12:52:18] INFO     Thor: Added transition matrix to adata.obsp["snn_transition_matrix"]

... storing 'spot_barcodes' as categorical

                    INFO     Thor: fineST estimation starts.
[12/13/23 12:52:34] INFO     Thor: fineST estimation finished.

Check the expression of a few genes.

[12]:

ad_thor = MOB.load_result('fineST_40_samp10.npz')
ad_thor

[12]:

AnnData object with n_obs × n_vars = 40543 × 17
    obs: 'in_tissue', 'array_row', 'array_col', 'spot_barcodes', 'x', 'y', 'mean_gray', 'std_gray', 'entropy_img', 'mean_r', 'mean_g', 'mean_b', 'std_r', 'std_g', 'std_b', 'spot_heterogeneity', 'node_weights'
    var: 'gene_ids', 'feature_types', 'genome', 'used_for_prediction', 'used_for_reduced', 'used_for_vae', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
    uns: 'log1p', 'spatial', 'cell_image_props', 'hvg', 'snn'
    obsm: 'spatial', 'X_pca'
    obsp: 'snn_connectivities', 'snn_knn_connectivities', 'snn_transition_matrix'

[13]:

genes = ['Eomes', 'Uchl1', 'Pcp4l1', 'Col6a1', 'Tbx21', 'Slc17a6', 'Bmp7']

ad_spot = sc.read_h5ad(spot_adata_path)
sc.pl.spatial(ad_spot, color=genes, vmax='p99', frameon=False, ncols=4)

../_images/notebooks_runThor_MOB_26_0.png

[14]:

genes = ['Eomes', 'Uchl1', 'Pcp4l1']

sc.set_figure_params(scanpy=True, dpi=80, dpi_save=100, facecolor='k', frameon=False)
fw2 = LinearSegmentedColormap.from_list('fw2', colors.continuous_palettes['fireworks2'])

fig, axes = plt.subplots(1, 3, figsize=(15, 5))
i = 0

for gene in genes:
    sc.pl.spatial(ad_thor, color=gene, spot_size=15, vmin='p5', vmax='p95', colorbar_loc=None, cmap='coolwarm', img_key=None, show=False, ax=axes[i])
    axes[i].set_title(gene, color='w')
    i += 1
plt.show()

../_images/notebooks_runThor_MOB_27_0.png

Visualize module-specific genes

[15]:

from matplotlib.colors import LinearSegmentedColormap
from thor.pl import colors

mod_genes = ['Col6a1', 'Tbx21', 'Slc17a6', 'Bmp7']


sc.set_figure_params(scanpy=True, dpi=80, dpi_save=100, facecolor='k', frameon=False)
yb = LinearSegmentedColormap.from_list('yb', colors.continuous_palettes['blueYellow'])

fig, axes = plt.subplots(1, 4, figsize=(20, 5))
i = 0

for gene in mod_genes:
    sc.pl.spatial(ad_thor, color=gene, spot_size=15, vmin='p2', vmax='p98', colorbar_loc=None, cmap=yb, img_key=None, show=False, ax=axes[i])
    axes[i].set_title(gene, color='w')
    i += 1
plt.show()

../_images/notebooks_runThor_MOB_29_0.png

[ ]: