thor.analy.prepare_and_run_copykat

thor.analy.prepare_and_run_copykat(adata: AnnData, datadir: str | None = None, layer: str | None = None, batch_size: int = 10, id_type: str = 'S', ngene_chr: int = 5, win_size: int = 25, KS_cut: float = 0.1, sam_name: str = '', distance: str = 'euclidean', norm_cell_names: str = '', output_seg: str = 'FALSE', plot_genes: str = 'TRUE', genome: str = 'hg20', n_cores: int = 1, copykat: bool = True) → None

Run CopyKAT on the input data. Requires R and the CopyKAT R package to be installed. Refer to the CopyKAT documentation for more information.

Parameters:
  • adata (anndata.AnnData) – Gene expression data.

  • datadir (str, optional) – Directory where data will be saved. Default is None. If None, data will be saved in the current working directory.

  • layer (str, optional) – Name of the layer in adata to use as input data. Default is None. If None, adata.X will be used.

  • batch_size (int) – Number of subfolders to process in parallel. Default is 10.

  • id_type (str) – CopyKAT parameter id.type. Gene identifier type. Default is “S” (or “s”), which refers to gene symbols. The other option is “E” (or “e”) for Ensembl IDs.

  • ngene_chr (int) – CopyKAT parameter ngene.chr. Minimal number of genes per chromosome for cell filtering. Default is 5.

  • win_size (int) – CopyKAT parameter win.size. Minimal window size for segmentation. Default is 25.

  • KS_cut (float) – CopyKAT parameter KS.cut. Segmentation parameter ranging from 0 to 1; larger value means looser criteria. Default is 0.1.

  • sam_name (str) – CopyKAT parameter sam.name. Sample name. Default is “”.

  • distance (str) – CopyKAT parameter distance. Distance metric. Default is “euclidean”. Other options are “pearson” and “spearman”.

  • norm_cell_names (str) – CopyKAT parameter norm.cell.names. A vector of normal cell names. Default is “”.

  • output_seg (str) – CopyKAT parameter output.seg. Whether to output segmentation results for IGV visualization. Default is “FALSE”. Other option is “TRUE”. Note that it is a string and not a boolean.

  • plot_genes (str) – CopyKAT parameter plot.genes. Whether to output heatmap of CNAs with genename labels. Default is “TRUE”. Other option is “FALSE”. Note that it is a string and not a boolean.

  • genome (str) – CopyKAT parameter genome. Genome name. Default is “hg20” for human genome version 20. Other option is “mm10” for mouse genome version 10.

  • n_cores (int) – CopyKAT parameter n.cores. Number of CPU cores for parallel computing. Default is 1. Recommended to use 1 core if batch_size > 1.

  • copykat (bool) – Whether to run the CopyKAT analysis. Default is True. If False, the function will only split the data into smaller chunks and prepare the R script for CopyKAT.
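
Because output_seg and plot_genes take the strings “TRUE”/“FALSE” rather than Python booleans, it can help to convert them in one place before calling the function. The following is a minimal sketch, not part of Thor: r_bool and build_copykat_kwargs are hypothetical helper names introduced here for illustration, and the final call is left commented out because it assumes Thor, R, and CopyKAT are installed and that an AnnData object named adata already exists.

```python
def r_bool(flag: bool) -> str:
    """Convert a Python bool to the "TRUE"/"FALSE" string form (hypothetical helper)."""
    return "TRUE" if flag else "FALSE"


def build_copykat_kwargs(output_seg: bool = False, plot_genes: bool = True, **overrides) -> dict:
    """Assemble keyword arguments for prepare_and_run_copykat (hypothetical helper).

    The defaults mirror the documented defaults; any other documented
    parameter can be passed through **overrides.
    """
    kwargs = {
        "id_type": "S",        # gene symbols
        "genome": "hg20",      # documented default
        "n_cores": 1,          # recommended when batch_size > 1
        "batch_size": 10,
        "output_seg": r_bool(output_seg),
        "plot_genes": r_bool(plot_genes),
    }
    kwargs.update(overrides)
    return kwargs


# Prepare the chunks and the R script only, without running CopyKAT.
kwargs = build_copykat_kwargs(output_seg=True, copykat=False)
print(kwargs)

# Assumed usage, with Thor, R, and CopyKAT installed and `adata` loaded:
# import thor
# thor.analy.prepare_and_run_copykat(adata, **kwargs)
```

Setting copykat=False in this way only splits the data and writes the R script, which is a convenient way to inspect the prepared inputs before committing to a full run.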

Returns:

Results are saved in the current working directory.

Return type:

None