thor.analy.prepare_and_run_copykat
- thor.analy.prepare_and_run_copykat(adata: AnnData, datadir: str | None = None, layer: str | None = None, batch_size: int = 10, id_type: str = 'S', ngene_chr: int = 5, win_size: int = 25, KS_cut: float = 0.1, sam_name: str = '', distance: str = 'euclidean', norm_cell_names: str = '', output_seg: str = 'FALSE', plot_genes: str = 'TRUE', genome: str = 'hg20', n_cores: int = 1, copykat: bool = True) → None
Run CopyKAT on the input data. Requires R and the CopyKAT R package to be installed. Refer to the CopyKAT documentation for more information.
- Parameters:
  - adata (anndata.AnnData) – Gene expression data.
  - datadir (str, optional) – Directory where data will be saved. Default is None. If None, data will be saved in the current working directory.
  - layer (str, optional) – Name of the layer in adata to use as input data. Default is None. If None, adata.X is used.
  - batch_size (int) – Number of subfolders to process in parallel. Default is 10.
  - id_type (str) – CopyKAT parameter id.type. Gene identifier type. Default is "S" (or "s"), which refers to gene symbols. The other option is "E" (or "e") for Ensembl IDs.
  - ngene_chr (int) – CopyKAT parameter ngene.chr. Minimum number of genes per chromosome for cell filtering. Default is 5.
  - win_size (int) – CopyKAT parameter win.size. Minimum window size for segmentation. Default is 25.
  - KS_cut (float) – CopyKAT parameter KS.cut. Segmentation parameter ranging from 0 to 1; a larger value means looser criteria. Default is 0.1.
  - sam_name (str) – CopyKAT parameter sam.name. Sample name. Default is "".
  - distance (str) – CopyKAT parameter distance. Distance metric. Default is "euclidean". Other options are "pearson" and "spearman".
  - norm_cell_names (str) – CopyKAT parameter norm.cell.names. A vector of normal cell names. Default is "".
  - output_seg (str) – CopyKAT parameter output.seg. Whether to output segmentation results for IGV visualization. Default is "FALSE". The other option is "TRUE". Note that this is a string, not a boolean.
  - plot_genes (str) – CopyKAT parameter plot.genes. Whether to output a heatmap of CNAs with gene name labels. Default is "TRUE". The other option is "FALSE". Note that this is a string, not a boolean.
  - genome (str) – CopyKAT parameter genome. Genome name. Default is "hg20" for human genome version 20. The other option is "mm10" for mouse genome version 10.
  - n_cores (int) – CopyKAT parameter n.cores. Number of CPU cores for parallel computing. Default is 1. Using 1 core is recommended if batch_size > 1.
  - copykat (bool) – Whether to run the CopyKAT analysis. Default is True. If False, the function only splits the data into smaller chunks and prepares the R script for CopyKAT.
- Returns:
Results are saved in the current working directory.
- Return type:
  None
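Because output_seg and plot_genes take the strings "TRUE"/"FALSE" rather than Python booleans, and several other parameters accept only a few documented values, it can help to assemble the arguments in one place before calling the function. The following is a minimal sketch, not part of thor: `r_flag` and `build_copykat_kwargs` are hypothetical helper names, the allowed values are taken from the parameter descriptions above, and the directory name in the commented usage is an assumption.

```python
from typing import Any, Dict

# Allowed values, taken from the parameter descriptions above.
_ID_TYPES = {"S", "s", "E", "e"}
_DISTANCES = {"euclidean", "pearson", "spearman"}
_GENOMES = {"hg20", "mm10"}


def r_flag(value: bool) -> str:
    """Convert a Python bool to the "TRUE"/"FALSE" string form."""
    return "TRUE" if value else "FALSE"


def build_copykat_kwargs(
    sam_name: str = "",
    id_type: str = "S",
    distance: str = "euclidean",
    genome: str = "hg20",
    KS_cut: float = 0.1,
    output_seg: bool = False,
    plot_genes: bool = True,
    batch_size: int = 10,
    n_cores: int = 1,
) -> Dict[str, Any]:
    """Hypothetical helper (not part of thor): validate and collect keyword arguments."""
    if id_type not in _ID_TYPES:
        raise ValueError(f"id_type must be one of {sorted(_ID_TYPES)}, got {id_type!r}")
    if distance not in _DISTANCES:
        raise ValueError(f"distance must be one of {sorted(_DISTANCES)}, got {distance!r}")
    if genome not in _GENOMES:
        raise ValueError(f"genome must be one of {sorted(_GENOMES)}, got {genome!r}")
    if not 0.0 <= KS_cut <= 1.0:
        raise ValueError(f"KS_cut must lie between 0 and 1, got {KS_cut!r}")
    return {
        "sam_name": sam_name,
        "id_type": id_type,
        "distance": distance,
        "genome": genome,
        "KS_cut": KS_cut,
        # These two are passed as strings, not booleans.
        "output_seg": r_flag(output_seg),
        "plot_genes": r_flag(plot_genes),
        "batch_size": batch_size,
        "n_cores": n_cores,
    }


kwargs = build_copykat_kwargs(sam_name="sample1", output_seg=True)

# Usage sketch (requires thor, R, and the CopyKAT R package; `adata` is an
# AnnData object you have already loaded; "copykat_out" is a made-up path):
# import thor
# thor.analy.prepare_and_run_copykat(adata, datadir="copykat_out", **kwargs)
```

Passing `copykat=False` in the same call would, per the description above, only split the data and prepare the R script, which may be a convenient way to check the setup before a full run.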