Core class API

Core module

Module holding class that implements the analysis pipeline

analysisPipeline.process(df1main, df1other, df2main, df2other, dir1, dir2, genesOfInterest=None, knownRegulators=None, nCPUs=4, panels=['fraction', 'binomial', 'top50', 'markers', 'combo3avgs', 'combo4avgs'], parallelBootstrap=False, exprCutoff1=0.05, exprCutoff2=0.05, perEachOtherCase=True, doScramble=False, part1=True, part2=True, part3=True, **kwargs)

Main workflow, implemented in two scenarios selected by the parameter “perEachOtherCase”.

Parameters:

df1main: pandas.DataFrame: Expression data of main group of cells of the first species
df1other: pandas.DataFrame: Expression data of other cells of the first species
df2main: pandas.DataFrame: Expression data of main group of cells of the second species
df2other: pandas.DataFrame: Expression data of other cells of the second species
dir1: str: Path to the first species working directory
dir2: str: Path to the second species working directory
genesOfInterest: list, Default None: Particular genes to analyze, e.g. receptors
knownRegulators: list, Default None: Known marker genes
nCPUs: int, Default 4: Number of CPUs to use for multiprocessing, recommended 10-20
panels: list, Default [‘fraction’, ‘binomial’, ‘top50’, ‘markers’, ‘combo3avgs’, ‘combo4avgs’]: Particular measurements to include in the analysis
parallelBootstrap: boolean, Default False: Whether to generate bootstrap experiments in parallel mode
exprCutoff1: float, Default 0.05: Per-batch expression cutoff for the first dataset
exprCutoff2: float, Default 0.05: Per-batch expression cutoff for the second dataset
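perEachOtherCase: boolean, Default True: Scenario of comparison, i.e. whether bootstrap experiments are compared with other bootstrap experiments or with a single case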

Any other parameters that class “Analysis” can take

Returns:

Analysis: First instance of class Analysis
Analysis: Second instance of class Analysis

class Analysis(workingDir='', otherCaseDir='', genesOfInterest=None, knownRegulators=None, nCPUs=1, panels=None, nBootstrap=100, majorMetric='correlation', perEachOtherCase=False, metricsFile='metricsFile.h5', seed=None, PCNpath='data/', minBatches=5, pseudoBatches=10, dendrogramMetric='euclidean', dendrogramLinkageMethod='ward', methodForDEG='ttest')

Bases: object

Class of analysis and visualization functions for DECNEO

Parameters:

workingDir: str, Default ‘’: Directory to retrieve and save files and results to
otherCaseDir: str, Default ‘’: Directory holding comparison (other species) data
genesOfInterest: list, Default None: Particular genes to analyze, e.g. receptors
knownRegulators: list, Default None: Known marker genes
nCPUs: int, Default 1: Number of CPUs to use for multiprocessing, recommended 10-20
panels: list, Default None: Particular measurements to include in the analysis
nBootstrap: int, Default 100: Number of bootstrap experiments to perform
majorMetric: str, Default ‘correlation’: Metric name (e.g. ‘correlation’, ‘cosine’, ‘euclidean’, ‘spearman’)
methodForDEG: str, Default ‘ttest’: Possible options: {‘ttest’, ‘mannwhitneyu’}
perEachOtherCase: boolean, Default False: Whether to perform comparisons of bootstrap experiments with other bootstrap experiments or with a single case
metricsFile: str, Default ‘metricsFile.h5’: Name of file where gene expression distance data is saved for specified metric
seed: int, Default None: Seed for the random number generator, used to make results reproducible
PCNpath: str, Default ‘data/’: Path to PCN file
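The nBootstrap and seed parameters imply resampling that can be made reproducible. The following is a minimal sketch of that idea, assuming each bootstrap experiment resamples batch labels with replacement. The function name bootstrap_batches and the batch labels are hypothetical, and this is not DECNEO's internal code.

```python
import random

def bootstrap_batches(batches, n_bootstrap=100, seed=None):
    """Return n_bootstrap resamples (with replacement) of the batch labels."""
    rng = random.Random(seed)  # a fixed seed makes the resamples reproducible
    return [rng.choices(batches, k=len(batches)) for _ in range(n_bootstrap)]

# Hypothetical batch labels, for illustration only
batches = ["batch1", "batch2", "batch3", "batch4", "batch5"]
first = bootstrap_batches(batches, n_bootstrap=3, seed=42)
second = bootstrap_batches(batches, n_bootstrap=3, seed=42)
print(first == second)  # the same seed yields the same resamples
```

With seed=None the generator is seeded from system entropy, so repeated runs would generally differ.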

Methods:

`analyzeAllPeaksOfCombinationVariant`(variant)	Find all peaks and their frequency from the bootstrap experiments
`analyzeBootstrapExperiments`()	Analyze all bootstrap experiments
`analyzeCase`(df_expr[, …])	Analyze, calculate, and generate plots for individual experiment
`analyzeCombinationVariant`(variant)	Analyze a combination of measures (same as in panels)
`bootstrapMaxpeakPlot`(variant)	Bootstrap max-peak plot
`compareTwoCases`(saveDir1, saveDir2[, name1, …])	Compare gene measurements between two cases for each bootstrap experiment
`generateAnalysisReport`()	Generate analysis report.
`prepareBootstrapExperiments`([allDataToo, …])	Prepare bootstrap experiments data and calculate gene statistics for each experiment
`prepareDEG`(dfa, dfb[, pvalueLimit])	Save gene expression data of cell type of interest.
`preparePerBatchCase`(**kwargs)	Process gene expression data to generate per-batch distance measure and save to file.
`reanalyzeMain`([case])	Reanalyze case
`runPairOfExperiments`(args)	Analyze the case, compare it with comparison case, find the conserved genes between the cases, analyze case again
`scramble`(measures[, subDir, case, N, M, …])	Run control analysis for the dendrogram order

Attributes:

`combinationPanels`
`combinationPanelsDict`
`deprecatedPanels`
`standardPanels`

standardPanels = ['fraction', 'binomial', 'top50', 'markers']

deprecatedPanels = ['PubMedHits', 'gAbove50_PanglaoMouse', 'gAbove50_PanglaoHuman', 'GOpositive', 'GOnegative', 'markerstop50', 'expression', 'closeness', 'age', 'rate']

combinationPanels = ['combo3avgs', 'combo4avgs']

combinationPanelsDict = {'combo2avgs': ['fraction', 'binomial'], 'combo3avgs': ['fraction', 'top50', 'binomial'], 'combo4avgs': ['fraction', 'top50', 'binomial', 'markers'], 'combo5-1avgs': ['fraction', 'top50', 'binomial', 'markers', 'PubMedHits'], 'combo5-2avgs': ['fraction', 'top50', 'binomial', 'markers', 'gAbove50_PanglaoMouse'], 'combo5-3avgs': ['fraction', 'top50', 'binomial', 'markers', 'gAbove50_PanglaoHuman'], 'combo6-1avgs': ['fraction', 'top50', 'binomial', 'markers', 'PubMedHits', 'gAbove50_PanglaoMouse'], 'combo6-2avgs': ['fraction', 'top50', 'binomial', 'markers', 'PubMedHits', 'gAbove50_PanglaoHuman'], 'combo6-3avgs': ['fraction', 'top50', 'binomial', 'markers', 'gAbove50_PanglaoMouse', 'gAbove50_PanglaoHuman'], 'combo7avgs': ['fraction', 'top50', 'binomial', 'markers', 'PubMedHits', 'gAbove50_PanglaoMouse', 'gAbove50_PanglaoHuman']}
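The dictionary above maps each combination panel to the standard panels it is built from. A minimal sketch of how such a mapping could be applied, assuming the “avgs” suffix means a plain per-gene average of the constituent panels. The helper combine_panels and the layout of the measures dictionary are hypothetical and only illustrate the lookup.

```python
# Subset of the documented class attribute, copied for illustration
combinationPanelsDict = {
    'combo3avgs': ['fraction', 'top50', 'binomial'],
    'combo4avgs': ['fraction', 'top50', 'binomial', 'markers'],
}

def combine_panels(name, measures):
    """Average the constituent panels of a combination panel, gene by gene.

    measures maps panel name -> list of per-gene values (hypothetical layout).
    """
    parts = combinationPanelsDict[name]
    n_genes = len(measures[parts[0]])
    return [sum(measures[p][i] for p in parts) / len(parts) for i in range(n_genes)]

# Made-up per-gene values for three genes
measures = {
    'fraction': [0.0, 0.5, 1.0],
    'top50': [0.0, 0.5, 1.0],
    'binomial': [0.0, 0.5, 1.0],
}
combo = combine_panels('combo3avgs', measures)
print(combo)
```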

prepareDEG(dfa, dfb, pvalueLimit=0.001)

Save gene expression data of cell type of interest. Create rank dataframe (df_ranks) with genes ranked by differential expression

Parameters:

dfa: pandas.DataFrame: Dataframe containing expression data for cell type of interest. Has genes as rows and (batches, cells) as columns
dfb: pandas.DataFrame: Dataframe containing expression data for cells of type other than cell type of interest. Has genes as rows and (batches, cells) as columns
pvalueLimit: float, Default 0.001: Maximum p-value for a gene to be included

Returns:

None

Usage:

prepareDEG(dfa, dfb)
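Ranking genes by differential expression can be illustrated without the package. The sketch below uses Welch's t-statistic from the standard library as a stand-in for the ‘ttest’ option. It does not compute p-values, so the pvalueLimit filter is not reproduced. The gene names, values, and helper names are hypothetical, and DECNEO's own ranking may differ.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def rank_genes(expr_a, expr_b):
    """Order genes from most to least up-regulated in group a versus group b."""
    scores = {gene: welch_t(expr_a[gene], expr_b[gene]) for gene in expr_a}
    return sorted(scores, key=scores.get, reverse=True)

# Made-up expression values: cell type of interest (a) versus other cells (b)
expr_a = {'geneA': [5.0, 6.0, 7.0], 'geneB': [4.0, 5.0, 6.0]}
expr_b = {'geneA': [1.0, 2.0, 3.0], 'geneB': [4.0, 5.5, 5.5]}
ranking = rank_genes(expr_a, expr_b)
print(ranking)
```

A gene with zero variance in both groups would cause a division by zero here, so real data would need a guard for that case.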

preparePerBatchCase(**kwargs)

Process gene expression data to generate per-batch distance measure and save to file. No plots are generated

Parameters:

Any parameters that function ‘analyzeCase’ can accept

Returns:

None

Usage:

an = Analysis()

an.preparePerBatchCase()

prepareBootstrapExperiments(allDataToo=True, df_ranks=None, parallel=False)

Prepare bootstrap experiments data and calculate gene statistics for each experiment

Parameters:

allDataToo: boolean, Default True: Whether to prepare experiment for all data as well
df_ranks: pandas.DataFrame, Default None: Genes ranked by differential expression. If None, the function uses the rank dataframe from the working directory

Returns:

None

Usage:

an = Analysis()

an.prepareBootstrapExperiments()

compareTwoCases(saveDir1, saveDir2, name1='N1', name2='N2', saveName='saveName')

Compare gene measurements between two cases for each bootstrap experiment

Parameters:

saveDir1: str: Directory storing gene measurement data for case 1
saveDir2: str: Directory storing gene measurement data for case 2
name1: str, Default ‘N1’: Phrase to append to keys of the resulting dataframe for case 1
name2: str, Default ‘N2’: Phrase to append to keys of the resulting dataframe for case 2
saveName: str, Default ‘saveName’: Name of file to save result dataframe to

Returns:

None

Usage:

an = Analysis()

an.compareTwoCases(saveDir1, saveDir2, name1, name2, saveName)

runPairOfExperiments(args)

Analyze the case, compare it with comparison case, find the conserved genes between the cases, analyze case again

Parameters:

saveDir: str: Directory with all bootstrap experiments
saveSubDir: str: Subdirectory for a bootstrap experiment
otherCaseDir: str: Directory holding comparison data

Returns:

None

Usage:

For internal use only

analyzeBootstrapExperiments()

Analyze all bootstrap experiments

Parameters:

None

Returns:

None

Usage:

an = Analysis()

an.analyzeBootstrapExperiments()
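The preparation and analysis methods documented above can be chained for a single case. The sketch below shows one plausible call order, taken from the order in which the methods appear in this reference rather than from a confirmed workflow. It uses a hypothetical recording stand-in instead of a real Analysis instance so that it runs without DECNEO installed.

```python
def run_single_case(an, dfa, dfb):
    """Call the preparation and analysis steps in one plausible order."""
    an.prepareDEG(dfa, dfb)            # rank genes by differential expression
    an.preparePerBatchCase()           # per-batch distance measures, no plots
    an.prepareBootstrapExperiments()   # build the bootstrap experiment data
    an.analyzeBootstrapExperiments()   # analyze every bootstrap experiment


class RecordingAnalysis:
    """Hypothetical stand-in that records the name of every method called."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. the method names
        def method(*args, **kwargs):
            self.calls.append(name)
        return method


stub = RecordingAnalysis()
run_single_case(stub, dfa=None, dfb=None)
print(stub.calls)
```

With the real package, an would be an Analysis instance and dfa, dfb would be the expression dataframes described under prepareDEG.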

analyzeCombinationVariant(variant)

Analyze a combination of measures (same as in panels)

Parameters:

variant: str: Name of combination variant (e.g. ‘Avg combo4avgs’, ‘Avg combo3avgs’)

Returns:

pandas.DataFrame: Analysis result

Usage:

an = Analysis()

an.analyzeCombinationVariant(variant)

scramble(measures, subDir='', case='All', N=10000, M=20, getMax=False, maxSuff='')

Run control analysis for the dendrogram order

Parameters:

measures: list: Measures (e.g. [‘Markers’, ‘Binomial -log(pvalue)’, ‘Top50 overlap’])
subDir: str, Default ‘’: Subdirectory to save dataframe to
N: int: Chunk size
M: int: Number of chunks

Returns:

None

Usage:

an = Analysis()

an.scramble(measures)
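The N and M parameters describe a chunked control run. A minimal sketch of that pattern, assuming the control compares against randomly permuted gene order and that the total number of permutations is N * M. The statistic recorded here (position of the maximum) is a hypothetical placeholder, not what scramble computes.

```python
import random

def scrambled_max_positions(profile, n=10000, m=20, seed=None):
    """Record the position of the maximum over n * m random reorderings, in m chunks."""
    rng = random.Random(seed)
    chunks = []
    for _ in range(m):                  # M chunks ...
        chunk = []
        for _ in range(n):              # ... of N permutations each
            shuffled = profile[:]       # copy so the input is left untouched
            rng.shuffle(shuffled)
            chunk.append(shuffled.index(max(shuffled)))
        chunks.append(chunk)
    return chunks

# Small n and m keep the example fast; the documented defaults are 10000 and 20
chunks = scrambled_max_positions([0.1, 0.9, 0.3, 0.2], n=50, m=4, seed=0)
total = sum(len(c) for c in chunks)
print(len(chunks), total)
```

Processing in chunks keeps each intermediate result small, which matters when results are written out per chunk.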

analyzeCase(df_expr, toggleCalculateMajorMetric=True, exprCutoff=0.05, toggleExportFigureData=True, toggleCalculateMeasures=True, suffix='', saveDir='', toggleGroupBatches=True, dpi=300, toggleAdjustText=True, markersLabelsRepelForce=1.5, figureSize=(8, 22), toggleAdjustFigureHeight=True, noPlot=False, halfWindowSize=10, printStages=True, externalPanelsData=None, toggleIncludeHeatmap=True, addDeprecatedPanels=False, includeClusterNumber=True, togglePublicationFigure=False)

Analyze, calculate, and generate plots for individual experiment

Parameters:

df_expr: pandas.DataFrame: Gene expression data
toggleCalculateMajorMetric: boolean, Default True: Whether to calculate cdist of major metric. This is a legacy parameter
exprCutoff: float, Default 0.05: Cutoff for percent expression in a batch of input data
toggleExportFigureData: boolean, Default True: Whether to export figure data
toggleCalculateMeasures: boolean, Default True: Whether to calculate measures
suffix: str, Default ‘’: Name of experiment
saveDir: str, Default ‘’: Everything is exported to this directory, should be unique for each dataset
toggleGroupBatches: boolean, Default True: Whether to group batches or save per-batch distance measure
dpi: int or ‘figure’, Default 300: Resolution in dots per inch. If ‘figure’, use the figure’s dpi value
toggleAdjustText: boolean, Default True: Whether to use (external) module to minimize text overlap in figure
figureSize: tuple, Default (8, 22): Width, height in inches
toggleAdjustFigureHeight: boolean, Default True: Whether to adjust figure height
noPlot: boolean, Default False: Whether to skip generating the plot
halfWindowSize: int, Default 10: Moving average half-window size
printStages: boolean, Default True: Whether to print stage status to output
externalPanelsData: dict, Default None: Dictionary containing additional panels data
toggleIncludeHeatmap: boolean, Default True: Whether to include heatmap in figure
addDeprecatedPanels: boolean, Default False: Whether to include deprecated panels

Returns:

None

Usage:

self.analyzeCase(df_expr)
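The halfWindowSize parameter is described as a moving average half-window size. A minimal sketch of a centered moving average under that reading, assuming a full window of 2 * halfWindowSize + 1 points that is truncated at the edges. The edge handling is an assumption and may not match analyzeCase.

```python
def moving_average(values, half_window=10):
    """Centered moving average over up to 2 * half_window + 1 points."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]          # shorter than the full window near the edges
        smoothed.append(sum(window) / len(window))
    return smoothed

smoothed = moving_average([0.0, 0.0, 3.0, 0.0, 0.0], half_window=1)
print(smoothed)
```

A larger half-window spreads an isolated spike over more neighboring positions and lowers its height accordingly.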

reanalyzeMain(case='All', **kwargs)

Reanalyze case

Parameters:

Any parameters that function ‘analyzeCase’ can accept

Returns:

None

Usage:

an = Analysis()

an.reanalyzeMain()

analyzeAllPeaksOfCombinationVariant(variant, nG=8, nE=30, fcutoff=0.5, width=50)

Find all peaks and their frequency from the bootstrap experiments

Parameters:

variant: str: Name of combination variant (e.g. ‘Avg combo4avgs’, ‘Avg combo3avgs’)
nG: int, Default 8: Number of clusters of genes
nE: int, Default 30: Number of clusters of bootstrap experiments
fcutoff: float, Default 0.5: Lower peak height cutoff
width: int, Default 50: Width of peak

Returns:

None

Usage:

an = Analysis()

an.analyzeAllPeaksOfCombinationVariant('Avg combo4avgs', nG=8, nE=30, fcutoff=0.5, width=50)
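The fcutoff parameter is documented as a lower peak height cutoff. A minimal sketch of peak selection under that description, assuming the profile is scaled to the range 0 to 1 and that fcutoff is an absolute height threshold. The helper find_peaks_above is hypothetical, and the clustering controlled by nG and nE is not reproduced.

```python
def find_peaks_above(profile, fcutoff=0.5):
    """Return indices of interior local maxima with height of at least fcutoff."""
    peaks = []
    for i in range(1, len(profile) - 1):
        is_local_max = profile[i - 1] < profile[i] >= profile[i + 1]
        if is_local_max and profile[i] >= fcutoff:
            peaks.append(i)
    return peaks

# Made-up profile with one low peak and two peaks above the cutoff
profile = [0.0, 0.2, 0.1, 0.6, 1.0, 0.7, 0.3, 0.55, 0.2]
peaks = find_peaks_above(profile, fcutoff=0.5)
print(peaks)
```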

bootstrapMaxpeakPlot(variant): Bootstrap max-peak plot

generateAnalysisReport(): Generate analysis report.