Core class API

Core module

Description of the package functionality

Module holding class that implements the analysis pipeline



analysisPipeline.process(df1main, df1other, df2main, df2other, dir1, dir2, genesOfInterest=None, knownRegulators=None, nCPUs=4, panels=['fraction', 'binomial', 'top50', 'markers', 'combo3avgs', 'combo4avgs'], parallelBootstrap=False, exprCutoff1=0.05, exprCutoff2=0.05, perEachOtherCase=True, doScramble=False, part1=True, part2=True, part3=True, **kwargs)

Main workflow, implemented in two scenarios selected by the parameter “perEachOtherCase”.

Parameters:
df1main: pandas.DataFrame

Expression data of main group of cells of the first species

df1other: pandas.DataFrame

Expression data of other cells of the first species

df2main: pandas.DataFrame

Expression data of main group of cells of the second species

df2other: pandas.DataFrame

Expression data of other cells of the second species

dir1: str

Path to the first species working directory

dir2: str

Path to the second species working directory

genesOfInterest: list, Default None

Particular genes to analyze, e.g. receptors

knownRegulators: list, Default None

Known marker genes

nCPUs: int, Default 4

Number of CPUs to use for multiprocessing, recommended 10-20

panels: list, Default [‘fraction’, ‘binomial’, ‘top50’, ‘markers’, ‘combo3avgs’, ‘combo4avgs’]

Particular measurements to include in the analysis

parallelBootstrap: boolean, Default False

Whether to generate bootstrap experiments in parallel mode

exprCutoff1: float, Default 0.05

Per-batch expression cutoff for the first dataset

exprCutoff2: float, Default 0.05

Per-batch expression cutoff for the second dataset

perEachOtherCase: boolean, Default True

Scenario of comparison

kwargs:

Any other parameters that class “Analysis” accepts

Returns:
Analysis

First class Analysis instance

Analysis

Second class Analysis instance
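A call to process might be wrapped as in the sketch below. The helper name run_comparison and the import path mentioned in its docstring are assumptions, not taken from this page. The callable is passed in explicitly so that the sketch does not depend on the package being installed.

```python
def run_comparison(process_fn, df1main, df1other, df2main, df2other,
                   dir1, dir2, **analysis_kwargs):
    """Call a process-like function with the documented arguments.

    process_fn is expected to be analysisPipeline.process, for example
    obtained with `from decneo.analysisPipeline import process`
    (assumed import path). Extra keywords are forwarded unchanged, so
    options of class Analysis (e.g. nBootstrap) can be passed through.
    """
    return process_fn(
        df1main, df1other, df2main, df2other, dir1, dir2,
        nCPUs=4,
        panels=['fraction', 'binomial', 'top50', 'markers',
                'combo3avgs', 'combo4avgs'],
        parallelBootstrap=False,
        exprCutoff1=0.05,
        exprCutoff2=0.05,
        perEachOtherCase=True,
        **analysis_kwargs,
    )
```

According to the Returns section, the result of such a call would be the two Analysis instances, one per species.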

class Analysis(workingDir='', otherCaseDir='', genesOfInterest=None, knownRegulators=None, nCPUs=1, panels=None, nBootstrap=100, majorMetric='correlation', perEachOtherCase=False, metricsFile='metricsFile.h5', seed=None, PCNpath='data/', minBatches=5, pseudoBatches=10, dendrogramMetric='euclidean', dendrogramLinkageMethod='ward', methodForDEG='ttest')

Bases: object

Class of analysis and visualization functions for DECNEO

Parameters:
workingDir: str, Default ‘’

Directory to retrieve and save files and results to

otherCaseDir: str, Default ‘’

Directory holding comparison (other species) data

genesOfInterest: list, Default None

Particular genes to analyze, e.g. receptors

knownRegulators: list, Default None

Known marker genes

nCPUs: int, Default 1

Number of CPUs to use for multiprocessing, recommended 10-20

panels: list, Default None

Particular measurements to include in the analysis

nBootstrap: int, Default 100

Number of bootstrap experiments to perform

majorMetric: str, Default ‘correlation’

Metric name (e.g. ‘correlation’, ‘cosine’, ‘euclidean’, ‘spearman’)

methodForDEG: str, Default ‘ttest’

Possible options: {‘ttest’, ‘mannwhitneyu’}

perEachOtherCase: boolean, Default False

Whether to perform comparisons of bootstrap experiments with other bootstrap experiments or with a single case

metricsFile: str, Default ‘metricsFile.h5’

Name of file where gene expression distance data is saved for specified metric

seed: int, Default None

Seed for the random number generator; set it to make results reproducible

PCNpath: str, Default ‘data/’

Path to PCN file
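The metric names accepted by majorMetric resemble those used by SciPy distance functions, where ‘correlation’ denotes one minus the Pearson correlation coefficient. The sketch below computes that quantity for two expression vectors in plain Python. Whether the package computes its metric exactly this way is an assumption, and the function name is hypothetical.

```python
import math


def correlation_distance(u, v):
    """Return 1 - Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the norm is zero).
    """
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    du = [x - mean_u for x in u]
    dv = [y - mean_v for y in v]
    cov = sum(a * b for a, b in zip(du, dv))
    norm = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return 1.0 - cov / norm


# Perfectly correlated vectors are at distance 0, anti-correlated at 2.
same = correlation_distance([1, 2, 3], [2, 4, 6])
opposite = correlation_distance([1, 2, 3], [3, 2, 1])
```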

Methods:

analyzeAllPeaksOfCombinationVariant(variant)

Find all peaks and their frequency from the bootstrap experiments

analyzeBootstrapExperiments()

Analyze all bootstrap experiments

analyzeCase(df_expr[, …])

Analyze, calculate, and generate plots for individual experiment

analyzeCombinationVariant(variant)

Analyze a combination of measures (same as in panels)

bootstrapMaxpeakPlot(variant)

Bootstrap max-peak plot

compareTwoCases(saveDir1, saveDir2[, name1, …])

Compare gene measurements between two cases for each bootstrap experiment

generateAnalysisReport()

Generate analysis report.

prepareBootstrapExperiments([allDataToo, …])

Prepare bootstrap experiments data and calculate gene statistics for each experiment

prepareDEG(dfa, dfb[, pvalueLimit])

Save gene expression data of cell type of interest.

preparePerBatchCase(**kwargs)

Process gene expression data to generate per-batch distance measure and save to file.

reanalyzeMain([case])

Reanalyze case

runPairOfExperiments(args)

Analyze the case, compare it with comparison case, find the conserved genes between the cases, analyze case again

scramble(measures[, subDir, case, N, M, …])

Run control analysis for the dendrogram order

Attributes:

combinationPanels

combinationPanelsDict

deprecatedPanels

standardPanels

standardPanels = ['fraction', 'binomial', 'top50', 'markers']
deprecatedPanels = ['PubMedHits', 'gAbove50_PanglaoMouse', 'gAbove50_PanglaoHuman', 'GOpositive', 'GOnegative', 'markerstop50', 'expression', 'closeness', 'age', 'rate']
combinationPanels = ['combo3avgs', 'combo4avgs']
combinationPanelsDict = {'combo2avgs': ['fraction', 'binomial'], 'combo3avgs': ['fraction', 'top50', 'binomial'], 'combo4avgs': ['fraction', 'top50', 'binomial', 'markers'], 'combo5-1avgs': ['fraction', 'top50', 'binomial', 'markers', 'PubMedHits'], 'combo5-2avgs': ['fraction', 'top50', 'binomial', 'markers', 'gAbove50_PanglaoMouse'], 'combo5-3avgs': ['fraction', 'top50', 'binomial', 'markers', 'gAbove50_PanglaoHuman'], 'combo6-1avgs': ['fraction', 'top50', 'binomial', 'markers', 'PubMedHits', 'gAbove50_PanglaoMouse'], 'combo6-2avgs': ['fraction', 'top50', 'binomial', 'markers', 'PubMedHits', 'gAbove50_PanglaoHuman'], 'combo6-3avgs': ['fraction', 'top50', 'binomial', 'markers', 'gAbove50_PanglaoMouse', 'gAbove50_PanglaoHuman'], 'combo7avgs': ['fraction', 'top50', 'binomial', 'markers', 'PubMedHits', 'gAbove50_PanglaoMouse', 'gAbove50_PanglaoHuman']}
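The ‘avgs’ suffix suggests that a combination panel is an average of its member panels. A minimal sketch of that reading follows; the function name and the numeric values are hypothetical, and it is assumed that each panel has already been brought to a comparable scale and shares one gene order.

```python
def combine_panels(panel_values, members):
    """Average the member panels gene by gene.

    panel_values maps a panel name to a list of per-gene values,
    all lists being in the same gene order.
    """
    columns = [panel_values[name] for name in members]
    return [sum(vals) / len(vals) for vals in zip(*columns)]


# Hypothetical per-gene values for three genes.
panels = {
    'fraction': [0.2, 0.8, 0.5],
    'top50':    [0.0, 1.0, 0.5],
    'binomial': [0.1, 0.9, 0.5],
}

# Member list taken from combinationPanelsDict['combo3avgs'].
combo3 = combine_panels(panels, ['fraction', 'top50', 'binomial'])
```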
prepareDEG(dfa, dfb, pvalueLimit=0.001)

Save gene expression data of cell type of interest. Create rank dataframe (df_ranks) with genes ranked by differential expression

Parameters:
dfa: pandas.DataFrame

Dataframe containing expression data for the cell type of interest. Has genes as rows and (batches, cells) as columns

dfb: pandas.DataFrame

Dataframe containing expression data for cells of types other than the cell type of interest. Has genes as rows and (batches, cells) as columns

pvalueLimit: float, Default 0.001

Maximum p-value for a gene to be included

Returns:

None

Usage:

prepareDEG(dfa, dfb)
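Ranking genes by differential expression can be illustrated with a two-sample t statistic. The sketch below uses Welch's form and ranks by the statistic only. It does not compute p-values or apply pvalueLimit, and the choice of Welch's variant, the function names and the data are all assumptions made for illustration.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two samples with non-zero variance."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        va / len(a) + vb / len(b))


def rank_genes(expr_a, expr_b):
    """Return gene names sorted by decreasing t statistic.

    expr_a and expr_b map gene name -> list of expression values in the
    cell type of interest and in all other cells, respectively.
    """
    scores = {gene: welch_t(expr_a[gene], expr_b[gene]) for gene in expr_a}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical expression values for three genes.
expr_main = {'geneA': [5, 6, 7], 'geneB': [1, 2, 3], 'geneC': [3, 4, 5]}
expr_other = {'geneA': [1, 2, 3], 'geneB': [1, 2, 4], 'geneC': [2, 3, 4]}
ranking = rank_genes(expr_main, expr_other)
```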

preparePerBatchCase(**kwargs)

Process gene expression data to generate per-batch distance measure and save to file. No plots are generated

Parameters:

Any parameters that function ‘analyzeCase’ can accept

Returns:

None

Usage:

an = Analysis()

an.preparePerBatchCase()

prepareBootstrapExperiments(allDataToo=True, df_ranks=None, parallel=False)

Prepare bootstrap experiments data and calculate gene statistics for each experiment

Parameters:
allDataToo: boolean, Default True

Whether to prepare experiment for all data as well

df_ranks: pandas.DataFrame, Default None

Genes ranked by differential expression. If None, the function uses the rank dataframe from the working directory

Returns:

None

Usage:

an = Analysis()

an.prepareBootstrapExperiments()
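The idea of a bootstrap experiment can be sketched as resampling with replacement under a fixed seed. The sketch assumes that the resampling unit is a batch, which this page does not state explicitly; the function name and batch labels are hypothetical.

```python
import random


def bootstrap_batches(batches, n_experiments, seed=None):
    """Draw n_experiments resamples of the batch labels with replacement.

    Each resample has the same size as the original list of batches.
    A fixed seed makes the draws reproducible.
    """
    rng = random.Random(seed)
    return [[rng.choice(batches) for _ in batches]
            for _ in range(n_experiments)]


batches = ['batch1', 'batch2', 'batch3', 'batch4']
experiments = bootstrap_batches(batches, n_experiments=3, seed=0)
```

Using a dedicated random.Random instance keeps the draws independent of any other code that uses the global random state.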

compareTwoCases(saveDir1, saveDir2, name1='N1', name2='N2', saveName='saveName')

Compare gene measurements between two cases for each bootstrap experiment

Parameters:
saveDir1: str

Directory storing gene measurement data for case 1

saveDir2: str

Directory storing gene measurement data for case 2

name1: str, Default ‘N1’

Phrase to append to keys of the resulting dataframe for case 1

name2: str, Default ‘N2’

Phrase to append to keys of the resulting dataframe for case 2

saveName: str, Default ‘saveName’

Name of file to save result dataframe to

Returns:

None

Usage:

an = Analysis()

an.compareTwoCases(saveDir1, saveDir2, name1, name2, saveName)

runPairOfExperiments(args)

Analyze the case, compare it with comparison case, find the conserved genes between the cases, analyze case again

Parameters:
saveDir: str

Directory with all bootstrap experiments

saveSubDir: str

Subdirectory for a bootstrap experiment

otherCaseDir: str

Directory holding comparison data

Returns:

None

Usage:

For internal use only

analyzeBootstrapExperiments()

Analyze all bootstrap experiments

Parameters:

None

Returns:

None

Usage:

an = Analysis()

an.analyzeBootstrapExperiments()

analyzeCombinationVariant(variant)

Analyze a combination of measures (same as in panels)

Parameters:
variant: str

Name of combination variant (e.g. ‘Avg combo4avgs’, ‘Avg combo3avgs’)

Returns:
pandas.DataFrame

Analysis result

Usage:

an = Analysis()

an.analyzeCombinationVariant(variant)

scramble(measures, subDir='', case='All', N=10000, M=20, getMax=False, maxSuff='')

Run control analysis for the dendrogram order

Parameters:
measures: list

Measures (e.g. [‘Markers’, ‘Binomial -log(pvalue)’, ‘Top50 overlap’])

subDir: str, Default ‘’

Subdirectory to save dataframe to

N: int, Default 10000

Chunk size

M: int, Default 20

Number of chunks

Returns:

None

Usage:

an = Analysis()

an.scramble(measures)
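A control of this kind can be pictured as a generic permutation test processed in chunks: the order of the values is shuffled repeatedly and a statistic is recorded for each shuffle. The sketch below is only an illustration of that idea under stated assumptions. The function names, the example statistic and the small chunk sizes are hypothetical and are not taken from the package.

```python
import random


def scramble_control(values, statistic, chunk_size=1000, n_chunks=5, seed=None):
    """Collect a statistic over chunk_size * n_chunks random orderings.

    Work is split into chunks so that only one chunk of results is
    accumulated at a time before being appended to the total.
    """
    rng = random.Random(seed)
    shuffled = list(values)
    results = []
    for _ in range(n_chunks):
        chunk = []
        for _ in range(chunk_size):
            rng.shuffle(shuffled)
            chunk.append(statistic(shuffled))
        results.extend(chunk)
    return results


def max_adjacent_sum(order):
    """Example statistic: largest sum of two neighbouring values."""
    return max(a + b for a, b in zip(order, order[1:]))


null = scramble_control([1, 0, 0, 1, 0, 0], max_adjacent_sum,
                        chunk_size=100, n_chunks=3, seed=1)
```

The observed statistic for the real order can then be compared with the distribution in `null`.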

analyzeCase(df_expr, toggleCalculateMajorMetric=True, exprCutoff=0.05, toggleExportFigureData=True, toggleCalculateMeasures=True, suffix='', saveDir='', toggleGroupBatches=True, dpi=300, toggleAdjustText=True, markersLabelsRepelForce=1.5, figureSize=(8, 22), toggleAdjustFigureHeight=True, noPlot=False, halfWindowSize=10, printStages=True, externalPanelsData=None, toggleIncludeHeatmap=True, addDeprecatedPanels=False, includeClusterNumber=True, togglePublicationFigure=False)

Analyze, calculate, and generate plots for individual experiment

Parameters:
df_expr: pandas.DataFrame

Gene expression data

toggleCalculateMajorMetric: boolean, Default True

Whether to calculate cdist of major metric. This is a legacy parameter

exprCutoff: float, Default 0.05

Cutoff for percent expression in a batch of input data

toggleExportFigureData: boolean, Default True

Whether to export figure data

toggleCalculateMeasures: boolean, Default True

Whether to calculate measures

suffix: str, Default ‘’

Name of experiment

saveDir: str, Default ‘’

Everything is exported to this directory; it should be unique for each dataset

toggleGroupBatches: boolean, Default True

Whether to group batches or save per-batch distance measure

dpi: int or ‘figure’, Default 300

Resolution in dots per inch; if ‘figure’, use the figure’s dpi value

toggleAdjustText: boolean, Default True

Whether to use (external) module to minimize text overlap in figure

figureSize: tuple, Default (8, 22)

Width, height in inches

toggleAdjustFigureHeight: boolean, Default True

Whether to adjust figure height

noPlot: boolean, Default False

Whether to skip generating the plot

halfWindowSize: int, Default 10

Moving average half-window size

printStages: boolean, Default True

Whether to print stage status to output

externalPanelsData: dict, Default None

Dictionary containing additional panels data

toggleIncludeHeatmap: boolean, Default True

Whether to include heatmap in figure

addDeprecatedPanels: boolean, Default False

Whether to include deprecated panels

Returns:

None

Usage:

self.analyzeCase(df_expr)
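The halfWindowSize parameter describes a moving average whose window spans that many positions on each side of the current one. A minimal sketch follows, assuming that the window is simply truncated at the ends of the sequence; how the package treats the edges is not stated here, and the function name is hypothetical.

```python
def moving_average(values, half_window=10):
    """Centered moving average with a window of 2 * half_window + 1.

    Near the ends the window is truncated to the available values.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


result = moving_average([0, 0, 3, 0, 0], half_window=1)
# result is [0.0, 1.0, 1.0, 1.0, 0.0]
```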

reanalyzeMain(case='All', **kwargs)

Reanalyze case

Parameters:

Any parameters that function ‘analyzeCase’ can accept

Returns:

None

Usage:

an = Analysis()

an.reanalyzeMain()

analyzeAllPeaksOfCombinationVariant(variant, nG=8, nE=30, fcutoff=0.5, width=50)

Find all peaks and their frequency from the bootstrap experiments

Parameters:
variant: str

Name of combination variant (e.g. ‘Avg combo4avgs’, ‘Avg combo3avgs’)

nG: int, Default 8

Number of clusters of genes

nE: int, Default 30

Number of clusters of bootstrap experiments

fcutoff: float, Default 0.5

Lower peak height cutoff

width: int, Default 50

Width of peak

Returns:

None

Usage:

an = Analysis()

an.analyzeAllPeaksOfCombinationVariant('Avg combo4avgs', nG=8, nE=30, fcutoff=0.5, width=50)
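Finding peaks above a height cutoff and counting how often each position is a peak across bootstrap experiments can be sketched as follows. This is a simplified illustration: it ignores the width parameter and the clustering controlled by nG and nE, and the function names and profiles are hypothetical.

```python
from collections import Counter


def find_peaks_above(values, fcutoff=0.5):
    """Indices of strict local maxima whose height is at least fcutoff."""
    return [i for i in range(1, len(values) - 1)
            if values[i] >= fcutoff
            and values[i - 1] < values[i] > values[i + 1]]


def peak_frequencies(profiles, fcutoff=0.5):
    """Fraction of profiles that have a peak at each position."""
    counts = Counter()
    for profile in profiles:
        counts.update(find_peaks_above(profile, fcutoff))
    return {pos: c / len(profiles) for pos, c in sorted(counts.items())}


# Hypothetical smoothed profiles from three bootstrap experiments.
profiles = [
    [0.0, 1.0, 0.0, 0.2, 0.0],
    [0.0, 0.9, 0.0, 0.8, 0.0],
    [0.0, 0.1, 0.7, 0.1, 0.0],
]
freq = peak_frequencies(profiles)
```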

bootstrapMaxpeakPlot(variant)

Bootstrap max-peak plot

generateAnalysisReport()

Generate analysis report.