Core class API
Core module
Description of the package functionality
Module holding the class that implements the analysis pipeline
-
analysisPipeline.process(df1main, df1other, df2main, df2other, dir1, dir2, genesOfInterest=None, knownRegulators=None, nCPUs=4, panels=['fraction', 'binomial', 'top50', 'markers', 'combo3avgs', 'combo4avgs'], parallelBootstrap=False, exprCutoff1=0.05, exprCutoff2=0.05, perEachOtherCase=True, doScramble=False, part1=True, part2=True, part3=True, **kwargs)
Main workflow, implemented for two scenarios selected by the parameter “perEachOtherCase”.
- Parameters:
- df1main: pandas.DataFrame
Expression data of main group of cells of the first species
- df1other: pandas.DataFrame
Expression data of other cells of the first species
- df2main: pandas.DataFrame
Expression data of main group of cells of the second species
- df2other: pandas.DataFrame
Expression data of other cells of the second species
- dir1: str
Path to the first species working directory
- dir2: str
Path to the second species working directory
- genesOfInterest: list, Default None
Particular genes to analyze, e.g. receptors
- knownRegulators: list, Default None
Known marker genes
- nCPUs: int, Default 4
Number of CPUs to use for multiprocessing, recommended 10-20
- panels: list, Default ['fraction', 'binomial', 'top50', 'markers', 'combo3avgs', 'combo4avgs']
Particular measurements to include in the analysis
- parallelBootstrap: boolean, Default False
Whether to generate bootstrap experiments in parallel mode
- exprCutoff1: float, Default 0.05
Per-batch expression cutoff for the first dataset
- exprCutoff2: float, Default 0.05
Per-batch expression cutoff for the second dataset
- perEachOtherCase: boolean, Default True
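Scenario of comparison: whether bootstrap experiments are compared with other bootstrap experiments (True) or with a single case (False)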
- **kwargs
Any other parameters that the class “Analysis” accepts
- Returns:
- Analysis
Analysis instance for the first species
- Analysis
Analysis instance for the second species
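The expression tables passed to process are described in this reference (under prepareDEG) as having genes as rows and (batch, cell) pairs as columns. The minimal sketch below builds a toy table in that layout with pandas and computes, per batch, the fraction of cells expressing each gene. It assumes, without confirmation from this reference, that exprCutoff1 and exprCutoff2 are compared against such fractions; the helper expressed_fraction_per_batch and the "keep a gene if it passes in at least one batch" rule are hypothetical.

```python
import pandas as pd


def expressed_fraction_per_batch(df):
    """Fraction of cells in each batch with nonzero expression (genes x batches).

    Assumes df has genes as rows and a (batch, cell) MultiIndex on the columns.
    """
    return (df > 0).astype(float).T.groupby(level="batch").mean().T


# Toy expression table for the "main" cells of one species.
columns = pd.MultiIndex.from_tuples(
    [("b1", "c1"), ("b1", "c2"), ("b2", "c3"), ("b2", "c4")],
    names=["batch", "cell"],
)
df1main = pd.DataFrame(
    [[1.0, 0.0, 2.0, 3.0],
     [0.0, 0.0, 0.0, 5.0],
     [0.0, 0.0, 0.0, 0.0]],
    index=["GENE_A", "GENE_B", "GENE_C"],
    columns=columns,
)

exprCutoff1 = 0.05  # default value from the process() signature
fractions = expressed_fraction_per_batch(df1main)
# Hypothetical rule: keep genes that pass the cutoff in at least one batch
genes_kept = fractions.index[(fractions >= exprCutoff1).any(axis=1)]
print(list(genes_kept))  # ['GENE_A', 'GENE_B']
```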
-
class Analysis(workingDir='', otherCaseDir='', genesOfInterest=None, knownRegulators=None, nCPUs=1, panels=None, nBootstrap=100, majorMetric='correlation', perEachOtherCase=False, metricsFile='metricsFile.h5', seed=None, PCNpath='data/', minBatches=5, pseudoBatches=10, dendrogramMetric='euclidean', dendrogramLinkageMethod='ward', methodForDEG='ttest')
Bases: object
Class of analysis and visualization functions for DECNEO
- Parameters:
- workingDir: str, Default ‘’
Directory to retrieve and save files and results to
- otherCaseDir: str, Default ‘’
Directory holding comparison (other species) data
- genesOfInterest: list, Default None
Particular genes to analyze, e.g. receptors
- knownRegulators: list, Default None
Known marker genes
- nCPUs: int, Default 1
Number of CPUs to use for multiprocessing, recommended 10-20
- panels: list, Default None
Particular measurements to include in the analysis
- nBootstrap: int, Default 100
Number of bootstrap experiments to perform
- majorMetric: str, Default ‘correlation’
Metric name (e.g. ‘correlation’, ‘cosine’, ‘euclidean’, ‘spearman’)
- methodForDEG: str, Default ‘ttest’
Possible options: {‘ttest’, ‘mannwhitneyu’}
- perEachOtherCase: boolean, Default False
Whether to perform comparisons of bootstrap experiments with other bootstrap experiments or with a single case
- metricsFile: str, Default ‘metricsFile.h5’
Name of file where gene expression distance data is saved for specified metric
- seed: int, Default None
Random seed; set it to make the analysis deterministic
- PCNpath: str, Default ‘data/’
Path to PCN file
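majorMetric names a distance between gene expression profiles, and the signature also exposes dendrogramMetric='euclidean' and dendrogramLinkageMethod='ward'. The hedged sketch below (not DECNEO's own code) computes a gene-by-gene distance matrix with scipy's cdist, treating ‘spearman’ as correlation distance on per-gene ranks, and then orders genes by Ward clustering of the rows of that matrix. Whether DECNEO clusters exactly this matrix is an assumption; the function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import cdist
from scipy.stats import rankdata


def gene_distance_matrix(expr, metric="correlation"):
    """Gene x gene distance matrix from a genes x cells array.

    'correlation', 'cosine' and 'euclidean' are passed straight to cdist;
    'spearman' is computed as correlation distance on per-gene ranks.
    """
    expr = np.asarray(expr, dtype=float)
    if metric == "spearman":
        expr = rankdata(expr, axis=1)
        metric = "correlation"
    return cdist(expr, expr, metric=metric)


def dendrogram_order(dist, method="ward", metric="euclidean"):
    """Leaf order from hierarchical clustering of the rows of a distance matrix."""
    return leaves_list(linkage(dist, method=method, metric=metric))


expr = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],  # perfectly correlated with gene 0
    [5, 4, 3, 2, 1],   # perfectly anti-correlated with gene 0
    [1, 3, 2, 5, 4],
]
D = gene_distance_matrix(expr, metric="correlation")
order = dendrogram_order(D)
print(order)
```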
Methods:
- analyzeAllPeaksOfCombinationVariant(variant)
Find all peaks and their frequency from the bootstrap experiments
- analyzeBootstrapExperiments()
Analyze all bootstrap experiments
- analyzeCase(df_expr[, …])
Analyze, calculate, and generate plots for an individual experiment
- analyzeCombinationVariant(variant)
Analyze a combination of measures (same as in panels)
- bootstrapMaxpeakPlot(variant)
Bootstrap max-peak plot
- compareTwoCases(saveDir1, saveDir2[, name1, …])
Compare gene measurements between two cases for each bootstrap experiment
- Generate analysis report.
- prepareBootstrapExperiments([allDataToo, …])
Prepare bootstrap experiment data and calculate gene statistics for each experiment
- prepareDEG(dfa, dfb[, pvalueLimit])
Save gene expression data of the cell type of interest.
- preparePerBatchCase(**kwargs)
Process gene expression data to generate a per-batch distance measure and save it to file.
- reanalyzeMain([case])
Reanalyze case
- runPairOfExperiments(args)
Analyze the case, compare it with the comparison case, find the genes conserved between the cases, and analyze the case again
- scramble(measures[, subDir, case, N, M, …])
Run control analysis for the dendrogram order
Attributes:
-
standardPanels = ['fraction', 'binomial', 'top50', 'markers']
-
deprecatedPanels = ['PubMedHits', 'gAbove50_PanglaoMouse', 'gAbove50_PanglaoHuman', 'GOpositive', 'GOnegative', 'markerstop50', 'expression', 'closeness', 'age', 'rate']
-
combinationPanels = ['combo3avgs', 'combo4avgs']
-
combinationPanelsDict = {'combo2avgs': ['fraction', 'binomial'], 'combo3avgs': ['fraction', 'top50', 'binomial'], 'combo4avgs': ['fraction', 'top50', 'binomial', 'markers'], 'combo5-1avgs': ['fraction', 'top50', 'binomial', 'markers', 'PubMedHits'], 'combo5-2avgs': ['fraction', 'top50', 'binomial', 'markers', 'gAbove50_PanglaoMouse'], 'combo5-3avgs': ['fraction', 'top50', 'binomial', 'markers', 'gAbove50_PanglaoHuman'], 'combo6-1avgs': ['fraction', 'top50', 'binomial', 'markers', 'PubMedHits', 'gAbove50_PanglaoMouse'], 'combo6-2avgs': ['fraction', 'top50', 'binomial', 'markers', 'PubMedHits', 'gAbove50_PanglaoHuman'], 'combo6-3avgs': ['fraction', 'top50', 'binomial', 'markers', 'gAbove50_PanglaoMouse', 'gAbove50_PanglaoHuman'], 'combo7avgs': ['fraction', 'top50', 'binomial', 'markers', 'PubMedHits', 'gAbove50_PanglaoMouse', 'gAbove50_PanglaoHuman']}
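combinationPanelsDict maps each combination panel name to the individual panels it combines. The sketch below copies a subset of these attribute values into plain module-level variables, expands a panels list into the individual measurements it requires, and reports any that are listed in deprecatedPanels. The helpers base_panels and deprecated_in are hypothetical and not part of the DECNEO API.

```python
# Subset of the Analysis class attributes, copied as plain Python values.
deprecatedPanels = ['PubMedHits', 'gAbove50_PanglaoMouse', 'gAbove50_PanglaoHuman',
                    'GOpositive', 'GOnegative', 'markerstop50', 'expression',
                    'closeness', 'age', 'rate']
combinationPanelsDict = {
    'combo2avgs': ['fraction', 'binomial'],
    'combo3avgs': ['fraction', 'top50', 'binomial'],
    'combo4avgs': ['fraction', 'top50', 'binomial', 'markers'],
    'combo5-1avgs': ['fraction', 'top50', 'binomial', 'markers', 'PubMedHits'],
}


def base_panels(panels, combos=combinationPanelsDict):
    """Expand combination panels into their individual panels,
    keeping first-seen order and dropping duplicates."""
    needed = []
    for panel in panels:
        for base in combos.get(panel, [panel]):
            if base not in needed:
                needed.append(base)
    return needed


def deprecated_in(panels, combos=combinationPanelsDict):
    """Individual panels required by `panels` that are listed as deprecated."""
    return [p for p in base_panels(panels, combos) if p in deprecatedPanels]


default_panels = ['fraction', 'binomial', 'top50', 'markers', 'combo3avgs', 'combo4avgs']
print(base_panels(default_panels))      # ['fraction', 'binomial', 'top50', 'markers']
print(deprecated_in(['combo5-1avgs']))  # ['PubMedHits']
```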
-
prepareDEG(dfa, dfb, pvalueLimit=0.001)
Save gene expression data of the cell type of interest. Create a rank dataframe (df_ranks) with genes ranked by differential expression.
- Parameters:
- dfa: pandas.DataFrame
Dataframe containing expression data for the cell type of interest. Has genes as rows and (batches, cells) as columns
- dfb: pandas.DataFrame
Dataframe containing expression data for cells of types other than the cell type of interest. Has genes as rows and (batches, cells) as columns
- pvalueLimit: float, Default 0.001
Maximum p-value for a gene to be included
- Returns:
None
- Usage:
an = Analysis()
an.prepareDEG(dfa, dfb)
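prepareDEG ranks genes by differential expression, and the class parameter methodForDEG selects between ‘ttest’ and ‘mannwhitneyu’. A minimal per-gene sketch of that idea with scipy.stats follows. The choice of Welch's t-test, the two-sided alternative, and ranking by ascending p-value are assumptions, and rank_deg is a hypothetical helper rather than DECNEO code; for brevity the toy tables use plain cell columns instead of (batch, cell) pairs.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu, ttest_ind


def rank_deg(dfa, dfb, pvalue_limit=0.001, method="ttest"):
    """Hypothetical per-gene differential expression ranking.

    dfa, dfb: genes as rows, cells as columns. Returns genes whose two-sided
    p-value is <= pvalue_limit, sorted from most to least significant.
    """
    pvalues = {}
    for gene in dfa.index.intersection(dfb.index):
        a = dfa.loc[gene].to_numpy(dtype=float)
        b = dfb.loc[gene].to_numpy(dtype=float)
        if method == "ttest":
            p = ttest_ind(a, b, equal_var=False).pvalue  # Welch's t-test
        elif method == "mannwhitneyu":
            p = mannwhitneyu(a, b, alternative="two-sided").pvalue
        else:
            raise ValueError(f"unknown method: {method}")
        pvalues[gene] = p
    ranks = pd.Series(pvalues, name="pvalue", dtype=float)
    return ranks[ranks <= pvalue_limit].sort_values().to_frame()


noise = np.arange(20) % 3  # small deterministic within-group variation
genes = ["GENE_UP", "GENE_MID", "GENE_SAME"]
dfa = pd.DataFrame([10 + noise, 3 + noise, 5 + noise], index=genes)
dfb = pd.DataFrame([1 + noise, 1 + noise, 5 + noise], index=genes)
df_ranks = rank_deg(dfa, dfb)
print(df_ranks)
```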
-
preparePerBatchCase(**kwargs)
Process gene expression data to generate a per-batch distance measure and save it to file. No plots are generated.
- Parameters:
Any parameters that the function ‘analyzeCase’ accepts
- Returns:
None
- Usage:
an = Analysis()
an.preparePerBatchCase()
-
prepareBootstrapExperiments(allDataToo=True, df_ranks=None, parallel=False)
Prepare bootstrap experiment data and calculate gene statistics for each experiment
- Parameters:
- allDataToo: boolean, Default True
Whether to prepare experiment for all data as well
- df_ranks: pandas.DataFrame, Default None
Genes ranked by differential expression. If None, the rank dataframe from the working directory is used
- Returns:
None
- Usage:
an = Analysis()
an.prepareBootstrapExperiments()
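Analysis exposes nBootstrap (default 100) and seed. Assuming, as is standard for bootstrapping but not stated in this reference, that each bootstrap experiment resamples batches with replacement, a seeded plain-Python sketch looks like this; bootstrap_batch_samples is a hypothetical helper.

```python
import random


def bootstrap_batch_samples(batches, n_bootstrap=100, seed=None):
    """Hypothetical sketch: each bootstrap experiment is a resample of the
    batch labels, drawn with replacement and of the same size as the input."""
    rng = random.Random(seed)  # a fixed seed makes the draws reproducible
    return [rng.choices(batches, k=len(batches)) for _ in range(n_bootstrap)]


batches = ["b1", "b2", "b3", "b4", "b5"]
experiments = bootstrap_batch_samples(batches, n_bootstrap=100, seed=0)
print(experiments[0])
```

Using a private random.Random instance, rather than the global generator, keeps the draws reproducible even if other code consumes random numbers in between.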
-
compareTwoCases(saveDir1, saveDir2, name1='N1', name2='N2', saveName='saveName')
Compare gene measurements between two cases for each bootstrap experiment
- Parameters:
- saveDir1: str
Directory storing gene measurement data for case 1
- saveDir2: str
Directory storing gene measurement data for case 2
- name1: str, Default ‘N1’
Phrase to append to keys of the resulting dataframe for case 1
- name2: str, Default ‘N2’
Phrase to append to keys of the resulting dataframe for case 2
- saveName: str, Default ‘saveName’
Name of file to save result dataframe to
- Returns:
None
- Usage:
an = Analysis()
an.compareTwoCases(saveDir1, saveDir2, name1, name2, saveName)
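name1 and name2 are appended to the keys of the resulting dataframe for each case. The sketch below shows one way to do that with pandas: suffix each case's columns and join the two tables on gene name. The space separator, the inner join, and the helper compare_two_cases are assumptions; the sketch does not read from saveDir1/saveDir2 or write to saveName.

```python
import pandas as pd


def compare_two_cases(m1, m2, name1="N1", name2="N2"):
    """Hypothetical sketch: align two per-gene measurement tables on gene
    name, suffixing each column with its case name."""
    left = m1.add_suffix(" " + name1)
    right = m2.add_suffix(" " + name2)
    return left.join(right, how="inner")  # genes measured in both cases only


m1 = pd.DataFrame({"fraction": [0.5, 0.2], "binomial": [3.0, 1.0]},
                  index=["GENE_A", "GENE_B"])
m2 = pd.DataFrame({"fraction": [0.4, 0.9], "binomial": [2.5, 4.0]},
                  index=["GENE_A", "GENE_C"])
result = compare_two_cases(m1, m2, name1="mouse", name2="human")
print(result)
```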
-
runPairOfExperiments(args)
Analyze the case, compare it with the comparison case, find the genes conserved between the cases, and analyze the case again
- Parameters (passed together in the tuple args):
- saveDir: str
Directory with all bootstrap experiments
- saveSubDir: str
Subdirectory for a bootstrap experiment
- otherCaseDir: str
Directory holding comparison data
- Returns:
None
- Usage:
For internal use only
-
analyzeBootstrapExperiments()
Analyze all bootstrap experiments
- Parameters:
None
- Returns:
None
- Usage:
an = Analysis()
an.analyzeBootstrapExperiments()
-
analyzeCombinationVariant(variant)
Analyze a combination of measures (same as in panels)
- Parameters:
- variant: str
Name of combination variant (e.g. ‘Avg combo4avgs’, ‘Avg combo3avgs’)
- Returns:
- pandas.DataFrame
Analysis result
- Usage:
an = Analysis()
an.analyzeCombinationVariant(variant)
-
scramble(measures, subDir='', case='All', N=10000, M=20, getMax=False, maxSuff='')
Run control analysis for the dendrogram order
- Parameters:
- measures: list
Measures (e.g. [‘Markers’, ‘Binomial -log(pvalue)’, ‘Top50 overlap’])
- subDir: str, Default ‘’
Subdirectory to save dataframe to
- N: int, Default 10000
Chunk size
- M: int, Default 20
Number of chunks
- Returns:
None
- Usage:
an = Analysis()
an.scramble(measures)
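scramble runs a control analysis for the dendrogram order using M chunks of N iterations each. As a sketch of a generic permutation control of this kind (the statistic and procedure are assumptions, not DECNEO's implementation): shuffle the order of a measure's values, recompute a summary statistic, and report how often the shuffled statistic reaches the observed one. Here the statistic is the largest centered moving average, and window_max and scramble_pvalue are hypothetical names; chunks run serially in this sketch, whereas splitting into chunks is typically done to distribute work.

```python
import random


def window_max(values, half_window=1):
    """Largest centered moving average over the sequence (windows shrink at the edges)."""
    n = len(values)
    best = float("-inf")
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        best = max(best, sum(values[lo:hi]) / (hi - lo))
    return best


def scramble_pvalue(values, N=1000, M=5, half_window=1, seed=None):
    """Hypothetical permutation control: fraction of N*M random orderings whose
    window_max reaches the observed one (with an add-one correction)."""
    rng = random.Random(seed)
    observed = window_max(values, half_window)
    shuffled = list(values)
    hits = 0
    for _chunk in range(M):  # M chunks ...
        for _ in range(N):   # ... of N random orderings each
            rng.shuffle(shuffled)
            if window_max(shuffled, half_window) >= observed:
                hits += 1
    return (hits + 1) / (N * M + 1)


# Measure values in dendrogram order: the high values sit next to each other
values = [0.0] * 10 + [1.0, 1.0, 1.0] + [0.0] * 10
p = scramble_pvalue(values, N=200, M=5, seed=0)
print(p)
```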
-
analyzeCase(df_expr, toggleCalculateMajorMetric=True, exprCutoff=0.05, toggleExportFigureData=True, toggleCalculateMeasures=True, suffix='', saveDir='', toggleGroupBatches=True, dpi=300, toggleAdjustText=True, markersLabelsRepelForce=1.5, figureSize=(8, 22), toggleAdjustFigureHeight=True, noPlot=False, halfWindowSize=10, printStages=True, externalPanelsData=None, toggleIncludeHeatmap=True, addDeprecatedPanels=False, includeClusterNumber=True, togglePublicationFigure=False)
Analyze, calculate, and generate plots for individual experiment
- Parameters:
- df_expr: pandas.DataFrame
Gene expression data
- toggleCalculateMajorMetric: boolean, Default True
Whether to calculate cdist of major metric. This is a legacy parameter
- exprCutoff: float, Default 0.05
Cutoff for percent expression in a batch of input data
- toggleExportFigureData: boolean, Default True
Whether to export figure data
- toggleCalculateMeasures: boolean, Default True
Whether to calculate measures
- suffix: str, Default ‘’
Name of experiment
- saveDir: str, Default ‘’
Everything is exported to this directory; it should be unique for each dataset
- toggleGroupBatches: boolean, Default True
Whether to group batches or save per-batch distance measure
- dpi: int or ‘figure’, Default 300
Resolution in dots per inch; if ‘figure’, the figure's own dpi value is used
- toggleAdjustText: boolean, Default True
Whether to use an external module to minimize text overlap in the figure
- figureSize: tuple, Default (8, 22)
Width and height in inches
- toggleAdjustFigureHeight: boolean, Default True
Whether to adjust the figure height
- noPlot: boolean, Default False
Whether to skip generating plots
- halfWindowSize: int, Default 10
Moving average half-window size
- printStages: boolean, Default True
Whether to print stage status to output
- externalPanelsData: dict, Default None
Dictionary containing additional panels data
- toggleIncludeHeatmap: boolean, Default True
Whether to include heatmap in figure
- addDeprecatedPanels: boolean, Default False
Whether to include deprecated panels
- Returns:
None
- Usage:
an = Analysis()
an.analyzeCase(df_expr)
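halfWindowSize is the moving average half-window size. Assuming a centered window of 2*halfWindowSize + 1 points that is truncated at the ends of the gene order (the edge handling is a guess, not documented here), a plain-Python sketch with a hypothetical moving_average helper is:

```python
def moving_average(values, half_window=10):
    """Centered moving average over 2*half_window + 1 points.

    Near the ends the window is truncated to the available points.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


print(moving_average([0, 0, 3, 0, 0], half_window=1))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```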
-
reanalyzeMain(case='All', **kwargs)
Reanalyze case
- Parameters:
Any parameters that the function ‘analyzeCase’ accepts
- Returns:
None
- Usage:
an = Analysis()
an.reanalyzeMain()
-
analyzeAllPeaksOfCombinationVariant(variant, nG=8, nE=30, fcutoff=0.5, width=50)
Find all peaks and their frequency from the bootstrap experiments
- Parameters:
- variant: str
Name of combination variant (e.g. ‘Avg combo4avgs’, ‘Avg combo3avgs’)
- nG: int, Default 8
Number of clusters of genes
- nE: int, Default 30
Number of clusters of bootstrap experiments
- fcutoff: float, Default 0.5
Lower peak height cutoff
- width: int, Default 50
Width of peak
- Returns:
None
- Usage:
an = Analysis()
an.analyzeAllPeaksOfCombinationVariant('Avg combo4avgs', nG=8, nE=30, fcutoff=0.5, width=50)
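analyzeAllPeaksOfCombinationVariant takes a lower peak height cutoff (fcutoff) and a peak width. One way to apply both criteria to a single profile, shown here with scipy.signal.find_peaks rather than DECNEO's own peak logic, is to pass them as the height and minimum width arguments. Treating width as a minimum width in samples is an assumption, profile_peaks is a hypothetical helper, and the clustering of genes (nG) and bootstrap experiments (nE) is not shown.

```python
import numpy as np
from scipy.signal import find_peaks


def profile_peaks(profile, fcutoff=0.5, width=50):
    """Indices of peaks at least `fcutoff` high and at least `width` samples wide."""
    peaks, properties = find_peaks(np.asarray(profile, dtype=float),
                                   height=fcutoff, width=width)
    return peaks, properties


x = np.arange(400)
profile = (1.0 * np.exp(-((x - 100) ** 2) / (2 * 30.0 ** 2))    # tall and broad
           + 0.9 * np.exp(-((x - 200) ** 2) / (2 * 5.0 ** 2))   # tall but narrow
           + 0.3 * np.exp(-((x - 300) ** 2) / (2 * 30.0 ** 2)))  # broad but low
peaks, properties = profile_peaks(profile, fcutoff=0.5, width=50)
print(peaks)
```

Only the first bump passes both filters: the second is too narrow and the third is below the height cutoff.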