Generating inputs for BEELINE

BoolODE provides additional scripts, located in scripts/, to process simulation output for use with BEELINE.

Compute Pseudotime using Slingshot

Note

runSlingshot.py requires the Slingshot Docker container. Please make sure Docker has been set up and the container has been built.

Slingshot computes pseudotime trajectories for a given dataset in three steps: it reduces the dimensionality of the data, runs k-means clustering on the low-dimensional embedding, and then uses the clusters to compute trajectories. The expected number of clusters depends on the features of the dataset. For instance, a dataset with two steady-state clusters should be run with --nClusters 3, where the extra cluster accounts for the initial state.

runSlingshot.py takes the following command line arguments:

-h, --help            show this help message and exit
--outPrefix=OUTPREFIX
                      Prefix for output files.
-e EXPR, --expr=EXPR  Path to expression data file
-p PSEUDO, --pseudo=PSEUDO
                      Path to 'pseudotime' file generated by BoolODE
-c NCLUSTERS, --nClusters=NCLUSTERS
                      Number of expected clusters in the dataset.
--noEnd               Do not force Slingshot to have an end state.
-r PERPLEXITY, --perplexity=PERPLEXITY
                      Perplexity for tSNE.
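
As an illustration of how the options above fit together, the following minimal sketch assembles a runSlingshot.py command line in Python without executing it. The helper name build_slingshot_command, the input file paths, the output prefix, and the use of a plain python interpreter call are assumptions made for this example; only the option names are taken from the listing above.

```python
import shlex


def build_slingshot_command(expr, pseudo, n_clusters, out_prefix,
                            perplexity=None, no_end=False):
    """Assemble (but do not run) an argument list for runSlingshot.py.

    Hypothetical helper: only the option names come from the help text.
    """
    cmd = [
        "python", "scripts/runSlingshot.py",
        f"--expr={expr}",
        f"--pseudo={pseudo}",
        f"--nClusters={n_clusters}",
        f"--outPrefix={out_prefix}",
    ]
    if perplexity is not None:
        cmd.append(f"--perplexity={perplexity}")
    if no_end:
        cmd.append("--noEnd")
    return cmd


# Two steady states plus one initial-state cluster gives --nClusters=3.
# All paths below are placeholders, not files shipped with BoolODE.
slingshot_cmd = build_slingshot_command(
    expr="path/to/ExpressionData.csv",
    pseudo="path/to/PseudoTime.csv",
    n_clusters=3,
    out_prefix="path/to/output-",
)
print(shlex.join(slingshot_cmd))
```

The resulting list can be passed to subprocess.run once Docker and the Slingshot container are available.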

Note

runSlingshot.py requires a ‘pseudotime’ file passed using the --pseudo option. This file should contain the simulation time recorded by BoolODE, so that the quality of the inferred trajectory can be assessed against the true time values. Slingshot itself does NOT require this file.
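
One possible way to make such a comparison is a rank correlation between the simulation time and the inferred pseudotime. The sketch below computes Spearman's rank correlation in plain Python. The choice of Spearman's coefficient is an assumption made for illustration and is not stated to be what BoolODE or BEELINE uses, and the toy values are made up.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values for five cells (illustrative only).
simulation_time = [0.0, 1.0, 2.0, 3.0, 4.0]
inferred_pseudotime = [0.1, 0.4, 0.3, 0.8, 0.9]
print(spearman(simulation_time, inferred_pseudotime))
```

A value close to 1 indicates that the inferred ordering of cells largely agrees with the simulation time.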

Generate dropouts from expression data

To mimic real single-cell expression datasets, BoolODE includes genDropouts.py, which implements dropouts as described in the paper: each expression value below DROP_CUTOFF is dropped with probability DROP_PROB. The script samples NCELLS cells from the columns of the expression dataset EXPR and throws an error if NCELLS is greater than the number of columns.
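
The idea can be sketched in a few lines of plain Python. This is not the code in genDropouts.py. It assumes that the cutoff is a per-gene percentile in the range 0 to 100, that dropped values are set to zero, and that the data is held as a mapping from gene name to one value per cell. The function names are hypothetical.

```python
import random


def percentile_cutoff(values, q):
    """Return the value at percentile q (0-100) of the sorted data."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(len(ordered) * q / 100))
    return ordered[index]


def apply_dropout(expr, n_cells, drop_cutoff, drop_prob, seed=0):
    """Sample n_cells columns, then zero low values with prob. drop_prob.

    expr maps each gene to a list with one expression value per cell.
    Assumptions: per-gene percentile cutoff, dropped values become 0.0.
    """
    rng = random.Random(seed)
    total_cells = len(next(iter(expr.values())))
    if n_cells > total_cells:
        raise ValueError("n_cells exceeds the number of columns")
    # Sample cells (columns) without replacement.
    cells = sorted(rng.sample(range(total_cells), n_cells))
    dropped = {}
    for gene, values in expr.items():
        sampled = [values[c] for c in cells]
        cutoff = percentile_cutoff(sampled, drop_cutoff)
        # Only values below the cutoff are candidates for dropout.
        dropped[gene] = [
            0.0 if (v < cutoff and rng.random() < drop_prob) else v
            for v in sampled
        ]
    return cells, dropped


# Toy dataset: two genes measured in six cells (illustrative only).
toy_expr = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "g2": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
}
cells, dropped = apply_dropout(toy_expr, n_cells=4, drop_cutoff=50,
                               drop_prob=0.5, seed=1)
print(cells)
print(dropped)
```

Values at or above the cutoff are never altered, so highly expressed genes keep their signal while low values are thinned out at random.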

genDropouts.py takes the following command line arguments:

-h, --help            show this help message and exit
--outPrefix=OUTPREFIX
                      Prefix for output files.
-e EXPR, --expr=EXPR  Path to expression data file
-p PSEUDO, --pseudo=PSEUDO
                      Path to pseudotime file
-r REFNET, --refNet=REFNET
                      Path to reference network file
-n NCELLS, --nCells=NCELLS
                      Number of cells to sample.
-d, --dropout         Carry out dropout analysis? [Optional]
--drop-cutoff=DROP_CUTOFF
                      Specify percentile cutoff on gene expression
--drop-prob=DROP_PROB
                      Specify the probability of dropping a gene below
                      quantile q. Ensure 0 < DROP_PROB < 1.
-i SAMPLENUM, --samplenum=SAMPLENUM
                      Sample Number

Attention

Make sure to pass the --dropout option. Without it, genDropouts.py still randomly samples cells but does not drop out any values.
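
To guard against forgetting --dropout or passing an invalid probability, a small wrapper can assemble the command line. The sketch below only builds the argument list and does not run it. The helper name, the file paths, the sample values, and the plain python interpreter call are assumptions for this example. The cutoff of 50 assumes the value is given as a percentile, as the help text suggests.

```python
import shlex


def build_dropout_command(expr, pseudo, ref_net, n_cells, drop_cutoff,
                          drop_prob, out_prefix, sample_num=None):
    """Assemble (but do not run) an argument list for genDropouts.py.

    Hypothetical helper: only the option names come from the help text.
    """
    # The help text asks for 0 < DROP_PROB < 1.
    if not 0 < drop_prob < 1:
        raise ValueError("drop_prob must satisfy 0 < drop_prob < 1")
    cmd = [
        "python", "scripts/genDropouts.py",
        f"--expr={expr}",
        f"--pseudo={pseudo}",
        f"--refNet={ref_net}",
        f"--nCells={n_cells}",
        "--dropout",  # always included so values are actually dropped
        f"--drop-cutoff={drop_cutoff}",
        f"--drop-prob={drop_prob}",
        f"--outPrefix={out_prefix}",
    ]
    if sample_num is not None:
        cmd.append(f"--samplenum={sample_num}")
    return cmd


# All paths and numbers below are placeholders for illustration.
dropout_cmd = build_dropout_command(
    expr="path/to/ExpressionData.csv",
    pseudo="path/to/PseudoTime.csv",
    ref_net="path/to/refNetwork.csv",
    n_cells=100,
    drop_cutoff=50,
    drop_prob=0.5,
    out_prefix="path/to/output-",
)
print(shlex.join(dropout_cmd))
```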