Generating inputs for BEELINE
BoolODE provides additional scripts, located in scripts/, to
process simulation output for use with BEELINE.
Compute Pseudotime using Slingshot
Note
runSlingshot.py requires the Slingshot Docker container. Please make sure Docker has been set up and the container has been built.
Slingshot computes pseudotime trajectories for a given dataset by
first reducing its dimensionality and then running k-means
clustering on the low-dimensional embedding; the trajectories are
computed from the resulting clusters. The number of clusters to
expect depends on the dataset. For instance, for a dataset with two
steady-state clusters, pass --nClusters 3 to account for the
additional initial-state cluster.
runSlingshot.py takes the following command line arguments:
  -h, --help            show this help message and exit
  --outPrefix=OUTPREFIX
                        Prefix for output files.
  -e EXPR, --expr=EXPR  Path to expression data file
  -p PSEUDO, --pseudo=PSEUDO
                        Path to 'pseudotime' file generated by BoolODE
  -c NCLUSTERS, --nClusters=NCLUSTERS
                        Number of expected clusters in the dataset.
  --noEnd               Do not force Slingshot to have an end state.
  -r PERPLEXITY, --perplexity=PERPLEXITY
                        Perplexity for tSNE.
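As an illustration of the options above, the following minimal sketch builds a runSlingshot.py command line in Python. The helper name slingshot_command, the input file names, and the output prefix are hypothetical; only the option names come from the list above. The cluster count follows the rule of one cluster per steady state plus one for the initial state.

```python
import shlex


def slingshot_command(expr_path, pseudo_path, n_steady_states, out_prefix):
    """Build the argument list for scripts/runSlingshot.py (hypothetical helper)."""
    # One cluster per steady state, plus one for the initial state.
    n_clusters = n_steady_states + 1
    return [
        "python", "scripts/runSlingshot.py",
        "--expr", expr_path,
        "--pseudo", pseudo_path,
        "--nClusters", str(n_clusters),
        "--outPrefix", out_prefix,
    ]


# Hypothetical file names; substitute the files written by your BoolODE run.
cmd = slingshot_command("ExpressionData.csv", "PseudoTime.csv", 2, "slingshot-")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run once Docker and the Slingshot container are available.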
Note
runSlingshot.py requires a ‘pseudotime’ file, passed
using the --pseudo option. This file should contain the
actual simulation time from BoolODE, so that the quality of
the inferred trajectory can be assessed against the actual
simulation time values. This file is NOT required by
Slingshot itself.
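One possible way to carry out such a comparison is a rank correlation between the simulation time and the inferred pseudotime. The sketch below is an assumption for illustration, not the method used by runSlingshot.py; the function names and the example values are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up values for five cells, for illustration only.
simulation_time = [0.0, 1.0, 2.0, 3.0, 4.0]
inferred_pseudotime = [0.05, 0.20, 0.30, 0.70, 0.90]
print(spearman(simulation_time, inferred_pseudotime))
```

A value close to 1 indicates that the inferred ordering of cells agrees with the simulation time.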
Generate dropouts from expression data
In order to mimic real single-cell expression datasets, BoolODE
includes genDropouts.py, which implements dropouts as described in
the paper: expression values below the DROP_CUTOFF percentile are
dropped with probability DROP_PROB. The script samples NCELLS cells
from the columns of the expression dataset EXPR, and throws an error
if NCELLS is greater than the number of columns.
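The procedure described above can be sketched in plain Python as follows. This is not the code in genDropouts.py; it assumes that the cutoff is computed per gene over the sampled cells, that DROP_CUTOFF is given on a 0 to 100 scale, and that a dropped value is set to zero. The names percentile and apply_dropouts are hypothetical.

```python
import random


def percentile(sorted_values, q):
    """Linear-interpolation percentile of pre-sorted values, with q in [0, 100]."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def apply_dropouts(expr, n_cells, drop_cutoff, drop_prob, seed=0):
    """expr maps gene name -> list of expression values, one per cell (column)."""
    rng = random.Random(seed)
    total_cells = len(next(iter(expr.values())))
    if n_cells > total_cells:
        raise ValueError("n_cells exceeds the number of columns in expr")
    # Sample cells (columns) without replacement.
    cells = sorted(rng.sample(range(total_cells), n_cells))
    result = {}
    for gene, values in expr.items():
        sampled = [values[c] for c in cells]
        # Assumption: the cutoff is computed per gene over the sampled cells.
        cutoff = percentile(sorted(sampled), drop_cutoff)
        # Values below the cutoff are zeroed with probability drop_prob.
        result[gene] = [
            0.0 if v < cutoff and rng.random() < drop_prob else v
            for v in sampled
        ]
    return result


# Made-up expression values for two genes across four cells.
expr = {"g1": [0.1, 0.5, 2.0, 3.0], "g2": [1.0, 0.0, 4.0, 0.2]}
dropped = apply_dropouts(expr, n_cells=4, drop_cutoff=50, drop_prob=0.5)
print(dropped)
```

Values at or above the per-gene cutoff are never altered, so only low expression values are affected.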
genDropouts.py takes the following command line arguments:
  -h, --help            show this help message and exit
  --outPrefix=OUTPREFIX
                        Prefix for output files.
  -e EXPR, --expr=EXPR  Path to expression data file
  -p PSEUDO, --pseudo=PSEUDO
                        Path to pseudotime file
  -r REFNET, --refNet=REFNET
                        Path to reference network file
  -n NCELLS, --nCells=NCELLS
                        Number of cells to sample.
  -d, --dropout         Carry out dropout analysis? [Optional]
  --drop-cutoff=DROP_CUTOFF
                        Specify percentile cutoff on gene expression
  --drop-prob=DROP_PROB
                        Specify the probability of dropping a gene below
                        quantile q. Ensure 0 < DROP_PROB < 1.
  -i SAMPLENUM, --samplenum=SAMPLENUM
                        Sample Number
Attention
Ensure the --dropout option is passed. Without it, genDropouts.py
will still randomly sample cells but will not drop out any values.
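A minimal sketch of a genDropouts.py command line that always includes --dropout is shown here. The helper name, file names, and numeric values are hypothetical. Which options are mandatory, and whether --drop-cutoff expects a fraction or a percentage, is not stated in this section, so check the script's --help output before use.

```python
import shlex


def gen_dropouts_command(expr_path, pseudo_path, refnet_path, n_cells,
                         drop_cutoff, drop_prob, out_prefix):
    """Build the argument list for scripts/genDropouts.py (hypothetical helper)."""
    if not 0 < drop_prob < 1:
        raise ValueError("drop_prob must satisfy 0 < drop_prob < 1")
    return [
        "python", "scripts/genDropouts.py",
        "--expr", expr_path,
        "--pseudo", pseudo_path,
        "--refNet", refnet_path,
        "--nCells", str(n_cells),
        "--dropout",  # without this flag, cells are sampled but nothing is dropped
        "--drop-cutoff", str(drop_cutoff),
        "--drop-prob", str(drop_prob),
        "--outPrefix", out_prefix,
    ]


# Hypothetical paths and values, for illustration only.
dropout_cmd = gen_dropouts_command(
    "ExpressionData.csv", "PseudoTime.csv", "refNetwork.csv",
    n_cells=100, drop_cutoff=0.5, drop_prob=0.5, out_prefix="dropouts-",
)
print(shlex.join(dropout_cmd))
```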