.. _geninputs:

Generating inputs for BEELINE
##############################

BoolODE provides additional scripts, located in ``scripts/``, to
process simulation output for use with BEELINE.

Compute Pseudotime using Slingshot
###################################

.. note:: ``runSlingshot.py`` requires the Slingshot Docker container. Make sure
          Docker is set up and the container has been built before running it.


Slingshot computes pseudotime trajectories for a given dataset by
first carrying out dimensionality reduction and then running
*k*-means clustering on the low-dimensional embedding; trajectories
are computed from the resulting clusters. The expected number of
clusters depends on the features of the dataset. For instance, a
dataset with two steady-state clusters should be run with
``--nClusters 3``, which accounts for an additional initial-state
cluster.

``runSlingshot.py`` takes the following command line arguments:

.. code:: text

  -h, --help            show this help message and exit
  --outPrefix=OUTPREFIX
                        Prefix for output files.
  -e EXPR, --expr=EXPR  Path to expression data file
  -p PSEUDO, --pseudo=PSEUDO
                        Path to 'pseudotime' file generated by BoolODE
  -c NCLUSTERS, --nClusters=NCLUSTERS
                        Number of expected clusters in the dataset.
  --noEnd               Do not force Slingshot to have an end state.
  -r PERPLEXITY, --perplexity=PERPLEXITY
                        Perplexity for tSNE.

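As a hedged illustration of how these options fit together, the
Python sketch below assembles a ``runSlingshot.py`` command line for a
dataset with two steady states. The helper name ``slingshot_command``,
the file names, and the output prefix are hypothetical placeholders;
only the flags are taken from the help text above, and the script is
assumed to live in ``scripts/``.

.. code:: python

  import shlex

  def slingshot_command(expr, pseudo, n_steady_states, out_prefix,
                        perplexity=None, no_end=False):
      """Build an argument list for runSlingshot.py (hypothetical helper)."""
      # One cluster per steady state, plus one for the initial state.
      n_clusters = n_steady_states + 1
      cmd = ["python", "scripts/runSlingshot.py",
             "--expr", expr,
             "--pseudo", pseudo,
             "--nClusters", str(n_clusters),
             "--outPrefix", out_prefix]
      if perplexity is not None:
          cmd += ["--perplexity", str(perplexity)]
      if no_end:
          cmd.append("--noEnd")
      return cmd

  # Placeholder paths; substitute the files produced by your simulation.
  cmd = slingshot_command("ExpressionData.csv", "PseudoTime.csv",
                          n_steady_states=2, out_prefix="slingshot-")
  print(shlex.join(cmd))

Passing the list to ``subprocess.run(cmd, check=True)`` would execute
the script, provided the Slingshot Docker container is available.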

.. note:: ``runSlingshot.py`` requires a 'pseudotime' file passed
           using the ``--pseudo`` option. This file should contain the
           actual simulation time from BoolODE, which can then be used
           to assess the quality of the inferred trajectory against
           the actual simulation time values. This file is NOT
           required by Slingshot itself.

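One way to carry out such a comparison is a rank correlation between
the simulation time and the inferred pseudotime. The sketch below
uses Spearman correlation, which is an assumption made here for
illustration and not necessarily the metric used by BoolODE or
BEELINE. The function names and the example values are made up.

.. code:: python

  def ranks(values):
      """Return 1-based ranks, averaging the ranks of tied values."""
      order = sorted(range(len(values)), key=lambda i: values[i])
      result = [0.0] * len(values)
      i = 0
      while i < len(order):
          j = i
          # Extend j over the run of values tied with position i.
          while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
              j += 1
          mean_rank = (i + j) / 2 + 1
          for k in range(i, j + 1):
              result[order[k]] = mean_rank
          i = j + 1
      return result

  def spearman(x, y):
      """Spearman correlation, computed as Pearson correlation of ranks."""
      rx, ry = ranks(x), ranks(y)
      n = len(rx)
      mx, my = sum(rx) / n, sum(ry) / n
      cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
      sx = sum((a - mx) ** 2 for a in rx) ** 0.5
      sy = sum((b - my) ** 2 for b in ry) ** 0.5
      return cov / (sx * sy)  # undefined if either input is constant

  simulation_time = [0.0, 0.5, 1.0, 1.5, 2.0]  # made-up values
  inferred = [0.02, 0.10, 0.35, 0.30, 0.90]    # made-up values
  rho = spearman(simulation_time, inferred)

A value of ``rho`` close to 1 indicates that the inferred ordering of
cells agrees closely with the simulation time.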

Generate dropouts from expression data
########################################

To mimic real single-cell expression datasets, BoolODE includes
``genDropouts.py``, which implements dropouts as described in the
paper: expression values below the ``DROP_CUTOFF`` percentile are
dropped with probability ``DROP_PROB``. The script samples ``NCELLS``
cells from the columns of the expression dataset ``EXPR`` and throws
an error if ``NCELLS`` is greater than the number of columns.
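
The procedure can be sketched in plain Python as follows. This is a
minimal illustration and not the code in ``genDropouts.py``: it
assumes that the cutoff is a per-gene quantile given as a fraction
between 0 and 1, that the cutoff is computed on the sampled cells, and
that a dropped value is set to zero. The names ``quantile`` and
``sample_and_drop`` and the example data are hypothetical.

.. code:: python

  import random

  def quantile(values, q):
      """Linearly interpolated quantile of values, with 0 <= q <= 1."""
      ordered = sorted(values)
      pos = q * (len(ordered) - 1)
      lo = int(pos)
      hi = min(lo + 1, len(ordered) - 1)
      return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

  def sample_and_drop(expr, n_cells, drop_cutoff, drop_prob, rng):
      """expr maps each gene name to a list of values, one per cell."""
      total = len(next(iter(expr.values())))
      if n_cells > total:
          raise ValueError("cannot sample %d cells from %d" % (n_cells, total))
      # Sample cell (column) indices without replacement.
      cells = sorted(rng.sample(range(total), n_cells))
      dropped = {}
      for gene, values in expr.items():
          kept = [values[c] for c in cells]
          cutoff = quantile(kept, drop_cutoff)
          # Values below the cutoff are zeroed with probability drop_prob.
          dropped[gene] = [0.0 if v < cutoff and rng.random() < drop_prob else v
                           for v in kept]
      return cells, dropped

  rng = random.Random(0)  # fixed seed so the sketch is reproducible
  expr = {"g1": [0.1, 0.9, 0.5, 0.3, 0.7, 0.2],
          "g2": [1.2, 0.4, 0.8, 1.0, 0.6, 0.1]}
  cells, dropped = sample_and_drop(expr, n_cells=4, drop_cutoff=0.5,
                                   drop_prob=0.5, rng=rng)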

``genDropouts.py`` takes the following command line arguments:

.. code:: text

  -h, --help            show this help message and exit
  --outPrefix=OUTPREFIX
                        Prefix for output files.
  -e EXPR, --expr=EXPR  Path to expression data file
  -p PSEUDO, --pseudo=PSEUDO
                        Path to pseudotime file
  -r REFNET, --refNet=REFNET
                        Path to reference network file
  -n NCELLS, --nCells=NCELLS
                        Number of cells to sample.
  -d, --dropout         Carry out dropout analysis? [Optional]
  --drop-cutoff=DROP_CUTOFF
                        Specify percentile cutoff on gene expression
  --drop-prob=DROP_PROB
                        Specify the probability of dropping a gene below
                        quantile q. Ensure 0 < DROP_PROB < 1.
  -i SAMPLENUM, --samplenum=SAMPLENUM
                        Sample Number

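For illustration, the Python sketch below assembles a
``genDropouts.py`` command line from the options listed above. The
helper name ``dropout_command``, the file names, and the numeric
values are hypothetical placeholders; in particular, whether
``--drop-cutoff`` expects a fraction or a percentage is not stated
here, so ``0.5`` is only an assumed example. The ``--dropout`` flag is
always included, because without it no values are dropped.

.. code:: python

  def dropout_command(expr, pseudo, ref_net, n_cells, drop_cutoff,
                      drop_prob, out_prefix, sample_num=None):
      """Build an argument list for genDropouts.py (hypothetical helper)."""
      if not 0 < drop_prob < 1:
          raise ValueError("DROP_PROB must satisfy 0 < DROP_PROB < 1")
      cmd = ["python", "scripts/genDropouts.py",
             "--expr", expr,
             "--pseudo", pseudo,
             "--refNet", ref_net,
             "--nCells", str(n_cells),
             "--dropout",
             "--drop-cutoff", str(drop_cutoff),
             "--drop-prob", str(drop_prob),
             "--outPrefix", out_prefix]
      if sample_num is not None:
          cmd += ["--samplenum", str(sample_num)]
      return cmd

  # Placeholder paths and values; adjust them to your own dataset.
  cmd = dropout_command("ExpressionData.csv", "PseudoTime.csv",
                        "refNetwork.csv", n_cells=100, drop_cutoff=0.5,
                        drop_prob=0.5, out_prefix="dropouts-")
  print(" ".join(cmd))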

.. attention:: Ensure the ``--dropout`` option is passed. If not, ``genDropouts.py``
               will still randomly sample cells but will not drop out any values.