BLEval package

Submodules

BLEval.AUPRC module

class BLEval.AUPRC.AUPRC[source]

Bases: Evaluator

Evaluator that computes the area under the Precision-Recall curve (AUPRC) for each algorithm against the ground truth network.

For each DatasetGroup, writes AUPRC.csv to dataset_path. Rows are algorithms and columns are run_ids. Runs whose ground truth file is missing are skipped with a warning.
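As an illustration of the metric (not BLEval's implementation, which operates on pandas DataFrames), a minimal pure-Python sketch that approximates the PR-curve area as average precision over a ranked edge list; the function name and data layout are assumptions:

```python
def average_precision(scored_edges, true_edges):
    """Approximate the PR-curve area as average precision.

    scored_edges: dict mapping (gene1, gene2) -> predicted weight.
    true_edges:   set of (gene1, gene2) ground truth edges.
    """
    # Rank edges by predicted weight, highest first.
    ranked = sorted(scored_edges, key=scored_edges.get, reverse=True)
    hits, ap = 0, 0.0
    for i, edge in enumerate(ranked, start=1):
        if edge in true_edges:
            hits += 1
            ap += hits / i          # precision at each recall step
    return ap / len(true_edges) if true_edges else float("nan")
```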

BLEval.AUROC module

class BLEval.AUROC.AUROC[source]

Bases: Evaluator

Evaluator that computes the area under the ROC curve (AUROC) for each algorithm against the ground truth network.

For each DatasetGroup, writes AUROC.csv to dataset_path. Rows are algorithms and columns are run_ids. Runs whose ground truth file is missing are skipped with a warning.
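The AUROC has a convenient rank-based identity: it equals the probability that a randomly chosen true edge receives a higher score than a randomly chosen non-edge. A sketch under the same assumed data layout as above (again illustrative, not BLEval's code):

```python
def auroc(scored_edges, true_edges):
    """AUROC via the rank-sum (Mann-Whitney) identity.

    Equals the probability that a true edge outranks a non-edge;
    tied scores contribute 0.5.
    """
    pos = [s for e, s in scored_edges.items() if e in true_edges]
    neg = [s for e, s in scored_edges.items() if e not in true_edges]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```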

BLEval.BLTime module

class BLEval.BLTime.BLTime[source]

Bases: Evaluator

Evaluator that reports the CPU time consumed by each algorithm.

Timing files (time*.txt) are produced by the GNU time utility (time -v) and written to each algorithm’s working_dir during the run phase. CPU time is defined as user time + system time; multi-trajectory algorithms may produce multiple files whose values are summed.

For each DatasetGroup, writes time.csv to dataset_path. Rows are algorithms and columns are run_ids. Missing timing files produce nan entries. A dictionary mapping algorithm name to CPU time (in seconds) is returned for each dataset group.
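The user and system times can be pulled out of one time*.txt with a few lines of standard-library Python; a sketch (the function name and regular expressions are illustrative, not BLEval's actual parser):

```python
import re

def cpu_seconds(time_v_text):
    """Extract user + system CPU time from GNU `time -v` output."""
    user = re.search(r"User time \(seconds\): ([\d.]+)", time_v_text)
    sys_ = re.search(r"System time \(seconds\): ([\d.]+)", time_v_text)
    if user is None or sys_ is None:
        return float("nan")     # missing/malformed file -> nan entry
    return float(user.group(1)) + float(sys_.group(1))
```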

BLEval.Borda module

class BLEval.Borda.Borda[source]

Bases: Evaluator

Evaluator that aggregates per-algorithm ranked edge lists into a single consensus ranking using Borda count.

For each run, four aggregation methods are computed: mean, median, min, and max of per-algorithm Borda scores. Per-run scores for each method are then summarised across runs by taking the median. The final output is a single file written to dataset_path:

dataset_path/Borda.csv

Columns: Gene1, Gene2, BORDA (median of mean), mBORDA (median of median), sBORDA (median of min), smBORDA (median of max). Rows are sorted by BORDA descending.
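The per-run aggregation step can be sketched as follows (a simplified illustration, assuming plain best-first edge lists rather than BLEval's DataFrames; the real evaluator additionally takes the median of these scores across runs):

```python
from statistics import mean

def borda_aggregate(rankings, agg=mean):
    """Combine ranked edge lists (best-first) with a Borda count.

    Each edge earns (n - rank) points per algorithm, where n is the
    size of the edge universe; edges an algorithm did not rank earn 0.
    Per-algorithm scores are combined with `agg` (mean, median, min,
    or max, mirroring the four variants above).
    """
    universe = {e for r in rankings for e in r}
    n = len(universe)
    scores = {
        e: agg([n - r.index(e) if e in r else 0 for r in rankings])
        for e in universe
    }
    # Consensus ranking, best first.
    return sorted(universe, key=lambda e: -scores[e])
```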

BLEval.EarlyPrecision module

class BLEval.EarlyPrecision.EarlyPrecision(tf_edges: bool = False)[source]

Bases: Evaluator

Evaluator that computes the Early Precision Ratio (EPR) for each algorithm against the ground truth network.

EPR equals early precision (fraction of true positives in top-k predictions) divided by the expected precision of a random predictor over the candidate edge universe. k equals the number of ground truth edges excluding self-loops. EPR > 1 indicates better-than-random performance; EPR = 1 is the random baseline; EPR < 1 is below random.

When tf_edges is True, the candidate universe and predictions are restricted to TF→gene edges (appropriate for experimental scRNA-seq data). When False, all directed non-self-loop gene pairs form the universe (appropriate for synthetic/simulated data).

For each DatasetGroup, writes EarlyPrecision.csv to dataset_path. Rows are algorithms and columns are run_ids.
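A minimal sketch of the EPR formula described above (illustrative names; BLEval builds the candidate universe from the expression data and, when tf_edges is set, the TF list):

```python
def epr(ranked_edges, true_edges, universe_size):
    """Early Precision Ratio: top-k precision over the random baseline."""
    k = len(true_edges)              # |ground truth edges|, no self-loops
    top_k = set(ranked_edges[:k])
    early_precision = len(top_k & true_edges) / k
    random_precision = len(true_edges) / universe_size
    return early_precision / random_precision
```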

BLEval.Jaccard module

class BLEval.Jaccard.Jaccard[source]

Bases: Evaluator

Evaluator that computes the median pairwise Jaccard index of top-k predicted networks across runs sharing the same ground truth.

The Jaccard index measures the stability of each algorithm’s predictions: runs within a DatasetGroup are perturbations of the same biological system, so a high median Jaccard indicates that the algorithm produces consistent top-k edge sets regardless of sampling noise.

k is set to the number of edges in the ground truth network (excluding self-loops), consistent with the EarlyPrecision evaluator. For each DatasetGroup, writes Jaccard.csv to dataset_path. Rows are algorithms and the single column is the median Jaccard index across all run pairs. DatasetGroups with fewer than two runs produce nan for all algorithms.
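The pairwise computation is straightforward once each run's top-k edge set is extracted; a sketch (function name assumed, not BLEval's):

```python
from itertools import combinations
from statistics import median

def median_jaccard(runs_top_k):
    """Median pairwise Jaccard index over per-run top-k edge sets."""
    if len(runs_top_k) < 2:
        return float("nan")          # fewer than two runs -> nan
    scores = []
    for a, b in combinations(runs_top_k, 2):
        union = a | b
        scores.append(len(a & b) / len(union) if union else float("nan"))
    return median(scores)
```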

BLEval.Motifs module

class BLEval.Motifs.Motifs[source]

Bases: Evaluator

Evaluator that computes ratios of three-node feedback loop (FBL), three-node feedforward loop (FFL), and two-node mutual interaction (MI) motif counts between the predicted top-k network and the reference network.

For each algorithm and run, the top-k predicted edges (k = number of ground truth edges, excluding self-loops) are used to build a directed graph. Motif counts in that graph are divided by the corresponding counts in the ground truth network to produce dimensionless ratios.

For each DatasetGroup, writes three CSV files to dataset_path:
  • motifs_FBL.csv — three-node feedback loop ratios

  • motifs_FFL.csv — three-node feedforward loop ratios

  • motifs_MI.csv — two-node mutual interaction ratios

Each CSV has algorithms as rows and run_ids as columns.

BLEval.PathStats module

class BLEval.PathStats.PathStats[source]

Bases: Evaluator

Evaluator that characterises false positive predicted edges by their shortest-path distance in the ground truth network.

For each algorithm, the top-k predicted edges (k = |reference edges|) are split into true positives and false positives. Each false positive (u, v) is looked up in the reference graph; if a directed path from u to v exists, its length is binned (2–5); otherwise it is counted as having no path.

For each DatasetGroup, one CSV per run is written to dataset_path:

dataset_path/PathStats_{run_id}.csv

Rows are algorithms; columns are ‘0’, ‘2’, ‘3’, ‘4’, ‘5’, numPred, numTP, numFP_withPath, numFP_noPath. Runs whose ground truth file is missing are skipped with a warning.
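The shortest-path lookup is a plain BFS over the reference graph's adjacency lists; a sketch (function name assumed):

```python
from collections import deque

def path_length(adj, src, dst):
    """Shortest directed path length from src to dst via BFS, or None.

    adj: dict mapping node -> iterable of successor nodes.
    """
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None                      # no directed path exists
```

Since a false positive (u, v) is by definition absent from the reference graph, any path found this way has length at least 2.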

BLEval.SignedEarlyPrecision module

class BLEval.SignedEarlyPrecision.SignedEarlyPrecision[source]

Bases: Evaluator

Evaluator that computes early precision separately for activation and inhibitory edges in the ground truth network.

For activation edges, the top-ka predictions are inspected, where ka is the number of activation edges in the reference network excluding self-loops. Inhibitory precision is computed analogously with ki. For each DatasetGroup, writes EarlyPrecisionActivation.csv and EarlyPrecisionInhibitory.csv to dataset_path. Rows are algorithms and columns are run_ids. Runs whose ground truth file is missing or lacks a Type column are skipped with a warning.

BLEval.Spearman module

class BLEval.Spearman.Spearman[source]

Bases: Evaluator

Evaluator that measures how consistently each algorithm ranks edges across runs within a DatasetGroup.

For each algorithm, all pairs of runs are compared by computing the Spearman rank correlation of their signed EdgeWeight vectors over the fixed universe of all directed gene pairs in the ground truth (missing edges receive weight 0, signed weights preserved). Weight arrays are precomputed once per run and all pairwise correlations are obtained via a single matrix call to scipy.stats.spearmanr. The median and mean absolute deviation of all pairwise correlations are reported. For each DatasetGroup, writes Spearman.csv to dataset_path. Rows are algorithms; columns are MedianSpearman and MADSpearman.
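For illustration, a pure-Python Spearman correlation with average-rank tie handling, equivalent for a single pair of vectors to what scipy.stats.spearmanr computes (BLEval uses the scipy call; this stand-in assumes non-constant inputs):

```python
def spearman(x, y):
    """Spearman correlation: Pearson correlation of average ranks."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
                j += 1               # extend over a tie group
            avg = (i + j) / 2 + 1    # average rank for the tie group
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```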

BLEval.data module

class BLEval.data.DatasetGroup(dataset_id: str, runs: List[RunResult], dataset_path: Path)[source]

Bases: object

All runs belonging to one dataset entry in the config.

Runs in a DatasetGroup share the same ground truth network and represent multiple perturbations or noise realisations of the same biological system. Evaluation metrics (e.g. Jaccard, Spearman) are aggregated across runs within a group.

class BLEval.data.EvaluationData(config: dict, root: Optional[Path] = None)[source]

Bases: object

Loads and organises predicted networks from the output directory, mirroring the hierarchical structure of the ‘datasets’ section of config.yaml.

Top level: datasets (DatasetGroup), each grouping multiple runs that share the same ground truth. Within each run, ranked edge lists are keyed by algorithm name. Algorithms with missing output files are skipped.

class BLEval.data.RunResult(run_id: str, ranked_edges: Dict[str, DataFrame], ground_truth_path: Path, run_path: Path)[source]

Bases: object

Predicted networks and ground truth reference for a single run.

ranked_edges maps algorithm name to its ranked edge list DataFrame (columns: Gene1, Gene2, EdgeWeight). Algorithms whose rankedEdges.csv is missing are omitted with a warning rather than raising an error.
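The hierarchy these classes form can be mirrored with plain dataclasses (a sketch: tuples stand in for pandas DataFrames, and the names and paths below are hypothetical):

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List

Edge = tuple  # (Gene1, Gene2, EdgeWeight)

@dataclass
class RunResult:
    run_id: str
    ranked_edges: Dict[str, List[Edge]]  # algorithm name -> edge list
    ground_truth_path: Path
    run_path: Path

@dataclass
class DatasetGroup:
    dataset_id: str
    runs: List[RunResult]                # runs sharing one ground truth
    dataset_path: Path

# Traversal: dataset group -> runs -> per-algorithm edge lists.
run = RunResult("run_0", {"GENIE3": [("A", "B", 0.9)]},
                Path("inputs/refNetwork.csv"), Path("outputs/run_0"))
group = DatasetGroup("dataset_1", [run], Path("outputs/dataset_1"))
algos = {a for r in group.runs for a in r.ranked_edges}
```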

BLEval.evaluator module

class BLEval.evaluator.Evaluator[source]

Bases: ABC

Abstract base class for BEELINE evaluation methods.

Each subclass implements __call__ to compute a specific evaluation metric over an EvaluationData object and write results to disk. Output is written to each DatasetGroup’s dataset_path so results are co-located with the predicted networks they describe.
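A toy subclass illustrates the contract (EdgeCount is invented here for illustration, and its simplified dict argument stands in for the EvaluationData object):

```python
from abc import ABC, abstractmethod

class Evaluator(ABC):
    """Sketch of the documented interface, not the real class."""
    @abstractmethod
    def __call__(self, data):
        """Compute a metric over `data` and report the results."""

class EdgeCount(Evaluator):
    """Toy evaluator: number of predicted edges per algorithm."""
    def __call__(self, data):
        # data: dict mapping algorithm name -> ranked edge list
        return {algo: len(edges) for algo, edges in data.items()}
```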

Module contents