Reproducibility

This guide shows how to regenerate the figures and model outputs in this repository, including the batch scripts used for the full Midway3 run.

Prerequisites

  • Install the package (PyPI or source). See Installation.
  • Prepare data as described in Data. Paths below assume the repository layout (data/ and results/).

End-to-end workflow (single motif/RBP)

1) Scan PWM

accmix scan \
  -f data/fasta/test.fa \
  -p data/pwms/M00124_example.txt \
  -o results/M00124_example

2) Compute s_l (accessibility)

accmix annotate-acc \
  -n data/AS/ANC1C.hisat3n_table.bed6 \
  -f data/AS/ANC1xC.hisat3n_table.bed6 \
  -t results/M00124_example_topA.tsv.gz \
  -o results/M00124_example_sl.parquet \
  --M 50 --N 500

3) Annotate TSS / conservation / TPM

accmix annotate-tss \
  -i results/M00124_example_sl.parquet \
  -o results/M00124_example_annotated.parquet \
  -r data/evaluation/RNAseq_HeLa_TPM.parquet \
  -c data/evaluation/phastCons100way.bed.gz \
  -p data/evaluation/phastCons100way.parquet \
  -y data/evaluation/phyloP100way.parquet \
  -R data/fasta/test.fa

4) Fit EM model

accmix model \
  -i results/M00124_example_annotated.parquet \
  -o results/RBP_Motif \
  -r ExampleRBP \
  -m M00124

5) Evaluate

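Run evaluation directly on the model parquet written by step 4. XXXXXX in the command below is a placeholder for the run-specific part of the file name; substitute the name of the file actually present in results/: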

accmix evaluate \
  -M results/RBP_Motif.XXXXXX.model.parquet \
  -b data/clipseq/ELAVL1_HeLa.bed \
  -p data/evaluation/PIPseq_HeLa.parquet \
  -r ExampleRBP \
  -m M00124 \
  -o results \
  -L data/logos/M00124_fwd.png \
  -t 1.0 \
  -R 50

Outputs include evaluation plots and tables. Example artifacts from this repo:

  • Heatmap: assets/figures/ExampleRBP_M00124_heatmap_with_title_logo.png
  • Distribution: assets/figures/ExampleRBP_M00124.dist.val.png
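Steps 1 to 4 pass files to each other by name, so it can help to see the chain in one place. The following is a minimal sketch, not part of the package: build_steps and run_steps are hypothetical helper names, and the only accmix flags used are the ones shown in the commands above. It defaults to a dry run that prints each command rather than executing it.

```python
import shlex
import subprocess
from typing import List


def build_steps(prefix: str, fasta: str, pwm: str, rbp: str, motif: str) -> List[List[str]]:
    """Build argv lists for steps 1-4, reusing the flags and data paths shown above."""
    sl = f"{prefix}_sl.parquet"
    annotated = f"{prefix}_annotated.parquet"
    return [
        # 1) Scan PWM
        ["accmix", "scan", "-f", fasta, "-p", pwm, "-o", prefix],
        # 2) Compute s_l; the _topA.tsv.gz name is copied from the example above
        ["accmix", "annotate-acc",
         "-n", "data/AS/ANC1C.hisat3n_table.bed6",
         "-f", "data/AS/ANC1xC.hisat3n_table.bed6",
         "-t", f"{prefix}_topA.tsv.gz", "-o", sl,
         "--M", "50", "--N", "500"],
        # 3) Annotate TSS / conservation / TPM
        ["accmix", "annotate-tss", "-i", sl, "-o", annotated,
         "-r", "data/evaluation/RNAseq_HeLa_TPM.parquet",
         "-c", "data/evaluation/phastCons100way.bed.gz",
         "-p", "data/evaluation/phastCons100way.parquet",
         "-y", "data/evaluation/phyloP100way.parquet",
         "-R", fasta],
        # 4) Fit EM model (output prefix as in the example above)
        ["accmix", "model", "-i", annotated, "-o", "results/RBP_Motif",
         "-r", rbp, "-m", motif],
    ]


def run_steps(steps: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the chain at the first failing step
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_steps(build_steps("results/M00124_example", "data/fasta/test.fa",
                          "data/pwms/M00124_example.txt", "ExampleRBP", "M00124"))
```

Calling run_steps(..., dry_run=False) would execute the commands in order and requires accmix on the PATH. Step 5 is left out because the model file name it needs is only known after step 4 has run.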

Batch runs used for figures

The figures in this repository were generated by running two batch scripts from the repo root:

1) Fit models across Midway3 parquets

python scripts/run_accmix_models_over_midway3_results.py

  • Discovers data/midway3_results/*.parquet and matching CLIP beds.
  • Calls accmix model internals for each RBP/motif pair.
  • Writes per-run model parquet/JSON into results/.
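The discovery step described above can be pictured roughly as follows. This is an illustrative sketch only, not the script's actual logic: the function name is hypothetical, and it assumes parquets are named <RBP>_<Motif>.parquet and that a CLIP bed matches when its file name starts with <RBP>_ (as in ELAVL1_HeLa.bed). The real script may pair files differently.

```python
from pathlib import Path
from typing import List, Tuple


def pair_parquets_with_beds(parquet_dir: Path, bed_dir: Path) -> List[Tuple[str, str, Path, Path]]:
    """Pair each <RBP>_<Motif>.parquet with the first CLIP bed named <RBP>_*.bed."""
    pairs = []
    for parquet in sorted(parquet_dir.glob("*.parquet")):
        # Assumed naming: "ELAVL1_M00124" splits into RBP "ELAVL1" and motif "M00124"
        rbp, sep, motif = parquet.stem.partition("_")
        if not sep:
            continue  # file name does not follow the assumed pattern
        beds = sorted(bed_dir.glob(f"{rbp}_*.bed"))
        if not beds:
            continue  # no CLIP bed for this RBP, so skip the pair
        pairs.append((rbp, motif, parquet, beds[0]))
    return pairs
```

For the repository layout, this would be called as pair_parquets_with_beds(Path("data/midway3_results"), Path("data/clipseq")). Pairs without a CLIP bed are skipped silently here; logging them may be preferable in practice.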

2) Evaluate all fitted models

python scripts/run_accmix_evaluation_over_midway3_results.py

  • For each RBP/motif pair, finds the latest results/<RBP>_<Motif>*.model.parquet.
  • Invokes accmix evaluate -M <model.parquet> -b <CLIP.bed> -p <pipseq.parquet> -r <RBP> -m <Motif>.
  • Plots and logs are written under results/plots/ and results/logs/.
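Picking the latest model file for a pair could look like the sketch below. The helper name is hypothetical, and it assumes "latest" means the most recent modification time; the script itself may order files another way (for example by a timestamp embedded in the name).

```python
from pathlib import Path
from typing import Optional


def latest_model(results_dir: Path, rbp: str, motif: str) -> Optional[Path]:
    """Return the newest <RBP>_<Motif>*.model.parquet in results_dir, or None."""
    # Glob pattern taken from the description above
    candidates = list(results_dir.glob(f"{rbp}_{motif}*.model.parquet"))
    if not candidates:
        return None
    # Assumption: newest modification time wins
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The returned path is what would be passed to accmix evaluate via -M. A None result means no model has been fitted for that pair yet.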

Running these two scripts sequentially reproduces the model fits and evaluation plots shown in the repository.

Rendered examples:

Example heatmap

Example distribution