Reproducibility

This guide shows how to regenerate the figures and model outputs in this repository, including the batch scripts used for the full Midway3 run.

Prerequisites

Install the package from PyPI or from source; see Installation.
Prepare the data as described in Data. The paths below assume the repository layout, with inputs under data/ and outputs under results/.

End-to-end workflow (single motif/RBP)

1) Scan PWM

accmix scan \
  -f data/fasta/test.fa \
  -p data/pwms/M00124_example.txt \
  -o results/M00124_example

2) Compute s_l (accessibility)

accmix annotate-acc \
  -n data/AS/ANC1C.hisat3n_table.bed6 \
  -f data/AS/ANC1xC.hisat3n_table.bed6 \
  -t results/M00124_example_topA.tsv.gz \
  -o results/M00124_example_sl.parquet \
  --M 50 --N 500

3) Annotate TSS / conservation / TPM

accmix annotate-tss \
  -i results/M00124_example_sl.parquet \
  -o results/M00124_example_annotated.parquet \
  -r data/evaluation/RNAseq_HeLa_TPM.parquet \
  -c data/evaluation/phastCons100way.bed.gz \
  -p data/evaluation/phastCons100way.parquet \
  -y data/evaluation/phyloP100way.parquet \
  -R data/fasta/test.fa

4) Fit EM model

accmix model \
  -i results/M00124_example_annotated.parquet \
  -o results/RBP_Motif \
  -r ExampleRBP \
  -m M00124

5) Evaluate

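Run evaluation directly on the model parquet written in step 4. XXXXXX below is a placeholder for the run-specific part of the file name; substitute the name of the file actually written to results/.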

accmix evaluate \
  -M results/RBP_Motif.XXXXXX.model.parquet \
  -b data/clipseq/ELAVL1_HeLa.bed \
  -p data/evaluation/PIPseq_HeLa.parquet \
  -r ExampleRBP \
  -m M00124 \
  -o results \
  -L data/logos/M00124_fwd.png \
  -t 1.0 \
  -R 50

Outputs include evaluation plots and tables. Example artifacts from this repo:

Heatmap: assets/figures/ExampleRBP_M00124_heatmap_with_title_logo.png
Distribution: assets/figures/ExampleRBP_M00124.dist.val.png
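
Steps 1 to 4 above feed each other through a shared output prefix, so they can be chained from a small driver. The following is a minimal sketch, not part of accmix: `build_commands` and its `prefix` parameter are hypothetical names, and the flags and paths are copied from the commands shown above. It only assembles and prints the commands (`shlex.join` needs Python 3.8 or later).

```python
import shlex


def build_commands(prefix="results/M00124_example"):
    """Return argv lists for steps 1-4, reusing the flags shown in this guide.

    `build_commands` and `prefix` are illustrative names, not accmix API.
    """
    sl = f"{prefix}_sl.parquet"                # output of step 2, input of step 3
    annotated = f"{prefix}_annotated.parquet"  # output of step 3, input of step 4
    return [
        ["accmix", "scan",
         "-f", "data/fasta/test.fa",
         "-p", "data/pwms/M00124_example.txt",
         "-o", prefix],
        ["accmix", "annotate-acc",
         "-n", "data/AS/ANC1C.hisat3n_table.bed6",
         "-f", "data/AS/ANC1xC.hisat3n_table.bed6",
         "-t", f"{prefix}_topA.tsv.gz",
         "-o", sl,
         "--M", "50", "--N", "500"],
        ["accmix", "annotate-tss",
         "-i", sl,
         "-o", annotated,
         "-r", "data/evaluation/RNAseq_HeLa_TPM.parquet",
         "-c", "data/evaluation/phastCons100way.bed.gz",
         "-p", "data/evaluation/phastCons100way.parquet",
         "-y", "data/evaluation/phyloP100way.parquet",
         "-R", "data/fasta/test.fa"],
        ["accmix", "model",
         "-i", annotated,
         "-o", "results/RBP_Motif",
         "-r", "ExampleRBP",
         "-m", "M00124"],
    ]


if __name__ == "__main__":
    # Print the commands in order; nothing is executed here.
    for argv in build_commands():
        print(shlex.join(argv))
```

Replacing the `print` call with `subprocess.run(argv, check=True)` would execute each step in turn and stop at the first failure.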

Batch runs used for figures

The figures in this repository were generated by running two batch scripts from the repo root:

1) Fit models across Midway3 parquets

python scripts/run_accmix_models_over_midway3_results.py

Discovers data/midway3_results/*.parquet and matching CLIP beds.
Calls accmix model internals for each RBP/motif pair.
Writes per-run model parquet/JSON into results/.
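
The discovery step can be pictured roughly as follows. This is a sketch under stated assumptions, not the script's actual logic: the file naming (`<RBP>_<Motif>.parquet` for inputs, `<RBP>_*.bed` for CLIP beds under data/clipseq/) is assumed for illustration, and `discover_pairs` is a hypothetical helper.

```python
from pathlib import Path


def discover_pairs(parquet_dir="data/midway3_results", clip_dir="data/clipseq"):
    """Yield (rbp, motif, parquet, clip_bed) for parquets that have a CLIP bed.

    Assumed naming, not confirmed by the repository:
    '<RBP>_<Motif>.parquet' and '<RBP>_*.bed'.
    """
    for parquet in sorted(Path(parquet_dir).glob("*.parquet")):
        rbp, sep, motif = parquet.stem.partition("_")
        if not sep:
            continue  # file name does not follow the assumed pattern
        beds = sorted(Path(clip_dir).glob(f"{rbp}_*.bed"))
        if beds:
            # Take the first match in sorted order; pairs without a bed are skipped.
            yield rbp, motif, parquet, beds[0]


if __name__ == "__main__":
    for rbp, motif, parquet, bed in discover_pairs():
        print(rbp, motif, parquet, bed)
```

Check the script itself for the real matching rule before relying on this pairing.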

2) Evaluate all fitted models

python scripts/run_accmix_evaluation_over_midway3_results.py

For each RBP/motif pair, finds the latest results/<RBP>_<Motif>*.model.parquet.
Invokes accmix evaluate -M <model.parquet> -b <CLIP.bed> -p <pipseq.parquet> -r <RBP> -m <Motif>.
Plots and logs are written under results/plots/ and results/logs/.
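
The "find the latest model, then evaluate" step can be sketched as below. Two things here are assumptions: "latest" is taken to mean most recently modified, and `latest_model` and `evaluate_argv` are hypothetical helper names. The `accmix evaluate` flags are the ones listed above.

```python
from pathlib import Path


def latest_model(results_dir, rbp, motif):
    """Return the newest <RBP>_<Motif>*.model.parquet by modification time, or None."""
    candidates = list(Path(results_dir).glob(f"{rbp}_{motif}*.model.parquet"))
    if not candidates:
        return None
    # Assumption: "latest" means the most recently modified file.
    return max(candidates, key=lambda p: p.stat().st_mtime)


def evaluate_argv(model, clip_bed, pipseq, rbp, motif):
    """Build the accmix evaluate command line as an argv list."""
    return ["accmix", "evaluate",
            "-M", str(model),
            "-b", str(clip_bed),
            "-p", str(pipseq),
            "-r", rbp,
            "-m", motif]


if __name__ == "__main__":
    model = latest_model("results", "ExampleRBP", "M00124")
    if model is None:
        print("no fitted model found; run the model-fitting script first")
    else:
        print(evaluate_argv(model, "data/clipseq/ELAVL1_HeLa.bed",
                            "data/evaluation/PIPseq_HeLa.parquet",
                            "ExampleRBP", "M00124"))
```

Because the pattern ends in a wildcard, a motif ID that is a prefix of another ID would also match that motif's files; a stricter pattern may be needed if such IDs occur.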

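Running these two scripts in order, model fitting first and evaluation second, reproduces the model fits and evaluation plots shown in the repository. The order matters because the evaluation script reads the model parquets that the first script writes to results/.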

Rendered examples:

Example heatmap

Example distribution