=====================
scASprofiler-perp CLI
=====================

scASprofiler provides three command-line tools that become available once the package is installed in your Python environment: ``scASprofiler-perp``, ``scASprofiler-impute``, and ``scASprofiler-quantify``.

This module, ``scASprofiler-perp``, implements the core functionality of **scASprofiler** for building a high-quality single-cell splice junction (SJ) count matrix. Starting from STAR junction outputs, it performs junction preprocessing, gene annotation mapping with a GTF file, intron grouping by shared splice sites (5' or 3'), and multi-round quality control (QC) filtering. The final outputs include group-aware SJ matrices, a junction metadata table, and a normalized/padded matrix ready for downstream modeling.

To generate a filtered single-cell SJ count matrix from a directory of STAR SJ files, run:

.. code-block:: bash

   # plate-based, e.g. Smart-seq2
   scASprofiler-perp run --sj-dir /SJ --gtf gencode.v46.annotation.gtf --outdir ./out \
       --samples-ps 1 --sites-ps 20 --sites-thres 10 --samples-thres 1000

   # droplet-based, e.g. 10x Genomics
   scASprofiler-perp run --sj-dir /SJ --gtf gencode.v46.annotation.gtf --outdir ./out \
       --samples-ps 1 --sites-ps 20 --sites-thres 10 --samples-thres 1000 --x10

By default, three files are written to the output directory: ``filter_sj_counts.csv``, ``sj_meta.csv``, and ``raw_sj_counts.csv``. ``filter_sj_counts.csv`` contains everything needed for imputation with ``scASprofiler-impute``.

Options
=======

More parameters are available; ``scASprofiler-perp run --help`` always lists the options of the version you have installed:

.. code-block:: text

   Options:
     sj_dir: directory of STAR splice junction files (e.g., SJ.out.tab); used
         as the input junction table for filtering and grouping.
     gtf: GTF annotation file path.
     outdir: output directory for all generated results.
     samples_ps: group-wise thresholding parameter; minimum number of observed
         (non-NaN) cells per junction within an intron group. Junctions below
         this are set to missing (NaN).
     sites_ps: group-wise thresholding parameter; minimum total counts per cell
         within an intron group. Cells below this are set to missing (NaN)
         within that group.
     sites_thres: site-level QC threshold; minimum number of expressing cells
         per junction (row-wise non-NaN count) required to retain a junction.
     samples_thres: cell-level QC threshold; minimum number of expressing
         junctions per cell (column-wise non-NaN count) required to retain a cell.
     use_ray: whether to enable Ray parallelism for group-wise threshold
         filtering (useful for large datasets).
     num_cpus: number of CPUs to allocate for Ray-based parallel computation.
     filter_unique_gene: whether to keep only junctions uniquely assigned to a
         single gene (reduces cross-gene ambiguity).
     keep_multi_gene: whether to retain junctions that map to multiple genes
         (less strict; may include ambiguous loci).
     use_multi: whether to include multi-mapped reads/junction counts if present
         in the input matrix (behavior depends on how the upstream counts were
         generated).
     plate: pipeline switch; whether to use the plate-based workflow
         (e.g., Smart-seq2).
     x10: alias for the 10x (droplet-based) pipeline; if enabled, overrides the
         plate/tenx selection.
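
The group-wise thresholds (``sites_ps``, ``samples_ps``) and the QC thresholds (``sites_thres``, ``samples_thres``) can be illustrated with a small NumPy sketch on a junction-by-cell matrix in which NaN marks missing values. This is not scASprofiler's implementation: ``group_threshold`` and ``qc_filter`` are hypothetical names, each junction is assumed to carry a single intron-group label (a simplification, since a junction can share both a 5' and a 3' site with others), and applying ``sites_ps`` before ``samples_ps`` is an assumed order.

.. code-block:: python

   import numpy as np


   def group_threshold(counts, groups, samples_ps, sites_ps):
       """Hypothetical sketch of group-wise thresholding (not scASprofiler's code).

       counts: (n_junctions, n_cells) array of SJ counts.
       groups: one intron-group label per junction (assumption: one group each).
       """
       out = counts.astype(float)  # float copy so NaN can mark missing values
       groups = np.asarray(groups)
       for g in np.unique(groups):
           rows = np.flatnonzero(groups == g)
           block = out[rows]  # fancy indexing returns a copy
           # sites_ps: cells whose total count in this group is too low -> NaN
           cell_totals = np.nansum(block, axis=0)
           block[:, cell_totals < sites_ps] = np.nan
           # samples_ps: junctions observed in too few cells of this group -> NaN
           observed = np.sum(~np.isnan(block), axis=1)
           block[observed < samples_ps, :] = np.nan
           out[rows] = block  # write the thresholded group back
       return out


   def qc_filter(matrix, sites_thres, samples_thres):
       """Drop junctions, then cells, with too few non-NaN entries."""
       keep_rows = np.sum(~np.isnan(matrix), axis=1) >= sites_thres
       matrix = matrix[keep_rows]
       keep_cols = np.sum(~np.isnan(matrix), axis=0) >= samples_thres
       return matrix[:, keep_cols], keep_rows, keep_cols


   # Toy data: 3 junctions x 4 cells; junctions 0 and 1 share an intron group.
   counts = np.array([[5, 0, 12, 1],
                      [3, 0, 8, 0],
                      [0, 2, 0, 0]])
   filtered = group_threshold(counts, ["g1", "g1", "g2"], samples_ps=2, sites_ps=5)
   kept, rows, cols = qc_filter(filtered, sites_thres=2, samples_thres=1)
   print(kept.tolist())  # [[5.0, 12.0], [3.0, 8.0]]

Because each intron group is thresholded independently of the others, the loop over groups is the natural unit of parallel work, which is what the ``use_ray`` option describes for large datasets.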
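
Intron grouping by splice site can be sketched from the STAR ``SJ.out.tab`` format, whose tab-separated columns are: chromosome, first and last intronic base (1-based), strand (0 undefined, 1 ``+``, 2 ``-``), intron motif, annotation flag, uniquely mapping read count, multi-mapping read count, and maximum spliced alignment overhang. The sketch is hypothetical: ``read_sj_tab`` and ``splice_site_groups`` are illustrative names, adding the multi-mapping column only loosely mirrors what ``use_multi`` describes, and discarding sites used by a single junction is an assumption, not documented scASprofiler behavior. GTF-based gene mapping is omitted.

.. code-block:: python

   from collections import defaultdict

   STRAND = {"0": ".", "1": "+", "2": "-"}  # STAR SJ.out.tab strand codes


   def read_sj_tab(lines, use_multi=False):
       """Parse SJ.out.tab lines into {(chrom, start, end, strand): count}."""
       counts = {}
       for line in lines:
           f = line.rstrip("\n").split("\t")
           if len(f) < 9:
               continue  # skip blank or malformed lines
           key = (f[0], int(f[1]), int(f[2]), STRAND[f[3]])
           # column 7: unique reads; column 8: multi-mapping reads (optional)
           counts[key] = int(f[6]) + (int(f[7]) if use_multi else 0)
       return counts


   def splice_site_groups(junctions):
       """Group junctions sharing a 5' (donor) or 3' (acceptor) splice site."""
       groups = {"5p": defaultdict(list), "3p": defaultdict(list)}
       for chrom, start, end, strand in junctions:
           if strand == ".":
               continue  # undefined strand: donor/acceptor cannot be assigned
           # On the minus strand the donor is the intron's last (highest) base.
           donor, acceptor = (start, end) if strand == "+" else (end, start)
           junction = (chrom, start, end, strand)
           groups["5p"][(chrom, strand, donor)].append(junction)
           groups["3p"][(chrom, strand, acceptor)].append(junction)
       # Assumption: only sites shared by 2+ junctions form an intron group.
       return {side: {site: js for site, js in g.items() if len(js) > 1}
               for side, g in groups.items()}


   sj_lines = [
       "chr1\t100\t200\t1\t1\t1\t10\t2\t30\n",
       "chr1\t100\t300\t1\t1\t0\t4\t0\t25\n",
       "chr1\t500\t900\t2\t2\t1\t7\t1\t40\n",
       "chr1\t600\t900\t2\t2\t1\t3\t0\t40\n",
   ]
   grouped = splice_site_groups(read_sj_tab(sj_lines))
   print(sorted(grouped["5p"]))  # [('chr1', '+', 100), ('chr1', '-', 900)]

In this toy input the two ``+`` junctions share the donor at position 100 and the two ``-`` junctions share the donor at position 900, so both pairs form 5' groups while no 3' site is shared.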