=======================
scASprofiler-impute CLI
=======================

The scASprofiler-impute CLI implements a VAE-GAN framework for imputing highly
sparse single-cell alternative splicing (AS) / junction count matrices.

This command supports two modes: training and imputation. In training mode,
scASprofiler learns a conditional generative model that captures
cluster-specific splicing distributions, where each cell is associated with a
discrete label (cluster / cell type). In imputation mode, it uses the trained
decoder to generate in silico splicing profiles and then performs weighted
KNN-style imputation (median over nearest synthetic neighbors) to fill in
missing entries while preserving observed values.

You can generate a complete single-cell splice junction count matrix from the
command line like this:

.. code-block:: bash

  scASprofiler-impute train \
    --data-Sj /out/filter_sj_counts.csv \
    --data-c onlyfillna_as_PRJEB15062_smart_seq2_label.txt \
    --outdir ./ \
    --clusters 2 \
    --n-epochs 1000 \
    --batch-size 8 \
    --drop-prob 0.1 \
    --patience 10 \
    --overwrite \
    --run-impute \
    --name scasp_drop_0.1 \
    -k 10

By default, a single output file is written to the output directory. It
contains all the information needed for quantification with
``scASprofiler-quantify``.

Options
=======

Further parameters are available (``scASprofiler-impute train --help`` always
shows the options for the version you are using):

.. code-block:: html

  parameter settings

  Options:
    data_sj:        path to the splicing junction matrix file (feature × cell;
                    missing values as NaN), used as the model input for
                    training/imputation.
    data_c:         path to the cell label/cluster file (one label per cell,
                    aligned to the columns of the SJ matrix), used for
                    conditional generation.
    outdir:         output directory for saving checkpoints and imputed
                    results.
    name:           user-defined job name used to prefix output files; if
                    empty, a name is derived from the input filenames.
    n_epochs:       number of training epochs.
    batch_size:     mini-batch size used during training.
    drop_prob:      fraction of observed entries randomly masked to build a
                    pseudo-validation set for early stopping and model
                    selection.
    patience:       early-stopping patience; training stops if the validation
                    MSE does not improve for this many epochs.
    threthold:      convergence threshold parameter (reserved for convergence
                    control; may be used to judge training stabilization,
                    depending on the implementation).
    channels:       number of input channels for the reshaped SJ "image"
                    (typically 1 for a single matrix).
    latent_dim:     dimensionality of the latent space in the VAE
                    encoder/decoder.
    clusters:       number of clusters/classes in the label file (one-hot
                    conditioning dimension).
    overwrite:      whether to overwrite existing checkpoints with the same
                    job configuration.
    run_impute:     whether to run imputation immediately after training
                    finishes.
    no_run_impute:  disable automatic post-training imputation.
    sim_size:       number of synthetic samples generated per cluster during
                    imputation (larger values give smoother KNN statistics
                    but cost more time and memory).
    k:              number of nearest synthetic neighbors used in mask-aware
                    KNN imputation.
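
To make the roles of ``k`` and the synthetic samples more concrete, the
imputation step described above can be sketched in a few lines of NumPy. This
is only an illustration under stated assumptions, not the scASprofiler
implementation: the function name ``knn_median_impute`` and its arguments are
hypothetical, the synthetic profiles are assumed to have already been generated
by the decoder for the cell's cluster, the distance is a root-mean-square
difference over the observed features only, and the median is unweighted
because the exact weighting scheme is not described here.

.. code-block:: python

    import numpy as np


    def knn_median_impute(observed, synthetic, k=10):
        """Hypothetical sketch: fill NaNs in `observed` (feature x cell)
        from `synthetic` (feature x synthetic sample, no missing values)."""
        observed = np.asarray(observed, dtype=float)
        synthetic = np.asarray(synthetic, dtype=float)
        imputed = observed.copy()
        k = min(k, synthetic.shape[1])

        for j in range(observed.shape[1]):
            cell = observed[:, j]
            seen = ~np.isnan(cell)
            if seen.all():
                continue  # nothing to fill for this cell
            if seen.any():
                # Mask-aware distance: compare only on features observed in this cell.
                diff = synthetic[seen, :] - cell[seen, None]
                dist = np.sqrt((diff ** 2).mean(axis=0))
            else:
                # No observed features: treat all synthetic samples as equally close.
                dist = np.zeros(synthetic.shape[1])
            nearest = np.argsort(dist, kind="stable")[:k]
            # Median over the k nearest synthetic neighbors, written only into
            # the missing entries so that observed values are preserved.
            imputed[~seen, j] = np.median(synthetic[np.ix_(~seen, nearest)], axis=1)
        return imputed


    # Toy data (assumed for illustration): 3 junctions x 2 cells, 4 synthetic profiles.
    observed = np.array([[1.0, np.nan],
                         [2.0, 5.0],
                         [np.nan, 6.0]])
    synthetic = np.array([[1.0, 1.2, 4.0, 4.2],
                          [2.0, 2.2, 5.0, 5.2],
                          [3.0, 3.2, 6.0, 6.2]])
    filled = knn_median_impute(observed, synthetic, k=2)

In this toy case each cell is matched to the two synthetic profiles closest to
it on its observed junctions, and only the ``NaN`` entries are replaced. A
larger ``sim_size`` would give the median more candidate neighbors to draw
from, which is the trade-off noted for that option.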