Liger2LiGer: Nanopore chimera splitting/detection
Liger2LiGer is a Nanopore chimera splitting/detection tool. It provides automated, end-to-end chimera detection and dataset evaluation starting from FASTQ files.
Documentation
Important Notes
- Module Name: liger2liger (see the modules page for more information)
- Unusual environment variables set:
  - L2L_HOME: installation directory
  - L2L_BIN: executable directory
  - L2L_SRC: source code directory
  - L2L_DATA: sample data directory
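Because these are ordinary environment variables, scripts can use them to locate installed files instead of hard-coding paths. The following is a minimal Python sketch. It assumes only that loading the module exports `L2L_DATA`; the helper `list_sample_data` is hypothetical and not part of Liger2LiGer.

```python
import os
from pathlib import Path
from typing import List


def list_sample_data(var: str = "L2L_DATA") -> List[str]:
    """Return the sorted names of regular files in the directory named by `var`.

    Hypothetical helper: assumes `module load liger2liger` has exported `var`.
    """
    data_dir = os.environ.get(var)
    if not data_dir:
        # Variable missing or empty: the module was most likely not loaded.
        raise RuntimeError(f"{var} is not set; run 'module load liger2liger' first")
    # Subdirectories are skipped; only plain files are listed.
    return sorted(p.name for p in Path(data_dir).iterdir() if p.is_file())
```

This is the Python counterpart of relying on `$L2L_DATA` in the shell, as the `cp $L2L_DATA/* .` step of the sample session does.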
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=4g -c4
[user@cn0911 ~]$ module load liger2liger
[+] Loading singularity 3.10.5 on cn0911
[+] Loading liger2liger 20230901
[user@cn0911 ~]$ evaluate_chimeras.py -h
usage: evaluate_chimeras.py [-h] --ref REF --fastq FASTQ [--dry] [--n_threads N_THREADS]

optional arguments:
  -h, --help            show this help message and exit
  --ref REF, -r REF     Path of reference fasta to use for alignment
  --fastq FASTQ, -i FASTQ
                        comma-separated list of fastq file paths to be aligned
  --dry                 echo the commands instead of executing them
  --n_threads N_THREADS
                        How many threads to use for minimap2 alignment
[user@cn0911 ~]$ extract_reads_from_fastq.py -h
usage: extract_reads_from_fastq.py [-h] --fastq FASTQ --ids IDS

optional arguments:
  -h, --help     show this help message and exit
  --fastq FASTQ  path of file containing FASTA/FASTQ sequence
  --ids IDS      path of file containing 1 id per line to be queried
[user@cn0911 ~]$ filter_paf_by_read_name.py -h
usage: filter_paf_by_read_name.py [-h] --paf PAF --names NAMES

optional arguments:
  -h, --help     show this help message and exit
  --paf PAF      path of PAF file to be filtered
  --names NAMES  path of file containing a list of read ids, one on each line
[user@cn0911 ~]$ generate_chimer_stats.py
usage: generate_chimer_stats.py [-h] --input INPUT [--output_dir OUTPUT_DIR] [--dry]

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT, -i INPUT
                        Comma separated paths of length txt files containing lengths of chimers and non-chimers (one length [in bp] per line). Can run any number of pairs of txt files as long as they have the suffix non_chimer_lengths.txt or chimer_lengths.txt and pairs share a prefix.
  --output_dir OUTPUT_DIR, -o OUTPUT_DIR
                        Optionally create a directory to store the output.
  --dry, -d             Don't create any files, report output paths only.

Run liger2liger on sample data:
[user@cn0911 ~]$ mkdir /data/$USER/L2L && cd /data/$USER/L2L
[user@cn0911 ~]$ cp $L2L_DATA/* .
[user@cn0911 ~]$ ln -s /fdb/igenomes/Drosophila_melanogaster/UCSC/dm6/Sequence/Chromosomes/chrX.fa
[user@cn0911 ~]$ evaluate_chimeras.py --ref chrX.fa --fastq test.fastq --n_threads 2
STARTING: test_VS_chrX
Found executable: /opt/conda/envs/l2l/Liger2LiGer/build/filter_chimeras_from_alignment
minimap2 -x map-ont --secondary=no -n 10 -K 10g -k 17 -t 2 chrX.fa test.fastq
[M::mm_idx_gen::0.863*1.00] collected minimizers
[M::mm_idx_gen::1.122*1.23] sorted minimizers
[M::main::1.122*1.23] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::1.197*1.22] mid_occ = 30
[M::mm_idx_stat] kmer size: 17; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::1.255*1.21] distinct minimizers: 3830080 (96.13% are singletons); average occurrences: 1.118; average spacing: 5.500
[M::worker_pipeline::1.276*1.21] mapped 350 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -x map-ont --secondary=no -n 10 -K 10g -k 17 -t 2 chrX.fa test.fastq
[M::main] Real time: 1.286 sec; CPU: 1.551 sec; Peak RSS: 0.216 GB
Writing chimeric reads to file: "/vf/users/user/L2L/test_VS_chrX/test_VS_chrX.chimeric_reads.txt"
Writing non-chimeric reads to file: "/vf/users/user/L2L/test_VS_chrX/test_VS_chrX.non_chimeric_reads.txt"
Writing chimeric lengths to file: "/vf/users/user/L2L/test_VS_chrX/test_VS_chrX.chimer_lengths.txt"
Writing non-chimeric lengths to file: "/vf/users/user/L2L/test_VS_chrX/test_VS_chrX.non_chimer_lengths.txt"
/vf/users/user/L2L/test_VS_chrX/test_VS_chrX.non_chimer_lengths.txt test_VS_chrX True True
/vf/users/user/L2L/test_VS_chrX/test_VS_chrX.chimer_lengths.txt test_VS_chrX False True
test N50 2625.0
Writing outputs to: /vf/users/user/L2L/test_VS_chrX/results_09_27_2023_09:03:52.csv
Writing outputs to: results_09_27_2023_09:03:52.csv

End the interactive session:
[user@cn0911 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
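The length files written by `evaluate_chimeras.py` (in the sample run, `test_VS_chrX.chimer_lengths.txt` and `test_VS_chrX.non_chimer_lengths.txt`) hold one read length in bp per line, according to the `generate_chimer_stats.py` help text. That is enough to recompute simple summaries such as the chimeric read fraction and N50. The minimal Python sketch below assumes integer lengths. It uses one common N50 definition: the largest length L such that reads of length at least L contain half or more of all bases. The exact calculation behind the `N50` value printed by Liger2LiGer is not documented here, so results may differ. The helper names are hypothetical.

```python
from typing import Iterable, List


def read_lengths(lines: Iterable[str]) -> List[int]:
    """Parse one integer read length (bp) per line, skipping blank lines."""
    return [int(line) for line in lines if line.strip()]


def n50(lengths: List[int]) -> int:
    """Largest length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe form of running >= total / 2
            return length
    raise AssertionError("unreachable: the full sum always reaches half the total")


def chimera_fraction(chimer: List[int], non_chimer: List[int]) -> float:
    """Fraction of all classified reads that were called chimeric."""
    n_reads = len(chimer) + len(non_chimer)
    if n_reads == 0:
        raise ValueError("no reads in either file")
    return len(chimer) / n_reads


def summarize(chimer_path: str, non_chimer_path: str) -> dict:
    """Hypothetical helper: summary stats for one chimer/non-chimer length-file pair."""
    with open(chimer_path) as f:
        chimer = read_lengths(f)
    with open(non_chimer_path) as f:
        non_chimer = read_lengths(f)
    return {
        "n_chimeric": len(chimer),
        "n_non_chimeric": len(non_chimer),
        "chimera_fraction": chimera_fraction(chimer, non_chimer),
        "n50_all_reads": n50(chimer + non_chimer),  # pooled over both files
    }
```

For the sample data, you would pass the two paths under `test_VS_chrX/` to `summarize`. No expected values are given here. The pooled N50 it reports need not equal the `test N50 2625.0` line, because the population that line is computed over is not stated.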