MiXCR on Biowulf
Starting January 1, 2025, running MiXCR on the Biowulf cluster requires a license. Please request a free individual academic license from here.
MiXCR is a universal software for fast and accurate analysis of T- and B- cell receptor repertoire sequencing data.
It works with any kind of sequencing data:
- Bulk repertoire sequencing data with or without UMIs
- Single cell sequencing data including but not limited to 10x Genomics protocols
- RNA-Seq or any other kind of fragmented/shotgun data which may contain just a tiny fraction of target sequences
- and any other kind of sequencing data containing TCRs or BCRs
Powerful downstream analysis tools produce vector plots and tabular results for multiple measures. Key features include:
- Ability to group samples by metadata values and compare repertoire features between groups
- Comprehensive repertoire normalization and filtering
- Statistical significance tests with proper p-value adjustment
- Repertoire overlap analysis
- Vector plots output (.svg / .pdf)
- Tabular outputs
Other key features:
- Clonotype assembly by arbitrary gene feature, including full-length variable region
- PCR / Sequencing error correction with or without aid of UMI or Cell barcodes
- Robust and dedicated aligner algorithms for maximum extraction with zero false-positive rate
- Supports any custom barcode sequences architecture (UMI / Cell)
- Supported species: human, mouse, rat, Spalax, alpaca, monkey
- Supports the IMGT reference
- Barcodes error-correction
- Adapter trimming
- Optional CDR3 reconstruction by assembling overlapping fragmented sequencing reads into complete CDR3-containing contigs when the read position is floating (e.g. shotgun-sequencing, RNA-Seq etc.)
- Optional contig assembly to build longest possible TCR/IG sequence from available data (with or without aid of UMI or Cell barcodes)
- Comprehensive quality control reports provided at all the steps of the pipeline
- Regions not covered by the data may be imputed from germline
- Exhaustive output information for clonotypes and alignments:
  - nucleotide and amino acid sequences of all immunologically relevant regions (FR1, CDR1, ..., CDR3, etc.)
  - identified V, D, J, C genes
  - comprehensive information on nucleotide and amino acid mutations
  - positions of all immunologically relevant points in output sequences
  - and many more informative columns
- Ability to backtrack fate of each raw sequencing read through the whole pipeline
Obtaining license
Documentation
Citation
When using MiXCR under ACADEMIC LICENSE in journal publications, please cite the following publications:
- Dmitriy A. Bolotin, Stanislav Poslavsky, Igor Mitrophanov, Mikhail Shugay, Ilgar Z. Mamedov, Ekaterina V. Putintseva, and Dmitriy M. Chudakov. "MiXCR: software for comprehensive adaptive immunity profiling." Nature Methods 12, no. 5 (2015): 380-381.
(Files referenced in this paper can be found here.)
- Dmitriy A. Bolotin, Stanislav Poslavsky, Alexey N. Davydov, Felix E. Frenkel, Lorenzo Fanchi, Olga I. Zolotareva, Saskia Hemmers, Ekaterina V. Putintseva, Anna S. Obraztsova, Mikhail Shugay, Ravshan I. Ataullakhanov, Alexander Y. Rudensky, Ton N. Schumacher & Dmitriy M. Chudakov. "Antigen receptor repertoire profiling from RNA-seq data." Nature Biotechnology 35, 908–911 (2017)
Important Notes
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program.
Sample session (user input follows the $ prompt):
[user@biowulf]$ sinteractive --cpus-per-task=4 --mem=10g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load mixcr
[user@cn3144 ~]$ mixcr align -r log.txt -t $SLURM_CPUS_PER_TASK \
    /data/$USER/S1.fastq.gz \
    /data/$USER/S2.fastq.gz \
    aln.vdjca
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Batch job
Create a batch input file (e.g. mixcr.sh). For example:
#!/bin/bash
set -e
module load mixcr
mixcr align -r log.txt -t $SLURM_CPUS_PER_TASK \
    /data/$USER/S1.fastq.gz \
    /data/$USER/S2.fastq.gz \
    aln.vdjca
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=4 --mem=10g mixcr.sh
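A batch job that fails because of a missing input wastes time in the queue, so it can help to check the inputs before the align step. The following is a minimal sketch of such a check, not part of MiXCR or the Biowulf setup: the helper name `check_inputs` and the fallback to one thread outside of Slurm are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of MiXCR): returns non-zero if any
# listed input file is missing or empty.
check_inputs() {
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "Missing or empty input: $f" >&2
            return 1
        fi
    done
}

# Assumption: fall back to a single thread when the script is run
# outside of a Slurm allocation, where SLURM_CPUS_PER_TASK is unset.
threads="${SLURM_CPUS_PER_TASK:-1}"
```

In mixcr.sh, these lines could be placed before the align step, with `check_inputs /data/$USER/S1.fastq.gz /data/$USER/S2.fastq.gz` called just before `mixcr align` and `-t "$threads"` used in place of `-t $SLURM_CPUS_PER_TASK`. Because the script sets `set -e`, a failed check stops the job before MiXCR starts.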
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.
Create a swarmfile (e.g. mixcr.swarm). For example:
cd /data/$USER/dir1; mixcr align -r log.txt -t $SLURM_CPUS_PER_TASK S1.fastq.gz S2.fastq.gz aln.vdjca
cd /data/$USER/dir2; mixcr align -r log.txt -t $SLURM_CPUS_PER_TASK S1.fastq.gz S2.fastq.gz aln.vdjca
cd /data/$USER/dir3; mixcr align -r log.txt -t $SLURM_CPUS_PER_TASK S1.fastq.gz S2.fastq.gz aln.vdjca
cd /data/$USER/dir4; mixcr align -r log.txt -t $SLURM_CPUS_PER_TASK S1.fastq.gz S2.fastq.gz aln.vdjca
Submit this job using the swarm command.
swarm -f mixcr.swarm -g 10 -t 4 --module mixcr
where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file)
--module mixcr | Loads the mixcr module for each subjob in the swarm
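For more than a handful of samples, the swarmfile can be generated with a loop instead of being written by hand. The sketch below reproduces the four-line mixcr.swarm shown above. It assumes the same directory names (dir1 to dir4) and file names as that example; adjust them to match your own data.

```shell
#!/bin/bash
# Sketch: generate mixcr.swarm with one line per sample directory.
# Assumes the dir1..dir4 layout from the example swarmfile.
swarmfile=mixcr.swarm
: > "$swarmfile"   # create or truncate the swarmfile

for d in dir1 dir2 dir3 dir4; do
    # The single-quoted format string keeps $USER and $SLURM_CPUS_PER_TASK
    # literal, so they are expanded when each subjob runs, not when this
    # script runs. Only %s is replaced, with the directory name.
    printf 'cd /data/$USER/%s; mixcr align -r log.txt -t $SLURM_CPUS_PER_TASK S1.fastq.gz S2.fastq.gz aln.vdjca\n' "$d" >> "$swarmfile"
done
```

The resulting file is submitted with the same swarm command as above.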