MiXCR on Biowulf

Starting January 1, 2025, each user must request a free individual academic license to run MiXCR on the Biowulf cluster (see "Obtaining license" below).

MiXCR is a universal software for fast and accurate analysis of T- and B- cell receptor repertoire sequencing data.

It works with any kind of sequencing data:

Bulk repertoire sequencing data with or without UMIs
Single cell sequencing data including but not limited to 10x Genomics protocols
RNA-Seq or any other kind of fragmented/shotgun data which may contain just a tiny fraction of target sequences
and any other kind of sequencing data containing TCRs or BCRs

Downstream analysis tools produce vector plots and tabular results for multiple measures. Key features include:

Ability to group samples by metadata values and compare repertoire features between groups
Comprehensive repertoire normalization and filtering
Statistical significance tests with proper p-value adjustment
Repertoire overlap analysis
Vector plots output (.svg / .pdf)
Tabular outputs

Other key features:

Clonotype assembly by arbitrary gene feature, including full-length variable region
PCR / Sequencing error correction with or without aid of UMI or Cell barcodes
Robust and dedicated aligner algorithms for maximum extraction with zero false-positive rate
Supports any custom barcode sequences architecture (UMI / Cell)
Supported species: human, mouse, rat, spalax, alpaca, monkey
Supports the IMGT reference
Barcodes error-correction
Adapter trimming
Optional CDR3 reconstruction by assembling overlapping fragmented sequencing reads into complete CDR3-containing contigs when the read position is floating (e.g. shotgun-sequencing, RNA-Seq etc.)
Optional contig assembly to build longest possible TCR/IG sequence from available data (with or without aid of UMI or Cell barcodes)
Comprehensive quality control reports provided at all the steps of the pipeline
Regions not covered by the data may be imputed from germline
Exhaustive output information for clonotypes and alignments:
- nucleotide and amino acid sequences of all immunologically relevant regions (FR1, CDR1, ..., CDR3, etc.)
- identified V, D, J, C genes
- comprehensive information on nucleotide and amino acid mutations
- positions of all immunologically relevant points in output sequences
- and many more informative columns
Ability to backtrack the fate of each raw sequencing read through the whole pipeline

Obtaining license

Each user must obtain their own free individual academic license from here
Put the contents of mi.license into the MI_LICENSE environment variable by adding the following line to your ~/.bashrc file:
```
export MI_LICENSE="CopyPasteLicenseKeyHere"
```
Make sure to remove all old license information:
- rm ~/.mi.license
- rm ~/mi.license
- unset MI_LICENSE_FILE
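
The license steps above can be verified with a small shell check. This is a minimal sketch, not part of MiXCR: `check_mixcr_license` is a hypothetical helper name, and it only tests the three conditions listed above (MI_LICENSE set, no old license files in the home directory, MI_LICENSE_FILE unset). It does not validate the license key itself.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with MiXCR): report leftover license
# settings that the steps above ask you to remove.
check_mixcr_license() {
    local status=0 f
    # The license key is expected in the MI_LICENSE environment variable.
    if [ -z "${MI_LICENSE:-}" ]; then
        echo "MI_LICENSE is not set; add the export line to ~/.bashrc" >&2
        status=1
    fi
    # Old license files should no longer exist.
    for f in "$HOME/.mi.license" "$HOME/mi.license"; do
        if [ -e "$f" ]; then
            echo "old license file still present: $f" >&2
            status=1
        fi
    done
    # The old file-based variable should be unset.
    if [ -n "${MI_LICENSE_FILE:-}" ]; then
        echo "MI_LICENSE_FILE is still set; run: unset MI_LICENSE_FILE" >&2
        status=1
    fi
    return "$status"
}
```

After opening a new shell (so that ~/.bashrc is re-read), running `check_mixcr_license && echo "license environment looks clean"` prints the message only when all three conditions hold.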

Documentation

See full documentation at https://docs.milaboratories.com.

Citation

When using MiXCR under ACADEMIC LICENSE in journal publications, please cite the following publications:

Dmitriy A. Bolotin, Stanislav Poslavsky, Igor Mitrophanov, Mikhail Shugay, Ilgar Z. Mamedov, Ekaterina V. Putintseva, and Dmitriy M. Chudakov. "MiXCR: software for comprehensive adaptive immunity profiling." Nature Methods 12, no. 5 (2015): 380-381.
(Files referenced in this paper can be found here.)
Dmitriy A. Bolotin, Stanislav Poslavsky, Alexey N. Davydov, Felix E. Frenkel, Lorenzo Fanchi, Olga I. Zolotareva, Saskia Hemmers, Ekaterina V. Putintseva, Anna S. Obraztsova, Mikhail Shugay, Ravshan I. Ataullakhanov, Alexander Y. Rudensky, Ton N. Schumacher & Dmitriy M. Chudakov. "Antigen receptor repertoire profiling from RNA-seq data." Nature Biotechnology 35, 908–911 (2017)

Important Notes

Module Name: mixcr (see the modules page for more information)
Multithreaded

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input follows the `$` prompt):

```
[user@biowulf ~]$ sinteractive --cpus-per-task=4 --mem=10g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load mixcr

[user@cn3144 ~]$ mixcr align -r log.txt -t $SLURM_CPUS_PER_TASK \
    /data/$USER/S1.fastq.gz \
    /data/$USER/S2.fastq.gz \
    aln.vdjca

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
```

Batch job

Most jobs should be run as batch jobs.

Create a batch input file (e.g. mixcr.sh). For example:

```
#!/bin/bash
set -e
module load mixcr
mixcr align -r log.txt -t $SLURM_CPUS_PER_TASK \
    /data/$USER/S1.fastq.gz \
    /data/$USER/S2.fastq.gz \
    aln.vdjca
```

Submit this job using the Slurm sbatch command.

```
sbatch --cpus-per-task=4 --mem=10g mixcr.sh
```
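
Slurm sets `$SLURM_CPUS_PER_TASK` only when `--cpus-per-task` is given at submission, so a script submitted without that option would pass an empty value to `-t`. The sketch below shows one way to guard against this; `mixcr_threads` is a hypothetical helper name, and falling back to a single thread is an assumption, not a MiXCR or Biowulf recommendation.

```shell
#!/bin/bash
# Hypothetical helper: print the thread count to hand to `mixcr ... -t`.
# Uses the Slurm allocation when it is defined, otherwise falls back to 1.
mixcr_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}
```

In the batch script, the align line would then read `mixcr align -r log.txt -t "$(mixcr_threads)" ...`, which keeps the thread count in step with whatever was requested through `sbatch`.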

Swarm of Jobs

A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. mixcr.swarm). For example:

```
cd /data/$USER/dir1; mixcr align -r log.txt -t $SLURM_CPUS_PER_TASK S1.fastq.gz S2.fastq.gz aln.vdjca
cd /data/$USER/dir2; mixcr align -r log.txt -t $SLURM_CPUS_PER_TASK S1.fastq.gz S2.fastq.gz aln.vdjca
cd /data/$USER/dir3; mixcr align -r log.txt -t $SLURM_CPUS_PER_TASK S1.fastq.gz S2.fastq.gz aln.vdjca
cd /data/$USER/dir4; mixcr align -r log.txt -t $SLURM_CPUS_PER_TASK S1.fastq.gz S2.fastq.gz aln.vdjca
```
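
For more than a handful of samples, the swarmfile can be generated with a loop instead of being typed by hand. This is a minimal sketch that reproduces the four lines above; the directory names `dir1` to `dir4` and the file names are taken from that example and are placeholders for your own layout.

```shell
#!/bin/bash
# Sketch: write one swarm line per sample directory into mixcr.swarm.
swarmfile=mixcr.swarm
: > "$swarmfile"    # create or truncate the swarmfile

for d in dir1 dir2 dir3 dir4; do
    # The single-quoted format string keeps $USER and $SLURM_CPUS_PER_TASK
    # literal, so they are expanded when each subjob runs, not by this loop.
    printf 'cd /data/$USER/%s; mixcr align -r log.txt -t $SLURM_CPUS_PER_TASK S1.fastq.gz S2.fastq.gz aln.vdjca\n' "$d" >> "$swarmfile"
done
```

Inspect the result with `cat mixcr.swarm` before submitting; each line becomes one independent subjob.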

Submit this job using the swarm command.

```
swarm -f mixcr.swarm -g 10 -t 4 --module mixcr
```

where

`-g #`	Gigabytes of memory required for each process (one line in the swarm command file).
`-t #`	Number of threads/CPUs required for each process (one line in the swarm command file).
`--module mixcr`	Loads the mixcr module for each subjob in the swarm.