From the Cell Ranger manual:
Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data. Cell Ranger ATAC includes two pipelines relevant to single cell chromatin accessibility experiments:
- cellranger-atac mkfastq demultiplexes raw base call (BCL) files generated by Illumina® sequencers into FASTQ files. It is a wrapper around bcl2fastq from Illumina®, with additional useful features that are specific to 10x Genomics libraries and a simplified sample sheet format.
- cellranger-atac count takes FASTQ files from cellranger-atac mkfastq and performs ATAC analysis, including:
  - Read filtering and alignment
  - Barcode counting
  - Identification of transposase cut sites
  - Detection of accessible chromatin peaks
  - Cell calling
  - Count matrix generation for peaks and transcription factors
  - Dimensionality reduction
  - Cell clustering
  - Cluster differential accessibility
- Module Name: cellranger-atac (see the modules page for more information)
- cellranger-atac can operate in local mode or cluster mode. In both cases, the local part of the job will use multiple CPUs. Users have to specify the number of allocated CPUs and the amount of memory by passing --localcores=# --localmem=# to cellranger-atac.
- cellranger-atac may attempt to start more processes or open more files than the default limits on our compute nodes allow. If you encounter errors or strange results, you may have to raise these limits. See below for more details.
- Reference data can be found in /fdb/cellranger-atac
- Test data can be found in
Allocate an interactive session and run the program. Sample session:
Copy the bcl format test data and run the demux pipeline
[user@biowulf]$ sinteractive --cpus-per-task=6 --mem=35g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load cellranger-atac
[user@cn3144 ~]$ cp $CELLRANGER_ATAC_TEST_DATA/* .
[user@cn3144 ~]$ tar -xzf cellranger-atac-tiny-bcl-1.0.0.tar.gz
[user@cn3144 ~]$ cellranger-atac mkfastq --run=cellranger-atac-tiny-bcl-1.0.0 \
    --samplesheet=cellranger-atac-tiny-bcl-samplesheet-1.0.0.csv \
    --localcores=$SLURM_CPUS_PER_TASK \
    --localmem=34
cellranger-atac mkfastq (2.1.0)
Copyright (c) 2018 10x Genomics, Inc. All rights reserved.
-------------------------------------------------------------------------------
Martian Runtime - '1.0.0-v3.1.0'
2018-11-09 14:28:35 [runtime] Reattaching in local mode.
Serving UI at http://cn3338:34493?auth=6-NvgOIWcH55W8qw0EKM9b_AZbhmwjCbD1CBbXgO_9M
2018-11-09 14:28:35 [runtime] (reset-partial) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.MAKE_FASTQS_PREFLIGHT.fork0.chnk0
2018-11-09 14:28:35 [runtime] Found orphaned local stage: ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.MAKE_FASTQS_PREFLIGHT
2018-11-09 14:28:38 [runtime] (ready) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET
2018-11-09 14:28:38 [runtime] (run:local) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET.fork0.chnk0.main
2018-11-09 14:28:41 [runtime] (chunks_complete) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET
2018-11-09 14:28:41 [runtime] (ready) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET
2018-11-09 14:28:41 [runtime] (run:local) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET.fork0.split
2018-11-09 14:28:44 [runtime] (split_complete) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET
2018-11-09 14:28:44 [runtime] (run:local) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET.fork0.chnk0.main
2018-11-09 14:29:03 [runtime] (chunks_complete) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET
2018-11-09 14:29:03 [runtime] (run:local) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET.fork0.join
2018-11-09 14:29:06 [runtime] (join_complete) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET
2018-11-09 14:29:07 [runtime] (ready) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.MAKE_QC_SUMMARY
2018-11-09 14:29:07 [runtime] (run:local) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.MAKE_QC_SUMMARY.fork0.split
2018-11-09 14:29:10 [runtime] (split_complete) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.MAKE_QC_SUMMARY
2018-11-09 14:29:10 [runtime] (run:local) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.MAKE_QC_SUMMARY.fork0.join
2018-11-09 14:29:16 [runtime] (join_complete) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.MAKE_QC_SUMMARY
2018-11-09 14:29:16 [runtime] (ready) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.MERGE_FASTQS_BY_LANE_SAMPLE
2018-11-09 14:29:16 [runtime] (run:local) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.MERGE_FASTQS_BY_LANE_SAMPLE.fork0.split
2018-11-09 14:29:20 [runtime] (split_complete) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.MERGE_FASTQS_BY_LANE_SAMPLE
2018-11-09 14:29:21 [runtime] (run:local) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.MERGE_FASTQS_BY_LANE_SAMPLE.fork0.chnk0.main
2018-11-09 14:29:24 [runtime] (chunks_complete) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.MERGE_FASTQS_BY_LANE_SAMPLE
2018-11-09 14:29:24 [runtime] (run:local) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.MERGE_FASTQS_BY_LANE_SAMPLE.fork0.join
2018-11-09 14:29:27 [runtime] (join_complete) ID.HJN3KBCX2.MAKE_FASTQS_CS.MAKE_FASTQS.MERGE_FASTQS_BY_LANE_SAMPLE

Outputs:
- Run QC metrics: null
- FASTQ output folder: /data/teacher/test/HJN3KBCX2/outs/fastq_path
- Interop output folder: /data/teacher/test/HJN3KBCX2/outs/interop_path
- Input samplesheet: /data/teacher/test/HJN3KBCX2/outs/input_samplesheet.csv

Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!

2018-11-09 14:29:33 Shutting down.
Saving pipestance info to HJN3KBCX2/HJN3KBCX2.mri.tgz
Note that it is necessary to specify --localcores and --localmem.
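Rather than hard-coding the values for --localcores and --localmem, they can be derived from the Slurm allocation. The following is a minimal sketch, not part of cellranger-atac: it assumes Slurm exports SLURM_CPUS_PER_TASK and SLURM_MEM_PER_NODE (the latter in MB, which is typically the case when the job was requested with --mem). The helper name localmem_gb and the 1 GB of headroom are illustrative choices that mirror the --mem=35g / --localmem=34 pairing used in the session above.

```shell
# Hypothetical helper: convert an allocation in MB into a --localmem value
# in GB, leaving 1 GB of headroom and never returning less than 1.
localmem_gb() {
    local mem_mb=$1
    local gb=$(( mem_mb / 1024 - 1 ))
    if [ "$gb" -lt 1 ]; then gb=1; fi
    echo "$gb"
}

# Fall back to small defaults when not running inside a Slurm job
# (the defaults 2 and 4096 are assumptions for illustration only).
cores=${SLURM_CPUS_PER_TASK:-2}
mem=$(localmem_gb "${SLURM_MEM_PER_NODE:-4096}")

# Print the flags instead of launching the pipeline.
echo "--localcores=$cores --localmem=$mem"
```

With an allocation of 35 GB (35840 MB) the helper yields 34, matching the value passed by hand in the example session.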
cellranger-atac may start an unreasonable number of processes or open too many files. If you encounter errors that include

... = os.fork()
OSError: [Errno 11] Resource temporarily unavailable

or see unexpected results despite specifying --localcores and --localmem, you may have to raise the limit on the number of processes and/or open files allowed in your batch script:
[user@cn3144 ~]$ ulimit -u 10240 -n 16384
[user@cn3144 ~]$ cellranger-atac mkfastq --run=cellranger-atac-tiny-bcl-1.0.0 \
    --samplesheet=cellranger-atac-tiny-bcl-samplesheet-1.0.0.csv \
    --localcores=$SLURM_CPUS_PER_TASK \
    --localmem=34
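A soft limit cannot be raised above the corresponding hard limit, so it can be useful to check both before changing anything. The sketch below uses only the shell's ulimit builtin. The helper names capped_limit and raise_soft_limit are hypothetical, and the targets 10240 (processes) and 16384 (open files) are simply the values from the example above.

```shell
# Hypothetical helper: the value to request is the target, capped at the
# hard limit (a hard limit of "unlimited" imposes no cap).
capped_limit() {
    local target=$1 hard=$2
    if [ "$hard" = "unlimited" ] || [ "$target" -le "$hard" ]; then
        echo "$target"
    else
        echo "$hard"
    fi
}

# Hypothetical helper: raise a soft limit towards a target without ever
# lowering it. $1 is the ulimit flag (-n: open files, -u: processes).
raise_soft_limit() {
    local flag=$1 target=$2
    local soft hard want
    soft=$(ulimit -S "$flag")
    hard=$(ulimit -H "$flag")
    want=$(capped_limit "$target" "$hard")
    # Leave the limit alone if it is already high enough.
    if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$want" ]; then
        return 0
    fi
    ulimit -S "$flag" "$want"
}

raise_soft_limit -u 10240 || echo "could not raise process limit" >&2
raise_soft_limit -n 16384 || echo "could not raise open-file limit" >&2
echo "processes: $(ulimit -S -u), open files: $(ulimit -S -n)"
```

The limits set this way apply to the current shell and to the processes started from it, so the calls need to come before the cellranger-atac command in the same script or session.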
Generate counts per peak per cell
[user@cn3144 ~]$ cellranger-atac count --id p1 \
    --fastqs HJN3KBCX2/outs/fastq_path \
    --reference=/fdb/cellranger-atac/refdata-cellranger-arc-GRCh38-2020-A-2.0.0 \
    --localcores=$SLURM_CPUS_PER_TASK --localmem=34
cellranger-atac count (2.1.0)
Copyright (c) 2018 10x Genomics, Inc. All rights reserved.
-------------------------------------------------------------------------------
Martian Runtime - '1.0.0-v3.1.0'
Serving UI at http://cn3338:35263?auth=koRZfC445oeJTiOKaz9N6qBn_tZFsmYLIaiWrqbbZNA
Running preflight checks (please wait)...
2018-11-09 14:54:16 [runtime] (ready) ID.p1.SC_ATAC_COUNTER_CS.SC_ATAC_COUNTER._BASIC_SC_ATAC_COUNTER._ALIGNER.SETUP_CHUNKS
2018-11-09 14:54:16 [runtime] (run:local) ID.p1.SC_ATAC_COUNTER_CS.SC_ATAC_COUNTER._BASIC_SC_ATAC_COUNTER._ALIGNER.SETUP_CHUNKS.fork0.chnk0.main
2018-11-09 14:54:25 [runtime] (chunks_complete) ID.p1.SC_ATAC_COUNTER_CS.SC_ATAC_COUNTER._BASIC_SC_ATAC_COUNTER._ALIGNER.SETUP_CHUNKS
2018-11-09 14:54:25 [runtime] (ready) ID.p1.SC_ATAC_COUNTER_CS.SC_ATAC_COUNTER._BASIC_SC_ATAC_COUNTER._ALIGNER.TRIM_READS
2018-11-09 14:54:25 [runtime] (run:local) ID.p1.SC_ATAC_COUNTER_CS.SC_ATAC_COUNTER._BASIC_SC_ATAC_COUNTER._ALIGNER.TRIM_READS.fork0.split
2018-11-09 14:54:28 [runtime] (split_complete) ID.p1.SC_ATAC_COUNTER_CS.SC_ATAC_COUNTER._BASIC_SC_ATAC_COUNTER._ALIGNER.TRIM_READS
2018-11-09 14:54:28 [runtime] (run:local) ID.p1.SC_ATAC_COUNTER_CS.SC_ATAC_COUNTER._BASIC_SC_ATAC_COUNTER._ALIGNER.TRIM_READS.fork0.chnk0.main
2018-11-09 14:55:01 [runtime] (chunks_complete) ID.p1.SC_ATAC_COUNTER_CS.SC_ATAC_COUNTER._BASIC_SC_ATAC_COUNTER._ALIGNER.TRIM_READS
2018-11-09 14:55:01 [runtime] (run:local) ID.p1.SC_ATAC_COUNTER_CS.SC_ATAC_COUNTER._BASIC_SC_ATAC_COUNTER._ALIGNER.TRIM_READS.fork0.join
[...]
node$ exit
biowulf$
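Before launching cellranger-atac count, it can be worth confirming that the folder passed to --fastqs actually contains FASTQ files. The sketch below is a simple pre-flight check rather than anything the pipeline provides. The helper name count_fastqs is hypothetical, and the check assumes that mkfastq wrote gzipped files ending in .fastq.gz somewhere under the FASTQ output folder it reported.

```shell
# Hypothetical pre-flight helper: count gzipped FASTQ files under a folder.
# Prints 0 if the folder does not exist.
count_fastqs() {
    find "$1" -type f -name '*.fastq.gz' 2>/dev/null | wc -l | tr -d ' '
}

# FASTQ output folder as reported by the mkfastq run above.
fastq_dir=HJN3KBCX2/outs/fastq_path

n=$(count_fastqs "$fastq_dir")
if [ "$n" -eq 0 ]; then
    echo "no FASTQ files found under $fastq_dir" >&2
else
    echo "found $n FASTQ files under $fastq_dir"
fi
```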
The same job could also be run in cluster mode, where pipeline tasks are submitted as batch jobs. This can be done by setting jobmode to slurm and limiting the maximum number of concurrent jobs:
[user@cn3144 ~]$ cellranger-atac count --id s1 \
    --fastqs HJN3KBCX2/outs/fastq_path \
    --reference=/fdb/cellranger-atac/refdata-cellranger-arc-GRCh38-2020-A-2.0.0 \
    --localcores=$SLURM_CPUS_PER_TASK \
    --localmem=34 \
    --jobmode=slurm --maxjobs=10
In the case of this small example, however, cluster mode actually results in a longer overall runtime. Even when running in cluster mode, please run the main pipeline in an sinteractive session or as a batch job itself.
Don't forget to close the interactive session when done:
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
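Since local mode and cluster mode differ only in the --jobmode and --maxjobs flags, the choice can be isolated in a small helper and the resulting command reviewed before it is run. The following is a sketch only: jobmode_flags is a hypothetical helper, the fallback of 2 cores outside a Slurm job is an assumption, and the command is printed with echo rather than executed.

```shell
# Hypothetical helper: print the extra flags for the chosen job mode.
# $1 is "local" or "slurm"; $2 is the maximum number of concurrent jobs.
jobmode_flags() {
    case "$1" in
        local) echo "" ;;
        slurm) echo "--jobmode=slurm --maxjobs=${2:-10}" ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

mode=slurm

# Dry run: show the command that would be launched.
# $(jobmode_flags ...) is left unquoted on purpose so that it splits
# into separate arguments.
echo cellranger-atac count --id s1 \
    --fastqs HJN3KBCX2/outs/fastq_path \
    --reference=/fdb/cellranger-atac/refdata-cellranger-arc-GRCh38-2020-A-2.0.0 \
    --localcores="${SLURM_CPUS_PER_TASK:-2}" --localmem=34 \
    $(jobmode_flags "$mode" 10)
```

Removing the leading echo turns the dry run into the real invocation.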
Create a batch input file. For example:
#! /bin/bash
module load cellranger-atac || exit 1
## uncomment the following line if encountering 'resource unavailable' errors
## despite using --localcores and --localmem
# ulimit -u 4096
cellranger-atac mkfastq --run=cellranger-atac-tiny-bcl-1.0.0 \
    --samplesheet=cellranger-atac-tiny-bcl-samplesheet-1.0.0.csv \
    --localcores=$SLURM_CPUS_PER_TASK \
    --localmem=34
cellranger-atac count --id p1 \
    --fastqs HJN3KBCX2/outs/fastq_path \
    --reference=/fdb/cellranger-atac/refdata-cellranger-arc-GRCh38-2020-A-2.0.0 \
    --localcores=$SLURM_CPUS_PER_TASK --localmem=34
Again, please remember to include --localcores and --localmem.
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=12 --mem=35g
Create a swarmfile (e.g. cellranger-atac.swarm). For example:
cellranger-atac mkfastq --run=./run1 --localcores=$SLURM_CPUS_PER_TASK --localmem=34
cellranger-atac mkfastq --run=./run2 --localcores=$SLURM_CPUS_PER_TASK --localmem=34
cellranger-atac mkfastq --run=./run3 --localcores=$SLURM_CPUS_PER_TASK --localmem=34
Submit this job using the swarm command.
swarm -f cellranger-atac.swarm -g 35 -t 12 --module cellranger-atac
where
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module cellranger-atac | Loads the cellranger-atac module for each subjob in the swarm |
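When there are many run folders, the swarmfile can be generated instead of written by hand. The sketch below writes one mkfastq line per run directory, in the same form as the example swarmfile above. The helper name make_swarmfile is hypothetical, and ./run1 to ./run3 are the placeholder directories from that example.

```shell
# Hypothetical helper: write one mkfastq command per run directory.
# $1 is the swarmfile to create; the remaining arguments are run folders.
make_swarmfile() {
    local out=$1
    shift
    : > "$out"    # start with an empty file
    for run in "$@"; do
        # Single quotes keep $SLURM_CPUS_PER_TASK literal so that it is
        # expanded inside each subjob, not when the file is generated.
        printf 'cellranger-atac mkfastq --run=%s --localcores=$SLURM_CPUS_PER_TASK --localmem=34\n' "$run" >> "$out"
    done
}

make_swarmfile cellranger-atac.swarm ./run1 ./run2 ./run3
cat cellranger-atac.swarm
```

The generated file can then be submitted with the swarm command shown above. Run directory names containing spaces would need additional quoting, which this sketch does not handle.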