Bartender is an accurate clustering algorithm to detect barcodes and their abundances from raw next-generation sequencing data. In contrast with existing methods that cluster based on sequence similarity alone, Bartender uses a modified two-sample proportion test that also considers cluster size. This modification results in higher accuracy and lower rates of under- and over-clustering artifacts.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive 
[user@cn3200 ~]$module load bartender 
[+] Loading singularity  3.8.2  
[+] Loading bartender  1.1
[user@cn3200 ~]$ cp $BARTENDER_DATA/* .
[user@cn3200 ~]$ bartender_extractor_com  -f 2M_test.fq -o 2M_extracted -q "?" -p "TACC[4-7]AA[4-7]AA[4-7]TT[4-7]ATAA" -m 2
    Running bartender extractor
    bartender_extractor 2M_test.fq 2M_extracted 63 "(TAC.|TA.C|T.CC|.ACC)([ATCGN]{4,7})(AA)([ATCGN]{4,7})(AA)([ATCGN]{4,7})(TT)([ATCGN]{4,7})(ATA.|AT.A|A.AA|.TAA)" TACC ATAA 9 1
    Totally there are 1000 reads in 2M_test.fq file!
    Totally there are 976 valid barcodes from 2M_test.fq file
    Totally there are 924 valid barcodes whose quality pass the quality condition
    The estimated sequence error from the prefix and suffix parts is 0.00153689
[user@cn3200 ~]$ bartender_single_com -f 2M_extracted_barcode.txt -o 2M_barcode -d 3
Running bartender
Loading barcodes from the file
It takes 00:00:00 to load the barcodes from 2M_extracted_barcode.txt
Shortest barcode length: 18
Longest barcode length: 26
Start to group barcode with length 18
Using two sample unpooled test
Transforming the barcodes into seed clusters
Initial number of unique reads:  1
The distance threshold is 3
Identified 1 barcodes with length 18
Start to group barcode with length 19
Using two sample unpooled test
Transforming the barcodes into seed clusters
Initial number of unique reads:  7
The distance threshold is 3
Clustering iteration 1
Clustering iteration 2
Clustering iteration 3
Clustering iteration 4
Identified 7 barcodes with length 19
Start to group barcode with length 20
Using two sample unpooled test
Transforming the barcodes into seed clusters
Initial number of unique reads:  221
The distance threshold is 3
Clustering iteration 1
Clustering iteration 2
Clustering iteration 3
Clustering iteration 4
Identified 221 barcodes with length 20
Start to group barcode with length 21
Using two sample unpooled test
Transforming the barcodes into seed clusters
Initial number of unique reads:  0
The distance threshold is 3
Identified 0 barcodes with length 21
Start to group barcode with length 22
Using two sample unpooled test
Transforming the barcodes into seed clusters
Initial number of unique reads:  0
The distance threshold is 3
Identified 0 barcodes with length 22
Start to group barcode with length 23
Using two sample unpooled test
Transforming the barcodes into seed clusters
Initial number of unique reads:  0
The distance threshold is 3
Identified 0 barcodes with length 23
Start to group barcode with length 24
Using two sample unpooled test
Transforming the barcodes into seed clusters
Initial number of unique reads:  0
The distance threshold is 3
Identified 0 barcodes with length 24
Start to group barcode with length 25
Using two sample unpooled test
Transforming the barcodes into seed clusters
Initial number of unique reads:  0
The distance threshold is 3
Identified 0 barcodes with length 25
Start to group barcode with length 26
Using two sample unpooled test
Transforming the barcodes into seed clusters
Initial number of unique reads:  2
The distance threshold is 3
Clustering iteration 1
Clustering iteration 2
Clustering iteration 3
Identified 2 barcodes with length 26
The clustering process takes 00:00:00
Start to dump clusters to file with prefix 2M_barcode
Start to remove pcr effects
***(Overall error rate estimated from the clustering result)***
Total number of clusters after removing PCR effects: 231
Could not find any high quality clusters(max entropy < 0.33, cluster size > 20 ) to estimate the error rate!
Please use the result cautiously(Better to check the input data)!
The estimated error rate is 0
The overall running time 00:00:00 seconds.
End the interactive session:
[user@cn3200 ~]$ exit salloc.exe: Relinquishing job allocation 46116226 [user@biowulf ~]$