DeepHiC: a Generative Adversarial Network for Enhancing Hi-C Data Resolution
DeepHiC ("deep Hi-C") is a generative adversarial network (GAN) that predicts high-resolution Hi-C contact maps from low-coverage sequencing data. Hi-C is commonly used to study three-dimensional genome organization.
References:
- Hong H, Jiang S, Li H, Du G, Sun Y, Tao H, et al. DeepHiC: A Generative Adversarial Network for Enhancing Hi-C Data Resolution. PLoS Comput Biol. 2020;16(2):e1007287. https://doi.org/10.1371/journal.pcbi.1007287
Documentation
Important Notes
- Module Name: DeepHiC (see the modules page for more information)
- Unusual environment variables set:
  - DEEPHIC_HOME: installation directory
  - DEEPHIC_BIN: executable directory
  - DEEPHIC_SRC: source code directory
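The module sets these three variables. Below is a minimal sketch of how to confirm, after module load, that each one is set and points at an existing directory. It assumes bash. The function check_deephic_env is a hypothetical helper and is not part of DeepHiC or the module.

```shell
# Hypothetical helper (not part of DeepHiC): verify that the variables
# set by "module load DeepHiC" exist and name real directories.
check_deephic_env() {
    local var status=0
    for var in DEEPHIC_HOME DEEPHIC_BIN DEEPHIC_SRC; do
        # ${!var} is bash indirect expansion: the value of the variable
        # whose name is stored in $var (":-" avoids errors when unset).
        if [ -z "${!var:-}" ]; then
            echo "$var is not set (did you run 'module load DeepHiC'?)" >&2
            status=1
        elif [ ! -d "${!var}" ]; then
            echo "$var=${!var} is not a directory" >&2
            status=1
        else
            echo "$var=${!var}"
        fi
    done
    # Nonzero if any variable was missing or pointed at a non-directory
    return "$status"
}
```

For example, run module load DeepHiC and then check_deephic_env. A nonzero exit status means that at least one variable is unset or points at a missing directory.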
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=120g --gres=gpu:p100:1,lscratch:20 -c14
[user@cn3104 ~]$ module load DeepHiC
[+] Loading CUDA Toolkit 10.0.130 ...
[+] Loading cuDNN/7.6.5/CUDA-10.0 libraries...
[+] Loading DeepHiC 20201001
Copy the code and sample data to your current folder and run one of the DeepHiC scripts:
[user@cn3104 ~]$ cp -rP $DEEPHIC_SRC/* .
[user@cn3104 ~]$ export DEEPHIC_DIR=./
[user@cn3104 ~]$ ./data_predict.py -lr 40kb -ckpt save/deephic_kr_100.pth -c GM12878 --cuda 0
...
WARNING: Predict process needs large memory, thus ensure that your machine have ~150G memory.
Making directory: ./out_dir/GM12878
in_dir= /fdb/DeepHiC/data/processed
out_dir= ./out_dir/GM12878
files= ['deephic_10kb40kb_c40_s40_b201_nonpool_gm12878.npz', 'deephic_10kb40kb_c40_s40_b201_nonpool_train.npz', 'deephic_10kb40kb_c40_s40_b201_nonpool_valid.npz']
Using device: cuda:0
Loading data[DeepHiC]: deephic_10kb40kb_c40_s40_b201_nonpool_gm12878.npz
Loading DeepHiC checkpoint file from "save/deephic_kr_100.pth
DeepHiC Predicting: 100%|███████████████████████████████████████| 1189/1189 [01:01<00:00, 19.41it/s]
Reconstructing: data contain [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 'X'] chromosomes
Start a multiprocess pool with process_num = 56 for saving predicted data
Spreading a (3432, 3432) shaped matrix to (4815, 4815) shaped!
Spreading a (3350, 3350) shaped matrix to (5136, 5136) shaped!
Spreading a (5529, 5529) shaped matrix to (5916, 5916) shaped!
Spreading a (5944, 5944) shaped matrix to (6307, 6307) shaped!
Spreading a (7468, 7468) shaped matrix to (7816, 7816) shaped!
Spreading a (7673, 7673) shaped matrix to (8127, 8127) shaped!
Spreading a (7662, 7662) shaped matrix to (9040, 9040) shaped!
Spreading a (7847, 7847) shaped matrix to (10263, 10263) shaped!
Spreading a (8745, 8745) shaped matrix to (10736, 10736) shaped!
Spreading a (9554, 9554) shaped matrix to (11520, 11520) shaped!
Spreading a (14131, 14131) shaped matrix to (14640, 14640) shaped!
Spreading a (13024, 13024) shaped matrix to (13398, 13398) shaped!
Spreading a (13055, 13055) shaped matrix to (13510, 13510) shaped!
Spreading a (11111, 11111) shaped matrix to (14130, 14130) shaped!
Spreading a (12868, 12868) shaped matrix to (13566, 13566) shaped!
Spreading a (15302, 15302) shaped matrix to (15920, 15920) shaped!
Spreading a (14857, 14857) shaped matrix to (15536, 15536) shaped!
Spreading a (16711, 16711) shaped matrix to (17118, 17118) shaped!
Spreading a (17565, 17565) shaped matrix to (18107, 18107) shaped!
Spreading a (18710, 18710) shaped matrix to (19120, 19120) shaped!
Spreading a (22102, 22102) shaped matrix to (24950, 24950) shaped!
Spreading a (19456, 19456) shaped matrix to (19820, 19820) shaped!
Spreading a (23601, 23601) shaped matrix to (24325, 24325) shaped!
Saving file: ./out_dir/GM12878/predict_chr22_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr21_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr20_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr19_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr17_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr18_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr16_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr14_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr15_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr13_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr12_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr11_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr10_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr9_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr8_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr7_40kb.npz
Saving file: ./out_dir/GM12878/predict_chrX_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr6_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr5_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr4_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr3_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr1_40kb.npz
Saving file: ./out_dir/GM12878/predict_chr2_40kb.npz
All data saved. Running cost is 2.3 min.
The results are stored in the folder ./out_dir/GM12878:
[user@cn3104 ~]$ ls out_dir/GM12878
predict_chr10_40kb.npz  predict_chr15_40kb.npz  predict_chr21_40kb.npz  predict_chr6_40kb.npz
predict_chr11_40kb.npz  predict_chr16_40kb.npz  predict_chr22_40kb.npz  predict_chr7_40kb.npz
predict_chr12_40kb.npz  predict_chr17_40kb.npz  predict_chr2_40kb.npz   predict_chr8_40kb.npz
predict_chr13_40kb.npz  predict_chr18_40kb.npz  predict_chr3_40kb.npz   predict_chr9_40kb.npz
predict_chr1_40kb.npz   predict_chr19_40kb.npz  predict_chr4_40kb.npz   predict_chrX_40kb.npz
predict_chr14_40kb.npz  predict_chr20_40kb.npz  predict_chr5_40kb.npz
Examples of other available commands:
[user@cn3104 ~]$ ./data_aread.py -c GM12878
[user@cn3104 ~]$ ./data_generate.py -hr 10kb -lr 10kb -s train -chunk 40 -stride 40 -bound 201 -scale 1 -c GM12878 -lrc 100
[user@cn3104 ~]$ ./train.py
[user@cn3104 ~]$ ./data_downsample.py -hr 10kb -lr 10kb -c GM12878 -r 16
Exit the application:
[user@cn3104 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
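After a prediction run, it can help to confirm that every per-chromosome output file was written. Below is a minimal sketch that assumes bash (the brace expansion {1..22} is bash-specific). It also assumes the naming seen in the sample session: predict_chr<N>_<resolution>.npz for chromosomes 1-22 and X. The function check_predictions is a hypothetical helper and is not part of DeepHiC.

```shell
# Hypothetical helper (not part of DeepHiC): report per-chromosome
# prediction files that are missing or empty in an output directory.
# Naming pattern assumed from the sample session: predict_chr<N>_<res>.npz
check_predictions() {
    local dir=$1 res=${2:-40kb}   # resolution defaults to 40kb
    local chrom f missing=0
    for chrom in {1..22} X; do
        f="$dir/predict_chr${chrom}_${res}.npz"
        # -s is true only if the file exists and is non-empty
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=$((missing + 1))
        fi
    done
    echo "$missing file(s) missing in $dir"
    # Succeed only when all 23 files are present
    [ "$missing" -eq 0 ]
}
```

For example, after a run like the one above, check_predictions out_dir/GM12878 40kb should report 0 missing files. An .npz file is a zip archive of .npy arrays, so python3 -m zipfile -l out_dir/GM12878/predict_chr1_40kb.npz lists the arrays a file contains without loading them.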