Biowulf High Performance Computing at the NIH
BioBERT: a biomedical language representation model designed for biomedical text mining tasks

BioBERT is a biomedical language representation model designed for text mining tasks such as named entity recognition, relation extraction, and question answering.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf ~]$ sinteractive --mem=16g --gres=gpu:p100:1,lscratch:10 -c4
[user@cn3104 ~]$ module load biobert
[+] Loading python 3.6  ...
[+] Loading cuDNN/7.5/CUDA-10.0 libraries...
[+] Loading CUDA Toolkit  10.0.130  ...
[+] Loading biobert v20200409  ...
Copy the sample data to your current folder and run one of the BioBERT scripts:
[user@cn3104 ~]$ cp $BIOBERT_DATA/* .
[user@cn3104 ~]$ run_bert_gt.py \
    --do_train=true \
    --do_eval=true \
    --do_predict=true \
    --task_name="cdr" \
    --vocab_file=vocab.txt \
    --bert_config_file=bert_config.json \
    --init_checkpoint=model.ckpt-1000000 \
    --num_train_epochs=15.0 \
    --max_seq_length=512 \
    --train_batch_size=4 \
    --surrounding_words_distance=20 \
    --do_lower_case=false \
    --entity_num=2 \
    --max_num_neighbors=20 \
    --max_num_entity_indices=20 \
    --data_dir=. \
    --output_dir=out_dir
...
INFO:tensorflow:Writing example 0 of 8614
I0928 15:43:42.794781 46912496418432 run_bert_gt.py:861] Writing example 0 of 8614
INFO:tensorflow:*** Example ***
I0928 15:43:42.886709 46912496418432 run_bert_gt.py:824] *** Example ***
INFO:tensorflow:guid: 17049862
I0928 15:43:42.886834 46912496418432 run_bert_gt.py:825] guid: 17049862
INFO:tensorflow:tokens: [CLS] Is @ChemicalSrc$ administration safe in a @DiseaseTgt$ child ? A male neon #
#ate with a Chi ##ari ma ##lf ##orm ##ation and a leaking my ##elo ##men ##ing ##oc ##oe ##le underwent ve
nt ##ric ##ulo ##per ##ito ##nea ##l s ##hun ##t insertion followed by repair of my ##elo ##men ##ing ##oc
 ##oe ##le . During an ##ae ##st ##hesia and surgery , he inadvertently became moderately @DiseaseTgt$ . I
n ##tra ##ven ##ous @ChemicalSrc$ was administered during the later part of the surgery for seizure prop #
#hyl ##ax ##is . Following @ChemicalSrc$ administration , the patient developed acute severe bra ##dy ##ca
rd ##ia , re ##fra ##ctor ##y to at ##rop ##ine and adrenaline . The cardiac de ##press ##ant actions of @
ChemicalSrc$ and @DiseaseTgt$ can be add ##itive . Administration of @ChemicalSrc$ in the presence of @Dis
easeTgt$ may lead to an adverse cardiac event in children . As @ChemicalSrc$ is a commonly used drug , cli
nic ##ians need to be aware of this interaction . [SEP]
I0928 15:43:42.887003 46912496418432 run_bert_gt.py:827] tokens: [CLS] Is @ChemicalSrc$ administration saf
e in a @DiseaseTgt$ child ? A male neon ##ate with a Chi ##ari ma ##lf ##orm ##ation and a leaking my ##el
o ##men ##ing ##oc ##oe ##le underwent vent ##ric ##ulo ##per ##ito ##nea ##l s ##hun ##t insertion follow
ed by repair of my ##elo ##men ##ing ##oc ##oe ##le . During an ##ae ##st ##hesia and surgery , he inadver
tently became moderately @DiseaseTgt$ . In ##tra ##ven ##ous @ChemicalSrc$ was administered during the lat
er part of the surgery for seizure prop ##hyl ##ax ##is . Following @ChemicalSrc$ administration , the pat
ient developed acute severe bra ##dy ##card ##ia , re ##fra ##ctor ##y to at ##rop ##ine and adrenaline .
The cardiac de ##press ##ant actions of @ChemicalSrc$ and @DiseaseTgt$ can be add ##itive . Administration
 of @ChemicalSrc$ in the presence of @DiseaseTgt$ may lead to an adverse cardiac event in children . As @C
hemicalSrc$ is a commonly used drug , clinic ##ians need to be aware of this interaction . [SEP]
INFO:tensorflow:input_ids: 101 2181 2 3469 2914 1107 170 1 2027 136 138 2581 24762 2193 1114 170 11318 771
0 12477 9654 24211 1891 1105 170 27742 1139 19773 2354 1158 13335 7745 1513 9315 21828 4907 22806 3365 838
3 25362 1233 188 17315 1204 27914 1723 1118 6949 1104 1139 19773 2354 1158 13335 7745 1513 119 1507 1126 5
024 2050 27300 1105 6059 117 1119 23438 1245 19455 1 119 1130 4487 7912 2285 2 1108 8318 1219 1103 1224 12
26 1104 1103 6059 1111 20752 21146 18873 7897 1548 119 2485 2 3469 117 1103 5351 1872 12104 5199 12418 381
0 10542 1465 117 1231 27476 9363 1183 1106 1120 12736 2042 1105 18108 119 1109 17688 1260 11135 2861 3721
1104 2 1105 1 1169 1129 5194 8588 119 4918 1104 2 1107 1103 2915 1104 1 1336 1730 1106 1126 16798 17688 18
56 1107 1482 119 1249 2 1110 170 3337 1215 3850 117 12257 5895 1444 1106 1129 4484 1104 1142 8234 119 102
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I0928 15:43:42.887241 46912496418432 run_bert_gt.py:828] input_ids: 101 2181 2 3469 2914 1107 170 1 2027 1
36 138 2581 24762 2193 1114 170 11318 7710 12477 9654 24211 1891 1105 170 27742 1139 19773 2354 1158 13335
 7745 1513 9315 21828 4907 22806 3365 8383 25362 1233 188 17315 1204 27914 1723 1118 6949 1104 1139 19773
2354 1158 13335 7745 1513 119 1507 1126 5024 2050 27300 1105 6059 117 1119 23438 1245 19455 1 119 1130 448
7 7912 2285 2 1108 8318 1219 1103 1224 1226 1104 1103 6059 1111 20752 21146 18873 7897 1548 119 2485 2 346
9 117 1103 5351 1872 12104 5199 12418 3810 10542 1465 117 1231 27476 9363 1183 1106 1120 12736 2042 1105 1
8108 119 1109 17688 1260 11135 2861 3721 1104 2 1105 1 1169 1129 5194 8588 119 4918 1104 2 1107 1103 2915
1104 1 1336 1730 1106 1126 16798 17688 1856 1107 1482 119 1249 2 1110 170 3337 1215 3850 117 12257 5895 14
44 1106 1129 4484 1104 1142 8234 119 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...
INFO:tensorflow:entity_indices_mask: 1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
I0928 15:43:42.888161 46912496418432 run_bert_gt.py:835] entity_indices_mask: 1 = [1, 1, 1, 1, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
INFO:tensorflow:in_neighbors_indices: 0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0 0|1|2|3|4|5|6|7|8|9|10|11|12
|13|14|15|16|17|18|19 1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20 2|3|1|4|5|6|7|8|9|10|11|12|13|14|
15|16|17|18|19|20 3|4|5|2|6|1|7|8|9|10|11|12|13|14|15|16|17|18|19|20 4|5|8|6|3|7|2|1|9|10|11|12|13|14|15|1
6|17|18|19|20 5|6|8|7|4|3|9|2|10|1|11|12|13|14|15|16|17|18|19|20 6|7|8|5|9|4|10|3|11|2|12|13|1|14|15|16|17
|18|19|20 7|8|4|9|6|10|5|11|12|13|3|14|2|15|1|16|17|18|19|20 8|9|1|10|7|11|6|12|13|5|14|4|15|3|16|17|2|18|
19|20 9|10|12|13|11|8|7|14|6|15|5|16|17|4|18|19|20|21|3|22 10|11|12|13|9|14|8|15|7|16|17|6|18|19|20|21|5|2
2|4|23 11|12|13|32|14|10|15|9|16|17|8|18|19|20|21|7|22|6|23|5 12|13|32|11|14|10|15|9|16|17|8|18|19|20|21|7
|22|6|23|5 13|14|18|19|20|21|12|15|11|16|17|10|9|22|8|23|7|24|6|25 14|15|18|19|20|21|16|17|12|13|11|22|10|
23|9|24|8|25|26|27 15|16|17|18|19|20|21|14|22|12|13|23|11|24|10|25|26|27|28|29 16|17|18|19|20|21|15|14|22|
12|13|23|11|24|10|25|26|27|28|29 17|18|19|20|21|12|13|16|22|15|23|14|24|25|26|27|28|29|30|31 18|19|20|21|1
2|13|16|17|22|15|23|14|24|25|26|27|28|29|30|31 19|18|20|21|12|13|16|17|22|15|23|14|24|25|26|27|28|29|30|31
 20|18|19|21|12|13|16|17|22|15|23|14|24|25|26|27|28|29|30|31 21|22|12|13|18|19|20|23|16|17|24|15|25|26|27|
28|29|30|31|14 22|23|25|26|27|28|29|30|31|24|18|19|20|21|16|17|32|15|33|34 23|24|25|26|27|28|29|30|31|22|3
2|18|19|20|21|33|34|35|36|37 24|25|26|27|28|29|30|31|12|13|32|23|33|34|35|36|37|38|39|22 25|26|27|28|29|30
|31|12|13|24|32|23|33|34|35|36|37|38|39|22 26|25|27|28|29|30|31|12|13|24|32|23|33|34|35|36|37|38|39|22 27|
25|26|28|29|30|31|12|13|24|32|23|33|34|35|36|37|38|39|22 28|25|26|27|29|30|31|12|13|24|32|23|33|34|35|36|3
7|38|39|22 29|25|26|27|28|30|31|12|13|24|32|23|33|34|35|36|37|38|39|22 30|25|26|27|28|29|31|12|13|24|32|23
|33|34|35|36|37|38|39|22 31|32|25|26|27|28|29|30|33|34|35|36|37|38|39|24|40|41|42|23 32|33|34|35|36|37|38|
39|43|40|41|42|25|26|27|28|29|30|31|24 33|34|35|36|37|38|39|43|32|40|41|42|25|26|27|28|29|30|31|24 34|33|3
5|36|37|38|39|43|32|40|41|42|25|26|27|28|29|30|31|24 35|33|34|36|37|38|39|43|32|40|41|42|25|26|27|28|29|30
...
End the interactive session:
[user@cn3104 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
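
In the sample output above, input_ids ends in a long run of zeros and input_mask is a run of ones followed by zeros. This matches the usual BERT convention of padding every example to --max_seq_length (512 in the command above) and marking real tokens with 1 and padding with 0. The following is a minimal sketch of that convention, not the code used by run_bert_gt.py: the function name pad_to_max_length and the shortened id list are assumptions made for illustration, and the real script may truncate long inputs rather than raise an error.

```python
from typing import List, Tuple


def pad_to_max_length(
    token_ids: List[int], max_seq_length: int, pad_id: int = 0
) -> Tuple[List[int], List[int]]:
    """Pad token ids to a fixed length and build the matching attention mask.

    Hypothetical helper for illustration only; it is not part of BioBERT.
    """
    if len(token_ids) > max_seq_length:
        # Assumption: reject over-long inputs instead of truncating them.
        raise ValueError("sequence is longer than max_seq_length")

    padding = max_seq_length - len(token_ids)
    # Real tokens keep their ids; the remainder is filled with pad_id.
    input_ids = token_ids + [pad_id] * padding
    # 1 marks a real token, 0 marks a padding position.
    input_mask = [1] * len(token_ids) + [0] * padding
    return input_ids, input_mask


# Shortened example built from ids that appear in the log:
# 101 is [CLS] at the start and 102 is [SEP] at the end.
example_ids = [101, 2181, 2, 3469, 2914, 8234, 119, 102]

input_ids, input_mask = pad_to_max_length(example_ids, max_seq_length=512)
print(len(input_ids), sum(input_mask))
```

The sum of input_mask gives the number of real tokens in an example, which is a quick way to check how much of the 512-token window a given input actually uses.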