BERT (Bidirectional Encoder Representations from Transformers) is a technique for Natural Language Processing (NLP) pre-training developed by Google.
These instructions provide a concrete example detailing the first steps to using BERT on Biowulf. They are derived from the BERT documentation on GitHub.
The NIH HPC staff provides this quickstart guide as a convenience and makes a best effort to keep it up to date. However, deep learning development moves quickly, so users are encouraged to review the primary documentation published by the BERT developers.
Allocate an interactive session and run this example.
Sample session (user input in bold):
As of the time of writing this tutorial, BERT did not support TensorFlow >= 2, so the example begins by creating a custom conda environment with TensorFlow 1.15.
If you have not already done so, follow the instructions for installing and updating conda in your space here.
[user@biowulf ~]$ sinteractive --ntasks=1 --cpus-per-task=8 --mem=50g --gres=gpu:k80:2,lscratch:10
salloc.exe: Pending job allocation 45496024
salloc.exe: job 45496024 queued and waiting for resources
salloc.exe: job 45496024 has been allocated resources
salloc.exe: Granted job allocation 45496024
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn4184 are ready for job
srun: error: x11: no local DISPLAY defined, skipping

[user@cn4184 ~]$ source /data/${USER}/conda/etc/profile.d/conda.sh
[user@cn4184 ~]$ conda create -n my-tensorflow python=3.7 tensorflow-gpu==1.15.0
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /data/user/conda/envs/my-tensorflow

  added / updated specs:
    - python=3.7
    - tensorflow-gpu==1.15.0

The following NEW packages will be INSTALLED:
[...]

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate my-tensorflow
#
# To deactivate an active environment, use
#
#     $ conda deactivate

[user@cn4184 ~]$ conda activate my-tensorflow
(my-tensorflow)[user@cn4184 ~]$ which python  # ensure you are using your tensorflow installation
/data/user/conda/envs/my-tensorflow/bin/python
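Before moving on, it can be worth confirming that this TensorFlow build actually sees the allocated GPUs. A minimal check, run inside the activated my-tensorflow environment on the compute node (this step is not part of the original example; tf.test.is_gpu_available() is the TensorFlow 1.x API):

# optional sanity check: should print True on a GPU node
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"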
Now download the BERT GitHub repository, some sample data, and a pre-trained model, and set environment variables pointing to these locations.
(my-tensorflow)[user@cn4184 ~]$ mkdir -pv /data/${USER}/bert
mkdir: created directory ‘/data/user/bert’
(my-tensorflow)[user@cn4184 ~]$ cd /data/${USER}/bert
(my-tensorflow)[user@cn4184 bert]$ git clone https://github.com/google-research/bert.git
Cloning into 'bert'...
remote: Enumerating objects: 336, done.
remote: Total 336 (delta 0), reused 0 (delta 0), pack-reused 336
Receiving objects: 100% (336/336), 291.41 KiB | 0 bytes/s, done.
Resolving deltas: 100% (184/184), done.
(my-tensorflow)[user@cn4184 bert]$ cd bert/ && git checkout cc7051dc && cd ..  # ensure a working starting point for tutorial
Note: checking out 'cc7051dc'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at cc7051d... Updating XNLI paths
(my-tensorflow)[user@cn4184 bert]$ wget https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py
--2020-01-03 17:49:54--  https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py
Resolving dtn06-e0 (dtn06-e0)... 10.1.200.242
Connecting to dtn06-e0 (dtn06-e0)|10.1.200.242|:3128... connected.
Proxy request sent, awaiting response... 200 OK
Length: 8225 (8.0K) [text/plain]
Saving to: ‘download_glue_data.py’

100%[============================================================>] 8,225       --.-K/s   in 0.001s

2020-01-03 17:49:54 (5.38 MB/s) - ‘download_glue_data.py’ saved [8225/8225]

(my-tensorflow)[user@cn4184 bert]$ python download_glue_data.py
Downloading and extracting CoLA...
	Completed!
Downloading and extracting SST...
	Completed!
Processing MRPC...
Local MRPC data not specified, downloading data from https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_train.txt
	Completed!
Downloading and extracting QQP...
	Completed!
Downloading and extracting STS...
	Completed!
Downloading and extracting MNLI...
	Completed!
Downloading and extracting SNLI...
	Completed!
Downloading and extracting QNLI...
	Completed!
Downloading and extracting RTE...
	Completed!
Downloading and extracting WNLI...
	Completed!
Downloading and extracting diagnostic...
	Completed!
(my-tensorflow)[user@cn4184 bert]$ export GLUE_DIR=/data/${USER}/bert/glue_data
(my-tensorflow)[user@cn4184 bert]$ wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
--2020-01-03 17:52:59--  https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
Resolving dtn06-e0 (dtn06-e0)... 10.1.200.242
Connecting to dtn06-e0 (dtn06-e0)|10.1.200.242|:3128... connected.
Proxy request sent, awaiting response... 200 OK
Length: 407727028 (389M) [application/zip]
Saving to: ‘uncased_L-12_H-768_A-12.zip’

100%[============================================================>] 407,727,028  120MB/s   in 3.2s

2020-01-03 17:53:03 (120 MB/s) - ‘uncased_L-12_H-768_A-12.zip’ saved [407727028/407727028]

(my-tensorflow)[user@cn4184 bert]$ unzip uncased_L-12_H-768_A-12.zip
Archive:  uncased_L-12_H-768_A-12.zip
   creating: uncased_L-12_H-768_A-12/
  inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.meta
  inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.data-00000-of-00001
  inflating: uncased_L-12_H-768_A-12/vocab.txt
  inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.index
  inflating: uncased_L-12_H-768_A-12/bert_config.json
(my-tensorflow)[user@cn4184 bert]$ export BERT_BASE_DIR=/data/${USER}/bert/uncased_L-12_H-768_A-12
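As an optional sanity check (not part of the original steps), you can confirm that the environment variables point to the expected locations before starting the fine-tuning run:

ls $GLUE_DIR/MRPC    # should contain the MRPC train/dev/test TSV files
ls $BERT_BASE_DIR    # should contain bert_config.json, vocab.txt, and the bert_model.ckpt.* files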
Now we can use the run_classifier.py script from the BERT GitHub repo to fine-tune the uncased_L-12_H-768_A-12 model on some example data. This step should take around 10 minutes to complete.
(my-tensorflow)[user@cn4184 bert]$ python bert/run_classifier.py \
    --task_name=MRPC \
    --do_train=true \
    --do_eval=true \
    --data_dir=$GLUE_DIR/MRPC \
    --vocab_file=$BERT_BASE_DIR/vocab.txt \
    --bert_config_file=$BERT_BASE_DIR/bert_config.json \
    --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
    --max_seq_length=128 \
    --train_batch_size=32 \
    --learning_rate=2e-5 \
    --num_train_epochs=3.0 \
    --output_dir=/lscratch/${SLURM_JOB_ID}/mrpc_output
WARNING:tensorflow:From /gpfs/gsfs11/users/user/bert/bert/optimization.py:87: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
WARNING:tensorflow:From bert/run_classifier.py:981: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.
WARNING:tensorflow:From bert/run_classifier.py:784: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
[...]
INFO:tensorflow:evaluation_loop marked as finished
I0103 18:07:13.907693 46912496418496 error_handling.py:101] evaluation_loop marked as finished
INFO:tensorflow:***** Eval results *****
I0103 18:07:13.907948 46912496418496 run_classifier.py:923] ***** Eval results *****
INFO:tensorflow:  eval_accuracy = 0.86764705
I0103 18:07:13.908066 46912496418496 run_classifier.py:925]   eval_accuracy = 0.86764705
INFO:tensorflow:  eval_loss = 0.38859132
I0103 18:07:13.908277 46912496418496 run_classifier.py:925]   eval_loss = 0.38859132
INFO:tensorflow:  global_step = 343
I0103 18:07:13.908396 46912496418496 run_classifier.py:925]   global_step = 343
INFO:tensorflow:  loss = 0.38859132
I0103 18:07:13.908495 46912496418496 run_classifier.py:925]   loss = 0.38859132
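While fine-tuning runs, you can watch GPU utilization with nvidia-smi from another shell on the same compute node. After it finishes, the evaluation metrics shown in the log above should also be written to a text file in the output directory (this is the behavior of run_classifier.py at the commit checked out above; treat the exact file name as an assumption):

nvidia-smi                                                # optional: confirm the GPUs are being used during training
cat /lscratch/${SLURM_JOB_ID}/mrpc_output/eval_results.txt  # eval_accuracy, eval_loss, global_step, loss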
Now we can use the fine-tuned model to perform inference on some of the example data.
(my-tensorflow)[user@cn4184 bert]$ TRAINED_CLASSIFIER=/lscratch/${SLURM_JOB_ID}/mrpc_output
(my-tensorflow)[user@cn4184 bert]$ python bert/run_classifier.py \
    --task_name=MRPC \
    --do_predict=true \
    --data_dir=$GLUE_DIR/MRPC \
    --vocab_file=$BERT_BASE_DIR/vocab.txt \
    --bert_config_file=$BERT_BASE_DIR/bert_config.json \
    --init_checkpoint=$TRAINED_CLASSIFIER \
    --max_seq_length=128 \
    --output_dir=/lscratch/${SLURM_JOB_ID}/mrpc_output
[...]
INFO:tensorflow:Running local_init_op.
I0103 18:15:42.120980 46912496418496 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0103 18:15:42.180213 46912496418496 session_manager.py:502] Done running local_init_op.
2020-01-03 18:15:42.976214: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
INFO:tensorflow:prediction_loop marked as finished
I0103 18:16:15.129939 46912496418496 error_handling.py:101] prediction_loop marked as finished
INFO:tensorflow:prediction_loop marked as finished
I0103 18:16:15.130402 46912496418496 error_handling.py:101] prediction_loop marked as finished
(my-tensorflow)[user@cn4184 bert]$ tail /lscratch/${SLURM_JOB_ID}/mrpc_output/test_results.tsv
0.009501748	0.99049824
0.024339601	0.9756604
0.009656649	0.99034333
0.9432048	0.056795176
0.012551893	0.98744816
0.96603405	0.03396591
0.9437976	0.056202445
0.010656527	0.9893435
0.008318217	0.99168175
0.008777457	0.9912225
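Each row of test_results.tsv contains one probability per class for the corresponding example in the MRPC test set (label "0" = not a paraphrase, label "1" = paraphrase, in the order used by the MRPC processor in run_classifier.py); the predicted class is simply the column with the larger value. A small illustrative one-liner for turning the probabilities into hard labels (not part of the BERT tooling):

awk '{ print ($1 > $2 ? 0 : 1) }' /lscratch/${SLURM_JOB_ID}/mrpc_output/test_results.tsv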
At this point, the fine-tuned model and the inference results are located in the node-local /lscratch/${SLURM_JOB_ID} directory, which is cleaned up when the job ends. Don't forget to copy anything you want to keep back to your space before exiting the job.
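For example, a copy back to your data directory might look like this (the destination path is just an illustration; choose whatever location suits your project):

cp -r /lscratch/${SLURM_JOB_ID}/mrpc_output /data/${USER}/bert/mrpc_output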
Create a batch input file. For example:
[user@biowulf ~]$ cat >submit.sh <<'EOF'
#!/bin/bash
set -e

source /data/${USER}/conda/etc/profile.d/conda.sh
conda activate my-tensorflow

cd /data/${USER}/bert
export GLUE_DIR=/data/${USER}/bert/glue_data
export BERT_BASE_DIR=/data/${USER}/bert/uncased_L-12_H-768_A-12

python bert/run_classifier.py \
    --task_name=MRPC \
    --do_train=true \
    --do_eval=true \
    --data_dir=$GLUE_DIR/MRPC \
    --vocab_file=$BERT_BASE_DIR/vocab.txt \
    --bert_config_file=$BERT_BASE_DIR/bert_config.json \
    --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
    --max_seq_length=128 \
    --train_batch_size=32 \
    --learning_rate=2e-5 \
    --num_train_epochs=3.0 \
    --output_dir=/lscratch/${SLURM_JOB_ID}/mrpc_output

cp -r /lscratch/${SLURM_JOB_ID} /data/$USER/${SLURM_JOB_ID}-trained-model
EOF
Submit this job using the Slurm sbatch command.
[user@biowulf ~]$ sbatch --partition=gpu --ntasks=1 --cpus-per-task=8 --mem=50g --gres=gpu:k80:2,lscratch:10 submit.sh
45503181
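You can then follow the job with the standard Slurm tools. For example (the job ID below is the one returned by the submission above; by default Slurm writes the job's standard output to slurm-<jobid>.out in the submission directory):

squeue -u $USER          # check whether the job is still pending or already running
less slurm-45503181.out  # inspect the job's output once it has started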