Biowulf High Performance Computing at the NIH
Cambridge Structural Database/WebCSD

The Cambridge Structural Database (CSD) is the world's repository of small-molecule crystal structures. The database holds bibliographic, 2D chemical and 3D structural results from crystallographic analyses of organics, organometallics and metal complexes. Both X-ray and neutron diffraction studies are included for compounds containing up to ca. 500 atoms (including hydrogens). Note: the database is now available on the Biowulf cluster.

Along with the database come tools for searching, retrieving, analyzing and displaying the CSD contents. These include ConQuest (text, numeric, 2D substructure and 3D geometric searching) and VISTA (statistical analysis and display of geometric and other data).

Installers for Windows, Mac and Linux are also available to all NIH employees interested in using the software. If you would like to use CSD from your local desktop, please contact, for a link to the download.



NIH users can access the CSD directly through the WebCSD portal, which adds new structures weekly so that searches stay up to date. The portal supports similarity, substructure, text and numerical searches, as well as browsing the database. Access is restricted to NIH IP addresses; if you are working from home, connect to the NIH VPN first.

Interactive ConQuest on Biowulf

CSD requires an X Windows session, or preferably an NX session. See the Biowulf documentation on X Windows and NX for more information.

You cannot run ConQuest on the Biowulf login node. First allocate an interactive session with sinteractive, then type module load CSD followed by quest & at the prompt.

Sample session ('user' in the prompts stands for your own Helix/Biowulf username):

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load CSD

[user@cn3144 ~]$ quest &
Starting ConQuest. Use questv5 for QUEST.
Running ConQuest with glibc version 2.5

Python API on Biowulf

Loading the CSD module also sets up the paths for the CSD Python API installation. Sample interactive session:

[user@biowulf]$ sinteractive --mem=20g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load CSD
[+] Loading Cambridge Structural Database 2020  on cn0931

# Note: the following is required to run without an X Windows display,
# but it will prevent diagram generation and other graphics-oriented
# parts of the CSD API from running
[user@cn3144 ~]$ export CCDC_PYTHON_API_NO_QAPPLICATION=True

[user@cn3144 ~]$ python
Python 3.7.6 | packaged by conda-forge | (default, Jan  7 2020, 22:33:48)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ccdc import io
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-user'
>>> csd_reader = io.EntryReader('CSD')
>>> cryst_abebuf = csd_reader.crystal('ABEBUF')
>>> mol_abebuf = csd_reader.molecule('ABEBUF')
>>> round(mol_abebuf.molecular_weight, 3)
>>> mol_abebuf.is_organic
>>> print(mol_abebuf.heaviest_component.smiles)
>>> quit()

[user@cn3144 ~]$ exit
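The export in the session above is needed when there is no X display, but it also disables graphics-oriented features. For a script that may run either inside an NX session or as a batch job, one option is to set the variable only when no display is present. This is a minimal sketch under stated assumptions: the helper name configure_headless is hypothetical, treating an empty or missing DISPLAY as "no X display" is an assumption, and, as in the session above, the variable is assumed to need to be in place before ccdc is imported.

```python
import os
from collections.abc import MutableMapping

# Variable name taken from the interactive session above.
HEADLESS_VAR = "CCDC_PYTHON_API_NO_QAPPLICATION"


def configure_headless(environ: MutableMapping = os.environ) -> bool:
    """Set HEADLESS_VAR to "True" when no X display appears to be available.

    Assumption (hypothetical helper): an empty or missing DISPLAY means
    there is no X display. An existing HEADLESS_VAR value is left untouched.
    Returns True only if this call set the variable.
    """
    if environ.get("DISPLAY") or HEADLESS_VAR in environ:
        return False
    environ[HEADLESS_VAR] = "True"
    return True
```

In a script, call configure_headless() first and only then run from ccdc import io, so that the setting is in the environment before the API loads. Passing a plain dict instead of os.environ lets you check the decision without touching the real environment.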

Python API in a batch job

Set up a batch script along the following lines:


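#!/bin/bash

module load CSD

# Note: the following is required to run without an X Windows display,
# but it will prevent diagram generation and other graphics-oriented
# parts of the CSD API from running
export CCDC_PYTHON_API_NO_QAPPLICATION=True

python << 'EOF'
from ccdc import io
from ccdc.search import TextNumericSearch

csd_reader = io.EntryReader('CSD')

# Text/numeric search on compound name (search term assumed;
# the sample output below lists aspirin-related entries)
text_numeric_search = TextNumericSearch()
text_numeric_search.add_compound_name('aspirin')
identifiers = [h.identifier for h in text_numeric_search.search()]

# Print the identifier and melting point of each unique hit that has one
for identifier in sorted(set(identifiers)):
    e = csd_reader.entry(identifier)
    if e.melting_point:
        print('%-8s %s' % (e.identifier, e.melting_point))
EOF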

Submit the script with:
sbatch myscript

By default, Slurm writes the job output to slurm-<jobid>.out in the directory you submitted from. It should contain the following:
[+] Loading Cambridge Structural Database 2020  on cn####
ACSALA13         135.5deg.C
CUASPR01 above 573 K
HUPPOX          91-96 deg.C
HUPPOX01                      91-96 deg.C
NUWTOP01        147 deg.C
PIKYUG   502 K
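The melting points in this output are free text with mixed units and forms (single values, ranges, an "above" bound). If you want to compare them numerically, a small post-processing step can help. The sketch below is hypothetical: parse_line, parse_file and group_by_family are illustrative names, and the pattern only covers the formats visible in the sample output, so other notations are reported as unrecognised (None). It converts values to kelvin and groups refcodes by their six-letter family (for example HUPPOX and HUPPOX01), following the CSD convention that further entries for the same compound share the first six letters of the refcode.

```python
import re
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional

# Formats seen in the sample output: "135.5deg.C", "91-96 deg.C",
# "above 573 K", "502 K". Anything else is treated as unrecognised.
MP_PATTERN = re.compile(
    r"^(?P<qual>above|below)?\s*"
    r"(?P<low>\d+(?:\.\d+)?)"
    r"(?:\s*-\s*(?P<high>\d+(?:\.\d+)?))?"
    r"\s*(?P<unit>deg\.?\s*C|K)$",
    re.IGNORECASE,
)
# CSD refcodes: six letters, optionally followed by two digits.
REFCODE_PATTERN = re.compile(r"[A-Z]{6}(\d{2})?")


class MeltingPoint(NamedTuple):
    refcode: str
    family: str                # six-letter refcode family
    low_k: float               # lower bound in kelvin
    high_k: float              # upper bound in kelvin (== low_k for single values)
    qualifier: Optional[str]   # "above", "below" or None


def to_kelvin(value: float, unit: str) -> float:
    """Convert a value given in deg.C or K to kelvin."""
    return value if unit.upper() == "K" else value + 273.15


def parse_line(line: str) -> Optional[MeltingPoint]:
    """Parse one 'REFCODE  melting point' output line; None if unrecognised."""
    parts = line.split(None, 1)
    if len(parts) != 2:
        return None
    refcode, text = parts[0], parts[1].strip()
    match = MP_PATTERN.match(text)
    if match is None:
        return None
    unit = match.group("unit")
    low = to_kelvin(float(match.group("low")), unit)
    high_text = match.group("high")
    high = to_kelvin(float(high_text), unit) if high_text else low
    qual = match.group("qual")
    family = refcode[:6] if REFCODE_PATTERN.fullmatch(refcode) else refcode
    return MeltingPoint(refcode, family, low, high, qual.lower() if qual else None)


def parse_file(path: str) -> List[MeltingPoint]:
    """Parse every recognisable line of a job output file."""
    with open(path) as fh:
        return [mp for mp in map(parse_line, fh) if mp is not None]


def group_by_family(records: List[MeltingPoint]) -> Dict[str, List[str]]:
    """Map each six-letter refcode family to its refcodes, in input order."""
    families: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        families[record.family].append(record.refcode)
    return dict(families)
```

For example, group_by_family(parse_file('slurm-<jobid>.out')) would list HUPPOX and HUPPOX01 under the same family. Lines such as the "[+] Loading ..." module message are skipped because they do not match the melting-point pattern.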