Biowulf High Performance Computing at the NIH
Transferring data to/from the NIH HPC systems

There are several secure options for transferring files to and from Biowulf and Helix. Detailed setup & usage instructions for each method are below.

No matter how you transfer data in and out of the systems, be aware that PII and PHI data cannot be stored on or transferred to the NIH HPC systems.

Data transfer and sharing using Globus

Globus is a service that makes it easy to move, sync, and share large amounts of data. It is the recommended way to transfer data to and from the HPC systems.

Globus will manage file transfers, monitor performance, retry failures, recover from faults automatically when possible, and report the status of your data transfer. Globus uses GridFTP for more reliable and high-performance file transfer, and will queue file transfers to be performed asynchronously in the background.

Setting up a Globus account, transferring and sharing data

Mount HPC Systems Directories To Desktop (Inside NIH Network Only):

Windows: Mapped Network Drive

This method will allow you to easily drag/drop files between your local machine and your global HPC (Biowulf/Helix) directories. This includes /home, /data, and /scratch. Please see the section on Storage for more information about /home, /data, and /scratch.

This method can only be used for machines that are within the NIH network, including VPN connections. The NCI-Frederick campus is outside the main NIH campus firewall, so users at NCI-Frederick will need to use VPN.

  1. On your desktop machine, open 'Computer' and select Tools → Map Network Drive.

    My Computer folder

  2. Enter the directory you want to mount as follows:

    • /home/[user]: \\helixdrive.nih.gov\[user]
    • /data/[user]: \\helixdrive.nih.gov\data
    • /scratch: \\helixdrive.nih.gov\scratch
    • Shared group area (e.g. /data/PQRlab): \\helixdrive.nih.gov\name_of_shared_area

    Make sure to replace [user] with your NIH login username.

    Map Network Drive window

    Because the NIH HPC systems use NIH Login authentication, you should not have to enter your username or password. Click the 'Finish' button.

  3. You have successfully mapped your HPC directory to your desktop machine! You should see a network icon in the My Computer folder. You can create a shortcut to this drive on your desktop.

    /home icon

  4. Please note that the disk usage information is not correct for your /home directory, but it is correct for your /data directory.

    /data icon

More about /home, /data, and /scratch directories.

Windows: Add Network Location

This method will also allow you to easily drag/drop files between your local machine and your NIH HPC directories. This includes the global Biowulf/Helix/Felix /home, /data, and /scratch. Please see the section on Storage for more information about /home, /data, and /scratch. Once added, the network location remains accessible unless there are network issues or you move to a new machine.

This method can only be used for machines that are within the NIH network, including VPN connections. The NCI-Frederick campus is outside the main NIH campus firewall, so users at NCI-Frederick will need to use VPN.

  1. On your desktop machine, right-click 'Computer' and click the 'Add Network Location' menu item to start the wizard. Click 'Next' when the following window pops up.

    add network location

  2. Click 'Next' in this window

    where to create

  3. Enter the shared drive you want to mount as follows:

    • /home/[user]: \\helixdrive.nih.gov\[user]
    • /data/[user]: \\helixdrive.nih.gov\data
    • /scratch: \\helixdrive.nih.gov\scratch
    • Shared group area (e.g. /data/PQRlab): \\helixdrive.nih.gov\name_of_shared_area

    Make sure to replace [user] with your NIH login username.

    specify location

    Click the 'Next' button.

  4. You can name the location, although the default name is fine.

    name location image

  5. Completing the wizard

    completing the wizard

    Click the 'Finish' button

    /data icon


More about /home, /data, and /scratch directories.

Macs: Mapped Network Drive

Desktop machines within the NIH network can map NIH HPC directories via Helixdrive, so that you can easily drag/drop files between your local machine and your Biowulf/Helix/Felix /home, /data and /scratch directories. [More information about /home, /data, and /scratch].

Note:

Mac users should consider creating (or editing) the following file on their system if they would like to use mapped network drives:

/etc/sysctl.conf

Include this line in the file (it may be the only contents of the file):

net.inet.tcp.delayed_ack=0

After creating or editing the file, reboot your Mac. This can greatly improve file-transfer performance. Without this change, performance may be bad enough to render Helixdrive shares unusable. If you are unable or unwilling to create this file, you'll likely want to use sftp or scp rather than Helixdrive.
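Creating or editing the file can be scripted so that it is safe to repeat. The sketch below is a hypothetical helper, `ensure_line` (not an NIH-provided tool), that appends a line only if that exact line is not already present. It assumes a POSIX-compatible shell; because /etc/sysctl.conf is owned by root, you would run it as an administrator (for example from a script started with sudo).

```shell
# Hypothetical helper (not an NIH tool): append LINE to FILE only if FILE does
# not already contain LINE as a complete line, so re-running it is harmless.
# Usage: ensure_line FILE LINE
ensure_line() {
    file=$1
    line=$2
    # -q quiet, -x whole-line match, -F treat LINE as a fixed string
    if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
        return 0
    fi
    # If the file is non-empty and does not end in a newline, add one first
    # (command substitution strips a trailing newline, leaving an empty string)
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}

# Example, run as root on the Mac (then reboot):
# ensure_line /etc/sysctl.conf 'net.inet.tcp.delayed_ack=0'
```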

This method can only be used for machines that are within the NIH network, including VPN connections. The NCI-Frederick campus is outside the main NIH campus firewall, so users at NCI-Frederick will need to use VPN.

  1. From the main Mac menu, click on Go → Connect to server.
  2. For 'Server address', enter the HPC directory you want to mount:
    • /home/[user]: smb://helixdrive.nih.gov/user
    • /data/[user]: smb://helixdrive.nih.gov/data
    • /scratch: smb://helixdrive.nih.gov/scratch
    • Shared group area (e.g./data/PQRlab): smb://helixdrive.nih.gov/name_of_shared_area
    (Replace 'user' with your NIH login username.)

    server connection display

  3. Click 'Connect' and in the subsequent window, enter your NIH Login user and password. NIH AD usernames and passwords are used to connect to all Helix & Biowulf services.

    login/password

  4. The requested area should now be mounted as a shared drive. In your Finder window, you will see 'helixdrive.nih.gov' listed under 'Shared', and can drag and drop files to your HPC directories.

    folder display
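The server addresses in step 2 follow a simple pattern, so they can be derived from the HPC path. The sketch below is a hypothetical helper (`helixdrive_smb_url` is not an NIH tool) that encodes that list; it assumes any other /data/NAME path is a shared group area exported under its own name, as in the PQRlab example. Variable names avoid `path`, which is special in zsh, the default shell on recent Macs.

```shell
# Hypothetical helper: map an HPC path to the Helixdrive SMB URL from the
# list above. Usage: helixdrive_smb_url HPC_PATH NIH_USERNAME
helixdrive_smb_url() {
    hpc_path=${1%/}      # ignore a trailing slash
    hpc_user=$2
    base=smb://helixdrive.nih.gov
    case "$hpc_path" in
        "/home/$hpc_user") echo "$base/$hpc_user" ;;
        "/data/$hpc_user") echo "$base/data" ;;
        /scratch)          echo "$base/scratch" ;;
        /data/?*)          echo "$base/${hpc_path#/data/}" ;;  # assumed shared group area
        *)
            echo "helixdrive_smb_url: no known share for $hpc_path" >&2
            return 1 ;;
    esac
}
```

From Terminal, `open "$(helixdrive_smb_url /data/PQRlab jdoe)"` should hand the URL to the Finder, which then asks for your NIH Login credentials as in step 3.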

Linux: Mapped Network Drive

Note that this method is most suitable for transferring small files. Users transferring large amounts of data to and from Biowulf/Helix should use scp, sftp, or Globus instead.

This method can only be used for machines that are within the NIH network, including VPN connections. The NCI-Frederick campus is outside the main NIH campus firewall, so users at NCI-Frederick will need to use VPN.

Since your uid/gid on your desktop is likely to differ from those on Biowulf, you may also need to include the uid and gid options. For example, if your local uid and gid are 3245, you would add uid=3245,gid=3245 to the mount option string.

Typical mount commands for accessing a CIFS file system:

To mount your Biowulf /home/[user]:

mount -t cifs -o rw,vers=2.0,nosetuids,sec=ntlmsspi,user=jdoe,domain=NIH.gov //helixdrive.nih.gov/[user] /mnt/bw-home

To mount your Biowulf /data/[user]:

mount -t cifs -o rw,vers=2.0,nosetuids,sec=ntlmsspi,user=jdoe,domain=NIH.gov //helixdrive.nih.gov/data /mnt/bw-data

To mount Biowulf /scratch:

mount -t cifs -o rw,vers=2.0,nosetuids,sec=ntlmsspi,user=jdoe,domain=NIH.gov //helixdrive.nih.gov/scratch /mnt/bw-scratch

To mount a shared group area: (e.g. /data/PQRlab)

mount -t cifs -o rw,vers=2.0,nosetuids,sec=ntlmsspi,user=jdoe,domain=NIH.gov //helixdrive.nih.gov/PQRlab /mnt/bw_PQRlab

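Note that the local mount-point directories (e.g. /mnt/bw-home) may have to be adapted to your situation and must exist before mounting, the mount command normally has to be run as root (e.g. with sudo), and jdoe has to be replaced with your Biowulf username.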
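The commands above differ only in the share name and mount point, and the uid/gid note applies to all of them. The following is a sketch of a hypothetical shell function that builds the same mount command, appends the uid/gid of your desktop account (from `id -u` and `id -g`), and runs it with sudo. It assumes the cifs-utils package, which provides mount.cifs, is installed; the DRY_RUN switch is a convention of this sketch only and prints the command instead of running it.

```shell
# Hypothetical helper: build the CIFS mount command shown above for one
# Helixdrive share, adding uid/gid of the local (desktop) account.
# Usage: helixdrive_mount SHARE MOUNTPOINT NIH_USERNAME
# Set DRY_RUN=1 to print the command instead of running it.
helixdrive_mount() {
    share=$1
    mountpoint=$2
    nihuser=$3
    opts="rw,vers=2.0,nosetuids,sec=ntlmsspi,user=${nihuser},domain=NIH.gov"
    # id runs as you (not via sudo), so these are your desktop uid/gid
    opts="${opts},uid=$(id -u),gid=$(id -g)"
    if [ -n "$DRY_RUN" ]; then
        echo mount -t cifs -o "$opts" "//helixdrive.nih.gov/${share}" "$mountpoint"
    else
        sudo mkdir -p "$mountpoint"    # the mount point must exist
        sudo mount -t cifs -o "$opts" "//helixdrive.nih.gov/${share}" "$mountpoint"
    fi
}

# Example: helixdrive_mount data /mnt/bw-data jdoe
```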

GUI File Transfer Clients:
Windows: WinSCP
  1. Download WinSCP from winscp.net.

  2. Click 'Open'.

    WinSCP File Download

  3. Select 'Next'.

    WinSCP Setup Wizard

  4. Select 'I Accept', then click 'Next'.

    WinSCP License Agreement

  5. Accept the default location or choose one yourself, then click 'Next'.

    WinSCP Select Destination Directory

  6. Click 'Next'.

    WinSCP Select Components

  7. Click 'Next'.

    WinSCP Select Start Menu Folder

  8. Click 'Next'.

    WinSCP Select Additional Tasks

  9. Click 'Next'.

    WinSCP Initial user settings

  10. Click 'Install'.

    WinSCP Ready to Install

  11. Uncheck the 'Launch WinSCP' box, then click 'Finish'.

    WinSCP Completing the WinSCP3 Setup Wizard

  12. To open WinSCP, double-click the shortcut on your desktop.

    WinSCP3 Icon

  13. Fill in the host name (e.g. helix.nih.gov), your NIH login username and password, select 'SFTP', then click 'Login'.

    WinSCP Login Screen

  14. Click 'Yes'. This window only shows up the first time you connect to a given host.

    WinSCP Add Server Host Key

  15. The left panel shows the directories on your desktop PC and the right panel shows your directories on Biowulf/Helix.

    WinSCP Display Panel

  16. Click the 'Preferences' icon and browse through the tabs to get an idea of all the options available.

    WinSCP Preference Icon

  17. To choose the source and destination directories, use the two drop-down boxes. Drag and drop files or folders to start a transfer.

    WinSCP Locate File Source

Macs: Fugu

Fugu is a graphical frontend to the command-line Secure File Transfer application (SFTP). SFTP is similar to FTP, but unlike FTP the entire session is encrypted, so no passwords are sent in cleartext and the session is much less vulnerable to third-party interception. Fugu lets you take advantage of SFTP's security without sacrificing the ease of use of a GUI. Fugu also supports SCP file transfers and the creation of secure tunnels via SSH.

  1. Download Fugu from the U. Mich. Fugu website. For OS X 10.5 and above, download it from cnet.com.

  2. Double-click the downloaded Fugu_xxxx.dmg file to open it. A small window with the Fugu icon will appear.

    Fugu Icon

    Drag the Fugu icon (the fish) to your Applications folder, your Desktop, and/or your Dock.

  3. Start Fugu by clicking on the Fugu icon. In the 'Connect to:' box, enter 'helix.nih.gov' and click 'Connect'. Enter your NIH Login password when requested. You should now see a window with one pane listing files on your local desktop machine and the other pane listing files in your Biowulf/Helix account space.

    Fugu Display Panel

You can now transfer files by dragging and dropping between the two panes.
Windows/Mac/Linux: Filezilla
  The installation steps below use the Windows installer (setup.exe).

  1. Download FileZilla.

  2. Save the setup.exe to your desktop.

    FileZilla Setup Open

  3. Double-click the setup.exe icon and accept the license agreement.

    FileZilla License Agreement

  4. Choose components, install location, and start menu folder. The defaults are almost always acceptable.

    FileZilla Choose Components

    FileZilla Install Location

    FileZilla Start Menu Folder

  5. Click 'Install', then accept and finish.

  6. Start the FileZilla client.

  7. Select File > Site Manager...

    FileZilla Site Manager Select

  8. Click 'New Site' and configure it for Helix (for example, host helix.nih.gov, the SFTP protocol, and your NIH login username).

    FileZilla Site Manager Panel

  9. Click 'Connect', then drag and drop files between the two systems.

    FileZilla Display Panel

Commandline File Transfer:
Windows: secure FTP and secure copy with PuTTY

Both psftp and pscp are run from the Windows console (Command Prompt in the Start menu) and require the directory containing the PuTTY executables to be included in the Path environment variable. This can be done transiently through the console:

PuTTY Command Window

or permanently through the System Control Panel (see here for more information).

pscp

Secure Copy (pscp) is a command line mechanism for copying files to and from remote systems.

From the console, type 'pscp'. This will bring up a help menu showing all the options for pscp.

PuTTY Secure Copy client
Release 0.58
Usage: pscp [options] [user@]host:source target
       pscp [options] source [source...] [user@]host:target
       pscp [options] -ls [user@]host:filespec
Options:
  -V        print version information and exit
  -pgpfp    print PGP key fingerprints and exit
  -p        preserve file attributes
  -q        quiet, don't show statistics
  -r        copy directories recursively
  -v        show verbose messages
  -load sessname  Load settings from saved session
  -P port   connect to specified port
  -l user   connect with specified user
  -pw passw login with specified password
  -1 -2     force use of particular SSH protocol version
  -4 -6     force use of IPv4 or IPv6
  -C        enable compression
  -i key    private key file for authentication
  -batch    disable all interactive prompts
  -unsafe   allow server-side wildcards (DANGEROUS)
  -sftp     force use of SFTP protocol
  -scp      force use of SCP protocol

To copy a file from the local Windows machine to a user's home directory on Biowulf/Helix, type

C:> pscp localfile user@helix.nih.gov:/home/user/localfile

You will be prompted for your NIH login password, then the file will be copied.

To do the reverse, i.e. copy a remote file from helix to the local Windows machine, type

C:> pscp user@helix.nih.gov:/home/user/remotefile .

(The trailing '.' copies the file into the current local directory under the same name; alternatively, give an explicit local filename for the copy.)

psftp

Secure FTP (psftp) allows interactive file transfers between machines in the same way as traditional (non-secure) FTP.

From the console, type 'psftp'. This will start an SFTP session, but it will complain that no connection has been made. To connect to Helix, type at the psftp prompt:

psftp> open user@helix.nih.gov

You will again be prompted for a password.

Once a session to Helix has been established, the standard FTP commands (such as cd, ls, get, and put) can be used.

For even more information, see http://the.earth.li/~sgtatham/putty/0.58/htmldoc/

Macs & Unix/Linux: Secure Copy

scp is a secure, encrypted way to transfer files between machines. It is available on Macs and Unix/Linux machines.

To transfer a file from your local machine to Helix/Biowulf, open a terminal window on your local machine. In this window type

scp mylocalfile user@helix.nih.gov:/home/user

or

scp mylocalfile user@biowulf.nih.gov:/home/user

where 'user' is your NIH login username. The scp program will prompt you for your NIH login password before transferring the file.

Because Helix and Biowulf share the /home and /data areas, a file copied to either system ends up in the same place.

To download a file from your NIH HPC account to your desktop machine, use the following command in a terminal window on your local machine.

scp user@helix.nih.gov:/home/user/myfile .

or

scp user@biowulf.nih.gov:/home/user/myfile .

or

scp user@biowulf.nih.gov:/data/user/mydir/myfile .

As before, 'user' is your NIH login username, and scp will prompt you for your NIH login password before transferring the file.
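When you have several files or whole directories to send, scp accepts multiple sources and an -r option for recursive copies, so one command (and one password prompt) covers them all. The following is a minimal sketch of such a wrapper; the function name `hpc_put`, the HPC_USER/HPC_HOST variables, and the DRY_RUN switch are hypothetical conveniences, not NIH tools.

```shell
# Hypothetical wrapper: copy local files/directories to a directory on
# Helix in a single scp call (one password prompt).
# Usage: hpc_put REMOTE_DIR SOURCE [SOURCE...]
# Set DRY_RUN=1 to print the scp command instead of running it.
HPC_USER=${HPC_USER:-user}          # placeholder: your NIH login username
HPC_HOST=${HPC_HOST:-helix.nih.gov}

hpc_put() {
    dest=$1
    shift
    # -r: copy directories recursively
    # -p: preserve modification times, access times, and file modes
    set -- scp -r -p "$@" "${HPC_USER}@${HPC_HOST}:${dest}"
    if [ -n "$DRY_RUN" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Example: HPC_USER=jdoe hpc_put /data/jdoe/mydir results.txt plots/
```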

Macs & Unix/Linux: bbcp

bbcp is a high-performance alternative to scp that can transfer files significantly faster. Biowulf staff have observed over 220 MB/s over 10G links. bbcp is a peer-to-peer application: you invoke bbcp on the source machine, and in response a bbcp process is started on the target machine.

Before you can use the bbcp utility it must be installed on both the local and remote systems. bbcp is already available on Helix (but not on Biowulf). To download a pre-compiled bbcp program for your Unix, Linux or MacOS X (x86_darwin_100) go here.

NOTE for Mac users: When the bbcp file is downloaded from the x86_darwin_100 folder, the name may need to be changed from bbcp.txt to bbcp, and the permissions may need to be changed to allow execution. This can be done using the Terminal application like this:

mv bbcp.txt bbcp
chmod +x bbcp

Also, to make the bbcp command available from any directory on your Mac, you can move it to the /usr/local/bin directory on your desktop (you may need to prefix the command with sudo):

mv bbcp /usr/local/bin/.

The syntax is identical to scp, but there are some differences in its operation, some of which are described here:

  • By default bbcp will not overwrite an existing file. Use the -f switch to force a file to be overwritten.
  • By default bbcp does not report data transfer rates. Use the -v switch to see rates.
  • bbcp uses a non-standard network port, so if you are initiating a copy from outside of the NIHnet firewall, you should use the -z switch (see the bbcp web site for an explanation).

Other useful switches are -h to get help and -r for recursive copies. There are many other features which are documented here.

Example: to download a file from Helix to your desktop machine, use the following command on Helix:

helix%  bbcp myfile myusername@mymachine:/myhome/myusername/myfile

As with scp, bbcp will prompt you for your password before transferring the file; in this case the password of your account on your desktop system.

Example: to transfer a file from your Mac on your home network (and not on the NIH VPN) to Helix, open the Terminal app on your Mac and type

bbcp -f -z myfile username@helix.nih.gov:/home/username/myfile

This will overwrite myfile on Helix if it already exists.
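The switches described above are easy to forget when moving between networks. This hypothetical wrapper (`bbcp_put`, with its OVERWRITE, OUTSIDE_NIH, and DRY_RUN variables, is an illustration only) always adds -v to report rates, adds -r when the local source is a directory, and adds -f and -z only on request.

```shell
# Hypothetical wrapper around the bbcp switches described above.
# Usage: bbcp_put LOCAL_SOURCE [user@]host:DEST
#   OVERWRITE=1    add -f (overwrite an existing target file)
#   OUTSIDE_NIH=1  add -z (copy initiated from outside the NIHnet firewall)
#   DRY_RUN=1      print the command instead of running it
bbcp_put() {
    src=$1
    dest=$2
    set -- bbcp -v                                   # -v: report transfer rates
    if [ -d "$src" ]; then set -- "$@" -r; fi        # recursive for directories
    if [ -n "$OVERWRITE" ]; then set -- "$@" -f; fi
    if [ -n "$OUTSIDE_NIH" ]; then set -- "$@" -z; fi
    set -- "$@" "$src" "$dest"
    if [ -n "$DRY_RUN" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Example (from home, not on VPN):
# OVERWRITE=1 OUTSIDE_NIH=1 bbcp_put myfile username@helix.nih.gov:/home/username/myfile
```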

As part of a Biowulf batch job

You may want to automatically transfer your generated results back to your local system at the end of a Biowulf batch job.

Command-line transfer as part of a batch job

Biowulf batch jobs run on the Biowulf compute nodes which are on a private network. Therefore you cannot directly scp from a Biowulf compute node to your local system. The recommended way to automatically transfer files at the end of a batch job is a Globus command line transfer.

First you should get familiar with the Globus command-line interface.

Then add something like the following at the end of your Biowulf batch job:

#!/bin/bash

# process your data
# ..... some batch job commands ....

# now set up a Globus command-line transfer to copy the results back to your local system
# (source: the first endpoint ID and path; destination: the second endpoint ID and path,
#  which should be the ID of your own endpoint)
globus transfer --recursive \
    e2620047-6d04-11e5-ba46-22000b92c6ec:/data/user/mydir/ \
    d8eb36b6-6d04-11e5-ba46-22000b92c6ec:/data1/myoutput/

The output from the last line of this batch script, which will appear in the usual slurm-#####.out output file, will be a Globus task ID of the form

Task ID: 2fdd385c-bf3e-11e3-b461-22000a971261
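If you run many such jobs, it can help to pull the task ID out of each Slurm output file so you can check the transfer later with `globus task show TASK_ID` (the Globus CLI also has `globus task wait TASK_ID` if you want to block until it finishes). The helper below is a hypothetical sketch that assumes the output file contains a line beginning with "Task ID:" as shown above; if there are several, it prints the last one.

```shell
# Hypothetical helper: print the Globus task ID recorded in a Slurm output
# file (the value after "Task ID:"); if there are several, print the last.
# Usage: globus_task_id slurm-12345.out
globus_task_id() {
    sed -n 's/^Task ID:[[:space:]]*//p' "$1" | tail -n 1
}

# Example follow-up with the Globus CLI:
# globus task show "$(globus_task_id slurm-12345.out)"
```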
To/from Object storage

Specialized file transfer tools

Some sources of biological data have specialized tools for file transfer.

Downloading data from NCBI:

NCBI makes a large amount of data available through the NCBI ftp site, and also provides most or all of the same data on their Aspera server. Aspera is a commercial package that has considerably faster download speeds than ftp. More details in the NCBI Aspera Transfer Guide.

via the Aspera command line client
You can use the Aspera command-line client (ascp) on Helix to download data from NCBI directly into your Biowulf/Helix account space. Aspera transfers can put a heavy I/O load on the Biowulf login node, and will not work from the Biowulf compute nodes, so please perform all Aspera transfers on Helix, the interactive file transfer system.

The 'ascp' command is available on Helix; the sample session below also loads the aspera module first, which is only needed if 'ascp' is not already on your path. If desired, you can set an alias for ascp that includes the key, e.g.

alias ascp="/usr/bin/ascp -i /opt/aspera/asperaweb_id_dsa.openssh"

Sample session:

helix% module load aspera
helix% ascp -T -i /opt/aspera/asperaweb_id_dsa.openssh  -l 300M \
     anonftp@ftp-private.ncbi.nlm.nih.gov:/snp/organisms/human_9606/ASN1_flat/ds_flat_ch1.flat.gz /data/user
ds_flat_ch1.flat.gz                                                          100% 5523MB  291Mb/s    02:41
Completed: 5656126K bytes transferred in 161 seconds

If your download stops before completion, you can use the -k2 flag to resume transfers without re-downloading all the data. e.g.

helix% ascp -T -i /opt/aspera/asperaweb_id_dsa.openssh -k2 -l500M \
         anonftp@ftp-private.ncbi.nlm.nih.gov:/snp/organisms/human_9606/ASN1_flat /data/user/
ds_flat_ch1.flat.gz                                       100%  323MB  0.0 b/s    00:03    
[...]
ds_flat_chPAR.flat.gz                                     100% 7742KB  402 b/s    00:01    
ds_flat_chUn.flat.gz                                      100%   39MB  107Mb/s    00:00    
ds_flat_chX.flat.gz                                       100%  104MB  196Mb/s    00:18    
ds_flat_chY.flat.gz                                       100%   14MB  3.3Mb/s    04:59    
Completed: 1706213K bytes transferred in 301 seconds
 (46432K bits/sec), in 30 files, 1 directory.
In the example above, the client skips over the files that had previously been transferred, and will download only the remaining files.

Typical file transfer rates from the NCBI server are 400 - 500 Mb/s, so '-l500M' is the recommended value.
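Because -k2 resumes rather than restarts, an interrupted download can simply be re-run. The sketch below is a generic, hypothetical `retry` function (not part of Aspera) that re-runs any command until it succeeds or a maximum number of attempts is reached, waiting RETRY_DELAY seconds between attempts; the default of 30 seconds is an arbitrary choice.

```shell
# Hypothetical helper: run a command, re-running it until it succeeds or
# MAX_TRIES attempts have failed. Waits RETRY_DELAY seconds (default 30)
# between attempts. Usage: retry MAX_TRIES COMMAND [ARGS...]
retry() {
    max_tries=$1
    shift
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_tries" ]; then
            echo "retry: giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-30}"
    done
}

# Example on Helix, resuming with -k2 on each attempt:
# retry 5 ascp -T -i /opt/aspera/asperaweb_id_dsa.openssh -k2 -l500M \
#     anonftp@ftp-private.ncbi.nlm.nih.gov:/snp/organisms/human_9606/ASN1_flat /data/user/
```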

via the Aspera browser plugin
Data transfer by this method will be slower than using the command-line client on Helix, but may be more convenient for smaller transfers. You will need to download the free Aspera client browser plugin, install it on your desktop browser, and download the data to a Helix/Biowulf data area that is mapped onto your desktop system.

  1. Download the Aspera Connect browser plugin from the Aspera website and install on your Mac, Windows, or Linux system.
  2. Map your Helix /data or /scratch area on your desktop system as described in the section above on Mapped Network Drive.
  3. Start up Aspera Connect on your Mac, Windows or Linux system. Go to Preferences->Network, and set the connection speed to the maximum value. In our tests, the actual typical download speed to a desktop system is 50 - 100 Mb/s.
  4. Point your browser to the NCBI Aspera server and select the directory or files you want to download. Select your Helix data or scratch areas as the download target area. You can monitor the download in the Aspera transfer manager window.

    By clicking on the icon in the transfer manager window, you can open the Transfer Monitor, which shows a more detailed graph of the transfer rate.

via browser on Helix using NX

  1. Start firefox on helix.
  2. Download the Aspera Connect browser plugin from the Aspera website to your home directory. Click on the download tab and choose the Linux x86_64 version (v2.4.7).
  3. Close firefox
  4. cd to the directory to which you downloaded the file
  5. At the helix prompt, type

    sh aspera-connect-2.4.7.37118-linux-64.sh
    
    which will create the directories .aspera/connect in your home directory.

To actually use the Aspera plugin on Helix, you will have to install the NoMachine NX client for your operating system from the NoMachine website.

Select your client from the Client Products list. If you are using Windows, also download all the fonts before running the software. Once you have the client installed, run the "NX Connection Wizard"

Give the session a name such as helix
Host: helix.nih.gov
Internet Connection: LAN
Desktop: Unix Gnome
NOTE: KDE is not installed on helix
You can click through with no changes after that.

  1. Login as yourself
  2. Applications -> Accessories -> Terminal

    NX Desktop Image - Terminal

  3. At the helix prompt, enter

    ./.aspera/connect/bin/asperaconnect&
    

    NX Desktop Image - Aspera command

    An icon that looks like the letter G should show up in the right hand corner next to the clock. Right-click on the icon and click "Preferences" to set them as requested on the NCBI 1000genomes page. Note: change the download directory to your /data directory.

    NX Desktop Image - Aspera command

  4. Start firefox on helix again and go to the website to download your files.

    NX Desktop Image - Aspera command

    NX Desktop Image - Aspera command

via FTP
It is also possible to download data from NCBI using ftp. In our tests, the Aspera client gave up to 5x faster transfer speeds than the NCBI ftp server. However, some data may only be available on the NCBI ftp server.

On Helix or Biowulf, use ftp ftp.ncbi.nlm.nih.gov to access the NCBI ftp site. Sample session:

helix%  ftp ftp.ncbi.nlm.nih.gov
Connected to ftp.wip.ncbi.nlm.nih.gov.
220-
 Warning Notice!
[...] 
 ---
 Welcome to the NCBI ftp server! The official anonymous access URL is ftp://ftp.ncbi.nih.gov
 
 Public data may be downloaded by logging in as "anonymous" using your E-mail address as a password.
 
 Please see ftp://ftp.ncbi.nih.gov/README.ftp for hints on large file transfers
220 FTP Server ready.
500 AUTH not understood
500 AUTH not understood
KERBEROS_V4 rejected as an authentication type
Name (ftp.ncbi.nlm.nih.gov:susanc): anonymous
331 Anonymous login ok, send your complete email address as your password.
Password:
230 Anonymous access granted, restrictions apply.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd blast/db/
250 CWD command successful
ftp> get wgs.58.tar.gz
local: wgs.58.tar.gz remote: wgs.58.tar.gz
227 Entering Passive Mode (130,14,29,30,195,228)
150 Opening BINARY mode data connection for wgs.58.tar.gz (983101055 bytes)
226 Transfer complete.
983101055 bytes received in 1.3e+02 seconds (7.7e+03 Kbytes/s)
ftp> quit
221 Goodbye.

helix% 
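For scripted rather than interactive downloads, wget can fetch ftp:// URLs directly. Some NCBI directories, such as blast/db, also publish an .md5 checksum file next to each archive, which lets you confirm that a large download is intact. The sketch below assumes such a FILE.md5 exists in the format written by md5sum and that GNU md5sum is available; both function names are hypothetical, and FILE in the example is a placeholder.

```shell
# Hypothetical helpers. Assumes FILE.md5 holds "<md5>  <filename>" as written
# by md5sum, and that md5sum (GNU coreutils) is available.
verify_md5() {
    # Run the check from the file's directory so relative names in FILE.md5 resolve
    ( cd "$(dirname "$1")" && md5sum -c "$(basename "$1").md5" )
}

# Download URL and URL.md5 into the current directory, then verify.
fetch_and_verify() {
    name=${1##*/}
    # -q quiet, -N only re-download if the remote copy is newer
    wget -q -N "$1" "$1.md5" || return 1
    verify_md5 "$name"
}

# Example (FILE is a placeholder):
# fetch_and_verify ftp://ftp.ncbi.nlm.nih.gov/blast/db/FILE.tar.gz
```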
Transfers from the Biowulf compute nodes:
via the proxy server
A proxy server has been set up so that the compute nodes can download data from hosts on the internet. The proxy server handles a limited set of protocols (http, https, rsync, and ftp); any program that honors one of the following environment variables should work through it:
http_proxy
ftp_proxy
RSYNC_PROXY
https_proxy
This includes programs such as wget, curl, lftp, rsync, and git.
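The wget transcript that follows shows the request going to port 3128 on a data transfer node, which suggests the proxy variables are already set on the compute nodes. To check what a particular job actually sees, a small hypothetical helper (not an NIH tool) can print each of the variables listed above:

```shell
# Hypothetical helper: report which proxy variables are set in this shell.
show_proxy_vars() {
    for var in http_proxy https_proxy ftp_proxy RSYNC_PROXY; do
        eval "val=\${$var:-}"     # indirect lookup that also works in plain sh
        if [ -n "$val" ]; then
            echo "$var=$val"
        else
            echo "$var is not set"
        fi
    done
}

# Example: add show_proxy_vars near the top of a batch script
```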
  • wget example:
    [user@cn1875 ~]$ wget http://www.nih.gov
    --2015-10-01 12:47:48--  http://www.nih.gov/
    Resolving dtn02-e0... 10.1.200.238
    Connecting to dtn02-e0|10.1.200.238|:3128... connected.
    Proxy request sent, awaiting response... 200 OK
    Length: unspecified [text/html]
    Saving to: "index.html"
    
        [ <=>                                                                                                                                    ] 38,836      --.-K/s   in 0.002s
    
    2015-10-01 12:47:48 (18.5 MB/s) - "index.html" saved [38836]
    
  • curl example:
    [user@cn1875 ~]$ curl -o nih_homepage.html http://www.nih.gov
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 38836    0 38836    0     0  2547k      0 --:--:-- --:--:-- --:--:-- 2917k
    
    
  • lftp example:
    [user@cn1875 ~ ]$ lftp ftp.redhat.com
    lftp ftp.redhat.com:~> ls
    drwxr-xr-x  --  /                    
    drwxr-xr-x  --  ..
    lrwxrwxrwx            -  2009-12-19 00:00  pub -> .
    drwxr-xr-x            -  2015-03-18 00:00  redhat
    lftp ftp.redhat.com:~> exit 
    
  • rsync example:
    [user@cn1875 ~ ]$ rsync mirror.umd.edu::centos/timestamp.txt $HOME/tmp
    

    Note: rsync from the compute nodes with the SSH protocol is not supported. Only the rsync protocol is supported (notice the double colon in the command above). Therefore, the following will not work:

    [user@cn1875 ~ ]$ rsync server.nih.gov:~/file.txt $HOME/tmp
    

  • git clone example:
    Note that git requests to "git://some/URL" will not work. Due to the protocol limitations on the proxy server, the URL has to be "http://some/URL" or "https://some/URL".
    [user@cn1875 ~ ]$ git clone https://github.com/ncbi/sra-tools.git
    Initialized empty Git repository in /home/user/sra-tools/.git/
    remote: Counting objects: 7447, done.
    remote: Compressing objects: 100% (137/137), done.
    remote: Total 7447 (delta 79), reused 0 (delta 0), pack-reused 7309
    Receiving objects: 100% (7447/7447), 15.81 MiB | 5.73 MiB/s, done.
    Resolving deltas: 100% (4868/4868), done.
    
  • NCBI applications such as SRA-toolkit, NCBI-ngs, ngs-bam, ncbi-vdb, Entrez Direct and related applications such as hisat have been configured to automatically download data from NCBI as necessary. See the application page for details.
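The rsync and git notes above amount to a rule about the form of the remote source. The hypothetical helpers below are a rough sketch of that rule: `proxy_compatible` accepts http(s)://, ftp://, and rsync daemon sources (host::module or rsync://) and rejects git:// URLs and ssh-style host:path sources, while `git_https_url` rewrites a git:// URL to https://. The rewrite works for hosts such as GitHub that serve the same repositories over HTTPS but is not guaranteed elsewhere; git can also do this itself through its url.<base>.insteadOf setting.

```shell
# Hypothetical helpers reflecting the proxy rules above (a rough heuristic).
# proxy_compatible SOURCE: succeed if SOURCE looks usable through the proxy.
proxy_compatible() {
    case "$1" in
        http://*|https://*|ftp://*|rsync://*) return 0 ;;
        git://*)  return 1 ;;   # git protocol is not proxied
        *::*)     return 0 ;;   # rsync daemon syntax, e.g. host::module/path
        *:*)      return 1 ;;   # ssh-style host:path (rsync/scp over SSH)
        *)        return 1 ;;   # not recognized as a remote source
    esac
}

# git_https_url URL: rewrite git://host/path as https://host/path.
git_https_url() {
    case "$1" in
        git://*) echo "https://${1#git://}" ;;
        *)       echo "$1" ;;
    esac
}

# Example: git clone "$(git_https_url git://github.com/ncbi/sra-tools.git)"
```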
Helix Staff Notes and Comments

FTP is inherently insecure because it sends data, and most importantly your password, in plain, unencrypted text. SCP and SFTP use an SSH2-encrypted connection to transfer both data and password information. This security comes at the price of slower transfer rates than FTP.

For users who need FTP transfer rates and are not concerned about data security, we provide access to anonymous FTP on Helix.

The rate of data transfer is only an issue for data amounts greater than 256MB; for smaller amounts, any application will suffice. To optimize transfer rates for large amounts of data, use less demanding encryption ciphers (blowfish and arcfour were once common choices, but current OpenSSH releases no longer support them) and try to transfer the data when the network is less busy (before 10 am and after 6 pm). Also use the most appropriate application based on the table below.
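Since current OpenSSH releases have dropped blowfish and arcfour, one option is to pick, from a preference list, only a cipher your local client supports, as reported by `ssh -Q cipher`. This is a hypothetical sketch: which cipher is fastest depends on your hardware (AES-GCM is often fast on CPUs with AES instructions), the Helix server must accept the chosen cipher too, and the SUPPORTED_CIPHERS variable exists only so the list can be supplied for testing.

```shell
# Hypothetical helper: print the first cipher from the argument list that the
# local ssh client supports, per `ssh -Q cipher` (one cipher name per line).
# SUPPORTED_CIPHERS can be preset to override the ssh query.
pick_cipher() {
    supported=${SUPPORTED_CIPHERS:-$(ssh -Q cipher 2>/dev/null)}
    for c in "$@"; do
        if printf '%s\n' "$supported" | grep -qxF -- "$c"; then
            echo "$c"
            return 0
        fi
    done
    return 1
}

# Example:
# scp -c "$(pick_cipher aes128-gcm@openssh.com aes128-ctr)" bigfile user@helix.nih.gov:/data/user/
```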

Helix staff have compared these applications; the results are summarized below. We recommend Globus for most transfers. scp is the default and best command-line option for Linux/Unix machines.

| Platform | Application | Pros | Cons |
|---|---|---|---|
| All platforms | Globus | Best for very large files (> 256MB). Clients for all platforms, web-based. Notifications sent on completion. | The client must first be installed on the desktop. |
| All platforms | Filezilla v3.0 | Better control over the transfer during the process; fewer and simpler controls than WinSCP; fastest transfer rates by SFTP. | scp not an option. |
| Windows | WinSCP | Much faster transfer rates than PuTTY pscp/psftp, and slightly faster than Filezilla for uploads using scp. | Cumbersome user interface for changing local and remote directories. |
| Windows | pscp/psftp | Direct command-line control over the process. | Must be run through the command prompt; slowest transfer rates seen. |
| Windows | Mapped Network Drive | Convenient. | Fairly slow transfer rates, especially for very large files. |
| Macs | bbcp, scp, sftp | Can be used for scripting & automatic file transfers; fastest transfer rates. | Non-GUI interface. |
| Macs | Fugu | Easy to configure and use. | Slower than command-line tools. |
| Macs | Mapped Network Drive | Convenient drag-and-drop. | Fairly slow transfer rates, especially for large files. |
| Linux/Unix | scp, sftp | Same as for Macs. | Same as for Macs. |
| Linux/Unix | bbcp | Fastest transfer rate. | |