Interacting with GridFTP - The client globus-url-copy

Riccardo Zappi
INFN-CNAF, National Center of INFN (National Institute for Nuclear Physics) for
Research and Development in the field of Information Technologies
SRM School, 2009
CNAF, Bologna, Italy
R.Zappi
Data Management
INFN CNAF
Outline
1 GridFTP Service
    GridFTP Service
    GridFTP Client
2 Conclusion
What is GridFTP?
Definition
GridFTP - File Transfer Protocol in Grid Computing Networks. It is a
high-performance, secure, reliable data transfer protocol optimized for
high-bandwidth wide-area networks.
GridFTP: The Protocol
It extends the FTP standard with:
Strong authentication and encryption via GSI on both the control (command) and data channels
Third-party transfers: a client C can initiate a transfer from server A to server B
Multiple data channels for parallel transfers
Striped data transfers
Partial file transfers
Tunable network and I/O parameters
GridFTP: The Protocol definition and GridFTP enabled servers
GFD.47, GridFTP v2 Protocol Description
GFD.20, GridFTP: Protocol Extensions to FTP for the Grid
Multiple independent implementations can interoperate
The Globus Toolkit supplies a reference implementation of:
  Server, client tools (globus-url-copy) and development libraries
Another client, developed and supported at NCSA:
  UberFTP
GridFTP channels
Two-channel protocol, like FTP
Control Channel
  Communication link (TCP) over which commands and responses flow
  Low bandwidth; encrypted and integrity protected by default
Data Channel
  Communication link(s) over which the actual data of interest flows
  High bandwidth; authenticated by default; encryption and integrity protection optional
Using GridFTP: globus-url-copy command line client
Definition
globus-url-copy is the command to transfer a file between sites
using GridFTP. It is not an interactive command.
$ globus-url-copy <source> <destination>
where <source> and <destination> each take one of these forms:
  local file: file:<full path>
  remote file: gsiftp://<hostname>/<full path>
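As a minimal sketch of the two URL forms above, the following shell helpers assemble a local and a remote URL. The function names local_url and remote_url and the host gridftp.example.org are hypothetical, chosen only for illustration; they are not part of the Globus Toolkit. Port 2811 is assumed as the default because it is the port used in the examples in this deck.

```shell
# Hypothetical helpers, not part of the Globus Toolkit.

# local_url <path>: print a file: URL; a relative path is made absolute first.
local_url() {
    case "$1" in
        /*) printf 'file:%s\n' "$1" ;;
        *)  printf 'file:%s/%s\n' "$(pwd)" "$1" ;;
    esac
}

# remote_url <hostname> <full path> [port]: print a gsiftp:// URL.
# The port defaults to 2811 (an assumption taken from the deck's examples).
remote_url() {
    printf 'gsiftp://%s:%s%s\n' "$1" "${3:-2811}" "$2"
}

local_url /tmp/ciccio.txt
remote_url gridftp.example.org /tmp/ciccio.txt
```

The two printed URLs can then be passed to globus-url-copy as <source> and <destination>.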
Using GridFTP: usage pattern
Getting a file: copy a file from a server to the local machine
$ globus-url-copy gsiftp://<source> file:/<dest>
Putting a file: copy a file from the local machine to a server
$ globus-url-copy file:/<source> gsiftp://<dest>
Third-party transfer: copy a file directly between two GridFTP servers
$ globus-url-copy gsiftp://<source> gsiftp://<dest>
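The three patterns above differ only in the URL schemes of the two arguments. The sketch below makes that explicit; transfer_kind is a hypothetical helper written for this illustration, and the host names are placeholders.

```shell
# Hypothetical helper: classify a transfer by the schemes of its two URLs.
# transfer_kind <source URL> <destination URL>
transfer_kind() {
    case "$1,$2" in
        gsiftp://*,file:*)     echo "get" ;;          # server to local machine
        file:*,gsiftp://*)     echo "put" ;;          # local machine to server
        gsiftp://*,gsiftp://*) echo "third-party" ;;  # server to server
        *)                     echo "unknown" ;;
    esac
}

transfer_kind gsiftp://a.example.org/tmp/f file:/tmp/f
transfer_kind file:/tmp/f gsiftp://b.example.org/tmp/f
transfer_kind gsiftp://a.example.org/tmp/f gsiftp://b.example.org/tmp/f
```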
Using GridFTP: examples (1/3)
Server to local:
$ globus-url-copy -vb \
    gsiftp://<source-host>:2811/tmp/ciccio.txt \
    file:`pwd`/ciccio.txt
Local to server:
$ globus-url-copy -vb \
    file:`pwd`/ciccio.txt \
    gsiftp://<dest-host>:2811/tmp/ciccio.txt
Remote server A to remote server B:
$ globus-url-copy -vb \
    gsiftp://<source-host>:2811/tmp/ciccio.txt \
    gsiftp://<dest-host>:2811/tmp/ciccio.txt
globus-url-copy command line client
Some parameters:
-vb specifies verbose mode and displays:
  number of bytes transferred,
  performance since the last update (currently every 5 seconds), and
  average performance for the whole transfer.
-p specifies the number of parallel data connections that should be used.
  This is one of the most commonly used options.
-tcp-bs specifies the size (in bytes) of the TCP buffer to be used by the underlying FTP data channels.
  This is critical to good performance over the WAN.
-help prints help. :)
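The slides do not say how to choose a value for -tcp-bs. A common rule of thumb, assumed here rather than taken from the deck, is to size the buffer to the bandwidth-delay product of the path: bandwidth multiplied by round-trip time. The helper bdp_bytes below is a hypothetical sketch of that arithmetic, with bandwidth in Mbit/s and round-trip time in milliseconds.

```shell
# Hypothetical helper: bandwidth-delay product in bytes.
# bdp_bytes <bandwidth in Mbit/s> <round-trip time in ms>
bdp_bytes() {
    # 1 Mbit/s = 125000 bytes/s and 1 ms = 1/1000 s,
    # so bytes = Mbit/s * ms * 125.
    echo $(( $1 * $2 * 125 ))
}

# Example: a 100 Mbit/s path with an 80 ms round-trip time.
bdp_bytes 100 80
```

The resulting number is only a starting point for -tcp-bs; the operating system may cap the buffer size actually granted.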
Using GridFTP: examples (2/3)
By default, globus-url-copy uses a single data channel:
$ globus-url-copy -vb \
    gsiftp://<source-host>:2811/bigfile \
    file:`pwd`/bigfile
Use 4 parallel streams:
$ globus-url-copy -vb -p 4 \
    gsiftp://<source-host>:2811/bigfile \
    file:`pwd`/bigfile
Using GridFTP: examples (3/3)
Increase the TCP window (buffer) size:
$ globus-url-copy -vb -p 4 -tcp-bs 1048576 \
    gsiftp://<source-host>:2811/bigfile \
    file:`pwd`/bigfile
Still faster, using a larger memory buffer (-bs) as well:
$ globus-url-copy -vb -bs 1048576 -tcp-bs 1048576 \
    gsiftp://<source-host>:2811/bigfile \
    file:`pwd`/bigfile
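To compare settings like those in the examples, it can help to generate the command lines first and run them one at a time. The sketch below only prints commands and does not start any transfer; print_copy_cmds is a hypothetical helper, gridftp.example.org is a placeholder host, and only the -vb, -p and -tcp-bs options described earlier are used.

```shell
# Hypothetical dry-run helper: print one globus-url-copy command per stream count.
# print_copy_cmds <source URL> <destination URL> <TCP buffer bytes> <streams>...
print_copy_cmds() {
    src=$1
    dst=$2
    tcp_bs=$3
    shift 3
    for p in "$@"; do
        printf 'globus-url-copy -vb -p %s -tcp-bs %s %s %s\n' \
            "$p" "$tcp_bs" "$src" "$dst"
    done
}

print_copy_cmds gsiftp://gridftp.example.org:2811/bigfile file:/tmp/bigfile 1048576 1 2 4
```

Each printed line can be run by hand, and the average performance reported by -vb compared across stream counts.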
Conclusion
Questions?