repocli: cross-platform CLI for managing the Radboud Data Repository data

A command-line tool for performing basic operations on the data content (not the metadata) of the Radboud Data Repository collections. In essence, it implements these operations using the WebDAV protocol; it is therefore also a generic tool for managing data accessible via WebDAV with HTTP basic authentication (e.g. SURFDrive).

The following operations are currently implemented:

  • ls: list a directory
  • mkdir: create a new directory
  • cp: copy a file or a directory
  • mv: rename a file or a directory
  • rm: remove a file or a directory
  • get: download a file or a directory
  • put: upload a file or a directory
  • mget: download multiple files or directories
  • mput: upload multiple files or directories

When performing a recursive operation on a directory, the tool walks through the directory tree and applies the operation to individual files in parallel. This approach breaks a lengthy bulk-operation request into multiple shorter, less resource-demanding requests, which helps improve the overall success rate of the operation.

Download

The repocli tool is provided as a single binary file, which can be downloaded from here.

Download the asset file repocli for Linux, repocli.darwin for Intel-based MacOSX and repocli.exe for Windows.

You can place the file in any directory as long as the directory is part of the $PATH variable (or %PATH% for Windows).

For Linux and MacOSX users, you also need to make the downloaded file executable, e.g.

$ chmod +x repocli
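
The installation steps above can be sketched as a small shell function. The function name install_repocli and the default target directory $HOME/.local/bin are illustrative assumptions; any directory that is part of $PATH works.

```shell
# Install a downloaded repocli binary into a directory on $PATH.
# Assumption: "$HOME/.local/bin" is only an example default.
install_repocli() {
    src=$1                               # path of the downloaded binary
    dest_dir=${2:-"$HOME/.local/bin"}    # target directory
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/repocli" || return 1
    chmod +x "$dest_dir/repocli" || return 1
    # Remind the user if the target directory is not on $PATH yet.
    case ":$PATH:" in
        *":$dest_dir:"*) ;;
        *) echo "note: add $dest_dir to PATH" >&2 ;;
    esac
}

# Usage (not run here):
#   install_repocli ./repocli.darwin
```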

Usage

A CLI for managing data content of the Donders Repository collections.

Usage:
  repocli [command]

Available Commands:
  completion  Generate the autocompletion script for the specified shell
  config      configure the repository connection and save the credential
  cp          copy file or directory in the repository
  get         download file or directory from the repository
  help        Help about any command
  ls          list file or directory in the repository
  mget        download multiple files or directories from the repository
  mkdir       create new directory in the repository
  mput        upload multiple files or directories to the repository
  mv          move file or directory in the repository
  put         upload file or directory to the repository
  rm          remove file or directory from the repository
  shell       start an interactive shell
  version     print version number and exit

Flags:
  -c, --config path       path of the configuration YAML file. (default "/home/tg/honlee/.repocli.yml")
  -h, --help              help for repocli
  -n, --nthreads number   number of concurrent worker threads. (default 4)
  -s, --silent            set to slient mode (i.e. do not show progress)
  -u, --url URL           URL of the webdav server.
  -v, --verbose           verbose output

Use "repocli [command] --help" for more information about a command.
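
The global flags listed above can be combined with a subcommand. The following is a minimal sketch, assuming that the global flags are accepted before the subcommand name; the function name quiet_parallel_get and the value 8 are illustrative.

```shell
# Download a remote directory with 8 worker threads and no progress output.
# -n and -s are the global flags from the usage text above.
quiet_parallel_get() {
    remote=$1    # source path in the repository
    local_dir=$2 # destination path on the local machine
    repocli -n 8 -s get "$remote" "$local_dir"
}

# Usage (not run here):
#   quiet_parallel_get /dccn/DAC_3010000.01_173/raw/ ./raw
```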

The configuration file

The credential (username and password) of the data-access account should be provided in a configuration file (specified by the -c option) in the YAML format. The default location of this configuration file is ${HOME}/.repocli.yml on Linux/MacOSX and C:\Users\<username>\.repocli.yml on Windows. Because the program expects the password stored in the configuration file to be encrypted, generate (or overwrite) the file with the following command rather than editing it by hand:

$ repocli config

You will be asked to provide the WebDAV baseURL, username and password. For Radboud Data Repository users, the baseURL is https://webdav.data.ru.nl. The username and password are your data-access account credential, which can be retrieved from the RDR web portal. See the screenshot below as an example:

screenshot of RDR data-access credential

After providing those values, type y to save the credential to the configuration file. Once it is done successfully, you can reuse the configuration file in the future to connect to the same WebDAV endpoint.

💡You can use the config subcommand with the -c option to create multiple configuration files, each for a different WebDAV endpoint.

❗The password in the configuration file is encrypted with the signatures of the file path and the username. Changes to these signatures (e.g. renaming the configuration file) will make the password invalid.
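
Working with several endpoints can be sketched with a thin wrapper that selects a configuration file through the -c option. The helper name repocli_for and the file naming scheme ~/.repocli-<name>.yml are assumptions made for illustration; they are not part of repocli.

```shell
# Run repocli against a named endpoint, one configuration file per endpoint.
# Assumption: files are named ~/.repocli-<name>.yml (hypothetical convention).
repocli_for() {
    name=$1
    shift
    repocli -c "$HOME/.repocli-$name.yml" "$@"
}

# Usage (not run here):
#   repocli_for rdr config                        # create the file interactively
#   repocli_for rdr ls /dccn/DAC_3010000.01_173   # reuse it later
```

Because the encrypted password depends on the file path, create each file at its final location instead of moving it afterwards.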

The shell mode

In addition to running the program's subcommands as individual shell commands (single-command mode), the CLI can also be used as an interactive shell (shell mode). Use the shell command to enter the shell mode:

$ repocli shell

The CLI's specific prompt > repocli will be displayed as in the screenshot below, waiting for further commands from the user.

screenshot of the shell mode

In the shell mode, the following additional operations are enabled:

  • cd: change the present working directory in the repository
  • pwd: show the present working directory in the repository
  • lcd: change the present working directory on the local machine
  • lpwd: show the present working directory on the local machine
  • lls: list the content of the present working directory on the local machine

The following sections show examples of how to use the various subcommands. You can find more detailed and up-to-date usage information via the help subcommand. For example, the help of the get subcommand can be displayed with:

$ repocli help get

listing a directory

Given a collection with identifier di.dccn.DAC_3010000.01_173, the WebDAV directory in which the collection data is stored is /dccn/DAC_3010000.01_173. To list the content of this WebDAV directory, one does

$ repocli ls -l /dccn/DAC_3010000.01_173
/dccn/DAC_3010000.01_173:
 drwxrwxr-x            0 /dccn/DAC_3010000.01_173/Cropped
 drwxrwxr-x            0 /dccn/DAC_3010000.01_173/raw
 drwxrwxr-x            0 /dccn/DAC_3010000.01_173/test1
 drwxrwxr-x            0 /dccn/DAC_3010000.01_173/test2021
 drwxrwxr-x            0 /dccn/DAC_3010000.01_173/test3
 drwxrwxr-x            0 /dccn/DAC_3010000.01_173/test_loc.new
 drwxrwxr-x            0 /dccn/DAC_3010000.01_173/test_sync
 drwxrwxr-x            0 /dccn/DAC_3010000.01_173/testx
 drwxrwxr-x            0 /dccn/DAC_3010000.01_173/xyz.5
 drwxrwxr-x            0 /dccn/DAC_3010000.01_173/xyz.x
 -rw-rw-r--          203 /dccn/DAC_3010000.01_173/MANIFEST.txt.1
 -rw-rw-r--       191503 /dccn/DAC_3010000.01_173/MD5E-s191503--8661ce04ccbbf51e96ce124e30fc0c8c.txt
 -rw-rw-r--     49152352 /dccn/DAC_3010000.01_173/MP2RAGE.nii
 -rw-rw-r--         2589 /dccn/DAC_3010000.01_173/Makefile
...

removing a file or directory

Assuming that we want to remove the file MANIFEST.txt.1 from the collection content listed above, we do

$ repocli rm /dccn/DAC_3010000.01_173/MANIFEST.txt.1

If we want to remove the entire sub-directory testx, we use the command

$ repocli rm -r /dccn/DAC_3010000.01_173/testx

where the extra flag -r indicates recursive removal.

creating a directory

To create a subdirectory demo in the collection, we do

$ repocli mkdir /dccn/DAC_3010000.01_173/demo

One can also create a directory tree with the same command; any missing parent directories will also be created (similar to the mkdir -p command on Linux). For example, if we want to create a directory tree demo1/data/sub-001/ses-mri01, we do

$ repocli mkdir /dccn/DAC_3010000.01_173/demo1/data/sub-001/ses-mri01

This works whether or not the parent tree structure demo1/data/sub-001 already exists.

uploading/downloading a single file

For uploading/downloading a single file to/from the collection in the repository, one uses the put and get sub-commands, respectively. Both sub-commands require two arguments: the first refers to the source path, the second to the destination path.

The local path can be in any format recognized by the shell. For the WebDAV path, either the absolute form (i.e. starting with /) or the relative form (i.e. starting with ./ or ../) can be used. The relative form makes more sense in the shell mode (i.e. repocli shell, see above), where one can change the current WebDAV directory using the cd command. Outside the shell mode, the current WebDAV working directory is always the one defined by the configuration variable baseURL.

For example, to upload a local file test.txt in the present working directory to /dccn/DAC_3010000.01_173/demo/test.txt, one does

$ repocli put ./test.txt /dccn/DAC_3010000.01_173/demo/test.txt

To download a remote file /dccn/DAC_3010000.01_173/demo/test.txt to test.txt.new in the local home directory (referred to by the $HOME variable), one does

$ repocli get /dccn/DAC_3010000.01_173/demo/test.txt $HOME/test.txt.new

If the destination is a directory, the file will be downloaded/uploaded into that directory under the same name. If the destination is an existing file, the file will be skipped by default. One can use the -f option to overwrite the existing file.

recursive uploading/downloading a directory

Assume that we have a local directory /project/3010000.01/demo and want to upload its content recursively to the collection under the sub-directory demo. We use the command below:

$ repocli put /project/3010000.01/demo/ /dccn/DAC_3010000.01_173/demo

where the first argument to put is a local directory (the source), and the second is a directory in the repository (the destination).

For downloading a directory from the repository, one does

$ repocli get /dccn/DAC_3010000.01_173/demo/ /project/3010000.01/demo.new

where the first argument is a directory in the repository (the source), and the second is a local directory (the destination).

Note: As with the rsync command, the trailing / in the source instructs the tool to copy the content of the source directory into the destination. If the trailing / is left out, the tool copies the directory by name into the destination, so the content ends up in a (new) sub-directory of the destination.
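
The trailing-slash rule can be illustrated with a small shell function that predicts where the content of the source directory ends up. It is a sketch of the rule as described in this section, not code taken from repocli; the name effective_dest is hypothetical.

```shell
# Print the directory that will receive the content of the source directory.
effective_dest() {
    src=$1
    dst=${2%/}    # ignore a single trailing slash on the destination
    case "$src" in
        # Trailing slash: the content goes directly into the destination.
        */) printf '%s\n' "$dst" ;;
        # No trailing slash: the directory is copied by name.
        *)  printf '%s/%s\n' "$dst" "$(basename "$src")" ;;
    esac
}

effective_dest /project/3010000.01/demo/ /dccn/DAC_3010000.01_173/demo
# prints /dccn/DAC_3010000.01_173/demo
effective_dest /project/3010000.01/demo /dccn/DAC_3010000.01_173/demo
# prints /dccn/DAC_3010000.01_173/demo/demo
```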

moving (i.e. renaming) a file or a directory

For renaming a file within a collection, one uses the mv sub-command. This sub-command also takes two arguments, the source and the destination.

For example, if we want to rename a file /dccn/DAC_3010000.01_173/test.txt to /dccn/DAC_3010000.01_173/test.txt.old in the repository, we do

$ repocli mv /dccn/DAC_3010000.01_173/test.txt /dccn/DAC_3010000.01_173/test.txt.old

We can also rename an entire directory. For example, if we want to rename /dccn/DAC_3010000.01_173/demo to /dccn/DAC_3010000.01_173/demo.new, we use the command below (note the trailing / of the source for "moving the content over"):

$ repocli mv /dccn/DAC_3010000.01_173/demo/ /dccn/DAC_3010000.01_173/demo.new

Moving the source directory into the destination directory can be achieved by leaving the trailing / out of the source directory. Taking the example above, if the trailing / is omitted, e.g.

$ repocli mv /dccn/DAC_3010000.01_173/demo /dccn/DAC_3010000.01_173/demo.new

the end result will be a new directory /dccn/DAC_3010000.01_173/demo.new/demo into which the data of the source directory is moved.

Error handling

When performing an operation on a large number of files, temporary (server or network) issues can cause errors on a few files. The errors are written to the terminal; in addition, one can use the -e {filename} option of repocli to save the errors to a text file {filename}. This text file can be used to simplify the process of patching the operation. The option is currently available for the get, put, mget and mput operations.

Since version 0.5.0, repocli also supports retrying failed file uploads and downloads. This retry feature is disabled by default and can be enabled for the put, get, mput and mget operations with the -r N option, where N is the maximum number of retries (i.e. N+1 attempts in total).
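
The two options can be combined in one call. The sketch below assumes that -r and -e are given after the subcommand name (check repocli help put for the exact placement in your version); the function name upload_with_retry, the retry count 3 and the default file name put-errors.txt are illustrative.

```shell
# Upload with up to 3 retries per file and save remaining errors to a file.
# Assumption: -r N and -e {filename} follow the subcommand name.
upload_with_retry() {
    src=$1                        # local source path
    dst=$2                        # destination path in the repository
    errlog=${3:-put-errors.txt}   # file that collects the errors
    repocli put -r 3 -e "$errlog" "$src" "$dst"
}

# Usage (not run here):
#   upload_with_retry /project/3010000.01/demo/ /dccn/DAC_3010000.01_173/demo
```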

Calling repocli from scripts

Since repocli is a standalone executable, it can be used within a shell script or by making a system call. Some examples:

  • size.sh gets the total size and number of files in a remote (WebDAV) directory.
  • download_n_process.sh downloads the MR data from the Donders Repository and then processes it locally.
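
As a further illustration of scripting around repocli, the sketch below counts the regular files and sums their sizes in the output of repocli ls -l for a single directory. It is not the size.sh script mentioned above; it assumes the listing format shown in the "listing a directory" section (permission string, size, path), and the function name summarize_listing is hypothetical.

```shell
# Read `repocli ls -l` output on stdin and report the number of regular
# files and their total size. Directory lines (starting with "d") and
# the header line are ignored.
summarize_listing() {
    awk '$1 ~ /^-/ { n++; total += $2 }
         END { printf "%d files, %d bytes\n", n, total }'
}

# Usage (not run here):
#   repocli ls -l /dccn/DAC_3010000.01_173 | summarize_listing
```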