Performant Uploads with TUS

Overview

Objectives:
  • Set up TUSd

  • Configure Galaxy to use it to process uploads

Time estimation: 30 minutes
Last modification: Jul 21, 2022
License: Tutorial content is licensed under the Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.

Here you’ll learn to set up TUS, an open-source resumable file upload server, to process uploads for Galaxy. We run it as an external process so that the main Galaxy processes stay free for more important work, and so that periods of heavy uploading do not impact the entire system.

Agenda

  1. TUS and Galaxy
    1. Installing and Configuring
    2. Check it works

Galaxy Admin Training Path

The yearly Galaxy Admin Training follows a specific ordering of tutorials. Use this timeline to help keep track of where you are in Galaxy Admin Training.

  1. Step 1 ansible-galaxy
  2. Step 2 tus
  3. Step 3 cvmfs
  4. Step 4 singularity
  5. Step 5 tool-management
  6. Step 6 data-library
  7. Step 7 connect-to-compute-cluster
  8. Step 8 job-destinations
  9. Step 9 pulsar
  10. Step 10 gxadmin
  11. Step 11 monitoring
  12. Step 12 tiaas
  13. Step 13 reports
  14. Step 14 ftp

TUS and Galaxy

To allow your users to upload via TUS, you will need to:

  • configure Galaxy to know where the files are uploaded.
  • install TUSd
  • configure Nginx to proxy TUSd
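
For context on what TUSd handles on Galaxy's behalf, here is a minimal sketch of the client side of a resumable upload. The header names follow the TUS 1.0 protocol; the helper function names and the `filename` metadata key are illustrative assumptions, not something Galaxy or TUSd defines. The idea is that after an interruption the client asks the server how many bytes it already has, then sends only the rest.

```python
import base64

TUS_VERSION = "1.0.0"  # value of the Tus-Resumable header in TUS 1.0


def encode_metadata(metadata):
    """Encode key/value pairs for the Upload-Metadata header (values are base64)."""
    return ",".join(
        f"{key} {base64.b64encode(value.encode('utf-8')).decode('ascii')}"
        for key, value in metadata.items()
    )


def creation_headers(size, filename):
    """Headers for the POST request that creates a new upload."""
    return {
        "Tus-Resumable": TUS_VERSION,
        "Upload-Length": str(size),
        # "filename" is a commonly used key, assumed here for illustration
        "Upload-Metadata": encode_metadata({"filename": filename}),
    }


def patch_headers(offset):
    """Headers for a PATCH request that appends bytes starting at `offset`."""
    return {
        "Tus-Resumable": TUS_VERSION,
        "Upload-Offset": str(offset),
        "Content-Type": "application/offset+octet-stream",
    }


def remaining_chunks(size, server_offset, chunk_size):
    """Yield (start, end) byte ranges still to be sent after an interruption."""
    start = server_offset
    while start < size:
        end = min(start + chunk_size, size)
        yield start, end
        start = end
```

For example, if a 10-byte upload was interrupted after the server stored 4 bytes, `list(remaining_chunks(10, 4, 4))` gives the two ranges that still need a PATCH request, instead of starting again from byte 0.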

Installing and Configuring

Hands-on: Setting up TUS uploads with Ansible

  1. In your playbook directory, add the galaxyproject.tusd role to your requirements.yml

    --- a/requirements.yml
    +++ b/requirements.yml
    @@ -12,3 +12,5 @@
       version: 0.3.0
     - src: usegalaxy_eu.certbot
       version: 0.1.5
    +- name: galaxyproject.tusd
    +  version: 0.0.1
       
    

    Tip: How to read a Diff

    If you haven’t worked with diffs before, this can be something quite new or different.

    Say we have two versions of a grocery list, stored in two files. We’ll call them ‘old’ and ‘new’.

    Old

    $ cat old
    🍎
    🍐
    🍊
    🍋
    🍒
    🥑

    New

    $ cat new
    🍎
    🍐
    🍊
    🍋
    🍍
    🥑

    We can see that they have some different entries. We’ve removed 🍒 because they’re awful, and replaced them with a 🍍.

    Diff lets us compare these files:

    $ diff old new
    5c5
    < 🍒
    ---
    > 🍍

    Here we see that 🍒 is only in ‘old’, and 🍍 is only in ‘new’. Otherwise the files are identical.

    There are a couple of different diff formats. One of them is the ‘unified diff’:

    $ diff -U2 old new
    --- old 2022-02-16 14:06:19.697132568 +0100
    +++ new 2022-02-16 14:06:36.340962616 +0100
    @@ -3,4 +3,4 @@
     🍊
     🍋
    -🍒
    +🍍
     🥑

    This is basically what you see in the training materials, and it gives you a lot of context about the changes:

    • --- old is the ‘old’ file in our view
    • +++ new is the ‘new’ file
    • @@ these lines tell us where the change occurs and how many lines are added or removed.
    • Lines starting with a - are removed from our ‘new’ file
    • Lines with a + have been added.

    So when you go to apply these diffs to your files in the training:

    1. Ignore the header
    2. Remove lines starting with - from your file
    3. Add lines starting with + to your file

    The other lines (🍊/🍋 and 🥑) above just provide “context”. They help you know where a change belongs in a file, but should not be edited when you’re making the above change. Given the above diff, you would find the line with a 🍒 and replace it with a 🍍.

    Added & Removed Lines

    Removals are very easy to spot, as the diff contains only removed lines:

    --- old	2022-02-16 14:06:19.697132568 +0100
    +++ new 2022-02-16 14:10:14.370722802 +0100
    @@ -4,3 +4,2 @@
     🍋
     🍒
    -🥑

    Additions are just as easy: add the new line between the other lines in your file.

    --- old	2022-02-16 14:06:19.697132568 +0100
    +++ new 2022-02-16 14:11:11.422135393 +0100
    @@ -1,3 +1,4 @@
     🍎
    +🍍
     🍐
     🍊

    Completely new files

    Completely new files look a bit different. There the “old” file is /dev/null, the empty file on a Linux machine.

    $ diff -U2 /dev/null old
    --- /dev/null 2022-02-15 11:47:16.100000270 +0100
    +++ old 2022-02-16 14:06:19.697132568 +0100
    @@ -0,0 +1,6 @@
    +🍎
    +🍐
    +🍊
    +🍋
    +🍒
    +🥑

    Removed files are similar, except that the new file is /dev/null:

    --- old	2022-02-16 14:06:19.697132568 +0100
    +++ /dev/null 2022-02-15 11:47:16.100000270 +0100
    @@ -1,6 +0,0 @@
    -🍎
    -🍐
    -🍊
    -🍋
    -🍒
    -🥑
  2. Install the role with:

    Input: Bash

    ansible-galaxy install -p roles -r requirements.yml
    
  3. Configure it in your group variables

    --- a/group_vars/galaxyservers.yml
    +++ b/group_vars/galaxyservers.yml
    @@ -60,6 +60,8 @@ galaxy_config:
         allow_user_impersonation: true
         # Tool security
         outputs_to_working_directory: true
    +    # TUS
    +    tus_upload_store: /data/tus
       gravity:
         galaxy_root: "{{ galaxy_root }}/server"
         app_server: gunicorn
    @@ -139,3 +141,16 @@ nginx_conf_http:
     nginx_ssl_role: usegalaxy_eu.certbot
     nginx_conf_ssl_certificate: /etc/ssl/certs/fullchain.pem
     nginx_conf_ssl_certificate_key: /etc/ssl/user/privkey-nginx.pem
    +
    +# TUS
    +galaxy_tusd_port: 1080
    +tusd_instances:
    +  - name: main
    +    user: "{{ galaxy_user.name }}"
    +    group: "galaxy"
    +    args:
    +      - "-host=localhost"
    +      - "-port={{ galaxy_tusd_port }}"
    +      - "-upload-dir={{ galaxy_config.galaxy.tus_upload_store }}"
    +      - "-hooks-http=https://{{ inventory_hostname }}/api/upload/hooks"
    +      - "-hooks-http-forward-headers=X-Api-Key,Cookie"
       
    
  4. Next, we proxy the service alongside Galaxy. Because it resides “under” the Galaxy path, clients also send their cookies and authentication headers to TUSd, which can use them while processing the uploads before telling Galaxy when they’re done.

    --- a/templates/nginx/galaxy.j2
    +++ b/templates/nginx/galaxy.j2
    @@ -28,6 +28,22 @@ server {
             proxy_set_header Upgrade $http_upgrade;
         }
        
    +    location /api/upload/resumable_upload {
    +        # Disable request and response buffering
    +        proxy_request_buffering     off;
    +        proxy_buffering             off;
    +        proxy_http_version          1.1;
    +
    +        # Add X-Forwarded-* headers
    +        proxy_set_header X-Forwarded-Host   $host;
    +        proxy_set_header X-Forwarded-Proto  $scheme;
    +
    +        proxy_set_header Upgrade            $http_upgrade;
    +        proxy_set_header Connection         "upgrade";
    +        client_max_body_size        0;
    +        proxy_pass http://localhost:{{ galaxy_tusd_port }}/files;
    +    }
    +
         # Static files can be more efficiently served by Nginx. Why send the
         # request to Gunicorn which should be spending its time doing more useful
         # things like serving Galaxy!
       
    
  5. Add the role to the end of your Galaxy playbook

    --- a/galaxy.yml
    +++ b/galaxy.yml
    @@ -19,3 +19,4 @@
           become: true
           become_user: "{{ galaxy_user.name }}"
         - galaxyproject.nginx
    +    - galaxyproject.tusd
       
    
  6. Run the playbook

    Input: Bash

    ansible-playbook galaxy.yml
    

Congratulations, you’ve set up TUS for Galaxy.
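
To make the proxy configuration in step 4 more concrete, the path rewriting done by that location block can be sketched as follows. This is only an illustration of nginx's documented behaviour when proxy_pass includes a URI (the part of the request path matching the location is replaced by that URI). The function and constant names are hypothetical; the values are taken from the diffs above.

```python
LOCATION_PREFIX = "/api/upload/resumable_upload"  # nginx `location` in templates/nginx/galaxy.j2
TUSD_PORT = 1080                                  # galaxy_tusd_port in the group variables
UPSTREAM_PATH = "/files"                          # URI part of the proxy_pass directive


def upstream_url(request_path):
    """Return the TUSd URL that nginx forwards `request_path` to.

    Mirrors nginx's rule for a proxy_pass with a URI: the part of the request
    path that matches the location prefix is replaced by that URI.
    """
    if not request_path.startswith(LOCATION_PREFIX):
        raise ValueError(f"{request_path!r} is not handled by this location block")
    rest = request_path[len(LOCATION_PREFIX):]
    return f"http://localhost:{TUSD_PORT}{UPSTREAM_PATH}{rest}"
```

So a request for `/api/upload/resumable_upload/abc123` on the Galaxy host reaches TUSd as `/files/abc123` on localhost, while every other Galaxy path continues to be handled by the existing location blocks.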

Check it works

Hands-on: Check that it works

  1. SSH into your machine

  2. Check that tusd is active by running systemctl status tusd-main.

  3. Upload a small file! (Pasted text will not pass via TUS.)

  4. Check that the directory /data/tus/ has been created, and inspect its contents

    Input: Bash

    sudo tree /data/tus/
    
  5. You’ll see two kinds of files in that directory: the file that’s been uploaded, and an ‘info’ file which contains metadata about the upload.
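
If you would rather inspect those files programmatically, the following is a minimal sketch. It assumes the layout of TUSd's file store as commonly observed: a data file named after the upload ID next to a JSON `<id>.info` file containing keys such as "ID", "Size" and "MetaData". Those key names and the `summarize_uploads` helper are assumptions for illustration, so check them against the files on your own server.

```python
import json
from pathlib import Path


def summarize_uploads(store):
    """Summarize each upload found in a TUSd upload directory.

    Assumes a data file named `<id>` next to a JSON file named `<id>.info`.
    """
    summaries = []
    for info_path in sorted(Path(store).glob("*.info")):
        info = json.loads(info_path.read_text())
        data_path = info_path.with_suffix("")  # strip ".info" to get the data file
        received = data_path.stat().st_size if data_path.exists() else 0
        summaries.append({
            "id": info.get("ID", data_path.name),
            "expected": info.get("Size"),      # size announced by the client
            "received": received,              # bytes actually on disk
            "complete": info.get("Size") == received,
            "metadata": info.get("MetaData", {}),
        })
    return summaries
```

Calling `summarize_uploads("/data/tus")` as a user with read access to that directory returns one dictionary per upload. Comparing the announced size with the bytes on disk is a quick way to spot uploads that were interrupted and not yet resumed.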

Got lost along the way?

If you missed any steps, you can compare against the reference files, or see what changed since the previous tutorial.

If you’re using git to track your progress, remember to add your changes and commit with a good commit message!

Key points

  • Use TUS to make uploads more efficient, especially for large uploads over unstable connections.

Frequently Asked Questions

Have questions about this tutorial? Check out the FAQ page for the Galaxy Server administration topic to see if your question is listed there. If not, please ask your question on the GTN Gitter Channel or the Galaxy Help Forum.

Citing this Tutorial

  1. Marius van den Beek, Helena Rasche, Lucille Delisle, 2022 Performant Uploads with TUS (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/admin/tutorials/tus/tutorial.html Online; accessed TODAY
  2. Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

BibTeX

@misc{admin-tus,
author = "Marius van den Beek and Helena Rasche and Lucille Delisle",
title = "Performant Uploads with TUS (Galaxy Training Materials)",
year = "2022",
month = "07",
day = "21",
url = "\url{https://training.galaxyproject.org/training-material/topics/admin/tutorials/tus/tutorial.html}",
note = "[Online; accessed TODAY]"
}
@article{Batut_2018,
    doi = {10.1016/j.cels.2018.05.012},
    url = {https://doi.org/10.1016%2Fj.cels.2018.05.012},
    year = 2018,
    month = {jun},
    publisher = {Elsevier {BV}},
    volume = {6},
    number = {6},
    pages = {752--758.e1},
    author = {B{\'{e}}r{\'{e}}nice Batut and Saskia Hiltemann and Andrea Bagnacani and Dannon Baker and Vivek Bhardwaj and Clemens Blank and Anthony Bretaudeau and Loraine Brillet-Gu{\'{e}}guen and Martin {\v{C}}ech and John Chilton and Dave Clements and Olivia Doppelt-Azeroual and Anika Erxleben and Mallory Ann Freeberg and Simon Gladman and Youri Hoogstrate and Hans-Rudolf Hotz and Torsten Houwaart and Pratik Jagtap and Delphine Larivi{\`{e}}re and Gildas Le Corguill{\'{e}} and Thomas Manke and Fabien Mareuil and Fidel Ram{\'{\i}}rez and Devon Ryan and Florian Christoph Sigloch and Nicola Soranzo and Joachim Wolff and Pavankumar Videm and Markus Wolfien and Aisanjiang Wubuli and Dilmurat Yusuf and James Taylor and Rolf Backofen and Anton Nekrutenko and Björn Grüning},
    title = {Community-Driven Data Analysis Training for Biology},
    journal = {Cell Systems}
}
                

Congratulations on successfully completing this tutorial!