name: inverse
layout: true
class: center, middle, inverse
---
# Running Jobs on Remote Resources with Pulsar
Authors:
Nate Coraor
Simon Gladman
Marius van den Beek
Helena Rasche
Updated: Apr 6, 2021
Tip: press `P` to view the presenter notes
???

Presenter notes contain extra information which might be useful if you intend to use these slides for teaching.

Press `P` again to switch presenter notes off.

Press `C` to create a new window where the same presentation will be displayed. This window is linked to the main window. Changing slides on one will cause the slide to change on the other.

Useful when presenting.

---

## Requirements

Before diving into this slide deck, we recommend you have a look at:

- [Galaxy Server administration](/training-material/topics/admin)
    - Ansible: [slides](/training-material/topics/admin/tutorials/ansible/slides.html) - [hands-on](/training-material/topics/admin/tutorials/ansible/tutorial.html)
    - Galaxy Installation with Ansible: [slides](/training-material/topics/admin/tutorials/ansible-galaxy/slides.html) - [hands-on](/training-material/topics/admin/tutorials/ansible-galaxy/tutorial.html)
- A server/VM on which to deploy Pulsar

---

### <i class="far fa-question-circle" aria-hidden="true"></i><span class="visually-hidden">question</span> Questions

- How does Pulsar work?
- How can I deploy it?

---

### <i class="fas fa-bullseye" aria-hidden="true"></i><span class="visually-hidden">objectives</span> Objectives

- Have an understanding of what Pulsar is and how it works
- Install and configure a Pulsar server on a remote Linux machine
- Be able to get Galaxy to send jobs to a remote Pulsar server

---

# What are heterogeneous compute resources?

Differences in:

- Operating system or version
- Users/groups
- Data accessibility
- Administrative control
- Physical location (e.g. different cities)

Galaxy expects:

- One OS, version (dependencies)
- Shared filesystem w/ fixed paths

---

# Example - Australia

![australia_locations.png](../../images/australia_locations.png)

---

# Partial solution - CLI job runner

SSH to remote, submit jobs with CLI `sbatch`, `qsub`, etc.

Still depends on shared FS

---

# Pulsar

![pulsar_logo.png](../../images/pulsar_logo.png)

Galaxy's remote job management system

* Can run jobs on any(?) OS including Windows
* Multiple modes of operation for every environment

---

# Pulsar - Architecture

* Pulsar server runs on the remote resource (e.g.
cluster head node)
* Galaxy Pulsar job runner is the Pulsar client
* Communication is via HTTP or AMQP, language is JSON
* File transport is dependent on communication method

---

# Pulsar - Architecture

![pulsar_schematic.png](../../images/pulsar_schematic.png)

---

# Pulsar Transports - RESTful

Pulsar server listens over HTTP(S)

Pulsar client (Galaxy) initiates connections to Pulsar server

Good for:

- Environments where firewalls and open ports are not a concern
- Avoiding external dependencies (no AMQP server needed)

---

# Pulsar Transports - AMQP

Pulsar server and client connect to AMQP server

Good for:

- Firewalled/NATted remote compute
- Networks w/ bad connectivity

---

# Pulsar Transports - Embedded

Galaxy runs Pulsar server internally

Good for:

- Manipulating paths
- Copying input datasets from non-shared filesystem

---

# Pulsar - Job file staging

Pulsar can be configured to *push* or *pull* when using RESTful:

- Push
    - Galaxy sends job inputs, metadata to Pulsar over HTTP
    - Upon completion signal from Pulsar, Galaxy pulls from Pulsar over HTTP
- Pull
    - Upon setup signal, Pulsar pulls job inputs, metadata from Galaxy over HTTP
    - Upon completion, Pulsar pushes to Galaxy over HTTP

Pulsar can use libcurl for more robust transfers with resume capability

AMQP is pull-only because Pulsar does not run an HTTP server in that mode

---

# Pulsar - Dependency management

Pulsar does not provide Tool Shed tool dependency management. But:

- It has a similar dependency resolver config to Galaxy
- It can auto-install **conda** dependencies
- It can use containers too!

---

# Pulsar - Job management

Pulsar "managers" provide job running interfaces:

- `queued_python`: Run locally on the Pulsar server
- `queued_drmaa`: Run on a cluster with DRMAA
- `queued_cli`: Run on a cluster with local `qsub`, `sbatch`, etc.
- `queued_condor`: Run on HTCondor

---

# Pulsar Australia

![pulsar_australia.png](../../images/pulsar_australia.png)

---

# Resources

* Pulsar Read the Docs
    * [https://pulsar.readthedocs.io/en/latest/index.html](https://pulsar.readthedocs.io/en/latest/index.html)
* Pulsar on galaxyproject.org
    * [https://galaxyproject.org/admin/config/pulsar/](https://galaxyproject.org/admin/config/pulsar/)
* Pulsar GitHub
    * [https://github.com/galaxyproject/pulsar](https://github.com/galaxyproject/pulsar)
* Pulsar Ansible
    * [https://github.com/galaxyproject/ansible-pulsar](https://github.com/galaxyproject/ansible-pulsar)

---

### <i class="fas fa-key" aria-hidden="true"></i><span class="visually-hidden">keypoints</span> Key points

- Pulsar allows you to easily add geographically distributed compute resources to your Galaxy instance
- It also works well in situations where the compute resources cannot share storage pools

---

## Thank You!

This material is the result of collaborative work. Thanks to the [Galaxy Training Network](https://training.galaxyproject.org) and all the contributors!
This material is licensed under the Creative Commons Attribution 4.0 International License.