name: inverse layout: true class: center, middle, inverse
---
# Galaxy Monitoring with gxadmin
Authors:
Nate Coraor
Björn Grüning
Simon Gladman
Helena Rasche
last_modification
Updated: Apr 6, 2021
video-slides
View video slides for this lecture
text-document
Plain-text slides
Tip:
press
P
to view the presenter notes
??? Presenter notes contain extra information which might be useful if you intend to use these slides for teaching. Press `P` again to switch presenter notes off Press `C` to create a new window where the same presentation will be displayed. This window is linked to the main window. Changing slides on one will cause the slide to change on the other. Useful when presenting. --- ### <i class="far fa-question-circle" aria-hidden="true"></i><span class="visually-hidden">question</span> Questions - What is gxadmin - What can it do? - How to write a query? --- ### <i class="fas fa-bullseye" aria-hidden="true"></i><span class="visually-hidden">objectives</span> Objectives - Learn gxadmin basics - See some queries and learn how they help debug production issues --- # Database Queries Sometimes it's the best way to get the exact data you need Can be very useful for: - Debugging - Reporting - Analytics But consider: [`gxadmin`](https://github.com/usegalaxy-eu/gxadmin) and sharing your SQL ??? - Admins often need to run database queries. - Debugging the live application in production is difficult, but running queries is easy. - Additionally running SQL queries is commonly used for debuggin and reporting and analytics. - The community collects queries we commonly use in a program called gxadmin. --- # Database Queries "Can you send me the number of jobs per day/state from Main in September and October?" ```sql SELECT date_trunc('month', j.create_time) AS month, j.state, COUNT(j.state) AS job_count FROM job j LEFT OUTER JOIN galaxy_user u ON j.user_id = u.id WHERE u.email != 'monitor@bx.psu.edu' GROUP BY month, j.state HAVING date_trunc('month', j.create_time) >= '2016-09-01' AND date_trunc('month', j.create_time) < '2016-11-01' ORDER BY month, j.state; ``` ??? - We often get requests like how many jobs were run in a time period, and did they finish successfully. - So we write some SQL like this. - As an admin, you'll want to understand SQL. --- # Database Queries ```console month | state | job_count ---------------------+---------+----------- 2016-09-01 00:00:00 | deleted | 11369 2016-09-01 00:00:00 | error | 15375 2016-09-01 00:00:00 | new | 1179 2016-09-01 00:00:00 | ok | 165963 2016-09-01 00:00:00 | paused | 933 2016-09-01 00:00:00 | waiting | 9 2016-10-01 00:00:00 | deleted | 13190 2016-10-01 00:00:00 | error | 12539 2016-10-01 00:00:00 | new | 1183 2016-10-01 00:00:00 | ok | 167547 2016-10-01 00:00:00 | paused | 645 2016-10-01 00:00:00 | queued | 75 2016-10-01 00:00:00 | running | 36 2016-10-01 00:00:00 | waiting | 17 (14 rows) ``` ??? - Here is the output of such a query. - SQL is very powerful for extracting data and graphs for reporting. --- # gxadmin - It's unpleasant to write queries every time you need them. - So the community collects SQL in `gxadmin`. - It's a giant bash script full of SQL and some magic ✨ - [github.com/usegalaxy-eu/gxadmin](https://github.com/usegalaxy-eu/gxadmin) ??? - We often re-run the same queries. - gxadmin makes it easier by storing the useful queries. - everyone is welcome to contribute to it. --- # Database Queries "I need a list of current main toolshed users and a number of their repos" ```console $ gxadmin query ts-repos ``` **OR** ```sql SELECT u.username, COUNT(r.id) AS r_count FROM galaxy_user u JOIN repository r ON u.id = r.user_id WHERE NOT r.deleted GROUP BY u.id ORDER BY r_count DESC LIMIT 12; ``` ??? - However, some of these queries we run over and over. - And those we include in gxadmin. - This way every admin has access to all of the interesting queries we run. - And we don't get repetitive strain injury typing SQL. --- # Database Queries ```console username | r_count ------------------------------------------+--------- iuc | 571 devteam | 366 bgruening | 95 galaxyp | 87 jjohnson | 46 xuebing | 40 peterjc | 39 rnateam | 33 anton | 32 nml | 32 yhoogstrate | 31 iracooke | 30 (12 rows) ``` ??? - Example output from the previous query. --- # Database Queries "What is the status of this job" *Lots* of SQL or: `gxadmin report job-info <job-id>` ```console # Galaxy Job 5132146 Property | Value ------------- | ----- Tool | toolshed.g2.bx.psu.edu/repos/bgruening/canu/canu/1.7 State | running Handler | handler_main_2 Created | 2019-04-20 11:04:40.854975+02 (3 days 03:35:04.881599 ago) Job Runner/ID | condor / 568537 Owner | e08d6c893f5 ## Destination Parameters Key | Value --- | --- description | `canu` priority | `-128` request_cpus | `20` request_memory | `64G` ``` ??? - Another extremely common query is: what is the status of this job? - There is a gxadmin command which aggregates information about jobs. - This can help you debug job failure to run issues. --- # Database Queries "Get me a list of recently registered users" ```console $ gxadmin query latest-users ``` **OR** ```sql SELECT id, create_time AT TIME ZONE 'UTC' as create_time, pg_size_pretty(disk_usage) as disk_usage, username, email, active FROM galaxy_user ORDER BY create_time desc LIMIT 40 ``` ??? - Another common query, who has recently joined our server? - This is an easy command to write, but it is also included in gxadmin. --- # Database Queries ```console id | create_time | disk_usage | username | email | groups | active ------+------------------------+------------+-------------+-------------+--------------+-------- 5581 | 2019-04-23 14:53:58+02 | | xxx | xxx | | t 5580 | 2019-04-23 11:25:03+02 | 1222 MB | xxx | xxx | | t 5579 | 2019-04-23 10:40:30+02 | 841 MB | xxx | xxx | | t 5578 | 2019-04-23 10:31:50+02 | | xxx | xxx | | t 5577 | 2019-04-23 10:27:21+02 | 0 bytes | xxx | xxx | | t 5576 | 2019-04-23 10:07:53+02 | | xxx | xxx | | t 5575 | 2019-04-23 09:04:40+02 | 2011 MB | xxx | xxx | training-gqa | t 5574 | 2019-04-23 09:01:58+02 | 296 MB | xxx | xxx | | t 5573 | 2019-04-23 08:58:42+02 | 605 MB | xxx | xxx | | t 5572 | 2019-04-23 08:58:17+02 | 1762 MB | xxx | xxx | training-gqa | t 5571 | 2019-04-23 08:56:11+02 | 0 bytes | xxx | xxx | | f 5570 | 2019-04-23 08:55:57+02 | 1759 MB | xxx | xxx | training-gqa | t 5569 | 2019-04-23 08:43:21+02 | 1621 MB | xxx | xxx | | t 5568 | 2019-04-23 07:48:38+02 | | xxx | xxx | | t 5567 | 2019-04-23 05:12:43+02 | | xxx | xxx | | t ``` ??? - Example output of the previous query. - With SQL you can choose just the columns or data that is relevant to you. - This may not be so easily available in the interface. --- # Tabular Output Converting SQL query output to tabular and other formats ```SQL COPY (...) to STDOUT with CSV DELIMITER E'\t' ``` **or** ```console $ gxadmin query ... $ gxadmin tsvquery ... $ gxadmin csvquery ... ``` ??? - However to reuse this data in other applications, the postgres format is not convenient. - There is special syntax you can use to output CSV. - gxadmin includes query variants to do this as well. --- # Monitoring Telegraf can consume Influx formatted output, which `gxadmin` can produce: ```console $ gxadmin iquery queue queue,tool_id=ncbi_blastx_wrapper/0.3.1,state=running count=76 queue,tool_id=rna_star/2.6.0b-2,state=queued count=36 queue,tool_id=rna_star/2.6.0b-2,state=running count=15 queue,tool_id=unicycler/0.4.7.0,state=running count=10 queue,tool_id=hisat2/2.1.0+galaxy4,state=running count=9 ``` ??? - For monitoring, many galaxy sites use Influx DB. - gxadmin supports exporting influx compatible format. - This makes it easy to add new data and graphs to your monitoring. --- ### <i class="fas fa-key" aria-hidden="true"></i><span class="visually-hidden">keypoints</span> Key points - gxadmin is a tool to run common database queries useful for Galaxy admins - new queries are welcome and easy to contribute --- ## Thank You! This material is the result of a collaborative work. Thanks to the [Galaxy Training Network](https://training.galaxyproject.org) and all the contributors!
Authors:
Nate Coraor
Björn Grüning
Simon Gladman
Helena Rasche
This material is licensed under the Creative Commons Attribution 4.0 International License
.