name: inverse layout: true class: center, middle, inverse
---
# Galactic Database
Authors:
Martin Čech
Nicola Soranzo
Nate Coraor
Helena Rasche
last_modification
Updated: Jun 14, 2022
video-slides
View video slides for this lecture
text-document
Plain-text slides
Tip:
press
P
to view the presenter notes
??? Presenter notes contain extra information which might be useful if you intend to use these slides for teaching. Press `P` again to switch presenter notes off Press `C` to create a new window where the same presentation will be displayed. This window is linked to the main window. Changing slides on one will cause the slide to change on the other. Useful when presenting. --- # Galactic Database .left[Galaxy uses a database for:] * Galaxy objects and all their relations (users, histories, datasets, workflows) * Job state persistence, job dispatching .left[Galaxy does not use a database for:] * Dataset **contents**: files on disk ??? - Galaxy objects like users, histories, datasets, and workflows are stored in the database. - All job information is likewise stored there. - User and reference data, however, are stored outside of the database. - You will want to backup both user data and your database. --- # Defaults * Galaxy uses the [SQLAlchemy](http://www.sqlalchemy.org/) database abstraction layer. This allows for different database servers to be plugged in. * By default Galaxy automatically creates and uses an [SQLite](https://sqlite.org/) database during the first startup. * The database is in the file `database/universe.sqlite` ??? - Galaxy uses SQLAlchemy for interacting with databases. - This allows Galaxy to transparently use sqlite or postgres. --- # Choices * SQLite * Useful for single-user Galaxy or development. * **PostgreSQL** * The recommended standard for anything serious. * ~~MySQL~~ * Supported by SQLAlchemy but Galaxy is not tested against it. ??? - sqlite is fine in development, but should not be used for production. - postgres is the best for any production server. - Do not use mysql if possible. --- # Sizing Galaxy rarely deletes from the database, most objects are *marked* deleted. Allocate at least 20 GB of disk to start, 50+ GB if expanding would be difficult. 8-16 GB memory should be sufficient. Recommended: Run PostgreSQL on a different server for resource isolation. ??? - Data is not removed from the database, so plan accordingly. - Allocated at least 20 GB of disk to start, and 50 if expanding later is difficult. - The RAM usage is usually not significant. - We recommend running the database on a separate server from Galaxy for better isolation. --- # Configuration `database_connection` is specified as a [database URL](http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls) in `galaxy.yml` * Default SQLite: `sqlite:///./database/universe.sqlite?isolation_level=IMMEDIATE` * Local PostgreSQL (socket): `postgresql:///<db_name>[?host=/var/run/postgresql]` * Network PostgreSQL: `postgresql://[user][:password]@<host>[:5432]/<db_name>` ??? - The parameter database_connection tells galaxy where the database is. - There are several styles, depending on how you connect to your database. --- # New Database On first startup with an empty database, Galaxy creates its schema ??? - On first startup with an empty database, Galaxy creates its schema. --- # Migrations Changes in the Galaxy DB model (when upgrading Galaxy) are captured incrementally in the form of [atomic scripts](https://github.com/galaxyproject/galaxy/tree/dev/lib/galaxy/model/migrate/versions). Each script can both upgrade and downgrade a DB. ```console $ ./manage_db.sh upgrade $ ./manage_db.sh downgrade --version=132 ``` ??? - Whenever Galaxy is upgraded, there may be changes to the database schema. - These are stored in migration scripts, and you can use the manage db script to up or downgrade. --- # Tuning - Pool If the server logs errors about not having enough database pool connections. | Galaxy config option | default value | usegalaxy.org value | | ------ | ------------- | ------------------- | | `database_engine_option_pool_size` | `5` | `10` | | `database_engine_option_max_overflow` | `10` | `20` | [Values for usegalaxy.org](https://github.com/galaxyproject/usegalaxy-playbook/blob/master/env/main/group_vars/all/galaxy_config_vars.yml) ??? - Databases have a limited number of connection slots. - Galaxy can pool connections, and re-use existing connections from a pool when it needs to query the DB. - There are a couple of options for controlling pooling. - The defaults are generally fine until you see errors. --- # Tuning - Server-side cursors If large database query results are causing memory or response time issues in the Galaxy process, leave it on server (PostgreSQL only, recommended). | Galaxy config option | default value | usegalaxy.org value | | ------ | ------------- | ------------------- | | `database_engine_option_server_side_cursors` | `false` | `true` | ??? - Large queries may slow down Galaxy or the DB. - You can enable server side cursors to help with this. --- # Tuning - Slow query logging Queries slower than this threshold (in s) will be logged at debug level. | Galaxy config option | default value | usegalaxy.org value | | ------ | ------------- | ------------------- | | `slow_query_log_threshold` | `0` | `2` | ??? - If you notice slow responses, you can enable slow query logging. - This will print a message in your Galaxy log if a query takes more than a specified time period. - This can be useful to help you debug issues. --- # Tuning - TS install database Galaxy can track Tool Shed data in a separate DB. | Galaxy config option | default value | usegalaxy.org value | | ------ | ------------- | ------------------- | | `install_database_connection` | value of `database_connection` | SQLite DB in CVMFS | All other database config options but prefixed with `install_` are also available. .left[This allows:] * Bootstrapping fresh Galaxy instances with prebuilt/tested tool sets * Atomic installation/rollback (esp. with SQLite: backup and restore DB file) ??? - Galaxy can track tool shed data in a separate database - This can enable deploying Galaxies with prebuilt tool sets --- # Access through model Python script to access Galaxy's database layer **via the Galaxy model**. ```console (venv)$ python -i scripts/db_shell.py --config config/galaxy.yml >>> new_user = User('foo@example.org', 'secret') >>> new_user.username = 'foo' >>> sa_session.add(new_user) >>> sa_session.flush() >>> sa_session.query(User).all() ``` ??? - You can use the Galaxy python models to interact with the database. - After activating the galaxy virtual environment, you can use the DB shell script to interface. - This can allow scripting tasks like resetting passwords. --- # Useful queries Captured in [gxadmin](/training-material/topics/admin/tutorials/gxadmin/slides.html) ([tutorial](/training-material/topics/admin/tutorials/gxadmin/tutorial.html)) ??? - Many useful DB queries are captured in gxadmin - Look into this if you need to query the database for information like running jobs or recent users. --- ## Thank You! This material is the result of a collaborative work. Thanks to the [Galaxy Training Network](https://training.galaxyproject.org) and all the contributors!
Authors:
Martin Čech
Nicola Soranzo
Nate Coraor
Helena Rasche
This material is licensed under the Creative Commons Attribution 4.0 International License
.