InterMine integration with Galaxy
Overview
Questions:
- How to export your query results from your InterMine of choice to Galaxy?
- How to export a list of identifiers from Galaxy to your InterMine of choice?
Objectives:
- Learn how to import/export data from/to InterMine instances
- Understand the InterMine Interchange Dataset
Time estimation: 1 hour
Last modification: Mar 12, 2021
Introduction
InterMine (Smith et al. 2012) is a well-established platform for integrating and accessing life sciences data. It provides the integrated data via a web interface and RESTful web services.
Other organizations download and deploy InterMine on their own servers: there are more than 30 instances around the world (registered at registry.intermine.org), covering many organisms, including human data, model animals, plants and drug targets.
InterMine has been integrated with Galaxy in both directions. The InterMine Server tool in Galaxy lets you import the data returned by any InterMine search. Conversely, using the InterMine Interchange format, you can export a list of identifiers from Galaxy into any InterMine instance of your choice.
Learn more in this tutorial.
Agenda
In this tutorial, we will cover:
- Import data from InterMine
- Export identifiers into InterMine
Import data from InterMine
Hands-on: Import
- Search Galaxy for InterMine (not case sensitive; intermine is fine too), and click on InterMine Server under Get Data.
- This will redirect you to the InterMine registry, which shows a full list of InterMines and the various organisms they support. Find an InterMine that has the organism type you’re working with, and click on it to redirect to that InterMine.
- Once you arrive at your InterMine of choice, you can run a query as normal: this could be a search, a list results page, a template, or a query in the query builder. Eventually you’ll be presented with an InterMine results table.
- Click on Export (top right). This will bring up a modal window.
- Select Send to Galaxy and double-check that the “Galaxy Location” is correct.
- Click on the Send to Galaxy button on the bottom right of the pop-up window.
Tip: Enable popups
If you get an error when you click on the Send to Galaxy button, please make sure to allow popups and try again.
You have now exported your query results from InterMine to Galaxy.
Export identifiers into InterMine
Get data
Hands-on: Data upload
Import some fly data from Zenodo or from the data library
https://zenodo.org/record/3407174/files/GenesLocatedOnChromosome4.tsv
Tip: Importing via links
- Copy the link location
- Open the Galaxy Upload Manager (upload icon at the top right of the tool panel)
- Select Paste/Fetch Data
- Paste the link into the text field
- Press Start
- Close the window
Tip: Importing data from a data library
As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:
- Go into Shared data (top panel) then Data libraries
- Navigate to the correct folder as indicated by your instructor
- Select the desired files
- Click on the To History button near the top and select as Datasets from the dropdown menu
- In the pop-up window, select the history you want to import the files to (or create a new one)
- Click on Import
Rename the dataset to GenesLocatedOnChromosome4
Tip: Renaming a dataset
- Click on the pencil icon for the dataset to edit its attributes
- In the central panel, change the Name field
- Click the Save button
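For readers who prefer a script, the upload and rename steps above can be approximated outside the Galaxy interface. The following is a minimal sketch, not part of the Galaxy workflow itself: it derives the dataset name from the Zenodo link and defines a download helper. The function names `dataset_name_from_url` and `fetch` are hypothetical, chosen only for this illustration.

```python
import posixpath
import urllib.parse
import urllib.request

# Link given in this tutorial.
URL = "https://zenodo.org/record/3407174/files/GenesLocatedOnChromosome4.tsv"


def dataset_name_from_url(url):
    """Derive a dataset name from the last path segment, without its extension."""
    path = urllib.parse.urlparse(url).path
    filename = posixpath.basename(path)
    return posixpath.splitext(filename)[0]


def fetch(url, destination):
    """Download url to a local file (hypothetical helper; needs network access)."""
    urllib.request.urlretrieve(url, destination)
    return destination


print(dataset_name_from_url(URL))  # prints GenesLocatedOnChromosome4
```

Calling `fetch(URL, "GenesLocatedOnChromosome4.tsv")` would save a local copy, which could then be uploaded to Galaxy from your computer instead of via the link.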
Inspect the data
The dataset contains the secondary identifier and the symbol of Drosophila melanogaster genes, together with their location on chromosome 4.
Questions
Do the data contain the type, e.g. Protein or Gene?
Solution
No, they don’t. So we have to specify it when we create the InterMine Interchange file.
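The same check can be sketched in Python. This is an illustration under stated assumptions: the sample rows are made-up placeholders (not real records from the dataset), the column order is assumed to be secondary identifier, symbol, location, and `KNOWN_TYPES` is a hypothetical, deliberately tiny set of class names.

```python
import csv
import io

# Made-up placeholder rows standing in for GenesLocatedOnChromosome4.tsv.
# Assumed column order: secondary identifier, symbol, chromosome location.
SAMPLE_TSV = (
    "CG0001\tgeneA\t4:100-200\n"
    "CG0002\tgeneB\t4:300-400\n"
)

# Hypothetical set of type names to look for; a real InterMine data model
# defines many more classes.
KNOWN_TYPES = {"Gene", "Protein"}


def columns_with_type(tsv_text, known_types=KNOWN_TYPES):
    """Return 1-based indexes of columns whose values are all known type names."""
    rows = [row for row in csv.reader(io.StringIO(tsv_text), delimiter="\t") if row]
    if not rows:
        return []
    width = min(len(row) for row in rows)
    return [
        i + 1
        for i in range(width)
        if all(row[i] in known_types for row in rows)
    ]


print(columns_with_type(SAMPLE_TSV))  # prints []
```

An empty result means no column holds a type, so the type has to be supplied separately.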
Create InterMine Interchange dataset
We will use the Create InterMine Interchange Dataset tool to generate an intermediate file, which will be used to send the identifiers (e.g. gene identifiers) to InterMine. This file requires the identifier’s type (e.g. Gene), the identifier (e.g. WBGene00007063) and, optionally, the organism’s name.
Hands-on: Generate InterMine file
- Create InterMine Interchange dataset (Tool: toolshed.g2.bx.psu.edu/repos/iuc/intermine_galaxy_exchange/galaxy_intermine_exchange/0.0.1) with the following parameters:
  - “Tabular file”: select the GenesLocatedOnChromosome4 dataset, which contains some fly genes
  - “Feature Type Column”: Column: 1
  - “Feature Type”: Gene
  - “Feature Identifier column”: Column: 2
Comment
- In this example, because the GenesLocatedOnChromosome4 dataset does not contain the type, we have to specify it in “Feature Type”.
- “Feature Type”: this is the type of the identifiers you are exporting to InterMine, in this example Gene. It must be a class in the InterMine data model.
- “Feature Identifier column”: select the column from the input file that contains the identifier. We have selected Column 2, which contains the gene symbol.
- “Feature Identifier”: this could be, for example, a gene symbol like GATA1, another identifier such as FBGN0000099, or perhaps a protein accession. In our example we do not have to edit anything because the values for this field are contained in the GenesLocatedOnChromosome4 dataset, in Column 2.
- “Organism Name column”: select the column from the input file that contains the organism’s name, if you have multiple organisms in the same dataset.
- “Organism Name”: alternatively, you can provide the organism’s name directly. The organism’s name is not mandatory, but it is good to provide if it is known. It does not have to be precise.
- Click on Execute
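Conceptually, the tool pairs each identifier with a type and an optional organism name. The sketch below illustrates that idea only; it is not the tool's implementation. The function name `to_interchange_rows` is hypothetical, the three-field row layout is an assumption (the exact layout of the InterMine Interchange format is not described in this tutorial), and the input rows are made-up placeholders.

```python
import csv
import io


def to_interchange_rows(tsv_text, identifier_column, feature_type, organism=""):
    """Build (type, identifier, organism) rows from tabular text.

    identifier_column is 1-based, matching how Galaxy numbers columns.
    The three-field row layout is an assumption made for illustration.
    """
    rows = []
    for record in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(record) < identifier_column:
            continue  # skip blank or short lines
        identifier = record[identifier_column - 1].strip()
        if identifier:
            rows.append((feature_type, identifier, organism))
    return rows


# Made-up placeholder input: secondary identifier, then symbol.
sample = "CG0001\tgeneA\nCG0002\tgeneB\n"

for row in to_interchange_rows(
    sample,
    identifier_column=2,
    feature_type="Gene",
    organism="Drosophila melanogaster",
):
    print("\t".join(row))
```

Here the type is a fixed value (`Gene`) rather than read from a column, mirroring the case where the input dataset carries no type information.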
Send identifiers to InterMine
Once the interchange dataset has been generated, expand the green dataset box named Create InterMine Interchange on data in your history.
Hands-on: Send data
- Click on view intermine at Registry to be redirected to the InterMine registry, which shows a full list of InterMines and the various organisms they support.
- Find an InterMine that has the organism type you’re working with, in our case FlyMine, and click on its green Send to button to export the identifiers to it.
- You are redirected to the List Analysis page in FlyMine, which shows the identifiers you have just exported from Galaxy.
Conclusion
You have now exported your identifiers from Galaxy to InterMine.
Frequently Asked Questions
Have questions about this tutorial? Check out the tutorial FAQ page or the FAQ page for the Using Galaxy and Managing your Data topic to see if your question is listed there. If not, please ask your question on the GTN Gitter Channel or the Galaxy Help Forum.
References
- Smith, R. N., J. Aleksic, D. Butano, A. Carr, S. Contrino et al., 2012 InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics 28: 3163–3165. 10.1093/bioinformatics/bts577
Citing this Tutorial
- Daniela Butano, Yo Yehudi, 2021 InterMine integration with Galaxy (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/intermine/tutorial.html Online; accessed TODAY
- Batut et al., 2018 Community-Driven Data Analysis Training for Biology. Cell Systems. 10.1016/j.cels.2018.05.012
BibTeX
@misc{galaxy-interface-intermine,
  author = "Daniela Butano and Yo Yehudi",
  title = "InterMine integration with Galaxy (Galaxy Training Materials)",
  year = "2021",
  month = "03",
  day = "12",
  url = "\url{https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/intermine/tutorial.html}",
  note = "[Online; accessed TODAY]"
}
@article{Batut_2018,
  doi = {10.1016/j.cels.2018.05.012},
  url = {https://doi.org/10.1016%2Fj.cels.2018.05.012},
  year = 2018,
  month = {jun},
  publisher = {Elsevier {BV}},
  volume = {6},
  number = {6},
  pages = {752--758.e1},
  author = {B{\'{e}}r{\'{e}}nice Batut and Saskia Hiltemann and Andrea Bagnacani and Dannon Baker and Vivek Bhardwaj and Clemens Blank and Anthony Bretaudeau and Loraine Brillet-Gu{\'{e}}guen and Martin {\v{C}}ech and John Chilton and Dave Clements and Olivia Doppelt-Azeroual and Anika Erxleben and Mallory Ann Freeberg and Simon Gladman and Youri Hoogstrate and Hans-Rudolf Hotz and Torsten Houwaart and Pratik Jagtap and Delphine Larivi{\`{e}}re and Gildas Le Corguill{\'{e}} and Thomas Manke and Fabien Mareuil and Fidel Ram{\'{\i}}rez and Devon Ryan and Florian Christoph Sigloch and Nicola Soranzo and Joachim Wolff and Pavankumar Videm and Markus Wolfien and Aisanjiang Wubuli and Dilmurat Yusuf and James Taylor and Rolf Backofen and Anton Nekrutenko and Björn Grüning},
  title = {Community-Driven Data Analysis Training for Biology},
  journal = {Cell Systems}
}