Peptide and Protein Quantification via Stable Isotope Labelling (SIL)
Overview
Questions:
What are MS1 features?
How to quantify based on MS1 features?
How to map MS1 features to MS2 identifications?
How to evaluate and optimize the results?
Objectives:
MS1 feature quantitation and mapping of quantitations to peptide and protein IDs.
Requirements:
- Introduction to Galaxy Analyses
- Proteomics
- Protein FASTA Database Handling: tutorial hands-on
- Peptide and Protein ID using OpenMS tools: tutorial hands-on
Time estimation: 1 hour
Level: Advanced
Last modification: May 31, 2021
Introduction
To compare protein amounts in different samples from MS/MS data, two different experiment setups exist. Firstly, unmodified proteins can be measured in separate runs at one sample per MS-run. Secondly, proteins of samples to compare can be labelled with small chemical tags, mixed, and measured side-by-side in a single MS-run. There are two types of chemical tags:
- Isobaric tags have the same mass at first, but fragment during the generation of the MS/MS spectra to yield reporter ions of different mass. The intensity of those reporter ions can be compared in MS/MS spectra. There are two types of isobaric tags commercially available: tandem mass tags (TMT) and isobaric tags for relative and absolute quantitation (iTRAQ).
- Isotopic tags are chemically identical, but differ in their mass due to incorporated stable isotopes. Examples of different isotopic tags for stable isotope labelling (SIL) are ICAT, SILAC, dimethylation, or heavy oxygen (18O).
This tutorial deals with protein quantitation via stable isotope labelling (SIL). For isotopic tags, quantitation can be achieved by comparing the intensity of MS1 peptide mass traces. The whole MS1 profile of a peptide, i.e. the intensities of all its isotopic peaks over time, is called a peptide feature (Figure 1a). Incorporation of stable isotopes results in different peptide masses on MS1 level, which give rise to coeluting ion traces in the TIC with a mass difference typical for each different chemical tag (Figure 1b). Figure originally published in Nilse et al, 2015.
In this tutorial, we will use tools of the OpenMS suite to identify and quantify peptides and proteins.
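The mass difference "typical for each different chemical tag" mentioned above can be worked out by hand. The following is a minimal sketch, assuming labels that carry six 13C atoms each (as in the Arg6/Lys6 SILAC labels used later in this tutorial) and using the standard 13C minus 12C mass difference; the function name `label_mz_shift` is hypothetical and not part of OpenMS.

```python
# Mass difference between 13C and 12C in Da (standard physical value).
C13_MINUS_C12 = 1.0033548


def label_mz_shift(n_heavy_carbons: int, charge: int, n_labels: int = 1) -> float:
    """Expected m/z distance between the light and the heavy peptide ion.

    n_heavy_carbons: number of 13C atoms per label (6 for Arg6 / Lys6)
    charge:          charge state of the peptide ion
    n_labels:        number of labelled residues in the peptide
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return n_labels * n_heavy_carbons * C13_MINUS_C12 / charge


# Print the expected light/heavy spacing for common charge states.
for z in (1, 2, 3):
    print(f"charge {z}+: {label_mz_shift(6, z):.4f} m/z")
```

For a doubly charged peptide with one labelled residue, the light and heavy traces are therefore expected about 3.01 m/z apart.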
Prerequisites
If you are in the planning phase of your quantitative proteomics experiment, you may want to consider our tutorial on different quantitation methods first.
To learn about protein identification in Galaxy, please consider our OpenMS-based peptide ID tutorial.
Hands-on: Introduction
In the hands-on section of this tutorial, we will use a quantitative comparison of HEK cell lysate as a test dataset. In this experiment, HEK cells were once labelled with light, once with heavy SILAC. Both cultures were lysed simultaneously and the cell lysates were mixed in a certain ratio. A detailed description of the full dataset is available in the PRIDE archive.
Your objective in this hands-on-tutorial is to find out the correct mixing ratio of the test sample.
To speed up the analysis, the input dataset was filtered to include only the data acquired between seconds 2000 and 3000 of the original LC gradient.
Peptide and Protein Identification
In this tutorial, peptide identification will be performed using the workflow of the previous Peptide ID Tutorial. Alternatively, you can perform the protein identification step by step as described in the Peptide ID Tutorial, using the SILAC dataset from Zenodo. In that case, make sure to specify the labels in the param_variable_modifications parameter of the XTandemAdapter tool.
A common problem in mass spectrometry is the misassignment of mono-isotopic precursor peaks. Most search engines allow for some adaptation of the monoisotopic peak, and we will use this by leaving the option “By default, misassignment to the first and second isotopic 13C peak are also considered” at No.
Hands-on: Data upload
Create a new history for this SILAC Proteome exercise
Tip: Creating a new history
Click the new-history icon at the top of the history panel.
If the new-history is missing:
- Click on the galaxy-gear icon (History options) on the top of the history panel
- Select the option Create New from the menu
Import the mzML file containing the measured mass spectra from Zenodo or a data library:
https://zenodo.org/record/1051552/files/HEK_SILAC-K6R6_ST905_part.mzml
Tip: Importing via links
- Copy the link location
- Open the Galaxy Upload Manager (galaxy-upload on the top-right of the tool panel)
- Select Paste/Fetch Data
- Paste the link into the text field
- Press Start
- Close the window
Tip: Importing data from a data library
As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:
- Go into Shared data (top panel) then Data libraries
- Navigate to the correct folder as indicated by your instructor
- Select the desired files
- Click on the To History button near the top and select as Datasets from the dropdown menu
- In the pop-up window, select the history you want to import the files to (or create a new one)
- Click on Import
Comment
The data have been preprocessed during the conversion from the machine raw file. We used background removal on MS1 and MS2 level, and MS2 deisotoping.
Import the human protein database (including cRAP contaminants and decoys) from Zenodo or a data library:
https://zenodo.org/record/892005/files/Human_database_including_decoys_%28cRAP_added%29.fasta
Import the workflow
https://training.galaxyproject.org//training-material/topics/proteomics/tutorials/protein-id-oms/workflows/workflow.ga
Modify the workflow
- Connect the mzML input directly to the XTandemAdapter tool node and delete PeakPickerHiRes.
- Change the XTandemAdapter tool parameters: add Label:13C(6) (K) and Label:13C(6) (R) to param_variable_modifications.
- Run the workflow with
- the mzML dataset as 1: mzML dataset
- the human FASTA database as 2: protein FASTA database
Tip: Using Galaxy Workflows
If you want to learn more about Galaxy workflows, please consult the Galaxy Introduction.
Questions
- How many peptides and proteins were successfully identified?
Solution
- 2217 non-redundant peptides and 763 proteins were identified. Numbers may vary slightly depending on the versions of the tools and the FASTA file used.
MS1 Feature Detection
MS1 feature detection is a critical step in quantitative workflows. In principle, there are two different ways to define features:
- Feature detection solely based upon MS1 data (mzML/raw files) without prior knowledge of peptide identifications (IDs).
- Advantage: Feature results can be used to assist in peptide identification.
- Drawback: Not all peptide identifications can be mapped to features, thus not every identified peptide can be quantified.
- Feature detection based upon peptide IDs.
- Advantage: Most peptide IDs trigger a feature detection.
- Drawback: Feature results cannot be used to improve peptide ID.
The OpenMS suite provides several tools (FeatureFinders) for MS1 feature detection. For SIL we have to use FeatureFinderMultiplex, which does not take peptide IDs as an input.
Hands-on: MS1 Feature Detection
- Run FeatureFinderMultiplex Tool: toolshed.g2.bx.psu.edu/repos/galaxyp/openms_featurefindermultiplex/FeatureFinderMultiplex/2.6+galaxy0 with
- param-file “LC-MS dataset in centroid or profile mode”: mzML file
- In “algorithmic parameters”
- “Labels used for labelling the samples”: [ ][Arg6,Lys6]
- “m/z tolerance for search of peak patterns”: 10
- “Maximum number of missed cleavages due to incomplete digestion”: 1
- “Optional outputs”: out_multiplets (Optional output file containing all detected peptide groups)
Comment: Multiple labels per peptide
When using SILAC-KR or dimethyl labelling and trypsin digestion, exactly one labelled amino acid per peptide is expected. The only labelled amino acids are lysine (K) and arginine (R), and trypsin cuts after each of them. However, a small percentage of missed cleavages normally occurs even in those datasets. Setting “Maximum number of missed cleavages due to incomplete digestion” to 1 will be sufficient to deal with most missed cleavages.
When using other enzymes (e.g. Lys-C) or other labels (e.g. \(^{18}O\)), several labelled amino acids per peptide are expected. You can search for those features by increasing the parameter “Maximum number of missed cleavages due to incomplete digestion”.
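To illustrate how the label, the m/z tolerance and the missed-cleavage setting interact, here is a minimal sketch of a light/heavy pair search on a list of monoisotopic m/z values. It is not the algorithm used by FeatureFinderMultiplex; all function names are hypothetical, the charge state is assumed to be known, and the tolerance is assumed to be given in ppm.

```python
C13_MINUS_C12 = 1.0033548  # Da, mass difference between 13C and 12C


def expected_shifts(heavy_carbons=6, max_missed_cleavages=1):
    """Mass shifts (Da) for peptides carrying 1 .. max_missed_cleavages + 1 labels."""
    return [n * heavy_carbons * C13_MINUS_C12
            for n in range(1, max_missed_cleavages + 2)]


def within_ppm(observed, expected, tol_ppm):
    """True if observed deviates from expected by at most tol_ppm."""
    return abs(observed - expected) / expected * 1e6 <= tol_ppm


def find_pairs(peaks, charge, tol_ppm=10.0, max_missed_cleavages=1):
    """Return (light_mz, heavy_mz, n_labels) for every matching peak pair.

    peaks: monoisotopic m/z values, all assumed to have the given charge.
    """
    pairs = []
    shifts = expected_shifts(max_missed_cleavages=max_missed_cleavages)
    for light in peaks:
        for n_labels, shift in enumerate(shifts, start=1):
            target = light + shift / charge
            for heavy in peaks:
                if within_ppm(heavy, target, tol_ppm):
                    pairs.append((light, heavy, n_labels))
    return pairs


# One light/heavy pair at charge 2+ and one unrelated peak.
print(find_pairs([500.0, 503.0101, 700.0], charge=2))
```

With one allowed missed cleavage, the search also considers peptides carrying two labels, which doubles the expected mass shift.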
Mapping Identifications to Features
We now have feature quantifications for MS1 elution peaks, peptide identifications for the MS2 spectra and protein identifications.
The next step is to map the MS2-based peptide identifications to the quantified MS1 precursor peaks (“peptide features”). This will enable the quantification of identified peptides.
For labelled data, it is necessary to map peptide identifications to consensus features (i.e. a pair of one light peptide feature with one matching heavy feature in the correct m/z distance).
For consensusXML, IDMapper uses the consensus centroids, not the feature boundaries, for mapping. Therefore, the RT tolerance has to be set higher than for mapping to featureXML. A good starting value is half the expected typical retention time.
Sometimes several peptide identifications are mapped to one feature. The tool IDConflictResolver filters the mapping so that only the identification with the best score is associated with each feature. The quantitative result is refined further by removing falsely mapped identifications, e.g. a light identification mapped to a heavy feature. This step is performed by the MultiplexResolver tool, which returns the correctly mapped peptides as its first output and the falsely mapped peptides as its second output.
Finally, the correctly mapped peptides will be combined into protein quantifications with the ProteinQuantifier tool.
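The mapping and conflict-resolution steps described above can be sketched in a few lines. This is a simplified illustration, not the implementation of IDMapper or IDConflictResolver: the classes and function names are hypothetical, features are reduced to a single RT and m/z position, and a higher score is assumed to be better.

```python
from dataclasses import dataclass, field


@dataclass
class PeptideID:
    sequence: str
    rt: float      # retention time of the MS2 event in seconds
    mz: float      # precursor m/z
    score: float   # assumption: higher score means better identification


@dataclass
class Feature:
    rt: float
    mz: float
    ids: list = field(default_factory=list)


def map_ids(features, peptide_ids, rt_tol=20.0, mz_tol_ppm=10.0):
    """Attach each peptide ID to all features within the tolerances.

    Returns the list of peptide IDs that could not be mapped.
    """
    unmapped = []
    for pid in peptide_ids:
        hit = False
        for feat in features:
            rt_ok = abs(pid.rt - feat.rt) <= rt_tol
            mz_ok = abs(pid.mz - feat.mz) / feat.mz * 1e6 <= mz_tol_ppm
            if rt_ok and mz_ok:
                feat.ids.append(pid)
                hit = True
        if not hit:
            unmapped.append(pid)
    return unmapped


def resolve_conflicts(features):
    """Keep only the best-scoring identification per feature."""
    for feat in features:
        if len(feat.ids) > 1:
            feat.ids = [max(feat.ids, key=lambda p: p.score)]


features = [Feature(rt=2100.0, mz=500.0)]
ids = [
    PeptideID("AAK", 2105.0, 500.001, 0.9),
    PeptideID("CCK", 2110.0, 500.002, 0.5),
    PeptideID("DDR", 2500.0, 500.000, 0.99),  # too far away in RT
]
unmapped = map_ids(features, ids)
resolve_conflicts(features)
print([p.sequence for p in features[0].ids], [p.sequence for p in unmapped])
```

The sketch uses the same tolerances as the hands-on section below (20 seconds and 10 ppm), which makes it easy to see why a larger RT tolerance maps more identifications but also raises the risk of conflicts.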
Hands-on: Quant to ID matching
- Run IDMapper Tool: toolshed.g2.bx.psu.edu/repos/galaxyp/openms_idmapper/IDMapper/2.6+galaxy0 with
- param-file “Protein/peptide identifications file”: output of FalseDiscoveryRate
- param-file “Feature map/consensus map file”: output of FeatureFinderMultiplex
- “RT tolerance (in seconds) for the matching of peptide identifications and (consensus) features”: 20
- “m/z tolerance (in ppm or Da) for matching of peptide identifications and (consensus) features”: 10
- In “Additional options for consensusXML input”
- “Match using RT and m/z of sub-features instead of consensus RT and m/z”: Yes
- “Store the map index of the sub-feature in the peptide ID”: Yes
- Run IDConflictResolver Tool: toolshed.g2.bx.psu.edu/repos/galaxyp/openms_idconflictresolver/IDConflictResolver/2.6+galaxy0 with
- param-file “Input file”: output of IDMapper
- Run MultiplexResolver Tool: toolshed.g2.bx.psu.edu/repos/galaxyp/openms_multiplexresolver/MultiplexResolver/2.5+galaxy0 with
- param-file “Peptide multiplets with assigned sequence information”: output of IDConflictResolver
- “Labels used for labelling the samples”: [ ][Arg6,Lys6]
- “Maximum number of missed cleavages due to incomplete digestion”: 1
- Run ProteinQuantifier Tool: toolshed.g2.bx.psu.edu/repos/galaxyp/openms_proteinquantifier/ProteinQuantifier/2.6+galaxy0 with
- param-file “Input file”: first output of MultiplexResolver
- param-file “Protein inference results”: output of FalseDiscoveryRate
- “Calculate protein abundance from this number of proteotypic peptides (most abundant first; ‘0’ for all)”: 0
- “Averaging method used to compute protein abundances from peptide abundances”: sum
- “Add the log2 ratios of the abundance values to the output”: Yes
- “Optional outputs”: out (Output file for protein abundances) and peptide_out (Output file for peptide abundances)
Comment: ProteinQuantifier parameters
Peptide quantitation algorithms are more precise for highly abundant peptides. Therefore, it is recommended to base protein quantitations on those peptides. In ProteinQuantifier, you may restrict the calculation of protein abundances to the most abundant peptides by using the option “Calculate protein abundance from this number of proteotypic peptides”. However, we recommend using the averaging method sum instead. With this option, protein ratios are based on the sum of all peptide abundances, so highly abundant peptides have more influence on the protein abundance calculation than low abundant peptides. A simple sum-of-intensities algorithm provided the best estimates of true protein ratios in a comparison of several protein quantitation algorithms (Carrillo et al., Bioinformatics, 2009).
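The effect of the sum method can be seen in a small worked example. The sketch below is an illustration under stated assumptions, not ProteinQuantifier code: the function names and abundance values are made up, and each peptide is represented as a (light, heavy) abundance pair.

```python
import math


def protein_log2_ratio_sum(peptides):
    """log2(heavy/light) based on the summed peptide abundances."""
    light = sum(l for l, _ in peptides)
    heavy = sum(h for _, h in peptides)
    return math.log2(heavy / light)


def protein_log2_ratio_mean(peptides):
    """Unweighted mean of the per-peptide log2(heavy/light) ratios."""
    return sum(math.log2(h / l) for l, h in peptides) / len(peptides)


# Hypothetical protein with one intense and one weak peptide.
peptides = [
    (1e8, 5e7),  # intense peptide, ratio 1:2
    (1e5, 1e5),  # weak peptide, ratio 1:1 (less reliable)
]
print(round(protein_log2_ratio_sum(peptides), 3))
print(protein_log2_ratio_mean(peptides))
```

In this example the summed ratio stays close to the value of the intense peptide (about -1), whereas the unweighted mean is pulled halfway towards the weak peptide (-0.5).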
Descriptive Statistics and Plotting of Protein Quantitations
ProteinQuantifier produces two output tables: the first one gives information about the quantified proteins, the second one gives information about the quantified peptides. For proteins, we added a log-transformed ratio to the output, which is saved in column 8 of the protein table. The ratio is calculated as log2(abundance2/abundance1), i.e. the log2 fold change, referred to as FC below.
To get a quick overview of the results, you can calculate basic descriptive statistics and plot the data as a histogram.
Comment lines at the beginning of a tabular file may sometimes cause errors; therefore, we will remove them with the tool Select last lines from a dataset (tail).
Hands-on: Descriptive Statistics
- Run Summary Statistics Tool: Summary_Statistics1 with
- param-file “Summary statistics on”: protein table output (first file) of ProteinQuantifier
- “Column or expression”: c8
- Run Select last lines from a dataset (tail) Tool: toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_tail_tool/1.1.0 with
- param-file “Text file”: protein table output (first file) of ProteinQuantifier
- “Operation”: Keep everything from this line on
- “Number of lines”: 4
- Run Histogram Tool: toolshed.g2.bx.psu.edu/repos/devteam/histogram/histogram_rpy/1.0.4 with
- param-file “Dataset”: output of Select last
- “Numerical column for x axis”: Column: 8
- “Number of breaks (bars)”: 20
- Set “Plot title” and “Label for x axis” to something meaningful.
Comment: Calculating descriptive statistics for peptides
The peptide table output of ProteinQuantifier does not give the log-transformed ratio for each peptide. Nonetheless, you may calculate basic statistics of the FC values by running Summary Statistics with “Column or expression” set to log(c6/c5,2).
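The expression log(c6/c5,2) can also be reproduced outside Galaxy. The following minimal sketch parses a tab-separated peptide table and computes the per-peptide log2 ratios. It assumes that comment lines start with `#`, that the header line is non-numeric, and that columns 5 and 6 hold the two abundances; the column headers and values in the example table are hypothetical.

```python
import csv
import io
import math
import statistics


def peptide_log2_ratios(table_text, col_num=6, col_den=5):
    """Return log2(column col_num / column col_den) for every data row.

    Columns are 1-based, as in the Galaxy expression log(c6/c5,2).
    """
    ratios = []
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip empty lines and comment lines
        try:
            num = float(row[col_num - 1])
            den = float(row[col_den - 1])
        except (ValueError, IndexError):
            continue  # skip the header line and incomplete rows
        if num > 0 and den > 0:
            ratios.append(math.log2(num / den))
    return ratios


# Hypothetical table content for illustration only.
table = (
    "# comment line\n"
    "peptide\tprotein\tn_proteins\tcharge\tabundance_1\tabundance_2\n"
    "AAK\tP1\t1\t2\t200.0\t100.0\n"
    "CCR\tP2\t1\t2\t100.0\t100.0\n"
    "DDK\tP3\t1\t2\t0\t50.0\n"
)
ratios = peptide_log2_ratios(table)
print(ratios, statistics.mean(ratios))
```

Rows with a zero abundance are skipped in this sketch because their log ratio is undefined.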
Questions
- How many peptides and proteins were successfully quantified?
- What might have been the mixing ratio of the input dataset?
- In the histogram, there is a second local maximum at about FC 0. What might that mean?
Solution
- With the above parameters, you should have quantified 870 peptides and 421 proteins (slight variations are expected depending on tool version or FASTA file).
- In the histogram, you see that the peak of the density curve is between -1.1 and -1.2. In the summary statistics, you can see that the mean protein ratio was -0.98. An FC of -1 indicates that the unlabelled proteins were twice as abundant as their heavy-labelled counterparts. Indeed, the mixing ratio of the dataset was 2 parts light labelled HEK cell lysate and 1 part heavy labelled HEK cell lysate.
- Some proteins were quantified with an FC close to 0. These may stem from incomplete SILAC labelling. Even after two weeks of cell culture in SILAC medium, some proteins with a very low turnover rate may remain unlabelled.
Evaluation and Optimization of Quantitation Results
Protein quantitation is a multi-step procedure. Many parameters of different steps influence the final results. Therefore, it is recommended to optimize the tool parameters for each dataset and to carefully evaluate quantitation results. While the total number of quantified proteins is a first important parameter for optimization, it is also necessary to visualize the results and check for correct feature finding and ID mapping.
Galaxy does not provide a tool for proteomics visualization. We recommend using the OpenMS viewer TOPPView. Basic TOPPView tutorials are available as videos and a more comprehensive tutorial as HTML or PDF.
For the optimization of tool parameters, it is recommended not to work with a complete LC-MS/MS run. Instead, we will use FileFilter to extract a small RT-slice of our input dataset, i.e. a fraction of the original dataset that was measured during a short period of time. Reducing the test data reduces the time needed for analysis and facilitates visual examination of the data.
Using Galaxy Workflows enables us to quickly re-run a full analysis with changed parameters. To learn about Galaxy Workflows, consult this tutorial.
Caution: Be aware that only very small parts of your dataset can be checked by visual examination. To minimize biases, try to look at the same areas / features of each result file.
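Extracting an RT slice, as described above, amounts to keeping only the spectra whose retention time falls into a given window. Here is a minimal sketch of that idea on plain Python dictionaries; it is not how FileFilter is implemented, the function name is hypothetical, and the window boundaries are assumed to be inclusive.

```python
def rt_slice(spectra, rt_range="2000:2200"):
    """Keep spectra whose 'rt' (in seconds) lies inside 'min:max' (inclusive)."""
    low, high = (float(value) for value in rt_range.split(":"))
    return [spectrum for spectrum in spectra if low <= spectrum["rt"] <= high]


# Hypothetical spectra, reduced to their retention times.
spectra = [{"rt": 1999.9}, {"rt": 2000.0}, {"rt": 2100.5}, {"rt": 2200.1}]
print([s["rt"] for s in rt_slice(spectra)])
```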
Hands-on: Data reduction and visual evaluation with TOPPView
- Run FileFilter tool with
- param-file “Input file”: mzML file
- “Retention time range to extract”: 2000:2200
- Extract a workflow out of your history or import the premade workflow
- Run the whole workflow again with default settings on the reduced mzML file.
- Run FileFilter tool with
- param-file “Input file”: IDConflictResolver output
- “Remove features without annotations”: Yes
- Rename the FileFilter output to Annotated features
Tip: Renaming a dataset
- Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
- In the central panel, change the Name field
- Click the Save button
- Run FileFilter tool with
- param-file “Input file”: IDConflictResolver output
- “Remove features with annotations”: Yes
- Rename the FileFilter output to UNannotated features
- Download the following files locally:
- Spectra: mzML file
- peptide IDs: IDScoreSwitcher idXML output file
- features: FeatureFinderMultiplex featureXML output file
- consensus features: FileFilter consensusXML output files (“Annotated” and “UNannotated” features)
- Open TOPPView
- Open the mzML file in TOPPView with
- Open as set to new window
- Map view set to 2D
- Low intensity cutoff set to off
- Open all other downloaded files in TOPPView with
- Open as set to new layer
- Activate the mzML layer and click on Show projections
- Activate the consensusXML layers and click on Show consensus feature element positions
- Evaluate your data analysis by
- zooming into a specific region (hold Ctrl and use the mouse to zoom)
- measuring m/z and RT distances (select the mzML layer, hold Shift and use the mouse to measure)
- displaying an area in 3D view (right-click into the 2D View and select Switch to 3D view)
- switching on and off the display of single layers (left-click at the tick-boxes in the window “Layers”)
Examples
- Displaying annotated vs. UNannotated features: visualize annotated (= mapped) and unannotated (= unmapped) features by switching between activating only the “Annotated_features.consensusxml” or only the “UNannotated_features.consensusxml” layer.
- Correct mapping: a feature was detected, a peptide was identified and the two were mapped.
- No feature detected for a contaminant. Contaminants are often not labelled, but occur only in their unlabelled isoform. Therefore, they do not give rise to a consensus feature in FeatureFinderMultiplex.
Hands-on: Check a possible contaminant
- Run TextExporter tool with
- param-file “Input file”: IDFilter output file
- Run Search in textfiles (grep) tool with
- param-file “Select lines from”: output of TextExporter
- “Regular Expression”: the peptide sequence (e.g. MFLSFPTTK)
- Check if the peptide was mapped to a protein marked with “CONTAMINANT”.
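The same check can be done on any exported text table. The sketch below mimics the grep step: it selects all lines that contain the peptide sequence and then tests whether any of them mentions the marker “CONTAMINANT”. The function names, the example lines and the accession shown are hypothetical and do not reproduce the exact TextExporter layout.

```python
def find_peptide_lines(text, peptide):
    """Return all lines of the exported text that contain the peptide sequence."""
    return [line for line in text.splitlines() if peptide in line]


def is_contaminant(lines, marker="CONTAMINANT"):
    """True if any of the given lines refers to a protein flagged with the marker."""
    return any(marker in line for line in lines)


# Hypothetical export content for illustration only.
exported = (
    "PEPTIDE\t2105.3\t522.78\tMFLSFPTTK\tCONTAMINANT_EXAMPLE\n"
    "PEPTIDE\t2110.9\t610.31\tAAGGSLEELK\tsp|EXAMPLE|PROT_HUMAN\n"
)
hits = find_peptide_lines(exported, "MFLSFPTTK")
print(len(hits), is_contaminant(hits))
```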
Typical Problems
Three problems typically hamper correct peptide mapping:
- A feature is detected, but no peptide identification is nearby.
- Possible cause: This may be caused by imperfect peptide identification. However, it is never expected that every single MS2 spectrum leads to an identification. The protein might be missing in the database, or the peptide may carry a modification that was not included in the search.
- Possible solution: Improve your search engine settings.
- A peptide was identified, but no feature is nearby.
- Possible cause:
- The elution peaks of the peptide may be distorted. This is typical for low intensity peptides. If a lot of peptides have distorted elution peaks, this may be a sign of spray instability.
- The peptide is a contaminant.
- Possible solution:
- Lower the FeatureFinderMultiplex parameters “Two peptides in a multiplet are expected to have the same isotopic pattern” and/or “The isotopic pattern of a peptide should resemble the averagine model at this m/z position”, or broaden the “Range of isotopes per peptide in the sample” (in Advanced Options).
- No optimization of parameters is needed (see example above).
- A peptide was identified and a feature was detected nearby, but the two are not mapped to each other.
- Possible cause:
- The MS2 event and the feature are too far apart to be mapped.
- The precursor of the MS2 was not correctly assigned to the mono-isotopic peak.
- The detected feature is too small in RT dimension and covers only a part of the peptide peaks.
- Possible solution:
- Increase the IDMapper parameter “RT tolerance (in seconds) for the matching of peptide identifications and (consensus) features”.
- Feature size in RT dimension cannot be directly corrected; use the first solution instead.
Two problems typically disturb correct peptide quantitation:
- A peptide is mapped to the wrong feature.
- Possible cause: Co-eluting peptides of a similar mass may be falsely mapped to a nearby feature if the correct peptide did not lead to an identification or was identified only with a low score. In high-resolution data, this problem is very limited, because co-eluting peptides can normally be distinguished by slightly different m/z values. In low-resolution data, wrong assignment may occur more often.
- Possible solution: If a high value is used for the precursor mass tolerance, try to keep the RT tolerance low to avoid false mapping.
- Background noise (1) or co-eluting peptides (2) are incorporated in a feature.
- Possible solution:
- (1) Use noise filtering, either during pre-processing or by increasing the FeatureFinderMultiplex parameter “Lower bound for the intensity of isotopic peaks”.
- (2) Reduce the FeatureFinderMultiplex parameter “m/z tolerance for search of peak patterns”.
Question
- How many peptides could not be mapped to MS1 features? (Click on the IDMapper output and look at the tool’s infobox.)
- How many features could not be mapped to a peptide identification? (Click on the ProteinQuantifier output and look at the tool’s infobox.)
- Which problems are most prominent in the test dataset?
Solution
- 1,395 peptide IDs could not be mapped to a feature.
- 1,898 features, corresponding to 949 consensus features could not be mapped to a peptide identification.
- The mapping of peptide IDs to features seems to have worked mostly fine. The main problems seem to be (1) missing peptide identifications, (2) missing features where a peptide was identified and (3) features that span a shorter RT range than the corresponding peptide’s elution peak.
Optimization of Quantitation Results
For optimization, it is critical to modify only one parameter at a time. Also, it is recommended to optimize the tools in the order of their position in the workflow.
In the test dataset, several peptides were identified, but not quantified. Some of the peptides were even identified in the unlabelled, as well as in the labelled form. To optimize the feature detection, we will relax the parameters of FeatureFinderMultiplex.
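The rule of changing only one parameter at a time can be made explicit when planning the re-runs. The sketch below generates one parameter set per change from a baseline; the function name and the short parameter keys are hypothetical stand-ins for the full Galaxy parameter labels, and the values are taken from the hands-on section that follows.

```python
def one_at_a_time(baseline, alternatives):
    """Build parameter sets that each differ from the baseline in exactly one value.

    baseline:     dict of parameter name -> default value
    alternatives: dict of parameter name -> list of values to try
    Returns a list of (changed_parameter, parameter_set) tuples.
    """
    runs = []
    for name, values in alternatives.items():
        for value in values:
            if value == baseline[name]:
                continue  # identical to the baseline, nothing to test
            variant = dict(baseline)
            variant[name] = value
            runs.append((name, variant))
    return runs


baseline = {"isotopes_per_peptide": "3:6", "rt_tolerance": 0.0}
alternatives = {"isotopes_per_peptide": ["2:6"], "rt_tolerance": [10.0]}
for changed, params in one_at_a_time(baseline, alternatives):
    print(changed, params)
```

Comparing each of these runs against the baseline shows which single change is responsible for a gain or loss in quantified proteins.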
Hands-on: Optimize Feature Detection
- Run the whole workflow again:
- Change the FeatureFinderMultiplex parameter “Range of isotopes per peptide in the sample” from 3:6 to 2:6.
- Integrate the HighResPrecursorMassCorrector tool into the workflow.
- Use the mzML file and the featureXML from the FeatureFinderMultiplex tool as input
- Set “Additional retention time tolerance added to feature boundaries” between 0.0 and 10.0
- Connect the output with the search engine.
- Compare the number of identified proteins, unmatched features and unmapped peptides for each parameter setting.
- Visualize the results with TOPPView to check for correct feature detection and feature-to-peptide mapping.
Tip: Sending workflow results to new history
When running a workflow, you may send the results to a new history. This helps to keep track of different parameter settings.
Question
Which parameter improved the number of quantified proteins?
Solution
Both changes led to more quantified proteins. Increasing the isotope range led to 26% more protein quantitations; increasing the RT tolerance led to 7% more protein quantitations.
Key points
Peptides labelled with stable isotopes result in multiple parallel MS1 ion traces.
MS1 features can be used for relative protein quantitation.
Quantitations have to be mapped to PSMs.
PSM quantitations can be used to calculate protein quantitations.
Proper quantitation and mapping needs careful evaluation and optimization.
Frequently Asked Questions
Have questions about this tutorial? Check out the tutorial FAQ page or the FAQ page for the Proteomics topic to see if your question is listed there. If not, please ask your question on the GTN Gitter Channel or the Galaxy Help Forum.
Useful literature
Further information, including links to documentation and original publications, regarding the tools, analysis techniques and the interpretation of results described in this tutorial can be found here.
Citing this Tutorial
- Florian Christoph Sigloch, Björn Grüning, Matthias Fahrner, 2021 Peptide and Protein Quantification via Stable Isotope Labelling (SIL) (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/proteomics/tutorials/protein-quant-sil/tutorial.html Online; accessed TODAY
- Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012
BibTeX
@misc{proteomics-protein-quant-sil,
  author = "Florian Christoph Sigloch and Björn Grüning and Matthias Fahrner",
  title = "Peptide and Protein Quantification via Stable Isotope Labelling (SIL) (Galaxy Training Materials)",
  year = "2021",
  month = "05",
  day = "31",
  url = "\url{https://training.galaxyproject.org/training-material/topics/proteomics/tutorials/protein-quant-sil/tutorial.html}",
  note = "[Online; accessed TODAY]"
}
@article{Batut_2018,
  doi = {10.1016/j.cels.2018.05.012},
  url = {https://doi.org/10.1016%2Fj.cels.2018.05.012},
  year = 2018,
  month = {jun},
  publisher = {Elsevier {BV}},
  volume = {6},
  number = {6},
  pages = {752--758.e1},
  author = {B{\'{e}}r{\'{e}}nice Batut and Saskia Hiltemann and Andrea Bagnacani and Dannon Baker and Vivek Bhardwaj and Clemens Blank and Anthony Bretaudeau and Loraine Brillet-Gu{\'{e}}guen and Martin {\v{C}}ech and John Chilton and Dave Clements and Olivia Doppelt-Azeroual and Anika Erxleben and Mallory Ann Freeberg and Simon Gladman and Youri Hoogstrate and Hans-Rudolf Hotz and Torsten Houwaart and Pratik Jagtap and Delphine Larivi{\`{e}}re and Gildas Le Corguill{\'{e}} and Thomas Manke and Fabien Mareuil and Fidel Ram{\'{\i}}rez and Devon Ryan and Florian Christoph Sigloch and Nicola Soranzo and Joachim Wolff and Pavankumar Videm and Markus Wolfien and Aisanjiang Wubuli and Dilmurat Yusuf and James Taylor and Rolf Backofen and Anton Nekrutenko and Björn Grüning},
  title = {Community-Driven Data Analysis Training for Biology},
  journal = {Cell Systems}
}