@@ -771,65 +751,157 @@
Quickstart
-Quickstart
-A demo server is available here. Be mindful, you share the server with others.
-Download the demo files from Google Drive and extract the archive.
+1. Install ms-mint-app
and start the application.
+If you know how to use pip
run:
+pip install ms-mint-app
+
+or follow the instructions here.
+Then start the application with
+Mint.py
+
+or, if you have a preferred directory for data, you can specify it with --data-dir
e.g.:
+Mint.py --data-dir /data
+
+The application takes a while to start up. In the meantime, the browser window will show
+
+This site can’t be reached
+
+Just wait a bit until the terminal shows INFO:waitress:Serving on http://127.0.0.1:9999
and refresh the page.
+The application is now served on port 9999
of your local machine.
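Instead of refreshing the page by hand, the wait can be scripted. The following is a minimal sketch, assuming the default address `127.0.0.1:9999` shown above; the helper name `wait_for_port` and its timeout values are hypothetical and not part of ms-mint.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 9999,
                  timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connection means the server is accepting requests.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Server not up yet (connection refused or timed out): retry later.
            time.sleep(interval)
    return False
```

Calling `wait_for_port()` after starting `Mint.py` blocks until the server accepts connections, after which the browser page can be refreshed.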
+
+If you have never started the application before, you will not have any workspaces yet.
+2. Create a workspace
+In the Workspaces
tab click on the blue button with the label CREATE WORKSPACE
. A dialogue opens asking you for the name of the future workspace. Type DEMO
into the text field and click on CREATE
.
+
+Now you have created your first workspace, but it is empty. We will need some input files to populate it.
+You can see which workspace is activated in the light-blue info box:
+
+3. Download the demo files
+Some demo files are available for download on the ms-mint
Google Drive. Download the files and extract the archive.
You will find two csv
files and 12 mzXML
and/or mzML
files.
+.
+├── README.md
+├── metadata
+│ └── metadata.csv
+├── ms-files
+│ ├── CA_B1.mzXML
+│ ├── CA_B2.mzXML
+│ ├── CA_B3.mzXML
+│ ├── CA_B4.mzXML
+│ ├── EC_B1.mzXML
+│ ├── EC_B2.mzXML
+│ ├── EC_B3.mzXML
+│ ├── EC_B4.mzXML
+│ ├── SA_B1.mzML
+│ ├── SA_B2.mzML
+│ ├── SA_B3.mzML
+│ └── SA_B4.mzML
+└── targets
+ └── targets.csv
+
+4 directories, 15 files
+
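To confirm that the archive was extracted completely, the layout above can be checked with a few lines of Python. This is only a sketch: the function name `summarize_demo_dir` is hypothetical, and the folder and file names are taken from the tree listing.

```python
from pathlib import Path

# File extensions of the MS files in the demo archive (compared case-insensitively).
MS_SUFFIXES = {".mzxml", ".mzml"}


def summarize_demo_dir(root):
    """Report which parts of the demo archive are present under `root`."""
    root = Path(root)
    ms_dir = root / "ms-files"
    ms_files = []
    if ms_dir.is_dir():
        ms_files = sorted(
            p.name for p in ms_dir.iterdir() if p.suffix.lower() in MS_SUFFIXES
        )
    return {
        "ms_files": ms_files,
        "has_metadata": (root / "metadata" / "metadata.csv").is_file(),
        "has_targets": (root / "targets" / "targets.csv").is_file(),
    }
```

For the complete demo archive, `ms_files` should list 12 names and both flags should be `True`.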
- A folder with 12 mass-spectrometry (MS) files from microbial samples: four files each for Staphylococcus aureus (SA), Escherichia coli (EC), and Candida albicans (CA).
Each file belongs to one of four batches (B1-B4).
-MINT-metadata.csv
contains this information in tabular format. Setting up the metadata for your project is essential.
-MINT-targets.csv
contains the extraction lists. The identification of the metabolites has been done before, so we know where the metabolites appear in the MS data.
+metadata.csv
contains this information in tabular format. Setting up the metadata for your project is essential.
+targets.csv
contains the extraction lists. The identification of the metabolites has been done before, so we know where the metabolites appear in the MS data.
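The demo file names encode the organism and the batch (for example `CA_B1.mzXML`). As an illustration of how such information could be derived when building your own metadata, here is a small sketch; the function `parse_demo_name` and the keys of the returned dictionary are hypothetical and only follow the naming scheme described above.

```python
from pathlib import Path

# Organism codes used in the demo file names.
ORGANISMS = {
    "SA": "Staphylococcus aureus",
    "EC": "Escherichia coli",
    "CA": "Candida albicans",
}


def parse_demo_name(filename):
    """Split a demo file name such as 'CA_B1.mzXML' into organism and batch."""
    stem = Path(filename).stem          # e.g. "CA_B1"
    code, batch = stem.split("_", 1)    # e.g. "CA", "B1"
    return {"file_stem": stem, "organism": ORGANISMS[code], "batch": batch}
```

This only works for names that follow the `<organism>_<batch>` pattern of the demo files; other naming schemes need their own parsing rules.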
-1. Open the MINT application and create a new workspace named DEMO.
-At workspaces
click on CREATE WORKSPACE
. Type DEMO
into the text field and click on CREATE
.
-2. Make sure the DEMO workspace is activated, indicated by a blue bullet in the workspace table.
-
-3. Switch to MS-Files
and upload the 12 MS files.
+4. Upload LCMS files
+Switch to MS-Files
tab and upload the 12 MS files, either by dragging them from the Explorer/Finder onto the upload field or by clicking on the upload field and selecting the files in the dialogue box that opens. Wait until all files are uploaded:
-Wait until all files are uploaded.
-
-
-This file contains information about your samples. Setting up metadata is important and needs to be done with care.
-Your downstream analysis will benefit greatly from a good metadata table.
-
-PeakOpt
if True these files will be used in the peak optimization tab.
-Label
should be used to indicate the group of the sampe. E.g. treatment group vs control group.
-Batch
is the batch ID, for example the plate if samples come from multiple plates.
-Type
indicates the type for the sample. By default evertying is called Biological Sample
, other meaningful labels are Standard Sample
or Quality Control
.
-Row
and Column
indicate the location of the sample on the plate e.g. 1-12 and A-H for a 96-well plate.
-RunOrder
can contain the order 1-N in which the samples were processed.
-- Add more columns if you need. You can download the file, add new columns with Excel, and upload the table again.
-- Use the
action
to set multiple cell values at once.
-
-5. Switch to Targets
and upload MINT-targets.csv
.
-
+To speed up future processing you can now convert these files to the .feather
format, by selecting all files and clicking on "CONVERT SELECTED FILES TO FEATHER".
+
+Switch to Metadata
and upload metadata.csv
. This will populate the table with important information.
+
+This file contains critical information about your samples, so set up the metadata accurately.
+The metadata table is used by the downstream steps in ms-mint
and should not be omitted.
+Properly configured metadata improves the quality of your results.
+
+
+
+Column Name | Description |
+ms_file_label | The label of the mass spectrometry file. |
+color | Color coding for visual identification. |
+use_for_optimization | Boolean value indicating if the file is used in peak optimization. |
+in_analysis | Indicates if the sample is included in the analysis. |
+label | Group of the sample (e.g., treatment group vs. control group). |
+sample_type | Type of the sample. The default is Biological Sample; other labels could be Standard Sample or Quality Control. |
+run_order | Order in which the samples were processed (1-N). |
+plate | Batch ID, for example, the plate if samples come from multiple plates. |
+plate_row | Row location of the sample on the plate (e.g., A-H for a 96-well plate). |
+plate_column | Column location of the sample on the plate (e.g., 1-12 for a 96-well plate). |
+ms_column | Mass spectrometry column information. |
+ionization_mode | Mode of ionization used in the mass spectrometry. |
+
+
+
+
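If you prefer to prepare the metadata outside the GUI, a table with the columns listed above can be generated with the Python standard library. This is a sketch under assumptions: the column names come from the table, while the helper `build_metadata_csv` and the example values are hypothetical. Whether MINT accepts empty cells for every column is not confirmed here.

```python
import csv
import io

# Column names as listed in the metadata table above.
METADATA_COLUMNS = [
    "ms_file_label", "color", "use_for_optimization", "in_analysis", "label",
    "sample_type", "run_order", "plate", "plate_row", "plate_column",
    "ms_column", "ionization_mode",
]


def build_metadata_csv(rows):
    """Serialize a list of dicts to CSV text; missing columns are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=METADATA_COLUMNS, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example row for one of the demo files.
example = build_metadata_csv([
    {"ms_file_label": "CA_B1", "label": "CA",
     "sample_type": "Biological Sample", "run_order": 1, "plate": "B1"},
])
```

Writing `example` to a file named `metadata.csv` gives a table that can be edited further in a spreadsheet before uploading.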
+Switch to Targets
and upload targets.csv
.
+
This is the data extraction protocol. This determines what data is extracted from the files. The same protocol is applied to all files. No fitting or peak optimization is done.
MINT therefore requires a very stable chromatographic column and stable retention times for all files in a workspace.
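To make "no fitting or peak optimization" concrete, the following sketch shows what a fixed-window extraction amounts to: every data point inside the target's retention time and m/z window is kept, and a summary value such as the maximum intensity is reported. This is an illustration only, not the ms-mint implementation. The parameter names `mz_mean` and `mz_width_ppm` and the `(rt, mz, intensity)` tuple layout are assumptions; `rt_min` and `rt_max` are the window limits from the target list.

```python
def peak_max(scans, mz_mean, mz_width_ppm, rt_min, rt_max):
    """Maximum intensity inside a fixed retention time and m/z window.

    scans: iterable of (rt, mz, intensity) tuples (assumed layout).
    """
    # Half-width of the m/z window, converted from parts per million.
    delta = mz_mean * mz_width_ppm * 1e-6
    values = [
        intensity
        for rt, mz, intensity in scans
        if rt_min <= rt <= rt_max and abs(mz - mz_mean) <= delta
    ]
    # An empty window yields 0.0 instead of raising an error.
    return max(values, default=0.0)
```

Because the window is identical for all files, a peak that drifts outside `rt_min`..`rt_max` in one file is simply missed, which is why stable retention times matter.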
-6. Switch to Peak Optimization
-Switch the File selection
to Use all files
. Normally, especially for large datasets, you should select a small representative set of samples. The peak optimization takes longer the more files are used for it and the more targets are defined. 'Click on UPDATE PEAK PREVIEWS
.
+6. Optimize retention times
+Switch to Peak Optimization
tab and select Use all files
in the File selection
menu. For large datasets in particular, you should normally select a small representative set of samples, including standards (with known concentrations of the target metabolites), because peak optimization takes longer the more files and targets are involved. Click on UPDATE PEAK PREVIEWS
.
-
-- here you can see the shapes of the extracted peaks.
-- optimize the retention time with the interactive tool.
-- you can click on the image to load the data into the interactive tool or use the dropdown menu.
-- highlight the prefered region (highlighted in green) and click on
SET RT TO CURRENT VIEW
to update the retention time window.
-- the horizontal bar indicates how far away the selected window is from the
rt
reference value in the Targets
.
-
+This will show you the shapes of the data in the selected regions as an overview. This is a great way to validate that your target parameters are correct.
+However, you have to make sure that the metabolite you are looking for is present in the files. That is why you should always add some standard samples.
+The colors in the plots correspond to the colors in the metadata table.
+Use the interactive tool below to optimize the retention time for each target manually: zoom in on the region you want to select as the peak, then click SET RT TO CURRENT VIEW
. The green area marks the currently selected retention time (RT) range. The black bar marks the expected retention time of the peak maximum, which you usually know from earlier experiments, so you can compare the peak against them. To set the expected RT to the middle of the current window, press CONFIRM RETENTION TIME
. If the target is not present in any of the files, remove it from the target list by clicking REMOVE TARGET
.
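The effect of the two buttons on a target definition can be sketched as follows. This is a simplified model, assuming a target is a dictionary with the fields `rt`, `rt_min` and `rt_max`; the function names are hypothetical and do not correspond to ms-mint functions.

```python
def set_rt_to_view(target, view_min, view_max):
    """Model of SET RT TO CURRENT VIEW: the visible range becomes the RT window."""
    low, high = sorted((view_min, view_max))
    return {**target, "rt_min": low, "rt_max": high}


def confirm_retention_time(target):
    """Model of CONFIRM RETENTION TIME: expected RT moves to the window centre."""
    return {**target, "rt": (target["rt_min"] + target["rt_max"]) / 2}
```

Both helpers return a new dictionary and leave the original target unchanged, so the previous values remain available for comparison.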
- if you are happy with the peak shapes you can proceed to
Processing
.
-6. Switch to Processing
and start the data extraction with Run MINT
+7. Process the data
+Switch to Processing
and start the data extraction with Run MINT
The extraction process is done when you get a green notification Finished running MINT
.
Now, you can download the results in long-format or the dense peak_max values.
The tidy format contains all results, while the DENSE PEAK_MAX
only contains the peak_max
values as a matrix.
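The relation between the two downloads can be illustrated with a short sketch that turns long-format rows into a dense matrix of `peak_max` values. The row layout (one dictionary per file and target, with the keys `ms_file_label`, `peak_label` and `peak_max`) is an assumption for illustration; the real result files may contain further columns.

```python
def to_dense_peak_max(long_rows):
    """Pivot long-format rows into {ms_file_label: {peak_label: peak_max}}."""
    dense = {}
    for row in long_rows:
        # One outer entry per file, one inner entry per target.
        dense.setdefault(row["ms_file_label"], {})[row["peak_label"]] = row["peak_max"]
    return dense


# Hypothetical long-format rows: one row per (file, target) pair.
rows = [
    {"ms_file_label": "CA_B1", "peak_label": "Succinate", "peak_max": 1200.0},
    {"ms_file_label": "CA_B1", "peak_label": "Citrate", "peak_max": 300.0},
    {"ms_file_label": "EC_B1", "peak_label": "Succinate", "peak_max": 800.0},
]
dense = to_dense_peak_max(rows)
```

The dense form drops all other result columns, which is why the long format is the one to keep when every extracted property is needed.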
-7. Switch to Analysis
.
+8. Switch to Analysis
.
Once the results are generated, the Heatmap tab will show an interactive heatmap.
You can change the size of the heatmap by resizing your browser window and then UPDATE the plot.
The heatmap shows the peak_max values. The dropdown menu provides some options.
-8. Switch to Analysis/Plotting
+9. Switch to Analysis/Plotting
The plotting tool is very powerful, but requires some practice. It is a wrapper around the seaborn API.
Let's create a few simple visualizations.
diff --git a/quickstart/metadata-added.png b/quickstart/metadata-added.png
new file mode 100644
index 0000000..c8dd12f
Binary files /dev/null and b/quickstart/metadata-added.png differ
diff --git a/quickstart/ms-files-uploaded.png b/quickstart/ms-files-uploaded.png
index af4e9f0..34c434a 100644
Binary files a/quickstart/ms-files-uploaded.png and b/quickstart/ms-files-uploaded.png differ
diff --git a/quickstart/peak-preview.png b/quickstart/peak-preview.png
index 50cc26b..ff3ec4f 100644
Binary files a/quickstart/peak-preview.png and b/quickstart/peak-preview.png differ
diff --git a/quickstart/peak-preview_.png b/quickstart/peak-preview_.png
new file mode 100644
index 0000000..50cc26b
Binary files /dev/null and b/quickstart/peak-preview_.png differ
diff --git a/quickstart/workspace-activated.png b/quickstart/workspace-activated.png
new file mode 100644
index 0000000..25d712d
Binary files /dev/null and b/quickstart/workspace-activated.png differ
diff --git a/search/search_index.json b/search/search_index.json
index 22b3117..4434844 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":""},{"location":"#mint-metabolomics-integrator","title":"MINT - Metabolomics Integrator","text":"MINT is a post-processing tool for liquid chromatography-mass spectrometry (LCMS) based metabolomics. Metabolomics is the study of all metabolites (small chemical compounds) in a biological sample e.g. from bacteria or a human blood sample. The metabolites can be used to define biomarkers used in medicine to find treatments for diseases or for the development of diagnostic tests or for the identification of pathogens such as methicillin resistant Staphylococcus aureus (MRSA).
Figure 1: Flowchart of MINT processing workflow.
"},{"location":"#quickstart","title":"Quickstart","text":"Check out the Quickstart to jump right into it.
"},{"location":"#what-is-it-all-about","title":"What is it all about?","text":"What problem is MINT solving? Check out the background section.
"},{"location":"background/","title":"Background","text":""},{"location":"background/#what-is-lcms","title":"What is LCMS?","text":"A typical biological sample, such as human blood or agar with some kind of bacteria, can contain thousands of metabolites such as sugars, alcohols, amino acids, nucleotides and more. To meassure the composition of such a sample mass spectrometry can be used.
However, many metabolites share exact masses with other metabolites and therefore would be undistiguishable in the mass spectrometer. Therefore, compounds are sorted using column chromatography and spread out over time. The metabolites that enter the column at the same time interact with the column in different ways based on their specific stereochemistry. These interactions let compounds move faster or slower through the column and therefore the compounds will elude at different times. That way various metabolites can be analysed successively over certain timeframe rather than simultaneously.
The mass spectrometer that follows the chromatographic column meassures the masses given at each point in time and returns a time dependent spectrogram. An example of a LSMS meassurement is visualized in the following figure:
Figure 1: A 2D-histogram of a MS1 recorded intensities taken over time span of 10 minutes. Shown are m/z values between 100 and 600 [Da/z].
If we zoom into this figure to a very narrow band of masses the traces of individual metabolites can be observed. The trace of succinate (or succinic acid) is shown here:
Figure 2: A zoom into the 2D histogram shown in figure 1.
This illustrates how dense and precise the information in a LCMS messurement is. For comparison the M/Z value of an electron is 5.489e-4.
"},{"location":"background/#processing-lcms-data","title":"Processing LCMS data","text":"After the data has been collected on a mass spectrometer (MS) and stored in a (usually) vendor specific format the data can be subjected to analysis. To process data with MINT the data has to be provided in an open format (mzML or mzXML).
Instead of analysing the raw LCMS data it is common practise to deconvolute the data and sum up the signal of individual metabolites. The processed data should be proportional to the amount of metabolite in the sample. However, the meassured intensities will not reflect the relative concentrations between different compounds, only between different samples. For example, due to different ion efficiences compound A might have a stronger signal than compound B even if the compound B is present at higher concentration. Therefore, the intensities can only be use to compare relative amounts. To estimate absolute concentrations a calibration curve has to be created for every single metabolite.
The binning transforms the semi-structured data into a structured format where each column stands for one particular metabolite. Often the data is normalized for each metabolite to reflect the relative intensities across multiple samples. The structured data can then be subjected to common data anayses such as dimensionality reduction, or clustering analysis.
Figure 3: Clustering analysis for a small set of metabolites across 12 different samples including 3 different pathogens (EC: E. coli, SA: S. aureus, CA: C. albicans).
"},{"location":"developer-notes/","title":"Developer Notes","text":"python3 setup.py sdist bdist_wheel\npython3 -m twine upload --repository ms-mint dist/ms*mint-*\n
"},{"location":"developer-notes/#windows-executables","title":"Windows executables","text":"cd specfiles && pyinstaller --noconfirm Mint.spec ..\\scripts\\Mint.py\n
"},{"location":"developer-notes/#documentation-deployment","title":"Documentation deployment","text":"mkdocs build && mkdocs gh-deploy\n
"},{"location":"developer-notes/#example-nginx-config","title":"Example NGINX config","text":"To run Mint on a remote server you need to setup a remote proxy.
server {\n ...\n location / {\n proxy_pass http://localhost:8080;\n client_max_body_size 100G;\n proxy_set_header X-Forwarded-Proto https;\n proxy_set_header Host $host;\n }\n}\n
"},{"location":"gallery/","title":"Gallery","text":""},{"location":"gui/","title":"MINT GUI","text":""},{"location":"gui/#workspaces","title":"Workspaces","text":" - Add new workspaces
- Delete workspaces
- Activate workspace
A workspace is a container for project files that is separated from other workspaces. Through workspaces it is possible to work on different projects simultaneously. All files relevant for one workspace are stored in a corresponding sub-folder of --data-dir
, which by default is the folder MINT in the users home directory. The home directory is different on different platforms. Under Windows the default folder is: C:/Users/<username>/MINT
. The path to the active workspace is always displayed above the workspace tab.
To activate a particular workspace the workspace has to be selected in the table and then the ACTIVATE
button has to be clicked. DELETE
will display a popup window upon confirmation the selected workspace with all corresponding files on the harddrive in --data-dir
will be removed.
"},{"location":"gui/#ms-files","title":"MS-files","text":" - Import mass spectrometry files (MS-file) in mzXML or mzML format
- Convert file to feather format (other formats will be removed)
- Remove MS-files from workspace
Mass-Spec files (in mzML or mzXML format) can be added under the MS-files
tab by drag and drop or by using the selection form. Due to limitations of the Plotly-Dash framework only up to 10 files can be uploaded at a time. For larger projects, the files can simply be copied manually into the ms-files subdirectory. This will be improved in future versions of MINT.
To remove certain files the files have to be selected in the table and the DELETE SELECTED FILES
has to be clicked.
The files are converted to feather
format which is based on Apache Arrow. It is a representation that allows faster read into memory. If files were added manually by copying into the ms-files subdirectory the files can be converted to feather format with the CONVERT TO FEATHER
button. Note that mzXML and mzML files will be deleted after convertion.
"},{"location":"gui/#metadata","title":"Metadata","text":" - Select samples used for peak optimization by setting values in the column
PeakOpt
to True
. - Add batch labels to analyse for possible batch effects.
- Add labels to analyse for differences of different groups (e.g. treatment and control)
- Add types for different files e.g. biological sample, quality control sample, standards etc in order to include ore exclude certain types during analysis.
- Add other types of metadata.
Metadata for the individual files can be edited in the Metadata
tab. This data can be used to group results e.g. by batch or by label as well as types. You want to edit metadata table to:
"},{"location":"gui/#targetlists","title":"Targetlists","text":" - Import peaklist from CSV file or add new peaks manually
- Rename peaks definitions or change parameters
- Delete peak definitions
Targetlists are collection of peak definitions for the extraction of MS intensities beloning to individual metabolites. Targetlists can be provided as Excel or CSV files. Targetlists are explained in more detail here. Files can be uploaded via the drag and drop area or the selection tool. The targetlists can be edited in place or with the optimization tools.
"},{"location":"gui/#add-metabolites","title":"Add Metabolites","text":" - Search for metabolites from ChEBI three stars database
- Add selected metabolites to peaklist (without RT estimation)
"},{"location":"gui/#peak-optimization","title":"Peak Optimization","text":" - Optimize retention times for all peaks or individual peaks
- Preview all peakshapes with quality indicator
Retention times (RT) depend on the experiment and the specific chromatographic column used. Additionally, aging of the column leads to drifts in RT that have to be accounted for. The tools in the peak optimization tab can be used to quickly review all peak definitions in the presently loaded peaklist.
The GENERATE PEAK PREVIEWS
generates a preview of all peak definitions and plots the coresponding chromatograms for all files. The peaks can be reviewed and modified one by one with the interactive tool. FIND CLOSED PEAKS
iterates through all peak definitions and identifes the closest peak with respect to the expected RT which is displayed as black vertical line.
"},{"location":"gui/#manual-interactive-peak-optimization","title":"Manual (interactive) peak optimization","text":" - Optimize individual peaks one by one
- Find bad peaks
- Remove peaks from peaklist
- Set expected retention time
When a peak is selected in the drop down box the chromatograms for the particular mass windows using the peak width as defined in the peaklist is extracted and displayed. The current rt window is visualized as green box. SET RT TO CURRENT VIEW
will set the rt_min and rt_max values to the current view and updated the peaklist accordingly.
"},{"location":"gui/#processing","title":"Processing","text":" - Run MINT (apply the extraction protocol to all files in the workspace)
- Download results
- Reset results and start again
When all peaks look good the data can be processed using RUN MINT
. This will apply the current peaklist to the MS-files in the workspace and extract additional properties. When the results tables are present the results can be explored with the following tabs. The generated results can be downloaded with the DOWNLOAD
button.
"},{"location":"gui/#analysis","title":"Analysis","text":"After running MINT the results can be downloaed or analysed using the provided tools. For quality control purposes histograms and boxplots can be generated in the quality control tab. The interactive heatmap tool can be used to explore the results data after RUN MINT
has been exectuted. The tool allows to explore the generated data in from of heatmaps.
"},{"location":"gui/#general-selection-elements","title":"General selection elements","text":" - Include/exclude file types (based on
Type
column in metadata) - Include/exclude peak labels for analysis
- Set file sorting (e.g. by name, by batch etc.)
- Select group-by column for coloring and statistics
"},{"location":"gui/#heatmap","title":"Heatmap","text":"The first dropdown menu allows to include certain file types e.g. biological samples rather than quality control samples. The second dropdown menu distinguishes the how the heatmap is generated.
- Normalized by biomarer: devide values by column maxium.
- Cluster: Cluster rows with hierachical clustering.
- Dendrogram: Plots a dendrogram instead of row labels.
- Transpose: Switch columns and rows.
- Correlation: Calculate pearson correlation between columns.
- Show in new tab: The figure will be generated in a new independent tab. That way multiple heatmaps can be generated at the same time.
"},{"location":"gui/#correlation-of-scaled-peak_max","title":"Correlation of (scaled) peak_max","text":""},{"location":"gui/#distributions","title":"Distributions","text":" - Plot histograms
- Density distributions
- Boxplots
The MS-files can be grouped based on the values in the metadata table. If nothing is selected the data will not be grouped in order to plot the overall distribution. The second dropdown menu allows to select one or multple kinds of graphs that to generate. The third dropdown menu allows to include certain file types. For example, the analysis can be limited to only the biological samples if such a type has been defined in the type column of the metadata table.
The checkbox can be used to create a dense view. If the box is unchecked the output will be visually grouped into an individual section for each metabolite.
"},{"location":"gui/#pca","title":"PCA","text":" - Perform Principal Component Analysis (PCA)
- Plot projections to first N principal components
- Contributions of original variables to each component.
"},{"location":"gui/#hierarchical-clustering","title":"Hierarchical clustering","text":""},{"location":"gui/#plotting","title":"Plotting","text":"MINT comes with a flexible and powerful plotting interface that is based on the powerful Seaborn library.
- Bar plots
- Violin plots
- Boxen plot
- Scatter plots
- and more...
"},{"location":"install/","title":"Installation","text":""},{"location":"install/#installation-with-pip-linux-macos-windows","title":"Installation with PIP (Linux, MacOS, Windows)","text":"The latest release of the program can easily be installed in a standard Python 3 (>= 3.7) environment using the widely used package manager pip
:
pip install ms-mint\n
Should download and install all necessary dependencies and Mint. Mint should then be available via Mint.py
"},{"location":"install/#windows-installer","title":"Windows Installer","text":"For Windows 10 a build is provided here. The installer generates an icon in the windows start menu. There will be a terminal be shown, with potentially some errors due to missing files which can be ignored. Give it some time until the server is running and then navigate to http://localhost:9999 in the browser.
"},{"location":"install/#start-mintpy","title":"Start Mint.py
","text":"After installation MINT can be started by running Mint.py
.
Mint.py --help\nusage: Mint.py [-h] [--no-browser] [--version] [--data-dir DATA_DIR] [--debug] [--port PORT] [--serve-path SERVE_PATH]\n\nMINT frontend.\n\noptional arguments:\n -h, --help show this help message and exit\n --no-browser do not start the browser\n --version print current version\n --data-dir target directory for MINT data\n --debug start MINT server in debug mode\n --port change the port\n --serve-path serve app at a different path e.g. '/mint/' to serve the app at 'localhost:9999/mint/'\n
If the browser does not open automatically open it manually and navigate to http://localhost:9999
. The app's frontend is build using Plotly-Dash and runs locally in a browser. Thought, the Python functions provided can be imported and used in any Python project independently. The GUI is under active development and may be optimized in the future.
"},{"location":"install/#docker","title":"Docker","text":"MINT is now available on DockerHub in containerized format. A container is a standard unit of software that packages up code and all its dependencies, so the application runs quickly and reliably from one computing environment to another. In contrast to a virtual machine (VM), a Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings. This allows to run MINT on any computer that can run Docker.
The following command can be used to pull the latest image from docker hub.
docker pull msmint/msmint:latest\n
The image can be started with:
docker run -p 9999:9999 -it msmint/msmint:latest -v /data/:/data/\n
Then the tool is available in the browser at http://localhost:9999.
"},{"location":"install/#from-source","title":"From source","text":"Here we use conda
from the miniconda package to install dependencies in a virtual environment.
git clone https://github.com/soerendip/ms-mint\ncd ms-mint\n\nconda create -n ms-mint python=3.8\nconda activate ms-mint\npip setup.py install # for regular install\npip install -e . # for development\n
"},{"location":"quickstart/","title":"Quickstart","text":""},{"location":"quickstart/#quickstart","title":"Quickstart","text":"A demo server is available here. Be mindful, you share the server with others.
Download the demo files from Google Drive and extract the archive.
You will find two csv
files and 12 mzXML
and/or mzML
files.
- A folder with 12 mass-spectrometry (MS) files from microbial samples. We have four files for each Staphylococcus aureus (SA), Escherichia coli (EC), and Candida albicans (CA). Each file belongs to one of four batches (B1-B4).
MINT-metadata.csv
contains this information in tabular format. Setting up the metadata for your project is essential. MINT-targets.csv
contains the extraction lists. The identification of the metabolites has been done before, so we know where the metabolites appear in the MS data.
"},{"location":"quickstart/#1-open-the-mint-application-and-create-a-new-workspace-named-demo","title":"1. Open the MINT application and create a new workspace named DEMO.","text":"At workspaces
click on CREATE WORKSPACE
. Type DEMO
into the text field and click on CREATE
.
"},{"location":"quickstart/#2-make-sure-the-demo-workspace-is-activated-indicated-by-a-blue-bullet-in-the-workspace-table","title":"2. Make sure the DEMO workspace is activated, indicated by a blue bullet in the workspace table.","text":""},{"location":"quickstart/#3-switch-to-ms-files-and-upload-the-12-ms-files","title":"3. Switch to MS-Files
and upload the 12 MS files.","text":"Wait until all files are uploaded.
"},{"location":"quickstart/#4-switch-to-metadata-and-upload-mint-metadatacsv","title":"4. Switch to Metadata
and upload MINT-metadata.csv
.","text":"This file contains information about your samples. Setting up metadata is important and needs to be done with care. Your downstream analysis will benefit greatly from a good metadata table.
PeakOpt
if True these files will be used in the peak optimization tab. Label
should be used to indicate the group of the sampe. E.g. treatment group vs control group. Batch
is the batch ID, for example the plate if samples come from multiple plates. Type
indicates the type for the sample. By default evertying is called Biological Sample
, other meaningful labels are Standard Sample
or Quality Control
. Row
and Column
indicate the location of the sample on the plate e.g. 1-12 and A-H for a 96-well plate. RunOrder
can contain the order 1-N in which the samples were processed. - Add more columns if you need. You can download the file, add new columns with Excel, and upload the table again.
- Use the
action
to set multiple cell values at once.
"},{"location":"quickstart/#5-switch-to-targets-and-upload-mint-targetscsv","title":"5. Switch to Targets
and upload MINT-targets.csv
.","text":"This is the data extraction protocol. This determines what data is extracted from the files. The same protocol is applied to all files. No fitting or peak optimization is done. MINT therefore requires a very stable chromatographic column and stable retention times for all files in a workspace.
"},{"location":"quickstart/#6-switch-to-peak-optimization","title":"6. Switch to Peak Optimization
","text":"Switch the File selection
to Use all files
. Normally, especially for large datasets, you should select a small representative set of samples. The peak optimization takes longer the more files are used for it and the more targets are defined. 'Click on UPDATE PEAK PREVIEWS
.
- here you can see the shapes of the extracted peaks.
- optimize the retention time with the interactive tool.
- you can click on the image to load the data into the interactive tool or use the dropdown menu.
- highlight the prefered region (highlighted in green) and click on
SET RT TO CURRENT VIEW
to update the retention time window. - the horizontal bar indicates how far away the selected window is from the
rt
reference value in the Targets
.
- if you are happy with the peak shapes you can proceed to
Processing
.
"},{"location":"quickstart/#6-switch-to-processing-and-start-the-data-extraction-with-run-mint","title":"6. Switch to Processing
and start the data extraction with Run MINT
","text":"The extraction process is done when you get a green notification Finished running MINT
. Now, you can download the results in long-format or the dense peak_max values. The tidy format contains all results, while the DENSE PEAK_MAX
only contians the peak_max
values as a matrix.
"},{"location":"quickstart/#7-switch-to-analysis","title":"7. Switch to Analysis
.","text":"Once the results are generated the 'Heatmaptab will show an interactive heatmap. You can change the size of the heatmap by changing your browser window and
UPDATEthe plot. The heatmap shows the
peak_max` values. The dropdown menu provides some options.
"},{"location":"quickstart/#8-switch-to-analysisplotting","title":"8. Switch to Analysis/Plotting
","text":"The plotting tool is very powerful, but requires some practise. It is a wrapper of the powerful seaborn API. Let's create a few simple visualizations.
And click on Update
. A very simple bar-graph is shown, and we will gradually make it more complex. This simple bar graph shows the average peak_max
value across the whole dataset for all targets.
"},{"location":"quickstart/#a-select-peak_label-for-the-x-axis","title":"a) select peak_label
for the X
axis.","text":""},{"location":"quickstart/#b-set-aspect-ratio-to-5","title":"b) set aspect-ratio to 5.","text":""},{"location":"quickstart/#c-select-logarithmic-y-scale-in-the-dropdown-options","title":"c) select Logarithmic y-scale
in the dropdown options.","text":""},{"location":"quickstart/#d-click-on-update","title":"d) click on UPDATE
.","text":""},{"location":"quickstart/#e-set-figure-height-to-15-and-aspect-ratio-to-2","title":"e) set figure height to 1.5
and aspect ratio to 2
.","text":""},{"location":"quickstart/#e-set-column-to-label","title":"e) set Column
to Label
.","text":""},{"location":"quickstart/#f-set-row-to-batch","title":"f) set Row
to Batch
.","text":"This way you can look at the whole dataset at once, sliced by Batch
and Label
"},{"location":"quickstart/#exercise-try-to-create-the-following-plot","title":"Exercise: Try to create the following plot:","text":""},{"location":"targets/","title":"Target lists","text":"A target list contains the definitions of peaks to be extracted in terms of retention time and mz value. The important parameters for MINT are rt_min
and rt_max
. The rt
value is only used as an estimate and used for comparison. You should know, from former identification runs, at what retention time to expect a certain metabolite. This is what rt
is for. For the final extraction process however rt_min
and rt_max
are used. Before you process the MS files, you should check that all targts have rt_min
and rt_max
properly set.
"},{"location":"targets/#target-list-format","title":"Target list format","text":"The target list is the determining protocol for the data processing step. You can reproduce all results using this list as input. A target list can be provided as csv
(comma separated values) or xlsx
(Microsoft Excel) file.
If the preaklist is provided as multi-sheet xlsx file the target list should be the first sheet.
The input files contains a number of columns headers in the target list should contain:
- peak_label : A unique identifier such as the biomarker name or ID. Even if multiple peaklist files are used, the label have to be unique across all the files.
- mz_mean : The target mass (m/z-value) in [Da].
- mz_width : The width of the peak in the m/z-dimension in units of ppm. The window will be mz_mean +/- (mz_width * mz_mean * 1e-6). Usually, a values between 5 and 10 are used.
- rt : Estimated retention time in [min] (optional, see above).
- rt_min : The start of the retention time for each peak in [min].
- rt_max : The end of the retention time for each peak in [min].
- intensity_threshold : A threshold that is applied to filter noise for each window individually. Can be set to 0 or any positive value.
"},{"location":"targets/#example-file","title":"Example file","text":"target.csv:
peak_label,mz_mean,mz_width,rt_min,rt_max,intensity_threshold\nBiomarker-A,151.0605,10,4.65,5.2,0\nBiomarker-B,151.02585,10,4.18,4.53,0\n
A template can be created using the GUI.
"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":""},{"location":"#mint-metabolomics-integrator","title":"MINT - Metabolomics Integrator","text":"MINT is a sophisticated post-processing tool designed for liquid chromatography-mass spectrometry (LCMS) based metabolomics. Metabolomics, the comprehensive study of small molecule metabolites within biological samples, plays a pivotal role in biomedical research. These metabolites serve as crucial biomarkers for disease diagnostics, therapeutic interventions, and pathogen identification, including methicillin-resistant Staphylococcus aureus (MRSA).
"},{"location":"#quickstart","title":"Quickstart","text":"Check out the Quickstart to jump right into it.
"},{"location":"#what-is-lcms","title":"What is LCMS?","text":"A typical biological sample, such as human blood or agar with bacteria, can contain thousands of metabolites such as sugars, alcohols, amino acids, nucleotides, and more. To measure the composition of such a sample, mass spectrometry can be used.
However, many metabolites share exact masses with other metabolites and therefore would be indistinguishable in the mass spectrometer. Therefore, compounds are sorted using column chromatography and spread out over time. The metabolites that enter the column at the same time interact with the column in different ways based on their specific stereochemistry. These interactions let compounds move faster or slower through the column, and therefore the compounds will elute at different times. That way, various metabolites can be analyzed successively over a certain timeframe rather than simultaneously.
The mass spectrometer that follows the chromatographic column measures the masses given at each point in time and returns a time-dependent spectrogram. An example of an LCMS measurement is visualized in the following figure:
Figure 1: A 2D-histogram of MS1 recorded intensities taken over a time span of 10 minutes. Shown are m/z values between 100 and 600 [Da/z].
If we zoom into this figure to a very narrow band of masses, the traces of individual metabolites can be observed. The trace of succinate (or succinic acid) is shown here:
Figure 2: A zoom into the 2D histogram shown in Figure 1.
This illustrates how dense and precise the information in an LCMS measurement is. For comparison, the M/Z value of an electron is 5.489e-4.
"},{"location":"#processing-lcms-data","title":"Processing LCMS Data","text":"After the data has been collected on a mass spectrometer (MS) and stored in a (usually) vendor-specific format, the data can be subjected to analysis. To process data with MINT, the data has to be provided in an open format (mzML or mzXML).
Instead of analyzing the raw LCMS data, it is common practice to deconvolute the data and sum up the signal of individual metabolites. The processed data should be proportional to the amount of metabolite in the sample. However, the measured intensities will not reflect the relative concentrations between different compounds, only between different samples. For example, due to different ion efficiencies, compound A might have a stronger signal than compound B even if compound B is present at a higher concentration. Therefore, the intensities can only be used to compare relative amounts. To estimate absolute concentrations, a calibration curve has to be created for every single metabolite.
The binning transforms the semi-structured data into a structured format where each column stands for one particular metabolite. Often the data is normalized for each metabolite to reflect the relative intensities across multiple samples. The structured data can then be subjected to common data analyses such as dimensionality reduction or clustering analysis.
Figure 3: Clustering analysis for a small set of metabolites across 12 different samples including 3 different pathogens (EC: E. coli, SA: S. aureus, CA: C. albicans).
"},{"location":"#future-directions","title":"Future Directions","text":"MINT is continually evolving to incorporate new features and improvements. Future developments include enhanced data visualization tools, integration with other omics data, and improved user interface design to cater to a broader range of users. Community support is vital for the ongoing development of MINT, and we encourage users to contribute their feedback and engage with the development team.
"},{"location":"#conclusion","title":"Conclusion","text":"In summary, MINT is a powerful tool for the post-processing of LCMS-based metabolomics data, offering significant advantages in data analysis and interpretation. Its robust design and comprehensive features make it an invaluable resource for researchers in the field of metabolomics. We invite the scientific community to adopt MINT in their workflows and contribute to its continuous improvement.
"},{"location":"developer-notes/","title":"Developer Notes","text":"python3 setup.py sdist bdist_wheel\npython3 -m twine upload --repository ms-mint dist/ms*mint-*\n
"},{"location":"developer-notes/#windows-executables","title":"Windows executables","text":"cd specfiles && pyinstaller --noconfirm Mint.spec ..\\scripts\\Mint.py\n
"},{"location":"developer-notes/#documentation-deployment","title":"Documentation deployment","text":"mkdocs build && mkdocs gh-deploy\n
"},{"location":"developer-notes/#example-nginx-config","title":"Example NGINX config","text":"To run Mint on a remote server you need to setup a remote proxy.
server {\n ...\n location / {\n proxy_pass http://localhost:8080;\n client_max_body_size 100G;\n proxy_set_header X-Forwarded-Proto https;\n proxy_set_header Host $host;\n }\n}\n
"},{"location":"gallery/","title":"Gallery","text":""},{"location":"gui/","title":"MINT GUI","text":""},{"location":"gui/#workspaces","title":"Workspaces","text":" - Add new workspaces
- Delete workspaces
- Activate workspace
A workspace is a container for project files that is separated from other workspaces. Through workspaces it is possible to work on different projects simultaneously. All files relevant for one workspace are stored in a corresponding sub-folder of --data-dir
, which by default is the folder MINT in the user's home directory. The home directory is different on different platforms. Under Windows the default folder is: C:/Users/<username>/MINT
. The path to the active workspace is always displayed above the workspace tab.
To activate a particular workspace the workspace has to be selected in the table and then the ACTIVATE
button has to be clicked. DELETE
will display a popup window; upon confirmation, the selected workspace and all corresponding files on the hard drive in --data-dir
will be removed.
"},{"location":"gui/#ms-files","title":"MS-files","text":" - Import mass spectrometry files (MS-file) in mzXML or mzML format
- Convert file to feather format (other formats will be removed)
- Remove MS-files from workspace
Mass-Spec files (in mzML or mzXML format) can be added under the MS-files
tab by drag and drop or by using the selection form. Due to limitations of the Plotly-Dash framework only up to 10 files can be uploaded at a time. For larger projects, the files can simply be copied manually into the ms-files subdirectory. This will be improved in future versions of MINT.
To remove certain files the files have to be selected in the table and the DELETE SELECTED FILES
button has to be clicked.
The files are converted to feather
format which is based on Apache Arrow. It is a representation that allows faster reading into memory. If files were added manually by copying them into the ms-files subdirectory, they can be converted to feather format with the CONVERT TO FEATHER
button. Note that mzXML and mzML files will be deleted after conversion.
"},{"location":"gui/#metadata","title":"Metadata","text":" - Select samples used for peak optimization by setting values in the column
PeakOpt
to True
. - Add batch labels to analyse for possible batch effects.
- Add labels to analyse for differences of different groups (e.g. treatment and control)
- Add types for different files, e.g. biological sample, quality control sample, standards etc., in order to include or exclude certain types during analysis.
- Add other types of metadata.
Metadata for the individual files can be edited in the Metadata
tab. This data can be used to group results e.g. by batch or by label as well as types. You want to edit the metadata table to:
"},{"location":"gui/#targetlists","title":"Targetlists","text":" - Import peaklist from CSV file or add new peaks manually
- Rename peaks definitions or change parameters
- Delete peak definitions
Targetlists are collections of peak definitions for the extraction of MS intensities belonging to individual metabolites. Targetlists can be provided as Excel or CSV files. Targetlists are explained in more detail here. Files can be uploaded via the drag and drop area or the selection tool. The targetlists can be edited in place or with the optimization tools.
"},{"location":"gui/#add-metabolites","title":"Add Metabolites","text":" - Search for metabolites from ChEBI three stars database
- Add selected metabolites to peaklist (without RT estimation)
"},{"location":"gui/#peak-optimization","title":"Peak Optimization","text":" - Optimize retention times for all peaks or individual peaks
- Preview all peakshapes with quality indicator
Retention times (RT) depend on the experiment and the specific chromatographic column used. Additionally, aging of the column leads to drifts in RT that have to be accounted for. The tools in the peak optimization tab can be used to quickly review all peak definitions in the presently loaded peaklist.
The GENERATE PEAK PREVIEWS
generates a preview of all peak definitions and plots the corresponding chromatograms for all files. The peaks can be reviewed and modified one by one with the interactive tool. FIND CLOSED PEAKS
iterates through all peak definitions and identifies the closest peak with respect to the expected RT, which is displayed as a black vertical line.
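The idea of picking the peak closest to the expected RT can be sketched in a few lines of Python. This is only an illustration under simple assumptions (a peak is any point higher than both neighbours); the function names are hypothetical and this is not necessarily how MINT implements the search.

```python
def local_maxima(intensities):
    # Indices whose intensity exceeds both direct neighbours.
    return [
        i
        for i in range(1, len(intensities) - 1)
        if intensities[i - 1] < intensities[i] > intensities[i + 1]
    ]


def closest_peak_rt(times, intensities, expected_rt):
    # Apex time of the local maximum nearest to expected_rt,
    # or None if the trace has no local maximum.
    apexes = [times[i] for i in local_maxima(intensities)]
    if not apexes:
        return None
    return min(apexes, key=lambda t: abs(t - expected_rt))


# Toy chromatogram with two peaks (apexes at 4.2 and 4.6 min).
times = [4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8]
intensities = [0, 5, 20, 6, 1, 3, 30, 4, 0]
print(closest_peak_rt(times, intensities, expected_rt=4.3))
```

With an expected RT of 4.3 min the smaller peak at 4.2 min is selected, because only the distance to the expected RT matters here, not the peak height.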
"},{"location":"gui/#manual-interactive-peak-optimization","title":"Manual (interactive) peak optimization","text":" - Optimize individual peaks one by one
- Find bad peaks
- Remove peaks from peaklist
- Set expected retention time
When a peak is selected in the dropdown box, the chromatograms for the particular mass window, using the peak width as defined in the peaklist, are extracted and displayed. The current rt window is visualized as a green box. SET RT TO CURRENT VIEW
will set the rt_min and rt_max values to the current view and update the peaklist accordingly.
"},{"location":"gui/#processing","title":"Processing","text":" - Run MINT (apply the extraction protocol to all files in the workspace)
- Download results
- Reset results and start again
When all peaks look good the data can be processed using RUN MINT
. This will apply the current peaklist to the MS-files in the workspace and extract additional properties. When the results tables are present the results can be explored with the following tabs. The generated results can be downloaded with the DOWNLOAD
button.
"},{"location":"gui/#analysis","title":"Analysis","text":"After running MINT the results can be downloaded or analysed using the provided tools. For quality control purposes histograms and boxplots can be generated in the quality control tab. The interactive heatmap tool can be used to explore the results data after RUN MINT
has been executed. The tool allows exploring the generated data in the form of heatmaps.
"},{"location":"gui/#general-selection-elements","title":"General selection elements","text":" - Include/exclude file types (based on
Type
column in metadata) - Include/exclude peak labels for analysis
- Set file sorting (e.g. by name, by batch etc.)
- Select group-by column for coloring and statistics
"},{"location":"gui/#heatmap","title":"Heatmap","text":"The first dropdown menu allows including certain file types, e.g. biological samples rather than quality control samples. The second dropdown menu determines how the heatmap is generated.
- Normalized by biomarker: divide values by column maximum.
- Cluster: Cluster rows with hierarchical clustering.
- Dendrogram: Plots a dendrogram instead of row labels.
- Transpose: Switch columns and rows.
- Correlation: Calculate Pearson correlation between columns.
- Show in new tab: The figure will be generated in a new independent tab. That way multiple heatmaps can be generated at the same time.
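As an illustration of the normalization option in the list above, the following Python sketch divides each column by its maximum. The data layout (a dict mapping each peak label to its values across files) and the function name are assumptions made for this example, not MINT's internal representation.

```python
def normalize_by_column_max(table):
    # table maps peak_label -> list of peak_max values, one per MS file.
    normalized = {}
    for peak_label, values in table.items():
        col_max = max(values)
        # A column whose maximum is zero is mapped to zeros to avoid dividing by zero.
        normalized[peak_label] = [v / col_max if col_max else 0.0 for v in values]
    return normalized


example = {
    'Biomarker-A': [200.0, 50.0, 100.0],
    'Biomarker-B': [0.0, 0.0, 0.0],
}
print(normalize_by_column_max(example))
```

After this step every column with a non-zero maximum has the value 1.0 in the file where the signal is strongest, which makes targets with very different absolute intensities comparable in one heatmap.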
"},{"location":"gui/#correlation-of-scaled-peak_max","title":"Correlation of (scaled) peak_max","text":""},{"location":"gui/#distributions","title":"Distributions","text":" - Plot histograms
- Density distributions
- Boxplots
The MS-files can be grouped based on the values in the metadata table. If nothing is selected the data will not be grouped in order to plot the overall distribution. The second dropdown menu allows selecting one or multiple kinds of graphs to generate. The third dropdown menu allows including certain file types. For example, the analysis can be limited to only the biological samples if such a type has been defined in the type column of the metadata table.
The checkbox can be used to create a dense view. If the box is unchecked the output will be visually grouped into an individual section for each metabolite.
"},{"location":"gui/#pca","title":"PCA","text":" - Perform Principal Component Analysis (PCA)
- Plot projections to first N principal components
- Contributions of original variables to each component.
"},{"location":"gui/#hierarchical-clustering","title":"Hierarchical clustering","text":""},{"location":"gui/#plotting","title":"Plotting","text":"MINT comes with a flexible plotting interface that is based on the Seaborn library.
- Bar plots
- Violin plots
- Boxen plot
- Scatter plots
- and more...
"},{"location":"install/","title":"Installation","text":""},{"location":"install/#installation-with-pip-linux-macos-windows","title":"Installation with PIP (Linux, MacOS, Windows)","text":"The latest release of the program can easily be installed in a standard Python 3 (>= 3.7) environment using the widely used package manager pip
:
pip install ms-mint\n
This should download and install Mint and all necessary dependencies. Mint should then be available via Mint.py
"},{"location":"install/#windows-installer","title":"Windows Installer","text":"For Windows 10 a build is provided here. The installer creates an icon in the Windows start menu. A terminal window will be shown, potentially with some errors due to missing files, which can be ignored. Give it some time until the server is running and then navigate to http://localhost:9999 in the browser.
"},{"location":"install/#start-mintpy","title":"Start Mint.py
","text":"After installation MINT can be started by running Mint.py
.
Mint.py --help\nusage: Mint.py [-h] [--no-browser] [--version] [--data-dir DATA_DIR] [--debug] [--port PORT] [--serve-path SERVE_PATH]\n\nMINT frontend.\n\noptional arguments:\n -h, --help show this help message and exit\n --no-browser do not start the browser\n --version print current version\n --data-dir target directory for MINT data\n --debug start MINT server in debug mode\n --port change the port\n --serve-path serve app at a different path e.g. '/mint/' to serve the app at 'localhost:9999/mint/'\n
If the browser does not open automatically, open it manually and navigate to http://localhost:9999
. The app's frontend is built using Plotly-Dash and runs locally in a browser. However, the Python functions provided can be imported and used in any Python project independently. The GUI is under active development and may be optimized in the future.
"},{"location":"install/#docker","title":"Docker","text":"MINT is now available on DockerHub in containerized format. A container is a standard unit of software that packages up code and all its dependencies, so the application runs quickly and reliably from one computing environment to another. In contrast to a virtual machine (VM), a Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings. This allows to run MINT on any computer that can run Docker.
The following command can be used to pull the latest image from docker hub.
docker pull msmint/msmint:latest\n
The image can be started with:
docker run -p 9999:9999 -v /data/:/data/ -it msmint/msmint:latest\n
Then the tool is available in the browser at http://localhost:9999.
"},{"location":"install/#from-source","title":"From source","text":"Here we use conda
from the miniconda package to install dependencies in a virtual environment.
git clone https://github.com/soerendip/ms-mint\ncd ms-mint\n\nconda create -n ms-mint python=3.8\nconda activate ms-mint\npip install . # for regular install\npip install -e . # for development\n
"},{"location":"quickstart/","title":"Quickstart","text":""},{"location":"quickstart/#1-install-ms-mint-app-and-start-the-application","title":"1. Install ms-mint-app
and start the application.","text":"If you know how to use pip
run:
pip install ms-mint-app\n
or follow the instructions here.
Then start the application with
Mint.py\n
or, if you have a preferred directory for data, you can specify it with --data-dir
e.g.:
Mint.py --data-dir /data\n
The application will take a while until it starts up. In the meantime the browser window will show
This site can\u2019t be reached
Just wait a bit until the terminal shows INFO:waitress:Serving on http://127.0.0.1:9999
and refresh the page. The application is now served on port 9999
of your local machine.
If you have never started the application before, you will not have any workspaces yet.
"},{"location":"quickstart/#2-create-a-workspace","title":"2. Create a workspace","text":"In the Workspaces
tab click on the blue button with the label CREATE WORKSPACE
. A dialogue opens asking you for the name of the future workspace. Type DEMO
into the text field and click on CREATE
.
Now you have created your first workspace, but it is empty. We will need some input files to populate it. You can see which workspace is activated in the light-blue info box:
"},{"location":"quickstart/#3-download-the-demo-files","title":"3. Download the demo files","text":"Some demo files are available for download on the ms-mint
Google-Drive. Go ahead and download the files from Google Drive and extract the archive.
You will find two csv
files and 12 mzXML
and/or mzML
files.
.\n\u251c\u2500\u2500 README.md\n\u251c\u2500\u2500 metadata\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 metadata.csv\n\u251c\u2500\u2500 ms-files\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 CA_B1.mzXML\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 CA_B2.mzXML\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 CA_B3.mzXML\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 CA_B4.mzXML\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 EC_B1.mzXML\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 EC_B2.mzXML\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 EC_B3.mzXML\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 EC_B4.mzXML\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 SA_B1.mzML\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 SA_B2.mzML\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 SA_B3.mzML\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 SA_B4.mzML\n\u2514\u2500\u2500 targets\n \u2514\u2500\u2500 targets.csv\n\n4 directories, 15 files\n
- A folder with 12 mass-spectrometry (MS) files from microbial samples. We have four files each for Staphylococcus aureus (SA), Escherichia coli (EC), and Candida albicans (CA). Each file belongs to one of four batches (B1-B4).
metadata.csv
contains this information in tabular format. Setting up the metadata for your project is essential. targets.csv
contains the extraction lists. The identification of the metabolites has been done before, so we know where the metabolites appear in the MS data.
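The naming scheme of the demo files (species prefix, underscore, batch) can be turned into metadata programmatically. The following Python sketch is only an illustration: the function name and the batch key are hypothetical, and the real metadata.csv shipped with the demo may use different columns.

```python
from pathlib import Path

# Species prefixes used in the demo file names.
SPECIES = {
    'SA': 'Staphylococcus aureus',
    'EC': 'Escherichia coli',
    'CA': 'Candida albicans',
}


def describe(filename):
    # Split a name such as CA_B1.mzXML into species prefix and batch.
    stem = Path(filename).stem
    prefix, batch = stem.split('_')
    return {'ms_file_label': stem, 'label': SPECIES[prefix], 'batch': batch}


for name in ['CA_B1.mzXML', 'EC_B2.mzXML', 'SA_B4.mzML']:
    print(describe(name))
```

Deriving labels from file names like this only works when the names are assigned consistently, which is one reason to settle on a naming convention before the measurement.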
"},{"location":"quickstart/#4-upload-lcms-files","title":"4. Upload LCMS files","text":"Switch to MS-Files
tab and upload the 12 MS files, either by dragging them from the Explorer/Finder onto the upload field, or by clicking on the upload field and selecting the files in the dialogue box that opens. Wait until all files are uploaded:
To speed up future processing you can now convert these files to the .feather
format, by selecting all files and clicking on \"CONVERT SELECTED FILES TO FEATHER\".
"},{"location":"quickstart/#5-add-metadata","title":"5. Add metadata","text":"Switch to Metadata
and upload metadata.csv
. This will populate the table with important information.
This file contains critical information about your samples, and setting up the metadata accurately and meticulously is essential. The metadata table is a cornerstone for downstream processes in ms-mint
and should not be omitted. Properly configured metadata enhances the quality and precision of your results, making it a vital component of your workflow.
The metadata table has the following columns:
- ms_file_label : The label of the mass spectrometry file.
- color : Color coding for visual identification.
- use_for_optimization : Boolean value indicating if the file is used in peak optimization.
- in_analysis : Indicates if the sample is included in the analysis.
- label : Group of the sample (e.g., treatment group vs. control group).
- sample_type : Type of the sample; the default is Biological Sample, other labels could be Standard Sample or Quality Control.
- run_order : Order in which the samples were processed (1-N).
- plate : Batch ID, for example, the plate if samples come from multiple plates.
- plate_row : Row location of the sample on the plate (e.g., A-H for a 96-well plate).
- plate_column : Column location of the sample on the plate (e.g., 1-12 for a 96-well plate).
- ms_column : Mass spectrometry column information.
- ionization_mode : Mode of ionization used in the mass spectrometry."},{"location":"quickstart/#5-add-targets-metabolites","title":"5. Add targets (metabolites)","text":"Switch to Targets
and upload targets.csv
.
This is the data extraction protocol. This determines what data is extracted from the files. The same protocol is applied to all files. No fitting or peak optimization is done. MINT therefore requires a very stable chromatographic column and stable retention times for all files in a workspace.
"},{"location":"quickstart/#6-optimize-retention-times","title":"6. Optimize retention times","text":"Switch to Peak Optimization
tab and select Use all files
in the File selection
menu. Normally, especially for large datasets, you should select a small representative set of samples including standards (with known concentrations of the target metabolites). The peak optimization takes longer the more files are used for it and the more targets are defined. Click on UPDATE PEAK PREVIEWS
.
This will show you the shapes of the data in the selected regions as an overview. This is a great way to validate that your target parameters are correct. However, you have to make sure that the metabolite you are looking for is present in the files. That is why you should always add some standard samples. The colors in the plots correspond to the colors in the metadata table.
You can use the interactive tool below to optimize the retention time for each target manually. You can do that by zooming in towards the area that you want to select as peak and then click on SET RT TO CURRENT VIEW
. The green area is what is currently selected as retention time (RT) range. The black bar is the expected retention time of the peak maximum that you usually know from former experiments. This way you can compare the peak with older experiments. To set the expected RT to the middle of the current window press CONFIRM RETENTION TIME
. If the target is not present in any of the files, you can and should remove it from the target list by clicking on REMOVE TARGET
.
- If you are happy with the peak shapes, you can proceed to
Processing
.
"},{"location":"quickstart/#7-process-the-data","title":"7. Process the data","text":"Switch to Processing
and start the data extraction with Run MINT
. The extraction process is done when you get a green notification Finished running MINT
. Now you can download the results in long (tidy) format or as dense peak_max values. The long format contains all results, while the DENSE PEAK_MAX
only contains the peak_max
values as a matrix.
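The relation between the two downloads can be illustrated with a small Python sketch that pivots long-format rows into a dense matrix. The column name ms_file and the function name are assumptions for this example; the actual result files may contain more columns and use different names.

```python
def to_dense_peak_max(long_rows):
    # Pivot long-format rows into {ms_file: {peak_label: peak_max}}.
    dense = {}
    for row in long_rows:
        dense.setdefault(row['ms_file'], {})[row['peak_label']] = row['peak_max']
    return dense


# Made-up values, one row per combination of file and target.
long_rows = [
    {'ms_file': 'CA_B1', 'peak_label': 'Biomarker-A', 'peak_max': 1200.0},
    {'ms_file': 'CA_B1', 'peak_label': 'Biomarker-B', 'peak_max': 300.0},
    {'ms_file': 'EC_B1', 'peak_label': 'Biomarker-A', 'peak_max': 50.0},
    {'ms_file': 'EC_B1', 'peak_label': 'Biomarker-B', 'peak_max': 900.0},
]
print(to_dense_peak_max(long_rows))
```

The long format keeps one row per file and target, so further columns fit naturally, while the dense form holds a single value per cell and is convenient for heatmaps and clustering.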
"},{"location":"quickstart/#8-switch-to-analysis","title":"8. Switch to Analysis
.","text":"Once the results are generated, the Heatmap tab will show an interactive heatmap. You can change the size of the heatmap by resizing your browser window and clicking UPDATE to redraw the plot. The heatmap shows the peak_max values. The dropdown menu provides some options.
"},{"location":"quickstart/#9-switch-to-analysisplotting","title":"9. Switch to Analysis/Plotting
","text":"The plotting tool is very powerful, but requires some practice. It is a wrapper around the seaborn API. Let's create a few simple visualizations.
Click on Update
. A very simple bar-graph is shown, and we will gradually make it more complex. This simple bar graph shows the average peak_max
value across the whole dataset for all targets.
"},{"location":"quickstart/#a-select-peak_label-for-the-x-axis","title":"a) select peak_label
for the X
axis.","text":""},{"location":"quickstart/#b-set-aspect-ratio-to-5","title":"b) set aspect-ratio to 5.","text":""},{"location":"quickstart/#c-select-logarithmic-y-scale-in-the-dropdown-options","title":"c) select Logarithmic y-scale
in the dropdown options.","text":""},{"location":"quickstart/#d-click-on-update","title":"d) click on UPDATE
.","text":""},{"location":"quickstart/#e-set-figure-height-to-15-and-aspect-ratio-to-2","title":"e) set figure height to 1.5
and aspect ratio to 2
.","text":""},{"location":"quickstart/#e-set-column-to-label","title":"e) set Column
to Label
.","text":""},{"location":"quickstart/#f-set-row-to-batch","title":"f) set Row
to Batch
.","text":"This way you can look at the whole dataset at once, sliced by Batch
and Label
"},{"location":"quickstart/#exercise-try-to-create-the-following-plot","title":"Exercise: Try to create the following plot:","text":""},{"location":"targets/","title":"Target lists","text":"A target list contains the definitions of peaks to be extracted in terms of retention time and mz value. The important parameters for MINT are rt_min
and rt_max
. The rt
value is only an estimate and is used for comparison. You should know, from former identification runs, at what retention time to expect a certain metabolite. This is what rt
is for. For the final extraction process however rt_min
and rt_max
are used. Before you process the MS files, you should check that all targets have rt_min
and rt_max
properly set.
"},{"location":"targets/#target-list-format","title":"Target list format","text":"The target list is the determining protocol for the data processing step. You can reproduce all results using this list as input. A target list can be provided as csv
(comma separated values) or xlsx
(Microsoft Excel) file.
If the target list is provided as a multi-sheet xlsx file, the target list should be the first sheet.
The column headers in the target list should contain:
- peak_label : A unique identifier such as the biomarker name or ID. Even if multiple peaklist files are used, the labels have to be unique across all the files.
- mz_mean : The target mass (m/z-value) in [Da].
- mz_width : The width of the peak in the m/z-dimension in units of ppm. The window will be mz_mean +/- (mz_width * mz_mean * 1e-6). Usually, values between 5 and 10 are used.
- rt : Estimated retention time in [min] (optional, see above).
- rt_min : The start of the retention time for each peak in [min].
- rt_max : The end of the retention time for each peak in [min].
- intensity_threshold : A threshold that is applied to filter noise for each window individually. Can be set to 0 or any positive value.
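The m/z window formula given for mz_width can be checked with a short Python sketch. The function name is hypothetical; only the formula itself is taken from the description above.

```python
def mz_window(mz_mean, mz_width):
    # mz_width is given in ppm, so the half-width in Da scales with mz_mean.
    delta = mz_width * mz_mean * 1e-6
    return mz_mean - delta, mz_mean + delta


low, high = mz_window(mz_mean=151.0605, mz_width=10)
print(low, high)
```

For a target at m/z 151.0605 and a width of 10 ppm the half-width is about 0.0015 Da, so the window spans roughly 151.0590 to 151.0620.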
"},{"location":"targets/#example-file","title":"Example file","text":"target.csv:
peak_label,mz_mean,mz_width,rt_min,rt_max,intensity_threshold\nBiomarker-A,151.0605,10,4.65,5.2,0\nBiomarker-B,151.02585,10,4.18,4.53,0\n
A template can be created using the GUI.
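Before uploading a target list it can be useful to check it for common mistakes. The following Python sketch, using only the standard library, tests a CSV text for missing columns, duplicate labels and inverted retention time windows. The function name and the choice of checks are assumptions for illustration; MINT may validate target lists differently.

```python
import csv
import io

# Columns treated as mandatory in this sketch; rt is optional in the format description.
REQUIRED = ['peak_label', 'mz_mean', 'mz_width', 'rt_min', 'rt_max', 'intensity_threshold']

EXAMPLE = '''peak_label,mz_mean,mz_width,rt_min,rt_max,intensity_threshold
Biomarker-A,151.0605,10,4.65,5.2,0
Biomarker-B,151.02585,10,4.18,4.53,0
'''


def check_targets(csv_text):
    # Return a list of problems; an empty list means none of the checks failed.
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return ['missing columns: ' + ', '.join(missing)]
    problems, seen = [], set()
    for row in reader:
        label = row['peak_label']
        if label in seen:
            problems.append(f'{label}: duplicate peak_label')
        seen.add(label)
        if float(row['rt_min']) >= float(row['rt_max']):
            problems.append(f'{label}: rt_min must be smaller than rt_max')
        if float(row['intensity_threshold']) < 0:
            problems.append(f'{label}: intensity_threshold must not be negative')
    return problems


print(check_targets(EXAMPLE))
```

For the example file the returned list is empty. Note that uniqueness is only checked within one file here, whereas labels have to be unique across all target lists that are used together.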
"}]}
\ No newline at end of file
diff --git a/sitemap.xml b/sitemap.xml
index 59f187f..3f087df 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -5,11 +5,6 @@
2024-06-11
daily
-
- https://github.com/lewisresearchgroup/ms-mint-app/background/
- 2024-06-11
- daily
-
https://github.com/lewisresearchgroup/ms-mint-app/developer-notes/
2024-06-11
diff --git a/sitemap.xml.gz b/sitemap.xml.gz
index 6aed7ab..3a4a824 100644
Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ
diff --git a/targets/index.html b/targets/index.html
index 26b4503..3306ef3 100644
--- a/targets/index.html
+++ b/targets/index.html
@@ -244,26 +244,6 @@
-
-
-
-
-
- Background
-
-
-
-
-
-
-
-
-
-
-
-
-
-