Metabolomics Software for MS
In recent years, bioinformatics software has most commonly targeted the molecular biology research fields collectively referred to as “omics,” including proteomics (for proteins), genomics (for DNA) and metabolomics (for metabolites). Of these, metabolomics is one of the newer frontiers. It is less established than the other omics, but it has a large impact on a wide variety of scientific studies, including drug discovery, food safety, toxicology and biomarker discovery. At its core, metabolomics examines the quantitative and qualitative changes of small molecules within a biological specimen, such as a cell or tissue. One of the primary techniques for such studies is MS, and the bioinformatics tools that support it must have sufficient processing power and fast data access to handle very large data files.
Every major MS instrument supplier has its own acquisition software for each platform, generally included with the initial purchase. However, most analytical labs use multiple brands and instruments, and no single MS software package handles every type of analysis. For this reason, analytical labs often install specialized software, either from their primary MS provider or from a third party. These bioinformatics solutions typically run on the back end, after the data has been acquired and is available in an open format, so that data from every instrument can be processed the same way. Metabolomics software is useful for both GC/MS and LC/MS, but it is far more prevalent and useful in the latter technique. The solutions vary in which workflows they offer, but their functionality falls into three areas: preprocessing, annotation and statistical analysis.
Before MS data from metabolomics can be analyzed, preprocessing software adjusts the data using several methods. One of the first tasks is peak picking, in which the relevant information about each MS peak (such as its m/z, retention time and intensity) is extracted from the raw data. This can be done automatically by setting the proper search parameters, or manually. A deconvolution tool is usually employed next to convert multiply charged species into their singly charged form so that they can be grouped according to their m/z value and peak width; this is needed to deal with overlapping peaks that belong to the same metabolite. Metabolomics software may also include smoothing or noise-reduction tools to remove signal distortion, and baseline correction to compensate for baseline drift, which raises or tilts the signal away from its true zero level. Peak matching and peak alignment are other crucial steps in preprocessing. Because m/z values and retention times drift slightly from run to run, peaks never align perfectly across samples, so the software helps the analyst determine which peaks belong to the same molecule by comparing their m/z ratios and retention times.
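The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the algorithm used by any particular package: smoothing is a centered moving average, baseline correction subtracts a rolling minimum, peak picking keeps local maxima above a height threshold, charge conversion assumes protonated [M+zH]z+ ions, and cross-sample matching is a greedy pairing within illustrative m/z (ppm) and retention-time tolerances. Tools such as XCMS use considerably more sophisticated methods. All function names, default parameters and the toy data are hypothetical.

```python
PROTON = 1.007276  # mass of a proton, Da


def moving_average(y, window=3):
    """Smooth a signal with a centered moving average (shorter window at the edges)."""
    half = window // 2
    out = []
    for i in range(len(y)):
        seg = y[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def subtract_baseline(y, window=9):
    """Crude baseline correction: subtract a rolling minimum from each point."""
    half = window // 2
    return [v - min(y[max(0, i - half): i + half + 1]) for i, v in enumerate(y)]


def pick_peaks(y, min_height):
    """Return indices of local maxima whose intensity is at least min_height."""
    return [i for i in range(1, len(y) - 1)
            if y[i] >= min_height and y[i] > y[i - 1] and y[i] >= y[i + 1]]


def neutral_mass(mz, charge):
    """Convert the m/z of an assumed [M+zH]z+ ion to its neutral monoisotopic mass."""
    return charge * (mz - PROTON)


def match_features(run_a, run_b, ppm_tol=10.0, rt_tol=0.2):
    """Greedily pair (mz, rt) features from two runs that agree in m/z and RT.

    For each feature in run_a, the unused run_b feature within both tolerances
    and with the smallest RT difference is chosen. Returns (index_a, index_b) pairs.
    """
    pairs, used = [], set()
    for i, (mz1, rt1) in enumerate(run_a):
        best = None
        for j, (mz2, rt2) in enumerate(run_b):
            if j in used:
                continue
            ppm = abs(mz1 - mz2) / mz1 * 1e6
            if ppm <= ppm_tol and abs(rt1 - rt2) <= rt_tol:
                if best is None or abs(rt1 - rt2) < abs(rt1 - run_b[best][1]):
                    best = j
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs


if __name__ == "__main__":
    signal = [1, 1, 2, 8, 20, 9, 2, 1, 1, 3, 12, 4, 1, 1]  # toy chromatogram
    corrected = subtract_baseline(moving_average(signal, 3), window=9)
    print(pick_peaks(corrected, min_height=3.0))  # [4, 10]
    print(round(neutral_mass(91.038971, 2), 5))    # 180.06339
    run_a = [(180.0634, 5.10), (194.0804, 7.30)]   # (m/z, RT in min)
    run_b = [(194.0806, 7.38), (180.0639, 5.02)]
    print(match_features(run_a, run_b))            # [(0, 1), (1, 0)]
```

The rolling-minimum window has to be wider than the chromatographic peaks; if it is close to the peak width, the "baseline" rises into the peak and cuts off its apex.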
Within annotation software, MS peaks representing adducts, isotopes and fragments are identified by library search or by manually searching metabolite databases. Some software packages even allow searching multiple databases at the same time. Particularly in LC/MS, annotation is rarely as straightforward as definitively assigning each peak to a known compound; the process is quite time-consuming and far from automated. False positives are frequent, and analysts often resort to running their data through multiple software packages. The Metabolomics Standards Initiative (MSI) published a series of reporting standards for metabolite identification results in 2007. The MSI criteria define four levels of metabolite identification: level 1 for identified metabolites, levels 2 and 3 for putatively annotated compounds and putatively characterized compound classes, respectively, and level 4 for unknown compounds. The standard was created to bring clarity and consistency to the metabolomics literature. Despite these good intentions, many studies do not strictly adhere to the guidelines, and other variations on the standard have since been proposed.
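A simplified version of the annotation step can be written as an accurate-mass search that accounts for adducts and isotopes, followed by a mapping onto the MSI levels. The sketch below is illustrative only: the two-compound library stands in for a real database such as ChemSpider, only two positive-mode adducts are considered, isotope flagging checks the 13C mass spacing but not intensity ratios or co-elution, and the evidence flags passed to `msi_level` are a hypothetical simplification of the MSI criteria.

```python
PROTON = 1.007276      # [M+H]+ adds one proton, Da
SODIUM_ION = 22.989221 # [M+Na]+ adds one Na+ ion, Da
C13_SHIFT = 1.003355   # mass difference between 13C and 12C, Da

# Singly charged positive-mode adducts: name -> mass added to the neutral molecule.
ADDUCTS = {"[M+H]+": PROTON, "[M+Na]+": SODIUM_ION}

# Toy library of neutral monoisotopic masses (a stand-in for a real database).
LIBRARY = {"glucose": 180.063388, "caffeine": 194.080376}


def annotate(mz, ppm_tol=5.0):
    """Return (compound, adduct, ppm_error) candidates whose predicted m/z matches."""
    hits = []
    for name, mass in LIBRARY.items():
        for adduct, delta in ADDUCTS.items():
            predicted = mass + delta
            ppm = (mz - predicted) / predicted * 1e6
            if abs(ppm) <= ppm_tol:
                hits.append((name, adduct, round(ppm, 2)))
    return hits


def flag_isotopes(mzs, tol=0.002):
    """Return indices of peaks lying one 13C spacing above another peak (M+1 candidates)."""
    return [j for j, b in enumerate(mzs)
            if any(abs(b - a - C13_SHIFT) <= tol for a in mzs)]


def msi_level(matched_standard, library_match, class_only):
    """Map (simplified) evidence onto the four MSI identification levels."""
    if matched_standard:
        return 1  # identified metabolite (confirmed against an authentic standard)
    if library_match:
        return 2  # putatively annotated compound
    if class_only:
        return 3  # putatively characterized compound class
    return 4      # unknown compound


if __name__ == "__main__":
    peaks = [181.070664, 182.074019, 203.052609]
    for mz in peaks:
        print(mz, annotate(mz))
    print(flag_isotopes(peaks))  # [1]: 182.074 is the M+1 isotope of 181.071
    print(msi_level(False, True, False))  # 2
```

In this toy run, 181.0707 and 203.0526 are both annotated as glucose (as [M+H]+ and [M+Na]+), while 182.0740 is flagged as an isotope rather than searched as a separate compound, which illustrates why adduct and isotope grouping reduces false positives.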
Once the annotation process is complete, postprocessing software further refines the data using several methods. These include applying a signal-to-noise threshold, setting the minimum percentage of samples in which a feature must be present to be retained, filling in missing values by imputation, and deciding, typically through normalization and scaling, which aspects of the data to emphasize when assessing the relevant information. Once these steps are complete, the MS data exist as a matrix of signal intensities to which various statistical methods can be applied.
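The postprocessing steps can be chained as in the following minimal sketch, which turns a small feature table into a scaled intensity matrix. The specific choices are assumptions for illustration, not those of any particular tool: a single global noise level for the signal-to-noise cut, an 80% presence threshold, half-minimum imputation (one common heuristic) and Pareto scaling (mean-centering, then dividing by the square root of the standard deviation). The feature names and intensities are made up.

```python
from statistics import mean, stdev


def postprocess(table, noise, sn_min=3.0, min_fraction=0.8):
    """Filter, impute and Pareto-scale a feature table.

    table: feature name -> list of intensities per sample (None = not detected).
    noise: assumed global noise level used for the signal-to-noise threshold.
    Returns feature name -> list of scaled values (rows of the final matrix).
    """
    result = {}
    for feature, values in table.items():
        # 1. Signal-to-noise threshold: treat weak signals as missing.
        vals = [v if v is not None and v / noise >= sn_min else None for v in values]
        # 2. Keep the feature only if it is present in enough samples.
        present = [v for v in vals if v is not None]
        if len(present) / len(vals) < min_fraction:
            continue
        # 3. Impute missing values with half the feature's minimum observed intensity.
        fill = min(present) / 2
        vals = [fill if v is None else v for v in vals]
        # 4. Pareto scaling; a constant feature is only mean-centered.
        mu, sd = mean(vals), stdev(vals)
        scale = sd ** 0.5 if sd > 0 else 1.0
        result[feature] = [(v - mu) / scale for v in vals]
    return result


if __name__ == "__main__":
    table = {
        "F1": [1000, 1200, None, 1100, 900],   # detected in 4 of 5 samples: kept
        "F2": [500, None, None, 250, 800],     # 250 fails S/N, leaving 2 of 5: dropped
        "F3": [2000, 2000, 2000, 2000, 2000],  # constant feature
    }
    matrix = postprocess(table, noise=100)
    for feature, row in matrix.items():
        print(feature, [round(x, 2) for x in row])
```

Pareto scaling is a middle ground: unlike no scaling it stops the most abundant metabolites from dominating, but unlike unit-variance scaling it does not inflate low-intensity features to the same weight as strong ones.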
Metabolomics software offerings for MS vary considerably and come in the form of web apps, R packages, Windows software and more. Some cover only one particular function, while others offer workflow solutions that handle nearly every task from preprocessing to statistical analysis. There are roughly two hundred products to choose from, some free and some at a significant price point. It is not unusual for MS vendors to collaborate with third-party software developers.
XCMS, developed by the Scripps Research Institute, is perhaps the most common metabolomics solution for untargeted MS studies, boasting rapid acquisition and a robust set of algorithms to align features, identify peaks, perform statistical analysis and visualize complex results. It is currently owned by Mass Consortium but sold exclusively through SCIEX. Thermo Fisher Scientific’s signature offering for metabolomics-based MS is Compound Discoverer, an integrated set of libraries, databases and statistical analysis tools ideal for Orbitrap mass spectrometers analyzing small molecules. The latest version of this software was released in June 2018. It helps identify compounds using multiple databases, including mzCloud, ChemSpider and more. Waters offers Progenesis QI software for small molecules and lipids, the latest version of which dates from 2017. Other popular options are free of charge, including MetaboAnalyst, a web-based tool for statistical analysis, and OpenMS, a library for LC/MS data that offers metabolite quantification and identification.
The entire market for MS software was worth over $300 million in 2018, and an estimated one-fifth of end users engage in some form of metabolomics study. The software used for such work is a mixture of free, open-source analytical tools and closed-source commercial software. Either way, growth for specialized metabolomics software for MS is expected to be generally robust over the next few years, driven by an increasing number of biology and drug correlation studies sponsored by pharmaceutical companies. In the meantime, there is plenty of demand for solutions that alleviate the bottlenecks of the annotation phase.
Metabolomics Software for MS at a Glance:
- SCIEX (Danaher)
- Thermo Fisher Scientific