R. palustris protein-protein interaction network


Mass Spectrometry Data Analysis and Bioinformatics Tools

The Automated Mass Spectrometry Data Analysis Pipeline automatically performs protein searches on the Genomic Science Center's Mass Spectrometry data, using an integrated suite of Mass Spectrometry analysis tools. The Pipeline is closely integrated with the Center's LIMS: it queries the LIMS for new sample sets that may be ready for analysis, transfers the raw data to a central storage system, performs the protein searches, and updates the LIMS when the analysis is complete. The LIMS then imports the results of the protein searches into its database, where they are available for subsequent data mining operations.
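The four steps above (query the LIMS, stage the raw data, search, report back) can be sketched as a single polling pass. This is only an illustrative sketch: the text does not describe the LIMS interface, so `InMemoryLims`, `SampleSet`, the status values, and `run_protein_search` are all hypothetical stand-ins, and the search step is a placeholder that writes a dummy result file.

```python
from dataclasses import dataclass, field
from pathlib import Path
import shutil


@dataclass
class SampleSet:
    sample_id: str
    raw_path: Path
    status: str = "ready"  # hypothetical states: "ready" -> "searched"


@dataclass
class InMemoryLims:
    """Hypothetical stand-in for the LIMS; the real interface is not described."""
    samples: dict = field(default_factory=dict)
    results: dict = field(default_factory=dict)

    def new_sample_sets(self):
        # Step 1: sample sets that are ready for analysis.
        return [s for s in self.samples.values() if s.status == "ready"]

    def mark_complete(self, sample_id, result_path):
        # Step 4: record completion so the LIMS can import the results.
        self.samples[sample_id].status = "searched"
        self.results[sample_id] = result_path


def run_protein_search(raw_file: Path) -> Path:
    """Placeholder for the search step; a real pipeline would call a search tool here."""
    out = raw_file.parent / (raw_file.stem + ".results.txt")
    out.write_text(f"search results for {raw_file.name}\n")
    return out


def run_pipeline_once(lims: InMemoryLims, storage_dir: Path) -> list:
    """One pass: query the LIMS, stage raw data, search, update the LIMS."""
    processed = []
    for sample in lims.new_sample_sets():
        staged = storage_dir / sample.raw_path.name
        shutil.copy2(sample.raw_path, staged)         # Step 2: central storage
        result = run_protein_search(staged)           # Step 3: protein search
        lims.mark_complete(sample.sample_id, result)  # Step 4: update the LIMS
        processed.append(sample.sample_id)
    return processed
```

Because completed sample sets change status, a second pass over the same LIMS state finds nothing new to process, which is the property a scheduled poller would rely on.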

The Analysis Pipeline currently uses SEQUEST as its primary protein search tool. A new search tool, DBDigger, was developed in-house; it allows both faster searches and more efficient searches for post-translationally modified peptides. Parallel implementation of these two search algorithms should allow for much greater sensitivity and specificity in our protein identifications, and should enable significantly higher-throughput, distributed processing of Mass Spectrometry data.

A RAW file extractor has been developed in-house using the Xcalibur Development Kit (XDK). It extracts spectra from Mass Spectrometry RAW files and stores them in a single MS2 file (a format proposed by the Yates Lab), instead of an individual .dta file for each spectrum.
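To illustrate what "one MS2 file instead of many .dta files" means in practice, here is a minimal sketch of writing several spectra into a single MS2-style stream. It assumes the commonly described MS2 layout (tab-separated `H` header lines, an `S` line per scan with first scan, last scan, and precursor m/z, a `Z` line with charge and [M+H]+ mass, then one `m/z intensity` pair per line). The `Spectrum` class and `write_ms2` function are hypothetical names, and the RAW extraction itself, which depends on the vendor's XDK, is not shown.

```python
from dataclasses import dataclass
from typing import List, TextIO, Tuple

PROTON_MASS = 1.007276  # Da


@dataclass
class Spectrum:
    scan: int
    precursor_mz: float
    charge: int
    peaks: List[Tuple[float, float]]  # (m/z, intensity) pairs


def write_ms2(spectra: List[Spectrum], handle: TextIO) -> None:
    """Write all spectra to one MS2-style stream (assumed field layout)."""
    # Header line; the field name and value here are illustrative only.
    handle.write("H\tExtractor\texample-sketch\n")
    for s in spectra:
        # S line: first scan, last scan, precursor m/z.
        handle.write(f"S\t{s.scan}\t{s.scan}\t{s.precursor_mz:.4f}\n")
        # Z line: charge and singly protonated mass, [M+H]+ = z*mz - (z-1)*proton.
        mh = s.precursor_mz * s.charge - (s.charge - 1) * PROTON_MASS
        handle.write(f"Z\t{s.charge}\t{mh:.4f}\n")
        # Peak list: one "m/z intensity" pair per line.
        for mz, intensity in s.peaks:
            handle.write(f"{mz:.4f} {intensity:.1f}\n")
```

Keeping every spectrum in one file avoids creating thousands of small files per run, which is the practical advantage over per-spectrum .dta output.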

The Center has also adopted the mzXML format and the associated tools developed by the Institute for Systems Biology. mzXML is an open, extensible reference file format for representing mass spectral data.
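Because mzXML is plain XML, it can be read with standard tooling. The sketch below uses only the Python standard library and assumes the usual mzXML conventions: `scan` elements with `num` and `msLevel` attributes, and a `peaks` element holding base64-encoded, network-byte-order floats as interleaved m/z and intensity values. It handles uncompressed peak data only, and `encode_peaks`, `decode_peaks`, and `read_scans` are hypothetical helper names, not part of the Institute for Systems Biology tools.

```python
import base64
import struct
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix so the code works across schema versions."""
    return tag.rsplit("}", 1)[-1]


def encode_peaks(pairs):
    """Helper for building small test documents (32-bit, network byte order)."""
    flat = [value for pair in pairs for value in pair]
    packed = struct.pack(f">{len(flat)}f", *flat)
    return base64.b64encode(packed).decode("ascii")


def decode_peaks(text: str, precision: int = 32):
    """Decode base64 peak data into a list of (m/z, intensity) tuples."""
    raw = base64.b64decode(text)
    fmt_char = "f" if precision == 32 else "d"
    count = len(raw) // (precision // 8)
    values = struct.unpack(f">{count}{fmt_char}", raw)
    return list(zip(values[0::2], values[1::2]))


def read_scans(xml_text: str):
    """Return one dict per <scan> element with its number, MS level, and peaks."""
    root = ET.fromstring(xml_text)
    scans = []
    for elem in root.iter():
        if local_name(elem.tag) != "scan":
            continue
        peaks_elem = next((c for c in elem if local_name(c.tag) == "peaks"), None)
        peaks = []
        if peaks_elem is not None and peaks_elem.text:
            precision = int(peaks_elem.get("precision", "32"))
            peaks = decode_peaks(peaks_elem.text.strip(), precision)
        scans.append({
            "num": int(elem.get("num")),
            "ms_level": int(elem.get("msLevel")),
            "peaks": peaks,
        })
    return scans
```

Matching on local element names rather than a fixed namespace is a deliberate choice here, since different mzXML schema revisions use different namespace URIs.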
