Mass Spectrometry Data Analysis and Bioinformatics Tools
The Automated Mass Spectrometry Data Analysis Pipeline automatically and efficiently runs protein searches on the Genomic Science Center's Mass Spectrometry data, using an integrated suite of Mass Spectrometry analysis tools. The Pipeline is closely integrated with the Center's laboratory information management system (LIMS): it queries the LIMS for new sample sets that may be ready for analysis, transfers the raw data to a central storage system, performs the protein searches, and updates the LIMS when the analysis is complete. The LIMS then imports the protein search results into its database, which is used for subsequent data mining operations.
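The workflow described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Center's implementation: the `SampleSet` and `InMemoryLims` classes, the status values, and the `transfer` and `search` callables are all hypothetical stand-ins, since the text does not describe the actual LIMS interface or search invocation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SampleSet:
    """Hypothetical record for one LIMS sample set."""
    sample_id: str
    raw_files: List[str]
    status: str = "ready"  # assumed states: "ready" -> "analyzed"


@dataclass
class InMemoryLims:
    """Stand-in for the LIMS; the real interface is not described in the text."""
    sample_sets: Dict[str, SampleSet] = field(default_factory=dict)

    def ready_sample_sets(self) -> List[SampleSet]:
        # Step 1: find sample sets that may be ready for analysis.
        return [s for s in self.sample_sets.values() if s.status == "ready"]

    def mark_analyzed(self, sample_id: str) -> None:
        # Step 4: record that the analysis has completed.
        self.sample_sets[sample_id].status = "analyzed"


def run_pipeline_once(
    lims: InMemoryLims,
    transfer: Callable[[str], str],
    search: Callable[[str], None],
) -> List[str]:
    """One polling pass: query the LIMS, transfer, search, update the LIMS."""
    processed = []
    for sample in lims.ready_sample_sets():
        # Step 2: copy each raw file to central storage (assumed callable).
        stored_paths = [transfer(path) for path in sample.raw_files]
        # Step 3: run the protein search on each stored file (assumed callable).
        for stored in stored_paths:
            search(stored)
        lims.mark_analyzed(sample.sample_id)
        processed.append(sample.sample_id)
    return processed
```

In a sketch like this, a scheduler would call `run_pipeline_once` periodically. Passing `transfer` and `search` in as callables keeps the polling logic independent of the storage system and of whichever search tool is in use.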
The Analysis Pipeline currently uses SEQUEST as its primary protein search tool. A new search tool, DBDigger, was developed in-house; it not only runs searches faster but also searches more efficiently for post-translationally modified peptides. Parallel implementation of these two search algorithms should allow for much greater sensitivity and specificity in our protein identifications. This will enable significantly higher-throughput, distributed processing of Mass Spectrometry data.
A RAW file extractor has been developed in-house using the Xcalibur Development Kit (XDK). It extracts spectra from Mass Spectrometry RAW files and stores them in a single MS2 file (a format proposed by the Yates Lab), instead of an individual .dta file for each spectrum.
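To illustrate the single-file layout, here is a minimal Python sketch that serializes several spectra into one MS2-style text. It assumes the commonly described MS2 layout: `H` header lines, an `S` line per spectrum (first scan, last scan, precursor m/z), a `Z` line (charge and singly protonated mass), then one m/z and intensity pair per line. The function names, the number formatting, and the header value are illustrative choices, not the in-house extractor's behavior, and reading RAW files through the XDK is not shown.

```python
from typing import Iterable, List, Tuple

PROTON_MASS = 1.00727646688  # mass of a proton in daltons

Peak = Tuple[float, float]                       # (m/z, intensity)
Spectrum = Tuple[int, float, int, List[Peak]]    # scan, precursor m/z, charge, peaks


def format_ms2_spectrum(scan: int, precursor_mz: float, charge: int,
                        peaks: Iterable[Peak]) -> str:
    """Render one spectrum as an MS2-style S/Z block followed by its peaks."""
    # Convert the observed precursor m/z to the singly protonated mass (M+H)+.
    mh = precursor_mz * charge - (charge - 1) * PROTON_MASS
    lines = [
        f"S\t{scan:06d}\t{scan:06d}\t{precursor_mz:.4f}",
        f"Z\t{charge}\t{mh:.4f}",
    ]
    lines.extend(f"{mz:.4f} {intensity:.1f}" for mz, intensity in peaks)
    return "\n".join(lines) + "\n"


def build_ms2(spectra: Iterable[Spectrum],
              extractor_name: str = "example-extractor") -> str:
    """Concatenate all spectra into one MS2-style text with a header line."""
    # The header value is a placeholder chosen for this sketch.
    parts = [f"H\tExtractor\t{extractor_name}\n"]
    parts.extend(format_ms2_spectrum(*spectrum) for spectrum in spectra)
    return "".join(parts)
```

Writing the returned string to a single file takes the place of one .dta file per spectrum, which is the consolidation the extractor is described as providing.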
The mzXML format and its associated tools, developed by the Institute for Systems Biology, have been adopted. mzXML is an open, extensible reference file format standard for representing mass spectral data.
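As a small illustration of working with mzXML data, the sketch below decodes the text content of an mzXML `peaks` element in Python. It assumes the conventional encoding: base64 text holding network-order (big-endian) floating-point values in alternating m/z and intensity pairs, at 32-bit or 64-bit precision. It does not handle the zlib compression that later schema versions allow, and the function name is illustrative rather than part of the Institute for Systems Biology tools.

```python
import base64
import struct
from typing import List, Tuple


def decode_mzxml_peaks(encoded: str, precision: int = 32) -> List[Tuple[float, float]]:
    """Decode base64 peak data into a list of (m/z, intensity) pairs."""
    if precision not in (32, 64):
        raise ValueError("precision must be 32 or 64")
    raw = base64.b64decode(encoded)
    width = precision // 8
    code = "f" if precision == 32 else "d"
    count = len(raw) // width
    # ">" selects big-endian (network) byte order.
    values = struct.unpack(f">{count}{code}", raw[: count * width])
    # Values alternate m/z, intensity, m/z, intensity, ...
    return list(zip(values[0::2], values[1::2]))
```

For example, encoding the values 100.0, 5.0, 200.5, 10.0 as big-endian 32-bit floats and passing the base64 text to `decode_mzxml_peaks` gives back the two pairs (100.0, 5.0) and (200.5, 10.0).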
Related Publications
2008
- Article: J.M. Gilmore, D.L. Auberry, J.L. Sharp, A.M. White, K.K. Anderson, D.S. Daly, "A Bayesian Estimator of Protein-Protein Association Probabilities," Bioinformatics, 24 (13): 1554–1555, 2008. [PDF]
- Article: W. R. Cannon, B.-J. Webb-Robertson, A. R. Willse, M. Singhal, L. A. McCue, J. E. McDermott, R. C. Taylor, K. M. Waters, C. S. Oehmen, "An Integrative Computational Framework for Hypotheses-Driven Systems Biology Research in Proteomics and Genomics," in Computational and Systems Biology: Applications and Methods, Computational Systems Biology, 2008 (in press).
- Article: R.C. Taylor, M. Singhal, D.S. Daly, K.O. Domico, A.M. White, D.L. Auberry, K.J. Auberry, B.S. Hooker, G.B. Hurst, J.E. McDermott, W.H. McDonald, D.A. Pelletier, D.D. Schmoyer, W.R. Cannon, "SEBINI-CABIN: an analysis pipeline for biological network inference, with a case study in protein-protein interaction network reconstruction," Int. J. Data Mining Bioinform. 2008 (accepted).
- Abstract: Kevin K. Anderson, et al., Advanced Data Analysis Pipeline for Determination of Protein Complexes and Interaction Networks at the Genomics:GTL Center for Molecular and Cellular Systems, 2008 Genomics:GTL Awardee Workshop VI, Bethesda, Maryland.
- Abstract: William R. Cannon, et al., Analysis of the Dynamical Modular Structure of Rhodopseudomonas palustris Based on Global Analysis of Protein-Protein Interactions, 2008 Genomics:GTL Awardee Workshop VI, Bethesda, Maryland.
- Article: J.L. Sharp, J.J. Borkowski, D.D. Schmoyer, D.S. Daly, S. Purvine, W.R. Cannon, G.B. Hurst, "Statistically Appraising Process Quality of Affinity Isolation Experiments," Comp. Stat. Data Analysis 2008. [PDF]
2007
- Abstract: Denise D. Schmoyer, et al., The Microbial Protein-Protein Interaction Database (MiPPI), 2007 Genomics:GTL Awardee Workshop V, Bethesda, Maryland
2006
- Abstract: William R. Cannon, et al., Computational Approaches for Aggregating and Scoring Protein-Protein Interaction Data, 2006 Contractor-Grantee Workshop, North Bethesda, MD [PDF]
- Abstract: G.B. Hurst, et al., The Microbial Interactome Database: An Online System for Identifying Interactions Between Proteins of Microbial Species, 2006 Contractor-Grantee Workshop, North Bethesda, MD [PDF]
- Article: Chongle Pan, et al., "ProRata: A Quantitative Proteomics Program for Accurate Protein Abundance Ratio Estimation with Confidence Interval Evaluation," Analytical Chemistry, 78, 7121-31, 2006. [PDF]
- Article: Chongle Pan, et al., "Robust Estimation of Peptide Abundance Ratios and Rigorous Scoring of Their Variability and Bias in Quantitative Shotgun Proteomics," Analytical Chemistry, 78, 7110-20, 2006. [PDF]
2005
- Presentation: D.L. Auberry, et al., "Using Nautilus to Capture Metadata for Identification of Protein Interactions," Thermo Informatics World, Bonita Springs, FL, September 20, 2005. PNNL-SA-46394.
- Abstract: Frank W. Larimer, et al., Center for Molecular and Cellular Systems: Statistical Screens for Datasets from High-Throughput Protein Pull-Down Assays, 2005 Contractor-Grantee Workshop, Washington, DC
- Article: David L. Tabb, C. Narasimhan, M. B. Strader, and R. L. Hettich, "DBDigger: Reorganized Proteomic Database Identification That Improves Flexibility and Speed," Analytical Chemistry 77, 2464-74, 2005 [PDF]
2004
- Abstract: Gordon Anderson, et al., Advanced Computational Methodologies for Protein Mass Spectral Data Analysis, 2004 Genomics:GTL Contractor-Grantee Workshop, Washington, DC
- Abstract: F. W. Larimer, et al., Bioinformatics and Computing in the Genomics:GTL Center for Molecular and Cellular Systems - LIMS and Mass Spectrometric Analysis of Proteome Data, 2004 Genomics:GTL Contractor-Grantee Workshop, Washington, DC
2003
- Article: Frank W. Larimer, et al., "Complete genome sequence of the metabolically versatile photosynthetic bacterium Rhodopseudomonas palustris," Nature Biotechnology, December 14, 2003. See also Website Rhodopseudomonas palustris complete genome sequence and annotation [PDF]
- Poster and Abstract: Deborah Payne, et al., "Bioinformatics and Computing in the Genomes to Life Center for Molecular and Cellular Systems," 2003 Genomes to Life Contractor-Grantee Workshop, Arlington, Virginia [Poster PDF shorter download or higher quality]




