Genomics:GTL Awardee Workshop VI
Bethesda, Maryland,
February 10–13, 2008
Project Goal: The Center for Molecular and Cellular Systems (CMCS) has established a resource for high-throughput determination of protein-protein interactions (PPI) for the Genomics:GTL community.
The Center for Molecular and Cellular Systems: Biological Insights from Large Scale Protein-Protein Interaction Studies
Michelle V. Buchanan1* (buchananmv@ornl.gov), Dale A. Pelletier,1 Gregory B. Hurst,1 W. Hayes McDonald,1 Denise D. Schmoyer,1 Jennifer L. Morrell-Falvey,1 Mitchel J. Doktycz,1 Brian S. Hooker,2 William R. Cannon,2 H. Steven Wiley,2 Nagiza F. Samatova,3 Tatiana Karpinets,1 Mudita Singhal,2 Chiann-Tso Lin,2 Ronald C. Taylor,2 Don S. Daly,2 Kevin K. Anderson,2 and Jason E. McDermott2
1Oak Ridge National Laboratory, Oak Ridge, Tennessee; 2Pacific Northwest National Laboratory, Richland, Washington; and 3North Carolina State University, Raleigh, North Carolina
The Center for Molecular and Cellular Systems (CMCS) has established a resource for high-throughput determination of protein-protein interactions (PPI) for the Genomics:GTL community. As part of the CMCS, an analysis “pipeline” has been established for identifying PPI among soluble proteins in Rhodopseudomonas palustris. The general strategy is to express an affinity-tagged protein in a bacterial culture, lyse the cells, isolate the affinity-tagged protein along with interacting proteins, and identify the affinity-isolated proteins via mass spectrometry and informatics analysis. The pipeline was designed to be applicable to a wide array of Gram-negative bacterial species, and thus is sufficiently general to enable studies of any number of organisms that are of importance for DOE energy and environment missions. The cloning component of the pipeline is based on a flexible system (Gateway) that further expands the generality of the approach by allowing facile introduction of a wide variety of affinity (or other) tags.
The CMCS has made considerable progress toward using as affinity-tagged “baits” some 1200 R. palustris proteins that meet the following criteria: (1) the protein is predicted to be soluble, and (2) the protein has been previously detected by mass spectrometry in proteomics studies. The results of our PPI survey in R. palustris are available via the Microbial Protein-Protein Interaction Database (MiPPI.ornl.gov). Statistical tools allow evaluation of the PPI based on characteristics of the data, and bioinformatics tools provide insights based on comparison of CMCS results to those from other techniques (e.g., gene expression measurements, PPI predictions) as well as PPI data from other organisms.
The results from the CMCS PPI pipeline are proving to be useful as a source of hypotheses for more detailed experiments aimed at particular pathways or systems in microbes. One example centers on interactions observed in nitrogen-fixing cells among proteins that are potentially involved in electron transfer to nitrogenase. Collaborative experiments involving the CMCS and the Harwood laboratory at the University of Washington are building on this result to explore the implications for production of hydrogen via the nitrogen fixation reaction. A further example involves study of a stress response pathway in R. palustris based on observed PPI involving proteins encoded by an operon that includes an ECF sigma factor, a putative response regulator, a putative histidine kinase, and an unknown protein.
Ongoing research in the CMCS is aimed at improving the throughput, applicability, and reliability of the PPI pipeline. Final validation and implementation of a robot-based protocol for affinity isolation will be completed in January 2008, removing a major bottleneck from the pipeline. Expansion of the CMCS pipeline to include membrane-associated proteins is underway. With these and other advances, the CMCS provides a unique resource for characterizing protein “machines” for the Genomics:GTL program. Details of these studies and other CMCS activities are covered in additional abstracts.
Advanced Data Analysis Pipeline for Determination of Protein Complexes and Interaction Networks at the Genomics:GTL Center for Molecular and Cellular Systems
Kevin K. Anderson,2* William R. Cannon,2 Don S. Daly,2 Brian S. Hooker,2 Jason E. McDermott,2 Gregory B. Hurst,1 W. Hayes McDonald,1 Dale A. Pelletier,1 Denise D. Schmoyer,1 Jenny L. Morrell-Falvey,1 Mitchel J. Doktycz,1 Sheryl A. Martin,1 Mudita Singhal,2 Ronald C. Taylor,2 H. Steven Wiley,2 and Michelle V. Buchanan1 (buchananmv@ornl.gov)
1Oak Ridge National Laboratory, Oak Ridge, Tennessee and 2Pacific Northwest National Laboratory, Richland, Washington
The Genomics:GTL Center for Molecular and Cellular Systems (CMCS) is a DOE Center whose mission is to determine protein complexes and interaction networks for microbial systems. The CMCS is currently focusing on completing the characterization of soluble protein-protein interactions in Rhodopseudomonas palustris. The CMCS approach combines expression of affinity-tagged proteins, affinity purification of interacting proteins, and tandem mass spectrometric identification of these proteins. Our goal is to provide a capability for generating high-quality protein-protein interaction data from a variety of energy- and environment-relevant microbial species. This poster provides a status report of the CMCS measurements of protein-protein interactions in R. palustris, which is highly relevant to DOE missions because of its ability to produce hydrogen and degrade lignin monomers and its exceptional metabolic versatility. A critical component of the approach is our evolving data analysis pipeline.
As of early December 2007, nearly 1200 R. palustris genes have been cloned as Gateway entry vectors, and approximately 1060 expression clones for a dual affinity tag (6-His/V5) have been produced. Some 467 affinity-tagged bait proteins have been expressed, affinity purified, and subjected to mass spectrometry (MS) analysis to identify interacting proteins. Approximately 30% of these bait proteins are annotated as conserved hypothetical, conserved unknown, or unknown proteins.
The data analysis pipeline begins with a Laboratory Information Management System (LIMS) that captures the MS/MS data and descriptions of the biological and assay conditions (metadata). The LIMS maintains a detailed history for each sample by capturing processing parameters, protocols, stocks, tests, and analytical results for the complete life cycle of the sample.
The resulting lists of potentially interacting prey proteins identified from MS/MS are statistically analyzed within a software environment specifically designed for working with biological networks. Bayes estimates of the confidence of the inferred associations are computed for each bait/prey pair. For high-confidence interactions, robust networks of interacting proteins are determined from patterns of interactions. The resulting protein networks are captured in a database within a publicly accessible software environment (https://www.emsl.pnl.gov/SEBINI/). Using an exploratory data analysis tool that enables integration and analysis of interaction evidence obtained from multiple sources (CABIN, www.sysbio.org/capabilities/compbio/cabin.stm), the information on the nodes (proteins) and edges (interactions) can be linked to external and internal bioinformatic data. The internal bioinformatic data contain information on interologues derived from the Bioverse system, which provides additional information on protein interactions. The joint analysis of experimental data and multiple sources of bioinformatic data is done graphically through Collective Analysis of Biological Interaction Networks (CABIN), a plug-in for the Cytoscape network visualization program.
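The abstract does not specify the statistical model behind the Bayes confidence estimates. As a minimal illustration only, the sketch below assumes a simple Beta-Binomial model in which a prey counts as a "hit" each time it is identified in a replicate pulldown of a given bait. The function names, the uniform prior, the 0.7 threshold, and the bait/prey identifiers are all hypothetical and are not taken from the CMCS pipeline.

```python
from collections import defaultdict


def posterior_confidence(observed, trials, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a Beta(prior_a, prior_b) prior updated with
    `observed` hits in `trials` replicate pulldowns (assumed model)."""
    if trials < 0 or not 0 <= observed <= trials:
        raise ValueError("need 0 <= observed <= trials")
    return (prior_a + observed) / (prior_a + prior_b + trials)


def score_pairs(pulldowns):
    """Score every bait/prey pair.

    pulldowns: iterable of (bait, set_of_prey) tuples, one per replicate.
    Returns a dict mapping (bait, prey) to a confidence in [0, 1].
    """
    trials = defaultdict(int)  # replicate count per bait
    hits = defaultdict(int)    # times each prey was seen with each bait
    for bait, prey_set in pulldowns:
        trials[bait] += 1
        for prey in prey_set:
            hits[(bait, prey)] += 1
    return {
        pair: posterior_confidence(count, trials[pair[0]])
        for pair, count in hits.items()
    }


# Hypothetical replicate data; identifiers are placeholders, not real results.
replicates = [
    ("baitA", {"prey1", "prey2"}),
    ("baitA", {"prey1"}),
    ("baitA", {"prey1", "prey3"}),
]
scores = score_pairs(replicates)

# Keep only pairs above an arbitrary, illustrative confidence threshold.
high_confidence = {pair for pair, s in scores.items() if s >= 0.7}
```

In this toy model a prey seen in all three replicates scores higher than one seen once, which is the qualitative behavior one would want from any per-pair confidence estimate. A production analysis would also need to account for prey that bind nonspecifically across many baits.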
These protein-protein interactions are disseminated through the publicly accessible Microbial Protein-Protein Interaction Database (MiPPI.ornl.gov). MiPPI is updated every 6 months (May and November). MiPPI provides tables of observed protein-protein interactions, as well as background information on CMCS measurement and analysis techniques. Various results (mass spectrometry results, corresponding metadata, and identified protein-protein interactions, including the statistical analysis scores) are also available for download in various file formats.
Analysis of the Dynamical Modular Structure of Rhodopseudomonas palustris Based on Global Analysis of Protein-Protein Interactions
William R. Cannon2* (william.cannon@pnl.gov), Mudita Singhal,2 Ronald C. Taylor,2 Don S. Daly,2 Dale A. Pelletier,1 Gregory B. Hurst,1 Denise D. Schmoyer,1 Jennifer L. Morrell-Falvey,1 Brian S. Hooker,2 W. Hayes McDonald,1 Michelle V. Buchanan,1 and H. Steven Wiley2
1Oak Ridge National Laboratory, Oak Ridge, Tennessee and 2Pacific Northwest National Laboratory, Richland, Washington
Global determination of protein-protein interactions for Rhodopseudomonas palustris is the current target for the Genomics:GTL Center for Molecular and Cellular Systems (CMCS). R. palustris is a metabolically versatile anoxygenic phototrophic bacterium, and analyses have focused on protein interactions observed under differing conditions for nitrogen metabolism in which either NH4+ (fixed nitrogen) or N2 serves as the primary source of nitrogen.
We have used the set of protein-protein interactions as the foundation for determining the dynamic modular structure of R. palustris regulatory networks. Global interactions determined by our affinity isolation pipeline are parsed into functional subnetworks by combining operon membership, gene regulatory information, gene expression information, phylogenetic profiling, gene neighborhood analyses, and predicted interactions. Approximately 6,000 interactions among more than 700 proteins were parsed into modular subnetworks and compared to the pattern of regulated gene expression observed under conditions of hydrogen utilization. We have also compared these functional modules with those inferred from protein interaction data gathered in other bacteria, such as E. coli. Our analysis indicates that different technologies for evaluating protein interaction networks have distinct inherent biases and that combining multiple data sources is likely to produce the most robust results. The subnetworks inferred from multiple data sources can provide novel hypotheses relating to previously unknown proteins and can serve as a foundation for further investigations. For detailed discussions of biological phenomena, see the posters “Protein-Protein Interactions Involved in Electron Transfer to Nitrogenase for Hydrogen Production in Rhodopseudomonas palustris” and “Identification of a Putative Stress Response Pathway and Novel Extracytoplasmic Function σ/Anti-σ Factors in the Anoxygenic Phototrophic Bacterium Rhodopseudomonas palustris by Protein-Protein Interactions.”
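The abstract does not describe how the evidence sources are combined algorithmically. One simple way to picture the idea is sketched here, under the assumption that an observed interaction is retained only when at least one independent evidence source (for example, shared operon membership or co-expression) also supports it, and that modules are the connected components of the retained graph. The function name, the support rule, and the protein labels are hypothetical.

```python
def build_modules(interactions, evidence_sources, min_support=1):
    """Group proteins into modules (illustrative sketch only).

    interactions: iterable of (protein_a, protein_b) observed pairs.
    evidence_sources: dict mapping a source name to a set of
        frozenset({a, b}) pairs supported by that source.
    An observed pair is kept when at least `min_support` sources also
    support it; modules are connected components of the kept pairs.
    """
    kept = []
    for a, b in interactions:
        pair = frozenset((a, b))
        support = sum(pair in pairs for pairs in evidence_sources.values())
        if support >= min_support:
            kept.append((a, b))

    parent = {}  # union-find forest over proteins in kept pairs

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in kept:
        parent[find(a)] = find(b)

    modules = {}
    for node in list(parent):
        modules.setdefault(find(node), set()).add(node)
    return sorted((sorted(m) for m in modules.values()), key=lambda m: m[0])


# Placeholder proteins and evidence; not real R. palustris data.
observed = [("P1", "P2"), ("P2", "P3"), ("P4", "P5"), ("P3", "P4")]
evidence = {
    "operon": {frozenset(("P1", "P2")), frozenset(("P4", "P5"))},
    "coexpression": {frozenset(("P2", "P3"))},
}
modules = build_modules(observed, evidence)
```

Here the P3/P4 pair has no supporting evidence and is dropped, so the proteins fall into two separate modules rather than one. Raising `min_support` makes the modules more conservative at the cost of coverage.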
Characterization of a Stress Response Pathway in the Anoxygenic Phototrophic Bacterium Rhodopseudomonas palustris
Michael S. Allen1* (allenms@ornl.gov), Dale A. Pelletier,1 Gregory B. Hurst,1 Linda J. Foote,1 Trish K. Lankford,1 Catherine K. McKeown,1 Tse-Yuan S. Lu,1 Elizabeth T. Owens,1 Denise D. Schmoyer,1 Jennifer L. Morrell-Falvey,1 W. Hayes McDonald,1 Mitchel J. Doktycz,1 Brian S. Hooker,2 William R. Cannon,2 and Michelle V. Buchanan1
1Oak Ridge National Laboratory, Oak Ridge, Tennessee and 2Pacific Northwest National Laboratory, Richland, Washington
Rhodopseudomonas palustris is an anoxygenic phototrophic bacterium possessing high metabolic diversity. As part of the Genomics:GTL Center for Molecular and Cellular Systems (CMCS) effort, this organism has been investigated for its ability to produce nitrogenase-mediated biohydrogen and its potential for bioremediation. Analysis of cytoplasmic protein fractions by shotgun proteomics has revealed several proteins up-regulated during growth on benzoate as well as under diazotrophic conditions. Among those was the putative extracytoplasmic function (ECF) σ-factor RPA4225. Subsequent large-scale protein-protein interaction experiments also revealed an interaction between the unknown protein RPA4224 and the putative response regulator RPA4223. RPA4224 and RPA4225 form a single operon in R. palustris, suggesting that this unknown protein may serve as an anti-σ factor. Organization of this operon along with the preceding response regulator gene RPA4223 is conserved among several α-Proteobacteria including Sinorhizobium meliloti, where the components have been shown to act as mediators of the global stress response. Additionally, we have found that the genomic location of the downstream gene RPA4226, a putative histidine kinase containing a predicted transmembrane domain, is also conserved among these bacteria. This suggests a potential role in the sensing and signal transduction of the stress response. These data underscore the utility of high-throughput methodologies to interrogate complex, multi-component systems and for generating new hypotheses regarding proteins about which little or nothing is known.
Protein-Protein Interactions Involved in Electron Transfer to Nitrogenase for Hydrogen Production in Rhodopseudomonas palustris
Dale A. Pelletier1* (pelletierda@ornl.gov), Erin Heiniger,3 Gregory B. Hurst,1 Trish K. Lankford,1 Catherine K. McKeown,1 Tse-Yuan S. Lu,1 Elizabeth T. Owens,1 Denise D. Schmoyer,1 Jennifer L. Morrell-Falvey,1 Brian S. Hooker,2 W. Hayes McDonald,1 Mitchel J. Doktycz,1 William R. Cannon,2 Caroline S. Harwood,3 and Michelle V. Buchanan1
1Oak Ridge National Laboratory, Oak Ridge, Tennessee; 2Pacific Northwest National Laboratory, Richland, Washington; and 3University of Washington, Seattle, Washington
The goal of the Center for Molecular and Cellular Systems (CMCS) is to identify protein-protein interaction networks that form the molecular basis of biological function in bacterial species relevant to the Genomics:GTL program. Rhodopseudomonas palustris is a metabolically versatile anoxygenic phototrophic bacterium that is emerging as a model system for nitrogenase-mediated biohydrogen production. This process requires the integration of several metabolic and regulatory networks, including nitrogen metabolism, photosynthesis, and carbon metabolism. Although the nitrogenase enzyme has been the focus of much research, we have a poor understanding of the organization of cellular components facilitating the flow of electrons derived from carbon metabolism to nitrogenase in R. palustris and other diazotrophic bacteria. To better understand this and other processes, we have begun mapping the protein-protein interactions in photoheterotrophically grown R. palustris. Shotgun proteomics and microarray analysis have identified proteins that are upregulated in R. palustris cells grown in the absence of fixed nitrogen. These proteins were subsequently analyzed to identify protein-protein interactions by affinity isolation and mass spectrometry. This analysis revealed interactions among numerous proteins including FixABCX, a predicted protein complex hypothesized to have a role in transfer of electrons to nitrogenase. Subsequently we found that a fixABCX mutant was deficient but not completely blocked in its ability to grow under nitrogen-fixing conditions. This mutant was also deficient in nitrogenase activity. Supplying fixABCX in trans restored the growth phenotype. RPA1927 and RPA1928 encode proteins of unknown function that are also highly expressed under nitrogen-fixing growth conditions. While the functions of RPA1927 and RPA1928 are unknown, the presence of a predicted ferredoxin-like iron-sulfur cluster in RPA1928 implicates this protein in electron transfer.
Additionally, a novel putative interaction was identified between the proteins encoded by RPA1927 and RPA1928 and the FixABCX complex, implying a potential role in electron transfer. An RPA1927-RPA1928 deletion strain has been constructed, and growth phenotypes are under investigation. These studies have increased our understanding of the pathways and protein-protein interactions that occur in R. palustris cells grown under nitrogen-fixing and hydrogen-producing conditions. These results, as well as the results of future interaction studies, will allow for modeling and metabolic engineering of this organism for increased yields of biological hydrogen.
*Presenting Author



