Genomics:GTL Awardee Workshop V
Bethesda, Maryland, February 11–14, 2007

Project Goals: The Center for Molecular and Cellular Systems (CMCS) focuses primarily on the objectives outlined in Goal 1 of the DOE Genomics:GTL program. The core of the CMCS is a high throughput pipeline for the identification of protein complexes, currently focusing on Rhodopseudomonas palustris. The pipeline employs affinity isolation coupled with mass spectrometry to identify protein interactions. Computational tools are used to assess the significance of identified interactions. A dynamic research program supports the goals of the CMCS by focusing on the development and implementation of improved capabilities for complex isolation, molecular level identification of the complexes, and critical bioinformatics and computing capabilities. These efforts are focused on constructing a knowledge base that can provide insight into the relationship between the complement of protein complexes in these microbes and their biological function.

Advanced Data Analysis Pipeline for Determination of Protein Complexes and Interaction Networks at the Genomics:GTL Center for Molecular and Cellular Systems

Kevin K. Anderson,2 Deanna L. Auberry,2 William R. Cannon2* (William.Cannon@pnl.gov), Don S. Daly,2 Brian S. Hooker,2 Gregory B. Hurst,1 Jason E. McDermott,2 W. Hayes McDonald1* (McDonaldWH@ornl.gov), Dale A. Pelletier,1 Denise D. Schmoyer1* (SchmoyerDD@ornl.gov), Julia L. Sharp,3 Mudita Singhal2* (Mudita.Singhal@pnl.gov), Ronald C. Taylor2* (Ronald.Taylor@pnl.gov), and Michelle V. Buchanan1 (BuchananMV@ornl.gov)

1Oak Ridge National Laboratory, Oak Ridge, Tennessee; 2Pacific Northwest National Laboratory, Richland, Washington; and 3Montana State University, Bozeman, Montana

The Genomics:GTL Center for Molecular and Cellular Systems (CMCS) is a DOE Center whose mission is to determine protein complexes and interaction networks from microbial systems. Currently the center is focusing on protein interactions involved in nitrogen fixation and metabolism in Rhodopseudomonas palustris. The center uses an affinity purification approach (refer to poster "Global survey of protein-protein interactions in Rhodopseudomonas palustris") to identify protein interactions in a robust, high-throughput manner. In this process, bait proteins along with co-purifying prey proteins are extracted from the cellular milieu. The resulting protein mixture is analyzed by HPLC coupled with tandem mass spectrometry.

The pipeline for data analysis begins with a laboratory information management system (LIMS) to capture links to the MS/MS data, peptides/proteins identified from those data, and descriptions regarding biological and assay conditions (metadata). The LIMS is the central data repository for all information related to processing and analysis of CMCS samples. It maintains a detailed history for each sample by capturing processing parameters, protocols, stocks, QA/QC tests and analytical results for the complete life cycle of the sample. Project and study data are also maintained to define each sample in the context of the research tasks it supports.

The resulting lists of potentially interacting prey proteins identified from MS/MS are statistically analyzed within a software environment (Sebini) specifically designed for working with biological networks. The prey protein lists are cross-tabulated by bait protein to form a prey-by-bait frequency matrix. The frequency pattern across a given row (prey) shows the associations between that prey and the baits. Interpretation of this pattern depends on the selected baits. Pattern uniformity is tested with a binomial-based likelihood-ratio test. Test significance is assessed by Monte Carlo simulation, with the false discovery rate controlled. Prey protein candidates are assigned to "specific" and "non-specific" classes based on the likelihood-ratio test. Bayes estimates of the confidence of the inferred associations are computed for each bait-prey pair. Modeling assumptions are investigated and conservative parameter estimates are made using Monte Carlo simulations.
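The cross-tabulation and uniformity test described above can be illustrated with a minimal Python sketch. It is not the Sebini implementation: the function names, the toy bait and prey labels, and the specific form of the statistic (a per-bait binomial model compared against a single pooled detection probability) are assumptions made for illustration, and the false discovery rate control and Bayes confidence estimates are omitted.

```python
import math
import random
from collections import defaultdict


def cross_tabulate(experiments):
    """Build a prey-by-bait frequency matrix.

    experiments: list of (bait, [prey, ...]) pairs, one per pulldown.
    Returns (counts, trials) where counts[prey][bait] is the number of
    pulldowns of that bait in which the prey was seen, and trials[bait]
    is the number of pulldowns performed with that bait.
    """
    counts = defaultdict(lambda: defaultdict(int))
    trials = defaultdict(int)
    for bait, preys in experiments:
        trials[bait] += 1
        for prey in set(preys):  # count each prey once per pulldown
            counts[prey][bait] += 1
    return counts, trials


def _binom_loglik(x, n, p):
    # Binomial log-likelihood without the binomial coefficient,
    # which cancels in the likelihood ratio.
    ll = 0.0
    if x > 0:
        ll += x * math.log(p)
    if n - x > 0:
        ll += (n - x) * math.log(1.0 - p)
    return ll


def lr_statistic(xs, ns):
    """Likelihood-ratio statistic: per-bait rates vs. one pooled rate."""
    p0 = sum(xs) / sum(ns)
    g = 0.0
    for x, n in zip(xs, ns):
        g += _binom_loglik(x, n, x / n) - _binom_loglik(x, n, p0)
    return 2.0 * g


def monte_carlo_pvalue(xs, ns, n_sim=2000, seed=0):
    """Estimate significance by simulating counts under the pooled rate."""
    rng = random.Random(seed)
    p0 = sum(xs) / sum(ns)
    observed = lr_statistic(xs, ns)
    exceed = 0
    for _ in range(n_sim):
        sim = [sum(rng.random() < p0 for _ in range(n)) for n in ns]
        if lr_statistic(sim, ns) >= observed - 1e-12:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


# Toy data (hypothetical): prey1 appears only with baitA, prey2 with every bait.
experiments = (
    [("baitA", ["prey1", "prey2"])] * 4
    + [("baitB", ["prey2"])] * 4
    + [("baitC", ["prey2"])] * 4
)
counts, trials = cross_tabulate(experiments)
baits = sorted(trials)
ns = [trials[b] for b in baits]
for prey in sorted(counts):
    xs = [counts[prey][b] for b in baits]
    print(prey, xs, round(lr_statistic(xs, ns), 2), monte_carlo_pvalue(xs, ns))
```

In this toy example the prey seen with a single bait gives a large statistic and a small simulated p-value (a "specific" pattern), while the prey seen with every bait gives a statistic of zero (a uniform, "non-specific" pattern).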

The resulting protein networks are captured in a database within the software environment, where information on the nodes (proteins) and edges (interactions) is linked to external and internal bioinformatics data. One example is information on interologs derived from the Bioverse system, which provides additional evidence for protein interactions based on orthologous proteins in other model systems. The joint analysis of experimental data and multiple sources of bioinformatically derived information is accomplished through Collective Analysis of Biological Interaction Networks (Cabin), a plug-in for the Cytoscape program. Protein interaction networks, along with relevant data captured at multiple stages of the data analysis pipeline, will be available for download at the project website (refer to poster "The Microbial Protein-Protein Interaction Database (MiPPI)"). A demonstration will be held at the workshop.
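As a rough illustration of combining experimental and bioinformatically derived evidence on the same network, the following Python sketch merges interaction lists from several sources into one undirected edge set and reports the edges supported by more than one source. It does not reflect the Cabin or Sebini data model; the function names, source labels, and protein identifiers are hypothetical.

```python
from collections import defaultdict


def merge_evidence(sources):
    """Merge interaction lists into one undirected network.

    sources: dict mapping a source name to an iterable of
    (protein_a, protein_b) pairs. Returns a dict mapping each
    unordered protein pair to the set of sources that report it.
    """
    edges = defaultdict(set)
    for name, pairs in sources.items():
        for a, b in pairs:
            if a != b:  # ignore self-interactions in this sketch
                edges[frozenset((a, b))].add(name)
    return edges


def supported_by(edges, min_sources=2):
    """Return sorted edges reported by at least min_sources sources."""
    return sorted(
        tuple(sorted(pair))
        for pair, names in edges.items()
        if len(names) >= min_sources
    )


# Hypothetical evidence: one experimental list and one interolog list.
sources = {
    "affinity_ms": [("protA", "protB"), ("protA", "protC")],
    "interolog": [("protB", "protA")],
}
network = merge_evidence(sources)
print(supported_by(network))
```

Keying edges on an unordered pair means that an interaction reported as (protA, protB) by one source and (protB, protA) by another is treated as the same edge.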

Global Survey of Protein-Protein Interactions in Rhodopseudomonas palustris

Dale A. Pelletier1* (pelletierda@ornl.gov), Gregory B. Hurst,1 Linda J. Foote,1 Trish K. Lankford,1 Catherine K. McKeown,1 Tse-Yuan S. Lu,1 Elizabeth T. Owens,1 Denise D. Schmoyer,1 Manesh B. Shah,1 Jennifer L. Morrell-Falvey,1 Brian S. Hooker,2 Stephen J. Kennel,1 W. Hayes McDonald,1 Mitchel J. Doktycz,1 Deanna L. Auberry,2 William R. Cannon,2 Kenneth J. Auberry,2 H. Steven Wiley,2 and Michelle V. Buchanan1

1Oak Ridge National Laboratory, Oak Ridge, Tennessee and 2Pacific Northwest National Laboratory, Richland, Washington

The goal of the Center for Molecular and Cellular Systems (CMCS) is to identify protein-protein interaction networks, which form the molecular basis of biological function, in environmentally relevant bacterial species in support of the Genomics:GTL program. Rhodopseudomonas palustris is a metabolically diverse anoxygenic phototrophic bacterium that is emerging as a model system for nitrogenase-mediated hydrogen production. This process requires several metabolic and regulatory networks to be integrated within the cell, including nitrogen metabolism, photosynthesis and carbon metabolism. To better understand the interactions among these processes, we have begun mapping protein-protein interactions of photoheterotrophically grown R. palustris. Toward this goal, we have developed and implemented a methodology for systematically identifying the proteins that interact with an affinity-tagged "bait" protein expressed from a plasmid introduced into R. palustris. The steps in this methodology include target or "bait" selection, primer design, PCR amplification, cloning, transformation, batch culture, lysis, affinity isolation, protein identification, and statistical filtering. Here we will present results on interactions identified by this approach.

To date, we have successfully cloned ~1000 R. palustris open reading frames, purified over 250 different affinity-tagged gene products, and identified their protein interaction partners from cultures of R. palustris grown under anaerobic photoheterotrophic growth conditions, in the presence or absence of fixed nitrogen. Interactors identified by this approach include homologues of a number of well-characterized protein complexes involved in known metabolic networks, such as nitrogen metabolism (Mo-nitrogenase, FixABCX with a possible role in electron transfer to nitrogenase, GlnK2-AmtB2, GlnK2-GlnB), carbon metabolism (2-oxoglutarate dehydrogenase, succinate dehydrogenase, succinyl-CoA synthetase, tryptophan synthase), transcription (DNA-directed RNA polymerase), chaperones (GroES-EL, DnaK-GrpE), and energy generation (F1F0 ATPase, subunits of NADH dehydrogenase). Novel putative interactions were also identified, including interactions among four anaerobically induced proteins encoded by RPA2334, RPA2335, RPA2336 and RPA2338; interactions between a conserved unknown RPA3193 and a putative acetyltransferase RPA3194; and interactions among conserved unknowns RPA1244, RPA1243 and RPA1246.

We are applying other approaches, including experimental, literature-based and bioinformatic methods, to verify these high-throughput interaction data (refer to poster "Advanced Data Analysis Pipeline for Determination of Protein Complexes and Interaction Networks at the Genomics:GTL Center for Molecular and Cellular Systems"). A web resource for these data will be publicly available in February 2007 (refer to poster "The Microbial Protein-Protein Interaction Database (MiPPI)"; mippi.ornl.gov). These results demonstrate the utility of data emerging from the CMCS for confirming known interactions, as well as for generating hypotheses about potentially novel protein-protein interactions. The identification of protein interactions will aid in elucidation of biological interaction networks and possibly in predicting protein function.

Advances in Coverage and Quality for High-Throughput Protein-Protein Interaction Measurements

Jennifer Morrell-Falvey,1 Mitchel J. Doktycz,1 Dale A. Pelletier,1 Linda J. Foote,1 Elizabeth T. Owens,1 Sankar Venkatraman,1 W. Hayes McDonald1* (mcdonaldwh@ornl.gov), Brian S. Hooker,2 Chiann-Tso Lin,2 Kristin D. Victry,2 Deanna L. Auberry,2 Eric A. Livesay,2 Daniel J. Orton,2 H. Steven Wiley,2 and Michelle V. Buchanan1

1Oak Ridge National Laboratory, Oak Ridge, Tennessee and 2Pacific Northwest National Laboratory, Richland, Washington

The overarching goal of the Center for Molecular and Cellular Systems (CMCS) is to identify protein interaction networks that form the molecular basis of biological function in microbes. To accomplish this goal, we have established a high-throughput analysis pipeline that is centered on generalized affinity-based isolation of protein interactors combined with mass spectrometric identification of the interacting protein components (refer to poster "Global survey of protein-protein interactions in Rhodopseudomonas palustris"). While this approach has proven successful for identifying a large number of protein interactions within the photoheterotrophic bacterium Rhodopseudomonas palustris, interactions among some classes of proteins (e.g. membrane-associated or low abundance) remain difficult to detect. Approaches for overcoming these technological challenges as well as increasing throughput are needed for accomplishing our goal of identifying protein interaction networks with high confidence. Here we describe several strategies that we are developing to address these issues.

Our pipeline currently employs a tandem affinity tag composed of 6X His and the V5 epitope. Improvements in the affinity and specificity of the affinity tags used for labeling proteins will impact throughput by allowing the use of smaller culture volumes and improving detection of lower abundance proteins. In addition to improved affinity, the tags should be compatible with elution of the bound complex under nondenaturing conditions and amenable to automation using 96-well based robotic handling and microfluidic manipulations. For these reasons, we are constructing and testing several new Gateway-compatible vectors for expression of carboxy-terminal tags, including 1) 2X Strep tag-6X His; 2) calmodulin binding protein (CBP)-3X FLAG; and 3) CBP-2X Protein A. In addition, two TEV protease sites are included in these constructs to facilitate more efficient elution from the first round of purification. To aid in detection and also potentially for use as another affinity capture moiety, these constructs also contain a tetracysteine tag that can be recognized by FlAsH™ reagents. Calmodulin binding protein, which binds reversibly, and the Strep tag, which can be competitively eluted, were chosen based on the requirement for native elution.

To facilitate reductions in sample amounts and to increase throughput, we are exploring optimizations to our high performance liquid chromatographic (HPLC) separations and our mass spectrometric data acquisition equipment and parameters. Comparisons are underway between a more traditional three dimensional ion trap (ThermoFinnigan LCQ) and a newer linear ion trap (ThermoFinnigan LTQ). These include comparisons between data acquisition rates, sensitivity, and effective dynamic range. HPLC optimizations are being performed in parallel in order to both optimize data acquisition duty cycles and take advantage of differences in speed between the two instruments. These parallel optimizations will provide increases in throughput while maintaining or even increasing the dynamic range and sensitivity of our analysis pipeline.

In addition to affinity isolation and mass spectrometric characterization, we can identify and confirm predicted protein interactions using a live cell imaging-based assay that exploits specific localization patterns in cells. This assay involves co-expression of two fusion proteins in Escherichia coli. The first protein of interest is directed to the cell poles by fusion to DivIVA, and the second protein of interest is fused to GFP. A direct interaction between the two proteins results in recruitment of the GFP-fusion protein to the poles. Importantly, this assay can be used to test interactions among both soluble and integral membrane proteins and is amenable to automation due to its rapidity, small scale, and ease of interpretation. To facilitate rapid analysis, we have also developed an automated image analysis algorithm to determine the presence and location of GFP-fusion proteins in E. coli cells. Features such as cell number, diameter, area, and number of GFP-fusion protein localization sites are extracted from each image and used to derive quantitative values that aid in the scoring of positive interactions. This assay thus allows directed analysis of protein interactions in live bacterial cells.
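One simple way to turn polar recruitment into a number can be sketched as follows. This is not the image analysis algorithm described above: it assumes, hypothetically, that a GFP intensity profile has already been sampled along the long axis of a single cell, and it reports the fraction of total signal found in the two end regions. The function name, the `pole_fraction` parameter, and the example profiles are illustrative only.

```python
def polar_fraction(profile, pole_fraction=0.2):
    """Fraction of total GFP signal located in the two polar regions.

    profile: intensities sampled along the cell's long axis (pole to pole).
    pole_fraction: share of the cell length treated as one pole (assumed).
    """
    if not profile:
        raise ValueError("profile must not be empty")
    k = max(1, round(len(profile) * pole_fraction))
    if 2 * k > len(profile):
        raise ValueError("polar regions overlap; reduce pole_fraction")
    total = sum(profile)
    if total <= 0:
        return 0.0
    return (sum(profile[:k]) + sum(profile[-k:])) / total


# Hypothetical profiles: diffuse signal vs. signal recruited to both poles.
diffuse = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
recruited = [10, 10, 1, 1, 1, 1, 1, 1, 10, 10]
print(polar_fraction(diffuse), polar_fraction(recruited))
```

For a diffuse signal the value is close to twice `pole_fraction`, so values well above that baseline would suggest recruitment to the poles; any scoring threshold would need to be calibrated against known interacting and non-interacting pairs.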

Membrane-bound proteins are integral to many biologically active complexes, yet traditional isolation and purification of these proteins is usually tedious and inefficient. To make such purifications more routine, we have developed a co-fractionation strategy to localize and separate complexes in their native state, followed by direct MS/MS analysis of the digested protein fractions. In this strategy we have tried to minimize dissociation between subunits, and thus the loss of previously unknown subunits. For demonstration, clones of recombinant His-tagged ATP synthase were expressed in Shewanella oneidensis MR-1. Membrane proteins were solubilized and separated under native conditions in order of ionic strength on a Mono Q column. Collected fractions were trypsin digested and analyzed by LC-MS/MS (ThermoFinnigan LCQ). The results showed not only that the ATP synthase subunits eluted in common fractions, but also that proteins within other complexes co-eluted at different ionic strengths, suggesting the presence of intact protein complexes. In parallel, we detected in-gel ATP hydrolysis at approximately the molecular size of the synthase complex (>450 kD). Two-dimensional electrophoresis images of the dissected gel show subunits ranging from 15 kD to 60 kD in size. These data demonstrate that co-fractionation and electrophoretic separation of membrane proteins coupled with mass spectrometric analysis is a valid and rapid way to analyze intracellular protein complexes.

Finally, in order to increase overall throughput, automation protocols for cloning, streaking, re-arraying, and purification steps are in place and automated affinity isolation protocols are being tested.

The Microbial Protein-Protein Interaction Database (MiPPI)

Denise D. Schmoyer1* (schmoyerdd@ornl.gov), Sheryl A. Martin,1 Gregory B. Hurst,1 Manesh B. Shah,1 Dale A. Pelletier,1 W. Hayes McDonald,1 William R. Cannon,2 Deanna L. Auberry,2 and Michelle V. Buchanan1

1Oak Ridge National Laboratory, Oak Ridge, Tennessee and 2Pacific Northwest National Laboratory, Richland, Washington

The Microbial Protein-Protein Interaction Database (MiPPI) is a publicly accessible database of microbial protein-protein interactions experimentally detected at the Genomics:GTL Center for Molecular and Cellular Systems (CMCS). The primary experimental method used at the CMCS is affinity-based isolation combined with mass spectrometry (refer to poster "Global Survey of Protein-Protein Interactions in Rhodopseudomonas palustris"). As of December 2006 we have performed over 500 endogenous affinity-tagged experiments which represent over 300 different bait proteins in Rhodopseudomonas palustris and Shewanella oneidensis. Our goal is to provide the highest quality protein interaction data to the biological community for the identification of cellular networks and ultimately biological function.

MiPPI stores the results of mass spectrometric protein identifications as DTASelect output files, as well as statistical evaluation of protein-protein interactions (refer to poster "Advanced Data Analysis Pipeline for Determination of Protein Complexes and Interaction Networks at the Genomics:GTL Center for Molecular and Cellular Systems"). MiPPI is linked to the CMCS laboratory information management system (LIMS), which maintains sample metadata from cloning through MS analysis. The first public data release is scheduled for February 2007 and includes all center results collected through November 2006. This release contains biological and technical replicates of more than 300 bait proteins collected from over 500 pulldown experiments and over 900 mass spectrometric analyses. The database includes over 50,000 observed protein-protein interactions. Updates to MiPPI will be released semiannually.

Beginning in February 2007, the web interface (mippi.ornl.gov) to the MiPPI database will provide online searches by protein or protein-protein interaction, and will include a protein-protein interaction viewer for the observed interactions. Mass spectrometry results and corresponding metadata will be provided for download in mzXML and DTASelect file formats. Identified protein-protein interactions including the statistical analysis scores will be provided for download in delimited text file format.

*Presenting Author