• F Luppino, IA Adzhubey, C Cassa, A Toth-Petroczy. DeMAG predicts the effects of variants in clinically actionable genes by integrating structural and evolutionary epistatic features. bioRxiv 2022
• C Landerer, J Poehls, A Toth-Petroczy. Evolutionary impact of codon specific translation errors at the proteome scale. bioRxiv 2022


✳︎ joint first author; # joint corresponding author

Maria Luisa Romero Romero, Cedric Landerer, Jonas Poehls, Agnes Toth-Petroczy
Phenotypic mutations contribute to protein diversity and shape protein evolution.
Protein Sci, 31(9) Art. No. e4397 (2022)
Open Access
Errors in DNA replication generate genetic mutations, while errors in transcription and translation lead to phenotypic mutations. Phenotypic mutations are orders of magnitude more frequent than genetic ones, yet they are less understood. Here, we review the types of phenotypic mutations, their quantification, and their role in protein evolution and disease. The diversity generated by phenotypic mutation can facilitate adaptive evolution. Indeed, phenotypic mutations, such as ribosomal frameshift and stop codon readthrough, sometimes serve to regulate protein expression and function. Phenotypic mutations have often been linked to fitness decrease and diseases. Thus, understanding the protein heterogeneity and phenotypic diversity caused by phenotypic mutations will advance our understanding of protein evolution and have implications for human health and diseases.

Colin Jackson#, Agnes Toth-Petroczy, Rachel Kolodny, Florian Hollfelder, Monika Fuxreiter, Shina Caroline Lynn Kamerlin#, Nobuhiko Tokuriki#
Adventures on the Routes of Protein Evolution-In Memoriam Dan Salah Tawfik (1955-2021).
J Mol Biol, 434(7) Art. No. 167462 (2022)
Understanding how proteins evolved not only resolves mysteries of the past, but also helps address challenges of the future, particularly those relating to the design and engineering of new protein functions. Here we review the work of Dan S. Tawfik, one of the pioneers of this area, highlighting his seminal contributions in diverse fields such as protein design, high throughput screening, protein stability, fundamental enzyme-catalyzed reactions and promiscuity, that underpin biology and the origins of life. We discuss the influence of his work on how our models of enzyme and protein function have developed and how the main driving forces of molecular evolution were elucidated. The discovery of the rugged routes of evolution has enabled many practical applications, some of which are now widely used.

Belin Selcen Beydag-Tasöz, Joyson Verner D'Costa, Lena Hersemann, Federica Luppino, Yung Hae Kim, Christoph Zechner, Anne Grapin-Botton
A combined transcriptional and dynamic roadmap of single human pancreatic endocrine progenitors reveals proliferative capacity and differentiation continuum.
bioRxiv (2021)
Open Access
Basic helix-loop-helix genes, particularly proneural genes, are well-described triggers of cell differentiation, yet limited information exists on their dynamics, notably in human development. Here, we focus on Neurogenin 3 (NEUROG3), which is crucial for pancreatic endocrine lineage initiation. Using a double reporter to monitor endogenous NEUROG3 transcription and protein expression in single cells in 2D and 3D models of human pancreas development, we show peaks of expression for the RNA and protein at 22 and 11 hours respectively, approximately two-fold slower than in mice, and remarkable heterogeneity in peak expression levels all triggering differentiation. We also reveal that some human endocrine progenitors proliferate once, mainly at the onset of differentiation, rather than forming a subpopulation with sustained proliferation. Using reporter index-sorted single-cell RNA-seq data, we statistically map transcriptome to dynamic behaviors of cells in live imaging and uncover transcriptional states associated with variations in motility as NEUROG3 levels change, a method applicable to other contexts.

Jodie Ouahed✳︎, Judith R Kelsen✳︎, Waldo A Spessott, Kameron Kooshesh, Maria L Sanmillan, Noor Dawany, Kathleen E Sullivan, Kathryn Hamilton, Voytek Slowik, Sergey Nejentsev, João Farela Neves, Helena Flores, Wendy K Chung, Ashley Wilson, Kwame Anyane-Yeboa, Karen Wou, Preti Jain, Michael Field, Sophia Tollefson, Maiah H Dent, Dalin Li, Takeo Naito, Dermot P B McGovern, Andrew C Kwong, Faith Taliaferro, Jose Ordovas-Montanes, Bruce Horwitz, Daniel Kotlarz, Christoph Klein, Jonathan Evans, Jill Dorsey, Neil Warner, Abdul Elkadri, Aleixo M Muise, Jeffrey Goldsmith, Benjamin Thompson, Karin R Engelhardt, Andrew J Cant, Sophie Hambleton, Andrew Barclay, Agnes Toth-Petroczy, Dana Vuzman, Nikkola Carmichael, Corneliu Bodea, Christopher Cassa, Marcella Devoto, Richard L Maas, Edward M Behrens#, Claudio G Giraudo#, Scott B Snapper
Variants in STXBP3 are Associated with Very Early Onset Inflammatory Bowel Disease, Bilateral Sensorineural Hearing Loss and Immune Dysregulation.
J Crohns Colitis, 15(11) 1908-1919 (2021)
Very early onset inflammatory bowel disease [VEOIBD] is characterized by intestinal inflammation affecting infants and children less than 6 years of age. To date, over 60 monogenic aetiologies of VEOIBD have been identified, many characterized by highly penetrant recessive or dominant variants in underlying immune and/or epithelial pathways. We sought to identify the genetic cause of VEOIBD in a subset of patients with a unique clinical presentation.

Anwoy Kumar Mohanty, Dana Vuzman, Laurent Francioli, Christopher Cassa, Agnes Toth-Petroczy, Shamil Sunyaev
novoCaller: a Bayesian network approach for de novo variant calling from pedigree and population sequence data.
Bioinformatics, 35(7) 1174-1180 (2019)
De novo mutations (i.e. newly occurring mutations) are a predominant cause of sporadic dominant monogenic diseases and play a significant role in the genetics of complex disorders. De novo mutation studies also inform population genetics models and shed light on the biology of DNA replication and repair. Despite the broad interest, there is room for improvement with regard to the accuracy of de novo mutation calling.

Jose Velilla✳︎, Michael Mario Marchetti✳︎, Agnes Toth-Petroczy, Claire Grosgogeat, Alexis H Bennett, Nikkola Carmichael, Elicia Estrella, Basil T Darras, Natasha Y Frank, Joel B Krier, Rachelle Gaudet, Vandana A Gupta
Homozygous TRPV4 mutation causes congenital distal spinal muscular atrophy and arthrogryposis.
Neurol Genet, 5(2) Art. No. e312 (2019)
Open Access
Objective: To identify the genetic cause of disease in a form of congenital spinal muscular atrophy and arthrogryposis (CSMAA).

Mirna Bilus, Maja Semanjski, Marko Mocibob, Igor Zivkovic, Nevena Cvetesic, Dan S Tawfik, Agnes Toth-Petroczy, Boris Macek, Ita Gruic-Sovulj
On the Mechanism and Origin of Isoleucyl-tRNA Synthetase Editing against Norvaline.
J Mol Biol, 431(6) 1284-1297 (2019)
Aminoacyl-tRNA synthetases (aaRSs), the enzymes responsible for coupling tRNAs to their cognate amino acids, minimize translational errors by intrinsic hydrolytic editing. Here, we compared norvaline (Nva), a linear amino acid not coded for protein synthesis, to the proteinogenic, branched valine (Val) in their propensity to mistranslate isoleucine (Ile) in proteins. We show that in the synthetic site of isoleucyl-tRNA synthetase (IleRS), Nva and Val are activated and transferred to tRNA at similar rates. The efficiency of the synthetic site in pre-transfer editing of Nva and Val also appears to be similar. Post-transfer editing was, however, more rapid with Nva and consequently IleRS misaminoacylates Nva-tRNAIle at a slower rate than Val-tRNAIle. Accordingly, an Escherichia coli strain lacking IleRS post-transfer editing misincorporated Nva and Val in the proteome to a similar extent and at the same Ile positions. However, Nva mistranslation inflicted higher toxicity than Val, in agreement with IleRS editing being optimized for hydrolysis of Nva-tRNAIle. Furthermore, we found that the evolutionary-related IleRS, leucyl- and valyl-tRNA synthetases (I/L/VRSs), all efficiently hydrolyze Nva-tRNAs even when editing of Nva seems redundant. We thus hypothesize that editing of Nva-tRNAs had already existed in the last common ancestor of I/L/VRSs, and that the editing domain of I/L/VRSs had primarily evolved to prevent infiltration of Nva into modern proteins.

Maria Luisa Romero Romero, Fan Yang, Yu-Ru Lin, Agnes Toth-Petroczy, Igor N. Berezovsky, Alexander Goncearenco, Wen Yang, Alon Wellner, Fanindra Kumar-Deshmukh, Michal Sharon, David Baker, Gabriele Varani, Dan S Tawfik
Simple yet functional phosphate-loop proteins.
Proc Natl Acad Sci U.S.A., 115(51) 11943-11950 (2018)
Abundant and essential motifs, such as phosphate-binding loops (P-loops), are presumed to be the seeds of modern enzymes. The Walker-A P-loop is absolutely essential in modern NTPase enzymes, in mediating binding, and transfer of the terminal phosphate groups of NTPs. However, NTPase function depends on many additional active-site residues placed throughout the protein's scaffold. Can motifs such as P-loops confer function in a simpler context? We applied a phylogenetic analysis that yielded a sequence logo of the putative ancestral Walker-A P-loop element: a β-strand connected to an α-helix via the P-loop. Computational design incorporated this element into de novo designed β-α repeat proteins with relatively few sequence modifications. We obtained soluble, stable proteins that unlike modern P-loop NTPases bound ATP in a magnesium-independent manner. Foremost, these simple P-loop proteins avidly bound polynucleotides, RNA, and single-strand DNA, and mutations in the P-loop's key residues abolished binding. Binding appears to be facilitated by the structural plasticity of these proteins, including quaternary structure polymorphism that promotes a combined action of multiple P-loops. Accordingly, oligomerization enabled a 55-aa protein carrying a single P-loop to confer avid polynucleotide binding. Overall, our results show that the P-loop Walker-A motif can be implemented in small and simple β-α repeat proteins, primarily as a polynucleotide binding motif.

Thomas A Hopf, Anna G Green, Benjamin Schubert, Sophia Mersmann, Charlotta P I Schärfe, John Ingraham, Agnes Toth-Petroczy, Kelly Brock, Adam J Riesselman, Perry Palmedo, ChulHee Kang, Robert Sheridan, Eli J Draizen, Christian Dallago, Chris Sander, Debora S Marks
The EVcouplings Python framework for coevolutionary sequence analysis.
Bioinformatics, Art. No. doi: 10.1093/bioinformatics/bty862 (2018)
Coevolutionary sequence analysis has become a commonly used technique for de novo prediction of the structure and function of proteins, RNA, and protein complexes. We present the EVcouplings framework, a fully integrated open-source application and Python package for coevolutionary analysis. The framework enables generation of sequence alignments, calculation and evaluation of evolutionary couplings (ECs), and de novo prediction of structure and mutation effects. The combination of an easy to use, flexible command line interface and an underlying modular Python package makes the full power of coevolutionary analyses available to entry-level and advanced users.

Alireza Haghighi, Joel B Krier, Agnes Toth-Petroczy, Christopher Cassa, Natasha Y Frank, Nikkola Carmichael, Elizabeth Fieg, Andrew Bjonnes, Anwoy Kumar Mohanty, Lauren C Briere, Sharyn Lincoln, Stephanie Lucia, Vandana A Gupta, Onuralp Söylemez, Sheila Sutti, Kameron Kooshesh, Haiyan Qiu, Christopher J Fay, Victoria Perroni, Jamie Valerius, Meredith Hanna, Alexander Frank, Jodie Ouahed, Scott B Snapper, Angeliki Pantazi, Sameer S Chopra, Ignaty Leshchiner, Nathan O Stitziel, Anna Feldweg, Michael Mannstadt, Joseph Loscalzo, David A Sweetser, Eric Liao, Joan M Stoler, Catherine B Nowak, Pedro A Sanchez-Lara, Ophir D. Klein, Hazel Perry, Nikolaos A Patsopoulos, Soumya Raychaudhuri, Wolfram Goessling, Robert C Green, Christine E Seidman, Calum A MacRae, Shamil Sunyaev, Richard L Maas, Dana Vuzman
An integrated clinical program and crowdsourcing strategy for genomic sequencing and Mendelian disease gene discovery.
NPJ Genom Med, 3 Art. No. 21 (2018)
Despite major progress in defining the genetic basis of Mendelian disorders, the molecular etiology of many cases remains unknown. Patients with these undiagnosed disorders often have complex presentations and require treatment by multiple health care specialists. Here, we describe an integrated clinical diagnostic and research program using whole-exome and whole-genome sequencing (WES/WGS) for Mendelian disease gene discovery. This program employs specific case ascertainment parameters, a WES/WGS computational analysis pipeline that is optimized for Mendelian disease gene discovery with variant callers tuned to specific inheritance modes, an interdisciplinary crowdsourcing strategy for genomic sequence analysis, matchmaking for additional cases, and integration of the findings regarding gene causality with the clinical management plan. The interdisciplinary gene discovery team includes clinical, computational, and experimental biomedical specialists who interact to identify the genetic etiology of the disease, and when so warranted, to devise improved or novel treatments for affected patients. This program effectively integrates the clinical and research missions of an academic medical center and affords both diagnostic and therapeutic options for patients suffering from genetic disease. It may therefore be germane to other academic medical institutions engaged in implementing genomic medicine programs.

Agnes Toth-Petroczy, Perry Palmedo, John Ingraham, Thomas A Hopf, Bonnie Berger, Chris Sander, Debora S Marks
Structured States of Disordered Proteins from Genomic Sequences.
Cell, 167(1) 158-170 (2016)
Protein flexibility ranges from simple hinge movements to functional disorder. Around half of all human proteins contain apparently disordered regions with little 3D or functional information, and many of these proteins are associated with disease. Building on the evolutionary couplings approach previously successful in predicting 3D states of ordered proteins and RNA, we developed a method to predict the potential for ordered states for all apparently disordered proteins with sufficiently rich evolutionary information. The approach is highly accurate (79%) for residue interactions as tested in more than 60 known disordered regions captured in a bound or specific condition. Assessing the potential for structure of more than 1,000 apparently disordered regions of human proteins reveals a continuum of structural order with at least 50% with clear propensity for three- or two-dimensional states. Co-evolutionary constraints reveal hitherto unseen structures of functional importance in apparently disordered proteins.

Paola Laurino, Ágnes Tóth-Petróczy, Rubén Meana-Pañeda, Wei Lin, Donald G Truhlar, Dan S Tawfik
An Ancient Fingerprint Indicates the Common Ancestry of Rossmann-Fold Enzymes Utilizing Different Ribose-Based Cofactors.
PLoS Biol, 14(3) Art. No. e1002396 (2016)
Open Access
Nucleoside-based cofactors are presumed to have preceded proteins. The Rossmann fold is one of the most ancient and functionally diverse protein folds, and most Rossmann enzymes utilize nucleoside-based cofactors. We analyzed an omnipresent Rossmann ribose-binding interaction: a carboxylate side chain at the tip of the second β-strand (β2-Asp/Glu). We identified a canonical motif, defined by the β2-topology and unique geometry. The latter relates to the interaction being bidentate (both ribose hydroxyls interacting with the carboxylate oxygens), to the angle between the carboxylate and the ribose, and to the ribose's ring configuration. We found that this canonical motif exhibits hallmarks of divergence rather than convergence. It is uniquely found in Rossmann enzymes that use different cofactors, primarily SAM (S-adenosyl methionine), NAD (nicotinamide adenine dinucleotide), and FAD (flavin adenine dinucleotide). Ribose-carboxylate bidentate interactions in other folds are not only rare but also have a different topology and geometry. We further show that the canonical geometry is not dictated by a physical constraint--geometries found in noncanonical interactions have similar calculated bond energies. Overall, these data indicate the divergence of several major Rossmann-fold enzyme classes, with different cofactors and catalytic chemistries, from a common pre-LUCA (last universal common ancestor) ancestor that possessed the β2-Asp/Glu motif.

Liat Rockah-Shmuel, Ágnes Tóth-Petróczy, Dan S Tawfik
Systematic Mapping of Protein Mutational Space by Prolonged Drift Reveals the Deleterious Effects of Seemingly Neutral Mutations.
PLoS Comput Biol, 11(8) Art. No. e1004421 (2015)
Open Access
Systematic mappings of the effects of protein mutations are becoming increasingly popular. Unexpectedly, these experiments often find that proteins are tolerant to most amino acid substitutions, including substitutions in positions that are highly conserved in nature. To obtain a more realistic distribution of the effects of protein mutations, we applied a laboratory drift comprising 17 rounds of random mutagenesis and selection of M.HaeIII, a DNA methyltransferase. During this drift, multiple mutations gradually accumulated. Deep sequencing of the drifted gene ensembles allowed determination of the relative effects of all possible single nucleotide mutations. Despite being averaged across many different genetic backgrounds, about 67% of all nonsynonymous, missense mutations were evidently deleterious, and an additional 16% were likely to be deleterious. In the early generations, the frequency of most deleterious mutations remained high. However, by the 17th generation, their frequency was consistently reduced, and those remaining were accepted alongside compensatory mutations. The tolerance to mutations measured in this laboratory drift correlated with sequence exchanges seen in M.HaeIII's natural orthologs. The biophysical constraints dictating purging in nature and in this laboratory drift also seemed to overlap. Our experiment therefore provides an improved method for measuring the effects of protein mutations that more closely replicates the natural evolutionary forces, and thereby a more realistic view of the mutational space of proteins.

Monika Fuxreiter, Ágnes Tóth-Petróczy, Daniel A Kraut, Andreas Matouschek, Roderick Y H Lim, Bin Xue, Lukasz Kurgan, Vladimir N Uversky
Disordered proteinaceous machines.
Chem. Rev., 114(13) 6806-6843 (2014)

Agnes Tóth-Petróczy, Dan S Tawfik
Hopeful (protein InDel) monsters?
Structure, 22(6) 803-804 (2014)
In this issue of Structure, Arpino and colleagues describe in atomic detail how a protein stomachs a deletion within a helix, an event that rarely occurs in nature or in the lab. Can insertions and deletions (InDels) trigger dramatic structural transitions?

Agnes Tóth-Petróczy, Dan S Tawfik
The robustness and innovability of protein folds.
Curr Opin Struct Biol, 26 131-138 (2014)
Assignment of protein folds to functions indicates that >60% of folds carry out one or two enzymatic functions, while few folds, for example, the TIM-barrel and Rossmann folds, exhibit hundreds. Are there structural features that make a fold amenable to functional innovation (innovability)? Do these features relate to robustness--the ability to readily accumulate sequence changes? We discuss several hypotheses regarding the relationship between the architecture of a protein and its evolutionary potential. We describe how, in a seemingly paradoxical manner, opposite properties, such as high stability and rigidity versus conformational plasticity and structural order versus disorder, promote robustness and/or innovability. We hypothesize that polarity--differentiation and low connectivity between a protein's scaffold and its active-site--is a key prerequisite for innovability.

Liat Rockah-Shmuel, Ágnes Tóth-Petróczy, Asaf Sela, Omri Wurtzel, Rotem Sorek, Dan S Tawfik
Correlated occurrence and bypass of frame-shifting insertion-deletions (InDels) to give functional proteins.
PLoS Genet, 9(10) Art. No. e1003882 (2013)
Open Access
Short insertions and deletions (InDels) comprise an important part of the natural mutational repertoire. InDels are, however, highly deleterious, primarily because two-thirds result in frame-shifts. Bypass through slippage over homonucleotide repeats by transcriptional and/or translational infidelity is known to occur sporadically. However, the overall frequency of bypass and its relation to sequence composition remain unclear. Intriguingly, the occurrence of InDels and the bypass of frame-shifts are mechanistically related - occurring through slippage over repeats by DNA or RNA polymerases, or by the ribosome, respectively. Here, we show that the frequency of frame-shifting InDels, and the frequency by which they are bypassed to give full-length, functional proteins, are indeed highly correlated. Using a laboratory genetic drift, we have exhaustively mapped all InDels that occurred within a single gene. We thus compared the naive InDel repertoire that results from DNA polymerase slippage to the frame-shifting InDels tolerated following selection to maintain protein function. We found that InDels repeatedly occurred, and were bypassed, within homonucleotide repeats of 3-8 bases. The longer the repeat, the higher was the frequency of InDels formation, and the more frequent was their bypass. Besides an expected 8A repeat, other types of repeats, including short ones, and G and C repeats, were bypassed. Although obtained in vitro, our results indicate a direct link between the genetic occurrence of InDels and their phenotypic rescue, thus suggesting a potential role for frame-shifting InDels as bridging evolutionary intermediates.

Eynat Dellus-Gur, Agnes Toth-Petroczy, Mikael Elias, Dan S Tawfik
What makes a protein fold amenable to functional innovation? Fold polarity and stability trade-offs.
J Mol Biol, 425(14) 2609-2621 (2013)
Protein evolvability includes two elements--robustness (or neutrality, mutations having no effect) and innovability (mutations readily inducing new functions). How are these two conflicting demands bridged? Does the ability to bridge them relate to the observation that certain folds, such as TIM barrels, accommodate numerous functions, whereas other folds support only one? Here, we hypothesize that the key to innovability is polarity--an active site composed of flexible, loosely packed loops alongside a well-separated, highly ordered scaffold. We show that highly stabilized variants of TEM-1 β-lactamase exhibit selective rigidification of the enzyme's scaffold while the active-site loops maintained their conformational plasticity. Polarity therefore results in stabilizing, compensatory mutations not trading off, but instead promoting the acquisition of new activities. Indeed, computational analysis indicates that in folds that accommodate only one function throughout evolution, for example, dihydrofolate reductase, ≥ 60% of the active-site residues belong to the scaffold. In contrast, folds associated with multiple functions such as the TIM barrel show high scaffold-active-site polarity (~20% of the active site comprises scaffold residues) and >2-fold higher rates of sequence divergence at active-site positions. Our work suggests structural measures of fold polarity that appear to be correlated with innovability, thereby providing new insights regarding protein evolution, design, and engineering.

Agnes Tóth-Petróczy, Dan S Tawfik
Protein insertions and deletions enabled by neutral roaming in sequence space.
Mol Biol Evol, 30(4) 761-771 (2013)
Backbone modifications via insertions and deletions (InDels) may exert dramatic effects, for better (mediating new functions) and for worse (causing loss of structure and/or function). However, contrary to point mutations (substitutions), our knowledge of the evolution and structural-functional effects of InDels is limited and so is our capability to engineer them. We sought to assess how deleterious InDels are relative to point mutations and understand the mechanisms that mediate their acceptance. Analysis of the evolution of InDels in orthologous protein phylogenies indicated that their rate of purging is 9- to 100-fold higher than for point mutations. In yeast, for example, the substitutions-to-InDels ratio is approximately 14-fold higher in protein coding than in noncoding regions. The incorporation of InDels relative to substitutions is not only slow but also nonlinear. On average, ≥50 substitutions accumulate before the appearance of the first InDel. We also found enriched substitutions in sequential and spatial proximity to InDels, suggesting that certain substitutions are correlated with InDels. As indicated by the lag in InDel accumulation, some of these correlated substitutions may have occurred first, as apparently neutral mutations, and later enabled the accumulation of InDels that would be otherwise purged. Thus, compensatory substitutions may follow InDels in an "adaptive walk" as traditionally assumed, but might also accumulate first, by "neutral roaming." The dynamics of InDel accumulation also depend on their genomic frequencies: InDels in flies are 4-fold more frequent than in yeast and tend to be compensated rather than enabled.

Tzachi Hagai, Ágnes Tóth-Petróczy, Ariel Azia, Yaakov Levy
The origins and evolution of ubiquitination sites.
Mol Biosyst, 8(7) 1865-1877 (2012)
Protein ubiquitination is central to the regulation of various pathways in eukaryotes. The process of ubiquitination and its cellular outcome were investigated in hundreds of proteins to date. Despite this, the evolution of this regulatory mechanism has not yet been addressed comprehensively. Here, we quantify the rates of evolutionary changes of ubiquitination and SUMOylation (Small Ubiquitin-like MOdifier) sites. We estimate the time at which they first appeared, and compare them to acetylation and phosphorylation sites and to unmodified residues. We observe that the various modification sites studied exhibit similar rates. Mammalian ubiquitination sites are weakly more conserved than unmodified lysine residues, and a higher degree of relative conservation is observed when analyzing bona fide ubiquitination sites. Various reasons can be proposed for the limited level of excess conservation of ubiquitination, including shifts in locations of the sites, the presence of alternative sites, and changes in the regulatory pathways. We observe that disappearance of sites may be compensated by the presence of a lysine residue in close proximity, which is significant when compared to evolutionary patterns of unmodified lysine residues, especially in disordered regions. This emphasizes the importance of analyzing a window in the vicinity of functional residues, as well as the capability of the ubiquitination machinery to ubiquitinate residues in a certain region. Using prokaryotic orthologs of ubiquitinated proteins, we study how ubiquitination sites were formed, and observe that while sometimes sequence additions and rearrangements are involved, in many cases the ubiquitination machinery utilizes an already existing sequence without significantly changing it. Finally, we examine the evolution of ubiquitination, which is linked with other modifications, to infer how these complex regulatory modules have evolved. 
Our study gives initial insights into the formation of ubiquitination sites, their degree of conservation in various species, and their co-evolution with other posttranslational modifications.

Tzachi Hagai, Ariel Azia, Ágnes Tóth-Petróczy, Yaakov Levy
Intrinsic disorder in ubiquitination substrates.
J Mol Biol, 412(3) 319-324 (2011)
The ubiquitin-proteasome system is responsible for the degradation of numerous proteins in eukaryotes. Degradation is an essential process in many cellular pathways and involves the proteasome degrading a wide variety of unrelated substrates while retaining specificity in terms of its targets for destruction and avoiding unneeded proteolysis. How the proteasome achieves this task is the subject of intensive research. Many proteins are targeted for degradation by being covalently attached to a poly-ubiquitin chain. Several studies have indicated the importance of a disordered region for efficient degradation. Here, we analyze a data set of 482 in vivo ubiquitinated substrates and a subset in which ubiquitination is known to mediate degradation. We show that, in contrast to phosphorylation sites and other regulatory regions, ubiquitination sites do not tend to be located in disordered regions and that a large number of substrates are modified at structured regions. In degradation-mediated ubiquitination, there is a significant bias of ubiquitination sites to be in disordered regions; however, a significant number is still found in ordered regions. Moreover, in many cases, disordered regions are absent from ubiquitinated substrates or are located far away from the modified region. These surprising findings raise the question of how these proteins are successfully unfolded and ultimately degraded by the proteasome. They indicate that the folded domain must be perturbed by some additional factor, such as the p97 complex, or that ubiquitination may induce unfolding.

Agnes Tóth-Petróczy, Dan S Tawfik
Slow protein evolutionary rates are dictated by surface-core association.
Proc Natl Acad Sci U.S.A., 108(27) 11151-11156 (2011)
Why do certain proteins evolve much slower than others? We compared not only rates per protein, but also rates per position within individual proteins. For ∼90% of proteins, the distribution of positional rates exhibits three peaks: a peak of slow evolving residues, with average log(2)[normalized rate], log(2)μ, of ca. -2, corresponding primarily to core residues; a peak of fast evolving residues (log(2)μ ∼ 0.5) largely corresponding to surface residues; and a very fast peak (log(2)μ ∼ 2) associated with disordered segments. However, a unique fraction of proteins that evolve very slowly exhibit not only a negligible fast peak, but also a peak with a log(2)μ ∼ -4, rather than the standard core peak of -2. Thus, a "freeze" of a protein's surface seems to stop core evolution as well. We also observed a much higher fraction of substitutions in potentially interacting residues than expected by chance, including substitutions in pairs of contacting surface-core residues. Overall, the data suggest that accumulation of surface substitutions enables the acceptance of substitutions in core positions. The underlying reason for slow evolution might therefore be a highly constrained surface due to protein-protein interactions or the need to prevent misfolding or aggregation. If the surface is inaccessible to substitutions, so becomes the core, thus resulting in very slow overall rates.

Agnes Tóth-Petróczy, Istvan Simon, Monika Fuxreiter, Yaakov Levy
Disordered tails of homeodomains facilitate DNA recognition by providing a trade-off between folding and specific binding.
J Am Chem Soc, 131(42) 15084-15085 (2009)
DNA binding specificity of homeodomain transcription factors is critically affected by disordered N-terminal tails (N-tails) that undergo a disorder-to-order transition upon interacting with DNA. The mechanism of the binding process and the molecular basis of selectivity are largely unknown. The coupling between folding and DNA binding of Antp and NK-2 homeodomains was investigated by coarse-grained molecular dynamics simulations using the native protein-DNA complex. The disordered N-tails were found to decrease the stability of the free proteins by competing with the native intramolecular interactions and increasing the radius of gyration of the homeodomain cores. In the presence of DNA, however, the N-tails increase the stability of the homeodomains by reducing the coupling between folding and DNA binding. Detailed studies on Antp demonstrate that the N-tail anchors the homeodomain to DNA and accelerates formation of specific interactions all along the protein-DNA interface. The tidal electrostatic forces between the N-tail and DNA induce faster and tighter binding of the homeodomain core to the DNA; this mechanism conforms to a fly-casting mechanism. In agreement with experiments, the N-tail of Antp also improves the binding affinity for DNA, with a major contribution by the released waters. These results imply that, by varying the degree of folding upon binding and thereby modulating the size of the buried surface, disordered N-tails of homeodomains can fine-tune the binding strength for specific DNA sequences. Overall, both the kinetics and thermodynamics of specific DNA binding by homeodomains can be improved by N-tails using a mechanism that is inherent in their disordered state.

Agnes Tóth-Petróczy, Christopher J Oldfield, István Simon, Yuichiro Takagi, A. Keith Dunker, Vladimir N Uversky, Monika Fuxreiter
Malleable machines in transcription regulation: the mediator complex.
PLoS Comput Biol, 4(12) Art. No. e1000243 (2008)
Open Access DOI
The Mediator complex provides an interface between gene-specific regulatory proteins and the general transcription machinery including RNA polymerase II (RNAP II). The complex has a modular architecture (Head, Middle, and Tail) and cryoelectron microscopy analysis suggested that it undergoes dramatic conformational changes upon interactions with activators and RNAP II. These rearrangements have been proposed to play a role in the assembly of the preinitiation complex and also to contribute to the regulatory mechanism of Mediator. In analogy to many regulatory and transcriptional proteins, we reasoned that Mediator might also utilize intrinsically disordered regions (IDRs) to facilitate structural transitions and transmit transcriptional signals. Indeed, a high prevalence of IDRs was found in various subunits of Mediator from both Saccharomyces cerevisiae and Homo sapiens, especially in the Tail and the Middle modules. The level of disorder increases from yeast to man, although in both organisms it significantly exceeds that of multiprotein complexes of a similar size. IDRs can contribute to Mediator's function in three different ways: they can individually serve as target sites for multiple partners having distinctive structures; they can act as malleable linkers connecting globular domains that impart modular functionality on the complex; and they can also facilitate assembly and disassembly of complexes in response to regulatory signals. Short segments of IDRs, termed molecular recognition features (MoRFs) distinguished by a high protein-protein interaction propensity, were identified in 16 and 19 subunits of the yeast and human Mediator, respectively. In Saccharomyces cerevisiae, the functional roles of 11 MoRFs have been experimentally verified, and those in the Med8/Med18/Med20 and Med7/Med21 complexes were structurally confirmed. Although the Saccharomyces cerevisiae and Homo sapiens Mediator sequences are only weakly conserved, the arrangements of the disordered regions and their embedded interaction sites are quite similar in the two organisms. All of these data suggest an integral role for intrinsic disorder in Mediator's function.

Agnes Toth-Petroczy, Agnes Szilagyi, Zsolt Ronai#, Maria Sasvari-Szekely, András Guttman#
Validation of a tentative microsatellite marker for the dopamine D4 receptor gene by capillary gel electrophoresis.
J Chromatogr A, 1130(2) 201-205 (2006)
Short tandem repeats of two to four base pairs (i.e., microsatellites) are broadly utilized as genetic markers for mapping disease loci in whole-genome search analyses. Based on its close vicinity to the gene on chromosome 11, the D11S1984 microsatellite was anticipated as a tentative marker for the dopamine D4 receptor gene. A capillary gel electrophoresis-based genotype analysis method and an in-house computational tool were developed for the analysis of the D11S1984 microsatellite marker to examine a healthy Hungarian population of n=106. The data obtained did not suggest significant linkage between the D11S1984 marker and the DRD4 gene.
