CD-CODE – Database and Encyclopedia for membraneless liquid droplets

Dresden researchers develop CrowDsourcing COndensate Database and Encyclopedia (CD-CODE), a community-editable platform and database of biomolecular condensates.

Logo of CD-CODE. Copyright: MPI-CBG

Biomolecular condensates are membrane-less organelles that selectively concentrate biomolecules such as proteins and nucleic acids in the cell. These dynamic liquid-like droplets form rapidly by phase separation, similar to oil droplets forming in water, producing temporary structures protected from the watery cell interior. In recent years, researchers have shown that these membraneless liquid condensates play a role in numerous cellular processes, including cellular signaling, cell division, the nested structure of nucleoli in the cell nucleus, and the regulation of DNA. As a result, biomolecular condensates are increasingly viewed as a new class of therapeutic targets.

To integrate interdisciplinary scientific knowledge about the function and composition of biomolecular condensates, the research group of Agnes Toth-Petroczy at the Max Planck Institute of Molecular Cell Biology and Genetics (MPI-CBG) and at the Center for Systems Biology Dresden (CSBD) built a database and encyclopedia, which was recently published in the journal Nature Methods. The CrowDsourcing COndensate Database and Encyclopedia (CD-CODE.org) is a community-editable platform that includes a database of verified biomolecular condensates based on the literature, an encyclopedia of relevant scientific terms, and a crowdsourcing web application. Agnes Toth-Petroczy, who supervised the study, explains: “Building a comprehensive database is a nearly impossible task for one single lab or even an institute such as ours, which is a hub for condensate research. Therefore, we decided to build in the crowdsourcing functionality to allow the community to engage.”

Soumyadeep, co-first author, adds: “The aim of the project was to create the most extensive and globally accepted database of condensates. This is why we allow users to submit new data or updates, as on Wikipedia. However, quality can only be ensured by expert moderation, which is why maintainers play an important role. We hope that expert researchers worldwide will find it useful and will contribute to the curation of knowledge on condensates.”

“There are other databases that catalog proteins involved in phase separation. However, they do not answer questions such as: What are the biomolecular condensates that have been identified and confirmed so far? What are their known protein components? Which condensates are known to contain a given protein? What experimental evidence indicates the presence of a protein in a certain condensate?” says Agnes. Nadia, an experimental biologist and co-first author of the publication, explains: “This is where our CD-CODE database comes in. Our database includes a manually curated catalogue of condensate-protein relationships and the corresponding experimental evidence. CD-CODE contained 9861 proteins linked to 244 distinct biomolecular condensates from 49 different organisms at the time of publication. These numbers are constantly being updated as more data is added and reviewed by contributors.”
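The questions listed above map naturally onto a two-way index between proteins and condensates. The following is a minimal sketch of that idea in Python, not CD-CODE's actual data model or API: the class names, field names, and all records (`PROTEIN_A`, `condensate_X`, `reference_1`, and so on) are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One curated protein-condensate link (all fields are illustrative)."""
    protein: str
    condensate: str
    source: str  # placeholder for a literature reference


class CondensateIndex:
    """Toy two-way index; an assumption, not the real CD-CODE schema."""

    def __init__(self):
        self._by_condensate = defaultdict(set)
        self._by_protein = defaultdict(set)
        self._evidence = defaultdict(list)

    def add(self, ev):
        # Record the link in both directions and keep the supporting evidence.
        self._by_condensate[ev.condensate].add(ev.protein)
        self._by_protein[ev.protein].add(ev.condensate)
        self._evidence[(ev.protein, ev.condensate)].append(ev)

    def proteins_in(self, condensate):
        # "What are the known protein components of a condensate?"
        return sorted(self._by_condensate.get(condensate, set()))

    def condensates_with(self, protein):
        # "Which condensates are known to contain a given protein?"
        return sorted(self._by_protein.get(protein, set()))

    def evidence_for(self, protein, condensate):
        # "What evidence places this protein in this condensate?"
        return list(self._evidence.get((protein, condensate), []))


# Placeholder records, invented purely to exercise the index.
index = CondensateIndex()
index.add(Evidence("PROTEIN_A", "condensate_X", "reference_1"))
index.add(Evidence("PROTEIN_A", "condensate_Y", "reference_2"))
index.add(Evidence("PROTEIN_B", "condensate_X", "reference_3"))

print(index.condensates_with("PROTEIN_A"))
print(index.proteins_in("condensate_X"))
```

Keeping the evidence keyed by the (protein, condensate) pair means every link can be traced back to its supporting source, which is the property the quote emphasizes.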

Agnes provides an outlook: “Our platform will accelerate the discovery and experimental investigation of biomolecular condensates and their protein constituents, as well as advance our understanding of their role in disease and as therapeutic targets. As the topic evolves, the crowdsourcing aspect allows for a closer examination of concepts and evidence. This will ensure that the ever-growing knowledge of condensate research is quickly merged into the database and encyclopedia. Additionally, the comprehensive and curated data in CD-CODE serves as high-quality training data for AI applications, which are now our main focus in the lab. We hope CD-CODE can help decipher the molecular determinants of protein condensation.” This project is an interdisciplinary team effort of software engineers and scientists from the research groups of Agnes Toth-Petroczy and Anthony Hyman, the Scientific Computing Facility at MPI-CBG, and Dewpoint Therapeutics.

Original Publication

Rostam, N., Ghosh, S., Chow, C.F.W. et al. CD-CODE: crowdsourcing condensate database and encyclopedia. Nat Methods (2023). doi.org/10.1038/s41592-023-01831-0

Link to blogpost “Behind the paper”: https://protocolsmethods.springernature.com/posts/condensing-information-on-condensates