In the first public release of this project, we contributed 2 of the 15 genomes released on Sept. 12, 2018: the greater horseshoe bat (Rhinolophus ferrumequinum) and the flier cichlid fish (Archocentrus centrarchus). In the future, we expect to sequence about 10% of the VGP species in Dresden.
The VGP project at the MPI-CBG
The MPI-CBG and the CSBD are contributing to the international Vertebrate Genomes Project (VGP).
The VGP aims to generate error-free, near-gapless, reference-quality genome assemblies of all 66,000 vertebrate species. Obtaining the DNA sequences of all vertebrates will enable the study of how genetic elements such as genes and regulatory regions have contributed to the evolution and fitness of these species.
The high-quality VGP genomes will become the main references for their species and will be stored in the Genome Ark, a digital open-access library. These genomes will be used to address novel questions, ranging from cell-type evolution to the genetics of complex traits and associated diseases. The Genome Ark will also provide tools for designing conservation strategies towards the preservation of life forms for future generations. Broadly, we expect that the VGP will provide a powerful resource to advance questions in biology, genomics, conservation, medicine, and bioinformatics.
The VGP has evolved from the international Genome 10K project, which originally aimed to generate the genomes of 10,000 or more vertebrate species with the short-read technology that prevailed at that time. While many groups continue to produce relevant genomes in this manner, the G10K consortium itself has evolved into an umbrella organization that oversees a number of specific projects such as the VGP, the Bird10K, and the Bat1K.
The current phase 1, the VGP orders project, aims to generate high-quality, near error-free genomes of 260 species representing all vertebrate orders and a divergence time of ~50 million years or more from their most recent common ordinal ancestor. These include human as well as species that might go extinct soon. In phases 2 and 3, representatives of all families (~1,000 species) and genera (~10,000 species) will be sequenced, leading to the last phase, in which the VGP hopes to sequence all 66,000 vertebrate species.
The VGP at the MPI-CBG and the CSBD
The MPI-CBG and the CSBD form one of the three international VGP sequencing hubs, together with the Rockefeller University, USA, and the Wellcome Sanger Institute, UK. The VGP in Dresden covers the sequencing, genome assembly and subsequent analysis of one representative species of each of the 260 vertebrate orders, with a focus on bats and fish.
Apart from vertebrates for the VGP and Bat1K projects, we have sequenced further vertebrate, invertebrate and plant species using long-read technologies, including:
a) other vertebrate species:
- very large genomes of amphibians such as the axolotl (Ambystoma mexicanum) and the Spanish ribbed newt (Pleurodeles waltl),
- reptiles such as the tegu (Salvator merianae),
- fish such as the sand goby (Pomatoschistus minutus) or a zebrafish cell line.
b) invertebrate species:
- five planarian species with highly AT-rich and repetitive genomes (Schmidtea mediterranea, Schmidtea polychroa, Polycelis tenuis, Polycelis felina, Polycelis nigra),
- insects such as the cabbage fly (Delia radicum) and the hawk-moth (Hyles vespertilio).
c) plant species:
- such as the wild tobacco (Nicotiana attenuata).
Species to be sequenced are agreed upon with collaborators, and tissues are submitted to the MPI-CBG, the CSBD, and the Dresden-concept Genome Center (DCGC).
A variety of de-novo sequencing technologies are currently applied and data are combined for genome assemblies to achieve our goal of error-free, near-gapless, chromosome-level, phased and annotated assemblies.
The current genome sequencing regime involves:
- 60x genome coverage of PacBio SMRT (single-molecule real-time DNA sequencing) reads,
- 68x genome coverage of 10x Genomics linked reads for intermediate-range scaffolding,
- one DLS map generated by Bionano optical mapping to correct potential scaffolding errors,
- 68x genome coverage of HiC linked reads for large-scale scaffolding,
- the short Illumina reads from the HiC and 10x Genomics libraries for error correction of individual bases in the pipeline,
- RNAseq data and / or PacBio IsoSeq data for genome annotation.
PacBio SMRT DNA sequencing and 10x Genomics read cloud generation are done at the DCGC.
Bionano optical mapping will be established in Q4/2018 at the DCGC.
HiC is done exclusively by a commercial supplier (Arima Genomics, Inc., USA).
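The fold-coverage targets in the sequencing regime can be turned into raw sequencing yield with simple arithmetic: required bases = genome size x coverage. The following minimal Python sketch illustrates this for the three coverage figures given in the list. The function names and the 2.0 Gb genome size in the example are hypothetical and chosen only for illustration; they are not VGP figures or part of the Dresden pipeline.

```python
# Sketch: convert fold-coverage targets into raw sequencing yield.
# Coverage values are taken from the sequencing regime listed above.
# The genome size used in the example is a hypothetical placeholder.

COVERAGE_TARGETS = {
    "PacBio SMRT": 60,
    "10x Genomics linked reads": 68,
    "HiC linked reads": 68,
}


def required_yield_gb(genome_size_gb: float, coverage: float) -> float:
    """Return the gigabases of sequence needed for `coverage`-fold depth."""
    if genome_size_gb <= 0 or coverage <= 0:
        raise ValueError("genome size and coverage must be positive")
    return genome_size_gb * coverage


def yield_plan(genome_size_gb: float) -> dict:
    """Return the required yield in Gb for each technology."""
    return {
        tech: required_yield_gb(genome_size_gb, cov)
        for tech, cov in COVERAGE_TARGETS.items()
    }


if __name__ == "__main__":
    # Hypothetical 2.0 Gb genome, for illustration only.
    for tech, gb in yield_plan(2.0).items():
        print(f"{tech}: {gb:.0f} Gb")
```

For very large genomes, such as those of the amphibians mentioned above, the same calculation shows how quickly the required yield grows, since it scales linearly with genome size.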
The genome assembly pipeline and Dazzler:
The Dresden genome assembly pipeline consists of two activities:
a) We are setting up a pipeline to generate error-free, near-gapless, chromosome-level and phased assemblies, making use of existing algorithms and software tools such as FALCON unzip, MARVEL, Scaff10X, TGH, Salsa, and Arrow.
b) We are working on concepts and algorithms to analyze, understand and error-correct long PacBio sequencing reads (The Dresden AZZembLER for long read DNA projects: dazzler). These pipelines will lead to a significant improvement of the assembly process in terms of accuracy, assembly contiguity and, ultimately, the required computing time.
Teeling et al: Bat Biology, Genomes, and the Bat1K Project: To Generate Chromosome-Level Genomes for All Living Bat Species. Annu Rev Anim Biosci, 6 23-46 (2018)
Koepfli et al: The Genome 10K Project: A Way Forward. Annu Rev Anim Biosci, 3 57-111 (2015)
Grohme et al: The genome of Schmidtea mediterranea and the evolution of core cellular mechanisms. Nature, 554(7690) 56-61 (2018)
Nowoshilow et al: The axolotl genome and the evolution of key tissue formation regulators. Nature, 554(7690) 50-55 (2018)
Data use policy
For all other species get in touch with Sylke Winkler directly.