The Bat1K Project at the MPI-CBG
Bat1K is an initiative to sequence the genomes of all living bat species to chromosome-level assembly. With approximately 1,300 species in total, bats are the second-largest order of mammals, after rodents.
The main goal of the consortium is to uncover the molecular basis of the unusual and fascinating adaptations of bats, such as extreme longevity, echolocation, unique immunity, contracted genomes, and vocal learning.
The Bat1K evolved from the international Genome 10K (G10K) project, which originally aimed to generate the genomes of 10,000 or more vertebrate species with the short-read technology that prevailed at that time. While many groups continue to produce relevant genomes in this manner, the G10K consortium itself has become an umbrella organization that oversees a number of specific projects such as the VGP (Vertebrate Genomes Project), the Bird10K, and the Bat1K.
The Bat1K at the MPI-CBG and the CSBD
The MPI-CBG and the CSBD are contributing to the Bat1K project, which inherently involves long-read DNA sequencing and various scaffolding technologies. Prof. Gene Myers, director at the MPI-CBG and the CSBD, is the bioinformatics and sequencing lead for the Bat1K. The Bat1K work at the MPI-CBG and the CSBD covers the sequencing, genome assembly, and subsequent analysis of representatives of all 21 bat families to address questions about aging, echolocation and sensory perception, and immunity.
We are preparing to release six genomes in the near future and are funded to sequence another 25 species to study aging, immunity, and vocal learning in collaboration with the Bat1K consortium.
The Max Planck Society is funding this fascinating genome sequencing project.
Species to be sequenced are selected jointly by the collaborators, and tissues are submitted to the MPI-CBG, the CSBD, and the Dresden-concept Genome Center (DCGC).
A variety of de novo sequencing technologies are currently applied, and the data are combined for genome assembly to achieve our goal of error-free, near-gapless, chromosome-level, phased, and annotated assemblies.
The current genome sequencing regime involves:
- 60x genome coverage of PacBio SMRT (single molecule real time DNA sequencing) reads,
- 50x genome coverage of 10x Genomics linked reads for intermediate-range scaffolding,
- One Bionano optical map (DLS, direct label and stain) to correct potential scaffolding errors,
- 50x genome coverage of HiC linked reads for large-scale scaffolding,
- Short Illumina reads from the HiC and 10x Genomics libraries for error correction of individual bases in the pipeline,
- RNAseq data and/or PacBio IsoSeq data for genome annotation.
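The coverage targets in the list above translate directly into a raw sequencing yield per species: yield = genome size x coverage. The following is a minimal sketch of that arithmetic. The genome size of 2 Gb is an illustrative assumption and not a figure from this page; the function and variable names are likewise hypothetical.

```python
# Coverage targets taken from the sequencing regime listed above.
TARGET_COVERAGE = {
    "PacBio SMRT reads": 60,
    "10x Genomics linked reads": 50,
    "HiC linked reads": 50,
}


def required_yield_gb(genome_size_gb: float, coverage: float) -> float:
    """Return the raw sequencing yield (in Gb) needed to reach `coverage`."""
    if genome_size_gb <= 0 or coverage <= 0:
        raise ValueError("genome size and coverage must be positive")
    return genome_size_gb * coverage


def sequencing_plan(genome_size_gb: float) -> dict:
    """Map each data type to the yield (in Gb) it needs for one genome."""
    return {
        name: required_yield_gb(genome_size_gb, coverage)
        for name, coverage in TARGET_COVERAGE.items()
    }


if __name__ == "__main__":
    # ASSUMPTION: 2 Gb is a placeholder genome size, not a value from the text.
    ASSUMED_GENOME_SIZE_GB = 2.0
    for name, yield_gb in sequencing_plan(ASSUMED_GENOME_SIZE_GB).items():
        print(f"{name}: {yield_gb:.0f} Gb")
```

Under the assumed 2 Gb genome, the PacBio target alone corresponds to 120 Gb of raw reads. The optical map and the RNAseq/IsoSeq data are left out because the list gives no coverage target for them.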
PacBio SMRT DNA sequencing and 10x Genomics read cloud generation are done at the DCGC.
Bionano optical mapping will be established in Q4/2018 at the DCGC.
HiC is done exclusively by a commercial supplier (Arima Genomics, Inc., US).
The Genome assembly pipeline and Dazzler:
The Dresden genome assembly pipeline consists of two activities:
a) We are setting up a pipeline to generate error-free, near-gapless, chromosome-level, and phased assemblies, making use of existing algorithms and software tools such as FALCON-Unzip, MARVEL, Scaff10X, TGH, Salsa, and PBJelly.
b) We are working on concepts and algorithms to analyze, understand, and error-correct long PacBio sequencing reads (the Dresden AZZembLER for long-read DNA projects: dazzler). These pipelines will lead to a significant improvement of the assembly process in terms of accuracy, assembly continuity, and, ultimately, required computing time.
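Assembly continuity is commonly summarized with the N50 statistic: the length of the shortest contig or scaffold in the smallest set of longest sequences that together cover at least half of the total assembly length. The text does not say which continuity metric the pipeline reports, so the following is only an illustrative sketch of the standard definition, with a hypothetical function name and made-up lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    total assembly length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Made-up contig lengths, for illustration only.
    example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(example_lengths))
```

A higher N50 for the same total length means the assembly is split into fewer, longer pieces, which is the sense in which scaffolding with linked reads, optical maps, and HiC data improves continuity.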
Teeling et al.: Bat Biology, Genomes, and the Bat1K Project: To Generate Chromosome-Level Genomes for All Living Bat Species. Annu. Rev. Anim. Biosci. 6, 23-46 (2018)
Koepfli et al.: The Genome 10K Project: A Way Forward. Annu. Rev. Anim. Biosci. 3, 57-111 (2015)
Grohme et al.: The genome of Schmidtea mediterranea and the evolution of core cellular mechanisms. Nature 554(7690), 56-61 (2018)
Nowoshilow et al.: The axolotl genome and the evolution of key tissue formation regulators. Nature 554(7690), 50-55 (2018)