The project is developing methods for rapid surveys and species identification from bulk samples using next-generation sequencing.


Our approach adapts the ‘metagenomics’ procedures used to study complex mixtures of samples in microbial communities, applying them to sequences from mitochondrial (mt) genomes.

The large number of mitochondria in each cell ensures that mtDNA is enriched relative to nuclear genes. Depending on the species and the tissue used, some 0.2% to 1% of all reads in metagenomic sequencing studies are derived from mtDNA. Only this fraction is targeted for sequence analysis, after further enrichment with various procedures that exploit the greater AT content of mtDNA compared to nuclear DNA in insects.

Next-generation sequencing

Next-generation sequencing technology produces huge numbers of sequence reads, which permits cost-effective analysis of mtDNA even if mtDNA reads constitute only a small fraction of the total.
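As a rough illustration of this arithmetic, the sketch below estimates how many mtDNA reads a sequencing run might yield and the mean mitogenome coverage they would provide. The function name, the run size of 10 million reads, the 150 bp read length and the 16,000 bp mitogenome length are assumptions chosen for illustration, not project figures; only the 0.2% to 1% mtDNA fraction comes from the text.

```python
def expected_mt_coverage(total_reads, mt_fraction, read_length_bp, mitogenome_size_bp):
    """Return (expected mtDNA reads, mean coverage of a single mitogenome)."""
    if not 0.0 <= mt_fraction <= 1.0:
        raise ValueError("mt_fraction must be between 0 and 1")
    if mitogenome_size_bp <= 0:
        raise ValueError("mitogenome_size_bp must be positive")
    # Expected number of reads that originate from mtDNA.
    mt_reads = total_reads * mt_fraction
    # Mean coverage = total sequenced mt bases / mitogenome length.
    mean_coverage = mt_reads * read_length_bp / mitogenome_size_bp
    return mt_reads, mean_coverage


# Hypothetical run: 10 million reads of 150 bp, one 16 kb mitogenome.
for fraction in (0.002, 0.01):
    reads, coverage = expected_mt_coverage(10_000_000, fraction, 150, 16_000)
    print(f"mt fraction {fraction:.1%}: ~{reads:,.0f} mt reads, ~{coverage:,.0f}x mean coverage")
```

In a pooled sample this coverage is shared among all species present, so the coverage per species is correspondingly lower.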

Advantages over PCR-based technologies 

Compared with PCR-based technologies such as ‘DNA barcoding’ (the sequencing of Cytochrome Oxidase I), the approach avoids two problems:

  • The success of PCR amplification is uneven across the templates in the pool, resulting in incomplete or highly skewed inventories.
  • PCR introduces sequence errors that require the removal of suspect reads (‘denoising’), which further increases uncertainty about the data.

In contrast, direct counts of sequence reads without PCR better reflect the true diversity of a bulk sample, and read abundances should correspond closely to the biomass of each species in the mixture.
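The use of read counts as an abundance measure can be sketched as follows. This is a minimal illustration, not the project's pipeline: the function name, species labels and counts are hypothetical, and the sketch assumes that reads have already been assigned to species and that differences in mitogenome length or mitochondrial copy number between species can be ignored.

```python
def relative_read_abundance(read_counts):
    """Convert per-species mt read counts into proportions of the total."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads were assigned to any species")
    # Each species' share of all assigned reads, used here as a biomass proxy.
    return {species: count / total for species, count in read_counts.items()}


# Hypothetical counts of mt reads assigned to three species in a bulk sample.
counts = {"species_A": 1200, "species_B": 300, "species_C": 500}
for species, share in relative_read_abundance(counts).items():
    print(f"{species}: {share:.1%} of assigned mt reads")
```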

We employ a protocol that accommodates the need for de novo assembly as well as for identification against a reference database of mitogenome sequences.  

Reference mitogenomes can be obtained from pooled or individualized specimens either by long-range PCR or by direct sequencing from genomic DNA at high sequencing depth, at a cost of £20 with current technology.

The identification of specimens requires roughly 100-fold less sequencing depth than the assembly of reference mitogenomes and can be performed on very large pools of hundreds of species.

The approach can therefore easily be integrated with studies of taxonomic turnover along perturbation gradients or other functional studies of ecosystems that require species identifications for multiple samples.