Wednesday, 9 April 2008

Genomic outposts serve the phylogenomic pioneers: designing novel nuclear markers for genomic DNA extractions of Lepidoptera

This paper just came out:

Wahlberg, N. & C. Wheat. 2008. Genomic outposts serve the phylogenomic pioneers: designing novel nuclear markers for genomic DNA extractions of Lepidoptera. Systematic Biology, 57(2): 231-242. doi:10.1080/10635150802033006

List of vouchers.

Niklas Wahlberg and Chris Wheat describe a cool way to "easily" find new genes for phylogenetic inference. The authors ask how many genes are needed to get a robust phylogeny. Maybe the more, the merrier, but for butterflies at least, they say that between 3 and 5 genes should be enough to resolve most of the nodes. If you want to be sure about the relationships of ambiguous taxa, go for the full 11 genes.

From the abstract:
Increasing the number of characters used in phylogenetic studies is the next crucial step towards generating robust and stable phylogenetic hypotheses, i.e., strongly supported and consistent across reconstruction method. Here we describe a genomic approach to finding new protein-coding genes for systematics in nonmodel taxa, which can be PCR amplified from standard, slightly degraded genomic DNA extracts. We test this approach on Lepidoptera, searching the draft genomic sequence of the silk moth Bombyx mori, for exons >500 bp in length, removing annotated gene families, and compared remaining exons with butterfly EST databases to identify conserved regions for primer design. These primers were tested on a set of 65 taxa primarily in the butterfly family Nymphalidae. We were able to identify and amplify six previously unused gene regions (Arginine Kinase, GAPDH, IDH, MDH, RpS2, and RpS5) and two rarely used gene regions (CAD and DDC) that when added to the three traditional gene regions (COI, EF-1α and wingless) gave a data set of 8114 bp. Phylogenetic robustness and stability increased with increasing numbers of genes. Smaller taxonomic subsets were also robust when using the full gene data set. The full 11-gene data set was robust and stable across reconstruction methods, recovering the major lineages and strongly supporting relationships within them. Our methods and insights should be applicable to taxonomic groups having a single genomic reference species and several EST databases from taxa that diverged less than 100 million years ago.
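
To make the first filtering steps from the abstract a bit more concrete, here is a minimal Python sketch. It is not the authors' pipeline: the `Exon` record, its `in_gene_family` flag, and the toy gene names are all made up for illustration. It only covers the ">500 bp" length cut-off and the removal of annotated gene families; the comparison against butterfly EST databases is left out.

```python
from dataclasses import dataclass

MIN_EXON_LENGTH = 500  # bp; the ">500 bp" threshold mentioned in the abstract


@dataclass(frozen=True)
class Exon:
    """Hypothetical record for one annotated exon (not a real file format)."""
    gene_id: str
    length_bp: int
    in_gene_family: bool  # assumed flag: True if annotated as part of a gene family


def candidate_exons(exons, min_length=MIN_EXON_LENGTH):
    """Keep exons strictly longer than min_length that are not in a gene family."""
    return [
        e for e in exons
        if e.length_bp > min_length and not e.in_gene_family
    ]


# Toy input; the names and lengths are invented for the example.
exons = [
    Exon("geneA", 820, False),
    Exon("geneB", 450, False),   # too short
    Exon("geneC", 1200, True),   # long enough, but in a gene family
    Exon("geneD", 501, False),
]

selected = candidate_exons(exons)
print([e.gene_id for e in selected])
```

In a real workflow the surviving exons would then be compared with EST sequences to find conserved stretches for primer design, which this sketch does not attempt.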