So far, most of my research with bacteria has been experimental—experimental in the sense that I manipulate the genes or environment of bacteria in the laboratory and look at how those manipulations affect fitness, population dynamics, and evolution. One of the great strengths of experimental science is that it lets you change one variable at a time and keep everything else constant. That way, you can be sure that your results are caused by that variable and not something else. And because microbes evolve so quickly, you can use experiments to directly test the predictions of evolutionary theory. All these things are great.
One of the big disadvantages of the experimental approach, though, is that it tells you how evolution can happen—not necessarily how it does happen in the natural world. The best we can do in the lab is often still very different than an organism’s natural habitat. Experimental approaches also don’t tell you how general a result is. When you get a result, you hope that it holds for other bacteria in other environments, but there’s no guarantee that it should be so. Only by repeating those experiments in other systems do you get an idea about generality—and no one wants to perform, publish, or fund work that’s just repeating what other people did and getting the same results.
For these reasons, I’ve become more interested in molecular evolution. The information in DNA and protein sequences reflects the actual evolutionary history of organisms in their real-world environment. It’s hard to observe or experiment on microbes in their natural habitat, but it’s not that hard to look at their DNA. And there’s lots of sequence data already available. Of course, sequence analysis has its disadvantages, too. Correlation is not causation, so if you see that two things are associated with each other it’s always possible that the real cause is some unknown third thing. It can also be difficult to exclude alternative explanations for results. Some people in my field consider these problems serious and dismiss sequence-based studies as “retrospective evidence” and thus inferior to “prospective” experimental studies. But the way I see it, we have so much of this data these days, so why not use it? Anything we can use to better understand how the natural world works is a good thing, in my book. Why can’t experimental and sequence-based approaches complement each other?
This has been on my mind recently after reading an interesting paper by Fidelma Boyd, Salvador Almagro-Moreno, and Michelle Parent.
Bacterial genomes are fluid things. Something like 30% of the genes in an E. coli cell may not even be present in the E. coli cell next to it. Often these differences in gene content are viruses lying dormant in the genome, waiting for the right trigger to emerge and find a new host. In other cases, they are clusters of genes called genomic islands that kind of look like viruses (they have a few genes with similar sequences) but don’t seem to have all the pieces necessary to make viruses on their own. What are they doing there? Microbiologists are interested in genomic islands because, aside from containing virus-like genes, they often also have genes that make bacteria more harmful or resistant to antibiotics.
Phage P2 (right) and its freeloader P4 (left). © Institute for Molecular Virology, U. Wisconsin-Madison.
There are at least two possible answers. One is that genomic islands are degraded phage (viruses of bacteria). They were once infectious, but at some point mutation inactivated one or more genes necessary for that lifestyle. Now, as the mutations continue to accumulate, they’re sliding toward evolutionary oblivion and their own inevitable deletion. Another possibility is that genomic islands are mobilizeable. This means that they can’t make phage particles on their own, but they can use the proteins made by other phage in the same cell. They’re a kind of freeloader. Phage P4 is a well-known example.
How do we tell? Boyd and coauthors addressed this question using the tools of molecular evolution. If genomic islands are degraded phage, phylogenetic trees made from their protein and DNA sequences should show genomic islands scattered among the other phage. Because they’re degraded and nonfunctional, they’d be recent derivatives and wouldn’t persist long over evolutionary time, so all the branches leading to genomic islands would be near the tips of the tree. If, on the other hand, genomic islands are mobilizeable and have a long evolutionary history of freeloading on self-sufficient phage, then phylogenetic trees should show them clustering together on their own branch.
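To make the two predictions concrete, here is a minimal Python sketch on toy trees. Everything in it is an assumption for illustration: the tip names are made up, the trees are hand-written nested tuples rather than anything inferred from real integrase sequences, and the trees are treated as rooted. It only checks whether a set of tips sits together on one branch of its own.

```python
# Toy rooted trees as nested tuples; leaves are strings.
# All tip names here are hypothetical, not taken from the paper.

def leaves(tree):
    """Return the set of tip names under a (sub)tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def is_clade(tree, group):
    """True if some subtree contains exactly the tips in `group`."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_clade(child, group) for child in tree)


islands = {"island_A", "island_B", "island_C"}

# Degradation prediction: each island is a recent offshoot of a different phage.
scattered = ((("phage_1", "island_A"), ("phage_2", "island_B")),
             ("phage_3", "island_C"))

# Mobilizable prediction: islands share one old branch of their own.
clustered = ((("island_A", "island_B"), "island_C"),
             (("phage_1", "phage_2"), "phage_3"))

print(is_clade(scattered, islands))  # False
print(is_clade(clustered, islands))  # True
```

A real analysis would start from a tree estimated from aligned sequences and would also weigh branch support and rooting, which this toy check ignores.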
Boyd and coauthors made phylogenetic trees using the sequences of integrase (Int) genes from many different genomic islands and phage. They found that virtually all the genomic islands clustered together in their own branch that included P4. This evidence is consistent with the mobilizeable hypothesis and inconsistent with the degradation hypothesis. Boom—we’ve managed to exclude one hypothesis, the other one survived an empirical test, and we’ve made a tiny step forward in understanding the natural world. Science in action.
I wish Boyd and company had tested another prediction of the degradation hypothesis: that degraded phage should show evidence of relaxed selection. Once phage get inactivated, natural selection no longer weeds out harmful mutations in their sequences. One kind of evidence for relaxed selection is a larger fraction of pseudogenes: sequences of DNA that used to be genes but are now prematurely truncated or frameshifted so that they no longer make functional proteins. Another is that a larger share of the DNA sequence changes should alter the protein sequence (a higher dN/dS ratio, for those who know such things). Not finding these things, or at least putting upper limits on how much they occur, would be another strike against the degradation hypothesis and more support for the mobilizeable hypothesis. The data’s already there; the analysis just needs to be done.
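For readers who want to see what those two signatures look like in practice, here is a rough Python sketch. The sequences are invented, the function names are my own, and the second function only tallies raw synonymous versus nonsynonymous codon differences. A proper dN/dS estimate also normalizes by the number of synonymous and nonsynonymous sites and corrects for multiple changes at the same site, so treat this as an illustration of the idea rather than the real calculation.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def looks_like_pseudogene(seq):
    """Flag a coding sequence with a broken frame or an internal stop codon."""
    if len(seq) % 3 != 0:
        return True  # length is not a whole number of codons
    # Check every codon except the last, which is expected to be a stop.
    internal = [seq[i:i + 3] for i in range(0, len(seq) - 3, 3)]
    return any(CODON_TABLE[codon] == "*" for codon in internal)


def classify_differences(seq_a, seq_b):
    """Count differing codons as (synonymous, nonsynonymous).

    Assumes the two sequences are already aligned in frame, without gaps.
    """
    syn = nonsyn = 0
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Hypothetical toy sequences, not real integrase genes.
intact = "ATGGCTAAATAA"     # Met Ala Lys stop
truncated = "ATGTAAAAATAA"  # premature stop at the second codon
variant = "ATGGCCAGATAA"    # GCT>GCC is silent, AAA>AGA changes Lys to Arg

print(looks_like_pseudogene(intact))          # False
print(looks_like_pseudogene(truncated))       # True
print(classify_differences(intact, variant))  # (1, 1)
```

Under relaxed selection you would expect more sequences flagged by the first function and a higher proportion of nonsynonymous changes from the second, compared with phage genes that are still under selection.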
It’s also weird that this paper is published as a review article rather than a peer-reviewed results paper in a molecular evolution journal. Because it isn’t a results paper, and because it glosses over many of the details of the phylogenetic analysis, I find myself taking the results with a grain of salt. Hopefully this work can at some point be redone or extended so I can be more confident in the results.
In any case, this is an example of how sequence analysis lets us get at an evolutionary question—how does natural selection act on genomic islands?—that can’t be answered by experiments alone. We need both types of data. The experiments show us that mobilization can happen and the sequences show us that these elements have been persisting and evolving just fine without their own phage-producing genes.