The diversity of soil bacteria is simply staggering. According to Torsvik and
colleagues (Torsvik et al. 1990; Torsvik and Ovreas 2002), species diversity in soil samples is so high that our most powerful methods of estimation provide only the crudest measure of its magnitude. Nonetheless, many such estimates exist, and they suggest that a single gram of soil may contain over 10 billion microbial cells and more than 1,800 bacterial species (Torsvik and Ovreas 2002; Gans et al. 2005; Zhang et al. 2008). An equally compelling estimate is provided by Dykhuizen (Dykhuizen and Dean 2004), who examined levels of genetic diversity in soil bacteria and predicted that 30 g of forest soil contains over half a million species! As our methods of empirically estimating bacterial diversity improve, so too do our methods of mathematically refining these numbers. For example, some analytical methods now take into account the fact that there are very few common soil species and untold numbers of rare species (Youssef and Elshahed 2009; Schloss and Handelsman 2006). These refinements push our estimates of bacterial diversity
to over a million species per gram of soil (Gans et al. 2005). To put these numbers into perspective, similar studies of the human gastrointestinal tract predict a mere 400 distinct bacterial phylotypes (Rajilic-Stojanovic et al. 2007). This staggering level of species diversity is even more remarkable when one considers that the number of prokaryotes reported in the National Center for Biotechnology Information molecular database is only about 15,111 (Sayers et al. 2010). Clearly, the soil represents a vast reservoir of untapped bacterial diversity. How we will classify this newfound diversity remains an open question. As molecular methods, such as whole genome sequencing, are more widely applied to characterize bacterial diversity, our ability to make taxonomic sense of what we learn is severely challenged. The focus of this review is to explore how we can employ population- and species-level comparative genomics to provide a rational
basis for identifying, and even naming, evolutionary “lineages.” In essence, we want to know whether a functional and useful bacterial species concept emerges from the burgeoning genomic information overload.