American Journal of Botany 1 01 ( 10) : 1666 – 1685, 2014 . S PECIAL INVITED PAPER P HYLOGENOMICS OF THE CARROT GENUS ( DAUCUS, APIACEAE)1 CARLOS ARBIZU , H OLLY RUESS, DOUGLAS S ENALIK, PHILIPP W . SIMON , AND DAVID M. SPOONER2 U . S. Department of Agriculture, Agricultural Research Service, Vegetable Crops Research Unit; and Department of Horticulture, University of Wisconsin, 1575 Linden Drive, Madison, Wisconsin 53706-1590 USA • Premise of the study: We explored the utility of multiple nuclear orthologs for the taxonomic resolution of wild and cultivated carrot, Daucus species. • Methods: We studied the phylogeny of 92 accessions of 13 species and two subspecies of Daucus and 15 accessions of related genera (107 accessions total) with DNA sequences of 94 nuclear orthologs. Reiterative analyses examined data of both alleles using ambiguity codes or a single allele with the highest coverage, trimmed vs. untrimmed homopolymers; pure exonic vs. pure intronic data; the use of all 94 markers vs. a reduced subset of markers; and analysis of a concatenated data set vs. a coalescent (species tree) approach. • K ey results: Our maximum parsimony and maximum likelihood trees were highly resolved, with 100% bootstrap support for most of the external and many of the internal clades. They resolved multiple accessions of many different species as monophy- letic with strong support, but failed to support other species. The single allele analysis gave slightly better topological resolution; trimming homopolymers failed to increase taxonomic resolution; the exonic data had a smaller proportion of parsimony-informative characters. Similar results demonstrating the same dominant topology can be obtained with many fewer markers. A Bayesian concordance analysis provided an overall similar phylogeny, but the coalescent analysis provided drastic changes in topology to all the above. • Conclusions: Our research highlights some diffi cult species groups in D aucus and misidentifi cations in germplasm collections. It highlights a useful subset of markers and approaches for future studies of dominant topologies in D aucus. Key words: Apiaceae; carrot; D aucus ; germplasm; next-generation sequencing; phylogenomics. U ntil very recently, it was diffi cult to imagine the availability that “incorrect” or “noisy” phylogenetic signal was overcome of genomic information with unprecedented amounts of data by huge data sets obtained from concatenation of many loci, for a wide array of organisms ( Delsuc et al., 2005 ; R okas and leading to strongly supported phylogenetic species trees ( Chen Carroll, 2006 ; McCormack et al., 2013 ). However, recent de- and Li, 2001 ; Rokas et al., 2003; Christelová et al., 2011; Blair velopments of next-generation sequencing technologies now et al., 2012 ; L ang et al., 2013 ; S alichos and Rokas, 2013) . How- make it possible to sequence millions of bases in a single ex- ever, even though combining data from multiple genes can re- periment at a relatively low cost ( Egan et al., 2012 ; Soltis et al., sult in strongly supported phylogenetic resolution, assuming a 2013 ), ushering in “phylogenomics” ( Delsuc et al., 2005 ; Mc- single divergent history may undermine interpretation of the Cormack et al., 2013 ), that we use here to refer to the use of phylogeny on a combined gene tree ( Kolaczkowski and Thornton, genome-scale genetic data for phylogenetic analyses. 2004 ; Lewis et al., 2005 ; M ossel and Vigoda, 2005 ). P hylogenomics is a new fi eld with as yet many unexplored B iological explanations were proposed for gene tree discor- applications and potential constraints. For example, the recon- dances, such as coalescent stochasticity ( Takahata, 1989 ), the ciliation of well-supported species trees is a primary interest in movement of genes among species by hybridization and intro- systematics ( Blair and Murphy, 2011) , but since phylogenetics gression ( Rieseberg et al., 2000) , horizontal gene transfer is moving away from single-locus to multilocus analyses ( Doolittle, 1999) , gene duplication (P age and Charleston, 1997) , ( Edwards, 2009 ), debates on gene tree discordance are becom- and incomplete lineage sorting (P amilo and Nei, 1988) . Baum ing more common. For many years, the alternative to deal with (2007) proposed a “primary concordance tree” as a valuable discordance in multilocus data was concatenation (R okas et al., summary of the dominant phylogenetic history among a group 2005 ; D unn et al., 2008 ; Schierwater et al., 2009) , with an idea of organisms. He defi ned the dominant phylogenetic history as the tree composed of clades with a higher concordance factor 1 Manuscript received 11 March 2014; revision accepted 17 June 2014. than any contradictory clade. We use the term “dominant topol- The authors thank David Baum who advised C.A. in earlier stages of this ogy”, as determined by our concatenated data set. Also, this tree research during his course “Phylogenetic Analysis of Molecular Data”, should provide a useful estimate of the primary history and the Cécile Ané for advice and assistance in running “BUCKy”, and Kathleen degree of reticulation/divergence at various points in that his- Reitsma and staff at the North Central Regional Plant Introduction Station tory. Baum (2007) also indicated that clades on concordance in Ames Iowa for advice and assistance in obtaining the germplasm examined here. trees can be annotated with their concordance factor (CF), the 2 Author for correspondence (e-mail: david.spooner@ars.usda.gov) proportion of the genome for which the clade is true. The CF can be estimated from population histories or from multilocus doi:10.3732/ajb.1400106 molecular data sets. American Journal of Botany 1 01 ( 10 ): 1 666 – 1685 , 2 014; http://www.amjbot.org/ © 2014 B otanical Society of America 1666 October 2014] ARBIZU ET AL.—PHYLOGENOMICS OF DAUCUS 1667 The Apiaceae (Umbelliferae) family contains 455 genera and fertile intercrosses of cultivated carrot and D . sahariensis (un- over 3500 species, and is one of the largest families of seed published data). The haploid chromosome number for D aucus plants (P imenov and Leonov, 1993 ). The genus D aucus con- ranges from n = 9 to n = 11. Diploid numbers range from 2n = tains carrot (D aucus carota L. subsp. sativus Hoffm.), which is 18, 20, and 22, but two tetraploid species have been reported the most notable cultivated member of Apiaceae in terms of ( Grzebelus et al., 2011 ). The four species with 2 n = 18 ( D. economic importance and nutrition. Cultivated carrot is grown carota all subspecies, D . capillifolius, D . sahariensis , D . syrti- on an estimated 1.2 million ha annually worldwide (carrots and cus) are clearly interrelated based on shared karyotypes (I ovene turnips as aggregated data) ( FAO, 2012 ), with an annual crop et al., 2008 ). Results from our recent morphological studies value of about $640 M in the United States for fresh and pro- ( Spooner et al., 2014 ) caused us to question the many wild sub- cessing carrots ( USDA National Agricultural Statistics Service, species and suggest that there may be only two wild subspecies 2012 ). It is the single, largest primary source of vitamin A pre- of carrot, D . carota subsp. c arota and subsp. g ummifer. cursors and phytonutrients and is particularly benefi cial for The present study comprises 97 accessions for which 94 nu- human nutrition. The orange carotenoids of carrot, α - and clear orthologous genes were sequenced here, and we later added β- carotene, are vitamin A precursors and make carrot the larg- sequences for 10 accessions with a subset of these 94 nuclear est single source of provitamin A in the U. S. diet, accounting orthologs as described in the methods. The genes are distributed for about half of dietary intake (S imon et al., 2009 ). The eco- along all nine chromosomes of cultivated carrot (D . carota subsp. nomic importance of carrot stimulates research into breeding to s ativus ). Orthologs are genes derived from a single ancestral gene feed a constantly growing population, to guarantee food secu- in the last common ancestor of the target species (K oonin, 2005) . rity, and to adapt to climate change. Wild D aucus species may Phylogenetic studies rely on the identifi cation of true orthologs in play an important role in this process, providing genes that can diverse angiosperms. Nuclear ortholog markers have great poten- be used for breeding purposes such as pest and disease toler- tial utility in further studies on comparative genomics and phylo- ance or resistance, yield increase, male sterility, nutraceutical, genetics ( Fulton et al., 2002; Li et al., 2008; Levin et al., 2009; and culinary traits, among others. A better understanding of the R odríguez et al., 2009 ; C ai et al., 2012 ). The goals of our study species boundaries and phylogenetic relationships of D aucus were: (1) to compare the results from maximum parsimony, will play a crucial role in future breeding programs. maximum likelihood, and Bayesian concordance analyses, (2) T he taxonomic distinction and phylogenetic relationships to examine the effect of concatenated data vs. a coalescent (spe- among species of genus D aucus are not clear, even though there cies tree) analyses, and (3) to evaluate the potential of multiple have been studies of its morphology, anatomy and biochemistry nuclear orthologs using next-generation technologies to resolve ( Vivek and Simon, 1999 ), and phylogeny. Many generic bound- the phylogenetic relationships of D aucus . aries within the Apiaceae are unnatural as documented by mo- lecular investigations based on DNA sequences from nuclear ribosomal internal transcribed spacers, plastid r poC1 intron and M ATERIALS AND METHODS r pl16 intron sequences, plastid matK -coding sequences, plastid DNA restriction-site data, and DNA sequences from nuclear Plant species —W e examined 92 accessions of 13 D aucus species and two orthologs ( Plunkett et al., 1996 ; Downie et al., 2000; Lee and subspecies and 15 accessions of 9 species of non-D aucus genera (107 accessions Downie, 2000 ; Spalik and Downie, 2007; Spooner et al., 2013 ). in total) collected from around the world ( Table 1 ). We sampled the species Molecular data from these studies place some species from the diversity as widely as possible, based on the availability of germplasm acces- genera A grocharis , A thamanta , Cryptotaenia , Margotia , M ela- sions. This availability left 12 Daucus species unsampled: D . arcanus García-Martín and Silvestre (Spain), D . biseriatus Murb. (Algeria), D . conchitae Greuter noselinum, Monizia , Pachyctenium , Pseudorlaya, and Torna- (Greece), D . durieua Lange (Mediterranean), D . gracilis Steinh. (Algeria), D. benea within a monophyletic D aucus clade. hochstetteri A. Braun ex Drude (Eritrea, Ethiopia), D . jordanicus Post (Libya, T he latest genus-level treatment available using a morpho- Israel, Jordan), D . microscias Bornm. and Gauba (Iran, Iraq), D. montanus anatomical classifi cation is reported by S áenz Laín (1981) who Humb. and Bonpl. ex Schult. (Central and South America), D . reboudii Coss. recognized 20 species divided into fi ve sections: D aucus L. (Algeria, Tunisia), D . setifolius Desf. (Algeria, Morocco, Tunisia, Portugal, (12 species), A nisactis DC. (three species), P latyspermum DC. Spain), and D . virgatus (Poir.) Maire (Algeria, Tunisia). When germplasm was available, we examined more than one accession of the same species. All acces- (three species), C hrysodaucus Thell. (one species), and Me- sions were obtained from the United States National Plant Germplasm System, oides Lange (one species). R ubatzky et al. (1999) later esti- with D aucus maintained at the North Central Regional Plant Introduction Station mated 25 species of D aucus. The genus D aucus has a center of in Ames, Iowa. Full details of the collections are available at the Germplasm Re- endemism in the Mediterranean, with several species occurring sources Information Network (http://www.ars-grin.gov/npgs/acc/acc_queries.html). in North America, South America, and Australia ( Sáenz Lain, Vouchers are maintained at the Potato Introduction Station Herbarium (PTIS). 1981 ). Spalik et al. (2010) provided a biogeographic analysis of Many of the genera mentioned above in the D aucus clade were not available as Daucus with dates for radiations from the Mediterranean region. germplasm, which precluded us from obtaining suffi cient quantity and quality of DNA for our study. All examined materials are wild taxa except one cultivated D aucus carota L. subsp. carota is the best-known wild species accession, D . carota subsp. s ativus (T able 1 ). within carrots (B randenburg, 1981 ). The cultivated carrot, D. carota subsp. s ativus , was fi rst domesticated from wild populations of D . Data set— Figure 1 visually summarizes all procedures described below. A carota subsp. carota from Central Asia (I orizzo et al., 2013) . data set was created from the aligned DNA sequences generated by a Roche The taxonomy of D . carota L. is particularly problematical. (Basel, Switzerland) 454 GS FLX+ Platform. Initially, we examined 102 con- It undergoes widespread hybridization experimentally and served nuclear ortholog markers from 97 accessions. These nuclear orthologs spontaneously with commercial varieties and other named sub- were identifi ed by following a protocol developed by Wu et al. (2006) . Ex- species (K rickl, 1961 ; Saenz de Rivas and Heywood, 1974 ; pressed sequence tags (ESTs) of A rabidopsis thaliana (hereafter, A rabidopsis) , carrot, sunfl ower, and lettuce were obtained from different public sources. Ara- McCollum, 1975 , 1977; Umiel et al., 1975; W ijnheijmer et al., bidopsis sequences were obtained from a copy of the TAIR10 assembly at 1989 ; S t. Pierre et al., 1990; Ellis et al., 1993; Steinborn et al., PlantGDB. A set of 41 671 Arabidopsis sequences was downloaded from the 1995 ; V ivek and Simon, 1999 ; Nothnagel et al., 2000; H auser following website: http://www.plantgdb.org/download/Download/xGDB/ and Bjørn, 2001 ; Hauser, 2002) . Coauthor Simon has obtained AtGDB/ATtranscriptTAIR10 . Carrot ESTs were obtained from Additional File 2 1668 AMERICAN JOURNAL OF BOTANY [Vol. 101 of Iorizzo et al. (2011) . Only assembled contigs were used; unassembled Sanger Sequence analysis —M IRA assembled one or (more commonly) two al- reads were excluded, resulting in a set of 58 751 sequences. leles. These alleles can differ by one or many single nucleotide polymorphisms S unfl ower and lettuce sequences were obtained from The Compositae (SNPs) or indels. Only D . glochidiatus is tetraploid, but it exhibited low allelic Genome Project website at http://compgenomics.ucdavis.edu/. A set of variation similar to the diploids. In some cases, more than two alleles were 31 605 H elianthus annuus ESTs was downloaded from http://cgpdb.ucdavis. found with our coverage cutoff of 20 for useable data. However, in every case, edu/asteraceae_assembly/data_assembly_fi les/GB_ESTs_Feb_2007.sp.Heli_ these low coverage “extra” alleles differed in only minor ways (only 1–5 bp) annu.clean.assembly. In addition, a set of 26 720 lettuce ESTs was downloaded from the two higher coverage alleles and were discarded from further analysis. from http://cgpdb.ucdavis.edu/asteraceae_assembly/data_assembly_fi les/GB_ Two methods were used to process the information provided by the heterozy- ESTs_Feb_2007.sp.Lact_sati.clean.assembly . These sequence sets were gous allele state. One method was to construct a single consensus sequence each aligned with each other using the program blastn version 2.2.25 using IUPAC degenerate nucleotide ambiguity codes. A second method was to (C amacho et al., 2009 ) with a maximum expected value of 1e-10 and low select the one allele per accession that had the highest average read coverage. complexity fi ltering by DUST. Two sets of three species were aligned in all DNA sequences from these individual genes of the single allele with the highest pairwise combinations to detect reciprocal best matches (RBM). The com- coverage are deposited in GenBank ( Table 2; Appendix S1; see Supplemental parison between A rabidopsis , carrot, and sunfl ower resulted in 4023 RBM, Data with the online version of this article), and the aligned database is depos- and the Arabidopsis , carrot and lettuce comparison resulted in 5180 RBM. ited in the TreeBase repository ( http://purl.org/phylo/treebase/phylows/study/ Sequence sets were also aligned to themselves, and sequences were desig- TB2:S15477?x-access-code=52b70011707357994e61de7d36a88e63&format=html, nated as single-copy genes when there were no blast alignments to other submission ID: 15477). We concatenated the 94 genes (see Results) into a sin- sequences within the same set. The two RBM sets were then further reduced gle alignment and analyzed these two data sets (single vs. two alleles) for all to contain only sequences which were found to be single-copy genes in all 107 species, resulting in an aligned length of 112 002 bp for the data set of one three of the species making up the set. The set containing sunfl ower yielded allele only, and 116 652 bp for the data set where two alleles were merged. 71 sequences, and the set containing lettuce yielded 92 sequences; the two sets combined yielded 128 unique sequences. The carrot sequences passing Phylogenetic analyses— W e chose the Roche 454 platform to obtain long these steps were used for primer design. For each identifi ed gene, A rabidop- reads, but according to Margulies et al. (2005), this platform produces unreli- sis EST, sunfl ower and/or lettuce EST, carrot EST, and carrot whole genome able sequence for homopolymers over eight base pairs. We encountered diffi - sequence (WGS) ( Iorizzo et al., 2014 ) were aligned using the program Mac- cult and ambiguous alignments with homopolymers of bases A (adenine) and T Clade version 4.08a ( Maddison and Maddison, 2005 ). It was possible to (thymine) up to 16 bases long. Long homopolymers were also encountered in determine the exonic and intronic regions of each gene, and with the use of the carrot genome by I orizzo et al. (2011). James Speers and Xiao Liu (personal the WGS of carrot, estimation of intron sizes were obtained. We designed communication, University of Wisconsin-Madison, Biotechnology Center) sug- most of the nuclear orthologs to capture sequences of 500–700 bp and more gested that homopolymers over six bases long are unreliable. Hence, in our present than 60% intron content. Primers were designed selecting regions that were study, we shortened homopolymers to a maximum of six using MacClade. identical in sequence between all species and with a maximum match of the We rooted our trees on Oenanthe , based on Downie et al. (2000) . We fi rst 3 ′ end of the primer between all sequences (usually at least 5 bp), a melting performed maximum parsimony (MP) analyses of 94 markers and 97 acces- temperature of around 55 ° C, and GC content between 40–60%. Primers sions, comparing a data set of a single allele with the highest coverage with were checked for melting temperature, hairpins, and self-dimers using the homopolymers shortened to a maximum of six, to unmodifi ed homopolymers. program OligoAnalyzer version 3.1 (O wczarzy et al., 2008 ). After we initiated this work, we obtained 10 accessions important for our analy- Using these criteria, we designed 102 marker primer pairs with an expected sis from fi eldwork in Tunisia and Morocco that were not initially available amplicon size of 427–777 bp based on a draft carrot genomic sequence, realiz- ( Table 1) . We performed a MP analysis of each marker separately and identi- ing that some species in this study could fall outside of this range. Sixty-nine fi ed by visual inspection 10 markers that best approached the topology of the markers contain more than 60% intron content, 23 have 26–60% intron content; concatenated data sets. Based on this analysis, we performed MP analyses add- 10 were designed to consist entirely of exons. Primers were evaluated for func- ing these 10 additional accessions (107 accessions total) to the concatenated tionality and expected fragment size using the inbred line B493 of D aucus data set but with DNA sequences of these 10 markers obtained with the dideoxy carota subsp. c arota . We performed a clean PCR (minimizing the unused re- chain termination technique (S anger et al., 1977 ). We next performed a MP agents at the end) of genomic DNA of all our accessions with these primers, analysis of 94 markers and 107 accessions comparing a data set of one allele then evaluated the success of amplifi cation and actual size in a 1.5% agarose gel only chosen by highest coverage, with a data set of a two alleles merged into using standard methods. one using ambiguity codes. Quantifi cation of all amplifi cations was performed using a Quant-iT Pico- E ach study group will have different levels of species divergence depending Green dsDNA Assay Kit (Invitrogen, Paisley, UK). For each of the 97 acces- on the ingroup and outgroup variation and may require different proportions of sions, equal amounts of product from 102 nuclear ortholog marker amplifi cations intronic markers (that are more useful for lower divergence) vs. exonic markers were pooled. Each of these 97 pools was individually purifi ed with magnetic (useful for greater divergence). To explore our choice of markers in our study, beads to remove the PCR reaction components. For each of the 97 pools, we we performed a MP analysis of 94 markers and 107 accessions of a data set ligated one of the 12 Roche MIDs (multiplex identifi ers) to the pool to barcode using the single allele of the pure intronic regions vs. pure exonic regions. the single accession. This reaction was cleaned with magnetic beads again. We A ll MP analyses were conducted in PAUP* version 4.0a131 (Phylogenetic then quantifi ed the PCR fragments that were successfully ligated to MIDs using Analysis Using Parsimony; Swofford, 2002 ). Question marks and blank spaces RL (Roche Library, a fl uorescent tag that is a Roche proprietary product) at- were treated as missing data and gaps, respectively. All characters were treated as tached to the MID using the same machine for Picogreen quantifi cation. Pools unordered and weighted equally ( Fitch, 1971) . The most parsimonious trees were for sequencing consisted of 4–6 accession pools, themselves pooled following found using a heuristic search ( Farris, 1970 ) by generating 100 000 random-addi- the Rapid Library Preparation Method Manual (R oche, 2010 ). Final pools were tion sequence replicates and one tree held for each replicate. Branch swapping sent to the University of Wisconsin-Biotechnology Center where libraries were pre- used tree-bisection reconnection (TBR) retaining all most parsimonious trees. pared using the em-PCr Method Manual-Lib-A SV ( Roche, 2009) , and sequenced on Then, we ran a fi nal heuristic search of the most equally parsimonious trees from a Roche GS FLX+ instrument. We chose the Roche 454 sequencing platform this analysis using TBR and MULPARS. Bootstrap values ( Felsenstein, 1985) for because it provided longer read lengths than available with other technologies the clades were estimated using 1000 replicates with simple addition sequence, ( Shendure and Ji, 2008; Egan et al., 2012 ). setting MAXTREES to 1000. Raw sequence data were parsed by barcode to separate reads from each ac- M aximum likelihood (ML) phylogenetic analysis initially was attempted cession, and vector sequence and barcodes were removed. Reads for each ac- after selecting the best-fi t evolutionary models for the individual gene sequence cession were assembled with the program MIRA version 3.4.0 (C hevreux et al., data ( Table 2 ) with model selection computed using the Akaike information 1999) . Average read coverage was determined for each contig/accession com- criterion (AIC), using jModelTest version 2.1.3 ( Darriba et al., 2012 ). With bination, i.e., the average number of sequence reads covering each nucleotide these models, we attempted to get a ML tree with the program GARLI version of the assembled sequence. Those contigs with average read coverage below 2.0 (Genetic Algorithm for Rapid Likelihood Inference; Zwickl, 2006 ). How- 20 were removed. Assembled contigs were matched with the appropriate nuclear ever, this was impossible to run with our large data set (111 166 bp) due to time ortholog marker using the program MUMmer version 3.0 (K urtz et al., 2004 ). limits, estimated to be several years using our Dell PC with 16 GB memory and For each nuclear ortholog marker, DNA sequences from all accessions were a 3.4 GHz Intel Core i7-2600 processor. Alternatively, we obtained a ML tree aligned using the program MUSCLE version 3.8.31 (E dgar, 2004) , and further with the program RAxML version 8.0.0 (Randomized Accelerated Maximum manual alignment corrections were performed using MacClade. Likelihood; Stamatakis, 2014 ), using GTR+G model and estimating individual October 2014] ARBIZU ET AL.—PHYLOGENOMICS OF DAUCUS 1669 T ABLE 1. Accessions examined in this study. Tentative new Taxon and 2 n chromosome numbera identifi cations Accessionb Location or sourcec Ingroups Daucus aureus Desf. (22) PI 295854 Israel. Wadi Rubin (HaMerkaz). D. aureus PI 319403 Israel. Mediterranean Region. D. aureus +# PI 478858 France. Dijon. D. broteri Ten. (20) + D. guttatus 1 PI 652233 Iran. Mazandaran: Dhalus Road, Dasht-e Nazir, Kandalus. D. broteri +# D. guttatus 2 PI 652329 Greece. Peloponnese: 4 km from Skoura, toward Leonidion, Laconia Prefecture. D . broteri +# D. guttatus 1 PI 652340 Syria. Kassab. D . guttatus Sibth. and Sm. (20) + D. guttatus 1 PI 652343 Syria. Halwah. D . broteri + D. guttatus 3 PI 652367 Turkey. Mugla. D. capillifolius Gilli (18) + PI 279764 Libya. Near Jefren. D. capillifolius Ames 30198 Tunisia. Medenine. D. capillifolius # Ames 30202 Tunisia. Medenine. D . capillifolius + Ames 30207 Tunisia. Medenine. D. carota L. subsp. carota Ames 25017 Germany. Saxony-Anhalt. (18, all subspecies) # D. carota subsp. carota + Ames 26393 Portugal. Castelo Branco. D. carota subsp. carota Ames 26394 Portugal. Portalegre near Monforte. D. carota subsp. c arota Ames 26401 Portugal. Portalegre near Monforte. D. carota subsp. carota Ames 26408 Portugal. Beja. D. carota subsp. carota Ames 27397 Uzbekistan. Between Yalangoch and Sobir Raximova. D . carota subsp. carota Ames 30250 Tunisia. Nabuel: along Route 28 at junction of road to Takelsa. D. carota subsp. c arota Ames 30251 Tunisia. Nabuel: Route 26, between Takelsa and El Haouaria, 26 km from El Haouaria. D . carota subsp. carota # Ames 30252 Tunisia. Nabuel: Sidi Daoud, 1 km from Route 27. D. carota subsp. carota Ames 30253 Tunisia. Nabuel: between El Haouarcae and Dor Allouche. D. carota subsp. carota Ames 30254 Tunisia. Nabuel: between El Haouarcae and Dor Allouche. D. carota subsp. c arota + Ames 30255 Tunisia. Nabuel: along road between Korba and Beni Khalled. D. carota subsp. c arota Ames 30259 Tunisia. Bizerte: south side of Ischkeul. D. carota subsp. c arota Ames 30260 Tunisia. Bizerte: along Route 51, west of Ghzab. D. carota subsp. carota Ames 30261 Tunisia. Bizerte: grounds of Direction Regionale Mogods, Khroumerie Sejnane. D. carota subsp. carota Ames 30262 Tunisia. Beja: road from Route 7, just west of Sejnane to Cap Negro. D. carota subsp. carota * Ames 31570 Morocco. Larache: approximately 10 kilometers south of Larache, Laouamra Region. D. carota subsp. c arota # PI 274297 Pakistan. Northern areas. D. carota subsp. carota PI 279759 Spain. Madrid (Botanic Garden). D. carota subsp. carota PI 279762 Source: Denmark. Copenhagen. D. carota subsp. carota PI 279775 Source: Hungary. Pest. Botanical Garden. D . carota subsp. sativus # PI 279777 Source: Egypt. Giza: Orman Botanic Garden. D. carota subsp. carota # PI 279788 Austria. Vienna. D. carota subsp. carota PI 279798 Spain. Madrid. D. carota subsp. carota PI 295862 Spain. D. carota subsp. carota PI 390887 Israel. Central Israel: From Bet Elazari. D. carota subsp. carota PI 421301 USA. Kansas: Elk County. D. carota subsp. carota PI 430525 Afghanistan. Zardek. D. carota subsp. c arota PI 478369 China. Xinjiang: near Chou En Lai Monument Stone River, Sinkiang. D. carota subsp. carota PI 478873 Italy. Sardinia: St. Elia Beach, 50 m from sea, Cagliari. D. carota subsp. carota PI 478881 USA. Oregon: roadside between Echo and Pendleton. D. carota subsp. carota PI 478884 Source: The Netherlands, South Holland: Botanical Garden, Leiden. D. carota subsp. carota PI 502244 Portugal. Coimbra: Lousa. D. carota subsp. carota PI 652225 Source: France. Collection site unknown. D. carota subsp. carota # PI 652226 Greece. N. Khalkidiki: 10 km N of Kassandra on coast road. D. carota subsp. c arota PI 652229 Source: Tunisia. D. carota subsp. c arota PI 652230 Albania. Lushnje. D . carota subsp. carota PI 652341 Syria. Ash Sheik Hasan. D. carota subsp. c arota PI 652393 Turkey. Konya: 10-15 km to Seydisehir, between Yarpuz and Konya. D. carota subsp. gummifer Ames 7674 Source: Italy. Tuscany: Botanic Garden. (Syme) Hook.f. D. carota subsp. gummifer Ames 26381 Portugal. Faro: Near Portunao. D. carota subsp. gummifer + Ames 26382 Portugal. Faro: Near Sagres. D. carota subsp. gummifer Ames 26383 Portugal. Faro: Near Aljezur. D. carota subsp. gummifer # Ames 26384 Portugal. Beja. D. carota subsp. gummifer Ames 31193 France. D. carota subsp. gummifer Ames 31198 Unknown. D. carota subsp. gummifer PI 478883 France. Finistere: maritime turf, Le Conquet. D. carota subsp. g ummifer + D. guttatus 1 PI 652387 Turkey. Antalya. D. carota subsp. g ummifer + PI 652411 France. Finistere: Pointe de Rospico, Navez. D. carota subsp. carota + D. guttatus 1 Ames 25898 Turkey. Konya: Konya, toward Beysehir. D. carota + D . guttatus 1 PI 286611 Source: Lebanon. Faculty of Agricultural Sciences. D. crinitus Desf. (22) # Ames 26413 Portugal. Castelo Branco. D. crinitus PI 652412 Portugal. Braganca: near Zava. 1670 AMERICAN JOURNAL OF BOTANY [Vol. 101 TABLE 1. Continued. Tentative new Taxon and 2 n chromosome numbera identifi cations Accessionb Location or sourcec D. crinitus PI 652413 Portugal. Guarda: near Barca de Alva. D. crinitus PI 652414 Portugal. Faro: near Bengado. D. glochidiatus (Labill.) Fisch., PI 285038 Source: CSIRO, Australia. Capital Territory. C.A.Mey. & Avé-Lall. (44) + # D . guttatus (20) + D. guttatus 1 PI 279763 Source: Israel. Jerusalem Department of Botany. D. guttatus + D. guttatus 2 PI 652331 Greece. Peloponnese: village of Loutra Agias Elenis, 17 km south of Korinthos, Korinthia Prefecture. D . guttatus + D. guttatus 2 PI 652360 Turkey. Mugla: between Soke and Milas. D. involucratus Sm. (22) + PI 652332 Greece. Peloponnese: village of Loutra Agias Elenis, 17 km south of Korinthos, Korinthia Prefecture. D. involucratus + PI 652350 Turkey. Izmir. D. involucratus +# PI 652355 Turkey. Izmir: 5 km north of Kusadasi. D . littoralis Sibth. & Sm. (20) + PI 295857 Israel. Beit Alpha. D. littoralis +# PI 341902 Israel. D. littoralis Sm. +# D. guttatus 3 PI 652375 Turkey. Mugla: between Dalaman-Gocik and Fethiye. D . muricatus L. (20) Ames 25419 Portugal. Coimbra: Pitanca de Baixo-Condeixa. D . muricatus +# Ames 29090 Tunisia. South of Tunis along Hwy. 3 toward Zaghouan. D. muricatus PI 295863 Spain. Cordoba. From Villa del Rio (Cordoba). D. pusillus Michx. (22) + # PI 349267 Uruguay. Montevideo. Near La Colorado Beach. D. pusillus PI 661242 United States. Oregon: near Hunters River Cove, Curry. D . pusillus PI 661256 United States. Texas: Bastrop County, along Route 713 (Farm to Market Road), 5 miles south of Rockne. D . sahariensis Murb. (18) Ames 29096 Tunisia. Between Tataouine and Bir Lahmer. D. sahariensis Ames 29097 Tunisia. Between Tataouine and Remada. D . sahariensis # Ames 29098 Tunisia. Between Remada and Chenini. D. syrticus Murb. (18) Ames 29107 Tunisia. Near Beni Kdache to the south. D . syrticus Ames 29108 Tunisia. Between Medenine and Matmatas. D . syrticus Ames 29109 Tunisia. Between Medenine and Matmatas. D. syrticus +# Ames 29110 Tunisia. Between Matmatas and El Hamma, near the Gabes airport. D. tenuisectus Coss. ex Batt. (22) * Ames 31616 Morocco. Al Haouz: 25.7 km north of center of Ijoukak, 29 km south of Asni, Nfi ss River Valley, Imgdal Region. D . tenuisectus * + Ames 31617 Morocco. Al Haouz: Along Route 203, 2.3 km south of road going to Oukaimeden from Tahannout (P2028), approximately 12 km north of bridge over river, Nfi ss River Valley, Moulay Brahim Region. Margotia gummifera Lange (22) + Ames 30292 Tunisia. Jendouba: road to Tabarka, near Tabarka airport. Pseudorlaya pumila Grande (16) *+ Ames 29088 Tunisia. South of Medenine toward Tataouine, near Bir Lahmer. Outgroups Ammi visnaga (L.) Lam. (20, 22) Ames 30185 Tunisia. Bizerte: National Park Ischkeul on road to Eco Museum. Astrodaucus littoralis Drude (20) + PI 277064 Source: Azerbaijan. Baku Botanical Garden. Caucalis platycarpos L. (20) + PI 649446 Germany. Saxony-Anhalt: Mannsdorf. Oenanthe virgata Poir. (not reported) Ames 30293 Tunisia. Beja: Route 11, 41 km from Eudiana, 254 km from Beja. Orlaya daucoides (L.) Greuter (20) + # PI 649477 Turkey. Aydin: Dilek Peninsula Reserve. Orlaya daucorlaya Murb. (14) * PI 649478 Greece. Epirus: 8 km from Aristi, toward Ioannina. Torilis arvensis (Hudson) Link (24) *∞ Ames 31623 Morocco. Al Haouz: Along Route 203, 2.3 km south of road g oing to Oukaimeden from Tahannout (P2028), approximately 12 km north of bridge over river, Nfi ss River Valley, Moulay Brahim Region. T. leptophylla (L.) Rchb.f. (12) Ames 25750 Syria. Salma. T. leptophylla *∞ Ames 31619 Morocco. Ifrane: 2 km south of N13 on minor road to Ain-Leuh, beginning a few kilometers southeast of Azrou, Tigrigra Region. T. nodosa (L.) Gaertn. (24) * Ames 31606 Morocco. Berkane: Montes des Beni Snassen, Fezouane Region. T. nodosa * Ames 31607 Morocco. Al Haouz: Moulay Brahim, between Tahannout and Asni, Moulay Brahim Region. T. nodosa *∞ Ames 31622 Morocco. Al Haouz: Moulay Brahim, between Tahannout and Asni, Moulay Brahim Region. Turgenia latifolia (L.) Hoffman (24) + PI 649433 Syria. Ain el Haour. a These names correspond to those in the Germplasm Resources Information Network (GRIN) database, except for the proposed new identifi cations of the subspecies of D . carota listed in Spooner et al. (2014) . The 10 accessions designated with an asterisk were added after Roche 454 analyses with Sanger sequencing, and will therefore have more accessions with sequence data in the ninth column of Table 2 . The 34 accessions designated with a plus sign were used in the *BEAST analysis, the 21 accessions designated with a pound sign were used in the BUCKy analysis, and the three accessions designated with an infi nite sign were not used in our fi rst *BEAST analysis. The 2n chromosome numbers are those known for the species, not the individual accessions, and are taken from Grzebelus et al. (2011) and IPCN chromosome reports (http://www.tropicos.org/Project/IPCN). b Plant Introduction (PI) numbers are permanent numbers assigned to germplasm accessions in the National Plant Germplasm System (NPGS). Germplasm centers in the NPGS assign temporary site-specifi c numbers to newly acquired germplasm (Ames numbers for carrots and other Apiaceae maintained at the North Central Regional Plant Introduction Station in Ames, Iowa, USA) until an accession’s passport data and taxonomy is verifi ed, it is determined not to be a duplicate accession, and it has been determined the accession can be successfully maintained. These accessions may or may not be assigned a PI number after the assessment period. c Location refers to where the germplasm was collected in the wild, while source refers to germplasm acquired through another entity such as a market vendor or genebank. October 2014] ARBIZU ET AL.—PHYLOGENOMICS OF DAUCUS 1671 Fig. 1. Flow chart of the laboratory and bioinformatic procedures used in this study. alpha-shape parameters, GTR rates, and empirical base frequencies for each effective sample size (ESS) of each parameter. The samples of plausible trees individual gene. Using the same program, 1000 nonparametric bootstrap inferences from the two runs were individually summarized, and 25% of the trees were were obtained. Both analyses were conducted via the CIPRES (Cyberinfrastructure discarded as burn-in using the program TreeAnnotator version 1.8.0 in the for Phylogenetic Research; Miller et al., 2010 ) portal at the San Diego Super- *BEAST package. The resulting trees were viewed in FigTree version 1.4.0 computer Center ( http://www.phylo.org ). (h ttp://tree.bio.ed.ac.uk/software/fi gtree/ ). The *BEAST analyses were con- We also performed a Bayesian concordance analysis (BCA) ( Ané et al., ducted in the same PC used for ML with GARLI, but with the BEAGLE library 2007 ) to obtain the primary concordance tree using the program BUCKy version and NVIDIA GPU GeForce GTX 580. 1.4.2 (Bayesian Untangling of Concordance Knots; Larget et al., 2010 ). Accord- ing to Cécile Ané (personal communication, University of Wisconsin-Madison, Department of Botany), there is a practical limit of 25 accessions for BUCKy. R ESULTS Therefore, we conducted pruned analyses choosing representative accessions ( Table 1 ) from major clades as determined from the maximum parsimony and S equence data — Eight of the 102 markers had low coverage maximum likelihood analyses to explore gene to gene confl ict in our data set. (less than 20× ) as determined from MIRA version 3.4.0 All 94 genes with their corresponding model of nucleotide substitution (T able 2 ) ( Chevreux et al., 1999 ) or had ambiguous alignments and were were analyzed separately in MrBayes version 3.2.2 ( Ronquist et al., 2012 ) using the BEAGLE library (A yres et al., 2012 ) with four chains and two discarded from further analyses. The remaining 94 markers searches run simultaneously for 10 million generations sampling every 1000 were distributed on all nine linkage groups of D aucus carota generations. This analysis was also conducted via the CIPRES. We summa- (T able 2) . Of these 94 marker/97 accession matrices (9118 rized the MrBayes results for the 94 genes using the program mbsum included cells), there were missing data for 558 cells, resulting in 6.1% in BUCKy, removing 1001 trees from each chain as burn-in. We then per- missing data. formed the BCA with four independent runs with four linked chains for all four different levels of discordance: α = 0.1, 1, 10, and infi nite (a larger value of α corresponds to greater gene tree incongruence); in each run with Maximum parsimony (MP) analyses — As explained in the 1 100 000 generations; 100 000 generations were discarded as the burn-in pe- introduction, we conducted reiterative (1) modifi cation of the se- riod. Default settings were used for all other parameters. quences with and without homopolymers trimmed to a maximum We also performed a Bayesian analysis using *BEAST package version of 6 bp, (2) analyses of single markers one by one, vs. a concat- 1.8.0 (Bayesian Evolutionary Analysis by Sampling Trees; Drummond et al., 2012 ) to obtain a species tree estimation using a coalescent approach. An XML enated data set of 10 or all 94 markers, (3) analyses of a data set format fi le was generated using BEAUti version 1.8.0. With 94 genes with their of a single allele vs. two alleles merged into one by ambiguity corresponding model of evolution ( Table 2 ), one initial analysis used 104 ac- codes, and (4) analyses of intronic vs. exonic regions. cessions comprising 27 species ( Table 1 ). In addition, one fi nal analysis used only HKY models of evolution ( Table 2 ) and a subset of 37 accessions compris- M odifi cations of topology by trimming the homopolymers to ing 22 species (T able 1) . All analyses were conducted using the Yule process as a maximum of six— Our initial data set of 94 markers and 97 a species tree prior (G ernhard, 2008 ). All Markov chain Monte Carlo (MCMC) chains were run for 1 billion generations sampling every 50 000 generations. accessions with a single allele with highest coverage had an We imported the log fi les of the two runs into the program Tracer version 1.6.0 aligned length of 112 002 bp, and the data set with homopoly- in *BEAST to analyze the convergence to the stationary distribution and the mers trimmed to a maximum of six had an aligned length of 1672 AMERICAN JOURNAL OF BOTANY [Vol. 101 111 166 bp (a reduction in 836 bp). The tree scores (T able 3 ) (T able 3 ) document a longer aligned database for two alleles, and MP topologies of these two analyses (Appendices S2, S3; 115 882 bp, vs. 111 166 bp for single alleles (4716 bp or 4.2% see online Supplemental Data) are similar, with only minor longer). The consistency index of the resulting two-allele tree is differences in bootstrap scores, and rearrangements of clade re- larger (0.641) than of the single-allele tree (0.53). The topology lationships within clade A′ , containing the Daucus species with of the two trees (F ig. 2 , online Appendix S4) also differed. For 2 n = 18 chromosomes, in contrast to all other species in D aucus example, the two geographic subsets (1) Libya and Tunisia, (2) clades A and B with 2 n = 20, 22, and 44 (D . glochidiatus ), and Portugal and Spain are missing in the two-allele tree. There is a 16 (P seudorlaya pumila) . Other than clade A ′, there is no wide- polytomy in clade B of the two-allele tree that is resolved in the spread pattern in chromosome numbers. single-allele tree, although with only 67% bootstrap support. However, many of the remaining topologies remain the same. T opology of differing numbers of markers— As expected, our data sets with a single allele with highest coverage examined Maximum parsimony analyses examining the pure intronic re- with markers one by one produced MP results with a wide gions from the pure exonic regions— We designed our analysis of range of topologies. When we compared these individual gene Daucus and close outgroups to use a majority of markers with 60% trees to that using all 94 markers ( Fig. 2) we noted that some intron content or more, assuming that such regions were needed to individual gene trees were similar to the concatenated “domi- give phylogenetic resolution. To broaden the analyses, we de- nant” topology. F igure 3 shows the MP topology of marker signed primers to evaluate 10 purely exonic gene regions to have DC10366 that appeared the most similar to the dominant topol- data potentially useful for the farther outgroups and to explore the ogy, and F ig. 4 s hows the MP topology of 10 concatenated phylogenetic utility of these regions for the ingroup. Our pure ex- markers that, like marker DC10366, approached the dominant onic regions (gleaned from all 94 markers) had 20 478 aligned topology. These results are useful for those wishing to recon- characters, vs. 90 688 aligned characters for the pure intronic re- struct dominant topologies of D aucus with additional accessions. gions. The consistency indices for both trees (online Appendices The dominant topology is highly resolved, with 100% boot- S5, S6) are nearly identical, but there are many more parsimony- strap support for most of the external and many of the internal informative characters in the intronic regions as a proportion of the clades. Notable exceptions are the relationships within the ac- total characters. Specifi cally, the total database had 18.4% exonic cessions of D . carota and D . capillifolius in clade A′ , and of D . regions and 81.6% intronic regions, and taken as a proportion of sahariensis and of D . syrticus also in clade A′ , but these two these length differences, the introns had 20.6% parsimony-infor- groups are strongly supported as sister clades to each other with mative characters vs. 13.6% for the exons, about 50.7% larger for 100% bootstrap support. Within the D. carota and D. capillifo- the introns. The topologies of the two trees (Appendices S5, S6) lius clade, there are two clusters associated with a geographical refl ect these differences in the number and parsimony-informative- component. All accessions with known locality data of D . ca- ness of these two data sets. While the main clades A, A′ , and B are pillifolius and D. carota collected in Libya and Tunisia form a the same, there is a reduction in bootstrap support for some of the weakly supported clade (<70% bootstrap, F ig. 2 , highlighted in main clades. red). In addition, most accessions of D . carota collected in Por- tugal and Spain form a strongly supported clade (100% boot- M aximum likelihood analysis — Our initial attempt to obtain a strap, Fig. 2 , highlighted in blue), but two accessions from ML tree with 1000 bootstrap replicates using mixed models on the Portugal and Spain were not present in this clade (Ames 26401, GARLI platform was unsuccessful due to lack of time (we esti- PI 279798) and one accession from Morocco (Ames 31570) mated that over 5 yr would be needed based on the run times of our was present in this clade. attempt). Hence, we ran the ML analysis with 1000 bootstrap rep- T he dominant topology grouped different accessions of licates using RAxML with a single model of evolution, but using many different species with strong support, but in addition to different alpha-shape parameters, GTR rates, and empirical base the species intermixing in clade A′ as discussed earlier, this to- frequencies. This ML tree ( Fig. 5 ) has the same overall topology as pology failed to group D . broteri and D . guttatus together, plac- the MP tree ( Fig. 2 ), including the geographic subsets in clade A, ing these two species in three separate well-supported clades and recovers the same clades A, A′ , and B. In addition, there are (all 100% bootstrap support). We grew these accessions again good bootstrap support values in most components of this tree. and resequenced the DNA with the 10 nuclear orthologs men- Two notable exceptions are (1) D. aureus and D . muricatus are tioned earlier to check for misidentifi cations. The plants ap- not on the same clade in ML as they are in MP, but form sister peared the same as our original vouchers and they grouped the clades. (2) Although D . guttatus 1, 2, and 3 have the same sis- same with these new DNA data. However, the morphological ter-group relationships in both analyses, the relationships of characters distinguishing these species are ambiguous, mirror- these sister-group pairs differ. ing our molecular results. Because of uncertainty of the appli- cation of these names, we name them here as D . guttatus (the Bayesian concordance analysis — Our pruned analysis showed earliest name) 1, 2, and 3. M argotia gummifera and Pseudor- an acceptable result as the standard deviation of concordance fac- laya pumila were sister to the D. carota clade, followed by D. tors was less than 0.005. The primary concordance tree (F ig. 6) aureus and D . muricatus , and then the remaining Daucus spe- estimated for 94 genes and 21 accessions with Bayesian analyses cies. O rlaya was supported as the closest outgroup to Daucus. showed a similar topology to the MP and ML trees. In addition, We labeled the two main clades each with 100% bootstrap sup- there were no signifi cant differences among the concordance fac- port as clade A and clade B. tors using the four different prior probabilities on gene tree incon- gruence ( α values). BCA worked well for this pruned analysis. The M aximum parsimony analyses using different scoring of al- concordance factors (CF) of these same main clades, despite having lelic variants—O ur MP results comparing a single allele with comparable taxonomic relationships, are much lower than boot- the highest coverage vs. two alleles merged into one using am- strap support values in the MP and ML analyses, but they are meant biguity codes differed in the following ways. The tree scores to show different aspects of the topology and are not meant to be October 2014] ARBIZU ET AL.—PHYLOGENOMICS OF D AUCUS 1673 TABLE 2. Characteristics of the 94 nuclear orthologs used in our study. No. of accessions Length Linkage Aligned Intron with sequence Exon e Intron No. No. of gene % of Markersa g roup lengthb Model of evolution c Forward primer (5′ –3 ′) Reverse primer (5′ –3 ′) content GenBank nos.d data (nt) (nt) i ntrons e xons (nt) intron DC10366* V 1128 TVM+G (GTR+G) [HKY+G] G TAGTTCTTCACACAGCTTCCTTC A TCAACTTCGTGCTGCTCTTG <60% intron KJ521466–KJ521568 103 2022 2464 6 7 4486 54.9 DC10966* I 1344 TPM2uf+G (GTR+G) T CATTGCACCACCGACATG C TTCAAGGTCTTTGTGGTGTTCA >60% intron KJ520950–KJ521056 107 654 2927 4 5 3581 81.7 [HKY+G] DC11423 V 606 HKY+I+G T CGTTCATGAGGGAAATCGC G GACCCTCCTGAGTAAACAC exon only KJ520089–KJ520179 91 1269 0 0 1 1269 0.0 DC11491 IX 605 HKY+G T CCTGGTCTCCTGTTGCA CACAGGTTGAGGAAGTGAAGATTC exon only KJ516559–KJ516652 94 1965 1819 3 4 3784 48.1 DC11601 IX 1729 HKY+G A GATCTCTCTTCTCGCTCTCC CAACTGCCTATACTCTTCGCTC >60% intron KJ518726–KJ518816 91 1794 18209 16 17 20003 91.0 DC12018 I 1632 GTR+I+G [HKY+I +G] C TCGAGAGCACAACCACATG CAACAATGTTCAACCCTCGTTCT <60% intron KJ521755–KJ521845 91 855 3474 6 7 4329 80.2 DC13263 I 644 GTR+I+G [HKY+G] C AAGCTGCAAGACCTTCTCTC ACCTTCATCAAACACCTTGCAG exon only KJ515770–KJ515865 96 1725 0 0 1 1725 0.0 DC1340 VIII 613 TPM3uf+I+G [(HKY+G)] A GAAGAATCTGAAATGGCAGGTAGA GGCCTTGTCATTCAAAACAGC <60% intron KJ514413–KJ514504 92 378 306 3 4 684 44.7 DC14804 VII 2765 TPM2uf+G (GTR+G) GGGAAGTCCTGTACTCATTTTGG GGAAAGTTTGGGCTAGGTGC >60% intron KJ520600–KJ520663 64 1800 5460 17 18 7260 75.2 [HKY+G] DC1494 IX 837 TM3uf+G [(HKY+G)] A CCGTTTCTACAATCTGGAAATGC TCATTATCCTCGTCATCTGGTT >60% intron KJ520761–KJ520852 92 533 2761 4 5 3294 83.8 DC15318 V 1296 TPM3uf+I+G [(HKY+I+G)] G TGCCTATCGTGGTATTACAGC G TTTTGGTGATGCCTGTGC >60% intron KJ514782–KJ514877 96 1482 6519 15 16 8001 81.5 DC15347* I 1014 TrN+I+G [(HKY+I+G)] A TCTCAACCTTTTCTCTGATGGTG ACCACCCACCTACAGATCC >60% intron KJ522039–KJ522142 104 885 2291 6 7 3176 72.1 DC15398 V 668 TrN+G [(HKY+G)] A GCAGAAGTTGATGATGATGCTG ATGAGTTGGGCTTCGCATC <60% intron KJ520180–KJ520260 81 681 5241 4 5 5922 88.5 DC15442 VI 1267 TVM+I+G (GTR+I+G) C TGTGGCTTTGACTCTCTTCATC C CCACAATTCAGACCAAGAAGC >60% intron KJ520853–KJ520949 97 1290 1990 5 6 3280 60.7 [HKY+I+G] DC15678 III 645 TPM2uf+I+G (GTR+I+G) ATTGCAGGAAGTGGGACG C GAACATGACCATCATCTCCAC >60% intron KJ515536–KJ515587 52 455 2542 3 4 2997 84.8 [HKY+I+G] DC15851 III 665 GTR+I+G [HKY+I+G] G TACCAGAATCAGCTTTCTCCAC CACTCTCGACTACAAGGTAACATG exon only KJ514236–KJ514315 80 321 0 0 1 321 0.0 DC16119 VII 901 HKY+G G AGATAATCCGACAGAACATCAGC ATTGCTTCTCCTACGCATTACC >60% intron KJ514972–KJ515066 95 1248 4287 7 8 5535 77.5 DC16209 I 734 TPM2uf+G (GTR+G) GCCTTCCAGCAACATTCATTC T GTCATTGATTCTGGTGATGGTG >60% intron KJ519530–KJ519624 95 1170 11725 14 15 12895 90.9 [HKY+G] DC16308* VII 2249 TrN+G (GTR+G) [HKY+G] A GTCGAATGCAACAAGAGTTAGC G CTGGTCTACGCTTGAATCATC >60% intron KJ522336–KJ522436 101 1022 5241 5 6 6263 83.7 DC16340 I 3117 TVM+I (GTR+I) [HKY+I] C CCATTTGCTTCCGAACAGTAG GATGAAGAACAACATGCTCATCAGC >60% intron KJ521675–KJ521754 80 1119 7091 7 8 8210 86.4 DC16577* VII 717 HKY+I+G G AATCGCCATCCCCAATCC GGTTTCATCTTCTACCAGCAGTTC <60% intron KJ521366–KJ521465 100 1527 8239 10 11 9766 84.4 DC16645* V 1010 TPM2uf+G (GTR+G) C AAGCGTATTGTTCCAACTGC GAATCCATCCATCGGGAGAAC >60% intron KJ521057–KJ521163 107 1260 5596 11 12 6856 81.6 [HKY+G] DC16778 V 639 TPM2uf+G [(HKY+G)] G ATAACATGTCGGGGATACTGC GGGGTCCAGCAGATATCTTTG <60% intron KJ515679–KJ515769 91 1074 6072 10 11 7146 85.0 DC16928 I 939 HKY+G C ACCCAATACTCCATCGACTG G ATTGGTATCGTACTGGTGTTGG >60% intron KJ518344–KJ518438 95 834 4444 7 8 5278 84.2 DC17018 VII 946 TPM2uf+G [(HKY+G)] A CAGTGAAACCACTTGCACC A AGCCAGGAAGAGCAAGATG >60% intron KJ515258–KJ515353 96 249 680 2 3 929 73.2 DC17278 VI 486 GTR+I+G [HKY+I+G] C GTAACTCCTTACCCGAGAAC GCTCCGTAACAACAACGCTAC exon only KJ519255–KJ519351 97 1266 0 0 1 1266 0.0 DC17284 I 883 TVM+I (GTR+I) [HKY+I] G CCATTTCCCCTTAAACGAGTC GCTCCTGTTCGTGAAGTTTGG >60% intron KJ515451–KJ515535 85 2190 2527 9 10 4717 53.6 DC17311 VI 1479 TIM2+G (GTR+G) [HKY+G] C ATTCCAACCAGTACAGAAGAGAT C ACCCAGCATCAAATGAGC >60% intron KJ519090–KJ519176 87 948 2751 4 5 3699 74.4 DC17915 I 1702 TVM+G (GTR+G) [HKY+G] T AAGCAGAATCCGCAAAGTGC AGGAGGTTTTAGTGAGGGAGG >60% intron KJ521270–KJ521365 96 432 1093 3 4 1525 71.7 DC1877 V 1073 HKY+G G GCCATTTGTTGATCATTGGTG CGGGAAATAGCTTTATTGTGCC >60% intron KJ515163–KJ515257 95 567 5619 7 8 6186 90.8 DC18883 I 1398 TrN+G (GTR+G) [HKY+G] G CCCTTTGACTGCCTCTTC GATGCTTCCCTTGACGATGATG >60% intron KJ517749–KJ517810 62 519 893 3 4 1412 63.2 DC19270 II 836 HKY+G G AGAAAAGCCTCCTGGATCAC GCGGGGTAAAAAGTCTCCTTC >60% intron KJ515866–KJ515956 91 1374 3931 13 14 5305 74.1 DC20026 VII 1354 TPM1uf+I [(HKY+I) ] G AGAGTCAAGGAGAAGATGAAGC CCCTTGCCATTTGTTGCATC >60% intron KJ517659–KJ517748 90 369 636 3 4 1005 63.3 DC20383 IV 957 TIM1+I (GTR+I) [HKI+I] C CCGCTGTTATCAAACACACA ACTATAAATTCCGAGAGATCCAAGC >60% intron KJ516932–KJ517027 96 2148 9189 15 16 11337 81.1 DC20459 II 667 HKY+G T CCAACAAACAGTCAATGCTCATC GGAGAAGAAACATGTCTCTACTGC >60% intron KJ516836–KJ516931 96 2133 1486 7 8 3619 41.1 DC20759 I 985 TPM2uf+G [(HKY+G)] G GCCATTTCAGCACTTTGC A AATTCAGCTTGGAGTTGACCC <60% intron KJ519719–KJ519809 91 1533 15841 12 13 17374 91.2 DC2290 VIII 2368 HKY+G G TTCTTCTTCAGCTCTGGATGG GCTATTCATGTGTTGTCCCTGAC >60% intron KJ519625–KJ519718 94 687 3933 7 8 4620 85.1 DC23389 IV 2426 TPM3uf+I+G [(HKY+I+G)] T CACAAAACCAATCGCAGGG CATTGGGTCCAGTGAGTACC >60% intron KJ518914–KJ518993 80 1812 6372 16 17 8184 77.9 DC246 VI 1573 TIM2+I (GTR+I) [HKY+I] C AAACAAGTGACCCTAGAAGTGG GCTTAATGCCCAGAAAGTGCTC >60% intron KJ519810–KJ519905 96 2748 5626 11 12 8374 67.2 DC25791 VI 1637 HKY+G A GCTTTGGCACAAGAGTCTTG AGTTGGTTGCCGAGTTTCTATG >60% intron KJ516288–KJ516374 87 2451 4486 9 10 6937 64.7 DC2600 II 631 TIM1+G (GTR+G) [HKY+G] T TGCGGTGGGTGTTTATGG TTTCCTCCACTTTCGCCTTC exon only KJ520261–KJ520339 79 1239 0 0 1 1239 0.0 DC26617 VII 905 HKY+G A CCACCTAGTCTCCACTGAC CAGAGGTTACCTGCGAGTAC >60% intron KJ520503–KJ520599 97 1104 460 2 3 1564 29.4 DC26889 V 678 TPM3uf+G (GTR+G) C CTTAAGACCTTCTGCATTTGC CTTGTGAATCATTGCCTGCTTC >60% intron KJ520434–KJ520502 69 3039 6556 22 23 9595 68.3 [HKY+I] DC27381 III 1189 GTR+I+G [ HKY+I+G] G GCACCTATTTCTGTGTTGACA TGCAGCCTTGTCATCATAGTG >60% intron KJ516653–KJ516743 91 456 2639 5 6 3095 85.3 DC2772 II 645 TIM2+I [(HKY+I)] G GTATTCACAGCGAGGAGG A TCAATGCGATGGCGTAACTC >60% intron KJ520664–KJ520760 97 1656 5640 14 15 7296 77.3 DC28180 V 1145 TPM2uf+G [(HKY+G)] T GCTGACTTTTCCAAGGAAGTG GAGAGTTGGCAAGGTGAACTG >60% intron KJ518994–KJ519089 96 1254 5829 12 13 7083 82.3 DC282 VI 878 TMP3uf+G [ (HKY+G)] A CATGGTTCAGATGGTGCTG CACTACCTCTGCTAACCACC <60% intron KJ518817–KJ518913 97 978 333 1 2 1311 25.4 DC28973 I 1896 HKY+G A GAACAGAATGGGCAAGTGG G CGCATCCCTCCATTATCAG >60% intron KJ518249–KJ518343 95 1746 1407 4 5 3153 44.6 DC2965 II 686 HKY+I+G T ATGTGCAGCCTGTAACGATTG A GACTCTCACTCCTCTTCACC exon only KJ517586–KJ517658 73 1461 1229 2 3 2690 45.7 DC30826 V 1118 TIM3+I (GTR+I) [HKY+I] C CATTGCTAGATGTTGAAACACCC GACAATGGTGGTCAAATCCCT >60% intron KJ514878–KJ514971 94 1008 3321 11 12 4329 76.7 DC31753 IX 676 GTR+I [HKY+ I] C AGCTATGCTTCATGACTGTACC CCAAGAGTAACCACATCTGTCTTG >60% intron KJ517490–KJ517585 96 1218 3969 8 9 5187 76.5 DC32391 IX 722 TIM1+G (GTR+G) [HKY+G] A GGCATCTGCTGTGGAAATTG ACAGCCTATCCATCAATCAATGC <60% intron KJ515067–KJ515162 96 576 2512 6 7 3088 81.3 DC3277 VIII 1116 TPM1uf+G (GTR+G) G TGCGGAGAATCATGTCGATG A CAAAGCTGCGGTCCAAAG >60% intron KJ514693–KJ514781 89 2529 7766 9 10 10295 75.4 [HKY+G] DC32914* VI 631 TIM3+G (GTR+G) [HKY+G] T GCTTATTTGCTCCGAGATATGC CACAAATCGGGTCCTAGTCC exon only KJ521164–KJ521269 106 2685 0 0 1 2685 0.0 1674 AMERICAN JOURNAL OF BOTANY [Vol. 101 TABLE 2. Continued. No. of accessions Length Linkage Aligned Intron with sequence Exon e Intron No. No. of gene % of Markersa g roup lengthb Model of evolution c Forward primer (5′ –3 ′ ) Reverse primer (5′ –3′ ) content GenBank nos.d data ( nt) (nt) introns e xons (nt) intron DC3374* VI 1292 TIM3+G (GTR+G) [HKY+G] C TTCTTCCATGCGACCCA C TACAGGGAGCGAAAGAAATATGT >60% intron KJ522143–KJ522246 104 2073 10629 20 21 12702 83.7 DC34333 I 709 GTR+G [HKY+G] G AAATCTGCATATCCCCTAAAGGC T GCTGCCTCAGAATTGTCATTTG <60% intron KJ516744–KJ516835 92 1221 866 5 6 2087 41.5 DC35097* VI 761 TIM1+I+G (GTR+I+G) GGTAACTTCTTGTAATCTGCTCCC C ACTATGTAGCACCATCATTCTGG <60% intron KJ521569–KJ521674 106 2289 6104 13 14 8393 72.7 [HKY+G+I] DC35833 I 1235 TPM2uf+G [(HKY+G)] A CTTCCAAGTGTTGTTTGTCCA GGGACATATGAGGCTGTTAGATAC >60% intron KJ520340–KJ520433 94 225 1421 2 3 1646 86.3 DC38098 VI 2070 TIM1+I (GTR+I) [HKY+I] A ACTTCATGAGTTCTTCATCCACC CATTGGTCAACTGGAAGGGAC <60% intron KJ514602–KJ514692 91 1140 2473 4 5 3613 68.4 DC38583 I 2426 TrN+G (GTR+G) [HKY+G] C CAATCTCCGACATCAAACATGC G GCCGTTGAGGTTCTTGTTG >60% intron KJ519906–KJ519997 92 2151 8715 18 19 10866 80.2 DC3902* VII 1334 TPM2uf+G [(HKY+G)] C AAGCCAATTACCAGGAGAGC CGTCTCTCATAGTTGTCCACAC >60% intron KJ521939–KJ522038 100 507 779 1 2 1286 60.6 DC42375 IV 1148 TIM3+I+G [(HKY+I+G)] T CAAAAATGACCAGCTAGTTGAGG CTCCCAGTAAAGTTGGTTGCAG >60% intron KJ515354–KJ515450 97 1515 2052 7 8 3567 57.5 DC42660 VI 680 TIM1+I+G [(HKY+I+G)] A GAACTGCTAAAGAAGATGGTTGC CTGTTGCTCCAATATCCTCTCTTC exon only KJ514316–KJ514412 97 1578 0 0 1 1578 0.0 DC43804 IX 2792 HKY+G C CTGACAAGATAAGGCAGAACATC TAGAATTCAGAAGAGCATCCTTCG >60% intron KJ517124–KJ517211 88 1386 6274 12 13 7660 81.9 DC45860 V 847 TIM2+G (GTR+G) [HKY+G] G TGAAGCAACTGCCTTCTCTAA A GGCCCAGGTTTTTATTGAGG >60% intron KJ517811–KJ517905 95 1206 4348 8 9 5554 78.3 DC46197 III 1203 TIM2+G (GTR+G) [HKY+G] C TGGTTTTTACACTGCCGAAAAG CTTTGTTTCAGGATGATCAGGTGC >60% intron KJ519446–KJ519529 84 1020 2325 9 10 3345 69.5 DC46683 VI 1362 TIM2+G (GTR+G) [HKY+G] T CATCATCACAAAACACCTCACTC ATAAGCGTATTGGTTCGGCTG >60% intron KJ517212–KJ517304 93 3435 12991 25 26 16426 79.1 DC46818 VII 945 TPM2uf+G [(HKY+G)] A CCAAATAGCTACTGCTTGGATC C GTAAACTCCTAAGCTGTAGTGC >60% intron KJ515588–KJ515678 91 3876 8175 20 21 12051 67.8 DC47239 IV 1241 TPM3uf+I (GTR+I) [HKY+I] G CAACTTTGATCTATTATGGAGTCCG GTCAGGCTGACTGCTTTCTC >60% intron KJ519177–KJ519254 78 835 924 2 3 1759 52.5 DC48708 IV 1521 GTR+G [ HKY+G] T CGACGAAAGAAATTGGAGTGATG G TGTCTTTGAGAGTCCTTCCC >60% intron KJ514050–KJ514143 94 1347 2886 6 7 4233 68.2 DC49764 I 488 TPM3uf+I [(HKY+I+G)] C TGCCAGGAACAAGACGAC GAAAACGGAACCAATAGTCAAGGA <60% intron KJ515957–KJ516051 95 1647 4940 11 12 6587 75.0 DC49801 IX 778 TrN+G [(HKY+G)] G TAGAAGCATTAATAACACGAGGC GGAAAGAAAAGTTGCAGACATGC >60% intron KJ517305–KJ517399 95 2988 14377 17 18 17365 82.8 DC51684 I 1084 TVM+G (GTR+G) [HKY+G] G ATTGGTACCTCAGAGCATGG CTAATCCAGCCAGTCCACC >60% intron KJ516375–KJ516463 89 1089 3591 10 11 4680 76.7 DC54236 III 627 TPM1uf+I [ (HKY+I)] C ATCTCATCCGGTTCCTTTGG C CCTCTGAAGCCCTAAAATCATTG <60% intron KJ518536–KJ518630 95 2877 0 0 1 2877 0.0 DC54335 VII 804 TrN+I+G [(HKY+I+G)] C GGAAAGAGTGTTTTTGTAAGTGC C TTGTGGTGGAAGAGTTGTACAG >60% intron KJ518631–KJ518725 95 2139 15201 18 19 17340 87.7 DC56413 VII 619 TIM2+I +G [(HKY+I+G)] C TCTCATTACCCATGGCAACTC AAAGAGTGCAACACAGAGGG exon only KJ516052–KJ516126 75 456 0 0 1 456 0.0 DC57194 II 574 TPM3uf+G [(HKY+G)] C GGGCAATATACATCCTGCATC G CCTGCATCAAAGGTTGC <60% intron KJ513880–KJ513976 97 1212 7137 9 10 8349 85.5 DC57966 II 484 HKY+G A GTGGAATGTCCGAAAAGGC G CATAGATAGAATGCCGCTGC <60% intron KJ517906–KJ517984 79 867 1687 9 10 2554 66.1 DC58732 II 1712 TPM3uf+I [(HKY+I)] A GGGACAAGGCATTTATCGC CATCCTTGTACAGTAGATAGCGTG >60% intron KJ517028–KJ517123 96 882 7939 12 13 8821 90.0 DC58941 II 2084 HKY+I+G G ATGTCTCTATCAACACGATCTGC C TTGGAAAACGACAACATGGTG >60% intron KJ521846–KJ521938 93 657 3234 6 7 3891 83.1 DC59353 V 2252 TPM3uf+G [(HKY+G)] G GTCTGGCATTTGGAACTGG G GCATTATGTCGTGCAGCAG >60% intron KJ516219–KJ516287 69 1083 2246 6 7 3329 67.5 DC68 IV 857 TPM2uf+G [(HKY+G)] A TTCTGAAGTCATAGCTGCTCAAG CCAAGATAGTCCAAGCATGAAGC >60% intron KJ518439–KJ518535 97 1326 3643 5 6 4969 73.3 DC7428 I 601 HKY+G G GAAGCCTATCTGTTTCCCAG GTTGGTTTGAGAGATCTATGGTGG <60% intron KJ518056–KJ518151 96 1254 8598 9 10 9852 87.3 DC8031 V 960 HKY+G A CATTTGTTGCTCTCTTATCCACC CACCTGCCTCTCTTCCTTC >60% intron KJ518152–KJ518248 97 1726 4787 8 9 6513 73.5 DC8465 IX 934 TrN+G (GTR+G) [HKY+G] A GTGCAAATGGAGATCATACTGC ATGGCGAGTACCACCAAAC >60% intron KJ519352–KJ519445 94 1182 3189 7 8 4371 73.0 DC8584 VIII 1817 HKY+I C AATCCTTGGTTCCTCAGCAC CGGGCATGTCTTTGGGTG >60% intron KJ517985–KJ518055 71 834 2422 9 10 3256 74.4 DC8796 III 1931 TIM3+G [ (HKY+G)] T TTGATGGCTCGTGGTCC G CTTTCTTCCTCGTCCTTTCTTTG >60% intron KJ514144–KJ514235 92 312 3026 3 4 3338 90.7 DC8872 VI 593 TPM3uf+G (GTR+G) A AGCATCACAACCTGCATCTC C TGATTCTCAGTACTTTGAAGAGCTG <60% intron KJ517400–KJ517489 90 882 2536 5 6 3418 74.2 [HKY+G] DC8990 I 600 TPM3uf+G [(HKG+G)] G TTACTCCACAACCTTCTGTACC G GTTATGCCTCCACTATGGTG <60% intron KJ514505–KJ514601 97 1899 6779 12 13 8678 78.1 DC9014 I 1053 TrN+G [ (HKY+G)] C AGGCAACCGAGGTGTAATTC T GTGGGCTTTACGAAAGGAAAC <60% intron KJ522247–KJ522335 89 697 6171 4 5 6868 89.9 DC9186 VII 1546 TIM3+G (GTR+G) [HKY+G] A TGCTGGGTTGCTTTATTGGG AGTTGCTTCCTTCAAGTCCG >60% intron KJ516127–KJ516218 92 1080 1626 3 4 2706 60.1 DC9206 I 1197 TVM+I (GTR+I) [HKY+G] A CCAATTCAGTCTGTGGCAAC TTCAAGAAAGGTTTCAGAGAGGC >60% intron KJ519998–KJ520088 91 1279 1083 3 4 2362 45.9 DC9853 VII 1754 TVM+G (GTR+G) ACTCTTCAAACAGGCAACAGC CTGATTTGAGAAGTCCAGTGCG >60% intron KJ513977–KJ514049 73 2478 2897 7 8 5375 53.9 [HKY+I+G] DC9952 VI 2801 TIM2+G (GTR+G) [HKY+G] C GGATTGCGTTCAGTAAATTCC CCTCCTTCGGCTGAATATGTG >60% intron KJ516464–KJ516558 95 918 1015 2 3 1933 52.5 Concatenated 111 166 18.4% exon, 8557 length 8 1.6% intron a An asterisk designates the 10 markers used in the reduced data set. b Refers to the aligned length of the data set with homopolymers trimmed to a maximum of six. c The fi rst model of evolution was obtained in jModelTest. The second model (if present, in parentheses) was the next lower AIC value accepted in MrBayes analysis and *BEAST analysis. The HKY model (if present, in brackets) with the lowest AIC value was used in our *BEAST analysis. Markers in parentheses and brackets were used in MrBayes and *BEAST analyses. d See online Appendix for specifi c information of accession/marker codes. e Columns 10-15 refer to characteristics of the entire genes, not the smaller portions of the genes sequenced here. October 2014] ARBIZU ET AL.—PHYLOGENOMICS OF D AUCUS 1675 TABLE 3. Tree scores for maximum parsimony analyses. Maximum parsimony tree parameters 94 markers, 107 accessions, 1 allele vs. 2 alleles 94 markers, 107 94 markers, 1 or 10 markers, combined with accessions, 1 allele, 97 accessions, 1 allele, 107 accessions, ambiguity codes, homopolymers homopolymers present 1 allele, homopolymers homopolymers trimmed, intronic vs. vs. trimmed trimmed trimmed exonic regions Homopolymers Homopolymers 1 marker allele Tree statistics present trimmed (DC10366) 10 markers 1 allele 2 alleles Intron Exon No. characters 112 002 111 166 1128 11 480 111 166 115 882 90 688 20 478 Parsimony-informative 21 193 21 011 270 2903 21 502 21 348 18 711 2791 characters No. parsimonious trees 1 2 1 3 6 16 2 1 Length 92 908 91 361 784 10 122 92 859 74 516 81 160 10 904 Consistency index 0.530 0.530 0.732 0.608 0.530 0.641 0.537 0.525 Retention index 0.729 0.732 0.898 0.830 0.736 0.812 0.739 0.704 Rescaled consistency index 0.387 0.388 0.658 0.505 0.390 0.520 0.397 0.370 Fig. or supplemental App. S2 App. S3 Fig. 3 F ig. 4 F ig. 2 App. S4 App. S5 App. S6 Appendix no. comparable values. Concordance factors within the subspecies of D ISCUSSION D aucus are low. On the other hand, D . sahariensis and D . syrticus are grouped in a clade with a high CF (0.69). Clades A and B are U se of next-generation sequencing — The Roche 454 se- supported by concordance factors of 0.348 and 0.396 respectively, quencer was released in 2005 ( Margulies et al., 2005 ) and is con- which translates to 32.7 and 37.2 genes, respectively (by multiply- sidered the fi rst commercially available next-generation sequencing ing the number of genes, 94, by the concordance factors). platform ( Rothberg and Leamon, 2008 ; Egan et al., 2012) . The 454 technology utilizes the pyrosequencing method described by Species tree estimation— The analysis using 104 acces- Dressman et al. (2003) , providing a mean read length of 700 bp, sions of 27 species and 94 genes with their corresponding similar to that obtained by current Sanger capillary technology model of evolution as indicated in Table 2 column 4 exhibited ( Sanger et al., 1977 ), but at a lower cost per read. Other competing some ESS values lower than 100 for the posterior, prior, technologies, such as Illumina provide higher coverage but suffer among other parameters. It took 89 d for this analysis to reach from shorter read lengths ( Egan et al., 2012 ) or have longer reads completion using the PC mentioned in the Materials and averaging 8500 bp but with higher error rates, such as Pacifi c Bio- Methods. These low values have been reported by other re- science ( Egan et al., 2012 ; Koren et al., 2012 ; P acifi c Biosciences, searchers in the web site for users of *BEAST (h ttps://groups. 2013 ). One weakness of the 454 technology is inaccurate estima- google.com/d/forum/beast-users) . Andrew Rambaut, a coauthor tion of homopolymer region lengths. By the time our study was in of the *BEAST package, indicates in the website above that the its design stage, the 454 sequencer had already gained a high repu- low ESS values are obtained because of a problem when using tation in the scientifi c community, shedding light on problems in GTR evolutionary models and Jeffrey’s priors (J effreys, 1946) . He suggested that we also run *BEAST using the HKY models human genetics, metagenomics, ecology, evolution, and paleobiol- only, because in this case Jeffreys prior provides better statisti- ogy ( Rothberg and Leamon, 2008) . We chose it mainly due to its cal properties for estimating the kappa parameter ( Drummond read lengths, providing individual gene topologies with potential et al., 2002 ). We tried this with a reduced data set of 37 acces- taxonomic resolution. sions containing all 22 Daucus ingroup species and representa- tive outgroups: A strodaucus littoralis , Caucalis platycarpos , Phylogenomic analysis — This study is the fi rst phylogenomic O rlaya daucoides, and T urgenia latifolia . ESS values were analysis of Daucus, the economically most important genus in the still lower than 100 for the posterior, prior and other parame- Apiaceae, using next-generation sequencing technology. M argotia ters, but higher than the ESS values of the previous analysis. gummifera was sister to those Daucus with 2 n = 18 chromosomes, Figure 7A p resents the coalescent analysis using 104 accessions concordant with S pooner et al. (2013) . Spalik and Downie (2007) of 27 species. The topologies of this analysis presented signifi cant and S palik et al. (2010) provide phylogenetic and chronogramic differences to the MP, ML, and BCA analyses that were largely analyses of D aucus based on ribosomal DNA sequence variation concordant with each other. The most notable changes in this spe- that includes more species than we used here, and with many con- cies tree analysis relative to MP, ML, and BCA are: (1) Clades A cordant results to our MP and ML concatenated results. Most nota- and B are no longer coherent; (2) A strodaucus littoralis , A mmi vis- bly, they support two main clades of Daucus , Daucus clade I and naga , Caucalis platycarpos, T orilis nodosa , O rlaya daucoides , O. D aucus clade II (here labeled as clade A and clade B). Further- daucorlaya , and T urgenia latifolia resulted as ingroups to D aucus. more, they also report additional non-D aucus species (P seudor- Figure 7B presents the analysis using a subset of 37 accessions of laya pumila and Turgenia latifolia) , which were sampled here, as 22 species. Again, clades A and B are no longer coherent. A stro- being within clade A. Our study uses multiple accessions per spe- daucus littoralis , C aucalis platycarpos , and T urgenia latifolia re- cies and indicates that it was not possible to clearly distinguish the sulted as ingroups to D aucus. subspecies of D. carota . Furthermore, as highlighted below, this 1676 AMERICAN JOURNAL OF BOTANY [Vol. 101 Fig. 2. Phylogeny of D aucus from a maximum parsimony analysis using a data set with the allele of highest coverage, homopolymers shortened to a maximum of 6 bp, and based on 94 nuclear orthologs and 107 accessions. All accessions with known locality data of D . capillifolius and D. carota col- lected in Libya and Tunisia are highlighted in red; most accessions of D . carota collected in Portugal and Spain are highlighted in blue. October 2014] ARBIZU ET AL.—PHYLOGENOMICS OF D AUCUS 1677 Fig. 3. Phylogeny of D aucus from a maximum parsimony analysis using a data set with the allele of highest coverage, homopolymers shortened to a maximum of 6 bp, and based on one nuclear ortholog (marker DC10366) and 107 accessions. The four accessions in the outgroup clade designated by double triangles are misplaced relative to the dominant tree topologies. 1678 AMERICAN JOURNAL OF BOTANY [Vol. 101 Fig. 4. Phylogeny of D aucus from a maximum parsimony analysis using a data set with the allele of highest coverage, homopolymers shortened to a maximum of 6 bp, and based on 10 nuclear orthologs and 107 accessions. October 2014] ARBIZU ET AL.—PHYLOGENOMICS OF DAUCUS 1679 Fig. 5. Phylogeny of D aucus from a maximum likelihood analysis using a data set with the allele of highest coverage, homopolymers shortened to a maximum of 6 bp, and based on 94 nuclear orthologs and 107 accessions. 1680 AMERICAN JOURNAL OF BOTANY [Vol. 101 F ig. 6. Primary concordance tree obtained with Bayesian concordance analysis in D aucus with 94 nuclear orthologs and 21 accessions. Numbers above the branches are the concordance factors, which do not show signifi cant differences for different α values (0.1, 1, 10, and infi nite). study provides data on well-resolved substructure within D . bro- subspecies designation), D . carota subsp. fontanesii and D. teri and D . guttatus that may indicate separate species status for carota subsp. m ajor , respectively. These names correspond to these accessions. We also provide a better-resolved substructure at those provided by the Germplasm Resources Information Network the base of clade A and in clade B. (GRIN) database. However, in light of our results, and after re- Our results supported two subclades within clade A′ that group evaluation of the morphological information at the Germplasm wild D aucus carota accessions collected in (1) Tunisia and Libya Resources Information Network, we tentatively labeled these and in (2) Portugal and Spain. This result partially matches that of accessions as D aucus guttatus 1. Further analyses, including I orizzo et al. (2013) , using SNP data, who grouped D . carota morphological information of all accessions, are needed to de- subsp. carota and D . capillifolius from northern Africa, separate termine whether there are more cases of misidentifi cation in the from D. carota from Europe. However, our results failed to sepa- germplasm bank of the USDA. rate D . carota subsp. c arota from subsp. g ummifer , a separation O ur fi nal data were reduced by 836 bp (0.74%) with the shorten- that was found by Iorizzo et al. (2013) . Clearly, the accessions of ing of homopolymers to a maximum of 6 bp. In addition, our data D. carota (and D. capillifolius ) are very closely related, and mul- set had 6.1% missing data. We did not observe signifi cant differ- tiple nuclear orthologs are inappropriate markers to examine their ences among the topology of trees with trimmed vs. untrimmed relationships. We are further exploring phylogenetic relationships homopolymers ( Fig. 2 vs. Appendix S3), showing that our aligned in clade A with SNP data gathered from genotyping by sequencing data set was not sensitive to a reduction of such data. We suspect (GBS) from many additional accessions of D . carota from other that this trend will be present in other large phylogenomic data sets. described subspecies and geographic areas. Our results failing to No major topological differences in simulated studies using miss- distinguish D . capillifolius , D. carota subsp. carota , and D. carota ing data on large alignments of eukaryotes were found by P hilippe subsp. gummifer could be a result of gene fl ow, or from multiple et al. (2004). To accurately reconstruct the phylogeny of an organ- origins of these morphotypes, or may be a result of the inappropri- ism, the number of genes used is considered a more important fac- ateness of nuclear orthologs to separate these closely related taxa. tor than taxon number ( Rokas and Carroll, 2005 ). It is useful to The fact that the taxa in clade A′ all share 18 chromosomes, experi- determine a number of genes that approaches the dominant topol- mental and fi eld data document ease of gene fl ow, and they are ogy of Daucus . We identifi ed gene DC10366 (aligned length: closely related as documented here suggest that they may be easily 1128 bp) that produced a tree with high bootstrap values in all incorporated in carrot breeding programs. major clades and resembled the dominant topology ( Fig. 2) , and a A ccession numbers 286611, 652387, and 25898 were grouped concatenated data set of 10 genes (11 480 bp) that produced even in clade B. Originally, they were identifi ed as D aucus carota (no better bootstrap values and topological concordance ( Fig. 4 ). October 2014] ARBIZU ET AL.—PHYLOGENOMICS OF DAUCUS 1681 F ig. 7. (A) Species tree based on a coalescent analysis using 104 accessions of 27 species of Daucus and outgroups using the models of evolution obtained by jModelTest. (B) Species tree based on a coalescent analysis using the same 37 accessions of 22 species of D aucus and outgroups using the HKY models modifi ed to be accepted in *BEAST with the lowest AIC value (T able 2 column 4). Number above the branches of both fi gures are posterior probabilities. According to Salichos and Rokas (2013), selecting genes with high based on a better-resolved tree topology. The two-allele alignment average bootstrap support reduces incongruences among many in- with ambiguities can be criticized because if the two alleles under- ternodes. In addition, concatenation of a set of genes with bootstrap went incomplete lineage sorting, they may not share the same tree; support higher than 60% can produce a species phylogeny similar the history of each gene in that alignment may not be tree-like. The to that obtained when using all genes together (S alichos and one-allele data set does not have this problem (personal communi- Rokas, 2013) . cation, Cécile Ané). We decided to use a data set containing a single allele instead of We designed our study to include 10 exons to explore their taxo- a data set containing two alleles merged into one in our analyses, nomic utility and to see if they would be more useful to resolve the 1682 AMERICAN JOURNAL OF BOTANY [Vol. 101 outgroups. As expected, the exonic regions had a smaller propor- T axonomy of Daucus — As discussed in the introduction, molec- tion of parsimony-informative characters as compared with the ular data place some species from nine non-D aucus genera in a intronic regions, but the outgroups resolved the same in both data D aucus clade and suggest the need to redefi ne the taxonomic bound- sets. aries of the genus. Lee et al. (2001) supported some species from C oncatenation of a large number of genes is not guaranteed to three of these genera, A grocharis, P achyctenium , and P seudorlaya, resolve phylogenetic relationships ( Blair and Murphy, 2011 ; Blair as nested within D aucus, based on a cladistic analysis of morpho- et al., 2012 ). In fact, Salichos and Rokas (2013) stated that the use logical data; the other six genera have yet to be examined morpho- of bootstrap support values on concatenated analyses of large data logically. However, the congruence of morphological and molecular sets should be abandoned. The concatenation method is justifi ed data provides strong support for a redefi nition of D aucus to include when a data set has evolved under the same underlying history, in species from these three genera, and perhaps more in the future. which differences in the estimated trees are due only to sampling The three well-supported clades of some accessions previously error or model misspecifi cation ( Baum, 2007 ). If this is not the case assigned to D . broteri, D. carota , and D . guttatus , and D . littoralis for our data set, as is very likely, differences among data sets will ( Table 1, Fig. 2) in the dominant topology provide strong support not be due to sampling error, but to genealogical discordance. for their recognition as three separate species. Their recognition as Bayesian concordance analysis (BCA) does not assume any single distinct species awaits further molecular and morphological stud- evolutionary history. Our concordance analyses of 94 nuclear or- ies of additional accessions. If such studies support distinct species thologs yielded a primary concordance tree, suggesting there are status, however, additional herbarium research of type specimens signifi cant discordant histories in D aucus genomes. Clades con- is needed to assign their proper taxonomic name. taining the subspecies of D aucus c arota have very low concor- Our present molecular study and the morphological studies of dance factors ( Fig. 6) . The clade containing D . sahariensis and D. Spooner et al. (2014) show the diffi culty of defi ning subspecies of syrticus has the highest concordance factor (0.69), indicating that D . carota . In addition, these studies and the SNP analysis of Ior- there are minor discordant histories in D . sahariensis and D . syrti- izzo et al. (2013) show D. capillifolius to be morphologically dis- cus genomes relative to D . carota subspecies genomes. tinct, yet nested within D. carota. These results and the shared As indicated already, our data set had 6.1% of missing data. chromosome numbers and ease of crossability (above) suggest that According to A né et al. (2007) , missing data represents a technical D. capillifolius may be better recognized as a subspecies of D. issue in BCA leading to mixing diffi culties. However, the standard carota , but we await our further SNP analyses of additional acces- deviation concordance factor of our analysis was less than 0.005, sions of D . capillifolius and D . carota before we consider this taxo- indicating a good mixing. Discordance between genes was previ- nomic change. ously reported in different plant species such as potatoes and toma- In summary, relative to our three goals outlined in the intro- toes ( Rodríguez et al., 2009) , rice ( Cranston et al., 2009) , animals duction, (1) for concatenated data sets, MP and ML analyses of such as salamanders ( Williams et al., 2013 ) and lizards (L eaché, the entire D aucus data set of 94 nuclear orthologs produced 2009 ), and plant pathogens such as P hytophthora sp. (B lair et al., mostly congruent trees with 100% bootstrap support for most of 2012 ). The reasons for the discordance in our data set could be the external and many of the internal clades. The BCA analysis explained by a number of causes, from methodological explana- showed a similar topology to the MP and ML trees, but high- tions such as alignment bias or undetected paralogy, to biological lighted the fact that there were often low proportions of genes reasons such as incomplete lineage sorting or hybridization that supported certain clades. (2) The coalescent analysis is no- ( Wendel and Doyle, 1998 ). However, P hilippe et al. (2011), demon tably different from the MP, ML, and BCA trees. At present, we strated that phylogenomics is relatively robust to the possible in- can only speculate on causes of discordance of our gene trees, clusion of nonorthologous sequences when the genuine phyloge- but our database is useful for future workers wishing to explore netic signal is abundant. Therefore, the most likely factors that may causes of discordance in D aucus and other organisms. (3) The be causing discordance are recombination, hybridization and intro- use of multiple nuclear orthologs and next-generation technolo- gression ( Rieseberg et al., 2000 ), and incomplete lineage sorting gies highlighted some diffi cult species groups in Daucus and ( Pamilo and Nei, 1988) . discovered misidentifi cations in germplasm collections. We T he results obtained using MP, ML, and BCA are notably identifi ed a useful subset of markers and methodological ap- different from the *BEAST tree. The multispecies coalescent proaches for future studies of dominant topologies in D aucus, approach implemented in *BEAST assumes that genealogical potentially saving time and resources. discordance is entirely due to incomplete lineage sorting, Since the initiation of our study, the Roche 454 sequencer is which is considered one of the most common causes of seri- being phased out of service and will not be available after 2016. ous diffi culties for phylogenetic inference ( Maddison and A repeat of our techniques could possibly use an Illumina Knowles, 2006; B aum and Smith, 2013 ). However, we know MiSeq platform, but read lengths currently are at a maximum of that there are other processes that can cause genealogical dis- 300 bp. Alternatively, the Pacifi c Bioscience platform could cordance. As a result, it is better to consider an alternative take full advantage of the entire length of the nuclear orthologs approach, the BCA. This method integrates over gene tree un- we examined ( Table 2) . certainty and does not make any particular assumption regard- ing the reason for discordance ( Larget et al., 2010) . Furthermore, LITERATURE CITED BCA uses a simple measure of the prior probability of gene- to-gene discordance to convert sequence data from multiple ANÉ , C . , B . LARGET , D . A. B AUM, S. D. S MITH , AND A. ROKAS. 2 007 . B ayesian genes into an estimate of the proportion of the genome for estimation of concordance among gene trees. M olecular Biology and Evolution 2 4 : 4 12– 426 . which any clade is true, its concordance factor (B aum and AYRES , D . , A. D ARLING , D. ZWICKL , P. BEERLI , M . T. HOLDER, P. O. LEWIS , Smith, 2013) . To date, there is not enough evidence to con- J. HUELSENBECK, ET AL . 2012 . BEAGLE: An application programming clude the cause of genealogical discordance in the D aucus interface and high-performance computing library for statistical phyloge- genome. netics. Systematic Biology 61 : 1 70 – 173. October 2014] ARBIZU ET AL.—PHYLOGENOMICS OF D AUCUS 1683 B AUM, D. 2007 . Concordance trees, concordance factors, and the exploration FARRIS , J. S. 1970 . M ethods for computing Wagner trees. S ystematic Zoology of reticulate genealogy. T axon 5 6 : 4 17 – 426. 19 : 83– 9 2 . B AUM , D. A., AND S. D. SMITH. 2013 . Tree thinking: An introduction to phy- F ELSENSTEIN , J. 1 985 . Confi dence limits on phylogenies: An approach using logenetic biology. Roberts and Co., Greenwood Village, Colorado, USA. the bootstrap. E volution 39 : 7 83– 7 91. B LAIR , C. , AND R. W. MURPHY . 2011. Recent trends in molecular phylogenetic FITCH , W. 1971. Toward defi ning the course of evolution: Minimum analysis: Where to next? J ournal of Heredity 102 : 130 –1 38 . change for a specifi c tree topology. S ystematic Zoology 2 0: BLAIR , J. E. , M. D. C OFFEY , AND F . N. M ARTIN. 2012. Species tree estimation 406 – 416. for the late blight pathogen, P hytophthora infestans, and close relatives. F AO [FOOD AND AGRICULTURAL ORGANIZATION OF THE UNITED NATIONS]. 2012 . P LoS ONE 7 : e37003. FAOSTAT, Rome, Italy. Website h ttp://www.faostat.fao.org [accessed BRANDENBURG , W. A. 1981 . P ossible relationships between wild and culti- 4 March 2014]. vated carrots (D aucus carota L.) in the Netherlands. D ie Kulturpfl anze FULTON , T . M. , R. V AN DER HOEVEN , N. T. EANNETTA , AND S . D. TANKSLEY. 2 9: 3 69– 3 75 . 2002 . I dentifi cation, analysis, and utilization of conserved ortholog CAI , D . , F. RODRÍGUEZ , Y . T ENG, C. A NÉ , M. BONIERBALE , L. A. MUELLER , AND set markers for comparative genomics in higher plants. Plant Cell 1 4: D. M. S POONER . 2 012 . Single copy nuclear gene analysis of polyploidy 1457– 1 467 . in wild potatoes (S olanum section P etota ). B MC Evolutionary Biology G ERNHARD , T. 2008. T he conditioned reconstructed process . J ournal of 1 2: 7 0. Theoretical Biology 253 : 7 69– 7 78 . C AMACHO C. , G. C OULOURIS, V. AVAGYAN, N. MA, J. P APADOPOULOS, K. B EALER , G RZEBELUS, D., R. BARANSKI , K. SPALIK , C. ALLENDER , AND P . W. SIMON . AND T. L. M ADDEN . 2 009. B LAST+: Architecture and applications. B MC 2011 . D aucus . In C. Kole [ed.], Wild crop relatives: Genomic and Bioinformatics 1 0: 421. breeding resources, vol. 5, Vegetables, 129–216. Springer Verlag, CHEN, F . C. , AND W. H. LI . 2001 . G enomic divergences between humans Berlin, Germany. and other hominoids and the effective population size of the common H AUSER , T. P. 2002. F rost sensitivity of hybrids between wild and cultivated ancestor of humans and chimpanzees. A merican Journal of Human carrots. C onservation Genetics 3 : 73– 76. Genetics 68: 4 44 –4 56 . H AUSER , T. P. , AND G. K. BJØRN. 2001 . Hybrids between wild and cultivated CHEVREUX , B. , T. W ETTER , AND S. SUHAI. 1 999 . G enome sequence assembly carrots in Danish carrot fi elds. Genetic Resources and Crop Evolution using trace signals and additional sequence information computer science 4 8: 499– 506. and biology . C omputer Science and Biology: Proceedings of the German IORIZZO, M. , D. SENALIK , S . ELLISON, P. CAVAGNARO , S . C HENG , P. Z HENG, Conference on Bioinformatics (GCB) 9 9 : 4 5– 56 . Z. Z HENG , ET AL. 2014 . The building of the fi rst Apiaceae genome. C HRISTELOVÁ, P ., M . VALÁRIK , E. H IBOVÁ, E. DE LANGHE , AND J . DOLEŽEL . Plant and Animal Genome XXII, San Diego, California, USA [on- 2 011. A multi gene sequence-based phylogeny of the Musaceae (ba- line abstract]. Website h ttps://pag.confex.com/pag/xxii/webprogram/ nana) family. B MC Evolutionary Biology 1 1 : 1 03. Paper11512.html. C RANSTON , K., B. HURWITZ, D. WARE, L. STEIN , AND R . A. WING. 2 009. S pecies I ORIZZO, M ., D. S ENALIK , S. ELLISON , D. GRZEBELUS , P. CAVAGNARO , C. trees from highly incongruent gene trees in rice. S ystematic Biology 5 8: A LLENDER , J . BRUNET , E T AL . 2 013. G enetic structure and domestication 489 – 500. of carrot (D aucus carota subsp. s ativus L.) (Apiaceae). A merican Journal DARRIBA , D. , G. L. TABOADA , R. DOALLO , AND D. POSADA. 2012 . jModelT- of Botany 100: 930 – 938. est 2: More models, new heuristics and parallel computing. N ature I ORIZZO , M. , D. A. SENALIK , D. G RZEBELUS, M. B OWMAN , P . F. CAVAGNARO, M. Methods 9 : 7 72. MATVIENKO , H. ASHRAFI , ET AL . 2011 . D e novo assembly and character- D ELSUC , F. , H . BRINKMANN , AND H . P HILIPPE. 2 005 . Phylogenomics and ization of the carrot transcriptome reveals novel genes, new markers, and the reconstruction of the tree of life. N ature Reviews. Genetics 6 : genetic diversity. B MC Genomics 12 : 389. 361– 3 75 . I OVENE, M., E. GRZEBELUS, D. C ARPUTO, J . JIANG, AND P . W. S IMON . D OOLITTLE , W. F. 1999 . Lateral genomics. Trends in Cell Biology 9: M5– M8 . 2 008 . Major cytogenetic landmarks and karyotype analysis in D OWNIE, S. R., D. S. KATZ-DOWNIE, AND M . F. W ATSON . 2 000 . A phylogeny of D aucus carota and other Apiaceae. A merican Journal of Botany 9 5 : the fl owering plant family Apiaceae based on chloroplast DNA r pl16 and 7 93 – 804 . rpoc1 intron sequences: Towards a suprageneric classifi cation of subfam- JEFFREYS , H. 1 946 . An invariant form for the prior probability in estimation ily Apioideae . A merican Journal of Botany 8 7 : 273 – 292 . problems. Proceedings of the Royal Society of London, A, Mathematical DRESSMAN , D . , H. YAN , G. TRAVERSO , K. W. KINZLER , AND B. V OGELSTEIN . and Physical Sciences 1 86 : 4 53 –4 61 . 2 003. Transforming single DNA molecules into fl uorescent magnetic KOLACZKOWSKI , B. , AND J. W. THORNTON . 2004. P erformance of maximum particles for detection and enumeration of genetic variations. P roceedings parsimony and maximum likelihood phylogenetics when evolution is of the National Academy of Sciences, USA 100 : 8817 - 8822 . heterogeneous. N ature 4 31 : 9 80 –9 84 . D RUMMOND , A. J., G. K. N ICHOLLS , A . G. R ODRIGO, AND W . S OLOMON . 2002 . KOONIN, E. V. 2 005. Orthologs, paralogs, and evolutionary genomics. Annual Estimating mutation parameters, population history and genealogy si- Review of Genetics 39 : 309– 338. multaneously from temporally spaced sequence data. G enetics 1 61 : KOREN, S., M. C. SCHATZ , B . P. W ALENZ , J . M ARTIN, J. T. HOWARD , G. 1307 –1 320 . G ANAPATHY, Z. WANG , ET AL . 2 012 . H ybrid error correction and d e novo D RUMMOND , A. J. , M. A. S UCHARD , D. X IE, AND A . RAMBAUT . 2 012 . B ayesian assembly of single-molecule sequencing reads. Nature Biotechnology phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and 3 0 : 693 – 700 . Evolution 29: 1969 –1 973 . K RICKL , M. 1961 . Karotten: zur Frage der Verkreuzung mit der wilden DUNN , C . W. , A. HEJNOL, D . Q. M ATUS , K . PANG, W. E. BROWNE , S. A. S MITH , Karotte. S aatgut-Wirtschaft 13: 1 35 – 136 . E. S EAVER , ET AL . 2008 . Broad phylogenomics sampling improves reso- KURTZ , S. , A. PHILLIPPY , A. L. DELCHER , M. SMOOT, M. S HUMWAY , C. lution of the animal tree of life. N ature 4 52 : 745 – 749 . ANTONESCU, AND S . L. S ALZBERG. 2 004. Versatile and open software for EDGAR , R . C. 2 004 . MUSCLE: Multiple sequence alignment with high accu- comparing large genomes . G enome Biology 5 : R12 . racy and high throughput. N ucleic Acids Research 32: 1 792 –1 797 . LANG , J. M., A. E. D ARLING , AND J. A. EISEN. 2 013 . Phylogeny of bacterial and E DWARDS , V . 2009 . Is a new and general theory of molecular systematics archaeal genomes using conserved genes: Supertrees and supermatrices. emerging? E volution 6 3 : 1– 19 . P LoS ONE 8 : e62510 . EGAN , A. , J. SCHLUETER, AND D . M. S POONER . 2 012. A pplications of next- L ARGET , B., S. K. KOTHA , C. N. DEWEY, AND C. A NÉ. 2010. BUCKy: Gene generation sequencing in plant biology. A merican Journal of Botany tree/species tree reconciliation with the Bayesian concordance analysis. 9 9 : 1 75– 185 . Bioinformatics 26 : 2 910– 2911 . ELLIS , P . R. , J. A. H ARDMAN , T . C. C ROWTHER, AND P. L. S AW. 1 993 . E xploitation LEACHÉ , A. D. 2 009. Species tree discordance traces to phylogeographic clade of the resistance to carrot fl y in the wild carrot species D aucus capillifo- boundaries in North American fence lizards (S celoporus) . S ystematic lius. Annals of Applied Biology 122: 79 – 91 . Biology 5 8: 5 47 – 559. 1684 AMERICAN JOURNAL OF BOTANY [Vol. 101 L EE , B . Y., AND S . R. D OWNIE. 2000 . Phylogenetic analysis of cpDNA restric- P LUNKETT, G . M., D . E. SOLTIS, AND P . S. S OLTIS . 1996 . E volutionary tion sites and r ps l6 intron sequences reveals relationships among Apiaceae patterns in Apiaceae: Inferences based on m atK sequence data. tribes Caucalideae, Scandiceae and related taxa. P lant Systematics and S ystematic Botany 21 : 477– 4 95. Evolution 221 : 35 –6 0 . RIESEBERG, L. H. , S. J. BAIRD , AND K . A. G ARDNER . 2 000. H ybridization, LEE , B . Y. , G. A. LEVIN , AND S. R. DOWNIE . 2001. Relationships within introgression, and linkage evolution. P lant Molecular Biology 4 2 : the spiny-fruited umbellifers (Scandiceae subtribes Daucinae and 2 05– 224 . Torilidinae) as assessed by phylogenetic analysis of morphological R OCHE. 2 009 . em-PCR method manual—Lib-A SV. Website http://www. characters. S ystematic Botany 26 : 622– 642 . high-throughput-sequencing.com/manuals_roche/emPCR_liba_sv_ L EVIN , R. A., A. W HELAN , AND J. S. MILLER . 2009 . The utility of nuclear Oct2009.pdf [accessed 18 July 2014]. conserved ortholog set II (COSII) genomic regions for species- R OCHE. 2 010 . Rapid library preparation method manual. Website level phylogenetic inference in Lycium (Solanaceae). M olecular h ttp://dna.uga.edu/wp-content/uploads/2013/12/GS-FLX-Titanium- Phylogenetics and Evolution 53: 881 – 890 . RapidLibrary-RL-Preparation-Method-Manual-Roche.pdf [accessed L EWIS, P. O., M . T. H OLDER, AND K. E. HOLSINGER . 2005 . Polytomies 18 July 2014]. and Bayesian phylogenetic inference. S ystematic Biology 54: R ODRÍGUEZ , F. , F . WU , C . A NÉ, S. T ANKSLEY, AND D . M. S POONER . 2 009 . Do 241– 253. potatoes and tomatoes have a single evolutionary history, and what L I , M. , J. WUNDER , G. B ISSOLI , E. SCARPONI , S. GAZZANI , E. BARBARO , H. proportion of the genome supports this history? B MC Evolutionary S AEDLER , AND C. V AROTTO. 2 008 . Development of COS genes as Biology 9 : 1 91. universally amplifi able markers for phylogenetic reconstructions of ROKAS, A . , AND S. B. C ARROLL. 2005 . More genes or more taxa? The rela- closely related plant species. C ladistics 24 : 7 27– 745. tive contribution of gene number and taxon number to phylogenetic M ADDISON, D. R. , AND W. P. MADDISON . 2005 . MacClade 4.08a: accuracy. M olecular Biology and Evolution 2 2 : 1337 – 1344 . Analysis of phylogeny and character evolution. Sinauer, Sunderland, ROKAS, A. , AND S. B. C ARROLL . 2006 . Bushes in the tree of life. P LoS Massachusetts, USA. Biology 4: e352 . MADDISON , W . P. , AND L. L. KNOWLES. 2 006. I nferring phylogeny despite ROKAS , A., D. KRUGER, AND S. CARROLL. 2 005 . A nimal evolution and the incomplete lineage sorting. S ystematic Biology 5 5 : 2 1 –3 0 . molecular signature of radiations compressed in time. S cience 3 10 : MARGULIES, M. , M. EGHOLM , W. A LTMAN , S. A TTIYA , J. BADER , L . BEMBEN , 1 933– 1938 . J. BERKA, E T AL . 2005 . Genome sequencing in microfabricated high- ROKAS , A ., B. L. WILLIAMS , N. KING, AND S . B. C ARROLL. 2003 . G enome- density picolitre reactors. N ature 437: 376– 3 80. scale approaches to resolving incongruence in molecular phylogenies. MCCOLLUM , G. D. 1975 . Interspecifi c hybrid D aucus carota × D . capil- N ature 425: 798– 804 . lifolius. Botanical Gazette (Chicago, Ill.) 136 : 201– 206 . RONQUIST, F. , M. TESLENKO, P . VAN DER MARK , D. L. AYRES , A . D ARLING , MCCOLLUM, G . D. 1977 . Hybrids of D aucus gingidium with cultivated S . HÖHNA, B . L ARGET , E T AL. 2012 . M rBayes 3.2: Effi cient Bayesian carrots (D . carota subsp. sativus) and D. capillifolius. Botanical phylogenetic inference and model choice across a large model space. Gazette 138 : 56 – 63 . S ystematic Biology 6 1 : 539 –5 42. M CCORMACK, J . E. , S. HIRD , A . ZELLMER, B . CARSTENS , AND R. B RUMFIELD . R OTHBERG, J. M. , AND J . H. LEAMON . 2008. The development and impact 2 013 . A pplications of next-generation sequencing to phylogeogra- of 454 sequencing. N ature Biotechnology 2 6 : 1117– 1124. phy and phylogenetics. M olecular Phylogenetics and Evolution 66: R UBATZKY , V. E. , C. F. Q UIROS, AND P. W. SIMON . 1 999 . Carrots and re- 526– 538 . lated vegetable Umbelliferae. CABI, New York, New York, USA. M ILLER , M. A., W . P FEIFFER, AND T. SCHWARTZ . 2010 . Creating the SAENZ DE RIVAS , C., AND V . H. HEYWOOD . 1974. Preliminary study on the CIPRES Science Gateway for inference of large phylogenetic trees. Daucus species of peninsular Spain). A nales del Jardin Botánico A. J. Proceedings of the Gateway Computing Environments Workshop Canavilles 31: 97 –1 18. (GCE) in New Orleans, 2010, 1–8, USA. SÁENZ LAÍN , C . 1981. Research on Daucus L. (Umbelliferae). A nales del MOSSEL , E. , AND E . VIGODA . 2005 . Phylogenetic MCMC algorithms are Jardin Botánico de Madrid 3 7: 481 –5 34. misleading on mixtures of trees. S cience 309: 2 207 –2 209 . S ALICHOS, L ., AND A. ROKAS . 2 013. Inferring ancient divergences requires N OTHNAGEL , T. , P . STRAKA, AND B. LINKE . 2000 . Male sterility in popula- genes with strong phylogenetic signals. N ature 497 : 3 27 – 333 . tions of D aucus and the development of alloplasmic male-sterile lines SANGER , F . , S. NICKLEN , AND A . R. C OULSON. 1977. D NA sequencing with of carrot. P lant Breeding 119 : 145 – 152 . chain-terminating inhibitors. P roceedings of the National Academy of O WCZARZY , R., A . V. T ATAUROV , Y . W U , J. A. MANTHEY, K. A. MCQUISTEN , Sciences, USA 7 4 : 5463– 5467. H . G. A LMABRAZI, K. F. PEDERSEN, E T AL . 2 008 . I DT SciTools: A SCHIERWATER, B. , M. EITEL, W. JAKOB , H. J. OSIGUS, H . HADRYS , S. suite for analysis and design of nucleic acid oligomers. N ucleic Acids L. DELLAPORTA , S . O. KOLOKOTRONIS, AND R . DESALLE . 2009 . Research 3 6 : W 163 – W169. C oncatenated analysis sheds light on early metazoan evolution PACIFIC BIOSCIENCES. 2013 . Pacifi c Biosciences introduces new chemistry and fuels a modern ‘‘urmetazoon’’ hypothesis. P LoS Biology 7: with longer read lengths to detect novel features in DNA sequence and e1000020. advance genome studies of large organisms. Website h ttp://investor. S HENDURE , J ., AND H. J I. 2 008 . N ext-generation DNA sequencing. N ature pacifi cbiosciences.com/releasedetail.cfm?ReleaseID=794692 [accessed Biotechnology 26 : 1 135– 1145 . 9 March 2014]. SIMON , P. , L . POLLAK , B. CLEVIDENCE , J. H OLDEN , AND D. H AYTOWITZ. 2009. PAGE, R. D. M. , AND M . A. C HARLESTON. 1 997. From gene to organismal Plant breeding for human nutrition. Plant Breeding Reviews 31 : phylogeny: Reconciled trees and the gene tree species tree problem. 3 25 – 392 . M olecular Phylogenetics and Evolution 7 : 231 –2 40 . S OLTIS, D. E., M. A. GITZENDANNER , G. STULL, M. C HESTER, A. CHANDERBALI, P AMILO , P. , AND M. NEI. 1 988 . Relationships between gene trees and spe- S. C HAMALA , I. J ORDON-THADEN, ET AL. 2013 . The potential of genom- cies trees. M olecular Biology and Evolution 5: 568 –5 83 . ics in plant systematics. T axon 62 : 886– 898 . P HILIPPE , H ., H. B RINKMANN, D . V. LAVROV , D. T. J. L ITTLEWOOD, M. S PALIK, K. , AND S . R. DOWNIE. 2 007. Intercontinental disjunctions in MANUEL , G. WORHEIDE , AND D. BAURAIN . 2 011. R esolving diffi cult Cryptotaenia (Apiaceae, Oenantheae): An appraisal using molecular phylogenetic questions: Why more sequences are not enough. P LoS data. Journal of Biogeography 34 : 2039 – 2054 . Biology 9 : e 1000602 . SPALIK , K. , M. PIWCZYŃSKI , C . A. DANDERSON , R. KURZYNA-MŁYNIK , T. S. PHILIPPE , H . , E. A. SNELL, E. BAPTESTE , P. L OPEZ , P. W. H. H OLLAND, AND BONE, AND S. R. DOWNIE. 2 010 . Amphitropic amphiantarctic disjunc- D . CASANE . 2004. P hylogenomics of eukaryotes: Impact of miss- tions in Apiaceae subfamily Apioideae. J ournal of Biogeography 3 7: ing data on large alignments . M olecular Biology and Evolution 21: 1 977– 1 994. 1740 – 1752. S POONER, D . M., P. R OJAS , M . BONIERBALE, L. A. MUELLER , M . SRIVASTAV , PIMENOV, M. G. , AND M. V. LEONOV. 1993 . The genera of Umbelliferae: A D. S ENALIK , AND P . SIMON . 2013 . Molecular phylogeny of D aucus. nomenclator. Royal Botanic Gardens, Kew, UK. Systematic Botany 38 : 850 – 857 . October 2014] ARBIZU ET AL.—PHYLOGENOMICS OF D AUCUS 1685 SPOONER , D. M. , M. P. WIDRLECHNER , K. R. REITSMA , D. E. PALMQUIST , S . VIVEK , B . S., AND P . SIMON . 1 999 . Phylogeny and relationships in R OUZ , Z. GHRABI-GAMMAR, M. NEFFATI, ET AL . 2 014 . R eassessment of D aucus based on restriction fragment length polymorphisms practical subspecies identifi cations of the USDA D aucus carota germ- (RFLPs) of the chloroplast and mitochondrial genomes. E uphytica plasm collection: Morphological data. C rop Science 54 : 706– 7 18 . 105 : 1 83– 1 89 . ST. PIERRE , M. D. , R . J. BAYER , AND I . M. WEISS. 1990 . An isozyme-based WENDEL , J. F. , AND J . J. DOYLE . 1 998 . P hylogenetic incongruence: assessment of the genetic variability within the Daucus carota complex Window into genome history and molecular evolution. In D. E. (Apiaceae: Caucalideae). C anadian Journal of Botany 68 : 2449 – 2457. Soltis, P. S. Soltis, and J. J. Doyle [eds.], Molecular systemat- STAMATAKIS , A. 2014. R AxML version 8: A tool for phylogenetic anal- ics of plants II: DNA sequencing, 265–296. Kluwer, Boston, ysis and post-analysis of large phylogenies. B ioinformatics 3 0: Massachusetts, USA. 1 312– 1 313 . WIJNHEIJMER , E. H. M., W . A. B RANDENBURG , AND S. J. TER BORG . 1 989. STEINBORN, R. , B. LINKE , T. NOTHNAGEL , AND T. BÖRNER . 1995 . I nheritance I nteractions between wild and cultivated carrots (D aucus carota L.). of chloroplast and mitochondrial DNA in alloplasmic forms of the Euphytica 4 0: 147 – 154. genus D aucus. T heoretical and Applied Genetics 91 : 6 32 – 638 . WILLIAMS , J . S., J. H. NIEDZWIECKI, AND D. W. WEISROCK . 2013 . Species S WOFFORD , D. 2002 . PAUP*: Phylogenetic analysis using parsimony (*and tree reconstruction of a poorly resolved clade of salaman- other methods), version 4.0a131. Sinauer, Sunderland, Massachusetts. ders (Ambystomatidae) using multiple nuclear loci. M olecular T AKAHATA, N . 1 989. Gene genealogy in three related populations: Phylogenetics and Evolution 6 8 : 6 71– 6 82 . Consistency probability between gene and population trees. G enetics WU , F ., L . A. MUELLER , D . C ROUZILLAT, V . P ÉTIARD, AND S . D. TANKSLEY. 1 22 : 957 – 966. 2 006. C ombining bioinformatics and phylogenetics to identify large UMIEL, N . , R. JACOBSON , AND D. GLOBERSON. 1 975 . Pollination of the culti- sets of single-copy orthologous genes (COSII) for comparative, evo- vated carrot ( Daucus carota L.) by the wild carrot (D . carota var. max- lutionary and systematic studies: A test case in the euasterid plant imus) , and its implications on commercial seed production. H assadeh clade. G enetics 1 74 : 1407 – 1420 . 5 6: 478– 480 [ In Hebrew, English summary] . ZWICKL , D. J. 2 006 . Genetic algorithm approaches for the phylogenetic USDA [U. S. DEPARTMENT OF AGRICULTURE], NATIONAL AGRICULTURAL analysis of large biological sequence datasets under the maximum STATISTICS SERVICE . 2012 . Data and statistics. Website http://www.nass. likelihood criterion. Ph.D. dissertation, University of Texas at Austin, usda.gov/Data_and_Statistics/index.asp [accessed 4 March 2014]. Austin, Texas, USA.