| Abstract Detail
Systematics
Joseph, Camryn [1], Stadelmann, Karoline [1], Endara, Lorena [2].
The New Language of Science: A novel approach to generating a phenomic matrix of gymnosperms.
We used a semi-automated Natural Language Processing approach to extract phenomic traits and assemble a taxon/character matrix for conifers and Gnetales. Digital taxonomic descriptions were obtained for all extant lineages of conifers, gnetophytes, and Ginkgo from the Gymnosperm Database and the Floras of North America, China, and Pakistan. The syntactic analysis (parsing) of the descriptions and the assembly of the matrix were performed using the ‘Text Capture’ and ‘Matrix Generation’ tools of the Explorer of Taxon Concepts (ETC) pipeline (http://etc.cs.umb.edu/etcsite/start.html#HomePlace:). The raw matrix output by ETC was evaluated and discretized using the open-source ‘MatrixConverter’ software (https://github.com/gburleigh/MatrixConverter/tree/master/distribution). The resulting gymnosperm matrix consists of 4452 characters for 650 taxa, which represent 96% of the targeted species, and it will be used to reconstruct the evolutionary history of “Flagellate Plants”.
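To illustrate what discretizing a raw matrix can involve, the following minimal sketch converts one numeric character into discrete state codes using equal-width bins. It is not the MatrixConverter algorithm, which the abstract does not describe. The function name `discretize`, the bin count, the character name `leaf_length_cm`, and the taxon labels and values are all invented for illustration.

```python
from typing import Dict, Optional

# Hypothetical raw matrix: taxon -> character -> numeric value (None = missing).
RawMatrix = Dict[str, Dict[str, Optional[float]]]


def discretize(raw: RawMatrix, character: str, n_states: int = 3) -> Dict[str, str]:
    """Code a numeric character as states '0'..'n_states-1' (equal-width bins).

    Taxa with no value for the character are coded '?', the usual symbol
    for missing data in phylogenetic matrices.
    """
    values = [row[character] for row in raw.values()
              if row.get(character) is not None]
    if not values:
        return {taxon: "?" for taxon in raw}

    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_states or 1.0

    states: Dict[str, str] = {}
    for taxon, row in raw.items():
        value = row.get(character)
        if value is None:
            states[taxon] = "?"
        else:
            # Clamp so the maximum value falls in the last bin.
            states[taxon] = str(min(int((value - lo) / width), n_states - 1))
    return states


# Invented example values, not data from the study.
raw = {
    "taxon_A": {"leaf_length_cm": 1.0},
    "taxon_B": {"leaf_length_cm": 5.0},
    "taxon_C": {"leaf_length_cm": 10.0},
    "taxon_D": {"leaf_length_cm": None},
}
states = discretize(raw, "leaf_length_cm")
```

Equal-width binning is only one possible choice. Gap-based or expert-defined state boundaries may suit morphological characters better, which is one reason a manual evaluation step like the one described above remains useful.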
Related Links: Natural Language Processing Pipeline; MatrixConverter software to discretize characters; Explorer of Taxon Concepts: Natural Language Processing Pipeline to parse text and generate the matrix; Floras used in this study
1 - University of Florida, Biology, PO Box 118525, Gainesville, FL, 32611, USA
2 - University of Florida, Biology, Carr Hall, 217, PO Box 118525, Gainesville, FL, 32611, USA
Keywords: Natural Language Processing; semi-automated approach; gymnosperms; phenomic matrix.
Presentation Type: Poster
Session: P, Systematics Section/ASPT Posters
Location: Exhibit Hall/Savannah International Trade and Convention Center
Date: Monday, August 1st, 2016
Time: 5:30 PM
This poster will be presented at 5:30 pm. The Poster Session runs from 5:30 pm to 7:00 pm. Posters with odd poster numbers are presented at 5:30 pm, and posters with even poster numbers are presented at 6:15 pm.
Number: PSY009
Abstract ID: 688
Candidate for Awards: None |