
Abstract Detail



Systematics

Joseph, Camryn [1], Stadelmann, Karoline [1], Endara, Lorena [2].

The New Language of Science: A novel approach to generating a phenomic matrix of gymnosperms.

We used a semi-automated Natural Language Processing approach to extract phenomic traits and assemble a Taxon/Character matrix for conifers and Gnetales. Digital taxonomic descriptions were obtained for all extant lineages of conifers, gnetophytes, and Ginkgo from the Gymnosperm Database and the Floras of North America, China, and Pakistan. The syntactic analysis of the descriptions (parsing) and the assembly of the matrix were performed using the ‘Text Capture’ and ‘Matrix Generation’ tools of the Explorer of Taxon Concepts (ETC) pipeline (http://etc.cs.umb.edu/etcsite/start.html#HomePlace:). The ETC output (raw matrix) was evaluated and discretized using the open-source ‘MatrixConverter’ software (https://github.com/gburleigh/MatrixConverter/tree/master/distribution).
The resulting gymnosperm matrix consists of 4452 characters for 650 taxa, which represent 96% of the targeted species, and it will be used to reconstruct the evolutionary history of “Flagellate Plants”.



Related Links:
Natural Language Processing Pipeline
MatrixConverter software to discretize characters
Explorer of Taxon Concepts: Natural Language Processing Pipeline to parse text and generate the matrix
Floras used in this study


1 - University of Florida, Biology, PO Box 118525, Gainesville, FL, 32611, USA
2 - University of Florida, Biology, Carr Hall, 217, PO Box 118525, Gainesville, FL, 32611, USA

Keywords:
Natural Language Processing
Semi-automated approach
gymnosperms
Phenomic matrix

Presentation Type: Poster
Session: P, Systematics Section/ASPT Posters
Location: Exhibit Hall/Savannah International Trade and Convention Center
Date: Monday, August 1st, 2016
Time: 5:30 PM. The Poster Session runs from 5:30 PM to 7:00 PM. Posters with odd poster numbers are presented at 5:30 PM, and posters with even poster numbers are presented at 6:15 PM.
Number: PSY009
Abstract ID: 688
Candidate for Awards: None


Copyright © 2000-2016, Botanical Society of America. All rights reserved