Enhancing Plant Species Retrieval in Flora Through Language Model Integration

De-Kai Kao, Chih-Kai Yang, Chien-Hsing Chen

doi:10.3897/biss.8.142132

Abstract

Traditionally, textual data storage and retrieval systems were designed primarily for human reading and relied mainly on paper records. As information technology has advanced, computerized searches have become common. However, Boolean logic-based retrieval systems often struggle to handle the diversity and richness of data. These systems rely on strict matching rules, which can return either too few or too many results. For example, when searching plant species descriptions, a query like "circle" AND "ellipse" may exclude relevant records that describe these traits in slightly different terms (e.g., "round" or "oval"). Conversely, a broader query like "oblong" may return an overwhelming number of irrelevant results. This rigidity limits the system's ability to adapt to the nuanced and varied ways users describe data. With the advent of advanced semantic models such as SBERT (Sentence-Bidirectional Encoder Representations from Transformers) (Reimers and Gurevych 2019), we can now examine the semantic relationships within textual data more closely. Unlike general-purpose large language models, SBERT is specifically designed for efficient semantic similarity computation. In plant taxonomy, records in Floras, such as Flora of Taiwan or Flora of China, play a crucial role in understanding plant diversity in specific regions. These records provide critical information on plant growth environments, morphological characteristics, and economic value. Our research aims to make retrieval of plant data more efficient using language models. Specifically, we transform textual descriptions from Floras and user queries into vector representations (Fig. 2) and calculate their cosine similarity to determine the relevance of each species record to the user's input. Cosine similarity, a metric commonly used in text mining and information retrieval, quantifies the similarity between two vectors by measuring the cosine of the angle between them. The score ranges from -1 (vectors pointing in opposite directions, least similar) to 1 (vectors pointing in the same direction, most similar), with higher scores indicating greater similarity. By applying this method, we can provide users with a list of plant species ranked by their similarity to the query (Fig. 1). This approach not only streamlines data retrieval but also introduces new perspectives for botanical research and data management, fostering a more efficient exploration of plant diversity. Our results demonstrate the potential of language models to facilitate biodiversity research and data management, especially in retrieving plant taxonomy information. Our approach provides a novel tool for future biodiversity data analysis and retrieval and may thereby contribute to biodiversity conservation.
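
The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The three-dimensional vectors below are hand-written stand-ins for the embeddings an SBERT encoder would produce, and the species labels and the helper names `cosine_similarity` and `rank_species` are hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_species(query_vec, species_vecs):
    """Score every species record against the query and sort best-first."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in species_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Assumption: toy embeddings standing in for encoded Flora descriptions.
species_vecs = {
    "Species A": [0.9, 0.1, 0.0],
    "Species B": [0.1, 0.9, 0.2],
    "Species C": [-0.8, -0.1, 0.1],
}

# Assumption: toy embedding standing in for an encoded user query.
query_vec = [1.0, 0.2, 0.0]

ranking = rank_species(query_vec, species_vecs)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup the record whose vector points in nearly the same direction as the query is listed first, and the record pointing in nearly the opposite direction is listed last with a negative score. A real system would replace the hand-written vectors with model embeddings of the Flora text and the query.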

Journal: Biodiversity Information Science and Standards
Publication Date: Nov 19, 2024
License type: CC BY 4.0
