Knowledge Extraction from Specimen-Derived Data from GenBank to Enrich Biodiversity Information
2021
Journal: Biodiversity Information Science and Standards
Author:  
Abstract:

DNA barcoding and environmental DNA (eDNA) are increasing the need for the utilization of gene sequences in the field of biodiversity. GBIF (Global Biodiversity Information Facility) and GGBN (Global Genome Biodiversity Network) are taking action on the treatment of gene sequences in the field of biodiversity (Finstad et al. 2020). Gene sequences have been collected and published by INSDC (International Nucleotide Sequence Database Collaboration) for over 30 years (Arita et al. 2020). Biodiversity information has been collected using standards such as Darwin Core (Wieczorek et al. 2012), but INSDC gene sequences are stored in their own format. In the field of bioinformatics, researchers are also organizing the BioHackathon series, notably the NBDC/DBCLS BioHackathon and the spin-off Biohackathon Europe, to standardize data through the Semantic Web (Garcia Castro et al. 2021, Vos et al. 2020), but the linkage with biodiversity information has just begun.

In this study, as an example of linking gene sequence information with biodiversity information, I attempted to construct an infrastructure for knowledge extraction by utilising gene sequence entries derived from museum specimens from GenBank (Sayers et al. 2020). I have previously surveyed the BOLD (The Barcode of Life Data System) (Ratnasingham and Hebert 2007) IDs listed in GenBank (Nakazato 2020). I downloaded the fish and insect data from the GenBank FTP (file transfer protocol) site. Then I extracted the descriptions in the "specimen_voucher" field and obtained 749,627 (28% of the fish entries in GenBank) and 1,621,890 (13%) specimen IDs, respectively. I also extracted from the "note" field approximately 1000 entries describing the type of the specimen, such as "holotype", "lectotype", and "paratype". These extracts include descriptions written in natural language. NCBI (National Center for Biotechnology Information) publishes the BioCollections database (Sharma et al. 2019), and these data may be able to refine the description.

In the future, I plan to map these extracted IDs to the collection IDs in the biodiversity information database. This will enable us to enrich the biodiversity information with GenBank descriptions, for example, by adding articles listed in GenBank as references to the specimen data.
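The extraction step described in the abstract (pulling "specimen_voucher" values and type terms from the "note" field of GenBank flat-file entries) could be sketched roughly as follows. This is a minimal illustration, not the author's actual pipeline: the function names, the regular expression, and the sample record (including its accession and voucher ID) are all hypothetical, and the sketch does not handle values that contain escaped double quotes.

```python
import re

# Matches feature qualifiers such as /specimen_voucher="..." or /note="...".
# [^"]* also spans newlines, so values wrapped across lines are captured.
# Assumption: values do not contain escaped ("") double quotes.
QUALIFIER_RE = re.compile(r'/(specimen_voucher|note)="([^"]*)"')

# Type designations named in the abstract.
TYPE_TERMS = ("holotype", "lectotype", "paratype")


def iter_records(flatfile_text):
    """Yield individual records; each record ends with a line holding '//'."""
    for chunk in re.split(r"^//[ \t]*$", flatfile_text, flags=re.MULTILINE):
        if chunk.strip():
            yield chunk


def extract_specimen_info(record_text):
    """Return (voucher IDs, type terms found in notes) for one record."""
    vouchers = []
    type_terms = []
    for name, value in QUALIFIER_RE.findall(record_text):
        # Collapse the line wrapping used by the flat-file layout.
        value = " ".join(value.split())
        if name == "specimen_voucher":
            vouchers.append(value)
        else:
            lowered = value.lower()
            type_terms.extend(t for t in TYPE_TERMS if t in lowered)
    return vouchers, type_terms


# Hypothetical record, made up for illustration only.
SAMPLE = '''LOCUS       XX000001     650 bp    DNA     linear   VRT 01-JAN-2020
FEATURES             Location/Qualifiers
     source          1..650
                     /organism="Example fish"
                     /specimen_voucher="EXAMPLE:Fish:0001"
                     /note="holotype; collected for illustration
                     only"
//
'''

for record in iter_records(SAMPLE):
    vouchers, type_terms = extract_specimen_info(record)
    print(vouchers, type_terms)
```

A plain substring match on the note text is only a first pass; since the abstract notes that these fields contain natural-language descriptions, real data would likely need further cleaning before the IDs can be mapped to collection records.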

Journal Type: International