Self-Supervised Model for Speech Tasks with Hugging Face Transformers
2021
Journal:  
Turkish Online Journal of Qualitative Inquiry
Author:  
Abstract:

For many years, speech recognition has been a focus of research. Automatic speech recognition (ASR) is the process of converting a speech signal into its corresponding sequence of words or other linguistic entities using algorithms implemented in a device. As our work and life become increasingly integrated with mobile devices such as tablets and smartphones, and with the voice assistants that run on them (e.g., Amazon Alexa, Siri, Google Now, and Cortana), speech recognition technology has quickly become one of the most popular modes of communication. This trend is attributed to significant progress in several areas, such as high computing power and powerful deep learning models, which has led to dramatically lower error rates in speech recognition systems. In this regard, our research focuses on reducing the error rate by using a self-supervised model for speech tasks. This paper presents the XLS-R model for multilingual speech representation learning, which is based on wav2vec 2.0. The XLS-R model learns basic speech units in order to solve a self-supervised task: it is trained by predicting the correct speech units for masked parts of the audio while simultaneously learning what those units should be. The model is then fine-tuned using Connectionist Temporal Classification (CTC), a technique for training neural networks on sequence-to-sequence problems such as ASR and handwriting recognition. We used the Common Voice corpus in the Turkish language. The model performs well, and the word error rate (WER) decreases significantly.
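
The two ideas the abstract relies on at evaluation time, CTC decoding and the word error rate, can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy vocabulary, and the choice of `0` as the CTC blank id are assumptions made for illustration only. Greedy CTC decoding merges consecutive repeated frame predictions and then drops blanks; WER is the word-level edit distance between reference and hypothesis, divided by the number of reference words.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Greedy CTC decoding: merge repeated frame labels, then drop blanks.

    blank_id=0 is an assumption; real tokenizers define their own blank id.
    """
    collapsed = [token for token, _ in groupby(frame_ids)]
    return "".join(id_to_char[t] for t in collapsed if t != blank_id)


def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical toy vocabulary; id 0 is reserved for the CTC blank.
toy_vocab = {1: "a", 2: "n", 3: "e"}
# The blank between the two "n" frames keeps the double letter in "anne".
decoded = ctc_greedy_decode([1, 1, 2, 0, 2, 3, 3], toy_vocab)
wer = word_error_rate("bu bir test", "bu test")
```

In this toy run the frame sequence decodes to the Turkish word "anne", and the hypothesis "bu test" misses one of three reference words, so the WER is one third. In practice the frame ids would come from the argmax over the fine-tuned model's per-frame outputs.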

Keywords:

Citation Owners
Information: There are no citations to this publication.
Turkish Online Journal of Qualitative Inquiry

Field: Educational Sciences

Journal Type: International

Metrics
Articles: 4,283
Citations: 1,091
2023 Impact : 0.002