Monitor Corpus Trendi and Automatic Text Categorization

2023

Journal:

Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave

Abstract:

The paper presents the compilation of the Trendi corpus, the first monitor corpus of Slovene. The current version (Trendi 2023-02) contains texts published between January 2019 and October 2023, with a total of over 700 million tokens (more than 586 million words). The purpose of the corpus is to provide linguists and non-linguists with data on current language use and to enable the monitoring of new words as well as the increase and decline in the use of existing words. In the paper, we present the contents of the corpus and the methods and criteria used in its compilation. The second part of the paper focuses on the development of a tool for categorizing text topics in news articles. The tool was developed specifically for the Trendi corpus but can be used for other corpora containing similar texts. A set of 13 thematic categories was developed for the tool. The set generally follows international standards and categories used in comparable corpora for other languages. Using texts annotated with these categories, we trained multiple language models and achieved a high classification accuracy when categorizing text topics.
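As a rough illustration of what monitoring "the increase and decline in the use of existing words" can involve, the sketch below computes a word's frequency per million tokens for each time period and compares the first and last periods. It is not the method used for Trendi: the function names, the whitespace tokenization, and the toy English sentences are all assumptions made for this example.

```python
from collections import Counter


def relative_frequencies(docs, word):
    """Return {period: occurrences of `word` per million tokens}.

    `docs` is an iterable of (period, text) pairs such as ("2019-01", "...").
    Tokenization is a naive lowercase whitespace split (an assumption made
    for illustration; a real corpus pipeline would use a proper tokenizer).
    """
    hits = Counter()
    totals = Counter()
    target = word.lower()
    for period, text in docs:
        tokens = text.lower().split()
        totals[period] += len(tokens)
        hits[period] += tokens.count(target)
    return {
        period: hits[period] / totals[period] * 1_000_000
        for period in sorted(totals)
        if totals[period] > 0
    }


def trend(freqs):
    """Compare the earliest and latest period in `freqs`."""
    periods = sorted(freqs)
    first, last = freqs[periods[0]], freqs[periods[-1]]
    if last > first:
        return "increase"
    if last < first:
        return "decline"
    return "stable"


# Hypothetical toy data, not taken from the Trendi corpus.
sample_docs = [
    ("2019-01", "the virus is rare"),
    ("2020-03", "virus virus everywhere"),
    ("2020-03", "the virus spreads"),
]

freqs = relative_frequencies(sample_docs, "virus")
print(freqs, trend(freqs))
```

Normalizing by the number of tokens in each period matters because a monitor corpus grows unevenly over time; raw counts alone would mostly reflect how much text was collected in a given month.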

Similar Articles

1. Universal Dependencies for Slovenian: An Upgrade to the Guidelines, Annotated Data and Parsing Model (2023). Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave

2. The words that make fake stories go viral: A corpus-based approach to analyzing Russian Covid-19 disinformation (2023). Vestnik Rossijskogo Universiteta Družby Narodov: Seriâ Lingvistika

3. Corpus Approaches to Metaphor and Metonymy Identification: The Case of Metonymy in g-KOMET (2023). Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave

4. ¿CÓMO DISEÑAR UN CORPUS DE CALIDAD? PARÁMETROS DE EVALUACIÓN (2010). Sendebar

5. Corpus of Academic Learner English (CALE): A new corpus at the intersection of corpus linguistics and English for academic purposes (2020). The Literacy Trek

6. Using Hadith Corpus in Learning Arabic as a Second Language (2021). Turkish Journal of Computer and Mathematics Education

Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave

Field: Educational Sciences; Social Sciences, Humanities, and Administrative Sciences

Journal type: International

Metrics

Articles: 161

Citations: 5

DOI: 10.4312/slo2.0.2023.1.161-188
