THE PROCESS OF COLLECTING A DATABASE, ANNOTATING SENTENCES, AND TOKENIZING IN THE CREATION OF A DEPENDENCY PARSING OF THE UZBEK LANGUAGE
DOI: https://doi.org/10.56292/SJFSU/vol31_iss3/a120
Keywords: Annotation, steps, tokenization, lemmatization, text selection, documentation, guideline, result, process
Abstract
This article provides an in-depth overview of the stages involved in building a dependency treebank for the Uzbek language. It outlines five simplified but essential stages that are commonly followed in international practice when constructing a hierarchically annotated corpus (treebank) for any language: (1) text selection; (2) pre-processing, including the choice of tools and resources; (3) annotation; (4) documentation of language-specific guidelines and the treatment of non-universal linguistic features; and (5) transliteration.
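The tokenization and annotation stages can be illustrated with a minimal Python sketch. It assumes the treebank is stored in CoNLL-U, the tab-separated format used by Universal Dependencies treebanks (the abstract does not name a file format). It also shows one Uzbek-specific tokenization concern: apostrophe-like marks in oʻ, gʻ and the glottal-stop sign belong inside the word and must not be split off. The helper names (`tokenize`, `is_well_formed_tree`, `to_conllu`), the regular expression, and the sample annotation are hypothetical, chosen for illustration only; they are not taken from the authors' treebank or tools.

```python
import re
from dataclasses import dataclass

# Apostrophe-like marks that occur inside Uzbek Latin-script words: ASCII "'",
# U+02BB (oʻ, gʻ), U+02BC (glottal-stop apostrophe) and U+2019, which is often
# typed instead. (Assumption: the treebank keeps all of them word-internal.)
APOSTROPHES = "'\u02bb\u02bc\u2019"

# A word is a run of word characters, optionally joined by apostrophe-like marks;
# every other non-space character becomes a token of its own.
TOKEN_RE = re.compile(rf"\w+(?:[{APOSTROPHES}]\w+)*|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split a raw sentence into surface tokens (a simplified, rule-based sketch)."""
    return TOKEN_RE.findall(text)


@dataclass
class Token:
    form: str
    lemma: str
    upos: str    # Universal POS tag
    head: int    # 1-based index of the head token; 0 marks the root
    deprel: str  # Universal Dependencies relation label


def is_well_formed_tree(tokens: list[Token]) -> bool:
    """Check that the heads form a single tree rooted at one 'root' token."""
    n = len(tokens)
    roots = [i for i, t in enumerate(tokens, start=1) if t.head == 0]
    if len(roots) != 1 or tokens[roots[0] - 1].deprel != "root":
        return False
    if any(not 0 <= t.head <= n or t.head == i for i, t in enumerate(tokens, start=1)):
        return False
    # Following heads from any token must reach 0 without revisiting a node.
    for start in range(1, n + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = tokens[node - 1].head
    return True


def to_conllu(sent_id: str, text: str, tokens: list[Token]) -> str:
    """Serialize one sentence as a CoNLL-U block (10 tab-separated columns)."""
    lines = [f"# sent_id = {sent_id}", f"# text = {text}"]
    for i, t in enumerate(tokens, start=1):
        # ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC ("_" = unspecified)
        cols = [str(i), t.form, t.lemma, t.upos, "_", "_", str(t.head), t.deprel, "_", "_"]
        lines.append("\t".join(cols))
    return "\n".join(lines) + "\n\n"  # a blank line ends each sentence


# Illustrative sentence "Men kitob o'qiyman." ("I read a book") with a
# hypothetical annotation; lemmas follow an assumed bare-stem convention.
text = "Men kitob o'qiyman."
forms = tokenize(text)  # ['Men', 'kitob', "o'qiyman", '.']
annotation = [
    ("men", "PRON", 3, "nsubj"),
    ("kitob", "NOUN", 3, "obj"),
    ("o'qi", "VERB", 0, "root"),
    (".", "PUNCT", 3, "punct"),
]
assert len(forms) == len(annotation)
tokens = [Token(form, *ann) for form, ann in zip(forms, annotation)]
assert is_well_formed_tree(tokens)
print(to_conllu("uz-demo-1", text, tokens), end="")
```

The regex tokenizer is a deliberate simplification to keep the sketch self-contained; a production pipeline would more likely rely on a trained tokenizer and run a tree check like `is_well_formed_tree` over every annotated sentence before release.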
References
Bruno Guillaume. 2021. Graph Matching and Graph Rewriting: GREW tools for corpus exploration, maintenance and conversion. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pages 168–175, Online. Association for Computational Linguistics.
Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D. Manning. 2020. Stanza: A Python Natural Language Processing Toolkit for Many Human Languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Online. Association for Computational Linguistics. https://stanfordnlp.github.io/stanza/
U. Salaev. 2024. UzMorphAnalyser: A morphological analysis model for the Uzbek language using inflectional endings. AIP Conference Proceedings, volume 3244, number 1. AIP Publishing.
Stefanie Dipper, Cora Haiber, Anna Maria Schröter, Alexandra Wiemann, and Maike Brinkschulte. 2024. Universal Dependencies: Extensions for Modern and Historical German. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 17101–17111, Torino, Italia. ELRA and ICCL.
License
Copyright (c) 2025 Scientific journal of the Fergana State University

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.