Formatting Texts for the National Corpus of the Uzbek Language
dc.creator | Karshiev, A.B. | |
dc.creator | Karimov, S.A. | |
dc.creator | Tursunov, M.S. | |
dc.date | 2022-09-10 | |
dc.date.accessioned | 2023-08-20T07:09:55Z | |
dc.date.available | 2023-08-20T07:09:55Z | |
dc.description | This article discusses a general approach to describing and encoding the methods used to include texts in the National Corpus of the Uzbek Language. The need for a common format follows from the diversity and incompatibility of existing text formats. Storing corpus texts in JSON format makes it possible to increase corpus search speed and to overcome theoretical and technical scalability problems. The article also describes how the texts of the Alpomish epic were included in the corpus. | en-US |
dc.description | Ushbu maqolada o‘zbek tili milliy korpusiga matnlarni kiritishda foydalanilgan usullarni tavsiflash va kodlashga umumiy yondashuv muhokama qilinadi. Umumiy format mavjud matn formatlarining xilma-xilligi va nomuvofiqligi bilan asoslanishi mumkin. Korpusda matnlarni saqlash uchun JSON formatdan foydalanish orqali korpus qidiruv tezligini oshirish va kengayuvchanlikdagi nazariy va texnik muammolarni bartaraf etish mumkin. Korpusga Alpomish dostonining matnlari kiritilishi tavsiflangan. | ru-RU |
dc.format | application/pdf | |
dc.identifier | https://ijdt.uz/index.php/ijdt/article/view/31 | |
dc.identifier.uri | http://dspace.umsida.ac.id/handle/123456789/6798 | |
dc.language | rus | |
dc.publisher | Samarkand branch of TUIT | en-US |
dc.relation | https://ijdt.uz/index.php/ijdt/article/view/31/18 | |
dc.rights | Copyright (c) 2022 Tursunov M.S. | en-US |
dc.source | INTERNATIONAL JOURNAL OF THEORETICAL AND APPLIED ISSUES OF DIGITAL TECHNOLOGIES; Vol. 1 No. 1 (2022): International Journal of Theoretical and Applied Issues of Digital Technologies; 58-63 | en-US |
dc.source | Международный Журнал Теоретических и Прикладных Вопросов Цифровых Технологий; Том 1 № 1 (2022): Международный журнал теоретических и прикладных вопросов цифровых технологий; 58-63 | ru-RU |
dc.source | 2181-3094 | |
dc.source | 2181-3086 | |
dc.subject | Corpus | en-US |
dc.subject | formatting | en-US |
dc.subject | file | en-US |
dc.subject | text | en-US |
dc.subject | Alpomish epic | en-US |
dc.subject | token | en-US |
dc.subject | markup | en-US |
dc.subject | tag | en-US |
dc.subject | tagger | en-US |
dc.subject | JSON format | en-US |
dc.subject | DOCX format | en-US |
dc.subject | korpus | ru-RU |
dc.subject | formatlash | ru-RU |
dc.subject | fayl | ru-RU |
dc.subject | matn | ru-RU |
dc.subject | Alpomish dostoni | ru-RU |
dc.subject | token | ru-RU |
dc.subject | razmetka | ru-RU |
dc.subject | teg | ru-RU |
dc.subject | tegger | ru-RU |
dc.subject | JSON format | ru-RU |
dc.subject | DOCX format | ru-RU |
dc.title | Formatting Texts for the National Corpus of the Uzbek Language | en-US |
dc.title | O‘zbek tili milliy korpusi uchun matnlarni formatlash | ru-RU |
dc.type | info:eu-repo/semantics/article | |
dc.type | info:eu-repo/semantics/publishedVersion | |
dc.type | Peer-reviewed Article | en-US |