Formatting Texts for the National Corpus of the Uzbek Language

dc.creator: Karshiev, A.B.
dc.creator: Karimov, S.A.
dc.creator: Tursunov, M.S.
dc.date: 2022-09-10
dc.date.accessioned: 2023-08-20T07:09:55Z
dc.date.available: 2023-08-20T07:09:55Z
dc.description (en-US): This article discusses the general approach to the description and coding of the methods used in the inclusion of texts in the national corpus of the Uzbek language. A common format can be justified by the diversity and incompatibility of existing text formats. By using the JSON format to store texts in the corpus, it is possible to increase corpus search speed and overcome theoretical and technical problems of scalability. The inclusion of the texts of the Alpomish epic into the corpus is described.
dc.description (ru-RU): Ushbu maqolada o‘zbek tili milliy korpusiga matnlarni kiritishda foydalanilgan usullarni tavsiflash va kodlashga umumiy yondashuv muhokama qilinadi. Umumiy format mavjud matn formatlarining xilma-xilligi va nomuvofiqligi bilan asoslanishi mumkin. Korpusda matnlarni saqlash uchun JSON formatdan foydalanish orqali korpus qidiruv tezligini oshirish va kengayuvchanlikdagi nazariy va texnik muammolarni bartaraf etish mumkin. Korpusga Alpomish dostoning matnlari kiritilishi tavsiflangan.
dc.format: application/pdf
dc.identifier: https://ijdt.uz/index.php/ijdt/article/view/31
dc.identifier.uri: http://dspace.umsida.ac.id/handle/123456789/6798
dc.language: rus
dc.publisher (en-US): Samarkand branch of TUIT
dc.relation: https://ijdt.uz/index.php/ijdt/article/view/31/18
dc.rights (en-US): Copyright (c) 2022 Tursunov M.S.
dc.source (en-US): INTERNATIONAL JOURNAL OF THEORETICAL AND APPLIED ISSUES OF DIGITAL TECHNOLOGIES; Vol. 1 No. 1 (2022): International Journal of Theoretical and Applied Issues of Digital Technologies; 58-63
dc.source (ru-RU): Международный Журнал Теоретических и Прикладных Вопросов Цифровых Технологий; Том 1 № 1 (2022): Международный журнал теоретических и прикладных вопросов цифровых технологий; 58-63
dc.source: 2181-3094
dc.source: 2181-3086
dc.subject (en-US): Corpus
dc.subject (en-US): formatting
dc.subject (en-US): file
dc.subject (en-US): text
dc.subject (en-US): Alpomish epic
dc.subject (en-US): token
dc.subject (en-US): markup
dc.subject (en-US): tag
dc.subject (en-US): tagger
dc.subject (en-US): JSON format
dc.subject (en-US): DOCX format
dc.subject (ru-RU): korpus
dc.subject (ru-RU): formatlash
dc.subject (ru-RU): fayl
dc.subject (ru-RU): matn
dc.subject (ru-RU): Alpomish dostoni
dc.subject (ru-RU): token
dc.subject (ru-RU): razmetka
dc.subject (ru-RU): teg
dc.subject (ru-RU): tegger
dc.subject (ru-RU): JSON format
dc.subject (ru-RU): DOCX format
dc.title (en-US): Formatting Texts for the National Corpus of the Uzbek Language
dc.title (ru-RU): O‘zbek tili milliy korpusi uchun matnlarni formatlash
dc.type: info:eu-repo/semantics/article
dc.type: info:eu-repo/semantics/publishedVersion
dc.type (en-US): Peer-reviewed Article
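
The abstract mentions storing corpus texts in JSON with tokens and tags, but this record does not give the actual schema. The following is only a minimal sketch of what such a record might look like. Every field name (`id`, `title`, `source_format`, `tokens`, `form`, `tag`), the tokenizer, and the sample sentence are illustrative assumptions, not the authors' format.

```python
import json
import re

# Hypothetical tokenizer (an assumption, not the corpus's real one):
# a word is a run of letters that may contain the apostrophe-like
# characters used in Uzbek Latin script; digits and punctuation are
# emitted as separate tokens.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:[’‘ʻʼ'][^\W\d_]+)*|\d+|[^\w\s]")


def build_record(text_id, title, source_format, text):
    """Wrap a plain text in a JSON-serializable record with one entry per token."""
    tokens = [
        {"i": i, "form": m.group(0), "tag": None}  # tag left empty for a later tagger
        for i, m in enumerate(TOKEN_RE.finditer(text))
    ]
    return {
        "id": text_id,
        "title": title,
        "source_format": source_format,  # e.g. the format the text was converted from
        "text": text,
        "tokens": tokens,
    }


def find_token(record, form):
    """Return the positions of tokens equal to `form`, ignoring case."""
    wanted = form.casefold()
    return [t["i"] for t in record["tokens"] if t["form"].casefold() == wanted]


# Illustrative sample sentence, not taken from the corpus.
record = build_record("alpomish-0001", "Alpomish", "docx",
                      "Alpomish o‘zbek xalq dostoni.")

# ensure_ascii=False keeps non-ASCII characters readable in the stored file.
serialized = json.dumps(record, ensure_ascii=False, indent=2)
restored = json.loads(serialized)
```

Keeping the token list next to the raw text means a search can scan the pre-split tokens rather than re-tokenizing each document on every query. Whether that matches the speed gains the abstract describes cannot be determined from this record.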