I tried to annotate some Korean text with a model loaded via udpipe_load_model(), using the code below. The code ran without errors, but the output was wrong: instead of Korean tokens and lemmas, it contained strings like "<U+653C><U+3E37><U+623C><U+3E30>". I also tried language = "korean-kaist", and the result was the same. When I annotate English text with the same code and language = "english", the output looks fine, as expected. What should I do?
Thanks for your kind and prompt help!
The code I used was as follows:
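```r
library(udpipe)
# Download the Korean GSD model (UD 2.3) into the working directory, then load it
ud_model <- udpipe_download_model(language = "korean-gsd", model_dir = getwd(),
                                  udpipe_model_repo = "jwijffels/udpipe.models.ud.2.3", overwrite = TRUE)
ud_model <- udpipe_load_model(ud_model$file_model)
# Annotate the text column, using the subject column as document ids
s_tst <- udpipe_annotate(ud_model, x = prsn_txt$txt, doc_id = prsn_txt$subj)
```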
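One thing that may be worth checking, offered as a sketch rather than a confirmed fix: the udpipe documentation says the `x` argument of `udpipe_annotate()` should be UTF-8 encoded, and `<U+...>` sequences in the output can indicate that the input text was not UTF-8 (for example, text read on Windows under a non-UTF-8 locale). The helper `to_utf8()` below is hypothetical and not part of udpipe. It relies on base R's `enc2utf8()`, which honours any declared encoding and treats unmarked strings as being in the native encoding, and it stops if any element is still not valid UTF-8 after conversion.

```r
# Hypothetical helper (not part of udpipe): coerce a character vector to UTF-8
# before passing it to udpipe_annotate(), whose `x` argument should be UTF-8.
to_utf8 <- function(x) {
  # enc2utf8() honours declared encodings and treats "unknown" as the native encoding
  x <- enc2utf8(as.character(x))
  # Flag non-missing elements whose bytes are still not valid UTF-8
  bad <- !is.na(x) & !validUTF8(x)
  if (any(bad)) {
    stop("Elements not valid UTF-8 after conversion: ",
         paste(which(bad), collapse = ", "),
         ". Re-read the source file with the correct encoding.")
  }
  x
}

# Usage with the objects from the question (assumes ud_model and prsn_txt exist):
# s_tst <- udpipe_annotate(ud_model, x = to_utf8(prsn_txt$txt), doc_id = prsn_txt$subj)
# s_tst <- as.data.frame(s_tst)
```

If the text was already garbled when it was read from disk, converting it afterwards will not recover it; in that case the file would need to be read again with its actual encoding declared.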