OLD CATALAN MORPHOSYNTAX: DEVELOPING AN ANNOTATED CORPUS

This paper presents a complete procedure for developing a Part-of-Speech (POS) tagged corpus of Old Catalan. As an extremely low-resource language with rich inflection and frequent homographs, Old Catalan poses non-trivial problems for the development of a searchable constituency-based treebank. We demonstrate, however, that a semi-supervised method of incrementally building training data using both neural and memory-based taggers, together with the Pyrrha annotation tool, is highly efficient and yields accurate results. We propose that this simple and effective method could easily be extended to other low-resource historical languages for which no NLP tools yet exist.
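
To make the idea of incrementally building training data more concrete, the following is a minimal sketch of such a loop. It is not the authors' pipeline: a simple most-frequent-tag lookup stands in for the neural and memory-based taggers, and the `correct` callback is a hypothetical placeholder for the manual correction step that would be carried out in an annotation tool such as Pyrrha. The function names, the tag labels, and the toy sentences are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Build a lookup table mapping each token to its most frequent tag.

    This baseline is only a stand-in for a real tagger.
    """
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for token, tag_label in sentence:
            counts[token.lower()][tag_label] += 1
    return {token: tags.most_common(1)[0][0] for token, tags in counts.items()}


def tag(model, tokens, default="UNK"):
    """Tag tokens with the model; unseen tokens receive the default tag."""
    return [(token, model.get(token.lower(), default)) for token in tokens]


def bootstrap(seed, batches, correct):
    """Grow the training data one batch at a time.

    Each round: pre-tag a batch with the current model, pass the proposal
    to `correct` (hypothetical: the human correction step), add the
    corrected sentence to the training data, and retrain.
    """
    training = list(seed)
    model = train(training)
    for tokens in batches:
        proposed = tag(model, tokens)
        training.append(correct(proposed))
        model = train(training)
    return model, training


# Toy data, invented for illustration only.
seed = [[("lo", "DET"), ("rey", "NOUN")]]
batches = [["la", "ciutat"], ["lo", "rey", "pres", "la", "ciutat"]]

# Simulated annotator: looks up the intended tag for every token.
gold = {"lo": "DET", "rey": "NOUN", "la": "DET", "ciutat": "NOUN", "pres": "VERB"}


def correct(proposed):
    return [(token, gold[token.lower()]) for token, _ in proposed]


model, training = bootstrap(seed, batches, correct)
print(tag(model, ["la", "ciutat"]))
```

In this sketch the model is retrained after every corrected batch, so each later batch is pre-tagged by a model that has seen more data and should need less manual correction. A lookup table cannot disambiguate homographs by context, which is one reason a real setup would rely on context-aware taggers instead.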
