| dc.contributor.author | Sarveswaran, Kengatharaiyer | |
| dc.contributor.author | Mahesan, Sinnathamby | |
| dc.date.accessioned | 2016-10-25T07:03:08Z | |
| dc.date.available | 2016-10-25T07:03:08Z | |
| dc.date.issued | 2016-10-25T07:03:08Z | |
| dc.identifier.citation | Sarveswaran, K., & Mahesan, S. (2014). Hierarchical Tag-set for Rule-based Processing of Tamil Language. International Journal of Multidisciplinary Studies (IJMS), 1(2), 67-74. | |
| dc.identifier.uri | http://dr.lib.sjp.ac.lk/handle/123456789/3307 | |
| dc.description.abstract | Corpora are fundamental tools for Natural Language Processing. Part-of-speech tagging adds meaning to a corpus by annotating its words. A tag-set used to annotate a corpus should be chosen so that it represents the grammatical structure of the respective language. Such tag-sets can be flat or hierarchical in structure. Several efforts have been made to identify a tag-set for the Tamil language. However, existing tag-sets have many shortcomings, including the inability to tag all words, the inability to capture required syntactic information such as divisibility, too many tags in a set, a flat tag structure, and a lack of extensibility. The scholarly works Tolkāppiyam and Naṉṉūl clearly show the grammatical classification of words. This paper proposes a new hierarchical tag-set with 10 labels for the Tamil language, with a view to developing a morphological analyser, by considering the existing limitations and using Tamil grammar. The morphological analyser can be used to extend the proposed tag-set easily with more grammatical information. | en_US |
| dc.language.iso | en | en_US |
| dc.subject | POS tagging | en_US |
| dc.subject | Tag-set | en_US |
| dc.subject | Morphological analyser | en_US |
| dc.subject | Tamil grammar | en_US |
| dc.title | Hierarchical Tag-set for Rule-based Processing of Tamil Language | en_US |
| dc.type | Article | en_US |
| dc.date.published | 2014 |