DSpace Repository

Finding Efficient Linguistic Feature Set for Authorship Verification


dc.contributor.author Ranatunga, R.V.S.P.K.
dc.date.accessioned 2022-04-06T10:11:10Z
dc.date.available 2022-04-06T10:11:10Z
dc.date.issued 2013
dc.identifier.citation Ranatunga, R.V.S.P.K. (2013). Finding Efficient Linguistic Feature Set for Authorship Verification. Journal of Computer Science, Vol. 1, No. 1 (2013), 35-43. en_US
dc.identifier.uri http://dr.lib.sjp.ac.lk/handle/123456789/11014
dc.description.abstract Authorship verification relies on examining a given document to verify whether or not it was written by a particular author. At its core, it analyzes the document itself with respect to variations in the author's writing style and identifies the author's own idiolect. Detection performance depends mainly on the feature set used for clustering the document. Linguistic and stylistic features have been used to identify authors according to their writing style. Revealing shallow changes in an author's writing style is the major problem that must be addressed in the domain of authorship verification. This problem motivates computer science researchers to study authorship verification in the field of computer forensics, and it is also the focus of this research. The contributions of this research are twofold: the first is a new feature extraction method based on Natural Language Processing (NLP), and the second is a novel and more efficient linguistic feature set for verifying the author of a given document. Experiments were carried out on a corpus composed of freely downloadable, genuine 19th-century English texts. Each word segment obtained from the corpus is subjected to feature extraction, and 49 stylistic features are used for clustering the text. In addition to the standard stylistic features, 19 linguistic features are used as a new feature set for the experiments. Parse trees generated by the Stanford Parser are used to extract these linguistic features. Self-organizing maps are used as the classifier to cluster the documents. Proper word segmentation is also introduced in this work, which helps demonstrate that the proposed strategy can produce promising results.
Finally, it is found that the proposed strategy produces more accurate classification with the extracted linguistic feature set. en_US
dc.language.iso en en_US
dc.publisher Department of Computer Science, Faculty of Applied Sciences, University of Sri Jayewardenepura en_US
dc.subject Authorship Verification, Style Markers, Natural Language Processing, Self Organizing Maps en_US
dc.title Finding Efficient Linguistic Feature Set for Authorship Verification en_US
dc.type Article en_US
dc.identifier.doi https://doi.org/10.31357/jcs.v1i1.1616 en_US

