Finding Efficient Linguistic Feature Set for Authorship Verification

dc.contributor.author	Ranatunga, R.V.S.P.K.
dc.date.accessioned	2022-04-06T10:11:10Z
dc.date.available	2022-04-06T10:11:10Z
dc.date.issued	2013
dc.identifier.citation	Ranatunga, R.V.S.P.K. (2013). Finding Efficient Linguistic Feature Set for Authorship Verification. Journal of Computer Science, Vol. 1, No. 1 (2013), 35-43	en_US
dc.identifier.uri	http://dr.lib.sjp.ac.lk/handle/123456789/11014
dc.description.abstract	Authorship verification relies on examining a given document to determine whether it was written by a particular author. Its core task is to analyze the document itself for variations in the author's writing style and to identify the author's own idiolect. Detection performance depends mainly on the feature set used to cluster the documents. Linguistic and stylistic features have been used to identify authors by their writing style. Revealing shallow changes in an author's writing style is the major problem to be addressed in the domain of authorship verification. This problem motivates computer science research on authorship verification in the field of computer forensics, and it is also the focus of this research. The contributions of the proposed research are twofold: the first is a new feature extraction method based on Natural Language Processing (NLP), and the second is a novel and more efficient linguistic feature set for verifying the author of a given document. Experiments were carried out on a corpus of freely downloadable, genuine 19th-century English text. Each word segment obtained from the corpus was subjected to feature extraction, and 49 stylistic features were used to cluster the text. In addition to these standard stylistic features, 19 linguistic features were used as a new feature set in the experiments. Parse trees generated by the Stanford Parser were used to extract these linguistic features. Self-organizing maps were used as the classifier to cluster the documents. Proper word segmentation is also introduced in this work, which helps demonstrate that the proposed strategy can produce promising results.
Finally, the proposed strategy with the extracted linguistic feature set is found to produce more accurate classification.	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Computer Science, Faculty of Applied Sciences, University of Sri Jayewardenepura	en_US
dc.subject	Authorship Verification, Style Markers, Natural Language Processing, Self Organizing Maps	en_US
dc.title	Finding Efficient Linguistic Feature Set for Authorship Verification	en_US
dc.type	Article	en_US
dc.identifier.doi	https://doi.org/10.31357/jcs.v1i1.1616	en_US