Please use this identifier to cite or link to this item: http://hdl.handle.net/11328/4289
Title: Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly
Authors: García-Méndez, Silvia
Leal, Fátima
Malheiro, Benedita
Burguillo-Rial, Juan Carlos
Veloso, Bruno
Chis, Adriana E.
González-Vélez, Horacio
Keywords: Classification
Data reliability
Stream processing
Synthetic data
Data fabrication
Wiki contributors
Issue Date: Nov-2022
Publisher: Elsevier
Citation: García-Méndez, S., Leal, F., Malheiro, B., Burguillo-Rial, J. C., Veloso, B., Chis, A. E., & González-Vélez, H. (2022). Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly. Simulation Modelling Practice and Theory, 120, 102616, 1-13. https://doi.org/10.1016/j.simpat.2022.102616. Repositório Institucional UPT. http://hdl.handle.net/11328/4289
Abstract: Data crowdsourcing is a data acquisition process where groups of voluntary contributors feed platforms with highly relevant data ranging from news, comments, and media to knowledge and classifications. It typically processes user-generated data streams to provide and refine popular services such as wikis, collaborative maps, e-commerce sites, and social networks. Nevertheless, this modus operandi raises severe concerns regarding ill-intentioned data manipulation in adversarial environments. This paper presents a simulation, modelling, and classification approach to automatically identify human and non-human (bots) as well as benign and malign contributors by using data fabrication to balance classes within experimental data sets, data stream modelling to build and update contributor profiles and, finally, autonomic data stream classification. By employing WikiVoyage – a free worldwide wiki travel guide open to contribution from the general public – as a testbed, our approach proves to significantly boost the confidence and quality of the classifier by using a class-balanced data stream, comprising both real and synthetic data. Our empirical results show that the proposed method distinguishes between benign and malign bots as well as human contributors with a classification accuracy of up to 92 %.
URI: http://hdl.handle.net/11328/4289
ISSN: 1569-190X (Print)
Appears in Collections: REMIT – Artigos em Revistas Internacionais / Papers in International Journals

Files in This Item:
File               Size        Format
SIMPAT 2022.pdf    1.02 MB     Adobe PDF
Imagem1.png        219.08 kB   image/png


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.