Please use this identifier to cite or link to this item: http://hdl.handle.net/11328/4379
Title: ZeroBERTo: Leveraging Zero-Shot Text Classification by topic modeling
Authors: Alcoforado, Alexandre
Ferraz, Thomas Palmeira
Gerber, Rodrigo
Bustos, Enzo
Oliveira, André Seidel
Veloso, Bruno
Siqueira, Fabio Levy
Costa, Anna Helena Reali
Keywords: Artificial intelligence
Machine learning
Natural language processing
Learning paradigms
Supervised learning
Supervised learning by classification
Issue Date: Mar-2022
Publisher: ACM
Citation: Alcoforado, A., Ferraz, T. P., Gerber, R., Bustos, E., Oliveira, A. S., Veloso, B., Siqueira, F. L., & Costa, A. H. R. (2022). ZeroBERTo: Leveraging Zero-Shot Text Classification by topic modeling. In V. Pinheiro, & P. Gamallo (Eds.), [Proceedings of] Computational Processing of the Portuguese Language: 15th International Conference, PROPOR 2022, Fortaleza, Brazil, March 21-23 2022, (pp. 125-136). ACM. https://doi.org/10.1007/978-3-030-98305-5_12. Repositório Institucional UPT. http://hdl.handle.net/11328/4379
Abstract: Traditional text classification approaches often require a good amount of labeled data, which is difficult to obtain, especially in restricted domains or less widespread languages. This lack of labeled data has led to the rise of low-resource methods, which assume low data availability in natural language processing. Among them, zero-shot learning stands out: it consists of learning a classifier without any previously labeled data. The best results reported with this approach use language models such as Transformers, but they suffer from two problems: high execution time and an inability to handle long texts as input. This paper proposes a new model, ZeroBERTo, which leverages an unsupervised clustering step to obtain a compressed data representation before the classification task. We show that ZeroBERTo has better performance for long inputs and shorter execution time, outperforming XLM-R by about 12% in F1 score on the FolhaUOL dataset.
URI: http://hdl.handle.net/11328/4379
ISBN: 978-3-030-98304-8
Appears in Collections: REMIT - Publicações em Livros de Atas Internacionais / Papers in International Proceedings

Files in This Item:
File: 2201.01337.pdf
Size: 369.43 kB
Format: Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.