for Tourism Recommendation

Crowdsourced data streams are continuous flows of data generated at a high rate by users, also known as the crowd. These data streams are popular and extremely valuable in several domains. This is the case of tourism, where crowdsourcing platforms rely on tourist and business inputs to provide tailored recommendations to future tourists in real time. The continuous, open and non-curated nature of crowd-originated data requires robust data stream mining techniques for on-line profiling, recommendation and evaluation. The sought techniques need not only to continuously improve profiles and learn models, but also to be transparent, overcome biases, prioritise preferences, and master huge data volumes, all in real time. This article surveys the state of the art in this field and identifies future research opportunities.


Introduction
Tourism crowdsourcing platforms have revolutionised both tourist behaviour and the tourism industry. Platforms such as AirBnB, Booking or TripAdvisor are popular online intermediaries between tourism businesses and tourists and, as a result, continuously accumulate large amounts of data shared by tourists about their tourism experiences. They adopt a business model where stakeholders play predefined roles: (i) businesses pay to have their services on display; and (ii) tourists search for services of interest at no cost and provide feedback on their customer experience for free. According to Leal et al. (2018) [9], depending on the main type of data shared by the crowd, crowdsourcing tourism services can be classified as evaluation-based, map-based, wiki-based, and social network-based.
While the processing of crowdsourced data can be performed off-line, using data mining, or on-line, using data stream mining, this review article addresses exclusively the challenge of the on-line processing of tourism crowdsourced data. Specifically, the application of data stream mining techniques to crowd inputs is more demanding due to real-time and transparency requirements.
This paper surveys existing techniques and identifies the most promising research trends in tourism crowdsourced data stream recommendation. The adopted method analyses the tourism data stream mining pipeline to identify techniques and technologies for real-time predictions driven by the accountability, responsibility and transparency design principles. To this end, the review of the stream-based processing pipeline covers: (i) profiling, (ii) recommendation, (iii) explanation, (iv) evaluation and (v) support technologies, such as blockchain or High Performance Computing (HPC). Figure 1 illustrates this approach. Tourism data stream mining is event-driven and implements, in real time, a profiling, recommendation and evaluation loop. In this context, the continuous arrival of crowd-originated events triggers, first, the update of the involved profile and prediction models and, then, the suggestion and evaluation of personalised self-explainable recommendations. The remainder of this document details the data stream recommendation status quo, challenges and support technologies (Sect. 2); identifies future research trends (Sect. 3); and draws the conclusion (Sect. 4).

Data Stream Mining
Data stream mining explores methods and algorithms for extracting knowledge from data streams, which are data sequences occurring continuously and independently. By applying learning algorithms to crowdsourced data streams, i.e., performing tourism crowdsourced data stream mining, it is possible to predict tourist behaviour based on the associated digital footprint. However, due to the intrinsically dynamic nature of these heterogeneous data streams, they require on-the-fly techniques to perform automatic model learning and updating as well as concept drift identification and recovery, and to cope with preference changes over time, uncurated crowdsourced data and extremely large volumes of data. In this context, automatic model learning refers to the selection of a suitable predictive model (or combination of models), whereas concept drift describes unforeseeable changes in the underlying distribution of streaming data over time, which need to be addressed to prevent poor learning results [18].
Given the natural evolution of user interests over time, data stream recommendation needs to reflect current rather than outdated interests and, evaluation-wise, requires specialised evaluation protocols [6] and metrics. Finally, crowdsourced data are potentially unreliable and accumulate in huge volumes, dictating the adoption of technologies that monitor traceability and authenticity and creating the need for parallel processing [21].

Profiling
Entity profiling, i.e., the creation and maintenance of entity models, is central to generating personalised tourism recommendations. Using crowdsourced tourism data, it is possible to model the stakeholders according to the corresponding digital footprint stored in tourism crowdsourcing platforms. Resource (item) profiling can be based on intrinsic characteristics, crowdsourced information and semantic enrichment. Tourist (user) profiles are mainly based on crowdsourced data and can be classified as entity-based or feature-based. While entity-based profiles are directly associated with tourism resources, feature-based profiles rely on intrinsic characteristics, e.g., category, location or theme. Based on the contents of crowdsourced data, the literature identifies further types of profiles.
Rating-based profiles rely on ratings to express, quantitatively, opinions concerning multiple service aspects. In evaluation-based crowdsourcing platforms, users can classify tourism resources using multiple service dimensions. The works in [34,36] have used hotel and restaurant evaluations to create stream-based profiles adopting incremental updating. Review-based profiles are created from textual reviews. These reviews generally include qualitative comments and descriptions. In this context, a collection of reviews, rather than being perceived as static, constitutes an ongoing stream [29], leading to opinion stream mining. Leal et al. (2019) [14] use this approach to model the quality of publishers and pages, using wiki streams of publisher-page-review triplets.
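The incremental updating of rating-based profiles mentioned above can be illustrated with a minimal sketch. It is not the method of the cited works: it simply assumes that each crowdsourced event carries a tourist identifier, a resource identifier and a dictionary of ratings per service dimension, and it keeps a running mean per dimension. The names RatingProfile, process_event and the event fields are hypothetical.

```python
from collections import defaultdict


class RatingProfile:
    """Running mean rating per service dimension (hypothetical profile model)."""

    def __init__(self):
        self.count = defaultdict(int)
        self.mean = defaultdict(float)

    def update(self, ratings):
        """Fold one multi-criteria rating event into the profile."""
        for dimension, value in ratings.items():
            self.count[dimension] += 1
            # Incremental mean: past events do not need to be stored.
            self.mean[dimension] += (value - self.mean[dimension]) / self.count[dimension]


# One profile per entity, keyed by (kind, identifier).
profiles = defaultdict(RatingProfile)


def process_event(event):
    """Update the tourist and the resource profile touched by one event."""
    profiles[("user", event["user"])].update(event["ratings"])
    profiles[("item", event["item"])].update(event["ratings"])
```

Because each update touches only the two profiles involved in the event and runs in time proportional to the number of rated dimensions, the cost per event does not grow with the length of the stream.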

Context-based
Popularity profiles use views, clicks and related data to model the popularity of tourists and tourism resources. These profiles are frequently used to avoid the cold start problem in collaborative filtering. Leal et al. [14] rely on a page view data stream to model wiki publishers and pages in terms of popularity. Leal et al. (2020) [13] recommended chaining trust and reputation models for reliability and explainability purposes. Hybrid profiles combine multiple types, leading to richer and more refined profiles and, in principle, to higher quality recommendations. Hybrid profiles are suited to heterogeneous data environments, which have been explored using ensembles [25]. However, building hybrid profiles from crowdsourced tourism data streams remains unexplored.
Regardless of the contents or the type of profiling used, crowdsourced data streams allow the continuous updating of tourism stakeholder profiles.

Recommendation
Recommendation engines play an important role in the tourism domain, providing personalised recommendations to tourists faced with a large variety of options. They rely mostly on data filtering techniques, comprising pre-recommendation, recommendation and post-recommendation filters. Standard recommendation filtering techniques include content-based, collaborative and hybrid filters.
Content-based filters match tourists with tourism resources. They create tourist profiles based on past interactions with the system, and make recommendations based on the similarity between the content of the tourist and resource profiles, i.e., regardless of other tourist profiles [17].
Collaborative filters recommend unknown resources to tourists based on other like-minded tourists, using memory-based or model-based algorithms, and building profiles based on the crowdsourced data. While memory-based approaches combine the preferences of neighbours with similar profiles to generate recommendations, model-based algorithms build models based on the tourist profile to make predictions. Collaborative filters may implement tourist-based or resource-based variants by computing the similarity between tourists or between resources. These techniques have been adapted with success to data stream recommendation [10,12,23,34,36].
Hybrid filters combine content-based and collaborative counterparts to eliminate frailties and reinforce qualities and, thus, improve the quality of recommendations. Hybrid filters, aggregating multiple mechanisms in parallel, have been explored by session-based recommendation systems [8,28,30].
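To make the adaptation of model-based collaborative filtering to data streams more concrete, the following is a minimal sketch of an incremental matrix factorisation updated with one stochastic gradient step per incoming rating. It is an illustration under stated assumptions, not the algorithm of any cited work: the class name IncrementalMF, the number of latent factors, the learning rate and the regularisation weight are all hypothetical choices.

```python
import random


class IncrementalMF:
    """Matrix factorisation updated one rating event at a time (illustrative)."""

    def __init__(self, factors=8, lr=0.05, reg=0.01, seed=0):
        self.factors, self.lr, self.reg = factors, lr, reg
        self.rng = random.Random(seed)
        self.user_vecs, self.item_vecs = {}, {}

    def _vec(self, table, key):
        # Unknown tourists or resources get a small random latent vector.
        if key not in table:
            table[key] = [self.rng.gauss(0.0, 0.1) for _ in range(self.factors)]
        return table[key]

    def predict(self, user, item):
        p = self._vec(self.user_vecs, user)
        q = self._vec(self.item_vecs, item)
        return sum(a * b for a, b in zip(p, q))

    def update(self, user, item, rating):
        """Apply one gradient step for the new event; return the prior error."""
        p = self._vec(self.user_vecs, user)
        q = self._vec(self.item_vecs, item)
        err = rating - sum(a * b for a, b in zip(p, q))
        for f in range(self.factors):
            pf, qf = p[f], q[f]
            p[f] += self.lr * (err * qf - self.reg * pf)
            q[f] += self.lr * (err * pf - self.reg * qf)
        return err
```

Each event is seen once and only the two latent vectors it involves are modified, which is what allows the model to follow the stream without retraining from scratch.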
A priori and a posteriori filters aim to refine the recommendations by reducing the search space. Pre-recommendation and post-recommendation filters have been explored mainly using context-based profiles.
Pre-recommendation filters are applied beforehand to select appropriate tourist data [40], e.g., weekday recommendations, business or leisure travel. They increase recommendation relevance by analysing context-aware data. Post-recommendation filters remove or reorder the recommendations generated by the recommendation filter. In the tourism domain, the value for money, the sentiment value and the pairwise trust have been used, among others, as post-recommendation filters. Value for money compares the price, the crowd overall rating and the official star rating of the resource to establish the crowdsourced value for money. The sentiment value of textual reviews is computed using sentiment analysis [34,36]. Finally, the pairwise trust and similarity have been used to reorder the generated predictions [12].
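As an illustration of how a post-recommendation filter removes and reorders predictions, the sketch below assumes that each candidate resource comes with a predicted score and with a weight in [0, 1], for instance a trust value. The function post_filter, the multiplicative weighting and the threshold are hypothetical; the cited works define their own trust, sentiment and value-for-money measures.

```python
def post_filter(predictions, weight, min_weight=0.2, top_n=3):
    """Drop low-weight candidates, then reorder by weighted score.

    predictions: list of (item, predicted_score) pairs from the recommender.
    weight: dict mapping item -> weight in [0, 1] (e.g. a trust value).
    """
    kept = [
        (item, score * weight.get(item, 0.0))
        for item, score in predictions
        if weight.get(item, 0.0) >= min_weight
    ]
    # Highest weighted score first.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in kept[:top_n]]
```

For example, with predictions [("h1", 4.5), ("h2", 4.0), ("h3", 5.0)] and weights {"h1": 0.9, "h2": 1.0, "h3": 0.1}, the candidate h3 is removed despite its high predicted score, and h1 is ranked ahead of h2.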
Data stream recommendation enables the continuous updating of the user and item models and contributes to improving the quality of real-time recommendations. While data stream tourism recommendation has been able to adapt standard recommendation techniques, mainly collaborative filters, to real-time processing, it still needs to address concept drifts and model learning.
Concept drifts in collaborative filters can be detected by focusing on the recency, temporal dynamics or time period partitioning. In stream-based environments, concept drifts can be identified using window-based monitoring, accuracy-based model monitoring, and ensemble-based methods. Alternatively, an incremental adaptive unsupervised learning algorithm for recommendation systems that uses k-means clustering to detect drifts has been explored [38]. In the case of stream-based tourism recommendation, concept drifts have been explored by monitoring accuracy metrics [2,33].
Model learning is mandatory for data stream recommendation. In the tourism domain, Nilashi et al. ... However, only the works reported in [2,22] address the tourism domain.
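The window-based, accuracy-driven monitoring described above can be sketched as follows. This is a deliberately simple illustration, not one of the cited detectors: it assumes that the absolute prediction error of each event is available, compares the mean error of the most recent window with that of the window preceding it, and flags a drift when the difference exceeds a tolerance. WindowDriftMonitor, the window size and the tolerance are hypothetical.

```python
from collections import deque


class WindowDriftMonitor:
    """Flag a drift when recent errors exceed older errors by a tolerance."""

    def __init__(self, window=50, tolerance=0.5):
        self.reference = deque(maxlen=window)  # older errors
        self.recent = deque(maxlen=window)     # most recent errors
        self.tolerance = tolerance

    def add(self, abs_error):
        """Record one absolute error; return True if a drift is suspected."""
        # When the recent window is full, its oldest error moves to the reference.
        if len(self.recent) == self.recent.maxlen:
            self.reference.append(self.recent[0])
        self.recent.append(abs_error)
        if len(self.reference) < self.reference.maxlen:
            return False  # not enough history yet
        ref_mean = sum(self.reference) / len(self.reference)
        rec_mean = sum(self.recent) / len(self.recent)
        return rec_mean - ref_mean > self.tolerance
```

When a drift is flagged, a stream-based recommender would typically react by relearning or replacing the affected model; that recovery step is outside the scope of this sketch.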

Explanations
Given the highly influential nature of recommendations, there are growing concerns about the principles behind recommendation algorithms. In this regard, Dignum (2017) [5] recommends that the development of such algorithms should be guided by the following design principles: accountability - explain and justify decisions; responsibility - incorporate human values into technical requirements; and transparency - describe the decision-making process and how data is used, collected, and governed. This means that data stream recommendation must explain and justify the rationale behind all recommendations, increasing the confidence of the users and the transparency of the system.
An explanation is any additional information which clarifies why a system arrived at a particular decision. Specifically, in the case of recommendations, explanations justify why an item has been recommended, adding transparency and supporting decision making. An explainable and transparent system helps users understand whether the output is based on their preferences rather than on third party interests. Explanation models can use multiple sources of information, ranging from entity-based, feature-based, text-based and visual-based to social-based. In this regard, Veloso et al. (2019) [35] suggest exploring trust and reputation profiles to explain recommendations in tourism crowdsourcing platforms while, at the same time, storing these profiles in a blockchain to ensure authenticity and integrity. This proposal was implemented by Leal et al. (2020) [16]. They incrementally update and store trust models of the crowd contributors in the blockchain as smart contracts and, then, use them to derive reputation models and generate stream-based explainable recommendations.

Evaluation
Stream-based evaluation has two main components: the evaluation protocol and the evaluation metrics. An online evaluation protocol has three main constraints: (i) space, since the available memory is limited; (ii) learning time, since the time required to learn must keep pace with the rate of incoming events; and (iii) accuracy, i.e., the capacity of the model to capture the data variations. The most used online evaluation protocol is the prequential protocol [6], which adopts sliding windows or fading factors to forget less relevant examples. It has three steps: (i) produce a prediction for an unlabelled instance in the stream; (ii) assess the prediction error; and (iii) update the model with the most recently observed error.
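The three prequential steps, combined with a fading factor, can be sketched as below. The sketch assumes a model object exposing predict and learn methods and uses the absolute error as loss; RunningMean is a hypothetical stand-in for a real recommender, and the value of the fading factor alpha is an arbitrary choice.

```python
class RunningMean:
    """Toy model: predicts the mean of the targets seen so far."""

    def __init__(self):
        self.n, self.mean = 0, 0.0

    def predict(self, features):
        return self.mean

    def learn(self, features, target):
        self.n += 1
        self.mean += (target - self.mean) / self.n


def prequential_error(stream, model, alpha=0.99):
    """Yield the faded mean absolute error after each event of the stream."""
    faded_loss, faded_count = 0.0, 0.0
    for features, target in stream:
        prediction = model.predict(features)   # (i) predict before seeing the label
        loss = abs(target - prediction)        # (ii) assess the prediction error
        model.learn(features, target)          # (iii) update the model afterwards
        # Fading factor: older losses are progressively forgotten.
        faded_loss = loss + alpha * faded_loss
        faded_count = 1.0 + alpha * faded_count
        yield faded_loss / faded_count
```

Every event is used first for testing and then for training, so no separate hold-out set is needed, and a value of alpha below 1 makes the reported error reflect recent behaviour rather than the whole history.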
In terms of evaluation metrics, there are predictive, classification, and statistical metrics. Predictive metrics describe accuracy through the accumulation of prediction errors [31]. In terms of classification metrics, Cremonesi et al. (2010) [4] present a three-step methodology: (i) generate the predictions of all items not yet classified by the active user; (ii) randomly select 1000 of these predictions plus the real value of the active user; and (iii) sort this list of 1001 item values using the post-filter. Finally, concerning statistical metrics, Souza et al. (2018) [27] have recently suggested a new evaluation measure (Kappa-Latency), which takes into account the arrival delay of actual instances. Alternatively, Vinagre et al. (2019) [37] propose, for recommendation algorithms, the adoption of the k-fold validation framework together with McNemar and Wilcoxon signed-rank statistical tests applied to adaptive-size sliding windows.
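The three-step ranking methodology above is commonly turned into a hit or recall measure: the item actually rated by the active user counts as a hit when it appears among the top N of the 1001 sorted candidates. The sketch below follows that reading under stated assumptions: score is any hypothetical function returning the (post-filtered) predicted value of an item, and the cut-off N, the sample size and the random seed are illustrative parameters.

```python
import random


def hit_at_n(score, target_item, unrated_items, n=10, sample_size=1000, rng=None):
    """Return True if the target item ranks in the top n among sampled candidates."""
    rng = rng or random.Random(0)
    # (ii) sample up to 1000 items the active user has not classified yet.
    candidates = rng.sample(unrated_items, min(sample_size, len(unrated_items)))
    # (iii) sort the sampled items plus the target by predicted value.
    ranked = sorted(candidates + [target_item], key=score, reverse=True)
    return target_item in ranked[:n]


def recall_at_n(cases, score, n=10, seed=0):
    """Fraction of (target_item, unrated_items) cases that are hits."""
    rng = random.Random(seed)
    hits = sum(hit_at_n(score, target, unrated, n, rng=rng) for target, unrated in cases)
    return hits / len(cases)
```

In a stream setting, the hit indicator can be computed for each incoming event before the model is updated and then aggregated over a sliding window.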

Support Technologies
The real-time processing requirements of stream-based recommendation and the uncurated nature of crowdsourced data pose infrastructural challenges. This review highlights two key technologies to address them: blockchain and HPC.
Blockchain is a distributed ledger technology maintained by a peer-to-peer network of nodes where blocks, containing validated transactions, are sequentially chained through cryptographic hashes. The network validates new transactions concurrently, using consensus mechanisms. Once validated, they are committed to a block, granting security, authenticity, immutability, and transparency. Moreover, blockchain ensures end-to-end verification, which can be used to record data and track sources over time in a trusted manner. Blockchain has been explored, in stream-based environments, for auditing purposes [26] and to store tourism smart contracts and transact cryptocurrencies [3,24]. High Performance Computing and, in particular, cloud computing infrastructures underpin the algorithmic analysis of large amounts of data, becoming a de facto pillar of scalable data analytics [32]. In the tourism domain, Veloso et al. (2018) [34] explore the scalability of crowdsourced data stream recommendation using HPC.
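The chaining of blocks through cryptographic hashes, and the tamper evidence it provides, can be illustrated with a toy, single-process sketch. It deliberately omits the peer-to-peer network, consensus and smart contracts, and it does not reproduce any platform used in the cited works; the block layout and the function names are hypothetical.

```python
import hashlib
import json


def block_hash(index, previous_hash, transactions):
    """SHA-256 digest over the block contents, including the previous hash."""
    payload = json.dumps(
        {"index": index, "previous_hash": previous_hash, "transactions": transactions},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions):
    """Append a block that references the hash of the current last block."""
    previous_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "previous_hash": previous_hash,
        "transactions": transactions,
        "hash": block_hash(index, previous_hash, transactions),
    })


def is_valid(chain):
    """Check every link and every stored hash; any edit breaks the chain."""
    for i, block in enumerate(chain):
        expected_previous = chain[i - 1]["hash"] if i else "0" * 64
        if block["previous_hash"] != expected_previous:
            return False
        if block["hash"] != block_hash(block["index"], block["previous_hash"], block["transactions"]):
            return False
    return True
```

Since each hash covers the previous hash, altering a stored crowdsourced rating changes the digest of its block and invalidates the chain from that point on, which is the property exploited for data authenticity and traceability.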

Research Trends
The most relevant research trends in crowdsourced data stream recommendation for tourism encompass reliable profiling, automated model learning (including the detection of concept drifts), preference evolution and processing transparency, as well as the identification of support technologies that meet the data authenticity and traceability (blockchain) and seamless scalability (HPC) requirements.
Reliable profiling - Crowdsourced data streams are unfiltered and uncurated by default, meaning that they are exposed to malicious manipulation. This suggests the need to build reliable models of data contributors, and to trace data contributions back to the contributors themselves. To this end, trust and reputation profiling approaches have been explored [16,35].
Concept drift detection - On-the-fly concept drift detection relies on constant monitoring of relevant metrics and on model learning [2,22,33].
Model learning - The processing of tourism crowdsourced data streams demands dynamic model learning to continue generating meaningful recommendations over time [2,22,33].
Preference evolution - Stream-based tourism recommendation requires techniques that ignore outdated user preferences [19,39].
Transparency - Personalised recommendations require the explanation of the underlying reasoning and data, particularly when they are based on crowdsourced data. In this context, trust and reputation models of contributors have been explored to explain recommendations [16,35].
Blockchain - Crowdsourced data is prone to manipulation. Blockchain provides data quality control for data authenticity and traceability [16,35].
HPC - Smart tourism produces crowdsourced data at a high rate and volume, demanding agile mechanisms to profile and filter information in real time and, consequently, efficient computational infrastructures [34].

Conclusion
Research on tourism crowdsourced data stream recommendation presents multiple algorithmic and technological challenges. On the one hand, the algorithmic design needs to further address concept drift identification, crowd reliability, distributed processing, model learning, preference evolution and transparency. As shown, these research directions are beginning to be explored, but there is still a long way to go. On the other hand, the data reliability, pace and volume and the near real-time operation impose extremely demanding requirements on supporting technologies. Nevertheless, blockchain and HPC appear as two promising pillars. The adoption of blockchain grants data traceability and authenticity and, when integrated with trust and reputation modelling, provides algorithmic transparency, whereas HPC contributes a computational infrastructure solution for the real-time performance requirements.

Fig. 1. Review proposal

Context-based profiles use context information, which can be personal context data, social context data, and context-aware information data [15]. Gomes et al. (2010) [7] propose a context-aware system with data stream learning to improve existing drift detection methods by exploiting available context information. Similarly, Akbar et al. (2015) [1] explore context-aware stream processing to detect traffic in near real time.
Quality profiles model tourism entities using quality-related parameters. They have been used mainly to model tourism wiki pages and the corresponding publishers. Wiki publishers originate continuous data streams in the form of content revisions. However, scant research has been conducted to construct quality-based profiles employing wiki-based information as data streams. Leal et al.
Trust and Reputation profiles model reliability. Trust defines the reliability of stakeholders based on direct one-to-one relationships. Reputation is based on third party experiences, i.e., many-to-one relationships. Leal et al. (2018) [10] propose trust and reputation modelling for stream-based hotel recommendation, and Leal et al. (2019) [12] employ incremental trust and reputation models for post-filtering, improving the accuracy of recommendations in both cases. Recently, Leal et al. (2020) [13] recommended chaining