ON THE ROAD WITH THE ERASMUS IP WISDOM PROJECT: IMPROVING AN ON-LINE BUSINESS BY APPLYING WEB MINING TECHNIQUES

The Web can be regarded as the largest database available and presents a challenging task for efficient design and access. Web mining aims at the discovery and analysis of useful information from the Web through the use of data mining techniques. In this paper we report our experiences in applying web mining techniques to the e-business platform of a retail company. The company-related Web content data and its usage are analysed with data mining tools. The underlying idea is to address web marketing purposes, suggesting improvements for both the on-line business and the website. The work reported here was developed in the context of an Erasmus Intensive Programme (IP) project.


INTRODUCTION
The rapid growth of the Web in the past two decades has made it the largest publicly accessible data source in the world [1], [2]. Web mining aims to discover potentially useful information and patterns from the Web [3], [4]. With the ever-increasing demand for Web-enabled management of knowledge, today's organizations have to address the multiple facets of process, standards, technology, data mining, and warehousing management. This requires ICT approaches that provide an integrated interchange of quality metadata, enabling organizations to use the Web as a vehicle for obtaining results that are both content-rich and practical for decision-making situations.
Web mining can be considered a multidisciplinary topic. In order to gain valuable insights, make more informed business decisions and gain competitive advantage, e-business organizations are offered a set of techniques and tools for extracting and analysing web data ([3], [4], [5], [6], [7]). Transforming the data into useful business information requires knowledge of both business and ICT. We argue that Web mining can provide much support for decision-making in e-business organizations and can improve the quality of service provided by these organizations.
In this paper we use an e-business platform for an on-line business and analyse the web content data and its usage with data mining tools. The underlying idea is to address web marketing purposes, suggesting improvements for the on-line business, by applying web mining techniques. The data mining techniques applied to the data provide valuable insights and allow the discovery of interesting patterns; the knowledge acquired may be used by the business to make better decisions. Furthermore, the e-business platform is analysed from the "usage perspective", through the application of web search models and web analytics tools. This also makes it possible to learn how users navigate the e-business platform and to identify ways of improving it.
The work reported herein was developed in the context of an Erasmus Intensive Programme (IP) Project, the IP WISDOM (Web Information System Data Organization Modeling) project, in which several European higher education partner institutions were involved [8], [9]. The long-run objective of the IP WISDOM is to build an international curriculum to which the partners can subscribe. Each edition of the IP project was, therefore, part of a long-term project to develop a joint European Curriculum for Web Mining studies.
The paper is structured as follows: Section 2 provides an overview of the web mining concept and identifies the motivation and challenges involved in the process. Section 3 describes the practical case on which the project was based and sets the objectives to fulfil during the project, identified as a set of business questions to address. In Section 4 we present some findings of the study that address the business questions previously identified. Section 5 includes our suggestions for improving the e-business and the website, based on the study performed. Section 6 concludes with considerations on the project achievements.

WEB MINING OVERVIEW
Web mining (or Web data mining) is the process of discovering intrinsic relationships (that is, interesting and useful information) from Web data, which are expressed in the form of textual, linkage or usage information. The term "Web mining" was first used by [10] and further adopted by many authors focusing on Web data mining ([12], [13], [14], [15], [16], [17]). A key issue underlying the definitions provided is that Web mining is the application of data mining techniques to discover usage patterns from large Web repositories, and that it is a continually evolving area of technology and business practice. It reveals interesting and previously unknown knowledge about both users and websites, which can be used for analysis. It may be used to understand customer behaviour, evaluate the effectiveness of a particular website and help to quantify the success of a marketing campaign.
Web mining is a demanding and challenging task due to the Web's unique characteristics, discussed below. Furthermore, if used to its full extent, it can be applied in three different branches so as to mine web content, structure and usage.

The Web's Unique Characteristics
There is a general consensus that the Web has many unique characteristics, which make mining useful information and knowledge both a necessary and a challenging task. These characteristics are summarized in [13].

Types of Web Mining
Based on the primary kinds of data used in the mining process, Web mining tasks can be categorized into three types [2], [4]: Web structure mining, Web content mining and Web usage mining.
• Web structure mining: Web structure mining discovers useful knowledge from hyperlinks (or links for short), which represent the structure of the Web. For example, from the links, we can discover important Web pages, which is a key technology used in search engines. We can also discover communities of users who share common interests. Traditional data mining does not perform such tasks because there is usually no link structure in a relational table.
• Web content mining: Web content mining extracts or mines useful information or knowledge from Web page contents. For example, we can automatically classify and cluster Web pages according to their topics. These tasks are similar to those in traditional data mining. However, we can also discover patterns in Web pages to extract useful data such as descriptions of products, postings of forums, etc., for many purposes. Furthermore, we can mine customer reviews and forum postings to discover consumer opinions. These are not traditional data mining tasks.
• Web usage mining: Web usage mining refers to the discovery of user access patterns from Web usage logs, which record every click made by each user. Usage data captures the identity or origin of web users along with their browsing behavior at a website.

THE CASE: THE E-BUSINESS COMPANY
Marques Soares is a Portuguese retail company that is mainly focused on the clothing business. It offers a wide variety of products and brands that include ready-to-wear garments for men, women, youth and children, as well as other products usually found in department stores such as perfumes, electronic appliances, optics and home décor. Marques Soares reaches its customers through its stores, its product catalogue magazine and its online store, which can be found at www.marquessoares.pt.
The company has its roots in Porto, where the first store opened in November 1960. At the time this study was performed, the company had 10 department stores in 7 different cities of Portugal. The company had about 70 000 loyal customers coming from three different sales channels: online sales, physical stores and mail order using the product catalogue magazine. The online store was launched in September 2009 and, in March 2012, sales on this channel represented only 2% of global company sales (versus 76% for the stores and 22% for catalogue mail order sales). A major goal of this project was therefore to help the company identify a strategy to improve sales and marketing for the e-commerce channel.

Case Data
Several data sources were used for the data analysis. These included the website sales data as an Excel file, access to the company's Google Analytics account, a Web server log file, the company profiles on social networks such as Facebook and Twitter, and the videos on the company's YouTube account.
Interviews with the representatives of the IT and Sales/Marketing departments of the company were also useful to gather additional information about the company business strategy and goals as well as to help to identify the business questions that should be addressed with this study.
The Excel file included the online sales data for the period from September 2009 to March 2012. The data sheet was composed of 12761 rows, representing 5278 sales and 1821 different customers. Each row of the data sheet represented sales transaction data and included details on the customer (number, gender, date of birth, location, profession, admission date, type), the product purchased (code, store department, product description, unit price, colour, size, brand, etc.) and the transaction (payment method, order number, order date, etc.). The Google Analytics data included information about the website visitors and the demographics of the website's users. The web server log provided details such as the IP address of the user/customer, the web browser used, the page reference and the access date/time.
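As an illustration of how the web server log fields listed above (IP address, browser, page reference, access date/time) can be extracted for analysis, the following minimal sketch parses a single log line. It assumes the widely used Apache/NCSA Combined Log Format; the paper does not state which format the company's server produced, and the sample line, the function name `parse_log_line` and the path are invented for the example.

```python
import re
from datetime import datetime

# Assumption: Apache/NCSA Combined Log Format. The actual format of the
# company's log file is not given in the paper.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict with the fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the timestamp and status code to native Python types.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record


# Invented sample line (documentation IP range, made-up product path).
sample = (
    '203.0.113.7 - - [12/Mar/2012:10:15:32 +0000] '
    '"GET /produto/123 HTTP/1.1" 200 5120 '
    '"http://www.google.pt/" "Mozilla/5.0"'
)
record = parse_log_line(sample)
print(record["ip"], record["path"], record["time"].isoformat(), record["agent"])
```

Records parsed in this way can then be grouped by IP address, page or time window to study access patterns.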

Business Questions
From a preliminary analysis of the data provided and the interviews with the company representatives, the following questions were identified to be addressed by the study:
1. What is the typical customer profile of the website?
2. What are the major store departments/product categories in terms of on-line sales?
3. What brands or products do customers prefer?
4. Are there any typical combinations of products sold together?
5. What is the geographic location of the visitors of the website?
6. What is the correlation between website visitors by region and website sales?
7. How do the website visitors reach the website?
8. How can the website be improved so that it can be more effective, attract more visitors/customers and increase online sales?
9. What can be done to improve the company's use of social media?

PROCESSING AND ANALYSIS OF THE CASE DATA
In the scope of this project, data mining was performed using MS Excel and SPSS and mostly consisted of attempts to find useful information and to discover relationships and patterns in the data analysed and extracted from web page contents. QlikView was used to present some dashboards. Google Analytics was used to mine the website's traffic, so as to track user activity patterns from usage logs and user interactions with the website. In the remainder of this section, we report the major findings of the study and how we addressed the business questions previously identified.

Typical customer profile, major product categories and brands preferred
The majority of the website customers were women (85.3%, versus 14.7% men). Customers were mainly from the 35-39 age group, as shown in Fig. 1. The major product categories in terms of sales revenue were Women, Youngsters and Shoes (Fig. 2), whereas the major product categories in terms of number of sales transactions were Youngsters (23%), Women (19%) and Shoes (14%) (Fig. 3). Both figures show that, in general, clothing sales were quite good, while other categories such as accessories, bricolage (DIY) and optics represented only residual sales. It could be wise for the company to focus its online offer on clothing. The graph in Fig. 4 shows an overview of the most popular brands sold through the website during the period analysed; "Salsa" is the best-selling brand. This information can also be displayed by year and by month.

Combinations of products sold
We performed some market basket analysis experiments to find products frequently bought together, which could be the basis for a shopping recommendation system. A subset of the results obtained for the product "Bermuda" is shown in Fig. 5.
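To make the idea of market basket analysis concrete, the following minimal sketch computes support and confidence for pairs of products bought in the same order. It is not the procedure used in the study (whose tool settings are not detailed here); the function name `pair_rules`, the threshold and the orders are invented for illustration, and only the product name "Bermuda" comes from the text.

```python
from collections import Counter
from itertools import combinations


def pair_rules(orders, min_support=0.3):
    """Return (antecedent, consequent, support, confidence) tuples for product pairs.

    support    = share of orders containing both products
    confidence = share of orders with the antecedent that also contain the consequent
    """
    n_orders = len(orders)
    item_counts = Counter()
    pair_counts = Counter()
    for items in orders:
        unique = sorted(set(items))  # ignore repeated items within one order
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n_orders
        if support < min_support:
            continue
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Strongest rules first: by confidence, then support, then name.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Invented orders for illustration only.
orders = [
    ["Bermuda", "T-shirt"],
    ["Bermuda", "T-shirt", "Sandals"],
    ["Bermuda", "Sandals"],
    ["T-shirt"],
    ["Bermuda", "T-shirt"],
]
rules = pair_rules(orders)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

Rules with high confidence for a given antecedent are natural candidates for "customers who bought this also bought" suggestions.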

Correlation of website visits per region and website sales
We also analysed the correlation between the website visits by region and the website sales. These results are shown in Fig. 7. This analysis provides valuable insights into the regions that need to be targeted with improved marketing campaigns.
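One common way to quantify such a relationship is the Pearson correlation coefficient between visits and sales per region. The sketch below is an assumption about how this could be computed, not a description of the analysis behind Fig. 7; the region figures are invented and the helper name `pearson` is hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented figures per region: (website visits, online sales).
by_region = {
    "Porto": (5200, 310),
    "Lisboa": (4100, 150),
    "Braga": (1300, 90),
    "Faro": (600, 20),
}
visits = [v for v, _ in by_region.values()]
sales = [s for _, s in by_region.values()]
r = pearson(visits, sales)
print(f"correlation between visits and sales: {r:.2f}")

# Sales per visit highlights regions with many visits but few orders.
for region, (v, s) in by_region.items():
    print(f"{region}: {100 * s / v:.1f} sales per 100 visits")
```

A region with many visits but a low sales-per-visit ratio is a candidate for a targeted campaign.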

Website access sources
As shown in Fig. 8, most of the traffic (87%) came from searches on Google or by direct link access to the website.

Website Goal Conversion Rate in Purchases over the Internet
As shown in Fig. 9, the goal conversion rate for purchases was 3.52%, which means that only that share of the website visitors actually placed an order. Such a low rate can be caused by several issues. In any case, the company should consider defining a policy that offers some kind of benefit to online customers.
Figure 9. Goal conversion rate in purchases over the Internet.
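For reference, a goal conversion rate of this kind is the number of visits that complete the goal (here, placing an order) divided by the total number of visits, expressed as a percentage. The short sketch below shows the arithmetic; the visit and order counts are invented so that the result matches the reported 3.52%, since the underlying counts are not given in the paper.

```python
def goal_conversion_rate(conversions, visits):
    """Percentage of visits that completed the goal (e.g. placed an order)."""
    if visits <= 0:
        raise ValueError("visits must be positive")
    return 100.0 * conversions / visits


# Invented counts chosen only to reproduce the reported rate.
rate = goal_conversion_rate(conversions=352, visits=10_000)
print(f"goal conversion rate: {rate:.2f}%")
```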

SUGGESTIONS FOR IMPROVEMENT
From the data and web mining performed, it was clear the company website needed to be improved so that it could be more effective, attract more visitors/customers and, therefore, become a means to increase online sales. The following suggestions for improvement were identified, so that the company could more easily address these goals:

Search Engine Optimization (SEO) and Website advertisement
Quite often people do not know the exact URL of a website and use a search engine to find it. However, from the results obtained with Google Analytics shown in Fig. 10, it is clear that the most used keywords are all related to "Marques Soares"; this means that these users already know the store and are probably customers. The company needs to attract new visitors who have never heard about the store. The goal should be to rank higher in the search results when users simply look for clothing stores. The following hints should be considered:
• Use better keywords. For simple keywords such as "loja de roupas" (clothing store), "sapatos" (shoes) and "calçado" (footwear), Marques Soares' website does not show up on the first page in Google. Other keywords should be considered, such as "pronto-a-vestir" (ready-to-wear) and "grandes armazéns" (department store).
• Add metadata. The use of metadata is a way to make a website appealing to new customers. Metadata contains information about a web page for both search engines and visitors. It makes it possible to set the description shown by a search engine when the website appears in the search results. At the time of the study, Marques Soares did not have any metadata.
The store "Zara" was one of the closest competitors to Marques Soares in the local market. Hence, the results of Google searches for both stores were compared. As shown in Fig. 11, the description of Zara's webpage was clearer and more appealing to visitors than the corresponding description of Marques Soares (Fig. 12).
Zara's metadata: <meta name="description" content="Poderá comprar todas as novidades que cheguem à loja cada semana e também encontrará as fotografias do catálogo, do lookbook e da colecção." /> (in English: "You can buy all the new arrivals that reach the store each week and you will also find the photographs of the catalogue, the lookbook and the collection."; that is, new clothes, catalogue and lookbook, store).
Figure 11. Metadata and results obtained with a search in Google for the Zara store.
• Website advertisement. As displayed in Fig. 8, most of the hits on the website came from "Google", when people were looking for "marques soares". One possibility to further increase the traffic and the number of potential customers would be to optimize the website for other search engines. Furthermore, it is worth noting that there are no visitors coming from other websites. Therefore, the company should consider creating ads to be placed on other websites. These online advertisements should target popular sites that are visited by potential customers. Our advice was to focus first on getting the website higher up in the search results, so that the number of visitors and customers can increase more easily. Subsequently, advertisement should be used as well.
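Whether a page declares a description in its metadata can be checked automatically. The following minimal sketch uses Python's standard `html.parser` module to look for a `<meta name="description">` tag in an HTML document; the class and function names are hypothetical, and the two sample pages are invented, apart from the Zara description quoted above.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Collects the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")


def find_meta_description(html):
    """Return the page description, or None if the page does not declare one."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.description


# Invented minimal pages; the description text is the one quoted in the paper.
page_with = (
    '<html><head><title>Loja</title>'
    '<meta name="description" content="Poderá comprar todas as novidades que '
    'cheguem à loja cada semana e também encontrará as fotografias do catálogo, '
    'do lookbook e da colecção." /></head><body></body></html>'
)
page_without = "<html><head><title>Loja</title></head><body></body></html>"

print(find_meta_description(page_with))
print(find_meta_description(page_without))
```

The same approach could be extended to check for other tags, under the same assumptions.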

Product Recommendation system
The site had neither a product recommendation system nor a last/recently viewed items section. Such a system would be useful, not only to offer a personalized marketing and website experience to the customer but also to improve online sales. Our market basket analysis experiments showed some interesting results and highlighted the need to implement such a system.

Promote online sales and reward loyal and good customers
Offer special conditions for online sales, such as a discount rate set specifically for online sales and/or free shipping for orders above a given amount. Offer the customers regarded as "good customers" vouchers, gift cards or access to private sales; reward customers when they bring in a new customer.

Improve the interaction with customers through social media
The level of interaction with customers through the company's Facebook page was rather low. One option is to post other content on the Facebook page (e.g. sports events) that may attract the attention of some user groups. Another possibility is to launch contests on the Facebook page that award winners promotional coupons for use in the store. Facebook Open Graph meta tags (https://developers.facebook.com) should also be added, so that when people share a product on their Facebook wall, the proper photo and description are shown. The Twitter account, on the other hand, was merely used as a copy of the Facebook content. Twitter can easily be used to spread information such as deals, new stores and new products, and to ensure direct communication with customers. Twitter should also be used to target users who follow brands that the store sells, or brands of a similar level of interest, such as Levis, Adidas, Burberry, etc.
Once the number of followers of the company's Facebook and Twitter pages reaches a higher volume, a social media monitoring tool may be used to allow the company to analyse effectively what customers think about the company, its products and the service provided, and to further enrich its customer information database with other interesting data, such as the events and places that customers are interested in, the people they follow, their network of friends, etc. This kind of information should be taken into account when launching future marketing campaigns.

Make the website accessible for mobile devices
A mobile version of the site for tablets and smartphones would be useful, especially to attract younger customer age groups and for product marketing purposes, as it is easier to share information about clothing products through mobile devices.

Website load time optimization
Loading time is a major contributing factor to page abandonment: the slower the page response, the more visitors leave. To decrease the load time of the website, the following actions were advised:
• Reduce the number of JavaScript and CSS files used, so as to reduce the number of HTTP requests. At the time of the study, 5 CSS and 11 JavaScript files were used.
• Use a cache system (e.g. memcached). The most visited pages, such as the index, should be in memory, ready to be served.
• Use a Content Delivery Network (CDN) to serve static files; CSS, image and JavaScript files can be served using a CDN, such as Amazon S3. The jQuery version cached by Google is another possibility.
• Put the JavaScript files at the bottom of the page, so that the HTML content loads faster.
• Minimize image size: the images had a lot of white space resulting in a waste of bandwidth.
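The first item in the list above, the number of separate stylesheet and script files a page requests, can be measured with a short script. The sketch below counts external CSS and JavaScript references in an HTML document using Python's standard `html.parser` module; the class and function names are hypothetical and the sample page is invented.

```python
from html.parser import HTMLParser


class AssetCounter(HTMLParser):
    """Collects the external stylesheets and scripts referenced by a page."""

    def __init__(self):
        super().__init__()
        self.stylesheets = []
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script" and attributes.get("src"):
            # Inline scripts have no src and cost no extra HTTP request.
            self.scripts.append(attributes["src"])
        elif tag == "link" and attributes.get("href"):
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.stylesheets.append(attributes["href"])


def count_assets(html):
    """Return (number of external CSS files, number of external JS files)."""
    counter = AssetCounter()
    counter.feed(html)
    return len(counter.stylesheets), len(counter.scripts)


# Invented sample page for illustration only.
page = """
<html><head>
<link rel="stylesheet" href="/css/base.css">
<link rel="stylesheet" href="/css/shop.css">
<link rel="icon" href="/favicon.ico">
<script src="/js/jquery.js"></script>
<script>var inlineOnly = 1;</script>
</head><body>
<script src="/js/shop.js"></script>
<script src="/js/analytics.js"></script>
</body></html>
"""
css_files, js_files = count_assets(page)
print(f"{css_files} CSS files, {js_files} JavaScript files")
```

Running such a count before and after merging files gives a simple measure of how many HTTP requests were saved.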

CONCLUSIONS AND FURTHER WORK
Fundamental to the optimization process proposed in this paper was measurement: gathering data and information that could be transformed into tangible analysis and recommendations for improvement, by using Web mining tools and techniques. We mainly focused on a quantitative analysis of the behaviour of online visitors and customers, since the data provided did not allow a more qualitative view of online behaviour that would let us report on the overall user experience or on direct feedback given by visitors and customers. We could nevertheless infer some aspects of the customer experience at that level (e.g. a high bounce rate suggests that the website is not fully optimized).
Overall, we felt that we succeeded in defining a set of recommendations for improving the e-business and the website, based on the web mining performed. This opinion was corroborated by the project partners, the other project teams and the stakeholders (the company representatives), who were invited to participate in the final project session in order to evaluate the project outcomes. In spite of this, due to time constraints (the overall project covered 10 working days, including teaching sessions), we did not manage to further enrich the data analysis with off-line data on company sales and customers, nor to explore the web mining possibilities to their full extent; both would be valuable directions for further work. Finally, building on the experience and knowledge gained with the project, a curriculum for a Web Mining subject was set up within a postgraduate course in Business Intelligence offered at Portucalense University.