Chapter 8: Clustering | Applied Data Science in Tourism

by Matthias Fuchs & Wolfram Höpken

Mid Sweden University, Department of Economics, Geography, Law and Tourism
University of Applied Sciences Ravensburg-Weingarten

This chapter discusses the unsupervised machine learning technique known as clustering, along with its main approaches and use cases. After presenting typical application areas in the tourism industry, it explains the mathematical principle of clustering. It introduces various techniques for representing differences between cases or clusters and presents the major methods used to form clusters based on these differences (i.e., single linkage, complete linkage, average linkage, and centroid). Subsequently, the three most widely applied clustering approaches are described. First, major concepts of hierarchical clustering, such as divisive and agglomerative techniques, are highlighted. Second, the partitioning technique k-means is introduced, and third, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is discussed. Using real tourism data and the data science platform RapidMiner, the practical demonstration then explains step by step how clustering approaches can be executed. After typical processes for data transformation and normalization are applied, RapidMiner processes for k-means, hierarchical clustering, and DBSCAN are shown, and the clustering results are discussed. Lastly, the chapter concludes with a tourism case that applies k-means and DBSCAN to identify points of interest from uploaded photo data extracted from the platform Flickr.