Chapter 10: Classification | Applied Data Science in Tourism

by Ulrich Bodenhofer & Andreas Stöckl

University of Applied Sciences Hagenberg, School of Informatics, Communications and Media

Classification, the task of assigning objects to categories from a given set, is used in almost every field. One important branch of classification consists of methods that learn classification functions from example data. This chapter provides an overview of the most basic concepts and methods of this type of data-driven classification. We first highlight the basic ideas behind classification, along with some examples related to tourism. We then introduce measures of classification performance, which are needed to guide the data-driven training of classification functions and/or to evaluate classification results. As an essential part of this chapter, we provide self-contained, yet stripped-down, descriptions of the most important data-driven classification methods: nearest neighbor classifiers, logistic regression, Naïve Bayes, decision trees and ensemble variants thereof, support vector machines, and artificial neural networks. All of these concepts and methods are then applied to a specific use case in an accompanying Jupyter notebook, which demonstrates their practical implementation in Python with the machine learning framework scikit-learn.