
Two lines of code to generate or summarize text.

by Roman Egger

Natural Language Processing is developing rapidly: not only is the performance of the algorithms constantly increasing, but their use is also becoming much easier. AI communities like Huggingface make a variety of state-of-the-art models accessible and usable with just a few lines of code. For this reason, I took a look at two pipelines and tried them out. First, I will try to generate a longer text from a short input text; in the second case, the opposite should happen and a text should be summarized.

Huggingface offers the “text-generation” pipeline for generating text. For tasks like this, huge transformer-based language models are trained on millions of text documents. Huggingface offers access to well-known models such as GPT-2 from OpenAI or XLNet.

For the text generation task, I used the first paragraph of this post as input. It only takes three lines of code to try out this task. I used “max_length=300” to define the maximum length of the generated text.
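A minimal sketch of what these lines could look like (the shortened input string is illustrative; the pipeline downloads a default model such as GPT-2 on first use):

from transformers import pipeline

# input: the first paragraph of this post (shortened here for readability)
text = "Natural Language Processing is developing rapidly and ..."

generator = pipeline("text-generation")  # loads a default GPT-2 model
print(generator(text, max_length=300))   # max_length caps the generated sequence at 300 tokens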

The first part of the result (here in italics) repeats the input text. Then the model tries to continue with the automatically generated text.

[{'generated_text': "Natural Language Processing is developing rapidly and not only the performance of the algorithms is constantly increasing but also their use is becoming much easier. AI communities like Huggingface make a variety of state of the art models accessible and usable with just a few lines of code. For this reason, I took a look at two pipelines and tried them out. One is to try to generate a new text from a short existing text, in the second case text summarization is to be used.\n\nThe data set was divided into two main tasks. First, it needed to get the language in the system, on a local and remote server. Second, we wanted to get a translation of our text for the system.\n\nTo do it, I ran a bot which is very simple and easy to learn. On a local and remote, the system translates the text from local to remote but in addition, the system needs some resources to communicate with the local server. The language model is a state of the art. It is designed mostly with local language recognition in mind, making it far easier to understand a native language as it changes through the code.\n\nWhen we try to find that there are problems in the code then our interpreter will interpret them in a way that will reduce the speed of our language learning. For a few lines of code, our bot creates a bot.js file in the system, in a specific file, in the server's root project directory. Now, we could try to apply"}]

How much sense this text makes is left to the reader to judge. Of course, the quality of the output depends largely on the input.

For summarizing a text, the “summarization” pipeline is used.
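A minimal sketch of the corresponding call (using the same input text as above; the pipeline downloads a default summarization model on first use):

from transformers import pipeline

# input: the first paragraph of this post (shortened here for readability)
text = "Natural Language Processing is developing rapidly and ..."

summarizer = pipeline("summarization")  # loads a default summarization model
print(summarizer(text))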

[{'summary_text': ' AI communities like Huggingface make a variety of state of the art models accessible and usable with just a few lines of code . For this reason, I took a look at two pipelines and tried them out . One is to try to generate a new text from a short existing text, in the second case text summarization is to be used .'}]

Again, I used the first paragraph of this blog post as input, and as you can see this task can be solved quite satisfactorily.

What is Zero-Shot Classification?

by Roman Egger

As described in detail in chapter 10 (Classification), a classification task needs labeled data as training data. Let’s assume we have three different classes: “hotel”, “tourist”, and “travel agency”. This data can be text data or images, for example. But what if the learner finds, among the test data, examples of classes that were not present during training? We humans are usually very good at recognizing things and classifying them correctly, even if we have never seen them before. In zero-shot learning (ZSL), observed and unobserved classes are related through auxiliary information, and distinguishing features are identified. For images, a description of the appearance of objects can help distinguish one class from another, even if no examples of the unknown class were present in the training data. An example often used to explain this is the distinction between horses and zebras. A human would be able to distinguish a zebra from a horse even without ever having seen a zebra, provided the auxiliary information that zebras are black-and-white striped was known. Auxiliary information can be, for example, attributes and metadata, text descriptions, or vectors of word category labels.

ZSL occurs in two steps: first, knowledge about attributes is acquired during training; then, in the inference step, this knowledge is used to assign instances to new categories. Natural Language Inference (NLI) examines whether a hypothesis is, for a given premise, true (entailment), false (contradiction), or undetermined (neutral).


For example, if we want to predict whether the sentiment of the sentence “I love the city of Salzburg” is positive or negative, this can be tested as follows:

Premise: I love the city of Salzburg
Hypothesis 1: This example is positive.

Premise: I love the city of Salzburg
Hypothesis 2: This example is negative.

Now, for each class, a hypothesis is created from a template such as “This example is ….” in order to predict the correct class for the premise. If the inference result is entailment, the premise belongs to that class; in this case, positive.
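As an illustration, this premise/hypothesis check can be run directly with an NLI model; the following sketch assumes the facebook/bart-large-mnli checkpoint, whose output logits are ordered contradiction, neutral, entailment:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")

premise = "I love the city of Salzburg"
hypothesis = "This example is positive."

# score the premise/hypothesis pair; probabilities are ordered [contradiction, neutral, entailment]
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print(probs)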

Now let’s have a look at how we can easily implement zero-shot classification for texts using Huggingface in just two lines of code.

from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier("Data Science is altering the tourism industry", candidate_labels=["Business", "Politics", "Technology"])

We use the “zero-shot-classification” pipeline from Huggingface, define a text, and provide candidate labels. This NLI-based zero-shot classification pipeline is built on a ModelForSequenceClassification, and numerous models that have been fine-tuned on NLI tasks are available for it. In this case, the sentence “Data Science is altering the tourism industry” should be correctly classified into one of the classes “Business”, “Politics”, or “Technology”. The output shows us the probability scores for each category.

{'sequence': 'Data Science is altering the tourism industry',
 'labels': ['Technology', 'Business', 'Politics'],
 'scores': [0.6754791140556335, 0.2831524610519409, 0.04136836156249046]}

I can also recommend CLIP from OpenAI, a neural network that learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized.
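A minimal sketch of zero-shot image classification with CLIP via the Huggingface “zero-shot-image-classification” pipeline (this requires a recent transformers version; the image path and labels are illustrative):

from transformers import pipeline

clip_classifier = pipeline("zero-shot-image-classification", model="openai/clip-vit-base-patch32")
clip_classifier("example_image.jpg", candidate_labels=["hotel", "tourist", "travel agency"])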