Content
The content is split into two parts:
For instructors: the video script for both parts is available here.
Check your Learning
The following questions are meant to help learners reflect on the content of the videos. Answer at least one question; ideally, answer them as a team.
Question 1
You are given a table of measurements from a weather station. Each measurement includes values for temperature, precipitation, cloud structure, date, humidity, and a quality ID. The quality ID tells you whether the instrument was working properly. You would like to train an algorithm that predicts the quality ID (one of 5 integer values from 0 to 4) for any new incoming data. This falls into …
Supervised Learning
Unsupervised Learning
Reinforcement Learning
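Solution
This is a **supervised problem**. Every past measurement already comes with its quality ID, so the algorithm can learn from labelled examples. Because the target is one of five discrete IDs, it is a classification task.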
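To make the supervised setting concrete, here is a minimal sketch that uses a k-nearest-neighbours classifier, which is just one of many possible choices. It assumes that only two numeric columns (temperature and humidity) are used. The function `predict_quality_id` and the training rows are hypothetical and were made up for illustration; they are not real weather-station data.

```python
import math
from collections import Counter

# Hypothetical, made-up labelled records: (temperature in °C, humidity in %) -> quality ID.
# The real table also has precipitation, cloud structure and date, which would first
# need to be encoded as numbers before they could be used in a distance.
TRAIN = [
    ((21.0, 40.0), 0),
    ((22.5, 42.0), 0),
    ((-5.0, 99.0), 3),
    ((-4.0, 98.0), 3),
    ((15.0, 60.0), 1),
    ((16.0, 62.0), 1),
]


def predict_quality_id(sample, train=TRAIN, k=3):
    """Predict a quality ID by majority vote among the k nearest labelled records."""
    # Sort the labelled records by Euclidean distance to the new measurement.
    nearest = sorted(train, key=lambda record: math.dist(sample, record[0]))[:k]
    # The labels of the k closest records vote; the most frequent label wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(predict_quality_id((21.5, 41.0)))  # prints 0
```

What makes this supervised is that the labels in `TRAIN` drive every prediction. In practice the columns would also be rescaled first, because temperature and humidity use different units.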
Question 2
You are given a data set of iris flowers. It consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured for each sample: the length and the width of the sepals and petals, in centimeters. Which of the following feature combinations lend themselves to clustering? See this overview plot for help. (Multiple choices possible.)
Sepal.Length versus Sepal.Width
Sepal.Length versus Petal.Width
Petal.Length versus Petal.Width
Sepal.Width versus Petal.Width
Solution
The answer is not clear-cut and depends on the clustering algorithm you choose. By eye, we can make the following observations:
1. Sepal.Length versus Sepal.Width: the groups are not well separated, so this pair is not suited for clustering.
2. Sepal.Length versus Petal.Width: the overlap of the clusters is small, so clustering might work.
3. Petal.Length versus Petal.Width: the overlap is fairly small for 2 of the 3 clusters, so clustering might work, depending on the performance you are aiming for.
4. Sepal.Width versus Petal.Width: the overlap is fairly small for 2 of the 3 clusters, so clustering might work, depending on the performance you are aiming for.
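You can turn the "by eye" judgement into a number with the silhouette coefficient, a standard measure of cluster separation. For each point, it compares the mean distance to its own group (a) with the mean distance to the nearest other group (b), giving s = (b - a) / max(a, b). It then averages s over all points, so values near 1 mean well-separated groups and values near 0 or below mean heavy overlap. The sketch below assumes that the known species labels serve as the groups for a chosen feature pair. The function name `mean_silhouette` and the two toy data sets are hypothetical and are not the Iris data. scikit-learn's `sklearn.metrics.silhouette_score` computes the same quantity.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient of labelled points (needs at least two distinct labels)."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Collect distances from point i to every other point, grouped by that point's label.
        dists = {}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(lab, []).append(math.dist(p, q))
        if own not in dists:
            scores.append(0.0)  # a point alone in its group scores 0 by convention
            continue
        a = sum(dists[own]) / len(dists[own])  # mean distance within its own group
        b = min(sum(d) / len(d) for lab, d in dists.items() if lab != own)  # nearest other group
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight, distant groups versus two interleaved groups.
labels = [0, 0, 0, 1, 1, 1]
separated = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
overlapping = [(0, 0), (1, 1), (2, 2), (0.5, 0.5), (1.5, 1.5), (2.5, 2.5)]
print(mean_silhouette(separated, labels), mean_silhouette(overlapping, labels))
```

To score the four options, pass the two chosen Iris columns as `points` and the species names as `labels`. All four features are in centimeters, so no rescaling is needed before measuring distances.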
Question 3
You are helping to organize a conference with more than 1000 attendees. All participants have already paid and expect to pick up their conference t-shirt on the first day. Your team is shocked to discover that t-shirt sizes were not recorded during online registration. However, all participants were asked to provide their age, gender, body height and weight. To help out, you sit down to write a Python script that predicts the t-shirt size of each participant using a clustering algorithm. You know that you can only get 6 t-shirt sizes (XS, S, M, L, XL, XXL). This falls into:
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Solution
This is an **unsupervised problem**. You know to expect 6 categories, or clusters, in the data, but you have no idea how they are distributed across ``age, gender, body height and weight``. Unsupervised methods are therefore the most likely to help here.
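The question does not prescribe a clustering algorithm, so here k-means is used as one common choice, with k equal to the number of available sizes. This minimal plain-Python sketch assumes that only body height (cm) and weight (kg) are used. Gender would need a numeric encoding first, and age is left out for simplicity. All names (`standardize`, `farthest_first`, `kmeans`, `assign_sizes`) and the sample people are hypothetical. The initialisation is a deterministic farthest-first traversal; k-means++ is the more common randomised alternative.

```python
import math

SIZES = ["XS", "S", "M", "L", "XL", "XXL"]  # the sizes named in the question


def standardize(rows):
    """Scale every column to zero mean and unit variance so cm and kg count equally."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def farthest_first(points, k):
    """Start at the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until the centroids stop moving."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        # Assignment step: label every point with its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, [nearest(p, centroids) for p in points]


def assign_sizes(body_data, sizes=SIZES):
    """Cluster (height_cm, weight_kg) pairs into len(sizes) groups and name each group with a size."""
    centroids, labels = kmeans(standardize(body_data), k=len(sizes))
    # Hypothetical heuristic: rank clusters by scaled height + weight, smallest first.
    order = sorted(range(len(sizes)), key=lambda c: sum(centroids[c]))
    size_of_cluster = {c: sizes[rank] for rank, c in enumerate(order)}
    return [size_of_cluster[lab] for lab in labels]


people = [(150, 45), (155, 50), (160, 55), (165, 60), (170, 65), (175, 70),
          (180, 80), (185, 85), (190, 95), (158, 52), (172, 68), (188, 90)]
print(assign_sizes(people))
```

The clusters that k-means returns have no names. The final ranking step, which maps each cluster to a size, is an extra assumption on top of the unsupervised part, and other rules (for example, ranking by height alone) are equally possible.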