Content
All content is taken from this Carpentries incubator project.
For instructors: the video script for both parts is available here.
Question 1
Overfitting describes the situation where …
1. a neural network produces predictions that are more precise than the training data set allows
2. a neural network produces random predictions
3. a neural network learns the distribution of the training and test data exactly
4. a neural network learns the distribution of the training data exactly and is incapable of predicting the test set well
1. no, this describes a situation that rarely occurs but that everyone aspires to (you have found a very good predictor!)
2. no, this is called underfitting - the network is incapable of making any predictions better than random choice
3. no, this is an unrealistic situation, as the network should not be able to predict the test set (unseen data) exactly
4. yes, overfitting describes the situation where a predictor is incapable of generalizing: it predicts the training set extremely well but produces almost random predictions on unseen data (i.e. the test set); the sketch after this list illustrates the resulting train/validation gap
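The train/validation gap described in answer 4 is easy to reproduce. Below is a minimal sketch (synthetic data, not the lesson's weather set; layer sizes chosen arbitrarily) that trains a deliberately over-sized Keras network on a tiny data set: with overfitting, the training loss keeps falling while the validation loss stalls or rises.

```python
import numpy as np
from tensorflow import keras

# Tiny synthetic data set: far too little data for the network below.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10)).astype("float32")
y = (X[:, 0] + 0.1 * rng.normal(size=200)).astype("float32")

# Deliberately over-sized network for 200 samples.
model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

history = model.fit(X, y, epochs=200, validation_split=0.3, verbose=0)

# Overfitting shows up as a training loss far below the validation loss.
print("final training loss:  ", history.history["loss"][-1])
print("final validation loss:", history.history["val_loss"][-1])
```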
Question 2
Overfitting counter-measures include … (multiple answers possible)
1. defining a baseline which corresponds to random guessing and comparing current prediction quality to this
2. trying to obtain more data
3. varying the size of the neural network with respect to hidden layers, number of neurons, layer types in use and other hyperparameters
4. ignoring quality measurements on the test or validation sets completely
1. yes, this is also often called using a dummy predictor (see the baseline sketch after this list)
2. yes, this would potentially help, as the training data would then exhibit more variability, come closer to reality, and make it easier to predict the test set
3. yes, this would help adapt the capacity of the network to the amount of training data available
4. no, this would not help at all; quality measurements on the test or validation set are precisely what reveal overfitting
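Counter-measure 1 can be implemented in a few lines. The sketch below (all numbers hypothetical, chosen for illustration) computes the RMSE of a dummy predictor that always outputs the mean of the training targets and compares it to a model's predictions; a model that does not clearly beat this baseline has likely not learned anything that generalizes.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between two arrays."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Hypothetical sunshine hours (all numbers made up for illustration).
y_train = np.array([3.0, 5.5, 7.2, 4.1, 6.3])
y_test = np.array([4.8, 6.0, 3.9])
model_predictions = np.array([4.5, 6.4, 4.2])   # stand-in for model output

# Dummy predictor: always predict the mean of the training targets.
baseline_predictions = np.full_like(y_test, y_train.mean())

print("baseline RMSE:", rmse(y_test, baseline_predictions))
print("model RMSE:   ", rmse(y_test, model_predictions))
```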
Repeat the prediction of the sunshine hours using data from one other city, e.g.:
BUDAPEST_sunshine
DE_BILT_sunshine
DRESDEN_sunshine
…
SONNBLICK_sunshine
STOCKHOLM_sunshine
Do you observe a similar situation as with BASEL? To answer this, choose any of the following aspects to guide your answer (a sketch combining several of these variations follows the list):
How does the situation change if you include 5 years instead of 3?
What are the model configurations that work best for you, e.g. to bring the RMSE down below 1 hour?
What happens if you choose the sigmoid activation?
What happens if you choose a larger batch_size, e.g. 64 or 128?
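As a starting point, here is a sketch combining several of these variations in Keras. The file name, the column layout (DATE/MONTH columns plus per-city sunshine columns) and the choice of STOCKHOLM_sunshine are assumptions about the lesson's weather prediction data set; adapt them to your own setup.

```python
import pandas as pd
from tensorflow import keras

# Assumed file name and column layout; adjust to the lesson's data set.
data = pd.read_csv("weather_prediction_dataset.csv")
nr_rows = 365 * 5                                   # 5 years instead of 3

# Features: all remaining (numeric) columns; target: the next day's
# sunshine hours in another city (STOCKHOLM chosen as an example).
X = data.drop(columns=["DATE", "MONTH"]).iloc[:nr_rows]
y = data["STOCKHOLM_sunshine"].shift(-1).iloc[:nr_rows]

model = keras.Sequential([
    keras.Input(shape=(X.shape[1],)),
    keras.layers.Dense(100, activation="sigmoid"),  # sigmoid instead of relu
    keras.layers.Dense(50, activation="sigmoid"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse",
              metrics=[keras.metrics.RootMeanSquaredError()])

history = model.fit(X, y, epochs=200, batch_size=64,  # larger batch size
                    validation_split=0.3, verbose=0)
print("final validation RMSE:",
      history.history["val_root_mean_squared_error"][-1])
```

Varying the activation, the batch size, the number of years of data, and the layer sizes one at a time makes it easier to attribute any change in the validation RMSE to a single configuration choice.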