Feedback – Lab 2

9 Sept 2014
Your learning experience in this course
• Active listening: video lectures
– The underlying question is "How am I going to use this
concept later on?"
• Consolidating new knowledge via quizzes and
surveys
– Direct questions will help you memorize the concepts
• Lab classes
– How shall I solve this problem with the knowledge that
I have acquired so far?
Lab Sessions:
Text Comprehension & Task Interpretation
• (Always: point out inaccuracies)
• Use case 1
– I do not understand the text:
• Go back to the video lecture; you have probably not
built the background context required to complete
the task.
… continued
• Use case 2
– Oh my god, what am I supposed to do here?
• Read the text several times and identify the key points in its
structure:
- Description
- The purpose
- Tasks
- Pre-processing: feature transformation
- Identify the best features by applying your
knowledge about empirical error
- Interpret the results based on your knowledge
about empirical error and your common-sense
knowledge or historical research.
My expectations on your learning
experience
• Students should be able to interpret the text
and the tasks (diversified interpretations are
allowed and welcome)
• Students should be able to show a critical mind
by working out one or more plausible interpretations
and motivating their choice(s).
About instructions and time…
• I am not sure that the instructions were unclear.
• The core task is to represent bin0 and
bin1 in order to apply the formulae.
• This was the cognitive effort of this lab.
• You could work in groups, and groups could
exchange information with each other… for several
hours… And you made it!
Pre-processing: feature transformation
• Categorical features → binary features
– Each feature should take the value 0 or the value 1,
following the instructions under the heading
"Preprocessing" (search & replace, IF formulae,
whatever…)
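As a rough illustration of the transformation above, the sketch below maps categorical values to 0/1 in Python instead of a spreadsheet. The column names, the toy rows and the choice of which category becomes 1 are assumptions made for this example; the actual mapping is the one given under the "Preprocessing" heading of the lab text.

```python
# Hypothetical toy rows. Column names and values are invented for
# illustration; they are not the lab's real data.
rows = [
    {"Sex": "female", "Embarked": "S"},
    {"Sex": "male", "Embarked": "C"},
    {"Sex": "male", "Embarked": "S"},
]

# For each feature, the category that is encoded as 1 (an assumption);
# every other category of that feature is encoded as 0.
ONE_VALUE = {"Sex": "female", "Embarked": "S"}


def binarize(row, one_value):
    """Return a new dict where each listed feature is 1 or 0."""
    return {
        name: 1 if row[name] == target else 0
        for name, target in one_value.items()
    }


binary_rows = [binarize(r, ONE_VALUE) for r in rows]
print(binary_rows)
```

This does the same job as a search & replace or an IF formula: one rule per feature, applied to every row.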
The task was about empirical error
(Lect 6, min 7:44)
• Empirical error: how well the chosen
hypothesis classifies the training data.
• How do you assess a hypothesis?
– Systematic counting of the correct and wrong
guesses made by the hypothesis with respect to the correct
labels
– This means that you must compare the
predictions of the hypothesis with the actual
labels
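The counting described above can be sketched in a few lines of Python. The function name `empirical_error` and the example lists are hypothetical; the only idea taken from the slide is that predictions are compared with the actual labels and the wrong guesses are counted.

```python
def empirical_error(predictions, labels):
    """Fraction of training examples where the prediction differs from the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    wrong = sum(1 for p, y in zip(predictions, labels) if p != y)
    return wrong / len(labels)


# Made-up example: 1 = survived, 0 = died.
labels = [1, 0, 0, 1, 0]
predictions = [1, 0, 1, 1, 1]

error = empirical_error(predictions, labels)
print(error)  # 2 wrong guesses out of 5 examples
```

Accuracy is the complement: the number of correct guesses divided by the number of examples.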
Lab Task
• Our hypotheses were the different features.
• We have to assess each feature with respect to
classification (survived vs died)
1) For each feature, calculate the
empirical error
• LEARN TO PREDICT THE FIRST COLUMN
– (a) For each of the features, calculate (and write down) the training error if you used only that
feature to classify the data. To do this you will need to do the following for each feature:
– Split the data based on that feature. Call bin0 all examples that have 0 for that feature and
bin1 all examples that have 1 for that feature.
– Calculate the majority count for the label in each bin, i.e. for bin0, majority(bin0) =
max(count(bin0 = survive), count(bin0 = notsurvive))
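The steps above can be sketched as follows. The split into bin0 and bin1 and the majority count come from the task text. The final step is an assumption, because the excerpt stops before it: each bin is taken to predict its majority label, so the training error is one minus the sum of the majority counts divided by the number of examples. The function name and the toy data are invented for illustration.

```python
from collections import Counter


def feature_training_error(feature_values, labels):
    """Training error when a single binary feature is used as the classifier.

    Each bin (feature value 0 or 1) predicts its own majority label.
    """
    if len(feature_values) != len(labels) or not labels:
        raise ValueError("need two non-empty lists of equal length")

    # bins[0] and bins[1] count how often each label occurs in that bin.
    bins = {0: Counter(), 1: Counter()}
    for x, y in zip(feature_values, labels):
        bins[x][y] += 1

    # majority(bin) = max(count(bin = survive), count(bin = notsurvive))
    majority_total = sum(max(c.values(), default=0) for c in bins.values())

    # Everything that is not in a majority is a wrong guess (assumed final step).
    return 1 - majority_total / len(labels)


# Made-up data: feature value per example and label (1 = survived, 0 = died).
feature = [0, 0, 0, 1, 1, 1]
labels = [0, 0, 1, 1, 1, 0]

err = feature_training_error(feature, labels)
print(err)  # majority(bin0) = 2, majority(bin1) = 2, so 1 - 4/6
```

Running this once per feature column gives the list of errors that the lab asks you to write down.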
Accuracy/Error
• A possible representation….
WATCH OUT! AGE FEATURE IS TRICKY HERE!
Other representations (etc. etc.)
Which feature would be best to use?
• EMBARKED… if we trust this sample and our
calculations… (error rate on this feature is the
lowest)
• Basically this means that many of those who
started their trip from Southampton did not
survive.
• However, the difference between the features was
very small!
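A small sketch of how the choice and the caveat above could be checked together: pick the feature with the lowest error, and also look at how far ahead it is. The feature names and error values below are placeholders invented for this example; they are not the results of the lab.

```python
# Placeholder error rates per feature (invented, NOT the lab's results).
errors = {"feature_a": 0.40, "feature_b": 0.38, "feature_c": 0.39}

# Sort features from lowest to highest error.
ranked = sorted(errors.items(), key=lambda item: item[1])
best, runner_up = ranked[0], ranked[1]

# How much better is the winner than the second-best feature?
margin = runner_up[1] - best[1]

print(f"best feature: {best[0]} (error {best[1]:.2f})")
print(f"margin over {runner_up[0]}: {margin:.2f}")
```

When the margin is this small on a small sample, the ranking alone is weak evidence that the winning feature is really the best one.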
Many interesting interpretations!
Nobody really believed that Embarked was a good
feature
• "this could depend on the small dataset"
• "embarked feature gave the lowest error […] Intuitively
the first class feature should have the strongest
relationship with the chance of surviving"
• "If we calculate accuracy with more features […], we
get more interesting results"
• "The Embarked would be the best to use because it has
the lowest error rate. In reality it is very unlikely that
the city has any correlation with their chance of
survival, unless they received some special training
before boarding or shared a rough upbringing in the
city"
• Etc.
Missing values
• Good that you noticed that there were missing values,
i.e. cells without any value!
– Some of you removed them
– Some of you converted them to >25
• In practice, missing values require "more investigation"
• Missing values are not considered to be "noise" in the
sense that was explained during the video lecture.
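The two strategies mentioned above can be compared with a short sketch. The ages and labels are made up, and encoding ">25" as 1 is an assumption for this example; only the threshold 25 is taken from the slide.

```python
# Made-up ages (None marks a missing cell) and labels (1 = survived, 0 = died).
ages = [22, None, 38, None, 4]
labels = [0, 1, 1, 0, 1]

# Strategy A: remove the examples whose age is missing.
kept = [(a, y) for a, y in zip(ages, labels) if a is not None]


# Strategy B: keep every example and put missing ages in the ">25" bin.
def age_bit(age):
    """Return 1 for age > 25 or missing, 0 otherwise (assumed encoding)."""
    return 1 if age is None or age > 25 else 0


bits = [age_bit(a) for a in ages]

print(kept)  # 3 examples remain
print(bits)  # all 5 examples remain, two of them filled in
```

The two strategies give different bin counts, so the empirical error of the Age feature can differ depending on which one is used.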
Technical troubles
• If you experience problems with a computer
(configuration problems, weird behaviour, etc.),
just switch to another computer and report the trouble
(Per?)
Next…
• Those who have miscalculated the empirical error
should recalculate it in the correct way, as
presented.
• Those who want, can have some additional
training with an optional task that is on the
website. It contains the solution. You do not need
to submit anything. It is just for you!
• All those who have submitted the report have
completed this lab task. Well done!