CSSE 413 - Rose

CSSE 413 - Quiz # 4
Name _______________________________
Tuesday, Nov. 1, 2005
Take home, open book, due to be turned in on paper at the start of class Thursday.
Please take no more than 75 minutes to complete this quiz.
As usual, each question is worth 20 points in the great scheme of things.
If you need more space, use the back of the page (but conciseness is also rewarded!).
1. (Ch 18) We never use the same attribute twice along one path in a decision tree. Why not?
Answer: This is problem 18.4 in the book. The authors say, “In standard decision trees, attribute
tests divide examples according to the attribute value. Therefore any example reaching the
second test already has a known value for the attribute and the second test is redundant…”
Some people said,
“After the first time, it wouldn’t pass the significance test.”
“The goal is to have the smallest tree possible, [and this certainly wouldn’t meet that goal.]”
“This doesn’t add any new information – that’s already been discovered.”
“Any branch contradicting the first decision on this data would never be taken.”
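To see the redundancy numerically, here is a minimal Python sketch (the toy data and names are invented for illustration): once a branch has split on an attribute, every example reaching that branch agrees on it, so a second test of the same attribute has an information gain of exactly zero.

from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels.
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def gain(examples, attr):
    # Information gain of splitting on attr; each example is (attrs, label).
    labels = [y for _, y in examples]
    remainder = 0.0
    for v in {x[attr] for x, _ in examples}:
        subset = [y for x, y in examples if x[attr] == v]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

# Every example reaching this branch already has rain=True...
branch = [({'rain': True, 'wind': 'hi'}, 'stay'),
          ({'rain': True, 'wind': 'lo'}, 'go'),
          ({'rain': True, 'wind': 'hi'}, 'stay')]
print(gain(branch, 'rain'))   # 0.0 -- the repeated test is redundant
print(gain(branch, 'wind'))   # about 0.92 -- a fresh attribute can still help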
2. (Ch 18) How can you use a “significance test” to prevent overfitting in decision tree learning?
Answer: [p. 662] We prune nodes in the tree that are not clearly relevant to solving the problem,
with that relevance judged by the information gain. If that gain is close to zero, the node probably
isn’t adding anything. A significance test is used to make this decision on such nodes: we set up
a null hypothesis that there is no underlying pattern (the attribute is irrelevant), then measure
how far the actual counts deviate from what that hypothesis predicts. Usually, something like a
5% level of significance is required to reject the null hypothesis (and thus keep this node around
in the tree).
Some people said,
“The significance test determines if a deviation of data is statistically significant enough to warrant
assuming a pattern exists in the data,” and similar, more concise versions of the above.
Some added, “The test makes sure we don’t check what we don’t have to.”
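To make the mechanics concrete, here is a Python sketch of the chi-squared form of this test (the counts and the helper name are invented; the deviation measure follows the book’s chi-squared pruning discussion [p. 662]). Under the null hypothesis that the attribute is irrelevant, each child of a split should contain positives and negatives in roughly the same proportion as its parent; we total the squared deviations from those expected counts and compare against a chi-squared critical value at the 5% level.

from scipy.stats import chi2

def deviation(parent_pos, parent_neg, children):
    # children: one (pos, neg) pair of example counts per attribute value.
    p, n = parent_pos, parent_neg
    D = 0.0
    for pk, nk in children:
        total_k = pk + nk
        exp_pos = p * total_k / (p + n)   # expected if attribute is irrelevant
        exp_neg = n * total_k / (p + n)
        D += (pk - exp_pos) ** 2 / exp_pos + (nk - exp_neg) ** 2 / exp_neg
    return D

children = [(4, 2), (3, 3), (1, 5)]               # counts after a 3-way split
D = deviation(8, 10, children)
threshold = chi2.ppf(0.95, df=len(children) - 1)  # 5% significance level
print(f"D={D:.2f}, threshold={threshold:.2f}, prune={D < threshold}")
# D=3.15, threshold=5.99 -> the deviation is not significant, so prune.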
3. (Ch. 19) Since Prince Charles is currently on an official visit to the US, we can’t resist using
his family tree in a problem:
Let’s invent a new family relationship “P” for this family, in terms of the tree shown. Say we
have the following true instances of this new predicate:
P(Mum, Philip), P(Mum, Diana), P(Mum, Mark), P(Mum, Sarah), P(Philip, Diana), P(Philip, Mark),
P(Philip, Sarah), P(Diana, Mark), P(Mark, Sarah), and P(Diana, Sarah); and the reverse of all
these, like P(Philip, Mum), etc.
But P(x, y) for everyone else. That means for every other possible pair in the family, or any x
and/or y not shown. So, P(George, Peter), say, and also P(Kydd, Steve) and P(Dave, Jane),
for instance, for some non-pictured people Steve, Dave, and Jane.
Describe how FOIL should end up describing this new predicate in terms of its known ways of
describing things – using the list of predicates on p. 699, plus the ones suggested on that page,
like Ancestor. Your solution should work for people who are not members of the Royal Family,
like Steve, as well as for those who are.
Answer: These are all “Royal outlaws” to each other – people who married into the Royal Family.
The problem then is to express this in terms of the predicates shown. Staring at the tree, one
idea for a solution is to use something like ¬Ancestor(George, x) for each party in the P
predicate. But that only works if one is in the Royal Family; we don’t want P(Dave, Jane) to be
true. There’s also a slight problem with figuring Mum into the equation if we’re going to talk about
being married to ancestors of George as a qualifier. And it seems from the data given that we
don’t wish to include Spencer and Kydd, even though ¬Ancestor(George, x) is true for them as well.
So, to fix up all these problems, we end up with P(x, y) meaning something like, “They aren’t
blood-royalty but they are married to blood-royalty” –
P(x, y) ⇔ ∃z ¬Ancestor(TheRoyals, x) ∧ Married(x, z) ∧ Ancestor(TheRoyals, z)
        ∧ ∃w ¬Ancestor(TheRoyals, y) ∧ Married(y, w) ∧ Ancestor(TheRoyals, w)
        ∧ x ≠ y.
Some people assumed the Royals don’t marry close relatives (a fair assumption given the above
data, if not in general for European royalty). This lets you get by with the simpler:
P(x, y) ⇔ ∃z, w Married(x, z) ∧ Ancestor(TheRoyals, z)
        ∧ Married(y, w) ∧ Ancestor(TheRoyals, w)
        ∧ x ≠ y.
And if you ignored all us commoners, just focusing on the tree shown, you could indeed have said
something much simpler, like,
P(x, y) ⇔ ¬Ancestor(TheRoyals, x) ∧ ¬Ancestor(TheRoyals, y) ∧ x ≠ y,
except you then had to take out Spencer and Kydd explicitly – a bit inelegant.
And the original idea of
P(x, y) ⇔ ¬Ancestor(George, x) ∧ ¬Ancestor(George, y) ∧ x ≠ y
was then equivalent (with the same problem about Spencer and Kydd). I decided these answers
should receive almost full credit.
Almost every other answer ended up way messier. Some folks invented their own helper
predicates, which I chose to allow, like ParentInLaw(z, w). This particular one seemed to provide
some simplification. Certainly, one like AncestorInLaw would have been even more useful, but it
would have assumed away most of the problem.
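As a sanity check on the first rule above, here is a small Python sketch (the fact set, the constants, and the helper names are my own; it just evaluates the learned definition by brute force, and is not FOIL itself):

from itertools import product

# Ground facts read off the tree (just enough of it for the demo).
married = {('George', 'Mum'), ('Elizabeth', 'Philip'), ('Charles', 'Diana'),
           ('Anne', 'Mark'), ('Andrew', 'Sarah'), ('Spencer', 'Kydd')}
married |= {(b, a) for a, b in married}            # Married is symmetric

parent = {('George', 'Elizabeth'), ('Mum', 'Elizabeth'),
          ('Elizabeth', 'Charles'), ('Philip', 'Charles'),
          ('Elizabeth', 'Anne'), ('Philip', 'Anne'),
          ('Elizabeth', 'Andrew'), ('Philip', 'Andrew'),
          ('Spencer', 'Diana'), ('Kydd', 'Diana')}

def blood_royal(x):
    # Stands in for Ancestor(TheRoyals, x): George or any descendant of his.
    return x == 'George' or any(blood_royal(p) for p, c in parent if c == x)

def married_to_royalty(x):
    return any(blood_royal(z) for w, z in married if w == x)

def P(x, y):
    # "Not blood-royalty, but married to blood-royalty" -- for both parties.
    return (x != y
            and not blood_royal(x) and married_to_royalty(x)
            and not blood_royal(y) and married_to_royalty(y))

people = {'George', 'Mum', 'Elizabeth', 'Philip', 'Charles', 'Diana', 'Anne',
          'Mark', 'Andrew', 'Sarah', 'Spencer', 'Kydd', 'Dave', 'Jane'}
print(sorted((x, y) for x, y in product(people, repeat=2) if P(x, y)))

Running it prints exactly the ordered pairs over {Mum, Philip, Diana, Mark, Sarah}, while Spencer, Kydd, and the unpictured commoners Dave and Jane are correctly excluded, matching the data given.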
4. (Ch 20) Two statisticians go to the doctor and are both given the same prognosis: A 40%
chance that the problem is the deadly disease A, and a 60% chance of the fatal disease B.
Fortunately, there are anti-A and anti-B drugs that are inexpensive, 100% effective, and free of
side-effects. The statisticians have the choice of taking one drug, both, or neither. The first
statistician is an avid Bayesian – What will she do? How about the second statistician, who
always uses a maximum likelihood hypothesis?
Answer: This is problem 20.4. Russell says, “The Bayesian would take both drugs. The
maximum likelihood approach would be to take the anti-B drug.” The second approach forces
choosing a single hypothesis from among the alternatives, picking the most likely of those. The
first approach allows one to consider the combination A ∨ B in the decision tree.
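The difference is easy to see as a tiny expected-utility calculation; here is a Python sketch with an invented 1-for-survive, 0-for-die utility:

p = {'A': 0.4, 'B': 0.6}                  # posterior over the two diseases

def survives(disease, drugs):
    return 1 if disease in drugs else 0   # drugs are 100% effective

# Bayesian: average the utility of each action over the full posterior.
for drugs in [set(), {'A'}, {'B'}, {'A', 'B'}]:
    eu = sum(p[d] * survives(d, drugs) for d in p)
    print(drugs or '{}', eu)              # {} 0.0, {A} 0.4, {B} 0.6, {A,B} 1.0

# Maximum likelihood: commit to the single most probable hypothesis,
# then act as if it were certain -> take only the anti-B drug.
ml = max(p, key=p.get)
print('ML hypothesis:', ml, '-> take anti-' + ml, 'only')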
5. (Ch 26) Russell & Norvig ask, “Will the use of AI systems result in a loss of accountability?”
We could ask the reverse of that, “Will there be too much accountability imposed?”
Let’s say your automobile has geo-positioning and reports back to its Off-Star system not only
where you are at all times but also associated indicators of car performance, such as your
direction “vector,” which could be used to calculate how fast you are going. The State of Mindiana
decides this data stream from each car would enable a wonderful new way to make money and
improve highway safety. They build an AI system which matches your location, direction and
speed against traffic rules for that location. If you happen to violate a rule, such as driving too
fast or not stopping fully at a stop sign, they mail you a ticket. Much to your dismay, you discover
the system is legal in Mindiana. On what bases could you argue that there is “too much
accountability” in this AI system?
Answer: Surely opinions will vary. However, key points expected include some or all of these:
o The fact that the system does not judge the situation within which violations occur, unlike
a human intervention by the police. Thus, for example, the person likely to get a record
number of tickets would be someone rushing another person to the hospital, or perhaps
someone fleeing from a carjacker.
o The fact that the system would miss key data unless it was extremely complex – for example,
a police officer waving you through a red light in a construction zone.
o The fact that the system would change driver behavior in very substantial ways, without
our having tested whether or not this really has benefits (like safety). For example, in the
city, where nobody uses cruise control, everyone would have to be looking down to check
their speed all the time, because of the perhaps thousand-fold increased chance of
getting a ticket. Is it really safer for everyone to be looking down that much, just to drive
a few miles per hour slower?
o The citizens might well rebel against being forced into such rigid new watchfulness. One
only has to think of all the places where motorcyclists got mandatory helmet laws
repealed as a nuisance, to visualize people’s reaction to true 100% surveillance of their
driving behavior.
o The laws were written with the knowledge that only a small percentage of the people
breaking traffic laws would get caught; it is not clear that even the legislators or traffic
engineers would want the laws so rigidly enforced. Perhaps the goal of making a 35 mph
speed limit on a certain road was that most drivers would thus keep it under 40, for example.