IT 241
Information Discovery and Architecture Exam 3
Page 1
December 5, 2013
Name _____________________________
This exam permits one page of handwritten notes.
1. Interaction concepts. Consider 1) an interactive geographic map as a pixel-level display whose elements are strongly
hierarchical (states have counties, towns have streets, etc.), 2) a very large Excel spreadsheet as a data-level structure,
and 3) a full family tree (including siblings, ancestors, and descendants) as a relational/hierarchical structure. Use these
three objects to answer the following questions.
[3 pts each=15]
a. Describe panning in the map and in the spreadsheet.
b. How do you distinguish between panning and scrolling?
c. For which of the three objects would the zooming operation be most appropriate? Why? Describe what happens
on a zoom in or out on that object.
d. Distinguish between selection and filtering in the spreadsheet.
e. Apply overview+detail to the family tree. If you focused on one person in the tree, what might you show as
detail for the person? What would be displayed nearby? What could be displayed further away?
f. Describe opportunistic intention on the map.
2. Describe the issues with the design of the visualization above. Identify at least 4 problems.
[12 pts]
a.
b.
c.
d.
3. In the box, draw a tree map of the hierarchy shown to the right. Assume sibling nodes have equal weight.
[5 pts]
[Figure: hierarchy diagram with nodes A, B, C, and D]
4. Draw a directed graph (lines have arrowheads) from the following adjacency matrix.
[5 pts]
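As a reminder of the usual convention, a nonzero entry in row i, column j of an adjacency matrix is read as an arrow from node i to node j. The following is a minimal sketch of that reading; the labels and the 3-node matrix are hypothetical and are not the matrix from this exam.

```python
# Hypothetical labels and matrix, for illustration only.
labels = ["X", "Y", "Z"]
matrix = [
    [0, 1, 0],  # X -> Y
    [0, 0, 1],  # Y -> Z
    [1, 1, 0],  # Z -> X, Z -> Y
]


def directed_edges(matrix, labels):
    """Return (source, target) pairs for every nonzero matrix entry.

    Assumes rows are sources and columns are targets.
    """
    edges = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value:
                edges.append((labels[i], labels[j]))
    return edges


edges = directed_edges(matrix, labels)
for source, target in edges:
    print(f"{source} -> {target}")
```

Each printed pair corresponds to one arrow in the drawing, with the arrowhead at the target node.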
5. Describe a word cloud. How is it constructed? What could the visualization attributes be based on?
[6 pts]
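One common way to build a word cloud is to count word frequencies and map each count to a font size. The sketch below assumes a linear mapping and a small hypothetical stop-word list; real tools may scale sizes differently and may also vary color or position.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "is", "in"}


def word_sizes(text, min_size=10, max_size=40):
    """Map each non-stop word to a font size scaled linearly by its frequency."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    counts = Counter(words)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = max(high - low, 1)  # avoid dividing by zero when all counts are equal
    return {
        word: min_size + (max_size - min_size) * (count - low) / span
        for word, count in counts.items()
    }


sizes = word_sizes("the map of the data and the data in the map of data")
print(sizes)
```

Here the most frequent word gets the largest size and the least frequent word gets the smallest.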
6. A graphic should have three levels of viewing: what is seen 1) at a distance, 2) in the details, and 3) implicitly.
Explain the meaning of these levels.
[6 pts]
7. Why is a stem-and-leaf graph, such as the one below, considered a good graphic? How does data-ink ratio apply here?
[4 pts]
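For reference, a stem-and-leaf display can be produced directly from the data values, which is why nearly all of its ink is data. The sketch below assumes non-negative two-digit integers and uses hypothetical values, not those in the exam's figure.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group sorted values by tens digit (stem), keeping ones digits (leaves)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    return dict(stems)


data = [12, 15, 21, 23, 23, 28, 34, 37, 41]  # hypothetical values
plot = stem_and_leaf(data)
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
```

Every original value can be read back from the display, and the row lengths form a histogram of the distribution.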
8. Discuss why area and volume are not good choices for visualizing magnitude.
[5 pts]
9. Decision trees.
[17 pts]
a. Given the decision tree rule for the above dataset
IF Sex=Male
THEN CreditCardInsurance=No
Determine its accuracy = ___________% and its coverage = ___________%
b. Draw a decision tree to correspond with these three production rules. (Not all leaves are defined.)
IF Sex=Male
THEN CreditCardInsurance=No
IF Sex=Female && IncomeRange=30-40K
THEN CreditCardInsurance=Yes
IF Sex=Female && IncomeRange != 30-40K
THEN CreditCardInsurance=No
c. In predicting MagazinePromotion, why is the entropy=0 bits for Salary="50-60K"?
d. In predicting WatchPromotion, the entropy for Salary="30-40K" is expressed as
info([ ___ , ___ ]) = entropy ( ____/___ , ___/___) [There are only 3 different numbers in these 6 blanks.]
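The info([...]) notation takes class counts and evaluates the entropy of the corresponding fractions. The sketch below shows that relationship; the counts are hypothetical and are not taken from the exam dataset.

```python
from math import log2


def info(counts):
    """Entropy in bits of a list of class counts, e.g. info([2, 3])."""
    total = sum(counts)
    fractions = [count / total for count in counts if count > 0]
    # Zero counts are skipped because they contribute nothing to the entropy.
    return sum(-p * log2(p) for p in fractions)


print(info([2, 2]))  # evenly split classes
print(info([4, 0]))  # all instances in one class
print(info([1, 3]))  # same as entropy(1/4, 3/4)
```

An even split gives the maximum of 1 bit for two classes, and a branch whose instances all share one class gives 0 bits.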
10. Association Rules.
[13 pts]
a. Using the credit card data from the previous page, identify 4 single-item sets that would be generated with a
confidence threshold of 33%. (Exclude the age attribute.)
Single item set                Number of items
A.
B.
C.
D.
b. What pairings of your 4 item sets A-D, if any, also meet the 33% confidence threshold?
c. If you had the item set pairing (which you may not necessarily have) of Sex=Male and LifeInsPromo=Yes,
what two rules could be expressed?
i. IF ________________________ THEN _________________________
ii. IF ________________________ THEN _________________________
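For reference, the confidence of a rule IF A THEN B is commonly computed as the number of records matching both A and B divided by the number matching A. The sketch below assumes that definition and uses hypothetical attributes and records, not the credit card data.

```python
# Hypothetical records, for illustration only.
records = [
    {"Bread": "Yes", "Milk": "Yes"},
    {"Bread": "Yes", "Milk": "No"},
    {"Bread": "No", "Milk": "Yes"},
    {"Bread": "Yes", "Milk": "Yes"},
    {"Bread": "No", "Milk": "Yes"},
]


def confidence(records, antecedent, consequent):
    """Fraction of records matching the antecedent that also match the consequent.

    Each argument is an (attribute, value) pair.
    """
    matching = [r for r in records if r[antecedent[0]] == antecedent[1]]
    if not matching:
        return 0.0
    hits = sum(1 for r in matching if r[consequent[0]] == consequent[1])
    return hits / len(matching)


bread_to_milk = confidence(records, ("Bread", "Yes"), ("Milk", "Yes"))
milk_to_bread = confidence(records, ("Milk", "Yes"), ("Bread", "Yes"))
print(bread_to_milk, milk_to_bread)
```

The two directions of a rule built from the same item set generally have different confidence values, because the denominators differ.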
11. Given the confusion matrix.
[5 pts]
Actual\Predicted    Cat    Dog    Rabbit
Cat                   9      1       2
Dog                   5     10       3
Rabbit                2      2      16
a. Total number of dogs = ____________
b. Number of cats classified as a dog = _______
c. Number of dogs incorrectly classified = ________
d. Percent classified correctly (all three categories) = ________
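For reference, the quantities asked about can be read from any confusion matrix whose rows are actual classes and whose columns are predicted classes. The sketch below uses a hypothetical two-class matrix, not the one in this question.

```python
# Hypothetical matrix: outer keys are actual classes, inner keys are predictions.
matrix = {
    "Spam": {"Spam": 8, "Ham": 2},
    "Ham": {"Spam": 1, "Ham": 9},
}


def actual_total(matrix, label):
    """Number of instances whose actual class is `label` (the row sum)."""
    return sum(matrix[label].values())


def misclassified(matrix, label):
    """Instances of `label` predicted as some other class (row sum minus diagonal)."""
    return actual_total(matrix, label) - matrix[label][label]


def accuracy(matrix):
    """Fraction of all instances on the diagonal, i.e. classified correctly."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(actual_total(matrix, label) for label in matrix)
    return correct / total


print(actual_total(matrix, "Spam"), misclassified(matrix, "Spam"), accuracy(matrix))
```

Row sums give the actual class totals, off-diagonal entries in a row are that class's errors, and the diagonal holds the correct classifications.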
12. Clustering data mining true/false.
[7 pts]
_____ The user must pre-specify the number of clusters.
_____ Outliers are chosen to initialize the cluster centroids.
_____ Euclidean distances are preferred in K-means clustering when associating data points with cluster centroids.
_____ The K-means clustering algorithm determines the final clusters in 2 iterations.
_____ Different random number seeds are good to experiment with to start the K-means algorithm.
_____ The Perceptron method is a form of neural networks.
_____ The Perceptron method adjusts its biases away from an instance when it discovers a misclassified instance.
Course feedback regarding the shared lectures with the University of Applied Sciences in Germany. I know this isn’t very
anonymous, but I hope you will still give us constructive suggestions about what worked, what didn’t work, and
what we might consider doing differently.
What was good about sharing lectures?
What didn’t work for you?
What suggestions do you have for us?
What concerns would you have if we arranged for some short group projects that involved collaboration internationally
in the course?
Thank you!