Some applications in human behavior modeling
(and open questions)
Jerry Zhu
University of Wisconsin-Madison
Simons Institute Workshop on Spectral Algorithms:
From Theory to Practice
Oct. 2014
1 / 30
Outline
Human Memory Search
Machine Teaching
2 / 30
Verbal fluency
Say as many animals as you can, without repeating, in one minute.
3 / 30
Semantic “runs”
1. cow, horse, chicken, pig, elephant, lion, tiger, porcupine, gopher, rat, mouse, duck, goose, horse, bird, pelican, alligator, crocodile, iguana, goose
2. elephant, tiger, dog, cow, horse, sheep, cat, lynx, elk, moose, antelope, deer, tiger, wolverine, bobcat, mink, rabbit, wolf, coyote, fox, cow, zebra
3. cat, dog, horse, chicken, duck, cow, pig, gorilla, giraffe, tiger, lion, ostrich, elephant, squirrel, gopher, rat, mouse, gerbil, hamster, duck, goose
4. cat, dog, sheep, goat, elephant, tiger, dog, deer, lynx, wolf, mountain goat, bear, giraffe, moose, elk, hyena, aardvark, platypus, lion, skunk, wolverine, raccoon
5. dog, cat, leopard, elephant, monkey, sea lion, tiger, leopard, bird, squirrel, deer, antelope, snake, beaver, robin, panda, vulture
6. deer, muskrat, bear, fish, raccoon, zebra, elephant, giraffe, cat, dog, mouse, rat, bird, snake, lizard, lamb, hippopotamus, elephant, skunk, lion, tiger
4 / 30
Memory search
[Figure: 2D embedding of animal words, with a human fluency trajectory overlaid. Plot title: K = 12, obj = 14.1208.]
5 / 30
Memory search
[Figure: detail of the same embedding (K = 12, obj = 14.1208).]
6 / 30
Censored random walk
[Abbott, Austerweil, Griffiths 2012]
- V: n animal word types in English
- P: (dense) n × n transition matrix
- Censored random walk: observing only the first token of each type, x1, x2, ..., xt, ... ⇒ a1, ..., an
- (star example)
7 / 30
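As a concrete sketch of this generative process (illustrative, not code from the talk): the walk moves under P at every latent step, but a state is emitted only on its first visit. The function name and arguments below are my own.

```python
import numpy as np

def censored_random_walk(P, start, length, rng=None):
    """Walk under transition matrix P; emit a state only on first visit."""
    rng = rng or np.random.default_rng()
    x = start
    seen, emitted = {x}, [x]
    for _ in range(length):
        x = rng.choice(len(P), p=P[x])   # one latent step
        if x not in seen:                # censoring: repeat visits are silent
            seen.add(x)
            emitted.append(x)
    return emitted
```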
The estimation problem
Givenn
m censored random
walks
o
(1)
(1)
(m)
(m)
D=
a1 , ..., an , ..., a1 , ..., an
, estimate P .
8 / 30
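Continuing the sketch above, a synthetic dataset D of m censored walks might be generated like this (the uniform random P, the starts, and the walk length are arbitrary choices, not from the talk):

```python
# synthetic data: m censored walks over n states
rng = np.random.default_rng(1)
n, m = 16, 100
P_true = rng.random((n, n))
P_true /= P_true.sum(axis=1, keepdims=True)   # row-stochastic
D = [censored_random_walk(P_true, start=rng.integers(n), length=500, rng=rng)
     for _ in range(m)]
```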
Each observed step is an absorbing random walk
(with Kwang-Sung Jun)
- P(ak+1 | a1, ..., ak) may involve infinitely many latent steps
- Instead, model this observed step as an absorbing random walk with absorbing states V \ {a1, ..., ak}:

P ⇒ [ Q  R ]
    [ 0  I ]
9 / 30
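A sketch of the block partition for one observed step: states visited so far are transient, everything else absorbs (the helper name is mine):

```python
import numpy as np

def absorbing_blocks(P, visited):
    """Reorder P so visited states come first: P -> [[Q, R], [0, I]].
    Returns Q (transient-to-transient), R (transient-to-absorbing),
    and the absorbing states in R's column order."""
    others = [s for s in range(len(P)) if s not in visited]
    Q = P[np.ix_(visited, visited)]
    R = P[np.ix_(visited, others)]
    return Q, R, others
```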
Maximum Likelihood
- Fundamental matrix N = (I − Q)^−1; Nij is the expected number of visits to j before absorption when starting at i
- p(ak+1 | a1, ..., ak) = Σ_{i=1}^{k} Nki Ri1
- log likelihood: Σ_{i=1}^{m} Σ_{k=1}^{n−1} log p(ak+1^(i) | a1^(i), ..., ak^(i))
- Nonconvex, gradient method
10 / 30
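Putting the pieces together, a sketch of the model evaluation (reusing `absorbing_blocks` above; gradient code omitted). Indices are zero-based, so the slide's a1, ..., ak is a[0..k-1] and ak+1 is a[k]:

```python
import numpy as np

def step_prob(P, a, k):
    """p(a[k] | a[0], ..., a[k-1]) for one censored step."""
    Q, R, others = absorbing_blocks(P, a[:k])
    N = np.linalg.inv(np.eye(k) - Q)   # fundamental matrix
    j = others.index(a[k])             # column of the next observed state
    return N[k - 1] @ R[:, j]          # start from the last observed state

def log_likelihood(P, D):
    return sum(np.log(step_prob(P, a, k))
               for a in D for k in range(1, len(a)))
```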
Other estimators
- PRW: pretend a1, ..., an not censored
- PFirst2: use only a1, a2 in each walk (consistent)
11 / 30
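Sketches of the two baselines as I read them (the count-and-normalize details, including the small smoothing constant, are my assumptions):

```python
import numpy as np

def normalize_counts(pairs, n):
    C = np.full((n, n), 1e-8)          # tiny smoothing avoids zero rows
    for i, j in pairs:
        C[i, j] += 1
    return C / C.sum(axis=1, keepdims=True)

def prw(D, n):
    """Pretend each censored list is an ordinary (uncensored) walk."""
    return normalize_counts([(a[k], a[k + 1]) for a in D
                             for k in range(len(a) - 1)], n)

def pfirst2(D, n):
    """Use only the first observed transition a1 -> a2 of each walk."""
    return normalize_counts([(a[0], a[1]) for a in D], n)
```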
Star graph
x-axis: m, y-axis: ‖P̂ − P‖_F²
12 / 30
2D Grid
13 / 30
Erdős–Rényi with p = log(n)/n
14 / 30
Ring graph
15 / 30
Questions
- Consistency?
- Rate?
16 / 30
Outline
Human Memory Search
Machine Teaching
17 / 30
Learning a noiseless 1D threshold classifier
[number line on [0, 1] with threshold θ∗]
x ∼ uniform[0, 1]
y = −1 if x < θ∗, +1 if x ≥ θ∗
Θ = {θ : 0 ≤ θ ≤ 1}
Passive learning:
1. given training data D = (x1, y1), ..., (xn, yn) ∼ iid p(x, y)
2. finds θ̂ consistent with D
Risk |θ̂ − θ∗| = O(1/n)
18 / 30
Active learning (sequential experimental design)
Binary search
[number line: label queries bisect the interval around θ∗]
Risk |θ̂ − θ∗| = O(2^−n)
19 / 30
Machine teaching
What is the minimum training set a helpful teacher can construct?
[number line: one −1 example and one +1 example placed ε apart around θ∗]
Risk |θ̂ − θ∗| = ε, ∀ε > 0
20 / 30
Comparing the three
- passive learning "waits": risk O(1/n)
- active learning "explores": risk O(1/2^n)
- machine teaching "guides": the teacher knows θ∗ and the learning algorithm
21 / 30
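A quick simulation contrasting the three regimes on the threshold task (illustrative; the midpoint learner and the teacher's ε-straddle construction are my choices, consistent with the rates on the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.637                                  # hidden threshold

def passive(n):
    # consistent learner: midpoint of the interval allowed by the data
    x = rng.uniform(0, 1, n)
    lo = max(x[x < theta], default=0.0)
    hi = min(x[x >= theta], default=1.0)
    return (lo + hi) / 2

def active(n):
    # binary search with n label queries
    lo, hi = 0.0, 1.0
    for _ in range(n):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mid < theta else (lo, mid)
    return (lo + hi) / 2

def taught(eps):
    # teacher: a -1 example at theta - eps/2 and a +1 example at theta + eps/2
    return ((theta - eps / 2) + (theta + eps / 2)) / 2   # midpoint = theta

for n in (10, 100, 1000):
    print(n, abs(passive(n) - theta), abs(active(n) - theta))
```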
Example 2: Teaching a Gaussian distribution
Given a training set x1, ..., xn ∈ R^d, let the learning algorithm be

µ̂ = (1/n) Σ_{i=1}^{n} xi
Σ̂ = (1/(n−1)) Σ_{i=1}^{n} (xi − µ̂)(xi − µ̂)^⊤

How to teach N(µ∗, Σ∗) to the learner quickly?
non-iid, n = d + 1
22 / 30
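One explicit construction achieving this with n = d + 1 points (my construction, consistent with the slide's claim, though the talk may use a different one): place the points at µ∗ plus scaled directions that sum to zero and have identity second moment.

```python
import numpy as np
from scipy.linalg import null_space

def teach_gaussian(mu, Sigma):
    """n = d+1 points with sample mean exactly mu and unbiased sample
    covariance exactly Sigma (Sigma assumed positive definite)."""
    d = len(mu)
    n = d + 1
    H = null_space(np.ones((1, n)))    # n x d; rows sum to 0, H.T @ H = I_d
    L = np.linalg.cholesky(Sigma)      # Sigma = L @ L.T
    return mu + np.sqrt(n - 1) * H @ L.T

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
X = teach_gaussian(mu, Sigma)
print(X.mean(axis=0))                  # -> mu
print(np.cov(X, rowvar=False))         # -> Sigma
```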
An Optimization View of Machine Teaching
[Diagram: the learner A maps a training set D ∈ 𝒟 to a model A(D) ∈ Θ; the teacher seeks a D inside the preimage A^−1(θ∗)]

min_{D ∈ 𝒟} |D|   s.t.   A(D) = θ∗

The objective is the teaching dimension [Goldman, Kearns 1995] of
θ∗ with respect to A, Θ.
23 / 30
Example 3: Works on Humans
Human categorization on 1D stimuli [Patil, Z, Kopeć, Love 2014]

human training set | human test accuracy
machine teaching   | 72.5%
iid                | 69.8%

(statistically significant)
24 / 30
New task: teaching humans how to label a graph
Given:
- a graph G = (V, E)
- target labels y∗ : V → {−1, 1}
- a label-completion cognitive model A (a graph diffusion algorithm) such as:
  - mincut
  - harmonic function [Z, Ghahramani, Lafferty 2003]
  - local and global consistency [Zhou et al. 2004]
  - ...
Find the smallest seed set:

min_{S ⊆ V} |S|   s.t.   A(y∗(S)) = y∗

(the inverse problem of semi-supervised learning)
25 / 30
Example: A = local-global consistency [Zhou et al. 2004]
F = (1 − α)(I − α D^−1/2 W D^−1/2)^−1 y∗(S)
ŷ = sgn(F)
[Figure: example graph with predicted labels in {−1, 1}]
26 / 30
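A sketch of this diffusion as code (dense linear algebra; W is a symmetric adjacency matrix, seed labels are ±1 on S and 0 elsewhere; α = 0.9 is an arbitrary choice):

```python
import numpy as np

def local_global_consistency(W, seed, alpha=0.9):
    """F = (1 - alpha)(I - alpha D^{-1/2} W D^{-1/2})^{-1} seed;
    return yhat = sgn(F)."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    F = (1 - alpha) * np.linalg.solve(np.eye(len(W)) - alpha * S, seed)
    return np.sign(F)
```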
|A^−1(y∗)| is large
Turns out 4649 out of 2^16 = 65536 training sets S lead to the
target label completion
27 / 30
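A count like this comes from exhaustive enumeration, which is feasible at |V| = 16. A brute-force sketch (reusing `local_global_consistency` above) that also records the smallest seed sets:

```python
import numpy as np
from itertools import combinations

def seed_set_search(W, y_star, alpha=0.9):
    """Count all S with A(y*(S)) = y*, and collect the smallest ones."""
    n = len(y_star)
    consistent, smallest = 0, []
    for size in range(n + 1):              # sizes in increasing order
        for S in combinations(range(n), size):
            seed = np.zeros(n)
            seed[list(S)] = y_star[list(S)]
            if np.array_equal(local_global_consistency(W, seed, alpha),
                              y_star):
                consistent += 1
                if not smallest or size == len(smallest[0]):
                    smallest.append(S)     # an optimal seed set
    return consistent, smallest
```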
Optimal seed set 1: |S| = 3
28 / 30
Optimal seed set 2: |S| = 3
29 / 30
Questions
- How does |S| relate to (spectral) properties of G?
- How to solve for S?
30 / 30