Flappy Bird Game using Q Learning

http://140.117.164.207/
Reinforcement learning
State Space
• Vertical distance from the lower pipe (Y)
• Horizontal distance from the next pair of pipes (X)
• Life: dead or alive
The X distance is bounded below by 0 and above by 300; the Y distance ranges from −200 to 200.
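
A minimal sketch of how these continuous distances might be mapped to discrete Q-table states (the function name and the 10-pixel bucket size are assumptions for illustration, not taken from the original implementation):

def discretize_state(x_dist, y_dist, bucket=10):
    # Clamp to the ranges above: X in [0, 300], Y in [-200, 200].
    x_dist = max(0, min(300, x_dist))
    y_dist = max(-200, min(200, y_dist))
    # Group distances into buckets so the Q table stays small
    # (the 10-pixel bucket size is an assumed value).
    return (int(x_dist) // bucket, int(y_dist) // bucket)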
Actions
• Click
• Do Nothing
Rewards
• +1 if Flappy Bird is still alive
• -1000 if Flappy Bird is dead
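
As a tiny code sketch of the action set and reward signal (the names are illustrative assumptions; the values follow the slides):

ACTIONS = ("click", "do_nothing")   # the two available actions

def reward(alive):
    # +1 for every tick the bird survives, -1000 when it dies.
    return 1 if alive else -1000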
Algorithm
The Learning Loop (1/2)
• The Q table is initialized with zeros.
• Step 1: Observe what state Flappy Bird is in
and perform the action that maximizes
expected reward.
Let the game engine perform its "tick". Flappy Bird then ends up in a new state, s'.
The Learning Loop (2/2)
• Step 2: Observe the new state, s', and the reward
associated with it: +1 if the bird is still alive, −1000 otherwise.
• Step 3: Update the Q table according to the Q-learning rule:
Q[s,a] ← Q[s,a] + α · (r + γ · max_a' Q[s',a'] − Q[s,a])
Learning rate α: 0.7; discount factor γ: 1
• Step 4: Set the current state to s' and start
over.
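
Putting the four steps together, a sketch of the learning loop might look like this in Python (the game-engine interface env.reset() / env.tick() and the dictionary-based Q table are assumptions made for illustration; α = 0.7, γ = 1 and the update rule follow the slides):

from collections import defaultdict

ALPHA, GAMMA = 0.7, 1.0              # learning rate and discount factor
ACTIONS = ("click", "do_nothing")
Q = defaultdict(float)               # Q[(state, action)], starts at 0 for every entry

def best_action(state):
    # Step 1: act greedily with respect to the current Q values.
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(s, a, r, s_next):
    # Step 3: Q[s,a] <- Q[s,a] + alpha * (r + gamma * max_a' Q[s',a'] - Q[s,a])
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def run_episode(env):
    s = env.reset()                  # assumed game-engine interface
    alive = True
    while alive:
        a = best_action(s)           # Step 1: choose the greedy action
        s_next, alive = env.tick(a)  # let the game engine perform its "tick"
        r = 1 if alive else -1000    # Step 2: observe the reward
        q_update(s, a, r, s_next)    # Step 3: apply the Q-learning rule
        s = s_next                   # Step 4: continue from the new state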
Example
Q[s,a] ← Q[s,a] + α · (r + γ · max_a' Q[s',a'] − Q[s,a])
α: 0.7, γ: 1
Rewards: alive: +1; dead: −1000
Actions: Click → (x−1, y+1); Do Nothing → (x−1, y−1)
Q table (state / action):
           Click    Do Nothing
(10, 1)    0        0

R table (state / action):
           Click    Do Nothing
(10, 1)    0        0

(All remaining entries are likewise 0.)
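
As a hypothetical walk-through of the first update: suppose the bird is in state (10, 1), clicks (which moves it to (9, 2) by the transition rule above), and survives the tick. With every table entry still zero:

Q[(10,1), Click] ← 0 + 0.7 · (1 + 1 · max(Q[(9,2), Click], Q[(9,2), Do Nothing]) − 0)
                 = 0 + 0.7 · (1 + 0 − 0)
                 = 0.7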