Lab 2: Temporal Difference Learning

Deep Learning and Practice 2021 Spring; NYCU CGI Lab


Lab Objective:

In this lab, you will learn the temporal difference (TD) learning algorithm by solving the 2048 game using an n-tuple network.

Turn in:

  1. Experiment report (.pdf)

  2. Source code [NOT including model weights]

Notice: zip all files into an archive named “DLP_LAB2_StudentId_Name.zip”,

e.g.: DLP_LAB2_0856738_鄭紹雄.zip

Lab Description:

  • Understand the concept of (before-)state and after-state.

  • Learn to construct and design an n-tuple network.

  • Understand the TD algorithm.

  • Understand Q-learning network training.

Requirements:

  • Implement the TD(0) algorithm (a minimal sketch follows this list):

    • Construct an n-tuple network

    • Select actions according to the n-tuple network

    • Calculate the TD-target and TD-error

    • Update V(state), not V(after-state)

    • Understand temporal difference learning mechanisms
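As a minimal illustration of these steps, the sketch below keeps a lookup table for a single 6-tuple and applies a TD(0) update to V(state). The board encoding (a flat list of 16 log2 tile exponents), the pattern choice, and all function names here are illustrative assumptions, not the course framework's API:

    import numpy as np

    # Assumption: a board is a list of 16 tile exponents (0 = empty, 1 = tile 2,
    # 2 = tile 4, ...), indexed row-major on the 4x4 grid.
    PATTERN = (0, 1, 2, 3, 4, 5)                        # one illustrative 6-tuple
    weights = np.zeros(16 ** len(PATTERN), dtype=np.float32)  # one weight per feature

    def feature_index(board, pattern=PATTERN):
        # Pack the six cell exponents at the pattern's cells into one table index.
        index = 0
        for cell in pattern:
            index = index * 16 + board[cell]
        return index

    def value(board):
        # V(s) from this single tuple; a full network sums over all patterns
        # and their isomorphisms.
        return weights[feature_index(board)]

    def td0_update(state, reward, next_state, alpha=0.1):
        # TD-target = r + V(s''); the TD-error drives the update of V(state).
        td_error = reward + value(next_state) - value(state)
        weights[feature_index(state)] += alpha * td_error

Action selection then evaluates every legal move and takes the argmax, exactly as in the pseudocode of the Algorithm section below.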


Game Environment – 2048:

  • Introduction: 2048 is a single-player sliding-block puzzle game. The objective is to slide numbered tiles on a grid, combining equal tiles to create a tile with the number 2048.

  • Actions: Up, Down, Left, Right

  • Reward: the value of the new tile created when two equal tiles are combined.

  • A sample two-step state transition: an action takes the state s to the after-state s′, and a random tile spawn takes s′ to the next state s′′.
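To make the reward definition concrete, here is a minimal Python sketch of sliding and merging one row to the left; the function name and representation (raw tile values in a list) are illustrative, not the course framework's:

    def slide_row_left(row):
        # Slide one row left, merging each pair of equal tiles once;
        # returns (new_row, reward).
        tiles = [t for t in row if t != 0]      # drop empty cells
        merged, reward, i = [], 0, 0
        while i < len(tiles):
            if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
                merged.append(tiles[i] * 2)     # combine two equal tiles
                reward += tiles[i] * 2          # reward = value of the new tile
                i += 2
            else:
                merged.append(tiles[i])
                i += 1
        return merged + [0] * (len(row) - len(merged)), reward

    print(slide_row_left([2, 2, 4, 0]))   # -> ([4, 4, 0, 0], 4)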

Implementation Details:

Network Architecture

  • n-tuple patterns: 4 × 6-tuples with all possible isomorphisms (a generation sketch follows)
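The eight isomorphisms of a pattern are its four rotations and their mirror images. A minimal sketch of generating them, assuming a row-major 4×4 cell indexing (not necessarily the framework's own representation):

    def rotate_cell(cell):
        # Map a row-major 4x4 cell index to its position after a 90-degree
        # clockwise turn: (r, c) -> (c, 3 - r).
        r, c = divmod(cell, 4)
        return c * 4 + (3 - r)

    def mirror_cell(cell):
        # Mirror a cell index across the vertical axis: (r, c) -> (r, 3 - c).
        r, c = divmod(cell, 4)
        return r * 4 + (3 - c)

    def isomorphisms(pattern):
        # 4 rotations x {identity, mirror} = 8 symmetric variants of a pattern.
        variants = []
        for base in (list(pattern), [mirror_cell(c) for c in pattern]):
            p = base
            for _ in range(4):
                variants.append(p)
                p = [rotate_cell(c) for c in p]
        return variants

    print(len(isomorphisms([0, 1, 2, 3, 4, 5])))  # -> 8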

Training Arguments

  • Learning rate: 0.1

Learning rate for each feature of an n-tuple network with m features: 0.1 ÷ m

  • Train the network for 500k ~ 1M episodes
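For instance, under the architecture above each state is covered by m = 4 patterns × 8 isomorphisms = 32 features, so splitting the update evenly would give a per-feature learning rate of 0.1 ÷ 32 = 0.003125 (an illustrative reading of the formula, assuming all isomorphisms share one update).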


Algorithm:

Pseudocode of the game engine and training, modified to use the backward training method (records are collected during play, then learned from in reverse order).

function PLAY GAME
    score ← 0
    s ← INITIALIZE GAME STATE
    while IS NOT TERMINAL STATE(s) do
        a ← argmax_{a′ ∈ A(s)} EVALUATE(s, a′)
        (r, s′, s′′) ← MAKE MOVE(s, a)
        SAVE RECORD(s, a, r, s′, s′′)
        score ← score + r
        s ← s′′
    for (s, a, r, s′, s′′) from terminal downto initial do
        LEARN EVALUATION(s, a, r, s′, s′′)
    return score

function MAKE MOVE(s, a)
    (s′, r) ← COMPUTE AFTERSTATE(s, a)
    s′′ ← ADD RANDOM TILE(s′)
    return (r, s′, s′′)

TD-state

function EVALUATE(s, a)
    (s′, r) ← COMPUTE AFTERSTATE(s, a)
    S′′ ← ALL POSSIBLE NEXT STATES(s′)
    return r + Σ_{s′′ ∈ S′′} P(s, a, s′′) V(s′′)

function LEARN EVALUATION(s, a, r, s′, s′′)
    V(s) ← V(s) + α (r + V(s′′) − V(s))

TD-after-state

function EVALUATE(s, a)
    (s′, r) ← COMPUTE AFTERSTATE(s, a)
    return r + V(s′)

function LEARN EVALUATION(s, a, r, s′, s′′)
    a_next ← argmax_{a′ ∈ A(s′′)} EVALUATE(s′′, a′)
    (s′_next, r_next) ← COMPUTE AFTERSTATE(s′′, a_next)
    V(s′) ← V(s′) + α (r_next + V(s′_next) − V(s′))
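A rough Python rendering of the TD-after-state variant above, assuming the game-engine helpers compute_afterstate(state, action) -> (afterstate, reward) and legal_actions(state) from the pseudocode, and keying the value table V by hashable boards; all names here are illustrative:

    from collections import defaultdict

    V = defaultdict(float)   # after-state values; an n-tuple network in practice

    def evaluate(state, action):
        # EVALUATE(s, a) = r + V(s')
        afterstate, reward = compute_afterstate(state, action)
        return reward + V[afterstate]

    def learn_evaluation(state, action, reward, afterstate, next_state, alpha=0.1):
        # LEARN EVALUATION: bootstrap V(s') from the best move in s''.
        # (A real implementation must handle a terminal s'' with no legal actions.)
        best = max(legal_actions(next_state), key=lambda a: evaluate(next_state, a))
        next_afterstate, next_reward = compute_afterstate(next_state, best)
        V[afterstate] += alpha * (next_reward + V[next_afterstate] - V[afterstate])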


Rule of Thumb:

  • You can design your own n-tuple network, but do NOT try a CNN.

  • The 2048-tile should appear within 10,000 training episodes.

Scoring Criteria:

Show your work, otherwise no credit will be granted.

  • Report (60%)

    • A plot showing the episode scores of at least 100,000 training episodes (10%)

    • Describe the implementation and the usage of the n-tuple network. (10%)

    • Explain the mechanism of TD(0). (5%)

    • Explain the TD-backup diagram of V(after-state). (5%)

    • Explain the action selection of V(after-state) in a diagram. (5%)

    • Explain the TD-backup diagram of V(state). (5%)

    • Explain the action selection of V(state) in a diagram. (5%)

    • Describe your implementation in detail. (10%)

    • Other discussions or improvements. (5%)

  • Demo Performance (40%)

    • The 2048-tile win rate in 1000 games, winrate2048 (see the sketch after this list). (20%)

    • Questions. (20%)
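A tiny sketch of how the win rate could be measured, where play_game() is a hypothetical helper that plays one game with the trained network and returns the largest tile reached:

    # Assumption: play_game() -> int, the maximum tile value of one finished game.
    wins = sum(play_game() >= 2048 for _ in range(1000))
    print("winrate2048 =", wins / 1000)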

References:

  1. Szubert, Marcin, and Wojciech Jaśkowski. “Temporal difference learning of N-tuple networks for the game 2048.” 2014 IEEE Conference on Computational Intelligence and Games. IEEE, 2014.

  2. Kun-Hao Yeh, I-Chen Wu, Chu-Hsuan Hsueh, Chia-Chuan Chang, Chao-Chin Liang, and Han Chiang. “Multi-Stage Temporal Difference Learning for 2048-like Games.” IEEE Transactions on Computational Intelligence and AI in Games (SCI), doi: 10.1109/TCIAIG.2016.2593710, 2016.

  3. Oka, Kazuto, and Kiminori Matsuzaki. “Systematic selection of n-tuple networks for 2048.” International Conference on Computers and Games. Springer International Publishing, 2016.

  4. moporgic. “Basic implementation of 2048 in Python.” Retrieved from GitHub: https://github.com/moporgic/2048-Demo-Python.

  5. moporgic. “Temporal Difference Learning for Game 2048 (Demo).” Retrieved from GitHub: https://github.com/moporgic/TDL2048-Demo.

  6. lukewayne123. “2048-Framework.” Retrieved from GitHub: https://github.com/lukewayne123/2048-Framework.
