Code Snippet: Off-Policy Actor-Critic (Off-PAC) in a 2D Puddle World

Description: Off-policy Actor-Critic (Off-PAC) learns off-policy from a behavior policy that selects actions from a uniform distribution. The state representation uses tile coding with a MurmurHash2 (murmur2) hashing function, and the target policy is a Gibbs distribution over the discrete actions.
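To make the policy side of this setup concrete, here is a minimal, self-contained sketch. It is not RLPark's API: the class `GibbsOffPolicySketch` and its methods are hypothetical names for illustration. It assumes binary state-action features given as arrays of active indices (the kind of output a hashed tile coder produces) and shows three pieces: the Gibbs (softmax) target policy pi(a|s), the uniform behavior policy b(a|s) = 1/|A|, and the importance-sampling ratio rho = pi(a|s) / b(a|s) that Off-PAC uses to correct for acting off-policy. It also computes grad log pi(a|s), the direction along which an actor's eligibility trace would be accumulated.

```java
import java.util.Random;

/**
 * Hypothetical sketch (not RLPark's classes): Gibbs target policy, uniform
 * behavior policy, and the importance-sampling ratio used off-policy.
 * Assumption: features are binary, given per action as active indices.
 */
public class GibbsOffPolicySketch {
    private final double[] u; // actor (policy) parameters, initialized to zero

    public GibbsOffPolicySketch(int numFeatures) {
        u = new double[numFeatures];
    }

    /** Action preference u^T phi(s, a) for binary features. */
    double preference(int[] activeIndices) {
        double sum = 0.0;
        for (int i : activeIndices) sum += u[i];
        return sum;
    }

    /** Gibbs distribution: pi(a|s) proportional to exp(u^T phi(s, a)). */
    public double[] targetProbabilities(int[][] phiPerAction) {
        int n = phiPerAction.length;
        double[] p = new double[n];
        double max = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < n; a++) {
            p[a] = preference(phiPerAction[a]);
            max = Math.max(max, p[a]);
        }
        double z = 0.0;
        for (int a = 0; a < n; a++) {
            p[a] = Math.exp(p[a] - max); // subtract the max for numerical stability
            z += p[a];
        }
        for (int a = 0; a < n; a++) p[a] /= z;
        return p;
    }

    /** Uniform behavior policy b(a|s) = 1/|A|: draw an action uniformly. */
    public static int sampleBehavior(Random rng, int numActions) {
        return rng.nextInt(numActions);
    }

    /** rho = pi(a|s) / b(a|s); with b uniform this is pi(a|s) * |A|. */
    public double importanceRatio(int[][] phiPerAction, int action) {
        double[] pi = targetProbabilities(phiPerAction);
        return pi[action] * phiPerAction.length;
    }

    /** grad_u log pi(a|s) = phi(s, a) - sum_b pi(b|s) phi(s, b), as a dense vector. */
    public double[] gradLogPi(int[][] phiPerAction, int action) {
        double[] pi = targetProbabilities(phiPerAction);
        double[] g = new double[u.length];
        for (int i : phiPerAction[action]) g[i] += 1.0;
        for (int b = 0; b < phiPerAction.length; b++) {
            for (int i : phiPerAction[b]) g[i] -= pi[b];
        }
        return g;
    }

    public static void main(String[] args) {
        // Hypothetical toy setup: 4 actions, each with 2 active features out of 8.
        int[][] phi = { {0, 4}, {1, 5}, {2, 6}, {3, 7} };
        GibbsOffPolicySketch policy = new GibbsOffPolicySketch(8);
        Random rng = new Random(0);
        int a = sampleBehavior(rng, phi.length);
        // With u = 0 the Gibbs policy is also uniform, so rho = 1.0 here.
        System.out.println("action=" + a + " rho=" + policy.importanceRatio(phi, a));
    }
}
```

As learning moves `u` away from zero, the target policy concentrates on some actions, so rho grows above 1 for actions the target policy prefers and shrinks below 1 for the others; this reweighting is what lets the agent learn about the Gibbs policy while acting uniformly.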

Source code:

Doxygen: OffPACPuddleWorld.java
Github: OffPACPuddleWorld.java

Reference:

Off-Policy Actor-Critic. T. Degris, M. White, R. S. Sutton (2012). In Proceedings of the 29th International Conference on Machine Learning.

Running this demo:

From the command line:
1. Download rlpark.jar
2. Run the following command line:
  java -cp rlpark.jar rlpark.example.demos.learning.OffPACPuddleWorld
In the Zephyr standalone application:
1. Download the Zephyr standalone application
2. Install the RLPark plug-ins in Zephyr
3. Go to: Demos->Off-PAC in a Puddle World
In Eclipse, as a Java Application:
1. Create a new Java Project or use an existing project
2. Include rlpark.jar in the project classpath
3. Run a Java Application target using rlpark.example.demos.learning.OffPACPuddleWorld as the main class
In Eclipse, as an Eclipse Application:
1. Install Zephyr plug-ins and RLPark plug-ins in Eclipse
  or
  download RLPark source code and import RLPark projects (including the demo project rlpark.example.demos) into the workspace
2. Set up an Eclipse Application target following the tutorial Using Zephyr plug-ins
3. In the Eclipse Application target configuration:
  1. In the menu, go to: Run->Run Configurations...
  2. Select the Eclipse Application target
  3. In the Plug-ins tab, select the plug-ins rlpark.example.demos and rlpark.plugin.rltoysview to enable the RLPark views
4. Start Zephyr by running the Eclipse Application target
5. In the Zephyr menu, go to: Demos->Off-PAC in a Puddle World
  or in the Arguments tab, add rlpark.example.demos.learning.OffPACPuddleWorld to the Program Arguments text field

Dependencies

zephyr.plugin.core.api, rlpark.plugin.rltoys

Documentation