Tutorial: Testing the correctness of an algorithm
Online non-stationary supervised learning
Check that online learning algorithms are able to learn to predict with an error below some threshold.

Code snippet: testing the IDBD algorithm
NoisyInputSum problem = new NoisyInputSum();
double error = problem.evaluateLearner(new IDBD(NoisyInputSum.NbInputs, 0.001));
Assert.assertEquals(2.0, error, 0.1);

See the package rlpark.plugin.rltoys.junit.algorithms.predictions.supervised for more tests.
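As background on what this test exercises: IDBD is Sutton's Increment-Delta-Bar-Delta rule, which adapts one step-size per input by meta-gradient descent, so the learner can track which inputs matter in a non-stationary problem. The following is a minimal, self-contained sketch of that rule; the class name and constructor are illustrative and do not match RLPark's IDBD API.

// Illustrative sketch of Sutton's IDBD rule for a linear predictor.
// Not RLPark's IDBD class; names and constructor are hypothetical.
public class IdbdSketch {
  private final double[] w, h, beta;
  private final double theta; // meta step-size

  public IdbdSketch(int nbInputs, double theta, double initialStepSize) {
    w = new double[nbInputs];
    h = new double[nbInputs];
    beta = new double[nbInputs];
    java.util.Arrays.fill(beta, Math.log(initialStepSize));
    this.theta = theta;
  }

  public double predict(double[] x) {
    double prediction = 0;
    for (int i = 0; i < x.length; i++)
      prediction += w[i] * x[i];
    return prediction;
  }

  public void learn(double[] x, double target) {
    double delta = target - predict(x); // prediction error
    for (int i = 0; i < x.length; i++) {
      beta[i] += theta * delta * x[i] * h[i]; // adapt per-input log step-size
      double alpha = Math.exp(beta[i]);
      w[i] += alpha * delta * x[i];           // LMS update with per-input alpha
      double decay = Math.max(0, 1 - alpha * x[i] * x[i]);
      h[i] = h[i] * decay + alpha * delta * x[i]; // decaying memory of updates
    }
  }
}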
Online temporal difference learning
Check that online TD algorithms are able to converge on-policy to the TD fixed point on a random walk problem.

Code snippet: testing the TD(λ) algorithm
RandomWalk randomWalkProblem = new RandomWalk(new Random(0));
FiniteStateGraphOnPolicy.testTD(randomWalkProblem, new OnPolicyTDFactory() {
  @Override
  public OnPolicyTD create(int nbFeatures) {
    return new TDLambda(0.1, 0.9, 0.01, nbFeatures);
  }
});

See rlpark.plugin.rltoys.junit.algorithms.predictions.td.TDTest for more tests.
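For reference, the update rule such a test exercises is linear TD(λ) with accumulating eligibility traces. The sketch below is a minimal standalone version; the class name, constructor and argument order are illustrative, not RLPark's TDLambda API.

// Illustrative sketch of linear TD(lambda) with accumulating traces.
public class TdLambdaSketch {
  private final double lambda, gamma, alpha;
  private final double[] w; // value-function weights
  private final double[] e; // eligibility traces

  public TdLambdaSketch(double lambda, double gamma, double alpha, int nbFeatures) {
    this.lambda = lambda;
    this.gamma = gamma;
    this.alpha = alpha;
    w = new double[nbFeatures];
    e = new double[nbFeatures];
  }

  private double value(double[] phi) {
    double v = 0;
    for (int i = 0; i < phi.length; i++)
      v += w[i] * phi[i];
    return v;
  }

  // One on-policy update for the transition (phi, reward, phiNext)
  public double update(double[] phi, double reward, double[] phiNext) {
    double delta = reward + gamma * value(phiNext) - value(phi); // TD error
    for (int i = 0; i < w.length; i++) {
      e[i] = gamma * lambda * e[i] + phi[i]; // decay, then accumulate
      w[i] += alpha * delta * e[i];
    }
    return delta;
  }
}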
Check that online TD algorithms are able to converge off-policy to the TD fixed point on a random walk problem.
Code snippet: testing off-policy TD and Gradient-TD algorithms
OffPolicyTDFactory tdFactory = new OffPolicyTDFactory() {
  @Override
  public OffPolicyTD newTD(double lambda, double gamma, int vectorSize) {
    return new GTDLambda(lambda, gamma, alpha_v, alpha_w, vectorSize, new AMaxTraces());
  }
};
TestingResult<OffPolicyTD> result = RandomWalkOffPolicy.testOffPolicyGTD(lambda, gamma, targetLeftProbability, behaviourLeftProbability, tdFactory);
Assert.assertTrue(result.message, result.passed);

See rlpark.plugin.rltoys.junit.algorithms.predictions.td.GTDLambdaTest for more tests.
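The gradient-TD update being tested corrects the TD update with a second set of weights so that learning remains stable off-policy. Below is a minimal sketch of GTD(λ), following the standard published form of the algorithm; all names are illustrative, and the real GTDLambda class also accepts a configurable trace implementation such as AMaxTraces.

// Illustrative sketch of GTD(lambda) with importance sampling ratio rho.
public class GtdLambdaSketch {
  private final double lambda, gamma, alphaV, alphaW;
  private final double[] theta; // value-function weights
  private final double[] w;     // secondary weights of the gradient correction
  private final double[] e;     // eligibility traces

  public GtdLambdaSketch(double lambda, double gamma, double alphaV, double alphaW, int n) {
    this.lambda = lambda;
    this.gamma = gamma;
    this.alphaV = alphaV;
    this.alphaW = alphaW;
    theta = new double[n];
    w = new double[n];
    e = new double[n];
  }

  private static double dot(double[] a, double[] b) {
    double result = 0;
    for (int i = 0; i < a.length; i++)
      result += a[i] * b[i];
    return result;
  }

  // rho is the importance sampling ratio pi(a|s) / b(a|s) for the transition
  public void update(double rho, double[] phi, double reward, double[] phiNext) {
    double delta = reward + gamma * dot(theta, phiNext) - dot(theta, phi);
    for (int i = 0; i < e.length; i++)
      e[i] = rho * (gamma * lambda * e[i] + phi[i]);
    double correction = gamma * (1 - lambda) * dot(e, w);
    double wDotPhi = dot(w, phi);
    for (int i = 0; i < theta.length; i++) {
      theta[i] += alphaV * (delta * e[i] - correction * phiNext[i]);
      w[i] += alphaW * (delta * e[i] - wDotPhi * phi[i]);
    }
  }
}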
Control on a reinforcement learning problem
Check that online control algorithms are able to learn on-policy a policy that reaches some minimal performance.

Code snippet: testing an actor-critic algorithm on the pendulum problem
Assert.assertTrue(PendulumOnPolicyLearning.evaluate(new ActorCriticFactory() {
  @Override
  public ControlLearner create(int vectorSize, double vectorNorm, PolicyDistribution policyDistribution) {
    TD critic = new TD(1.0, 0.5 / vectorNorm, vectorSize);
    Actor actor = new Actor(policyDistribution, 0.05 / vectorNorm, vectorSize);
    return new AverageRewardActorCritic(0.01, critic, actor);
  }
}) > .75);

See rlpark.plugin.rltoys.junit.algorithms.control.actorcritic.ActorCriticOnPolicyOnPendulumTest for more tests.
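At the core of the algorithm under test is the average-reward actor-critic update: a TD error computed against a running estimate of the average reward drives both the critic and the actor. The sketch below is a simplified version with a trace-free linear critic; the class name, constructor and the gradLogPi argument are illustrative, not RLPark's Actor, TD or AverageRewardActorCritic classes.

// Illustrative sketch of an average-reward actor-critic update.
public class AverageRewardActorCriticSketch {
  private final double etaBar, alphaV, alphaU;
  private double averageReward;
  private final double[] v; // critic weights (linear value function)
  private final double[] u; // actor parameters (policy weights)

  public AverageRewardActorCriticSketch(double etaBar, double alphaV, double alphaU, int nbFeatures, int nbPolicyParams) {
    this.etaBar = etaBar;
    this.alphaV = alphaV;
    this.alphaU = alphaU;
    v = new double[nbFeatures];
    u = new double[nbPolicyParams];
  }

  private static double dot(double[] a, double[] b) {
    double result = 0;
    for (int i = 0; i < a.length; i++)
      result += a[i] * b[i];
    return result;
  }

  // gradLogPi is the gradient of log pi(a|x) w.r.t. u for the action taken
  public void update(double[] x, double reward, double[] xNext, double[] gradLogPi) {
    double delta = reward - averageReward + dot(v, xNext) - dot(v, x);
    averageReward += etaBar * delta;         // track the average reward
    for (int i = 0; i < v.length; i++)
      v[i] += alphaV * delta * x[i];         // critic: TD(0)-style update
    for (int i = 0; i < u.length; i++)
      u[i] += alphaU * delta * gradLogPi[i]; // actor: policy-gradient step
  }
}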
Check that online control algorithms are able to learn off-policy a policy that reaches some minimal performance. A fixed exploration policy is run on the problem to generate data that is used to improve a target policy. Then, the performance of the target policy is evaluated on a copy of the problem.
Code snippet: testing the Off-PAC algorithm on the mountain car problem
final MountainCarEvaluationAgentFactory factory = new MountainCarEvaluationAgentFactory() {
  @Override
  public OffPolicyAgentEvaluable createOffPolicyAgent(Random random, MountainCar problem, Policy behaviour, double gamma) {
    Projector criticProjector = MountainCarOffPolicyLearning.createProjector(random, problem);
    OffPolicyTD critic = createCritic(criticProjector, gamma);
    StateToStateAction toStateAction = MountainCarOffPolicyLearning.createToStateAction(random, problem);
    PolicyDistribution target = new BoltzmannDistribution(random, problem.actions(), toStateAction);
    double alpha_u = 1.0 / criticProjector.vectorNorm();
    ActorOffPolicy actor = new ActorLambdaOffPolicy(0, gamma, target, alpha_u, toStateAction.vectorSize(), new ATraces());
    return new OffPolicyAgentDirect(behaviour, new OffPAC(behaviour, critic, actor));
  }

  private OffPolicyTD createCritic(Projector criticProjector, double gamma) {
    double alpha_v = .05 / criticProjector.vectorNorm();
    double alpha_w = .0001 / criticProjector.vectorNorm();
    GTDLambda gtd = new GTDLambda(0, gamma, alpha_v, alpha_w, criticProjector.vectorSize(), new ATraces());
    return new CriticAdapterFA(criticProjector, gtd);
  }
};
Assert.assertTrue(MountainCarOffPolicyLearning.evaluate(factory) < 115);

See rlpark.plugin.rltoys.junit.experiments.offpolicy.OffPolicyTests for more tests.
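To make the evaluation protocol above concrete, here is a sketch of its two phases with hypothetical interfaces that do not correspond to RLPark's classes: a fixed random behaviour policy generates learning transitions, then the learned target policy is run on a copy of the problem and its steps to the goal are counted (on mountain car, fewer steps means better performance, hence the < 115 threshold above).

// Illustrative sketch of the off-policy learning/evaluation protocol.
public class OffPolicyProtocolSketch {
  interface Problem {
    double[] initialize();      // start an episode, return first observation
    double[] step(int action);  // apply an action, return next observation
    double lastReward();
    boolean episodeEnded();
    Problem copy();             // independent copy for evaluation
  }

  interface OffPolicyControl {
    void learn(double[] o, int action, double reward, double[] oNext);
    int targetAction(double[] o); // action chosen by the current target policy
  }

  static double runProtocol(Problem problem, OffPolicyControl learner, int nbActions, int nbLearningSteps, int maxEvaluationSteps) {
    java.util.Random random = new java.util.Random(0);
    // Learning phase: a fixed exploration policy (uniform random here)
    // generates the transitions used to improve the target policy.
    double[] o = problem.initialize();
    for (int t = 0; t < nbLearningSteps; t++) {
      int a = random.nextInt(nbActions);
      double[] oNext = problem.step(a);
      learner.learn(o, a, problem.lastReward(), oNext);
      o = problem.episodeEnded() ? problem.initialize() : oNext;
    }
    // Evaluation phase: the target policy runs on a copy of the problem.
    Problem evaluationProblem = problem.copy();
    double[] oEval = evaluationProblem.initialize();
    int steps = 0;
    while (!evaluationProblem.episodeEnded() && steps < maxEvaluationSteps) {
      oEval = evaluationProblem.step(learner.targetAction(oEval));
      steps++;
    }
    return steps; // steps to the goal: lower is better
  }
}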
All tests in RLPark
See rlpark.alltests.RLParkAllTests.