Tutorial: Testing the correctness of an algorithm
Online non-stationary supervised learning
Check that online learning algorithms are able to learn to predict with an error below some threshold.
Code snippet: testing the IDBD algorithm
// Evaluate the IDBD learner on the non-stationary noisy input sum problem
NoisyInputSum problem = new NoisyInputSum();
double error = problem.evaluateLearner(new IDBD(NoisyInputSum.NbInputs, 0.001));
// The test passes if the prediction error is close to the expected value
Assert.assertEquals(2.0, error, 0.1);
See the package rlpark.plugin.rltoys.junit.algorithms.predictions.supervised for more tests.
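The same harness can evaluate any other online learner against the same error threshold. As a minimal sketch, evaluating RLPark's Autostep learner on the same problem, assuming a constructor that takes only the number of inputs (check the actual signature in the source):
// Sketch: evaluating another step-size adaptation learner on the same problem.
// The Autostep constructor arguments shown here are an assumption.
NoisyInputSum problem = new NoisyInputSum();
double error = problem.evaluateLearner(new Autostep(NoisyInputSum.NbInputs));
Assert.assertEquals(2.0, error, 0.1);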
Online temporal difference learning
Check that online TD algorithms are able to converge on-policy to the TD fixed point on a random walk problem.
Code snippet: testing the TD(λ) algorithm
// Random walk problem with a fixed seed for reproducibility
RandomWalk randomWalkProblem = new RandomWalk(new Random(0));
// The factory lets the test harness instantiate the learner once the
// number of features is known
FiniteStateGraphOnPolicy.testTD(randomWalkProblem, new OnPolicyTDFactory() {
  @Override
  public OnPolicyTD create(int nbFeatures) {
    return new TDLambda(0.1, 0.9, 0.01, nbFeatures);
  }
});
See rlpark.plugin.rltoys.junit.algorithms.predictions.td.TDTest for more tests.
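The same factory interface can wrap other on-policy predictors. As a sketch, testing plain TD, assuming the constructor argument order used in the actor-critic snippet below (discount, step size, number of features):
FiniteStateGraphOnPolicy.testTD(new RandomWalk(new Random(0)), new OnPolicyTDFactory() {
  @Override
  public OnPolicyTD create(int nbFeatures) {
    // TD prediction with illustrative values: discount 0.9, step size 0.01
    return new TD(0.9, 0.01, nbFeatures);
  }
});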
Check that online TD algorithms are able to converge off-policy to the TD fixed point on a random walk problem.
Code snippet: testing off-policy TD and Gradient-TD algorithms
// Illustrative parameter values so the snippet is self-contained; these are
// an assumption, see GTDLambdaTest for the values actually used
final double alpha_v = 0.01, alpha_w = 0.001;
double lambda = 0.1, gamma = 0.9;
double targetLeftProbability = 0.2, behaviourLeftProbability = 0.5;
OffPolicyTDFactory tdFactory = new OffPolicyTDFactory() {
  @Override
  public OffPolicyTD newTD(double lambda, double gamma, int vectorSize) {
    return new GTDLambda(lambda, gamma, alpha_v, alpha_w, vectorSize, new AMaxTraces());
  }
};
TestingResult<OffPolicyTD> result = RandomWalkOffPolicy.testOffPolicyGTD(lambda, gamma, targetLeftProbability,
    behaviourLeftProbability, tdFactory);
Assert.assertTrue(result.message, result.passed);
See rlpark.plugin.rltoys.junit.algorithms.predictions.td.GTDLambdaTest for more tests.
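The last argument of the GTDLambda constructor is a trace prototype selecting the eligibility-trace implementation; the snippets in this tutorial use both AMaxTraces (above) and ATraces (in the Off-PAC example below). A sketch of the same factory with plain accumulating traces, reusing the step sizes declared above:
OffPolicyTDFactory atracesFactory = new OffPolicyTDFactory() {
  @Override
  public OffPolicyTD newTD(double lambda, double gamma, int vectorSize) {
    // Same GTD(lambda) learner, with an ATraces prototype instead of AMaxTraces
    return new GTDLambda(lambda, gamma, alpha_v, alpha_w, vectorSize, new ATraces());
  }
};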
Control on a reinforcement learning problem
Check that online control algorithms are able to learn, on-policy, a policy with some minimal performance.
Code snippet: testing an actor-critic algorithm on the pendulum problem
// Evaluate an average-reward actor-critic on the pendulum problem; the test
// passes if the performance returned by evaluate() exceeds the threshold
Assert.assertTrue(PendulumOnPolicyLearning.evaluate(new ActorCriticFactory() {
  @Override
  public ControlLearner create(int vectorSize, double vectorNorm, PolicyDistribution policyDistribution) {
    // Step sizes are scaled by the norm of the feature vector
    TD critic = new TD(1.0, 0.5 / vectorNorm, vectorSize);
    Actor actor = new Actor(policyDistribution, 0.05 / vectorNorm, vectorSize);
    return new AverageRewardActorCritic(0.01, critic, actor);
  }
}) > .75);
See rlpark.plugin.rltoys.junit.algorithms.control.actorcritic.ActorCriticOnPolicyOnPendulumTest for more tests.
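In the RLPark test suite, such checks are ordinary JUnit test methods. A minimal sketch of wrapping the evaluation above in a test (the method name is illustrative):
@Test
public void actorCriticLearnsOnPendulum() {
  double performance = PendulumOnPolicyLearning.evaluate(new ActorCriticFactory() {
    @Override
    public ControlLearner create(int vectorSize, double vectorNorm, PolicyDistribution policyDistribution) {
      TD critic = new TD(1.0, 0.5 / vectorNorm, vectorSize);
      Actor actor = new Actor(policyDistribution, 0.05 / vectorNorm, vectorSize);
      return new AverageRewardActorCritic(0.01, critic, actor);
    }
  });
  // The test fails if the learned policy does not reach the performance threshold
  Assert.assertTrue(performance > .75);
}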
Check that online control algorithms are able to learn, off-policy, a policy with some minimal performance. A fixed exploration policy is run on a problem to generate data, which is used to improve a target policy. The performance of the target policy is then evaluated on a copy of the problem.
Code snippet: testing the Off-PAC algorithm on the mountain car problem
final MountainCarEvaluationAgentFactory factory = new MountainCarEvaluationAgentFactory() {
  @Override
  public OffPolicyAgentEvaluable createOffPolicyAgent(Random random, MountainCar problem, Policy behaviour, double gamma) {
    // Feature projector used by the critic
    Projector criticProjector = MountainCarOffPolicyLearning.createProjector(random, problem);
    OffPolicyTD critic = createCritic(criticProjector, gamma);
    // The target policy is a Boltzmann distribution over state-action features
    StateToStateAction toStateAction = MountainCarOffPolicyLearning.createToStateAction(random, problem);
    PolicyDistribution target = new BoltzmannDistribution(random, problem.actions(), toStateAction);
    double alpha_u = 1.0 / criticProjector.vectorNorm();
    ActorOffPolicy actor = new ActorLambdaOffPolicy(0, gamma, target, alpha_u, toStateAction.vectorSize(), new ATraces());
    return new OffPolicyAgentDirect(behaviour, new OffPAC(behaviour, critic, actor));
  }

  private OffPolicyTD createCritic(Projector criticProjector, double gamma) {
    // GTD(lambda) critic with step sizes scaled by the feature vector norm
    double alpha_v = .05 / criticProjector.vectorNorm();
    double alpha_w = .0001 / criticProjector.vectorNorm();
    GTDLambda gtd = new GTDLambda(0, gamma, alpha_v, alpha_w, criticProjector.vectorSize(), new ATraces());
    return new CriticAdapterFA(criticProjector, gtd);
  }
};
// The test passes if the measure returned by evaluate() stays below 115
Assert.assertTrue(MountainCarOffPolicyLearning.evaluate(factory) < 115);
See rlpark.plugin.rltoys.junit.experiments.offpolicy.OffPolicyTests for more tests.
All tests in RLPark
See rlpark.alltests.RLParkAllTests to run all the tests included in RLPark.
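To run the complete suite programmatically, a minimal sketch using the standard JUnit 4 runner:
import org.junit.runner.JUnitCore;
import org.junit.runner.Result;

public class RunRLParkTests {
  public static void main(String[] args) {
    // Runs every test registered in the RLPark test suite
    Result result = JUnitCore.runClasses(rlpark.alltests.RLParkAllTests.class);
    System.out.println("Tests run: " + result.getRunCount() + ", failures: " + result.getFailureCount());
  }
}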