Autoresearch

Automated experiment results and hyperparameter exploration

V2 Reward Shaping Sweep

19 experiments x 3 seeds x 3 folds x 150 epochs

Status: Complete (19/19)

Experiment Leaderboard

Ranked by mean bps -- champion to beat: +29.13 bps

#  Experiment              Mean bps  Std   Seed 1  Seed 2  Seed 3  vs Champion
1  entry_cost_1.0          +29.13    9.48  +18.27  +41.37  +27.76   0.00
2  entry_cost_0.5          +26.68    5.42  +19.02  +30.50  +30.53  -2.45
3  v2_baseline (BASELINE)  +23.63    2.95  +19.59  +24.74  +26.55  -5.50
4  entry_cost_0.1          +23.30    2.58  +19.74  +24.37  +25.79  -5.83
5  neutral_pen_0.10        +22.93    2.66  +19.17  +24.92  +24.71  -6.20
6  no_win_bonus            +22.51    3.69  +17.65  +23.31  +26.58  -6.62
7  neutral_pen_0.02        +22.01    2.80  +19.20  +20.99  +25.84  -7.12
8  neutral_pen_0.00        +21.65    2.93  +17.55  +23.22  +24.18  -7.48
9  big_win_bonus           +21.14    3.33  +16.49  +22.84  +24.09  -7.99
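
The derived columns can be rechecked from the per-seed values. The sketch below is only an illustration, not the dashboard's own code. It assumes Std is the population standard deviation (ddof = 0), an inference from the numbers: that choice reproduces the displayed values to within 0.01 bps, while the sample standard deviation does not. The names SEED_BPS, CHAMPION_BPS and summarize are hypothetical.

```python
from statistics import mean, pstdev

# Champion mean bps as quoted on the leaderboard.
CHAMPION_BPS = 29.13

# Per-seed results in bps, copied from the Seed 1-3 columns of the leaderboard.
SEED_BPS = {
    "entry_cost_1.0": (18.27, 41.37, 27.76),
    "entry_cost_0.5": (19.02, 30.50, 30.53),
    "v2_baseline": (19.59, 24.74, 26.55),
    "entry_cost_0.1": (19.74, 24.37, 25.79),
    "neutral_pen_0.10": (19.17, 24.92, 24.71),
    "no_win_bonus": (17.65, 23.31, 26.58),
    "neutral_pen_0.02": (19.20, 20.99, 25.84),
    "neutral_pen_0.00": (17.55, 23.22, 24.18),
    "big_win_bonus": (16.49, 22.84, 24.09),
}


def summarize(seed_bps, champion=CHAMPION_BPS):
    """Return leaderboard rows (mean, std, gap to champion), best mean first."""
    rows = []
    for name, seeds in seed_bps.items():
        m = mean(seeds)
        rows.append({
            "experiment": name,
            "mean": round(m, 2),
            # Assumption: population std (ddof = 0), inferred from the table.
            "std": round(pstdev(seeds), 2),
            "vs_champion": round(m - champion, 2),
        })
    rows.sort(key=lambda r: r["mean"], reverse=True)
    return rows


if __name__ == "__main__":
    for rank, row in enumerate(summarize(SEED_BPS), start=1):
        print(rank, row["experiment"], row["mean"], row["std"], row["vs_champion"])
```

Because the inputs are the two-decimal values shown on the page, a recomputed figure can differ from the displayed one in the last digit.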

Visual Comparison

Mean bps with champion baseline reference

Champion baseline: +29.13 bps

Seed Variance Analysis

Range of results across random seeds -- wider range = less stable

ec_1.0: s=9.5
ec_0.5: s=5.4
v2_baseline: s=3.0
ec_0.1: s=2.6
np_0.10: s=2.7
no_wb: s=3.7
np_0.02: s=2.8
np_0.00: s=2.9
big_wb: s=3.3

Legend: Mean; colored bar = min-max range across seeds

High-variance experiments may show an inflated mean bps due to lucky seeds. Check the standard deviation before choosing a production candidate.
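
One way to make that check concrete is sketched below. It reports each experiment's spread across seeds and what the mean would be without its best seed, which is a rough indicator of how much a single lucky seed contributes. The 5.0 bps cutoff (STD_LIMIT_BPS) is a hypothetical threshold chosen for illustration, not a value from the dashboard. The function name stability_report is also an assumption, and the standard deviation is taken as the population form (ddof = 0).

```python
from statistics import mean, pstdev

# Hypothetical cutoff for "high variance"; not a value from the dashboard.
STD_LIMIT_BPS = 5.0

# Per-seed results in bps, copied from the leaderboard.
SEED_BPS = {
    "entry_cost_1.0": (18.27, 41.37, 27.76),
    "entry_cost_0.5": (19.02, 30.50, 30.53),
    "v2_baseline": (19.59, 24.74, 26.55),
    "entry_cost_0.1": (19.74, 24.37, 25.79),
    "neutral_pen_0.10": (19.17, 24.92, 24.71),
    "no_win_bonus": (17.65, 23.31, 26.58),
    "neutral_pen_0.02": (19.20, 20.99, 25.84),
    "neutral_pen_0.00": (17.55, 23.22, 24.18),
    "big_win_bonus": (16.49, 22.84, 24.09),
}


def stability_report(seed_bps, std_limit=STD_LIMIT_BPS):
    """Summarize seed-to-seed spread for each experiment."""
    report = {}
    for name, seeds in seed_bps.items():
        best = max(seeds)
        rest = list(seeds)
        rest.remove(best)  # drop one copy of the best seed
        std = pstdev(seeds)  # assumption: population std (ddof = 0)
        report[name] = {
            "std": std,
            "range": best - min(seeds),       # min-max range across seeds
            "mean_without_best": mean(rest),  # mean if the best seed is ignored
            "high_variance": std > std_limit,
        }
    return report


if __name__ == "__main__":
    for name, stats in stability_report(SEED_BPS).items():
        flag = "CHECK" if stats["high_variance"] else "ok"
        print(f"{name}: std={stats['std']:.2f} range={stats['range']:.2f} "
              f"mean_without_best={stats['mean_without_best']:.2f} [{flag}]")
```

With only three seeds per experiment, any such screen is coarse. It is meant to prompt a closer look at a candidate, not to replace one.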