Autoresearch
Automated experiment results and hyperparameter exploration
V2 Reward Shaping Sweep
19 experiments x 3 seeds x 3 folds x 150 epochs
Progress: 19/19 experiments complete (none currently running)
Experiment Leaderboard
Ranked by mean bps -- champion to beat: +29.13 bps
| # | Experiment | Mean bps | Std | Seed 1 | Seed 2 | Seed 3 | vs Champion |
|---|---|---|---|---|---|---|---|
| 1 | entry_cost_1.0 | +29.13 | 9.48 | +18.27 | +41.37 | +27.76 | 0.00 |
| 2 | entry_cost_0.5 | +26.68 | 5.42 | +19.02 | +30.50 | +30.53 | -2.45 |
| 3 | v2_baseline (BASELINE) | +23.63 | 2.95 | +19.59 | +24.74 | +26.55 | -5.50 |
| 4 | entry_cost_0.1 | +23.30 | 2.58 | +19.74 | +24.37 | +25.79 | -5.83 |
| 5 | neutral_pen_0.10 | +22.93 | 2.66 | +19.17 | +24.92 | +24.71 | -6.20 |
| 6 | no_win_bonus | +22.51 | 3.69 | +17.65 | +23.31 | +26.58 | -6.62 |
| 7 | neutral_pen_0.02 | +22.01 | 2.80 | +19.20 | +20.99 | +25.84 | -7.12 |
| 8 | neutral_pen_0.00 | +21.65 | 2.93 | +17.55 | +23.22 | +24.18 | -7.48 |
| 9 | big_win_bonus | +21.14 | 3.33 | +16.49 | +22.84 | +24.09 | -7.99 |
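
The leaderboard columns can be reproduced from the per-seed values. The sketch below is a minimal illustration, not the dashboard's own code: `SEED_RESULTS` and `summarize` are hypothetical names. It assumes that Std is the population standard deviation (divide by n, not n - 1), which matches the reported figures, that "vs Champion" is the difference of the rounded means, and that the champion is simply the best mean among the rows passed in.

```python
from statistics import fmean, pstdev

# Per-seed bps copied from the leaderboard (first three rows only).
SEED_RESULTS = {
    "entry_cost_1.0": [18.27, 41.37, 27.76],
    "entry_cost_0.5": [19.02, 30.50, 30.53],
    "v2_baseline": [19.59, 24.74, 26.55],
}


def summarize(seed_results):
    """Return {name: (mean, std, vs_champion)}, each rounded to 2 decimals."""
    means = {name: round(fmean(vals), 2) for name, vals in seed_results.items()}
    # Assumption: the champion is the highest mean among the given rows.
    champion = max(means.values())
    return {
        name: (
            means[name],
            round(pstdev(vals), 2),  # population std (ddof = 0)
            round(means[name] - champion, 2),
        )
        for name, vals in seed_results.items()
    }


if __name__ == "__main__":
    for name, (mean, std, delta) in summarize(SEED_RESULTS).items():
        print(f"{name:16s} {mean:+.2f}  {std:.2f}  {delta:+.2f}")
```

With only three seeds, the sample standard deviation (`statistics.stdev`) would be noticeably larger than these figures, so it matters which estimator is used when comparing against other reports.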
Visual Comparison
Chart: mean bps per experiment, with a reference line at the champion's +29.13 bps.
Seed Variance Analysis
Range of results across random seeds -- wider range = less stable
| Chart label | Std (s) |
|---|---|
| ec_1.0 | 9.5 |
| ec_0.5 | 5.4 |
| v2_baseline | 3.0 |
| ec_0.1 | 2.6 |
| np_0.10 | 2.7 |
| no_wb | 3.7 |
| np_0.02 | 2.8 |
| np_0.00 | 2.9 |
| big_wb | 3.3 |

Legend: Mean; colored bar = min-max range across seeds
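High-variance experiments may show an inflated mean bps because of a lucky seed. For example, entry_cost_1.0 leads on mean bps but has the largest std (9.48), driven largely by a single +41.37 seed. Check the standard deviation before choosing a production candidate.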
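
One way to act on this warning is to re-rank the leaderboard by a variance-penalized score. The sketch below is a rough heuristic under stated assumptions, not part of the sweep itself: `LEADERBOARD`, `pessimistic_rank`, and the penalty weight `k` are hypothetical, and the score `mean - k * std` is just one simple choice.

```python
# (mean bps, std across seeds) copied from the leaderboard.
LEADERBOARD = {
    "entry_cost_1.0": (29.13, 9.48),
    "entry_cost_0.5": (26.68, 5.42),
    "v2_baseline": (23.63, 2.95),
    "entry_cost_0.1": (23.30, 2.58),
    "neutral_pen_0.10": (22.93, 2.66),
    "no_win_bonus": (22.51, 3.69),
    "neutral_pen_0.02": (22.01, 2.80),
    "neutral_pen_0.00": (21.65, 2.93),
    "big_win_bonus": (21.14, 3.33),
}


def pessimistic_rank(rows, k=1.0):
    """Sort experiments by mean - k * std, best first.

    k is a hypothetical penalty weight; k = 0 reproduces the mean-only ranking.
    """
    scored = [
        (name, round(mean - k * std, 2)) for name, (mean, std) in rows.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in pessimistic_rank(LEADERBOARD):
        print(f"{name:18s} {score:+.2f}")
```

With `k = 1.0`, entry_cost_0.5 moves to the top and entry_cost_1.0 drops to fifth. With only three seeds per experiment the std estimates are themselves noisy, so a re-ranking like this is better read as a prompt to run more seeds than as a final verdict.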