Bayesian Optimization — Interactive Playground

Tweak the kernel, acquisition function, and noise to see how BO's assumptions shape its decisions. Click on the plot to manually add a sample. Toggle "Show GP samples" to visualize the prior/posterior as actual functions.

Legend: True (hidden) · GP mean · Uncertainty (±2σ) · GP samples · Observed · Acquisition

What's happening: BO doesn't know the true curve. It fits a Gaussian process (GP) to the points it has sampled so far. The kernel encodes the GP's prior: what kind of functions it considers plausible before seeing any data.

Try this: reset, then turn on "Show GP samples" before stepping. Those squiggles are random functions drawn from the prior. Now switch kernels and compare: Matérn-3/2 produces rougher samples, RBF produces smoother ones, and the periodic kernel produces samples that repeat.
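To make "random functions drawn from the prior" concrete, here is a minimal NumPy sketch of the three kernels and a prior sampler. It is an illustration, not the playground's own code: the function names, the periodic kernel's `period` and lengthscale, the grid, and the jitter value are all assumptions, while the defaults `ell=0.10` and `var=0.25` mirror the slider values shown on this page.

```python
import numpy as np

def _dist(x1, x2):
    # Pairwise absolute distances between two 1-D arrays of inputs.
    return np.abs(x1[:, None] - x2[None, :])

def rbf(x1, x2, ell=0.10, var=0.25):
    # Squared-exponential kernel: infinitely smooth samples.
    return var * np.exp(-0.5 * (_dist(x1, x2) / ell) ** 2)

def matern32(x1, x2, ell=0.10, var=0.25):
    # Matern-3/2 kernel: samples are only once differentiable, so rougher.
    a = np.sqrt(3.0) * _dist(x1, x2) / ell
    return var * (1.0 + a) * np.exp(-a)

def periodic(x1, x2, ell=1.0, var=0.25, period=0.5):
    # Periodic kernel: samples repeat every `period` (assumed value).
    s = np.sin(np.pi * _dist(x1, x2) / period)
    return var * np.exp(-2.0 * s ** 2 / ell ** 2)

def sample_prior(kernel, x, n_samples=3, jitter=1e-6, seed=0):
    # Draw functions f ~ N(0, K) on the grid x via a Cholesky factor of K.
    rng = np.random.default_rng(seed)
    K = kernel(x, x) + jitter * np.eye(len(x))  # jitter keeps K positive definite
    L = np.linalg.cholesky(K)
    return (L @ rng.standard_normal((len(x), n_samples))).T

x = np.linspace(0.0, 1.0, 100)
samples = {k.__name__: sample_prior(k, x) for k in (rbf, matern32, periodic)}
```

Each entry of `samples` holds three curves on the grid. Plotting them side by side should reproduce the qualitative difference the toggle shows: smooth RBF curves, rougher Matérn-3/2 curves, and repeating periodic curves.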

Curious how the explore/exploit knob actually works? Click "Explore vs Exploit" at the top to see three runs racing on the same problem with different κ values.

Stats

BO evaluations

Best found / true optimum

—

GP Kernel (prior)

Kernel type

Default for materials science. Smooth but not infinitely so.

Lengthscale ℓ: 0.10

Smaller = wigglier. Larger = smoother.

Signal variance σ²: 0.25

Observation noise: 0.001

Acquisition (decision)

Function

mean + κ·std. Simple. The κ parameter directly controls explore/exploit.

UCB κ: 2.0

κ = 0: pure exploit. κ = 5: aggressive explore.

True Function

Hidden objective

Explore vs Exploit — Three Runs, One Function

Three BO instances optimizing the same hidden function, starting from the same initial sample, using the same kernel — differing only in their UCB κ. Watch how the explore/exploit balance shapes the entire optimization trajectory.

UCB acquisition: acquisition(x) = μ(x) + κ · σ(x)

The mean μ is the exploitation signal and the standard deviation σ is the exploration signal. κ is the knob that weights one against the other.
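The UCB rule can be sketched in a few lines once the GP posterior is available. The code below is a minimal illustration under stated assumptions: an RBF kernel, a zero prior mean, and the observation noise treated as a variance. The function names and the three example observations are hypothetical, and the defaults mirror the slider values on this page.

```python
import numpy as np

def rbf_kernel(a, b, ell=0.10, var=0.25):
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-3, ell=0.10, var=0.25):
    # Standard zero-mean GP regression equations, solved via Cholesky.
    K = rbf_kernel(x_obs, x_obs, ell, var) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query, ell, var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = K_s.T @ alpha                      # posterior mean mu(x)
    v = np.linalg.solve(L, K_s)
    var_post = np.clip(var - np.sum(v ** 2, axis=0), 0.0, None)
    return mu, np.sqrt(var_post)            # posterior std sigma(x)

def ucb(mu, sigma, kappa=2.0):
    # acquisition(x) = mu(x) + kappa * sigma(x)
    return mu + kappa * sigma

# Hypothetical observations, for illustration only.
x_obs = np.array([0.2, 0.5, 0.9])
y_obs = np.array([0.1, 0.4, -0.2])
grid = np.linspace(0.0, 1.0, 201)
mu, sigma = gp_posterior(x_obs, y_obs, grid)

for kappa in (0.0, 2.0, 5.0):
    x_next = grid[np.argmax(ucb(mu, sigma, kappa))]
    print(f"kappa={kappa}: next sample at x={x_next:.3f}")
```

With κ = 0 the acquisition equals the posterior mean, so the next sample lands at the mean's maximum. Larger κ shifts the choice toward regions where σ is large, that is, away from existing observations.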

κ = 0 — Pure exploitation (greedy)

Evals: 0 Best: — % optimum: —

κ = 2 — Balanced (default)

Evals: 0 Best: — % optimum: —

κ = 5 — Heavy exploration (explorer)

Evals: 0 Best: — % optimum: —

Regret over time — who's catching up?

Distance from the true optimum after each evaluation. Lower = better.
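One common way to compute such a curve is simple regret: the gap between the true optimum and the best value observed so far. The sketch below assumes that definition and a maximization problem. The page does not say exactly how its regret plot is computed, and the function name and example values are hypothetical.

```python
import numpy as np

def simple_regret(y_evals, f_opt):
    # Best observed value after each evaluation (running maximum).
    best_so_far = np.maximum.accumulate(np.asarray(y_evals, dtype=float))
    # Gap to the true optimum; never increases as evaluations accumulate.
    return f_opt - best_so_far

# Hypothetical evaluation history for a function whose true maximum is 1.0.
regret = simple_regret([0.2, 0.5, 0.4, 0.9], f_opt=1.0)
```

The third evaluation (0.4) is worse than the best so far (0.5), so the curve stays flat at that step instead of rising.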

κ = 0 — pure exploit

Stuck on the first peak

Greedy BO trusts the GP's mean prediction completely. It picks argmax(μ) every iteration. Once it finds any hill, it climbs it and refuses to leave.

Failure mode: if the first sample lands near a local peak, BO can converge to the wrong answer and never escape, no matter how many evaluations you give it.

κ = 2 — balanced

Goldilocks zone

Sample where μ + 2σ is highest. The σ term forces some exploration, but the search mostly stays focused on promising areas.

Why this is a common default: κ ≈ 2 often lets BO escape local maxima within roughly 5 to 10 iterations on simple low-dimensional problems like this one, while still converging quickly once it finds the global region. Many BO papers and tutorials use a value near 2.

κ = 5 — heavy explore

Maps the function first, optimizes later

The σ term dominates. BO chases uncertainty, sampling far from previous points before exploiting.

Tradeoff: very likely to find the global peak given enough evaluations, but it wastes early evaluations on regions the GP already predicts to be poor. Most useful when you have a large budget and missing the global peak is unacceptable.
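The three regimes above can be compared in a small, self-contained race. This is a sketch under assumptions, not the playground's implementation: the two-peak `objective`, the starting point `x0=0.2`, the RBF kernel, the grid search over the acquisition, and the iteration count are all hypothetical choices. What each run finds depends heavily on the hidden function and the initial sample.

```python
import numpy as np

def objective(x):
    # Hypothetical hidden function: local peak near 0.25, global peak near 0.75.
    local_peak = 0.6 * np.exp(-((x - 0.25) / 0.08) ** 2)
    global_peak = 1.0 * np.exp(-((x - 0.75) / 0.05) ** 2)
    return local_peak + global_peak

def rbf_kernel(a, b, ell=0.10, var=0.25):
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-3, var=0.25):
    # Zero-mean GP regression; `noise` is treated as a variance (assumption).
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    v = np.linalg.solve(L, K_s)
    sigma = np.sqrt(np.clip(var - np.sum(v ** 2, axis=0), 0.0, None))
    return K_s.T @ alpha, sigma

def run_bo(kappa, x0=0.2, n_iter=15):
    # Same initial sample and kernel for every run; only kappa differs.
    grid = np.linspace(0.0, 1.0, 201)
    xs, ys = [x0], [float(objective(x0))]
    for _ in range(n_iter):
        mu, sigma = gp_posterior(np.array(xs), np.array(ys), grid)
        x_next = float(grid[np.argmax(mu + kappa * sigma)])  # UCB on a grid
        xs.append(x_next)
        ys.append(float(objective(x_next)))
    return np.array(xs), np.array(ys)

grid = np.linspace(0.0, 1.0, 201)
f_opt = objective(grid).max()
results = {}
for kappa in (0.0, 2.0, 5.0):
    xs, ys = run_bo(kappa)
    regret = f_opt - np.maximum.accumulate(ys)  # simple regret per evaluation
    results[kappa] = (xs, regret)
    print(f"kappa={kappa}: final regret {regret[-1]:.3f}")
```

Inspecting `results[kappa][0]` shows where each run sampled. In this setup the greedy run keeps resampling near its starting point, while larger κ spreads samples across the domain before concentrating.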

Key takeaway: the "right" κ depends on your budget and your tolerance for missing the global optimum. In practice, many BO workflows use Expected Improvement (EI) instead of UCB, and libraries such as BoTorch and scikit-optimize provide it. EI scores each candidate by how much it is expected to improve on the best value found so far, so it trades off exploration and exploitation without a κ parameter, but the underlying tradeoff is the same. Polaron's Matter 2024 paper does not specify its acquisition function in detail, but for a 4-dimensional manufacturing-parameter problem with ~50 evaluations, EI or UCB with κ ≈ 2 would be a conventional choice.
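For comparison with UCB, here is a minimal sketch of the closed-form Expected Improvement for a maximization problem with a Gaussian posterior. It uses only the standard library. The function name and the optional offset `xi` are illustrative assumptions and are not taken from any particular library's API.

```python
import math

def expected_improvement(mu, sigma, best, xi=0.0):
    # EI = (mu - best - xi) * Phi(z) + sigma * phi(z), z = (mu - best - xi) / sigma
    improvement = mu - best - xi
    if sigma <= 0.0:
        # No uncertainty left: improvement is either certain or impossible.
        return max(improvement, 0.0)
    z = improvement / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return improvement * cdf + sigma * pdf

# A point predicted equal to the current best still has positive EI if it is uncertain.
ei_uncertain = expected_improvement(mu=0.5, sigma=0.2, best=0.5)
ei_certain = expected_improvement(mu=0.5, sigma=0.0, best=0.5)
```

The first term rewards a high predicted mean (exploitation) and the second rewards uncertainty (exploration), which is why EI needs no explicit κ.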