BFDA Utilities
Bayes Factor Design Analysis (BFDA) for sample-size planning: a simulation engine, power curves, and plotting utilities.
utils
simulate_nonpaired_scores(N=200, theta_A=0.75, theta_B=0.6, seed=0, rng=None)
Simulate independent binary outcomes for a non-paired A/B test.
Each group is sampled independently from a Bernoulli distribution with the specified success probability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| N | int | Number of observations per group. | 200 |
| theta_A | float | True success probability for model A. | 0.75 |
| theta_B | float | True success probability for model B. | 0.6 |
| seed | int | Random seed for reproducibility. | 0 |
| rng | Generator \| None | Optional pre-seeded RNG; if provided, seed is ignored. | None |
Returns:
| Type | Description |
|---|---|
| NonPairedSimResult | A NonPairedSimResult object. |
Source code in bayesprop/utils/utils.py
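A minimal NumPy sketch of the non-paired data-generating process described above, assuming the result simply carries the two outcome arrays. NonPairedSimSketch and simulate_nonpaired_sketch are hypothetical names; the real NonPairedSimResult fields are not documented on this page.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class NonPairedSimSketch:
    """Hypothetical stand-in for NonPairedSimResult (real fields not shown here)."""
    y_A: np.ndarray
    y_B: np.ndarray


def simulate_nonpaired_sketch(N=200, theta_A=0.75, theta_B=0.6, seed=0, rng=None):
    # An explicit Generator takes precedence; otherwise seed a fresh one.
    if rng is None:
        rng = np.random.default_rng(seed)
    # Each group is an independent sample of N Bernoulli(theta) draws.
    y_A = rng.binomial(1, theta_A, size=N)
    y_B = rng.binomial(1, theta_B, size=N)
    return NonPairedSimSketch(y_A=y_A, y_B=y_B)


sim = simulate_nonpaired_sketch(N=1000, seed=1)
print(sim.y_A.mean(), sim.y_B.mean())  # sample rates near 0.75 and 0.6
```

Passing the same pre-seeded Generator to several calls draws a fresh, non-overlapping stream each time, which is what a BFDA loop needs.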
simulate_paired_scores(N=200, mu=0.0, delta_A=0.5, delta_B=0.0, sigma_theta=0.0, seed=0, rng=None)
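Simulate paired binary outcomes from a logistic data-generating process (DGP).
Matches the paired model: y_A ~ Bern(σ(μ + δ_A)), y_B ~ Bern(σ(μ + δ_B)), where δ_B = 0 by default.
When sigma_theta > 0, each item i additionally receives a shared random effect ε_i ~ N(0, sigma_theta), with sigma_theta as the standard deviation, so that μ is replaced by θ_i = μ + ε_i in both models (useful for more realistic BFDA simulations).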
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| N | int | Number of paired observations. | 200 |
| mu | float | Shared logit-scale intercept. | 0.0 |
| delta_A | float | Logit-scale treatment effect for model A. | 0.5 |
| delta_B | float | Logit-scale offset for model B (0 by default). | 0.0 |
| sigma_theta | float | SD of optional per-item random effects (0 disables them). | 0.0 |
| seed | int | Random seed for reproducibility. | 0 |
| rng | Generator \| None | Optional pre-seeded RNG; if provided, seed is ignored. | None |
Returns:
| Type | Description |
|---|---|
| PairedSimResult | A PairedSimResult object. |
Source code in bayesprop/utils/utils.py
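A minimal sketch of the paired logistic DGP above, assuming the random effect is shared by both models on the same item and that the function returns the raw outcome arrays (the real PairedSimResult fields are not shown here). simulate_paired_sketch and sigmoid are hypothetical helpers.

```python
import numpy as np


def sigmoid(x):
    # Logistic link: maps logits to probabilities.
    return 1.0 / (1.0 + np.exp(-x))


def simulate_paired_sketch(N=200, mu=0.0, delta_A=0.5, delta_B=0.0,
                           sigma_theta=0.0, seed=0, rng=None):
    if rng is None:
        rng = np.random.default_rng(seed)
    # Shared per-item logit theta_i = mu + eps_i; eps_i = 0 when sigma_theta is 0.
    eps = rng.normal(0.0, sigma_theta, size=N) if sigma_theta > 0 else np.zeros(N)
    theta = mu + eps
    # Both models answer the same N items, so y_A[i] and y_B[i] share theta[i].
    y_A = rng.binomial(1, sigmoid(theta + delta_A))
    y_B = rng.binomial(1, sigmoid(theta + delta_B))
    return y_A, y_B


y_A, y_B = simulate_paired_sketch(N=5000, seed=0)
print(y_A.mean(), y_B.mean())  # near sigmoid(0.5) ≈ 0.62 and sigmoid(0) = 0.5
```

With sigma_theta > 0 the marginal success rates are pulled toward 0.5, so delta_A no longer maps directly to a difference in raw rates.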
bfda_simulate(data_generator, decision_fn, sample_sizes, n_sim=500, seed=42)
Generic BFDA engine -- works with any data-generating process and decision rule.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_generator | Callable[[Generator, int], tuple[ndarray, ndarray]] | Callable(rng, n) -> (y_A, y_B). Generates one simulated dataset of size n per group using the provided RNG. | required |
| decision_fn | Callable[[ndarray, ndarray], bool] | Callable(y_A, y_B) -> bool. Returns True if the dataset yields a decisive outcome. | required |
| sample_sizes | list[int] | List of per-group sample sizes to evaluate. | required |
| n_sim | int | Number of simulated datasets per sample size. | 500 |
| seed | int | Random seed for reproducibility. | 42 |
Returns:
| Type | Description |
|---|---|
| dict[int, float] | Dictionary mapping sample size -> P(decisive outcome). |
Source code in bayesprop/utils/utils.py
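The engine's contract (a generator called as data_generator(rng, n) plus a boolean decision_fn) suggests a loop like the one below. This is a sketch that assumes a single seeded Generator drives all datasets; bfda_simulate_sketch, gen and toy_rule are hypothetical, and toy_rule is a placeholder threshold on the observed difference, not a Bayesian decision rule.

```python
from typing import Callable

import numpy as np

DataGenerator = Callable[[np.random.Generator, int], tuple[np.ndarray, np.ndarray]]
DecisionFn = Callable[[np.ndarray, np.ndarray], bool]


def bfda_simulate_sketch(data_generator: DataGenerator, decision_fn: DecisionFn,
                         sample_sizes: list[int], n_sim: int = 500,
                         seed: int = 42) -> dict[int, float]:
    # One Generator drives every simulated dataset, so the run is reproducible.
    rng = np.random.default_rng(seed)
    power: dict[int, float] = {}
    for n in sample_sizes:
        decisive = 0
        for _ in range(n_sim):
            y_A, y_B = data_generator(rng, n)
            decisive += bool(decision_fn(y_A, y_B))
        # Fraction of simulated datasets that produced a decisive outcome.
        power[n] = decisive / n_sim
    return power


def gen(rng, n):
    # Non-paired Bernoulli DGP with true rates 0.75 vs 0.6.
    return rng.binomial(1, 0.75, n), rng.binomial(1, 0.6, n)


def toy_rule(y_A, y_B):
    # Toy placeholder rule (not a Bayes factor): observed gap above 0.1.
    return y_A.mean() - y_B.mean() > 0.1


curve = bfda_simulate_sketch(gen, toy_rule, [20, 100, 400], n_sim=200)
print(curve)
```

Because the engine only sees callables, swapping in a paired generator or a Bayes-factor rule requires no change to the loop itself.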
bf10_to_ph0(bf_10, prior_H0=0.5)
Convert BF_10 to posterior probability of H0.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bf_10 | float | Bayes factor in favour of H1. | required |
| prior_H0 | float | Prior probability of H0. | 0.5 |
Returns:
| Type | Description |
|---|---|
| float | P(H0 \| data). |
Source code in bayesprop/utils/utils.py
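The conversion follows from Bayes' rule: the posterior odds of H1 over H0 equal BF_10 times the prior odds, so P(H0 | data) = prior_H0 / (prior_H0 + BF_10 · (1 − prior_H0)). A sketch of that identity, assuming 0 < prior_H0 < 1 (bf10_to_ph0_sketch is a hypothetical name):

```python
def bf10_to_ph0_sketch(bf_10: float, prior_H0: float = 0.5) -> float:
    # Posterior odds (H1:H0) = BF_10 * prior odds (H1:H0).
    prior_odds_10 = (1.0 - prior_H0) / prior_H0
    posterior_odds_10 = bf_10 * prior_odds_10
    # Convert the odds back to a probability for H0.
    return 1.0 / (1.0 + posterior_odds_10)


print(bf10_to_ph0_sketch(3.0))  # 0.25
print(bf10_to_ph0_sketch(1.0))  # 0.5
```

With equal prior odds, a BF_10 of 19 is needed before P(H0 | data) drops to 0.05.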
bfda_power_curve(theta_A_true, theta_B_true, sample_sizes, design='nonpaired', decision_rule='bayes_factor', *, bf_threshold=3.0, ph0_threshold=0.05, prior_H0=0.5, rope=(-0.02, 0.02), ci_mass=0.95, n_sim=500, seed=42, alpha0=1.0, beta0=1.0, sigma_theta=2.0, prior_sigma_delta=1.0, prior_sigma_mu=2.0, n_iter=1000, burn_in=300, n_chains=2)
Unified Bayes Factor Design Analysis for any supported combination of design and decision rule.
Simulates datasets under a known effect and estimates the probability that a given Bayesian decision rule will reject H0 as a function of sample size (i.e. Bayesian "power").
Supported combinations:
| design | decision_rule | key threshold | speed |
|---|---|---|---|
| nonpaired | bayes_factor | bf_threshold | fast |
| nonpaired | posterior_null | ph0_threshold | fast |
| nonpaired | rope | rope | medium |
| paired | bayes_factor | bf_threshold | slow |
| paired | posterior_null | ph0_threshold | slow |
| paired | rope | rope | slow |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| theta_A_true | float | Assumed true success rate for model A. | required |
| theta_B_true | float | Assumed true success rate for model B. | required |
| sample_sizes | list[int] | List of per-group sample sizes to evaluate. | required |
| design | str | Study design: 'nonpaired' or 'paired'. | 'nonpaired' |
| decision_rule | DecisionRuleType | Decision rule: 'bayes_factor', 'posterior_null', or 'rope'. | 'bayes_factor' |
| bf_threshold | float | BF_10 threshold for decisive evidence (bayes_factor rule). | 3.0 |
| ph0_threshold | float | Reject H0 when P(H0 \| data) < this (posterior_null rule). | 0.05 |
| prior_H0 | float | Prior probability of H0 (posterior_null rule). | 0.5 |
| rope | tuple[float, float] | (lower, upper) bounds of the region of practical equivalence (rope rule). | (-0.02, 0.02) |
| ci_mass | float | Credible interval mass for ROPE analysis (rope rule). | 0.95 |
| n_sim | int | Number of simulated datasets per sample size. | 500 |
| seed | int | Random seed for reproducibility. | 42 |
| alpha0 | float | Prior Beta alpha parameter (non-paired only). | 1.0 |
| beta0 | float | Prior Beta beta parameter (non-paired only). | 1.0 |
| sigma_theta | float | SD of the shared latent item effect (paired DGP). | 2.0 |
| prior_sigma_delta | float | SD of N(0, σ) prior on delta_A (paired only). | 1.0 |
| prior_sigma_mu | float | SD of N(0, σ) prior on mu (paired only). | 2.0 |
| n_iter | int | Total Gibbs iterations per chain (paired only). | 1000 |
| burn_in | int | Warm-up iterations per chain (paired only). | 300 |
| n_chains | int | Number of MCMC chains per dataset (paired only). | 2 |
Returns:
| Type | Description |
|---|---|
| dict[int, float] | Dictionary mapping sample size -> P(decisive outcome). |
Source code in bayesprop/utils/utils.py
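A sketch of how the three non-paired rows of the combinations table could be implemented, assuming H0 is a single shared success rate and H1 gives each model its own rate, both with Beta(alpha0, beta0) priors. The hypothesis specification, the >= comparison, and the equal-tailed interval (used here instead of an HDI) are assumptions rather than bayesprop's confirmed implementation; bf10_nonpaired, rope_decisive and decide are hypothetical names.

```python
import math

import numpy as np


def _log_beta(a, b):
    # log B(a, b) via log-gamma, for numerical stability.
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def bf10_nonpaired(y_A, y_B, alpha0=1.0, beta0=1.0):
    kA, nA = int(np.sum(y_A)), len(y_A)
    kB, nB = int(np.sum(y_B)), len(y_B)
    lb0 = _log_beta(alpha0, beta0)
    # H1: theta_A, theta_B independent, each with a Beta(alpha0, beta0) prior.
    log_m1 = (_log_beta(alpha0 + kA, beta0 + nA - kA)
              + _log_beta(alpha0 + kB, beta0 + nB - kB) - 2 * lb0)
    # H0: one shared theta with a Beta(alpha0, beta0) prior.
    log_m0 = _log_beta(alpha0 + kA + kB, beta0 + nA + nB - kA - kB) - lb0
    return math.exp(log_m1 - log_m0)


def rope_decisive(y_A, y_B, rope=(-0.02, 0.02), ci_mass=0.95,
                  alpha0=1.0, beta0=1.0, n_draws=4000, seed=0):
    rng = np.random.default_rng(seed)
    kA, nA = int(np.sum(y_A)), len(y_A)
    kB, nB = int(np.sum(y_B)), len(y_B)
    # Monte Carlo draws of theta_A - theta_B from the conjugate Beta posteriors.
    diff = (rng.beta(alpha0 + kA, beta0 + nA - kA, n_draws)
            - rng.beta(alpha0 + kB, beta0 + nB - kB, n_draws))
    tail = (1.0 - ci_mass) / 2.0
    lo, hi = np.quantile(diff, [tail, 1.0 - tail])
    # Reject H0 only if the whole equal-tailed interval lies outside the ROPE.
    return bool(hi < rope[0] or lo > rope[1])


def decide(y_A, y_B, decision_rule="bayes_factor", bf_threshold=3.0,
           ph0_threshold=0.05, prior_H0=0.5, **rope_kwargs):
    if decision_rule == "rope":
        return rope_decisive(y_A, y_B, **rope_kwargs)
    bf = bf10_nonpaired(y_A, y_B)
    if decision_rule == "bayes_factor":
        return bf >= bf_threshold
    if decision_rule == "posterior_null":
        ph0 = 1.0 / (1.0 + bf * (1.0 - prior_H0) / prior_H0)
        return ph0 < ph0_threshold
    raise ValueError(f"unknown decision_rule: {decision_rule!r}")


ones, zeros = np.ones(100, dtype=int), np.zeros(100, dtype=int)
same = np.tile([1, 0], 50)
print(decide(ones, zeros), decide(same, same))  # True False
```

Under these assumptions, identical data yield BF_10 < 1 (evidence for the shared rate), which is why the Bayes-factor rule can also accumulate support for H0 rather than only failing to reject it.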
find_n_for_power(power_curve, target_power=0.8)
Interpolate the sample size needed to achieve a target power level.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| power_curve | dict[int, float] | Dictionary mapping sample size -> power (from BFDA). | required |
| target_power | float | Desired power level (default 0.80). | 0.8 |
Returns:
| Type | Description |
|---|---|
| float \| None | Interpolated sample size, or None if the target power is not reached. |
Source code in bayesprop/utils/utils.py
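One plausible implementation is piecewise-linear interpolation over the sorted curve, returning the first crossing of target_power. This sketch assumes that behaviour, including returning None when the target is never reached and the smallest n when it is already met; find_n_for_power_sketch is a hypothetical name.

```python
from typing import Optional


def find_n_for_power_sketch(power_curve: dict[int, float],
                            target_power: float = 0.8) -> Optional[float]:
    points = sorted(power_curve.items())
    if not points:
        return None
    # Already at or above target at the smallest evaluated sample size.
    if points[0][1] >= target_power:
        return float(points[0][0])
    # Linear interpolation across the first interval where power crosses the target.
    for (n0, p0), (n1, p1) in zip(points, points[1:]):
        if p0 < target_power <= p1:
            return n0 + (target_power - p0) * (n1 - n0) / (p1 - p0)
    return None  # target never reached within the evaluated sample sizes


curve = {50: 0.4, 100: 0.7, 200: 0.9}
print(find_n_for_power_sketch(curve))        # about 150
print(find_n_for_power_sketch(curve, 0.95))  # None
```

Simulated power curves can be slightly non-monotone due to Monte Carlo noise, so taking the first crossing is a design choice; rounding the result up to an integer gives a usable per-group n.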
plot_bfda_power(power_curve, theta_A_true, theta_B_true, bf_threshold=3.0, target_power=0.8, title=None, ax=None)
Plot a BFDA power curve with 80%/95% reference lines.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| power_curve | dict[int, float] | Dictionary mapping sample size -> power. | required |
| theta_A_true | float | Assumed true rate for model A (for title). | required |
| theta_B_true | float | Assumed true rate for model B (for title). | required |
| bf_threshold | float | BF_10 threshold used (for y-axis label). | 3.0 |
| target_power | float | Power level to highlight via interpolation. | 0.8 |
| title | str \| None | Optional custom title. | None |
| ax | Axes \| None | Optional matplotlib Axes to plot on. | None |
Returns:
| Type | Description |
|---|---|
| Figure | The matplotlib Figure. |
Source code in bayesprop/utils/utils.py
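A matplotlib sketch of this kind of plot, assuming the y-axis shows P(BF_10 > bf_threshold) and that the 80%/95% references are horizontal lines. plot_power_sketch is hypothetical and omits the target_power interpolation marker; the Agg backend is chosen only so it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_power_sketch(power_curve, theta_A_true, theta_B_true,
                      bf_threshold=3.0, title=None, ax=None):
    # Reuse a caller-supplied Axes, otherwise create a new figure.
    if ax is None:
        fig, ax = plt.subplots(figsize=(6, 4))
    else:
        fig = ax.figure
    ns = sorted(power_curve)
    ax.plot(ns, [power_curve[n] for n in ns], marker="o", label="simulated power")
    # Horizontal reference lines at 80% and 95% power.
    for level, style in ((0.80, "--"), (0.95, ":")):
        ax.axhline(level, linestyle=style, color="grey", label=f"{level:.0%} power")
    ax.set_xlabel("Sample size per group")
    ax.set_ylabel(f"P(BF$_{{10}}$ > {bf_threshold:g})")
    ax.set_ylim(0, 1.05)
    ax.set_title(title or f"BFDA: θ_A = {theta_A_true}, θ_B = {theta_B_true}")
    ax.legend()
    return fig


fig = plot_power_sketch({50: 0.4, 100: 0.7, 200: 0.9}, 0.75, 0.6)
```

Accepting an optional Axes lets several curves share one figure, for example in a grid of designs.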
plot_bfda_sensitivity(theta_A_true, theta_B_true, sample_sizes, thresholds=None, n_sim=500, seed=42, design='nonpaired', title=None, ax=None, **kwargs)
Plot BFDA power curves for multiple BF_10 thresholds.
Works for both paired and non-paired designs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| theta_A_true | float | Assumed true success rate for model A. | required |
| theta_B_true | float | Assumed true success rate for model B. | required |
| sample_sizes | list[int] | List of per-group sample sizes to evaluate. | required |
| thresholds | list[float] \| None | BF_10 thresholds to compare (default: [3, 6, 10]). | None |
| n_sim | int | Number of simulated datasets per sample size. | 500 |
| seed | int | Random seed for reproducibility. | 42 |
| design | str | Study design: 'nonpaired' or 'paired'. | 'nonpaired' |
| title | str \| None | Optional custom title. | None |
| ax | Axes \| None | Optional matplotlib Axes to plot on. | None |
| **kwargs | Any | Extra arguments forwarded to bfda_power_curve. | {} |
Returns:
| Type | Description |
|---|---|
| Figure | The matplotlib Figure. |
Source code in bayesprop/utils/utils.py
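A sketch of the sensitivity plot, drawing one curve per BF_10 threshold. To stay self-contained it takes precomputed curves (a hypothetical {threshold: {n: power}} dict) instead of running simulations, and the example values are invented placeholders, not BFDA results. Because a dataset that clears BF_10 > 10 also clears BF_10 > 3, stricter thresholds give lower power at every n when evaluated on the same simulated datasets.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_sensitivity_sketch(curves_by_threshold, title=None, ax=None):
    # curves_by_threshold: hypothetical {bf_threshold: {n: power}} mapping.
    if ax is None:
        fig, ax = plt.subplots(figsize=(6, 4))
    else:
        fig = ax.figure
    for t in sorted(curves_by_threshold):
        curve = curves_by_threshold[t]
        ns = sorted(curve)
        ax.plot(ns, [curve[n] for n in ns], marker="o", label=f"BF$_{{10}}$ > {t:g}")
    ax.axhline(0.8, linestyle="--", color="grey")  # 80% reference line, unlabeled
    ax.set_xlabel("Sample size per group")
    ax.set_ylabel("Power")
    ax.set_ylim(0, 1.05)
    ax.set_title(title or "BFDA sensitivity to the BF$_{10}$ threshold")
    ax.legend()
    return fig


# Made-up placeholder curves purely to exercise the plot; not real BFDA output.
curves = {
    3.0: {50: 0.5, 100: 0.8, 200: 0.95},
    6.0: {50: 0.35, 100: 0.65, 200: 0.9},
    10.0: {50: 0.25, 100: 0.55, 200: 0.85},
}
fig = plot_sensitivity_sketch(curves)
```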