
Sequential designs

Warm-started, batch-by-batch updates with optional early stopping based on Bayes factor or ROPE thresholds.

Non-paired

SequentialNonPairedBayesPropTest(alpha0=1.0, beta0=1.0, threshold=0.7, bf_upper=10.0, bf_lower=0.1, n_max=None, n_min=0, decision_rule='all', rope_epsilon=0.02, seed=0, n_samples=20000, n_quad=100, verbose=False)

Sequential / streaming non-paired Bayesian A/B test.

Maintains a running Beta posterior per arm and updates it as new batches of observations arrive. Because the Beta-Bernoulli model is conjugate, the current posterior is also the prior for the next batch, so the four running posterior parameters (alpha and beta for each arm) are the only state that has to be kept between looks.
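
The conjugate update described above can be sketched in a few lines. This is a standalone illustration, not the library's code: `conjugate_update` is a hypothetical helper, and the simulated data exist only to show that feeding batches one at a time gives the same Beta parameters as a single update on all the data.

```python
import numpy as np


def conjugate_update(alpha, beta, batch):
    """Return Beta posterior parameters after observing a 0/1 batch.

    Hypothetical helper for illustration; not part of the documented API.
    """
    batch = np.asarray(batch)
    successes = int(batch.sum())
    failures = batch.size - successes
    return alpha + successes, beta + failures


rng = np.random.default_rng(0)
data = rng.binomial(1, 0.6, size=300)  # simulated 0/1 outcomes for one arm

# Streaming: three batches of 100, carrying the posterior forward each time.
a, b = 1.0, 1.0
for chunk in np.split(data, 3):
    a, b = conjugate_update(a, b, chunk)

# One shot: update the original Beta(1, 1) prior with all 300 observations.
a_full, b_full = conjugate_update(1.0, 1.0, data)

assert (a, b) == (a_full, b_full)
```

Since only the success and failure counts enter the update, the raw observations can be discarded after each batch.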

On every update call the cumulative posterior is re-evaluated via NonPairedBayesPropTest, producing a snapshot containing the posterior state, P(theta_B > theta_A), Savage-Dickey Bayes factor, posterior probability of H₀, ROPE analysis, and a sequential stopping decision.
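
One way to compute P(theta_B > theta_A) for two independent Beta posteriors is Gauss-Legendre quadrature of the integral of pdf_B(x) · cdf_A(x) over [0, 1]. The sketch below is an assumption about how such a quantity can be evaluated, not a copy of the library's routine; `prob_b_greater_a` is a hypothetical name, and `n_quad` simply mirrors the constructor parameter.

```python
import numpy as np
from scipy import stats


def prob_b_greater_a(alpha_a, beta_a, alpha_b, beta_b, n_quad=100):
    """Approximate P(theta_B > theta_A) for independent Beta posteriors.

    Hypothetical helper for illustration; not part of the documented API.
    """
    # Gauss-Legendre nodes and weights on [-1, 1], mapped onto [0, 1].
    nodes, weights = np.polynomial.legendre.leggauss(n_quad)
    x = 0.5 * (nodes + 1.0)
    w = 0.5 * weights
    # P(B > A) = integral over x of pdf_B(x) * P(theta_A < x).
    integrand = stats.beta.pdf(x, alpha_b, beta_b) * stats.beta.cdf(x, alpha_a, beta_a)
    return float(np.sum(w * integrand))


# Identical posteriors give 0.5 by symmetry.
p_equal = prob_b_greater_a(3, 5, 3, 5)
# Arm B with many more successes than arm A gives a value close to 1.
p_b_better = prob_b_greater_a(10, 30, 30, 10)
```

Quadrature is deterministic, so this quantity does not depend on the random seed, unlike the Monte Carlo draws of Δ.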

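Stopping rule: stop when the Savage-Dickey BF₁₀ reaches or exceeds bf_upper (evidence for H₁), falls to or below bf_lower (evidence for H₀), or when min(n_A, n_B) reaches n_max (if set). BF-based stops are only considered once each arm has at least n_min samples.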
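
The stopping rule can be written as a small pure function. This is a minimal sketch under stated assumptions: `stopping_decision` and its reason strings are hypothetical, the order in which the checks are applied is a guess, and "per arm" is read as min(n_A, n_B) for the n_min guard.

```python
def stopping_decision(bf10, n_a, n_b, bf_upper=10.0, bf_lower=0.1,
                      n_max=None, n_min=0):
    """Return (stop, reason) for one look.

    Hypothetical helper; the reason labels and check order are assumptions.
    """
    n = min(n_a, n_b)
    # BF-based stopping is only allowed once both arms have n_min samples.
    if n >= n_min:
        if bf10 >= bf_upper:
            return True, "bf_upper"  # evidence for H1
        if bf10 <= bf_lower:
            return True, "bf_lower"  # evidence for H0
    # The sample-size cap applies regardless of the Bayes factor.
    if n_max is not None and n >= n_max:
        return True, "n_max"
    return False, None


# A large BF at a tiny sample size is ignored while n < n_min.
early = stopping_decision(bf10=25.0, n_a=5, n_b=5, n_min=20)
# The same BF triggers a stop once the minimum sample size is met.
later = stopping_decision(bf10=25.0, n_a=40, n_b=40, n_min=20)
```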

Initialise the sequential non-paired test.

Parameters:

Name Type Description Default
alpha0 float

Prior alpha for both arms (used at look 0).

1.0
beta0 float

Prior beta for both arms (used at look 0).

1.0
threshold float

Binarization threshold for continuous scores.

0.7
bf_upper float

Stop for H₁ when BF₁₀ ≥ this value.

10.0
bf_lower float

Stop for H₀ when BF₁₀ ≤ this value.

0.1
n_max int | None

If set, stop once min(n_A, n_B) ≥ n_max.

None
n_min int

Minimum samples per arm before any BF-based stopping decision is allowed (guards against unstable early BFs).

0
decision_rule DecisionRuleType

Decision framework passed to NonPairedBayesPropTest.decide at each look.

'all'
rope_epsilon float

Half-width of the ROPE on Δ = θ_A − θ_B.

0.02
seed int

Random seed for Monte Carlo draws of Δ.

0
n_samples int

Number of Monte Carlo draws per look.

20000
n_quad int

Gauss-Legendre quadrature nodes for P(B > A).

100
verbose bool

If True, print a one-line summary per look.

False
Source code in bayesprop/resources/bayes_nonpaired.py
def __init__(
    self,
    alpha0: float = 1.0,
    beta0: float = 1.0,
    threshold: float = 0.7,
    bf_upper: float = 10.0,
    bf_lower: float = 0.1,
    n_max: int | None = None,
    n_min: int = 0,
    decision_rule: DecisionRuleType = "all",
    rope_epsilon: float = 0.02,
    seed: int = 0,
    n_samples: int = 20_000,
    n_quad: int = 100,
    verbose: bool = False,
) -> None:
    """Initialise the sequential non-paired test.

    Args:
        alpha0: Prior alpha for both arms (used at look 0).
        beta0: Prior beta for both arms (used at look 0).
        threshold: Binarization threshold for continuous scores.
        bf_upper: Stop for H₁ when BF₁₀ ≥ this value.
        bf_lower: Stop for H₀ when BF₁₀ ≤ this value.
        n_max: If set, stop once min(n_A, n_B) ≥ n_max.
        n_min: Minimum samples per arm before any BF-based stopping
            decision is allowed (guards against unstable early BFs).
        decision_rule: Decision framework passed to
            :meth:`NonPairedBayesPropTest.decide` at each look.
        rope_epsilon: Half-width of the ROPE on Δ = θ_A − θ_B.
        seed: Random seed for Monte Carlo draws of Δ.
        n_samples: Number of Monte Carlo draws per look.
        n_quad: Gauss-Legendre quadrature nodes for P(B > A).
        verbose: If True, print a one-line summary per look.
    """
    if bf_lower >= bf_upper:
        raise ValueError("bf_lower must be strictly less than bf_upper")
    if bf_lower <= 0:
        raise ValueError("bf_lower must be positive")

    # Original prior — kept for the Savage-Dickey prior-at-null term.
    self.alpha0 = alpha0
    self.beta0 = beta0
    self.threshold = threshold
    self.bf_upper = bf_upper
    self.bf_lower = bf_lower
    self.n_max = n_max
    self.n_min = n_min
    self.decision_rule = decision_rule
    self.rope_epsilon = rope_epsilon
    self.seed = seed
    self.n_samples = n_samples
    self.n_quad = n_quad
    self.verbose = verbose

    # Running Beta posterior state (= prior for the next batch).
    # These four numbers are sufficient statistics for everything
    # downstream — no raw data needs to be retained.
    self.posterior_state: dict[str, float] = {
        "alpha_A": float(alpha0),
        "beta_A": float(beta0),
        "alpha_B": float(alpha0),
        "beta_B": float(beta0),
    }
    self.n_A: int = 0
    self.n_B: int = 0
    self.successes_A: int = 0
    self.successes_B: int = 0

    self.history: list[SequentialLookResult] = []
    self._stopped: bool = False
    self._stop_reason: str | None = None

stopped property

True once a stopping rule has triggered.

stop_reason property

Reason for stopping, or None if still continuing.

update(y_a_batch, y_b_batch)

Incorporate a new batch and return the updated snapshot.

Parameters:

Name Type Description Default
y_a_batch ArrayLike

New observations for arm A (continuous or binary).

required
y_b_batch ArrayLike

New observations for arm B (continuous or binary).

required

Returns:

Type Description
SequentialLookResult

SequentialLookResult for this look, also appended to history.

Raises:

Type Description
RuntimeError

If called after the stopping rule has fired.

Source code in bayesprop/resources/bayes_nonpaired.py
def update(
    self,
    y_a_batch: npt.ArrayLike,
    y_b_batch: npt.ArrayLike,
) -> SequentialLookResult:
    """Incorporate a new batch and return the updated snapshot.

    Args:
        y_a_batch: New observations for arm A (continuous or binary).
        y_b_batch: New observations for arm B (continuous or binary).

    Returns:
        :class:`SequentialLookResult` for this look, also appended to
        :attr:`history`.

    Raises:
        RuntimeError: If called after the stopping rule has fired.
    """
    if self._stopped:
        raise RuntimeError(f"Sequential test already stopped: {self._stop_reason}")

    ya = self._binarize(y_a_batch)
    yb = self._binarize(y_b_batch)
    sA, sB = int(ya.sum()), int(yb.sum())
    nA, nB = len(ya), len(yb)

    # Conjugate update of the running posterior state.
    ps = self.posterior_state
    ps["alpha_A"] += sA
    ps["beta_A"] += nA - sA
    ps["alpha_B"] += sB
    ps["beta_B"] += nB - sB
    self.n_A += nA
    self.n_B += nB
    self.successes_A += sA
    self.successes_B += sB

    snap = self._snapshot()
    self.history.append(snap)

    if self.verbose:
        bf10 = (
            snap.decision.bayes_factor.BF_10
            if snap.decision.bayes_factor
            else float("nan")
        )
        print(
            f"[look {snap.look}] n_A={snap.n_A} n_B={snap.n_B} "
            f"P(B>A)={snap.P_B_greater_A:.3f} BF10={bf10:.3g} "
            f"stop={snap.stop} ({snap.stop_reason})"
        )

    return snap

run(batches)

Consume a stream of batches until stopping or exhaustion.

Parameters:

Name Type Description Default
batches Iterable[tuple[ArrayLike, ArrayLike]]

Iterable yielding (y_a_batch, y_b_batch) pairs.

required

Returns:

Type Description
SequentialLookResult

The final SequentialLookResult.

Source code in bayesprop/resources/bayes_nonpaired.py
def run(
    self,
    batches: Iterable[tuple[npt.ArrayLike, npt.ArrayLike]],
) -> SequentialLookResult:
    """Consume a stream of batches until stopping or exhaustion.

    Args:
        batches: Iterable yielding ``(y_a_batch, y_b_batch)`` pairs.

    Returns:
        The final :class:`SequentialLookResult`.
    """
    last: SequentialLookResult | None = None
    for ya, yb in batches:
        last = self.update(ya, yb)
        if self._stopped:
            break
    if last is None:
        raise ValueError("`batches` was empty; nothing to update.")
    return last
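
Because run accepts any iterable of (y_a_batch, y_b_batch) pairs, a generator is a natural way to feed it. The sketch below builds such a stream from simulated Bernoulli outcomes; `simulated_batches` and its arguments are hypothetical and stand in for whatever produces real batches.

```python
from collections.abc import Iterator

import numpy as np


def simulated_batches(p_a, p_b, batch_size, n_batches,
                      seed=0) -> Iterator[tuple[np.ndarray, np.ndarray]]:
    """Yield (y_a_batch, y_b_batch) pairs of simulated 0/1 outcomes.

    Hypothetical data source for illustration only.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        y_a = rng.binomial(1, p_a, size=batch_size)
        y_b = rng.binomial(1, p_b, size=batch_size)
        yield y_a, y_b


# Materialised here only to inspect the shape of the stream.
batches = list(simulated_batches(p_a=0.50, p_b=0.55, batch_size=50, n_batches=4))
```

Passing the generator itself rather than a list means batches are produced lazily, and the loop in run stops pulling from it as soon as a stopping rule fires.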

history_frame()

Return the per-look history as a tidy DataFrame for plotting.

Source code in bayesprop/resources/bayes_nonpaired.py
def history_frame(self) -> pd.DataFrame:
    """Return the per-look history as a tidy DataFrame for plotting."""
    rows = []
    for s in self.history:
        bf = s.decision.bayes_factor
        rope = s.decision.rope
        rows.append(
            {
                "look": s.look,
                "n_A": s.n_A,
                "n_B": s.n_B,
                "alpha_A": s.posterior_state.alpha_A,
                "beta_A": s.posterior_state.beta_A,
                "alpha_B": s.posterior_state.alpha_B,
                "beta_B": s.posterior_state.beta_B,
                "P_B_gt_A": s.P_B_greater_A,
                "BF_10": bf.BF_10 if bf else np.nan,
                "BF_01": bf.BF_01 if bf else np.nan,
                "pct_in_rope": rope.pct_in_rope if rope else np.nan,
                "stop": s.stop,
                "stop_reason": s.stop_reason,
            }
        )
    return pd.DataFrame(rows)
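
The pct_in_rope column summarises how much posterior mass of Δ = θ_A − θ_B lies inside the ROPE. A Monte Carlo estimate of that quantity might look like the sketch below. It is an illustration only: `fraction_in_rope` is a hypothetical helper, it returns a fraction in [0, 1], and whether the library reports a fraction or a percentage is not stated here.

```python
import numpy as np


def fraction_in_rope(alpha_a, beta_a, alpha_b, beta_b,
                     rope_epsilon=0.02, n_samples=20_000, seed=0):
    """Estimate the posterior mass of Delta inside [-epsilon, +epsilon].

    Hypothetical helper for illustration; not part of the documented API.
    """
    rng = np.random.default_rng(seed)
    # Independent draws from each arm's Beta posterior.
    theta_a = rng.beta(alpha_a, beta_a, size=n_samples)
    theta_b = rng.beta(alpha_b, beta_b, size=n_samples)
    delta = theta_a - theta_b
    return float(np.mean(np.abs(delta) <= rope_epsilon))


# Tightly concentrated, identical posteriors: nearly all mass is in the ROPE.
in_rope_equal = fraction_in_rope(5000, 5000, 5000, 5000)
# Clearly separated posteriors: essentially no mass is in the ROPE.
in_rope_apart = fraction_in_rope(900, 100, 100, 900)
```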

plot_trajectory(**kwargs)

Plot BF₁₀ and P(B > A) trajectories across looks.

Parameters:

Name Type Description Default
**kwargs Any

Accepts figsize (default (12, 4)).

{}
Source code in bayesprop/resources/bayes_nonpaired.py
def plot_trajectory(self, **kwargs: Any) -> None:
    """Plot BF₁₀ and P(B > A) trajectories across looks.

    Args:
        **kwargs: Accepts ``figsize`` (default ``(12, 4)``).
    """
    import matplotlib.pyplot as plt

    if not self.history:
        raise RuntimeError("No history yet; call .update() first.")

    df = self.history_frame()
    figsize = kwargs.pop("figsize", (12, 4))
    fig, axes = plt.subplots(1, 2, figsize=figsize)

    ax = axes[0]
    ax.plot(df["n_A"] + df["n_B"], df["BF_10"], marker="o", color="#E91E63")
    ax.axhline(
        self.bf_upper,
        ls="--",
        color="gray",
        alpha=0.7,
        label=f"BF₁₀ = {self.bf_upper}",
    )
    ax.axhline(
        self.bf_lower,
        ls="--",
        color="gray",
        alpha=0.7,
        label=f"BF₁₀ = {self.bf_lower}",
    )
    ax.set_yscale("log")
    ax.set_xlabel("Cumulative n_A + n_B")
    ax.set_ylabel("BF₁₀ (log scale)")
    ax.set_title("Sequential Bayes Factor")
    ax.grid(alpha=0.3)
    ax.legend(fontsize=8)

    ax = axes[1]
    ax.plot(df["n_A"] + df["n_B"], df["P_B_gt_A"], marker="o", color="#3F51B5")
    ax.axhline(0.5, ls=":", color="gray", alpha=0.7)
    ax.set_xlabel("Cumulative n_A + n_B")
    ax.set_ylabel("P(θ_B > θ_A)")
    ax.set_title("Posterior probability of superiority")
    ax.set_ylim(0, 1)
    ax.grid(alpha=0.3)

    fig.tight_layout()
    plt.show()

Paired (Laplace)

SequentialPairedBayesPropTest(prior_sigma_delta=1.0, bf_upper=10.0, bf_lower=0.1, n_max=None, n_min=0, decision_rule='all', rope_epsilon=0.02, seed=0, n_samples=8000, verbose=False)

Sequential / streaming paired Bayesian A/B test (Laplace).

Maintains running cumulative sufficient statistics (n_A, k_A, n_B, k_B) and re-fits the pooled Bernoulli logistic model via PairedBayesPropTest after each batch. Because the likelihood depends on the data only through these four counts, the refit at look t returns exactly the same Laplace posterior as fitting all accumulated data in one shot, so streaming loses no information.
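
The sufficiency claim can be checked numerically. The sketch below assumes a parameterisation that this page only hints at through the names mu and delta_A, namely p_B = sigmoid(mu) and p_A = sigmoid(mu + delta_A). That assumption, and both function names, are hypothetical; the point is only that the log-likelihood computed from raw outcomes equals the one computed from the four counts.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loglik_raw(mu, delta_a, y_a, y_b):
    """Bernoulli log-likelihood from raw 0/1 outcomes (assumed model)."""
    y_a, y_b = np.asarray(y_a), np.asarray(y_b)
    p_a, p_b = sigmoid(mu + delta_a), sigmoid(mu)
    ll_a = np.sum(y_a * np.log(p_a) + (1 - y_a) * np.log1p(-p_a))
    ll_b = np.sum(y_b * np.log(p_b) + (1 - y_b) * np.log1p(-p_b))
    return float(ll_a + ll_b)


def loglik_counts(mu, delta_a, n_a, k_a, n_b, k_b):
    """The same log-likelihood written in terms of the four counts."""
    p_a, p_b = sigmoid(mu + delta_a), sigmoid(mu)
    ll_a = k_a * np.log(p_a) + (n_a - k_a) * np.log1p(-p_a)
    ll_b = k_b * np.log(p_b) + (n_b - k_b) * np.log1p(-p_b)
    return float(ll_a + ll_b)


rng = np.random.default_rng(1)
y_a = rng.binomial(1, 0.6, size=200)  # simulated outcomes, arm A
y_b = rng.binomial(1, 0.5, size=200)  # simulated outcomes, arm B

ll_from_raw = loglik_raw(0.1, 0.3, y_a, y_b)
ll_from_counts = loglik_counts(0.1, 0.3, len(y_a), int(y_a.sum()),
                               len(y_b), int(y_b.sum()))
```

Since the two values agree for every (mu, delta_A), any fit that depends on the data only through this likelihood gives the same result whether the data arrive in batches or all at once.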

On every update call the cumulative posterior is re-evaluated, producing a snapshot containing the Laplace posterior state (mu_MAP, delta_A_MAP, Sigma), the posterior probability P(p_A > p_B) on the probability scale, the Savage-Dickey Bayes factor on delta_A = 0 (logit scale), the ROPE classification on Delta = p_A - p_B, and a sequential stopping decision.
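
Under a Laplace approximation the marginal posterior of delta_A is Gaussian, so the Savage-Dickey ratio at delta_A = 0 reduces to a ratio of two normal densities. The sketch below shows that calculation in isolation. `savage_dickey_bf10` is a hypothetical helper, and `delta_sd` is assumed to be the square root of the delta_A diagonal entry of Sigma.

```python
from scipy import stats


def savage_dickey_bf10(delta_map, delta_sd, prior_sigma_delta=1.0):
    """BF10 = prior density at 0 / posterior density at 0 (normal approx.).

    Hypothetical helper for illustration; not part of the documented API.
    """
    prior_at_null = stats.norm.pdf(0.0, loc=0.0, scale=prior_sigma_delta)
    posterior_at_null = stats.norm.pdf(0.0, loc=delta_map, scale=delta_sd)
    return float(prior_at_null / posterior_at_null)


# Posterior identical to the prior: the data are uninformative, BF10 = 1.
bf_uninformative = savage_dickey_bf10(0.0, 1.0)
# Posterior concentrated on the null: BF10 < 1 (evidence for H0).
bf_for_null = savage_dickey_bf10(0.0, 0.1)
# Posterior concentrated away from the null: BF10 > 1 (evidence for H1).
bf_for_alt = savage_dickey_bf10(0.8, 0.2)
```

The prior density in the numerator is fixed by prior_sigma_delta, which is why that value has to stay the same across looks for the Bayes factors to be comparable.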

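Stopping rule: stop when the Savage-Dickey BF₁₀ reaches or exceeds bf_upper (evidence for H₁), falls to or below bf_lower (evidence for H₀), or when min(n_A, n_B) reaches n_max (if set). BF-based stops are only considered once each arm has at least n_min samples.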

Initialise the sequential paired Laplace test.

Parameters:

Name Type Description Default
prior_sigma_delta float

Standard deviation of the N(0, sigma) prior on delta_A (logit scale). Held fixed across all looks so the Savage-Dickey BF is consistent.

1.0
bf_upper float

Stop for H₁ when BF₁₀ ≥ this value.

10.0
bf_lower float

Stop for H₀ when BF₁₀ ≤ this value.

0.1
n_max int | None

If set, stop once min(n_A, n_B) ≥ n_max.

None
n_min int

Minimum samples per arm before any BF-based stopping decision is allowed (guards against unstable early BFs).

0
decision_rule DecisionRuleType

Decision framework passed to PairedBayesPropTest.decide at each look.

'all'
rope_epsilon float

Half-width of the ROPE on Δ = p_A - p_B (probability scale).

0.02
seed int

Random seed for the Laplace posterior draws.

0
n_samples int

Number of draws from the Laplace posterior per look.

8000
verbose bool

If True, print a one-line summary per look.

False
Source code in bayesprop/resources/bayes_paired_laplace.py
def __init__(
    self,
    prior_sigma_delta: float = 1.0,
    bf_upper: float = 10.0,
    bf_lower: float = 0.1,
    n_max: int | None = None,
    n_min: int = 0,
    decision_rule: DecisionRuleType = "all",
    rope_epsilon: float = 0.02,
    seed: int = 0,
    n_samples: int = 8000,
    verbose: bool = False,
) -> None:
    """Initialise the sequential paired Laplace test.

    Args:
        prior_sigma_delta: Standard deviation of the N(0, sigma) prior
            on ``delta_A`` (logit scale). Held fixed across all looks
            so the Savage-Dickey BF is consistent.
        bf_upper: Stop for H\u2081 when BF\u2081\u2080 \u2265 this value.
        bf_lower: Stop for H\u2080 when BF\u2081\u2080 \u2264 this value.
        n_max: If set, stop once min(n_A, n_B) \u2265 n_max.
        n_min: Minimum samples per arm before any BF-based stopping
            decision is allowed (guards against unstable early BFs).
        decision_rule: Decision framework passed to
            :meth:`PairedBayesPropTest.decide` at each look.
        rope_epsilon: Half-width of the ROPE on \u0394 = p_A - p_B
            (probability scale).
        seed: Random seed for the Laplace posterior draws.
        n_samples: Number of draws from the Laplace posterior per look.
        verbose: If True, print a one-line summary per look.
    """
    if bf_lower >= bf_upper:
        raise ValueError("bf_lower must be strictly less than bf_upper")
    if bf_lower <= 0:
        raise ValueError("bf_lower must be positive")

    self.prior_sigma_delta = prior_sigma_delta
    self.bf_upper = bf_upper
    self.bf_lower = bf_lower
    self.n_max = n_max
    self.n_min = n_min
    self.decision_rule = decision_rule
    self.rope_epsilon = rope_epsilon
    self.seed = seed
    self.n_samples = n_samples
    self.verbose = verbose

    # Cumulative sufficient statistics (everything the likelihood sees).
    self.n_A: int = 0
    self.n_B: int = 0
    self.successes_A: int = 0
    self.successes_B: int = 0

    self.history: list[SequentialLaplaceLookResult] = []
    self._stopped: bool = False
    self._stop_reason: str | None = None
    self._last_model: PairedBayesPropTest | None = None

stopped property

True once a stopping rule has triggered.

stop_reason property

Reason for stopping, or None if still continuing.

last_model property

The most recently fitted PairedBayesPropTest (or None).

update(y_a_batch, y_b_batch)

Incorporate a new paired batch and return the updated snapshot.

Parameters:

Name Type Description Default
y_a_batch ArrayLike

New binary observations for arm A (0/1).

required
y_b_batch ArrayLike

New binary observations for arm B (0/1), same length as y_a_batch (paired design).

required

Returns:

Type Description
SequentialLaplaceLookResult

SequentialLaplaceLookResult for this look, also appended to history.

Raises:

Type Description
RuntimeError

If called after the stopping rule has fired.

ValueError

If batch lengths differ or contain non-binary values.

Source code in bayesprop/resources/bayes_paired_laplace.py
def update(
    self,
    y_a_batch: npt.ArrayLike,
    y_b_batch: npt.ArrayLike,
) -> SequentialLaplaceLookResult:
    """Incorporate a new paired batch and return the updated snapshot.

    Args:
        y_a_batch: New binary observations for arm A (0/1).
        y_b_batch: New binary observations for arm B (0/1), same length
            as ``y_a_batch`` (paired design).

    Returns:
        :class:`SequentialLaplaceLookResult` for this look, also
        appended to :attr:`history`.

    Raises:
        RuntimeError: If called after the stopping rule has fired.
        ValueError: If batch lengths differ or contain non-binary values.
    """
    if self._stopped:
        raise RuntimeError(f"Sequential test already stopped: {self._stop_reason}")

    ya = np.asarray(y_a_batch)
    yb = np.asarray(y_b_batch)
    if len(ya) != len(yb):
        raise ValueError(
            f"Paired batches must have equal length, got {len(ya)} and {len(yb)}."
        )
    if ya.size and not (
        np.all((ya == 0) | (ya == 1)) and np.all((yb == 0) | (yb == 1))
    ):
        raise ValueError(
            "SequentialPairedBayesPropTest expects already-binarized "
            "0/1 inputs (binarize continuous scores beforehand)."
        )

    self.n_A += int(len(ya))
    self.n_B += int(len(yb))
    self.successes_A += int(ya.sum())
    self.successes_B += int(yb.sum())

    snap = self._snapshot()
    self.history.append(snap)

    if self.verbose:
        bf10 = (
            snap.decision.bayes_factor.BF_10
            if snap.decision.bayes_factor
            else float("nan")
        )
        print(
            f"[look {snap.look}] n_A={snap.n_A} n_B={snap.n_B} "
            f"P(A>B)={snap.P_A_greater_B:.3f} BF10={bf10:.3g} "
            f"stop={snap.stop} ({snap.stop_reason})"
        )

    return snap
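
Unlike the non-paired class, this update rejects anything other than 0/1 values, so continuous scores have to be binarized before they are passed in. A minimal sketch of that preprocessing step follows. The `binarize` helper is hypothetical, the `>=` convention is an assumption, and the 0.7 default is borrowed from the non-paired class's threshold parameter.

```python
import numpy as np


def binarize(scores, threshold=0.7):
    """Map continuous scores to 0/1 successes.

    Hypothetical helper; treating score >= threshold as a success is an
    assumption, not the library's documented convention.
    """
    scores = np.asarray(scores, dtype=float)
    return (scores >= threshold).astype(int)


y_a_batch = binarize([0.91, 0.42, 0.70, 0.65])
y_b_batch = binarize([0.10, 0.88, 0.73, 0.69])

# The same checks the paired update performs: equal length, values in {0, 1}.
same_length = len(y_a_batch) == len(y_b_batch)
all_binary = bool(np.all((y_a_batch == 0) | (y_a_batch == 1))
                  and np.all((y_b_batch == 0) | (y_b_batch == 1)))
```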

run(batches)

Consume a stream of paired batches until stopping or exhaustion.

Parameters:

Name Type Description Default
batches Iterable[tuple[ArrayLike, ArrayLike]]

Iterable yielding (y_a_batch, y_b_batch) pairs.

required

Returns:

Type Description
SequentialLaplaceLookResult

The final SequentialLaplaceLookResult.

Source code in bayesprop/resources/bayes_paired_laplace.py
def run(
    self,
    batches: Iterable[tuple[npt.ArrayLike, npt.ArrayLike]],
) -> SequentialLaplaceLookResult:
    """Consume a stream of paired batches until stopping or exhaustion.

    Args:
        batches: Iterable yielding ``(y_a_batch, y_b_batch)`` pairs.

    Returns:
        The final :class:`SequentialLaplaceLookResult`.
    """
    last: SequentialLaplaceLookResult | None = None
    for ya, yb in batches:
        last = self.update(ya, yb)
        if self._stopped:
            break
    if last is None:
        raise ValueError("`batches` was empty; nothing to update.")
    return last

history_frame()

Return the per-look history as a tidy DataFrame for plotting.

Source code in bayesprop/resources/bayes_paired_laplace.py
def history_frame(self) -> pd.DataFrame:
    """Return the per-look history as a tidy DataFrame for plotting."""
    rows = []
    for s in self.history:
        bf = s.decision.bayes_factor
        rope = s.decision.rope
        rows.append(
            {
                "look": s.look,
                "n_A": s.n_A,
                "n_B": s.n_B,
                "mu_MAP": s.posterior_state.mu_map,
                "delta_A_MAP": s.posterior_state.delta_A_map,
                "P_A_gt_B": s.P_A_greater_B,
                "BF_10": bf.BF_10 if bf else np.nan,
                "BF_01": bf.BF_01 if bf else np.nan,
                "pct_in_rope": rope.pct_in_rope if rope else np.nan,
                "stop": s.stop,
                "stop_reason": s.stop_reason,
            }
        )
    return pd.DataFrame(rows)

plot_trajectory(**kwargs)

Plot BF₁₀ and P(p_A > p_B) trajectories across looks.

Parameters:

Name Type Description Default
**kwargs Any

Accepts figsize (default (12, 4)).

{}
Source code in bayesprop/resources/bayes_paired_laplace.py
def plot_trajectory(self, **kwargs: Any) -> None:
    """Plot BF\u2081\u2080 and P(p_A > p_B) trajectories across looks.

    Args:
        **kwargs: Accepts ``figsize`` (default ``(12, 4)``).
    """
    import matplotlib.pyplot as plt

    if not self.history:
        raise RuntimeError("No history yet; call .update() first.")

    df = self.history_frame()
    figsize = kwargs.pop("figsize", (12, 4))
    fig, axes = plt.subplots(1, 2, figsize=figsize)

    ax = axes[0]
    ax.plot(df["n_A"] + df["n_B"], df["BF_10"], marker="o", color="#E91E63")
    ax.axhline(
        self.bf_upper,
        ls="--",
        color="gray",
        alpha=0.7,
        label=f"BF\u2081\u2080 = {self.bf_upper}",
    )
    ax.axhline(
        self.bf_lower,
        ls="--",
        color="gray",
        alpha=0.7,
        label=f"BF\u2081\u2080 = {self.bf_lower}",
    )
    ax.set_yscale("log")
    ax.set_xlabel("Cumulative n_A + n_B")
    ax.set_ylabel("BF\u2081\u2080 (log scale)")
    ax.set_title("Sequential Bayes Factor (paired Laplace)")
    ax.grid(alpha=0.3)
    ax.legend(fontsize=8)

    ax = axes[1]
    ax.plot(df["n_A"] + df["n_B"], df["P_A_gt_B"], marker="o", color="#3F51B5")
    ax.axhline(0.5, ls=":", color="gray", alpha=0.7)
    ax.set_xlabel("Cumulative n_A + n_B")
    ax.set_ylabel("P(p_A > p_B)")
    ax.set_title("Posterior probability of superiority")
    ax.set_ylim(0, 1)
    ax.grid(alpha=0.3)

    fig.tight_layout()
    plt.show()