Paired Model: Pólya-Gamma Gibbs Sampler
Paired logistic model with exact MCMC inference via Pólya-Gamma data augmentation.
bayes_paired_pg
Pooled Bernoulli logistic regression for paired A/B model comparison (Pólya-Gamma).
This module provides :class:PairedBayesPropTestPG, a class for fitting
a pooled Bayesian Bernoulli logistic model to paired binary scores using
exact Pólya-Gamma data augmentation (Gibbs sampling), performing hypothesis
testing via the Savage-Dickey density ratio, running posterior-predictive
diagnostics, and generating publication-ready plots.
Compared to the Laplace approximation in :mod:ai_eval.resources.bayes_paired_laplace,
the PG sampler provides exact (up to MCMC error) posterior inference and
multi-chain MCMC diagnostics (R-hat, ESS).
Typical workflow::

    from ai_eval.resources.bayes_paired_pg import PairedBayesPropTestPG

    model = PairedBayesPropTestPG(seed=42).fit(y_A, y_B)
    model.print_summary()
    model.plot_trace()
    model.plot_posterior_delta()

For multi-metric comparisons::

    results = {"Relevancy": model_rel, "Faithfulness": model_faith}
    PairedBayesPropTestPG.plot_forest(results, label_A="v2", label_B="v1")

PairedBayesPropTestPG(prior_sigma_delta=1.0, prior_sigma_mu=2.0, seed=0, n_iter=2000, burn_in=500, n_chains=4, decision_rule='all', rope_epsilon=0.02)
Pooled Bernoulli logistic model for paired A/B comparison (PG Gibbs).
Uses Pólya-Gamma data augmentation for exact Gibbs sampling instead of Laplace approximation.
Generative model (identical to the Laplace version)::

    μ     ~ N(0, σ_μ)               (overall intercept)
    δ_A   ~ N(0, σ_δ)               (model-A advantage)
    y_A,i ~ Bernoulli(σ(μ + δ_A))
    y_B,i ~ Bernoulli(σ(μ))

Inference proceeds by augmenting with Pólya-Gamma latent variables ω_i ~ PG(1, x_i'β), which yields conjugate Gaussian conditionals for β = [μ, δ_A].
Multiple independent chains are run for MCMC diagnostics (R-hat, ESS).
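
The augmentation step can be illustrated with a minimal NumPy sketch. It is not the module's implementation: the function names are hypothetical, and the Pólya-Gamma draw uses a truncated sum-of-gammas series (an approximation) in place of an exact PG sampler. Given ω, the conditional for β = [μ, δ_A] is Gaussian with precision XᵀΩX + B⁻¹ and mean V·Xᵀ(y − 1/2), where B is the diagonal prior covariance.

```python
import numpy as np


def sample_pg1_approx(c, rng, n_terms=200):
    """Approximate PG(1, c) draws with a truncated sum-of-gammas series.

    Assumption: truncation at n_terms is accurate enough for illustration;
    an exact sampler would be used in practice.
    """
    c = np.asarray(c, dtype=float)
    k = np.arange(1, n_terms + 1)
    g = rng.exponential(size=(c.size, n_terms))  # Gamma(1, 1) == Exp(1)
    denom = (k - 0.5) ** 2 + (c[:, None] / (2.0 * np.pi)) ** 2
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)


def pg_gibbs_sketch(y_A, y_B, sigma_mu=2.0, sigma_delta=1.0,
                    n_iter=500, burn_in=100, seed=0):
    """Single-chain Gibbs sampler for beta = [mu, delta_A] (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    y = np.concatenate([y_A, y_B]).astype(float)
    # Design matrix: intercept column plus an indicator for model A.
    is_A = np.r_[np.ones(len(y_A)), np.zeros(len(y_B))]
    X = np.column_stack([np.ones(y.size), is_A])
    prior_prec = np.diag([1.0 / sigma_mu ** 2, 1.0 / sigma_delta ** 2])
    kappa = y - 0.5
    beta = np.zeros(2)
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        omega = sample_pg1_approx(X @ beta, rng)          # omega_i | beta
        prec = X.T @ (omega[:, None] * X) + prior_prec    # posterior precision
        cov = np.linalg.inv(prec)
        mean = cov @ (X.T @ kappa)                        # prior mean is zero
        beta = rng.multivariate_normal(mean, cov)         # beta | omega, y
        draws[t] = beta
    return draws[burn_in:]
```

Running several such chains from different seeds gives the per-chain draws that R-hat and ESS are computed from.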
Attributes:
| Name | Type | Description |
|---|---|---|
| chains | ndarray \| None | Array of shape |
| samples | ndarray \| None | Pooled posterior draws, shape |
| summary | dict[str, Any] \| None | Dict with |
| trace_summary | DataFrame \| None | |
| delta_A_samples | ndarray \| None | 1-D array of pooled posterior draws for |
| y_A_obs | ndarray \| None | Observed binary scores for model A (set by :meth: |
| y_B_obs | ndarray \| None | Observed binary scores for model B (set by :meth: |
Initialise model configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| prior_sigma_delta | float | Standard deviation of the N(0, σ) prior on | 1.0 |
| prior_sigma_mu | float | Standard deviation of the N(0, σ) prior on | 2.0 |
| seed | int | Random seed for reproducibility. | 0 |
| n_iter | int | Total Gibbs iterations per chain (including burn-in). | 2000 |
| burn_in | int | Number of warm-up iterations to discard per chain. | 500 |
| n_chains | int | Number of independent MCMC chains. | 4 |
| decision_rule | DecisionRuleType | Default decision framework; one of | 'all' |
| rope_epsilon | float | Half-width of the ROPE interval (default 0.02 = 2 pp). | 0.02 |
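
As a quick check on how `n_iter`, `burn_in`, and `n_chains` interact, the sketch below counts the retained draws under the defaults. It assumes no thinning and a hypothetical `(chain, draw, parameter)` array layout; the actual shapes stored by the class are not confirmed here.

```python
import numpy as np

# Defaults from the parameter table; n_params = 2 for (mu, delta_A).
n_iter, burn_in, n_chains, n_params = 2000, 500, 4, 2

kept_per_chain = n_iter - burn_in  # draws retained per chain after warm-up

# Hypothetical layout: (chain, draw, parameter).
chains = np.zeros((n_chains, kept_per_chain, n_params))

# Pooling stacks the chains along the draw axis.
pooled = chains.reshape(-1, n_params)

print(kept_per_chain, pooled.shape)  # 1500 (6000, 2)
```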
Source code in bayesprop/resources/bayes_paired_pg.py
fit(y_A_obs, y_B_obs)
Fit the model via PG Gibbs sampling with multiple chains.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_A_obs | ndarray | Binary observed scores for model A (0 or 1). | required |
| y_B_obs | ndarray | Binary observed scores for model B (0 or 1). | required |
Returns:
| Type | Description |
|---|---|
| PairedBayesPropTestPG | self (for method chaining). |
mcmc_diagnostics()
Compute R-hat and ESS for each parameter.
Returns:
| Type | Description |
|---|---|
| MCMCDiagnostics | class: |
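
For orientation, the classic Gelman-Rubin R-hat can be sketched as follows. This is an illustration of the statistic, not necessarily the variant the method computes (modern tools often use a rank-normalised split version); the function name and the `(n_chains, n_draws)` input layout are assumptions.

```python
import numpy as np


def gelman_rubin_rhat(chains):
    """Classic (non-split) R-hat for one parameter.

    chains: array of shape (n_chains, n_draws); hypothetical layout.
    """
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()        # W: mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)     # B: between-chain variance
    var_hat = (n - 1) / n * within + between / n      # pooled variance estimate
    return float(np.sqrt(var_hat / within))
```

Values close to 1 indicate that the chains agree; chains stuck in different regions inflate the between-chain term and push R-hat well above 1.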
savage_dickey_test(null_value=0.0)
Savage-Dickey density-ratio Bayes factor for H0: delta_A = null_value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| null_value | float | The point null hypothesis value for delta_A. | 0.0 |
Returns:
| Type | Description |
|---|---|
| SavageDickeyResult | Dict with keys |
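
The Savage-Dickey ratio is BF_01 = p(δ_A = null | data) / p(δ_A = null), the posterior density at the null divided by the prior density there. A minimal sketch, assuming a Gaussian kernel density estimate with a rule-of-thumb bandwidth for the posterior and the N(0, σ_δ) prior from the generative model; the function name and return type are hypothetical and do not mirror `SavageDickeyResult`.

```python
import numpy as np


def savage_dickey_bf01(delta_samples, prior_sigma=1.0, null_value=0.0):
    """BF_01 from posterior draws of delta_A (hypothetical helper)."""
    s = np.asarray(delta_samples, dtype=float)
    # Rule-of-thumb bandwidth for a Gaussian KDE (an assumption, not the
    # module's choice of density estimator).
    h = 1.06 * s.std(ddof=1) * s.size ** (-1.0 / 5.0)
    z = (null_value - s) / h
    posterior_at_null = np.mean(np.exp(-0.5 * z ** 2)) / (h * np.sqrt(2.0 * np.pi))
    # Density of the N(0, prior_sigma) prior at the null value.
    prior_at_null = (np.exp(-0.5 * (null_value / prior_sigma) ** 2)
                     / (prior_sigma * np.sqrt(2.0 * np.pi)))
    return float(posterior_at_null / prior_at_null)
```

A ratio above 1 means the data increased the density at the null (evidence for H0); a ratio below 1 means the data moved mass away from it.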
posterior_probability_H0(BF_01, prior_H0=0.5)
staticmethod
Convert BF_01 to posterior probability of H0 (spike-and-slab).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| BF_01 | float | Bayes factor in favour of H0. | required |
| prior_H0 | float | Prior probability of H0 (default 0.5). | 0.5 |
Returns:
| Type | Description |
|---|---|
| PosteriorProbH0Result | Dict with keys |
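
The conversion follows from Bayes' rule on the two hypotheses: P(H0 | data) = BF_01 · π0 / (BF_01 · π0 + 1 − π0), where π0 is the prior probability of H0. A minimal sketch; the function name is hypothetical and it returns a bare float rather than the result dict.

```python
def posterior_prob_h0(bf_01, prior_h0=0.5):
    """Posterior probability of H0 given BF_01 and the prior P(H0)."""
    numerator = bf_01 * prior_h0
    return numerator / (numerator + (1.0 - prior_h0))


print(posterior_prob_h0(3.0))        # 0.75
print(posterior_prob_h0(3.0, 0.25))  # 0.5
```

With equal prior odds, a Bayes factor of 3 in favour of H0 gives a posterior probability of 0.75; a sceptical prior of 0.25 brings the same evidence back to 0.5.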
rope_test(rope=None, ci_mass=0.95)
ROPE analysis on the posterior of Δ = p_A − p_B (probability scale).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rope | tuple[float, float] \| None | (lower, upper) ROPE bounds. Defaults to | None |
| ci_mass | float | Credible interval mass (default 95%). | 0.95 |
Returns:
| Type | Description |
|---|---|
| ROPEResult | class: |
| ROPEResult | decision. |
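
A ROPE analysis on draws of Δ = p_A − p_B can be sketched as below. The assumptions are labelled: an equal-tailed credible interval stands in for whatever interval the method uses, the decision rule is the common "interval fully inside / fully outside the ROPE" convention, the ±0.02 default is taken from `rope_epsilon`, and the function name and returned keys are hypothetical.

```python
import numpy as np


def rope_summary(delta_samples, rope=(-0.02, 0.02), ci_mass=0.95):
    """Summarise posterior draws of p_A - p_B against a ROPE (hypothetical helper)."""
    d = np.asarray(delta_samples, dtype=float)
    lo, hi = rope
    tail = (1.0 - ci_mass) / 2.0
    ci_lo, ci_hi = np.quantile(d, [tail, 1.0 - tail])  # equal-tailed interval
    frac_in_rope = float(np.mean((d >= lo) & (d <= hi)))
    if ci_lo >= lo and ci_hi <= hi:
        decision = "practically equivalent"   # interval entirely inside the ROPE
    elif ci_hi < lo or ci_lo > hi:
        decision = "different"                # interval entirely outside the ROPE
    else:
        decision = "undecided"                # interval straddles a ROPE bound
    return {"ci": (float(ci_lo), float(ci_hi)),
            "frac_in_rope": frac_in_rope,
            "decision": decision}
```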
decide(rule=None)
Run the chosen decision framework(s) and return a composite result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rule | DecisionRuleType \| None | Override the default | None |
Returns:
| Type | Description |
|---|---|
| HypothesisDecision | class: |
| HypothesisDecision | populated. |
ppc_pvalues(seed=None)
Posterior predictive p-values for summary statistics.
Returns:
| Type | Description |
|---|---|
| dict[str, PPCStatistic] | Dict mapping statistic name to |
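
A posterior predictive p-value compares a statistic of the observed data with the same statistic on data replicated from the posterior. The sketch below does this for one statistic, the difference in success rates, under the pooled model; the function name, its arguments, and the choice of statistic are assumptions and do not reproduce the method's output format.

```python
import numpy as np


def ppc_pvalue_rate_diff(mu, delta_A, y_A, y_B, seed=0):
    """P(T(y_rep) >= T(y_obs)) with T = rate_A - rate_B (hypothetical helper).

    mu, delta_A: 1-D arrays of posterior draws on the logit scale.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    delta_A = np.asarray(delta_A, dtype=float)
    p_A = 1.0 / (1.0 + np.exp(-(mu + delta_A)))
    p_B = 1.0 / (1.0 + np.exp(-mu))
    # One replicated data set per posterior draw.
    rep_rate_A = rng.binomial(len(y_A), p_A) / len(y_A)
    rep_rate_B = rng.binomial(len(y_B), p_B) / len(y_B)
    t_rep = rep_rate_A - rep_rate_B
    t_obs = np.mean(y_A) - np.mean(y_B)
    return float(np.mean(t_rep >= t_obs))
```

Values near 0 or 1 suggest the model struggles to reproduce that statistic; values near 0.5 indicate no visible misfit for it.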
plot_trace(**kwargs)
Trace plots and autocorrelation for all chains.
plot_posteriors(**kwargs)
Two-panel posterior plot: overlaid p_A / p_B and Δ = p_A − p_B.
The implied success probabilities p_A = σ(μ + δ_A) and
p_B = σ(μ) are computed from the pooled MCMC posterior
samples and displayed as overlaid KDE densities in the left
panel. The right panel shows the difference Δ = p_A − p_B.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **kwargs | Any | Accepts | {} |
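
The probability-scale quantities shown in the two panels can be derived from logit-scale draws as in this sketch. The function name is hypothetical, and the inputs are assumed to be 1-D arrays of pooled draws for μ and δ_A.

```python
import numpy as np


def to_probability_scale(mu, delta_A):
    """Map logit-scale draws to p_A, p_B and their difference."""
    mu = np.asarray(mu, dtype=float)
    delta_A = np.asarray(delta_A, dtype=float)
    p_A = 1.0 / (1.0 + np.exp(-(mu + delta_A)))  # sigma(mu + delta_A)
    p_B = 1.0 / (1.0 + np.exp(-mu))              # sigma(mu)
    return p_A, p_B, p_A - p_B
```

The posterior probability that A beats B is then the share of draws with a positive difference, for example `np.mean(diff > 0)`.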
plot_posterior_delta(color='#2196F3', **kwargs)
KDE posterior density of delta_A (logit scale) with 95% CI.
plot_savage_dickey(color='#2196F3', **kwargs)
Posterior vs prior density with Savage-Dickey BF annotation.
plot_ppc(seed=None, **kwargs)
Three-column PPC plot: P(perfect) A, P(perfect) B, rate difference.
print_summary()
Print posterior summary, MCMC diagnostics, Savage-Dickey test, and PPC.
plot_forest(results, label_A='Model A', label_B='Model B', **kwargs)
staticmethod
Forest plot + P(A>B) bar chart for multiple metrics.
print_comparison_table(results)
staticmethod
Print a formatted comparison table across metrics.
sigmoid(x)
Numerically stable element-wise sigmoid function.
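
One common way to make the sigmoid numerically stable is to branch on the sign of the input so that `exp` is only ever called on non-positive values. The sketch below illustrates that technique; it is an assumption about the approach, not the module's source, and the name `stable_sigmoid` is hypothetical.

```python
import numpy as np


def stable_sigmoid(x):
    """Element-wise sigmoid that avoids overflow for large |x|.

    Returns an array with at least one dimension.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) <= 1, so 1 / (1 + exp(-x)) cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, rewrite as exp(x) / (1 + exp(x)) with exp(x) < 1.
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out


print(stable_sigmoid([-1000.0, 0.0, 1000.0]))  # [0.  0.5 1. ]
```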