Introduction¶
This project develops a hierarchical Bayesian model for analyzing period-cohort effects in General Social Survey (GSS) data. The model addresses limitations of previous approaches that used ad hoc smoothing and binned cohorts, providing a principled framework for estimating how attitudes and behaviors change across birth cohorts and over time.
Problem Statement¶
Previous analyses of GSS data using period-cohort decomposition faced several challenges:
Noisy results at boundaries: Youngest and oldest cohorts have small sample sizes, leading to unstable estimates
Ad hoc smoothing: Existing approaches used LOWESS smoothing on chunked series without explicit statistical justification
Binned cohorts: Cohorts were grouped into 3-5 year bins, creating arbitrary discontinuities at bin edges
Frequentist limitations: Standard regression approaches don’t naturally handle uncertainty in sparse cells or provide principled interpolation
Proposed Solution¶
We develop a hierarchical Bayesian model that makes smoothing assumptions explicit and principled:
Likelihood: Binomial for binary responses (with ordered logistic as alternative for multi-category responses)
Link function: Logit
Smooth priors: Gaussian Random Walk of order 2 (RW2) for cohort and period effects
Model Specification¶
Basic Structure¶
For binary response data collapsed to cells (cohort × nominal year):
The cell count y(c,t) is Binomial with success probability p(c,t). The linear predictor on the logit scale is additive:
η(c,t) = α + f(c) + g(t) + ε(t)
where α is the baseline intercept, f(c) is a smooth cohort effect, g(t) is a smooth period effect, and ε(t) ~ N(0, σ_short) is extra-Binomial observation noise per nominal year (overdispersion), capturing year-level residual variance beyond the additive cohort+period structure. The probability is p(c,t) = logistic(η(c,t)).
RW2 Prior¶
A Gaussian Random Walk of order 2 (RW2) encourages locally linear trajectories:
Δ²f_c ~ N(0, σ_f)This means the second differences are Gaussian:
f_k - 2f_{k-1} + f_{k-2} ~ N(0, σ)Why RW2 over RW1:
RW1 (first differences) can produce “wiggly staircase” patterns
RW2 provides smoother long-range behavior, better for social trends
Better handles boundary noise in sparse regions
Overdispersion¶
The model includes extra-Binomial observation noise on the logit scale: ε_t ~ N(0, σ_short) per nominal year, so p = logistic(η + ε_t). This captures year-level residual variance (news cycles, wording drift, context effects) that the single additive trajectory cannot explain. Prior: σ_short ~ Exponential(0.2). Time-only (one ε per nominal year, not per cell) keeps the parameter count parsimonious.
Boundary Noise Handling¶
RW priors naturally handle sparse cells:
Sparse cells get pulled toward the smoother latent trajectory
Uncertainty widens appropriately at boundaries
Directly addresses the “youngest cohorts are noisy” problem
Implementation¶
PyMC Implementation¶
PyMC does not have built-in RW2, so we implement it manually using second differences and double cumulative sum:
# Smoothing scale
sigma_rw2 = pm.Exponential("sigma_rw2", 8.0) # mean = 0.125
# Second differences
delta2 = pm.Normal(
"delta2",
mu=0.0,
sigma=sigma_rw2,
shape=K - 2
)
# Reconstruct latent function (double cumsum)
# Free init: f_init ~ N(0,1) for level/slope at boundary (recommended)
# Legacy: pt.zeros(2) fixes first two at zero
f_init = pm.Normal("f_init", 0, 1, shape=2)
f_raw = pt.concatenate([f_init, pt.cumsum(pt.cumsum(delta2))])
# Center for identifiability (removes intercept redundancy)
f = f_raw - pt.mean(f_raw)
# Overdispersion: extra-Binomial noise per nominal year
sigma_short = pm.Exponential("sigma_short", 1.0 / 0.2)
epsilon = pm.Normal("epsilon", 0, sigma_short, dims="year_idx")
epsilon_cells = epsilon[cells["year_idx"].values]
p_obs = pm.math.sigmoid(eta + epsilon_cells)Key Implementation Details¶
Null Space Handling: RW2 has a 2D null space (intercept + linear trend). We use free initial values (
f_init ~ N(0,1)) at the boundary, then center for identifiability. We do not remove the linear trend; detrending in addition to centering would over-constrain the model. Free init produces more interpretable cohort trajectories than pinning the first two values at zero.Overdispersion: Extra-Binomial noise
ε_t ~ N(0, σ_short)per nominal year captures year-level residual variance. Prior:σ_short ~ Exponential(0.2). Time-only parameterization keeps the model parsimonious.Survey Weights: We use
weighted_mean(Kish effective sample size) to compute cell proportions and n_eff, incorporating GSS survey weights for population-level inference.Prior Choice: Exponential priors with mean 0.125 (rate=8.0) for smoothing scales provide optimal balance of geometry and flexibility.
Single-Year Cohorts: We use single-year birth cohorts rather than bins, letting the RW2 prior provide the smoothing. This avoids arbitrary discontinuities at bin edges.
Model Development and Testing¶
Version History¶
We tested multiple model versions to optimize performance:
Version 1-5: Initial exploration of RW2 construction methods and prior choices
Version 6: Added linear trend removal (initially thought to improve identifiability)
Version 7: Switched to Exponential priors (best convergence and efficiency)
Version 8: Validated state-space construction (works but slower than double cumsum)
Version 9: Added complete yearly grid for period effects (31 observed + 17 interpolated years)
Version 10: Two-year cadence for period effects (27 nominal years, all observed); removed linear trend projection; free init (f_init ~ N(0,1) at boundary); overdispersion (ε_t per nominal year); weighted_mean for survey weights (n_eff)
Final Recommended Configuration¶
Version 10 (current recommended version):
Construction: Double cumsum RW2 with free initial values and centering only (no detrending)
Overdispersion:
sigma_short ~ Exponential(5.0),epsilon_t ~ N(0, sigma_short)per nominal yearSurvey weights:
weighted_mean(Kish n_eff) for cell-level aggregationPriors:
sigma_c ~ Exponential(8.0),sigma_t ~ Exponential(8.0)(mean = 0.125)Period grid: Two-year cadence (35 survey years → 27 nominal years, aligns with biennial surveys)
Sampler:
nutpiewith 4 chains, 4 coresPerformance:
Sampling time: ~0.4 seconds (from cache) or ~2-3 minutes for full run
Max R-hat: 1.01 (excellent convergence)
Min ESS bulk: 174 (acceptable, though ideally > 400)
No divergences
Model Comparison: Version 9 vs Version 10¶
We compared the yearly grid (Version 9) and two-year cadence (Version 10) approaches:
Version 9 (Yearly Grid):
48 years in full grid (31 observed, 17 with no data)
1,810 cells total
Higher
sigma_t(0.810) suggesting need for more period variationConvergence issues (R-hat 1.06, ESS 61)
Version 10 (Two-Year Cadence):
26 nominal years (all observed, no gaps)
1,654 cells total
Lower
sigma_t(0.233) indicating smoother, more stable period effectsBetter convergence (R-hat 1.01, ESS 174)
Better alignment with actual survey structure (biennial since 1993)
Trajectory Comparison:
Correlation: 0.80 between model predictions
Mean absolute difference: 0.0275 (2.75 percentage points)
Conclusion: Models produce similar predictions, but Version 10 has better convergence and structural alignment
Data Preparation¶
GSS Data¶
We use the General Social Survey (GSS) data, which has been conducted since 1972. The current analysis uses data through 2024.
Data Structure:
Respondents grouped by birth cohort (single-year, 1900-2000)
Survey years: 1972-2024 (35 survey years mapped to 27 nominal years in Version 10)
Binary outcomes: Conservative vs. non-conservative responses based on variable-specific definitions
Variable Selection: We focus on variables that:
Are binary or easily binarizable
Have been asked frequently over many years
Have clear substantive interpretation
Represent different domains (politics, social issues, etc.)
Current variables analyzed:
cappun: Capital punishment attitudeshomosex: Homosexuality attitudesracopen: Open housing law attitudesfepol: Women in politics attitudes
Survey Year Mapping (Version 10)¶
To implement the two-year cadence, we map actual survey years to nominal years:
Early annual years (1972-1991): Consolidated into two-year periods
Recent biennial years (1993+): Mostly unchanged
Result: 35 survey years → 27 nominal years on consistent two-year grid
Model Interpretation¶
Regularized Cell-Means Estimator¶
This model is best understood as a regularized cell-means estimator over cohort × period. Where data are dense, estimates are effectively descriptive; where data are sparse or missing, RW2 smoothing provides principled interpolation.
Key interpretive framing:
The model estimates cell-level support probabilities without uniquely decomposing age, period, and cohort mechanisms
Where cell sample sizes are large, posterior estimates closely track observed proportions
Where cells are sparse or missing, the model interpolates smoothly across neighboring cohorts and years
Visualization Strategy¶
Recommended plots:
Heatmap of E[p_{c,t}]: Full latent surface showing cohort × year support
Cohort trajectories: Aggregated to decade-of-birth bins for readability
Coverage/exposure map: Sample sizes by cohort × year to show sparsity
Results and Validation¶
Posterior Predictive Checks¶
The model exhibits well-calibrated uncertainty and appropriate shrinkage:
Predicted posterior means align with observed cell proportions
Shrinkage increases smoothly as sample size decreases
Predictive intervals widen appropriately for small-n cells
No systematic bias in residuals
Boundary Cohort Handling¶
For sparsely observed boundary cohorts:
Predicted means exhibit principled shrinkage toward the global surface
Predictive intervals widen substantially, reflecting genuine uncertainty
No unstable extrapolation toward 0 or 1
Convergence Diagnostics¶
Version 10 diagnostics (for cappun variable):
Max R-hat: 1.01 (excellent)
Min ESS bulk: 174 (acceptable)
Min ESS tail: 314 (acceptable)
Divergences: 0 (0.00%)
Key parameter ESS:
sigma_c: bulk=174, tail=314sigma_t: bulk=2085, tail=2546
Batch Processing Results Summary¶
We ran Model 10 on 15 GSS variables to assess model performance across different substantive domains. The following tables summarize key results:
Table 1: Data Characteristics and Convergence Diagnostics
| Variable | Observations | Cells | Max R-hat | Min ESS Bulk | Min ESS Tail | Divergences |
|---|---|---|---|---|---|---|
| cappun | 61,471 | 1,654 | 1.01 | 174 | 314 | 0 |
| homosex | 43,635 | 1,543 | 1.03 | 190 | 531 | 0 |
| racopen | 33,773 | 1,157 | 1.62 | 7 | 11 | 1000 |
| fepol | 36,026 | 1,364 | 1.03 | 128 | 317 | 0 |
| abany | 40,685 | 1,430 | 1.03 | 128 | 447 | 0 |
| premarsx | 44,491 | 1,534 | 1.01 | 216 | 395 | 0 |
| prayer | 35,261 | 1,388 | 1.01 | 272 | 382 | 0 |
| natfare | 39,749 | 1,486 | 1.02 | 188 | 447 | 0 |
| grass | 38,884 | 1,412 | 1.03 | 178 | 477 | 0 |
| natenvir | 39,730 | 1,482 | 1.04 | 119 | 315 | 0 |
| divlaw | 38,837 | 1,395 | 1.01 | 172 | 237 | 0 |
| letdie1 | 35,993 | 1,326 | 1.01 | 180 | 449 | 0 |
| gunlaw | 49,029 | 1,610 | 1.08 | 41 | 74 | 0 |
| sexeduc | 41,885 | 1,487 | 1.06 | 115 | 234 | 0 |
| pornlaw | 45,120 | 1,499 | 1.01 | 249 | 247 | 0 |
Note: Values in bold indicate convergence problems. racopen shows severe convergence issues (R-hat 1.62, 25% divergences).
Table 2: Posterior Estimates for Smoothing Scales
| Variable | σ_c (mean ± sd) | σ_t (mean ± sd) | Sampling Time (min) |
|---|---|---|---|
| cappun | 0.010 ± 0.003 | 0.233 ± 0.045 | 0.0* |
| homosex | 0.060 ± 0.016 | 0.305 ± 0.056 | 0.0* |
| racopen | 0.105 ± 0.153 | 2.191 ± 1.231 | 0.0* |
| fepol | 0.013 ± 0.005 | 2.360 ± 0.245 | 1.2 |
| abany | 0.017 ± 0.005 | 0.146 ± 0.036 | 1.3 |
| premarsx | 0.020 ± 0.006 | 2.932 ± 0.278 | 1.7 |
| prayer | 0.028 ± 0.007 | 0.108 ± 0.046 | 1.5 |
| natfare | 0.010 ± 0.003 | 0.295 ± 0.049 | 1.1 |
| grass | 0.029 ± 0.008 | 3.267 ± 0.310 | 1.8 |
| natenvir | 0.021 ± 0.006 | 0.239 ± 0.043 | 1.5 |
| divlaw | 0.012 ± 0.004 | 1.296 ± 0.167 | 1.2 |
| letdie1 | 0.012 ± 0.004 | 0.069 ± 0.031 | 0.8 |
| gunlaw | 0.002 ± 0.001 | 0.114 ± 0.036 | 0.7 |
| sexeduc | 0.019 ± 0.007 | 1.400 ± 0.225 | 1.3 |
| pornlaw | 0.047 ± 0.012 | 0.093 ± 0.047 | 2.7 |
Note: * indicates results loaded from cache (sampling time not recorded).
Key Findings:
Convergence: Most variables (14 of 15) show good convergence (R-hat ≤ 1.01-1.08). One variable (
racopen) shows severe convergence problems:racopen: R-hat 1.62, ESS 7-11, 1,000/4,000 divergences (25%)
Analysis indicates this is due to 5 missing nominal years in the period grid (21 observed out of 26), creating poor sampling geometry
gunlaw(R-hat 1.08) andsexeduc(R-hat 1.06) show marginal convergence that may benefit from longer sampling
Effective Sample Size: Min ESS bulk ranges from 7 (
racopen) to 272 (prayer). Most variables (12 of 15) have ESS > 100, though several fall below the ideal threshold of 400. The lowest ESS values are associated with convergence problems (racopen,gunlaw).Smoothing Scales - Cohort Effects (
σ_c):Consistently small across all variables (0.002-0.105)
Indicates smooth cohort trajectories across all substantive domains
racopenshows elevated uncertainty (0.105 ± 0.153) due to convergence issues
Smoothing Scales - Period Effects (
σ_t):Shows substantial variation (0.069-3.267), reflecting different temporal dynamics:
Low period variation (
σ_t < 0.3):letdie1(0.069),gunlaw(0.114),prayer(0.108),pornlaw(0.093) - relatively stable over timeModerate period variation (
0.3 < σ_t < 1.5):cappun(0.233),abany(0.146),natfare(0.295),natenvir(0.239),homosex(0.305),divlaw(1.296),sexeduc(1.400) - moderate temporal changeHigh period variation (
σ_t > 2.0):fepol(2.360),premarsx(2.932),grass(3.267),racopen(2.191) - substantial temporal shifts
Variables with high period variation tend to be social issues that have undergone major attitude shifts over the survey period
Sampling Efficiency: Sampling times range from 0.7 to 2.7 minutes per variable for full runs, with most completing in 1-2 minutes. The model demonstrates good computational efficiency across all variables.
Divergences: 14 of 15 variables show zero divergences, indicating good sampling geometry and appropriate prior specification. Only
racopenshows significant divergences (25%), consistent with its convergence problems.
Future Directions¶
Potential Extensions¶
Ordered Logistic Model: For multi-category responses (e.g., 4-point agree/disagree scales)
Age Component: Explicit age effects if needed (currently absorbed in cohort-period structure)
2D Lexis Surface: Tensor spline approach if strong cohort-period interactions found
Survey Weights:
weighted_mean(n_eff) is implemented; bootstrap resampling available for sensitivity analysis
Current Limitations¶
Binary Outcomes: Most variables are binarized, losing ordering information
No Age Component: Age-related variation is reflected in cell probabilities but not separately parameterized
Additive Structure: Cohort and period effects are additive; interactions not currently modeled
Conclusion¶
The hierarchical Bayesian cohort-period model with RW2 priors provides a principled, defensible replacement for ad hoc smoothing approaches. The model:
Handles sparse boundary cohorts gracefully
Provides appropriate uncertainty quantification
Aligns with actual survey structure (two-year cadence)
Exhibits excellent convergence and sampling efficiency
Produces interpretable, substantively meaningful results
Version 10 (two-year cadence) is the recommended configuration, offering better convergence and structural alignment than the yearly grid approach while maintaining similar predictive accuracy.