Technical Report: Bayesian Cohort-Period Model for GSS Data

Introduction¶

This project develops a hierarchical Bayesian model for analyzing period-cohort effects in General Social Survey (GSS) data. The model addresses limitations of previous approaches that used ad hoc smoothing and binned cohorts, providing a principled framework for estimating how attitudes and behaviors change across birth cohorts and over time.

Problem Statement¶

Previous analyses of GSS data using period-cohort decomposition faced several challenges:

Noisy results at boundaries: Youngest and oldest cohorts have small sample sizes, leading to unstable estimates
Ad hoc smoothing: Existing approaches used LOWESS smoothing on chunked series without explicit statistical justification
Binned cohorts: Cohorts were grouped into 3-5 year bins, creating arbitrary discontinuities at bin edges
Frequentist limitations: Standard regression approaches don’t naturally handle uncertainty in sparse cells or provide principled interpolation

Proposed Solution¶

We develop a hierarchical Bayesian model that makes smoothing assumptions explicit and principled:

Likelihood: Binomial for binary responses (with ordered logistic as alternative for multi-category responses)
Link function: Logit
Smooth priors: Gaussian Random Walk of order 2 (RW2) for cohort and period effects

Model Specification¶

Basic Structure¶

For binary response data collapsed to cells (cohort × nominal year):

The cell count y(c,t) is Binomial with success probability p(c,t). The linear predictor on the logit scale is additive:

η(c,t) = α + f(c) + g(t) + ε(t)

where α is the baseline intercept, f(c) is a smooth cohort effect, g(t) is a smooth period effect, and ε(t) ~ N(0, σ_short) is extra-Binomial observation noise per nominal year (overdispersion), capturing year-level residual variance beyond the additive cohort+period structure. The probability is p(c,t) = logistic(η(c,t)).

RW2 Prior¶

A Gaussian Random Walk of order 2 (RW2) encourages locally linear trajectories:

Δ²f_c ~ N(0, σ_f)

This means the second differences are Gaussian:

f_k - 2f_{k-1} + f_{k-2} ~ N(0, σ)

Why RW2 over RW1:

RW1 (first differences) can produce “wiggly staircase” patterns
RW2 provides smoother long-range behavior, better for social trends
Better handles boundary noise in sparse regions

Overdispersion¶

The model includes extra-Binomial observation noise on the logit scale: ε_t ~ N(0, σ_short) per nominal year, so p = logistic(η + ε_t). This captures year-level residual variance (news cycles, wording drift, context effects) that the single additive trajectory cannot explain. Prior: σ_short ~ Exponential(0.2). Time-only (one ε per nominal year, not per cell) keeps the parameter count parsimonious.

Boundary Noise Handling¶

RW priors naturally handle sparse cells:

Sparse cells get pulled toward the smoother latent trajectory
Uncertainty widens appropriately at boundaries
Directly addresses the “youngest cohorts are noisy” problem

Implementation¶

PyMC Implementation¶

PyMC does not have built-in RW2, so we implement it manually using second differences and double cumulative sum:

# Smoothing scale
sigma_rw2 = pm.Exponential("sigma_rw2", 8.0)  # mean = 0.125

# Second differences
delta2 = pm.Normal(
    "delta2",
    mu=0.0,
    sigma=sigma_rw2,
    shape=K - 2
)

# Reconstruct latent function (double cumsum)
# Free init (centered): f0 ~ N(0, σ_L), f1 = f0 + δ with δ ~ N(0, σ_S); σ_L=0.5, σ_S=0.2 match non-centered boundary scales
# Legacy: pt.zeros(2) fixes first two at zero
f0 = pm.Normal("f_init_f0", 0, 0.5)
delta_to_f1 = pm.Normal("f_init_slope", 0, 0.2)
f1 = f0 + delta_to_f1
f_init = pt.stack([f0, f1])
f_raw = pt.concatenate([
    f_init,
    f_init[1] + pt.cumsum(f_init[1] - f_init[0] + pt.cumsum(delta2)),
])

# Center for identifiability (removes intercept redundancy)
f = f_raw - pt.mean(f_raw)

# Overdispersion: extra-Binomial noise per nominal year
sigma_short = pm.Exponential("sigma_short", 1.0 / 0.2)
epsilon = pm.Normal("epsilon", 0, sigma_short, dims="year_idx")
epsilon_cells = epsilon[cells["year_idx"].values]
p_obs = pm.math.sigmoid(eta + epsilon_cells)

Key Implementation Details¶

Null Space Handling: RW2 has a 2D null space (intercept + linear trend). We use free initial values at the boundary: first node f0 ~ N(0, 0.5) and step to the second node δ ~ N(0, 0.2) with f1 = f0 + δ (same typical scales as the non-centered boundary parameterization), then center for identifiability. We do not remove the linear trend; detrending in addition to centering would over-constrain the model. Free init produces more interpretable cohort trajectories than pinning the first two values at zero.
Overdispersion: Extra-Binomial noise ε_t ~ N(0, σ_short) per nominal year captures year-level residual variance. Prior: σ_short ~ Exponential(0.2). Time-only parameterization keeps the model parsimonious.
Survey Weights: We use weighted_mean (Kish effective sample size) to compute cell proportions and n_eff, incorporating GSS survey weights for population-level inference.
Prior Choice: Exponential priors with mean 0.125 (rate=8.0) for smoothing scales provide optimal balance of geometry and flexibility.
Single-Year Cohorts: We use single-year birth cohorts rather than bins, letting the RW2 prior provide the smoothing. This avoids arbitrary discontinuities at bin edges.

Model Development and Testing¶

Version History¶

We tested multiple model versions to optimize performance:

Version 1-5: Initial exploration of RW2 construction methods and prior choices
Version 6: Added linear trend removal (initially thought to improve identifiability)
Version 7: Switched to Exponential priors (best convergence and efficiency)
Version 8: Validated state-space construction (works but slower than double cumsum)
Version 9: Added complete yearly grid for period effects (31 observed + 17 interpolated years)
Version 10: Two-year cadence for period effects (27 nominal years, all observed); removed linear trend projection; free init (chained level/slope at boundary, SDs 0.5 / 0.2); overdispersion (ε_t per nominal year); weighted_mean for survey weights (n_eff)

Final Recommended Configuration¶

Version 10 (current recommended version):

Construction: Double cumsum RW2 with free initial values and centering only (no detrending)
Overdispersion: sigma_short ~ Exponential(5.0), epsilon_t ~ N(0, sigma_short) per nominal year
Survey weights: weighted_mean (Kish n_eff) for cell-level aggregation
Priors: sigma_c ~ Exponential(8.0), sigma_t ~ Exponential(8.0) (mean = 0.125)
Period grid: Two-year cadence (35 survey years → 27 nominal years, aligns with biennial surveys)
Sampler: nutpie with 4 chains, 4 cores
Performance:
- Sampling time: ~0.4 seconds (from cache) or ~2-3 minutes for full run
- Max R-hat: 1.01 (excellent convergence)
- Min ESS bulk: 174 (acceptable, though ideally > 400)
- No divergences

Model Comparison: Version 9 vs Version 10¶

We compared the yearly grid (Version 9) and two-year cadence (Version 10) approaches:

Version 9 (Yearly Grid):

48 years in full grid (31 observed, 17 with no data)
1,810 cells total
Higher sigma_t (0.810) suggesting need for more period variation
Convergence issues (R-hat 1.06, ESS 61)

Version 10 (Two-Year Cadence):

26 nominal years (all observed, no gaps)
1,654 cells total
Lower sigma_t (0.233) indicating smoother, more stable period effects
Better convergence (R-hat 1.01, ESS 174)
Better alignment with actual survey structure (biennial since 1993)

Trajectory Comparison:

Correlation: 0.80 between model predictions
Mean absolute difference: 0.0275 (2.75 percentage points)
Conclusion: Models produce similar predictions, but Version 10 has better convergence and structural alignment

Data Preparation¶

GSS Data¶

We use the General Social Survey (GSS) data, which has been conducted since 1972. The current analysis uses data through 2024.

Data Structure:

Respondents grouped by birth cohort (single-year, 1900-2000)
Survey years: 1972-2024 (35 survey years mapped to 27 nominal years in Version 10)
Binary outcomes: Conservative vs. non-conservative responses based on variable-specific definitions

Variable Selection: We focus on variables that:

Are binary or easily binarizable
Have been asked frequently over many years
Have clear substantive interpretation
Represent different domains (politics, social issues, etc.)

Current variables analyzed:

cappun: Capital punishment attitudes
homosex: Homosexuality attitudes
racopen: Open housing law attitudes
fepol: Women in politics attitudes

Survey Year Mapping (Version 10)¶

To implement the two-year cadence, we map actual survey years to nominal years:

Early annual years (1972-1991): Consolidated into two-year periods
Recent biennial years (1993+): Mostly unchanged
Result: 35 survey years → 27 nominal years on consistent two-year grid

Model Interpretation¶

Regularized Cell-Means Estimator¶

This model is best understood as a regularized cell-means estimator over cohort × period. Where data are dense, estimates are effectively descriptive; where data are sparse or missing, RW2 smoothing provides principled interpolation.

Key interpretive framing:

The model estimates cell-level support probabilities without uniquely decomposing age, period, and cohort mechanisms
Where cell sample sizes are large, posterior estimates closely track observed proportions
Where cells are sparse or missing, the model interpolates smoothly across neighboring cohorts and years

Visualization Strategy¶

Recommended plots:

Heatmap of E[p_{c,t}]: Full latent surface showing cohort × year support
Cohort trajectories: Aggregated to decade-of-birth bins for readability
Coverage/exposure map: Sample sizes by cohort × year to show sparsity

Results and Validation¶

Posterior Predictive Checks¶

The model exhibits well-calibrated uncertainty and appropriate shrinkage:

Predicted posterior means align with observed cell proportions
Shrinkage increases smoothly as sample size decreases
Predictive intervals widen appropriately for small-n cells
No systematic bias in residuals

Boundary Cohort Handling¶

For sparsely observed boundary cohorts:

Predicted means exhibit principled shrinkage toward the global surface
Predictive intervals widen substantially, reflecting genuine uncertainty
No unstable extrapolation toward 0 or 1

Convergence Diagnostics¶

Version 10 diagnostics (for cappun variable):

Max R-hat: 1.01 (excellent)
Min ESS bulk: 174 (acceptable)
Min ESS tail: 314 (acceptable)
Divergences: 0 (0.00%)
Key parameter ESS:
- sigma_c: bulk=174, tail=314
- sigma_t: bulk=2085, tail=2546

Batch Processing Results Summary¶

We ran Model 10 on 15 GSS variables to assess model performance across different substantive domains. The following tables summarize key results:

Table 1: Data Characteristics and Convergence Diagnostics

Variable	Observations	Cells	Max R-hat	Min ESS Bulk	Min ESS Tail	Divergences
cappun	61,471	1,654	1.01	174	314	0
homosex	43,635	1,543	1.03	190	531	0
racopen	33,773	1,157	1.62	7	11	1000
fepol	36,026	1,364	1.03	128	317	0
abany	40,685	1,430	1.03	128	447	0
premarsx	44,491	1,534	1.01	216	395	0
prayer	35,261	1,388	1.01	272	382	0
natfare	39,749	1,486	1.02	188	447	0
grass	38,884	1,412	1.03	178	477	0
natenvir	39,730	1,482	1.04	119	315	0
divlaw	38,837	1,395	1.01	172	237	0
letdie1	35,993	1,326	1.01	180	449	0
gunlaw	49,029	1,610	1.08	41	74	0
sexeduc	41,885	1,487	1.06	115	234	0
pornlaw	45,120	1,499	1.01	249	247	0

Note: Values in bold indicate convergence problems. racopen shows severe convergence issues (R-hat 1.62, 25% divergences).

Table 2: Posterior Estimates for Smoothing Scales

Variable	σ_c (mean ± sd)	σ_t (mean ± sd)	Sampling Time (min)
cappun	0.010 ± 0.003	0.233 ± 0.045	0.0*
homosex	0.060 ± 0.016	0.305 ± 0.056	0.0*
racopen	0.105 ± 0.153	2.191 ± 1.231	0.0*
fepol	0.013 ± 0.005	2.360 ± 0.245	1.2
abany	0.017 ± 0.005	0.146 ± 0.036	1.3
premarsx	0.020 ± 0.006	2.932 ± 0.278	1.7
prayer	0.028 ± 0.007	0.108 ± 0.046	1.5
natfare	0.010 ± 0.003	0.295 ± 0.049	1.1
grass	0.029 ± 0.008	3.267 ± 0.310	1.8
natenvir	0.021 ± 0.006	0.239 ± 0.043	1.5
divlaw	0.012 ± 0.004	1.296 ± 0.167	1.2
letdie1	0.012 ± 0.004	0.069 ± 0.031	0.8
gunlaw	0.002 ± 0.001	0.114 ± 0.036	0.7
sexeduc	0.019 ± 0.007	1.400 ± 0.225	1.3
pornlaw	0.047 ± 0.012	0.093 ± 0.047	2.7

Note: * indicates results loaded from cache (sampling time not recorded).

Key Findings:

Convergence: Most variables (14 of 15) show good convergence (R-hat ≤ 1.01-1.08). One variable (racopen) shows severe convergence problems:
- racopen: R-hat 1.62, ESS 7-11, 1,000/4,000 divergences (25%)
- Analysis indicates this is due to 5 missing nominal years in the period grid (21 observed out of 26), creating poor sampling geometry
- gunlaw (R-hat 1.08) and sexeduc (R-hat 1.06) show marginal convergence that may benefit from longer sampling
Effective Sample Size: Min ESS bulk ranges from 7 (racopen) to 272 (prayer). Most variables (12 of 15) have ESS > 100, though several fall below the ideal threshold of 400. The lowest ESS values are associated with convergence problems (racopen, gunlaw).
Smoothing Scales - Cohort Effects (σ_c):
- Consistently small across all variables (0.002-0.105)
- Indicates smooth cohort trajectories across all substantive domains
- racopen shows elevated uncertainty (0.105 ± 0.153) due to convergence issues
Smoothing Scales - Period Effects (σ_t):
- Shows substantial variation (0.069-3.267), reflecting different temporal dynamics:
  - Low period variation (σ_t < 0.3): letdie1 (0.069), gunlaw (0.114), prayer (0.108), pornlaw (0.093) - relatively stable over time
  - Moderate period variation (0.3 < σ_t < 1.5): cappun (0.233), abany (0.146), natfare (0.295), natenvir (0.239), homosex (0.305), divlaw (1.296), sexeduc (1.400) - moderate temporal change
  - High period variation (σ_t > 2.0): fepol (2.360), premarsx (2.932), grass (3.267), racopen (2.191) - substantial temporal shifts
- Variables with high period variation tend to be social issues that have undergone major attitude shifts over the survey period
Sampling Efficiency: Sampling times range from 0.7 to 2.7 minutes per variable for full runs, with most completing in 1-2 minutes. The model demonstrates good computational efficiency across all variables.
Divergences: 14 of 15 variables show zero divergences, indicating good sampling geometry and appropriate prior specification. Only racopen shows significant divergences (25%), consistent with its convergence problems.

Future Directions¶

Potential Extensions¶

Ordered Logistic Model: For multi-category responses (e.g., 4-point agree/disagree scales)
Age Component: Explicit age effects if needed (currently absorbed in cohort-period structure)
2D Lexis Surface: Tensor spline approach if strong cohort-period interactions found
Survey Weights: weighted_mean (n_eff) is implemented; bootstrap resampling available for sensitivity analysis

Current Limitations¶

Binary Outcomes: Most variables are binarized, losing ordering information
No Age Component: Age-related variation is reflected in cell probabilities but not separately parameterized
Additive Structure: Cohort and period effects are additive; interactions not currently modeled

Conclusion¶

The hierarchical Bayesian cohort-period model with RW2 priors provides a principled, defensible replacement for ad hoc smoothing approaches. The model:

Handles sparse boundary cohorts gracefully
Provides appropriate uncertainty quantification
Aligns with actual survey structure (two-year cadence)
Exhibits excellent convergence and sampling efficiency
Produces interpretable, substantively meaningful results

Version 10 (two-year cadence) is the recommended configuration, offering better convergence and structural alignment than the yearly grid approach while maintaining similar predictive accuracy.