About This Project
In most countries, women live longer than men. This difference is often assumed to be natural and inevitable. However, the gap varies substantially between countries and has changed over time, which suggests that it might not be entirely natural or, if it is, that it can be mitigated.
This project explores differences in life expectancy and health-adjusted life expectancy (HALE) between countries, in order to identify the factors that contribute to the observed gender gaps and to understand what it would take to close those gaps by improving health outcomes for both men and women.
Contents
Executive Summary - Start here: A concise, accessible summary of the project’s motivation, methodology, key findings, and policy implications. Answers the question: “Why do women live longer than men, and what could be done about it?”
Bayesian Panel Data Model (2023, IHME HALE & OWID LE) - Primary analysis: Bayesian hierarchical panel model analyzing HALE and Life Expectancy gender gaps using both temporal variation (2000-2023 for both outcomes) and cross-country variation simultaneously. Uses IHME HALE and OWID LE data for extended temporal coverage and methodological consistency. Provides posterior distributions for all parameters with uncertainty quantification and enables temporal counterfactual analysis.
Bayesian Panel Data Model (2021, WHO HALE) - Legacy analysis: Previous version using WHO HALE data (2000-2021). Retained for comparison and historical reference.
Technical Report - Exploratory analysis using Elastic Net regression, developed as part of the model development process. Includes methodology, results, and counterfactual analysis using cross-sectional models.
Data Inventory - Data sources and metadata.
Neoplasms Drilldown Analysis - Granular analysis of the cancer mortality gender gap, including specific cancer types and risk factor attribution (Behavioral, Metabolic, Environmental).
Exploratory Data Analysis - Summary of exploratory data analysis of gender gaps in HALE and Life Expectancy, including target variables, predictors, summary statistics, and relationships between variables.
Validation Experiments - Model validation comparing WHO and IHME indicators to ensure results remain stable across data sources.
Time Series Analysis - Trends in health indicators and gender gaps (2000-2019), visualizing how HALE, Life Expectancy, and cause-specific mortality indicators have evolved over time.
Temporal Analysis - Evolution of health patterns and gender gaps over time (2000-2019). Runs predictive models at roughly five-year intervals (2000, 2005, 2010, 2015, 2019) and compares results to examine how indicator importance and intervention opportunities have changed.
Methodology
Our primary analysis uses a Bayesian hierarchical panel model to analyze the gender gap in life expectancy and HALE. This approach leverages both temporal variation (2000-2023 for both outcomes) and cross-country variation simultaneously, providing several advantages:
Uncertainty quantification: All parameter estimates include posterior distributions with credible intervals
Country-specific effects: Accounts for unobserved country-level heterogeneity through random intercepts
Robust inference: Handles correlation among predictors while quantifying uncertainty in all estimates
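The advantages above follow from the structure of a random-intercept panel model. A minimal generic form is sketched below. The symbols and the normal distributions are illustrative assumptions, not the exact specification used in the report.

```latex
\begin{aligned}
y_{ct} &= \alpha + u_c + \sum_{k=1}^{K} \beta_k \, x_{k,ct} + \varepsilon_{ct}, \\
u_c &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ct} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

Here $y_{ct}$ would be the gender gap in country $c$ and year $t$, $x_{k,ct}$ the $k$-th predictor (for example a cause-specific mortality indicator), and $u_c$ the country-level random intercept that absorbs unobserved country heterogeneity. Placing priors on $\alpha$, $\beta_k$, $\sigma_u$, and $\sigma$ yields a posterior distribution, and hence a credible interval, for each parameter.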
The analysis focuses on 37 OECD countries (all members except Turkey) using IHME HALE data (2000-2023) and OWID Life Expectancy data (2000-2023). This extended temporal range includes the full COVID-19 pandemic period (2020-2023) to understand its impact on gender gaps in health outcomes, including the post-acute recovery phase.
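As a rough illustration of how the country-year panel described above could be assembled, the sketch below builds a gap panel from long-format records. The record layout, the placeholder country code "AAA", the numeric values, and the definition of the gap as female minus male are all assumptions made for this example. They are not taken from the project's code.

```python
from collections import defaultdict

# Hypothetical long-format records: (country, year, sex, life expectancy).
# All numbers are made up for illustration only.
records = [
    ("AAA", 2000, "female", 80.0),
    ("AAA", 2000, "male", 75.0),
    ("AAA", 2023, "female", 82.5),
    ("AAA", 2023, "male", 78.0),
    ("TUR", 2023, "female", 80.0),   # dropped: Turkey is excluded
    ("TUR", 2023, "male", 75.0),
    ("AAA", 1999, "female", 79.0),   # dropped: outside 2000-2023
    ("AAA", 1999, "male", 74.0),
]

EXCLUDED = {"TUR"}          # Turkey is excluded from the sample
YEARS = range(2000, 2024)   # 2000-2023 inclusive


def build_gap_panel(rows):
    """Return {(country, year): female - male} for in-sample country-years."""
    by_key = defaultdict(dict)
    for country, year, sex, value in rows:
        if country in EXCLUDED or year not in YEARS:
            continue
        by_key[(country, year)][sex] = value
    # Keep only country-years where both sexes are observed.
    return {
        key: vals["female"] - vals["male"]
        for key, vals in by_key.items()
        if "female" in vals and "male" in vals
    }


panel = build_gap_panel(records)
```

Each entry of `panel` is one observation of the outcome for a given country and year, which is the unit a panel model works with.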
During the development process, we also explored Elastic Net regression models (see the Technical Report) to identify key predictors and validate our approach. These cross-sectional models helped inform the Bayesian panel model specification and provided initial insights into which cause-specific mortality indicators are most strongly associated with the gender gap.
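To show why Elastic Net is useful for identifying key predictors, the sketch below fits it by cyclic coordinate descent in plain Python. The L1 part of the penalty sets weak coefficients to exactly zero, while the L2 part stabilizes estimates when predictors are correlated. The objective uses the `alpha` / `l1_ratio` parameterization found in scikit-learn's `ElasticNet`. The function names and the toy data are assumptions for illustration, and the Technical Report's actual implementation may differ.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.5, l1_ratio=0.5, n_iter=100):
    """Fit Elastic Net coefficients by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - Xw||^2
              + alpha * l1_ratio * ||w||_1
              + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2.
    Assumes X and y are centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    l1 = alpha * l1_ratio
    l2 = alpha * (1.0 - l1_ratio)
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of feature j with the residual that excludes feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, l1) / (z + l2)
    return w


# Toy centered data (made up): y depends strongly on feature 0, weakly on feature 1.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]
coefs = elastic_net(X, y)
```

In this toy case the weak predictor's coefficient is driven to exactly zero and the strong predictor's coefficient is shrunk below its unpenalized value, which is the selection behavior that makes the method a reasonable screening step before a more structured model.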
We validate our results by comparing models using WHO indicators with models using IHME indicators, to check that conclusions remain stable across data sources. See the Validation Experiments for details.
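One simple way to express "stable across data sources" is the share of predictors whose estimated effects have the same sign in both models. The sketch below is a hypothetical check of that kind. The function name, the generic indicator names, and the coefficient values are invented for illustration and are not results from the Validation Experiments.

```python
def sign_agreement(coefs_a, coefs_b):
    """Share of shared predictors whose coefficients have the same sign.

    A coefficient of exactly zero is grouped with the negative values.
    """
    shared = sorted(set(coefs_a) & set(coefs_b))
    if not shared:
        raise ValueError("no predictors in common")
    same = sum(
        1 for name in shared
        if (coefs_a[name] > 0) == (coefs_b[name] > 0)
    )
    return same / len(shared)


# Made-up coefficients from two hypothetical model fits.
who_fit = {"indicator_a": 0.40, "indicator_b": 0.25, "indicator_c": -0.10}
ihme_fit = {"indicator_a": 0.35, "indicator_b": 0.30, "indicator_c": 0.05}

agreement = sign_agreement(who_fit, ihme_fit)
```

A fuller comparison would also look at effect magnitudes and overlapping credible intervals. Sign agreement is only the coarsest summary.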
Data Sources
WHO Global Health Observatory (GHO) API: Provides HALE, life expectancy, and various cause-specific mortality indicators
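IHME Global Burden of Disease: Provides HALE (2000-2023) and additional cause-specific mortality indicators with better temporal coverage than the WHO data
Our World in Data (OWID): Provides the life expectancy data (2000-2023) used in the primary analysis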
Key Findings
The analysis identifies several key factors that contribute to the gender gap in life expectancy and HALE:
Primary Drivers (External Causes):
Road traffic accidents: The single strongest predictor. Countries where men die in traffic accidents at much higher rates than women have substantially larger longevity gaps
Suicide: The second or third strongest factor across both HALE and Life Expectancy
Homicide: A substantial contributor, especially for total lifespan
Drug disorders: A major factor, particularly important in countries experiencing opioid epidemics
Chronic Disease Contributors:
Chronic respiratory disease: Significant contributor, possibly reflecting smoking-related differences
Cancer (neoplasms): Moderate contributor to both HALE and Life Expectancy gaps
Cardiovascular disease & diabetes: Show “competing risk” effects, acting as apparent protective factors because they primarily affect people who survive to older ages
United States Example: If the USA could achieve best-in-class levels for key factors, the Life Expectancy gap could be reduced by approximately 2.5 years through improvements in:
Road traffic safety (-0.83 years)
Drug overdose prevention (-0.77 years)
Suicide prevention (-0.52 years)
Violence reduction (-0.29 years)
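As a quick consistency check, the four listed contributions can be added up and compared with the approximate total quoted above. The short sketch below does that arithmetic using the magnitudes from the list. The four items sum to 2.41 years. The source does not say whether the difference from the approximately 2.5-year figure reflects rounding or smaller factors that are not listed.

```python
# Per-factor reductions in the US Life Expectancy gap, in years.
# Magnitudes copied from the list above (shown there as negative changes).
reductions = {
    "Road traffic safety": 0.83,
    "Drug overdose prevention": 0.77,
    "Suicide prevention": 0.52,
    "Violence reduction": 0.29,
}

total = sum(reductions.values())
shares = {name: value / total for name, value in reductions.items()}

print(f"Total reduction from listed factors: {total:.2f} years")
for name, share in shares.items():
    print(f"{name}: {share:.0%} of the listed total")
```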
For detailed findings, counterfactual analysis, and uncertainty quantification, see the Executive Summary or the full Bayesian Panel Data Model (2023) report. For comparison with the previous WHO HALE-based analysis, see the 2021 Legacy Report. For exploratory analysis using Elastic Net regression, see the Technical Report.