
Model Validation: Replacing WHO Indicators with IHME Indicators

Purpose

This document compares the results of replacing WHO indicators with IHME indicators one at a time to validate that data source changes don’t introduce unexpected artifacts or substantially alter model conclusions. Each replacement is tested independently by re-running the analysis and comparing results to the baseline (all WHO indicators).

Comparing each replacement against the baseline serves as a form of model validation, ensuring that:

Validation Framework

For each indicator replacement, we compare the following metrics:

Model Performance:

Feature Importance:

Counterfactual Analysis:

Key Questions for Each Replacement:

  1. Does the indicator maintain its relative importance ranking?

  2. Are the counterfactual effects similar in magnitude?

  3. Do the model performance metrics (R², MAE) change significantly?

  4. Are definitional differences between WHO and IHME indicators understood?
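The 10% threshold used throughout this document can be written as a small helper. This is a minimal sketch, assuming the threshold applies to the relative change in magnitude with respect to the baseline, which matches how counterfactual changes are reported below (for example, -0.39 → -0.682 years is described as a 75% increase). The function names are hypothetical, not part of the project's code.

```python
def relative_change(baseline: float, new: float) -> float:
    """Relative change in magnitude versus the baseline (0.10 means +10%).

    Magnitudes are compared so that a counterfactual effect moving from
    -0.39 to -0.682 years counts as an increase, as in the text.
    """
    if baseline == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (abs(new) - abs(baseline)) / abs(baseline)


def exceeds_threshold(baseline: float, new: float, threshold: float = 0.10) -> bool:
    """True if the metric moved by more than `threshold` in either direction."""
    return abs(relative_change(baseline, new)) > threshold


# Values taken from the comparisons in this document.
print(f"{relative_change(2.2, 1.58):+.1%}")      # Homicide importance, HALE: -28.2%
print(f"{relative_change(-0.39, -0.682):+.1%}")  # Suicide counterfactual, LE: +74.9%
print(exceeds_threshold(-0.79, -0.77))           # Suicide counterfactual, HALE: False
```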

Alcohol: WHO → IHME

Replacement Details

  1. Does Alcohol remain the most important indicator for Life Expectancy gap?

  2. Does Alcohol remain the second most important indicator for HALE gap (after Neoplasms)?

  3. Are the counterfactual effects similar in magnitude?

  4. Do the model performance metrics (R², MAE) change significantly?

Baseline Results (WHO Alcohol)

Life Expectancy Gap Model

Counterfactual Analysis (USA):

HALE Gap Model

Counterfactual Analysis (USA):

New Results (IHME Alcohol)

Life Expectancy Gap Model

Counterfactual Analysis (USA):

HALE Gap Model

Counterfactual Analysis (USA):

Comparison and Conclusions

Major Differences Identified

1. Alcohol Gap Values Are Dramatically Different

Critical Finding: The alcohol gap values are completely different between WHO and IHME data:

This represents an 86% reduction in the measured gap. This is not a small difference; it points to a fundamental difference in how the two sources define or measure alcohol-related mortality.

Possible Explanations:

2. Alcohol Importance Dropped Substantially

Life Expectancy:

HALE:

This is a major change that exceeds the 10% threshold for significant differences defined in the validation criteria.

3. Model Performance Improved

Both models show improved performance with IHME data:

This improvement may reflect:

4. Counterfactual Effects Reduced

Life Expectancy:

HALE:

The counterfactual effects are substantially smaller, reflecting the much smaller alcohol gaps in the IHME data.
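The proportionality described above follows directly if the counterfactual is computed by setting a country's indicator gap to zero in a linear model: the predicted change is then the coefficient times the gap being removed. The document does not spell out its counterfactual procedure, so the sketch below is an assumption (a linear model with a hypothetical coefficient), meant only to illustrate why smaller gaps yield smaller effects and why an unselected indicator yields an effect of exactly zero.

```python
def counterfactual_effect(coefs: dict[str, float], features: dict[str, float],
                          indicator: str) -> float:
    """Change in a linear model's prediction when `indicator` is set to zero.

    Assumes prediction = sum(coef * feature). An indicator with a zero
    coefficient (not selected) therefore has zero counterfactual effect.
    """
    def predict(x: dict[str, float]) -> float:
        return sum(coefs.get(name, 0.0) * value for name, value in x.items())

    counterfactual = {**features, indicator: 0.0}
    return predict(counterfactual) - predict(features)


# Hypothetical coefficient and gap values, chosen only to show the scaling.
coefs = {"Alcohol": 0.25}
print(counterfactual_effect(coefs, {"Alcohol": 4.0}, "Alcohol"))  # -1.0
print(counterfactual_effect(coefs, {"Alcohol": 2.0}, "Alcohol"))  # -0.5: half the gap, half the effect
```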

5. Ranking Changes

Life Expectancy top indicators:

HALE top indicators:

Neoplasms remains the top indicator for HALE, but Alcohol has dropped out of the top 3 for both models.
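Rank positions such as "#1 to #4" and top-3 membership can be derived mechanically from the importance values. A minimal sketch, assuming importance is a non-negative number per indicator and that zero means the indicator was not selected; the function names and the toy values are hypothetical, not results from this analysis.

```python
def rank_indicators(importance: dict[str, float]) -> dict[str, int]:
    """Map each selected indicator to its 1-based rank by importance (largest first).

    Indicators with zero importance (not selected by Elastic Net) are excluded.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    selected = [(name, value) for name, value in importance.items() if value > 0]
    ordered = sorted(selected, key=lambda item: (-item[1], item[0]))
    return {name: position for position, (name, _) in enumerate(ordered, start=1)}


def in_top_k(importance: dict[str, float], indicator: str, k: int = 3) -> bool:
    """True if `indicator` is selected and ranks within the top `k`."""
    return rank_indicators(importance).get(indicator, k + 1) <= k


# Toy importance values for illustration only.
toy = {"Neoplasms": 3.0, "Alcohol": 2.5, "Cardiovascular": 1.0, "Poisoning": 0.0}
print(rank_indicators(toy))  # {'Neoplasms': 1, 'Alcohol': 2, 'Cardiovascular': 3}
print(in_top_k(toy, "Alcohol"), in_top_k(toy, "Poisoning"))  # True False
```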

Answers to Key Questions

  1. Does Alcohol remain the most important indicator for Life Expectancy gap?

    • NO — Alcohol dropped from #1 to #4, with importance reduced by 87.6%

  2. Does Alcohol remain the second most important indicator for HALE gap?

    • NO — Alcohol dropped from #2 to #5, with importance reduced by 87.9%

  3. Are the counterfactual effects similar in magnitude?

    • NO — Counterfactual effects reduced by 58-61%, reflecting much smaller alcohol gaps in IHME data

  4. Do the model performance metrics (R², MAE) change significantly?

    • YES, but positively — Both R² and MAE improved, suggesting better model fit with IHME data
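Because "improved" means opposite directions for the two metrics (R² should rise, MAE should fall), it helps to make the comparison explicit. A minimal sketch with the textbook definitions; the document does not state how its metrics are computed (for example, in-sample or cross-validated), and the helper names are hypothetical.

```python
from statistics import fmean


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute prediction error, in the outcome's units (years)."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def performance_improved(old: tuple[float, float],
                         new: tuple[float, float]) -> tuple[bool, bool]:
    """Compare (R², MAE) pairs: higher R² is better, lower MAE is better."""
    (old_r2, old_mae), (new_r2, new_mae) = old, new
    return new_r2 > old_r2, new_mae < old_mae
```

Both metrics are only comparable across runs when they are computed on the same set of countries; otherwise a change in coverage can look like a change in fit.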

Implications

The replacement of WHO alcohol data with IHME alcohol data has substantial impacts on model results:

  1. Alcohol is no longer the dominant factor — The dramatic reduction in alcohol importance suggests that the WHO “alcohol-attributable all-cause deaths” definition captures a much broader set of alcohol-related mortality than IHME “alcohol use disorders.”

  2. Other indicators gain importance — With Alcohol’s reduced importance, other indicators (Neoplasms, UnintentionalInjury, ChronicRespiratory) become relatively more important.

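  3. Model performance improved — Despite the change in Alcohol’s role, overall model performance improved. This may indicate that the IHME data are more consistent or of higher quality, although a change in the set of countries with complete data could also shift R² and MAE.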

  4. Counterfactual analysis implications — The much smaller alcohol gaps in IHME data suggest that either:

    • The IHME definition is more restrictive (only direct alcohol use disorders)

    • The WHO definition is more comprehensive (includes all alcohol-attributable deaths)

    • There are methodological differences in how the two organizations estimate alcohol-related mortality

Recommendations

  1. Investigate definitional differences — The 86% difference in alcohol gap values requires investigation into how WHO and IHME define and measure alcohol-related mortality.

  2. Consider using both definitions — Depending on the research question, one definition may be more appropriate:

    • WHO definition (alcohol-attributable all-cause deaths): Better for understanding the full burden of alcohol on mortality

    • IHME definition (alcohol use disorders): Better for understanding direct alcohol-related health conditions

  3. Document the choice — The choice between WHO and IHME alcohol data significantly affects model conclusions. This choice should be clearly documented and justified based on the research question.

  4. Update reporting — If using IHME data, the conclusions in hale_gaps.md need to be updated to reflect that Alcohol is no longer the dominant factor, and other indicators (particularly Neoplasms and UnintentionalInjury) are relatively more important.

  5. Validate other indicators — Before replacing other indicators (suicide, homicide, road traffic), validate that the definitional differences are understood and acceptable.

Next Steps

  1. Compare WHO and IHME alcohol data definitions and methodologies

  2. Check country coverage differences

  3. Decide whether to use IHME or WHO alcohol data based on research objectives

  4. If using IHME, update hale_gaps.md with new results

  5. Proceed with caution when replacing other indicators
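Checking country coverage can start with set operations on the country codes each source provides, followed by restricting both model variants to the shared countries so that their metrics are compared on the same sample. A minimal sketch; the helper names are hypothetical and the country codes are placeholders, not the actual coverage of either source.

```python
def coverage_report(who: set[str], ihme: set[str]) -> dict[str, set[str]]:
    """Split country codes into shared and source-specific groups."""
    return {"both": who & ihme, "who_only": who - ihme, "ihme_only": ihme - who}


def restrict_to(values: dict[str, float], countries: set[str]) -> dict[str, float]:
    """Keep only the countries in `countries`, e.g. the shared sample."""
    return {code: value for code, value in values.items() if code in countries}


# Placeholder country codes, not the actual coverage of either source.
report = coverage_report({"USA", "FRA", "JPN"}, {"USA", "FRA", "BRA"})
print(sorted(report["both"]), sorted(report["who_only"]), sorted(report["ihme_only"]))
# ['FRA', 'USA'] ['JPN'] ['BRA']
```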


Suicide: WHO → IHME

Replacement Details

Key Questions

  1. Does Suicide maintain its relative importance ranking?

  2. Are the counterfactual effects similar in magnitude?

  3. Do the model performance metrics (R², MAE) change significantly?

  4. Are definitional differences between WHO and IHME indicators understood?

Baseline Results (WHO Suicide)

Life Expectancy Gap Model:

HALE Gap Model:

New Results (IHME Self-Harm)

Life Expectancy Gap Model:

Counterfactual Analysis (USA):

HALE Gap Model:

Counterfactual Analysis (USA):

Comparison and Conclusions

Major Differences Identified

1. Suicide Importance Increased

Life Expectancy:

HALE:

This is a significant increase that exceeds the 10% threshold for significant differences, though the magnitude is smaller than the alcohol change.

2. Suicide Gap Values Are Slightly Higher

This is a moderate difference, much smaller than the alcohol gap difference (86%). The target gap also increased slightly (4.0 → 4.51, +13%).

3. Model Performance Improved

Both models show improved performance with IHME data:

4. Counterfactual Effects

Life Expectancy:

HALE:

The Life Expectancy counterfactual effect increased substantially, while the HALE effect remained very similar.

Answers to Key Questions

  1. Does Suicide maintain its relative importance ranking?

    • PARTIALLY — Suicide importance increased significantly (+139% for LE, +42% for HALE), moving it into the top 4-5 indicators. It was not in the top 3 in the baseline, and remains outside the top 3 in the new results, but its relative importance has increased.

  2. Are the counterfactual effects similar in magnitude?

    • MIXED — For Life Expectancy, the counterfactual effect increased by 75% (-0.39 → -0.682 years). For HALE, the effect remained very similar (-0.79 → -0.77 years, -2.5% change).

  3. Do the model performance metrics (R², MAE) change significantly?

    • YES, but positively — Both R² and MAE improved for both models, suggesting better model fit with IHME data.

  4. Are definitional differences between WHO and IHME indicators understood?

    • PARTIALLY — Both measure intentional self-harm/suicide, but the 16% difference in gap values suggests there may be methodological differences in how the data is collected or estimated.

Implications

The replacement of WHO suicide data with IHME self-harm data has moderate impacts on model results:

  1. Suicide importance increased — The increase in importance (+139% for LE, +42% for HALE) suggests that IHME data may capture suicide-related mortality more effectively or consistently than WHO data, or that the slightly higher gap values in IHME data make suicide a more predictive factor.

  2. Counterfactual effects vary by outcome — The Life Expectancy counterfactual effect increased substantially (+75%), while the HALE effect remained nearly identical. This suggests that suicide may have a stronger relationship with overall life expectancy than with healthy life expectancy when using IHME data.

  3. Model performance improved — Overall model performance improved, suggesting the IHME data may be more consistent or higher quality.

  4. Gap values are similar but not identical — The 16% difference in suicide gap values is moderate compared to the 86% difference seen with alcohol, suggesting that WHO and IHME definitions of suicide/self-harm are more similar than their definitions of alcohol-related mortality.

Recommendations

  1. Investigate the 16% difference — While smaller than the alcohol difference, the 16% difference in suicide gap values should be understood. This may reflect:

    • Different data sources or estimation methods

    • Different classification systems for intentional self-harm

    • Temporal differences (IHME may use more recent data)

  2. Consider the increased importance — The substantial increase in suicide importance suggests that IHME data may be more predictive. This could be due to better data quality, more consistent methodology, or the slightly higher gap values making suicide a stronger predictor.

  3. Document the choice — The choice between WHO and IHME suicide data affects model conclusions, though less dramatically than the alcohol choice. Document the rationale for using IHME data (better temporal coverage, consistent methodology with other IHME indicators).

  4. Proceed with other replacements — The suicide replacement shows moderate but acceptable changes. The improvements in model performance and the reasonable similarity in gap values suggest that IHME data is a good alternative to WHO data for suicide/self-harm.


Homicide: WHO → IHME

Replacement Details

Key Questions

  1. Does Homicide maintain its relative importance ranking?

  2. Are the counterfactual effects similar in magnitude?

  3. Do the model performance metrics (R², MAE) change significantly?

  4. Are definitional differences between WHO and IHME indicators understood?

Baseline Results (WHO Homicide)

Life Expectancy Gap Model:

HALE Gap Model:

New Results (IHME Interpersonal Violence)

Life Expectancy Gap Model:

Counterfactual Analysis (USA):

HALE Gap Model:

Counterfactual Analysis (USA):

Comparison and Conclusions

Major Differences Identified

1. Homicide Dropped Out of Life Expectancy Model

Critical Finding: Homicide was not selected by Elastic Net for the Life Expectancy model when using IHME data. This means the model determined that homicide does not contribute significantly to explaining the Life Expectancy gap when using IHME data.
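Elastic Net "selects" an indicator when its fitted coefficient is nonzero; the L1 part of the penalty can set a coefficient exactly to zero when the indicator adds too little beyond the others. The pure-Python coordinate-descent sketch below illustrates that mechanism on standardized toy data. It assumes the objective documented for scikit-learn's `ElasticNet`, (1/(2n))·||y - Xb||² + alpha·l1_ratio·||b||₁ + 0.5·alpha·(1 - l1_ratio)·||b||², with no intercept (centered data). It is not the project's fitting code, and the `alpha` and `l1_ratio` values are arbitrary.

```python
def soft_threshold(value: float, threshold: float) -> float:
    """Shrink `value` toward zero by `threshold`; return exactly 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X: list[list[float]], y: list[float], alpha: float,
                l1_ratio: float, n_iter: int = 200) -> list[float]:
    """Cyclic coordinate descent for Elastic Net on centered data (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            # Covariance of column j with the partial residual (fit without j).
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_bj = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            if new_bj != beta[j]:
                delta = new_bj - beta[j]
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_bj
    return beta


# Toy standardized data: x1 drives the outcome, x2 is orthogonal to it.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]  # y = 2 * x1
print(elastic_net(X, y, alpha=0.5, l1_ratio=0.5))  # [1.4, 0.0]: x2 is not selected
```

The first coefficient is shrunk from 2.0 to 1.4 by the penalty, while the second is exactly zero, which is what "importance = 0 / not selected" corresponds to.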

Possible Explanations:

2. Homicide Importance Decreased for HALE

HALE:

This is a significant decrease that exceeds the 10% threshold for significant differences.

3. Model Performance Unchanged

Both models show the same performance as with the suicide replacement:

This suggests that the homicide replacement did not affect overall model performance, likely because homicide was not a major contributor to model fit.

4. Counterfactual Effects

Life Expectancy:

HALE:

The HALE counterfactual effect decreased, reflecting the lower importance of homicide in the model.

Answers to Key Questions

  1. Does Homicide maintain its relative importance ranking?

    • NO — Homicide dropped out of the Life Expectancy model entirely (not selected by Elastic Net), and decreased in importance for HALE (-28%, from 2.2 to 1.58).

  2. Are the counterfactual effects similar in magnitude?

    • NO — For Life Expectancy, the counterfactual effect is 0 (indicator not selected). For HALE, the effect decreased by 24% (-0.10 → -0.0758 years).

  3. Do the model performance metrics (R², MAE) change significantly?

    • NO — Model performance remained the same as with the suicide replacement, suggesting homicide was not a major contributor to model fit.

  4. Are definitional differences between WHO and IHME indicators understood?

    • PARTIALLY — Both measure intentional homicide/interpersonal violence, but the fact that IHME homicide was not selected for the Life Expectancy model suggests there may be meaningful differences in how the data is collected, estimated, or distributed.

Implications

The replacement of WHO homicide data with IHME interpersonal violence data has significant impacts on model results:

  1. Homicide is no longer a factor in Life Expectancy model — The fact that Elastic Net did not select homicide for the Life Expectancy model suggests that either:

    • IHME homicide data is less predictive than WHO data

    • Other indicators (particularly Suicide) capture the same variance

    • The data distributions or country coverage differ in ways that reduce homicide’s predictive power

  2. Homicide importance decreased for HALE — The 28% decrease in importance for HALE suggests that IHME homicide data is less predictive than WHO data, though it remains a selected indicator.

  3. Model performance unaffected — The fact that model performance did not change suggests that homicide was not a critical factor for model fit, and other indicators (particularly Suicide, which increased in importance) may capture similar variance.

  4. Gap values appear reasonable — The homicide gap value used in the USA counterfactual analysis (7.15) looks plausible, yet Elastic Net did not find homicide predictive enough to include in the Life Expectancy model.

Recommendations

  1. Investigate why homicide was not selected — The fact that homicide was not selected for the Life Expectancy model requires investigation:

    • Compare WHO and IHME homicide gap values and distributions

    • Check for multicollinearity with other indicators (particularly Suicide)

    • Verify country coverage differences

    • Examine whether IHME data quality or methodology differs significantly

  2. Consider the relationship with Suicide — The increase in Suicide importance (+139% for LE, +42% for HALE) may have come at the expense of Homicide. If the two indicators capture similar variance, the Elastic Net penalty may have retained Suicide and dropped Homicide as the less predictive of the pair.

  3. Document the choice — The choice between WHO and IHME homicide data affects model conclusions, particularly for Life Expectancy where homicide is no longer a factor. Document the rationale for using IHME data and note that homicide is not selected for the Life Expectancy model.

  4. Proceed with caution — The fact that homicide was not selected for the Life Expectancy model suggests that IHME homicide data may be less suitable than WHO data, or that the model structure has changed in ways that make homicide less relevant. Consider whether to use WHO homicide data for Life Expectancy if homicide is an important factor for the research question.


Road Traffic: WHO → IHME

Replacement Details

Key Questions

  1. Does RoadTraffic maintain its relative importance ranking?

  2. Are the counterfactual effects similar in magnitude?

  3. Do the model performance metrics (R², MAE) change significantly?

  4. Are definitional differences between WHO and IHME indicators understood?

Baseline Results (WHO Road Traffic)

Life Expectancy Gap Model:

HALE Gap Model:

New Results (IHME Road Injuries)

Life Expectancy Gap Model:

Counterfactual Analysis (USA):

HALE Gap Model:

Counterfactual Analysis (USA):

Comparison and Conclusions

Major Differences Identified

1. RoadTraffic Has Very Low Importance

Life Expectancy:

HALE:

RoadTraffic has very low importance in both models, suggesting it is not a major predictive factor for either Life Expectancy or HALE gaps.

2. Model Performance

Life Expectancy:

HALE:

The HALE model performance improved substantially with the road traffic replacement, while Life Expectancy performance remained similar to previous replacements.

3. Counterfactual Effects

Life Expectancy:

HALE:

The HALE counterfactual effect increased slightly, while the Life Expectancy effect is very small.

4. Gap Component Not Selected for Life Expectancy

Critical Finding: For the Life Expectancy model, only the Mid component of RoadTraffic was selected (0.111), while the Gap component was not selected (0). This suggests that:

Answers to Key Questions

  1. Does RoadTraffic maintain its relative importance ranking?

    • YES, but with very low importance — RoadTraffic has very low importance in both models (0.111 for LE, 0.633 for HALE), ranking #8-9. It was not in the top 3 in the baseline, and remains outside the top 3 in the new results.

  2. Are the counterfactual effects similar in magnitude?

    • MIXED — For Life Expectancy, the counterfactual effect is very small (-0.0391 years). For HALE, the effect is similar to baseline (-0.20 → -0.226 years, +13% increase).

  3. Do the model performance metrics (R², MAE) change significantly?

    • YES, but positively — Both R² and MAE improved for both models compared to baseline. HALE R² improved substantially (+10.8%), while Life Expectancy R² improved modestly (+2.2%).

  4. Are definitional differences between WHO and IHME indicators understood?

    • PARTIALLY — Both measure road traffic crash/injury deaths, but WHO data is age-standardized for ages 15+ while IHME covers all ages. The very low importance suggests that road traffic may not be a major factor in explaining gender gaps, or that the age standardization difference affects the predictive power.
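The age-range difference matters because a directly age-standardized rate is a weighted average of age-specific rates, and restricting to ages 15+ drops the youngest groups and renormalizes the remaining weights. A minimal sketch of direct standardization; the function name, age groups, rates, and weights are hypothetical placeholders (not the WHO standard population), and whether the IHME all-ages figures are crude or standardized is not stated in this document.

```python
def age_standardized_rate(rates: dict[int, float], weights: dict[int, float],
                          min_age: int = 0) -> float:
    """Direct age standardization over age groups starting at or above `min_age`.

    `rates` and `weights` are keyed by the lower bound of each age group; the
    weights of the retained groups are renormalized to sum to one.
    """
    groups = [age for age in rates if age >= min_age]
    total_weight = sum(weights[age] for age in groups)
    return sum(rates[age] * weights[age] for age in groups) / total_weight


# Hypothetical age groups (lower bounds), rates per 100,000, and weights.
rates = {0: 2.0, 15: 20.0, 65: 10.0}
weights = {0: 0.25, 15: 0.5, 65: 0.25}
print(age_standardized_rate(rates, weights))                # all ages: 13.0
print(round(age_standardized_rate(rates, weights, 15), 2))  # ages 15+: 16.67
```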

Implications

The replacement of WHO road traffic data with IHME road injuries data has minimal impacts on model results:

  1. RoadTraffic has very low importance — The very low importance values (0.111 for LE, 0.633 for HALE) suggest that road traffic deaths are not a major predictive factor for gender gaps in Life Expectancy or HALE, at least when using IHME data.

  2. Gap component not selected for Life Expectancy — The fact that only the Mid component was selected for Life Expectancy suggests that the gender gap in road traffic deaths does not contribute to explaining the Life Expectancy gap when using IHME data. This may reflect:

    • The age standardization difference (WHO: 15+, IHME: all ages)

    • Different data distributions or country coverage

    • The gender gap in road traffic deaths may be less predictive than the average rate

  3. Model performance improved — Overall model performance improved, particularly for HALE (+10.8% R² improvement). This suggests that IHME data may be more consistent or higher quality, even though RoadTraffic itself has low importance.

  4. Counterfactual effects are small — The counterfactual effects are small for both models, reflecting the low importance of RoadTraffic. The HALE effect is slightly larger than baseline (+13%), but still relatively small.
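This document does not define the Gap and Mid components. A natural reading, used here purely as an assumption, is that Gap is the male-minus-female rate and Mid is the average of the two. Under that reading, the two components carry the same information as the male and female rates, only re-expressed, so the model can keep the overall level (Mid) while dropping the sex difference (Gap). The function names are hypothetical.

```python
def gap_mid(male: float, female: float) -> tuple[float, float]:
    """Hypothetical decomposition: gap = male - female, mid = their average."""
    return male - female, (male + female) / 2


def male_female(gap: float, mid: float) -> tuple[float, float]:
    """Inverse of gap_mid: recover the sex-specific rates."""
    return mid + gap / 2, mid - gap / 2


print(gap_mid(30.0, 10.0))      # (20.0, 20.0)
print(male_female(20.0, 20.0))  # (30.0, 10.0)
```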

Recommendations

  1. Accept the low importance — The very low importance of RoadTraffic suggests it is not a major factor in explaining gender gaps. This is acceptable and may reflect that road traffic deaths, while important for overall mortality, do not contribute significantly to gender gaps in Life Expectancy or HALE.

  2. Consider age standardization — The fact that WHO data is age-standardized for ages 15+ while IHME covers all ages may affect the predictive power. However, given the very low importance, this difference is unlikely to be critical.

  3. Document the choice — The choice between WHO and IHME road traffic data has minimal impact on model conclusions due to the low importance of RoadTraffic. Document the rationale for using IHME data (better temporal coverage, consistent methodology with other IHME indicators).

  4. Proceed with confidence — The road traffic replacement shows minimal changes and improved model performance. The IHME data appears to be a good alternative to WHO data for road traffic, though RoadTraffic itself is not a major factor in the models.


Removing WHO Poisoning: Keeping Only IHME DrugDisorder

Replacement Details

Key Questions

  1. Does removing Poisoning affect model performance?

  2. Does DrugDisorder maintain its importance (or gain importance)?

  3. Are there any changes in other indicators’ importance?

Baseline Results (Both Poisoning and DrugDisorder)

Life Expectancy Gap Model:

HALE Gap Model:

New Results (Only DrugDisorder, No Poisoning)

Life Expectancy Gap Model:

HALE Gap Model:

Comparison and Conclusions

Major Findings

1. No Impact on Model Performance

Both models show identical performance before and after removing Poisoning:

2. Neither Indicator Was Selected

Critical Finding: Both Poisoning and DrugDisorder had importance = 0 in the baseline model, meaning Elastic Net did not select either indicator. After removing Poisoning, DrugDisorder still has importance = 0, meaning it is still not selected.

This indicates that:

3. Counterfactual Effects

Life Expectancy:

HALE:

Answers to Key Questions

  1. Does removing Poisoning affect model performance?

    • NO — Model performance is identical (R² and MAE unchanged). This is expected since Poisoning was not selected by Elastic Net in the baseline.

  2. Does DrugDisorder maintain its importance (or gain importance)?

    • NO CHANGE — DrugDisorder still has importance = 0 (not selected). Because Poisoning was not selected in the baseline, removing it frees no explained variance for DrugDisorder to absorb, so this result does not show whether the two indicators overlap.

  3. Are there any changes in other indicators’ importance?

    • NO — All other indicators maintain the same importance values as in the baseline (with Road Traffic replacement).

Implications

The removal of WHO Poisoning has no impact on model results:

  1. Poisoning was not contributing — The fact that Poisoning had importance = 0 in the baseline means it was not selected by Elastic Net and was not contributing to model fit. Removing it has no effect.

  2. DrugDisorder also not contributing — DrugDisorder also has importance = 0, meaning it is not selected by Elastic Net either. This suggests that drug-related mortality (whether captured by Poisoning or DrugDisorder) does not contribute significantly to explaining gender gaps in Life Expectancy or HALE.

  3. Redundancy cannot be assessed — Because Poisoning was not selected in the baseline, DrugDisorder’s unchanged importance does not show whether the two indicators capture the same variance. Since neither is selected, this is not a critical finding.

  4. Model is robust — The model performance is unchanged, confirming that neither indicator was important for model fit.

Recommendations

  1. Accept the removal — Removing Poisoning has no negative impact since it wasn’t contributing to the model. The model now uses only DrugDisorder (IHME), which provides better temporal coverage.

  2. Note that DrugDisorder is also not selected — While DrugDisorder remains in the candidate indicator set, it is not selected by Elastic Net (importance = 0). This suggests that drug-related mortality may not be a major factor in explaining gender gaps, at least with the current data and model structure.

  3. Document the choice — The removal of Poisoning is justified by:

    • Better temporal coverage in DrugDisorder (1990-2023 vs 2000-2021)

    • More comprehensive capture of drug overdose deaths

    • No impact on model performance (since Poisoning wasn’t selected)

  4. Consider future analysis — If drug-related mortality becomes more important in future analyses or with different model specifications, both indicators could be re-evaluated. However, for the current analysis, neither contributes significantly.


Adding Liver Disease Indicator (IHME)

Addition Details

Key Questions

  1. Does adding Liver Disease improve model performance?

  2. What is the importance of Liver Disease relative to other indicators?

  3. Are the counterfactual effects meaningful?

  4. How does Liver Disease relate to Alcohol (since many liver disease deaths are alcohol-related)?

Baseline Results (Before Adding Liver Disease)

Life Expectancy Gap Model:

HALE Gap Model:

New Results (With Liver Disease Added)

Life Expectancy Gap Model:

Counterfactual Analysis (USA):

HALE Gap Model:

Counterfactual Analysis (USA):

Comparison and Conclusions

Major Findings

1. Liver Disease Has Moderate Importance

Life Expectancy:

HALE:

Liver Disease has moderate importance in both models, suggesting it contributes meaningfully to explaining gender gaps in Life Expectancy and HALE.

2. Model Performance Changes

Life Expectancy:

HALE:

3. Counterfactual Effects Are Meaningful

Life Expectancy:

HALE:

4. Relationship to Alcohol

Key Observation: Liver Disease (importance 2.12 for LE, 2.39 for HALE) has higher importance than Alcohol (importance 1.62 for LE, 1.8 for HALE) in both models. This is interesting because:

5. Impact on Other Indicators

Life Expectancy:

HALE:

The addition of Liver Disease appears to have redistributed some importance, particularly affecting Alcohol and Cardiovascular indicators.
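Why adding an indicator redistributes importance can be seen with two standardized predictors: the least-squares coefficient of each depends on its own correlation with the outcome and on the correlation between the two predictors, b1 = (r1y - r12·r2y) / (1 - r12²). The sketch below ignores the Elastic Net penalty, and the correlations are made-up values chosen to mimic an alcohol indicator that overlaps with liver disease; the function name is hypothetical.

```python
def standardized_coefficients(r1y: float, r2y: float, r12: float) -> tuple[float, float]:
    """OLS coefficients for two standardized predictors, from pairwise correlations."""
    denom = 1.0 - r12 * r12
    if denom == 0.0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (r1y - r12 * r2y) / denom
    b2 = (r2y - r12 * r1y) / denom
    return b1, b2


# Hypothetical correlations, not estimates from this analysis.
print(standardized_coefficients(r1y=0.6, r2y=0.7, r12=0.0))  # uncorrelated: (0.6, 0.7)
b_alcohol, b_liver = standardized_coefficients(r1y=0.6, r2y=0.7, r12=0.5)
print(round(b_alcohol, 3), round(b_liver, 3))  # 0.333 0.533
```

With a single standardized predictor the coefficient equals its correlation with the outcome (0.6 here); once a correlated predictor is added, part of that weight moves to the new predictor.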

Answers to Key Questions

  1. Does adding Liver Disease improve model performance?

    • MIXED — Life Expectancy R² improved slightly (+0.8%), but MAE increased slightly (+2.9%). HALE R² decreased (-5.9%), but MAE improved (-5.0%). The changes are relatively small, suggesting that Liver Disease adds some predictive power but doesn’t dramatically change model performance.

  2. What is the importance of Liver Disease relative to other indicators?

    • MODERATE — Liver Disease ranks #4 for Life Expectancy (importance 2.12) and #6 for HALE (importance 2.39). It has moderate importance, ranking above Alcohol in both models.

  3. Are the counterfactual effects meaningful?

    • YES — Counterfactual effects are moderate (-0.213 years for LE, -0.2 years for HALE), suggesting that reducing liver disease gender gaps could meaningfully reduce overall gender gaps.

  4. How does Liver Disease relate to Alcohol?

    • COMPLEX — Liver Disease has higher importance than Alcohol in both models, which is interesting because many liver disease deaths are alcohol-related. However, Liver Disease captures all liver disease deaths (alcoholic and non-alcoholic), while Alcohol (IHME) only captures direct alcohol use disorder deaths. This suggests that Liver Disease may be capturing some of the alcohol-related mortality that was previously captured by WHO’s broader “alcohol-attributable” definition.

Implications

The addition of Liver Disease as an indicator has moderate impacts on model results:

  1. Liver Disease has moderate importance — The moderate importance values (2.12 for LE, 2.39 for HALE) suggest that liver disease contributes meaningfully to explaining gender gaps in Life Expectancy and HALE.

  2. Model performance changes are small — The changes in R² and MAE are relatively small, suggesting that Liver Disease adds some predictive power but doesn’t dramatically change model performance.

  3. Counterfactual effects are meaningful — The counterfactual effects (-0.213 years for LE, -0.2 years for HALE) suggest that reducing liver disease gender gaps could meaningfully reduce overall gender gaps.

  4. Relationship to Alcohol is complex — Liver Disease has higher importance than Alcohol in both models, which may reflect that Liver Disease captures a broader set of alcohol-related mortality than the narrow IHME “alcohol use disorders” definition.

  5. Some redistribution of importance — The addition of Liver Disease appears to have redistributed some importance, particularly affecting Alcohol and Cardiovascular indicators.

Recommendations

  1. Keep Liver Disease in the model — The moderate importance and meaningful counterfactual effects suggest that Liver Disease should be included in the model.

  2. Consider the relationship to Alcohol — The fact that Liver Disease has higher importance than Alcohol suggests that it may be capturing some of the alcohol-related mortality that was previously captured by WHO’s broader “alcohol-attributable” definition. This is consistent with the understanding that many liver disease deaths are alcohol-related.

  3. Document the choice — The addition of Liver Disease adds a meaningful indicator that captures an important cause of death with good temporal coverage and country coverage. Document the rationale for including it.

  4. Monitor model performance — The small changes in model performance suggest that Liver Disease adds value without dramatically changing the model. Continue to monitor model performance as other indicators are added or modified.


Removing Maternal Mortality Indicator

Removal Details

Key Questions

  1. Does removing Maternal Mortality affect model performance (R², MAE)?

  2. How do other indicators’ importance values change after removal?

  3. Are there any changes in the ranking of top indicators?

  4. Does removing Maternal Mortality improve model interpretability (by removing counterintuitive associations)?

Baseline Results (With Maternal Mortality)

Life Expectancy Gap Model:

HALE Gap Model:

New Results (Without Maternal Mortality)

Life Expectancy Gap Model:

Counterfactual Analysis (USA):

HALE Gap Model:

Counterfactual Analysis (USA):

Comparison and Conclusions

Major Findings

1. Model Performance Changes Are Small

Life Expectancy:

HALE:

2. Importance Redistribution After Removal

Life Expectancy:

HALE:

3. Cardiovascular and Homicide Gained Substantial Importance in HALE Model

Critical Finding: After removing MaternalMortality, Cardiovascular and Homicide showed substantial increases in importance in the HALE model:

This suggests that MaternalMortality may have been capturing some variance that is now being captured by Cardiovascular and Homicide. This could indicate:

4. Homicide Was Newly Selected for Life Expectancy Model

Life Expectancy:

This suggests that MaternalMortality may have been suppressing Homicide’s selection in the Life Expectancy model, possibly due to multicollinearity or shared variance.

5. Counterfactual Effects

Because MaternalMortality was removed, it has no counterfactual effect to compare. Removing it also eliminates its counterintuitive positive counterfactual effect, which improves model interpretability.

Answers to Key Questions

  1. Does removing Maternal Mortality affect model performance (R², MAE)?

    • MINIMAL IMPACT — Life Expectancy R² decreased slightly (-0.6%), while HALE R² improved slightly (+2.5%). MAE increased slightly for both models (+7.2% for LE, +3.7% for HALE). The changes are relatively small, suggesting MaternalMortality was not critical for model fit.

  2. How do other indicators’ importance values change after removal?

    • MIXED — For Life Expectancy, most indicators showed small decreases in importance, while Homicide was newly selected. For HALE, Cardiovascular and Homicide showed substantial increases (+111% and +60% respectively), while other indicators showed small changes.

  3. Are there any changes in the ranking of top indicators?

    • YES, for HALE — Cardiovascular moved from #4 to #2, and Homicide moved from #8 to #5. For Life Expectancy, the top rankings remained similar, with Homicide newly entering at #7.

  4. Does removing Maternal Mortality improve model interpretability?

    • YES — Removing the counterintuitive positive coefficient for MaternalMortality improves model interpretability. The fact that Cardiovascular and Homicide gained importance after removal suggests that MaternalMortality may have been capturing spurious associations related to general healthcare quality.
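Question 3 compares indicator rankings between runs. Below is a minimal sketch of that comparison. It assumes importance scores are available as one dictionary per run. The function names and the toy scores are hypothetical and are not this model's output. An indicator outside the top `top_n`, or removed entirely, is reported as `None`. This covers cases such as Homicide newly entering the Life Expectancy ranking.

```python
def ranks(importance, top_n=10):
    """Map indicator -> 1-based rank by descending importance, keeping only the top_n."""
    ordered = sorted(importance, key=importance.get, reverse=True)[:top_n]
    return {name: i + 1 for i, name in enumerate(ordered)}


def rank_changes(baseline, new, top_n=10):
    """Return {indicator: (old_rank, new_rank)} for every indicator whose rank moved.

    None means "not in the top_n" (or absent from that run entirely).
    """
    before, after = ranks(baseline, top_n), ranks(new, top_n)
    changes = {}
    for name in sorted(set(before) | set(after)):
        old_rank, new_rank = before.get(name), after.get(name)
        if old_rank != new_rank:
            changes[name] = (old_rank, new_rank)
    return changes


# Toy scores only; "removed" plays the role of the dropped indicator
baseline = {"ind_a": 9.0, "removed": 6.0, "ind_b": 5.0, "ind_c": 4.0, "ind_d": 1.0}
new = {"ind_a": 9.5, "ind_c": 6.5, "ind_b": 4.5, "ind_d": 1.2}
print(rank_changes(baseline, new))
print(rank_changes(baseline, new, top_n=3))
```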

Implications

The removal of Maternal Mortality has moderate impacts on model results:

  1. Model performance is largely unchanged — The small changes in R² and MAE suggest that MaternalMortality was not critical for model fit, supporting the decision to remove it due to the counterintuitive coefficient.

  2. Cardiovascular and Homicide gained substantial importance in HALE model — The large increases in importance for Cardiovascular (+111%) and Homicide (+60%) suggest that MaternalMortality may have been suppressing these indicators, possibly due to multicollinearity or shared variance related to healthcare quality.

  3. Homicide was newly selected for Life Expectancy model — This suggests that MaternalMortality was suppressing Homicide’s selection, and removing it allows Homicide to contribute to the model.

  4. Removal improves interpretability — Removing the counterintuitive positive coefficient for MaternalMortality improves model interpretability, as higher female mortality should close the gap, not widen it.

  5. Spurious association hypothesis supported — The fact that removing MaternalMortality allows other indicators (particularly Cardiovascular and Homicide) to gain importance supports the hypothesis that MaternalMortality was capturing a spurious association related to general healthcare quality rather than a direct causal relationship.

Recommendations

  1. Keep Maternal Mortality removed — The removal of MaternalMortality improves model interpretability by eliminating the counterintuitive positive coefficient. The small impact on model performance and the redistribution of importance to other indicators (particularly Cardiovascular and Homicide) support this decision.

  2. Investigate the relationship with Cardiovascular and Homicide — The substantial increases in importance for Cardiovascular and Homicide after removing MaternalMortality suggest there may be shared variance related to healthcare quality. This relationship should be investigated further.

  3. Document the rationale — The removal of MaternalMortality is justified by:

    • Counterintuitive positive coefficient (higher female mortality should close the gap, not widen it)

    • Minimal impact on model performance

    • Improvement in model interpretability

    • Redistribution of importance to other indicators that may better capture the underlying relationships

  4. Monitor model performance — The small changes in model performance suggest that removing MaternalMortality does not harm model fit. Continue to monitor model performance as other indicators are added or modified.

  5. Consider the healthcare quality proxy hypothesis — The fact that Cardiovascular and Homicide gained importance after removing MaternalMortality supports the hypothesis that MaternalMortality was acting as a proxy for general healthcare quality. This relationship should be considered when interpreting model results.


Replacing WHO Under-Five Mortality with IHME All-Cause Under 5

Replacement Details

Key Questions

  1. Does replacing WHO U5MR with IHME All-Cause Under 5 affect model performance (R², MAE)?

  2. How does the Childhood indicator’s importance change after replacement?

  3. Are there any changes in the ranking of top indicators?

  4. Does the replacement improve model interpretability or temporal coverage?

Baseline Results (With WHO U5MR)

Life Expectancy Gap Model:

HALE Gap Model:

New Results (With IHME All-Cause Under 5)

Life Expectancy Gap Model:

Counterfactual Analysis (USA):

HALE Gap Model:

Counterfactual Analysis (USA):

Comparison and Conclusions

Major Findings

1. Significant Increase in Childhood Indicator Importance (But Potentially Spurious)

The replacement of WHO U5MR with IHME All-Cause Under 5 led to a dramatic increase in the Childhood indicator’s importance:

However, this increase may be spurious due to confounding: the IHME indicator (deaths per 100,000 population) is confounded with age structure and fertility rates. Countries with more people of child-bearing age and higher fertility will have more people under 5 in the population, and therefore more deaths under 5, even if the underlying risk of death for children is the same. The WHO indicator (deaths per 1,000 live births) controls for these factors by using live births as the denominator, making it a more direct measure of early-life mortality risk.
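A worked example makes the denominator effect concrete. The sketch assumes two hypothetical countries with the same population and the same child mortality risk (5 deaths per 1,000 live births) but different numbers of births. It also simplifies by treating annual under-5 deaths as risk times annual births. In practice, the WHO U5MR is a life-table probability, so this only approximates both indicators.

```python
def per_1000_live_births(deaths_under5, live_births):
    """WHO-style denominator: deaths per 1,000 live births."""
    return 1000 * deaths_under5 / live_births


def per_100k_population(deaths_under5, population):
    """IHME-style denominator: deaths per 100,000 total population."""
    return 100_000 * deaths_under5 / population


# Hypothetical countries: identical population and child risk, B has twice the births
countries = {
    "A": {"population": 10_000_000, "live_births": 100_000},
    "B": {"population": 10_000_000, "live_births": 200_000},
}
RISK_PER_1000_BIRTHS = 5  # same underlying risk of death for children in both countries

for name, c in countries.items():
    # Simplification: annual under-5 deaths = risk x annual births
    deaths = RISK_PER_1000_BIRTHS * c["live_births"] / 1000
    c["per_births"] = per_1000_live_births(deaths, c["live_births"])
    c["per_pop"] = per_100k_population(deaths, c["population"])
    print(name, c["per_births"], c["per_pop"])
```

Both countries score 5.0 per 1,000 live births. Country B scores 10.0 per 100,000 population versus 5.0 for A, purely because it has twice as many births.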

2. Mixed Model Performance Changes

The replacement resulted in mixed changes in model performance:

The Life Expectancy model improved, while the HALE model showed a small decrease in performance. The changes are relatively small and within acceptable limits.

3. Other Indicators Show Small Changes

Most other indicators showed small changes in importance and rankings:

4. Measurement Unit Differences and Confounding

Important Note: The WHO and IHME indicators use different measurement units:

These units reflect fundamentally different approaches to measuring early-life mortality. The IHME indicator divides by total population, so it mixes children's risk of death with how many young children a country has, which depends on age structure and fertility. The WHO indicator divides by live births, so it measures early-life mortality risk independently of demographic structure.

Answers to Key Questions

  1. Does replacing WHO U5MR with IHME All-Cause Under 5 affect model performance (R², MAE)?

    • MIXED IMPACT: The Life Expectancy model improved (R² +1.0%, MAE -5.9%), while the HALE model's performance declined slightly (R² -1.4%, MAE +3.4%). Overall, the changes are small and acceptable.

  2. How does the Childhood indicator’s importance change after replacement?

    • MAJOR INCREASE (BUT POTENTIALLY SPURIOUS) — Childhood importance increased dramatically: from 0.0558 to 2.65 in LE model (+4,650%), and from not in top 10 to 3.55 in HALE model (ranked #5). However, this increase may be spurious due to confounding with age structure and fertility rates, rather than reflecting true variation in early-life mortality risk.

  3. Are there any changes in the ranking of top indicators?

    • YES — Childhood moved from rank #10 (LE) or not in top 10 (HALE) to rank #4 (LE) and #5 (HALE). Other indicators showed small changes in rankings, but the top indicators (Neoplasms, UnintentionalInjury, ChronicRespiratory) remained stable.

  4. Does the replacement improve model interpretability or temporal coverage?

    • NO, DUE TO CONFOUNDING — While the IHME indicator provides better temporal coverage (1990-2023), the confounding with age structure and fertility rates makes it less interpretable. The WHO indicator (per 1,000 live births) is methodologically more appropriate because it controls for these confounding factors, even though it has less temporal coverage.

Implications

The replacement of WHO U5MR with IHME All-Cause Under 5 reveals important methodological considerations:

  1. Confounding with Age Structure and Fertility: The IHME indicator (deaths per 100,000 population) is confounded with age structure and fertility rates. Countries with:

    • A larger proportion of the population in child-bearing age

    • Higher fertility rates

    will have more people under age 5 in the population, and therefore more deaths under 5, even if the underlying risk of death for children is the same. This confounding makes it difficult to interpret the IHME indicator as a pure measure of early-life mortality risk.

  2. WHO Indicator Controls for Confounding: The WHO indicator (deaths per 1,000 live births) controls for age structure and fertility by using live births as the denominator. This makes it a more direct measure of the risk of death for children, independent of how many children are in the population.

  3. Why IHME Shows Higher Importance: The dramatic increase in Childhood indicator importance with the IHME version (from 0.0558 to 2.65 in LE model) may be partially or entirely due to this confounding. The IHME indicator may be capturing variation in fertility rates and age structure across countries, which could be correlated with gender gaps in life expectancy through mechanisms unrelated to early-life mortality risk itself.

  4. Measurement Unit Considerations: The different measurement units (per 1,000 live births vs per 100,000 population) reflect fundamentally different approaches to measuring early-life mortality, with the WHO approach being more appropriate for isolating mortality risk from demographic structure.
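The mechanism in points 1 and 3 can be simulated. If fertility is related to the outcome, a per-population rate inherits that relationship even when child mortality risk does not. Everything in this sketch is hypothetical:

- the uniform ranges for crude birth rate and risk;
- the steady-state approximation (deaths per 100,000 population ≈ risk per 1,000 births × crude birth rate / 10);
- an outcome `gap` that depends only on fertility.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000  # hypothetical country-years

crude_birth_rate = rng.uniform(8, 20, n)     # births per 1,000 population
risk_per_1000_births = rng.uniform(3, 6, n)  # child mortality risk, independent of fertility

# Steady-state approximation of the IHME-style rate:
# deaths/pop * 1e5 = (risk/1000) * (CBR/1000) * 1e5 = risk * CBR / 10
per_100k_pop = risk_per_1000_births * crude_birth_rate / 10

# Hypothetical outcome that depends on fertility, NOT on child mortality risk
gap = 0.2 * crude_birth_rate + rng.normal(0, 0.5, n)

corr_births = np.corrcoef(risk_per_1000_births, gap)[0, 1]
corr_pop = np.corrcoef(per_100k_pop, gap)[0, 1]
print(f"per 1,000 births vs gap: r = {corr_births:.2f}")
print(f"per 100,000 pop vs gap:  r = {corr_pop:.2f}")
```

Under these assumptions, the per-births rate is essentially uncorrelated with `gap`. The per-population rate correlates with it at roughly r ≈ 0.6, which a model could read as importance for "early-life mortality".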

Recommendations

  1. Retain WHO indicator: Despite the dramatic increase in importance with the IHME indicator, the WHO U5MR indicator should be retained in the final model because:

    • It controls for age structure and fertility, providing a more direct measure of early-life mortality risk

    • The higher importance of the IHME indicator may be spurious, driven by confounding with demographic factors rather than true variation in mortality risk

    • The WHO indicator’s methodology (deaths per 1,000 live births) is more appropriate for cross-country comparisons of child mortality risk

  2. Document the confounding issue: Clearly document that the IHME All-Cause Under 5 indicator (deaths per 100,000 population) is confounded with age structure and fertility, and that this confounding likely explains why it shows higher importance in the model.

  3. Accept lower importance for WHO indicator: The lower importance of the WHO U5MR indicator (0.0558 in LE model, not in top 10 for HALE) may reflect that:

    • Early-life mortality has less variation across OECD countries (most have low child mortality)

    • The indicator is appropriately measuring mortality risk without demographic confounding

    • The lower importance is not necessarily a problem, as it may reflect the true, smaller contribution of early-life mortality to gender gaps in OECD countries

  4. Consider temporal coverage trade-off: While the WHO indicator has less temporal coverage than IHME, the methodological appropriateness (controlling for age structure and fertility) outweighs the temporal coverage advantage for this analysis.


Removing Childhood Indicator (Under-Five Mortality)

Removal Details

Key Questions

  1. Does removing Childhood affect model performance (R², MAE)?

  2. How do other indicators’ importance values change after removal?

  3. Are there any changes in the ranking of top indicators?

  4. Does removing Childhood simplify the model without losing important information?

Baseline Results (With Childhood - WHO U5MR)

Life Expectancy Gap Model:

HALE Gap Model:

New Results (Without Childhood)

Life Expectancy Gap Model:

Counterfactual Analysis (USA):

HALE Gap Model:

Counterfactual Analysis (USA):

Comparison and Conclusions

Major Findings

1. Minimal Impact on Model Performance

The removal of Childhood resulted in very small changes in model performance:

The changes are very small and within acceptable limits, confirming that Childhood was not contributing meaningfully to model fit.
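The remove-and-refit comparison used throughout these sections can be sketched as a cross-validated ablation. This is an illustrative stand-in, not the pipeline behind the reported numbers. It assumes an ordinary least-squares model, 5-fold cross-validation, and relative (not percentage-point) changes in R² and MAE; this document specifies none of these. The data are synthetic, and the column names and helper functions are hypothetical.

```python
import numpy as np


def cv_r2_mae(X, y, k=5, seed=0):
    """K-fold cross-validated R² and MAE for an ordinary least-squares fit."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    pred = np.empty(n)
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        X_train = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)
        pred[test] = np.column_stack([np.ones(len(test)), X[test]]) @ beta
    r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
    mae = np.mean(np.abs(y - pred))
    return r2, mae


def ablation_change(X, y, names, drop):
    """Relative % change in CV R² and MAE when one named column is dropped.

    Both runs use the same seed, so the folds are identical (a paired comparison).
    """
    keep = [i for i, name in enumerate(names) if name != drop]
    r2_base, mae_base = cv_r2_mae(X, y)
    r2_new, mae_new = cv_r2_mae(X[:, keep], y)
    return {
        "r2_pct": 100 * (r2_new - r2_base) / r2_base,
        "mae_pct": 100 * (mae_new - mae_base) / mae_base,
    }


# Synthetic data: the outcome depends on two columns; "childhood" is pure noise
rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 3))
names = ["neoplasms", "homicide", "childhood"]
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

result = ablation_change(X, y, names, "childhood")
print(result)
```

Cross-validation matters here. In-sample R² from least squares cannot rise when a column is dropped, so a reported R² improvement after removal suggests the metrics come from held-out data or a non-OLS model.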

2. Some Redistribution of Importance

The removal of Childhood led to some redistribution of importance, with notable changes in both models:

Life Expectancy Model:

HALE Model:

3. Homicide Gained Importance in Both Models

Critical Finding: Homicide showed substantial increases in importance in both models after removing Childhood:

This suggests that Childhood may have been capturing some variance that is now being attributed to Homicide, or that removing Childhood allows Homicide to better capture its true relationship with gender gaps.

4. Neoplasms Strengthened in HALE Model

HALE Model: Neoplasms importance increased from 23.7 to 28.2 (+19% increase), further strengthening its position as the top indicator. This suggests that removing Childhood allows Neoplasms to better capture its relationship with the HALE gap.

5. Cardiovascular Decreased in HALE Model

HALE Model: Cardiovascular importance decreased from 7.33 to 5.55 (-24%), moving from rank #2 to #3. This suggests that Childhood and Cardiovascular may have shared some explanatory variance, or that the removal allows other indicators (particularly Neoplasms) to capture more of the variance.
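The percentage changes quoted in this section are relative changes in importance, (new − old) / old. A small helper, given the hypothetical name `pct_change`, reproduces them from the reported values. It also reproduces the +4,650% quoted for the Childhood indicator in the IHME replacement section.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100 * (after - before) / before


print(round(pct_change(23.7, 28.2)))    # Neoplasms, HALE model: 19
print(round(pct_change(7.33, 5.55)))    # Cardiovascular, HALE model: -24
print(round(pct_change(0.0558, 2.65)))  # Childhood, LE model, WHO -> IHME: 4649
```

The last value rounds to 4,649%, which the text reports as approximately +4,650%.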

Answers to Key Questions

  1. Does removing Childhood affect model performance (R², MAE)?

    • MINIMAL IMPACT — Life Expectancy R² improved slightly (+0.8%), MAE improved (-4.4%). HALE R² decreased very slightly (-0.3%), MAE increased slightly (+1.2%). The changes are very small and within acceptable limits, confirming that Childhood was not contributing meaningfully to model fit.

  2. How do other indicators’ importance values change after removal?

    • MIXED CHANGES — Homicide showed substantial increases in both models (+70% for LE, +29% for HALE). Neoplasms increased in HALE model (+19%). Cardiovascular decreased in HALE model (-24%). Most other indicators showed small changes. The redistribution suggests that Childhood may have been interacting with other indicators or capturing some shared variance.

  3. Are there any changes in the ranking of top indicators?

    • MINOR CHANGES — In the Life Expectancy model, Homicide moved from #7 to #5. In the HALE model, Cardiovascular moved from #2 to #3, while Neoplasms strengthened its #1 position. The top indicators remained largely stable, with minor shifts in rankings.

  4. Does removing Childhood simplify the model without losing important information?

    • YES: The minimal impact on model performance and the very low importance of Childhood (0.0558 in LE, not in top 10 for HALE) confirm that removing it simplifies the model without losing meaningful predictive power. The redistribution of importance to other indicators (particularly Homicide and Neoplasms) suggests that the remaining indicators largely absorb the variance Childhood explained.

Implications

The removal of Childhood has minimal impacts on model results:

  1. Model performance is essentially unchanged — The very small changes in R² and MAE confirm that Childhood was not contributing meaningfully to model fit, supporting the decision to remove it.

  2. Some redistribution of importance — The removal led to some redistribution of importance, with Homicide gaining substantial importance in both models and Neoplasms strengthening in the HALE model. This suggests that Childhood may have been interacting with these indicators or capturing some shared variance.

  3. Model simplification — Removing Childhood simplifies the model by eliminating an indicator with very low importance and limited temporal coverage, without sacrificing meaningful predictive power.

  4. No suitable alternative — The lack of a suitable alternative (IHME version is confounded with age structure and fertility) supports the decision to remove Childhood entirely rather than replace it.

Recommendations

  1. Confirm removal: The removal of Childhood is justified and should be maintained in the final model, given:

    • Very low importance (0.0558 in LE, not in top 10 for HALE)

    • Minimal impact on model performance

    • Limited temporal coverage

    • Lack of a suitable alternative (IHME version is methodologically inappropriate due to confounding)

  2. Document the rationale: Clearly document that Childhood was removed because:

    • It had very low importance in both models

    • It has limited temporal coverage

    • The IHME alternative is confounded with age structure and fertility

    • Removing it simplifies the model without sacrificing meaningful predictive power

  3. Note the redistribution: The redistribution of importance to other indicators (particularly Homicide and Neoplasms) should be noted, as it suggests that Childhood may have been interacting with these indicators or capturing some shared variance.

  4. Accept the model simplification: The minimal impact on model performance confirms that removing Childhood is a reasonable simplification that does not harm model fit.