Other Operations¶
Operations that support allocation calculations: scenario harmonization, RCB pathway generation, data preprocessing, and validation.
Scenario Harmonization¶
Harmonization with Convergence¶
Aligns emission pathways with historical data at an anchor year, then converges back to the original scenario trajectory.
- Replace scenario values with historical data for years ≤ anchor year
- Linearly interpolate for anchor year < year < convergence year
- Use original scenario values for years ≥ convergence year
Implementation → | src/fair_shares/library/utils/timeseries.py
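The three rules above can be sketched in a few lines of Python. This is an illustration only, not the code in `timeseries.py`: the function name `harmonize_with_convergence` is hypothetical, and the sketch assumes the interpolation runs in a straight line from the historical value at the anchor year to the original scenario value at the convergence year.

```python
def harmonize_with_convergence(scenario, historical, anchor_year, convergence_year):
    """Return a harmonized copy of `scenario` (dict: year -> emissions).

    Hypothetical helper for illustration; `historical` must cover every
    scenario year up to and including the anchor year.
    """
    start = historical[anchor_year]      # harmonized value at the anchor year
    end = scenario[convergence_year]     # original value at the convergence year
    span = convergence_year - anchor_year
    harmonized = {}
    for year, value in scenario.items():
        if year <= anchor_year:
            harmonized[year] = historical[year]   # history replaces the scenario
        elif year < convergence_year:
            weight = (year - anchor_year) / span  # 0 at anchor, 1 at convergence
            harmonized[year] = (1 - weight) * start + weight * end
        else:
            harmonized[year] = value              # original scenario from here on
    return harmonized


scenario = {2019: 90.0, 2020: 100.0, 2021: 98.0, 2022: 96.0, 2023: 94.0, 2024: 92.0}
historical = {2019: 95.0, 2020: 110.0}
harmonized = harmonize_with_convergence(scenario, historical, 2020, 2024)
```

In this toy input the scenario sits 10 units below history at the anchor year; the harmonized series starts at the historical value and rejoins the original trajectory in 2024.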
Cumulative Peak Preservation¶
Preserves the peak cumulative emissions using time-varying scaling when preserve_cumulative_peak=True.
Implementation → | src/fair_shares/library/utils/timeseries.py
Post-Net-Zero Handling in Global Pathways¶
Some AR6 scenario pathways have the global emission trajectory going net-negative (i.e., the world as a whole achieves net-negative emissions). The allocation framework cannot meaningfully distribute negative global emissions across countries, so years after the global pathway crosses zero are set to NaN and reported.
This is a preprocessing step applied to global scenario pathways before allocation. Pre-net-zero years are preserved unchanged.
Implementation → | src/fair_shares/library/utils/dataframes.py::set_post_net_zero_emissions_to_nan
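A minimal sketch of this masking step, operating on a plain dict rather than the library's dataframes. The helper name `mask_post_net_zero` is hypothetical, and the sketch assumes that masking starts at the first year with negative global emissions and covers every later year.

```python
import math


def mask_post_net_zero(pathway):
    """Set years from the first net-negative year onward to NaN.

    pathway: dict year -> global emissions. Returns the masked pathway and
    the list of affected years (so they can be reported to the user).
    Hypothetical helper; the cut-off rule is an assumption of this sketch.
    """
    years = sorted(pathway)
    first_negative = next((y for y in years if pathway[y] < 0), None)
    masked, affected = {}, []
    for year in years:
        if first_negative is not None and year >= first_negative:
            masked[year] = math.nan
            affected.append(year)
        else:
            masked[year] = pathway[year]   # pre-net-zero years are unchanged
    return masked, affected


masked, affected = mask_post_net_zero({2050: 5.0, 2060: 1.0, 2070: -2.0, 2080: 1.0})
```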
RCB Pathway Generation¶
Converts the global remaining carbon budget into a global annual emission pathway. This is a prerequisite step before country-level pathway allocation — it does not produce country pathways directly.
How it works¶
- Takes the global RCB (in Mt CO₂) and current global emissions as inputs
- Generates a single global pathway using normalized shifted exponential decay
- The pathway starts at current global emissions and reaches exactly zero at the end year (default 2100)
- The discrete annual sum equals the original carbon budget by construction
Country allocations happen after this step, using pathway allocation approaches (e.g., equal-per-capita, per-capita-adjusted). The pathway shape does not prescribe country net-zero years; those emerge from the allocation step. The year in which a country's allocated emissions approach zero approximates its implied net-zero year.
The default (and currently only) generator is exponential-decay (shifted exponential). The generator parameter is an extensibility point — alternative functional forms (e.g., linear, sigmoid) can be added without changing the allocation pipeline, provided they satisfy the same constraint: the discrete annual sum must exactly equal the input carbon budget by the end year (typically 2100). Note that even with a 2100 end year, individual regions may reach effective net-zero much earlier when their allocated share of the global pathway becomes negligibly small.
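One way to satisfy the constraints listed above (start at current emissions, end at exactly zero, annual sum equal to the budget) is sketched below. The functional form and the bisection solver are assumptions made for illustration, and the function names are hypothetical; the library's generator may parameterize the curve differently.

```python
import math


def shifted_exponential(e0, n_years, k):
    """Values for t = 0..n_years: e0 at t = 0 and exactly 0 at t = n_years."""
    if abs(k) < 1e-12:                      # the limit k -> 0 is a straight line
        return [e0 * (1 - t / n_years) for t in range(n_years + 1)]
    end = math.exp(-k * n_years)            # shift so the curve ends at zero
    return [e0 * ((math.exp(-k * t) - end) / (1 - end)) for t in range(n_years + 1)]


def generate_pathway(e0, budget, start_year, end_year):
    """Find the decay rate whose discrete annual sum matches `budget`."""
    n_years = end_year - start_year
    lo, hi = -1.0, 5.0                      # bracket: slow decay .. fast decay
    if not (sum(shifted_exponential(e0, n_years, hi)) <= budget
            <= sum(shifted_exponential(e0, n_years, lo))):
        raise ValueError("budget outside the range this shape can represent")
    for _ in range(200):                    # bisection; the sum decreases with k
        mid = (lo + hi) / 2
        if sum(shifted_exponential(e0, n_years, mid)) > budget:
            lo = mid
        else:
            hi = mid
    values = shifted_exponential(e0, n_years, (lo + hi) / 2)
    return dict(zip(range(start_year, end_year + 1), values))


# Illustrative inputs in Mt CO2 (not values from the pipeline).
path = generate_pathway(e0=40000.0, budget=400000.0, start_year=2020, end_year=2100)
```

In this sketch the budget is matched to floating-point tolerance by the solver; a generator that rescales the curve analytically could match it exactly.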
Data Preprocessing¶
Interpolation¶
Fills missing values using linear or stepwise interpolation.
Implementation → | src/fair_shares/library/utils/timeseries.py::interpolate_scenarios_data
Unit Conversion¶
Standardizes units (emissions: kt/Mt/Gt CO2e, population: million).
Implementation → | src/fair_shares/library/utils/units.py
Data Validation¶
TimeseriesDataFrame Validation¶
Validates structure (MultiIndex format) and content (non-negative values, complete time series).
Implementation → | src/fair_shares/library/validation/pipeline_validation.py
Cross-Dataset Validation¶
Verifies analysis countries + ROW = world totals, and ensures temporal/spatial alignment.
Implementation → | src/fair_shares/library/validation/pipeline_validation.py
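The world-total identity can be illustrated with a small check. The helper name `check_world_totals` and the relative tolerance are assumptions of this sketch, not the behaviour of `pipeline_validation.py`.

```python
import math


def check_world_totals(country_values, row_value, world_value, rel_tol=1e-6):
    """True if analysis countries + Rest of World reproduce the world total.

    Hypothetical helper; the tolerance is an illustrative choice.
    """
    total = sum(country_values.values()) + row_value
    return math.isclose(total, world_value, rel_tol=rel_tol)


ok = check_world_totals({"AAA": 10.0, "BBB": 5.0}, row_value=85.0, world_value=100.0)
bad = check_world_totals({"AAA": 10.0, "BBB": 5.0}, row_value=80.0, world_value=100.0)
```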
Data Completeness¶
Analysis Country Selection¶
Identifies countries with complete data across all datasets and computes Rest of World totals for remaining countries.
Implementation → | src/fair_shares/library/utils/data/completeness.py
World Total Extraction¶
Extracts world totals for validation. Supports keys: "EARTH", "WLD", "World".
Implementation → | src/fair_shares/library/utils/data/completeness.py
All-GHG Allocations with RCBs¶
Remaining carbon budgets constrain CO₂ only: they are derived from the near-linear relationship between cumulative CO₂ emissions and global warming [IPCC AR6 WG1]. Pathways for non-CO₂ greenhouse gases (CH₄, N₂O, F-gases) enter only as an assumption used in deriving the remaining carbon budget.
When users request an all-GHG allocation (e.g. all-ghg or all-ghg-ex-co2-lulucf) but use RCBs as the target source, a problem arises: RCBs constrain CO₂ only and say nothing about non-CO₂ gases. To produce all-GHG results, the system decomposes the problem into two components — allocating CO₂ via the budget approach and non-CO₂ via scenario pathways:
| Component | Gas scope | Allocation method | Data source |
|---|---|---|---|
| CO₂ | co2 or co2-ffi | Budget approach (RCBs) | Remaining carbon budgets |
| Non-CO₂ | non-co2 | Pathway approach (e.g. AR6 scenarios) | e.g. AR6 median pathways |
New RCBs require matching scenario pathways
When adding new remaining carbon budgets, complete scenario pathways that meet their climate assessments must also be available in the active data source configuration for pathways. Without them, the non-CO₂ decomposition has no pathway data to allocate from.
Decomposition rules¶
The CO₂ component depends on the requested emission category:
- all-ghg → CO₂ component is co2 (total CO₂ including LULUCF, NGHGI-corrected)
- all-ghg-ex-co2-lulucf → CO₂ component is co2-ffi (fossil only, no NGHGI corrections needed)
Non-CO₂ derivation¶
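Non-CO₂ emissions are not a native category. They are derived by subtraction, with the all-GHG and CO₂ series taken at matching scope:

\[
E_{\text{non-CO}_2}(t) = E_{\text{all-GHG}}(t) - E_{\text{CO}_2}(t)
\]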
This subtraction is applied to both historical emissions (e.g. PRIMAP) and future scenarios (e.g. AR6), producing non-CO₂ timeseries that are used for pathway allocation.
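As a rough illustration of the subtraction, the sketch below works on plain year-indexed dicts. The helper name `derive_non_co2` is hypothetical, and it assumes both inputs use the same unit and the same LULUCF scope; restricting the result to years present in both series is also an assumption.

```python
def derive_non_co2(all_ghg, co2):
    """Non-CO2 = all-GHG minus CO2, for years present in both series.

    Inputs: dict year -> emissions in the same unit (e.g. Mt CO2e).
    Hypothetical helper for illustration only.
    """
    shared_years = sorted(set(all_ghg) & set(co2))
    return {year: all_ghg[year] - co2[year] for year in shared_years}


non_co2 = derive_non_co2(
    all_ghg={2020: 50.0, 2021: 49.0},
    co2={2020: 38.0, 2021: 37.5, 2022: 37.0},
)
```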
Scenario labels¶
Scenario source categories are mapped to clean climate-assessment names and quantile fields during preprocessing (notebook 104). This normalisation ensures that all downstream code — including non-CO₂ pathways — works with a consistent format regardless of the upstream scenario source. For example, AR6 categories map as:
- C1 → climate-assessment="1.5C", quantile=0.5
- C3 → climate-assessment="2C", quantile=0.66
- C2 → climate-assessment="2C", quantile=0.83
New scenario sources would define their own mapping into the same clean format. Any new data processing notebook that introduces a scenario source must output data in this normalised schema.
Internally, notebook 104 uses combined labels (e.g., "1.5p50") during median calculation to keep C2 and C3 distinct, then remaps to the clean format at output.
Auto-derivation of pathway approaches¶
Users only specify budget approaches (e.g., equal-per-capita-budget). The
system automatically derives equivalent pathway approaches for non-CO₂:
- equal-per-capita-budget → equal-per-capita
- per-capita-adjusted-budget → per-capita-adjusted
- per-capita-adjusted-gini-budget → per-capita-adjusted-gini
- allocation_year → first_allocation_year
- preserve_allocation_year_shares → preserve_first_allocation_year_shares
This ensures methodological consistency: the same equity principle governs both gases, adapted to the different allocation modes. This auto-derivation only works when scenario pathway data matching the requested climate assessments is available in the active source set — without it, the non-CO₂ leg has nothing to allocate.
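The mapping above amounts to two lookup tables, one for approach names and one for parameter names. The sketch below is illustrative: `derive_pathway_config` is a hypothetical helper, and passing unmapped parameters through unchanged is an assumption.

```python
# Budget approach -> equivalent pathway approach (names as listed above).
APPROACH_MAP = {
    "equal-per-capita-budget": "equal-per-capita",
    "per-capita-adjusted-budget": "per-capita-adjusted",
    "per-capita-adjusted-gini-budget": "per-capita-adjusted-gini",
}

# Budget parameter -> equivalent pathway parameter.
PARAM_MAP = {
    "allocation_year": "first_allocation_year",
    "preserve_allocation_year_shares": "preserve_first_allocation_year_shares",
}


def derive_pathway_config(budget_approach, budget_params):
    """Translate a budget approach and its parameters for the non-CO2 leg.

    Hypothetical helper; parameters without a mapping are kept as they are.
    """
    approach = APPROACH_MAP[budget_approach]
    params = {PARAM_MAP.get(name, name): value for name, value in budget_params.items()}
    return approach, params


approach, params = derive_pathway_config(
    "equal-per-capita-budget", {"allocation_year": 1990}
)
```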
Single-pass exception for direct scenario pathways¶
When target=pathway, composite categories are not decomposed. If the
scenario source provides direct data for all-ghg or all-ghg-ex-co2-lulucf
(as AR6 does), the system runs a single pathway allocation pass instead of
the CO₂ + non-CO₂ decomposition.
Weber RCB Corrections¶
Remaining carbon budgets (RCBs) are published relative to a baseline year (e.g., 2020 for AR6 WGI, 2023 for Lamboll et al.) and use bookkeeping model (BM) estimates for land-use CO₂ fluxes. To produce country-level fair share allocations that are comparable with nationally reported emissions, the published RCB must be:
- Rebased to the allocation reference year (2020) using actual observational data
- Decomposed to isolate the fossil-allocatable or NGHGI-consistent portion
- Adjusted for international bunker fuels excluded from national inventories
The correction methodology follows Weber et al. (2026). A central design principle is the strict separation of actual observations (used for the rebase) from scenario projections (used only for forward-looking quantities).
Notation¶
| Symbol | Definition |
|---|---|
| \(\text{RCB}_{\text{BM}}(\text{base})\) | Published Remaining Carbon Budget from baseline year, in BM convention |
| \(F_{\text{actual}}(a, b)\) | Cumulative actual fossil CO₂ emissions from year \(a\) to year \(b\) (PRIMAP) |
| \(L_{\text{BM}}(a, b)\) | Cumulative BM LULUCF CO₂ from AR6 scenario median (AFOLU|Direct), years \(a\) to \(b\) |
| \(L_{\text{BM,actual}}(a, b)\) | Cumulative actual observed BM LULUCF CO₂ (PRIMAP co2-lulucf), years \(a\) to \(b\) |
| \(B(a, b)\) | Cumulative international bunker fuel CO₂ emissions, years \(a\) to \(b\) |
| \(\text{gap}(a, b)\) | Cumulative NGHGI--BM convention gap for CO₂, years \(a\) to \(b\) |
| \(\text{NZ}\) | Net-zero year (from scenario data) |
| \(\text{base}\) | RCB baseline year (e.g., 2023 for Lamboll, 2020 for AR6 WGI) |
Correction for fossil-only budgets (co2-ffi)¶
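The fossil-allocatable budget isolates the portion of the total carbon budget available for fossil CO₂ emissions, after removing the land-use share and international bunkers:

\[
\text{RCB}_{\text{co2-ffi}}(2020) = \text{RCB}_{\text{BM}}(\text{base}) + F_{\text{actual}}(2020,\, \text{base}{-}1) - L_{\text{BM}}(\text{base},\, \text{NZ}) - B(2020,\, \text{NZ})
\]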
The four terms are:
- \(\text{RCB}_{\text{BM}}(\text{base})\) -- the published carbon budget, which covers total anthropogenic CO₂ (fossil + BM LULUCF) from the baseline year onward.
- \(F_{\text{actual}}(2020,\, \text{base}{-}1)\) -- the fossil rebase. When the published baseline is after 2020, actual fossil emissions from 2020 to \(\text{base}{-}1\) are added back. This uses only observational data (PRIMAP), never scenario projections. When \(\text{base} = 2020\), this term is zero.
- \(L_{\text{BM}}(\text{base},\, \text{NZ})\) -- the LULUCF decomposition. Removes the BM LULUCF share of the budget from the baseline year to the scenario net-zero year, using AR6 scenario median pathways (AFOLU|Direct). This is the only way to separate the fossil and land-use portions of the total budget.
- \(B(2020,\, \text{NZ})\) -- the bunker deduction. Removes international aviation and shipping emissions that appear in global totals but are excluded from national inventories. Integrated from 2020 to NZ regardless of baseline year.
Why LULUCF is absent from the co2-ffi rebase¶
The rebase needs to shift the budget's starting point from \(\text{base}\) to 2020. A naive approach would add all actual emissions (fossil + LULUCF) for the rebase period. But the LULUCF decomposition already covers the full range from \(\text{base}\) to NZ, so adding actual BM LULUCF from 2020 to \(\text{base}{-}1\) alongside decomposing from \(\text{base}\) to NZ is equivalent to decomposing from 2020 to NZ:

\[
L_{\text{BM,actual}}(2020,\, \text{base}{-}1) + L_{\text{BM}}(\text{base},\, \text{NZ}) = L_{\text{BM}}(2020,\, \text{NZ})
\]
Since we would need to subtract \(L_{\text{BM}}(2020,\, \text{NZ})\) to isolate the fossil budget anyway, the rebase LULUCF and the decomposition LULUCF from 2020 to \(\text{base}{-}1\) cancel algebraically. The formula therefore omits actual LULUCF from the rebase and starts the LULUCF decomposition at \(\text{base}\), not 2020. The result is the same, but the formula is simpler and avoids mixing actual and scenario data for overlapping years.
Precautionary cap on BM LULUCF¶
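A precautionary cap (default: on) ensures that the projected BM LULUCF sink cannot increase the fossil budget -- only sources can reduce it:

\[
L_{\text{BM}}^{\text{cap}}(\text{base},\, \text{NZ}) = \max\bigl(L_{\text{BM}}(\text{base},\, \text{NZ}),\ 0\bigr)
\]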
The LULUCF decomposition extends from the baseline year to net-zero. Thus we must use AR6 scenario pathways corresponding to the same climate category as the RCB (e.g., 1.5°C 50th percentile scenarios for a 1.5p50 budget). Each scenario's cumulative AFOLU|Direct is integrated from 2020 to its own net-zero year \(t_{\text{nz},i}\), then the median across scenarios is taken (integrate per scenario, then median — consistent with the convention gap computation per Weber 2026). The pre-computed median cumulative is stored as bm_lulucf_cumulative_median in rcb_scenario_adjustments.yaml. When the RCB baseline year \(\text{base} > 2020\), the 2020-to-base prefix is subtracted using the per-year median timeseries (accurate because historical BM LULUCF has negligible inter-scenario spread). When the result is negative (net sink), the cap zeros it out because the sink relies on uncertain future reforestation. Configurable via precautionary_lulucf in the adjustments config (default: true; set to false for sensitivity analysis).
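The co2-ffi correction, including the precautionary cap, reduces to a few arithmetic steps. The sketch below is not the pipeline code: the function name `fossil_allocatable_budget` is hypothetical, the argument `precautionary_lulucf` simply mirrors the config key named above, and the BM LULUCF value of -20 Gt is a placeholder chosen only to represent a net sink. All quantities are cumulative GtCO₂.

```python
def fossil_allocatable_budget(rcb_bm, fossil_rebase, bm_lulucf, bunkers,
                              precautionary_lulucf=True):
    """Fossil-allocatable budget at 2020 (hypothetical helper, GtCO2).

    rcb_bm        published RCB from its baseline year (BM convention)
    fossil_rebase actual fossil CO2 from 2020 to base-1 (0 if base == 2020)
    bm_lulucf     cumulative BM LULUCF from base to net-zero (negative = sink)
    bunkers       cumulative international bunker CO2 from 2020 to net-zero
    """
    # With the cap on, a net sink (negative value) is not credited.
    lulucf = max(bm_lulucf, 0.0) if precautionary_lulucf else bm_lulucf
    return rcb_bm + fossil_rebase - lulucf - bunkers


capped = fossil_allocatable_budget(500.0, 0.0, -20.0, 35.0)
uncapped = fossil_allocatable_budget(500.0, 0.0, -20.0, 35.0, precautionary_lulucf=False)
```

With the cap on, the placeholder sink is ignored and the result depends only on the published budget, the rebase, and the bunker deduction; with the cap off, the sink enlarges the fossil budget.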
Why convention_gap_median is 0 in CO2-FFI output
The rcb_scenario_adjustments.yaml file includes a convention_gap_median field for every AR6 category. For co2-ffi allocations, this field is not used — the LULUCF adjustment comes from bm_lulucf_cumulative_median instead (see above). The convention gap only applies to co2 allocations, where the budget must switch from the bookkeeping model convention to the NGHGI convention. Its presence in the YAML for co2-ffi scenarios is a storage artefact, not a computational input.
Correction for total CO₂ budgets (co2)¶
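For budgets covering total CO₂ including land use, LULUCF stays in the budget but the convention must switch from BM to NGHGI. Unlike the co2-ffi case, there is no LULUCF decomposition -- only a convention gap adjustment:

\[
\text{RCB}_{\text{co2}}(2020) = \text{RCB}_{\text{BM}}(\text{base}) + F_{\text{actual}}(2020,\, \text{base}{-}1) + L_{\text{BM,actual}}(2020,\, \text{base}{-}1) + \text{gap}(2020,\, \text{NZ}) - B(2020,\, \text{NZ})
\]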
The five terms are:
- \(\text{RCB}_{\text{BM}}(\text{base})\) -- the published carbon budget, same as for co2-ffi.
- \(F_{\text{actual}}(2020,\, \text{base}{-}1)\) -- the fossil rebase, identical to co2-ffi.
- \(L_{\text{BM,actual}}(2020,\, \text{base}{-}1)\) -- the BM LULUCF rebase. Unlike co2-ffi, actual observed BM LULUCF is included in the rebase. This is because there is no LULUCF decomposition to cancel with -- the budget retains the full land-use component. Source: PRIMAP co2-lulucf (already in the pipeline).
- \(\text{gap}(2020,\, \text{NZ})\) -- the BM-to-NGHGI convention gap. Covers the full period from 2020 (not from \(\text{base}\)) because the BM LULUCF rebase is in BM convention; the gap for the rebase years converts it to NGHGI. This quantity is negative (NGHGI reports a larger land sink than BM), so it reduces the allocatable budget. Computed from Melo v3.1 NGHGI LULUCF and Gidden AR6 reanalysis data (see Convention gap decomposition).
- \(B(2020,\, \text{NZ})\) -- the bunker deduction, same as for co2-ffi.
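The five terms combine by simple addition and subtraction. The sketch below uses a hypothetical function name, `total_co2_allocatable_budget`, and takes every term as a cumulative GtCO₂ value; the inputs are the rounded figures quoted in the worked example later on this page.

```python
def total_co2_allocatable_budget(rcb_bm, fossil_rebase, bm_lulucf_rebase,
                                 convention_gap, bunkers):
    """NGHGI-consistent total-CO2 budget at 2020 (hypothetical helper, GtCO2).

    convention_gap is negative when NGHGI reports a larger land sink than BM,
    so adding it reduces the budget; bunkers are subtracted.
    """
    return rcb_bm + fossil_rebase + bm_lulucf_rebase + convention_gap - bunkers


# AR6 WG1 source, baseline 2020: both rebase terms are zero.
ar6 = total_co2_allocatable_budget(500.0, 0.0, 0.0, -90.0, 35.0)
# Lamboll source, baseline 2023: rebase covers 2020-2022.
lamboll = total_co2_allocatable_budget(247.0, 107.0, -12.0, -90.0, 35.0)
```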
Why the co2 rebase includes actual BM LULUCF¶
In the co2-ffi formula, actual BM LULUCF in the rebase period cancels with the decomposition for the same years. In the co2 formula there is no LULUCF decomposition (because land-use emissions stay in the budget), so there is nothing for the rebase LULUCF to cancel with. The rebase must therefore include both fossil and BM LULUCF to correctly shift the total CO₂ budget from \(\text{base}\) to 2020.
Design principle: actual data for the rebase, scenario data for the future¶
The formulas enforce a strict separation:
| Quantity | Data type | Rationale |
|---|---|---|
| Fossil rebase (\(F_{\text{actual}}\)) | Actual (PRIMAP) | Observed emissions -- no projection uncertainty |
| BM LULUCF rebase (\(L_{\text{BM,actual}}\), co2 only) | Actual (PRIMAP co2-lulucf) | Same rationale |
| BM LULUCF decomposition (\(L_{\text{BM}}\), co2-ffi only) | Scenario median (AFOLU|Direct) | Requires future pathway to NZ; no observational data exists |
| Convention gap (\(\text{gap}\)) | Scenario-based | Forward-looking NGHGI--BM difference requires modeled indirect fluxes |
| Net-zero year (\(\text{NZ}\)) | Scenario data | By definition a future quantity |
| Bunker deduction (\(B\)) | Observational + extrapolation | Historical data extended at last observed rate to NZ |
This means that adding a new RCB source (e.g., Lamboll et al. with baseline 2023) only requires actual emissions data through 2022 for the rebase. The LULUCF decomposition integrates from \(\text{base}\) (co2-ffi), while the convention gap and bunker deduction always cover the full 2020--NZ period.
Why two LULUCF conventions matter¶
Bookkeeping models (e.g., BLUE, OSCAR) estimate only direct human-caused land-use fluxes -- deforestation, afforestation, land management. NGHGIs additionally include indirect effects such as CO₂ fertilization of managed forests and climate-driven changes in soil carbon. The NGHGI total is therefore systematically different from the bookkeeping total, even for the same physical land area.
The global difference is substantial: NGHGI-reported LULUCF is a larger net sink than BM estimates, creating a 5--7 GtCO₂/yr discrepancy primarily because CO₂ fertilization enhances carbon uptake on managed land [Weber 2026].
Per-scenario net-zero years as integration bounds¶
Forward-looking quantities (LULUCF decomposition, convention gap, bunker deduction) are integrated from their start year to the scenario-specific net-zero year \(t_{\text{nz},i}\) -- the first year when that scenario's total CO₂ emissions (Emissions|CO2) reach zero. This prevents post-net-zero negative emissions from inflating the corrections.
Note that Emissions|CO2 in IAM scenario databases is fossil + BM LULUCF by convention — this is a property of how scenarios report total CO₂, not a methodological choice by fair-shares.
Per-scenario net-zero years are computed from the Gidden et al. AR6 reanalysis (OSCAR v3.2) data. Scenario-level summary statistics (median, quartiles) are stored in data/rcbs/ar6_category_constants.yaml, keyed by RCB scenario label (e.g., 1.5p50). The scenario-level median NZ year is used for the bunker integration endpoint (which is observational, not scenario-dependent).
Scenarios that never reach net-zero total CO₂ before 2100 are assigned 2100 as a conservative upper integration bound.
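A sketch of the net-zero rule and its use as an integration bound, on annual year-indexed dicts. Both function names are hypothetical, and including the net-zero year itself in the cumulative sum is an assumption of this sketch.

```python
def net_zero_year(total_co2, fallback=2100):
    """First year in which total CO2 is <= 0, else the fallback bound."""
    for year in sorted(total_co2):
        if total_co2[year] <= 0:
            return year
    return fallback


def cumulative_to_net_zero(flux, total_co2, start_year=2020):
    """Sum an annual flux from start_year to the scenario's net-zero year.

    Hypothetical helper; the end year is treated as inclusive here.
    """
    end_year = net_zero_year(total_co2)
    return sum(value for year, value in flux.items() if start_year <= year <= end_year)


total_co2 = {2020: 5.0, 2021: 2.0, 2022: 0.0, 2023: -1.0}
flux = {2020: 1.0, 2021: 2.0, 2022: 3.0, 2023: 4.0}
cumulative = cumulative_to_net_zero(flux, total_co2)
```

Because the sum stops at the net-zero year, the post-net-zero value for 2023 does not enter the total.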
Convention gap decomposition¶
The per-scenario convention gap \(\text{Gap}_i\) decomposes into two temporal segments:
Historical (\(2020 \leq t \leq \mathrm{splice{\_}year}\)): NGHGI actual LULUCF (reported values, same for all scenarios) minus BM LULUCF for scenario \(i\) (the bookkeeping proxy from scenario data). The splice year is derived dynamically from the data (currently 2023). The gap always covers the full 2020--NZ range because the BM LULUCF rebase years also need convention conversion:

\[
\text{Gap}_{i,\text{hist}} = \sum_{t=2020}^{\mathrm{splice{\_}year}} \left[ L_{\text{NGHGI}}(t) - L_{\text{BM},i}(t) \right]
\]
Currently, NGHGI actual LULUCF is sourced from Melo et al. and BM LULUCF from the Gidden et al. AR6 reanalysis (AFOLU|Direct). These are the current data sources for these roles — alternatives providing the same quantities could be substituted.
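Future (\(t > \mathrm{splice{\_}year}\)): Only the NGHGI-consistent indirect component for scenario \(i\) (CO₂ fertilization and other passive fluxes), because the direct components cancel in the gap:

\[
\text{Gap}_{i,\text{future}} = \sum_{t=\mathrm{splice{\_}year}+1}^{t_{\text{nz},i}} L_{\text{indirect},i}(t)
\]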
Currently sourced from the Gidden et al. AR6 reanalysis (AFOLU|Indirect).
The total per-scenario gap is \(\text{Gap}_i = \text{Gap}_{i,\text{hist}} + \text{Gap}_{i,\text{future}}\), and the median is taken across all scenarios \(i\) in the corresponding climate category pool. Each scenario's integration ends at its own \(t_{\text{nz},i}\).
Why per-scenario data for the BM side? The convention gap is a per-scenario quantity — each scenario has its own AFOLU|Direct pathway, NZ year, and Indirect fluxes. Global Carbon Budget multi-model averages cannot be substituted here because they would collapse the per-scenario variation into a single number, breaking the integrate-per-scenario-then-median methodology. For actual historical emissions (the RCB rebase from baseline to 2020), observational data is used — no scenario data is involved in that step.
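The integrate-per-scenario-then-median procedure can be sketched as follows. Function names, the dict-based scenario layout, and the toy numbers are all hypothetical; the sketch assumes annual data and that every net-zero year falls after the splice year.

```python
from statistics import median


def scenario_gap(nghgi_actual, bm_direct, indirect, splice_year, nz_year):
    """Convention gap for one scenario (hypothetical helper).

    Historical segment: NGHGI actual minus the scenario's BM (direct) LULUCF.
    Future segment: the scenario's indirect flux only, up to its net-zero year.
    """
    hist = sum(nghgi_actual[t] - bm_direct[t] for t in range(2020, splice_year + 1))
    future = sum(indirect[t] for t in range(splice_year + 1, nz_year + 1))
    return hist + future


def median_gap(nghgi_actual, scenarios, splice_year):
    """Integrate each scenario to its own net-zero year, then take the median."""
    return median(
        scenario_gap(nghgi_actual, s["direct"], s["indirect"], splice_year, s["nz_year"])
        for s in scenarios
    )


# Toy data, not pipeline values.
nghgi = {2020: -2.0, 2021: -2.0}
scenarios = [
    {"direct": {2020: 1.0, 2021: 1.0}, "indirect": {2022: -3.0, 2023: -3.0}, "nz_year": 2023},
    {"direct": {2020: 0.0, 2021: 0.0}, "indirect": {2022: -1.0}, "nz_year": 2022},
    {"direct": {2020: 2.0, 2021: 2.0}, "indirect": {2022: -2.0, 2023: -2.0, 2024: -2.0}, "nz_year": 2024},
]
gap = median_gap(nghgi, scenarios, splice_year=2021)
```

Taking the median after integration keeps each scenario's own net-zero bound, which a median of per-year values would not.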
World CO₂ timeseries for backward extension¶
When the allocation year is before 2020, historical emissions must be added back to the RCB (see RCB Pathway Generation above). For total CO₂, the per-year world emissions use the NGHGI convention:

\[
E_{\text{CO}_2}^{\text{world}}(t) = E_{\text{fossil}}(t) + L_{\text{NGHGI}}(t)
\]
Where LULUCF uses:
- 2000 onwards: Melo NGHGI LULUCF (nationally aggregated inventory data, v3.1)
- Pre-2000: Not available in NGHGI convention. Categories including LULUCF are limited to the NGHGI data range (2000+). No NGHGI/BM splicing is performed. While earlier NGHGI LULUCF estimates may exist (e.g., via Grassi et al. or historical extensions of Melo), the official NGHGI data begins in 2000. Extending to 1990 would require splicing heterogeneous datasets, which risks introducing artefacts (the BM-to-NGHGI transition around 1990 shows a large jump). We plan to extend coverage when validated pre-2000 NGHGI LULUCF data becomes available.
This ensures the world timeseries passed to calculate_budget_from_rcb is NGHGI-consistent, and that function works identically for both co2-ffi and co2 categories.
Data requirements for new scenario sources¶
When adding a new RCB source (e.g., a new publication with a different baseline year or scenario set), the following data are needed:
| Data needed | Used for | Source |
|---|---|---|
| RCB value + baseline year | Starting point | Published literature |
| Actual fossil CO₂ | Rebase | PRIMAP (already in pipeline) |
| Actual BM LULUCF | co2 rebase | PRIMAP co2-lulucf (already in pipeline) |
| Per-year BM LULUCF pathway | co2-ffi LULUCF decomposition | Scenario data (AFOLU|Direct median) |
| Net-zero year | Integration limit for bunkers + LULUCF | Scenario data |
| Convention gap | co2 BM-to-NGHGI adjustment | NGHGI + scenario Indirect AFOLU |
| Bunker fuel timeseries | Bunker deduction | NGHGI (already in pipeline) |
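The RCB value and baseline year come from the publication itself. Actual fossil CO₂, actual BM LULUCF, and the bunker fuel timeseries are observational and already available in the pipeline. The remaining three rows (BM LULUCF pathway, net-zero year, convention gap) require scenario data for the new source's mitigation pathway category.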
Data sources¶
| Component | Source | Coverage |
|---|---|---|
| Fossil CO₂ | PRIMAP-hist v2.6.1 | 1750--present |
| BM LULUCF (actual) | PRIMAP co2-lulucf | Country-level, annual |
| NGHGI LULUCF | Melo et al. (2026) v3.1 NGHGI LULUCF | 2000--2023, 187 countries + world |
| BM LULUCF proxy | Gidden et al. AR6 reanalysis, AFOLU|Direct | 2015--2100, per scenario within category |
| Passive flux | Gidden et al. AR6 reanalysis, AFOLU|Indirect | 2015--2100, per scenario within category |
| Net-zero years | Gidden et al. AR6 reanalysis, Emissions|CO2 | Per scenario (first year total CO₂ ≤ 0) |
| Bunker fuels | GCB2024 historical + rate extrapolation | Historical + extrapolated to median NZ |
API Reference → | src/fair_shares/library/utils/data/nghgi.py
Worked example: AR6 WG1 1.5C 50% (1.5p50)¶
Using ar6_2020 source: 500 GtCO₂ total from 2020, scenario 1.5p50 (70 C1 scenarios, median NZ year ~2050). Values from make dev-pipeline-rcbs with PRIMAP v2025.03 emissions and Melo v3.1 LULUCF.
Step 1: Weber corrections (RCB to allocatable budget at 2020)¶
| | co2-ffi | co2 |
|---|---|---|
| Published RCB (total CO₂) | 500 Gt | 500 Gt |
| Fossil rebase | 0 (base=2020) | 0 (base=2020) |
| BM LULUCF rebase | -- | 0 (base=2020) |
| LULUCF decomposition / gap | 0 (BM sink capped) | -90 Gt (conv gap) |
| Bunker subtraction | -35 Gt | -35 Gt |
| Allocatable budget (2020) | 465 Gt | 375 Gt |
co2-ffi: The cumulative BM LULUCF is estimated from AR6 scenarios in the corresponding climate category (here 1.5°C 50th percentile). Each scenario's AFOLU|Direct is integrated from 2020 to its own NZ year, then the median across scenarios is taken. The result is a net sink. Under the precautionary cap (default), this sink is not credited to the fossil budget (capped to 0). Without the cap (precautionary_lulucf: false), the fossil budget would increase.
co2: The convention gap is -90 Gt — NGHGI reports a larger land CO₂ sink than bookkeeping models, reducing the allocatable budget. The gap is computed from Melo v3.1 NGHGI LULUCF and Gidden AR6 reanalysis (see Convention gap decomposition). Bunker deduction is ~35 Gt (~870 Mt/yr integrated to median NZ year ~2050). The co2 budget is lower than co2-ffi because the convention gap is a significant negative adjustment.
For Lamboll 2023 (1.5C, 247 Gt from 2023):
| | co2-ffi | co2 |
|---|---|---|
| Published RCB | 247 Gt | 247 Gt |
| Fossil rebase (2020--2022) | +107 Gt | +107 Gt |
| BM LULUCF rebase (2020--2022) | -- | -12 Gt |
| LULUCF decomposition / gap | 0 (BM sink capped) | -90 Gt (conv gap) |
| Bunker subtraction | -35 Gt | -35 Gt |
| Allocatable budget (2020) | 319 Gt | 217 Gt |
Step 2: Allocation year adjustment¶
The allocatable budget is the budget from 2020 onwards. The allocation_year parameter shifts the starting point by adding historical emissions (before 2020) or subtracting already-used emissions (after 2020).
To regenerate these values, run make dev-pipeline-rcbs.
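The allocation-year adjustment in Step 2 can be sketched as below. This is not `calculate_budget_from_rcb`: the function name is hypothetical, the year ranges (emissions from the allocation year through 2019 added, emissions from 2020 through the year before the allocation year subtracted) are assumptions, and the world emissions are round illustrative numbers in GtCO₂.

```python
def budget_from_allocation_year(budget_2020, world_emissions, allocation_year,
                                reference_year=2020):
    """Shift a 2020-based budget to start at `allocation_year`.

    Hypothetical helper; world_emissions is a dict year -> annual GtCO2.
    """
    if allocation_year <= reference_year:
        # Earlier start: add back emissions already counted as history.
        added = sum(world_emissions[y] for y in range(allocation_year, reference_year))
        return budget_2020 + added
    # Later start: remove emissions already used since the reference year.
    used = sum(world_emissions[y] for y in range(reference_year, allocation_year))
    return budget_2020 - used


world = {2018: 36.0, 2019: 37.0, 2020: 35.0, 2021: 36.5}
from_2018 = budget_from_allocation_year(465.0, world, 2018)
from_2022 = budget_from_allocation_year(465.0, world, 2022)
```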
See Also¶
- Allocation Approaches -- Design choices
- API Reference -- Function documentation