Data Processing Utilities¶

Functions for processing and transforming input datasets: convergence pathway data preparation, NGHGI-consistent RCB corrections, and RCB scenario processing.

Convergence Data Processing¶

process_emissions_data¶

fair_shares.library.utils.data.convergence.process_emissions_data ¶

Python

process_emissions_data(
    country_actual_emissions_ts: TimeseriesDataFrame,
    first_allocation_year: int,
    emission_category: str,
    group_level: str,
    unit_level: str,
    ur: PlainRegistry,
) -> tuple[
    DataFrame, DataFrame, dict[int, str | int | float], str
]

Process country emissions data and extract initial shares.

Parameters:

Name	Type	Description	Default
`country_actual_emissions_ts`	`TimeseriesDataFrame`	Raw country emissions data.	required
`first_allocation_year`	`int`	Year to start allocation.	required
`emission_category`	`str`	Emission category to analyze.	required
`group_level`	`str`	Index level for grouping (e.g., 'iso3c').	required
`unit_level`	`str`	Index level for units.	required
`ur`	`PlainRegistry`	Unit registry.	required

Returns:

Type	Description
`tuple`	(emissions_full_numeric, emissions_countries_full, year_to_label, start_column)

calculate_initial_shares¶

fair_shares.library.utils.data.convergence.calculate_initial_shares ¶

Python

calculate_initial_shares(
    emissions_countries_full: DataFrame,
    start_column: str,
    group_level: str,
) -> tuple[Series, float]

Calculate initial emission shares from actual emissions at start year.

Parameters:

Name	Type	Description	Default
`emissions_countries_full`	`DataFrame`	Country emissions data (World rows already filtered out).	required
`start_column`	`str`	Column label for first allocation year.	required
`group_level`	`str`	Index level for grouping.	required

Returns:

Type	Description
`tuple`	(country_totals, country_sum) where country_totals is Series of emissions by country and country_sum is the total.
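The returned pair can be illustrated with a minimal pandas sketch. It is not the library's implementation: the function name `initial_shares_sketch`, the two-country index and the emission values are all made up for illustration, and it assumes the totals are a per-group sum of the start-year column.

```python
import pandas as pd


def initial_shares_sketch(emissions: pd.DataFrame, start_column: str, group_level: str):
    """Hypothetical helper: return (country_totals, country_sum) for one year column."""
    # Sum the start-year column within each group (one entry per iso3c code here).
    country_totals = emissions[start_column].groupby(level=group_level).sum()
    # Total across all groups; dividing by it yields each group's initial share.
    country_sum = float(country_totals.sum())
    return country_totals, country_sum


# Made-up data: two countries, World rows already excluded.
index = pd.MultiIndex.from_tuples(
    [("AAA", "Mt * CO2"), ("BBB", "Mt * CO2")],
    names=["iso3c", "unit"],
)
emissions = pd.DataFrame({"2020": [30.0, 70.0]}, index=index)

totals, total = initial_shares_sketch(emissions, "2020", "iso3c")
shares = totals / total
```

Dividing `totals` by `total` gives shares that sum to one, which is the quantity a convergence pathway would start from.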

process_world_scenario_data¶

fair_shares.library.utils.data.convergence.process_world_scenario_data ¶

Python

process_world_scenario_data(
    world_scenario_emissions_ts: TimeseriesDataFrame,
    first_allocation_year: int,
    group_level: str,
    unit_level: str,
    ur: PlainRegistry,
) -> tuple[
    DataFrame,
    Series,
    list[str],
    dict[int, str | int | float],
    str,
    float,
]

Process world scenario emissions and calculate year fractions.

Parameters:

Name	Type	Description	Default
`world_scenario_emissions_ts`	`TimeseriesDataFrame`	World emissions pathway.	required
`first_allocation_year`	`int`	Year to start allocation.	required
`group_level`	`str`	Index level for grouping.	required
`unit_level`	`str`	Index level for units.	required
`ur`	`PlainRegistry`	Unit registry.	required

Returns:

Type	Description
`tuple`	(emissions_world, year_fraction_of_cumulative_emissions, sorted_columns, world_year_to_label, world_start_column, world_total)
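One plausible reading of the "year fractions" is each year's share of the cumulative world pathway from the first allocation year onward. The sketch below works under that assumption only; the variable names mirror the return tuple, but the values are invented and the real function may differ in detail.

```python
import pandas as pd

# Made-up world pathway with string year labels, in Mt CO2/yr.
world = pd.Series({"2019": 50.0, "2020": 40.0, "2021": 30.0, "2022": 10.0})
first_allocation_year = 2020

# Keep only years from the first allocation year onward, in chronological order.
sorted_columns = sorted(c for c in world.index if int(c) >= first_allocation_year)
pathway = world[sorted_columns]

# Cumulative world emissions over the allocation window.
world_total = float(pathway.sum())

# Fraction of the cumulative total emitted in each year (assumed definition).
year_fraction_of_cumulative_emissions = pathway / world_total
```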

process_population_data¶

fair_shares.library.utils.data.convergence.process_population_data ¶

Python

process_population_data(
    population_ts: TimeseriesDataFrame,
    first_allocation_year: int,
    group_level: str,
    unit_level: str,
    ur: PlainRegistry,
    cumulative_start_year: int | None = None,
) -> Series

Process population data and calculate cumulative population by group.

Parameters:

Name	Type	Description	Default
`population_ts`	`TimeseriesDataFrame`	Population time series.	required
`first_allocation_year`	`int`	Year to start allocation.	required
`group_level`	`str`	Index level for grouping.	required
`unit_level`	`str`	Index level for units.	required
`ur`	`PlainRegistry`	Unit registry.	required
`cumulative_start_year`	`int \| None`	If provided, cumulative population is computed from this year instead of first_allocation_year. Must be <= first_allocation_year. This shifts entitlements toward historically populous countries when early start years (e.g. 1850) are used.	`None`

Returns:

Type	Description
`Series`	Cumulative population by group.

build_result_dataframe¶

fair_shares.library.utils.data.convergence.build_result_dataframe ¶

Python

build_result_dataframe(
    shares_by_group: DataFrame,
    emissions_countries_index: Index,
    world_time_columns: list[str],
    group_level: str,
    unit_level: str,
) -> DataFrame

Build final result DataFrame with proper index structure.

Parameters:

Name	Type	Description	Default
`shares_by_group`	`DataFrame`	Calculated shares indexed by group.	required
`emissions_countries_index`	`Index`	Original emissions index for alignment.	required
`world_time_columns`	`list[str]`	Year columns from world scenario.	required
`group_level`	`str`	Index level for grouping.	required
`unit_level`	`str`	Index level for units.	required

Returns:

Type	Description
`DataFrame`	Result DataFrame with proper multi-index structure.

NGHGI Corrections¶

Functions for converting IPCC RCBs to NGHGI-consistent values following Weber et al. (2026). See Scientific Documentation for methodology.

load_ar6_category_constants¶

fair_shares.library.utils.data.nghgi.load_ar6_category_constants ¶

Python

load_ar6_category_constants(
    path: str | Path,
) -> dict[str, dict]

Load pre-computed AR6 category constants from YAML.

The constants file is generated by the RCB preprocessing notebook and contains per-category net-zero years and scenario counts extracted from AR6 reanalysis data.

Parameters:

Name	Type	Description	Default
`path`	`str or Path`	Path to `ar6_category_constants.yaml`	required

Returns:

Type	Description
`dict[str, dict]`	Mapping of AR6 category (e.g. "C1") to dict with keys: `nz_year_median`, `n_scenarios`, and distribution stats (`nz_year_min`, `nz_year_q25`, `nz_year_q75`, `nz_year_max`, `n_reaching_nz`)

Raises:

Type	Description
`DataLoadingError`	If the file does not exist or cannot be parsed
`DataProcessingError`	If required keys are missing from any category

load_world_co2_lulucf¶

fair_shares.library.utils.data.nghgi.load_world_co2_lulucf ¶

Python

load_world_co2_lulucf(
    path: str | Path,
) -> tuple[DataFrame, int]

Load world-total NGHGI LULUCF CO2 timeseries from notebook-produced CSV.

Reads the world-total NGHGI-reported LULUCF CO2 values produced by notebook 107 (Melo v3.1). The CSV has a single row with a "source" index and string year columns. Values are in MtCO2/yr (negative = net sink).

The splice year (last year of NGHGI data) is derived dynamically from the data rather than being hardcoded.

Parameters:

Name	Type	Description	Default
`path`	`str or Path`	Path to `world_co2-lulucf_timeseries.csv`	required

Returns:

Type	Description
`tuple[DataFrame, int]`	(nghgi_ts, splice_year) where nghgi_ts is a single-row DataFrame indexed by ["source"] with string year columns, and splice_year is the last year of NGHGI data coverage.

Raises:

Type	Description
`DataLoadingError`	If the file does not exist or expected structure is missing
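A rough sketch of deriving the splice year from the data rather than hardcoding it. The CSV text, the `nghgi` row label and the values are invented for illustration, and taking the last year column that holds data is an assumption about how "last year of NGHGI data" is determined.

```python
import io

import pandas as pd

# Made-up CSV content in the described layout: one row, a "source" index,
# year columns, values in MtCO2/yr (negative = net sink).
csv_text = "source,2021,2022,2023\nnghgi,-2500.0,-2450.0,-2400.0\n"

nghgi_ts = pd.read_csv(io.StringIO(csv_text), index_col="source")
# Make sure year labels are strings, as the documented structure requires.
nghgi_ts.columns = nghgi_ts.columns.astype(str)

# Assumed rule: the splice year is the last year column that contains data.
covered_years = [int(c) for c in nghgi_ts.columns if nghgi_ts[c].notna().any()]
splice_year = max(covered_years)
```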

load_bunker_timeseries¶

fair_shares.library.utils.data.nghgi.load_bunker_timeseries ¶

Python

load_bunker_timeseries(path: str | Path) -> DataFrame

Load international bunker fuel CO2 timeseries from notebook-produced CSV.

Reads the intermediate CSV produced by notebook 107 (LULUCF & bunker preprocessing). The CSV has a single row with a "source" index and string year columns. Values are already in MtCO2/yr.

Parameters:

Name	Type	Description	Default
`path`	`str or Path`	Path to `bunker_timeseries.csv`	required

Returns:

Type	Description
`DataFrame`	Single-row DataFrame indexed by ["source"] with string year columns and values in MtCO2/yr

Raises:

Type	Description
`DataLoadingError`	If the file does not exist or expected structure is missing

compute_bunker_deduction¶

fair_shares.library.utils.data.nghgi.compute_bunker_deduction ¶

Python

compute_bunker_deduction(
    bunker_ts: DataFrame,
    start_year: int,
    net_zero_year: int,
    historical_end_year: int = 2023,
) -> float

Compute cumulative international bunker fuel CO2 deduction.

Combines historical year-by-year values from GCB2024 with extrapolation from the last observed annual rate for years beyond the historical record.

Parameters:

Name	Type	Description	Default
`bunker_ts`	`DataFrame`	Bunker fuel CO2 timeseries (from load_bunker_timeseries) in MtCO2/yr	required
`start_year`	`int`	Start of integration window (inclusive)	required
`net_zero_year`	`int`	End of integration window (inclusive)	required
`historical_end_year`	`int`	Last year covered by the historical timeseries (default: 2023, matching GCB2024 coverage)	`2023`

Returns:

Type	Description
`float`	Total cumulative bunker deduction in MtCO2 (always positive)

Raises:

Type	Description
`DataProcessingError`	If historical data is insufficient for the start_year
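The combination of historical values and extrapolation might look like the sketch below. It assumes "extrapolation from the last observed annual rate" means holding the final historical value constant for every later year; the helper name, the `ValueError` (standing in for the library's own error type) and the bunker values are hypothetical.

```python
import pandas as pd


def bunker_deduction_sketch(
    bunker_ts: pd.DataFrame,
    start_year: int,
    net_zero_year: int,
    historical_end_year: int = 2023,
) -> float:
    """Hypothetical helper: historical sum plus constant-rate extrapolation."""
    row = bunker_ts.iloc[0]
    # Years covered by the historical record inside the integration window.
    hist_years = range(start_year, min(net_zero_year, historical_end_year) + 1)
    missing = [y for y in hist_years if str(y) not in row.index]
    if missing or str(historical_end_year) not in row.index:
        raise ValueError(f"historical data insufficient, missing years: {missing}")
    historical = sum(float(row[str(y)]) for y in hist_years)
    # Years after the historical record, each assumed to emit at the last observed rate.
    first_future = max(start_year, historical_end_year + 1)
    n_future = max(0, net_zero_year - first_future + 1)
    projected = float(row[str(historical_end_year)]) * n_future
    return historical + projected


# Made-up single-row timeseries in MtCO2/yr.
bunker_ts = pd.DataFrame(
    {"2020": [1000.0], "2021": [1100.0], "2022": [1200.0], "2023": [1300.0]},
    index=pd.Index(["bunkers"], name="source"),
)
deduction = bunker_deduction_sketch(bunker_ts, start_year=2020, net_zero_year=2025)
```

Here the window 2020 to 2025 covers four historical years (4600) plus two extrapolated years at 1300 each.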

build_nghgi_world_co2_timeseries¶

fair_shares.library.utils.data.nghgi.build_nghgi_world_co2_timeseries ¶

Python

build_nghgi_world_co2_timeseries(
    fossil_ts: DataFrame,
    nghgi_ts: DataFrame,
    bunker_ts: DataFrame,
) -> DataFrame

Construct NGHGI-consistent world total CO2 timeseries.

For backward extension to allocation years before 2020, Weber Eq. 3 requires a per-year world CO2 total of fossil - bunkers + LULUCF, where the LULUCF term uses:

2000 onwards: NGHGI LULUCF (Melo v3.1)
Pre-2000: NaN (no fallback; NGHGI coverage only)

No NGHGI/BM splicing is performed. Years outside NGHGI coverage are NaN.

Parameters:

Name	Type	Description	Default
`fossil_ts`	`DataFrame`	World CO2-FFI emissions timeseries (PRIMAP) in Mt CO2/yr. Must have string year columns and a MultiIndex with (iso3c, unit, emission-category).	required
`nghgi_ts`	`DataFrame`	NGHGI LULUCF historical timeseries (from load_world_co2_lulucf) in MtCO2/yr. Single-row DataFrame with string year columns.	required
`bunker_ts`	`DataFrame`	Bunker fuel CO2 timeseries (from load_bunker_timeseries) in MtCO2/yr. Single-row DataFrame with string year columns.	required

Returns:

Type	Description
`DataFrame`	Single-row DataFrame with same index structure as fossil_ts but emission-category label set to "co2", containing per-year NGHGI-consistent total CO2 = fossil - bunkers + LULUCF. Years outside NGHGI LULUCF coverage will be NaN.

compute_cumulative_emissions¶

fair_shares.library.utils.data.nghgi.compute_cumulative_emissions ¶

Python

compute_cumulative_emissions(
    timeseries: DataFrame, start_year: int, end_year: int
) -> float

Integrate a single-row timeseries DataFrame over a year range.

Sums values for all years from start_year to end_year (inclusive). Missing years are skipped (not interpolated) since gap-filling is the caller's responsibility.

Parameters:

Name	Type	Description	Default
`timeseries`	`DataFrame`	Single-row DataFrame with string year columns (as produced by the load_* functions in this module)	required
`start_year`	`int`	First year to include (inclusive)	required
`end_year`	`int`	Last year to include (inclusive)	required

Returns:

Type	Description
`float`	Cumulative sum over the requested year range

Raises:

Type	Description
`DataProcessingError`	If no year columns fall within the requested range
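A minimal sketch of the inclusive-range sum with skipped years, assuming a single-row DataFrame with string year columns. The helper name is hypothetical and `ValueError` stands in for the library's `DataProcessingError`.

```python
import pandas as pd


def cumulative_sketch(timeseries: pd.DataFrame, start_year: int, end_year: int) -> float:
    """Hypothetical helper: sum one row over [start_year, end_year], skipping gaps."""
    wanted = [str(year) for year in range(start_year, end_year + 1)]
    # Years without a column are skipped rather than interpolated.
    present = [label for label in wanted if label in timeseries.columns]
    if not present:
        raise ValueError("no year columns fall within the requested range")
    return float(timeseries[present].iloc[0].sum())


# Made-up timeseries with a gap at 2022.
ts = pd.DataFrame(
    {"2020": [10.0], "2021": [20.0], "2023": [40.0]},
    index=pd.Index(["example"], name="source"),
)
total = cumulative_sketch(ts, 2020, 2023)
```

Because 2022 is absent, the result covers three of the four requested years; filling that gap beforehand is left to the caller, as noted above.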

RCB Processing¶

Functions for parsing RCB scenarios and converting to allocation-ready budgets.

parse_rcb_scenario¶

fair_shares.library.utils.data.rcb.parse_rcb_scenario ¶

Python

parse_rcb_scenario(scenario_string: str) -> tuple[str, str]

Parse RCB scenario string into climate assessment and quantile.

RCB scenario strings follow the format "TEMPpPROB" where TEMP is the temperature target (e.g., "1.5" or "2") and PROB is the probability as a percentage (e.g., "50" or "66").

Parameters:

Name	Type	Description	Default
`scenario_string`	`str`	RCB scenario string (e.g., "1.5p50", "2p66")	required

Returns:

Type	Description
`tuple[str, str]`	A tuple of (climate_assessment, quantile), both strings. climate_assessment is the temperature target with a "C" suffix (e.g., "1.5C"); quantile is the probability as a decimal string (e.g., "0.5").
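The "TEMPpPROB" format can be parsed as in the sketch below. The function name is hypothetical, and the exact string formatting of the quantile (for example "0.66" for "2p66") is an assumption extrapolated from the documented "0.5" example.

```python
def parse_rcb_scenario_sketch(scenario_string: str) -> tuple[str, str]:
    """Hypothetical re-implementation of the documented parsing rule."""
    # Split at the first "p": left is the temperature, right the probability in percent.
    temperature, separator, probability = scenario_string.partition("p")
    if not separator or not temperature or not probability.isdigit():
        raise ValueError(f"not a TEMPpPROB scenario string: {scenario_string!r}")
    climate_assessment = f"{temperature}C"
    # Convert the percentage to a decimal string, e.g. "50" -> "0.5".
    quantile = str(int(probability) / 100)
    return climate_assessment, quantile


parsed_15 = parse_rcb_scenario_sketch("1.5p50")
parsed_2 = parse_rcb_scenario_sketch("2p66")
```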

calculate_budget_from_rcb¶

fair_shares.library.utils.data.rcb.calculate_budget_from_rcb ¶

Python

calculate_budget_from_rcb(
    rcb_value: float,
    allocation_year: int,
    world_scenario_emissions_ts: TimeseriesDataFrame,
    verbose: bool = True,
) -> float

Calculate total budget to allocate based on RCB value and allocation year.

RCB (Remaining Carbon Budget) values represent the remaining budget FROM 2020 onwards. The total budget to allocate depends on the allocation year:

If allocation_year < 2020: Add historical emissions (allocation_year to 2019)
If allocation_year == 2020: Use RCB directly
If allocation_year > 2020: Subtract emissions already used (2020 to allocation_year-1)

This ensures that the budget allocation is consistent regardless of which year is chosen as the allocation starting point.

All values are in Mt * CO2. RCB values are converted from Gt to Mt during preprocessing to match the units used in world_scenario_emissions_ts.

Parameters:

Name	Type	Description	Default
`rcb_value`	`float`	Remaining Carbon Budget value in Mt CO2 (from 2020 onwards)	required
`allocation_year`	`int`	Year when budget allocation should start	required
`world_scenario_emissions_ts`	`TimeseriesDataFrame`	World scenario emissions timeseries data with year columns (in Mt CO2)	required
`verbose`	`bool`	Whether to print detailed calculation information (default: True)	`True`

Returns:

Type	Description
`float`	Total budget to allocate in Mt CO2
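The three cases for the allocation year reduce to simple arithmetic, sketched here with a plain dict standing in for the world emissions timeseries. The helper name and all numbers are made up; the real function takes a TimeseriesDataFrame and can print diagnostics.

```python
def budget_from_rcb_sketch(
    rcb_value: float, allocation_year: int, world_emissions: dict[str, float]
) -> float:
    """Hypothetical helper: total budget in Mt CO2 for a given allocation year."""
    if allocation_year < 2020:
        # Add emissions from allocation_year through 2019.
        added = sum(world_emissions[str(y)] for y in range(allocation_year, 2020))
        return rcb_value + added
    if allocation_year > 2020:
        # Subtract emissions already used from 2020 through allocation_year - 1.
        used = sum(world_emissions[str(y)] for y in range(2020, allocation_year))
        return rcb_value - used
    # allocation_year == 2020: the RCB applies directly.
    return rcb_value


# Made-up world emissions (Mt CO2/yr) and a made-up RCB of 400 Gt = 400000 Mt.
world_emissions = {"2018": 36000.0, "2019": 37000.0, "2020": 35000.0, "2021": 36500.0}
rcb_mt = 400000.0

budget_2018 = budget_from_rcb_sketch(rcb_mt, 2018, world_emissions)
budget_2020 = budget_from_rcb_sketch(rcb_mt, 2020, world_emissions)
budget_2022 = budget_from_rcb_sketch(rcb_mt, 2022, world_emissions)
```

In every case, the budget plus the emissions between the allocation year and 2020 describes the same cumulative total, which is what makes the choice of allocation year consistent.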

process_rcb_to_2020_baseline¶

fair_shares.library.utils.data.rcb.process_rcb_to_2020_baseline ¶

Python

process_rcb_to_2020_baseline(
    rcb_value: float,
    rcb_unit: str,
    rcb_baseline_year: int,
    emission_category: str,
    world_co2_ffi_emissions: DataFrame,
    actual_bm_lulucf_emissions: DataFrame | None = None,
    bunkers_deduction_mt: float = 0.0,
    lulucf_deduction_mt: float = 0.0,
    target_baseline_year: int = 2020,
    source_name: str = "",
    scenario: str = "",
    verbose: bool = True,
) -> dict[str, float | str | int]

Process RCB from its original baseline year to 2020 baseline with adjustments.

This function converts RCB values from any baseline year (>= 2020) to a standardized 2020 baseline. It also applies adjustments for international bunkers and LULUCF following Weber et al. (2026).

The rebase always uses actual observational data (PRIMAP), never scenario projections. What enters the rebase depends on the emission category:

co2-ffi: Rebase uses fossil CO2 only. LULUCF is omitted because it cancels algebraically with the LULUCF decomposition term. lulucf_deduction = -L_BM(base,NZ) (positive, increases fossil budget).
co2: Rebase uses fossil CO2 + actual bookkeeping-model LULUCF. lulucf_deduction = convention_gap(base,NZ) (negative, reduces budget per Weber).

The calculation follows these steps:

1. Convert RCB from source unit to Mt * CO2e
2. If rcb_baseline_year > 2020: Add actual emissions from 2020 to (rcb_baseline_year - 1), fossil only for co2-ffi, fossil + BM LULUCF for co2
3. Subtract bunkers deduction (always reduces budget)
4. Apply LULUCF deduction (sign-ready from caller)

Sign convention for deduction parameters:

bunkers_deduction_mt: always positive (cumulative emissions), subtracted
lulucf_deduction_mt: sign-ready from caller (added directly to budget)
    For co2: convention gap, negative -> reduces budget (per Weber)
    For co2-ffi: negated BM LULUCF, positive -> increases fossil budget

Parameters:

Name	Type	Description	Default
`rcb_value`	`float`	Original RCB value from the source	required
`rcb_unit`	`str`	Unit of the RCB value (e.g., "Gt * CO2", "Mt * CO2")	required
`rcb_baseline_year`	`int`	The year from which the RCB is calculated (must be >= 2020)	required
`emission_category`	`str`	Emission category: "co2-ffi" or "co2". Controls whether BM LULUCF is included in the rebase.	required
`world_co2_ffi_emissions`	`DataFrame`	World-level CO2-FFI emissions timeseries with year columns (in Mt * CO2e)	required
`actual_bm_lulucf_emissions`	`DataFrame or None`	Actual bookkeeping-model LULUCF CO2 emissions from PRIMAP, with year columns (in Mt * CO2e). Used ONLY for the co2 rebase (default: None).	`None`
`bunkers_deduction_mt`	`float`	Total bunker CO2 emissions from 2020-2100 in Mt * CO2e (default: 0.0). Always positive; subtracted from budget.	`0.0`
`lulucf_deduction_mt`	`float`	LULUCF adjustment in Mt * CO2e, sign-ready (default: 0.0). Added directly to the budget -- caller is responsible for correct sign.	`0.0`
`target_baseline_year`	`int`	Target baseline year for standardization (default: 2020)	`2020`
`source_name`	`str`	Name of the RCB source for logging (default: "")	`''`
`scenario`	`str`	Scenario name for logging (default: "")	`''`
`verbose`	`bool`	Whether to print detailed calculation information (default: True)	`True`

Returns:

Type	Description
`dict`	Dictionary containing the following keys:

'rcb_2020_mt': RCB adjusted to 2020 baseline in Mt * CO2e
'rcb_original_value': Original RCB value (in source units)
'rcb_original_unit': Original RCB unit
'baseline_year': Original baseline year
'rebase_total_mt': Emissions added to rebase from source year to 2020 (positive, Mt * CO2e); fossil only for co2-ffi, fossil + actual BM LULUCF for co2
'rebase_fossil_mt': Fossil-only component of rebase (Mt * CO2e)
'rebase_lulucf_mt': Actual BM LULUCF component of rebase (Mt * CO2e); only non-zero for co2
'deduction_bunkers_mt': Bunker fuel deduction (negative, Mt * CO2e)
'deduction_lulucf_mt': LULUCF deduction (Mt * CO2e; sign depends on emission category)
'net_adjustment_mt': Total change from original to 2020 baseline (rebase + deductions, Mt * CO2e)

See Also¶