
Data Processing Utilities

Functions for processing and transforming input datasets: convergence pathway data preparation, NGHGI-consistent RCB corrections, and RCB scenario processing.

Convergence Data Processing

process_emissions_data

fair_shares.library.utils.data.convergence.process_emissions_data

Python
process_emissions_data(
    country_actual_emissions_ts: TimeseriesDataFrame,
    first_allocation_year: int,
    emission_category: str,
    group_level: str,
    unit_level: str,
    ur: PlainRegistry,
) -> tuple[
    DataFrame, DataFrame, dict[int, str | int | float], str
]

Process country emissions data and extract initial shares.

Parameters:

Name Type Description Default
country_actual_emissions_ts TimeseriesDataFrame

Raw country emissions data.

required
first_allocation_year int

Year to start allocation.

required
emission_category str

Emission category to analyze.

required
group_level str

Index level for grouping (e.g., 'iso3c').

required
unit_level str

Index level for units.

required
ur PlainRegistry

Unit registry.

required

Returns:

Type Description
tuple

(emissions_full_numeric, emissions_countries_full, year_to_label, start_column)

calculate_initial_shares

fair_shares.library.utils.data.convergence.calculate_initial_shares

Python
calculate_initial_shares(
    emissions_countries_full: DataFrame,
    start_column: str,
    group_level: str,
) -> tuple[Series, float]

Calculate initial emission shares from actual emissions at start year.

Parameters:

Name Type Description Default
emissions_countries_full DataFrame

Country emissions data (World rows already filtered out).

required
start_column str

Column label for first allocation year.

required
group_level str

Index level for grouping.

required

Returns:

Type Description
tuple

(country_totals, country_sum) where country_totals is a Series of emissions by country and country_sum is the total across all countries.
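As a rough illustration of the return value, the sketch below sums one start-year column by group and then totals it. It is not the library's implementation: the frame layout, the example values, and the helper name `sketch_initial_shares` are assumptions made for this example.

```python
import pandas as pd


def sketch_initial_shares(emissions: pd.DataFrame, start_column: str, group_level: str):
    """Hypothetical stand-in: per-group totals and their sum for one year column."""
    # Sum the start-year column within each group of the chosen index level.
    country_totals = emissions[start_column].groupby(level=group_level).sum()
    country_sum = float(country_totals.sum())
    return country_totals, country_sum


# Made-up data: two countries, string year columns, (iso3c, unit) index.
index = pd.MultiIndex.from_tuples(
    [("AAA", "Mt * CO2"), ("BBB", "Mt * CO2")],
    names=["iso3c", "unit"],
)
emissions = pd.DataFrame({"2020": [30.0, 10.0], "2021": [28.0, 11.0]}, index=index)

country_totals, country_sum = sketch_initial_shares(emissions, "2020", "iso3c")
# Dividing the totals by their sum gives shares that add up to one.
initial_shares = country_totals / country_sum
```

Returning the totals and the sum separately, rather than the shares, leaves the division to the caller.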

process_world_scenario_data

fair_shares.library.utils.data.convergence.process_world_scenario_data

Python
process_world_scenario_data(
    world_scenario_emissions_ts: TimeseriesDataFrame,
    first_allocation_year: int,
    group_level: str,
    unit_level: str,
    ur: PlainRegistry,
) -> tuple[
    DataFrame,
    Series,
    list[str],
    dict[int, str | int | float],
    str,
    float,
]

Process world scenario emissions and calculate year fractions.

Parameters:

Name Type Description Default
world_scenario_emissions_ts TimeseriesDataFrame

World emissions pathway.

required
first_allocation_year int

Year to start allocation.

required
group_level str

Index level for grouping.

required
unit_level str

Index level for units.

required
ur PlainRegistry

Unit registry.

required

Returns:

Type Description
tuple

(emissions_world, year_fraction_of_cumulative_emissions, sorted_columns, world_year_to_label, world_start_column, world_total)

process_population_data

fair_shares.library.utils.data.convergence.process_population_data

Python
process_population_data(
    population_ts: TimeseriesDataFrame,
    first_allocation_year: int,
    group_level: str,
    unit_level: str,
    ur: PlainRegistry,
    cumulative_start_year: int | None = None,
) -> Series

Process population data and calculate cumulative population by group.

Parameters:

Name Type Description Default
population_ts TimeseriesDataFrame

Population time series.

required
first_allocation_year int

Year to start allocation.

required
group_level str

Index level for grouping.

required
unit_level str

Index level for units.

required
ur PlainRegistry

Unit registry.

required
cumulative_start_year int | None

If provided, cumulative population is computed from this year instead of first_allocation_year. Must be <= first_allocation_year. This shifts entitlements toward historically populous countries when early start years (e.g. 1850) are used.

None

Returns:

Type Description
Series

Cumulative population by group.
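The effect of cumulative_start_year can be sketched with plain dictionaries standing in for the population time series. This is an illustration under assumptions, not the library's code: the helper name `sketch_cumulative_population` is hypothetical, the data are made up, and the window is assumed to run to the last available year.

```python
def sketch_cumulative_population(population, first_allocation_year, cumulative_start_year=None):
    """Hypothetical stand-in: sum population per group from the effective start year."""
    # Fall back to the allocation year when no earlier start year is given.
    start = first_allocation_year if cumulative_start_year is None else cumulative_start_year
    if start > first_allocation_year:
        raise ValueError("cumulative_start_year must be <= first_allocation_year")
    return {
        group: sum(value for year, value in series.items() if int(year) >= start)
        for group, series in population.items()
    }


# Made-up population values keyed by group, then by string year.
population = {
    "AAA": {"2019": 10.0, "2020": 11.0, "2021": 12.0},
    "BBB": {"2019": 5.0, "2020": 5.0, "2021": 5.0},
}

from_allocation_year = sketch_cumulative_population(population, 2020)
from_earlier_year = sketch_cumulative_population(population, 2020, cumulative_start_year=2019)
```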

build_result_dataframe

fair_shares.library.utils.data.convergence.build_result_dataframe

Python
build_result_dataframe(
    shares_by_group: DataFrame,
    emissions_countries_index: Index,
    world_time_columns: list[str],
    group_level: str,
    unit_level: str,
) -> DataFrame

Build final result DataFrame with proper index structure.

Parameters:

Name Type Description Default
shares_by_group DataFrame

Calculated shares indexed by group.

required
emissions_countries_index Index

Original emissions index for alignment.

required
world_time_columns list[str]

Year columns from world scenario.

required
group_level str

Index level for grouping.

required
unit_level str

Index level for units.

required

Returns:

Type Description
DataFrame

Result DataFrame with proper multi-index structure.

NGHGI Corrections

Functions for converting IPCC RCBs to NGHGI-consistent values following Weber et al. (2026). See Scientific Documentation for methodology.

load_ar6_category_constants

fair_shares.library.utils.data.nghgi.load_ar6_category_constants

Python
load_ar6_category_constants(
    path: str | Path,
) -> dict[str, dict]

Load pre-computed AR6 category constants from YAML.

The constants file is generated by the RCB preprocessing notebook and contains per-category net-zero years and scenario counts extracted from AR6 reanalysis data.

Parameters:

Name Type Description Default
path str or Path

Path to ar6_category_constants.yaml

required

Returns:

Type Description
dict[str, dict]

Mapping of AR6 category (e.g. "C1") to dict with keys: nz_year_median, n_scenarios, and distribution stats (nz_year_min, nz_year_q25, nz_year_q75, nz_year_max, n_reaching_nz)

Raises:

Type Description
DataLoadingError

If the file does not exist or cannot be parsed

DataProcessingError

If required keys are missing from any category

load_world_co2_lulucf

fair_shares.library.utils.data.nghgi.load_world_co2_lulucf

Python
load_world_co2_lulucf(
    path: str | Path,
) -> tuple[DataFrame, int]

Load world-total NGHGI LULUCF CO2 timeseries from notebook-produced CSV.

Reads the world-total NGHGI-reported LULUCF CO2 values produced by notebook 107 (Melo v3.1). The CSV has a single row with a "source" index and string year columns. Values are in MtCO2/yr (negative = net sink).

The splice year (last year of NGHGI data) is derived dynamically from the data rather than being hardcoded.

Parameters:

Name Type Description Default
path str or Path

Path to world_co2-lulucf_timeseries.csv

required

Returns:

Type Description
tuple[DataFrame, int]

(nghgi_ts, splice_year) where nghgi_ts is a single-row DataFrame indexed by ["source"] with string year columns, and splice_year is the last year of NGHGI data coverage.

Raises:

Type Description
DataLoadingError

If the file does not exist or expected structure is missing

load_bunker_timeseries

fair_shares.library.utils.data.nghgi.load_bunker_timeseries

Python
load_bunker_timeseries(path: str | Path) -> DataFrame

Load international bunker fuel CO2 timeseries from notebook-produced CSV.

Reads the intermediate CSV produced by notebook 107 (LULUCF & bunker preprocessing). The CSV has a single row with a "source" index and string year columns. Values are already in MtCO2/yr.

Parameters:

Name Type Description Default
path str or Path

Path to bunker_timeseries.csv

required

Returns:

Type Description
DataFrame

Single-row DataFrame indexed by ["source"] with string year columns and values in MtCO2/yr

Raises:

Type Description
DataLoadingError

If the file does not exist or expected structure is missing

compute_bunker_deduction

fair_shares.library.utils.data.nghgi.compute_bunker_deduction

Python
compute_bunker_deduction(
    bunker_ts: DataFrame,
    start_year: int,
    net_zero_year: int,
    historical_end_year: int = 2023,
) -> float

Compute cumulative international bunker fuel CO2 deduction.

Combines historical year-by-year values from GCB2024 with extrapolation from the last observed annual rate for years beyond the historical record.

Parameters:

Name Type Description Default
bunker_ts DataFrame

Bunker fuel CO2 timeseries (from load_bunker_timeseries) in MtCO2/yr

required
start_year int

Start of integration window (inclusive)

required
net_zero_year int

End of integration window (inclusive)

required
historical_end_year int

Last year covered by the historical timeseries (default: 2023, matching GCB2024 coverage)

2023

Returns:

Type Description
float

Total cumulative bunker deduction in MtCO2 (always positive)

Raises:

Type Description
DataProcessingError

If historical data is insufficient for the start_year
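The combination of historical values and extrapolation can be sketched as follows. The sketch assumes that "extrapolation from the last observed annual rate" means holding that rate constant for every later year. A plain dict keyed by string year stands in for the single-row DataFrame, and the helper name `sketch_bunker_deduction` and the values are hypothetical.

```python
def sketch_bunker_deduction(bunker_by_year, start_year, net_zero_year, historical_end_year=2023):
    """Hypothetical stand-in: historical sum plus flat extrapolation at the last observed rate."""
    # Historical part: year-by-year values up to the end of the record (or net zero, if earlier).
    last_historical = min(net_zero_year, historical_end_year)
    historical_years = range(start_year, last_historical + 1)
    missing = [y for y in historical_years if str(y) not in bunker_by_year]
    if missing:
        raise ValueError(f"no historical bunker data for years {missing}")
    historical = sum(bunker_by_year[str(y)] for y in historical_years)

    # Extrapolated part: every year after the record reuses the last observed annual rate.
    n_future_years = max(0, net_zero_year - max(historical_end_year, start_year - 1))
    last_rate = bunker_by_year[str(historical_end_year)]
    return historical + n_future_years * last_rate


# Made-up annual bunker emissions in MtCO2/yr.
bunker_by_year = {"2020": 900.0, "2021": 950.0, "2022": 1000.0, "2023": 1100.0}

# 2020-2023 from the record (3950) plus 2024 and 2025 at the 2023 rate (2 * 1100).
deduction = sketch_bunker_deduction(bunker_by_year, start_year=2020, net_zero_year=2025)
```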

build_nghgi_world_co2_timeseries

fair_shares.library.utils.data.nghgi.build_nghgi_world_co2_timeseries

Python
build_nghgi_world_co2_timeseries(
    fossil_ts: DataFrame,
    nghgi_ts: DataFrame,
    bunker_ts: DataFrame,
) -> DataFrame

Construct NGHGI-consistent world total CO2 timeseries.

For backward extension of allocation years < 2020, Weber Eq. 3 requires per-year world CO2 = fossil - bunkers + LULUCF, where LULUCF uses:

  • 2000 onwards: NGHGI LULUCF (Melo v3.1)
  • Pre-2000: NaN (no fallback; NGHGI coverage only)

No NGHGI/BM splicing is performed. Years outside NGHGI coverage are NaN.

Parameters:

Name Type Description Default
fossil_ts DataFrame

World CO2-FFI emissions timeseries (PRIMAP) in Mt CO2/yr. Must have string year columns and a MultiIndex with (iso3c, unit, emission-category).

required
nghgi_ts DataFrame

NGHGI LULUCF historical timeseries (from load_world_co2_lulucf) in MtCO2/yr. Single-row DataFrame with string year columns.

required
bunker_ts DataFrame

Bunker fuel CO2 timeseries (from load_bunker_timeseries) in MtCO2/yr. Single-row DataFrame with string year columns.

required

Returns:

Type Description
DataFrame

Single-row DataFrame with same index structure as fossil_ts but emission-category label set to "co2", containing per-year NGHGI-consistent total CO2 = fossil - bunkers + LULUCF. Years outside NGHGI LULUCF coverage will be NaN.
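The per-year arithmetic can be sketched with plain dicts keyed by string year in place of the DataFrames. This is an illustration only: the helper name `sketch_nghgi_world_co2` and the values are hypothetical, and treating a missing bunker value as NaN is an assumption (the text above only specifies NaN outside LULUCF coverage).

```python
import math


def sketch_nghgi_world_co2(fossil, lulucf, bunkers):
    """Hypothetical stand-in: per-year fossil - bunkers + LULUCF, NaN where a term is missing."""
    total = {}
    for year, fossil_value in fossil.items():
        if year in lulucf and year in bunkers:
            total[year] = fossil_value - bunkers[year] + lulucf[year]
        else:
            # No fallback source is spliced in, so the year stays undefined.
            total[year] = math.nan
    return total


# Made-up values in MtCO2/yr; LULUCF is negative for a net sink.
fossil = {"1999": 25000.0, "2000": 26000.0, "2001": 26500.0}
lulucf = {"2000": -2000.0, "2001": -2100.0}
bunkers = {"1999": 800.0, "2000": 850.0, "2001": 860.0}

world_co2 = sketch_nghgi_world_co2(fossil, lulucf, bunkers)
```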

compute_cumulative_emissions

fair_shares.library.utils.data.nghgi.compute_cumulative_emissions

Python
compute_cumulative_emissions(
    timeseries: DataFrame, start_year: int, end_year: int
) -> float

Integrate a single-row timeseries DataFrame over a year range.

Sums values for all years from start_year to end_year (inclusive). Missing years are skipped (not interpolated) since gap-filling is the caller's responsibility.

Parameters:

Name Type Description Default
timeseries DataFrame

Single-row DataFrame with string year columns (as produced by the load_* functions in this module)

required
start_year int

First year to include (inclusive)

required
end_year int

Last year to include (inclusive)

required

Returns:

Type Description
float

Cumulative sum over the requested year range

Raises:

Type Description
DataProcessingError

If no year columns fall within the requested range
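A minimal sketch of the skip-missing-years behaviour, assuming a single-row frame with string year columns as described above. The helper name `sketch_cumulative_emissions` is hypothetical, and the built-in ValueError stands in for DataProcessingError.

```python
import pandas as pd


def sketch_cumulative_emissions(timeseries: pd.DataFrame, start_year: int, end_year: int) -> float:
    """Hypothetical stand-in: sum the year columns that exist within [start_year, end_year]."""
    wanted = [str(year) for year in range(start_year, end_year + 1)]
    # Missing years are skipped rather than interpolated.
    present = [column for column in wanted if column in timeseries.columns]
    if not present:
        raise ValueError(f"no year columns between {start_year} and {end_year}")
    return float(timeseries[present].iloc[0].sum())


# Made-up series with no 2022 column.
ts = pd.DataFrame(
    {"2020": [100.0], "2021": [110.0], "2023": [130.0]},
    index=pd.Index(["example"], name="source"),
)

total = sketch_cumulative_emissions(ts, 2020, 2023)
```

Because 2022 is absent, it contributes nothing to the total, so callers that need a complete window should fill gaps before integrating.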

RCB Processing

Functions for parsing RCB scenarios and converting to allocation-ready budgets.

parse_rcb_scenario

fair_shares.library.utils.data.rcb.parse_rcb_scenario

Python
parse_rcb_scenario(scenario_string: str) -> tuple[str, str]

Parse RCB scenario string into climate assessment and quantile.

RCB scenario strings follow the format "TEMPpPROB" where TEMP is the temperature target (e.g., "1.5" or "2") and PROB is the probability as a percentage (e.g., "50" or "66").

Parameters:

Name Type Description Default
scenario_string str

RCB scenario string (e.g., "1.5p50", "2p66")

required

Returns:

Type Description
tuple[str, str]

A tuple of (climate_assessment, quantile) as strings:

  • climate_assessment: Temperature target with "C" suffix (e.g., "1.5C")
  • quantile: Probability as decimal string (e.g., "0.5")
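The "TEMPpPROB" format can be parsed along these lines. This is a sketch of the documented mapping rather than the library's code. The helper name `sketch_parse_rcb_scenario` and its error handling are assumptions.

```python
def sketch_parse_rcb_scenario(scenario_string: str) -> tuple[str, str]:
    """Hypothetical stand-in: split "TEMPpPROB" into ("TEMPC", decimal quantile string)."""
    temperature, separator, probability = scenario_string.partition("p")
    if not separator or not temperature or not probability:
        raise ValueError(f"expected 'TEMPpPROB', got {scenario_string!r}")
    # The probability is a percentage, so divide by 100 to get the quantile.
    quantile = str(int(probability) / 100)
    return f"{temperature}C", quantile


parsed_15 = sketch_parse_rcb_scenario("1.5p50")
parsed_2 = sketch_parse_rcb_scenario("2p66")
```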

calculate_budget_from_rcb

fair_shares.library.utils.data.rcb.calculate_budget_from_rcb

Python
calculate_budget_from_rcb(
    rcb_value: float,
    allocation_year: int,
    world_scenario_emissions_ts: TimeseriesDataFrame,
    verbose: bool = True,
) -> float

Calculate total budget to allocate based on RCB value and allocation year.

RCB (Remaining Carbon Budget) values represent the remaining budget FROM 2020 onwards. The total budget to allocate depends on the allocation year:

  • If allocation_year < 2020: Add historical emissions (allocation_year to 2019)
  • If allocation_year == 2020: Use RCB directly
  • If allocation_year > 2020: Subtract emissions already used (2020 to allocation_year-1)

This ensures that the budget allocation is consistent regardless of which year is chosen as the allocation starting point.

All values are in Mt * CO2. RCB values are converted from Gt to Mt during preprocessing to match the units used in world_scenario_emissions_ts.

Parameters:

Name Type Description Default
rcb_value float

Remaining Carbon Budget value in Mt CO2 (from 2020 onwards)

required
allocation_year int

Year when budget allocation should start

required
world_scenario_emissions_ts TimeseriesDataFrame

World scenario emissions timeseries data with year columns (in Mt CO2)

required
verbose bool

Whether to print detailed calculation information (default: True)

True

Returns:

Type Description
float

Total budget to allocate in Mt CO2
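The three allocation-year cases can be sketched as below, with a plain dict keyed by string year standing in for the world scenario timeseries. The helper name `sketch_budget_from_rcb` and all numbers are hypothetical; only the case logic follows the description above.

```python
def sketch_budget_from_rcb(rcb_value, allocation_year, world_emissions):
    """Hypothetical stand-in; world_emissions maps string years to Mt CO2."""
    if allocation_year < 2020:
        # Add emissions from allocation_year through 2019.
        historical = sum(world_emissions[str(y)] for y in range(allocation_year, 2020))
        return rcb_value + historical
    if allocation_year > 2020:
        # Subtract emissions already used from 2020 through allocation_year - 1.
        used = sum(world_emissions[str(y)] for y in range(2020, allocation_year))
        return rcb_value - used
    return rcb_value


# Made-up world emissions in Mt CO2 and a made-up RCB of 400 Gt CO2 expressed in Mt.
world_emissions = {"2018": 36000.0, "2019": 37000.0, "2020": 35000.0, "2021": 36500.0}
rcb_mt = 400000.0

budget_2018 = sketch_budget_from_rcb(rcb_mt, 2018, world_emissions)
budget_2020 = sketch_budget_from_rcb(rcb_mt, 2020, world_emissions)
budget_2022 = sketch_budget_from_rcb(rcb_mt, 2022, world_emissions)
```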

process_rcb_to_2020_baseline

fair_shares.library.utils.data.rcb.process_rcb_to_2020_baseline

Python
process_rcb_to_2020_baseline(
    rcb_value: float,
    rcb_unit: str,
    rcb_baseline_year: int,
    emission_category: str,
    world_co2_ffi_emissions: DataFrame,
    actual_bm_lulucf_emissions: DataFrame | None = None,
    bunkers_deduction_mt: float = 0.0,
    lulucf_deduction_mt: float = 0.0,
    target_baseline_year: int = 2020,
    source_name: str = "",
    scenario: str = "",
    verbose: bool = True,
) -> dict[str, float | str | int]

Process RCB from its original baseline year to 2020 baseline with adjustments.

This function converts RCB values from any baseline year (>= 2020) to a standardized 2020 baseline. It also applies adjustments for international bunkers and LULUCF following Weber et al. (2026).

The rebase always uses actual observational data (PRIMAP), never scenario projections. What enters the rebase depends on the emission category:

  • co2-ffi: Rebase uses fossil CO2 only. LULUCF is omitted because it cancels algebraically with the LULUCF decomposition term. lulucf_deduction = -L_BM(base,NZ) (positive, increases fossil budget).
  • co2: Rebase uses fossil CO2 + actual bookkeeping-model LULUCF. lulucf_deduction = convention_gap(base,NZ) (negative, reduces budget per Weber).

The calculation follows these steps:

  1. Convert RCB from source unit to Mt * CO2e
  2. If baseline_year > 2020: Add actual emissions from 2020 to (baseline_year - 1), fossil only for co2-ffi and fossil + BM LULUCF for co2
  3. Subtract bunkers deduction (always reduces budget)
  4. Apply LULUCF deduction (sign-ready from caller)

Sign convention for deduction parameters:

  • bunkers_deduction_mt: always positive (cumulative emissions), subtracted
  • lulucf_deduction_mt: sign-ready from caller (added directly to budget):
      • For co2: convention gap, negative -> reduces budget (per Weber)
      • For co2-ffi: negated BM LULUCF, positive -> increases fossil budget

Parameters:

Name Type Description Default
rcb_value float

Original RCB value from the source

required
rcb_unit str

Unit of the RCB value (e.g., "Gt * CO2", "Mt * CO2")

required
rcb_baseline_year int

The year from which the RCB is calculated (must be >= 2020)

required
emission_category str

Emission category: "co2-ffi" or "co2". Controls whether BM LULUCF is included in the rebase.

required
world_co2_ffi_emissions DataFrame

World-level CO2-FFI emissions timeseries with year columns (in Mt * CO2e)

required
actual_bm_lulucf_emissions DataFrame or None

Actual bookkeeping-model LULUCF CO2 emissions from PRIMAP, with year columns (in Mt * CO2e). Used ONLY for the co2 rebase (default: None).

None
bunkers_deduction_mt float

Total bunker CO2 emissions from 2020-2100 in Mt * CO2e (default: 0.0). Always positive; subtracted from budget.

0.0
lulucf_deduction_mt float

LULUCF adjustment in Mt * CO2e, sign-ready (default: 0.0). Added directly to the budget -- caller is responsible for correct sign.

0.0
target_baseline_year int

Target baseline year for standardization (default: 2020)

2020
source_name str

Name of the RCB source for logging (default: "")

''
scenario str

Scenario name for logging (default: "")

''
verbose bool

Whether to print detailed calculation information (default: True)

True

Returns:

Type Description
dict

Dictionary containing:

  • 'rcb_2020_mt': RCB adjusted to 2020 baseline in Mt * CO2e
  • 'rcb_original_value': Original RCB value (in source units)
  • 'rcb_original_unit': Original RCB unit
  • 'baseline_year': Original baseline year
  • 'rebase_total_mt': Emissions added to rebase from source year to 2020 (positive, Mt * CO2e); fossil only for co2-ffi, fossil + actual BM LULUCF for co2
  • 'rebase_fossil_mt': Fossil-only component of rebase (Mt * CO2e)
  • 'rebase_lulucf_mt': Actual BM LULUCF component of rebase (Mt * CO2e); only non-zero for co2
  • 'deduction_bunkers_mt': Bunker fuel deduction (negative, Mt * CO2e)
  • 'deduction_lulucf_mt': LULUCF deduction (Mt * CO2e; sign depends on emission category)
  • 'net_adjustment_mt': Total change from original to 2020 baseline (rebase + deductions, Mt * CO2e)
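The arithmetic behind these keys can be sketched for the co2-ffi case as follows. This is an illustration under assumptions, not the library's implementation: the helper name `sketch_rcb_to_2020` is hypothetical, the input is assumed to be in Gt (1 Gt = 1000 Mt), a dict keyed by string year replaces the emissions DataFrame, and only a subset of the documented keys is returned.

```python
def sketch_rcb_to_2020(rcb_value_gt, baseline_year, fossil_by_year,
                       bunkers_deduction_mt=0.0, lulucf_deduction_mt=0.0):
    """Hypothetical stand-in for the co2-ffi case; emissions are in Mt keyed by string year."""
    # Step 1: convert the source value from Gt to Mt.
    rcb_mt = rcb_value_gt * 1000.0
    # Step 2: add actual fossil emissions from 2020 to baseline_year - 1 (empty for 2020).
    rebase_fossil = sum(fossil_by_year[str(y)] for y in range(2020, baseline_year))
    # Steps 3 and 4: bunkers are subtracted; the LULUCF term is added with the caller's sign.
    rcb_2020 = rcb_mt + rebase_fossil - bunkers_deduction_mt + lulucf_deduction_mt
    return {
        "rcb_2020_mt": rcb_2020,
        "rebase_total_mt": rebase_fossil,
        "deduction_bunkers_mt": -bunkers_deduction_mt,
        "deduction_lulucf_mt": lulucf_deduction_mt,
        "net_adjustment_mt": rcb_2020 - rcb_mt,
    }


# Made-up inputs: a 250 Gt budget counted from 2023 and three years of fossil emissions.
fossil_by_year = {"2020": 35000.0, "2021": 36000.0, "2022": 37000.0}
result = sketch_rcb_to_2020(
    250.0, 2023, fossil_by_year,
    bunkers_deduction_mt=50000.0,
    lulucf_deduction_mt=30000.0,  # positive for co2-ffi, so it increases the fossil budget
)
```

For the co2 category, the rebase would also include actual BM LULUCF emissions, and the LULUCF term would be negative.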

See Also