Data Processing Utilities
Functions for processing and transforming input datasets: convergence pathway data preparation, corrections that make remaining carbon budgets (RCBs) consistent with national greenhouse gas inventories (NGHGIs), and RCB scenario processing.
Convergence Data Processing
process_emissions_data
fair_shares.library.utils.data.convergence.process_emissions_data
```python
process_emissions_data(
    country_actual_emissions_ts: TimeseriesDataFrame,
    first_allocation_year: int,
    emission_category: str,
    group_level: str,
    unit_level: str,
    ur: PlainRegistry,
) -> tuple[
    DataFrame, DataFrame, dict[int, str | int | float], str
]
```
Process country emissions data and extract initial shares.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country_actual_emissions_ts | TimeseriesDataFrame | Raw country emissions data. | required |
| first_allocation_year | int | Year to start allocation. | required |
| emission_category | str | Emission category to analyze. | required |
| group_level | str | Index level for grouping (e.g., 'iso3c'). | required |
| unit_level | str | Index level for units. | required |
| ur | PlainRegistry | Unit registry. | required |
Returns:
| Type | Description |
|---|---|
| tuple | (emissions_full_numeric, emissions_countries_full, year_to_label, start_column) |
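The return type of year_to_label is dict[int, str | int | float]. One plausible reading, assumed here and not confirmed by this reference, is that it maps each integer year to the column label actually used in the frame. Callers can then find the start column whatever type the label has. A minimal sketch with a hypothetical helper, build_year_to_label:

```python
from __future__ import annotations


def build_year_to_label(columns) -> dict[int, str | int | float]:
    """Hypothetical helper: map integer years to the original column labels.

    Labels that cannot be read as a year (e.g. metadata columns) are skipped.
    """
    mapping: dict[int, str | int | float] = {}
    for label in columns:
        try:
            year = int(float(label))
        except (TypeError, ValueError):
            continue  # not a year-like label
        mapping[year] = label
    return mapping


# Mixed label types, as can happen after CSV reads or pandas arithmetic
columns = ["2019", "2020", 2021.0, "notes"]
year_to_label = build_year_to_label(columns)
start_column = year_to_label[2020]  # label for first_allocation_year
```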
calculate_initial_shares
fair_shares.library.utils.data.convergence.calculate_initial_shares
```python
calculate_initial_shares(
    emissions_countries_full: DataFrame,
    start_column: str,
    group_level: str,
) -> tuple[Series, float]
```
Calculate initial emission shares from actual emissions at start year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| emissions_countries_full | DataFrame | Country emissions data (World rows already filtered out). | required |
| start_column | str | Column label for first allocation year. | required |
| group_level | str | Index level for grouping. | required |
Returns:
| Type | Description |
|---|---|
| tuple | (country_totals, country_sum) where country_totals is Series of emissions by country and country_sum is the total. |
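The reference does not say how callers turn these two values into shares. The sketch below assumes the natural step of dividing each country total by country_sum. It uses pandas directly, and initial_shares_sketch is a hypothetical stand-in, not the library function.

```python
import pandas as pd


def initial_shares_sketch(
    emissions_countries_full: pd.DataFrame, start_column: str, group_level: str
) -> tuple[pd.Series, float]:
    """Sketch: start-year emissions per group and their total."""
    # Sum the start-year column within each group (e.g. each iso3c code)
    country_totals = (
        emissions_countries_full[start_column].groupby(level=group_level).sum()
    )
    country_sum = float(country_totals.sum())
    return country_totals, country_sum


# Toy data: two countries, World rows already removed
index = pd.MultiIndex.from_tuples(
    [("AAA", "Mt * CO2"), ("BBB", "Mt * CO2")], names=["iso3c", "unit"]
)
emissions = pd.DataFrame({"2020": [30.0, 10.0], "2021": [29.0, 11.0]}, index=index)

country_totals, country_sum = initial_shares_sketch(emissions, "2020", "iso3c")
initial_shares = country_totals / country_sum  # AAA -> 0.75, BBB -> 0.25
```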
process_world_scenario_data
fair_shares.library.utils.data.convergence.process_world_scenario_data
```python
process_world_scenario_data(
    world_scenario_emissions_ts: TimeseriesDataFrame,
    first_allocation_year: int,
    group_level: str,
    unit_level: str,
    ur: PlainRegistry,
) -> tuple[
    DataFrame,
    Series,
    list[str],
    dict[int, str | int | float],
    str,
    float,
]
```
Process world scenario emissions and calculate year fractions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| world_scenario_emissions_ts | TimeseriesDataFrame | World emissions pathway. | required |
| first_allocation_year | int | Year to start allocation. | required |
| group_level | str | Index level for grouping. | required |
| unit_level | str | Index level for units. | required |
| ur | PlainRegistry | Unit registry. | required |
Returns:
| Type | Description |
|---|---|
| tuple | (emissions_world, year_fraction_of_cumulative_emissions, sorted_columns, world_year_to_label, world_start_column, world_total) |
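The name year_fraction_of_cumulative_emissions suggests each year's world emissions divided by the cumulative world total from the allocation start onwards. That interpretation is an assumption. A minimal pandas sketch of it, using a hypothetical helper and a plain Series of annual values keyed by string year labels:

```python
import pandas as pd


def year_fractions_sketch(world: pd.Series, first_allocation_year: int):
    """Sketch: share of cumulative world emissions falling in each year."""
    # Keep year labels from the allocation start onwards, sorted numerically
    sorted_columns = sorted(
        (label for label in world.index if int(label) >= first_allocation_year),
        key=int,
    )
    world_total = float(world[sorted_columns].sum())
    year_fraction = world[sorted_columns] / world_total
    return year_fraction, sorted_columns, world_total


# Toy world pathway (illustrative values only)
world = pd.Series({"2019": 40.0, "2020": 40.0, "2021": 30.0, "2022": 10.0})
fractions, sorted_columns, world_total = year_fractions_sketch(world, 2020)
```

Under this reading the fractions sum to 1, so multiplying them by any cumulative budget spreads that budget in proportion to the world pathway's shape.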
process_population_data
fair_shares.library.utils.data.convergence.process_population_data
```python
process_population_data(
    population_ts: TimeseriesDataFrame,
    first_allocation_year: int,
    group_level: str,
    unit_level: str,
    ur: PlainRegistry,
    cumulative_start_year: int | None = None,
) -> Series
```
Process population data and calculate cumulative population by group.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| population_ts | TimeseriesDataFrame | Population time series. | required |
| first_allocation_year | int | Year to start allocation. | required |
| group_level | str | Index level for grouping. | required |
| unit_level | str | Index level for units. | required |
| ur | PlainRegistry | Unit registry. | required |
| cumulative_start_year | int \| None | If provided, cumulative population is computed from this year instead of first_allocation_year. Must be <= first_allocation_year. This shifts entitlements toward historically populous countries when early start years (e.g. 1850) are used. | None |
Returns:
| Type | Description |
|---|---|
| Series | Cumulative population by group. |
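The effect of cumulative_start_year can be seen in a small sketch. It assumes cumulative population means the sum of every available year from the start year to the end of the data, aggregated per group. Both the end-year choice and the helper cumulative_population_sketch are assumptions, not the library's code.

```python
from __future__ import annotations

import pandas as pd


def cumulative_population_sketch(
    population: pd.DataFrame,
    first_allocation_year: int,
    group_level: str,
    cumulative_start_year: int | None = None,
) -> pd.Series:
    start = (
        first_allocation_year if cumulative_start_year is None else cumulative_start_year
    )
    if start > first_allocation_year:
        raise ValueError("cumulative_start_year must be <= first_allocation_year")
    # Sum every available year from the start year onwards, then aggregate by group
    years = [c for c in population.columns if int(c) >= start]
    return population[years].sum(axis=1).groupby(level=group_level).sum()


index = pd.MultiIndex.from_tuples(
    [("AAA", "million"), ("BBB", "million")], names=["iso3c", "unit"]
)
population = pd.DataFrame(
    {"2018": [1.0, 3.0], "2019": [1.0, 3.0], "2020": [2.0, 1.0], "2021": [2.0, 1.0]},
    index=index,
)
from_2020 = cumulative_population_sketch(population, 2020, "iso3c")
from_2018 = cumulative_population_sketch(
    population, 2020, "iso3c", cumulative_start_year=2018
)
```

In the toy data, BBB was larger before 2020. Moving the start year back to 2018 therefore raises its cumulative weight above AAA's, which is the shift toward historically populous countries that the parameter description mentions.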
build_result_dataframe
fair_shares.library.utils.data.convergence.build_result_dataframe
```python
build_result_dataframe(
    shares_by_group: DataFrame,
    emissions_countries_index: Index,
    world_time_columns: list[str],
    group_level: str,
    unit_level: str,
) -> DataFrame
```
Build final result DataFrame with proper index structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| shares_by_group | DataFrame | Calculated shares indexed by group. | required |
| emissions_countries_index | Index | Original emissions index for alignment. | required |
| world_time_columns | list[str] | Year columns from world scenario. | required |
| group_level | str | Index level for grouping. | required |
| unit_level | str | Index level for units. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | Result DataFrame with proper multi-index structure. |
NGHGI Corrections
Functions for converting IPCC RCBs to NGHGI-consistent values following Weber et al. (2026). See Scientific Documentation for methodology.
load_ar6_category_constants
fair_shares.library.utils.data.nghgi.load_ar6_category_constants
Load pre-computed AR6 category constants from YAML.
The constants file is generated by the RCB preprocessing notebook and contains per-category net-zero years and scenario counts extracted from AR6 reanalysis data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str or Path | Path to | required |
Returns:
| Type | Description |
|---|---|
| dict[str, dict] | Mapping of AR6 category (e.g. "C1") to dict with keys: |
Raises:
| Type | Description |
|---|---|
| DataLoadingError | If the file does not exist or cannot be parsed |
| DataProcessingError | If required keys are missing from any category |
load_world_co2_lulucf
fair_shares.library.utils.data.nghgi.load_world_co2_lulucf
Load world-total NGHGI LULUCF CO2 timeseries from notebook-produced CSV.
Reads the world-total NGHGI-reported LULUCF CO2 values produced by notebook 107 (Melo v3.1). The CSV has a single row with a "source" index and string year columns. Values are in MtCO2/yr (negative = net sink).
The splice year (last year of NGHGI data) is derived dynamically from the data rather than being hardcoded.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str or Path | Path to | required |
Returns:
| Type | Description |
|---|---|
| tuple[DataFrame, int] | (nghgi_ts, splice_year) where nghgi_ts is a single-row DataFrame indexed by ["source"] with string year columns, and splice_year is the last year of NGHGI data coverage. |
Raises:
| Type | Description |
|---|---|
| DataLoadingError | If the file does not exist or expected structure is missing |
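Deriving the splice year from the data could look like the sketch below, which reads a single-row CSV with a "source" index and string year columns. It takes the splice year to be the last year column holding a value; that rule is an assumption. The helper is hypothetical and raises ValueError where the library raises DataLoadingError.

```python
import io

import pandas as pd


def load_single_row_ts_sketch(source) -> tuple[pd.DataFrame, int]:
    """Hypothetical loader: single-row CSV -> (timeseries, splice_year)."""
    df = pd.read_csv(source, index_col="source")
    df.columns = [str(c) for c in df.columns]  # keep year labels as strings
    if len(df) != 1:
        raise ValueError(f"Expected exactly one row, found {len(df)}")
    # Splice year: the last year column that actually holds a value
    covered = [int(c) for c in df.columns if pd.notna(df.iloc[0][c])]
    if not covered:
        raise ValueError("No non-missing year values found")
    return df, max(covered)


# Toy CSV (illustrative values, MtCO2/yr); 2003 is empty, i.e. not yet reported
csv_text = "source,2000,2001,2002,2003\nnghgi,-100.0,-110.0,-105.0,\n"
nghgi_ts, splice_year = load_single_row_ts_sketch(io.StringIO(csv_text))
```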
load_bunker_timeseries
fair_shares.library.utils.data.nghgi.load_bunker_timeseries
Load international bunker fuel CO2 timeseries from notebook-produced CSV.
Reads the intermediate CSV produced by notebook 107 (LULUCF & bunker preprocessing). The CSV has a single row with a "source" index and string year columns. Values are already in MtCO2/yr.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str or Path | Path to | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | Single-row DataFrame indexed by ["source"] with string year columns and values in MtCO2/yr |
Raises:
| Type | Description |
|---|---|
| DataLoadingError | If the file does not exist or expected structure is missing |
compute_bunker_deduction
fair_shares.library.utils.data.nghgi.compute_bunker_deduction
```python
compute_bunker_deduction(
    bunker_ts: DataFrame,
    start_year: int,
    net_zero_year: int,
    historical_end_year: int = 2023,
) -> float
```
Compute cumulative international bunker fuel CO2 deduction.
Combines historical year-by-year values from GCB2024 with extrapolation from the last observed annual rate for years beyond the historical record.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bunker_ts | DataFrame | Bunker fuel CO2 timeseries (from load_bunker_timeseries) in MtCO2/yr | required |
| start_year | int | Start of integration window (inclusive) | required |
| net_zero_year | int | End of integration window (inclusive) | required |
| historical_end_year | int | Last year covered by the historical timeseries (default: 2023, matching GCB2024 coverage) | 2023 |
|
Returns:
| Type | Description |
|---|---|
float
|
Total cumulative bunker deduction in MtCO2 (always positive) |
Raises:
| Type | Description |
|---|---|
| DataProcessingError | If historical data is insufficient for the start_year |
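The historical-plus-extrapolation scheme can be sketched as follows. It assumes that "extrapolation from the last observed annual rate" means holding the historical_end_year value constant for every later year up to net_zero_year. The helper is hypothetical and raises ValueError in place of DataProcessingError.

```python
import pandas as pd


def bunker_deduction_sketch(
    bunker_ts: pd.DataFrame,
    start_year: int,
    net_zero_year: int,
    historical_end_year: int = 2023,
) -> float:
    row = bunker_ts.iloc[0]
    available = {int(c) for c in bunker_ts.columns if pd.notna(row[c])}
    # Historical part: every year from start_year up to the end of the record
    hist_years = range(start_year, min(net_zero_year, historical_end_year) + 1)
    missing = [y for y in hist_years if y not in available]
    if missing or historical_end_year not in available:
        raise ValueError(f"Insufficient historical bunker data, missing: {missing}")
    total = sum(float(row[str(y)]) for y in hist_years)
    # Future part: hold the last observed annual rate constant
    last_rate = float(row[str(historical_end_year)])
    first_future = max(start_year, historical_end_year + 1)
    n_future = max(0, net_zero_year - first_future + 1)
    return total + n_future * last_rate


# Toy bunker series (illustrative values, MtCO2/yr)
bunkers = pd.DataFrame(
    {"2020": [10.0], "2021": [11.0], "2022": [12.0], "2023": [13.0]},
    index=pd.Index(["toy"], name="source"),
)
deduction = bunker_deduction_sketch(bunkers, 2021, 2025)  # 11+12+13 + 2*13
```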
build_nghgi_world_co2_timeseries
fair_shares.library.utils.data.nghgi.build_nghgi_world_co2_timeseries
```python
build_nghgi_world_co2_timeseries(
    fossil_ts: DataFrame,
    nghgi_ts: DataFrame,
    bunker_ts: DataFrame,
) -> DataFrame
```
Construct NGHGI-consistent world total CO2 timeseries.
For backward extension to allocation years before 2020, Weber Eq. 3 requires per-year world CO2 = fossil - bunkers + LULUCF, where the LULUCF term is taken from:
- 2000 onwards: NGHGI LULUCF (Melo v3.1)
- Pre-2000: NaN (no fallback; NGHGI coverage only)

NGHGI data is not spliced with bookkeeping-model (BM) LULUCF estimates, so years outside NGHGI coverage are NaN.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fossil_ts | DataFrame | World CO2-FFI emissions timeseries (PRIMAP) in Mt CO2/yr. Must have string year columns and a MultiIndex with (iso3c, unit, emission-category). | required |
| nghgi_ts | DataFrame | NGHGI LULUCF historical timeseries (from load_world_co2_lulucf) in MtCO2/yr. Single-row DataFrame with string year columns. | required |
| bunker_ts | DataFrame | Bunker fuel CO2 timeseries (from load_bunker_timeseries) in MtCO2/yr. Single-row DataFrame with string year columns. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | Single-row DataFrame with same index structure as fossil_ts but emission-category label set to "co2", containing per-year NGHGI-consistent total CO2 = fossil - bunkers + LULUCF. Years outside NGHGI LULUCF coverage will be NaN. |
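The "NaN outside coverage, no fallback" behaviour follows from label-aligned arithmetic in pandas. The sketch below uses plain Series with string year labels rather than the MultiIndexed DataFrames the function takes, and the values are illustrative only.

```python
import pandas as pd

# Toy single-row timeseries (illustrative values, MtCO2/yr), string year labels
fossil = pd.Series({"1999": 100.0, "2000": 102.0, "2001": 104.0})
bunkers = pd.Series({"1999": 3.0, "2000": 3.0, "2001": 4.0})
lulucf = pd.Series({"2000": -10.0, "2001": -12.0})  # NGHGI coverage starts in 2000

# fossil - bunkers + LULUCF, aligned by year label; a year missing from any
# input comes out as NaN, so there is no fallback to another LULUCF source
total_co2 = fossil - bunkers + lulucf
```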
compute_cumulative_emissions
fair_shares.library.utils.data.nghgi.compute_cumulative_emissions
```python
compute_cumulative_emissions(
    timeseries: DataFrame, start_year: int, end_year: int
) -> float
```
Integrate a single-row timeseries DataFrame over a year range.
Sums values for all years from start_year to end_year (inclusive). Missing years are skipped (not interpolated) since gap-filling is the caller's responsibility.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| timeseries | DataFrame | Single-row DataFrame with string year columns (as produced by the load_* functions in this module) | required |
| start_year | int | First year to include (inclusive) | required |
| end_year | int | Last year to include (inclusive) | required |
Returns:
| Type | Description |
|---|---|
| float | Cumulative sum over the requested year range |
Raises:
| Type | Description |
|---|---|
| DataProcessingError | If no year columns fall within the requested range |
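A minimal sketch of the inclusive, gap-skipping sum described above. The helper is hypothetical and raises ValueError where the library raises DataProcessingError.

```python
import pandas as pd


def cumulative_sketch(timeseries: pd.DataFrame, start_year: int, end_year: int) -> float:
    # Select string year columns inside the inclusive window; absent years are
    # simply not selected (no interpolation)
    cols = [c for c in timeseries.columns if start_year <= int(c) <= end_year]
    if not cols:
        raise ValueError(f"No year columns between {start_year} and {end_year}")
    # Series.sum() also skips NaN values by default
    return float(timeseries.iloc[0][cols].sum())


ts = pd.DataFrame(
    {"2020": [1.0], "2021": [2.0], "2023": [4.0]},
    index=pd.Index(["toy"], name="source"),
)
total = cumulative_sketch(ts, 2020, 2023)  # 2022 is absent and skipped
```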
RCB Processing
Functions for parsing RCB scenarios and converting to allocation-ready budgets.
parse_rcb_scenario
fair_shares.library.utils.data.rcb.parse_rcb_scenario
Parse RCB scenario string into climate assessment and quantile.
RCB scenario strings follow the format "TEMPpPROB" where TEMP is the temperature target (e.g., "1.5" or "2") and PROB is the probability as a percentage (e.g., "50" or "66").
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| scenario_string | str | RCB scenario string (e.g., "1.5p50", "2p66") | required |
Returns:
| Type | Description |
|---|---|
| tuple[str, str] | A tuple of (climate_assessment, quantile) as strings: climate_assessment is the temperature target with a "C" suffix (e.g., "1.5C"); quantile is the probability as a decimal string (e.g., "0.5"). |
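The "TEMPpPROB" format can be parsed by splitting on the first "p". The sketch below is a hypothetical re-implementation that matches the documented examples. The real function's validation and error type are not documented here, so rejecting malformed input with ValueError is an assumption.

```python
def parse_rcb_scenario_sketch(scenario_string: str) -> tuple[str, str]:
    # "1.5p50" -> temp="1.5", prob="50"
    temp, sep, prob = scenario_string.partition("p")
    if not sep or not temp or not prob.isdigit():
        raise ValueError(f"Unrecognised RCB scenario string: {scenario_string!r}")
    # Percentage -> decimal string, e.g. "66" -> "0.66"
    return f"{temp}C", str(int(prob) / 100)


parsed = parse_rcb_scenario_sketch("1.5p50")
```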
calculate_budget_from_rcb
fair_shares.library.utils.data.rcb.calculate_budget_from_rcb
```python
calculate_budget_from_rcb(
    rcb_value: float,
    allocation_year: int,
    world_scenario_emissions_ts: TimeseriesDataFrame,
    verbose: bool = True,
) -> float
```
Calculate total budget to allocate based on RCB value and allocation year.
RCB (Remaining Carbon Budget) values represent the remaining budget FROM 2020 onwards. The total budget to allocate depends on the allocation year:
- If allocation_year < 2020: Add historical emissions (allocation_year to 2019)
- If allocation_year == 2020: Use RCB directly
- If allocation_year > 2020: Subtract emissions already used (2020 to allocation_year-1)
This ensures that the budget allocation is consistent regardless of which year is chosen as the allocation starting point.
All values are in Mt * CO2. RCB values are converted from Gt to Mt during preprocessing to match the units used in world_scenario_emissions_ts.
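The three allocation-year cases can be written out directly. This sketch replaces world_scenario_emissions_ts with a plain {year: Mt CO2} dict for brevity. That substitution and the helper name are assumptions, and the values are toy numbers.

```python
def total_budget_sketch(
    rcb_value: float, allocation_year: int, world_emissions: dict[int, float]
) -> float:
    if allocation_year < 2020:
        # Add historical emissions from allocation_year to 2019
        return rcb_value + sum(world_emissions[y] for y in range(allocation_year, 2020))
    if allocation_year == 2020:
        return rcb_value
    # Subtract emissions already used from 2020 to allocation_year - 1
    return rcb_value - sum(world_emissions[y] for y in range(2020, allocation_year))


world = {2018: 5.0, 2019: 6.0, 2020: 7.0, 2021: 8.0}  # toy Mt CO2 values
budget_2018 = total_budget_sketch(100.0, 2018, world)  # 100 + 5 + 6
budget_2022 = total_budget_sketch(100.0, 2022, world)  # 100 - 7 - 8
```

The consistency claim can be checked on these numbers. Starting from 2018 gives 111, and subtracting 2018 and 2019 emissions brings it back to the 2020 RCB of 100. Subtracting 2020 and 2021 emissions from that gives the 2022 figure of 85.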
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rcb_value | float | Remaining Carbon Budget value in Mt CO2 (from 2020 onwards) | required |
| allocation_year | int | Year when budget allocation should start | required |
| world_scenario_emissions_ts | TimeseriesDataFrame | World scenario emissions timeseries data with year columns (in Mt CO2) | required |
| verbose | bool | Whether to print detailed calculation information (default: True) | True |
Returns:
| Type | Description |
|---|---|
| float | Total budget to allocate in Mt CO2 |
process_rcb_to_2020_baseline
fair_shares.library.utils.data.rcb.process_rcb_to_2020_baseline
```python
process_rcb_to_2020_baseline(
    rcb_value: float,
    rcb_unit: str,
    rcb_baseline_year: int,
    emission_category: str,
    world_co2_ffi_emissions: DataFrame,
    actual_bm_lulucf_emissions: DataFrame | None = None,
    bunkers_deduction_mt: float = 0.0,
    lulucf_deduction_mt: float = 0.0,
    target_baseline_year: int = 2020,
    source_name: str = "",
    scenario: str = "",
    verbose: bool = True,
) -> dict[str, float | str | int]
```
Process RCB from its original baseline year to 2020 baseline with adjustments.
This function converts RCB values from any baseline year (>= 2020) to a standardized 2020 baseline. It also applies adjustments for international bunkers and LULUCF following Weber et al. (2026).
The rebase always uses actual observational data (PRIMAP), never scenario projections. What enters the rebase depends on the emission category:
- co2-ffi: Rebase uses fossil CO2 only. LULUCF is omitted because it cancels algebraically with the LULUCF decomposition term. lulucf_deduction = -L_BM(base,NZ) (positive, increases fossil budget).
- co2: Rebase uses fossil CO2 + actual bookkeeping-model LULUCF. lulucf_deduction = convention_gap(base,NZ) (negative, reduces budget per Weber).
The calculation follows these steps:
1. Convert the RCB from the source unit to Mt * CO2e.
2. If baseline_year > 2020, add actual emissions from 2020 to (baseline_year - 1): fossil only for co2-ffi, fossil + BM LULUCF for co2.
3. Subtract the bunkers deduction (always reduces the budget).
4. Apply the LULUCF deduction (sign-ready from the caller).

Sign convention for the deduction parameters:
- bunkers_deduction_mt: always positive (cumulative emissions), subtracted.
- lulucf_deduction_mt: sign-ready from the caller (added directly to the budget):
  - For co2: convention gap, negative, so it reduces the budget (per Weber).
  - For co2-ffi: negated BM LULUCF, positive, so it increases the fossil budget.
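The four steps and the sign convention can be combined in one small function. This is a simplified sketch, not the library's code, and it rests on several assumptions:

- Annual emissions are passed as {year: Mt} dicts instead of DataFrames.
- Unit conversion uses a hard-coded factor table, where the library presumably uses its Pint registry.
- The returned keys mirror only a subset of the documented return dictionary.

```python
from __future__ import annotations

# Assumed unit factors (simplification; not the library's unit handling)
_TO_MT = {"Gt * CO2": 1000.0, "Mt * CO2": 1.0}


def rebase_rcb_sketch(
    rcb_value: float,
    rcb_unit: str,
    rcb_baseline_year: int,
    emission_category: str,
    fossil_mt: dict[int, float],
    bm_lulucf_mt: dict[int, float] | None = None,
    bunkers_deduction_mt: float = 0.0,
    lulucf_deduction_mt: float = 0.0,
    target_baseline_year: int = 2020,
) -> dict[str, float]:
    if rcb_baseline_year < target_baseline_year:
        raise ValueError("RCB baseline year must be >= target baseline year")
    rcb_mt = rcb_value * _TO_MT[rcb_unit]  # step 1: convert to Mt
    years = range(target_baseline_year, rcb_baseline_year)  # empty if equal
    rebase_fossil = sum(fossil_mt[y] for y in years)  # step 2: fossil part
    rebase_lulucf = 0.0
    if emission_category == "co2":  # BM LULUCF enters the rebase only for co2
        if bm_lulucf_mt is None:
            raise ValueError("co2 rebase needs actual BM LULUCF emissions")
        rebase_lulucf = sum(bm_lulucf_mt[y] for y in years)
    rebase_total = rebase_fossil + rebase_lulucf
    deduction_bunkers = -bunkers_deduction_mt  # step 3: always reduces budget
    deduction_lulucf = lulucf_deduction_mt  # step 4: sign-ready, added as-is
    net_adjustment = rebase_total + deduction_bunkers + deduction_lulucf
    return {
        "rcb_2020_mt": rcb_mt + net_adjustment,
        "rebase_total_mt": rebase_total,
        "rebase_fossil_mt": rebase_fossil,
        "rebase_lulucf_mt": rebase_lulucf,
        "deduction_bunkers_mt": deduction_bunkers,
        "deduction_lulucf_mt": deduction_lulucf,
        "net_adjustment_mt": net_adjustment,
    }


# Toy inputs (illustrative values only), RCB quoted from a 2023 baseline
fossil = {2020: 35.0, 2021: 37.0, 2022: 38.0}
bm_lulucf = {2020: 4.0, 2021: 4.0, 2022: 4.0}
ffi = rebase_rcb_sketch(0.5, "Gt * CO2", 2023, "co2-ffi", fossil,
                        bunkers_deduction_mt=10.0, lulucf_deduction_mt=5.0)
co2 = rebase_rcb_sketch(0.5, "Gt * CO2", 2023, "co2", fossil, bm_lulucf,
                        bunkers_deduction_mt=10.0, lulucf_deduction_mt=-20.0)
```

In the co2-ffi case the positive LULUCF term enlarges the fossil budget (500 + 110 - 10 + 5). In the co2 case the rebase also includes BM LULUCF, and the negative convention gap shrinks the budget (500 + 122 - 10 - 20).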
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rcb_value | float | Original RCB value from the source | required |
| rcb_unit | str | Unit of the RCB value (e.g., "Gt * CO2", "Mt * CO2") | required |
| rcb_baseline_year | int | The year from which the RCB is calculated (must be >= 2020) | required |
| emission_category | str | Emission category: "co2-ffi" or "co2". Controls whether BM LULUCF is included in the rebase. | required |
| world_co2_ffi_emissions | DataFrame | World-level CO2-FFI emissions timeseries with year columns (in Mt * CO2e) | required |
| actual_bm_lulucf_emissions | DataFrame or None | Actual bookkeeping-model LULUCF CO2 emissions from PRIMAP, with year columns (in Mt * CO2e). Used ONLY for the co2 rebase (default: None). | None |
| bunkers_deduction_mt | float | Total bunker CO2 emissions from 2020-2100 in Mt * CO2e (default: 0.0). Always positive; subtracted from budget. | 0.0 |
| lulucf_deduction_mt | float | LULUCF adjustment in Mt * CO2e, sign-ready (default: 0.0). Added directly to the budget -- caller is responsible for correct sign. | 0.0 |
| target_baseline_year | int | Target baseline year for standardization (default: 2020) | 2020 |
| source_name | str | Name of the RCB source for logging (default: "") | '' |
| scenario | str | Scenario name for logging (default: "") | '' |
| verbose | bool | Whether to print detailed calculation information (default: True) | True |
Returns:
| Type | Description |
|---|---|
| dict | Dictionary containing the keys listed below. |

- 'rcb_2020_mt': RCB adjusted to 2020 baseline in Mt * CO2e
- 'rcb_original_value': Original RCB value (in source units)
- 'rcb_original_unit': Original RCB unit
- 'baseline_year': Original baseline year
- 'rebase_total_mt': Emissions added to rebase from source year to 2020 (positive, Mt * CO2e); fossil only for co2-ffi, fossil + actual BM LULUCF for co2
- 'rebase_fossil_mt': Fossil-only component of rebase (Mt * CO2e)
- 'rebase_lulucf_mt': Actual BM LULUCF component of rebase (Mt * CO2e); only non-zero for co2
- 'deduction_bunkers_mt': Bunker fuel deduction (negative, Mt * CO2e)
- 'deduction_lulucf_mt': LULUCF deduction (Mt * CO2e; sign depends on emission category)
- 'net_adjustment_mt': Total change from original to 2020 baseline (rebase + deductions, Mt * CO2e)
See Also
- Core Utilities: General data manipulation functions
- Math Utilities: Convergence solver and adjustments
- NGHGI Corrections (Science): Scientific methodology