pandas_openscm.index_manipulation#

Manipulation of the index of data

Functions:

Name	Description
`convert_index_to_category_index`	Convert the index's values to categories
`create_level_from_collection`	Create new level and corresponding codes.
`create_new_level_and_codes_by_mapping`	Create a new level and associated codes by mapping an existing level
`create_new_level_and_codes_by_mapping_multiple`	Create a new level and associated codes by mapping existing levels
`ensure_index_is_multiindex`	Ensure that the index of a pandas object is a pd.MultiIndex
`ensure_is_multiindex`	Ensure that an index is a pd.MultiIndex
`set_index_levels_func`	Set the index levels of a pd.DataFrame
`set_levels`	Set the levels of a MultiIndex to the provided values
`unify_index_levels`	Unify the levels on two indexes
`unify_index_levels_check_index_types`	Unify the levels on two indexes
`update_index_from_candidates`	Update the index of data to align with the candidate columns as much as possible
`update_index_levels_from_other_func`	Update the index levels based on other levels of a pandas object
`update_index_levels_func`	Update the index levels of a pandas object
`update_levels`	Update the levels of a pd.MultiIndex
`update_levels_from_other`	Update levels based on other levels in a pd.MultiIndex

convert_index_to_category_index #

convert_index_to_category_index(pandas_obj: P) -> P

Convert the index's values to categories

This can save a lot of memory and improve the speed of processing. However, it comes with some pitfalls. For a discussion of some of them, see this article: https://towardsdatascience.com/staying-sane-while-adopting-pandas-categorical-datatypes-78dbd19dcd8a/

Parameters:

Name	Type	Description	Default
`pandas_obj`	`P`	Object whose index we want to change to categorical.	required

Returns:

Type	Description
`P`	A new object with the same data as `pandas_obj` but a category type index.
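As a rough illustration of the conversion, the sketch below applies the same pandas calls that appear in the source listing to a small, made-up DataFrame. The data and the variable names (`df`, `category_index`, `res`) are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical example data with a two-level index
df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0, 4.0]},
    index=pd.MultiIndex.from_tuples(
        [("sa", "ma"), ("sa", "mb"), ("sb", "ma"), ("sb", "mb")],
        names=["scenario", "model"],
    ),
)

# Go via a frame so that every index level can be cast to category,
# then rebuild the MultiIndex from the categorical columns
category_index = pd.MultiIndex.from_frame(
    df.index.to_frame(index=False).astype("category")
)
res = pd.DataFrame(df.values, index=category_index, columns=df.columns)

level_dtypes = [str(level.dtype) for level in res.index.levels]
print(level_dtypes)  # ['category', 'category']
```

The data is unchanged; only the dtype of the index levels differs.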

Source code in src/pandas_openscm/index_manipulation.py

def convert_index_to_category_index(pandas_obj: P) -> P:
    """
    Convert the index's values to categories

    This can save a lot of memory and improve the speed of processing.
    However, it comes with some pitfalls.
    For a nice discussion of some of them,
    see [this article](https://towardsdatascience.com/staying-sane-while-adopting-pandas-categorical-datatypes-78dbd19dcd8a/).

    Parameters
    ----------
    pandas_obj
        Object whose index we want to change to categorical.

    Returns
    -------
    :
        A new object with the same data as `pandas_obj`
        but a category type index.
    """
    new_index = pd.MultiIndex.from_frame(
        pandas_obj.index.to_frame(index=False).astype("category")
    )

    if hasattr(pandas_obj, "columns"):
        return type(pandas_obj)(  # type: ignore # confusing mypy here
            pandas_obj.values,
            index=new_index,
            columns=pandas_obj.columns,
        )

    return type(pandas_obj)(
        pandas_obj.values,
        index=new_index,
    )

create_level_from_collection #

create_level_from_collection(
    level: str, value: Collection[Any]
) -> tuple[Index[Any], NDArray[integer[Any]]]

Create new level and corresponding codes.

Parameters:

Name	Type	Description	Default
`level`	`str`	Name of the level to create	required
`value`	`Collection[Any]`	Values to use to create the level	required

Returns:

Type	Description
`tuple[Index[Any], NDArray[integer[Any]]]`	New level and corresponding codes
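To make the relationship between a level and its codes concrete, here is a minimal sketch of the duplicate-handling route using plain pandas. The input `values` and the level name are made up for illustration.

```python
import pandas as pd

# Hypothetical values containing a duplicate ("x" appears twice)
values = ["x", "y", "x", "z"]

# The level holds each distinct value once, in order of first appearance
level = pd.Index(values, name="new_level").unique()

# Each code is the position in `level` of the corresponding input value
codes = level.get_indexer(values)

print(list(level))     # ['x', 'y', 'z']
print(codes.tolist())  # [0, 1, 0, 2]
```

Taking `level` at `codes` recovers the original values, which is the invariant a pd.MultiIndex relies on.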

Source code in src/pandas_openscm/index_manipulation.py

def create_level_from_collection(
    level: str, value: Collection[Any]
) -> tuple[pd.Index[Any], npt.NDArray[np.integer[Any]]]:
    """
    Create new level and corresponding codes.

    Parameters
    ----------
    level
        Name of the level to create

    value
        Values to use to create the level

    Returns
    -------
    :
        New level and corresponding codes
    """
    new_level: pd.Index[Any] = pd.Index(value, name=level)
    if not new_level.has_duplicates:
        # Fast route, can just return new level and codes from level we mapped from
        return new_level, np.arange(len(value))

    # Slow route, have to update the codes
    new_level = new_level.unique()
    new_codes = new_level.get_indexer(value)  # type: ignore[arg-type] # ty: ignore[invalid-argument-type] # pandas-stubs confused

    return new_level, new_codes

create_new_level_and_codes_by_mapping #

create_new_level_and_codes_by_mapping(
    ini: MultiIndex,
    level_to_create_from: str,
    mapper: Callable[[Any], Any]
    | dict[Any, Any]
    | Series[Any],
) -> tuple[Index[Any], NDArray[integer[Any]]]

Create a new level and associated codes by mapping an existing level

This is a thin function intended for internal use to handle some slightly tricky logic.

Parameters:

Name	Type	Description	Default
`ini`	`MultiIndex`	Input index	required
`level_to_create_from`	`str`	Level to create the new level from	required
`mapper`	`Callable[[Any], Any] \| dict[Any, Any] \| Series[Any]`	Function to use to map existing levels to new levels	required

Returns:

Name	Type	Description
`new_level`	`Index[Any]`	New level
`new_codes`	`NDArray[integer[Any]]`	New codes
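The many-to-one case is the one that needs care. The sketch below follows the steps of the source listing with plain pandas on a made-up index; `idx` and `mapper` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical index and a many:1 mapper (two scenarios map to the same group)
idx = pd.MultiIndex.from_tuples(
    [("sa", "v1"), ("sb", "v2"), ("sc", "v1")],
    names=["scenario", "variable"],
)
mapper = {"sa": "group_1", "sb": "group_1", "sc": "group_2"}

# Mapping the level's unique values alone produces duplicates here,
# so the existing codes cannot be reused as they are
mapped_level = idx.levels[0].map(mapper)

# Step 1: keep only the unique mapped values as the new level
new_level = mapped_level.unique()
# Step 2: map the full set of level values
mapped_values = idx.get_level_values("scenario").map(mapper)
# Step 3: look up each mapped value's position in the new level
new_codes = new_level.get_indexer(mapped_values)

print(list(new_level))     # ['group_1', 'group_2']
print(new_codes.tolist())  # [0, 0, 1]
```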

Source code in src/pandas_openscm/index_manipulation.py

def create_new_level_and_codes_by_mapping(
    ini: pd.MultiIndex,
    level_to_create_from: str,
    mapper: Callable[[Any], Any] | dict[Any, Any] | pd.Series[Any],
) -> tuple[pd.Index[Any], npt.NDArray[np.integer[Any]]]:
    """
    Create a new level and associated codes by mapping an existing level

    This is a thin function intended for internal use
    to handle some slightly tricky logic.

    Parameters
    ----------
    ini
        Input index

    level_to_create_from
        Level to create the new level from

    mapper
        Function to use to map existing levels to new levels

    Returns
    -------
    new_level :
        New level

    new_codes :
        New codes
    """
    # There might be a faster way to do this if you work on the codes directly
    # and only use the unique level values.
    # However, it might still be slower than using pandas' compiled C stuff.
    level_to_map_from_idx = ini.names.index(level_to_create_from)
    new_level = ini.levels[level_to_map_from_idx].map(mapper)  # type: ignore[arg-type] # ty: ignore[invalid-argument-type] # pandas-stubs confused
    if not new_level.has_duplicates:
        # Fast route,
        # can just return new level and codes based on the simple mapping alone
        return new_level, ini.codes[level_to_map_from_idx]

    # Slow route: have to update the codes
    # because the mapping isn't 1:1
    # (it is many:1).
    #
    # Step 1: use the result from above
    # to get the new level we actually want i.e. a level that only has unique entries
    new_level = new_level.unique()

    # Step 2: get the new i.e. mapped values.
    # This seems to be the easiest (maybe fastest too?) way to do the final step.
    mapped_values = ini.get_level_values(level_to_create_from).map(mapper)  # type: ignore[arg-type] # ty: ignore[invalid-argument-type] # pandas-stubs confused

    # Step 3: use pandas' inbuilt functionality to get the new codes
    # by getting the indexer we need based on the new level
    # and the mapped values from the above.
    # There might be a faster way to do this,
    # but this is the simplest and given it uses pandas' internals,
    # it's probably already quite fast.
    new_codes = new_level.get_indexer(mapped_values)

    return new_level, new_codes

create_new_level_and_codes_by_mapping_multiple #

create_new_level_and_codes_by_mapping_multiple(
    ini: MultiIndex,
    levels_to_create_from: tuple[str, ...],
    mapper: Callable[[Any], Any]
    | dict[Any, Any]
    | Series[Any],
) -> tuple[Index[Any], NDArray[integer[Any]]]

Create a new level and associated codes by mapping existing levels

This is a thin function intended for internal use to handle some slightly tricky logic.

Parameters:

Name	Type	Description	Default
`ini`	`MultiIndex`	Input index	required
`levels_to_create_from`	`tuple[str, ...]`	Levels to create the new level from	required
`mapper`	`Callable[[Any], Any] \| dict[Any, Any] \| Series[Any]`	Function to use to map existing levels to new levels	required

Returns:

Name	Type	Description
`new_level`	`Index[Any]`	New level
`new_codes`	`NDArray[integer[Any]]`	New codes
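When several levels are used as the source, the mapper receives one tuple per row. A minimal sketch with plain pandas, assuming a made-up index and a made-up joining function:

```python
import pandas as pd

# Hypothetical index; the first two rows share the same (scenario, model) pair
idx = pd.MultiIndex.from_tuples(
    [("sa", "ma", "v1"), ("sa", "ma", "v2"), ("sb", "ma", "v1")],
    names=["scenario", "model", "variable"],
)

# Drop the levels we are not mapping from, then map each remaining tuple.
# The tuple elements arrive in the order the levels have in the index.
combined = idx.droplevel(["variable"]).map(lambda x: "-".join(x))

# Brute force: unique values become the level, positions become the codes
new_level = combined.unique()
new_codes = new_level.get_indexer(combined)

print(list(new_level))     # ['sa-ma', 'sb-ma']
print(new_codes.tolist())  # [0, 0, 1]
```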

Source code in src/pandas_openscm/index_manipulation.py

def create_new_level_and_codes_by_mapping_multiple(
    ini: pd.MultiIndex,
    levels_to_create_from: tuple[str, ...],
    mapper: Callable[[Any], Any] | dict[Any, Any] | pd.Series[Any],
) -> tuple[pd.Index[Any], npt.NDArray[np.integer[Any]]]:
    """
    Create a new level and associated codes by mapping existing levels

    This is a thin function intended for internal use
    to handle some slightly tricky logic.

    Parameters
    ----------
    ini
        Input index

    levels_to_create_from
        Levels to create the new level from

    mapper
        Function to use to map existing levels to new levels

    Returns
    -------
    new_level :
        New level

    new_codes :
        New codes
    """
    # You could probably do some optimisation here
    # that checks for unique combinations of codes
    # for the levels we're using,
    # then only applies the mapping to those unique combos
    # to reduce the number of evaluations of mapper.
    # That feels tricky to get right, so just doing the brute force way for now.
    levels_to_drop = [v for v in ini.names if v not in levels_to_create_from]
    dup_level = ini.droplevel(levels_to_drop).map(mapper)  # type: ignore[arg-type] # ty: ignore[invalid-argument-type] # pandas-stubs confused

    # Brute force: get codes from new levels
    new_level = dup_level.unique()
    new_codes = new_level.get_indexer(dup_level)

    return new_level, new_codes

ensure_index_is_multiindex #

ensure_index_is_multiindex(
    pandas_obj: P, copy: bool = True
) -> P

Ensure that the index of a pandas object is a pd.MultiIndex

Parameters:

Name	Type	Description	Default
`pandas_obj`	`P`	Object whose index we want to ensure is a pd.MultiIndex	required
`copy`	`bool`	Should we copy `pandas_obj` before modifying the index?	`True`

Returns:

Type	Description
`P`	`pandas_obj` with a pd.MultiIndex. If the index was already a pd.MultiIndex, this is a no-op (although the value of `copy` is respected).

Source code in src/pandas_openscm/index_manipulation.py

def ensure_index_is_multiindex(pandas_obj: P, copy: bool = True) -> P:
    """
    Ensure that the index of a pandas object is a [pd.MultiIndex][pandas.MultiIndex]

    Parameters
    ----------
    pandas_obj
        Object whose index we want to ensure is a [pd.MultiIndex][pandas.MultiIndex]

    copy
        Should we copy `pandas_obj` before modifying the index?

    Returns
    -------
    :
        `pandas_obj` with a [pd.MultiIndex][pandas.MultiIndex]

        If the index was already a [pd.MultiIndex][pandas.MultiIndex],
        this is a no-op (although the value of copy is respected).
    """
    if copy:
        pandas_obj = pandas_obj.copy()  # ty: ignore

    if isinstance(pandas_obj.index, pd.MultiIndex):
        return pandas_obj

    pandas_obj.index = ensure_is_multiindex(pandas_obj.index)

    return pandas_obj

ensure_is_multiindex #

ensure_is_multiindex(
    index: Index[Any] | MultiIndex,
) -> MultiIndex

Ensure that an index is a pd.MultiIndex

Parameters:

Name	Type	Description	Default
`index`	`Index[Any] \| MultiIndex`	Index to check	required

Returns:

Type	Description
`MultiIndex`	Index, cast to pd.MultiIndex if needed
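For a plain index, the cast amounts to wrapping the values in a single-level pd.MultiIndex. The sketch below shows this with the same pandas call as the source listing; the example index is made up.

```python
import pandas as pd

# Hypothetical single-level index
index = pd.Index(["a", "b", "c"], name="variable")

# Wrap the values in a MultiIndex with one level, keeping the name
multi = pd.MultiIndex.from_arrays([index.values], names=[index.name])

print(multi.nlevels)      # 1
print(list(multi.names))  # ['variable']
```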

Source code in src/pandas_openscm/index_manipulation.py

def ensure_is_multiindex(index: pd.Index[Any] | pd.MultiIndex) -> pd.MultiIndex:
    """
    Ensure that an index is a [pd.MultiIndex][pandas.MultiIndex]

    Parameters
    ----------
    index
        Index to check

    Returns
    -------
    :
        Index, cast to [pd.MultiIndex][pandas.MultiIndex] if needed
    """
    if isinstance(index, pd.MultiIndex):
        return index

    return pd.MultiIndex.from_arrays([index.values], names=[index.name])

set_index_levels_func #

set_index_levels_func(
    pobj: P,
    levels_to_set: dict[str, Any | Collection[Any]],
    copy: bool = True,
) -> P

Set the index levels of a pd.DataFrame

Parameters:

Name	Type	Description	Default
`pobj`	`P`	Supported pandas object to update	required
`levels_to_set`	`dict[str, Any \| Collection[Any]]`	Mapping of level names to values to set	required
`copy`	`bool`	Should `pobj` be copied before returning?	`True`

Returns:

Type	Description
`P`	`pobj` with updates applied to its index
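As a sketch of what setting a level to a single value amounts to, the example below builds the new index directly with pandas and NumPy, following the scalar branch of `set_levels`: a level with one entry and codes that are all zero. The DataFrame and the level name `unit` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data
df = pd.DataFrame(
    {"value": [1.0, 2.0]},
    index=pd.MultiIndex.from_tuples(
        [("sa", "v1"), ("sb", "v2")], names=["scenario", "variable"]
    ),
)

# Work on a copy so the input is left untouched (the copy=True behaviour)
res = df.copy()
res.index = pd.MultiIndex(
    # One-entry level holding the constant value
    levels=[*df.index.levels, pd.Index(["kg"], name="unit")],
    # Every row points at position 0 of the new level
    codes=[*df.index.codes, np.zeros(df.shape[0], dtype=int)],
    names=[*df.index.names, "unit"],
)

print(res.index.tolist())  # [('sa', 'v1', 'kg'), ('sb', 'v2', 'kg')]
```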

Source code in src/pandas_openscm/index_manipulation.py

def set_index_levels_func(
    pobj: P,
    levels_to_set: dict[str, Any | Collection[Any]],
    copy: bool = True,
) -> P:
    """
    Set the index levels of a [pd.DataFrame][pandas.DataFrame]

    Parameters
    ----------
    pobj
        Supported [pandas][] object to update

    levels_to_set
        Mapping of level names to values to set

    copy
        Should `pobj` be copied before returning?

    Returns
    -------
    :
        `pobj` with updates applied to its index
    """
    if not isinstance(pobj.index, pd.MultiIndex):
        msg = (
            "This function is only intended to be used "
            "when `pobj`'s index is an instance of `MultiIndex`. "
            f"Received {type(pobj.index)=}"
        )
        raise TypeError(msg)

    if copy:
        pobj = pobj.copy()  # ty: ignore[invalid-argument-type, invalid-assignment]

    pobj.index = set_levels(pobj.index, levels_to_set=levels_to_set)  # type: ignore[arg-type] # pandas-stubs confused

    return pobj

set_levels #

set_levels(
    ini: MultiIndex,
    levels_to_set: dict[str, Any | Collection[Any]],
) -> MultiIndex

Set the levels of a MultiIndex to the provided values

Parameters:

Name	Type	Description	Default
`ini`	`MultiIndex`	Input MultiIndex	required
`levels_to_set`	`dict[str, Any \| Collection[Any]]`	Mapping of level names to values to set. If a value is a `Collection` (other than a string), it must have the same length as the MultiIndex. Otherwise, the same value is used for every entry in the level.	required

Returns:

Type	Description
`MultiIndex`	New MultiIndex with the levels set to the provided values

Raises:

Type	Description
`TypeError`	If `ini` is not a MultiIndex
`ValueError`	If a value is a collection whose length does not equal the length of the index

Examples:

>>> start = pd.MultiIndex.from_tuples(
...     [
...         ("sa", "ma", "v1", "kg"),
...         ("sb", "ma", "v2", "m"),
...         ("sa", "mb", "v1", "kg"),
...         ("sa", "mb", "v2", "m"),
...     ],
...     names=["scenario", "model", "variable", "unit"],
... )
>>> start
MultiIndex([('sa', 'ma', 'v1', 'kg'),
            ('sb', 'ma', 'v2',  'm'),
            ('sa', 'mb', 'v1', 'kg'),
            ('sa', 'mb', 'v2',  'm')],
           names=['scenario', 'model', 'variable', 'unit'])
>>>
>>> # Set a new level with a single string
>>> set_levels(
...     start,
...     {"new_variable": "xyz"},
... )
MultiIndex([('sa', 'ma', 'v1', 'kg', 'xyz'),
            ('sb', 'ma', 'v2',  'm', 'xyz'),
            ('sa', 'mb', 'v1', 'kg', 'xyz'),
            ('sa', 'mb', 'v2',  'm', 'xyz')],
           names=['scenario', 'model', 'variable', 'unit', 'new_variable'])
>>>
>>> # Set a new level with a collection
>>> set_levels(
...     start,
...     {"new_variable": [1, 2, 3, 4]},
... )
MultiIndex([('sa', 'ma', 'v1', 'kg', 1),
            ('sb', 'ma', 'v2',  'm', 2),
            ('sa', 'mb', 'v1', 'kg', 3),
            ('sa', 'mb', 'v2',  'm', 4)],
           names=['scenario', 'model', 'variable', 'unit', 'new_variable'])
>>>
>>> # Replace a level with a single value and add a new level
>>> set_levels(
...     start,
...     {"model": "new_model", "new_variable": ["xyz", "xyz", "x", "y"]},
... )
MultiIndex([('sa', 'new_model', 'v1', 'kg', 'xyz'),
            ('sb', 'new_model', 'v2',  'm', 'xyz'),
            ('sa', 'new_model', 'v1', 'kg',   'x'),
            ('sa', 'new_model', 'v2',  'm',   'y')],
           names=['scenario', 'model', 'variable', 'unit', 'new_variable'])

Source code in src/pandas_openscm/index_manipulation.py

def set_levels(
    ini: pd.MultiIndex, levels_to_set: dict[str, Any | Collection[Any]]
) -> pd.MultiIndex:
    """
    Set the levels of a MultiIndex to the provided values

    Parameters
    ----------
    ini
        Input MultiIndex

    levels_to_set
        Mapping of level names to values to set. If a value is a `Collection`
        (other than a string), it must have the same length as the MultiIndex.
        Otherwise, the same value is used for every entry in the level.

    Returns
    -------
    :
        New MultiIndex with the levels set to the provided values

    Raises
    ------
    TypeError
        If `ini` is not a MultiIndex
    ValueError
        If a value is a collection whose length does not equal the
        length of the index

    Examples
    --------
    >>> start = pd.MultiIndex.from_tuples(
    ...     [
    ...         ("sa", "ma", "v1", "kg"),
    ...         ("sb", "ma", "v2", "m"),
    ...         ("sa", "mb", "v1", "kg"),
    ...         ("sa", "mb", "v2", "m"),
    ...     ],
    ...     names=["scenario", "model", "variable", "unit"],
    ... )
    >>> start
    MultiIndex([('sa', 'ma', 'v1', 'kg'),
                ('sb', 'ma', 'v2',  'm'),
                ('sa', 'mb', 'v1', 'kg'),
                ('sa', 'mb', 'v2',  'm')],
               names=['scenario', 'model', 'variable', 'unit'])
    >>>
    >>> # Set a new level with a single string
    >>> set_levels(
    ...     start,
    ...     {"new_variable": "xyz"},
    ... )
    MultiIndex([('sa', 'ma', 'v1', 'kg', 'xyz'),
                ('sb', 'ma', 'v2',  'm', 'xyz'),
                ('sa', 'mb', 'v1', 'kg', 'xyz'),
                ('sa', 'mb', 'v2',  'm', 'xyz')],
               names=['scenario', 'model', 'variable', 'unit', 'new_variable'])
    >>>
    >>> # Set a new level with a collection
    >>> set_levels(
    ...     start,
    ...     {"new_variable": [1, 2, 3, 4]},
    ... )
    MultiIndex([('sa', 'ma', 'v1', 'kg', 1),
                ('sb', 'ma', 'v2',  'm', 2),
                ('sa', 'mb', 'v1', 'kg', 3),
                ('sa', 'mb', 'v2',  'm', 4)],
               names=['scenario', 'model', 'variable', 'unit', 'new_variable'])
    >>>
    >>> # Replace a level with a single value and add a new level
    >>> set_levels(
    ...     start,
    ...     {"model": "new_model", "new_variable": ["xyz", "xyz", "x", "y"]},
    ... )
    MultiIndex([('sa', 'new_model', 'v1', 'kg', 'xyz'),
                ('sb', 'new_model', 'v2',  'm', 'xyz'),
                ('sa', 'new_model', 'v1', 'kg',   'x'),
                ('sa', 'new_model', 'v2',  'm',   'y')],
               names=['scenario', 'model', 'variable', 'unit', 'new_variable'])
    """
    levels: list[pd.Index[Any]] = list(ini.levels)
    codes: list[npt.NDArray[np.integer[Any]]] = list(ini.codes)
    names: list[str] = list(ini.names)  # type: ignore[arg-type] # ty: ignore[invalid-assignment] # pandas-stubs confused

    for level, value in levels_to_set.items():
        if isinstance(value, Collection) and not isinstance(value, str):
            if len(value) != len(ini):
                msg = (
                    f"Length of values for level '{level}' does not "
                    f"match index length: {len(value)} != {len(ini)}"
                )
                raise ValueError(msg)
            new_level, new_codes = create_level_from_collection(level, value)
        else:
            new_level = pd.Index([value], name=level)
            new_codes = np.zeros(ini.shape[0], dtype=int)

        if level in ini.names:
            level_idx = ini.names.index(level)
            levels[level_idx] = new_level
            codes[level_idx] = new_codes
        else:
            levels.append(new_level)
            codes.append(new_codes)
            names.append(level)

    res = pd.MultiIndex(levels=levels, codes=codes, names=names)  # type: ignore[arg-type] # ty: ignore[invalid-argument-type] # pandas-stubs confused

    return res

unify_index_levels #

unify_index_levels(
    left: MultiIndex, right: MultiIndex
) -> tuple[MultiIndex, MultiIndex]

Unify the levels on two indexes

The levels are unified by adding any level that is present in only one of left and right to the other index, filled with NaN.

This is different from pd.DataFrame.align. pd.DataFrame.align will fill missing values with values from the other index if it can. We don't want that here. We want any non-aligned levels to be filled with NaN.

The implementation also allows this to be performed on indexes directly (avoiding casting to a DataFrame and avoiding paying the price of aligning everything else or creating a bunch of NaN that we just drop straight away).

The indexes are returned with the levels from left first, then the levels from right.
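The NaN filling relies on a pandas convention: a code of -1 in a pd.MultiIndex marks a missing value. The sketch below shows this for a single added level, mirroring the construction in the source listing. The index, the level name `c` and the dtype of the empty level are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical index that lacks level "c"
right = pd.MultiIndex.from_tuples([(7, 8), (10, 11)], names=["a", "b"])

right_unified = pd.MultiIndex(
    # The added level has no values of its own
    levels=[*right.levels, pd.Index([], dtype="int64")],
    # A code of -1 means "missing", so every row gets NaN in level "c"
    codes=[*right.codes, np.full(right.shape[0], -1)],
    names=[*right.names, "c"],
)

print(right_unified.get_level_values("c").isna().all())  # True
```

Because only the levels and codes are touched, no DataFrame has to be created or aligned.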

Parameters:

Name	Type	Description	Default
`left`	`MultiIndex`	First index to unify	required
`right`	`MultiIndex`	Second index to unify	required

Returns:

Name	Type	Description
`left_aligned`	`MultiIndex`	Left after alignment
`right_aligned`	`MultiIndex`	Right after alignment

Examples:

>>> import pandas as pd
>>>
>>> idx_a = pd.MultiIndex.from_tuples(
...     [
...         (1, 2, 3),
...         (4, 5, 6),
...     ],
...     names=["a", "b", "c"],
... )
>>> idx_b = pd.MultiIndex.from_tuples(
...     [
...         (7, 8),
...         (10, 11),
...     ],
...     names=["a", "b"],
... )
>>> unified_a, unified_b = unify_index_levels(idx_a, idx_b)
>>> unified_a
MultiIndex([(1, 2, 3),
            (4, 5, 6)],
           names=['a', 'b', 'c'])
>>>
>>> unified_b
MultiIndex([( 7,  8, nan),
            (10, 11, nan)],
           names=['a', 'b', 'c'])
>>>
>>> # Also fine if b has swapped levels
>>> idx_b = pd.MultiIndex.from_tuples(
...     [
...         (7, 8),
...         (10, 11),
...     ],
...     names=["b", "a"],
... )
>>> unified_a, unified_b = unify_index_levels(idx_a, idx_b)
>>> unified_a
MultiIndex([(1, 2, 3),
            (4, 5, 6)],
           names=['a', 'b', 'c'])
>>>
>>> unified_b
MultiIndex([( 8,  7, nan),
            (11, 10, nan)],
           names=['a', 'b', 'c'])
>>>
>>> # Also works if a is 'inside' b
>>> idx_a = pd.MultiIndex.from_tuples(
...     [
...         (7, 8),
...         (10, 11),
...     ],
...     names=["a", "b"],
... )
>>> idx_b = pd.MultiIndex.from_tuples(
...     [
...         (1, 2, 3),
...         (4, 5, 6),
...     ],
...     names=["a", "b", "c"],
... )
>>> unified_a, unified_b = unify_index_levels(idx_a, idx_b)
>>> unified_a
MultiIndex([( 7,  8, nan),
            (10, 11, nan)],
           names=['a', 'b', 'c'])
>>>
>>> unified_b
MultiIndex([(1, 2, 3),
            (4, 5, 6)],
           names=['a', 'b', 'c'])
>>>
>>> # But, be a bit careful, this is now sensitive to a's column order
>>> idx_a = pd.MultiIndex.from_tuples(
...     [
...         (7, 8),
...         (10, 11),
...     ],
...     names=["b", "a"],
... )
>>> idx_b = pd.MultiIndex.from_tuples(
...     [
...         (1, 2, 3),
...         (4, 5, 6),
...     ],
...     names=["a", "b", "c"],
... )
>>> unified_a, unified_b = unify_index_levels(idx_a, idx_b)
>>> # Note that the names are `['b', 'a', 'c']` in the output
>>> unified_a
MultiIndex([( 7,  8, nan),
            (10, 11, nan)],
           names=['b', 'a', 'c'])
>>>
>>> unified_b
MultiIndex([(2, 1, 3),
            (5, 4, 6)],
           names=['b', 'a', 'c'])

Source code in src/pandas_openscm/index_manipulation.py

def unify_index_levels(
    left: pd.MultiIndex, right: pd.MultiIndex
) -> tuple[pd.MultiIndex, pd.MultiIndex]:
    """
    Unify the levels on two indexes

    The levels are unified by adding any level that is present in only one of
    `left` and `right` to the other index, filled with NaN.

    This is different from [pd.DataFrame.align][pandas.DataFrame.align].
    [pd.DataFrame.align][pandas.DataFrame.align]
    will fill missing values with values from the other index if it can.
    We don't want that here.
    We want any non-aligned levels to be filled with NaN.

    The implementation also allows this to be performed on indexes directly
    (avoiding casting to a DataFrame
    and avoiding paying the price of aligning everything else
    or creating a bunch of NaN that we just drop straight away).

    The indexes are returned with the levels from `left` first,
    then the levels from `right`.

    Parameters
    ----------
    left
        First index to unify

    right
        Second index to unify

    Returns
    -------
    left_aligned :
        Left after alignment

    right_aligned :
        Right after alignment

    Examples
    --------
    >>> import pandas as pd
    >>>
    >>> idx_a = pd.MultiIndex.from_tuples(
    ...     [
    ...         (1, 2, 3),
    ...         (4, 5, 6),
    ...     ],
    ...     names=["a", "b", "c"],
    ... )
    >>> idx_b = pd.MultiIndex.from_tuples(
    ...     [
    ...         (7, 8),
    ...         (10, 11),
    ...     ],
    ...     names=["a", "b"],
    ... )
    >>> unified_a, unified_b = unify_index_levels(idx_a, idx_b)
    >>> unified_a
    MultiIndex([(1, 2, 3),
                (4, 5, 6)],
               names=['a', 'b', 'c'])
    >>>
    >>> unified_b
    MultiIndex([( 7,  8, nan),
                (10, 11, nan)],
               names=['a', 'b', 'c'])
    >>>
    >>> # Also fine if b has swapped levels
    >>> idx_b = pd.MultiIndex.from_tuples(
    ...     [
    ...         (7, 8),
    ...         (10, 11),
    ...     ],
    ...     names=["b", "a"],
    ... )
    >>> unified_a, unified_b = unify_index_levels(idx_a, idx_b)
    >>> unified_a
    MultiIndex([(1, 2, 3),
                (4, 5, 6)],
               names=['a', 'b', 'c'])
    >>>
    >>> unified_b
    MultiIndex([( 8,  7, nan),
                (11, 10, nan)],
               names=['a', 'b', 'c'])
    >>>
    >>> # Also works if a is 'inside' b
    >>> idx_a = pd.MultiIndex.from_tuples(
    ...     [
    ...         (7, 8),
    ...         (10, 11),
    ...     ],
    ...     names=["a", "b"],
    ... )
    >>> idx_b = pd.MultiIndex.from_tuples(
    ...     [
    ...         (1, 2, 3),
    ...         (4, 5, 6),
    ...     ],
    ...     names=["a", "b", "c"],
    ... )
    >>> unified_a, unified_b = unify_index_levels(idx_a, idx_b)
    >>> unified_a
    MultiIndex([( 7,  8, nan),
                (10, 11, nan)],
               names=['a', 'b', 'c'])
    >>>
    >>> unified_b
    MultiIndex([(1, 2, 3),
                (4, 5, 6)],
               names=['a', 'b', 'c'])
    >>>
    >>> # But, be a bit careful, this is now sensitive to a's column order
    >>> idx_a = pd.MultiIndex.from_tuples(
    ...     [
    ...         (7, 8),
    ...         (10, 11),
    ...     ],
    ...     names=["b", "a"],
    ... )
    >>> idx_b = pd.MultiIndex.from_tuples(
    ...     [
    ...         (1, 2, 3),
    ...         (4, 5, 6),
    ...     ],
    ...     names=["a", "b", "c"],
    ... )
    >>> unified_a, unified_b = unify_index_levels(idx_a, idx_b)
    >>> # Note that the names are `['b', 'a', 'c']` in the output
    >>> unified_a
    MultiIndex([( 7,  8, nan),
                (10, 11, nan)],
               names=['b', 'a', 'c'])
    >>>
    >>> unified_b
    MultiIndex([(2, 1, 3),
                (5, 4, 6)],
               names=['b', 'a', 'c'])
    """
    left_names = list(left.names)
    right_names = list(right.names)

    if left_names == right_names:
        return left, right

    if set(left_names) == set(right_names):
        return left, right.reorder_levels(left_names)

    out_names = [*left_names, *[v for v in right_names if v not in left_names]]
    left_to_add = [v for v in out_names if v not in left_names]
    right_to_add = [v for v in out_names if v not in right_names]

    left_unified = pd.MultiIndex(
        levels=[  # ty: ignore[invalid-argument-type]
            *left.levels,  # type: ignore[list-item] # pandas-stubs confused
            *[pd.Index([], dtype=right.get_level_values(c).dtype) for c in left_to_add],  # type: ignore # pandas-stubs confused
        ],
        codes=[  # ty: ignore[invalid-argument-type]
            *left.codes,  # type: ignore[list-item] # pandas-stubs confused
            *([np.full(left.shape[0], -1)] * len(left_to_add)),  # type: ignore[list-item] # pandas-stubs confused
        ],
        names=[
            *left_names,
            *left_to_add,
        ],
    ).reorder_levels(out_names)

    right_unified = pd.MultiIndex(
        levels=[  # ty: ignore[invalid-argument-type]
            *[pd.Index([], dtype=left.get_level_values(c).dtype) for c in right_to_add],  # type: ignore # pandas-stubs confused
            *right.levels,  # type: ignore[list-item] # pandas-stubs confused
        ],
        codes=[  # ty: ignore[invalid-argument-type]
            *([np.full(right.shape[0], -1)] * len(right_to_add)),  # type: ignore[list-item] # pandas-stubs confused
            *right.codes,  # type: ignore[list-item] # pandas-stubs confused
        ],
        names=[
            *right_to_add,
            *right_names,
        ],
    ).reorder_levels(out_names)

    return left_unified, right_unified

unify_index_levels_check_index_types #

unify_index_levels_check_index_types(
    left: Index[Any], right: Index[Any]
) -> tuple[MultiIndex, MultiIndex]

Unify the levels on two indexes

This is just a thin wrapper around unify_index_levels that checks that the inputs are both pd.MultiIndex before unifying the indices.

Parameters:

Name	Type	Description	Default
`left`	`Index[Any]`	First index to unify	required
`right`	`Index[Any]`	Second index to unify	required

Returns:

Name	Type	Description
`left_aligned`	`MultiIndex`	Left after alignment
`right_aligned`	`MultiIndex`	Right after alignment

Source code in src/pandas_openscm/index_manipulation.py

def unify_index_levels_check_index_types(
    left: pd.Index[Any], right: pd.Index[Any]
) -> tuple[pd.MultiIndex, pd.MultiIndex]:
    """
    Unify the levels on two indexes

    This is just a thin wrapper around [unify_index_levels][(m).]
    that checks that the inputs are both [pd.MultiIndex][pandas.MultiIndex]
    before unifying the indices.

    Parameters
    ----------
    left
        First index to unify

    right
        Second index to unify

    Returns
    -------
    left_aligned :
        Left after alignment

    right_aligned :
        Right after alignment
    """
    if not isinstance(left, pd.MultiIndex):
        raise TypeError(left)

    if not isinstance(right, pd.MultiIndex):
        raise TypeError(right)

    return unify_index_levels(left, right)

update_index_from_candidates #

update_index_from_candidates(
    indf: DataFrame, candidates: Iterable[Hashable]
) -> DataFrame

Update the index of data to align with the candidate columns as much as possible

Parameters:

Name	Type	Description	Default
`indf`	`DataFrame`	Data of which to update the index	required
`candidates`	`Iterable[Hashable]`	Candidate columns to use to create the updated index	required

Returns:

Type	Description
`DataFrame`	`indf` with its updated index. All columns of `indf` that are in `candidates` are used to create the index of the result.

Notes

This overwrites any existing index of indf so you will only want to use this function when you're sure that there isn't anything of interest already in the index of indf.
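A short sketch of the behaviour with plain pandas, using the same two steps as the source listing. The DataFrame and the candidate names are made up; note that the candidate `unit` is not a column, so it is silently skipped.

```python
import pandas as pd

# Hypothetical data where the metadata still lives in columns
indf = pd.DataFrame(
    {"scenario": ["sa", "sb"], "variable": ["v1", "v2"], "value": [1.0, 2.0]}
)
candidates = ["scenario", "variable", "unit"]  # "unit" is not in indf

# Only candidates that are actual columns are moved into the index
set_to_index = [v for v in candidates if v in indf.columns]
res = indf.set_index(set_to_index)

print(list(res.index.names))  # ['scenario', 'variable']
print(list(res.columns))      # ['value']
```

The default integer index of `indf` is discarded, which is the overwriting behaviour the note above warns about.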

Source code in src/pandas_openscm/index_manipulation.py

def update_index_from_candidates(
    indf: pd.DataFrame, candidates: Iterable[Hashable]
) -> pd.DataFrame:
    """
    Update the index of data to align with the candidate columns as much as possible

    Parameters
    ----------
    indf
        Data of which to update the index

    candidates
        Candidate columns to use to create the updated index

    Returns
    -------
    :
        `indf` with its updated index.

        All columns of `indf` that are in `candidates`
        are used to create the index of the result.

    Notes
    -----
    This overwrites any existing index of `indf`
    so you will only want to use this function
    when you're sure that there isn't anything of interest
    already in the index of `indf`.
    """
    set_to_index = [v for v in candidates if v in indf.columns]
    res = indf.set_index(set_to_index)

    return res

update_index_levels_from_other_func #

update_index_levels_from_other_func(
    pobj: P,
    update_sources: dict[
        Any,
        tuple[
            Any,
            Callable[[Any], Any]
            | dict[Any, Any]
            | Series[Any],
        ]
        | tuple[
            tuple[Any, ...],
            Callable[[tuple[Any, ...]], Any]
            | dict[tuple[Any, ...], Any]
            | Series[Any],
        ],
    ],
    copy: bool = True,
    remove_unused_levels: bool = True,
) -> P

Update the index levels based on other levels of a pandas object

If the level to be updated doesn't exist, it is created.

Parameters:

Name	Type	Description	Default
`pobj`	`P`	Supported pandas object to update	required
`update_sources`	`dict[Any, tuple[Any, Callable[[Any], Any] \| dict[Any, Any] \| Series[Any]] \| tuple[tuple[Any, ...], Callable[[tuple[Any, ...]], Any] \| dict[tuple[Any, ...], Any] \| Series[Any]]]`	Updates to apply to `pobj`'s index. Each key is the level to which the updates will be applied (or the level that will be created if it doesn't already exist). There are two options for the values. The first is used when only one level is used to update the 'target level'. In this case, each value is a tuple of which the first element is the level to use to generate the values (the 'source level') and the second is a mapper of the form used by pd.Index.map which will be applied to the source level to update/create the level of interest. The second is used when more than one level is used to update the 'target level'. In this case, each value is a tuple of which the first element is a tuple of the levels to use to generate the values (the 'source levels') and the second is a mapper of the form used by pd.Index.map which will be applied to tuples of values from the source levels to update/create the level of interest.	required
`copy`	`bool`	Should `pobj` be copied before its index is updated? If `False`, the index of the object passed in is replaced in place.	`True`
`remove_unused_levels`	`bool`	Call `pobj.index.remove_unused_levels` before updating the levels. This avoids trying to update levels that aren't being used.	`True`

Returns:

Type	Description
`P`	`pobj` with updates applied to its index
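
To make the shape of `update_sources` concrete, here is a rough sketch in plain pandas of the idea of deriving one level from another. It is not the library's implementation (the listed source works on levels and codes); the data and the `title` level are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=pd.MultiIndex.from_tuples(
        [("sa", "v1", "kg"), ("sb", "v2", "m"), ("sa", "v2", "m")],
        names=["scenario", "variable", "unit"],
    ),
)

# Same shape as `update_sources`: target level -> (source level, mapper)
update_sources = {
    # "title" does not exist yet, so it is created from "scenario"
    "title": ("scenario", lambda x: x.upper()),
    # "unit" already exists, so it is overwritten based on "variable"
    "unit": ("variable", {"v1": "g", "v2": "km"}),
}

# Work on a row-wise copy of the index values
frame = df.index.to_frame(index=False)
for target, (source, mapper) in update_sources.items():
    # Always read the source from the original index,
    # so updates do not feed into each other
    frame[target] = df.index.get_level_values(source).map(mapper).to_numpy()

res = df.copy()
res.index = pd.MultiIndex.from_frame(frame)

print(res.index.tolist())
```

Mapping row by row like this is simple but does more work than mapping only the unique level values, which is one reason to prefer the library function on large indexes.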

Source code in src/pandas_openscm/index_manipulation.py

def update_index_levels_from_other_func(
    pobj: P,
    update_sources: dict[
        Any,
        tuple[
            Any,
            Callable[[Any], Any] | dict[Any, Any] | pd.Series[Any],
        ]
        | tuple[
            tuple[Any, ...],
            Callable[[tuple[Any, ...]], Any]
            | dict[tuple[Any, ...], Any]
            | pd.Series[Any],
        ],
    ],
    copy: bool = True,
    remove_unused_levels: bool = True,
) -> P:
    """
    Update the index levels based on other levels of a [pandas][] object

    If the level to be updated doesn't exist,
    it is created.

    Parameters
    ----------
    pobj
        Supported [pandas][] object to update

    update_sources
        Updates to apply to `pobj`'s index

        Each key is the level to which the updates will be applied
        (or the level that will be created if it doesn't already exist).

        There are two options for the values.

        The first is used when only one level is used to update the 'target level'.
        In this case, each value is a tuple of which the first element
        is the level to use to generate the values (the 'source level')
        and the second is a mapper of the form used by
        [pd.Index.map][pandas.Index.map]
        which will be applied to the source level
        to update/create the level of interest.

        The second is used when more than one level is used
        to update the 'target level'.
        In this case, each value is a tuple of which the first element
        is a tuple of the levels to use to generate the values
        (the 'source levels')
        and the second is a mapper of the form used by
        [pd.Index.map][pandas.Index.map]
        which will be applied to tuples of values from the source levels
        to update/create the level of interest.

    copy
        Should `pobj` be copied before its index is updated?

        If `False`, the index of the object passed in is replaced in place.

    remove_unused_levels
        Call `pobj.index.remove_unused_levels` before updating the levels

        This avoids trying to update levels that aren't being used.

    Returns
    -------
    :
        `pobj` with updates applied to its index
    """
    if copy:
        pobj = pobj.copy()  # ty: ignore[invalid-argument-type, invalid-assignment]

    if not isinstance(pobj.index, pd.MultiIndex):
        msg = (
            "This function is only intended to be used "
            "when `pobj`'s index is an instance of `MultiIndex`. "
            f"Received {type(pobj.index)=}"
        )
        raise TypeError(msg)

    pobj.index = update_levels_from_other(
        pobj.index,
        update_sources=update_sources,
        remove_unused_levels=remove_unused_levels,
    )

    return pobj

update_index_levels_func #

update_index_levels_func(
    pobj: P,
    updates: Mapping[
        Any,
        Callable[[Any], Any] | dict[Any, Any] | Series[Any],
    ],
    copy: bool = True,
    remove_unused_levels: bool = True,
) -> P

Update the index levels of a pandas object

Parameters:

Name	Type	Description	Default
`pobj`	`P`	Supported pandas object to update	required
`updates`	`Mapping[Any, Callable[[Any], Any] \| dict[Any, Any] \| Series[Any]]`	Updates to apply to `pobj`'s index. Each key is the index level to which the updates will be applied. Each value is a mapper of the form used by pd.Index.map (a function, dict or pd.Series) which maps the level's values to their new values.	required
`copy`	`bool`	Should `pobj` be copied before its index is updated? If `False`, the index of the object passed in is replaced in place.	`True`
`remove_unused_levels`	`bool`	Call `pobj.index.remove_unused_levels` before updating the levels. This avoids trying to update levels that aren't being used.	`True`

Returns:

Type	Description
`P`	`pobj` with updates applied to its index
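
The following is a minimal sketch, in plain pandas, of what applying per-level `updates` to a pandas object amounts to. It is an illustration under assumed data, not the library's implementation.

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [1.0, 2.0]},
    index=pd.MultiIndex.from_tuples(
        [("sa", "ma"), ("sb", "mb")], names=["scenario", "model"]
    ),
)

# Same shape as `updates`: level name -> mapper accepted by pd.Index.map
updates = {
    "scenario": lambda x: f"scenario {x}",
    "model": {"ma": "model a", "mb": "model b"},
}

# Map the levels named in `updates` and pass the others through unchanged
new_arrays = [
    df.index.get_level_values(name).map(updates[name])
    if name in updates
    else df.index.get_level_values(name)
    for name in df.index.names
]

# Copy first so the input is left untouched, as with `copy=True`
res = df.copy()
res.index = pd.MultiIndex.from_arrays(new_arrays, names=df.index.names)

print(res.index.tolist())
```

The sketch maps every row of each level, whereas the listed source delegates to `update_levels`, which operates on the index's levels and codes.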

Source code in src/pandas_openscm/index_manipulation.py

def update_index_levels_func(
    pobj: P,
    updates: Mapping[Any, Callable[[Any], Any] | dict[Any, Any] | pd.Series[Any]],
    copy: bool = True,
    remove_unused_levels: bool = True,
) -> P:
    """
    Update the index levels of a [pandas][] object

    Parameters
    ----------
    pobj
        Supported [pandas][] object to update

    updates
        Updates to apply to `pobj`'s index

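        Each key is the index level to which the updates will be applied.
        Each value is a mapper of the form used by
        [pd.Index.map][pandas.Index.map]
        (a function, dict or [pd.Series][pandas.Series])
        which maps the level's values to their new values.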

    copy
        Should `pobj` be copied before its index is updated?

        If `False`, the index of the object passed in is replaced in place.

    remove_unused_levels
        Call `pobj.index.remove_unused_levels` before updating the levels

        This avoids trying to update levels that aren't being used.

    Returns
    -------
    :
        `pobj` with updates applied to its index
    """
    if copy:
        pobj = pobj.copy()  # ty: ignore[invalid-argument-type, invalid-assignment]

    if not isinstance(pobj.index, pd.MultiIndex):
        msg = (
            "This function is only intended to be used "
            "when `pobj`'s index is an instance of `MultiIndex`. "
            f"Received {type(pobj.index)=}"
        )
        raise TypeError(msg)

    pobj.index = update_levels(
        pobj.index, updates=updates, remove_unused_levels=remove_unused_levels
    )

    return pobj

update_levels #

update_levels(
    ini: MultiIndex,
    updates: Mapping[
        Any,
        Callable[[Any], Any] | dict[Any, Any] | Series[Any],
    ],
    remove_unused_levels: bool = True,
) -> MultiIndex

Update the levels of a pd.MultiIndex

Parameters:

Name	Type	Description	Default
`ini`	`MultiIndex`	Input index	required
`updates`	`Mapping[Any, Callable[[Any], Any] \| dict[Any, Any] \| Series[Any]]`	Updates to apply. Each key is the level to which the updates will be applied. Each value is a mapper of the form used by pd.Index.map.	required
`remove_unused_levels`	`bool`	Call `ini.remove_unused_levels` before updating the levels. This avoids trying to update levels that aren't being used.	`True`

Returns:

Type	Description
`MultiIndex`	`ini` with updates applied

Raises:

Type	Description
`KeyError`	A level in `updates` is not a level in `ini`

Examples:

>>> start = pd.MultiIndex.from_tuples(
...     [
...         ("sa", "ma", "v1", "kg"),
...         ("sb", "ma", "v2", "m"),
...         ("sa", "mb", "v1", "kg"),
...         ("sa", "mb", "v2", "m"),
...     ],
...     names=["scenario", "model", "variable", "unit"],
... )
>>> start
MultiIndex([('sa', 'ma', 'v1', 'kg'),
            ('sb', 'ma', 'v2',  'm'),
            ('sa', 'mb', 'v1', 'kg'),
            ('sa', 'mb', 'v2',  'm')],
           names=['scenario', 'model', 'variable', 'unit'])
>>>
>>> update_levels(
...     start,
...     {"model": lambda x: f"model {x}", "scenario": lambda x: f"scenario {x}"},
... )
MultiIndex([('scenario sa', 'model ma', 'v1', 'kg'),
            ('scenario sb', 'model ma', 'v2',  'm'),
            ('scenario sa', 'model mb', 'v1', 'kg'),
            ('scenario sa', 'model mb', 'v2',  'm')],
           names=['scenario', 'model', 'variable', 'unit'])
>>>
>>> update_levels(
...     start,
...     {"variable": {"v1": "variable one", "v2": "variable two"}},
... )
MultiIndex([('sa', 'ma', 'variable one', 'kg'),
            ('sb', 'ma', 'variable two',  'm'),
            ('sa', 'mb', 'variable one', 'kg'),
            ('sa', 'mb', 'variable two',  'm')],
           names=['scenario', 'model', 'variable', 'unit'])
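
The `remove_unused_levels` option is easiest to understand by looking at what a sliced index carries around. The sketch below uses only pandas' own `MultiIndex.remove_unused_levels`; the index values are made up for illustration.

```python
import pandas as pd

full = pd.MultiIndex.from_tuples(
    [("sa", "v1"), ("sb", "v2"), ("sc", "v3")],
    names=["scenario", "variable"],
)

# Slicing keeps the original levels, even those no row refers to any more
subset = full[:2]
print(list(subset.levels[0]))

# Dropping the unused levels leaves only the values that are actually present
trimmed = subset.remove_unused_levels()
print(list(trimmed.levels[0]))
```

Without this step, a mapper applied to the levels would also see values such as `"sc"` that no longer appear in any row, which matters if the mapper only knows about the values in use.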

Source code in src/pandas_openscm/index_manipulation.py

def update_levels(
    ini: pd.MultiIndex,
    updates: Mapping[Any, Callable[[Any], Any] | dict[Any, Any] | pd.Series[Any]],
    remove_unused_levels: bool = True,
) -> pd.MultiIndex:
    """
    Update the levels of a [pd.MultiIndex][pandas.MultiIndex]

    Parameters
    ----------
    ini
        Input index

    updates
        Updates to apply

        Each key is the level to which the updates will be applied.
        Each value is a mapper of the form used by
        [pd.Index.map][pandas.Index.map].

    remove_unused_levels
        Call `ini.remove_unused_levels` before updating the levels

        This avoids trying to update levels that aren't being used.

    Returns
    -------
    :
        `ini` with updates applied

    Raises
    ------
    KeyError
        A level in `updates` is not a level in `ini`

    Examples
    --------
    >>> start = pd.MultiIndex.from_tuples(
    ...     [
    ...         ("sa", "ma", "v1", "kg"),
    ...         ("sb", "ma", "v2", "m"),
    ...         ("sa", "mb", "v1", "kg"),
    ...         ("sa", "mb", "v2", "m"),
    ...     ],
    ...     names=["scenario", "model", "variable", "unit"],
    ... )
    >>> start
    MultiIndex([('sa', 'ma', 'v1', 'kg'),
                ('sb', 'ma', 'v2',  'm'),
                ('sa', 'mb', 'v1', 'kg'),
                ('sa', 'mb', 'v2',  'm')],
               names=['scenario', 'model', 'variable', 'unit'])
    >>>
    >>> update_levels(
    ...     start,
    ...     {"model": lambda x: f"model {x}", "scenario": lambda x: f"scenario {x}"},
    ... )
    MultiIndex([('scenario sa', 'model ma', 'v1', 'kg'),
                ('scenario sb', 'model ma', 'v2',  'm'),
                ('scenario sa', 'model mb', 'v1', 'kg'),
                ('scenario sa', 'model mb', 'v2',  'm')],
               names=['scenario', 'model', 'variable', 'unit'])
    >>>
    >>> update_levels(
    ...     start,
    ...     {"variable": {"v1": "variable one", "v2": "variable two"}},
    ... )
    MultiIndex([('sa', 'ma', 'variable one', 'kg'),
                ('sb', 'ma', 'variable two',  'm'),
                ('sa', 'mb', 'variable one', 'kg'),
                ('sa', 'mb', 'variable two',  'm')],
               names=['scenario', 'model', 'variable', 'unit'])
    """
    if remove_unused_levels:
        ini = ini.remove_unused_levels()

    levels: list[pd.Index[Any]] = list(ini.levels)
    codes: list[npt.NDArray[np.integer[Any]]] = list(ini.codes)

    for level, updater in updates.items():
        if level not in ini.names:
            msg = (
                f"{level} is not available in the index. Available levels: {ini.names}"
            )
            raise KeyError(msg)

        new_level, new_codes = create_new_level_and_codes_by_mapping(
            ini=ini,
            level_to_create_from=level,
            mapper=updater,
        )

        level_idx = ini.names.index(level)
        levels[level_idx] = new_level
        codes[level_idx] = new_codes

    res = pd.MultiIndex(
        levels=levels,  # type: ignore[arg-type] # ty: ignore[invalid-argument-type] # pandas-stubs confused
        codes=codes,  # type: ignore[arg-type] # ty: ignore[invalid-argument-type] # pandas-stubs confused
        names=ini.names,
    )

    return res

update_levels_from_other #

update_levels_from_other(
    ini: MultiIndex,
    update_sources: dict[
        Any,
        tuple[
            Any,
            Callable[[Any], Any]
            | dict[Any, Any]
            | Series[Any],
        ]
        | tuple[
            tuple[Any, ...],
            Callable[[tuple[Any, ...]], Any]
            | dict[tuple[Any, ...], Any]
            | Series[Any],
        ],
    ],
    remove_unused_levels: bool = True,
) -> MultiIndex

Update levels based on other levels in a pd.MultiIndex

If the level to be updated doesn't exist, it is created.

Parameters:

Name	Type	Description	Default
`ini`	`MultiIndex`	Input index	required
`update_sources`	`dict[Any, tuple[Any, Callable[[Any], Any] \| dict[Any, Any] \| Series[Any]] \| tuple[tuple[Any, ...], Callable[[tuple[Any, ...]], Any] \| dict[tuple[Any, ...], Any] \| Series[Any]]]`	Updates to apply and their source levels. Each key is the level to which the updates will be applied (or the level that will be created if it doesn't already exist). There are two options for the values. The first is used when only one level is used to update the 'target level'. In this case, each value is a tuple of which the first element is the level to use to generate the values (the 'source level') and the second is a mapper of the form used by pd.Index.map which will be applied to the source level to update/create the level of interest. The second is used when more than one level is used to update the 'target level'. In this case, each value is a tuple of which the first element is a tuple of the levels to use to generate the values (the 'source levels') and the second is a mapper of the form used by pd.Index.map which will be applied to tuples of values from the source levels to update/create the level of interest.	required
`remove_unused_levels`	`bool`	Call `ini.remove_unused_levels` before updating the levels. This avoids trying to update based on levels that aren't being used.	`True`

Returns:

Type	Description
`MultiIndex`	`ini` with updates applied

Raises:

Type	Description
`KeyError`	A source level in `update_sources` is not a level in `ini`

Examples:

>>> start = pd.MultiIndex.from_tuples(
...     [
...         ("sa", "ma", "v1", "kg"),
...         ("sb", "ma", "v2", "m"),
...         ("sa", "mb", "v1", "kg"),
...         ("sa", "mb", "v2", "m"),
...     ],
...     names=["scenario", "model", "variable", "unit"],
... )
>>> start
MultiIndex([('sa', 'ma', 'v1', 'kg'),
            ('sb', 'ma', 'v2',  'm'),
            ('sa', 'mb', 'v1', 'kg'),
            ('sa', 'mb', 'v2',  'm')],
           names=['scenario', 'model', 'variable', 'unit'])
>>>
>>> # Create a new level based on an existing level
>>> update_levels_from_other(
...     start,
...     {
...         "unit squared": ("unit", lambda x: f"{x}**2"),
...         "class": ("model", {"ma": "delta", "mb": "gamma"}),
...     },
... )
MultiIndex([('sa', 'ma', 'v1', 'kg', 'kg**2', 'delta'),
            ('sb', 'ma', 'v2',  'm',  'm**2', 'delta'),
            ('sa', 'mb', 'v1', 'kg', 'kg**2', 'gamma'),
            ('sa', 'mb', 'v2',  'm',  'm**2', 'gamma')],
           names=['scenario', 'model', 'variable', 'unit', 'unit squared', 'class'])
>>>
>>> # Update an existing level based on another level
>>> update_levels_from_other(
...     start,
...     {
...         "unit": ("variable", {"v1": "g", "v2": "km"}),
...         "model": ("scenario", lambda x: f"model {x}"),
...     },
... )
MultiIndex([('sa', 'model sa', 'v1',  'g'),
            ('sb', 'model sb', 'v2', 'km'),
            ('sa', 'model sa', 'v1',  'g'),
            ('sa', 'model sa', 'v2', 'km')],
           names=['scenario', 'model', 'variable', 'unit'])
>>>
>>> # Create a new level based on multiple existing levels
>>> update_levels_from_other(
...     start,
...     {
...         "model || scenario": (("model", "scenario"), lambda x: " || ".join(x)),
...     },
... )
MultiIndex([('sa', 'ma', 'v1', 'kg', 'sa || ma'),
            ('sb', 'ma', 'v2',  'm', 'sb || ma'),
            ('sa', 'mb', 'v1', 'kg', 'sa || mb'),
            ('sa', 'mb', 'v2',  'm', 'sa || mb')],
           names=['scenario', 'model', 'variable', 'unit', 'model || scenario'])
>>>
>>> # Both at the same time
>>> update_levels_from_other(
...     start,
...     {
...         "title": ("scenario", lambda x: x.capitalize()),
...         "unit": ("unit", {"v1": "g", "v2": "km"}),
...     },
... )
MultiIndex([('sa', 'ma', 'v1', nan, 'Sa'),
            ('sb', 'ma', 'v2', nan, 'Sb'),
            ('sa', 'mb', 'v1', nan, 'Sa'),
            ('sa', 'mb', 'v2', nan, 'Sa')],
           names=['scenario', 'model', 'variable', 'unit', 'title'])
>>>
>>> # Setting with a range of different methods
>>> update_levels_from_other(
...     start,
...     {
...         # callable
...         "y-label": (("variable", "unit"), lambda x: f"{x[0]} ({x[1]})"),
...         # dict
...         "title": ("scenario", {"sa": "Scenario A", "sb": "Delta"}),
...         # pd.Series
...         "Source": (
...             "model",
...             pd.Series(["Internal", "External"], index=["ma", "mb"]),
...         ),
...     },
... )
MultiIndex([('sa', 'ma', 'v1', 'kg', 'v1 (kg)', 'Scenario A', 'Internal'),
            ('sb', 'ma', 'v2',  'm',  'v2 (m)',      'Delta', 'Internal'),
            ('sa', 'mb', 'v1', 'kg', 'v1 (kg)', 'Scenario A', 'External'),
            ('sa', 'mb', 'v2',  'm',  'v2 (m)', 'Scenario A', 'External')],
           names=['scenario', 'model', 'variable', 'unit', 'y-label', 'title', 'Source'])
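
The `nan` entries in the "Both at the same time" example above appear to follow from how `pd.Index.map` treats dict mappers: values without a matching key map to NaN rather than being left unchanged. A minimal sketch in plain pandas, using a stand-in `unit` index rather than the library function:

```python
import pandas as pd

unit = pd.Index(["kg", "m", "kg", "m"], name="unit")

# None of the unit values ("kg", "m") are keys of this dict,
# so every entry of the result is missing
mapped = unit.map({"v1": "g", "v2": "km"})
print(mapped.isna().all())

# When the keys do match the values, the mapping applies as expected
variable = pd.Index(["v1", "v2"], name="variable")
print(variable.map({"v1": "g", "v2": "km"}).tolist())
```

If unmatched values should be kept, one option is to pass a function instead, for example `lambda x: mapping.get(x, x)` for a hypothetical dict named `mapping`.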

Source code in src/pandas_openscm/index_manipulation.py

def update_levels_from_other(
    ini: pd.MultiIndex,
    update_sources: dict[
        Any,
        tuple[
            Any,
            Callable[[Any], Any] | dict[Any, Any] | pd.Series[Any],
        ]
        | tuple[
            tuple[Any, ...],
            Callable[[tuple[Any, ...]], Any]
            | dict[tuple[Any, ...], Any]
            | pd.Series[Any],
        ],
    ],
    remove_unused_levels: bool = True,
) -> pd.MultiIndex:
    """
    Update levels based on other levels in a [pd.MultiIndex][pandas.MultiIndex]

    If the level to be updated doesn't exist,
    it is created.

    Parameters
    ----------
    ini
        Input index

    update_sources
        Updates to apply and their source levels

        Each key is the level to which the updates will be applied
        (or the level that will be created if it doesn't already exist).

        There are two options for the values.

        The first is used when only one level is used to update the 'target level'.
        In this case, each value is a tuple of which the first element
        is the level to use to generate the values (the 'source level')
        and the second is a mapper of the form used by
        [pd.Index.map][pandas.Index.map]
        which will be applied to the source level
        to update/create the level of interest.

        The second is used when more than one level is used
        to update the 'target level'.
        In this case, each value is a tuple of which the first element
        is a tuple of the levels to use to generate the values
        (the 'source levels')
        and the second is a mapper of the form used by
        [pd.Index.map][pandas.Index.map]
        which will be applied to tuples of values from the source levels
        to update/create the level of interest.

    remove_unused_levels
        Call `ini.remove_unused_levels` before updating the levels

        This avoids trying to update based on levels that aren't being used.

    Returns
    -------
    :
        `ini` with updates applied

    Raises
    ------
    KeyError
        A source level in `update_sources` is not a level in `ini`

    Examples
    --------
    >>> start = pd.MultiIndex.from_tuples(
    ...     [
    ...         ("sa", "ma", "v1", "kg"),
    ...         ("sb", "ma", "v2", "m"),
    ...         ("sa", "mb", "v1", "kg"),
    ...         ("sa", "mb", "v2", "m"),
    ...     ],
    ...     names=["scenario", "model", "variable", "unit"],
    ... )
    >>> start
    MultiIndex([('sa', 'ma', 'v1', 'kg'),
                ('sb', 'ma', 'v2',  'm'),
                ('sa', 'mb', 'v1', 'kg'),
                ('sa', 'mb', 'v2',  'm')],
               names=['scenario', 'model', 'variable', 'unit'])
    >>>
    >>> # Create a new level based on an existing level
    >>> update_levels_from_other(
    ...     start,
    ...     {
    ...         "unit squared": ("unit", lambda x: f"{x}**2"),
    ...         "class": ("model", {"ma": "delta", "mb": "gamma"}),
    ...     },
    ... )
    MultiIndex([('sa', 'ma', 'v1', 'kg', 'kg**2', 'delta'),
                ('sb', 'ma', 'v2',  'm',  'm**2', 'delta'),
                ('sa', 'mb', 'v1', 'kg', 'kg**2', 'gamma'),
                ('sa', 'mb', 'v2',  'm',  'm**2', 'gamma')],
               names=['scenario', 'model', 'variable', 'unit', 'unit squared', 'class'])
    >>>
    >>> # Update an existing level based on another level
    >>> update_levels_from_other(
    ...     start,
    ...     {
    ...         "unit": ("variable", {"v1": "g", "v2": "km"}),
    ...         "model": ("scenario", lambda x: f"model {x}"),
    ...     },
    ... )
    MultiIndex([('sa', 'model sa', 'v1',  'g'),
                ('sb', 'model sb', 'v2', 'km'),
                ('sa', 'model sa', 'v1',  'g'),
                ('sa', 'model sa', 'v2', 'km')],
               names=['scenario', 'model', 'variable', 'unit'])
    >>>
    >>> # Create a new level based on multiple existing levels
    >>> update_levels_from_other(
    ...     start,
    ...     {
    ...         "model || scenario": (("model", "scenario"), lambda x: " || ".join(x)),
    ...     },
    ... )
    MultiIndex([('sa', 'ma', 'v1', 'kg', 'sa || ma'),
                ('sb', 'ma', 'v2',  'm', 'sb || ma'),
                ('sa', 'mb', 'v1', 'kg', 'sa || mb'),
                ('sa', 'mb', 'v2',  'm', 'sa || mb')],
               names=['scenario', 'model', 'variable', 'unit', 'model || scenario'])
    >>>
    >>> # Both at the same time
    >>> update_levels_from_other(
    ...     start,
    ...     {
    ...         "title": ("scenario", lambda x: x.capitalize()),
    ...         "unit": ("unit", {"v1": "g", "v2": "km"}),
    ...     },
    ... )
    MultiIndex([('sa', 'ma', 'v1', nan, 'Sa'),
                ('sb', 'ma', 'v2', nan, 'Sb'),
                ('sa', 'mb', 'v1', nan, 'Sa'),
                ('sa', 'mb', 'v2', nan, 'Sa')],
               names=['scenario', 'model', 'variable', 'unit', 'title'])
    >>>
    >>> # Setting with a range of different methods
    >>> update_levels_from_other(
    ...     start,
    ...     {
    ...         # callable
    ...         "y-label": (("variable", "unit"), lambda x: f"{x[0]} ({x[1]})"),
    ...         # dict
    ...         "title": ("scenario", {"sa": "Scenario A", "sb": "Delta"}),
    ...         # pd.Series
    ...         "Source": (
    ...             "model",
    ...             pd.Series(["Internal", "External"], index=["ma", "mb"]),
    ...         ),
    ...     },
    ... )
    MultiIndex([('sa', 'ma', 'v1', 'kg', 'v1 (kg)', 'Scenario A', 'Internal'),
                ('sb', 'ma', 'v2',  'm',  'v2 (m)',      'Delta', 'Internal'),
                ('sa', 'mb', 'v1', 'kg', 'v1 (kg)', 'Scenario A', 'External'),
                ('sa', 'mb', 'v2',  'm',  'v2 (m)', 'Scenario A', 'External')],
               names=['scenario', 'model', 'variable', 'unit', 'y-label', 'title', 'Source'])
    """  # noqa: E501
    if remove_unused_levels:
        ini = ini.remove_unused_levels()

    levels: list[pd.Index[Any]] = list(ini.levels)
    codes: list[npt.NDArray[np.integer[Any]]] = list(ini.codes)
    names: list[str] = list(ini.names)  # type: ignore[arg-type] # ty: ignore[invalid-assignment] # pandas-stubs confused

    for level, (source, updater) in update_sources.items():
        if isinstance(source, tuple):
            missing_levels = set(source) - set(ini.names)
            if missing_levels:
                conj = "is" if len(missing_levels) == 1 else "are"
                msg = (
                    f"{sorted(missing_levels)} {conj} not available in the index. "
                    f"Available levels: {ini.names}"
                )
                raise KeyError(msg)

            new_level, new_codes = create_new_level_and_codes_by_mapping_multiple(
                ini=ini,
                levels_to_create_from=source,
                mapper=updater,
            )

        else:
            if source not in ini.names:
                msg = (
                    f"{source} is not available in the index. "
                    f"Available levels: {ini.names}"
                )
                raise KeyError(msg)

            new_level, new_codes = create_new_level_and_codes_by_mapping(
                ini=ini,
                level_to_create_from=source,
                mapper=updater,
            )

        if level in ini.names:
            level_idx = ini.names.index(level)
            levels[level_idx] = new_level
            codes[level_idx] = new_codes

        else:
            levels.append(new_level)
            codes.append(new_codes)
            names.append(level)

    res = pd.MultiIndex(levels=levels, codes=codes, names=names)  # type: ignore[arg-type] # ty: ignore[invalid-argument-type] # pandas-stubs confused

    return res