pandas_openscm.index_manipulation

Manipulation of the index of data

Functions:

| Name | Description |
| --- | --- |
| convert_index_to_category_index | Convert the index's values to categories |
| create_level_from_collection | Create new level and corresponding codes. |
| create_new_level_and_codes_by_mapping | Create a new level and associated codes by mapping an existing level |
| create_new_level_and_codes_by_mapping_multiple | Create a new level and associated codes by mapping existing levels |
| ensure_index_is_multiindex | Ensure that the index of a pandas object is a pd.MultiIndex |
| ensure_is_multiindex | Ensure that an index is a pd.MultiIndex |
| set_index_levels_func | Set the index levels of a pd.DataFrame |
| set_levels | Set the levels of a MultiIndex to the provided values |
| unify_index_levels | Unify the levels on two indexes |
| unify_index_levels_check_index_types | Unify the levels on two indexes |
| update_index_from_candidates | Update the index of data to align with the candidate columns as much as possible |
| update_index_levels_from_other_func | Update the index levels based on other levels of a pandas object |
| update_index_levels_func | Update the index levels of a pandas object |
| update_levels | Update the levels of a pd.MultiIndex |
| update_levels_from_other | Update levels based on other levels in a pd.MultiIndex |

convert_index_to_category_index

convert_index_to_category_index(pandas_obj: P) -> P

Convert the index's values to categories

This can save a lot of memory and improve the speed of processing. However, it comes with some pitfalls. For a nice discussion of some of them, see this article.

Parameters:

Name Type Description Default
pandas_obj P

Object whose index we want to change to categorical.

required

Returns:

Type Description
P

A new object with the same data as pandas_obj but a category type index.
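
As a rough illustration of what the conversion amounts to, the sketch below repeats the steps from the source listing using plain pandas only. The data, level names and column labels are made up for the example; it is not a substitute for calling the function itself.

```python
import pandas as pd

# Hypothetical example data: two timeseries identified by scenario and variable
df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    index=pd.MultiIndex.from_tuples(
        [("sa", "v1"), ("sb", "v2")], names=["scenario", "variable"]
    ),
    columns=[2020, 2030],
)

# Cast every index level to a categorical, then rebuild the index from the frame
category_index = pd.MultiIndex.from_frame(
    df.index.to_frame(index=False).astype("category")
)
res = pd.DataFrame(df.values, index=category_index, columns=df.columns)

# Collect the dtype of each level of the rebuilt index
level_dtypes = [str(level.dtype) for level in res.index.levels]
```

The data and the level names are unchanged; only the dtype of the index levels differs.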

Source code in src/pandas_openscm/index_manipulation.py
def convert_index_to_category_index(pandas_obj: P) -> P:
    """
    Convert the index's values to categories

    This can save a lot of memory and improve the speed of processing.
    However, it comes with some pitfalls.
    For a nice discussion of some of them,
    see [this article](https://towardsdatascience.com/staying-sane-while-adopting-pandas-categorical-datatypes-78dbd19dcd8a/).

    Parameters
    ----------
    pandas_obj
        Object whose index we want to change to categorical.

    Returns
    -------
    :
        A new object with the same data as `pandas_obj`
        but a category type index.
    """
    new_index = pd.MultiIndex.from_frame(
        pandas_obj.index.to_frame(index=False).astype("category")
    )

    if hasattr(pandas_obj, "columns"):
        return type(pandas_obj)(  # type: ignore # confusing mypy here
            pandas_obj.values,
            index=new_index,
            columns=pandas_obj.columns,
        )

    return type(pandas_obj)(
        pandas_obj.values,
        index=new_index,
    )

create_level_from_collection

create_level_from_collection(
    level: str, value: Collection[Any]
) -> tuple[Index[Any], NDArray[integer[Any]]]

Create new level and corresponding codes.

Parameters:

Name Type Description Default
level str

Name of the level to create

required
value Collection[Any]

Values to use to create the level

required

Returns:

Type Description
tuple[Index[Any], NDArray[integer[Any]]]

New level and corresponding codes
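
To make the relationship between a level and its codes concrete, here is a minimal sketch of the 'slow route' from the source listing, written with plain pandas. The values and the level name are hypothetical.

```python
import pandas as pd

values = ["x", "y", "x", "z"]  # hypothetical values containing a duplicate

# The level holds each distinct value once, in order of first appearance...
new_level = pd.Index(values, name="new_level").unique()
# ...and the codes record, for each row, the position of its value in the level
new_codes = new_level.get_indexer(values)

# Looking the codes up in the level recovers the original values
roundtrip = new_level.take(new_codes).tolist()
```

When the values contain no duplicates, the level is simply the values themselves and the codes are `0, 1, 2, ...`, which is why the source can take a faster route in that case.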

Source code in src/pandas_openscm/index_manipulation.py
def create_level_from_collection(
    level: str, value: Collection[Any]
) -> tuple[pd.Index[Any], npt.NDArray[np.integer[Any]]]:
    """
    Create new level and corresponding codes.

    Parameters
    ----------
    level
        Name of the level to create

    value
        Values to use to create the level

    Returns
    -------
    :
        New level and corresponding codes
    """
    new_level: pd.Index[Any] = pd.Index(value, name=level)
    if not new_level.has_duplicates:
        # Fast route, can just return new level and codes from level we mapped from
        return new_level, np.arange(len(value))

    # Slow route, have to update the codes
    new_level = new_level.unique()
    new_codes = new_level.get_indexer(value)  # type: ignore[arg-type] # ty: ignore[invalid-argument-type] # pandas-stubs confused

    return new_level, new_codes

create_new_level_and_codes_by_mapping

create_new_level_and_codes_by_mapping(
    ini: MultiIndex,
    level_to_create_from: str,
    mapper: Callable[[Any], Any]
    | dict[Any, Any]
    | Series[Any],
) -> tuple[Index[Any], NDArray[integer[Any]]]

Create a new level and associated codes by mapping an existing level

This is a thin function intended for internal use to handle some slightly tricky logic.

Parameters:

Name Type Description Default
ini MultiIndex

Input index

required
level_to_create_from str

Level to create the new level from

required
mapper Callable[[Any], Any] | dict[Any, Any] | Series[Any]

Function to use to map existing levels to new levels

required

Returns:

Name Type Description
new_level Index[Any]

New level

new_codes NDArray[integer[Any]]

New codes
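
The many-to-one case is the tricky part. The sketch below walks through the same three steps as the source listing using plain pandas; the index, the mapping and the group names are invented for illustration.

```python
import pandas as pd

# Hypothetical index and a mapping that sends two scenarios to the same group
ini = pd.MultiIndex.from_tuples(
    [("sa", "v1"), ("sb", "v2"), ("sc", "v1")], names=["scenario", "variable"]
)
mapper = {"sa": "group_1", "sb": "group_1", "sc": "group_2"}

# Mapping the unique level values gives duplicates, so the old codes can't be reused
mapped_level = ini.levels[ini.names.index("scenario")].map(mapper)
needs_new_codes = mapped_level.has_duplicates

# Step 1: the new level contains each mapped value once
new_level = mapped_level.unique()
# Step 2: map the full set of level values (one per row)
mapped_values = ini.get_level_values("scenario").map(mapper)
# Step 3: find the position of each row's mapped value in the new level
new_codes = new_level.get_indexer(mapped_values)
```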

Source code in src/pandas_openscm/index_manipulation.py
def create_new_level_and_codes_by_mapping(
    ini: pd.MultiIndex,
    level_to_create_from: str,
    mapper: Callable[[Any], Any] | dict[Any, Any] | pd.Series[Any],
) -> tuple[pd.Index[Any], npt.NDArray[np.integer[Any]]]:
    """
    Create a new level and associated codes by mapping an existing level

    This is a thin function intended for internal use
    to handle some slightly tricky logic.

    Parameters
    ----------
    ini
        Input index

    level_to_create_from
        Level to create the new level from

    mapper
        Function to use to map existing levels to new levels

    Returns
    -------
    new_level :
        New level

    new_codes :
        New codes
    """
    # There might be a faster way to do this if you work on the codes directly
    # and only use the unique level values.
    # However, it might still be slower than using pandas' compiled C stuff.
    level_to_map_from_idx = ini.names.index(level_to_create_from)
    new_level = ini.levels[level_to_map_from_idx].map(mapper)  # type: ignore[arg-type] # ty: ignore[invalid-argument-type] # pandas-stubs confused
    if not new_level.has_duplicates:
        # Fast route,
        # can just return new level and codes based on the simple mapping alone
        return new_level, ini.codes[level_to_map_from_idx]

    # Slow route: have to update the codes
    # because the mapping isn't 1:1
    # (it is many:1).
    #
    # Step 1: use the result from above
    # to get the new level we actually want i.e. a level that only has unique entries
    new_level = new_level.unique()

    # Step 2: get the new i.e. mapped values.
    # This seems to be the easiest (maybe fastest too?) way to do the final step.
    mapped_values = ini.get_level_values(level_to_create_from).map(mapper)  # type: ignore[arg-type] # ty: ignore[invalid-argument-type] # pandas-stubs confused

    # Step 3: use pandas' inbuilt functionality to get the new codes
    # by getting the indexer we need based on the new level
    # and the mapped values from the above.
    # There might be a faster way to do this,
    # but this is the simplest and given it uses pandas' internals,
    # it's probably already quite fast.
    new_codes = new_level.get_indexer(mapped_values)

    return new_level, new_codes

create_new_level_and_codes_by_mapping_multiple

create_new_level_and_codes_by_mapping_multiple(
    ini: MultiIndex,
    levels_to_create_from: tuple[str, ...],
    mapper: Callable[[Any], Any]
    | dict[Any, Any]
    | Series[Any],
) -> tuple[Index[Any], NDArray[integer[Any]]]

Create a new level and associated codes by mapping existing levels

This is a thin function intended for internal use to handle some slightly tricky logic.

Parameters:

Name Type Description Default
ini MultiIndex

Input index

required
levels_to_create_from tuple[str, ...]

Levels to create the new level from

required
mapper Callable[[Any], Any] | dict[Any, Any] | Series[Any]

Function to use to map existing levels to new levels

required

Returns:

Name Type Description
new_level Index[Any]

New level

new_codes NDArray[integer[Any]]

New codes
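
When several levels feed the mapping, the mapper receives a tuple of their values for each row. The following is a minimal sketch of the brute-force approach from the source listing, using plain pandas; the index and the joining function are hypothetical.

```python
import pandas as pd

# Hypothetical index; the first two rows share the same (scenario, model) pair
ini = pd.MultiIndex.from_tuples(
    [("sa", "ma", "v1"), ("sa", "ma", "v2"), ("sb", "ma", "v1")],
    names=["scenario", "model", "variable"],
)
levels_to_create_from = ("scenario", "model")

# Keep only the source levels, then map each (scenario, model) tuple to one value
levels_to_drop = [v for v in ini.names if v not in levels_to_create_from]
dup_level = ini.droplevel(levels_to_drop).map(lambda pair: f"{pair[0]}-{pair[1]}")

# The mapped values may repeat, so derive the unique level and the codes from them
new_level = dup_level.unique()
new_codes = new_level.get_indexer(dup_level)
```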

Source code in src/pandas_openscm/index_manipulation.py
def create_new_level_and_codes_by_mapping_multiple(
    ini: pd.MultiIndex,
    levels_to_create_from: tuple[str, ...],
    mapper: Callable[[Any], Any] | dict[Any, Any] | pd.Series[Any],
) -> tuple[pd.Index[Any], npt.NDArray[np.integer[Any]]]:
    """
    Create a new level and associated codes by mapping existing levels

    This is a thin function intended for internal use
    to handle some slightly tricky logic.

    Parameters
    ----------
    ini
        Input index

    levels_to_create_from
        Levels to create the new level from

    mapper
        Function to use to map existing levels to new levels

    Returns
    -------
    new_level :
        New level

    new_codes :
        New codes
    """
    # You could probably do some optimisation here
    # that checks for unique combinations of codes
    # for the levels we're using,
    # then only applies the mapping to those unique combos
    # to reduce the number of evaluations of mapper.
    # That feels tricky to get right, so just doing the brute force way for now.
    levels_to_drop = [v for v in ini.names if v not in levels_to_create_from]
    dup_level = ini.droplevel(levels_to_drop).map(mapper)  # type: ignore[arg-type] # ty: ignore[invalid-argument-type] # pandas-stubs confused

    # Brute force: get codes from new levels
    new_level = dup_level.unique()
    new_codes = new_level.get_indexer(dup_level)

    return new_level, new_codes

ensure_index_is_multiindex

ensure_index_is_multiindex(
    pandas_obj: P, copy: bool = True
) -> P

Ensure that the index of a pandas object is a pd.MultiIndex

Parameters:

Name Type Description Default
pandas_obj P

Object whose index we want to ensure is a pd.MultiIndex

required
copy bool

Should we copy pandas_obj before modifying the index?

True

Returns:

Type Description
P

pandas_obj with a pd.MultiIndex

If the index was already a pd.MultiIndex, this is a no-op (although the value of copy is respected).
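
The effect of `copy=True` can be sketched with plain pandas as below. This mirrors the steps in the source listing rather than calling the function; the data and the index name are made up.

```python
import pandas as pd

# Hypothetical object with a plain (single-level) index
df = pd.DataFrame(
    {"value": [1.0, 2.0]}, index=pd.Index(["a", "b"], name="variable")
)

# Copy first, so that the input is left untouched...
res = df.copy()
# ...then replace the index of the copy with a one-level MultiIndex
res.index = pd.MultiIndex.from_arrays([res.index.values], names=[res.index.name])
```

Without the copy, the assignment would change the index of the object that was passed in.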

Source code in src/pandas_openscm/index_manipulation.py
def ensure_index_is_multiindex(pandas_obj: P, copy: bool = True) -> P:
    """
    Ensure that the index of a pandas object is a [pd.MultiIndex][pandas.MultiIndex]

    Parameters
    ----------
    pandas_obj
        Object whose index we want to ensure is a [pd.MultiIndex][pandas.MultiIndex]

    copy
        Should we copy `pandas_obj` before modifying the index?

    Returns
    -------
    :
        `pandas_obj` with a [pd.MultiIndex][pandas.MultiIndex]

        If the index was already a [pd.MultiIndex][pandas.MultiIndex],
        this is a no-op (although the value of copy is respected).
    """
    if copy:
        pandas_obj = pandas_obj.copy()  # ty: ignore

    if isinstance(pandas_obj.index, pd.MultiIndex):
        return pandas_obj

    pandas_obj.index = ensure_is_multiindex(pandas_obj.index)

    return pandas_obj

ensure_is_multiindex

ensure_is_multiindex(
    index: Index[Any] | MultiIndex,
) -> MultiIndex

Ensure that an index is a pd.MultiIndex

Parameters:

Name Type Description Default
index Index[Any] | MultiIndex

Index to check

required

Returns:

Type Description
MultiIndex

Index, cast to pd.MultiIndex if needed
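
For a plain index, the cast shown in the source listing produces a `pd.MultiIndex` with a single level that keeps the original name. A small sketch with a hypothetical index:

```python
import pandas as pd

index = pd.Index(["a", "b", "c"], name="variable")  # hypothetical plain index

# Wrap the values in a one-element list so that they form a single level
multi = pd.MultiIndex.from_arrays([index.values], names=[index.name])

level_values = multi.get_level_values("variable").tolist()
```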

Source code in src/pandas_openscm/index_manipulation.py
def ensure_is_multiindex(index: pd.Index[Any] | pd.MultiIndex) -> pd.MultiIndex:
    """
    Ensure that an index is a [pd.MultiIndex][pandas.MultiIndex]

    Parameters
    ----------
    index
        Index to check

    Returns
    -------
    :
        Index, cast to [pd.MultiIndex][pandas.MultiIndex] if needed
    """
    if isinstance(index, pd.MultiIndex):
        return index

    return pd.MultiIndex.from_arrays([index.values], names=[index.name])

set_index_levels_func

set_index_levels_func(
    pobj: P,
    levels_to_set: dict[str, Any | Collection[Any]],
    copy: bool = True,
) -> P

Set the index levels of a pd.DataFrame

Parameters:

Name Type Description Default
pobj P

Supported pandas object to update

required
levels_to_set dict[str, Any | Collection[Any]]

Mapping of level names to values to set

required
copy bool

Should pobj be copied before its index is updated?

True

Returns:

Type Description
P

pobj with updates applied to its index
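
As a sketch of what setting a level to a single value involves, the example below rebuilds the index by hand with plain pandas, following the scalar branch of `set_levels` in the source listing further down. The data and the `unit` level are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical object with a two-level index
df = pd.DataFrame(
    {"value": [1.0, 2.0]},
    index=pd.MultiIndex.from_tuples(
        [("sa", "v1"), ("sb", "v2")], names=["scenario", "variable"]
    ),
)

# A single value needs a level with one entry and a code of 0 for every row
new_level = pd.Index(["kg"], name="unit")
new_codes = np.zeros(df.shape[0], dtype=int)

# Work on a copy so that the input keeps its original index
res = df.copy()
res.index = pd.MultiIndex(
    levels=[*df.index.levels, new_level],
    codes=[*df.index.codes, new_codes],
    names=[*df.index.names, "unit"],
)
```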

Source code in src/pandas_openscm/index_manipulation.py
def set_index_levels_func(
    pobj: P,
    levels_to_set: dict[str, Any | Collection[Any]],
    copy: bool = True,
) -> P:
    """
    Set the index levels of a [pd.DataFrame][pandas.DataFrame]

    Parameters
    ----------
    pobj
        Supported [pandas][] object to update

    levels_to_set
        Mapping of level names to values to set

    copy
        Should `pobj` be copied before its index is updated?

    Returns
    -------
    :
        `pobj` with updates applied to its index
    """
    if not isinstance(pobj.index, pd.MultiIndex):
        msg = (
            "This function is only intended to be used "
            "when `pobj`'s index is an instance of `MultiIndex`. "
            f"Received {type(pobj.index)=}"
        )
        raise TypeError(msg)

    if copy:
        pobj = pobj.copy()  # ty: ignore[invalid-argument-type, invalid-assignment]

    pobj.index = set_levels(pobj.index, levels_to_set=levels_to_set)  # type: ignore[arg-type] # pandas-stubs confused

    return pobj

set_levels

set_levels(
    ini: MultiIndex,
    levels_to_set: dict[str, Any | Collection[Any]],
) -> MultiIndex

Set the levels of a MultiIndex to the provided values

Parameters:

Name Type Description Default
ini MultiIndex

Input MultiIndex

required
levels_to_set dict[str, Any | Collection[Any]]

Mapping of level names to values to set. If a value is a Collection (other than a str), it must be of the same length as the MultiIndex. Otherwise, the same value is used for every entry of that level.

required

Returns:

Type Description
MultiIndex

New MultiIndex with the levels set to the provided values

Raises:

Type Description
TypeError

If ini is not a MultiIndex

ValueError

If a value is a collection whose length is not equal to the length of the index

Examples:

>>> start = pd.MultiIndex.from_tuples(
...     [
...         ("sa", "ma", "v1", "kg"),
...         ("sb", "ma", "v2", "m"),
...         ("sa", "mb", "v1", "kg"),
...         ("sa", "mb", "v2", "m"),
...     ],
...     names=["scenario", "model", "variable", "unit"],
... )
>>> start
MultiIndex([('sa', 'ma', 'v1', 'kg'),
            ('sb', 'ma', 'v2',  'm'),
            ('sa', 'mb', 'v1', 'kg'),
            ('sa', 'mb', 'v2',  'm')],
           names=['scenario', 'model', 'variable', 'unit'])
>>>
>>> # Set a new level with a single string
>>> set_levels(
...     start,
...     {"new_variable": "xyz"},
... )
MultiIndex([('sa', 'ma', 'v1', 'kg', 'xyz'),
            ('sb', 'ma', 'v2',  'm', 'xyz'),
            ('sa', 'mb', 'v1', 'kg', 'xyz'),
            ('sa', 'mb', 'v2',  'm', 'xyz')],
           names=['scenario', 'model', 'variable', 'unit', 'new_variable'])
>>>
>>> # Set a new level with a collection
>>> set_levels(
...     start,
...     {"new_variable": [1, 2, 3, 4]},
... )
MultiIndex([('sa', 'ma', 'v1', 'kg', 1),
            ('sb', 'ma', 'v2',  'm', 2),
            ('sa', 'mb', 'v1', 'kg', 3),
            ('sa', 'mb', 'v2',  'm', 4)],
           names=['scenario', 'model', 'variable', 'unit', 'new_variable'])
>>>
>>> # Replace a level with a single value and add a new level
>>> set_levels(
...     start,
...     {"model": "new_model", "new_variable": ["xyz", "xyz", "x", "y"]},
... )
MultiIndex([('sa', 'new_model', 'v1', 'kg', 'xyz'),
            ('sb', 'new_model', 'v2',  'm', 'xyz'),
            ('sa', 'new_model', 'v1', 'kg',   'x'),
            ('sa', 'new_model', 'v2',  'm',   'y')],
           names=['scenario', 'model', 'variable', 'unit', 'new_variable'])
Source code in src/pandas_openscm/index_manipulation.py
def set_levels(
    ini: pd.MultiIndex, levels_to_set: dict[str, Any | Collection[Any]]
) -> pd.MultiIndex:
    """
    Set the levels of a MultiIndex to the provided values

    Parameters
    ----------
    ini
        Input MultiIndex

    levels_to_set
        Mapping of level names to values to set. If a value is a `Collection`
        (other than a `str`), it must be of the same length as the MultiIndex.
        Otherwise, the same value is used for every entry of that level.

    Returns
    -------
    :
        New MultiIndex with the levels set to the provided values

    Raises
    ------
    TypeError
        If `ini` is not a MultiIndex
    ValueError
        If a value is a collection whose length is not equal to the
        length of the index

    Examples
    --------
    >>> start = pd.MultiIndex.from_tuples(
    ...     [
    ...         ("sa", "ma", "v1", "kg"),
    ...         ("sb", "ma", "v2", "m"),
    ...         ("sa", "mb", "v1", "kg"),
    ...         ("sa", "mb", "v2", "m"),
    ...     ],
    ...     names=["scenario", "model", "variable", "unit"],
    ... )
    >>> start
    MultiIndex([('sa', 'ma', 'v1', 'kg'),
                ('sb', 'ma', 'v2',  'm'),
                ('sa', 'mb', 'v1', 'kg'),
                ('sa', 'mb', 'v2',  'm')],
               names=['scenario', 'model', 'variable', 'unit'])
    >>>
    >>> # Set a new level with a single string
    >>> set_levels(
    ...     start,
    ...     {"new_variable": "xyz"},
    ... )
    MultiIndex([('sa', 'ma', 'v1', 'kg', 'xyz'),
                ('sb', 'ma', 'v2',  'm', 'xyz'),
                ('sa', 'mb', 'v1', 'kg', 'xyz'),
                ('sa', 'mb', 'v2',  'm', 'xyz')],
               names=['scenario', 'model', 'variable', 'unit', 'new_variable'])
    >>>
    >>> # Set a new level with a collection
    >>> set_levels(
    ...     start,
    ...     {"new_variable": [1, 2, 3, 4]},
    ... )
    MultiIndex([('sa', 'ma', 'v1', 'kg', 1),
                ('sb', 'ma', 'v2',  'm', 2),
                ('sa', 'mb', 'v1', 'kg', 3),
                ('sa', 'mb', 'v2',  'm', 4)],
               names=['scenario', 'model', 'variable', 'unit', 'new_variable'])
    >>>
    >>> # Replace a level with a single value and add a new level
    >>> set_levels(
    ...     start,
    ...     {"model": "new_model", "new_variable": ["xyz", "xyz", "x", "y"]},
    ... )
    MultiIndex([('sa', 'new_model', 'v1', 'kg', 'xyz'),
                ('sb', 'new_model', 'v2',  'm', 'xyz'),
                ('sa', 'new_model', 'v1', 'kg',   'x'),
                ('sa', 'new_model', 'v2',  'm',   'y')],
               names=['scenario', 'model', 'variable', 'unit', 'new_variable'])
    """
    levels: list[pd.Index[Any]] = list(ini.levels)
    codes: list[npt.NDArray[np.integer[Any]]] = list(ini.codes)
    names: list[str] = list(ini.names)  # type: ignore[arg-type] # ty: ignore[invalid-assignment] # pandas-stubs confused

    for level, value in levels_to_set.items():
        if isinstance(value, Collection) and not isinstance(value, str):
            if len(value) != len(ini):
                msg = (
                    f"Length of values for level '{level}' does not "
                    f"match index length: {len(value)} != {len(ini)}"
                )
                raise ValueError(msg)
            new_level, new_codes = create_level_from_collection(level, value)
        else:
            new_level = pd.Index([value], name=level)
            new_codes = np.zeros(ini.shape[0], dtype=int)

        if level in ini.names:
            level_idx = ini.names.index(level)
            levels[level_idx] = new_level
            codes[level_idx] = new_codes
        else:
            levels.append(new_level)
            codes.append(new_codes)
            names.append(level)

    res = pd.MultiIndex(levels=levels, codes=codes, names=names)  # type: ignore[arg-type] # ty: ignore[invalid-argument-type] # pandas-stubs confused

    return res

unify_index_levels

unify_index_levels(
    left: MultiIndex, right: MultiIndex
) -> tuple[MultiIndex, MultiIndex]

Unify the levels on two indexes

The levels are unified by simply adding NaN to any level in either left or right that is not in the level of the other index.

This is different to pd.DataFrame.align. pd.DataFrame.align will fill missing values with values from the other index if it can. We don't want that here. We want any non-aligned levels to be filled with NaN.

The implementation also allows this to be performed on indexes directly (avoiding casting to a DataFrame and avoiding paying the price of aligning everything else or creating a bunch of NaN that we just drop straight away).

The indexes are returned with the levels from left first, then the levels from right.
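
The NaN filling works directly on the index, without building a DataFrame. The sketch below shows the core idea from the source listing with plain pandas: a missing level is added as an empty level whose codes are all -1, which pandas displays as NaN. The index and the name of the added level are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical index that lacks the level "c"
right = pd.MultiIndex.from_tuples([(7, 8), (10, 11)], names=["a", "b"])

# Add "c" as an empty level; a code of -1 marks every row as missing
padded = pd.MultiIndex(
    levels=[*right.levels, pd.Index([], dtype="int64")],
    codes=[*right.codes, np.full(right.shape[0], -1)],
    names=[*right.names, "c"],
)

all_missing = bool(padded.get_level_values("c").isna().all())
```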

Parameters:

Name Type Description Default
left MultiIndex

First index to unify

required
right MultiIndex

Second index to unify

required

Returns:

Name Type Description
left_aligned MultiIndex

Left after alignment

right_aligned MultiIndex

Right after alignment

Examples:

>>> import pandas as pd
>>>
>>> idx_a = pd.MultiIndex.from_tuples(
...     [
...         (1, 2, 3),
...         (4, 5, 6),
...     ],
...     names=["a", "b", "c"],
... )
>>> idx_b = pd.MultiIndex.from_tuples(
...     [
...         (7, 8),
...         (10, 11),
...     ],
...     names=["a", "b"],
... )
>>> unified_a, unified_b = unify_index_levels(idx_a, idx_b)
>>> unified_a
MultiIndex([(1, 2, 3),
            (4, 5, 6)],
           names=['a', 'b', 'c'])
>>>
>>> unified_b
MultiIndex([( 7,  8, nan),
            (10, 11, nan)],
           names=['a', 'b', 'c'])
>>>
>>> # Also fine if b has swapped levels
>>> idx_b = pd.MultiIndex.from_tuples(
...     [
...         (7, 8),
...         (10, 11),
...     ],
...     names=["b", "a"],
... )
>>> unified_a, unified_b = unify_index_levels(idx_a, idx_b)
>>> unified_a
MultiIndex([(1, 2, 3),
            (4, 5, 6)],
           names=['a', 'b', 'c'])
>>>
>>> unified_b
MultiIndex([( 8,  7, nan),
            (11, 10, nan)],
           names=['a', 'b', 'c'])
>>>
>>> # Also works if a is 'inside' b
>>> idx_a = pd.MultiIndex.from_tuples(
...     [
...         (7, 8),
...         (10, 11),
...     ],
...     names=["a", "b"],
... )
>>> idx_b = pd.MultiIndex.from_tuples(
...     [
...         (1, 2, 3),
...         (4, 5, 6),
...     ],
...     names=["a", "b", "c"],
... )
>>> unified_a, unified_b = unify_index_levels(idx_a, idx_b)
>>> unified_a
MultiIndex([( 7,  8, nan),
            (10, 11, nan)],
           names=['a', 'b', 'c'])
>>>
>>> unified_b
MultiIndex([(1, 2, 3),
            (4, 5, 6)],
           names=['a', 'b', 'c'])
>>>
>>> # But, be a bit careful, this is now sensitive to a's column order
>>> idx_a = pd.MultiIndex.from_tuples(
...     [
...         (7, 8),
...         (10, 11),
...     ],
...     names=["b", "a"],
... )
>>> idx_b = pd.MultiIndex.from_tuples(
...     [
...         (1, 2, 3),
...         (4, 5, 6),
...     ],
...     names=["a", "b", "c"],
... )
>>> unified_a, unified_b = unify_index_levels(idx_a, idx_b)
>>> # Note that the names are `['b', 'a', 'c']` in the output
>>> unified_a
MultiIndex([( 7,  8, nan),
            (10, 11, nan)],
           names=['b', 'a', 'c'])
>>>
>>> unified_b
MultiIndex([(2, 1, 3),
            (5, 4, 6)],
           names=['b', 'a', 'c'])
Source code in src/pandas_openscm/index_manipulation.py
def unify_index_levels(
    left: pd.MultiIndex, right: pd.MultiIndex
) -> tuple[pd.MultiIndex, pd.MultiIndex]:
    """
    Unify the levels on two indexes

    The levels are unified by simply adding NaN to any level in either `left` or `right`
    that is not in the level of the other index.

    This is different to [pd.DataFrame.align][pandas.DataFrame.align].
    [pd.DataFrame.align][pandas.DataFrame.align]
    will fill missing values with values from the other index if it can.
    We don't want that here.
    We want any non-aligned levels to be filled with NaN.

    The implementation also allows this to be performed on indexes directly
    (avoiding casting to a DataFrame
    and avoiding paying the price of aligning everything else
    or creating a bunch of NaN that we just drop straight away).

    The indexes are returned with the levels from `left` first,
    then the levels from `right`.

    Parameters
    ----------
    left
        First index to unify

    right
        Second index to unify

    Returns
    -------
    left_aligned :
        Left after alignment

    right_aligned :
        Right after alignment

    Examples
    --------
    >>> import pandas as pd
    >>>
    >>> idx_a = pd.MultiIndex.from_tuples(
    ...     [
    ...         (1, 2, 3),
    ...         (4, 5, 6),
    ...     ],
    ...     names=["a", "b", "c"],
    ... )
    >>> idx_b = pd.MultiIndex.from_tuples(
    ...     [
    ...         (7, 8),
    ...         (10, 11),
    ...     ],
    ...     names=["a", "b"],
    ... )
    >>> unified_a, unified_b = unify_index_levels(idx_a, idx_b)
    >>> unified_a
    MultiIndex([(1, 2, 3),
                (4, 5, 6)],
               names=['a', 'b', 'c'])
    >>>
    >>> unified_b
    MultiIndex([( 7,  8, nan),
                (10, 11, nan)],
               names=['a', 'b', 'c'])
    >>>
    >>> # Also fine if b has swapped levels
    >>> idx_b = pd.MultiIndex.from_tuples(
    ...     [
    ...         (7, 8),
    ...         (10, 11),
    ...     ],
    ...     names=["b", "a"],
    ... )
    >>> unified_a, unified_b = unify_index_levels(idx_a, idx_b)
    >>> unified_a
    MultiIndex([(1, 2, 3),
                (4, 5, 6)],
               names=['a', 'b', 'c'])
    >>>
    >>> unified_b
    MultiIndex([( 8,  7, nan),
                (11, 10, nan)],
               names=['a', 'b', 'c'])
    >>>
    >>> # Also works if a is 'inside' b
    >>> idx_a = pd.MultiIndex.from_tuples(
    ...     [
    ...         (7, 8),
    ...         (10, 11),
    ...     ],
    ...     names=["a", "b"],
    ... )
    >>> idx_b = pd.MultiIndex.from_tuples(
    ...     [
    ...         (1, 2, 3),
    ...         (4, 5, 6),
    ...     ],
    ...     names=["a", "b", "c"],
    ... )
    >>> unified_a, unified_b = unify_index_levels(idx_a, idx_b)
    >>> unified_a
    MultiIndex([( 7,  8, nan),
                (10, 11, nan)],
               names=['a', 'b', 'c'])
    >>>
    >>> unified_b
    MultiIndex([(1, 2, 3),
                (4, 5, 6)],
               names=['a', 'b', 'c'])
    >>>
    >>> # But, be a bit careful, this is now sensitive to a's column order
    >>> idx_a = pd.MultiIndex.from_tuples(
    ...     [
    ...         (7, 8),
    ...         (10, 11),
    ...     ],
    ...     names=["b", "a"],
    ... )
    >>> idx_b = pd.MultiIndex.from_tuples(
    ...     [
    ...         (1, 2, 3),
    ...         (4, 5, 6),
    ...     ],
    ...     names=["a", "b", "c"],
    ... )
    >>> unified_a, unified_b = unify_index_levels(idx_a, idx_b)
    >>> # Note that the names are `['b', 'a', 'c']` in the output
    >>> unified_a
    MultiIndex([( 7,  8, nan),
                (10, 11, nan)],
               names=['b', 'a', 'c'])
    >>>
    >>> unified_b
    MultiIndex([(2, 1, 3),
                (5, 4, 6)],
               names=['b', 'a', 'c'])
    """
    left_names = list(left.names)
    right_names = list(right.names)

    if left_names == right_names:
        return left, right

    if set(left_names) == set(right_names):
        return left, right.reorder_levels(left_names)

    out_names = [*left_names, *[v for v in right_names if v not in left_names]]
    left_to_add = [v for v in out_names if v not in left_names]
    right_to_add = [v for v in out_names if v not in right_names]

    left_unified = pd.MultiIndex(
        levels=[  # ty: ignore[invalid-argument-type]
            *left.levels,  # type: ignore[list-item] # pandas-stubs confused
            *[pd.Index([], dtype=right.get_level_values(c).dtype) for c in left_to_add],  # type: ignore # pandas-stubs confused
        ],
        codes=[  # ty: ignore[invalid-argument-type]
            *left.codes,  # type: ignore[list-item] # pandas-stubs confused
            *([np.full(left.shape[0], -1)] * len(left_to_add)),  # type: ignore[list-item] # pandas-stubs confused
        ],
        names=[
            *left_names,
            *left_to_add,
        ],
    ).reorder_levels(out_names)

    right_unified = pd.MultiIndex(
        levels=[  # ty: ignore[invalid-argument-type]
            *[pd.Index([], dtype=left.get_level_values(c).dtype) for c in right_to_add],  # type: ignore # pandas-stubs confused
            *right.levels,  # type: ignore[list-item] # pandas-stubs confused
        ],
        codes=[  # ty: ignore[invalid-argument-type]
            *([np.full(right.shape[0], -1)] * len(right_to_add)),  # type: ignore[list-item] # pandas-stubs confused
            *right.codes,  # type: ignore[list-item] # pandas-stubs confused
        ],
        names=[
            *right_to_add,
            *right_names,
        ],
    ).reorder_levels(out_names)

    return left_unified, right_unified

unify_index_levels_check_index_types

unify_index_levels_check_index_types(
    left: Index[Any], right: Index[Any]
) -> tuple[MultiIndex, MultiIndex]

Unify the levels on two indexes

This is just a thin wrapper around unify_index_levels that checks that the inputs are both pd.MultiIndex before unifying the indices.

Parameters:

Name Type Description Default
left Index[Any]

First index to unify

required
right Index[Any]

Second index to unify

required

Returns:

Name Type Description
left_aligned MultiIndex

Left after alignment

right_aligned MultiIndex

Right after alignment

Source code in src/pandas_openscm/index_manipulation.py
def unify_index_levels_check_index_types(
    left: pd.Index[Any], right: pd.Index[Any]
) -> tuple[pd.MultiIndex, pd.MultiIndex]:
    """
    Unify the levels on two indexes

    This is just a thin wrapper around [unify_index_levels][(m).]
    that checks that the inputs are both [pd.MultiIndex][pandas.MultiIndex]
    before unifying the indices.

    Parameters
    ----------
    left
        First index to unify

    right
        Second index to unify

    Returns
    -------
    left_aligned :
        Left after alignment

    right_aligned :
        Right after alignment
    """
    if not isinstance(left, pd.MultiIndex):
        raise TypeError(left)

    if not isinstance(right, pd.MultiIndex):
        raise TypeError(right)

    return unify_index_levels(left, right)

update_index_from_candidates

update_index_from_candidates(
    indf: DataFrame, candidates: Iterable[Hashable]
) -> DataFrame

Update the index of data to align with the candidate columns as much as possible

Parameters:

Name Type Description Default
indf DataFrame

Data of which to update the index

required
candidates Iterable[Hashable]

Candidate columns to use to create the updated index

required

Returns:

Type Description
DataFrame

indf with its updated index.

All columns of indf that are in candidates are used to create the index of the result.

Notes

This overwrites any existing index of indf so you will only want to use this function when you're sure that there isn't anything of interest already in the index of indf.
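
A minimal sketch of this behaviour with plain pandas, following the two lines of logic in the source listing. The column names and the candidate list are invented; note that the candidate `unit` is silently skipped because the data has no such column.

```python
import pandas as pd

# Hypothetical data with metadata stored in columns rather than the index
indf = pd.DataFrame(
    {"scenario": ["sa", "sb"], "variable": ["v1", "v2"], "value": [1.0, 2.0]}
)
candidates = ["scenario", "variable", "unit"]

# Only candidates that are actually columns of the data are moved into the index
set_to_index = [v for v in candidates if v in indf.columns]
res = indf.set_index(set_to_index)
```

The default integer index of `indf` is discarded here, which is harmless. An index that carries real information would be lost in the same way.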

Source code in src/pandas_openscm/index_manipulation.py
def update_index_from_candidates(
    indf: pd.DataFrame, candidates: Iterable[Hashable]
) -> pd.DataFrame:
    """
    Update the index of data to align with the candidate columns as much as possible

    Parameters
    ----------
    indf
        Data of which to update the index

    candidates
        Candidate columns to use to create the updated index

    Returns
    -------
    :
        `indf` with its updated index.

        All columns of `indf` that are in `candidates`
        are used to create the index of the result.

    Notes
    -----
    This overwrites any existing index of `indf`
    so you will only want to use this function
    when you're sure that there isn't anything of interest
    already in the index of `indf`.
    """
    set_to_index = [v for v in candidates if v in indf.columns]
    res = indf.set_index(set_to_index)

    return res

update_index_levels_from_other_func #

update_index_levels_from_other_func(
    pobj: P,
    update_sources: dict[
        Any,
        tuple[
            Any,
            Callable[[Any], Any]
            | dict[Any, Any]
            | Series[Any],
        ]
        | tuple[
            tuple[Any, ...],
            Callable[[tuple[Any, ...]], Any]
            | dict[tuple[Any, ...], Any]
            | Series[Any],
        ],
    ],
    copy: bool = True,
    remove_unused_levels: bool = True,
) -> P

Update the index levels based on other levels of a pandas object

If the level to be updated doesn't exist, it is created.

Parameters:

Name Type Description Default
pobj P

Supported pandas object to update

required
update_sources dict[Any, tuple[Any, Callable[[Any], Any] | dict[Any, Any] | Series[Any]] | tuple[tuple[Any, ...], Callable[[tuple[Any, ...]], Any] | dict[tuple[Any, ...], Any] | Series[Any]]]

Updates to apply to pobj's index

Each key is the level to which the updates will be applied (or the level that will be created if it doesn't already exist).

There are two options for the values.

The first is used when only one level is used to update the 'target level'. In this case, each value is a tuple of which the first element is the level to use to generate the values (the 'source level') and the second is a mapper of the form used by pd.Index.map which will be applied to the source level to update/create the level of interest.

The second is used when more than one level is used to update the 'target level'. In this case, each value is a tuple of which the first element is a tuple of the levels to use to generate the values (the 'source levels') and the second is a mapper of the form used by pd.Index.map which will be applied to the source levels to update/create the level of interest.

required
copy bool

Should pobj be copied before its index is updated? If False, the index of the object passed in is replaced in place.

True
remove_unused_levels bool

Call pobj.index.remove_unused_levels before updating the levels

This avoids trying to update levels that aren't being used.

True

Returns:

Type Description
P

pobj with updates applied to its index
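
To make the shape of `update_sources` concrete, here is a minimal sketch written in plain pandas rather than by calling the library. It is an approximation of the effect for single-source entries only, not the library's implementation (which works on levels and codes); the data frame and the `title` level are made up for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("sa", "ma", "v1", "kg"), ("sb", "ma", "v2", "m")],
    names=["scenario", "model", "variable", "unit"],
)
df = pd.DataFrame({"value": [1.0, 2.0]}, index=index)

# Each entry reads: target level -> (source level, mapper)
update_sources = {
    # update the existing "unit" level from "variable" using a dict
    "unit": ("variable", {"v1": "g", "v2": "km"}),
    # create a new "title" level from "scenario" using a callable
    "title": ("scenario", lambda x: x.capitalize()),
}

res = df.copy()  # plays the role of copy=True
levels_frame = res.index.to_frame(index=False)
for target, (source, mapper) in update_sources.items():
    # Values are always read from the original index, not from earlier updates
    mapped = res.index.get_level_values(source).map(mapper)
    levels_frame[target] = mapped.to_numpy()
res.index = pd.MultiIndex.from_frame(levels_frame)

print(res.index.to_list())
# [('sa', 'ma', 'v1', 'g', 'Sa'), ('sb', 'ma', 'v2', 'km', 'Sb')]
```

The input `df` keeps its original index because the work is done on a copy.
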

Source code in src/pandas_openscm/index_manipulation.py
def update_index_levels_from_other_func(
    pobj: P,
    update_sources: dict[
        Any,
        tuple[
            Any,
            Callable[[Any], Any] | dict[Any, Any] | pd.Series[Any],
        ]
        | tuple[
            tuple[Any, ...],
            Callable[[tuple[Any, ...]], Any]
            | dict[tuple[Any, ...], Any]
            | pd.Series[Any],
        ],
    ],
    copy: bool = True,
    remove_unused_levels: bool = True,
) -> P:
    """
    Update the index levels based on other levels of a [pandas][] object

    If the level to be updated doesn't exist,
    it is created.

    Parameters
    ----------
    pobj
        Supported [pandas][] object to update

    update_sources
        Updates to apply to `pobj`'s index

        Each key is the level to which the updates will be applied
        (or the level that will be created if it doesn't already exist).

        There are two options for the values.

        The first is used when only one level is used to update the 'target level'.
        In this case, each value is a tuple of which the first element
        is the level to use to generate the values (the 'source level')
        and the second is a mapper of the form used by
        [pd.Index.map][pandas.Index.map]
        which will be applied to the source level
        to update/create the level of interest.

        The second is used when more than one level
        is used to update the 'target level'.
        In this case, each value is a tuple of which the first element
        is a tuple of the levels to use to generate the values
        (the 'source levels')
        and the second is a mapper of the form used by
        [pd.Index.map][pandas.Index.map]
        which will be applied to the source levels
        to update/create the level of interest.

    copy
        Should `pobj` be copied before its index is updated?
        If `False`, the index of the object passed in is replaced in place.

    remove_unused_levels
        Call `pobj.index.remove_unused_levels` before updating the levels

        This avoids trying to update levels that aren't being used.

    Returns
    -------
    :
        `pobj` with updates applied to its index
    """
    if copy:
        pobj = pobj.copy()  # ty: ignore[invalid-argument-type, invalid-assignment]

    if not isinstance(pobj.index, pd.MultiIndex):
        msg = (
            "This function is only intended to be used "
            "when `pobj`'s index is an instance of `MultiIndex`. "
            f"Received {type(pobj.index)=}"
        )
        raise TypeError(msg)

    pobj.index = update_levels_from_other(
        pobj.index,
        update_sources=update_sources,
        remove_unused_levels=remove_unused_levels,
    )

    return pobj

update_index_levels_func #

update_index_levels_func(
    pobj: P,
    updates: Mapping[
        Any,
        Callable[[Any], Any] | dict[Any, Any] | Series[Any],
    ],
    copy: bool = True,
    remove_unused_levels: bool = True,
) -> P

Update the index levels of a pandas object

Parameters:

Name Type Description Default
pobj P

Supported pandas object to update

required
updates Mapping[Any, Callable[[Any], Any] | dict[Any, Any] | Series[Any]]

Updates to apply to pobj's index

Each key is the index level to which the updates will be applied. Each value is a mapper of the form used by pd.Index.map (a callable, dict or pd.Series) which maps the level's values to their new values.

required
copy bool

Should pobj be copied before its index is updated? If False, the index of the object passed in is replaced in place.

True
remove_unused_levels bool

Call pobj.index.remove_unused_levels before updating the levels

This avoids trying to update levels that aren't being used.

True

Returns:

Type Description
P

pobj with updates applied to its index
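
The type of `updates` allows each value to be a callable, a dict or a pd.Series. The sketch below shows, in plain pandas, how `pd.Index.map` treats each of these forms; the level values and mappers are made up for illustration and the library itself is not called.

```python
import pandas as pd

level_values = pd.Index(["ma", "mb"], name="model")

# The three mapper forms accepted by pd.Index.map
by_callable = level_values.map(lambda x: f"model {x}")
by_dict = level_values.map({"ma": "delta", "mb": "gamma"})
by_series = level_values.map(
    pd.Series(["Internal", "External"], index=["ma", "mb"])
)

# Values missing from a dict mapper become NaN rather than raising
unmatched = level_values.map({"v1": "g", "v2": "km"})

print(by_callable.to_list())  # ['model ma', 'model mb']
print(by_dict.to_list())  # ['delta', 'gamma']
print(by_series.to_list())  # ['Internal', 'External']
print(unmatched.isna().all())  # True
```

The last case is worth keeping in mind when using dict mappers: a typo in a key does not raise an error, it silently produces missing values.
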

Source code in src/pandas_openscm/index_manipulation.py
def update_index_levels_func(
    pobj: P,
    updates: Mapping[Any, Callable[[Any], Any] | dict[Any, Any] | pd.Series[Any]],
    copy: bool = True,
    remove_unused_levels: bool = True,
) -> P:
    """
    Update the index levels of a [pandas][] object

    Parameters
    ----------
    pobj
        Supported [pandas][] object to update

    updates
        Updates to apply to `pobj`'s index

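        Each key is the index level to which the updates will be applied.
        Each value is a mapper of the form used by
        [pd.Index.map][pandas.Index.map]
        (a callable, dict or [pd.Series][pandas.Series])
        which maps the level's values to their new values.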

    copy
        Should `pobj` be copied before its index is updated?
        If `False`, the index of the object passed in is replaced in place.

    remove_unused_levels
        Call `pobj.index.remove_unused_levels` before updating the levels

        This avoids trying to update levels that aren't being used.

    Returns
    -------
    :
        `pobj` with updates applied to its index
    """
    if copy:
        pobj = pobj.copy()  # ty: ignore[invalid-argument-type, invalid-assignment]

    if not isinstance(pobj.index, pd.MultiIndex):
        msg = (
            "This function is only intended to be used "
            "when `pobj`'s index is an instance of `MultiIndex`. "
            f"Received {type(pobj.index)=}"
        )
        raise TypeError(msg)

    pobj.index = update_levels(
        pobj.index, updates=updates, remove_unused_levels=remove_unused_levels
    )

    return pobj

update_levels #

update_levels(
    ini: MultiIndex,
    updates: Mapping[
        Any,
        Callable[[Any], Any] | dict[Any, Any] | Series[Any],
    ],
    remove_unused_levels: bool = True,
) -> MultiIndex

Update the levels of a pd.MultiIndex

Parameters:

Name Type Description Default
ini MultiIndex

Input index

required
updates Mapping[Any, Callable[[Any], Any] | dict[Any, Any] | Series[Any]]

Updates to apply

Each key is the level to which the updates will be applied. Each value is a mapper of the form used by pd.Index.map.

required
remove_unused_levels bool

Call ini.remove_unused_levels before updating the levels

This avoids trying to update levels that aren't being used.

True

Returns:

Type Description
MultiIndex

ini with updates applied

Raises:

Type Description
KeyError

A level in updates is not a level in ini

Examples:

>>> start = pd.MultiIndex.from_tuples(
...     [
...         ("sa", "ma", "v1", "kg"),
...         ("sb", "ma", "v2", "m"),
...         ("sa", "mb", "v1", "kg"),
...         ("sa", "mb", "v2", "m"),
...     ],
...     names=["scenario", "model", "variable", "unit"],
... )
>>> start
MultiIndex([('sa', 'ma', 'v1', 'kg'),
            ('sb', 'ma', 'v2',  'm'),
            ('sa', 'mb', 'v1', 'kg'),
            ('sa', 'mb', 'v2',  'm')],
           names=['scenario', 'model', 'variable', 'unit'])
>>>
>>> update_levels(
...     start,
...     {"model": lambda x: f"model {x}", "scenario": lambda x: f"scenario {x}"},
... )
MultiIndex([('scenario sa', 'model ma', 'v1', 'kg'),
            ('scenario sb', 'model ma', 'v2',  'm'),
            ('scenario sa', 'model mb', 'v1', 'kg'),
            ('scenario sa', 'model mb', 'v2',  'm')],
           names=['scenario', 'model', 'variable', 'unit'])
>>>
>>> update_levels(
...     start,
...     {"variable": {"v1": "variable one", "v2": "variable two"}},
... )
MultiIndex([('sa', 'ma', 'variable one', 'kg'),
            ('sb', 'ma', 'variable two',  'm'),
            ('sa', 'mb', 'variable one', 'kg'),
            ('sa', 'mb', 'variable two',  'm')],
           names=['scenario', 'model', 'variable', 'unit'])
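
The `remove_unused_levels` option matters because a pd.MultiIndex can carry level values that no row uses any more, for example after slicing. The following sketch shows this in plain pandas; the index is made up for illustration.

```python
import pandas as pd

full = pd.MultiIndex.from_tuples(
    [("sa", "v1"), ("sb", "v2"), ("sc", "v3")],
    names=["scenario", "variable"],
)

# Slicing keeps the original levels, so 'sc' is still stored
subset = full[:2]
print(subset.levels[0].to_list())  # ['sa', 'sb', 'sc']

# remove_unused_levels drops the values that no row refers to
trimmed = subset.remove_unused_levels()
print(trimmed.levels[0].to_list())  # ['sa', 'sb']

# The rows themselves are unchanged
print(trimmed.equals(subset))  # True
```

Without this step, a mapper applied to the stored levels would presumably also be applied to 'sc', which could fail for a callable that does not expect that value.
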

Source code in src/pandas_openscm/index_manipulation.py
def update_levels(
    ini: pd.MultiIndex,
    updates: Mapping[Any, Callable[[Any], Any] | dict[Any, Any] | pd.Series[Any]],
    remove_unused_levels: bool = True,
) -> pd.MultiIndex:
    """
    Update the levels of a [pd.MultiIndex][pandas.MultiIndex]

    Parameters
    ----------
    ini
        Input index

    updates
        Updates to apply

        Each key is the level to which the updates will be applied.
        Each value is a mapper of the form used by
        [pd.Index.map][pandas.Index.map].

    remove_unused_levels
        Call `ini.remove_unused_levels` before updating the levels

        This avoids trying to update levels that aren't being used.

    Returns
    -------
    :
        `ini` with updates applied

    Raises
    ------
    KeyError
        A level in `updates` is not a level in `ini`

    Examples
    --------
    >>> start = pd.MultiIndex.from_tuples(
    ...     [
    ...         ("sa", "ma", "v1", "kg"),
    ...         ("sb", "ma", "v2", "m"),
    ...         ("sa", "mb", "v1", "kg"),
    ...         ("sa", "mb", "v2", "m"),
    ...     ],
    ...     names=["scenario", "model", "variable", "unit"],
    ... )
    >>> start
    MultiIndex([('sa', 'ma', 'v1', 'kg'),
                ('sb', 'ma', 'v2',  'm'),
                ('sa', 'mb', 'v1', 'kg'),
                ('sa', 'mb', 'v2',  'm')],
               names=['scenario', 'model', 'variable', 'unit'])
    >>>
    >>> update_levels(
    ...     start,
    ...     {"model": lambda x: f"model {x}", "scenario": lambda x: f"scenario {x}"},
    ... )
    MultiIndex([('scenario sa', 'model ma', 'v1', 'kg'),
                ('scenario sb', 'model ma', 'v2',  'm'),
                ('scenario sa', 'model mb', 'v1', 'kg'),
                ('scenario sa', 'model mb', 'v2',  'm')],
               names=['scenario', 'model', 'variable', 'unit'])
    >>>
    >>> update_levels(
    ...     start,
    ...     {"variable": {"v1": "variable one", "v2": "variable two"}},
    ... )
    MultiIndex([('sa', 'ma', 'variable one', 'kg'),
                ('sb', 'ma', 'variable two',  'm'),
                ('sa', 'mb', 'variable one', 'kg'),
                ('sa', 'mb', 'variable two',  'm')],
               names=['scenario', 'model', 'variable', 'unit'])
    """
    if remove_unused_levels:
        ini = ini.remove_unused_levels()

    levels: list[pd.Index[Any]] = list(ini.levels)
    codes: list[npt.NDArray[np.integer[Any]]] = list(ini.codes)

    for level, updater in updates.items():
        if level not in ini.names:
            msg = (
                f"{level} is not available in the index. Available levels: {ini.names}"
            )
            raise KeyError(msg)

        new_level, new_codes = create_new_level_and_codes_by_mapping(
            ini=ini,
            level_to_create_from=level,
            mapper=updater,
        )

        level_idx = ini.names.index(level)
        levels[level_idx] = new_level
        codes[level_idx] = new_codes

    res = pd.MultiIndex(
        levels=levels,  # type: ignore[arg-type] # ty: ignore[invalid-argument-type] # pandas-stubs confused
        codes=codes,  # type: ignore[arg-type] # ty: ignore[invalid-argument-type] # pandas-stubs confused
        names=ini.names,
    )

    return res

update_levels_from_other #

update_levels_from_other(
    ini: MultiIndex,
    update_sources: dict[
        Any,
        tuple[
            Any,
            Callable[[Any], Any]
            | dict[Any, Any]
            | Series[Any],
        ]
        | tuple[
            tuple[Any, ...],
            Callable[[tuple[Any, ...]], Any]
            | dict[tuple[Any, ...], Any]
            | Series[Any],
        ],
    ],
    remove_unused_levels: bool = True,
) -> MultiIndex

Update levels based on other levels in a pd.MultiIndex

If the level to be updated doesn't exist, it is created.

Parameters:

Name Type Description Default
ini MultiIndex

Input index

required
update_sources dict[Any, tuple[Any, Callable[[Any], Any] | dict[Any, Any] | Series[Any]] | tuple[tuple[Any, ...], Callable[[tuple[Any, ...]], Any] | dict[tuple[Any, ...], Any] | Series[Any]]]

Updates to apply and their source levels

Each key is the level to which the updates will be applied (or the level that will be created if it doesn't already exist).

There are two options for the values.

The first is used when only one level is used to update the 'target level'. In this case, each value is a tuple of which the first element is the level to use to generate the values (the 'source level') and the second is a mapper of the form used by pd.Index.map which will be applied to the source level to update/create the level of interest.

The second is used when more than one level is used to update the 'target level'. In this case, each value is a tuple of which the first element is a tuple of the levels to use to generate the values (the 'source levels') and the second is a mapper of the form used by pd.Index.map which will be applied to the source levels to update/create the level of interest.

required
remove_unused_levels bool

Call ini.remove_unused_levels before updating the levels

This avoids trying to update based on levels that aren't being used.

True

Returns:

Type Description
MultiIndex

ini with updates applied

Raises:

Type Description
KeyError

A source level in update_sources is not a level in ini

Examples:

>>> start = pd.MultiIndex.from_tuples(
...     [
...         ("sa", "ma", "v1", "kg"),
...         ("sb", "ma", "v2", "m"),
...         ("sa", "mb", "v1", "kg"),
...         ("sa", "mb", "v2", "m"),
...     ],
...     names=["scenario", "model", "variable", "unit"],
... )
>>> start
MultiIndex([('sa', 'ma', 'v1', 'kg'),
            ('sb', 'ma', 'v2',  'm'),
            ('sa', 'mb', 'v1', 'kg'),
            ('sa', 'mb', 'v2',  'm')],
           names=['scenario', 'model', 'variable', 'unit'])
>>>
>>> # Create a new level based on an existing level
>>> update_levels_from_other(
...     start,
...     {
...         "unit squared": ("unit", lambda x: f"{x}**2"),
...         "class": ("model", {"ma": "delta", "mb": "gamma"}),
...     },
... )
MultiIndex([('sa', 'ma', 'v1', 'kg', 'kg**2', 'delta'),
            ('sb', 'ma', 'v2',  'm',  'm**2', 'delta'),
            ('sa', 'mb', 'v1', 'kg', 'kg**2', 'gamma'),
            ('sa', 'mb', 'v2',  'm',  'm**2', 'gamma')],
           names=['scenario', 'model', 'variable', 'unit', 'unit squared', 'class'])
>>>
>>> # Update an existing level based on another level
>>> update_levels_from_other(
...     start,
...     {
...         "unit": ("variable", {"v1": "g", "v2": "km"}),
...         "model": ("scenario", lambda x: f"model {x}"),
...     },
... )
MultiIndex([('sa', 'model sa', 'v1',  'g'),
            ('sb', 'model sb', 'v2', 'km'),
            ('sa', 'model sa', 'v1',  'g'),
            ('sa', 'model sa', 'v2', 'km')],
           names=['scenario', 'model', 'variable', 'unit'])
>>>
>>> # Create a new level based on multiple existing levels
>>> update_levels_from_other(
...     start,
...     {
...         "model || scenario": (("model", "scenario"), lambda x: " || ".join(x)),
...     },
... )
MultiIndex([('sa', 'ma', 'v1', 'kg', 'sa || ma'),
            ('sb', 'ma', 'v2',  'm', 'sb || ma'),
            ('sa', 'mb', 'v1', 'kg', 'sa || mb'),
            ('sa', 'mb', 'v2',  'm', 'sa || mb')],
           names=['scenario', 'model', 'variable', 'unit', 'model || scenario'])
>>>
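>>> # Create a new level and update an existing level at the same time
>>> # (the dict keys don't match any value in 'unit', so the mapped values are nan)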
>>> update_levels_from_other(
...     start,
...     {
...         "title": ("scenario", lambda x: x.capitalize()),
...         "unit": ("unit", {"v1": "g", "v2": "km"}),
...     },
... )
MultiIndex([('sa', 'ma', 'v1', nan, 'Sa'),
            ('sb', 'ma', 'v2', nan, 'Sb'),
            ('sa', 'mb', 'v1', nan, 'Sa'),
            ('sa', 'mb', 'v2', nan, 'Sa')],
           names=['scenario', 'model', 'variable', 'unit', 'title'])
>>>
>>> # Setting with a range of different methods
>>> update_levels_from_other(
...     start,
...     {
...         # callable
...         "y-label": (("variable", "unit"), lambda x: f"{x[0]} ({x[1]})"),
...         # dict
...         "title": ("scenario", {"sa": "Scenario A", "sb": "Delta"}),
...         # pd.Series
...         "Source": (
...             "model",
...             pd.Series(["Internal", "External"], index=["ma", "mb"]),
...         ),
...     },
... )
MultiIndex([('sa', 'ma', 'v1', 'kg', 'v1 (kg)', 'Scenario A', 'Internal'),
            ('sb', 'ma', 'v2',  'm',  'v2 (m)',      'Delta', 'Internal'),
            ('sa', 'mb', 'v1', 'kg', 'v1 (kg)', 'Scenario A', 'External'),
            ('sa', 'mb', 'v2',  'm',  'v2 (m)', 'Scenario A', 'External')],
           names=['scenario', 'model', 'variable', 'unit', 'y-label', 'title', 'Source'])
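
In the multi-level example above, the source levels are given as `("model", "scenario")` yet the output reads 'sa || ma', i.e. scenario first. This suggests that the mapper receives the values in the order the levels appear in the index rather than in the order of the source tuple. That is an inference from the printed output, not something verified against the implementation. The plain-pandas sketch below reproduces the same ordering without calling the library.

```python
import pandas as pd

start = pd.MultiIndex.from_tuples(
    [
        ("sa", "ma", "v1", "kg"),
        ("sb", "ma", "v2", "m"),
        ("sa", "mb", "v1", "kg"),
        ("sa", "mb", "v2", "m"),
    ],
    names=["scenario", "model", "variable", "unit"],
)

# Keep only the two source levels; droplevel preserves the index's own
# level order, so each tuple is (scenario, model)
pairs = start.droplevel(["variable", "unit"])

# A callable mapper applied to a MultiIndex receives one tuple per row
joined = pairs.map(lambda x: " || ".join(x))

print(list(pairs.names))  # ['scenario', 'model']
print(joined.to_list())
# ['sa || ma', 'sb || ma', 'sa || mb', 'sa || mb']
```

If the order of the elements matters to a callable mapper, it is safer to check it on a small index first rather than assume it follows the source tuple.
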

Source code in src/pandas_openscm/index_manipulation.py
def update_levels_from_other(
    ini: pd.MultiIndex,
    update_sources: dict[
        Any,
        tuple[
            Any,
            Callable[[Any], Any] | dict[Any, Any] | pd.Series[Any],
        ]
        | tuple[
            tuple[Any, ...],
            Callable[[tuple[Any, ...]], Any]
            | dict[tuple[Any, ...], Any]
            | pd.Series[Any],
        ],
    ],
    remove_unused_levels: bool = True,
) -> pd.MultiIndex:
    """
    Update levels based on other levels in a [pd.MultiIndex][pandas.MultiIndex]

    If the level to be updated doesn't exist,
    it is created.

    Parameters
    ----------
    ini
        Input index

    update_sources
        Updates to apply and their source levels

        Each key is the level to which the updates will be applied
        (or the level that will be created if it doesn't already exist).

        There are two options for the values.

        The first is used when only one level is used to update the 'target level'.
        In this case, each value is a tuple of which the first element
        is the level to use to generate the values (the 'source level')
        and the second is a mapper of the form used by
        [pd.Index.map][pandas.Index.map]
        which will be applied to the source level
        to update/create the level of interest.

        The second is used when more than one level
        is used to update the 'target level'.
        In this case, each value is a tuple of which the first element
        is a tuple of the levels to use to generate the values
        (the 'source levels')
        and the second is a mapper of the form used by
        [pd.Index.map][pandas.Index.map]
        which will be applied to the source levels
        to update/create the level of interest.

    remove_unused_levels
        Call `ini.remove_unused_levels` before updating the levels

        This avoids trying to update based on levels that aren't being used.

    Returns
    -------
    :
        `ini` with updates applied

    Raises
    ------
    KeyError
        A source level in `update_sources` is not a level in `ini`

    Examples
    --------
    >>> start = pd.MultiIndex.from_tuples(
    ...     [
    ...         ("sa", "ma", "v1", "kg"),
    ...         ("sb", "ma", "v2", "m"),
    ...         ("sa", "mb", "v1", "kg"),
    ...         ("sa", "mb", "v2", "m"),
    ...     ],
    ...     names=["scenario", "model", "variable", "unit"],
    ... )
    >>> start
    MultiIndex([('sa', 'ma', 'v1', 'kg'),
                ('sb', 'ma', 'v2',  'm'),
                ('sa', 'mb', 'v1', 'kg'),
                ('sa', 'mb', 'v2',  'm')],
               names=['scenario', 'model', 'variable', 'unit'])
    >>>
    >>> # Create a new level based on an existing level
    >>> update_levels_from_other(
    ...     start,
    ...     {
    ...         "unit squared": ("unit", lambda x: f"{x}**2"),
    ...         "class": ("model", {"ma": "delta", "mb": "gamma"}),
    ...     },
    ... )
    MultiIndex([('sa', 'ma', 'v1', 'kg', 'kg**2', 'delta'),
                ('sb', 'ma', 'v2',  'm',  'm**2', 'delta'),
                ('sa', 'mb', 'v1', 'kg', 'kg**2', 'gamma'),
                ('sa', 'mb', 'v2',  'm',  'm**2', 'gamma')],
               names=['scenario', 'model', 'variable', 'unit', 'unit squared', 'class'])
    >>>
    >>> # Update an existing level based on another level
    >>> update_levels_from_other(
    ...     start,
    ...     {
    ...         "unit": ("variable", {"v1": "g", "v2": "km"}),
    ...         "model": ("scenario", lambda x: f"model {x}"),
    ...     },
    ... )
    MultiIndex([('sa', 'model sa', 'v1',  'g'),
                ('sb', 'model sb', 'v2', 'km'),
                ('sa', 'model sa', 'v1',  'g'),
                ('sa', 'model sa', 'v2', 'km')],
               names=['scenario', 'model', 'variable', 'unit'])
    >>>
    >>> # Create a new level based on multiple existing levels
    >>> update_levels_from_other(
    ...     start,
    ...     {
    ...         "model || scenario": (("model", "scenario"), lambda x: " || ".join(x)),
    ...     },
    ... )
    MultiIndex([('sa', 'ma', 'v1', 'kg', 'sa || ma'),
                ('sb', 'ma', 'v2',  'm', 'sb || ma'),
                ('sa', 'mb', 'v1', 'kg', 'sa || mb'),
                ('sa', 'mb', 'v2',  'm', 'sa || mb')],
               names=['scenario', 'model', 'variable', 'unit', 'model || scenario'])
    >>>
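    >>> # Create a new level and update an existing level at the same time
    >>> # (the dict keys don't match any value in 'unit', so the mapped values are nan)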
    >>> update_levels_from_other(
    ...     start,
    ...     {
    ...         "title": ("scenario", lambda x: x.capitalize()),
    ...         "unit": ("unit", {"v1": "g", "v2": "km"}),
    ...     },
    ... )
    MultiIndex([('sa', 'ma', 'v1', nan, 'Sa'),
                ('sb', 'ma', 'v2', nan, 'Sb'),
                ('sa', 'mb', 'v1', nan, 'Sa'),
                ('sa', 'mb', 'v2', nan, 'Sa')],
               names=['scenario', 'model', 'variable', 'unit', 'title'])
    >>>
    >>> # Setting with a range of different methods
    >>> update_levels_from_other(
    ...     start,
    ...     {
    ...         # callable
    ...         "y-label": (("variable", "unit"), lambda x: f"{x[0]} ({x[1]})"),
    ...         # dict
    ...         "title": ("scenario", {"sa": "Scenario A", "sb": "Delta"}),
    ...         # pd.Series
    ...         "Source": (
    ...             "model",
    ...             pd.Series(["Internal", "External"], index=["ma", "mb"]),
    ...         ),
    ...     },
    ... )
    MultiIndex([('sa', 'ma', 'v1', 'kg', 'v1 (kg)', 'Scenario A', 'Internal'),
                ('sb', 'ma', 'v2',  'm',  'v2 (m)',      'Delta', 'Internal'),
                ('sa', 'mb', 'v1', 'kg', 'v1 (kg)', 'Scenario A', 'External'),
                ('sa', 'mb', 'v2',  'm',  'v2 (m)', 'Scenario A', 'External')],
               names=['scenario', 'model', 'variable', 'unit', 'y-label', 'title', 'Source'])
    """  # noqa: E501
    if remove_unused_levels:
        ini = ini.remove_unused_levels()

    levels: list[pd.Index[Any]] = list(ini.levels)
    codes: list[npt.NDArray[np.integer[Any]]] = list(ini.codes)
    names: list[str] = list(ini.names)  # type: ignore[arg-type] # ty: ignore[invalid-assignment] # pandas-stubs confused

    for level, (source, updater) in update_sources.items():
        if isinstance(source, tuple):
            missing_levels = set(source) - set(ini.names)
            if missing_levels:
                conj = "is" if len(missing_levels) == 1 else "are"
                msg = (
                    f"{sorted(missing_levels)} {conj} not available in the index. "
                    f"Available levels: {ini.names}"
                )
                raise KeyError(msg)

            new_level, new_codes = create_new_level_and_codes_by_mapping_multiple(
                ini=ini,
                levels_to_create_from=source,
                mapper=updater,
            )

        else:
            if source not in ini.names:
                msg = (
                    f"{source} is not available in the index. "
                    f"Available levels: {ini.names}"
                )
                raise KeyError(msg)

            new_level, new_codes = create_new_level_and_codes_by_mapping(
                ini=ini,
                level_to_create_from=source,
                mapper=updater,
            )

        if level in ini.names:
            level_idx = ini.names.index(level)
            levels[level_idx] = new_level
            codes[level_idx] = new_codes

        else:
            levels.append(new_level)
            codes.append(new_codes)
            names.append(level)

    res = pd.MultiIndex(levels=levels, codes=codes, names=names)  # type: ignore[arg-type] # ty: ignore[invalid-argument-type] # pandas-stubs confused

    return res