pandas_openscm.indexing
Helpers for working with pandas
Long-term, these should really live in either pandas_indexing or pandas, but they're OK here for now.
Functions:
| Name | Description |
|---|---|
| index_name_aware_lookup | Perform a look up with an index, being aware of the index's name. |
| index_name_aware_match | Perform a match with an index, being aware of the index's name. |
| mi_loc | Select data, being slightly smarter than the default pandas.DataFrame.loc. |
| multi_index_lookup | Perform a multi-index look up |
| multi_index_match | Perform a multi-index match |
index_name_aware_lookup
index_name_aware_lookup(
    pandas_obj: P, locator: Index[Any]
) -> P
Perform a look up with an index, being aware of the index's name.
For the problem this is solving, see index_name_aware_match.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pandas_obj | P | Pandas object in which to find matches | required |
| locator | Index[Any] | Locator to use for finding matches | required |
Returns:
| Type | Description |
|---|---|
| P | Rows of pandas_obj that are in locator |
Examples:
>>> import numpy as np
>>> import pandas as pd
>>>
>>> base = pd.DataFrame(
...     data=np.arange(8).reshape((4, 2)),
...     columns=[2000, 2020],
...     index=pd.MultiIndex.from_tuples(
...         (
...             ("ma", "sa", 1),
...             ("ma", "sb", 2),
...             ("mb", "sa", 4),
...             ("mb", "sb", 3),
...         ),
...         names=["model", "scenario", "id"],
...     ),
... )
>>>
>>> # A locator that lines up with the third level only
>>> loc = pd.Index([1, 3], name="id")
>>> index_name_aware_lookup(base, loc)
                   2000  2020
model scenario id
ma    sa       1      0     1
mb    sb       3      6     7
Source code in src/pandas_openscm/indexing.py
index_name_aware_match
index_name_aware_match(
    idx: MultiIndex, locator: Index[Any]
) -> NDArray[bool]
Perform a match with an index, being aware of the index's name.
This works even if the level being looked up is not the first level of the index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| idx | MultiIndex | Index in which to find matches | required |
| locator | Index[Any] | Locator to use for finding matches | required |
Returns:
| Type | Description |
|---|---|
| NDArray[bool] | Location of the rows in idx that are in locator |
Examples:
>>> import pandas as pd
>>>
>>> base = pd.MultiIndex.from_tuples(
...     (
...         ("ma", "sa", 1),
...         ("ma", "sb", 2),
...         ("mb", "sa", 1),
...         ("mb", "sb", 3),
...     ),
...     names=["model", "scenario", "id"],
... )
>>>
>>> # A locator that lines up with the third level only
>>> loc = pd.Index([1, 3], name="id")
>>> index_name_aware_match(base, loc)
array([ True, False,  True,  True])
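The name-aware behaviour described above can be approximated with plain pandas. The following is a minimal sketch, not the library's implementation: `name_aware_match_sketch` is a hypothetical name, and the sketch assumes the locator's name is the name of exactly one level in `idx`.

```python
import numpy as np
import pandas as pd


def name_aware_match_sketch(idx: pd.MultiIndex, locator: pd.Index) -> np.ndarray:
    """Hypothetical stand-in for index_name_aware_match (an assumption)."""
    # Use the locator's name to choose the level to compare against,
    # rather than assuming the locator refers to the first level.
    return idx.isin(locator.values, level=locator.name)


base = pd.MultiIndex.from_tuples(
    (("ma", "sa", 1), ("ma", "sb", 2), ("mb", "sa", 1), ("mb", "sb", 3)),
    names=["model", "scenario", "id"],
)
# Boolean mask with one entry per row of base
mask = name_aware_match_sketch(base, pd.Index([1, 3], name="id"))
```

`MultiIndex.isin` accepts a `level` argument, so the only extra step is passing the locator's name through as that level.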
mi_loc
mi_loc(
    pandas_obj: P,
    locator: Index[Any] | MultiIndex | Selector,
) -> P
Select data, being slightly smarter than the default pandas.DataFrame.loc.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pandas_obj | P | Pandas object on which to do the .loc operation | required |
| locator | Index[Any] \| MultiIndex \| Selector | Locator to apply. If this is a multi-index, we use multi_index_lookup to ensure correct alignment. If this is an index that has a name, we use the name to ensure correct alignment. | required |
Returns:
| Type | Description |
|---|---|
| P | Selected data |
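The dispatch described in the locator parameter can be sketched roughly as follows. This is an illustration under assumptions, not the library's code: `mi_loc_sketch` is a hypothetical name, the matching is done inline with plain pandas rather than by calling the library's helpers, and anything that is neither a multi-index nor a named index is assumed to be passed straight to `.loc`.

```python
import numpy as np
import pandas as pd


def mi_loc_sketch(pandas_obj, locator):
    """Hypothetical dispatch sketch for mi_loc (an assumption)."""
    index = pandas_obj.index
    # Check MultiIndex first: pd.MultiIndex is a subclass of pd.Index.
    if isinstance(locator, pd.MultiIndex):
        # Compare only the levels the locator names, in the locator's order.
        keys = zip(*(index.get_level_values(n) for n in locator.names))
        wanted = set(locator)  # iterating a MultiIndex yields tuples
        mask = np.array([key in wanted for key in keys], dtype=bool)
        return pandas_obj.loc[mask]
    if isinstance(locator, pd.Index) and locator.name is not None:
        # Named index: align on the level with the same name.
        return pandas_obj.loc[index.isin(locator.values, level=locator.name)]
    # Assumption: everything else goes straight to .loc
    return pandas_obj.loc[locator]


df = pd.DataFrame(
    np.arange(8).reshape((4, 2)),
    columns=[2000, 2020],
    index=pd.MultiIndex.from_tuples(
        (("ma", "sa", 1), ("ma", "sb", 2), ("mb", "sa", 4), ("mb", "sb", 3)),
        names=["model", "scenario", "id"],
    ),
)
by_name = mi_loc_sketch(df, pd.Index([1, 3], name="id"))
by_multi = mi_loc_sketch(
    df,
    pd.MultiIndex.from_tuples((("sa", 1), ("sb", 3)), names=["scenario", "id"]),
)
```

Both calls select the same two rows of `df`: the named index aligns on the `id` level only, while the multi-index aligns on `scenario` and `id` together.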
Notes
If you have pandas_indexing installed, you can get the same (perhaps even better) functionality using something like the following instead
Source code in src/pandas_openscm/indexing.py
multi_index_lookup
multi_index_lookup(
    pandas_obj: P, locator: MultiIndex
) -> P
Perform a multi-index look up
For the problem this is solving, see multi_index_match.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pandas_obj | P | Pandas object in which to find matches | required |
| locator | MultiIndex | Locator to use for finding matches | required |
Returns:
| Type | Description |
|---|---|
| P | Rows of pandas_obj that are in locator |
Examples:
>>> import numpy as np
>>> import pandas as pd
>>>
>>> base = pd.DataFrame(
...     data=np.arange(8).reshape((4, 2)),
...     columns=[2000, 2020],
...     index=pd.MultiIndex.from_tuples(
...         (
...             ("ma", "sa", 1),
...             ("ma", "sb", 2),
...             ("mb", "sa", 4),
...             ("mb", "sb", 3),
...         ),
...         names=["model", "scenario", "id"],
...     ),
... )
>>>
>>> # A locator that lines up with the second and third level only
>>> loc_first_level = pd.MultiIndex.from_tuples(
...     (
...         ("sa", 1),
...         ("sb", 3),
...     ),
...     names=["scenario", "id"],
... )
>>> multi_index_lookup(base, loc_first_level)
                   2000  2020
model scenario id
ma    sa       1      0     1
mb    sb       3      6     7
multi_index_match
multi_index_match(
    idx: MultiIndex, locator: MultiIndex
) -> NDArray[bool]
Perform a multi-index match
This works even if the levels of the locator are not the same as the levels of the index in which to match.
Arguably, this should be moved to pandas_indexing or pandas. Relevant issues:
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| idx | MultiIndex | Index in which to find matches | required |
| locator | MultiIndex | Locator to use for finding matches | required |
Returns:
| Type | Description |
|---|---|
| NDArray[bool] | Location of the rows in idx that are in locator |
Raises:
| Type | Description |
|---|---|
| KeyError | |
Examples:
>>> import pandas as pd
>>> base = pd.MultiIndex.from_tuples(
...     (
...         ("ma", "sa", 1),
...         ("ma", "sb", 2),
...         ("mb", "sa", 1),
...         ("mb", "sb", 3),
...     ),
...     names=["model", "scenario", "id"],
... )
>>>
>>> # A locator that lines up with the multi-index levels exactly
>>> loc_simple = pd.MultiIndex.from_tuples(
...     (
...         ("ma", "sa", 1),
...         ("mb", "sa", 1),
...     ),
...     names=["model", "scenario", "id"],
... )
>>> multi_index_match(base, loc_simple)
array([ True, False,  True, False])
>>>
>>> # A locator that lines up with the first level only
>>> loc_first_level = pd.MultiIndex.from_tuples(
...     (("ma",),),
...     names=["model"],
... )
>>> multi_index_match(base, loc_first_level)
array([ True,  True, False, False])
>>>
>>> # A locator that lines up with the second level only
>>> loc_first_level = pd.MultiIndex.from_tuples(
...     (("sa",),),
...     names=["scenario"],
... )
>>> multi_index_match(base, loc_first_level)
array([ True, False,  True, False])
>>>
>>> # A locator that lines up with the second and third level only
>>> loc_first_level = pd.MultiIndex.from_tuples(
...     (("sb", 3),),
...     names=["scenario", "id"],
... )
>>> multi_index_match(base, loc_first_level)
array([False, False, False,  True])
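To make the idea of matching on a subset of levels concrete, here is a deliberately simple pure-Python sketch. It is not the library's implementation and is not optimised: `multi_index_match_sketch` is a hypothetical name, and the approach (building tuples from only the levels the locator names, then testing set membership) is an assumption chosen for readability.

```python
import numpy as np
import pandas as pd


def multi_index_match_sketch(
    idx: pd.MultiIndex, locator: pd.MultiIndex
) -> np.ndarray:
    """Hypothetical stand-in for multi_index_match (an assumption)."""
    # Pull out only the levels named by the locator, in the locator's order.
    # get_level_values raises KeyError if a name is not a level of idx.
    level_values = [idx.get_level_values(name) for name in locator.names]
    keys = zip(*level_values)
    wanted = set(locator)  # iterating a MultiIndex yields tuples
    return np.array([key in wanted for key in keys], dtype=bool)


base = pd.MultiIndex.from_tuples(
    (("ma", "sa", 1), ("ma", "sb", 2), ("mb", "sa", 1), ("mb", "sb", 3)),
    names=["model", "scenario", "id"],
)
loc_scenario_id = pd.MultiIndex.from_tuples(
    (("sb", 3),), names=["scenario", "id"]
)
mask = multi_index_match_sketch(base, loc_scenario_id)
```

Because the keys are built from the locator's level names, the locator's levels can be a subset of the index's levels and can appear in a different order.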