pandas_openscm.accessors.dataframe

Accessor for pd.DataFrame

Classes:

| Name | Description |
| --- | --- |
| `PandasDataFrameOpenSCMAccessor` | pd.DataFrame accessor |

PandasDataFrameOpenSCMAccessor

pd.DataFrame accessor

For details, see pandas' docs.
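
As a rough illustration of the accessor mechanism that pandas provides, the sketch below registers a small custom accessor using only pandas. The accessor name `demo_scm`, the class `DemoAccessor` and its method `n_index_levels` are hypothetical and exist only for this example; they are not part of pandas_openscm.

```python
import pandas as pd


# Hypothetical accessor name, chosen so it cannot clash with a real one.
@pd.api.extensions.register_dataframe_accessor("demo_scm")
class DemoAccessor:
    """Minimal accessor: pandas passes the DataFrame to ``__init__``."""

    def __init__(self, df: pd.DataFrame):
        # Store the DataFrame so that methods can use it later
        self._df = df

    def n_index_levels(self) -> int:
        """Return the number of levels in the DataFrame's index."""
        return self._df.index.nlevels


df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    columns=[2010, 2020],
    index=pd.MultiIndex.from_tuples(
        [("sa", "K"), ("sb", "K")], names=["scenario", "unit"]
    ),
)

# The accessor is available as an attribute on every DataFrame
n_levels = df.demo_scm.n_index_levels()
```

The class documented here follows the same pattern: it stores the DataFrame in `__init__` and each method forwards to a plain function that takes the DataFrame as its first argument.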

Methods:

| Name | Description |
| --- | --- |
| `__init__` | Initialise |
| `convert_unit` | Convert units |
| `convert_unit_like` | Convert units to match another pd.DataFrame |
| `eiim` | Ensure that the index is a pd.MultiIndex |
| `ensure_index_is_multiindex` | Ensure that the index is a pd.MultiIndex |
| `fix_index_name_after_groupby_quantile` | Fix the index name after performing a groupby(...).quantile(...) operation |
| `groupby_except` | Group by all index levels except specified levels |
| `mi_loc` | Select data, being slightly smarter than the default pandas.DataFrame.loc. |
| `plot_plume` | Plot a plume plot |
| `plot_plume_after_calculating_quantiles` | Plot a plume plot, calculating the required quantiles first |
| `set_index_levels` | Set the index levels |
| `to_category_index` | Convert the index's values to categories |
| `to_long_data` | Convert to long data |
| `update_index_levels` | Update the index levels |
| `update_index_levels_from_other` | Update the index levels based on other index levels |
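
To make the idea behind `groupby_except` and `fix_index_name_after_groupby_quantile` more concrete, here is a minimal sketch in plain pandas. The helper `groupby_except_sketch` is hypothetical and only approximates the behaviour described above; the library's actual implementation may differ.

```python
import pandas as pd


def groupby_except_sketch(df, non_groupers, observed=True):
    """Group by every index level except ``non_groupers`` (illustrative only)."""
    if isinstance(non_groupers, str):
        non_groupers = [non_groupers]
    # Keep every index level that was not explicitly excluded
    levels = [name for name in df.index.names if name not in non_groupers]
    return df.groupby(levels, observed=observed)


df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
    columns=[2010, 2020],
    index=pd.MultiIndex.from_tuples(
        [("sa", "v1", 0), ("sa", "v1", 1), ("sb", "v1", 0), ("sb", "v1", 1)],
        names=["scenario", "variable", "run"],
    ),
)

# Quantiles over "run": group by everything else, then take quantiles
quantiles = groupby_except_sketch(df, "run").quantile([0.05, 0.5, 0.95])

# pandas leaves the new quantile level unnamed, so give it a name.
# This assumes that the quantile level is the only unnamed level.
quantiles.index = quantiles.index.rename(
    [name if name is not None else "quantile" for name in quantiles.index.names]
)
```

With the accessor registered, the equivalent steps would presumably be chained as `df.openscm.groupby_except("run").quantile(...)` followed by `.openscm.fix_index_name_after_groupby_quantile()`.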

Source code in src/pandas_openscm/accessors/dataframe.py
class PandasDataFrameOpenSCMAccessor:
    """
    [pd.DataFrame][pandas.DataFrame] accessor

    For details, see
    [pandas' docs](https://pandas.pydata.org/docs/development/extending.html#registering-custom-accessors).
    """

    def __init__(self, df: pd.DataFrame):
        """
        Initialise

        Parameters
        ----------
        df
            [pd.DataFrame][pandas.DataFrame] to use via the accessor
        """
        # It is possible to validate here.
        # However, it's probably better to do validation closer to the data use.
        self._df = df

    def convert_unit(
        self,
        desired_units: str | Mapping[str, str] | pd.Series[str],
        unit_level: str = "unit",
        ur: pint.UnitRegistry | None = None,
    ) -> pd.DataFrame:
        """
        Convert units

        This uses [convert_unit_from_target_series][pandas_openscm.unit_conversion.].
        If you want to understand the details of how the conversion works,
        see that function's docstring.

        Parameters
        ----------
        desired_units
            Desired unit(s) for `df`

            If this is a string,
            we attempt to convert all timeseries to the given unit.

            If this is a mapping,
            we convert the given units to the target units.
            Be careful using this form - you need to be certain of the units.
            If any of your keys don't match the existing units
            (even by a single whitespace character)
            then the unit conversion will not happen.

            If this is a [pd.Series][pandas.Series],
            then it will be passed to
            [convert_unit_from_target_series][pandas_openscm.unit_conversion.]
            after filling any rows in the [pd.DataFrame][pandas.DataFrame]
            that are not in `desired_units`
            with the existing unit (i.e. unspecified rows are not converted).

            For further details, see the examples
            in [convert_unit][pandas_openscm.unit_conversion.].

        unit_level
            Level in the index which holds unit information

            Passed to
            [convert_unit_from_target_series][pandas_openscm.unit_conversion.].

        ur
            Unit registry to use for the conversion.

            Passed to
            [convert_unit_from_target_series][pandas_openscm.unit_conversion.].

        Returns
        -------
        :
            Data with converted units
        """
        return convert_unit(
            self._df, desired_units=desired_units, unit_level=unit_level, ur=ur
        )

    def convert_unit_like(
        self,
        target: pd.DataFrame | pd.Series[Any],
        unit_level: str = "unit",
        target_unit_level: str | None = None,
        ur: pint.UnitRegistry | None = None,
    ) -> pd.DataFrame:
        """
        Convert units to match another [pd.DataFrame][pandas.DataFrame]

        For further details, see the examples
        in [convert_unit_like][pandas_openscm.unit_conversion.].

        This is essentially a helper for
        [convert_unit_from_target_series][pandas_openscm.unit_conversion.].
        It implements one set of logic for extracting desired units
        and tries to be clever, handling differences in index levels
        between the data and `target` sensibly wherever possible.

        If you want behaviour other than what is implemented here,
        use [convert_unit_from_target_series][pandas_openscm.unit_conversion.] directly.

        Parameters
        ----------
        target
            Supported [pandas][] object whose units should be matched

        unit_level
            Level in the data's index which holds unit information

        target_unit_level
            Level in `target`'s index which holds unit information

            If not supplied, we use `unit_level`.

        ur
            Unit registry to use for the conversion.

            Passed to
            [convert_unit_from_target_series][pandas_openscm.unit_conversion.].

        Returns
        -------
        :
            Data with converted units
        """
        return convert_unit_like(
            self._df,
            target=target,
            unit_level=unit_level,
            target_unit_level=target_unit_level,
            ur=ur,
        )

    def ensure_index_is_multiindex(self, copy: bool = True) -> pd.DataFrame:
        """
        Ensure that the index is a [pd.MultiIndex][pandas.MultiIndex]

        Parameters
        ----------
        copy
            Whether to copy `df` before manipulating the index

        Returns
        -------
        :
            `df` with a [pd.MultiIndex][pandas.MultiIndex]

            If the index was already a [pd.MultiIndex][pandas.MultiIndex],
            this is a no-op (although the value of copy is respected).
        """
        return ensure_index_is_multiindex(self._df, copy=copy)

    def eiim(self, copy: bool = True) -> pd.DataFrame:
        """
        Ensure that the index is a [pd.MultiIndex][pandas.MultiIndex]

        Alias for [ensure_index_is_multiindex][pandas_openscm.index_manipulation.]

        Parameters
        ----------
        copy
            Whether to copy `df` before manipulating the index

        Returns
        -------
        :
            `df` with a [pd.MultiIndex][pandas.MultiIndex]

            If the index was already a [pd.MultiIndex][pandas.MultiIndex],
            this is a no-op (although the value of copy is respected).
        """
        return self.ensure_index_is_multiindex(copy=copy)

    def fix_index_name_after_groupby_quantile(
        self, new_name: str = "quantile", copy: bool = False
    ) -> pd.DataFrame:
        """
        Fix the index name after performing a `groupby(...).quantile(...)` operation

        By default, pandas doesn't assign a name to the quantile level
        when doing an operation of the form given above.
        This fixes this, but it does assume
        that the quantile level is the only unnamed level in the index.

        Parameters
        ----------
        new_name
            New name to give to the quantile level

        copy
            Whether to copy `df` before manipulating the index name

        Returns
        -------
        :
            `df`, with the last level in its index renamed to `new_name`.
        """
        return fix_index_name_after_groupby_quantile(
            self._df, new_name=new_name, copy=copy
        )

    def groupby_except(
        self, non_groupers: str | list[str], observed: bool = True
    ) -> pandas.core.groupby.generic.DataFrameGroupBy[Any, Any]:
        """
        Group by all index levels except specified levels

        This is the inverse of [pd.DataFrame.groupby][pandas.DataFrame.groupby].

        Parameters
        ----------
        non_groupers
            Index level(s) to exclude from the grouping

        observed
            Whether to only return observed combinations or not

        Returns
        -------
        :
            The [pd.DataFrame][pandas.DataFrame],
            grouped by all index levels except `non_groupers`.
        """
        return groupby_except(self._df, non_groupers=non_groupers, observed=observed)

    def mi_loc(
        self,
        locator: pd.Index[Any] | pd.MultiIndex | pix.selectors.Selector,
    ) -> pd.DataFrame:
        """
        Select data, being slightly smarter than the default [pandas.DataFrame.loc][].

        Parameters
        ----------
        locator
            Locator to apply

            If this is a multi-index, we use
            [multi_index_lookup][pandas_openscm.indexing.] to ensure correct alignment.

            If this is an index that has a name,
            we use the name to ensure correct alignment.

        Returns
        -------
        :
            Selected data

        Notes
        -----
        If you have [pandas_indexing][] installed,
        you can get the same (perhaps even better) functionality
        using something like the following instead

        ```python
        ...
        pandas_obj.loc[pandas_indexing.isin(locator)]
        ...
        ```
        """
        return mi_loc(self._df, locator)

    def plot_plume(  # noqa: PLR0913
        self,
        quantiles_plumes: QUANTILES_PLUMES_LIKE,
        ax: matplotlib.axes.Axes | None = None,
        *,
        quantile_var: str = "quantile",
        quantile_var_label: str | None = None,
        quantile_legend_round: int = 3,
        hue_var: str = "scenario",
        hue_var_label: str | None = None,
        palette: PALETTE_LIKE[Any] | None = None,
        warn_on_palette_value_missing: bool = True,
        style_var: str = "variable",
        style_var_label: str | None = None,
        dashes: dict[Any, str | tuple[float, tuple[float, ...]]] | None = None,
        warn_on_dashes_value_missing: bool = True,
        linewidth: float = 2.0,
        unit_var: str = "unit",
        unit_aware: bool | pint.UnitRegistry = False,
        time_units: str | None = None,
        x_label: str | None = "time",
        y_label: str | bool | None = True,
        warn_infer_y_label_with_multi_unit: bool = True,
        create_legend: Callable[
            [matplotlib.axes.Axes, list[matplotlib.artist.Artist]], None
        ] = create_legend_default,
        observed: bool = True,
    ) -> matplotlib.axes.Axes:
        """
        Plot a plume plot

        Parameters
        ----------
        quantiles_plumes
            Quantiles to plot in each plume.

            If the first element of each tuple is a tuple,
            a plume is plotted between the given quantiles.
            Otherwise, if the first element is a plain float,
            a line is plotted for the given quantile.

        ax
            Axes on which to plot.

            If not supplied, a new axes is created.

        quantile_var
            Variable/column in the multi-index which stores information
            about the quantile that each timeseries represents.

        quantile_var_label
            Label to use as the header for the quantile section in the legend

        quantile_legend_round
            Rounding to apply to quantile values when creating the legend

        hue_var
            Variable to use for grouping data into different colour groups

        hue_var_label
            Label to use as the header for the hue/colour section in the legend

        palette
            Colour to use for the different groups in the data.

            If any groups are not included in `palette`,
            they are auto-filled.

        warn_on_palette_value_missing
            Should a warning be emitted if there are values missing from `palette`?

        style_var
            Variable to use for grouping data into different (line)style groups

        style_var_label
            Label to use as the header for the style section in the legend

        dashes
            Dash/linestyle to use for the different groups in the data.

            If any groups are not included in `dashes`,
            they are auto-filled.

        warn_on_dashes_value_missing
            Should a warning be emitted if there are values missing from `dashes`?

        linewidth
            Width to use for plotting lines.

        unit_var
            Variable/column in the multi-index which stores information
            about the unit of each timeseries.

        unit_aware
            Should the plot be done in a unit-aware way?

            If `True`, we use the default application registry
            (retrieved with [pint.get_application_registry][]).
            Otherwise, a [pint.UnitRegistry][] can be supplied and will be used.

            For details, see pint's documentation on plotting with matplotlib
            ([stable docs](https://pint.readthedocs.io/en/stable/user/plotting.html),
            [last version that we checked at the time of writing](https://pint.readthedocs.io/en/0.24.4/user/plotting.html)).

        time_units
            Units of the time axis of the data.

            These are required if `unit_aware` is not `False`.

        x_label
            Label to apply to the x-axis.

            If `None`, no label will be applied.

        y_label
            Label to apply to the y-axis.

            If `True`, we will try to infer the y-label based on the data's units.

            If `None`, no label will be applied.

        warn_infer_y_label_with_multi_unit
            Should a warning be raised if we try to infer the y-unit
            but the data has more than one unit?

        create_legend
            Function to use to create the legend.

            This allows the user to have full control over the creation of the legend.

        observed
            Passed to [pd.DataFrame.groupby][pandas.DataFrame.groupby].

        Returns
        -------
        :
            Axes on which the data was plotted
        """
        return plot_plume_func(
            self._df,
            ax=ax,
            quantiles_plumes=quantiles_plumes,
            quantile_var=quantile_var,
            quantile_var_label=quantile_var_label,
            quantile_legend_round=quantile_legend_round,
            hue_var=hue_var,
            hue_var_label=hue_var_label,
            palette=palette,
            warn_on_palette_value_missing=warn_on_palette_value_missing,
            style_var=style_var,
            style_var_label=style_var_label,
            dashes=dashes,
            warn_on_dashes_value_missing=warn_on_dashes_value_missing,
            linewidth=linewidth,
            unit_var=unit_var,
            unit_aware=unit_aware,
            time_units=time_units,
            x_label=x_label,
            y_label=y_label,
            warn_infer_y_label_with_multi_unit=warn_infer_y_label_with_multi_unit,
            create_legend=create_legend,
            observed=observed,
        )

    def plot_plume_after_calculating_quantiles(  # noqa: PLR0913
        self,
        ax: matplotlib.axes.Axes | None = None,
        *,
        quantile_over: str | list[str],
        quantiles_plumes: QUANTILES_PLUMES_LIKE = (
            (0.5, 0.7),
            ((0.05, 0.95), 0.2),
        ),
        quantile_var_label: str | None = None,
        quantile_legend_round: int = 2,
        hue_var: str = "scenario",
        hue_var_label: str | None = None,
        palette: PALETTE_LIKE[Any] | None = None,
        warn_on_palette_value_missing: bool = True,
        style_var: str = "variable",
        style_var_label: str | None = None,
        dashes: dict[Any, str | tuple[float, tuple[float, ...]]] | None = None,
        warn_on_dashes_value_missing: bool = True,
        linewidth: float = 3.0,
        unit_var: str = "unit",
        unit_aware: bool | pint.UnitRegistry = False,
        time_units: str | None = None,
        x_label: str | None = "time",
        y_label: str | bool | None = True,
        warn_infer_y_label_with_multi_unit: bool = True,
        create_legend: Callable[
            [matplotlib.axes.Axes, list[matplotlib.artist.Artist]], None
        ] = create_legend_default,
        observed: bool = True,
    ) -> matplotlib.axes.Axes:
        """
        Plot a plume plot, calculating the required quantiles first

        Parameters
        ----------
        ax
            Axes on which to plot.

            If not supplied, a new axes is created.

        quantile_over
            Variable(s)/column(s) over which to calculate the quantiles.

            The data is grouped by all columns except `quantile_over`
            when calculating the quantiles.

        quantiles_plumes
            Quantiles to plot in each plume.

            If the first element of each tuple is a tuple,
            a plume is plotted between the given quantiles.
            Otherwise, if the first element is a plain float,
            a line is plotted for the given quantile.

        quantile_var_label
            Label to use as the header for the quantile section in the legend

        quantile_legend_round
            Rounding to apply to quantile values when creating the legend

        hue_var
            Variable to use for grouping data into different colour groups

        hue_var_label
            Label to use as the header for the hue/colour section in the legend

        palette
            Colour to use for the different groups in the data.

            If any groups are not included in `palette`,
            they are auto-filled.

        warn_on_palette_value_missing
            Should a warning be emitted if there are values missing from `palette`?

        style_var
            Variable to use for grouping data into different (line)style groups

        style_var_label
            Label to use as the header for the style section in the legend

        dashes
            Dash/linestyle to use for the different groups in the data.

            If any groups are not included in `dashes`,
            they are auto-filled.

        warn_on_dashes_value_missing
            Should a warning be emitted if there are values missing from `dashes`?

        linewidth
            Width to use for plotting lines.

        unit_var
            Variable/column in the multi-index which stores information
            about the unit of each timeseries.

        unit_aware
            Should the plot be done in a unit-aware way?

            If `True`, we use the default application registry
            (retrieved with [pint.get_application_registry][]).
            Otherwise, a [pint.UnitRegistry][] can be supplied and will be used.

            For details, see pint's documentation on plotting with matplotlib
            ([stable docs](https://pint.readthedocs.io/en/stable/user/plotting.html),
            [last version that we checked at the time of writing](https://pint.readthedocs.io/en/0.24.4/user/plotting.html)).

        time_units
            Units of the time axis.

            These are required if `unit_aware` is not `False`.

        x_label
            Label to apply to the x-axis.

            If `None`, no label will be applied.

        y_label
            Label to apply to the y-axis.

            If `True`, we will try to infer the y-label based on the data's units.

            If `None`, no label will be applied.

        warn_infer_y_label_with_multi_unit
            Should a warning be raised if we try to infer the y-unit
            but the data has more than one unit?

        create_legend
            Function to use to create the legend.

            This allows the user to have full control over the creation of the legend.

        observed
            Passed to [pd.DataFrame.groupby][pandas.DataFrame.groupby].

        Returns
        -------
        :
            Axes on which the data was plotted
        """
        return plot_plume_after_calculating_quantiles_func(
            self._df,
            ax=ax,
            quantile_over=quantile_over,
            quantiles_plumes=quantiles_plumes,
            quantile_var_label=quantile_var_label,
            quantile_legend_round=quantile_legend_round,
            hue_var=hue_var,
            hue_var_label=hue_var_label,
            palette=palette,
            warn_on_palette_value_missing=warn_on_palette_value_missing,
            style_var=style_var,
            style_var_label=style_var_label,
            dashes=dashes,
            warn_on_dashes_value_missing=warn_on_dashes_value_missing,
            linewidth=linewidth,
            unit_var=unit_var,
            unit_aware=unit_aware,
            time_units=time_units,
            x_label=x_label,
            y_label=y_label,
            warn_infer_y_label_with_multi_unit=warn_infer_y_label_with_multi_unit,
            create_legend=create_legend,
            observed=observed,
        )

    def set_index_levels(
        self,
        levels_to_set: dict[str, Any | Collection[Any]],
        copy: bool = True,
    ) -> pd.DataFrame:
        """
        Set the index levels

        Parameters
        ----------
        levels_to_set
            Mapping of level names to values to set

        copy
            Should the [pd.DataFrame][pandas.DataFrame] be copied before returning?

        Returns
        -------
        :
            [pd.DataFrame][pandas.DataFrame] with updates applied to its index
        """
        return set_index_levels_func(
            self._df,
            levels_to_set=levels_to_set,
            copy=copy,
        )

    def to_category_index(self) -> pd.DataFrame:
        """
        Convert the index's values to categories

        This can save a lot of memory and improve the speed of processing.
        However, it comes with some pitfalls.
        For a nice discussion of some of them,
        see [this article](https://towardsdatascience.com/staying-sane-while-adopting-pandas-categorical-datatypes-78dbd19dcd8a/).

        Returns
        -------
        :
            [pd.DataFrame][pandas.DataFrame] with all index levels
            converted to category type.
        """
        return convert_index_to_category_index(self._df)

    def to_long_data(self, time_col_name: str = "time") -> pd.DataFrame:
        """
        Convert to long data

        Here, long data means that each row contains a single value,
        alongside metadata associated with that value
        (for more details, see e.g.
        https://data.europa.eu/apps/data-visualisation-guide/wide-versus-long-data).

        Parameters
        ----------
        time_col_name
            Name of the time column in the output

        Returns
        -------
        :
            DataFrame in long-form

        Examples
        --------
        >>> import numpy as np
        >>>
        >>> from pandas_openscm.accessors import register_pandas_accessors
        >>>
        >>> # pandas<3 has different representations,
        >>> # so skip if we have that version.
        >>> import pytest
        >>> pd = pytest.importorskip("pandas", minversion="3.0")
        >>>
        >>> register_pandas_accessors()
        >>>
        >>> df = pd.DataFrame(
        ...     [
        ...         [1.1, 0.8, 1.2],
        ...         [2.1, np.nan, 8.4],
        ...         [2.3, 3.2, 3.0],
        ...         [1.2, 2.8, np.nan],
        ...     ],
        ...     columns=[2010.0, 2015.0, 2025.0],
        ...     index=pd.MultiIndex.from_tuples(
        ...         [
        ...             ("sa", np.nan, "K"),
        ...             ("sb", "v1", None),
        ...             ("sa", "v2", "W"),
        ...             ("sb", "v2", "W"),
        ...         ],
        ...         names=["scenario", "variable", "unit"],
        ...     ),
        ... )
        >>>
        >>> # Start with wide data
        >>> df
                                2010.0  2015.0  2025.0
        scenario variable unit
        sa       nan      K        1.1     0.8     1.2
        sb       v1       nan      2.1     NaN     8.4
        sa       v2       W        2.3     3.2     3.0
        sb       v2       W        1.2     2.8     NaN
        >>>
        >>> # Convert to long data
        >>> df.openscm.to_long_data()
           scenario variable unit    time  value
        0        sa      NaN    K  2010.0    1.1
        1        sb       v1  NaN  2010.0    2.1
        2        sa       v2    W  2010.0    2.3
        3        sb       v2    W  2010.0    1.2
        4        sa      NaN    K  2015.0    0.8
        5        sb       v1  NaN  2015.0    NaN
        6        sa       v2    W  2015.0    3.2
        7        sb       v2    W  2015.0    2.8
        8        sa      NaN    K  2025.0    1.2
        9        sb       v1  NaN  2025.0    8.4
        10       sa       v2    W  2025.0    3.0
        11       sb       v2    W  2025.0    NaN
        >>>
        >>> # Specify a different time column name
        >>> df.openscm.to_long_data(time_col_name="year")
           scenario variable unit    year  value
        0        sa      NaN    K  2010.0    1.1
        1        sb       v1  NaN  2010.0    2.1
        2        sa       v2    W  2010.0    2.3
        3        sb       v2    W  2010.0    1.2
        4        sa      NaN    K  2015.0    0.8
        5        sb       v1  NaN  2015.0    NaN
        6        sa       v2    W  2015.0    3.2
        7        sb       v2    W  2015.0    2.8
        8        sa      NaN    K  2025.0    1.2
        9        sb       v1  NaN  2025.0    8.4
        10       sa       v2    W  2025.0    3.0
        11       sb       v2    W  2025.0    NaN
        >>>
        >>> # The result is just a pandas DataFrame,
        >>> # so you can do whatever operations you want
        >>> # on the result.
        >>> # A common one is probably dropping all rows with NaN
        >>> df.openscm.to_long_data(time_col_name="year").dropna()
           scenario variable unit    year  value
        2        sa       v2    W  2010.0    2.3
        3        sb       v2    W  2010.0    1.2
        6        sa       v2    W  2015.0    3.2
        7        sb       v2    W  2015.0    2.8
        10       sa       v2    W  2025.0    3.0
        >>>
        >>> # or dropping only rows with NaN in particular columns
        >>> df.openscm.to_long_data(time_col_name="year").dropna(subset=["variable"])
           scenario variable unit    year  value
        1        sb       v1  NaN  2010.0    2.1
        2        sa       v2    W  2010.0    2.3
        3        sb       v2    W  2010.0    1.2
        5        sb       v1  NaN  2015.0    NaN
        6        sa       v2    W  2015.0    3.2
        7        sb       v2    W  2015.0    2.8
        9        sb       v1  NaN  2025.0    8.4
        10       sa       v2    W  2025.0    3.0
        11       sb       v2    W  2025.0    NaN
        """
        return ts_to_long_data(self._df, time_col_name=time_col_name)

    def update_index_levels(
        self,
        updates: dict[Any, Callable[[Any], Any]],
        copy: bool = True,
        remove_unused_levels: bool = True,
    ) -> pd.DataFrame:
        """
        Update the index levels

        Parameters
        ----------
        updates
            Updates to apply to the index levels

            Each key is the index level to which the updates will be applied.
            Each value is a function which updates the levels to their new values.

        copy
            Should the [pd.DataFrame][pandas.DataFrame] be copied before returning?

        remove_unused_levels
            Remove unused levels before applying the update

            Specifically, call
            [pd.MultiIndex.remove_unused_levels][pandas.MultiIndex.remove_unused_levels].

            This avoids trying to update levels that aren't being used.

        Returns
        -------
        :
            [pd.DataFrame][pandas.DataFrame] with updates applied to its index
        """
        return update_index_levels_func(
            self._df,
            updates=updates,
            copy=copy,
            remove_unused_levels=remove_unused_levels,
        )

    def update_index_levels_from_other(
        self,
        update_sources: dict[
            Any,
            tuple[
                Any,
                Callable[[Any], Any] | dict[Any, Any] | pd.Series[Any],
            ]
            | tuple[
                tuple[Any, ...],
                Callable[[tuple[Any, ...]], Any]
                | dict[tuple[Any, ...], Any]
                | pd.Series[Any],
            ],
        ],
        copy: bool = True,
        remove_unused_levels: bool = True,
    ) -> pd.DataFrame:
        """
        Update the index levels based on other index levels

        Parameters
        ----------
        update_sources
            Updates to apply to the data's index

            Each key is the level to which the updates will be applied
            (or the level that will be created if it doesn't already exist).

            There are two options for the values.

            The first is used when only one level is used to update the 'target level'.
            In this case, each value is a tuple of which the first element
            is the level to use to generate the values (the 'source level')
            and the second is a mapper of the form used by
            [pd.Index.map][pandas.Index.map]
            which will be applied to the source level
            to update/create the level of interest.

            The second is used when more than one level is used
            to update the 'target level'.
            In this case, each value is a tuple of which the first element
            is a tuple of the levels to use to generate the values
            (the 'source levels')
            and the second is a mapper of the form used by
            [pd.Index.map][pandas.Index.map]
            which will be applied to the source levels
            to update/create the level of interest.

        copy
            Should the [pd.DataFrame][pandas.DataFrame] be copied before returning?

        remove_unused_levels
            Remove unused levels before applying the update

            Specifically, call
            [pd.MultiIndex.remove_unused_levels][pandas.MultiIndex.remove_unused_levels].

            This avoids trying to update levels that aren't being used.

        Returns
        -------
        :
            [pd.DataFrame][pandas.DataFrame] with updates applied to its index
        """
        return update_index_levels_from_other_func(
            self._df,
            update_sources=update_sources,
            copy=copy,
            remove_unused_levels=remove_unused_levels,
        )
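
As a rough illustration of what `update_sources` describes, the sketch below derives a new level from an existing one using only plain pandas. It is not the implementation behind `update_index_levels_from_other`; the data, the `model` level and the `scenario_to_model` mapping are invented for this example.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("sa", "v1", "W"), ("sb", "v2", "W")],
    names=["scenario", "variable", "unit"],
)
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=index, columns=[2010, 2020])

# Hypothetical mapper from the source level's values to the target level's values.
# pd.Index.map also accepts a callable or a pd.Series here.
scenario_to_model = {"sa": "model-a", "sb": "model-b"}

# Apply the mapper to the source level ...
model_values = df.index.get_level_values("scenario").map(scenario_to_model)

# ... and rebuild the index with the target level added.
index_frame = df.index.to_frame(index=False)
index_frame["model"] = list(model_values)
res = df.copy()
res.index = pd.MultiIndex.from_frame(index_frame)

print(res.index.names)
```

Working on a copy leaves the original `df` untouched, which mirrors the intent of `copy=True`.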

__init__ #

__init__(df: DataFrame)

Initialise

Parameters:

Name Type Description Default
df DataFrame

pd.DataFrame to use via the accessor

required
Source code in src/pandas_openscm/accessors/dataframe.py
def __init__(self, df: pd.DataFrame):
    """
    Initialise

    Parameters
    ----------
    df
        [pd.DataFrame][pandas.DataFrame] to use via the accessor
    """
    # It is possible to validate here.
    # However, it's probably better to do validation closer to the data use.
    self._df = df

convert_unit #

convert_unit(
    desired_units: str | Mapping[str, str] | Series[str],
    unit_level: str = "unit",
    ur: UnitRegistry | None = None,
) -> DataFrame

Convert units

This uses convert_unit_from_target_series. If you want to understand the details of how the conversion works, see that function's docstring.

Parameters:

Name Type Description Default
desired_units str | Mapping[str, str] | Series[str]

Desired unit(s) for df

If this is a string, we attempt to convert all timeseries to the given unit.

If this is a mapping, we convert the given units to the target units. Be careful using this form - you need to be certain of the units. If any of your keys don't match the existing units (even by a single whitespace character) then the unit conversion will not happen.

If this is a pd.Series, then it will be passed to convert_unit_from_target_series after filling any rows in the pd.DataFrame that are not in desired_units with the existing unit (i.e. unspecified rows are not converted).

For further details, see the examples in convert_unit.

required
unit_level str

Level in the index which holds unit information

Passed to convert_unit_from_target_series.

'unit'
ur UnitRegistry | None

Unit registry to use for the conversion.

Passed to convert_unit_from_target_series.

None

Returns:

Type Description
DataFrame

Data with converted units
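
Because the mapping form only acts on units whose keys match exactly, it can help to check for unmatched units before converting. The check below is a plain-pandas sketch and is not part of pandas_openscm; the data and the `desired_units` mapping are made up for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("v1", "Mt CO2/yr"), ("v2", "Mt CO2 / yr")],
    names=["variable", "unit"],
)
df = pd.DataFrame([[1.0], [2.0]], index=index, columns=[2020])

# Hypothetical mapping of existing unit -> desired unit
desired_units = {"Mt CO2/yr": "Gt CO2/yr"}

existing_units = df.index.get_level_values("unit")
# isin compares exact strings, so "Mt CO2 / yr" does not match "Mt CO2/yr"
has_target = existing_units.isin(list(desired_units))
unmatched = sorted(set(existing_units[~has_target]))

print(unmatched)
```

Any unit listed in `unmatched` has no key in the mapping, so a mapping-based conversion would leave those rows as they are.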

Source code in src/pandas_openscm/accessors/dataframe.py
def convert_unit(
    self,
    desired_units: str | Mapping[str, str] | pd.Series[str],
    unit_level: str = "unit",
    ur: pint.UnitRegistry | None = None,
) -> pd.DataFrame:
    """
    Convert units

    This uses [convert_unit_from_target_series][pandas_openscm.unit_conversion.].
    If you want to understand the details of how the conversion works,
    see that function's docstring.

    Parameters
    ----------
    desired_units
        Desired unit(s) for `df`

        If this is a string,
        we attempt to convert all timeseries to the given unit.

        If this is a mapping,
        we convert the given units to the target units.
        Be careful using this form - you need to be certain of the units.
        If any of your keys don't match the existing units
        (even by a single whitespace character)
        then the unit conversion will not happen.

        If this is a [pd.Series][pandas.Series],
        then it will be passed to
        [convert_unit_from_target_series][pandas_openscm.unit_conversion.]
        after filling any rows in the [pd.DataFrame][pandas.DataFrame]
        that are not in `desired_units`
        with the existing unit (i.e. unspecified rows are not converted).

        For further details, see the examples
        in [convert_unit][pandas_openscm.unit_conversion.].

    unit_level
        Level in the index which holds unit information

        Passed to
        [convert_unit_from_target_series][pandas_openscm.unit_conversion.].

    ur
        Unit registry to use for the conversion.

        Passed to
        [convert_unit_from_target_series][pandas_openscm.unit_conversion.].

    Returns
    -------
    :
        Data with converted units
    """
    return convert_unit(
        self._df, desired_units=desired_units, unit_level=unit_level, ur=ur
    )

convert_unit_like #

convert_unit_like(
    target: DataFrame | Series[Any],
    unit_level: str = "unit",
    target_unit_level: str | None = None,
    ur: UnitRegistry | None = None,
) -> DataFrame

Convert units to match another pd.DataFrame

For further details, see the examples in convert_unit_like.

This is essentially a helper for convert_unit_from_target_series. It implements one set of logic for extracting desired units and tries to be clever, handling differences in index levels between the data and target sensibly wherever possible.

If you want behaviour other than what is implemented here, use convert_unit_from_target_series directly.

Parameters:

Name Type Description Default
target DataFrame | Series[Any]

Supported pandas object whose units should be matched

required
unit_level str

Level in the data's index which holds unit information

'unit'
target_unit_level str | None

Level in target's index which holds unit information

If not supplied, we use unit_level.

None
ur UnitRegistry | None

Unit registry to use for the conversion.

Passed to convert_unit_from_target_series.

None

Returns:

Type Description
DataFrame

Data with converted units

Source code in src/pandas_openscm/accessors/dataframe.py
def convert_unit_like(
    self,
    target: pd.DataFrame | pd.Series[Any],
    unit_level: str = "unit",
    target_unit_level: str | None = None,
    ur: pint.UnitRegistry | None = None,
) -> pd.DataFrame:
    """
    Convert units to match another [pd.DataFrame][pandas.DataFrame]

    For further details, see the examples
    in [convert_unit_like][pandas_openscm.unit_conversion.].

    This is essentially a helper for
    [convert_unit_from_target_series][pandas_openscm.unit_conversion.].
    It implements one set of logic for extracting desired units
    and tries to be clever, handling differences in index levels
    between the data and `target` sensibly wherever possible.

    If you want behaviour other than what is implemented here,
    use [convert_unit_from_target_series][pandas_openscm.unit_conversion.] directly.

    Parameters
    ----------
    target
        Supported [pandas][] object whose units should be matched

    unit_level
        Level in the data's index which holds unit information

    target_unit_level
        Level in `target`'s index which holds unit information

        If not supplied, we use `unit_level`.

    ur
        Unit registry to use for the conversion.

        Passed to
        [convert_unit_from_target_series][pandas_openscm.unit_conversion.].

    Returns
    -------
    :
        Data with converted units
    """
    return convert_unit_like(
        self._df,
        target=target,
        unit_level=unit_level,
        target_unit_level=target_unit_level,
        ur=ur,
    )

eiim #

eiim(copy: bool = True) -> DataFrame

Ensure that the index is a pd.MultiIndex

Alias for ensure_index_is_multiindex

Parameters:

Name Type Description Default
copy bool

Whether to copy df before manipulating the index name

True

Returns:

Type Description
DataFrame

df with a pd.MultiIndex

If the index was already a pd.MultiIndex, this is a no-op (although the value of copy is respected).

Source code in src/pandas_openscm/accessors/dataframe.py
def eiim(self, copy: bool = True) -> pd.DataFrame:
    """
    Ensure that the index is a [pd.MultiIndex][pandas.MultiIndex]

    Alias for [ensure_index_is_multiindex][pandas_openscm.index_manipulation.]

    Parameters
    ----------
    copy
        Whether to copy `df` before manipulating the index name

    Returns
    -------
    :
        `df` with a [pd.MultiIndex][pandas.MultiIndex]

        If the index was already a [pd.MultiIndex][pandas.MultiIndex],
        this is a no-op (although the value of copy is respected).
    """
    return self.ensure_index_is_multiindex(copy=copy)

ensure_index_is_multiindex #

ensure_index_is_multiindex(copy: bool = True) -> DataFrame

Ensure that the index is a pd.MultiIndex

Parameters:

Name Type Description Default
copy bool

Whether to copy df before manipulating the index name

True

Returns:

Type Description
DataFrame

df with a pd.MultiIndex

If the index was already a pd.MultiIndex, this is a no-op (although the value of copy is respected).
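
The effect can be sketched in plain pandas as follows. This is an illustration under our own assumptions, not the library's implementation: `as_multiindex` is a hypothetical helper and always copies.

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [1.0, 2.0]},
    index=pd.Index(["sa", "sb"], name="scenario"),
)


def as_multiindex(pobj: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `pobj` whose index is a pd.MultiIndex (hypothetical helper)."""
    res = pobj.copy()
    if not isinstance(res.index, pd.MultiIndex):
        # Wrap the single index in a one-level MultiIndex, keeping its name
        res.index = pd.MultiIndex.from_arrays(
            [res.index.values], names=[res.index.name]
        )
    return res


res = as_multiindex(df)
print(type(res.index).__name__, list(res.index.names))
```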

Source code in src/pandas_openscm/accessors/dataframe.py
def ensure_index_is_multiindex(self, copy: bool = True) -> pd.DataFrame:
    """
    Ensure that the index is a [pd.MultiIndex][pandas.MultiIndex]

    Parameters
    ----------
    copy
        Whether to copy `df` before manipulating the index name

    Returns
    -------
    :
        `df` with a [pd.MultiIndex][pandas.MultiIndex]

        If the index was already a [pd.MultiIndex][pandas.MultiIndex],
        this is a no-op (although the value of copy is respected).
    """
    return ensure_index_is_multiindex(self._df, copy=copy)

fix_index_name_after_groupby_quantile #

fix_index_name_after_groupby_quantile(
    new_name: str = "quantile", copy: bool = False
) -> DataFrame

Fix the index name after performing a groupby(...).quantile(...) operation

By default, pandas doesn't assign a name to the quantile level when doing an operation of the form given above. This fixes this, but it does assume that the quantile level is the only unnamed level in the index.

Parameters:

Name Type Description Default
new_name str

New name to give to the quantile column

'quantile'
copy bool

Whether to copy df before manipulating the index name

False

Returns:

Type Description
DataFrame

df, with the last level in its index renamed to new_name.
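
The unnamed level can be reproduced with plain pandas, as in the sketch below. The data is invented, and the manual renaming shown here only illustrates the kind of fix this method applies; it is not the library's code.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["sa"], ["v1"], [0, 1, 2]], names=["scenario", "variable", "run"]
)
df = pd.DataFrame({2010: [1.0, 2.0, 3.0], 2020: [2.0, 4.0, 6.0]}, index=index)

quantiles = df.groupby(level=["scenario", "variable"]).quantile([0.25, 0.75])
# pandas leaves the new quantile level unnamed
names_before = list(quantiles.index.names)

# Name the unnamed level by hand
fixed = quantiles.copy()
fixed.index = fixed.index.set_names(
    ["quantile" if name is None else name for name in fixed.index.names]
)
names_after = list(fixed.index.names)

print(names_before, names_after)
```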

Source code in src/pandas_openscm/accessors/dataframe.py
def fix_index_name_after_groupby_quantile(
    self, new_name: str = "quantile", copy: bool = False
) -> pd.DataFrame:
    """
    Fix the index name after performing a `groupby(...).quantile(...)` operation

    By default, pandas doesn't assign a name to the quantile level
    when doing an operation of the form given above.
    This fixes this, but it does assume
    that the quantile level is the only unnamed level in the index.

    Parameters
    ----------
    new_name
        New name to give to the quantile column

    copy
        Whether to copy `df` before manipulating the index name

    Returns
    -------
    :
        `df`, with the last level in its index renamed to `new_name`.
    """
    return fix_index_name_after_groupby_quantile(
        self._df, new_name=new_name, copy=copy
    )

groupby_except #

groupby_except(
    non_groupers: str | list[str], observed: bool = True
) -> DataFrameGroupBy[Any, Any]

Group by all index levels except specified levels

This is the inverse of pd.DataFrame.groupby.

Parameters:

Name Type Description Default
non_groupers str | list[str]

Columns to exclude from the grouping

required
observed bool

Whether to only return observed combinations or not

True

Returns:

Type Description
DataFrameGroupBy[Any, Any]

The pd.DataFrame, grouped by all columns except non_groupers.
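
A plain-pandas sketch of the same idea is shown below: work out which index levels are left once `non_groupers` is removed, then group by those. The data is invented, and this is an illustration rather than the library's implementation.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["sa", "sb"], ["v1"], [0, 1, 2]], names=["scenario", "variable", "run"]
)
df = pd.DataFrame({2010: [1.0, 2.0, 4.0, 10.0, 20.0, 40.0]}, index=index)

non_groupers = ["run"]
# Group by every index level that is not listed in non_groupers
groupers = [name for name in df.index.names if name not in non_groupers]
median = df.groupby(level=groupers, observed=True).median()

print(median)
```

Naming the levels to exclude is often shorter than listing every level to keep, particularly when the index has many levels.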

Source code in src/pandas_openscm/accessors/dataframe.py
def groupby_except(
    self, non_groupers: str | list[str], observed: bool = True
) -> pandas.core.groupby.generic.DataFrameGroupBy[Any, Any]:
    """
    Group by all index levels except specified levels

    This is the inverse of [pd.DataFrame.groupby][pandas.DataFrame.groupby].

    Parameters
    ----------
    non_groupers
        Columns to exclude from the grouping

    observed
        Whether to only return observed combinations or not

    Returns
    -------
    :
        The [pd.DataFrame][pandas.DataFrame],
        grouped by all columns except `non_groupers`.
    """
    return groupby_except(self._df, non_groupers=non_groupers, observed=observed)

mi_loc #

mi_loc(
    locator: Index[Any] | MultiIndex | Selector,
) -> DataFrame

Select data, being slightly smarter than the default pandas.DataFrame.loc.

Parameters:

Name Type Description Default
locator Index[Any] | MultiIndex | Selector

Locator to apply

If this is a multi-index, we use multi_index_lookup to ensure correct alignment.

If this is an index that has a name, we use the name to ensure correct alignment.

required

Returns:

Type Description
DataFrame

Selected data

Notes

If you have pandas_indexing installed, you can get the same (perhaps even better) functionality using something like the following instead

...
pandas_obj.loc[pandas_indexing.isin(locator)]
...
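
To make the alignment point concrete, the sketch below selects rows with a multi-index locator that covers only some of the data's levels and lists them in a different order. It uses plain pandas only and is not the logic of multi_index_lookup; the data and the locator are invented.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("sa", "v1", "W"), ("sa", "v2", "W"), ("sb", "v1", "W")],
    names=["scenario", "variable", "unit"],
)
df = pd.DataFrame({2010: [1.0, 2.0, 3.0]}, index=index)

# Locator that only covers a subset of the levels, in a different order
locator = pd.MultiIndex.from_tuples(
    [("v1", "sa"), ("v1", "sb")], names=["variable", "scenario"]
)

# Align the data's index to the locator's levels by name, then test membership
levels_to_drop = [name for name in df.index.names if name not in locator.names]
aligned = df.index.droplevel(levels_to_drop).reorder_levels(locator.names)
selected = df[aligned.isin(locator)]

print(selected)
```
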
Source code in src/pandas_openscm/accessors/dataframe.py
def mi_loc(
    self,
    locator: pd.Index[Any] | pd.MultiIndex | pix.selectors.Selector,
) -> pd.DataFrame:
    """
    Select data, being slightly smarter than the default [pandas.DataFrame.loc][].

    Parameters
    ----------
    locator
        Locator to apply

        If this is a multi-index, we use
        [multi_index_lookup][pandas_openscm.indexing.] to ensure correct alignment.

        If this is an index that has a name,
        we use the name to ensure correct alignment.

    Returns
    -------
    :
        Selected data

    Notes
    -----
    If you have [pandas_indexing][] installed,
    you can get the same (perhaps even better) functionality
    using something like the following instead

    ```python
    ...
    pandas_obj.loc[pandas_indexing.isin(locator)]
    ...
    ```
    """
    return mi_loc(self._df, locator)

plot_plume #

plot_plume(
    quantiles_plumes: QUANTILES_PLUMES_LIKE,
    ax: Axes | None = None,
    *,
    quantile_var: str = "quantile",
    quantile_var_label: str | None = None,
    quantile_legend_round: int = 3,
    hue_var: str = "scenario",
    hue_var_label: str | None = None,
    palette: PALETTE_LIKE[Any] | None = None,
    warn_on_palette_value_missing: bool = True,
    style_var: str = "variable",
    style_var_label: str | None = None,
    dashes: dict[Any, str | tuple[float, tuple[float, ...]]]
    | None = None,
    warn_on_dashes_value_missing: bool = True,
    linewidth: float = 2.0,
    unit_var: str = "unit",
    unit_aware: bool | UnitRegistry = False,
    time_units: str | None = None,
    x_label: str | None = "time",
    y_label: str | bool | None = True,
    warn_infer_y_label_with_multi_unit: bool = True,
    create_legend: Callable[
        [Axes, list[Artist]], None
    ] = create_legend_default,
    observed: bool = True,
) -> Axes

Plot a plume plot

Parameters:

Name Type Description Default
quantiles_plumes QUANTILES_PLUMES_LIKE

Quantiles to plot in each plume.

If the first element of each tuple is a tuple, a plume is plotted between the given quantiles. Otherwise, if the first element is a plain float, a line is plotted for the given quantile.

required
ax Axes | None

Axes on which to plot.

If not supplied, a new axes is created.

None
quantile_var str

Variable/column in the multi-index which stores information about the quantile that each timeseries represents.

'quantile'
quantile_var_label str | None

Label to use as the header for the quantile section in the legend

None
quantile_legend_round int

Rounding to apply to quantile values when creating the legend

3
hue_var str

Variable to use for grouping data into different colour groups

'scenario'
hue_var_label str | None

Label to use as the header for the hue/colour section in the legend

None
palette PALETTE_LIKE[Any] | None

Colour to use for the different groups in the data.

If any groups are not included in palette, they are auto-filled.

None
warn_on_palette_value_missing bool

Should a warning be emitted if there are values missing from palette?

True
style_var str

Variable to use for grouping data into different (line)style groups

'variable'
style_var_label str | None

Label to use as the header for the style section in the legend

None
dashes dict[Any, str | tuple[float, tuple[float, ...]]] | None

Dash/linestyle to use for the different groups in the data.

If any groups are not included in dashes, they are auto-filled.

None
warn_on_dashes_value_missing bool

Should a warning be emitted if there are values missing from dashes?

True
linewidth float

Width to use for plotting lines.

2.0
unit_var str

Variable/column in the multi-index which stores information about the unit of each timeseries.

'unit'
unit_aware bool | UnitRegistry

Should the plot be done in a unit-aware way?

If True, we use the default application registry (retrieved with pint.get_application_registry). Otherwise, a pint.UnitRegistry can be supplied and will be used.

For details, see matplotlib and pint support plotting with units (stable docs, last version that we checked at the time of writing).

False
time_units str | None

Units of the time axis of the data.

These are required if unit_aware is not False.

None
x_label str | None

Label to apply to the x-axis.

If None, no label will be applied.

'time'
y_label str | bool | None

Label to apply to the y-axis.

If True, we will try to infer the y-label based on the data's units.

If None, no label will be applied.

True
warn_infer_y_label_with_multi_unit bool

Should a warning be raised if we try to infer the y-unit but the data has more than one unit?

True
create_legend Callable[[Axes, list[Artist]], None]

Function to use to create the legend.

This allows the user to have full control over the creation of the legend.

create_legend_default
observed bool

Passed to pd.DataFrame.groupby.

True

Returns:

Type Description
Axes

Axes on which the data was plotted
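
The rule for `quantiles_plumes` can be read as in the pure-Python sketch below, which sorts entries into lines and plumes. The value used is the one shown as the default of plot_plume_after_calculating_quantiles. What the second element of each entry means is not described in this section, so the sketch leaves it uninterpreted.

```python
quantiles_plumes = ((0.5, 0.7), ((0.05, 0.95), 0.2))

lines = []
plumes = []
for quantiles, _second_element in quantiles_plumes:
    if isinstance(quantiles, tuple):
        # A tuple of quantiles: a plume is drawn between them
        plumes.append(quantiles)
    else:
        # A plain float: a single line is drawn for that quantile
        lines.append(quantiles)

print("lines:", lines)
print("plumes:", plumes)
```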

Source code in src/pandas_openscm/accessors/dataframe.py
def plot_plume(  # noqa: PLR0913
    self,
    quantiles_plumes: QUANTILES_PLUMES_LIKE,
    ax: matplotlib.axes.Axes | None = None,
    *,
    quantile_var: str = "quantile",
    quantile_var_label: str | None = None,
    quantile_legend_round: int = 3,
    hue_var: str = "scenario",
    hue_var_label: str | None = None,
    palette: PALETTE_LIKE[Any] | None = None,
    warn_on_palette_value_missing: bool = True,
    style_var: str = "variable",
    style_var_label: str | None = None,
    dashes: dict[Any, str | tuple[float, tuple[float, ...]]] | None = None,
    warn_on_dashes_value_missing: bool = True,
    linewidth: float = 2.0,
    unit_var: str = "unit",
    unit_aware: bool | pint.UnitRegistry = False,
    time_units: str | None = None,
    x_label: str | None = "time",
    y_label: str | bool | None = True,
    warn_infer_y_label_with_multi_unit: bool = True,
    create_legend: Callable[
        [matplotlib.axes.Axes, list[matplotlib.artist.Artist]], None
    ] = create_legend_default,
    observed: bool = True,
) -> matplotlib.axes.Axes:
    """
    Plot a plume plot

    Parameters
    ----------
    quantiles_plumes
        Quantiles to plot in each plume.

        If the first element of each tuple is a tuple,
        a plume is plotted between the given quantiles.
        Otherwise, if the first element is a plain float,
        a line is plotted for the given quantile.

    ax
        Axes on which to plot.

        If not supplied, a new axes is created.

    quantile_var
        Variable/column in the multi-index which stores information
        about the quantile that each timeseries represents.

    quantile_var_label
        Label to use as the header for the quantile section in the legend

    quantile_legend_round
        Rounding to apply to quantile values when creating the legend

    hue_var
        Variable to use for grouping data into different colour groups

    hue_var_label
        Label to use as the header for the hue/colour section in the legend

    palette
        Colour to use for the different groups in the data.

        If any groups are not included in `palette`,
        they are auto-filled.

    warn_on_palette_value_missing
        Should a warning be emitted if there are values missing from `palette`?

    style_var
        Variable to use for grouping data into different (line)style groups

    style_var_label
        Label to use as the header for the style section in the legend

    dashes
        Dash/linestyle to use for the different groups in the data.

        If any groups are not included in `dashes`,
        they are auto-filled.

    warn_on_dashes_value_missing
        Should a warning be emitted if there are values missing from `dashes`?

    linewidth
        Width to use for plotting lines.

    unit_var
        Variable/column in the multi-index which stores information
        about the unit of each timeseries.

    unit_aware
        Should the plot be done in a unit-aware way?

        If `True`, we use the default application registry
        (retrieved with [pint.get_application_registry][]).
        Otherwise, a [pint.UnitRegistry][] can be supplied and will be used.

        For details, see matplotlib and pint support plotting with units
        ([stable docs](https://pint.readthedocs.io/en/stable/user/plotting.html),
        [last version that we checked at the time of writing](https://pint.readthedocs.io/en/0.24.4/user/plotting.html)).

    time_units
        Units of the time axis of the data.

        These are required if `unit_aware` is not `False`.

    x_label
        Label to apply to the x-axis.

        If `None`, no label will be applied.

    y_label
        Label to apply to the y-axis.

        If `True`, we will try to infer the y-label based on the data's units.

        If `None`, no label will be applied.

    warn_infer_y_label_with_multi_unit
        Should a warning be raised if we try to infer the y-unit
        but the data has more than one unit?

    create_legend
        Function to use to create the legend.

        This allows the user to have full control over the creation of the legend.

    observed
        Passed to [pd.DataFrame.groupby][pandas.DataFrame.groupby].

    Returns
    -------
    :
        Axes on which the data was plotted
    """
    return plot_plume_func(
        self._df,
        ax=ax,
        quantiles_plumes=quantiles_plumes,
        quantile_var=quantile_var,
        quantile_var_label=quantile_var_label,
        quantile_legend_round=quantile_legend_round,
        hue_var=hue_var,
        hue_var_label=hue_var_label,
        palette=palette,
        warn_on_palette_value_missing=warn_on_palette_value_missing,
        style_var=style_var,
        style_var_label=style_var_label,
        dashes=dashes,
        warn_on_dashes_value_missing=warn_on_dashes_value_missing,
        linewidth=linewidth,
        unit_var=unit_var,
        unit_aware=unit_aware,
        time_units=time_units,
        x_label=x_label,
        y_label=y_label,
        warn_infer_y_label_with_multi_unit=warn_infer_y_label_with_multi_unit,
        create_legend=create_legend,
        observed=observed,
    )

plot_plume_after_calculating_quantiles #

plot_plume_after_calculating_quantiles(
    ax: Axes | None = None,
    *,
    quantile_over: str | list[str],
    quantiles_plumes: QUANTILES_PLUMES_LIKE = (
        (0.5, 0.7),
        ((0.05, 0.95), 0.2),
    ),
    quantile_var_label: str | None = None,
    quantile_legend_round: int = 2,
    hue_var: str = "scenario",
    hue_var_label: str | None = None,
    palette: PALETTE_LIKE[Any] | None = None,
    warn_on_palette_value_missing: bool = True,
    style_var: str = "variable",
    style_var_label: str | None = None,
    dashes: dict[Any, str | tuple[float, tuple[float, ...]]]
    | None = None,
    warn_on_dashes_value_missing: bool = True,
    linewidth: float = 3.0,
    unit_var: str = "unit",
    unit_aware: bool | UnitRegistry = False,
    time_units: str | None = None,
    x_label: str | None = "time",
    y_label: str | bool | None = True,
    warn_infer_y_label_with_multi_unit: bool = True,
    create_legend: Callable[
        [Axes, list[Artist]], None
    ] = create_legend_default,
    observed: bool = True,
) -> Axes

Plot a plume plot, calculating the required quantiles first

Parameters:

Name Type Description Default
ax Axes | None

Axes on which to plot.

If not supplied, a new axes is created.

None
quantile_over str | list[str]

Variable(s)/column(s) over which to calculate the quantiles.

The data is grouped by all columns except quantile_over when calculating the quantiles.

required
quantiles_plumes QUANTILES_PLUMES_LIKE

Quantiles to plot in each plume.

If the first element of each tuple is a tuple, a plume is plotted between the given quantiles. Otherwise, if the first element is a plain float, a line is plotted for the given quantile.

((0.5, 0.7), ((0.05, 0.95), 0.2))
quantile_var_label str | None

Label to use as the header for the quantile section in the legend

None
quantile_legend_round int

Rounding to apply to quantile values when creating the legend

2
hue_var str

Variable to use for grouping data into different colour groups

'scenario'
hue_var_label str | None

Label to use as the header for the hue/colour section in the legend

None
palette PALETTE_LIKE[Any] | None

Colour to use for the different groups in the data.

If any groups are not included in palette, they are auto-filled.

None
warn_on_palette_value_missing bool

Should a warning be emitted if there are values missing from palette?

True
style_var str

Variable to use for grouping data into different (line)style groups

'variable'
style_var_label str | None

Label to use as the header for the style section in the legend

None
dashes dict[Any, str | tuple[float, tuple[float, ...]]] | None

Dash/linestyle to use for the different groups in the data.

If any groups are not included in dashes, they are auto-filled.

None
warn_on_dashes_value_missing bool

Should a warning be emitted if there are values missing from dashes?

True
linewidth float

Width to use for plotting lines.

3.0
unit_var str

Variable/column in the multi-index which stores information about the unit of each timeseries.

'unit'
unit_aware bool | UnitRegistry

Should the plot be done in a unit-aware way?

If True, we use the default application registry (retrieved with pint.get_application_registry). Otherwise, a pint.UnitRegistry can be supplied and will be used.

For details, see matplotlib and pint support plotting with units (stable docs, last version that we checked at the time of writing).

False
time_units str | None

Units of the time axis.

These are required if unit_aware is not False.

None
x_label str | None

Label to apply to the x-axis.

If None, no label will be applied.

'time'
y_label str | bool | None

Label to apply to the y-axis.

If True, we will try to infer the y-label based on the data's units.

If None, no label will be applied.

True
warn_infer_y_label_with_multi_unit bool

Should a warning be raised if we try to infer the y-unit but the data has more than one unit?

True
create_legend Callable[[Axes, list[Artist]], None]

Function to use to create the legend.

This allows the user to have full control over the creation of the legend.

create_legend_default
observed bool

Passed to pd.DataFrame.groupby.

True

Returns:

Type Description
Axes

Axes on which the data was plotted
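
The quantile step described for `quantile_over` can be sketched with plain pandas as below: collect the quantiles needed by `quantiles_plumes`, group by every index level except `quantile_over`, then calculate the quantiles. The data is invented and this is an approximation of the idea, not the library's implementation.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["sa"], ["v1"], [0, 1, 2]], names=["scenario", "variable", "run"]
)
df = pd.DataFrame({2010: [1.0, 2.0, 3.0]}, index=index)

quantile_over = ["run"]
quantiles_plumes = ((0.5, 0.7), ((0.05, 0.95), 0.2))

# Collect every quantile that needs to be calculated
required = set()
for quantiles, _ in quantiles_plumes:
    if isinstance(quantiles, tuple):
        required.update(quantiles)
    else:
        required.add(quantiles)
required_quantiles = sorted(required)

# Group by all index levels except quantile_over, then take the quantiles
groupers = [name for name in df.index.names if name not in quantile_over]
quantiles_df = df.groupby(level=groupers, observed=True).quantile(required_quantiles)

print(quantiles_df)
```

The resulting quantile level is unnamed, which is the situation fix_index_name_after_groupby_quantile is described as handling.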

Source code in src/pandas_openscm/accessors/dataframe.py
def plot_plume_after_calculating_quantiles(  # noqa: PLR0913
    self,
    ax: matplotlib.axes.Axes | None = None,
    *,
    quantile_over: str | list[str],
    quantiles_plumes: QUANTILES_PLUMES_LIKE = (
        (0.5, 0.7),
        ((0.05, 0.95), 0.2),
    ),
    quantile_var_label: str | None = None,
    quantile_legend_round: int = 2,
    hue_var: str = "scenario",
    hue_var_label: str | None = None,
    palette: PALETTE_LIKE[Any] | None = None,
    warn_on_palette_value_missing: bool = True,
    style_var: str = "variable",
    style_var_label: str | None = None,
    dashes: dict[Any, str | tuple[float, tuple[float, ...]]] | None = None,
    warn_on_dashes_value_missing: bool = True,
    linewidth: float = 3.0,
    unit_var: str = "unit",
    unit_aware: bool | pint.UnitRegistry = False,
    time_units: str | None = None,
    x_label: str | None = "time",
    y_label: str | bool | None = True,
    warn_infer_y_label_with_multi_unit: bool = True,
    create_legend: Callable[
        [matplotlib.axes.Axes, list[matplotlib.artist.Artist]], None
    ] = create_legend_default,
    observed: bool = True,
) -> matplotlib.axes.Axes:
    """
    Plot a plume plot, calculating the required quantiles first

    Parameters
    ----------
    ax
        Axes on which to plot.

        If not supplied, a new axes is created.

    quantile_over
        Variable(s)/column(s) over which to calculate the quantiles.

        The data is grouped by all columns except `quantile_over`
        when calculating the quantiles.

    quantiles_plumes
        Quantiles to plot in each plume.

        If the first element of each tuple is a tuple,
        a plume is plotted between the given quantiles.
        Otherwise, if the first element is a plain float,
        a line is plotted for the given quantile.

    quantile_var_label
        Label to use as the header for the quantile section in the legend

    quantile_legend_round
        Rounding to apply to quantile values when creating the legend

    hue_var
        Variable to use for grouping data into different colour groups

    hue_var_label
        Label to use as the header for the hue/colour section in the legend

    palette
        Colour to use for the different groups in the data.

        If any groups are not included in `palette`,
        they are auto-filled.

    warn_on_palette_value_missing
        Should a warning be emitted if there are values missing from `palette`?

    style_var
        Variable to use for grouping data into different (line)style groups

    style_var_label
        Label to use as the header for the style section in the legend

    dashes
        Dash/linestyle to use for the different groups in the data.

        If any groups are not included in `dashes`,
        they are auto-filled.

    warn_on_dashes_value_missing
        Should a warning be emitted if there are values missing from `dashes`?

    linewidth
        Width to use for plotting lines.

    unit_var
        Variable/column in the multi-index which stores information
        about the unit of each timeseries.

    unit_aware
        Should the plot be done in a unit-aware way?

        If `True`, we use the default application registry
        (retrieved with [pint.get_application_registry][]).
        Otherwise, a [pint.UnitRegistry][] can be supplied and will be used.

        For details, see pint's documentation on plotting with units in matplotlib
        ([stable docs](https://pint.readthedocs.io/en/stable/user/plotting.html),
        [last version that we checked at the time of writing](https://pint.readthedocs.io/en/0.24.4/user/plotting.html)).

    time_units
        Units of the time axis.

        These are required if `unit_aware` is not `False`.

    x_label
        Label to apply to the x-axis.

        If `None`, no label will be applied.

    y_label
        Label to apply to the y-axis.

        If `True`, we will try to infer the y-label based on the data's units.

        If `None`, no label will be applied.

    warn_infer_y_label_with_multi_unit
        Should a warning be raised if we try to infer the y-unit
        but the data has more than one unit?

    create_legend
        Function to use to create the legend.

        This allows the user to have full control over the creation of the legend.

    observed
        Passed to [pd.DataFrame.groupby][pandas.DataFrame.groupby].

    Returns
    -------
    :
        Axes on which the data was plotted
    """
    return plot_plume_after_calculating_quantiles_func(
        self._df,
        ax=ax,
        quantile_over=quantile_over,
        quantiles_plumes=quantiles_plumes,
        quantile_var_label=quantile_var_label,
        quantile_legend_round=quantile_legend_round,
        hue_var=hue_var,
        hue_var_label=hue_var_label,
        palette=palette,
        warn_on_palette_value_missing=warn_on_palette_value_missing,
        style_var=style_var,
        style_var_label=style_var_label,
        dashes=dashes,
        warn_on_dashes_value_missing=warn_on_dashes_value_missing,
        linewidth=linewidth,
        unit_var=unit_var,
        unit_aware=unit_aware,
        time_units=time_units,
        x_label=x_label,
        y_label=y_label,
        warn_infer_y_label_with_multi_unit=warn_infer_y_label_with_multi_unit,
        create_legend=create_legend,
        observed=observed,
    )
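
The `quantile_over` and `quantiles_plumes` descriptions above can be illustrated with a plain-pandas sketch. This is not the library's implementation: the example data is invented, and treating the second element of each `quantiles_plumes` entry as a transparency value is an assumption made only for illustration.

```python
import pandas as pd

# Invented example data: two scenarios, three ensemble members ("run") each
df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [10.0, 20.0], [30.0, 40.0], [50.0, 60.0]],
    columns=[2010, 2020],
    index=pd.MultiIndex.from_product(
        [["sa", "sb"], [0, 1, 2]], names=["scenario", "run"]
    ),
)

# Assumption: each entry is (quantile or pair of quantiles, transparency)
quantiles_plumes = ((0.5, 0.8), ((0.05, 0.95), 0.3))

# Collect every quantile that a line or a plume needs
required = sorted(
    {
        q
        for spec, _ in quantiles_plumes
        for q in (spec if isinstance(spec, tuple) else (spec,))
    }
)

# Group by all index levels except those in `quantile_over`
quantile_over = ["run"]
group_levels = [name for name in df.index.names if name not in quantile_over]
quantiles = df.groupby(level=group_levels, observed=True).quantile(required)

# groupby(...).quantile(...) leaves the new quantile level unnamed, so name it
quantiles.index.names = [*group_levels, "quantile"]
```

The result has one row per combination of scenario and quantile, which is the shape a plume plot needs: one line for the median and one band between the outer quantiles for each scenario.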

set_index_levels #

set_index_levels(
    levels_to_set: dict[str, Any | Collection[Any]],
    copy: bool = True,
) -> DataFrame

Set the index levels

Parameters:

Name Type Description Default
levels_to_set dict[str, Any | Collection[Any]]

Mapping of level names to values to set

required
copy bool

Should the pd.DataFrame be copied before returning?

True

Returns:

Type Description
DataFrame

pd.DataFrame with updates applied to its index
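
As a rough illustration of what setting index levels amounts to, here is a plain-pandas sketch. It does not call `set_index_levels` and is not the library's implementation. The helper name `set_levels_sketch` is hypothetical, and the behaviour shown (a scalar is broadcast to every row, a collection supplies one value per row) is an assumption based on the type hint `Any | Collection[Any]`.

```python
import pandas as pd


def set_levels_sketch(df: pd.DataFrame, levels_to_set: dict) -> pd.DataFrame:
    """Hypothetical helper: set (or add) index levels using plain pandas."""
    # Work on the index as a frame so each level is an ordinary column
    idx = df.index.to_frame(index=False)
    for name, value in levels_to_set.items():
        # A scalar is broadcast to every row;
        # a list must contain one entry per row
        idx[name] = value
    # Return a copy so the input is left untouched
    out = df.copy()
    out.index = pd.MultiIndex.from_frame(idx)
    return out


df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    columns=[2010, 2020],
    index=pd.MultiIndex.from_tuples(
        [("sa", "v1"), ("sb", "v1")], names=["scenario", "variable"]
    ),
)

# "model" does not exist yet, so it is added; "variable" is overwritten
res = set_levels_sketch(df, {"model": "ma", "variable": ["v1", "v2"]})
```

In this sketch, new levels are appended after the existing ones and existing levels keep their position.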

Source code in src/pandas_openscm/accessors/dataframe.py
def set_index_levels(
    self,
    levels_to_set: dict[str, Any | Collection[Any]],
    copy: bool = True,
) -> pd.DataFrame:
    """
    Set the index levels

    Parameters
    ----------
    levels_to_set
        Mapping of level names to values to set

    copy
        Should the [pd.DataFrame][pandas.DataFrame] be copied before returning?

    Returns
    -------
    :
        [pd.DataFrame][pandas.DataFrame] with updates applied to its index
    """
    return set_index_levels_func(
        self._df,
        levels_to_set=levels_to_set,
        copy=copy,
    )

to_category_index #

to_category_index() -> DataFrame

Convert the index's values to categories

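This can save a lot of memory and improve the speed of processing. However, it comes with some pitfalls. For a nice discussion of some of them, see this article (https://towardsdatascience.com/staying-sane-while-adopting-pandas-categorical-datatypes-78dbd19dcd8a/).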

Returns:

Type Description
DataFrame

pd.DataFrame with all index levels converted to category type.
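
The conversion can be sketched with plain pandas as follows. This is an illustration under stated assumptions, not the library's code: the helper name `to_category_index_sketch` is hypothetical and the example data is invented.

```python
import pandas as pd


def to_category_index_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: convert every index level to category dtype."""
    # Turn the index into a frame, cast each column to category,
    # then rebuild the MultiIndex from the categorical columns
    idx = df.index.to_frame(index=False).astype("category")
    out = df.copy()
    out.index = pd.MultiIndex.from_frame(idx)
    return out


df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    columns=[2010, 2020],
    index=pd.MultiIndex.from_tuples(
        [("sa", "v1"), ("sa", "v2"), ("sb", "v1")],
        names=["scenario", "variable"],
    ),
)

res = to_category_index_sketch(df)
# Every level of the result now has a categorical dtype
level_dtypes = [level.dtype for level in res.index.levels]
```

The values and their order are unchanged; only the dtype of each level differs, which is where both the memory saving and the pitfalls come from.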

Source code in src/pandas_openscm/accessors/dataframe.py
def to_category_index(self) -> pd.DataFrame:
    """
    Convert the index's values to categories

    This can save a lot of memory and improve the speed of processing.
    However, it comes with some pitfalls.
    For a nice discussion of some of them,
    see [this article](https://towardsdatascience.com/staying-sane-while-adopting-pandas-categorical-datatypes-78dbd19dcd8a/).

    Returns
    -------
    :
        [pd.DataFrame][pandas.DataFrame] with all index levels
        converted to category type.
    """
    return convert_index_to_category_index(self._df)

to_long_data #

to_long_data(time_col_name: str = 'time') -> DataFrame

Convert to long data

Here, long data means that each row contains a single value, alongside metadata associated with that value (for more details, see e.g. https://data.europa.eu/apps/data-visualisation-guide/wide-versus-long-data).

Parameters:

Name Type Description Default
time_col_name str

Name of the time column in the output

'time'

Returns:

Type Description
DataFrame

DataFrame in long form

Examples:

>>> import numpy as np
>>>
>>> from pandas_openscm.accessors import register_pandas_accessors
>>>
>>> # pandas<3 has different representations,
>>> # so skip if we have that version.
>>> import pytest
>>> pd = pytest.importorskip("pandas", minversion="3.0")
>>>
>>> register_pandas_accessors()
>>>
>>> df = pd.DataFrame(
...     [
...         [1.1, 0.8, 1.2],
...         [2.1, np.nan, 8.4],
...         [2.3, 3.2, 3.0],
...         [1.2, 2.8, np.nan],
...     ],
...     columns=[2010.0, 2015.0, 2025.0],
...     index=pd.MultiIndex.from_tuples(
...         [
...             ("sa", np.nan, "K"),
...             ("sb", "v1", None),
...             ("sa", "v2", "W"),
...             ("sb", "v2", "W"),
...         ],
...         names=["scenario", "variable", "unit"],
...     ),
... )
>>>
>>> # Start with wide data
>>> df
                        2010.0  2015.0  2025.0
scenario variable unit
sa       nan      K        1.1     0.8     1.2
sb       v1       nan      2.1     NaN     8.4
sa       v2       W        2.3     3.2     3.0
sb       v2       W        1.2     2.8     NaN
>>>
>>> # Convert to long data
>>> df.openscm.to_long_data()
   scenario variable unit    time  value
0        sa      NaN    K  2010.0    1.1
1        sb       v1  NaN  2010.0    2.1
2        sa       v2    W  2010.0    2.3
3        sb       v2    W  2010.0    1.2
4        sa      NaN    K  2015.0    0.8
5        sb       v1  NaN  2015.0    NaN
6        sa       v2    W  2015.0    3.2
7        sb       v2    W  2015.0    2.8
8        sa      NaN    K  2025.0    1.2
9        sb       v1  NaN  2025.0    8.4
10       sa       v2    W  2025.0    3.0
11       sb       v2    W  2025.0    NaN
>>>
>>> # Specify a different time column name
>>> df.openscm.to_long_data(time_col_name="year")
   scenario variable unit    year  value
0        sa      NaN    K  2010.0    1.1
1        sb       v1  NaN  2010.0    2.1
2        sa       v2    W  2010.0    2.3
3        sb       v2    W  2010.0    1.2
4        sa      NaN    K  2015.0    0.8
5        sb       v1  NaN  2015.0    NaN
6        sa       v2    W  2015.0    3.2
7        sb       v2    W  2015.0    2.8
8        sa      NaN    K  2025.0    1.2
9        sb       v1  NaN  2025.0    8.4
10       sa       v2    W  2025.0    3.0
11       sb       v2    W  2025.0    NaN
>>>
>>> # The result is just a pandas DataFrame,
>>> # so you can do whatever operations you want
>>> # on the result.
>>> # A common one is probably dropping all rows with NaN
>>> df.openscm.to_long_data(time_col_name="year").dropna()
   scenario variable unit    year  value
2        sa       v2    W  2010.0    2.3
3        sb       v2    W  2010.0    1.2
6        sa       v2    W  2015.0    3.2
7        sb       v2    W  2015.0    2.8
10       sa       v2    W  2025.0    3.0
>>>
>>> # or dropping only rows with NaN in particular columns
>>> df.openscm.to_long_data(time_col_name="year").dropna(subset=["variable"])
   scenario variable unit    year  value
1        sb       v1  NaN  2010.0    2.1
2        sa       v2    W  2010.0    2.3
3        sb       v2    W  2010.0    1.2
5        sb       v1  NaN  2015.0    NaN
6        sa       v2    W  2015.0    3.2
7        sb       v2    W  2015.0    2.8
9        sb       v1  NaN  2025.0    8.4
10       sa       v2    W  2025.0    3.0
11       sb       v2    W  2025.0    NaN

Source code in src/pandas_openscm/accessors/dataframe.py
def to_long_data(self, time_col_name: str = "time") -> pd.DataFrame:
    """
    Convert to long data

    Here, long data means that each row contains a single value,
    alongside metadata associated with that value
    (for more details, see e.g.
    https://data.europa.eu/apps/data-visualisation-guide/wide-versus-long-data).

    Parameters
    ----------
    time_col_name
        Name of the time column in the output

    Returns
    -------
    :
        DataFrame in long form

    Examples
    --------
    >>> import numpy as np
    >>>
    >>> from pandas_openscm.accessors import register_pandas_accessors
    >>>
    >>> # pandas<3 has different representations,
    >>> # so skip if we have that version.
    >>> import pytest
    >>> pd = pytest.importorskip("pandas", minversion="3.0")
    >>>
    >>> register_pandas_accessors()
    >>>
    >>> df = pd.DataFrame(
    ...     [
    ...         [1.1, 0.8, 1.2],
    ...         [2.1, np.nan, 8.4],
    ...         [2.3, 3.2, 3.0],
    ...         [1.2, 2.8, np.nan],
    ...     ],
    ...     columns=[2010.0, 2015.0, 2025.0],
    ...     index=pd.MultiIndex.from_tuples(
    ...         [
    ...             ("sa", np.nan, "K"),
    ...             ("sb", "v1", None),
    ...             ("sa", "v2", "W"),
    ...             ("sb", "v2", "W"),
    ...         ],
    ...         names=["scenario", "variable", "unit"],
    ...     ),
    ... )
    >>>
    >>> # Start with wide data
    >>> df
                            2010.0  2015.0  2025.0
    scenario variable unit
    sa       nan      K        1.1     0.8     1.2
    sb       v1       nan      2.1     NaN     8.4
    sa       v2       W        2.3     3.2     3.0
    sb       v2       W        1.2     2.8     NaN
    >>>
    >>> # Convert to long data
    >>> df.openscm.to_long_data()
       scenario variable unit    time  value
    0        sa      NaN    K  2010.0    1.1
    1        sb       v1  NaN  2010.0    2.1
    2        sa       v2    W  2010.0    2.3
    3        sb       v2    W  2010.0    1.2
    4        sa      NaN    K  2015.0    0.8
    5        sb       v1  NaN  2015.0    NaN
    6        sa       v2    W  2015.0    3.2
    7        sb       v2    W  2015.0    2.8
    8        sa      NaN    K  2025.0    1.2
    9        sb       v1  NaN  2025.0    8.4
    10       sa       v2    W  2025.0    3.0
    11       sb       v2    W  2025.0    NaN
    >>>
    >>> # Specify a different time column name
    >>> df.openscm.to_long_data(time_col_name="year")
       scenario variable unit    year  value
    0        sa      NaN    K  2010.0    1.1
    1        sb       v1  NaN  2010.0    2.1
    2        sa       v2    W  2010.0    2.3
    3        sb       v2    W  2010.0    1.2
    4        sa      NaN    K  2015.0    0.8
    5        sb       v1  NaN  2015.0    NaN
    6        sa       v2    W  2015.0    3.2
    7        sb       v2    W  2015.0    2.8
    8        sa      NaN    K  2025.0    1.2
    9        sb       v1  NaN  2025.0    8.4
    10       sa       v2    W  2025.0    3.0
    11       sb       v2    W  2025.0    NaN
    >>>
    >>> # The result is just a pandas DataFrame,
    >>> # so you can do whatever operations you want
    >>> # on the result.
    >>> # A common one is probably dropping all rows with NaN
    >>> df.openscm.to_long_data(time_col_name="year").dropna()
       scenario variable unit    year  value
    2        sa       v2    W  2010.0    2.3
    3        sb       v2    W  2010.0    1.2
    6        sa       v2    W  2015.0    3.2
    7        sb       v2    W  2015.0    2.8
    10       sa       v2    W  2025.0    3.0
    >>>
    >>> # or dropping only rows with NaN in particular columns
    >>> df.openscm.to_long_data(time_col_name="year").dropna(subset=["variable"])
       scenario variable unit    year  value
    1        sb       v1  NaN  2010.0    2.1
    2        sa       v2    W  2010.0    2.3
    3        sb       v2    W  2010.0    1.2
    5        sb       v1  NaN  2015.0    NaN
    6        sa       v2    W  2015.0    3.2
    7        sb       v2    W  2015.0    2.8
    9        sb       v1  NaN  2025.0    8.4
    10       sa       v2    W  2025.0    3.0
    11       sb       v2    W  2025.0    NaN
    """
    return ts_to_long_data(self._df, time_col_name=time_col_name)

update_index_levels #

update_index_levels(
    updates: dict[Any, Callable[[Any], Any]],
    copy: bool = True,
    remove_unused_levels: bool = True,
) -> DataFrame

Update the index levels

Parameters:

Name Type Description Default
updates dict[Any, Callable[[Any], Any]]

Updates to apply to the index levels

Each key is the index level to which the updates will be applied. Each value is a function which updates the levels to their new values.

required
copy bool

Should the pd.DataFrame be copied before returning?

True
remove_unused_levels bool

Remove unused levels before applying the update

Specifically, call pd.MultiIndex.remove_unused_levels.

This avoids trying to update levels that aren't being used.

True

Returns:

Type Description
DataFrame

pd.DataFrame with updates applied to its index
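
To make the role of `updates` and `remove_unused_levels` concrete, here is a plain-pandas sketch. It is an illustration only, not the library's implementation: the helper name `update_levels_sketch` is hypothetical and the example data is invented. The sketch also assumes that each function returns distinct values for distinct inputs, because `pd.MultiIndex.set_levels` requires unique level values.

```python
import pandas as pd


def update_levels_sketch(df, updates, remove_unused_levels=True):
    """Hypothetical helper: apply functions to the unique values of index levels."""
    idx = df.index
    if remove_unused_levels:
        # Drop level values that no row refers to, so the functions never see them
        idx = idx.remove_unused_levels()
    for level, func in updates.items():
        # Map the unique values of this level; the row codes are left untouched
        new_values = idx.levels[idx.names.index(level)].map(func)
        idx = idx.set_levels(new_values, level=level)
    out = df.copy()
    out.index = idx
    return out


df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    columns=[2010, 2020],
    index=pd.MultiIndex.from_tuples(
        [("sa", "Mt CO2/yr"), ("sb", "Mt CO2/yr")], names=["scenario", "unit"]
    ),
)

seen = []


def upper(value):
    seen.append(value)  # record which level values the function receives
    return value.upper()


# Selecting one row can leave "sb" behind as an unused level value
subset = df.iloc[[0]]
res = update_levels_sketch(
    subset, {"scenario": upper, "unit": lambda u: u.replace("Mt", "Gt")}
)
```

Because unused level values are removed first, `upper` is only ever called with `"sa"`. This matters when the update function would fail on, or is expensive for, values that no longer appear in the data.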

Source code in src/pandas_openscm/accessors/dataframe.py
def update_index_levels(
    self,
    updates: dict[Any, Callable[[Any], Any]],
    copy: bool = True,
    remove_unused_levels: bool = True,
) -> pd.DataFrame:
    """
    Update the index levels

    Parameters
    ----------
    updates
        Updates to apply to the index levels

        Each key is the index level to which the updates will be applied.
        Each value is a function which updates the levels to their new values.

    copy
        Should the [pd.DataFrame][pandas.DataFrame] be copied before returning?

    remove_unused_levels
        Remove unused levels before applying the update

        Specifically, call
        [pd.MultiIndex.remove_unused_levels][pandas.MultiIndex.remove_unused_levels].

        This avoids trying to update levels that aren't being used.

    Returns
    -------
    :
        [pd.DataFrame][pandas.DataFrame] with updates applied to its index
    """
    return update_index_levels_func(
        self._df,
        updates=updates,
        copy=copy,
        remove_unused_levels=remove_unused_levels,
    )

update_index_levels_from_other #

update_index_levels_from_other(
    update_sources: dict[
        Any,
        tuple[
            Any,
            Callable[[Any], Any]
            | dict[Any, Any]
            | Series[Any],
        ]
        | tuple[
            tuple[Any, ...],
            Callable[[tuple[Any, ...]], Any]
            | dict[tuple[Any, ...], Any]
            | Series[Any],
        ],
    ],
    copy: bool = True,
    remove_unused_levels: bool = True,
) -> DataFrame

Update the index levels based on other index levels

Parameters:

Name Type Description Default
update_sources dict[Any, tuple[Any, Callable[[Any], Any] | dict[Any, Any] | Series[Any]] | tuple[tuple[Any, ...], Callable[[tuple[Any, ...]], Any] | dict[tuple[Any, ...], Any] | Series[Any]]]

Updates to apply to the data's index

Each key is the level to which the updates will be applied (or the level that will be created if it doesn't already exist).

There are two options for the values.

The first is used when only one level is used to update the 'target level'. In this case, each value is a tuple of which the first element is the level to use to generate the values (the 'source level') and the second is a mapper of the form used by pd.Index.map which will be applied to the source level to update/create the level of interest.

The second is used when more than one level is used to update the 'target level'. In this case, each value is a tuple of which the first element is a tuple of the levels to use to generate the values (the 'source levels') and the second is a mapper of the form used by pd.Index.map which will be applied to the tuples of source-level values to update/create the level of interest.

required
copy bool

Should the pd.DataFrame be copied before returning?

True
remove_unused_levels bool

Remove unused levels before applying the update

Specifically, call pd.MultiIndex.remove_unused_levels.

This avoids trying to update levels that aren't being used.

True

Returns:

Type Description
DataFrame

pd.DataFrame with updates applied to its index
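
The two forms of `update_sources` can be sketched with plain pandas as follows. This is an illustration under stated assumptions, not the library's implementation: the helper name `update_from_other_sketch` is hypothetical, the example data is invented, and the sketch ignores `remove_unused_levels`.

```python
import pandas as pd


def update_from_other_sketch(df, update_sources):
    """Hypothetical helper: derive index levels from other index levels."""
    idx_df = df.index.to_frame(index=False)
    for target, (source, mapper) in update_sources.items():
        if isinstance(source, tuple):
            # Several source levels: the mapper receives a tuple of their values
            source_values = pd.MultiIndex.from_frame(idx_df[list(source)])
        else:
            # One source level: the mapper receives a single value
            source_values = pd.Index(idx_df[source])
        # Updates `target` if it exists, otherwise creates it
        idx_df[target] = source_values.map(mapper).tolist()
    out = df.copy()
    out.index = pd.MultiIndex.from_frame(idx_df)
    return out


df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    columns=[2010, 2020],
    index=pd.MultiIndex.from_tuples(
        [("sa", "Emissions|CO2", "unknown"), ("sb", "Emissions|CH4", "unknown")],
        names=["scenario", "variable", "unit"],
    ),
)

res = update_from_other_sketch(
    df,
    {
        # Single source level with a dict mapper: update the existing "unit" level
        "unit": (
            "variable",
            {"Emissions|CO2": "Mt CO2/yr", "Emissions|CH4": "Mt CH4/yr"},
        ),
        # Several source levels with a callable mapper: create a new "label" level
        "label": (("scenario", "variable"), lambda t: f"{t[0]}: {t[1]}"),
    },
)
```

Here `unit` is overwritten from `variable`, while `label` does not exist beforehand and is appended as a new level.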

Source code in src/pandas_openscm/accessors/dataframe.py
def update_index_levels_from_other(
    self,
    update_sources: dict[
        Any,
        tuple[
            Any,
            Callable[[Any], Any] | dict[Any, Any] | pd.Series[Any],
        ]
        | tuple[
            tuple[Any, ...],
            Callable[[tuple[Any, ...]], Any]
            | dict[tuple[Any, ...], Any]
            | pd.Series[Any],
        ],
    ],
    copy: bool = True,
    remove_unused_levels: bool = True,
) -> pd.DataFrame:
    """
    Update the index levels based on other index levels

    Parameters
    ----------
    update_sources
        Updates to apply to the data's index

        Each key is the level to which the updates will be applied
        (or the level that will be created if it doesn't already exist).

        There are two options for the values.

        The first is used when only one level is used to update the 'target level'.
        In this case, each value is a tuple of which the first element
        is the level to use to generate the values (the 'source level')
        and the second is a mapper of the form used by
        [pd.Index.map][pandas.Index.map]
        which will be applied to the source level
        to update/create the level of interest.

        The second is used when more than one level is used
        to update the 'target level'.
        In this case, each value is a tuple of which the first element
        is a tuple of the levels to use to generate the values
        (the 'source levels')
        and the second is a mapper of the form used by
        [pd.Index.map][pandas.Index.map]
        which will be applied to the tuples of source-level values
        to update/create the level of interest.

    copy
        Should the [pd.DataFrame][pandas.DataFrame] be copied before returning?

    remove_unused_levels
        Remove unused levels before applying the update

        Specifically, call
        [pd.MultiIndex.remove_unused_levels][pandas.MultiIndex.remove_unused_levels].

        This avoids trying to update levels that aren't being used.

    Returns
    -------
    :
        [pd.DataFrame][pandas.DataFrame] with updates applied to its index
    """
    return update_index_levels_from_other_func(
        self._df,
        update_sources=update_sources,
        copy=copy,
        remove_unused_levels=remove_unused_levels,
    )