Skip to content

OpenSCMDB back-end speed#

Here we present an analysis of the speed of different back-ends available to use with OpenSCMDB. For each back-end, we show how fast it is to read and write a collection of timeseries. We consider use cases that mirror the use cases we have. If you have a use case that you would like us to include in this analysis, please raise an issue.

Run-time conditions#

  • pandas-openscm version: 0.1.1a1
  • pandas-openscm commit: 53479d6df2ca6c8ea5e5001221bdc25e3e86b4d8
  • python version: 3.11.11
  • platform information: macOS-14.3-arm64-arm-64bit
  • number of workers used in parallel tests: 8

Simple climate model full output#

Here we consider the case of output that mirrors the output produced by simple climate models. We have a number of scenarios and variables, each of which has hundreds of ensemble members. The output also spans a relatively long time period.

      Back-end In memory csv feather netCDF
In-memory size (MB) Time series Time points Files in database (after grouping)        
9.2 3,000 401 1 0.00 0.81 0.05 0.18
5 0.00 0.82 0.10 0.05
3,000 0.69 4.59 50.97 16.63
91.9 30,000 401 1 0.00 7.51 0.17 0.21
50 0.04 7.07 1.06 0.48
275.8 90,000 401 1 0.00 23.92 0.67 0.26
150 0.12 24.36 4.16 1.35
919.3 300,000 401 1 0.01 80.25 1.38 1.13
500 0.42 81.60 12.61 5.55

      Back-end In memory csv feather netCDF
In-memory size (MB) Time series Time points Files in database (after grouping)        
9.2 3,000 401 1 nan 1.30 0.66 0.73
5 nan 1.07 0.80 0.91
3,000 nan 2.92 10.71 5.64
91.9 30,000 401 1 nan 7.86 0.85 0.86
50 nan 2.03 1.00 1.20
275.8 90,000 401 1 nan 23.74 1.87 1.26
150 nan 5.79 1.80 1.59
919.3 300,000 401 1 nan 78.68 3.05 2.80
500 nan 21.31 4.05 2.86

*Note that there are no values for parallel operations with an in-memory back-end because such a setup doesn't make sense (you just duplicate the Python process for no gain).

      Back-end In memory csv feather netCDF
In-memory size (MB) Time series Time points Files in database (after grouping)        
9.2 3,000 401 1 0.00 21.37 9.41 9.49
5 0.00 21.38 10.03 9.59
3,000 0.00 29.69 469.18 61.47
91.9 30,000 401 1 0.00 213.67 92.65 94.35
50 0.00 213.83 100.27 95.66
275.8 90,000 401 1 0.00 641.11 277.66 282.95
150 0.00 641.64 300.79 286.93
919.3 300,000 401 1 0.00 2,137.17 925.19 943.06
500 0.00 2,139.08 1,002.63 956.38

      Back-end In memory csv feather netCDF
In-memory size (MB) Time series Time points Files in database (after grouping)        
9.2 3,000 401 1 0.00 0.12 0.01 0.06
5 0.00 0.14 0.04 0.04
3,000 0.11 8.34 12.36 11.43
91.9 30,000 401 1 0.00 0.90 0.05 0.09
50 0.03 1.13 0.32 0.38
275.8 90,000 401 1 0.00 3.20 0.15 0.31
150 0.05 4.12 1.01 0.92
919.3 300,000 401 1 0.00 10.27 0.71 1.20
500 0.20 15.19 7.13 3.95

      Back-end In memory csv feather netCDF
In-memory size (MB) Time series Time points Files in database (after grouping)        
9.2 3,000 401 1 nan 0.54 0.52 0.66
5 nan 0.71 0.65 0.73
3,000 nan 3.42 6.51 4.71
91.9 30,000 401 1 nan 1.50 0.70 0.77
50 nan 1.00 0.93 1.35
275.8 90,000 401 1 nan 4.10 1.08 1.17
150 nan 2.23 1.35 1.75
919.3 300,000 401 1 nan 10.49 2.66 2.93
500 nan 5.59 3.37 3.19

*Note that there are no values for parallel operations with an in-memory back-end because such a setup doesn't make sense (you just duplicate the Python process for no gain).

      Back-end In memory csv feather netCDF
In-memory size (MB) Time series Time points Files in database (after grouping)        
9.2 3,000 401 1 0.00 0.00 0.00 0.00
5 0.00 0.00 0.00 0.00
3,000 0.07 0.13 0.16 0.17
91.9 30,000 401 1 0.00 0.01 0.00 0.00
50 0.00 0.01 0.01 0.01
275.8 90,000 401 1 0.00 0.05 0.01 0.04
150 0.00 0.04 0.01 0.02
919.3 300,000 401 1 0.00 0.31 0.01 0.02
500 0.01 0.25 0.04 0.15

Simple climate model future quantile output#

Here we consider the case of output that mirrors processed output produced from simple climate models. We have a number of scenarios and variables, each of which has been processed to a few quantiles. The output is restricted to the future time period.

      Back-end In memory csv feather netCDF
In-memory size (MB) Time series Time points Files in database (after grouping)        
0.1 75 126 1 0.00 0.01 0.01 0.02
5 0.00 0.01 0.03 0.04
0.7 750 126 1 0.00 0.06 0.01 0.02
50 0.01 0.10 0.26 0.31
7.3 7,500 126 1 0.00 0.69 0.02 0.03
5 0.00 0.62 0.07 0.05
21.8 22,500 126 1 0.00 1.89 0.04 0.07
5 0.01 1.92 0.09 0.08
72.6 75,000 126 1 0.00 6.35 0.16 0.19
5 0.02 6.56 0.14 0.18
217.4 225,000 126 1 0.01 18.61 0.64 0.30
50 0.07 18.87 0.60 0.82
725.5 750,000 126 1 0.01 60.37 1.78 1.08
5 0.17 65.08 2.24 1.14

      Back-end In memory csv feather netCDF
In-memory size (MB) Time series Time points Files in database (after grouping)        
0.1 75 126 1 nan 0.47 0.63 0.53
5 nan 0.65 0.72 0.70
0.7 750 126 1 nan 0.47 0.42 0.50
50 nan 0.72 0.80 0.93
7.3 7,500 126 1 nan 1.13 0.55 0.62
5 nan 0.86 0.69 0.93
21.8 22,500 126 1 nan 2.46 0.63 0.66
5 nan 1.41 0.82 0.93
72.6 75,000 126 1 nan 6.81 0.76 0.82
5 nan 2.52 0.90 1.05
217.4 225,000 126 1 nan 19.10 1.32 1.18
50 nan 5.08 1.92 1.91
725.5 750,000 126 1 nan 57.81 2.99 2.78
5 nan 18.87 3.11 2.85

*Note that there are no values for parallel operations with an in-memory back-end because such a setup doesn't make sense (you just duplicate the Python process for no gain).

      Back-end In memory csv feather netCDF
In-memory size (MB) Time series Time points Files in database (after grouping)        
0.1 75 126 1 0.00 0.17 0.13 0.11
5 0.00 0.18 0.33 0.16
0.7 750 126 1 0.00 1.71 0.79 0.82
50 0.00 1.76 3.25 1.49
7.3 7,500 126 1 0.00 17.08 7.44 7.89
5 0.00 17.08 7.64 7.97
21.8 22,500 126 1 0.00 51.27 22.21 23.60
5 0.00 51.27 22.42 23.74
72.6 75,000 126 1 0.00 170.93 73.93 78.60
5 0.00 170.93 74.18 79.00
217.4 225,000 126 1 0.00 513.01 221.63 235.26
50 0.00 513.23 224.20 237.07
725.5 750,000 126 1 0.00 1,710.69 739.01 785.35
5 0.00 1,710.69 739.57 788.20

      Back-end In memory csv feather netCDF
In-memory size (MB) Time series Time points Files in database (after grouping)        
0.1 75 126 1 0.00 0.01 0.01 0.02
5 0.00 0.01 0.02 0.03
0.7 750 126 1 0.00 0.01 0.01 0.01
50 0.00 0.07 0.10 0.20
7.3 7,500 126 1 0.00 0.12 0.02 0.03
5 0.00 0.11 0.02 0.05
21.8 22,500 126 1 0.00 0.35 0.03 0.05
5 0.01 0.31 0.04 0.08
72.6 75,000 126 1 0.00 0.90 0.05 0.11
5 0.01 0.89 0.09 0.13
217.4 225,000 126 1 0.00 2.53 0.22 0.30
50 0.05 3.11 0.41 0.56
725.5 750,000 126 1 0.00 7.95 0.66 1.22
5 0.10 9.23 0.96 1.35

      Back-end In memory csv feather netCDF
In-memory size (MB) Time series Time points Files in database (after grouping)        
0.1 75 126 1 nan 0.38 0.38 0.50
5 nan 0.53 0.50 0.66
0.7 750 126 1 nan 0.37 0.34 0.41
50 nan 0.75 0.78 0.94
7.3 7,500 126 1 nan 0.59 0.45 0.50
5 nan 0.60 0.58 0.73
21.8 22,500 126 1 nan 0.92 0.51 0.59
5 nan 0.82 0.72 1.36
72.6 75,000 126 1 nan 1.42 0.60 0.82
5 nan 1.16 0.81 0.92
217.4 225,000 126 1 nan 3.46 0.91 1.11
50 nan 2.32 1.53 1.87
725.5 750,000 126 1 nan 9.34 2.36 2.51
5 nan 5.72 3.23 2.88

*Note that there are no values for parallel operations with an in-memory back-end because such a setup doesn't make sense (you just duplicate the Python process for no gain).

      Back-end In memory csv feather netCDF
In-memory size (MB) Time series Time points Files in database (after grouping)        
0.1 75 126 1 0.00 0.00 0.00 0.00
5 0.00 0.00 0.00 0.00
0.7 750 126 1 0.00 0.00 0.00 0.00
50 0.00 0.00 0.00 0.00
7.3 7,500 126 1 0.00 0.00 0.00 0.00
5 0.00 0.00 0.00 0.00
21.8 22,500 126 1 0.00 0.00 0.00 0.00
5 0.00 0.01 0.00 0.00
72.6 75,000 126 1 0.00 0.01 0.00 0.01
5 0.00 0.01 0.00 0.00
217.4 225,000 126 1 0.00 0.05 0.01 0.01
50 0.00 0.04 0.01 0.01
725.5 750,000 126 1 0.00 0.20 0.05 0.01
5 0.00 0.17 0.02 0.04