Access data from WAPI
Example
The data in WAPI is stored in curves. A curve is a collection of metadata describing one or more time series.
There are 4 types of curves:
TIME_SERIES
TAGGED
INSTANCES
TAGGED_INSTANCES
This chapter describes how to search for available curves in WAPI and how to access the stored data, based on the given curve type.
Searching for curves
Each curve can have any of the following metadata attributes, which describe the curve:
commodity
categories
area
border_source
station
sources
scenarios
unit
time_zone
version
frequency
data_type
The standard way of finding curves is to search using a combination of these metadata attributes. To search for curves, you can either use the API web interface (see the documentation for more info) or search for curves within Python.
To search for curves within Python, use the wapi.session.Session.search() function.
The valid values for each attribute can be accessed using the corresponding session.get_ATTRIBUTE function.
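The search step can be wrapped in a small helper. This is a minimal sketch, assuming that session.search() accepts the metadata attributes as keyword arguments. The helper name search_curves is hypothetical, and the exact keyword names (for example singular versus plural forms) should be checked against the method documentation.

```python
def search_curves(session, **filters):
    """Search for curves, ignoring filters that were left as None.

    `session` is expected to be a wapi.session.Session, or any object with a
    compatible search() method. Keyword names are passed through unchanged,
    so they must match what search() actually accepts (an assumption here).
    """
    active = {name: value for name, value in filters.items() if value is not None}
    return session.search(**active)
```

Dropping unset filters makes it easy to build a search from optional user input without sending empty criteria.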
Getting a curve object
In order to fetch data from WAPI, you first have to fetch the curve you want to read the data from. You can do this either by searching for curves, which already returns a list of curve objects, or by getting a curve object by its name using the wapi.session.Session.get_curve() method:
curve = session.get_curve(name='pro ee wnd intraday mwh/h cet h a')
Each curve has all of the following attributes:
id: id of the curve
name: name of the curve
curve_state: state of the curve (Normal availability, Beta release or Scheduled for removal)
curve_type: one of the 4 defined types (TIME_SERIES, TAGGED, INSTANCES and TAGGED_INSTANCES)
The value of an attribute can be accessed with curve.attribute_name, e.g.:
>>> curve = session.get_curve(name='pro ee wnd intraday mwh/h cet h a')
>>> curve.name
'pro ee wnd intraday mwh/h cet h a'
>>> curve.curve_type
'TIME_SERIES'
Getting data from a curve object
There is a different method to get data for each of the 4 types of curves (TIME_SERIES, TAGGED, INSTANCES, TAGGED_INSTANCES).
To find out the type of a given curve, use the curve.curve_type attribute:
>>> curve = session.get_curve(name='pro ee wnd intraday mwh/h cet h a')
>>> curve.curve_type # check the type of the given curve
'TIME_SERIES'
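Because the access method depends on the curve type, client code often branches on curve.curve_type. The following is a minimal sketch, not part of the library: fetch_default is a hypothetical helper, it assumes curve_type compares equal to the strings shown above, and it only uses the get_data() and get_latest() methods described in the following sections.

```python
def fetch_default(curve, data_from=None, data_to=None):
    """Pick a sensible default access method based on the curve type.

    TIME_SERIES and TAGGED curves are read with get_data(); INSTANCES and
    TAGGED_INSTANCES curves fall back to the latest available instance.
    """
    curve_type = curve.curve_type
    if curve_type in ("TIME_SERIES", "TAGGED"):
        return curve.get_data(data_from=data_from, data_to=data_to)
    if curve_type in ("INSTANCES", "TAGGED_INSTANCES"):
        return curve.get_latest()
    raise ValueError(f"Unknown curve type: {curve_type!r}")
```

Raising on an unknown type keeps a new or misspelled curve type from silently returning nothing.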
Getting data from a TIME_SERIES curve
A Time Series curve holds a single time series.
This is used for actual values, backcasts, normals, etc.
To get data from a Time Series curve, use the get_data() method (wapi.curves.TimeSeriesCurve.get_data()). You can get the data as it is stored in the curve by defining a start date (data_from) and an end date (data_to):
curve = session.get_curve(name='pro ee wnd intraday mwh/h cet h a')
ts = curve.get_data(data_from='2018-01-01T14:00Z', data_to='2018-02-01T14:00Z')
Note
End dates are always excluded in the result!
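The effect of the excluded end date can be illustrated without calling the API. The sketch below uses only the Python standard library and assumes an hourly series, which is an assumption made for illustration. It lists the timestamps that fall in the half-open interval from data_from (included) to data_to (excluded).

```python
from datetime import datetime, timedelta, timezone


def hourly_stamps(data_from, data_to):
    """Yield hourly timestamps in the half-open interval [data_from, data_to)."""
    current = data_from
    while current < data_to:
        yield current
        current += timedelta(hours=1)


start = datetime(2018, 1, 1, 14, tzinfo=timezone.utc)
end = datetime(2018, 2, 1, 14, tzinfo=timezone.utc)
stamps = list(hourly_stamps(start, end))

print(len(stamps))   # 744 hourly values (31 days * 24 hours)
print(stamps[-1])    # 2018-02-01 13:00:00+00:00, the hour before data_to
```

To include the value at 2018-02-01T14:00Z as well, data_to would have to be set to a later timestamp.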
The get_data() method returns a TS object (wapi.util.TS). See the section "Working with data from a curve object" for how to work with a TS object.
It is possible to process curves directly in the API (e.g. aggregating to daily/weekly/monthly/yearly values) by using additional inputs to the get_data() method. This can be used with great effect to reduce the amount of data retrieved if the full set of details is not needed.
Have a look at the detailed method documentation below and at our examples.
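To make the idea of aggregation concrete, here is a client-side sketch of what summing hourly values to daily values means. It is only a conceptual illustration with made-up data and standard-library code. The aggregation performed by the API may treat time zones and missing values differently, and doing it server-side avoids transferring the hourly values at all.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_sum(points):
    """Sum (timestamp, value) pairs per calendar day of the timestamp."""
    totals = defaultdict(float)
    for stamp, value in points:
        totals[stamp.date()] += value
    return dict(sorted(totals.items()))


# 48 hourly points of made-up data: 1.0 on the first day, 2.0 on the second.
first_hour = datetime(2018, 1, 1)
points = [(first_hour + timedelta(hours=h), 1.0 if h < 24 else 2.0)
          for h in range(48)]
daily = daily_sum(points)

print(list(daily.values()))  # [24.0, 48.0]
```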
Getting data from a TAGGED curve
A tagged curve holds a set of closely related time series, each identified by a tag. The most common use of tags is for ensemble weather data.
The existing set of tags of a curve can be found using the get_tags() method:
tags = curve.get_tags()
You can get data from a tagged curve using the get_data() method. This method has the same inputs and functionality as the wapi.curves.TimeSeriesCurve.get_data() method for Time Series curves. Additionally, you can provide a tag argument. tag can be a single value or a list of values. If omitted, the default tag is returned. When a list of tags is requested, a list of time series is returned:
# get data between two dates for all tags
ts_list = curve.get_data(data_from='2018-01-01', data_to='2018-02-01')
# get data between two dates for single tag='Avg'
ts = curve.get_data(data_from='2018-01-01', data_to='2018-02-01', tag='Avg')
# get data between two dates for tags 'Avg', '01' and '12'
ts_list = curve.get_data(data_from='2018-01-01', data_to='2018-02-01', tag=['Avg','01','12'])
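Since the return value is either a single time series or a list of them, depending on the tag argument, downstream code can be simplified by normalizing the result. This is a minimal sketch. as_list is a hypothetical helper and does not depend on the library.

```python
def as_list(result):
    """Return the result of a get_data() call as a list in every case.

    A single time series is wrapped in a one-element list, a list or tuple
    is copied into a new list, and None becomes an empty list.
    """
    if result is None:
        return []
    if isinstance(result, (list, tuple)):
        return list(result)
    return [result]
```

With this in place, a loop such as "for ts in as_list(curve.get_data(...))" works the same way for one tag and for several tags.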
Getting data from an INSTANCES curve
An Instance curve contains a time series for each issue_date of the curve. This is typically a forecast with a time series for each issue_date of the forecast.
You can fetch a single instance identified by its issue_date using the get_instance() method:
ts = curve.get_instance(issue_date='2018-01-01T00:00')
You can fetch multiple instances (within a given time range) using the search_instances() method. The method only returns TS objects with data when the with_data argument is set to True (the default is False, which returns TS objects with metadata only):
ts_list = curve.search_instances(issue_date_from='2018-07-01Z00:00',
                                 issue_date_to='2018-07-04Z00:00',
                                 with_data=True)
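Because with_data defaults to False, it is easy to end up with metadata-only results by accident. A thin wrapper can make the intent explicit. This is a sketch only: instances_with_data is a hypothetical helper that uses just the arguments shown in the example above.

```python
def instances_with_data(curve, issue_date_from, issue_date_to):
    """Fetch all instances in an issue-date range, always including the data.

    `curve` is expected to be an Instance curve, or any object with a
    compatible search_instances() method.
    """
    return curve.search_instances(issue_date_from=issue_date_from,
                                  issue_date_to=issue_date_to,
                                  with_data=True)
```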
You can also fetch the latest available instance using the get_latest() method:
ts = curve.get_latest()
Note
All three methods allow you to process curves directly in the API (e.g. selecting date ranges, aggregating, filtering, changing time zones) by using additional inputs. Have a look at the detailed function descriptions below and at the provided examples.
Getting data from a TAGGED_INSTANCES curve
Tagged Instance curves are a combination of Tagged curves and Instance curves. A Tagged Instance curve typically represents forecasts that contain multiple time series for each issue_date of the forecast, each of which is assigned to a tag. Each time series is therefore defined by a unique combination of issue_date and tag. Ensemble forecasts are a typical use case for Tagged Instance curves.
The existing set of tags of a curve can be found using the get_tags() method:
tags = curve.get_tags()
You can fetch a single instance identified by its issue_date using the get_instance() method. This method allows you to pass a single tag or a list of tags to the tag argument. If omitted, the default tag is returned.
# get all tags for this issue date
ts_list = curve.get_instance(issue_date='2018-07-01T00:00')
# get data for this issue date for single tag='Avg'
ts = curve.get_instance(issue_date='2018-07-01T00:00', tag='Avg')
# get data for this issue date for tags 'Avg', '02' and '05'
ts_list = curve.get_instance(issue_date='2018-07-01T00:00', tag=['Avg','02','05'])
You can fetch multiple instances (within a given time range) using the search_instances() method. The method only returns TS objects with data when the with_data argument is set to True (the default is False, which returns TS objects with metadata only). Here you can again omit the tags argument, which returns the default tag for each issue_date, or specify a single tag or a list of tags.
ts_list = curve.search_instances(issue_date_from='2018-07-01Z00:00',
                                 issue_date_to='2018-07-04Z00:00',
                                 with_data=True,
                                 tags=['Avg','11'])
You can also fetch the latest available instance using the get_latest() method. This method always returns exactly ONE time series for ONE tag of the latest issue_date. It is possible to provide a list of tags to the tags argument, but it is strongly recommended to specify ONE SINGLE TAG here! If omitted, the default tag is returned.
ts = curve.get_latest(tags='03')
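The single-tag recommendation can be enforced in client code. This is a minimal sketch. latest_for_tag is a hypothetical helper that rejects anything other than one string tag before calling get_latest() with the tags argument shown above.

```python
def latest_for_tag(curve, tag=None):
    """Fetch the latest instance for exactly one tag (or the default tag).

    Raises TypeError if a list or any other non-string value is passed, so
    that a call never depends on which of several tags would be returned.
    """
    if tag is None:
        return curve.get_latest()
    if not isinstance(tag, str):
        raise TypeError("pass exactly one tag as a string, not a list")
    return curve.get_latest(tags=tag)
```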
Note
All three methods to get data allow you to process curves directly in the API (e.g. selecting date ranges, aggregating, filtering, changing time zones) by using additional inputs. Have a look at the detailed function descriptions below and at the provided examples.
Working with data from a curve object
Independent of the curve type and the respective method used to get the data, all these methods return a TS object (wapi.util.TS) or a list of TS objects.
The most important function of the TS class is the to_pandas() function, which returns a pandas.Series object with a date index, containing the data of the curve:
>>> curve = session.get_curve(name='pro ee wnd intraday mwh/h cet h a')
>>> ts = curve.get_data(data_from="2018-01-01", data_to="2018-01-05",
...                     frequency="D", function="SUM")
>>> ts.to_pandas()
2018-01-01 00:00:00+01:00 2169.0
2018-01-02 00:00:00+01:00 3948.0
2018-01-03 00:00:00+01:00 1489.0
2018-01-04 00:00:00+01:00 1860.0
Freq: D, Name: pro ee wnd intraday mwh/h cet h a, dtype: float64
Have a look at our examples or at the pandas documentation to see how to work with pandas.Series or pandas.DataFrame objects.
The TS class contains some simple aggregation functions, which can be used directly on a TS object: sum(), mean() and median().
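As an illustration of what these three aggregations compute, the sketch below applies the standard-library equivalents to the four daily values from the to_pandas() output shown earlier. It deliberately does not call the TS methods themselves, so it runs without API access. How the TS class handles missing values is not covered here.

```python
from statistics import mean, median

# The four daily sums from the to_pandas() example output.
daily_values = [2169.0, 3948.0, 1489.0, 1860.0]

total = sum(daily_values)
average = mean(daily_values)
middle = median(daily_values)

print(total, average, middle)  # 9466.0 2366.5 2014.5
```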