User Guide
The MULTIPLY Data Access Component is meant to be used via its Python API, so most of this section deals with the Usage via the Python API. To see how to manually edit the data stores file, see Configuration. If you want to register a new data store from data that is saved on local disk, see How to add new Local Data Stores. Finally, if you find that the Data Access Component is missing functionality, you can extend it by Implementing a new File System or Implementing a new Meta Info Provider. Once you have both set up, you can create a new data store by editing the default data stores YAML file.
Usage via the Python API
This section gives an overview of how the Data Access Component can be used within Python. The only component that is meant to be used directly is the DataAccessComponent object.
DataAccessComponent
-
class
multiply_data_access.data_access_component.
DataAccessComponent
[source]¶ The controlling component. The data access component is responsible for communicating with the various data stores and decides which data is used from which data store.
-
can_put
(data_type: str) → bool[source]¶ - Parameters
data_type – A data type.
- Returns
True, if data of this type can be added to at least one data store.
-
create_local_data_store
(base_dir: Optional[str] = None, meta_info_file: Optional[str] = None, base_pattern: Optional[str] = '/dt/yy/mm/dd/', id: Optional[str] = None, supported_data_types: Optional[str] = None)[source]¶ Adds a new local data store and saves it permanently. It will consist of a LocalFileSystem and a JsonMetaInfoProvider.
- Parameters
base_dir – The base directory to which the data shall be written.
meta_info_file – A JSON file that already contains meta information about the data that is present in the folder. If not provided, an empty file will be created and filled with the data that match the base directory and the base pattern.
base_pattern – A pattern that determines how data is organized within the base directory. Available options are 'dt' for the data type, 'yy' for the year, 'mm' for the month, and 'dd' for the day, which can be arranged in any order. If no pattern is given, all data will simply be written into the base directory.
id – An identifier for the data store. If a data store with that name already exists, a number will be appended to the name.
supported_data_types – A string with the comma-separated names of the data types that shall be allowed in this data store. If this is None or empty, the data types will be derived from the data sets in the JSON file. If there are no entries in the JSON file, they will be guessed from the data in the file system.
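As a rough illustration of how a base pattern could translate into a directory layout, the sketch below expands the placeholders for a single data set. The helper name `resolve_pattern`, the four-digit year, the zero-padded month and day, and the example data type `S2_L1C` are assumptions made for this illustration; the actual LocalFileSystem may differ in detail.

```python
from datetime import date
from pathlib import PurePosixPath


def resolve_pattern(base_dir: str, base_pattern: str, data_type: str, day: date) -> str:
    """Expand 'dt', 'yy', 'mm' and 'dd' segments of a base pattern (hypothetical helper)."""
    replacements = {
        "dt": data_type,
        "yy": f"{day.year:04d}",   # assumption: four-digit year
        "mm": f"{day.month:02d}",  # assumption: zero-padded month
        "dd": f"{day.day:02d}",    # assumption: zero-padded day
    }
    # Split the pattern into its path segments and drop the empty ones
    # produced by leading and trailing slashes.
    segments = [segment for segment in base_pattern.split("/") if segment]
    resolved = [replacements.get(segment, segment) for segment in segments]
    return str(PurePosixPath(base_dir, *resolved))


print(resolve_pattern("/data", "/dt/yy/mm/dd/", "S2_L1C", date(2018, 6, 1)))
```

With an empty pattern the function simply returns the base directory, which mirrors the statement that all data is then written directly into it.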
-
get_data_urls
(roi: str, start_time: str, end_time: str, data_types: str) → List[str][source]¶ Builds a query from the given parameters and asks all data stores whether they contain data that match the query. If data sets are found, URLs to their locations are returned.
- Returns
A list of URLs to locally stored files that match the conditions given by the query parameters.
-
get_data_urls_from_data_set_meta_infos
(data_set_meta_infos: List[multiply_data_access.data_access.DataSetMetaInfo]) → List[str][source]¶ Builds a query from the given parameters and asks all data stores whether they contain data that match the query. If data sets are found, URLs to their locations are returned.
- Returns
A list of URLs to locally stored files that match the conditions given by the query parameters.
-
get_provided_data_types
() → List[str][source]¶ - Returns
A list of all data types that are provided by the Data Access Component.
-
put
(path: str, data_store_id: Optional[str] = None) → None[source]¶ Puts data into the Data Access Component. If the id of a data store is provided, the Data Access Component will attempt to put the data into that store. If the data cannot be added to that particular store, no attempt is made to put it into another one. If no store id is provided, the Data Access Component will try to determine an apt data store on its own. A data store is considered apt if it already holds data of the same type.
- Parameters
path – A path to the data that shall be added to the Data Access Component.
data_store_id – The id of a data store. Can be None.
-
query
(roi: str, start_time: str, end_time: str, data_types: str) → List[multiply_data_access.data_access.DataSetMetaInfo][source]¶ Distributes the query to all registered data stores and returns meta information on all data sets that meet the conditions of the query.
- Parameters
roi – The region of interest, given as a WKT string.
start_time – The start time of the query, given as a string in UTC time format.
end_time – The end time of the query, given as a string in UTC time format.
data_types – A comma-separated list of data types to be queried for.
- Returns
A list of DataSetMetaInfos that meet the conditions of the query.
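The methods above are typically combined as in the following minimal sketch. The helper `find_data` is hypothetical and accepts any object that offers `query` and `get_data_urls` with the signatures listed above, so it runs without the package installed. The commented lines show how a real component would presumably be created, assuming the constructor takes no arguments.

```python
from typing import List


def find_data(dac, roi: str, start_time: str, end_time: str, data_types: str) -> List[str]:
    """Report matching data sets and return URLs to them (hypothetical helper)."""
    # query() returns meta information on all data sets that match the conditions.
    for info in dac.query(roi, start_time, end_time, data_types):
        print(info.data_type, info.start_time, info.end_time, info.identifier)
    # get_data_urls() returns URLs to locally stored files that match the same conditions.
    return dac.get_data_urls(roi, start_time, end_time, data_types)


# With the package installed, usage might look like this (assumptions: no-argument
# constructor, 'S2_L1C' as an available data type, ISO-like UTC time strings):
#
#     from multiply_data_access.data_access_component import DataAccessComponent
#     dac = DataAccessComponent()
#     urls = find_data(dac, "POLYGON((9 53, 10 53, 10 54, 9 54, 9 53))",
#                      "2018-06-01", "2018-06-10", "S2_L1C")
```

Passing the component in as an argument keeps the helper easy to test with a stand-in object.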
-
DataSetMetaInfo
-
class
multiply_data_access.data_access.
DataSetMetaInfo
(coverage: str, start_time: Optional[str], end_time: Optional[str], data_type: str, identifier: str, referenced_data: Optional[str] = None)[source]¶ A representation of meta information about a data set. To be retrieved from a query on a MetaInfoProvider or DataStore.
-
property
coverage
¶ The dataset’s spatial coverage, given as WKT string.
-
property
data_type
¶ The type of the dataset.
-
property
end_time
¶ The dataset’s end time. Can be None.
-
equals
(other: object) → bool[source]¶ Checks whether two data set meta infos are equal. Does not check the identifier or referenced data sets!
-
equals_except_data_type
(other: object) → bool[source]¶ Checks whether two data set meta infos are equal, except that they may differ in data type. Does not check the identifier or referenced data sets!
-
property
identifier
¶ An identifier so that the data set can be found on the Data Store’s File System.
-
property
referenced_data
¶ A list of additional files that are referenced by this data set. Can be None.
-
property
start_time
¶ The dataset’s start time. Can be None.
How to add new Local Data Stores
You can add a new local data store via the Python API by calling create_local_data_store on the DataAccessComponent. This will create a new data store consisting of a LocalFileSystem and a JsonMetaInfoProvider.
All parameters are optional.
The default for the base directory is the .multiply folder in the user’s home directory.
The base directory will be checked for any pre-existing data.
This data will be registered in the store if it is of any of the supported data types.
If you do not specify the supported data types, the Data Access Component will determine these from the entries in the JSON meta info file.
If no meta info file is provided, the data types will be determined from the data in the base directory.
If no data can be found there either, the data store is not created.
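A call along these lines might look as follows. The wrapper `add_local_store`, the store id `my_local_store` and the example paths are assumptions for illustration; only the keyword arguments are taken from the signature of create_local_data_store documented above. The wrapper accepts any object that offers this method, so the sketch runs without the package installed.

```python
from typing import Optional


def add_local_store(dac, base_dir: str, supported_data_types: Optional[str] = None) -> None:
    """Register a local data store on a DataAccessComponent-like object (hypothetical helper)."""
    dac.create_local_data_store(
        base_dir=base_dir,
        # Organize data by data type, then year, month and day (the documented default).
        base_pattern="/dt/yy/mm/dd/",
        # Illustrative id; a number is appended if a store with this name already exists.
        id="my_local_store",
        # None lets the component derive the data types itself.
        supported_data_types=supported_data_types,
    )


# With the package installed, usage might look like this (assuming a no-argument constructor):
#
#     from multiply_data_access.data_access_component import DataAccessComponent
#     add_local_store(DataAccessComponent(), "/path/to/my/data", "S2_L1C")
```

The meta_info_file argument is left out here, in which case, according to the description above, an empty file is created and filled from the data found in the base directory.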
Implementing new Data Stores
If you need to create a completely new data store, you will probably need to implement both a new File System and a new Meta Info Provider (we advise checking first whether you can re-use existing File Systems and Meta Info Providers). This section is a guideline on how to do so. It is recommended to read Basic Concepts first.
Implementing a new File System
The basic decision is whether the file system shall be wrapped by a local file system or not.
The wrapping functionality is provided by the LocallyWrappedFileSystem in locally_wrapped_data_access.py.
Choose this if you want to access remote data but don’t want to bother with how to organize the data on the local disk.
Implementing a Non-Locally Wrapped File System
For this, you need to adhere to the interfaces FileSystemAccessor and FileSystem defined in data_access.py.
The following lists the methods of the interfaces that need to be implemented:
-
class
multiply_data_access.data_access.
FileSystemAccessor
[source]¶
* name: Shall return the name of the file system.
* create_from_parameters: Will receive a set of parameters and create a file system by handing these in as the initialization parameters. They shall correspond to the dictionary handed out by the FileSystem’s get_parameters_as_dict.
-
class
multiply_data_access.data_access.
FileSystem
[source]¶ An abstraction of a file system on which data sets are physically stored
-
abstract
get
(data_set_meta_info: multiply_data_access.data_access.DataSetMetaInfo) → Sequence[FileRef][source]¶ Retrieves a sequence of ‘FileRef’s.
-
abstract
get_parameters_as_dict
() → dict[source]¶ - Returns
The parameters of this file system as dict
-
abstract
put
(from_url: str, data_set_meta_info: multiply_data_access.data_access.DataSetMetaInfo) → multiply_data_access.data_access.DataSetMetaInfo[source]¶ Adds a data set to the file system by copying it from the given url to the expected location within the file system. Returns an updated data set meta info.
* name: Shall simply return the name of the file system. This will serve as identifier.
* get: Given a DataSetMetaInfo, this returns FileRefs to the data that is ready to be accessed, i.e., is provided locally. This part would perform a download if necessary.
* get_parameters_as_dict: Returns the parameters that are needed to reconstruct the file system. The parameters will eventually be written to the data stores file. They shall correspond to the dictionary handed in to the FileSystemAccessor’s create_from_parameters.
* can_put: Shall return true when the Data Access Component can add data to the file system.
* put: Will copy the data located at the URL to the file system and update the data set meta info. You may raise a UserWarning here if you do not support this operation. You can use the identifier of the data set meta info to later relocate the file on the file system more easily.
* remove: Shall remove the file identified by the data set meta info from the file system. You may raise a UserWarning here if you do not support this operation.
* scan: Retrieves data set meta infos for all data that is found on the file system. This expects to find the data that is directly, i.e., locally, available.
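To make the contract of these methods more concrete, here is a minimal sketch of a file system that stores every data set as one file in a single directory. It is not taken from the package: the class names, the `path` parameter, the use of class methods for `name`, and the use of plain paths in place of FileRef objects are all assumptions. The classes stand alone so that the sketch runs without the package; a real plugin would subclass FileSystem and FileSystemAccessor from data_access.py.

```python
import os
import shutil
from typing import List


class DirectoryFileSystem:
    """Keeps each data set as a single file named after its identifier (hypothetical)."""

    def __init__(self, parameters: dict):
        self._path = parameters["path"]  # 'path' is an assumed parameter name
        os.makedirs(self._path, exist_ok=True)

    @classmethod
    def name(cls) -> str:
        return "DirectoryFileSystem"

    def get_parameters_as_dict(self) -> dict:
        # Must mirror what the accessor's create_from_parameters() receives.
        return {"path": self._path}

    def can_put(self) -> bool:
        return True

    def put(self, from_url: str, data_set_meta_info):
        # The identifier is used as file name so the file can be relocated later.
        shutil.copy(from_url, os.path.join(self._path, data_set_meta_info.identifier))
        return data_set_meta_info

    def get(self, data_set_meta_info) -> List[str]:
        # Plain paths stand in for FileRef objects in this sketch.
        return [os.path.join(self._path, data_set_meta_info.identifier)]

    def remove(self, data_set_meta_info) -> None:
        os.remove(os.path.join(self._path, data_set_meta_info.identifier))

    def scan(self) -> List[str]:
        # A real implementation would return DataSetMetaInfo objects, not identifiers.
        return sorted(os.listdir(self._path))


class DirectoryFileSystemAccessor:
    """Creates DirectoryFileSystem instances from stored parameters (hypothetical)."""

    @classmethod
    def name(cls) -> str:
        return DirectoryFileSystem.name()

    @classmethod
    def create_from_parameters(cls, parameters: dict) -> DirectoryFileSystem:
        return DirectoryFileSystem(parameters)
```

The important design point is the round trip: whatever get_parameters_as_dict returns must be sufficient for create_from_parameters to rebuild an equivalent file system.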
To later have the file system available in the Data Access Component, you need to register it in the setup.py of your Python package.
The registration should look like this:

    from setuptools import setup

    setup(
        name='my-multiply-data-access-extension',
        version='1.0',
        packages=['my_multiply_package'],
        entry_points={
            'file_system_plugins': [
                'my_file_system = my_multiply_package:my_file_system.MyFileSystemAccessor'
            ],
        },
    )
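After installing a package that registers such an entry point, you can check that the registration is visible. The sketch below uses the standard library's importlib.metadata to list whatever is registered under a given group; the helper name `list_plugins` is hypothetical, and how the Data Access Component itself discovers plugins is not covered here.

```python
from importlib.metadata import entry_points
from typing import Dict


def list_plugins(group: str) -> Dict[str, str]:
    """Return a mapping of entry point names to their targets for one group (hypothetical helper)."""
    all_entry_points = entry_points()
    if hasattr(all_entry_points, "select"):
        # Python 3.10 and later: selectable entry points.
        selected = all_entry_points.select(group=group)
    else:
        # Python 3.8 and 3.9: a plain dictionary keyed by group name.
        selected = all_entry_points.get(group, [])
    return {entry_point.name: entry_point.value for entry_point in selected}


print(list_plugins("file_system_plugins"))
```

If the registration worked, the printed dictionary should contain the name given on the left-hand side of the entry point definition.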
Implementing a Locally Wrapped File System
A locally wrapped file system requires a FileSystemAccessor that should be defined as above.
The LocallyWrappedFileSystem base class already implements some of the methods, but puts up other method stubs that need to be implemented.
Note that all these stub methods are private.
Already implemented methods are:
* get
* get_parameters_as_dict
* can_put
* put
* remove
* scan
So the only method from the FileSystem interface that still needs implementing is name.
-
class
multiply_data_access.locally_wrapped_data_access.
LocallyWrappedFileSystem
(parameters: dict)[source]¶ -
abstract
_get_from_wrapped
(data_set_meta_info: multiply_data_access.data_access.DataSetMetaInfo) → Sequence[FileRef][source]¶ Retrieves the file ref from the wrapped file system.
-
abstract
_get_wrapped_parameters_as_dict
() → dict[source]¶ - Returns
The parameters of this wrapped file system as dict
* _init_wrapped_file_system: This method is called right after the creation of the object. Implement it to initialize the file system with parameters. These shall correspond to the dictionary handed out by _get_wrapped_parameters_as_dict.
* _get_from_wrapped: Like get from the File System: Will retrieve FileRefs to data. This data has to be provided locally, so any downloading has to be performed here.
* _notify_copied_to_local: Informs the File System that the data designated by the data set meta info has been put to the local file system. You do not have to do anything here, but in case you have downloaded the data to a temporary directory, this is a good time to delete it from there.
* _get_wrapped_parameters_as_dict: Similar to the FileSystem’s get_parameters_as_dict, this method will return the required initialization parameters in the form of a dictionary. These shall correspond to the dictionary handed in to _init_wrapped_file_system.
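The interplay of these four hooks can be sketched as follows. The class name, the `source_dir` parameter and the simulated download (a copy from a source directory into a temporary directory) are assumptions for illustration, and plain paths stand in for FileRef objects. The class stands alone so that the sketch runs without the package; a real implementation would subclass LocallyWrappedFileSystem.

```python
import os
import shutil
import tempfile
from typing import List


class RemoteArchiveHooks:
    """Sketch of the hooks a locally wrapped file system has to provide (hypothetical)."""

    def _init_wrapped_file_system(self, parameters: dict) -> None:
        self._source_dir = parameters["source_dir"]  # assumed parameter name
        # Downloads are staged here until they have been copied to the local file system.
        self._temp_dir = tempfile.mkdtemp()

    def _get_wrapped_parameters_as_dict(self) -> dict:
        # Must round-trip with the dictionary handed in to _init_wrapped_file_system().
        return {"source_dir": self._source_dir}

    def _get_from_wrapped(self, data_set_meta_info) -> List[str]:
        source = os.path.join(self._source_dir, data_set_meta_info.identifier)
        target = os.path.join(self._temp_dir, data_set_meta_info.identifier)
        shutil.copy(source, target)  # stands in for the actual download
        return [target]

    def _notify_copied_to_local(self, data_set_meta_info) -> None:
        # The data now lives in the local file system, so the staged copy can go.
        staged = os.path.join(self._temp_dir, data_set_meta_info.identifier)
        if os.path.exists(staged):
            os.remove(staged)
```

Cleaning up in _notify_copied_to_local rather than right after the download avoids deleting the staged file before the wrapping file system has copied it.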
Implementing a new Meta Info Provider
In many cases when you require your own dedicated File System, you will want to add a Meta Info Provider.
As for the File System, you also have the choice to create a locally wrapped version of it or not.
The wrapping functionality is provided by the LocallyWrappedMetaInfoProvider in locally_wrapped_data_access.py.
Choose this if you want to provide information about remotely stored data and keep it separated from information about data from this source that has already been downloaded.
Implementing a Non-Locally Wrapped Meta Info Provider
To implement a regular Meta Info Provider, you need to create realizations of the interfaces MetaInfoProviderAccessor and MetaInfoProvider defined in data_access.py.
The MetaInfoProviderAccessor is required by the DataAccessComponent so that MetaInfoProviders can be registered and created.
The following lists the methods of the MetaInfoProviderAccessor interface that need to be implemented:
-
class
multiply_data_access.data_access.
MetaInfoProviderAccessor
[source]¶
* name: Shall return the name of the meta info provider.
* create_from_parameters: Will receive a set of parameters and create a meta info provider by handing the parameters in as the initialization parameters. They shall correspond to the dictionary handed out by the MetaInfoProvider’s _get_parameters_as_dict.
The methods to be implemented for the MetaInfoProvider are:
-
class
multiply_data_access.data_access.
MetaInfoProvider
[source]¶ An abstraction of a provider that contains meta information about the files provided by a data store.
-
abstract
_get_parameters_as_dict
() → dict[source]¶ - Returns
The parameters of this meta info provider as dict
-
abstract
get_all_data
() → Sequence[multiply_data_access.data_access.DataSetMetaInfo][source]¶ Returns all available data set meta infos.
-
abstract
get_provided_data_types
() → List[str][source]¶ - Returns
A list of the data types provided by this data store.
-
abstract
provides_data_type
(data_type: str) → bool[source]¶ Whether the meta info provider provides access to data of the queried type.
- Parameters
data_type – A string labelling the data.
- Returns
True if data of that type can be requested from the meta info provider.
-
abstract
query
(query_string: str) → List[multiply_data_access.data_access.DataSetMetaInfo][source]¶ Processes a query and retrieves a result. The result will consist of all the data sets that satisfy the query.
- Returns
A list of meta information about data sets that fulfill the query.
* name: Shall simply return the name of the meta info provider. This will serve as identifier.
* query: Evaluates a query string and returns a list of data set meta infos about available data that fulfils the query. A query string consists of a geometry in the form of a WKT string, a start time in UTC format, an end time in UTC format, and a comma-separated list of data types.
* provides_data_type: True, if the meta info provider is apt for this data. Returning true here does not necessarily mean that data of this type is currently stored.
* get_provided_data_types: Returns a list of all data types that this meta info provider supports.
* _get_parameters_as_dict: A private method that will return the parameters that are needed to reconstruct the meta info provider. The parameters will eventually be written to the data stores file. They shall correspond to the dictionary handed in to the MetaInfoProviderAccessor’s create_from_parameters.
* can_update: Shall return true when entries about data available on the file system can be added to this meta info provider.
* update: Hands in a data set that has been put to the file system. The meta info provider is expected to store this information and retrieve it when it meets an incoming query. If this is not implemented, make sure that can_update returns false.
* remove: Shall remove the entry associated with the data set meta info from the provider’s registry. If this is not implemented, make sure that can_update returns false.
* get_all_data: Shall return data set meta infos about all available data.
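The methods listed above can be sketched with a provider that keeps its entries in memory. Several things here are assumptions rather than documented behaviour: the class name, the `supported_data_types` parameter, the use of ';' to separate the four parts of the query string, and the comparison of times as equally formatted ISO strings. The geometry is ignored to keep the sketch free of third-party dependencies, and the class stands alone instead of subclassing MetaInfoProvider so that it runs without the package.

```python
from typing import List, Tuple


def parse_query(query_string: str) -> Tuple[str, str, str, List[str]]:
    """Split a query string into ROI, start time, end time and data types.

    The ';' separator is an assumption made for this sketch.
    """
    roi, start_time, end_time, data_types = query_string.rsplit(";", 3)
    return roi, start_time, end_time, [t.strip() for t in data_types.split(",") if t.strip()]


class InMemoryMetaInfoProvider:
    """Keeps data set meta infos in a list (hypothetical, for illustration only)."""

    def __init__(self, parameters: dict):
        self._supported = [t for t in parameters["supported_data_types"].split(",") if t]
        self._entries = []

    @classmethod
    def name(cls) -> str:
        return "InMemoryMetaInfoProvider"

    def _get_parameters_as_dict(self) -> dict:
        # Must mirror what the accessor's create_from_parameters() receives.
        return {"supported_data_types": ",".join(self._supported)}

    def provides_data_type(self, data_type: str) -> bool:
        # True for supported types even if no data of that type is stored yet.
        return data_type in self._supported

    def get_provided_data_types(self) -> List[str]:
        return list(self._supported)

    def can_update(self) -> bool:
        return True

    def update(self, data_set_meta_info) -> None:
        self._entries.append(data_set_meta_info)

    def remove(self, data_set_meta_info) -> None:
        self._entries = [e for e in self._entries if e.identifier != data_set_meta_info.identifier]

    def get_all_data(self) -> list:
        return list(self._entries)

    def query(self, query_string: str) -> list:
        # The geometry is deliberately ignored in this sketch.
        _, start_time, end_time, data_types = parse_query(query_string)
        return [
            entry for entry in self._entries
            if entry.data_type in data_types
            # Keep entries whose time range overlaps the queried range; None means unbounded.
            and (entry.start_time is None or entry.start_time <= end_time)
            and (entry.end_time is None or entry.end_time >= start_time)
        ]
```

A real provider would also test whether each entry's coverage intersects the queried geometry, for example with a geometry library.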
As for the File System, the Meta Info Provider needs to be registered in the setup.py of the Python package to make it available for the Data Access Component.
The registration should look like this:

    from setuptools import setup

    setup(
        name='my-multiply-data-access-extension',
        version='1.0',
        packages=['my_multiply_package'],
        entry_points={
            'meta_info_provider_plugins': [
                'my_meta_info_provider = my_multiply_package:my_meta_info_provider.MyMetaInfoProviderAccessor'
            ],
        },
    )
Implementing a Locally Wrapped Meta Info Provider
A locally wrapped meta info provider is a special type of meta info provider and requires a MetaInfoProviderAccessor that should be defined as above.
The LocallyWrappedMetaInfoProvider base class already implements some of the methods, but puts up other method stubs that need to be implemented.
Note that all these stub methods are private and are never to be called from another class.
Already implemented methods are:
* query
* _get_parameters_as_dict
* can_update
* update
* remove
* get_all_data
So the only methods from the MetaInfoProvider interface that still need implementing are name, provides_data_type, and get_provided_data_types.
-
class
multiply_data_access.locally_wrapped_data_access.
LocallyWrappedFileSystem
(parameters: dict)[source] -
abstract
_get_wrapped_parameters_as_dict
() → dict[source] - Returns
The parameters of this wrapped file system as dict
* _init_wrapped_meta_info_provider: This method is called right after the creation of the object. Implement it to initialize the meta info provider with parameters. These shall correspond to the dictionary handed out by _get_wrapped_parameters_as_dict.
* _query_wrapped_meta_info_provider: Evaluates a query string and returns a list of data set meta infos about available data that fulfils the query. A query string consists of a geometry in the form of a WKT string, a start time in UTC format, an end time in UTC format, and a comma-separated list of data types.
* _get_wrapped_parameters_as_dict: Similar to the MetaInfoProvider’s _get_parameters_as_dict, this method will return the required initialization parameters in the form of a dictionary. These shall correspond to the dictionary handed in to _init_wrapped_meta_info_provider.