User Guide

The MULTIPLY Data Access Component is intended to be used via its Python API, so most of this section deals with Usage via the Python API. To see how to manually edit the data stores file, see Configuration. If you want to register a new data store for data that is saved on local disk, see How to add new Local Data Stores. Finally, if you find that the Data Access Component is missing functionality, you can extend it by Implementing a new File System or Implementing a new Meta Info Provider. Once you have both set up, you can create a new data store by editing the default data stores YAML file.

  • implementing new file systems

  • implementing new meta info providers

  • implementing new data types

Usage via the Python API

This section gives an overview of how the Data Access Component can be used from Python. The only class that is meant to be used directly is DataAccessComponent.

DataAccessComponent

class multiply_data_access.data_access_component.DataAccessComponent[source]

The controlling component. The data access component is responsible for communicating with the various data stores and decides which data is used from which data store.

can_put(data_type: str) → bool[source]
Parameters

data_type – A data type.

Returns

True, if data of this type can be added to at least one data store.

create_local_data_store(base_dir: Optional[str] = None, meta_info_file: Optional[str] = None, base_pattern: Optional[str] = '/dt/yy/mm/dd/', id: Optional[str] = None, supported_data_types: Optional[str] = None)[source]

Adds a new local data store and saves it permanently. It will consist of a LocalFileSystem and a JsonMetaInfoProvider.

Parameters

base_dir – The base directory to which the data shall be written.

meta_info_file – A JSON file that already contains meta information about the data that is present in the folder. If not provided, an empty file will be created and filled with the data that match the base directory and the base pattern.

base_pattern – A pattern that defines how data is arranged within the base directory. Available options are ‘dt’ for the data type, ‘yy’ for the year, ‘mm’ for the month, and ‘dd’ for the day, arrangeable in any order. If no pattern is given, all data will simply be written into the base directory.

id – An identifier for the Data Store. If a Data Store with that name already exists, a number will be appended to the name.

supported_data_types – A string with the comma-separated names of the data types that shall be allowed in this data store. If this is None or empty, the data types will be derived from the data sets in the JSON file. If there are no entries in the JSON file, they will be guessed from the data in the file system.

get_data_urls(roi: str, start_time: str, end_time: str, data_types: str) → List[str][source]

Builds a query from the given parameters and asks all data stores whether they contain data that match the query. If data sets are found, URLs to their locations are returned.

Returns

A list of URLs to locally stored files that match the conditions given by the query parameters.

get_data_urls_from_data_set_meta_infos(data_set_meta_infos: List[multiply_data_access.data_access.DataSetMetaInfo]) → List[str][source]

Builds a query from the given parameters and asks all data stores whether they contain data that match the query. If data sets are found, URLs to their locations are returned.

Returns

A list of URLs to locally stored files that match the conditions given by the query parameters.

get_provided_data_types() → List[str][source]
Returns

A list of all data types that are provided by the Data Access Component.

put(path: str, data_store_id: Optional[str] = None) → None[source]

Puts data into the data access component. If the id of a data store is provided, the data access component will attempt to put the data into that store. If the data cannot be added to that particular store, no attempt is made to put it into another one. If no store id is provided, the data access component will try to determine an apt data store on its own. A data store is considered apt if it already holds data of the same type.

Parameters

path – A path to the data that shall be added to the Data Access Component.

data_store_id – The id of a data store. Can be None.

query(roi: str, start_time: str, end_time: str, data_types: str) → List[multiply_data_access.data_access.DataSetMetaInfo][source]

Distributes the query to all registered data stores and returns meta information on all data sets that meet the conditions of the query.

Parameters

roi – The region of interest, given as a WKT string.

start_time – The start time of the query, given as a string in UTC time format.

end_time – The end time of the query, given as a string in UTC time format.

data_types – The data types to be queried for, given as a comma-separated string.

Returns

A list of DataSetMetaInfos that meet the conditions of the query.

show_stores()[source]

Prints out a list of all registered data stores.
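The following is a minimal sketch of how the four string arguments of query and get_data_urls might be assembled. The helper names (bbox_to_wkt, to_utc_string, build_query_args), the exact time format, and the data type names are assumptions made for illustration; they are not part of the Data Access Component.

```python
from datetime import datetime, timezone


def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT polygon for a lon/lat bounding box."""
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # repeat the first corner to close the ring
    ]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON(({ring}))"


def to_utc_string(moment):
    """Format an aware datetime as a UTC string (the format is an assumption)."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


def build_query_args(bbox, start, end, data_types):
    """Collect roi, start_time, end_time and data_types as strings."""
    return (
        bbox_to_wkt(*bbox),
        to_utc_string(start),
        to_utc_string(end),
        ",".join(data_types),
    )


args = build_query_args(
    (10.0, 50.0, 11.0, 51.0),
    datetime(2018, 6, 1, tzinfo=timezone.utc),
    datetime(2018, 6, 30, tzinfo=timezone.utc),
    ["TYPE_A", "TYPE_B"],  # hypothetical data type names
)

# With the package installed, the arguments could then be passed on:
#   from multiply_data_access.data_access_component import DataAccessComponent
#   dac = DataAccessComponent()
#   meta_infos = dac.query(*args)
#   urls = dac.get_data_urls(*args)
```

Use get_provided_data_types to find out which data type names are actually available before building a query.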

DataSetMetaInfo

class multiply_data_access.data_access.DataSetMetaInfo(coverage: str, start_time: Optional[str], end_time: Optional[str], data_type: str, identifier: str, referenced_data: Optional[str] = None)[source]

A representation of meta information about a data set. To be retrieved from a query on a MetaInfoProvider or DataStore.

property coverage

The dataset’s spatial coverage, given as WKT string.

property data_type

The type of the dataset.

property end_time

The dataset’s end time. Can be None.

equals(other: object) → bool[source]

Checks whether two data set meta infos are equal. Does not check the identifier or referenced data sets!

equals_except_data_type(other: object) → bool[source]

Checks whether two data set meta infos are equal, except that they may have different data types. Does not check the identifier or referenced data sets!

property identifier

An identifier so that the data set can be found on the Data Store’s File System.

property referenced_data

A list of additional files that are referenced by this data set. Can be None.

property start_time

The dataset’s start time. Can be None.

How to add new Local Data Stores

You can add a new local data store via the Python API by calling create_local_data_store on a DataAccessComponent. This will create a new data store consisting of a LocalFileSystem and a JsonMetaInfoProvider.

All parameters are optional. The default for the base directory is the .multiply folder in the user’s home directory. The base directory will be checked for any pre-existing data. This data will be registered in the store if it is of any of the supported data types. If you do not specify the supported data types, the Data Access Component will determine these from the entries in the JSON meta info file. If no meta info file is provided, the data types will be determined from the data in the base directory. If no data can be found there either, the data store is not created.
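As a hedged sketch of such a call, the function below passes every parameter of create_local_data_store explicitly. RecordingComponent is a stand-in written for this example so that it runs without the package installed; the directory, the store id, and the data type names are hypothetical.

```python
def register_local_store(component, base_dir, store_id):
    """Call create_local_data_store with explicit keyword arguments."""
    component.create_local_data_store(
        base_dir=base_dir,
        meta_info_file=None,           # no existing JSON meta info file
        base_pattern="/dt/yy/mm/dd/",  # documented default: type/year/month/day
        id=store_id,
        supported_data_types="TYPE_A,TYPE_B",  # hypothetical data type names
    )


class RecordingComponent:
    """Stand-in for DataAccessComponent that only records the call."""

    def __init__(self):
        self.calls = []

    def create_local_data_store(self, base_dir=None, meta_info_file=None,
                                base_pattern="/dt/yy/mm/dd/", id=None,
                                supported_data_types=None):
        self.calls.append({
            "base_dir": base_dir,
            "meta_info_file": meta_info_file,
            "base_pattern": base_pattern,
            "id": id,
            "supported_data_types": supported_data_types,
        })


component = RecordingComponent()
register_local_store(component, "/data/multiply", "my_local_store")

# With the package installed, a real component would be used instead:
#   from multiply_data_access.data_access_component import DataAccessComponent
#   register_local_store(DataAccessComponent(), "/data/multiply", "my_local_store")
```

Afterwards, show_stores can be used to check that the new store has been registered.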

Implementing new Data Stores

If you need to create a completely new data store, you will probably need to implement both a new File System and a new Meta Info Provider (we advise checking first whether you can re-use existing File Systems and Meta Info Providers). This section is a guideline on how to do so. It is recommended to read Basic Concepts first.

Implementing a new File System

The basic decision is whether the file system shall be wrapped by a local file system or not. The wrapping functionality is provided by the LocallyWrappedFileSystem in locally_wrapped_data_access.py. Choose this if you want to access remote data but don’t want to bother with how to organize the data on the local disk.

Implementing a Non-Locally Wrapped File System

For this, you need to adhere to the interfaces FileSystemAccessor and FileSystem defined in data_access.py. The following lists the methods of these interfaces that need to be implemented:

class multiply_data_access.data_access.FileSystemAccessor[source]
classmethod create_from_parameters(parameters: dict) → multiply_data_access.data_access.FileSystem[source]

Returns a FileSystem object.

classmethod name() → str[source]

The name of the file system implementation.

name: Shall return the name of the file system.

create_from_parameters: Will receive a dictionary of parameters and create a file system by handing these in as the initialization parameters. Shall correspond to the dictionary handed out by FileSystem’s get_parameters_as_dict.

class multiply_data_access.data_access.FileSystem[source]

An abstraction of a file system on which data sets are physically stored

abstract can_put() → bool[source]
Returns

True, if data can be put into this file system.

abstract get(data_set_meta_info: multiply_data_access.data_access.DataSetMetaInfo) → Sequence[FileRef][source]

Retrieves a sequence of ‘FileRef’s.

abstract get_parameters_as_dict() → dict[source]
Returns

The parameters of this file system as dict

abstract classmethod name() → str[source]
Returns

The name of the file system implementation.

abstract put(from_url: str, data_set_meta_info: multiply_data_access.data_access.DataSetMetaInfo) → multiply_data_access.data_access.DataSetMetaInfo[source]

Adds a data set to the file system by copying it from the given url to the expected location within the file system. Returns an updated data set meta info.

abstract remove(data_set_meta_info: multiply_data_access.data_access.DataSetMetaInfo)[source]

Removes all data sets from the file system that are described by the data set meta info

abstract scan() → Sequence[multiply_data_access.data_access.DataSetMetaInfo][source]

Retrieves a sequence of data set meta informations of all file refs found in the file system.

name: Shall simply return the name of the file system. This will serve as identifier.

get: From a DataSetMetaInfo, this returns FileRefs to the data that is ready to be accessed, i.e., is provided locally. This method would perform a download if necessary.

get_parameters_as_dict: This will return the parameters that are needed to reconstruct the file system. The parameters will eventually be written to the data stores file. Shall correspond to the dictionary handed in to the FileSystemAccessor’s create_from_parameters.

can_put: Shall return true when the Data Access Component can add data to the file system.

put: Will copy the data located at the URL to the file system and update the data set meta info. You may raise a UserWarning here if you do not support this operation. You can use the identifier of the data set meta info to locate the file on the file system more easily later on.

remove: Shall remove the file identified by the data set meta info from the file system. You may raise a UserWarning here if you do not support this operation.

scan: Retrieves data set meta infos for all data that is found on the file system. This expects to find the data that is directly, i.e., locally, available.
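The methods described above can be sketched as follows. This is an illustration under stated assumptions, not the library’s implementation: the classes do not inherit from the real FileSystem and FileSystemAccessor (a real extension would), plain paths and names stand in for FileRef and DataSetMetaInfo objects, and DirectoryFileSystem and its root parameter are hypothetical. The point it shows is that create_from_parameters and get_parameters_as_dict must mirror each other.

```python
import os
import shutil
import tempfile
from types import SimpleNamespace


class DirectoryFileSystem:
    """Keeps every data set as one file in a flat directory."""

    def __init__(self, root):
        self._root = root
        os.makedirs(root, exist_ok=True)

    @classmethod
    def name(cls):
        return "DirectoryFileSystem"

    def _path(self, data_set_meta_info):
        # The identifier is used to locate the file again later.
        return os.path.join(self._root, data_set_meta_info.identifier)

    def can_put(self):
        return True

    def put(self, from_url, data_set_meta_info):
        shutil.copy(from_url, self._path(data_set_meta_info))
        return data_set_meta_info  # returned unchanged in this sketch

    def get(self, data_set_meta_info):
        # The real interface returns FileRefs; plain paths stand in here.
        path = self._path(data_set_meta_info)
        return [path] if os.path.exists(path) else []

    def remove(self, data_set_meta_info):
        path = self._path(data_set_meta_info)
        if os.path.exists(path):
            os.remove(path)

    def scan(self):
        # The real interface returns DataSetMetaInfos; file names stand in.
        return sorted(os.listdir(self._root))

    def get_parameters_as_dict(self):
        return {"root": self._root}


class DirectoryFileSystemAccessor:
    @classmethod
    def name(cls):
        return DirectoryFileSystem.name()

    @classmethod
    def create_from_parameters(cls, parameters):
        # Must accept exactly what get_parameters_as_dict hands out.
        return DirectoryFileSystem(**parameters)


with tempfile.TemporaryDirectory() as work_dir:
    source = os.path.join(work_dir, "scene.txt")
    with open(source, "w") as handle:
        handle.write("payload")
    fs = DirectoryFileSystem(os.path.join(work_dir, "store"))
    info = SimpleNamespace(identifier="scene.txt")  # stand-in meta info
    fs.put(source, info)
    rebuilt = DirectoryFileSystemAccessor.create_from_parameters(
        fs.get_parameters_as_dict())
    stored_names = [os.path.basename(path) for path in rebuilt.get(info)]
    scanned = rebuilt.scan()
```

Rebuilding the file system from its own parameter dictionary, as done at the end, is a quick way to check that the two methods correspond.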

To later have the file system available in the data access component, you need to register it in the setup.py of your python package. The registration should look like this:


from setuptools import setup

# Register the accessor under the 'file_system_plugins' entry point group.
setup(
    name='my-multiply-data-access-extension',
    version='1.0',
    packages=['my_multiply_package'],
    entry_points={
        'file_system_plugins': [
            'my_file_system = my_multiply_package:my_file_system.MyFileSystemAccessor',
        ],
    },
)

Implementing a Locally Wrapped File System

A locally wrapped file system requires a FileSystemAccessor that should be defined as above. The LocallyWrappedFileSystem base class already implements some of the methods, but declares other abstract methods that need to be implemented. Note that all these methods are private.

Already implemented methods are:

  • get

  • get_parameters_as_dict

  • can_put

  • put

  • remove

  • scan

So the only method from the FileSystem interface that still needs implementing is name.

class multiply_data_access.locally_wrapped_data_access.LocallyWrappedFileSystem(parameters: dict)[source]
abstract _get_from_wrapped(data_set_meta_info: multiply_data_access.data_access.DataSetMetaInfo) → Sequence[FileRef][source]

Retrieves the file ref from the wrapped file system.

abstract _get_wrapped_parameters_as_dict() → dict[source]
Returns

The parameters of this wrapped file system as dict

abstract _init_wrapped_file_system(parameters: dict) → None[source]

Initializes the file system wrapped by the LocallyWrappedFileSystem. To be called instead of __init__.

abstract _notify_copied_to_local(data_set_meta_info: multiply_data_access.data_access.DataSetMetaInfo) → None[source]

Called when the data set has been copied to the local file system.

_init_wrapped_file_system: This method is called right after the creation of the object. Implement it to initialize the file system with parameters. Shall correspond to the dictionary handed out by _get_wrapped_parameters_as_dict.

_get_from_wrapped: Like get from the FileSystem, this will retrieve FileRefs to data. This data has to be provided locally, so any downloading has to be performed here.

_notify_copied_to_local: Informs the File System that the data designated by the data set meta info has been put to the local file system. You do not have to do anything here, but in case you have downloaded the data to a temporary directory, this is a good time to delete it from there.

_get_wrapped_parameters_as_dict: Similar to the FileSystem’s get_parameters_as_dict, this method will return the required initialization parameters in the form of a dictionary. Shall correspond to the dictionary handed in to _init_wrapped_file_system.
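A minimal sketch of these four hooks is given below, assuming a file system that downloads to a temporary directory. TempDownloadFileSystem and its temp_dir parameter are hypothetical, the download is simulated by writing a file, and the class does not inherit from the real LocallyWrappedFileSystem, so the hooks are called by hand here in the order in which the text describes them.

```python
import os
import tempfile
from types import SimpleNamespace


class TempDownloadFileSystem:
    """The hooks of a locally wrapped file system, without the base class."""

    def _init_wrapped_file_system(self, parameters):
        # Receives the same dictionary that
        # _get_wrapped_parameters_as_dict hands out.
        self._temp_dir = parameters["temp_dir"]
        os.makedirs(self._temp_dir, exist_ok=True)

    def _get_wrapped_parameters_as_dict(self):
        return {"temp_dir": self._temp_dir}

    def _temp_path(self, data_set_meta_info):
        return os.path.join(self._temp_dir, data_set_meta_info.identifier)

    def _get_from_wrapped(self, data_set_meta_info):
        # Any download has to happen here; writing a file simulates it.
        path = self._temp_path(data_set_meta_info)
        with open(path, "w") as handle:
            handle.write("downloaded content")
        return [path]  # the real hook returns FileRefs

    def _notify_copied_to_local(self, data_set_meta_info):
        # The data now lives in the local file system, so the
        # temporary copy can be deleted.
        path = self._temp_path(data_set_meta_info)
        if os.path.exists(path):
            os.remove(path)


with tempfile.TemporaryDirectory() as temp_dir:
    fs = TempDownloadFileSystem()
    fs._init_wrapped_file_system({"temp_dir": temp_dir})
    info = SimpleNamespace(identifier="scene.txt")  # stand-in meta info
    refs = fs._get_from_wrapped(info)
    existed_after_download = os.path.exists(refs[0])
    fs._notify_copied_to_local(info)
    exists_after_cleanup = os.path.exists(refs[0])
    params_round_trip = fs._get_wrapped_parameters_as_dict() == {
        "temp_dir": temp_dir}
```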

Implementing a new Meta Info Provider

In many cases when you require your own dedicated File System, you will want to add a Meta Info Provider. As for the File System, you also have the choice to create a locally wrapped version of it or not. The wrapping functionality is provided by the LocallyWrappedMetaInfoProvider in locally_wrapped_data_access.py. Choose this if you want to provide information about remotely stored data and keep it separated from information about data from this source that has already been downloaded.

Implementing a Non-Locally Wrapped Meta Info Provider

To implement a regular Meta Info Provider, you need to create realizations of the interfaces MetaInfoProviderAccessor and MetaInfoProvider defined in data_access.py. The MetaInfoProviderAccessor is required by the DataAccessComponent so that MetaInfoProviders can be registered and created. The following lists the methods of the MetaInfoProviderAccessor interface that need to be implemented:

class multiply_data_access.data_access.MetaInfoProviderAccessor[source]
classmethod create_from_parameters(parameters: dict) → multiply_data_access.data_access.MetaInfoProvider[source]

Returns a MetaInfoProvider object.

classmethod name() → str[source]

The name of the meta info provider implementation.

name: Shall return the name of the meta info provider.

create_from_parameters: Will receive a dictionary of parameters and create a meta info provider by handing the parameters in as the initialization parameters. Shall correspond to the dictionary handed out by the MetaInfoProvider’s _get_parameters_as_dict.

The methods to be implemented for the MetaInfoProvider are:

class multiply_data_access.data_access.MetaInfoProvider[source]

An abstraction of a provider that contains meta information about the files provided by a data store.

abstract _get_parameters_as_dict() → dict[source]
Returns

The parameters of this meta info provider as dict

abstract can_update() → bool[source]
Returns

True, if this meta info provider can be updated.

abstract get_all_data() → Sequence[multiply_data_access.data_access.DataSetMetaInfo][source]

Returns all available data set meta infos.

abstract get_provided_data_types() → List[str][source]
Returns

A list of the data types provided by this data store.

abstract classmethod name() → str[source]

The name of the meta info provider implementation.

abstract provides_data_type(data_type: str) → bool[source]

Whether the meta info provider provides access to data of the queried type.

Parameters

data_type – A string labelling the data.

Returns

True if data of that type can be requested from the meta info provider.

abstract query(query_string: str) → List[multiply_data_access.data_access.DataSetMetaInfo][source]

Processes a query and retrieves a result. The result will consist of all the data sets that satisfy the query.

Returns

A list of meta information about data sets that fulfill the query.

abstract remove(data_set_meta_info: multiply_data_access.data_access.DataSetMetaInfo)[source]

Removes information about this data set from its internal registry.

abstract update(data_set_meta_info: multiply_data_access.data_access.DataSetMetaInfo)[source]

Adds information about the data set to its internal registry.

name: Shall simply return the name of the meta info provider. This will serve as identifier.

query: Evaluates a query string and returns a list of data set meta infos about available data that fulfils the query. A query string consists of a geometry in the form of a WKT string, a start time in UTC format, an end time in UTC format, and a comma-separated list of data types.

provides_data_type: True, if the meta info provider is apt for this data. Returning true here does not necessarily mean that data of this type is currently stored.

get_provided_data_types: Returns a list of all data types that this meta info provider supports.

_get_parameters_as_dict: A private method that will return the parameters that are needed to reconstruct the meta info provider. The parameters will eventually be written to the data stores file. Shall correspond to the dictionary handed in to the MetaInfoProviderAccessor’s create_from_parameters.

can_update: Shall return true when entries about data available on the file system can be added to this meta info provider.

update: Hands in a data set that has been put to the file system. The meta info provider is expected to store this information and return it when it matches an incoming query. If this is not implemented, make sure that can_update returns false.

remove: Shall remove the entry associated with the data set meta info from the provider’s registry. If this is not implemented, make sure that can_update returns false.

get_all_data: Shall return data set meta infos about all available data.
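The sketch below illustrates these methods with an in-memory provider. Several points are assumptions made for the example: the four parts of the query string are taken to be separated by semicolons (the separator is not stated here), times are compared as equally formatted strings, the spatial part of the query is ignored, and SimpleNamespace objects stand in for DataSetMetaInfo. A real provider would subclass MetaInfoProvider.

```python
from types import SimpleNamespace


def parse_query_string(query_string):
    """Split a query string into roi, start, end and a list of data types."""
    roi, start_time, end_time, data_types = query_string.split(";")
    return roi, start_time, end_time, [t for t in data_types.split(",") if t]


def overlaps(entry, start, end):
    """True if the entry's time range intersects [start, end]."""
    starts_in_time = entry.start_time is None or entry.start_time <= end
    ends_in_time = entry.end_time is None or entry.end_time >= start
    return starts_in_time and ends_in_time


class InMemoryMetaInfoProvider:
    def __init__(self, parameters):
        self._supported = list(parameters["supported_data_types"])
        self._entries = []

    @classmethod
    def name(cls):
        return "InMemoryMetaInfoProvider"

    def provides_data_type(self, data_type):
        return data_type in self._supported

    def get_provided_data_types(self):
        return list(self._supported)

    def can_update(self):
        return True  # update and remove are both implemented

    def update(self, data_set_meta_info):
        self._entries.append(data_set_meta_info)

    def remove(self, data_set_meta_info):
        self._entries = [entry for entry in self._entries
                         if entry.identifier != data_set_meta_info.identifier]

    def get_all_data(self):
        return list(self._entries)

    def query(self, query_string):
        # The roi is parsed but not evaluated in this sketch.
        _, start, end, data_types = parse_query_string(query_string)
        return [entry for entry in self._entries
                if entry.data_type in data_types
                and overlaps(entry, start, end)]

    def _get_parameters_as_dict(self):
        return {"supported_data_types": list(self._supported)}


provider = InMemoryMetaInfoProvider(
    {"supported_data_types": ["TYPE_A", "TYPE_B"]})  # hypothetical types
provider.update(SimpleNamespace(
    identifier="a", data_type="TYPE_A",
    start_time="2018-06-01T00:00:00", end_time="2018-06-02T00:00:00"))
provider.update(SimpleNamespace(
    identifier="b", data_type="TYPE_B",
    start_time="2019-01-01T00:00:00", end_time="2019-01-02T00:00:00"))
hits = provider.query(
    "POLYGON((10 50, 11 50, 11 51, 10 51, 10 50));"
    "2018-05-01T00:00:00;2018-07-01T00:00:00;TYPE_A,TYPE_B")
hit_ids = [hit.identifier for hit in hits]
```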

As for the File System, the Meta Info Provider needs to be registered in the setup.py of the python package to make it available for the data access component. The registration should look like this:


from setuptools import setup

# Register the accessor under the 'meta_info_provider_plugins' entry point group.
setup(
    name='my-multiply-data-access-extension',
    version='1.0',
    packages=['my_multiply_package'],
    entry_points={
        'meta_info_provider_plugins': [
            'my_meta_info_provider = my_multiply_package:my_meta_info_provider.MyMetaInfoProviderAccessor',
        ],
    },
)

Implementing a Locally Wrapped Meta Info Provider

A locally wrapped meta info provider is a special type of meta info provider and requires a MetaInfoProviderAccessor that should be defined as above. The LocallyWrappedMetaInfoProvider base class already implements some of the methods, but declares other abstract methods that need to be implemented. Note that all these methods are private and are never to be called from another class.

Already implemented methods are:

  • query

  • _get_parameters_as_dict

  • can_update

  • update

  • remove

  • get_all_data

So the only methods from the MetaInfoProvider interface that still need implementing are name, provides_data_type, and get_provided_data_types.

class multiply_data_access.locally_wrapped_data_access.LocallyWrappedFileSystem(parameters: dict)[source]
abstract _get_wrapped_parameters_as_dict() → dict[source]
Returns

The parameters of this wrapped file system as dict

_init_wrapped_meta_info_provider: This method is called right after the creation of the object. Implement it to initialize the meta info provider with parameters. Shall correspond to the dictionary handed out by _get_wrapped_parameters_as_dict.

_query_wrapped_meta_info_provider: Evaluates a query string and returns a list of data set meta infos about available data that fulfils the query. A query string consists of a geometry in the form of a wkt string, a start time in UTC format, an end time in UTC format, and a comma-separated list of data types.

_get_wrapped_parameters_as_dict: Similar to the MetaInfoProvider’s _get_parameters_as_dict, this method will return the required initialization parameters in the form of a dictionary. Shall correspond to the dictionary handed in to _init_wrapped_meta_info_provider.
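The hooks above, together with the three remaining interface methods, might look roughly like this. Everything specific in the sketch is an assumption: CatalogueMetaInfoProvider, its catalogue_url parameter, the example URL, and the data type names are hypothetical, a fixed list replaces the remote catalogue, the query string is taken to end with a semicolon-separated list of data types, and the class does not inherit from the real LocallyWrappedMetaInfoProvider.

```python
class CatalogueMetaInfoProvider:
    """The hooks of a locally wrapped meta info provider, without the base class."""

    def _init_wrapped_meta_info_provider(self, parameters):
        self._catalogue_url = parameters["catalogue_url"]
        # A real implementation would contact the remote catalogue;
        # a fixed list of entries stands in for it here.
        self._remote_entries = [
            {"identifier": "remote_a", "data_type": "TYPE_A"},
            {"identifier": "remote_b", "data_type": "TYPE_B"},
        ]

    def _get_wrapped_parameters_as_dict(self):
        # Mirrors what _init_wrapped_meta_info_provider receives.
        return {"catalogue_url": self._catalogue_url}

    def _query_wrapped_meta_info_provider(self, query_string):
        # Only the data types are evaluated in this sketch.
        requested = query_string.split(";")[-1].split(",")
        return [entry for entry in self._remote_entries
                if entry["data_type"] in requested]

    @classmethod
    def name(cls):
        return "CatalogueMetaInfoProvider"

    def get_provided_data_types(self):
        return ["TYPE_A", "TYPE_B"]

    def provides_data_type(self, data_type):
        return data_type in self.get_provided_data_types()


provider = CatalogueMetaInfoProvider()
provider._init_wrapped_meta_info_provider(
    {"catalogue_url": "https://example.org/catalogue"})
remote_hits = provider._query_wrapped_meta_info_provider(
    "POLYGON((10 50, 11 50, 11 51, 10 51, 10 50));"
    "2018-05-01T00:00:00;2018-07-01T00:00:00;TYPE_B")
remote_ids = [hit["identifier"] for hit in remote_hits]
```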