This is an overview of the steps a provider needs to take to integrate with the C-SCALE Data federation.
The C-SCALE federation integrates providers of spatio-temporal data who wish to facilitate easy access to and analysis of that data. The federation welcomes providers of Earth Observation as well as in-situ data and primarily targets data obtained within the Copernicus Programme. However, providers of spatio-temporal data from other sources are also welcome.
Integration into the federation takes place along two lines:
Both lines of integration are explained in the following text. Please note that this guide does not discuss C-SCALE's motivation for choosing the technologies or approaches discussed herein. For more on the reasoning, consult C-SCALE's deliverables: Copernicus Data Access and Querying Design and Copernicus Data Lookup, access and Dissemination Final Implementation Report (TBD).
For the reader's convenience, this is a checklist of requirements that must be met by a site to fully integrate with the Data federation:
Requirement | Check |
---|---|
GOCDB Registration | |
STAC API | |
HTTPS Interface | |
EGI Check-in integration | |
The Metadata Query Service is the principal tool for discovering data across the federation. To make your own data discoverable through it, you need to complete two steps:
The provider catalogue maintains information on federation partners. It does not keep track of any actual spatio-temporal metadata. The provider catalogue is kept in the Grid Configuration Database (GOCDB).
Log in to GOCDB using your NGI/institute account. Consult the GOCDB documentation if you run into trouble.
There is a catch in the form of the Level of Assurance (LoA) of your account. If it is not sufficient (that is, if you see the attribute `https://aai.egi.eu/LoA#Low` when logging in through EGI Check-in), you will receive an error. You can also get an error if your institute's IdP does not meet all the requirements for research status or security standards. In that case, authorization to GOCDB cannot be granted automatically and you have to request access explicitly.
After signing in, you also have to register an account by clicking User status → Register.
NGIs are official, national-level organisations that concentrate the nation's resources. If none exists for your country, having one registered requires official action.
A Site is a semi-physical location where services are concentrated. For C-SCALE, you can reuse an existing site if applicable and simply add new services to it. Otherwise it is necessary to register a new site specifically for C-SCALE. Sites must be created by the appropriate NGI Operations Manager. If you need a new site created, find out who yours is and contact them.
Either obtain permission to manage services in the site yourself (the Site Administrator role), or arrange for an authorized person to do it on your behalf.
The Scope is a tag that shows the purpose of the site. A site may carry multiple tags. A `C-SCALE` tag is registered with GOCDB, and all sites in the federation must be marked with it. Please set the `C-SCALE` tag for your site. You may list all sites with the tag already set, e.g., with curl:
curl 'https://goc.egi.eu/gocdbpi/public/?method=get_site_list&scope=C-SCALE'
This is the list the Metadata Query Service will consider in its searches!
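The same lookup can be scripted. The following is a minimal Python sketch that rebuilds the URL used in the curl call and pulls site names out of a response. It assumes the response is XML with one `SITE` element per site carrying a `NAME` attribute; the function names and the sample document (including the site names in it) are made up for illustration, so check a real response before relying on this layout.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

GOCDB_PI = "https://goc.egi.eu/gocdbpi/public/"


def site_list_url(scope="C-SCALE"):
    """Build the get_site_list URL for the given scope tag."""
    return GOCDB_PI + "?" + urlencode({"method": "get_site_list", "scope": scope})


def site_names(xml_text):
    """Return the NAME attribute of every SITE element (assumed layout)."""
    root = ET.fromstring(xml_text)
    return [site.get("NAME") for site in root.iter("SITE")]


# Made-up sample standing in for a real GOCDB response.
SAMPLE = """<results>
  <SITE ID="1" NAME="EXAMPLE-SITE-A" COUNTRY="Exampleland"/>
  <SITE ID="2" NAME="EXAMPLE-SITE-B" COUNTRY="Exampleland"/>
</results>"""

if __name__ == "__main__":
    print(site_list_url())
    print(site_names(SAMPLE))
```

To run it against the live service, the text passed to `site_names` could come from `urllib.request.urlopen(site_list_url()).read()`.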
Register your service. We are using the service type `CUSTOM_SERVICE`. The input is largely free form.
Register your endpoints. At the moment the most important one is a STAC endpoint. There may also be additional value in registering `OpenSearch` and `OData` endpoints. Give the protocol (especially `STAC`) as the `NAME` of the endpoint. This is imperative! The MQS uses the name to identify STAC endpoints.
If the registration is done correctly, services integrated with C-SCALE, such as the MQS, can list supported endpoints with the C-SCALE tag. You may try for yourself, e.g.:
curl 'https://goc.egi.eu/gocdbpi/public/?method=get_service_endpoint&scope=C-SCALE'
If you see your endpoint listed, you have done this step correctly and the MQS will become aware of your service within an hour. An up-to-date list of sites included by MQS in its queries can be found at https://eo-mqs.c-scale.eu/stac/v1/data-providers
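To check specifically that an endpoint is registered under the name `STAC`, the response can be filtered by endpoint name. The sketch below is only an illustration: it assumes each `SERVICE_ENDPOINT` element contains a `HOSTNAME` child and nested `ENDPOINT` elements with `NAME` and `URL` children, and it uses exact name matching, which may differ from what the MQS does. The helper name, hostname, and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET


def endpoints_named(xml_text, name="STAC"):
    """Return (hostname, url) pairs for endpoints whose NAME equals `name`.

    The element layout is an assumption of this sketch, not a documented
    contract; adjust the tag names to match the actual response.
    """
    root = ET.fromstring(xml_text)
    found = []
    for service in root.iter("SERVICE_ENDPOINT"):
        hostname = service.findtext("HOSTNAME", default="")
        for endpoint in service.iter("ENDPOINT"):
            endpoint_name = (endpoint.findtext("NAME") or "").strip()
            if endpoint_name == name:
                found.append((hostname, endpoint.findtext("URL", default="")))
    return found


# Made-up sample standing in for a real get_service_endpoint response.
SAMPLE = """<results>
  <SERVICE_ENDPOINT PRIMARY_KEY="0000">
    <HOSTNAME>stac.example.org</HOSTNAME>
    <SERVICE_TYPE>CUSTOM_SERVICE</SERVICE_TYPE>
    <ENDPOINTS>
      <ENDPOINT><NAME>STAC</NAME><URL>https://stac.example.org/api</URL></ENDPOINT>
      <ENDPOINT><NAME>OData</NAME><URL>https://stac.example.org/odata/v1</URL></ENDPOINT>
    </ENDPOINTS>
  </SERVICE_ENDPOINT>
</results>"""

if __name__ == "__main__":
    print(endpoints_named(SAMPLE))
```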
The preferred interface for discovery is STAC-API, STAC being the SpatioTemporal Asset Catalog. The Metadata Query Service provides it to users and also prefers to have it made available by downstream catalogues.
The MQS currently supports the STAC-API core query parameters for searching. It is expected that the provided STAC endpoint can handle those parameters as well. If only a static implementation is set up, the MQS will still be able to detect your catalogue and make it available for browsing, but users will not be able to apply the STAC search on your data.
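To give an idea of what such a search looks like, here is a small sketch that assembles the JSON body of a STAC API Item Search request from a bounding box, a time range, and a list of collections. The parameter names (`bbox`, `datetime`, `collections`, `limit`) come from the STAC API Item Search specification; which subset the MQS actually forwards is not stated in this guide. The function names and the collection id `sentinel-2-l2a` are hypothetical.

```python
import json
from datetime import datetime, timezone


def _rfc3339(moment):
    """Format a datetime as RFC 3339 in UTC; None marks an open interval end."""
    if moment is None:
        return ".."
    if moment.tzinfo is None:
        # Assumption of this sketch: naive datetimes are already in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_search_body(bbox=None, start=None, end=None, collections=None, limit=10):
    """Assemble a dict suitable for POSTing as JSON to a STAC /search endpoint."""
    body = {"limit": limit}
    if bbox is not None:
        if len(bbox) not in (4, 6):
            raise ValueError("bbox must have 4 (2D) or 6 (3D) numbers")
        body["bbox"] = list(bbox)
    if start is not None or end is not None:
        body["datetime"] = f"{_rfc3339(start)}/{_rfc3339(end)}"
    if collections:
        body["collections"] = list(collections)
    return body


if __name__ == "__main__":
    body = build_search_body(
        bbox=(14.0, 49.0, 15.0, 50.0),
        start=datetime(2023, 1, 1, tzinfo=timezone.utc),
        collections=["sentinel-2-l2a"],  # hypothetical collection id
        limit=5,
    )
    print(json.dumps(body, indent=2))
```

A dynamic STAC endpoint is expected to answer such a request with matching Items; a static catalogue has no search endpoint to send it to, which is why it can only be browsed.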
Congratulations, you are done with the discovery part of your setup. Provided you have correctly registered your STAC endpoint in the previous step, this is all you need to do.
Unfortunately it is mostly up to you to provide one, and make sure the underlying catalogue reflects the data collections you are offering. There are multiple ways of achieving this.
This is the approach several existing C-SCALE partners have selected for their sites. There are multiple implementations of a STAC catalogue; one that can be recommended is RESTO. An extensive list of tools for creating STAC servers is maintained on the STAC Index website. It is also a useful resource for finding software to generate STAC-compliant metadata for your datasets, e.g., STAC Collections and STAC Items containing links and assets.
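For orientation, a STAC Item is a GeoJSON Feature with a few additional fields. The sketch below builds a bare-bones Item as a plain Python dictionary using only the standard library. It is an illustration rather than a replacement for the dedicated tools listed on STAC Index: the helper name, the identifiers, and all URLs are hypothetical, and the footprint is simplified to the bounding box.

```python
import json


def make_item(item_id, bbox, acquired, collection, data_href, self_href, collection_href):
    """Build a minimal STAC Item whose footprint is its bounding box."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "collection": collection,
        "bbox": [west, south, east, north],
        "geometry": {
            "type": "Polygon",
            # Closed ring, counter-clockwise, first point repeated at the end.
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": acquired},
        "links": [
            {"rel": "self", "href": self_href},
            {"rel": "collection", "href": collection_href},
        ],
        "assets": {"data": {"href": data_href, "title": "Product download"}},
    }


if __name__ == "__main__":
    item = make_item(
        item_id="example-product-001",
        bbox=(14.0, 49.0, 15.0, 50.0),
        acquired="2023-01-01T10:00:00Z",
        collection="example-collection",
        data_href="https://data.example.org/products/example-product-001.zip",
        self_href="https://stac.example.org/collections/example-collection/items/example-product-001",
        collection_href="https://stac.example.org/collections/example-collection",
    )
    print(json.dumps(item, indent=2))
```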
Specifically for provider sites whose main data catalogue is operated by the ESA/SERCO DHuS, there is a set of tools that can subsequently fill a STAC catalogue with metadata for the products stored in your DHuS. One such component, although still a work in progress, is the `register-stac.sh` script. It wraps around stac-tools to reconstruct STAC metadata for individual Sentinel products and register them in your STAC catalogue.
This approach consists in setting up a stand-alone translation service between the discovery API you are currently using and STAC. We are aware of no implementations of this approach.
STAC metadata are quite detailed and many products currently in use in the context of Earth Observation do not even store metadata at that level of detail – most notably ESA/SERCO DHuS. Before considering a translation component, make sure that sufficient level of detail is available in your catalogue and can be referenced by your discovery protocol.
The Metadata Query Service is an Open Source tool whose source code is freely available on GitHub. Contributions are welcome to introduce additional features, including backend support for additional discovery protocols.
Spatio-temporal data items are typically organized into collections. C-SCALE does not mandate a fixed collection structure – partners who already operate catalogues are welcome to keep their current one – but those building a catalogue afresh can benefit from a recommended set of collections for ESA Sentinel data.
Currently available upon request.
Services redistributing Copernicus data typically require user registration and authentication. The purpose of integrating with the C-SCALE federation along the Access/Authentication line is to allow users to rely on their existing identities when accessing data resources across C-SCALE.
The second objective being followed is enabling access regardless of distance.
Sites integrated with the C-SCALE data federation are expected to make data downloadable over HTTP. Support for any other method or protocol to access the data is welcome but not mandated. In terms of authentication, there are two options:
Like the C-SCALE cloud compute federation, the Data federation relies on the EGI Check-in service for federated authentication. Integration with that service is explained in its own documentation:
Aside from mere authentication, many Copernicus data providers also require specific attributes to be provided for incoming users. The following set of attributes is typically available for users who have enrolled in one of the virtual organisations currently supported by C-SCALE and managed in Perun:
Name in Perun | Name in Perun RPC | Name in LDAP | OIDC Scope | OIDC Claim | OIDC Value | Type |
---|---|---|---|---|---|---|
C-SCALE Company | `urn:perun:user:attribute-def:def:cscaleCompany` | `cscaleCompany` | `org` | `org_name` | `urn:mace:egi.eu:res:c-scale:company:<COMPANY>#aai.egi.eu` | string |
C-SCALE User Category | `urn:perun:user:attribute-def:def:cscaleUserCategory` | `cscaleUserCategory` | `eduperson_entitlement` | `eduperson_entitlement` | `urn:mace:egi.eu:res:c-scale:user-category:<USER_CATEGORY>#aai.egi.eu` | "commercial", "education", "government", "research", "other" |
C-SCALE Accept Email | `urn:perun:user:attribute-def:def:cscaleAcceptEmail` | `cscaleAcceptEmail` | `eduperson_entitlement` | `eduperson_entitlement` | [Boolean] true or false | boolean ("true" or "") |
C-SCALE Research Field | `urn:perun:user:attribute-def:def:cscaleResearchField` | `cscaleResearchField` | `eduperson_entitlement` | `eduperson_entitlement` | `urn:mace:egi.eu:res:c-scale:research-field:<RESEARCH_FIELD>#aai.egi.eu` | "atmosphere", "climate", "emergency", "energy", "land", "marine", "security", "other" |
C-SCALE User Function | `urn:perun:user:attribute-def:def:cscaleUserFunction` | `cscaleUserFunction` | `eduperson_entitlement` | `eduperson_entitlement` | `urn:mace:egi.eu:res:c-scale:user-function:<USER_FUNCTION>#aai.egi.eu` | "expert", "manager", "researcher", "technician", "other" |
C-SCALE Country | `urn:perun:user:attribute-def:def:cscaleCountry` | `cscaleCountry` | `address` | `country` | [String] As saved in Perun | |
These attributes can be used by sites for usage statistics, reporting, profiling, or even authorization decisions.
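As an illustration of how a site might consume these values, the sketch below extracts the C-SCALE attributes from a list of `eduperson_entitlement` strings, following the URN pattern given in the OIDC Value column, and applies a simple category check. The function names and the access policy are hypothetical; how the claim list is obtained from EGI Check-in (for example from the userinfo response) is outside the scope of this example.

```python
import re

# Pattern follows the OIDC values listed for the entitlement-based attributes:
# urn:mace:egi.eu:res:c-scale:<attribute>:<value>#aai.egi.eu
_ENTITLEMENT = re.compile(
    r"^urn:mace:egi\.eu:res:c-scale:(?P<attr>[a-z-]+):(?P<value>[^#]+)#aai\.egi\.eu$"
)


def cscale_attributes(entitlements):
    """Map attribute name (e.g. 'user-category') to its value.

    Entitlements that do not match the C-SCALE pattern are ignored.
    """
    attrs = {}
    for entitlement in entitlements:
        match = _ENTITLEMENT.match(entitlement)
        if match:
            attrs[match.group("attr")] = match.group("value")
    return attrs


def is_allowed(attrs, allowed_categories=("research", "education")):
    """Hypothetical policy: admit only the listed user categories."""
    return attrs.get("user-category") in allowed_categories


if __name__ == "__main__":
    claims = [
        "urn:mace:egi.eu:res:c-scale:user-category:research#aai.egi.eu",
        "urn:mace:egi.eu:res:c-scale:research-field:land#aai.egi.eu",
    ]
    attrs = cscale_attributes(claims)
    print(attrs, is_allowed(attrs))
```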
Many sites contracted by ESA to operate a node in the Data Hub Relay Framework are obliged to allow access only to collaborative partners, i.e., formal members of the ESA Collaborative Ground Segment. Still, this does not prevent them from integrating with the C-SCALE Data federation. There are multiple options to adopt: