ECPDS Command-Line Tool
The ecpds command-line tool submits local data files to a specified destination on OpenECPDS. It provides options to fine-tune transfer requests, monitor transfer statuses, and manage scheduled transfers. This page describes the workflow for submitting a local file in both synchronous and asynchronous modes.
Synchronous Push
This is the default mode for submitting data files to OpenECPDS. A single execution of the ecpds command both transfers the file and registers its metadata. Once OpenECPDS has successfully processed the file, it returns a Data File ID, which can be used to track the file via the OpenECPDS monitoring interface. A returned Data File ID guarantees that the file has been correctly registered and stored in the Data Store.
Workflow steps:
- The ecpds command connects to the Master Server, authenticates, and sends metadata (e.g., source hostname, user ID, filename, location, size).
- The Master Server allocates a Data File ID for the file and assigns a Data Mover to receive its content. It then returns the hostname and port of the selected Data Mover, along with the Data File ID, to the ecpds command.
- The ecpds command connects to the Data Mover using the provided hostname and port and transfers the file content.
- The Data Mover sends an acknowledgment of file reception to the Master Server.
- The Master Server sends the acknowledgment to the ecpds command, including the Data File ID.
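The five steps above can be sketched as a small in-memory simulation. This is only an illustration of the message flow, not the OpenECPDS implementation: the class names, method names, the size check, and the way a Data Mover is selected are all assumptions made for the example, and a dictionary lookup stands in for opening a network connection.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class DataMover:
    """Hypothetical stand-in for a Data Mover; keeps content in memory."""
    hostname: str
    port: int
    store: dict = field(default_factory=dict)

    def receive(self, data_file_id, content, master):
        # Step 3: the file content arrives from the client.
        self.store[data_file_id] = content
        # Step 4: acknowledge reception to the Master Server.
        master.acknowledge(data_file_id, len(content))


class MasterServer:
    """Hypothetical stand-in for the Master Server."""

    def __init__(self, movers):
        self.movers = {(m.hostname, m.port): m for m in movers}
        self._addresses = list(self.movers)
        self._ids = count(1)
        self.metadata = {}
        self.acknowledged = set()

    def register(self, metadata):
        # Step 2: allocate a Data File ID and assign a Data Mover.
        # The modulo selection is an assumption, not the real policy.
        data_file_id = next(self._ids)
        hostname, port = self._addresses[data_file_id % len(self._addresses)]
        self.metadata[data_file_id] = metadata
        return data_file_id, hostname, port

    def acknowledge(self, data_file_id, received_size):
        # Assumed check: only accept the file if the size matches the metadata.
        if received_size == self.metadata[data_file_id]["size"]:
            self.acknowledged.add(data_file_id)

    def confirm(self, data_file_id):
        # Step 5: pass the acknowledgment, with the Data File ID, to the client.
        if data_file_id not in self.acknowledged:
            raise RuntimeError(f"Data File ID {data_file_id} was not acknowledged")
        return data_file_id


def synchronous_push(master, metadata, content):
    """Run the whole exchange in one call, as the synchronous mode does."""
    data_file_id, hostname, port = master.register(metadata)  # steps 1-2
    mover = master.movers[(hostname, port)]  # stands in for a network connection
    mover.receive(data_file_id, content, master)  # steps 3-4
    return master.confirm(data_file_id)  # step 5
```

The point of the sketch is that the caller only gets a Data File ID back after the Data Mover has acknowledged the content, which is why a returned ID can be treated as confirmation of storage.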
Asynchronous Push
The asynchronous mode is recommended when handling a large number of data files or high data volumes. In the first phase, metadata, including the file location and the source host for retrieval, is registered in OpenECPDS. In the second phase, OpenECPDS instructs the Data Movers to download the files from the source hosts. Typically, multiple files are registered at once, organised into groups, and retrieved in parallel streams managed by OpenECPDS. Load balancing distributes the workload across Data Movers and source hosts, which improves performance.
Submitting the request
- The ecpds command connects to the Master Server, authenticates, and sends metadata (e.g., source hostname, user ID, filename, location, size).
- The Master Server allocates a Data File ID for the file and assigns a Data Mover to receive its content. It then sends an acknowledgment to the ecpds command, including the Data File ID.
Retrieving the file content
Triggered by the Transfer Scheduler:
- The Master Server contacts the allocated Data Mover and requests that the data file be retrieved.
- Using the provided metadata, the Data Mover connects to the source host with the user ID and retrieves the content of the file based on its filename and location.
- After the retrieval is successfully completed, the Data Mover sends an acknowledgment to the Master Server.
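The two phases can be sketched as a registration call that returns immediately and a separate scheduler pass that performs the retrievals. This is a minimal model under stated assumptions: the `AsyncMaster` class, the status strings, the round-robin mover allocation, and the `fetch` callable (which stands in for a Data Mover connecting to the source host) are all hypothetical and not taken from OpenECPDS.

```python
from collections import deque
from itertools import count


class AsyncMaster:
    """Hypothetical model of the Master Server in asynchronous mode."""

    def __init__(self, mover_names):
        self.mover_names = mover_names
        self._ids = count(1)
        self.pending = deque()
        self.status = {}  # Data File ID -> assumed status string

    def register(self, metadata):
        """Phase 1: record the metadata and return a Data File ID at once."""
        data_file_id = next(self._ids)
        # Round-robin allocation is an assumption for this sketch.
        mover = self.mover_names[(data_file_id - 1) % len(self.mover_names)]
        self.pending.append((data_file_id, mover, metadata))
        self.status[data_file_id] = "registered"
        return data_file_id

    def run_scheduler(self, fetch):
        """Phase 2: ask each allocated Data Mover to retrieve its file."""
        while self.pending:
            data_file_id, mover, metadata = self.pending.popleft()
            try:
                # fetch(mover, metadata) models the mover pulling the file
                # from the source host using the registered metadata.
                fetch(mover, metadata)
                self.status[data_file_id] = "retrieved"
            except OSError:
                self.status[data_file_id] = "failed"
```

No file content moves during `register`, so many files can be queued quickly and the retrievals can be scheduled later.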
Once the request has been submitted, the retrieval of the data file can be tracked by running the ecpds command with the Data File ID. Tracking is also possible through the OpenECPDS monitoring interface.
Retrieval mechanism at ECMWF
At ECMWF, the forecast is produced and stored on the supercomputer (HPC) across multiple data nodes. Submission tasks register these files with the OpenECPDS Master Server, which records the requests. The Transfer Scheduler then asynchronously load-balances the data file retrievals across multiple Data Movers in each hall. This enables high parallelism between the HPC data nodes and the OpenECPDS Data Movers, maximising the use of the local network. The maximum number of simultaneous data retrievals is configurable in OpenECPDS.
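The idea of spreading retrievals across Data Movers while capping the number that run at once can be illustrated with a thread pool. This is a rough sketch only: the function name, its parameters, and the round-robin assignment are assumptions, and the actual policy used by the Transfer Scheduler is not described here.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def retrieve_all(files, movers, max_simultaneous, retrieve):
    """Assign files to movers in turn and run at most max_simultaneous at once.

    retrieve(filename, mover) is a hypothetical callable that performs one
    retrieval. Returns a dict mapping each filename to its assigned mover.
    """
    # Round-robin pairing: file 0 -> mover 0, file 1 -> mover 1, and so on.
    assignments = list(zip(files, cycle(movers)))
    with ThreadPoolExecutor(max_workers=max_simultaneous) as pool:
        # list() waits for completion and re-raises any worker exception.
        list(pool.map(lambda pair: retrieve(*pair), assignments))
    return dict(assignments)
```

The `max_simultaneous` argument plays the role of the configurable limit on simultaneous retrievals: raising it increases parallelism, lowering it protects the network and the source hosts.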
Data submission requests to OpenECPDS are grouped under a specific name. At the end of a batch submission, an ecpds command is executed to track the retrieval of all files in the group. Once all files have been successfully retrieved, the ecpds command returns.
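Waiting for a whole group to finish can be pictured as a polling loop that returns only when every file in the group has been retrieved. The sketch below does not call the ecpds command and does not reproduce its options; `get_statuses`, the `"retrieved"` status string, and the timeout behaviour are assumptions chosen for illustration.

```python
import time


def wait_for_group(get_statuses, poll_interval=1.0, timeout=60.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Block until every file in a group reports the status "retrieved".

    get_statuses is a hypothetical callable returning a dict that maps
    Data File IDs to status strings for one group.
    """
    deadline = clock() + timeout
    while True:
        statuses = get_statuses()
        # An empty group is never treated as complete.
        if statuses and all(s == "retrieved" for s in statuses.values()):
            return statuses
        if clock() >= deadline:
            raise TimeoutError("group retrieval did not complete in time")
        sleep(poll_interval)
```

Placing a call like this at the end of a batch submission gives the batch job a single point at which it knows all of its files have arrived.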
Related
- Data Portal
- Dissemination
- Lifecycle of a Data Transfer
- PRS event fields: product status updates from ecpds