OpenECPDS Documentation
Our mission with OpenECPDS is to keep data moving.
Inspired by operational excellence. Powered by open-source innovation. Acquire from anywhere. Deliver everywhere. Connect with confidence. Share without limits.
OpenECPDS is a multi-purpose data repository — the Data Store — that delivers three strategic data-related services:
- Data Acquisition — the automatic discovery and retrieval of data from data providers.
- Data Dissemination — the automatic distribution of data products to remote sites.
- Data Portal — the pulling and pushing of data initiated by remote sites.
Data Acquisition and Data Dissemination are active services initiated by OpenECPDS, whereas the Data Portal is a passive service triggered by incoming requests from remote sites. The Data Portal provides interactive access to the Dissemination and Acquisition services.
OpenECPDS enhances data services by integrating innovative technologies to streamline the acquisition, dissemination, and storage of data across diverse environments and protocols.
Why OpenECPDS
- Acquire from anywhere: Automatically discover and retrieve data from providers over FTP, SFTP, FTPS, HTTP/S, Amazon S3, Azure and Google Cloud Storage.
- Deliver everywhere: Disseminate products to more than 1,000 destinations across 80+ countries with a fully customisable, retry-aware transfer scheduler.
- Object Data Store: Store data as objects with metadata and a globally unique identifier, with replication across local storage and cloud platforms.
- Real-time notifications: An embedded MQTT broker and client enable instant notifications and integration with the WMO WIS2 infrastructure.
- Container-native: Build and run with Docker and a development container; scale from a laptop to hundreds of systems and petabytes of data.
- Open and extensible: A modular architecture supports new protocols through extensions, backed by a commitment to long-term maintenance.
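To make the Object Data Store idea concrete, the following is a minimal Python sketch of objects that carry metadata and a globally unique identifier and are replicated to several backends. It is an illustration only, not the OpenECPDS implementation: the names `StoredObject` and `ReplicatedStore`, the backend labels `local` and `cloud`, and the use of random UUIDs as identifiers are all assumptions made for this example.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoredObject:
    """One stored item: payload, descriptive metadata and a unique identifier.

    Hypothetical type for illustration; not part of OpenECPDS.
    """
    content: bytes
    metadata: dict
    # Assumption: a random (version 4) UUID serves as the unique identifier.
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class ReplicatedStore:
    """Writes every object to all configured backends (plain dicts here)."""

    def __init__(self, backend_names):
        # Each backend is an in-memory dict keyed by object identifier.
        self.backends = {name: {} for name in backend_names}

    def put(self, content: bytes, **metadata) -> StoredObject:
        obj = StoredObject(content, dict(metadata))
        # Replicate: the same object is recorded in every backend.
        for objects in self.backends.values():
            objects[obj.object_id] = obj
        return obj

    def get(self, object_id: str, backend: str) -> StoredObject:
        # Raises KeyError if the backend or the identifier is unknown.
        return self.backends[backend][object_id]


store = ReplicatedStore(["local", "cloud"])
saved = store.put(b"example payload", product="forecast", step="24")
```

Because the object is addressed by its identifier rather than by a path, either backend can serve a read: `store.get(saved.object_id, "cloud")` returns the same object that was written locally.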
Quick links
- New here? Start with System Requirements → Installation → First Run.
- Understand the system: Architecture Overview and Key Concepts.
- Configure transfers: Transfer Modules and the Host Directory Field.
- Operate & monitor: Event Logging and the MQTT Notification System.
Architecture at a glance
| Component | Role |
|---|---|
| Master Server | Central coordinator — authentication, metadata, scheduling, Data Mover allocation. |
| Mover Server (Data Mover) | Moves bytes — connects to remote systems via transfer modules, stores/streams content. |
| Monitor Server | Web monitoring interface for destinations, transfers and hosts. |
| Data Portal | Incoming FTP/HTTPS/S3 access for remote sites to push and pull data. |
| Database | Persists metadata, destinations, hosts, transfers and history. |
See the Architecture Overview for how these components work together, and Continental Data Movers for geographically distributed dissemination.
Core capabilities
- Multiple protocols — FTP, SFTP, FTPS, HTTP/S, Amazon S3, Azure Blob and Google Cloud Storage. See Protocols & Connections.
- Object storage — hierarchy-free storage that can emulate directory structures. See Object Storage.
- Notification system — embedded MQTT broker and client. See MQTT Overview.
- Data compression — lzma, zip, gzip, bzip2, lbzip2, lz4, snappy.
- Data checksumming — MD5 for remote integrity, ADLER32 in the Data Store.
- Garbage collection — automatic removal of expired data.
- Data backup — map data sets to existing archiving systems.
See Additional Features for details.
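As a rough illustration of the checksumming and compression capabilities listed above, the sketch below computes MD5 and Adler-32 digests and performs a gzip round trip using only the Python standard library. It does not show how OpenECPDS itself computes or stores these values; the helper name `checksums` and the sample payload are assumptions made for this example.

```python
import gzip
import hashlib
import zlib


def checksums(payload: bytes) -> dict:
    """Return MD5 and Adler-32 digests of payload as lowercase hex strings.

    Hypothetical helper for illustration; not an OpenECPDS API.
    """
    return {
        "md5": hashlib.md5(payload).hexdigest(),
        # Mask to an unsigned 32-bit value and pad to eight hex digits.
        "adler32": format(zlib.adler32(payload) & 0xFFFFFFFF, "08x"),
    }


payload = b"Wikipedia"
digests = checksums(payload)

# gzip is one of the listed compression formats; the round trip is lossless.
compressed = gzip.compress(payload)
restored = gzip.decompress(compressed)

# Digests taken before compression and after decompression must agree.
intact = checksums(restored) == digests
```

Adler-32 is much cheaper to compute than MD5 but offers weaker error detection, which is a common reason to use it for frequent internal checks while keeping a stronger digest for data exchanged with remote systems.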
Support & resources
- Javadoc API documentation
- Support Materials
- Glossary of key terms
- Contributing and Changelog