Qubed

Qubed is a Rust library (with Python bindings) for working with trees of datacubes — a compressed data structure that efficiently represents large, sparse collections of key-value identifiers.

What is a Qube?

In many domains — particularly meteorological data — datasets are labelled by sets of key-value pairs called identifiers:

{ class: "od", expver: "0001", param: "1", levtype: "sfc" }

When datasets are dense (every combination of key values exists), they can be represented as a single datacube. In practice, however, datasets are sparse — not every combination is valid. A Qube represents these sparse datasets as a compressed tree of dense datacubes, achieving massive compression while still supporting efficient operations.

For example, a dataset with over 1 billion distinct identifiers can be stored in a Qube with just a few thousand nodes, fitting in a few megabytes of memory.

At a Glance

root
├── class=od, expver=0001/0002, param=1/2
└── class=rd
    ├── expver=0001, param=1/2/3
    └── expver=0002, param=1/2

This tree represents 9 unique identifiers using only 5 nodes instead of 9 leaf entries. Each path from root to leaf defines a dense datacube; the tree as a whole represents their union.

Key Features

  • Compression — automatically merges sibling nodes with identical subtree structure, reducing tree size dramatically.
  • Set operations — union, intersection, difference, and symmetric difference, all operating directly on the compressed form.
  • Selection & filtering — query the tree by dimension and coordinate values.
  • Multiple serialization formats — ASCII tree, nested JSON, arena JSON (BFS flat array), and DSS constraints (array-of-maps).
  • Adapters — ingest from MARS list format, FDB path lists, and DSS constraint JSON.
  • Python bindings — full access to Qube construction, serialization, and adapter functionality from Python via PyO3.

Crate Organization

| Crate | Purpose |
|---|---|
| qubed | Core data structure: Qube, Coordinates, compression, selection, serialization |
| qubed-meteo | Domain-specific adapters: MARS list parser, FDB list parser, DSS constraints |
| py_qubed | Python bindings for the core qubed crate |
| py_qubed_meteo | Python bindings for the qubed-meteo adapters |

Getting Started

Building the Book

# Install mdbook if needed
cargo install mdbook

# Build
mdbook build docs

# Serve locally with live-reload
mdbook serve docs -o

Datacubes, Trees and Compressed Trees

This chapter introduces the core concepts behind Qubed. Feel free to skip ahead to the Quickstart if you’d rather learn by doing.

Identifiers

Qubed is primarily geared towards dealing with data files uniquely labelled by sets of key-value pairs. We call such a set an identifier:

{
  "class": "d1",
  "dataset": "climate-dt",
  "generation": "1",
  "date": "20241102",
  "resolution": "high",
  "time": "0000"
}

Each identifier maps to exactly one dataset (a GRIB field, a file on disk, an API result, etc.). We’re interested in describing which identifiers currently exist and performing efficient operations over them.

Dense Datacubes

If we’re lucky, the set of identifiers forms a dense datacube — every combination of key values is present:

class=d1/d2, dataset=climate-dt, generation=1/2/3,
model=icon, date=20241102/20241103, resolution=high/low,
time=0000/0600/1200/1800

This single object represents 2 × 1 × 3 × 1 × 2 × 2 × 4 = 96 distinct datasets. Dense datacubes are compact and efficient.
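The count above is simply the product of the number of values along each dimension. A minimal sketch of that arithmetic follows; the (dimension, values) representation and the names `identifier_count` and `example_datacube` are illustrative assumptions, not part of the qubed API.

```rust
/// Number of identifiers in a dense datacube: the product of the number of
/// values along each dimension.
fn identifier_count(datacube: &[(&str, Vec<&str>)]) -> usize {
    datacube.iter().map(|(_, values)| values.len()).product()
}

/// The example datacube from the text, as hypothetical (dimension, values) pairs.
fn example_datacube() -> Vec<(&'static str, Vec<&'static str>)> {
    vec![
        ("class", vec!["d1", "d2"]),
        ("dataset", vec!["climate-dt"]),
        ("generation", vec!["1", "2", "3"]),
        ("model", vec!["icon"]),
        ("date", vec!["20241102", "20241103"]),
        ("resolution", vec!["high", "low"]),
        ("time", vec!["0000", "0600", "1200", "1800"]),
    ]
}

fn main() {
    // 2 * 1 * 3 * 1 * 2 * 2 * 4
    println!("{}", identifier_count(&example_datacube())); // prints 96
}
```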

Sparse Datacubes as Trees

In practice, datasets are rarely fully dense. For example, certain models may only produce data at certain resolutions or certain experiments may only cover a subset of parameters.

We can represent which data exists as a tree, where each node carries a dimension name and a set of coordinate values:

root
├── class=od
│   ├── expver=0001
│   │   ├── param=1
│   │   └── param=2
│   └── expver=0002
│       ├── param=1
│       └── param=2
└── class=rd
    ├── expver=0001
    │   ├── param=1
    │   ├── param=2
    │   └── param=3
    └── expver=0002
        ├── param=1
        └── param=2

Each root-to-leaf path defines one identifier. This tree represents 9 identifiers. It can express sparsity that a flat datacube cannot — above, class=rd, expver=0001 has an extra param=3 that class=od doesn’t have.

Compression: Trees of Dense Datacubes

The expanded tree above contains a lot of redundant information. Many subtrees are structurally identical. In practice, real-world data tends to be “nearly dense” — it’s composed of a modest number of dense datacubes.

Qubed compresses the tree by merging sibling nodes that have identical subtree structure. The algorithm computes a structural hash of each node (covering its dimension, children’s keys, children’s values, and recursively their children) and merges siblings whose hashes match:

root
├── class=od, expver=0001/0002, param=1/2
└── class=rd
    ├── expver=0001, param=1/2/3
    └── expver=0002, param=1/2

The 16-node expanded tree is now just 5 nodes — and still represents exactly the same 9 identifiers. Each leaf-path in the compressed tree is a dense datacube.

Restriction: Sibling nodes of the same dimension may not share a coordinate value. For example, the following would not be allowed, because expver=0001 appears under two children of class=rd:

root
├── class=od, expver=0001/0002, param=1/2
└── class=rd
    ├── expver=0001, param=3
    └── expver=0001/0002, param=1/2

This restriction ensures that looking up a particular expver value in a branch never requires following multiple children — each value appears under at most one child per dimension.
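The restriction amounts to a disjointness check over siblings. The sketch below is one way to express it on plain (dimension, values) pairs; `siblings_are_disjoint` is a hypothetical helper for illustration and is not claimed to exist in the crate.

```rust
use std::collections::{HashMap, HashSet};

/// Returns true if no coordinate value appears under more than one sibling
/// of the same dimension. Each sibling is given as (dimension, values).
fn siblings_are_disjoint(siblings: &[(&str, Vec<&str>)]) -> bool {
    let mut seen: HashMap<&str, HashSet<&str>> = HashMap::new();
    for (dim, values) in siblings {
        let seen_for_dim = seen.entry(*dim).or_default();
        for value in values {
            // `insert` returns false when the value was already recorded.
            if !seen_for_dim.insert(*value) {
                return false;
            }
        }
    }
    true
}

fn main() {
    // Valid: expver=0001 and expver=0002 are disjoint.
    let valid = [("expver", vec!["0001"]), ("expver", vec!["0002"])];
    // Invalid: 0001 appears under both siblings.
    let invalid = [("expver", vec!["0001"]), ("expver", vec!["0001", "0002"])];
    println!("{} {}", siblings_are_disjoint(&valid), siblings_are_disjoint(&invalid));
}
```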

Scale

At real-world scale these properties are dramatic. For example, the ECMWF Climate DT dataset contains over 1 billion distinct identifiers but can be represented by a Qube with approximately 3,000 nodes in about 11 MB of memory.

Under the Hood: Algorithms

This chapter explains the key algorithms that power Qubed: set operations on compressed trees and compression itself.

Set Operations

Qubes represent sets of identifiers, so the familiar set operations are all defined:

| Operation | Rust method | Description |
|---|---|---|
| Union | a.append(&mut b) | All identifiers in A or B (or both) |
| Intersection | select with intersection logic | Identifiers in both A and B |
| Difference | internal set operation | Identifiers in A but not B |
| Symmetric difference | internal set operation | Identifiers in exactly one of A or B |

How It Works

The algorithm traverses both trees in tandem, recursively:

for node_a in level_A:
    for node_b in level_B:
        just_A, intersection, just_B = fused_set_operation(
            node_a.coordinates,
            node_b.coordinates
        )

At each level, nodes are grouped by dimension. For every pair of nodes sharing the same dimension, the algorithm computes three disjoint sets of coordinate values:

  • just_A — values only in node A
  • intersection — values in both nodes
  • just_B — values only in node B

Depending on the operation:

| Operation | Keeps |
|---|---|
| Union | just_A + intersection + just_B |
| Intersection | intersection only |
| A − B | just_A only |
| B − A | just_B only |
| Symmetric difference | just_A + just_B |

The crucial insight is that each partition gets different children:

  • just_A inherits the children of node A
  • just_B inherits the children of node B
  • intersection gets children computed by recursively calling the set operation on the sub-trees of A and B

This recursive decomposition ensures the result is still a valid compressed Qube.
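To make the recursion concrete, here is a rough sketch of the intersection case only, written against a toy tree type. `Node`, `node`, `intersect`, `count` and `example` are all hypothetical names for this illustration and do not mirror the crate's internals. The other operations would additionally keep the just_A or just_B values together with the original children of A or B.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, PartialEq)]
struct Node {
    dim: String,
    coords: BTreeSet<String>,
    children: Vec<Node>,
}

/// Build a toy node from a "/"-separated coordinate string.
fn node(dim: &str, coords: &str, children: Vec<Node>) -> Node {
    Node {
        dim: dim.to_string(),
        coords: coords.split('/').map(str::to_string).collect(),
        children,
    }
}

/// Intersect two sibling lists: pair nodes with the same dimension, keep the
/// shared coordinate values, and recurse into the children of both nodes.
fn intersect(level_a: &[Node], level_b: &[Node]) -> Vec<Node> {
    let mut out = Vec::new();
    for a in level_a {
        for b in level_b.iter().filter(|b| b.dim == a.dim) {
            let common: BTreeSet<String> =
                a.coords.intersection(&b.coords).cloned().collect();
            if common.is_empty() {
                continue; // empty partitions are discarded immediately
            }
            let children = intersect(&a.children, &b.children);
            // Keep the node if both sides are leaves, or if the recursion
            // found shared descendants.
            let both_leaves = a.children.is_empty() && b.children.is_empty();
            if both_leaves || !children.is_empty() {
                out.push(Node { dim: a.dim.clone(), coords: common, children });
            }
        }
    }
    out
}

/// Count identifiers represented by a list of sibling nodes.
fn count(level: &[Node]) -> usize {
    level
        .iter()
        .map(|n| {
            let below = if n.children.is_empty() { 1 } else { count(&n.children) };
            n.coords.len() * below
        })
        .sum()
}

/// Two small example trees (children of their respective roots).
fn example() -> (Vec<Node>, Vec<Node>) {
    let a = vec![node("class", "od", vec![
        node("expver", "0001/0002", vec![node("param", "1/2", vec![])]),
    ])];
    let b = vec![node("class", "od/rd", vec![
        node("expver", "0002/0003", vec![node("param", "2/3", vec![])]),
    ])];
    (a, b)
}

fn main() {
    let (a, b) = example();
    let both = intersect(&a, &b);
    // Only class=od, expver=0002, param=2 is present in both trees.
    println!("{} shared identifier(s)", count(&both));
}
```

The sketch does not re-compress its result; in the workflow described in this chapter that is a separate step after the set operation.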

Performance Considerations

The pairwise comparison is quadratic in the number of matching nodes at each level: $O(N_A \times N_B)$ comparisons per dimension group. In practice this is manageable because:

  1. Once any of just_A, intersection, or just_B is determined to be empty, it can be discarded immediately.
  2. For sorted coordinate types (integers, ranges), the intersection can be computed in linear time by walking both sorted lists in tandem.
  3. After the operation, compression merges any resulting sibling nodes with identical structure, keeping the tree compact.
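The linear-time walk from point 2 can be sketched as follows, assuming both inputs are sorted and free of duplicates. `partition_sorted` is a hypothetical stand-in for the fused set operation, not the crate's function.

```rust
use std::cmp::Ordering;

/// Split two sorted, deduplicated slices into (just_a, intersection, just_b)
/// in a single pass over both inputs.
fn partition_sorted(a: &[i32], b: &[i32]) -> (Vec<i32>, Vec<i32>, Vec<i32>) {
    let (mut just_a, mut both, mut just_b) = (Vec::new(), Vec::new(), Vec::new());
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => {
                just_a.push(a[i]);
                i += 1;
            }
            Ordering::Greater => {
                just_b.push(b[j]);
                j += 1;
            }
            Ordering::Equal => {
                both.push(a[i]);
                i += 1;
                j += 1;
            }
        }
    }
    // Whatever remains in either input cannot be in the other.
    just_a.extend_from_slice(&a[i..]);
    just_b.extend_from_slice(&b[j..]);
    (just_a, both, just_b)
}

fn main() {
    let (just_a, both, just_b) = partition_sorted(&[1, 2, 3], &[2, 3, 4]);
    println!("{:?} {:?} {:?}", just_a, both, just_b); // [1] [2, 3] [4]
}
```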

Compression

Compression is the process of reducing tree size while preserving the set of identifiers. It works in three phases:

Phase 1: Recursive Coordinate Merging

Starting from the leaves and working upward, the algorithm identifies sibling nodes (children of the same parent, sharing the same dimension) that have identical structural hashes.

The structural hash of a node is computed from:

  • The node’s dimension name
  • The structural hashes of all its children (recursively)
  • But not the node’s own coordinate values

Two sibling nodes with the same structural hash have identical subtree shapes. Their coordinates can be merged into a single node without losing information:

Before:                          After:
├── expver=0001                  └── expver=0001/0002
│   ├── param=1                      ├── param=1
│   └── param=2                      └── param=2
└── expver=0002
    ├── param=1
    └── param=2
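The merge step can be sketched on a toy tree. This is an illustration under stated assumptions, not the crate's implementation: `Node`, `node`, `structural_hash`, `compress`, `count_nodes`, `count_ids` and `expanded_example` are hypothetical names, the hash is assumed to cover each child's coordinate values (as described in the concepts chapter) while excluding the node's own, and hash collisions are ignored.

```rust
use std::collections::btree_map::Entry;
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, BTreeSet};
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone)]
struct Node {
    dim: String,
    coords: BTreeSet<String>,
    children: Vec<Node>,
}

fn node(dim: &str, coords: &str, children: Vec<Node>) -> Node {
    let coords = coords.split('/').map(str::to_string).collect();
    Node { dim: dim.to_string(), coords, children }
}

/// Hash of a node's dimension and subtree, excluding its own coordinates.
fn structural_hash(n: &Node) -> u64 {
    let mut parts: Vec<(Vec<&String>, u64)> = n
        .children
        .iter()
        .map(|c| (c.coords.iter().collect(), structural_hash(c)))
        .collect();
    parts.sort(); // make the hash independent of child order
    let mut hasher = DefaultHasher::new();
    n.dim.hash(&mut hasher);
    parts.hash(&mut hasher);
    hasher.finish()
}

/// Bottom-up merge: compress each subtree first, then union the coordinates
/// of siblings whose structural hashes match.
fn compress(siblings: Vec<Node>) -> Vec<Node> {
    let mut merged: BTreeMap<u64, Node> = BTreeMap::new();
    let mut first_seen: Vec<u64> = Vec::new(); // preserves sibling order
    for mut n in siblings {
        n.children = compress(std::mem::take(&mut n.children));
        let key = structural_hash(&n);
        match merged.entry(key) {
            Entry::Occupied(mut slot) => slot.get_mut().coords.extend(n.coords),
            Entry::Vacant(slot) => {
                first_seen.push(key);
                slot.insert(n);
            }
        }
    }
    first_seen.into_iter().filter_map(|k| merged.remove(&k)).collect()
}

fn count_nodes(level: &[Node]) -> usize {
    level.iter().map(|n| 1 + count_nodes(&n.children)).sum()
}

fn count_ids(level: &[Node]) -> usize {
    level
        .iter()
        .map(|n| {
            let below = if n.children.is_empty() { 1 } else { count_ids(&n.children) };
            n.coords.len() * below
        })
        .sum()
}

/// The expanded class/expver/param tree from the concepts chapter.
fn expanded_example() -> Vec<Node> {
    let p = |v: &str| node("param", v, vec![]);
    vec![
        node("class", "od", vec![
            node("expver", "0001", vec![p("1"), p("2")]),
            node("expver", "0002", vec![p("1"), p("2")]),
        ]),
        node("class", "rd", vec![
            node("expver", "0001", vec![p("1"), p("2"), p("3")]),
            node("expver", "0002", vec![p("1"), p("2")]),
        ]),
    ]
}

fn main() {
    let expanded = expanded_example();
    let compressed = compress(expanded.clone());
    println!("nodes: {} -> {}", count_nodes(&expanded), count_nodes(&compressed));
    println!("identifiers: {} -> {}", count_ids(&expanded), count_ids(&compressed));
}
```

The node counts in this sketch exclude the root and count every dimension in a chain as its own node, so they differ from counts based on lines of the ASCII rendering. The identifier count is unchanged by compression.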

Phase 2: Pruning Empty Nodes

After merging, some nodes may have empty coordinate sets (their values were absorbed by a sibling). These empty nodes are pruned from the tree.

Phase 3: Deduplication

A final pass deduplicates any nodes that became structurally identical after merging. This is done by recomputing structural hashes and collapsing identical siblings.

Hash Caching

Structural hashes are cached in each node using an AtomicU64, with 0 meaning "not computed". Because a node's hash depends on its whole subtree, modifying a node invalidates (resets to 0) its own cached hash and those of all its ancestors. Hashes are then recomputed lazily, only when needed, which keeps repeated compression operations efficient.
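A lazily filled cache with a zero sentinel might look like the sketch below. `HashCache` and its methods are hypothetical names for illustration; the crate's actual field layout and memory orderings are not shown in this book.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Lazily cached hash; 0 is the "not computed" sentinel.
struct HashCache {
    value: AtomicU64,
}

impl HashCache {
    fn new() -> Self {
        HashCache { value: AtomicU64::new(0) }
    }

    /// Return the cached hash, computing and storing it on first use.
    fn get_or_compute(&self, compute: impl FnOnce() -> u64) -> u64 {
        let cached = self.value.load(Ordering::Relaxed);
        if cached != 0 {
            return cached;
        }
        let fresh = compute();
        self.value.store(fresh, Ordering::Relaxed);
        fresh
    }

    /// Reset to the sentinel so the next read recomputes the hash.
    fn invalidate(&self) {
        self.value.store(0, Ordering::Relaxed);
    }
}

fn main() {
    let cache = HashCache::new();
    println!("{}", cache.get_or_compute(|| 42)); // computed
    println!("{}", cache.get_or_compute(|| 7)); // served from the cache
    cache.invalidate();
    println!("{}", cache.get_or_compute(|| 7)); // recomputed
}
```

One consequence of the sentinel design is that a genuine hash value of 0 is never cached and would be recomputed on every read.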

The append / Union Workflow

When two Qubes are merged via append:

  1. The two root nodes are paired and node_merge is called recursively.
  2. At each level, children are grouped by dimension and the internal set operation produces the three partitions (just_A, intersection, just_B).
  3. For the intersection partition, new nodes are created and children are recursively merged.
  4. For just_B partitions, subtrees are copied from the other Qube into self.
  5. After all merging is complete, compress() is called to re-compress the result.

The append_many method optimizes merging many Qubes by batching: it performs intermediate compression every 500 Qubes to prevent unbounded tree growth.

Quickstart

This chapter walks through building, manipulating, and querying Qubes using the Rust API. For the equivalent Python API, see the Python bindings chapter.

Creating a Qube

From ASCII Representation

The most readable way to build a Qube is from its ASCII tree representation:

#![allow(unused)]
fn main() {
use qubed::Qube;

let q = Qube::from_ascii(r#"root
├── class=od
│   ├── expver=0001
│   │   ├── param=1
│   │   └── param=2
│   └── expver=0002
│       ├── param=1
│       └── param=2
└── class=rd
    ├── expver=0001
    │   ├── param=1
    │   ├── param=2
    │   └── param=3
    └── expver=0002
        ├── param=1
        └── param=2"#).unwrap();

println!("{}", q.to_ascii());
}

Each line takes the form key=value where multiple values are separated by /:

root
├── class=od, expver=0001/0002, param=1/2
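As a rough illustration of the line format only, the sketch below splits one such line into (dimension, values) pairs. `parse_line` is a hypothetical helper and is not how `Qube::from_ascii` is necessarily implemented; it ignores nesting and does not handle the `root` line.

```rust
/// Parse "├── class=od, expver=0001/0002" into [(dimension, [values])].
fn parse_line(line: &str) -> Result<Vec<(String, Vec<String>)>, String> {
    // Strip indentation and tree-drawing characters from the front.
    let body = line.trim_start_matches(|c: char| c.is_whitespace() || "├└│─".contains(c));
    body.split(',')
        .map(|token| -> Result<(String, Vec<String>), String> {
            let (dim, values) = token
                .trim()
                .split_once('=')
                .ok_or_else(|| format!("expected key=value, got {:?}", token))?;
            let values = values.split('/').map(str::to_string).collect();
            Ok((dim.to_string(), values))
        })
        .collect()
}

fn main() {
    let parsed = parse_line("├── class=od, expver=0001/0002, param=1/2").unwrap();
    for (dim, values) in &parsed {
        println!("{} -> {:?}", dim, values);
    }
}
```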

From Nested JSON

You can also build a Qube from a JSON object where keys are "dimension=values" strings:

#![allow(unused)]
fn main() {
use qubed::Qube;
use serde_json::json;

let q = Qube::from_json(json!({
    "class=od": {
        "expver=0001/0002": {
            "param=1/2": {}
        }
    },
    "class=rd": {
        "expver=0001": { "param=1/2/3": {} },
        "expver=0002": { "param=1/2": {} }
    }
})).unwrap();
}

Programmatically

Build a Qube node by node:

#![allow(unused)]
fn main() {
use qubed::{Qube, Coordinates};

let mut q = Qube::new();
let root = q.root();

// Create coordinate values
let class_coords = Coordinates::from_string("od");
let child = q.create_child("class", root, Some(class_coords)).unwrap();

let exp_coords = Coordinates::from_string("0001/0002");
let exp = q.create_child("expver", child, Some(exp_coords)).unwrap();

let param_coords = Coordinates::from_string("1/2");
q.create_child("param", exp, Some(param_coords)).unwrap();
}

From a Datacube

Build a Qube from a Datacube (a flat map of dimensions to coordinates):

#![allow(unused)]
fn main() {
use qubed::{Datacube, Qube, Coordinates};

let mut dc = Datacube::new();
dc.add_coordinate("class", Coordinates::from_string("od/rd"));
dc.add_coordinate("expver", Coordinates::from_string("0001/0002"));
dc.add_coordinate("param", Coordinates::from_string("1/2"));

let order = vec!["class".to_string(), "expver".to_string(), "param".to_string()];
let q = Qube::from_datacube(&dc, Some(&order));
}

Compression

Compression merges sibling nodes with identical subtree structure. This is the defining operation of Qubed — it keeps trees compact without losing information.

#![allow(unused)]
fn main() {
use qubed::Qube;

let mut q = Qube::from_ascii(r#"root
├── class=od
│   ├── expver=0001
│   │   ├── param=1
│   │   └── param=2
│   └── expver=0002
│       ├── param=1
│       └── param=2
└── class=rd
    ├── expver=0001
    │   ├── param=1
    │   └── param=2
    └── expver=0002
        ├── param=1
        └── param=2"#).unwrap();

q.compress();
println!("{}", q.to_ascii());
// root
// └── class=od/rd, expver=0001/0002, param=1/2
}

After compression, the number of leaf identifiers is preserved but the tree has far fewer nodes.

Selection

Select a subset of the tree by providing dimension constraints:

#![allow(unused)]
fn main() {
use qubed::Qube;
use qubed::select::SelectMode;

let q = Qube::from_ascii(r#"root
├── class=od
│   ├── expver=0001
│   │   ├── param=1
│   │   └── param=2
│   └── expver=0002
│       ├── param=1
│       └── param=2
└── class=rd
    ├── expver=0001
    │   ├── param=1
    │   ├── param=2
    │   └── param=3
    └── expver=0002
        ├── param=1
        └── param=2"#).unwrap();

// Select only class=od, param=1
let selection = [
    ("class", qubed::Coordinates::from_string("od")),
    ("param", qubed::Coordinates::from_string("1")),
];
let result = q.select(&selection, SelectMode::Default).unwrap();
println!("{}", result.to_ascii());
}

SelectMode::Prune additionally removes branches that don’t contain all selected dimensions.

Union (Append)

Merge two Qubes together. The result contains all identifiers from both:

#![allow(unused)]
fn main() {
use qubed::Qube;

let mut a = Qube::from_ascii(r#"root
└── class=od, expver=0001, param=1/2"#).unwrap();

let mut b = Qube::from_ascii(r#"root
└── class=rd, expver=0002, param=3/4"#).unwrap();

a.append(&mut b);
// b is now empty; a contains the union, automatically compressed
println!("{}", a.to_ascii());
}

For merging many Qubes at once, append_many is more efficient — it performs intermediate compression every 500 merges:

#![allow(unused)]
fn main() {
let mut base = Qube::new();
let mut others: Vec<Qube> = vec![/* ... */];
base.append_many(&mut others);
}

Iteration

Datacubes

Decompose the Qube back into individual dense datacubes. Each Datacube wraps a HashMap<String, Coordinates>, exposed through coordinates():

#![allow(unused)]
fn main() {
let datacubes = q.to_datacubes();
for dc in &datacubes {
    for (dim, coords) in dc.coordinates() {
        println!("  {} = {}", dim, coords.to_string());
    }
}
}

Leaf Count

Get the number of individual identifiers without expanding:

#![allow(unused)]
fn main() {
let count = q.datacube_count();
println!("This qube contains {} identifiers", count);
}

Serialization

ASCII

Human-readable tree format, useful for debugging and display:

#![allow(unused)]
fn main() {
let ascii = q.to_ascii();
let roundtrip = Qube::from_ascii(&ascii).unwrap();
}

Nested JSON

Keys are "dimension=values" strings, values are child objects:

#![allow(unused)]
fn main() {
let json_val = q.to_json();
let json_str = serde_json::to_string_pretty(&json_val).unwrap();
let roundtrip = Qube::from_json(json_val).unwrap();
}

Arena JSON

A flat BFS array of node records — more suitable for programmatic consumption and web transport:

#![allow(unused)]
fn main() {
let arena = q.to_arena_json();
// Each entry: { "dim": "class", "coords": "od/rd", "parent": 0, "children": [1, 2] }
let restored = Qube::from_arena_json(arena).unwrap();
}

Each node in the array contains:

  • dim — dimension name (e.g. "class")
  • coords — coordinate values as a /-separated string
  • parent — index of the parent node (or null for root)
  • children — array of child node indices

Coordinate Types

The Coordinates enum supports multiple value types and automatically categorizes them:

| Variant | Stores | Example |
|---|---|---|
| Empty | No values | (default for root) |
| Integers | Sorted i32 values | 1/2/3 |
| Floats | Sorted f64 values | 0.1/0.5 |
| Strings | Sorted string values | od/rd |
| Mixed | Combination of above | 1/od/0.5 |

Leading Zero Preservation

Values with leading zeros (like "0001") are preserved as strings rather than parsed as integers. The detection logic: if a token has length > 1, starts with '0', and the second character is a digit, it’s stored as a String.

#![allow(unused)]
fn main() {
let coords = Coordinates::from_string("0001/0002");
assert_eq!(coords.to_string(), "0001/0002"); // NOT "1/2"
}

Operations on Coordinates

#![allow(unused)]
fn main() {
use qubed::Coordinates;

// Append values
let mut c = Coordinates::new();
c.append(1_i32);
c.append(2_i32);

// Extend from another
let other = Coordinates::from_string("3/4");
c.extend(&other);

// Intersect
let a = Coordinates::from_string("1/2/3");
let b = Coordinates::from_string("2/3/4");
let result = a.intersect(&b);
// result.intersection = [2, 3]
// result.only_a = [1]
// result.only_b = [4]
}

Tree Inspection

#![allow(unused)]
fn main() {
// Check if a qube has any content
let is_empty = q.is_empty();

// Get the number of leaf datacubes
let n = q.datacube_count();

// Get all unique dimension→coordinates pairs across the entire tree
let all = q.all_unique_dim_coords();
for (dim, coords) in &all {
    println!("{}: {}", dim, coords.to_string());
}
}

Node Navigation

Access individual nodes via NodeRef:

#![allow(unused)]
fn main() {
let root = q.root();
let root_node = q.node(root).unwrap();

// Dimension name
let dim = root_node.dimension(); // Some("root")

// Coordinates
let coords = root_node.coordinates();

// Iterate children
for child_id in root_node.all_children() {
    let child = q.node(child_id).unwrap();
    println!("{} = {}", child.dimension().unwrap(), child.coordinates().to_string());
}

// Get all dimensions in subtree
let dims = root_node.span();

// Walk ancestors
for ancestor_id in root_node.ancestors() {
    // ...
}
}

qubed — Core Library

The qubed crate provides the core Qube data structure, Coordinates types, compression, selection, serialization, and datacube conversion.

Cargo.toml:

[dependencies]
qubed = { path = "qubed" }

Qube

The central type. A Qube is a slot-map-backed tree where each node has a dimension name, a set of coordinate values, and children grouped by dimension.

Construction

| Method | Signature | Description |
|---|---|---|
| new | fn new() -> Qube | Create an empty Qube with just a root node |
| from_ascii | fn from_ascii(input: &str) -> Result<Qube, String> | Parse an ASCII tree representation |
| from_json | fn from_json(value: Value) -> Result<Qube, String> | Parse a nested JSON object |
| from_arena_json | fn from_arena_json(value: Value) -> Result<Qube, String> | Parse a BFS flat-array JSON layout |
| from_datacube | fn from_datacube(dc: &Datacube, order: Option<&[String]>) -> Qube | Build from a flat datacube with optional dimension ordering |

Example — from ASCII:

#![allow(unused)]
fn main() {
use qubed::Qube;

let q = Qube::from_ascii(r#"root
├── class=od
│   ├── expver=0001, param=1/2
│   └── expver=0002, param=1/2
└── class=rd
    ├── expver=0001, param=1/2/3
    └── expver=0002, param=1/2"#).unwrap();
}

Example — from nested JSON:

#![allow(unused)]
fn main() {
use qubed::Qube;
use serde_json::json;

let q = Qube::from_json(json!({
    "class=od": {
        "expver=0001/0002": { "param=1/2": {} }
    },
    "class=rd": {
        "expver=0001": { "param=1/2/3": {} },
        "expver=0002": { "param=1/2": {} }
    }
})).unwrap();
}

Tree Modification

| Method | Signature | Description |
|---|---|---|
| create_child | fn create_child(&mut self, key: &str, parent: NodeIdx, coords: Option<Coordinates>) -> Result<NodeIdx, String> | Create a child node. Returns existing node if an identical child already exists. |
| get_or_create_child | fn get_or_create_child(&mut self, key: &str, parent_id: NodeIdx, coordinates: Option<Coordinates>) -> Result<NodeIdx, String> | Return the existing child with the given dimension+coordinates, or create a new one. |
| check_if_new_child | fn check_if_new_child(&mut self, key: &str, parent_id: NodeIdx, coordinates: Option<Coordinates>) -> Result<bool, String> | Return true if no child with the given dimension+coordinates exists yet. |
| remove_node | fn remove_node(&mut self, id: NodeIdx) -> Result<(), String> | Remove a node and all its descendants |
| append | fn append(&mut self, other: &mut Qube) | Union: merge other into self, compress, then clear other |
| append_many | fn append_many(&mut self, others: &mut Vec<Qube>) | Merge many Qubes with periodic compression (every 500) |
| append_datacube | fn append_datacube(&mut self, dc: Datacube, order: Option<&[String]>, accept_existing_order: bool) | Append a single Datacube |
| drop | fn drop<I>(&mut self, to_drop: I) -> Result<(), String> | Remove one or more dimensions, re-parenting their children, then compress |
| squeeze | fn squeeze(&mut self) -> Result<(), String> | Drop every dimension whose union of values has length 1 |

Example — building programmatically:

#![allow(unused)]
fn main() {
use qubed::{Qube, Coordinates};

let mut q = Qube::new();
let root = q.root();

let class = q.create_child("class", root,
    Some(Coordinates::from_string("od"))).unwrap();
let expver = q.create_child("expver", class,
    Some(Coordinates::from_string("0001/0002"))).unwrap();
q.create_child("param", expver,
    Some(Coordinates::from_string("1/2"))).unwrap();
}

Example — union:

#![allow(unused)]
fn main() {
use qubed::Qube;

let mut a = Qube::from_ascii("root\n└── class=od, param=1").unwrap();
let mut b = Qube::from_ascii("root\n└── class=rd, param=2").unwrap();
a.append(&mut b);
// a now contains both branches, compressed; b is empty
}

Example — drop:

#![allow(unused)]
fn main() {
use qubed::Qube;

let mut q = Qube::from_ascii(r#"root
└── class=1
    ├── expver=0001
    │   └── param=1/2
    └── expver=0002
        └── param=1/2"#).unwrap();

q.drop(vec!["expver"]).unwrap();
// expver is removed; param nodes are re-parented under class
}

Example — squeeze:

#![allow(unused)]
fn main() {
use qubed::Qube;

let mut q = Qube::from_ascii(r#"root
└── class=1
    ├── expver=0001
    │   └── param=1/2
    └── expver=0002
        └── param=1/2"#).unwrap();

q.squeeze().unwrap();
// class=1 is the only value for that dimension, so it is dropped
}

Compression

fn compress(&mut self)

Compress the tree in-place. Three phases:

  1. Recursive merge — bottom-up, siblings with the same structural hash have their coordinates merged.
  2. Prune — nodes with Coordinates::Empty are removed.
  3. Dedup — structurally identical siblings are collapsed.

Called automatically by append and append_many.

Selection

fn select<C>(&self, selection: &[(&str, C)], mode: SelectMode) -> Result<Qube, String>
where C: Into<Coordinates> + Clone

Returns a new Qube containing only identifiers matching the constraints. C can be &[i32], Coordinates, or other Into<Coordinates> types.

fn prune(&mut self, node_id: NodeIdx, has_none_of: HashSet<&str>)

Remove branches that don’t contain all of the specified dimensions.

SelectMode:

  • Default — keep branches with at least one matching value per constrained dimension.
  • Prune — additionally remove branches missing any selected dimension entirely.

Serialization

| Method | Returns | Format |
|---|---|---|
| to_ascii() | String | Human-readable tree with ├──/└── connectors |
| to_json() | Value | Nested JSON: { "key=values": { children } } |
| to_arena_json() | Value | BFS flat array: [{ dim, coords, parent, children }] |

Arena JSON node record:

{ "dim": "class", "coords": "od/rd", "parent": null, "children": [1, 2] }

Iteration & Inspection

| Method | Signature | Description |
|---|---|---|
| to_datacubes | fn to_datacubes(&self) -> Vec<Datacube> | Decompose into leaf-path datacubes |
| datacube_count | fn datacube_count(&self) -> usize | Count leaf identifiers without expansion |
| is_empty | fn is_empty(&self) -> bool | True if root has no children and no coordinates |
| all_unique_dim_coords | fn all_unique_dim_coords(&mut self) -> BTreeMap<String, Coordinates> | Union of all coordinates per dimension |
| root | fn root(&self) -> NodeIdx | Root node index |
| node | fn node(&self, id: NodeIdx) -> Option<NodeRef> | Read-only reference to a node |
| dimension | fn dimension(&self, s: &str) -> Option<Dimension> | Look up dimension by name |
| dimension_str | fn dimension_str(&self, d: &Dimension) -> Option<&str> | Get dimension name string |

NodeRef

Read-only reference to a node in the Qube tree.

| Method | Returns | Description |
|---|---|---|
| id() | NodeIdx | Slot-map key |
| dimension() | Option<&str> | Dimension name (e.g. "class") |
| coordinates() | &Coordinates | Coordinate values |
| child_dimensions() | impl Iterator<Item = &Dimension> | Distinct child dimension keys |
| children(key) | Option<impl Iterator<Item = NodeIdx>> | Children under a specific dimension |
| all_children() | impl Iterator<Item = NodeIdx> | All children across all dimensions |
| children_count() | usize | Total direct children |
| coordinates_count() | usize | Number of coordinate values |
| parent() | Option<NodeIdx> | Parent index |
| parent_node() | Option<NodeRef> | Parent as NodeRef |
| ancestors() | impl Iterator<Item = NodeIdx> | Walk up to root |
| span() | HashSet<Dimension> | All unique dimensions in subtree |
| structural_hash() | Option<u64> | Cached structural hash |

Coordinates

A typed, ordered set of coordinate values.

Variants

| Variant | Storage | Example |
|---|---|---|
| Empty | No values | Default for root |
| Integers | Sorted i32 | 1/2/3 |
| Floats | Sorted f64 | 0.1/0.5 |
| Strings | Sorted String | od/rd |
| Mixed | All three | 1/od/0.5 |

Construction

| Method | Description |
|---|---|
| Coordinates::new() | Empty coordinates |
| Coordinates::from_string(s) | Parse \|-separated string (also handles / in ASCII context) |
| From<i32>, From<f64>, From<String> | Single-value construction |
| FromIterator<i32>, FromIterator<f64>, FromIterator<String> | Build from iterators |

Leading zero preservation: tokens with length > 1 that start with '0' followed by a digit are stored as String to preserve formatting (e.g. "0001" stays "0001", not 1).

Modification

| Method | Description |
|---|---|
| append(value) | Add a single value; auto-promotes to Mixed if types differ |
| extend(&other) | Merge values from another Coordinates |
| extend_from_iter(iter) | Extend from an iterator |

Query

| Method | Description |
|---|---|
| to_string() | /-separated string |
| len() | Value count |
| is_empty() | True if no values |
| contains(value) | Membership check (integers only currently) |

Set Operations

| Method | Description |
|---|---|
| intersect(&other) | Returns IntersectionResult { intersection, only_a, only_b } |
| merge_coords(&other) | Union (intersection + only_a + only_b combined) |

Datacube

A flat HashMap<String, Coordinates> representing one dense datacube.

| Method | Description |
|---|---|
| new() | Create empty |
| add_coordinate(dim, coords) | Add a dimension |
| coordinates() | Access the map |
| is_empty() / len() | Check dimensions |

Key Types Summary

| Type | Description |
|---|---|
| NodeIdx | SlotMap key for node identity |
| Dimension | Interned string key (MiniSpur from lasso) |
| IntersectionResult<T> | { intersection, only_a, only_b } |
| SelectMode | Default or Prune |
| CoordinateTypes | Integer(i32), Float(f64), String(String) |

qubed-meteo — Adapters

The qubed-meteo crate provides domain-specific adapters for ingesting meteorological metadata into Qubes and exporting them to external formats.

Cargo.toml:

[dependencies]
qubed-meteo = { path = "qubed-meteo" }

FromMARSList — MARS List Parser

Trait: qubed_meteo::adapters::mars_list::FromMARSList

fn from_mars_list(mars_list: &str) -> Result<Qube, String>

Parses indentation-based MARS list text into a Qube. This is the format produced by ECMWF’s MARS listing tools, where indentation indicates parent-child relationships.

Input Format

class=od, expver=0001
  param=1/2
  param=3
class=rd, expver=0002
  param=4

  • Lines are split by commas into tokens of the form key=value.
  • Indentation determines hierarchy: indented lines are children of the preceding less-indented line.
  • Slash-separated values (e.g. param=1/2) become multiple coordinate values.
  • The resulting tree is automatically compressed.

Parsing Rules

  1. If a line has deeper indentation than the previous line, its tokens become a chain under the last node of the previous line.
  2. If a line has equal or shallower indentation, it chains under the nearest ancestor in the indentation stack.
  3. Values with leading zeros (e.g. 0001) are preserved as strings.
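The indentation rules can be sketched with an explicit stack. The function below only expands a listing into one key=value chain per leaf line; it is a simplified illustration with hypothetical names (`expand_paths`, `Frame`), not the adapter's parser, and it does not build or compress a Qube.

```rust
/// Expand an indentation-based listing into one key=value chain per leaf line.
fn expand_paths(text: &str) -> Vec<Vec<String>> {
    // One frame per open line: its indentation, the full chain of tokens
    // from the top level down to it, and whether a deeper line followed.
    struct Frame {
        indent: usize,
        chain: Vec<String>,
        has_children: bool,
    }
    let mut stack: Vec<Frame> = Vec::new();
    let mut paths = Vec::new();
    for line in text.lines().filter(|l| !l.trim().is_empty()) {
        let indent = line.len() - line.trim_start().len();
        // Rule 2: close every frame that is not a strict ancestor of this line.
        while stack.last().map_or(false, |f| f.indent >= indent) {
            let done = stack.pop().unwrap();
            if !done.has_children {
                paths.push(done.chain);
            }
        }
        // Rule 1: chain this line's tokens under the nearest open ancestor.
        let mut chain = match stack.last_mut() {
            Some(parent) => {
                parent.has_children = true;
                parent.chain.clone()
            }
            None => Vec::new(),
        };
        chain.extend(line.split(',').map(|t| t.trim().to_string()));
        stack.push(Frame { indent, chain, has_children: false });
    }
    // Flush the frames still open at end of input.
    while let Some(done) = stack.pop() {
        if !done.has_children {
            paths.push(done.chain);
        }
    }
    paths
}

fn main() {
    let text = "class=od, expver=0001\n  param=1/2\n  param=3\nclass=rd, expver=0002\n  param=4";
    for path in expand_paths(text) {
        println!("{}", path.join(", "));
    }
}
```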

Example

#![allow(unused)]
fn main() {
use qubed::Qube;
use qubed_meteo::adapters::mars_list::FromMARSList;

let mars_text = "class=od, expver=0001\n  param=1/2\nclass=rd, expver=0002\n  param=3/4";
let qube = Qube::from_mars_list(mars_text).unwrap();
println!("{}", qube.to_ascii());
}

FromFDBList — FDB Path Parser

Trait: qubed_meteo::adapters::fdb::FromFDBList

fn from_fdb_list(request_map: &serde_json::Value) -> Result<Qube, String>

Builds a Qube from FDB-style comma-separated path strings, as produced by the rsfdb listing tools.

Input Format

Each item is a comma-separated sequence of key=value segments:

class=od,expver=0001,param=1/2
class=rd,expver=0003,param=3/4

  • Each segment’s values can be slash-separated for multiple coordinates.
  • Segments without = become dimension-only nodes (no coordinates).
  • Values with leading zeros are preserved as strings.
  • The resulting tree is automatically compressed.

Example

#![allow(unused)]
fn main() {
use qubed::Qube;
use qubed_meteo::adapters::fdb::FromFDBList;
use serde_json::json;

let request_map = json!({
  "class" : "od",
  "expver" : "0001",
  "stream" : "oper",
  "time" : "0000",
  "domain" : "g",
  "levtype" : "sfc",
});

let qube = Qube::from_fdb_list(&request_map).unwrap();
println!("{}", qube.to_ascii());
}

ToDssConstraints — DSS Constraints Exporter

Trait: qubed_meteo::adapters::to_constraints::ToDssConstraints

fn to_dss_constraints(&self) -> serde_json::Value

Converts a Qube into a JSON array of constraint objects, one per leaf-path datacube. Every object contains the same set of dimension keys (the union across all datacubes); dimensions not present in a particular datacube get an empty array.

Output Format

[
  {
    "class": ["od"],
    "expver": ["0001", "0002"],
    "param": ["1", "2"]
  },
  {
    "class": ["rd"],
    "expver": ["0003"],
    "param": ["3", "4"]
  }
]

  • The "root" dimension is excluded from the output.
  • Coordinate values are serialized as string arrays (split on /).

Example

#![allow(unused)]
fn main() {
use qubed::Qube;
use qubed_meteo::adapters::to_constraints::ToDssConstraints;

let q = Qube::from_ascii(r#"root
├── class=od, expver=0001/0002, param=1/2
└── class=rd, expver=0003, param=3/4"#).unwrap();

let constraints = q.to_dss_constraints();
println!("{}", serde_json::to_string_pretty(&constraints).unwrap());
}

FromDssConstraints — DSS Constraints Importer

Trait: qubed_meteo::adapters::dss_constraints::FromDssConstraints

fn from_dss_constraints(dss_constraints: &serde_json::Value) -> Result<Qube, String>

Rebuilds a Qube from DSS-style constraint JSON (array of maps). Each map in the array is parsed as a Datacube, then all datacubes are merged with append_many.

A built-in dimension ordering is applied (origin, forecast_type, hday, day, hmonth, hyear, year, month, time, leadtime_hour, level_type, variable) to produce a consistent tree structure.

Example

#![allow(unused)]
fn main() {
use qubed::Qube;
use qubed_meteo::adapters::dss_constraints::FromDssConstraints;
use serde_json::json;

let constraints = json!([
    { "class": ["od"], "expver": ["0001"], "param": ["1", "2"] },
    { "class": ["rd"], "expver": ["0002"], "param": ["3"] }
]);

let qube = Qube::from_dss_constraints(&constraints).unwrap();
println!("{}", qube.to_ascii());
}

Leading Zero Preservation

All adapters use the same detection logic for preserving leading zeros:

if token.len() > 1
   && token.starts_with('0')
   && token[1].is_ascii_digit()
then
   → store as String (e.g. "0001")
else
   → try parse as i32, then f64, then String

This ensures values like "0001" or "0042" round-trip correctly through serialization, while plain numbers like "1" or "42" are stored as integers.
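
The detection rule can be mirrored in a few lines of Python. This is a sketch of the logic described above, not the Rust implementation: parse_token is a hypothetical name, the explicit range check stands in for Rust's i32 parsing, and Python's int and float parsers accept some inputs that Rust's do not (for example underscores and surrounding whitespace).

```python
def parse_token(token: str):
    """Classify a coordinate token following the rule described above."""
    # A leading "0" followed by another ASCII digit marks a zero-padded value:
    # keep it verbatim so "0001" does not collapse to 1.
    if len(token) > 1 and token[0] == "0" and token[1] in "0123456789":
        return token
    # Otherwise try a 32-bit signed integer first ...
    try:
        value = int(token)
        if -2**31 <= value <= 2**31 - 1:
            return value
    except ValueError:
        pass
    # ... then a float, and finally fall back to the raw string.
    try:
        return float(token)
    except ValueError:
        return token


parsed = [parse_token(t) for t in ["0001", "42", "0.5", "sfc", "0"]]
# parsed == ["0001", 42, 0.5, "sfc", 0]
```

Note that "0.5" is not treated as zero-padded because its second character is not a digit, so it falls through to the float branch.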

py_qubed — Python Bindings

The py_qubed package exposes the core qubed Rust library to Python via PyO3. It provides the PyQube class (importable as qubed.PyQube) for building, manipulating, and serializing Qubes from Python.

Installation

cd py_qubed
maturin develop --release

Then in Python:

from qubed import PyQube

PyQube Class

Construction

PyQube()

Create an empty Qube.

q = PyQube()

PyQube.from_ascii(text: str) -> PyQube

Parse an ASCII tree representation:

q = PyQube.from_ascii("""root
├── class=od
│   └── expver=0001/0002, param=1/2
└── class=rd
    ├── expver=0001, param=1/2/3
    └── expver=0002, param=1/2""")

PyQube.from_datacube(datacube: dict[str, str], order: list[str] | None = None) -> PyQube

Build a Qube from a flat datacube dictionary. Each key is a dimension name and each value is a coordinate string (use / to specify multiple values for a dimension, e.g. "1/2/3").

The optional order list controls the nesting order of dimensions in the resulting tree — dimensions listed first become shallower levels. Any dimensions not in order are appended at deeper levels in an unspecified order. When order is None, all dimension ordering is unspecified.

This is the inverse of to_datacubes(): a single dict from that list can be passed back here to reconstruct a single-branch Qube.

# Single identifier
q = PyQube.from_datacube({"class": "od", "expver": "0001", "param": "1"}, ["class", "expver", "param"])
print(q)
# root
# └── class=od
#     └── expver=0001
#         └── param=1

# Multiple values on a dimension
q = PyQube.from_datacube({"class": "od", "param": "1/2/3"}, ["class", "param"])
print(q.all_unique_dim_coords())
# {'class': ['od'], 'param': ['1', '2', '3']}

# Roundtrip from to_datacubes
original = PyQube.from_ascii("root\n└── class=od, expver=0001, param=1")
for dc in original.to_datacubes():
    rebuilt = PyQube.from_datacube(dc, ["class", "expver", "param"])
    print(rebuilt)  # single-branch Qube rebuilt from one datacube dict
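
To see which identifiers a flat datacube dictionary stands for, the following library-independent sketch expands one into its individual key-value combinations. expand_datacube is a hypothetical helper written for illustration; it only relies on the /-separated coordinate convention described above.

```python
from itertools import product


def expand_datacube(datacube):
    """Yield every identifier covered by a flat datacube dict (illustrative only)."""
    dims = list(datacube)
    # Each "/"-separated coordinate string contributes one axis of the cube.
    axes = [datacube[dim].split("/") for dim in dims]
    for combo in product(*axes):
        yield dict(zip(dims, combo))


identifiers = list(
    expand_datacube({"class": "od", "expver": "0001/0002", "param": "1/2"})
)
# 1 * 2 * 2 = 4 identifiers, the first being
# {"class": "od", "expver": "0001", "param": "1"}
```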

PyQube.from_arena_json(json_str: str) -> PyQube

Reconstruct a Qube from arena JSON (a flat BFS array produced by to_arena_json):

# q is an existing PyQube
arena_str = q.to_arena_json()
restored = PyQube.from_arena_json(arena_str)

Serialization

to_ascii() -> str

Return the human-readable ASCII tree representation:

print(q.to_ascii())
# root
# ├── class=od, expver=0001/0002, param=1/2
# └── class=rd
#     ├── expver=0001, param=1/2/3
#     └── expver=0002, param=1/2

Also available as str(q) (via __str__).

to_arena_json() -> str

Return a JSON string containing a flat BFS array of node records:

import json

arena = json.loads(q.to_arena_json())
for node in arena:
    print(node["dim"], node["coords"])

Each record: { "dim": "class", "coords": "od/rd", "parent": 0, "children": [1, 2] }
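
As a sketch of how such a flat array can be walked, the snippet below recovers the root-to-leaf paths from a hand-written sample. The sample data is hypothetical and rests on assumptions the text does not confirm: that the root sits at index 0, that its parent is null, and that children holds indices into the same array.

```python
import json

# Hand-written sample in the record shape shown above. ASSUMPTIONS: the root is
# at index 0, its "parent" is null, and "children" holds indices into the array.
arena_json = """[
  {"dim": "root",  "coords": "root", "parent": null, "children": [1, 2]},
  {"dim": "class", "coords": "od",   "parent": 0,    "children": [3]},
  {"dim": "class", "coords": "rd",   "parent": 0,    "children": [4]},
  {"dim": "param", "coords": "1/2",  "parent": 1,    "children": []},
  {"dim": "param", "coords": "3",    "parent": 2,    "children": []}
]"""


def leaf_paths(arena, index=0, prefix=()):
    """Yield the chain of (dim, coords) pairs from the root to each leaf."""
    node = arena[index]
    # The root itself is not part of the path.
    path = prefix if index == 0 else prefix + ((node["dim"], node["coords"]),)
    if not node["children"]:
        yield path
    for child in node["children"]:
        yield from leaf_paths(arena, child, path)


paths = list(leaf_paths(json.loads(arena_json)))
# [(("class", "od"), ("param", "1/2")), (("class", "rd"), ("param", "3"))]
```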

to_datacubes() -> list[dict]

Decompose into a list of datacube dictionaries. Each dict maps dimension names to coordinate strings:

for dc in q.to_datacubes():
    print(dc)
# {'class': 'od', 'expver': '0001/0002', 'param': '1/2'}
# {'class': 'rd', 'expver': '0001', 'param': '1/2/3'}
# ...

Merging

append(other: PyQube) -> None

Merge another Qube into this one in-place. The result is automatically compressed. After the call, other is empty.

a = PyQube.from_ascii("root\n└── class=od, param=1")
b = PyQube.from_ascii("root\n└── class=rd, param=2")
a.append(b)
print(a)

append_many(others: list[PyQube]) -> None

Merge multiple Qubes at once:

base = PyQube()
qubes = [PyQube.from_ascii(f"root\n└── class=c{i}, param=1") for i in range(100)]
base.append_many(qubes)

append_datacube(datacube: dict[str, str], order: list[str] | None = None, accept_existing_order: bool = False) -> None

Merge a single flat datacube dictionary into this Qube in-place. This is a convenience wrapper around from_datacube + append: it constructs a temporary single-branch Qube from datacube and merges it, then compresses the result.

order controls the dimension nesting order of the new branch (see from_datacube). accept_existing_order is reserved for future use.

q = PyQube.from_ascii("""root
└── class=od
    └── expver=0001
        └── param=1""")

q.append_datacube({"class": "od", "expver": "0002", "param": "1"}, ["class", "expver", "param"])
print(q.all_unique_dim_coords())
# {'class': ['od'], 'expver': ['0001', '0002'], 'param': ['1']}

# Build a Qube incrementally from a list of datacube dicts
q = PyQube()
for dc in [{"class": "od", "param": "1"}, {"class": "rd", "param": "2"}]:
    q.append_datacube(dc, ["class", "param"])
print(q)
# root
# ├── class=od
# │   └── param=1
# └── class=rd
#     └── param=2  (exact layout may vary)

Manipulation

compress() -> None

Compress the Qube in-place. Merges structurally identical sibling nodes, removes empty nodes, and deduplicates. Called automatically by append and append_many.

q.compress()
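
The sibling-merge rule can be illustrated on a plain nested-dict tree. This is a simplified sketch and not the library's algorithm: the node layout ({"dim", "coords", "children"}) and the compress function below are assumptions made for illustration, and the sketch covers only the merging of siblings that share a dimension and an identical subtree.

```python
import json


def compress(node):
    """Merge sibling nodes that share a dimension and an identical subtree.

    A node is {"dim": str, "coords": [str, ...], "children": [node, ...]}.
    """
    merged = {}
    for child in (compress(c) for c in node["children"]):
        # Siblings are mergeable when dim and (already compressed) subtree match.
        key = (child["dim"], json.dumps(child["children"], sort_keys=True))
        if key in merged:
            merged[key]["coords"] = sorted(
                set(merged[key]["coords"]) | set(child["coords"])
            )
        else:
            merged[key] = child
    # Sort children so equal subtrees always serialize to the same key.
    children = sorted(merged.values(), key=lambda c: json.dumps(c, sort_keys=True))
    return {"dim": node["dim"], "coords": sorted(node["coords"]), "children": children}


leaf = {"dim": "param", "coords": ["1", "2"], "children": []}
tree = {"dim": "root", "coords": ["root"], "children": [
    {"dim": "class", "coords": ["1"], "children": [
        {"dim": "expver", "coords": ["0001"], "children": [dict(leaf)]},
        {"dim": "expver", "coords": ["0002"], "children": [dict(leaf)]},
    ]},
]}
compact = compress(tree)
class_node = compact["children"][0]
# class_node now has a single expver child with coords ["0001", "0002"].
```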

drop(dims: list[str]) -> None

Remove one or more dimensions from the tree. Children of removed nodes are re-parented to the grandparent, preserving the rest of the structure. The result is automatically compressed.

q = PyQube.from_ascii("""root
└── class=1
    ├── expver=0001
    │   └── param=1/2
    └── expver=0002
        └── param=1/2""")

q.drop(["expver"])
print(q)
# root
# └── class=1
#     └── param=1/2
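
At the level of datacube dictionaries, the effect of dropping a dimension can be sketched as follows. drop_dims is a hypothetical helper, not part of the bindings; it models only the outcome (the key disappears and branches that become identical collapse), not the tree re-parenting itself.

```python
def drop_dims(datacubes, dims):
    """Remove dimensions from datacube dicts and discard duplicates (illustrative only)."""
    seen, result = set(), []
    for dc in datacubes:
        reduced = {k: v for k, v in dc.items() if k not in dims}
        key = tuple(sorted(reduced.items()))
        if key not in seen:  # two branches can collapse into the same cube
            seen.add(key)
            result.append(reduced)
    return result


remaining = drop_dims(
    [
        {"class": "1", "expver": "0001", "param": "1/2"},
        {"class": "1", "expver": "0002", "param": "1/2"},
    ],
    ["expver"],
)
# remaining == [{"class": "1", "param": "1/2"}]
```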

squeeze() -> None

Drop all dimensions that have only a single coordinate value. Equivalent to calling drop on every dimension whose union of values has length 1.

q = PyQube.from_ascii("""root
└── class=1
    ├── expver=0001
    │   └── param=1/2
    └── expver=0002
        └── param=1/2""")

q.squeeze()
print(q)
# root
# └── expver=0001/0002
#     └── param=1/2
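
Which dimensions qualify for squeezing can be worked out from the per-dimension union of values. The sketch below does this on datacube dictionaries; single_valued_dims is a hypothetical helper used only to illustrate the "union of values has length 1" criterion.

```python
def single_valued_dims(datacubes):
    """Return the dimensions whose union of coordinate values has length 1."""
    union = {}
    for dc in datacubes:
        for dim, coords in dc.items():
            union.setdefault(dim, set()).update(coords.split("/"))
    return sorted(dim for dim, values in union.items() if len(values) == 1)


to_drop = single_valued_dims([
    {"class": "1", "expver": "0001", "param": "1/2"},
    {"class": "1", "expver": "0002", "param": "1/2"},
])
# to_drop == ["class"]: expver and param each have two distinct values.
```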

Query

all_unique_dim_coords() -> dict[str, list[str]]

Return a dictionary mapping each dimension name to a list of all coordinate values that appear anywhere in the Qube.

coords = q.all_unique_dim_coords()
# {'class': ['1'], 'expver': ['0001', '0002'], 'param': ['1', '2']}

select(request: dict, mode: str | None, consume: bool | None) -> PyQube

Return a new Qube containing only the identifiers that satisfy the request. Each key in request is a dimension name; values may be a single string/int or a list.

mode controls behaviour for dimensions absent in a branch:

  • None or any other string (default): keep branches that have at least one matching value.
  • "prune": additionally remove branches that are missing any requested dimension entirely.

selected = q.select({"class": [1], "param": [1, 2]}, None, None)
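
The difference between the two modes can be sketched on datacube dictionaries. This is one reading of the rules above, not the library's code: select_datacubes is a hypothetical helper, values are compared as strings, and the handling of a missing dimension in default mode (the request key is ignored for that branch) is an assumption. The consume argument is not modelled.

```python
def select_datacubes(datacubes, request, mode=None):
    """Filter datacube dicts by a request (one reading of the rules above)."""
    selected = []
    for dc in datacubes:
        narrowed = dict(dc)
        keep = True
        for dim, wanted in request.items():
            # Accept a single value or a list; compare everything as strings.
            values = wanted if isinstance(wanted, list) else [wanted]
            wanted_set = {str(v) for v in values}
            if dim not in dc:
                # ASSUMPTION: default mode ignores a missing dimension,
                # while "prune" rejects the branch.
                if mode == "prune":
                    keep = False
                    break
                continue
            matches = [v for v in dc[dim].split("/") if v in wanted_set]
            if not matches:
                keep = False
                break
            narrowed[dim] = "/".join(matches)
        if keep:
            selected.append(narrowed)
    return selected


cubes = [{"class": "1", "param": "1/2/3"}, {"param": "2"}]
request = {"class": [1], "param": [1, 2]}
default_result = select_datacubes(cubes, request)
pruned_result = select_datacubes(cubes, request, mode="prune")
# default_result keeps both branches; pruned_result drops the one without "class".
```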

Special Methods

| Method | Description |
|---|---|
| __str__() | Same as to_ascii() |
| __repr__() | Returns PyQube(root_id=...) |
| __len__() | Returns datacube_count(), the number of leaf identifiers |

q = PyQube.from_ascii("root\n├── class=od, param=1/2\n└── class=rd, param=3")
print(len(q))  # 3
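
The count of 3 follows from the datacube sizes: each dense datacube holds the product of its per-dimension coordinate counts. A minimal sketch of that arithmetic, using a hypothetical identifier_count helper on datacube dictionaries and assuming the datacubes do not overlap:

```python
from math import prod


def identifier_count(datacubes):
    """Count identifiers across datacube dicts (illustrative only)."""
    # Each datacube is dense, so it holds the product of its per-dimension
    # coordinate counts; the cubes are assumed not to overlap.
    return sum(
        prod(len(coords.split("/")) for coords in dc.values())
        for dc in datacubes
    )


total = identifier_count([
    {"class": "od", "param": "1/2"},  # 1 * 2 = 2 identifiers
    {"class": "rd", "param": "3"},    # 1 * 1 = 1 identifier
])
# total == 3
```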

Complete Example

from qubed import PyQube

# Build from ASCII
q = PyQube.from_ascii("""root
├── class=od
│   └── expver=0001/0002, param=1/2
└── class=rd
    ├── expver=0001, param=1/2/3
    └── expver=0002, param=1/2""")

# Inspect
print(f"Identifiers: {len(q)}")
print(q)

# Decompose to datacubes
for dc in q.to_datacubes():
    print(dc)

# Roundtrip through arena JSON
arena = q.to_arena_json()
restored = PyQube.from_arena_json(arena)
assert str(q) == str(restored)

# Merge two qubes
other = PyQube.from_ascii("root\n└── class=xd, expver=0001, param=99")
q.append(other)
print(q)

py_qubed_meteo — Python Adapters

The py_qubed_meteo package exposes the qubed-meteo adapter crate to Python via PyO3. It provides standalone functions for parsing MARS lists and FDB path lists, and for converting Qubes to DSS constraint format.

Installation

cd py_qubed_meteo
maturin develop --release

Then in Python:

from qubed_meteo import from_mars_list_py, from_fdb_list_py, to_dss_constraints_py

Functions

from_mars_list_py(text: str) -> str

Parse MARS list text and return the resulting Qube as an ASCII string. The returned string can be passed to PyQube.from_ascii() to get a PyQube object.

Input format: Indentation-based MARS listing where indented lines are children of preceding less-indented lines. Tokens are comma-separated key=value pairs; values can be slash-separated.

from qubed_meteo import from_mars_list_py
from qubed import PyQube

mars_text = """class=od, expver=0001
  param=1/2
  param=3
class=rd, expver=0002
  param=4"""

ascii_tree = from_mars_list_py(mars_text)  # avoid shadowing the built-in ascii()
q = PyQube.from_ascii(ascii_tree)
print(q)
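
The indentation rule can be sketched in pure Python. This is not the adapter's parser: parse_mars_list is a hypothetical function that flattens a listing into one dictionary per leaf line, assuming space-based indentation and well-formed key=value tokens.

```python
def parse_mars_list(text):
    """Flatten an indentation-based listing into datacube dicts (illustrative only)."""
    lines = [line for line in text.splitlines() if line.strip()]
    stack = []  # (indent, {key: value}) for the current chain of ancestors
    cubes = []
    for i, line in enumerate(lines):
        indent = len(line) - len(line.lstrip(" "))
        pairs = dict(tok.strip().split("=", 1) for tok in line.split(","))
        # Pop siblings and deeper lines; what remains are this line's ancestors.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, pairs))
        # A line is a leaf when the next line is not indented further.
        if i + 1 < len(lines):
            next_indent = len(lines[i + 1]) - len(lines[i + 1].lstrip(" "))
        else:
            next_indent = -1
        if next_indent <= indent:
            cube = {}
            for _, ancestor_pairs in stack:
                cube.update(ancestor_pairs)
            cubes.append(cube)
    return cubes


sample = """class=od, expver=0001
  param=1/2
  param=3
class=rd, expver=0002
  param=4"""
cubes = parse_mars_list(sample)
# Three leaf lines give three datacube dicts, each inheriting its parent's keys.
```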

from_fdb_list_py(items: list[str]) -> str

Build a Qube from a list of FDB-style comma-separated path strings. Returns an ASCII Qube string.

Each string is a comma-separated sequence of key=value segments (e.g. "class=od,expver=0001,param=1/2").

from qubed_meteo import from_fdb_list_py
from qubed import PyQube

items = [
    "class=od,expver=0001,param=1/2",
    "class=rd,expver=0003,param=3/4",
    "class=rd,expver=0002,param=3/4",
]

ascii_tree = from_fdb_list_py(items)  # avoid shadowing the built-in ascii()
q = PyQube.from_ascii(ascii_tree)
print(q)
# root
# ├── class=od
# │   └── expver=0001
# │       └── param=1/2
# └── class=rd
#     └── expver=0002/0003
#         └── param=3/4
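
How a single path string breaks down into key-value segments can be sketched as below. parse_fdb_item is a hypothetical helper, not the adapter itself; treating a segment without = as a parse failure is an assumption.

```python
def parse_fdb_item(item):
    """Split one FDB-style path string into a datacube dict (illustrative only)."""
    datacube = {}
    for segment in item.split(","):
        key, sep, value = segment.strip().partition("=")
        if not sep or not key:
            # ASSUMPTION: a segment without "key=" counts as a parse failure.
            raise ValueError(f"malformed segment: {segment!r}")
        datacube[key] = value
    return datacube


parsed_item = parse_fdb_item("class=od,expver=0001,param=1/2")
# parsed_item == {"class": "od", "expver": "0001", "param": "1/2"}
```

The values stay as strings at this stage, which is what keeps "0001" intact.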

to_dss_constraints_py(ascii: str) -> str

Convert an ASCII Qube string to DSS-style constraints JSON. Returns a JSON string (array of maps).

from qubed_meteo import to_dss_constraints_py
import json

ascii_tree = """root
├── class=od, expver=0001/0002, param=1/2
└── class=rd, expver=0003, param=3/4"""

constraints_json = to_dss_constraints_py(ascii_tree)
constraints = json.loads(constraints_json)

for c in constraints:
    print(json.dumps(c))  # json.dumps keeps the double-quoted JSON form
# {"class": ["od"], "expver": ["0001", "0002"], "param": ["1", "2"]}
# {"class": ["rd"], "expver": ["0003"], "param": ["3", "4"]}

Each object in the array has the same set of dimension keys. Dimensions not present in a particular datacube get an empty array [].


Complete Workflow Example

from qubed import PyQube
from qubed_meteo import from_fdb_list_py, to_dss_constraints_py
import json

# 1. Ingest from FDB listing
fdb_items = [
    "class=od,expver=0001,param=1/2",
    "class=od,expver=0002,param=1/2",
    "class=rd,expver=0001,param=1/2/3",
]

qube = PyQube.from_ascii(from_fdb_list_py(fdb_items))
print(f"Built qube with {len(qube)} identifiers")
print(qube)

# 2. Export to DSS constraints
constraints = json.loads(to_dss_constraints_py(str(qube)))
print(json.dumps(constraints, indent=2))

# 3. Merge with another qube
extra = PyQube.from_ascii("root\n└── class=xd, expver=0001, param=99")
qube.append(extra)
print(qube)

Notes

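  • The ingest functions (from_mars_list_py and from_fdb_list_py) return ASCII Qube strings as a lightweight bridge format; use PyQube.from_ascii() to convert them to a full PyQube object. to_dss_constraints_py takes such an ASCII string as input and returns a JSON string.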
  • Leading zeros in coordinate values (e.g. "0001") are preserved through all adapter functions.
  • The functions raise ValueError on parse failures.