Qubed
Qubed is a Rust library (with Python bindings) for working with trees of datacubes — a compressed data structure that efficiently represents large, sparse collections of key-value identifiers.
What is a Qube?
In many domains — particularly meteorological data — datasets are labelled by sets of key-value pairs called identifiers:
{ class: "od", expver: "0001", param: "1", levtype: "sfc" }
When datasets are dense (every combination of key values exists), they can be represented as a single datacube. In practice, however, datasets are sparse — not every combination is valid. A Qube represents these sparse datasets as a compressed tree of dense datacubes, achieving massive compression while still supporting efficient operations.
For example, the ECMWF Climate DT dataset has over 1 billion distinct identifiers, yet it can be stored in a Qube of roughly 3,000 nodes occupying about 11 MB of memory.
At a Glance
root
├── class=od, expver=0001/0002, param=1/2
└── class=rd
    ├── expver=0001, param=1/2/3
    └── expver=0002, param=1/2
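This tree represents 9 unique identifiers in just 5 lines. Each comma-separated key=value on a line is a node in its own right, so the tree holds 9 nodes (including the root), where the fully expanded tree needs 16. Each path from root to leaf defines a dense datacube; the tree as a whole represents their union.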
Key Features
- Compression — automatically merges sibling nodes with identical subtree structure, reducing tree size dramatically.
- Set operations — union, intersection, difference, and symmetric difference, all operating directly on the compressed form.
- Selection & filtering — query the tree by dimension and coordinate values.
- Multiple serialization formats — ASCII tree, nested JSON, arena JSON (BFS flat array), and DSS constraints (array-of-maps).
- Adapters — ingest from MARS list format, FDB path lists, and DSS constraint JSON.
- Python bindings — full access to Qube construction, serialization, and adapter functionality from Python via PyO3.
Crate Organization
| Crate | Purpose |
|---|---|
| qubed | Core data structure: Qube, Coordinates, compression, selection, serialization |
| qubed-meteo | Domain-specific adapters: MARS list parser, FDB list parser, DSS constraints |
| py_qubed | Python bindings for the core qubed crate |
| py_qubed_meteo | Python bindings for the qubed-meteo adapters |
Getting Started
- For conceptual background, see Background: Datacubes, Trees and Compressed Trees.
- For a hands-on tutorial, jump to the Quickstart.
- For Rust API details, see qubed and qubed-meteo.
- For Python API details, see py_qubed and py_qubed_meteo.
Building the Book
# Install mdbook if needed
cargo install mdbook
# Build
mdbook build docs
# Serve locally with live-reload
mdbook serve docs -o
Datacubes, Trees and Compressed Trees
This chapter introduces the core concepts behind Qubed. Feel free to skip ahead to the Quickstart if you’d rather learn by doing.
Identifiers
Qubed is primarily geared towards dealing with data files uniquely labelled by sets of key-value pairs. We call such a set an identifier:
{
"class": "d1",
"dataset": "climate-dt",
"generation": "1",
"date": "20241102",
"resolution": "high",
"time": "0000"
}
Each identifier maps to exactly one dataset (a GRIB field, a file on disk, an API result, etc.). We’re interested in describing which identifiers currently exist and performing efficient operations over them.
Dense Datacubes
If we’re lucky, the set of identifiers forms a dense datacube — every combination of key values is present:
class=d1/d2, dataset=climate-dt, generation=1/2/3,
model=icon, date=20241102/20241103, resolution=high/low,
time=0000/0600/1200/1800
This single object represents 2 × 1 × 3 × 1 × 2 × 2 × 4 = 96 distinct datasets. Dense datacubes are compact and efficient.
Sparse Datacubes as Trees
In practice, datasets are rarely fully dense. For example, certain models may only produce data at certain resolutions or certain experiments may only cover a subset of parameters.
We can represent which data exists as a tree, where each node carries a dimension name and a set of coordinate values:
root
├── class=od
│   ├── expver=0001
│   │   ├── param=1
│   │   └── param=2
│   └── expver=0002
│       ├── param=1
│       └── param=2
└── class=rd
    ├── expver=0001
    │   ├── param=1
    │   ├── param=2
    │   └── param=3
    └── expver=0002
        ├── param=1
        └── param=2
Each root-to-leaf path defines one identifier. This tree represents 9 identifiers. It can express sparsity that a flat datacube cannot — above, class=rd, expver=0001 has an extra param=3 that class=od doesn’t have.
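To make the path-to-identifier correspondence concrete, here is a minimal sketch that enumerates every root-to-leaf path of the tree above. The `Node` type and the `n`, `params` and `expand` helpers are hypothetical stand-ins written for this illustration. They are not part of the qubed API.

```rust
/// Hypothetical toy node: a dimension, its values, and its children.
struct Node {
    key: &'static str,
    values: Vec<&'static str>,
    children: Vec<Node>,
}

fn n(key: &'static str, values: &[&'static str], children: Vec<Node>) -> Node {
    Node { key, values: values.to_vec(), children }
}

/// One leaf `param` node per value, as in the fully expanded tree.
fn params(values: &[&'static str]) -> Vec<Node> {
    values.iter().map(|&v| n("param", &[v], vec![])).collect()
}

/// An identifier is the list of key=value pairs along one path.
type Identifier = Vec<(&'static str, &'static str)>;

/// Depth-first walk: choosing one value at every node along a
/// root-to-leaf path yields exactly one identifier.
fn expand(nodes: &[Node], path: &mut Identifier, out: &mut Vec<Identifier>) {
    for node in nodes {
        for &value in &node.values {
            path.push((node.key, value));
            if node.children.is_empty() {
                out.push(path.clone());
            } else {
                expand(&node.children, path, out);
            }
            path.pop();
        }
    }
}

/// The expanded example tree (the children of `root`).
fn example_tree() -> Vec<Node> {
    vec![
        n("class", &["od"], vec![
            n("expver", &["0001"], params(&["1", "2"])),
            n("expver", &["0002"], params(&["1", "2"])),
        ]),
        n("class", &["rd"], vec![
            n("expver", &["0001"], params(&["1", "2", "3"])),
            n("expver", &["0002"], params(&["1", "2"])),
        ]),
    ]
}

fn main() {
    let mut ids = Vec::new();
    expand(&example_tree(), &mut Vec::new(), &mut ids);
    for id in &ids {
        println!("{:?}", id);
    }
    println!("{} identifiers", ids.len());
}
```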
Compression: Trees of Dense Datacubes
The expanded tree above contains a lot of redundant information. Many subtrees are structurally identical. In practice, real-world data tends to be “nearly dense” — it’s composed of a modest number of dense datacubes.
Qubed compresses the tree by merging sibling nodes that have identical subtree structure. The algorithm computes a structural hash of each node (covering its dimension, children’s keys, children’s values, and recursively their children) and merges siblings whose hashes match:
root
├── class=od, expver=0001/0002, param=1/2
└── class=rd
    ├── expver=0001, param=1/2/3
    └── expver=0002, param=1/2
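The 16-node expanded tree (counting the root) shrinks to 9 nodes. It is drawn on only 5 lines because a chain of only-children is written on one line, separated by commas. It still represents exactly the same 9 identifiers, and each leaf-path in the compressed tree is a dense datacube.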
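Because every leaf-path is a dense datacube, the number of identifiers can be counted without expanding anything. Multiply the number of values at each node along a path, then add the results over sibling branches. The sketch below applies this rule to the compressed tree above. It uses a hypothetical toy `Node` type, not qubed's `Qube`, and each comma-separated `key=value` becomes its own node in a chain.

```rust
/// Hypothetical toy node: a dimension, its values, and its children.
struct Node {
    key: &'static str,
    values: Vec<&'static str>,
    children: Vec<Node>,
}

fn n(key: &'static str, values: &[&'static str], children: Vec<Node>) -> Node {
    Node { key, values: values.to_vec(), children }
}

/// Identifiers under `node`, counted without expansion: each of the node's
/// values combines with every identifier below it, and sibling subtrees
/// hold disjoint identifiers, so their counts are summed.
fn count(node: &Node) -> usize {
    let below = if node.children.is_empty() {
        1
    } else {
        node.children.iter().map(count).sum::<usize>()
    };
    node.values.len() * below
}

/// The compressed example tree (children of `root`).
fn compressed_tree() -> Vec<Node> {
    vec![
        n("class", &["od"], vec![n("expver", &["0001", "0002"], vec![n("param", &["1", "2"], vec![])])]),
        n("class", &["rd"], vec![
            n("expver", &["0001"], vec![n("param", &["1", "2", "3"], vec![])]),
            n("expver", &["0002"], vec![n("param", &["1", "2"], vec![])]),
        ]),
    ]
}

fn main() {
    let tree = compressed_tree();
    for branch in &tree {
        println!("{}={}: {} identifiers", branch.key, branch.values.join("/"), count(branch));
    }
    let total: usize = tree.iter().map(count).sum();
    println!("total: {}", total);
}
```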
Restriction: the same key=value pair may not appear in more than one sibling node. For example, the following would not be allowed, because expver=0001 appears in two siblings under class=rd:
root
├── class=od, expver=0001/0002, param=1/2
└── class=rd
    ├── expver=0001, param=3
    └── expver=0001/0002, param=1/2
This restriction ensures that looking up a particular expver value in a branch never requires following multiple children: each value appears under at most one child per dimension.
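The restriction is easy to check mechanically. The following sketch walks a tree and reports whether any key=value pair appears in more than one sibling node. It uses a hypothetical toy `Node` type rather than the qubed API.

```rust
use std::collections::HashSet;

/// Hypothetical toy node: a dimension, its values, and its children.
struct Node {
    key: &'static str,
    values: Vec<&'static str>,
    children: Vec<Node>,
}

fn n(key: &'static str, values: &[&'static str], children: Vec<Node>) -> Node {
    Node { key, values: values.to_vec(), children }
}

/// True if, at every level, no key=value pair is shared by two siblings.
fn siblings_are_disjoint(children: &[Node]) -> bool {
    let mut seen: HashSet<(&str, &str)> = HashSet::new();
    for child in children {
        for &value in &child.values {
            // `insert` returns false when an earlier sibling already used this pair.
            if !seen.insert((child.key, value)) {
                return false;
            }
        }
    }
    children.iter().all(|c| siblings_are_disjoint(&c.children))
}

/// The compressed tree from the text: allowed.
fn valid_tree() -> Vec<Node> {
    vec![
        n("class", &["od"], vec![n("expver", &["0001", "0002"], vec![n("param", &["1", "2"], vec![])])]),
        n("class", &["rd"], vec![
            n("expver", &["0001"], vec![n("param", &["1", "2", "3"], vec![])]),
            n("expver", &["0002"], vec![n("param", &["1", "2"], vec![])]),
        ]),
    ]
}

/// The disallowed example: expver=0001 appears in two siblings under class=rd.
fn invalid_tree() -> Vec<Node> {
    vec![
        n("class", &["od"], vec![n("expver", &["0001", "0002"], vec![n("param", &["1", "2"], vec![])])]),
        n("class", &["rd"], vec![
            n("expver", &["0001"], vec![n("param", &["3"], vec![])]),
            n("expver", &["0001", "0002"], vec![n("param", &["1", "2"], vec![])]),
        ]),
    ]
}

fn main() {
    println!("valid tree passes: {}", siblings_are_disjoint(&valid_tree()));
    println!("invalid tree passes: {}", siblings_are_disjoint(&invalid_tree()));
}
```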
Scale
At real-world scale these properties are dramatic. For example, the ECMWF Climate DT dataset contains over 1 billion distinct identifiers but can be represented by a Qube with approximately 3,000 nodes in about 11 MB of memory.
What’s Next
- Under the Hood: Algorithms — how set operations and compression work internally.
- Quickstart — build and manipulate Qubes hands-on.
Under the Hood: Algorithms
This chapter explains the key algorithms that power Qubed: set operations on compressed trees and compression itself.
Set Operations
Qubes represent sets of identifiers, so the familiar set operations are all defined:
| Operation | Rust method | Description |
|---|---|---|
| Union | a.append(&mut b) | All identifiers in A or B (or both) |
| Intersection | select with intersection logic | Identifiers in both A and B |
| Difference | internal set operation | Identifiers in A but not B |
| Symmetric difference | internal set operation | Identifiers in exactly one of A or B |
How It Works
The algorithm traverses both trees in tandem, recursively:
for node_a in level_A:
for node_b in level_B:
just_A, intersection, just_B = fused_set_operation(
node_a.coordinates,
node_b.coordinates
)
At each level, nodes are grouped by dimension. For every pair of nodes sharing the same dimension, the algorithm computes three disjoint sets of coordinate values:
- just_A — values only in node A
- intersection — values in both nodes
- just_B — values only in node B
Depending on the operation:
| Operation | Keeps |
|---|---|
| Union | just_A + intersection + just_B |
| Intersection | intersection only |
| A − B | just_A only |
| B − A | just_B only |
| Symmetric difference | just_A + just_B |
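For integer coordinates stored as sorted, de-duplicated lists, the three-way split can be computed in one linear pass. Each operation then keeps the partitions listed in the table. This is a sketch under those assumptions. `fused_set_operation` mirrors the pseudocode name, while `Partition`, `SetOp`, `keeps` and `apply` are hypothetical helpers. The sketch handles a single pair of sibling value lists and does not recurse into children.

```rust
use std::cmp::Ordering;

/// The three disjoint pieces produced by comparing two coordinate lists.
#[derive(Debug, PartialEq)]
struct Partition {
    just_a: Vec<i32>,
    intersection: Vec<i32>,
    just_b: Vec<i32>,
}

/// Walk two sorted, de-duplicated lists in tandem (linear time).
fn fused_set_operation(a: &[i32], b: &[i32]) -> Partition {
    let (mut i, mut j) = (0, 0);
    let mut p = Partition { just_a: Vec::new(), intersection: Vec::new(), just_b: Vec::new() };
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => {
                p.just_a.push(a[i]);
                i += 1;
            }
            Ordering::Greater => {
                p.just_b.push(b[j]);
                j += 1;
            }
            Ordering::Equal => {
                p.intersection.push(a[i]);
                i += 1;
                j += 1;
            }
        }
    }
    // Whatever remains in one list cannot appear in the other.
    p.just_a.extend_from_slice(&a[i..]);
    p.just_b.extend_from_slice(&b[j..]);
    p
}

#[derive(Clone, Copy, Debug)]
enum SetOp {
    Union,
    Intersection,
    AMinusB,
    BMinusA,
    SymmetricDifference,
}

/// Which partitions each operation keeps: (just_a, intersection, just_b).
fn keeps(op: SetOp) -> (bool, bool, bool) {
    match op {
        SetOp::Union => (true, true, true),
        SetOp::Intersection => (false, true, false),
        SetOp::AMinusB => (true, false, false),
        SetOp::BMinusA => (false, false, true),
        SetOp::SymmetricDifference => (true, false, true),
    }
}

/// Apply an operation to one pair of value lists and return sorted values.
fn apply(op: SetOp, a: &[i32], b: &[i32]) -> Vec<i32> {
    let p = fused_set_operation(a, b);
    let (keep_a, keep_both, keep_b) = keeps(op);
    let mut out = Vec::new();
    if keep_a {
        out.extend(p.just_a);
    }
    if keep_both {
        out.extend(p.intersection);
    }
    if keep_b {
        out.extend(p.just_b);
    }
    out.sort_unstable();
    out
}

fn main() {
    let (a, b) = ([1, 2, 3], [2, 3, 4]);
    println!("{:?}", fused_set_operation(&a, &b));
    for op in [SetOp::Union, SetOp::Intersection, SetOp::AMinusB, SetOp::BMinusA, SetOp::SymmetricDifference] {
        println!("{:?}: {:?}", op, apply(op, &a, &b));
    }
}
```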
The crucial insight is that each partition gets different children:
- just_A inherits the children of node A
- just_B inherits the children of node B
- intersection gets children computed by recursively calling the set operation on the sub-trees of A and B
This recursive decomposition ensures the result is still a valid compressed Qube.
Performance Considerations
The pairwise comparison is quadratic in the number of matching nodes at each level: $O(N_A \times N_B)$ comparisons per dimension group. In practice this is manageable because:
- Once any of just_A, intersection, or just_B is determined to be empty, it can be discarded immediately.
- For sorted coordinate types (integers, ranges), the intersection can be computed in linear time by walking both sorted lists in tandem.
- After the operation, compression merges any resulting sibling nodes with identical structure, keeping the tree compact.
Compression
Compression is the process of reducing tree size while preserving the set of identifiers. It works in three phases:
Phase 1: Recursive Coordinate Merging
Starting from the leaves and working upward, the algorithm identifies sibling nodes (children of the same parent, sharing the same dimension) that have identical structural hashes.
The structural hash of a node is computed from:
- The node’s dimension name
- The dimension names, coordinate values and structural hashes of all its children (recursively)
- But not the node’s own coordinate values
Two sibling nodes with the same structural hash have identical subtree shapes. Their coordinates can be merged into a single node without losing information:
Before:                        After:
├── expver=0001                └── expver=0001/0002
│   ├── param=1                    ├── param=1
│   └── param=2                    └── param=2
└── expver=0002
    ├── param=1
    └── param=2
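A compact sketch of Phase 1 follows, using a hypothetical owned `Node` type. It hashes each child's dimension, values and subtree, then sorts those child hashes so that sibling order does not matter (a choice made for this sketch). Siblings whose structural hashes match are then merged. Phases 2 and 3 are not shown, and qubed itself works on a slot-map arena rather than owned nodes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical owned tree node; `values` are kept sorted and de-duplicated.
#[derive(Debug)]
struct Node {
    key: String,
    values: Vec<String>,
    children: Vec<Node>,
}

fn node(key: &str, values: &[&str], children: Vec<Node>) -> Node {
    Node {
        key: key.to_string(),
        values: values.iter().map(|v| v.to_string()).collect(),
        children,
    }
}

/// Structural hash: the node's dimension plus each child's dimension, values
/// and (recursively) structural hash. The node's own values are excluded.
fn structural_hash(n: &Node) -> u64 {
    let mut child_hashes: Vec<u64> = n
        .children
        .iter()
        .map(|c| {
            let mut h = DefaultHasher::new();
            c.key.hash(&mut h);
            c.values.hash(&mut h);
            structural_hash(c).hash(&mut h);
            h.finish()
        })
        .collect();
    child_hashes.sort_unstable(); // make the result independent of child order
    let mut h = DefaultHasher::new();
    n.key.hash(&mut h);
    child_hashes.hash(&mut h);
    h.finish()
}

/// Phase 1, bottom-up: merge inside each child first, then merge siblings
/// whose structural hashes match by taking the union of their values.
fn merge_siblings(n: &mut Node) {
    for child in &mut n.children {
        merge_siblings(child);
    }
    let mut merged: Vec<(u64, Node)> = Vec::new();
    for child in n.children.drain(..) {
        let h = structural_hash(&child);
        match merged.iter_mut().find(|(seen, _)| *seen == h) {
            Some((_, existing)) => {
                existing.values.extend(child.values);
                existing.values.sort();
                existing.values.dedup();
            }
            None => merged.push((h, child)),
        }
    }
    n.children = merged.into_iter().map(|(_, c)| c).collect();
}

fn main() {
    // The "Before" tree from the text, under a root node.
    let mut root = node("root", &[], vec![
        node("expver", &["0001"], vec![node("param", &["1"], vec![]), node("param", &["2"], vec![])]),
        node("expver", &["0002"], vec![node("param", &["1"], vec![]), node("param", &["2"], vec![])]),
    ]);
    merge_siblings(&mut root);
    println!("{:#?}", root);
}
```

Since this sketch merges bottom-up, the leaf param=1 and param=2 siblings also collapse into one param=1/2 node.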
Phase 2: Pruning Empty Nodes
After merging, some nodes may have empty coordinate sets (their values were absorbed by a sibling). These empty nodes are pruned from the tree.
Phase 3: Deduplication
A final pass deduplicates any nodes that became structurally identical after merging. This is done by recomputing structural hashes and collapsing identical siblings.
Hash Caching
Structural hashes are cached in each node using an AtomicU64. When a node is modified, its cached hash and those of all its ancestors are invalidated (reset to 0), because an ancestor's hash depends on everything below it. Hashes are then recomputed lazily, only when needed, which keeps repeated compression operations efficient.
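The caching scheme can be sketched with `std::sync::atomic::AtomicU64`, using 0 as the "not cached" marker. In this hypothetical arena-based `Tree`, adding a child invalidates the parent and every ancestor above it. Reads take only `&self`, which is why the cache is atomic rather than a plain `u64`. For brevity the hash covers only dimension names and child hashes. The `recomputed` counter exists only to show when hashes are actually recalculated.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical arena node with a cached structural hash (0 = not cached).
struct Node {
    key: String,
    parent: Option<usize>,
    children: Vec<usize>,
    hash_cache: AtomicU64,
}

struct Tree {
    nodes: Vec<Node>,
    /// Counts real hash computations, to show when the cache is bypassed.
    recomputed: AtomicU64,
}

impl Tree {
    fn new() -> Self {
        Tree { nodes: Vec::new(), recomputed: AtomicU64::new(0) }
    }

    /// Add a node; modifying `parent` invalidates it and its ancestors.
    fn add(&mut self, key: &str, parent: Option<usize>) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node {
            key: key.to_string(),
            parent,
            children: Vec::new(),
            hash_cache: AtomicU64::new(0),
        });
        if let Some(p) = parent {
            self.nodes[p].children.push(id);
            self.invalidate_upwards(p);
        }
        id
    }

    /// Reset the cache of `start` and of every ancestor up to the root.
    fn invalidate_upwards(&self, start: usize) {
        let mut current = Some(start);
        while let Some(id) = current {
            self.nodes[id].hash_cache.store(0, Ordering::Relaxed);
            current = self.nodes[id].parent;
        }
    }

    /// Return the cached hash, or compute, store and return it.
    fn structural_hash(&self, id: usize) -> u64 {
        let node = &self.nodes[id];
        let cached = node.hash_cache.load(Ordering::Relaxed);
        if cached != 0 {
            return cached;
        }
        self.recomputed.fetch_add(1, Ordering::Relaxed);
        let mut h = DefaultHasher::new();
        node.key.hash(&mut h);
        for &child in &node.children {
            self.structural_hash(child).hash(&mut h);
        }
        let value = h.finish().max(1); // keep 0 reserved for "not cached"
        node.hash_cache.store(value, Ordering::Relaxed);
        value
    }
}

fn main() {
    let mut t = Tree::new();
    let root = t.add("root", None);
    let class = t.add("class", Some(root));
    t.add("expver", Some(class));
    let first = t.structural_hash(root);
    let again = t.structural_hash(root); // served from the cache
    println!("{first} == {again}, computations so far: {}", t.recomputed.load(Ordering::Relaxed));
    t.add("param", Some(class)); // invalidates `class` and `root` only
    t.structural_hash(root);
    println!("computations after the change: {}", t.recomputed.load(Ordering::Relaxed));
}
```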
The append / Union Workflow
When two Qubes are merged via append:
- The two root nodes are paired and node_merge is called recursively.
- At each level, children are grouped by dimension and the internal set operation produces the three partitions (just_A, intersection, just_B).
- For the intersection partition, new nodes are created and children are recursively merged.
- For just_B partitions, subtrees are copied from the other Qube into self.
- After all merging is complete, compress() is called to re-compress the result.
The append_many method optimizes merging many Qubes by batching: it performs intermediate compression every 500 Qubes to prevent unbounded tree growth.
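The batching idea does not depend on the tree itself. The generic sketch below merges items one by one, compresses after every `batch` merges, and compresses once more at the end if any merges are pending. `append_many_batched` is a hypothetical name, not the qubed method. In the usage example, small `Vec<u32>` values stand in for Qubes: merging appends, and compressing sorts and de-duplicates.

```rust
/// Merge every item of `others` into `base`, running `compress` after each
/// full batch of `batch` merges and once more for any remainder.
/// Returns how many times `compress` ran. (Hypothetical helper.)
fn append_many_batched<T>(
    base: &mut T,
    others: Vec<T>,
    batch: usize,
    mut merge: impl FnMut(&mut T, T),
    mut compress: impl FnMut(&mut T),
) -> usize {
    assert!(batch > 0, "batch size must be positive");
    let mut compressions = 0;
    let mut pending = 0;
    for other in others {
        merge(&mut *base, other);
        pending += 1;
        if pending == batch {
            compress(&mut *base);
            compressions += 1;
            pending = 0;
        }
    }
    if pending > 0 {
        compress(&mut *base);
        compressions += 1;
    }
    compressions
}

fn main() {
    // Stand-in "qubes": periodic compression keeps `base` from growing unboundedly.
    let others: Vec<Vec<u32>> = (0..1200).map(|i| vec![i % 7, 42]).collect();
    let mut base: Vec<u32> = Vec::new();
    let runs = append_many_batched(
        &mut base,
        others,
        500,
        |b, o| b.extend(o),
        |b| {
            b.sort_unstable();
            b.dedup();
        },
    );
    println!("compressed {runs} times; result = {:?}", base);
}
```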
Quickstart
This chapter walks through building, manipulating, and querying Qubes using the Rust API. For the equivalent Python API, see the Python bindings chapter.
Creating a Qube
From ASCII Representation
The most readable way to build a Qube is from its ASCII tree representation:
#![allow(unused)]
fn main() {
use qubed::Qube;
let q = Qube::from_ascii(r#"root
├── class=od
│   ├── expver=0001
│   │   ├── param=1
│   │   └── param=2
│   └── expver=0002
│       ├── param=1
│       └── param=2
└── class=rd
    ├── expver=0001
    │   ├── param=1
    │   ├── param=2
    │   └── param=3
    └── expver=0002
        ├── param=1
        └── param=2"#).unwrap();
println!("{}", q.to_ascii());
}
Each line holds one or more key=value pairs. Pairs separated by commas form a chain of nested nodes on a single line, and multiple values for one key are separated by /:
root
└── class=od, expver=0001/0002, param=1/2
From Nested JSON
You can also build a Qube from a JSON object where keys are "dimension=values" strings:
#![allow(unused)]
fn main() {
use qubed::Qube;
use serde_json::json;
let q = Qube::from_json(json!({
"class=od": {
"expver=0001/0002": {
"param=1/2": {}
}
},
"class=rd": {
"expver=0001": { "param=1/2/3": {} },
"expver=0002": { "param=1/2": {} }
}
})).unwrap();
}
Programmatically
Build a Qube node by node:
#![allow(unused)]
fn main() {
use qubed::{Qube, Coordinates};
let mut q = Qube::new();
let root = q.root();
// Create coordinate values
let class_coords = Coordinates::from_string("od");
let child = q.create_child("class", root, Some(class_coords)).unwrap();
let exp_coords = Coordinates::from_string("0001/0002");
let exp = q.create_child("expver", child, Some(exp_coords)).unwrap();
let param_coords = Coordinates::from_string("1/2");
q.create_child("param", exp, Some(param_coords)).unwrap();
}
From a Datacube
Build a Qube from a Datacube (a flat map of dimensions to coordinates):
#![allow(unused)]
fn main() {
use qubed::{Datacube, Qube, Coordinates};
let mut dc = Datacube::new();
dc.add_coordinate("class", Coordinates::from_string("od/rd"));
dc.add_coordinate("expver", Coordinates::from_string("0001/0002"));
dc.add_coordinate("param", Coordinates::from_string("1/2"));
let order = vec!["class".to_string(), "expver".to_string(), "param".to_string()];
let q = Qube::from_datacube(&dc, Some(&order));
}
Compression
Compression merges sibling nodes with identical subtree structure. This is the defining operation of Qubed — it keeps trees compact without losing information.
#![allow(unused)]
fn main() {
use qubed::Qube;
let mut q = Qube::from_ascii(r#"root
├── class=od
│   ├── expver=0001
│   │   ├── param=1
│   │   └── param=2
│   └── expver=0002
│       ├── param=1
│       └── param=2
└── class=rd
    ├── expver=0001
    │   ├── param=1
    │   └── param=2
    └── expver=0002
        ├── param=1
        └── param=2"#).unwrap();
q.compress();
println!("{}", q.to_ascii());
// root
// └── class=od/rd, expver=0001/0002, param=1/2
}
After compression, the number of leaf identifiers is preserved but the tree has far fewer nodes.
Selection
Select a subset of the tree by providing dimension constraints:
#![allow(unused)]
fn main() {
use qubed::{Coordinates, Qube};
use qubed::select::SelectMode;
let q = Qube::from_ascii(r#"root
├── class=od
│   ├── expver=0001
│   │   ├── param=1
│   │   └── param=2
│   └── expver=0002
│       ├── param=1
│       └── param=2
└── class=rd
    ├── expver=0001
    │   ├── param=1
    │   ├── param=2
    │   └── param=3
    └── expver=0002
        ├── param=1
        └── param=2"#).unwrap();
// Select only class=od, param=1. The class values are strings, so they are
// given as Coordinates rather than as an integer slice such as &[1].
let selection = [
    ("class", Coordinates::from_string("od")),
    ("param", Coordinates::from_string("1")),
];
let result = q.select(&selection, SelectMode::Default).unwrap();
println!("{}", result.to_ascii());
}
SelectMode::Prune additionally removes branches that don’t contain all selected dimensions.
Union (Append)
Merge two Qubes together. The result contains all identifiers from both:
#![allow(unused)]
fn main() {
use qubed::Qube;
let mut a = Qube::from_ascii(r#"root
└── class=od, expver=0001, param=1/2"#).unwrap();
let mut b = Qube::from_ascii(r#"root
└── class=rd, expver=0002, param=3/4"#).unwrap();
a.append(&mut b);
// b is now empty; a contains the union, automatically compressed
println!("{}", a.to_ascii());
}
For merging many Qubes at once, append_many is more efficient — it performs intermediate compression every 500 merges:
#![allow(unused)]
fn main() {
use qubed::Qube;
let mut base = Qube::new();
let mut others: Vec<Qube> = vec![/* ... */];
base.append_many(&mut others);
}
Iteration
Datacubes
Decompose the Qube back into individual dense datacubes. Each one is a Datacube, a flat map from dimension name to Coordinates (a HashMap<String, Coordinates>):
#![allow(unused)]
fn main() {
let datacubes = q.to_datacubes();
for dc in &datacubes {
for (dim, coords) in dc.coordinates() {
println!(" {} = {}", dim, coords.to_string());
}
}
}
Leaf Count
Get the number of individual identifiers without expanding:
#![allow(unused)]
fn main() {
let count = q.datacube_count();
println!("This qube contains {} identifiers", count);
}
Serialization
ASCII
Human-readable tree format, useful for debugging and display:
#![allow(unused)]
fn main() {
let ascii = q.to_ascii();
let roundtrip = Qube::from_ascii(&ascii).unwrap();
}
Nested JSON
Keys are "dimension=values" strings, values are child objects:
#![allow(unused)]
fn main() {
let json_val = q.to_json();
let json_str = serde_json::to_string_pretty(&json_val).unwrap();
let roundtrip = Qube::from_json(json_val).unwrap();
}
Arena JSON
A flat BFS array of node records — more suitable for programmatic consumption and web transport:
#![allow(unused)]
fn main() {
let arena = q.to_arena_json();
// Each entry: { "dim": "class", "coords": "od/rd", "parent": 0, "children": [1, 2] }
let restored = Qube::from_arena_json(arena).unwrap();
}
Each node in the array contains:
- dim: dimension name (e.g. "class")
- coords: coordinate values as a /-separated string
- parent: index of the parent node (or null for the root)
- children: array of child node indices
Coordinate Types
The Coordinates enum supports multiple value types and automatically categorizes them:
| Variant | Stores | Example |
|---|---|---|
| Empty | No values | (default for root) |
| Integers | Sorted i32 values | 1/2/3 |
| Floats | Sorted f64 values | 0.1/0.5 |
| Strings | Sorted string values | od/rd |
| Mixed | Combination of above | 1/od/0.5 |
Leading Zero Preservation
Values with leading zeros (like "0001") are preserved as strings rather than parsed as integers. The detection logic: if a token has length > 1, starts with '0', and the second character is a digit, it’s stored as a String.
#![allow(unused)]
fn main() {
use qubed::Coordinates;
let coords = Coordinates::from_string("0001/0002");
assert_eq!(coords.to_string(), "0001/0002"); // NOT "1/2"
}
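The detection rule, followed by the fallback order used by the adapters (i32, then f64, then String), can be sketched as a standalone function. The `Coord` enum and the `has_leading_zero` and `classify` functions are hypothetical illustrations, not qubed's `Coordinates` type.

```rust
/// Hypothetical coordinate value, for illustration only.
#[derive(Debug, PartialEq)]
enum Coord {
    Int(i32),
    Float(f64),
    Str(String),
}

/// True for tokens like "0001": longer than one character, starting with
/// '0', and followed by another digit.
fn has_leading_zero(token: &str) -> bool {
    let bytes = token.as_bytes();
    bytes.len() > 1 && bytes[0] == b'0' && bytes[1].is_ascii_digit()
}

/// Keep leading-zero tokens as strings; otherwise try i32, then f64,
/// and fall back to a string.
fn classify(token: &str) -> Coord {
    if has_leading_zero(token) {
        Coord::Str(token.to_string())
    } else if let Ok(i) = token.parse::<i32>() {
        Coord::Int(i)
    } else if let Ok(f) = token.parse::<f64>() {
        Coord::Float(f)
    } else {
        Coord::Str(token.to_string())
    }
}

fn main() {
    for token in ["0001", "0042", "42", "0", "0.5", "od"] {
        println!("{token:>5} -> {:?}", classify(token));
    }
}
```

Note that "0.5" is not caught by the leading-zero rule, because its second character is '.', so it still parses as a float.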
Operations on Coordinates
#![allow(unused)]
fn main() {
use qubed::Coordinates;
// Append values
let mut c = Coordinates::new();
c.append(1_i32);
c.append(2_i32);
// Extend from another
let other = Coordinates::from_string("3/4");
c.extend(&other);
// Intersect
let a = Coordinates::from_string("1/2/3");
let b = Coordinates::from_string("2/3/4");
let result = a.intersect(&b);
// result.intersection = [2, 3]
// result.only_a = [1]
// result.only_b = [4]
}
Tree Inspection
#![allow(unused)]
fn main() {
// Check if a qube has any content
let is_empty = q.is_empty();
// Get the number of leaf datacubes
let n = q.datacube_count();
// Get all unique dimension→coordinates pairs across the entire tree
let all = q.all_unique_dim_coords();
for (dim, coords) in &all {
println!("{}: {}", dim, coords.to_string());
}
}
Node Navigation
Access individual nodes via NodeRef:
#![allow(unused)]
fn main() {
let root = q.root();
let root_node = q.node(root).unwrap();
// Dimension name
let dim = root_node.dimension(); // Some("root")
// Coordinates
let coords = root_node.coordinates();
// Iterate children
for child_id in root_node.all_children() {
let child = q.node(child_id).unwrap();
println!("{} = {}", child.dimension().unwrap(), child.coordinates().to_string());
}
// Get all dimensions in subtree
let dims = root_node.span();
// Walk ancestors
for ancestor_id in root_node.ancestors() {
// ...
}
}
qubed — Core Library
The qubed crate provides the core Qube data structure, Coordinates types, compression, selection, serialization, and datacube conversion.
Cargo.toml:
[dependencies]
qubed = { path = "qubed" }
Qube
The central type. A Qube is a slot-map-backed tree where each node has a dimension name, a set of coordinate values, and children grouped by dimension.
Construction
| Method | Signature | Description |
|---|---|---|
| new | fn new() -> Qube | Create an empty Qube with just a root node |
| from_ascii | fn from_ascii(input: &str) -> Result<Qube, String> | Parse an ASCII tree representation |
| from_json | fn from_json(value: Value) -> Result<Qube, String> | Parse a nested JSON object |
| from_arena_json | fn from_arena_json(value: Value) -> Result<Qube, String> | Parse a BFS flat-array JSON layout |
| from_datacube | fn from_datacube(dc: &Datacube, order: Option<&[String]>) -> Qube | Build from a flat datacube with optional dimension ordering |
Example — from ASCII:
#![allow(unused)]
fn main() {
use qubed::Qube;
let q = Qube::from_ascii(r#"root
├── class=od
│   ├── expver=0001, param=1/2
│   └── expver=0002, param=1/2
└── class=rd
    ├── expver=0001, param=1/2/3
    └── expver=0002, param=1/2"#).unwrap();
}
Example — from nested JSON:
#![allow(unused)]
fn main() {
use qubed::Qube;
use serde_json::json;
let q = Qube::from_json(json!({
"class=od": {
"expver=0001/0002": { "param=1/2": {} }
},
"class=rd": {
"expver=0001": { "param=1/2/3": {} },
"expver=0002": { "param=1/2": {} }
}
})).unwrap();
}
Tree Modification
| Method | Signature | Description |
|---|---|---|
| create_child | fn create_child(&mut self, key: &str, parent: NodeIdx, coords: Option<Coordinates>) -> Result<NodeIdx, String> | Create a child node. Returns existing node if an identical child already exists. |
| get_or_create_child | fn get_or_create_child(&mut self, key: &str, parent_id: NodeIdx, coordinates: Option<Coordinates>) -> Result<NodeIdx, String> | Return the existing child with the given dimension+coordinates, or create a new one. |
| check_if_new_child | fn check_if_new_child(&mut self, key: &str, parent_id: NodeIdx, coordinates: Option<Coordinates>) -> Result<bool, String> | Return true if no child with the given dimension+coordinates exists yet. |
| remove_node | fn remove_node(&mut self, id: NodeIdx) -> Result<(), String> | Remove a node and all its descendants |
| append | fn append(&mut self, other: &mut Qube) | Union: merge other into self, compress, then clear other |
| append_many | fn append_many(&mut self, others: &mut Vec<Qube>) | Merge many Qubes with periodic compression (every 500) |
| append_datacube | fn append_datacube(&mut self, dc: Datacube, order: Option<&[String]>, accept_existing_order: bool) | Append a single Datacube |
| drop | fn drop<I>(&mut self, to_drop: I) -> Result<(), String> | Remove one or more dimensions, re-parenting their children, then compress |
| squeeze | fn squeeze(&mut self) -> Result<(), String> | Drop every dimension whose union of values has length 1 |
Example — building programmatically:
#![allow(unused)]
fn main() {
use qubed::{Qube, Coordinates};
let mut q = Qube::new();
let root = q.root();
let class = q.create_child("class", root,
Some(Coordinates::from_string("od"))).unwrap();
let expver = q.create_child("expver", class,
Some(Coordinates::from_string("0001/0002"))).unwrap();
q.create_child("param", expver,
Some(Coordinates::from_string("1/2"))).unwrap();
}
Example — union:
#![allow(unused)]
fn main() {
use qubed::Qube;
let mut a = Qube::from_ascii("root\n└── class=od, param=1").unwrap();
let mut b = Qube::from_ascii("root\n└── class=rd, param=2").unwrap();
a.append(&mut b);
// a now contains both branches, compressed; b is empty
}
Example — drop:
#![allow(unused)]
fn main() {
use qubed::Qube;
let mut q = Qube::from_ascii(r#"root
└── class=1
    ├── expver=0001
    │   └── param=1/2
    └── expver=0002
        └── param=1/2"#).unwrap();
q.drop(vec!["expver"]).unwrap();
// expver is removed; param nodes are re-parented under class
}
Example — squeeze:
#![allow(unused)]
fn main() {
use qubed::Qube;
let mut q = Qube::from_ascii(r#"root
└── class=1
    ├── expver=0001
    │   └── param=1/2
    └── expver=0002
        └── param=1/2"#).unwrap();
q.squeeze().unwrap();
// class=1 is the only value for that dimension, so it is dropped
}
Compression
fn compress(&mut self)
Compress the tree in-place. Three phases:
- Recursive merge: bottom-up, siblings with the same structural hash have their coordinates merged.
- Prune: nodes with Coordinates::Empty are removed.
- Dedup: structurally identical siblings are collapsed.
Called automatically by append and append_many.
Selection
fn select<C>(&self, selection: &[(&str, C)], mode: SelectMode) -> Result<Qube, String>
    where C: Into<Coordinates> + Clone
Returns a new Qube containing only identifiers matching the constraints. C can be &[i32], Coordinates, or other Into<Coordinates> types.
fn prune(&mut self, node_id: NodeIdx, has_none_of: HashSet<&str>)
Remove branches that don’t contain all of the specified dimensions.
SelectMode:
- Default: keep branches with at least one matching value per constrained dimension.
- Prune: additionally remove branches missing any selected dimension entirely.
Serialization
| Method | Returns | Format |
|---|---|---|
| to_ascii() | String | Human-readable tree with ├──/└── connectors |
| to_json() | Value | Nested JSON: { "key=values": { children } } |
| to_arena_json() | Value | BFS flat array: [{ dim, coords, parent, children }] |
Arena JSON node record:
{ "dim": "class", "coords": "od/rd", "parent": null, "children": [1, 2] }
Iteration & Inspection
| Method | Signature | Description |
|---|---|---|
| to_datacubes | fn to_datacubes(&self) -> Vec<Datacube> | Decompose into leaf-path datacubes |
| datacube_count | fn datacube_count(&self) -> usize | Count leaf identifiers without expansion |
| is_empty | fn is_empty(&self) -> bool | True if root has no children and no coordinates |
| all_unique_dim_coords | fn all_unique_dim_coords(&mut self) -> BTreeMap<String, Coordinates> | Union of all coordinates per dimension |
| root | fn root(&self) -> NodeIdx | Root node index |
| node | fn node(&self, id: NodeIdx) -> Option<NodeRef> | Read-only reference to a node |
| dimension | fn dimension(&self, s: &str) -> Option<Dimension> | Look up dimension by name |
| dimension_str | fn dimension_str(&self, d: &Dimension) -> Option<&str> | Get dimension name string |
NodeRef
Read-only reference to a node in the Qube tree.
| Method | Returns | Description |
|---|---|---|
| id() | NodeIdx | Slot-map key |
| dimension() | Option<&str> | Dimension name (e.g. "class") |
| coordinates() | &Coordinates | Coordinate values |
| child_dimensions() | impl Iterator<Item = &Dimension> | Distinct child dimension keys |
| children(key) | Option<impl Iterator<Item = NodeIdx>> | Children under a specific dimension |
| all_children() | impl Iterator<Item = NodeIdx> | All children across all dimensions |
| children_count() | usize | Total direct children |
| coordinates_count() | usize | Number of coordinate values |
| parent() | Option<NodeIdx> | Parent index |
| parent_node() | Option<NodeRef> | Parent as NodeRef |
| ancestors() | impl Iterator<Item = NodeIdx> | Walk up to root |
| span() | HashSet<Dimension> | All unique dimensions in subtree |
| structural_hash() | Option<u64> | Cached structural hash |
Coordinates
A typed, ordered set of coordinate values.
Variants
| Variant | Storage | Example |
|---|---|---|
| Empty | (none) | Default for root |
| Integers | Sorted i32 | 1/2/3 |
| Floats | Sorted f64 | 0.1/0.5 |
| Strings | Sorted String | od/rd |
| Mixed | All three | 1/od/0.5 |
Construction
| Method | Description |
|---|---|
| Coordinates::new() | Empty coordinates |
| Coordinates::from_string(s) | Parse a \|-separated string (also handles / in ASCII context) |
| From<i32>, From<f64>, From<String> | Single-value construction |
| FromIterator<i32>, FromIterator<f64>, FromIterator<String> | Build from iterators |
Leading zero preservation: tokens with length > 1 that start with '0' followed by a digit are stored as String to preserve formatting (e.g. "0001" stays "0001", not 1).
Modification
| Method | Description |
|---|---|
| append(value) | Add a single value; auto-promotes to Mixed if types differ |
| extend(&other) | Merge values from another Coordinates |
| extend_from_iter(iter) | Extend from an iterator |
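Auto-promotion can be pictured with a simplified enum. In this sketch the `Coords` and `Value` types and the `push_new` helper are hypothetical. Floats are omitted for brevity, and `Mixed` is modelled as a single `Vec<Value>`, which may differ from how qubed stores its Mixed variant.

```rust
/// Hypothetical single value (floats omitted for brevity).
#[derive(Debug, PartialEq)]
enum Value {
    Int(i32),
    Str(String),
}

/// Hypothetical, simplified coordinate set.
#[derive(Debug, PartialEq)]
enum Coords {
    Empty,
    Integers(Vec<i32>),
    Strings(Vec<String>),
    Mixed(Vec<Value>),
}

/// Append `v` unless it is already present.
fn push_new(mut values: Vec<Value>, v: Value) -> Vec<Value> {
    if !values.contains(&v) {
        values.push(v);
    }
    values
}

impl Coords {
    /// Add one value; if its type differs from the stored values, promote
    /// the whole set to `Mixed`.
    fn append(&mut self, v: Value) {
        // Move the current contents out so they can be rebuilt by value.
        let current = std::mem::replace(self, Coords::Empty);
        *self = match (current, v) {
            (Coords::Empty, Value::Int(i)) => Coords::Integers(vec![i]),
            (Coords::Empty, Value::Str(s)) => Coords::Strings(vec![s]),
            (Coords::Integers(mut xs), Value::Int(i)) => {
                if let Err(pos) = xs.binary_search(&i) {
                    xs.insert(pos, i); // keep sorted, skip duplicates
                }
                Coords::Integers(xs)
            }
            (Coords::Strings(mut xs), Value::Str(s)) => {
                if let Err(pos) = xs.binary_search(&s) {
                    xs.insert(pos, s);
                }
                Coords::Strings(xs)
            }
            // Type mismatch: promote the existing values, then add the new one.
            (Coords::Integers(xs), v) => Coords::Mixed(push_new(xs.into_iter().map(Value::Int).collect(), v)),
            (Coords::Strings(xs), v) => Coords::Mixed(push_new(xs.into_iter().map(Value::Str).collect(), v)),
            (Coords::Mixed(m), v) => Coords::Mixed(push_new(m, v)),
        };
    }
}

fn main() {
    let mut c = Coords::Empty;
    c.append(Value::Int(2));
    c.append(Value::Int(1));
    println!("{:?}", c); // Integers([1, 2])
    c.append(Value::Str("od".to_string()));
    println!("{:?}", c); // Mixed([Int(1), Int(2), Str("od")])
}
```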
Query
| Method | Description |
|---|---|
| to_string() | /-separated string |
| len() | Value count |
| is_empty() | True if no values |
| contains(value) | Membership check (integers only currently) |
Set Operations
| Method | Description |
|---|---|
| intersect(&other) | Returns IntersectionResult { intersection, only_a, only_b } |
| merge_coords(&other) | Union (intersection + only_a + only_b combined) |
Datacube
A flat HashMap<String, Coordinates> representing one dense datacube.
| Method | Description |
|---|---|
| new() | Create empty |
| add_coordinate(dim, coords) | Add a dimension |
| coordinates() | Access the map |
| is_empty() / len() | Check dimensions |
Key Types Summary
| Type | Description |
|---|---|
| NodeIdx | SlotMap key for node identity |
| Dimension | Interned string key (MiniSpur from lasso) |
| IntersectionResult<T> | { intersection, only_a, only_b } |
| SelectMode | Default or Prune |
| CoordinateTypes | Integer(i32), Float(f64), String(String) |
qubed-meteo — Adapters
The qubed-meteo crate provides domain-specific adapters for ingesting meteorological metadata into Qubes and exporting them to external formats.
Cargo.toml:
[dependencies]
qubed-meteo = { path = "qubed-meteo" }
FromMARSList — MARS List Parser
Trait: qubed_meteo::adapters::mars_list::FromMARSList
fn from_mars_list(mars_list: &str) -> Result<Qube, String>
Parses indentation-based MARS list text into a Qube. This is the format produced by ECMWF’s MARS listing tools, where indentation indicates parent-child relationships.
Input Format
class=od, expver=0001
    param=1/2
    param=3
class=rd, expver=0002
    param=4
- Lines are split by commas into tokens of the form key=value.
- Indentation determines hierarchy: indented lines are children of the preceding less-indented line.
- Slash-separated values (e.g. param=1/2) become multiple coordinate values.
- The resulting tree is automatically compressed.
Parsing Rules
- If a line has deeper indentation than the previous line, its tokens become a chain under the last node of the previous line.
- If a line has equal or shallower indentation, it chains under the nearest ancestor in the indentation stack.
- Values with leading zeros (e.g. 0001) are preserved as strings.
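The indentation-stack rules above can be sketched as follows. `parse_indented` is a hypothetical helper, not the qubed_meteo implementation. It returns each root-to-leaf chain of key=value pairs instead of building a Qube, measures indentation as the number of leading whitespace bytes, and leaves slash-separated values unsplit.

```rust
/// A `key=value` pair as written in the listing (values left unsplit).
type Pair = (String, String);

/// Hypothetical parser: returns every root-to-leaf chain of pairs.
fn parse_indented(text: &str) -> Vec<Vec<Pair>> {
    let mut chains: Vec<Vec<Pair>> = Vec::new();
    let mut has_child: Vec<bool> = Vec::new();
    // (indentation, index into `chains`) for the current line's ancestors.
    let mut stack: Vec<(usize, usize)> = Vec::new();

    for line in text.lines() {
        if line.trim().is_empty() {
            continue;
        }
        let indent = line.len() - line.trim_start().len();
        // Pop entries that are not strictly shallower than this line.
        while matches!(stack.last(), Some(&(depth, _)) if depth >= indent) {
            stack.pop();
        }
        // Start from the parent's chain (if any) and append this line's tokens.
        let mut chain = match stack.last() {
            Some(&(_, parent)) => {
                has_child[parent] = true;
                chains[parent].clone()
            }
            None => Vec::new(),
        };
        for token in line.trim().split(',') {
            if let Some((key, value)) = token.trim().split_once('=') {
                chain.push((key.to_string(), value.to_string()));
            }
        }
        stack.push((indent, chains.len()));
        chains.push(chain);
        has_child.push(false);
    }
    // Only lines without indented children are leaves.
    chains
        .into_iter()
        .zip(has_child)
        .filter(|(_, has)| !*has)
        .map(|(chain, _)| chain)
        .collect()
}

fn main() {
    let text = "class=od, expver=0001\n    param=1/2\n    param=3\nclass=rd, expver=0002\n    param=4";
    for chain in parse_indented(text) {
        let parts: Vec<String> = chain.iter().map(|(k, v)| format!("{k}={v}")).collect();
        println!("{}", parts.join(", "));
    }
}
```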
Example
#![allow(unused)]
fn main() {
use qubed::Qube;
use qubed_meteo::adapters::mars_list::FromMARSList;
let mars_text = "class=od, expver=0001\n param=1/2\nclass=rd, expver=0002\n param=3/4";
let qube = Qube::from_mars_list(mars_text).unwrap();
println!("{}", qube.to_ascii());
}
FromFDBList — FDB Path Parser
Trait: qubed_meteo::adapters::fdb::FromFDBList
fn from_fdb_list(request_map: &serde_json::Value) -> Result<Qube, String>
Builds a Qube from FDB-style comma-separated path strings, as produced by the rsfdb listing tools.
Input Format
Each item is a comma-separated sequence of key=value segments:
class=od,expver=0001,param=1/2
class=rd,expver=0003,param=3/4
- Each segment’s values can be slash-separated for multiple coordinates.
- Segments without = become dimension-only nodes (no coordinates).
- Values with leading zeros are preserved as strings.
- The resulting tree is automatically compressed.
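The sketch below shows how one path string might be split into segments under these rules. `Segment` and `parse_fdb_path` are hypothetical names. The real adapter also builds and compresses the tree, which is omitted here.

```rust
/// One parsed path segment (hypothetical type). `values` is empty for a
/// dimension-only segment written without `=`.
#[derive(Debug, PartialEq)]
struct Segment {
    key: String,
    values: Vec<String>,
}

/// Split a comma-separated FDB-style path into segments. Slash-separated
/// values become multiple coordinates, and leading zeros survive because
/// values are kept as strings here.
fn parse_fdb_path(line: &str) -> Vec<Segment> {
    line.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|seg| match seg.split_once('=') {
            Some((key, values)) => Segment {
                key: key.to_string(),
                values: values.split('/').map(String::from).collect(),
            },
            None => Segment { key: seg.to_string(), values: Vec::new() },
        })
        .collect()
}

fn main() {
    for line in ["class=od,expver=0001,param=1/2", "class=rd,expver=0003,param=3/4"] {
        println!("{:?}", parse_fdb_path(line));
    }
}
```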
Example
#![allow(unused)]
fn main() {
use qubed::Qube;
use qubed_meteo::adapters::fdb::FromFDBList;
use serde_json::json;
let request_map = json!({
"class" : "od",
"expver" : "0001",
"stream" : "oper",
"time" : "0000",
"domain" : "g",
"levtype" : "sfc",
});
let qube = Qube::from_fdb_list(&request_map).unwrap();
println!("{}", qube.to_ascii());
}
ToDssConstraints — DSS Constraints Exporter
Trait: qubed_meteo::adapters::to_constraints::ToDssConstraints
fn to_dss_constraints(&self) -> serde_json::Value
Converts a Qube into a JSON array of constraint objects, one per leaf-path datacube. Every object contains the same set of dimension keys (the union across all datacubes); dimensions not present in a particular datacube get an empty array.
Output Format
[
{
"class": ["od"],
"expver": ["0001", "0002"],
"param": ["1", "2"]
},
{
"class": ["rd"],
"expver": ["0003"],
"param": ["3", "4"]
}
]
- The "root" dimension is excluded from the output.
- Coordinate values are serialized as string arrays (split on /).
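The padding rule (every object carries the union of dimension keys, with empty arrays for missing ones) can be sketched with standard-library maps. `to_constraints` is a hypothetical helper. It takes leaf-path datacubes as (dimension, "/"-joined values) pairs and returns BTreeMaps in place of serde_json values. The sample data in the usage below is made up for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical exporter: one constraint map per leaf-path datacube, each
/// padded to the union of all dimension names ("root" excluded).
fn to_constraints(datacubes: &[Vec<(&str, &str)>]) -> Vec<BTreeMap<String, Vec<String>>> {
    let all_dims: BTreeSet<&str> = datacubes
        .iter()
        .flat_map(|dc| dc.iter().map(|&(dim, _)| dim))
        .filter(|&dim| dim != "root")
        .collect();

    datacubes
        .iter()
        .map(|dc| {
            let row: BTreeMap<String, Vec<String>> = all_dims
                .iter()
                .map(|&dim| {
                    // Split "a/b" into ["a", "b"]; a missing dimension gets [].
                    let values: Vec<String> = dc
                        .iter()
                        .find(|&&(d, _)| d == dim)
                        .map(|&(_, v)| v.split('/').map(String::from).collect())
                        .unwrap_or_default();
                    (dim.to_string(), values)
                })
                .collect();
            row
        })
        .collect()
}

fn main() {
    let datacubes = vec![
        vec![("root", ""), ("class", "od"), ("expver", "0001/0002"), ("param", "1/2")],
        vec![("root", ""), ("class", "rd"), ("param", "3/4")],
    ];
    for row in to_constraints(&datacubes) {
        println!("{:?}", row);
    }
}
```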
Example
#![allow(unused)]
fn main() {
use qubed::Qube;
use qubed_meteo::adapters::to_constraints::ToDssConstraints;
let q = Qube::from_ascii(r#"root
├── class=od, expver=0001/0002, param=1/2
└── class=rd, expver=0003, param=3/4"#).unwrap();
let constraints = q.to_dss_constraints();
println!("{}", serde_json::to_string_pretty(&constraints).unwrap());
}
FromDssConstraints — DSS Constraints Importer
Trait: qubed_meteo::adapters::dss_constraints::FromDssConstraints
fn from_dss_constraints(dss_constraints: &serde_json::Value) -> Result<Qube, String>
Rebuilds a Qube from DSS-style constraint JSON (array of maps). Each map in the array is parsed as a Datacube, then all datacubes are merged with append_many.
A built-in dimension ordering is applied (origin, forecast_type, hday, day, hmonth, hyear, year, month, time, leadtime_hour, level_type, variable) to produce a consistent tree structure.
Example
use qubed::Qube;
use qubed_meteo::adapters::dss_constraints::FromDssConstraints;
use serde_json::json;
fn main() {
    let constraints = json!([
        { "class": ["od"], "expver": ["0001"], "param": ["1", "2"] },
        { "class": ["rd"], "expver": ["0002"], "param": ["3"] }
    ]);
    let qube = Qube::from_dss_constraints(&constraints).unwrap();
    println!("{}", qube.to_ascii());
}
Leading Zero Preservation
All adapters use the same detection logic for preserving leading zeros:
if token.len() > 1
    && token.starts_with('0')
    && token.as_bytes()[1].is_ascii_digit()
then
    → store as String (e.g. "0001")
else
    → try parse as i32, then f64, then String
This ensures values like "0001" or "0042" round-trip correctly through serialization, while plain numbers like "1" or "42" are stored as integers.
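The rule above is easy to try out in Python. The following is a sketch of the described logic, not the adapters' Rust code. The name `parse_token` is hypothetical. The i32 range check mimics Rust's `parse::<i32>()` failing on out-of-range values.

```python
from __future__ import annotations


def parse_token(token: str) -> int | float | str:
    """Sketch of the leading-zero rule described above (not the library's code)."""
    # A multi-character token that starts with '0' followed by an ASCII digit
    # (e.g. "0001") is kept verbatim so its zeros survive.
    if len(token) > 1 and token[0] == "0" and token[1] in "0123456789":
        return token
    # Otherwise try an integer in i32 range, then a float, then fall back to str.
    try:
        value = int(token)
        if -(2**31) <= value <= 2**31 - 1:
            return value
    except ValueError:
        pass
    try:
        return float(token)
    except ValueError:
        return token


for t in ["0001", "0042", "1", "42", "0", "0.5", "od"]:
    print(repr(t), "->", repr(parse_token(t)))
```

Note that "0" (a single character) and "0.5" (second character is not a digit) still parse as numbers. Python's int() and float() also accept a few forms that Rust's parse rejects, such as surrounding whitespace and underscores like "1_000", so treat this as an approximation.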
py_qubed — Python Bindings
The py_qubed package exposes the core qubed Rust library to Python via PyO3. It provides the PyQube class (importable as qubed.PyQube) for building, manipulating, and serializing Qubes from Python.
Installation
cd py_qubed
maturin develop --release
Then in Python:
from qubed import PyQube
PyQube Class
Construction
PyQube()
Create an empty Qube.
q = PyQube()
PyQube.from_ascii(text: str) -> PyQube
Parse an ASCII tree representation:
q = PyQube.from_ascii("""root
├── class=od
│   └── expver=0001/0002, param=1/2
└── class=rd
    ├── expver=0001, param=1/2/3
    └── expver=0002, param=1/2""")
PyQube.from_datacube(datacube: dict[str, str], order: list[str] | None = None) -> PyQube
Build a Qube from a flat datacube dictionary. Each key is a dimension name and each value is a coordinate string (use / to specify multiple values for a dimension, e.g. "1/2/3").
The optional order list controls the nesting order of dimensions in the resulting tree — dimensions listed first become shallower levels. Any dimensions not in order are appended at deeper levels in an unspecified order. When order is None, all dimension ordering is unspecified.
This is the inverse of to_datacubes(): a single dict from that list can be passed back here to reconstruct a single-branch Qube.
# Single identifier
q = PyQube.from_datacube({"class": "od", "expver": "0001", "param": "1"}, ["class", "expver", "param"])
print(q)
# root
# └── class=od
#     └── expver=0001
#         └── param=1
# Multiple values on a dimension
q = PyQube.from_datacube({"class": "od", "param": "1/2/3"}, ["class", "param"])
print(q.all_unique_dim_coords())
# {'class': ['od'], 'param': ['1', '2', '3']}
# Roundtrip from to_datacubes
original = PyQube.from_ascii("root\n└── class=od, expver=0001, param=1")
for dc in original.to_datacubes():
    rebuilt = PyQube.from_datacube(dc, ["class", "expver", "param"])
    print(rebuilt)
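The role of `order` can be illustrated without the bindings. The sketch below uses two hypothetical helpers, `datacube_levels` and `render_chain`. It places unlisted dimensions in dict insertion order, whereas qubed itself leaves that order unspecified. It draws the tree in the style of the ASCII examples, not necessarily byte-for-byte like `to_ascii()`.

```python
def datacube_levels(datacube, order=None):
    """Hypothetical sketch: the (dimension, values) levels of a single-branch Qube."""
    order = order or []
    # Dimensions named in `order` come first, in that order (skipping any not present).
    dims = [d for d in order if d in datacube]
    # Remaining dimensions go deeper; qubed leaves their order unspecified,
    # so this sketch simply keeps dict insertion order.
    dims += [d for d in datacube if d not in dims]
    # A "/" in a coordinate string separates multiple values.
    return [(d, datacube[d].split("/")) for d in dims]


def render_chain(levels):
    """Draw the levels as a single-branch ASCII tree."""
    lines = ["root"]
    for depth, (dim, values) in enumerate(levels):
        lines.append("    " * depth + "└── " + f"{dim}={'/'.join(values)}")
    return "\n".join(lines)


levels = datacube_levels({"param": "1/2/3", "class": "od", "expver": "0001"}, ["class", "expver"])
print(render_chain(levels))
```

Here param is not listed in `order`, so it ends up as the deepest level.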
PyQube.from_arena_json(json_str: str) -> PyQube
Reconstruct a Qube from arena JSON (a flat BFS array produced by to_arena_json):
# q is an existing PyQube
arena_str = q.to_arena_json()
restored = PyQube.from_arena_json(arena_str)
Serialization
to_ascii() -> str
Return the human-readable ASCII tree representation:
print(q.to_ascii())
# root
# ├── class=od, expver=0001/0002, param=1/2
# └── class=rd
#     ├── expver=0001, param=1/2/3
#     └── expver=0002, param=1/2
Also available as str(q) (via __str__).
to_arena_json() -> str
Return a JSON string containing a flat BFS array of node records:
import json
arena = json.loads(q.to_arena_json())
for node in arena:
    print(node["dim"], node["coords"])
Each record: { "dim": "class", "coords": "od/rd", "parent": 0, "children": [1, 2] }
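The `children` indices are enough to recover every root-to-leaf path from the flat array. The sketch below does this in plain Python. The helper `arena_paths` is hypothetical. It assumes node 0 is the root, with dim "root" and a null parent. The arena in the example is hand-written in the documented record shape, not captured from real `to_arena_json()` output.

```python
import json


def arena_paths(arena):
    """Hypothetical sketch: root-to-leaf datacubes from an arena node list."""
    paths = []
    # Assumes node 0 is the root and "children" holds indices into the list.
    stack = [(0, {})]
    while stack:
        index, path = stack.pop()
        node = arena[index]
        if node["dim"] != "root":
            path = {**path, node["dim"]: node["coords"]}
        if not node["children"]:
            paths.append(path)
        # Push children in reverse so they are visited left to right.
        for child in reversed(node["children"]):
            stack.append((child, path))
    return paths


arena_str = json.dumps([
    {"dim": "root", "coords": "root", "parent": None, "children": [1, 2]},
    {"dim": "class", "coords": "od", "parent": 0, "children": [3]},
    {"dim": "class", "coords": "rd", "parent": 0, "children": [4]},
    {"dim": "param", "coords": "1/2", "parent": 1, "children": []},
    {"dim": "param", "coords": "3", "parent": 2, "children": []},
])
print(arena_paths(json.loads(arena_str)))
```

The result has the same shape as the dicts returned by to_datacubes().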
to_datacubes() -> list[dict]
Decompose into a list of datacube dictionaries. Each dict maps dimension names to coordinate strings:
for dc in q.to_datacubes():
    print(dc)
# {'class': 'od', 'expver': '0001/0002', 'param': '1/2'}
# {'class': 'rd', 'expver': '0001', 'param': '1/2/3'}
# ...
Merging
append(other: PyQube) -> None
Merge another Qube into this one. The result is automatically compressed. other becomes empty.
a = PyQube.from_ascii("root\n└── class=od, param=1")
b = PyQube.from_ascii("root\n└── class=rd, param=2")
a.append(b)
print(a)
append_many(others: list[PyQube]) -> None
Merge multiple Qubes at once:
base = PyQube()
qubes = [PyQube.from_ascii(f"root\n└── class=c{i}, param=1") for i in range(100)]
base.append_many(qubes)
append_datacube(datacube: dict[str, str], order: list[str] | None = None, accept_existing_order: bool = False) -> None
Merge a single flat datacube dictionary into this Qube in-place. This is a convenience wrapper around from_datacube + append: it constructs a temporary single-branch Qube from datacube and merges it, then compresses the result.
order controls the dimension nesting order of the new branch (see from_datacube). accept_existing_order is reserved for future use.
q = PyQube.from_ascii("""root
└── class=od
    └── expver=0001
        └── param=1""")
q.append_datacube({"class": "od", "expver": "0002", "param": "1"}, ["class", "expver", "param"])
print(q.all_unique_dim_coords())
# {'class': ['od'], 'expver': ['0001', '0002'], 'param': ['1']}
# Build a Qube incrementally from a list of datacube dicts
q = PyQube()
for dc in [{"class": "od", "param": "1"}, {"class": "rd", "param": "2"}]:
    q.append_datacube(dc, ["class", "param"])
print(q)
# root
# ├── class=od
# │   └── param=1
# └── class=rd
#     └── param=2
# (exact rendering may vary)
Manipulation
compress() -> None
Compress the Qube in-place. Merges structurally identical sibling nodes, removes empty nodes, and deduplicates. Called automatically by append and append_many.
q.compress()
drop(dims: list[str]) -> None
Remove one or more dimensions from the tree. Children of removed nodes are re-parented to the grandparent, preserving the rest of the structure. The result is automatically compressed.
q = PyQube.from_ascii("""root
└── class=1
    ├── expver=0001
    │   └── param=1/2
    └── expver=0002
        └── param=1/2""")
q.drop(["expver"])
print(q)
# root
# └── class=1
#     └── param=1/2
squeeze() -> None
Drop all dimensions that have only a single coordinate value. Equivalent to calling drop on every dimension whose union of values has length 1.
q = PyQube.from_ascii("""root
└── class=1
    ├── expver=0001
    │   └── param=1/2
    └── expver=0002
        └── param=1/2""")
q.squeeze()
print(q)
# root
# └── expver=0001/0002
#     └── param=1/2
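In terms of the identifiers a Qube represents, drop is a projection: the dimension is removed and duplicates are merged. Squeeze picks the dimensions whose union of values has a single element. The plain-Python sketch below models both on an explicit set of identifiers, each a frozenset of (dimension, value) pairs. The helper names are hypothetical. The real methods operate on tree nodes as described above, without expanding them.

```python
def drop_dims(identifiers, dims):
    """Project each identifier onto the remaining dimensions; the set merges duplicates."""
    return {
        frozenset((k, v) for k, v in ident if k not in dims)
        for ident in identifiers
    }


def squeeze_dims(identifiers):
    """Find dimensions whose union of values has exactly one element."""
    values = {}
    for ident in identifiers:
        for k, v in ident:
            values.setdefault(k, set()).add(v)
    return sorted(k for k, vs in values.items() if len(vs) == 1)


# The four identifiers represented by the example tree above.
identifiers = {
    frozenset({("class", "1"), ("expver", e), ("param", p)})
    for e in ("0001", "0002")
    for p in ("1", "2")
}
print(squeeze_dims(identifiers))
print(len(drop_dims(identifiers, ["expver"])))
```

Dropping expver collapses the four identifiers to two, which matches the single class=1, param=1/2 branch in the drop example. Squeezing drops only class and keeps all four identifiers distinct.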
Query
all_unique_dim_coords() -> dict[str, list[str]]
Return a dictionary mapping each dimension name to a list of all coordinate values that appear anywhere in the Qube.
coords = q.all_unique_dim_coords()
# {'class': ['1'], 'expver': ['0001', '0002'], 'param': ['1', '2']}
select(request: dict, mode: str | None, consume: bool | None) -> PyQube
Return a new Qube containing only the identifiers that satisfy the request. Each key in request is a dimension name; values may be a single string/int or a list.
mode controls behaviour for dimensions absent in a branch:
- None / any other string: the default; keep branches that have at least one matching value.
- "prune": additionally remove branches that are missing any requested dimension entirely.
selected = q.select({"class": [1], "param": [1, 2]}, None, None)
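The two modes can be modelled on explicit identifiers (dicts) in plain Python. This sketch is inferred from the description above. The name `select_identifiers` is hypothetical. The `consume` argument is not modelled, and the real method works on the compressed tree.

```python
def select_identifiers(identifiers, request, mode=None):
    """Hypothetical sketch of select() semantics on explicit identifiers (dicts)."""
    # Normalise requested values to sets of strings, so 1 and "1" match.
    wanted = {
        dim: {str(v) for v in (vals if isinstance(vals, list) else [vals])}
        for dim, vals in request.items()
    }
    selected = []
    for ident in identifiers:
        keep = True
        for dim, values in wanted.items():
            if dim in ident:
                keep = keep and ident[dim] in values
            elif mode == "prune":
                # "prune" also drops identifiers that lack a requested dimension.
                keep = False
        if keep:
            selected.append(ident)
    return selected


identifiers = [
    {"class": "1", "param": "1"},
    {"class": "1", "param": "3"},
    {"class": "2", "param": "1"},
    {"class": "1"},
]
print(select_identifiers(identifiers, {"class": [1], "param": [1, 2]}))
print(select_identifiers(identifiers, {"class": [1], "param": [1, 2]}, "prune"))
```

The identifier without a param survives the default mode but not "prune".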
Special Methods
| Method | Description |
|---|---|
| __str__() | Same as to_ascii() |
| __repr__() | Returns PyQube(root_id=...) |
| __len__() | Returns datacube_count(), the number of leaf identifiers |
q = PyQube.from_ascii("root\n├── class=od, param=1/2\n└── class=rd, param=3")
print(len(q)) # 3
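Each datacube is dense, so the identifier count is the sum over datacubes of the product of their coordinate counts. The sketch below computes this from to_datacubes()-style dicts. The helper `count_identifiers` is hypothetical. It assumes the datacubes do not overlap, as in the examples here, so nothing is double-counted.

```python
from math import prod


def count_identifiers(datacubes):
    """Sum, over datacubes, the product of the number of values in each dimension."""
    return sum(
        prod(len(coords.split("/")) for coords in dc.values())
        for dc in datacubes
    )


print(count_identifiers([{"class": "od", "param": "1/2"}, {"class": "rd", "param": "3"}]))
```

For the example above this gives 2 + 1 = 3, the same value as len(q).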
Complete Example
from qubed import PyQube
# Build from ASCII
q = PyQube.from_ascii("""root
├── class=od
│   └── expver=0001/0002, param=1/2
└── class=rd
    ├── expver=0001, param=1/2/3
    └── expver=0002, param=1/2""")
# Inspect
print(f"Identifiers: {len(q)}")
print(q)
# Decompose to datacubes
for dc in q.to_datacubes():
print(dc)
# Roundtrip through arena JSON
arena = q.to_arena_json()
restored = PyQube.from_arena_json(arena)
assert str(q) == str(restored)
# Merge two qubes
other = PyQube.from_ascii("root\n└── class=xd, expver=0001, param=99")
q.append(other)
print(q)
py_qubed_meteo — Python Adapters
The py_qubed_meteo package exposes the qubed-meteo adapter crate to Python via PyO3. It provides standalone functions for parsing MARS lists, FDB path lists, and converting Qubes to DSS constraint format.
Installation
cd py_qubed_meteo
maturin develop --release
Then in Python:
from qubed_meteo import from_mars_list_py, from_fdb_list_py, to_dss_constraints_py
Functions
from_mars_list_py(text: str) -> str
Parse MARS list text and return the resulting Qube as an ASCII string. The returned string can be passed to PyQube.from_ascii() to get a PyQube object.
Input format: Indentation-based MARS listing where indented lines are children of preceding less-indented lines. Tokens are comma-separated key=value pairs; values can be slash-separated.
from qubed_meteo import from_mars_list_py
from qubed import PyQube
mars_text = """class=od, expver=0001
    param=1/2
    param=3
class=rd, expver=0002
    param=4"""
ascii_tree = from_mars_list_py(mars_text)
q = PyQube.from_ascii(ascii_tree)
print(q)
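The following pure-Python sketch shows how the indentation drives nesting by flattening such a listing into one dict per leaf line. It is illustrative only. The function `mars_list_to_paths` is hypothetical, and it assumes space indentation with one path per leaf line. The real parser may treat tabs, blank lines, or repeated keys differently.

```python
def mars_list_to_paths(text):
    """Hypothetical sketch: flatten an indentation-based MARS listing into leaf paths."""
    lines = [line for line in text.splitlines() if line.strip()]

    def indent(line):
        return len(line) - len(line.lstrip())

    paths = []
    stack = []  # (indent, key/value dict accumulated from the ancestors)
    for i, line in enumerate(lines):
        depth = indent(line)
        # Leave any sibling or deeper branches that are now finished.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        merged = dict(stack[-1][1]) if stack else {}
        for token in line.split(","):
            key, _, value = token.strip().partition("=")
            merged[key] = value
        stack.append((depth, merged))
        # A line is a leaf when the next line is not indented further.
        next_depth = indent(lines[i + 1]) if i + 1 < len(lines) else -1
        if next_depth <= depth:
            paths.append(merged)
    return paths


mars_text = """class=od, expver=0001
    param=1/2
    param=3
class=rd, expver=0002
    param=4"""
for path in mars_list_to_paths(mars_text):
    print(path)
```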
from_fdb_list_py(items: list[str]) -> str
Build a Qube from a list of FDB-style comma-separated path strings. Returns an ASCII Qube string.
Each string is a comma-separated sequence of key=value segments (e.g. "class=od,expver=0001,param=1/2").
from qubed_meteo import from_fdb_list_py
from qubed import PyQube
items = [
"class=od,expver=0001,param=1/2",
"class=rd,expver=0003,param=3/4",
"class=rd,expver=0002,param=3/4",
]
ascii_tree = from_fdb_list_py(items)
q = PyQube.from_ascii(ascii_tree)
print(q)
# root
# ├── class=od
# │   └── expver=0001
# │       └── param=1/2
# └── class=rd
#     └── expver=0002/0003
#         └── param=3/4
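The rd branch ends up with expver=0002/0003 because its two expver subtrees are identical. This is the compression rule of merging sibling nodes with identical subtree structure. Below is a one-level sketch of that rule in plain Python. The function `merge_identical_siblings` is hypothetical and models subtrees as nested tuples. Sorting the merged values is an assumption chosen to match the output above.

```python
def merge_identical_siblings(children):
    """Hypothetical sketch: merge sibling values whose subtrees are identical.

    `children` maps one coordinate value to a hashable description of its subtree.
    """
    groups = {}
    for value, subtree in children.items():
        groups.setdefault(subtree, []).append(value)
    # One merged node per distinct subtree; values are joined with "/" in sorted order.
    return {"/".join(sorted(values)): subtree for subtree, values in groups.items()}


# The two expver branches under class=rd in the example above.
rd_children = {
    "0003": (("param", "3/4"),),
    "0002": (("param", "3/4"),),
}
print(merge_identical_siblings(rd_children))
```

Siblings whose subtrees differ, such as different param sets, are left as separate nodes.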
to_dss_constraints_py(ascii: str) -> str
Convert an ASCII Qube string to DSS-style constraints JSON. Returns a JSON string (array of maps).
from qubed_meteo import to_dss_constraints_py
import json
ascii_tree = """root
├── class=od, expver=0001/0002, param=1/2
└── class=rd, expver=0003, param=3/4"""
constraints_json = to_dss_constraints_py(ascii_tree)
constraints = json.loads(constraints_json)
for c in constraints:
    print(c)
# {'class': ['od'], 'expver': ['0001', '0002'], 'param': ['1', '2']}
# {'class': ['rd'], 'expver': ['0003'], 'param': ['3', '4']}
Each object in the array has the same set of dimension keys. Dimensions not present in a particular datacube get an empty array [].
Complete Workflow Example
from qubed import PyQube
from qubed_meteo import from_fdb_list_py, to_dss_constraints_py
import json
# 1. Ingest from FDB listing
fdb_items = [
"class=od,expver=0001,param=1/2",
"class=od,expver=0002,param=1/2",
"class=rd,expver=0001,param=1/2/3",
]
qube = PyQube.from_ascii(from_fdb_list_py(fdb_items))
print(f"Built qube with {len(qube)} identifiers")
print(qube)
# 2. Export to DSS constraints
constraints = json.loads(to_dss_constraints_py(str(qube)))
print(json.dumps(constraints, indent=2))
# 3. Merge with another qube
extra = PyQube.from_ascii("root\n└── class=xd, expver=0001, param=99")
qube.append(extra)
print(qube)
Notes
- All adapter functions return ASCII strings as a lightweight bridge format. Use PyQube.from_ascii() to convert them to a full PyQube object.
- Leading zeros in coordinate values (e.g. "0001") are preserved through all adapter functions.
- The functions raise ValueError on parse failures.