Under the Hood: Algorithms

This chapter explains the key algorithms that power Qubed: set operations on compressed trees and compression itself.

Set Operations

Qubes represent sets of identifiers, so the familiar set operations are all defined:

Operation	Rust method	Description
Union	`a.append(&mut b)`	All identifiers in A or B (or both)
Intersection	`select` with intersection logic	Identifiers in both A and B
Difference	`a.subtract(&b)`	Identifiers in A but not B
Symmetric difference	internal set operation	Identifiers in exactly one of A or B

How It Works

The algorithm traverses both trees in tandem, recursively:

for node_a in level_A:
    for node_b in level_B:
        just_A, intersection, just_B = fused_set_operation(
            node_a.coordinates,
            node_b.coordinates
        )

At each level, nodes are grouped by dimension. For every pair of nodes sharing the same dimension, the algorithm computes three disjoint sets of coordinate values:

just_A — values only in node A
intersection — values in both nodes
just_B — values only in node B

Depending on the operation:

Operation	Keeps
Union	just_A + intersection + just_B
Intersection	intersection only
A − B	just_A only
B − A	just_B only
Symmetric difference	just_A + just_B

The crucial insight is that each partition gets different children:

just_A inherits the children of node A
just_B inherits the children of node B
intersection gets children computed by recursively calling the set operation on the sub-trees of A and B

This recursive decomposition ensures the result is still a valid compressed Qube.

Performance Considerations

The pairwise comparison is quadratic in the number of matching nodes at each level: $O(N_A \times N_B)$ comparisons per dimension group. In practice this is manageable because:

Once any of just_A, intersection, or just_B is determined to be empty, it can be discarded immediately.
For sorted coordinate types (integers, ranges), the intersection can be computed in linear time by walking both sorted lists in tandem.
After the operation, compression merges any resulting sibling nodes with identical structure, keeping the tree compact.

Compression

Compression is the process of reducing tree size while preserving the set of identifiers. It works in three phases:

Phase 1: Recursive Coordinate Merging

Starting from the leaves and working upward, the algorithm identifies sibling nodes (children of the same parent, sharing the same dimension) that have identical structural hashes.

The structural hash of a node is computed from:

The node’s dimension name
The structural hashes of all its children (recursively)
But not the node’s own coordinate values

Two sibling nodes with the same structural hash have identical subtree shapes. Their coordinates can be merged into a single node without losing information:

Before:                          After:
├── expver=0001                  └── expver=0001/0002
│   ├── param=1                      ├── param=1
│   └── param=2                      └── param=2
└── expver=0002
    ├── param=1
    └── param=2

Phase 2: Pruning Empty Nodes

After merging, some nodes may have empty coordinate sets (their values were absorbed by a sibling). These empty nodes are pruned from the tree.

Phase 3: Deduplication

A final pass deduplicates any nodes that became structurally identical after merging. This is done by recomputing structural hashes and collapsing identical siblings.

Structural hashes are cached in each node using an AtomicU64. The cache is invalidated (set to 0) whenever a node or any of its ancestors are modified. This ensures hashes are recomputed lazily only when needed, making repeated compression operations efficient.

The `append` / Union Workflow

When two Qubes are merged via append:

The two root nodes are paired and node_merge is called recursively.
At each level, children are grouped by dimension and the internal set operation produces the three partitions (just_A, intersection, just_B).
For the intersection partition, new nodes are created and children are recursively merged.
For just_B partitions, subtrees are copied from the other Qube into self.
After all merging is complete, compress() is called to re-compress the result.

The append_many method optimizes merging many Qubes by batching: it performs intermediate compression every 500 Qubes to prevent unbounded tree growth.

Qubed API Documentation