Hello, Michael!
> So, here is attached a counter-proposal, where we can simply add a
> counter tracking a node count in _jumbleNode() to add more entropy to
> the mix, incrementing it as well for NULL nodes.
It definitely looks like a more reliable solution than my variant, which only
counts NULL nodes.
However, we already knew about the overhead of adding `\0` bytes for
every NULL field.
> So that adds about 9.1% overhead to jumbling, on average.
See:
https://www.postgresql.org/message-id/flat/5ac172e0b77a4baba50671cd1a15285f%40localhost.localdomain#6c43f354f5f42d2a27e6824faa660a86
Is it really worth spending extra execution time to increase entropy
when non-NULL nodes already contribute entropy of their own?
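To make sure we are looking at the same thing, here is roughly how I read
your counter-proposal. This is only a sketch: node_count is a placeholder
field name, and AppendJumble() stands in for whatever append helper the
final patch uses in queryjumblefuncs.c.

static void
_jumbleNode(JumbleState *jstate, Node *node)
{
    /* Bump the counter for every visit, NULL nodes included. */
    jstate->node_count++;

    /* Mix the counter into the jumble unconditionally. */
    AppendJumble(jstate, (const unsigned char *) &jstate->node_count,
                 sizeof(jstate->node_count));

    if (node == NULL)
        return;

    /* ... jumble the node's fields as today ... */
}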
Maybe we should choose between adding node_count to the hash only when we
visit NULL nodes and adding it only when we visit non-NULL ones.
We could also add entropy when we see a change in the node->type value
across non-NULL nodes.
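If I read that last idea correctly, it could look roughly like the fragment
below inside _jumbleNode(); last_tag is a made-up field, purely for
illustration.

if (node != NULL && nodeTag(node) != jstate->last_tag)
{
    NodeTag     tag = nodeTag(node);

    /* Only non-NULL nodes whose tag differs from the previously
     * visited one contribute these extra bytes. */
    AppendJumble(jstate, (const unsigned char *) &tag, sizeof(tag));
    jstate->last_tag = tag;
}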
Your Variant
------------
< node_count = 1 > < node 1 >
< node_count = 2 > /* node 2 = NULL */
< node_count = 3 > < node 3 >
Alternative 1 (mark only NULL Nodes)
------------------------------------
/* node_count = 1 */ < node 1 >
< node_count = 2 > /* node 2 = NULL */
/* node_count = 3 */ < node 3 >
Alternative 2 (mark only non-NULL Nodes)
----------------------------------------
This could also address concerns about nodes with the same content
appearing in different branches of the query tree.
< node_count = 1 > < node 1 >
/* node_count = 2 */ /* node 2 = NULL */
< node_count = 3 > < node 3 >
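To make the intent of Alternative 2 concrete, here is a sketch (again with
node_count as a placeholder name and AppendJumble() standing in for the
actual helper):

static void
_jumbleNode(JumbleState *jstate, Node *node)
{
    /*
     * Count every visited node, NULL or not, so that the value hashed
     * for a non-NULL node still reflects any NULL branches skipped
     * before it.
     */
    jstate->node_count++;

    /* NULL nodes append no bytes at all to the jumble. */
    if (node == NULL)
        return;

    AppendJumble(jstate, (const unsigned char *) &jstate->node_count,
                 sizeof(jstate->node_count));

    /* ... jumble the node's fields as today ... */
}

Compared with your variant, this keeps the number of extra AppendJumble()
calls proportional to the number of non-NULL nodes only, while the counter
value itself still distinguishes trees that differ only in where the NULL
nodes sit.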