Hi!
Sorry for late answer, I didn't notice your email.
> Here is my 'aerial' review
Yes. You are right.
> Can you benchmark the scenario where the optimiser mispredicts numGroups. If the planner underestimates group
cardinality,the btree overhead could be much higher than expected. Does the approach degrade gracefully?
I will try
> 2. Consider splitting the hash_* → spill_* field renaming into a separate preparatory commit to reduce the complexity
ofreviewing the core logic changes.
Will be done
> 3. I notice AGG_INDEX requires both sortable AND hashable types. While I understand this is for the hash-based spill
partitioning,is this limitation necessary? Could you use sort-based spilling (similar to tuplesort's external merge)
instead?This would allow AGG_INDEX to work with sortable-only types (I can imagine a geometric type with B-tree
operatorsbut no hash functions).
I think this is possible if we could use combine function. I did not think about this yet.
---
Some days ago I have implemented Ttree as internal index instead of B+tree. To my surprise, the performance degraded.
Thereis a table with benchmark results (amount is amount of groups and value is latency in ms).
int
| amount | HashAgg | GroupAgg | IndexAgg |
| ------ | ------- | -------- | -------- |
| 100 | 0.222 | 0.199 | 0.198 |
| 1000 | 1.506 | 1.506 | 1.414 |
| 10000 | 15.414 | 15.598 | 15.891 |
| 100000 | 159.625 | 171.507 | 194.401 |
bigint
| amount | HashAgg | GroupAgg | IndexAgg |
| ------ | ------- | -------- | -------- |
| 100 | 0.220 | 0.198 | 0.196 |
| 1000 | 1.504 | 1.514 | 1.419 |
| 10000 | 15.404 | 15.717 | 15.836 |
| 100000 | 160.323 | 172.922 | 193.799 |
text
| amount | HashAgg | GroupAgg | IndexAgg |
| ------ | ------- | -------- | -------- |
| 100 | 0.280 | 0.301 | 0.287 |
| 1000 | 2.267 | 2.954 | 2.734 |
| 10000 | 24.613 | 35.383 | 35.401 |
| 100000 | 270.657 | 430.929 | 485.113 |
uuid
| amount | HashAgg | GroupAgg | IndexAgg |
| ------ | ------- | -------- | -------- |
| 100 | 0.311 | 0.317 | 0.310 |
| 1000 | 2.827 | 2.667 | 2.675 |
| 10000 | 33.233 | 26.980 | 28.848 |
| 100000 | 437.452 | 287.236 | 363.142 |
You can notice how latency increases when amount of groups reaches 100K. Probably this is because of low branching of
theTtree - unlike B+tree it has only 2 children, so have to traverse more nodes.
Also, I do not deny that the problem may be in my code, i.e. some paths are not optimized or there is a bug and tree
becomesimbalanced.
I will try to implement simple Btree as another attempt (not B+tree).
The patches are in attachments.
---
Sergey Soloviev
TantorLabs: https://tantorlabs.com