Hi!
> 2. Consider splitting the hash_* → spill_* field renaming into a separate preparatory commit
> to reduce the complexity of reviewing the core logic changes.
Here are the patches rebased onto master. I've managed to move the renaming into a separate patch.
Also, I improved the accuracy of the planner's memory estimate: it now counts the total number
of index nodes required and calculates the amount of memory for them (internal and leaf nodes
separately).
The patches are attached.
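
To illustrate the idea of the estimate, here is a rough sketch in Python. It is not the
formula from the patch: the function name, the parameters (leaf capacity, fanout, node
sizes) and the bottom-up counting of levels are all assumptions made for illustration.

```python
def estimate_index_memory(ngroups, leaf_capacity, fanout,
                          leaf_node_size, internal_node_size):
    """Hypothetical estimate of memory for a B-tree-like in-memory index.

    All parameters are illustrative assumptions, not names from the patch.
    """
    if ngroups <= 0:
        return 0

    # Leaf level: every group occupies one slot in some leaf node.
    leaves = -(-ngroups // leaf_capacity)  # integer ceiling division

    # Internal levels: each level indexes the level below it until a
    # single root node remains.
    internal = 0
    level = leaves
    while level > 1:
        level = -(-level // fanout)
        internal += level

    # Leaf and internal nodes are costed separately because their sizes differ.
    return leaves * leaf_node_size + internal * internal_node_size
```

For example, 1000 groups with 10 entries per leaf and a fanout of 10 need 100 leaf nodes
and 10 + 1 internal nodes.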
> 3. I notice AGG_INDEX requires both sortable AND hashable types. While I understand this
> is for the hash-based spill partitioning, is this limitation necessary? Could you use sort-based
> spilling (similar to tuplesort's external merge) instead? This would allow AGG_INDEX to work
> with sortable-only types (I can imagine a geometric type with B-tree operators but no hash functions).
I thought about this idea and came to the conclusion that this should be additional behaviour,
used when the type is not hashable. If we require only sortable types, then we have to choose
how to support memory limits:
1. Dump all tuples not present in the index to disk
2. On overflow, compute partial aggregates and perform a final merge/combine at the end
Also, in case 1 I am not considering sorting the spilled tuples, because otherwise what we get
is a plain Sort/Group pair. Hash partitioning improves performance because all tuples with
the same key end up in the same bucket.
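
The point about buckets can be shown with a small Python sketch. It is a simplified model,
not the patch's code: a dict stands in for the index, the aggregate is a plain count, and
`index_agg_count`, `max_groups` and `nparts` are hypothetical names.

```python
def index_agg_count(keys, max_groups, nparts=4, depth=0):
    """Count keys with a bounded in-memory 'index' and hash-partitioned spill.

    Simplified model: a dict stands in for the index, and lists stand in
    for spill files. Names and structure are illustrative assumptions.
    """
    if max_groups < 1:
        raise ValueError("max_groups must be at least 1")

    groups = {}
    spill = [[] for _ in range(nparts)]
    for key in keys:
        if key in groups:
            groups[key] += 1            # key already in the index: aggregate
        elif len(groups) < max_groups:
            groups[key] = 1             # room left: create a new group
        else:
            # Memory limit reached: spill the tuple. Equal keys always hash
            # to the same partition, so each key is finalized exactly once.
            spill[hash((depth, key)) % nparts].append(key)

    # Emit this batch in key order, then process each partition on its own.
    out = sorted(groups.items())
    for part in spill:
        if part:
            out.extend(index_agg_count(part, max_groups, nparts, depth + 1))
    return out
```

Each batch is emitted in key order, but the batches are not merged into one global order
here. Because equal keys never land in different partitions, no combine step is needed.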
Case 2 imposes a restriction on the aggregate function itself, because not every aggregate has
a combine function.
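
A Python sketch of case 2 shows where the combine function is needed. Again, this is only
an illustration under assumptions: the aggregate is avg() with a (sum, count) transition
state, sorted lists stand in for spilled runs, and `avg_with_partial_runs` is a
hypothetical name.

```python
import heapq


def avg_with_partial_runs(rows, max_groups):
    """avg() per key using partial aggregates flushed on overflow.

    Illustrative model only: each flushed run is a sorted list of
    (key, (sum, count)) partial states.
    """
    runs = []
    groups = {}
    for key, value in rows:
        if key not in groups and len(groups) >= max_groups:
            # Overflow: flush the current partial states as a sorted run.
            runs.append(sorted(groups.items()))
            groups = {}
        s, n = groups.get(key, (0, 0))
        groups[key] = (s + value, n + 1)
    runs.append(sorted(groups.items()))

    # Final pass: merge the sorted runs and combine states of equal keys.
    merged = []
    for key, (s, n) in heapq.merge(*runs, key=lambda kv: kv[0]):
        if merged and merged[-1][0] == key:
            ps, pn = merged[-1][1]
            merged[-1] = (key, (ps + s, pn + n))  # the combine step
        else:
            merged.append((key, (s, n)))
    return [(key, s / n) for key, (s, n) in merged]
```

The same key can appear in several runs, so the final merge has to combine partial states.
An aggregate without a combine function could not be handled this way.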
In the end, I haven't decided which option is better, so I will leave it as it is for now.
---
Sergey Soloviev
TantorLabs: https://tantorlabs.com