Re: Introduce Index Aggregate - new GROUP BY strategy - Mailing list pgsql-hackers

From	Sergey Soloviev
Subject	Re: Introduce Index Aggregate - new GROUP BY strategy
Date	February 4 20:33:44
Msg-id	2b06b055-7f0d-42a7-ac0b-983ee92e239f@tantorlabs.ru
In response to	Re: Introduce Index Aggregate - new GROUP BY strategy (Sergey Soloviev <sergey.soloviev@tantorlabs.ru>)
List	pgsql-hackers

Hi!

> 2. Consider splitting the hash_* → spill_* field renaming into a separate preparatory commit
> to reduce the complexity of reviewing the core logic changes.

Here are the patches, rebased onto master. I've managed to move the renaming part into a separate patch.
I also improved the accuracy of the planner's estimate of the required memory: it now counts
the total number of index nodes required and calculates the amount of memory for them (internal
and leaf nodes separately).
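
As a rough illustration of that kind of estimate (not the code from the patch), the sketch below counts leaf and internal nodes of a fully packed tree-shaped index for a given number of groups. The function name, the fanout and capacity parameters, and the node sizes are all assumptions made for the example.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def estimate_index_memory(ngroups, leaf_capacity, fanout, leaf_size, internal_size):
    """Return (leaf_nodes, internal_nodes, total_bytes) for a fully packed tree.

    All parameters are hypothetical: leaf_capacity is the number of groups per
    leaf node, fanout is the number of children per internal node, and the two
    sizes are per-node byte counts.
    """
    if ngroups <= 0:
        return 0, 0, 0
    leaf_nodes = ceil_div(ngroups, leaf_capacity)
    internal_nodes = 0
    level = leaf_nodes
    # Walk up the tree: each level needs ceil(children / fanout) nodes,
    # until a single root remains.
    while level > 1:
        level = ceil_div(level, fanout)
        internal_nodes += level
    total = leaf_nodes * leaf_size + internal_nodes * internal_size
    return leaf_nodes, internal_nodes, total
```

Counting the two node kinds separately matters when their sizes differ, since leaf nodes dominate the total for any realistic fanout.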

The patches are attached.

> 3. I notice AGG_INDEX requires both sortable AND hashable types. While I understand this
> is for the hash-based spill partitioning, is this limitation necessary? Could you use sort-based
> spilling (similar to tuplesort's external merge) instead? This would allow AGG_INDEX to work
> with sortable-only types (I can imagine a geometric type with B-tree operators but no hash functions). 

I thought about this idea and came to the conclusion that it should be additional behaviour,
used when the type is not hashable. If we allow sortable-only types, then we have to choose
how to support memory limits:

1. Dump all tuples not present in the index to disk
2. On overflow, compute partial aggregates and perform a final merge/combine at the end

Also, in case 1 I am not considering sorting the spilled tuples, because otherwise what we get
is a plain Sort/Group pair. By using hash partitioning we improve performance, because all tuples
with the same key will belong to the same bucket.
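
A minimal model of the hash-partitioned spill described for case 1 may make the argument concrete. This is a sketch under stated assumptions, not the patch's implementation: a dict stands in for the in-memory index, Python lists stand in for spill files, the aggregate is a plain sum, and `max_groups` and `nparts` are hypothetical knobs.

```python
def index_agg_with_spill(rows, max_groups, nparts=4):
    """Sum values per key while keeping at most max_groups keys in memory."""
    assert max_groups >= 1
    results = []
    pending = [list(rows)]      # batches still to process; the first is the input
    while pending:
        batch = pending.pop()
        index = {}              # stand-in for the in-memory ordered index
        partitions = [[] for _ in range(nparts)]  # stand-in for spill files
        for key, value in batch:
            if key in index:
                index[key] += value           # key already indexed: aggregate in place
            elif len(index) < max_groups:
                index[key] = value            # room left: add a new group
            else:
                # Index is full: spill the raw tuple. Equal keys hash to the
                # same partition, so each group is completed within one batch.
                partitions[hash(key) % nparts].append((key, value))
        results.extend(sorted(index.items()))  # each batch is emitted in key order
        pending.extend(p for p in partitions if p)
    return results
```

Each batch is emitted in key order, but the output as a whole is not globally sorted, and no sort of the spilled tuples is ever needed.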

Case 2 imposes a restriction on the aggregate function itself, because not every aggregate has
a combine function.
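
To show why case 2 depends on a combine function, here is a small sketch, again only a model and not code from the patch. On overflow the current partial states are set aside as a run, and at the end the runs are merged per key. The helper names and the avg-style (sum, count) state are assumptions chosen for the example.

```python
def avg_trans(state, value):
    """Transition step for an avg-like aggregate with state (sum, count)."""
    return (state[0] + value, state[1] + 1)


def avg_combine(a, b):
    """Combine two partial (sum, count) states for the same key."""
    return (a[0] + b[0], a[1] + b[1])


def partial_then_combine(rows, max_groups, trans, combine, init):
    """Aggregate with at most max_groups live groups, merging partial runs at the end."""
    runs = []
    current = {}
    for key, value in rows:
        if key not in current and len(current) >= max_groups:
            runs.append(current)   # overflow: set the partial states aside
            current = {}
        current[key] = trans(current.get(key, init), value)
    runs.append(current)
    # Final merge. A real implementation would stream over sorted runs;
    # here everything is merged in memory to keep the sketch short.
    merged = {}
    for run in runs:
        for key, state in run.items():
            merged[key] = combine(merged[key], state) if key in merged else state
    return merged
```

The same key can appear in several runs, so the final step only works if two partial states can be merged, which is exactly what an aggregate without a combine function cannot provide.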

In the end, I haven't come to a decision on which option is better, so I will leave it as it is for now.

---
Sergey Soloviev

TantorLabs: https://tantorlabs.com
