Re: Macro customizable hashtable / bitmapscan & aggregation perf - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Macro customizable hashtable / bitmapscan & aggregation perf
Date
Msg-id 20161003041215.tfrifbeext3i7nkj@alap3.anarazel.de
Whole thread Raw
In response to Re: Macro customizable hashtable / bitmapscan & aggregation perf  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On 2016-10-02 02:59:04 +0200, Tomas Vondra wrote:
> On 10/02/2016 02:17 AM, Andres Freund wrote:
> > Hi,
> > ...
> > 
> > > > > I think a crucial part of the benchmarking will be identifying and
> > > > > measuring corner cases, e.g. a lot of rows with the same key, etc.
> > > > > Although that probably is not a major issue for the two places
> > > > > switched to the new implementation (e.g. execGrouping merges the
> > > > > duplicates to a single group, by definition).
> > > > 
> > > > Hm. I don't really see a case where that's going to cause issues - all
> > > > the execGrouping.c users only store one key, and either ignore later
> > > > ones, or add the data to the initial tuple in the group.  I don't really
> > > > see any case where repeated keys should cause an issue for a hash table?
> > > > 
> > > 
> > > Yep, that's pretty much what I suggested. I was wondering more about the
> > > other places where this hash table might be used - I've been thinking about
> > > hashjoins, but now that I think of it - that's a fairly specialized and
> > > tuned code. In any case, let's not complicate the discussion for now.
> > 
> > My point is that I don't really see any potential usecase where
> > that's an issue.
> > 
> 
> OK, I think we agree the two places modified by the submitted patch series
> are fine. Let's leave discussion about places modified by future patches for
> the future ;-)

I think my problem here is that I just can't see a potential use-case
for hashtables where that's an issue ;)


> > > 5) SH_RESIZE
> > > 
> > > Do we expect to shrink hash tables?
> > 
> > Not at the moment. Should be doable, but I'm not sure it's worth
> > developing and testing - I don't see any usecases in pg atm.
> > 
> 
> OK, sure. Then what about adding this to SH_RESIZE?
> 
> Assert(oldsize <= newsize);

Yes. And potentially rename it to SH_GROW.


> I assumed we might shrink the catcache after invalidation, for example (but
> maybe I don't really know how that works).

They don't shrink so far, and in most invalidation cases we don't delete
entries, just mark them as invalid (as they might be referenced on the
outside at that moment).


> > > It's not entirely clear why is it guaranteed that there's always
> > > an element with optimal position (when load factor < 1)? Adding an
> > >  explanation or a link to a paper would be helpful, I guess.
> > 
> > As the load factor always has to be below 1, there always has to be
> > an empty bucket. Every bucket after an empty bucket has to be at
> > it's optimal position.

> Hmmm, thanks to moving the entries after delete?

Pretty much, yes.


> Adding an assert() enforcing this seems like a good idea.

Probably requires some additional state to be meaningfully enforced. Not
sure that's worth the effort. If that invariant is broken, not a lot of
stuff works (believe me, I have broken it ;)). Otherwise searches won't
necessarily work anymore, if they didn't end up in their original
position (as the cell would now be empty, a lookup/insert/delete would
not find the existing element).


> > > 7) Should hash agg size estimation for the planner consider the fillfactor?
> > > 
> > > I think we should account for fillfactor - we should account for the
> > > allocated memory, even if there's free space in the hash table. After all,
> > > that's what hashjoins do.
> > 
> > We didn't really for dynahash (as it basically assumed a fillfactor of 1
> > all the time), not sure if changing it won't also cause issues.
> > 
> 
> That's a case of glass half-full vs. half-empty, I guess. If we assumed load
> factor 1.0, then I see it as accounting for load factor (while you see it as
> not accounting of it).

Well, even before we grow by factors of two, so 1 wasn't accurate for
most of the time.  My issue isn't really that I don't want to do it, but
that I'm not entirely sure that there's a good way. TupleHashEntryData
itself isn't exactly large, and we'd have to multiply it by the load
factor. At the same time, after the patch, AggStatePerGroupData isn't
allocated for empty elements anymore, and that's the largest part of the
memory.  So I'm kind of inclined to just disregard the fillfactor (and
document that).

> Also, with load factor 0.8 the table is only ~20% larger, so even if we
> don't include that into work_mem it's a reasonably small error (easily
> smaller than errors in the pre-9.5 HashJoin accounting, which did not
> include chunk headers etc.).

I think in either cases the entries themselves are smaller after the
patch by enough that the overhead doesn't matter once you crossed a few
dozen entries.

Regards,

Andres



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Multi-tenancy with RLS
Next
From: Michael Paquier
Date:
Subject: Re: Aggregate Push Down - Performing aggregation on foreign server