Re: Macro customizable hashtable / bitmapscan & aggregation perf - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Macro customizable hashtable / bitmapscan & aggregation perf
Date
Msg-id 20161001195956.u657cbvya7thzkr6@alap3.anarazel.de
Whole thread Raw
In response to Re: Macro customizable hashtable / bitmapscan & aggregation perf  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Macro customizable hashtable / bitmapscan & aggregation perf  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Hi,

On 2016-10-01 20:19:21 +0200, Tomas Vondra wrote:
> On 10/01/2016 02:44 AM, Andres Freund wrote:
> > Hi,
> > 
> > On 2016-07-26 17:43:33 -0700, Andres Freund wrote:
> > > In the attached patch I've attached simplehash.h, which can be
> > > customized by a bunch of macros, before being inlined.  There's also a
> > > patch using this for tidbitmap.c and nodeAgg/nodeSubplan/... via
> > > execGrouping.c.
> > 
> > Attached is a significantly improved version of this series.  The main
> > changes are:
> > 
> > - The hash table now uses robin-hood style hash collision handling. See the
> >   commit message of 0002 (or simplehash.h) for an introduction. That
> >   significantly reduces the impact of "clustering" due to linear
> >   addressing.
> 
> Interesting. Have you looked at cuckoo hashing?

Yes. I don't think it's a good fit for modern CPUs. Because it doesn't
probe linearly you constantly have random uncached memory accesses.

I've played with a few schemes, and they all seemed to be slower than
linear probing deviations, unless you intentionally used a wrong
hash-distribution, or intentionally "unbalanced" the hash-space by
iterating over either end of the keyspace and moved the entries around.


> > - Significant comment and correctness fixes, both in simplehash.h
> > - itself, and 0003/0004.
> > - a lot of other random performance improvements for the hashing code.
> > 
> > 
> > Unfortunately I'm running out battery right now, so I don't want to
> > re-run the benchmarks posted upthread (loading takes a while). But the
> > last time I ran them all the results after the patches were better than
> > before.
> > 
> 
> Well, I have rather bad experience with running benchmarks on laptops anyway
> - a lot of noise due to power management, etc.

Well, with factor ~2x improvements thats not really a big
detractor. Using a new CPU makes some forms of analysis easier (better
PMUs).

For longrunning tests I agree.


> What about running a bigger benchmark - say, TPC-DS - and evaluating
> the impact?

Worthwhile, although obviously the impact will be a lot smaller, since
they're not entirely bottlenecked on hash-aggs and bitmap scans.


> I think a crucial part of the benchmarking will be identifying and measuring
> corner cases, e.g. a lot of rows with the same key, etc.
> Although that probably is not a major issue for the two places
> switched to the new implementation (e.g. execGrouping merges the
> duplicates to a single group, by definition).

Hm. I don't really see a case where that's going to cause issues - all
the execGrouping.c users only store one key, and either ignore later
ones, or add the data to the initial tuple in the group.  I don't really
see any case where repeated keys should cause an issue for a hash table?


> > This patch series currently consists out of four patches:
> > - 0001 - add unlikely/likely() macros. The use here is not entirely
> >   mandatory, but they do provide a quite consistent improvement. There
> >   are other threads where they proved to be beneficial, so I see little
> >   reason not to introduce them.
> > - 0002 - the customizable hashtable itself. It now contains documentation.
> > - 0003 - convert tidbitmap.c to use the new hashmap. This includes a fix
> >   for the issue mentioned in [1], to improve peformance in heavily lossy
> >   scenarios (otherwise we could regress in some extreme cases)
> > - 0004 - convert execGrouping.c, e.g. used by hash aggregates
> > 
> > 
> > While not quite perfect yet, I do think this is at a state where input
> > is needed.  I don't want to go a lot deeper into this rabbit hole,
> > before we have some agreement that this is a sensible course of action.
> > 
> 
> So, is it the right time to do some benchmarking?

That's one thing that's required, yes. The other is whether we can live
with the uglyness of implementing templates via macros. I do think we
can, but...

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: Macro customizable hashtable / bitmapscan & aggregation perf
Next
From: Andres Freund
Date:
Subject: Re: Macro customizable hashtable / bitmapscan & aggregation perf