Thread: [HACKERS] Performance degradation in TPC-H Q18
Hello everyone,

While conducting the experiments for parallelism, Rafia came across a
hang in Q18 when the plan uses partial and finalize hash aggregate.
This could be seen on both scale factors - 20 and 300 - on setting
work_mem high enough so that the query uses hash aggregate. It seems
that commit b81b5a96f424531b97cdd1dba97d9d1b9c9d372e does not solve
the issue completely. I've tested on scale factor 10 with work_mem
100MB and the issue is reproducible. Below is the simplified part of
the query which uses the hash aggregate.

tpch10GB=# explain select l_orderkey from lineitem group by l_orderkey;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Finalize HashAggregate  (cost=1612872.73..1616801.32 rows=392859 width=8)
   Group Key: l_orderkey
   ->  Gather  (cost=1528408.04..1610908.43 rows=785718 width=8)
         Workers Planned: 2
         ->  Partial HashAggregate  (cost=1527408.04..1531336.63 rows=392859 width=8)
               Group Key: l_orderkey
               ->  Parallel Seq Scan on lineitem  (cost=0.00..1464922.43 rows=24994243 width=8)
(7 rows)

I've also tried to execute the same query with a different grouping column.

tpch10GB=# explain analyze select l_suppkey from lineitem group by l_suppkey;
                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize HashAggregate  (cost=1549511.80..1550493.37 rows=98157 width=8) (actual time=22363.901..22380.361 rows=100000 loops=1)
   Group Key: l_suppkey
   ->  Gather  (cost=1528408.04..1549021.01 rows=196314 width=8) (actual time=22107.387..22244.848 rows=300000 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial HashAggregate  (cost=1527408.04..1528389.61 rows=98157 width=8) (actual time=22100.776..22135.063 rows=100000 loops=3)
               Group Key: l_suppkey
               ->  Parallel Seq Scan on lineitem  (cost=0.00..1464922.43 rows=24994243 width=8) (actual time=0.846..10664.258 rows=19995351 loops=3)
 Planning time: 0.171 ms
 Execution time: 22393.030 ms
(10 rows)

This one finished in time, even though the two plans have the same
shape and similar stats. Hence, the problem is specific to the first
query (the l_orderkey grouping) itself. As Robert suggested offline,
I've produced a perf report for the query:

  99.44%  0.01%  postgres  postgres      [.] LookupTupleHashEntry
  74.31% 53.55%  postgres  postgres      [.] tuplehash_insert
  17.05% 17.04%  postgres  libc-2.17.so  [.] __memcpy_ssse3_back
  11.06% 11.05%  postgres  postgres      [.] tuplehash_prev
  10.81% 10.80%  postgres  postgres      [.] tuplehash_next

It's clear from the perf report that something terrible is going on in
the simple hash insert. I've enabled logging of the hash stats
(SH_STAT()) during the hash-growing phase:

2017-02-21 16:42:39.634 IST [9372] LOG:  size: 1048576, members: 838860, filled: 0.799999, total chain: 2008236860, max chain: 66743, avg chain: 2394.007176, total_collisions: 278348, max_collisions: 8, avg_collisions: 0.331817
2017-02-21 16:42:39.634 IST [9372] STATEMENT:  explain analyze select l_orderkey from lineitem group by l_orderkey;
2017-02-21 16:42:39.741 IST [9372] LOG:  size: 2097152, members: 838860, filled: 0.400000, total chain: 788175, max chain: 124, avg chain: 0.939579, total_collisions: 216052, max_collisions: 6, avg_collisions: 0.257554
2017-02-21 16:42:39.741 IST [9372] STATEMENT:  explain analyze select l_orderkey from lineitem group by l_orderkey;

The max chain length is really high, which hampers the query's
performance.
To further understand the scenario, I've collected the following results:

tpch10GB=# select count(distinct l_orderkey) from lineitem;
  count
----------
 15000000

tpch10GB=# select correlation from pg_stats where tablename='lineitem' and attname='l_orderkey';
 correlation
-------------
           1

tpch10GB=# select count(distinct l_suppkey) from lineitem;
 count
--------
 100000

tpch10GB=# select correlation from pg_stats where tablename='lineitem' and attname='l_suppkey';
 correlation
-------------
  0.00900259

It seems that l_orderkey produces a pretty bad case for hashing with
linear probing and a bad choice of hash function (linear probing is
sensitive to elements with nearby hash codes). In [1], Andres proposed
a hack to resize the hashtable when a chain is growing quickly. Below
is a comment from the patch explaining the objective:

+		/*
+		 * To avoid negative consequences from overly imbalanced
+		 * hashtables, grow the hashtable if collisions lead to large
+		 * runs. The most likely cause of such imbalance is filling a
+		 * (currently) small table, from a currently big one, in
+		 * hash-table order.
+		 *
+		 * FIXME: compute boundaries in a more principled manner.
+		 */

After applying the patch, both TPC-H Q18 and the simplified query
complete in time. Although the patch works in this scenario, we need a
smarter way to trigger the hash expansion. The problem with the patch
is that it triggers a hash expansion even when the filled percentage
is pretty low, causing unnecessary memory consumption.

Thoughts?

[1] https://www.postgresql.org/message-id/20161123083351.5vramz52nmdokhzz%40alap3.anarazel.de

0001-Resize-simplehash.h-table-in-case-of-long-runs.patch

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
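To make the failure mode - and the patch's grow-on-long-run defense -
concrete, here is a minimal, self-contained sketch of linear-probing
insertion with that defense added. This is illustrative C, not the
simplehash.h code: the table layout, the hash function, and the
MAX_RUN threshold are all made up for the example.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdlib.h>

    typedef struct Table
    {
        uint32_t   *keys;   /* 0 == empty slot, for simplicity */
        uint32_t    size;   /* a power of two, initially > MAX_RUN */
        uint32_t    mask;   /* size - 1 */
    } Table;

    #define MAX_RUN 64      /* made-up threshold; the real patch tunes this */

    static uint32_t
    hash_u32(uint32_t k)
    {
        return k * 2654435769u; /* Fibonacci hashing, illustration only */
    }

    static bool insert(Table *tb, uint32_t key);

    static void
    grow(Table *tb)
    {
        uint32_t    oldsize = tb->size;
        uint32_t   *oldkeys = tb->keys;

        tb->size *= 2;
        tb->mask = tb->size - 1;
        tb->keys = calloc(tb->size, sizeof(uint32_t));
        for (uint32_t i = 0; i < oldsize; i++)
            if (oldkeys[i] != 0)
                insert(tb, oldkeys[i]);
        free(oldkeys);
    }

    static bool
    insert(Table *tb, uint32_t key)
    {
        uint32_t    bucket = hash_u32(key) & tb->mask;
        uint32_t    run = 0;

        while (tb->keys[bucket] != 0)
        {
            if (tb->keys[bucket] == key)
                return false;   /* already present */

            /*
             * A long run of occupied slots means this region is badly
             * clustered even if the table as a whole is mostly empty -
             * grow and retry instead of probing further.  This is the
             * patch's idea, and it also explains the growth at low
             * fill: one dense region resizes the whole table.
             */
            if (++run > MAX_RUN)
            {
                grow(tb);
                return insert(tb, key);
            }
            bucket = (bucket + 1) & tb->mask;
        }
        tb->keys[bucket] = key;
        return true;
    }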
On Wed, Feb 22, 2017 at 11:23 AM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
> While conducting the experiments for parallelism, Rafia came across a
> hang in Q18 when the plan uses partial and finalize hash aggregate.
> This could be seen on both scale factors - 20 and 300 - on setting
> work_mem high enough so that the query uses hash aggregate. It seems
> that commit b81b5a96f424531b97cdd1dba97d9d1b9c9d372e does not solve
> the issue completely.

Andres, any thoughts? Isn't this the same issue that we were discussing in
https://www.postgresql.org/message-id/CA+TgmoYNO8qouPVO=1Q2aXzuxe942d_T5bcvZd9iKOC9tb3uLg@mail.gmail.com
over a month ago?

To me, it seems like a big problem that we have large, unfixed
performance regressions with simplehash four months after it went in.
I hate to suggest ripping the whole thing out, and it seems like
overkill, but it's pretty clear to me that the current state of things
is unacceptable, and that we're going to have a lot of unhappy users
if we ship it the way it is right now.

I want to point out that the kinds of problems we're hitting here are
exactly what I told you I was worried about before it went in - that
the average-case performance would be better, but that there would be
all-too-easy-to-hit cases where things got much worse because the
whole thing degenerated into a linear search. Not only did that
happen, but there seem to be multiple ways of producing it without
half trying, of which b81b5a96f424531b97cdd1dba97d9d1b9c9d372e fixed
only one. Something that's 5-10% faster in common cases but 2x or 10x
slower when things go wrong is not an improvement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2017-02-26 19:30:32 +0530, Robert Haas wrote:
> On Wed, Feb 22, 2017 at 11:23 AM, Kuntal Ghosh
> <kuntalghosh.2007@gmail.com> wrote:
> > While conducting the experiments for parallelism, Rafia came across a
> > hang in Q18 when the plan uses partial and finalize hash aggregate.
> > This could be seen on both scale factors - 20 and 300 - on setting
> > work_mem high enough so that the query uses hash aggregate. It seems
> > that commit b81b5a96f424531b97cdd1dba97d9d1b9c9d372e does not solve
> > the issue completely.
>
> Andres, any thoughts? Isn't this the same issue that we were discussing in
> https://www.postgresql.org/message-id/CA+TgmoYNO8qouPVO=1Q2aXzuxe942d_T5bcvZd9iKOC9tb3uLg@mail.gmail.com
> over a month ago?

Yes, I presume that it is.

> To me, it seems like a big problem that we have large, unfixed
> performance regressions with simplehash four months after it went in.

Yea, I agree. I'm fairly sure that the patch I posted in that thread
actually fixes the issue (and would also have made the already-applied
hash patch of yours a small-ish optimization). I held back because I
disliked the idea of magic constants, and I couldn't figure out a way
to properly "derive" them - but I'm inclined to simply live with the
magic constants.

> I hate to suggest ripping the whole thing out, and it seems like
> overkill

Yea, I don't think we're there yet. Let me push the patch that resizes
adaptively, and we can see how things are going from there.

> , but it's pretty clear to me that the current state of things
> is unacceptable, and that we're going to have a lot of unhappy users
> if we ship it the way it is right now.

Yea, if we can't improve upon the current state, we'd need to revert.

> I want to point out that the kinds of problems we're hitting here are
> exactly what I told you I was worried about before it went in - that
> the average-case performance would be better, but that there would be
> all-too-easy-to-hit cases where things got much worse because the
> whole thing degenerated into a linear search. Not only did that
> happen, but there seem to be multiple ways of producing it without
> half trying, of which b81b5a96f424531b97cdd1dba97d9d1b9c9d372e fixed
> only one.

Note that that one is also fixed with what I'm proposing (but it's a
worthwhile small-ish improvement nonetheless).

> Something that's 5-10% faster in common cases but 2x or 10x
> slower when things go wrong is not an improvement.

The hash-table part is more like 3x faster - but as so often when
making things faster, the next bottleneck is just around the corner...

- Andres
On Tue, Feb 28, 2017 at 11:16 PM, Andres Freund <andres@anarazel.de> wrote:
>> To me, it seems like a big problem that we have large, unfixed
>> performance regressions with simplehash four months after it went in.
>
> Yea, I agree. I'm fairly sure that the patch I posted in that thread
> actually fixes the issue (and would also have made the already-applied
> hash patch of yours a small-ish optimization). I held back because I
> disliked the idea of magic constants, and I couldn't figure out a way
> to properly "derive" them - but I'm inclined to simply live with the
> magic constants.

I think it's better to push in a proposed fix and see how it holds up
than to leave the tree in an unfixed state for long periods of time.
If the fix is good, then the fact that there's a magic constant
doesn't really matter all that much, and it can always be improved
later. On the other hand, if the fix is bad, pushing it improves the
chances of finding the problems. Not many people are going to test an
uncommitted fix.

BTW, another option to consider is lowering the target fillfactor.
IIRC, Kuntal mentioned to me that cranking it down seemed to fix the
issue. Obviously, it's better to detect when we need a lower
fillfactor than to always use a lower one, but obviously the tighter
you pack the hash table, the more likely it is that you're going to
have these kinds of problems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2017-03-01 08:42:35 +0530, Robert Haas wrote:
> On Tue, Feb 28, 2017 at 11:16 PM, Andres Freund <andres@anarazel.de> wrote:
> >> To me, it seems like a big problem that we have large, unfixed
> >> performance regressions with simplehash four months after it went in.
> >
> > Yea, I agree. I'm fairly sure that the patch I posted in that thread
> > actually fixes the issue (and would also have made the already-applied
> > hash patch of yours a small-ish optimization). I held back because I
> > disliked the idea of magic constants, and I couldn't figure out a way
> > to properly "derive" them - but I'm inclined to simply live with the
> > magic constants.
>
> I think it's better to push in a proposed fix and see how it holds up
> than to leave the tree in an unfixed state for long periods of time.

Yea, that was a mistake. I kind of was hoping for an epiphany that has
not yet come.

> BTW, another option to consider is lowering the target fillfactor.
> IIRC, Kuntal mentioned to me that cranking it down seemed to fix the
> issue. Obviously, it's better to detect when we need a lower
> fillfactor than to always use a lower one, but obviously the tighter
> you pack the hash table, the more likely it is that you're going to
> have these kinds of problems.

Yea, that'd be one approach, but I feel better about dynamically
increasing the fillfactor like in the patch. Even with a lower
fillfactor you could see issues, and as you say a higher fillfactor is
nice [TM]; after the patch I played with *increasing* the fillfactor,
without being able to measure a downside.

Greetings,

Andres Freund
On Wed, Mar 1, 2017 at 8:48 AM, Andres Freund <andres@anarazel.de> wrote:
>> BTW, another option to consider is lowering the target fillfactor.
>> IIRC, Kuntal mentioned to me that cranking it down seemed to fix the
>> issue. Obviously, it's better to detect when we need a lower
>> fillfactor than to always use a lower one, but obviously the tighter
>> you pack the hash table, the more likely it is that you're going to
>> have these kinds of problems.
>
> Yea, that'd be one approach, but I feel better about dynamically
> increasing the fillfactor like in the patch. Even with a lower
> fillfactor you could see issues, and as you say a higher fillfactor is
> nice [TM]; after the patch I played with *increasing* the fillfactor,
> without being able to measure a downside.

I've tested with different fill factors, and the query completed with
fill factor 0.6. With linear probing, the performance degrades more
quickly at high fill factors because of primary clustering, a tendency
for one collision to cause more nearby collisions. So, increasing the
fill factor further doesn't seem to be a good idea.

Besides, when I tested with the patch, I saw the hash table grow even
when only 10-12% of it was filled:

LOG:  size: 8388608, members: 845056, filled: 0.100739, total chain: 735723, max chain: 37, avg chain: 0.870620, total_collisions: 219101, max_collisions: 6, avg_collisions: 0.259274
STATEMENT:  explain analyze select l_orderkey from lineitem group by l_orderkey;
LOG:  size: 16777216, members: 845056, filled: 0.050369, total chain: 197047, max chain: 6, avg chain: 0.233176, total_collisions: 121185, max_collisions: 5, avg_collisions: 0.143405
STATEMENT:  explain analyze select l_orderkey from lineitem group by l_orderkey;
LOG:  size: 16777216, members: 2127649, filled: 0.126818, total chain: 1471327, max chain: 35, avg chain: 0.691527, total_collisions: 486310, max_collisions: 6, avg_collisions: 0.228567
STATEMENT:  explain analyze select l_orderkey from lineitem group by l_orderkey;
LOG:  size: 33554432, members: 2127649, filled: 0.063409, total chain: 413073, max chain: 7, avg chain: 0.194145, total_collisions: 265766, max_collisions: 5, avg_collisions: 0.124911
STATEMENT:  explain analyze select l_orderkey from lineitem group by l_orderkey;

This seems bad. We're wasting a lot of memory in the hash table.

Apart from that, I was thinking about the following logic in the code:

		/*
		 * If the bucket is not empty, we either found a match (in which case
		 * we're done), or we have to decide whether to skip over or move the
		 * colliding entry. When the colliding element's distance to its
		 * optimal position is smaller than the to-be-inserted entry's, we
		 * shift the colliding entry (and its followers) forward by one.
		 */

I'm worried about the fact that we are increasing the chain length of
each follower. It may happen that we've increased the chain length of
the last element by a huge factor. So, I was looking for alternatives
and found one called Robin Hood hashing. In this approach, when you
probe for a position to insert a new element, if the probe length for
the existing element is less than the current probe length for the
element being inserted, you swap the two elements and keep going. It
leads to a low variance of chain lengths, rather than pushing all the
displacement onto the last elements. Is this approach already
considered?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On 2017-03-01 09:13:15 +0530, Kuntal Ghosh wrote:
> On Wed, Mar 1, 2017 at 8:48 AM, Andres Freund <andres@anarazel.de> wrote:
> >> BTW, another option to consider is lowering the target fillfactor.
> >> IIRC, Kuntal mentioned to me that cranking it down seemed to fix the
> >> issue. Obviously, it's better to detect when we need a lower
> >> fillfactor than to always use a lower one, but obviously the tighter
> >> you pack the hash table, the more likely it is that you're going to
> >> have these kinds of problems.
> >
> > Yea, that'd be one approach, but I feel better about dynamically
> > increasing the fillfactor like in the patch. Even with a lower
> > fillfactor you could see issues, and as you say a higher fillfactor is
> > nice [TM]; after the patch I played with *increasing* the fillfactor,
> > without being able to measure a downside.

> I've tested with different fill factors, and the query completed with
> fill factor 0.6.

That's without the patch in
http://archives.postgresql.org/message-id/20161123083351.5vramz52nmdokhzz%40alap3.anarazel.de
? With that patch it should complete without that change, right?

> With linear probing, the performance degrades more quickly at high
> fill factors because of primary clustering, a tendency for one
> collision to cause more nearby collisions. So, increasing the fill
> factor further doesn't seem to be a good idea.

Well, the idea is increasing it, but *also* detecting clustering and
adaptively resizing earlier.

> So, I was looking for alternatives and found one called Robin Hood
> hashing.

simplehash.h implements robin hood hashing.

- Andres
On Wed, Mar 1, 2017 at 8:48 AM, Andres Freund <andres@anarazel.de> wrote:
>> BTW, another option to consider is lowering the target fillfactor.
>> IIRC, Kuntal mentioned to me that cranking it down seemed to fix the
>> issue. Obviously, it's better to detect when we need a lower
>> fillfactor than to always use a lower one, but obviously the tighter
>> you pack the hash table, the more likely it is that you're going to
>> have these kinds of problems.
>
> Yea, that'd be one approach, but I feel better about dynamically
> increasing the fillfactor like in the patch. Even with a lower
> fillfactor you could see issues, and as you say a higher fillfactor is
> nice [TM]; after the patch I played with *increasing* the fillfactor,
> without being able to measure a downside.

I am going to bet that increasing the fillfactor would be a mistake.
The expected length of a clump in the table is going to be about
1/(1-ff), which increases very quickly beyond the current value of
0.8. For example, if the actual fillfactor is 0.9 and the keys are
perfectly distributed -- nine in a row and then one empty slot,
lather, rinse, repeat -- probing for a key that's not there has a 10%
chance of hitting an empty slot, a 10% chance of hitting the slot just
before an empty slot, and so on. So the expected number of probes to
find that there's no match is 4.5. However, it could easily happen
that you have 3 or 4 empty slots in a row in one place and 27 or 36
occupied slots in a row in another place, and that could cause all
kinds of grief. Moreover, real-world hash functions on real-world
data aren't always perfect, which increases the chances of things
going off track. I'm not exactly sure how high a fillfactor is too
high, but note that
https://en.wikipedia.org/wiki/Hash_table#Open_addressing claims that
"performance dramatically degrades when the load factor grows beyond
0.7 or so", which isn't cited, but the same text appears in
https://www.unf.edu/~wkloster/3540/wiki_book2.pdf for whatever that's
worth.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
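For reference, the classical uniform-hashing analysis of linear
probing (Knuth, TAOCP vol. 3) makes the same point quantitatively: the
expected probe counts for successful and unsuccessful searches at load
factor $\alpha$ are approximately

    C_{\mathrm{hit}} \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right),
    \qquad
    C_{\mathrm{miss}} \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)

At $\alpha = 0.8$ an unsuccessful search - which every insert of a new
key performs - touches about 13 slots on average; at $\alpha = 0.9$ it
is about 50. The miss cost grows with the square of $1/(1-\alpha)$,
which is why pushing the fillfactor past 0.8 gets expensive quickly.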
On 2017-03-01 09:21:47 +0530, Robert Haas wrote:
> On Wed, Mar 1, 2017 at 8:48 AM, Andres Freund <andres@anarazel.de> wrote:
> > Yea, that'd be one approach, but I feel better about dynamically
> > increasing the fillfactor like in the patch. Even with a lower
> > fillfactor you could see issues, and as you say a higher fillfactor is
> > nice [TM]; after the patch I played with *increasing* the fillfactor,
> > without being able to measure a downside.
>
> I am going to bet that increasing the fillfactor would be a mistake.
> The expected length of a clump in the table is going to be about
> 1/(1-ff), which increases very quickly beyond the current value of
> 0.8. [...] I'm not exactly sure how high a fillfactor is too high,
> but note that
> https://en.wikipedia.org/wiki/Hash_table#Open_addressing claims that
> "performance dramatically degrades when the load factor grows beyond
> 0.7 or so", which isn't cited, but the same text appears in
> https://www.unf.edu/~wkloster/3540/wiki_book2.pdf for whatever that's
> worth.

That's without the "trick" of "robin hood" hashing though:
https://en.wikipedia.org/wiki/Hash_table#Robin_Hood_hashing
http://codecapsule.com/2013/11/17/robin-hood-hashing-backward-shift-deletion/
https://www.pvk.ca/Blog/2013/11/26/the-other-robin-hood-hashing/

How high you want the fillfactor probably depends a bit on the
scenario - for hashaggs a high fillfactor would probably be good
(they're often "read" heavy), for bitmapscans less so. But I guess
there's not much point in immediately increasing the fillfactor; we
can do that in a second step.

Greetings,

Andres Freund
On Wed, Mar 1, 2017 at 9:19 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-03-01 09:13:15 +0530, Kuntal Ghosh wrote:
>> On Wed, Mar 1, 2017 at 8:48 AM, Andres Freund <andres@anarazel.de> wrote:
>> >> BTW, another option to consider is lowering the target fillfactor.
>> >> IIRC, Kuntal mentioned to me that cranking it down seemed to fix the
>> >> issue. Obviously, it's better to detect when we need a lower
>> >> fillfactor than to always use a lower one, but obviously the tighter
>> >> you pack the hash table, the more likely it is that you're going to
>> >> have these kinds of problems.
>> >
>> > Yea, that'd be one approach, but I feel better about dynamically
>> > increasing the fillfactor like in the patch. Even with a lower
>> > fillfactor you could see issues, and as you say a higher fillfactor is
>> > nice [TM]; after the patch I played with *increasing* the fillfactor,
>> > without being able to measure a downside.
>
>> I've tested with different fill factors, and the query completed with
>> fill factor 0.6.
>
> That's without the patch in
> http://archives.postgresql.org/message-id/20161123083351.5vramz52nmdokhzz%40alap3.anarazel.de
> ? With that patch it should complete without that change, right?

No, it's with the patch in the above link, which is
0001-Resize-simplehash.h-table-in-case-of-long-runs.patch. But I've
kept SH_FILLFACTOR at 0.8 and SH_MAX_FILLFACTOR at 0.95. I'll do
another round of testing with the constants provided by you.

>> With linear probing, the performance degrades more quickly at high
>> fill factors because of primary clustering, a tendency for one
>> collision to cause more nearby collisions. So, increasing the fill
>> factor further doesn't seem to be a good idea.
>
> Well, the idea is increasing it, but *also* detecting clustering and
> adaptively resizing earlier.
>
>> So, I was looking for alternatives and found one called Robin Hood
>> hashing.
>
> simplehash.h implements robin hood hashing.

But, it doesn't implement the swapping idea, right?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
Hi,

On 2017-03-01 09:33:07 +0530, Kuntal Ghosh wrote:
> On Wed, Mar 1, 2017 at 9:19 AM, Andres Freund <andres@anarazel.de> wrote:
> >> So, I was looking for alternatives and found one called Robin Hood
> >> hashing.
> >
> > simplehash.h implements robin hood hashing.

> But, it doesn't implement the swapping idea, right?

It does, that's the if (insertdist > curdist) block in SH_INSERT.
Unless I misunderstand what you're proposing?

- Andres
On Wed, Mar 1, 2017 at 9:33 AM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
> On Wed, Mar 1, 2017 at 9:19 AM, Andres Freund <andres@anarazel.de> wrote:
>> That's without the patch in
>> http://archives.postgresql.org/message-id/20161123083351.5vramz52nmdokhzz%40alap3.anarazel.de
>> ? With that patch it should complete without that change, right?
>
> No, it's with the patch in the above link, which is
> 0001-Resize-simplehash.h-table-in-case-of-long-runs.patch. But I've
> kept SH_FILLFACTOR at 0.8 and SH_MAX_FILLFACTOR at 0.95. I'll do
> another round of testing with the constants provided by you.

I've tested with the patch
0001-Resize-simplehash.h-table-in-case-of-long-runs.patch, and the
results are the same, i.e., the hash table grows even when it is only
10-12% filled. I've attached the logfile for reference.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 1, 2017 at 9:38 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2017-03-01 09:33:07 +0530, Kuntal Ghosh wrote:
> >> So, I was looking for alternatives and found one called Robin Hood
> >> hashing.
> >
> > simplehash.h implements robin hood hashing.
>
>> But, it doesn't implement the swapping idea, right?
>
> It does, that's the if (insertdist > curdist) block in SH_INSERT.
> Unless I misunderstand what you're proposing?

I think the idea of the existing implementation is:

if (insertdist > curdist)
{
    find an empty space at the end of the cluster;
    shift all the followers by 1 position to make space for the
        element to be inserted;
    insert the element at the current place;
}

What I'm trying to say is:

if (insertdist > curdist)
{
    swap the entry to be inserted with the current entry;
    try to insert the current entry using the same logic;
}

So, the second approach will not cause all the followers to be
shifted by 1.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On 2017-03-01 10:47:45 +0530, Kuntal Ghosh wrote:
> if (insertdist > curdist)
> {
>     swap the entry to be inserted with the current entry;
>     try to insert the current entry using the same logic;
> }
>
> So, the second approach will not cause all the followers to be shifted by 1.

How not? You'll have to play that game until you find a free slot.
On Wed, Mar 1, 2017 at 10:53 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-03-01 10:47:45 +0530, Kuntal Ghosh wrote:
>> if (insertdist > curdist)
>> {
>>     swap the entry to be inserted with the current entry;
>>     try to insert the current entry using the same logic;
>> }
>>
>> So, the second approach will not cause all the followers to be shifted by 1.
>
> How not? You'll have to play that game until you find a free slot.

You can skip increasing the curdist of some elements for which it is
already pretty high. Suppose we have the following hash table and an
element e:

_,_,_,i,_,_,j,j+1,j+2,_,_

We want to insert e at i, but there is a collision and we start
probing. At j, the condition (insertdist > curdist) is satisfied and
we swap e with the element at j. Now we try to insert e (= the old
element at j), and so on. In this process, if the elements at j+1 and
j+2 already have a high distance from their optimal locations, we can
skip those elements and go ahead. But in the existing approach, we
increase their distances further. Thoughts?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
Hi,

On 2017-03-01 10:39:11 +0530, Kuntal Ghosh wrote:
> On Wed, Mar 1, 2017 at 9:33 AM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
> > On Wed, Mar 1, 2017 at 9:19 AM, Andres Freund <andres@anarazel.de> wrote:
> >> That's without the patch in
> >> http://archives.postgresql.org/message-id/20161123083351.5vramz52nmdokhzz%40alap3.anarazel.de
> >> ? With that patch it should complete without that change, right?
> >
> > No, it's with the patch in the above link, which is
> > 0001-Resize-simplehash.h-table-in-case-of-long-runs.patch. But I've
> > kept SH_FILLFACTOR at 0.8 and SH_MAX_FILLFACTOR at 0.95. I'll do
> > another round of testing with the constants provided by you.
>
> I've tested with the patch
> 0001-Resize-simplehash.h-table-in-case-of-long-runs.patch and the
> results are the same, i.e., the hash table grows even when it is only
> 10-12% filled. I've attached the logfile for reference.

So, I observed similar things, where the *leader backend* reports a
lot of "long runs":

WARNING: clumping, growing this thing
LOG:  size: 1048576, members: 158071, filled: 0.150748, total chain: 95622, max chain: 35, avg chain: 0.604931, total_collisions: 35909, max_collisions: 5, avg_collisions: 0.227170
STATEMENT:  EXPLAIN ANALYZE select l_orderkey from lineitem group by l_orderkey;
WARNING: clumping, growing this thing
LOG:  size: 2097152, members: 238992, filled: 0.113960, total chain: 1850141, max chain: 422, avg chain: 7.741435, total_collisions: 55363, max_collisions: 5, avg_collisions: 0.231652
STATEMENT:  EXPLAIN ANALYZE select l_orderkey from lineitem group by l_orderkey;
WARNING: clumping, growing this thing
LOG:  size: 4194304, members: 881089, filled: 0.210068, total chain: 567907, max chain: 29, avg chain: 0.644551, total_collisions: 174681, max_collisions: 6, avg_collisions: 0.198256
STATEMENT:  EXPLAIN ANALYZE select l_orderkey from lineitem group by l_orderkey;
WARNING: clumping, growing this thing
LOG:  size: 8388608, members: 5397649, filled: 0.643450, total chain: 5752690, max chain: 22, avg chain: 1.065777, total_collisions: 1437957, max_collisions: 8, avg_collisions: 0.266404
STATEMENT:  EXPLAIN ANALYZE select l_orderkey from lineitem group by l_orderkey;

The pertinent part is that there are "forced" resizes due to long runs
at fill factors of 0.15, 0.11, 0.21, 0.64. Note that there's a good
number of "ordinary" resizes, and at the end there are 7500000 tuples
in the hashtable. I.e. there are a number of resizes after the last
one due to clumping, and all of those are entirely guided through
fillfactor; the final hash-table is entirely reasonably sized.

Trying to do the math, that just didn't sit well with me. There's just
no way there could be so many and such long runs without something
being broken. And indeed: thinking about this even more, I realized
the reason this happens is that b81b5a96f4 didn't actually address the
problem of inserting in hash-table order.
To understand, let's first look at the query plan:

 Finalize HashAggregate  (cost=899227.75..903176.55 rows=394880 width=4) (actual time=6255.957..7415.152 rows=7500000 loops=1)
   Group Key: l_orderkey
   ->  Gather  (cost=652427.75..893304.55 rows=2369280 width=4) (actual time=1741.706..3699.248 rows=7940881 loops=1)
         Workers Planned: 6
         Workers Launched: 6
         ->  Partial HashAggregate  (cost=651427.75..655376.55 rows=394880 width=4) (actual time=1731.468..1956.693 rows=11344
               Group Key: l_orderkey
               ->  Parallel Seq Scan on lineitem  (cost=0.00..638927.80 rows=4999980 width=4) (actual time=0.025..711.771 rows
 Planning time: 0.192 ms
 Execution time: 7700.027 ms

The important part is the estimated rows=394880 and actual
rows=7500000. That means that the hash-table on the leader backend
will first be created suitably for 394880 rows. But we obviously get a
lot more.

The problem then is that even after introducing the hash_iv stuff
(b81b5a96f), we essentially still iterate over the tuples in
hash-order. TupleHashTableHash() computes the hash for a single-column
group condition as:

	uint32		hashkey = hashtable->hash_iv;

	hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
	attr = slot_getattr(slot, att, &isNull);
	if (!isNull)			/* treat nulls as having hash key 0 */
	{
		uint32		hkey;

		hkey = DatumGetUInt32(FunctionCall1(&hashfunctions[i], attr));
		hashkey ^= hkey;
	}

The resulting hash values aren't actually meaningfully influenced by
the IV. Because we just xor with the IV, most hash values that would
have fallen into a single hash bucket without the IV still fall into a
single hash bucket afterwards as well; just somewhere else in the hash
range.

That leads to the issue that we insert hundreds of rows into the
leader's hashtable that all fall into the same hash bucket (as the
leader's hashtable is quite small at this point, the hash buckets on
the worker tables have *far* fewer entries per bucket / are far
smaller on the workers than on the leader). Which then leads to
super-long runs.

As soon as the table gets to its "real" size, the issue of the
overlong runs is gone, because now the hash-table has the appropriate
size for its contents. So that explains why there are a lot of resizes
when the fillfactor is low (as you've observed, in the 10-20% range) -
in specific hash-ranges the fillfactor is extremely high, but in other
parts the table is effectively empty.

The question remains what to do about that...

I think it's pretty clear that we need a "defense" against such
unbalanced hashtables - now that I understand how we can get into the
situation of these being so unbalanced, I'm a lot less uncomfortable
adding the necessary logic.

In addition to that, it seems quite worthwhile to provide an iterator
that's not vulnerable to this. An approach that I am, seemingly
successfully, testing is to iterate the hashtable in multiple (in my
case 23, because why not) passes, accessing only every nth element.
That allows the data to be inserted in a far less "dense" fashion. But
that's more an optimization, so I'll just push something like the
patch mentioned in the thread already.

Makes some sense?

- Andres
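A tiny standalone program (hypothetical values, not PostgreSQL code)
shows the core of the problem: if two hash values land in the same
bucket of a small table, XORing both with the same IV leaves them in
the same bucket as each other, because XOR perturbs each bit position
independently.

    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
        uint32_t    mask = (1 << 10) - 1;   /* 1024-bucket table */
        uint32_t    iv = 0xdeadbeefu;       /* arbitrary per-worker IV */
        uint32_t    h1 = 0x00013000u;       /* two hashes, equal low bits */
        uint32_t    h2 = 0x7ffff000u;

        /* same bucket without the IV... */
        printf("no IV:  %u %u\n", h1 & mask, h2 & mask);
        /* ...and still the same bucket as each other with it */
        printf("xor IV: %u %u\n", (h1 ^ iv) & mask, (h2 ^ iv) & mask);
        return 0;
    }

Any bucket-sized cluster in a worker's table therefore reappears
intact, as one cluster, in the leader's initially much smaller table.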
On 2017-03-01 11:05:33 +0530, Kuntal Ghosh wrote:
> On Wed, Mar 1, 2017 at 10:53 AM, Andres Freund <andres@anarazel.de> wrote:
> > On 2017-03-01 10:47:45 +0530, Kuntal Ghosh wrote:
> >> if (insertdist > curdist)
> >> {
> >>     swap the entry to be inserted with the current entry;
> >>     try to insert the current entry using the same logic;
> >> }
> >>
> >> So, the second approach will not cause all the followers to be shifted by 1.
> >
> > How not? You'll have to play that game until you find a free slot.

> You can skip increasing the curdist of some elements for which it is
> already pretty high. Suppose we have the following hash table and an
> element e:
>
> _,_,_,i,_,_,j,j+1,j+2,_,_
>
> We want to insert e at i, but there is a collision and we start
> probing. At j, the condition (insertdist > curdist) is satisfied and
> we swap e with the element at j. Now we try to insert e (= the old
> element at j), and so on. In this process, if the elements at j+1 and
> j+2 already have a high distance from their optimal locations, we can
> skip those elements and go ahead. But in the existing approach, we
> increase their distances further. Thoughts?

All the following elements already are at their "optimal" position
given the previous occupation of the hash-table. As we only start to
move once the insert-distance of the existing element is lower than
that of the to-be-inserted one, the following elements will all be
moved. Doing the swap on an element-by-element basis would be
possible, but it just achieves the same result at a higher cost.

Regards,

Andres
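For readers following along, here is a condensed sketch of the logic
being discussed - modeled on simplehash.h's SH_INSERT but heavily
simplified (no growth, no deletion, and entries carry only their
hash):

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct Entry
    {
        uint32_t    hash;
        bool        used;
    } Entry;

    static void
    rh_insert(Entry *tab, uint32_t mask, uint32_t hash)
    {
        uint32_t    cur = hash & mask;  /* optimal bucket */
        uint32_t    insertdist = 0;     /* distance from optimal bucket */

        for (;;)
        {
            Entry      *e = &tab[cur];
            uint32_t    curdist;

            if (!e->used)
            {
                e->hash = hash;
                e->used = true;
                return;
            }

            /* resident entry's distance from its own optimal bucket */
            curdist = (cur - (e->hash & mask)) & mask;

            if (insertdist > curdist)
            {
                /*
                 * The incoming entry is "poorer" than the resident
                 * one: shift the resident entry and all its followers
                 * forward by one slot and take this position.
                 * Element-by-element swapping would move exactly the
                 * same entries, just with more copying.
                 */
                uint32_t    end = cur;

                while (tab[end].used)
                    end = (end + 1) & mask;
                while (end != cur)
                {
                    uint32_t    prev = (end - 1) & mask;

                    tab[end] = tab[prev];
                    end = prev;
                }
                e->hash = hash;
                e->used = true;
                return;
            }
            cur = (cur + 1) & mask;
            insertdist++;
        }
    }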
On Fri, Mar 3, 2017 at 1:22 AM, Andres Freund <andres@anarazel.de> wrote:
> the resulting hash values aren't actually meaningfully influenced by
> the IV. Because we just xor with the IV, most hash values that would
> have fallen into a single hash bucket without the IV still fall into
> a single hash bucket afterwards as well; just somewhere else in the
> hash range.

Wow, OK. I had kind of assumed (without looking) that setting the
hash IV did something a little more useful than that. Maybe we should
do something like:

    struct blah { int iv; int hv; };
    newhv = hash_any(&blah, sizeof(blah));

> In addition to that, it seems quite worthwhile to provide an iterator
> that's not vulnerable to this. An approach that I am, seemingly
> successfully, testing is to iterate the hashtable in multiple (in my
> case 23, because why not) passes, accessing only every nth element.
> That allows the data to be inserted in a far less "dense" fashion.
> But that's more an optimization, so I'll just push something like the
> patch mentioned in the thread already.
>
> Makes some sense?

Yep.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Mar 3, 2017 at 8:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 3, 2017 at 1:22 AM, Andres Freund <andres@anarazel.de> wrote:
>> the resulting hash values aren't actually meaningfully influenced by
>> the IV. Because we just xor with the IV, most hash values that would
>> have fallen into a single hash bucket without the IV still fall into
>> a single hash bucket afterwards as well; just somewhere else in the
>> hash range.
>
> Wow, OK. I had kind of assumed (without looking) that setting the
> hash IV did something a little more useful than that. Maybe we should
> do something like:
>
>     struct blah { int iv; int hv; };
>     newhv = hash_any(&blah, sizeof(blah));

Sounds good. I've also seen a post from Thomas Munro suggesting an
alternative approach for combining hash values in execGrouping.c [1].

>> In addition to that, it seems quite worthwhile to provide an iterator
>> that's not vulnerable to this. An approach that I am, seemingly
>> successfully, testing is to iterate the hashtable in multiple (in my
>> case 23, because why not) passes, accessing only every nth element.
>> That allows the data to be inserted in a far less "dense" fashion.
>> But that's more an optimization, so I'll just push something like the
>> patch mentioned in the thread already.
>>
>> Makes some sense?
>
> Yep.

Yes, it makes sense. Quadratic probing is another approach, but it
would require an extra shift op every time we want to find the next or
previous location during a collision.

[1] https://www.postgresql.org/message-id/CAEepm%3D3rdgjfxW4cKvJ0OEmya2-34B0qHNG1xV0vK7TGPJGMUQ@mail.gmail.com

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
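As an aside, the triangular-number variant of quadratic probing that
suits power-of-two tables is only a couple of extra ops per probe - a
sketch, with a hypothetical occupancy bitmap rather than anything from
simplehash.h:

    #include <stdint.h>
    #include <stdbool.h>

    /*
     * Probe offsets 0, 1, 3, 6, 10, ... (triangular numbers); with a
     * power-of-two table size this sequence visits every slot exactly
     * once before repeating, so the search terminates if any slot is
     * free.
     */
    static uint32_t
    quadratic_probe(const bool *used, uint32_t mask, uint32_t hash)
    {
        uint32_t    bucket = hash & mask;
        uint32_t    step = 0;

        while (used[bucket])
        {
            step++;
            bucket = (bucket + step) & mask;
        }
        return bucket;
    }

The real cost, as Andres notes below, isn't the arithmetic: successive
probes jump farther and farther apart instead of walking one cache
line, so collisions turn into unpredictable memory accesses.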
Hi,

attached is a patch to address this problem, and the one reported by
Dilip. I ran a lot of TPC-H and other benchmarks, and so far this
addresses all the performance issues, often being noticeably faster
than with the dynahash code.

Comments?

On 2017-03-03 11:23:00 +0530, Kuntal Ghosh wrote:
> On Fri, Mar 3, 2017 at 8:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Fri, Mar 3, 2017 at 1:22 AM, Andres Freund <andres@anarazel.de> wrote:
> >> the resulting hash values aren't actually meaningfully influenced by
> >> the IV. Because we just xor with the IV, most hash values that would
> >> have fallen into a single hash bucket without the IV still fall into
> >> a single hash bucket afterwards as well; just somewhere else in the
> >> hash range.
> >
> > Wow, OK. I had kind of assumed (without looking) that setting the
> > hash IV did something a little more useful than that. Maybe we should
> > do something like:
> >
> >     struct blah { int iv; int hv; };
> >     newhv = hash_any(&blah, sizeof(blah));

The hash invocations are already noticeable performance-wise, so I'm a
bit hesitant to go there. I'd rather introduce a decent 'hash_combine'
function or such.

> Sounds good. I've also seen a post from Thomas Munro suggesting an
> alternative approach for combining hash values in execGrouping.c [1].

Yea, I played with that too - it's a bit better, but doesn't really
address the problem fully either. It seems to generally reduce
collisions in multi-column keys a bit, but not that much.

> >> In addition to that, it seems quite worthwhile to provide an iterator
> >> that's not vulnerable to this. An approach that I am, seemingly
> >> successfully, testing is to iterate the hashtable in multiple (in my
> >> case 23, because why not) passes, accessing only every nth element.
> >> That allows the data to be inserted in a far less "dense" fashion.
> >> But that's more an optimization, so I'll just push something like the
> >> patch mentioned in the thread already.
> >>
> >> Makes some sense?
> >
> > Yep.
>
> Yes, it makes sense. Quadratic probing is another approach, but it
> would require an extra shift op every time we want to find the next or
> previous location during a collision.

That's not really the big problem with quadratic probing. The issue is
that it leads to further random, unpredictable memory accesses in case
of collisions, whereas linear probing has easy-to-predict accesses.

Greetings,

Andres Freund
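For concreteness, a "decent hash_combine" could follow the well-known
Boost recipe, possibly paired with a full-avalanche finalizer such as
Murmur3's - both shown here as illustrative sketches, not as anything
from the attached patch:

    #include <stdint.h>

    /*
     * Boost-style combiner: cheap (no multiplies), but each
     * application only folds a few of the seed's high bits back into
     * its low bits.
     */
    static inline uint32_t
    hash_combine(uint32_t seed, uint32_t extra)
    {
        seed ^= extra + 0x9e3779b9u + (seed << 6) + (seed >> 2);
        return seed;
    }

    /*
     * Murmur3 finalizer: full avalanche - every output bit depends on
     * every input bit - at the cost of two multiplies.
     */
    static inline uint32_t
    mix32(uint32_t h)
    {
        h ^= h >> 16;
        h *= 0x85ebca6bu;
        h ^= h >> 13;
        h *= 0xc2b2ae35u;
        h ^= h >> 16;
        return h;
    }

With something like hashkey = mix32(hashkey ^ hash_iv) (hypothetical
usage), a set of values co-bucketed in a worker's small table would be
scattered across the leader's table instead of arriving as one run,
which a plain XOR of the IV cannot achieve.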
On Sat, Mar 4, 2017 at 5:56 AM, Andres Freund <andres@anarazel.de> wrote:
> attached is a patch to address this problem, and the one reported by
> Dilip. I ran a lot of TPC-H and other benchmarks, and so far this
> addresses all the performance issues, often being noticeably faster
> than with the dynahash code.
>
> Comments?

I'm still not convinced that raising the fillfactor like this is going
to hold up in testing, but I don't mind you committing it and we'll
see what happens. If it is possible to survive with a fillfactor that
high, it certainly has some advantages, and it's obviously more likely
to work out with the other logic you've added. I think we need a lot
more people playing with this to know whether any given approach is
right, and I think getting something committed will help more than any
amount of theoretical discussion.

I think DEBUG1 is far too high for something that could occur with
some frequency on a busy system; I'm fairly strongly of the opinion
that you ought to downgrade that by a couple of levels, say to DEBUG3
or so.

> The hash invocations are already noticeable performance-wise, so I'm a
> bit hesitant to go there. I'd rather introduce a decent 'hash_combine'
> function or such.

Yes, that might be better. I wasn't too sure the approach I proposed
would actually do a sufficiently good job of mixing in the bits from
the IV anyway. It's important to keep in mind that the values we're
using as IVs aren't necessarily going to be uniformly distributed in
any meaningful way. They're just PIDs, so you might only have 1-3 bits
of difference between one value and another within the same parallel
query. If you don't do something fairly aggressive to make that change
perturb the final hash value, it probably won't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Mar 4, 2017 at 11:09 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Mar 4, 2017 at 5:56 AM, Andres Freund <andres@anarazel.de> wrote:
>> attached is a patch to address this problem, and the one reported by
>> Dilip. I ran a lot of TPC-H and other benchmarks, and so far this
>> addresses all the performance issues, often being noticeably faster
>> than with the dynahash code.
>>
>> Comments?

+				 errdetail("size %f/members %f: factor %.2f",
+						   (double) tb->size, (double) tb->members,
+						   (double) tb->members / tb->size),

In SH_STAT, we use 'filled' instead of 'factor'. Also, for displaying
size and members, there is no need to convert those into double.

> I'm still not convinced that raising the fillfactor like this is going
> to hold up in testing, but I don't mind you committing it and we'll
> see what happens. If it is possible to survive with a fillfactor that
> high, it certainly has some advantages, and it's obviously more likely
> to work out with the other logic you've added. I think we need a lot
> more people playing with this to know whether any given approach is
> right, and I think getting something committed will help more than any
> amount of theoretical discussion.

+1

> I think DEBUG1 is far too high for something that could occur with
> some frequency on a busy system; I'm fairly strongly of the opinion
> that you ought to downgrade that by a couple of levels, say to DEBUG3
> or so.

+1

I've tested with TPC-H query 18 and it's working fine.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On 2017-03-04 11:09:40 +0530, Robert Haas wrote:
> On Sat, Mar 4, 2017 at 5:56 AM, Andres Freund <andres@anarazel.de> wrote:
> > attached is a patch to address this problem, and the one reported by
> > Dilip. I ran a lot of TPC-H and other benchmarks, and so far this
> > addresses all the performance issues, often being noticeably faster
> > than with the dynahash code.
> >
> > Comments?
>
> I'm still not convinced that raising the fillfactor like this is going
> to hold up in testing, but I don't mind you committing it and we'll
> see what happens.

I didn't see anything in testing, but I agree that it's debatable. But
I'd rather commit it now, when we all know it's new code. Raising it
in a new release will be a lot harder.

> I think DEBUG1 is far too high for something that could occur with
> some frequency on a busy system; I'm fairly strongly of the opinion
> that you ought to downgrade that by a couple of levels, say to DEBUG3
> or so.

I actually planned to remove it entirely before committing. It was
more left in for testers to be able to see when the code triggers.

> > The hash invocations are already noticeable performance-wise, so I'm a
> > bit hesitant to go there. I'd rather introduce a decent 'hash_combine'
> > function or such.
>
> Yes, that might be better. I wasn't too sure the approach I proposed
> would actually do a sufficiently good job of mixing in the bits from
> the IV anyway. It's important to keep in mind that the values we're
> using as IVs aren't necessarily going to be uniformly distributed in
> any meaningful way. They're just PIDs, so you might only have 1-3 bits
> of difference between one value and another within the same parallel
> query. If you don't do something fairly aggressive to make that change
> perturb the final hash value, it probably won't.

FWIW, I played with some better mixing, and it helps a bit with
accurately sized hashtables and multiple columns.

What's however more interesting is that a better-mixed IV and/or
better iteration now *slightly* *hurts* performance with grossly
misestimated sizes, because resizing has to copy more rows... Not what
I predicted.

Greetings,

Andres Freund
On Mon, Mar 6, 2017 at 3:32 PM, Andres Freund <andres@anarazel.de> wrote:
>> I think DEBUG1 is far too high for something that could occur with
>> some frequency on a busy system; I'm fairly strongly of the opinion
>> that you ought to downgrade that by a couple of levels, say to DEBUG3
>> or so.
>
> I actually planned to remove it entirely before committing. It was
> more left in for testers to be able to see when the code triggers.

Oh, OK. That'd be fine too.

> FWIW, I played with some better mixing, and it helps a bit with
> accurately sized hashtables and multiple columns.
>
> What's however more interesting is that a better-mixed IV and/or
> better iteration now *slightly* *hurts* performance with grossly
> misestimated sizes, because resizing has to copy more rows... Not
> what I predicted.

I don't quite follow this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2017-03-06 16:58:23 -0500, Robert Haas wrote:
> On Mon, Mar 6, 2017 at 3:32 PM, Andres Freund <andres@anarazel.de> wrote:
> >> I think DEBUG1 is far too high for something that could occur with
> >> some frequency on a busy system; I'm fairly strongly of the opinion
> >> that you ought to downgrade that by a couple of levels, say to DEBUG3
> >> or so.
> >
> > I actually planned to remove it entirely before committing. It was
> > more left in for testers to be able to see when the code triggers.
>
> Oh, OK. That'd be fine too.

And pushed that way.

> > FWIW, I played with some better mixing, and it helps a bit with
> > accurately sized hashtables and multiple columns.
> >
> > What's however more interesting is that a better-mixed IV and/or
> > better iteration now *slightly* *hurts* performance with grossly
> > misestimated sizes, because resizing has to copy more rows... Not
> > what I predicted.
>
> I don't quite follow this.

The whole performance issue triggering this thread only exists when
the hashtable sizes are mis-estimated, right? It turns out that after
applying the just-committed changes, "fixing" the bad mixing and/or
doing iteration that's not entirely in hash order slightly degrades
performance. The difference is that without either of those additional
changes, we resize to the "right" size very early, when the hashtable
is barely filled (i.e. only a few entries need to be moved), because
the imbalance is observed at the start. With the changes, however, the
resizing happens when the table is pretty full (i.e. a lot of entries
need to be moved). So the early imbalance ends up actually not hurting
performance...

- Andres
On Mon, Mar 6, 2017 at 5:19 PM, Andres Freund <andres@anarazel.de> wrote:
> The whole performance issue triggering this thread only exists when
> the hashtable sizes are mis-estimated, right? It turns out that after
> applying the just-committed changes, "fixing" the bad mixing and/or
> doing iteration that's not entirely in hash order slightly degrades
> performance. The difference is that without either of those additional
> changes, we resize to the "right" size very early, when the hashtable
> is barely filled (i.e. only a few entries need to be moved), because
> the imbalance is observed at the start. With the changes, however, the
> resizing happens when the table is pretty full (i.e. a lot of entries
> need to be moved). So the early imbalance ends up actually not hurting
> performance...

Hmm. I don't know what to do about that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2017-03-06 18:59:02 -0500, Robert Haas wrote:
> On Mon, Mar 6, 2017 at 5:19 PM, Andres Freund <andres@anarazel.de> wrote:
> > The whole performance issue triggering this thread only exists when
> > the hashtable sizes are mis-estimated, right? It turns out that after
> > applying the just-committed changes, "fixing" the bad mixing and/or
> > doing iteration that's not entirely in hash order slightly degrades
> > performance. The difference is that without either of those additional
> > changes, we resize to the "right" size very early, when the hashtable
> > is barely filled (i.e. only a few entries need to be moved), because
> > the imbalance is observed at the start. With the changes, however, the
> > resizing happens when the table is pretty full (i.e. a lot of entries
> > need to be moved). So the early imbalance ends up actually not hurting
> > performance...
>
> Hmm. I don't know what to do about that.

Oh, I don't think we need to do much about it either way - I think
it's more an interesting curiosity than anything. It seems to reduce
the urgency for better iteration a bit, that's it.

- Andres