Re: [HACKERS] Parallel Hash take II

From: Peter Geoghegan
Subject: Re: [HACKERS] Parallel Hash take II
Msg-id: CAH2-WznquJ56iXP-XU=vMG7mXNqbO8=j02RHE5xav2A6gS_OpA@mail.gmail.com
In response to: Re: [HACKERS] Parallel Hash take II (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Wed, Nov 15, 2017 at 10:35 AM, Andres Freund <andres@anarazel.de> wrote:
>> I realize you're sort of joking here, but I think it's necessary to
>> care about fairness between pieces of code.
>
> Indeed I kinda was.

When I posted v1 of parallel CREATE INDEX, it followed the hash join
model of giving workMem (maintenance_work_mem) to every worker. Robert
suggested that my comparison with the serial case was therefore not
representative, since I was using much more memory. Afterwards, I
changed the patch to split a single maintenance_work_mem budget across
the entire operation, which seemed to work better. In the end it made
very little difference to performance on my original benchmark, so v1
was arguably just wasting memory.
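
To make the difference concrete, here is a toy standalone sketch of
the two budgeting schemes (the numbers and variable names are just
for illustration; this is not the actual patch code):

    /* Toy illustration of the two budgeting schemes; not the actual
     * patch code. maintenance_work_mem is in kB, per PostgreSQL
     * convention (65536 kB = the 64MB default). */
    #include <stdio.h>

    int main(void)
    {
        int maintenance_work_mem = 65536;   /* kB */
        int nworkers = 3;
        int nparticipants = nworkers + 1;   /* workers plus the leader */

        /* v1: every participant received a full budget */
        int v1_each = maintenance_work_mem;

        /* later revision: one budget split across all participants */
        int v2_each = maintenance_work_mem / nparticipants;

        printf("v1: %d kB per participant, %d kB total\n",
               v1_each, v1_each * nparticipants);
        printf("v2: %d kB per participant, %d kB total\n",
               v2_each, v2_each * nparticipants);
        return 0;
    }

With four participants, v1 consumed 4x maintenance_work_mem in total,
which is what made the comparison with the serial case unrepresentative.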

>> I mean, the very first version of this patch that Thomas submitted was
>> benchmarked by Rafia and had phenomenally good performance
>> characteristics.  That turned out to be because it wasn't respecting
>> work_mem; you can often do a lot better with more memory, and
>> generally you can't do nearly as well with less.  To make comparisons
>> meaningful, they have to be comparisons between algorithms that use
>> the same amount of memory.  And it's not just about testing.  If we
>> add an algorithm that will run twice as fast with equal memory but
>> only allow it half as much, it will probably never get picked and the
>> whole patch is a waste of time.

The contrast with the situation Thomas faces with his hash join patch
is interesting. Hash join is *much* more sensitive to the availability
of memory than a sort operation is: a sort that runs short of memory
just performs extra merge passes, while a hash join that runs short
must split its inner side into multiple batches, which costs far more.

> I don't really have a good answer to "but what should we otherwise do",
> but I'm doubtful this is quite the right answer.

I think that the work_mem model should be replaced by something that
budgets memory centrally. For example, when memory is in short supply
it would make sense to be less generous with sorts and more generous
with hash joins, and a central model makes that possible. The work_mem
model has always forced users to be far too conservative: the limit
applies to each sort or hash operation individually, so a complex
query can consume many multiples of it, and users must size it for the
worst case. Workloads are very complicated, and always having users
target the worst case leaves a lot to be desired.
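
To sketch what I mean (purely hypothetical; none of these names exist
in PostgreSQL today), a central arbiter might hand out grants from a
global pool and weight consumers by how gracefully they degrade:

    /* Purely hypothetical sketch of central memory budgeting; these
     * names do not exist in PostgreSQL. The arbiter tracks one global
     * pool and, under pressure, shortchanges sorts before hash joins,
     * since an underfed sort degrades far more gracefully. */
    #include <stddef.h>

    typedef enum { CONSUMER_SORT, CONSUMER_HASH_JOIN } ConsumerKind;

    typedef struct MemoryBudget {
        size_t total_kb;     /* system-wide budget */
        size_t reserved_kb;  /* currently granted */
    } MemoryBudget;

    static size_t
    budget_grant(MemoryBudget *b, ConsumerKind kind, size_t request_kb)
    {
        size_t available = b->total_kb - b->reserved_kb;
        size_t grant = request_kb < available ? request_kb : available;

        /* When less than a quarter of the pool remains, halve what
         * sorts are given; hash joins keep their full grant because
         * falling back to multiple batches costs them much more. */
        if (available < b->total_kb / 4 && kind == CONSUMER_SORT)
            grant /= 2;

        b->reserved_kb += grant;
        return grant;
    }

The point isn't this particular policy; it's that only a central
arbiter can make that kind of trade-off at all, whereas work_mem fixes
each consumer's allowance in advance.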

-- 
Peter Geoghegan

