Re: DBT-3 with SF=20 got failed - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: DBT-3 with SF=20 got failed
Msg-id 56042B18.9000008@2ndquadrant.com
In response to Re: DBT-3 with SF=20 got failed  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: DBT-3 with SF=20 got failed  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

On 09/24/2015 05:18 PM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Of course, if we can postpone sizing the hash table until after the
>> input size is known, as you suggest, then that would be better still
>> (but not back-patch material).
>
> AFAICS, it works that way today as long as the hash fits in memory
> (ie, single-batch).  We load into a possibly seriously undersized hash
> table, but that won't matter for performance until we start probing it.
> At the conclusion of loading, MultiExecHash will call
> ExecHashIncreaseNumBuckets which will re-hash into a better-sized hash
> table.  I doubt this can be improved on much.
>
> It would be good if we could adjust the numbuckets choice at the
> conclusion of the input phase for the multi-batch case as well.
> The code appears to believe that wouldn't work, but I'm not sure if
> it's right about that, or how hard it'd be to fix if so.

So you suggest using a small hash table even when we expect batching?

That would be rather difficult to do, because of the way we derive bucket 
and batch numbers from the hash value - the bucket bits and the batch bits 
must not overlap. The current code simply assumes that once we start 
batching, the number of bits needed for buckets does not change anymore.
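
FWIW the split currently works about like this (a simplified sketch 
modeled on ExecHashGetBucketAndBatch in nodeHash.c - the function and 
variable names here are just illustrative, not the exact source):

#include <stdint.h>

/*
 * The low log2_nbuckets bits of the hash value select the bucket, and the
 * bits immediately above them select the batch.  Because the two bit
 * ranges must not overlap, the number of bucket bits is effectively frozen
 * once tuples have been written out to batch files.
 */
static void
get_bucket_and_batch(uint32_t hashvalue, int nbuckets, int log2_nbuckets,
                     int nbatch, int *bucketno, int *batchno)
{
    *bucketno = hashvalue & (nbuckets - 1);      /* low bits -> bucket */

    if (nbatch > 1)
        *batchno = (hashvalue >> log2_nbuckets) & (nbatch - 1); /* next bits -> batch */
    else
        *batchno = 0;
}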

It's possible to rework this, of course - the initial version of the patch 
actually did just that (although it was broken in other ways).

But I think the real problem here is the batching itself - if we 
overestimate and start batching (while we could actually run with a 
single batch), we've already lost.

But what about computing the expected number of batches, yet always 
starting execution assuming no batching? Only if we actually fill 
work_mem would we start batching, using the expected number of batches.

I.e.

1) estimate nbatches, but use nbatches=1

2) run until exhausting work_mem

3) start batching, with the initially estimated number of batches (roughly as sketched below)
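
Something along these lines (all names below are hypothetical stand-ins 
for illustration, not the actual executor API):

#include <stddef.h>

typedef struct HashState HashState;   /* stand-in for the hash node state */
typedef struct Tuple Tuple;

/* stubs standing in for existing executor machinery */
extern int    estimate_nbatch(HashState *hs);        /* planner-style estimate */
extern Tuple *next_inner_tuple(HashState *hs);
extern void   insert_tuple(HashState *hs, Tuple *tup);
extern size_t space_used(HashState *hs);
extern void   switch_to_batching(HashState *hs, int nbatch); /* repartition loaded tuples */

static void
load_inner_side(HashState *hs, size_t work_mem_bytes)
{
    int    estimated_nbatch = estimate_nbatch(hs);   /* 1) estimate nbatches ... */
    int    nbatch = 1;                               /*    ... but start with a single batch */
    Tuple *tup;

    while ((tup = next_inner_tuple(hs)) != NULL)
    {
        insert_tuple(hs, tup);

        /* 2) run until we actually exhaust work_mem */
        if (nbatch == 1 && space_used(hs) > work_mem_bytes)
        {
            /* 3) only now start batching, with the initially estimated nbatch */
            nbatch = estimated_nbatch;
            switch_to_batching(hs, nbatch);
        }
    }
}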


regards


--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


