
From: Tomas Vondra
Subject: Re: DBT-3 with SF=20 got failed
Date:
Msg-id: 55D634C5.8030006@2ndquadrant.com
In response to: Re: DBT-3 with SF=20 got failed (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: DBT-3 with SF=20 got failed (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List: pgsql-hackers
Hi,

On 08/19/2015 03:53 PM, Tom Lane wrote:
>
> I don't see anything very wrong with constraining the initial
> allocation to 1GB, or even less. That will prevent consuming insane
> amounts of work_mem when the planner's rows estimate is too high
> rather than too low. And we do have the ability to increase the hash
> table size on the fly.

Perhaps. Limiting the initial allocation to 1GB might help with reducing 
the memory consumed in case of over-estimation, and it's high enough that 
regular queries won't run into it.

My initial thought was "if the memory consumption is a problem, lower 
work_mem", but after thinking about it for a while I don't see a reason 
not to limit the memory consumption this way, if possible.

But what is the impact on queries that actually need more than 1GB of 
buckets? I assume we'd only limit the initial allocation and still allow 
resizing based on the actual data (i.e. the 9.5 improvement), so such 
queries would start with 1GB of buckets and then resize to the optimal 
size once the actual number of tuples is known. The resize is not very 
expensive, but it's not free either, and with that many tuples (more than 
1GB of buckets means ~130M of them) it's probably just noise in the total 
query runtime. But it'd be nice to see some proof of that ...

Also, why 1GB and not 512MB? That is effectively the current limit on 
9.4, because we can't allocate a full 1GB there, so we end up with half 
of that. Or why not some small percentage of work_mem, e.g. 5%?
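(And the 512MB figure, for reference - it follows from nbuckets being 
rounded to a power of two; a quick check, assuming 8-byte bucket pointers 
and MaxAllocSize = 1GB - 1 as in memutils.h:)

#include <stdio.h>

int main(void)
{
    const unsigned long long max_alloc = 0x3fffffffULL;  /* MaxAllocSize */
    const unsigned long long b27 = (1ULL << 27) * 8;     /* 2^27 buckets */
    const unsigned long long b26 = (1ULL << 26) * 8;     /* 2^26 buckets */

    /* 2^27 buckets = 1024 MB, which exceeds MaxAllocSize;
     * 2^26 buckets = 512 MB is the largest power of two that fits */
    printf("2^27 buckets = %llu MB, fits: %s\n", b27 >> 20,
           b27 <= max_alloc ? "yes" : "no");
    printf("2^26 buckets = %llu MB, fits: %s\n", b26 >> 20,
           b26 <= max_alloc ? "yes" : "no");
    return 0;
}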

Most importantly, this is largely orthogonal to the original bug report. 
Even if we do this, we still have to fix the palloc failure after the 
resize.

So I'd say let's do a minimal bugfix of the actual issue, and then 
perhaps improve the behavior in case of significant overestimates. The 
bugfix should happen ASAP so that it gets into 9.5.0 (and backported). 
We already have patches for that.

Limiting the initial allocation deserves more thorough discussion and 
testing, and I'd argue it's 9.6 material at this point.

> The real problem here is the potential integer overflow in
> ExecChooseHashTableSize. Now that we know there is one, that should
> be fixed (and not only in HEAD/9.5).

I don't think there's any integer overflow. The problem is that we end 
up with nbuckets so high that the bucket array (nbuckets * 8 bytes) 
exceeds the 1GB-1 palloc limit (MaxAllocSize).

There has been protection against integer overflow in place for a long 
time, but there never was any protection against the 1GB allocation 
limit - with the much smaller work_mem values and NTUP_PER_BUCKET=10 we 
had before, we simply could not reach it.
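To make the 1GB issue more concrete, here's a rough standalone sketch of 
one possible way to clamp the bucket count so the array never exceeds 
MaxAllocSize, regardless of work_mem or the row estimate. This is just an 
illustration, not the actual patch from this thread; MaxAllocSize, 
NTUP_PER_BUCKET, Min and my_log2 are redefined locally so it compiles on 
its own.

#include <limits.h>
#include <math.h>
#include <stdio.h>

#define MaxAllocSize    ((long long) 0x3fffffff)  /* 1GB - 1 */
#define NTUP_PER_BUCKET 1                         /* the 9.5 default */
#define Min(a, b)       ((a) < (b) ? (a) : (b))

typedef void *HashJoinTuple;    /* stand-in; only sizeof() matters here */

/* smallest i such that 2^i >= num, like PostgreSQL's my_log2() */
static int
my_log2(long long num)
{
    int i = 0;

    while ((1LL << i) < num)
        i++;
    return i;
}

/* choose nbuckets for the estimated tuple count, never exceeding MaxAllocSize */
static int
choose_nbuckets(double ntuples, long long work_mem_kb)
{
    long long max_pointers = (work_mem_kb * 1024LL) / (long long) sizeof(HashJoinTuple);
    long long mppow2;
    double dbuckets;
    int nbuckets;

    /* never let the bucket array exceed the palloc limit ... */
    max_pointers = Min(max_pointers, MaxAllocSize / (long long) sizeof(HashJoinTuple));

    /* ... keep the usual integer-overflow protection as well ... */
    max_pointers = Min(max_pointers, (long long) INT_MAX / 2);

    /* ... and round the cap down to a power of 2, so the power-of-2
     * rounding of nbuckets below cannot push us back over the limit */
    mppow2 = 1LL << my_log2(max_pointers);
    if (max_pointers != mppow2)
        max_pointers = mppow2 / 2;

    dbuckets = ceil(ntuples / NTUP_PER_BUCKET);
    dbuckets = Min(dbuckets, (double) max_pointers);
    nbuckets = (int) dbuckets;

    /* the hash join code expects nbuckets to be a power of 2 */
    nbuckets = 1 << my_log2(nbuckets);
    return nbuckets;
}

int
main(void)
{
    /* a badly over-estimated inner relation: 400M rows, work_mem = 64GB */
    int nbuckets = choose_nbuckets(4e8, 64LL * 1024 * 1024);

    /* prints "nbuckets = 67108864 (512 MB of buckets)" */
    printf("nbuckets = %d (%lld MB of buckets)\n",
           nbuckets, (long long) nbuckets * (long long) sizeof(HashJoinTuple) >> 20);
    return 0;
}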

> But I believe Kaigai-san withdrew his initial proposed patch, and we
> don't have a replacement as far as I saw.

I believe the patch proposed by KaiGai-san is the right one to fix the 
bug discussed in this thread. My understanding is KaiGai-san withdrew 
the patch as he wants to extend it to address the over-estimation issue.

I don't think we should do that - IMHO that's an unrelated improvement 
and should be addressed in a separate patch.


kind regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


