Re: Memory Accounting v11 - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Memory Accounting v11
Date
Msg-id 557E0560.7020306@2ndquadrant.com
In response to Re: Memory Accounting v11  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Memory Accounting v11  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
Hi,

On 06/14/15 22:21, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
>> This patch tracks memory usage (at the block level) for all memory
>> contexts. Individual palloc()s aren't tracked; only the new blocks
>> allocated to the memory context with malloc().
>> ...
>> My general answer to the performance concerns is that they aren't a
>> reason to block this patch, unless someone has a suggestion about how to
>> fix them. Adding one field to a struct and a few arithmetic operations
>> for each malloc() or realloc() seems reasonable to me.
>
> Maybe, but this here is a pretty weak argument:
>
>> The current state, where HashAgg just blows up the memory, is just not
>> reasonable, and we need to track the memory to fix that problem.
>
> Meh. HashAgg could track its memory usage without loading the entire
> system with a penalty.

+1 to a solution like that, although I don't think that's doable without 
digging the info out of the memory contexts somehow.

> Moreover, this is about fourth or fifth down the
> list of the implementation problems with spilling hash aggregation to
> disk.  It would be good to see credible solutions for the bigger issues
> before we buy into adding overhead for a mechanism with no uses in core.

I don't think so. If we don't have an acceptable solution for tracking 
memory in hashagg, there's not really much point in doing any of the 
following steps. That's why Jeff is tackling this problem first.

Also, Jeff already posted a PoC of the memory-bounded hashagg, although 
it only worked for aggregates with a fixed-size state and didn't handle 
cases like array_agg, where the state grows, particularly well. The 
improvements to aggregate functions proposed by David Rowley last week 
might help in those cases, as they'd allow spilling those states to disk.

So while the approach proposed by Jeff may turn out not to be the right 
one, I think memory accounting needs to be solved first (even if it's 
not committed until the whole feature is ready).

>> Others have also mentioned that we might want to use this mechanism
>> to track memory for other operators, like Sort or HashJoin, which
>> might be simpler and more accurate.
>
> That would imply redefining the meaning of work_mem for those
> operations; it would suddenly include a lot more overhead space than
> it used to, causing them to spill to disk much more quickly than
> before, unless one changes the work_mem setting to compensate. Not
> sure that people would like such a change.

Don't we tweak the meaning of work_mem over time anyway? Maybe not to 
such an extent, but the hash join improvements in 9.5, for example, 
certainly change it by packing the tuples more densely, sizing the 
buckets differently etc. It's true those changes go in the other 
direction (i.e. batching less frequently), though.

OTOH this would make the accounting more accurate, e.g. by eliminating 
differences between cases with different amounts of overhead due to 
different tuple widths. Wouldn't that be a good thing, even if people 
need to tweak work_mem? Isn't that kinda acceptable when upgrading to 
the next major version?

> An even bigger problem is that it would pretty much break the logic
> around LACKMEM() in tuplesort.c, which supposes that spilling
> individual tuples to disk is a reliable and linear way to decrease
> the accounted-for memory. Individual pfree's would typically not
> decrease the accounted total in this implementation. Depending on
> what you have in mind for the spill-to-disk behavior, this issue
> could be a fail for HashAgg as well, which is another reason not to
> commit this patch in advance of seeing the use-case.

That's true, but I think the plan was always to wait for the whole 
feature, and only then commit all the pieces.


--
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


