Thread: memory context for tuplesort return values

memory context for tuplesort return values

From
Tom Lane
Date:
I've been modifying tuplesort.c to keep all the sort's local data in a
separate memory context, to simplify and speed up cleaning up the data
during tuplesort_end.  I thought this would be a straightforward change,
but was disillusioned when I got a core dump while testing :-(.  On
investigation the problem is that nodeSort.c keeps the tuple pointer
returned by tuplesort.c in a TupleTableSlot, and the order of operations
during plan shutdown is such that the slot tries to pfree the tuple
only after tuplesort_end has been called.  Of course this means it's
pfree'ing already-released memory.

I don't want to give up the idea of keeping sort-local data in a private
context --- it just seems cleaner, as well as faster, than letting it be
mixed into the caller's stuff.  I can see two alternatives:

1. Document an ordering constraint that the result of tuplesort_gettuple
must be pfree'd, if at all, before tuplesort_end.  AFAICT this would
require only one simple change in the existing backend code (add an
ExecClearTuple at a suitable spot in nodeSort.c).  However there is the
potential to break add-on modules, if anyone is using tuplesort directly.

2. Arrange that if the result of tuplesort_gettuple is supposed to be
pfree'd by the caller, it is stored in the caller's context not the
private context.  Unfortunately, this looks like it would be pretty
messy inside tuplesort.c and would force an extra palloc/copy per tuple.
(The hard case is where we are doing the final merge on-the-fly.)

I'm inclined to go with #1 but wanted to see if this would really offend
anyone's sensibilities.
        regards, tom lane


Re: memory context for tuplesort return values

From
Alvaro Herrera
Date:
Tom Lane wrote:
> I've been modifying tuplesort.c to keep all the sort's local data in a
> separate memory context, to simplify and speed up cleaning up the data
> during tuplesort_end.  I thought this would be a straightforward change,
> but was disillusioned when I got a core dump while testing :-(.  On
> investigation the problem is that nodeSort.c keeps the tuple pointer
> returned by tuplesort.c in a TupleTableSlot, and the order of operations
> during plan shutdown is such that the slot tries to pfree the tuple
> only after tuplesort_end has been called.  Of course this means it's
> pfree'ing already-released memory.

Is it possible to make the TupleTableSlot not free the tuple
automatically?  AFAIR there is a parameter to the creation routine of a
TupleTableSlot to make it so, but I don't know if it's possible to use
in tuplesort.c's case at all.  (Or even if we are talking at the same
level here -- my reading of that code was very shallow.)

> 1. Document an ordering constraint that the result of tuplesort_gettuple
> must be pfree'd, if at all, before tuplesort_end.  AFAICT this would
> require only one simple change in the existing backend code (add an
> ExecClearTuple at a suitable spot in nodeSort.c).  However there is the
> potential to break add-on modules, if anyone is using tuplesort directly.

Maybe it is possible to write some sort of magic number to the
TupleTableSlot before it is destroyed, which could be checked if
somebody tries to destroy it again, to warn them that the code should be
changed to cope with the new order of things.  This could be compiled
conditionally on MEMORY_CONTEXT_CHECKING or something else that is only
defined in --enable-cassert builds, so extension developers are likely
to trip on it.  It wouldn't be bullet-proof, because of course the
memory could be used for something else before the subsequent invalid
destruction, but it could help.

(Actually the first idea that came to mind was checking for the 0x7f
pattern ...)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: memory context for tuplesort return values

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> I've been modifying tuplesort.c to keep all the sort's local data in a
>> separate memory context, to simplify and speed up cleaning up the data
>> during tuplesort_end.

> Is it possible to make the TupleTableSlot not free the tuple
> automatically?  AFAIR there is a parameter to the creation routine of a
> TupleTableSlot to make it so, but I don't know if it's possible to use
> in tuplesort.c's case at all.

The problem is that we specifically *want* the caller to free the tuple
when done with it.  The API is essentially that tuplesort_gettuple is
passing ownership of the tuple over to the caller.  Without this it
seems very hard to avoid an essentially useless palloc/pfree cycle
per tuple.

> Maybe it is possible to write some sort of magic number to the
> TupleTableSlot before it is destroyed, which could be checked if
> somebody tries to destroy it again, to warn them that the code should be
> changed to cope with the new order of things.

Oh, the coredump is reliable enough if you compiled with
MEMORY_CONTEXT_CHECKING ... we don't need any more frammishes there.
        regards, tom lane


Re: memory context for tuplesort return values

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Tom Lane wrote:
> >> I've been modifying tuplesort.c to keep all the sort's local data in a
> >> separate memory context, to simplify and speed up cleaning up the data
> >> during tuplesort_end.
> 
> > Is it possible to make the TupleTableSlot not free the tuple
> > automatically?  AFAIR there is a parameter to the creation routine of a
> > TupleTableSlot to make it so, but I don't know if it's possible to use
> > in tuplesort.c's case at all.
> 
> The problem is that we specifically *want* the caller to free the tuple
> when done with it.  The API is essentially that tuplesort_gettuple is
> passing ownership of the tuple over to the caller.  Without this it
> seems very hard to avoid an essentially useless palloc/pfree cycle
> per tuple.

Oh, true, I had got it backwards.  So we would be turning a valid
operation into a memory leak for extension developers :-(  Since you
assert that solution #2 is pretty messy, it would be good to stay away
from it.  However, if following #1 I'm not sure that documenting the new
behavior is enough.  I don't have any better suggestion however :-(

Crazy ideas: add a #warning or #error to the header file unless there is
some special symbol previously defined, something like
#define I_UNDERSTAND_MEM_ALLOC_IN_TUPLETABLESLOT
which means the developer did update his code.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: memory context for tuplesort return values

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Crazy ideas: add a #warning or #error to the header file unless there is
> some special symbol previously defined, something like
> #define I_UNDERSTAND_MEM_ALLOC_IN_TUPLETABLESLOT
> which means the developer did update his code.

Well, if we wanted to go that far, the simple solution is to rename
tuplesort_gettuple to something else.  Guaranteed code breakage then.
However, I think this is overkill for a problem that only *might*
bite people.  I kinda doubt that people are using TupleTableSlots in
add-on code, because the design of the executor makes it hard for
anything except plan nodes to obtain slots.  You'd have to posit a
piece of code that isn't structured as
while ((tuple = tuplesort_gettuple(...)) != NULL){    process tuple;    if (should_free)        pfree(tuple);}

but instead hangs onto the last tuple past the sort shutdown, and
yet isn't a plan node.
        regards, tom lane


Re: memory context for tuplesort return values

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Crazy ideas: add a #warning or #error to the header file unless there is
> > some special symbol previously defined, something like
> > #define I_UNDERSTAND_MEM_ALLOC_IN_TUPLETABLESLOT
> > which means the developer did update his code.
> 
> Well, if we wanted to go that far, the simple solution is to rename
> tuplesort_gettuple to something else.  Guaranteed code breakage then.
> However, I think this is overkill for a problem that only *might*
> bite people.  I kinda doubt that people are using TupleTableSlots in
> add-on code, because the design of the executor makes it hard for
> anything except plan nodes to obtain slots.

Yeah, you may be right.  I guess I was worried because the Mammoth
Replicator code uses TupleTableSlots here and there and I was assuming
that other developers may have similar code.  However, Replicator is
fairly involved stuff, and I don't see that TupleTableSlots would have
much use outside something that tied to the internals.  And of course,
every major Postgres release is major surgery for Replicator as well, so
nothing new here :-(

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: memory context for tuplesort return values

From
Simon Riggs
Date:
On Thu, 2006-02-23 at 16:10 -0500, Tom Lane wrote:
> I don't want to give up the idea of keeping sort-local data in a
> private
> context --- it just seems cleaner, as well as faster, than letting it
> be
> mixed into the caller's stuff.  I can see two alternatives:

Would that be a single context for all sort operations, or a separate
context for each sort within a plan?

There is some evidence that high sort memory is not that useful during
the final merge phase. Would it be possible to have multiple contexts
within each sort e.g. Run Forming context and Final Merge context? That
would then allow us to more easily free the Run Forming context before
moving into the final context with a potentially different size.

Best Regards, Simon Riggs



Re: memory context for tuplesort return values

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Thu, 2006-02-23 at 16:10 -0500, Tom Lane wrote:
>> I don't want to give up the idea of keeping sort-local data in a
>> private
>> context --- it just seems cleaner, as well as faster, than letting it
>> be
>> mixed into the caller's stuff.  I can see two alternatives:

> Would that be a single context for all sort operations, or a separate
> context for each sort within a plan?

A private context for each sort operation.  Otherwise you lose the
point, which is to be able to use MemoryContextDelete to clean up
in tuplesort_end.

> There is some evidence that high sort memory is not that useful during
> the final merge phase. Would it be possible to have multiple contexts
> within each sort e.g. Run Forming context and Final Merge context? That
> would then allow us to more easily free the Run Forming context before
> moving into the final context with a potentially different size.

Possible, but I'm not going to implement it without more evidence.
The tests I did way back when showed considerable usefulness for the
merge preload behavior, and I think that your change to allow N tapes
probably made it even more useful (because with fewer merge passes,
the tape files don't get so disorganized).  So I'm inclined to leave it
as-is.
        regards, tom lane