Thread: memory context for tuplesort return values
I've been modifying tuplesort.c to keep all the sort's local data in a separate memory context, to simplify and speed up cleaning up the data during tuplesort_end. I thought this would be a straightforward change, but was disillusioned when I got a core dump while testing :-(. On investigation, the problem is that nodeSort.c keeps the tuple pointer returned by tuplesort.c in a TupleTableSlot, and the order of operations during plan shutdown is such that the slot tries to pfree the tuple only after tuplesort_end has been called. Of course this means it's pfree'ing already-released memory.

I don't want to give up the idea of keeping sort-local data in a private context --- it just seems cleaner, as well as faster, than letting it be mixed into the caller's stuff. I can see two alternatives:

1. Document an ordering constraint that the result of tuplesort_gettuple must be pfree'd, if at all, before tuplesort_end. AFAICT this would require only one simple change in the existing backend code (add an ExecClearTuple at a suitable spot in nodeSort.c). However, there is the potential to break add-on modules, if anyone is using tuplesort directly.

2. Arrange that if the result of tuplesort_gettuple is supposed to be pfree'd by the caller, it is stored in the caller's context, not the private context. Unfortunately, this looks like it would be pretty messy inside tuplesort.c and would force an extra palloc/copy per tuple. (The hard case is where we are doing the final merge on-the-fly.)

I'm inclined to go with #1, but wanted to see if this would really offend anyone's sensibilities.

			regards, tom lane
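For concreteness, option #1 amounts to roughly the following in nodeSort.c's shutdown path. This is only a sketch: which slot holds the last returned tuple, and hence which ExecClearTuple call is needed, is an assumption here, not the actual patch.

    /* Sketch of option #1 in ExecEndSort (nodeSort.c); slot choice assumed */
    static void
    ExecEndSort(SortState *node)
    {
        /*
         * Clear the result slot first, so its pfree of the returned tuple
         * happens while the sort's private memory context still exists ...
         */
        ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);

        /* ... and only then shut down the sort, releasing its context. */
        tuplesort_end((Tuplesortstate *) node->tuplesortstate);
        node->tuplesortstate = NULL;
    }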
Tom Lane wrote:

> I've been modifying tuplesort.c to keep all the sort's local data in a
> separate memory context, to simplify and speed up cleaning up the data
> during tuplesort_end.  I thought this would be a straightforward change,
> but was disillusioned when I got a core dump while testing :-(.  On
> investigation the problem is that nodeSort.c keeps the tuple pointer
> returned by tuplesort.c in a TupleTableSlot, and the order of operations
> during plan shutdown is such that the slot tries to pfree the tuple
> only after tuplesort_end has been called.  Of course this means it's
> pfree'ing already-released memory.

Is it possible to make the TupleTableSlot not free the tuple automatically? AFAIR there is a parameter to the creation routine of a TupleTableSlot to make it so, but I don't know if it's possible to use in tuplesort.c's case at all. (Or even if we are talking at the same level here -- my reading of that code was very shallow.)

> 1. Document an ordering constraint that the result of tuplesort_gettuple
> must be pfree'd, if at all, before tuplesort_end.  AFAICT this would
> require only one simple change in the existing backend code (add an
> ExecClearTuple at a suitable spot in nodeSort.c).  However there is the
> potential to break add-on modules, if anyone is using tuplesort directly.

Maybe it is possible to write some sort of magic number to the TupleTableSlot before it is destroyed, which could be checked if somebody tries to destroy it again, to warn them that the code should be changed to cope with the new order of things. This could be compiled conditionally on MEMORY_CONTEXT_CHECKING or something else that is only defined in --enable-cassert builds, so extension developers are likely to trip on it. It wouldn't be bullet-proof, because of course the memory could be used for something else before the subsequent invalid destruction, but it could help. (Actually the first idea that came to mind was checking for the 0x7f pattern ...)

--
Alvaro Herrera                         http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
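The parameter Alvaro seems to be recalling is presumably the shouldFree argument of ExecStoreTuple -- a storage routine rather than a creation routine. A sketch of how a caller opts out of slot-owned freeing, assuming the four-argument signature of that vintage:

    /* Store a tuple without giving the slot ownership of its memory */
    ExecStoreTuple(tuple,          /* tuple to store */
                   slot,           /* TupleTableSlot to store it in */
                   InvalidBuffer,  /* tuple is not in a disk buffer */
                   false);         /* shouldFree: slot must NOT pfree it */

With shouldFree = false, clearing the slot leaves the tuple's memory alone, so the shutdown ordering would stop mattering -- at the cost that nobody frees the tuple, which is exactly the objection raised next.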
Alvaro Herrera <alvherre@commandprompt.com> writes:

> Tom Lane wrote:
>> I've been modifying tuplesort.c to keep all the sort's local data in a
>> separate memory context, to simplify and speed up cleaning up the data
>> during tuplesort_end.

> Is it possible to make the TupleTableSlot not free the tuple
> automatically?  AFAIR there is a parameter to the creation routine of a
> TupleTableSlot to make it so, but I don't know if it's possible to use
> in tuplesort.c's case at all.

The problem is that we specifically *want* the caller to free the tuple when done with it. The API is essentially that tuplesort_gettuple is passing ownership of the tuple over to the caller. Without this it seems very hard to avoid an essentially useless palloc/pfree cycle per tuple.

> Maybe it is possible to write some sort of magic number to the
> TupleTableSlot before it is destroyed, which could be checked if
> somebody tries to destroy it again, to warn them that the code should be
> changed to cope with the new order of things.

Oh, the coredump is reliable enough if you compiled with MEMORY_CONTEXT_CHECKING ... we don't need any more frammishes there.

			regards, tom lane
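To see where that useless cycle would come from, option #2 would force roughly the following inside tuplesort_gettuple, copying every result out of the private context before handing it back. A sketch only: callercontext stands in for however the caller's context would have to be tracked.

    /* Option #2 sketch: one extra palloc + copy per returned tuple */
    MemoryContext oldcontext = MemoryContextSwitchTo(callercontext);

    tuple = heap_copytuple(tuple);    /* copy now survives tuplesort_end */

    MemoryContextSwitchTo(oldcontext);
    return tuple;                     /* caller owns the copy outright */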
Tom Lane wrote:

> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Tom Lane wrote:
> >> I've been modifying tuplesort.c to keep all the sort's local data in a
> >> separate memory context, to simplify and speed up cleaning up the data
> >> during tuplesort_end.
>
> > Is it possible to make the TupleTableSlot not free the tuple
> > automatically?  AFAIR there is a parameter to the creation routine of a
> > TupleTableSlot to make it so, but I don't know if it's possible to use
> > in tuplesort.c's case at all.
>
> The problem is that we specifically *want* the caller to free the tuple
> when done with it.  The API is essentially that tuplesort_gettuple is
> passing ownership of the tuple over to the caller.  Without this it
> seems very hard to avoid an essentially useless palloc/pfree cycle
> per tuple.

Oh, true, I had got it backwards. So we would be turning a valid operation into a memory leak for extension developers :-(

Since you assert that solution #2 is pretty messy, it would be good to stay away from it. However, if following #1, I'm not sure that documenting the new behavior is enough. I don't have any better suggestion, however :-(

Crazy ideas: add a #warning or #error to the header file unless there is some special symbol previously defined, something like

    #define I_UNDERSTAND_MEM_ALLOC_IN_TUPLETABLESLOT

which means the developer did update his code.

--
Alvaro Herrera                         http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
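Spelled out, that guard would sit at the top of tuplesort.h and look roughly like this (hypothetical: neither the check nor the symbol exists):

    /* Hypothetical opt-in guard for the changed tuple-lifetime rules */
    #ifndef I_UNDERSTAND_MEM_ALLOC_IN_TUPLETABLESLOT
    #error "tuplesort_gettuple results must be pfree'd, if at all, before tuplesort_end"
    #endif

Anyone compiling against the header would then get a hard error until they had audited their code and defined the symbol.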
Alvaro Herrera <alvherre@commandprompt.com> writes:

> Crazy ideas: add a #warning or #error to the header file unless there is
> some special symbol previously defined, something like
> #define I_UNDERSTAND_MEM_ALLOC_IN_TUPLETABLESLOT
> which means the developer did update his code.

Well, if we wanted to go that far, the simple solution is to rename tuplesort_gettuple to something else. Guaranteed code breakage then.

However, I think this is overkill for a problem that only *might* bite people. I kinda doubt that people are using TupleTableSlots in add-on code, because the design of the executor makes it hard for anything except plan nodes to obtain slots. You'd have to posit a piece of code that isn't structured as

    while ((tuple = tuplesort_gettuple(...)) != NULL)
    {
        /* process tuple */
        if (should_free)
            pfree(tuple);
    }

but instead hangs onto the last tuple past the sort shutdown, and yet isn't a plan node.

			regards, tom lane
Tom Lane wrote:

> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Crazy ideas: add a #warning or #error to the header file unless there is
> > some special symbol previously defined, something like
> > #define I_UNDERSTAND_MEM_ALLOC_IN_TUPLETABLESLOT
> > which means the developer did update his code.
>
> Well, if we wanted to go that far, the simple solution is to rename
> tuplesort_gettuple to something else.  Guaranteed code breakage then.
> However, I think this is overkill for a problem that only *might*
> bite people.  I kinda doubt that people are using TupleTableSlots in
> add-on code, because the design of the executor makes it hard for
> anything except plan nodes to obtain slots.

Yeah, you may be right. I guess I was worried because the Mammoth Replicator code uses TupleTableSlots here and there, and I was assuming that other developers may have similar code. However, Replicator is fairly involved stuff, and I don't see that TupleTableSlots would have much use outside something that's tied to the internals.

And of course, every major Postgres release is major surgery for Replicator as well, so nothing new here :-(

--
Alvaro Herrera                         http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Thu, 2006-02-23 at 16:10 -0500, Tom Lane wrote:

> I don't want to give up the idea of keeping sort-local data in a private
> context --- it just seems cleaner, as well as faster, than letting it be
> mixed into the caller's stuff.  I can see two alternatives:

Would that be a single context for all sort operations, or a separate context for each sort within a plan?

There is some evidence that high sort memory is not that useful during the final merge phase. Would it be possible to have multiple contexts within each sort, e.g. a Run Forming context and a Final Merge context? That would then allow us to more easily free the Run Forming context before moving into the Final Merge context, with a potentially different size.

Best Regards,
Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes:

> On Thu, 2006-02-23 at 16:10 -0500, Tom Lane wrote:
>> I don't want to give up the idea of keeping sort-local data in a private
>> context --- it just seems cleaner, as well as faster, than letting it be
>> mixed into the caller's stuff.  I can see two alternatives:

> Would that be a single context for all sort operations, or a separate
> context for each sort within a plan?

A private context for each sort operation. Otherwise you lose the point, which is to be able to use MemoryContextDelete to clean up in tuplesort_end.

> There is some evidence that high sort memory is not that useful during
> the final merge phase.  Would it be possible to have multiple contexts
> within each sort, e.g. a Run Forming context and a Final Merge context?
> That would then allow us to more easily free the Run Forming context
> before moving into the Final Merge context, with a potentially different
> size.

Possible, but I'm not going to implement it without more evidence. The tests I did way back when showed considerable usefulness for the merge preload behavior, and I think that your change to allow N tapes probably made it even more useful (because with fewer merge passes, the tape files don't get so disorganized). So I'm inclined to leave it as-is.

			regards, tom lane
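In outline, the per-sort arrangement comes down to something like this (a sketch against the memory-context API of the day; the sortcontext field name is an assumption):

    /* At tuplesort_begin time: give each sort its own private context */
    state->sortcontext = AllocSetContextCreate(CurrentMemoryContext,
                                               "TupleSort",
                                               ALLOCSET_DEFAULT_MINSIZE,
                                               ALLOCSET_DEFAULT_INITSIZE,
                                               ALLOCSET_DEFAULT_MAXSIZE);

    /* In tuplesort_end: a single call frees everything the sort allocated */
    MemoryContextDelete(state->sortcontext);

This single MemoryContextDelete is what makes cleanup cheap -- and also what turns the slot's late pfree in nodeSort.c into a use-after-free.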