Re: tuplestore API problem - Mailing list pgsql-hackers

From Hitoshi Harada
Subject Re: tuplestore API problem
Date
Msg-id e08cc0400903270218t13311d6end3116fab409760e0@mail.gmail.com
Whole thread Raw
In response to Re: tuplestore API problem  (Hitoshi Harada <umi.tanuki@gmail.com>)
Responses Re: tuplestore API problem  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
2009/3/27 Hitoshi Harada <umi.tanuki@gmail.com>:
> 2009/3/27 Tom Lane <tgl@sss.pgh.pa.us>:
>> By chance I discovered that this query in the regression tests
>>
>> SELECT ntile(NULL) OVER (ORDER BY ten, four), ten, four FROM tenk1 LIMIT 2;
>>
>> stops working if work_mem is small enough: it either dumps core or
>> delivers wrong answers depending on platform.
>>
>> After some tracing I found out the reason.  ExecWindowAgg() does this:
>>
>>    if (!tuplestore_gettupleslot(winstate->buffer, true,
>>                                 winstate->ss.ss_ScanTupleSlot))
>>        elog(ERROR, "unexpected end of tuplestore");
>>
>> and then goes off and calls the window functions (ntile() here), and
>> expects the ScanTupleSlot to still be valid afterwards.  However,
>> ntile() forces us to read to the end of the input to find out the number
>> of rows.  If work_mem is small enough, that means the tuplestore is
>> forced into dump-to-disk mode, which means it releases all its in-memory
>> tuples.  And guess what: the ScanTupleSlot is pointing at one of those,
>> it doesn't have its own copy of the tuple.  So we wind up trying to read
>> from a trashed bit of memory.
>>
>> A brute-force solution is to change tuplestore_gettupleslot() so that it
>> always copies the tuple, but this would be wasted cycles for most uses
>> of tuplestores.  I'm thinking of changing tuplestore_gettupleslot's API
>> to add a bool parameter specifying whether the caller wants to force
>> a copy.
>>
>> Comments, better ideas?
>
> Is this tuplestore API problem? ISTM this is window function's
> problem. I think my early code was holding heaptuple instead of
> tupleslot for the current row. At a glance, the issue appears in only
> current row in window function, which fetches row and uses it later
> after storing following rows in some cases. So a brute-force solution
> might be that ExecWindowAgg() copies the current row from tuplestore
> instead of pointing directly to inside tuplestore memory, not changing
> tuplestore API.
Here's the patch. Hope there are no more on the same reason. It seems
that we'd need to implement something like garbage collector in
tuplestore, marking and tracing each row references, if the complete
solution is required.

Regards,

--
Hitoshi Harada

Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: New trigger option of pg_standby
Next
From: Simon Riggs
Date:
Subject: Re: New trigger option of pg_standby