Re: tuplestore API problem - Mailing list pgsql-hackers

From Hitoshi Harada
Subject Re: tuplestore API problem
Date
Msg-id e08cc0400903272331q4e3ab3a7m60e05d605c7c6944@mail.gmail.com
In response to Re: tuplestore API problem  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: tuplestore API problem  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
2009/3/28 Tom Lane <tgl@sss.pgh.pa.us>:
> Hitoshi Harada <umi.tanuki@gmail.com> writes:
>> 2009/3/27 Hitoshi Harada <umi.tanuki@gmail.com>:
>>> 2009/3/27 Tom Lane <tgl@sss.pgh.pa.us>:
>>>> A brute-force solution is to change tuplestore_gettupleslot() so that it
>>>> always copies the tuple, but this would be wasted cycles for most uses
>>>> of tuplestores.  I'm thinking of changing tuplestore_gettupleslot's API
>>>> to add a bool parameter specifying whether the caller wants to force
>>>> a copy.
>
>> Here's the patch. Hope there are no more on the same reason. It seems
>> that we'd need to implement something like garbage collector in
>> tuplestore, marking and tracing each row references, if the complete
>> solution is required.
>
> I don't like this; I'm planning to go with the aforementioned API
> change instead.  The way you have it guarantees an extra copy cycle
> even when tuplestore is already making a copy internally; and it doesn't
> help if we find similar problems elsewhere.  (While I'm making the
> API change I'll take a close look at each call site to see if it has
> any similar risk.)

You're right. It kills performance even after dumptuples(). Thinking
about it more, I found the cause is only around dumptuples(). If you can
trace the TupleTableSlots that point to memtuples inside the tuplestore,
you can materialize them just before WRITETUP() in dumptuples().
So I tried passing EState.es_tupleTable to tuplestore_begin_heap() to
trace those TupleTableSlots. Note that if you pass NULL the behavior
is as before, so nothing's broken. The regression tests pass, but I'm not
quite sure EState.es_tupleTable is the only place that holds
TupleTableSlots passed to tuplestore...
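
A minimal sketch of the tracing idea, using toy types rather than the real
tuplestore code or the attached patch. Every name below (ToySlot, ToyStore,
toy_store_dump and so on) is hypothetical; the only point is that slots still
pointing into memtuples get a private copy just before the in-memory tuples
are released:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_TUPLES 8

/* Toy stand-in for a TupleTableSlot (not the real PostgreSQL struct). */
typedef struct ToySlot {
    char *tuple;      /* tuple currently visible through the slot */
    bool  owns_tuple; /* true once the slot holds a private copy */
} ToySlot;

/* Toy stand-in for a tuplestore that remembers which slots to trace. */
typedef struct ToyStore {
    char     *memtuples[TOY_MAX_TUPLES];
    int       ntuples;
    ToySlot **traced;  /* NULL means "no tracing", i.e. the old behavior */
    int       ntraced;
} ToyStore;

static char *toy_copy(const char *s)
{
    size_t n = strlen(s) + 1;
    char  *c = malloc(n);

    if (c != NULL)
        memcpy(c, s, n);
    return c;
}

void toy_store_init(ToyStore *st, ToySlot **traced, int ntraced)
{
    st->ntuples = 0;
    st->traced = traced;
    st->ntraced = (traced != NULL) ? ntraced : 0;
}

bool toy_store_put(ToyStore *st, const char *data)
{
    char *c;

    if (st->ntuples >= TOY_MAX_TUPLES)
        return false;
    c = toy_copy(data);
    if (c == NULL)
        return false;
    st->memtuples[st->ntuples++] = c;
    return true;
}

void toy_slot_clear(ToySlot *slot)
{
    if (slot->owns_tuple)
        free(slot->tuple);
    slot->tuple = NULL;
    slot->owns_tuple = false;
}

/* Hand out a pointer into memtuples without copying. */
bool toy_store_get(ToyStore *st, int i, ToySlot *slot)
{
    if (i < 0 || i >= st->ntuples)
        return false;
    toy_slot_clear(slot);
    slot->tuple = st->memtuples[i];
    return true;
}

/* Simulated dump: materialize traced slots, then release memtuples. */
void toy_store_dump(ToyStore *st)
{
    for (int i = 0; i < st->ntuples; i++) {
        for (int j = 0; j < st->ntraced; j++) {
            ToySlot *slot = st->traced[j];

            if (slot->tuple == st->memtuples[i] && !slot->owns_tuple) {
                char *c = toy_copy(slot->tuple);

                slot->tuple = c;
                slot->owns_tuple = (c != NULL);
            }
        }
        free(st->memtuples[i]); /* real code would write the tuple out first */
    }
    st->ntuples = 0;
}
```

In this toy model a slot that is not in the traced array would be left
dangling after the dump, which is the same worry as a TupleTableSlot that
lives somewhere other than the traced list.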

I know you propose adding a should_copy boolean parameter to
tuplestore_gettupleslot(). That always adds overhead, even if the
tuplestore *doesn't* dump tuples. That case doesn't need to copy tuples,
I guess.
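
To make the cost argument concrete, here is a toy model of a fetch call with
a force-copy flag. It is only a sketch: demo_gettupleslot, DemoSlot and
demo_copies_made are made-up names, not the real tuplestore_gettupleslot()
signature. A caller that asks for a copy pays for it on every fetch, whether
or not anything is ever dumped:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy slot: either borrows a pointer or holds a private copy. */
typedef struct DemoSlot {
    const char *tuple;        /* what the caller reads */
    char       *private_copy; /* non-NULL only when a copy was forced */
} DemoSlot;

/* Counts how many copies were made, to make the overhead visible. */
static int demo_copies_made = 0;

/* Fetch tuple i into the slot; force_copy mimics the proposed flag. */
bool demo_gettupleslot(const char **memtuples, int ntuples, int i,
                       bool force_copy, DemoSlot *slot)
{
    if (i < 0 || i >= ntuples)
        return false;

    /* Drop whatever the slot held before. */
    free(slot->private_copy);
    slot->private_copy = NULL;
    slot->tuple = NULL;

    if (force_copy) {
        size_t n = strlen(memtuples[i]) + 1;
        char  *c = malloc(n);

        if (c == NULL)
            return false;
        memcpy(c, memtuples[i], n);
        slot->private_copy = c;
        slot->tuple = c;
        demo_copies_made++;
    } else {
        /* No copy: the slot points straight at the stored tuple. */
        slot->tuple = memtuples[i];
    }
    return true;
}
```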

Regards,



--
Hitoshi Harada

Attachment
