2011/10/30 Tom Lane <tgl@sss.pgh.pa.us>:
> Hitoshi Harada <umi.tanuki@gmail.com> writes:
>> On Sat, Oct 29, 2011 at 8:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I have not looked at the code, but ISTM the way that this has to work is
>>> that you set up a portal for each active scan. Then you can fetch a few
>>> rows at a time from any one of them.
>
>> Hmm, true. Looking back at the original proposal (neither did I look
>> at the code,) there seems to be a cursor mode. ISTM it is hard for fdw
>> to know how the whole plan tree looks, so consequently do we always
>> cursor regardless of estimated row numbers?
>
> I think we have to. Even if we estimate that a given scan will return
> only a few rows, what happens if we're wrong? We don't want to blow out
> memory on the local server by retrieving gigabytes in one go.

Oh, I overlooked the possibility of a wrong estimate.  Older PostgreSQL
releases use 1000 rows as the default estimate, so a big table which has
not been analyzed might crash the backend.
To make data retrieval safe, we would need to know the actual size of the
result, perhaps by executing SELECT COUNT(*) in the planning phase.  That
sounds too heavy to do for every scan, and it still doesn't tell us the
actual row width.
One possible idea is to change the default value of the min_cursor_rows
option to 0 so that pgsql_fdw uses a CURSOR by default, but that seems not
enough.  I'll drop the simple SELECT mode from the first version of
pgsql_fdw for safety.
>> I haven't had much experiences around cursor myself, but is it as
>> efficient as non-cursor?
>
> No, but if you need max efficiency you shouldn't be using foreign tables
> in the first place; they're always going to be expensive to access.
>
> It's likely that making use of native protocol portals (instead of
> executing a lot of FETCH commands) would help. But I think we'd be well
> advised to do the first pass with just the existing libpq facilities,
> and then measure to see where to improve performance.

I long for a protocol-level cursor. :)
--
Shigeru Hanada