Re: FDW for PostgreSQL - Mailing list pgsql-hackers

From Kohei KaiGai
Subject Re: FDW for PostgreSQL
Date
Msg-id CADyhKSWxpk4rn+un98NzfoXc=foLPoH9LV5sKjM0NpjvB5qihQ@mail.gmail.com
Whole thread Raw
In response to Re: FDW for PostgreSQL  (Shigeru Hanada <shigeru.hanada@gmail.com>)
Responses Re: FDW for PostgreSQL  (Shigeru Hanada <shigeru.hanada@gmail.com>)
List pgsql-hackers
2012/11/21 Shigeru Hanada <shigeru.hanada@gmail.com>:
> Thank for the comment!
>
> On Tue, Nov 20, 2012 at 10:23 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>>
>> I also think the new "use_remote_explain" option is good. It works fine
>> when we try to use this fdw over the network with latency more or less.
>> It seems to me its default is "false", thus, GetForeignRelSize will return
>> the estimated cost according to ANALYZE, instead of remote EXPLAIN.
>> Even though you mention the default behavior was not changed, is it
>> an expected one? My preference is the current one, as is.
>
>
> Oops, I must have focused on only cost factors.
> I too think that using local statistics is better as default behavior,
> because foreign tables would be used for relatively stable tables.
> If target tables are updated often, it would cause problems about
> consistency, unless we provide full-fledged transaction mapping.
>
>>
>> The deparseFuncExpr() still has case handling whether it is explicit case,
>> implicit cast or regular functions. If my previous proposition has no
>> flaw,
>> could you fix up it using regular function invocation manner? In case when
>> remote node has incompatible implicit-cast definition, this logic can make
>> a problem.
>
>
> Sorry, I overlooked this issue. Fixed to use function call notation
> for all of explicit function calls, explicit casts, and implicit casts.
>
>> At InitPostgresFdwOptions(), the source comment says we don't use
>> malloc() here for simplification of code. Hmm. I'm not sure why it is more
>> simple. It seems to me we have no reason why to avoid malloc here, even
>> though libpq options are acquired using malloc().
>
>
> I used "simple" because using palloc avoids null-check and error handling.
> However, many backend code use malloc to allocate memory which lives
> as long as backend process itself, so I fixed.
>
>>
>> Regarding to the regression test.
>
>  [snip]
>>
>> I guess this test tries to check a case when remote column has
>> incompatible
>> data type with local side. Please check timestamp_out(). Its output
>> format follows
>> "datestyle" setting of GUC, and it affects OS configuration on initdb
>> time.
>> (Please grep "datestyle" at initdb.c !) I'd like to recommend to use
>> another data type
>> for this regression test to avoid false-positive detection.
>
>
> Good catch.  :)
> I fixed the test to use another data type, user defined enum.
>
One other thing I noticed.

At execute_query(), it stores the retrieved rows onto tuplestore of
festate->tuples at once. Doesn't it make problems when remote-
table has very big number of rows?

IIRC, the previous code used cursor feature to fetch a set of rows
to avoid over-consumption of local memory. Do we have some
restriction if we fetch a certain number of rows with FETCH?
It seems to me, we can fetch 1000 rows for example, and tentatively
store them onto the tuplestore within one PG_TRY() block (so, no
need to worry about PQclear() timing), then we can fetch remote
rows again when IterateForeignScan reached end of tuplestore.

Please point out anything if I missed something.

Anyway, I'll check this v4 patch simultaneously.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: logical changeset generation v3
Next
From: Boszormenyi Zoltan
Date:
Subject: Re: [PATCH] Make pg_basebackup configure and start standby [Review]