Re: pgsql_fdw, FDW for PostgreSQL server - Mailing list pgsql-hackers

From Shigeru HANADA
Subject Re: pgsql_fdw, FDW for PostgreSQL server
Date
Msg-id 4F7BED96.1070504@gmail.com
Whole thread Raw
In response to Re: pgsql_fdw, FDW for PostgreSQL server  ("Albe Laurenz" <laurenz.albe@wien.gv.at>)
Responses Re: pgsql_fdw, FDW for PostgreSQL server
Re: pgsql_fdw, FDW for PostgreSQL server
List pgsql-hackers
(2012/04/03 22:31), Albe Laurenz wrote:
> Shigeru HANADA wrote:
>> Attached are latest version of pgsql_fdw patches.  Note that
>> pgsql_fdw_analyze.patch is only for test the effect of local
> statistics.
>>   Please apply patches in the order below:
>>
>> (1) pgsql_fdw_v18.patch
>> (2) pgsql_fdw_pushdown_v11.patch
>> (3) pgsql_fdw_analyze.patch (if you want to try local stats)
>
> Since Kohei KaiGai doesn't post a review, I'll have a go.
>
> The patch applies and compiles fine without warnings and passes
> regression tests.

Thanks for the review.

> I found bugs in the analyze functions:
>
> In pgsql_fdw_analyze:
> nspname and relname are not initialized to NULL.
> This causes failure if the corresponding option is not set
> on the foreign table.
>
> In store_remote_stats:
> atttypmod is initialized to 0 and never changed.
> This causes the following error for columns of type "interval":
> ERROR:  unrecognized interval typmod: 0

Oops, sorry for silly bugs.  However, we won't need to care them anyway,
because coping remote stats to local side seems not practical way to
obtain local statistics.  As you mentioned in another sub-thread,
copying remote stats contains some design-level problems such as
privileges and version difference.

    (2012/03/28 21:07), Albe Laurenz wrote:
    > I found another limitation of this approach:
    > pgsql_fdw_analyze() has to run as a user who can update
    > pg_statistic, and this user needs a user mapping to a remote
    > user who can read pg_statistic.
    >
    > This is not necessary for normal operation and needs
    > to be configured specifically for getting remote statistics.
    > This is cumbersome, and people might be unhappy to have to
    > create user mappings for highly privileged remote users.

So, if we want to have statistics of remote data on local side, getting
actual data from remote and calculate statistics on local side with same
(or similar) routines as local tables seems better.  I'd like to
separate this issue because it belongs to ANALYZE support patch which is
proposed by Fujita-san.

> During a foreign scan, type input functions are used to convert
> the text representation of values.  If a foreign table is misconfigured,
> you can get error messages from these functions, like:
>
> ERROR:  invalid input syntax for type double precision: "etwas"
> or
> ERROR:  value too long for type character varying(3)
>
> It might me nice for finding problems if the message were
> something like:
>
> ERROR:  cannot convert data in foreign scan of "tablename", column "col"
> in row 42
> DETAIL:  ERROR:  value too long for type character varying(3)

Agreed.  How about showing context information with errcontext() in
addition to main error message?  Of course, identifiers are quoted if
necessary.  This way doesn't need additional PG_TRY block, so overhead
would be relatively cheap.

postgres=# SELECT * FROM ft1 WHERE c1 = 1;  -- ERROR
ERROR:  invalid input syntax for integer: "1970-01-02 17:00:00+09"
CONTEXT:  column c4 of foreign table ft1

Showing index of the row seems overkill, because most cause of this kind
of error is wrong configuration, as you say, and users would be able to
address the issue without knowing which record caused the error.

> As stated previously, I don't think that using local stats on
> foreign tables is a win.  The other patches work fine for me, and
> I'd be happy if that could go into 9.2.

I have opposite opinion on this issue because we need to do some of
filtering on local side.  We can leave cost/rows estimation to remote
side about WHERE expressions which are pushed down, but we need
selectivity of extra filtering done on local side.  For such purpose,
having local stats of foreign data seems reasonable and useful.

Of course, it has downside that we need to execute explicit ANALYZE for
foreign tables which would cause full sequential scan on remote tables,
in addition to ANALYZE for remote tables done on remote side as usual
maintenance work.

> Once the two bugs above are fixed, should I mark it "ready for
> committer"?

Attached patch contains changes below:

pgsql_fdw_v19.patch
  - show context of data conversion error
  - move codes for fetch_count FDW option to option.c
    (refactoring)
pgsql_fdw_pushdown_v12.patch
  - make deparseExpr function static (refactoring)

I also attached pgsql_fdw_analyze for only testing the effect of local
statistics.  It contains both backend's ANALYZE command support and
pgsql_fdw's ANALYZE support.

Regards,
--
Shigeru HANADA

Attachment

pgsql-hackers by date:

Previous
From: Mathieu Desnoyers
Date:
Subject: Scalability micro-conference topic proposals (LPC2012)
Next
From: Dave Page
Date:
Subject: Re: Switching to Homebrew as recommended Mac install?