Re: WIP: Collecting statistics on CSV file data - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: WIP: Collecting statistics on CSV file data
Date
Msg-id 4F3E06DA.5060108@lab.ntt.co.jp
In response to Re: WIP: Collecting statistics on CSV file data  (Shigeru Hanada <shigeru.hanada@gmail.com>)
List pgsql-hackers
Hi Hanada-san,

Sorry for the late response.

(2012/02/10 22:05), Shigeru Hanada wrote:
> (2011/12/15 11:30), Etsuro Fujita wrote:
>> (2011/12/14 15:34), Shigeru Hanada wrote:
>>> I think this patch could be marked as "Ready for committer" with some
>>> minor fixes.  Please find attached a revised patch (v6.1).
> 
> I've tried to make pgsql_fdw work with this feature, and found that few
> static functions to be needed to exported to implement ANALYZE handler
> in short-cut style.  The "Short-cut style" means the way to generate
> statistics (pg_class and pg_statistic) for foreign tables without
> retrieving sample data from foreign server.

That's great!  Here is my review.

The patch applies with some modifications and compiles cleanly.  However,
the regression tests on subqueries failed, in addition to the
role-related tests discussed earlier.

While I've not looked at the patch in detail, I have some comments:

1. The patch might need code to handle the irregular case where
ANALYZE-related catalog data such as attstattarget differ between the
local and the remote side.  (Admittedly, ALTER FOREIGN TABLE has no
option to set such data on a foreign table.)  For example, attstattarget
might be -1 for some column on the local side but 0 for that column on
the remote side, meaning that no stats can be available for that
column.  In such a case it would be better to inform the user.
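
To illustrate the check I have in mind, here is a minimal sketch in
Python rather than patch code.  It assumes the per-column attstattarget
values from both sides have already been fetched into plain dicts; the
function name find_missing_remote_stats and the message wording are
hypothetical and do not come from the patch.

```python
from typing import Dict, List


def find_missing_remote_stats(local_targets: Dict[str, int],
                              remote_targets: Dict[str, int]) -> List[str]:
    """Return one message per column that wants stats locally
    (attstattarget != 0) but has attstattarget = 0 on the remote."""
    messages = []
    for column, local_target in local_targets.items():
        remote_target = remote_targets.get(column)
        if remote_target is None:
            # Column missing from the remote mapping; ignored in this sketch.
            continue
        if local_target != 0 and remote_target == 0:
            messages.append(
                f'column "{column}": attstattarget is {local_target} locally '
                f"but 0 on the remote, so no remote stats can be available"
            )
    return messages


if __name__ == "__main__":
    # Hypothetical catalog values: "name" has stats disabled on the remote.
    local = {"id": -1, "name": -1}
    remote = {"id": -1, "name": 0}
    for message in find_missing_remote_stats(local, remote):
        print("WARNING:", message)
```

In the real handler the message would presumably be raised as a WARNING
or NOTICE during ANALYZE, but that choice is open.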

2. It might be better for the FDW to estimate the cost of a remote
query by itself, without running EXPLAIN, if stats are available through
this feature.  While this approach is less accurate than the EXPLAIN
approach because information such as the remote's seq_page_cost or
random_page_cost is missing, it is cheaper!  I think such information
could be added to the generic options for a foreign table, which may
have been discussed previously.
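
As a rough illustration of the idea, here is a sketch in Python, not
patch code.  It assumes the remote relpages and reltuples have been
copied into the local pg_class by this feature, and that the remote cost
parameters are supplied as generic options.  The option names used here
are hypothetical, and the formula is a deliberately simplified
sequential-scan cost that ignores qual evaluation and startup cost.  The
fallback values are the stock PostgreSQL defaults.

```python
from typing import Dict, Optional

# Stock PostgreSQL defaults, used when no option overrides them.
DEFAULT_SEQ_PAGE_COST = 1.0
DEFAULT_CPU_TUPLE_COST = 0.01


def estimate_remote_seqscan_cost(relpages: float,
                                 reltuples: float,
                                 options: Optional[Dict[str, str]] = None) -> float:
    """Estimate the cost of a remote sequential scan from local stats.

    `options` stands in for foreign-table generic options (hypothetical
    names); values arrive as strings, as generic options do.
    """
    options = options or {}
    seq_page_cost = float(options.get("seq_page_cost", DEFAULT_SEQ_PAGE_COST))
    cpu_tuple_cost = float(options.get("cpu_tuple_cost", DEFAULT_CPU_TUPLE_COST))
    # Disk cost for reading every page plus CPU cost for every tuple.
    return relpages * seq_page_cost + reltuples * cpu_tuple_cost


if __name__ == "__main__":
    # Hypothetical stats: 100 pages, 10000 tuples, remote with slower disks.
    print(estimate_remote_seqscan_cost(100, 10000))
    print(estimate_remote_seqscan_cost(100, 10000, {"seq_page_cost": "2.0"}))
```

No round trip to the remote server is needed at plan time, which is
where the saving over EXPLAIN comes from.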

3.
> In implementing ANALYZE handler, hardest part was copying anyarray
> values from remote to local.  If we can make it common in core, it would
> help FDW authors who want to implement ANALYZE handler without
> retrieving sample rows from remote server.

+1 from me.

Best regards,
Etsuro Fujita

