Re: WIP: Collecting statistics on CSV file data - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: WIP: Collecting statistics on CSV file data
Msg-id 4EC60883.2050905@lab.ntt.co.jp
In response to Re: WIP: Collecting statistics on CSV file data  (Shigeru Hanada <shigeru.hanada@gmail.com>)
Responses Re: WIP: Collecting statistics on CSV file data
List pgsql-hackers
(2011/11/07 20:26), Shigeru Hanada wrote:
> (2011/10/20 18:56), Etsuro Fujita wrote:
>> I revised the patch according to Hanada-san's comments. Attached is the
>> updated version of the patch.
>>
>> Changes:
>>
>>     * pull up of logging "analyzing foo.bar"
>>     * new vac_update_relstats always called
>>     * tab-completion in psql
>>     * add "foreign tables are not analyzed automatically..." to 23.1.3
>>       Updating Planner Statistics
>>     * some other modifications
>
> Submission review
> =================
>
> - Patch can be applied, and all regression tests passed. :)

Thank you for testing.  I updated the patch according to your
comments.  Attached is the updated version of the patch.

> - file_fdw_do_analyze_rel is almost a copy of do_analyze_rel.  IIUC, the
> differences from do_analyze_rel are:
>      * doesn't log the analyze target
>      * doesn't switch userid to the owner of the target table
>      * doesn't measure elapsed time for the autoanalyze daemon
>      * doesn't handle indexes
>      * some comments are removed
>      * sample rows are acquired by file_fdw's routine
>
> I don't see any problem here, but would you confirm that all of them are
> intentional?

Yes.  But in the updated version, I've refactored analyze.c a little bit
to allow FDW authors to simply call do_analyze_rel().

> - In your design, each FDW has to copy most of do_analyze_rel into its
> own source.  It means that FDW authors must know many details of ANALYZE
> to implement an ANALYZE handler.  Actually, your patch exports some static
> functions from analyze.c.  Have you considered hooking
> acquire_sample_rows()?  Such a handler should be simpler and
> FDW-specific.  As you say, such a design requires FDWs to skip some
> records, but that would be difficult for some FDWs (e.g. twitter_fdw) which
> can't pick up sample data easily.  IMHO such a problem *must* be solved by
> the FDW itself.

The updated version enables FDW authors to just write their own
acquire_sample_rows().  At the same time, the AnalyzeForeignTable routine
is still hooked in at analyze_rel(), a higher level than
acquire_sample_rows(), as before.  That allows FDW authors to write an
AnalyzeForeignTable routine that, for foreign tables on a remote server,
asks the server for its current stats instead, as pointed out earlier by
Tom Lane.
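
To illustrate the two hook levels, here is a minimal standalone C sketch.
It is not code from the patch and does not use the PostgreSQL headers:
every type and function name in it (Stats, SampleFn, AnalyzeFn,
generic_analyze, file_sample, file_analyze, remote_analyze) is a
hypothetical stand-in, and rows are represented by plain integer ids.
The file-backed FDW supplies only a sampling callback and reuses the
generic driver, while the remote FDW overrides the whole analyze step
and adopts a row count that is assumed to come from the server.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names here are hypothetical stand-ins, not the PostgreSQL API. */

typedef struct Stats {
    double total_rows;   /* estimated number of rows in the table */
    int    sampled_rows; /* rows examined locally; 0 if stats were imported */
} Stats;

/* Low-level hook: fill `out` with at most `targrows` sampled row ids,
 * store the row-count estimate in *total_rows, return the sample size. */
typedef int (*SampleFn)(void *src, int *out, int targrows, double *total_rows);

/* High-level hook: each FDW decides how statistics are produced at all. */
typedef Stats (*AnalyzeFn)(void *src, int targrows);

/* Generic driver, standing in for do_analyze_rel(): it owns the
 * bookkeeping and delegates only the sampling to the callback. */
static Stats generic_analyze(void *src, SampleFn sample, int targrows)
{
    Stats st = {0.0, 0};
    int *rows;

    assert(targrows > 0);
    rows = malloc(sizeof(int) * (size_t) targrows);
    if (rows == NULL)
        return st;
    st.sampled_rows = sample(src, rows, targrows, &st.total_rows);
    /* a real implementation would compute per-column statistics here */
    free(rows);
    return st;
}

/* File-backed source: every row has to be read, so sample on the fly. */
typedef struct FileSource { int nrows; } FileSource;

static int file_sample(void *src, int *out, int targrows, double *total_rows)
{
    FileSource *f = src;
    int i;

    for (i = 0; i < f->nrows; i++) {
        if (i < targrows) {
            out[i] = i;                 /* fill the reservoir first */
        } else {
            /* Algorithm R: keep row i with probability targrows/(i+1).
             * rand() % n is slightly biased; good enough for a sketch. */
            int j = rand() % (i + 1);
            if (j < targrows)
                out[j] = i;
        }
    }
    *total_rows = f->nrows;
    return f->nrows < targrows ? f->nrows : targrows;
}

/* File FDW: write only the sampler and reuse the generic driver. */
static Stats file_analyze(void *src, int targrows)
{
    return generic_analyze(src, file_sample, targrows);
}

/* Remote source: the row count is assumed to be reported by the server. */
typedef struct RemoteSource { double remote_reltuples; } RemoteSource;

/* Remote FDW: skip sampling entirely and adopt the server's numbers. */
static Stats remote_analyze(void *src, int targrows)
{
    RemoteSource *r = src;
    Stats st = { r->remote_reltuples, 0 };

    (void) targrows;                    /* unused: nothing is sampled */
    return st;
}
```

A caller would hold one AnalyzeFn per FDW and invoke it without knowing
which strategy is behind it, e.g. file_analyze(&f, 100) for a FileSource
or remote_analyze(&r, 100) for a RemoteSource.  Keeping the hook at the
higher level is what leaves room for the second strategy.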

Best regards,
Etsuro Fujita
