Re: WIP: Collecting statistics on CSV file data - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: WIP: Collecting statistics on CSV file data
Date
Msg-id 4EE1FC38.5080707@lab.ntt.co.jp
Whole thread Raw
In response to Re: WIP: Collecting statistics on CSV file data  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Responses Re: WIP: Collecting statistics on CSV file data  (Shigeru Hanada <shigeru.hanada@gmail.com>)
List pgsql-hackers
Hi Hanada-san,

I updated the patch.  Please find attached a patch.

Best regards,
Etsuro Fujita

> (2011/11/18 21:00), Shigeru Hanada wrote:
>> (2011/11/18 16:25), Etsuro Fujita wrote:
>>> Thank you for your testing.  I updated the patch according to your
>>> comments.  Attached is the updated version of the patch.
>>
>> I'd like to share result of my review even though it's not fully
>> finished.  So far I looked from viewpoint of API design, code
>> formatting, and documentation.  I'll examine effectiveness of the patch
>> and details of implementation next week, and hopefully try writing
>> ANALYZE handler for pgsql_fdw :)
>>
>> New patch has correct format, and it could be applied to HEAD of master
>> branch cleanly.  All regression tests including those of contrib modules
>> have passed.  It contains changes of codes and regression tests related
>> to the issue, and they have enough comments.  IMO the document in this
>> patch is not enough to show how to write analyze handler to FDW authors,
>> but it can be enhanced afterward.  With this patch, FDW author can
>> provide optional ANALYZE handler which updates statistics of foreign
>> tables.  Planner would be able to generate better plan by using statistics.
>>
>>> Yes.  But in the updated version, I've refactored analyze.c a little bit
>>> to allow FDW authors to simply call do_analyze_rel().
>> <snip>
>>> The updated version enables FDW authors to just write their own
>>> acquire_sample_rows().  On the other hand, by retaining to hook
>>> AnalyzeForeignTable routine at analyze_rel(), higher level than
>>> acquire_sample_rows() as before, it allows FDW authors to write
>>> AnalyzeForeignTable routine for foreign tables on a remote server to ask
>>> the server for its current stats instead, as pointed out earlier by Tom
>>> Lane.
>>
>> IIUC, this patch offers three options to FDWs: a) set
>> AnalyzeForeignTable to NULL to indicate lack of capability, b) provide
>> AnalyzeForeignTable which calls do_analyze_rel with custom
>> sample_row_acquirer, and c) create statistics data from scratch by FDW
>> itself by doing similar things to do_analyze_rel with given argument or
>> copying statistics from remote PostgreSQL server.
>>
>> ISTM that this design is well-balanced between simplicity and
>> flexibility.  Maybe these three options would suit web-based wrappers,
>> file-based or RDBMS wrappers, and pgsql_fdw respectively.  I think that
>> adding more details of FdwRoutine, such as purpose of new callback
>> function and difference from required ones, would help FDW authors,
>> including me :)
>>
>> I have some random comments:
>>
>> - I think separated typedef of sample_acquire_rows would make codes more
>> readable.  In addition, parameters of the function should be described
>> explicitly.
>> - I couldn't see the reason why file_fdw sets ctid of sample tuples,
>> though I guess it's for Vitter's random sampling algorithm.  If every
>> FDW must set valid ctid to sample tuples, it should be mentioned in
>> document of AnalyzeForeignTable.  Exporting some functions from
>> analyze.c relates this issue?
>> - Why file_fdw skips sample tuples which have NULL value?  AFAIS
>> original acquire_sample_rows doesn't do so.
>> - Some comment lines go past 80 columns.
>> - Some headers included in file_fdw.c seems unnecessary.
>>
>> Regards,
>
>


Attachment

pgsql-hackers by date:

Previous
From: "Albe Laurenz"
Date:
Subject: Re: review: CHECK FUNCTION statement
Next
From: Heikki Linnakangas
Date:
Subject: Re: Bug in walsender when calling out to do_pg_stop_backup (and others?)