Re: WIP: Collecting statistics on CSV file data - Mailing list pgsql-hackers
From | Etsuro Fujita |
---|---|
Subject | Re: WIP: Collecting statistics on CSV file data |
Date | |
Msg-id | 4EE1FC38.5080707@lab.ntt.co.jp Whole thread Raw |
In response to | Re: WIP: Collecting statistics on CSV file data (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>) |
Responses |
Re: WIP: Collecting statistics on CSV file data
|
List | pgsql-hackers |
Hi Hanada-san, I updated the patch. Please find attached a patch. Best regards, Etsuro Fujita > (2011/11/18 21:00), Shigeru Hanada wrote: >> (2011/11/18 16:25), Etsuro Fujita wrote: >>> Thank you for your testing. I updated the patch according to your >>> comments. Attached is the updated version of the patch. >> >> I'd like to share result of my review even though it's not fully >> finished. So far I looked from viewpoint of API design, code >> formatting, and documentation. I'll examine effectiveness of the patch >> and details of implementation next week, and hopefully try writing >> ANALYZE handler for pgsql_fdw :) >> >> New patch has correct format, and it could be applied to HEAD of master >> branch cleanly. All regression tests including those of contrib modules >> have passed. It contains changes of codes and regression tests related >> to the issue, and they have enough comments. IMO the document in this >> patch is not enough to show how to write analyze handler to FDW authors, >> but it can be enhanced afterward. With this patch, FDW author can >> provide optional ANALYZE handler which updates statistics of foreign >> tables. Planner would be able to generate better plan by using statistics. >> >>> Yes. But in the updated version, I've refactored analyze.c a little bit >>> to allow FDW authors to simply call do_analyze_rel(). >> <snip> >>> The updated version enables FDW authors to just write their own >>> acquire_sample_rows(). On the other hand, by retaining to hook >>> AnalyzeForeignTable routine at analyze_rel(), higher level than >>> acquire_sample_rows() as before, it allows FDW authors to write >>> AnalyzeForeignTable routine for foreign tables on a remote server to ask >>> the server for its current stats instead, as pointed out earlier by Tom >>> Lane. >> >> IIUC, this patch offers three options to FDWs: a) set >> AnalyzeForeignTable to NULL to indicate lack of capability, b) provide >> AnalyzeForeignTable which calls do_analyze_rel with custom >> sample_row_acquirer, and c) create statistics data from scratch by FDW >> itself by doing similar things to do_analyze_rel with given argument or >> copying statistics from remote PostgreSQL server. >> >> ISTM that this design is well-balanced between simplicity and >> flexibility. Maybe these three options would suit web-based wrappers, >> file-based or RDBMS wrappers, and pgsql_fdw respectively. I think that >> adding more details of FdwRoutine, such as purpose of new callback >> function and difference from required ones, would help FDW authors, >> including me :) >> >> I have some random comments: >> >> - I think separated typedef of sample_acquire_rows would make codes more >> readable. In addition, parameters of the function should be described >> explicitly. >> - I couldn't see the reason why file_fdw sets ctid of sample tuples, >> though I guess it's for Vitter's random sampling algorithm. If every >> FDW must set valid ctid to sample tuples, it should be mentioned in >> document of AnalyzeForeignTable. Exporting some functions from >> analyze.c relates this issue? >> - Why file_fdw skips sample tuples which have NULL value? AFAIS >> original acquire_sample_rows doesn't do so. >> - Some comment lines go past 80 columns. >> - Some headers included in file_fdw.c seems unnecessary. >> >> Regards, > >
Attachment
pgsql-hackers by date: