Re: WIP: Collecting statistics on CSV file data - Mailing list pgsql-hackers
| From | Etsuro Fujita |
|---|---|
| Subject | Re: WIP: Collecting statistics on CSV file data |
| Date | |
| Msg-id | 4EE74C5F.3000309@lab.ntt.co.jp Whole thread Raw |
| In response to | Re: WIP: Collecting statistics on CSV file data (Shigeru Hanada <shigeru.hanada@gmail.com>) |
| Responses |
Re: WIP: Collecting statistics on CSV file data
Re: WIP: Collecting statistics on CSV file data |
| List | pgsql-hackers |
(2011/12/12 19:33), Shigeru Hanada wrote:
> (2011/12/09 21:16), Etsuro Fujita wrote:
>> I updated the patch. Please find attached a patch.
>
> I've examined v5 patch, and got reasonable EXPLAIN results which reflect
> collected statistics! As increasing STATISTICS option, estimated rows
> become better. Please see attached stats_*.txt for what I
> tested.
>
> stats_none.txt : before ANALYZE
> stats_100.txt : SET STATISTICS = 100 for all columns, and ANALYZE
> stats_10000.txt : SET STATISTICS = 10000 for all columns, and ANALYZE
>
> I think that this patch become ready for committer after some
> minor corrections:
>
> Couldn't set n_distinct
> =======================
> I couldn't set n_distinct to columns of foreign tables. With some
> research, I noticed that ATSimplePermissions should accept
> ATT_FOREIGN_TABLE too for that case. In addition, regression tests for
> ALTER FOREIGN TABLE should be added to detect this kind of problem.
>
> Showing stats target
> ====================
> We can see stats target of ordinary tables with \d+, but it is not
> available for foreign tables. I think "Stats target" column should be
> added even though output of \d+ for foreign tables become wider. One
> possible idea is to remove useless "Storage" column instead, but views
> have that column even though their source could come from foreign tables.
>
> Please see attached patch for these two items.
>
> Comments of FdwRoutine
> ======================
> Mention about optional handler is obsolete. We should clearly say
> AnalyzeForeignTable is optional (can be set to NULL) and rest are
> required. IMO separating them with comment would help FDW authors to
> understand requirements, e.g.:
>
> typedef struct FdwRoutine
> {
> NodeTag type;
>
> /*
> * These Handlers are required to execute simple scan on a foreign
> * table. If any of them was set to NULL, scan on a foreign table
> * managed by such FDW would fail.
> */
> PlanForeignScan_function PlanForeignScan;
> ExplainForeignScan_function ExplainForeignScan;
> BeginForeignScan_function BeginForeignScan;
> IterateForeignScan_function IterateForeignScan;
> ReScanForeignScan_function ReScanForeignScan;
> EndForeignScan_function EndForeignScan;
>
> /*
> * Handlers below are optional. You can set any of them to
> * NULL to tell PostgreSQL that the FDW doesn't have the
> * capability.
> */
> AnalyzeForeignTable_function AnalyzeForeignTable;
> } FdwRoutine;
>
> Code formatting
> ===============
> Some code lines go past 80 columns.
>
> Message style
> =============
> The terms 'cannot support option "n_distinct"...' used in
> ATPrepSetOptions seems little unusual in PostgreSQL. Should we say
> 'cannot set "n_distinct_inherited" for foreign tables' for that case?
>
> Typo
> ====
> Typo "spcify" is in document of analyze.
Thank you for your effectiveness experiments and proposals for
improvements. I updated the patch according to your proposals.
Attached is the updated version of the patch.
Best regards,
Etsuro Fujita
> Regards,
>
>
>
>
Attachment
pgsql-hackers by date: