Re: WIP: Collecting statistics on CSV file data - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: WIP: Collecting statistics on CSV file data
Date
Msg-id 4EE74C5F.3000309@lab.ntt.co.jp
Whole thread Raw
In response to Re: WIP: Collecting statistics on CSV file data  (Shigeru Hanada <shigeru.hanada@gmail.com>)
Responses Re: WIP: Collecting statistics on CSV file data
Re: WIP: Collecting statistics on CSV file data
List pgsql-hackers
(2011/12/12 19:33), Shigeru Hanada wrote:
> (2011/12/09 21:16), Etsuro Fujita wrote:
>> I updated the patch.  Please find attached a patch.
>
> I've examined v5 patch, and got reasonable EXPLAIN results which reflect
> collected statistics!  As increasing STATISTICS option, estimated rows
> become better.  Please see attached stats_*.txt for what I
> tested.
>
>    stats_none.txt  : before ANALYZE
>    stats_100.txt   : SET STATISTICS = 100 for all columns, and ANALYZE
>    stats_10000.txt : SET STATISTICS = 10000 for all columns, and ANALYZE
>
> I think that this patch become ready for committer after some
> minor corrections:
>
> Couldn't set n_distinct
> =======================
> I couldn't set n_distinct to columns of foreign tables.  With some
> research, I noticed that ATSimplePermissions should accept
> ATT_FOREIGN_TABLE too for that case.  In addition, regression tests for
> ALTER FOREIGN TABLE should be added to detect this kind of problem.
>
> Showing stats target
> ====================
> We can see stats target of ordinary tables with \d+, but it is not
> available for foreign tables.  I think "Stats target" column should be
> added even though output of \d+ for foreign tables become wider.  One
> possible idea is to remove useless "Storage" column instead, but views
> have that column even though their source could come from foreign tables.
>
> Please see attached patch for these two items.
>
> Comments of FdwRoutine
> ======================
> Mention about optional handler is obsolete.  We should clearly say
> AnalyzeForeignTable is optional (can be set to NULL) and rest are
> required.  IMO separating them with comment would help FDW authors to
> understand requirements, e.g.:
>
> typedef struct FdwRoutine
> {
>      NodeTag     type;
>
>      /*
>       * These Handlers are required to execute simple scan on a foreign
>       * table.  If any of them was set to NULL, scan on a foreign table
>       * managed by such FDW would fail.
>       */
>      PlanForeignScan_function PlanForeignScan;
>      ExplainForeignScan_function ExplainForeignScan;
>      BeginForeignScan_function BeginForeignScan;
>      IterateForeignScan_function IterateForeignScan;
>      ReScanForeignScan_function ReScanForeignScan;
>      EndForeignScan_function EndForeignScan;
>
>      /*
>       * Handlers below are optional.  You can set any of them to
>       * NULL to tell PostgreSQL that the FDW doesn't have the
>       * capability.
>       */
>      AnalyzeForeignTable_function AnalyzeForeignTable;
> } FdwRoutine;
>
> Code formatting
> ===============
> Some code lines go past 80 columns.
>
> Message style
> =============
> The terms 'cannot support option "n_distinct"...' used in
> ATPrepSetOptions seems little unusual in PostgreSQL.  Should we say
> 'cannot set "n_distinct_inherited" for foreign tables' for that case?
>
> Typo
> ====
> Typo "spcify" is in document of analyze.

Thank you for your effectiveness experiments and proposals for
improvements.  I updated the patch according to your proposals.
Attached is the updated version of the patch.

Best regards,
Etsuro Fujita

> Regards,
>
>
>
>


Attachment

pgsql-hackers by date:

Previous
From: Shigeru Hanada
Date:
Subject: Re: pgsql_fdw, FDW for PostgreSQL server
Next
From: Alex Goncharov
Date:
Subject: libpq: PQcmdStatus, PQcmdTuples signatures can be painlessly improved