Re: WIP: Collecting statistics on CSV file data - Mailing list pgsql-hackers

From Shigeru Hanada
Subject Re: WIP: Collecting statistics on CSV file data
Date
Msg-id 4EE5D862.1080809@gmail.com
Whole thread Raw
In response to Re: WIP: Collecting statistics on CSV file data  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Responses Re: WIP: Collecting statistics on CSV file data  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
List pgsql-hackers
(2011/12/09 21:16), Etsuro Fujita wrote:
> I updated the patch.  Please find attached a patch.

I've examined v5 patch, and got reasonable EXPLAIN results which reflect
collected statistics!  As increasing STATISTICS option, estimated rows
become better.  Please see attached stats_*.txt for what I
tested.

  stats_none.txt  : before ANALYZE
  stats_100.txt   : SET STATISTICS = 100 for all columns, and ANALYZE
  stats_10000.txt : SET STATISTICS = 10000 for all columns, and ANALYZE

I think that this patch become ready for committer after some
minor corrections:

Couldn't set n_distinct
=======================
I couldn't set n_distinct to columns of foreign tables.  With some
research, I noticed that ATSimplePermissions should accept
ATT_FOREIGN_TABLE too for that case.  In addition, regression tests for
ALTER FOREIGN TABLE should be added to detect this kind of problem.

Showing stats target
====================
We can see stats target of ordinary tables with \d+, but it is not
available for foreign tables.  I think "Stats target" column should be
added even though output of \d+ for foreign tables become wider.  One
possible idea is to remove useless "Storage" column instead, but views
have that column even though their source could come from foreign tables.

Please see attached patch for these two items.

Comments of FdwRoutine
======================
Mention about optional handler is obsolete.  We should clearly say
AnalyzeForeignTable is optional (can be set to NULL) and rest are
required.  IMO separating them with comment would help FDW authors to
understand requirements, e.g.:

typedef struct FdwRoutine
{
    NodeTag     type;

    /*
     * These Handlers are required to execute simple scan on a foreign
     * table.  If any of them was set to NULL, scan on a foreign table
     * managed by such FDW would fail.
     */
    PlanForeignScan_function PlanForeignScan;
    ExplainForeignScan_function ExplainForeignScan;
    BeginForeignScan_function BeginForeignScan;
    IterateForeignScan_function IterateForeignScan;
    ReScanForeignScan_function ReScanForeignScan;
    EndForeignScan_function EndForeignScan;

    /*
     * Handlers below are optional.  You can set any of them to
     * NULL to tell PostgreSQL that the FDW doesn't have the
     * capability.
     */
    AnalyzeForeignTable_function AnalyzeForeignTable;
} FdwRoutine;

Code formatting
===============
Some code lines go past 80 columns.

Message style
=============
The terms 'cannot support option "n_distinct"...' used in
ATPrepSetOptions seems little unusual in PostgreSQL.  Should we say
'cannot set "n_distinct_inherited" for foreign tables' for that case?

Typo
====
Typo "spcify" is in document of analyze.

Regards,
--
Shigeru Hanada


Attachment

pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: Arithmetic operators for macaddr type
Next
From: Mark Cave-Ayland
Date:
Subject: Re: [PATCH] PostgreSQL fails to build with 32bit MinGW-w64