Re: Import Statistics in postgres_fdw before resorting to sampling. - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Import Statistics in postgres_fdw before resorting to sampling.
Date
Msg-id CADkLM=foU7yK1rfMzqTPiQtfEOGANAS9dHE5-Js6hEXmZiKfow@mail.gmail.com
Whole thread Raw
In response to Re: Import Statistics in postgres_fdw before resorting to sampling.  (Etsuro Fujita <etsuro.fujita@gmail.com>)
List pgsql-hackers

Other initial comments:

The commit message says:

    This is managed via two new options, fetch_stats and remote_analyze,
    both are available at the server level and table level. If fetch_stats
    is true, then the ANALYZE command will first attempt to fetch statistics
    from the remote table and import those statistics locally.

    If remote_analyze is true, and if the first attempt to fetch remote
    statistics found no attribute statistics, then an attempt will be made
    to ANALYZE the remote table before a second and final attempt to fetch
    remote statistics.

    If no statistics are found, then ANALYZE will fall back to the normal
    behavior of sampling and local analysis.

I think the first step assumes that the remote stats are up-to-date;
if they aren't, it would cause a regression.  (If the remote relation
is a plain table, they are likely to be up-to-date, but for example,
if it is a foreign table, it's possible that they are stale.)  So how
about making it the user's responsibility to make them up-to-date?  If
doing so, we wouldn't need to do the second and third steps anymore,
making the patch simple.

Obviously there is no way to know the quality/freshness of remote stats if they are found.

The analyze option was borne of feedback from other postgres hackers while brainstorming on what this option might look like. I don't think we *need* this extra option for the feature to be a success, but it's relative simplicity did make me want to put it out there to see who else liked it.
 

On the other hand:

    This operation will only work on remote relations that can have stored
    statistics: tables, partitioned tables, and materialized views. If the
    remote relation is a view then remote fetching/analyzing is just wasted
    effort and the user is better of setting fetch_stats to false for that
    table.

I'm not sure the waste effort is acceptable; IMO, if the remote table
is a view, I think that the system should detect that in some way, and
then just do the normal ANALYZE processing.

The stats fetch query is pretty light, but I can see fetching the relkind along with the relstats, and making decisions on whether to continue from there, only applying the relstats after attrstats have been successfully applied.

 
That's it for now.

I'll see what I can do to make that work.
 
My apologies for the delayed response.

Valuable responses are worth waiting for.

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: should we have a fast-path planning for OLTP starjoins?
Next
From: Jelte Fennema-Nio
Date:
Subject: Re: [PATCH] Reorganize pqcomm.h a bit