Home > mailing lists

Re: Import Statistics in postgres_fdw before resorting to sampling. - Mailing list pgsql-hackers

From	Corey Huinker
Subject	Re: Import Statistics in postgres_fdw before resorting to sampling.
Date	November 22 00:31:23
Msg-id	CADkLM=foU7yK1rfMzqTPiQtfEOGANAS9dHE5-Js6hEXmZiKfow@mail.gmail.com Whole thread Raw
In response to	Re: Import Statistics in postgres_fdw before resorting to sampling. (Etsuro Fujita <etsuro.fujita@gmail.com>)
List	pgsql-hackers

Tree view

Other initial comments:

The commit message says:

This is managed via two new options, fetch_stats and remote_analyze,
both are available at the server level and table level. If fetch_stats
is true, then the ANALYZE command will first attempt to fetch statistics
from the remote table and import those statistics locally.

If remote_analyze is true, and if the first attempt to fetch remote
statistics found no attribute statistics, then an attempt will be made
to ANALYZE the remote table before a second and final attempt to fetch
remote statistics.

If no statistics are found, then ANALYZE will fall back to the normal
behavior of sampling and local analysis.

I think the first step assumes that the remote stats are up-to-date;
if they aren't, it would cause a regression. (If the remote relation
is a plain table, they are likely to be up-to-date, but for example,
if it is a foreign table, it's possible that they are stale.) So how
about making it the user's responsibility to make them up-to-date? If
doing so, we wouldn't need to do the second and third steps anymore,
making the patch simple.

Obviously there is no way to know the quality/freshness of remote stats if they are found.

The analyze option was borne of feedback from other postgres hackers while brainstorming on what this option might look like. I don't think we *need* this extra option for the feature to be a success, but it's relative simplicity did make me want to put it out there to see who else liked it.

On the other hand:

This operation will only work on remote relations that can have stored
statistics: tables, partitioned tables, and materialized views. If the
remote relation is a view then remote fetching/analyzing is just wasted
effort and the user is better of setting fetch_stats to false for that
table.

I'm not sure the waste effort is acceptable; IMO, if the remote table
is a view, I think that the system should detect that in some way, and
then just do the normal ANALYZE processing.

The stats fetch query is pretty light, but I can see fetching the relkind along with the relstats, and making decisions on whether to continue from there, only applying the relstats after attrstats have been successfully applied.

That's it for now.

I'll see what I can do to make that work.

My apologies for the delayed response.

Valuable responses are worth waiting for.

pgsql-hackers by date:

From: Bruce Momjian
Date: 21 November, 23:47:46
Subject: Re: should we have a fast-path planning for OLTP starjoins?

From: Jelte Fennema-Nio
Date: 22 November, 00:39:12
Subject: Re: [PATCH] Reorganize pqcomm.h a bit

Re: Import Statistics in postgres_fdw before resorting to sampling. - Mailing list pgsql-hackers

Previous

Next