Re: Import Statistics in postgres_fdw before resorting to sampling. - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Import Statistics in postgres_fdw before resorting to sampling.
Date
Msg-id CADkLM=cEZYVP4_36ZdOT+rvj+pgA3LDrukiaczJTH3Pm+Yauug@mail.gmail.com
Whole thread
In response to Re: Import Statistics in postgres_fdw before resorting to sampling.  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Responses Re: Import Statistics in postgres_fdw before resorting to sampling.
List pgsql-hackers
* I proposed the StatisticsAreImportable callback function, but the
ImportStatistics callback function is now allowed to fall back to
analyzing, so I don't think StatisticsAreImportable has much value
anymore.  I'll withdraw my proposal.  Attached is an updated patch
removing it, in which 0001 is merged into 0002 for ease of review
(0001 can't be tested without 0002).

Thanks.


* I modified analyze.c to start progress monitor a bit earlier to
monitor the work of ImportStatistics (and AnalyzeForeignTable) as part
of initialization.

* I noticed this odd behavior:

create table t1 (id int, data text);
create foreign table ft1 (id int, data text) server loopback options
(table_name 't1');

alter foreign table ft1 options (add fetch_stats 'false');
analyze ft1 (ctid);
ERROR:  column "ctid" of relation "ft1" does not exist

Looks good, but:

alter foreign table ft1 options (set fetch_stats 'true');
analyze ft1 (ctid);
NOTICE:  remote table public.t1 has no statistics to import
HINT:  Falling back to ANALYZE with regular row sampling.
ERROR:  column "ctid" of relation "ft1" does not exist

postgres_fdw is doing the remote access without any need to do so,
which isn't good.

The cause is that the patches fail to validate the va_cols list given
to analyze_rel before importing stats.  To fix, I modified analyze.c
to do so before calling ImportStatistics.  I think this is also useful
when analyzing inheritance trees, as it avoids validating the list
repeatedly in that case.

I see this change and I like it.
 
* I separated a bit of code in examine_attribute that checks whether
the given column is analyzable into a new extern function so that FDWs
can use it.

Interesting. I wouldn't have dared make a change that affected regular analyze, but it makes sense.

postgres_fdw side:

* In fetch_remote_statistics, if we get reltuples=0 for v14 or later,
I think we should update only the relation stats with that info, and
avoid resorting to analyzing, for efficiency, as I proposed before.  I
modified that function (and import_fetched_statistics) that way.

This will miss out on the case where the remote table did get analyzed once, when empty, but now isn't empty. I realize that shouldn't happen very often, but the cost of rowsampling a table that is empty is very low.
 
The root cause is that passing NULL to pg_restore_attribute_stats
leaves the stats unchanged.  To fix, I modified
import_fetched_statistics to do pg_clear_attribute_stats before the
restore function.

+1
 
* The matching logic in match_attrmap seems correct to me, but I think
it's still complicated than necessary: you are doing a merge join of
the RemoteAttributeMapping array and the attstats set by placing the
latter in the outer side, but it makes it simpler to do the join the
opposite way (ie, the former in the outer side), like the attached,
which doesn't need an inner for-loop anymore, as the inner side (ie,
the attstats set) consists of *distinct* columns.

The new match_attrmap seems reasonable.
 
* I moved table_has_extended_stats to extended_stats.c and renamed it
to match other functions in it so that other FDWs too can use it.

+1
 
* The 0002 patch is using different levels (LOG, NOTICE, and WARNING)
for logging in postgresImportStatisitcs, but I think it's appropriate
to use elevel or WARNING for it.  So I modified the patch to use
WARNING.  I modified the wording as well to match analyze.c.

+1
 
* I added at the top/end of postgresImportStatisitcs log messages that
show the start/end of importing stats.


+1
 
That's it.

I see that remote_analyze didn't make it as a part of this patch. Is that something you'd repackaged as a follow-on patch, or are you just done with it?

pgsql-hackers by date:

Previous
From: Sami Imseih
Date:
Subject: Re: Refactor query normalization into core query jumbling
Next
From: Corey Huinker
Date:
Subject: Re: Import Statistics in postgres_fdw before resorting to sampling.