Re: Ideas about a better API for postgres_fdw remote estimates - Mailing list pgsql-hackers

From Andrey Lepikhov
Subject Re: Ideas about a better API for postgres_fdw remote estimates
Date
Msg-id 7a40707d-1758-85a2-7bb1-6e5775518e64@postgrespro.ru
In response to Re: Ideas about a better API for postgres_fdw remote estimates  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers

On 7/14/20 6:32 AM, Bruce Momjian wrote:
> On Mon, Jul  6, 2020 at 11:28:28AM -0400, Stephen Frost wrote:
>>> Yeah, thinking about it as a function that inspects partial planner
>>> results, it might be useful for other purposes besides postgres_fdw.
>>> As I said before, I don't think this necessarily has to be bundled as
>>> part of postgres_fdw.  That still doesn't make it part of EXPLAIN.
>>
>> Providing it as a function rather than through EXPLAIN does make a bit
>> more sense if we're going to skip things like the lookups you mention
>> above.  I'm still inclined to have it be a part of core rather than
>> having it as postgres_fdw though.  I'm not completely against it being
>> part of postgres_fdw... but I would think that would really be
>> appropriate if it's actually using something in postgres_fdw, but if
>> everything that it's doing is part of core and nothing related
>> specifically to the postgres FDW, then having it as part of core makes
>> more sense to me.  Also, having it as part of core would make it more
>> appropriate for other tools to look at and adding that kind of
>> inspection capability for partial planner results could be very
>> interesting for tools like pgAdmin and such.
> 
> I agree the statistics extraction should probably be part of core.
> There is the goal if FDWs returning data, and returning the data
> quickly.  I think we can require all-new FDW servers to get improved
> performance.  I am not even clear if we have a full understanding of the
> performance characteristics of FDWs yet.  I know Tomas did some research
> on its DML behavior, but other than that, I haven't seen much.
> 
> On a related note, I have wished to be able to see all the costs
> associated with plans not chosen, and I think others would like that as
> well.  Getting multiple costs for a query goes in that direction.
> 

During the implementation of sharding-related improvements, I noticed 
that if we use a lot of foreign partitions, we get bad plans because 
vacuum does not update the statistics of foreign tables. That is done by 
the ANALYZE command, which is a very expensive operation for a foreign 
table. The statistics problem is demonstrated by the TAP test in the 
first attached patch.

I implemented some FDW + pg core machinery to reduce the weight of the 
problem. The ANALYZE command on a foreign table executes a query on the 
foreign server that extracts the statistics tuple, serializes it into a 
JSON-formatted string, and returns it to the caller. The caller 
deserializes this string, generates statistics for the foreign table, 
and updates them. The second patch is a proof of concept.
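To make the data flow concrete, here is a minimal sketch (in Python, purely illustrative; the actual patch is C inside the backend) of the round-trip described above: the remote side packs a column's statistics into a JSON string, and the local side unpacks it to rebuild its view of the column. The field names loosely mirror pg_statistic/pg_stats but are assumptions for illustration, not the patch's wire format.

```python
import json

def serialize_stats(stats: dict) -> str:
    """Remote side: pack one column's statistics into a JSON string."""
    return json.dumps(stats)

def deserialize_stats(payload: str) -> dict:
    """Local side: unpack the JSON string back into a statistics dict,
    from which a local statistics entry could be generated."""
    return json.loads(payload)

# Hypothetical statistics for one column of a foreign table.
remote_stats = {
    "attname": "id",
    "stanullfrac": 0.0,
    "stawidth": 4,
    "stadistinct": -1.0,  # negative: distinct values scale with row count
    "most_common_vals": [],
    "histogram_bounds": [1, 2500, 5000, 7500, 10000],
}

payload = serialize_stats(remote_stats)
local_stats = deserialize_stats(payload)
assert local_stats == remote_stats  # lossless round-trip
```

The point of the JSON intermediate form is that the local server never needs to read the remote pg_statistic catalog directly, so the remote query stays cheap compared to re-sampling the table over the FDW connection.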

This patch speeds up the ANALYZE command and keeps statistics on a 
foreign table relevant after an autovacuum operation. Its effectiveness 
depends on the relevance of the statistics on the remote server, but it 
is still an improvement.

-- 
regards,
Andrey Lepikhov
Postgres Professional

