Ideas about a better API for postgres_fdw remote estimates - Mailing list pgsql-hackers

From Tom Lane
Subject Ideas about a better API for postgres_fdw remote estimates
Date
Msg-id 1155731.1593832096@sss.pgh.pa.us
Whole thread Raw
Responses Re: Ideas about a better API for postgres_fdw remote estimates
List pgsql-hackers
In <1116564.1593813043@sss.pgh.pa.us> I wrote:
> I wonder whether someday we ought to invent a new API that's more
> suited to postgres_fdw's needs than EXPLAIN is.  It's not like the
> remote planner doesn't know the number we want; it just fails to
> include it in EXPLAIN.

I've been thinking about this a little more, and I'd like to get some
ideas down on electrons before they vanish.

The current method for postgres_fdw to obtain remote estimates is to
issue an EXPLAIN command to the remote server and then decipher the
result.  This has just one big advantage, which is that it works
against existing, even very old, remote PG versions.  In every other
way it's pretty awful: it involves a lot of cycles on the far end
to create output details we don't really care about, it requires a
fair amount of logic to parse that output, and we can't get some
details that we *do* care about (such as the total size of the foreign
table, as per the other discussion).

We can do better.  I don't propose removing the existing logic, because
being able to work against old remote PG versions seems pretty useful.
But we could probe at connection start for whether the remote server
has support for a better way, and then use that way if available.

What should the better way look like?  I suggest the following:

* Rather than adding a core-server feature, the remote-end part of the new
API should be a function installed by an extension (either postgres_fdw
itself, or a new extension "postgres_fdw_remote" or the like).  One
attraction of this approach is that it'd be conceivable to back-port the
new code into existing PG releases by updating the extension.  Also
there'd be room for multiple versions of the support.  The
connection-start probe could be of the form "does this function exist
in pg_proc?".

* I'm imagining the function being of the form

    function pg_catalog.postgres_fdw_support(query text) returns something

where the input is still the text of a query we're considering issuing,
and the output is some structure that contains the items of EXPLAIN-like
data we need, but not the items we don't.  The implementation of the
function would run the query through parse/plan, then pick out the
data we want and return that.

* We could do a lot worse than to have the "structure" be JSON.
This'd allow structured, labeled data to be returned; it would not be
too difficult to construct, even in PG server versions predating the
addition of JSON logic to the core; and the receiving postgres_fdw
extension could use the core's JSON logic to parse the data.

* The contents of the structure need to be designed with forethought
for extensibility, but this doesn't seem hard if it's all a collection
of labeled fields.  We can just say that the recipient must ignore
fields it doesn't recognize.  Once a given field has been defined, we
can't change its contents, but we can introduce new fields as needed.
Note that I would not be in favor of putting an overall version number
within the structure; that's way too coarse-grained.

I'm not planning to do anything about these ideas myself, at least
not in the short term.  But perhaps somebody else would like to
run with them.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: estimation problems for DISTINCT ON with FDW
Next
From: Pavel Stehule
Date:
Subject: Re: track_planning causing performance regression