Ideas about a better API for postgres_fdw remote estimates - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Ideas about a better API for postgres_fdw remote estimates |
Date | |
Msg-id | 1155731.1593832096@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Ideas about a better API for postgres_fdw remote estimates
|
List | pgsql-hackers |
In <1116564.1593813043@sss.pgh.pa.us> I wrote: > I wonder whether someday we ought to invent a new API that's more > suited to postgres_fdw's needs than EXPLAIN is. It's not like the > remote planner doesn't know the number we want; it just fails to > include it in EXPLAIN. I've been thinking about this a little more, and I'd like to get some ideas down on electrons before they vanish. The current method for postgres_fdw to obtain remote estimates is to issue an EXPLAIN command to the remote server and then decipher the result. This has just one big advantage, which is that it works against existing, even very old, remote PG versions. In every other way it's pretty awful: it involves a lot of cycles on the far end to create output details we don't really care about, it requires a fair amount of logic to parse that output, and we can't get some details that we *do* care about (such as the total size of the foreign table, as per the other discussion). We can do better. I don't propose removing the existing logic, because being able to work against old remote PG versions seems pretty useful. But we could probe at connection start for whether the remote server has support for a better way, and then use that way if available. What should the better way look like? I suggest the following: * Rather than adding a core-server feature, the remote-end part of the new API should be a function installed by an extension (either postgres_fdw itself, or a new extension "postgres_fdw_remote" or the like). One attraction of this approach is that it'd be conceivable to back-port the new code into existing PG releases by updating the extension. Also there'd be room for multiple versions of the support. The connection-start probe could be of the form "does this function exist in pg_proc?". * I'm imagining the function being of the form function pg_catalog.postgres_fdw_support(query text) returns something where the input is still the text of a query we're considering issuing, and the output is some structure that contains the items of EXPLAIN-like data we need, but not the items we don't. The implementation of the function would run the query through parse/plan, then pick out the data we want and return that. * We could do a lot worse than to have the "structure" be JSON. This'd allow structured, labeled data to be returned; it would not be too difficult to construct, even in PG server versions predating the addition of JSON logic to the core; and the receiving postgres_fdw extension could use the core's JSON logic to parse the data. * The contents of the structure need to be designed with forethought for extensibility, but this doesn't seem hard if it's all a collection of labeled fields. We can just say that the recipient must ignore fields it doesn't recognize. Once a given field has been defined, we can't change its contents, but we can introduce new fields as needed. Note that I would not be in favor of putting an overall version number within the structure; that's way too coarse-grained. I'm not planning to do anything about these ideas myself, at least not in the short term. But perhaps somebody else would like to run with them. regards, tom lane
pgsql-hackers by date: