Re: Ideas about a better API for postgres_fdw remote estimates - Mailing list pgsql-hackers
From | Stephen Frost |
---|---|
Subject | Re: Ideas about a better API for postgres_fdw remote estimates |
Date | |
Msg-id | 20200706152828.GG3125@tamriel.snowman.net Whole thread Raw |
In response to | Re: Ideas about a better API for postgres_fdw remote estimates (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Ideas about a better API for postgres_fdw remote estimates
|
List | pgsql-hackers |
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> Per postgres_fdw's get_remote_estimate(), the only data we use right now > >> is the startup_cost, total_cost, rows and width estimates from the > >> top-level Plan node. That's available immediately from the Plan tree, > >> meaning that basically *nothing* of the substantial display effort > >> expended by explain.c and ruleutils.c is of any value. So the level-zero > > > The 'display effort' you're referring to, when using JSON format with > > explain, is basically to format the results into JSON and return them- > > which is what you're suggesting this mode would do anyway, no..? > > Not hardly. Spend some time studying ruleutils.c sometime --- > reverse-compiling a plan is *expensive*. For instance, we have to > look up the names of all the operators used in the query quals, > decide what needs quoting, decide what needs parenthesization, etc. Ah, alright, that makes more sense then. > There's also a fun little bit that assigns unique aliases to each > table appearing in the query, which from memory is at least O(N^2) > and maybe worse. (Admittedly, shipped queries are usually not so > complicated that N would be large.) And by the way, we're also > starting up the executor, even if you didn't say ANALYZE. > > A little bit of fooling with "perf" suggests that when explaining > a pretty simple bitmapscan query --- I used > EXPLAIN SELECT * FROM tenk1 WHERE unique1 > 9995 > which ought to be somewhat representative of what postgres_fdw needs > --- only about half of the runtime is spent within pg_plan_query, and > the other half is spent on explain.c + ruleutils.c formatting work. > So while getting rid of that overhead wouldn't be an earthshattering > improvement, I think it'd be worthwhile. Sure. > >> Further down the road, we might want to rethink the whole idea of > >> completely constructing a concrete Plan. We could get the data we need > >> at the list-of-Paths stage. Even more interesting, we could (with very > >> little more work) return data about multiple Paths, so that the client > >> could find out, for example, the costs of sorted and unsorted output > >> without paying two network round trips to discover that. > > > I have to admit that I'm not really sure how we could make it work, but > > having a way to get multiple paths returned by EXPLAIN would certainly > > be interesting to a lot of users. Certainly it's easier to see how we > > could get at that info in a postgres_fdw-specific function, and be able > > to understand how to deal with it there and what could be done, but once > > it's there I wonder if other tools might see that and possibly even > > build on it because it'd be the only way to get that kind of info, which > > certainly wouldn't be ideal. > > Yeah, thinking about it as a function that inspects partial planner > results, it might be useful for other purposes besides postgres_fdw. > As I said before, I don't think this necessarily has to be bundled as > part of postgres_fdw. That still doesn't make it part of EXPLAIN. Providing it as a function rather than through EXPLAIN does make a bit more sense if we're going to skip things like the lookups you mention above. I'm still inclined to have it be a part of core rather than having it as postgres_fdw though. I'm not completely against it being part of postgres_fdw... but I would think that would really be appropriate if it's actually using something in postgres_fdw, but if everything that it's doing is part of core and nothing related specifically to the postgres FDW, then having it as part of core makes more sense to me. Also, having it as part of core would make it more appropriate for other tools to look at and adding that kind of inspection capability for partial planner results could be very interesting for tools like pgAdmin and such. > > That postgres_fdw is an extension is almost as much of a wart as > > anything being discussed here and suggesting that things added to > > postgres_fdw aren't 'core features' seems akin to ignoring the forest > > for the trees- > > I think we just had this discussion in another thread. The fact that > postgres_fdw is an extension is a feature, not a bug, because (a) it > means that somebody could implement their own version if they wanted > it to act differently; and (b) it keeps us honest about whether the > APIs needed by an FDW are accessible from outside core. I think moving > postgres_fdw into core would be a large step backwards. I'm not looking to change it today, as that ship has sailed, but while having FDWs as a general capability that can be implemented by extensions is certainly great and I'd love to see more of that (even better would be more of those that are well maintained and cared for by this community of folks), requiring users to install an extension into every database where they want to query another PG server from isn't a feature. Thanks, Stephen
Attachment
pgsql-hackers by date: