Re: Ideas about a better API for postgres_fdw remote estimates - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Ideas about a better API for postgres_fdw remote estimates
Date
Msg-id 20200706152828.GG3125@tamriel.snowman.net
Whole thread Raw
In response to Re: Ideas about a better API for postgres_fdw remote estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Ideas about a better API for postgres_fdw remote estimates
List pgsql-hackers
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> Per postgres_fdw's get_remote_estimate(), the only data we use right now
> >> is the startup_cost, total_cost, rows and width estimates from the
> >> top-level Plan node.  That's available immediately from the Plan tree,
> >> meaning that basically *nothing* of the substantial display effort
> >> expended by explain.c and ruleutils.c is of any value.  So the level-zero
>
> > The 'display effort' you're referring to, when using JSON format with
> > explain, is basically to format the results into JSON and return them-
> > which is what you're suggesting this mode would do anyway, no..?
>
> Not hardly.  Spend some time studying ruleutils.c sometime ---
> reverse-compiling a plan is *expensive*.  For instance, we have to
> look up the names of all the operators used in the query quals,
> decide what needs quoting, decide what needs parenthesization, etc.

Ah, alright, that makes more sense then.

> There's also a fun little bit that assigns unique aliases to each
> table appearing in the query, which from memory is at least O(N^2)
> and maybe worse.  (Admittedly, shipped queries are usually not so
> complicated that N would be large.)  And by the way, we're also
> starting up the executor, even if you didn't say ANALYZE.
>
> A little bit of fooling with "perf" suggests that when explaining
> a pretty simple bitmapscan query --- I used
>     EXPLAIN SELECT * FROM tenk1 WHERE unique1 > 9995
> which ought to be somewhat representative of what postgres_fdw needs
> --- only about half of the runtime is spent within pg_plan_query, and
> the other half is spent on explain.c + ruleutils.c formatting work.
> So while getting rid of that overhead wouldn't be an earthshattering
> improvement, I think it'd be worthwhile.

Sure.

> >> Further down the road, we might want to rethink the whole idea of
> >> completely constructing a concrete Plan.  We could get the data we need
> >> at the list-of-Paths stage.  Even more interesting, we could (with very
> >> little more work) return data about multiple Paths, so that the client
> >> could find out, for example, the costs of sorted and unsorted output
> >> without paying two network round trips to discover that.
>
> > I have to admit that I'm not really sure how we could make it work, but
> > having a way to get multiple paths returned by EXPLAIN would certainly
> > be interesting to a lot of users.  Certainly it's easier to see how we
> > could get at that info in a postgres_fdw-specific function, and be able
> > to understand how to deal with it there and what could be done, but once
> > it's there I wonder if other tools might see that and possibly even
> > build on it because it'd be the only way to get that kind of info, which
> > certainly wouldn't be ideal.
>
> Yeah, thinking about it as a function that inspects partial planner
> results, it might be useful for other purposes besides postgres_fdw.
> As I said before, I don't think this necessarily has to be bundled as
> part of postgres_fdw.  That still doesn't make it part of EXPLAIN.

Providing it as a function rather than through EXPLAIN does make a bit
more sense if we're going to skip things like the lookups you mention
above.  I'm still inclined to have it be a part of core rather than
having it as postgres_fdw though.  I'm not completely against it being
part of postgres_fdw... but I would think that would really be
appropriate if it's actually using something in postgres_fdw, but if
everything that it's doing is part of core and nothing related
specifically to the postgres FDW, then having it as part of core makes
more sense to me.  Also, having it as part of core would make it more
appropriate for other tools to look at and adding that kind of
inspection capability for partial planner results could be very
interesting for tools like pgAdmin and such.

> > That postgres_fdw is an extension is almost as much of a wart as
> > anything being discussed here and suggesting that things added to
> > postgres_fdw aren't 'core features' seems akin to ignoring the forest
> > for the trees-
>
> I think we just had this discussion in another thread.  The fact that
> postgres_fdw is an extension is a feature, not a bug, because (a) it
> means that somebody could implement their own version if they wanted
> it to act differently; and (b) it keeps us honest about whether the
> APIs needed by an FDW are accessible from outside core.  I think moving
> postgres_fdw into core would be a large step backwards.

I'm not looking to change it today, as that ship has sailed, but while
having FDWs as a general capability that can be implemented by
extensions is certainly great and I'd love to see more of that (even
better would be more of those that are well maintained and cared for by
this community of folks), requiring users to install an extension into
every database where they want to query another PG server from isn't a
feature.

Thanks,

Stephen

Attachment

pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: bad status of FETCH PERCENT in commitfest application
Next
From: Alvaro Herrera
Date:
Subject: Re: min_safe_lsn column in pg_replication_slots view