Re: estimation problems for DISTINCT ON with FDW - Mailing list pgsql-hackers

From Tom Lane
Subject Re: estimation problems for DISTINCT ON with FDW
Date
Msg-id 1116564.1593813043@sss.pgh.pa.us
Whole thread Raw
In response to Re: estimation problems for DISTINCT ON with FDW  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Responses Re: estimation problems for DISTINCT ON with FDW
List pgsql-hackers
Etsuro Fujita <etsuro.fujita@gmail.com> writes:
> On Thu, Jul 2, 2020 at 11:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We could perhaps try to make our own estimate of the selectivity of the
>> shippable quals and then back into #1 from the value we got for #2 from
>> the remote server.

> Actually, that is what I suggested:
> ... By "the baserestrictinfo clauses", I mean the shippable clauses as
> well as the non-shippable clauses.  Since baserel->rows stores the
> rows estimate returned by estimate_path_cost_size(), which is #3, this
> estimates #1.

Ah.  That isn't a number we compute in this code path at the moment,
but you're right that we could do so.  However ...

>> But that sounds mighty error-prone, so I doubt it'd
>> make for much of an improvement.

> I have to admit the error-proneness.

... that is the crux of the problem.  The entire reason why we're
expending all these cycles to get a remote estimate is that we don't
trust the local estimate of the shippable quals' selectivity to be
any good.  So relying on it anyway doesn't seem very smart, even if
it's for the usually-not-too-important purpose of estimating the
total table size.

I suppose there is one case where this approach could win: if the
local selectivity estimate is just fine, but the remote table size has
changed a lot since we last did an ANALYZE, then this would give us a
decent table size estimate with no additional remote traffic.  But
that doesn't really seem like a great bet --- if the table size has
changed that much, our local stats are probably obsolete too.

I wonder whether someday we ought to invent a new API that's more
suited to postgres_fdw's needs than EXPLAIN is.  It's not like the
remote planner doesn't know the number we want; it just fails to
include it in EXPLAIN.

>> In any case, the proposal I'm making is just to add a sanity-check
>> clamp to prevent the worst effects of not setting rel->tuples sanely.
>> It doesn't foreclose future improvements inside the FDW.

> Agreed.

OK, I'll go ahead and push the patch I proposed yesterday.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Parallell hashjoin sometimes ignores temp_tablespaces
Next
From: Tom Lane
Date:
Subject: Ideas about a better API for postgres_fdw remote estimates