Re: estimation problems for DISTINCT ON with FDW - Mailing list pgsql-hackers

From Tom Lane
Subject Re: estimation problems for DISTINCT ON with FDW
Date
Msg-id 583486.1593614409@sss.pgh.pa.us
Whole thread Raw
In response to Re: estimation problems for DISTINCT ON with FDW  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Responses Re: estimation problems for DISTINCT ON with FDW  (Etsuro Fujita <etsuro.fujita@gmail.com>)
List pgsql-hackers
Etsuro Fujita <etsuro.fujita@gmail.com> writes:
> On Wed, Jul 1, 2020 at 7:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> +    baserel->tuples = Max(baserel->tuples, baserel->rows);

> for consistency, this should be
>   baserel->tuples = clamp_row_est(baserel->rows / sel);
> where sel is the selectivity of the baserestrictinfo clauses?

If we had the selectivity available, maybe so, but we don't.
(And even less so if we put this logic in the core code.)

Short of sending a whole second query to the remote server, it's
not clear to me how we could get the full table size (or equivalently
the target query's selectivity for that table).  The best we realistically
can do is to adopt pg_class.reltuples if there's been an ANALYZE of
the foreign table.  That case already works (and this proposal doesn't
break it).  The problem is what to do when pg_class.reltuples is zero
or otherwise badly out-of-date.

>> What I'm sort of inclined to do is neither of these exactly, but
>> instead put the
>>     baserel->tuples = Max(baserel->tuples, baserel->rows);
>> clamping behavior into the core code, immediately after the call to
>> GetForeignRelSize.  This'd still let the FDW set baserel->tuples if
>> it has a mind to, while not requiring that; and it prevents the
>> situation where the rows and tuples estimates are inconsistent.

> I'm not sure this would address the inconsistency.  Consider the
> postgres_fdw case where use_remote_estimate is true, and the stats are
> out of date, eg, baserel->tuples copied from pg_class is much larger
> than the actual tuples and hence baserel->rows (I assume here that
> postgres_fdw doesn't do anything about baserel->tuples).  In such a
> case the inconsistency would make the estimate_num_groups() estimate
> more inaccurate.  I think the consistency is the responsibility of the
> FDW rather than the core, so I would vote for the 0002 patch.  Maybe
> I'm missing something.

Nothing about this proposal is stopping the FDW from inserting a better
value for rel->tuples if it's got one.  But it's not necessarily easy
or cheap to get that info.  In any case I think that upgrading the
requirements for what GetForeignRelSize must set is a hard sell.
We certainly could not back-patch a fix that required that, and even
going forward, it seems likely that many FDWs would never get the word.
(Well, maybe we could force the issue by throwing an error if
rel->tuples < rel->rows after GetForeignRelSize, but it's not hard
to imagine that routine testing could fail to trigger such a check.)

            regards, tom lane



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: SQL-standard function body
Next
From: Laurenz Albe
Date:
Subject: Re: Remove Deprecated Exclusive Backup Mode