Re: estimation problems for DISTINCT ON with FDW - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: estimation problems for DISTINCT ON with FDW
Msg-id CAPmGK15afZcgRKPHzn4oZ3aat3qjWWeMWj0nvaEwa8DXeYY7yg@mail.gmail.com
In response to Re: estimation problems for DISTINCT ON with FDW  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Jul 1, 2020 at 11:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Etsuro Fujita <etsuro.fujita@gmail.com> writes:
> > On Wed, Jul 1, 2020 at 7:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> +    baserel->tuples = Max(baserel->tuples, baserel->rows);
>
> > for consistency, this should be
> >   baserel->tuples = clamp_row_est(baserel->rows / sel);
> > where sel is the selectivity of the baserestrictinfo clauses?
>
> If we had the selectivity available, maybe so, but we don't.
> (And even less so if we put this logic in the core code.)
>
> Short of sending a whole second query to the remote server, it's
> not clear to me how we could get the full table size (or equivalently
> the target query's selectivity for that table).  The best we realistically
> can do is to adopt pg_class.reltuples if there's been an ANALYZE of
> the foreign table.  That case already works (and this proposal doesn't
> break it).  The problem is what to do when pg_class.reltuples is zero
> or otherwise badly out-of-date.

In estimate_path_cost_size(), if use_remote_estimate is true, we
adjust the rows estimate returned from the remote server by factoring
in the selectivity of the locally-checked quals.  I thought what I
proposed above would be more consistent with that.

Best regards,
Etsuro Fujita


