Re: neqjoinsel versus "refresh materialized view concurrently" - Mailing list pgsql-hackers

From Tom Lane
Subject Re: neqjoinsel versus "refresh materialized view concurrently"
Date
Msg-id 334.1521495215@sss.pgh.pa.us
Whole thread Raw
In response to Re: neqjoinsel versus "refresh materialized view concurrently"  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Wed, Mar 14, 2018 at 2:56 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> Is there any good way to make the regression tests fail if the plan reverts
>> to the bad one?  The only thing I can think of would be to make the table
>> bigger so the regression tests becomes "noticeably slower", but that is
>> pretty vague and not user friendly to formally pass and just hope it is slow
>> enough for someone to investigate.

> I can't think of a good way.  I guess it can still pick a nested loop
> if it thinks there'll only be a couple of loops.  This patch tells it
> to pay attention to the total cost, not the startup cost, so as soon
> as it thinks there is more than a hand full of rows the quadratic cost
> will exceed the sort/merge's logarithmic cost.

Right.  After further thought, the key point here is that in non-error
cases the query will produce no rows, meaning that it must be executed
to completion before we can be sure of that.  But applying a LIMIT
encourages the planner to pick a fast-start (slow-completion) plan,
which is not going to be what we want.  If in fact the query were going
to produce a lot of rows, and the planner could accurately predict that,
then maybe a LIMIT would be useful --- but there's no hope of estimates
on wholerowvar *= wholerowvar being accurate any time soon, let alone
correctly handling the correlation with ctid <> ctid.  So the LIMIT
is just an invitation to trouble and we may as well remove it.

Committed that way.  I also changed EXISTS(SELECT * ...) to
EXISTS(SELECT 1 ...), in hopes of saving a few microseconds of
parsing effort.

> Since I've had hash joins on the mind recently I couldn't help
> noticing that you can't get a hash join out of this query's "record
> image" based join qual (or even a regular row-based =).

Yeah, because there's no hash support for recordimage.  So there's even
less reason to be worried about how smart the planner is for this query:
basically, it might do a nestloop for a very small number of rows, but
otherwise it's gonna have to go for a merge join.

My earlier thought that we might be able to skip the ANALYZE step
seems wrong, though.  It's true that it does little for this query,
but the follow-on query to build a diff table can probably make
good use of the stats.

            regards, tom lane


pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: Error detail/hint style fixup
Next
From: "David G. Johnston"
Date:
Subject: Re: Problems with Error Messages wrt Domains, Checks