Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Wed, Mar 14, 2018 at 2:56 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> Is there any good way to make the regression tests fail if the plan reverts
>> to the bad one? The only thing I can think of would be to make the table
>> bigger so the regression test becomes "noticeably slower", but that is
>> pretty vague and not very friendly: the test would formally pass and we'd
>> just have to hope it is slow enough for someone to investigate.
> I can't think of a good way. I guess it can still pick a nested loop
> if it thinks there'll only be a couple of loops. This patch tells it
> to pay attention to the total cost, not the startup cost, so as soon
> as it thinks there is more than a handful of rows the quadratic cost
> will exceed the sort/merge's O(n log n) cost.
Right. After further thought, the key point here is that in non-error
cases the query will produce no rows, meaning that it must be executed
to completion before we can be sure of that. But applying a LIMIT
encourages the planner to pick a fast-start (slow-completion) plan,
which is not going to be what we want. If in fact the query were going
to produce a lot of rows, and the planner could accurately predict that,
then maybe a LIMIT would be useful --- but there's no hope of estimates
on wholerowvar *= wholerowvar being accurate any time soon, let alone
correctly handling the correlation with ctid <> ctid. So the LIMIT
is just an invitation to trouble and we may as well remove it.
Committed that way. I also changed EXISTS(SELECT * ...) to
EXISTS(SELECT 1 ...), in hopes of saving a few microseconds of
parsing effort.
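For anyone following along, the check is roughly of this shape (my
reconstruction; the table name and exact spelling are illustrative,
not the committed text):

    -- Look for two physically distinct rows (different ctids) whose
    -- contents are binary-identical under the record image operator.
    -- In the non-error case this returns nothing, so the join has to
    -- run to completion, which is exactly where a fast-start plan loses.
    SELECT 1
    FROM newdata t1
    WHERE EXISTS (SELECT 1
                  FROM newdata t2
                  WHERE t2 OPERATOR(pg_catalog.*=) t1
                    AND t2.ctid OPERATOR(pg_catalog.<>) t1.ctid);

With no LIMIT, the planner costs the whole join rather than a
fast-start fraction of it.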
> Since I've had hash joins on the mind recently I couldn't help
> noticing that you can't get a hash join out of this query's "record
> image" based join qual (or even a regular row-based =).
Yeah, because there's no hash support for recordimage. So there's even
less reason to be worried about how smart the planner is for this query:
basically, it might do a nestloop for a very small number of rows, but
otherwise it's gonna have to go for a merge join.
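Anyone curious can see that from the catalogs: the record image
equality operator is only wired into a btree operator family, so an
ad-hoc query like this (not part of the patch) should show btree
entries and nothing for hash:

    -- Which index methods / operator families know about *= ?
    SELECT am.amname, opf.opfname, op.oprname
    FROM pg_amop ao
    JOIN pg_operator op ON op.oid = ao.amopopr
    JOIN pg_opfamily opf ON opf.oid = ao.amopfamily
    JOIN pg_am am ON am.oid = opf.opfmethod
    WHERE op.oprname = '*=';

Without a hash family there's no hash join, leaving nestloop and
merge join as the only candidates.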
My earlier thought that we might be able to skip the ANALYZE step
seems wrong, though. It's true that it does little for this query,
but the follow-on query to build a diff table can probably make
good use of the stats.
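A sketch with made-up table and column names, just to illustrate where
the stats pay off:

    -- Stats on the freshly-built transient table do little for the
    -- duplicate check above, but they matter for the next step.
    ANALYZE newdata;

    -- The follow-on query joins new contents against old to build a
    -- diff table; with row counts and ndistinct available, the planner
    -- can pick a sane join method and order instead of guessing.
    CREATE TEMP TABLE diffdata AS
    SELECT id, o.val AS old_val, n.val AS new_val
    FROM newdata n
    FULL JOIN olddata o USING (id)
    WHERE n.val IS DISTINCT FROM o.val;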
regards, tom lane