Re: hashjoin chosen over 1000x faster plan - Mailing list pgsql-performance

From Tom Lane
Subject Re: hashjoin chosen over 1000x faster plan
Msg-id 23650.1192048377@sss.pgh.pa.us
In response to Re: hashjoin chosen over 1000x faster plan  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: hashjoin chosen over 1000x faster plan  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-performance
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> The point I'm trying to make is that at planning time the
> pg_statistic row for this "Charge"."reopHistSeqNo" column showed
> stanullfrac as 0.989; it doesn't seem to have taken this into account
> when making its guess about how many rows would be joined when it was
> compared to the primary key column of the "CaseHist" table.

It certainly does take nulls into account, but the estimate of resulting
rows was still nonzero; and even if it were zero, I'd be very hesitant
to make it choose a plan that is fast only if there were exactly zero
such rows and is slow otherwise.  Most of the complaints we've had about
issues of this sort involve the opposite problem, i.e., the planner is
choosing a plan that works well for few rows but falls down because
reality involves many rows.  "Fast-for-few-rows" plans are usually a lot
more brittle than the alternatives in terms of the penalty you pay for
too many rows, and so putting a thumb on the scales to push it towards a
"fast" corner case sounds pretty unsafe to me.
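
The asymmetry described above can be sketched with a toy cost model.
This is only an illustration under stated assumptions: the function
names, the cost constants, and the row counts are all hypothetical, and
none of it reproduces PostgreSQL's real cost functions or the actual
plans from this thread. It models a nested loop as nearly free to start
but expensive per matching row, and a hash join as paying a fixed build
cost followed by cheap probes.

```python
# Toy cost model. Every constant here is an illustrative assumption,
# not a PostgreSQL cost parameter and not a number from this thread.

def estimated_join_rows(outer_rows: float, null_frac: float) -> float:
    """Outer rows that can match a unique key; NULLs never join."""
    return outer_rows * (1.0 - null_frac)


def nested_loop_cost(matching_rows: float,
                     startup: float = 1.0,
                     per_probe: float = 4.0) -> float:
    """One index probe per matching row: tiny startup, steep slope."""
    return startup + matching_rows * per_probe


def hash_join_cost(matching_rows: float,
                   inner_rows: float,
                   per_row: float = 0.01) -> float:
    """Hash the whole inner side once, then probe it cheaply."""
    return inner_rows * per_row + matching_rows * per_row


def regret(actual_rows: float, inner_rows: float) -> dict:
    """Extra cost of each plan over the best plan at the actual row count."""
    costs = {
        "nested_loop": nested_loop_cost(actual_rows),
        "hash_join": hash_join_cost(actual_rows, inner_rows),
    }
    best = min(costs.values())
    return {plan: cost - best for plan, cost in costs.items()}


if __name__ == "__main__":
    # 0.989 is the stanullfrac quoted above; 100_000 outer rows and
    # 1_000_000 inner rows are made-up sizes for the illustration.
    print(estimated_join_rows(100_000, 0.989))
    for rows in (0, 1_000, 1_000_000):
        print(rows, regret(rows, inner_rows=1_000_000))
```

With these made-up constants, the hash join's excess cost can never
exceed its build cost, while the nested loop's excess cost keeps growing
with the number of matching rows. That is the sense in which the
"fast-for-few-rows" plan is the more brittle choice when the row
estimate turns out to be too low.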

As Simon notes, the only technically sound way to handle this would
involve run-time plan changeover, which is something we're not nearly
ready to tackle.

            regards, tom lane
