Re: what's going on here? - Mailing list pgsql-general

From Tom Lane
Subject Re: what's going on here?
Date
Msg-id 9425.984177173@sss.pgh.pa.us
Whole thread Raw
In response to Re: what's going on here?  (Ben <bench@silentmedia.com>)
Responses Re: what's going on here?
List pgsql-general
Ben <bench@silentmedia.com> writes:
>> I am betting that 1 is the most common value in playlist.stream, or at
>> least is being chosen as the most common value by VACUUM ANALYZE's
>> not-totally-accurate estimation process.  The 2321 rowcount estimate
>> then falls out of the stored statistic for the most common value's
>> frequency.  In the second case, the estimator knows that 2 is *not* the

> Yes, stream=1 is the most common value, but 2321? There are ~7000 rows
> where stream=1.

Well, I did say not totally accurate ;-).  But [digs in code] there is
a hardwired selectivity of 0.333 for the IS NULL clause, so it's really
estimating about 6960 rows matching the stream=1 clause.  That's closer
than I'd have expected, actually, given the inaccuracies in VACUUM's
method of obtaining this estimate.  So the real problem here is the lack
of estimation for IS NULL.

> This is not good news. Most of the reason I have an index on
> playlist.played is to figure out which are null, to limit the results this
> query has to look at.

We don't currently use indexes for IS NULL at all.

> Is this (probably very hard to fix) shortcoming
> addressed in version 7.1, by chance?

No.  The fact that the planner doesn't derive an estimate could be fixed
with not too much work (but it won't happen for 7.1).  Making IS NULL an
indexable condition would affect considerably more code, and it's not
really what's making the difference here anyway --- although it would
allow us to replace the seqscan on playlist with an indexscan, which'd
be even faster.

> This I don't follow. The stream=1 query is returning 10 times the rows -
> shouldn't that make the use of an index even *more* likely?

No.  Your second query is using nested loop with inner indexscan joins.
That's fine for *small* numbers of rows returned by the outer subquery,
but it goes to hell in a handbasket for large numbers of rows.  The
planner is doing the right thing to switch to a heavier-duty plan with
more startup overhead ... or it would be if it had the right idea about
how many rows are going to come out of playlist, that is.

The real issue here is the lack of estimation of IS NULL selectivity.
I was aware that that was broken, but I don't recall having seen a
real-life example before where it really mattered.  I'll try to move
it up a notch or two on my to-do list.  Or if someone else wants to
dive in, feel free...

            regards, tom lane

pgsql-general by date:

Previous
From: "Martin A. Marques"
Date:
Subject: postgres and readline lib
Next
From: "Sam and Lisa Snow"
Date:
Subject: RE: Postgresql.org website search