Re: what's going on here? - Mailing list pgsql-general

From Ben
Subject Re: what's going on here?
Date
Msg-id Pine.LNX.4.10.10103091349171.28803-100000@gilgamesh.eos.SilentMedia.com
Whole thread Raw
In response to Re: what's going on here?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: what's going on here?  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
Re: what's going on here?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Fri, 9 Mar 2001, Tom Lane wrote:

> Ben <bench@silentmedia.com> writes:
> > music=# explain ... where ... playlist.stream=1
> >      ->  Seq Scan on playlist (cost=0.00..300.81 rows=2321 width=20)
>
> > music=# explain ... where ... playlist.stream=2
> >      ->  Seq Scan on playlist  (cost=0.00..300.81 rows=205 width=20)
>
> I am betting that 1 is the most common value in playlist.stream, or at
> least is being chosen as the most common value by VACUUM ANALYZE's
> not-totally-accurate estimation process.  The 2321 rowcount estimate
> then falls out of the stored statistic for the most common value's
> frequency.  In the second case, the estimator knows that 2 is *not* the

Yes, stream=1 is the most common value, but 2321? There are ~7000 rows
where stream=1.

> most common value, but it has absolutely no statistical basis on which
> to guess what the frequency really is.  I think it uses 1/10th of the
> most common frequency for anything that's not the most common value
> (look in utils/adt/selfuncs.c to be sure).  There's probably also some
> contribution from the "playlist.played is null" clause, else the row
> count estimate would be exactly 1/10th as much.  However, I don't
> believe that the thing currently makes any serious effort to gauge the
> selectivity of IS NULL, which is a shame because that would critically
> affect the results here.  (You did say some thousands of rows matching
> the stream=N clause, but only a few matching IS NULL, right?)

This is not good news. Most of the reason I have an index on
playlist.played is to figure out which are null, to limit the results this
query has to look at. Is this (probably very hard to fix) shortcoming
addressed in version 7.1, by chance?

> Given the fundamental difference in this initial row count estimate,
> the large difference in the subsequent join plan structure is not too
> surprising.

This I don't follow. The stream=1 query is returning 10 times the rows -
shouldn't that make the use of an index even *more* likely?

> BTW, you didn't actually say which plan was faster.  Since the second
> one was closer to the true statistic (only a few rows returned from
> playlist), I'm hoping it was faster...

Sorry. Yes, the second one was far faster, thanks to the indices it used.


pgsql-general by date:

Previous
From: "Brent R. Matzelle"
Date:
Subject: Re: (blowfish in Postgres) Undefined symbol -- YAY!
Next
From: Stephan Szabo
Date:
Subject: Re: what's going on here?