Re: AW: More Performance - Mailing list pgsql-hackers

From Tom Lane
Subject Re: AW: More Performance
Date
Msg-id 7488.959183094@sss.pgh.pa.us
Whole thread Raw
In response to AW: More Performance  (Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at>)
List pgsql-hackers
Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:
> There was lots of discussion about this issue, and I was one of those
> who are 100% with you. The result was that Tom Lane tried to provide
> defaults, that do use the indexes when stats are missing. This does work
> quite well, only it seems to fail in this particular case.
> Imho we should look at exactly this query and try to find why it ignores 
> the index, because it should not.

IIRC, the issue was that he had done VACUUM but not VACUUM ANALYZE,
with the result that the planner was working with default selectivity
estimates but non-default knowledge of the table size.

The default selectivity estimate for '=' is 0.01 (ie, 1% of the table
is expected to match).  The MySQL benchmark creates a table that
has 300,000 rows and occupies about 3000 disk pages (24Mb).

With the current cost model for indexscans, the planner does not believe
that an indexscan is a good way to select 1% of the data in this table
--- and I think it's probably right, if you assume that the matching
rows are randomly distributed.  Since there are about 100 rows per disk
block, there is going to be about one matching row per block, implying
that the indexscan is going to have to visit most of the table's pages
anyway.  It's faster to do a seqscan because sequential reads are way
faster than random reads in a typical Unix environment.

We could ensure that the planner still picks an indexscan as the known
table size grows by reducing the default selectivity estimate a little
bit (I experimented and determined that 0.005 would do the trick, for
the current cost model parameters).  That's pretty ad-hoc, but then
the 0.01 number is pretty ad-hoc too.  It's probably better to be able
to say "if you haven't done VACUUM ANALYZE, you will get an indexscan
from WHERE col = const" than to have to say "it depends".  Comments?

Of course the real issue here is that the true selectivity of '=' is
much smaller in this table, because the column being looked at is
unique.  But the planner doesn't know that without VACUUM stats.

A hack I have been thinking about adding is that CREATE UNIQUE INDEX
for a single-column index should immediately force the attdisbursion
value for that column to "unique", so that the planner would know the
column is unique even without VACUUM ANALYZE.  That would help not
at all for the MySQL benchmark (it has a two-column unique index,
but you can't conclude anything about the properties of either column
individually from that :-().  But it'd help in a lot of real-world
scenarios.

Comments anyone?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: [GENERAL] Re: Postgresql OO Patch
Next
From: Vince Vielhaber
Date:
Subject: Re: Re: interactive pgsql book