Thread: AW: AW: More Performance

AW: AW: More Performance

From
Zeugswetter Andreas SB
Date:
> We could ensure that the planner still picks an indexscan as the known
> table size grows by reducing the default selectivity estimate a little
> bit (I experimented and determined that 0.005 would do the trick, for
> the current cost model parameters).  That's pretty ad-hoc, but then
> the 0.01 number is pretty ad-hoc too.  It's probably better to be able
> to say "if you haven't done VACUUM ANALYZE, you will get an indexscan
> from WHERE col = const" than to have to say "it depends".  Comments?

Imho the initial goal why we said 0.01, was to make it use the index,
so reducing it to 0.005 would be ok. I would actually try to calculate 
the value with the current costs for index vs seq scan, so that it
guarantees
use of the index regardless of table size.
But, it probably shows a problem with the chosen metric for selectivity
itself.
Imho the chances are better, that an = restriction will return an equal
amount 
of rows while the table grows than that it will return a percentage of total
table size.

> 
> Of course the real issue here is that the true selectivity of '=' is
> much smaller in this table, because the column being looked at is
> unique.  But the planner doesn't know that without VACUUM stats.
> 
> A hack I have been thinking about adding is that CREATE UNIQUE INDEX
> for a single-column index should immediately force the attdisbursion
> value for that column to "unique", so that the planner would know the
> column is unique even without VACUUM ANALYZE.  That would help not
> at all for the MySQL benchmark (it has a two-column unique index,
> but you can't conclude anything about the properties of either column
> individually from that :-().  But it'd help in a lot of real-world
> scenarios.

Yes, that would imho be a real winner. 
For the multi column index we would need some magic that actually notices 
that all index columns are restricted with =. 

Andreas


Re: AW: AW: More Performance

From
Tom Lane
Date:
Zeugswetter Andreas SB <ZeugswetterA@Wien.Spardat.at> writes:
> But, it probably shows a problem with the chosen metric for
> selectivity itself.  Imho the chances are better, that an =
> restriction will return an equal amount of rows while the table grows
> than that it will return a percentage of total table size.

Unfortunately you are allowing your thinking to be driven by a single
example.  Consider queries likeselect * from employees where dept = 'accounting';
It's perfectly possible that the column being tested with '=' has only
a small number of distinct values, in which case the number of retrieved
rows probably *is* proportional to the table size.

I am not willing to change the planner so that it "guarantees" to choose
an indexscan no matter what, because then it would be broken for cases
like this.  We have to look at the statistics we have, inadequate though
they are.
        regards, tom lane