Thread: Re: [PATCHES] Minimum selectivity estimate for LIKE 'prefix%'

Re: [PATCHES] Minimum selectivity estimate for LIKE 'prefix%'

From
Peter Eisentraut
Date:
Am Donnerstag, 6. März 2008 schrieb Tom Lane:
> What I propose doing about this is a small variant on Peter's original
> suggestion: compute the estimated selectivity for
>         col = 'prefix'
> and clamp the result of prefix_selectivity to be at least that.

OK, first results with this patch are in: The selectivity estimations are 
adjusted nicely, but the cost calculation doesn't change at all.  Before:

Index Scan using foo_idx_3 on foo foo (cost=0.01..6.03 rows=1 width=8)

After:

Index Scan using foo_idx_3 on foo foo (cost=0.01..6.03 rows=627 width=8)

How is that possible?

Btw., the corresponding query plan for the LIKE 'constant' case is:

Index Scan using foo_idx_3 on foo foo (cost=0.00..2527.84 rows=627 width=8)

This is what we had hoped to get in the "after" case.


Re: [PATCHES] Minimum selectivity estimate for LIKE 'prefix%'

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> OK, first results with this patch are in: The selectivity estimations are 
> adjusted nicely, but the cost calculation doesn't change at all.  Before:

I've forgotten the context ... what's the whole query and plan again?
And which PG version exactly?
        regards, tom lane


Re: [PATCHES] Minimum selectivity estimate for LIKE 'prefix%'

From
Peter Eisentraut
Date:
Am Montag, 31. März 2008 schrieb Tom Lane:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > OK, first results with this patch are in: The selectivity estimations are
> > adjusted nicely, but the cost calculation doesn't change at all.  Before:
>
> I've forgotten the context ... what's the whole query and plan again?
> And which PG version exactly?

Please see http://archives.postgresql.org/pgsql-hackers/2008-01/msg00048.php


Re: [PATCHES] Minimum selectivity estimate for LIKE 'prefix%'

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Am Montag, 31. M�rz 2008 schrieb Tom Lane:
>> I've forgotten the context ... what's the whole query and plan again?
>> And which PG version exactly?

> Please see http://archives.postgresql.org/pgsql-hackers/2008-01/msg00048.php

Hm.  Now that I think about it, the index scan cost estimate is made
using a separate estimate of rows fetched (since this will depend on the
specific index qual clauses in use, whereas the overall row estimate for
the relation doesn't vary with index).  For the case at hand, the index
quals that it's looking at are the >= and < clauses with close-together
comparison values, and so it comes out with a rock-bottom rowcount
estimate.  The clamping occuring over in prefix_selectivity isn't
relevant here.

Your original complaint was that the bad overall rowcount estimate was
leading to a bad join plan, and that should be fixed even though the
cost estimate for the indexscan itself is unrealistically small.

Changing the indexscan cost estimate would require patching the main
range-constraint-estimation code in clausesel.c.  I don't see any very
good fix for that, since it has to deal with much more general cases
than this.  In particular it doesn't really know whether it's dealing
with >= or >.
        regards, tom lane