Re: Estimating costs (was Functional Indices) - Mailing list pgsql-general

From Martijn van Oosterhout
Subject Re: Estimating costs (was Functional Indices)
Date
Msg-id 20010524140142.A19798@svana.org
Whole thread Raw
In response to Re: Estimating costs (was Functional Indices)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Estimating costs (was Functional Indices)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Wed, May 23, 2001 at 10:40:38PM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > I'm not sure how common this is (long runs in a foreign key column) and it's
> > probably not worth it in the general case. So, is there a column in
> > pg_statistic where I can twiddle the per-tuple index-scan cost?
>
> You could stick a phony value into the correlation datum.

Ah, that would do it. Would need to experiment. Is this in 7.1 or 7.2?

> > We'd get better results with partial indexes anyway I think.
>
> I'd like to see the partial-index support cranked up again, for sure.
> But how does that solve your problem?  I don't see the connection.

Because the queries are of the form ID1 = 'xxxx' and ID2 is null. So,
improving the index scan would make it use the index for the first clause
and scan for the second (nulls don't appear in indicies, right?)

With a partial index on the ID2 is null clause it could scan that index and
look for tuples with match the first. As indicated on that paper linked to
from the documentation, it also gives a hints to the database where most of
the queries are likely to be directed at.

The first clause matches about 0.04% of rows, the second about 5%. Most of
the time it's fine but when you start summerising data so you need to scan
many of ID1 it decides to start using an sequential scan. Look at the estimates
for the sequential and index scan:

Seq Scan on dailycalls  (cost=0.00..46352.20 rows=1630 width=139)
Index Scan using dailycalls_clid on dailycalls  (cost=0.00..6279.75 rows=1630 width=139)

Yet I can tell you that empirical evidence suggests that the index scan is
at least 50 times faster than the sequential scan (10 seconds to less than
0.2 seconds).

Does the planner take into account that since the total size of the database
is less than the total amount of memory, a lot of the database is likely to
be cached?

Talking about statistics, is there anywhere that counts the number of times
an index has been used? So I can check to see if all my indicies are
worthwhile.

Thanks for listening,
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Estimating costs (was Functional Indices)
Next
From: "Denis A. Doroshenko"
Date:
Subject: Re: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory