Re: question about index cost estimates

From: Jeff Hoffmann
Subject: Re: question about index cost estimates
Msg-id: 39234018.550E0BE6@propertykey.com
In response to: question about index cost estimates (Jeff Hoffmann <jeff@propertykey.com>)
Responses: Re: question about index cost estimates
List: pgsql-hackers

Tom Lane wrote:
> If the table is bigger than the disk cache, and we are hitting pages
> more than once, then we are likely to have to fetch some pages more than
> once.  The model is assuming that the tuples we want are scattered more
> or less randomly over the whole table.  If #tuples to be fetched
> approaches or exceeds #pages in table, then clearly we are going to be
> touching some pages more than once because we want more than one tuple
> from them.

I understand that, but still: if you have a tuple that's, say, 144 bytes,
you're going to have a higher chance of revisiting the same page than if
the tuple is 4 kB.  It may be something that works itself out of the
equation somewhere.
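
Just to convince myself, I worked through a toy calculation (made-up row
count, selectivity, and tuple sizes -- this is not anything from the actual
planner code) using the classic Cardenas approximation for the number of
distinct pages touched.  It does show the revisit rate climbing as tuples
get smaller:

/*
 * Toy calculation, not planner code: the Cardenas approximation says
 * fetching N tuples at random from a table of T pages touches about
 * T * (1 - (1 - 1/T)^N) distinct pages.  Row count, selectivity, and
 * tuple sizes below are made up; the point is the revisit ratio.
 */
#include <math.h>
#include <stdio.h>

static double
distinct_pages(double T, double N)
{
    return T * (1.0 - pow(1.0 - 1.0 / T, N));
}

int
main(void)
{
    const double page_bytes = 8192.0;
    const double n_rows = 1000000.0;     /* rows in table */
    const double selectivity = 0.05;     /* fraction fetched */
    const double tuple_sizes[] = { 144.0, 4096.0 };

    for (int i = 0; i < 2; i++)
    {
        double rows_per_page = floor(page_bytes / tuple_sizes[i]);
        double T = ceil(n_rows / rows_per_page);
        double N = n_rows * selectivity;

        /* smaller tuples => more rows per page => more page revisits */
        printf("tuple=%4.0f bytes  table pages=%7.0f  "
               "tuples fetched=%6.0f  distinct pages=%7.0f\n",
               tuple_sizes[i], T, N, distinct_pages(T, N));
    }
    return 0;
}

With 144-byte tuples the same 50,000 fetches land on roughly a third as
many distinct pages as with 4 kB tuples, which is the effect I was getting
at.
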
> The $64 question is did the page fall out of buffer cache
> between references?  Worst case (if your table vastly exceeds your
> available buffer space), practically every tuple fetch will require a
> separate physical read, and then the number of page fetches is
> essentially the number of tuples returned --- which of course can be
> a lot more than the number of pages in the table.

I understand that, too, and I understand the need for a curve like that;
maybe my question should really be about the slope of the curve.  Playing
around with my queries, it turns out that when I was selecting about
10% of the records, the page fetch estimate was pretty accurate, but
when the selectivity falls significantly below that, the page fetch
estimate stays quite a bit higher than the actual number of fetches.
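
Out of curiosity I also coded up the Mackert & Lohman formula ("Index
Scans Using a Finite LRU Buffer: A Validated I/O Model", ACM TODS, 1989)
to see what a cache-aware curve looks like at low selectivity.  This is
just my own toy transcription with made-up table and buffer sizes, not
anything out of the source tree:

/*
 * Toy transcription of the Mackert & Lohman page-fetch model.
 * T  = pages in table, Ns = tuples fetched, b = buffer pages.
 * All inputs below are made up; the point is the shape of the curve.
 */
#include <stdio.h>

static double
ml_page_fetches(double T, double Ns, double b)
{
    if (T <= b)
    {
        double pf = (2.0 * T * Ns) / (2.0 * T + Ns);
        return (pf < T) ? pf : T;
    }
    else
    {
        double lim = (2.0 * T * b) / (2.0 * T - b);

        if (Ns <= lim)
            return (2.0 * T * Ns) / (2.0 * T + Ns);
        else
            return b + (Ns - lim) * (T - b) / T;
    }
}

int
main(void)
{
    const double T = 100000.0;   /* pages in table */
    const double b = 8000.0;     /* buffer pages available */
    const double rows = 5000000.0;

    /* watch the estimate as selectivity drops below 10% */
    for (double sel = 0.10; sel >= 0.001; sel /= 2.0)
        printf("sel=%7.4f%%  tuples=%8.0f  page fetches=%9.0f\n",
               sel * 100.0, rows * sel,
               ml_page_fetches(T, rows * sel, b));
    return 0;
}

At low selectivities this curve bends down a lot faster than what I'm
seeing from the current estimates, which at least matches the direction
of my observations.
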

Part of what I'm doing is looking at the executor stats for each query.
Can you explain what the shared blocks, local blocks, and direct blocks
mean?  I'm assuming that the shared blocks refers to reads of the
database files and indexes; what do the others mean?  Are the shared
blocks shared amongst all of the backends, or does each backend have its
own pool?  I'm getting a 90%+ buffer hit rate on index scans, but I'm
assuming that's because I'm the only one doing anything on this machine
right now, and that the hit rate would go down with more processes.

> 
> Right now these considerations are divided between cost_index() and
> cost_nonsequential_access() in a way that might well be wrong.
> I've been intending to dig into the literature and try to find a better
> cost model but haven't gotten to it yet.

Well, there have been a lot of things that I've looked at and thought
"that doesn't sound right", like, for example, the random page access
cost, which defaults to 4.0.  I thought that was too high, but I ran
bonnie on my hard drive and it looks like the actual value is around
3.9, so now I'm a believer.  Somebody must have put some thought into
it somewhere.  I'm still taking everything with a grain of salt until I
can explain it, though.
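
In case anybody wants to repeat the measurement without bonnie, this is
the kind of crude timing loop I mean (file name and read counts are
placeholders, and you need a file much bigger than RAM or the OS cache
makes everything look fast):

/*
 * Rough toy benchmark, not how bonnie does it: time NREADS sequential
 * 8 kB reads, then NREADS random 8 kB reads, against a big file and
 * report the ratio.  Run against a file much larger than RAM.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

#define BLOCK   8192
#define NREADS  4096

static double
now_sec(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int
main(int argc, char **argv)
{
    char        buf[BLOCK];
    const char *path = (argc > 1) ? argv[1] : "bigfile.dat";
    int         fd = open(path, O_RDONLY);
    off_t       nblocks;
    double      t0, seq, rnd;

    if (fd < 0)
    {
        perror(path);
        return 1;
    }
    nblocks = lseek(fd, 0, SEEK_END) / BLOCK;

    /* sequential pass */
    lseek(fd, 0, SEEK_SET);
    t0 = now_sec();
    for (int i = 0; i < NREADS; i++)
        if (read(fd, buf, BLOCK) != BLOCK)
            break;
    seq = now_sec() - t0;

    /* random pass over the same number of blocks */
    srandom(12345);
    t0 = now_sec();
    for (int i = 0; i < NREADS; i++)
    {
        off_t blk = random() % nblocks;

        lseek(fd, blk * BLOCK, SEEK_SET);
        read(fd, buf, BLOCK);
    }
    rnd = now_sec() - t0;

    printf("sequential: %.3fs  random: %.3fs  ratio: %.1f\n",
           seq, rnd, rnd / seq);
    close(fd);
    return 0;
}

The ratio it prints is in the same spirit as the random page cost, even
if it ignores things like readahead and seek ordering.
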

jeff

