Re: question about index cost estimates - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: question about index cost estimates
Msg-id: 21125.958617333@sss.pgh.pa.us
In response to: Re: question about index cost estimates (Jeff Hoffmann <jeff@propertykey.com>)
List: pgsql-hackers
Jeff Hoffmann <jeff@propertykey.com> writes:
> i understand that, but still, if you have a tuple thats, say, 144 bytes,
> you're going to have a higher chance of revisiting the same page than if
> the tuple is 4k.

Are you? If you are pulling ten tuples from a thousand-page relation, seems to me the odds of revisiting any one page are somewhere around 0.01 (too lazy to do the exact combinatorics right now). Doesn't really matter what the average tuple size is --- or if you prefer, we already accounted for that because we are looking at target tuples divided by total pages rather than target tuples divided by total tuples.

> maybe my question should be the slope of the curve. it turns out that
> playing around with my queries, it seems like when i was selecting about
> 10% of the records, the page fetch estimate is pretty accurate, although
> when the selectivity falls to significantly below that, the page fetch
> estimate stays quite a bit higher than actual.

Hmm. I would think that the behavior for very low selectivity ought to be pretty close to the model: it's hard to see how it can be anything but one page fetch per selected tuple, unless you are seeing strong clustering effects. I do *not* claim that the slope of the curve is necessarily right for higher selectivities though ... that needs to be looked at.

> part of what i'm doing is looking at the executor stats for each query.
> can you explain what the shared blocks, local blocks, and direct blocks
> mean?

The filesystem blocks in/out are from the kernel getrusage() call. On my box, at least, these seem to mean physical I/O operations initiated on behalf of the process --- AFAICT, touching a page that is already in kernel disk buffers does not increment the filesystem blocks count.

The other numbers are from PrintBufferUsage() in bufmgr.c.
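For readers who want to poke at the same counters, the kernel-level getrusage() block counts Tom describes are exposed in Python's resource module as ru_inblock/ru_oublock (a small sketch; the exact semantics of these fields vary by Unix, as Tom notes):

```python
import resource

ru = resource.getrusage(resource.RUSAGE_SELF)

# ru_inblock / ru_oublock mirror the kernel getrusage() counters:
# physical block I/O operations initiated on behalf of this process.
# A read satisfied from the kernel's disk buffers typically does not
# increment ru_inblock, which is why these undercount logical reads.
print("blocks in:", ru.ru_inblock, "blocks out:", ru.ru_oublock)
```

Comparing these numbers against Postgres' own buffer-hit statistics is one way to see which reads were satisfied in kernel cache versus real disk I/O.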
It looks like the shared block counts are the number of block read and write requests made to the kernel by bufmgr.c, and the hit rate indicates the fraction of buffer fetch requests made to bufmgr.c that were satisfied in Postgres' own buffer area (ie, without a kernel request).

The local block counts are the same numbers for non-shared relations, which basically means tables created in the current transaction. They'll probably be zero in most cases of practical interest.

The "direct" counts seem to be broken at the moment --- I can't find any code that increments them. It looks like the intent was to count block I/O operations on temporary files (sorttemp and suchlike). This is not of interest for pure indexscans...

> are the shared blocks shared amongst all of the backends or does each
> have its own pool?

Shared among all --- that's what the shared memory block is (mostly) for...

> i'm getting a 90%+ buffer hit rate on index scans,
> but i'm assuming that's because i'm the only one doing anything on
> this machine right now and that would go down with more processes.

It also suggests that your test table isn't much bigger than the buffer cache ;-) --- or at least the part of it that you're touching isn't.

> i'm still taking everything with a grain of salt until i can
> explain it, though.

Good man. I don't think anyone but me has looked at this stuff in a long while, and it could use review.

			regards, tom lane
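The back-of-envelope page-fetch model discussed above can be checked numerically. Under the simplifying assumption that each selected tuple lands on a uniformly random page, independently of the others (an illustration, not the planner's exact formula), the expected number of distinct pages touched is:

```python
def expected_pages_fetched(total_pages, tuples_selected):
    """Expected distinct pages touched when each of tuples_selected
    tuples independently lands on one of total_pages pages chosen
    uniformly at random (a simplifying assumption for illustration)."""
    # Probability a given page is missed by all selected tuples:
    miss = (1.0 - 1.0 / total_pages) ** tuples_selected
    return total_pages * (1.0 - miss)

# Ten tuples from a thousand-page relation: nearly one page fetch
# per selected tuple, as the low-selectivity argument predicts.
pages = expected_pages_fetched(1000, 10)    # about 9.96
revisits = 10 - pages                       # a few hundredths of a page
```

The expected number of revisited pages here is on the order of Tom's rough 0.01 figure, and it shrinks toward zero as selectivity falls, which is why one-fetch-per-tuple is the natural low-selectivity model.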