Re: question about index cost estimates - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: question about index cost estimates
Msg-id: 21125.958617333@sss.pgh.pa.us
In response to: Re: question about index cost estimates (Jeff Hoffmann <jeff@propertykey.com>)
List: pgsql-hackers
Jeff Hoffmann <jeff@propertykey.com> writes:
> i understand that, but still, if you have a tuple thats, say, 144 bytes,
> you're going to have a higher chance of revisiting the same page than if
> the tuple is 4k.

Are you? If you are pulling ten tuples from a thousand-page relation, seems to me the odds of revisiting any one page are somewhere around 0.01 (too lazy to do the exact combinatorics right now). Doesn't really matter what the average tuple size is --- or if you prefer, we already accounted for that because we are looking at target tuples divided by total pages rather than target tuples divided by total tuples.

> maybe my question should be the slope of the curve. it turns out that
> playing around with my queries, it seems like when i was selecting about
> 10% of the records, the page fetch estimate is pretty accurate, although
> when the selectivity falls to significantly below that, the page fetch
> estimate stays quite a bit higher than actual.

Hmm. I would think that the behavior for very low selectivity ought to be pretty close to the model: it's hard to see how it can be anything but one page fetch per selected tuple, unless you are seeing strong clustering effects. I do *not* claim that the slope of the curve is necessarily right for higher selectivities though ... that needs to be looked at.

> part of what i'm doing is looking at the executor stats for each query.
> can you explain what the shared blocks, local blocks, and direct blocks
> mean?

The filesystem blocks in/out are from the kernel getrusage() call. On my box, at least, these seem to mean physical I/O operations initiated on behalf of the process --- AFAICT, touching a page that is already in kernel disk buffers does not increment the filesystem blocks count.

The other numbers are from PrintBufferUsage() in bufmgr.c.
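For readers who want to poke at the same counters, the kernel-level getrusage() block counts Tom describes are exposed in Python's resource module as ru_inblock/ru_oublock (a small sketch; the exact semantics of these fields vary by Unix, as Tom notes):

```python
import resource

ru = resource.getrusage(resource.RUSAGE_SELF)

# ru_inblock / ru_oublock mirror the kernel getrusage() counters:
# physical block I/O operations initiated on behalf of this process.
# A read satisfied from the kernel's disk buffers typically does not
# increment ru_inblock, which is why these undercount logical reads.
print("blocks in:", ru.ru_inblock, "blocks out:", ru.ru_oublock)
```

Comparing these numbers against Postgres' own buffer-hit statistics is one way to see which reads were satisfied in kernel cache versus real disk I/O.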
It looks like the shared block counts are the number of block read and write requests made to the kernel by bufmgr.c, and the hit rate indicates the fraction of buffer fetch requests made to bufmgr.c that were satisfied in Postgres' own buffer area (ie, without a kernel request).

The local block counts are the same numbers for non-shared relations, which basically means tables created in the current transaction. They'll probably be zero in most cases of practical interest.

The "direct" counts seem to be broken at the moment --- I can't find any code that increments them. It looks like the intent was to count block I/O operations on temporary files (sorttemp and suchlike). This is not of interest for pure indexscans...

> are the shared blocks shared amongst all of the backends or does each
> have its own pool?

Shared among all --- that's what the shared memory block is (mostly) for...

> i'm getting a 90%+ buffer hit rate on index scans,
> but i'm assuming that's because i'm the only one doing anything on
> this machine right now and that would go down with more processes.

It also suggests that your test table isn't much bigger than the buffer cache ;-) --- or at least the part of it that you're touching isn't.

> i'm still taking everything with a grain of salt until i can
> explain it, though.

Good man. I don't think anyone but me has looked at this stuff in a long while, and it could use review.

			regards, tom lane
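The back-of-envelope page-fetch model discussed above can be checked numerically. Under the simplifying assumption that each selected tuple lands on a uniformly random page, independently of the others (an illustration, not the planner's exact formula), the expected number of distinct pages touched is:

```python
def expected_pages_fetched(total_pages, tuples_selected):
    """Expected distinct pages touched when each of tuples_selected
    tuples independently lands on one of total_pages pages chosen
    uniformly at random (a simplifying assumption for illustration)."""
    # Probability a given page is missed by all selected tuples:
    miss = (1.0 - 1.0 / total_pages) ** tuples_selected
    return total_pages * (1.0 - miss)

# Ten tuples from a thousand-page relation: nearly one page fetch
# per selected tuple, as the low-selectivity argument predicts.
pages = expected_pages_fetched(1000, 10)    # about 9.96
revisits = 10 - pages                       # a few hundredths of a page
```

The expected number of revisited pages here is on the order of Tom's rough 0.01 figure, and it shrinks toward zero as selectivity falls, which is why one-fetch-per-tuple is the natural low-selectivity model.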