Jeff Hoffmann <jeff@propertykey.com> writes:
> calculation of pages fetched from the relation seems high. I'm not sure
> what the actual value should be, but some queries have it at 2-4x the
> number of pages in the relation, which seemed high, so I started
> looking at that.
If the table is bigger than the disk cache, and we are hitting pages
more than once, then we are likely to have to fetch some pages more than
once. The model is assuming that the tuples we want are scattered more
or less randomly over the whole table. If #tuples to be fetched
approaches or exceeds #pages in table, then clearly we are going to be
touching some pages more than once because we want more than one tuple
from them. The $64 question is did the page fall out of buffer cache
between references? Worst case (if your table vastly exceeds your
available buffer space), practically every tuple fetch will require a
separate physical read, and then the number of page fetches is
essentially the number of tuples returned --- which of course can be
a lot more than the number of pages in the table.
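To illustrate the kind of estimate involved --- this is just a sketch, not
what costsize.c actually contains, and it bakes in the same uniform-scatter
assumption the model makes:

    #include <math.h>

    /*
     * Illustration only, not the actual costsize.c code.
     * Expected number of distinct pages touched when "tuples" tuples
     * are fetched from random spots in a table of "pages" pages
     * (the classical approximation for a uniform scatter of hits).
     * With effectively no usable buffer cache, the worst case is
     * instead one physical read per tuple, i.e. about "tuples" reads.
     */
    static double
    expected_pages_touched(double tuples, double pages)
    {
        if (tuples <= 0 || pages <= 0)
            return 0.0;
        return pages * (1.0 - pow(1.0 - 1.0 / pages, tuples));
    }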
Right now these considerations are divided between cost_index() and
cost_nonsequential_access() in a way that might well be wrong.
I've been intending to dig into the literature and try to find a better
cost model but haven't gotten to it yet.
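The general idea I have in mind is to charge a per-page cost somewhere
between sequential and fully-random, depending on how much of the table
the cache can be expected to hold. Schematically, something like the
sketch below --- the names and details are invented for illustration, and
this is not a claim about what cost_nonsequential_access() computes today:

    /*
     * Schematic sketch: interpolate the per-page fetch cost between
     * sequential (1.0 by convention) and random_page_cost, according
     * to the fraction of the table we can hope to keep in cache.
     * Invented names; illustration only.
     */
    static double
    nonsequential_page_cost(double rel_pages, double cache_pages,
                            double random_page_cost)
    {
        double  cached_fraction;

        if (rel_pages <= cache_pages)
            return 1.0;         /* whole table should stay cached */
        cached_fraction = cache_pages / rel_pages;
        return random_page_cost - (random_page_cost - 1.0) * cached_fraction;
    }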
> It just seems to me that there should be some notion of tuple size
> figured in, to know how many tuples fit on a page.
It seemed to me that the critical ratios are #tuples fetched vs #pages
in table and table size vs. cache size. I could be wrong though...
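To put made-up numbers on those ratios, using the sketches above: fetching
50000 tuples from a 10000-page table means we want several tuples per page,
so nearly every page gets touched at least once; whether each touch is a
fresh physical read then depends on how much of those 10000 pages the
cache can hold.

    /* Made-up numbers, continuing the sketches above. */
    double pages_hit = expected_pages_touched(50000.0, 10000.0);
    /* about 9933 --- nearly every page is touched at least once */

    double per_page = nonsequential_page_cost(10000.0, 1000.0, 4.0);
    /* cache holds only 10% of the table, so ~3.7, close to random */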
regards, tom lane