Thread: Index Scan cost expression

Index Scan cost expression

From: Amit Gupta
While trying to figure out an appropriate cost expression function for
Thick indexes, I learned that we are using the Mackert and Lohman formula
(described in their paper "Index Scans Using a Finite LRU Buffer: A
Validated I/O Model", ACM Transactions on Database Systems).
The paper's result is as follows:
# Heap pages fetched from disk for x index probes =
    min(2TDx/(2T+Dx), T)              when T <= b
    2TDx/(2T+Dx)                      when T > b and Dx <= 2Tb/(2T-b)
    b + (Dx - 2Tb/(2T-b))*(T-b)/T     when T > b and Dx > 2Tb/(2T-b)

where,
T = # pages in table
N = # tuples in table
D = avg. number of times an index value is repeated in the table
(duplication factor), and
b = buffer/cache size

Please note that the above result only computes _heap_ page reads.
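
For reference, here is the piecewise formula above as a small Python sketch
(my own transcription for illustration, using the definitions above; it is
not the actual code in costsize.c):

def heap_pages_fetched(x, T, D, b):
    """Mackert-Lohman estimate of heap pages fetched for x index probes.

    T = pages in the table, D = duplication factor, b = buffer/cache size
    in pages.  Dx = D * x is the number of tuples fetched.
    """
    Dx = D * x
    if T <= b:
        # Whole table fits in the buffer: asymptotic curve, capped at T.
        return min(2.0 * T * Dx / (2.0 * T + Dx), T)
    saturation = 2.0 * T * b / (2.0 * T - b)
    if Dx <= saturation:
        # Buffer not yet saturated.
        return 2.0 * T * Dx / (2.0 * T + Dx)
    # Buffer saturated: further fetches miss with probability (T - b) / T.
    return b + (Dx - saturation) * (T - b) / T

# e.g. 10,000 probes into a 100,000-page table, D = 2, 10,000 buffer pages
print(heap_pages_fetched(10_000, 100_000, 2, 10_000))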


The above expression is used by the index_pages_fetched() function to
compute index scan cost. The function, however, doesn't account for the
cost of index page reads. On average an index probe will require (h-1) page
reads from disk, where h is the height of the B-tree (when # index
probes << # index key values).  I can post the details of the
derivation of this result, if required.
I am planning to use a similar expression for the Thick index cost estimates.
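
As a rough sketch of where the (h-1) figure comes from (my own
back-of-the-envelope illustration, assuming an arbitrary fanout and that only
the root page stays cached):

import math

def index_page_reads_per_probe(n_keys, fanout=256):
    """Rough per-probe index page reads for a B-tree over n_keys entries.

    The tree height is about h = ceil(log_fanout(n_keys)); if the root is
    assumed to stay cached and the other levels are not, each probe reads
    roughly h - 1 index pages.  fanout=256 is an illustrative assumption.
    """
    h = max(1, math.ceil(math.log(n_keys, fanout)))
    return h - 1

# e.g. 10 million keys with fanout 256 -> height 3 -> ~2 page reads per probe
print(index_page_reads_per_probe(10_000_000))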


Upon taking a cursory look at the cost functions of other operators, I
realized that available memory (effective_cache_size) is not
considered for estimating the costs of hash/sort/NLjoin/etc. Why is
that the case?

Regards,
Amit
Persistent Systems


Re: Index Scan cost expression

From: Tom Lane
Amit Gupta <amit.pc.gupta@gmail.com> writes:
> Upon taking a cursory look at the cost functions of other operators, I
> realized that available memory (effective_cache_size) is not
> considered for estimating the costs of hash/sort/NLjoin/etc. Why is
> that the case?

The relevant number for those is work_mem not effective_cache_size.
        regards, tom lane


Re: Index Scan cost expression

From: Gregory Stark
Amit Gupta <amit.pc.gupta@gmail.com> writes:

> While trying to figure out an appropriate cost expression function for
> Thick indexes, I learned that we are using the Mackert and Lohman formula
> (described in their paper "Index Scans Using a Finite LRU Buffer: A
> Validated I/O Model", ACM Transactions on Database Systems).
...
> Please note that the above result only computes _heap_ page reads.

Moreover it only models a single index scan. It assumes nothing is cached
prior to the index scan, which is very much not true if we're repeatedly
scanning similar ranges of keys.

Omitting the index pages is a very coarse attempt to model the caching across
multiple plan invocations, since upper-level index pages will nearly always be
cached and even lower index pages have a good chance of being cached.

The problem is that modeling cross-plan-invocation caching is hard, and we have
few ideas for how to do it.
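
As a toy illustration of how pessimistic the cold-cache assumption can be
(this is just a simulation I made up, not anything in the planner): run the
same LRU cache through two consecutive scans over similar page ranges and
compare the physical reads.

from collections import OrderedDict
import random

def scan(pages, cache, cache_size):
    """Touch pages in order through an LRU cache; return physical reads."""
    reads = 0
    for p in pages:
        if p in cache:
            cache.move_to_end(p)           # hit: refresh LRU position
        else:
            reads += 1                     # miss: read from disk
            cache[p] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return reads

random.seed(0)
cache = OrderedDict()
cache_size = 1000

# Two index scans over similar key ranges touch heavily overlapping heap pages.
first  = [random.randrange(0, 800) for _ in range(2000)]
second = [random.randrange(0, 800) for _ in range(2000)]

print("first scan (cold cache):", scan(first, cache, cache_size))
print("second scan (warm cache):", scan(second, cache, cache_size))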

> Upon taking a cursory look at the cost functions of other operators, I
> realized that available memory (effective_cache_size) is not
> considered for estimating the costs of hash/sort/NLjoin/etc. Why is
> that the case?

Well they're all different, but I suspect the root of what you're observing is
all the same thing. Cache doesn't affect any of these nodes unless we start
with something in the cache from previous queries, and we don't model that. We
assume each query and even each plan node is run on a cold cache.

The problem is that how much to discount the cost of the inner node depends,
not only on the type of node it is, but also on the types of parameters it
will be called with. So it needs very different results for something like a
nested loop whose outer table has a few closely spaced values versus one whose
outer values are sparsely spread throughout the inner table.

This is complicated by the fact that the repetitions don't necessarily come
from the parent of the plan in question. You could have, for example, a
subquery several nodes down from the scan that causes repetitions.

I think the only way to tackle it is to come up with some parametric formula
for how much to discount repetitions and carry the parameters of that formula
on every plan node. So when a node needs to add the cost of n repetitions of a
lower node it applies that formula using the parameters advertised by the
sub-node.

The tricky part of that is coming up with a formula and figuring parameters to
model plan nodes using it. Consider that how much to discount repetitions will
depend heavily on what distribution of parameters the plan is executed with.

I'm also not clear what kinds of formulas work for this. It has to be
something that can be composed effectively. That is, even if a plan node
doesn't want to discount itself at all for repetitions it has to include the
discount that any subplans have asked for. For example a large sequential scan
which expects to overflow effective_cache_size might not want to be discounted
at all, but if it has a subquery which scans a small table it will want to
discount that 100% for any repetitions since it'll be cached after the first
scan.
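
Just to make the shape of that concrete, here's a rough sketch of the kind of
composition I mean (hypothetical node and field names, invented purely for
illustration): each node advertises a repetition-discount fraction, and a
parent costing n repetitions applies its own discount while still passing
through whatever its children asked for.

from dataclasses import dataclass, field
from typing import List

@dataclass
class PlanNode:
    """Hypothetical plan node carrying a repetition-discount parameter.

    cost:            cost of one cold execution of this node alone
    repeat_discount: fraction of this node's own cost assumed saved on each
                     repetition after the first (0 = no caching benefit,
                     1 = fully cached after the first execution)
    """
    cost: float
    repeat_discount: float
    children: List["PlanNode"] = field(default_factory=list)

    def total_cost(self, repetitions: int = 1) -> float:
        first = self.cost
        # Later repetitions pay only the undiscounted fraction of this node...
        rest = (repetitions - 1) * self.cost * (1.0 - self.repeat_discount)
        # ...but must still include whatever discount the children advertise.
        kids = sum(c.total_cost(repetitions) for c in self.children)
        return first + rest + kids

# A large seqscan that overflows the cache (no discount) containing a subquery
# over a small table (fully cached after the first pass).
small_subquery = PlanNode(cost=5.0,    repeat_discount=1.0)
big_seqscan    = PlanNode(cost=1000.0, repeat_discount=0.0,
                          children=[small_subquery])
print(big_seqscan.total_cost(repetitions=10))   # 10*1000 + 5 = 10005.0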

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning


Re: Index Scan cost expression

From: Gregory Stark
Amit Gupta <amit.pc.gupta@gmail.com> writes:

>> Moreover it only models a single index scan. It assumes nothing is cached
>>  prior to the index scan, which is very much not true if we're repeatedly
>>  scanning similar ranges of keys.
>>
>
> It's reasonable to assume that nothing is cached for estimating the cost.

Not really, but doing otherwise is just hard. There's nothing in the query to
give Postgres a hint about which tables are more heavily used and more likely
to be in cache than others.

> 1 block I/O per index probe is a considerable cost.

Well it's closer to 0 than n...

>> Well they're all different, but I suspect the root of what you're observing is
>>  all the same thing. Cache doesn't affect any of these nodes unless we start
>>  with something in the cache from previous queries, and we don't model that. We
>>  assume each query and even each plan node is run on a cold cache.
>
> Cost of evaluating operators depends heavily on available cache size,
> which is not considered by the Postgres optimizer in many places. For
> instance,
> - # I/O for sorting = T log_M T/M, where T is the size of the relation, and M
> is the available memory.
>
> However, Postgres assumes a constant available memory of 1MB for sorting.
> Since sort is a blocking operator, which means that execution of other
> operators should halt when sorting is in progress, it should be able
> to hog all the available memory.

Well, you're free to raise work_mem. Also, while the sorting all happens in one
step, the memory remains in use after the sort is done.

It's true that if work_mem is set low but there's lots of cache available it
will speed up spilled hashes and on-disk merge sorts. But you would be better
off raising work_mem in those cases. If they're spilling because they don't
fit in RAM at all, then will cache still have an effect?
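
For reference, the quoted T log_M T/M figure is roughly the textbook external
merge sort I/O count; here's a small sketch of that count (pass structure only,
ignoring CPU and read-ahead; my own illustration, not what the planner
actually computes):

import math

def external_sort_page_ios(T, M):
    """Approximate page I/Os for an external merge sort.

    T = relation size in pages, M = working memory in pages (assumed >= 3).
    Run formation produces ceil(T/M) sorted runs; each merge pass with
    fan-in M-1 reads and writes the whole relation once.
    """
    if T <= M:
        return 0                              # sorts entirely in memory
    runs = math.ceil(T / M)
    merge_passes = math.ceil(math.log(runs, M - 1))
    passes = 1 + merge_passes                 # run formation + merge passes
    return 2 * T * passes                     # each pass reads and writes T

# e.g. a 100,000-page relation sorted with 128 pages of work_mem
print(external_sort_page_ios(100_000, 128))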

>>  I'm also not clear what kinds of formulas work for this. It has to be
>>  something that can be composed effectively. That is, even if a plan node
>>  doesn't want to discount itself at all for repetitions it has to include the
>>  discount that any subplans have asked for. For example a large sequential scan
>>  which expects to overflow effective_cache_size might not want to be discounted
>>  at all, but if it has a subquery which scans a small table it will want to
>>  discount that 100% for any repetitions since it'll be cached after the first
>>  scan.
>
> We also need robust statistics to feed into these complex cost
> expressions for accurate estimation. For instance, Oracle lets the user
> analyze each column to create a distribution graph using histograms.
> These histograms are used by the optimizer to figure out the exact number
> of rows (and their values) output by an operator.

Well we certainly compute histograms and use them for selectivity estimates.
The challenge in this case is that you need to combine the distribution from
the outer node with the distribution in the inner node to estimate how much
overlap in disk accesses will result. So you need more than histograms; you
need some kind of cross-table statistics.
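
As a toy example of why per-column histograms alone aren't enough here (my own
illustration): the interesting quantity is how many distinct inner heap pages
the outer values actually land on, which depends on how the two distributions
line up.

def probe_page_counts(outer_keys, pages_for_key):
    """Toy nested-loop estimate: naive per-probe page count vs. distinct pages.

    outer_keys:    join-key values produced by the outer node
    pages_for_key: maps an inner key to the heap pages holding its matches
                   (a stand-in for clustering/correlation information)
    """
    naive, distinct = 0, set()
    for k in outer_keys:
        hit = pages_for_key.get(k, ())
        naive += len(hit)        # cold-cache view: every probe pays full price
        distinct.update(hit)     # reality: overlapping probes share pages
    return naive, len(distinct)

# A clustered inner table (10 keys per heap page) probed with closely spaced
# outer values: 800 probes but only 2 distinct heap pages.
clustered = {k: (k // 10,) for k in range(1000)}
outer = [5, 6, 7, 8, 9, 10, 11, 12] * 100

print(probe_page_counts(outer, clustered))   # (800, 2)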

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning