Thread: Large index scan performance and indexCorrelation (PG 8.1.4 Win32)

Large index scan performance and indexCorrelation (PG 8.1.4 Win32)

From: Andrew Sagulin
Hello all

I have a large amount of phone call data (280M records, 25 GB). The best
solution for this task is partitioning, and that is what I use now. But at
first I tried putting all the data into a single table indexed by call date
and time. Because of the nature of the data, the records are clustered by
date and nearly ordered by time.

The table definition:

CREATE DOMAIN datetime AS timestamp NOT NULL;
CREATE DOMAIN cause  AS int2  DEFAULT 16  NOT NULL;
CREATE DOMAIN conn_num  AS varchar(34);
CREATE DOMAIN dur AS int4  NOT NULL;
CREATE DOMAIN lno AS int2;
CREATE DOMAIN tgrp AS char(6);

CREATE TABLE conn
(
  datetime datetime,
  anum conn_num,
  bnum conn_num,
  dur dur,
  itgrp tgrp,
  ilno lno,
  otgrp tgrp,
  olno lno,
  cause cause
)
WITHOUT OIDS;

CREATE INDEX conn_dt
  ON conn
  USING btree
  (datetime);

The usual tasks on this table are exporting and searching calls for one or
more days. This causes a scan of 400K or more records, selected via the
'conn_dt' index. The best access path for that is a bitmap heap scan: tests
I made showed bitmap scan performance almost equal to a sequential scan. But
PG always preferred a plain index scan, which is 20 times slower. Digging into
the PG internals brought me to indexCorrelation. For the 'datetime' column it
was about 0.999999. So why, despite that, was the index scan so slow? As the
next step I ran

select ctid from conn where ... order by datetime;

The result showed that there was no sequential page access at all - pure
random access.
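
For reference, the 0.999999 figure can be read straight from the planner
statistics (assuming a fresh ANALYZE - this is just the standard pg_stats view):

select attname, correlation
  from pg_stats
 where tablename = 'conn' and attname = 'datetime';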

A simple model that explains the situation: the sequence of numbers 2, 1, 4,
3, 6, 5, ..., 100, 99 has a correlation of about 0.9994. Now imagine that this
is the page order of an index scan. Hmm, a bad choice, isn't it?
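
That 0.9994 is easy to check with a couple of standard aggregates (just a
sanity-check query against generated numbers, nothing from the real table):

select (sum(pos * val) - count(*) * avg(pos) * avg(val))
       / ((count(*) - 1) * stddev(pos) * stddev(val)) as correlation
  from (select i as pos,
               case when i % 2 = 1 then i + 1 else i - 1 end as val
          from generate_series(1, 100) i) t;
-- gives roughly 0.9994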

I think indexCorrelation can help to estimate the page count, but not the
page fetch cost. Why not use the formula

min_IO_cost = ceil(indexSelectivity * T) * random_page_cost

instead of

min_IO_cost = ceil(indexSelectivity * T) ?





Re: Large index scan performance and indexCorrelation (PG 8.1.4 Win32)

From: Simon Riggs
On Tue, 2006-06-27 at 16:14 +0400, Andrew Sagulin wrote:

> The result showed that there was no sequential page access at all - pure
> random access.
>
> A simple model that explains the situation: the sequence of numbers 2, 1, 4,
> 3, 6, 5, ..., 100, 99 has a correlation of about 0.9994. Now imagine that
> this is the page order of an index scan. Hmm, a bad choice, isn't it?

Your example is only possible if whole blocks of data were out of order,
which I guess is possible within a multi-million row table. Slightly out
of order values would be ignored, since I/O works at the block rather
than the tuple level.

ANALYZE doesn't cope well with tables as large as you have. It doesn't
sample enough rows, nor does it look within single blocks/groups to
discover anomalies such as yours. As a result, the data that is sampled
looks almost perfectly ordered, though the main bulk is not.
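
For what it's worth, you can at least make ANALYZE sample more rows by raising
the column's statistics target before re-analyzing (standard commands, though
that still won't make it look inside individual blocks):

ALTER TABLE conn ALTER COLUMN datetime SET STATISTICS 1000;
ANALYZE conn;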

I think what you are also pointing out is that the assumption of the
effects of correlation doesn't match the current readahead logic of
filesystems. If we were to actively force a readahead stride of 2 for
this scan (somehow), then the lack of correlation would disappear
completely. IIRC the current filesystem readahead logic would see such a
sequence as random, so no readahead would be
performed at all - even though the data is highly correlated. That isn't
PostgreSQL's fault directly, since the readahead ought to work better
than it does, but we fail indirectly by relying upon it in this case.

> I think indexCorrelation can help to estimate the page count, but not the
> page fetch cost. Why not use the formula
>
> min_IO_cost = ceil(indexSelectivity * T) * random_page_cost
>
> instead of
>
> min_IO_cost = ceil(indexSelectivity * T) ?

That part is sensible. The min_IO_cost is when the access is sequential,
which by definition has a cost of 1.0.

The bit you might have issue with is how we extrapolate from the
min_IO_cost and correlation to arrive at a cost.
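
IIRC cost_index interpolates between the two endpoints using the square of the
correlation, roughly

    csquared = indexCorrelation * indexCorrelation
    IO_cost  = max_IO_cost + csquared * (min_IO_cost - max_IO_cost)

so a correlation of 0.999999 pulls the estimate almost all the way down to
min_IO_cost.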

--
  Simon Riggs
  EnterpriseDB          http://www.enterprisedb.com


Large index scan performance and indexCorrelation (PG 8.1.4 Win32)

From: Andrew Sagulin
On Wednesday, June 28, 2006, 2:04:17, Simon Riggs wrote:

> That part is sensible. The min_IO_cost is when the access is sequential,
> which by definition has a cost of 1.0.

In general - yes. But we are talking about the min_IO_cost of an index scan,
which is barely sequential. Correct me if I'm wrong: the index scan algorithm
is something like 'read a couple of index pages, read some data pages, then
more index pages, more data pages, ...'. So the current assumption behind
min_IO_cost is too optimistic even in the case of ideal tuple ordering.
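
(Rough numbers, just to illustrate: assuming something like 100 tuples per
8 KB heap page, a 400K-row day is about 4,000 heap pages, plus on the order of
a thousand b-tree leaf pages read in between - so even a perfectly ordered
index scan is not one long sequential read.)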

> The bit you might have issue with is how we extrapolate from the
> min_IO_cost and correlation to arrive at a cost.

Right now the index scan cost calculation uses indexCorrelation as a measure
of both tuple clustering and the degree of their sequentiality (physical
ordering). As far as I know there are cases where this approach is wrong -
for example my issue, or any other case with high clustering but without
ordering, where a bitmap heap scan is the best way but PostgreSQL prefers an
index scan or even a sequential scan.
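
In case anyone wants to reproduce the comparison, the simplest way I know is
to disable the plain index scan and compare EXPLAIN ANALYZE output for a
one-day range (the dates here are only an example):

set enable_indexscan = off;  -- bitmap heap scan remains available
explain analyze
  select * from conn
   where datetime >= '2006-06-01' and datetime < '2006-06-02';
set enable_indexscan = on;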

Does the PostgreSQL development team plan to revise the index scan cost
algorithm, or are issues like mine too rare to take into account?




Re: Large index scan performance and indexCorrelation (PG 8.1.4 Win32)

From: Tom Lane
Andrew Sagulin <andrews42@yandex.ru> writes:
> Does the PostgreSQL development team plan to revise the index scan cost
> algorithm, or are issues like mine too rare to take into account?

The algorithm is certainly open for discussion, but we're not changing
it on the basis of just a single report ...

You're mistaken to be fingering min_IO_cost as the source of the issue,
because there is also a correction for near-sequential access in
cost_bitmap_heap_scan.  If we were to bias the system against indexscans as
heavily as you propose, we would logically have to put a similar bias into
cost_bitmap_heap_scan, and you'd probably still end up
with a plain indexscan.  What you need to do is compare the two
functions and figure out what part of the cost models are out of line
with reality.  I tend to agree with the upthread comment that the
nonlinear interpolation between min_IO_cost and max_IO_cost is suspect
... but that may or may not have anything truly to do with your problem.
It might be that cost_index is fine and cost_bitmap_heap_scan is
overcharging.

BTW there are already some changes in HEAD relating to this; please see
the pghackers archives from the beginning of June (thread "More thoughts
about planner's cost estimates").

            regards, tom lane

Re: Large index scan performance and indexCorrelation (PG 8.1.4 Win32)

From: "Jim C. Nasby"
On Wed, Jun 28, 2006 at 10:37:24AM -0400, Tom Lane wrote:
> with a plain indexscan.  What you need to do is compare the two
> functions and figure out what part of the cost models are out of line
> with reality.  I tend to agree with the upthread comment that the
> nonlinear interpolation between min_IO_cost and max_IO_cost is suspect

If you're going to make such a comparison (which is badly needed, imho),
http://stats.distributed.net/~decibel/ might be of use to you. It shows
that the nonlinear interpolation between the correlated and uncorrelated
index scans is way off base, at least for this example.

BTW, you'll have a hard time convincing people to increase the cost
estimates of index scans, because experience has shown that they're
already too high (all the discussions about dropping random_page_cost,
for example).
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461