Re: Planner enhancement suggestion. - Mailing list pgsql-performance

From PFC
Subject Re: Planner enhancement suggestion.
Date
Msg-id op.s5111p2fcigqcu@apollo13
Whole thread Raw
In response to Re: Planner enhancement suggestion.  ("Jim C. Nasby" <jnasby@pervasive.com>)
Responses Re: Planner enhancement suggestion.
List pgsql-performance
> The problem is that you're now talking about doing 2 index scans instead
> of just one and a sort.

    It depends on what you call an index scan :
    a- Scanning just the index (no heap lookup) to create a bitmap
    b- Scanning the index and hitting the heap in index order to retrieve the
rows

    (a) should be quite fast, because indexes generally use less space than
the main table, and have good locality of reference. (b) is OK if the
table fits in memory, but if it has to seek on every row from the heap...

    So, when doing :
    SELECT * FROM products WHERE category=C ORDER BY price LIMIT 20;

    If the category contains few products, using the index on category then
sorting is good.
    However, if the category contains many items, postgres is likely to use
the index on price to avoid the sort. It needs to lose time fetching many
rows from the heap which will not be in category C. In that case, I guess
it would be a win to build a bitmap of the pages containing rows which
belongs to category C, and only do the heap lookup on these pages.

    I have a query like that. When category C contains cheap products, the
index scan on price finds them pretty quick. However if it is a category
containing mostly expensive products, the index scan will have to hit most
of the table in order to find them. The time needed for the query for
these two extreme varies from 1 ms to about 20 ms (and that's because the
table is fully cached, or else the worst case would be a lot slower). I
would definitely prefer a constant 2 ms. The other solution is to create
an index on (category,price), but this path leads to lots, lots of indexes
given all the combinations.

    The bitmap trick I proposed in my previous post would be even more
interesting if the table is clustered on category (which seems a
reasonable thing to do).

> If the correlation on price is high, it could
> still win. As the cost estimator for index scan stands right now,
> there's no way such a plan would be chosen unless correlation was
> extremely high, however.

    Does the cost estimator know about this kind of correlation ?








pgsql-performance by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: firebird X postgresql 8.1.2 windows, performance
Next
From: "Jim C. Nasby"
Date:
Subject: Re: Planner enhancement suggestion.