Re: Planner enhancement suggestion. - Mailing list pgsql-performance

From Jim C. Nasby
Subject Re: Planner enhancement suggestion.
Date
Msg-id 20060307001913.GS51968@pervasive.com
Whole thread Raw
In response to Planner enhancement suggestion.  (PFC <lists@peufeu.com>)
Responses Re: Planner enhancement suggestion.
List pgsql-performance
On Sun, Mar 05, 2006 at 10:00:25PM +0100, PFC wrote:
>
>     Bitmap index scan is bliss. Many thanks to the postgres team ! Now
> searching in tables with a lot of fields and conditions is no longer a
> pain.
>
>     And just a thought :
>
>     SELECT * FROM table WHERE category IN (1,2,3) ORDER BY price LIMIT
>     10;
>
>     Suppose you have an index on category, and another index on price.
> Depending on the stats postgres has about the values, you'll either get :
>
>     0- seq scan + sort
>     1- Plain or Bitmap Index scan using "category", then sort by "price"
>     2- Index scan on "price", Filter on "category IN (1,2,3)", no sort.
>
>     1 is efficient if the category is rare. Postgres knows this and uses
>     this  plan well.
>     Without a LIMIT, option 1 should be preferred.
>
>     2 is efficient if the items in the categories 1,2,3 are cheap (close
>     to  the start of the index on price). However if the items in question are
> on  the other side of the index, it will index-scan a large part of the
> table.  This can be a big hit. Postgres has no stats about the correlation
> of  "category" and "price", so it won't know when there is going to be a
> problem.
>
>     Another option would be interesting. It has two steps :
>
>     - Build a bitmap using the index on "category" (just like in case 1)
>     so we know which pages on the table have relevant rows
>
>     - Index scan on "price", but only looking in the heap for pages
>     which are  flagged in the bitmap, and then "Recheck Cond" on "category".
>     In other words, do an index scan to get the rows in the right order,
>     but  don't bother to check the heap for pages where the bitmap says there
> are  no rows.
>     In the worst case, you still have to run through the entire index,
>     but at  least not through the entire table !
>
>     It can also speed up some merge joins.

The problem is that you're now talking about doing 2 index scans instead
of just one and a sort. If the correlation on price is high, it could
still win. As the cost estimator for index scan stands right now,
there's no way such a plan would be chosen unless correlation was
extremely high, however.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

pgsql-performance by date:

Previous
From: "i.v.r."
Date:
Subject: Help understanding indexes, explain, and optimizing a query
Next
From: Chris
Date:
Subject: Re: Help understanding indexes, explain, and optimizing