Re: Why does query planner choose slower BitmapAnd ? - Mailing list pgsql-general

From Jeff Janes
Subject Re: Why does query planner choose slower BitmapAnd ?
Date
Msg-id CAMkU=1yV7WQLetrCVPqn=dTPdNzW3JD29ZsK0zJmgzO2tdcx-Q@mail.gmail.com
Whole thread Raw
In response to Re: Why does query planner choose slower BitmapAnd ?  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Why does query planner choose slower BitmapAnd ?
Re: Why does query planner choose slower BitmapAnd ?
List pgsql-general
On Mon, Feb 22, 2016 at 8:20 AM, Stephen Frost <sfrost@snowman.net> wrote:
>
> Also agreed here, but I've seen field evidence (with reasonable
> configurations) that definitely shows that we're a bit too happy to go
> with a BitmapAnd scan across two indexes where one returns an order of
> magnitude (or less) pages to consider than the other and most of the
> time we spend on the overall query is in going through the index to find
> a bunch of pages we're just going to throw away when we do the AND.
>
> In one specific case which I can recall offhand (having seen it quite
> recently), there was a btree index and a gist index (PostGIS geometry)
> where the btree index pulled back perhaps 100k rows but the gist index
> returned nearly everything (the bounding box included in the query
> covering almost the entire table).  Dropping the gist index greatly
> improved *that* query, but, of course, destroyed the performance of more
> selective queries bounding box queries which didn't include a constraint
> on the column with the btree index (forcing a sequential scan of the
> table).
>
> I've not looked into the specific costing here to see why the BitmapAnd
> ended up being chosen over just doing an index scan with the btree and
> then filtering, but I do believe it to be a problem area that would be
> good to try and improve.  The first question is probably- are we
> properly accounting for the cost of scanning the index vs the cost of
> scanning one index and then applying the filter?

I looked into this before as well, and I think it is vastly
underestimating the cost of adding a bit into the bitmap, near this
comment:

        /*
         * Charge a small amount per retrieved tuple to reflect the costs of
         * manipulating the bitmap.  This is mostly to make sure that a bitmap
         * scan doesn't look to be the same cost as an indexscan to retrieve a
         * single tuple.
         */

It charges 0.1 CPU_operator_cost, while reality seemed to be more like
6 CPU_operator_cost.

The assumption seems to be that this estimate doesn't need to be
accurate, because the cost estimate when the bitmap is *used* will
make up for it.  But when ANDing together bitmaps of very unequal
size, that doesn't happen.

I've been wanting to revisit this in the 9.6 code base, but can't find
my code and notes, so this all from memory.

Cheers,

Jeff


pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Why does query planner choose slower BitmapAnd ?
Next
From: Seamus Abshere
Date:
Subject: Re: Why does query planner choose slower BitmapAnd ?