Re: Cost of sort/order by not estimated by the query planner - Mailing list pgsql-performance

From Tom Lane
Subject Re: Cost of sort/order by not estimated by the query planner
Date
Msg-id 15107.1259773284@sss.pgh.pa.us
In response to Re: Cost of sort/order by not estimated by the query planner  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-performance

Robert Haas <robertmhaas@gmail.com> writes:
> The exact break-even point between the two plans will vary depending
> on what percentage of the rows in the table satisfy the bitmap
> condition.

It's worse than that.  The planner is not too bad about understanding
the percentage-of-rows problem --- at least, assuming you are using
a condition it has statistics for, which it doesn't for bitvector &&.
But whether the indexscan plan is fast will also depend on where the
matching rows are in the index ordering.  If they're all towards the
end you can lose big, and the planner hasn't got stats to let it
predict that.  It just assumes the filter condition is uncorrelated
to the ordering condition.
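
To make the placement effect concrete, here is a rough simulation, not anything taken from the planner itself. It models an ordered index scan as a walk over a list of match flags that stops once a LIMIT's worth of matching rows has been found. The table size, match count, LIMIT value and the helper names (rows_scanned, make_table) are all made up for illustration.

```python
import random


def rows_scanned(matches, limit):
    """Count the rows an ordered scan visits before `limit` matches are found."""
    found = 0
    for visited, is_match in enumerate(matches, start=1):
        if is_match:
            found += 1
            if found == limit:
                return visited
    return len(matches)  # ran off the end without reaching the limit


def make_table(n_rows, n_matching, layout, seed=0):
    """Build match flags in index order (hypothetical data, two layouts)."""
    if layout == "scattered":
        # Filter is uncorrelated with the ordering: matches land anywhere.
        flags = [True] * n_matching + [False] * (n_rows - n_matching)
        random.Random(seed).shuffle(flags)
        return flags
    if layout == "at_end":
        # Worst case: every matching row sorts last.
        return [False] * (n_rows - n_matching) + [True] * n_matching
    raise ValueError(layout)


N_ROWS, N_MATCHING, LIMIT = 100_000, 1_000, 15

# What an "uncorrelated" assumption predicts: limit / selectivity rows.
expected = LIMIT * N_ROWS // N_MATCHING

scattered = rows_scanned(make_table(N_ROWS, N_MATCHING, "scattered"), LIMIT)
at_end = rows_scanned(make_table(N_ROWS, N_MATCHING, "at_end"), LIMIT)

print("predicted under no-correlation assumption:", expected)
print("visited when matches are scattered:       ", scattered)
print("visited when matches are all at the end:  ", at_end)
```

The selectivity is identical in both layouts, so a cost estimate built only from the fraction of matching rows cannot tell them apart, yet the second scan has to walk nearly the whole index.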

My own advice would be to forget the bitmap field and see if you can't
use a collection of plain boolean columns instead.  You might still
lose if there's a correlation problem, but "bitfield && B'1'" is
absolutely positively guaranteed to produce stupid row estimates and
hence bad plan choices.
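
As a back-of-the-envelope sketch of why the row estimate matters: a plain boolean column can have a measured fraction of true values, while an expression the planner has no statistics for gets some fixed guess. The DEFAULT_GUESS constant below is a placeholder chosen for illustration, not a value quoted from the PostgreSQL source, and the data is synthetic.

```python
def estimated_rows(n_rows, selectivity):
    """Row estimate as table size times assumed selectivity."""
    return round(n_rows * selectivity)


# Synthetic flag column: exactly 1% of rows are true.
flags = [i % 100 == 0 for i in range(100_000)]
n_rows = len(flags)

# With per-column statistics the selectivity can be measured from the data.
measured_selectivity = sum(flags) / n_rows

# Without statistics the estimate falls back on a constant.
DEFAULT_GUESS = 0.5  # assumption for illustration only

stats_based = estimated_rows(n_rows, measured_selectivity)
blind = estimated_rows(n_rows, DEFAULT_GUESS)

print("estimate with column statistics:", stats_based)
print("estimate with a fixed guess:    ", blind)
print("overestimate factor:            ", blind / stats_based)
```

An estimate that is off by this much is enough to flip the choice between scanning in index order and fetching the matches first and sorting them.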

Or you could work on introducing a non-stupid selectivity estimator
for &&, but it's not a trivial project.

            regards, tom lane
