Thread: Inaccurate Rows estimate for "Bitmap And" causes Planner to choosewrong join

Inaccurate Rows estimate for "Bitmap And" causes Planner to choosewrong join

From

Steve Pritchard

Date:

06 May 2020, 16:19:48

Version: Postgres 9.6.3 production system (but also tested on Postgres 12)

For my query the Planner is sometimes choosing an execution plan that uses "Bitmap And" (depending on the parameters):

-> Bitmap Heap Scan on observation (cost=484.92..488.93 rows=1 width=203) (actual time=233.129..330.886 rows=15636 loops=1)
Recheck Cond: (((user_id)::text = 'USER123'::text) AND ((loc_id)::text = ANY ('{LOC12345678}'::text[])))
Filter: ((taxa)::text = 'Birds'::text)
Rows Removed by Filter: 3
Heap Blocks: exact=1429
Buffers: shared hit=721 read=944
-> BitmapAnd (cost=484.92..484.92 rows=1 width=0) (actual time=232.888..232.888 rows=0 loops=1)
Buffers: shared hit=3 read=233
-> Bitmap Index Scan on indx_observation_user_id (cost=0.00..81.14 rows=3277 width=0) (actual time=169.003..169.003 rows=32788 loops=1)
Index Cond: ((user_id)::text = 'USER123'::text)
Buffers: shared hit=2 read=134
-> Bitmap Index Scan on indx_observation_loc_id (cost=0.00..403.52 rows=13194 width=0) (actual time=63.520..63.520 rows=15853 loops=1)
Index Cond: ((loc_id)::text = ANY ('{LOC12345678}'::text[]))
Buffers: shared hit=1 read=99

(fragment of explain plan)

However it is estimating the number of rows as 1, whereas in this case the actual number of rows is 15636 (it can be much higher).

The Planner then carries this estimate of "1 row" through the rest of the query (which is quite complex), and then makes poor choices about joins.

e.g. uses "Nested Loop Left Join" because it's only expecting one row, whereas in practice it has to do 15636 loops which is very slow.

Note that in cases where the Planner selects a single Index Scan for this query (with different parameters), the Planner makes an accurate estimate of the number of rows and then makes sensible selections of joins (i.e. quick).

i.e. the issue seems to be with the "Bitmap And".

I don't have an index with both user_id & loc_id, as this is one of several different combinations that can arise (it would require quite a few indexes to cover all the possible combinations). However if I did have such an index, the planner would presumably be able to use the statistics for user_id and loc_id to estimate the number of rows.

So why can't it make an accurate estimate of the rows with a "Bitmap And" & " Bitmap Heap Scan"? (as above)

Steve Pritchard

Database Developer

British Trust for Ornithology, The Nunnery, Thetford, Norfolk IP24 2PU, UK
Tel: +44 (0)1842 750050, fax: +44 (0)1842 750030
Registered Charity No 216652 (England & Wales) No SC039193 (Scotland)
Company Limited by Guarantee No 357284 (England & Wales)

Re: Inaccurate Rows estimate for "Bitmap And" causes Planner tochoose wrong join

From

Justin Pryzby

Date:

06 May 2020, 16:34:57

On Wed, May 06, 2020 at 05:19:48PM +0100, Steve Pritchard wrote:
> Version: Postgres 9.6.3 production system (but also tested on Postgres 12)
> 
> For my query the Planner is sometimes choosing an execution plan that uses
> "Bitmap And" (depending on the parameters):
> 
> The Planner then carries this estimate of "1 row" through the rest of the
> query (which is quite complex), and then makes poor choices about joins.
> e.g. uses "Nested Loop Left Join" because it's only expecting one row,
> whereas in practice it has to do 15636 loops which is very slow.

> Note that in cases where the Planner selects a single Index Scan for this
> query (with different parameters), the Planner makes an accurate estimate
> of the number of rows and then makes sensible selections of joins (i.e.
> quick).
> i.e. the issue seems to be with the "Bitmap And".
> 
> I don't have an index with both user_id & loc_id, as this is one of several
> different combinations that can arise (it would require quite a few indexes
> to cover all the possible combinations). However if I did have such an
> index, the planner would presumably be able to use the statistics for
> user_id and loc_id to estimate the number of rows.
> 
> So why can't it make an accurate estimate of the rows with a "Bitmap And" &
> " Bitmap Heap Scan"? (as above)

It probably *has* statistics for user_id and loc_id, but doesn't have stats for
(user_id,loc_id).

Presumbly the conditions are partially redundant, so loc_id => user_id
(strictly implies or just correlated) or the other way around.

In pg10+ you can use "CREATE STATISTICS (dependencies)" to improve that.
https://www.postgresql.org/docs/devel/sql-createstatistics.html

Otherwise you can use the "CREATE TYPE / CREATE INDEX" trick Tomas described here:
https://www.postgresql.org/message-id/20190424003633.ruvhbv5ro3fawo67%40development

-- 
Justin

Re: Inaccurate Rows estimate for "Bitmap And" causes Planner tochoose wrong join

From

Jeff Janes

Date:

06 May 2020, 18:24:55

On Wed, May 6, 2020 at 12:20 PM Steve Pritchard <steve.pritchard@bto.org> wrote:

Version: Postgres 9.6.3 production system (but also tested on Postgres 12)

For my query the Planner is sometimes choosing an execution plan that uses "Bitmap And" (depending on the parameters):

-> Bitmap Heap Scan on observation (cost=484.92..488.93 rows=1 width=203) (actual time=233.129..330.886 rows=15636 loops=1)
Recheck Cond: (((user_id)::text = 'USER123'::text) AND ((loc_id)::text = ANY ('{LOC12345678}'::text[])))

If you change " = ANY(array_of_one)" to " = scalar", does that change anything? You might be able to fix this (in v12) using CREATE STATISTICS, but I don't know if that mechanism can see through the ANY(array_of_one) wrapper.

Note that in cases where the Planner selects a single Index Scan for this query (with different parameters), the Planner makes an accurate estimate of the number of rows and then makes sensible selections of joins (i.e. quick).
i.e. the issue seems to be with the "Bitmap And".

I don't know if this nitpick matters, but I don't think that that is how the planner works. The row estimates work from the top down, not the bottom up. The row estimate of 1 is based on what conditions the bitmap heap scan implements, it is not arrived at by combining the estimates from the index scans below it. If it were to change to a different type of node but implemented the same conditions, I think it would have the same row estimate.

I don't have an index with both user_id & loc_id, as this is one of several different combinations that can arise (it would require quite a few indexes to cover all the possible combinations).

Are you actually experiencing problems with those other combinations as well? If not, I wouldn't worry about solving hypothetical problems. If those other combinations are actually problems and you go with CREATE STATISTICS, then you would have to be creating a lot of different statistics. That would still be ugly, but at least the overhead for statistics is lower than for indexes.

However if I did have such an index, the planner would presumably be able to use the statistics for user_id and loc_id to estimate the number of rows.

Indexes on physical columns do not have statistics, so making that index would not help with the estimation. (Expressional indexes do have statistics, but I don't see that helping you here). So while this node would execute faster with that index, it would still be kicking the unshown nested loop left join 15,636 times when it thinks it will be doing it once, and so would still be slow. The most robust solution might be to make the outer part of that nested loop left join faster, so that your system would be more tolerant of statistics problems.

So why can't it make an accurate estimate of the rows with a "Bitmap And" & " Bitmap Heap Scan"? (as above)

In the absence of custom statistics, it assumes the selectivities of user_id = 'USER123', of loc_id = ANY ('{LOC12345678}'::text[]), and of taxa = 'Birds' are all independent of each other and can be multiplied to arrive at the overall selectivity. But clearly that is not the case. Bird watchers mostly watch near where they live, not in random other places.

Cheers,

Jeff

Re: Inaccurate Rows estimate for "Bitmap And" causes Planner tochoose wrong join

From

Steve Pritchard

Date:

13 May 2020, 09:33:35

Many thanks Justin & Jeff for your replies.

Presumbly the conditions are partially redundant, so loc_id => user_id

Yes you're right. I had overlooked this.

I've done some further testing and this confirms what you say: if the WHERE columns are independent, then the Planner makes a reasonable estimate of the number of rows. irrespective of whether it uses a single index or a "Bitmap And" of two indexes.

I've also tested "create statistics" on Postgres 12:

gives good estimate with WHERE user_id = 'USER123' and loc_id = 'LOC12345678'
but Plan Rows = 5 with WHERE user_id = 'USER123' and loc_id = ANY('{ LOC12345678 }'::text[])
Note: if I omit the user_id condition then it gives a good estimate, i.e. with WHERE loc_id = ANY('{ LOC12345678 }'::text[])

So statistics objects don't seem to be able to handle the combination of dependencies and arrays (at least in 12.2).

Steve

On Wed, 6 May 2020 at 19:25, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, May 6, 2020 at 12:20 PM Steve Pritchard <steve.pritchard@bto.org> wrote:
Version: Postgres 9.6.3 production system (but also tested on Postgres 12)

For my query the Planner is sometimes choosing an execution plan that uses "Bitmap And" (depending on the parameters):

-> Bitmap Heap Scan on observation (cost=484.92..488.93 rows=1 width=203) (actual time=233.129..330.886 rows=15636 loops=1)
Recheck Cond: (((user_id)::text = 'USER123'::text) AND ((loc_id)::text = ANY ('{LOC12345678}'::text[])))

If you change " = ANY(array_of_one)" to " = scalar", does that change anything? You might be able to fix this (in v12) using CREATE STATISTICS, but I don't know if that mechanism can see through the ANY(array_of_one) wrapper.

Note that in cases where the Planner selects a single Index Scan for this query (with different parameters), the Planner makes an accurate estimate of the number of rows and then makes sensible selections of joins (i.e. quick).
i.e. the issue seems to be with the "Bitmap And".

I don't know if this nitpick matters, but I don't think that that is how the planner works. The row estimates work from the top down, not the bottom up. The row estimate of 1 is based on what conditions the bitmap heap scan implements, it is not arrived at by combining the estimates from the index scans below it. If it were to change to a different type of node but implemented the same conditions, I think it would have the same row estimate.

I don't have an index with both user_id & loc_id, as this is one of several different combinations that can arise (it would require quite a few indexes to cover all the possible combinations).

Are you actually experiencing problems with those other combinations as well? If not, I wouldn't worry about solving hypothetical problems. If those other combinations are actually problems and you go with CREATE STATISTICS, then you would have to be creating a lot of different statistics. That would still be ugly, but at least the overhead for statistics is lower than for indexes.

However if I did have such an index, the planner would presumably be able to use the statistics for user_id and loc_id to estimate the number of rows.

Indexes on physical columns do not have statistics, so making that index would not help with the estimation. (Expressional indexes do have statistics, but I don't see that helping you here). So while this node would execute faster with that index, it would still be kicking the unshown nested loop left join 15,636 times when it thinks it will be doing it once, and so would still be slow. The most robust solution might be to make the outer part of that nested loop left join faster, so that your system would be more tolerant of statistics problems.

So why can't it make an accurate estimate of the rows with a "Bitmap And" & " Bitmap Heap Scan"? (as above)

In the absence of custom statistics, it assumes the selectivities of user_id = 'USER123', of loc_id = ANY ('{LOC12345678}'::text[]), and of taxa = 'Birds' are all independent of each other and can be multiplied to arrive at the overall selectivity. But clearly that is not the case. Bird watchers mostly watch near where they live, not in random other places.

Cheers,

Jeff

Steve Pritchard

Database Developer