Bitmap heap scan performance - Mailing list pgsql-performance

From Rob Emery
Subject Bitmap heap scan performance
Date
Msg-id CAPCETptU5SbOSNGj2dp5h4rCeFubwAcV7BxiPdB=vjpMCxFv=A@mail.gmail.com
Whole thread Raw
Responses Re: Bitmap heap scan performance  (Jeff Janes <jeff.janes@gmail.com>)
Re: Bitmap heap scan performance  (Jeremy Finzel <finzelj@gmail.com>)
List pgsql-performance
Hi Guys,

I’m at a bit of a loss where I can go with the following 2 queries
that are over the same data structure (DDL attached) under postgresql
PostgreSQL 9.5.16 on x86_64-pc-linux-gnu (Debian 9.5.16-1.pgdg90+1),
compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit and
could do with a second set of eyes if someone would oblige.

I’ve attached Query1.txt and Query2.txt along with the DDL for the
tables and indicies and execution plans.

On our production environment we’re running at about 2 seconds (with
the cache warm); I’m getting a comparable speed on my playbox. It
seems to me like the Bitmap Heap Scan on proposal is the issue because
the recheck is throwing away enormous amounts of data. The
has_been_anonymised flag on the proposal is effectively a soft-delete;
so I’ve tried adding something like :

CREATE INDEX ON proposal.proposal (system_id, legacy_organisation_id, reference)
WHERE has_been_anonymised = false;

Which I was hoping would shrink the size of the index significantly
and encourage an index scan rather than bitmap, however it didn’t have
that effect. For reference:

Has_been_anonymised false:    1534790
Has_been_anonymised true:        7072192

Row counts over the whole table in question are :
Proposal.proposal:    8606982    2340 MB
Proposal.note:        2624423      1638 MB

Presumably I could partition proposal on has_been_anonymised, however
the row counts seem low enough that it feels a bit like overkill? We
also need referential integrity so I'll need to wait until that's in
(I think it's coming in PG12?)

If I decrease the number of legacy_organisation_id’s that are being
used then the query performance gets much better, but presumably
that’s because there’s a smaller dataset.

Any thoughts or ideas?

Thanks
Rob

--
 <https://codeweavers.net>


A big Get Focused ‘thank you’
<https://codeweavers.net/company-blog/a-big-get-focused-thank-you>
Why you
should partner with an Agile company
<https://codeweavers.net/company-blog/why-you-should-partner-with-an-agile-company>


*
*
*Phone:* 0800 021 0888   Email: contactus@codeweavers.net
<mailto:contactus@codeweavers.net>
Codeweavers Ltd | Barn 4 | Dunston
Business Village | Dunston | ST18 9AB
Registered in England and Wales No.
04092394 | VAT registration no. 974 9705 63 




<https://twitter.com/Codeweavers_Ltd> 
<https://www.facebook.com/Codeweavers.Ltd/> 
<https://www.linkedin.com/company/codeweavers-limited>

Attachment

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Postgres not using correct indices for views.
Next
From: Jeff Janes
Date:
Subject: Re: Bitmap heap scan performance