Thread: Working around, or with, bitmap heap scan?

Working around, or with, bitmap heap scan?

From
"James A. Robinson"
Date:
Hello,

I'm newly exposed to a Postgres 9.4 database system, and am
trying to understand how I might optimize a query that is taking
a long time to return.

What I'm observing is an uncached query that takes much much
longer to complete, sometimes minutes longer, when
enable_bitmapscan is true.  Even after the query result is
cached, it seems to consistently be 300 ms slower when
enable_bitmapscan is true (so taking 900 ms to return instead of
600 ms).

This is on an AWS RDS Postgres 9.4 instance, backed by SSD, with
work_mem set to 1 gb (I'm not sure if that is a default or if
someone set it to that value on purpose).  Autovacuum is on and
an ANALYSIS has been run against the db.

I'm hoping for some guidance from experts here on whether or not
I might be able to change something to get the query plan
selection to use the faster index scan that appears to be running
when the enable_bitmapscan is false.  Or perhaps I should be
trying to craft a new index to reduce the chance that it needs
the bitmap?

I tried seeing whether or not reducing the random_page_cost from
the default 1.1 to 1.0 effected any change, and it does not.

Here's the part of the EXPLAIN that appears to change when
enable_bitmapscan is true:

->  Hash Left Join  (cost=24696.39..594784.63 rows=11732 width=281)
    Hash Cond: ((cr.cs_id)::text = (cm.raw_cs_id)::text)
    ->  Hash Join  (cost=24680.76..594368.09 rows=11732 width=135)
      Hash Cond: (cr.resource_id = r_1.id)
      ->  Bitmap Heap Scan on cs_resource cr  (cost=6999.99..569213.13
rows=588549 width=27)
          Recheck Cond: ((cs_id)::text = ANY
('{example.org,_zero_row_}'::text[]))
          ->  Bitmap Index Scan on cs_resource_by_cs_idx
(cost=0.00..6852.86 rows=588549 width=0)
            Index Cond: ((cs_id)::text = ANY
('{example.org,_zero_row_}'::text[]))
      ->  Hash  (cost=17576.61..17576.61 rows=8333 width=124)
          ->  Hash Right Join  (cost=4132.63..17576.61 rows=8333 width=124)
            Hash Cond: (pr.id = r_1.id)
            ->  Seq Scan on pub_resource pr  (cost=0.00..8135.29
rows=418029 width=69)
            ->  Hash  (cost=4028.47..4028.47 rows=8333 width=63)
                ->  Index Scan using "idx_1" on resource r_1
(cost=0.42..4028.47 rows=8333 width=63)
                  Index Cond: ((pl_id = 6) AND ((data_type_id)::text =
'DATABASE'::text))
    ->  Hash  (cost=12.50..12.50 rows=250 width=292)
      ->  Seq Scan on cs_mappings cm  (cost=0.00..12.50 rows=250 width=292)

compared to when enable_bitmapscan is false:

->  Hash Left Join  (cost=17509.48..648717.08 rows=11732 width=281)
    Hash Cond: (r_1.id = pr.id)
    ->  Hash Left Join  (cost=4148.83..635121.79 rows=11732 width=228)
      Hash Cond: ((cr.cs_id)::text = (cm.raw_cs_id)::text)
      ->  Hash Join  (cost=4133.20..634705.25 rows=11732 width=82)
          Hash Cond: (cr.resource_id = r_1.id)
          ->  Index Scan using "idx_2" on cs_resource cr
(cost=0.57..623098.44 rows=588549 width=27)
            Index Cond: ((cs_id)::text = ANY
('{example.org,_zero_row_}'::text[]))
          ->  Hash  (cost=4028.47..4028.47 rows=8333 width=63)
            ->  Index Scan using "idx_1" on resource r_1
(cost=0.42..4028.47 rows=8333 width=63)
                Index Cond: ((pl_id = 6) AND ((data_type_id)::text =
'DATABASE'::text))
      ->  Hash  (cost=12.50..12.50 rows=250 width=292)
          ->  Seq Scan on cs_mappings cm  (cost=0.00..12.50 rows=250 width=292)
    ->  Hash  (cost=8135.29..8135.29 rows=418029 width=69)
      ->  Seq Scan on pub_resource pr  (cost=0.00..8135.29 rows=418029 width=69)

I'm not sure if the following details will be important to know,
but opting for more detail, here's an example of the query (which
returns 183 rows):

> EXPLAIN SELECT
data_type_id, title,
pub_name,
pl_name,
pl_id,
rpt_month,
cs_dsp_name,
cs_id,
grp_id ,
zpe,
sect_type,
sr_fed,
sr_reg,
r_v,
r_x_c,
prop_id
FROM report_view
WHERE pl_id IN (6)
AND data_type_id = 'DATABASE'
AND rpt_month between '2018-06-01' and '2018-06-30'
AND cs_id IN ( 'example.org' , '_zero_row_')
ORDER BY data_type_id ASC, CAST(title AS VARCHAR(256)) ASC, rpt_month ASC;

The underlying report_view that backs this query is... well, I'm
honestly having a very hard time wrapping my head around it (w/
original developer long gone).

I'll also note the tables it hits are massive, e.g., 265 gb for the
cs_resource_event table and 65 gb for the cs_resource table.

> \d+ report_view
                               View "public.report_view"
        Column         |           Type           | Modifiers |
Storage  | Description
-----------------------+--------------------------+-----------+----------+-------------
 data_type_id          | character varying(32)    |           | extended |
 title                 | text                     |           | extended |
 zpe                   | integer                  |           | plain    |
 pub_name              | character varying        |           | extended |
 pl_id                 | bigint                   |           | plain    |
 xid                   | character varying        |           | extended |
 itnv                  | character varying        |           | extended |
 prop_id               | character varying        |           | extended |
 p1_identity           | character varying        |           | extended |
 p2_identity           | character varying        |           | extended |
 rpt_month             | timestamp with time zone |           | plain    |
 ft_html               | integer                  |           | plain    |
 ft_html_mob           | integer                  |           | plain    |
 ft_pdf                | integer                  |           | plain    |
 ft_pdf_mob            | integer                  |           | plain    |
 ft_ps                 | integer                  |           | plain    |
 ft_ps_mob             | integer                  |           | plain    |
 ft_epub               | integer                  |           | plain    |
 sect_html             | integer                  |           | plain    |
 sect_html_mob         | integer                  |           | plain    |
 sect_pdf              | integer                  |           | plain    |
 ft_other              | integer                  |           | plain    |
 ft_other_mob          | integer                  |           | plain    |
 ft_oa                 | integer                  |           | plain    |
 toc                   | integer                  |           | plain    |
 abs                   | integer                  |           | plain    |
 ref                   | integer                  |           | plain    |
 data_set              | integer                  |           | plain    |
 ct_1                  | integer                  |           | plain    |
 ct_2                  | integer                  |           | plain    |
 ct_3                  | integer                  |           | plain    |
 ct_4                  | integer                  |           | plain    |
 ct_5                  | integer                  |           | plain    |
 media_1               | integer                  |           | plain    |
 media_2               | integer                  |           | plain    |
 media_3               | integer                  |           | plain    |
 media_4               | integer                  |           | plain    |
 media_5               | integer                  |           | plain    |
 r_v                   | integer                  |           | plain    |
 r_x_c                 | integer                  |           | plain    |
 sr_reg                | integer                  |           | plain    |
 sr_fed                | integer                  |           | plain    |
 turnaway              | integer                  |           | plain    |
 no_access             | integer                  |           | plain    |
 other                 | integer                  |           | plain    |
 sect_type             | character varying        |           | extended |
 cs_id                 | character varying        |           | extended |
 cs_dsp_name           | character varying        |           | extended |
 grp_id                | character varying        |           | extended |
 pl_name               | character varying(128)   |           | extended |
View definition:
 SELECT r.data_type_id,
    r.title,
    r.zpe,
    r.pub_name,
    r.pl_id,
    r.xid,
    r.itnv,
    r.prop_id,
    r.p1_identity,
    r.p2_identity,
    r.rpt_month,
    r.ft_html,
    r.ft_html_mob,
    r.ft_pdf,
    r.ft_pdf_mob,
    r.ft_ps,
    r.ft_ps_mob,
    r.ft_epub,
    r.sect_html,
    r.sect_html_mob,
    r.sect_pdf,
    r.ft_other,
    r.ft_other_mob,
    r.ft_oa,
    r.toc,
    r.abs,
    r.ref,
    r.data_set,
    r.ct_1,
    r.ct_2,
    r.ct_3,
    r.ct_4,
    r.ct_5,
    r.media_1,
    r.media_2,
    r.media_3,
    r.media_4,
    r.media_5,
    r.r_v,
    r.r_x_c,
    r.sr_reg,
    r.sr_fed,
    r.turnaway,
    r.no_access,
    r.other,
    r.sect_type,
    r.cs_id,
    r.cs_dsp_name,
    r.grp_id,
    p.pl_name
   FROM ( SELECT r_1.data_type_id,
            COALESCE(r_1.title, ''::text) AS title,
            COALESCE(pr.zpe, 0) AS zpe,
            COALESCE(pr.publisher, ''::character varying) AS pub_name,
            r_1.pl_id,
            COALESCE(pr.xid, ''::character varying) AS xid,
            COALESCE(pr.itnv, ''::character varying) AS itnv,
            COALESCE(r_1.prop_id, ''::character varying) AS prop_id,
            COALESCE(pr.p1_identity, ''::character varying) AS p1_identity,
            COALESCE(pr.p2_identity, ''::character varying) AS p2_identity,
            generate_series.generate_series AS rpt_month,
            COALESCE(cre.ft_html, 0) AS ft_html,
            COALESCE(cre.ft_html_mob, 0) AS ft_html_mob,
            COALESCE(cre.ft_pdf, 0) AS ft_pdf,
            COALESCE(cre.ft_pdf_mob, 0) AS ft_pdf_mob,
            COALESCE(cre.ft_ps, 0) AS ft_ps,
            COALESCE(cre.ft_ps_mob, 0) AS ft_ps_mob,
            COALESCE(cre.ft_epub, 0) AS ft_epub,
            COALESCE(cre.sect_html, 0) AS sect_html,
            COALESCE(cre.sect_html_mob, 0) AS sect_html_mob,
            COALESCE(cre.sect_pdf, 0) AS sect_pdf,
            COALESCE(cre.ft_other, 0) AS ft_other,
            COALESCE(cre.ft_other_mob, 0) AS ft_other_mob,
            COALESCE(cre.ft_oa, 0) AS ft_oa,
            COALESCE(cre.toc, 0) AS toc,
            COALESCE(cre.abs, 0) AS abs,
            COALESCE(cre.ref, 0) AS ref,
            COALESCE(cre.data_set, 0) AS data_set,
            COALESCE(cre.ct_1, 0) AS ct_1,
            COALESCE(cre.ct_2, 0) AS ct_2,
            COALESCE(cre.ct_3, 0) AS ct_3,
            COALESCE(cre.ct_4, 0) AS ct_4,
            COALESCE(cre.ct_5, 0) AS ct_5,
            COALESCE(cre.media_1, 0) AS media_1,
            COALESCE(cre.media_2, 0) AS media_2,
            COALESCE(cre.media_3, 0) AS media_3,
            COALESCE(cre.media_4, 0) AS media_4,
            COALESCE(cre.media_5, 0) AS media_5,
            COALESCE(cre.r_v, 0) AS r_v,
            COALESCE(cre.r_x_c, 0) AS r_x_c,
            COALESCE(cre.sr_reg, 0) AS sr_reg,
            COALESCE(cre.sr_fed, 0) AS sr_fed,
            COALESCE(cre.turnaway, 0) AS turnaway,
            COALESCE(cre.no_access, 0) AS no_access,
            COALESCE(cre.other, 0) AS other,
            COALESCE(cre.subtype, ''::character varying) AS sect_type,
            cr.cs_id,
            COALESCE(cm.mapped_cs_id, cr.cs_id) AS cs_dsp_name,
            COALESCE(cre.grp, ''::character varying) AS grp_id
           FROM cs_resource cr
             LEFT JOIN cs_mappings cm ON cm.raw_cs_id::text = cr.cs_id::text
             LEFT JOIN resource r_1 ON r_1.id = cr.resource_id
             LEFT JOIN pub_resource pr ON r_1.id = pr.id
             CROSS JOIN generate_series('2000-01-01
00:00:00+00'::timestamp with time zone, now(), '1 mon'::interval)
generate_series(generate_series)
             LEFT JOIN cs_resource_event cre
             ON cr.id = cre.cs_resource_id
             AND generate_series.generate_series = cre.rpt_month
             AND cre.art_id = 0
             AND cre.additional_grp_event <> true
        UNION
         SELECT r_1.data_type_id,
            COALESCE(r_1.title, ''::text) AS title,
            COALESCE(r_1.zpe, 0) AS zpe,
            COALESCE(r_1.publisher, ''::character varying) AS pub_name,
            rl.pl_id,
            COALESCE(r_1.xid, ''::character varying) AS xid,
            COALESCE(r_1.itnv, ''::character varying) AS itnv,
            COALESCE(r_1.prop_id, ''::character varying) AS prop_id,
            COALESCE(r_1.p1_identity, ''::character varying) AS p1_identity,
            COALESCE(r_1.p2_identity, ''::character varying) AS p2_identity,
            generate_series.generate_series AS rpt_month,
            0 AS ft_html,
            0 AS ft_html_mob,
            0 AS ft_pdf,
            0 AS ft_pdf_mob,
            0 AS ft_ps,
            0 AS ft_ps_mob,
            0 AS ft_epub,
            0 AS sect_html,
            0 AS sect_html_mob,
            0 AS sect_pdf,
            0 AS ft_other,
            0 AS ft_other_mob,
            0 AS ft_oa,
            0 AS toc,
            0 AS abs,
            0 AS ref,
            0 AS data_set,
            0 AS ct_1,
            0 AS ct_2,
            0 AS ct_3,
            0 AS ct_4,
            0 AS ct_5,
            0 AS media_1,
            0 AS media_2,
            0 AS media_3,
            0 AS media_4,
            0 AS media_5,
            0 AS r_v,
            0 AS r_x_c,
            0 AS sr_reg,
            0 AS sr_fed,
            0 AS turnaway,
            0 AS no_access,
            0 AS other,
            ''::character varying AS sect_type,
            '_zero_row_'::character varying AS cs_id,
            '_zero_row_'::character varying AS cs_dsp_name,
            '_zero_row_'::character varying AS grp_id
           FROM resource_list_entry r_1
             LEFT JOIN resource_lists rl ON r_1.list_code::text =
rl.list_code::text
             CROSS JOIN generate_series('2000-01-01
00:00:00+00'::timestamp with time zone, now(), '1 mon'::interval)
generate_series(generate_series)) r
     JOIN platform p ON p.id = r.pl_id
  WHERE r.data_type_id::text <> ALL (ARRAY['BK_ART'::character
varying::text, 'JNL_ART'::character varying::text]);


Re: Working around, or with, bitmap heap scan?

From
Laurenz Albe
Date:
James A. Robinson wrote:
> I'm newly exposed to a Postgres 9.4 database system, and am
> trying to understand how I might optimize a query that is taking
> a long time to return.
> 
> What I'm observing is an uncached query that takes much much
> longer to complete, sometimes minutes longer, when
> enable_bitmapscan is true.  Even after the query result is
> cached, it seems to consistently be 300 ms slower when
> enable_bitmapscan is true (so taking 900 ms to return instead of
> 600 ms).

It would be helpful to have EXPLAIN (ANALYZE, BUFFERS) output for
both queries.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com



Re: Working around, or with, bitmap heap scan?

From
"James A. Robinson"
Date:
Thank you.  Do you need the entire output of 'EXPLAIN (ANALYZE,
BUFFERS) ...', or will just the sub-section, as I emailed for the
plain 'EXPLAIN ...' I initially e-mailed. do?


Re: Working around, or with, bitmap heap scan?

From
Laurenz Albe
Date:
James A. Robinson wrote:
> Thank you.  Do you need the entire output of 'EXPLAIN (ANALYZE,
> BUFFERS) ...', or will just the sub-section, as I emailed for the
> plain 'EXPLAIN ...' I initially e-mailed. do?

It should be the whole thing.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com



Re: Working around, or with, bitmap heap scan?

From
"James A. Robinson"
Date:
Well, so as to not spam the list here's a link to a folder with copies
of the output:

https://drive.google.com/open?id=1lSlx7UMUMNgRft2B3Rq2zc4WIRJLLOqU
On Fri, Nov 2, 2018 at 4:12 AM James A. Robinson <jim.robinson@gmail.com> wrote:
>
> Thank you.  Do you need the entire output of 'EXPLAIN (ANALYZE,
> BUFFERS) ...', or will just the sub-section, as I emailed for the
> plain 'EXPLAIN ...' I initially e-mailed. do?


Re: Working around, or with, bitmap heap scan?

From
Laurenz Albe
Date:
On Fri, 2018-11-02 at 04:55 -0700, James A. Robinson wrote:
> Well, so as to not spam the list here's a link to a folder with copies
> of the output:

I have no idea how to improve that, sorry.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com



Re: Working around, or with, bitmap heap scan?

From
"James A. Robinson"
Date:
On Fri, Nov 2, 2018 at 6:21 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> I have no idea how to improve that, sorry.

Heh, thank you for looking.  From your response I've got a vision of a
medical drama where the doctor looks over some test results and sadly
informs the patient "I'm sorry, there's nothing more we can do." :)