Working around, or with, bitmap heap scan? - Mailing list pgsql-general

From James A. Robinson
Subject Working around, or with, bitmap heap scan?
Date
Msg-id CAPd04b7yzZH64JKUt8w51kFh+Ze2rxzSc50ghhqKAGR=vsgnXA@mail.gmail.com
Whole thread Raw
Responses Re: Working around, or with, bitmap heap scan?
List pgsql-general
Hello,

I'm newly exposed to a Postgres 9.4 database system, and am
trying to understand how I might optimize a query that is taking
a long time to return.

What I'm observing is an uncached query that takes much much
longer to complete, sometimes minutes longer, when
enable_bitmapscan is true.  Even after the query result is
cached, it seems to consistently be 300 ms slower when
enable_bitmapscan is true (so taking 900 ms to return instead of
600 ms).

This is on an AWS RDS Postgres 9.4 instance, backed by SSD, with
work_mem set to 1 gb (I'm not sure if that is a default or if
someone set it to that value on purpose).  Autovacuum is on and
an ANALYSIS has been run against the db.

I'm hoping for some guidance from experts here on whether or not
I might be able to change something to get the query plan
selection to use the faster index scan that appears to be running
when the enable_bitmapscan is false.  Or perhaps I should be
trying to craft a new index to reduce the chance that it needs
the bitmap?

I tried seeing whether or not reducing the random_page_cost from
the default 1.1 to 1.0 effected any change, and it does not.

Here's the part of the EXPLAIN that appears to change when
enable_bitmapscan is true:

->  Hash Left Join  (cost=24696.39..594784.63 rows=11732 width=281)
    Hash Cond: ((cr.cs_id)::text = (cm.raw_cs_id)::text)
    ->  Hash Join  (cost=24680.76..594368.09 rows=11732 width=135)
      Hash Cond: (cr.resource_id = r_1.id)
      ->  Bitmap Heap Scan on cs_resource cr  (cost=6999.99..569213.13
rows=588549 width=27)
          Recheck Cond: ((cs_id)::text = ANY
('{example.org,_zero_row_}'::text[]))
          ->  Bitmap Index Scan on cs_resource_by_cs_idx
(cost=0.00..6852.86 rows=588549 width=0)
            Index Cond: ((cs_id)::text = ANY
('{example.org,_zero_row_}'::text[]))
      ->  Hash  (cost=17576.61..17576.61 rows=8333 width=124)
          ->  Hash Right Join  (cost=4132.63..17576.61 rows=8333 width=124)
            Hash Cond: (pr.id = r_1.id)
            ->  Seq Scan on pub_resource pr  (cost=0.00..8135.29
rows=418029 width=69)
            ->  Hash  (cost=4028.47..4028.47 rows=8333 width=63)
                ->  Index Scan using "idx_1" on resource r_1
(cost=0.42..4028.47 rows=8333 width=63)
                  Index Cond: ((pl_id = 6) AND ((data_type_id)::text =
'DATABASE'::text))
    ->  Hash  (cost=12.50..12.50 rows=250 width=292)
      ->  Seq Scan on cs_mappings cm  (cost=0.00..12.50 rows=250 width=292)

compared to when enable_bitmapscan is false:

->  Hash Left Join  (cost=17509.48..648717.08 rows=11732 width=281)
    Hash Cond: (r_1.id = pr.id)
    ->  Hash Left Join  (cost=4148.83..635121.79 rows=11732 width=228)
      Hash Cond: ((cr.cs_id)::text = (cm.raw_cs_id)::text)
      ->  Hash Join  (cost=4133.20..634705.25 rows=11732 width=82)
          Hash Cond: (cr.resource_id = r_1.id)
          ->  Index Scan using "idx_2" on cs_resource cr
(cost=0.57..623098.44 rows=588549 width=27)
            Index Cond: ((cs_id)::text = ANY
('{example.org,_zero_row_}'::text[]))
          ->  Hash  (cost=4028.47..4028.47 rows=8333 width=63)
            ->  Index Scan using "idx_1" on resource r_1
(cost=0.42..4028.47 rows=8333 width=63)
                Index Cond: ((pl_id = 6) AND ((data_type_id)::text =
'DATABASE'::text))
      ->  Hash  (cost=12.50..12.50 rows=250 width=292)
          ->  Seq Scan on cs_mappings cm  (cost=0.00..12.50 rows=250 width=292)
    ->  Hash  (cost=8135.29..8135.29 rows=418029 width=69)
      ->  Seq Scan on pub_resource pr  (cost=0.00..8135.29 rows=418029 width=69)

I'm not sure if the following details will be important to know,
but opting for more detail, here's an example of the query (which
returns 183 rows):

> EXPLAIN SELECT
data_type_id, title,
pub_name,
pl_name,
pl_id,
rpt_month,
cs_dsp_name,
cs_id,
grp_id ,
zpe,
sect_type,
sr_fed,
sr_reg,
r_v,
r_x_c,
prop_id
FROM report_view
WHERE pl_id IN (6)
AND data_type_id = 'DATABASE'
AND rpt_month between '2018-06-01' and '2018-06-30'
AND cs_id IN ( 'example.org' , '_zero_row_')
ORDER BY data_type_id ASC, CAST(title AS VARCHAR(256)) ASC, rpt_month ASC;

The underlying report_view that backs this query is... well, I'm
honestly having a very hard time wrapping my head around it (w/
original developer long gone).

I'll also note the tables it hits are massive, e.g., 265 gb for the
cs_resource_event table and 65 gb for the cs_resource table.

> \d+ report_view
                               View "public.report_view"
        Column         |           Type           | Modifiers |
Storage  | Description
-----------------------+--------------------------+-----------+----------+-------------
 data_type_id          | character varying(32)    |           | extended |
 title                 | text                     |           | extended |
 zpe                   | integer                  |           | plain    |
 pub_name              | character varying        |           | extended |
 pl_id                 | bigint                   |           | plain    |
 xid                   | character varying        |           | extended |
 itnv                  | character varying        |           | extended |
 prop_id               | character varying        |           | extended |
 p1_identity           | character varying        |           | extended |
 p2_identity           | character varying        |           | extended |
 rpt_month             | timestamp with time zone |           | plain    |
 ft_html               | integer                  |           | plain    |
 ft_html_mob           | integer                  |           | plain    |
 ft_pdf                | integer                  |           | plain    |
 ft_pdf_mob            | integer                  |           | plain    |
 ft_ps                 | integer                  |           | plain    |
 ft_ps_mob             | integer                  |           | plain    |
 ft_epub               | integer                  |           | plain    |
 sect_html             | integer                  |           | plain    |
 sect_html_mob         | integer                  |           | plain    |
 sect_pdf              | integer                  |           | plain    |
 ft_other              | integer                  |           | plain    |
 ft_other_mob          | integer                  |           | plain    |
 ft_oa                 | integer                  |           | plain    |
 toc                   | integer                  |           | plain    |
 abs                   | integer                  |           | plain    |
 ref                   | integer                  |           | plain    |
 data_set              | integer                  |           | plain    |
 ct_1                  | integer                  |           | plain    |
 ct_2                  | integer                  |           | plain    |
 ct_3                  | integer                  |           | plain    |
 ct_4                  | integer                  |           | plain    |
 ct_5                  | integer                  |           | plain    |
 media_1               | integer                  |           | plain    |
 media_2               | integer                  |           | plain    |
 media_3               | integer                  |           | plain    |
 media_4               | integer                  |           | plain    |
 media_5               | integer                  |           | plain    |
 r_v                   | integer                  |           | plain    |
 r_x_c                 | integer                  |           | plain    |
 sr_reg                | integer                  |           | plain    |
 sr_fed                | integer                  |           | plain    |
 turnaway              | integer                  |           | plain    |
 no_access             | integer                  |           | plain    |
 other                 | integer                  |           | plain    |
 sect_type             | character varying        |           | extended |
 cs_id                 | character varying        |           | extended |
 cs_dsp_name           | character varying        |           | extended |
 grp_id                | character varying        |           | extended |
 pl_name               | character varying(128)   |           | extended |
View definition:
 SELECT r.data_type_id,
    r.title,
    r.zpe,
    r.pub_name,
    r.pl_id,
    r.xid,
    r.itnv,
    r.prop_id,
    r.p1_identity,
    r.p2_identity,
    r.rpt_month,
    r.ft_html,
    r.ft_html_mob,
    r.ft_pdf,
    r.ft_pdf_mob,
    r.ft_ps,
    r.ft_ps_mob,
    r.ft_epub,
    r.sect_html,
    r.sect_html_mob,
    r.sect_pdf,
    r.ft_other,
    r.ft_other_mob,
    r.ft_oa,
    r.toc,
    r.abs,
    r.ref,
    r.data_set,
    r.ct_1,
    r.ct_2,
    r.ct_3,
    r.ct_4,
    r.ct_5,
    r.media_1,
    r.media_2,
    r.media_3,
    r.media_4,
    r.media_5,
    r.r_v,
    r.r_x_c,
    r.sr_reg,
    r.sr_fed,
    r.turnaway,
    r.no_access,
    r.other,
    r.sect_type,
    r.cs_id,
    r.cs_dsp_name,
    r.grp_id,
    p.pl_name
   FROM ( SELECT r_1.data_type_id,
            COALESCE(r_1.title, ''::text) AS title,
            COALESCE(pr.zpe, 0) AS zpe,
            COALESCE(pr.publisher, ''::character varying) AS pub_name,
            r_1.pl_id,
            COALESCE(pr.xid, ''::character varying) AS xid,
            COALESCE(pr.itnv, ''::character varying) AS itnv,
            COALESCE(r_1.prop_id, ''::character varying) AS prop_id,
            COALESCE(pr.p1_identity, ''::character varying) AS p1_identity,
            COALESCE(pr.p2_identity, ''::character varying) AS p2_identity,
            generate_series.generate_series AS rpt_month,
            COALESCE(cre.ft_html, 0) AS ft_html,
            COALESCE(cre.ft_html_mob, 0) AS ft_html_mob,
            COALESCE(cre.ft_pdf, 0) AS ft_pdf,
            COALESCE(cre.ft_pdf_mob, 0) AS ft_pdf_mob,
            COALESCE(cre.ft_ps, 0) AS ft_ps,
            COALESCE(cre.ft_ps_mob, 0) AS ft_ps_mob,
            COALESCE(cre.ft_epub, 0) AS ft_epub,
            COALESCE(cre.sect_html, 0) AS sect_html,
            COALESCE(cre.sect_html_mob, 0) AS sect_html_mob,
            COALESCE(cre.sect_pdf, 0) AS sect_pdf,
            COALESCE(cre.ft_other, 0) AS ft_other,
            COALESCE(cre.ft_other_mob, 0) AS ft_other_mob,
            COALESCE(cre.ft_oa, 0) AS ft_oa,
            COALESCE(cre.toc, 0) AS toc,
            COALESCE(cre.abs, 0) AS abs,
            COALESCE(cre.ref, 0) AS ref,
            COALESCE(cre.data_set, 0) AS data_set,
            COALESCE(cre.ct_1, 0) AS ct_1,
            COALESCE(cre.ct_2, 0) AS ct_2,
            COALESCE(cre.ct_3, 0) AS ct_3,
            COALESCE(cre.ct_4, 0) AS ct_4,
            COALESCE(cre.ct_5, 0) AS ct_5,
            COALESCE(cre.media_1, 0) AS media_1,
            COALESCE(cre.media_2, 0) AS media_2,
            COALESCE(cre.media_3, 0) AS media_3,
            COALESCE(cre.media_4, 0) AS media_4,
            COALESCE(cre.media_5, 0) AS media_5,
            COALESCE(cre.r_v, 0) AS r_v,
            COALESCE(cre.r_x_c, 0) AS r_x_c,
            COALESCE(cre.sr_reg, 0) AS sr_reg,
            COALESCE(cre.sr_fed, 0) AS sr_fed,
            COALESCE(cre.turnaway, 0) AS turnaway,
            COALESCE(cre.no_access, 0) AS no_access,
            COALESCE(cre.other, 0) AS other,
            COALESCE(cre.subtype, ''::character varying) AS sect_type,
            cr.cs_id,
            COALESCE(cm.mapped_cs_id, cr.cs_id) AS cs_dsp_name,
            COALESCE(cre.grp, ''::character varying) AS grp_id
           FROM cs_resource cr
             LEFT JOIN cs_mappings cm ON cm.raw_cs_id::text = cr.cs_id::text
             LEFT JOIN resource r_1 ON r_1.id = cr.resource_id
             LEFT JOIN pub_resource pr ON r_1.id = pr.id
             CROSS JOIN generate_series('2000-01-01
00:00:00+00'::timestamp with time zone, now(), '1 mon'::interval)
generate_series(generate_series)
             LEFT JOIN cs_resource_event cre
             ON cr.id = cre.cs_resource_id
             AND generate_series.generate_series = cre.rpt_month
             AND cre.art_id = 0
             AND cre.additional_grp_event <> true
        UNION
         SELECT r_1.data_type_id,
            COALESCE(r_1.title, ''::text) AS title,
            COALESCE(r_1.zpe, 0) AS zpe,
            COALESCE(r_1.publisher, ''::character varying) AS pub_name,
            rl.pl_id,
            COALESCE(r_1.xid, ''::character varying) AS xid,
            COALESCE(r_1.itnv, ''::character varying) AS itnv,
            COALESCE(r_1.prop_id, ''::character varying) AS prop_id,
            COALESCE(r_1.p1_identity, ''::character varying) AS p1_identity,
            COALESCE(r_1.p2_identity, ''::character varying) AS p2_identity,
            generate_series.generate_series AS rpt_month,
            0 AS ft_html,
            0 AS ft_html_mob,
            0 AS ft_pdf,
            0 AS ft_pdf_mob,
            0 AS ft_ps,
            0 AS ft_ps_mob,
            0 AS ft_epub,
            0 AS sect_html,
            0 AS sect_html_mob,
            0 AS sect_pdf,
            0 AS ft_other,
            0 AS ft_other_mob,
            0 AS ft_oa,
            0 AS toc,
            0 AS abs,
            0 AS ref,
            0 AS data_set,
            0 AS ct_1,
            0 AS ct_2,
            0 AS ct_3,
            0 AS ct_4,
            0 AS ct_5,
            0 AS media_1,
            0 AS media_2,
            0 AS media_3,
            0 AS media_4,
            0 AS media_5,
            0 AS r_v,
            0 AS r_x_c,
            0 AS sr_reg,
            0 AS sr_fed,
            0 AS turnaway,
            0 AS no_access,
            0 AS other,
            ''::character varying AS sect_type,
            '_zero_row_'::character varying AS cs_id,
            '_zero_row_'::character varying AS cs_dsp_name,
            '_zero_row_'::character varying AS grp_id
           FROM resource_list_entry r_1
             LEFT JOIN resource_lists rl ON r_1.list_code::text =
rl.list_code::text
             CROSS JOIN generate_series('2000-01-01
00:00:00+00'::timestamp with time zone, now(), '1 mon'::interval)
generate_series(generate_series)) r
     JOIN platform p ON p.id = r.pl_id
  WHERE r.data_type_id::text <> ALL (ARRAY['BK_ART'::character
varying::text, 'JNL_ART'::character varying::text]);


pgsql-general by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Is there a way to speed up WAL replay?
Next
From: Vijaykumar Jain
Date:
Subject: incomplete startup packet messages in logs