Working around, or with, bitmap heap scan? - Mailing list pgsql-general
From | James A. Robinson |
---|---|
Subject | Working around, or with, bitmap heap scan? |
Date | |
Msg-id | CAPd04b7yzZH64JKUt8w51kFh+Ze2rxzSc50ghhqKAGR=vsgnXA@mail.gmail.com Whole thread Raw |
Responses |
Re: Working around, or with, bitmap heap scan?
|
List | pgsql-general |
Hello, I'm newly exposed to a Postgres 9.4 database system, and am trying to understand how I might optimize a query that is taking a long time to return. What I'm observing is an uncached query that takes much much longer to complete, sometimes minutes longer, when enable_bitmapscan is true. Even after the query result is cached, it seems to consistently be 300 ms slower when enable_bitmapscan is true (so taking 900 ms to return instead of 600 ms). This is on an AWS RDS Postgres 9.4 instance, backed by SSD, with work_mem set to 1 gb (I'm not sure if that is a default or if someone set it to that value on purpose). Autovacuum is on and an ANALYSIS has been run against the db. I'm hoping for some guidance from experts here on whether or not I might be able to change something to get the query plan selection to use the faster index scan that appears to be running when the enable_bitmapscan is false. Or perhaps I should be trying to craft a new index to reduce the chance that it needs the bitmap? I tried seeing whether or not reducing the random_page_cost from the default 1.1 to 1.0 effected any change, and it does not. Here's the part of the EXPLAIN that appears to change when enable_bitmapscan is true: -> Hash Left Join (cost=24696.39..594784.63 rows=11732 width=281) Hash Cond: ((cr.cs_id)::text = (cm.raw_cs_id)::text) -> Hash Join (cost=24680.76..594368.09 rows=11732 width=135) Hash Cond: (cr.resource_id = r_1.id) -> Bitmap Heap Scan on cs_resource cr (cost=6999.99..569213.13 rows=588549 width=27) Recheck Cond: ((cs_id)::text = ANY ('{example.org,_zero_row_}'::text[])) -> Bitmap Index Scan on cs_resource_by_cs_idx (cost=0.00..6852.86 rows=588549 width=0) Index Cond: ((cs_id)::text = ANY ('{example.org,_zero_row_}'::text[])) -> Hash (cost=17576.61..17576.61 rows=8333 width=124) -> Hash Right Join (cost=4132.63..17576.61 rows=8333 width=124) Hash Cond: (pr.id = r_1.id) -> Seq Scan on pub_resource pr (cost=0.00..8135.29 rows=418029 width=69) -> Hash (cost=4028.47..4028.47 rows=8333 width=63) -> Index Scan using "idx_1" on resource r_1 (cost=0.42..4028.47 rows=8333 width=63) Index Cond: ((pl_id = 6) AND ((data_type_id)::text = 'DATABASE'::text)) -> Hash (cost=12.50..12.50 rows=250 width=292) -> Seq Scan on cs_mappings cm (cost=0.00..12.50 rows=250 width=292) compared to when enable_bitmapscan is false: -> Hash Left Join (cost=17509.48..648717.08 rows=11732 width=281) Hash Cond: (r_1.id = pr.id) -> Hash Left Join (cost=4148.83..635121.79 rows=11732 width=228) Hash Cond: ((cr.cs_id)::text = (cm.raw_cs_id)::text) -> Hash Join (cost=4133.20..634705.25 rows=11732 width=82) Hash Cond: (cr.resource_id = r_1.id) -> Index Scan using "idx_2" on cs_resource cr (cost=0.57..623098.44 rows=588549 width=27) Index Cond: ((cs_id)::text = ANY ('{example.org,_zero_row_}'::text[])) -> Hash (cost=4028.47..4028.47 rows=8333 width=63) -> Index Scan using "idx_1" on resource r_1 (cost=0.42..4028.47 rows=8333 width=63) Index Cond: ((pl_id = 6) AND ((data_type_id)::text = 'DATABASE'::text)) -> Hash (cost=12.50..12.50 rows=250 width=292) -> Seq Scan on cs_mappings cm (cost=0.00..12.50 rows=250 width=292) -> Hash (cost=8135.29..8135.29 rows=418029 width=69) -> Seq Scan on pub_resource pr (cost=0.00..8135.29 rows=418029 width=69) I'm not sure if the following details will be important to know, but opting for more detail, here's an example of the query (which returns 183 rows): > EXPLAIN SELECT data_type_id, title, pub_name, pl_name, pl_id, rpt_month, cs_dsp_name, cs_id, grp_id , zpe, sect_type, sr_fed, sr_reg, r_v, r_x_c, prop_id FROM report_view WHERE pl_id IN (6) AND data_type_id = 'DATABASE' AND rpt_month between '2018-06-01' and '2018-06-30' AND cs_id IN ( 'example.org' , '_zero_row_') ORDER BY data_type_id ASC, CAST(title AS VARCHAR(256)) ASC, rpt_month ASC; The underlying report_view that backs this query is... well, I'm honestly having a very hard time wrapping my head around it (w/ original developer long gone). I'll also note the tables it hits are massive, e.g., 265 gb for the cs_resource_event table and 65 gb for the cs_resource table. > \d+ report_view View "public.report_view" Column | Type | Modifiers | Storage | Description -----------------------+--------------------------+-----------+----------+------------- data_type_id | character varying(32) | | extended | title | text | | extended | zpe | integer | | plain | pub_name | character varying | | extended | pl_id | bigint | | plain | xid | character varying | | extended | itnv | character varying | | extended | prop_id | character varying | | extended | p1_identity | character varying | | extended | p2_identity | character varying | | extended | rpt_month | timestamp with time zone | | plain | ft_html | integer | | plain | ft_html_mob | integer | | plain | ft_pdf | integer | | plain | ft_pdf_mob | integer | | plain | ft_ps | integer | | plain | ft_ps_mob | integer | | plain | ft_epub | integer | | plain | sect_html | integer | | plain | sect_html_mob | integer | | plain | sect_pdf | integer | | plain | ft_other | integer | | plain | ft_other_mob | integer | | plain | ft_oa | integer | | plain | toc | integer | | plain | abs | integer | | plain | ref | integer | | plain | data_set | integer | | plain | ct_1 | integer | | plain | ct_2 | integer | | plain | ct_3 | integer | | plain | ct_4 | integer | | plain | ct_5 | integer | | plain | media_1 | integer | | plain | media_2 | integer | | plain | media_3 | integer | | plain | media_4 | integer | | plain | media_5 | integer | | plain | r_v | integer | | plain | r_x_c | integer | | plain | sr_reg | integer | | plain | sr_fed | integer | | plain | turnaway | integer | | plain | no_access | integer | | plain | other | integer | | plain | sect_type | character varying | | extended | cs_id | character varying | | extended | cs_dsp_name | character varying | | extended | grp_id | character varying | | extended | pl_name | character varying(128) | | extended | View definition: SELECT r.data_type_id, r.title, r.zpe, r.pub_name, r.pl_id, r.xid, r.itnv, r.prop_id, r.p1_identity, r.p2_identity, r.rpt_month, r.ft_html, r.ft_html_mob, r.ft_pdf, r.ft_pdf_mob, r.ft_ps, r.ft_ps_mob, r.ft_epub, r.sect_html, r.sect_html_mob, r.sect_pdf, r.ft_other, r.ft_other_mob, r.ft_oa, r.toc, r.abs, r.ref, r.data_set, r.ct_1, r.ct_2, r.ct_3, r.ct_4, r.ct_5, r.media_1, r.media_2, r.media_3, r.media_4, r.media_5, r.r_v, r.r_x_c, r.sr_reg, r.sr_fed, r.turnaway, r.no_access, r.other, r.sect_type, r.cs_id, r.cs_dsp_name, r.grp_id, p.pl_name FROM ( SELECT r_1.data_type_id, COALESCE(r_1.title, ''::text) AS title, COALESCE(pr.zpe, 0) AS zpe, COALESCE(pr.publisher, ''::character varying) AS pub_name, r_1.pl_id, COALESCE(pr.xid, ''::character varying) AS xid, COALESCE(pr.itnv, ''::character varying) AS itnv, COALESCE(r_1.prop_id, ''::character varying) AS prop_id, COALESCE(pr.p1_identity, ''::character varying) AS p1_identity, COALESCE(pr.p2_identity, ''::character varying) AS p2_identity, generate_series.generate_series AS rpt_month, COALESCE(cre.ft_html, 0) AS ft_html, COALESCE(cre.ft_html_mob, 0) AS ft_html_mob, COALESCE(cre.ft_pdf, 0) AS ft_pdf, COALESCE(cre.ft_pdf_mob, 0) AS ft_pdf_mob, COALESCE(cre.ft_ps, 0) AS ft_ps, COALESCE(cre.ft_ps_mob, 0) AS ft_ps_mob, COALESCE(cre.ft_epub, 0) AS ft_epub, COALESCE(cre.sect_html, 0) AS sect_html, COALESCE(cre.sect_html_mob, 0) AS sect_html_mob, COALESCE(cre.sect_pdf, 0) AS sect_pdf, COALESCE(cre.ft_other, 0) AS ft_other, COALESCE(cre.ft_other_mob, 0) AS ft_other_mob, COALESCE(cre.ft_oa, 0) AS ft_oa, COALESCE(cre.toc, 0) AS toc, COALESCE(cre.abs, 0) AS abs, COALESCE(cre.ref, 0) AS ref, COALESCE(cre.data_set, 0) AS data_set, COALESCE(cre.ct_1, 0) AS ct_1, COALESCE(cre.ct_2, 0) AS ct_2, COALESCE(cre.ct_3, 0) AS ct_3, COALESCE(cre.ct_4, 0) AS ct_4, COALESCE(cre.ct_5, 0) AS ct_5, COALESCE(cre.media_1, 0) AS media_1, COALESCE(cre.media_2, 0) AS media_2, COALESCE(cre.media_3, 0) AS media_3, COALESCE(cre.media_4, 0) AS media_4, COALESCE(cre.media_5, 0) AS media_5, COALESCE(cre.r_v, 0) AS r_v, COALESCE(cre.r_x_c, 0) AS r_x_c, COALESCE(cre.sr_reg, 0) AS sr_reg, COALESCE(cre.sr_fed, 0) AS sr_fed, COALESCE(cre.turnaway, 0) AS turnaway, COALESCE(cre.no_access, 0) AS no_access, COALESCE(cre.other, 0) AS other, COALESCE(cre.subtype, ''::character varying) AS sect_type, cr.cs_id, COALESCE(cm.mapped_cs_id, cr.cs_id) AS cs_dsp_name, COALESCE(cre.grp, ''::character varying) AS grp_id FROM cs_resource cr LEFT JOIN cs_mappings cm ON cm.raw_cs_id::text = cr.cs_id::text LEFT JOIN resource r_1 ON r_1.id = cr.resource_id LEFT JOIN pub_resource pr ON r_1.id = pr.id CROSS JOIN generate_series('2000-01-01 00:00:00+00'::timestamp with time zone, now(), '1 mon'::interval) generate_series(generate_series) LEFT JOIN cs_resource_event cre ON cr.id = cre.cs_resource_id AND generate_series.generate_series = cre.rpt_month AND cre.art_id = 0 AND cre.additional_grp_event <> true UNION SELECT r_1.data_type_id, COALESCE(r_1.title, ''::text) AS title, COALESCE(r_1.zpe, 0) AS zpe, COALESCE(r_1.publisher, ''::character varying) AS pub_name, rl.pl_id, COALESCE(r_1.xid, ''::character varying) AS xid, COALESCE(r_1.itnv, ''::character varying) AS itnv, COALESCE(r_1.prop_id, ''::character varying) AS prop_id, COALESCE(r_1.p1_identity, ''::character varying) AS p1_identity, COALESCE(r_1.p2_identity, ''::character varying) AS p2_identity, generate_series.generate_series AS rpt_month, 0 AS ft_html, 0 AS ft_html_mob, 0 AS ft_pdf, 0 AS ft_pdf_mob, 0 AS ft_ps, 0 AS ft_ps_mob, 0 AS ft_epub, 0 AS sect_html, 0 AS sect_html_mob, 0 AS sect_pdf, 0 AS ft_other, 0 AS ft_other_mob, 0 AS ft_oa, 0 AS toc, 0 AS abs, 0 AS ref, 0 AS data_set, 0 AS ct_1, 0 AS ct_2, 0 AS ct_3, 0 AS ct_4, 0 AS ct_5, 0 AS media_1, 0 AS media_2, 0 AS media_3, 0 AS media_4, 0 AS media_5, 0 AS r_v, 0 AS r_x_c, 0 AS sr_reg, 0 AS sr_fed, 0 AS turnaway, 0 AS no_access, 0 AS other, ''::character varying AS sect_type, '_zero_row_'::character varying AS cs_id, '_zero_row_'::character varying AS cs_dsp_name, '_zero_row_'::character varying AS grp_id FROM resource_list_entry r_1 LEFT JOIN resource_lists rl ON r_1.list_code::text = rl.list_code::text CROSS JOIN generate_series('2000-01-01 00:00:00+00'::timestamp with time zone, now(), '1 mon'::interval) generate_series(generate_series)) r JOIN platform p ON p.id = r.pl_id WHERE r.data_type_id::text <> ALL (ARRAY['BK_ART'::character varying::text, 'JNL_ART'::character varying::text]);
pgsql-general by date: