Re: Sequential scan instead of index scan - Mailing list pgsql-performance

From Ioannis Anagnostopoulos
Subject Re: Sequential scan instead of index scan
Date
Msg-id 502106CA.6090406@anatec.com
Whole thread Raw
In response to Re: Sequential scan instead of index scan  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Offhand I'd have thought that ANALYZE would gather stats on the
date_trunc expression (because it is indexed) and then you should get
something reasonably accurate for a comparison to a constant.
"Reasonably accurate" meaning "not off by two orders of magnitude".
Practically all of your runtime is going into this one indexscan,
and TBH it seems likely you'd be better off with a seqscan there.
		regards, tom lane
You were right, after running ANALYZE on the temp table I eventually got the HASH JOIN we were talking about. Here is the plan:

"Hash Join  (cost=379575.54..1507341.18 rows=95142 width=128) (actual time=3128.940..634179.270 rows=10495795 loops=1)"
"  Hash Cond: (feed_all_y2012m08.ship_pos_messages.msg_id = tmp_tbl_messages.msg_id)"
"  ->  Append  (cost=0.00..1073525.24 rows=95142 width=128) (actual time=37.157..599002.314 rows=18891614 loops=1)"
"        ->  Seq Scan on ship_pos_messages  (cost=0.00..0.00 rows=1 width=100) (actual time=0.001..0.001 rows=0 loops=1)"
"              Filter: ((date_part('day'::text, msg_date_rec) = 2::double precision) AND (date_trunc('day'::text, msg_date_rec) = '2012-08-02 00:00:00'::timestamp without time zone))"
"        ->  Seq Scan on ship_a_pos_messages ship_pos_messages  (cost=0.00..0.00 rows=1 width=100) (actual time=0.000..0.000 rows=0 loops=1)"
"              Filter: ((date_part('day'::text, msg_date_rec) = 2::double precision) AND (date_trunc('day'::text, msg_date_rec) = '2012-08-02 00:00:00'::timestamp without time zone))"
"        ->  Index Scan using idx_ship_b_std_pos_messages_date_trunc on ship_b_std_pos_messages ship_pos_messages  (cost=0.00..48111.95 rows=4323 width=128) (actual time=37.156..23782.030 rows=808692 loops=1)"
"              Index Cond: (date_trunc('day'::text, msg_date_rec) = '2012-08-02 00:00:00'::timestamp without time zone)"
"              Filter: (date_part('day'::text, msg_date_rec) = 2::double precision)"
"        ->  Index Scan using idx_ship_b_ext_pos_messages_date_trunc on ship_b_ext_pos_messages ship_pos_messages  (cost=0.00..1844.30 rows=154 width=128) (actual time=42.042..1270.104 rows=28656 loops=1)"
"              Index Cond: (date_trunc('day'::text, msg_date_rec) = '2012-08-02 00:00:00'::timestamp without time zone)"
"              Filter: (date_part('day'::text, msg_date_rec) = 2::double precision)"
"        ->  Index Scan using idx_ship_a_pos_messages_wk0_date_trunc on ship_a_pos_messages_wk0 ship_pos_messages  (cost=0.00..1023568.99 rows=90663 width=128) (actual time=51.181..571590.415 rows=18054266 loops=1)"
"              Index Cond: (date_trunc('day'::text, msg_date_rec) = '2012-08-02 00:00:00'::timestamp without time zone)"
"              Filter: (date_part('day'::text, msg_date_rec) = 2::double precision)"
"  ->  Hash  (cost=177590.46..177590.46 rows=12311446 width=8) (actual time=3082.762..3082.762 rows=12311446 loops=1)"
"        Buckets: 524288  Batches: 4  Memory Usage: 120316kB"
"        ->  Seq Scan on tmp_tbl_messages  (cost=0.00..177590.46 rows=12311446 width=8) (actual time=0.022..1181.376 rows=12311446 loops=1)"
"Total runtime: 634764.596 ms
"

The time looks reasonable but still quite high for the over night job I am need it for (have to run around 30 of those). So since the join has
been shorted I think I need to do something with the rows difference between actual and expected in the:

"        ->  Index Scan using idx_ship_a_pos_messages_wk0_date_trunc on ship_a_pos_messages_wk0 ship_pos_messages  (cost=0.00..1023568.99 rows=90663 width=128) (actual time=51.181..571590.415 rows=18054266 loops=1)"
"              Index Cond: (date_trunc('day'::text, msg_date_rec) = '2012-08-02 00:00:00'::timestamp without time zone)"
"              Filter: (date_part('day'::text, msg_date_rec) = 2::double precision)"

From what I understand a possible solution is to increase the stats target for the particular column(?). Any suggestion there? I assume we are talking about the
msg_date_rec where the index is build uppon.
Finally, I do understand what you say about the Seq scan. However in this case I have consistently about 10min per execution while the SeqScan was giving me almost nothing at best and usually it was running for so long that
eventually was causing my server problems...

Kind Regards
Yiannis


pgsql-performance by date:

Previous
From: "Tomas Vondra"
Date:
Subject: Re: Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m
Next
From: Stefan Keller
Date:
Subject: Re: Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m