Execution plan does not use index - Mailing list pgsql-general
From | Peter Coppens |
---|---|
Subject | Execution plan does not use index |
Date | |
Msg-id | CAEkf4WBnCDcqQXfw4z-TpTYucBE=AEHEQjLd3QvOtMrPrrK6fA@mail.gmail.com Whole thread Raw |
Responses |
Re: Execution plan does not use index
|
List | pgsql-general |
Hello,
Consider the following PostgreSQL 9.6.18 tables
- measurement_value: time series table with a unique key on (device_id,timestamp) columns and a number of columns with measurements. Table contains a large number of rows (>150million)
- measurement_value: time series table with a unique key on (device_id,timestamp) columns and a number of columns with measurements. Table contains a large number of rows (>150million)
- device table: with device properties (short_id joins to device_id in measurement_value table)
- device_configuration table: extra device properties, including a timezone
This query works fine and quick (returns ±320K rows in 3.2 seconds when explain is removed)
explain
select d.short_id,mv.timestamp,mv.I64_01
from device d, device_configuration dc, measurement_value mv, pg_timezone_names tzn
where mv.device_id=d.short_id and dc.device_id = d.id and dc.latest=true and dc.timezone=tzn.name and
mv.timestamp > '2020-11-06'::timestamp and mv.timestamp < '2020-11-07'::timestamp
select d.short_id,mv.timestamp,mv.I64_01
from device d, device_configuration dc, measurement_value mv, pg_timezone_names tzn
where mv.device_id=d.short_id and dc.device_id = d.id and dc.latest=true and dc.timezone=tzn.name and
mv.timestamp > '2020-11-06'::timestamp and mv.timestamp < '2020-11-07'::timestamp
==>
Hash Join (cost=23.63..2156655.51 rows=1454320 width=20)
Hash Cond: (dc.timezone = pg_timezone_names.name)
-> Nested Loop (cost=1.13..2089933.38 rows=409070 width=34)
-> Nested Loop (cost=0.56..7497.34 rows=615 width=18)
-> Index Scan using device_short_id_key on device d (cost=0.28..2423.90 rows=683 width=20)
-> Index Scan using device_configuration_device_latest_idx on device_configuration dc (cost=0.28..7.42 rows=1 width=30)
Index Cond: ((device_id = d.id) AND (latest = true))
Filter: latest
-> Index Scan using measurement_values_pkey on measurement_value mv (cost=0.57..3375.60 rows=1047 width=20)
Index Cond: ((device_id = d.short_id) AND ("timestamp" > '2020-11-06 00:00:00'::timestamp without time zone) AND ("timestamp" < '2020-11-07 00:00:00'::timestamp without time zone))
-> Hash (cost=10.00..10.00 rows=1000 width=32)
-> Function Scan on pg_timezone_names (cost=0.00..10.00 rows=1000 width=32)
-> Nested Loop (cost=1.13..2089933.38 rows=409070 width=34)
-> Nested Loop (cost=0.56..7497.34 rows=615 width=18)
-> Index Scan using device_short_id_key on device d (cost=0.28..2423.90 rows=683 width=20)
-> Index Scan using device_configuration_device_latest_idx on device_configuration dc (cost=0.28..7.42 rows=1 width=30)
Index Cond: ((device_id = d.id) AND (latest = true))
Filter: latest
-> Index Scan using measurement_values_pkey on measurement_value mv (cost=0.57..3375.60 rows=1047 width=20)
Index Cond: ((device_id = d.short_id) AND ("timestamp" > '2020-11-06 00:00:00'::timestamp without time zone) AND ("timestamp" < '2020-11-07 00:00:00'::timestamp without time zone))
-> Hash (cost=10.00..10.00 rows=1000 width=32)
-> Function Scan on pg_timezone_names (cost=0.00..10.00 rows=1000 width=32)
When adding the timezone offset to the where clause, the query becomes extremely slow due to a full table scan of the measurement_value table (±320K rows in 11 minutes)
(See -> Seq Scan on measurement_value mv (cost=0.00..6308233.77 rows=169617977 width=20))
explain
select d.short_id,mv.timestamp,mv.I64_01
from device d, device_configuration dc, measurement_value mv, pg_timezone_names tzn
where mv.device_id=d.short_id and dc.device_id = d.id and dc.latest=true and dc.timezone=tzn.name and
mv.timestamp > '2020-11-06'::timestamp - tzn.utc_offset and mv.timestamp < '2020-11-07'::timestamp - tzn.utc_offset
from device d, device_configuration dc, measurement_value mv, pg_timezone_names tzn
where mv.device_id=d.short_id and dc.device_id = d.id and dc.latest=true and dc.timezone=tzn.name and
mv.timestamp > '2020-11-06'::timestamp - tzn.utc_offset and mv.timestamp < '2020-11-07'::timestamp - tzn.utc_offset
==>
Hash Join (cost=683.93..18226567.32 rows=60331762 width=20)
Hash Cond: (mv.device_id = d.short_id)
Join Filter: ((mv."timestamp" > ('2020-11-06 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)) AND (mv."timestamp" < ('2020-11-07 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)))
-> Seq Scan on measurement_value mv (cost=0.00..6308233.77 rows=169617977 width=20)
-> Hash (cost=656.61..656.61 rows=2186 width=20)
-> Hash Join (cost=77.87..656.61 rows=2186 width=20)
Hash Cond: (dc.timezone = pg_timezone_names.name)
-> Hash Join (cost=55.37..533.83 rows=615 width=18)
Hash Cond: (dc.device_id = d.id)
-> Seq Scan on device_configuration dc (cost=0.00..470.01 rows=615 width=30)
Filter: latest
-> Hash (cost=46.83..46.83 rows=683 width=20)
-> Seq Scan on device d (cost=0.00..46.83 rows=683 width=20)
-> Hash (cost=10.00..10.00 rows=1000 width=48)
-> Function Scan on pg_timezone_names (cost=0.00..10.00 rows=1000 width=48)
Join Filter: ((mv."timestamp" > ('2020-11-06 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)) AND (mv."timestamp" < ('2020-11-07 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)))
-> Seq Scan on measurement_value mv (cost=0.00..6308233.77 rows=169617977 width=20)
-> Hash (cost=656.61..656.61 rows=2186 width=20)
-> Hash Join (cost=77.87..656.61 rows=2186 width=20)
Hash Cond: (dc.timezone = pg_timezone_names.name)
-> Hash Join (cost=55.37..533.83 rows=615 width=18)
Hash Cond: (dc.device_id = d.id)
-> Seq Scan on device_configuration dc (cost=0.00..470.01 rows=615 width=30)
Filter: latest
-> Hash (cost=46.83..46.83 rows=683 width=20)
-> Seq Scan on device d (cost=0.00..46.83 rows=683 width=20)
-> Hash (cost=10.00..10.00 rows=1000 width=48)
-> Function Scan on pg_timezone_names (cost=0.00..10.00 rows=1000 width=48)
Adding the tzn.utc_offset results in the fact that the execution plan no longer considers to use the index on the measurement_value table. Is there any way the SQL can be rewritten so that the index is used? Or any other solution so that the query with the timezone offset returns in a comparable time?
Many thanks!
Peter
Many thanks!
Peter
pgsql-general by date: