Thread: Execution plan does not use index
Hello,
Consider the following PostgreSQL 9.6.18 tables:
- measurement_value: time-series table with a unique key on the (device_id, timestamp) columns and a number of columns with measurements. The table contains a large number of rows (>150 million)
- device: table with device properties (its short_id joins to device_id in the measurement_value table)
- device_configuration table: extra device properties, including a timezone
This query works fine and is quick (it returns ±320K rows in 3.2 seconds when the explain is removed):
explain
select d.short_id,mv.timestamp,mv.I64_01
from device d, device_configuration dc, measurement_value mv, pg_timezone_names tzn
where mv.device_id=d.short_id and dc.device_id = d.id and dc.latest=true and dc.timezone=tzn.name and
mv.timestamp > '2020-11-06'::timestamp and mv.timestamp < '2020-11-07'::timestamp
==>
Hash Join (cost=23.63..2156655.51 rows=1454320 width=20)
Hash Cond: (dc.timezone = pg_timezone_names.name)
-> Nested Loop (cost=1.13..2089933.38 rows=409070 width=34)
-> Nested Loop (cost=0.56..7497.34 rows=615 width=18)
-> Index Scan using device_short_id_key on device d (cost=0.28..2423.90 rows=683 width=20)
-> Index Scan using device_configuration_device_latest_idx on device_configuration dc (cost=0.28..7.42 rows=1 width=30)
Index Cond: ((device_id = d.id) AND (latest = true))
Filter: latest
-> Index Scan using measurement_values_pkey on measurement_value mv (cost=0.57..3375.60 rows=1047 width=20)
Index Cond: ((device_id = d.short_id) AND ("timestamp" > '2020-11-06 00:00:00'::timestamp without time zone) AND ("timestamp" < '2020-11-07 00:00:00'::timestamp without time zone))
-> Hash (cost=10.00..10.00 rows=1000 width=32)
-> Function Scan on pg_timezone_names (cost=0.00..10.00 rows=1000 width=32)
When the timezone offset is added to the where clause, the query becomes extremely slow because of a full table scan of the measurement_value table (±320K rows in 11 minutes).
(See -> Seq Scan on measurement_value mv (cost=0.00..6308233.77 rows=169617977 width=20))
explain
select d.short_id,mv.timestamp,mv.I64_01
from device d, device_configuration dc, measurement_value mv, pg_timezone_names tzn
where mv.device_id=d.short_id and dc.device_id = d.id and dc.latest=true and dc.timezone=tzn.name and
mv.timestamp > '2020-11-06'::timestamp - tzn.utc_offset and mv.timestamp < '2020-11-07'::timestamp - tzn.utc_offset
==>
Hash Join (cost=683.93..18226567.32 rows=60331762 width=20)
Hash Cond: (mv.device_id = d.short_id)
Join Filter: ((mv."timestamp" > ('2020-11-06 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)) AND (mv."timestamp" < ('2020-11-07 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)))
-> Seq Scan on measurement_value mv (cost=0.00..6308233.77 rows=169617977 width=20)
-> Hash (cost=656.61..656.61 rows=2186 width=20)
-> Hash Join (cost=77.87..656.61 rows=2186 width=20)
Hash Cond: (dc.timezone = pg_timezone_names.name)
-> Hash Join (cost=55.37..533.83 rows=615 width=18)
Hash Cond: (dc.device_id = d.id)
-> Seq Scan on device_configuration dc (cost=0.00..470.01 rows=615 width=30)
Filter: latest
-> Hash (cost=46.83..46.83 rows=683 width=20)
-> Seq Scan on device d (cost=0.00..46.83 rows=683 width=20)
-> Hash (cost=10.00..10.00 rows=1000 width=48)
-> Function Scan on pg_timezone_names (cost=0.00..10.00 rows=1000 width=48)
Adding tzn.utc_offset means the execution plan no longer considers using the index on the measurement_value table. Is there any way the SQL can be rewritten so that the index is used? Or any other solution so that the query with the timezone offset returns in a comparable time?
Many thanks!
Peter
On Mon, Nov 9, 2020 at 1:11 PM Peter Coppens <peter.coppens@datylon.com> wrote:
Adding tzn.utc_offset means the execution plan no longer considers using the index on the measurement_value table. Is there any way the SQL can be rewritten so that the index is used? Or any other solution so that the query with the timezone offset returns in a comparable time?
I am not aware of a best practice for handling this. Your where condition on mv.timestamp now depends on several joins to do filtering that used to be a static range that could be scanned in the index as a first node in the plan. I have sometimes used a sub-query on a broader condition that allows the use of the index, and then fully reduced the set later. Something like this:
select d.short_id,mv.timestamp,mv.I64_01
from device d, device_configuration dc, (
    select mv_inner.*
    from measurement_value AS mv_inner
    where mv_inner.timestamp > '2020-11-06'::timestamp - interval '1 day'
      and mv_inner.timestamp < '2020-11-07'::timestamp + interval '1 day'
    offset 0 /* to prevent in-lining the join to the outside set */
) mv, pg_timezone_names tzn
where mv.device_id=d.short_id and dc.device_id = d.id and dc.latest=true and dc.timezone=tzn.name and
mv.timestamp > '2020-11-06'::timestamp - tzn.utc_offset and mv.timestamp < '2020-11-07'::timestamp - tzn.utc_offset
By the way, it seems a little odd to be exclusive on both the begin and end. I'd usually expect timestamp >= start_date and timestamp < end_date + interval '1 day' to fully capture a 24 hour period. Right now, you are excluding any data that happens to have a timestamp value with .000000 seconds (midnight exactly).
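As a sketch against this thread's own tables (the half-open form of the same range; illustrative only):
select d.short_id, mv.timestamp, mv.I64_01
from device d, device_configuration dc, measurement_value mv, pg_timezone_names tzn
where mv.device_id = d.short_id and dc.device_id = d.id and dc.latest = true and dc.timezone = tzn.name
-- half-open range: captures the full 24 hours, including rows stamped at midnight exactly
and mv.timestamp >= '2020-11-06'::timestamp
and mv.timestamp < '2020-11-06'::timestamp + interval '1 day'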
Michael
Michael,
Many thanks for spending your time on this. Unfortunately your alternative does not help (see the execution plan below).
It is still a sequential scan over the complete table. I have tried many alternatives, and somehow whenever I add a column that is not in the index (I64_01), the optimizer decides not to use the index. If I remove that column, the index is used. I guess it estimates that the extra indirection from index pages to the row pages is more costly than scanning the 168M records. I am pretty sure it is not, but I cannot explain that to the stubborn thing :)
Btw, thanks for the >= tip (I was aware of it)
Wkr,
Peter
Hash Join (cost=683.93..7270857.46 rows=458127 width=20)
Hash Cond: (mv_inner.device_id = d.short_id)
Join Filter: ((mv_inner."timestamp" > ('2020-11-06 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)) AND (mv_inner."timestamp" < ('2020-11-07 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)))
-> Seq Scan on measurement_value mv_inner (cost=0.00..7166797.33 rows=1287989 width=1006)
Filter: (("timestamp" > '2020-11-05 00:00:00'::timestamp without time zone) AND ("timestamp" < '2020-11-08 00:00:00'::timestamp without time zone))
-> Hash (cost=656.61..656.61 rows=2186 width=20)
-> Hash Join (cost=77.87..656.61 rows=2186 width=20)
Hash Cond: (dc.timezone = pg_timezone_names.name)
-> Hash Join (cost=55.37..533.83 rows=615 width=18)
Hash Cond: (dc.device_id = d.id)
-> Seq Scan on device_configuration dc (cost=0.00..470.01 rows=615 width=30)
Filter: latest
-> Hash (cost=46.83..46.83 rows=683 width=20)
-> Seq Scan on device d (cost=0.00..46.83 rows=683 width=20)
-> Hash (cost=10.00..10.00 rows=1000 width=48)
-> Function Scan on pg_timezone_names (cost=0.00..10.00 rows=1000 width=48)
On Tue, Nov 10, 2020 at 8:18 AM Peter Coppens <peter.coppens@datylon.com> wrote:
When you see a cast in a filter, you should check type equality in the conditions. With some exceptions, Postgres uses an index only when the filtered value has the same type as the column.
Maybe there is an inconsistency between timestamp with time zone and timestamp without time zone.
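A minimal sketch of that kind of mismatch (illustrative literals, not taken from this thread):
-- "timestamp" is a timestamp without time zone column; comparing it to a
-- timestamptz value casts the column and defeats an index on it
select * from measurement_value where "timestamp" > '2020-11-06 00:00:00+00'::timestamptz;
-- comparing against a matching timestamp value leaves the index usable
select * from measurement_value where "timestamp" > '2020-11-06'::timestamp;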
Regards
Pavel
Pavel,
Tx for the tip. But given that the index is used when the I64_01 column is removed from the select list, I guess the cast is not likely to be the cause. Like so:
explain
select d.short_id,mv.timestamp --,mv.I64_01
from device d, device_configuration dc, measurement_value mv, pg_timezone_names tzn
where mv.device_id=d.short_id and dc.device_id = d.id and dc.latest=true and dc.timezone=tzn.name and
mv.timestamp > '2020-11-06'::timestamp - tzn.utc_offset and mv.timestamp < '2020-11-07'::timestamp - tzn.utc_offset
==>
Nested Loop (cost=1.13..6217004.08 rows=60425437 width=12)
-> Nested Loop (cost=0.56..21334.84 rows=2186 width=20)
Join Filter: (dc.timezone = pg_timezone_names.name)
-> Nested Loop (cost=0.56..7497.34 rows=615 width=18)
-> Index Scan using device_short_id_key on device d (cost=0.28..2423.90 rows=683 width=20)
-> Index Scan using device_configuration_device_latest_idx on device_configuration dc (cost=0.28..7.42 rows=1 width=30)
Index Cond: ((device_id = d.id) AND (latest = true))
Filter: latest
-> Function Scan on pg_timezone_names (cost=0.00..10.00 rows=1000 width=48)
-> Index Only Scan using measurement_values_pkey on measurement_value mv (cost=0.57..2399.33 rows=43492 width=12)
Index Cond: ((device_id = d.short_id) AND ("timestamp" > ('2020-11-06 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)) AND ("timestamp" < ('2020-11-07 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)))
Peter
Triggered by Michael mentioning subqueries, I ended up trying:
explain
select d.short_id,mv.timestamp ,mv.I64_01
from device d, device_configuration dc, measurement_value mv
where mv.device_id=d.short_id and dc.device_id = d.id and dc.latest=true and
mv.timestamp > '2020-11-06'::timestamp - (select tzn.utc_offset from pg_timezone_names tzn where tzn.name=dc.timezone) and
mv.timestamp < '2020-11-07'::timestamp - (select tzn.utc_offset from pg_timezone_names tzn where tzn.name=dc.timezone)
==>
Nested Loop (cost=25.85..84540074.64 rows=16996885 width=20)
-> Seq Scan on device_configuration dc (cost=0.00..470.01 rows=615 width=30)
Filter: latest
-> Nested Loop (cost=25.85..137027.83 rows=43494 width=36)
-> Index Scan using device_pkey on device d (cost=0.28..7.23 rows=1 width=20)
Index Cond: (id = dc.device_id)
-> Index Scan using measurement_values_pkey on measurement_value mv (cost=25.58..136585.66 rows=43494 width=20)
Index Cond: ((device_id = d.short_id) AND ("timestamp" > ('2020-11-06 00:00:00'::timestamp without time zone - (SubPlan 1))) AND ("timestamp" < ('2020-11-07 00:00:00'::timestamp without time zone - (SubPlan 2))))
SubPlan 1
-> Function Scan on pg_timezone_names (cost=0.00..12.50 rows=5 width=16)
Filter: (name = dc.timezone)
SubPlan 2
-> Function Scan on pg_timezone_names pg_timezone_names_1 (cost=0.00..12.50 rows=5 width=16)
Filter: (name = dc.timezone)
Now returns the 320K rows in less than 5 seconds.
Until now I was convinced that correlated subqueries and joins are equivalent. I guess I was wrong :). I wonder how stable this plan will be, though.
Peter
On Tue, Nov 10, 2020 at 1:25 AM Peter Coppens <peter.coppens@datylon.com> wrote:
Curious, what is seq_page_cost and random_page_cost? Any idea of your cache hits for indexes? If they are very high and/or you have SSD or similar fast storage, then maybe random_page_cost should be 1.1-2 and not default 4 (assuming seq_page_cost is still 1). The planner will be more likely to use an index scan if the expected cost for scanning an index (random) is closer to a sequential read. Sorry if this explanation is completely superfluous and you have already configured this.
It would be interesting to see explain (analyze, buffers) output so we can see the actual counts for these nodes. I'm rather surprised that the query I provided didn't use the timestamp index, unless the inclusion of 3 days' worth of range meant that it was estimated to be too high a fraction of the table. If you just execute only the subquery, is the index used by chance?
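For instance, on the subquery alone that would be something like (reusing the broadened predicate from above):
explain (analyze, buffers)
select mv_inner.*
from measurement_value AS mv_inner
where mv_inner.timestamp > '2020-11-06'::timestamp - interval '1 day'
and mv_inner.timestamp < '2020-11-07'::timestamp + interval '1 day'
offset 0;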
Top posting (reply with all previous copied below) is discouraged on these lists. I think because it makes the archives rather messy.
Curious, what is seq_page_cost and random_page_cost?
show seq_page_cost ==> 1
show random_page_cost ==> 4
Any idea of your cache hits for indexes?
No, I am afraid not. It’s been a long time since I went that deep into RDBMS behaviour (must have been Oracle 7.something :) )
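For what it is worth, a rough way to check, using the standard statistics views, would be something like:
-- rough index cache hit ratio across all user indexes (illustrative query;
-- per-index numbers are available in the same view)
select sum(idx_blks_hit)::float / nullif(sum(idx_blks_hit + idx_blks_read), 0) as index_hit_ratio
from pg_statio_user_indexes;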
If they are very high and/or you have SSD or similar fast storage, then maybe random_page_cost should be 1.1-2 and not default 4 (assuming seq_page_cost is still 1). The planner will be more likely to use an index scan if the expected cost for scanning an index (random) is closer to a sequential read. Sorry if this explanation is completely superfluous and you have already configured this.
I played around with some of the settings (also with setting enable_seqscan to false). While some of the plans no longer used a full table scan, the performance did not really work out when I tried them. To be honest, I never dug deep enough into those plans to find out why, as I am not too keen on changing these settings in the first place.
It would be interesting to see explain (analyze, buffers) output so we can see the actual counts for these nodes. I'm rather surprised that the query I provided didn't use the timestamp index, unless the inclusion of 3 days' worth of range meant that it was estimated to be too high a fraction of the table. If you just execute only the subquery, is the index used by chance?
The index is not used for the subquery:
explain
select mv_inner.*
from measurement_value AS mv_inner
where mv_inner.timestamp > '2020-11-06'::timestamp - interval '1 day' and mv_inner.timestamp < '2020-11-07'::timestamp + interval '1 day'
offset 0
==>
Seq Scan on measurement_value mv_inner (cost=0.00..7175777.00 rows=1219616 width=1006)
Filter: (("timestamp" > '2020-11-05 00:00:00'::timestamp without time zone) AND ("timestamp" < '2020-11-08 00:00:00'::timestamp without time zone))
Filter: (("timestamp" > '2020-11-05 00:00:00'::timestamp without time zone) AND ("timestamp" < '2020-11-08 00:00:00'::timestamp without time zone))
Top posting (reply with all previous copied below) is discouraged on these lists. I think because it makes the archives rather messy.
Will try to remember
On Tue, Nov 10, 2020, 3:24 PM Peter Coppens <peter.coppens@datylon.com> wrote:
Curious, how accurate is that row count of 1.2 million records for 3 days? How many total records in the table? If you disable sequential scan, does it choose the index and what cost does it show?
Curious, how accurate is that row count of 1.2 million records for 3 days?
Not too bad actually:
select count(mv_inner.*)
from measurement_value AS mv_inner
where mv_inner.timestamp > '2020-11-06'::timestamp - interval '1 day' and mv_inner.timestamp < '2020-11-07'::timestamp + interval '1 day'
offset 0
==> 1128736
How many total records in the table?
±168 million
If you disable sequential scan, does it choose the index and what cost does it show?
It chooses the index, but apparently to build some intermediate structure that still needs to be joined on device_id later. That probably requires scanning all the pages of the index, which might explain why the performance is still not OK.
set enable_seqscan to false;
explain
select d.short_id,mv.timestamp,mv.I64_01
from
device d
, device_configuration dc
, (
select mv_inner.*
from measurement_value AS mv_inner
where mv_inner.timestamp > '2020-11-06'::timestamp - interval '1 day' and mv_inner.timestamp < '2020-11-07'::timestamp + interval '1 day'
offset 0
) mv
, pg_timezone_names tzn
where mv.device_id=d.short_id and dc.device_id = d.id and dc.latest=true and dc.timezone=tzn.name and
mv.timestamp > '2020-11-06'::timestamp - tzn.utc_offset and mv.timestamp < '2020-11-07'::timestamp - tzn.utc_offset
==>
Hash Join (cost=6677594.18..9545649.57 rows=434126 width=20)
Hash Cond: (mv_inner.device_id = d.short_id)
Join Filter: ((mv_inner."timestamp" > ('2020-11-06 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)) AND (mv_inner."timestamp" < ('2020-11-07 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)))
-> Bitmap Heap Scan on measurement_value mv_inner (cost=6676540.29..9446603.90 rows=1220458 width=1006)
Recheck Cond: (("timestamp" > '2020-11-05 00:00:00'::timestamp without time zone) AND ("timestamp" < '2020-11-08 00:00:00'::timestamp without time zone))
-> Bitmap Index Scan on measurement_values_pkey (cost=0.00..6676235.18 rows=1220458 width=0)
Index Cond: (("timestamp" > '2020-11-05 00:00:00'::timestamp without time zone) AND ("timestamp" < '2020-11-08 00:00:00'::timestamp without time zone))
-> Hash (cost=1026.55..1026.55 rows=2187 width=20)
-> Hash Join (cost=471.95..1026.55 rows=2187 width=20)
Hash Cond: (dc.timezone = pg_timezone_names.name)
-> Hash Join (cost=449.45..903.76 rows=615 width=18)
Hash Cond: (dc.device_id = d.id)
-> Bitmap Heap Scan on device_configuration dc (cost=242.72..688.58 rows=615 width=30)
Filter: latest
-> Bitmap Index Scan on device_configuration_device_latest_idx (cost=0.00..242.57 rows=615 width=0)
Index Cond: (latest = true)
-> Hash (cost=198.19..198.19 rows=683 width=20)
-> Index Scan using device_short_id_key on device d (cost=0.28..198.19 rows=683 width=20)
-> Hash (cost=10.00..10.00 rows=1000 width=48)
-> Function Scan on pg_timezone_names (cost=0.00..10.00 rows=1000 width=48)
On Tue, Nov 10, 2020, 10:51 PM Peter Coppens <peter.coppens@datylon.com> wrote:
Ahhh. You don't have a single column index on the timestamp value or a multi column one with timestamp first. No wonder the subquery didn't help. My apologies for not realizing that before. Thanks for satisfying my curiosity why it didn't perform like it should. Certainly, that index may or may not be worth creating and maintaining.
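If it ever did prove worthwhile, a sketch would be (hypothetical index name; the build cost on a 168M-row table is its own consideration):
-- concurrently avoids blocking writes while the index builds,
-- at the cost of a slower build
create index concurrently measurement_value_timestamp_idx
    on measurement_value ("timestamp");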
Obviously the 3 days worth of data is also too high (just quick and safe). Depending on how your server timezone compares with the most divergent timezone on a device, that could be tightened up. Regardless, if the 5-second runtime you got with the correlated subquery in the where clause is sufficient, then no need to continue, I suppose.
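The actual spread of offsets in use could be checked with something like this (a sketch against this thread's own tables):
-- the widest UTC offsets configured on latest device configurations bound
-- how much padding the broadened range really needs
select min(tzn.utc_offset) as min_offset, max(tzn.utc_offset) as max_offset
from device_configuration dc
join pg_timezone_names tzn on tzn.name = dc.timezone
where dc.latest;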
It seems odd to me to not do any basic adjustment of random_page_cost though. It isn't a magic number that the core team know to be perfect. It is a baseline that is likely to be quite different for each use case and server config. While there are no hard and fast rules and absolute right answers, it seems prudent to at least follow the advice of the community and lower it a ways if storage is ssd style and/or cache hits are quite high.
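As a sketch (the value is illustrative, not a recommendation for this specific server):
-- try a lower value for the current session only
set random_page_cost = 1.5;
-- if it helps, persist it and reload the configuration
alter system set random_page_cost = 1.5;
select pg_reload_conf();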
> Ahhh. You don't have a single column index on the timestamp value or a multi column one with timestamp first. No wonder the subquery didn't help. My apologies for not realizing that before. Thanks for satisfying my curiosity why it didn't perform like it should. Certainly, that index may or may not be worth creating and maintaining.
Indeed. Typical queries use both a device id and a timestamp range. Only some queries need more than a few devices.
> Obviously the 3 days worth of data is also too high (just quick and safe). Depending on how your server timezone compares with the most divergent timezone on a device, that could be tightened up. Regardless, if the 5-second runtime you got with the correlated subquery in the where clause is sufficient, then no need to continue, I suppose.
Yeah, I am OK with the performance now. The overall query this was extracted from (and which had multiple full table scans) went from more than 2 hours to 30 seconds. Given it's a daily 'async' report, that is sufficient.
> It seems odd to me to not do any basic adjustment of random_page_cost though. It isn't a magic number that the core team know to be perfect. It is a baseline that is likely to be quite different for each use case and server config. While there are no hard and fast rules and absolute right answers, it seems prudent to at least follow the advice of the community and lower it a ways if storage is ssd style and/or cache hits are quite high.
Ic. Well, I don't mind experimenting with it, and will certainly remember it next time. I guess I was demotivated because I read lots of warnings, but these might have been about disabling sequential scans and not about page cost settings.
On Wed, Nov 11, 2020, 7:30 AM Peter Coppens <peter.coppens@datylon.com> wrote:
> It seems odd to me to not do any basic adjustment of random_page_cost though. It isn't a magic number that the core team know to be perfect. It is a baseline that is likely to be quite different for each use case and server config. While there are no hard and fast rules and absolute right answers, it seems prudent to at least follow the advice of the community and lower it a ways if storage is ssd style and/or cache hits are quite high.
Ic. Well, I don't mind experimenting with it, and will certainly remember it next time. I guess I was demotivated because I read lots of warnings, but these might have been about disabling sequential scans and not about page cost settings.
Ah yes. Don't handicap the system by removing a join or scan type. However, simply customizing the costs that are used to decide between path A and path B when planning how to execute queries can definitely be helpful for getting better plans overall. Good luck!
Tx! And tx for your support.