Execution plan does not use index - Mailing list pgsql-general

From Peter Coppens
Subject Execution plan does not use index
Date
Msg-id CAEkf4WBnCDcqQXfw4z-TpTYucBE=AEHEQjLd3QvOtMrPrrK6fA@mail.gmail.com
Whole thread Raw
Responses Re: Execution plan does not use index  (Michael Lewis <mlewis@entrata.com>)
List pgsql-general
Hello,

Consider the following PostgreSQL 9.6.18 tables  
- measurement_value: time series table  with a unique key on (device_id,timestamp) columns and a number of columns with measurements. Table contains a large number of rows (>150million)
- device table: with device properties (short_id joins to device_id in measurement_value table)
- device_configuration table: extra device properties, including a timezone

This query works fine and quick (returns ±320K rows in 3.2 seconds when explain is removed)

explain
  select d.short_id,mv.timestamp,mv.I64_01
  from device d, device_configuration dc, measurement_value mv, pg_timezone_names tzn
  where mv.device_id=d.short_id and dc.device_id = d.id and dc.latest=true and dc.timezone=tzn.name and
        mv.timestamp > '2020-11-06'::timestamp and mv.timestamp < '2020-11-07'::timestamp


==>
Hash Join  (cost=23.63..2156655.51 rows=1454320 width=20)
  Hash Cond: (dc.timezone = pg_timezone_names.name)
  ->  Nested Loop  (cost=1.13..2089933.38 rows=409070 width=34)
        ->  Nested Loop  (cost=0.56..7497.34 rows=615 width=18)
              ->  Index Scan using device_short_id_key on device d  (cost=0.28..2423.90 rows=683 width=20)
              ->  Index Scan using device_configuration_device_latest_idx on device_configuration dc  (cost=0.28..7.42 rows=1 width=30)
                    Index Cond: ((device_id = d.id) AND (latest = true))
                    Filter: latest
        ->  Index Scan using measurement_values_pkey on measurement_value mv  (cost=0.57..3375.60 rows=1047 width=20)
              Index Cond: ((device_id = d.short_id) AND ("timestamp" > '2020-11-06 00:00:00'::timestamp without time zone) AND ("timestamp" < '2020-11-07
00:00:00'::timestamp without time zone))
  ->  Hash  (cost=10.00..10.00 rows=1000 width=32)
        ->  Function Scan on pg_timezone_names  (cost=0.00..10.00 rows=1000 width=32)



When adding the timezone offset to the where clause, the query becomes extremely slow due to a full table scan of the measurement_value table (±320K rows in 11 minutes)

(See  ->  Seq Scan on measurement_value mv  (cost=0.00..6308233.77 rows=169617977 width=20))

explain 
  select d.short_id,mv.timestamp,mv.I64_01
  from device d, device_configuration dc, measurement_value mv, pg_timezone_names tzn
  where mv.device_id=d.short_id and dc.device_id = d.id and dc.latest=true and dc.timezone=tzn.name and
        mv.timestamp > '2020-11-06'::timestamp - tzn.utc_offset and mv.timestamp < '2020-11-07'::timestamp - tzn.utc_offset


==>
Hash Join  (cost=683.93..18226567.32 rows=60331762 width=20)
  Hash Cond: (mv.device_id = d.short_id)
  Join Filter: ((mv."timestamp" > ('2020-11-06 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)) AND (mv."timestamp" < ('2020-11-07 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)))
  ->  Seq Scan on measurement_value mv  (cost=0.00..6308233.77 rows=169617977 width=20)
  ->  Hash  (cost=656.61..656.61 rows=2186 width=20)
        ->  Hash Join  (cost=77.87..656.61 rows=2186 width=20)
              Hash Cond: (dc.timezone = pg_timezone_names.name)
              ->  Hash Join  (cost=55.37..533.83 rows=615 width=18)
                    Hash Cond: (dc.device_id = d.id)
                    ->  Seq Scan on device_configuration dc  (cost=0.00..470.01 rows=615 width=30)
                          Filter: latest
                    ->  Hash  (cost=46.83..46.83 rows=683 width=20)
                          ->  Seq Scan on device d  (cost=0.00..46.83 rows=683 width=20)
              ->  Hash  (cost=10.00..10.00 rows=1000 width=48)
                    ->  Function Scan on pg_timezone_names  (cost=0.00..10.00 rows=1000 width=48)



Adding the tzn.utc_offset results in the fact that the execution plan no longer considers to use the index on the measurement_value table. Is there any way the SQL can be rewritten so that the index is used? Or any other solution so that the query with the timezone offset returns in a comparable time?

Many thanks!

Peter


pgsql-general by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: New "function tables" in V13 documentation
Next
From: Adrian Klaver
Date:
Subject: Re: Foreign Data Wrapper Handler