Query-Planer from 6seconds TO DAYS - Mailing list pgsql-performance

From Böckler Andreas
Subject Query-Planer from 6seconds TO DAYS
Date
Msg-id 742A368E-4352-4394-BE19-6F07AA60456D@boeckler.org
Whole thread Raw
List pgsql-performance
Hi,

i've got a very strange problem on PostgreSQL 8.4, where the queryplaner goes absolutely havoc, when slightly changing
oneparameter. 

First the Tables which are involved:
1.                                                Table "public.spsdata"
           Column            |            Type             |                           Modifiers
   

-----------------------------+-----------------------------+---------------------------------------------------------------
 data_id                     | bigint                      | not null default nextval('spsdata_data_id_seq'::regclass)
 machine_id                  | integer                     |
 timestamp                   | timestamp with time zone    |
 value1                      | ….
 value2                      | ….
 errorcode                   | integer
...

This table is partitioned (per month) and holds about 3.86203 * 10^9 records (the machines are generating data every 5
seconds)
Every partition (=month) has about 36 * 10^6 records and has following indexes/constraints:
Indexes:
    "spsdata_2012m09_machine_id_key" UNIQUE, btree (machine_id, "timestamp")
Check constraints:
    "spsdata_2012m09_timestamp_check" CHECK ("timestamp" >= '2012-09-01 00:00:00+02'::timestamp with time zone AND
"timestamp"< '2012-10-01 00:00:00+02'::timestamp with time zone) 
Inherits: spsdata

constraint_exclusion is set to 'partition'

2.                                               Table "public.events"
        Column         |            Type             |                           Modifiers
-----------------------+-----------------------------+----------------------------------------------------------------
 event_id              | bigint                      | not null default nextval('events_event_id_seq'::regclass)
 machine_id            | integer                     |
 timestamp             | timestamp without time zone |
 code                  | integer                     |
Indexes:
    "events_pkey" PRIMARY KEY, btree (event_id)
    "events_unique_key" UNIQUE, btree (machine_id, "timestamp", code)
    "events_code" btree (code)
    "events_timestamp" btree ("timestamp");

THE PROBLEM:
We're trying to select certain rows from the spsdata-table which happened before the event. The event is filtered By
code.Because the timestamp of event and data is not in sync, we look into the last 30 seconds. Here is the select: 
db=# SELECT  m.machine_id, s.timestamp, s.errorcode
FROM events m INNER JOIN spsdata as s ON (m.machine_id= m.machine_id AND s.timestamp BETWEEN m.timestamp - interval '30
seconds'AND m.timestamp) 
WHERE m.code IN 2024 AND m.timestamp BETWEEN '2012-08-14' AND '2012-08-29' AND s.errorcode in '2024';
 machine_id |       timestamp        | errorcode
------------+------------------------+-----------
        183 | 2012-08-18 18:21:29+02 |      2024
        216 | 2012-08-20 15:40:39+02 |      2024
        183 | 2012-08-21 12:56:49+02 |      2024
        183 | 2012-08-27 17:04:34+02 |      2024
        214 | 2012-08-27 23:33:44+02 |      2024
(5 rows)

Time: 6087.911 ms

When I'm changing "m.timestamp BETWEEN '2012-08-14' AND '2012-08-29'" to "m.timestamp BETWEEN '2012-08-13' AND
'2012-08-29'"the query takes HOURS.  
Here are some statistics for different ranges
2012-08-14' AND '2012-08-29' -> ca 4sec
2012-08-14' AND '2012-09-30' -> ca 4sec
2012-08-13' AND '2012-08-15' -> ca 4sec
2012-08-13' AND '2012-08-22' -> ca 4sec
2012-08-13' AND '2012-08-25' -> ca 4sec
2012-08-13' AND '2012-08-26' -> FOREVER
2012-08-14' AND '2012-08-26' -> ca 4sec
2012-08-13' AND  ( >'2012-08-26' ) -> FOREVER

The problem is the change of the query plan.
FAST:
                                                                            QUERY PLAN
                                          

------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..144979241.24 rows=42662 width=14)
   Join Filter: ((s."timestamp" <= m."timestamp") AND (m.machine_id = s.machine_id) AND (s."timestamp" >=
(m."timestamp"- '00:00:30'::interval))) 
   ->  Index Scan using events_code on events m  (cost=0.00..4911.18 rows=25 width=12)
         Index Cond: (code = 2024)
         Filter: (("timestamp" >= '2012-08-14 00:00:00'::timestamp without time zone) AND ("timestamp" <= '2012-08-26
00:00:00'::timestampwithout time zone)) 
   ->  Append  (cost=0.00..5770958.44 rows=1400738 width=14)
         ->  Index Scan using spsdata_machine_id on spsdata s  (cost=0.00..4.11 rows=1 width=14)
               Index Cond: (s.machine_id = m.machine_id)

SLOW:
                                                                                         QUERY PLAN
                                                                     

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=631.37..158275670.34 rows=47782 width=14)
   Hash Cond: (s.machine_id = m.machine_id)
   Join Filter: ((s."timestamp" <= m."timestamp") AND (s."timestamp" >= (m."timestamp" - '00:00:30'::interval)))
   ->  Append  (cost=0.00..158152325.56 rows=3071675 width=14)
         ->  Seq Scan on spsdata s  (cost=0.00..10.75 rows=1 width=14)
               Filter: (errorcode = 2024::smallint)
         ->  Seq Scan on spsdata_2009m11 s  (cost=0.00..10.75 rows=1 width=14)
               Filter: (errorcode = 2024::smallint)
         ->  Seq Scan on spsdata_2009m12 s  (cost=0.00..24897.60 rows=32231 width=14)
               Filter: (errorcode = 2024::smallint)
         ->  Seq Scan on spsdata_2010m01 s  (cost=0.00..113650.43 rows=153779 width=14)
               Filter: (errorcode = 2024::smallint)
         ->  Seq Scan on spsdata_2010m02 s  (cost=0.00..451577.41 rows=9952 width=14)
               Filter: (errorcode = 2024::smallint)
         ->  Seq Scan on spsdata_2010m03 s  (cost=0.00..732979.41 rows=16001 width=14)
               Filter: (errorcode = 2024::smallint)
         ->  Seq Scan on spsdata_2010m04 s  (cost=0.00..940208.95 rows=17699 width=14)

As you can imagine, Seq Scanning a Table(s) with  3.86203 * 10^9 records is not a good idea.
What can I do to prevent that behavior ?

Thanks

Andy

--
Andreas Böckler
andy@boeckler.org



pgsql-performance by date:

Previous
From: Steve Crawford
Date:
Subject: Re: Query with limit goes from few ms to hours
Next
From: Shaun Thomas
Date:
Subject: Setting Statistics on Functional Indexes