Performance problem on 2 PG versions on same query - Mailing list pgsql-general

From Rémy-Christophe Schermesser
Subject Performance problem on 2 PG versions on same query
Date
Msg-id CAJDhA5z0rq35BqX+LK-zcByA0mxNFi95s2aPg5=L8+B7aO8yCw@mail.gmail.com
Whole thread Raw
Responses Re: Performance problem on 2 PG versions on same query  ("Tomas Vondra" <tv@fuzzy.cz>)
List pgsql-general
Hi,

We have 2 instances of PG, one in 9.1.1 and 9.1.14. They have the same data, schema, PG configuration, and are almost identical machines, same number of cores and memory, but different cloud provider. The data was transferred with a pg_dump/pg_restore. We ran VACUUM ANALYSE, ANALYSE, and REINDEX on both machines.

One query take ~11 seconds on 9.1.1 and ~25 minutes on 9.1.14.

Can you explain why there is such a big difference, and how to fix it on 9.1.14. 
On the explain plans (below), there is two difference that we can not explain : 
  • Using of Hash Anti Join in 9.1.1, and Merge Anti Join in 9.1.14
  • The usage of the index andalertsmatch_a_mid_idx in 9.1.14 and no index on 9.1.1
Thanks for your help,


Here is the query : 
SELECT a.* , e.id AS eid, e.fk_compc_id AS ecid
FROM gsm.andalertsteam a
JOIN gsm.events e ON (a.fk_team_id = e.part_a1_id OR e.part_b1_id = a.fk_team_id)
LEFT OUTER JOIN  gsm.andalertsmatch am ON e.id = am.a_mid AND am.a_uid = a.t_uid AND am.a_bundle = a.t_bundle 
WHERE (e.eventday = 1415232000 OR e.eventday = 1415059200 OR e.eventday = 1415145600)
AND e.fk_status_id != 2
AND am.a_mid IS NULL AND a.fk_comp_id IS NULL
LIMIT 1000;


Here is the explain plan on 9.1.1 :
Limit  (cost=100414.92..107502.31 rows=1000 width=279) (actual time=6200.302..11650.567 rows=1000 loops=1)
->  Hash Anti Join  (cost=100414.92..118856.31 rows=2602 width=279) (actual time=6200.302..11650.567 rows=1000 loops=1)
      Hash Cond: ((e.id = am.a_mid) AND ((a.t_uid)::text = (am.a_uid)::text) AND ((a.t_bundle)::text = (am.a_bundle)::text))
      ->  Nested Loop  (cost=88.78..7785.80 rows=2655 width=279) (actual time=190.009..9470.460 rows=7951 loops=1)
            ->  Seq Scan on events e  (cost=0.00..1650.14 rows=2 width=32) (actual time=40.002..260.013 rows=8 loops=1)
                  Filter: ((fk_status_id <> 2) AND ((eventday = 1415232000) OR (eventday = 1415059200) OR (eventday = 141514560
0)))
            ->  Bitmap Heap Scan on andalertsteam a  (cost=88.78..3052.59 rows=1016 width=263) (actual time=178.759..1151.306 r
ows=994 loops=8)
                  Recheck Cond: ((fk_team_id = e.part_a1_id) OR (e.part_b1_id = fk_team_id))
                  Filter: (fk_comp_id IS NULL)
                  ->  BitmapOr  (cost=88.78..88.78 rows=1016 width=0) (actual time=175.008..175.008 rows=0 loops=8)
                        ->  Bitmap Index Scan on andalertsmatch_fk_team_id_idx  (cost=0.00..44.26 rows=508 width=0) (actual tim
e=37.502..37.502 rows=555 loops=8)
                              Index Cond: (fk_team_id = e.part_a1_id)
                        ->  Bitmap Index Scan on andalertsmatch_fk_team_id_idx  (cost=0.00..44.26 rows=508 width=0) (actual tim
e=137.507..137.507 rows=4192 loops=8)
                              Index Cond: (e.part_b1_id = fk_team_id)
      ->  Hash  (cost=71287.87..71287.87 rows=1064987 width=52) (actual time=2150.105..2150.105 rows=1064987 loops=1)
            Buckets: 65536  Batches: 2  Memory Usage: 44488kB
            ->  Seq Scan on andalertsmatch am  (cost=0.00..71287.87 rows=1064987 width=52) (actual time=0.000..1680.077 rows=10
64987 loops=1)

Here is the explain plan on 9.1.14 :
Limit  (cost=20.64..73294.62 rows=1000 width=279) (actual time=1419311.904..1419400.785 rows=3 loops=1)
   ->  Merge Anti Join  (cost=20.64..193463.96 rows=2640 width=279) (actual time=1419311.902..1419400.783 rows=3 loops=1)
         Merge Cond: (e.id = am.a_mid)
         Join Filter: (((am.a_uid)::text = (a.t_uid)::text) AND ((am.a_bundle)::text = (a.t_bundle)::text))
         ->  Nested Loop  (cost=20.64..8045.28 rows=2694 width=279) (actual time=13.230..555.366 rows=121063 loops=1)
               ->  Index Scan using events_pkey on events e  (cost=0.00..3369.07 rows=2 width=32) (actual time=12.494..14.003 rows=41 loops=1)
                     Filter: ((fk_status_id <> 2) AND ((eventday = 1415232000) OR (eventday = 1415059200) OR (eventday = 1415145600)))
               ->  Bitmap Heap Scan on andalertsteam a  (cost=20.64..2322.43 rows=1045 width=263) (actual time=0.467..8.612 rows=2953 loops=41)
                     Recheck Cond: ((fk_team_id = e.part_a1_id) OR (e.part_b1_id = fk_team_id))
                     Filter: (fk_comp_id IS NULL)
                     ->  BitmapOr  (cost=20.64..20.64 rows=1045 width=0) (actual time=0.335..0.335 rows=0 loops=41)
                           ->  Bitmap Index Scan on andalertsmatch_fk_team_id_idx  (cost=0.00..10.19 rows=522 width=0) (actual time=0.079..0.079 rows=401 loops=41)
                                 Index Cond: (fk_team_id = e.part_a1_id)
                           ->  Bitmap Index Scan on andalertsmatch_fk_team_id_idx  (cost=0.00..10.19 rows=522 width=0) (actual time=0.253..0.253 rows=2558 loops=41)
                                 Index Cond: (e.part_b1_id = fk_team_id)
         ->  Index Scan using andalertsmatch_a_mid_idx on andalertsmatch am  (cost=0.00..180798.61 rows=1173762 width=52) (actual time=0.015..875294.427 rows=1826118122 loops=1)
 Total runtime: 1419400.861 ms

pgsql-general by date:

Previous
From: Bernd Helmle
Date:
Subject: Re: Re: [BUGS] [HACKERS] COPY TO returning empty result with parallel ALTER TABLE
Next
From: "Tomas Vondra"
Date:
Subject: Re: Performance problem on 2 PG versions on same query