Thread: Performance problem on 2 PG versions on same query

Performance problem on 2 PG versions on same query

From: Rémy-Christophe Schermesser

Hi,

We have 2 instances of PG, one on 9.1.1 and one on 9.1.14. They have the same data, schema, and PG configuration, and run on almost identical machines (same number of cores and memory), but at different cloud providers. The data was transferred with a pg_dump/pg_restore. We ran VACUUM ANALYSE, ANALYSE, and REINDEX on both machines.
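
Concretely, the maintenance pass was along these lines (a sketch; the per-table REINDEX targets are an assumption, covering the tables used in the query below):

VACUUM ANALYSE;
ANALYSE;
-- reindex the tables involved in the slow query (exact targets assumed)
REINDEX TABLE gsm.andalertsteam;
REINDEX TABLE gsm.events;
REINDEX TABLE gsm.andalertsmatch;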

One query takes ~11 seconds on 9.1.1 and ~25 minutes on 9.1.14.

Can you explain why there is such a big difference, and how to fix it on 9.1.14?
In the explain plans (below), there are two differences that we cannot explain:
  • The use of a Hash Anti Join in 9.1.1 vs. a Merge Anti Join in 9.1.14
  • The use of the index andalertsmatch_a_mid_idx in 9.1.14, where 9.1.1 uses no index (see the diagnostic sketch below)
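
One diagnostic we can try on 9.1.14 (a sketch, not a fix) is to disable merge joins for a single session and re-run EXPLAIN ANALYZE, to see whether the planner falls back to the 9.1.1-style hash anti join:

-- diagnostic sketch only; affects just the current session
SET enable_mergejoin = off;
-- ...re-run EXPLAIN ANALYZE on the query below here...
RESET enable_mergejoin;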
Thanks for your help,


Here is the query:
SELECT a.*, e.id AS eid, e.fk_compc_id AS ecid
FROM gsm.andalertsteam a
JOIN gsm.events e ON (a.fk_team_id = e.part_a1_id OR e.part_b1_id = a.fk_team_id)
LEFT OUTER JOIN gsm.andalertsmatch am ON e.id = am.a_mid AND am.a_uid = a.t_uid AND am.a_bundle = a.t_bundle
WHERE (e.eventday = 1415232000 OR e.eventday = 1415059200 OR e.eventday = 1415145600)
AND e.fk_status_id != 2
AND am.a_mid IS NULL AND a.fk_comp_id IS NULL
LIMIT 1000;


Here is the explain plan on 9.1.1:
Limit  (cost=100414.92..107502.31 rows=1000 width=279) (actual time=6200.302..11650.567 rows=1000 loops=1)
->  Hash Anti Join  (cost=100414.92..118856.31 rows=2602 width=279) (actual time=6200.302..11650.567 rows=1000 loops=1)
      Hash Cond: ((e.id = am.a_mid) AND ((a.t_uid)::text = (am.a_uid)::text) AND ((a.t_bundle)::text = (am.a_bundle)::text))
      ->  Nested Loop  (cost=88.78..7785.80 rows=2655 width=279) (actual time=190.009..9470.460 rows=7951 loops=1)
            ->  Seq Scan on events e  (cost=0.00..1650.14 rows=2 width=32) (actual time=40.002..260.013 rows=8 loops=1)
                  Filter: ((fk_status_id <> 2) AND ((eventday = 1415232000) OR (eventday = 1415059200) OR (eventday = 1415145600)))
            ->  Bitmap Heap Scan on andalertsteam a  (cost=88.78..3052.59 rows=1016 width=263) (actual time=178.759..1151.306 rows=994 loops=8)
                  Recheck Cond: ((fk_team_id = e.part_a1_id) OR (e.part_b1_id = fk_team_id))
                  Filter: (fk_comp_id IS NULL)
                  ->  BitmapOr  (cost=88.78..88.78 rows=1016 width=0) (actual time=175.008..175.008 rows=0 loops=8)
                        ->  Bitmap Index Scan on andalertsmatch_fk_team_id_idx  (cost=0.00..44.26 rows=508 width=0) (actual time=37.502..37.502 rows=555 loops=8)
                              Index Cond: (fk_team_id = e.part_a1_id)
                        ->  Bitmap Index Scan on andalertsmatch_fk_team_id_idx  (cost=0.00..44.26 rows=508 width=0) (actual time=137.507..137.507 rows=4192 loops=8)
                              Index Cond: (e.part_b1_id = fk_team_id)
      ->  Hash  (cost=71287.87..71287.87 rows=1064987 width=52) (actual time=2150.105..2150.105 rows=1064987 loops=1)
            Buckets: 65536  Batches: 2  Memory Usage: 44488kB
            ->  Seq Scan on andalertsmatch am  (cost=0.00..71287.87 rows=1064987 width=52) (actual time=0.000..1680.077 rows=1064987 loops=1)

Here is the explain plan on 9.1.14:
Limit  (cost=20.64..73294.62 rows=1000 width=279) (actual time=1419311.904..1419400.785 rows=3 loops=1)
   ->  Merge Anti Join  (cost=20.64..193463.96 rows=2640 width=279) (actual time=1419311.902..1419400.783 rows=3 loops=1)
         Merge Cond: (e.id = am.a_mid)
         Join Filter: (((am.a_uid)::text = (a.t_uid)::text) AND ((am.a_bundle)::text = (a.t_bundle)::text))
         ->  Nested Loop  (cost=20.64..8045.28 rows=2694 width=279) (actual time=13.230..555.366 rows=121063 loops=1)
               ->  Index Scan using events_pkey on events e  (cost=0.00..3369.07 rows=2 width=32) (actual time=12.494..14.003 rows=41 loops=1)
                     Filter: ((fk_status_id <> 2) AND ((eventday = 1415232000) OR (eventday = 1415059200) OR (eventday = 1415145600)))
               ->  Bitmap Heap Scan on andalertsteam a  (cost=20.64..2322.43 rows=1045 width=263) (actual time=0.467..8.612 rows=2953 loops=41)
                     Recheck Cond: ((fk_team_id = e.part_a1_id) OR (e.part_b1_id = fk_team_id))
                     Filter: (fk_comp_id IS NULL)
                     ->  BitmapOr  (cost=20.64..20.64 rows=1045 width=0) (actual time=0.335..0.335 rows=0 loops=41)
                           ->  Bitmap Index Scan on andalertsmatch_fk_team_id_idx  (cost=0.00..10.19 rows=522 width=0) (actual time=0.079..0.079 rows=401 loops=41)
                                 Index Cond: (fk_team_id = e.part_a1_id)
                           ->  Bitmap Index Scan on andalertsmatch_fk_team_id_idx  (cost=0.00..10.19 rows=522 width=0) (actual time=0.253..0.253 rows=2558 loops=41)
                                 Index Cond: (e.part_b1_id = fk_team_id)
         ->  Index Scan using andalertsmatch_a_mid_idx on andalertsmatch am  (cost=0.00..180798.61 rows=1173762 width=52) (actual time=0.015..875294.427 rows=1826118122 loops=1)
 Total runtime: 1419400.861 ms

Re: Performance problem on 2 PG versions on same query

From: "Tomas Vondra"

Hi,

On 5 November 2014 at 17:31, Rémy-Christophe Schermesser wrote:
> Hi,
>
> We have 2 instances of PG, one on 9.1.1 and one on 9.1.14. They have the
> same data, schema, and PG configuration, and run on almost identical
> machines (same number of cores and memory), but at different cloud
> providers. The data was transferred with a pg_dump/pg_restore. We ran
> VACUUM ANALYSE, ANALYSE, and REINDEX on both machines.

What do you mean by "cloud provider"? Are you installing and configuring
the machines on your own, or are they set up by the provider? BTW, you
should do basic benchmarking first: numbers reported by the providers are
just random numbers until you verify them.
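
For a crude comparison of raw scan speed between the two machines, you can time a full scan of the same large table on both sides (just a sketch, using a table from your query):

-- crude I/O sanity check; compare timings and buffer counts on both machines
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM gsm.andalertsmatch;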

> One query takes ~11 seconds on 9.1.1 and ~25 minutes on 9.1.14.

Well, the first observation is that the queries produce different results:

Limit  (cost=100414.92..107502.31 rows=1000 width=279) (actual
time=6200.302..11650.567 rows=1000 loops=1)

Limit  (cost=20.64..73294.62 rows=1000 width=279) (actual
time=1419311.904..1419400.785 rows=3 loops=1)

So while on 9.1.1 it produces 1000 rows very quickly, on 9.1.14 it only
ever finds 3 rows, so the query has to scan all the data and the early
abort from the LIMIT never triggers.
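
One way to verify that on both machines is to count what the query would return without the LIMIT (a sketch; it is just your query wrapped in count(*)):

-- how many rows does the join actually yield in total?
SELECT count(*)
FROM gsm.andalertsteam a
JOIN gsm.events e ON (a.fk_team_id = e.part_a1_id OR e.part_b1_id = a.fk_team_id)
LEFT OUTER JOIN gsm.andalertsmatch am
       ON e.id = am.a_mid AND am.a_uid = a.t_uid AND am.a_bundle = a.t_bundle
WHERE e.eventday IN (1415232000, 1415059200, 1415145600)
  AND e.fk_status_id != 2
  AND am.a_mid IS NULL
  AND a.fk_comp_id IS NULL;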

There are other differences, though. For example on 9.1.1 the nested loop
returns ~8k rows:

Nested Loop  (cost=88.78..7785.80 rows=2655 width=279) (actual
time=190.009..9470.460 rows=7951 loops=1)

while on 9.1.14 it produces ~120k rows:

Nested Loop  (cost=20.64..8045.28 rows=2694 width=279) (actual
time=13.230..555.366 rows=121063 loops=1)

This may be one of the reasons why the database decided to use a
different join method.

Are there any differences in settings between the two machines (e.g.
work_mem)?
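
For example, something like this on both machines (the exact list of settings is just a suggestion):

-- compare planner-relevant settings across the two machines
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('work_mem', 'shared_buffers',
               'effective_cache_size', 'random_page_cost');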

regards
Tomas



Re: Performance problem on 2 PG versions on same query

From: Tom Lane

"Tomas Vondra" <tv@fuzzy.cz> writes:
> On 5 November 2014 at 17:31, Rémy-Christophe Schermesser wrote:
>> We have 2 instances of PG, one on 9.1.1 and one on 9.1.14. They have the
>> same data, schema, and PG configuration, and run on almost identical
>> machines (same number of cores and memory), but at different cloud
>> providers. The data was transferred with a pg_dump/pg_restore. We ran
>> VACUUM ANALYSE, ANALYSE, and REINDEX on both machines.

> Well, the first observation is that the queries produce different results:

Yeah.  Another reason to not believe that the databases contain identical
data is here:

>>            ->  Seq Scan on andalertsmatch am  (cost=0.00..71287.87
>> rows=1064987 width=52) (actual time=0.000..1680.077 rows=1064987 loops=1)

>>         ->  Index Scan using andalertsmatch_a_mid_idx on andalertsmatch am
>> (cost=0.00..180798.61 rows=1173762 width=52) (actual
>> time=0.015..875294.427 rows=1826118122 loops=1)

For some reason there's over 1000 times more rows in andalertsmatch in
the 9.1.14 installation.  I'm betting on a foulup somewhere in the data
dump/restore process.
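
An easy way to check is to compare the actual and estimated row counts on both machines, e.g.:

-- count(*) gives the real number; reltuples/relpages are the planner's estimates
SELECT count(*) FROM gsm.andalertsmatch;
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname = 'andalertsmatch';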

            regards, tom lane


Re: Performance problem on 2 PG versions on same query

From: "Tomas Vondra"

On 5 November 2014 at 18:10, Tom Lane wrote:
> "Tomas Vondra" <tv@fuzzy.cz> writes:
>> On 5 November 2014 at 17:31, Rémy-Christophe Schermesser wrote:
>>> We have 2 instances of PG, one on 9.1.1 and one on 9.1.14. They have the
>>> same data, schema, and PG configuration, and run on almost identical
>>> machines (same number of cores and memory), but at different cloud
>>> providers. The data was transferred with a pg_dump/pg_restore. We ran
>>> VACUUM ANALYSE, ANALYSE, and REINDEX on both machines.
>
>> Well, the first observation is that the queries produce different
>> results:
>
> Yeah.  Another reason to not believe that the databases contain identical
> data is here:
>
>>>            ->  Seq Scan on andalertsmatch am  (cost=0.00..71287.87
>>> rows=1064987 width=52) (actual time=0.000..1680.077 rows=1064987
>>> loops=1)
>
>>>         ->  Index Scan using andalertsmatch_a_mid_idx on andalertsmatch
>>> am
>>> (cost=0.00..180798.61 rows=1173762 width=52) (actual
>>> time=0.015..875294.427 rows=1826118122 loops=1)
>
> For some reason there's over 1000 times more rows in andalertsmatch in
> the 9.1.14 installation.  I'm betting on a foulup somewhere in the data
> dump/restore process.

I'd bet that's mostly due to rescans in the merge join ...
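
As a rough sanity check (assuming the table holds about the same ~1.06M rows as on 9.1.1): the inner index scan reported 1,826,118,122 rows for 121,063 outer rows, i.e. about 1,826,118,122 / 121,063 ≈ 15,000 inner fetches per outer row, or roughly 1,700 reads of each table row on average. A merge join that keeps backing up over duplicate merge keys can produce numbers like that without the table itself being any bigger.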

Tomas



Re: Performance problem on 2 PG versions on same query

From: Tom Lane

"Tomas Vondra" <tv@fuzzy.cz> writes:
> On 5 November 2014 at 18:10, Tom Lane wrote:
>> For some reason there's over 1000 times more rows in andalertsmatch in
>> the 9.1.14 installation.  I'm betting on a foulup somewhere in the data
>> dump/restore process.

> I'd bet that's mostly due to rescans in the merge join ...

Doh, I should have thought of that :-(.  But still, the difference in
the number of final output rows is pretty compelling evidence that
it's not really the same data.

            regards, tom lane


Re: Performance problem on 2 PG versions on same query

From: Rémy-Christophe Schermesser

Thank you very much for the insights (and for showing how to read an explain correctly :)).
There is a difference in data between the 2 databases: crons were running on one and not the other.

But your insights enabled us to dive deeper, thanks!

2014-11-05 18:30 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
"Tomas Vondra" <tv@fuzzy.cz> writes:
> On 5 November 2014 at 18:10, Tom Lane wrote:
>> For some reason there's over 1000 times more rows in andalertsmatch in
>> the 9.1.14 installation.  I'm betting on a foulup somewhere in the data
>> dump/restore process.

> I'd bet that's mostly due to rescans in the merge join ...

Doh, I should have thought of that :-(.  But still, the difference in
the number of final output rows is pretty compelling evidence that
it's not really the same data.

                        regards, tom lane