Re: Help trying to tune query that executes 40x slower than in SqlServer - Mailing list pgsql-performance

From Hugo Ferreira
Subject Re: Help trying to tune query that executes 40x slower than in SqlServer
Date
Msg-id 4e8efcf5050309040836b06cc5@mail.gmail.com
Whole thread Raw
In response to Re: Help trying to tune query that executes 40x slower than in SqlServer  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Help trying to tune query that executes 40x slower than in SqlServer
List pgsql-performance
Hi,

Well, I think the problem is far more complex than just joins
reordering... I've restrucutred the query so that it won't use any
explicit joins.Instead it now has a series of 'in (select ...)' and
'not exists (select ...)'. This actually got faster... sometimes!!!

select 1, 10000168,  C.contxt_id, C.contxt_elmt_ix, null, null, null,
null, null, null, 1
from CONTXT as P, CONTXT_ELMT as C, MRS_REPLICATION_OUT as S,
MRS_TRANSACTION as TRANS
where S.age=0
    and S.trans_id=1
    and S.trans_id = TRANS.trans_id
    and S.ent_id = 10000029
    and (P.contxt_id=C.contxt_id) and (P.contxt_id = S.pk1)
    and (C.owner_id not in (select non_repl_data_owner_id from
NON_REPL_DATA_OWNER))
    AND (C.owner_id not in (select repl_data_owner_id from REPL_DATA_OWNER_RSDNC
                    where rsdnc_node_id = TRANS.recv_node_id))
    AND (not exists (select pk1 from MRS_REPLICATION_OUT
                where trans_id=1
                    and ent_id=10000168
                    and C.contxt_id = pk1
                    AND C.contxt_elmt_ix = pk2))
    AND (not exists (select pk1 from MRS_TRANSACTION RED_TRANS,
MRS_REPLICATION_OUT RED_OUT
                where RED_TRANS.cat_code = 'OUT'
                    and RED_TRANS.trans_type in ('X01', 'X02')
                    and RED_TRANS.trans_id=RED_OUT.trans_id
                    and RED_TRANS.prov_node_id=TRANS.prov_node_id
                    and RED_TRANS.recv_node_id=TRANS.recv_node_id
                    and RED_OUT.ent_id=10000168
                    and C.contxt_id = pk1
                    AND C.contxt_elmt_ix = pk2))


For example... I run the query, it takes 122seconds. Then I delete the
target tables, vacuum the database, re-run it again: 9s. But if I run
vacuum several times, and then run, it takes again 122seconds. If I
stop this 122seconds query, say, at second 3 and then run it again, it
will only take 9s. It simply doesn't make sense. Also, explain analyse
will give me diferent plans each time I run it... Unfortunately, this
is rendering PostgreSQL unusable for our goals. Any ideas?

By the way, I got the following indexes over MRS_REPLICATION_OUT which
seems to speed up things:

CREATE INDEX ix_mrs_replication_out_all ON mrs_replication_out
USING btree (ent_id, age, trans_id);

CREATE INDEX ix_mrs_replication_pks ON mrs_replication_out
USING btree (trans_id, ent_id, pk1, pk2, pk3, pk4, pk5, pk6, pk7);

Note: pk2... pk7 are nullable columns. trans_id is the least variant
column. pk1 is the most variant column. Most of the times, the
execution plan includes an 'index scan' over the first index
(ix_mrs_replication_out_all), followed by a filter with columns from
the second index (trans_id, ent_id, pk1, pk2, pk3, pk4, pk5, pk6,
pk7), though the 'age' column is not used... Any guess why??

Thanks in advance,

Hugo Ferreira

> It is possible but complicated to determine that reordering outer joins
> is safe in some cases.  We don't currently have such logic in PG.  It
> may be that SQL Server does have that capability and that's why it's
> finding a much better plan ... but for now you have to do that by hand
> in PG.

--
GPG Fingerprint: B0D7 1249 447D F5BB 22C5  5B9B 078C 2615 504B 7B85

pgsql-performance by date:

Previous
From: Gaetano Mendola
Date:
Subject: Re: vacuum full, why multiple times ?
Next
From: Richard Huxton
Date:
Subject: Re: vacuum full, why multiple times ?