Thread: MERGE: performance advices

MERGE: performance advices

From

Ivan Sergio Borgonovo

Date:

02 September 2008, 08:20:26

I need to merge 2 tables:

update d set c1=s.c1, c2=s.c2... from s where d.pk=s.pk;
insert into d (pk, c1, c2, ...) select pk, c1, c2, c3 from s
  where s.pk not in (select pk from d);

Any strategy to make it faster? Including modifying postgres.conf
temporary?

Considering I've no concurrency problems. The tables I'm dealing
with are "read only" for everything else other than the merge
process.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

Re: MERGE: performance advices

From

Gregory Stark

Date:

02 September 2008, 09:14:22

Ivan Sergio Borgonovo <mail@webthatworks.it> writes:

> I need to merge 2 tables:
>
> update d set c1=s.c1, c2=s.c2... from s where d.pk=s.pk;
> insert into d (pk, c1, c2, ...) select pk, c1, c2, c3 from s
>   where s.pk not in (select pk from d);

you could try making the not in an exists. In released versions of Postgres
sometimes one is better than the other. Raising work_mem might matter if it
lets you do a hash join for either the IN/EXISTS or the join.

There is another approach though whether it's faster depends on how many
indexes you have and other factors:

CREATE TABLE new_d AS
SELECT DISTINCT ON (pk) pk,c1,c
  FROM (select 1 as t, * from s
        union all
        select 2 as t, * from d
       )
 ORDER BY pk, t

This will pull in all the rows from both tables and sort them by pk with
records from s appearing before matching records from t and then keep only the
first value for each pk.

Then you'll have to build indexes, swap the tables, and fix any views or rules
which refer to the old table (they'll still refer to the old table, not the
new table even after renaming it to the old name).

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

Re: MERGE: performance advices

From

"Richard Broersma"

Date:

02 September 2008, 09:38:55

On Tue, Sep 2, 2008 at 4:19 AM, Ivan Sergio Borgonovo
<mail@webthatworks.it> wrote:


> insert into d (pk, c1, c2, ...) select pk, c1, c2, c3 from s
>  where s.pk not in (select pk from d);

This insert statement might be faster:

INSERT INTO d (pk, c1, c2, ... )
SELECT pk, c1, c2, ...
  FROM s
LEFT JOIN d ON s.pk = d.pk
WHERE d.pk IS NULL;


--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug

Re: MERGE: performance advices

From

Steve Clark

Date:

02 September 2008, 12:12:07

Richard Broersma wrote:
> On Tue, Sep 2, 2008 at 4:19 AM, Ivan Sergio Borgonovo
> <mail@webthatworks.it> wrote:
>
>
>
>>insert into d (pk, c1, c2, ...) select pk, c1, c2, c3 from s
>> where s.pk not in (select pk from d);
>
>
> This insert statement might be faster:
>
> INSERT INTO d (pk, c1, c2, ... )
> SELECT pk, c1, c2, ...
>   FROM s
> LEFT JOIN d ON s.pk = d.pk
> WHERE d.pk IS NULL;
>
>
Hello Richard,

Is there a way to do something similar with the following? I am an SQL noob and the
following takes longer to run than is reasonable, on the order of hours.

insert into myevents select * from t_unit_event_log a where exists
        (select b.event_log_no from myevents b
         where a.event_status = 1 and a.event_ref_log_no IS NOT NULL
                and a.event_ref_log_no = b.event_log_no and a.event_log_no not in
                        (select event_log_no from myevents)
        )

Thanks,
Steve

Re: MERGE: performance advices

From

"Richard Broersma"

Date:

02 September 2008, 12:47:52

On Tue, Sep 2, 2008 at 8:10 AM, Steve Clark <sclark@netwolves.com> wrote:

> Is there a way to do something similar with the following? I am an SQL noob
> and the
> following takes longer to run than is reasonable, on the order of hours.
>
> insert into myevents select * from t_unit_event_log a where exists
>       (select b.event_log_no from myevents b
>        where a.event_status = 1 and a.event_ref_log_no IS NOT NULL
>               and a.event_ref_log_no = b.event_log_no and a.event_log_no not
> in
>                       (select event_log_no from myevents)
>       )


To start off with, this SQL statement can be refined a bit.  Many of
the sub-query WHERE clause constraints have nothing to do with the
Correlated sub-query.  The refinement would look like so:

INSERT INTO Myevents
     SELECT *
       FROM T_unit_event_log AS A
      WHERE A.event_status = 1
        AND A.event_ref_log_no IS NOT NULL
        AND A.event_log_no NOT IN ( SELECT event_log_no
                                      FROM Myevents)
        AND EXISTS ( SELECT B.event_log_no
                       FROM Myevents AS B
                      WHERE A.event_ref_log_no = B.event_log_no );


The next step would be to rework the NOT IN sub-query into a LEFT JOIN
WHERE IS NULL;

INSERT INTO Myevents
     SELECT *
       FROM T_unit_event_log AS A
  LEFT JOIN Myevents AS C
         ON A.event_log_no = C.event_log_no
      WHERE A.event_status = 1
        AND A.event_ref_log_no IS NOT NULL
        AND C.event_log_no IS NULL
        AND EXISTS ( SELECT B.event_log_no
                       FROM Myevents AS B
                      WHERE A.event_ref_log_no = B.event_log_no );

There is one possible alteration that may or many not improve
performance.  This would be to replace the EXISTS with a LEFT JOIN
WHERE IS NOT NULL;

INSERT INTO Myevents
     SELECT *
       FROM T_unit_event_log AS A
  LEFT JOIN Myevents AS C
         ON A.event_log_no = C.event_log_no
  LEFT JOIN Myevents AS B
         ON A.event_ref_log_no = B.event_log_no
      WHERE C.event_log_no IS NULL
        AND B.event_log_no IS NOT NULL
        AND A.event_status = 1
        AND A.event_ref_log_no IS NOT NULL;


--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug

Re: MERGE: performance advices

From

Steve Clark

Date:

02 September 2008, 15:15:41

Richard Broersma wrote:
> On Tue, Sep 2, 2008 at 8:10 AM, Steve Clark <sclark@netwolves.com> wrote:
>
>
>>Is there a way to do something similar with the following? I am an SQL noob
>>and the
>>following takes longer to run than is reasonable, on the order of hours.
>>
>>insert into myevents select * from t_unit_event_log a where exists
>>      (select b.event_log_no from myevents b
>>       where a.event_status = 1 and a.event_ref_log_no IS NOT NULL
>>              and a.event_ref_log_no = b.event_log_no and a.event_log_no not
>>in
>>                      (select event_log_no from myevents)
>>      )
>
>
>
> To start off with, this SQL statement can be refined a bit.  Many of
> the sub-query WHERE clause constraints have nothing to do with the
> Correlated sub-query.  The refinement would look like so:
>
> INSERT INTO Myevents
>      SELECT *
>        FROM T_unit_event_log AS A
>       WHERE A.event_status = 1
>         AND A.event_ref_log_no IS NOT NULL
>         AND A.event_log_no NOT IN ( SELECT event_log_no
>                                       FROM Myevents)
>         AND EXISTS ( SELECT B.event_log_no
>                        FROM Myevents AS B
>                       WHERE A.event_ref_log_no = B.event_log_no );
>
>
> The next step would be to rework the NOT IN sub-query into a LEFT JOIN
> WHERE IS NULL;
>
> INSERT INTO Myevents
>      SELECT *
>        FROM T_unit_event_log AS A
>   LEFT JOIN Myevents AS C
>          ON A.event_log_no = C.event_log_no
>       WHERE A.event_status = 1
>         AND A.event_ref_log_no IS NOT NULL
>         AND C.event_log_no IS NULL
>         AND EXISTS ( SELECT B.event_log_no
>                        FROM Myevents AS B
>                       WHERE A.event_ref_log_no = B.event_log_no );
>
> There is one possible alteration that may or many not improve
> performance.  This would be to replace the EXISTS with a LEFT JOIN
> WHERE IS NOT NULL;
>
> INSERT INTO Myevents
>      SELECT *
>        FROM T_unit_event_log AS A
>   LEFT JOIN Myevents AS C
>          ON A.event_log_no = C.event_log_no
>   LEFT JOIN Myevents AS B
>          ON A.event_ref_log_no = B.event_log_no
>       WHERE C.event_log_no IS NULL
>         AND B.event_log_no IS NOT NULL
>         AND A.event_status = 1
>         AND A.event_ref_log_no IS NOT NULL;
>
>

Hi Richard and thanks for the response. When I try the last two queries i get and error. I have listed
the results of explain on all three.

srm2=# explain
srm2-# INSERT INTO Myevents
srm2-#      SELECT *
srm2-#        FROM T_unit_event_log AS A
srm2-#       WHERE A.event_status = 1
srm2-#         AND A.event_ref_log_no IS NOT NULL
srm2-#         AND A.event_log_no NOT IN ( SELECT event_log_no
srm2(#                                       FROM Myevents)
srm2-#         AND EXISTS ( SELECT B.event_log_no
srm2(#                        FROM Myevents AS B
srm2(#                       WHERE A.event_ref_log_no = B.event_log_no );
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Index Scan using indx_tuel_usn_ec_es on t_unit_event_log a  (cost=31711.73..3607445990.61 rows=51844 width=146)
   Index Cond: (event_status = 1)
   Filter: ((event_ref_log_no IS NOT NULL) AND (subplan) AND (NOT (subplan)))
   SubPlan
     ->  Materialize  (cost=31711.73..42857.85 rows=830612 width=4)
           ->  Seq Scan on myevents  (cost=0.00..28041.12 rows=830612 width=4)
     ->  Index Scan using indx1myevents on myevents b  (cost=0.00..8.37 rows=1 width=4)
           Index Cond: ($1 = event_log_no)
(8 rows)

srm2=#
srm2=#
srm2=# explain
srm2-# INSERT INTO Myevents
srm2-#      SELECT *
srm2-#        FROM T_unit_event_log AS A
srm2-#   LEFT JOIN Myevents AS C
srm2-#          ON A.event_log_no = C.event_log_no
srm2-#       WHERE A.event_status = 1
srm2-#         AND A.event_ref_log_no IS NOT NULL
srm2-#         AND C.event_log_no IS NULL
srm2-#         AND EXISTS ( SELECT B.event_log_no
srm2(#                        FROM Myevents AS B
srm2(#                       WHERE A.event_ref_log_no = B.event_log_no );
ERROR:  INSERT has more expressions than target columns
srm2=#
srm2=# explain
srm2-# INSERT INTO Myevents
srm2-#      SELECT *
srm2-#        FROM T_unit_event_log AS A
srm2-#   LEFT JOIN Myevents AS C
srm2-#          ON A.event_log_no = C.event_log_no
srm2-#   LEFT JOIN Myevents AS B
srm2-#          ON A.event_ref_log_no = B.event_log_no
srm2-#       WHERE C.event_log_no IS NULL
srm2-#         AND B.event_log_no IS NOT NULL
srm2-#         AND A.event_status = 1
srm2-#         AND A.event_ref_log_no IS NOT NULL;
ERROR:  INSERT has more expressions than target columns

I really appreciate your help.

Steve

Re: MERGE: performance advices

From

"Richard Broersma"

Date:

02 September 2008, 15:24:08

On Tue, Sep 2, 2008 at 9:47 AM, Steve Clark <sclark@netwolves.com> wrote:
> srm2=# explain
> srm2-# INSERT INTO Myevents
> srm2-#      SELECT *

> ERROR:  INSERT has more expressions than target columns

> srm2=# explain
> srm2-# INSERT INTO Myevents
> srm2-#      SELECT *
> ERROR:  INSERT has more expressions than target columns

Oops, replace SELECT * with SELECT A.*.

--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug

Re: MERGE: performance advices

From

Steve Clark

Date:

02 September 2008, 15:48:36

Richard Broersma wrote:
> On Tue, Sep 2, 2008 at 9:47 AM, Steve Clark <sclark@netwolves.com> wrote:
>
>>srm2=# explain
>>srm2-# INSERT INTO Myevents
>>srm2-#      SELECT *
>
>
>>ERROR:  INSERT has more expressions than target columns
>
>
>>srm2=# explain
>>srm2-# INSERT INTO Myevents
>>srm2-#      SELECT *
>>ERROR:  INSERT has more expressions than target columns
>
>
> Oops, replace SELECT * with SELECT A.*.
>
Richard,

This is AWESOME! This now only takes seconds where before it was taking longer than
I wanted to wait, I had let it run all night before aborting it.

Thanks so much Richard,

Regards,
Steve

Re: MERGE: performance advices

From

Tom Lane

Date:

02 September 2008, 15:59:34

"Richard Broersma" <richard.broersma@gmail.com> writes:
> There is one possible alteration that may or many not improve
> performance.  This would be to replace the EXISTS with a LEFT JOIN
> WHERE IS NOT NULL;

That changes the behavior, doesn't it?  Or is event_log_no a unique
key for Myevents?  I think what you want is to make the EXISTS an IN
instead.

            regards, tom lane

Re: MERGE: performance advices

From

Steve Clark

Date:

02 September 2008, 16:18:28

Tom Lane wrote:
> "Richard Broersma" <richard.broersma@gmail.com> writes:
>
>>There is one possible alteration that may or many not improve
>>performance.  This would be to replace the EXISTS with a LEFT JOIN
>>WHERE IS NOT NULL;
>
>
> That changes the behavior, doesn't it?  Or is event_log_no a unique
> key for Myevents?  I think what you want is to make the EXISTS an IN
> instead.
>
>             regards, tom lane
>
>
Yes event_log_no is a unique key for myevents.

Re: MERGE: performance advices

From

"Richard Broersma"

Date:

02 September 2008, 16:49:20

On Tue, Sep 2, 2008 at 10:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Richard Broersma" <richard.broersma@gmail.com> writes:
>> There is one possible alteration that may or many not improve
>> performance.  This would be to replace the EXISTS with a LEFT JOIN
>> WHERE IS NOT NULL;
>
> That changes the behavior, doesn't it?  Or is event_log_no a unique
> key for Myevents?  I think what you want is to make the EXISTS an IN
> instead.

Thanks for pointing that out Tom.  I hadn't consider how the EXISTS
clause would return potentially fewer rows if event_log_no wasn't
unique.


--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug