Re: Improving performance of merging data between tables - Mailing list pgsql-general

From Andy Colson
Subject Re: Improving performance of merging data between tables
Date
Msg-id 54A18890.5050307@squeakycode.net
In response to Improving performance of merging data between tables  (Pawel Veselov <pawel.veselov@gmail.com>)
Responses Re: Improving performance of merging data between tables  (Pawel Veselov <pawel.veselov@gmail.com>)
List pgsql-general
On 12/28/2014 3:49 PM, Pawel Veselov wrote:
> Hi.
>
> I was wondering if anybody would have any ideas on how to improve
> certain operations that we are having.
>
<SNIP>
>
> Besides "can somebody please look at this and let me know if I'm doing
> something utterly stupid", here are my questions.
>
> 1) How do I find out what exactly is consuming the CPU in a PL/pgSQL
> function? All I see is that the calls to merge_all() function take long
> time, and the CPU is high while this is going on.
>
>

First, I'll admit I didn't read your entire post.

I can think of a few methods:

1) try each of the statements in merge_all() by hand with an "explain
analyze" in front to see which one is slow.  Look for things that hit big
tables without an index, and check that foreign key lookups are indexed.
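
For instance, with made-up table and column names just to show the shape:

explain analyze
update main_table m
   set total = m.total + s.total
  from merge_stage s
 where m.id = s.main_id;

-- if that shows a seq scan for the fk lookup, add an index:
create index on merge_stage (main_id);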

2) try pg_stat_statements, setting "pg_stat_statements.track = all".  See:
http://www.postgresql.org/docs/9.4/static/pgstatstatements.html

I have used this to profile some functions, and it worked pretty well.
Mostly I use it on a test box, but I once ran it on the live server,
which was scary, but it worked great.
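
Roughly like this (the module has to be preloaded, which needs a restart):

# in postgresql.conf, then restart:
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all

-- then in psql:
create extension pg_stat_statements;
-- run merge_all(), then look at the most expensive statements:
select query, calls, total_time, rows
  from pg_stat_statements
 order by total_time desc
 limit 10;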

3) try auto-explain:
http://www.postgresql.org/docs/9.4/static/auto-explain.html

I've never used it, so I don't know if it'll show each statement inside
a function.  It dumps everything to the log AFAIK, so you'll have to dig
the info out by hand.
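
The docs do mention an auto_explain.log_nested_statements setting, which
sounds like what you'd want for statements inside a function.  Untested by
me, but per the docs the config would look something like:

# postgresql.conf (or LOAD 'auto_explain' in a session):
session_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 0       # log plans for every statement
auto_explain.log_analyze = on
auto_explain.log_nested_statements = on # statements inside functions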



> 2) Is there a better way to merge individual rows, except doing
> UPDATE/INSERT in a loop, and would that be CPU expensive?
>

Not that I know of.  I use pretty much the same thing.  Soon, though,
we'll have merge/upsert support.  Hopefully it'll be fast.
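
In the meantime, the standard pattern is the retry loop from the merge_db
example in the PL/pgSQL docs; a generic sketch with made-up names:

-- assumes a table main(key integer primary key, total integer)
create function merge_row(k integer, delta integer) returns void as $$
begin
    loop
        -- first try to update an existing row
        update main set total = total + delta where key = k;
        if found then
            return;
        end if;
        -- no row yet, so try to insert; a concurrent insert of the
        -- same key raises unique_violation, so catch it and retry
        begin
            insert into main (key, total) values (k, delta);
            return;
        exception when unique_violation then
            null;  -- loop back and try the update again
        end;
    end loop;
end;
$$ language plpgsql;

That's essentially the same UPDATE/INSERT loop you're doing now, so I
wouldn't expect it to change the CPU profile much.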

> 3) Is there a better way to merge whole tables? However, note that I
> need to translate primary keys from node main table into the common main
> table, as they are used as foreign keys, hence the loops. I suspect the
> looping is CPU intensive.

Avoiding loops and doing things as sets is the best way, if possible.
The only loop I saw was the one over the merge_xxx tables, which is
probably unavoidable.

If possible (and if you haven't already), you could add an extra column
to your secondary table that you can set to the main table's key:

bulk insert into second;
update second set magic = (select key from main where ... );

Then, maybe, you can do the two ops in batch:

update main (for rows whose key already exists in main)
insert into main (for rows whose key does not exist in main)
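
Something like this, with hypothetical column names (untested):

-- pass 1: merge rows that already exist in main
update main m
   set total = m.total + s.total
  from second s
 where m.key = s.magic;

-- pass 2: insert the rest
insert into main (key, total)
select s.magic, s.total
  from second s
 where not exists (select 1 from main m where m.key = s.magic);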

-Andy