Thread: Improving performance of merging data between tables

Improving performance of merging data between tables

From: Pawel Veselov
Hi.

I was wondering if anybody would have any ideas on how to improve certain operations that we are having.

PostgreSQL 9.3.3. Table layout:

main_table: PK, N key columns, M data columns. The key columns are bound together in a unique key. The PK is a pure sequence number. There are a few separate non-unique indexes on some of the key columns to aid in querying the table data.

second_table: PK, main_table_PK_REF (declared as a foreign key), Na key columns, Ma data columns. There are 0-inf (typically 0-10) entries in second_table that reference a single entry in main_table. The PK is a pure sequence number, and the unique key for that table is (main_table_PK_REF + key columns). The only non-obvious thing here is that some of the unique indexes involve the coalesce() function:

"uq_brkioevent" UNIQUE, btree (mainid, blockid, name, subid, (COALESCE(cname, ''::character varying)))

Those are there because we need to write NULLs into the tables, but unique indexes don't treat NULLs as equal (AFAIR), so duplicates that differ only in a NULL column wouldn't be rejected.

There is one main_table, and 10 second_tables.

The number of rows in the main table right now is ~1.1M; the second tables have about 1M-2M rows. The main table grows by a fixed number of rows (~10) per hour.

Multiple (web) application nodes need to write data into this table. Most of the operations modify the data columns rather than insert new data. We had serious contention problems when we let all the application nodes write directly into the table: the writes use prepared transactions, a prepared transaction can only be reaped after the other data stores are written to, and there is a very high chance that nodes will need to modify the same rows, so the locking was taking too long.

To relieve the contention, we have allocated each application node its own set of tables that are structurally exactly like the main/second tables. The application node will open a transaction, write all the data into its own tables, free of locks, and then call a pgsql function to merge the data from its tables into the common tables. There is typically relatively little data in the node tables (say, within 100 rows in any table) before it is merged into the common tables. Nodes dump their data whenever there is something to dump, which can be a few times a second.

Recently, the operation that moves the data from the node tables into the common tables started being a real drain on the PostgreSQL server's CPU. I assume this is probably due to the data set size reaching some critical mass. Things really got out of hand when we had to double the number of application nodes to accommodate a surge in application use.

The merging is done in the following manner.
Each main/second table has an associated PL/pgSQL function (merge_xxx) that takes key and data values as arguments. In an endless loop, it first tries to execute an UPDATE statement (using math operations to combine the existing and input data values, with the key data in the WHERE clause). If the UPDATE sets FOUND, the function exits. Otherwise, the function tries to INSERT the key/data values. If that succeeds, the function exits; if a unique_violation is thrown, the loop continues.
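For reference, this is essentially the classic upsert loop; a minimal sketch, with simplified table/column/function names (the real merge_xxx functions take more key and data arguments):

CREATE FUNCTION merge_example(p_key integer, p_val bigint) RETURNS void AS $$
BEGIN
    LOOP
        -- first try to update an existing row, combining old and new data
        UPDATE example_table SET val = val + p_val WHERE key = p_key;
        IF FOUND THEN
            RETURN;
        END IF;
        -- not there, so try to insert; if another session inserts the same
        -- key concurrently, we get a unique_violation and retry the UPDATE
        BEGIN
            INSERT INTO example_table (key, val) VALUES (p_key, p_val);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, and loop to try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;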

On top of these individual functions, there is another PL/pgSQL function (merge_all). It uses a "for ROW in select * from MAIN_NODE" outer loop; within that loop it calls merge_xxx for the main table, and then, for each secondary table, runs the same kind of "for ROWx in select * from SECOND_NODE" loop with a WHERE clause that only picks up entries corresponding to the main_node entry currently being processed, calling merge_xxx for the corresponding secondary table. After the outer loop, all data is removed from the node tables (using DELETE). I will gladly provide pseudo-code, or even the function body, if my explanation is unclear.
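A condensed sketch of that structure (again with simplified names, and only one secondary table shown):

CREATE FUNCTION merge_all_example() RETURNS void AS $$
DECLARE
    m          record;
    s          record;
    common_id  bigint;
BEGIN
    FOR m IN SELECT * FROM main_node LOOP
        -- merge the main row and get back the PK it maps to in the common table
        SELECT merge_main(m.key1, m.key2, m.data1) INTO common_id;
        -- merge the related rows from each secondary node table
        FOR s IN SELECT * FROM second_node WHERE mainid = m.id LOOP
            PERFORM merge_second(common_id, s.blockid, s.name, s.data1);
        END LOOP;
        -- ... repeated for the other secondary tables ...
    END LOOP;
    -- everything has been merged; clear the node tables
    DELETE FROM second_node;
    DELETE FROM main_node;
END;
$$ LANGUAGE plpgsql;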

Besides "can somebody please look at this and let me know if I'm doing something utterly stupid", here are my questions.

1) How do I find out what exactly is consuming the CPU in a PL/pgSQL function? All I see is that the calls to merge_all() function take long time, and the CPU is high while this is going on.

2) Is there a better way to merge individual rows, except doing UPDATE/INSERT in a loop, and would that be CPU expensive?

3) Is there a better way to merge whole tables? However, note that I need to translate primary keys from node main table into the common main table, as they are used as foreign keys, hence the loops. I suspect the looping is CPU intensive.

Thank you,
  Pawel.

Re: Improving performance of merging data between tables

From: Andy Colson
On 12/28/2014 3:49 PM, Pawel Veselov wrote:
> Hi.
>
> I was wondering if anybody would have any ideas on how to improve
> certain operations that we are having.
>
<SNIP>
>
> Besides "can somebody please look at this and let me know if I'm doing
> something utterly stupid", here are my questions.
>
> 1) How do I find out what exactly is consuming the CPU in a PL/pgSQL
> function? All I see is that the calls to merge_all() function take long
> time, and the CPU is high while this is going on.
>
>

First, I'll admit I didn't read your entire post.

I can think of a couple methods:

1) try each of the statements in merge_all by hand with an "explain
analyze" in front to see which is slow.  Look for things that hit big
tables without an index.  Check that fk lookups are indexes.

2) try pg_stat_statements, setting "pg_stat_statements.track = all".  see:
http://www.postgresql.org/docs/9.4/static/pgstatstatements.html

I have used this to profile some functions, and it worked pretty well.
Mostly I use it on a test box, but once ran it on the live, which was
scary, but worked great.
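A minimal setup sketch, in case it helps (pg_stat_statements is a contrib module; loading it needs a server restart):

# postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all

-- then, in the database:
CREATE EXTENSION pg_stat_statements;

-- after running the workload for a while:
SELECT query, calls, total_time, rows
  FROM pg_stat_statements
 ORDER BY total_time DESC
 LIMIT 20;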

3) try auto-explain:
http://www.postgresql.org/docs/9.4/static/auto-explain.html

I've never used it, so don't know if it'll show each statement inside a
function.  Dumps stuff to the log AFAIK, so you'll have to dig out the
info by hand.
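A per-session sketch (the auto_explain.log_nested_statements setting looks like the relevant knob for statements inside functions):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 100;       -- in ms; log plans of anything slower
SET auto_explain.log_nested_statements = on;   -- include statements run inside functions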



 > 2) Is there a better way to merge individual rows, except doing
 > UPDATE/INSERT in a loop, and would that be CPU expensive?
 >

Not that I know of.  I use pretty much the same thing.  Soon!  we will
have merge/upsert support.  Hopefully it'll be fast.

 > 3) Is there a better way to merge whole tables? However, note that I
 > need to translate primary keys from node main table into the common main
 > table, as they are used as foreign keys, hence the loops. I suspect the
 > looping is CPU intensive.

Avoiding loops and doing things as sets is the best way.  If possible.
The only loop I saw was looping over the merge_xxx tables, which is
probably the only way.

If possible (if you haven't already) you could add an extra column to
your secondary table that you can set as the main table's key.
your secondary table that you can set as the main table's key.

bulk insert into second;
update second set magic = (select key from main where ... );

Then, maybe, you can do two ops in batch:

update main (where key exists in main)
insert into main (where key not exists in main)
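In other words, something along these lines (a sketch only, with placeholder names; "second" is the node's staging table, "main" the common one, and "magic" the extra translated-key column):

UPDATE second s
   SET magic = m.id
  FROM main m
 WHERE m.key1 = s.key1 AND m.key2 = s.key2;

UPDATE main m
   SET data = m.data + s.data
  FROM second s
 WHERE s.magic = m.id;

INSERT INTO main (key1, key2, data)
SELECT key1, key2, data
  FROM second
 WHERE magic IS NULL;

The INSERT step is still racy if other writers can create the same keys concurrently, which is exactly the concern raised in the replies.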

-Andy







Re: Improving performance of merging data between tables

From: Pawel Veselov

Andy,

thanks for looking into this.

On Mon, Dec 29, 2014 at 9:00 AM, Andy Colson <andy@squeakycode.net> wrote:
On 12/28/2014 3:49 PM, Pawel Veselov wrote:
Hi.

I was wondering if anybody would have any ideas on how to improve
certain operations that we are having.

<SNIP>

Besides "can somebody please look at this and let me know if I'm doing
something utterly stupid", here are my questions.

1) How do I find out what exactly is consuming the CPU in a PL/pgSQL
function? All I see is that the calls to merge_all() function take long
time, and the CPU is high while this is going on.



First, I'll admit I didn't read your entire post.

I can think of a couple methods:

1) try each of the statements in merge_all by hand with an "explain analyze" in front to see which is slow.  Look for things that hit big tables without an index.  Check that fk lookups are indexes.

If I didn't miss anything, that seems to be OK, even on function-based queries.
 
2) try pg_stat_statements, setting "pg_stat_statements.track = all".  see:
http://www.postgresql.org/docs/9.4/static/pgstatstatements.html

I have used this to profile some functions, and it worked pretty well. Mostly I use it on a test box, but once ran it on the live, which was scary, but worked great.

That looks promising. Turned it on, waiting for when I can turn the server at the next "quiet time".
 
3) try auto-explain:
http://www.postgresql.org/docs/9.4/static/auto-explain.html

I've never used it, so don't know if it'll show each statement inside a function.  Dumps stuff to the log AFAIK, so you'll have to dig out the info by hand.

> 2) Is there a better way to merge individual rows, except doing
> UPDATE/INSERT in a loop, and would that be CPU expensive?
>

Not that I know of.  I use pretty much the same thing.  Soon!  we will have merge/upsert support.  Hopefully it'll be fast.

Well, anytime I cancelled the PID that was executing this whole mess, it would always stop at UPDATE ... SET ... WHERE on the main table. Which does make me believe that bulk update would really help.
 
> 3) Is there a better way to merge whole tables? However, note that I
> need to translate primary keys from node main table into the common main
> table, as they are used as foreign keys, hence the loops. I suspect the
> looping is CPU intensive.

Avoiding loops and doing things as sets is the best way.  If possible. The only loop I saw was looping over the merge_xxx tables, which is probably the only way.

There is an endless loop that is just a device for merging, but then there are loops going over each record in all the tables that are being merged, feeding them into the function that actually does the merge. That table iteration is what I want to eliminate (especially if I knew it would help :) )
 
If possible (if you haven't already) you could add an extra column to your secondary table that you can set as the main table's key.

bulk insert into second;
update second set magic = (select key from main where ... );

Then, maybe, you can do two ops in batch:

update main (where key exists in main)
insert into main (where key not exists in main)


I was thinking along the same lines. I can't really do bulk insert, at any point, because any key can be inserted by another process at any time, and with a good probability. However, there will be a lot less inserts than updates. So, in general, I'm making it do this:

with pivot as ( select main_table.id as main_id, node_table.id as node_id from node_table left join main_table using (key fields) )
update node_table set translate_id = pivot.main_id from pivot where node_table.id = pivot.node_id;

(missing is a cursor for: select * from node_table where translate_id is null)

for row in missing loop
  -- merge_function returns the PK of either the updated or the inserted record.
  -- use (0) data values, so that it's an identity update if the merge results
  -- in an update, or "empty" data if it inserts.
  select merge_function(row.key_fields, 0) into use_id;
  update node_table set translate_id = use_id where current of missing;
end loop;

At this point, I have a guarantee that I can update all records, and there is nothing to insert.
So,

with new as ( select * from node_table )
update main_table old
set val = f(old.val, new.val)
from new
where new.translate_id = old.id;

So, I don't need full key matching anymore, I can use PKs instead.


Re: Improving performance of merging data between tables

From: Andy Colson
On 12/29/2014 11:29 PM, Pawel Veselov wrote:
>
> Andy,
>
> thanks for looking into this.
>
> On Mon, Dec 29, 2014 at 9:00 AM, Andy Colson <andy@squeakycode.net <mailto:andy@squeakycode.net>> wrote:
>
>     On 12/28/2014 3:49 PM, Pawel Veselov wrote:
>
>         Hi.
>
>         I was wondering if anybody would have any ideas on how to improve
>         certain operations that we are having.
>
>     <SNIP>
>
>
>         Besides "can somebody please look at this and let me know if I'm doing
>         something utterly stupid", here are my questions.
>
>         1) How do I find out what exactly is consuming the CPU in a PL/pgSQL
>         function? All I see is that the calls to merge_all() function take long
>         time, and the CPU is high while this is going on.
>
>
>
>     First, I'll admit I didn't read your entire post.
>
>     I can think of a couple methods:
>
>     1) try each of the statements in merge_all by hand with an "explain analyze" in front to see which is slow.  Look for things that hit big tables without an index.  Check that fk lookups are indexes.
>
>
> If I didn't miss anything, that seems to be OK, even on function-based queries.
>
>     2) try pg_stat_statements, setting "pg_stat_statements.track = all".  see:
>     http://www.postgresql.org/docs/9.4/static/pgstatstatements.html
>
>     I have used this to profile some functions, and it worked pretty well. Mostly I use it on a test box, but once ran it on the live, which was scary, but worked great.
>
>
> That looks promising. Turned it on, waiting for when I can turn the server at the next "quiet time".
>
>     3) try auto-explain:
>     http://www.postgresql.org/docs/9.4/static/auto-explain.html
>
>     I've never used it, so don't know if it'll show each statement inside a function.  Dumps stuff to the log AFAIK, so you'll have to dig out the info by hand.
>
>     > 2) Is there a better way to merge individual rows, except doing
>     > UPDATE/INSERT in a loop, and would that be CPU expensive?
>     >
>
>     Not that I know of.  I use pretty much the same thing.  Soon!  we will have merge/upsert support.  Hopefully it'll be fast.
>
>
> Well, anytime I cancelled the PID that was executing this whole mess, it would always stop at UPDATE ... SET ... WHERE on the main table. Which does make me believe that bulk update would really help.
>
>     > 3) Is there a better way to merge whole tables? However, note that I
>     > need to translate primary keys from node main table into the common main
>     > table, as they are used as foreign keys, hence the loops. I suspect the
>     > looping is CPU intensive.
>
>     Avoiding loops and doing things as sets is the best way.  If possible. The only loop I saw was looping over the merge_xxx tables, which is probably the only way.
>
>
> There is an endless loop that is just a device for merging, but then there are loops going over each record in all the tables that are being merged, feeding them into the function that actually does the merge. That table iteration is what I want to eliminate (especially if I knew it would help :) )
>
>     If possible (if you haven't already) you could add an extra column to your secondary table that you can set as the main table's key.
>
>     bulk insert into second;
>     update second set magic = (select key from main where ... );
>
>     Then, maybe, you can do two ops in batch:
>
>     update main (where key exists in main)
>     insert into main (where key not exists in main)
>
>
> I was thinking along the same lines. I can't really do bulk insert, at any point, because any key can be inserted by another process at any time, and with a good probability. However, there will be a lot less inserts than updates. So, in general, I'm making it do this:
>

What about transactions?  I assume you do something like:

begin;
select merge_all();
commit;


Depending on your transaction isolation level, you could ensure that nobody could insert while you are inserting. I've never used the different isolation levels, so not 100% sure.  Even then, maybe a lock on the table itself, like:

begin;
lock table main_table;   -- or a less restrictive explicit lock mode
select merge_all();
commit;                  -- the lock is released at commit; there is no separate unlock

This way only one at a time can do the insert/update work, but you can do it in batch and not in a loop.  It might be faster that way.  Other processes might wait for the lock a little bit, but if merge_all were faster in general, the time to wait for the lock would be less than the entire merge process itself.

I'm totally guessing here.

-Andy


Re: Improving performance of merging data between tables

From: Pawel Veselov


On Mon, Dec 29, 2014 at 9:29 PM, Pawel Veselov <pawel.veselov@gmail.com> wrote:

[skipped]


1) How do I find out what exactly is consuming the CPU in a PL/pgSQL
function? All I see is that the calls to merge_all() function take long
time, and the CPU is high while this is going on.

 
[skipped] 

2) try pg_stat_statements, setting "pg_stat_statements.track = all".  see:
http://www.postgresql.org/docs/9.4/static/pgstatstatements.html

I have used this to profile some functions, and it worked pretty well. Mostly I use it on a test box, but once ran it on the live, which was scary, but worked great.

That looks promising. Turned it on, waiting for when I can turn the server at the next "quiet time".

I have to say this turned out to be a bit of a disappointment for this use case. It only measures the total time spent in a call, so it pushes to the top operations that waited a long time on some lock. It's good, but it would be great if wait_time were provided along with total_time (and maybe io_time as well, since I also see operations that just naturally have to fetch a lot of data).

[skipped]

Re: Improving performance of merging data between tables

From: Maxim Boguk


On Wed, Dec 31, 2014 at 11:10 AM, Pawel Veselov <pawel.veselov@gmail.com> wrote
 
[skipped] 

2) try pg_stat_statements, setting "pg_stat_statements.track = all".  see:
http://www.postgresql.org/docs/9.4/static/pgstatstatements.html

I have used this to profile some functions, and it worked pretty well. Mostly I use it on a test box, but once ran it on the live, which was scary, but worked great.

That looks promising. Turned it on, waiting for when I can turn the server at the next "quiet time".

I have to say this turned out to be a bit of a disappointment for this use case. It only measures the total time spent in a call, so it pushes to the top operations that waited a long time on some lock. It's good, but it would be great if wait_time were provided along with total_time (and maybe io_time as well, since I also see operations that just naturally have to fetch a lot of data).

1) pg_stat_statements provides information about the io_time of each statement, but you need track_io_timing enabled for that.

2) About locking: I suggest enabling log_lock_waits and setting deadlock_timeout to, say, 100ms (just for testing purposes); then any lock waiting for more than 100ms will be logged with some useful additional info.
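A postgresql.conf sketch of those settings (a reload is enough for all three; the 100ms deadlock_timeout is for diagnostics only):

track_io_timing = on
log_lock_waits = on
deadlock_timeout = 100ms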

PS: your setup looks pretty complicated and is hard to analyze without seeing all involved table structures, the transaction/query flow, and (especially) the involved procedures' source code.

PPS: btw, please check the database logs for deadlock messages; the "and then call a pgsql function to merge the data from its tables into the common tables" part of your setup could easily be deadlock prone.

PPPS: and the last suggestion: after you have finished with the "write all the data into its own tables" step, the application should perform an ANALYZE of these own tables (or you could get weird/inefficient plans during the last stage).
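For example, right after populating the node tables and before calling merge_all (table names here are placeholders):

ANALYZE node_main_table;
ANALYZE node_second_table;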



--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com
МойКруг: http://mboguk.moikrug.ru/

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


Re: Improving performance of merging data between tables

From: Pawel Veselov
On Tue, Dec 30, 2014 at 7:25 PM, Maxim Boguk <maxim.boguk@gmail.com> wrote:
On Wed, Dec 31, 2014 at 11:10 AM, Pawel Veselov <pawel.veselov@gmail.com> wrote
 
[skipped] 

2) try pg_stat_statements, setting "pg_stat_statements.track = all".  see:
http://www.postgresql.org/docs/9.4/static/pgstatstatements.html

I have used this to profile some functions, and it worked pretty well. Mostly I use it on a test box, but once ran it on the live, which was scary, but worked great.

That looks promising. Turned it on, waiting for when I can turn the server at the next "quiet time".

I have to say this turned out to be a bit of a disappointment for this use case. It only measures the total time spent in a call, so it pushes to the top operations that waited a long time on some lock. It's good, but it would be great if wait_time were provided along with total_time (and maybe io_time as well, since I also see operations that just naturally have to fetch a lot of data).

1) pg_stat_statements provides information about the io_time of each statement, but you need track_io_timing enabled for that.

Enabled that now. Still, the top winners are the functions that probably lock for a (relatively) long time. This did help me find some crap that was either missing an index, or used an unreasonable join, and just needed rewriting. One entry that doesn't make sense to me is:

total_time - io_time = 1,366,773
calls = 666,542
query = SELECT * FROM q_SCHEDULER_STATE WHERE SCHED_NAME = ?
The table only has 18 rows and there is an index, but the planner chooses to ignore it, which is right, since the sched_name column has the same value in all rows. So all rows are returned by the SELECT. The time to run that query under database load varies from 0.09 to 70ms.
This is a distraction from the main topic, but it does stand out as odd.
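(For context, the numbers above come from something like the following query; blk_read_time/blk_write_time are the pg_stat_statements columns treated as "io_time" here, and they need track_io_timing on.)

SELECT query, calls, total_time,
       blk_read_time + blk_write_time AS io_time,
       total_time - (blk_read_time + blk_write_time) AS non_io_time
  FROM pg_stat_statements
 ORDER BY non_io_time DESC
 LIMIT 20;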
 
2) About locking: I suggest enabling log_lock_waits and setting deadlock_timeout to, say, 100ms (just for testing purposes); then any lock waiting for more than 100ms will be logged with some useful additional info.
 
PPS: btw, please check the database logs for deadlock messages; the "and then call a pgsql function to merge the data from its tables into the common tables" part of your setup could easily be deadlock prone.

I don't think I have an abnormal problem with locking. I wanted to eliminate locking time from pg_stat_statements, to address the queries that aren't waiting on disk and/or locks first, as my problem is high CPU, not specific query performance. I don't have deadlocks for sure -- I had them before, and I would normally get an error if there was a deadlock. We process all the records in exactly the same order of keys to avoid deadlocks.

PPPS: and the last suggestion: after you have finished with the "write all the data into its own tables" step, the application should perform an ANALYZE of these own tables (or you could get weird/inefficient plans during the last stage).

Any references to back this up? I don't particularly mind doing it, but I wonder if the analysis can be more expensive than the processing. These tables get a few hundred records inserted/updated, then are entirely processed (with expected full scans), and then deleted...
 
PS: your setup looks pretty complicated and is hard to analyze without seeing all involved table structures, the transaction/query flow, and (especially) the involved procedures' source code.

Sure :) At this point, I've put together the "bulk merge" code as well. I can't quite see much of a difference, actually, but it's hard to trust the execution times, as on the same amount of data they vary from, say, 0.5s to 2s, and the sample data is not stepping on any other locks. In general, I'm afraid all those left joins and multiple scans, even over a small amount of data, are nullifying any positive effect.

secondary table(s): http://pastebin.com/aDVakUkp
There are actually 10 secondary tables, but they are more or less of the same structure.
The node tables have identical structure to the main tables.

The first stage, which I don't particularly question (though maybe wrongly so): the application does:
- create data suitable for the rows in the primary/secondary tables
- starts transaction
- merges data into its own node tables (using merge_xxx PL/pgSQL functions) (<100 rows in primary table)
- prepares transactions
- deals with other data sources
- commits/rolls back prepared transaction depending on success of the previous step.

An example of a merge_xxx function: http://pastebin.com/6YYm8BVM

Second stage is really:
- start transaction
- call PL/pgSQL merge_all()
- commit

2 reasons for the 2 stages:
- if stage#2 fails, the data will be merged during the next iteration
- the lock time on the shared tables is minimized

It's possible that an external process may take over writing data for a certain key subset (a combination of (tagid, blockid)). To make sure there is no race condition with such a process, such key pairs are "locked"; that's what the whole r_locks table and get_r_lock() are about. This makes it a bit more cumbersome for the bulk merge. Here are the r_lock related pieces: http://pastebin.com/Y9NCemLV

This is the "old" code for merge_all function(): http://pastebin.com/5dn7WsvV
And this is the "new" code that I haven't finished testing, or put into service, but that's the gist of the change I'm considering: http://pastebin.com/XmgB5U0f

As always, any suggestions or pointers are greatly appreciated :)

Re: Improving performance of merging data between tables

From: Maxim Boguk


On Wed, Jan 7, 2015 at 8:49 PM, Pawel Veselov <pawel.veselov@gmail.com> wrote:

PPPS: and the last suggestion: after you have finished with the "write all the data into its own tables" step, the application should perform an ANALYZE of these own tables (or you could get weird/inefficient plans during the last stage).

Any references to back this up? I don't particularly mind doing it, but I wonder if the analysis can be more expensive than the processing. These tables get a few hundred records inserted/updated, then are entirely processed (with expected full scans), and then deleted...

If these "own tables" used only in full table selects but never used in joins - than there should be no issues.
However, once you start join these tables with anything else, you could have very inefficient/weird plans because the database doesn't know (without analyze) how many rows you have in these tables.​

 
 
PS: your setup looks pretty complicated and is hard to analyze without seeing all involved table structures, the transaction/query flow, and (especially) the involved procedures' source code.

Sure :) At this point, I've put together the "bulk merge" code as well. I can't quite see much of a difference, actually, but it's hard to trust the execution times, as on the same amount of data they vary from, say, 0.5s to 2s, and the sample data is not stepping on any other locks. In general, I'm afraid all those left joins and multiple scans, even over a small amount of data, are nullifying any positive effect.
 

Now some ideas to check.
High CPU usage usually isn't related to locking, but rather to seq scans, wrong plans, or simply inefficient PL/pgSQL code; locked processes usually don't use much CPU.

1) On the test database perform select pg_stat_reset(); then perform a full round of merges, then check
select * from pg_stat_user_tables where seq_scan > 0 order by seq_tup_read;
and if you find a lot of seq_scan and seq_tup_read on a particular table, try to find where they are coming from (it could be the reason for the high CPU usage).

2) Enable track_functions in postgresql.conf and perform the same sequence (select pg_stat_reset() + a full round of merges), then check
select * from pg_stat_user_functions order by self_time desc;
and see which functions are using the most time.

3) Old/lost prepared transactions can have a deadly effect on database performance as a whole. So check select * from pg_prepared_xacts; and verify that you don't have hours- (or weeks-) old prepared xacts lying around.

PS: btw, I still don't fully understand the relation between:
"
- merges data into its own node tables (using merge_xxx PL/pgSQL functions)
"
and the provided code for public."merge_all02-9A-46-8B-C1-DD" and PUBLIC.merge_agrio.
As I see it, public."merge_all02-9A-46-8B-C1-DD" calls PUBLIC.merge_agrio, and PUBLIC.merge_agrio updates the global table R_AGRIO (but not the "own node table").


I think the best implementation of such a task is asynchronous processing of these changes via a background process. The application only inserts events into a queue table (a lockless process), and some background process reads the data from the queue table, merges it into the main table (again lockless, because it is single-threaded and so there are no concurrent writes), and then deletes the merged data from the queue table.
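A rough sketch of that shape, with made-up table and column names (the single background worker is what makes the set-based update/insert safe without a per-row retry loop):

-- the queue the application appends to (lockless inserts):
CREATE TABLE event_queue (
    id    bigserial PRIMARY KEY,
    key1  bigint NOT NULL,
    key2  bigint NOT NULL,
    val   bigint NOT NULL
);

-- background worker, run periodically in a single thread:
BEGIN;
WITH batch AS (
    DELETE FROM event_queue RETURNING key1, key2, val
), summed AS (
    SELECT key1, key2, sum(val) AS val FROM batch GROUP BY key1, key2
), upd AS (
    UPDATE main_agg m
       SET val = m.val + s.val
      FROM summed s
     WHERE m.key1 = s.key1 AND m.key2 = s.key2
    RETURNING s.key1, s.key2
)
INSERT INTO main_agg (key1, key2, val)
SELECT s.key1, s.key2, s.val
  FROM summed s
 WHERE NOT EXISTS (SELECT 1 FROM upd u
                   WHERE u.key1 = s.key1 AND u.key2 = s.key2);
COMMIT;

Under READ COMMITTED the DELETE only removes rows visible in its snapshot; anything enqueued concurrently simply waits for the next pass.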


--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com
МойКруг: http://mboguk.moikrug.ru/

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


Re: Improving performance of merging data between tables

From: Pawel Veselov

Sorry, it took me a while to respond, but I re-factored this whole process according to the suggestions.

On Wed, Jan 7, 2015 at 7:49 PM, Maxim Boguk <maxim.boguk@gmail.com> wrote:
On Wed, Jan 7, 2015 at 8:49 PM, Pawel Veselov <pawel.veselov@gmail.com> wrote:
PS: your setup looks pretty complicated and is hard to analyze without seeing all involved table structures, the transaction/query flow, and (especially) the involved procedures' source code.

Sure :) At this point, I've put together the "bulk merge" code as well. I can't quite see much of a difference, actually, but it's hard to trust the execution times, as on the same amount of data they vary from, say, 0.5s to 2s, and the sample data is not stepping on any other locks. In general, I'm afraid all those left joins and multiple scans, even over a small amount of data, are nullifying any positive effect.
Now some ideas to check.
High CPU usage usually isn't related to locking, but rather to seq scans, wrong plans, or simply inefficient PL/pgSQL code; locked processes usually don't use much CPU.

1) On the test database perform select pg_stat_reset(); then perform a full round of merges, then check
select * from pg_stat_user_tables where seq_scan > 0 order by seq_tup_read;
and if you find a lot of seq_scan and seq_tup_read on a particular table, try to find where they are coming from (it could be the reason for the high CPU usage).

2) Enable track_functions in postgresql.conf and perform the same sequence (select pg_stat_reset() + a full round of merges), then check
select * from pg_stat_user_functions order by self_time desc;
and see which functions are using the most time.

These are good pointers. If the new process has the same sort of problems, these will come in handy for figuring out where they are coming from. Thank you.
 
3) Old/lost prepared transactions can have a deadly effect on database performance as a whole. So check select * from pg_prepared_xacts; and verify that you don't have hours- (or weeks-) old prepared xacts lying around.

If there are "lost" prepared transactions, they will lock up a particular instance from being able to write into its table data, so it will just stall the node. But does happen, and we have an application mechanism to find and delete those.
 
PS: btw, I still don't fully understand the relation between:
"
- merges data into its own node tables (using merge_xxx PL/pgSQL functions)
"
and the provided code for public."merge_all02-9A-46-8B-C1-DD" and PUBLIC.merge_agrio.
As I see it, public."merge_all02-9A-46-8B-C1-DD" calls PUBLIC.merge_agrio, and PUBLIC.merge_agrio updates the global table R_AGRIO (but not the "own node table").

It's a bit irrelevant at this point, but: the merge_all02-9A-46-8B-C1-DD() function takes all data for the 02-9A-46-8B-C1-DD node and moves it into the master table. There is an analogous merge_02-9A-46-8B-C1-DD() function that takes data from the application and writes it into the tables for the 02-9A-46-8B-C1-DD node. The process of moving data node tables -> main tables and application -> node tables is nearly identical, hence I only provided the body once. The big difference is that when merging into the master, there is a lot more data to look through, as the node tables only contain data that has not been merged into the master yet.
 
I think the best implementation of such a task is asynchronous processing of these changes via a background process. The application only inserts events into a queue table (a lockless process), and some background process reads the data from the queue table, merges it into the main table (again lockless, because it is single-threaded and so there are no concurrent writes), and then deletes the merged data from the queue table.

Well, that was a really good suggestion, thank you. Some weeks later, I've put it together. This hasn't hit production yet, so I have yet to see the overall improvement effect. Along with turning it into a queue, I've added provisions to try to combine as much data as possible before writing it out into the database tables, and merged all of the satellite tables with the main data.

Before, I had:
r_agrio
  r_brk_xxx (multiple entries reference rows in r_agrio)

Now, I have:
r_agrio_daily
r_agrio_total
r_agrio_hourly

All the data that was in the r_brk_xxx tables is now in columns of the r_agrio* tables. To get around the fact that there are potentially multiple BRK records for each AGR record, the data is now stored as a JSON object. The primary key used for the BRK tables is turned into a string that serves as a key in a top-level JSON object. This should help me tremendously on the side that needs to read that data, as I previously had to join or left join the BRK tables.
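To illustrate the shape (simplified and with made-up column names, not the actual schema):

CREATE TABLE r_agrio_hourly_sketch (
    id        bigserial PRIMARY KEY,
    tagid     bigint NOT NULL,
    blockid   bigint NOT NULL,
    slot      timestamptz NOT NULL,       -- hypothetical hourly bucket
    cnt       bigint NOT NULL DEFAULT 0,  -- former main-table data columns
    breakdown json NOT NULL DEFAULT '{}', -- former r_brk_xxx rows, keyed by
                                          -- their stringified unique key
    UNIQUE (tagid, blockid, slot)
);

-- breakdown example:
--   {"name1|subid1|cname1": {"cnt": 3, "bytes": 120},
--    "name2|0|":            {"cnt": 1, "bytes": 8}}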

Splitting this into 3 tables may come back to bite me, since it means two more inserts and corresponding lookups, but it seriously helps me on the reading side of things.

The code that aggregates the JSON data is still done in PL/pgSQL, which is probably a bad idea, considering that PL/pgSQL doesn't have good ways of manipulating JSON data in place (I can't run PL/V8, or untrusted languages). I should probably move this logic to Java and use updatable result sets to modify the data in place. The reason I left it in PL/pgSQL is that I didn't want to do a select and then an update, making it two lookups per update. Another concern with moving it to the application side is the network turnaround.