Re: Improving performance of merging data between tables - Mailing list pgsql-general

From Maxim Boguk
Subject Re: Improving performance of merging data between tables
Date
Msg-id CAK-MWwTt13dkP2sEDxem63AhRMrTJ8kFs1C1iVFpaV=GFSvgBQ@mail.gmail.com
In response to Re: Improving performance of merging data between tables  (Pawel Veselov <pawel.veselov@gmail.com>)
Responses Re: Improving performance of merging data between tables  (Pawel Veselov <pawel.veselov@gmail.com>)
List pgsql-general


On Wed, Jan 7, 2015 at 8:49 PM, Pawel Veselov <pawel.veselov@gmail.com> wrote:

PPPS: and the last suggestion: after you have finished with the "write all the data into its own tables" stage, the application should ANALYZE these own tables (or you could get weird/inefficient plans during the last stage).

Any references to back this up? I don't particularly mind doing it, but I wonder if the analysis can be more expensive than the processing. These tables get a few hundred records inserted/updated, then are entirely processed (with full scans expected), and then deleted...

If these "own tables" are used only in full-table selects and never in joins, then there should be no issues.
However, once you start joining these tables with anything else, you can get very inefficient/weird plans, because without ANALYZE the database doesn't know how many rows these tables contain.
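
For example (a sketch with a hypothetical staging table my_node_data; the point is only the ANALYZE between the bulk load and the joins):

    -- bulk-load the node's own table (stands in for the real load)
    INSERT INTO my_node_data (key, cnt)
    VALUES ('a', 1), ('b', 2);
    -- refresh the planner's row estimates before joining
    ANALYZE my_node_data;
    -- now joins against the freshly loaded table get sane plans
    SELECT r.key, r.cnt + d.cnt
      FROM my_node_data d
      JOIN r_agrio r ON r.key = d.key;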

PS: your setup looks pretty complicated and is hard to analyze without seeing all the involved table structures, the transaction/query flow, and (especially) the source code of the involved procedures.

Sure :) At this point, I've put together the "bulk merge" code as well. I can't quite see much of a difference, actually, but it's hard to trust the execution times: on the same amount of data they vary from, say, 0.5s to 2s, and the sample data is not stepping on any other locks. In general, I'm afraid all those left joins and multiple scans, even over a small amount of data, are nullifying any positive effect.
 

Now, some ideas to check.
High CPU usage usually isn't related to locking; it is related to seq scans, wrong plans, or simply inefficient PL/pgSQL code. Locked processes usually don't use much CPU.

1) On the test database, perform select pg_stat_reset(); then perform a full round of merges, then check:
select * from pg_stat_user_tables where seq_scan > 0 order by seq_tup_read;
If you find a lot of seq_scan and seq_tup_read on a particular table, try to find where they are coming from (they could be the reason for the high CPU usage).
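
One way to see which statements are producing those scans is the auto_explain module (a sketch; the threshold values are illustrative, not from this thread):

    -- postgresql.conf (auto_explain can also be loaded per-session with LOAD 'auto_explain';)
    shared_preload_libraries = 'auto_explain'
    auto_explain.log_min_duration = '100ms'   -- log plans of statements slower than this
    auto_explain.log_analyze = on             -- include actual row counts in the logged plans

With this enabled, every merge statement over the threshold is logged together with its execution plan, so the seq scans show up directly in the server log.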

2) Enable track_functions in postgresql.conf and perform the same sequence (select pg_stat_reset(); plus a full round of merges), then check:
select * from pg_stat_user_functions order by self_time desc;
to see which functions use the most time.
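
For reference, the setting and the check sequence look like this:

    -- postgresql.conf
    track_functions = pl     -- track PL/pgSQL functions ('all' also tracks C-language functions)

    -- in a session, after reloading the config:
    SELECT pg_stat_reset();
    -- ... perform the full round of merges ...
    SELECT funcname, calls, total_time, self_time
      FROM pg_stat_user_functions
     ORDER BY self_time DESC;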

3) Old/lost prepared transactions can have a deadly effect on the performance of the database as a whole. So check select * from pg_prepared_xacts; and verify that you don't have hours- (or weeks-) old prepared transactions lying around.
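
For example, to list only the suspiciously old ones (the one-hour cutoff is arbitrary):

    SELECT gid, prepared, owner, database
      FROM pg_prepared_xacts
     WHERE prepared < now() - interval '1 hour'
     ORDER BY prepared;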

PS: By the way, I still don't fully understand the relation between:
"
- merges data into its own node tables (using merge_xxx PL/pgSQL functions)
"
and the provided code for public."merge_all02-9A-46-8B-C1-DD" and PUBLIC.merge_agrio.
As I see it, public."merge_all02-9A-46-8B-C1-DD" calls PUBLIC.merge_agrio, and PUBLIC.merge_agrio updates the global table R_AGRIO (but not an "own node table").


I think the best implementation of such a task is asynchronous processing of these changes via a background process. The application only inserts events into a queue table (a lockless operation), a background process reads the data from the queue table and merges it into the main table (again without lock contention, because a single merger thread means no concurrent writes), and then deletes the merged data from the queue table.
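
A minimal sketch of that design, with hypothetical names (event_queue(key, cnt) is the insert-only queue, r_agrio(key primary key, cnt) stands in for the main table; the real schema in this thread is richer):

    CREATE OR REPLACE FUNCTION merge_queue() RETURNS void AS $$
    BEGIN
        -- Drain the queue atomically: only the rows this DELETE actually
        -- sees get merged; anything queued later waits for the next pass.
        CREATE TEMP TABLE IF NOT EXISTS batch (key text, cnt bigint);
        TRUNCATE batch;
        WITH moved AS (
            DELETE FROM event_queue RETURNING key, cnt
        )
        INSERT INTO batch SELECT key, cnt FROM moved;

        -- Single merger process, so a plain update-then-insert pair is race-free.
        UPDATE r_agrio t
           SET cnt = t.cnt + q.cnt
          FROM (SELECT key, sum(cnt) AS cnt FROM batch GROUP BY key) q
         WHERE t.key = q.key;

        INSERT INTO r_agrio (key, cnt)
        SELECT key, sum(cnt)
          FROM batch q
         WHERE NOT EXISTS (SELECT 1 FROM r_agrio t WHERE t.key = q.key)
         GROUP BY key;
    END;
    $$ LANGUAGE plpgsql;

The background worker just runs SELECT merge_queue(); on a timer; producers only ever INSERT into event_queue, so they block neither each other nor the merger.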


--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com
МойКруг: http://mboguk.moikrug.ru/

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."

