Slow concurrent processing - Mailing list pgsql-performance
From | Misa Simic |
---|---|
Subject | Slow concurrent processing |
Date | |
Msg-id | CAH3i69mZ-e1rQshbimu24GVKHV5pKx-oPHaELqnZYo6Ed_y5TA@mail.gmail.com Whole thread Raw |
Responses |
Re: Slow concurrent processing
Re: Slow concurrent processing |
List | pgsql-performance |
Hi all,
We have one table with list of "records for processing"...
We loop trough that table and call one long runing function:
do_the_math_for_record(record_id)
which use different tables for select related rows for input record_id, do some calculations and insert results in two tables...
and we have made 1 function process_all_records()
what simply does: SELECT do_the_math_for_record(record_id) FROM records_for_processing
When we run that function - it last about 4 minutes...
There are about 300 rows in records_for_processing... we have logged the time on the beginning of do_the_math, and the time in end of do the math... and noticed that processing each row, last between 0.5 to 2 seconds...
so our do_the_math looks like:
PERFORM log_time(record_id, clock_timestamp(), 1)
PERFORM do_the_math_and_save_results(record_id);
PERFORM log_time(record_id, clock_timestamp(), 2)
Then we thought, if we take all "records for processing" and process each in separate connection - it should last longer...
but - got worse result! (using 30 concurrent connections...)... about 7 mins...
if we reduce concurrent connections on 10 - we got result in approx the same time as sequential processing...
but - if replace do_the_math_and_save_results with pg_sleep(1); To simulate long running function so processing each row - last 1 sec...
Sequential processing last as expected 300 seconds!
Concurrent processing last faster with higher number of concurrent connections - about 30 seconds with 30 connections! (much faster - and expected...)
however, if we return our: do_the_math_and_save_results - we can't get better results in concurrent processing...
with higher number of conccurent connections - result is worse... also we have noticed that for some records difference between end_time and start_time si even longer than 1 min - but it is random - not always on the same id... i.e. in this concurrent run lasts 1 min - in next 1 sec - but some other takes about 1 min...
Any idea - why? :)
It says to me - that there is somewhere lock on some tables - so probably our concurrent connections wait - to other finish... but I cant figure out: what and why...
do_the_math_and_save results - selects data from 10 other tables, calculates something, and results inserts in other tables...
there are about 3 tracking tables with (record_id - other data...... and about 7 settings tables what we join to tracking tables to get all info...), then do the math with that info - and insert results..
we don't do any update... (to have possibility two connections want to update the same row in the same table)
data from tracking_tables - should be separate sets of data for two differenet record_ids...
(joined rows from settings tables could be common - for two sets of different record_id)
but - even they are the same set - SELECTs should not lock the rows in tables...
There are places where we do:
INSERT INTO result_table (columns)
SELECT query (tracking and settings tables joined)
Is there a chance it does some lock somewhere?
can above query be run "concurrently"?
Many thanks,
Misa
pgsql-performance by date: