Thread: Function performance drops during execution of loop

Function performance drops during execution of loop

From
Seref Arikan
Date:
Greetings, 
I have a function that loops over a set of ids, calls a function inside the loop using the current id as the parameter, and adds the result of the function call to a temp table. 

When the loop terminates, the temp table has all the rows from the function call(s) and it is returned. 

I noticed that the function was taking longer than it should, so I used RAISE to write to the PostgreSQL log and watched performance with tail -f /.../logfile 
What I'm seeing is that the performance drops as the loop progresses. This is a bit confusing for me, for the following reasons:

This database contains synthetic data, so the ids returned by the loop point at sets of rows that are repetitions of the same pattern (row1, row2, row3, then row1, row2, row3 again), so the loop is not dealing with changing data size or complexity. 

The inner function simply selects a set of rows from source tables, inserts them into a temp table created by the outer function (where the loop is), performs joins and returns the result. The inner function deletes all rows from the temp tables at entry, so the temp tables should not be increasing in size (or so I think). 
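(For illustration only, a minimal sketch of the shape of such an inner function; the real query_instance_graph2 is not shown in this thread, and the source table eav_source, its columns, and the join condition below are assumptions.)

CREATE OR REPLACE FUNCTION query_instance_graph2(p_id2 bigint)
RETURNS void AS
$BODY$
BEGIN
    -- clear the per-call work table (the step under suspicion in this thread)
    DELETE FROM criterianodes;

    -- pull the rows for this id into the work table
    -- (eav_source is a hypothetical name; columns assumed compatible)
    INSERT INTO criterianodes
    SELECT * FROM eav_source WHERE id2 = p_id2;

    -- join the work table back against the source and append to the result table
    INSERT INTO RESULTS_TABLE
    SELECT c.*
    FROM criterianodes c
    JOIN eav_source s ON s.id2 = c.id2;
END
$BODY$
LANGUAGE plpgsql;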

But something is building up during execution that leads to increasing times for processing the same number of records. First the outer function, then the log file:

CREATE OR REPLACE FUNCTION public.iterate_groups()
RETURNS TABLE (
    --function return type defined here
)
AS
$BODY$
DECLARE
    temp_row    RECORD;
    timer       TEXT;
    rec_counter INTEGER;
BEGIN
    --results temp table
    CREATE TEMP TABLE RESULTS_TABLE (
        --this temp table matches the function return type
    ) ON COMMIT DROP;

    --nodes temp table
    CREATE TEMP TABLE criterianodes (
        --table 1 to be used by function called from the loop
    ) ON COMMIT DROP;

    --structure temp table
    CREATE TEMP TABLE criteriastructure (   -- table name assumed
        --table 2 to be used by function called from the loop
    ) ON COMMIT DROP;

    RAISE NOTICE 'Starting loop over all documents';
    SELECT timeofday() INTO timer;
    RAISE NOTICE '%', timer;

    rec_counter := 0;
    FOR temp_row IN SELECT DISTINCT id1, id2 FROM temp_eav_table_global LOOP

        rec_counter := rec_counter + 1;

        PERFORM query_instance_graph2(temp_row.id2);

        IF (rec_counter % 100) = 0 THEN
            RAISE NOTICE '%:', rec_counter;
            SELECT timeofday() INTO timer;
            RAISE NOTICE '%', timer;
        END IF;

    END LOOP;

    RETURN QUERY SELECT * FROM RESULTS_TABLE;
END
$BODY$
LANGUAGE plpgsql;


log file shows:

NOTICE:  Starting loop over all documents
NOTICE:  Wed May 21 16:18:51.075245 2014 BST
NOTICE:  100:
NOTICE:  Wed May 21 16:19:04.306767 2014 BST
NOTICE:  200:
NOTICE:  Wed May 21 16:19:08.499653 2014 BST
NOTICE:  300:
NOTICE:  Wed May 21 16:19:14.917691 2014 BST
NOTICE:  400:
NOTICE:  Wed May 21 16:19:23.566721 2014 BST
NOTICE:  500:
NOTICE:  Wed May 21 16:19:34.369962 2014 BST
NOTICE:  600:
NOTICE:  Wed May 21 16:19:47.464242 2014 BST
NOTICE:  700:
NOTICE:  Wed May 21 16:20:02.598676 2014 BST
NOTICE:  800:
NOTICE:  Wed May 21 16:20:20.083649 2014 BST
....

After a while the time between calls gets longer and longer. 

I kept the temp tables in the outer function because otherwise I would have to create and drop them at entry to the inner function, which gave an "out of shared memory" error in the previous version of the code, due to the locks taken as the tables were created and dropped thousands of times. 
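(For reference: the size of the shared lock table is governed by max_locks_per_transaction together with max_connections and max_prepared_transactions; raising it is a postgresql.conf change and requires a server restart. The value below is only an example.)

# postgresql.conf -- example value only
max_locks_per_transaction = 256    # default is 64; requires a server restart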

I'm trying to fit a large computation into limited memory by using the cursor (loop) and repeatedly calling the inner function, which operates on a small amount of data via the temp tables. 

What may be building up here? I suspect deleting all rows from the temp tables is not really deleting them since this is all happening in a transaction, but it is my uneducated guess only. 

Cheers
Seref





Re: Function performance drops during execution of loop

From
Tom Lane
Date:
Seref Arikan <serefarikan@kurumsalteknoloji.com> writes:
> What may be building up here? I suspect deleting all rows from the temp
> tables is not really deleting them since this is all happening in a
> transaction, but it is my uneducated guess only.

I suspect you suspect correctly.  Autovacuum does not touch temp tables,
so it won't help you deal with deleted tuples.  Given the usage pattern
you're describing, I think that using a TRUNCATE rather than
delete-all-the-rows would help ... but if you're already doing that,
we need more info.

            regards, tom lane
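(In code terms, the suggestion amounts to replacing the per-call deletes at the top of the inner function with TRUNCATE, which discards the old heap instead of leaving dead tuples behind; table names are taken from the outer function above, the second one being an assumed name.)

-- inside the inner function, instead of:
--   DELETE FROM criterianodes;
--   DELETE FROM criteriastructure;
-- do:
TRUNCATE criterianodes, criteriastructure;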


Re: Function performance drops during execution of loop

From
Seref Arikan
Date:
Hi Tom, 
Thanks for the feedback. I've moved the temp tables to the inner function, increased max_locks_per_transaction and dropped the tables instead of deleting them. 
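(A minimal sketch of that drop-and-recreate variant, with column definitions elided as in the original post; each CREATE/DROP takes locks that are held until commit, which is why max_locks_per_transaction had to be raised.)

CREATE OR REPLACE FUNCTION query_instance_graph2(p_id2 bigint)
RETURNS void AS
$BODY$
BEGIN
    DROP TABLE IF EXISTS criterianodes;
    CREATE TEMP TABLE criterianodes (
        --columns as before
    ) ON COMMIT DROP;

    -- ... populate criterianodes, perform the joins, insert into RESULTS_TABLE ...
END
$BODY$
LANGUAGE plpgsql;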

The performance drop is still there, but it is much, much smaller than in the previous case. Previously the whole execution took 04:36:14; with the drop-table approach it takes 00:01:44

less than 2 minutes compared to 4 and a half hours...

I will try TRUNCATE as well and see what happens. Oh, by the way, I realised I had used PERFORM in the version of the code included in my initial message; that was simply to eliminate the main temp table in the outer function as a suspect.

Regards
Seref




Re: Function performance drops during execution of loop

From
Seref Arikan
Date:
Thanks a lot for the hint, Tom! I've replaced the deletes with TRUNCATE and it brought the run down to 50.950 sec, roughly twice as fast as the drop-table method, with the added benefit of not having to raise max_locks_per_transaction. 

I also think the performance-decrease pattern is gone, or the operation completes before it shows up; I will generate more data and try again. 

Regards
Seref




Re: Function performance drops during execution of loop

From
Seref Arikan
Date:
Just for the record, I've increased the data volume tenfold and observed only a very small performance drop: the average time per inner function call increased from 12.6 ms to 13.3 ms. 

Regards
Seref


