Thread: Queries getting canceled inside a proc that seems to slow down randomly

Queries getting canceled inside a proc that seems to slow down randomly

From
Skarsol
Date:

We're using postgres 9.2.5. Every couple of days we'll have a query get cancelled and it is always at the start of one of our custom procedures. The query is typically part of a php loop that runs in less than a second, however when the issue occurs, that pass through the loop takes multiple seconds (~3-4) before the query cancels. The loop continues to run after the issue occurs, so the php script itself is not terminating. We also see it in non-loop based updates and selects, so it's not tied to that particular script. It's possible that some of the selects are actually cancelled by the user, but the inserts have no interaction with the users. The updates are triggered by the user, but there's no way for them to cancel it short of killing their client, and the timeframe for them to do that would (normally) be minuscule.

Are we doing anything weird with this procedure? Is there anything more I can do to get more info as to why/how the cancellation is happening or why the function would slow down seemingly randomly?

 

ERROR:  canceling statement due to user request
CONTEXT:  PL/pgSQL function chooselast(character varying,character varying) line 1 at IF
        SQL statement "INSERT INTO partition_2015 VALUES (NEW.*)"
        PL/pgSQL function table1_insert_trigger() line 4 at SQL statement
STATEMENT:  INSERT into table1 (create_time,cusid,last1) Values ('NOW','8175','ROBERT'')


ERROR:  canceling statement due to user request
CONTEXT:  PL/pgSQL function chooselast(character varying,character varying) line 1 at IF
STATEMENT:  SELECT * FROM table2 WHERE   (cusid = 2521) AND LOWER(chooselast(last1,last2)) LIKE LOWER('87092%')

ERROR:  canceling statement due to user request
CONTEXT:  PL/pgSQL function chooselast(character varying,character varying) line 1 at IF
STATEMENT:  update table1 set status='DELETE' where id=200498919

partition_2015 (on table 1) has one index that references chooselast: "pik_last_2015" btree (cusid, lower(chooselast(last1, last2)::text))


I'm not sure why an update on table1 that does not change last1 or last2 would touch the index, so why would we even call the chooselast procedure?


table2 has no indexes that reference chooselast, but is also partitioned (by year as well).


db01=# select * from pg_proc where proname='chooselast';
-[ RECORD 1 ]---+----------------------------------------------------------------------------------------------------------------
proname         | chooselast
pronamespace    | 2200
proowner        | 10
prolang         | 12599
procost         | 100
prorows         | 0
provariadic     | 0
protransform    | -
proisagg        | f
proiswindow     | f
prosecdef       | f
proleakproof    | f
proisstrict     | f
proretset       | f
provolatile     | i
pronargs        | 2
pronargdefaults | 0
prorettype      | 1043
proargtypes     | 1043 1043
proallargtypes  |
proargmodes     |
proargnames     |
proargdefaults  |
prosrc          |  DECLARE t text;  BEGIN  IF  (character_length($1) > 0) THEN  t = $1; ELSE  t = $2;   END IF;  RETURN t; END;
probin          |
proconfig       |
proacl          |

 

Re: Queries getting canceled inside a proc that seems to slow down randomly

From
Jim Nasby
Date:
On 11/11/15 11:57 AM, Skarsol wrote:
> Are we doing anything weird with this procedure? Is there anything more
> I can do to get more info as to why/how the cancellation is happening or
> why the function would slow down seemingly randomly?
>
> ERROR:  canceling statement due to user request
> CONTEXT:  PL/pgSQL function chooselast(character varying,character
> varying) line 1 at IF
>          SQL statement "INSERT INTO partition_2015 VALUES (NEW.*)"
>          PL/pgSQL function table1_insert_trigger() line 4 at SQL statement
> STATEMENT:  INSERT into table1 (create_time,cusid,last1) Values
> ('NOW','8175','ROBERT'')

The error tells you what's causing this; it's a client-side interrupt.
Actually, looking at the code, you might get the same request if someone
sent a signal to the relevant backend, either at the OS level or via
pg_cancel_backend(). You can test that and see what error you get. A
statement_timeout expiration would give you a different error.

As for the hang, maybe someone is ALTERing or replacing the function?

BTW, you could write that function in the SQL language, which might
allow the optimizer to inline it. Even if it couldn't, you'd probably
still see a performance gain from not firing up plpgsql on every row.
Though, if you didn't allow empty strings in last1, you could also just
replace that whole function with coalesce(). I see the function is
marked IMMUTABLE, which is good.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


On Fri, Nov 13, 2015 at 3:50 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 11/11/15 11:57 AM, Skarsol wrote:
Are we doing anything weird with this procedure? Is there anything more
I can do to get more info as to why/how the cancellation is happening or
why the function would slow down seemingly randomly?

ERROR:  canceling statement due to user request
CONTEXT:  PL/pgSQL function chooselast(character varying,character
varying) line 1 at IF
         SQL statement "INSERT INTO partition_2015 VALUES (NEW.*)"
         PL/pgSQL function table1_insert_trigger() line 4 at SQL statement
STATEMENT:  INSERT into table1 (create_time,cusid,last1) Values
('NOW','8175','ROBERT'')

The error tells you what's causing this; it's a client-side interrupt. Actually, looking at the code, you might get the same request if someone sent a signal to the relevant backend, either at the OS level or via pg_cancel_backend(). You can test that and see what error you get. A statement_timeout expiration would give you a different error.

As for the hang, maybe someone is ALTERing or replacing the function?

BTW, you could write that function in the SQL language, which might allow the optimizer to inline it. Even if it couldn't, you'd probably still see a performance gain from not firing up plpgsql on every row. Though, if you didn't allow empty strings in last1, you could also just replace that whole function with coalesce(). I see the function is marked IMMUTABLE, which is good.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

No one is ALTERing or replacing the function (I'm the only person that would). Recently we had an automated process attempted to load a file containing one record (which usually takes under a second) and it failed because of this issue (the insert was cancelled due to user request at the IF in the chooselast function).

Similarly, but unrelated to the functions, we had a PHP script running as a headless daemon process get this error (ERROR:  canceling statement due to user request) while running 'SELECT * FROM connection WHERE id=109'. Everyone was at lunch, so there's no way a user could have cancelled it. These scripts run for months at a time with no issues, so it's not timeout related. It's not a long running or complicated query:

                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Seq Scan on connection  (cost=0.00..7.83 rows=1 width=60) (actual time=0.031..0.033 rows=1 loops=1)
   Filter: (id = 109)
   Rows Removed by Filter: 306
 Total runtime: 0.047 ms
(4 rows)

This is going through pgbouncer and the related log entries are:

postgres:
5018 2016-01-11 12:23:28.143 CST:ERROR:  canceling statement due to user request
5018 2016-01-11 12:23:28.143 CST:STATEMENT:  SELECT * FROM connection WHERE id=109

pgbouncer (S-0x17fd780 entries):
2016-01-11 12:21:46.600 32905 LOG S-0x17fd780: edi01/editor@127.0.0.1:6432 closing because: server idle timeout (age=612)
2016-01-11 12:23:28.142 32905 LOG S-0x17fd780: edi01/editor@127.0.0.1:6432 new connection to server
2016-01-11 12:23:28.143 32905 LOG S-0x17fd780: edi01/editor@127.0.0.1:6432 closing because: sent cancel req (age=0)
2016-01-11 12:26:31.510 32905 LOG S-0x17fd780: edi01/editor@127.0.0.1:6432 new connection to server





Re: Queries getting canceled inside a proc that seems to slow down randomly

From
Jim Nasby
Date:
On 1/11/16 1:32 PM, Skarsol wrote:
> pgbouncer (S-0x17fd780 entries):
> 2016-01-11 12:21:46.600 32905 LOG S-0x17fd780:
> edi01/editor@127.0.0.1:6432 <http://editor@127.0.0.1:6432> closing
> because: server idle timeout (age=612)
> 2016-01-11 12:23:28.142 32905 LOG S-0x17fd780:
> edi01/editor@127.0.0.1:6432 <http://editor@127.0.0.1:6432> new
> connection to server
> 2016-01-11 12:23:28.143 32905 LOG S-0x17fd780:
> edi01/editor@127.0.0.1:6432 <http://editor@127.0.0.1:6432> closing
> because: sent cancel req (age=0)
> 2016-01-11 12:26:31.510 32905 LOG S-0x17fd780:
> edi01/editor@127.0.0.1:6432 <http://editor@127.0.0.1:6432> new
> connection to server

This makes me think there's a race condition in pgBouncer, or that their
logging is just confusing.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


On Mon, Jan 11, 2016 at 1:39 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 1/11/16 1:32 PM, Skarsol wrote:
pgbouncer (S-0x17fd780 entries):
2016-01-11 12:21:46.600 32905 LOG S-0x17fd780:
edi01/editor@127.0.0.1:6432 <http://editor@127.0.0.1:6432> closing
because: server idle timeout (age=612)
2016-01-11 12:23:28.142 32905 LOG S-0x17fd780:
edi01/editor@127.0.0.1:6432 <http://editor@127.0.0.1:6432> new
connection to server
2016-01-11 12:23:28.143 32905 LOG S-0x17fd780:
edi01/editor@127.0.0.1:6432 <http://editor@127.0.0.1:6432> closing
because: sent cancel req (age=0)
2016-01-11 12:26:31.510 32905 LOG S-0x17fd780:
edi01/editor@127.0.0.1:6432 <http://editor@127.0.0.1:6432> new
connection to server

This makes me think there's a race condition in pgBouncer, or that their logging is just confusing.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Okay, I'll go visit the pgbouncer guys and see if they can shed any light. Thanks.