Thread: Queries getting canceled inside a proc that seems to slow down randomly
We're using postgres 9.2.5. Every couple of days we'll have a query get cancelled and it is always at the start of one of our custom procedures. The query is typically part of a php loop that runs in less than a second, however when the issue occurs, that pass through the loop takes multiple seconds (~3-4) before the query cancels. The loop continues to run after the issue occurs, so the php script itself is not terminating. We also see it in non-loop based updates and selects, so it's not tied to that particular script. It's possible that some of the selects are actually cancelled by the user, but the inserts have no interaction with the users. The updates are triggered by the user, but there's no way for them to cancel it short of killing their client, and the timeframe for them to do that would (normally) be minuscule.
Are we doing anything weird with this procedure? Is there anything more I can do to get more info as to why/how the cancellation is happening or why the function would slow down seemingly randomly?
ERROR: canceling statement due to user request
CONTEXT: PL/pgSQL function chooselast(character varying,character varying) line 1 at IF
SQL statement "INSERT INTO partition_2015 VALUES (NEW.*)"
PL/pgSQL function table1_insert_trigger() line 4 at SQL statement
STATEMENT: INSERT into table1 (create_time,cusid,last1) Values ('NOW','8175','ROBERT'')
ERROR: canceling statement due to user request
CONTEXT: PL/pgSQL function chooselast(character varying,character varying) line 1 at IF
STATEMENT: SELECT * FROM table2 WHERE (cusid = 2521) AND LOWER(chooselast(last1,last2)) LIKE LOWER('87092%')
ERROR: canceling statement due to user request
CONTEXT: PL/pgSQL function chooselast(character varying,character varying) line 1 at IF
STATEMENT: update table1 set status='DELETE' where id=200498919
partition_2015 (on table 1) has one index that references chooselast: "pik_last_2015" btree (cusid, lower(chooselast(last1, last2)::text))
I'm not sure why an update on table1 that does not change last1 or last2 would touch the index, so why would we even call the chooselast procedure?
table2 has no indexes that reference chooselast, but is also partitioned (by year as well).
db01=# select * from pg_proc where proname='chooselast';
-[ RECORD 1 ]---+----------------------------------------------------------------------------------------------------------------
proname | chooselast
pronamespace | 2200
proowner | 10
prolang | 12599
procost | 100
prorows | 0
provariadic | 0
protransform | -
proisagg | f
proiswindow | f
prosecdef | f
proleakproof | f
proisstrict | f
proretset | f
provolatile | i
pronargs | 2
pronargdefaults | 0
prorettype | 1043
proargtypes | 1043 1043
proallargtypes |
proargmodes |
proargnames |
proargdefaults |
prosrc | DECLARE t text; BEGIN IF (character_length($1) > 0) THEN t = $1; ELSE t = $2; END IF; RETURN t; END;
probin |
proconfig |
proacl |
On 11/11/15 11:57 AM, Skarsol wrote: > Are we doing anything weird with this procedure? Is there anything more > I can do to get more info as to why/how the cancellation is happening or > why the function would slow down seemingly randomly? > > ERROR: canceling statement due to user request > CONTEXT: PL/pgSQL function chooselast(character varying,character > varying) line 1 at IF > SQL statement "INSERT INTO partition_2015 VALUES (NEW.*)" > PL/pgSQL function table1_insert_trigger() line 4 at SQL statement > STATEMENT: INSERT into table1 (create_time,cusid,last1) Values > ('NOW','8175','ROBERT'') The error tells you what's causing this; it's a client-side interrupt. Actually, looking at the code, you might get the same request if someone sent a signal to the relevant backend, either at the OS level or via pg_cancel_backend(). You can test that and see what error you get. A statement_timeout expiration would give you a different error. As for the hang, maybe someone is ALTERing or replacing the function? BTW, you could write that function in the SQL language, which might allow the optimizer to inline it. Even if it couldn't, you'd probably still see a performance gain from not firing up plpgsql on every row. Though, if you didn't allow empty strings in last1, you could also just replace that whole function with coalesce(). I see the function is marked IMMUTABLE, which is good. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
On 11/11/15 11:57 AM, Skarsol wrote:Are we doing anything weird with this procedure? Is there anything more
I can do to get more info as to why/how the cancellation is happening or
why the function would slow down seemingly randomly?
ERROR: canceling statement due to user request
CONTEXT: PL/pgSQL function chooselast(character varying,character
varying) line 1 at IF
SQL statement "INSERT INTO partition_2015 VALUES (NEW.*)"
PL/pgSQL function table1_insert_trigger() line 4 at SQL statement
STATEMENT: INSERT into table1 (create_time,cusid,last1) Values
('NOW','8175','ROBERT'')
The error tells you what's causing this; it's a client-side interrupt. Actually, looking at the code, you might get the same request if someone sent a signal to the relevant backend, either at the OS level or via pg_cancel_backend(). You can test that and see what error you get. A statement_timeout expiration would give you a different error.
As for the hang, maybe someone is ALTERing or replacing the function?
BTW, you could write that function in the SQL language, which might allow the optimizer to inline it. Even if it couldn't, you'd probably still see a performance gain from not firing up plpgsql on every row. Though, if you didn't allow empty strings in last1, you could also just replace that whole function with coalesce(). I see the function is marked IMMUTABLE, which is good.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Seq Scan on connection (cost=0.00..7.83 rows=1 width=60) (actual time=0.031..0.033 rows=1 loops=1)
Filter: (id = 109)
Rows Removed by Filter: 306
Total runtime: 0.047 ms
(4 rows)
postgres:
5018 2016-01-11 12:23:28.143 CST:ERROR: canceling statement due to user request
5018 2016-01-11 12:23:28.143 CST:STATEMENT: SELECT * FROM connection WHERE id=109
2016-01-11 12:21:46.600 32905 LOG S-0x17fd780: edi01/editor@127.0.0.1:6432 closing because: server idle timeout (age=612)
2016-01-11 12:23:28.143 32905 LOG S-0x17fd780: edi01/editor@127.0.0.1:6432 closing because: sent cancel req (age=0)
2016-01-11 12:26:31.510 32905 LOG S-0x17fd780: edi01/editor@127.0.0.1:6432 new connection to server
On 1/11/16 1:32 PM, Skarsol wrote: > pgbouncer (S-0x17fd780 entries): > 2016-01-11 12:21:46.600 32905 LOG S-0x17fd780: > edi01/editor@127.0.0.1:6432 <http://editor@127.0.0.1:6432> closing > because: server idle timeout (age=612) > 2016-01-11 12:23:28.142 32905 LOG S-0x17fd780: > edi01/editor@127.0.0.1:6432 <http://editor@127.0.0.1:6432> new > connection to server > 2016-01-11 12:23:28.143 32905 LOG S-0x17fd780: > edi01/editor@127.0.0.1:6432 <http://editor@127.0.0.1:6432> closing > because: sent cancel req (age=0) > 2016-01-11 12:26:31.510 32905 LOG S-0x17fd780: > edi01/editor@127.0.0.1:6432 <http://editor@127.0.0.1:6432> new > connection to server This makes me think there's a race condition in pgBouncer, or that their logging is just confusing. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
On 1/11/16 1:32 PM, Skarsol wrote:pgbouncer (S-0x17fd780 entries):
2016-01-11 12:21:46.600 32905 LOG S-0x17fd780:
edi01/editor@127.0.0.1:6432 <http://editor@127.0.0.1:6432> closing
because: server idle timeout (age=612)
2016-01-11 12:23:28.142 32905 LOG S-0x17fd780:
edi01/editor@127.0.0.1:6432 <http://editor@127.0.0.1:6432> new
connection to server
2016-01-11 12:23:28.143 32905 LOG S-0x17fd780:
edi01/editor@127.0.0.1:6432 <http://editor@127.0.0.1:6432> closing
because: sent cancel req (age=0)
2016-01-11 12:26:31.510 32905 LOG S-0x17fd780:
edi01/editor@127.0.0.1:6432 <http://editor@127.0.0.1:6432> new
connection to server
This makes me think there's a race condition in pgBouncer, or that their logging is just confusing.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
Okay, I'll go visit the pgbouncer guys and see if they can shed any light. Thanks.