BUG #9557: Row not visible after receiving notification - Mailing list pgsql-bugs
From | marko@joh.to |
---|---|
Subject | BUG #9557: Row not visible after receiving notification |
Date | |
Msg-id | 20140313093922.335.80556@wrigleys.postgresql.org Whole thread Raw |
Responses |
Re: BUG #9557: Row not visible after receiving notification
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 9557 Logged by: Marko Tiikkaja Email address: marko@joh.to PostgreSQL version: 9.1.12 Operating system: Linux Description: Hi, We went live with an application a few days ago, and it's been having problems with seeing rows after receiving a notification from the transaction that committed it. I can't say for sure how often it happens, but I've seen evidence suggesting at least three separate occasions. The general design is that we have two independent "RPC Workers" LISTENing for notifications. Upon receiving a notification, they call a function Claim_Request() which does this: BEGIN SELECT Requests.RequestID INTO _RequestID FROM rpcworker.Requests WHERE state = 'UNCLAIMED' AND -- sanity check Requests.Claimed IS NULL ORDER BY Requests.RequestID LIMIT 1 FOR UPDATE ; IF NOT FOUND THEN RETURN; END IF; -- otherwise UPDATE to set claimed=now() I suspected it might be something wonky with FOR UPDATE, so I added an extra query sent right before Claim_Request(): SELECT Requests.RequestID FROM rpcworker.Requests WHERE state = 'UNCLAIMED' AND Requests.Claimed IS NULL ORDER BY Requests.RequestID LIMIT 1 Below I have attached the relevant portions of the log when this happened. In the example, the SELECT without the FOR UPDATE did not see the row, but the subsequent Claim_Request() did (supposedly due to the small latency added by the extra round-trip by the SELECT before it). RPCWorker B appeared to be working "normally" in this case (though it was a bit slower to react than A), but I've attached its logs for posterity as well. In previous cases, where both of them called Claim_Request() directly, we saw a couple of instances where *neither* of the RPC workers saw the new row, which suggests something's terribly wrong. ## RPCWORKER B (UTC): ## received notification with payload 8839 2014-03-12 21:11:27.250 +0000 LOG: received notification on channel "RPC Worker" 2014-03-12 21:11:27.253 +0000 LOG: saw requestid 8839 without FOR UPDATE ## RPCWORKER A (UTC): ## received notification with payload 8839 2014-03-12 21:11:27.240 +0000 LOG: received notification on channel "RPC Worker" 2014-03-12 21:11:27.245 +0000 LOG: plain SELECT: sql: no rows in result set 2014-03-12 21:11:27.251 +0000 WARNING: SNAKE: saw 0, claimed 8839 2014-03-12 21:11:27.251 +0000 LOG: received requestid 8839 2014-03-12 21:11:27.253 +0000 LOG: plain SELECT: sql: no rows in result set 2014-03-12 21:11:27.531 +0000 LOG: request 8839 (attempt 1) succeeded 2014-03-12 21:11:27.562 +0000 LOG: finished request 8839 (attempt 1) ## DBWORKER (UTC + 1): ## 2014-03-12 22:11:27.237 CET,"dbworker","db",34682,"10.1.1.71:42307",531f39d8.877a,211904,"SELECT",2014-03-11 17:29:12 CET,583/727697,1268656677,WARNING,01000," # the above line is the last piece of information we see from the transaction sending the notification # committed somewhere around here 2014-03-12 22:11:27.246 CET,"dbworker","db",34682,"10.1.1.71:42307",531f39d8.877a,211905,"SELECT",2014-03-11 17:29:12 CET,583/0,0,LOG,00000,"duration: 42.826 ms",,,,,,,,"exec_simple_query, postgres.c:1090","" # known committed at this point (but a query has already been executed!) ## RPCWORKER A IN DB (UTC + 1): ## 2014-03-12 22:11:21.052 CET,"rpcworker","db",33568,"10.1.1.72:48556",53207d07.8320,12480,"SELECT",2014-03-12 16:28:07 CET,34/0,0,LOG,00000,"duration: 0.741 ms",,,,,,,,"exec_simple_query, postgres.c:1090","rpcworker # idle for a good while here 2014-03-12 22:11:27.245 CET,"rpcworker","db",33568,"10.1.1.72:48556",53207d07.8320,12481,"idle",2014-03-12 16:28:07 CET,34/2000571,0,LOG,00000,"statement: -- NO FOR UPDATE 2014-03-12 22:11:27.245 CET,"rpcworker","db",33568,"10.1.1.72:48556",53207d07.8320,12482,"SELECT",2014-03-12 16:28:07 CET,34/0,0,LOG,00000,"duration: 0.983 ms",,,,,,,,"exec_simple_query, postgres.c:1090","rpcworker" # didn't see the row 2014-03-12 22:11:27.252 CET,"rpcworker","db",33568,"10.1.1.72:48556",53207d07.8320,12483,"idle",2014-03-12 16:28:07 CET,34/2000572,0,LOG,00000,"statement: -- Claim_Request() # sees the row! ## RPCWORKER B IN DB (UTC + 1): ## 2014-03-12 22:11:27.253 CET,"rpcworker","db",32865,"10.2.1.72:33236",53207cee.8061,14392,"idle",2014-03-12 16:27:42 CET,95/1580775,0,LOG,00000,"statement: -- NO FOR UPDATE # is a bit sluggish, so sees the row! 2014-03-12 22:11:27.254 CET,"rpcworker","db",32865,"10.2.1.72:33236",53207cee.8061,14394,"idle",2014-03-12 16:27:42 CET,95/1580776,0,LOG,00000,"statement: -- Claim_Request() # would likely see the row, but already claimed by worker A
pgsql-bugs by date: