Thread: Serialization, Locking...implement processing Queue with a table
I want to implement a processing queue with records in a table. This means that I'd like to have multiple processors performing operations against a PostgreSQL database, but have each of those processors operating on only one row at a time. Ideally, I'd have a PL/pgSQL function that uniquely updates a row (reserves it for operation by a given process or thread) and returns only one row at a time from a table, like this:

    int row_id = reserve_next_row(int processor_id);

I'm tripping all over myself with transaction isolation modes, transactions, PL/pgSQL, and basic table locks. The behavior I expect is not happening. How should I go about implementing a synchronized process id queue that will select one unique row from a table at a time and make the selection safe with concurrent accesses?

To get more technical with my attempts: I am using Java/JDBC, PostgreSQL 7.3.2, and PL/pgSQL, and have the following code:

---------- 8< -------------------- 8< -------------------- 8< ----------
CREATE OR REPLACE FUNCTION reserve_next_import (bigint)
RETURNS bigint AS'
DECLARE
    processor_id ALIAS FOR $1;
    my_import_id BIGINT;
    my_number INTEGER;
    my_import_state CHAR;
BEGIN
    -- Gotta get a handle on this semaphore before you can get in here
    LOCK TABLE import IN EXCLUSIVE MODE;

    my_import_id := -1;

    -- Find the import ID we wish to reserve and get a lock on that row
    SELECT import_id, import_state
    INTO my_import_id, my_import_state
    FROM import
    WHERE import_state = ''Q''
      AND import_processor_id IS NULL
    ORDER BY import_id
    LIMIT 1;
    --FOR UPDATE;  -- set this, and processes hit the NOT FOUND below

    IF NOT FOUND THEN
        RAISE NOTICE ''No Items left in the Queue.'';
        RETURN (-1);
    END IF;

    -- now go reserve the right to process that record
    UPDATE import SET
        import_processor_id = processor_id,
        import_prc_start = NULL,
        import_prc_end = NULL,
        import_state = ''R''
    WHERE import_id = my_import_id;

    -- return the ID for us to process...
    RETURN (my_import_id);
END;
'LANGUAGE 'plpgsql';
---------- 8< -------------------- 8< -------------------- 8< ----------

Ideally, I could call this function from psql or JDBC and have it block all access by other processes or threads via 'LOCK TABLE import IN EXCLUSIVE MODE', making sure that only one process gets inside the function at a time (like a Java 'synchronized' block). Well, the problem is that my psql instances still see different views of the data: even though one process reserves an ID and updates the state to 'R', the next process doesn't see that update (if it has already started the function and is waiting at the lock), and so it will reserve the same ID in the SELECT.

I attempted to fix this by using SELECT FOR UPDATE in my select statement, but the problem there is that the SELECT from the second process then finds nothing and a -1 is returned by my function.

OK, so you see what I want? I want a function that locks all access to a table, reserves an ID, and then releases the lock. I then want any other processes that are waiting for that lock to immediately see the updated information as they run the code inside the function. If I set the transaction isolation level to SERIALIZABLE in JDBC, then calls will fail if another process is inside the function already. I don't want that either. I want processes to WAIT at the lock, then get in and successfully get their own ID.

How is this done? Anybody have an answer for me? Am I going about this the right way? Much help appreciated!

Dante

D. Dante Lorenso
dante@lorenso.com
972-333-4139
"D. Dante Lorenso" <dante@lorenso.com> writes:
> How should I go about implementing a synchronized process id
> queue that will select one unique row from a table at a time
> and make the selection be safe with concurrent accesses?

You can find various discussions of this in the archives, but a
reasonable way to proceed is:

1. The table of pending or in-process jobs has a column "processor_id"
that is zero for pending jobs and equal to the (unique) processor number
for active jobs. (Assume for the moment that completed jobs are removed
from the table entirely.)

2. When idle, you try to reserve a job like so:

	BEGIN;
	SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
	SELECT job_id, ... FROM job_table
	WHERE processor_id = 0 LIMIT 1 FOR UPDATE;

The SELECT has three possible outcomes:

2a: One row is returned. You do

	UPDATE job_table SET processor_id = $me
	WHERE job_id = $jobid;
	COMMIT;

and then go about executing the job. When done, delete the row from
job_table and try to get another one.

2b: No row is returned: no jobs are pending. Commit your transaction,
sleep for an appropriate delay period, and try again.

2c: You get a "can't serialize" failure. This will happen if two
processors try to reserve the same row at the same time. In this case,
roll back your transaction, sleep for a short interval (maybe a few
msec) and try again. You don't want to sleep as long as normal in this
case, since there might be another available job.

(Note that you *cannot* do this in a plpgsql function, since it cannot
start or commit a transaction; these commands have got to be directly
issued by the application.)

Assuming that there aren't a vast number of pending jobs at any time,
this should work pretty well without even bothering with an index on
job_table. You will want to vacuum it often though (at least every few
hundred job completions, I'd think).

Now, what if you wanted to remember completed jobs? I'd actually
recommend transferring the records of completed jobs to a different
table. But if you really want to keep them in the same table, maybe
add a boolean "completed" field, and make the initial SELECT be

	SELECT job_id, ... FROM job_table
	WHERE processor_id = 0 AND NOT completed LIMIT 1 FOR UPDATE;

Now you *will* need an index to keep things speedy. I'd try a partial
index on processor_id with condition "NOT completed". You'll still
need frequent vacuums.

			regards, tom lane
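[Editorial note: Tom's three-outcome protocol can be sketched in miniature outside the database. The following is a hypothetical Python simulation, with threads and an in-memory dict standing in for processors and job_table, and a `CantSerialize` exception standing in for PostgreSQL's "can't serialize" error; it shows that the reserve/fail/retry loop hands each job to exactly one worker.]

```python
import random
import threading
import time

# Hypothetical in-memory stand-in for job_table: job_id -> processor_id (0 = pending)
jobs = {i: 0 for i in range(1, 21)}
table_lock = threading.Lock()          # crude stand-in for row-level locking
claimed = []                           # (job_id, processor_id) audit trail
claimed_lock = threading.Lock()

class CantSerialize(Exception):
    """Stands in for PostgreSQL's 'can't serialize access' failure (outcome 2c)."""

def try_reserve(me):
    # SELECT ... WHERE processor_id = 0 LIMIT 1 FOR UPDATE (snapshot read)
    with table_lock:
        pending = [j for j, p in jobs.items() if p == 0]
        candidate = min(pending) if pending else None
    if candidate is None:
        return None                    # outcome 2b: no jobs are pending
    # UPDATE ...; COMMIT -- fails if another worker got the row in between
    with table_lock:
        if jobs[candidate] != 0:
            raise CantSerialize()      # outcome 2c: lost the race
        jobs[candidate] = me           # outcome 2a: row reserved
    return candidate

def worker(me):
    while True:
        try:
            job = try_reserve(me)
        except CantSerialize:
            time.sleep(random.uniform(0, 0.002))   # short sleep, then retry
            continue
        if job is None:
            return                     # queue drained; a real worker would sleep and poll
        with claimed_lock:
            claimed.append((job, me))  # "execute" the job

threads = [threading.Thread(target=worker, args=(pid,)) for pid in (1, 2, 3)]
for t in threads: t.start()
for t in threads: t.join()

assert sorted(j for j, _ in claimed) == list(range(1, 21))  # each job reserved exactly once
```

The retry lives in the caller, matching Tom's note that the loop cannot live inside a plpgsql function.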
Tom,

Thanks for the excellent reply. I was tossing solutions back and forth and came across this one, but I don't like the idea of failing on a transaction and having to retry it after a delay, so I've come up with this... What do you think of my alternative solution?

In Java, I have a function like this which begins a transaction, locks a dummy table exclusively, and then runs the stored procedure to reserve the next record for processing:

---------- 8< -------------------- 8< --------------------
public int reserveQueuedImport(int pid)
throws SQLException {
    Connection conn = LeadDBConnection.getConnection();

    // Reserve an import (for processing), and return its ID.
    PreparedStatement pstmt = conn.prepareStatement(
        "" +
        "BEGIN TRANSACTION; " +
        "LOCK TABLE import_lock IN EXCLUSIVE MODE; " +
        "SELECT reserve_next_import(?) AS import_id; " +
        "COMMIT; ");
    pstmt.setInt(1, pid);
    ResultSet rec = pstmt.executeQuery();

    // get the value from the first row and first column
    rec.first();
    return (rec.getInt(1));
}
---------- 8< -------------------- 8< --------------------

Meanwhile, the PL/PGSQL stored procedure looks like this:

---------- 8< -------------------- 8< --------------------
CREATE OR REPLACE FUNCTION reserve_next_import (bigint)
RETURNS bigint AS'
DECLARE
    processor_id ALIAS FOR $1;
    my_import_id BIGINT;
BEGIN
    -- initialize the id
    my_import_id := -1;

    -- Find the import ID we wish to reserve and get a lock on that row
    SELECT import_id
    INTO my_import_id
    FROM import
    WHERE import_state = ''Q''
      AND import_processor_id IS NULL
    ORDER BY import_id
    LIMIT 1
    FOR UPDATE;

    -- abort if there are no queued rows
    IF NOT FOUND THEN
        RETURN (-1);
    END IF;

    -- now go reserve the record with our processor id
    UPDATE import SET
        import_processor_id = processor_id,
        import_prc_start = NULL,
        import_prc_end = NULL,
        import_state = ''R''
    WHERE import_id = my_import_id;

    -- this is the row we reserved...
    RETURN (my_import_id);
END;
'LANGUAGE 'plpgsql';
---------- 8< -------------------- 8< --------------------

What I've done is used 'LOCK TABLE import_lock IN EXCLUSIVE MODE;' to create a 'synchronized' block around the code which reserves the item in the queue. This way, only one application or thread can run the PL/PGSQL function at a given time. There will be BLOCKING for applications that sit at the LOCK call, but that's more desirable than the fail/retry approach, eh?

Can you confirm that this solution will perform as I expect while keeping the transaction isolation level at READ COMMITTED instead of SERIALIZABLE?

Oh, yeah, and as a note: the only purpose of the 'import_lock' table is to provide an object to LOCK on for this code. This table is empty and is not used for any other purpose. Are there any 'lighter' objects I can create or lock on in PostgreSQL than a table like this?

Dante

D. Dante Lorenso
dante@lorenso.com
972-333-4139

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "D. Dante Lorenso" <dante@lorenso.com>
Cc: <pgsql-general@postgresql.org>
Sent: Monday, May 12, 2003 9:23 AM
Subject: Re: [GENERAL] Serialization, Locking...implement processing Queue with a table
[...]
On Mon, 12 May 2003 14:28:18 -0500, "D. Dante Lorenso"
<dante@lorenso.com> wrote:
>BLOCKING [is] more desirable than the Fail/Retry
>approach, eh?
>
>Can you confirm that this solution will perform as I expect
>while keeping the transaction isolation level at a
>READ COMMITTED mode instead of SERIALIZABLE?

Starting with Tom's suggestion I played around with READ COMMITTED.

>>2. When idle, you try to reserve a job like so:
>>
>>	BEGIN;  -- TRANSACTION ISOLATION LEVEL is READ COMMITTED by default
>>	SELECT job_id, ... FROM job_table
>>	WHERE processor_id = 0 LIMIT 1 FOR UPDATE;

The SELECT may be blocked for a while and has two possible outcomes:

>>2a: One row is returned. You do
>>
>>	UPDATE job_table SET processor_id = $me
>>	WHERE job_id = $jobid;
>>	COMMIT;
>>
>>and then go about executing the job.
>>
>>2b: No row is returned:

This can have one of two reasons:

(i) No jobs are pending. As Tom said: commit your transaction, sleep
for an appropriate delay period, and try again.

(ii) The row has been reserved by another transaction running at the
same time. In this case, restart at SELECT FOR UPDATE. You can stay
in the same transaction. And you don't need to sleep (this has
already happened while SELECT FOR UPDATE was blocked), unless there
are lots of job processors, in which case it might be advisable to
sleep for a short random time.

How do you distinguish between (i) and (ii)? Just do

	SELECT job_id FROM job_table
	WHERE processor_id = 0 LIMIT 1;

If this returns 0 rows, you have (i).
If it returns one row, you have (ii).

I didn't try, but you should be able to do this in a function. Your
function looks like a good start; you just have to add a retry loop
and a test for (i) vs. (ii):

>CREATE OR REPLACE FUNCTION reserve_next_import (bigint)
>RETURNS bigint AS'
>DECLARE
>    processor_id ALIAS FOR $1;
>    my_import_id BIGINT;
>BEGIN
>    -- initialize the id
>    my_import_id := -1;

     -- start of loop here!

>    -- Find the import ID we wish to reserve and get a lock on that row
>    SELECT import_id INTO my_import_id
>    FROM import
>    WHERE import_state = ''Q''
>      AND import_processor_id IS NULL
>    ORDER BY import_id
>    LIMIT 1
>    FOR UPDATE;
>
>    -- abort if there are no queued rows
>    IF NOT FOUND THEN

         SELECT ...;  -- without FOR UPDATE
         IF NOT FOUND THEN
>            RETURN (-1);
         ELSE
             continue at top of loop
         END IF;

>    END IF;
>
>    -- now go reserve the record with our processor id
>    UPDATE import SET
>        import_processor_id = processor_id,
>        import_prc_start = NULL,
>        import_prc_end = NULL,
>        import_state = ''R''
>    WHERE import_id = my_import_id;
>
>    -- this is the row we reserved...
>    RETURN (my_import_id);
>END;
>'LANGUAGE 'plpgsql';

Servus
 Manfred
On Wed, 14 May 2003 04:36:35 -0500, "D. Dante Lorenso"
<dante@lorenso.com> wrote:
>> FUNCTION reserve_job
>> BEGIN
>>   SELECT and LOCK row_id matching our criteria. (SELECT FOR UPDATE)
>>
>>   IF (row_id was found) THEN
>>     RESERVE row_id (UPDATE)
>>     RETURN (row_id) -- done
>>   ELSE
>>     Ask "Are you Sure?" there are no rows matching our criteria?
>>     IF (certainly no row_id exists) THEN
>>       RETURN (0) -- no row ID exists
>>     ELSE
>>       RETURN reserve_job -- recursive call
>>     END IF
>>   END IF
>> END

What I had in mind was more like

	FUNCTION reserve_job
	BEGIN
	  LOOP
	    SELECT ... FOR UPDATE;
	    IF (row_id was found) THEN
	      RESERVE row_id (UPDATE);
	      RETURN (row_id);  -- done
	    ELSE
	      SELECT ...;  -- without FOR UPDATE
	      IF (certainly no row_id exists) THEN
	        RETURN (0)  -- no row ID exists
	      -- ELSE
	      --   continue loop
	      END IF;
	    END IF;
	  END LOOP;
	END;

>Well, DAMNIT, this doesn't work.
>
>The problem is that this function does not see the COMMITTED
>data from other functions. I've updated my code to the following:

You're right :-(  I did some more tests and ended up with this (note,
however, that I have different names and data types):

CREATE OR REPLACE FUNCTION get_next_job (int)
RETURNS int AS '
DECLARE
    in_pid ALIAS FOR $1;
    my_reserved_id int;
BEGIN
    -- Find the ID we wish to reserve and get a lock on that row
    SELECT id INTO my_reserved_id
    FROM job
    WHERE pr = 0
    ORDER BY id
    LIMIT 1
    FOR UPDATE;

    -- abort if there are no queued rows
    IF NOT FOUND THEN
        -- check again ...
        SELECT id INTO my_reserved_id
        FROM job
        WHERE pr = 0
        ORDER BY id
        LIMIT 1;

        IF NOT FOUND THEN
            RETURN (-1);
        ELSE
            RAISE NOTICE ''GOT LOCKED IN RACE [%]'', my_reserved_id;
            RETURN (0);  -- retry
        END IF;
    ELSE
        -- now go reserve the record with our processor id
        UPDATE job SET pr = in_pid WHERE id = my_reserved_id;

        -- this is the row we reserved...
        RETURN (my_reserved_id);
    END IF;
END;
' LANGUAGE 'plpgsql';

So the caller has to cooperate a little:

	while (...)
	{
	    job_id = get_next_job(my_proc_id);
	    if (job_id < 0)
	    {
	        sleep(...);
	    }
	    else if (job_id == 0)
	    {
	        /* retry */
	    }
	    else
	    {
	        process(job_id);
	    }
	}

Make sure that get_next_job() and process() are not called within the
same transaction.

>'LANGUAGE 'plpgsql' STABLE CALLED ON NULL INPUT SECURITY INVOKER;

BTW, I wouldn't call this function STABLE.

You said in your other message:

|PROS
|  - in theory, code SHOULD be entirely contained within a
|    single stored procedure (less application coding needed)
|  - no locks needed (faster execution?)
|CONS
|  - requires extra queries to determine empty queue
|  - may starve if a single process continues to grab the same
|    row as other processes

Unlikely, unless you have really lots of processes. If two processes
see the same request, only one of them can grab it. This process will
be busy for a while processing the request, while the other process
will immediately retry and grab the next open request.

|  - need recursive calls in PL/PGSQL?

Better use a loop (in the caller). IMHO the biggest PRO is:

+ You can put additional conditions into the WHERE clause (e.g.
  queue_no = 42) and processors looking for different kinds of
  requests will not block each other.

Servus
 Manfred
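[Editorial note: Manfred's three-way caller contract (-1 = queue empty, 0 = lost a race, >0 = job id) can be sketched as follows. This is a hypothetical Python model with a stubbed `get_next_job()`, not JDBC code; the stub's response sequence is made up for illustration.]

```python
# Hypothetical stub for the get_next_job() stored procedure: it yields a couple
# of race-retry signals (0), two real job ids, then -1 for "queue empty".
_responses = iter([0, 7, 0, 8, -1])

def get_next_job(proc_id):
    return next(_responses)

processed = []

def run_worker(proc_id, max_sleeps=1):
    sleeps = 0
    while sleeps < max_sleeps:
        job_id = get_next_job(proc_id)
        if job_id < 0:
            sleeps += 1          # queue empty: a real worker would sleep, then poll again
        elif job_id == 0:
            continue             # lost a race: retry immediately, no sleep needed
        else:
            processed.append(job_id)   # process(job_id), in its own transaction

run_worker(proc_id=1)
assert processed == [7, 8]       # race signals are retried, real jobs are processed
```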
> ---------- 8< -------------------- 8< ----------
> FUNCTION reserve_job
> BEGIN
>   SELECT and LOCK row_id matching our criteria. (SELECT FOR UPDATE)
>
>   IF (row_id was found) THEN
>     RESERVE row_id (UPDATE)
>     RETURN (row_id) -- done
>   ELSE
>     Ask "Are you Sure?" there are no rows matching our criteria?
>     IF (certainly no row_id exists) THEN
>       RETURN (0) -- no row ID exists
>     ELSE
>       RETURN reserve_job -- recursive call
>     END IF
>   END IF
> END
>
> I am going to try to implement your suggestion and see
> what I get in comparison to what I am seeing now with my
> LOCK/WAIT code.

Well, DAMNIT, this doesn't work.

The problem goes back to whatever the heck is happening inside the
PL/PGSQL function?! The problem is that this function does not see the
COMMITTED data from other functions. I've updated my code to the
following:

---------- 8< -------------------- 8< -------------------- 8< ----------
CREATE OR REPLACE FUNCTION "public"."subs_get_next_queued" (bigint)
RETURNS bigint AS'
DECLARE
    in_pid ALIAS FOR $1;
    my_reserved_id BIGINT;
BEGIN
    -- initialize the id
    my_reserved_id := -1;

    -- Find the ID we wish to reserve and get a lock on that row
    SELECT subs_id INTO my_reserved_id
    FROM subscription
    WHERE subs_start_bill_date <= now()
      AND subs_next_bill_date <= now()
      AND subs_processor_id IS NULL
    ORDER BY subs_id ASC
    LIMIT 1
    FOR UPDATE;

    -- abort if there are no queued rows
    IF NOT FOUND THEN
        -- check again to see if there is something I need to process...
        SELECT subs_id INTO my_reserved_id
        FROM subscription
        WHERE subs_start_bill_date <= now()
          AND subs_next_bill_date <= now()
          AND subs_processor_id IS NULL
        ORDER BY subs_id ASC
        LIMIT 1;

        IF NOT FOUND THEN
            RETURN (-1);
        ELSE
            RAISE NOTICE ''GOT LOCKED IN RACE [%]'', my_reserved_id;
            SELECT subs_get_next_queued(in_pid) INTO my_reserved_id;
            RETURN (my_reserved_id);
        END IF;
    END IF;

    -- now go reserve the record with our processor id
    UPDATE subscription
    SET subs_processor_id = in_pid
    WHERE subs_id = my_reserved_id;

    -- this is the row we reserved...
    RETURN (my_reserved_id);
END;
'LANGUAGE 'plpgsql' STABLE CALLED ON NULL INPUT SECURITY INVOKER;
---------- 8< -------------------- 8< -------------------- 8< ----------

Here is what is happening... Process one is started and is humming
along just fine. Then I kick process 2 into gear, and as soon as a
collision occurs, process 1 spins out of control and goes nuts in an
infinite loop. Apparently it does not see the changes of process 2
even though they have already been committed.

    PROCESS 1                           PROCESS 2
    ---------                           ---------
    subs_get_next_queued
    ----------------------
    126
    (1 row)
                                        subs_get_next_queued
                                        ----------------------
                                        129
                                        (1 row)
    subs_get_next_queued
    ----------------------
    140
    (1 row)
    NOTICE: GOT LOCKED IN RACE [140]
    NOTICE: GOT LOCKED IN RACE [140]
    NOTICE: GOT LOCKED IN RACE [140]
    NOTICE: GOT LOCKED IN RACE [140]
    ...
    (continues forever and takes down
    the PostgreSQL server. OUCH!)
                                        subs_get_next_queued
                                        ----------------------
                                        176
                                        (1 row)
                                        subs_get_next_queued
                                        ----------------------
                                        182
                                        (1 row)
                                        (continues until PostgreSQL dies
                                        because of the other process's
                                        doings)

Can someone please explain what is happening to the PL/PGSQL function
in regards to commit visibility and transactions?

Dante

D. Dante Lorenso
dante@lorenso.com
972-333-4139
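[Editorial note: the runaway loop above can be modeled conceptually. The function's view of the table is frozen in a snapshot, so it keeps seeing row 140 as queued, while the current data says it is reserved; the recursive retry therefore can never make progress. A hypothetical Python caricature (all names and the sentinel value are made up; this is an illustration of the visibility problem, not of how the server is implemented):]

```python
# Hypothetical model: process 2 has committed, so the *current* data shows
# row 140 as reserved ('R'), but process 1's frozen snapshot still shows 'Q'.
real_table = {140: "R"}
snapshot = {140: "Q"}

def subs_get_next_queued(view, depth=0):
    queued = sorted(i for i, s in view.items() if s == "Q")
    if not queued:
        return -1                       # queue looks empty
    jid = queued[0]
    if real_table.get(jid) != "Q":      # FOR UPDATE finds the row already taken
        if depth > 5:
            return "INFINITE LOOP"      # in reality: loops until the server goes down
        # "GOT LOCKED IN RACE" -> recursive retry, but with the SAME frozen view,
        # so it picks the same row again, forever
        return subs_get_next_queued(view, depth + 1)
    real_table[jid] = "R"
    return jid

result = subs_get_next_queued(snapshot)
assert result == "INFINITE LOOP"
```

If the retry instead re-read current data (as a client-side loop does), the second attempt would simply see no queued rows and return -1.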
> (i) No jobs are pending. Commit your transaction,
> sleep for an appropriate delay period, and try again.
>
> (ii) The row has been reserved by another transaction running at the
> same time. In this case, restart at SELECT FOR UPDATE. You can stay
> in the same transaction. And you don't need to sleep (this has
> already happened while SELECT FOR UPDATE was blocked), unless there
> are lots of job processors, in which case it might be advisable to
> sleep for a short random time.
>
> How do you distinguish between (i) and (ii)? Just do
>	SELECT job_id FROM job_table
>	WHERE processor_id = 0 LIMIT 1;
> If this returns 0 rows, you have (i).
> If it returns one row, you have (ii).

You can't do it this way, because if this select IS successful, you'll
still need to LOCK the row for the update. Any way you try, you're in
a race condition whereby another running thread may lock and/or update
that row after you select. Remember, we are in READ COMMITTED mode.
Without a lock on the row or the table, chances are that the row will
be updated out from under you.

Here is a process that I've come up with that locks a bogus table to
guarantee that only one process has the right to enter the stored
procedure:

---------- 8< --------------------
BEGIN TRANSACTION;
LOCK TABLE proc_lock IN EXCLUSIVE MODE;
SELECT proc($process_id) AS RESULT;
COMMIT;
---------- 8< --------------------

If I wrap my stored procedure calls with a call like this, I can
guarantee that only one process gets INSIDE the procedure at a time.
The "proc_lock" table can be any table; in my case I use a table named
the same as the table I'm going to be operating on, with _lock added
to the end. This way I don't have to lock the actual working table,
which stays available for selects by other apps that don't care about
concurrency locks for queue processing. Now my stored procedure can do
any logic it wants and doesn't have to worry about concurrency.

I tried to put the 'LOCK TABLE' statement as the first line in the
PL/PGSQL procedure, but it didn't seem to work there. Ex:

---------- 8< --------------------
CREATE OR REPLACE FUNCTION reserve_next_queued (bigint)
RETURNS bigint AS'
DECLARE
    processor_id ALIAS FOR $1;
    my_reserved_id BIGINT;
BEGIN
    LOCK TABLE proc_lock IN EXCLUSIVE MODE;
    ...
    RETURN (my_reserved_id);
END;
'LANGUAGE 'plpgsql';
---------- 8< --------------------

Apparently locks work in the stored proc, but READ COMMITTED does not?
It looks to me like once the procedure has been entered, postgres has
taken a snapshot of the state of the database from BEFORE the
procedure was called. So the lock blocks as it should, but the
subsequent select sees the OLD state and not the committed updates
from the rows that were just changed by the other thread that held the
lock. In other words, it looks like THIS is happening with app (1) and
app (2) trying the same stored proc call:

    (1) begin transaction
    (2) begin transaction
    (1) call PL/PGSQL function
    (2) call PL/PGSQL function
    (1) LOCK table <- no wait, lock granted immediately
    (2) LOCK table <- lock already held, BLOCKED
    (1) process PL/PGSQL function
    (1) exit PL/PGSQL function
    (1) UNLOCK table
    (2) LOCK table <- lock granted
    (2) process PL/PGSQL function
    (1) COMMIT;
    (2) exit PL/PGSQL function
    (2) COMMIT;

Is that possible? Can it be that the lock is being freed before the
COMMIT is called? Or is the procedure somehow not allowing my other
apps to see the committed data immediately?

Pulling this lock out of the PL/PGSQL function into the calling
transaction wrapper worked, though. In the future, I'd like to see
PL/PGSQL allow me to embed transaction stuff like that. For now, I
think my current solution is clean enough to use without having to
write try/fail routines. This method tries, waits, then succeeds. It'd
have to WAIT that long anyhow, but with this approach the wait is
handled by PostgreSQL and not by my app.

I'm interested in knowing if anyone sees a FLAW with this design. From
my tests, I have code that works exactly as designed, but not as
cleanly as I'd like. What are your thoughts?

Dante

D. Dante Lorenso
dante@direct2prospect.com
972-333-4139
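[Editorial note: the LOCK/WAIT design above is essentially a mutex-guarded critical section. A hypothetical Python sketch, with `threading.Lock` standing in for `LOCK TABLE proc_lock IN EXCLUSIVE MODE` and a list of dicts standing in for the queue table, shows the blocking behavior handing out each row exactly once; table and column names are invented for illustration.]

```python
import threading

# Hypothetical queue table: five rows queued ('Q'), none reserved yet
queue_rows = [{"id": i, "state": "Q", "proc": None} for i in range(1, 6)]
proc_lock = threading.Lock()   # plays the role of LOCK TABLE proc_lock IN EXCLUSIVE MODE

def reserve_next(proc_id):
    # Everything inside the lock is the body of reserve_next_import():
    # callers block here, one at a time, instead of failing and retrying.
    with proc_lock:
        for row in queue_rows:
            if row["state"] == "Q" and row["proc"] is None:
                row["proc"] = proc_id
                row["state"] = "R"
                return row["id"]
        return -1              # no queued rows left

results = []

def worker(pid):
    while (rid := reserve_next(pid)) != -1:
        results.append(rid)    # "process" the reserved row

threads = [threading.Thread(target=worker, args=(p,)) for p in (1, 2)]
for t in threads: t.start()
for t in threads: t.join()

assert sorted(results) == [1, 2, 3, 4, 5]   # every row handed out exactly once
```

The trade-off mirrors the thread's discussion: callers serialize at the lock (no retries, but no concurrency inside the critical section either).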
On Wed, 14 May 2003 01:10:39 -0500, "D. Dante Lorenso"
<dante@lorenso.com> wrote:
>> How do you distinguish between (i) and (ii)? Just do
>>	SELECT job_id FROM job_table
>>	WHERE processor_id = 0 LIMIT 1;
>> If this returns 0 rows, you have (i).
>> If it returns one row, you have (ii).
>
>You can't do it this way

Oh.

> because if this select IS successful,
>you'll still need to LOCK the row for the update.

That's exactly the reason why I told you ...

|(ii) The row has been reserved by another transaction running at the
|same time. In this case, restart at SELECT FOR UPDATE.
                          ^^
                          not after!

If there is a row satisfying the WHERE clause but SELECT FOR UPDATE
does not return it, this is the effect of a race condition:

	SELECT xmax,* FROM job;
	 xmax | id | pr
	------+----+----
	    0 |  2 |  0
	    0 |  3 |  1
	    0 |  1 |  2
	    0 |  4 |  0

Session 1                               Session 2

BEGIN;
SELECT xmax,* FROM job
WHERE pr = 0 FOR UPDATE LIMIT 1;
 xmax | id | pr
------+----+----
    0 |  2 |  0
(1 row)

select xmax,* FROM job
WHERE id = 2;
 xmax  | id | pr
-------+----+----
 58634 |  2 |  0
(1 row)
                                        BEGIN;
                                        select xmax,* FROM job
                                        WHERE pr = 0 LIMIT 1;
                                         xmax  | id | pr
                                        -------+----+----
                                         58634 |  2 |  0
                                        (1 row)

                                        SELECT xmax,* FROM job
                                        WHERE pr = 0 FOR UPDATE LIMIT 1;
                                        -- waits, because the row with id=2
                                        -- satisfies the WHERE clause but
                                        -- is locked by transaction 58634 ...

UPDATE job SET pr = 1
WHERE id = 2;
-- The same would happen, if we did the
-- SELECT FOR UPDATE here (after the
-- UPDATE in the other session), because
-- our *visible* version of the row
-- still satisfies the WHERE clause.

select xmax,* FROM job
WHERE id = 2;
 xmax | id | pr
------+----+----
    0 |  2 |  1
(1 row)
-- xmax = 0 because we see
-- the newly inserted tuple

COMMIT;
                                        -- continues ...
                                         xmax | id | pr
                                        ------+----+----
                                        (0 rows)
                                        -- because the row this SELECT was
                                        -- about to return does not satisfy
                                        -- pr = 0 any more

                                        SELECT xmax,* FROM job
                                        WHERE pr = 0 LIMIT 1;
                                         xmax | id | pr
                                        ------+----+----
                                            0 |  4 |  0
                                        (1 row)
                                        -- but there is another row, so ...

                                        SELECT xmax,* FROM job
                                        WHERE pr = 0 FOR UPDATE LIMIT 1;
                                         xmax | id | pr
                                        ------+----+----
                                            0 |  4 |  0
                                        (1 row)
                                        -- does not necessarily return the
                                        -- same id as the previous SELECT

                                        UPDATE ...;
                                        COMMIT;

HTH.
Servus
 Manfred
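[Editorial note: the net effect Manfred demonstrates (after the lock wait, the WHERE clause is re-checked, so the blocked session skips the row the other session just reserved and moves on to the next one) can be mimicked with a coarse in-memory model. Hypothetical Python; real MVCC semantics are far more subtle than a single lock.]

```python
import threading

# Hypothetical rows mimicking Manfred's job table: id -> pr (0 = pending)
job = {2: 0, 3: 1, 1: 2, 4: 0}
row_lock = threading.Lock()    # crude stand-in for the FOR UPDATE row lock

def reserve(pid):
    """Mimics SELECT ... WHERE pr = 0 FOR UPDATE LIMIT 1 followed by UPDATE:
    once inside the critical section, the WHERE clause is re-evaluated, so a
    row already grabbed by the other session is skipped and the next pending
    row (if any) is taken instead."""
    with row_lock:
        pending = sorted(j for j, pr in job.items() if pr == 0)
        if not pending:
            return None
        jid = pending[0]
        job[jid] = pid         # UPDATE job SET pr = pid WHERE id = jid
        return jid

got = {}

def session(pid):
    got[pid] = reserve(pid)

t1 = threading.Thread(target=session, args=(101,))
t2 = threading.Thread(target=session, args=(102,))
t1.start(); t2.start(); t1.join(); t2.join()

# Both sessions want id=2 first, but thanks to the re-check one gets 2 and
# the other ends up with 4 -- no double reservation.
assert sorted(got.values()) == [2, 4]
```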
"D. Dante Lorenso" <dante@lorenso.com> writes:
> You can't do it this way because if this select IS successful,
> you'll still need to LOCK the row for the update.

That's why you should be using SELECT FOR UPDATE.

			regards, tom lane
Ah. Very good, Manfred. I see what you are saying now. This will
indeed work. And it works without locking. I like it. Hmmm...

So, to convert your SQL back into pseudo-code of my own... you are
saying it works something like this:

---------- 8< -------------------- 8< ----------
FUNCTION reserve_job
BEGIN
  SELECT and LOCK row_id matching our criteria. (SELECT FOR UPDATE)

  IF (row_id was found) THEN
    RESERVE row_id (UPDATE)
    RETURN (row_id) -- done
  ELSE
    Ask "Are you Sure?" there are no rows matching our criteria?
    IF (certainly no row_id exists) THEN
      RETURN (0) -- no row ID exists
    ELSE
      RETURN reserve_job -- recursive call
    END IF
  END IF
END
---------- 8< -------------------- 8< ----------

OK, I see this SHOULD INDEED work. So now my question is about costs.
Assuming both approaches (test/retry vs. lock/wait) yield a functional
result, which is best? Which is most efficient? Here are some pros and
cons:

LOCK/WAIT
--------------------------------
PROS
  - fewer queries made: only one lock and one select per call
  - easy to implement
  - access to the function is granted in the order requested (per
    PostgreSQL's handling of locks - expectation is that no starvation
    should occur)
CONS
  - requires setting locks in application code, which could be
    forgotten by app developers
  - locks are placed outside the function, blocking the entire
    function, which may contain code that could safely run
    concurrently

TEST/RETRY
--------------------------------
PROS
  - in theory, code SHOULD be entirely contained within a
    single stored procedure (less application coding needed)
  - no locks needed (faster execution?)
CONS
  - requires extra queries to determine empty queue
  - may starve if a single process continues to grab the same
    row as other processes
  - need recursive calls in PL/PGSQL?

I am going to try to implement your suggestion and see what I get in
comparison to what I am seeing now with my LOCK/WAIT code.

Dante

D. Dante Lorenso
dante@direct2prospect.com
972-333-4139

> On Wed, 14 May 2003 01:10:39 -0500, "D. Dante Lorenso"
> <dante@lorenso.com> wrote:
> [...]
"D. Dante Lorenso" <dante@lorenso.com> writes:
> Can someone please explain what is happening to the PL/PGSQL function
> in regards to commit visibility and transactions?

Since the function is executed as part of a single client SELECT
command, the whole thing runs with a single snapshot; its view of
other processes' actions on the database is frozen at the start of
that outer SELECT, even in READ COMMITTED mode.

There have been past discussions about whether this is a good idea or
not, but so far no consensus to change it, IIRC. (Look for
"SetQuerySnapshot" in the archives, if you care.)

But at any rate, because of that behavior it's really impossible to
encapsulate the locking behavior entirely in a plpgsql function; the
wait-and-retry loop *must* be on the client side. That leaves me
doubting that there's much value in trying to encapsulate any of it.
As Manfred noted, you may eventually want to add additional conditions
to the SELECT, and that's just lots easier to do if the logic is all
in one place rather than split between client and server code.

I still like my serializable-mode solution better than any of the
alternatives proposed so far. Manfred's latest try doesn't solve the
race-condition problem, because the second SELECT will never find
anything the first one didn't find. (It could if it were on the client
side... but you don't need two selects at all with the
serializable-mode solution.)

			regards, tom lane
> Date: Sun, 16 Jun 2002 19:53:15 -0400
> So I've come around to agree with the position that Tatsuo and Hiroshi
> put forward in the thread mentioned above: plpgsql (and the other PL
> languages) need to do SetQuerySnapshot not only CommandCounterIncrement
> between user-supplied queries.
>
> Is anyone still unconvinced? If not, I'll try to fix it sometime soon.
>
> As that thread pointed out, there also seem to be some problems with
> plpgsql not doing enough CommandCounterIncrements when it's executing
> already-planned queries; I'll take a look at that issue at the same
> time.
>
> 			regards, tom lane

Does this mean that you will be making changes to support READ COMMITTED
mode acting like READ COMMITTED mode when inside PL/PGSQL procedures?
... or am I pulling up an archive out of place?

I feel that my PL/PGSQL procedures are acting as if they were pinned
inside REPEATABLE READ mode instead of READ COMMITTED. I'd prefer to
always have the transaction isolation level set to what I ask for
rather than this default behavior. It seriously cripples the power of
the PL/PGSQL language to have the feature NOT operate this way.

The neat thing about stored procedures is being able to embed common
logic inside the database and ensure consistency of implementation
across multiple client apps and client languages, as well as to
minimize client/server traffic.

If you still do not have consensus, maybe someone just needs to ask
for a new poll?

Dante

D. Dante Lorenso
dante@lorenso.com
972-333-4139

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "D. Dante Lorenso" <dante@lorenso.com>
Cc: "Manfred Koizar" <mkoi-pg@aon.at>; <pgsql-general@postgresql.org>
Sent: Wednesday, May 14, 2003 9:53 AM
Subject: Re: [GENERAL] Serialization, Locking...implement processing
Queue with a table

> "D. Dante Lorenso" <dante@lorenso.com> writes:
> > Can someone please explain what is happening to the PL/PGSQL function
> > in regards to commit visibility and transactions?
>
> Since the function is executed as part of a single client SELECT
> command, the whole thing runs with a single snapshot; its view of
> other process' actions on the database is frozen at the start of that
> outer SELECT, even in READ COMMITTED mode.
>
> There have been past discussions about whether this is a good idea
> or not, but so far no consensus to change it, IIRC. (Look for
> "SetQuerySnapshot" in the archives, if you care.)
>
> But at any rate, because of that behavior it's really impossible to
> encapsulate the locking behavior entirely in a plpgsql function; the
> wait-and-retry loop *must* be on the client side. That leaves me
> doubting that there's much value in trying to encapsulate any of it.
> As Manfred noted, you may eventually want to add additional conditions
> to the SELECT, and that's just lots easier to do if the logic is all
> in one place rather than split between client and server code.
>
> I still like my serializable-mode solution better than any of the
> alternatives proposed so far. Manfred's latest try doesn't solve
> the race-condition problem, because the second SELECT will never
> find anything the first one didn't find. (It could if it were on
> the client side... but you don't need two selects at all with the
> serializable-mode solution.)
>
> 			regards, tom lane
>
I really want to see newly committed data when I am INSIDE a PL/PGSQL
function. This way, if I do a LOCK on a table and subsequently perform
a SELECT after acquiring the lock, I know I'll see the data that any
previous transactions just updated (hence they released my lock)...

So, if you don't want to do the auto 'SetQuerySnapshot' calls as they
probably should be done, is there a way I can call this from the
PL/PGSQL function myself, directly? Something like this:

-------------------- 8< --------------------
CREATE OR REPLACE...
...
BEGIN
    LOCK TABLE blah_blah ... EXCLUSIVE MODE;

    SetQuerySnapshot();  <----------- I want something like this

    SELECT id
    FROM table
    WHERE blah blah blah
    ...
    FOR UPDATE;

    IF NOT FOUND THEN
        ...
    ELSE
        ...
    END IF;
END;
...
-------------------- 8< --------------------

Something like that might solve my problem whereby I'd like the
PL/pgSQL function to be able to work as if it were in READ COMMITTED
mode.

Dante

D. Dante Lorenso
dante@lorenso.com
972-333-4139
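The effect Dante is after can be had today by moving the transaction to the client: in READ COMMITTED mode each statement the client sends takes a fresh snapshot, so a SELECT issued after the LOCK TABLE is granted does see rows committed while the session waited on the lock. A sketch using his `import` table, run statement by statement over psql or JDBC (processor id 42 is an example value):

```sql
-- Executed from the client, not inside a function: each statement
-- gets its own snapshot under READ COMMITTED, so the SELECT sees
-- whatever committed while we queued for the lock.
BEGIN;
LOCK TABLE import IN EXCLUSIVE MODE;

SELECT import_id
  FROM import
 WHERE import_state = 'Q'
   AND import_processor_id IS NULL
 ORDER BY import_id
 LIMIT 1;

UPDATE import
   SET import_processor_id = 42,   -- this worker's id (example)
       import_state = 'R'
 WHERE import_id = 1;              -- use the id the SELECT returned
COMMIT;  -- releases the exclusive lock for the next worker
```

The exclusive table lock serializes the workers, so this trades concurrency for simplicity compared with the FOR UPDATE and serializable-mode approaches discussed above.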