Thread: Small PosgreSQL locking function request - with bounty

Small PosgreSQL locking function request - with bounty

From
David Noel
Date:
I have a few database queries that I've been running from within a
Java project. I have recently come to the understanding that I need to
run them instead within the PostgreSQL server as stored functions. I
have that understanding because I need to make use of locking
functionality, and that seems only able to be done by means of
PostgreSQL functions. Transactions don't seem to be able to provide
this. I've never written functions for postgres, so I thought maybe
someone here could help.

To provide some context: the code is a part of a webcrawler. More
specifically, it is a part of the queuing system that handles the
management of URL's to be crawled. The system takes URL's from the
queue with one query, and marks them as active with a second. It then
sends the results on to the crawler. Once the crawler has crawled the
URL, a third query removes the URL from the queue.

The code is running concurrently in multiple threads on multiple
servers, and in scaling it to multiple servers I've run into some
problems. It seems that due to the way postgres is designed I am
unable to lock tables, or utilize transactions in Java to acheive
concurrency. So I need it instead to be run as a postgres
function/stored procedure. It seems. Am I correct in this, or did I
misread the PosgreSQL Transactions documentation?

Assuming the only way to accomplish this is with a postgres function,
would anyone care to implement this for a small reward (though
relative to the amount of work required I'd say it's probably a decent
to large reward)?

The queries are as follows:

String querySelect =
"select \"URL\",\"PublishDate\",\"SiteName\",\"Classification\",\"Special\",\"NextCrawlDate\"
from \"crawlq\" " +
"where \"Special\" = ? " +
"AND \"Active\" = 'true' " +
"AND \"TimeoutDate\" <= now() " +
"AND \"CrawlError\" = 'false' " +
"OR " +
"\"Special\" = ? " +
"AND \"Active\" = 'false' " +
"AND \"CrawlError\" = 'false' " +
"order by \"NextCrawlDate\" asc limit 1";

String queryUpdateActive =
"update \"crawlq\" " +
"set \"Active\" = 'true', " +
"\"TimeoutDate\" = now() + interval '5 minutes' " +
"where \"URL\" = ? " ;

This is what I need the function to do:

I need the PostgreSQL function to first lock the table "crawlq".
I then need it to perform the "querySelect" query.
I then need it to perform the "queryUpdateActive" query.
I then need it to unlock the table.
I then need it to return the values from the select query to the Java project.

Deliverables: I need the postgres function and a simple java program
that calls the function and returns a result set. It doesn't need to
do anything with the data, just call the function and return the value
to the program. This should be a very simple project that shouldn't
take more than 15 minutes for anyone familiar with writing postgres
functions. Would $50 via PayPal be enough to entice anyone to offer a
solution? I could also offer the payment in LiteCoins if you'd rather
do it that way. Of course if you're feeling benevolent I wouldn't
object to anyone who felt like doing it for free.

The Java function I'm using that fetches elements from the queue
currently is as follows:

    public synchronized FetchType fetch(String cs){
        if(debug_level == 1)
            System.out.println(new java.util.Date(System.currentTimeMillis()) +
" : DAO : fetching element from database");

        /**
         * prepare the select query
         * execute it -- pull the items from the queue database
         * load the query results into a return container
         * clean up
         * prepare the update query
         * execute it -- update the record as active
         * clean up
         * commit the transaction
         * return the query results
         */

        try {
            if(!dbq.isValid(10))
                connectQ();
        } catch (SQLException e1) {
            e1.printStackTrace();
        }

        PreparedStatement stmt = null;
        PreparedStatement stmt2 = null;
        ResultSet rset = null;
        FetchType ret = null;

        // TODO: use a stored function
        String     querySelect    =
                "select \"URL\",\"PublishDate\",\"SiteName\",\"Classification\",\"Special\",\"NextCrawlDate\"
from \"crawlq\" "    +
                "where \"Special\" = ? "        +
                "AND \"Active\" = 'true' "            +
                "AND \"TimeoutDate\" <= now() "     +
                "AND \"CrawlError\" = 'false' "        +
                "OR "                                 +
                "\"Special\" = ? "             +
                "AND \"Active\" = 'false' "            +
                "AND \"CrawlError\" = 'false' "        +
                "order by \"NextCrawlDate\" asc limit 1";

        String queryUpdateActive =
                "update \"crawlq\" "                 +
                    "set \"Active\" = 'true', "     +
                    "\"TimeoutDate\" = now() + interval '5 minutes' "     +
                "where \"URL\" = ? "                ;

        try {
            stmt = dbq.prepareStatement(querySelect);
            stmt.setEscapeProcessing(true);
            stmt.setString(1, cs);
            stmt.setString(2, cs);
            rset = stmt.executeQuery();

            if(rset.next()){
                ret = new FetchType(
                        rset.getString("URL"),
                        rset.getString("SiteName"),
                        rset.getString("Classification"),
                        rset.getDate("PublishDate"),
                        rset.getString("Special")
                        );
            } else
                ret = null;

            rset.close();
            stmt.close();

            if (ret != null){
                stmt2 = dbq.prepareStatement(queryUpdateActive);
                stmt2.setEscapeProcessing(true);
                stmt2.setString(1, ret.getURL());
                stmt2.execute();
                stmt2.close();
                dbq.commit();
            }

            if(debug_level == 1)
                System.out.println(new java.util.Date(System.currentTimeMillis())
+ " : DAO : fetch complete " + ret.getURL());

            return ret;
        } catch (SQLException e) {
            try {
                e.printStackTrace();
                dbq.rollback();
                stmt.close();
                stmt2.close();
                e.printStackTrace();
                return null;
            } catch (SQLException e2) {
                e2.printStackTrace();
                return null;
            }
        }
    }

Running on one machine I'm bypassing the transaction concurrency issue
by synchronizing the method. But the Java concurrency constructs I'm
using here don't scale to multiple machines.

At any rate, have I provided enough information to get the solution
I'm looking for? Have I provided enough financial incentive to get
this implemented? If so, please respond with code here to the list so
multiple people don't implement it and expect to be paid. I can only
pay one person, though if another person fixes a bug in a proposed
solution I'm open to splitting the bounty however seems fair.

Thanks for reading, hope to hear back!

-David Noel


Re: Small PosgreSQL locking function request - with bounty

From
Ralf Schuchardt
Date:
Hi David,

have you used the "for update" clause in your select statements? With this clause "select" locks the selected row(s) in
atable for modifications and other "select for updates". 
My understanding is, that "for update" does what you need. You can execute your select and update statements in a
singletransaction in your Java application and get the desired effect (if the "URL" column is the primary key). 

Regards,
Ralf


Am 12.09.2013 um 13:40 schrieb David Noel <david.i.noel@gmail.com>:

> I have a few database queries that I've been running from within a
> Java project. I have recently come to the understanding that I need to
> run them instead within the PostgreSQL server as stored functions. I
> have that understanding because I need to make use of locking
> functionality, and that seems only able to be done by means of
> PostgreSQL functions. Transactions don't seem to be able to provide
> this. I've never written functions for postgres, so I thought maybe
> someone here could help.
>
> To provide some context: the code is a part of a webcrawler. More
> specifically, it is a part of the queuing system that handles the
> management of URL's to be crawled. The system takes URL's from the
> queue with one query, and marks them as active with a second. It then
> sends the results on to the crawler. Once the crawler has crawled the
> URL, a third query removes the URL from the queue.
>
> The code is running concurrently in multiple threads on multiple
> servers, and in scaling it to multiple servers I've run into some
> problems. It seems that due to the way postgres is designed I am
> unable to lock tables, or utilize transactions in Java to acheive
> concurrency. So I need it instead to be run as a postgres
> function/stored procedure. It seems. Am I correct in this, or did I
> misread the PosgreSQL Transactions documentation?
>
> Assuming the only way to accomplish this is with a postgres function,
> would anyone care to implement this for a small reward (though
> relative to the amount of work required I'd say it's probably a decent
> to large reward)?
>
> The queries are as follows:
>
> String querySelect =
> "select \"URL\",\"PublishDate\",\"SiteName\",\"Classification\",\"Special\",\"NextCrawlDate\"
> from \"crawlq\" " +
> "where \"Special\" = ? " +
> "AND \"Active\" = 'true' " +
> "AND \"TimeoutDate\" <= now() " +
> "AND \"CrawlError\" = 'false' " +
> "OR " +
> "\"Special\" = ? " +
> "AND \"Active\" = 'false' " +
> "AND \"CrawlError\" = 'false' " +
> "order by \"NextCrawlDate\" asc limit 1";
>
> String queryUpdateActive =
> "update \"crawlq\" " +
> "set \"Active\" = 'true', " +
> "\"TimeoutDate\" = now() + interval '5 minutes' " +
> "where \"URL\" = ? " ;
>
> This is what I need the function to do:
>
> I need the PostgreSQL function to first lock the table "crawlq".
> I then need it to perform the "querySelect" query.
> I then need it to perform the "queryUpdateActive" query.
> I then need it to unlock the table.
> I then need it to return the values from the select query to the Java project.
>
> Deliverables: I need the postgres function and a simple java program
> that calls the function and returns a result set. It doesn't need to
> do anything with the data, just call the function and return the value
> to the program. This should be a very simple project that shouldn't
> take more than 15 minutes for anyone familiar with writing postgres
> functions. Would $50 via PayPal be enough to entice anyone to offer a
> solution? I could also offer the payment in LiteCoins if you'd rather
> do it that way. Of course if you're feeling benevolent I wouldn't
> object to anyone who felt like doing it for free.
>
> The Java function I'm using that fetches elements from the queue
> currently is as follows:
>
>     public synchronized FetchType fetch(String cs){
>         if(debug_level == 1)
>             System.out.println(new java.util.Date(System.currentTimeMillis()) +
> " : DAO : fetching element from database");
>
>         /**
>          * prepare the select query
>          * execute it -- pull the items from the queue database
>          * load the query results into a return container
>          * clean up
>          * prepare the update query
>          * execute it -- update the record as active
>          * clean up
>          * commit the transaction
>          * return the query results
>          */
>
>         try {
>             if(!dbq.isValid(10))
>                 connectQ();
>         } catch (SQLException e1) {
>             e1.printStackTrace();
>         }
>
>         PreparedStatement stmt = null;
>         PreparedStatement stmt2 = null;
>         ResultSet rset = null;
>         FetchType ret = null;
>
>         // TODO: use a stored function
>         String     querySelect    =
>                 "select \"URL\",\"PublishDate\",\"SiteName\",\"Classification\",\"Special\",\"NextCrawlDate\"
> from \"crawlq\" "    +
>                 "where \"Special\" = ? "        +
>                 "AND \"Active\" = 'true' "            +
>                 "AND \"TimeoutDate\" <= now() "     +
>                 "AND \"CrawlError\" = 'false' "        +
>                 "OR "                                 +
>                 "\"Special\" = ? "             +
>                 "AND \"Active\" = 'false' "            +
>                 "AND \"CrawlError\" = 'false' "        +
>                 "order by \"NextCrawlDate\" asc limit 1";
>
>         String queryUpdateActive =
>                 "update \"crawlq\" "                 +
>                     "set \"Active\" = 'true', "     +
>                     "\"TimeoutDate\" = now() + interval '5 minutes' "     +
>                 "where \"URL\" = ? "                ;
>
>         try {
>             stmt = dbq.prepareStatement(querySelect);
>             stmt.setEscapeProcessing(true);
>             stmt.setString(1, cs);
>             stmt.setString(2, cs);
>             rset = stmt.executeQuery();
>
>             if(rset.next()){
>                 ret = new FetchType(
>                         rset.getString("URL"),
>                         rset.getString("SiteName"),
>                         rset.getString("Classification"),
>                         rset.getDate("PublishDate"),
>                         rset.getString("Special")
>                         );
>             } else
>                 ret = null;
>
>             rset.close();
>             stmt.close();
>
>             if (ret != null){
>                 stmt2 = dbq.prepareStatement(queryUpdateActive);
>                 stmt2.setEscapeProcessing(true);
>                 stmt2.setString(1, ret.getURL());
>                 stmt2.execute();
>                 stmt2.close();
>                 dbq.commit();
>             }
>
>             if(debug_level == 1)
>                 System.out.println(new java.util.Date(System.currentTimeMillis())
> + " : DAO : fetch complete " + ret.getURL());
>
>             return ret;
>         } catch (SQLException e) {
>             try {
>                 e.printStackTrace();
>                 dbq.rollback();
>                 stmt.close();
>                 stmt2.close();
>                 e.printStackTrace();
>                 return null;
>             } catch (SQLException e2) {
>                 e2.printStackTrace();
>                 return null;
>             }
>         }
>     }
>
> Running on one machine I'm bypassing the transaction concurrency issue
> by synchronizing the method. But the Java concurrency constructs I'm
> using here don't scale to multiple machines.
>
> At any rate, have I provided enough information to get the solution
> I'm looking for? Have I provided enough financial incentive to get
> this implemented? If so, please respond with code here to the list so
> multiple people don't implement it and expect to be paid. I can only
> pay one person, though if another person fixes a bug in a proposed
> solution I'm open to splitting the bounty however seems fair.
>
> Thanks for reading, hope to hear back!
>
> -David Noel
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



Re: Small PosgreSQL locking function request - with bounty

From
rob stone
Date:

On Thu, 2013-09-12 at 06:40 -0500, David Noel wrote:
> I have a few database queries that I've been running from within a
> Java project. I have recently come to the understanding that I need to
> run them instead within the PostgreSQL server as stored functions. I
> have that understanding because I need to make use of locking
> functionality, and that seems only able to be done by means of
> PostgreSQL functions. Transactions don't seem to be able to provide
> this. I've never written functions for postgres, so I thought maybe
> someone here could help.
>
> To provide some context: the code is a part of a webcrawler. More
> specifically, it is a part of the queuing system that handles the
> management of URL's to be crawled. The system takes URL's from the
> queue with one query, and marks them as active with a second. It then
> sends the results on to the crawler. Once the crawler has crawled the
> URL, a third query removes the URL from the queue.
>
> The code is running concurrently in multiple threads on multiple
> servers, and in scaling it to multiple servers I've run into some
> problems. It seems that due to the way postgres is designed I am
> unable to lock tables, or utilize transactions in Java to acheive
> concurrency. So I need it instead to be run as a postgres
> function/stored procedure. It seems. Am I correct in this, or did I
> misread the PosgreSQL Transactions documentation?
>
> Assuming the only way to accomplish this is with a postgres function,
> would anyone care to implement this for a small reward (though
> relative to the amount of work required I'd say it's probably a decent
> to large reward)?
>
> The queries are as follows:
>
> String querySelect =
> "select \"URL\",\"PublishDate\",\"SiteName\",\"Classification\",\"Special\",\"NextCrawlDate\"
> from \"crawlq\" " +
> "where \"Special\" = ? " +
> "AND \"Active\" = 'true' " +
> "AND \"TimeoutDate\" <= now() " +
> "AND \"CrawlError\" = 'false' " +
> "OR " +
> "\"Special\" = ? " +
> "AND \"Active\" = 'false' " +
> "AND \"CrawlError\" = 'false' " +
> "order by \"NextCrawlDate\" asc limit 1";
>
> String queryUpdateActive =
> "update \"crawlq\" " +
> "set \"Active\" = 'true', " +
> "\"TimeoutDate\" = now() + interval '5 minutes' " +
> "where \"URL\" = ? " ;
>
> This is what I need the function to do:
>
> I need the PostgreSQL function to first lock the table "crawlq".
> I then need it to perform the "querySelect" query.
> I then need it to perform the "queryUpdateActive" query.
> I then need it to unlock the table.
> I then need it to return the values from the select query to the Java project.
>
> Deliverables: I need the postgres function and a simple java program
> that calls the function and returns a result set. It doesn't need to
> do anything with the data, just call the function and return the value
> to the program. This should be a very simple project that shouldn't
> take more than 15 minutes for anyone familiar with writing postgres
> functions. Would $50 via PayPal be enough to entice anyone to offer a
> solution? I could also offer the payment in LiteCoins if you'd rather
> do it that way. Of course if you're feeling benevolent I wouldn't
> object to anyone who felt like doing it for free.
>
> The Java function I'm using that fetches elements from the queue
> currently is as follows:
>
>     public synchronized FetchType fetch(String cs){
>         if(debug_level == 1)
>             System.out.println(new java.util.Date(System.currentTimeMillis()) +
> " : DAO : fetching element from database");
>
>         /**
>          * prepare the select query
>          * execute it -- pull the items from the queue database
>          * load the query results into a return container
>          * clean up
>          * prepare the update query
>          * execute it -- update the record as active
>          * clean up
>          * commit the transaction
>          * return the query results
>          */
>
>         try {
>             if(!dbq.isValid(10))
>                 connectQ();
>         } catch (SQLException e1) {
>             e1.printStackTrace();
>         }
>
>         PreparedStatement stmt = null;
>         PreparedStatement stmt2 = null;
>         ResultSet rset = null;
>         FetchType ret = null;
>
>         // TODO: use a stored function
>         String     querySelect    =
>                 "select \"URL\",\"PublishDate\",\"SiteName\",\"Classification\",\"Special\",\"NextCrawlDate\"
> from \"crawlq\" "    +
>                 "where \"Special\" = ? "        +
>                 "AND \"Active\" = 'true' "            +
>                 "AND \"TimeoutDate\" <= now() "     +
>                 "AND \"CrawlError\" = 'false' "        +
>                 "OR "                                 +
>                 "\"Special\" = ? "             +
>                 "AND \"Active\" = 'false' "            +
>                 "AND \"CrawlError\" = 'false' "        +
>                 "order by \"NextCrawlDate\" asc limit 1";
>
>         String queryUpdateActive =
>                 "update \"crawlq\" "                 +
>                     "set \"Active\" = 'true', "     +
>                     "\"TimeoutDate\" = now() + interval '5 minutes' "     +
>                 "where \"URL\" = ? "                ;
>
>         try {
>             stmt = dbq.prepareStatement(querySelect);
>             stmt.setEscapeProcessing(true);
>             stmt.setString(1, cs);
>             stmt.setString(2, cs);
>             rset = stmt.executeQuery();
>
>             if(rset.next()){
>                 ret = new FetchType(
>                         rset.getString("URL"),
>                         rset.getString("SiteName"),
>                         rset.getString("Classification"),
>                         rset.getDate("PublishDate"),
>                         rset.getString("Special")
>                         );
>             } else
>                 ret = null;
>
>             rset.close();
>             stmt.close();
>
>             if (ret != null){
>                 stmt2 = dbq.prepareStatement(queryUpdateActive);
>                 stmt2.setEscapeProcessing(true);
>                 stmt2.setString(1, ret.getURL());
>                 stmt2.execute();
>                 stmt2.close();
>                 dbq.commit();
>             }
>
>             if(debug_level == 1)
>                 System.out.println(new java.util.Date(System.currentTimeMillis())
> + " : DAO : fetch complete " + ret.getURL());
>
>             return ret;
>         } catch (SQLException e) {
>             try {
>                 e.printStackTrace();
>                 dbq.rollback();
>                 stmt.close();
>                 stmt2.close();
>                 e.printStackTrace();
>                 return null;
>             } catch (SQLException e2) {
>                 e2.printStackTrace();
>                 return null;
>             }
>         }
>     }
>
> Running on one machine I'm bypassing the transaction concurrency issue
> by synchronizing the method. But the Java concurrency constructs I'm
> using here don't scale to multiple machines.
>
> At any rate, have I provided enough information to get the solution
> I'm looking for? Have I provided enough financial incentive to get
> this implemented? If so, please respond with code here to the list so
> multiple people don't implement it and expect to be paid. I can only
> pay one person, though if another person fixes a bug in a proposed
> solution I'm open to splitting the bounty however seems fair.
>
> Thanks for reading, hope to hear back!
>
> -David Noel
>
>

Hello David,

Your table crawlq must have an unique key along the lines of:-

crawlq_id SERIAL NOT NULL PRIMARY KEY USING INDEX TABLESPACE whatever,

Take the "limit 1" off from your select so that you have a result set
containing 'n' rows. Just select the unique primary key.
Read a key from the result set. Then:-

BEGIN;
SELECT columns FROM crawlq WHERE crawlq_id = key FOR UPDATE;
(Row is now locked).
In a try -- catch block do your UPDATE.
No errors -- COMMIT;
Errors -- ROLLBACK; plus spit out reason to application even though it
will be in the log but it's better to find these things out a.s.a.p.

When the result set is empty, return and populate it again with new
id's.

I have no idea how many rows are in table crawlq. One idea would be to
create a table crawlq_processed and instead of updating crawlq, delete
the row from that table and insert the data into crawlq_processed. Auto
vacuum crawlq every few hours or so.

Anyway, the basic flow is:-
BEGIN;
SELECT FOR UPDATE;
do all your processing;
COMMIT; or ROLLBACK;

Hope this helps.
Cheers,
Robert




Re: Small PosgreSQL locking function request - with bounty

From
David Noel
Date:
> ...have you used the "for update" clause in your select statements?

Hi Ralf, thanks for the reply. I was unaware of the "for update"
construct. Thank you!

> My understanding is, that "for update" does what you need.

I've read through the documentation, found a few examples using "for
update" syntax, and it looks like you're right. Would transaction
level SERIALIZABLE also work? It does seem messier though, having to
handle errors thrown due to concurrent transactions.

I'm running to some trouble though with the SELECT FOR UPDATE
documentation found here --
http://www.postgresql.org/docs/9.0/static/sql-select.html#SQL-FOR-UPDATE-SHARE.
It says:

"...if an UPDATE, DELETE, or SELECT FOR UPDATE from another
transaction has already locked a selected row or rows, SELECT FOR
UPDATE will wait for the other transaction to complete, and will then
lock and return the updated row (or no row, if the row was deleted).
Within a SERIALIZABLE transaction, however, an error will be thrown if
a row to be locked has changed since the transaction started."

In my case I have a boolean "Active" column as a flag to indicate
whether the selected row is currently being crawled. This column is
used in querySelect to identify inactive rows. It is then modified by
queryUpdateActive within the same transaction to identify the row as
"currently being processed". Under the default READ COMMITTED
isolation level (without using the SELECT FOR UPDATE construct) I'm
running into the issue that concurrent transactions select the same
row to be crawled, and don't see the update from the other
transaction. My question is: according to the documentation, it seems
that SELECT FOR UPDATE may still run into the same concurrency issue.
If two concurrent transactions select the same row, the first will be
given the lock. The second transaction will encounter the lock and be
forced to wait. The update from the first transaction will occur, the
lock will be released, and control will be passed to the second
transaction. According to the documentation, the row will already have
been selected, so the transaction, it seems, will continue processing
the row as if it were marked "Inactive". In essence, the way I read
it, it won't care that the row had been updated by the first
transaction, and so essentially I will be running into the same
problem I'm facing now. Am I reading this correctly?


Re: Small PosgreSQL locking function request - with bounty

From
rob stone
Date:
Hello David,

I replied to your original e-mail but it must have vanished into the
ether. I sent you a brief precis about transaction processing.

For "SELECT FOR UPDATE" to function, you MUST have an unique key on the
table. For example:-

crawlq_id SERIAL NOT NULL PRIMARY KEY USING INDEX TABLESPACE xyz,

That creates a sequence and whenever you insert a row into the table, it
automatically grabs the next value and stores it in that column.

Without knowing exactly what or how your application functions, I made a
suggestion that I believe will save you some grief. Create a new table
crawlq_processed (say) and your transaction flow becomes:-

BEGIN;
SELECT row FOR UPDATE;
Supplying the unique key and row is now locked.
Do your processing.
INSERT INTO crawlq_processed;
DELETE FROM crawlq;
COMMIT; or ROLLBACK; if errors occurred.

All of the above in appropriate try . . catch blocks.
You  need to set up a cron job to vacuum table crawlq.

The SELECT FOR UPDATE will not stop other processes inserting or reading
from crawlq. You have just locked a single row thus "protecting" it from
being updated or deleted by another process.

HTH.

Cheers,
Robert



Re: Small PosgreSQL locking function request - with bounty

From
Ralf Schuchardt
Date:
Am 13.09.2013 um 18:47 schrieb David Noel <david.i.noel@gmail.com>:

>> ...have you used the "for update" clause in your select statements?
>
> Hi Ralf, thanks for the reply. I was unaware of the "for update"
> construct. Thank you!
>
>> My understanding is, that "for update" does what you need.
>
[...]
> My question is: according to the documentation, it seems
> that SELECT FOR UPDATE may still run into the same concurrency issue.
> If two concurrent transactions select the same row, the first will be
> given the lock. The second transaction will encounter the lock and be
> forced to wait. The update from the first transaction will occur, the
> lock will be released, and control will be passed to the second
> transaction. According to the documentation, the row will already have
> been selected, so the transaction, it seems, will continue processing
> the row as if it were marked "Inactive". In essence, the way I read
> it, it won't care that the row had been updated by the first
> transaction, and so essentially I will be running into the same
> problem I'm facing now. Am I reading this correctly?

No, I think it will work. The part "and will then lock and return the updated row" does not mean, that the select
criteriais not rechecked. 

Let's try it. Two clients:

-- > Client 1

locktest=# create table locktest (id serial primary key, state int2);
CREATE TABLE
locktest=# insert into locktest values (1, 0), (2, 0);
INSERT 0 2
locktest=# select * from locktest;
 id | state
----+-------
  1 |      0
  2 |      0
(2 rows)

locktest=# begin;
BEGIN
locktest=# select * from locktest where state = 0 limit 1 for update;
 id | state
----+-------
  1 |     0
(1 row)

-------------- WAIT HERE IN CLIENT 1

-- > Client 2

locktest=# begin;
BEGIN
locktest=# select * from locktest where state = 0 limit 1 for update;

------------- Client 2 waits for a lock

-- > Client 1

locktest=# update locktest set state = 1 where id = 1;
UPDATE 1
locktest=# commit;
COMMIT

-- > Client 2

 id | state
----+-------
  2 |     0
(1 row)

[...]

You only have to take care, that the UPDATE really only updates the selected row.
An URL as a primary key might work, but might not be the best choice.


Ralf



Re: Small PosgreSQL locking function request - with bounty

From
Kevin Grittner
Date:
David Noel <david.i.noel@gmail.com> wrote:

> I've read through the documentation, found a few examples using
> "for update" syntax, and it looks like you're right. Would
> transaction level SERIALIZABLE also work? It does seem messier
> though, having to handle errors thrown due to concurrent
> transactions.

The SERIALIZABLE transaction isolation level, as implemented in
PostgreSQL, would provide correct behavior; but is not a very
efficient way to handling access to the ends of a queue.
Regardless of the isolation level you use for other things, you
probably want blocking around access to the ends of the queue -- or
at least to the part where you pull an item from the queue for
processing.  The tricky part is likely to be making sure that if
something is pulled from the queue it gets processed or gets put
back onto the queue.  Without careful planning you might risk
either losing items or blocking not just for pulling off the queue,
but all the way through processing.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company