Thread: Transactions

Transactions

From
Kevin Brown
Date:
I've been creating a databased application with PostgreSQL for a while now and
have loved it, though we just recently stumbled on an interesting bug.

I have a particular dialog which can do all kinds of incredibly complicated
things to the database.  Of course I didn't want to have to write the "undo
it to the database" code, nor did I want to force the user into being screwed
with a stray click.

So when the dialog opens, it executes a BEGIN, and then based on whether they
clicked ok or cancel on the dialog, it'd COMMIT or ROLLBACK.  This worked
fine for me in testing, but now that I have two people using this app
simulatneously, if they both click on ok at the same time, postgres seemingly
deadlocks.  My setup is such that I have n ruby clients talking with DRb to
another ruby process on the server, which then does all the communication to
postgres.  Each user has his/her own database connection.

So how can I accomplish this behavior without the deadlock?

Re: Transactions

From
Tom Lane
Date:
Kevin Brown <blargity@gmail.com> writes:
> So when the dialog opens, it executes a BEGIN, and then based on whether they
> clicked ok or cancel on the dialog, it'd COMMIT or ROLLBACK.  This worked
> fine for me in testing, but now that I have two people using this app
> simulatneously, if they both click on ok at the same time, postgres seemingly
> deadlocks.

Shouldn't happen; please provide a reproducible test case.

            regards, tom lane

Re: Transactions

From
Kevin Brown
Date:
On Saturday 18 March 2006 00:19, Tom Lane wrote:
> Kevin Brown <blargity@gmail.com> writes:
> > So when the dialog opens, it executes a BEGIN, and then based on whether
> > they clicked ok or cancel on the dialog, it'd COMMIT or ROLLBACK.  This
> > worked fine for me in testing, but now that I have two people using this
> > app simulatneously, if they both click on ok at the same time, postgres
> > seemingly deadlocks.
>
> Shouldn't happen; please provide a reproducible test case.
>
>             regards, tom lane

When the "deadlock" occurs:

kevin@ajbit:~> ps -elF | grep postgres
0 S postgres 17767     1  0  76   0 -  9908 -       4220   0 Mar08 ?        
00:00:01 /usr/bin/postmaster -D /var/lib/pgsql/data
1 S postgres 17769 17767  0  76   0 -  9973 -      12544   0 Mar08 ?        
00:00:09 postgres: writer process
1 S postgres 17770 17767  0  76   0 -  7475 -       2972   0 Mar08 ?        
00:00:00 postgres: stats buffer process
1 S postgres 17771 17770  0  75   0 -  7274 -       3044   0 Mar08 ?        
00:00:00 postgres: stats collector process
1 S postgres 29972 17767  0  76   0 - 10312 -      14016   0 Mar16 ?        
00:00:04 postgres: dli dli 127.0.0.1(53492) idle
1 S postgres  4072 17767  0  76   0 - 10261 -      12892   0 14:02 ?        
00:00:00 postgres: dli dli 127.0.0.1(42227) idle
1 S postgres  6020 17767  0  75   0 - 10585 -      15136   0 19:57 ?        
00:00:29 postgres: dli dli 127.0.0.1(39869) idle in transaction
1 S postgres  6195 17767  0  76   0 - 10323 semtim 14940   0 20:22 ?        
00:00:28 postgres: dli dli 127.0.0.1(37735) INSERT waiting
1 S postgres  6718 17767  0  76   0 - 10240 -       7504   0 21:50 ?        
00:00:00 postgres: dli dli 127.0.0.1(56292) idle
1 S postgres  6721 17767  0  77   0 - 10111 -       5340   0 21:51 ?        
00:00:00 postgres: dli dli 127.0.0.1(45934) idle
0 R kevin     6835  6816  0  78   0 -   663 -        468   0 22:00 pts/9    
00:00:00 grep postgres

You can see an insert is just waiting.  On what?  The server software never
unlocks and it's just sitting in an execute for postgres.

This only happens once a day, and only when 2 people are using it at once, and
using this dialog at once, so I can't provide test cases.  Let me know what
else I can do to help troubleshoot.

Re: Transactions

From
Tom Lane
Date:
Kevin Brown <blargity@gmail.com> writes:
> When the "deadlock" occurs:

> kevin@ajbit:~> ps -elF | grep postgres
> 1 S postgres 6020 17767 0 75  0 - 10585 -   15136  0 19:57 ?
> 00:00:29 postgres: dli dli 127.0.0.1(39869) idle in transaction
> 1 S postgres 6195 17767 0 76  0 - 10323 semtim 14940  0 20:22 ?
> 00:00:28 postgres: dli dli 127.0.0.1(37735) INSERT waiting

> You can see an insert is just waiting.  On what?

I'd bet it's waiting on the "idle in transaction" transaction, which is
waiting for a nonresponding client.  But the pg_locks view would tell
you more.

            regards, tom lane

Re: Transactions

From
"Florian G. Pflug"
Date:
Kevin Brown wrote:
> I've been creating a databased application with PostgreSQL for a while now and
> have loved it, though we just recently stumbled on an interesting bug.
>
> I have a particular dialog which can do all kinds of incredibly complicated
> things to the database.  Of course I didn't want to have to write the "undo
> it to the database" code, nor did I want to force the user into being screwed
> with a stray click.
>
> So when the dialog opens, it executes a BEGIN, and then based on whether they
> clicked ok or cancel on the dialog, it'd COMMIT or ROLLBACK.  This worked
> fine for me in testing, but now that I have two people using this app
> simulatneously, if they both click on ok at the same time, postgres seemingly
> deadlocks.  My setup is such that I have n ruby clients talking with DRb to
> another ruby process on the server, which then does all the communication to
> postgres.  Each user has his/her own database connection.
What ruby database library are you using?

Since ruby doesn't use native threads, but instead implements it's own threads (purely
in user-space), one thread can block your whole app - e.g, if the thread tries to read
from a socket where not data is available, and therefor the kernel puts the process
to sleep until data becomes available.

For the "normal" read() and write() operations exposed to the ruby programming this
is solved internally by ruby (it checks the filedescriptions state, and only reads
if the read won't block). But extensions, especially the ones talking over the network,
like the postgres client, can still cause this problem. In the "raw" postgres client
extension for ruby, you'll find two methods that can execute a query - query() and
query_async(). query() will block the _whole_ ruby interpreter until query results are
available, while query_async() will block only the thread that executed the query.

This one-thread-blocks-all behaviour can lead to "interesting" dead-locks. Assume,
for example, the following situation, with A and B being two ruby threads.
A1) update table where id=1
B1) update table where id=1
-----
B2) commit
A2) commit

Now, the query B1 will block until A2 is executed (because the record is already locked).
But if B1 blocks, and you used query() instead of query_async(), then the _whole_
ruby interpreter is blocked, and therefor A2 will never be executed. Bingo! you got
a deadlock, and postgres has no chance to ever detect this.

Of course, not using threads makes the problem even worse - the only solution is to
use the _async() functions. DBI lets you set a flag (forgot it's name) that forces
DBD::Pg to use the _async() functions - if you don't use DBI just replace all calls
to query() and exec() with query_async() and exec_async().

greetings, Florian Pflug

Re: Transactions

From
Kevin Brown
Date:
On Saturday 18 March 2006 12:31, Florian G. Pflug wrote:
> Kevin Brown wrote:
> > I've been creating a databased application with PostgreSQL for a while
> > now and have loved it, though we just recently stumbled on an interesting
> > bug.
> >
> > I have a particular dialog which can do all kinds of incredibly
> > complicated things to the database.  Of course I didn't want to have to
> > write the "undo it to the database" code, nor did I want to force the
> > user into being screwed with a stray click.
> >
> > So when the dialog opens, it executes a BEGIN, and then based on whether
> > they clicked ok or cancel on the dialog, it'd COMMIT or ROLLBACK.  This
> > worked fine for me in testing, but now that I have two people using this
> > app simulatneously, if they both click on ok at the same time, postgres
> > seemingly deadlocks.  My setup is such that I have n ruby clients talking
> > with DRb to another ruby process on the server, which then does all the
> > communication to postgres.  Each user has his/her own database
> > connection.
>
> What ruby database library are you using?

The C extension.

> Since ruby doesn't use native threads, but instead implements it's own
> threads (purely in user-space), one thread can block your whole app - e.g,
> if the thread tries to read from a socket where not data is available, and
> therefor the kernel puts the process to sleep until data becomes available.
>
> For the "normal" read() and write() operations exposed to the ruby
> programming this is solved internally by ruby (it checks the
> filedescriptions state, and only reads if the read won't block). But
> extensions, especially the ones talking over the network, like the postgres
> client, can still cause this problem. In the "raw" postgres client
> extension for ruby, you'll find two methods that can execute a query -
> query() and query_async(). query() will block the _whole_ ruby interpreter
> until query results are available, while query_async() will block only the
> thread that executed the query.

Yes, but my question is why is the query unable to complete.

> This one-thread-blocks-all behaviour can lead to "interesting" dead-locks.
> Assume, for example, the following situation, with A and B being two ruby
> threads. A1) update table where id=1
> B1) update table where id=1
> -----
> B2) commit
> A2) commit
>
> Now, the query B1 will block until A2 is executed (because the record is
> already locked). But if B1 blocks, and you used query() instead of
> query_async(), then the _whole_ ruby interpreter is blocked, and therefor
> A2 will never be executed. Bingo! you got a deadlock, and postgres has no
> chance to ever detect this.

So what you're saying is that my commits and rollbacks should be async,
correct?

> Of course, not using threads makes the problem even worse - the only
> solution is to use the _async() functions. DBI lets you set a flag (forgot
> it's name) that forces DBD::Pg to use the _async() functions - if you don't
> use DBI just replace all calls to query() and exec() with query_async() and
> exec_async().

I don't quite understand how a select can be an async call....  I need the
data to proceed.  If you're saying on my inserts, deletes, etc, then that I
can understand.

> greetings, Florian Pflug

Re: Transactions

From
"Florian G. Pflug"
Date:
Kevin Brown wrote:
> On Saturday 18 March 2006 12:31, Florian G. Pflug wrote:
>>>I've been creating a databased application with PostgreSQL for a while
>>>now and have loved it, though we just recently stumbled on an interesting
>>>bug.
>>>
>>>I have a particular dialog which can do all kinds of incredibly
>>>complicated things to the database.  Of course I didn't want to have to
>>>write the "undo it to the database" code, nor did I want to force the
>>>user into being screwed with a stray click.
>>>
>>>So when the dialog opens, it executes a BEGIN, and then based on whether
>>>they clicked ok or cancel on the dialog, it'd COMMIT or ROLLBACK.  This
>>>worked fine for me in testing, but now that I have two people using this
>>>app simulatneously, if they both click on ok at the same time, postgres
>>>seemingly deadlocks.  My setup is such that I have n ruby clients talking
>>>with DRb to another ruby process on the server, which then does all the
>>>communication to postgres.  Each user has his/her own database
>>>connection.
>>
>>What ruby database library are you using?
>
> The C extension.
>
>>Since ruby doesn't use native threads, but instead implements it's own
>>threads (purely in user-space), one thread can block your whole app - e.g,
>>if the thread tries to read from a socket where not data is available, and
>>therefor the kernel puts the process to sleep until data becomes available.
>>
>>For the "normal" read() and write() operations exposed to the ruby
>>programming this is solved internally by ruby (it checks the
>>filedescriptions state, and only reads if the read won't block). But
>>extensions, especially the ones talking over the network, like the postgres
>>client, can still cause this problem. In the "raw" postgres client
>>extension for ruby, you'll find two methods that can execute a query -
>>query() and query_async(). query() will block the _whole_ ruby interpreter
>>until query results are available, while query_async() will block only the
>>thread that executed the query.
>
>
> Yes, but my question is why is the query unable to complete.
Well, see below ;-)

>>This one-thread-blocks-all behaviour can lead to "interesting" dead-locks.
>>Assume, for example, the following situation, with A and B being two ruby
>>threads. A1) update table where id=1
>>B1) update table where id=1
>>-----
>>B2) commit
>>A2) commit
>>
>>Now, the query B1 will block until A2 is executed (because the record is
>>already locked). But if B1 blocks, and you used query() instead of
>>query_async(), then the _whole_ ruby interpreter is blocked, and therefor
>>A2 will never be executed. Bingo! you got a deadlock, and postgres has no
>>chance to ever detect this.
>
>
> So what you're saying is that my commits and rollbacks should be async,
> correct?
No, you should always use _async(), and never, ever use query() or exec()
in a multi-threaded app.

>>Of course, not using threads makes the problem even worse - the only
>>solution is to use the _async() functions. DBI lets you set a flag (forgot
>>it's name) that forces DBD::Pg to use the _async() functions - if you don't
>>use DBI just replace all calls to query() and exec() with query_async() and
>>exec_async().
>
> I don't quite understand how a select can be an async call....  I need the
> data to proceed.  If you're saying on my inserts, deletes, etc, then that I
> can understand.
Don't be fooled by the word async()... It's still synchronous for the
thread that calls it - the _thread_ is blocked until the query is done,
and query_async() returns the result, just as query() does. The difference
is only for all _other_ threads. All other threads can continue to run
while the one thread waits for query results _only_ if the one thread
used query_async(). If it used query(), then you _whole_ interpreter is
blocked until the query is finished.

The naming of these function is quite strange - I was fooled by this myself,
and so was whoever wrote the DBD-module for pg, because he used the non-async
functions too ;-)

To rephrase - you don't usually need to change anything in your program when
using query_async() instead of query(). It'll just work better ;-).

greetings, Florian Pflug

Re: Transactions

From
Kevin Brown
Date:
On Saturday 18 March 2006 12:58, Florian G. Pflug wrote:
> Kevin Brown wrote:
> > So what you're saying is that my commits and rollbacks should be async,
> > correct?
>
> No, you should always use _async(), and never, ever use query() or exec()
> in a multi-threaded app.
>
> > I don't quite understand how a select can be an async call....  I need
> > the data to proceed.  If you're saying on my inserts, deletes, etc, then
> > that I can understand.
>
> Don't be fooled by the word async()... It's still synchronous for the
> thread that calls it - the _thread_ is blocked until the query is done,
> and query_async() returns the result, just as query() does. The difference
> is only for all _other_ threads. All other threads can continue to run
> while the one thread waits for query results _only_ if the one thread
> used query_async(). If it used query(), then you _whole_ interpreter is
> blocked until the query is finished.

OH.  Thank you! :-)

> The naming of these function is quite strange - I was fooled by this
> myself, and so was whoever wrote the DBD-module for pg, because he used the
> non-async functions too ;-)

Yeah, I only want the thread to wait, and async sounds like asynchronous or
something. :-)

> To rephrase - you don't usually need to change anything in your program
> when using query_async() instead of query(). It'll just work better ;-).

Makes sense now.

> greetings, Florian Pflug