Thread: Ability to 'fork' a running transaction?

Ability to 'fork' a running transaction?

From

Alex Besogonov

Date:

30 January 2010, 20:29:20

I'm writing a program which needs to do a lot of (read only)
processing on a database. The program works on a SMP system with 16
CPUs, so it's natural to try to make it use all of them.

However, here lies the problem: I need to use SERIALIZABLE transaction
isolation level, and AFAIK it's not possible to make several database
connections to share the same exact view of the database.

So, is there a way to somehow stop all mutating operations? I want to
do something like:

STOP_MUTATING_OPERATIONS;
for(int f=0;f<10;++f)
{
   START_CONNECTION(f);
   SET_TO_SERIALIZABLE(f);
}
START_MUTATING_OPERATIONS();

for(int f=0;f<10;++f)
   DO_WORK_IN_CONNECTION(f);

Or is there a better way?

Re: Ability to 'fork' a running transaction?

From

Craig Ringer

Date:

31 January 2010, 01:26:13

On 31/01/2010 8:19 AM, Alex Besogonov wrote:
> I'm writing a program which needs to do a lot of (read only)
> processing on a database. The program works on a SMP system with 16
> CPUs, so it's natural to try to make it use all of them.
>
> However, here lies the problem: I need to use SERIALIZABLE transaction
> isolation level, and AFAIK it's not possible to make several database
> connections to share the same exact view of the database.

I've noticed some talk on -HACKERS of finding ways to make this
possible. It's needed for parallel pg_dump, among other things.

It's not clear if it'd work for non-read-only transactions; I didn't
notice that being discussed, and don't know enough about it to have an
opinion of my own. Still, it's worth looking into for the future.

> So, is there a way to somehow stop all mutating operations?

Take explicit locks on the resources of interest that are permissive
enough to be shared with other read transactions, but not to permit writes.

You might have to do this on the table level, rather than just using
SELECT ... FOR SHARE. The reason for that is that the locks taken by
SELECT ... FOR SHARE won't prevent the insertion of new rows that match
the where clause used in the select, so:

SELECT id FROM sometable WHERE customer = 1337 FOR SHARE;

won't block someone else inserting a record with customer=1337 in
`sometable', and if another transaction acquires a snapshot after that
INSERT commits it'll see the inserted row.

--
Craig Ringer

Re: Ability to 'fork' a running transaction?

From

Craig Ringer

Date:

31 January 2010, 10:03:09

On 31/01/2010 9:06 PM, Alex Besogonov wrote:
> On Sun, Jan 31, 2010 at 7:25 AM, Craig Ringer
> <craig@postnewspapers.com.au>  wrote:
>>> However, here lies the problem: I need to use SERIALIZABLE transaction
>>> isolation level, and AFAIK it's not possible to make several database
>>> connections to share the same exact view of the database.
>> I've noticed some talk on -HACKERS of finding ways to make this possible.
>> It's needed for parallel pg_dump, among other things.
> Actually, I the program I'm writing behaves exactly like parallel
> pg_dump from PostgreSQL's point of view.
>
> I've found this discussion in -HACKERS:
> http://osdir.com/ml/pgsql-hackers/2009-11/msg00265.html It seems, it's
> exactly what I need to do. I might try to contribute a patch.

Well, if you're able to that'd be absolutely brilliant :-)

>> It's not clear if it'd work for non-read-only transactions; I didn't notice
>> that being discussed, and don't know enough about it to have an opinion of
>> my own. Still, it's worth looking into for the future.
> It should be possible to do this for read/write transactions as well.
>
>>> So, is there a way to somehow stop all mutating operations?
>> Take explicit locks on the resources of interest that are permissive enough
>> to be shared with other read transactions, but not to permit writes.
> I thought about it, but it's too deadlock-prone. I need to lock the
> whole database, and if I do this table-by-table then I'll almost
> certainly generate a deadlock.

Not if you specify, and stick to, a strict lock acquisition order and
never try to upgrade a lock you already hold.

--
Craig Ringer

Re: Ability to 'fork' a running transaction?

From

Alex Besogonov

Date:

31 January 2010, 11:19:53

On Sun, Jan 31, 2010 at 4:02 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
>> I've found this discussion in -HACKERS:
>> http://osdir.com/ml/pgsql-hackers/2009-11/msg00265.html It seems, it's
>> exactly what I need to do. I might try to contribute a patch.
> Well, if you're able to that'd be absolutely brilliant :-)
I have already checked out the sources of PostgreSQL and started to
look how to hook up the required functionality and then I found this:
http://archives.postgresql.org/pgsql-hackers/2010-01/msg00916.php :)
It would be really nice to have it in the next PostgreSQL release.

I'll write a parallel variant of pg_dump so this functionality won't
be left unused.

>>>> So, is there a way to somehow stop all mutating operations?
>>> Take explicit locks on the resources of interest that are permissive
>>> enough
>>> to be shared with other read transactions, but not to permit writes.
>> I thought about it, but it's too deadlock-prone. I need to lock the
>> whole database, and if I do this table-by-table then I'll almost
>> certainly generate a deadlock.
> Not if you specify, and stick to, a strict lock acquisition order and never
> try to upgrade a lock you already hold.
That's not possible, I'm afraid. My code is essentially a 'parallel
pg_dump' and it needs to dump the whole database. So it's just not
possible to stick to the same locking order.

Re: Ability to 'fork' a running transaction?

From

Alex Besogonov

Date:

31 January 2010, 11:20:01

On Sun, Jan 31, 2010 at 7:25 AM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
>> However, here lies the problem: I need to use SERIALIZABLE transaction
>> isolation level, and AFAIK it's not possible to make several database
>> connections to share the same exact view of the database.
> I've noticed some talk on -HACKERS of finding ways to make this possible.
> It's needed for parallel pg_dump, among other things.
Actually, I the program I'm writing behaves exactly like parallel
pg_dump from PostgreSQL's point of view.

I've found this discussion in -HACKERS:
http://osdir.com/ml/pgsql-hackers/2009-11/msg00265.html It seems, it's
exactly what I need to do. I might try to contribute a patch.

Thanks for the pointer!

> It's not clear if it'd work for non-read-only transactions; I didn't notice
> that being discussed, and don't know enough about it to have an opinion of
> my own. Still, it's worth looking into for the future.
It should be possible to do this for read/write transactions as well.

>> So, is there a way to somehow stop all mutating operations?
> Take explicit locks on the resources of interest that are permissive enough
> to be shared with other read transactions, but not to permit writes.
I thought about it, but it's too deadlock-prone. I need to lock the
whole database, and if I do this table-by-table then I'll almost
certainly generate a deadlock.