Re: synchronized snapshots - Mailing list pgsql-hackers

From Joachim Wieland
Subject Re: synchronized snapshots
Date
Msg-id dc7b844e1002100236t3a5b9786xbbae7aa70998ddc3@mail.gmail.com
In response to Re: synchronized snapshots  (Markus Wanner <markus@bluegap.ch>)
Responses Re: synchronized snapshots
List pgsql-hackers
Hi Markus,

On Fri, Feb 5, 2010 at 6:29 PM, Markus Wanner <markus@bluegap.ch> wrote:
>
> So, let's first concentrate on the intended use case: allowing parallel
> pg_dump. To me it seems like a pragmatic and quick solution, however, I'm
> not sure if requiring superuser privileges is acceptable.

http://www.postgresql.org/docs/8.4/static/backup-dump.html already
states about pg_dump: "In particular, it must have read access to all
tables that you want to back up, so in practice you almost always have
to run it as a database superuser." so I think there is not a big loss
here...


> Reading the code, I'm missing the part that actually acquires the snapshot
> for the transaction(s). After setting up multiple transactions with
> pg_synchronize_snapshot and pg_synchronize_snapshot_taken, they still don't
> have a snapshot, do they?

They more or less get it "by chance" :-)  They acquire a snapshot when
they call pg_synchronize_snapshot_taken() and if all the backends do
it while the other backend holds the lock in shared mode, we know that
the snapshot won't change, so they all get the same snapshot.
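
As a rough illustration of the argument above, here is a toy model in
Python. It is not the patch's code. It assumes that a commit needs the
lock in exclusive mode and therefore has to wait while any backend holds
it in shared mode; ToyCluster and all of its methods are hypothetical
names made up for this sketch.

```python
class ToyCluster:
    """Toy model: a snapshot is simply the set of committed transaction ids."""

    def __init__(self):
        self.committed = set()
        self.shared_holders = 0      # backends holding the lock in shared mode
        self.pending_commits = []    # commits waiting for exclusive access

    def lock_shared(self):
        self.shared_holders += 1

    def unlock_shared(self):
        self.shared_holders -= 1
        if self.shared_holders == 0:
            # Waiting commits can now get the lock exclusively and finish.
            self.committed.update(self.pending_commits)
            self.pending_commits.clear()

    def commit(self, xid):
        # Assumption: committing requires the lock in exclusive mode,
        # so it is deferred while a shared holder exists.
        if self.shared_holders > 0:
            self.pending_commits.append(xid)
        else:
            self.committed.add(xid)

    def take_snapshot(self):
        return frozenset(self.committed)


def demo():
    cluster = ToyCluster()
    cluster.commit(1)
    cluster.lock_shared()              # coordinating backend takes the lock
    snap_a = cluster.take_snapshot()   # first worker takes its snapshot
    cluster.commit(2)                  # concurrent commit has to wait
    snap_b = cluster.take_snapshot()   # second worker sees the same state
    cluster.unlock_shared()            # pending commit completes
    snap_later = cluster.take_snapshot()
    return snap_a, snap_b, snap_later
```

In this model the two snapshots taken while the shared lock is held are
identical, and only a snapshot taken after the release sees the
concurrent commit.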


> Also, you should probably ensure the calling transactions don't have a
> snapshot already (let alone a transaction id).

True...


> In a similar vein, and answering your question in a comment: yes, I'd say
> you want to ensure your transactions are in SERIALIZABLE isolation mode.
> There's no other isolation level for which that kind of snapshot
> serialization makes sense, is there?

That's probably true, but I didn't want to enforce this from the
start. As I said, all backends just "happen" to get the same snapshot,
but they are still independent of each other, so they are free to do
whatever they want in their transactions.


> Using the exposed functions in a more general sense, I think it's important
> to note that the patch only intends to synchronize snapshots at the start of
> the transaction, not continuously. Thus, normal transaction isolation
> applies for concurrent writes and each of the transactions can commit or
> rollback independently.
>
> The timeout is nice, but is it really required? Isn't the normal query
> cancellation infrastructure sufficient?

It seemed more robust and convenient to have an expiration in the
backend itself. What would happen if you called
pg_synchronize_snapshots() and your network connection dropped right
after that? If the server did not notice, the backend would continue
to hold the lock and you could not log in anymore...
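
A minimal sketch of the expiration idea, again in Python and not taken
from the patch: the hold on the lock carries its own deadline, so it
lapses even if the client never comes back to release it. ExpiringHold
and the 30-second timeout are hypothetical, and the clock is passed in
explicitly to keep the example deterministic.

```python
class ExpiringHold:
    """Toy model of a lock hold that the holder itself gives up after a timeout."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.acquired_at = None      # None means the lock is not held

    def acquire(self, now):
        self.acquired_at = now

    def release(self):
        self.acquired_at = None

    def is_held(self, now):
        if self.acquired_at is None:
            return False
        if now - self.acquired_at >= self.timeout_s:
            # Deadline passed: drop the hold without any client action.
            self.acquired_at = None
            return False
        return True


hold = ExpiringHold(timeout_s=30)
hold.acquire(now=0)
held_early = hold.is_held(now=10)    # client is gone, but still within the timeout
held_late = hold.is_held(now=31)     # past the deadline, the hold has lapsed
```

With the plain query cancellation approach, somebody would have to
notice the stuck backend and cancel it; here the release does not depend
on the client connection at all.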

But you are right: The proposed feature is a pragmatic and quick
solution for pg_dump and similar but we might want to have a more
general snapshot cloning procedure instead. Not having a delay for
other activities at all and not requiring superuser privileges would
be a big advantage over what I have proposed.


Joachim

