Thread: pg_dump --snapshot

pg_dump --snapshot

From
Simon Riggs
Date:
Patch to allow pg_dump to use a snapshot exported with an explicit
pg_export_snapshot() for when precise timing of the snapshot is
important.

This overrides the internally generated snapshot in parallel pg_dump.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pg_dump --snapshot

From
Andrew Dunstan
Date:
On 05/06/2013 10:56 AM, Simon Riggs wrote:
> Patch to allow pg_dump to use a snapshot exported with an explicit
> pg_export_snapshot() for when precise timing of the snapshot is
> important.
>
> This overrides the internally generated snapshot in parallel pg_dump.
>
>
>


Could you be a bit more expansive about the use case, please?

cheers

andrew



Re: pg_dump --snapshot

From
Simon Riggs
Date:
On 6 May 2013 16:02, Andrew Dunstan <andrew@dunslane.net> wrote:
>
> On 05/06/2013 10:56 AM, Simon Riggs wrote:
>>
>> Patch to allow pg_dump to use a snapshot exported with an explicit
>> pg_export_snapshot() for when precise timing of the snapshot is
>> important.
>>
>> This overrides the internally generated snapshot in parallel pg_dump.
>>

> Could you be a bit more expansive about the use case, please?

Sure.

Exported snapshots allow you to coordinate a number of actions
together, so they all see a common view of the database. So this patch
allows a very general approach to this, much more so than pg_dump
allows currently since the exact timing of the snapshot is not
controlled by the user.

For example, you may wish to create a sequence of materialized views
all using the same snapshot, while at the same time dumping the
database, then follow that by dumping the mat views also. Or you may
wish to dump the database at the exact moment between two tasks,
rather than wait for pg_dump to startup before commencing the next
task.
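
As a rough sketch of that first case (the snapshot id, database and
object names are made up, and the exporting transaction must stay open
until the other sessions have attached to the snapshot):

Session 1 -- export the snapshot and keep the transaction open
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();   -- suppose it returns '000004F6-1'

Session 2 -- build a matview from exactly the same view of the data
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '000004F6-1';
CREATE MATERIALIZED VIEW order_summary AS
  SELECT customer_id, sum(amount) AS total FROM orders GROUP BY customer_id;
COMMIT;

Session 3 -- dump the same database as of the same instant, using the proposed option
pg_dump --snapshot '000004F6-1' database1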

The original pg_snapclone code from 2008 allowed an export of the
snapshot and then an import of the snapshot explicitly into pg_dump in
much the same way shown here. This patch would be the only aspect
missing from that original feature set: explicit control. (Thanks for
following up to get it all into production code).

Logical replication would be one of the users of this also, but it is
a standalone feature which complements the existing capabilities.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_dump --snapshot

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On 6 May 2013 16:02, Andrew Dunstan <andrew@dunslane.net> wrote:
>> On 05/06/2013 10:56 AM, Simon Riggs wrote:
>>> This overrides the internally generated snapshot in parallel pg_dump.

>> Could you be a bit more expansive about the use case, please?

> Exported snapshots allow you to coordinate a number of actions
> together, so they all see a common view of the database. So this patch
> allows a very general approach to this, much more so than pg_dump
> allows currently since the exact timing of the snapshot is not
> controlled by the user.

I'm afraid that this is institutionalizing a design deficiency in
pg_dump; namely that it takes its snapshot before acquiring locks.
Ideally that would happen the other way around.  I don't have a good
idea how we could fix that --- but a feature that allows imposition
of an outside snapshot will permanently foreclose ever fixing it.

What's more, this would greatly widen the risk window between when
the snapshot is taken and when we have all the locks and can have
some confidence that the DB isn't changing under us.

Or in short: -1 for the very concept of letting the user control
pg_dump's snapshot.
        regards, tom lane



Re: pg_dump --snapshot

From
Andres Freund
Date:
On 2013-05-06 13:07:17 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On 6 May 2013 16:02, Andrew Dunstan <andrew@dunslane.net> wrote:
> >> On 05/06/2013 10:56 AM, Simon Riggs wrote:
> >>> This overrides the internally generated snapshot in parallel pg_dump.
>
> >> Could you be a bit more expansive about the use case, please?
>
> > Exported snapshots allow you to coordinate a number of actions
> > together, so they all see a common view of the database. So this patch
> > allows a very general approach to this, much more so than pg_dump
> > allows currently since the exact timing of the snapshot is not
> > controlled by the user.
>
> I'm afraid that this is institutionalizing a design deficiency in
> pg_dump; namely that it takes its snapshot before acquiring locks.
> Ideally that would happen the other way around.  I don't have a good
> idea how we could fix that --- but a feature that allows imposition
> of an outside snapshot will permanently foreclose ever fixing it.
>
> What's more, this would greatly widen the risk window between when
> the snapshot is taken and when we have all the locks and can have
> some confidence that the DB isn't changing under us.

The initial transaction that exports the snapshot would need to hold
locks until pg_dump started :/.

> Or in short: -1 for the very concept of letting the user control
> pg_dump's snapshot.

It's rather useful if you e.g. want to instantiate a new replica without
rebuilding pg_dump/pg_restore's capabilities wrt. ordering, parallelism,
separating initial data load from index creation and all that. Which
already has been incompletely reinvented by several solutions :(.

So besides the above and real problems you point out this seems
worthwile to me...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_dump --snapshot

From
Simon Riggs
Date:
On 6 May 2013 18:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Or in short: -1 for the very concept of letting the user control
> pg_dump's snapshot.

That API is already exposed, so not sure why you say this now? This
has been in PG since early in 9.2, about 2 years ago.

In any case, "flashback database" is one of the most requested
features I know of... the ability to dump the database as it appeared
in the past *after* that point has passed. I call it by its name as
used in Oracle, but many people have taken the trouble to describe
that in detail to me, even without knowing a similar feature existed
elsewhere.

So it will always be important to do SET TRANSACTION SNAPSHOT 'a point
in the past'
and if we can do that, why not pg_dump also?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_dump --snapshot

From
Stephen Frost
Date:
* Andres Freund (andres@2ndquadrant.com) wrote:
> It's rather useful if you e.g. want to instantiate a new replica without
> rebuilding pg_dump/pg_restore's capabilities wrt. ordering, parallelism,
> separating initial data load from index creation and all that. Which
> already has been incompletely reinvented by several solutions :(.

Perhaps it's just a wording thing, but I wouldn't use the term "replica"
when referring to something built with pg_dump/restore- that should
really be reserved for a slave system created through replication.

> So besides the above and real problems you point out this seems
> worthwile to me...

It certainly sounds interesting and I like the idea of it, but perhaps
we need a different mechanism than just passing in a raw snapshot, to
address the concerns that Tom raised.
Thanks,
    Stephen

Re: pg_dump --snapshot

From
Andres Freund
Date:
On 2013-05-06 14:35:14 -0400, Stephen Frost wrote:
> * Andres Freund (andres@2ndquadrant.com) wrote:
> > It's rather useful if you e.g. want to instantiate a new replica without
> > rebuilding pg_dump/pg_restore's capabilities wrt. ordering, parallelism,
> > separating initial data load from index creation and all that. Which
> > already has been incompletely reinvented by several solutions :(.
> 
> Perhaps it's just a wording thing, but I wouldn't use the term "replica"
> when referring to something built with pg_dump/restore- that should
> really be reserved for a slave system created through replication.

Well, part of the usecase *is* using it for the cloning in a replication
solution instead of open-coding it there. E.g. londiste and slony both
have home-grown hacks to do this.

> > So besides the above and real problems you point out this seems
> > worthwile to me...
> 
> It certainly sounds interesting and I like the idea of it, but perhaps
> we need a different mechanism than just passing in a raw snapshot, to
> address the concerns that Tom raised.

If there is anything which isn't magnitudes more complex, I'd be
interested. But given we couldn't even find a sensible solution for
pg_dump internally I don't have all that high hopes...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_dump --snapshot

From
Greg Stark
Date:
On Mon, May 6, 2013 at 6:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> In any case, "flashback database" is one of the most requested
> features I know of... the ability to dump the database as it appeared
> in the past *after* that point has passed.

Fwiw that's not what flashback database does. It rolls back the whole
database to that earlier point in time. it's equivalent to running
recovery but backwards in time.

Obviously if you had the ability to dump the database as of an earlier
point in time you could do a complete dump and then a complete restore
and effectively have accomplished the same thing. But that's like
calling pg_dump an incremental backup.





-- 
greg



Re: pg_dump --snapshot

From
Simon Riggs
Date:
On 6 May 2013 19:35, Stephen Frost <sfrost@snowman.net> wrote:

> It certainly sounds interesting and I like the idea of it, but perhaps
> we need a different mechanism than just passing in a raw snapshot, to
> address the concerns that Tom raised.

It does *not* pass in a raw snapshot. All it does is to allow pg_dump
to use an API that is already exposed by the backend for this very
purpose, one that has been in Postgres since 9.2.
http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION

Minor patch, no amazing new functionality, no drama.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_dump --snapshot

From
Simon Riggs
Date:
On 6 May 2013 19:48, Greg Stark <stark@mit.edu> wrote:
> On Mon, May 6, 2013 at 6:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> In any case, "flashback database" is one of the most requested
>> features I know of... the ability to dump the database as it appeared
>> in the past *after* that point has passed.
>
> Fwiw that's not what flashback database does. It rolls back the whole
> database to that earlier point in time. it's equivalent to running
> recovery but backwards in time.
>
> Obviously if you had the ability to dump the database as of an earlier
> point in time you could do a complete dump and then a complete restore
> and effectively have accomplished the same thing.

OK, so you know Oracle. So then you also know that there are actually
a number of related features all called "flashback <something>", all
interrelated. What I meant by using their term was just a broad
reference to that capability, not an exact 1:1 match. Most people
requesting this have not asked for it by that name.

At the database level, it rolls back the whole kaboodle. Not what I
meant at all and I would expect people to start twitching at the
prospect.

The feature we have in PG9.2+ is the ability to set a transaction
snapshot to a snapshot that existed in the database at some point,
invoked by some external reference to it. The external reference is
the piece of information that must be specified by the user to allow
the database to look backwards. At the moment we can only specify a
snapshot from a currently running transaction, i.e. the recent past. I
foresee a feature that will allow us to look back further, possibly
with some restrictions, though certainly read only. There is a wide
stretch of water between current state and making that work, but the
existence of an ability to specify an external reference to a snapshot
is pivotal to that future capability and I would not wish to see that
capability removed.

This patch only allows pg_dump to use the existing API. As an example,
we would use it like this.

Session 1:
BEGIN;
SELECT pg_export_snapshot();  -- returns a textual reference to the internally held snapshot

 pg_export_snapshot
---------------------
 000004F6-1
(1 row)

Session 2 -- some other user of the same snapshot
pg_dump --snapshot '000004F6-1' database1

Session 3 -- some other user of the same snapshot
e.g.
pg_dump --snapshot '000004F6-1' database2
some other programs etc..

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_dump --snapshot

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> It does *not* pass in a raw snapshot. All it does is to allow pg_dump
> to use an API that is already exposed by the backend for this very
> purpose, one that has been in Postgres since 9.2.
> http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION

> Minor patch, no amazing new functionality, no drama.

You're ignoring the objection ...

The snapshot-transfer facility is intended for supporting, in essence,
multi-threaded closed applications.  In such a context we can expect
that the leader knows enough about the followers to predict which locks
need to be acquired before the to-be-shared snapshot is acquired.

Exposing that in pg_dump, without doing a lick of other work (which is
what I take to be your proposal), would leave us with a situation
wherein an application wishing to invoke pg_dump safely would need to
know what locks pg_dump will take --- something that's rather a moving
target.  If it gets that wrong, it will be at risk of obtaining
inconsistent dumps without any warning.

I think a minimum requirement before we can accept this feature is that
there be a way to obtain all the same locks that pg_dump would get when
given the same command-line arguments.  This would, perhaps, be a good
test criterion for the fabled library-ization of pg_dump.
        regards, tom lane



Re: pg_dump --snapshot

From
Stephen Frost
Date:
Simon,

* Simon Riggs (simon@2ndQuadrant.com) wrote:
> On 6 May 2013 19:35, Stephen Frost <sfrost@snowman.net> wrote:
> > It certainly sounds interesting and I like the idea of it, but perhaps
> > we need a different mechanism than just passing in a raw snapshot, to
> > address the concerns that Tom raised.
>
> It does *not* pass in a raw snapshot.

It wasn't my intent to impart anything more specific than what
pg_export_snapshot() returns when I said 'raw snapshot'.  What would you
call it?  Snapshot identifier?  All I was trying to say is that I agree
with Tom that pg_dump really needs more to happen than simply having the
results of pg_export_snapshot() passed to it- pg_dump wants all the
necessary locks taken immediately after the transaction opens and
pg_export_snapshot() simply doesn't do that.

> All it does is to allow pg_dump
> to use an API that is already exposed by the backend for this very
> purpose, one that has been in Postgres since 9.2.

In doing so it opens a much larger hole through which this approach can
break, namely that objects could disappear between the snapshot being
taken and appropriate locks being set up.  That issue already exists in
pg_dump, but at least it's a small window through one code path- and it
all happens before any parallelization or snapshot-sharing happens, as
best as I can tell.

If I understand correctly, right now we have:

connect to DB
start a transaction
run around and grab locks
get our snapshot ID
fork, connect in, glob onto the same snapshot

Assuming I'm following along correctly, this change would be:

someone in a far off land creates the snapshot
time passes..
then:
connect to DB
set the who-knows-how-old snapshot ID
run around and try to grab locks
fork, connect in, glob onto the same snapshot

One thing that, I believe, addresses this a bit is that we should at
least bomb out with an error while we're trying to acquire the locks,
should an object be dropped between transaction start and when we go to
lock it, right?  We'll still see the old version of pg_class from the
start of the transaction and therefore we'll try to lock everything from
the older viewpoint...?

For my part, I'm a bit less worried about error cases around this,
provided that they're made very clear and that they're quite obvious to
the end user, and very worried about us possibly missing some object
that we were expected to capture.

In any case, would a function along the lines of
"pg_export_and_lock_snapshot()", which basically starts a transaction,
acquires locks on every object in the DB, and then returns the snapshot
ID, address this?  Perhaps even pg_dump could use that, on the belief
that it'd be faster for a backend function to acquire those locks than
for pg_dump to do it anyway?  I'm not sure that we want to support every
different combination of filtering options that pg_dump supports for
this, but we could start with the "all-database" option since that's, by
definition, the largest set which could be requested by pg_dump.
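
To illustrate what I have in mind, here's a rough user-level sketch
(the function name and the schema filter are made up, and of course
this only narrows the window- the snapshot is still taken before the
locks, not after):

CREATE OR REPLACE FUNCTION lock_all_and_export_snapshot()
RETURNS text LANGUAGE plpgsql AS $$
DECLARE
    rel record;
BEGIN
    -- Take the same lock pg_dump takes (ACCESS SHARE) on every ordinary
    -- user table visible to this transaction's snapshot.
    FOR rel IN
        SELECT n.nspname, c.relname
          FROM pg_class c
          JOIN pg_namespace n ON n.oid = c.relnamespace
         WHERE c.relkind = 'r'
           AND n.nspname NOT IN ('pg_catalog', 'information_schema')
    LOOP
        EXECUTE format('LOCK TABLE %I.%I IN ACCESS SHARE MODE',
                       rel.nspname, rel.relname);
    END LOOP;
    RETURN pg_export_snapshot();
END;
$$;

-- Usage: the transaction has to stay open for as long as pg_dump runs.
-- BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- SELECT lock_all_and_export_snapshot();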

Or perhaps even a new lock type for this...  I'd love to see the
"locking-all-objects" portion of time disappear from my nightly
backups..
Thanks,
    Stephen

Re: pg_dump --snapshot

From
Greg Stark
Date:
On Mon, May 6, 2013 at 10:02 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> At the database level, it rolls back the whole kaboodle. Not what I
> meant at all and I would expect people to start twitching at the
> prospect.

I think it would be pretty sweet but we don't have the infrastructure
for it. We would need to retain enough information in the WAL log (or
somewhere else) to reverse the records.

> The feature we have in PG9.2+ is the ability to set a transaction
> snapshot to a snapshot that existed in the database at some point,
> invoked by some external reference to it. The external reference is
> the piece of information that must be specified by the user to allow
> the database to look backwards. At the moment we can only specify a
> snapshot from a currently running transaction, i.e. the recent past. I
> foresee a feature that will allow us to look back further, possibly
> with some restrictions, though certainly read only.

This is similar to "flashback query". And I think you're right that to
be comparable with Oracle's features we would need some option to
specify the snapshot based on time or wal position.  And fwiw I think
it could still be read-write in consistent-read or serializable mode.
If you tried to update any records that had been updated since you
would get a serialization failure.

So I just did some research. It seems Oracle's equivalent of pg_dump
"expdp" does use flashback internally to guarantee consistency in some
cases which is perhaps analogous to how pg_dump uses snapshots to
synchronize multiple sessions (though it sounds like Oracle uses it
for cases that just work in Postgres).

But more interestingly expdp does in fact have a user option to
specify a timestamp or scn (analogous to wal position) and use
flashback query to dump the data at that point in time. That's a
pretty clear a parallel to what you propose here.




-- 
greg



Re: pg_dump --snapshot

From
Stephen Frost
Date:
* Greg Stark (stark@mit.edu) wrote:
> On Mon, May 6, 2013 at 10:02 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > At the database level, it rolls back the whole kaboodle. Not what I
> > meant at all and I would expect people to start twitching at the
> > prospect.
>
> I think it would be pretty sweet but we don't have the infrastructure
> for it. We would need to retain enough information in the WAL log (or
> somewhere else) to reverse the records.

Let me start off by saying that I do like the general idea.  We're
rather different from Oracle, which makes me wonder if we might be in a
slightly better position to handle this kind of option.

For example, I'm not sure that we need more information in the WAL..
What we need is a way to tell VACUUM to skip over 'recently modified'
records and not mark them as dead until some time has passed.  This is
essentially what we're already doing with the feedback mechanism in
replication, isn't it?  Would it be possible to make that a simple timer
instead of requiring a replication system which is feeding back that
information based on queries which are running on the replica?

> But more interestingly expdp does in fact have a user option to
> specify a timestamp or scn (analogous to wal position) and use
> flashback query to dump the data at that point in time. That's a
> pretty clear a parallel to what you propose here.

What happens to objects which have been dropped after the user-specified
flashback point?  My guess is that they simply aren't there for the user
to go look at, but perhaps I'm wrong.  Either way though, I'm not sure
that we'd accept such a poor solution to that problem.  For that matter,
what do we do today wrt replication feedback when/if that happens?
Thanks,
    Stephen

Re: pg_dump --snapshot

From
Joe Conway
Date:

On 05/06/2013 03:00 PM, Stephen Frost wrote:
> For example, I'm not sure that we need more information in the
> WAL.. What we need is a way to tell VACUUM to skip over 'recently
> modified' records and not mark them as dead until some time has
> passed.  This is essentially what we're already doing with the
> feedback mechanism in replication, isn't it?  Would it be possible
> to make that a simple timer instead of requiring a replication
> system which is feeding back that information based on queries
> which are running on the replica?

This sounds a lot like having an "idle in transaction" session hanging
around. Could we use something akin to a prepared transaction to
preserve the ability to use the snapshot? Maybe with an (optional)
expiration timestamp?

Joe


--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support



Re: pg_dump --snapshot

From
Simon Riggs
Date:
On 6 May 2013 22:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> It does *not* pass in a raw snapshot. All it does is to allow pg_dump
>> to use an API that is already exposed by the backend for this very
>> purpose, one that has been in Postgres since 9.2.
>> http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION
>
>> Minor patch, no amazing new functionality, no drama.
>
> You're ignoring the objection ...

No, I just don't see a problem at all.

Locks and snapshots have got nothing to do with each other, in
Postgres. Taking a snapshot doesn't imply that database objects are
locked; whoever takes the snapshot should lock things first, if they
are worried by that.

If anybody really wanted to fix pg_dump, they could do. If that was so
important, why block this patch, but allow parallel pg_dump to be
committed without it?

There is no risk that is larger than the one already exposed by the
existing user API.

If you do see a risk in the existing API, please deprecate it and
remove it from the docs, or mark it not-for-use-by-users. I hope you
don't, but if you do, do it now - I'll be telling lots of people about
all the useful things you can do with it over the next few years,
hopefully in pg_dump as well.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_dump --snapshot

From
Stephen Frost
Date:
Simon,

* Simon Riggs (simon@2ndQuadrant.com) wrote:
> If anybody really wanted to fix pg_dump, they could do. If that was so
> important, why block this patch, but allow parallel pg_dump to be
> committed without it?

Because parallel pg_dump didn't make the problem any *worse*..?  This
does.  The problem existed before parallel pg_dump.

> There is no risk that is larger than the one already exposed by the
> existing user API.

The API exposes it, yes, but *pg_dump* isn't any worse than it was
before.

> If you do see a risk in the existing API, please deprecate it and
> remove it from the docs, or mark it not-for-use-by-users. I hope you
> don't, but if you do, do it now - I'll be telling lots of people about
> all the useful things you can do with it over the next few years,
> hopefully in pg_dump as well.

pg_dump uses it already and uses it as best it can.  Users could use it
also, provided they understand the constraints around it.  However,
there really isn't a way for users to use this new option correctly-
they would need to intuit what pg_dump will want to lock, lock it
immediately after their transaction is created, and only *then* get the
snapshot ID and pass it to pg_dump, hoping against hope that pg_dump
will actually need the locks that they decided to acquire..
Thanks,
    Stephen

Re: pg_dump --snapshot

From
Andres Freund
Date:
On 2013-05-06 20:18:26 -0400, Stephen Frost wrote:
> Simon,
> 
> * Simon Riggs (simon@2ndQuadrant.com) wrote:
> > If anybody really wanted to fix pg_dump, they could do. If that was so
> > important, why block this patch, but allow parallel pg_dump to be
> > committed without it?

> Because parallel pg_dump didn't make the problem any *worse*..?  This
> does.  The problem existed before parallel pg_dump.

Yes, it did.

> > There is no risk that is larger than the one already exposed by the
> > existing user API.

> The API exposes it, yes, but *pg_dump* isn't any worse than it was
> before.

No, but it's still broken. pg_dump without the parameter being passed
isn't any worse off after the patch has been applied. With the parameter
the window gets a bit bigger sure...

> > If you do see a risk in the existing API, please deprecate it and
> > remove it from the docs, or mark it not-for-use-by-users. I hope you
> > don't, but if you do, do it now - I'll be telling lots of people about
> > all the useful things you can do with it over the next few years,
> > hopefully in pg_dump as well.

> pg_dump uses it already and uses it as best it can.  Users could use it
> also, provided they understand the constraints around it.  However,
> there really isn't a way for users to use this new option correctly-
> they would need to intuit what pg_dump will want to lock, lock it
> immediately after their transaction is created, and only *then* get the
> snapshot ID and pass it to pg_dump, hoping against hope that pg_dump
> will actually need the locks that they decided to acquire..

Given that we don't have all that many types of objects we can lock,
that task isn't all that complicated. But I'd guess a very common usage
is to start the snapshot and immediately fork pg_dump. In that case the
window between snapshot acquisition and reading the object list is
probably smaller than the one between reading the object list and
locking.

This all reads like a textbook case of "perfect is the enemy of good" to
me.

A rather useful feature has to fix a bug in pg_dump which a) exists for
ages b) has yet to be reported to the lists c) is rather complicated to
fix and quite possibly requires proper snapshots for internals?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_dump --snapshot

From
Andres Freund
Date:
On 2013-05-07 02:53:16 +0200, Andres Freund wrote:
> A rather useful feature has to fix a bug in pg_dump which a) exists for
> ages b) has yet to be reported to the lists c) is rather complicated to
> fix and quite possibly requires proper snapshots for internals?

Just to clarify: I think this worth fixing, but it just seems like
something that needs to be fixed independently from this feature.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_dump --snapshot

From
Stephen Frost
Date:
* Andres Freund (andres@2ndquadrant.com) wrote:
> On 2013-05-06 20:18:26 -0400, Stephen Frost wrote:
> > Because parallel pg_dump didn't make the problem any *worse*..?  This
> > does.  The problem existed before parallel pg_dump.
>
> Yes, it did.

That's not entirely clear- are you agreeing with my statements, or not?

> > The API exposes it, yes, but *pg_dump* isn't any worse than it was
> > before.
>
> No, but it's still broken. pg_dump without the parameter being passed
> isn't any worse off after the patch has been applied. With the parameter
> the window gets a bit bigger sure...

I'm not entirely following the distinction you're making here.  What I
think you're saying is that "pg_dump is still busted" and "pg_dump when
the parameter isn't passed is busted" and "pg_dump creates a bigger
window where it can break if the parameter is passed".  All of which I
think I agree with, but I don't agree with the conclusion that this
larger window is somehow acceptable because there's a very small window
(one which can't be made any smaller, today..) which exists today.

> Given that we don't have all that many types of objects we can lock,
> that task isn't all that complicated.

Alright, then let's provide a function which will do that and tell
people to use it instead of just using pg_export_snapshot(), which
clearly doesn't do that.

> But I'd guess a very common usage
> is to start the snapshot and immediately fork pg_dump. In that case the
> window between snapshot acquisition and reading the object list is
> probably smaller than the one between reading the object list and
> locking.

How would it be smaller..?  I agree that it may only be a few seconds
larger, but you're adding things to the front which the current code
doesn't run, yet running everything the current code runs, so it'd have
to be larger..

> This all reads like a textbook case of "perfect is the enemy of good" to
> me.

I believe the main argument here is really around "you should think
about these issues before just throwing this in" and not "it must be
perfect before it goes in".  Perhaps "it shouldn't make things *worse*
than they are now" would also be apt..

> A rather useful feature has to fix a bug in pg_dump which a) exists for
> ages b) has yet to be reported to the lists c) is rather complicated to
> fix and quite possibly requires proper snapshots for internals?

I've not seen anyone calling for this to be fixed in pg_dump first,
though I did suggest how that might be done.  Rather, it shouldn't make
things *worse* than they are now, which apparently isn't difficult, per
your comments above...  so why not fix this to at least not make things
worse?
Thanks,
    Stephen

Re: pg_dump --snapshot

From
Craig Ringer
Date:

On 05/07/2013 06:37 AM, Joe Conway wrote:
> On 05/06/2013 03:00 PM, Stephen Frost wrote:
> > For example, I'm not sure that we need more information in the
> > WAL.. What we need is a way to tell VACUUM to skip over 'recently
> > modified' records and not mark them as dead until some time has
> > passed.  This is essentially what we're already doing with the
> > feedback mechanism in replication, isn't it?  Would it be possible
> > to make that a simple timer instead of requiring a replication
> > system which is feeding back that information based on queries
> > which are running on the replica?
>
> This sounds a lot like having an "idle in transaction" session hanging
> around. Could we use something akin to a prepared transaction to
> preserve the ability to use the snapshot? Maybe with an (optional)
> expiration timestamp?

I was thinking the same thing myself but assuming it was just another
one of my impractical ideas that're borne of lack of in-depth
understanding of the problem. A "lock holder" of some kind that's
independent of the session seems to be required here, either associated
directly with the snapshot or created after it and managed
independently. A prepared transaction created shortly after the commit
the snapshot refers to that holds all the required locks would seem to
serve the required purpose. A bit of a hack, but enough to make this
sane to use until/unless someone has the time/funding to do the major
rework needed to make this work "right".

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: pg_dump --snapshot

From
Andres Freund
Date:
On 2013-05-06 21:07:36 -0400, Stephen Frost wrote:
> * Andres Freund (andres@2ndquadrant.com) wrote:
> > On 2013-05-06 20:18:26 -0400, Stephen Frost wrote:
> > > Because parallel pg_dump didn't make the problem any *worse*..?  This
> > > does.  The problem existed before parallel pg_dump.
> > 
> > Yes, it did.
> 
> That's not entirely clear- are you agreeing with my statements, or
> not?

I am agreeing its a very old problem that has existed before parallel
pg_dump.

> > > The API exposes it, yes, but *pg_dump* isn't any worse than it was
> > > before.
> > 
> > No, but it's still broken. pg_dump without the parameter being passed
> > isn't any worse off after the patch has been applied. With the parameter
> > the window gets a bit bigger sure...
> 
> I'm not entirely following the distinction you're making here.  What I
> think you're saying is that "pg_dump is still busted" and "pg_dump when
> the parameter isn't passed is busted" and "pg_dump creates a bigger
> window where it can break if the parameter is passed".

Yes, that's what I was trying to say.

> All of which I
> think I agree with, but I don't agree with the conclusion that this
> larger window is somehow acceptable because there's a very small window
> (one which can't be made any smaller, today..) which exists today.

The window isn't that small currently:

a) If one of our lock statements has to wait for a preexisting
conflicting lock we have to wait, possibly for a very long
time. All the while some other objects are not locked by any backend.
b) Locking all relations in a big database can take a second or some,
even if there are no conflicting locks.

> > Given that we don't have all that many types of objects we can lock,
> > that task isn't all that complicated.
> 
> Alright, then let's provide a function which will do that and tell
> people to use it instead of just using pg_export_snapshot(), which
> clearly doesn't do that.

If it were clear cut what to lock and we had locks for
everything. Maybe. But we don't have locks for everything. So we would
need to take locks preventing any modification on any of the system catalogs,
which doesn't really seem like a good thing, especially as we can't
release them from SQL during the dump, where we can allow creation of
temp tables and everything without problems.

Also, as explained above, the problem already exists in larger
timeframes than referenced in this thread, so I really don't see how
anything that's only based on plain locks on user objects can solve the
issue in a relevant enough way.

> > But I'd guess a very common usage
> > is to start the snapshot and immediately fork pg_dump. In that case the
> > window between snapshot acquisition and reading the object list is
> > probably smaller than the one between reading the object list and
> > locking.
> 
> How would it be smaller..?  I agree that it may only be a few seconds
> larger, but you're adding things to the front which the current code
> doesn't run, yet running everything the current code runs, so it'd have
> to be larger..

I am comparing the time between 'snapshot acquisition' and 'getting
the object list' with the time between 'getting the object list' and
'locking the object list'. What I am saying is that in many scenarios
the second part will be the bigger part.

> > This all reads like a textbook case of "perfect is the enemy of good" to
> > me.
> 
> I believe the main argument here is really around "you should think
> about these issues before just throwing this in" and not "it must be
> perfect before it goes in".  Perhaps "it shouldn't make things *worse*
> than they are now" would also be apt..

That's not how I read 8465.1367860037@sss.pgh.pa.us :(

> > A rather useful feature has to fix a bug in pg_dump which a) exists for
> > ages b) has yet to be reported to the lists c) is rather complicated to
> > fix and quite possibly requires proper snapshots for internals?

> I've not seen anyone calling for this to be fixed in pg_dump first,
> though I did suggest how that might be done.

I think there is no point in fixing it somewhere else. The problem is in
pg_dump, not the snapshot import/export.

You did suggest how it can be fixed? You mean
20130506214515.GL4361@tamriel.snowman.net?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_dump --snapshot

From
Simon Riggs
Date:
On 7 May 2013 01:18, Stephen Frost <sfrost@snowman.net> wrote:

> * Simon Riggs (simon@2ndQuadrant.com) wrote:
>> If anybody really wanted to fix pg_dump, they could do. If that was so
>> important, why block this patch, but allow parallel pg_dump to be
>> committed without it?
>
> Because parallel pg_dump didn't make the problem any *worse*..?  This
> does.

Sorry, not accurate. Patch makes nothing *worse*.

The existing API *can* be misused in the way you say, and so also
could pg_dump if the patch is allowed.

However, there is no reason to suppose that such misuse would be
common; no reason why a timing gap would *necessarily* occur in the
way your previous example showed, or if it did why it would
necessarily present a problem for the user. Especially if we put
something in the docs.

> pg_dump uses it already and uses it as best it can.  Users could use it
> also, provided they understand the constraints around it.

Snapshots have no WARNING on them. There is no guarantee in any
transaction that the table you want will not be dropped before you try
to access it. *Any* program that dynamically assembles a list of
objects and then acts on them is at risk of having an out-of-date list
of objects as the database moves forward. This applies to any form of
snapshot, not just this patch, nor even just exported snapshots.

> However,
> there really isn't a way for users to use this new option correctly-

Not accurate.

> they would need to intuit what pg_dump will want to lock, lock it
> immediately after their transaction is created, and only *then* get the
> snapshot ID and pass it to pg_dump, hoping against hope that pg_dump
> will actually need the locks that they decided to acquire..

The argument against this is essentially that we don't trust the user
to use it well, so we won't let them have it at all. Which makes no
sense since they already have this API and don't need our permission
to use it. All that blocking this patch does is to remove any chance
the user has of coordinating pg_dump with other actions; preventing
that causes more issues for the user and so doing nothing is not
safe or correct either. A balanced viewpoint needs to include the same
level of analysis on both sides, not just a deep look at the worst
case on one side and claim everything is rosy with the current
situation.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_dump --snapshot

From
Stephen Frost
Date:
* Andres Freund (andres@2ndquadrant.com) wrote:
> > All of which I
> > think I agree with, but I don't agree with the conclusion that this
> > larger window is somehow acceptable because there's a very small window
> > (one which can't be made any smaller, today..) which exists today.
>
> The window isn't that small currently:

Agreed- but it also isn't currently possible to make it any smaller.

> b) Locking all relations in a big database can take a second or some,
> even if there are no conflicting locks.

Yes, I've noticed.. :(  You can also run out of locks, which is quite
painful.

> > Alright, then let's provide a function which will do that and tell
> > people to use it instead of just using pg_export_snapshot(), which
> > clearly doesn't do that.
>
> If it were clear cut what to lock and we had locks for
> everything. Maybe. But we don't have locks for everything.

My suggestion was to lock everything that pg_dump locks, which we
clearly have locks for since pg_dump is acquiring them.  Also, I don't
believe it'd be that difficult to identify what pg_dump would lock, at
least in a 'default' whole-database run.  This is more of a stop-gap
than a complete solution.

> So we would
> need to take locks preventing any modification on any of the system catalogs,
> which doesn't really seem like a good thing, especially as we can't
> release them from SQL during the dump, where we can allow creation of
> temp tables and everything without problems.

That's already an issue when pg_dump runs, no?  Not sure why this is
different.

> Also, as explained above, the problem already exists in larger
> timeframes than referenced in this thread, so I really don't see how
> anything thats only based on plain locks on user objects can solve the
> issue in a relevant enough way.

The point is to try and avoid making the problem worse..

> I am comparing the time between 'snapshot acquisition' and 'getting
> the object list' with the time between 'getting the object list' and
> 'locking the object list'. What I am saying is that in many scenarios
> the second part will be the bigger part.

I can see how that can happen, sure.

> > I believe the main argument here is really around "you should think
> > about these issues before just throwing this in" and not "it must be
> > perfect before it goes in".  Perhaps "it shouldn't make things *worse*
> > than they are now" would also be apt..
>
> That's not how I read 8465.1367860037@sss.pgh.pa.us :(

I believe the point that Tom is making is that we shouldn't paint
ourselves into a corner by letting users provide old snapshots to
pg_dump which haven't acquired any of the necessary locks.  The goal, at
least as I read it, is to come up with a workable design (and I don't
know that we have, but still) which provides a way for the locks to be
taken at least as quickly as what pg_dump does today and which we could
modify down the road to take the locks pre-snapshot (presuming we can
figure out a way to make that work).

The proposed patch certainly doesn't make any attempt to address that
issue and would encourage users to open themselves up to this risk more
than they are exposed today w/ pg_dump.

> I think there is no point in fixing it somewhere else. The problem is in
> pg_dump, not the snapshot import/export.

It's really a problem for just about everything that uses transactions
and locking, isn't it?  pg_dump just happens to have it worst since it
wants to go and touch every object in the database.  It's certainly
possible for people to connect to the DB, look at pg_class and then try
to access some object which no longer exists (even though it's in
pg_class).  This will be an interesting thing to consider when
implementing MVCC for the catalog.

> You did suggest how it can be fixed? You mean
> 20130506214515.GL4361@tamriel.snowman.net?

I suggested how it might be done. :)  There are undoubtedly issues with an
all-database-objects lock, but it would certainly reduce the time
between transaction start and getting all the locks acquired and shrink
the window that much more.  If we did implement such a beast, how could
we ensure that the locks were taken immediately after transaction start
if the snapshot is being passed to pg_dump?  Basically, if we *did*
solve this issue for pg_dump in some way in the future, how would we use
it if pg_dump can accept an outside snapshot?

One other thought did occur to me- we could simply have pg_dump export
the snapshot that it gets to stdout, a file, whatever, and systems which
are trying to do this magic "everyone gets the same view" could glob
onto the snapshot created by pg_dump, after all the locks have been
acquired..
Thanks,
    Stephen

Re: pg_dump --snapshot

From
Stephen Frost
Date:
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> On 7 May 2013 01:18, Stephen Frost <sfrost@snowman.net> wrote:
> > * Simon Riggs (simon@2ndQuadrant.com) wrote:
> >> If anybody really wanted to fix pg_dump, they could do. If that was so
> >> important, why block this patch, but allow parallel pg_dump to be
> >> committed without it?
> >
> > Because parallel pg_dump didn't make the problem any *worse*..?  This
> > does.
>
> Sorry, not accurate. Patch makes nothing *worse*.

I really don't understand that claim.  It clearly increases the time
between transaction start and when the locks are all acquired.  If this
is used programatically (instead of someone hand-typing commands into
psql), perhaps it doesn't increase it very much, but increasing it at
all is clearly going in the "worse" direction.

> The existing API *can* be misused in the way you say, and so also
> could pg_dump if the patch is allowed.

And if the patch isn't allowed, then pg_dump isn't any worse off than
it's always been.  The corollary is that pg_dump *is* worse off if the
patch goes in.  The patch is absolutely *encouraging* such misuse of
this API because there's almost no other way to use this new option to
pg_dump except as an abuse of the API.

> However, there is no reason to suppose that such misuse would be
> common;

The concern is exactly this point.  It wouldn't simply be common: were
this patch applied and the option used, it would be the predominant
case and tools would be built which either completely ignore the problem
(most likely) or which attempt to duplicate what pg_dump already does,
likely very poorly.

> no reason why a timing gap would *necessarily* occur in the
> way your previous example showed, or if it did why it would
> necessarily present a problem for the user. Especially if we put
> something in the docs.

I don't believe papering over this with documentation is really a
solution.  I did suggest a few other options which don't seem to be
getting much traction from anyone, so perhaps they're not workable, but
I'll reiterate them for fun anyway:

Provide a way for users (perhaps even pg_dump) to acquire locks on
essentially everything in the DB.  This could be a plpgsql function or
perhaps even a new lock type.  Grander ideas might also support the
filtering options which pg_dump supports.

Have pg_dump output the snapshot which it has acquired and shared, for
others to then connect and use.  It would do this after acquiring all of
the locks, of course.

> Snapshots have no WARNING on them.

The documentation is certainly fallible, though this is certainly a
problem which all applications need to address in some way.  We address
it as best we can in pg_dump today and the goal is to continue to do so.

> There is no guarantee in any
> transaction that the table you want will not be dropped before you try
> to access it.

Hence why we do the best we can in pg_dump by immediately locking the
objects, once we've identified what they are.

> > However,
> > there really isn't a way for users to use this new option correctly-
>
> Not accurate.

Then please articulate how they would use it correctly?  Would they
write a function which goes and acquires all of the locks which pg_dump
would (assuming they've figured out what locks pg_dump needs), prior to
exporting the snapshot and calling pg_dump?  If so, perhaps we could
simplify their lives by providing such a function for them to use
instead of asking each user to write it?

> The argument against this is essentially that we don't trust the user
> to use it well, so we won't let them have it at all. Which makes no
> sense since they already have this API and don't need our permission
> to use it. All that blocking this patch does is to remove any chance
> the user has of coordinating pg_dump with other actions; preventing
> that causes more issues for the user and so doing nothing is not a
> safe or correct either. A balanced viewpoint needs to include the same
> level of analysis on both sides, not just a deep look at the worst
> case on one side and claim everything is rosy with the current
> situation.

I'm not sure where the claim that everything is rosy was made; I've
certainly not seen it nor made any claim that it is.  This whole
discussion started with Tom pointing out that pg_dump is essentially
already busted and that this change would end up making that situation
worse.  Perhaps in very select cases users will do the right thing and
acquire all the locks they need before exporting the snapshot and
calling pg_dump, but that description of how to use this option
correctly certainly wasn't articulated anywhere in the initial
description or contents of the patch.  There's also nothing done to make
that any easier to do nor a way to move users using this system to a new
methodology when we figure out how to make pg_dump do this better, thus
putting us in a situation of *always* having this issue, even if we fix
it for 'normal' pg_dump.
Thanks,
    Stephen

Re: pg_dump --snapshot

From
Andres Freund
Date:
Hi,

On 2013-05-07 08:54:54 -0400, Stephen Frost wrote:
> * Andres Freund (andres@2ndquadrant.com) wrote:
> > > All of which I
> > > think I agree with, but I don't agree with the conclusion that this
> > > larger window is somehow acceptable because there's a very small window
> > > (one which can't be made any smaller, today..) which exists today.
> >
> > The window isn't that small currently:
>
> Agreed- but it also isn't currently possible to make it any smaller.

Uh. Why not? I think this is what needs to be fixed instead of making
the hole marginally smaller elsewhere. You can trivially reproduce the
problem with pg_dump today:

S1:
$ psql postgres
=# CREATE DATABASE pgdumptest;
=# CREATE DATABASE pgrestoretest;
=# \c pgdumptest
=# CREATE TABLE tbl(id serial primary key, data_a int8, data_b float8);
=# INSERT INTO tbl(data_a, data_b) SELECT random()::int, random() FROM generate_series(1, 10);
=# BEGIN;
=# ALTER TABLE tbl RENAME COLUMN data_a TO data_swap;
=# ALTER TABLE tbl RENAME COLUMN data_b TO data_a;
=# ALTER TABLE tbl RENAME COLUMN data_swap TO data_b;

S2:
$ pg_dump pgdumptest > /tmp/pgdump.sql

S1:
=# COMMIT;

S2:
$ psql pgrestoretest -f /tmp/pgdump.sql
psql:/tmp/pgdump.sql:87: ERROR:  invalid input syntax for integer: "0.944006722886115313"
CONTEXT:  COPY tbl, line 1, column data_a: "0.944006722886115313"

A ddl upgrade script taking some seconds isn't exactly anything
unusual...

> > > Alright, then let's provide a function which will do that and tell
> > > people to use it instead of just using pg_export_snapshot(), which
> > > clearly doesn't do that.
> >
> > If it were clear cut what to lock and we had locks for
> > everything. Maybe. But we don't have locks for everything.
>
> My suggestion was to lock everything that pg_dump locks, which we
> clearly have locks for since pg_dump is acquiring them.  Also, I don't
> believe it'd be that difficult to identify what pg_dump would lock, at
> least in a 'default' whole-database run.  This is more of a stop-gap
> than a complete solution.

The problem is that locking - as shown above - doesn't really help all
that much. You would have to do it like:
1) start txn
2) acquire DDL prevention lock
3) assert we do not yet have a snapshot
4) acquire snapshot
5) lock objects
6) release DDL lock
7) dump objects/data
8) commit txn

Unfortunately most of these steps cannot easily/safely be exposed to
SQL. And again, this is a very old situation that doesn't really have
to do anything with snapshot exporting.

> > So we would
> > need to take locks preventing any modification on any of system catalogs
> > which doesn't really seem like a good thing, especially as we can't
> > release them from sql during the dump were we can allow creation of
> > temp tables and everything without problems.
>
> That's already an issue when pg_dump runs, no?  Not sure why this is
> different.

pg_dump doesn't prevent you from running CREATE TEMPORARY TABLE? That
would make it unrunnable in many situations. Especially as we cannot
easily (without using several connections at once) release locks before
ending a transaction.

> > > I believe the main argument here is really around "you should think
> > > about these issues before just throwing this in" and not "it must be
> > > perfect before it goes in".  Perhaps "it shouldn't make things *worse*
> > > than they are now" would also be apt..
> >
> > That's not how I read 8465.1367860037@sss.pgh.pa.us :(
>
> I believe the point that Tom is making is that we shouldn't paint
> ourselves into a corner by letting users provide old snapshots to
> pg_dump which haven't acquired any of the necessary locks.  The goal, at
> least as I read it, is to come up with a workable design (and I don't
> know that we have, but still) which provides a way for the locks to be
> taken at least as quickly as what pg_dump does today and which we could
> modify down the road to take the locks pre-snapshot (presuming we can
> figure out a way to make that work).

> The proposed patch certainly doesn't make any attempt to address that
> issue and would encourage users to open themselves up to this risk more
> than they are exposed today w/ pg_dump.

I fail to see a difference that is big enough to worry overly much
about. The above problem is easy enough to encounter without any
snapshot exporting and I can't remember a single report.

> > I think there is no point in fixing it somewhere else. The problem is in
> > pg_dump, not the snapshot import/export.
>
> It's really a problem for just about everything that uses transactions
> and locking, isn't it?  pg_dump just happens to have it worst since it
> wants to go and touch every object in the database.  It's certainly
> possible for people to connect to the DB, look at pg_class and then try
> to access some object which no longer exists (even though it's in
> pg_class).

Well, normal sql shouldn't need to touch pg_class and will know
beforehand which locks it will need. But I have to say I more than once
wished we would throw an error if an object's definition is "newer" than
the one we started out with.

>  This will be an interesting thing to consider when
> implementing MVCC for the catalog.

I think using a proper MVCC snapshot for catalog scans doesn't, and cannot even
in all cases, imply having to use the user's transaction's snapshot, just
one that guarantees a consistent result while a query is running.

> > You did suggest how it can be fixed? You mean
> > 20130506214515.GL4361@tamriel.snowman.net?
>
> I suggested how it might be done. :)  There are undoubtedly issues with an
> all-database-objects lock, but it would certainly reduce the time
> between transaction start and getting all the locks acquired and shrink
> the window that much more.  If we did implement such a beast, how could
> we ensure that the locks were taken immediately after transaction start
> if the snapshot is being passed to pg_dump?  Basically, if we *did*
> solve this issue for pg_dump in some way in the future, how would we use
> it if pg_dump can accept an outside snapshot?

I am not sure if the correct fix is locking and not just making sure the
definition of objects hasn't changed since the snapshot started. But if
we go for locking, creating a function which makes sure that the source
transaction has a certain strong lock wouldn't be that hard. We have all
the data for it.
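
For illustration, a sketch of such a check built on pg_locks; the
function name is made up and it assumes the caller knows the pid of the
backend that exported the snapshot:

-- hypothetical helper, not proposed code: does the exporting backend
-- (still) hold a granted relation lock on the given table?
CREATE FUNCTION snapshot_source_holds_lock(src_pid int, rel regclass)
RETURNS boolean LANGUAGE sql AS $$
    SELECT EXISTS (SELECT 1
                     FROM pg_locks
                    WHERE pid = src_pid
                      AND locktype = 'relation'
                      AND relation = rel
                      AND granted);
$$;

A real version would presumably also check the lock mode, not just that
some relation lock is held.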

> One other thought did occur to me- we could simply have pg_dump export
> the snapshot that it gets to stdout, a file, whatever, and systems which
> are trying to do this magic "everyone gets the same view" could glob
> onto the snapshot created by pg_dump, after all the locks have been
> acquired..

Several problems:
a) exporting a snapshot to a file was discussed and deemed unacceptably
risky. That's why pg_export_snapshot() exports it itself into some
internal format somewhere. The user doesn't need to know where/how. (The
export/import mechanics are shown in the example after this list.)
b) When importing a snapshot the source transaction needs to be alive,
otherwise we cannot guarantee the global xmin hasn't advanced too
much. That would open up very annoying race-conditions because pg_dump
would need to live long enough for the separate transaction to import
data.
c) Quite possibly the snapshot we need has to meet some special
criteria that pg_dump cannot guarantee.
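
For reference, the export/import mechanics that (a) and (b) describe
look like this today (the snapshot identifier is just an example value):

-- session 1: must stay open for as long as anyone still wants the snapshot
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();      -- returns e.g. '000003A1-1'

-- session 2, e.g. a parallel pg_dump worker: adopt exactly that snapshot
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '000003A1-1';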

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_dump --snapshot

From
Stephen Frost
Date:
* Andres Freund (andres@2ndquadrant.com) wrote:
> On 2013-05-07 08:54:54 -0400, Stephen Frost wrote:
> > Agreed- but it also isn't currently possible to make it any smaller.
>
> Uh. Why not? I think this is what needs to be fixed instead of making
> the hole marginally smaller elsewhere.

If we're able to fix it- how would we allow users to take advantage of
that fix when starting their own snapshot and feeding it to pg_dump?

> You can trivially reproduce the
> problem with pg_dump today:

Agreed.  The idea was to simply avoid making it worse.  Your argument
seems to be that it's already horrible and easily broken and therefore
we can go ahead and make it worse and no one will complain because no
one has complained about how bad it is already.  I don't follow that.

> > My suggestion was to lock everything that pg_dump locks, which we
> > clearly have locks for since pg_dump is acquiring them.  Also, I don't
> > believe it'd be that difficult to identify what pg_dump would lock, at
> > least in a 'default' whole-database run.  This is more of a stop-gap
> > than a complete solution.
>
> The problem is that locking - as shown above - doesn't really help all
> that much.

It helps in that once we have the lock, things aren't changing under us.
The closer we can keep that to when the transaction starts, the better..

> You would have to do it like:
> 1) start txn
> 2) acquire DDL prevention lock
> 3) assert we do not yet have a snapshot
> 4) acquire snapshot
> 5) lock objects
> 6) release DDL lock
> 7) dump objects/data
> 8) commit txn

We'd need a '4.5' to get the list of objects, no?
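
For plain tables that list is essentially what pg_dump's getTables()
query produces; heavily simplified it boils down to something like:

SELECT c.oid, n.nspname, c.relname
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind = 'r'
   AND n.nspname NOT IN ('pg_catalog', 'information_schema');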

> Unfortunately most of these steps cannot easily or safely be exposed to
> SQL. And again, this is a very old situation that doesn't really have
> anything to do with snapshot exporting.

Perhaps we can't expose the individual components, but perhaps we could
have a function which does some/all of the pieces (except the pieces
that must be done by pg_dump) and which users need to grant explicit
execute rights to for their "backup" user?
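
Something along these lines, with a purely hypothetical function name:

-- pg_prepare_database_dump() does not exist; illustration only
REVOKE EXECUTE ON FUNCTION pg_prepare_database_dump() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_prepare_database_dump() TO backup_user;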

> pg_dump doesn't prevent you from running CREATE TEMPORARY TABLE? That
> would make it unrunnable in many situations. Especially as we cannot
> easily (without using several connections at once) release locks before
> ending a transaction.

I agree that we wouldn't want pg_dump preventing all catalog updates,
but wouldn't we be able to draw a distinction between objects being
modified and objects being added?

> > It's really a problem for just about everything that uses transactions
> > and locking, isn't it?  pg_dump just happens to have it worst since it
> > wants to go and touch every object in the database.  It's certainly
> > possible for people to connect to the DB, look at pg_class and then try
> > to access some object which no longer exists (even though it's in
> > pg_class).
>
> Well, normal SQL shouldn't need to touch pg_class and will know
> beforehand which locks it will need. But I have to say I have more than
> once wished we would throw an error if an object's definition is "newer"
> than the one we started out with.

I agree- that would be nice.  I'd also contend that 'normal' SQL quite
often looks at pg_class; I know that we have tons of code which does.
We also basically lock all users out of the database when we're doing
DDL changes to avoid any issues, but it'd certainly be nice if we didn't
have to..

> >  This will be an interesting thing to consider when
> > implementing MVCC for the catalog.
>
> I think using a proper MVCC snapshot for catalog scans doesn't, and in
> some cases even cannot, imply having to use the user transaction's
> snapshot, just one that guarantees a consistent result while a query is
> running.

The hope would be to see a consistent view of what you can access while
the transaction is running..

> I am not sure if the correct fix is locking and not just making sure the
> definition of objects hasn't changed since the snapshot started.

I'm guessing that the intent of the locking is to make sure that the
objects don't change under us. :)  The problem, as you've explained, is
that the object might change before we get our lock in place.  That
window of opportunity gets a lot larger when it's moved outside of our
control.

> But if
> we go for locking, creating a function which makes sure that the source
> transaction has a certain strong lock wouldn't be that hard. We have all
> the data for it.

Agreed.

> a) exporting a snapshot to a file was discussed and deemed unacceptably
> risky. That's why pg_export_snapshot() exports it itself into some
> internal format somewhere. The user doesn't need to know where/how.

It wasn't my intent to export the 'snapshot' to a file but rather to
simply dump out what pg_dump gets when it runs pg_export_snapshot(),
allowing others to connect in and use that snapshot.

> b) When importing a snapshot the source transaction needs to be alive,
> otherwise we cannot guarantee the global xmin hasn't advanced too
> much. That would open up very annoying race-conditions because pg_dump
> would need to live long enough for the separate transaction to import
> data.

Yes, there's some race-condition risk here, but I don't think that's
insoluble (wait till the file disappears before releasing the
connection?).  I agree that it might not be pretty either.

> c) Quite possibly the snapshot we need has to meet some special
> criteria that pg_dump cannot guarantee.

That's a much more difficult case, of course.  It would help to have a
better understanding of what, exactly, Simon's use-case for this feature
is to answer if this is an issue or not.
Thanks,
    Stephen

Re: pg_dump --snapshot

From
Greg Stark
Date:
On Tue, May 7, 2013 at 1:54 PM, Stephen Frost <sfrost@snowman.net> wrote:
> I believe the point that Tom is making is that we shouldn't paint
> ourselves into a corner by letting users provide old snapshots to
> pg_dump which haven't acquired any of the necessary locks.  The goal, at
> least as I read it, is to come up with a workable design (and I don't
> know that we have, but still) which provides a way for the locks to be
> taken at least as quickly as what pg_dump does today and which we could
> modify down the road to take the locks pre-snapshot (presuming we can
> figure out a way to make that work).

One natural way to do it would be to make an option to pg_dump which
caused it to do all the normal pre-dump things it would normally do,
then export a snapshot and wait for the user. (Alternately it could
even create a prepared transaction which iirc keeps the locks until
it's committed). That gives users a way to get a snapshot that is
guaranteed to work until that transaction is exited.

But I don't think that would really make users happy. I think the
usual use case for this  feature would be to dump a single table or
small number of tables as of some time in the past that they didn't
plan in advance that they would need. They might have a cron job
periodically export a snapshot "just in case" and then want to use it
later. They wouldn't be happy if they had to create a prepared
transaction for each such snapshot which locked every table in their
database until they decide they don't actually need it. That would
mean they could never do any ddl.

The use case of wanting to dump a single table as of a few hours ago
(e.g. before some application data loss bug) is pretty compelling. If
we could do it, I think it would be worth quite a bit.

What's the worst case for using an old snapshot? If I try to access a
table that doesn't exist any longer I'll get an error. That doesn't
really seem that bad for the use case I described. It's worse for the
full table dump but for an explicit list of tables, eh. Seems ok to
me.

If I try to access a table whose schema has changed then I might use
the wrong tupledesc  and see rows that don't decode properly. That
would be a disaster. Can we protect against that by noticing that the
pg_class row isn't visible to our snapshot and throw an error? Would
that be sufficient to protect against all schema changes? Would it
cause massive false positives based on whether vacuum had happened to
have run recently?



-- 
greg



Re: pg_dump --snapshot

From
Stephen Frost
Date:
Greg,

* Greg Stark (stark@mit.edu) wrote:
> One natural way to do it would be to make an option to pg_dump which
> caused it to do all the normal pre-dump things it would normally do,
> then export a snapshot and wait for the user. (Alternately it could
> even create a prepared transaction which iirc keeps the locks until
> it's committed). That gives users a way to get a snapshot that is
> guaranteed to work until that transaction is exited.

Right, that was one of the suggestions that I made up-thread a bit.

> But I don't think that would really make users happy. I think the
> usual use case for this  feature would be to dump a single table or
> small number of tables as of some time in the past that they didn't
> plan in advance that they would need. They might have a cron job
> periodically export a snapshot "just in case" and then want to use it
> later.

The snapshot has to be 'alive' in order to be joined, so unless we're
changing something else, I don't think this cronjob approach would
actually work (unless it forks off and keeps the transaction open, but
I didn't read that from what you wrote..).

> They wouldn't be happy if they had to create a prepared
> transaction for each such snapshot which locked every table in their
> database until they decide they don't actually need it. That would
> mean they could never do any ddl.

If they don't lock the table then any DDL they do would break the backup
*anyway*.  This is a pretty clear example of exactly the problem with
simply adding this option to pg_dump..  The view of pg_class would be
from when the snapshot was taken, but without locks, anything could have
changed between then and when the backup tries to run.

> The use case of wanting to dump a single table as of a few hours ago
> (e.g. before some application data loss bug) is pretty compelling. If
> we could do it it I think it would be worth quite a bit.

I certainly like the idea, but it seems rather beyond what Simon was
after with this patch, aiui, and does come with some other concerns like
dealing with what happens if the table was modified after the snapshot
was taken (possible because there wasn't a lock on it).

> What's the worst case for using an old snapshot? If I try to access a
> table that doesn't exist any longer I'll get an error. That doesn't
> really seem that bad for the use case I described. It's worse for the
> full table dump but for an explicit list of tables, eh. Seems ok to
> me.

I'm not convinced that risk of error is the only issue with this..  What
if a column was dropped and then a new one put in its place (with the
same name)?
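
For instance (illustrative names), the following leaves a column with the
same name but a new attnum and possibly a different type:

ALTER TABLE some_table DROP COLUMN payload;
ALTER TABLE some_table ADD COLUMN payload integer;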

> If I try to access a table whose schema has changed then I might use
> the wrong tupledesc  and see rows that don't decode properly. That
> would be a disaster. Can we protect against that by noticing that the
> pg_class row isn't visible to our snapshot and throw an error? Would
> that be sufficient to protect against all schema changes? Would it
> cause massive false positives based on whether vacuum had happened to
> have run recently?

The way this is handled now is that we use syscache to get whatever the
current view of the object is when we're running queries against it.
The problem is that the 'current' view ends up being different from what
we see in pg_class / pg_attribute.  Andres provided a good example which
illustrates this upthread.
Thanks,
    Stephen

Re: pg_dump --snapshot

From
Tom Lane
Date:
Greg Stark <stark@mit.edu> writes:
> [ ideas about dumping some past state of a table ]

> If I try to access a table whose schema has changed then I might use
> the wrong tupledesc  and see rows that don't decode properly. That
> would be a disaster. Can we protect against that by noticing that the
> pg_class row isn't visible to our snapshot and throw an error? Would
> that be sufficient to protect against all schema changes? Would it
> cause massive false positives based on whether vacuum had happened to
> have run recently?

No, no, and I'm not sure :-(.  It might be sufficient to notice whether
the pg_class row and all relevant pg_attribute rows are visible in your
snapshot, at least for the narrow purpose of deciding whether you can
dump data.  (This would probably not, for instance, be enough to give
you reliable info about check or foreign key constraints.)

In general though, any such facility would surely block vacuuming on
the table, indeed probably *all* tables in the database, which would
be pretty disastrous in the long run.  I think a better answer for
people who need such a facility is to keep a WAL archive and use PITR
when they actually need to get back yesterday's data.
        regards, tom lane



Re: pg_dump --snapshot

From
Robert Haas
Date:
On Mon, May 6, 2013 at 1:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Exported snapshots allow you to coordinate a number of actions
>> together, so they all see a common view of the database. So this patch
>> allows a very general approach to this, much more so than pg_dump
>> allows currently since the exact timing of the snapshot is not
>> controlled by the user.
>
> I'm afraid that this is institutionalizing a design deficiency in
> pg_dump; namely that it takes its snapshot before acquiring locks.
> Ideally that would happen the other way around.  I don't have a good
> idea how we could fix that --- but a feature that allows imposition
> of an outside snapshot will permanently foreclose ever fixing it.
>
> What's more, this would greatly widen the risk window between when
> the snapshot is taken and when we have all the locks and can have
> some confidence that the DB isn't changing under us.

I don't find this argument very convincing.  The way you could fix the
ordering problem is:

1. take locks on all the objects you think you need to dump
2. update your snapshot and check whether the list of objects you
think you need to dump has changed
3. if yes, go to step 1, else done
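
As a rough sketch of that loop at the SQL level (a plpgsql DO block run
under the default READ COMMITTED isolation, so every catalog query sees
a fresh snapshot; the restriction to the 'public' schema is purely
illustrative, and pg_dump itself would of course do this in C):

BEGIN;  -- READ COMMITTED, the default
DO $$
DECLARE
    before oid[] := '{}';
    after  oid[];
    rel    regclass;
BEGIN
    LOOP
        -- step 1: lock every table we can currently see
        FOR rel IN
            SELECT c.oid::regclass
              FROM pg_class c
              JOIN pg_namespace n ON n.oid = c.relnamespace
             WHERE c.relkind = 'r' AND n.nspname = 'public'
        LOOP
            EXECUTE format('LOCK TABLE %s IN ACCESS SHARE MODE', rel);
        END LOOP;
        -- step 2: re-list the tables with a fresh snapshot
        SELECT array_agg(c.oid ORDER BY c.oid) INTO after
          FROM pg_class c
          JOIN pg_namespace n ON n.oid = c.relnamespace
         WHERE c.relkind = 'r' AND n.nspname = 'public';
        -- step 3: done once two consecutive listings agree
        EXIT WHEN after IS NOT DISTINCT FROM before;
        before := after;
    END LOOP;
END
$$;
-- all currently visible tables are now locked; the dump's snapshot
-- would be taken at this point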

Now, I'll grant you that this technique couldn't be used together with
the proposed --snapshot option, but so what?  Many people do DDL
infrequently enough that this is not a problem in practice.  But even
if it is a problem, I don't see why that can't simply be a documented
limitation of --snapshot.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_dump --snapshot

From
Dimitri Fontaine
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, May 6, 2013 at 1:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm afraid that this is institutionalizing a design deficiency in
>> pg_dump; namely that it takes its snapshot before acquiring locks.
>> Ideally that would happen the other way around.  I don't have a good
>> idea how we could fix that --- but a feature that allows imposition
>> of an outside snapshot will permanently foreclose ever fixing it.

I don't even understand how you could take locks for a later snapshot;
it seems to me you're talking about something much harder to design and
code than making the catalogs fully MVCC compliant.

>> What's more, this would greatly widen the risk window between when
>> the snapshot is taken and when we have all the locks and can have
>> some confidence that the DB isn't changing under us.

Rather than take some locks, you can now prevent the database objects
from changing with an event trigger. pg_dump could install that event
trigger in a preparing transaction, then do its work as currently, then
when done either remove or disable the event trigger.

All the event trigger has to do is unconditionally raise an exception
with a message explaining that no DDL command is accepted while a dump
is in progress.
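
For illustration, a minimal sketch of such a trigger, with made-up names
(9.3's event triggers are sufficient for this):

CREATE FUNCTION forbid_ddl_during_dump() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    RAISE EXCEPTION 'no DDL command is accepted while a dump is in progress';
END;
$$;

CREATE EVENT TRIGGER no_ddl_during_dump
    ON ddl_command_start
    EXECUTE PROCEDURE forbid_ddl_during_dump();

-- ... pg_dump runs here ...

DROP EVENT TRIGGER no_ddl_during_dump;

Note that event triggers do not fire for commands on shared objects
(roles, databases, tablespaces), so this would not be completely
airtight.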

> Now, I'll grant you that this technique couldn't be used together with
> the proposed --snapshot option, but so what?  Many people do DDL
> infrequently enough that this is not a problem in practice.  But even
> if it is a problem, I don't see why that can't simply be a documented
> limitation of --snapshot.

IME it's a problem in practice, because people tend to want to run their
pg_dump and their rollouts within the same maintenance window. I mean,
those lucky guys out there who have such a thing as a maintenance window.

Again, it seems to me that the proper long term solution is both fully
MVCC catalogs and reduced locking for most commands.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: pg_dump --snapshot

From
Andres Freund
Date:
On 2013-05-07 11:01:48 -0400, Stephen Frost wrote:
> * Andres Freund (andres@2ndquadrant.com) wrote:
> > On 2013-05-07 08:54:54 -0400, Stephen Frost wrote:
> > > Agreed- but it also isn't currently possible to make it any smaller.
> > 
> > Uh. Why not? I think this is what needs to be fixed instead of making
> > the hole marginally smaller elsewhere. 
> 
> If we're able to fix it- how would we allow users to take advantage of
> that fix when starting their own snapshot and feeding it to pg_dump?

Depends on the fix. I think the most realistic one is making sure
central definitions haven't changed. In that case they wouldn't have to
do anything.
If we get some DDL lock or something, they would probably need to run
something like SELECT pg_prepare_database_dump(); - likely the same
thing pg_dump itself would use.

> > You can trivially reproduce the
> > problem with pg_dump today:
> 
> Agreed.  The idea was to simply avoid making it worse.  Your argument
> seems to be that it's already horrible and easily broken and therefore
> we can go ahead and make it worse and no one will complain because no
> one has complained about how bad it is already.  I don't follow that.

It doesn't change *anything* about the fundamental problems. We could
also say we forbid ALTER TABLE from taking exclusive locks because that
makes the problem far more noticeable.

> > > My suggestion was to lock everything that pg_dump locks, which we
> > > clearly have locks for since pg_dump is acquiring them.  Also, I don't
> > > believe it'd be that difficult to identify what pg_dump would lock, at
> > > least in a 'default' whole-database run.  This is more of a stop-gap
> > > than a complete solution.
> > 
> > The problem is that locking - as shown above - doesn't really help all
> > that much.
> 
> It helps in that once we have the lock, things aren't changing under us.
> The closer we can keep that to when the transaction starts, the better..

If you look at my example the timing where we take the snapshot isn't
the problem. While we wait for a lock on one object, the not-yet-locked
objects can still change, get dropped et al. That window is so big that
the timing around the snapshot acquisition and trying to get the first
lock is insignificant. Remember this is only a problem *if* there is
concurrent DDL. And in that case we very, very likely will have access
exclusive locks and thus will wait for them and have the described
problem.

> > You would have to do it like:
> > 1) start txn
> > 2) acquire DDL prevention lock
> > 3) assert we do not yet have a snapshot
> > 4) acquire snapshot
> > 5) lock objects
> > 6) release DDL lock
> > 7) dump objects/data
> > 8) commit txn
> 
> We'd need a '4.5' to get the list of objects, no?

Yea, I had folded that into lock objects.

> > pg_dump doesn't prevent you from running CREATE TEMPORARY TABLE? That
> > would make it unrunnable in many situations. Especially as we cannot
> > easily (without using several connections at once) release locks before
> > ending a transaction.
> 
> I agree that we wouldn't want pg_dump preventing all catalog updates,
> but wouldn't we be able to draw a distinction between objects being
> modified and objects being added?

I don't easily see how. Often enough object creation modifies rows at
some point. So we would need intelligence at some higher level. I guess
it might be possible to reuse the event trigger infrastructure if it
could do all that already...

> > >  This will be an interesting thing to consider when
> > > implementing MVCC for the catalog.
> > 
> > I think using a proper MVCC snapshot for catalog scans doesn't, and in
> > some cases even cannot, imply having to use the user transaction's
> > snapshot, just one that guarantees a consistent result while a query is
> > running.
> 
> The hope would be to see a consistent view of what you can access while
> the transaction is running..

I don't think that's entirely possible. Think e.g. of constraints,
determination of HOTability, ... All those need to look at the database
state as it is valid now, not as it was valid back when our snapshot
started.

> > c) Quite possibly the snapshot we need needs to meet some special
> > criterions that pg_dump cannot guarantee.
> 
> That's a much more difficult case, of course.  It would help to have a
> better understanding of what, exactly, Simon's use-case for this feature
> is to answer if this is an issue or not.

I am sure it's an issue for at least one of Simon's use cases, because
one of them is also mine ;). But I think there are many use cases that
require this.

In the logical decoding patches (submitted previously, will get
resubmitted when 9.3 is getting somewhere), we export a snapshot once we
have read enough WAL that all changes from that point henceforth can be
streamed out. That snapshot is obviously very useful to get a new node
up and running.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_dump --snapshot

From
Andres Freund
Date:
On 2013-05-07 16:50:52 +0100, Greg Stark wrote:
> What's the worst case for using an old snapshot? If I try to access a
> table that doesn't exist any longer I'll get an error. That doesn't
> really seem that bad for the use case I described. It's worse for the
> full table dump but for an explicit list of tables, eh. Seems ok to
> me.

It's worse than that, you can get a dump that dumps successfully but
doesn't restore:
http://archives.postgresql.org/message-id/20130507141526.GA6117%40awork2.anarazel.de

But that's not really related to snapshots. And not related to the
patch. Imo the whole focus on the time between snapshot taking and
taking the locks is misguided and not really the problem.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_dump --snapshot

From
bricklen
Date:
On Tue, May 7, 2013 at 10:02 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Rather than take some locks, you can now prevent the database objects
> from changing with an event trigger. pg_dump could install that event
> trigger in a preparing transaction, then do its work as currently, then
> when done either remove or disable the event trigger.
>
> All the event trigger has to do is unconditionally raise an exception
> with a message explaining that no DDL command is accepted while a dump
> is in progress.

I'm thinking of a case where a hot standby is executing a pg_dump and DDL
is issued on the master -- would that cause any unexpected problems on the
hot standby?

Re: pg_dump --snapshot

From
Andres Freund
Date:
On 2013-05-06 13:07:17 -0400, Tom Lane wrote:
> I'm afraid that this is institutionalizing a design deficiency in
> pg_dump; namely that it takes its snapshot before acquiring locks.

I have suggested this before, but if pg_dump would use SELECT FOR SHARE
in the queries it uses to build DDL it would detect most if not all
modifications for most database objects including tables. Sure, it would
error out, but that's far better than a silently corrupt dump:

S1: =# CREATE TABLE testdump();
S2: =# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
S2: =# SELECT count(*) FROM pg_class; --acquire snapshot
S1: =# ALTER TABLE testdump ADD COLUMN a text;
S2: =#
-# SELECT * FROM pg_class cls
-#     JOIN pg_attribute att ON (cls.oid = att.attrelid)
-#     WHERE cls.oid = 'testdump'::regclass FOR UPDATE
ERROR: could not serialize access due to concurrent update

The serialization failure could be caught and translated into some error
message explaining that concurrent ddl prevented pg_dump from working
correctly. I don't immediately see a case where that would prevent valid
backups from working.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_dump --snapshot

From
Stephen Frost
Date:
* Andres Freund (andres@2ndquadrant.com) wrote:
> > It helps in that once we have the lock, things aren't changing under us.
> > The closer we can keep that to when the transaction starts, the better..
>
> If you look at my example the timing where we take the snapshot isn't
> the problem. While we wait for a lock on one object, the not-yet-locked
> objects can still change, get dropped et al. That window is so big that
> the timing around the snapshot acquisition and trying to get the first
> lock is insignificant.

I wasn't talking about just getting the first lock but rather all the
locks..  I agree that we can get stuck behind something else which is
already holding a lock and that's certainly a problem.

> Remember this is only a problem *if* there is
> concurrent DDL. And in that case we very, very likely will have access
> exclusive locks and thus will wait for them and have the described
> problem.

Yes, I understand that issue.

> I don't think that's entirely possible. Think e.g. of constraints,
> determination of HOTability, ... All those need to look at the database
> state as it is valid now, not as it was valid back when our snapshot
> started.

There will certainly still need to be parts of the system which are
using SnapshotNow, yes.  Catalog MVCC is a rather massive discussion
which isn't going to be solved here.

> In the logical decoding patches (submitted previously, will get
> resubmitted when 9.3 is getting somewhere), we export a snapshot once we
> have read enough WAL that all changes from that point henceforth can be
> streamed out. That snapshot is obviously very useful to get a new node
> up and running.

At that point, you take a snapshot and want to export it for pg_dump to
use to get the same view of the DB and then dump/restore it into a new
database where you'll then start applying changes from the logical
replication..?  Interesting, but certainly also quite a specialized
use-case, no?  How do you plan to handle the locking concerns which have
been raised during this discussion?  Do you take locks out before
creating the snapshot to pass to pg_dump?
Thanks,
    Stephen