Thread: Retrieve the snapshot's LSN
Hello everyone, I would need to start a read repeatable transaction and retrieve the corresponding LSN. I'm looking for pointers or Ideas on how to achieve this. Andres F. suggested me to extend pg_export_snapshot() [1] and call GetLatestSnapshot() [2] while reliably retrieving the current LSN. Should I call GetXLogWriteRecPtr() [3] for that ? What lock(s) could I take to synchronize the two calls? Any other Idea ? A snapshot is exported when creating a logical replication slot [4] and the corresponding LSN is also returned [5]. This is what I need except that I'd rather prefer to not create a replication slot each time I need the snapshot. During slot creation, the snapshot building and exporting code seems highly coupled with the logical decoding stuff. It doesn't seems much reusable to retrieve the snapshot's LSN outside of logical decoding. Thank you for your help, References: [1] pg_export_snapshot() https://github.com/postgres/postgres/blob/aa9eac45ea868e6ddabc4eb076d18be10ce84c6a/src/backend/utils/time/snapmgr.c#L1111 [2] GetLatestSnapshot() https://github.com/postgres/postgres/blob/aa9eac45ea868e6ddabc4eb076d18be10ce84c6a/src/backend/utils/time/snapmgr.c#L259 [3] GetXLogWriteRecPtr() https://github.com/postgres/postgres/blob/7b156c1e0746a46d083d7dbcd28afb303b3484ef/src/backend/access/transam/xlog.c#L10616 [4] Exported snapshot in logical replication slot creation https://github.com/postgres/postgres/blob/aa9eac45ea868e6ddabc4eb076d18be10ce84c6a/src/backend/replication/walsender.c#L815 /* build initial snapshot, might take a while */ DecodingContextFindStartpoint(ctx); /* * Export a plain (not of the snapbuild.c type) snapshot to the user * that can be imported into another session. */ snapshot_name = SnapBuildExportSnapshot(ctx->snapshot_builder); [5] Consistent point LSN in logical replication slot creation: https://github.com/postgres/postgres/blob/aa9eac45ea868e6ddabc4eb076d18be10ce84c6a/src/backend/replication/walsender.c#L831 snprintf(xpos, sizeof(xpos), "%X/%X", (uint32) (MyReplicationSlot->data.confirmed_flush >> 32), (uint32) MyReplicationSlot->data.confirmed_flush); ...cut... /* second field: LSN at which we became consistent */ pq_sendstring(&buf, "consistent_point"); /* col name */ ...cut.... /* consistent wal location */ pq_sendint(&buf, strlen(xpos), 4); /* col2 len */ pq_sendbytes(&buf, xpos, strlen(xpos)); -- Florent
On Wed, Jul 15, 2015 at 12:51 PM, Florent Guiliani <florent@guiliani.fr> wrote: > During slot creation, the snapshot building and exporting code seems > highly coupled with the logical decoding stuff. It doesn't seems much > reusable to retrieve the snapshot's LSN outside of logical decoding. I don't think "the snapshot's LSN" has a well-defined meaning in general. The obvious meaning would be "the LSN such that all commits prior to that LSN are visible and all later commits are invisible", but such an LSN need not exist. Suppose A writes a commit record at LSN 0/10000, and then B writes a commit record at 0/10100, and then B calls ProcArrayEndTransaction(). At this point, B is visible and A is not visible, even though A's commit record precedes that of B. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2015-07-16 12:40:07 -0400, Robert Haas wrote: > On Wed, Jul 15, 2015 at 12:51 PM, Florent Guiliani <florent@guiliani.fr> wrote: > > During slot creation, the snapshot building and exporting code seems > > highly coupled with the logical decoding stuff. It doesn't seems much > > reusable to retrieve the snapshot's LSN outside of logical decoding. > > I don't think "the snapshot's LSN" has a well-defined meaning in > general. The obvious meaning would be "the LSN such that all commits > prior to that LSN are visible and all later commits are invisible", > but such an LSN need not exist. Suppose A writes a commit record at > LSN 0/10000, and then B writes a commit record at 0/10100, and then B > calls ProcArrayEndTransaction(). At this point, B is visible and A is > not visible, even though A's commit record precedes that of B. Well, in combination with logical decoding it kinda has one: It should allow you to take a dump of the database with a certain snapshot and replay all transactions with a commit lsn bigger than the "snapshot's lsn" and end up with a continually consistent database. Visibility for HS actually works precisely in commit LSN order, even if that is possibly different than on the primary...
On Thu, Jul 16, 2015 at 12:54 PM, Andres Freund <andres@anarazel.de> wrote: > Well, in combination with logical decoding it kinda has one: It should > allow you to take a dump of the database with a certain snapshot and > replay all transactions with a commit lsn bigger than the "snapshot's > lsn" and end up with a continually consistent database. > > Visibility for HS actually works precisely in commit LSN order, even if > that is possibly different than on the primary... That makes sense, and hopefully answers Florent's question about why this is only exposed through the slot mechanism. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2015-07-16 13:08:48 -0400, Robert Haas wrote: > On Thu, Jul 16, 2015 at 12:54 PM, Andres Freund <andres@anarazel.de> wrote: > > Well, in combination with logical decoding it kinda has one: It should > > allow you to take a dump of the database with a certain snapshot and > > replay all transactions with a commit lsn bigger than the "snapshot's > > lsn" and end up with a continually consistent database. > > > > Visibility for HS actually works precisely in commit LSN order, even if > > that is possibly different than on the primary... > > That makes sense, and hopefully answers Florent's question about why > this is only exposed through the slot mechanism. Trying to swap-in the pub conversion, I think Florent wanted to be able to re-sync a standby from an existing slot. Which kinda makes sense to me. We could do something like SELECT * FROM pg_export_snapshot_for_slot(...); which would return the snapshot name and the LSN. There'd need to be some finicky locking to get that, but it should b epossible. Andres
On Thu, Jul 16, 2015 at 6:40 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I don't think "the snapshot's LSN" has a well-defined meaning in > general. The obvious meaning would be "the LSN such that all commits > prior to that LSN are visible and all later commits are invisible", I like this definition. > but such an LSN need not exist. Suppose A writes a commit record at > LSN 0/10000, and then B writes a commit record at 0/10100, and then B > calls ProcArrayEndTransaction(). At this point, B is visible and A is > not visible, even though A's commit record precedes that of B. Maybe that's what Andres referred as "doable with some finicky locking". There is some race conditions to build a snapshot with an associated consistent LSN. If I understand your example, A is supposed to call ProcArrayEndTransaction() anytime soon. Could we wait/lock until it happens? -- Florent
On Fri, Jul 17, 2015 at 4:16 AM, Florent Guiliani <florent@guiliani.fr> wrote: >> but such an LSN need not exist. Suppose A writes a commit record at >> LSN 0/10000, and then B writes a commit record at 0/10100, and then B >> calls ProcArrayEndTransaction(). At this point, B is visible and A is >> not visible, even though A's commit record precedes that of B. > > Maybe that's what Andres referred as "doable with some finicky locking". > > There is some race conditions to build a snapshot with an associated > consistent LSN. If I understand your example, A is supposed to call > ProcArrayEndTransaction() anytime soon. Right. > Could we wait/lock until it > happens? In theory, yes. I'm not sure what the code would would look like, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jul 16, 2015 at 7:13 PM, Andres Freund <andres@anarazel.de> wrote: > On 2015-07-16 13:08:48 -0400, Robert Haas wrote: >> On Thu, Jul 16, 2015 at 12:54 PM, Andres Freund <andres@anarazel.de> wrote: >> > Well, in combination with logical decoding it kinda has one: It should >> > allow you to take a dump of the database with a certain snapshot and >> > replay all transactions with a commit lsn bigger than the "snapshot's >> > lsn" and end up with a continually consistent database. >> > >> > Visibility for HS actually works precisely in commit LSN order, even if >> > that is possibly different than on the primary... >> >> That makes sense, and hopefully answers Florent's question about why >> this is only exposed through the slot mechanism. > > Trying to swap-in the pub conversion, I think Florent wanted to be able to > re-sync a standby from an existing slot. Which kinda makes sense to > me. We could do something like > SELECT * FROM pg_export_snapshot_for_slot(...); > > which would return the snapshot name and the LSN. > > There'd need to be some finicky locking to get that, but it should b epossible. A pg_export_snapshot_for_slot(...) would work very well. Let me explain the use case. You have many downstream systems that are replicated with logical decoding. Using a dedicated replication slot for each target is not practical. A single logical replication slot is configured. It generates a stream of LSN-stamped transactions in commit order. Those transactions are published to all downstream nodes. The snapshot exported during the slot creation can be used to generate a complete dump that the replicated systems will load before applying the transaction stream. How do you individually reload/recover one of the downstream node? You can use the initial dump and reapply all transactions emitted since the slot's inception. It will quickly become impractical. What you need is to generate a newer dump and only apply the transactions from that point. Problem: How do you synchronize this newer dump with the LSN-stamped stream of transactions? Being able to tell what LSN correspond to the consistent dump would solve it. I've started a quick&dirty solution: https://github.com/flyerman/postgres/commit/a13432d5e596a8b13ff911637afd764f53af2ab3 where I copied CreateReplicationSlot(): https://github.com/flyerman/postgres/blob/a13432d5e596a8b13ff911637afd764f53af2ab3/src/backend/replication/walsender.c#L764 into ExportLogicalDecodingSnapshot() and removed everything that isn't needed for the snapshot creation. I still need to plug it into the replication protocol grammar to test it. It's not very good solution. Among others bad things, it will exercise the output plugin for nothing. -- Florent
On Fri, Jul 17, 2015 at 8:31 AM, Florent Guiliani <florent@guiliani.fr> wrote: > A pg_export_snapshot_for_slot(...) would work very well. > > Let me explain the use case. You have many downstream systems that are > replicated with logical decoding. Using a dedicated replication slot > for each target is not practical. A single logical replication slot is > configured. It generates a stream of LSN-stamped transactions in > commit order. Those transactions are published to all downstream > nodes. > > The snapshot exported during the slot creation can be used to generate > a complete dump that the replicated systems will load before applying > the transaction stream. > > How do you individually reload/recover one of the downstream node? You > can use the initial dump and reapply all transactions emitted since > the slot's inception. It will quickly become impractical. What you > need is to generate a newer dump and only apply the transactions from > that point. I'd like to point out that I told Andres repeatedly during the development of logical decoding that this exact thing was going to be a problem. He told me fixing it was way too complicated, but I hope he'll relent, because I still think this is important. (Not trying to be a jerk here.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jul 17, 2015 at 6:53 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Jul 17, 2015 at 8:31 AM, Florent Guiliani <florent@guiliani.fr> wrote: >> A pg_export_snapshot_for_slot(...) would work very well. >> >> Let me explain the use case. You have many downstream systems that are >> replicated with logical decoding. Using a dedicated replication slot >> for each target is not practical. A single logical replication slot is >> configured. It generates a stream of LSN-stamped transactions in >> commit order. Those transactions are published to all downstream >> nodes. >> >> The snapshot exported during the slot creation can be used to generate >> a complete dump that the replicated systems will load before applying >> the transaction stream. >> >> How do you individually reload/recover one of the downstream node? You >> can use the initial dump and reapply all transactions emitted since >> the slot's inception. It will quickly become impractical. What you >> need is to generate a newer dump and only apply the transactions from >> that point. > > I'd like to point out that I told Andres repeatedly during the > development of logical decoding that this exact thing was going to be > a problem. He told me fixing it was way too complicated, but I hope > he'll relent, because I still think this is important. > > (Not trying to be a jerk here.) As a quick solution, I added a replication command named LOGICAL_DECODING_SNAPSHOT that does everything CREATE_REPLICATION_SLOT does for logical slots except that the slot is automatically dropped ( see attached patch or https://github.com/flyerman/postgres/compare/9.5...flyerman:export-logical-snapshot ). Feel free to review and comment my patch. I'd like to provide a better solution but I'd need some guidance on how to get the locking right between the snapshot creation and the LSN retrieval. -- Florent