Thread: Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot
On 2018-Oct-14, Andres Freund wrote:

> On 2018-10-14 13:26:24 +0000, Michael Paquier wrote:
> > Avoid duplicate XIDs at recovery when building initial snapshot
>
> I'm unhappy this approach was taken over objections. Without a real
> warning. Even leaving the crummyness aside, did you check other users
> of XLOG_RUNNING_XACTS, e.g. logical decoding?

Mumble. Is there a real problem here -- I mean, did this break logical
decoding? Maybe we need some more tests to ensure this stuff works
sanely for logical decoding.

FWIW and not directly related, I recently became aware (because of a
customer question) that txid_current_snapshot() is oblivious of
XLOG_RUNNING_XACTS in a standby. AFAICS it's not a really serious
concern, but it was surprising.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

On 2018-10-22 12:36:25 -0300, Alvaro Herrera wrote:
> On 2018-Oct-14, Andres Freund wrote:
> > On 2018-10-14 13:26:24 +0000, Michael Paquier wrote:
> > > Avoid duplicate XIDs at recovery when building initial snapshot
> >
> > I'm unhappy this approach was taken over objections. Without a real
> > warning. Even leaving the crummyness aside, did you check other users
> > of XLOG_RUNNING_XACTS, e.g. logical decoding?
>
> Mumble. Is there a real problem here -- I mean, did this break logical
> decoding? Maybe we need some more tests to ensure this stuff works
> sanely for logical decoding.

Hm? My point is that this fix just puts a band-aid onto *one* of the
places that read XLOG_RUNNING_XACTS, which still leaves the contents of
the WAL record corrupted. There's not even a note at the WAL record's
definition or its logging denoting that the contents are not what you'd
expect. I don't mean that the fix would break logical decoding, but
that it's possible that an equivalent of the problem affecting hot
standby also affects logical decoding. And even leaving those two users
aside, it's possible that there will be further vulnerable internal
users or extensions parsing the WAL.

> FWIW and not directly related, I recently became aware (because of a
> customer question) that txid_current_snapshot() is oblivious of
> XLOG_RUNNING_XACTS in a standby. AFAICS it's not a really serious
> concern, but it was surprising.

That's more fundamental than just XLOG_RUNNING_XACTS though, no? The
whole way running transactions (i.e. including those that are just
detected by looking at their xid) are handled in the known-xids struct
and in snapshots seems incompatible with that, no?

Greetings,

Andres Freund
On 2018-Oct-22, Andres Freund wrote:

> Hm? My point is that this fix just puts a band-aid onto *one* of the
> places that read XLOG_RUNNING_XACTS, which still leaves the contents
> of the WAL record corrupted. There's not even a note at the WAL
> record's definition or its logging denoting that the contents are not
> what you'd expect. I don't mean that the fix would break logical
> decoding, but that it's possible that an equivalent of the problem
> affecting hot standby also affects logical decoding. And even leaving
> those two users aside, it's possible that there will be further
> vulnerable internal users or extensions parsing the WAL.

Ah! I misinterpreted what you were saying. I agree we shouldn't let
the WAL message have wrong data. Of course we shouldn't just add a
code comment stating that the data is wrong :-)

> > FWIW and not directly related, I recently became aware (because of a
> > customer question) that txid_current_snapshot() is oblivious of
> > XLOG_RUNNING_XACTS in a standby. AFAICS it's not a really serious
> > concern, but it was surprising.
>
> That's more fundamental than just XLOG_RUNNING_XACTS though, no? The
> whole way running transactions (i.e. including those that are just
> detected by looking at their xid) are handled in the known-xids struct
> and in snapshots seems incompatible with that, no?

Hmm ... as I recall, txid_current_snapshot() also only considers xacts
by xid, so read-only xacts are not considered -- seems to me that if
you think of snapshots in a general way, you're right, but for whatever
you want txid_current_snapshot() for (and I don't quite remember what
that is) then it's not really important.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot
From: Michael Paquier
On Mon, Oct 22, 2018 at 07:15:38PM -0300, Alvaro Herrera wrote:
> Ah! I misinterpreted what you were saying. I agree we shouldn't let
> the WAL message have wrong data. Of course we shouldn't just add a
> code comment stating that the data is wrong :-)

Well, following the same kind of thoughts, txid_current_snapshot() uses
sort_snapshot() to remove all the duplicates after fetching its data
from GetSnapshotData(), so wouldn't we want to do something about
removing duplicates if dummy PGXACT entries are found while scanning
the ProcArray in this case as well? What I think we should do is patch
not only GetRunningTransactionData() but also GetSnapshotData(), so
that we don't have duplicates in this case either, and do things in
such a way that both code paths use the same logic and we no longer
need sort_snapshot(). That would be more costly though...
--
Michael
Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot
From: Michael Paquier
On Tue, Oct 23, 2018 at 10:43:38AM +0900, Michael Paquier wrote:
> Well, following the same kind of thoughts, txid_current_snapshot() uses
> sort_snapshot() to remove all the duplicates after fetching its data
> from GetSnapshotData(), so wouldn't we want to do something about
> removal of duplicates if dummy PGXACT entries are found while scanning
> the ProcArray also in this case?

My apologies, it took a bit longer than I thought. I have had a patch
on my desk for a couple of days, and finally took the time to finish
something which addresses the concerns raised here. As long as we don't
reach more than hundreds of thousands of entries, there is not going to
be any performance impact. So what I do in the attached is to revert
1df21ddb, and then have GetRunningTransactionData() sort the XIDs in
the snapshot and remove duplicates, for both xids and subxids, only if
at least one dummy proc entry is found while scanning. This way, there
is no need to impact most deployments with the extra sort/removal
phase, as most don't use two-phase transactions. The sorting done at
recovery when initializing the standby snapshot still needs to happen,
of course.

The patch is added to the upcoming CF for review.

Thanks,
--
Michael
On Thu, Nov 1, 2018 at 7:09 AM Michael Paquier <michael@paquier.xyz> wrote:
> So what I do in the attached is to revert 1df21ddb, and then have
> GetRunningTransactionData() sort the XIDs in the snapshot and remove
> duplicates only if at least one dummy proc entry is found while
> scanning, for xids and subxids.
>
> The patch is added to the upcoming CF for review.

Unfortunately, the patch has some conflicts with current master. Could
you please post a rebased version?
On Thu, 1 Nov 2018 at 06:09, Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Oct 23, 2018 at 10:43:38AM +0900, Michael Paquier wrote:
> > Well, following the same kind of thoughts, txid_current_snapshot() uses
> > sort_snapshot() to remove all the duplicates after fetching its data
> > from GetSnapshotData(), so wouldn't we want to do something about
> > removal of duplicates if dummy PGXACT entries are found while scanning
> > the ProcArray also in this case? What I would think we should do is not
> > only to patch GetRunningTransactionData() but also GetSnapshotData() so
> > as we don't have duplicates also in this case, and do things in such a
> > way that both code paths use the same logic, and that we don't need to
> > have sort_snapshot() anymore. That would be more costly though...
>
> My apologies it took a bit longer than I thought. I got a patch on my
> desk for a couple of days, and finally took the time to finish something
> which would address the concerns raised here. As long as we don't reach
> more than hundreds of thousands of entries, there is not going to be any
> performance impact. So what I do in the attached is to revert 1df21ddb,
> and then have GetRunningTransactionData() sort the XIDs in the snapshot
> and remove duplicates only if at least one dummy proc entry is found
> while scanning, for xids and subxids. This way, there is no need to
> impact most of the instance deployments with the extra sort/removal
> phase as most don't use two-phase transactions. The sorting done at
> recovery when initializing the standby snapshot still needs to happen of
> course.
>
> The patch is added to the upcoming CF for review.
1df21ddb looks OK to me and was simple enough to backpatch safely.
Seems excessive to say that the WAL record is corrupt; it just contains duplicates, just as exported snapshots do. There are a few other imprecise things around in WAL; that is why we need the RunningXact data in the first place. So we have a choice of whether to remove the duplicates eagerly or lazily.
For GetRunningTransactionData(), we can do eager or lazy, since it's not a foreground process. I don't object to changing it to be eager in this path, but this patch is more complex than 1df21ddb and I don't think we should backpatch this change, assuming it is acceptable.
This patch doesn't do it, but the suggestion that we touch GetSnapshotData() in the same way so we de-duplicate eagerly is a different matter and would need careful performance testing to ensure we don't slow down 2PC users.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot
From: Michael Paquier
On Fri, Nov 30, 2018 at 02:55:47PM +0000, Simon Riggs wrote:
> 1df21ddb looks OK to me and was simple enough to backpatch safely.

Thanks for the feedback!

> Seems excessive to say that the WAL record is corrupt, it just contains
> duplicates, just as exported snapshots do. There's a few other imprecise
> things around in WAL, that is why we need the RunningXact data in the
> first place. So we have a choice of whether to remove the duplicates
> eagerly or lazily.
>
> For GetRunningTransactionData(), we can do eager or lazy, since it's
> not a foreground process. I don't object to changing it to be eager in
> this path, but this patch is more complex than 1df21ddb and I don't
> think we should backpatch this change, assuming it is acceptable.

Yes, I would avoid a backpatch for this more complicated one, and we
need a solution for already-generated WAL. It is not complicated to
handle duplicates for xacts and subxacts; however, holding ProcArrayLock
for a longer time stresses me, as it is already a bottleneck.

> This patch doesn't do it, but the suggestion that we touch
> GetSnapshotData() in the same way so we de-duplicate eagerly is a
> different matter and would need careful performance testing to ensure
> we don't slow down 2PC users.

Definitely. That's a much hotter code path. I am not completely sure
that's an effort worth pursuing either...
--
Michael
On Fri, 30 Nov 2018 at 23:08, Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Nov 30, 2018 at 02:55:47PM +0000, Simon Riggs wrote:
> > 1df21ddb looks OK to me and was simple enough to backpatch safely.
>
> Thanks for the feedback!
>
> Yes, I would avoid a backpatch for this more complicated one, and we
> need a solution for already-generated WAL.

Yes, that is an important reason not to backpatch.

> It is not complicated to handle duplicates for xacts and subxacts;
> however, holding ProcArrayLock for a longer time stresses me, as it is
> already a bottleneck.

I hadn't realised this patch holds ProcArrayLock while removing
duplicates. Now that I know, I vote against applying this patch unless
someone can show that the performance effects of doing so are
negligible, which I doubt.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot
From: Michael Paquier
On Sat, Dec 01, 2018 at 10:51:10AM +0000, Simon Riggs wrote:
> On Fri, 30 Nov 2018 at 23:08, Michael Paquier <michael@paquier.xyz> wrote:
> > It is not complicated to handle duplicates for xacts and subxacts;
> > however, holding ProcArrayLock for a longer time stresses me, as it
> > is already a bottleneck.
>
> I hadn't realised this patch holds ProcArrayLock while removing
> duplicates. Now that I know, I vote against applying this patch unless
> someone can show that the performance effects of doing so are
> negligible, which I doubt.

Me too, after more thought on this. Please note that I have marked the
patch as returned with feedback for now.
--
Michael