Thread: snapshot too old, configured by time
As discussed when the "proof of concept" patch was submitted during
9.5 development, here is a version intended to be considered for
commit to 9.6, with the following changes:

1. It is configured using time rather than number of transactions.
Not only was there unanimous agreement here that this was better,
but the EDB customer who had requested this feature and who had
been testing it independently made the same request.

2. The "proof of concept" patch only supported heap and btree
checking; this supports all index types.

3. Documentation has been added.

4. Tests have been added. They are currently somewhat minimal,
since this uses a whole new testing technique, different from any
existing committed tests -- I wanted to make sure that this
approach to testing was OK with everyone before expanding it. If
it is, I assume we will want to move some of the more generic
portions to a .pm file to make it available for other tests.

Basically, this patch aims to limit bloat when there are snapshots
that are kept registered for prolonged periods. The immediate
reason for this is a customer application that keeps read-only
cursors against fairly static data open for prolonged periods, and
automatically fields SQLSTATE 72000 to re-open them if necessary.
When used, it should also provide some protection against extreme
bloat from forgotten "idle in transaction" connections which are
left holding a snapshot.

Once a snapshot reaches the age threshold, it can be terminated if
it reads data modified after the snapshot was built. It is expected
that useful settings will normally range from a few hours to a few
days.

By default old_snapshot_threshold is set to -1, which disables the
new behavior.

The customer has been testing a preliminary version of this
time-based patch for several weeks, and is happy with the results
-- it is preventing bloat for them and not generating "snapshot too
old" errors at unexpected times.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
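The client-side pattern described above might look something like
this with libpq (a sketch only; the cursor and table names are
hypothetical, not taken from the patch or the customer application):

#include <string.h>
#include <libpq-fe.h>

/*
 * Sketch: fetch from a long-lived cursor, re-opening it on SQLSTATE
 * 72000 ("snapshot too old").  "report_cur" and "report_data" are
 * illustrative names only; how an application repositions the
 * re-opened cursor is application-specific.
 */
static PGresult *
fetch_with_retry(PGconn *conn)
{
    PGresult   *res = PQexec(conn, "FETCH FORWARD 100 FROM report_cur");

    if (PQresultStatus(res) == PGRES_FATAL_ERROR)
    {
        const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);

        if (sqlstate != NULL && strcmp(sqlstate, "72000") == 0)
        {
            PQclear(res);
            /* Start a fresh transaction/snapshot and re-open the cursor. */
            PQclear(PQexec(conn, "ROLLBACK"));
            PQclear(PQexec(conn, "BEGIN ISOLATION LEVEL REPEATABLE READ"));
            PQclear(PQexec(conn,
                           "DECLARE report_cur CURSOR FOR"
                           " SELECT * FROM report_data"));
            res = PQexec(conn, "FETCH FORWARD 100 FROM report_cur");
        }
    }
    return res;
}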
On 08/31/2015 10:07 AM, Kevin Grittner wrote:
> [...]

Kevin,

I've started to do a review on this patch but I am a bit confused
by some of what I am seeing. The attached test case fails if I
replace the cursor in your test case with direct selects from the
table. I would have expected this to generate the snapshot too old
error as well, but it doesn't.

# Failed test 'expect "snapshot too old" error'
# at t/002_snapshot_too_old_select.pl line 64.
#          got: ''
#     expected: '72000'
# Looks like you failed 1 test of 9.
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/9 subtests

Am I misunderstanding something or is the patch not working as
expected?
I'm starting to read through this and have a few questions.

Kevin Grittner wrote:

> 4. Tests have been added. They are currently somewhat minimal,
> since this uses a whole new testing technique, different from any
> existing committed tests -- I wanted to make sure that this
> approach to testing was OK with everyone before expanding it. If
> it is, I assume we will want to move some of the more generic
> portions to a .pm file to make it available for other tests.

I think your test module is a bit unsure about whether it's called
tso or sto.

I would place it inside src/test/modules, so that buildfarm runs it
automatically; I'm not sure it will pick up things in src/test.
(This is the whole point of src/test/modules, after all.) I haven't
written or reviewed our TestLib stuff so I can't comment on whether
the test code is good or not. How long is the test supposed to last?

It bothers me a bit to add #include rel.h to snapmgr.h because of
the new macro definition. It seems out of place. I would instead
move the macro closer to the bufmgr headers, or perhaps to rel.h
itself. Also, the definition isn't accompanied by an explanatory
comment (other than why it is a macro instead of a function).

So if I understand correctly, every call to BufferGetPage needs to
have a TestForOldSnapshot immediately afterwards? It seems easy to
later introduce places that fail to test for old snapshots. What
happens if they do? Does vacuum remove tuples anyway and then the
query return wrong results? That seems pretty dangerous. Maybe the
snapshot could be an argument of BufferGetPage?

Please have the comments be clearer on what "align to minute
boundaries" means. Is it floor or ceil?

Also, in the OldSnapshotControlData big comment you talk about
"length" and then the variable is called "used". Maybe that should
be more consistent, for clarity.

How large is the struct kept in shmem? I don't think the size is
configurable, is it? I would have guessed that the size would
depend on the GUC param ...

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
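For readers following along, the check under discussion has roughly
this shape (a sketch pieced together from the posted patch; the
guards, the snapshot fields lsn and whenTaken, and the error code
are the patch's, but details may differ in later revisions):

/*
 * Sketch: if this page has been modified since the (now old)
 * snapshot was taken, reading it could silently miss rows, so raise
 * "snapshot too old" instead.  ERRCODE_SNAPSHOT_TOO_OLD is an error
 * code introduced by the patch; the Relation argument is used for
 * additional checks in the real code.
 */
static inline void
TestForOldSnapshot(Snapshot snapshot, Relation relation, Page page)
{
    if (old_snapshot_threshold >= 0 &&
        snapshot != NULL &&
        !XLogRecPtrIsInvalid(snapshot->lsn) &&
        PageGetLSN(page) > snapshot->lsn &&
        snapshot->whenTaken < GetOldSnapshotThresholdTimestamp())
        ereport(ERROR,
                (errcode(ERRCODE_SNAPSHOT_TOO_OLD),
                 errmsg("snapshot too old")));
}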
Thanks to both Steve and Álvaro for review and comments. I won't be
able to address all of those within the time frame of the current
CF, so I have moved it to the next CF and flagged it as "Waiting on
Author". I will post a patch to address all comments before that CF
starts. I will discuss them now, though.

Steve Singer <steve@ssinger.info> wrote:

> The attached test case fails if I replace the cursor in your test
> case with direct selects from the table.
> I would have expected this to generate the snapshot too old error
> as well, but it doesn't.

Good idea for an additional test; it does indeed show a case where
the patch could do better. I've reviewed the code and see how I can
adjust the calculations to fix this.

Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Kevin Grittner wrote:
>> 4. Tests have been added. They are currently somewhat minimal,
>> since this uses a whole new testing technique, different from any
>> existing committed tests -- I wanted to make sure that this
>> approach to testing was OK with everyone before expanding it.
>> If it is, I assume we will want to move some of the more generic
>> portions to a .pm file to make it available for other tests.

> I think your test module is a bit unsure about whether it's
> called tso or sto.

Oops. Yeah, will fix. They should all be sto, for "snapshot too
old". There is heavy enough use of timestamp fields named ts in the
code that my fingers got confused.

> I would place it inside src/test/modules, so that buildfarm
> runs it automatically; I'm not sure it will pick up things in
> src/test.

As it stands now, the test is getting run as part of `make
check-world`, and it seems like src/test/modules is about testing
separate executables, so I don't think it makes sense to move the
tests -- but I could be convinced that I'm missing something.

> I haven't written or reviewed our TestLib stuff so I can't
> comment on whether the test code is good or not. How long is the
> test supposed to last?

Well, I can't see how to get a good test of some of the code with a
setting of 0 (snapshot can become "too old" immediately). That
setting may keep parts of the test fast, but to exercise much of
the important code you need to configure to '1min' and have the
time pass the 0-second mark of a minute at various points in the
test; so it will be hard to avoid having this test take a few
minutes.

> It bothers me a bit to add #include rel.h to snapmgr.h because of
> the new macro definition. It seems out of place.

Yeah, I couldn't find anywhere to put the macro that I entirely
liked.

> I would instead move the macro closer to bufmgr headers or rel.h
> itself perhaps.

I'll try that, or something similar.

> Also, the definition isn't accompanied by an explanatory comment
> (other than why is it a macro instead of a function).

Will fix.

> So if I understand correctly, every call to BufferGetPage needs
> to have a TestForOldSnapshot immediately afterwards?

No, every call that is part of a scan. Access for internal purposes
(such as adding an index entry or vacuuming) does not need this
treatment.

> It seems easy to later introduce places that fail to test for old
> snapshots. What happens if they do?

That could cause incorrect results for those using the "snapshot
too old" feature, because a query might fail to recognize that one
or more rows it should see are missing.

> Does vacuum remove tuples anyway and then the query returns
> wrong results?

Right. That.

> That seems pretty dangerous. Maybe the snapshot could be an
> argument of BufferGetPage?

It would need that, the Relation (or to go find it from the
Buffer), and an indication of whether this access is due to a scan.

> Please have the comments be clearer on what "align to minute
> boundaries" means. Is it floor or ceil?

OK.

> Also, in the OldSnapshotControlData big comment you talk about
> "length" and then the variable is called "used". Maybe that
> should be more consistent, for clarity.

Will work on that comment.

> How large is the struct kept in shmem?

To allow a setting of up to 60 days, 338kB.

> I don't think the size is configurable, is it?

Not in the posted patch.

> I would have guessed that the size would depend on the GUC param ...

I had been trying to make this GUC PGC_SIGHUP, but found a lot of
devils hiding in the details of that. When I gave up on that and
went back to PGC_POSTMASTER I neglected to change this allocation
back. I agree that it should be changed, and that will make the
size depend on the configuration. If the feature is not used, no
significant RAM will be allocated. If, for example, someone wants
to configure '5h', they would need only about 1.2kB for this
structure.

Again, thanks to both of you for the review!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
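Those figures are consistent with a back-of-the-envelope check,
assuming the structure keeps roughly one 4-byte transaction ID per
minute of the configured threshold:

    60 days = 86,400 minutes; 86,400 x 4 bytes = 345,600 bytes ~ 338kB
    5 hours =    300 minutes;    300 x 4 bytes =   1,200 bytes ~ 1.2kB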
Kevin Grittner wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> I would place it inside src/test/modules, so that buildfarm
>> runs it automatically; I'm not sure it will pick up things in
>> src/test.
>
> As it stands now, the test is getting run as part of `make
> check-world`, and it seems like src/test/modules is about testing
> separate executables, so I don't think it makes sense to move the
> tests -- but I could be convinced that I'm missing something.

It's not conceived just as a way to test separate executables; maybe
it is at the moment (though I don't think it is?) but if so that's
just an accident. The intention is to have modules that get tested
without them being installed, which wasn't the case when they were
in contrib.

The problem with check-world is that buildfarm doesn't run it. We
don't want to set up separate buildfarm modules for each subdir in
src/test; that would be pretty tedious.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tuesday, September 15, 2015 12:07 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Kevin Grittner wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>> I would place it inside src/test/modules, so that buildfarm
>>> runs it automatically; I'm not sure it will pick up things in
>>> src/test.
>>
>> As it stands now, the test is getting run as part of `make
>> check-world`, and it seems like src/test/modules is about testing
>> separate executables, so I don't think it makes sense to move the
>> tests -- but I could be convinced that I'm missing something.
>
> It's not conceived just as a way to test separate executables; maybe it
> is at the moment (though I don't think it is?) but if so that's just an
> accident. The intention is to have modules that get tested without them
> being installed, which wasn't the case when they were in contrib.
>
> The problem with check-world is that buildfarm doesn't run it. We don't
> want to set up separate buildfarm modules for each subdir in src/test;
> that would be pretty tedious.

OK, moved.

All other issues raised by Álvaro and Steve have been addressed,
except for this one, which I will argue against:

> So if I understand correctly, every call to BufferGetPage needs to have
> a TestForOldSnapshot immediately afterwards? It seems easy to later
> introduce places that fail to test for old snapshots. What happens if
> they do? Does vacuum remove tuples anyway and then the query return
> wrong results? That seems pretty dangerous. Maybe the snapshot could
> be an argument of BufferGetPage?

There are 486 occurrences of BufferGetPage in the source code, and
this patch follows 36 of them with TestForOldSnapshot. This only
needs to be done when a page is going to be used for a scan which
produces user-visible results. That is, it is *not* needed for
positioning within indexes to add or vacuum away entries, or for
heap inserts or deletions (assuming the row to be deleted has
already been found). It seems wrong to modify about 450
BufferGetPage references to add a NULL parameter; and if we do want
to make that no-op change as insurance, it seems like it should be a
separate patch, since the substance of this patch would be buried
under the volume of that.

I will add this to the November CF.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 10/15/2015 05:47 PM, Kevin Grittner wrote:
> All other issues raised by Álvaro and Steve have been addressed,
> except for this one, which I will argue against:

I've been looking through the updated patch.

In snapmgr.c:

+ * XXX: If we can trust a read of an int64 value to be atomic, we can skip the
+ * spinlock here.
+ */
+int64
+GetOldSnapshotThresholdTimestamp(void)

Was your intent with the XXX for it to be a TODO, to only acquire
the lock on platforms without atomic 64-bit operations?

On a more general note:

I've tried various manual tests of this feature and it sometimes
works as expected and sometimes doesn't. I'm getting the feeling
that how I expect it to work isn't quite in sync with how it does
work.

I'd expect the following to be sufficient to trigger the error:

T1: Obtains a snapshot that can see some rows
T2: Waits 60 seconds and performs an update on those rows
T2: Performs a vacuum
T1: Waits 60 seconds, tries to select from the table. The snapshot
    should be too old.

For example, it seems that in test 002 the select_ok on conn1
following the vacuum but right before the final sleep is critical
to the "snapshot too old" error showing up (i.e. if I remove that
select_ok but leave in the sleep, I don't get the error).

Is this intended, and if so, is there a better way we can explain
how things work?

Also, is 0 intended to be an acceptable value for
old_snapshot_threshold, and if so, what should we expect to see
then?

>> So if I understand correctly, every call to BufferGetPage needs to have
>> a TestForOldSnapshot immediately afterwards? It seems easy to later
>> introduce places that fail to test for old snapshots. What happens if
>> they do? Does vacuum remove tuples anyway and then the query return
>> wrong results? That seems pretty dangerous. Maybe the snapshot could
>> be an argument of BufferGetPage?
>
> There are 486 occurrences of BufferGetPage in the source code, and
> this patch follows 36 of them with TestForOldSnapshot. This only
> needs to be done when a page is going to be used for a scan which
> produces user-visible results. That is, it is *not* needed for
> positioning within indexes to add or vacuum away entries, or for
> heap inserts or deletions (assuming the row to be deleted has
> already been found). It seems wrong to modify about 450
> BufferGetPage references to add a NULL parameter; and if we do want
> to make that no-op change as insurance, it seems like it should be a
> separate patch, since the substance of this patch would be buried
> under the volume of that.
>
> I will add this to the November CF.
On Mon, Nov 9, 2015 at 8:07 AM, Steve Singer <steve@ssinger.info> wrote:
> On 10/15/2015 05:47 PM, Kevin Grittner wrote:
>> All other issues raised by Álvaro and Steve have been addressed,
>> except for this one, which I will argue against:
>
> I've been looking through the updated patch.
>
> In snapmgr.c:
>
> + * XXX: If we can trust a read of an int64 value to be atomic, we can skip the
> + * spinlock here.
> + */
> +int64
> +GetOldSnapshotThresholdTimestamp(void)
>
> Was your intent with the XXX for it to be a TODO, to only acquire the lock
> on platforms without atomic 64-bit operations?
>
> On a more general note:
>
> I've tried various manual tests of this feature and it sometimes works as
> expected and sometimes doesn't.
> I'm getting the feeling that how I expect it to work isn't quite in sync
> with how it does work.
>
> I'd expect the following to be sufficient to trigger the error:
>
> T1: Obtains a snapshot that can see some rows
> T2: Waits 60 seconds and performs an update on those rows
> T2: Performs a vacuum
> T1: Waits 60 seconds, tries to select from the table. The snapshot
>     should be too old.
>
> For example, it seems that in test 002 the select_ok on conn1 following
> the vacuum but right before the final sleep is critical to the "snapshot
> too old" error showing up (i.e. if I remove that select_ok but leave in
> the sleep, I don't get the error).
>
> Is this intended, and if so, is there a better way we can explain how
> things work?
>
> Also, is 0 intended to be an acceptable value for old_snapshot_threshold,
> and if so, what should we expect to see then?

There has been a review but no replies for more than 1 month.
Returned with feedback?

--
Michael
On Wed, Dec 2, 2015 at 12:39 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Nov 9, 2015 at 8:07 AM, Steve Singer <steve@ssinger.info> wrote:
>> In snapmgr.c:
>>
>> + * XXX: If we can trust a read of an int64 value to be atomic, we can skip the
>> + * spinlock here.
>> + */
>> +int64
>> +GetOldSnapshotThresholdTimestamp(void)
>>
>> Was your intent with the XXX for it to be a TODO, to only acquire the lock
>> on platforms without atomic 64-bit operations?

I'm not sure whether we can safely assume a read of an int64 to be
atomic on any platform; if we actually can on some platforms, and
we have a #define to tell us whether we are in such an environment,
we could condition the spinlock calls on that. Are we there yet?

>> On a more general note:
>>
>> I've tried various manual tests of this feature and it sometimes works as
>> expected and sometimes doesn't.
>> I'm getting the feeling that how I expect it to work isn't quite in sync
>> with how it does work.
>>
>> I'd expect the following to be sufficient to trigger the error:
>>
>> T1: Obtains a snapshot that can see some rows
>> T2: Waits 60 seconds and performs an update on those rows
>> T2: Performs a vacuum
>> T1: Waits 60 seconds, tries to select from the table. The snapshot
>>     should be too old.
>>
>> For example, it seems that in test 002 the select_ok on conn1 following
>> the vacuum but right before the final sleep is critical to the "snapshot
>> too old" error showing up (i.e. if I remove that select_ok but leave in
>> the sleep, I don't get the error).
>>
>> Is this intended, and if so, is there a better way we can explain how
>> things work?

At every phase I took a conservative approach toward deferring
pruning of tuples still visible to any snapshot -- often reducing
the overhead of tracking by letting things go to the next minute
boundary. The idea is that an extra minute or two probably isn't
going to be a big deal in terms of bloat, so if we can save any
effort on the bookkeeping by letting things go a little longer, it
is a worthwhile trade-off. That does make it hard to give a precise
statement of exactly when a transaction *will* be subject to
cancellation based on this feature, so I have emphasized the
minimum guaranteed time that a transaction will be *safe*.

In reviewing what you describe, I think I still don't have it as
aggressive as I can (and probably should) make it. My biggest
concern is that a long-running transaction which gets a transaction
ID matching the xmin on a snapshot it will hold for a long time may
not be subject to cancellation. That's probably not too hard to
fix, but the bigger problem is the testing. People have said that
issuing SQL commands directly from a TAP test via DBD::Pg is not
acceptable for a core feature, and (despite assertions to the
contrary) I see no way to test this feature with existing testing
mechanisms. The bigger set of work here, if we don't want this
feature to go in without any testing scripts (which is not
acceptable IMO), is to enhance the isolation tester or hybridize
TAP testing with the isolation tester.

>> Also, is 0 intended to be an acceptable value for old_snapshot_threshold,
>> and if so, what should we expect to see then?

The docs in the patch say this:

+    <para>
+     A value of <literal>-1</> disables this feature, and is the default.
+     Useful values for production work probably range from a small number
+     of hours to a few days.  The setting will be coerced to a granularity
+     of minutes, and small numbers (such as <literal>0</> or
+     <literal>1min</>) are only allowed because they may sometimes be
+     useful for testing.  While a setting as high as <literal>60d</> is
+     allowed, please note that in many workloads extreme bloat or
+     transaction ID wraparound may occur in much shorter time frames.
+    </para>

Once we can agree on a testing methodology I expect that I will be
adding a number of tests based on a cluster started with
old_snapshot_threshold = 0, but as I said in my initial post of the
patch I was keeping the tests in the patch thin until it was
confirmed whether this testing methodology was acceptable. Since it
isn't, that was just as well. The time put into learning enough
about perl and TAP tests to create those tests already exceeds the
time to develop the actual patch, and it looks like even more will
be needed for a test methodology that doesn't require adding a
package or downloading a CPAN module. C'est la vie. I did expand my
perl and TAP knowledge considerably, for what that's worth.

> There has been a review but no replies for more than 1 month. Returned
> with feedback?

I do intend to post another version of the patch to tweak the
calculations again, after I can get a patch in to expand the
testing capabilities to allow an acceptable way to test the patch
-- so I put it into the next CF instead.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
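To make the minute-granularity coercion concrete, the rounding in
the patch looks roughly like this (a sketch; the function name
matches the posted patch, but treat the details as illustrative --
note that it rounds up, erring toward keeping tuples slightly
longer):

static TimestampTz
AlignTimestampToMinuteBoundary(TimestampTz ts)
{
    /* Round up to the next minute boundary ("ceil", not "floor"). */
    TimestampTz retval = ts + (USECS_PER_MINUTE - 1);

    return retval - (retval % USECS_PER_MINUTE);
}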
On Thu, Dec 3, 2015 at 5:48 AM, Kevin Grittner wrote:
>> There has been a review but no replies for more than 1 month. Returned
>> with feedback?
>
> I do intend to post another version of the patch to tweak the
> calculations again, after I can get a patch in to expand the
> testing capabilities to allow an acceptable way to test the patch
> -- so I put it into the next CF instead.

OK, thanks.

--
Michael
On 2015-12-02 14:48:24 -0600, Kevin Grittner wrote:
> On Wed, Dec 2, 2015 at 12:39 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Mon, Nov 9, 2015 at 8:07 AM, Steve Singer <steve@ssinger.info> wrote:
>>> In snapmgr.c:
>>>
>>> + * XXX: If we can trust a read of an int64 value to be atomic, we can skip the
>>> + * spinlock here.
>>> + */
>>> +int64
>>> +GetOldSnapshotThresholdTimestamp(void)
>>>
>>> Was your intent with the XXX for it to be a TODO, to only acquire the lock
>>> on platforms without atomic 64-bit operations?
>
> I'm not sure whether we can safely assume a read of an int64 to be
> atomic on any platform; if we actually can on some platforms, and
> we have a #define to tell us whether we are in such an environment,
> we could condition the spinlock calls on that. Are we there yet?

We currently don't assume it's atomic. And there are platforms, e.g.
32-bit ARM, where that's not the case (cf.
https://wiki.postgresql.org/wiki/Atomics). It'd be rather useful to
abstract that knowledge into a macro...

Andres
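A sketch of what that abstraction could look like as applied here
(the structure and field names follow the posted patch; PostgreSQL's
port/atomics headers do define PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
on platforms where an aligned 8-byte load cannot be torn, though
whether that is the right macro to key on is an open question):

int64
GetOldSnapshotThresholdTimestamp(void)
{
    int64       threshold_timestamp;

#ifdef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
    /* An aligned 8-byte read cannot be torn here; skip the spinlock. */
    threshold_timestamp = oldSnapshotControl->threshold_timestamp;
#else
    SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
    threshold_timestamp = oldSnapshotControl->threshold_timestamp;
    SpinLockRelease(&oldSnapshotControl->mutex_threshold);
#endif

    return threshold_timestamp;
}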
Kevin Grittner wrote:

> People have said that issuing SQL commands directly from a TAP test
> via DBD::Pg is not acceptable for a core feature, and (despite
> assertions to the contrary) I see no way to test this feature with
> existing testing mechanisms. The bigger set of work here, if we
> don't want this feature to go in without any testing scripts (which
> is not acceptable IMO), is to enhance the isolation tester or
> hybridize TAP testing with the isolation tester.

Is it possible to use the PostgresNode stuff to test this? If not,
perhaps if you restate what additional capabilities you need we
could look into adding them there. I suspect that what you need is
the ability to keep more than one session open and feed them
commands; perhaps we could have the framework provide a function
that opens a psql process and returns an FD to which the test
program can write, using the IPC::Run stuff (start / pump / finish).

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Kevin Grittner wrote:

>> There has been a review but no replies for more than 1 month. Returned
>> with feedback?
>
> I do intend to post another version of the patch to tweak the
> calculations again, after I can get a patch in to expand the
> testing capabilities to allow an acceptable way to test the patch
> -- so I put it into the next CF instead.

Two months have passed since this, and no activity. I'm closing this
as returned-with-feedback now; you're of course free to resubmit to
2016-03.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jan 8, 2016 at 5:22 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>> People have said that issuing SQL commands directly from a TAP test
>> via DBD::Pg is not acceptable for a core feature, and (despite
>> assertions to the contrary) I see no way to test this feature with
>> existing testing mechanisms. The bigger set of work here, if we
>> don't want this feature to go in without any testing scripts (which
>> is not acceptable IMO), is to enhance the isolation tester or
>> hybridize TAP testing with the isolation tester.
>
> Is it possible to use the PostgresNode stuff to test this? If not,
> perhaps if you restate what additional capabilities you need we
> could look into adding them there. I suspect that what you need is
> the ability to keep more than one session open and feed them
> commands; perhaps we could have the framework provide a function
> that opens a psql process and returns an FD to which the test
> program can write, using the IPC::Run stuff (start / pump / finish).

Resubmitting for the March CF. The main thing that changed is that I
can now run all the regression and isolation tests using
installcheck with old_snapshot_threshold = 0 and get a clean run.
That probably gets better overall coverage than specific tests to
demonstrate the "snapshot too old" error, but of course we need
those, too.

While I can do that with hand-run psql sessions or through
connectors from different languages, I have not been able to wrangle
the testing tools we support through the build system into working
for this purpose. (I had been hoping that the recent improvements to
the TAP testing libraries would give me the traction to get there,
but either it's still not there or my perl-fu is just too weak to
figure out how to use those features -- suggestions welcome.)
Basically, a connection needs to remain open and interleave commands
with other connections, which the isolation tester does just fine;
but it needs to do that using a custom postgresql.conf file, which
TAP does just fine. I haven't been able to see the right way to get
a TAP test to set up a customized installation to run isolation
tests against. If I can get that working, I have additional tests I
can drop into that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-02-29 18:30:27 -0600, Kevin Grittner wrote:
> Basically, a connection needs to remain open and interleave
> commands with other connections, which the isolation tester does
> just fine; but it needs to do that using a custom postgresql.conf
> file, which TAP does just fine. I haven't been able to see the
> right way to get a TAP test to set up a customized installation to
> run isolation tests against. If I can get that working, I have
> additional tests I can drop into that.

Check contrib/test_decoding's makefile. It does just that with
isolationtester.

Andres
On Tue, Mar 1, 2016 at 9:35 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-02-29 18:30:27 -0600, Kevin Grittner wrote:
>> Basically, a connection needs to remain open and interleave
>> commands with other connections, which the isolation tester does
>> just fine; but it needs to do that using a custom postgresql.conf
>> file, which TAP does just fine. I haven't been able to see the
>> right way to get a TAP test to set up a customized installation to
>> run isolation tests against. If I can get that working, I have
>> additional tests I can drop into that.

Launching psql from PostgresNode does not hold the connection open;
we would need to reinvent or integrate the logic of existing drivers
to hold the connection context properly, but that's utterly
complicated for not that much gain.

> Check contrib/test_decoding's makefile. It does just that with
> isolationtester.

pg_isolation_regress --temp-config is the key item here; you can
force a test to run on a server with a wanted configuration set.

--
Michael
On Tue, Mar 1, 2016 at 12:58 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Mar 1, 2016 at 9:35 AM, Andres Freund <andres@anarazel.de> wrote:
>> On 2016-02-29 18:30:27 -0600, Kevin Grittner wrote:
>>> Basically, a connection needs to remain open and interleave
>>> commands with other connections, which the isolation tester does
>>> just fine; but it needs to do that using a custom postgresql.conf
>>> file, which TAP does just fine. I haven't been able to see the
>>> right way to get a TAP test to set up a customized installation to
>>> run isolation tests against. If I can get that working, I have
>>> additional tests I can drop into that.
>
>> Check contrib/test_decoding's makefile. It does just that with
>> isolationtester.
>
> pg_isolation_regress --temp-config is the key item here; you can
> force a test to run on a server with a wanted configuration set.

Thanks for the tips. Attached is a minimal set of isolation tests.
I can expand on it if needed, but wanted:

(1) to confirm that this is the right way to do this, and

(2) to learn how long people are willing to tolerate these tests
running.

Since we're making this time-based (by popular demand), there must
be delays to see the new behavior. This very minimal pair of tests
runs in just under one minute on my i7. Decent coverage of all the
index AMs would probably require tests which run for at least 10
minutes, and probably double that. I don't recall any satisfactory
resolution to prior discussions about long-running tests.

This is a follow-on patch, just to add isolation testing; the prior
patch must be applied, too.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Mar 3, 2016 at 2:40 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> Thanks for the tips. Attached is a minimal set of isolation tests.
> I can expand on it if needed, but wanted:
>
> (1) to confirm that this is the right way to do this, and
>
> (2) to learn how long people are willing to tolerate these tests
> running.
>
> Since we're making this time-based (by popular demand), there must
> be delays to see the new behavior. This very minimal pair of tests
> runs in just under one minute on my i7. Decent coverage of all the
> index AMs would probably require tests which run for at least 10
> minutes, and probably double that. I don't recall any satisfactory
> resolution to prior discussions about long-running tests.
>
> This is a follow-on patch, just to add isolation testing; the prior
> patch must be applied, too.

Michael, any chance that you could take a look at what Kevin did
here and see if it looks good? I'm sure the base patch could use
more review too, if anyone can find the time.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Mar 11, 2016 at 2:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Mar 3, 2016 at 2:40 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
>> Thanks for the tips. Attached is a minimal set of isolation tests.
>> [...]
>
> Michael, any chance that you could take a look at what Kevin did
> here and see if it looks good?

OK, I am marking this email. Just don't expect any updates from my
side until mid/end of next week.

> I'm sure the base patch could use more review too, if anyone can
> find the time.

I guess I am going to need to look at the patch if feedback for the
tests is needed. There is no point in looking at the tests without
poking at the patch.

--
Michael
New patch just to merge in recent commits -- it was starting to
show some bit-rot. Tests folded in with main patch.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Mar 17, 2016 at 2:15 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> New patch just to merge in recent commits -- it was starting to
> show some bit-rot. Tests folded in with main patch.

I'm not sure if this is operating as expected.

I set the value to 1min.

I set up a test like this:

pgbench -i

pgbench -c4 -j4 -T 3600 &

### watch the size of branches table
while (true) ; do psql -c "\dt+" | fgrep _branches; sleep 10; done &

### set up a long-lived snapshot.
psql -c 'begin; set transaction isolation level repeatable read;
select sum(bbalance) from pgbench_branches; select pg_sleep(300);
select sum(bbalance) from pgbench_branches;'

As this runs, I can see the size of pgbench_branches bloating once
the snapshot is taken, and it continues bloating at a linear rate
for the full 300 seconds.

Once the 300-second pg_sleep is up, the long-lived snapshot holder
receives an error once it tries to access the table again, and then
the bloat stops increasing. But shouldn't the bloat have stopped
increasing as soon as the snapshot became doomed, which would be
after a minute or so?

Cheers,

Jeff
On Thu, Mar 17, 2016 at 2:15 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> New patch just to merge in recent commits -- it was starting to
> show some bit-rot. Tests folded in with main patch.

I haven't read the patch, but I wonder: What are the implications
here for B-Tree page recycling by VACUUM? I know that you understand
this topic well, so I don't assume that you didn't address it.
Offhand, I imagine that there'd be some special considerations. Why
is it okay that an index scan could land on a deleted page with no
interlock against VACUUM's page recycling? Or, what prevents that
from happening in the first place?

I worry that something weird could happen there. For example,
perhaps the page LSN on what is actually a newly recycled page could
be set such that the backend following a stale right spuriously
raises a "snapshot too old" error.

I suggest you consider making amcheck [1] a part of your testing
strategy. I think that this patch is a good idea, and I'd be happy
to take feedback from you on how to make amcheck more effective for
testing this patch in particular.

[1] https://commitfest.postgresql.org/9/561/

--
Peter Geoghegan
On Sun, Mar 20, 2016 at 4:25 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I worry that something weird could happen there. For example,
> perhaps the page LSN on what is actually a newly recycled page could
> be set such that the backend following a stale right spuriously
> raises a "snapshot too old" error.

I mean a stale right-link, of course.

--
Peter Geoghegan
On 17 March 2016 at 21:15, Kevin Grittner <kgrittn@gmail.com> wrote:
> New patch just to merge in recent commits -- it was starting to
> show some bit-rot. Tests folded in with main patch.

In session 1, I've run:

# begin transaction isolation level repeatable read;
BEGIN
*# declare stuff scroll cursor for select * from test where num between 5 and 9;
DECLARE CURSOR
*# fetch forward 5 from stuff;
 id  | num |               thing
-----+-----+-----------------------------------
   2 |   8 | hellofji djf odsjfiojdsif ojdsiof
   3 |   7 | hellofji djf odsjfiojdsif ojdsiof
 112 |   9 | hellofji djf odsjfiojdsif ojdsiof
 115 |   6 | hellofji djf odsjfiojdsif ojdsiof
 119 |   8 | hellofji djf odsjfiojdsif ojdsiof
(5 rows)

In session 2, over a minute later:

# update test set num = 12 where num between 5 and 9 and id between 120 and 250;
ERROR:  snapshot too old

Then back to session 1:

*# fetch forward 5 from stuff;
ERROR:  snapshot too old

Should session 2 be getting that error?

Thom
Thanks to all for the feedback; I will try to respond later this
week. First I'm trying to get my reviews for other patches posted.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Mar 22, 2016 at 5:05 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
> Thanks to all for the feedback; I will try to respond later this
> week. First I'm trying to get my reviews for other patches posted.

I have been looking at 4a, the test module, and things are looking
good IMO. Something that I think would be suitable is to define the
options for isolation tests in a variable, like ISOLATION_OPTS, to
allow the MSVC scripts to fetch those option values more easily.

+submake-test_decoding:
+	$(MAKE) -C $(top_builddir)/src/test/modules/snapshot_too_old

The target name here is incorrect. This should refer to
snapshot_too_old.

Regarding the main patch:

+      <primary><varname>old_snapshot_threshold</> configuration parameter</primary>

snapshot_valid_limit?

     page = BufferGetPage(buf);
+    TestForOldSnapshot(scan->xs_snapshot, rel, page);

This is a sequence repeated many times in this patch; a new routine,
say BufferGetPageExtended with a uint8 flag, one flag being used to
test old snapshots, would be better suited. But this would require
creating a header dependency between the buffer manager and
SnapshotData.. Or, more simply, we may want a common code path when
fetching a page that a backend is going to use to fetch tuples. I am
afraid of TestForOldSnapshot() being something that could be easily
forgotten in code paths introduced by future patches...

+    if (whenTaken < 0)
+    {
+        elog(DEBUG1,
+             "MaintainOldSnapshotTimeMapping called with negative whenTaken = %ld",
+             (long) whenTaken);
+        return;
+    }
+    if (!TransactionIdIsNormal(xmin))
+    {
+        elog(DEBUG1,
+             "MaintainOldSnapshotTimeMapping called with xmin = %lu",
+             (unsigned long) xmin);
+        return;
+    }

Material for two assertions?

--
Michael
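A minimal sketch of the two assertions Michael suggests, assuming
these really are can't-happen conditions rather than states to
recover from:

    Assert(whenTaken >= 0);
    Assert(TransactionIdIsNormal(xmin));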
Hi Kevin,

On 3/21/16 4:05 PM, Kevin Grittner wrote:
> Thanks to all for the feedback; I will try to respond later this
> week. First I'm trying to get my reviews for other patches posted.

We're getting to the end of the CF now. Do you know when you'll have
an updated patch ready?

Thanks,

--
-David
david@pgmasters.net
On Tue, Mar 29, 2016 at 8:58 AM, David Steele <david@pgmasters.net> wrote:
> We're getting to the end of the CF now. Do you know when you'll have
> an updated patch ready?

I am working on it right now. Hopefully I can get it all sorted
today.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Michael Paquier wrote:

>      page = BufferGetPage(buf);
> +    TestForOldSnapshot(scan->xs_snapshot, rel, page);
>
> This is a sequence repeated many times in this patch; a new routine,
> say BufferGetPageExtended with a uint8 flag, one flag being used to
> test old snapshots, would be better suited. But this would require
> creating a header dependency between the buffer manager and
> SnapshotData.. Or, more simply, we may want a common code path when
> fetching a page that a backend is going to use to fetch tuples. I am
> afraid of TestForOldSnapshot() being something that could be easily
> forgotten in code paths introduced by future patches...

I said exactly the same thing, and Kevin dismissed it. I would be
worried about your specific proposal though, because it's easy to
just call BufferGetPage() (i.e. the not-extended version) and forget
the old-snapshot protection completely.

I think a safer proposition would be to replace all current
BufferGetPage() calls (there are about 500) by adding the necessary
arguments: buffer, snapshot, rel, and an integer "flags". All this
without adding the feature. Then a subsequent commit would add the
TestForOldSnapshot inside BufferGetPage, *except* when a
BUFFER_NO_SNAPSHOT_TEST flag is passed. That way, new code always
gets the snapshot test by default.

I don't like the new header dependency either, though.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I think a safer proposition would be to replace all current
> BufferGetPage() calls (there are about 500) by adding the necessary
> arguments: buffer, snapshot, rel, and an integer "flags". All this
> without adding the feature. Then a subsequent commit would add the
> TestForOldSnapshot inside BufferGetPage, *except* when a
> BUFFER_NO_SNAPSHOT_TEST flag is passed. That way, new code always
> gets the snapshot test by default.

That seems awfully invasive, not to mention performance-killing if
the expectation is that most such calls are going to need a snapshot
check. (Quite aside from the calls themselves, are they all in
routines that are being passed the right snapshot today?)

TBH, I think that shoving in something like this at the end of the
last commitfest would be a bad idea even if there were widespread
consensus that we wanted the feature ... which I am not sure there
is. I think it might be time to bounce this one to 9.7.

			regards, tom lane
On Wed, Mar 30, 2016 at 11:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> I think a safer proposition would be to replace all current
>> BufferGetPage() calls (there are about 500) by adding the necessary
>> arguments: buffer, snapshot, rel, and an integer "flags". All this
>> without adding the feature. Then a subsequent commit would add the
>> TestForOldSnapshot inside BufferGetPage, *except* when a
>> BUFFER_NO_SNAPSHOT_TEST flag is passed. That way, new code always
>> gets the snapshot test by default.
>
> That seems awfully invasive,

That's the argument I made, which Álvaro described as "dismissing"
his suggestion. In this post from October of 2015, I pointed out
that there are 36 calls where we need a snapshot and 450 where we
don't:

http://www.postgresql.org/message-id/56479263.1140984.1444945639606.JavaMail.yahoo@mail.yahoo.com

> not to mention performance-killing if the expectation is that
> most such calls are going to need a snapshot check.

This patch is one which has allowed a customer whose performance
requirements we could not previously meet to pass them. It is the
opposite of performance-killing.

> (Quite aside from the calls themselves, are they all in
> routines that are being passed the right snapshot today?)

I went over that very carefully when the patch was first proposed in
January of 2015, and have kept an eye on things to try to avoid
bit-rot which might introduce new calls which need to be touched.
The customer for which this was initially developed uses a 30-day
test run with very complex production releases driven by a simulated
user load with large numbers of users. EDB has back-patched it to
9.4, where an earlier version of it is being used in production by
this (large) customer.

> TBH, I think that shoving in something like this at the end of the
> last commitfest would be a bad idea even if there were widespread
> consensus that we wanted the feature ... which I am not sure there
> is.

I don't recall anyone opposing the feature itself except you, and it
has had enthusiastic support from many. Before I was made aware of a
relevant isolation tester feature, there were many objections to my
efforts at regression testing, and Álvaro has argued for touching
450 locations in the code that otherwise don't need it, just as
"reminders" to people to consider whether newly added calls might
need a snapshot, and he doesn't like the new header dependencies.
Simon seemed to want it in 9.5, but that was clearly premature, IMO.

Characterizing this as being shoved in at the last moment seems odd,
since the big hang-up from the November CF was the testing
methodology in the patch. It has been feature-complete since the
September CF, and modified based on feedback. Granted, some
additional testing in this CF brought up a couple of things that
merit a look, but this patch is hardly unique in that regard.

> I think it might be time to bounce this one to 9.7.

If there is a consensus for that, sure, or if I can't sort out the
latest issues by feature freeze (which is admittedly looming).

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Mar 20, 2016 at 6:25 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I haven't read the patch, but I wonder: What are the implications
> here for B-Tree page recycling by VACUUM?

> Offhand, I imagine that there'd be some special considerations. Why
> is it okay that an index scan could land on a deleted page with no
> interlock against VACUUM's page recycling? Or, what prevents that
> from happening in the first place?

When the initial "proof of concept" patch was tested by the
customer, it was not effective due to issues related to what you
raise. Autovacuum workers were blocking due to the page pins for
scans using these old snapshots, causing the bloat to accumulate in
spite of this particular patch. This was addressed, at least to a
degree sufficient for this customer, with this patch:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ed5b87f96d473962ec5230fd820abfeaccb2069

Basically, for most common cases the "super-exclusive" locking has
been eliminated from btree logic; and I have been happy to see that
Simon has been working on dealing with the corner cases where I
hadn't rooted it out.

> I worry that something weird could happen there. For example,
> perhaps the page LSN on what is actually a newly recycled page could
> be set such that the backend following a stale right spuriously
> raises a "snapshot too old" error.

That particular detail doesn't seem to be a realistic concern,
though -- if a page has been deleted and assigned to a new place in
the index with an LSN corresponding to that action, it would be a
pretty big bug if a right-pointer still referenced it.

> I suggest you consider making amcheck [1] a part of your testing
> strategy. I think that this patch is a good idea, and I'd be happy
> to take feedback from you on how to make amcheck more effective for
> testing this patch in particular.

I'm not sure how that would fit in; could you elaborate?

The biggest contradiction making testing hard is that everyone (and
I mean everyone!) preferred to see this configured by time rather
than number of transactions, so there is no change in behavior
without some sort of wait for elapsed time. But nobody wants to
drive the time needed for running regression tests too high. Testing
was far easier when a transaction count was used for configuration.
old_snapshot_threshold = -1 (the default) completely disables the
new behavior, and I basically abused a configuration setting of 0 to
mean a few seconds so I could get some basic testing added to make
check-world while keeping the additional time for the tests (barely)
below one minute.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Mar 30, 2016 at 11:53 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
> When the initial "proof of concept" patch was tested by the
> customer, it was not effective due to issues related to what you
> raise. Autovacuum workers were blocking due to the page pins for
> scans using these old snapshots, causing the bloat to accumulate in
> spite of this particular patch. This was addressed, at least to a
> degree sufficient for this customer, with this patch:
>
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ed5b87f96d473962ec5230fd820abfeaccb2069
>
> Basically, for most common cases the "super-exclusive" locking has
> been eliminated from btree logic; and I have been happy to see that
> Simon has been working on dealing with the corner cases where I
> hadn't rooted it out.
>
>> I worry that something weird could happen there. For example,
>> perhaps the page LSN on what is actually a newly recycled page could
>> be set such that the backend following a stale right spuriously
>> raises a "snapshot too old" error.
>
> That particular detail doesn't seem to be a realistic concern,
> though -- if a page has been deleted and assigned to a new place in
> the index with an LSN corresponding to that action, it would be a
> pretty big bug if a right-pointer still referenced it.

Yes, that would be a big bug. But I wasn't talking about
"super-exclusive" locking. Rather, I was talking about the general
way in which index scans are guaranteed to not land on an
already-recycled page (not a half-dead page, and not a fully deleted
page -- a fully reclaimed/recycled page). This works without buffer
pins or buffer locks needing to be held at all -- there is a global
interlock against page *recycling* based on RecentGlobalXmin, per
the nbtree README.

So, this vague concern of mine is about VACUUM's B-Tree page
recycling. During an index scan, we expect to be able to land on the
next page, and to be at least able to reason about it being deleted,
even though we don't hold a pin on anything for a period. We
certainly are shy about explaining all this, but if you look at a
routine like _bt_search() carefully (the routine that is used to
descend a B-Tree), it doesn't actually hold a pin concurrently, as
we drop a level (and certainly not a buffer lock, either). The next
page/block should be substantively the same page as expected from
the downlink (or right-link) we followed, entirely because of the
RecentGlobalXmin interlock. Backwards scans also rely on this.

This is just a vague concern, and possibly this is completely
irrelevant. I haven't read the patch.

>> I suggest you consider making amcheck [1] a part of your testing
>> strategy. I think that this patch is a good idea, and I'd be happy
>> to take feedback from you on how to make amcheck more effective for
>> testing this patch in particular.
>
> I'm not sure how that would fit in; could you elaborate?

Well, amcheck is a tool that in essence makes sure that B-Trees look
structurally sound, and respect invariants like having every item on
each page in logical order. That could catch a bug of the kind I
just described, because it's quite likely that the recycled page
would happen to have items that didn't comport with the ordering on
the page. The block has been reused essentially at random.

Importantly, amcheck can catch this without anything more than an
AccessShareLock, so you have some hope of catching this kind of race
condition (the stale downlink that you followed to get to the
spuriously-recycled-early page doesn't stay stale for long). Or,
maybe it would happen to catch some other random problem. Difficult
to say.

Again, this is based on a speculation that might be wrong. But it's
worth considering.

--
Peter Geoghegan
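For context, the recycling interlock Peter describes is the one
VACUUM applies before putting a deleted B-Tree page in the free
space map; paraphrased as a sketch of nbtree's existing check (this
is not code from the patch under review):

static bool
page_recyclable_sketch(Page page)
{
    BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);

    /*
     * A deleted page may only be recycled once no scan still in
     * flight could land on it: the XID stamped into the page at
     * deletion must precede the oldest xmin any running transaction
     * could be interested in (RecentGlobalXmin).
     */
    return P_ISDELETED(opaque) &&
           TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin);
}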
Kevin Grittner wrote:
> On Wed, Mar 30, 2016 at 11:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> I think a safer proposition would be to replace all current
>>> BufferGetPage() calls (there are about 500) by adding the necessary
>>> arguments: buffer, snapshot, rel, and an integer "flags". All this
>>> without adding the feature. Then a subsequent commit would add the
>>> TestForOldSnapshot inside BufferGetPage, *except* when a
>>> BUFFER_NO_SNAPSHOT_TEST flag is passed. That way, new code always
>>> gets the snapshot test by default.
>>
>> That seems awfully invasive,
>
> That's the argument I made, which Álvaro described as "dismissing"
> his suggestion. In this post from October of 2015, I pointed out
> that there are 36 calls where we need a snapshot and 450 where we
> don't:
>
> http://www.postgresql.org/message-id/56479263.1140984.1444945639606.JavaMail.yahoo@mail.yahoo.com

I understand the invasiveness argument, but to me the danger of
introducing new bugs trumps that. The problem is not the current
code, but future patches: it is just too easy to make the mistake of
not checking the snapshot in new additions of BufferGetPage. And you
said that the result of missing a check is silent wrong results from
queries that should instead be cancelled, which seems fairly bad to
me. My impression was that you were actually considering doing
something about that -- sorry for the lack of clarity.

We have made similarly invasive changes in the past -- the
SearchSysCache API, for instance.

>> not to mention performance-killing if the expectation is that
>> most such calls are going to need a snapshot check.
>
> This patch is one which has allowed a customer whose performance
> requirements we could not previously meet to pass them. It is the
> opposite of performance-killing.

I think Tom misunderstood what I said and you misunderstood what Tom
said. Let me attempt to set things straight.

I said that we should change BufferGetPage into having the snapshot
check built in, except in the cases where a flag is passed; and the
flag would be passed in all cases except those 30-something you
identified. In other words, the behavior in all the current call
sites would be identical to what's there today; we could have a
macro do the first check so that we don't introduce the overhead of
a function call in the 450 cases where it's not needed.

Tom said that my proposal would be performance-killing, not that
your patch would be performance-killing. But as I argue above, with
my proposal performance would stay the same, so we're actually okay.

I don't think anybody disputes that your patch is good in general. I
would be happy with it in 9.6, but I have my reservations about the
aforementioned problem.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
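To make the proposal concrete, it might take roughly this shape (a
sketch of Álvaro's suggestion only, using his BUFFER_NO_SNAPSHOT_TEST
flag name; not a committed API):

#define BUFFER_NO_SNAPSHOT_TEST 0x01

static inline Page
BufferGetPage(Buffer buffer, Snapshot snapshot, Relation relation,
              int flags)
{
    Page        page = (Page) BufferGetBlock(buffer);

    /* New code gets the old-snapshot check unless it opts out. */
    if (!(flags & BUFFER_NO_SNAPSHOT_TEST))
        TestForOldSnapshot(snapshot, relation, page);

    return page;
}

The design point is that the ~450 call sites which do not need the
check would pass BUFFER_NO_SNAPSHOT_TEST explicitly, so the unsafe
behavior is what requires the extra keystrokes.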
On Wed, Mar 30, 2016 at 12:21 PM, Peter Geoghegan <pg@heroku.com> wrote: > Well, amcheck is a tool that in essence makes sure that B-Trees look > structurally sound, and respect invariants like having every item on > each page in logical order. That could catch a bug of the kind I just > described, because it's quite likely that the recycled page would > happen to have items that didn't comport with the ordering on the > page. I mean: didn't comport with the ordering of the last page "on the same level" (but, due to this issue, maybe not actually on the same level). We check if the first item on the "right page" (in actuality, due to this bug, the new page image following a spurious early recycle) is greater than or equal to the last item on the previous page (the page whose right-link we followed). On each level, everything should be in order -- that's an invariant that (I posit) we can safely check with only an AccessShareLock. Making the relation lock only an AccessShareLock is not just about reducing the impact on production systems. It's also about making the tool more effective at finding these kinds of transient races, with subtle user-visible symptoms. Again, I don't want to prejudice anyone against your patch, which I haven't read. -- Peter Geoghegan
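A sketch of that cross-page check, again in illustrative C rather than amcheck's actual code (the function name and comparator are invented for the example):

    /*
     * Cross-page invariant: the first item on the page reached via a
     * right-link must not sort before the last item on the page whose
     * right-link was followed.  cmp() is assumed to be a qsort()-style
     * comparator over index keys.
     */
    static bool
    right_page_respects_ordering(const void *left_last,
                                 const void *right_first,
                                 int (*cmp) (const void *, const void *))
    {
        return cmp(right_first, left_last) >= 0;
    }

A spuriously recycled "right page" is a new page image whose first item almost certainly fails this comparison, which is why the check has a realistic chance of catching the race.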
On Sat, Mar 19, 2016 at 1:27 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > I'm not sure if this is operating as expected. > > I set the value to 1min. > > I set up a test like this: > > pgbench -i > > pgbench -c4 -j4 -T 3600 & > > ### watch the size of branches table > while (true) ; do psql -c "\dt+" | fgrep _branches; sleep 10; done & > > ### set up a long lived snapshot. > psql -c 'begin; set transaction isolation level repeatable read; > select sum(bbalance) from pgbench_branches; select pg_sleep(300); > select sum(bbalance) from pgbench_branches;' > > As this runs, I can see the size of the pgbench_branches bloating once > the snapshot is taken, and continues bloating at a linear rate for the > full 300 seconds. > > Once the 300 second pg_sleep is up, the long-lived snapshot holder > receives an error once it tries to access the table again, and then > the bloat stops increasing. But shouldn't the bloat have stopped > increasing as soon as the snapshot became doomed, which would be after > a minute or so? This is actually operating as intended, not a bug. Try running a manual VACUUM command about two minutes after the snapshot is taken and you should get a handle on what's going on. The old tuples become eligible for vacuuming after one minute, but that doesn't necessarily mean that autovacuum jumps in and that the space starts getting reused. The manual vacuum will allow that, as you should see on your monitoring window. A connection should not get the error just because it is using a snapshot that tries to look at data that might be wrong, and the connection holding the long-lived snapshot doesn't do that until it awakes from the sleep and runs the next SELECT command. All is well as far as I can see here. Thanks for checking, though! It is an interesting test! -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Mar 30, 2016 at 2:29 PM, Peter Geoghegan <pg@heroku.com> wrote: > [Does the patch allow dangling page pointers?] > Again, I don't want to prejudice anyone against your patch, which I > haven't read. I don't believe that the way the patch does its business opens any new vulnerabilities of this type. If you see such after looking at the patch, let me know. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Mar 30, 2016 at 2:34 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > A connection should not get the > error just because it is using a snapshot that tries to look at > data that might be wrong, and the connection holding the long-lived > snapshot doesn't do that until it awakes from the sleep and runs > the next SELECT command. Well, that came out wrong. A connection should not get the "snapshot too old" error just because it *holds* an old snapshot; it must actually attempt to read an affected page *using* an old snapshot, and the connection holding the long-lived snapshot doesn't do that until it awakes from the sleep and runs the next SELECT command. Sorry for any confusion from the sloppy editing. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
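For reference, the shape of the test being described is roughly the following. This is a simplified sketch with a hypothetical function name, not the committed code; the patch's actual TestForOldSnapshot() also verifies that the snapshot is an MVCC snapshot with a valid LSN and compares its timestamp against old_snapshot_threshold before erroring:

    /*
     * Simplified sketch: raise "snapshot too old" only when an old
     * snapshot actually reads a page whose LSN shows it was modified
     * after the snapshot was taken.  Merely holding the snapshot is
     * never enough to trigger the error.
     */
    static inline void
    SnapshotTooOldCheckSketch(Snapshot snapshot, Relation relation, Page page)
    {
        if (old_snapshot_threshold >= 0 &&      /* feature enabled at all? */
            snapshot != NULL &&
            PageGetLSN(page) > snapshot->lsn)   /* page changed after snapshot */
            ereport(ERROR,
                    (errcode(ERRCODE_SNAPSHOT_TOO_OLD),
                     errmsg("snapshot too old")));
    }

This is why the error in Jeff's test only fires when the sleeping session wakes up and runs its next SELECT: that is the first moment the old snapshot touches a page that vacuum has since modified.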
On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I understand the invasiveness argument, but to me the danger of > introducing new bugs trumps that. The problem is not the current code, > but future patches: it is just too easy to make the mistake of not > checking the snapshot in new additions of BufferGetPage. And you said > that the result of missing a check is silent wrong results from queries > that should instead be cancelled, which seems fairly bad to me. Fair point. > I said that we should change BufferGetPage into having the snapshot > check built-in, except in the cases where a flag is passed; and the flag > would be passed in all cases except those 30-something you identified. > In other words, the behavior in all the current callsites would be > identical to what's there today; we could have a macro do the first > check so that we don't introduce the overhead of a function call in the > 450 cases where it's not needed. In many of the places that BufferGetPage is called there is not a snapshot available. I assume that you would be OK with an Assert that the flag was passed if the snapshot is NULL? I had been picturing what you were requesting as just adding a snapshot parameter and assuming that NULL meant not to check; adding two parameters where the flag explicitly calls out that the check is not needed might do more to prevent accidents, but I do wonder how much it would help during a copy/paste frenzy. Touching all spots to use the new function signature would be a mechanical job with the compiler catching any errors, so it doesn't seem crazy to refactor that now, but I would like to hear what some others think about this. > Tom said that my proposal would be performance-killing, not that your > patch would be performance-killing. But as I argue above, with my > proposal performance would stay the same, so we're actually okay. > > I don't think anybody disputes that your patch is good in general. > I would be happy with it in 9.6, but I have my reservations about the > aforementioned problem. We have a lot of places in our code where people need to know things that they are not reminded of by the surrounding code, but I'm not about to argue that's a good thing; if the consensus is that this would help prevent future bugs when new BufferGetPage calls are added, I can go with the flow. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner wrote: > On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: > > I said that we should change BufferGetPage into having the snapshot > > check built-in, except in the cases where a flag is passed; and the flag > > would be passed in all cases except those 30-something you identified. > > In other words, the behavior in all the current callsites would be > > identical to what's there today; we could have a macro do the first > > check so that we don't introduce the overhead of a function call in the > > 450 cases where it's not needed. > > In many of the places that BufferGetPage is called there is not a > snapshot available. I assume that you would be OK with an Assert > that the flag was passed if the snapshot is NULL? Sure, that's fine. BTW I said "a macro" but I was forgetting that we have static inline functions in headers now, which means you can avoid the horrors of actually writing a macro. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
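Putting the pieces of this subthread together, the static-inline version under discussion would look something like the following. This is a sketch using the flag name that appears later in the thread (BGP_NO_SNAPSHOT_TEST); the enum typedef name and layout here are assumptions, not necessarily the committed form:

    typedef enum
    {
        BGP_NO_SNAPSHOT_TEST,       /* the ~450 call sites with no test */
        BGP_TEST_FOR_OLD_SNAPSHOT   /* the ~36 scan-related call sites */
    } BufferGetPageAgeTest;

    static inline Page
    BufferGetPage(Buffer buffer, Snapshot snapshot, Relation relation,
                  BufferGetPageAgeTest agetest)
    {
        Page        page = (Page) BufferGetBlock(buffer);

        /* Kevin's proposed Assert: a NULL snapshot implies no test. */
        Assert(agetest == BGP_NO_SNAPSHOT_TEST || snapshot != NULL);

        if (agetest == BGP_TEST_FOR_OLD_SNAPSHOT)
            TestForOldSnapshot(snapshot, relation, page);

        return page;
    }

Because the function is a static inline in a header, the BGP_NO_SNAPSHOT_TEST branch compiles down to exactly the old one-argument macro, which is what Álvaro means by keeping the 450 unchecked call sites free of any function-call overhead.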
On Thu, Mar 24, 2016 at 2:24 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > I have been looking at 4a, the test module, and things are looking > good IMO. Something that I think would be appropriate would be to > define the options for isolation tests in a variable, like > ISOLATION_OPTS, to allow MSVC scripts to fetch those option values > more easily. Maybe, but that seems like material for a separate patch.

> +submake-test_decoding:
> +	$(MAKE) -C $(top_builddir)/src/test/modules/snapshot_too_old
> The target name here is incorrect. This should refer to snapshot_too_old.

Good catch. Fixed. So far, pending the resolution of the suggestion to add three new parameters to BufferGetPage in 450 places that otherwise don't need to be touched, this is the only change from the flurry of recent testing and review, so I'm holding off on posting a new patch just for this.

> Regarding the main patch:
> +   <primary><varname>old_snapshot_threshold</> configuration parameter</primary>
> snapshot_valid_limit?

There have already been responses supporting old_snapshot_threshold, so I would need to hear a few more votes to consider a change.

>     page = BufferGetPage(buf);
> +   TestForOldSnapshot(scan->xs_snapshot, rel, page);
> This is a sequence repeated many times in this patch; a new routine, say BufferGetPageExtended with a uint8 flag, one flag being used to test old snapshots, would be more appropriate. But this would require creating a header dependency between the buffer manager and SnapshotData.. Or more simply we may want a common code path when fetching a page that a backend is going to use to fetch tuples. I am afraid of TestForOldSnapshot() being something that could be easily forgotten in code paths introduced in future patches...

Let's keep that discussion on the other branch of this thread.

> +   if (whenTaken < 0)
> +   {
> +       elog(DEBUG1,
> +            "MaintainOldSnapshotTimeMapping called with negative whenTaken = %ld",
> +            (long) whenTaken);
> +       return;
> +   }
> +   if (!TransactionIdIsNormal(xmin))
> +   {
> +       elog(DEBUG1,
> +            "MaintainOldSnapshotTimeMapping called with xmin = %lu",
> +            (unsigned long) xmin);
> +       return;
> +   }
> Material for two assertions?

You omitted the immediately preceding comment block:

+   /*
+    * We don't want to do something stupid with unusual values, but we don't
+    * want to litter the log with warnings or break otherwise normal
+    * processing for this feature; so if something seems unreasonable, just
+    * log at DEBUG level and return without doing anything.
+    */

I'm not clear that more drastic action is a good idea, since the "fallback" is existing behavior. I fear that doing something more aggressive might force other logic to become more precise about aggressive clean-up, adding overhead beyond the value gained. Thanks for the review! -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
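As background for what MaintainOldSnapshotTimeMapping() is guarding, the underlying idea is a small time-to-xid map. The following is a simplified sketch with hypothetical field names, not the committed structure in snapmgr.c:

    #define OLD_SNAPSHOT_MAP_ENTRIES 60     /* one entry per minute, say */

    /*
     * Sketch: a circular buffer recording, for each recent minute, the
     * latest xmin observed.  Vacuum and pruning can then translate
     * "snapshots older than old_snapshot_threshold" back into a
     * TransactionId limit.  whenTaken and xmin are the inputs being
     * sanity-checked in the quoted code above.
     */
    typedef struct OldSnapshotTimeMapSketch
    {
        int64           head_timestamp;     /* time of the oldest entry */
        int             head_offset;        /* index of the oldest entry */
        int             count_used;         /* number of entries in use */
        TransactionId   xid_by_minute[OLD_SNAPSHOT_MAP_ENTRIES];
    } OldSnapshotTimeMapSketch;

Seen this way, the DEBUG1-and-return guards make sense: a nonsense timestamp or xid simply leaves the map unmaintained for that call, and the system falls back to today's (less aggressive but correct) cleanup behavior.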
On Thu, Mar 31, 2016 at 5:09 AM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: >> I understand the invasiveness argument, but to me the danger of >> introducing new bugs trumps that. The problem is not the current code, >> but future patches: it is just too easy to make the mistake of not >> checking the snapshot in new additions of BufferGetPage. And you said >> that the result of missing a check is silent wrong results from queries >> that should instead be cancelled, which seems fairly bad to me. > > Fair point. That's my main concern after going through the patch, and the patch written as-is does not help future users much. This could be easily forgotten by committers as well. >> I said that we should change BufferGetPage into having the snapshot >> check built-in, except in the cases where a flag is passed; and the flag >> would be passed in all cases except those 30-something you identified. >> In other words, the behavior in all the current callsites would be >> identical to what's there today; we could have a macro do the first >> check so that we don't introduce the overhead of a function call in the >> 450 cases where it's not needed. > > In many of the places that BufferGetPage is called there is not a > snapshot available. I assume that you would be OK with an Assert > that the flag was passed if the snapshot is NULL? I had been > picturing what you were requesting as just adding a snapshot > parameter and assuming that NULL meant not to check; adding two > parameters where the flag explicitly calls out that the check is not > needed might do more to prevent accidents, but I do wonder how much > it would help during a copy/paste frenzy. Touching all spots to use > the new function signature would be a mechanical job with the > compiler catching any errors, so it doesn't seem crazy to refactor > that now, but I would like to hear what some others think about > this. That's better than the existing patch, for sure. When calling BufferGetPage() one could still be tempted to just pass a NULL snapshot, though. It should be clearly documented in the header of BufferGetPage() where and for which purpose a snapshot should be set, and in which code paths it is expected to be used. In our case, that's mainly when a page is fetched from shared buffers and then used for reading tuples. Just a note: I began by looking at the tests, but ended up looking at the entire patch out of curiosity. Regarding the integration of this patch for 9.6, I think that bumping that to 9.7 would be wiser because the patch needs to be largely re-written, and that's never a good sign at this point of the development cycle. -- Michael
Michael Paquier wrote: > Just a note: I began by looking at the tests, but ended up looking at the > entire patch out of curiosity. Regarding the integration of > this patch for 9.6, I think that bumping that to 9.7 would be wiser > because the patch needs to be largely re-written, and that's never a > good sign at this point of the development cycle. Not rewritten, surely? It will need a very large mechanical change to existing BufferGetPage calls, but to me that doesn't equate to "rewriting" it. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 30, 2016 at 9:19 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Michael Paquier wrote: > >> Just a note: I began by looking at the tests, but ended up looking at the >> entire patch out of curiosity. Regarding the integration of >> this patch for 9.6, I think that bumping that to 9.7 would be wiser >> because the patch needs to be largely re-written, and that's never a >> good sign at this point of the development cycle. > > Not rewritten, surely? It will need a very large mechanical change to > existing BufferGetPage calls, but to me that doesn't equate to "rewriting" > it. I'll submit patches later today to make the mechanical change to the nearly 500 BufferGetPage() calls and to tweak the 36 places to use the new "test" flag with the new signature rather than adding a line for the test. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Mar 30, 2016 at 3:26 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Kevin Grittner wrote: >> On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > >> > I said that we should change BufferGetPage into having the snapshot >> > check built-in, except in the cases where a flag is passed; and the flag >> > would be passed in all cases except those 30-something you identified. >> > In other words, the behavior in all the current callsites would be >> > identical to what's there today; we could have a macro do the first >> > check so that we don't introduce the overhead of a function call in the >> > 450 cases where it's not needed. >> >> In many of the places that BufferGetPage is called there is not a >> snapshot available. I assume that you would be OK with an Assert >> that the flag was passed if the snapshot is NULL? > > Sure, that's fine. > > BTW I said "a macro" but I was forgetting that we have static inline > functions in headers now, which means you can avoid the horrors of > actually writing a macro. Attached is what I think you're talking about for the first patch. AFAICS this should generate identical executable code to unpatched. Then the patch to actually implement the feature would, instead of adding 30-some lines with TestForOldSnapshot(), implement that as the behavior for the other enum value and alter those 30-some BufferGetPage() calls. Álvaro and Michael, is this what you were looking for? Is everyone else OK with this approach? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Kevin Grittner wrote: > Attached is what I think you're talking about for the first patch. > AFAICS this should generate identical executable code to unpatched. > Then the patch to actually implement the feature would, instead > of adding 30-some lines with TestForOldSnapshot(), implement > that as the behavior for the other enum value and alter those > 30-some BufferGetPage() calls. > > Álvaro and Michael, is this what you were looking for? Yes, this is what I was thinking, thanks. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Apr 1, 2016 at 11:45 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Kevin Grittner wrote: > >> Attached is what I think you're talking about for the first patch. >> AFAICS this should generate identical executable code to unpatched. >> Then the patch to actually implement the feature would, instead >> of adding 30-some lines with TestForOldSnapshot(), implement >> that as the behavior for the other enum value and alter those >> 30-some BufferGetPage() calls. >> >> Álvaro and Michael, is this what you were looking for? > > Yes, this is what I was thinking, thanks. A small thing:

$ git diff master --check
src/include/storage/bufmgr.h:181: trailing whitespace.
+#define BufferGetPage(buffer, snapshot, relation, agetest) ((Page)BufferGetBlock(buffer))

-	Page		page = BufferGetPage(buf);
+	Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);

Having a BufferGetPageExtended() with some flags and a default corresponding to NO_SNAPSHOT_TEST would reduce the diff impact. And as long as the check is integrated with BufferGetPage[Extended]() I would not complain, the patch as proposed being 174kB... -- Michael
On Sat, Apr 2, 2016 at 7:12 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Fri, Apr 1, 2016 at 11:45 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> Kevin Grittner wrote: >> >>> Attached is what I think you're talking about for the first patch. >>> AFAICS this should generate identical executable code to unpatched. >>> Then the patch to actually implement the feature would, instead >>> of adding 30-some lines with TestForOldSnapshot(), implement >>> that as the behavior for the other enum value and alter those >>> 30-some BufferGetPage() calls. >>> >>> Álvaro and Michael, is this what you were looking for? >> >> Yes, this is what I was thinking, thanks. > > A small thing: > $ git diff master --check > src/include/storage/bufmgr.h:181: trailing whitespace. > +#define BufferGetPage(buffer, snapshot, relation, agetest) > ((Page)BufferGetBlock(buffer)) > > - Page page = BufferGetPage(buf); > + Page page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST); > Having a BufferGetPageExtended() with some flags and a default > corresponding to NO_SNAPSHOT_TEST would reduce the diff impact. And as > long as the check is integrated with BufferGetPage[Extended]() I would > not complain, the patch as proposed being 174kB... If you are saying that the 450 places that don't need the check would remain unchanged, and the only difference would be to use BufferGetPageExtended() instead of BufferGetPage() followed by TestForOldSnapshot() in the 30-some places that need the check, I don't see the point. That would eliminate the "forced choice" aspect of what Álvaro is asking for, and it seems to me that it would do next to nothing to prevent the errors of omission that are the concern here. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Mar 30, 2016 at 12:34 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Sat, Mar 19, 2016 at 1:27 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > >> I'm not sure if this is operating as expected. >> >> I set the value to 1min. >> >> I set up a test like this: >> >> pgbench -i >> >> pgbench -c4 -j4 -T 3600 & >> >> ### watch the size of branches table >> while (true) ; do psql -c "\dt+" | fgrep _branches; sleep 10; done & >> >> ### set up a long lived snapshot. >> psql -c 'begin; set transaction isolation level repeatable read; >> select sum(bbalance) from pgbench_branches; select pg_sleep(300); >> select sum(bbalance) from pgbench_branches;' >> >> As this runs, I can see the size of the pgbench_branches bloating once >> the snapshot is taken, and continues bloating at a linear rate for the >> full 300 seconds. >> >> Once the 300 second pg_sleep is up, the long-lived snapshot holder >> receives an error once it tries to access the table again, and then >> the bloat stops increasing. But shouldn't the bloat have stopped >> increasing as soon as the snapshot became doomed, which would be after >> a minute or so? > > This is actually operating as intended, not a bug. Try running a > manual VACUUM command about two minutes after the snapshot is taken > and you should get a handle on what's going on. The old tuples > become eligible for vacuuming after one minute, but that doesn't > necessarily mean that autovacuum jumps in and that the space starts > getting reused. I can verify that a manual vacuum does stop the bloat from continuing to increase. But I don't see why autovacuum is not already stopping the bloat. It is running often enough that it really ought to do so (as verified by setting log_autovacuum_min_duration = 0 and looking in the log files to see that it is vacuuming the table once per nap-time, although it is not accomplishing much by doing so as no tuples can be removed.) Also, HOT-cleanup should stop the bloat increase once the snapshot crosses the old_snapshot_threshold without even needing to wait until the next autovac runs. Does the code intentionally only work for manual vacuums? If so, that seems quite surprising. Or perhaps I am missing something else here. Thanks, Jeff
On Wed, Mar 30, 2016 at 12:46 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Wed, Mar 30, 2016 at 2:29 PM, Peter Geoghegan <pg@heroku.com> wrote: >> [Does the patch allow dangling page pointers?] >> Again, I don't want to prejudice anyone against your patch, which I >> haven't read. > I don't believe that the way the patch does its business opens any > new vulnerabilities of this type. If you see such after looking at > the patch, let me know. Okay, let me be more concrete about this. The patch does this:

> --- a/src/backend/access/heap/pruneheap.c
> +++ b/src/backend/access/heap/pruneheap.c
> @@ -92,12 +92,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
>      * need to use the horizon that includes slots, otherwise the data-only
>      * horizon can be used. Note that the toast relation of user defined
>      * relations are *not* considered catalog relations.
> +    *
> +    * It is OK to apply the old snapshot limit before acquiring the cleanup
> +    * lock because the worst that can happen is that we are not quite as
> +    * aggressive about the cleanup (by however many transaction IDs are
> +    * consumed between this point and acquiring the lock). This allows us to
> +    * save significant overhead in the case where the page is found not to be
> +    * prunable.
>      */
>     if (IsCatalogRelation(relation) ||
>         RelationIsAccessibleInLogicalDecoding(relation))
>         OldestXmin = RecentGlobalXmin;
>     else
> -       OldestXmin = RecentGlobalDataXmin;
> +       OldestXmin =
> +           TransactionIdLimitedForOldSnapshots(RecentGlobalDataXmin,
> +                                               relation);

This new intermediary function TransactionIdLimitedForOldSnapshots() is called to decide what OldestXmin actually gets to be above, based in part on the new GUC old_snapshot_threshold:

> +/*
> + * TransactionIdLimitedForOldSnapshots
> + *
> + * Apply old snapshot limit, if any. This is intended to be called for page
> + * pruning and table vacuuming, to allow old_snapshot_threshold to override
> + * the normal global xmin value. Actual testing for snapshot too old will be
> + * based on whether a snapshot timestamp is prior to the threshold timestamp
> + * set in this function.
> + */
> +TransactionId
> +TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
> +                                    Relation relation)

It might not be RecentGlobalDataXmin that is usually returned as OldestXmin as it is today, which is exactly the point of this patch: VACUUM can be more aggressive in cleaning up bloat, not unlike the non-catalog logical decoding case, on the theory that we can reliably detect when that causes failures for old snapshots, and just raise a "snapshot too old" error. (RecentGlobalDataXmin is morally about the same as RecentGlobalXmin, as far as this patch goes). So far, so good.

It's okay that _bt_page_recyclable() never got the memo about any of this...:

/*
 *  _bt_page_recyclable() -- Is an existing page recyclable?
 *
 * This exists to make sure _bt_getbuf and btvacuumscan have the same
 * policy about whether a page is safe to re-use.
 */
bool
_bt_page_recyclable(Page page)
{
    BTPageOpaque opaque;

    ...

    /*
     * Otherwise, recycle if deleted and too old to have any processes
     * interested in it.
     */
    opaque = (BTPageOpaque) PageGetSpecialPointer(page);
    if (P_ISDELETED(opaque) &&
        TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin))
        return true;
    return false;
}

...because this patch does nothing to advance RecentGlobalXmin (or RecentGlobalDataXmin) itself more aggressively.
It does make vacuum_set_xid_limits() get a more aggressive cutoff point, but we don't see that being passed back down by lazy vacuum here; within _bt_page_recyclable(), we rely on the more conservative RecentGlobalXmin, which is not subject to any clever optimization in the patch. Fortunately, this seems correct, since index scans will always succeed in finding a deleted page, per nbtree README notes on RecentGlobalXmin. Unfortunately, this does stop recycling from happening early for B-Tree pages, even though that's probably safe in principle. This is probably not so bad -- it just needs to be considered when reviewing this patch (the same is true of logical decoding's RecentGlobalDataXmin; it also doesn't appear in _bt_page_recyclable(), and I guess that that was never a problem). Index relations will not get smaller in some important cases, but they will be made less bloated by VACUUM in a sense that's still probably very useful. Maybe that explains some of what Jeff talked about.

I think another part of the problems that Jeff mentioned (with pruning) could be this existing code within heap_hot_search_buffer():

    /*
     * If we can't see it, maybe no one else can either. At caller
     * request, check whether all chain members are dead to all
     * transactions.
     */
    if (all_dead && *all_dead &&
        !HeapTupleIsSurelyDead(heapTuple, RecentGlobalXmin))
        *all_dead = false;

This is used within routines like btgettuple(), to do the LP_DEAD thing to kill index tuples (not HOT chain pruning). Aside: Not sure offhand why it might be okay, performance-wise, that this code doesn't care about RecentGlobalDataXmin; pruning was a big part of why RecentGlobalDataXmin was added for logical decoding, I thought, although I guess the _bt_killitems() stuff doesn't count as pruning. -- Peter Geoghegan
On Sun, Apr 3, 2016 at 2:09 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > Also, HOT-cleanup should stop the bloat increase once the snapshot > crosses the old_snapshot_threshold without even needing to wait until > the next autovac runs. > > Does the code intentionally only work for manual vacuums? If so, that > seems quite surprising. Or perhaps I am missing something else here. What proportion of the statements in your simulated workload were updates? Per my last mail to this thread, I'm interested in knowing if this was a delete heavy workload. -- Peter Geoghegan
On Mon, Apr 4, 2016 at 8:38 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Sun, Apr 3, 2016 at 2:09 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >> Also, HOT-cleanup should stop the bloat increase once the snapshot >> crosses the old_snapshot_threshold without even needing to wait until >> the next autovac runs. >> >> Does the code intentionally only work for manual vacuums? If so, that >> seems quite surprising. Or perhaps I am missing something else here. > > What proportion of the statements in your simulated workload were > updates? Per my last mail to this thread, I'm interested in knowing if > this was a delete heavy workload. It was pgbench's built-in TPC-B-like transaction, so 3 UPDATEs, 1 SELECT, and 1 INSERT per transaction. So I would say that it is ridiculously update-heavy compared to almost any real-world use patterns. That is the active workload. The long-term snapshot holder just does a sum(abalance) at the repeatable read level in order to force a snapshot to be taken and held, and then goes idle for a long time. Cheers, Jeff
On Sun, Apr 3, 2016 at 4:09 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Wed, Mar 30, 2016 at 12:34 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >> On Sat, Mar 19, 2016 at 1:27 AM, Jeff Janes <jeff.janes@gmail.com> wrote: >>> I set the value to 1min. >>> >>> I set up a test like this: >>> >>> pgbench -i >>> >>> pgbench -c4 -j4 -T 3600 & >>> >>> ### watch the size of branches table >>> while (true) ; do psql -c "\dt+" | fgrep _branches; sleep 10; done & >>> >>> ### set up a long lived snapshot. >>> psql -c 'begin; set transaction isolation level repeatable read; >>> select sum(bbalance) from pgbench_branches; select pg_sleep(300); >>> select sum(bbalance) from pgbench_branches;' >>> >>> As this runs, I can see the size of the pgbench_branches bloating once >>> the snapshot is taken, and continues bloating at a linear rate for the >>> full 300 seconds. I'm not seeing that on my i7 box. >>> Once the 300 second pg_sleep is up, the long-lived snapshot holder >>> receives an error once it tries to access the table again, and then >>> the bloat stops increasing. But shouldn't the bloat have stopped >>> increasing as soon as the snapshot became doomed, which would be after >>> a minute or so? It will, limited by how well your autovacuum can keep up with the workload on your system. See attached graph. I ran 5 times each in 3 configurations and graphed the median and average of each.

master: development checkout from today, no config changes
patch: patch with no config changes except:
    old_snapshot_threshold = '1min'
patch + av: patch with these config changes:
    old_snapshot_threshold = '1min'
    autovacuum_max_workers = 8
    autovacuum_vacuum_cost_limit = 2000
    autovacuum_naptime = '10s'
    autovacuum_work_mem = '1GB'

As expected, differences are minimal at first, then the patch starts to win, and wins even better with more aggressive autovacuum. > I can verify that a manual vacuum does stop the bloat from continuing > to increase. But I don't see why autovacuum is not already stopping > the bloat. It is running often enough that it really ought to do so > (as verified by setting log_autovacuum_min_duration = 0 and looking in > the log files to see that it is vacuuming the table once per nap-time, > although it is not accomplishing much by doing so as no tuples can be > removed.) Perhaps the CPUs I have or the way I have my machine tuned allows autovacuum to be more effective in the face of the pgbench load than on yours? > Also, HOT-cleanup should stop the bloat increase once the snapshot > crosses the old_snapshot_threshold without even needing to wait until > the next autovac runs. It should help some, but you really need a vacuum in there to take care of things thoroughly. > Does the code intentionally only work for manual vacuums? If so, that > seems quite surprising. Or perhaps I am missing something else here. Perhaps it is that VACUUM tries harder to get the work done, while autovacuum steps out of the way when it detects that it is blocking something. This is a pretty small table (it never gets to 1MB even when bloating) with multiple clients pounding on it. It might just be that your system doesn't allow much autovacuum activity before it finds itself blocking a pgbench process. FWIW, our customer's 30-day test runs were on databases of hundreds of GB and showed similar benefits -- linear growth indefinitely without the patch, settling in to a pretty steady state after a few hours with the patch. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Mon, Apr 4, 2016 at 9:15 PM, Peter Geoghegan <pg@heroku.com> wrote: > The patch does this: > >> --- a/src/backend/access/heap/pruneheap.c >> +++ b/src/backend/access/heap/pruneheap.c >> @@ -92,12 +92,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer) >> * need to use the horizon that includes slots, otherwise the data-only >> * horizon can be used. Note that the toast relation of user defined >> * relations are *not* considered catalog relations. >> + * >> + * It is OK to apply the old snapshot limit before acquiring the cleanup >> + * lock because the worst that can happen is that we are not quite as >> + * aggressive about the cleanup (by however many transaction IDs are >> + * consumed between this point and acquiring the lock). This allows us to >> + * save significant overhead in the case where the page is found not to be >> + * prunable. >> */ >> if (IsCatalogRelation(relation) || >> RelationIsAccessibleInLogicalDecoding(relation)) >> OldestXmin = RecentGlobalXmin; >> else >> - OldestXmin = RecentGlobalDataXmin; >> + OldestXmin = >> + TransactionIdLimitedForOldSnapshots(RecentGlobalDataXmin, >> + relation); > > This new intermediary function TransactionIdLimitedForOldSnapshots() > is called to decide what OldestXmin actually gets to be above, based > in part on the new GUC old_snapshot_threshold: > >> +/* >> + * TransactionIdLimitedForOldSnapshots >> + * >> + * Apply old snapshot limit, if any. This is intended to be called for page >> + * pruning and table vacuuming, to allow old_snapshot_threshold to override >> + * the normal global xmin value. Actual testing for snapshot too old will be >> + * based on whether a snapshot timestamp is prior to the threshold timestamp >> + * set in this function. >> + */ >> +TransactionId >> +TransactionIdLimitedForOldSnapshots(TransactionId recentXmin, >> + Relation relation) > > It might not be RecentGlobalDataXmin that is usually returned as > OldestXmin as it is today, which is exactly the point of this patch: Right. > VACUUM can be more aggressive in cleaning up bloat, [...], on the > theory that we can reliably detect when that causes failures for > old snapshots, and just raise a "snapshot too old" error. Right. > ...because this patch does nothing to advance RecentGlobalXmin (or > RecentGlobalDataXmin) itself more aggressively. It does make > vacuum_set_xid_limits() get a more aggressive cutoff point, but we > don't see that being passed back down by lazy vacuum here; within > _bt_page_recyclable(), we rely on the more conservative > RecentGlobalXmin, which is not subject to any clever optimization in > the patch. Right. > Fortunately, this seems correct, since index scans will always succeed > in finding a deleted page, per nbtree README notes on > RecentGlobalXmin. Right. > Unfortunately, this does stop recycling from > happening early for B-Tree pages, even though that's probably safe in > principle. This is probably not so bad -- it just needs to be > considered when reviewing this patch (the same is true of logical > decoding's RecentGlobalDataXmin; it also doesn't appear in > _bt_page_recyclable(), and I guess that that was never a problem). > Index relations will not get smaller in some important cases, but they > will be made less bloated by VACUUM in a sense that's still probably > very useful. As I see it, if the long-running transaction(s) have written (and thus acquired transaction IDs), we can't safely advance the global Xmin values until they complete. 
If the long-running transactions with the old snapshots don't have transaction IDs, the bloat will be contained. > Maybe that explains some of what Jeff talked about. I think he just didn't have autovacuum configured to where it was being very effective on the tiny tables involved. See my reply to Jeff and the graphs from running his test on my system. I don't think the lines could get much more flat than what I'm seeing with the patch. It may be that this general approach could be made more aggressive and effective by pushing things in the direction you suggest, but we are far past the time to consider that sort of change for 9.6. This patch has been through several rounds of 30-day testing; a change like you propose would require that those tests be redone. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
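A sketch of the horizon rule Kevin describes, in illustrative C with hypothetical names (the real computation lives in TransactionIdLimitedForOldSnapshots() and the xmin bookkeeping around it):

    /*
     * The cleanup horizon may be advanced past the xmin of an old
     * read-only snapshot (its holder then risks "snapshot too old"),
     * but it can never pass the xid of a transaction that has written.
     */
    static TransactionId
    limited_cleanup_horizon(TransactionId global_xmin,
                            TransactionId threshold_xmin,
                            TransactionId oldest_writer_xid)
    {
        TransactionId horizon = global_xmin;

        /* old_snapshot_threshold may allow a newer, more aggressive xmin */
        if (TransactionIdPrecedes(horizon, threshold_xmin))
            horizon = threshold_xmin;

        /* ...but never beyond a transaction that holds a transaction ID */
        if (TransactionIdPrecedes(oldest_writer_xid, horizon))
            horizon = oldest_writer_xid;

        return horizon;
    }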
On Thu, Apr 7, 2016 at 3:56 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >> Unfortunately, this does stop recycling from >> happening early for B-Tree pages, even though that's probably safe in >> principle. This is probably not so bad -- it just needs to be >> considered when reviewing this patch (the same is true of logical >> decoding's RecentGlobalDataXmin; it also doesn't appear in >> _bt_page_recyclable(), and I guess that that was never a problem). >> Index relations will not get smaller in some important cases, but they >> will be made less bloated by VACUUM in a sense that's still probably >> very useful. > > As I see it, if the long-running transaction(s) have written (and > thus acquired transaction IDs), we can't safely advance the global > Xmin values until they complete. If the long-running transactions > with the old snapshots don't have transaction IDs, the bloat will > be contained. I'm not really that concerned about it. I'm mostly just explaining my thought process. >> Maybe that explains some of what Jeff talked about. > > I think he just didn't have autovacuum configured to where it was > being very effective on the tiny tables involved. See my reply to > Jeff and the graphs from running his test on my system. I don't > think the lines could get much more flat than what I'm seeing with > the patch. > > It may be that this general approach could be made more aggressive > and effective by pushing things in the direction you suggest, but > we are far past the time to consider that sort of change for 9.6. > This patch has been through several rounds of 30-day testing; a > change like you propose would require that those tests be redone. I think that there is a good argument in favor of this patch that you may have failed to make yourself, which is: it limits bloat in a way that's analogous to how RecentGlobalDataXmin can do so for logical decoding (i.e. where wal_level = logical, and RecentGlobalXmin and RecentGlobalDataXmin could actually differ). Therefore, it benefits to a significant degree from the testing that Andres did to make sure logical decoding doesn't cause excessive bloat when RecentGlobalXmin is pinned to make historic MVCC catalog snapshots work (he did so at my insistence at the time; pruning turned out to be very important for many common workloads, and Andres got that right). I can't really imagine a way that what you have here could be any less effective than what Andres did for logical decoding. This is reassuring, since that mechanism has to be pretty well battle-hardened by now. -- Peter Geoghegan
On Thu, Apr 7, 2016 at 6:12 PM, Peter Geoghegan <pg@heroku.com> wrote: > I think that there is a good argument in favor of this patch that you > may have failed to make yourself, which is: it limits bloat in a way > that's analogous to how RecentGlobalDataXmin can do so for logical > decoding (i.e. where wal_level = logical, and RecentGlobalXmin and > RecentGlobalDataXmin could actually differ). Therefore, it benefits to > a significant degree from the testing that Andres did to make sure > logical decoding doesn't cause excessive bloat when RecentGlobalXmin > is pinned to make historic MVCC catalog snapshots work (he did so at > my insistence at the time; pruning turned out to be very important for > many common workloads, and Andres got that right). I can't really > imagine a way that what you have here could be any less effective than > what Andres did for logical decoding. This is reassuring, since that > mechanism has to be pretty well battle-hardened by now. Interesting. I had not noticed that relationship. Anyway, pushed as two patches -- the no-op patch to create the "forced choice" on whether to do the test at each BufferGetPage point, and the actual feature. Sadly, I forgot to include the reviewer information when writing the commit messages. :-( -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Apr 8, 2016 at 12:55 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > Sadly, I forgot to include the reviewer information when writing the > commit messages. :-( Oh well. I'm just glad we got the patch over the line. I think that there are some types of users that will very significantly benefit from this patch. I am reminded of this blog post, written by a friend and former co-worker: https://brandur.org/postgres-queues -- Peter Geoghegan
On 4/8/16 4:30 PM, Peter Geoghegan wrote: > On Fri, Apr 8, 2016 at 12:55 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >> Sadly, I forgot to include the reviewer information when writing the >> commit messages. :-( > > Oh well. I'm just glad we got the patch over the line. I think that > there are some types of users that will very significantly benefit > from this patch. I'm also very happy to see this go in. While I used to dread the "snapshot too old" error back in my O****e architect days it's nice to have the option, especially when it can be configured by time rather than by size. -- -David david@pgmasters.net
Kevin Grittner <kgrittn@gmail.com> writes: > On Wed, Mar 30, 2016 at 3:26 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: >> Kevin Grittner wrote: >>> On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >>> I said that we should change BufferGetPage into having the snapshot >>> check built-in, except in the cases where a flag is passed; and the flag >>> would be passed in all cases except those 30-something you identified. >>> In other words, the behavior in all the current callsites would be >>> identical to what's there today; we could have a macro do the first >>> check so that we don't introduce the overhead of a function call in the >>> 450 cases where it's not needed. > Attached is what I think you're talking about for the first patch. > AFAICS this should generate identical executable code to unpatched. > Then the patch to actually implement the feature would, instead > of adding 30-some lines with TestForOldSnapshot(), implement > that as the behavior for the other enum value and alter those > 30-some BufferGetPage() calls. > Álvaro and Michael, is this what you were looking for? > Is everyone else OK with this approach? After struggling with back-patching a GIN bug fix, I wish to offer up the considered opinion that this was an impressively bad idea. It's inserted 450 or so pain points for back-patching, which we will have to deal with for the next five years. Moreover, I do not believe that it will do a damn thing for ensuring that future calls of BufferGetPage think about what to do; they'll most likely be copied-and-pasted from nearby calls, just as people have always done. With luck, the nearby calls will have the right semantics, but this change won't help very much at all if they don't. I think we should revert BufferGetPage to be what it was before (with no snapshot test) and invent BufferGetPageExtended or similar to be used in the small number of places that need a snapshot test. regards, tom lane
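Concretely, the revert Tom is suggesting would look roughly like this -- a sketch only, with BufferGetPageExtended as a proposed name from this subthread rather than committed code:

    /* The original, snapshot-free accessor comes back... */
    #define BufferGetPage(buffer) ((Page) BufferGetBlock(buffer))

    /* ...and only the ~36 scan-related call sites use the extended form. */
    static inline Page
    BufferGetPageExtended(Buffer buffer, Snapshot snapshot, Relation relation)
    {
        Page        page = (Page) BufferGetBlock(buffer);

        TestForOldSnapshot(snapshot, relation, page);
        return page;
    }

Under this shape, the roughly 450 unchecked call sites are textually identical to prior releases, which is what removes the back-patching pain.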
On Sun, Apr 17, 2016 at 5:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Kevin Grittner <kgrittn@gmail.com> writes: >> On Wed, Mar 30, 2016 at 3:26 PM, Alvaro Herrera >> <alvherre@2ndquadrant.com> wrote: >>> Kevin Grittner wrote: >>>> On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >>>> I said that we should change BufferGetPage into having the snapshot >>>> check built-in, except in the cases where a flag is passed; and the flag >>>> would be passed in all cases except those 30-something you identified. >>>> In other words, the behavior in all the current callsites would be >>>> identical to what's there today; we could have a macro do the first >>>> check so that we don't introduce the overhead of a function call in the >>>> 450 cases where it's not needed. > >> Attached is what I think you're talking about for the first patch. >> AFAICS this should generate identical executable code to unpatched. >> Then the patch to actually implement the feature would, instead >> of adding 30-some lines with TestForOldSnapshot(), implement >> that as the behavior for the other enum value and alter those >> 30-some BufferGetPage() calls. > >> Álvaro and Michael, is this what you were looking for? > >> Is everyone else OK with this approach? > > After struggling with back-patching a GIN bug fix, I wish to offer up the > considered opinion that this was an impressively bad idea. It's inserted > 450 or so pain points for back-patching, which we will have to deal with > for the next five years. Moreover, I do not believe that it will do a > damn thing for ensuring that future calls of BufferGetPage think about > what to do; they'll most likely be copied-and-pasted from nearby calls, > just as people have always done. With luck, the nearby calls will have > the right semantics, but this change won't help very much at all if they > don't. > > I think we should revert BufferGetPage to be what it was before (with > no snapshot test) and invent BufferGetPageExtended or similar to be > used in the small number of places that need a snapshot test. I'm not sure what BufferGetPageExtended() buys us over simply inserting TestForOldSnapshot() where it is needed. Other than that question, I have no objections to the course outlined, but figure I should not jump on it without allowing at least a couple days for discussion. That also may give me time to perform the benchmarks I wanted -- VPN issues have blocked me from the big test machines so far. I think I see where the time may be going when the feature is disabled, and if I'm right I have a fix; but without a big NUMA machine there is no way to confirm it. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Apr 18, 2016 at 9:52 AM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Sun, Apr 17, 2016 at 5:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Kevin Grittner <kgrittn@gmail.com> writes: >>> On Wed, Mar 30, 2016 at 3:26 PM, Alvaro Herrera >>> <alvherre@2ndquadrant.com> wrote: >>>> Kevin Grittner wrote: >> I think we should revert BufferGetPage to be what it was before (with >> no snapshot test) and invent BufferGetPageExtended or similar to be >> used in the small number of places that need a snapshot test. > I'm not sure what BufferGetPageExtended() buys us over simply > inserting TestForOldSnapshot() where it is needed. Other than that > question, I have no objections to the course outlined, but figure I > should not jump on it without allowing at least a couple days for > discussion. That also may give me time to perform the benchmarks I > wanted -- VPN issues have blocked me from the big test machines so > far. I think I see where the time may be going when the feature is > disabled, and if I'm right I have a fix; but without a big NUMA > machine there is no way to confirm it. TBH, BufferGetPageExtended() still looks like a good idea to me. Backpatching those code paths is going to make maintenance far harder, on top of breaking the compilation of extensions for perhaps no good reason. Even if this is a low-level change, if this feature goes in with 9.6, it would be really good to mention as well that callers of BufferGetPage should update their calls accordingly if they care about the checks with the old snapshot. This is a routine used a lot in many plugins and extensions. Usually such low-level things are not mentioned in the release notes, but this time I think it's really important to say it loudly. -- Michael
Tom Lane wrote: > After struggling with back-patching a GIN bug fix, I wish to offer up the > considered opinion that this was an impressively bad idea. It's inserted > 450 or so pain points for back-patching, which we will have to deal with > for the next five years. Moreover, I do not believe that it will do a > damn thing for ensuring that future calls of BufferGetPage think about > what to do; they'll most likely be copied-and-pasted from nearby calls, > just as people have always done. With luck, the nearby calls will have > the right semantics, but this change won't help very much at all if they > don't. I disagree. A developer that sees an unadorned BufferGetPage() call doesn't stop to think twice about whether they need to add a snapshot test. Many reviewers will miss the necessary addition also. A developer that sees BufferGetPage(NO_SNAPSHOT_TEST) will at least consider the idea that the flag might be right; if that developer doesn't think about it, some reviewer may notice a new call with the flag and consider the idea that the flag may be wrong. I understand the backpatching pain argument, but my opinion was the contrary of yours even so. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Apr 17, 2016 at 10:38 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Tom Lane wrote: > >> After struggling with back-patching a GIN bug fix, I wish to offer up the >> considered opinion that this was an impressively bad idea. It's inserted >> 450 or so pain points for back-patching, which we will have to deal with >> for the next five years. > I understand the backpatching pain argument, but my opinion was the > contrary of yours even so. The other possibility would be to backpatch the no-op patch which just uses the new syntax without any change in semantics. I'm not arguing for that; just putting it on the table.... -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@gmail.com> writes: > On Sun, Apr 17, 2016 at 10:38 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: >> I understand the backpatching pain argument, but my opinion was the >> contrary of yours even so. > The other possibility would be to backpatch the no-op patch which > just uses the new syntax without any change in semantics. That would break 3rd-party extensions in a minor release, wouldn't it? Or do I misunderstand your suggestion? regards, tom lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > I disagree. A developer that sees an unadorned BufferGetPage() call > doesn't stop to think twice about whether they need to add a snapshot > test. Many reviewers will miss the necessary addition also. A > developer that sees BufferGetPage(NO_SNAPSHOT_TEST) will at least > consider the idea that the flag might be right; if that developer > doesn't think about it, some reviewer may notice a new call with the > flag and consider the idea that the flag may be wrong. I'm unconvinced ... > I understand the backpatching pain argument, but my opinion was the > contrary of yours even so. I merely point out that the problem came up less than ten days after that patch hit the tree. If that does not give you pause about the size of the back-patching problem we've just introduced, it should. TBH, there is nothing that I like about this feature: not the underlying concept, not the invasiveness of the implementation, nothing. I would dearly like to see it reverted altogether. I do not think it is worth the pain that the current implementation will impose, both on developers and on potential users. Surely there was another way to get a similar end result without mucking with things at the level of BufferGetPage. regards, tom lane
On Mon, Apr 18, 2016 at 8:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Kevin Grittner <kgrittn@gmail.com> writes: >> On Sun, Apr 17, 2016 at 10:38 PM, Alvaro Herrera >> <alvherre@2ndquadrant.com> wrote: >>> I understand the backpatching pain argument, but my opinion was the >>> contrary of yours even so. > >> The other possibility would be to backpatch the no-op patch which >> just uses the new syntax without any change in semantics. > > That would break 3rd-party extensions in a minor release, wouldn't it? > Or do I misunderstand your suggestion? With a little bit of a change to the headers I think we could avoid that breakage. The original no-op patch didn't change the executable code, but it would have interfered with 3rd-party compiles; but with a minor adjustment (using a modified name for the BufferGetPage with the extra parameters), we could avoid that problem. That would seem to address Álvaro's concern while avoiding five years of backpatch nightmares. I don't claim it's an *elegant* solution, but it might be a workable compromise. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Apr 17, 2016 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > After struggling with back-patching a GIN bug fix, I wish to offer up the > considered opinion that this was an impressively bad idea. It's inserted > 450 or so pain points for back-patching, which we will have to deal with > for the next five years. Moreover, I do not believe that it will do a > damn thing for ensuring that future calls of BufferGetPage think about > what to do; they'll most likely be copied-and-pasted from nearby calls, > just as people have always done. With luck, the nearby calls will have > the right semantics, but this change won't help very much at all if they > don't. I hit this problem over the weekend, too, when I tried to rebase a patch a colleague of mine is working on. So I tend to agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > I disagree. A developer that sees an unadorned BufferGetPage() call > > doesn't stop to think twice about whether they need to add a snapshot > > test. Many reviewers will miss the necessary addition also. A > > developer that sees BufferGetPage(NO_SNAPSHOT_TEST) will at least > > consider the idea that the flag might be right; if that developer > > doesn't think about it, some reviewer may notice a new call with the > > flag and consider the idea that the flag may be wrong. > > I'm unconvinced ... Well, nobody opposed this when I proposed it originally. Robert just stated that it caused a problem for him while backpatching but didn't state an opinion on reverting that change or not. Maybe we should call for a vote here. > > I understand the backpatching pain argument, but my opinion was the > > contrary of yours even so. > > I merely point out that the problem came up less than ten days after > that patch hit the tree. If that does not give you pause about the > size of the back-patching problem we've just introduced, it should. Understood. Kevin's idea of applying a no-op syntax change is on the table. I don't like it either, but ISTM better than the other options so far. > TBH, there is nothing that I like about this feature: not the underlying > concept, not the invasiveness of the implementation, nothing. I would > dearly like to see it reverted altogether. I do not think it is worth > the pain that the current implementation will impose, both on developers > and on potential users. Surely there was another way to get a similar > end result without mucking with things at the level of BufferGetPage. Ah well, that's a completely different angle, and perhaps we should explore this before doing anything in the back branches. So it seems to me we have these options:

1) revert the whole feature
2) revert the BufferGetPage syntax change
3) apply a no-op syntax change so that BufferGetPage looks the same on backbranches as it does on master today, keeping API and ABI compatibility with existing code
4) do nothing

Any others? (If we decide to call for a vote, I suggest we open a new thread) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Apr 18, 2016 at 8:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Surely there was another way to get a similar end result without > mucking with things at the level of BufferGetPage. To get the feature that some customers have been demanding, a check has to be made somewhere near where any page is read in a scan. It didn't take me long in working on this to notice that grepping for BufferGetPage() calls was a good way to find candidate spots to insert the check (even if only 7% of BufferGetPage() calls need to be followed by such a check) -- but the BufferGetPage() itself clearly does *not* need to be modified to implement the feature. We could: (1) Add calls to a check function where needed, and just document that addition of a BufferGetPage() call should be considered a clue that a new check might be needed. (original plan) (2) Replace the 7% of the BufferGetPage() calls that need to check the age of the snapshot with something that wraps the two function calls, and does nothing but call one and then the other. (favored by Michael) (3) Add parameters to BufferGetPage() to specify whether the check is needed and provide sufficient information to perform the check if it is. (current master) (4) Replace BufferGetPage() with some other function name having the characteristics of (3) to minimize back-patch pain. (grudgingly favored by Álvaro) (5) Revert from community code, leaving it as an EDB value-add Advanced Server feature. Does someone see another (better) alternative? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
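[To make the trade-off concrete, a call site would look roughly like this under the first three options. This is a comparative sketch, not committed code: BufferGetPageCheck is a hypothetical name for option (2), and the option (3) flag spelling is illustrative, following the NO_SNAPSHOT_TEST naming quoted upthread:

	Buffer		buf;		/* assumed already read and pinned */
	Page		page;
	Snapshot	snapshot;
	Relation	relation;

	/* (1) Original plan: plain page fetch, explicit check after it. */
	page = BufferGetPage(buf);
	TestForOldSnapshot(snapshot, relation, page);

	/* (2) A wrapper that does nothing but combine the two calls. */
	page = BufferGetPageCheck(buf, snapshot, relation);

	/* (3) Extra parameters on BufferGetPage() itself, forcing every
	 * caller to state whether the age test is wanted. */
	page = BufferGetPage(buf, snapshot, relation, TEST_FOR_OLD_SNAPSHOT);]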
Kevin Grittner <kgrittn@gmail.com> writes: > On Mon, Apr 18, 2016 at 8:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Surely there was another way to get a similar end result without >> mucking with things at the level of BufferGetPage. > To get the feature that some customers have been demanding, a check > has to be made somewhere near where any page is read in a scan. I'm not really convinced that we need to define it exactly like that, though. In particular, why not just kill transactions as soon as their oldest snapshot is too old? That might not work exactly like this does, but it would have some pretty substantial benefits --- for one, that the timeout could be configured locally per session rather than having to be PGC_POSTMASTER. And it would likely be far easier to limit the performance side-effects. I complained about this back when the feature was first discussed, and you insisted that that answer was no good, and I figured I'd hold my nose and look the other way as long as the final patch wasn't too invasive. Well, now we've seen the end result, and it's very invasive and has got performance issues as well. It's time to reconsider. Or in short: this is a whole lot further than I'm prepared to go to satisfy one customer with a badly-designed application. And from what I can tell from the Feb 2015 discussion, that's what this has been written for. regards, tom lane
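[For contrast, Tom's alternative would amount to something like the following sketch: abort the transaction as soon as its oldest snapshot crosses the threshold, rather than when old data is actually touched. Everything here — the GUC name, the helper, and where it would be called from — is hypothetical, shown only to illustrate the design difference; headers such as utils/timestamp.h are assumed:

/* Hypothetical per-session GUC (seconds; -1 disables), settable with
 * SET rather than requiring a postmaster restart. */
int			snapshot_age_limit = -1;

/* Hypothetical check, run from a timeout handler or at statement
 * start: fail outright once the transaction's oldest registered
 * snapshot is too old, without waiting for it to read recently
 * modified data. */
static void
AbortIfSnapshotTooOld(TimestampTz oldest_snapshot_ts)
{
	if (snapshot_age_limit >= 0 &&
		TimestampDifferenceExceeds(oldest_snapshot_ts,
								   GetCurrentTimestamp(),
								   snapshot_age_limit * 1000))
		ereport(ERROR,
				(errcode(ERRCODE_SNAPSHOT_TOO_OLD),
				 errmsg("snapshot too old")));
}

The upside is simplicity and per-session configurability; the downside, as Robert argues later in the thread, is that it kills transactions that would never have touched the modified data.]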
On Tue, Apr 19, 2016 at 3:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Or in short: this is a whole lot further than I'm prepared to go to > satisfy one customer with a badly-designed application. And from what > I can tell from the Feb 2015 discussion, that's what this has been > written for. This holds true. I imagine that a lot of people at least on this list have already spent some time tracking down long-running transactions in someone's application and actually tuned the application so that the bloat gets reduced and things perform better for other transactions taking a shorter time, without the need for this feature. -- Michael
On Tue, Apr 19, 2016 at 1:23 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Tue, Apr 19, 2016 at 3:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Or in short: this is a whole lot further than I'm prepared to go to >> satisfy one customer with a badly-designed application. And from what >> I can tell from the Feb 2015 discussion, that's what this has been >> written for. > > This holds true. I imagine that a lot of people at least on this list > have already spent some time tracking down long-running > transactions in someone's application and actually tuned the > application so that the bloat gets reduced and things perform better for > other transactions taking a shorter time, without the need for this > feature. So I don't want to be too vociferous in defending a feature that (a) was written by a colleague and (b) obviously isn't perfect, but I will point out that: 1. There was a surprising amount of positive reaction when Kevin first proposed this. I expected a lot more people to say this kind of thing at the beginning, but in fact a number of people wrote in to say they'd really like to have this. Those positive reactions shouldn't be forgotten just because those people aren't wading into a discussion about the merits of adding arguments to BufferGetPage. 2. Without this feature, you can kill sessions or transactions to control bloat, but this feature is properly thought of as a way to avoid bloat *without* killing sessions or transactions. You can let the session live, without having it generate bloat, just so long as it doesn't try to touch any data that has been recently modified. We have no other feature in PostgreSQL that does something like that. At the moment, what I see happening is that Tom, the one person who has hated this feature since the beginning, still hates it, and we're on the verge of asking Kevin to revert it because (1) Tom hates it and (2) Kevin changed the BufferGetPage stuff in the way that Alvaro requested. I think that's not quite fair. If we want to demand that this feature be reverted because it causes a performance loss even when turned off, I get that. If we think that it's badly implemented, fine, I get that, too. But asking somebody to revert a patch because the author adjusted things to match what the reviewer wanted is not fair. The right thing to do about that is just change it back to the way Kevin had it originally. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Apr 19, 2016 at 6:38 AM, Robert Haas <robertmhaas@gmail.com> wrote: > The right thing to do about that is just change it back to the > way Kevin had it originally. Since this change to BufferGetPage() has caused severe back-patch pain for at least two committers so far, I will revert that (very recent) change to this patch later today unless I hear an objection. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner wrote: > On Tue, Apr 19, 2016 at 6:38 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > > The right thing to do about that is just change it back to the > > way Kevin had it originally. > > Since this change to BufferGetPage() has caused severe back-patch > pain for at least two committers so far, I will revert that (very > recent) change to this patch later today unless I hear an > objection. I vote for back-patching a no-op change instead, as discussed elsewhere. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-04-19 12:03:22 -0300, Alvaro Herrera wrote: > Kevin Grittner wrote: > > On Tue, Apr 19, 2016 at 6:38 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > > > > The right thing to do about that is just change it back to the > > > way Kevin had it originally. > > > > Since this change to BufferGetPage() has caused severe back-patch > > pain for at least two committers so far, I will revert that (very > > recent) change to this patch later today unless I hear an > > objection. > > I vote for back-patching a no-op change instead, as discussed elsewhere. What about Tom's argument that that'd be problematic for external code?
On Tue, Apr 19, 2016 at 11:03 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Kevin Grittner wrote: >> On Tue, Apr 19, 2016 at 6:38 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> >> > The right thing to do about that is just change it back to the >> > way Kevin had it originally. >> >> Since this change to BufferGetPage() has caused severe back-patch >> pain for at least two committers so far, I will revert that (very >> recent) change to this patch later today unless I hear an >> objection. > > I vote for back-patching a no-op change instead, as discussed elsewhere. That wouldn't have fixed my problem, which involved rebasing a patch. I really think it's also a bad precedent to back-patch things into older branches that are not themselves bug fixes. Users count on us not to destabilize older branches, and that means being minimalist about what we put into them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Andres Freund wrote: > On 2016-04-19 12:03:22 -0300, Alvaro Herrera wrote: > > > Since this change to BufferGetPage() has caused severe back-patch > > > pain for at least two committers so far, I will revert that (very > > > recent) change to this patch later today unless I hear an > > > objections. > > > > I vote for back-patching a no-op change instead, as discussed elsewhere. > > What about Tom's argument that that'd be problematic for external code? Kevin offered to code it in a way that maintains ABI and API compatibility with some trickery. Robert Haas wrote: > That wouldn't have fixed my problem, which involved rebasing a patch. True. I note that it's possible to munge a patch mechanically to sort out this situation. > I really think it's also a bad precedent to back-patch things into > older branches that are not themselves bug fixes. Users count on us > not to destabilize older branches, and that means being minimalist > about what we put into them. Well, this wouldn't change the inner working of the code at all, only how it looks, so it wouldn't affect users. I grant that it would affect developers of forks. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Apr 19, 2016 at 11:02 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Andres Freund wrote: >> On 2016-04-19 12:03:22 -0300, Alvaro Herrera wrote: > >>>> Since this change to BufferGetPage() has caused severe back-patch >>>> pain for at least two committers so far, I will revert that (very >>>> recent) change to this patch later today unless I hear an >>>> objection. >>> >>> I vote for back-patching a no-op change instead, as discussed elsewhere. >> >> What about Tom's argument that that'd be problematic for external code? > > Kevin offered to code it in a way that maintains ABI and API > compatibility with some trickery. I pointed out that it would be possible to do so, but specifically said I wasn't arguing for that. We would need to create a new name for what BufferGetPage() does on master, and have that call the old BufferGetPage() on back-branches. That seems pretty ugly. I tend to think that the original approach, while it puts the burden on coders to recognize when TestForOldSnapshot() must be called, is no more onerous than many existing issues coders must worry about -- like whether to add something to outfuncs.c, as an example. I have been skeptical of the nanny approach all along, and after seeing the impact of having it in the tree for a few days, I really am inclined to pull back and put this on the same footing as the other things hackers need to learn and tend to as they code. > Robert Haas wrote: > >> That wouldn't have fixed my problem, which involved rebasing a patch. > > True. I note that it's possible to munge a patch mechanically to sort > out this situation. I admit it is possible. I'm becoming more convinced with each post that it's the wrong approach. I feel like I have been in the modern version of an Æsop fable here: http://www.bartleby.com/17/1/62.html -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner wrote: > On Tue, Apr 19, 2016 at 11:02 AM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: > > Robert Haas wrote: > > > >> That wouldn't have fixed my problem, which involved rebasing a patch. > > > > True. I note that it's possible to munge a patch mechanically to sort > > out this situation. > > I admit it is possible. I'm becoming more convinced with each post > that it's the wrong approach. I feel like I have been in the > modern version of an Æsop fable here: > > http://www.bartleby.com/17/1/62.html LOL. Well, it seems I'm outvoted. I leave you to do with your donkey as you please. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Apr 19, 2016 at 12:49 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Well, it seems I'm outvoted. The no-op change to attempt to force an explicit choice of whether to test for snapshot age after calling BufferGetPage() has been reverted. This eliminates about 500 back-patching pain points in 65 files. In case anyone notices some code left at the bottom of bufmgr.h related to inline functions, that was left on purpose, because I am pretty sure that the fix for the performance regression observed when the "snapshot too old" feature is disabled will involve making at least part of TestForOldSnapshot() an inline function -- so it seemed dumb to rip that out now only to put it back again right away. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Apr 20, 2016 at 8:50 AM, Kevin Grittner <kgrittn@gmail.com> wrote: > In case anyone notices some code left at the bottom of bufmgr.h > related to inline functions, that was left on purpose, because I am > pretty sure that the fix for the performance regression observed > when the "snapshot too old" feature is disabled will involve making > at least part of TestForOldSnapshot() an inline function -- so it > seemed dumb to rip that out now only to put it back again right > away. I pushed something along those lines. I didn't want to inline the whole function because IsCatalogRelation() and RelationIsAccessibleInLogicalDecoding() seemed kinda big to inline and would require rel.h to be included; bringing them into bufmgr.h would have spread that around too far. Putting the quick tests in an inline function which calls a non-inlined _impl function seemed like the best compromise. My connectivity problems to our big NUMA machines have not yet been resolved, so I didn't have a better test case for this than 200 read-only clients at saturation on my single-socket i7, which showed only a 2.2% to 2.3% regression -- so I encourage anyone who was able to create something more significant with old_snapshot_threshold = -1 to try with the latest and report the impact for your environment. I'm not sure whether any more is needed here. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
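[The shape of that compromise, as a minimal sketch — the exact early-exit conditions in the committed bufmgr.h differ, so treat this as approximate:

/* In bufmgr.h: cheap, inlinable early-exit tests up front; the work
 * needing rel.h (catalog and logical-decoding checks) stays behind
 * the non-inlined _impl function, so bufmgr.h need not include it. */
extern int	old_snapshot_threshold;		/* GUC; -1 disables the feature */

extern void TestForOldSnapshot_impl(Snapshot snapshot, Relation relation);

static inline void
TestForOldSnapshot(Snapshot snapshot, Relation relation, Page page)
{
	Assert(relation != NULL);

	/*
	 * With old_snapshot_threshold = -1 this collapses to a couple of
	 * cheap comparisons, which is what shrinks the regression measured
	 * with the feature disabled.
	 */
	if (old_snapshot_threshold >= 0 &&
		snapshot != NULL &&
		PageGetLSN(page) > snapshot->lsn)
		TestForOldSnapshot_impl(snapshot, relation);
}]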
On Tue, Apr 19, 2016 at 07:38:04AM -0400, Robert Haas wrote: > 2. Without this feature, you can kill sessions or transactions to > control bloat, but this feature is properly thought of as a way to > avoid bloat *without* killing sessions or transactions. You can let > the session live, without having it generate bloat, just so long as it > doesn't try to touch any data that has been recently modified. We > have no other feature in PostgreSQL that does something like that. I kind of agreed with Tom about just aborting transactions that held snapshots for too long, and liked the idea this could be set per session, but the idea that we abort only if a backend actually touches the old data is very nice. I can see why the patch author worked hard to do that. How does/did Oracle handle this? I assume we can't find a way to set this per session, right? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Sat, Apr 23, 2016 at 8:34 AM, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Tue, Apr 19, 2016 at 07:38:04AM -0400, Robert Haas wrote:
> > 2. Without this feature, you can kill sessions or transactions to
> > control bloat, but this feature is properly thought of as a way to
> > avoid bloat *without* killing sessions or transactions. You can let
> > the session live, without having it generate bloat, just so long as it
> > doesn't try to touch any data that has been recently modified. We
> > have no other feature in PostgreSQL that does something like that.
>
> I kind of agreed with Tom about just aborting transactions that held
> snapshots for too long, and liked the idea this could be set per
> session, but the idea that we abort only if a backend actually touches
> the old data is very nice. I can see why the patch author worked hard
> to do that.
>
> How does/did Oracle handle this?
>
IIRC then Oracle gives this error when the space in undo tablespace (aka rollback segment) is low. When the rollback segment gets full, it overwrites the changed data which might be required by some old snapshot and when that old snapshot statement tries to access the data (which is already overwritten), it gets "snapshot too old" error. Assuming there is enough space in rollback segment, Oracle seems to provide a way via Alter System set undo_retention = <time_in_secs>.
Now, if the above understanding of mine is correct, then I think the current implementation done by Kevin is closer to what Oracle provides.
On Sat, Apr 23, 2016 at 12:48:08PM +0530, Amit Kapila wrote: > On Sat, Apr 23, 2016 at 8:34 AM, Bruce Momjian <bruce@momjian.us> wrote: > > > > I kind of agreed with Tom about just aborting transactions that held > > snapshots for too long, and liked the idea this could be set per > > session, but the idea that we abort only if a backend actually touches > > the old data is very nice. I can see why the patch author worked hard > > to do that. > > > > How does/did Oracle handle this? > > > > IIRC then Oracle gives this error when the space in undo tablespace (aka > rollback segment) is low. When the rollback segment gets full, it overwrites > the changed data which might be required by some old snapshot and when that old > snapshot statement tries to access the data (which is already overwritten), it > gets "snapshot too old" error. Assuming there is enough space in rollback > segment, Oracle seems to provide a way via Alter System set undo_retention = > <time_in_secs>. > > Now, if the above understanding of mine is correct, then I think the current > implementation done by Kevin is closer to what Oracle provides. But does the rollback only happen if the long-running Oracle transaction tries to _access_ specific data that was in the undo segment, or _any_ data that potentially could have been in the undo segment? If the latter, it seems Kevin's approach is better because you would actually need to access the old data to be canceled, not just any data that could have been overwritten based on the xid. Also, it seems we have similar behavior already in applying WAL on the standby --- we delay WAL replay when there is a long-running transaction. Once the time expires, we apply the WAL. Do we cancel the long-running transaction at that time, or wait for the long-running transaction to touch some WAL we just applied? If the former, does Kevin's new code allow us to do the latter? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Sat, Apr 23, 2016 at 7:50 PM, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Sat, Apr 23, 2016 at 12:48:08PM +0530, Amit Kapila wrote:
> > On Sat, Apr 23, 2016 at 8:34 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > >
> > > I kind of agreed with Tom about just aborting transactions that held
> > > snapshots for too long, and liked the idea this could be set per
> > > session, but the idea that we abort only if a backend actually touches
> > > the old data is very nice. I can see why the patch author worked hard
> > > to do that.
> > >
> > > How does/did Oracle handle this?
> > >
> >
> > IIRC then Oracle gives this error when the space in undo tablespace (aka
> > rollback segment) is low. When the rollback segment gets full, it overwrites
> > the changed data which might be required by some old snapshot and when that old
> > snapshot statement tries to access the data (which is already overwritten), it
> > gets "snapshot too old" error. Assuming there is enough space in rollback
> > segment, Oracle seems to provide a way via Alter System set undo_retention =
> > <time_in_secs>.
> >
> > Now, if the above understanding of mine is correct, then I think the current
> > implementation done by Kevin is closer to what Oracle provides.
>
> But does the rollback only happen if the long-running Oracle transaction
> tries to _access_ specific data that was in the undo segment, or _any_
> data that potentially could have been in the undo segment?
>
It does so when the long-running transaction tries to access the specific data. If you want to know more detail, you can read slides 7-29 of the attached presentation (with a focus on slides 28 and 29).
> If the
> latter, it seems Kevin's approach is better because you would actually
> need to access the old data to be canceled, not just
> any data that could have been overwritten based on the xid.
>
> Also, it seems we have similar behavior already in applying WAL on the
> standby --- we delay WAL replay when there is a long-running
> transaction. Once the time expires, we apply the WAL. Do we cancel the
> long-running transaction at that time, or wait for the long-running
> transaction to touch some WAL we just applied?
As per my understanding, the error is given when any transaction tries to access the data.
Attachment
On Sat, Apr 23, 2016 at 5:20 PM, Bruce Momjian <bruce@momjian.us> wrote:
On Sat, Apr 23, 2016 at 12:48:08PM +0530, Amit Kapila wrote:
> On Sat, Apr 23, 2016 at 8:34 AM, Bruce Momjian <bruce@momjian.us> wrote:
> >
> > I kind of agreed with Tom about just aborting transactions that held
> > snapshots for too long, and liked the idea this could be set per
> > session, but the idea that we abort only if a backend actually touches
> > the old data is very nice. I can see why the patch author worked hard
> > to do that.
> >
> > How does/did Oracle handle this?
> >
>
> IIRC then Oracle gives this error when the space in undo tablespace (aka
> rollback segment) is low. When the rollback segment gets full, it overwrites
> the changed data which might be required by some old snapshot and when that old
> snapshot statement tries to access the data (which is already overwritten), it
> gets "snapshot too old" error. Assuming there is enough space in rollback
> segment, Oracle seems to provide a way via Alter System set undo_retention =
> <time_in_secs>.
>
> Now, if the above understanding of mine is correct, then I think the current
> implementation done by Kevin is closer to what Oracle provides.
But does the rollback only happen if the long-running Oracle transaction
tries to _access_ specific data that was in the undo segment, or _any_
data that potentially could have been in the undo segment? If the
latter, it seems Kevin's approach is better because you would actually
need to access the old data to be canceled, not just
any data that could have been overwritten based on the xid.
I'm not sure that we should rely that much on Oracle behavior. It has a very different MVCC model.
Thus we can't apply the same features one-by-one: they would have different pros and cons for us.
Also, it seems we have similar behavior already in applying WAL on the
standby --- we delay WAL replay when there is a long-running
transaction. Once the time expires, we apply the WAL. Do we cancel the
long-running transaction at that time, or wait for the long-running
transaction to touch some WAL we just applied? If the former, does
Kevin's new code allow us to do the latter?
That makes sense to me. If we could improve read-only queries on slaves this way, Kevin's new code becomes much more justified.
The Russian Postgres Company
On Sat, Apr 23, 2016 at 10:20:19AM -0400, Bruce Momjian wrote: > On Sat, Apr 23, 2016 at 12:48:08PM +0530, Amit Kapila wrote: > > On Sat, Apr 23, 2016 at 8:34 AM, Bruce Momjian <bruce@momjian.us> wrote: > > > > > > I kind of agreed with Tom about just aborting transactions that held > > > snapshots for too long, and liked the idea this could be set per > > > session, but the idea that we abort only if a backend actually touches > > > the old data is very nice. I can see why the patch author worked hard > > > to do that. As I understand it, a transaction trying to access a shared buffer aborts if there was a cleanup on the page that removed rows it might be interested in. How does this handle cases where vacuum removes _pages_ from the table? Does vacuum avoid this when there are running transactions? > Also, it seems we have similar behavior already in applying WAL on the > standby --- we delay WAL replay when there is a long-running > transaction. Once the time expires, we apply the WAL. Do we cancel the > long-running transaction at that time, or wait for the long-running > transaction to touch some WAL we just applied? If the former, does > Kevin's new code allow us to do the latter? Is this a TODO item? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Sun, May 1, 2016 at 11:54 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Sat, Apr 23, 2016 at 10:20:19AM -0400, Bruce Momjian wrote: >> On Sat, Apr 23, 2016 at 12:48:08PM +0530, Amit Kapila wrote: >>> On Sat, Apr 23, 2016 at 8:34 AM, Bruce Momjian <bruce@momjian.us> wrote: >>>> >>>> I kind of agreed with Tom about just aborting transactions that held >>>> snapshots for too long, and liked the idea this could be set per >>>> session, but the idea that we abort only if a backend actually touches >>>> the old data is very nice. I can see why the patch author worked hard >>>> to do that. > > As I understand it, a transaction trying to access a shared buffer > aborts if there was a cleanup on the page that removed rows it might be > interested in. How does this handle cases where vacuum removes _pages_ > from the table? (1) When the "snapshot too old" feature is enabled (old_snapshot_threshold >= 0) relations are not truncated, so pages cannot be removed that way. This mainly protects against a seq scan having the problem you describe. (2) Other than a seq scan, you could miss a page when descending through an index or following sibling pointers within an index. In either case you can't remove a page without modifying the page pointing to it to no longer do so, so the modified LSN on the parent or sibling will trigger the error. Note that a question has recently been raised regarding hash indexes (which should perhaps be generalized to any non-WAL-logged index on a permanent table). Since that is a correctness issue, versus a performance issue, I will add it to the release blockers list and prioritize it ahead of the issues that only affect how many people will find the feature useful. > Does vacuum avoid this when there are running transactions? I'm not sure I understand the question. >> Also, it seems we have similar behavior already in applying WAL on the >> standby --- we delay WAL replay when there is a long-running >> transaction. Once the time expires, we apply the WAL. Do we cancel the >> long-running transaction at that time, or wait for the long-running >> transaction to touch some WAL we just applied? If the former, does >> Kevin's new code allow us to do the latter? > > Is this a TODO item? I'm not aware of any TODO items existing or needed here. The feature operates by adjusting the xmin used by vacuum and pruning, and leaving all the other mechanisms functioning as they were. That looked to me like it should interact with replication streams correctly. If someone sees something that needs adjustment, please speak up Real Soon Now. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
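[For point (1), a minimal sketch of the kind of truncation guard involved, in vacuumlazy.c's decision about whether to truncate the heap — approximate, not the committed code:

/* Sketch: lazy vacuum declines to truncate the relation while the
 * "snapshot too old" feature is active, so a seq scan can never run
 * off the end of a table that shrank underneath it. */
static bool
should_attempt_truncation(LVRelStats *vacrelstats)
{
	BlockNumber possibly_freeable;

	possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;
	return (possibly_freeable > 0 &&
			(possibly_freeable >= REL_TRUNCATE_MINIMUM ||
			 possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION) &&
			old_snapshot_threshold < 0);	/* never truncate when enabled */
}

Index page recycling in point (2) needs no such guard, because removing a page always dirties the page that pointed to it, advancing the LSN that TestForOldSnapshot() compares against.]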
On Mon, May 2, 2016 at 03:50:36PM -0500, Kevin Grittner wrote: > >> Also, it seems we have similar behavior already in applying WAL on the > >> standby --- we delay WAL replay when there is a long-running > >> transaction. Once the time expires, we apply the WAL. Do we cancel the > >> long-running transaction at that time, or wait for the long-running > >> transaction to touch some WAL we just applied? If the former, does > >> Kevin's new code allow us to do the latter? > > > > Is this a TODO item? > > I'm not aware of any TODO items existing or needed here. The > feature operates by adjusting the xmin used by vacuum and pruning, > and leaving all the other mechanisms functioning as they were. > That looked to me like it should interact with replication streams > correctly. If someone sees something that needs adjustment, please > speak up Real Soon Now. My question is whether this method could also be used to avoid read-only query cancellation when we force replay of a conflicting WAL record. Could we wait for the read-only query to try to _access_ some old data before cancelling it? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +