Thread: fixing old_snapshot_threshold's time->xid mapping
Hi,

I'm starting a new thread for this, because the recent discussion of problems with old_snapshot_threshold[1] touched on a lot of separate issues, and I think it will be too confusing if we discuss all of them on one thread. Attached are three patches.

0001 makes oldSnapshotControl "extern" rather than "static" and exposes the struct definition via a header.

0002 adds a contrib module called old_snapshot which makes it possible to examine the time->XID mapping via SQL. As Andres said, the comments are not really adequate in the existing code, and the code itself is buggy, so it was a little hard to be sure that I was understanding the intended meaning of the different fields correctly. However, I gave it a shot.

0003 attempts to fix bugs in MaintainOldSnapshotTimeMapping() so that it produces a sensible mapping. I encountered and tried to fix two issues here:

First, as previously discussed, the branch that advances the mapping should not categorically do "oldSnapshotControl->head_timestamp = ts;", because the head_timestamp is supposed to be the timestamp for the oldest bucket rather than the newest one. Rather, there are three cases: (1) resetting the mapping resets head_timestamp, (2) extending the mapping by an entry without dropping an entry leaves head_timestamp alone, and (3) overwriting the previous head with a new entry advances head_timestamp by 1 minute.

Second, the calculation of the number of entries by which the mapping should advance is incorrect. It thinks that it should advance by the number of minutes between the current head_timestamp and the incoming timestamp. That would be correct if head_timestamp were the most recent entry in the mapping, but it's actually the oldest. As a result, without this fix, every time we move into a new minute, we advance the mapping much further than we actually should. Instead of advancing by 1, we advance by the number of entries that already exist in the mapping - which means we now have entries that correspond to times which are in the future, and don't advance the mapping again until those future timestamps are in the past.

With these fixes, I seem to get reasonably sensible mappings, at least in light testing. I tried running this in one window with \watch 10:

select *, age(newest_xmin), clock_timestamp() from pg_old_snapshot_time_mapping();

And in another window I ran:

pgbench -T 300 -R 10

And the age does in fact advance by ~600 transactions per minute.

I'm not proposing to commit anything here right now. These patches haven't had enough testing for that, and their interaction with other bugs in the feature needs to be considered before we do anything. However, I thought it might be useful to put them out for review and comment, and I also thought that having the contrib module from 0002 might permit other people to do some better testing of this feature and these fixes.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] http://postgr.es/m/20200401064008.qob7bfnnbu4w5cw4@alap3.anarazel.de
Attachment
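[Editor's note: the three cases and the corrected advance calculation described above can be illustrated with a small ring-buffer model. This is a toy Python sketch with invented names -- timestamps are reduced to integer minutes -- and is not PostgreSQL code.]

```python
MAP_ENTRIES = 10  # stand-in for OLD_SNAPSHOT_TIME_MAP_ENTRIES


class TimeMap:
    """Toy model of the time->xid map; head_timestamp is the OLDEST bucket."""

    def __init__(self):
        self.head_timestamp = None      # timestamp (minute) of the oldest bucket
        self.head_offset = 0            # array index of the oldest bucket
        self.count_used = 0
        self.xid_by_minute = [0] * MAP_ENTRIES

    def maintain(self, ts, xmin):
        """ts is the current time truncated to the minute."""
        if self.count_used == 0 or ts < self.head_timestamp:
            # Case 1: (re)initialize the mapping; head_timestamp resets.
            self.head_timestamp = ts
            self.head_offset = 0
            self.count_used = 1
            self.xid_by_minute[0] = xmin
            return
        newest = self.head_timestamp + self.count_used - 1
        if ts <= newest:
            # ts already has a bucket: just update it.
            idx = (self.head_offset + (ts - self.head_timestamp)) % MAP_ENTRIES
            self.xid_by_minute[idx] = xmin
            return
        # Advance by the number of NEW minutes beyond the newest bucket,
        # not by (ts - head_timestamp), which was the bug.
        for _ in range(ts - newest):
            if self.count_used < MAP_ENTRIES:
                # Case 2: extend without dropping; head_timestamp unchanged.
                self.count_used += 1
            else:
                # Case 3: drop the oldest bucket; head advances 1 minute.
                self.head_offset = (self.head_offset + 1) % MAP_ENTRIES
                self.head_timestamp += 1
            idx = (self.head_offset + self.count_used - 1) % MAP_ENTRIES
            self.xid_by_minute[idx] = xmin


m = TimeMap()
m.maintain(0, 100)   # case 1: initialize
m.maintain(5, 200)   # case 2: extend; head_timestamp stays at 0
print(m.count_used, m.head_timestamp)   # 6 0
m.maintain(20, 300)  # fills the map, then case 3 advances the head
print(m.count_used, m.head_timestamp)   # 10 11
```

In this model, extending while the map is not full leaves head_timestamp alone, and once the map is full each additional minute advances head_timestamp by exactly one -- so the newest bucket (head_timestamp + count_used - 1) always tracks the current minute.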
Hi,

On 2020-04-16 12:41:55 -0400, Robert Haas wrote:
> I'm starting a new thread for this, because the recent discussion of
> problems with old_snapshot_threshold[1] touched on a lot of separate
> issues, and I think it will be too confusing if we discuss all of them
> on one thread. Attached are three patches.

Cool.

> 0003 attempts to fix bugs in MaintainOldSnapshotTimeMapping() so that
> it produces a sensible mapping. I encountered and tried to fix two
> issues here:
> [...]
> With these fixes, I seem to get reasonably sensible mappings, at least
> in light testing. I tried running this in one window with \watch 10:
>
> select *, age(newest_xmin), clock_timestamp() from
> pg_old_snapshot_time_mapping();
>
> And in another window I ran:
>
> pgbench -T 300 -R 10
>
> And the age does in fact advance by ~600 transactions per minute.

I still think we need a way to test this without waiting for hours to hit various edge cases. You argued against a fixed binning of old_snapshot_threshold/100, arguing it's too coarse. How about 1000 or so? For 60 days, the current max for old_snapshot_threshold, that'd be a granularity of 01:26:24, which seems fine. The best way I can think of that'd keep current GUC values sensible is to change old_snapshot_threshold to be a float. Ugly, but ...?

Greetings,

Andres Freund
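[Editor's note: the granularity figure above checks out; as a quick illustrative calculation (not project code), dividing the 60-day maximum into 1000 bins gives bins of 1 hour, 26 minutes, 24 seconds.]

```python
from datetime import timedelta

bins = 1000
max_threshold = timedelta(days=60)  # current max for old_snapshot_threshold

granularity = max_threshold / bins
print(granularity)  # 1:26:24
```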
On Thu, Apr 16, 2020 at 1:14 PM Andres Freund <andres@anarazel.de> wrote:
> I still think we need a way to test this without waiting for hours to
> hit various edge cases. You argued against a fixed binning of
> old_snapshot_threshold/100 arguing its too coarse. How about a 1000 or
> so? For 60 days, the current max for old_snapshot_threshold, that'd be a
> granularity of 01:26:24, which seems fine. The best way I can think of
> that'd keep current GUC values sensible is to change
> old_snapshot_threshold to be float. Ugly, but ...?

Yeah, 1000 would be a lot better. However, if we switch to a fixed number of bins, it's going to be a lot more code churn.

What did you think of my suggestion of making head_timestamp artificially move backward to simulate the passage of time?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2020-04-16 13:34:39 -0400, Robert Haas wrote:
> On Thu, Apr 16, 2020 at 1:14 PM Andres Freund <andres@anarazel.de> wrote:
> > I still think we need a way to test this without waiting for hours to
> > hit various edge cases. [...]
>
> Yeah, 1000 would be a lot better. However, if we switch to a fixed
> number of bins, it's going to be a lot more code churn.

Given the number of things that need to be addressed around the feature, I am not too concerned about that.

> What did you think of my suggestion of making head_timestamp
> artificially move backward to simulate the passage of time?

I don't think it allows to exercise the various cases well enough. We need to be able to test this feature both interactively as well as in a scripted manner. Edge cases like wrapping around in the time mapping imo can not easily be tested by moving the head timestamp back.

Greetings,

Andres Freund
On Fri, Apr 17, 2020 at 5:46 AM Andres Freund <andres@anarazel.de> wrote:
> On 2020-04-16 13:34:39 -0400, Robert Haas wrote:
> > What did you think of my suggestion of making head_timestamp
> > artificially move backward to simulate the passage of time?
>
> I don't think it allows to exercise the various cases well enough. We
> need to be able to test this feature both interactively as well as in a
> scripted manner. Edge cases like wrapping around in the time mapping imo
> can not easily be tested by moving the head timestamp back.

What about a contrib function that lets you clobber oldSnapshotControl->current_timestamp? It looks like all times in this system come ultimately from GetSnapshotCurrentTimestamp(), which uses that variable to make sure that time never goes backwards. Perhaps you could abuse that, like so, from test scripts:

postgres=# select * from pg_old_snapshot_time_mapping();
 array_offset |     end_timestamp      | newest_xmin
--------------+------------------------+-------------
            0 | 3000-01-01 13:00:00+13 |         490
(1 row)

postgres=# select pg_clobber_current_snapshot_timestamp('3000-01-01 00:01:00Z');
 pg_clobber_current_snapshot_timestamp
---------------------------------------

(1 row)

postgres=# select * from pg_old_snapshot_time_mapping();
 array_offset |     end_timestamp      | newest_xmin
--------------+------------------------+-------------
            0 | 3000-01-01 13:01:00+13 |         490
            1 | 3000-01-01 13:02:00+13 |         490
(2 rows)
Attachment
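[Editor's note: the clobbering technique above can be reduced to a tiny model. This is illustrative Python with invented names; GetSnapshotCurrentTimestamp() and the proposed pg_clobber_current_snapshot_timestamp() are the real counterparts being mimicked, and real timestamps are reduced to integers.]

```python
class SnapshotClock:
    """Toy model of a monotonic snapshot timestamp source."""

    def __init__(self):
        self.current_timestamp = 0  # last timestamp handed out

    def now(self, wall_clock):
        # Like GetSnapshotCurrentTimestamp(): never let time go backwards,
        # so a clobbered-forward value sticks even though the wall clock lags.
        if wall_clock > self.current_timestamp:
            self.current_timestamp = wall_clock
        return self.current_timestamp

    def clobber(self, ts):
        # Like the proposed clobber function: jump the saved timestamp
        # to an arbitrary point so tests can simulate the passage of time.
        self.current_timestamp = ts


clock = SnapshotClock()
clock.now(5)         # -> 5
clock.now(3)         # -> 5 (time never goes backwards)
clock.clobber(100)   # test script jumps far into the future
print(clock.now(6))  # -> 100: later snapshots all see the clobbered time
```

Because the source is already forced to be monotonic, clobbering it forward means every subsequent snapshot timestamp stays at or beyond the clobbered value, which is exactly what a scripted test needs.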
On Fri, Apr 17, 2020 at 2:12 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> What about a contrib function that lets you clobber
> oldSnapshotControl->current_timestamp? It looks like all times in
> this system come ultimately from GetSnapshotCurrentTimestamp(), which
> uses that variable to make sure that time never goes backwards.

Here's a draft TAP test that uses that technique successfully, as a POC. It should probably be extended to cover more cases, but I thought I'd check what people thought of the concept first before going further. I didn't see a way to do overlapping transactions with PostgresNode.pm, so I invented one (please excuse the bad perl); am I missing something? Maybe it'd be better to do 002 with an isolation test instead, but I suppose 001 can't be in an isolation test, since it needs to connect to multiple databases, and it seemed better to do them both the same way. It's also not entirely clear to me that isolation tests can expect a database to be fresh and then mess with dangerous internal state, whereas TAP tests set up and tear down a cluster each time.

I think I found another bug in MaintainOldSnapshotTimeMapping(): if you make time jump by more than old_snapshot_threshold in one go, then the map gets cleared and then no early pruning or snapshot-too-old errors happen. That's why in 002_too_old.pl it currently advances time by 10 minutes twice, instead of 20 minutes once. To be continued.
Attachment
- v2-0001-Expose-oldSnapshotControl.patch
- v2-0002-contrib-old_snapshot-time-xid-mapping.patch
- v2-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patch
- v2-0004-Add-pg_clobber_current_snapshot_timestamp.patch
- v2-0005-Truncate-old-snapshot-XIDs-before-truncating-CLOG.patch
- v2-0006-Add-TAP-test-for-snapshot-too-old.patch
On Sat, Apr 18, 2020 at 11:47 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> I think I found another bug in MaintainOldSnapshotTimeMapping(): if
> you make time jump by more than old_snapshot_threshold in one go, then
> the map gets cleared and then no early pruning or snapshot-too-old
> errors happen. That's why in 002_too_old.pl it currently advances
> time by 10 minutes twice, instead of 20 minutes once. To be
> continued.

IMHO that doesn't seem to be a problem, because even if we jump more than old_snapshot_threshold in one go, we don't clear the complete map, right? The latest snapshot timestamp will become the head timestamp. So in TransactionIdLimitedForOldSnapshots, if (current_ts - old_snapshot_threshold) is still >= head_timestamp, then we can still do early pruning.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Hi,

On 2020-04-17 14:12:44 +1200, Thomas Munro wrote:
> What about a contrib function that lets you clobber
> oldSnapshotControl->current_timestamp? It looks like all times in
> this system come ultimately from GetSnapshotCurrentTimestamp(), which
> uses that variable to make sure that time never goes backwards.

It'd be better than the current test situation, and probably would be good to have as part of testing anyway (since it'd allow the tests to not take long / be racy on slow machines). But I still don't think it really allows testing the feature in a natural way. It makes it easier to test for known edge cases / problems, but not really to discover unknown ones. For that I think we need more granular bins.

- Andres
On Thu, Apr 16, 2020 at 10:12 PM Robert Haas <robertmhaas@gmail.com> wrote:
> 0003 attempts to fix bugs in MaintainOldSnapshotTimeMapping() so that
> it produces a sensible mapping. I encountered and tried to fix two
> issues here:
> [...]
> With these fixes, I seem to get reasonably sensible mappings, at least
> in light testing. I tried running this in one window with \watch 10:
>
> select *, age(newest_xmin), clock_timestamp() from
> pg_old_snapshot_time_mapping();
>
> And in another window I ran:
>
> pgbench -T 300 -R 10
>
> And the age does in fact advance by ~600 transactions per minute.

I have started reviewing these patches. I think the fixes look right to me.

+	LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
+	mapping->head_offset = oldSnapshotControl->head_offset;
+	mapping->head_timestamp = oldSnapshotControl->head_timestamp;
+	mapping->count_used = oldSnapshotControl->count_used;
+	for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
+		mapping->xid_by_minute[i] = oldSnapshotControl->xid_by_minute[i];
+	LWLockRelease(OldSnapshotTimeMapLock);

I think memcpy would be a better choice instead of looping over all the entries, since we are doing this under a lock?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Sat, Apr 18, 2020 at 9:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Sat, Apr 18, 2020 at 11:47 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > I think I found another bug in MaintainOldSnapshotTimeMapping(): if
> > you make time jump by more than old_snapshot_threshold in one go, then
> > the map gets cleared and then no early pruning or snapshot-too-old
> > errors happen. That's why in 002_too_old.pl it currently advances
> > time by 10 minutes twice, instead of 20 minutes once. To be
> > continued.
>
> IMHO that doesn't seems to be a problem. Because even if we jump more
> than old_snapshot_threshold in one go we don't clean complete map
> right. The latest snapshot timestamp will become the headtimestamp.
> So in TransactionIdLimitedForOldSnapshots if (current_ts -
> old_snapshot_threshold) is still >= head_timestap then we can still do
> early pruning.

Right, thanks. I got confused about that, and misdiagnosed something I was seeing.

Here's a new version:

0004: Instead of writing a new kind of TAP test to demonstrate snapshot-too-old errors, I adjusted the existing isolation tests to use the same absolute time control technique. Previously I had invented a way to do isolation tester-like stuff in TAP tests, which might be interesting but strange new perl is not necessary for this.

0005: Truncates the time map when the CLOG is truncated. Its test is now under src/test/module/snapshot_too_old/t/001_truncate.sql.

These apply on top of Robert's patches, but the only dependency is on his patch 0001 "Expose oldSnapshotControl.", because now I have stuff in src/test/module/snapshot_too_old/test_sto.c that wants to mess with that object too.

Is this an improvement? I realise that there is still nothing to actually verify that early pruning has actually happened. I haven't thought of a good way to do that yet (stats, page inspection, ...).
Attachment
On Mon, Apr 20, 2020 at 11:24 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Is this an improvement? I realise that there is still nothing to
> actually verify that early pruning has actually happened. I haven't
> thought of a good way to do that yet (stats, page inspection, ...).

Could we test the early pruning using the xid-burn patch? Basically, in xid_by_minute we have some xids with the current epoch. Now, if we burn more than 2 billion XIDs and then try to vacuum, we might hit the early pruning case, no? Did you want to test this case, or did you have some other case in mind that you wanted to test?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Apr 20, 2020 at 6:35 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> Could we test the early pruning using xid-burn patch? Basically, in
> xid_by_minute we have some xids with the current epoch. Now, we burns
> more than 2b xid and then if we try to vacuum we might hit the case of
> early pruning no. Do you wnated to this case or you had some other
> case in mind which you wnated to test?

I mean I want to verify that VACUUM or heap prune actually removed a tuple that was visible to an old snapshot. An idea I just had: maybe sto_using_select.spec should check the visibility map (somehow). For example, the sto_using_select.spec (the version in the patch I just posted) just checks that after time 00:11, the old snapshot gets a snapshot-too-old error. Perhaps we could add a VACUUM before that, and then check that the page has become all visible, meaning that the dead tuple our snapshot could see has now been removed.
On Mon, Apr 20, 2020 at 12:29 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> I mean I want to verify that VACUUM or heap prune actually removed a
> tuple that was visible to an old snapshot. An idea I just had: maybe
> sto_using_select.spec should check the visibility map (somehow). For
> example, the sto_using_select.spec (the version in the patch I just
> posted) just checks that after time 00:11, the old snapshot gets a
> snapshot-too-old error. Perhaps we could add a VACUUM before that,
> and then check that the page has become all visible, meaning that the
> dead tuple our snapshot could see has now been removed.

Okay, got your point. Can we try to implement some test functions that just call the visibilitymap_get_status function internally? I agree that we will have to pass the correct block number, but we can find that using the TID. Or, for testing, we can create a very small relation that just has 1 block?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Apr 20, 2020 at 12:10 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I have started reviewing these patches. I think, the fixes looks right to me.
>
> +	LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
> +	mapping->head_offset = oldSnapshotControl->head_offset;
> +	mapping->head_timestamp = oldSnapshotControl->head_timestamp;
> +	mapping->count_used = oldSnapshotControl->count_used;
> +	for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
> +		mapping->xid_by_minute[i] = oldSnapshotControl->xid_by_minute[i];
> +	LWLockRelease(OldSnapshotTimeMapLock);
>
> I think memcpy would be a better choice instead of looping it for all
> the entries, since we are doing this under a lock?

When I did it that way, it complained about "const" and I couldn't immediately figure out how to fix it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Apr 20, 2020 at 8:02 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> Okay, got your point. Can we try to implement some test functions
> that can just call visibilitymap_get_status function internally? I
> agree that we will have to pass the correct block number but that we
> can find using TID. Or for testing, we can create a very small
> relation that just has 1 block?

I think it's enough to check SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass). I realised that src/test/module/snapshot_too_old is allowed to install contrib/pg_visibility with EXTRA_INSTALL, so here's a new version to try that idea. It extends sto_using_select.spec to VACUUM and check the vis map at key times. That allows us to check that early pruning really happens once the snapshot becomes too old. There are other ways you could check that but this seems quite "light touch" compared to something based on page inspection.

I also changed src/test/module/snapshot_too_old/t/001_truncate.pl back to using Robert's contrib/old_snapshot extension to know the size of the time/xid map, allowing an introspection function to be dropped from test_sto.c.

As before, these two apply on top of Robert's patches (or at least his 0001 and 0002).
Attachment
On Mon, Apr 20, 2020 at 11:31 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > I think memcpy would be a better choice instead of looping it for all
> > the entries, since we are doing this under a lock?
>
> When I did it that way, it complained about "const" and I couldn't
> immediately figure out how to fix it.

I think we can typecast to (const void *). After the below change, I did not get the warning.

diff --git a/contrib/old_snapshot/time_mapping.c b/contrib/old_snapshot/time_mapping.c
index 37e0055..cc53bdd 100644
--- a/contrib/old_snapshot/time_mapping.c
+++ b/contrib/old_snapshot/time_mapping.c
@@ -94,8 +94,9 @@ GetOldSnapshotTimeMapping(void)
 	mapping->head_offset = oldSnapshotControl->head_offset;
 	mapping->head_timestamp = oldSnapshotControl->head_timestamp;
 	mapping->count_used = oldSnapshotControl->count_used;
-	for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
-		mapping->xid_by_minute[i] = oldSnapshotControl->xid_by_minute[i];
+	memcpy(mapping->xid_by_minute,
+		   (const void *) oldSnapshotControl->xid_by_minute,
+		   sizeof(TransactionId) * OLD_SNAPSHOT_TIME_MAP_ENTRIES);
 	LWLockRelease(OldSnapshotTimeMapLock);

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 21, 2020 at 2:05 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> As before, these two apply on top of Robert's patches (or at least his
> 0001 and 0002).

While trying to figure out if Robert's 0003 patch was correct, I added yet another patch to this stack to test it. 0006 does basic xid map maintenance that exercises the cases 0003 fixes, and I think it demonstrates that they now work correctly. Also some minor perl improvements to 0005. I'll attach 0001-0004 again but they are unchanged.

Since confusion about head vs tail seems to have been at the root of the bugs addressed by 0003, I wonder if we should also rename head_{timestamp,offset} to oldest_{timestamp,offset}.
Attachment
- v5-0001-Expose-oldSnapshotControl.patch
- v5-0002-contrib-old_snapshot-time-xid-mapping.patch
- v5-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patch
- v5-0004-Rewrite-the-snapshot_too_old-tests.patch
- v5-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patch
- v5-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patch
On Tue, Apr 21, 2020 at 3:44 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> While trying to figure out if Robert's 0003 patch was correct, I added
> yet another patch to this stack to test it. 0006 does basic xid map
> maintenance that exercises the cases 0003 fixes, and I think it
> demonstrates that they now work correctly.

+1. I think we should also add a way to test the case where we advance the timestamp by multiple slots. I see that you have such a case, e.g.

+# test adding minutes while the map is not full
+set_time('3000-01-01 02:01:00Z');
+is(summarize_mapping(), "2|02:00:00|02:01:00");
+set_time('3000-01-01 02:05:00Z');
+is(summarize_mapping(), "6|02:00:00|02:05:00");
+set_time('3000-01-01 02:19:00Z');
+is(summarize_mapping(), "20|02:00:00|02:19:00");

But I think we should try to extend it to test that we have put the new xid only in the slots we were supposed to, and not in other slots.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 21, 2020 at 4:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Tue, Apr 21, 2020 at 3:44 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > On Tue, Apr 21, 2020 at 2:05 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > As before, these two apply on top of Robert's patches (or at least his
> > > 0001 and 0002).
> >
> > While trying to figure out if Robert's 0003 patch was correct, I added
> > yet another patch to this stack to test it. 0006 does basic xid map
> > maintenance that exercises the cases 0003 fixes, and I think it
> > demonstrates that they now work correctly.
>
> +1. I think we should also add a way to test the case where we advance
> the timestamp by multiple slots. I see that you have such a case, e.g.:
>
> +# test adding minutes while the map is not full
> +set_time('3000-01-01 02:01:00Z');
> +is(summarize_mapping(), "2|02:00:00|02:01:00");
> +set_time('3000-01-01 02:05:00Z');
> +is(summarize_mapping(), "6|02:00:00|02:05:00");
> +set_time('3000-01-01 02:19:00Z');
> +is(summarize_mapping(), "20|02:00:00|02:19:00");
>
> But I think we should try to extend it to verify that we have put the
> new xid only in the slots where we are supposed to, and not in the
> other slots.

I feel that we should probably fix this check as well. If ts > update_ts, it will go to the else branch, and there it will finally end up in the last slot anyway, so I think we can use this case as a fast exit too.

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 93a0c04..644d9b1 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1831,7 +1831,7 @@ TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,

 		if (!same_ts_as_threshold)
 		{
-			if (ts == update_ts)
+			if (ts >= update_ts)
 			{
 				xlimit = latest_xmin;
 				if (NormalTransactionIdFollows(xlimit, recentXmin))

This patch can be applied on top of the other v5 patches.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, Apr 22, 2020 at 5:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> - if (ts == update_ts)
> + if (ts >= update_ts)

Hi Dilip, I didn't follow this bit -- could you explain?

Here's a rebase. In the 0004 patch I chose to leave behind some unnecessary braces to avoid reindenting a bunch of code after removing an if branch, just for ease of review, but I'd probably remove those in a committed version.

I'm going to add this stuff to the next CF so we don't lose track of it.
Attachment
- v6-0001-Expose-oldSnapshotControl.patch
- v6-0002-contrib-old_snapshot-time-xid-mapping.patch
- v6-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patch
- v6-0004-Rewrite-the-snapshot_too_old-tests.patch
- v6-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patch
- v6-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patch
On Fri, Aug 14, 2020 at 12:52 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> Here's a rebase.

And another, since I was too slow and v6 is already in conflict... sorry for the high frequency patches.
Attachment
- v7-0001-Expose-oldSnapshotControl.patch
- v7-0002-contrib-old_snapshot-time-xid-mapping.patch
- v7-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patch
- v7-0004-Rewrite-the-snapshot_too_old-tests.patch
- v7-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patch
- v7-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patch
On Fri, Aug 14, 2020 at 1:04 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Fri, Aug 14, 2020 at 12:52 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Here's a rebase.
>
> And another, since I was too slow and v6 is already in conflict...
> sorry for the high frequency patches.

And ... now that this has a commitfest entry, cfbot told me about a small problem in a makefile. Third time lucky?
Attachment
- v8-0001-Expose-oldSnapshotControl.patch
- v8-0002-contrib-old_snapshot-time-xid-mapping.patch
- v8-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patch
- v8-0004-Rewrite-the-snapshot_too_old-tests.patch
- v8-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patch
- v8-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patch
On Sat, Aug 15, 2020 at 10:09:15AM +1200, Thomas Munro wrote:
> And ... now that this has a commitfest entry, cfbot told me about a
> small problem in a makefile. Third time lucky?

Still lucky since then, and the CF bot does not complain. So... The meat of the patch set is in 0003, which fixes an actual bug. Robert, Thomas, anything specific you are waiting for here? As this is a bug fix, perhaps it would be better to just move on with some portions of the set?

Kevin, I really think that you should chime in here. This is originally your feature.
--
Michael
Attachment
On Thu, Sep 17, 2020 at 1:47 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Sat, Aug 15, 2020 at 10:09:15AM +1200, Thomas Munro wrote:
> > And ... now that this has a commitfest entry, cfbot told me about a
> > small problem in a makefile. Third time lucky?
>
> Still lucky since then, and the CF bot does not complain. So... The
> meat of the patch is in 0003 which is fixing an actual bug. Robert,
> Thomas, anything specific you are waiting for here? As this is a bug
> fix, perhaps it would be better to just move on with some portions of
> the set?

Yeah, I plan to push forward with 0001 through 0003 soon, but 0001 needs to be revised with a PGDLLIMPORT marking, I think, and 0002 needs documentation.

Not sure whether there's going to be adequate support for back-patching given that it's adding a new contrib module for observability and not just fixing a bug, so my tentative plan is to just push into master. If there is a great clamor for back-patching then I can, but I'm not very excited about pushing the bug fix into the back-branches without the observability stuff, because then if somebody claims that it's not working properly, it'll be almost impossible to understand why.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Sep 17, 2020 at 10:40 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Yeah, I plan to push forward with 0001 through 0003 soon, but 0001
> needs to be revised with a PGDLLIMPORT marking, I think, and 0002
> needs documentation.

So here's an updated version of those three, with proposed commit messages, a PGDLLIMPORT for 0001, and docs for 0002.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
On Sat, Sep 19, 2020 at 12:19 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Sep 17, 2020 at 10:40 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > Yeah, I plan to push forward with 0001 through 0003 soon, but 0001
> > needs to be revised with a PGDLLIMPORT marking, I think, and 0002
> > needs documentation.
>
> So here's an updated version of those three, with proposed commit
> messages, a PGDLLIMPORT for 0001, and docs for 0002.

LGTM.
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

Patch looks good to me.
On Wed, Sep 23, 2020 at 9:16 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> LGTM.

Committed.

Thomas, with respect to your part of this patch set, I wonder if we can make the functions that you're using to write tests safe enough that we could add them to contrib/old_snapshot and let users run them if they want. As you have them, they are hedged around with vague and scary warnings, but is that really justified? And if so, can it be fixed? It would be nicer not to end up with two loadable modules here, and maybe the right sorts of functions could even have some practical use.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Sep 24, 2020 at 03:46:14PM -0400, Robert Haas wrote:
> Committed.

Cool, thanks.

> Thomas, with respect to your part of this patch set, I wonder if we
> can make the functions that you're using to write tests safe enough
> that we could add them to contrib/old_snapshot and let users run them
> if they want. As you have them, they are hedged around with vague and
> scary warnings, but is that really justified? And if so, can it be
> fixed? It would be nicer not to end up with two loadable modules here,
> and maybe the right sorts of functions could even have some practical
> use.

I have switched this item to waiting on author in the CF app then, as we are not completely done yet.
--
Michael
Attachment
On Fri, Sep 25, 2020 at 2:00 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Sep 24, 2020 at 03:46:14PM -0400, Robert Haas wrote:
> > Committed.
>
> Cool, thanks.

+1

> > Thomas, with respect to your part of this patch set, I wonder if we
> > can make the functions that you're using to write tests safe enough
> > that we could add them to contrib/old_snapshot and let users run them
> > if they want. As you have them, they are hedged around with vague and
> > scary warnings, but is that really justified? And if so, can it be
> > fixed? It would be nicer not to end up with two loadable modules here,
> > and maybe the right sorts of functions could even have some practical
> > use.

Yeah, you may be right. I am thinking about that. In the meantime, here is a rebase. A quick recap of these remaining patches:

0001 replaces the current "magic test mode" that didn't really test anything with a new test mode that verifies pruning and STO behaviour.
0002 fixes a separate bug that Andres reported: the STO XID map suffers from wraparound-itis.
0003 adds a simple smoke test for Robert's commit 55b7e2f4. Before that fix landed, it failed.

> I have switched this item as waiting on author in the CF app then, as
> we are not completely done yet.

Thanks. For the record, I think there is still one more complaint from Andres that remains unaddressed even once these are in the tree: there are thought to be some more places that lack TestForOldSnapshot() calls.
Attachment
On 06.10.2020 08:32, Thomas Munro wrote:
> On Fri, Sep 25, 2020 at 2:00 PM Michael Paquier <michael@paquier.xyz> wrote:
>> On Thu, Sep 24, 2020 at 03:46:14PM -0400, Robert Haas wrote:
>>
>>> Thomas, with respect to your part of this patch set, I wonder if we
>>> can make the functions that you're using to write tests safe enough
>>> that we could add them to contrib/old_snapshot and let users run them
>>> if they want. As you have them, they are hedged around with vague and
>>> scary warnings, but is that really justified? And if so, can it be
>>> fixed? It would be nicer not to end up with two loadable modules here,
>>> and maybe the right sorts of functions could even have some practical
>>> use.
>
> Yeah, you may be right. I am thinking about that. In the meantime,
> here is a rebase. A quick recap of these remaining patches:
>
> 0001 replaces the current "magic test mode" that didn't really test
> anything with a new test mode that verifies pruning and STO behaviour.
> 0002 fixes a separate bug that Andres reported: the STO XID map
> suffers from wraparound-itis.
> 0003 adds a simple smoke test for Robert's commit 55b7e2f4. Before
> that fix landed, it failed.
>
>> I have switched this item as waiting on author in the CF app then, as
>> we are not completely done yet.
>
> Thanks. For the record, I think there is still one more complaint
> from Andres that remains unaddressed even once these are in the tree:
> there are thought to be some more places that lack
> TestForOldSnapshot() calls.

Status update for a commitfest entry.

This entry is "Waiting on author" and the thread was inactive for a while. As far as I see, part of the fixes has already been committed. Is there anything left to work on, or does this patch set need review now?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company