Thread: fixing old_snapshot_threshold's time->xid mapping
Hi,

I'm starting a new thread for this, because the recent discussion of problems with old_snapshot_threshold[1] touched on a lot of separate issues, and I think it will be too confusing if we discuss all of them on one thread. Attached are three patches.

0001 makes oldSnapshotControl "extern" rather than "static" and exposes the struct definition via a header.

0002 adds a contrib module called old_snapshot which makes it possible to examine the time->XID mapping via SQL. As Andres said, the comments are not really adequate in the existing code, and the code itself is buggy, so it was a little hard to be sure that I was understanding the intended meaning of the different fields correctly. However, I gave it a shot.

0003 attempts to fix bugs in MaintainOldSnapshotTimeMapping() so that it produces a sensible mapping. I encountered and tried to fix two issues here:

First, as previously discussed, the branch that advances the mapping should not categorically do "oldSnapshotControl->head_timestamp = ts;", because the head_timestamp is supposed to be the timestamp for the oldest bucket rather than the newest one. Rather, there are three cases: (1) resetting the mapping resets head_timestamp, (2) extending the mapping by an entry without dropping an entry leaves head_timestamp alone, and (3) overwriting the previous head with a new entry advances head_timestamp by 1 minute.

Second, the calculation of the number of entries by which the mapping should advance is incorrect. It thinks that it should advance by the number of minutes between the current head_timestamp and the incoming timestamp. That would be correct if head_timestamp were the most recent entry in the mapping, but it's actually the oldest. As a result, without this fix, every time we move into a new minute, we advance the mapping much further than we actually should. Instead of advancing by 1, we advance by the number of entries that already exist in the mapping - which means we now have entries that correspond to times which are in the future, and don't advance the mapping again until those future timestamps are in the past.

With these fixes, I seem to get reasonably sensible mappings, at least in light testing. I tried running this in one window with \watch 10:

select *, age(newest_xmin), clock_timestamp() from pg_old_snapshot_time_mapping();

And in another window I ran:

pgbench -T 300 -R 10

And the age does in fact advance by ~600 transactions per minute.

I'm not proposing to commit anything here right now. These patches haven't had enough testing for that, and their interaction with other bugs in the feature needs to be considered before we do anything. However, I thought it might be useful to put them out for review and comment, and I also thought that having the contrib module from 0002 might permit other people to do some better testing of this feature and these fixes.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] http://postgr.es/m/20200401064008.qob7bfnnbu4w5cw4@alap3.anarazel.de
Attachment
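[Editor's note: the three cases and the corrected advance calculation described above can be illustrated with a small ring-buffer model. This is a toy Python sketch with invented names -- timestamps are reduced to integer minutes -- and is not PostgreSQL code.]

```python
MAP_ENTRIES = 10  # stand-in for OLD_SNAPSHOT_TIME_MAP_ENTRIES


class TimeMap:
    """Toy model of the time->xid map; head_timestamp is the OLDEST bucket."""

    def __init__(self):
        self.head_timestamp = None      # timestamp (minute) of the oldest bucket
        self.head_offset = 0            # array index of the oldest bucket
        self.count_used = 0
        self.xid_by_minute = [0] * MAP_ENTRIES

    def maintain(self, ts, xmin):
        """ts is the current time truncated to the minute."""
        if self.count_used == 0 or ts < self.head_timestamp:
            # Case 1: (re)initialize the mapping; head_timestamp resets.
            self.head_timestamp = ts
            self.head_offset = 0
            self.count_used = 1
            self.xid_by_minute[0] = xmin
            return
        newest = self.head_timestamp + self.count_used - 1
        if ts <= newest:
            # ts already has a bucket: just update it.
            idx = (self.head_offset + (ts - self.head_timestamp)) % MAP_ENTRIES
            self.xid_by_minute[idx] = xmin
            return
        # Advance by the number of NEW minutes beyond the newest bucket,
        # not by (ts - head_timestamp), which was the bug.
        for _ in range(ts - newest):
            if self.count_used < MAP_ENTRIES:
                # Case 2: extend without dropping; head_timestamp unchanged.
                self.count_used += 1
            else:
                # Case 3: drop the oldest bucket; head advances 1 minute.
                self.head_offset = (self.head_offset + 1) % MAP_ENTRIES
                self.head_timestamp += 1
            idx = (self.head_offset + self.count_used - 1) % MAP_ENTRIES
            self.xid_by_minute[idx] = xmin


m = TimeMap()
m.maintain(0, 100)   # case 1: initialize
m.maintain(5, 200)   # case 2: extend; head_timestamp stays at 0
print(m.count_used, m.head_timestamp)   # 6 0
m.maintain(20, 300)  # fills the map, then case 3 advances the head
print(m.count_used, m.head_timestamp)   # 10 11
```

In this model, extending while the map is not full leaves head_timestamp alone, and once the map is full each additional minute advances head_timestamp by exactly one -- so the newest bucket (head_timestamp + count_used - 1) always tracks the current minute.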
Hi,

On 2020-04-16 12:41:55 -0400, Robert Haas wrote:
> I'm starting a new thread for this, because the recent discussion of
> problems with old_snapshot_threshold[1] touched on a lot of separate
> issues, and I think it will be too confusing if we discuss all of them
> on one thread. Attached are three patches.

Cool.

> 0003 attempts to fix bugs in MaintainOldSnapshotTimeMapping() so that
> it produces a sensible mapping. I encountered and tried to fix two
> issues here:
> [...]
> With these fixes, I seem to get reasonably sensible mappings, at least
> in light testing. I tried running this in one window with \watch 10:
>
> select *, age(newest_xmin), clock_timestamp() from
> pg_old_snapshot_time_mapping();
>
> And in another window I ran:
>
> pgbench -T 300 -R 10
>
> And the age does in fact advance by ~600 transactions per minute.

I still think we need a way to test this without waiting for hours to hit various edge cases. You argued against a fixed binning of old_snapshot_threshold/100, arguing it's too coarse. How about 1000 or so? For 60 days, the current max for old_snapshot_threshold, that'd be a granularity of 01:26:24, which seems fine. The best way I can think of that'd keep current GUC values sensible is to change old_snapshot_threshold to be a float. Ugly, but ...?

Greetings,

Andres Freund
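[Editor's note: the granularity figure above checks out; as a quick illustrative calculation (not project code), dividing the 60-day maximum into 1000 bins gives bins of 1 hour, 26 minutes, 24 seconds.]

```python
from datetime import timedelta

bins = 1000
max_threshold = timedelta(days=60)  # current max for old_snapshot_threshold

granularity = max_threshold / bins
print(granularity)  # 1:26:24
```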
On Thu, Apr 16, 2020 at 1:14 PM Andres Freund <andres@anarazel.de> wrote:
> I still think we need a way to test this without waiting for hours to
> hit various edge cases. You argued against a fixed binning of
> old_snapshot_threshold/100 arguing its too coarse. How about a 1000 or
> so? For 60 days, the current max for old_snapshot_threshold, that'd be a
> granularity of 01:26:24, which seems fine. The best way I can think of
> that'd keep current GUC values sensible is to change
> old_snapshot_threshold to be float. Ugly, but ...?

Yeah, 1000 would be a lot better. However, if we switch to a fixed number of bins, it's going to be a lot more code churn.

What did you think of my suggestion of making head_timestamp artificially move backward to simulate the passage of time?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2020-04-16 13:34:39 -0400, Robert Haas wrote:
> On Thu, Apr 16, 2020 at 1:14 PM Andres Freund <andres@anarazel.de> wrote:
> > I still think we need a way to test this without waiting for hours to
> > hit various edge cases. [...]
>
> Yeah, 1000 would be a lot better. However, if we switch to a fixed
> number of bins, it's going to be a lot more code churn.

Given the number of things that need to be addressed around the feature, I am not too concerned about that.

> What did you think of my suggestion of making head_timestamp
> artificially move backward to simulate the passage of time?

I don't think it allows to exercise the various cases well enough. We need to be able to test this feature both interactively as well as in a scripted manner. Edge cases like wrapping around in the time mapping imo can not easily be tested by moving the head timestamp back.

Greetings,

Andres Freund
On Fri, Apr 17, 2020 at 5:46 AM Andres Freund <andres@anarazel.de> wrote:
> On 2020-04-16 13:34:39 -0400, Robert Haas wrote:
> > What did you think of my suggestion of making head_timestamp
> > artificially move backward to simulate the passage of time?
>
> I don't think it allows to exercise the various cases well enough. We
> need to be able to test this feature both interactively as well as in a
> scripted manner. Edge cases like wrapping around in the time mapping imo
> can not easily be tested by moving the head timestamp back.

What about a contrib function that lets you clobber oldSnapshotControl->current_timestamp? It looks like all times in this system come ultimately from GetSnapshotCurrentTimestamp(), which uses that variable to make sure that time never goes backwards. Perhaps you could abuse that, like so, from test scripts:

postgres=# select * from pg_old_snapshot_time_mapping();
 array_offset |     end_timestamp      | newest_xmin
--------------+------------------------+-------------
            0 | 3000-01-01 13:00:00+13 |         490
(1 row)

postgres=# select pg_clobber_current_snapshot_timestamp('3000-01-01 00:01:00Z');
 pg_clobber_current_snapshot_timestamp
---------------------------------------

(1 row)

postgres=# select * from pg_old_snapshot_time_mapping();
 array_offset |     end_timestamp      | newest_xmin
--------------+------------------------+-------------
            0 | 3000-01-01 13:01:00+13 |         490
            1 | 3000-01-01 13:02:00+13 |         490
(2 rows)
Attachment
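[Editor's note: the clobbering technique above can be reduced to a tiny model. This is illustrative Python with invented names; GetSnapshotCurrentTimestamp() and the proposed pg_clobber_current_snapshot_timestamp() are the real counterparts being mimicked, and real timestamps are reduced to integers.]

```python
class SnapshotClock:
    """Toy model of a monotonic snapshot timestamp source."""

    def __init__(self):
        self.current_timestamp = 0  # last timestamp handed out

    def now(self, wall_clock):
        # Like GetSnapshotCurrentTimestamp(): never let time go backwards,
        # so a clobbered-forward value sticks even though the wall clock lags.
        if wall_clock > self.current_timestamp:
            self.current_timestamp = wall_clock
        return self.current_timestamp

    def clobber(self, ts):
        # Like the proposed clobber function: jump the saved timestamp
        # to an arbitrary point so tests can simulate the passage of time.
        self.current_timestamp = ts


clock = SnapshotClock()
clock.now(5)         # -> 5
clock.now(3)         # -> 5 (time never goes backwards)
clock.clobber(100)   # test script jumps far into the future
print(clock.now(6))  # -> 100: later snapshots all see the clobbered time
```

Because the source is already forced to be monotonic, clobbering it forward means every subsequent snapshot timestamp stays at or beyond the clobbered value, which is exactly what a scripted test needs.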
On Fri, Apr 17, 2020 at 2:12 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> What about a contrib function that lets you clobber
> oldSnapshotControl->current_timestamp? It looks like all times in
> this system come ultimately from GetSnapshotCurrentTimestamp(), which
> uses that variable to make sure that time never goes backwards.

Here's a draft TAP test that uses that technique successfully, as a POC. It should probably be extended to cover more cases, but I thought I'd check what people thought of the concept first before going further. I didn't see a way to do overlapping transactions with PostgresNode.pm, so I invented one (please excuse the bad perl); am I missing something? Maybe it'd be better to do 002 with an isolation test instead, but I suppose 001 can't be in an isolation test, since it needs to connect to multiple databases, and it seemed better to do them both the same way. It's also not entirely clear to me that isolation tests can expect a database to be fresh and then mess with dangerous internal state, whereas TAP tests set up and tear down a cluster each time.

I think I found another bug in MaintainOldSnapshotTimeMapping(): if you make time jump by more than old_snapshot_threshold in one go, then the map gets cleared and then no early pruning or snapshot-too-old errors happen. That's why in 002_too_old.pl it currently advances time by 10 minutes twice, instead of 20 minutes once. To be continued.
Attachment
- v2-0001-Expose-oldSnapshotControl.patch
- v2-0002-contrib-old_snapshot-time-xid-mapping.patch
- v2-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patch
- v2-0004-Add-pg_clobber_current_snapshot_timestamp.patch
- v2-0005-Truncate-old-snapshot-XIDs-before-truncating-CLOG.patch
- v2-0006-Add-TAP-test-for-snapshot-too-old.patch
On Sat, Apr 18, 2020 at 11:47 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> I think I found another bug in MaintainOldSnapshotTimeMapping(): if
> you make time jump by more than old_snapshot_threshold in one go, then
> the map gets cleared and then no early pruning or snapshot-too-old
> errors happen. That's why in 002_too_old.pl it currently advances
> time by 10 minutes twice, instead of 20 minutes once. To be
> continued.

IMHO that doesn't seem to be a problem, because even if we jump more than old_snapshot_threshold in one go, we don't clear the complete map, right? The latest snapshot timestamp will become the head timestamp. So in TransactionIdLimitedForOldSnapshots, if (current_ts - old_snapshot_threshold) is still >= head_timestamp, then we can still do early pruning.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Hi,

On 2020-04-17 14:12:44 +1200, Thomas Munro wrote:
> What about a contrib function that lets you clobber
> oldSnapshotControl->current_timestamp? It looks like all times in
> this system come ultimately from GetSnapshotCurrentTimestamp(), which
> uses that variable to make sure that time never goes backwards.

It'd be better than the current test situation, and probably would be good to have as part of testing anyway (since it'd allow the tests to not take long / be racy on slow machines). But I still don't think it really allows testing the feature in a natural way. It makes it easier to test for known edge cases / problems, but not really to discover unknown ones. For that I think we need more granular bins.

- Andres
On Thu, Apr 16, 2020 at 10:12 PM Robert Haas <robertmhaas@gmail.com> wrote:
> 0003 attempts to fix bugs in MaintainOldSnapshotTimeMapping() so that
> it produces a sensible mapping. I encountered and tried to fix two
> issues here:
> [...]
> With these fixes, I seem to get reasonably sensible mappings, at least
> in light testing. I tried running this in one window with \watch 10:
>
> select *, age(newest_xmin), clock_timestamp() from
> pg_old_snapshot_time_mapping();
>
> And in another window I ran:
>
> pgbench -T 300 -R 10
>
> And the age does in fact advance by ~600 transactions per minute.

I have started reviewing these patches. I think the fixes look right to me.

+	LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
+	mapping->head_offset = oldSnapshotControl->head_offset;
+	mapping->head_timestamp = oldSnapshotControl->head_timestamp;
+	mapping->count_used = oldSnapshotControl->count_used;
+	for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
+		mapping->xid_by_minute[i] = oldSnapshotControl->xid_by_minute[i];
+	LWLockRelease(OldSnapshotTimeMapLock);

I think memcpy would be a better choice instead of looping over all the entries, since we are doing this under a lock?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Sat, Apr 18, 2020 at 9:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Sat, Apr 18, 2020 at 11:47 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > I think I found another bug in MaintainOldSnapshotTimeMapping(): if
> > you make time jump by more than old_snapshot_threshold in one go, then
> > the map gets cleared and then no early pruning or snapshot-too-old
> > errors happen. That's why in 002_too_old.pl it currently advances
> > time by 10 minutes twice, instead of 20 minutes once. To be
> > continued.
>
> IMHO that doesn't seems to be a problem. Because even if we jump more
> than old_snapshot_threshold in one go we don't clean complete map
> right. The latest snapshot timestamp will become the headtimestamp.
> So in TransactionIdLimitedForOldSnapshots if (current_ts -
> old_snapshot_threshold) is still >= head_timestap then we can still do
> early pruning.

Right, thanks. I got confused about that, and misdiagnosed something I was seeing.

Here's a new version:

0004: Instead of writing a new kind of TAP test to demonstrate snapshot-too-old errors, I adjusted the existing isolation tests to use the same absolute time control technique. Previously I had invented a way to do isolation tester-like stuff in TAP tests, which might be interesting but strange new perl is not necessary for this.

0005: Truncates the time map when the CLOG is truncated. Its test is now under src/test/module/snapshot_too_old/t/001_truncate.sql.

These apply on top of Robert's patches, but the only dependency is on his patch 0001 "Expose oldSnapshotControl.", because now I have stuff in src/test/module/snapshot_too_old/test_sto.c that wants to mess with that object too.

Is this an improvement? I realise that there is still nothing to actually verify that early pruning has actually happened. I haven't thought of a good way to do that yet (stats, page inspection, ...).
Attachment
On Mon, Apr 20, 2020 at 11:24 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Is this an improvement? I realise that there is still nothing to
> actually verify that early pruning has actually happened. I haven't
> thought of a good way to do that yet (stats, page inspection, ...).

Could we test the early pruning using the xid-burn patch? Basically, in xid_by_minute we have some xids with the current epoch. Now, if we burn more than 2 billion XIDs and then try to vacuum, we might hit the early pruning case, no? Did you want to test this case, or did you have some other case in mind that you wanted to test?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Apr 20, 2020 at 6:35 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> Could we test the early pruning using xid-burn patch? Basically, in
> xid_by_minute we have some xids with the current epoch. Now, we burns
> more than 2b xid and then if we try to vacuum we might hit the case of
> early pruning no. Do you wnated to this case or you had some other
> case in mind which you wnated to test?

I mean I want to verify that VACUUM or heap prune actually removed a tuple that was visible to an old snapshot. An idea I just had: maybe sto_using_select.spec should check the visibility map (somehow). For example, the sto_using_select.spec (the version in the patch I just posted) just checks that after time 00:11, the old snapshot gets a snapshot-too-old error. Perhaps we could add a VACUUM before that, and then check that the page has become all visible, meaning that the dead tuple our snapshot could see has now been removed.
On Mon, Apr 20, 2020 at 12:29 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> I mean I want to verify that VACUUM or heap prune actually removed a
> tuple that was visible to an old snapshot. An idea I just had: maybe
> sto_using_select.spec should check the visibility map (somehow). For
> example, the sto_using_select.spec (the version in the patch I just
> posted) just checks that after time 00:11, the old snapshot gets a
> snapshot-too-old error. Perhaps we could add a VACUUM before that,
> and then check that the page has become all visible, meaning that the
> dead tuple our snapshot could see has now been removed.

Okay, got your point. Can we try to implement some test functions that just call the visibilitymap_get_status function internally? I agree that we will have to pass the correct block number, but we can find that using the TID. Or, for testing, we can create a very small relation that just has 1 block?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Apr 20, 2020 at 12:10 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I have started reviewing these patches. I think, the fixes looks right to me.
>
> +	LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
> +	mapping->head_offset = oldSnapshotControl->head_offset;
> +	mapping->head_timestamp = oldSnapshotControl->head_timestamp;
> +	mapping->count_used = oldSnapshotControl->count_used;
> +	for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
> +		mapping->xid_by_minute[i] = oldSnapshotControl->xid_by_minute[i];
> +	LWLockRelease(OldSnapshotTimeMapLock);
>
> I think memcpy would be a better choice instead of looping it for all
> the entries, since we are doing this under a lock?

When I did it that way, it complained about "const" and I couldn't immediately figure out how to fix it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Apr 20, 2020 at 8:02 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> Okay, got your point. Can we try to implement some test functions
> that can just call visibilitymap_get_status function internally? I
> agree that we will have to pass the correct block number but that we
> can find using TID. Or for testing, we can create a very small
> relation that just has 1 block?

I think it's enough to check SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass). I realised that src/test/module/snapshot_too_old is allowed to install contrib/pg_visibility with EXTRA_INSTALL, so here's a new version to try that idea. It extends sto_using_select.spec to VACUUM and check the vis map at key times. That allows us to check that early pruning really happens once the snapshot becomes too old. There are other ways you could check that but this seems quite "light touch" compared to something based on page inspection.

I also changed src/test/module/snapshot_too_old/t/001_truncate.pl back to using Robert's contrib/old_snapshot extension to know the size of the time/xid map, allowing an introspection function to be dropped from test_sto.c.

As before, these two apply on top of Robert's patches (or at least his 0001 and 0002).
Attachment
On Mon, Apr 20, 2020 at 11:31 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > I think memcpy would be a better choice instead of looping it for all
> > the entries, since we are doing this under a lock?
>
> When I did it that way, it complained about "const" and I couldn't
> immediately figure out how to fix it.

I think we can typecast to (const void *). After the below change, I did not get the warning.

diff --git a/contrib/old_snapshot/time_mapping.c b/contrib/old_snapshot/time_mapping.c
index 37e0055..cc53bdd 100644
--- a/contrib/old_snapshot/time_mapping.c
+++ b/contrib/old_snapshot/time_mapping.c
@@ -94,8 +94,9 @@ GetOldSnapshotTimeMapping(void)
 	mapping->head_offset = oldSnapshotControl->head_offset;
 	mapping->head_timestamp = oldSnapshotControl->head_timestamp;
 	mapping->count_used = oldSnapshotControl->count_used;
-	for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
-		mapping->xid_by_minute[i] = oldSnapshotControl->xid_by_minute[i];
+	memcpy(mapping->xid_by_minute,
+		   (const void *) oldSnapshotControl->xid_by_minute,
+		   sizeof(TransactionId) * OLD_SNAPSHOT_TIME_MAP_ENTRIES);
 	LWLockRelease(OldSnapshotTimeMapLock);

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 21, 2020 at 2:05 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> As before, these two apply on top of Robert's patches (or at least his
> 0001 and 0002).

While trying to figure out if Robert's 0003 patch was correct, I added yet another patch to this stack to test it. 0006 does basic xid map maintenance that exercises the cases 0003 fixes, and I think it demonstrates that they now work correctly. Also some minor perl improvements to 0005. I'll attach 0001-0004 again but they are unchanged.

Since confusion about head vs tail seems to have been at the root of the bugs addressed by 0003, I wonder if we should also rename head_{timestamp,offset} to oldest_{timestamp,offset}.
Attachment
- v5-0001-Expose-oldSnapshotControl.patch
- v5-0002-contrib-old_snapshot-time-xid-mapping.patch
- v5-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patch
- v5-0004-Rewrite-the-snapshot_too_old-tests.patch
- v5-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patch
- v5-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patch
On Tue, Apr 21, 2020 at 3:44 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> While trying to figure out if Robert's 0003 patch was correct, I added
> yet another patch to this stack to test it. 0006 does basic xid map
> maintenance that exercises the cases 0003 fixes, and I think it
> demonstrates that they now work correctly.

+1. I think we should also add a way to test the case where we advance the timestamp by multiple slots. I see that you have such a case, e.g.

+# test adding minutes while the map is not full
+set_time('3000-01-01 02:01:00Z');
+is(summarize_mapping(), "2|02:00:00|02:01:00");
+set_time('3000-01-01 02:05:00Z');
+is(summarize_mapping(), "6|02:00:00|02:05:00");
+set_time('3000-01-01 02:19:00Z');
+is(summarize_mapping(), "20|02:00:00|02:19:00");

But I think we should try to extend it to test that we have put the new xid only in the slots we were supposed to, and not in other slots.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 21, 2020 at 4:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Tue, Apr 21, 2020 at 3:44 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > On Tue, Apr 21, 2020 at 2:05 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > As before, these two apply on top of Robert's patches (or at least his
> > > 0001 and 0002).
> >
> > While trying to figure out if Robert's 0003 patch was correct, I added
> > yet another patch to this stack to test it. 0006 does basic xid map
> > maintenance that exercises the cases 0003 fixes, and I think it
> > demonstrates that they now work correctly.
>
> +1. I think we should also add a way to test the case where we advance
> the timestamp by multiple slots. I see that you have such a case, e.g.:
>
> +# test adding minutes while the map is not full
> +set_time('3000-01-01 02:01:00Z');
> +is(summarize_mapping(), "2|02:00:00|02:01:00");
> +set_time('3000-01-01 02:05:00Z');
> +is(summarize_mapping(), "6|02:00:00|02:05:00");
> +set_time('3000-01-01 02:19:00Z');
> +is(summarize_mapping(), "20|02:00:00|02:19:00");
>
> But I think we should try to extend it to verify that we have put the
> new xid only in the slots where we are supposed to, and not in the
> other slots.

I feel that we should probably fix this check as well. If ts > update_ts, it will go to the else branch, and there it will finally end up in the last slot anyway, so I think we can use this case as a fast exit too.

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 93a0c04..644d9b1 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1831,7 +1831,7 @@ TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,

 		if (!same_ts_as_threshold)
 		{
-			if (ts == update_ts)
+			if (ts >= update_ts)
 			{
 				xlimit = latest_xmin;
 				if (NormalTransactionIdFollows(xlimit, recentXmin))

This patch can be applied on top of the other v5 patches.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, Apr 22, 2020 at 5:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> - if (ts == update_ts)
> + if (ts >= update_ts)

Hi Dilip, I didn't follow this bit -- could you explain?

Here's a rebase. In the 0004 patch I chose to leave behind some unnecessary braces to avoid reindenting a bunch of code after removing an if branch, just for ease of review, but I'd probably remove those in a committed version.

I'm going to add this stuff to the next CF so we don't lose track of it.
Attachment
- v6-0001-Expose-oldSnapshotControl.patch
- v6-0002-contrib-old_snapshot-time-xid-mapping.patch
- v6-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patch
- v6-0004-Rewrite-the-snapshot_too_old-tests.patch
- v6-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patch
- v6-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patch
On Fri, Aug 14, 2020 at 12:52 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> Here's a rebase.

And another, since I was too slow and v6 is already in conflict... sorry for the high frequency patches.
Attachment
- v7-0001-Expose-oldSnapshotControl.patch
- v7-0002-contrib-old_snapshot-time-xid-mapping.patch
- v7-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patch
- v7-0004-Rewrite-the-snapshot_too_old-tests.patch
- v7-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patch
- v7-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patch
On Fri, Aug 14, 2020 at 1:04 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Fri, Aug 14, 2020 at 12:52 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Here's a rebase.
>
> And another, since I was too slow and v6 is already in conflict...
> sorry for the high frequency patches.

And ... now that this has a commitfest entry, cfbot told me about a small problem in a makefile. Third time lucky?
Attachment
- v8-0001-Expose-oldSnapshotControl.patch
- v8-0002-contrib-old_snapshot-time-xid-mapping.patch
- v8-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patch
- v8-0004-Rewrite-the-snapshot_too_old-tests.patch
- v8-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patch
- v8-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patch
On Sat, Aug 15, 2020 at 10:09:15AM +1200, Thomas Munro wrote:
> And ... now that this has a commitfest entry, cfbot told me about a
> small problem in a makefile. Third time lucky?

Still lucky since then, and the CF bot does not complain. So... The meat of the patch set is in 0003, which fixes an actual bug. Robert, Thomas, anything specific you are waiting for here? As this is a bug fix, perhaps it would be better to just move on with some portions of the set?

Kevin, I really think that you should chime in here. This is originally your feature.
--
Michael
Attachment
On Thu, Sep 17, 2020 at 1:47 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Sat, Aug 15, 2020 at 10:09:15AM +1200, Thomas Munro wrote:
> > And ... now that this has a commitfest entry, cfbot told me about a
> > small problem in a makefile. Third time lucky?
>
> Still lucky since then, and the CF bot does not complain. So... The
> meat of the patch is in 0003 which is fixing an actual bug. Robert,
> Thomas, anything specific you are waiting for here? As this is a bug
> fix, perhaps it would be better to just move on with some portions of
> the set?

Yeah, I plan to push forward with 0001 through 0003 soon, but 0001 needs to be revised with a PGDLLIMPORT marking, I think, and 0002 needs documentation.

Not sure whether there's going to be adequate support for back-patching given that it's adding a new contrib module for observability and not just fixing a bug, so my tentative plan is to just push into master. If there is a great clamor for back-patching then I can, but I'm not very excited about pushing the bug fix into the back-branches without the observability stuff, because then if somebody claims that it's not working properly, it'll be almost impossible to understand why.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Sep 17, 2020 at 10:40 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Yeah, I plan to push forward with 0001 through 0003 soon, but 0001
> needs to be revised with a PGDLLIMPORT marking, I think, and 0002
> needs documentation.

So here's an updated version of those three, with proposed commit messages, a PGDLLIMPORT for 0001, and docs for 0002.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
On Sat, Sep 19, 2020 at 12:19 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Sep 17, 2020 at 10:40 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > Yeah, I plan to push forward with 0001 through 0003 soon, but 0001
> > needs to be revised with a PGDLLIMPORT marking, I think, and 0002
> > needs documentation.
>
> So here's an updated version of those three, with proposed commit
> messages, a PGDLLIMPORT for 0001, and docs for 0002.

LGTM.
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

Patch looks good to me.
On Wed, Sep 23, 2020 at 9:16 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> LGTM.

Committed.

Thomas, with respect to your part of this patch set, I wonder if we can make the functions that you're using to write tests safe enough that we could add them to contrib/old_snapshot and let users run them if they want. As you have them, they are hedged around with vague and scary warnings, but is that really justified? And if so, can it be fixed? It would be nicer not to end up with two loadable modules here, and maybe the right sorts of functions could even have some practical use.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Sep 24, 2020 at 03:46:14PM -0400, Robert Haas wrote:
> Committed.

Cool, thanks.

> Thomas, with respect to your part of this patch set, I wonder if we
> can make the functions that you're using to write tests safe enough
> that we could add them to contrib/old_snapshot and let users run them
> if they want. As you have them, they are hedged around with vague and
> scary warnings, but is that really justified? And if so, can it be
> fixed? It would be nicer not to end up with two loadable modules here,
> and maybe the right sorts of functions could even have some practical
> use.

I have switched this item to waiting on author in the CF app then, as we are not completely done yet.
--
Michael
Attachment
On Fri, Sep 25, 2020 at 2:00 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Sep 24, 2020 at 03:46:14PM -0400, Robert Haas wrote:
> > Committed.
>
> Cool, thanks.

+1

> > Thomas, with respect to your part of this patch set, I wonder if we
> > can make the functions that you're using to write tests safe enough
> > that we could add them to contrib/old_snapshot and let users run them
> > if they want. As you have them, they are hedged around with vague and
> > scary warnings, but is that really justified? And if so, can it be
> > fixed? It would be nicer not to end up with two loadable modules here,
> > and maybe the right sorts of functions could even have some practical
> > use.

Yeah, you may be right. I am thinking about that. In the meantime, here is a rebase. A quick recap of these remaining patches:

0001 replaces the current "magic test mode" that didn't really test anything with a new test mode that verifies pruning and STO behaviour.
0002 fixes a separate bug that Andres reported: the STO XID map suffers from wraparound-itis.
0003 adds a simple smoke test for Robert's commit 55b7e2f4. Before that fix landed, it failed.

> I have switched this item as waiting on author in the CF app then, as
> we are not completely done yet.

Thanks. For the record, I think there is still one more complaint from Andres that remains unaddressed even once these are in the tree: there are thought to be some more places that lack TestForOldSnapshot() calls.
Attachment
On 06.10.2020 08:32, Thomas Munro wrote:
> On Fri, Sep 25, 2020 at 2:00 PM Michael Paquier <michael@paquier.xyz> wrote:
>> On Thu, Sep 24, 2020 at 03:46:14PM -0400, Robert Haas wrote:
>>
>>> Thomas, with respect to your part of this patch set, I wonder if we
>>> can make the functions that you're using to write tests safe enough
>>> that we could add them to contrib/old_snapshot and let users run them
>>> if they want. As you have them, they are hedged around with vague and
>>> scary warnings, but is that really justified? And if so, can it be
>>> fixed? It would be nicer not to end up with two loadable modules here,
>>> and maybe the right sorts of functions could even have some practical
>>> use.
>
> Yeah, you may be right. I am thinking about that. In the meantime,
> here is a rebase. A quick recap of these remaining patches:
>
> 0001 replaces the current "magic test mode" that didn't really test
> anything with a new test mode that verifies pruning and STO behaviour.
> 0002 fixes a separate bug that Andres reported: the STO XID map
> suffers from wraparound-itis.
> 0003 adds a simple smoke test for Robert's commit 55b7e2f4. Before
> that fix landed, it failed.
>
>> I have switched this item as waiting on author in the CF app then, as
>> we are not completely done yet.
>
> Thanks. For the record, I think there is still one more complaint
> from Andres that remains unaddressed even once these are in the tree:
> there are thought to be some more places that lack
> TestForOldSnapshot() calls.

Status update for a commitfest entry.

This entry is "Waiting on author" and the thread was inactive for a while. As far as I see, part of the fixes has already been committed. Is there anything left to work on, or does this patch set need review now?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company