Thread: when set track_commit_timestamp on, database system abort startup

when set track_commit_timestamp on, database system abort startup

From
李海龙
Date:


HI, Dear pgsql-hackers


The details are  as follows:


Environment:

OS:        CentOS 6.7 (Final), Linux 3.18.48-11.el6.x86_64 x86_64, 64-bit


PostgreSQL:

postgres=# select version();
                                                 version                                                
---------------------------------------------------------------------------------------------------------
 PostgreSQL 10.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
(1 row)

postgres=# show wal_level ;
 wal_level
-----------
 replica
(1 row)


When I enable the parameter track_commit_timestamp in postgresql.conf of a Base Backup (making a Base Backup from a standby and the track_commit_timestamp is off on it),

the database system aborted startup.


The log is:
[    2018-09-14 14:38:30.723 CST 8433 5b9b5766.20f1 1  0]LOG:  database system was interrupted while in recovery at log time 2018-09-12 07:00:05 CST
[    2018-09-14 14:38:30.723 CST 8433 5b9b5766.20f1 2  0]HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
[    2018-09-14 14:38:30.817 CST 8433 5b9b5766.20f1 3  0]LOG:  redo starts at 1C45/6FC0A230
[    2018-09-14 14:38:31.215 CST 8433 5b9b5766.20f1 4  0]FATAL:  could not access status of transaction 3003911820
[    2018-09-14 14:38:31.215 CST 8433 5b9b5766.20f1 5  0]DETAIL:  Could not read from file "pg_commit_ts/6FEE" at offset 557056: Success.
[    2018-09-14 14:38:31.215 CST 8433 5b9b5766.20f1 6  0]CONTEXT:  WAL redo at 1C45/7844DC60 for Transaction/COMMIT: 2018-09-12 07:00:16.629145+08
[    2018-09-14 14:38:31.228 CST 8431 5b9b5766.20ef 6  0]LOG:  startup process (PID 8433) exited with exit code 1
[    2018-09-14 14:38:31.228 CST 8431 5b9b5766.20ef 7  0]LOG:  aborting startup due to startup process failure
[    2018-09-14 14:38:31.341 CST 8431 5b9b5766.20ef 8  0]LOG:  database system is shut down



I have no choice but I enable the parameter track_commit_timestamp off, the database system is starting up successfully!


The log is:

[    2018-09-14 14:41:38.887 CST 10210 5b9b5822.27e2 1  0]LOG:  database system was interrupted while in recovery at 2018-09-14 14:41:21 CST
[    2018-09-14 14:41:38.887 CST 10210 5b9b5822.27e2 2  0]HINT:  This probably means that some data is corrupted and you will have to use the last backup for recovery.
[    2018-09-14 14:41:38.977 CST 10210 5b9b5822.27e2 3  0]LOG:  database system was not properly shut down; automatic recovery in progress
[    2018-09-14 14:41:38.979 CST 10210 5b9b5822.27e2 4  0]LOG:  redo starts at 1C45/6FC0A230
[[unknown] [unknown] [unknown] [local] 2018-09-14 14:41:43.794 CST 10284 5b9b5827.282c 1  0]LOG:  connection received: host=[local]
[postgres template1 [unknown] [local] 2018-09-14 14:41:43.794 CST 10284 5b9b5827.282c 2  0]FATAL:  the database system is starting up
......
......
[    2018-09-14 14:47:03.486 CST 10210 5b9b5822.27e2 5  0]LOG:  redo done at 1C63/FF683C98
[    2018-09-14 14:47:03.486 CST 10210 5b9b5822.27e2 6  0]LOG:  last completed transaction was at log time 2018-09-12 13:02:56.737147+08
[    2018-09-14 14:47:03.491 CST 10210 5b9b5822.27e2 7  0]LOG:  checkpoint starting: end-of-recovery immediate
[[unknown] [unknown] [unknown] [local] 2018-09-14 14:47:43.940 CST 13383 5b9b598f.3447 1  0]LOG:  connection received: host=[local]
[postgres template1 [unknown] [local] 2018-09-14 14:47:43.940 CST 13383 5b9b598f.3447 2  0]FATAL:  the database system is starting up
[    2018-09-14 14:48:06.651 CST 10210 5b9b5822.27e2 8  0]LOG:  checkpoint complete: wrote 1871378 buffers (89.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=57.447 s, sync=5.683 s, total=63.160 s; sync files=1366, longest=0.648 s, average=0.004 s; distance=128192471 kB, estimate=128192471 kB
[    2018-09-14 14:48:09.932 CST 10208 5b9b5822.27e0 6  0]LOG:  database system is ready to accept connections
[[unknown] [unknown] [unknown] [local] 2018-09-14 14:48:31.916 CST 13682 5b9b59bf.3572 1  0]LOG:  connection received: host=[local]
[postgres template1 [unknown] [local] 2018-09-14 14:48:31.917 CST 13682 5b9b59bf.3572 2 3/1 0]LOG:  connection authorized: user=postgres database=template1
[postgres template1 [unknown] [local] 2018-09-14 14:48:31.944 CST 13682 5b9b59bf.3572 3  0]LOG:  disconnection: session time: 0:00:00.027 user=postgres database=template1 host=[local]


Has anyone entered this and please help me.

--
Qunar PostgreSQL DBA hailong.li


Attachment

Re: when set track_commit_timestamp on, database system abort startup

From
Masahiko Sawada
Date:
On Fri, Sep 14, 2018 at 4:27 PM, 李海龙 <hailong.li@qunar.com> wrote:
>
> HI, Dear pgsql-hackers
>
>
> The details are  as follows:
>
>
> Environment:
>
> OS:        CentOS 6.7 (Final), Linux 3.18.48-11.el6.x86_64 x86_64, 64-bit
>
>
> PostgreSQL:
>
> postgres=# select version();
>                                                  version
> ---------------------------------------------------------------------------------------------------------
>  PostgreSQL 10.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
> 20120313 (Red Hat 4.4.7-16), 64-bit
> (1 row)
>
> postgres=# show wal_level ;
>  wal_level
> -----------
>  replica
> (1 row)
>
>
> When I enable the parameter track_commit_timestamp in postgresql.conf of a
> Base Backup (making a Base Backup from a standby and the
> track_commit_timestamp is off on it),
>

In addition to the above operation, I've reproduced this issue by
replaying a commit WAL record that sets the timestamp to a new page
during the crash recovery (or from restart).

It seems to me that the cause of this is that we could not extend
commitTs page since the COMMIT_TS_ZEROPAGE WAL wasn't generated at the
standby server whose track_commit_timestamp is off. So during
replaying the commit WAL record the startup process fails since the
corresponding commitTs page doesn't exist.

To fix that maybe we can disable commitTs if
controlFile->track_commit_timestamp == false and the
track_commit_timestamp == true even in crash recovery.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: when set track_commit_timestamp on, database system abort startup

From
Alvaro Herrera
Date:
On 2018-Sep-15, Masahiko Sawada wrote:

> On Fri, Sep 14, 2018 at 4:27 PM, 李海龙 <hailong.li@qunar.com> wrote:

> > When I enable the parameter track_commit_timestamp in postgresql.conf of a
> > Base Backup (making a Base Backup from a standby and the
> > track_commit_timestamp is off on it),
> 
> In addition to the above operation, I've reproduced this issue by
> replaying a commit WAL record that sets the timestamp to a new page
> during the crash recovery (or from restart).
> 
> It seems to me that the cause of this is that we could not extend
> commitTs page since the COMMIT_TS_ZEROPAGE WAL wasn't generated at the
> standby server whose track_commit_timestamp is off. So during
> replaying the commit WAL record the startup process fails since the
> corresponding commitTs page doesn't exist.

Hmm, wow.  I wonder if it's possible to detect the config difference
early enough that the zeropage WAL records are emitted, instead.  But
even this might not work, since some transactions need to have their
commitTS in pages that will not have been zeroed anyway, because the
page threshold was crossed in the old primary.

> To fix that maybe we can disable commitTs if
> controlFile->track_commit_timestamp == false and the
> track_commit_timestamp == true even in crash recovery.

Hmm, so keep it off while crash recovery runs, and once it's out of that
then enable it automatically?  That might work -- by definition we don't
care about the commit TSs of the transaction replayed during crash
recovery, since they were executed in the primary that didn't have
commitTS enable anyway.

It seems like the first thing we need is TAP cases that reproduce these
two crash scenarios.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: when set track_commit_timestamp on, database system abort startup

From
Masahiko Sawada
Date:
On Sat, Sep 15, 2018 at 12:29 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> On 2018-Sep-15, Masahiko Sawada wrote:
>
>> On Fri, Sep 14, 2018 at 4:27 PM, 李海龙 <hailong.li@qunar.com> wrote:
>
>> > When I enable the parameter track_commit_timestamp in postgresql.conf of a
>> > Base Backup (making a Base Backup from a standby and the
>> > track_commit_timestamp is off on it),
>>
>> In addition to the above operation, I've reproduced this issue by
>> replaying a commit WAL record that sets the timestamp to a new page
>> during the crash recovery (or from restart).
>>
>> It seems to me that the cause of this is that we could not extend
>> commitTs page since the COMMIT_TS_ZEROPAGE WAL wasn't generated at the
>> standby server whose track_commit_timestamp is off. So during
>> replaying the commit WAL record the startup process fails since the
>> corresponding commitTs page doesn't exist.
>
> Hmm, wow.  I wonder if it's possible to detect the config difference
> early enough that the zeropage WAL records are emitted, instead.  But
> even this might not work, since some transactions need to have their
> commitTS in pages that will not have been zeroed anyway, because the
> page threshold was crossed in the old primary.
>
>> To fix that maybe we can disable commitTs if
>> controlFile->track_commit_timestamp == false and the
>> track_commit_timestamp == true even in crash recovery.
>
> Hmm, so keep it off while crash recovery runs, and once it's out of that
> then enable it automatically?

Yes. The attached patch changes it to check
controlFile->track_commit_timestamp even the crash recover case. If
track_commit_timestamp is set to true in the config file, it's enabled
at end of the recovery.

> That might work -- by definition we don't
> care about the commit TSs of the transaction replayed during crash
> recovery, since they were executed in the primary that didn't have
> commitTS enable anyway.
>
> It seems like the first thing we need is TAP cases that reproduce these
> two crash scenarios.

I attached TAP test that reproduces this issue. We can reproduce it
even with single server; making postgres replay a commit WAL in the
crash recovery after consumed transactions and enabled
track_commit_timestamp.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: when set track_commit_timestamp on, database system abortstartup

From
Kyotaro HORIGUCHI
Date:
At Sat, 15 Sep 2018 19:26:39 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoAxSNorp3TjvJhrOAk+8q5yshSnW-n8buwz4bdU7qOtPA@mail.gmail.com>
> >> To fix that maybe we can disable commitTs if
> >> controlFile->track_commit_timestamp == false and the
> >> track_commit_timestamp == true even in crash recovery.
> >
> > Hmm, so keep it off while crash recovery runs, and once it's out of that
> > then enable it automatically?
> 
> Yes. The attached patch changes it to check
> controlFile->track_commit_timestamp even the crash recover case. If
> track_commit_timestamp is set to true in the config file, it's enabled
> at end of the recovery.
> 
> > That might work -- by definition we don't
> > care about the commit TSs of the transaction replayed during crash
> > recovery, since they were executed in the primary that didn't have
> > commitTS enable anyway.
> >
> > It seems like the first thing we need is TAP cases that reproduce these
> > two crash scenarios.
> 
> I attached TAP test that reproduces this issue. We can reproduce it
> even with single server; making postgres replay a commit WAL in the
> crash recovery after consumed transactions and enabled
> track_commit_timestamp.

The fix looks good to me.  The TAP test works fine.

In the TAP test:

====
The test script lacks a general description about its objective.

====
+$node->append_conf('postgresql.conf',
+                  "track_commit_timestamp = off");
+$node->start;
+
+# When we start firstly from the initdb the PARAMETER_CHANGES
+# is emitted at end of the recovery, which disables the
+# track_commit_timestamp if the crash recovery replay that
+# WAL. Therefore we restart the server so that we can recovery
+# from the point where doesn't contain that WAL.
+$node->restart;

The first startup actually doesn't emit a PARAMETER_CHAGES. If
track_commit_timestamp were set to on, we get one. However, I
agree that it is reasonable to eliminate the possiblity of being
affected by the record. How about something like this?

+# We don't want to replay PARAMETER_CHANGES record while the
+# crash recovery test below. It is not expected to be emitted
+# thus far, but we restart the server to get rid of it just in
+# case.


====
+# During the crash recovery we replay the commit WAL that sets
+# the commit timestamp to a new page.
+$node->start;

The comment is mentioning the unexpected symptom. Shouldn't it be
the desired behavior?

+# During crash recovery server will replay COMMIT records
+# emitted while commit timestamp was off. The server can start
+# if we correctly avoid processing commit timestamp for the
+# records.

====
The following error message is seen when the issue happens.

> DETAIL:  Could not read from file "pg_commit_ts/0000" at offset 8192: Success.

This seems what 811b6e36a9 tried to fix. This should be like the follows.

> DETAIL:  Could not read from file "pg_commit_ts/0000" at offset 8192: read 0 of 8192"

I, as one of reviewers of it, didn't remember how it was
overlooked butthis also needs a fix in this patch or as a
separate issue.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: when set track_commit_timestamp on, database system abort startup

From
Michael Paquier
Date:
On Wed, Sep 19, 2018 at 12:12:44PM +0900, Kyotaro HORIGUCHI wrote:
> The fix looks good to me.  The TAP test works fine.
>
> In the TAP test:
>
> ====
> The test script lacks a general description about its objective.

That's always welcome.  The patch looks sensible to me.

> The following error message is seen when the issue happens.
>
>> DETAIL:  Could not read from file "pg_commit_ts/0000" at offset 8192: Success.
>
> This seems what 811b6e36a9 tried to fix. This should be like the follows.
>
>> DETAIL:  Could not read from file "pg_commit_ts/0000" at offset 8192: read 0 of 8192"
>
> I, as one of reviewers of it, didn't remember how it was
> overlooked butthis also needs a fix in this patch or as a
> separate issue.

Not completely overlooked...  Please see here why I have left out the
bits in slru.c, on purpose:
https://www.postgresql.org/message-id/20180613035038.GE3138@paquier.xyz
If you want to address that as well, please feel free to send a patch.
Reporting the number of bytes read would be the nicest error report, but
this requires a new field similar to slru_errno which includes an error
string to use if errno is 0.
--
Michael

Attachment

Re: when set track_commit_timestamp on, database system abort startup

From
Masahiko Sawada
Date:
On Wed, Sep 19, 2018 at 12:12 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Sat, 15 Sep 2018 19:26:39 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoAxSNorp3TjvJhrOAk+8q5yshSnW-n8buwz4bdU7qOtPA@mail.gmail.com>
>> >> To fix that maybe we can disable commitTs if
>> >> controlFile->track_commit_timestamp == false and the
>> >> track_commit_timestamp == true even in crash recovery.
>> >
>> > Hmm, so keep it off while crash recovery runs, and once it's out of that
>> > then enable it automatically?
>>
>> Yes. The attached patch changes it to check
>> controlFile->track_commit_timestamp even the crash recover case. If
>> track_commit_timestamp is set to true in the config file, it's enabled
>> at end of the recovery.
>>
>> > That might work -- by definition we don't
>> > care about the commit TSs of the transaction replayed during crash
>> > recovery, since they were executed in the primary that didn't have
>> > commitTS enable anyway.
>> >
>> > It seems like the first thing we need is TAP cases that reproduce these
>> > two crash scenarios.
>>
>> I attached TAP test that reproduces this issue. We can reproduce it
>> even with single server; making postgres replay a commit WAL in the
>> crash recovery after consumed transactions and enabled
>> track_commit_timestamp.
>
> The fix looks good to me.  The TAP test works fine.

Thank you for looking at this patch.

>
> In the TAP test:
>
> ====
> The test script lacks a general description about its objective.
>
> ====
> +$node->append_conf('postgresql.conf',
> +                                 "track_commit_timestamp = off");
> +$node->start;
> +
> +# When we start firstly from the initdb the PARAMETER_CHANGES
> +# is emitted at end of the recovery, which disables the
> +# track_commit_timestamp if the crash recovery replay that
> +# WAL. Therefore we restart the server so that we can recovery
> +# from the point where doesn't contain that WAL.
> +$node->restart;
>
> The first startup actually doesn't emit a PARAMETER_CHAGES. If
> track_commit_timestamp were set to on, we get one. However, I
> agree that it is reasonable to eliminate the possiblity of being
> affected by the record. How about something like this?
>
> +# We don't want to replay PARAMETER_CHANGES record while the
> +# crash recovery test below. It is not expected to be emitted
> +# thus far, but we restart the server to get rid of it just in
> +# case.
>
>
> ====
> +# During the crash recovery we replay the commit WAL that sets
> +# the commit timestamp to a new page.
> +$node->start;
>
> The comment is mentioning the unexpected symptom. Shouldn't it be
> the desired behavior?
>
> +# During crash recovery server will replay COMMIT records
> +# emitted while commit timestamp was off. The server can start
> +# if we correctly avoid processing commit timestamp for the
> +# records.
>

I agreed with your all review comments. Attached the updated patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: when set track_commit_timestamp on, database system abort startup

From
Michael Paquier
Date:
On Thu, Sep 20, 2018 at 09:51:34PM +0900, Masahiko Sawada wrote:
> I agreed with your all review comments. Attached the updated patch.

Wouldn't it be better to incorporate the new test as part of
004_restart.pl?  This way, we avoid initializing a full instance, which
is always a good thing as that's very costly.  The top of this file also
mentions that it tests clean restarts, but it bothers also about crash
recovery.

+$node->teardown_node;
I would suggest using $node->stop('immediate') here, which makes the
test easier to understand.
--
Michael

Attachment

Re: when set track_commit_timestamp on, database system abort startup

From
Michael Paquier
Date:
Sawada-san,

On Mon, Sep 24, 2018 at 08:28:45AM +0900, Michael Paquier wrote:
> Wouldn't it be better to incorporate the new test as part of
> 004_restart.pl?  This way, we avoid initializing a full instance, which
> is always a good thing as that's very costly.  The top of this file also
> mentions that it tests clean restarts, but it bothers also about crash
> recovery.

I have been able to work on this bug, and rewrote the proposed test case
as attached.  The test can only get on v11 and HEAD.  What do you think?
--
Michael

Attachment

Re: when set track_commit_timestamp on, database system abort startup

From
Masahiko Sawada
Date:
On Tue, Sep 25, 2018 at 1:43 PM, Michael Paquier <michael@paquier.xyz> wrote:
> Sawada-san,
>
> On Mon, Sep 24, 2018 at 08:28:45AM +0900, Michael Paquier wrote:
>> Wouldn't it be better to incorporate the new test as part of
>> 004_restart.pl?  This way, we avoid initializing a full instance, which
>> is always a good thing as that's very costly.  The top of this file also
>> mentions that it tests clean restarts, but it bothers also about crash
>> recovery.
>
> I have been able to work on this bug, and rewrote the proposed test case
> as attached.  The test can only get on v11 and HEAD.  What do you think?

Thank you for working on this.

+
+# Start the server, which generates a XLOG_PARAMETER_CHANGE record where
+# the parameter change is registered.
 $node_master->start;

+# Now restart again the server so as no record XLOG_PARAMETER_CHANGE are
+# replayed with the follow-up immediate shutdown.
+$node_master->restart;

Can we use "XLOG_PARAMETER_CHANGE record" instead of  "record
XLOG_PARAMETER_CHANGE" at the second hunk because the comment of the
first hunk uses it. The other part of this patch looks good to me.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: when set track_commit_timestamp on, database system abort startup

From
Michael Paquier
Date:
On Tue, Sep 25, 2018 at 11:54:53PM +0900, Masahiko Sawada wrote:
> Can we use "XLOG_PARAMETER_CHANGE record" instead of  "record
> XLOG_PARAMETER_CHANGE" at the second hunk because the comment of the
> first hunk uses it. The other part of this patch looks good to me.

Let's use your suggestion, and committed down to 9.5 where this fix
applies.  The TAP test is only present in v11 and above.  I have also
noticed some comment which became incorrect after the fix, so I changed
them on the way.
--
Michael

Attachment

Re: when set track_commit_timestamp on, database system abort startup

From
Masahiko Sawada
Date:
On Wed, Sep 26, 2018 at 10:34 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Sep 25, 2018 at 11:54:53PM +0900, Masahiko Sawada wrote:
> > Can we use "XLOG_PARAMETER_CHANGE record" instead of  "record
> > XLOG_PARAMETER_CHANGE" at the second hunk because the comment of the
> > first hunk uses it. The other part of this patch looks good to me.
>
> Let's use your suggestion, and committed down to 9.5 where this fix
> applies.  The TAP test is only present in v11 and above.  I have also
> noticed some comment which became incorrect after the fix, so I changed
> them on the way.

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center