Thread: pg_rewind and xlogtemp files

pg_rewind and xlogtemp files

From
Michael Paquier
Date:
Hi all,

I just came across this report about pg_rewind, which also affects
the version shipped in src/bin/pg_rewind:
https://github.com/vmware/pg_rewind/issues/45

In short, if the source server's file map includes xlogtemp files,
pg_rewind fails with an error like the following:

error reading xlog record: record with zero length at 1/D5000090
unexpected result while fetching remote files: ERROR:  could not open
file "pg_xlog/xlogtemp.23056" for reading: No such file or directory

The servers diverged at WAL position 1/D4A081B0 on timeline 174.
Rewinding from Last common checkpoint at 1/D30A5650 on timeline 174

As pointed out by dev1ant in the original bug report, process_remote_file
should ignore files named pg_xlog/xlogtemp.*, and I think that this
is the right thing to do. Any objections to a patch that at the same
time turns "xlogtemp." into a #define in xlog_internal.h?
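
A minimal sketch of the kind of check being proposed, for illustration only. The macro names XLOGDIR_PREFIX and XLOG_TEMP_PREFIX and the helper is_xlogtemp_path() are hypothetical; the thread does not say what the define is called or where the check lives inside process_remote_file.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical macro names; the thread only proposes adding "a define". */
#define XLOGDIR_PREFIX   "pg_xlog/"
#define XLOG_TEMP_PREFIX "xlogtemp."

/*
 * Return true if 'path' (relative to the data directory) names a temporary
 * WAL file such as "pg_xlog/xlogtemp.23056".
 */
static bool
is_xlogtemp_path(const char *path)
{
    size_t dirlen = strlen(XLOGDIR_PREFIX);

    /* The file must live directly under pg_xlog/ ... */
    if (strncmp(path, XLOGDIR_PREFIX, dirlen) != 0)
        return false;

    /* ... and its name must start with the xlogtemp prefix. */
    return strncmp(path + dirlen, XLOG_TEMP_PREFIX,
                   strlen(XLOG_TEMP_PREFIX)) == 0;
}
```

A caller building the file map could test each remote path with is_xlogtemp_path() and simply not record the entry when it returns true.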

Regards,
-- 
Michael



Re: pg_rewind and xlogtemp files

From
Michael Paquier
Date:
On Wed, Jun 17, 2015 at 3:17 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> As pointed by dev1ant on the original bug report, process_remote_file
> should ignore files named as pg_xlog/xlogtemp.*, and I think that this
> is the right thing to do. Any objections for a patch that at the same
> time makes "xlogtemp." a define declaration in xlog_internal.h?

And attached is a patch following those lines.
--
Michael

Attachment

Re: pg_rewind and xlogtemp files

From
Vladimir Borodin
Date:

On June 17, 2015, at 9:48, Michael Paquier <michael.paquier@gmail.com> wrote:

> On Wed, Jun 17, 2015 at 3:17 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> As pointed by dev1ant on the original bug report, process_remote_file
>> should ignore files named as pg_xlog/xlogtemp.*, and I think that this
>> is the right thing to do. Any objections for a patch that at the same
>> time makes "xlogtemp." a define declaration in xlog_internal.h?

Declaration seems to be the right thing.

Another problem I’ve caught twice already in the same test:

error reading xlog record: record with zero length at 0/78000090
unexpected result while fetching remote files: ERROR:  could not open file "base/13003/t6_2424967" for reading: No such file or directory
The servers diverged at WAL position 0/76BADD50 on timeline 303.
Rewinding from Last common checkpoint at 0/7651F870 on timeline 303

I don’t know if this problem could be solved the same way (by skipping such files)… Should I start a new thread for that?


> And attached is a patch following those lines.
> --
> Michael
> <20150617_rewind_xlogtemp.patch>


--
May the force be with you…

Re: pg_rewind and xlogtemp files

From
Fujii Masao
Date:
On Wed, Jun 17, 2015 at 4:57 PM, Vladimir Borodin <root@simply.name> wrote:
>
> On June 17, 2015, at 9:48, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>
> On Wed, Jun 17, 2015 at 3:17 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>
> As pointed by dev1ant on the original bug report, process_remote_file
> should ignore files named as pg_xlog/xlogtemp.*, and I think that this
> is the right thing to do. Any objections for a patch that at the same
> time makes "xlogtemp." a define declaration in xlog_internal.h?
>
>
> Declaration seems to be the right thing.
>
> Another problem I’ve caught twice already in the same test:
>
> error reading xlog record: record with zero length at 0/78000090
> unexpected result while fetching remote files: ERROR:  could not open file
> "base/13003/t6_2424967" for reading: No such file or directory
> The servers diverged at WAL position 0/76BADD50 on timeline 303.
> Rewinding from Last common checkpoint at 0/7651F870 on timeline 303
>
> I don’t know if this problem could be solved the same way (by skipping such
> files)… Should I start a new thread for that?

That's a temporary table's file, so there is no need to copy it
from the source server. I think pg_rewind can safely skip such files.
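
As a rough illustration of how such files could be recognized by name: temporary relation files are named tBBB_FFF (backend ID, then filenode), as in "base/13003/t6_2424967" above. The helper name looks_like_temp_relfile() is hypothetical, and the check is deliberately simplified: it validates only the leading "t<digits>_<digit>" part of the file name and ignores fork and segment suffixes.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the last component of 'path' starts like a temporary
 * relation file name: 't', one or more digits, '_', then a digit.
 * Anything after that first filenode digit is not inspected.
 */
static bool
looks_like_temp_relfile(const char *path)
{
    const char *name = strrchr(path, '/');

    /* Work on the file name only, not the directory part. */
    name = (name != NULL) ? name + 1 : path;

    if (*name != 't')
        return false;
    name++;

    /* Backend ID: at least one digit. */
    if (!isdigit((unsigned char) *name))
        return false;
    while (isdigit((unsigned char) *name))
        name++;

    if (*name != '_')
        return false;
    name++;

    /* Filenode: must begin with a digit. */
    return isdigit((unsigned char) *name) != 0;
}
```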

But even if we make pg_rewind skip such files, we would still hit a
similar problem; see the one I reported in another thread.
To address this type of problem completely, we would need
to apply the fix that is being discussed in that thread:
http://www.postgresql.org/message-id/CAHGQGwEdsNgeNZo+GyrzZtjW_TkC=XC6XxrjuAZ7=X_cj1aHHg@mail.gmail.com

BTW, even pg_basebackup doesn't skip temporary table files.
Maybe we should change that, too.

Also, pg_rewind doesn't skip the files that pg_basebackup skips. ISTM
that pg_rewind can safely skip any file that pg_basebackup skips,
so we probably need to reconsider which files pg_rewind should skip.

Regards,

--
Fujii Masao



Re: pg_rewind and xlogtemp files

From
Michael Paquier
Date:
On Wed, Jun 17, 2015 at 9:07 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Jun 17, 2015 at 4:57 PM, Vladimir Borodin <root@simply.name> wrote:
>>
>> On June 17, 2015, at 9:48, Michael Paquier <michael.paquier@gmail.com>
>> wrote:
>>
>> On Wed, Jun 17, 2015 at 3:17 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>
>> As pointed by dev1ant on the original bug report, process_remote_file
>> should ignore files named as pg_xlog/xlogtemp.*, and I think that this
>> is the right thing to do. Any objections for a patch that at the same
>> time makes "xlogtemp." a define declaration in xlog_internal.h?
>>
>>
>> Declaration seems to be the right thing.
>>
>> Another problem I’ve caught twice already in the same test:
>>
>> error reading xlog record: record with zero length at 0/78000090
>> unexpected result while fetching remote files: ERROR:  could not open file
>> "base/13003/t6_2424967" for reading: No such file or directory
>> The servers diverged at WAL position 0/76BADD50 on timeline 303.
>> Rewinding from Last common checkpoint at 0/7651F870 on timeline 303
>>
>> I don’t know if this problem could be solved the same way (by skipping such
>> files)… Should I start a new thread for that?
>
> That's the file of the temporary table, so there is no need to copy it
> from the source server. pg_rewind can safely skip such file, I think.

Yes. It is actually recommended to copy them manually from the
archive if needed (per the docs).

> But even if we make pg_rewind skip such file, we would still get the
> similar problem. You can see the problem that I reported in other thread.
> In order to address this type of problem completely, we would need
> to apply the fix that is been discussed in that thread.
> http://www.postgresql.org/message-id/CAHGQGwEdsNgeNZo+GyrzZtjW_TkC=XC6XxrjuAZ7=X_cj1aHHg@mail.gmail.com

In my opinion, there are two things to take into account here:
1) Ignoring files that should not be added to the file map, like
postmaster.pid, xlogtemp, etc.
2) Bypassing files that can be added to the file map, for example a
relation file or an FSM file, without erroring out if they are
missing.
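
The two cases can be sketched as follows. This is an illustration under stated assumptions, not pg_rewind's code: the names fetch_result, excluded_from_filemap() and try_fetch() are hypothetical, and a local fopen() stands in for the remote fetch. Case 1 never attempts the fetch; case 2 attempts it but treats "No such file or directory" as a non-fatal outcome.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef enum
{
    FETCH_OK,        /* file was found and could be copied */
    FETCH_EXCLUDED,  /* case 1: never belongs in the file map */
    FETCH_MISSING,   /* case 2: file vanished on the source, not an error */
    FETCH_ERROR      /* any other failure */
} fetch_result;

/* Case 1: names that should never be fetched (only one shown here). */
static bool
excluded_from_filemap(const char *path)
{
    return strcmp(path, "postmaster.pid") == 0;
}

/* Case 2: try to open the file, tolerating its disappearance. */
static fetch_result
try_fetch(const char *datadir, const char *path)
{
    char  fullpath[1024];
    FILE *fp;
    int   len;

    if (excluded_from_filemap(path))
        return FETCH_EXCLUDED;

    len = snprintf(fullpath, sizeof(fullpath), "%s/%s", datadir, path);
    if (len < 0 || (size_t) len >= sizeof(fullpath))
        return FETCH_ERROR;

    fp = fopen(fullpath, "rb");
    if (fp == NULL)
        return (errno == ENOENT) ? FETCH_MISSING : FETCH_ERROR;

    /* A real implementation would read and copy the contents here. */
    fclose(fp);
    return FETCH_OK;
}
```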

> BTW, even pg_basebackup doesn't skip the file of temporary table.
> But maybe we should change this, too.
>
> Also pg_rewind doesn't skip the files that pg_basebackup does. ISTM
> that basically pg_rewind can safely skip any files that pg_basebackup does.
> So probably we need to reconsider which file to make pg_rewind skip.

pg_rewind and basebackup.c are beginning to share many things in this
area. Perhaps we should consider a common routine, in let's say
libpgcommon, that decides whether a file can be safely skipped based
on its path name in PGDATA.
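
One possible shape for such a shared routine is a small rule table consulted by both callers. The names skip_rule, skip_rules and path_is_skippable() are hypothetical, and the table is intentionally incomplete: it lists only the two names mentioned in this thread, not the full set that basebackup.c excludes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { MATCH_EXACT, MATCH_PREFIX } match_kind;

typedef struct
{
    const char *pattern;   /* path relative to PGDATA */
    match_kind  kind;      /* whole-path match or prefix match */
} skip_rule;

/* Illustrative, incomplete list: only the names mentioned in this thread. */
static const skip_rule skip_rules[] = {
    {"postmaster.pid", MATCH_EXACT},
    {"pg_xlog/xlogtemp.", MATCH_PREFIX},
};

/* Return true if 'path' matches any rule in the table. */
static bool
path_is_skippable(const char *path)
{
    size_t i;

    for (i = 0; i < sizeof(skip_rules) / sizeof(skip_rules[0]); i++)
    {
        const skip_rule *rule = &skip_rules[i];

        if (rule->kind == MATCH_EXACT &&
            strcmp(path, rule->pattern) == 0)
            return true;
        if (rule->kind == MATCH_PREFIX &&
            strncmp(path, rule->pattern, strlen(rule->pattern)) == 0)
            return true;
    }
    return false;
}
```

Keeping the rules in one table means a new exclusion is added in a single place and applies to every caller that uses the routine.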
--
Michael