Thread: pg_rewind and xlogtemp files
Hi all,

I just bumped into this report regarding pg_rewind, which also affects the version shipped in src/bin/pg_rewind:
https://github.com/vmware/pg_rewind/issues/45

In short, if the source server's filemap includes xlogtemp files, pg_rewind will fail with something like the following error:

error reading xlog record: record with zero length at 1/D5000090
unexpected result while fetching remote files: ERROR: could not open file "pg_xlog/xlogtemp.23056" for reading: No such file or directory
The servers diverged at WAL position 1/D4A081B0 on timeline 174.
Rewinding from Last common checkpoint at 1/D30A5650 on timeline 174

As pointed out by dev1ant on the original bug report, process_remote_file should ignore files named pg_xlog/xlogtemp.*, and I think that this is the right thing to do. Any objections to a patch that at the same time makes "xlogtemp." a define declaration in xlog_internal.h?

Regards,
--
Michael
On Wed, Jun 17, 2015 at 3:17 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> As pointed by dev1ant on the original bug report, process_remote_file
> should ignore files named as pg_xlog/xlogtemp.*, and I think that this
> is the right thing to do. Any objections for a patch that at the same
> time makes "xlogtemp." a define declaration in xlog_internal.h?

And attached is a patch following those lines.
--
Michael
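
The patch itself is not reproduced here. As a rough illustration of the kind of check being proposed (not the patch), a minimal sketch in C might look like the following. The macro name XLOG_TEMP_PREFIX and the helper is_xlogtemp_path are hypothetical; the thread only says that "xlogtemp." would become a define in xlog_internal.h and that process_remote_file should ignore matching paths.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical macro name; the thread only proposes defining "xlogtemp." */
#define XLOG_TEMP_PREFIX "xlogtemp."

/* WAL directory as it appears in the error message quoted in this thread */
#define XLOG_DIR "pg_xlog/"

/*
 * Return true if path (relative to the data directory) names a temporary
 * WAL file such as "pg_xlog/xlogtemp.23056".
 */
bool
is_xlogtemp_path(const char *path)
{
    /* Adjacent string literals are concatenated at compile time */
    const char *prefix = XLOG_DIR XLOG_TEMP_PREFIX;

    return strncmp(path, prefix, strlen(prefix)) == 0;
}
```

A caller building the list of files to fetch would skip any entry for which is_xlogtemp_path() returns true, so a file that disappears between listing and fetching is never requested.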
On June 17, 2015, at 9:48, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Wed, Jun 17, 2015 at 3:17 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
>> As pointed by dev1ant on the original bug report, process_remote_file
>> should ignore files named as pg_xlog/xlogtemp.*, and I think that this
>> is the right thing to do. Any objections for a patch that at the same
>> time makes "xlogtemp." a define declaration in xlog_internal.h?

Declaration seems to be the right thing.
Another problem I’ve caught twice already in the same test:
error reading xlog record: record with zero length at 0/78000090
unexpected result while fetching remote files: ERROR: could not open file "base/13003/t6_2424967" for reading: No such file or directory
The servers diverged at WAL position 0/76BADD50 on timeline 303.
Rewinding from Last common checkpoint at 0/7651F870 on timeline 303
I don’t know if this problem could be solved the same way (by skipping such files)… Should I start a new thread for that?

> And attached is a patch following those lines.
> --
> Michael
> <20150617_rewind_xlogtemp.patch>
On Wed, Jun 17, 2015 at 4:57 PM, Vladimir Borodin <root@simply.name> wrote:
> On June 17, 2015, at 9:48, Michael Paquier <michael.paquier@gmail.com> wrote:
>> On Wed, Jun 17, 2015 at 3:17 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
>>> As pointed by dev1ant on the original bug report, process_remote_file
>>> should ignore files named as pg_xlog/xlogtemp.*, and I think that this
>>> is the right thing to do. Any objections for a patch that at the same
>>> time makes "xlogtemp." a define declaration in xlog_internal.h?
>
> Declaration seems to be the right thing.
>
> Another problem I’ve caught twice already in the same test:
>
> error reading xlog record: record with zero length at 0/78000090
> unexpected result while fetching remote files: ERROR: could not open file "base/13003/t6_2424967" for reading: No such file or directory
> The servers diverged at WAL position 0/76BADD50 on timeline 303.
> Rewinding from Last common checkpoint at 0/7651F870 on timeline 303
>
> I don’t know if this problem could be solved the same way (by skipping such
> files)… Should I start a new thread for that?

That's the file of a temporary table, so there is no need to copy it from the source server. pg_rewind can safely skip such files, I think.

But even if we make pg_rewind skip such files, we would still hit a similar problem; see the problem that I reported in another thread. To address this type of problem completely, we would need to apply the fix that is being discussed in that thread:
http://www.postgresql.org/message-id/CAHGQGwEdsNgeNZo+GyrzZtjW_TkC=XC6XxrjuAZ7=X_cj1aHHg@mail.gmail.com

BTW, even pg_basebackup doesn't skip the files of temporary tables. But maybe we should change this, too.

Also, pg_rewind doesn't skip the files that pg_basebackup does. ISTM that basically pg_rewind can safely skip any files that pg_basebackup does. So probably we need to reconsider which files pg_rewind should skip.

Regards,
--
Fujii Masao
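
To make the temporary-table case concrete: the file in the error above, base/13003/t6_2424967, follows PostgreSQL's naming for temporary relation files, t<backend id>_<relfilenode>, optionally followed by a fork suffix or a segment number. A hedged sketch of a purely name-based check is shown below. The function name is hypothetical, and nothing in this thread says that pg_rewind or pg_basebackup implements the check this way.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the last component of path looks like a temporary
 * relation file: 't', digits (backend id), '_', digits (relfilenode),
 * then end of name, a fork suffix ("_fsm", ...) or a segment (".1", ...).
 * This inspects the file name only; it does not verify the directory.
 */
bool
looks_like_temp_relation_file(const char *path)
{
    const char *name = strrchr(path, '/');

    name = (name != NULL) ? name + 1 : path;

    if (*name++ != 't')
        return false;

    /* Backend id: at least one digit */
    if (!isdigit((unsigned char) *name))
        return false;
    while (isdigit((unsigned char) *name))
        name++;

    if (*name++ != '_')
        return false;

    /* Relfilenode: at least one digit */
    if (!isdigit((unsigned char) *name))
        return false;
    while (isdigit((unsigned char) *name))
        name++;

    return *name == '\0' || *name == '_' || *name == '.';
}
```

Because the check looks only at the file name, a real implementation would presumably also restrict it to the directories that hold relation files.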
On Wed, Jun 17, 2015 at 9:07 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Jun 17, 2015 at 4:57 PM, Vladimir Borodin <root@simply.name> wrote:
>> On June 17, 2015, at 9:48, Michael Paquier <michael.paquier@gmail.com> wrote:
>>> On Wed, Jun 17, 2015 at 3:17 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
>>>> As pointed by dev1ant on the original bug report, process_remote_file
>>>> should ignore files named as pg_xlog/xlogtemp.*, and I think that this
>>>> is the right thing to do. Any objections for a patch that at the same
>>>> time makes "xlogtemp." a define declaration in xlog_internal.h?
>>
>> Declaration seems to be the right thing.
>>
>> Another problem I’ve caught twice already in the same test:
>>
>> error reading xlog record: record with zero length at 0/78000090
>> unexpected result while fetching remote files: ERROR: could not open file
>> "base/13003/t6_2424967" for reading: No such file or directory
>> The servers diverged at WAL position 0/76BADD50 on timeline 303.
>> Rewinding from Last common checkpoint at 0/7651F870 on timeline 303
>>
>> I don’t know if this problem could be solved the same way (by skipping such
>> files)… Should I start a new thread for that?
>
> That's the file of the temporary table, so there is no need to copy it
> from the source server. pg_rewind can safely skip such file, I think.

Yes. It is actually recommended to copy them manually if needed from the archive (per the docs).

> But even if we make pg_rewind skip such file, we would still get the
> similar problem. You can see the problem that I reported in other thread.
> In order to address this type of problem completely, we would need
> to apply the fix that is been discussed in that thread.
> http://www.postgresql.org/message-id/CAHGQGwEdsNgeNZo+GyrzZtjW_TkC=XC6XxrjuAZ7=X_cj1aHHg@mail.gmail.com

There are two things to take into account here, in my opinion:
1) Ignoring files that should not be added to the filemap, like postmaster.pid, xlogtemp, etc.
2) Bypassing files that can be added to the filemap, for example a relation file or an FSM file, and not erroring out if they are missing.

> BTW, even pg_basebackup doesn't skip the file of temporary table.
> But maybe we should change this, too.
>
> Also pg_rewind doesn't skip the files that pg_basebackup does. ISTM
> that basically pg_rewind can safely skip any files that pg_basebackup does.
> So probably we need to reconsider which file to make pg_rewind skip.

pg_rewind and basebackup.c are beginning to share many things in this area; perhaps we should consider a common routine in, let's say, libpqcommon to decide whether a file can be safely skipped depending on its path name in PGDATA.
--
Michael
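
The two points above, and the idea of a shared path-based routine, could be sketched roughly as follows. This is only an illustration under stated assumptions: the names path_can_be_skipped, open_source_file and SourceFileStatus are hypothetical, the skip lists contain only the examples mentioned in this thread, and a local fopen() stands in for fetching a file from the source server.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Illustrative lists only: just the examples named in this thread */
static const char *const skip_exact[] = {"postmaster.pid", NULL};
static const char *const skip_prefix[] = {"pg_xlog/xlogtemp.", NULL};

/* Point 1: decide from the path alone whether a file should be ignored */
bool
path_can_be_skipped(const char *path)
{
    for (int i = 0; skip_exact[i] != NULL; i++)
        if (strcmp(path, skip_exact[i]) == 0)
            return true;

    for (int i = 0; skip_prefix[i] != NULL; i++)
        if (strncmp(path, skip_prefix[i], strlen(skip_prefix[i])) == 0)
            return true;

    return false;
}

typedef enum
{
    SRC_OPENED,                 /* file opened, *out is valid */
    SRC_SKIPPED,                /* ignored because of its path */
    SRC_MISSING,                /* vanished since it was listed */
    SRC_ERROR                   /* any other failure */
} SourceFileStatus;

/*
 * Point 2: a file that was listed but no longer exists is reported as
 * SRC_MISSING rather than being treated as a fatal error.
 */
SourceFileStatus
open_source_file(const char *path, FILE **out)
{
    *out = NULL;

    if (path_can_be_skipped(path))
        return SRC_SKIPPED;

    *out = fopen(path, "rb");
    if (*out != NULL)
        return SRC_OPENED;

    return (errno == ENOENT) ? SRC_MISSING : SRC_ERROR;
}
```

Keeping the path test separate from the open step is what would let the same routine be shared between pg_rewind and basebackup.c, with each caller deciding what to do on SRC_MISSING.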