Thread: Couple of issues with pg_xlogdump

Couple of issues with pg_xlogdump

From
Pavan Deolasee
Date:

Hello,

I was playing with pg_xlogdump in the HEAD and found a few issues.

1. Tried compiling pg_xlogdump via PGXS mechanism and it fails with the following error:
make: *** No rule to make target `/home/pavan.deolasee/work/pgsql/postgresql/install/lib/pgxs/src/makefiles/../../src/backend/access/transam/xlogreader.c', needed by `xlogreader.c'.  Stop.

There are no issues if the sources are compiled directly inside the contrib module

2. I created a fresh database cluster, created a table and COPY IN a million records in the table and then stopped the server. I then tried to dump the xlog files from pg_xlog directory.

[pavan.deolasee@puppetserver pg_xlogdump]$ ls ~/db93head/pg_xlog/
000000010000000000000004  000000010000000000000005  000000010000000000000006  000000010000000000000007  000000010000000000000008  archive_status

[pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump  ~/db93head/pg_xlog/000000010000000000000005
pg_xlogdump: FATAL:  could not find a valid record after 0/5000000
[pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump  ~/db93head/pg_xlog/000000010000000000000006
pg_xlogdump: FATAL:  could not find a valid record after 0/6000000
[pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump  ~/db93head/pg_xlog/000000010000000000000007
pg_xlogdump: FATAL:  could not find a valid record after 0/7000000
[pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump  ~/db93head/pg_xlog/000000010000000000000008
pg_xlogdump: FATAL:  could not find a valid record after 0/8000000

So pg_xlogdump gives error for all WAL files except the first one. Should it not have printed the WAL records from these files ?

The first file prints ok with this at the end:

[pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump  ~/db93head/pg_xlog/000000010000000000000004 | tail -n 2
pg_xlogdump: FATAL:  error in WAL record at 0/4C7F208: record with zero length at 0/4C7F270

rmgr: XLOG        len (rec/tot):     72/   104, tx:          0, lsn: 0/04C7F1A0, prev 0/04C7F170, bkp: 0000, desc: checkpoint: redo 0/4400D70; tli 1; prev tli 1; fpw true; xid 0/1807; oid 24576; multi 1; offset 0; oldest xid 1795 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; online
rmgr: XLOG        len (rec/tot):     72/   104, tx:          0, lsn: 0/04C7F208, prev 0/04C7F1A0, bkp: 0000, desc: checkpoint: redo 0/4C7F208; tli 1; prev tli 1; fpw true; xid 0/1807; oid 16387; multi 1; offset 0; oldest xid 1795 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown

But even this is a bit surprising because I would have expected the shutdown checkpoint WAL record in the last WAL file. Just to note, I stopped the server only once at the end of loading the data.

I hope I am not doing something terribly wrong here.

3. The usage of pg_xlogdump shows this:
Usage:
  pg_xlogdump [OPTION] [STARTSEG [ENDSEG]]

Looking at the usage, one might feel that the STARTSEG and ENDSEG both are optional. But if I try to invoke pg_xlogdump without any argument, it fails.

Thanks,
Pavan

Re: Couple of issues with pg_xlogdump

From
Andres Freund
Date:
On 2013-04-23 14:51:05 +0530, Pavan Deolasee wrote:
> Hello,
> 
> I was playing with pg_xlogdump in the HEAD and found a few issues.
> 
> 1. Tried compiling pg_xlogdump via PGXS mechanism and it fails with the
> following error:
> make: *** No rule to make target
>
`/home/pavan.deolasee/work/pgsql/postgresql/install/lib/pgxs/src/makefiles/../../src/backend/access/transam/xlogreader.c',
> needed by `xlogreader.c'.  Stop.
> 
> There are no issues if the sources are compiled directly inside the contrib
> module

Yes, its not supposed to work. In some previous thread I was suggesting
to write out an explicit error but the reactions where mixed, so I
didn't pursue it further. I guess I should submit something more than a
POC patch then...

> 2. I created a fresh database cluster, created a table and COPY IN a
> million records in the table and then stopped the server. I then tried to
> dump the xlog files from pg_xlog directory.
> 
> [pavan.deolasee@puppetserver pg_xlogdump]$ ls ~/db93head/pg_xlog/
> 000000010000000000000004  000000010000000000000005
> 000000010000000000000006  000000010000000000000007
> 000000010000000000000008  archive_status
> 
> [pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump
> ~/db93head/pg_xlog/000000010000000000000005
> pg_xlogdump: FATAL:  could not find a valid record after 0/5000000
> [pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump
> ~/db93head/pg_xlog/000000010000000000000006
> pg_xlogdump: FATAL:  could not find a valid record after 0/6000000
> [pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump
> ~/db93head/pg_xlog/000000010000000000000007
> pg_xlogdump: FATAL:  could not find a valid record after 0/7000000
> [pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump
> ~/db93head/pg_xlog/000000010000000000000008
> pg_xlogdump: FATAL:  could not find a valid record after 0/8000000
> 
> So pg_xlogdump gives error for all WAL files except the first one. Should
> it not have printed the WAL records from these files ?

Probably not. Those are likely renamed wal files that do not yet contain
valid data.

> The first file prints ok with this at the end:
> 
> [pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump
> ~/db93head/pg_xlog/000000010000000000000004 | tail -n 2
> pg_xlogdump: FATAL:  error in WAL record at 0/4C7F208: record with zero
> length at 0/4C7F270

Which this confirms. This is likely the current end of wal. If you look
at pg_current_xlog_location() after starting the server again, it should
show an address nearby?

> rmgr: XLOG        len (rec/tot):     72/   104, tx:          0, lsn:
> 0/04C7F1A0, prev 0/04C7F170, bkp: 0000, desc: checkpoint: redo 0/4400D70;
> tli 1; prev tli 1; fpw true; xid 0/1807; oid 24576; multi 1; offset 0;
> oldest xid 1795 in DB 1; oldest multi 1 in DB 1; oldest running xid 0;
> online
> rmgr: XLOG        len (rec/tot):     72/   104, tx:          0, lsn:
> 0/04C7F208, prev 0/04C7F1A0, bkp: 0000, desc: checkpoint: redo 0/4C7F208;
> tli 1; prev tli 1; fpw true; xid 0/1807; oid 16387; multi 1; offset 0;
> oldest xid 1795 in DB 1; oldest multi 1 in DB 1; oldest running xid 0;
> shutdown
> 
> But even this is a bit surprising because I would have expected the
> shutdown checkpoint WAL record in the last WAL file. Just to note, I
> stopped the server only once at the end of loading the data.

Same reasoning, we rename/recycle wal files into place before they are
used. I think its RemoveOldXlogFiles or so that does that.

> 3. The usage of pg_xlogdump shows this:
> Usage:
>   pg_xlogdump [OPTION] [STARTSEG [ENDSEG]]
> 
> Looking at the usage, one might feel that the STARTSEG and ENDSEG both are
> optional. But if I try to invoke pg_xlogdump without any argument, it fails.

It works without either if you use explicit options like -s STARTADDR
and -p PATH which is frequently useful to just print a few records at
the correct point. I am not sure how could put that in there without
making it too complicated. Any suggestions?

Thanks for looking at this!

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Couple of issues with pg_xlogdump

From
Pavan Deolasee
Date:



On Tue, Apr 23, 2013 at 3:00 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-04-23 14:51:05 +0530, Pavan Deolasee wrote:
> Hello,
>
> I was playing with pg_xlogdump in the HEAD and found a few issues.
>
> 1. Tried compiling pg_xlogdump via PGXS mechanism and it fails with the
> following error:
> make: *** No rule to make target
> `/home/pavan.deolasee/work/pgsql/postgresql/install/lib/pgxs/src/makefiles/../../src/backend/access/transam/xlogreader.c',
> needed by `xlogreader.c'.  Stop.
>
> There are no issues if the sources are compiled directly inside the contrib
> module

Yes, its not supposed to work. In some previous thread I was suggesting
to write out an explicit error but the reactions where mixed, so I
didn't pursue it further. I guess I should submit something more than a
POC patch then...


Yeah, I think we can print a user friendly error if USE_PGXS is set. Or at least remove its handling from the Makefile
 

Which this confirms. This is likely the current end of wal. If you look
at pg_current_xlog_location() after starting the server again, it should
show an address nearby?


Oh yes, you are right. Again, could there be a better way to report empty WAL files ? A general tendency would be to look at the last few WAL files in case failures or crashes and they are likely to be empty.
 


It works without either if you use explicit options like -s STARTADDR
and -p PATH which is frequently useful to just print a few records at
the correct point. I am not sure how could put that in there without
making it too complicated. Any suggestions?


Ah ok. Can we mention these details at in the documentation ?
http://www.postgresql.org/docs/devel/static/pgxlogdump.html

Thanks,
Pavan

Re: Couple of issues with pg_xlogdump

From
Andres Freund
Date:
On 2013-04-23 15:16:05 +0530, Pavan Deolasee wrote:
> > Which this confirms. This is likely the current end of wal. If you look
> > at pg_current_xlog_location() after starting the server again, it should
> > show an address nearby?
> >
> >
> Oh yes, you are right. Again, could there be a better way to report empty
> WAL files ? A general tendency would be to look at the last few WAL files
> in case failures or crashes and they are likely to be empty.

Hm. I don't really see what we could sensibly and easily do, but perhaps
I just lived with it for too long ;)

> > It works without either if you use explicit options like -s STARTADDR
> > and -p PATH which is frequently useful to just print a few records at
> > the correct point. I am not sure how could put that in there without
> > making it too complicated. Any suggestions?
> >
> >
> Ah ok. Can we mention these details at in the documentation ?
> http://www.postgresql.org/docs/devel/static/pgxlogdump.html

So something like:
At least one of STARTSEG, --start and --path or --rmgr=list has to be
specified.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Couple of issues with pg_xlogdump

From
Alvaro Herrera
Date:
Andres Freund escribió:
> On 2013-04-23 15:16:05 +0530, Pavan Deolasee wrote:

> > > It works without either if you use explicit options like -s STARTADDR
> > > and -p PATH which is frequently useful to just print a few records at
> > > the correct point. I am not sure how could put that in there without
> > > making it too complicated. Any suggestions?
> > >
> > Ah ok. Can we mention these details at in the documentation ?
> > http://www.postgresql.org/docs/devel/static/pgxlogdump.html
>
> So something like:
> At least one of STARTSEG, --start and --path or --rmgr=list has to be
> specified.

I think we need more than one synopsis line.  Maybe

Usage: pg_xlogdump [OPTION] --path=PATH --start=STARTPOS pg_xlogdump [OPTION] [STARTSEG [ENDSEG]]

And then, under "options", do not list --path and --start (because
that'd imply they can be used when STARTSEG is specified, which I
understand they cannot).

Do we have any other possible operation mode?  IIRC those are the only
two possible modes.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Couple of issues with pg_xlogdump

From
Andres Freund
Date:
On 2013-04-23 11:51:06 -0300, Alvaro Herrera wrote:
> Andres Freund escribió:
> > On 2013-04-23 15:16:05 +0530, Pavan Deolasee wrote:
>
> > > > It works without either if you use explicit options like -s STARTADDR
> > > > and -p PATH which is frequently useful to just print a few records at
> > > > the correct point. I am not sure how could put that in there without
> > > > making it too complicated. Any suggestions?
> > > >
> > > Ah ok. Can we mention these details at in the documentation ?
> > > http://www.postgresql.org/docs/devel/static/pgxlogdump.html
> >
> > So something like:
> > At least one of STARTSEG, --start and --path or --rmgr=list has to be
> > specified.
>
> I think we need more than one synopsis line.  Maybe
>
> Usage:
>   pg_xlogdump [OPTION] --path=PATH --start=STARTPOS
>   pg_xlogdump [OPTION] [STARTSEG [ENDSEG]]
>
> And then, under "options", do not list --path and --start (because
> that'd imply they can be used when STARTSEG is specified, which I
> understand they cannot).

Both can be used. If you specify --start and STARTSEG the address has to
be contained in the file:
    if (XLogRecPtrIsInvalid(private.startptr))        XLogSegNoOffsetToRecPtr(segno, 0, private.startptr);    else if
(!XLByteInSeg(private.startptr,segno))    {        fprintf(stderr,                "%s: start log position %X/%X is not
insidefile \"%s\"\n",                progname,                (uint32) (private.startptr >> 32),
(uint32)private.startptr,                fname);        goto bad_argument;    } 

--path and STARTSEG/ENDSEG also makes sense, it will be used to locate
the file.

> Do we have any other possible operation mode?  IIRC those are the only
> two possible modes.

I think so as well.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Couple of issues with pg_xlogdump

From
Jeff Janes
Date:
On Tue, Apr 23, 2013 at 2:30 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-04-23 14:51:05 +0530, Pavan Deolasee wrote:

> [pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump
> ~/db93head/pg_xlog/000000010000000000000008
> pg_xlogdump: FATAL:  could not find a valid record after 0/8000000
>
> So pg_xlogdump gives error for all WAL files except the first one. Should
> it not have printed the WAL records from these files ?

Probably not. Those are likely renamed wal files that do not yet contain
valid data.

But they do contain valid data, just not for the name the file currently has.  If you can guess what the pre-recycle name was, you can rename the file (as an out-of-tree copy of course) and then get the dump out of it, which I've found can be quite useful.  Perhaps there should be an option for it to "press on regardless" and dump the contents as if it were the pre-recycled file.  Or maybe just have the error message report to you what filename the data it sees would be valid for, so you can know what to rename it to without needing to guess.  I don't really know how this would interact with STARTSEG and ENDSEG range, though, so perhaps it would only apply when just one file is given, not a range.
 
Cheers,

Jeff