Thread: [HACKERS] pg_waldump command line arguments

[HACKERS] pg_waldump command line arguments

From
Robert Haas
Date:
pg_waldump --help claims that you run it like this:

Usage: pg_waldump [OPTION]... [STARTSEG [ENDSEG]]

And https://www.postgresql.org/docs/10/static/pgwaldump.html agrees.
Since square brackets indicate optional arguments, this sort of makes
it sound like running pg_waldump with no arguments ought to work.  But
it doesn't:

$ pg_waldump
pg_waldump: no arguments specified
Try "pg_waldump --help" for more information.

If we removed the error check that displays "pg_waldump: no arguments
specified", then it would still fail, but with a more useful error
message:

$ pg_waldump --
pg_waldump: no start WAL location given
Try "pg_waldump --help" for more information.

That message should perhaps be changed to say that you specified
neither the start WAL location nor the start WAL file, but even as it
stands it's certainly better than "no arguments specified".
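A minimal sketch of the check ordering described above: drop the blanket "no arguments" test and fail only when neither a start location nor a start file was given. The ParsedArgs struct and the message wording are hypothetical and only illustrate the logic; this is not pg_waldump's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what was seen on the command line. */
typedef struct
{
    bool have_startptr;   /* -s/--start was given */
    bool have_startseg;   /* a STARTSEG positional argument was given */
} ParsedArgs;

/*
 * Return NULL if a starting point is known, otherwise the error text to
 * print before the usual "Try --help" hint.
 */
static const char *
check_start_args(const ParsedArgs *args)
{
    if (!args->have_startptr && !args->have_startseg)
        return "neither start WAL location nor start WAL file given";
    return NULL;
}
```

Running with no arguments at all then falls through to this more specific error instead of "no arguments specified".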

Another problem is that if the file name you pass to pg_waldump
doesn't happen to have a name that looks like a WAL file, it fails in
a completely ridiculous fashion:

$ pg_waldump /etc/passwd
pg_waldump: FATAL:  could not find file "000000017C55C16F000000FF": No
such file or directory

The problem appears to be that fuzzy_open_file() successfully opens
the file and then invokes XLogFromFileName() on the filename.
XLogFromFileName() calls sscanf() on the file name without any error
checking, which I think results in leaving private.timeline
uninitialized and setting segno to whatever preexisting garbage was in
the log and seg variables declared inside XLogFromFileName(),
resulting in an attempt to find a more or less completely random file.
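For illustration, a checked parse could reject such names up front. parse_wal_file_name() is a hypothetical helper; the sketch assumes the default 16MB segment size (256 segments per log id) and applies the same "exactly 24 upper-case hex digits" shape test that the IsXLogFileName() macro uses, before checking the sscanf() return value.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WAL_FNAME_LEN   24
/* Assumption: default 16MB WAL segments. */
#define WAL_SEG_SIZE    (16 * 1024 * 1024)
#define SEGS_PER_XLOGID (UINT64_C(0x100000000) / WAL_SEG_SIZE)

/*
 * Hypothetical checked variant of XLogFromFileName(): returns false
 * instead of leaving *tli and *segno holding garbage.
 */
static bool
parse_wal_file_name(const char *fname, uint32_t *tli, uint64_t *segno)
{
    unsigned int t, log, seg;

    /* Reject anything that is not 24 upper-case hex digits. */
    if (strlen(fname) != WAL_FNAME_LEN ||
        strspn(fname, "0123456789ABCDEF") != WAL_FNAME_LEN)
        return false;

    /* All three 8-digit fields must convert. */
    if (sscanf(fname, "%08X%08X%08X", &t, &log, &seg) != 3)
        return false;

    *tli = t;
    *segno = (uint64_t) log * SEGS_PER_XLOGID + seg;
    return true;
}
```

With a check like this, the "passwd" component of /etc/passwd produces a clear error rather than a lookup for a random segment.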

A slightly broader concern is whether we need to require the start
position at all.  It seems like one could locate the WAL directory
using the existing logic, then search for the earliest file.  It might
be a little unclear what "earliest" means when multiple timelines are
present, but I bet we could come up with some behavior that would be
convenient for most users.  It would be quite handy to be able to run
this without arguments (or just with -z) and have it process all the
WAL files that you've got on hand.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_waldump command line arguments

From
Ashutosh Bapat
Date:
On Fri, Jun 16, 2017 at 2:38 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> A slightly broader concern is whether we need to require the start
> position at all.  It seems like one could locate the WAL directory
> using the existing logic, then search for the earliest file.  It might
> be a little unclear what "earliest" means when multiple timelines are
> present, but I bet we could come up with some behavior that would be
> convenient for most users.

We already have some default behaviour defined
--
-t timeline
--timeline=timeline

Timeline from which to read log records. The default is to use the
value in startseg, if that is specified; otherwise, the default is 1.
--

So, if startseg is not provided, choose the earliest file in the
default timeline (1, or the timeline given with -t when specified).
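A minimal sketch of that rule, assuming the caller has already read the WAL directory's entries into an array. earliest_segment_on_timeline() is hypothetical. Because WAL file names are fixed-width upper-case hex (timeline, then log, then segment), strcmp() order among names sharing a timeline prefix matches segment order, so no numeric parsing is needed.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define WAL_FNAME_LEN 24

/*
 * Hypothetical helper: return the earliest WAL segment name on timeline
 * tli among names[0..n-1], or NULL if there is none.
 */
static const char *
earliest_segment_on_timeline(const char *const names[], size_t n,
                             unsigned int tli)
{
    char prefix[9];
    const char *best = NULL;

    snprintf(prefix, sizeof(prefix), "%08X", tli);

    for (size_t i = 0; i < n; i++)
    {
        const char *name = names[i];

        /* Skip non-segment entries such as .history or .partial files. */
        if (strlen(name) != WAL_FNAME_LEN ||
            strspn(name, "0123456789ABCDEF") != WAL_FNAME_LEN)
            continue;
        /* Skip segments belonging to other timelines. */
        if (strncmp(name, prefix, 8) != 0)
            continue;
        if (best == NULL || strcmp(name, best) < 0)
            best = name;
    }
    return best;
}
```

The caller would pass 1 unless -t was given, matching the documented default.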

> It would be quite handy to be able to run
> this without arguments (or just with -z) and have it process all the
> WAL files that you've got on hand.
>

+1.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] pg_waldump command line arguments

From
Andres Freund
Date:
On 2017-06-15 17:08:23 -0400, Robert Haas wrote:
> pg_waldump --help claims that you run it like this:
> 
> Usage:
>   pg_waldump [OPTION]... [STARTSEG [ENDSEG]]
> 
> And https://www.postgresql.org/docs/10/static/pgwaldump.html agrees.
> Since square brackets indicate optional arguments, this sort of makes
> it sound like running pg_waldump with no arguments ought to work.  But
> it doesn't:

Well, not really, it indicates that positional arguments are allowed,
but not required.  You can get by with -s / -e, which are sometimes
important, if you want to look at multiple timelines etc.
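For reference, -s and -e take WAL positions in the X/X notation. The sketch below shows how such a position maps to the segment file containing it. parse_lsn() and segment_file_name() are hypothetical helpers, and the default 16MB segment size is assumed; this shows the arithmetic only, not pg_waldump's code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: default 16MB WAL segments. */
#define WAL_SEG_SIZE    UINT64_C(16777216)
#define SEGS_PER_XLOGID (UINT64_C(0x100000000) / WAL_SEG_SIZE)

/* Parse a position written as X/X, e.g. "0/16B3740". */
static bool
parse_lsn(const char *s, uint64_t *lsn)
{
    unsigned int hi, lo;

    if (sscanf(s, "%X/%X", &hi, &lo) != 2)
        return false;
    *lsn = ((uint64_t) hi << 32) | lo;
    return true;
}

/* Write the name of the segment holding byte 'lsn' on timeline 'tli'. */
static void
segment_file_name(char *buf, size_t len, unsigned int tli, uint64_t lsn)
{
    uint64_t segno = lsn / WAL_SEG_SIZE;

    snprintf(buf, len, "%08X%08X%08X", tli,
             (unsigned int) (segno / SEGS_PER_XLOGID),
             (unsigned int) (segno % SEGS_PER_XLOGID));
}
```

When the end position is exclusive, the last segment needed is the one containing end - 1, not end itself.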


> A slightly broader concern is whether we need to require the start
> position at all.  It seems like one could locate the WAL directory
> using the existing logic, then search for the earliest file.

"earliest file" isn't actually that trivial to determine if there are
timelines etc. But leaving that aside, it'll frequently be so much data
that the output would be pretty much useless, no?  I
think if we were to add a bit more magic, it'd make more sense to parse
pg_control and start at the last flushed point of WAL and go forward,
especially with -f.


> It might be a little unclear what "earliest" means when multiple
> timelines are present, but I bet we could come up with some behavior
> that would be convenient for most users.  It would be quite handy to
> be able to run this without arguments (or just with -z) and have it
> process all the WAL files that you've got on hand.

With -z I agree, probably best by parsing pg_control and parsing
[checkpoint - 1, minRecoveryPoint) or such.

I'm willing to review some patches here, but I don't plan to personally
work on patches around this...

Greetings,

Andres Freund