Thread: [HACKERS] pg_waldump command line arguments
pg_waldump --help claims that you run it like this:

    Usage:
      pg_waldump [OPTION]... [STARTSEG [ENDSEG]]

And https://www.postgresql.org/docs/10/static/pgwaldump.html agrees. Since square brackets indicate optional arguments, this sort of makes it sound like running pg_waldump with no arguments ought to work. But it doesn't:

    $ pg_waldump
    pg_waldump: no arguments specified
    Try "pg_waldump --help" for more information.

If we removed the error check that displays "pg_waldump: no arguments specified", then it would still fail, but with a more useful error message:

    $ pg_waldump --
    pg_waldump: no start WAL location given
    Try "pg_waldump --help" for more information.

That message ought to perhaps be changed to say that you specified neither the start WAL location nor the start WAL file, but even as it stands it's certainly better than "no arguments specified".

Another problem is that if the file name you pass to pg_waldump doesn't happen to have a name that looks like a WAL file, it fails in a completely ridiculous fashion:

    $ pg_waldump /etc/passwd
    pg_waldump: FATAL: could not find file "000000017C55C16F000000FF": No such file or directory

The problem appears to be that fuzzy_open_file() successfully opens the file and then invokes XLogFromFileName() on the filename. XLogFromFileName() calls sscanf() on the file name without any error checking, which I think results in leaving private.timeline uninitialized and setting segno to whatever preexisting garbage was in the log and segno variables declared inside XLogFromFileName(), resulting in an attempt to find a more or less completely random file.

A slightly broader concern is whether we need to require the start position at all. It seems like one could locate the WAL directory using the existing logic, then search for the earliest file. It might be a little unclear what "earliest" means when multiple timelines are present, but I bet we could come up with some behavior that would be convenient for most users. It would be quite handy to be able to run this without arguments (or just with -z) and have it process all the WAL files that you've got on hand.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
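
To illustrate the unchecked sscanf() problem described above, here is a minimal sketch of what a checked parse could look like. The names parse_wal_file_name and SEGMENTS_PER_XLOGID are hypothetical, not PostgreSQL identifiers, and the sketch assumes the default 16MB segment size (256 segments per 4GB of WAL). It is not the actual PostgreSQL code, only one possible shape for the check.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: default 16MB segments, i.e. 256 segments per 4GB of WAL. */
#define SEGMENTS_PER_XLOGID 256u

/*
 * Hypothetical helper: parse the base name of a WAL segment file.
 * Returns false, leaving the outputs untouched, if the name does not
 * look like a WAL segment.
 */
static bool
parse_wal_file_name(const char *fname, uint32_t *tli, uint64_t *segno)
{
    unsigned int t, log, seg;

    /* A segment name is exactly 24 uppercase hexadecimal digits. */
    if (strlen(fname) != 24 ||
        strspn(fname, "0123456789ABCDEF") != 24)
        return false;

    /* sscanf() returns the number of fields converted; require all three. */
    if (sscanf(fname, "%08X%08X%08X", &t, &log, &seg) != 3)
        return false;

    *tli = t;
    *segno = (uint64_t) log * SEGMENTS_PER_XLOGID + seg;
    return true;
}
```

With a check along these lines, a name such as "passwd" would be rejected up front rather than turned into a more or less random segment number.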
On Fri, Jun 16, 2017 at 2:38 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> A slightly broader concern is whether we need to require the start
> position at all. It seems like one could locate the WAL directory
> using the existing logic, then search for the earliest file. It might
> be a little unclear what "earliest" means when multiple timelines are
> present, but I bet we could come up with some behavior that would be
> convenient for most users.

We already have some default behaviour defined
--
-t timeline
--timeline=timeline
    Timeline from which to read log records. The default is to use the value in startseg, if that is specified; otherwise, the default is 1.
--

So, if startseg is not provided, choose the earliest file in the default timeline (given by -t 1 when specified).

> It would be quite handy to be able to run
> this without arguments (or just with -z) and have it process all the
> WAL files that you've got on hand.
>

+1.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
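
The "earliest file in the default timeline" idea can be sketched as follows. This is only an illustration under stated assumptions: earliest_segment is a hypothetical function, the list of directory entries is assumed to have been collected already, and segment names are assumed to be 24 uppercase hex digits with the timeline in the first 8. Because the names are fixed-width hex, plain string comparison orders segments within one timeline the same way their segment numbers do.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: among the given file names, return the WAL
 * segment with the lowest position on timeline "tli", or NULL if the
 * timeline has no segments in the list.
 */
static const char *
earliest_segment(const char *const *names, size_t n, unsigned int tli)
{
    char        prefix[9];
    const char *best = NULL;

    /* The first 8 hex digits of a segment name are the timeline ID. */
    snprintf(prefix, sizeof(prefix), "%08X", tli);

    for (size_t i = 0; i < n; i++)
    {
        const char *name = names[i];

        /* Skip anything that is not a 24-digit segment name. */
        if (strlen(name) != 24 ||
            strspn(name, "0123456789ABCDEF") != 24)
            continue;

        /* Skip segments that belong to other timelines. */
        if (strncmp(name, prefix, 8) != 0)
            continue;

        /* Fixed-width hex names sort in segment order. */
        if (best == NULL || strcmp(name, best) < 0)
            best = name;
    }
    return best;
}
```

Passing tli = 1 corresponds to the documented default when neither startseg nor -t is given. The sketch deliberately ignores timeline history files, so it does not address what "earliest" should mean across multiple timelines.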
On 2017-06-15 17:08:23 -0400, Robert Haas wrote:
> pg_waldump --help claims that you run it like this:
>
> Usage:
>   pg_waldump [OPTION]... [STARTSEG [ENDSEG]]
>
> And https://www.postgresql.org/docs/10/static/pgwaldump.html agrees.
> Since square brackets indicate optional arguments, this sort of makes
> it sound like running pg_waldump with no arguments ought to work. But
> it doesn't:

Well, not really, it indicates that positional arguments are allowed, but not required. You can get by with -s / -e, which are sometimes important, if you want to look at multiple timelines etc.

> A slightly broader concern is whether we need to require the start
> position at all. It seems like one could locate the WAL directory
> using the existing logic, then search for the earliest file.

"earliest file" isn't actually that trivial to determine if there's timelines etc. But leaving that aside, it'll be frequently so much data that'll be output, that it'd make the output pretty much useless, no? I think if we were to add a bit more magic, it'd make more sense to parse pg_control and start at the last flushed point of WAL forward, especially with -f.

> It might be a little unclear what "earliest" means when multiple
> timelines are present, but I bet we could come up with some behavior
> that would be convenient for most users. It would be quite handy to
> be able to run this without arguments (or just with -z) and have it
> process all the WAL files that you've got on hand.

With -z I agree, probably best by parsing pg_control and parsing [checkpoint - 1, minRecoveryPoint) or such.

I'm willing to review some patches here, but I don't plan to personally work on patches around this...

Greetings,

Andres Freund