Re: Non-text mode for pg_dumpall - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Non-text mode for pg_dumpall
Date
Msg-id 20250722005339.ca.nmisch@google.com
Whole thread Raw
In response to Re: Non-text mode for pg_dumpall  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: Non-text mode for pg_dumpall
List pgsql-hackers
On Mon, Jul 21, 2025 at 04:41:03PM -0400, Andrew Dunstan wrote:
> On 2025-07-17 Th 6:18 AM, Mahendra Singh Thalor wrote
> > > > > > > --- a/src/bin/pg_dump/pg_restore.c
> > > > > > > +++ b/src/bin/pg_dump/pg_restore.c
> > > > > > > +/*
> > > > > > > + * read_one_statement
> > > > > > > + *
> > > > > > > + * This will start reading from passed file pointer using fgetc and read till
> > > > > > > + * semicolon(sql statement terminator for global.dat file)
> > > > > > > + *
> > > > > > > + * EOF is returned if end-of-file input is seen; time to shut down.
> > > > > > What makes it okay to use this particular subset of SQL lexing?
> > > > > To support complex syntax, we used this code from another file.
> > > > I'm hearing that you copied this code from somewhere.  Running
> > > > "git grep 'time to shut down'" suggests you copied it from
> > > > InteractiveBackend().  Is that right?  I do see other similarities between
> > > > read_one_statement() and InteractiveBackend().
> > > > 
> > > > Copying InteractiveBackend() provides negligible assurance that this is the
> > > > right subset of SQL lexing.  Only single-user mode uses InteractiveBackend().
> > > > Single-user mode survives mostly as a last resort for recovering from having
> > > > reached xidStopLimit, is rarely used, and only superusers write queries to it.
> > > Yes, we copied this from InteractiveBackend to read statements from
> > > global.dat file.
> 
> Maybe we should ensure that identifiers with CR or LF are turned into
> Unicode quoted identifiers, so each SQL statement would always only occupy
> one line.

Interesting.  That might work.

> Or just reject role and tablespace names with CR or LF altogether,
> just as we do for database names.

There are other ways to get multi-line statements.  Non-exhaustive list:

- pg_db_role_setting.setconfig
- pg_shdescription.description
- pg_shseclabel.label
- pg_tablespace.spcoptions (if we add a text option in the future)

I think this decision about lexing also ties to other unfinished open item
work of aligning "pg_dumpall -Fd;pg_restore [options]" behavior with "pg_dump
-Fd;pg_restore [options]".  "pg_restore --no-privileges" should not restore
pg_tablespace.spcacl, and "pg_restore --no-comments" should not emit COMMENT
statements.

I suspect this is going to end with a structured dump like we use on the
pg_dump (per-database) side.  It's not an accident that v17 pg_restore doesn't
lex text files to do its job.  pg_dumpall deals with a more-limited set of
statements than pg_dump deals with, but they're not _that much_ more limited.
I won't veto a lexing-based approach if it gets the behaviors right, but I
don't have high hopes for it getting the behaviors right and staying that way.

(I almost said "pg_restore --no-owner" should not restore
pg_tablespace.spcowner, but v17 "pg_dumpall --no-owner" does restore it.  One
could argue for or against aligning $SUBJECT behavior w/ v17's mistake there.)



pgsql-hackers by date:

Previous
From: Corey Huinker
Date:
Subject: Re: teach pg_upgrade to handle in-place tablespaces
Next
From: Michael Paquier
Date:
Subject: Re: Improve error reporting in 027_stream_regress test