Re: Non-text mode for pg_dumpall - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: Non-text mode for pg_dumpall |
Date | |
Msg-id | 20250722005339.ca.nmisch@google.com Whole thread Raw |
In response to | Re: Non-text mode for pg_dumpall (Andrew Dunstan <andrew@dunslane.net>) |
Responses |
Re: Non-text mode for pg_dumpall
|
List | pgsql-hackers |
On Mon, Jul 21, 2025 at 04:41:03PM -0400, Andrew Dunstan wrote: > On 2025-07-17 Th 6:18 AM, Mahendra Singh Thalor wrote > > > > > > > --- a/src/bin/pg_dump/pg_restore.c > > > > > > > +++ b/src/bin/pg_dump/pg_restore.c > > > > > > > +/* > > > > > > > + * read_one_statement > > > > > > > + * > > > > > > > + * This will start reading from passed file pointer using fgetc and read till > > > > > > > + * semicolon(sql statement terminator for global.dat file) > > > > > > > + * > > > > > > > + * EOF is returned if end-of-file input is seen; time to shut down. > > > > > > What makes it okay to use this particular subset of SQL lexing? > > > > > To support complex syntax, we used this code from another file. > > > > I'm hearing that you copied this code from somewhere. Running > > > > "git grep 'time to shut down'" suggests you copied it from > > > > InteractiveBackend(). Is that right? I do see other similarities between > > > > read_one_statement() and InteractiveBackend(). > > > > > > > > Copying InteractiveBackend() provides negligible assurance that this is the > > > > right subset of SQL lexing. Only single-user mode uses InteractiveBackend(). > > > > Single-user mode survives mostly as a last resort for recovering from having > > > > reached xidStopLimit, is rarely used, and only superusers write queries to it. > > > Yes, we copied this from InteractiveBackend to read statements from > > > global.dat file. > > Maybe we should ensure that identifiers with CR or LF are turned into > Unicode quoted identifiers, so each SQL statement would always only occupy > one line. Interesting. That might work. > Or just reject role and tablespace names with CR or LF altogether, > just as we do for database names. There are other ways to get multi-line statements. Non-exhaustive list: - pg_db_role_setting.setconfig - pg_shdescription.description - pg_shseclabel.label - pg_tablespace.spcoptions (if we add a text option in the future) I think this decision about lexing also ties to other unfinished open item work of aligning "pg_dumpall -Fd;pg_restore [options]" behavior with "pg_dump -Fd;pg_restore [options]". "pg_restore --no-privileges" should not restore pg_tablespace.spcacl, and "pg_restore --no-comments" should not emit COMMENT statements. I suspect this is going to end with a structured dump like we use on the pg_dump (per-database) side. It's not an accident that v17 pg_restore doesn't lex text files to do its job. pg_dumpall deals with a more-limited set of statements than pg_dump deals with, but they're not _that much_ more limited. I won't veto a lexing-based approach if it gets the behaviors right, but I don't have high hopes for it getting the behaviors right and staying that way. (I almost said "pg_restore --no-owner" should not restore pg_tablespace.spcowner, but v17 "pg_dumpall --no-owner" does restore it. One could argue for or against aligning $SUBJECT behavior w/ v17's mistake there.)
pgsql-hackers by date: