Re: Non-text mode for pg_dumpall - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: Non-text mode for pg_dumpall
Date
Msg-id b3ec184d-b277-48c7-8584-d54d33fafff1@dunslane.net
Whole thread Raw
In response to Re: Non-text mode for pg_dumpall  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On 2025-07-21 Mo 8:53 PM, Noah Misch wrote:
> On Mon, Jul 21, 2025 at 04:41:03PM -0400, Andrew Dunstan wrote:
>> On 2025-07-17 Th 6:18 AM, Mahendra Singh Thalor wrote
>>>>>>>> --- a/src/bin/pg_dump/pg_restore.c
>>>>>>>> +++ b/src/bin/pg_dump/pg_restore.c
>>>>>>>> +/*
>>>>>>>> + * read_one_statement
>>>>>>>> + *
>>>>>>>> + * This will start reading from passed file pointer using fgetc and read till
>>>>>>>> + * semicolon(sql statement terminator for global.dat file)
>>>>>>>> + *
>>>>>>>> + * EOF is returned if end-of-file input is seen; time to shut down.
>>>>>>> What makes it okay to use this particular subset of SQL lexing?
>>>>>> To support complex syntax, we used this code from another file.
>>>>> I'm hearing that you copied this code from somewhere.  Running
>>>>> "git grep 'time to shut down'" suggests you copied it from
>>>>> InteractiveBackend().  Is that right?  I do see other similarities between
>>>>> read_one_statement() and InteractiveBackend().
>>>>>
>>>>> Copying InteractiveBackend() provides negligible assurance that this is the
>>>>> right subset of SQL lexing.  Only single-user mode uses InteractiveBackend().
>>>>> Single-user mode survives mostly as a last resort for recovering from having
>>>>> reached xidStopLimit, is rarely used, and only superusers write queries to it.
>>>> Yes, we copied this from InteractiveBackend to read statements from
>>>> global.dat file.
>> Maybe we should ensure that identifiers with CR or LF are turned into
>> Unicode quoted identifiers, so each SQL statement would always only occupy
>> one line.
> Interesting.  That might work.
>
>> Or just reject role and tablespace names with CR or LF altogether,
>> just as we do for database names.
> There are other ways to get multi-line statements.  Non-exhaustive list:
>
> - pg_db_role_setting.setconfig
> - pg_shdescription.description
> - pg_shseclabel.label
> - pg_tablespace.spcoptions (if we add a text option in the future)
>
> I think this decision about lexing also ties to other unfinished open item
> work of aligning "pg_dumpall -Fd;pg_restore [options]" behavior with "pg_dump
> -Fd;pg_restore [options]".  "pg_restore --no-privileges" should not restore
> pg_tablespace.spcacl, and "pg_restore --no-comments" should not emit COMMENT
> statements.
>
> I suspect this is going to end with a structured dump like we use on the
> pg_dump (per-database) side.  It's not an accident that v17 pg_restore doesn't
> lex text files to do its job.  pg_dumpall deals with a more-limited set of
> statements than pg_dump deals with, but they're not _that much_ more limited.
> I won't veto a lexing-based approach if it gets the behaviors right, but I
> don't have high hopes for it getting the behaviors right and staying that way.


Yeah, that was my original idea. But maybe instead of extending the 
archive mechanism, we could do something more lightweight, e.g. output 
the statements as a JSON array.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: teach pg_upgrade to handle in-place tablespaces
Next
From: Peter Smith
Date:
Subject: Re: Skipping schema changes in publication