Re: proposal: possibility to read dumped table's name from file - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: proposal: possibility to read dumped table's name from file
Date
Msg-id 2caf1efb-359d-7720-9d6e-88efdaaf55dd@enterprisedb.com
In response to Re: proposal: possibility to read dumped table's name from file  (Stephen Frost <sfrost@snowman.net>)
Responses Re: proposal: possibility to read dumped table's name from file  (Daniel Gustafsson <daniel@yesql.se>)
List pgsql-hackers

On 7/13/21 3:40 PM, Stephen Frost wrote:
> Greetings,
> 
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> [1] your proposal of "[+-] OBJTYPE OBJIDENT" plus empty lines allowed
>>>      plus lines starting with # are comments, seems plenty.  Any line not
>>>      following that format would cause an error to be thrown.
>>
>> I'd like to see some kind of keyword on each line, so that we could extend
>> the command set by adding new keywords.  As this stands, I fear we'd end
>> up using random punctuation characters in place of [+-], which seems
>> pretty horrid from a readability standpoint.
> 
> I agree that it'd end up being bad with single characters.
> 

The [+-] format is based on what rsync does, so there's at least some 
precedent for that, and IMHO it's fairly readable. I agree the rest of 
the rule (object type, ...) may be a bit more verbose.

>> I think that this file format should be designed with an eye to allowing
>> every, or at least most, pg_dump options to be written in the file rather
>> than on the command line.  I don't say we have to *implement* that right
>> now; but if the format spec is incapable of being extended to meet
>> requests like that one, I think we'll regret it.  This line of thought
>> suggests that the initial commands ought to match the existing
>> include/exclude switches, at least approximately.
> 
> I agree that we want to have an actual config file that allows just
> about every pg_dump option.  I'm also fine with saying that we don't
> have to implement that initially but the format should be one which can
> be extended to allow that.
> 

I understand the desire to have a config file that may contain all 
pg_dump options, but I really don't see why we'd want to mix that with 
the file containing filter rules.

I think those should be separate, one of the reasons being that I find 
it desirable to be able to "include" the filter rules into different 
pg_dump configs. That also means the format for the filter rules can be 
much simpler.

It's also not clear to me whether the single-file approach would allow 
filtering that is not supported by any existing pg_dump option, for example.

>> Hence I suggest
>>
>>     include table PATTERN
>>     exclude table PATTERN
>>
>> which ends up being the above but with words not [+-].
> 
Works for me.
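
To make the keyword variant concrete, here is a minimal sketch of how
such a file could be read. It is Python purely for illustration (pg_dump
would do this in C), and the keyword set, the object types and the
function name parse_filter_lines are my assumptions, not a settled spec.
Empty lines and lines starting with # are skipped, anything else that
does not match "COMMAND OBJTYPE PATTERN" is an error.

```python
# Hypothetical sketch of the line-based filter format discussed above.
# The keyword and object-type sets are assumptions, not a settled spec.
COMMANDS = {"include", "exclude"}
OBJECT_TYPES = {"table", "schema"}


def parse_filter_lines(lines):
    """Return a list of (command, object_type, pattern) tuples."""
    rules = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Empty lines and lines starting with '#' are ignored.
        if not line or line.startswith("#"):
            continue
        # Split into at most three fields; the pattern keeps inner spaces.
        parts = line.split(None, 2)
        if len(parts) != 3:
            raise ValueError(
                f"line {lineno}: expected 'COMMAND OBJTYPE PATTERN'")
        command, objtype, pattern = parts
        if command not in COMMANDS:
            raise ValueError(f"line {lineno}: unknown command {command!r}")
        if objtype not in OBJECT_TYPES:
            raise ValueError(
                f"line {lineno}: unknown object type {objtype!r}")
        rules.append((command, objtype, pattern))
    return rules
```

Adding a new command later only means adding a keyword to the set, which
is the extensibility argument for words over [+-].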

> Which ends up inventing yet-another-file-format which people will end up
> writing generators and parsers for.  Which is exactly what I was arguing
> we really should be trying to avoid doing.
> 

People will have to write generators *in any case* because how else 
would you use this? Unless we also provide tools to manipulate that file 
(which seems rather futile), they'll have to do that. Even if we used 
JSON/YAML/TOML/... they'd still need to deal with the semantics of the 
file format.

FWIW I don't understand why they would need to write parsers. That's 
something we'd need to do to process the file. I think the case where the 
filter file needs to be modified is rather rare - it certainly is not 
something the original use case Pavel tried to address requires. (I know 
that customer, and the filter would be generated and used for a single dump.)

My opinion is that the best solution (to make both generators and 
parsers simple) is to keep the format itself as simple as possible. 
Which is exactly why I'm arguing for only addressing the filtering, not 
trying to invent a "universal" pg_dump config file format.
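
For a sense of how small a generator gets with a line-based format, a
rough sketch follows. It is Python for illustration only; the function
name generate_filter and the "include table" / "exclude table" keywords
are assumptions taken from the proposal upthread, and it deliberately
ignores quoting of unusual names.

```python
def generate_filter(include_tables, exclude_tables=()):
    """Build the text of a filter file, one rule per line.

    Assumes the 'include table PATTERN' / 'exclude table PATTERN'
    keywords proposed in this thread; names are written as-is,
    so this sketch does not handle names that need quoting.
    """
    lines = []
    for name in include_tables:
        lines.append(f"include table {name}")
    for name in exclude_tables:
        lines.append(f"exclude table {name}")
    # End with a newline so the file is well-formed line-based text.
    return "\n".join(lines) + "\n"
```

The caller would write the returned string to a file and pass that file
to pg_dump for a single run.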

> I definitely feel that we should have a way to allow anything that can
> be created as an object in the database to be explicitly included in the
> file and that means whatever we do need to be able to handle objects
> that have names that span multiple lines, etc.  It's not clear how the
> above would.  As I recall, the proposed patch didn't have anything for
> handling that, which was one of the issues I had with it and is why I
> bring it up again.
> 

I really don't understand why you think the current format can't do 
escaping/quoting or handle names spanning multiple lines. The fact that 
the original patch did not handle that correctly is a bug, but it does 
not mean the format can't handle that.
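
As a sketch of what I mean: if names may be written as SQL-style
double-quoted identifiers (with "" standing for an embedded quote), a
quoted name can contain spaces or newlines without any change to the
rest of the format. The reader below is Python for illustration only,
the quoting rule and the name read_rules are my assumptions, and it is
not what the posted patch implements.

```python
def read_rules(text):
    """Parse 'COMMAND OBJTYPE NAME' rules; NAME may be double-quoted.

    Assumed quoting rule: a name in double quotes may contain any
    character, including newlines; "" inside it means a literal quote.
    """
    rules = []
    pos, n = 0, len(text)
    while pos < n:
        # Skip whitespace between rules (including blank lines).
        if text[pos].isspace():
            pos += 1
            continue
        # A '#' where a rule would start begins a comment line.
        if text[pos] == "#":
            while pos < n and text[pos] != "\n":
                pos += 1
            continue
        words = []
        for _ in range(2):  # command and object type are bare words
            start = pos
            while pos < n and not text[pos].isspace():
                pos += 1
            words.append(text[start:pos])
            while pos < n and text[pos] in " \t":
                pos += 1
        if pos < n and text[pos] == '"':
            # Quoted name: read up to the closing quote, across lines.
            pos += 1
            chars = []
            while True:
                if pos >= n:
                    raise ValueError("unterminated quoted name")
                if text[pos] == '"':
                    if pos + 1 < n and text[pos + 1] == '"':
                        chars.append('"')  # "" is an escaped quote
                        pos += 2
                        continue
                    pos += 1
                    break
                chars.append(text[pos])
                pos += 1
            name = "".join(chars)
        else:
            # Unquoted name: the rest of the line.
            start = pos
            while pos < n and text[pos] != "\n":
                pos += 1
            name = text[start:pos].rstrip()
        if not words[1] or not name:
            raise ValueError("incomplete rule")
        rules.append((words[0], words[1], name))
    return rules
```

The point is only that the line-oriented keyword format and proper
quoting are not in conflict; the exact escaping rules are up for
discussion.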


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


