Re: proposal: possibility to read dumped table's name from file - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: proposal: possibility to read dumped table's name from file
Date
Msg-id CAFj8pRBjVcTeqD6P9ihc6uMySCzCpNegCOgeZ=gCk0PQVdzGyA@mail.gmail.com
In response to Re: proposal: possibility to read dumped table's name from file  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers

Hi


> You're right - no one followed up on that.  Instead, one group continues
> to push for 'simple' and to just accept what's been proposed, while
> another group counters that we should be looking at the broader design
> question and work towards a solution which will work for us down the
> road, and not just right now.
>
> One thing remains clear - there's no consensus here.

I think there must be some misunderstanding about the target of this patch, and I am afraid there cannot be consensus, because people are talking about two very different features. It is not possible to squeeze them into one thing. I am afraid that cannot work.

1. The main target of this patch is to solve the problem of a too-long pg_dump command line when there are a lot of objects to dump. You need to call pg_dump only once to ensure the dump runs in one transaction. And sometimes it is not possible to use wildcards effectively, because the state of the objects is held in different databases. Increasing the maximum length of the command line is not safe, and there are other production issues. In this case you need a very simple format, simply because you want to use pg_dump in a pipe. This format should be line oriented, and usually it will contain just "dump this table, dump a second table". Nothing else. Nobody will read this format, and nobody will edit it. Because the main platform for this format is probably the UNIX shell, the format should be simple. I really don't see any joy in generating JSON and parsing it again later. The data will be processed locally. This is a single-purpose format, and it is not designed for holding configuration. For this purpose a complex format has no advantage. Parsing JSON or other formats on the pg_dump side is not a problem, but it is pretty hard to generate valid JSON from a bash script. For a UNIX shell we need the simplest possible format. Theoretically this format (this file) can hold any pg_dump option, but in the usual streaming use only filter options will be there. Originally this feature had the name "filter file". There are a lot of examples of successful filter file formats in the UNIX world, and I think nobody doubts their sense and usability. There is probably a consensus that filter files are not config files.
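To make the command-line length problem in point 1 concrete, here is a minimal sketch (not part of the patch) that estimates how many bytes a long list of `-t` options would occupy. The helper names and the table names are hypothetical, and the estimate ignores the environment and per-argument pointer overhead, which the operating system also counts against its limit.

```python
import os


def argv_bytes(tables):
    """Approximate bytes used by `pg_dump -t t1 -t t2 ...` in argv."""
    args = ["pg_dump"]
    for name in tables:
        args += ["-t", name]
    # Each argument is passed as a NUL-terminated string.
    return sum(len(a.encode("utf-8")) + 1 for a in args)


def fits_on_command_line(tables, limit):
    """True if the estimated argv size does not exceed the given limit."""
    return argv_bytes(tables) <= limit


if __name__ == "__main__":
    # Hypothetical workload: many schema-qualified table names.
    tables = [f"public.table_{i:06d}" for i in range(200_000)]
    try:
        limit = os.sysconf("SC_ARG_MAX")  # POSIX only
    except (AttributeError, ValueError, OSError):
        limit = 128 * 1024  # assumed fallback, not a real system value
    print(argv_bytes(tables), limit, fits_on_command_line(tables, limit))
```

With a filter file the same list is written one name per line and read by pg_dump itself, so the size of argv stops mattering.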

The format of the filter file can look like "+d tablename" or "include data tablename". If we reach a consensus that the filter file is a good thing, then designing and implementing the format is easy work. It is no problem to add comment lines.
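As an illustration of how little parsing such a line-oriented format needs, here is a minimal sketch for the "include data tablename" variant. It is not the patch's parser (pg_dump is written in C). The `exclude` keyword, the `#` comment character, and the function name are assumptions made only for this example.

```python
def parse_filter_lines(lines):
    """Yield (action, object_type, name) tuples from filter-file lines.

    Assumed line shape: "<include|exclude> <object_type> <name>".
    Blank lines and lines starting with '#' are skipped (an assumption).
    """
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split into at most three fields so the name may contain spaces.
        parts = line.split(None, 2)
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields: {line!r}")
        action, object_type, name = parts
        if action not in ("include", "exclude"):
            raise ValueError(f"line {lineno}: unknown action {action!r}")
        yield action, object_type, name


if __name__ == "__main__":
    sample = ["# generated list", "include data orders", "include data customers"]
    for entry in parse_filter_lines(sample):
        print(entry)
```

A producer only has to print one such line per object, which is trivial from a shell pipeline.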

2. It is true that there is only a small step from a filter file to an options file. I rewrote this patch in this direction. The advantage is universality - it can support any option without the need to modify the related code. This format is still not difficult for producers, and it is simple to parse. Now the format would be defined by the command-line format: "-t tablename" or "--table tablename" or "table tablename". There can be issues related to the different parsers in the shell and in the implemented code, but these can be solved. It is no problem to introduce comment lines. The big advantages are simplicity of usage and simplicity of implementation - moreover, the implementation is generic.
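The options-file idea in point 2 can be sketched as a translation of each line into ordinary command-line arguments. This is only an illustration under stated assumptions, not the patch's code: the function names are hypothetical, `#` comments are assumed, and shell-style quoting is deliberately not handled, which is exactly the parser mismatch mentioned above.

```python
def option_line_to_argv(line):
    """Turn one options-file line into a list of argv items.

    Accepts "-t name", "--table name", or the bare long form "table name".
    No shell quoting is interpreted; the rest of the line is the value.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return []
    parts = line.split(None, 1)
    key = parts[0]
    if not key.startswith("-"):
        key = "--" + key  # bare word is treated as a long option name
    if len(parts) == 1:
        return [key]
    return [key, parts[1]]


def read_options(lines):
    """Concatenate the argv items of all lines, in file order."""
    argv = []
    for line in lines:
        argv.extend(option_line_to_argv(line))
    return argv


if __name__ == "__main__":
    print(read_options(["-t orders", "--table customers", "table audit"]))
```

Because the result is a plain argument list, it could be handed to the same option parser that processes the real command line, which is what makes this approach generic.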

3. But an options file is just a small step from a config file. I can imagine somebody wanting to store a typical configuration (and the usual options) for psql, pg_dump, pg_restore, pgAdmin, ... somewhere. Config files are very different creatures from filter files. Although they can be generated, they are usually edited by hand and can be very complex. There can be parts shared by all applications, specific sections for psql, and specific sections for every database. Config files can be brutally complex. A simple text format is not good for this purpose. Some people prefer YAML, and some people hate that format. Other people prefer XML or JSON or anything else. Sometimes the complexity of config files gets too big, and people prefer startup scripts.

Although there is an intersection between filter files and config files, I see very big differences in usage. Filter files are usually temporary, generated, and not shared. Config files are persistent, usually modified manually, and can be shared. The requirements are different, and they should be different. I am not proposing any config file related features, and my proposal doesn't block the introduction of a config file in any format in the future. I think these features are very different and should be implemented differently. A filter file or options file would be a pretty ugly config file, and a config file would be a pretty impractical filter file.

So can we talk about the implementation of a filter file or options file here? And can we talk about the implementation of config files in a separate thread? Without that, I am afraid there is no possibility of finding an agreement and moving forward.

Regards

Pavel

> Thanks,
>
> Stephen
