Thread: proposal: possibility to read dumped table's name from file
Hi
one of my customers has to specify dumped tables name by name. After years of increasing database size and table count, he has a problem with the command-line length limit. He needs to read the list of tables from a file (or from stdin).
I wrote a simple PoC patch.
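Roughly, the core of the idea looks like this (a simplified sketch, not the attached patch itself; it assumes pg_dump's existing simple_string_list_append() from fe_utils/simple_list.h and, for brevity, a fixed-size line buffer, which the real patch avoids):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include "fe_utils/simple_list.h"

    /*
     * Hypothetical helper: append each non-empty line of "filename"
     * (or stdin for "-") as an include-table pattern.
     */
    static void
    read_table_list(const char *filename, SimpleStringList *patterns)
    {
        FILE       *fp;
        char        buf[1024];

        fp = (strcmp(filename, "-") == 0) ? stdin : fopen(filename, "r");
        if (fp == NULL)
        {
            fprintf(stderr, "could not open file \"%s\"\n", filename);
            exit(1);
        }

        while (fgets(buf, sizeof(buf), fp) != NULL)
        {
            buf[strcspn(buf, "\r\n")] = '\0';   /* trim the line ending */
            if (buf[0] != '\0')
                simple_string_list_append(patterns, buf);   /* copies the string */
        }

        if (fp != stdin)
            fclose(fp);
    }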
Comments, notes, ideas?
Regards
Pavel
Attachment
Pavel Stehule <pavel.stehule@gmail.com> writes:
> one of my customers has to specify dumped tables name by name. After years of
> increasing database size and table count, he has a problem with the
> command-line length limit. He needs to read the list of tables from a file (or from stdin).

I guess the question is why. That seems like an enormously error-prone
approach. Can't they switch to selecting schemas? Or excluding the
hopefully-short list of tables they don't want?

regards, tom lane
pá 29. 5. 2020 v 16:28 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> one of my customers has to specify dumped tables name by name. After years of
> increasing database size and table count, he has a problem with the
> command-line length limit. He needs to read the list of tables from a file (or from stdin).
I guess the question is why. That seems like an enormously error-prone
approach. Can't they switch to selecting schemas? Or excluding the
hopefully-short list of tables they don't want?
It is not a typical application. It is an analytic application where the database schema is based on a dynamic specification from the end user (the end user can customize it at any time). So the schema is very dynamic.
For example - a typical server has about four thousand databases, and every database has somewhere between 1K and 10K tables.
Another specific aspect is the different versions of data in different tables. A user can work with one set of data (one set of tables) while the application prepares a new set of data (new set of tables). Loading can be slow, because sometimes bigger tables are filled (about forty GB). pg_dump backs up one set of tables (a little bit like a snapshot of the data). So it is a strange, but successful, OLAP application.
Regards
Pavel
regards, tom lane
On Fri, May 29, 2020 at 04:21:00PM +0200, Pavel Stehule wrote:
> Hi
>
> one of my customers has to specify dumped tables name by name. After years of
> increasing database size and table count, he has a problem with the
> command-line length limit. He needs to read the list of tables from a file (or from stdin).
>
> I wrote a simple PoC patch.
>
> Comments, notes, ideas?

This seems like a handy addition. What I've done in cases similar to this was
to use `grep -f` on the output of `pg_restore -l` to create a file suitable
for `pg_restore -L`, or mash them together like this:

pg_restore -L <(pg_restore -l /path/to/dumpfile | grep -f /path/to/listfile) -d new_db /path/to/dumpfile

That's a lot of shell magic and obscure corners of commands to expect people
to use.

Would it make sense to expand this patch to handle other objects?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes:
> Would it make sense to expand this patch to handle other objects?

If we're gonna do something like this, +1 for being more general.
The fact that pg_dump only has selection switches for tables and
schemas has always struck me as an omission.

regards, tom lane
pá 29. 5. 2020 v 18:03 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
David Fetter <david@fetter.org> writes:
> Would it make sense to expand this patch to handle other objects?
Sure. We just need to design the system (and the names of the options).
If we're gonna do something like this, +1 for being more general.
The fact that pg_dump only has selection switches for tables and
schemas has always struck me as an omission.
The implementation is trivial; the hard part is a good design of the names for these options.
Pavel
regards, tom lane
On Fri, May 29, 2020 at 04:21:00PM +0200, Pavel Stehule wrote:
> one of my customers has to specify dumped tables name by name. After years of
> increasing database size and table count, he has a problem with the
> command-line length limit. He needs to read the list of tables from a file (or from stdin).

+1 - we would use this.

We put a regex (actually a pg_dump pattern) of tables to skip (timeseries
partitions which are older than a few days and which are also dumped once not
expected to change, and typically not redumped). We're nowhere near the
execve() limit, but it'd be nice if the command was primarily a list of options
and not a long regex.

Please also support reading from file for --exclude-table=pattern.

I'm drawing a parallel between this and rsync --include/--exclude and --filter.
We'd be implementing a new --filter, which might have similar syntax to rsync
(which I always forget).

--
Justin
Hi
pá 29. 5. 2020 v 20:25 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Fri, May 29, 2020 at 04:21:00PM +0200, Pavel Stehule wrote:
> one of my customers has to specify dumped tables name by name. After years of
> increasing database size and table count, he has a problem with the
> command-line length limit. He needs to read the list of tables from a file (or from stdin).
+1 - we would use this.
We put a regex (actually a pg_dump pattern) of tables to skip (timeseries
partitions which are older than a few days and which are also dumped once not
expected to change, and typically not redumped). We're nowhere near the
execve() limit, but it'd be nice if the command was primarily a list of options
and not a long regex.
Please also support reading from file for --exclude-table=pattern.
I'm drawing a parallel between this and rsync --include/--exclude and --filter.
We'd be implementing a new --filter, which might have similar syntax to rsync
(which I always forget).
I implemented support for all "repeated" pg_dump options.
--exclude-schemas-file=FILENAME
--exclude-tables-data-file=FILENAME
--exclude-tables-file=FILENAME
--include-foreign-data-file=FILENAME
--include-schemas-file=FILENAME
--include-tables-file=FILENAME
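For illustration, usage could look like this (hypothetical file content and database name; one pattern per line, just as the corresponding command-line options take one pattern each):

$ cat tables.txt
customer_data_v1
customer_data_v2
$ pg_dump --include-tables-file=tables.txt mydb > dump.sql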
I invite any help with the doc. There is just very raw text.

Regards

Pavel
--
Justin
Attachment
On Mon, Jun 08, 2020 at 07:18:49PM +0200, Pavel Stehule wrote:
> pá 29. 5. 2020 v 20:25 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
> > On Fri, May 29, 2020 at 04:21:00PM +0200, Pavel Stehule wrote:
> > > one of my customers has to specify dumped tables name by name. After years of
> > > increasing database size and table count, he has a problem with the
> > > command-line length limit. He needs to read the list of tables from a file (or from stdin).
> >
> > +1 - we would use this.
> >
> > We put a regex (actually a pg_dump pattern) of tables to skip (timeseries
> > partitions which are older than a few days and which are also dumped once not
> > expected to change, and typically not redumped). We're nowhere near the
> > execve() limit, but it'd be nice if the command was primarily a list of options
> > and not a long regex.
> >
> > Please also support reading from file for --exclude-table=pattern.
> >
> > I'm drawing a parallel between this and rsync --include/--exclude and
> > --filter.
> >
> > We'd be implementing a new --filter, which might have similar syntax to rsync
> > (which I always forget).
>
> I implemented support for all "repeated" pg_dump options.
>
> I invite any help with the doc. There is just very raw text.
>
> + Do not dump data of tables spefified in file.

*specified

I still wonder if a better syntax would use a unified --filter option, whose
argument would allow including/excluding any type of object:

+[tn] include (t)table/(n)namespace/...
-[tn] exclude (t)table/(n)namespace/...

In the past, I looked for a way to exclude extended stats objects, and ended up
using a separate schema. An "extensible" syntax might be better (although
reading a file of just patterns has the advantage that the function can just be
called once for each option for each type of object).

--
Justin
po 8. 6. 2020 v 23:30 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Mon, Jun 08, 2020 at 07:18:49PM +0200, Pavel Stehule wrote:
> pá 29. 5. 2020 v 20:25 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
>
> > On Fri, May 29, 2020 at 04:21:00PM +0200, Pavel Stehule wrote:
> > > one of my customers has to specify dumped tables name by name. After years of
> > > increasing database size and table count, he has a problem with the
> > > command-line length limit. He needs to read the list of tables from a file (or from stdin).
> >
> > +1 - we would use this.
> >
> > We put a regex (actually a pg_dump pattern) of tables to skip (timeseries
> > partitions which are older than a few days and which are also dumped once not
> > expected to change, and typically not redumped). We're nowhere near the
> > execve() limit, but it'd be nice if the command was primarily a list of options
> > and not a long regex.
> >
> > Please also support reading from file for --exclude-table=pattern.
> >
> > I'm drawing a parallel between this and rsync --include/--exclude and
> > --filter.
> >
> > We'd be implementing a new --filter, which might have similar syntax to rsync
> > (which I always forget).
>
> I implemented support for all "repeated" pg_dump options.
>
> I invite any help with the doc. There is just very raw text.
>
> + Do not dump data of tables spefified in file.
*specified
I am sending an updated version - now with my own implementation of the GNU (not POSIX) function getline.
I still wonder if a better syntax would use a unified --filter option, whose
argument would allow including/excluding any type of object:
+[tn] include (t)table/(n)namespace/...
-[tn] exclude (t)table/(n)namespace/...
In the past, I looked for a way to exclude extended stats objects, and ended up
using a separate schema. An "extensible" syntax might be better (although
reading a file of just patterns has the advantage that the function can just be
called once for each option for each type of object).
I tried to implement the simple format "[+-][tndf] objectname".
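For illustration, a filter file in this format might look like this (hypothetical object names):

+t public.orders
-t public.orders_history
+n reporting
-d public.big_log
+f remote_server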
Please check the attached patch.
Regards
Pavel
--
Justin
Attachment
On Tue, Jun 09, 2020 at 11:46:24AM +0200, Pavel Stehule wrote:
> po 8. 6. 2020 v 23:30 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
> > I still wonder if a better syntax would use a unified --filter option, whose
> > argument would allow including/excluding any type of object:
> I tried to implement the simple format "[+-][tndf] objectname"

Thanks.

> + /* ignore empty rows */
> + if (*line != '\0')

Maybe: if line=='\0': continue

We should also support comments.

> + bool include_filter = false;
> + bool exclude_filter = false;

I think we only need one bool. You could call it: bool is_exclude = false

> +
> + if (chars < 4)
> + invalid_filter_format(optarg, line, lineno);

I think that check is too lax. I think it's ok if we require the first char
to be [-+] and the 2nd char to be [dntf]

> + objecttype = line[1];

... but I think this is inadequately "liberal in what it accepts"; I think it
should skip spaces. In my proposed scheme, someone might reasonably write:

> +
> + objectname = &line[3];
> +
> + /* skip initial spaces */
> + while (*objectname == ' ')
> + objectname++;

I suggest to use isspace()

I think we should check that *objectname != '\0', rather than chars>=4, above.

> + if (include_filter)
> + {
> + simple_string_list_append(&table_include_patterns, objectname);
> + dopt.include_everything = false;
> + }
> + else if (exclude_filter)
> + simple_string_list_append(&table_exclude_patterns, objectname);

If you use bool is_exclude, then this becomes "else" and you don't need to
think about checking if (!include && !exclude).

> + else if (objecttype == 'f')
> + {
> + if (include_filter)
> + simple_string_list_append(&foreign_servers_include_patterns, objectname);
> + else if (exclude_filter)
> + invalid_filter_format(optarg, line, lineno);
> + }

I would handle invalid object types as "else: invalid_filter_format()" here,
rather than duplicating above as: !=ALL('d','n','t','f')

> +
> + if (ferror(f))
> + fatal("could not read from file \"%s\": %s",
> + f == stdin ? "stdin" : optarg,
> + strerror(errno));

I think we're allowed to use %m here ?

> + printf(_(" --filter=FILENAME read object names from file\n"));

Object name filter expression, or something..

> + * getline is originaly GNU function, and should not be everywhere still.

originally

> + * Use own reduced implementation.

Did you "reduce" this from another implementation? Where?
What is its license ?

Maybe a line-reader already exists in the frontend (?) .. or maybe it should.

--
Justin
st 10. 6. 2020 v 0:30 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Tue, Jun 09, 2020 at 11:46:24AM +0200, Pavel Stehule wrote:
> po 8. 6. 2020 v 23:30 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
> I still wonder if a better syntax would use a unified --filter option, whose
> > argument would allow including/excluding any type of object:
> I tried to implement simple format "[+-][tndf] objectname"
I had another idea about the format - instead of using +-, we could use case-sensitive options similar to the pg_dump command line (extended with D and f, because those options don't exist in short form). So the format could look like
[tTnNDf] {objectname}
This format is simpler, and it can work. What do you think about it?
> Did you "reduce" this from another implementation? Where?
> What is its license ?
The code is 100% mine. It is not copied from gnulib, and everybody can simply check it.
"Reduced" in the functionality sense: there is no full argument checking as is necessary for the glibc functions, and there are no memory checks, because pg_malloc and pg_realloc are used.
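The shape of the implementation is roughly this (a simplified sketch of the idea, not the patch itself; pg_malloc/pg_realloc are the frontend allocators from common/fe_memutils.h, and real frontend code would include postgres_fe.h first):

    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>              /* ssize_t */
    #include "common/fe_memutils.h"     /* pg_malloc(), pg_realloc() */

    /*
     * Grow the caller's buffer as needed and keep it for the next call,
     * so most lines need no new allocation.  Returns the line length,
     * or -1 at EOF with nothing read.
     */
    static ssize_t
    pg_getline_sketch(char **lineptr, size_t *n, FILE *fp)
    {
        size_t      total = 0;

        if (*lineptr == NULL)
        {
            *n = 64;
            *lineptr = pg_malloc(*n);
        }

        for (;;)
        {
            if (fgets(*lineptr + total, *n - total, fp) == NULL)
                return total > 0 ? (ssize_t) total : -1;

            total += strlen(*lineptr + total);
            if ((*lineptr)[total - 1] == '\n' || feof(fp))
                return (ssize_t) total;

            /* the line did not fit: enlarge the buffer and read the rest */
            *n += 1024;
            *lineptr = pg_realloc(*lineptr, *n);
        }
    }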
On Wed, Jun 10, 2020 at 05:03:49AM +0200, Pavel Stehule wrote:
> st 10. 6. 2020 v 0:30 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
> > On Tue, Jun 09, 2020 at 11:46:24AM +0200, Pavel Stehule wrote:
> > > po 8. 6. 2020 v 23:30 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
> > > > I still wonder if a better syntax would use a unified --filter option, whose
> > > > argument would allow including/excluding any type of object:
> > > I tried to implement the simple format "[+-][tndf] objectname"
>
> I had another idea about the format - instead of using +-, we could use
> case-sensitive options similar to the pg_dump command line (extended with D
> and f, because those options don't exist in short form).
>
> So the format could look like
>
> [tTnNDf] {objectname}
>
> This format is simpler, and it can work. What do you think about it?

I prefer [-+][dtnf], which is similar to rsync --filter, and clear what it's
doing. I wouldn't put much weight on what the short options are.

I wonder if some people would want to be able to use *long* or short options:

-table foo
+schema baz

Or maybe:

exclude-table=foo
schema=bar

Some tools use "long options without leading dashes" as their configuration
file format. Examples: openvpn, mysql. So that could be a good option. OTOH,
there's only a few "keys", so I'm not sure how many people would want to
repeat them, if there's enough to bother putting them in the file rather than
the cmdline.

--
Justin
st 10. 6. 2020 v 0:30 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Tue, Jun 09, 2020 at 11:46:24AM +0200, Pavel Stehule wrote:
> po 8. 6. 2020 v 23:30 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
> I still wonder if a better syntax would use a unified --filter option, whose
> > argument would allow including/excluding any type of object:
> I tried to implement simple format "[+-][tndf] objectname"
Thanks.
> + /* ignore empty rows */
> + if (*line != '\0')
Maybe: if line=='\0': continue
ok
We should also support comments.
> + bool include_filter = false;
> + bool exclude_filter = false;
I think we only need one bool.
You could call it: bool is_exclude = false
ok
> +
> + if (chars < 4)
> + invalid_filter_format(optarg, line, lineno);
I think that check is too lax.
I think it's ok if we require the first char to be [-+] and the 2nd char to be
[dntf]
> + objecttype = line[1];
... but I think this is inadequately "liberal in what it accepts"; I think it
should skip spaces. In my proposed scheme, someone might reasonably write:
> +
> + objectname = &line[3];
> +
> + /* skip initial spaces */
> + while (*objectname == ' ')
> + objectname++;
I suggest to use isspace()
ok
I think we should check that *objectname != '\0', rather than chars>=4, above.
done
> + if (include_filter)
> + {
> + simple_string_list_append(&table_include_patterns, objectname);
> + dopt.include_everything = false;
> + }
> + else if (exclude_filter)
> + simple_string_list_append(&table_exclude_patterns, objectname);
If you use bool is_exclude, then this becomes "else" and you don't need to
think about checking if (!include && !exclude).
> + else if (objecttype == 'f')
> + {
> + if (include_filter)
> + simple_string_list_append(&foreign_servers_include_patterns, objectname);
> + else if (exclude_filter)
> + invalid_filter_format(optarg, line, lineno);
> + }
I would handle invalid object types as "else: invalid_filter_format()" here,
rather than duplicating above as: !=ALL('d','n','t','f')
good idea
> +
> + if (ferror(f))
> + fatal("could not read from file \"%s\": %s",
> + f == stdin ? "stdin" : optarg,
> + strerror(errno));
I think we're allowed to use %m here ?
changed
> + printf(_(" --filter=FILENAME read object names from file\n"));
Object name filter expression, or something..
Yes, it is not just object names now.
> + * getline is originaly GNU function, and should not be everywhere still.
originally
> + * Use own reduced implementation.
Did you "reduce" this from another implementation? Where?
What is its license ?
Maybe a line-reader already exists in the frontend (?) .. or maybe it should.
Everywhere else the function fgets is used. Currently pg_getline is used in only one place, so moving it to some common part may be premature.
Maybe it can be used as a replacement for some fgets calls, but that is a different topic, I think.
Thank you for the comments; an updated patch is attached.
Regards
Pavel
--
Justin
Attachment
On Thu, Jun 11, 2020 at 1:07 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Thank you for the comments; an updated patch is attached.
>

Few comments:

+invalid_filter_format(char *message, char *filename, char *line, int lineno)
+{
+    char *displayname;
+
+    displayname = *filename == '-' ? "stdin" : filename;
+
+    pg_log_error("invalid format of filter file \"%s\": %s",
+                 displayname,
+                 message);
+
+    fprintf(stderr, "%d: %s\n", lineno, line);
+    exit_nicely(1);
+}

I think fclose is missing here.

+    if (line[chars - 1] == '\n')
+        line[chars - 1] = '\0';

Should we check for '\r' also, to avoid failures on some platforms?

+    <varlistentry>
+     <term><option>--filter=<replaceable class="parameter">filename</replaceable></option></term>
+     <listitem>
+      <para>
+       Read filters from file. Format "(+|-)(tnfd) objectname:
+      </para>
+     </listitem>
+    </varlistentry>

I felt some documentation is missing here. We could mention that the options
tnfd control the table, schema, foreign-server-data and table-data exclude
patterns. Instead of using tnfd, if we could use the same options as the
existing pg_dump options it would be less confusing.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Thu, Jun 11, 2020 at 09:36:18AM +0200, Pavel Stehule wrote:
> st 10. 6. 2020 v 0:30 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
> > > + /* ignore empty rows */
> > > + if (*line != '\0')
> >
> > Maybe: if line=='\0': continue
> > We should also support comments.

Comment support is still missing but easily added :)

I tried this patch and it works for my purposes.

Also, your getline is dynamically re-allocating lines of arbitrary length.
Possibly that's not needed. We'll typically read "+t schema.relname", which is
at most 132 chars. Maybe it's sufficient to do

char buf[1024];
fgets(buf);
if strchr(buf, '\n') == NULL: error();
ret = pstrdup(buf);

In any case, you could have getline return a char* and (rather than following
GNU) no need to take char**, int* parameters to conflate inputs and outputs.

I realized that --filter has an advantage over the previous implementation
(with multiple --exclude-* and --include-*) in that it's possible to use stdin
for includes *and* excludes.

By chance, I had the opportunity yesterday to re-use with rsync a regex that
I'd previously been using with pg_dump and grep. What this patch calls
"--filter" in rsync is called "--filter-from". rsync's --filter-from rejects
filters of length longer than max filename, so I had to split it up into
multiple lines instead of using regex alternation ("|"). This option is a
close parallel in pg_dump.

--
Justin
so 27. 6. 2020 v 14:55 odesílatel vignesh C <vignesh21@gmail.com> napsal:
On Thu, Jun 11, 2020 at 1:07 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Thank you for comments, attached updated patch
>
Few comments:
+invalid_filter_format(char *message, char *filename, char *line, int lineno)
+{
+ char *displayname;
+
+ displayname = *filename == '-' ? "stdin" : filename;
+
+ pg_log_error("invalid format of filter file \"%s\": %s",
+ displayname,
+ message);
+
+ fprintf(stderr, "%d: %s\n", lineno, line);
+ exit_nicely(1);
+}
I think fclose is missing here.
done
+ if (line[chars - 1] == '\n')
+ line[chars - 1] = '\0';
Should we check for '\r' also, to avoid failures on some platforms?
I checked other usages of fgets in the Postgres source code, and everywhere the test is on \n.
When I did some quick research (https://stackoverflow.com/questions/12769289/carriage-return-by-fgets), the \r in this case should be thrown away by libc on Microsoft platforms.
\n is used on Mac OS X (since 2001); I am not sure that Mac OS 9, which used a bare \r, still needs to be supported.
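If we wanted to be defensive anyway, the trimming could look like this (a sketch only, reusing the patch's line/chars variables):

    /* hypothetical variant that also strips a carriage return */
    if (chars > 0 && line[chars - 1] == '\n')
        line[--chars] = '\0';
    if (chars > 0 && line[chars - 1] == '\r')
        line[--chars] = '\0';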
+    <varlistentry>
+     <term><option>--filter=<replaceable class="parameter">filename</replaceable></option></term>
+     <listitem>
+      <para>
+       Read filters from file. Format "(+|-)(tnfd) objectname:
+      </para>
+     </listitem>
+    </varlistentry>
I felt some documentation is missing here. We could mention that the options
tnfd control the table, schema, foreign-server-data and table-data exclude
patterns.
I have a plan to complete the doc when the design is settled. It was not clear whether people prefer long or short forms of the option names.
Instead of using tnfd, if we could use the same options as the existing
pg_dump options it would be less confusing.
It is almost the same:
+-t .. tables
+-n .. schemas
-d .. exclude table data .. there is no short pg_dump option for --exclude-table-data
+f .. include foreign data .. there is no short pg_dump option for --include-foreign-data
So there is still an open question: use the +-tnfd system, or a system based on long options?
table foo
exclude-table foo
schema xx
exclude-schema xx
include-foreign-data yyy
exclude-table-data zzz
Typically these files will be generated by scripts and processed via a pipe, so I see just two arguments, for and against:
short format - there is less probability of a typo (but there is no full consistency with the pg_dump options)
long format - it is self-documenting (and there is full consistency with pg_dump)
In this case I prefer the short form - it is more comfortable for users, and there are only a few variants, so it is not necessary to use a very verbose design. But my opinion is not aggressively strong, and I'll accept any common agreement.
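To make the comparison concrete, the same filter written in both forms (made-up names):

short form:

+t sales.orders
-n archive

long form:

table sales.orders
exclude-schema archive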
Regards
Updated patch attached
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachment
st 1. 7. 2020 v 23:24 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Thu, Jun 11, 2020 at 09:36:18AM +0200, Pavel Stehule wrote:
> st 10. 6. 2020 v 0:30 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
> > > + /* ignore empty rows */
> > > + if (*line != '\0')
> >
> > Maybe: if line=='\0': continue
> > We should also support comments.
Comment support is still missing but easily added :)
I tried this patch and it works for my purposes.
Also, your getline is dynamically re-allocating lines of arbitrary length.
Possibly that's not needed. We'll typically read "+t schema.relname", which is
at most 132 chars. Maybe it's sufficient to do
char buf[1024];
fgets(buf);
if strchr(buf, '\n') == NULL: error();
ret = pstrdup(buf);
63 bytes is the max effective identifier size, but it is not the max size of identifiers. Very probably a buffer of 1024 bytes would be enough for everyone, but I do not want to introduce a new magic limit, especially when the dynamic implementation is not too hard.
A table name can be very long - sometimes data names (table names) are stored in external storage at full length, and it would not be practical to require truncating them in the filter file.
For this case it is very efficient, because the resized (increased) buffer is reused for the following rows, so reallocs should be rare. So when I have to choose between two implementations of similar complexity, I prefer the more dynamic code without hardcoded limits. The dynamic approach has practically no overhead.
In any case, you could have getline return a char* and (rather than following
GNU) no need to take char**, int* parameters to conflate inputs and outputs.
No, it has a special benefit: it eliminates a short malloc/free cycle per line. When some line is longer, the buffer is increased (and stays increased), and for other rows of the same or smaller size no realloc is necessary.
I realized that --filter has an advantage over the previous implementation
(with multiple --exclude-* and --include-*) in that it's possible to use stdin
for includes *and* excludes.
Yes, it looks like the better choice.
By chance, I had the opportunity yesterday to re-use with rsync a regex that
I'd previously been using with pg_dump and grep. What this patch calls
"--filter" in rsync is called "--filter-from". rsync's --filter-from rejects
filters of length longer than max filename, so I had to split it up into
multiple lines instead of using regex alternation ("|"). This option is a
close parallel in pg_dump.
We can talk about the option name - maybe "--filter-from" is better than just "--filter".
Regards
Pavel
--
Justin
On Wed, Jul 01, 2020 at 04:24:52PM -0500, Justin Pryzby wrote:
> On Thu, Jun 11, 2020 at 09:36:18AM +0200, Pavel Stehule wrote:
> > st 10. 6. 2020 v 0:30 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
> > > > + /* ignore empty rows */
> > > > + if (*line != '\0')
> > >
> > > Maybe: if line=='\0': continue
> > > We should also support comments.
>
> Comment support is still missing but easily added :)

Still missing from the latest patch.

With some added documentation, I think this can be RfC.

--
Justin
ne 5. 7. 2020 v 22:31 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Wed, Jul 01, 2020 at 04:24:52PM -0500, Justin Pryzby wrote:
> On Thu, Jun 11, 2020 at 09:36:18AM +0200, Pavel Stehule wrote:
> > st 10. 6. 2020 v 0:30 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
> > > > + /* ignore empty rows */
> > > > + if (*line != '\0')
> > >
> > > Maybe: if line=='\0': continue
> > > We should also support comments.
>
> Comment support is still missing but easily added :)
Still missing from the latest patch.
I can implement comment support, but I am not sure about the format. The start can be "--" or the classic #,
but "--" can be messy in this context.
With some added documentation, I think this can be RfC.
--
Justin
ne 5. 7. 2020 v 22:37 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
ne 5. 7. 2020 v 22:31 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Wed, Jul 01, 2020 at 04:24:52PM -0500, Justin Pryzby wrote:
> On Thu, Jun 11, 2020 at 09:36:18AM +0200, Pavel Stehule wrote:
> > st 10. 6. 2020 v 0:30 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
> > > > + /* ignore empty rows */
> > > > + if (*line != '\0')
> > >
> > > Maybe: if line=='\0': continue
> > > We should also support comments.
>
> Comment support is still missing but easily added :)
Still missing from the latest patch.

I can implement comment support, but I am not sure about the format. The start
can be "--" or the classic #, but "--" can be messy in this context.
Here is support for comment lines - the first char should be #.
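For example, a filter file can now look like this (made-up names):

# tables for the nightly snapshot
+t public.orders
+t public.order_lines
# skip the bulky history
-t public.orders_history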
Regards
Pavel
With some added documentation, I think this can be RfC.
--
Justin
Attachment
On Mon, Jul 06, 2020 at 06:34:15AM +0200, Pavel Stehule wrote:
> >> > Comment support is still missing but easily added :)
> >> Still missing from the latest patch.
> >
> > I can implement comment support, but I am not sure about the format. The
>
> Here is support for comment lines - the first char should be #.

Thanks, that's how I assumed it would look.

> >> With some added documentation, I think this can be RfC.

Do you want to add any more documentation ?

Few more things:

> +exit_invalid_filter_format(FILE *fp, char *filename, char *message, char *line, int lineno)
> +{
> + pg_log_error("invalid format of filter file \"%s\": %s",
> + *filename == '-' ? "stdin" : filename,
> + message);

You refer to as "stdin" any filename beginning with -.
I think you could just print "-" and not "stdin".
In any case, *filename=='-' is wrong since it only checks filename[0].
In a few places you compare ==stdin (which is right).

Also, I think "f" isn't as good a name as "fp".

You're adding 139 lines to a 600 line main(), and no other option is more than
15 lines. Would you put it in a separate function ?

--
Justin
On Mon, Jul 6, 2020 at 10:05 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Here is support for comment lines - the first char should be #.
>

Few comments:

+            str = fgets(*lineptr + total_chars,
+                        *n - total_chars,
+                        fp);
+
+            if (ferror(fp))
+                return -1;

Should we include an error message in the above case?

+            else
+                break;
+        }
+
+    if (ferror(fp))
+        return -1;

Similar to above.

+        /* check, if there is good enough space for next content */
+        if (*n - total_chars < 2)
+        {
+            *n += 1024;
+            *lineptr = pg_realloc(*lineptr, *n);
+        }

We could use a macro in place of 1024.

+            if (objecttype == 't')
+            {
+                if (is_include)
+                {
+                    simple_string_list_append(&table_include_patterns,
+                                              objectname);
+                    dopt.include_everything = false;
+                }
+                else
+                    simple_string_list_append(&table_exclude_patterns,
+                                              objectname);
+            }
+            else if (objecttype == 'n')
+            {
+                if (is_include)
+                {
+                    simple_string_list_append(&schema_include_patterns,
+                                              objectname);
+                    dopt.include_everything = false;
+                }
+                else
+                    simple_string_list_append(&schema_exclude_patterns,
+                                              objectname);
+            }

Some of the above code is repetitive; can the common code be made into a
macro and called?

    printf(_("  --extra-float-digits=NUM     override default setting for extra_float_digits\n"));
+   printf(_("  --filter=FILENAME            read object name filter expressions from file\n"));
    printf(_("  --if-exists                  use IF EXISTS when dropping objects\n"));

Can this be changed to dump objects and data based on the filter
expressions from the filter file.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
ne 12. 7. 2020 v 1:06 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Mon, Jul 06, 2020 at 06:34:15AM +0200, Pavel Stehule wrote:
> >> > Comment support is still missing but easily added :)
> >> Still missing from the latest patch.
> >
> > I can implement a comment support. But I am not sure about the format. The
>
> here is support for comment's line - first char should be #
Thanks, that's how I assumed it would look.
> >> With some added documentation, I think this can be RfC.
Do you want to add any more documentation ?
done
Few more things:
> +exit_invalid_filter_format(FILE *fp, char *filename, char *message, char *line, int lineno)
> +{
> + pg_log_error("invalid format of filter file \"%s\": %s",
> + *filename == '-' ? "stdin" : filename,
> + message);
You refer to as "stdin" any filename beginning with -.
I think you could just print "-" and not "stdin".
In any case, *filename=='-' is wrong since it only checks filename[0].
In a few places you compare ==stdin (which is right).
done
Also, I think "f" isn't as good a name as "fp".
done
You're adding 139 lines to a 600 line main(), and no other option is more than
15 lines. Would you put it in a separate function ?
done
Please check the attached patch.
Regards
Pavel
--
Justin
Attachment
ne 12. 7. 2020 v 3:43 odesílatel vignesh C <vignesh21@gmail.com> napsal:
On Mon, Jul 6, 2020 at 10:05 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Here is support for comment lines - the first char should be #.
>
Few comments:
+ str = fgets(*lineptr + total_chars,
+ *n - total_chars,
+ fp);
+
+ if (ferror(fp))
+ return -1;
Should we include an error message in the above case?
+ else
+ break;
+ }
+
+ if (ferror(fp))
+ return -1;
Similar to above.
It should be OK - both variants finish with:

    if (ferror(fp))
        fatal("could not read from file \"%s\": %m", filename);

%m should print the related error message.
+        /* check, if there is good enough space for next content */
+        if (*n - total_chars < 2)
+        {
+            *n += 1024;
+            *lineptr = pg_realloc(*lineptr, *n);
+        }
We could use a macro in place of 1024.
done
+            if (objecttype == 't')
+            {
+                if (is_include)
+                {
+                    simple_string_list_append(&table_include_patterns,
+                                              objectname);
+                    dopt.include_everything = false;
+                }
+                else
+                    simple_string_list_append(&table_exclude_patterns,
+                                              objectname);
+            }
+            else if (objecttype == 'n')
+            {
+                if (is_include)
+                {
+                    simple_string_list_append(&schema_include_patterns,
+                                              objectname);
+                    dopt.include_everything = false;
+                }
+                else
+                    simple_string_list_append(&schema_exclude_patterns,
+                                              objectname);
+            }
Some of the above code is repetitive; can the common code be made into a
macro and called?
There are two identical fragments and two different fragments. In this case I don't think a macro or an auxiliary function would help readability. The current code is well structured and readable.
printf(_(" --extra-float-digits=NUM override default
setting for extra_float_digits\n"));
+ printf(_(" --filter=FILENAME read object name
filter expressions from file\n"));
printf(_(" --if-exists use IF EXISTS when
dropping objects\n"));
Can this be changed to dump objects and data based on the filter
expressions from the filter file.
I am sorry, I don't understand. This should work for data specified by the filter without any modification.
Attached is an updated patch.
Regards
Pavel
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachment
> On 13 Jul 2020, at 10:20, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Attached is an updated patch.

Sorry for jumping in late, but thinking about this extension to pg_dump:
doesn't it make more sense to use an existing file format like JSON for this,
given that virtually all devops/cd/etc tooling know about JSON already?

Considering its application and the problem space, I'd expect users to
generate this file rather than handcraft it with 10 rows of content, and
standard file formats help there. Creative users could even use the database
itself to easily manage its content and generate the file (which isn't limited
to JSON of course, but it would be easier). Also, we now have backup manifests
in JSON, which IMO sets a bit of a precedent, even though that's a separate
thing.

At the very least it seems limiting to not include a file format version
identifier, since we'd otherwise risk running into backwards compat issues
should we want to expand on this in the future.

cheers ./daniel
po 13. 7. 2020 v 12:04 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
> On 13 Jul 2020, at 10:20, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Attached is an updated patch.
Sorry for jumping in late, but thinking about this extension to pg_dump:
doesn't it make more sense to use an existing file format like JSON for this,
given that virtually all devops/cd/etc tooling know about JSON already?
Considering its application and the problem space, I'd expect users to generate
this file rather than handcraft it with 10 rows of content, and standard file
formats help there. Creative users could even use the database itself to
easily manage its content and generate the file (which isn't limited to JSON of
course, but it would be easier). Also, we now have backup manifests in JSON
which IMO sets a bit of a precedent, even though thats a separate thing.
At the very least it seems limiting to not include a file format version
identifier since we'd otherwise risk running into backwards compat issues
should we want to expand on this in the future.
I like the JSON format, but why here? For this purpose JSON is over-engineered. This input file has no nested structure - it is just a stream of lines.
I don't think introducing JSON here is a good idea. Typical usage of this feature will be in a pipe, and the simplest possible format is ideal.
It is a really different case from the backup manifest file - in this case, pg_dump is the consumer.
Regards
Pavel
cheers ./daniel
On Mon, Jul 13, 2020 at 1:51 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Can this be changed to dump objects and data based on the filter
>> expressions from the filter file.
>
> I am sorry, I don't understand. This should work for data specified by the filter without any modification.

I meant can this:

printf(_("  --filter=FILENAME            read object name filter expressions from file\n"));

be changed to:

printf(_("  --filter=FILENAME            dump objects and data based on the filter expressions from the filter file\n"));

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Mon, Jul 13, 2020 at 12:04:09PM +0200, Daniel Gustafsson wrote:
> Sorry for jumping in late, but thinking about this extension to pg_dump:
> doesn't it make more sense to use an existing file format like JSON for this,
> given that virtually all devops/cd/etc tooling know about JSON already?
>
> Considering its application and the problem space, I'd expect users to
> generate this file rather than handcraft it with 10 rows of content, and
> standard file formats help there.

I mentioned having tested this patch as we would use it. But it's likely I
*wouldn't* use it if the format was something which required added complexity
to pipe in from an existing shell script.

> At the very least it seems limiting to not include a file format version
> identifier, since we'd otherwise risk running into backwards compat issues
> should we want to expand on this in the future.

Maybe .. I'm not sure. The patch would of course be extended to handle
additional include/exclude options. Is there any other future behavior we
might reasonably anticipate ?

If at some point we wanted to support another file format, maybe it would
look like: --format=v2:filters.txt (or maybe the old one would be
v1:filters.txt)

--
Justin
> On 13 Jul 2020, at 13:02, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I like the JSON format, but why here? For this purpose JSON is over-engineered.

I respectfully disagree, JSON is a commonly used and known format in systems
administration and most importantly: we already have code to parse it in the
frontend.

> This input file has no nested structure - it is just a stream of lines.

Well, it has a set of object types which in turn have objects. There is more
structure than meets the eye.

Also, the current patch allows arbitrary whitespace before object names, but no
whitespace before comments etc. Using something where the rules of parsing are
known is rarely a bad thing.

> I don't think introducing JSON here is a good idea.

Quite possibly it isn't, but not discussing options seems like a worse idea so
I wanted to bring it up.

> It is a really different case from the backup manifest file - in this case, pg_dump is the consumer.

Right, as I said these are two different, while tangentially related, things.

cheers ./daniel
po 13. 7. 2020 v 16:57 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
> On 13 Jul 2020, at 13:02, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I like the JSON format, but why here? For this purpose JSON is over-engineered.
I respectfully disagree, JSON is a commonly used and known format in systems
administration and most importantly: we already have code to parse it in the
frontend.
I disagree with the idea that if we have a client-side JSON parser, we have to use it everywhere.
For this case, parsing JSON means more code, not less. I checked parse_manifest.c. Moreover, the JSON API is DOM style; for this purpose a SAX style would be better. But still, things should be as simple as possible.
There is no necessity to use it.
JSON is good for a lot of purposes, and it can be good when a document uses more token types (numeric, ...). But nothing like that is used here.
> This input file has no nested structure - it is just a stream of lines.
Well, it has a set of object types which in turn have objects. There is more
structure than meets the eye.
Also, the current patch allows arbitrary whitespace before object names, but no
whitespace before comments etc. Using something where the rules of parsing are
known is rarely a bad thing.
As far as I know, JSON has no comments at all.
> I don't think introducing JSON here is a good idea.
Quite possibly it isn't, but not discussing options seems like a worse idea so
I wanted to bring it up.
> It is a really different case from the backup manifest file - in this case, pg_dump is the consumer.
Right, as I said these are two different, while tangentially related, things.
The backup manifest format has nontrivial complexity - there, using JSON makes sense. The input filter file is trivial - a +/- list of strings (and it always will be).
In this case I don't see any benefit from JSON on either side (producer, consumer). It is (a little bit) harder to parse, and (a little bit) harder to generate.
Regards
Pavel
cheers ./daniel
On Mon, Jul 13, 2020 at 08:15:42AM +0200, Pavel Stehule wrote:
> > Do you want to add any more documentation ?
>
> done

Thanks - I think the documentation was maybe excessive. See attached.

--
Justin
Attachment
po 13. 7. 2020 v 20:00 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Mon, Jul 13, 2020 at 08:15:42AM +0200, Pavel Stehule wrote:
> > Do you want to add any more documentation ?
> >
>
> done
Thanks - I think the documentation was maybe excessive. See attached.
I merged your patch - thank you
A new patch with the doc changes and the help-text change requested by Vignesh is attached.
Regards
Pavel
--
Justin
Attachment
po 13. 7. 2020 v 15:33 odesílatel vignesh C <vignesh21@gmail.com> napsal:
On Mon, Jul 13, 2020 at 1:51 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Can this be changed to dump objects and data based on the filter
>> expressions from the filter file.
>
>
> I am sorry, I don't understand. This should work for data specified by the filter without any modification.
>
I meant can this:

printf(_("  --filter=FILENAME            read object name filter expressions from file\n"));

be changed to:

printf(_("  --filter=FILENAME            dump objects and data based on the filter expressions from the filter file\n"));
Done in today's patch.
Pavel
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jul 14, 2020 at 12:03 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> I meant can this:
>> printf(_("  --filter=FILENAME            read object name filter expressions from file\n"));
>> be changed to:
>> printf(_("  --filter=FILENAME            dump objects and data based on the filter expressions from the filter file\n"));
>
> Done in today's patch.

Thanks for fixing the comments.

Few comments:

+    /* use "-" as symbol for stdin */
+    if (strcmp(filename, "-") != 0)
+    {
+        fp = fopen(filename, "r");
+        if (!fp)
+            fatal("could not open the input file \"%s\": %m",
+                  filename);
+    }
+    else
+        fp = stdin;

We could use STDIN itself instead of "-"; it would be an easier option to
understand.

+    /* when first char is hash, ignore whole line */
+    if (*line == '#')
+        continue;

If a line starts with # we ignore that line; I feel this should be
included in the documentation.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Sat, Jul 25, 2020 at 06:56:31PM +0530, vignesh C wrote:
> On Tue, Jul 14, 2020 at 12:03 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> >> I meant can this:
> >> printf(_("  --filter=FILENAME            read object name filter expressions from file\n"));
> >> be changed to:
> >> printf(_("  --filter=FILENAME            dump objects and data based on the filter expressions from the filter file\n"));
> >
> > Done in today's patch.
>
> Thanks for fixing the comments.
> Few comments:
> +    /* use "-" as symbol for stdin */
> +    if (strcmp(filename, "-") != 0)
> +    {
> +        fp = fopen(filename, "r");
> +        if (!fp)
> +            fatal("could not open the input file \"%s\": %m",
> +                  filename);
> +    }
> +    else
> +        fp = stdin;
>
> We could use STDIN itself instead of "-"; it would be an easier option to
> understand.

I think "-" is used widely for commandline tools, and STDIN is not (even
though it's commonly used by programmers). For example, since last year,
pg_restore -f - means stdout.

--
Justin
so 25. 7. 2020 v 15:26 odesílatel vignesh C <vignesh21@gmail.com> napsal:
On Tue, Jul 14, 2020 at 12:03 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> I meant can this:
>> printf(_(" --filter=FILENAME read object name filter
>> expressions from file\n"));
>> be changed to:
>> printf(_(" --filter=FILENAME dump objects and data based
>> on the filter expressions from the filter file\n"));
>
> done in today patch
>
Thanks for fixing the comments.
Few comments:
+ /* use "-" as symbol for stdin */
+ if (strcmp(filename, "-") != 0)
+ {
+ fp = fopen(filename, "r");
+ if (!fp)
+ fatal("could not open the input file \"%s\": %m",
+ filename);
+ }
+ else
+ fp = stdin;
We could use STDIN itself instead of "-"; it would be an easier
option to understand.
+ /* when first char is hash, ignore whole line */
+ if (*line == '#')
+ continue;
If a line starts with # we ignore that line; I feel this should be
included in the documentation.
Good note - I added a sentence to the doc:
+   <para>
+    Lines starting with the symbol <literal>#</literal> are ignored.
+    Preceding whitespace characters (spaces, tabs) are not allowed. These
+    lines can be used for comments and notes.
+   </para>
+
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachment
ne 26. 7. 2020 v 21:10 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Sat, Jul 25, 2020 at 06:56:31PM +0530, vignesh C wrote:
> On Tue, Jul 14, 2020 at 12:03 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> >> I meant can this:
> >> printf(_(" --filter=FILENAME read object name filter
> >> expressions from file\n"));
> >> be changed to:
> >> printf(_(" --filter=FILENAME dump objects and data based
> >> on the filter expressions from the filter file\n"));
> >
> > done in today patch
>
> Thanks for fixing the comments.
> Few comments:
> + /* use "-" as symbol for stdin */
> + if (strcmp(filename, "-") != 0)
> + {
> + fp = fopen(filename, "r");
> + if (!fp)
> + fatal("could not open the input file \"%s\": %m",
> + filename);
> + }
> + else
> + fp = stdin;
>
> We could use STDIN itself instead of "-"; it would be an easier
> option to understand.
I think "-" is used widely for commandline tools, and STDIN is not (even though
it's commonly used by programmers). For example, since last year, pg_restore
-f - means stdout.
Yes, STDIN is used in programming languages, but it is not usual in command-line tools. And because "-" was already used by pg_restore, we should not introduce a new inconsistency.
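For example (hypothetical invocation):

pg_dump --filter=- mydb < filter.txt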
Regards
Pavel
--
Justin
On Mon, Jul 27, 2020 at 07:25:54AM +0200, Pavel Stehule wrote:
> so 25. 7. 2020 v 15:26 odesílatel vignesh C <vignesh21@gmail.com> napsal:
> > On Tue, Jul 14, 2020 at 12:03 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > >> I meant can this:
> > >> printf(_("  --filter=FILENAME            read object name filter expressions from file\n"));
> > >> be changed to:
> > >> printf(_("  --filter=FILENAME            dump objects and data based on the filter expressions from the filter file\n"));
> > >
> > > Done in today's patch.

This looks good to my eyes - marked RFC.

--
Justin
On Sun, Jul 05, 2020 at 10:08:09PM +0200, Pavel Stehule wrote:
> st 1. 7. 2020 v 23:24 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
> > On Thu, Jun 11, 2020 at 09:36:18AM +0200, Pavel Stehule wrote:
> > > st 10. 6. 2020 v 0:30 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
> > Also, your getline is dynamically re-allocating lines of arbitrary length.
> > Possibly that's not needed. We'll typically read "+t schema.relname", which is
> > at most 132 chars. Maybe it's sufficient to do
> > char buf[1024];
> > fgets(buf);
> > if strchr(buf, '\n') == NULL: error();
> > ret = pstrdup(buf);
>
> 63 bytes is the max effective identifier size, but it is not the max size of
> identifiers. Very probably a buffer of 1024 bytes would be enough for
> everyone, but I do not want to introduce a new magic limit, especially when
> the dynamic implementation is not too hard.

Maybe you'd want to use a StringInfo like recent patches (8f8154a50).

> A table name can be very long - sometimes data names (table names) are stored
> in external storage at full length, and it would not be practical to require
> truncating them in the filter file.
>
> For this case it is very efficient, because the resized (increased) buffer is
> reused for the following rows, so reallocs should be rare. So when I have to
> choose between two implementations of similar complexity, I prefer the more
> dynamic code without hardcoded limits. The dynamic approach has practically
> no overhead.

--
Justin
On 2020-Jul-27, Pavel Stehule wrote:

> +/*
> + * getline is originally GNU function, and should not be everywhere still.
> + * Use own reduced implementation.
> + */
> +static size_t
> +pg_getline(char **lineptr, size_t *n, FILE *fp)
> +{

So, Tom added a coding pattern for doing this in commit 8f8154a503c7,
which is ostensibly also to be used in pg_regress [1] -- maybe it'd be
useful to have this in src/common?

[1] https://postgr.es/m/m_1NfbowTqSJnrC6rq1a9cQK7E-CHQE7B6Kz9w6fNH-OiV-4mcsdMw7UP2oA2_6dZmXvAMjbSPZjW9U7FD2R52D3d9DtaJxcBprsqJqZNBc=@protonmail.com

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> So, Tom added a coding pattern for doing this in commit 8f8154a503c7,
> which is ostensibly also to be used in pg_regress [1] -- maybe it'd be
> useful to have this in src/common?

Done, see pg_get_line() added by 67a472d71.

regards, tom lane
pá 4. 9. 2020 v 2:15 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> So, Tom added a coding pattern for doing this in commit 8f8154a503c7,
> which is ostensibly also to be used in pg_regress [1] -- maybe it'd be
> useful to have this in src/common?
Done, see pg_get_line() added by 67a472d71.
Here is updated patch for pg_dump
Regards
Pavel
regards, tom lane
Attachment
pá 4. 9. 2020 v 5:21 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
pá 4. 9. 2020 v 2:15 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> So, Tom added a coding pattern for doing this in commit 8f8154a503c7,
> which is ostensibly also to be used in pg_regress [1] -- maybe it'd be
> useful to have this in src/common?
Done, see pg_get_line() added by 67a472d71.

Here is updated patch for pg_dump
Another update, based on the pg_get_line_append function.
Regards
Pavel
regards, tom lane
Attachment
Hi Pavel
On Fri, Sep 4, 2020 at 6:22 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
Here is updated patch for pg_dump
pg_dumpall also has --exclude-database=pattern and --no-comments options; doesn't that qualify it to benefit from this feature? And please add a test case for this option.
regards
Surafel
Hi
po 7. 9. 2020 v 14:14 odesílatel Surafel Temesgen <surafel3000@gmail.com> napsal:
Hi Pavel
On Fri, Sep 4, 2020 at 6:22 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
Here is updated patch for pg_dump
pg_dumpall also has --exclude-database=pattern and --no-comments options; doesn't that qualify it to benefit from this feature? And please add a test case for this option.
This patch is related to pg_dump (at this moment), so the pg_dumpall options are out of scope.
I am not sure if pg_dumpall needs this functionality - maybe, but bash or something similar can be used to implement this feature. There is no requirement to do all the necessary work under one transaction, one snapshot.
For pg_dumpall a different format could be used, because it works at a different granularity - something like "{+/-} dbname".
"--no-comments" is a global parameter without arguments, so I don't understand how this parameter is related to this feature.
I am working on regress tests.
Regards
Pavel
regards
Surafel
Hi
pá 11. 9. 2020 v 10:50 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
Hi

po 7. 9. 2020 v 14:14 odesílatel Surafel Temesgen <surafel3000@gmail.com> napsal:
Hi Pavel
On Fri, Sep 4, 2020 at 6:22 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
Here is updated patch for pg_dump
pg_dumpall also has --exclude-database=pattern and --no-comments options; doesn't that qualify it to benefit from this feature? And please add a test case for this option.
This patch is related to pg_dump (at this moment), so the pg_dumpall options are out of scope.

I am not sure if pg_dumpall needs this functionality - maybe, but bash or something similar can be used to implement this feature. There is no requirement to do all the necessary work under one transaction, one snapshot.

For pg_dumpall a different format could be used, because it works at a different granularity - something like "{+/-} dbname".

"--no-comments" is a global parameter without arguments, so I don't understand how this parameter is related to this feature.

I am working on regress tests.
Here is an updated version with regression tests.
Regards
Pavel
regards
Surafel
Attachment
Hi
rebase + a minor change - using pg_get_line_buf instead of pg_get_line_append
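The reading loop now looks roughly like this (a simplified sketch, not the patch itself; it assumes pg_get_line_buf() refills the caller's StringInfo, returns false at EOF, and, like fgets, keeps the trailing newline; real frontend code would include postgres_fe.h first):

    #include <stdio.h>
    #include "common/string.h"      /* pg_get_line_buf() */
    #include "lib/stringinfo.h"

    static void
    read_filters_sketch(FILE *fp)
    {
        StringInfoData line;

        initStringInfo(&line);
        while (pg_get_line_buf(fp, &line))  /* false at EOF */
        {
            /* strip the trailing newline, if any */
            if (line.len > 0 && line.data[line.len - 1] == '\n')
                line.data[--line.len] = '\0';

            if (line.data[0] == '\0' || line.data[0] == '#')
                continue;           /* empty line or comment */

            /* ... parse "[+-][tndf] objectname" here ... */
        }
        pfree(line.data);
    }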
Regards
Pavel
Attachment
Greetings,

* Pavel Stehule (pavel.stehule@gmail.com) wrote:
> rebase + a minor change - using pg_get_line_buf instead of pg_get_line_append

I started looking at this and went back through the thread and while I
tend to agree that JSON may not be a good choice for this, it's not the
only possible alternative. There is no doubt that pg_dump is already a
sophisticated data export tool, and likely to continue to gain new
features, such that having a configuration file for it would be very
handy, but this clearly isn't really going in a direction that would
allow for that.

Perhaps this feature could co-exist with a full blown configuration for
pg_dump, but even then there's certainly issues with what's proposed-
how would you handle explicitly asking for a table which is named
" mytable" to be included or excluded? Or a table which has a newline
in it? Using a standardized format which supports the full range of
what we do in a table name, explicitly and clearly, would address these
issues and also give us the flexibility to extend the options which
could be used through the configuration file beyond just the filters in
the future.

Unlike for the pg_basebackup manifest, which we generate and read
entirely programmatically, a config file for pg_dump would almost
certainly be updated manually (or, at least, parts of it would be and
perhaps other parts generated), which means it'd really be ideal to have
a proper way to support comments in it (something that the proposed
format also doesn't really get right- # must be the *first* character,
and you can only have whole-line comments..?), avoid extra unneeded
punctuation (or, at times, allow it- such as trailing commas in lists),
cleanly handle multi-line strings (consider the oft discussed idea
around having pg_dump support a WHERE clause for exporting data from
tables...), etc.

Overall, -1 from me on this approach. Maybe it could be fixed up to
handle all the different names of objects that we support today
(something which, imv, is really a clear requirement for this feature to
be committed), but I suspect you'd end up half-way to yet another
configuration format when we could be working to support something like
TOML or maybe YAML... but if you want my 2c, TOML seems closer to what
we do for postgresql.conf and getting that over to something that's
standardized, while a crazy long shot, is a general nice idea, imv.

Thanks,

Stephen
Attachment
Hi
út 10. 11. 2020 v 21:09 odesílatel Stephen Frost <sfrost@snowman.net> napsal:
Greetings,
* Pavel Stehule (pavel.stehule@gmail.com) wrote:
> rebase + a minor change - using pg_get_line_buf instead of pg_get_line_append
I started looking at this and went back through the thread and while I
tend to agree that JSON may not be a good choice for this, it's not the
only possible alternative. There is no doubt that pg_dump is already a
sophisticated data export tool, and likely to continue to gain new
features, such that having a configuration file for it would be very
handy, but this clearly isn't really going in a direction that would
allow for that.
Perhaps this feature could co-exist with a full blown configuration for
pg_dump, but even then there's certainly issues with what's proposed-
how would you handle explicitly asking for a table which is named
" mytable" to be included or excluded? Or a table which has a newline
in it? Using a standardized format which supports the full range of
what we do in a table name, explicitly and clearly, would address these
issues and also give us the flexibility to extend the options which
could be used through the configuration file beyond just the filters in
the future.
This is the correct argument - I will check the possibility of using strange names, but there is the same possibility and functionality as we allow from the command line. So you can use double-quoted names. I'll check it.
Unlike for the pg_basebackup manifest, which we generate and read
entirely programmatically, a config file for pg_dump would almost
certainly be updated manually (or, at least, parts of it would be and
perhaps other parts generated), which means it'd really be ideal to have
a proper way to support comments in it (something that the proposed
format also doesn't really get right- # must be the *first* character,
and you can only have whole-line comments..?), avoid extra unneeded
punctuation (or, at times, allow it- such as trailing commas in lists),
cleanly handle multi-line strings (consider the oft discussed idea
around having pg_dump support a WHERE clause for exporting data from
tables...), etc.
I think the proposed feature is very far from being the config file for pg_dump (it implements an option "--filter"). This is not the target. It is not designed for this. This is just an alternative for options like -t, -T, ... and I am sure that nobody will generate this file manually. The main target of this patch is eliminating problems with the max length of the command line. So it is really not designed to be the config file for pg_dump.
Overall, -1 from me on this approach. Maybe it could be fixed up to
handle all the different names of objects that we support today
(something which, imv, is really a clear requirement for this feature to
be committed), but I suspect you'd end up half-way to yet another
configuration format when we could be working to support something like
TOML or maybe YAML... but if you want my 2c, TOML seems closer to what
we do for postgresql.conf and getting that over to something that's
standardized, while a crazy long shot, is a general nice idea, imv.
I have nothing against TOML, but I don't see the sense of using it in this patch. This patch doesn't implement a config file for pg_dump, and I don't see any sense or benefit in one. TOML is designed for different purposes. TOML is good for manual creation, but that is not this case. Typical usage of this patch is something like the following, and TOML (or JSON) syntax is not good for this.
psql -c "select '+t' || quote_ident(relname) from pg_class where relname ..." | pg_dump --filter=/dev/stdin
I can imagine some benefits of saved configuration files for Postgres applications - but such a thing should be designed generally and implemented generally. Probably one would be used for pg_dump, psql, pg_restore, .... But that is a different feature with different usage. This patch doesn't implement an option "--config", it implements an option "--filter".
Regards
Pavel
Thanks,
Stephen
Hi
Perhaps this feature could co-exist with a full blown configuration for
pg_dump, but even then there's certainly issues with what's proposed-
how would you handle explicitly asking for a table which is named
" mytable" to be included or excluded? Or a table which has a newline
in it? Using a standardized format which supports the full range of
what we do in a table name, explicitly and clearly, would address these
issues and also give us the flexibility to extend the options which
could be used through the configuration file beyond just the filters in
the future.

This is the correct argument - I will check the possibility of using strange names, but there is the same possibility and functionality as we allow from the command line. So you can use double-quoted names. I'll check it.
I checked
echo "+t \"bad Name\"" | /usr/local/pgsql/master/bin/pg_dump --filter=/dev/stdin
It is working without any problem
Regards
Pavel
Hi
čt 24. 9. 2020 v 19:47 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
Hi
rebase + minor change - using pg_get_line_buf instead of pg_get_line_append
fresh rebase
Regards
Pavel
Attachment
st 11. 11. 2020 v 6:32 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
[...]
Some generic configuration for postgres binary applications is an interesting idea. And the TOML language could work well for this purpose. We can parametrize applications by command line and by system variables. But filtering objects is a really different case - although there is some small intersection, it will be used very differently, and I don't think one language can be practical for both cases. The object filtering is an independent feature, and both features can coexist together.
Regards
Pavel
On 2020-Nov-11, Pavel Stehule wrote:
> I think the proposed feature is very far from being the config file for pg_dump
> (it implements an option "--filter"). This is not the target. It is not
> designed for this. This is just an alternative for options like -t, -T, ...
> and I am sure that nobody will generate this file manually. The main target of
> this patch is eliminating problems with the max length of the command line.
> So it is really not designed to be the config file for pg_dump.

I agree that a replacement for existing command line arguments is a good goal, but at the same time it's good to keep in mind the request that more object types are supported as dumpable. While it's not necessary that this infrastructure supports all object types in the first cut, it'd be good to have it support that. I would propose that instead of a single letter 't' etc we support keywords, maybe similar to those returned by getObjectTypeDescription() (with additions -- for example for "table data"). Then we can extend for other object types later without struggling to find good letters for each.

Of course we could allow abbreviations for common cases, such that "t" means "table".

For example: it'll be useful to support selective dumping of functions, materialized views, foreign objects, etc.
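For illustration (an editor's sketch, not text from any patch in this thread), keyword-based rules built on getObjectTypeDescription()-style names might look like:

+table mytable
+materialized view sales_summary
-table data old_log_table
+function myschema.myfunc

Note that several of these type names contain spaces ("materialized view", "table data"), which any line-based parser would have to account for.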
st 11. 11. 2020 v 16:17 odesílatel Alvaro Herrera <alvherre@alvh.no-ip.org> napsal:
On 2020-Nov-11, Pavel Stehule wrote:
> I think the proposed feature is very far from being the config file for pg_dump
> (it implements an option "--filter"). This is not the target. It is not
> designed for this. This is just an alternative for options like -t, -T, ...
> and I am sure that nobody will generate this file manually. The main target of
> this patch is eliminating problems with the max length of the command line.
> So it is really not designed to be the config file for pg_dump.
I agree that a replacement for existing command line arguments is a good
goal, but at the same time it's good to keep in mind the request that
more object types are supported as dumpable. While it's not necessary
that this infrastructure supports all object types in the first cut,
it'd be good to have it support that. I would propose that instead of a
single letter 't' etc we support keywords, maybe similar to those
returned by getObjectTypeDescription() (with additions -- for example
for "table data"). Then we can extend for other object types later
without struggling to find good letters for each.
Of course we could allow abbreviations for common cases, such that "t"
means "table".
For example: it'll be useful to support selective dumping of functions,
materialized views, foreign objects, etc.
Implementation of this is trivial.
The hard work is mapping pg_dump options onto database objects. t -> table is simple, but n -> schema looks a little bit inconsistent - although it is consistent with pg_dump. d or D - there is no system object like "data". I am afraid that there are two independent systems - pg_dump options and database objects - and it can be hard or not very practical to join these systems. Unfortunately there is no good consistency in the short options of pg_dump today. Moreover, a lot of object type names are multi-word with inner spaces. This is not too practical.
What about supporting two syntaxes?
1. the current short +-tndf filter - but the implementation should not be limited to one char; there can be any string up to a space
2. a long syntax - all these pg_dump options have long variants, so we can extend this feature without any problem in the future:
table|exclude-table|exclude-table-data|schema|exclude-schema|include-foreign-data=PATTERN
so the content of filter file can looks like:
+t mytable
+t tabprefix*
-t bigtable
table=mytable2
exclude-table=mytable2
This format allows quick work for most common database objects, and it is extensible and consistent with pg_dump's long options.
What do you think about it?
Personally, I think that it is a little bit of over-engineering; maybe we can implement this feature by just testing the first string after the +- symbols (instead of the first char like now) - and any enhanced syntax can be implemented in the future when there is a requirement for it. The second syntax can be implemented very simply, because it can be identified by processing the first char. We could also implement only the second syntax. It would work too, but I think the short syntax is more practical for daily work (for common options). I expect that 99% of these entries will be "+t tablename".
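As an editor's sketch of that idea (a hypothetical helper, simplified and not taken from the patch), reading the whole word after the +/- symbol instead of a single character could look like this:

/* Hypothetical helper, not from the patch: split one filter line of the
 * form "[+-]WORD PATTERN", so that "+t foo" and a future "+table foo"
 * go through the same code path. objtype_sz must be > 0. */
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

static bool
parse_filter_line(const char *line, bool *is_include,
                  char *objtype, size_t objtype_sz, const char **pattern)
{
    const char *p = line;
    size_t      n = 0;

    while (isspace((unsigned char) *p))
        p++;
    if (*p != '+' && *p != '-')
        return false;           /* blank lines and # comments handled elsewhere */
    *is_include = (*p == '+');
    p++;

    /* copy the object-type word up to the first whitespace */
    while (*p && !isspace((unsigned char) *p) && n < objtype_sz - 1)
        objtype[n++] = *p++;
    objtype[n] = '\0';

    while (isspace((unsigned char) *p))
        p++;
    *pattern = p;               /* rest of the line: possibly quoted name */
    return n > 0 && *p != '\0';
}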
Regards
Pavel
On Wed, Nov 11, 2020 at 06:49:43AM +0100, Pavel Stehule wrote:
> >> Perhaps this feature could co-exist with a full blown configuration for
> >> pg_dump, but even then there's certainly issues with what's proposed-
> >> how would you handle explicitly asking for a table which is named
> >> " mytable" to be included or excluded? Or a table which has a newline
> >> in it? Using a standardized format which supports the full range of
> >> what we do in a table name, explicitly and clearly, would address these
> >> issues and also give us the flexibility to extend the options which
> >> could be used through the configuration file beyond just the filters in
> >> the future.

I think it's a reasonable question - why would a new configuration file option include support for only a handful of existing arguments but not the rest.

> > This is the correct argument - I will check a possibility to use strange
> > names, but there is the same possibility and functionality like we allow
> > from the command line. So you can use double quoted names. I'll check it.
>
> I checked
> echo "+t \"bad Name\"" | /usr/local/pgsql/master/bin/pg_dump --filter=/dev/stdin
> It is working without any problem

I think it couldn't possibly work with newlines, since you call pg_get_line().

I realize that entering a newline into the shell would also be a PITA, but that could be one *more* reason to support a config file - to allow terrible table names to be in a file and avoid writing dash tee quote something enter else quote in a pg_dump command, or shell script.

I fooled with argument parsing to handle reading from a file in the quickest way. As written, this fails to handle multiple config files, and special table names, which need to support arbitrary, logical lines, with quotes surrounding newlines or other special chars. As written, the --config file is parsed *after* all other arguments, so it could override previous args (like --no-blobs --no-blogs, --file, --format, --compress, --lock-wait), which I guess is bad, so the config file should be processed *during* argument parsing. Unfortunately, I think that suggests duplicating parsing of all/most the argument parsing for config file support - I'd be happy if someone suggested a better way.

BTW, in your most recent patch:
s/empty rows/empty lines/
unbalanced parens: "invalid option type (use [+-]"

@cfbot: I renamed the patch so please ignore it.

--
Justin
Attachment
Greetings,

* Justin Pryzby (pryzby@telsasoft.com) wrote:
> On Wed, Nov 11, 2020 at 06:49:43AM +0100, Pavel Stehule wrote:
> > >> Perhaps this feature could co-exist with a full blown configuration for
> > >> pg_dump, but even then there's certainly issues with what's proposed-
> > >> how would you handle explicitly asking for a table which is named
> > >> " mytable" to be included or excluded? Or a table which has a newline
> > >> in it? Using a standardized format which supports the full range of
> > >> what we do in a table name, explicitly and clearly, would address these
> > >> issues and also give us the flexibility to extend the options which
> > >> could be used through the configuration file beyond just the filters in
> > >> the future.
>
> I think it's a reasonable question - why would a new configuration file option
> include support for only a handful of existing arguments but not the rest.

Even if the first version of having a config file for pg_dump only supported some options, that would be reasonable imv, but I dislike the idea of building it in such a way that it'll be awkward to add more options to it in the future, something that I definitely think people would like to see (I know I would...).

> > > This is the correct argument - I will check a possibility to use strange
> > > names, but there is the same possibility and functionality like we allow
> > > from the command line. So you can use double quoted names. I'll check it.
> >
> > I checked
> > echo "+t \"bad Name\"" | /usr/local/pgsql/master/bin/pg_dump --filter=/dev/stdin
> > It is working without any problem
>
> I think it couldn't possibly work with newlines, since you call pg_get_line().

Yeah, I didn't really believe that it actually worked but hadn't had a chance to demonstrate that it didn't yet.

> I realize that entering a newline into the shell would also be a PITA, but that
> could be one *more* reason to support a config file - to allow terrible table
> names to be in a file and avoid writing dash tee quote something enter else
> quote in a pg_dump command, or shell script.

Agreed.

> I fooled with argument parsing to handle reading from a file in the quickest
> way. As written, this fails to handle multiple config files, and special table
> names, which need to support arbitrary, logical lines, with quotes surrounding
> newlines or other special chars. As written, the --config file is parsed
> *after* all other arguments, so it could override previous args (like
> --no-blobs --no-blogs, --file, --format, --compress, --lock-wait), which I
> guess is bad, so the config file should be processed *during* argument parsing.
> Unfortunately, I think that suggests duplicating parsing of all/most the
> argument parsing for config file support - I'd be happy if someone suggested a
> better way.

This still feels like we're trying to quickly hack-and-slash at adding a config file option rather than thinking through what a sensible design for a pg_dump config file would look like. Having a way to avoid having multiple places in the code that has to handle all the possible options is a nice idea but, as I tried to allude to up-thread, I fully expect that once we've got this config file capability that we're going to want to add things to it that would be difficult to utilize through the command-line and so I expect these code paths to diverge anyway.

I would imagine something like:

[connection]
db-host=whatever
db-port=5433
...

[output]
owners = true
privileges = false
format = "custom"
file = "myoutput.dump"  # This is a comment

[include]
tables = [
    "sometable",
    "table with spaces",
    "table with quoted \"",
    """this is my table
with a carriage return""",
    "anothertable"
]
table-patterns = [ "table*" ]
schemas = [ "myschema" ]

[exclude]
tables = [ "similar to include" ]
functions = [ "somefunction(int)" ]

etc, etc ...

Thanks,

Stephen
Attachment
út 17. 11. 2020 v 22:53 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Wed, Nov 11, 2020 at 06:49:43AM +0100, Pavel Stehule wrote:
> Perhaps this feature could co-exist with a full blown configuration for
> >> pg_dump, but even then there's certainly issues with what's proposed-
> >> how would you handle explicitly asking for a table which is named
> >> " mytable" to be included or excluded? Or a table which has a newline
> >> in it? Using a standardized format which supports the full range of
> >> what we do in a table name, explicitly and clearly, would address these
> >> issues and also give us the flexibility to extend the options which
> >> could be used through the configuration file beyond just the filters in
> >> the future.
I think it's a reasonable question - why would a new configuration file option
include support for only a handful of existing arguments but not the rest.
I don't see a strong technical problem - enhancing parsing is not hard work, but I miss a use case for this. The option "--filter" tries to solve a problem with limited command line size. This is a clean use case, and the supported options are those that can be used repeatedly on the command line. Nothing less, nothing more. The format that is used is designed just for this purpose.
If we were to implement an alternative configuration to the command line and system environment, then the use case should be defined first. When the use case is defined, we can talk about implementation and about a good format. There are a lot of interesting formats, but I miss a reason why this alternative configuration would be helpful for pg_dump. Using external libraries for richer formats means a new dependency, the necessity of solving portability issues, and maybe other issues, and for that there should be a good use case. Passing a list of tables for dumping doesn't need a rich format.
I cannot imagine using a config file with generated object names and some other options together. Maybe if these configurations are not too long (and thus hand-written), a configuration file can be usable. But when I think about using pg_dump from some bash scripts, then it is much more practical to use the usual command line options and pass the list of objects by pipe. I really miss the use case for a special pg_dump config file, and if there is one, it is very different from the use case for the "--filter" option.
> > This is the correct argument - I will check a possibility to use strange
> > names, but there is the same possibility and functionality like we allow
> > from the command line. So you can use double quoted names. I'll check it.
>
> I checked
> echo "+t \"bad Name\"" | /usr/local/pgsql/master/bin/pg_dump --filter=/dev/stdin
> It is working without any problem
I think it couldn't possibly work with newlines, since you call pg_get_line().
I realize that entering a newline into the shell would also be a PITA, but that
could be one *more* reason to support a config file - to allow terrible table
names to be in a file and avoid writing dash tee quote something enter else
quote in a pg_dump command, or shell script.
The new patch works with names that contain newlines:
[pavel@localhost postgresql.master]$ psql -At -X -c "select '+t ' || quote_ident(table_name) from information_schema.tables where table_name like 'foo%'"| /usr/local/pgsql/master/bin/pg_dump --filter=/dev/stdin
--
-- PostgreSQL database dump
--
-- Dumped from database version 14devel
-- Dumped by pg_dump version 14devel
--
-- Name: foo boo; Type: TABLE; Schema: public; Owner: pavel
--
CREATE TABLE public."foo
boo" (
a integer
);
ALTER TABLE public."foo
boo" OWNER TO pavel;
--
-- Data for Name: foo boo; Type: TABLE DATA; Schema: public; Owner: pavel
--
COPY public."foo
boo" (a) FROM stdin;
\.
--
-- PostgreSQL database dump complete
--
I fooled with argument parsing to handle reading from a file in the quickest
way. As written, this fails to handle multiple config files, and special table
names, which need to support arbitrary, logical lines, with quotes surrounding
newlines or other special chars. As written, the --config file is parsed
*after* all other arguments, so it could override previous args (like
--no-blobs --no-blogs, --file, --format, --compress, --lock-wait), which I
guess is bad, so the config file should be processed *during* argument parsing.
Unfortunately, I think that suggests duplicating parsing of all/most the
argument parsing for config file support - I'd be happy if someone suggested a
better way.
BTW, in your most recent patch:
s/empty rows/empty lines/
unbalanced parens: "invalid option type (use [+-]"
should be fixed now, thank you for checking
Regards
Pavel
@cfbot: I renamed the patch so please ignore it.
--
Justin
Attachment
BTW, in your most recent patch:
s/empty rows/empty lines/
unbalanced parens: "invalid option type (use [+-]"

should be fixed now, thank you for checking
minor update - fixed handling of processing names with double quotes inside
Regards
Pavel
@cfbot: I renamed the patch so please ignore it.
--
Justin
Attachment
On Thu, 19 Nov 2020 at 19:57, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> minor update - fixed handling of processing names with double quotes inside
>

I see this is marked RFC, but reading the thread it doesn't feel like we have reached consensus on the design for this feature.

I agree that being able to configure pg_dump via a config file would be very useful, but the syntax proposed here feels much more like a hacked-up syntax designed to meet this one use case, rather than a good general-purpose design that can be easily extended.

IMO, a pg_dump config file should be able to specify all options currently supported through the command line, and vice versa (subject to command line length limits), with a single common code path for handling options. That way, any new options we add will work on the command line and in config files. Likewise, the user should only need to learn one set of options, and have the choice of specifying them on the command line or in a config file (or a mix of both).

I can imagine eventually supporting multiple different file formats, each just being a different representation of the same data, so perhaps this could work with 2 new options:

--option-file-format=plain|yaml|json|...
--option-file=filename

with "plain" being the default initial implementation, which might be something like our current postgresql.conf file format.

Also, I think we should allow multiple "--option-file" arguments (e.g., to list different object types in different files), and for a config file to contain its own "--option-file" arguments, to allow config files to include other config files.

The current design feels far too limited to me, and requires new code and new syntax to be added each time we extend it, so I'm -1 on this patch as it stands.

Regards,
Dean
Hi
st 25. 11. 2020 v 19:25 odesílatel Dean Rasheed <dean.a.rasheed@gmail.com> napsal:
On Thu, 19 Nov 2020 at 19:57, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> minor update - fixed handling of processing names with double quotes inside
>
I see this is marked RFC, but reading the thread it doesn't feel like
we have reached consensus on the design for this feature.
I agree that being able to configure pg_dump via a config file would
be very useful, but the syntax proposed here feels much more like a
hacked-up syntax designed to meet this one use case, rather than a
good general-purpose design that can be easily extended.
But I don't understand why. What is the use case? What is the benefit over the command line, or libpq variables? And why should config files be a better solution for the limited length of the command line, when I need to dump thousands of exactly specified tables?
Regards
Pavel
IMO, a pg_dump config file should be able to specify all options
currently supported through the command line, and vice versa (subject
to command line length limits), with a single common code path for
handling options. That way, any new options we add will work on the
command line and in config files. Likewise, the user should only need
to learn one set of options, and have the choice of specifying them on
the command line or in a config file (or a mix of both).
I can imagine eventually supporting multiple different file formats,
each just being a different representation of the same data, so
perhaps this could work with 2 new options:
--option-file-format=plain|yaml|json|...
--option-file=filename
with "plain" being the default initial implementation, which might be
something like our current postgresql.conf file format.
Also, I think we should allow multiple "--option-file" arguments
(e.g., to list different object types in different files), and for a
config file to contain its own "--option-file" arguments, to allow
config files to include other config files.
The current design feels far too limited to me, and requires new code
and new syntax to be added each time we extend it, so I'm -1 on this
patch as it stands.
This new syntax tries to be consistent and simple. It really doesn't try to implement an alternative configuration file for pg_dump. The code is simple and can be easily extended.
What are the benefits of supporting multiple formats?
Regards,
Dean
Pavel Stehule <pavel.stehule@gmail.com> writes:
> st 25. 11. 2020 v 19:25 odesílatel Dean Rasheed <dean.a.rasheed@gmail.com>
> napsal:
>> I agree that being able to configure pg_dump via a config file would
>> be very useful, but the syntax proposed here feels much more like a
>> hacked-up syntax designed to meet this one use case, rather than a
>> good general-purpose design that can be easily extended.

> But I don't understand why. What is the use case? What is the benefit over
> the command line, or libpq variables? And why should config files be a
> better solution for the limited length of the command line, when I need to
> dump thousands of exactly specified tables?

Because next week somebody will want to dump thousands of functions selected by name, or schemas selected by name, etc etc. I agree with the position that we don't want a single-purpose solution. The idea that the syntax should match the command line switch syntax seems reasonable, though I'm not wedded to it. (One thing to consider is how painful will it be for people to quote table names containing funny characters, for instance. On the command line, we largely depend on the shell's quoting behavior to solve that, but we'd not have that infrastructure when reading from a file.)

> What are the benefits of supporting multiple formats?

Yeah, that part of Dean's sketch seemed like overkill to me too. I wasn't very excited about multiple switch files either, though depending on how the implementation is done, that could be simple enough to be in the might-as-well category.

One other point that I'm wondering about is that there's really no value in doing anything here until you get to some thousands of table names; as long as the list fits in the shell's command line length limit, you might as well just make a shell script file. Does pg_dump really have sane performance for that situation, or are we soon going to be fielding requests to make it not be O(N^2) in the number of listed tables?

regards, tom lane
st 25. 11. 2020 v 21:00 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> st 25. 11. 2020 v 19:25 odesílatel Dean Rasheed <dean.a.rasheed@gmail.com>
> napsal:
>> I agree that being able to configure pg_dump via a config file would
>> be very useful, but the syntax proposed here feels much more like a
>> hacked-up syntax designed to meet this one use case, rather than a
>> good general-purpose design that can be easily extended.
> But I don't understand why. What is the use case? What is the benefit over
> the command line, or libpq variables? And why should config files be a
> better solution for the limited length of the command line, when I need to
> dump thousands of exactly specified tables?
Because next week somebody will want to dump thousands of functions
selected by name, or schemas selected by name, etc etc. I agree with
the position that we don't want a single-purpose solution. The idea
that the syntax should match the command line switch syntax seems
reasonable, though I'm not wedded to it. (One thing to consider is
how painful will it be for people to quote table names containing
funny characters, for instance. On the command line, we largely
depend on the shell's quoting behavior to solve that, but we'd not
have that infrastructure when reading from a file.)
This is not a problem with the current patch - the last version of this patch supports obscure names well.
There was a requirement for supporting all current and future pg_dump options - ok, it can make sense. I have no problem with using a line format instead:
"option argument" or "long-option=argument"
This format - can it be a solution? I'll try to rewrite the parser for this format.
It is implementable, but it is in collision with Stephen's requirement for a human-readable format designed for hand writing. These requests have no intersection. A well-readable format needs a more complex parser. And machine generation in this format needs more work - generating a flat file is simpler and more robust than generating JSON or YAML.
> What are the benefits of supporting multiple formats?
Yeah, that part of Dean's sketch seemed like overkill to me too.
I wasn't very excited about multiple switch files either, though
depending on how the implementation is done, that could be simple
enough to be in the might-as-well category.
One other point that I'm wondering about is that there's really no
value in doing anything here until you get to some thousands of
table names; as long as the list fits in the shell's command line
length limit, you might as well just make a shell script file.
Does pg_dump really have sane performance for that situation, or
are we soon going to be fielding requests to make it not be O(N^2)
in the number of listed tables?
Performance is another factor, but the command line limit can easily be hit when table names have maximum width.
regards, tom lane
On Thu, 26 Nov 2020 at 06:43, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> st 25. 11. 2020 v 21:00 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>>
>> (One thing to consider is
>> how painful will it be for people to quote table names containing
>> funny characters, for instance. On the command line, we largely
>> depend on the shell's quoting behavior to solve that, but we'd not
>> have that infrastructure when reading from a file.)
>
> This is not a problem with the current patch - the last version of this patch supports obscure names well.
>

Actually, that raises a different possible benefit of passing options in an options file -- if the user wants to pass in a table name pattern, it can be a nuisance if the shell's argument processing does additional unwanted things like globbing and environment variable substitutions. Using an options file could provide a handy way to ensure that any option values are interpreted exactly as written, without any additional mangling.

> There was a requirement for supporting all current and future pg_dump options - ok, it can make sense. I have no problem with using a line format instead:
>
> "option argument" or "long-option=argument"
>
> This format - can it be a solution? I'll try to rewrite the parser for this format.

Yes, that's the sort of thing I was thinking of, to make the feature more general-purpose.

> It is implementable, but it is in collision with Stephen's requirement for a human-readable format designed for hand writing. These requests have no intersection. A well-readable format needs a more complex parser. And machine generation in this format needs more work - generating a flat file is simpler and more robust than generating JSON or YAML.
>

To be clear, I wasn't suggesting that this patch implement multiple formats. Just the "plain" format above. Other formats like YAML might well be more human-readable, and be able to take advantage of values that are lists to avoid repeating option names, and they would have the advantage of being readable and writable by other standard tools, which might be useful. But I think such things would be for the future. Maybe no one will ever add support for other formats, or maybe someone will just write a separate external tool to convert YAML or JSON to our plain format. I don't know. But supporting all pg_dump options makes more things possible.

>> I wasn't very excited about multiple switch files either, though
>> depending on how the implementation is done, that could be simple
>> enough to be in the might-as-well category.
>>

That's what I was hoping.

Regards,
Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> Actually, that raises a different possible benefit of passing options
> in an options file -- if the user wants to pass in a table name
> pattern, it can be a nuisance if the shell's argument processing does
> additional unwanted things like globbing and environment variable
> substitutions. Using an options file could provide a handy way to
> ensure that any option values are interpreted exactly as written,
> without any additional mangling.

Huh? Any format we might devise, or borrow, will have to have some kind of escaping/quoting convention. The idea that "we don't need that" tends to lead to very ugly workarounds later.

I do agree that the shell's quoting conventions are pretty messy and so those aren't the ones we should borrow. We could do a lot worse than to use some established data format like JSON or YAML. Given that we already have src/common/jsonapi.c, it seems like JSON would be the better choice of those two.

regards, tom lane
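For illustration only (an editor's sketch; no patch in this thread proposes this exact layout), a JSON-based filter could spell the hard cases out explicitly:

{
  "include": {
    "tables": ["mytable", " mytable", "table\nwith a newline"],
    "schemas": ["myschema"]
  },
  "exclude": {
    "tables": ["bigtable"]
  }
}

The leading-space and embedded-newline names raised up-thread are unambiguous here, because JSON strings come with a fixed escaping convention.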
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> > Actually, that raises a different possible benefit of passing options
> > in an options file -- if the user wants to pass in a table name
> > pattern, it can be a nuisance if the shell's argument processing does
> > additional unwanted things like globbing and environment variable
> > substitutions. Using an options file could provide a handy way to
> > ensure that any option values are interpreted exactly as written,
> > without any additional mangling.
>
> Huh? Any format we might devise, or borrow, will have to have some
> kind of escaping/quoting convention. The idea that "we don't need
> that" tends to lead to very ugly workarounds later.

Agreed.

> I do agree that the shell's quoting conventions are pretty messy
> and so those aren't the ones we should borrow. We could do a lot
> worse than to use some established data format like JSON or YAML.
> Given that we already have src/common/jsonapi.c, it seems like
> JSON would be the better choice of those two.

JSON doesn't support comments, something that's really useful to have in configuration files, so I don't agree that it's a sensible thing to use in this case. JSON also isn't very forgiving, which is also unfortunate and makes for a poor choice.

This is why I was suggesting TOML up-thread, which is MIT licensed, has been around for a number of years, supports comments, has sensible quoting that's easier to deal with than the shell, and has a C (C99) implementation. It's also used in quite a few other projects. In a quick look, I suspect it might also be something that could be used to replace our existing hand-hacked postgresql.conf parser and give us the ability to handle things a bit cleaner there too...

Thanks,

Stephen
Attachment
Greetings,

* Pavel Stehule (pavel.stehule@gmail.com) wrote:
> > I agree that being able to configure pg_dump via a config file would
> > be very useful, but the syntax proposed here feels much more like a
> > hacked-up syntax designed to meet this one use case, rather than a
> > good general-purpose design that can be easily extended.
>
> Nobody sent a real use case for introducing the config file. There was a
> discussion about formats, and you introduce other dimensions and
> variability.

I'm a bit baffled by this because it seems abundantly clear to me that being able to have a config file for pg_dump would be extremely helpful. There's no shortage of times that I've had to hack up a shell script and figure out quoting and set up the right set of options for pg_dump, resulting in things like:

pg_dump \
  --host=myserver.com \
  --username=postgres \
  --schema=public \
  --schema=myschema \
  --no-comments \
  --no-tablespaces \
  --file=somedir \
  --format=d \
  --jobs=5

which really is pretty grotty. Being able to have a config file that has proper comments would be much better and we could start to extend to things like "please export schema A to directory A, schema B to directory B" and other ways of selecting source and destination, and imagine if we could validate it too, eg:

pg_dump --config=whatever --dry-run

or --check-config maybe.

This isn't a new concept either- export and import tools for other databases have similar support, eg: Oracle's imp/exp tool, mysqldump (see: https://dev.mysql.com/doc/refman/8.0/en/option-files.html which has a TOML-looking format too), pgloader of course has a config file, etc. We certainly aren't in novel territory here.

Thanks,

Stephen
Attachment
Hi
st 25. 11. 2020 v 21:00 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
[...]
Here is a fresh implementation. I used the name of a new option - "options-file". It looks more accurate than "config", but the name can be changed easily anytime.
Any short or long option can be read from this file in a simple format - one option per line. Arguments inside double quotes can span multiple lines. Line comments start with # and can be used anywhere.
The implementation is very generic - supporting new options doesn't require changing this new code. The parser ignores white space almost everywhere it makes sense.
The option should start with "-" or "--" in the options file too, because this is necessary to detect whether the option is short or long.
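For example (an illustrative file following the rules just described; the option values are made up), an options file could look like this:

# dump two schemas in directory format
--format=d
--file=somedir        # a comment may also follow an option
-n public
--schema=myschema
-t "some table
with a newline in its name"

It would then be used as pg_dump --options-file=my.opts mydb.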
Regards
Pavel
Attachment
pá 27. 11. 2020 v 19:45 odesílatel Stephen Frost <sfrost@snowman.net> napsal:
Greetings,
* Pavel Stehule (pavel.stehule@gmail.com) wrote:
> > I agree that being able to configure pg_dump via a config file would
> > be very useful, but the syntax proposed here feels much more like a
> > hacked-up syntax designed to meet this one use case, rather than a
> > good general-purpose design that can be easily extended.
>
> Nobody sent a real use case for introducing the config file. There was a
> discussion about formats, and you introduce other dimensions and
> variability.
I'm a bit baffled by this because it seems abundantly clear to me that
being able to have a config file for pg_dump would be extremely helpful.
There's no shortage of times that I've had to hack up a shell script and
figure out quoting and set up the right set of options for pg_dump,
resulting in things like:
pg_dump \
--host=myserver.com \
--username=postgres \
--schema=public \
--schema=myschema \
--no-comments \
--no-tablespaces \
--file=somedir \
--format=d \
--jobs=5
which really is pretty grotty. Being able to have a config file that
has proper comments would be much better and we could start to extend to
things like "please export schema A to directory A, schema B to
directory B" and other ways of selecting source and destination, and
imagine if we could validate it too, eg:
pg_dump --config=whatever --dry-run
or --check-config maybe.
This isn't a new concept either- export and import tools for other
databases have similar support, eg: Oracle's imp/exp tool, mysqldump
(see: https://dev.mysql.com/doc/refman/8.0/en/option-files.html which
has a TOML-looking format too), pgloader of course has a config file,
etc. We certainly aren't in novel territory here
Still, I am not a fan of this. pg_dump is a simple tool for simple purposes. It is not pgloader or any ETL tool. It can be changed in the future, maybe, but still, why? And at any time there will be a question whether pg_dump is a good foundation for a massive enhancement in the ETL direction. Development in C is expensive and pg_dump is too Postgres specific, so I cannot imagine pg_dump being used for some complex tasks directly, with requirements for special configuration. When we have pgloader, then we don't need to move pg_dump in the pgloader direction.
Anyway - the new patch allows storing any options (one per line), with possible comments (anywhere in a line), and arguments can span multiple lines. It doesn't have any additional memory or CPU requirements.
Regards
Pavel
Thanks,
Stephen
On Sat, Nov 28, 2020 at 09:14:35PM +0100, Pavel Stehule wrote:
> Any short or long option can be read from this file in simple format - one
> option per line. Arguments inside double quotes can be multi lined. Row
> comments started by # and can be used everywhere.

Does this support even funkier table names ?

This tests a large number and fraction of characters in dbname/username, so all of pg_dump has to continue supporting that:
./src/bin/pg_dump/t/010_dump_connstr.pl

I tested and it seems to work with -t "fooå"
But it didn't work with -t "foo\nbar" (literal newline). Fix attached.

If you send another patch, please consider including a test case for quoted names in long and short options.

> +static char *optsfilename = NULL;

> + * It assign the values of options to related DumpOption fields or to
> + * some global values. It is called from twice. First, for processing
> + * the command line argumens. Second, for processing an options from
> + * options file.

This didn't support multiple config files, nor config files which include config files, as Dean and I mentioned. I think the argument parsers should themselves call the config file parser, as need be, so the last option specification should override previous ones.

For example pg_dump --config-file=./pg_dump.conf --blobs should have blobs even if the config file says --no-blobs. (Command-line arguments normally take precedence over config files, certainly if the argument is specified "later"). I think it'd be ok if it's recursive. I made a quick hack to do that.

I doubt this will satisfy Stephen. Personally, I would use this if it were a plain and simple text config file (which for our purposes I would pass on stdin), and I would almost certainly not use it if it were json. But it'd be swell if there were a standard config file format, that handled postgresql.conf and maybe pg_hba.conf.

--
Justin
Attachment
Hi
ne 29. 11. 2020 v 0:49 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Sat, Nov 28, 2020 at 09:14:35PM +0100, Pavel Stehule wrote:
> Any short or long option can be read from this file in simple format - one
> option per line. Arguments inside double quotes can be multi lined. Row
> comments started by # and can be used everywhere.
Here is an updated patch.
Does this support even funkier table names ?
This tests a large number and fraction of characters in dbname/username, so all
of pg_dump has to continue supporting that:
./src/bin/pg_dump/t/010_dump_connstr.pl
I tested and it seems to work with -t "fooå"
But it didn't work with -t "foo\nbar" (literal newline). Fix attached.
If you send another patch, please consider including a test case for quoted
names in long and short options.
I implemented some basic backslash escaping. I will write more tests once there is good agreement on the main concept.
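Assuming the new escaping maps \n inside a double-quoted name to a newline (an illustration, not verified against the patch), Justin's problem name could then be written on a single physical line:

echo '+t "foo\nbar"' | pg_dump --filter=/dev/stdin

The single quotes make the shell pass the backslash through literally, so the filter parser sees and expands the escape itself.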
> +static char *optsfilename = NULL;
> + * It assign the values of options to related DumpOption fields or to
> + * some global values. It is called from twice. First, for processing
> + * the command line argumens. Second, for processing an options from
> + * options file.
This didn't support multiple config files, nor config files which include
config files, as Dean and I mentioned. I think the argument parsers should
themselves call the config file parser, as need be, so the last option
specification should override previous ones.
For example pg_dump --config-file=./pg_dump.conf --blobs should have blobs even
if the config file says --no-blobs. (Command-line arguments normally take
precedence over config files, certainly if the argument is specified "later").
I think it'd be ok if it's recursive. I made a quick hack to do that.
I did it. I used a different design than you. Making "dopt" a global variable looks too invasive. Almost all functions there expect "dopt" as an argument. But I think it is not necessary.
I implemented two iterations of argument processing: first for options-file options (multiple options-file options are allowed, and nesting is allowed too), second for all other arguments from the command line. Any options file is processed only once - a second processing is ignored. So there is no problem with cycles.
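To illustrate the nesting (hypothetical file names and contents):

# main.opts
--options-file=tables.opts    # nested file; each file is processed only once
--format=custom
--file=out.dump

# tables.opts
-t table1
-t table2

pg_dump --options-file=main.opts mydb

Since tables.opts is remembered after the first read, a second reference to it (directly or through a cycle) would be ignored.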
The name of the new option - "config-file" or "options-file"? I prefer "options-file". "config-file" is valid too, but "options-file" is more specific and more descriptive (it is self-descriptive).
I merged your patch with a fix of typos.
Regards
Pavel
I doubt this will satisfy Stephen. Personally, I would use this if it were a
plain and simple text config file (which for our purposes I would pass on
stdin), and I would almost certainly not use it if it were json. But it'd be
swell if there were a standard config file format, that handled postgresql.conf
and maybe pg_hba.conf.
--
Justin
Attachment
Hi
rebase
Regards
Pavel
Attachment
Hi
út 16. 2. 2021 v 20:32 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
Hi
rebase
fresh rebase
Regards
Pavel
Regards
Pavel
Attachment
Hi
ne 11. 4. 2021 v 9:48 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
rebase
fresh rebase

Regards
Pavel
Attachment
Hi,

I started looking at the patch allowing to export just functions [1], and I got pointed to this patch as an alternative approach (to adding a separate filtering option for every possible object type). I'm familiar with the customer that inspired Pavel to start working on this, so I understand the use case he's trying to address - a flexible way to filter (include/exclude) large numbers of objects.

IMHO it's a mistake to try to broaden the scope of the patch and require implementing some universal pg_dump config file, particularly if it requires "complex" structure or formats like JSON, TOML or whatever. Maybe that's worth doing, but in my mind it's orthogonal to what this patch aims (or aimed) to do - filtering objects using rules in a file, not on the command line. I believe it's much closer to .gitignore or rsync --filter than to a full config file. Even if we end up implementing the pg_dump config file, it'd be nice to keep the filter rules in a separate file and just reference that file from the config file.

That also means I find it pointless to use an "advanced" format like JSON or TOML - I think the format should be as simple as possible. Yes, it has to support all valid identifiers, comments and so on. But I don't quite see a point in using JSON or a similar "full" format. If a simple format is good enough for rsync or gitignore, why should we insist on using something more complex?

OTOH I don't quite like the current approach of simply reading options from a file, because that requires adding new command-line options for each type of object we want to support. Which seems to contradict the idea of a "general filter" method as mentioned in [1].

So if it was up to me, I'd go back to the original format or something close to it. So something like this:

[+-] OBJECT_TYPE_PATTERN OBJECT_NAME_PATTERN

regards

[1] https://commitfest.postgresql.org/33/3051/

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
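As an illustration of that rule format (an editor's sketch with made-up names):

# one rule per line: [+-] OBJECT_TYPE_PATTERN OBJECT_NAME_PATTERN
+ table public.orders_*
- table public.orders_archive
+ function reporting.refresh_*

Like .gitignore or rsync --filter rules, such a file stays trivially greppable and easy to generate from a query.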
> On 10 Jul 2021, at 17:47, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

> So if it was up to me, I'd go back to the original format or something close to it. So something like this:
>
> [+-] OBJECT_TYPE_PATTERN OBJECT_NAME_PATTERN

That still leaves the parsing with quoting and escaping that needs to be done less trivial and more bespoke than what meets the eye, no?

As mentioned upthread, I'm still hesitant to add a file format which doesn't have any version information of sorts for distinguishing it from when the inevitable "now wouldn't it be nice if we could do this too" patch which we all know will come. The amount of selectivity switches we have for pg_dump is an indication about just how much control users like; this will no doubt be subject to the same.

--
Daniel Gustafsson https://vmware.com/
On 7/13/21 12:08 AM, Daniel Gustafsson wrote:
>> On 10 Jul 2021, at 17:47, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
>> So if it was up to me, I'd go back to the original format or something close to it. So something like this:
>>
>> [+-] OBJECT_TYPE_PATTERN OBJECT_NAME_PATTERN
>
> That still leaves the parsing with quoting and escaping that needs to be done
> less trivial and more bespoke than what meets the eye, no?
>

Yes, it'd require proper escaping/quoting of the fields/identifiers etc.

> As mentioned upthread, I'm still hesitant to add a file format which doesn't
> have any version information of sorts for distinguishing it from when the
> inevitable "now wouldn't it be nice if we could do this too" patch which we all
> know will come. The amount of selectivity switches we have for pg_dump is an
> indication about just how much control users like; this will no doubt be
> subject to the same.
>

I'm not going to fight against some sort of versioning, but I think keeping the scope as narrow as possible would make it unnecessary. That is, let's stick to the original goal to allow passing filtering rules that would not fit on the command-line, and maybe let's make it a bit more flexible to support other object types etc.

IMHO the filtering rules are simple enough to not really need elaborate versioning, and if a more advanced rule is proposed in the future it can be supported in the existing format (extra field, ...). Of course, maybe my imagination is not wild enough.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2021-Jul-13, Tomas Vondra wrote:

> I'm not going to fight against some sort of versioning, but I think keeping
> the scope as narrow as possible would make it unnecessary. That is, let's
> stick to the original goal to allow passing filtering rules that would not
> fit on the command-line, and maybe let's make it a bit more flexible to
> support other object types etc.
>
> IMHO the filtering rules are simple enough to not really need elaborate
> versioning, and if a more advanced rule is proposed in the future it can be
> supported in the existing format (extra field, ...).

I don't understand why is versioning needed for this file. Surely we can just define some line-based grammar that's accepted by the current pg_dump[1] and that would satisfy the current need as well as allowing for extending the grammar in the future; even JSON or Windows-INI format (ugh?) if that's necessary to tailor the output file in some other way not covered by that.

[1] your proposal of "[+-] OBJTYPE OBJIDENT" plus empty lines allowed plus lines starting with # are comments, seems plenty. Any line not following that format would cause an error to be thrown.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> [1] your proposal of "[+-] OBJTYPE OBJIDENT" plus empty lines allowed
> plus lines starting with # are comments, seems plenty. Any line not
> following that format would cause an error to be thrown.

I'd like to see some kind of keyword on each line, so that we could extend
the command set by adding new keywords. As this stands, I fear we'd end
up using random punctuation characters in place of [+-], which seems
pretty horrid from a readability standpoint.

I think that this file format should be designed with an eye to allowing
every, or at least most, pg_dump options to be written in the file rather
than on the command line. I don't say we have to *implement* that right
now; but if the format spec is incapable of being extended to meet
requests like that one, I think we'll regret it. This line of thought
suggests that the initial commands ought to match the existing
include/exclude switches, at least approximately.

Hence I suggest

	include table PATTERN
	exclude table PATTERN

which ends up being the above but with words not [+-].

			regards, tom lane
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> [1] your proposal of "[+-] OBJTYPE OBJIDENT" plus empty lines allowed
>> plus lines starting with # are comments, seems plenty. Any line not
>> following that format would cause an error to be thrown.
>
> I'd like to see some kind of keyword on each line, so that we could extend
> the command set by adding new keywords. As this stands, I fear we'd end
> up using random punctuation characters in place of [+-], which seems
> pretty horrid from a readability standpoint.

I agree that it'd end up being bad with single characters.

> I think that this file format should be designed with an eye to allowing
> every, or at least most, pg_dump options to be written in the file rather
> than on the command line. I don't say we have to *implement* that right
> now; but if the format spec is incapable of being extended to meet
> requests like that one, I think we'll regret it. This line of thought
> suggests that the initial commands ought to match the existing
> include/exclude switches, at least approximately.

I agree that we want to have an actual config file that allows just about every pg_dump option. I'm also fine with saying that we don't have to implement that initially, but the format should be one which can be extended to allow that.

> Hence I suggest
>
> 	include table PATTERN
> 	exclude table PATTERN
>
> which ends up being the above but with words not [+-].

Which ends up inventing yet-another-file-format which people will end up writing generators and parsers for. Which is exactly what I was arguing we really should be trying to avoid doing.

I definitely feel that we should have a way to allow anything that can be created as an object in the database to be explicitly included in the file, and that means whatever we do needs to be able to handle objects that have names that span multiple lines, etc. It's not clear how the above would. As I recall, the proposed patch didn't have anything for handling that, which was one of the issues I had with it and is why I bring it up again.

Thanks,

Stephen
Attachment
On 7/13/21 3:40 PM, Stephen Frost wrote:
> Greetings,
>
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> [1] your proposal of "[+-] OBJTYPE OBJIDENT" plus empty lines allowed
>>> plus lines starting with # are comments, seems plenty. Any line not
>>> following that format would cause an error to be thrown.
>>
>> I'd like to see some kind of keyword on each line, so that we could extend
>> the command set by adding new keywords. As this stands, I fear we'd end
>> up using random punctuation characters in place of [+-], which seems
>> pretty horrid from a readability standpoint.
>
> I agree that it'd end up being bad with single characters.

The [+-] format is based on what rsync does, so there's at least some precedent for that, and IMHO it's fairly readable. I agree the rest of the rule (object type, ...) may be a bit more verbose.

>> I think that this file format should be designed with an eye to allowing
>> every, or at least most, pg_dump options to be written in the file rather
>> than on the command line. I don't say we have to *implement* that right
>> now; but if the format spec is incapable of being extended to meet
>> requests like that one, I think we'll regret it. This line of thought
>> suggests that the initial commands ought to match the existing
>> include/exclude switches, at least approximately.
>
> I agree that we want to have an actual config file that allows just
> about every pg_dump option. I'm also fine with saying that we don't
> have to implement that initially but the format should be one which can
> be extended to allow that.

I understand the desire to have a config file that may contain all pg_dump options, but I really don't see why we'd want to mix that with the file containing filter rules. I think those should be separate, one of the reasons being that I find it desirable to be able to "include" the filter rules into different pg_dump configs. That also means the format for the filter rules can be much simpler. It's also not clear to me whether the single-file approach would allow filtering not supported by an actual pg_dump option, for example.

>> Hence I suggest
>>
>> 	include table PATTERN
>> 	exclude table PATTERN
>>
>> which ends up being the above but with words not [+-].

Works for me.

> Which ends up inventing yet-another-file-format which people will end up
> writing generators and parsers for. Which is exactly what I was arguing
> we really should be trying to avoid doing.

People will have to write generators *in any case*, because how else would you use this? Unless we also provide tools to manipulate that file (which seems rather futile), they'll have to do that. Even if we used JSON/YAML/TOML/... they'd still need to deal with the semantics of the file format.

FWIW I don't understand why they would need to write parsers. That's something we'd need to do to process the file. I think the case when the filter file needs to be modified is rather rare - it certainly is not what the original use case Pavel tried to address needs. (I know that customer, and the filter would be generated and used for a single dump.)

My opinion is that the best solution (to make both generators and parsers simple) is to keep the format itself as simple as possible. Which is exactly why I'm arguing for only addressing the filtering, not trying to invent a "universal" pg_dump config file format.

> I definitely feel that we should have a way to allow anything that can
> be created as an object in the database to be explicitly included in the
> file and that means whatever we do need to be able to handle objects
> that have names that span multiple lines, etc. It's not clear how the
> above would. As I recall, the proposed patch didn't have anything for
> handling that, which was one of the issues I had with it and is why I
> bring it up again.

I really don't understand why you think the current format can't do escaping/quoting or handle names spanning multiple lines. The fact that the original patch did not handle that correctly is a bug, but it does not mean the format can't handle that.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On 13 Jul 2021, at 18:14, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

> FWIW I don't understand why would they need to write parsers.

It's quite common to write unit tests for VM recipes/playbooks when using tools like Chef etc; parsing and checking the installed/generated files is part of that. This would be one very real use case for writing a parser.

> I think the case when the filter file needs to be modified is rather rare - it certainly is not what the original use case Pavel tried to address needs. (I know that customer and the filter would be generated and used for a single dump.)

I'm not convinced that basing design decisions on a single customer reference who only wants to use the code once is helpful. I hear what you're saying, but I think this will see more diverse use cases than what we can foresee here.

--
Daniel Gustafsson		https://vmware.com/
> On 13 Jul 2021, at 00:59, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2021-Jul-13, Tomas Vondra wrote:
>
>> I'm not going to fight against some sort of versioning, but I think keeping
>> the scope as narrow as possible would make it unnecessary. That is, let's
>> stick to the original goal to allow passing filtering rules that would not
>> fit on the command-line, and maybe let's make it a bit more flexible to
>> support other object types etc.
>>
>> IMHO the filtering rules are simple enough to not really need elaborate
>> versioning, and if a more advanced rule is proposed in the future it can be
>> supported in the existing format (extra field, ...).
>
> I don't understand why versioning is needed for this file. Surely we
> can just define some line-based grammar that's accepted by the current
> pg_dump[1] and that would satisfy the current need as well as allowing
> for extending the grammar in the future; even JSON or Windows-INI format
> (ugh?) if that's necessary to tailor the output file in some other way
> not covered by that.

I wasn't expressing myself very well; by "versioning" I mean a way to be able to add to/change/fix the format and still be able to deterministically parse it without having to resort to ugly heuristics and hacks. If that's achieved by an explicit version number, or if it's an inherent characteristic of the format, doesn't really matter (to me). My worry is that the very simple proposed format might not fit that bill, but since I don't know what the future of the feature might bring it's (mostly) a gut feeling.

--
Daniel Gustafsson		https://vmware.com/
Greetings,
On Tue, Jul 13, 2021 at 16:44 Daniel Gustafsson <daniel@yesql.se> wrote:
> On 13 Jul 2021, at 18:14, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> FWIW I don't understand why would they need to write parsers.
It's quite common to write unit tests for VM recipes/playbooks when using
tools like Chef etc, parsing and checking the installed/generated files is part
of that. This would be one very real use case for writing a parser.
Consider pgAdmin and the many other tools which essentially embed pg_dump and pg_restore. There’s no shortage of use cases for a variety of tools to be able to understand, read, parse, generate, rewrite, and probably do more, with such a pg_dump/restore config file.
> I think the case when the filter file needs to be modified is rather rare - it certainly is not what the original use case Pavel tried to address needs. (I know that customer and the filter would be generated and used for a single dump.)
I'm not convinced that basing design decisions on a single customer reference
who only want to use the code once is helpful.
Agreed.
Thanks,
Stephen
On 7/13/21 10:55 PM, Stephen Frost wrote:
> Greetings,
>
> On Tue, Jul 13, 2021 at 16:44 Daniel Gustafsson <daniel@yesql.se
> <mailto:daniel@yesql.se>> wrote:
>
>     > On 13 Jul 2021, at 18:14, Tomas Vondra
>     <tomas.vondra@enterprisedb.com
>     <mailto:tomas.vondra@enterprisedb.com>> wrote:
>
>     > FWIW I don't understand why would they need to write parsers.
>
>     It's quite common to write unit tests for VM recipes/playbooks when
>     using tools like Chef etc, parsing and checking the installed/generated
>     files is part of that. This would be one very real use case for writing a parser.
>
> Consider pgAdmin and the many other tools which essentially embed
> pg_dump and pg_restore. There’s no shortage of use cases for a variety
> of tools to be able to understand, read, parse, generate, rewrite, and
> probably do more, with such a pg_dump/restore config file.

Sure. Which is why I'm advocating for the simplest possible format (and not expanding the scope of this patch beyond filtering), because that makes this kind of processing simpler.

>     > I think the case when the filter file needs to be modified is
>     rather rare - it certainly is not what the original use case Pavel
>     tried to address needs. (I know that customer and the filter would
>     be generated and used for a single dump.)
>
>     I'm not convinced that basing design decisions on a single customer
>     reference who only wants to use the code once is helpful.
>
> Agreed.

I wasn't really basing this on a single customer - that was merely an example, of course. FWIW Justin Pryzby already stated that having to use some more complex format would likely mean they would not use the feature, so that's another data point to consider.

FWIW I believe it's clear what my opinions on this topic are. Repeating them seems a bit pointless, so I'll step aside and let this thread move forward in whatever direction.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Greetings,

* Tomas Vondra (tomas.vondra@enterprisedb.com) wrote:
> On 7/13/21 10:55 PM, Stephen Frost wrote:
>> On Tue, Jul 13, 2021 at 16:44 Daniel Gustafsson <daniel@yesql.se
>> <mailto:daniel@yesql.se>> wrote:
>>
>>     > On 13 Jul 2021, at 18:14, Tomas Vondra
>>     <tomas.vondra@enterprisedb.com
>>     <mailto:tomas.vondra@enterprisedb.com>> wrote:
>>
>>     > FWIW I don't understand why would they need to write parsers.
>>
>>     It's quite common to write unit tests for VM recipes/playbooks when
>>     using tools like Chef etc, parsing and checking the installed/generated
>>     files is part of that. This would be one very real use case for writing a parser.
>>
>> Consider pgAdmin and the many other tools which essentially embed pg_dump
>> and pg_restore. There’s no shortage of use cases for a variety of tools
>> to be able to understand, read, parse, generate, rewrite, and probably do
>> more, with such a pg_dump/restore config file.
>
> Sure. Which is why I'm advocating for the simplest possible format (and not
> expanding the scope of this patch beyond filtering), because that makes this
> kind of processing simpler.

The simplest possible format isn't going to work with all the different
pg_dump options, and it still isn't going to be 'simple' since it needs
to work with the flexibility that we have in what we support for object
names, and it is still going to require people to write a new parser and
generator for it instead of using something existing.

I don't know that the options that I suggested previously would
definitely work or not, but they at least would allow other projects like
pgAdmin to leverage existing code for parsing and generating these
config files.

I'm not completely against inventing something new, but I'd really
prefer that we at least try to make something existing work first
before inventing something new that everyone is going to have to deal
with. If we do invent a new thing for $reasons, then we should really
look at what exists today and try to design it properly instead of just
throwing something together and formally documenting it, because it's
absolutely going to become a standard of sorts that people are going to
almost immediately write their own parsers/generators in various
languages for.

Thanks,

Stephen
Attachment
On 2021-Jul-13, Stephen Frost wrote:

> The simplest possible format isn't going to work with all the different
> pg_dump options and it still isn't going to be 'simple' since it needs
> to work with the flexibility that we have in what we support for object
> names,

That's fine. If people want a mechanism that allows changing the other
pg_dump options that are not related to object filtering, they can
implement a configuration file for that.

> and is still going to require people write a new parser and
> generator for it instead of using something existing.

Sure. That's not part of this patch.

> I don't know that the options that I suggested previously would
> definitely work or not but they at least would allow other projects like
> pgAdmin to leverage existing code for parsing and generating these
> config files.

Keep in mind that this patch is not intended to help pgAdmin
specifically. It would be great if pgAdmin uses the functionality
implemented here, but if they decide not to, that's not terrible. They
have survived decades without a pg_dump configuration file; they still
can.

There are several votes in this thread for pg_dump to gain functionality
to filter objects based on a simple specification -- particularly one
that can be written using shell pipelines. This patch gives it.

> I'm not completely against inventing something new, but I'd really
> prefer that we at least try to make something existing work first
> before inventing something new that everyone is going to have to deal
> with.

That was discussed upthread and led nowhere.

--
Álvaro Herrera       39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Nunca se desea ardientemente lo que solo se desea por razón"
("One never ardently desires what one desires only by reason") (F. Alexandre)
Greetings,

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> On 2021-Jul-13, Stephen Frost wrote:
>> The simplest possible format isn't going to work with all the different
>> pg_dump options and it still isn't going to be 'simple' since it needs
>> to work with the flexibility that we have in what we support for object
>> names,
>
> That's fine. If people want a mechanism that allows changing the other
> pg_dump options that are not related to object filtering, they can
> implement a configuration file for that.

It's been said multiple times that people *do* want that and that they
want it to all be part of this one file, and specifically that they
don't want to end up with a file structure that actively works against
allowing other options to be added to it.

>> I don't know that the options that I suggested previously would
>> definitely work or not but they at least would allow other projects like
>> pgAdmin to leverage existing code for parsing and generating these
>> config files.
>
> Keep in mind that this patch is not intended to help pgAdmin
> specifically. It would be great if pgAdmin uses the functionality
> implemented here, but if they decide not to, that's not terrible. They
> have survived decades without a pg_dump configuration file; they still
> can.

The adding of a config file for pg_dump should specifically be looking
at pgAdmin as the exact use-case for having such a capability.

> There are several votes in this thread for pg_dump to gain functionality
> to filter objects based on a simple specification -- particularly one
> that can be written using shell pipelines. This patch gives it.

And several votes for having a config file that supports, or at least
can support in the future, the various options which pg_dump supports-
and active voices against having a new file format that doesn't allow
for that.

>> I'm not completely against inventing something new, but I'd really
>> prefer that we at least try to make something existing work first
>> before inventing something new that everyone is going to have to deal
>> with.
>
> That was discussed upthread and led nowhere.

You're right- no one followed up on that. Instead, one group continues
to push for 'simple' and to just accept what's been proposed, while
another group counters that we should be looking at the broader design
question and work towards a solution which will work for us down the
road, and not just right now.

One thing remains clear- there's no consensus here.

Thanks,

Stephen
Attachment
You're right- no one followed up on that. Instead, one group continues
to push for 'simple' and to just accept what's been proposed, while
another group counters that we should be looking at the broader design
question and work towards a solution which will work for us down the
road, and not just right now.
One thing remains clear- there's no consensus here.
I think there is some misunderstanding about the target of this patch, and I am afraid there cannot be consensus, because people are speaking about two very different features. It is not possible to push them into one thing; it cannot work, I am afraid.
1. The main target of this patch is to solve the problem with a too long command line for pg_dump when there are a lot of dumped objects. You need to call pg_dump only once to ensure the dump happens in one transaction. And sometimes it is not possible to use wildcards effectively, because the state of objects is in different databases. Increasing the command-line length limit is not safe, and there are other production issues.

In this case you need a very simple format, just because you want to use pg_dump in a pipe. This format should be line oriented, and usually it will contain just "dump this table, dump that table". Nothing else. Nobody will read this format, nobody will edit this format. Because the main platform for this format is probably the UNIX shell, the format should be simple. I really don't see any joy in generating JSON and parsing JSON later. These data will be processed locally. This is a single-purpose format, and it is not designed for holding configuration. For this purpose a complex format has no advantage. There is no problem with parsing JSON or other formats on the pg_dump side, but it is pretty hard to generate valid JSON from a bash script. For a UNIX shell we need the simplest possible format.

Theoretically this format (this file) can hold any pg_dump option, but for the usual streaming processing only the filter options will be there. Originally this feature had the name "filter file". There are a lot of examples of successful filter file formats in the UNIX world, and I think nobody doubts their sense and usability. Probably there is a consensus that filter files are not config files.
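For example, a minimal sketch of such a pipeline (illustrative database and table names, and assuming this patch's "-" convention for reading the filter rules from stdin):

psql -Atc "SELECT 'include table ' || quote_ident(relname)
           FROM pg_class WHERE relkind = 'r' AND relname ~ '^sales_2021'" mydb \
  | pg_dump --filter=- -Fc -f mydb.dump mydb

No quoting logic is needed on the shell side; quote_ident() on the server produces names the filter file should be able to consume directly.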
The format of the filter file can look like "+d tablename" or "include data tablename". If we find consensus that the filter file is a good thing, then the format design and implementation are easy work. It isn't a problem to invent comment lines.
2. It is true that there is only a small step from a filter file to an options file. I rewrote this patch in that direction. The advantage is universality - it can support any option without the necessity of modifying related code. Still this format is not difficult for producers, and it is simple to parse. Now, the format would be defined by the command-line format: "-t tablename" or "--table tablename" or "table tablename". There can be issues related to the different parsers in the shell and in the implemented code, but they can be solved. It isn't a problem to introduce comment lines. The big advantage is simplicity of usage and simplicity of implementation; moreover, the implementation is generic.
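As an illustration (hypothetical content; the rule spellings come from the variants mentioned above, paired with existing pg_dump switch names), such an options file might look like:

# one option per line, mirroring the command line
--table sales_2021
--exclude-table-data audit_log
--schema-only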
3. But the options file is just a small step from a config file. I can imagine somebody wanting to store a typical configuration (and usual options) for psql, pg_dump, pg_restore, pgAdmin, ... somewhere. Config files are very different creatures from filter files. Although they can be generated, they are usually edited and can be very complex. There can be shared parts for all applications, specific sections for psql, and specific sections for every database. Config files can be brutally complex. A simple text format is not good for this purpose. And some people prefer YAML, some people hate that format. Other people prefer XML or JSON or anything else. Sometimes the complexity of config files grows too big, and people prefer startup scripting.
Although there is an intersection between filter files and config files, I see very big differences in usage. Filter files are usually temporary, generated, and not shared. Config files are persistent, usually manually modified, and can be shared. The requirements are different, and the implementations should be different too. I don't propose any configuration-file related features, and my proposal doesn't block the introduction of a configuration file in any format in the future. I think these features are very different and should be implemented differently. A filter file or options file would be a pretty ugly config file, and a config file would be a pretty impractical filter file.
So can we talk about the implementation of a filter file or options file here, and talk about the implementation of config files in a separate topic? Without that, I am afraid there is no possibility of finding agreement and moving forward.
Regards
Pavel
Thanks,
Stephen
On 7/14/21 2:18 AM, Stephen Frost wrote:
> Greetings,
>
> * Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
>> On 2021-Jul-13, Stephen Frost wrote:
>>> The simplest possible format isn't going to work with all the different
>>> pg_dump options and it still isn't going to be 'simple' since it needs
>>> to work with the flexibility that we have in what we support for object
>>> names,
>>
>> That's fine. If people want a mechanism that allows changing the other
>> pg_dump options that are not related to object filtering, they can
>> implement a configuration file for that.
>
> It's been said multiple times that people *do* want that and that they
> want it to all be part of this one file, and specifically that they
> don't want to end up with a file structure that actively works against
> allowing other options to be added to it.

I have no problem believing some people want to be able to specify pg_dump parameters in a file, similarly to IMPDP/EXPDP parameter files etc. That seems useful, but I doubt they considered the case with many filter rules ... which is what "my people" want.

Not sure how keeping the filter rules in a separate file (which I assume is what you mean by "file structure"), with a format tailored for filter rules, works *actively* against adding options to the "main" config. I'm not buying the argument that keeping some of the stuff in a separate file is an issue - plenty of established tools do that, the concept of "including" a config is not a radical new thing, and I don't expect we'd have many options supported by a file.

In any case, I think user input is important, but ultimately it's up to us to reconcile the conflicting requirements coming from various users and come up with a reasonable compromise design.

>>> I don't know that the options that I suggested previously would
>>> definitely work or not but they at least would allow other projects like
>>> pgAdmin to leverage existing code for parsing and generating these
>>> config files.
>>
>> Keep in mind that this patch is not intended to help pgAdmin
>> specifically. It would be great if pgAdmin uses the functionality
>> implemented here, but if they decide not to, that's not terrible. They
>> have survived decades without a pg_dump configuration file; they still
>> can.
>
> The adding of a config file for pg_dump should specifically be looking
> at pgAdmin as the exact use-case for having such a capability.
>
>> There are several votes in this thread for pg_dump to gain functionality
>> to filter objects based on a simple specification -- particularly one
>> that can be written using shell pipelines. This patch gives it.
>
> And several votes for having a config file that supports, or at least
> can support in the future, the various options which pg_dump supports-
> and active voices against having a new file format that doesn't allow
> for that.

IMHO the whole "problem" here stems from the question whether there should be a single universal pg_dump config file, containing everything including the filter rules. I'm of the opinion it's better to keep the filter rules separate, mainly because:

1) simplicity - Options (key/value) and filter rules (with more internal structure) seem quite different, and mixing them in the same file will just make the format more complex.

2) flexibility - Keeping the filter rules in a separate file makes it easier to reuse the same set of rules with different pg_dump configs, specified in (much smaller) config files.

So in principle, the "main" config could use e.g. TOML or whatever we find most suitable for this type of key/value config file (or we could just use the same format as for postgresql.conf et al). And the filter rules could use something as simple as CSV (yes, I know it's not great, but there's plenty of parsers, and it handles multi-line strings etc.).

>>> I'm not completely against inventing something new, but I'd really
>>> prefer that we at least try to make something existing work first
>>> before inventing something new that everyone is going to have to deal
>>> with.
>>
>> That was discussed upthread and led nowhere.
>
> You're right- no one followed up on that. Instead, one group continues
> to push for 'simple' and to just accept what's been proposed, while
> another group counters that we should be looking at the broader design
> question and work towards a solution which will work for us down the
> road, and not just right now.

I have quite thick skin, but I have to admit I rather dislike how this paints the people arguing for simplicity. IMO simplicity is a perfectly legitimate (and desirable) design feature, and simpler solutions often fare better in the long run. Yes, we need to look at the broader design, no doubt about that.

> One thing remains clear- there's no consensus here.

True.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
st 12. 5. 2021 v 8:22 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
Hi
fresh rebase
Regards

Pavel
Attachment
Hi
út 13. 7. 2021 v 1:16 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> [1] your proposal of "[+-] OBJTYPE OBJIDENT" plus empty lines allowed
> plus lines starting with # are comments, seems plenty. Any line not
> following that format would cause an error to be thrown.
I'd like to see some kind of keyword on each line, so that we could extend
the command set by adding new keywords. As this stands, I fear we'd end
up using random punctuation characters in place of [+-], which seems
pretty horrid from a readability standpoint.
I think that this file format should be designed with an eye to allowing
every, or at least most, pg_dump options to be written in the file rather
than on the command line. I don't say we have to *implement* that right
now; but if the format spec is incapable of being extended to meet
requests like that one, I think we'll regret it. This line of thought
suggests that the initial commands ought to match the existing
include/exclude switches, at least approximately.
Hence I suggest
include table PATTERN
exclude table PATTERN
which ends up being the above but with words not [+-].
Here is an updated implementation of the filter file that implements the syntax proposed by you.
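For example, a filter file in this syntax can look like the following (illustrative object names; the include/exclude keywords and the table/schema/foreign_data/data object types are the ones in the patch):

# dump two tables and everything in one schema, but skip the log table
include table sales_2021
include schema analytics
exclude table sales_2021_log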
Regards
Pavel
regards, tom lane
Attachment
> On 28 Jul 2021, at 09:28, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> út 13. 7. 2021 v 1:16 odesílatel Tom Lane <tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>> napsal:
> Hence I suggest
>
> include table PATTERN
> exclude table PATTERN
>
> which ends up being the above but with words not [+-].

One issue with this syntax is that the include keyword can be quite misleading,
as its semantic interpretation of "include table t" can be different from
"--table=t". The former is less clear about the fact that it means "exclude
all other tables than t" than the latter is. It can be solved with documentation,
but I think that needs to be made clearer.

> Here is an updated implementation of the filter file that implements the syntax proposed by you.

While it's not the format I would prefer, it does allow for most (all?) use
cases expressed in this thread with ample arm-twisting applied, so let's go ahead
from this point and see if we can agree on it (or a version of it).

A few notes on the patch after a first pass over it:

+(include|exclude)[table|schema|foreign_data|data] <replaceable class="parameter">objectname</replaceable>

Lacks whitespace between keyword and object type. Also, since these are
mandatory parameters, shouldn't they be within '{..}' ?

+	/* skip initial white spaces */
+	while (isblank(*ptr))
+		ptr += 1;

We don't trust isblank() as of 3fd5faed5 due to portability concerns; this
should probably use a version of the pg_isblank() we already have (and possibly
move that to src/common/string.c as there now are more consumers).

+static bool
+isblank_line(const char *line)

This could be replaced with a single call to strspn() as we already do for
parsing the TOC file.

+	/* when first char is hash, ignore whole line */
+	if (*str == '#')
+		continue;

I think we should strip leading whitespace before this to allow comment lines
to start with whitespace; it's easy enough and will make life easier for users.

+	pg_log_error("invalid format of filter file \"%s\": %s",
+				 filename,
+				 message);
+
+	fprintf(stderr, "%d: %s\n", lineno, line);

Can't we just include the lineno in the error logging and skip dumping the
offending line? Fast-forwarding the pointer to print the offending part is
less useful than a context marker, and in some cases suboptimal. With this
coding, if a pattern is omitted for example the below error message is given:

pg_dump: error: invalid format of filter file "filter.txt": missing object name
1:

The error message and the line number in the file should be enough for the user
to figure out what to fix.

+	if (keyword && is_keyword(keyword, size, "table"))
+		objecttype = 't';

Should this use an enum, or at least a struct translating the literal keyword
to the internal representation? Magic constants without explicit connection to
their token counterparts can easily be a cause of bugs.

If I create a table called "a\nb" and try to dump it I get an error in parsing
the file. Isn't this supposed to work?

$ cat filter.txt
include table "a
b"
$ ./bin/pg_dump --filter=filter.txt
pg_dump: error: invalid format of filter file "filter.txt": unexpected chars after object name
2:

Did you consider implementing this in Bison to abstract some of the messier
parsing logic?

--
Daniel Gustafsson		https://vmware.com/
On Wed, Jul 28, 2021 at 09:28:17AM +0200, Pavel Stehule wrote:
> Here is an updated implementation of the filter file that implements the
> syntax proposed by you.

Thanks. If there's any traction for this approach, I have some comments for
the next revision.

> +++ b/doc/src/sgml/ref/pg_dump.sgml
> @@ -789,6 +789,56 @@ PostgreSQL documentation
>        </listitem>
>       </varlistentry>
>
> +     <varlistentry>
> +      <term><option>--filter=<replaceable class="parameter">filename</replaceable></option></term>
> +      <listitem>
> +       <para>
> +        Read objects filters from the specified file.
> +        If you use "-" as a filename, the filters are read from stdin.

Say 'Specify "-" to read from stdin'

> +        The lines starting with symbol <literal>#</literal> are ignored.

Remove "The" and "symbol"

> +        Previous white chars (spaces, tabs) are not allowed. These

Preceding whitespace characters... But actually, they are allowed? But if it
needs to be explained, maybe they shouldn't be - I don't see the utility of it.

> +static bool
> +isblank_line(const char *line)
> +{
> +	while (*line)
> +	{
> +		if (!isblank(*line++))
> +			return false;
> +	}
> +
> +	return true;
> +}

I don't think this requires nor justifies having a separate function.
Either don't support blank lines, or use get_keyword() with size==0 for that?

> +	/* Now we expect sequence of two keywords */
> +	if (keyword && is_keyword(keyword, size, "include"))
> +		is_include = true;
> +	else if (keyword && is_keyword(keyword, size, "exclude"))
> +		is_include = false;
> +	else

I think this should first check "if keyword == NULL". That could give a more
specific error message like "no keyword found",

> +		exit_invalid_filter_format(fp,
> +								   filename,
> +								   "expected keyword \"include\" or \"exclude\"",
> +								   line.data,
> +								   lineno);

..and then this one can say "invalid keyword".

--
Justin
Hi
po 13. 9. 2021 v 15:01 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
> On 28 Jul 2021, at 09:28, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> út 13. 7. 2021 v 1:16 odesílatel Tom Lane <tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>> napsal:
> Hence I suggest
>
> include table PATTERN
> exclude table PATTERN
>
> which ends up being the above but with words not [+-].
One issue with this syntax is that the include keyword can be quite misleading
as its semantic interpretation of "include table t" can be different from
"--table=t". The former is less clear about the fact that it means "exclude
all other tables than t" than the latter is. It can be solved with documentation,
but I think that needs to be made clearer.
I welcome any documentation enhancements and fixes.
> Here is an updated implementation of filter's file, that implements syntax proposed by you.
While it's not the format I would prefer, it does allow for most (all?) use
cases expressed in this thread with ample arm-twisting applied, so let's go ahead
from this point and see if we can agree on it (or a version of it).
A few notes on the patch after a first pass over it:
+(include|exclude)[table|schema|foreign_data|data] <replaceable class="parameter">objectname</replaceable>
Lacks whitespace between keyword and object type. Also, since these are
mandatory parameters, shouldn't they be within '{..}' ?
yes, fixed
+ /* skip initial white spaces */
+ while (isblank(*ptr))
+ ptr += 1;
We don't trust isblank() as of 3fd5faed5 due to portability concerns, this
should probably use a version of the pg_isblank() we already have (and possibly
move that to src/common/string.c as there now are more consumers).
I rewrote this part, and I no longer use the isblank function at all.
+static bool
+isblank_line(const char *line)
This could be replaced with a single call to strspn() as we already do for
parsing the TOC file.
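(For reference, a strspn()-based check along those lines, as a sketch assuming the line buffer is NUL-terminated and that <string.h> and <stdbool.h> are included, could be:

static bool
is_blank_line(const char *line)
{
	/* blank means nothing but spaces, tabs and line ends */
	return line[strspn(line, " \t\r\n")] == '\0';
}
)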
+ /* when first char is hash, ignore whole line */
+ if (*str == '#')
+ continue;
I think we should strip leading whitespace before this to allow comment lines to
start with whitespace, it's easy enough and will make life easier for users.
Now a comment can start at the first non-blank character, or follow a filter rule.
+ pg_log_error("invalid format of filter file \"%s\": %s",
+ filename,
+ message);
+
+ fprintf(stderr, "%d: %s\n", lineno, line);
Can't we just include the lineno in the error logging and skip dumping the
offending line? Fast-forwarding the pointer to print the offending part is
less useful than a context marker, and in some cases suboptimal. With this
coding, if a pattern is omitted for example the below error message is given:
pg_dump: error: invalid format of filter file "filter.txt": missing object name
1:
The error message and the line number in the file should be enough for the user
to figure out what to fix.
I did it like you proposed, but I still think the content can be useful. Often you read dynamically generated files, or you read data from stdin, and in complex environments it can be hard to regenerate the content for debugging.
+ if (keyword && is_keyword(keyword, size, "table"))
+ objecttype = 't';
Should this use an enum, or at least a struct translating the literal keyword
to the internal representation? Magic constants without explicit connection to
their token counterparts can easily be a cause of bugs.
fixed
If I create a table called "a\nb" and try to dump it I get an error in parsing
the file. Isn't this supposed to work?
$ cat filter.txt
include table "a
b"
$ ./bin/pg_dump --filter=filter.txt
pg_dump: error: invalid format of filter file "filter.txt": unexpected chars after object name
2:
Probably there was some issue, because it should work. I tested the new version, and this case is covered by new regression tests. Please check.
Did you consider implementing this in Bison to abstract some of the messier
parsing logic?
Initially not, but now that I am thinking about it, I don't think Bison helps. The syntax of the filter file is nicely linear. The code of the parser is a little bit larger than minimalistic, but that is due to nicer error messages. A raw implementation in Bison would raise just "syntax error" and positions. I did some code refactoring, and now the scanning, parsing and processing are divided into separate routines. The parsing-related code is about 90 lines. In this case, I don't think a parser grammar file can carry any benefit. A grammar is more readable, sure, but we would need to include Bison, we would need to handle errors, and if we want to raise more helpful errors than just "syntax error", then the code will be longer.
Please check the attached patch.
Regards
Pavel
--
Daniel Gustafsson https://vmware.com/
Attachment
po 13. 9. 2021 v 15:11 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Wed, Jul 28, 2021 at 09:28:17AM +0200, Pavel Stehule wrote:
> Here is an updated implementation of filter's file, that implements syntax
> proposed by you.
Thanks.
If there's any traction for this approach, I have some comments for the next
revision.
> +++ b/doc/src/sgml/ref/pg_dump.sgml
> @@ -789,6 +789,56 @@ PostgreSQL documentation
> </listitem>
> </varlistentry>
>
> + <varlistentry>
> + <term><option>--filter=<replaceable class="parameter">filename</replaceable></option></term>
> + <listitem>
> + <para>
> + Read objects filters from the specified file.
> + If you use "-" as a filename, the filters are read from stdin.
Say 'Specify "-" to read from stdin'
> + The lines starting with symbol <literal>#</literal> are ignored.
Remove "The" and "symbol"
> + Previous white chars (spaces, tabs) are not allowed. These
Preceding whitespace characters...
But actually, they are allowed? But if it needs to be explained, maybe they
shouldn't be - I don't see the utility of it.
> +static bool
> +isblank_line(const char *line)
> +{
> + while (*line)
> + {
> + if (!isblank(*line++))
> + return false;
> + }
> +
> + return true;
> +}
I don't think this requires nor justifies having a separate function.
Either don't support blank lines, or use get_keyword() with size==0 for that ?
> + /* Now we expect sequence of two keywords */
> + if (keyword && is_keyword(keyword, size, "include"))
> + is_include = true;
> + else if (keyword && is_keyword(keyword, size, "exclude"))
> + is_include = false;
> + else
I think this should first check "if keyword == NULL".
That could give a more specific error message like "no keyword found",
> + exit_invalid_filter_format(fp,
> + filename,
> + "expected keyword \"include\" or \"exclude\"",
> + line.data,
> + lineno);
..and then this one can say "invalid keyword".
I fixed (I hope) the mentioned issues. Please check the last patch.
Regards
Pavel
--
Justin
Hi
In yesterday's patch I used strndup, which is not available on Windows. I am sending an update that uses pnstrdup instead.
Regards
Pavel
Attachment
As there have been a lot of differing opinions raised in this thread, I re-read it and tried to summarize the discussion so far, to try and figure out where we agree and on what (and disagree) before we get deep into the technicalities wrt the current patch. If anyone feels I've misrepresented them below then I sincerely do apologize. If I missed a relevant viewpoint I also apologize; I've tried to objectively represent the thread.

I proposed JSON in [0], which is where the format discussion to some extent started; Justin and Pavel had up until that point discussed the format by refining the original proposal.

In [1] Surafel Temesgen brought up --exclude-database from pg_dumpall and --no-comments, and argued for them being handled by this patch. This was objected against on the grounds that pg_dumpall is out of scope, and that all-or-nothing switches are not applicable in a filter option.

Stephen objected to both the proposed format, and the suggestion of JSON, in [2] and argued for a more holistic configuration file approach. TOML was suggested. Dean then +1'd the config file approach in [3].

In [4] Tom supported the idea of a more generic config file, and remarked that the proposed filter for table names only makes sense when the number of exclude patterns is large enough that we might hit other problems in pg_dump. Further, in [5] Tom commented that a format with established quoting conventions would buy us not having to invent our own to cope with complicated relation names.

The fact that JSON doesn't support comments is brought up in a few emails and is a very valid point, as the need for comments regardless of format is brought up as well.

Tomas Vondra in [6] wanted the object filter to be a separate file from a config file, and argued for a simpler format for these lists (while still supporting multiple object types). Alvaro agreed with Tomas on [+-] OBJTYPE OBJIDENT in [7], and Tom extended the proposal to use [include/exclude] keywords in [8] in order to support more than just excluding and including. Regardless of stance on format, the use of keywords instead of [+-] is a rare point of consensus in this thread.

Stephen and myself have also expressed concern in various parts of the thread that inventing our own format, rather than using something with existing broad library support, will end up with third parties (like pgAdmin et al) having to all write their own generators and parsers.

A main concern among most (all?) participants of the thread, regardless of format supported, is that quoting is hard and must be done right for all object names postgres supports (including any not currently in scope for this patch).

Below is an attempt at summarizing and grouping the proposals so far into the set of ideas presented:

A) A keyword+object based format to invoke with a switch, to essentially
allow for more filters than the commandline can handle and nothing more.
After a set of revisions, the current proposal is:

   [include|exclude] [<objtype>] [<objident>]

B) A format similar to (A) which can also be used for pg_dump configuration

C) The format in (A), or a close variant thereof, with the intention of it
being included in/referred to from a future configuration file of currently
unknown format. One reference being a .gitignore type file.

D) An existing format (JSON and TOML have been suggested, with JSON
being dismissed due to lack of comment support) which has quoting
conventions that support postgres' object names and which can be used to
define a full pg_dump configuration file syntax.
For B), C) and D) there is implicit consensus in the thread that we don't need to implement the full configuration file as of this patch, merely that it *must* be possible to do so without having to paint ourselves into a corner.

At this point it seems to me that B) and C) have the broadest support. Can the C) option represent the compromise between a "simple" format for object filtering and a more structured format for configuration? Are there other options? Thoughts?

--
Daniel Gustafsson		https://vmware.com/

[0] https://postgr.es/m/F6674FF0-5800-4AED-9DC7-13C475707241@yesql.se
[1] https://postgr.es/m/CALAY4q9u30L7oGhbsfY3dPECQ8SrYa8YO=H-xOn5xWUeiEneeg@mail.gmail.com
[2] https://postgr.es/m/20201110200904.GU16415@tamriel.snowman.net
[3] https://postgr.es/m/CAEZATCVKMG7+b+_5tNwrNZ-aNDBy3=FMRNea2bO9O4qGcEvSTg@mail.gmail.com
[4] https://postgr.es/m/502641.1606334432@sss.pgh.pa.us
[5] https://postgr.es/m/619671.1606406538@sss.pgh.pa.us
[6] https://postgr.es/m/cb545d78-2dae-8d27-f062-822a07ca56cf@enterprisedb.com
[7] https://postgr.es/m/202107122259.n6o5uwb5erza@alvherre.pgsql
[8] https://postgr.es/m/3183720.1626131795@sss.pgh.pa.us
> On 15 Sep 2021, at 19:31, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> po 13. 9. 2021 v 15:01 odesílatel Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> napsal:
> One issue with this syntax is that the include keyword can be quite misleading,
> as its semantic interpretation of "include table t" can be different from
> "--table=t". The former is less clear about the fact that it means "exclude
> all other tables than t" than the latter is. It can be solved with documentation,
> but I think that needs to be made clearer.
>
> I welcome any documentation enhancements and fixes

Sure, that can be collaborated on. The gist is, though, that IMO the keywords in
the filter file aren't as clear on the side effects as the command line params,
even though they are equal in functionality.

> +	pg_log_error("invalid format of filter file \"%s\": %s",
> +				 filename,
> +				 message);
> +
> +	fprintf(stderr, "%d: %s\n", lineno, line);
>
> Can't we just include the lineno in the error logging and skip dumping the
> offending line? Fast-forwarding the pointer to print the offending part is
> less useful than a context marker, and in some cases suboptimal. With this
> coding, if a pattern is omitted for example the below error message is given:
>
> pg_dump: error: invalid format of filter file "filter.txt": missing object name
> 1:
>
> The error message and the line number in the file should be enough for the user
> to figure out what to fix.
>
> I did it like you proposed, but still, I think the content can be useful.

Not when there is no content in the error message; printing an empty string for
a line number which isn't a blank line doesn't seem terribly helpful. If we
know the error context is empty, printing a tailored error hint seems more
useful for the user.

> More times you read dynamically generated files, or you read data from stdin,
> and in complex environments it can be hard to regenerate new content for debugging.

That seems odd, given that the arguments for this format have been that it's
likely to be handwritten.

> If I create a table called "a\nb" and try to dump it I get an error in parsing
> the file. Isn't this supposed to work?
> $ cat filter.txt
> include table "a
> b"
> $ ./bin/pg_dump --filter=filter.txt
> pg_dump: error: invalid format of filter file "filter.txt": unexpected chars after object name
> 2:
>
> probably there was some issue, because it should work. I tested the new version
> and this case is covered by new regression tests. Please check.

That seems to work, but I am unable to write a filter statement which can
handle this relname:

CREATE TABLE "a""
""b" (a integer);

Are you able to craft one for that?

> Did you consider implementing this in Bison to abstract some of the messier
> parsing logic?
>
> Initially not, but now that I am thinking about it, I don't think Bison helps.
> The syntax of the filter file is nicely linear. The code of the parser is a
> little bit larger than minimalistic, but that is due to nicer error messages.
> A raw implementation in Bison would raise just "syntax error" and positions.
> I did some code refactoring, and now the scanning, parsing and processing are
> divided into separate routines. The parsing-related code is about 90 lines.
> In this case, I don't think a parser grammar file can carry any benefit. A
> grammar is more readable, sure, but we would need to include Bison, we would
> need to handle errors, and if we want to raise more helpful errors than just
> "syntax error", then the code will be longer.

I'm not so concerned by code size, but rather by the parsing of quotations etc
and being able to reason about its correctness. IMHO that's easier done by
reading a defined grammar than by reading a handwritten parser.

Will do a closer review on the patch shortly.

--
Daniel Gustafsson		https://vmware.com/
Hi
A main concern among most (all?) participants of the thread, regardless of
format supported, is that quoting is hard and must be done right for all object
names postgres supports (including any not currently in scope for this patch).
Just a small note - when quoting is accounted for in the design, the implementation is easy. I am sure my latest code covers all the possibilities, and it is about 100 lines of code.
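For illustration, a minimal sketch of the core of such quote handling (my own hypothetical example, not the code from the patch): it reads a double-quoted identifier, treats a doubled quote as a literal quote, and lets every other character, newlines included, pass through.

#include <stdlib.h>
#include <string.h>

/*
 * Read a quoted identifier starting just past the opening '"'.
 * A doubled quote ("") stands for a literal '"'; everything else,
 * including newlines, is copied verbatim.  On success returns a
 * malloc'd string and advances *src past the closing quote; returns
 * NULL when the input ends before the closing quote.
 */
static char *
read_quoted_ident(const char **src)
{
	const char *p = *src;
	char	   *out = malloc(strlen(p) + 1);
	size_t		len = 0;

	if (out == NULL)
		return NULL;

	while (*p != '\0')
	{
		if (*p == '"')
		{
			if (p[1] == '"')	/* escaped quote */
			{
				out[len++] = '"';
				p += 2;
				continue;
			}
			out[len] = '\0';	/* closing quote */
			*src = p + 1;
			return out;
		}
		out[len++] = *p++;
	}

	free(out);					/* unterminated identifier */
	return NULL;
}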
Below is an attempt at summarizing and grouping the proposals so far into the
set of ideas presented:
A) A keyword+object based format to invoke with a switch to essentially
allow for more filters than the commandline can handle and nothing more.
After a set of revisions, the current proposal is:
[include|exclude] [<objtype>] [<objident>]
B) A format similar to (A) which can also be used for pg_dump configuration
C) The format in (A), or a close variant thereof, with the intention of it
being included in/referred to from a future configuration file of currently
unknown format. One reference being a .gitignore type file.
D) An existing format (JSON and TOML have been suggested, with JSON
being dismissed due to lack of comment support) which has quoting
conventions that supports postgres' object names and which can be used to
define a full pg_dump configuration file syntax.
For B), C) and D) there is implicit consensus in the thread that we don't need
to implement the full configuration file as of this patch, merely that it
*must* be possible to do so without having to paint ourselves into a corner.
At this point it seems to me that B) and C) have the broadest support. Can the
C) option represent the compromise between a "simple" format for object
filtering and a more structured format for configuration? Are there other
options?
What would be the benefit of this variant?
Regards
Pavel
Thoughts?
--
Daniel Gustafsson https://vmware.com/
[0] https://postgr.es/m/F6674FF0-5800-4AED-9DC7-13C475707241@yesql.se
[1] https://postgr.es/m/CALAY4q9u30L7oGhbsfY3dPECQ8SrYa8YO=H-xOn5xWUeiEneeg@mail.gmail.com
[2] https://postgr.es/m/20201110200904.GU16415@tamriel.snowman.net
[3] https://postgr.es/m/CAEZATCVKMG7+b+_5tNwrNZ-aNDBy3=FMRNea2bO9O4qGcEvSTg@mail.gmail.com
[4] https://postgr.es/m/502641.1606334432@sss.pgh.pa.us
[5] https://postgr.es/m/619671.1606406538@sss.pgh.pa.us
[6] https://postgr.es/m/cb545d78-2dae-8d27-f062-822a07ca56cf@enterprisedb.com
[7] https://postgr.es/m/202107122259.n6o5uwb5erza@alvherre.pgsql
[8] https://postgr.es/m/3183720.1626131795@sss.pgh.pa.us
pá 17. 9. 2021 v 13:42 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
> On 15 Sep 2021, at 19:31, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> po 13. 9. 2021 v 15:01 odesílatel Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> napsal:
> One issue with this syntax is that the include keyword can be quite misleading
> as its semantic interpretation of "include table t" can be different from
> "--table=t". The former is less clear about the fact that it means "exclude
> all other tables than t" than the latter is. It can be solved with documentation,
> but I think that needs to be made clearer.
>
> I welcome any documentation enhancements and fixes
Sure, that can be collaborated on. The gist is, though, that IMO the keywords in
the filter file aren't as clear on the side effects as the command line params,
even though they are equal in functionality.
> + pg_log_error("invalid format of filter file \"%s\": %s",
> + filename,
> + message);
> +
> + fprintf(stderr, "%d: %s\n", lineno, line);
> Can't we just include the lineno in the error logging and skip dumping the
> offending line? Fast-forwarding the pointer to print the offending part is
> less useful than a context marker, and in some cases suboptimal. With this
> coding, if a pattern is omitted for example the below error message is given:
>
> pg_dump: error: invalid format of filter file "filter.txt": missing object name
> 1:
>
> The error message and the line number in the file should be enough for the user
> to figure out what to fix.
>
> I did it like you proposed, but still, I think the content can be useful.
Not when there is no content in the error message, printing an empty string for
a line number which isn't a blank line doesn't seem terribly helpful. If we
know the error context is empty, printing a tailored error hint seems more
useful for the user.
> More times you read dynamically generated files, or you read data from stdin, and in complex environments it can be hard to regenerate new content for debugging.
That seems odd, given that the arguments for this format have been that it's
likely to be handwritten.
> If I create a table called "a\nb" and try to dump it I get an error in parsing
> the file. Isn't this supposed to work?
> $ cat filter.txt
> include table "a
> b"
> $ ./bin/pg_dump --filter=filter.txt
> pg_dump: error: invalid format of filter file "filter.txt": unexpected chars after object name
> 2:
>
> probably there was some issue, because it should work. I tested a new version and this is tested in new regress tests. Please, check
That seems to work, but I am unable to write a filter statement which can
handle this relname:
CREATE TABLE "a""
""b" (a integer);
Are you able to craft one for that?
I am not able to dump this directly in pg_dump. Is it possible?
> Did you consider implementing this in Bison to abstract some of the messier
> parsing logic?
>
> Initially not, but now that I am thinking about it, I don't think Bison helps.
> The syntax of the filter file is nicely linear. The code of the parser is a
> little bit larger than minimalistic, but that is due to nicer error messages.
> A raw implementation in Bison would raise just "syntax error" and positions.
> I did some code refactoring, and now the scanning, parsing and processing are
> divided into separate routines. The parsing-related code is about 90 lines.
> In this case, I don't think a parser grammar file can carry any benefit. A
> grammar is more readable, sure, but we would need to include Bison, we would
> need to handle errors, and if we want to raise more helpful errors than just
> "syntax error", then the code will be longer.
I'm not so concerned by code size, but rather by the parsing of quotations etc and
being able to reason about its correctness. IMHO that's easier done by
reading a defined grammar than by reading a handwritten parser.
Will do a closer review on the patch shortly.
--
Daniel Gustafsson https://vmware.com/
pá 17. 9. 2021 v 13:56 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
> On 17 Sep 2021, at 13:51, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> pá 17. 9. 2021 v 13:42 odesílatel Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> napsal:
> I am unable to write a filter statement which can
> handle this relname:
>
> CREATE TABLE "a""
> ""b" (a integer);
>
> Are you able to craft one for that?
>
> I am not able to dump this directly in pg_dump. Is it possible?
Sure, see below:
$ ./bin/psql filter
psql (15devel)
Type "help" for help.
I didn't ask about this.
I asked whether you can use -t with some pattern to filter this name?
filter=# create table "a""
filter"# ""b" (a integer);
CREATE TABLE
filter=# select relname from pg_class order by oid desc limit 1;
relname
---------
a" +
"b
(1 row)
filter=# ^D\q
$ ./bin/pg_dump -s filter
--
-- PostgreSQL database dump
--
-- Dumped from database version 15devel
-- Dumped by pg_dump version 15devel
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;
SET default_tablespace = '';
SET default_table_access_method = heap;
--
-- Name: a" "b; Type: TABLE; Schema: public; Owner: danielg
--
CREATE TABLE public."a""
""b" (
a integer
);
ALTER TABLE public."a""
""b" OWNER TO danielg;
--
-- PostgreSQL database dump complete
--
--
Daniel Gustafsson https://vmware.com/
Greetings,
On Fri, Sep 17, 2021 at 13:59 Pavel Stehule <pavel.stehule@gmail.com> wrote:
pá 17. 9. 2021 v 13:56 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
> On 17 Sep 2021, at 13:51, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> pá 17. 9. 2021 v 13:42 odesílatel Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> napsal:
> I am unable to write a filter statement which can
> handle this relname:
>
> CREATE TABLE "a""
> ""b" (a integer);
>
> Are you able to craft one for that?
>
> I am not able to dump this directly in pg_dump. Is it possible?
Sure, see below:
$ ./bin/psql filter
psql (15devel)
Type "help" for help.I didn't ask on thisI asked if you can use -t and some for filtering this name?
For my part, at least, I don't see that this particularly matters: for a new feature that's being developed to allow users to export specific tables, I would think we'd want to support any table names which can exist.
Thanks,
Stephen
Greetings,
On Fri, Sep 17, 2021 at 14:07 Daniel Gustafsson <daniel@yesql.se> wrote:
> On 17 Sep 2021, at 13:59, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> pá 17. 9. 2021 v 13:56 odesílatel Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> napsal:
> > On 17 Sep 2021, at 13:51, Pavel Stehule <pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>> wrote:
> > pá 17. 9. 2021 v 13:42 odesílatel Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se> <mailto:daniel@yesql.se <mailto:daniel@yesql.se>>> napsal:
> > I am unable to write a filter statement which can
> > handle this relname:
> >
> > CREATE TABLE "a""
> > ""b" (a integer);
> >
> > Are you able to craft one for that?
> >
> > I am not able to dump this directly in pg_dump. Is it possible?
>
> Sure, see below:
>
> $ ./bin/psql filter
> psql (15devel)
> Type "help" for help.
>
> I didn't ask about this.
>
> I asked whether you can use -t with some pattern to filter this name?
I didn't try as I don't see how that's relevant? Surely we're not limiting the
capabilities of a filtering file format based on the quoting semantics of a
shell?
Yeah, agreed. I would think that a DBA might specifically want to be able to use a config file to get away from having to deal with shell quoting, in fact…
Thanks,
Stephen
pá 17. 9. 2021 v 14:07 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
> On 17 Sep 2021, at 13:59, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> pá 17. 9. 2021 v 13:56 odesílatel Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> napsal:
> > On 17 Sep 2021, at 13:51, Pavel Stehule <pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>> wrote:
> > pá 17. 9. 2021 v 13:42 odesílatel Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se> <mailto:daniel@yesql.se <mailto:daniel@yesql.se>>> napsal:
> > I am unable to write a filter statement which can
> > handle this relname:
> >
> > CREATE TABLE "a""
> > ""b" (a integer);
> >
> > Are you able to craft one for that?
> >
> > I am not able to dump this directly in pg_dump. Is it possible?
>
> Sure, see below:
>
> $ ./bin/psql filter
> psql (15devel)
> Type "help" for help.
>
> I didn't ask about this.
>
> I asked whether you can use -t with some pattern to filter this name?
I didn't try as I don't see how that's relevant? Surely we're not limiting the
capabilities of a filtering file format based on the quoting semantics of a
shell?
This patch just uses existing functionality, which can be buggy too.
But I had a bug in this part - if I detect a doubled double quote on input, I have to send a doubled double quote to the output too.
It should be fixed in the attached patch
[pavel@localhost pg_dump]$ echo 'include table "a""\n""b"' | ./pg_dump --filter=-
--
-- PostgreSQL database dump
--
-- Dumped from database version 15devel
-- Dumped by pg_dump version 15devel
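For illustration only, roughly the kind of loop that fix implies - a sketch with assumed names (ptr, pattern), not the patch's actual code: inside a quoted pattern a doubled quote must be copied through verbatim, so downstream identifier parsing still sees a valid escape.

    /* sketch: copy a quoted pattern, preserving "" escapes */
    while (*ptr != '\0')
    {
        if (*ptr == '"')
        {
            if (ptr[1] == '"')
            {
                appendPQExpBufferStr(pattern, "\"\"");  /* keep the escape */
                ptr += 2;
                continue;
            }
            break;              /* closing quote, pattern is complete */
        }
        appendPQExpBufferChar(pattern, *ptr++);
    }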
--
Daniel Gustafsson https://vmware.com/
Attachment
>
> Initially I did not, but now that I think about it, I don't think Bison helps. The syntax of the filter file is nicely linear. The code of the parser is now a little bit larger than minimalistic, but that is due to nicer error messages. A raw implementation in Bison would raise just "syntax error" and positions. I did some code refactoring, and now the scanning, parsing and processing are divided into separate routines. The parsing-related code is about 90 lines. In this case, I don't think a parser grammar file brings any benefit. A grammar is more readable, sure, but we would need to include Bison, we would need to handle errors, and if we want to raise more helpful errors than just "syntax error", the code will be longer.
I'm not so concerned by code size, but rather by the parsing of quotations etc and
being able to reason about its correctness. IMHO that's easier done by
reading a defined grammar than by reading a handwritten parser.
In this case the complex part is not the parser; the scanner is the complex part, and writing it in flex is not too easy. As I wrote, the grammar file can be more readable, but the usual error from Bison is just "syntax error" and a position, so it does not win from the user's perspective. When a grammar is not linear, a generated parser can help a lot, but using one at this moment is premature.
Will do a closer review on the patch shortly.
--
Daniel Gustafsson https://vmware.com/
po 20. 9. 2021 v 14:10 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
> Will do a closer review on the patch shortly.
Had a read through, and tested, the latest posted version today:
+ Read objects filters from the specified file. Specify "-" to read from
+ stdin. Lines of this file must have the following format:
I think this should be <filename>-</filename> and <literal>STDIN</literal> to
match the rest of the docs.
+ <para>
+ With the following filter file, the dump would include table
+ <literal>mytable1</literal> and data from foreign tables of
+ <literal>some_foreign_server</literal> foreign server, but exclude data
+ from table <literal>mytable2</literal>.
+<programlisting>
+include table mytable1
+include foreign_data some_foreign_server
+exclude table mytable2
+</programlisting>
+ </para>
This example is highlighting the issue I've previously raised with the UX/doc
of this feature. The "exclude table mytable2" is totally pointless in the
above since the exact match of "mytable1" will remove all other objects. What
we should be doing instead is use the pattern matching aspect along the lines
of the below:
include table mytable*
exclude table mytable2
+ The <option>--filter</option> option works just like the other
+ options to include or exclude tables, schemas, table data, or foreign
This should refer to the actual options by name to make it clear which we are
talking about.
fixed
+ printf(_(" --filter=FILENAME dump objects and data based on the filter expressions\n"
+ " from the filter file\n"));
Before we settle on --filter I think we need to conclude whether this file is
intended to be included from a config file, or used on its own. If we go with
the former then we might not want a separate option for just --filter.
I prefer to keep two separate files. Although there is some intersection, I think it is good to have two simple separate files for two really different tasks.
This one does filtering, and it should be controlled by the option "--filter". If the implementation changes, then this option can be changed too.
Filtering is just a pg_dump related feature. Revising client application configuration is a much more generic task, and if we mix the two into one, we can be
in a trap. It can be hard to find one good format for both large script-generated content and possibly hand-written structured content. For practical
reasons it can be good to have two files too. Filters and configurations can have different life cycles.
+ if (filter_is_keyword(keyword, size, "include"))
I would prefer if this function call was replaced by just the pg_strcasecmp()
call in filter_is_keyword() and the strlen optimization there removed. This is
not a hot-path, we can afford the string comparison in case of errors. Having
the string comparison done inline here will improve readability, saving the
reader from jumping to another function to see what it does.
I agree that this is not a hot path; I just don't feel comfortable making a zero-terminated string only to compare it with pg_strcasecmp. The current design reduces malloc/free cycles. The same approach is used in other places where Postgres parses strings - the SQL parser, the plpgsql parser. I am not sure about the benefits and costs - pg_strcasecmp can be more readable, but for every keyword I would have to call pstrdup and pfree. Is it necessary? My opinion on this part is not too strong - it is a minor issue; maybe I just weigh the benefits and costs a little differently in this specific case, and if you really think the benefits of rewriting are higher, I'll do it.
+ initStringInfo(&line);
Why is this using a StringInfo rather than a PQExpBuffer as the rest of pg_dump
does?
The StringInfo is used because I use the pg_get_line_buf function, and that function uses this API.
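For reference, a minimal sketch of that reading pattern, assuming an already-opened FILE *fp (not the patch's exact code):

    #include "common/string.h"      /* pg_get_line_buf, pg_strip_crlf */
    #include "lib/stringinfo.h"

    StringInfoData line;

    initStringInfo(&line);
    while (pg_get_line_buf(fp, &line))
    {
        /* pg_get_line_buf keeps the trailing newline; strip it */
        line.len = pg_strip_crlf(line.data);
        /* ... parse line.data here ... */
    }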
+typedef struct
I think these should be at the top of the file with the other typedefs.
done
When testing strange object names, I was unable to express this name in the filter file:
$ ./bin/psql
psql (15devel)
Type "help" for help.
danielg=# create table "
danielg"# t
danielg"# t
danielg"# " (a integer);
CREATE TABLE
danielg=# select relname from pg_class order by oid desc limit 1;
relname
---------
+
t +
t +
(1 row)
Good catch - the pg_strip_crlf call was badly placed; fixed, and the regression tests are enhanced.
Please check the attached patch
Regards
Pavel
Attachment
> On 21 Sep 2021, at 08:50, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> po 20. 9. 2021 v 14:10 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
>
> + printf(_(" --filter=FILENAME dump objects and data based on the filter expressions\n"
> + " from the filter file\n"));
> Before we settle on --filter I think we need to conclude whether this file is
> intended to be included from a config file, or used on its own. If we go with
> the former then we might not want a separate option for just --filter.
>
> I prefer to keep two separate files. Although there is some intersection, I think it is good to have two simple separate files for two really different tasks.
> [...]

I'm not convinced that we can/should change or remove a commandline parameter
in a coming version when there might be scripts expecting it to work in a
specific way. Having a --filter as well as a --config where the configfile can
refer to the filterfile also passed via --filter sounds like a problem waiting
to happen, so I think we need to settle how we want to interact with this file
before anything goes in.

Any thoughts from those in the thread who have had strong opinions on config
files etc?

> + if (filter_is_keyword(keyword, size, "include"))
> I would prefer if this function call was replaced by just the pg_strcasecmp()
> call in filter_is_keyword() and the strlen optimization there removed. This is
> not a hot-path, we can afford the string comparison in case of errors.
>
> I agree that this is not a hot path; I just don't feel comfortable making a zero-terminated string only to compare it with pg_strcasecmp. [...]

Sorry, I typoed my response. What I meant was to move the pg_strncasecmp call
inline and not do the strlen check, to save readers from jumping around. So
basically end up with the below in read_filter_item():

+ /* Now we expect sequence of two keywords */
+ if (pg_strncasecmp(keyword, "include", size) == 0)
+     *is_include = true;

> + initStringInfo(&line);
> Why is this using a StringInfo rather than a PQExpBuffer as the rest of pg_dump
> does?
>
> The StringInfo is used because I use the pg_get_line_buf function, and that function uses this API.

Ah, of course.

A few other comments from another pass over this:

+ exit_nicely(-1);

Why -1? pg_dump (and all other binaries) exits with 1 on IMO even more serious
errors, so I think this should use 1 as well.

+ if (!pg_get_line_buf(fstate->fp, line))
+ {
+     if (ferror(fstate->fp))
+         fatal("could not read from file \"%s\": %m", fstate->filename);
+
+     exit_invalid_filter_format(fstate, "unexpected end of file");
+ }

In the ferror() case this codepath isn't running fclose() on the file pointer
(unless stdin) which we do elsewhere, so this should use pg_log_error and
exit_nicely instead.

+ pg_log_fatal("could not read from file \"%s\": %m", fstate->filename);

Based on how other errors are treated in pg_dump I think this should be
downgraded to a pg_log_error.

The above comments are fixed in the attached, as well as a pass over the docs
and extended tests to actually test matching a foreign server. What do you
think about this version? I'm still not convinced that there aren't more
quoting bugs in the parser, but I've left that intact for now.

--
Daniel Gustafsson https://vmware.com/
Attachment
út 21. 9. 2021 v 14:37 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
> On 21 Sep 2021, at 08:50, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> po 20. 9. 2021 v 14:10 odesílatel Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> napsal:
> + printf(_(" --filter=FILENAME dump objects and data based on the filter expressions\n"
> + " from the filter file\n"));
> Before we settle on --filter I think we need to conclude whether this file is
> intended to be included from a config file, or used on its own. If we go with
> the former then we might not want a separate option for just --filter.
>
> I prefer to separate two files. Although there is some intersection, I think it is good to have two simple separate files for two really different tasks.
> It does filtering, and it should be controlled by option "--filter". When the implementation will be changed, then this option can be changed too.
> Filtering is just a pg_dump related feature. Revision of client application configuration is a much more generic task, and if we mix it to one, we can be
> in a trap. It can be hard to find one good format for large script generated content, and possibly hand written structured content. For practical
> reasons it can be good to have two files too. Filters and configurations can have different life cycles.
I'm not convinced that we can/should change or remove a commandline parameter
in a coming version when there might be scripts expecting it to work in a
specific way. Having a --filter as well as a --config where the configfile can
refer to the filterfile also passed via --filter sounds like problem waiting to
happen, so I think we need to settle how we want to interact with this file
before anything goes in.
Any thoughts from those in the thread who have had strong opinions on config
files etc?
> + if (filter_is_keyword(keyword, size, "include"))
> I would prefer if this function call was replaced by just the pg_strcasecmp()
> call in filter_is_keyword() and the strlen optimization there removed. This is
> not a hot-path, we can afford the string comparison in case of errors. Having
> the string comparison done inline here will improve readability, saving the
> reader from jumping to another function to see what it does.
>
> I agree that this is not a hot path; I just don't feel comfortable making a zero-terminated string only to compare it with pg_strcasecmp. The current design reduces malloc/free cycles. The same approach is used in other places where Postgres parses strings - the SQL parser, the plpgsql parser. I am not sure about the benefits and costs - pg_strcasecmp can be more readable, but for every keyword I would have to call pstrdup and pfree. Is it necessary? My opinion on this part is not too strong - it is a minor issue; maybe I just weigh the benefits and costs a little differently in this specific case, and if you really think the benefits of rewriting are higher, I'll do it.
Sorry, I typoed my response. What I meant was to move the pg_strncasecmp call
inline and not do the strlen check, to save readers from jumping around. So
basically end up with the below in read_filter_item():
+ /* Now we expect sequence of two keywords */
+ if (pg_strncasecmp(keyword, "include", size) == 0)
+ *is_include = true;
I don't think that is safe (strict). pg_strncasecmp(...) alone is also true for truncated keywords like "incl", "includ", ... You have to compare the size too.
Regards
Pavel
I definitely agree that we should have two files, one for config and another
one for filter, since their purposes are orthogonal and their formats are
likely different; trying to cram the filter specification in the config file
seems unfriendly because it'd force users to write the filter in whatever
alien grammar is used for the config file. Also, this would make it easier to
use a single config file with a bunch of different filter files.

On 2021-Sep-21, Daniel Gustafsson wrote:

> I'm not convinced that we can/should change or remove a commandline parameter
> in a coming version when there might be scripts expecting it to work in a
> specific way. Having a --filter as well as a --config where the configfile can
> refer to the filterfile also passed via --filter sounds like a problem waiting
> to happen, so I think we need to settle how we want to interact with this file
> before anything goes in.

I think both the filter and the hypothetical config file are going to
interact (be redundant) with almost all already existing switches, and
there's no need to talk about removing anything (e.g., nobody would
argue for the removal of "-t" even though that's redundant with the
filter file).

I see no problem with the config file specifying a filter file.

AFAICS if the config file specifies a filter and the user also specifies
a filter in the command line, we have two easy options: raise an error
about the redundant option, or have the command line option supersede
the one in the config file. The latter strikes me as the more useful
behavior, and it's in line with what other tools do in similar cases, so
that's what I propose doing.

(There might be less easy options too, such as somehow combining the two
filters, but offhand I don't see any reason why this is real-world
useful, so I don't propose doing that.)

--
Álvaro Herrera         Valdivia, Chile  —  https://www.EnterpriseDB.com/
"How amazing is that? I call it a night and come back to find that a bug has
been identified and patched while I sleep."  (Robert Davidson)
http://archives.postgresql.org/pgsql-sql/2006-03/msg00378.php
On 9/21/21 3:28 PM, Alvaro Herrera wrote:
> I definitely agree that we should have two files, one for config and
> another one for filter, since their purposes are orthogonal and their
> formats are likely different; [...]

+1, that is pretty much exactly what I argued for not too long ago.

> I think both the filter and the hypothetical config file are going to
> interact (be redundant) with almost all already existing switches, and
> there's no need to talk about removing anything (e.g., nobody would
> argue for the removal of "-t" even though that's redundant with the
> filter file).
>
> I see no problem with the config file specifying a filter file.
>
> AFAICS if the config file specifies a filter and the user also specifies
> a filter in the command line, we have two easy options: raise an error
> about the redundant option, or have the command line option supersede
> the one in the config file. The latter strikes me as the more useful
> behavior, and it's in line with what other tools do in similar cases, so
> that's what I propose doing.

Well, I think we already have to make decisions like that, because you can
do e.g. this:

    pg_dump -T t -t t

So we already combine the switches, and we do this:

    When both -t and -T are given, the behavior is to dump just the tables
    that match at least one -t switch but no -T switches. If -T appears
    without -t, then tables matching -T are excluded from what is otherwise
    a normal dump.

That seems fairly reasonable, and I don't see why not to use the same logic
for combining patterns no matter where we got them (filter file,
command-line option, etc.). Just combine everything, and then check whether
any "exclude" rule matches. If yes, we're done - exclude. If not, check
whether an "include" rule matches. If not, still exclude. Otherwise include.

Seems reasonable and consistent to me, and I don't see why not to allow
multiple --filter parameters.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
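For illustration only, that combining rule written as a predicate - names invented, mirroring the -t/-T behavior quoted above rather than any actual patch code:

    #include <stdbool.h>

    /* sketch: an exclude match always wins; an include match is only
     * required when any include rule exists at all, otherwise the
     * object is dumped by default */
    static bool
    object_is_dumped(bool matches_include, bool matches_exclude,
                     bool have_include_rules)
    {
        if (matches_exclude)
            return false;
        if (have_include_rules)
            return matches_include;
        return true;
    }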
Hi
The above comments are fixed in the attached, as well as a pass over the docs
and extended tests to actually test matching a foreign server. What do think
about this version? I'm still not convinced that there aren't more quoting
bugs in the parser, but I've left that intact for now.
The problematic points are double quotes and the newline char. Anything else is just a sequence of bytes.
I have just one note on your patch. When you use pg_strncasecmp, you have to check the size too:
char *xxx = "incl";
int xxx_size = 4;
elog(NOTICE, ">>>>%d<<<<",
pg_strncasecmp(xxx, "include", xxx_size) == 0);
The result is NOTICE: >>>>1<<<<
but "incl" is not the keyword "include"
Regards
Pavel
--
Daniel Gustafsson https://vmware.com/
Hi
The above comments are fixed in the attached, as well as a pass over the docs
and extended tests to actually test matching a foreign server. What do think
about this version? I'm still not convinced that there aren't more quoting
bugs in the parser, but I've left that intact for now.
This patch is based on the version that you sent on 21.9; I just modified the string comparison in keyword detection. If we don't want to support abbreviated keywords (and I dislike that), then the size check is necessary. Everything else is unchanged.
Regards
Pavel
--
Daniel Gustafsson https://vmware.com/
Attachment
I took another pass over this today and touched up the documentation (docs and
code) as well as tweaked the code a bit here and there to both make it fit the
pg_dump style better and to clean up a few small things. I've also added a set
of additional tests to cover more of the functionality.

I'm still not happy with the docs, I need to take another look there and see if
I can make them more readable, but otherwise I don't think there are any open
issues with this.

As has been discussed upthread, this format strikes a compromise wrt simplicity
and doesn't preclude adding a more structured config file in the future should
we want that. I think this takes care of most comments and opinions made in
this thread.

--
Daniel Gustafsson https://vmware.com/
Attachment
pá 1. 10. 2021 v 15:19 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
I took another pass over this today and touched up the documentation (docs and
code) as well as tweaked the code a bit here and there to both make it fit the
pg_dump style better and to clean up a few small things. I've also added a set
of additional tests to cover more of the functionality.
I'm still not happy with the docs, I need to take another look there and see if
I can make them more readable but otherwise I don't think there are any open issues
with this.
As has been discussed upthread, this format strikes a compromise wrt simplicity
and doesn't preclude adding a more structured config file in the future should
we want that. I think this takes care of most comments and opinions made in
this thread.
It looks good.
Thank you
Pavel
--
Daniel Gustafsson https://vmware.com/
On 10/1/21 3:19 PM, Daniel Gustafsson wrote:
>
> As has been discussed upthread, this format strikes a compromise wrt simplicity
> and doesn't preclude adding a more structured config file in the future should
> we want that. I think this takes care of most comments and opinions made in
> this thread.

Hi,

If you try to dump/restore a foreign table from a file_fdw server, the
restore step will complain and thus leave the return value nonzero. The
foreign table will be there, with complete 'data'.

A complete runnable example is a lot of work; I hope the below bits of
input and output make the problem clear. Main thing: the pg_restore output
contains 2 ERROR lines like:

  pg_restore: error: COPY failed for table "ireise1": ERROR: cannot insert into foreign table "ireise1"

----------------------
From the test bash:

echo "
include table table0              # ok public
include table test.table1         #
include foreign_data goethe       # foreign server 'goethe' (file_fdw)
include table gutenberg.ireise1   # foreign table
include table gutenberg.ireise2   # foreign table
" > inputfile1.txt

pg_dump --create -Fc -c -p $port -d $db1 -f dump1 --filter=inputfile1.txt
echo

# prepare for restore
server_name=goethe
echo "create schema if not exists test;"      | psql -qaXd $db2
echo "create schema if not exists gutenberg;" | psql -qaXd $db2
echo "create server if not exists $server_name foreign data wrapper file_fdw " \
  | psql -qaXd $db2

echo "-- pg_restore --if-exists -cvd $db2 dump1 "
pg_restore --if-exists -cvd $db2 dump1
rc=$?
echo "-- rc [$rc]"

----------------------
From the output:

-- pg_dump --create -Fc -c -p 6969 -d testdb1 -f dump1 --filter=inputfile1.txt
-- pg_restore --if-exists -cvd testdb2 dump1
pg_restore: connecting to database for restore
pg_restore: dropping TABLE table1
pg_restore: dropping TABLE table0
pg_restore: dropping FOREIGN TABLE ireise2
pg_restore: dropping FOREIGN TABLE ireise1
pg_restore: creating FOREIGN TABLE "gutenberg.ireise1"
pg_restore: creating COMMENT "gutenberg.FOREIGN TABLE ireise1"
pg_restore: creating FOREIGN TABLE "gutenberg.ireise2"
pg_restore: creating COMMENT "gutenberg.FOREIGN TABLE ireise2"
pg_restore: creating TABLE "public.table0"
pg_restore: creating TABLE "test.table1"
pg_restore: processing data for table "gutenberg.ireise1"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 5570; 0 23625 TABLE DATA ireise1 aardvark
pg_restore: error: COPY failed for table "ireise1": ERROR: cannot insert into foreign table "ireise1"
pg_restore: processing data for table "gutenberg.ireise2"
pg_restore: from TOC entry 5571; 0 23628 TABLE DATA ireise2 aardvark
pg_restore: error: COPY failed for table "ireise2": ERROR: cannot insert into foreign table "ireise2"
pg_restore: processing data for table "public.table0"
pg_restore: processing data for table "test.table1"
pg_restore: warning: errors ignored on restore: 2
-- rc [1]
---------

A second, separate practical hiccup is that schemas are not restored from the
dumped $schema.$table includes -- but this can be worked around; for my
inputfile1.txt I had to run separately (as seen above, before running the
pg_restore):

  create schema if not exists test;
  create schema if not exists gutenberg;
  create server if not exists goethe foreign data wrapper file_fdw;

A bit annoying but still maybe all right.

Thanks,

Erik Rijkers
On 10/1/21 6:19 PM, Erik Rijkers wrote:
> On 10/1/21 3:19 PM, Daniel Gustafsson wrote:
>>
>> As has been discussed upthread, this format strikes a compromise wrt
>> simplicity and doesn't preclude adding a more structured config file in
>> the future should we want that.
>
> If you try to dump/restore a foreign table from a file_fdw server, the
> restore step will complain and thus leave the return value nonzero. The
> foreign table will be there, with complete 'data'.
>
> A complete runnable example is a lot of work; I hope the below bits of
> input and output make the problem clear. Main thing: the pg_restore output
> contains 2 ERROR lines like:
>
> pg_restore: error: COPY failed for table "ireise1": ERROR: cannot
> insert into foreign table "ireise1"

Further testing makes clear that the file_fdw-addressing line

  include foreign_data goethe

was the culprit: it causes a COPY which of course fails in a read-only
wrapper like file_fdw. Without that line it works (because I run the restore
on the same machine, so the underlying file_fdw .txt files are there for
testdb2 too).

So the issue is not as serious as it seemed. The complaint remaining is only
that this could somehow be documented better.

I attach a running example (careful, it deletes stuff) of the original
ERROR-producing bash (remove the 'include foreign_data' line from the input
file to run it without error).

thanks,
Erik Rijkers
Attachment
> On 2 Oct 2021, at 08:18, Erik Rijkers <er@xs4all.nl> wrote:
>
> So the issue is not as serious as it seemed.

This is also not related to this patch in any way, or am I missing a point
here? This can just as well be achieved without this patch.

> The complaint remaining is only that this could somehow be documented better.

The pg_dump documentation today has a large highlighted note about this:

  "When --include-foreign-data is specified, pg_dump does not check that the
  foreign table is writable. Therefore, there is no guarantee that the results
  of a foreign table dump can be successfully restored."

This was extensively discussed [0] when this went in, is there additional
documentation you'd like to see for this?

--
Daniel Gustafsson https://vmware.com/

[0] https://postgr.es/m/LEJPR01MB0185483C0079D2F651B16231E7FC0@LEJPR01MB0185.DEUPRD01.PROD.OUTLOOK.DE
> On 1 Oct 2021, at 15:19, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> I'm still not happy with the docs, I need to take another look there and see if
> I can make them more readable, but otherwise I don't think there are any open
> issues with this.

Attached is a rebased version which has rewritten docs which I think are more
in line with the pg_dump documentation. I've also added tests for
--strict-names operation, as well as subjected it to pgindent and pgperltidy.

Unless there are objections, I think this is pretty much ready to go in.

--
Daniel Gustafsson https://vmware.com/
Attachment
út 5. 10. 2021 v 14:30 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
> On 1 Oct 2021, at 15:19, Daniel Gustafsson <daniel@yesql.se> wrote:
> I'm still not happy with the docs, I need to take another look there and see if
> I can make them more readable but otherwise I don't think there are any open issues
> with this.
Attached is a rebased version which has rewritten docs which I think are more
in line with the pg_dump documentation. I've also added tests for
--strict-names operation, as well subjected it to pgindent and pgperltidy.
Unless there are objections, I think this is pretty much ready to go in.
great, thank you
Pavel
--
Daniel Gustafsson https://vmware.com/
Op 05-10-2021 om 14:30 schreef Daniel Gustafsson:
>
> Unless there are objections, I think this is pretty much ready to go in.

Agreed.

One typo:
   'This keyword can only be with the exclude keyword.'
should be
   'This keyword can only be used with the exclude keyword.'

thanks,
Erik Rijkers
Hi
út 5. 10. 2021 v 14:30 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
> On 1 Oct 2021, at 15:19, Daniel Gustafsson <daniel@yesql.se> wrote:
> I'm still not happy with the docs, I need to take another look there and see if
> I can make them more readable but otherwise I don't think there are any open issues
> with this.
Attached is a rebased version which has rewritten docs which I think are more
in line with the pg_dump documentation. I've also added tests for
--strict-names operation, as well subjected it to pgindent and pgperltidy.
Unless there are objections, I think this is pretty much ready to go in.
I am sending a rebased version of the patch pg_dump-filteropt-20211005.patch with fixed regression tests and fixed documentation (reported by Erik).
I found another issue - the StringInfo line buffer used in filter_get_pattern was released too early - the memory was used later in the check for unexpected
chars after the pattern string. I fixed it by moving this StringInfo buffer into the fstate structure. There it can be shared by all routines, and it can be safely
released at the end of filter processing, where we are sure the data is no longer needed.
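Roughly, the state then looks like this - a sketch with assumed field names, not the patch verbatim:

    /* sketch: parser state owning one line buffer for the whole run */
    typedef struct
    {
        const char *filename;      /* filter file name, for error messages */
        FILE       *fp;            /* open filter file, or stdin */
        int         lineno;        /* current line, for error messages */
        StringInfoData linebuff;   /* shared by all parsing routines; freed
                                    * only once filter processing ends */
    } FilterStateData;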
Regards
Pavel
--
Daniel Gustafsson https://vmware.com/
Attachment
Hi
fresh rebase
Regards
Pavel
Attachment
On 2022-04-25 Mo 13:39, Pavel Stehule wrote:
> Hi
>
> fresh rebase

If we're going to do this for pg_dump's include/exclude options, shouldn't we
also provide an equivalent facility for pg_dumpall's --exclude-database
option?

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
st 13. 7. 2022 v 22:49 odesílatel Andrew Dunstan <andrew@dunslane.net> napsal:
On 2022-04-25 Mo 13:39, Pavel Stehule wrote:
> Hi
>
> fresh rebase
>
>
If we're going to do this for pg_dump's include/exclude options,
shouldn't we also provide an equivalent facility for pg_dumpall's
--exclude-database option?
That makes sense.
Regards
Pavel
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Hi
čt 14. 7. 2022 v 6:54 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
st 13. 7. 2022 v 22:49 odesílatel Andrew Dunstan <andrew@dunslane.net> napsal:
On 2022-04-25 Mo 13:39, Pavel Stehule wrote:
> Hi
>
> fresh rebase
>
>
If we're going to do this for pg_dump's include/exclude options,
shouldn't we also provide an equivalent facility for pg_dumpall's
--exclude-database option?

That makes sense.
The attached patch implements the --filter option for pg_dumpall and for pg_restore too.
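For illustration only, a hypothetical pg_dumpall filter file under this patch's syntax - the database patterns here are invented examples:

    # mirror of --exclude-database, one pattern per line
    exclude database test_db_*
    exclude database template?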
Regards
Pavel
Attachment
ne 17. 7. 2022 v 16:01 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
Thanks for updating the patch.
This failed to build on windows.
http://cfbot.cputube.org/pavel-stehule.html
Yes, there was a significant problem with the exit_nicely function, which is implemented differently in pg_dump and pg_dumpall.
Some more comments inline.
On Sun, Jul 17, 2022 at 08:20:47AM +0200, Pavel Stehule wrote:
> The attached patch implements the --filter option for pg_dumpall and for
> pg_restore too.
> diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
> index 5efb442b44..ba2920dbee 100644
> --- a/doc/src/sgml/ref/pg_dump.sgml
> +++ b/doc/src/sgml/ref/pg_dump.sgml
> @@ -779,6 +779,80 @@ PostgreSQL documentation
> </listitem>
> </varlistentry>
>
> + <varlistentry>
> + <term><option>--filter=<replaceable class="parameter">filename</replaceable></option></term>
> + <listitem>
> + <para>
> + Specify a filename from which to read patterns for objects to include
> + or exclude from the dump. The patterns are interpreted according to the
> + same rules as the corresponding options:
> + <option>-t</option>/<option>--table</option> for tables,
> + <option>-n</option>/<option>--schema</option> for schemas,
> + <option>--include-foreign-data</option> for data on foreign servers and
> + <option>--exclude-table-data</option> for table data.
> + To read from <literal>STDIN</literal> use <filename>-</filename> as the
STDIN comma
fixed
> + <para>
> + Lines starting with <literal>#</literal> are considered comments and
> + are ignored. Comments can be placed after filter as well. Blank lines
change "are ignored" to "ignored", I think.
changed
> diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml
> index 8a081f0080..137491340c 100644
> --- a/doc/src/sgml/ref/pg_dumpall.sgml
> +++ b/doc/src/sgml/ref/pg_dumpall.sgml
> @@ -122,6 +122,29 @@ PostgreSQL documentation
> </listitem>
> </varlistentry>
>
> + <varlistentry>
> + <term><option>--filter=<replaceable class="parameter">filename</replaceable></option></term>
> + <listitem>
> + <para>
> + Specify a filename from which to read patterns for databases excluded
> + from dump. The patterns are interpretted according to the same rules
> + like <option>--exclude-database</option>.
same rules *as*
fixed
> + To read from <literal>STDIN</literal> use <filename>-</filename> as the
comma
fixed
> + filename. The <option>--filter</option> option can be specified in
> + conjunction with the above listed options for including or excluding
For dumpall, remove "for including or"
change "above listed options" to "exclude-database" ?
fixed
> diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
> index 526986eadb..5f16c4a333 100644
> --- a/doc/src/sgml/ref/pg_restore.sgml
> +++ b/doc/src/sgml/ref/pg_restore.sgml
> @@ -188,6 +188,31 @@ PostgreSQL documentation
> </listitem>
> </varlistentry>
>
> + <varlistentry>
> + <term><option>--filter=<replaceable class="parameter">filename</replaceable></option></term>
> + <listitem>
> + <para>
> + Specify a filename from which to read patterns for objects excluded
> + or included from restore. The patterns are interpretted according to the
> + same rules like <option>--schema</option>, <option>--exclude-schema</option>,
s/like/as/
changed
> + <option>--function</option>, <option>--index</option>, <option>--table</option>
> + or <option>--trigger</option>.
> + To read from <literal>STDIN</literal> use <filename>-</filename> as the
STDIN comma
fixed
> +/*
> + * filter_get_keyword - read the next filter keyword from buffer
> + *
> + * Search for keywords (limited to containing ascii alphabetic characters) in
remove "containing"
fixed
> + /*
> + * If the object name pattern has been quoted we must take care parse out
> + * the entire quoted pattern, which may contain whitespace and can span
> + * over many lines.
quoted comma
*to parse
remove "over"
fixed
> + * The pattern is either simple without any whitespace, or properly quoted
double space
fixed
> + * in case there is whitespace in the object name. The pattern handling follows
s/is/may be/
fixed
> + if (size == 7 && pg_strncasecmp(keyword, "include", 7) == 0)
> + *is_include = true;
> + else if (size == 7 && pg_strncasecmp(keyword, "exclude", 7) == 0)
> + *is_include = false;
Can't you write strncasecmp(keyword, "include", size) to avoid hardcoding "7" ?
I need to compare the size of the keyword with the expected size, but I can use strlen(conststr). I wrote a new macro, is_keyword_str, to fix this issue
fixed
> +
> + if (size == 4 && pg_strncasecmp(keyword, "data", 4) == 0)
> + *objtype = FILTER_OBJECT_TYPE_DATA;
> + else if (size == 8 && pg_strncasecmp(keyword, "database", 8) == 0)
> + *objtype = FILTER_OBJECT_TYPE_DATABASE;
> + else if (size == 12 && pg_strncasecmp(keyword, "foreign_data", 12) == 0)
> + *objtype = FILTER_OBJECT_TYPE_FOREIGN_DATA;
> + else if (size == 8 && pg_strncasecmp(keyword, "function", 8) == 0)
> + *objtype = FILTER_OBJECT_TYPE_FUNCTION;
> + else if (size == 5 && pg_strncasecmp(keyword, "index", 5) == 0)
> + *objtype = FILTER_OBJECT_TYPE_INDEX;
> + else if (size == 6 && pg_strncasecmp(keyword, "schema", 6) == 0)
> + *objtype = FILTER_OBJECT_TYPE_SCHEMA;
> + else if (size == 5 && pg_strncasecmp(keyword, "table", 5) == 0)
> + *objtype = FILTER_OBJECT_TYPE_TABLE;
> + else if (size == 7 && pg_strncasecmp(keyword, "trigger", 7) == 0)
> + *objtype = FILTER_OBJECT_TYPE_TRIGGER;
Avoid hardcoding these constants.
fixed
> diff --git a/src/bin/pg_dump/filter.h b/src/bin/pg_dump/filter.h
> new file mode 100644
> index 0000000000..e4a1a74b10
> --- /dev/null
> +++ b/src/bin/pg_dump/filter.h
...
> \ No newline at end of file
fixed
updated patch attached
Regards
Pavel
:(
Attachment
Hi
I am sending a fresh rebase plus enhanced tests for pg_dumpall and pg_restore
Regards
Pavel
Attachment
As noted upthread at some point, I'm not overly excited about the parser in
filter.c, for maintainability and readability reasons. So, I've reimplemented
the parser in Flex/Bison in the attached patch, which IMHO provides a clear(er)
picture of the grammar and is more per project standards. This version of the
patch is your latest version with just the parser replaced (at a reduction in
size as a side benefit).

All features supported in your latest patch version are present, and it passes
all the tests added by this patch. It's been an undisclosed amount of years
since I wrote a Bison parser (well, yacc really) from scratch so I don't rule
out having made silly mistakes. I would very much appreciate review from those
more well versed in this area.

One thing this patch version currently lacks is refined error messaging, but if
we feel that this approach is a viable path then that can be tweaked. The
function which starts the parser can also be refactored to be shared across
pg_dump, pg_dumpall and pg_restore but I've kept it simple for now.

Thoughts? It would be nice to get this patch across the finish line during this
commitfest.

--
Daniel Gustafsson https://vmware.com/
Attachment
Op 07-09-2022 om 21:45 schreef Daniel Gustafsson:
>
> One thing this patch version currently lacks is refined error messaging, but if
> we feel that this approach is a viable path then that can be tweaked. The
> function which starts the parser can also be refactored to be shared across
> pg_dump, pg_dumpall and pg_restore but I've kept it simple for now.
>
> Thoughts? It would be nice to get this patch across the finish line during this
> commitfest.

[0001-Add-include-exclude-filtering-via-file-in-pg_dump.patch]

This seems to dump & restore well (as Pavel's patch does).

I did notice one peculiarity (in your patch) where for each table a few
spaces are emitted by pg_dump.

-------------
#! /bin/bash
psql -qXc "drop database if exists testdb2"
psql -qXc "create database testdb2"

echo "
create schema if not exists test;
create table table0 (id integer);
create table table1 (id integer);
insert into table0 select n from generate_series(1,2) as f(n);
insert into table1 select n from generate_series(1,2) as f(n);
" | psql -qXad testdb2

echo "include table table0" > inputfile1.txt

echo "include table table0
include table table1" > inputfile2.txt

# 1 table, emits 2 spaces
echo -ne ">"
pg_dump -F p -f plainfile1 --filter=inputfile1.txt -d testdb2
echo "<"

# 2 tables, emits 4 spaces
echo -ne ">"
pg_dump -F p -f plainfile2 --filter=inputfile2.txt -d testdb2
echo "<"

# dump without filter emits no spaces
echo -ne ">"
pg_dump -F c -f plainfile3 -t table0 -t table1 -d testdb2
echo "<"
-------------

It's probably a small thing -- but I didn't find it.

thanks,
Erik Rijkers
> On 8 Sep 2022, at 12:00, Erik Rijkers <er@xs4all.nl> wrote:
>
> Op 07-09-2022 om 21:45 schreef Daniel Gustafsson:
>> One thing this patch version currently lacks is refined error messaging, but if
>> we feel that this approach is a viable path then that can be tweaked.
>
> [0001-Add-include-exclude-filtering-via-file-in-pg_dump.patch]
>
> This seems to dump & restore well (as Pavel's patch does).

Thanks for looking!

> I did notice one peculiarity (in your patch) where for each table a few
> spaces are emitted by pg_dump.

Right, I had that on my TODO to fix before submitting but clearly forgot. It
boils down to consuming the space between commands and object types and object
patterns. The attached v2 fixes that.

--
Daniel Gustafsson https://vmware.com/
Attachment
Hi,

On Thu, Sep 08, 2022 at 01:38:42PM +0200, Daniel Gustafsson wrote:
> > On 8 Sep 2022, at 12:00, Erik Rijkers <er@xs4all.nl> wrote:
> >
> > Op 07-09-2022 om 21:45 schreef Daniel Gustafsson:
> >> One thing this patch version currently lacks is refined error messaging, but if
> >> we feel that this approach is a viable path then that can be tweaked. The
> >> function which starts the parser can also be refactored to be shared across
> >> pg_dump, pg_dumpall and pg_restore but I've kept it simple for now.
> >> Thoughts? It would be nice to get this patch across the finish line during this
> >> commitfest.
> >
> > [0001-Add-include-exclude-filtering-via-file-in-pg_dump.patch]
> >
> > This seems to dump & restore well (as Pavels patch does).
>
> Thanks for looking!
>
> > I did notice one peculiarity (in your patch) where for each table a few spaces are emitted by pg_dump.
>
> Right, I had that on my TODO to fix before submitting but clearly forgot. It
> boils down to consuming the space between commands and object types and object
> patterns. The attached v2 fixes that.

I only had a quick look at the parser, and one thing that strikes me is:

+Patterns:
+    /* EMPTY */
+    | Patterns Pattern
+    | Pattern
+    ;
+
+Pattern:
+    C_INCLUDE include_object pattern    { include_item(priv, $2, $3); }

It seems confusing to mix Pattern(s) (the rules) and pattern (the token). Maybe instead using Include(s) or Item(s) on the bison side, and/or name_pattern on the lexer side?
> On 8 Sep 2022, at 13:44, Julien Rouhaud <rjuju123@gmail.com> wrote:
> On Thu, Sep 08, 2022 at 01:38:42PM +0200, Daniel Gustafsson wrote:
>>> On 8 Sep 2022, at 12:00, Erik Rijkers <er@xs4all.nl> wrote:
>>> I did notice one peculiarity (in your patch) where for each table a few spaces are emitted by pg_dump.
>>
>> Right, I had that on my TODO to fix before submitting but clearly forgot. It
>> boils down to consuming the space between commands and object types and object
>> patterns. The attached v2 fixes that.
>
> I only had a quick look at the parser,

Thanks for looking!

> .. and one thing that strikes me is:
>
> +Patterns:
> +    /* EMPTY */
> +    | Patterns Pattern
> +    | Pattern
> +    ;
> +
> +Pattern:
> +    C_INCLUDE include_object pattern    { include_item(priv, $2, $3); }
>
> It seems confusing to mix Pattern(s) (the rules) and pattern (the token).
> Maybe instead using Include(s) or Item(s) on the bison side, and/or
> name_pattern on the lexer side?

That makes a lot of sense, I renamed the rules in the parser but kept them in the lexer since that seemed like the clearest scheme. Also in the attached is a small refactoring to share parser init between pg_dump and pg_restore (pg_dumpall shares little with these so not there for now), buffer resize overflow calculation and some error message tweaking.

--
Daniel Gustafsson https://vmware.com/
Attachment
Hi
st 7. 9. 2022 v 21:46 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
As noted upthread at some point, I'm not overly excited about the parser in
filter.c, for maintainability and readability reasons. So, I've reimplemented
the parser in Flex/Bison in the attached patch, which IMHO provides a clear(er)
picture of the grammar and is more per project standards. This version of the
patch is your latest version with just the parser replaced (at a reduction in
size as a side benefit).
All features supported in your latest patch version are present, and it passes
all the tests added by this patch. It's been an undisclosed amount of years
since I wrote a Bison parser (well, yacc really) from scratch so I don't rule
out having made silly mistakes. I would very much appreciate review from those
more well versed in this area.
One thing this patch version currently lacks is refined error messaging, but if
we feel that this approach is a viable path then that can be tweaked. The
function which starts the parser can also be refactored to be shared across
pg_dump, pg_dumpall and pg_restore but I've kept it simple for now.
Thoughts? It would be nice to get this patch across the finish line during this
commitfest.
I have no objections to this, and thank you for trying to move this patch forward.
Regards
Pavel
--
Daniel Gustafsson https://vmware.com/
On Thu, Sep 8, 2022 at 7:32 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> [v3]

Note that the grammar has shift-reduce conflicts. If you run a fairly recent Bison, you can show them like this:

bison -Wno-deprecated -Wcounterexamples -d -o filterparse.c filterparse.y

filterparse.y: warning: 2 shift/reduce conflicts [-Wconflicts-sr]
filterparse.y: warning: shift/reduce conflict on token C_INCLUDE [-Wcounterexamples]
  Example: • C_INCLUDE include_object pattern
  Shift derivation
    Filters
    ↳ 3: Filter
         ↳ 4: • C_INCLUDE include_object pattern
  Reduce derivation
    Filters
    ↳ 2: Filters    Filter
         ↳ 1: ε •   ↳ 4: C_INCLUDE include_object pattern
filterparse.y: warning: shift/reduce conflict on token C_EXCLUDE [-Wcounterexamples]
  Example: • C_EXCLUDE exclude_object pattern
  Shift derivation
    Filters
    ↳ 3: Filter
         ↳ 5: • C_EXCLUDE exclude_object pattern
  Reduce derivation
    Filters
    ↳ 2: Filters    Filter
         ↳ 1: ε •   ↳ 5: C_EXCLUDE exclude_object pattern

--
John Naylor
EDB: http://www.enterprisedb.com
> On Sep 9, 2022, at 5:53 PM, John Naylor <john.naylor@enterprisedb.com> wrote:
>
> On Thu, Sep 8, 2022 at 7:32 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>> [v3]
>
> Note that the grammar has shift-reduce conflicts. If you run a fairly
> recent Bison, you can show them like this:
>
> bison -Wno-deprecated -Wcounterexamples -d -o filterparse.c filterparse.y
>
> filterparse.y: warning: 2 shift/reduce conflicts [-Wconflicts-sr]
> filterparse.y: warning: shift/reduce conflict on token C_INCLUDE [-Wcounterexamples]
>   Example: • C_INCLUDE include_object pattern
>   Shift derivation
>     Filters
>     ↳ 3: Filter
>          ↳ 4: • C_INCLUDE include_object pattern
>   Reduce derivation
>     Filters
>     ↳ 2: Filters    Filter
>          ↳ 1: ε •   ↳ 4: C_INCLUDE include_object pattern
> filterparse.y: warning: shift/reduce conflict on token C_EXCLUDE [-Wcounterexamples]
>   Example: • C_EXCLUDE exclude_object pattern
>   Shift derivation
>     Filters
>     ↳ 3: Filter
>          ↳ 5: • C_EXCLUDE exclude_object pattern
>   Reduce derivation
>     Filters
>     ↳ 2: Filters    Filter
>          ↳ 1: ε •   ↳ 5: C_EXCLUDE exclude_object pattern

Looks like the last rule for Filters should not be there. I do wonder whether we should be using bison/flex here, seems like using a sledgehammer to crack a nut.

Cheers

Andrew
> On 9 Sep 2022, at 11:00, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>> On Sep 9, 2022, at 5:53 PM, John Naylor <john.naylor@enterprisedb.com> wrote:
>>
>> Note that the grammar has shift-reduce conflicts.

> Looks like the last rule for Filters should not be there.

Correct, fixed in the attached.

> I do wonder whether we should be using bison/flex here, seems like using a
> sledgehammer to crack a nut.

I don't think the capabilities of the tool are all that interesting compared to the long term maintainability and readability of the source code. Personally I think a simple Bison/Flex parser is easier to read and reason about than the corresponding one written in C.

--
Daniel Gustafsson https://vmware.com/
Attachment
po 12. 9. 2022 v 9:59 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
> On 9 Sep 2022, at 11:00, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>> On Sep 9, 2022, at 5:53 PM, John Naylor <john.naylor@enterprisedb.com> wrote:
>>
>> Note that the grammar has shift-reduce conflicts.
> Looks like the last rule for Filters should not be there.
Correct, fixed in the attached.
> I do wonder whether we should be using bison/flex here, seems like using a
> sledgehammer to crack a nut.
I don't think the capabilities of the tool are all that interesting compared to the
long term maintainability and readability of the source code. Personally I
think a simple Bison/Flex parser is easier to read and reason about than the
corresponding one written in C.
Now that this work is done, there is no reason to throw it away. The Bison/Flex parser does the same work and, it is true, the code is more readable. Although for this case, a hand-written parser was trivial too.
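To illustrate what "trivial" means here, the core of a hand-written version is little more than this (a simplified sketch, not the exact code from my patch; pg_strncasecmp comes from src/port):

#include <ctype.h>
#include <stdbool.h>
#include <string.h>

extern int  pg_strncasecmp(const char *s1, const char *s2, size_t n);

/*
 * Sketch: recognize the "include"/"exclude" keyword at the start of a
 * filter line, advancing *line past it. Returns false for comments,
 * empty lines and anything unrecognized.
 */
static bool
parse_command(const char **line, bool *is_include)
{
    const char *ptr = *line;

    while (isspace((unsigned char) *ptr))
        ptr++;

    if (pg_strncasecmp(ptr, "include", 7) == 0 &&
        !isalpha((unsigned char) ptr[7]))
    {
        *is_include = true;
        *line = ptr + 7;
        return true;
    }
    if (pg_strncasecmp(ptr, "exclude", 7) == 0 &&
        !isalpha((unsigned char) ptr[7]))
    {
        *is_include = false;
        *line = ptr + 7;
        return true;
    }
    return false;
}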
Regards
Pavel
--
Daniel Gustafsson https://vmware.com/
Op 12-09-2022 om 09:58 schreef Daniel Gustafsson:
>> On 9 Sep 2022, at 11:00, Andrew Dunstan <andrew@dunslane.net> wrote:
>>
>>> On Sep 9, 2022, at 5:53 PM, John Naylor <john.naylor@enterprisedb.com> wrote:
>>> [v4-0001-Add-include-exclude-filtering-via-file-in-pg_dump.patch]

I noticed that pg_restore --filter cannot, or at least not always, be used with the same filter-file that was used to produce a dump with pg_dump --filter.

Is that as designed? It seems a bit counterintuitive. It'd be nice if that could be fixed. Admittedly, the 'same' problem exists in pg_restore -t, which is also less than ideal.

(A messy bash demo below)

thanks,

Erik Rijkers

#! /bin/bash
db2='testdb2'
db3='testdb3'
db2='testdb_source'
db3='testdb_target'

sql_dropdb="drop database if exists $db2; drop database if exists $db3;"
sql_createdb="create database $db2; create database $db3;"

schema1=s1
table1=table1
t1=$schema1.$table1
schema2=s2
table2=table2
t2=$schema2.$table2

sql_schema_init="create schema if not exists $schema1; create schema if not exists $schema2;"
sql_test="select '$t1', n from $t1 order by n; select '$t2', n from $t2 order by n;"

function sqltest()
{
  for database_name in $db2 $db3; do
    port_used=$( echo "show port" | psql -qtAX -d $database_name )
    echo -n "-- $database_name ($port_used): "
    echo "$sql_test" | psql -qtAX -a -d $database_name | md5sum
  done
  echo
}

echo "setting up orig db $db2, target db $db3"
echo "$sql_dropdb"   | psql -qtAX
echo "$sql_createdb" | psql -qtAX

psql -X -d $db2 << SQL
$sql_schema_init
create table $t1 as select n from generate_series(1, (10^1)::int) as f(n);
create table $t2 as select n from generate_series(2, (10^2)::int) as f(n);
SQL

echo "
include table $t1
include table $t2
# include schema $s1
# include schema $s2
" > inputfile1.txt

# in filter; out plain
echo "-- pg_dump -F p -f plainfile1 --filter=inputfile1.txt -d $db2"
pg_dump -F p -f plainfile1 --filter=inputfile1.txt -d $db2

echo "$sql_schema_init" | psql -qX -d $db3

echo "-- pg_restore -d $db3 dumpfile1"
pg_restore -d $db3 dumpfile1
rc=$?
echo "-- pg_restore returned [$rc] -- pg_restore without --filter"
sqltest

# enable this to see it fail
if [[ 1 -eq 1 ]]
then
  # clean out
  echo "drop schema $schema1 cascade; drop schema $schema2 cascade;" | psql -qtAXad $db3

  echo "$sql_schema_init" | psql -qX -d $db3

  echo "-- pg_restore -d $db3 --filter=inputfile1.txt dumpfile1"
  pg_restore -d $db3 --filter=inputfile1.txt dumpfile1
  rc=$?
  echo "-- pg_restore returned [$rc] -- pg_restore with --filter"
  sqltest
fi
Op 12-09-2022 om 16:00 schreef Erik Rijkers:
> Op 12-09-2022 om 09:58 schreef Daniel Gustafsson:
>>> On 9 Sep 2022, at 11:00, Andrew Dunstan <andrew@dunslane.net> wrote:
>>>
>>>> On Sep 9, 2022, at 5:53 PM, John Naylor
>>>> <john.naylor@enterprisedb.com> wrote:
>
>>>> [v4-0001-Add-include-exclude-filtering-via-file-in-pg_dump.patch]
>
> I noticed that pg_restore --filter cannot, or at least not always, be
> used with the same filter-file that was used to produce a dump with
> pg_dump --filter.
>
> Is that as designed? It seems a bit counterintuitive. It'd be nice if
> that could be fixed. Admittedly, the 'same' problem exists in pg_restore -t,
> which is also less than ideal.
>
> (A messy bash demo below)

I hope the issue is still clear, even though in the bash I sent I messed up the dumpfile name (i.e., pg_dump creates a different dump name than what is given to pg_restore; they should use the same dump name, obviously).

>
> thanks,
>
> Erik Rijkers
On Mon, Sep 12, 2022 at 8:10 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> po 12. 9. 2022 v 9:59 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
>> I don't think the capabilities of the tool are all that interesting compared to the
>> long term maintainability and readability of the source code.

With make distprep and maintainer-clean, separate makefile and MSVC build logic a short time before converting to Meson, I'm not sure that even the short term maintainability here is a good trade off for what we're getting.

> The Bison/Flex parser does the same work and, it is true, the code is more readable. Although for this case, a hand-written parser was trivial too.

If the hand-written version is trivial, then we should prefer it.

--
John Naylor
EDB: http://www.enterprisedb.com
út 13. 9. 2022 v 10:46 odesílatel John Naylor <john.naylor@enterprisedb.com> napsal:
On Mon, Sep 12, 2022 at 8:10 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> po 12. 9. 2022 v 9:59 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
>> I don't the capabilities of the tool is all that interesting compared to the
>> long term maintainability and readability of the source code.
With make distprep and maintainer-clean, separate makefile and MSVC
build logic a short time before converting to Meson, I'm not sure that
even the short term maintainability here is a good trade off for what
we're getting.
> The Bison/Flex parser does the same work and, it is true, the code is more readable. Although for this case, a hand-written parser was trivial too.
If the hand-written version is trivial, then we should prefer it.
Please, can you check and compare both versions? My view is subjective.
Regards
Pavel
--
John Naylor
EDB: http://www.enterprisedb.com
Hi, On 2022-09-12 09:58:37 +0200, Daniel Gustafsson wrote: > > On 9 Sep 2022, at 11:00, Andrew Dunstan <andrew@dunslane.net> wrote: > > > >> On Sep 9, 2022, at 5:53 PM, John Naylor <john.naylor@enterprisedb.com> wrote: > >> > >> Note that the grammar has shift-reduce conflicts. > > > Looks like the last rule for Filters should not be there. > > Correct, fixed in the attached. Due to the merge of the meson build, this patch now needs to adjust the relevant meson.build. This is the cause of the failures at: https://cirrus-ci.com/build/5788292678418432 See e.g. src/bin/pgbench/meson.build Greetings, Andres Freund
Hi, On 2022-09-12 09:58:37 +0200, Daniel Gustafsson wrote: > Correct, fixed in the attached. Updated patch adding meson compatibility attached. Greetings, Andres Freund
Attachment
Hi, On 2022-10-01 23:56:59 -0700, Andres Freund wrote: > On 2022-09-12 09:58:37 +0200, Daniel Gustafsson wrote: > > Correct, fixed in the attached. > > Updated patch adding meson compatibility attached. Err, forgot to amend one hunk :( Greetings, Andres Freund
Attachment
Hi,

On 2022-10-02 00:19:59 -0700, Andres Freund wrote:
> On 2022-10-01 23:56:59 -0700, Andres Freund wrote:
> > On 2022-09-12 09:58:37 +0200, Daniel Gustafsson wrote:
> > > Correct, fixed in the attached.
> >
> > Updated patch adding meson compatibility attached.
>
> Err, forgot to amend one hunk :(

That fixed it on all platforms but windows, due to copy-pasto. I really should have stopped earlier yesterday...

> +/*-------------------------------------------------------------------------
> + *
> + * filter.h
> + *    Common header file for the parser of filter file
> + *
> + * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
> + * Portions Copyright (c) 1994, Regents of the University of California
> + *
> + * src/bin/pg_dump/filter.h
> + *
> + *-------------------------------------------------------------------------
> + */
> +#ifndef FILTER_H
> +#define FILTER_H
> +#include "c.h"

c.h (and postgres.h, postgres_fe.h) shouldn't be included in headers.

This is a common enough mistake that I'm wondering if we could automate warning about it somehow.

Greetings,

Andres Freund
Attachment
> On 2 Oct 2022, at 18:04, Andres Freund <andres@anarazel.de> wrote:
> On 2022-10-02 00:19:59 -0700, Andres Freund wrote:
>> On 2022-10-01 23:56:59 -0700, Andres Freund wrote:
>>> On 2022-09-12 09:58:37 +0200, Daniel Gustafsson wrote:
>>>> Correct, fixed in the attached.
>>>
>>> Updated patch adding meson compatibility attached.
>>
>> Err, forgot to amend one hunk :(
>
> That fixed it on all platforms but windows, due to copy-pasto. I really should
> have stopped earlier yesterday...

Thanks for updating the patch!

The parser in the original submission was -1'd by me, and the current version proposed as an alternative. This was subsequently -1'd as well but no updated patch with a rewritten parser has been posted. So this is now stalled again.

Having been around in 12 commitfests without a committer feeling confident about pushing this I plan to mark it returned with feedback, and if a new parser materializes it can be re-added instead of being dragged along.

> c.h (and postgres.h, postgres_fe.h) shouldn't be included in headers.
>
> This is a common enough mistake that I'm wondering if we could automate
> warning about it somehow.

Maybe we can add a simple git grep invocation in the CompilerWarnings CI job to catch this in the CFBot? If something like the below sketch matches then we can throw an error. (Only for illustration; all three files need to be checked.)

git grep "\"c\.h" -- *.h :^src/include/postgres*.h;

--
Daniel Gustafsson https://vmware.com/
Daniel Gustafsson <daniel@yesql.se> writes: > On 2 Oct 2022, at 18:04, Andres Freund <andres@anarazel.de> wrote: >> c.h (and postgres.h, postgres_fe.h) shouldn't be included in headers. >> This is a common enough mistake that I'm wondering if we could automate >> warning about it somehow. > Maybe we can add a simple git grep invocation in the CompilerWarnings CI job to > catch this in the CFBot? I'd be inclined to teach headerscheck or cpluspluscheck about it. regards, tom lane
On 2022-10-02 22:52:33 +0200, Daniel Gustafsson wrote:
> The parser in the original submission was -1'd by me, and the current version
> proposed as an alternative. This was subsequently -1'd as well but no updated
> patch with a rewritten parser has been posted. So this is now stalled again.
>
> Having been around in 12 commitfests without a committer feeling confident
> about pushing this I plan to mark it returned with feedback, and if a new
> parser materializes it can be re-added instead of being dragged along.

Makes sense to me.
ne 2. 10. 2022 v 22:52 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
> On 2 Oct 2022, at 18:04, Andres Freund <andres@anarazel.de> wrote:
> On 2022-10-02 00:19:59 -0700, Andres Freund wrote:
>> On 2022-10-01 23:56:59 -0700, Andres Freund wrote:
>>> On 2022-09-12 09:58:37 +0200, Daniel Gustafsson wrote:
>>>> Correct, fixed in the attached.
>>>
>>> Updated patch adding meson compatibility attached.
>>
>> Err, forgot to amend one hunk :(
>
> That fixed it on all platforms but windows, due to copy-pasto. I really should
> have stopped earlier yesterday...
Thanks for updating the patch!
The parser in the original submission was -1'd by me, and the current version
proposed as an alternative. This was subsequently -1'd as well but no updated
patch with a rewritten parser has been posted. So this is now stalled again.
You started rewriting it, but you didn't finish it.
Unfortunately, there is no clear consensus on using a Bison parser for this purpose. I understand that the complexity of this language is low, so the benefit of using a Bison grammar is low too. Personally, I have no problem using Bison for this purpose. For this case, I think we are comparing two similarly long ways, but unfortunately, customers that have a problem with long command lines still have this problem.
Can we go forward? Daniel is strongly against a hand-written parser. Is there somebody strongly against a Bison-based parser? There is no other way.
I am able to complete Daniel's patch, if there are no objections.
Regards
Pavel
Having been around in 12 commitfests without a committer feeling confident
about pushing this I plan to mark it returned with feedback, and if a new
parser materializes it can be re-added instead of being dragged along.
> c.h (and postgres.h, postgres_fe.h) shouldn't be included in headers.
>
> This is a common enough mistake that I'm wondering if we could automate
> warning about it somehow.
Maybe we can add a simple git grep invocation in the CompilerWarnings CI job to
catch this in the CFBot? If something like the below sketch matches then we
can throw an error. (only for illustration, all three files needs to checked).
git grep "\"c\.h" -- *.h :^src/include/postgres*.h;
--
Daniel Gustafsson https://vmware.com/
Hi,

On Mon, Oct 03, 2022 at 06:00:12AM +0200, Pavel Stehule wrote:
> ne 2. 10. 2022 v 22:52 odesílatel Daniel Gustafsson <daniel@yesql.se>
> napsal:
>
> > > On 2 Oct 2022, at 18:04, Andres Freund <andres@anarazel.de> wrote:
> > > On 2022-10-02 00:19:59 -0700, Andres Freund wrote:
> > >> On 2022-10-01 23:56:59 -0700, Andres Freund wrote:
> > >>> On 2022-09-12 09:58:37 +0200, Daniel Gustafsson wrote:
> > >>>> Correct, fixed in the attached.
> > >>>
> > >>> Updated patch adding meson compatibility attached.
> > >>
> > >> Err, forgot to amend one hunk :(
> > >
> > > That fixed it on all platforms but windows, due to copy-pasto. I really
> > > should have stopped earlier yesterday...
> >
> > Thanks for updating the patch!
> >
> > The parser in the original submission was -1'd by me, and the current
> > version proposed as an alternative. This was subsequently -1'd as well
> > but no updated patch with a rewritten parser has been posted. So this
> > is now stalled again.
>
> You started rewriting it, but you didn't finish it.
>
> Unfortunately, there is no clear consensus on using a Bison parser for
> this purpose. I understand that the complexity of this language is low,
> so the benefit of using a Bison grammar is low too. Personally, I have
> no problem using Bison for this purpose. For this case, I think we are
> comparing two similarly long ways, but unfortunately, customers that
> have a problem with long command lines still have this problem.
>
> Can we go forward? Daniel is strongly against a hand-written parser. Is
> there somebody strongly against a Bison-based parser? There is no other
> way.

I don't have a strong opinion either, but it seems that 2 people argued against a bison parser (vs only 1 arguing for), and given that the current habit is to rely on hand-written parsers for simple cases (e.g. jsonapi.c / pg_parse_json()), it seems that we should go back to Pavel's original parser.

I only had a quick look but it indeed seems trivial, it just maybe needs a bit of refactoring to avoid some code duplication (getFiltersFromFile is duplicated, and getDatabaseExcludeFiltersFromFile could be removed if getFiltersFromFile knew about the 2 patterns).
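For instance something with this shape (only a sketch, all names here are invented):

#include <stdbool.h>

/* invented for this sketch */
typedef enum
{
    FILTER_TABLE,
    FILTER_SCHEMA,
    FILTER_DATABASE
} FilterObjectType;

typedef void (*filter_item_handler) (bool is_include,
                                     FilterObjectType objtype,
                                     const char *pattern,
                                     void *arg);

static void
read_filters_from_file(const char *filename,
                       filter_item_handler handler,
                       void *arg)
{
    /*
     * open the file, parse it line by line, and hand every
     * (is_include, objtype, pattern) triplet to handler(..., arg);
     * each caller then just appends to its own lists
     */
}

Each of the three binaries would then only provide a small handler that appends to its own option lists.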
Hi
I am sending a version with a hand-written parser and meson support.
po 3. 10. 2022 v 6:34 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
Hi,
> You started rewriting it, but you didn't finish it.
>
> Unfortunately, there is no clear consensus on using a Bison parser for
> this purpose. I understand that the complexity of this language is low,
> so the benefit of using a Bison grammar is low too. Personally, I have
> no problem using Bison for this purpose. For this case, I think we are
> comparing two similarly long ways, but unfortunately, customers that
> have a problem with long command lines still have this problem.
>
> Can we go forward? Daniel is strongly against a hand-written parser. Is there
> somebody strongly against a Bison-based parser? There is no other way.
I don't have a strong opinion either, but it seems that 2 people argued against
a bison parser (vs only 1 arguing for) and the fact that the current habit is
to rely on hand written parsers for simple cases (e.g. jsonapi.c /
pg_parse_json()), it seems that we should go back to Pavel's original parser.
I only had a quick look but it indeed seems trivial, it just maybe need a bit
of refactoring to avoid some code duplication (getFiltersFromFile is
duplicated, and getDatabaseExcludeFiltersFromFile could be removed if
getFiltersFromFile knew about the 2 patterns).
I checked this code again, and I don't think such refactoring is easy. getFiltersFromFile is not duplicated; it is just probably badly named.
These routines are used from pg_dump, pg_dumpall and pg_restore. There are significant differences in the supported objects and in the types used for the returned lists (DumpOptions, SimpleStringList, and RestoreOptions). If I had one routine, then I would need to implement some mechanism for specifying the supported objects, and a special type that could be used as a proxy between caller and parser to hold the lists of parsed values. To make the names less confusing I renamed them to read_dump_filters, read_dumpall_filters and read_restore_filters.
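The three wrappers end up with roughly these signatures (a sketch; the exact parameter lists in the patch may differ):

/* pg_dump: fills the include/exclude lists in DumpOptions */
static void read_dump_filters(const char *filename, DumpOptions *dopt);

/* pg_dumpall: only database include/exclude patterns make sense here */
static void read_dumpall_filters(const char *filename,
                                 SimpleStringList *pattern);

/* pg_restore: fills the lists in RestoreOptions */
static void read_restore_filters(const char *filename, RestoreOptions *opts);

Each wrapper owns the knowledge of which object types its application supports and where the parsed patterns go.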
Regards
Pavel
Attachment
On Fri, Oct 07, 2022 at 07:26:08AM +0200, Pavel Stehule wrote: > > I checked this code again, and I don't think some refactoring is easy. > getFiltersFromFile is not duplicated. It is just probably badly named. > > These routines are used from pg_dump, pg_dumpall and pg_restore. There are > significant differences in supported objects and in types used for returned > lists (dumpOptions, SimpleStringList, and RestoreOptions). If I have one > routine, then I need to implement some mechanism for specification of > supported objects, and a special type that can be used as a proxy between > caller and parser to hold lists of parsed values. To be names less > confusing I renamed them to read_dump_filters, read_dumpall_filters and > read_restore_filters Ah right, I missed the different argument types. Now that the functions have improved names it looks way clearer, and it seems just fine!
pá 7. 10. 2022 v 16:03 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
On Fri, Oct 07, 2022 at 07:26:08AM +0200, Pavel Stehule wrote:
>
> I checked this code again, and I don't think some refactoring is easy.
> getFiltersFromFile is not duplicated. It is just probably badly named.
>
> These routines are used from pg_dump, pg_dumpall and pg_restore. There are
> significant differences in supported objects and in types used for returned
> lists (dumpOptions, SimpleStringList, and RestoreOptions). If I have one
> routine, then I need to implement some mechanism for specification of
> supported objects, and a special type that can be used as a proxy between
> caller and parser to hold lists of parsed values. To be names less
> confusing I renamed them to read_dump_filters, read_dumpall_filters and
> read_restore_filters
Ah right, I missed the different argument types. Now that the functions have
improved names it looks way clearer, and it seems just fine!
Thank you for check
Pavel
Hi, On 2022-10-07 07:26:08 +0200, Pavel Stehule wrote: > I am sending version with handy written parser and meson support Given this is a new approach it seems inaccurate to have the CF entry marked ready-for-committer. I've updated it to needs-review. Greetings, Andres Freund
On Thu, Oct 13, 2022 at 11:46:34AM -0700, Andres Freund wrote:
> Hi,
>
> On 2022-10-07 07:26:08 +0200, Pavel Stehule wrote:
> > I am sending a version with a hand-written parser and meson support.
>
> Given this is a new approach it seems inaccurate to have the CF entry marked
> ready-for-committer. I've updated it to needs-review.

I just had a quick look at the rest of the patch.

For the parser, it seems that filter_get_pattern is reimplementing an identifier parsing function but isn't entirely correct. It can correctly parse quoted non-qualified identifiers and non-quoted qualified identifiers, but not quoted and qualified ones. For instance:

$ echo 'include table nsp.tbl' | pg_dump --filter - >/dev/null
$ echo $?
0
$ echo 'include table "TBL"' | pg_dump --filter - >/dev/null
$ echo $?
0
$ echo 'include table "NSP"."TBL"' | pg_dump --filter - >/dev/null
pg_dump: error: invalid format of filter on line 1: unexpected extra data after pattern

This should also be covered in the regression tests.

I'm wondering if psql's parse_identifier() could be exported and reused here rather than creating yet another version.

Nitpicking: the comments need some improvements:

+ /*
+  * Simple routines - just don't repeat same code
+  *
+  * Returns true, when filter's file is opened
+  */
+ bool
+ filter_init(FilterStateData *fstate, const char *filename)

also, is there any reason why this function doesn't call exit_nicely in case of error rather than letting each caller do it without any other cleanup?

+ /*
+  * Release allocated sources for filter
+  */
+ void
+ filter_free_sources(FilterStateData *fstate)

I'm assuming "resources" not "sources"?

+ /*
+  * log_format_error - Emit error message
+  *
+  * This is mostly a convenience routine to avoid duplicating file closing code
+  * in multiple callsites.
+  */
+ void
+ log_invalid_filter_format(FilterStateData *fstate, char *message)

mismatch between comment and function name (same for filter_read_item)

+ static const char *
+ filter_object_type_name(FilterObjectType fot)

No description.

/*
 * Helper routine to reduce duplicated code
 */
void
log_unsupported_filter_object_type(FilterStateData *fstate,
                                   const char *appname,
                                   FilterObjectType fot)

Need more helpful comment.
Hi
út 18. 10. 2022 v 11:33 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
On Thu, Oct 13, 2022 at 11:46:34AM -0700, Andres Freund wrote:
> Hi,
>
> On 2022-10-07 07:26:08 +0200, Pavel Stehule wrote:
> > I am sending version with handy written parser and meson support
>
> Given this is a new approach it seems inaccurate to have the CF entry marked
> ready-for-committer. I've updated it to needs-review.
I just had a quick look at the rest of the patch.
For the parser, it seems that filter_get_pattern is reimplementing an
identifier parsing function but isn't entirely correct. It can correctly parse
quoted non-qualified identifiers and non-quoted qualified identifiers, but not
quoted and qualified ones. For instance:
$ echo 'include table nsp.tbl' | pg_dump --filter - >/dev/null
$ echo $?
0
$ echo 'include table "TBL"' | pg_dump --filter - >/dev/null
$ echo $?
0
$ echo 'include table "NSP"."TBL"' | pg_dump --filter - >/dev/null
pg_dump: error: invalid format of filter on line 1: unexpected extra data after pattern
fixed
This should also be covered in the regression tests.
done
I'm wondering if psql's parse_identifier() could be exported and reused here
rather than creating yet another version.
I looked there, and I don't think this parser is usable for this purpose. It is very sensitive to white space and doesn't support multi-line input. It is designed to support readline tab completion; it is designed for simplicity, not correctness.
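What the filter parser has to handle instead looks roughly like this (a simplified sketch, not the code from the patch; PQExpBuffer is the usual one from pqexpbuffer.h):

#include <stdbool.h>

#include "pqexpbuffer.h"

/*
 * Sketch: copy one double-quoted identifier from *line into buf.
 * A quote inside the identifier is escaped by doubling it ("").
 */
static bool
copy_quoted_identifier(const char **line, PQExpBuffer buf)
{
    const char *ptr = *line;

    if (*ptr++ != '"')
        return false;

    for (;;)
    {
        if (*ptr == '\0')
            return false;       /* here the real code fetches another line */
        if (*ptr == '"')
        {
            if (ptr[1] == '"')
            {
                appendPQExpBufferChar(buf, '"');    /* escaped quote */
                ptr += 2;
                continue;
            }
            ptr++;              /* closing quote */
            break;
        }
        appendPQExpBufferChar(buf, *ptr++);
    }

    *line = ptr;
    return true;
}

The real code additionally reads more lines from the filter file when the closing quote has not been seen yet, which is how multi-line identifiers work.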
Nitpicking: the comments needs some improvements:
+ /*
+ * Simple routines - just don't repeat same code
+ *
+ * Returns true, when filter's file is opened
+ */
+ bool
+ filter_init(FilterStateData *fstate, const char *filename)
done
also, is there any reason why this function doesn't call exit_nicely in case of
error rather than letting each caller do it without any other cleanup?
It is commented a few lines up:
/*
* Following routines are called from pg_dump, pg_dumpall and pg_restore.
* Unfortunatelly, implementation of exit_nicely in pg_dump and pg_restore
* is different from implementation of this rutine in pg_dumpall. So instead
* direct calling exit_nicely we have to return some error flag (in this
* case NULL), and exit_nicelly will be executed from caller's routine.
*/
+ /*
+ * Release allocated sources for filter
+ */
+ void
+ filter_free_sources(FilterStateData *fstate)
I'm assuming "ressources" not "sources"?
changed
+ /*
+ * log_format_error - Emit error message
+ *
+ * This is mostly a convenience routine to avoid duplicating file closing code
+ * in multiple callsites.
+ */
+ void
+ log_invalid_filter_format(FilterStateData *fstate, char *message)
mismatch between comment and function name (same for filter_read_item)
fixed
+ static const char *
+ filter_object_type_name(FilterObjectType fot)
No description.
fixed
/*
* Helper routine to reduce duplicated code
*/
void
log_unsupported_filter_object_type(FilterStateData *fstate,
const char *appname,
FilterObjectType fot)
Need more helpful comment.
fixed
Thank you for comments
attached updated patch
Regards
Pavel
Attachment
Hi,

On Wed, Oct 26, 2022 at 06:26:26AM +0200, Pavel Stehule wrote:
>
> út 18. 10. 2022 v 11:33 odesílatel Julien Rouhaud <rjuju123@gmail.com>
> napsal:
>
> > I'm wondering if psql's parse_identifier() could be exported and reused
> > here rather than creating yet another version.
>
> I looked there, and I don't think this parser is usable for this purpose.
> It is very sensitive to white space and doesn't support multi-line input.
> It is designed to support readline tab completion; it is designed for
> simplicity, not correctness.

Ah, sorry I should have checked more thoroughly. I guess it's ok to have another identifier parser for the include file then, as this new one wouldn't really fit the tab-completion use case.

> > also, is there any reason why this function doesn't call exit_nicely in
> > case of error rather than letting each caller do it without any other
> > cleanup?
>
> It is commented a few lines up:
>
> /*
>  * Following routines are called from pg_dump, pg_dumpall and pg_restore.
>  * Unfortunatelly, implementation of exit_nicely in pg_dump and pg_restore
>  * is different from implementation of this rutine in pg_dumpall. So instead
>  * direct calling exit_nicely we have to return some error flag (in this
>  * case NULL), and exit_nicelly will be executed from caller's routine.
>  */

Oh right, I totally missed it, sorry about that!

About the new version, I didn't find any problem with the feature itself so it's a good thing!

I still have a few comments about the patch. First, about the behavior:

- is that ok to have just "data" pattern instead of "table_data" or something
  like that, since it's supposed to match the --exclude-table-data option?

- the error messages are sometimes not super helpful. For instance:

  $ echo "include data t1" | pg_dump --filter -
  pg_dump: error: invalid format of filter on line 1: include filter is not allowed for this type of object

  It would be nice if the error message mentioned "data" rather than a generic
  "this type of object". Also, maybe we should quote "include" to outline that
  we found this keyword?

About the patch itself:

filter.c:

+#include "postgres_fe.h"
+
+#include "filter.h"
+
+#include "common/logging.h"

the filter.h inclusion should be done with the rest of the includes, in alphabetical order.

+#define is_keyword_str(cstr, str, bytes) \
+	((strlen(cstr) == bytes) && (pg_strncasecmp(cstr, str, bytes) == 0))

nit: our guideline is to protect macro arguments with parentheses. Some arguments can be evaluated multiple times but I don't think it's worth adding a comment for that.

+ * Unfortunatelly, implementation of exit_nicely in pg_dump and pg_restore
+ * is different from implementation of this rutine in pg_dumpall. So instead
+ * direct calling exit_nicely we have to return some error flag (in this

typos: s/Unfortunatelly/Unfortunately/ and s/rutine/routine/
Also, it would probably be better to say "instead of directly calling..."

+static const char *
+filter_object_type_name(FilterObjectType fot)
+{
+	switch (fot)
+	{
+		case FILTER_OBJECT_TYPE_NONE:
+			return "comment or empty line";
+[...]
+	}
+
+	return "unknown object type";
+}

I'm wondering if we should add a pg_unreachable() there, some compilers might complain otherwise. See CreateDestReceiver() for instance for a similar pattern.

+ * Emit error message "invalid format of filter file ..."
+ *
+ * This is mostly a convenience routine to avoid duplicating file closing code
+ * in multiple callsites.
+ */
+void
+log_invalid_filter_format(FilterStateData *fstate, char *message)

nit: invalid format *in* filter file...?

+void
+log_unsupported_filter_object_type(FilterStateData *fstate,
+								   const char *appname,
+								   FilterObjectType fot)
+{
+	PQExpBuffer str = createPQExpBuffer();
+
+	printfPQExpBuffer(str,
+					  "The application \"%s\" doesn't support filter for object type \"%s\".",

nit: there shouldn't be uppercase in error messages, especially since this will be appended to another message by log_invalid_filter_format. I would just drop "The application" entirely for brevity.

+/*
+ * Release allocated resources for filter
+ */
+void
+filter_free(FilterStateData *fstate)

nit: Release allocated resources for *the given* filter?

+ * Search for keywords (limited to ascii alphabetic characters) in
+ * the passed in line buffer. Returns NULL, when the buffer is empty or first
+ * char is not alpha. The length of the found keyword is returned in the size
+ * parameter.
+ */
+static const char *
+filter_get_keyword(const char **line, int *size)
+{
+	[...]
+	if (isascii(*ptr) && isalpha(*ptr))
+	{
+		result = ptr++;
+
+		while (isascii(*ptr) && (isalpha(*ptr) || *ptr == '_'))
+			ptr++;

Is there any reason to test isascii()? isalpha() should already cover that and should be cheaper to test anyway.

Also nit: "Returns NULL when the buffer..." (unnecessary comma), and the '_' char is also allowed.

+filter_read_item(FilterStateData *fstate,
+				 bool *is_include,
+				 char **objname,
+				 FilterObjectType *objtype)
+{
+	Assert(!fstate->is_error);
+
+	if (pg_get_line_buf(fstate->fp, &fstate->linebuff))
+	{
+		char	   *str = fstate->linebuff.data;
+		const char *keyword;
+		int			size;
+
+		fstate->lineno++;
+
+		(void) pg_strip_crlf(str);
+
+		/* Skip initial white spaces */
+		while (isspace(*str))
+			str++;
+[...]
+		keyword = filter_get_keyword((const char **) &str, &size);

Is there any interest with the initial pg_strip_crlf? AFAICT all the rest of the code will ignore such characters using isspace() so it wouldn't change anything.

Dropping both pg_strip_crlf() would allow you to declare str as const rather than doing it in function calls. It would require adding const qualifiers in a few other places, but it seems like an improvement, as for instance right now filter_get_pattern is free to rewrite the str (because it's also calling pg_strip_crlf, but there's no guarantee that it doesn't do anything else).

+/*
+ * filter_get_pattern - Read an object identifier pattern from the buffer
+ *
+ * Parses an object identifier pattern from the passed in buffer and sets
+ * objname to a string with object identifier pattern. Returns pointer to the
+ * first character after the pattern. Returns NULL on error.
+ */
+static char *
+filter_get_pattern(FilterStateData *fstate,

nit: suggestion to reword the comment, maybe something like

/*
 * filter_get_pattern - Identify an object identifier pattern
 *
 * Try to parse an object identifier pattern from the passed buffer. If one is
 * found, it sets objname to a string with the object identifier pattern and
 * returns a pointer to the first byte after the found pattern. Otherwise NULL
 * is returned.
 */

+bool
+filter_read_item(FilterStateData *fstate,

Another suggestion for comment rewrite:

/*-------------------
 * filter_read_item - Read command/type/pattern triplet from a filter file
 *
 * This will parse one filter item from the filter file, and while it is a
 * row based format a pattern may span more than one line due to how object
 * names can be constructed. The expected format of the filter file is:
 *
 * <command> <object_type> <pattern>
 *
 * command can be "include" or "exclude"
 * object_type can be one of: "table", "schema", "foreign_data", "data",
 * "database", "function", "trigger" or "index"
 * pattern can be any possibly-quoted and possibly-qualified identifier. It
 * follows the same rules as other object include and exclude functions so it
 * can also use wildcards.
 *
 * Returns true when one filter item was successfully read and parsed. When
 * object name contains \n chars, then more than one line from input file can
 * be processed. Returns false when the filter file reaches EOF. In case of
 * error, the function will emit an appropriate error message before returning
 * false.
 */

Note also that your original comment said:

+ * In case of
+ * errors, the function wont return but will exit with an appropriate error
+ * message.

But AFAICS that's not the case: it will indeed log an appropriate error message but will return false. I'm assuming that the comment was outdated as the calling code handles it just fine, so I just modified the comment.

filter.h:

+#ifndef FILTER_H
+#define FILTER_H
+#include "c.h"

It's definitely not ok to include c.h in frontend code. But AFAICS just removing it doesn't cause any problem. Note also that there should be an empty line after the #define FILTER_H per usual coding style.
čt 3. 11. 2022 v 5:09 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
Hi,
On Wed, Oct 26, 2022 at 06:26:26AM +0200, Pavel Stehule wrote:
>
> út 18. 10. 2022 v 11:33 odesílatel Julien Rouhaud <rjuju123@gmail.com>
> napsal:
>
> >
> > I'm wondering if psql's parse_identifier() could be exported and reused
> > here
> > rather than creating yet another version.
> >
>
> I looked there, and I don't think this parser is usable for this purpose.
> It is very sensitive to white space and doesn't support multi-line input.
> It is designed to support readline tab completion; it is designed for
> simplicity, not correctness.
Ah, sorry I should have checked more thoroughly. I guess it's ok to have
another identifier parser for the include file then, as this new one wouldn't
really fit the tab-completion use case.
> > also, is there any reason why this function doesn't call exit_nicely in
> > case of
> > error rather than letting each caller do it without any other cleanup?
> >
>
> It is commented few lines up
>
> /*
> * Following routines are called from pg_dump, pg_dumpall and pg_restore.
> * Unfortunatelly, implementation of exit_nicely in pg_dump and pg_restore
> * is different from implementation of this rutine in pg_dumpall. So instead
> * direct calling exit_nicely we have to return some error flag (in this
> * case NULL), and exit_nicelly will be executed from caller's routine.
> */
Oh right, I totally missed it sorry about that!
About the new version, I didn't find any problem with the feature itself so
it's a good thing!
I still have a few comments about the patch. First, about the behavior:
- is that ok to have just "data" pattern instead of "table_data" or something
like that, since it's supposed to match --exclude-table-data option?
done
- the error messages are sometimes not super helpful. For instance:
$ echo "include data t1" | pg_dump --filter -
pg_dump: error: invalid format of filter on line 1: include filter is not allowed for this type of object
It would be nice if the error message mentioned "data" rather than a generic
"this type of object". Also, maybe we should quote "include" to outline that
we found this keyword?
done
About the patch itself:
filter.c:
+#include "postgres_fe.h"
+
+#include "filter.h"
+
+#include "common/logging.h"
the filter.h inclusion should be done with the rest of the includes, in
alphabetical order.
done
+#define is_keyword_str(cstr, str, bytes) \
+ ((strlen(cstr) == bytes) && (pg_strncasecmp(cstr, str, bytes) == 0))
nit: our guideline is to protect macro arguments with parentheses. Some
arguments can be evaluated multiple times but I don't think it's worth adding a
comment for that.
done
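i.e. the macro now reads (a sketch of the parenthesized version):

#define is_keyword_str(cstr, str, bytes) \
	((strlen((cstr)) == (bytes)) && (pg_strncasecmp((cstr), (str), (bytes)) == 0))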
+ * Unfortunatelly, implementation of exit_nicely in pg_dump and pg_restore
+ * is different from implementation of this rutine in pg_dumpall. So instead
+ * direct calling exit_nicely we have to return some error flag (in this
typos: s/Unfortunatelly/Unfortunately/ and s/rutine/routine/
Also, it would probably be better to say "instead of directly calling..."
done
+static const char *
+filter_object_type_name(FilterObjectType fot)
+{
+ switch (fot)
+ {
+ case FILTER_OBJECT_TYPE_NONE:
+ return "comment or empty line";
+[...]
+ }
+
+ return "unknown object type";
+}
I'm wondering if we should add a pg_unreachable() there, some compilers might
complain otherwise. See CreateDestReceiver() for instance for a similar pattern.
done
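It ends up following the CreateDestReceiver() pattern, roughly (a sketch, only the first case shown):

static const char *
filter_object_type_name(FilterObjectType fot)
{
	switch (fot)
	{
		case FILTER_OBJECT_TYPE_NONE:
			return "comment or empty line";
			/* ... one case per supported object type ... */
	}

	/* should never get here */
	pg_unreachable();
}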
+ * Emit error message "invalid format of filter file ..."
+ *
+ * This is mostly a convenience routine to avoid duplicating file closing code
+ * in multiple callsites.
+ */
+void
+log_invalid_filter_format(FilterStateData *fstate, char *message)
nit: invalid format *in* filter file...?
changed
+void
+log_unsupported_filter_object_type(FilterStateData *fstate,
+ const char *appname,
+ FilterObjectType fot)
+{
+ PQExpBuffer str = createPQExpBuffer();
+
+ printfPQExpBuffer(str,
+ "The application \"%s\" doesn't support filter for object type \"%s\".",
nit: there shouldn't be uppercase in error messages, especially since this will
be appended to another message by log_invalid_filter_format. I would just
drop "The application" entirely for brevity.
changed
+/*
+ * Release allocated resources for filter
+ */
+void
+filter_free(FilterStateData *fstate)
nit: Release allocated resources for *the given* filter?
changed
+ * Search for keywords (limited to ascii alphabetic characters) in
+ * the passed in line buffer. Returns NULL, when the buffer is empty or first
+ * char is not alpha. The length of the found keyword is returned in the size
+ * parameter.
+ */
+static const char *
+filter_get_keyword(const char **line, int *size)
+{
+ [...]
+ if (isascii(*ptr) && isalpha(*ptr))
+ {
+ result = ptr++;
+
+ while (isascii(*ptr) && (isalpha(*ptr) || *ptr == '_'))
+ ptr++;
Is there any reason to test isascii()? isalpha() should already cover that and
should be cheaper to test anyway.
changed. I wanted to limit keyword characters to basic ASCII letters, but the benefit is probably not strong and the real effect can be messy, so I removed the isascii test.
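The scanning loop is then simply (a sketch; the unsigned char cast is the usual requirement of the <ctype.h> functions):

	const char *result = NULL;

	if (isalpha((unsigned char) *ptr))
	{
		result = ptr++;

		while (isalpha((unsigned char) *ptr) || *ptr == '_')
			ptr++;
	}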
Also nit: "Returns NULL when the buffer..." (unnecessary comma), and the '_'
char is also allowed.
done
+filter_read_item(FilterStateData *fstate,
+ bool *is_include,
+ char **objname,
+ FilterObjectType *objtype)
+{
+ Assert(!fstate->is_error);
+
+ if (pg_get_line_buf(fstate->fp, &fstate->linebuff))
+ {
+ char *str = fstate->linebuff.data;
+ const char *keyword;
+ int size;
+
+ fstate->lineno++;
+
+ (void) pg_strip_crlf(str);
+
+ /* Skip initial white spaces */
+ while (isspace(*str))
+ str++;
+[...]
+ keyword = filter_get_keyword((const char **) &str, &size);
Is there any interest with the initial pg_strip_crlf? AFAICT all the rest of
the code will ignore such characters using isspace() so it wouldn't change
anything.
I think reading multi-line identifiers is a little bit easier this way, because I don't need to check for trailing \n and \r.
When I read multi-line identifiers, I cannot ignore white spaces.
Dropping both pg_strip_crlf() would allow you to declare str as const rather
than doing it in function calls. It would require to add const qualifiers in a
few other places, but it seems like an improvement, as for instance right now
filter_get_pattern is free to rewrite the str (because it's also calling
pg_strip_crlf, but there's no guarantee that it doesn't do anything else).
+/*
+ * filter_get_pattern - Read an object identifier pattern from the buffer
+ *
+ * Parses an object identifier pattern from the passed in buffer and sets
+ * objname to a string with object identifier pattern. Returns pointer to the
+ * first character after the pattern. Returns NULL on error.
+ */
+static char *
+filter_get_pattern(FilterStateData *fstate,
nit: suggestion to reword the comment, maybe something like
/*
* filter_get_pattern - Identify an object identifier pattern
*
* Try to parse an object identifier pattern from the passed buffer. If one is
* found, it sets objname to a string with the object identifier pattern and
* returns a pointer to the first byte after the found pattern. Otherwise NULL
* is returned.
*/
replaced
+bool
+filter_read_item(FilterStateData *fstate,
Another suggestion for comment rewrite:
/*-------------------
* filter_read_item - Read command/type/pattern triplet from a filter file
*
* This will parse one filter item from the filter file, and while it is a
* row based format a pattern may span more than one line due to how object
* names can be constructed. The expected format of the filter file is:
*
* <command> <object_type> <pattern>
*
* command can be "include" or "exclude"
* object_type can be one of: "table", "schema", "foreign_data", "data",
* "database", "function", "trigger" or "index"
* pattern can be any possibly-quoted and possibly-qualified identifier. It
* follows the same rules as other object include and exclude functions so it
* can also use wildcards.
*
* Returns true when one filter item was successfully read and parsed. When
* object name contains \n chars, then more than one line from input file can
* be processed. Returns false when the filter file reaches EOF. In case of
* error, the function will emit an appropriate error message before returning
* false.
*/
replaced, thank you for the text
Note also that your original comment said:
+ * In case of
+ * errors, the function wont return but will exit with an appropriate error
+ * message.
But AFAICS that's not the case: it will indeed log an appropriate error message
but will return false. I'm assuming that the comment was outdated as the
calling code handles it just fine, so I just modified the comment.
yes
filter.h:
+#ifndef FILTER_H
+#define FILTER_H
+#include "c.h"
It's definitely not ok to include .ch in frontend code. But AFAICS just
removing it doesn't cause any problem. Note also that there should be an empty
line after the #define FILTER_H per usual coding style.
fixed - it looks like it was some garbage
updated patch attached
big thanks for these comments and tips
Regards
Pavel
Attachment
On Thu, Nov 03, 2022 at 10:22:15PM +0100, Pavel Stehule wrote:
> čt 3. 11. 2022 v 5:09 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
> >
> > Is there any interest with the initial pg_strip_crlf? AFAICT all the rest
> > of the code will ignore such characters using isspace() so it wouldn't
> > change anything.
>
> I think reading multi-line identifiers is a little bit easier this way,
> because I don't need to check for trailing \n and \r.
> When I read multi-line identifiers, I cannot ignore white spaces.

Ok. I don't have a strong objection to it.

> updated patch attached
>
> big thanks for these comments and tips

Thanks for the updated patch! As far as I'm concerned the patch is in good shape, passes the CI and I don't have anything more to say, so I'm marking it as Ready for Committer!
On Fri, Nov 04, 2022 at 07:59:01PM +0800, Julien Rouhaud wrote:
> On Thu, Nov 03, 2022 at 10:22:15PM +0100, Pavel Stehule wrote:
> > čt 3. 11. 2022 v 5:09 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
> > updated patch attached
> >
> > big thanks for these comments and tips
>
> Thanks for the updated patch! As far as I'm concerned the patch is in good
> shape, passes the CI and I don't have anything more to say, so I'm marking it
> as Ready for Committer!

+1

I started looking to see if it's possible to simplify the patch at all, but nothing to show yet.

But one thing I noticed is that "optarg" looks wrong here:

	simple_string_list_append(&opts->triggerNames, optarg);

--
Justin
On Fri, Nov 04, 2022 at 07:19:27AM -0500, Justin Pryzby wrote: > On Fri, Nov 04, 2022 at 07:59:01PM +0800, Julien Rouhaud wrote: > > On Thu, Nov 03, 2022 at 10:22:15PM +0100, Pavel Stehule wrote: > > > čt 3. 11. 2022 v 5:09 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal: > > > updated patch attached > > > > > > big thanks for these comments and tips > > > > Thanks for the updated patch! As far as I'm concerned the patch is in a good > > shape, passes the CI and I don't have anything more to say so I'm marking it as > > Ready for Committer! > > +1 > > I started looking to see if it's possible to simplify the patch at all, > but nothing to show yet. > > But one thing I noticed is that "optarg" looks wrong here: > > simple_string_list_append(&opts->triggerNames, optarg); Ah indeed, good catch! Maybe there should be an explicit test for every (include|exclude) / objtype combination? It would be a bit verbose (and possibly hard to maintain).
pá 4. 11. 2022 v 14:28 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
On Fri, Nov 04, 2022 at 07:19:27AM -0500, Justin Pryzby wrote:
> On Fri, Nov 04, 2022 at 07:59:01PM +0800, Julien Rouhaud wrote:
> > On Thu, Nov 03, 2022 at 10:22:15PM +0100, Pavel Stehule wrote:
> > > čt 3. 11. 2022 v 5:09 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
> > > updated patch attached
> > >
> > > big thanks for these comments and tips
> >
> > Thanks for the updated patch! As far as I'm concerned the patch is in a good
> > shape, passes the CI and I don't have anything more to say so I'm marking it as
> > Ready for Committer!
>
> +1
>
> I started looking to see if it's possible to simplify the patch at all,
> but nothing to show yet.
>
> But one thing I noticed is that "optarg" looks wrong here:
>
> simple_string_list_append(&opts->triggerNames, optarg);
Ah indeed, good catch! Maybe there should be an explicit test for every
(include|exclude) / objtype combination? It would be a bit verbose (and
possibly hard to maintain).
I'll do it
On Fri, Nov 04, 2022 at 02:37:05PM +0100, Pavel Stehule wrote: > pá 4. 11. 2022 v 14:28 odesílatel Julien Rouhaud <rjuju123@gmail.com> > napsal: > > > > > > > But one thing I noticed is that "optarg" looks wrong here: > > > > > > simple_string_list_append(&opts->triggerNames, optarg); > > > > Ah indeed, good catch! Maybe there should be an explicit test for every > > (include|exclude) / objtype combination? It would be a bit verbose (and > > possibly hard to maintain). > > > > I'll do it Thanks a lot Pavel! I switched the CF entry back to "Waiting on Author" in the meantime.
Hi
pá 4. 11. 2022 v 14:28 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
On Fri, Nov 04, 2022 at 07:19:27AM -0500, Justin Pryzby wrote:
> On Fri, Nov 04, 2022 at 07:59:01PM +0800, Julien Rouhaud wrote:
> > On Thu, Nov 03, 2022 at 10:22:15PM +0100, Pavel Stehule wrote:
> > > čt 3. 11. 2022 v 5:09 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
> > > updated patch attached
> > >
> > > big thanks for these comments and tips
> >
> > Thanks for the updated patch! As far as I'm concerned the patch is in a good
> > shape, passes the CI and I don't have anything more to say so I'm marking it as
> > Ready for Committer!
>
> +1
>
> I started looking to see if it's possible to simplify the patch at all,
> but nothing to show yet.
>
> But one thing I noticed is that "optarg" looks wrong here:
>
> simple_string_list_append(&opts->triggerNames, optarg);
Ah indeed, good catch! Maybe there should be an explicit test for every
(include|exclude) / objtype combination? It would be a bit verbose (and
possibly hard to maintain).
yes - pg_restore is not well covered by tests, fixed
I found another issue. pg_restore requires the full signature of the function and is pretty sensitive to white space. I made a mistake when I partially parsed patterns like SQL identifiers. It can work for simple cases, but when I parse the function's signature it stops working. So I rewrote the pattern parsing part. Now, I just read the input string and try to reduce spaces. Multiline identifiers are still supported. Compared to the previous method of pattern parsing, I needed to change just one regression test - now I am not able to detect garbage after a pattern :-/. It is possible to enter types like "double precision" or "timestamp with time zone" without needing to check them on the server side.
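The space reduction itself is simple; per input character it is roughly this (a sketch, not the exact patch code, and quoted identifiers are of course copied verbatim, which this sketch leaves out): spaces around '.', '(' and ')' are dropped, a comma always comes out as ", ", and other white-space runs collapse to a single space.

#include <ctype.h>
#include <stdbool.h>

#include "pqexpbuffer.h"

/*
 * Sketch: append one input character to the normalized pattern, so that
 * e.g. "foo( int ,bar.baz )" comes out as "foo(int, bar.baz)".
 */
static void
append_reduced_char(PQExpBuffer pattern, char c, bool *pending_space)
{
    if (isspace((unsigned char) c))
    {
        *pending_space = true;
        return;
    }

    if (*pending_space && pattern->len > 0)
    {
        char        prev = pattern->data[pattern->len - 1];

        /* drop spaces next to '.', '(' and ')', never double a space */
        if (prev != '.' && prev != '(' && prev != ' ' &&
            c != '.' && c != '(' && c != ')' && c != ',')
            appendPQExpBufferChar(pattern, ' ');
    }
    *pending_space = false;

    appendPQExpBufferChar(pattern, c);
    if (c == ',')
        appendPQExpBufferChar(pattern, ' ');    /* canonical ", " separator */
}

So e.g. "myfunc( int ,  text )" is stored as "myfunc(int, text)", which is the form pg_restore's matching code expects.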
While writing the regression tests I found some issues with pg_restore's filtering options (not related to this patch):
* function filtering doesn't support schemas - when the name of a function is specified with a schema, the function is not found
* a function has to be specified with an argument type list - the separator has to be exactly the string ", ". Without the space, or with one space more, the filtering doesn't work (the new implementation of pattern parsing reduces this white-space sensitivity). This is not a bug, but it is not well documented.
* trigger filtering is probably broken (on the pg_restore side). The name should be entered in the form "tablename triggername".
Regards
Pavel
Attachment
Hi,

On Sat, Nov 05, 2022 at 08:54:57PM +0100, Pavel Stehule wrote:
>
> pá 4. 11. 2022 v 14:28 odesílatel Julien Rouhaud <rjuju123@gmail.com>
> napsal:
>
> > > But one thing I noticed is that "optarg" looks wrong here:
> > >
> > > simple_string_list_append(&opts->triggerNames, optarg);
> >
> > Ah indeed, good catch! Maybe there should be an explicit test for every
> > (include|exclude) / objtype combination? It would be a bit verbose (and
> > possibly hard to maintain).
>
> yes - pg_restore is not well covered by tests, fixed

Argh, indeed. It's a good thing to have expanded the regression tests :)

> I found another issue. pg_restore requires the full signature of the
> function and is pretty sensitive to white space. I made a
> mistake when I partially parsed patterns like SQL identifiers. It can work
> for simple cases, but when I parse the function's signature it stops
> working. So I rewrote the pattern parsing part. Now, I just read the input
> string and try to reduce spaces. Multiline identifiers are still
> supported. Compared to the previous method of pattern parsing, I needed to
> change just one regression test - now I am not able to detect garbage after
> a pattern :-/.

I'm not sure it's really problematic. It looks POLA-violation compatible with regular pg_dump options, for instance:

$ echo "include table t1()" | pg_dump --filter - | ag CREATE
CREATE TABLE public.t1 (
$ pg_dump -t "t1()" | ag CREATE
CREATE TABLE public.t1 (
$ echo "include table t1()blabla" | pg_dump --filter - | ag CREATE
pg_dump: error: no matching tables were found
$ pg_dump -t "t1()blabla" | ag CREATE
pg_dump: error: no matching tables were found

I don't think the file parsing code should try to be smart about checking the found patterns.

> * function filtering doesn't support schemas - when the name of a function
> is specified with a schema, the function is not found

Ah I didn't know that. Indeed it only expects a non-qualified identifier, and would restore any function that matches the name (and arguments), possibly multiple ones if there are variants in different schemas. That's unrelated to this patch though.

> * a function has to be specified with an argument type list - the
> separator has to be exactly the string ", ". Without the space, or with one
> space more, the filtering doesn't work (the new implementation of pattern
> parsing reduces this white-space sensitivity). This is not a bug, but it is
> not well documented.

Agreed.

> attached updated patch

It looks overall good to me! I just have a few minor nitpicking complaints:

- you removed the pg_strip_crlf() calls and declared everything as "const char *", so there's no need to explicitly cast the filter_get_keyword() arguments anymore

Note also that the code now relies on the fact that there are some non-zero bytes after a pattern to know that no errors happened. It's not a problem as you should find an EOF marker anyway if CRLF were stripped.

+ * Following routines are called from pg_dump, pg_dumpall and pg_restore.
+ * Unfortunately, implementation of exit_nicely in pg_dump and pg_restore
+ * is different from implementation of this routine in pg_dumpall. So instead
+ * of directly calling exit_nicely we have to return some error flag (in this
+ * case NULL), and exit_nicelly will be executed from caller's routine.

Slight improvement:

[...] Unfortunately, the implementation of exit_nicely in pg_dump and pg_restore is different from the one in pg_dumpall, so instead of...

+ * read_pattern - reads an pattern from input. The pattern can be mix of
+ * single line or multi line subpatterns. Single line subpattern starts first
+ * non white space char, and ending last non space char on line or by char
+ * '#'. The white spaces inside are removed (around char ".()"), or reformated
+ * around char ',' or reduced (the multiple spaces are replaced by one).
+ * Multiline subpattern starts by double quote and ending by this char too.
+ * The escape rules are same like for SQL quoted literal.
+ *
+ * Routine signalizes error by returning NULL. Otherwise returns pointer
+ * to next char after last processed char in input string.

typo: reads "a" pattern from input...

I don't fully understand the part about subpatterns, but is that necessary to describe it? Simply saying that any valid and possibly-quoted identifier can be parsed should make it clear that identifiers containing \n characters should work too. Maybe also just mention that whitespaces are removed and special care is taken to output routines in exactly the same way calling code will expect it (that is, a comma-and-single-space type delimiter).
On Fri, Nov 11, 2022 at 9:11 Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,
On Sat, Nov 05, 2022 at 08:54:57PM +0100, Pavel Stehule wrote:
>
> pá 4. 11. 2022 v 14:28 odesílatel Julien Rouhaud <rjuju123@gmail.com>
> napsal:
>
> > > But one thing I noticed is that "optarg" looks wrong here:
> > >
> > > simple_string_list_append(&opts->triggerNames, optarg);
> >
> > Ah indeed, good catch! Maybe there should be an explicit test for every
> > (include|exclude) / objtype combination? It would be a bit verbose (and
> > possibly hard to maintain).
> >
>
> yes - pg_restore is not well covered by tests, fixed
>
> I found another issue. The pg_restore requires a full signature of the
> function and it is pretty sensitive on white spaces (pg_restore).
Argh, indeed. It's a good thing to have expanded the regression tests :)
> I made a
> mistake when I partially parsed patterns like SQL identifiers. It can work
> for simple cases, but when I parse the function's signature it stops
> working. So I rewrote the parsing pattern part. Now, I just read an input
> string and I try to reduce spaces. Still multiline identifiers are
> supported. Against the previous method of pattern parsing, I needed to
> change just one regress test - now I am not able to detect garbage after
> pattern :-/.
I'm not sure it's really problematic. It looks POLA-violation compatible with
regular pg_dump options, for instance:
$ echo "include table t1()" | pg_dump --filter - | ag CREATE
CREATE TABLE public.t1 (
$ pg_dump -t "t1()" | ag CREATE
CREATE TABLE public.t1 (
$ echo "include table t1()blabla" | pg_dump --filter - | ag CREATE
pg_dump: error: no matching tables were found
$ pg_dump -t "t1()blabla" | ag CREATE
pg_dump: error: no matching tables were found
I don't think the file parsing code should try to be smart about checking the
found patterns.
> * function's filtering doesn't support schema - when the name of function
> is specified with schema, then the function is not found
Ah, I didn't know that. Indeed, it only expects a non-qualified identifier, and
would restore any function that matches the name (and arguments), possibly
multiple ones if there are variants in different schemas. That's unrelated to
this patch though.
> * the function has to be specified with an argument type list - the
> separator has to be exactly ", " string. Without space or with one space
> more, the filtering doesn't work (new implementation of pattern parsing
> reduces white spaces sensitivity). This is not a bug, but it is not well
> documented.
Agreed.
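(Editorial aside: the canonical spelling pg_dump and pg_restore expect for a
routine signature uses a comma and a single space between argument types. A
hypothetical example of a matching filter line, assuming a function
foo(integer, text) exists:

include function foo(integer, text)

The rework described above normalizes whitespace in the filter file toward
that canonical form, so small deviations in the file no longer break the
match.)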
> attached updated patch
It looks overall good to me! I just have a few minor nitpicking complaints:
- you removed the pg_strip_clrf() calls and declared everything as "const char
*", so there's no need to explicitly cast the filter_get_keyword() arguments
anymore
removed
Note also that the code now relies on the fact that there are some non-zero
bytes after a pattern to know that no errors happened. It's not a problem as
you should find an EOF marker anyway if CLRF were stripped.
I am not sure if I understand this note well?
+ * Following routines are called from pg_dump, pg_dumpall and pg_restore.
+ * Unfortunately, implementation of exit_nicely in pg_dump and pg_restore
+ * is different from implementation of this routine in pg_dumpall. So instead
+ * of directly calling exit_nicely we have to return some error flag (in this
+ * case NULL), and exit_nicelly will be executed from caller's routine.
Slight improvement:
[...]
Unfortunately, the implementation of exit_nicely in pg_dump and pg_restore is
different from the one in pg_dumpall, so instead of...
+ * read_pattern - reads an pattern from input. The pattern can be mix of
+ * single line or multi line subpatterns. Single line subpattern starts first
+ * non white space char, and ending last non space char on line or by char
+ * '#'. The white spaces inside are removed (around char ".()"), or reformated
+ * around char ',' or reduced (the multiple spaces are replaced by one).
+ * Multiline subpattern starts by double quote and ending by this char too.
+ * The escape rules are same like for SQL quoted literal.
+ *
+ * Routine signalizes error by returning NULL. Otherwise returns pointer
+ * to next char after last processed char in input string.
typo: reads "a" pattern from input...
fixed
I don't fully understand the part about subpatterns, but is that necessary to
describe it? Simply saying that any valid and possibly-quoted identifier can
be parsed should make it clear that identifiers containing \n characters should
work too. Maybe also just mention that whitespaces are removed and special
care is taken to output routines in exactly the same way calling code will
expect it (that is comma-and-single-space type delimiter).
In this case I hit the limits of my English language skills.
I rewrote this comment, but it needs more care. Please, can you look at it?
Hi
and updated patch
Regards
Pavel
Attachment
On Sat, Nov 12, 2022 at 09:35:59PM +0100, Pavel Stehule wrote:

Thanks for the updated patch. Apart from the function comment it looks good
to me.

Justin, did you have any other comment on the patch?

> > I don't fully understand the part about subpatterns, but is that
> > necessary to describe it? Simply saying that any valid and
> > possibly-quoted identifier can be parsed should make it clear that
> > identifiers containing \n characters should work too. Maybe also just
> > mention that whitespaces are removed and special care is taken to
> > output routines in exactly the same way calling code will expect it
> > (that is comma-and-single-space type delimiter).
>
> In this case I hit the limits of my English language skills.
>
> I rewrote this comment, but it needs more care. Please, can you look at it?

I'm also not a native English speaker so I'm far from writing perfect
comments myself :)

Maybe something like

/*
 * read_pattern - reads an object pattern from input
 *
 * This function will parse any valid identifier (quoted or not, qualified or
 * not), which can also include the full signature for routines.
 * Note that this function takes special care to sanitize the detected
 * identifier (removing extraneous whitespaces or other unnecessary
 * characters). This is necessary as most backup/restore filtering functions
 * only recognize identifiers if they are written exactly the way they are
 * regenerated.
 * Returns a pointer to the next character after the found identifier, or
 * NULL on error.
 */
On Sun, Nov 13, 2022 at 9:58 Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sat, Nov 12, 2022 at 09:35:59PM +0100, Pavel Stehule wrote:
Thanks for the updated patch. Apart from the function comment it looks good to
me.
Justin, did you have any other comment on the patch?
> > I don't fully understand the part about subpatterns, but is that necessary
> > to
> > describe it? Simply saying that any valid and possibly-quoted identifier
> > can
> > be parsed should make it clear that identifiers containing \n characters
> > should
> > work too. Maybe also just mention that whitespaces are removed and special
> > care is taken to output routines in exactly the same way calling code will
> > expect it (that is comma-and-single-space type delimiter).
> >
>
> In this case I hit the limits of my English language skills.
>
> I rewrote this comment, but it needs more care. Please, can you look at it?
I'm also not a native English speaker so I'm far from writing perfect comments
myself :)
far better than mine :)
Thank you very much
updated patch attached
Regards
Pavel
Maybe something like
/*
* read_pattern - reads an object pattern from input
*
* This function will parse any valid identifier (quoted or not, qualified or
* not), which can also include the full signature for routines.
* Note that this function takes special care to sanitize the detected
* identifier (removing extraneous whitespaces or other unnecessary
* characters). This is necessary as most backup/restore filtering functions
* only recognize identifiers if they are written exactly the way they are
* regenerated.
* Returns a pointer to the next character after the found identifier, or NULL on
* error.
*/
Attachment
Hi,

On Sun, Nov 13, 2022 at 08:32:47PM +0100, Pavel Stehule wrote:
>
> updated patch attached

Thanks!

Some enhancement could probably be done by a native english speaker, but
apart from that it looks good to me, so hearing no other complaints I'm
marking the CF entry as Ready for Committer!
On Tue, Nov 22, 2022 at 6:26 Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,
On Sun, Nov 13, 2022 at 08:32:47PM +0100, Pavel Stehule wrote:
>
> updated patch attached
Thanks!
Some enhancement could probably be done by a native english speaker, but apart
from that it looks good to me, so hearing no other complaints I'm marking the
CF entry as Ready for Committer!
Thank you very much for check and help
Regards
Pavel
Hi,

On 2022-11-13 20:32:47 +0100, Pavel Stehule wrote:
> updated patch attached

It fails with address sanitizer that's now part of CI:
https://cirrus-ci.com/task/6031397744279552?logs=test_world#L2659

[06:33:11.271] # ==31965==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x619000000480 at pc 0x559f1ac40822 bp 0x7ffea83e1ad0 sp 0x7ffea83e1ac8
[06:33:11.271] # READ of size 1 at 0x619000000480 thread T0
[06:33:11.271] #     #0 0x559f1ac40821 in read_pattern /tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:302
[06:33:11.271] #     #1 0x559f1ac40e4d in filter_read_item /tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:459
[06:33:11.271] #     #2 0x559f1abe6fa5 in read_dump_filters /tmp/cirrus-ci-build/src/bin/pg_dump/pg_dump.c:18229
[06:33:11.271] #     #3 0x559f1ac2bb1b in main /tmp/cirrus-ci-build/src/bin/pg_dump/pg_dump.c:630
[06:33:11.271] #     #4 0x7fd91fabfd09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)
[06:33:11.271] #     #5 0x559f1abe5d29 in _start (/tmp/cirrus-ci-build/tmp_install/usr/local/pgsql/bin/pg_dump+0x39d29)
[06:33:11.271] #
[06:33:11.271] # 0x619000000480 is located 0 bytes to the right of 1024-byte region [0x619000000080,0x619000000480)
[06:33:11.271] # allocated by thread T0 here:
[06:33:11.271] #     #0 0x7fd91fe14e8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
[06:33:11.271] #     #1 0x559f1ac69f35 in pg_malloc_internal /tmp/cirrus-ci-build/src/common/fe_memutils.c:30
[06:33:11.271] #     #2 0x559f1ac69f35 in palloc /tmp/cirrus-ci-build/src/common/fe_memutils.c:117
[06:33:11.271] #
[06:33:11.271] # SUMMARY: AddressSanitizer: heap-buffer-overflow /tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:302 in read_pattern

Greetings,

Andres Freund
On Tue, Nov 22, 2022 at 8:39 Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2022-11-13 20:32:47 +0100, Pavel Stehule wrote:
> updated patch attached
It fails with address sanitizer that's now part of CI:
https://cirrus-ci.com/task/6031397744279552?logs=test_world#L2659
[06:33:11.271] # ==31965==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x619000000480 at pc 0x559f1ac40822 bp 0x7ffea83e1ad0 sp 0x7ffea83e1ac8
[06:33:11.271] # READ of size 1 at 0x619000000480 thread T0
[06:33:11.271] # #0 0x559f1ac40821 in read_pattern /tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:302
[06:33:11.271] # #1 0x559f1ac40e4d in filter_read_item /tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:459
[06:33:11.271] # #2 0x559f1abe6fa5 in read_dump_filters /tmp/cirrus-ci-build/src/bin/pg_dump/pg_dump.c:18229
[06:33:11.271] # #3 0x559f1ac2bb1b in main /tmp/cirrus-ci-build/src/bin/pg_dump/pg_dump.c:630
[06:33:11.271] # #4 0x7fd91fabfd09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)
[06:33:11.271] # #5 0x559f1abe5d29 in _start (/tmp/cirrus-ci-build/tmp_install/usr/local/pgsql/bin/pg_dump+0x39d29)
[06:33:11.271] #
[06:33:11.271] # 0x619000000480 is located 0 bytes to the right of 1024-byte region [0x619000000080,0x619000000480)
[06:33:11.271] # allocated by thread T0 here:
[06:33:11.271] # #0 0x7fd91fe14e8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
[06:33:11.271] # #1 0x559f1ac69f35 in pg_malloc_internal /tmp/cirrus-ci-build/src/common/fe_memutils.c:30
[06:33:11.271] # #2 0x559f1ac69f35 in palloc /tmp/cirrus-ci-build/src/common/fe_memutils.c:117
[06:33:11.271] #
[06:33:11.271] # SUMMARY: AddressSanitizer: heap-buffer-overflow /tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:302 in read_pattern
I'll check it
Greetings,
Andres Freund
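(Editorial aside: the report above shows a read of one byte past the end of a
1024-byte buffer inside read_pattern(). The defensive idiom for this class of
bug is to test for the terminator before advancing the scan pointer. A minimal
sketch in the same spirit -- hypothetical names, not the actual filter.c code:

#include <ctype.h>

/*
 * Advance over the current token, stopping at end-of-string or whitespace.
 * Checking *str against '\0' first guarantees the loop never reads past
 * the terminating NUL, no matter what the caller hands us.
 */
static const char *
skip_token(const char *str)
{
	while (*str != '\0' && !isspace((unsigned char) *str))
		str++;

	return str;			/* points at the NUL or the first whitespace */
}
)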
Hi
On Tue, Nov 22, 2022 at 8:39 Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2022-11-13 20:32:47 +0100, Pavel Stehule wrote:
> updated patch attached
It fails with address sanitizer that's now part of CI:
https://cirrus-ci.com/task/6031397744279552?logs=test_world#L2659
[06:33:11.271] # ==31965==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x619000000480 at pc 0x559f1ac40822 bp 0x7ffea83e1ad0 sp 0x7ffea83e1ac8
[06:33:11.271] # READ of size 1 at 0x619000000480 thread T0
[06:33:11.271] # #0 0x559f1ac40821 in read_pattern /tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:302
[06:33:11.271] # #1 0x559f1ac40e4d in filter_read_item /tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:459
[06:33:11.271] # #2 0x559f1abe6fa5 in read_dump_filters /tmp/cirrus-ci-build/src/bin/pg_dump/pg_dump.c:18229
[06:33:11.271] # #3 0x559f1ac2bb1b in main /tmp/cirrus-ci-build/src/bin/pg_dump/pg_dump.c:630
[06:33:11.271] # #4 0x7fd91fabfd09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)
[06:33:11.271] # #5 0x559f1abe5d29 in _start (/tmp/cirrus-ci-build/tmp_install/usr/local/pgsql/bin/pg_dump+0x39d29)
[06:33:11.271] #
[06:33:11.271] # 0x619000000480 is located 0 bytes to the right of 1024-byte region [0x619000000080,0x619000000480)
[06:33:11.271] # allocated by thread T0 here:
[06:33:11.271] # #0 0x7fd91fe14e8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
[06:33:11.271] # #1 0x559f1ac69f35 in pg_malloc_internal /tmp/cirrus-ci-build/src/common/fe_memutils.c:30
[06:33:11.271] # #2 0x559f1ac69f35 in palloc /tmp/cirrus-ci-build/src/common/fe_memutils.c:117
[06:33:11.271] #
[06:33:11.271] # SUMMARY: AddressSanitizer: heap-buffer-overflow /tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:302 in read_pattern
should be fixed in the attached patch

I also found and fixed a small memory leak of 24 bytes per filter row (a
PQExpBufferData).
Regards
Pavel
Greetings,
Andres Freund
Attachment
So.... This patch has been through a lot of commitfests. And it really
doesn't seem that hard to resolve -- Pavel has seemingly been willing
to go along whichever way the wind has been blowing but honestly it
kind of seems like he's just gotten drive-by suggestions and he's put
a lot of work into trying to satisfy them.

He implemented --include-tables-from-file=... etc. Then he implemented
a hand-written parser for a DSL to select objects, then he implemented
a bison parser, then he went back to the hand-written parser.

Can we get some consensus on whether the DSL looks right and whether the
hand-written parser is sensible? And if so, can a committer step up to
actually review and commit the patch? The last review said it might need
a native English speaker to tweak some wording but otherwise looked good.

--
Gregory Stark
As Commitfest Manager
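(Editorial aside: for readers joining here, the DSL in question is a
line-oriented filter file read via --filter; a sketch based on the examples
upthread, with made-up object names and file name:

include table public.t1
exclude table_data public.t1_log
include table_and_children measurement

$ pg_dump --filter=objects.txt mydb

Each line is an include/exclude keyword, an object type, and a pattern.)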
> On 6 Mar 2023, at 21:45, Gregory Stark (as CFM) <stark.cfm@gmail.com> wrote:
>
> So.... This patch has been through a lot of commitfests. And it really
> doesn't seem that hard to resolve -- Pavel has seemingly been willing
> to go along whichever way the wind has been blowing but honestly it
> kind of seems like he's just gotten drive-by suggestions and he's put
> a lot of work into trying to satisfy them.

Agreed.

> He implemented --include-tables-from-file=... etc. Then he implemented
> a hand-written parser for a DSL to select objects, then he implemented
> a bison parser, then he went back to the hand-written parser.

Well, kind of. I was trying to take the patch to the finishing line but was
uncomfortable with the hand written parser so I implemented a parser in Bison
to replace it with. Not that hand-written parsers are bad per se (or that my
bison parser was perfect), but reading quoted identifiers across line
boundaries tends to require a fair amount of handwritten code. Pavel did not
object to this version, but it was objected to by two other committers.

At this point [0] I stepped down from trying to finish it as the approach I
was comfortable with didn't gain traction (which is totally fine).

Downthread from this the patch got a lot of reviews from Julien with the old
parser back in place.

> Can we get some consensus on whether the DSL looks right

I would consider this pretty settled.

> and whether the hand-written parser is sensible.

This is the part where a committer who wants to pursue the hand-written
parser needs to step up. With the amount of review received it's hopefully
pretty close.

--
Daniel Gustafsson

[0] 098531E1-FBA9-4B7D-884E-0A4363EEE6DF@yesql.se
Hi,

On Mon, Mar 06, 2023 at 10:20:32PM +0100, Daniel Gustafsson wrote:
> > On 6 Mar 2023, at 21:45, Gregory Stark (as CFM) <stark.cfm@gmail.com> wrote:
> >
> > So.... This patch has been through a lot of commitfests. And it really
> > doesn't seem that hard to resolve -- Pavel has seemingly been willing
> > to go along whichever way the wind has been blowing but honestly it
> > kind of seems like he's just gotten drive-by suggestions and he's put
> > a lot of work into trying to satisfy them.
>
> Agreed.

Indeed, I'm not sure I would have had that much patience.

> > He implemented --include-tables-from-file=... etc. Then he implemented
> > a hand-written parser for a DSL to select objects, then he implemented
> > a bison parser, then he went back to the hand-written parser.
>
> Well, kind of. I was trying to take the patch to the finishing line but was
> uncomfortable with the hand written parser so I implemented a parser in Bison
> to replace it with. Not that hand-written parsers are bad per se (or that my
> bison parser was perfect), but reading quoted identifiers across line
> boundaries tends to require a fair amount of handwritten code. Pavel did not
> object to this version, but it was objected to by two other committers.
>
> At this point [0] I stepped down from trying to finish it as the approach I
> was comfortable with didn't gain traction (which is totally fine).
>
> Downthread from this the patch got a lot of reviews from Julien with the old
> parser back in place.

Yeah, and the current state seems quite good to me.

> > Can we get some consensus on whether the DSL looks right
>
> I would consider this pretty settled.

Agreed.
On Tue, Mar 7, 2023 at 3:47 Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,
On Mon, Mar 06, 2023 at 10:20:32PM +0100, Daniel Gustafsson wrote:
> > On 6 Mar 2023, at 21:45, Gregory Stark (as CFM) <stark.cfm@gmail.com> wrote:
> >
> > So.... This patch has been through a lot of commitfests. And it really
> > doesn't seem that hard to resolve -- Pavel has seemingly been willing
> > to go along whichever way the wind has been blowing but honestly it
> > kind of seems like he's just gotten drive-by suggestions and he's put
> > a lot of work into trying to satisfy them.
>
> Agreed.
Indeed, I'm not sure I would have had that much patience.
> > He implemented --include-tables-from-file=... etc. Then he implemented
> > a hand-written parser for a DSL to select objects, then he implemented
> > a bison parser, then he went back to the hand-written parser.
>
> Well, kind of. I was trying to take the patch to the finishing line but was
> uncomfortable with the hand written parser so I implemented a parser in Bison
> to replace it with. Not that hand-written parsers are bad per se (or that my
> bison parser was perfect), but reading quoted identifiers across line
> boundaries tends to require a fair amount of handwritten code. Pavel did not
> object to this version, but it was objected to by two other committers.
>
> At this point [0] I stepped down from trying to finish it as the approach I
> was comfortable with didn't gain traction (which is totally fine).
>
> Downthread from this the patch got a lot of reviews from Julien with the old
> parser back in place.
Yeah, and the current state seems quite good to me.
> > Can we get some consensus on whether the DSL looks right
>
> I would consider this pretty settled.
Agreed.
rebase + enhancing about related option from a563c24
Regards
Pavel
Attachment
Hi
fresh rebase
regards
Pavel
Attachment
On Thu, Mar 16, 2023 at 01:05:41PM +0100, Pavel Stehule wrote:
> rebase + enhancing about related option from a563c24

Thanks.

It looks like this doesn't currently handle extensions, which were added at
6568cef26e.

> +          <literal>table_and_children</literal>: tables, works like
> +          <option>-t</option>/<option>--table</option>, except that
> +          it also includes any partitions or inheritance child
> +          tables of the table(s) matching the
> +          <replaceable class="parameter">pattern</replaceable>.

Why doesn't this just say "works like --table-and-children"?

I think as you wrote log_invalid_filter_format(), the messages wouldn't be
translated, because they're printed via %s. One option is to call _() on
the message.

> +ok($dump !=~ qr/^CREATE TABLE public\.bootab/m, "exclude dumped children table");

!=~ is being interpreted as a numeric "!=" and throwing warnings.
It should be a !~ b, right?
It'd be nice if perl warnings during the tests were less easy to miss.

> + * char is not alpha. The char '_' is allowed too (exclude first position).

Why is it treated specially? Could it be treated the same as alpha?

> + log_invalid_filter_format(&fstate,
> +     "\"include\" table data filter is not allowed");
> + log_invalid_filter_format(&fstate,
> +     "\"include\" table data and children filter is not allowed");

For these, it might be better to write the literal option:

> +     "include filter for \"table_data_and_children\" is not allowed");

because the option is a literal and shouldn't be translated. And it's
probably better to write that using %s, like:

> +     "include filter for \"%s\" is not allowed");

That makes shorter and fewer strings.

Find attached a bunch of other corrections as 0002.txt

I also dug up what I'd started in November, trying to reduce the code
duplication between pg_restore/dump/all. This isn't done, but I might never
finish it, so I'll at least show what I have in case you think it's a good
idea. This passes tests on CI, except for autoconf, due to using
exit_nicely() differently.

--
Justin
Attachment
On Sun, Mar 19, 2023 at 15:01 Justin Pryzby <pryzby@telsasoft.com> wrote:
On Thu, Mar 16, 2023 at 01:05:41PM +0100, Pavel Stehule wrote:
> rebase + enhancing about related option from a563c24
Thanks.
It looks like this doesn't currently handle extensions, which were added
at 6568cef26e.
> + <literal>table_and_children</literal>: tables, works like
> + <option>-t</option>/<option>--table</option>, except that
> + it also includes any partitions or inheritance child
> + tables of the table(s) matching the
> + <replaceable class="parameter">pattern</replaceable>.
Why doesn't this just say "works like --table-and-children" ?
I think as you wrote log_invalid_filter_format(), the messages wouldn't
be translated, because they're printed via %s. One option is to call
_() on the message.
> +ok($dump !=~ qr/^CREATE TABLE public\.bootab/m, "exclude dumped children table");
!=~ is being interpreted as a numeric "!=" and throwing warnings.
It should be a !~ b, right?
It'd be nice if perl warnings during the tests were less easy to miss.
> + * char is not alpha. The char '_' is allowed too (exclude first position).
Why is it treated specially? Could it be treated the same as alpha?
> + log_invalid_filter_format(&fstate,
> + "\"include\" table data filter is not allowed");
> + log_invalid_filter_format(&fstate,
> + "\"include\" table data and children filter is not allowed");
For these, it might be better to write the literal option:
> + "include filter for \"table_data_and_children\" is not allowed");
Because the option is a literal and shouldn't be translated.
And it's probably better to write that using %s, like:
> + "include filter for \"%s\" is not allowed");
That makes shorter and fewer strings.
Find attached a bunch of other corrections as 0002.txt
Thank you very much - I'll recheck the mentioned points tomorrow.
I also dug up what I'd started in November, trying to reduce the code
duplication between pg_restore/dump/all. This isn't done, but I might
never finish it, so I'll at least show what I have in case you think
it's a good idea. This passes tests on CI, except for autoconf, due to
using exit_nicely() differently.
Your implementation reduced 60 lines, but the interface and code are more
complex. I cannot say which is significantly better. Personally, in this
case, I prefer my variant, because I think it is a little bit more readable,
and possible modifications can be simpler. But this is just my opinion, and
I have no problem accepting other opinions. I could imagine defining some
configuration array like getopt, but it looks like over-engineering.
Regards
Pavel
--
Justin
On Sun, Mar 19, 2023 at 15:01 Justin Pryzby <pryzby@telsasoft.com> wrote:
On Thu, Mar 16, 2023 at 01:05:41PM +0100, Pavel Stehule wrote:
> rebase + enhancing about related option from a563c24
Thanks.
It looks like this doesn't currently handle extensions, which were added
at 6568cef26e.
> + <literal>table_and_children</literal>: tables, works like
> + <option>-t</option>/<option>--table</option>, except that
> + it also includes any partitions or inheritance child
> + tables of the table(s) matching the
> + <replaceable class="parameter">pattern</replaceable>.
Why doesn't this just say "works like --table-and-children" ?
changed
I think as you wrote log_invalid_filter_format(), the messages wouldn't
be translated, because they're printed via %s. One option is to call
_() on the message.
fixed
> +ok($dump !=~ qr/^CREATE TABLE public\.bootab/m, "exclude dumped children table");
!=~ is being interpreted as a numeric "!=" and throwing warnings.
It should be a !~ b, right?
It'd be nice if perl warnings during the tests were less easy to miss.
should be fixed by you
> + * char is not alpha. The char '_' is allowed too (exclude first position).
Why is it treated specially? Could it be treated the same as alpha?
It is the usual behaviour in Postgres for keywords. The important part is the
complete sentence: "Returns NULL when the buffer is empty or the first char is
not alpha."
In this case the implementation has no big impact on behaviour - you would
probably get the message "unknown keyword" instead of "missing keyword". But I
would implement behaviour consistent with other places. My opinion in this
case is not extra strong - we can define the form of keywords however we want;
this way is just consistent with other parsers in Postgres.
> + log_invalid_filter_format(&fstate,
> + "\"include\" table data filter is not allowed");
> + log_invalid_filter_format(&fstate,
> + "\"include\" table data and children filter is not allowed");
For these, it might be better to write the literal option:
> + "include filter for \"table_data_and_children\" is not allowed");
Because the option is a literal and shouldn't be translated.
And it's probably better to write that using %s, like:
> + "include filter for \"%s\" is not allowed");
done
That makes shorter and fewer strings.
Find attached a bunch of other corrections as 0002.txt
merged
Regards
Pavel
Attachment
On Mon, Mar 20, 2023 at 08:01:13AM +0100, Pavel Stehule wrote:
> On Sun, Mar 19, 2023 at 15:01 Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> > On Thu, Mar 16, 2023 at 01:05:41PM +0100, Pavel Stehule wrote:
> > > rebase + enhancing about related option from a563c24
> >
> > Thanks.
> >
> > It looks like this doesn't currently handle extensions, which were added
> > at 6568cef26e.

What about this part? Should extension filters be supported?

I think the comment that I'd patched that lists all the filter types should
be minimized, rather than duplicating the list of all the possible filters
that's already in the user-facing documentation.

One new typo: childrent
On Tue, Mar 21, 2023 at 16:32 Justin Pryzby <pryzby@telsasoft.com> wrote:
On Mon, Mar 20, 2023 at 08:01:13AM +0100, Pavel Stehule wrote:
> ne 19. 3. 2023 v 15:01 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
>
> > On Thu, Mar 16, 2023 at 01:05:41PM +0100, Pavel Stehule wrote:
> > > rebase + enhancing about related option from a563c24
> >
> > Thanks.
> >
> > It looks like this doesn't currently handle extensions, which were added
> > at 6568cef26e.
What about this part ? Should extension filters be supported ?
I missed this, yes, it should be supported.
I think the comment that I'd patched that lists all the filter types
should be minimized, rather than duplicating the list of all the
possible filters that's already in the user-facing documentation.
One new typo: childrent
On Tue, Mar 21, 2023 at 16:32 Justin Pryzby <pryzby@telsasoft.com> wrote:
On Mon, Mar 20, 2023 at 08:01:13AM +0100, Pavel Stehule wrote:
> ne 19. 3. 2023 v 15:01 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
>
> > On Thu, Mar 16, 2023 at 01:05:41PM +0100, Pavel Stehule wrote:
> > > rebase + enhancing about related option from a563c24
> >
> > Thanks.
> >
> > It looks like this doesn't currently handle extensions, which were added
> > at 6568cef26e.
What about this part ? Should extension filters be supported ?
should be fixed
I think the comment that I'd patched that lists all the filter types
should be minimized, rather than duplicating the list of all the
possible filters that's already in the user-facing documentation.
I modified this comment. Please check.
One new typo: childrent
fixed
Regards
Pavel
Attachment
Hi
only rebase
Regards
Pavel
Attachment
Hi
On Mon, Sep 11, 2023 at 6:34 Pavel Stehule <pavel.stehule@gmail.com> wrote:

> Hi
>
> only rebase
Unfortunately this rebase was not correct. I am sorry.
fixed version
Regards
Pavel
Attachment
Hi
On Mon, Sep 11, 2023 at 6:57 Pavel Stehule <pavel.stehule@gmail.com> wrote:

> Hi
>
> On Mon, Sep 11, 2023 at 6:34 Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> > Hi
> >
> > only rebase
>
> Unfortunately this rebase was not correct. I am sorry.
>
> fixed version
and fixed a forgotten "break" in a switch
Regards
Pavel
Attachment
I went and had another look at this. The patch has been around for 18
commitfests and is widely considered to add a good feature, so it seems
about time to reach closure.

As I've mentioned in the past I'm not a big fan of the parser, but the
thread has overruled me on that. Another thing I think is a bit
overcomplicated is the layered error handling for printing log messages,
and the bubbling up of errors to get around not being able to call
exit_nicely. In the attached version I've boiled down the error logging
into a single new function pg_log_filter_error() which takes a variable
format string. This removes a fair bit of the extra calls and makes
logging easier.

I've also added a function pointer to the FilterStateData for passing the
exit function via filter_init. This allows the filtering code to exit
gracefully regardless of which application is using it.

Finally, I've also reimplemented the logic for checking the parsed tokens
into switch statements without defaults, in order to get the compiler
warning on a missed case. It's easy to miss adding code to handle a state,
especially when adding new ones, and this should help highlight that.

Overall, this does shave a bit off the patch in size, for what IMHO is
better readability and maintainability. (I've also made a pgindent pass
over it of course.)

What are your thoughts on this version? It's not in a committable state as
it needs a bit more comments here and there and a triple-check that nothing
was missed in changing this, but I prefer to get your thoughts before
spending the extra time.

--
Daniel Gustafsson
Attachment
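(Editorial aside: a sketch of the exit-callback arrangement described above;
field and function names are illustrative, not necessarily what the committed
filter.h uses:

#include <stdio.h>

typedef void (*exit_function) (int status);

typedef struct
{
	const char *filename;		/* filter file being read */
	FILE	   *fp;				/* open stream for the filter file */
	int			lineno;			/* current line, for error messages */
	exit_function exit_nicely;	/* caller-supplied exit routine */
} FilterStateData;

/*
 * Each client passes its own exit routine, so the shared filtering code
 * needs no link-time knowledge of how pg_dump, pg_dumpall or pg_restore
 * implement exit_nicely.
 */
extern void filter_init(FilterStateData *fstate, const char *filename,
						exit_function f_exit);
)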
Hi
What are your thoughts on this version? It's not in a committable state as it
needs a bit more comments here and there and a triple-check that nothing was
missed in changing this, but I prefer to get your thoughts before spending the
extra time.
I think using a pointer to the exit function is an elegant solution. I checked
the code and found only one issue. I fixed this warning:
[13:57:22.578] time make -s -j${BUILD_JOBS} world-bin
[13:58:20.858] filter.c: In function ‘pg_log_filter_error’:
[13:58:20.858] filter.c:161:2: error: function ‘pg_log_filter_error’ might be a candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format]
[13:58:20.858] 161 | vsnprintf(buf, sizeof(buf), fmt, argp);
[13:58:20.858] | ^~~~~~~~~
[13:58:20.858] cc1: all warnings being treated as errors
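(Editorial aside: the usual cure for this class of warning in the PostgreSQL
tree is to annotate the prototype with pg_attribute_printf() so the compiler
can type-check the format arguments; a sketch, with the signature assumed
from the warning above:

extern void pg_log_filter_error(FilterStateData *fstate, const char *fmt, ...)
			pg_attribute_printf(2, 3);

Here "2, 3" says argument 2 is the format string and the variadic arguments
begin at position 3.)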
and a probable copy/paste bug:
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index f647bde28d..ab2abedf5f 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -535,7 +535,7 @@ read_restore_filters(const char *filename, RestoreOptions *opts)
case FILTER_OBJECT_TYPE_EXTENSION:
case FILTER_OBJECT_TYPE_FOREIGN_DATA:
pg_log_filter_error(&fstate, _("%s filter for \"%s\" is not allowed."),
- "exclude",
+ "include",
filter_object_type_name(objtype));
exit_nicely(1);
Regards
Pavel
--
Daniel Gustafsson
Attachment
Hi
On Sun, Nov 12, 2023 at 14:17 Pavel Stehule <pavel.stehule@gmail.com> wrote:

> Hi
>
> > What are your thoughts on this version? It's not in a committable state as it
> > needs a bit more comments here and there and a triple-check that nothing was
> > missed in changing this, but I prefer to get your thoughts before spending the
> > extra time.
>
> I think using a pointer to the exit function is an elegant solution. I checked
> the code and found only one issue. I fixed this warning:
>
> [13:57:22.578] time make -s -j${BUILD_JOBS} world-bin
> [13:58:20.858] filter.c: In function ‘pg_log_filter_error’:
> [13:58:20.858] filter.c:161:2: error: function ‘pg_log_filter_error’ might be a candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format]
> [13:58:20.858]   161 |  vsnprintf(buf, sizeof(buf), fmt, argp);
> [13:58:20.858]       |  ^~~~~~~~~
> [13:58:20.858] cc1: all warnings being treated as errors
>
> and a probable copy/paste bug:
>
> diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
> index f647bde28d..ab2abedf5f 100644
> --- a/src/bin/pg_dump/pg_restore.c
> +++ b/src/bin/pg_dump/pg_restore.c
> @@ -535,7 +535,7 @@ read_restore_filters(const char *filename, RestoreOptions *opts)
>              case FILTER_OBJECT_TYPE_EXTENSION:
>              case FILTER_OBJECT_TYPE_FOREIGN_DATA:
>                  pg_log_filter_error(&fstate, _("%s filter for \"%s\" is not allowed."),
> -                                    "exclude",
> +                                    "include",
>                                      filter_object_type_name(objtype));
>                  exit_nicely(1);
>
> Regards
>
> Pavel
next update - fixed a used-but-uninitialized "is_include" variable when the
filter is of FILTER_OBJECT_TYPE_NONE, fixing this crash:

# Running: pg_ctl -w -D /tmp/cirrus-ci-build/build-32/testrun/pg_dump/005_pg_dump_filterfile/data/t_005_pg_dump_filterfile_main_data/pgdata -l /tmp/cirrus-ci-build/build-32/testrun/pg_dump/005_pg_dump_filterfile/log/005_pg_dump_filterfile_main.log -o --cluster-name=main start
waiting for server to start.... done
server started
# Postmaster PID for node "main" is 71352
# Running: pg_dump -p 65454 -f /tmp/cirrus-ci-build/build-32/testrun/pg_dump/005_pg_dump_filterfile/data/t_005_pg_dump_filterfile_main_data/backup/plain.sql --filter=/tmp/cirrus-ci-build/build-32/testrun/pg_dump/005_pg_dump_filterfile/data/tmp_test_0mO3/inputfile.txt postgres
../src/bin/pg_dump/pg_dump.c:18800:7: runtime error: load of value 86, which is not a valid value for type '_Bool'
==71579==Using libbacktrace symbolizer.
    #0 0x566302cd in read_dump_filters ../src/bin/pg_dump/pg_dump.c:18800
    #1 0x56663429 in main ../src/bin/pg_dump/pg_dump.c:670
    #2 0xf7694e45 in __libc_start_main (/lib/i386-linux-gnu/libc.so.6+0x1ae45)
    #3 0x56624d50 in _start (/tmp/cirrus-ci-build/build-32/tmp_install/usr/local/pgsql/bin/pg_dump+0x1ad50)

Regards
Pavel
--
Daniel Gustafsson
Attachment
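(Editorial aside: UBSan's "load of value 86, which is not a valid value for
type '_Bool'" above is the signature of reading a bool that was never
assigned. The defensive pattern is to give such out-flags a defined default
before any code path can read them; a minimal self-contained sketch with
hypothetical names, not the actual pg_dump.c code:

#include <stdbool.h>
#include <stdio.h>

/* hypothetical stand-in for the real filter-item reader */
static bool
read_next_filter_item(FILE *fp, bool *is_include)
{
	(void) fp;
	(void) is_include;
	return false;			/* pretend we hit EOF immediately */
}

int
main(void)
{
	bool		is_include = false;	/* defined default, not stack garbage */

	/* the early-return path can no longer leak an undefined bool */
	if (!read_next_filter_item(stdin, &is_include))
		return 0;

	return is_include ? 1 : 0;
}
)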
> On 13 Nov 2023, at 14:15, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Hi
>
> On Sun, Nov 12, 2023 at 14:17 Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> > What are your thoughts on this version? It's not in a committable state as it
> > needs a bit more comments here and there and a triple-check that nothing was
> > missed in changing this, but I prefer to get your thoughts before spending the
> > extra time.
>
> I think using a pointer to the exit function is an elegant solution. I checked
> the code and found only one issue. I fixed this warning:
>
> [13:57:22.578] time make -s -j${BUILD_JOBS} world-bin
> [13:58:20.858] filter.c: In function ‘pg_log_filter_error’:
> [13:58:20.858] filter.c:161:2: error: function ‘pg_log_filter_error’ might be a candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format]
> [13:58:20.858]   161 |  vsnprintf(buf, sizeof(buf), fmt, argp);
> [13:58:20.858]       |  ^~~~~~~~~
> [13:58:20.858] cc1: all warnings being treated as errors
>
> and a probable copy/paste bug:
>
> diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
> index f647bde28d..ab2abedf5f 100644
> --- a/src/bin/pg_dump/pg_restore.c
> +++ b/src/bin/pg_dump/pg_restore.c
> @@ -535,7 +535,7 @@ read_restore_filters(const char *filename, RestoreOptions *opts)
>              case FILTER_OBJECT_TYPE_EXTENSION:
>              case FILTER_OBJECT_TYPE_FOREIGN_DATA:
>                  pg_log_filter_error(&fstate, _("%s filter for \"%s\" is not allowed."),
> -                                    "exclude",
> +                                    "include",
>                                      filter_object_type_name(objtype));
>                  exit_nicely(1);
>
> next update - fixed a used-but-uninitialized "is_include" variable when the
> filter is of FILTER_OBJECT_TYPE_NONE

Thanks, the posted patchset was indeed a bit of a sketch, thanks for fixing up
these. I'll go over it again too to clean it up and try to make it into
something committable.

I was pondering replacing the is_include handling with returning an enum for
the operation, to keep things more future proof in case we add more operations
(and also a bit less magic IMHO).

--
Daniel Gustafsson
On Mon, Nov 13, 2023 at 14:39 Daniel Gustafsson <daniel@yesql.se> wrote:
> On 13 Nov 2023, at 14:15, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Hi
>
> On Sun, Nov 12, 2023 at 14:17 Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi
>
>
> What are your thoughts on this version? It's not in a committable state as it
> needs a bit more comments here and there and a triple-check that nothing was
> missed in changing this, but I prefer to get your thoughts before spending the
> extra time.
>
> I think using a pointer to the exit function is an elegant solution. I checked the code and found only one issue. I fixed this warning:
>
> [13:57:22.578] time make -s -j${BUILD_JOBS} world-bin
> [13:58:20.858] filter.c: In function ‘pg_log_filter_error’:
> [13:58:20.858] filter.c:161:2: error: function ‘pg_log_filter_error’ might be a candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format]
> [13:58:20.858] 161 | vsnprintf(buf, sizeof(buf), fmt, argp);
> [13:58:20.858] | ^~~~~~~~~
> [13:58:20.858] cc1: all warnings being treated as errors
>
> and a probable copy/paste bug:
>
> diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
> index f647bde28d..ab2abedf5f 100644
> --- a/src/bin/pg_dump/pg_restore.c
> +++ b/src/bin/pg_dump/pg_restore.c
> @@ -535,7 +535,7 @@ read_restore_filters(const char *filename, RestoreOptions *opts)
> case FILTER_OBJECT_TYPE_EXTENSION:
> case FILTER_OBJECT_TYPE_FOREIGN_DATA:
> pg_log_filter_error(&fstate, _("%s filter for \"%s\" is not allowed."),
> - "exclude",
> + "include",
> filter_object_type_name(objtype));
> exit_nicely(1);
>
> Regards
>
> Pavel
>
> next update - fixed a used-but-uninitialized "is_include" variable when the filter is of FILTER_OBJECT_TYPE_NONE
Thanks, the posted patchset was indeed a bit of a sketch, thanks for fixing up
these. I'll go over it again too to clean it up and try to make it into something
committable.
I was pondering replacing the is_include handling with returning an enum for
the operation, to keep things more future proof in case we add more operations
(and also a bit less magic IMHO).
+1
Pavel
--
Daniel Gustafsson
Hi
I was pondering replacing the is_include handling with returning an enum for
the operation, to keep things more future proof in case we add more operations
(and also a bit less magic IMHO).

+1
I did it.
Regards
Pavel
--
Daniel Gustafsson
Attachment
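(Editorial aside: a sketch of the enum that replaces the is_include boolean;
the names mirror the FILTER_OBJECT_TYPE_* convention visible upthread but are
illustrative rather than a quote of the committed code:

typedef enum
{
	FILTER_COMMAND_TYPE_NONE,		/* nothing parsed yet, or an error */
	FILTER_COMMAND_TYPE_INCLUDE,	/* "include" keyword */
	FILTER_COMMAND_TYPE_EXCLUDE		/* "exclude" keyword */
} FilterCommandType;

Unlike a bool, switching over this enum without a default: label makes the
compiler warn whenever a new operation is added and a case is left
unhandled.)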
> On 20 Nov 2023, at 06:20, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> > I was pondering replacing the is_include handling with returning an enum for
> > the operation, to keep things more future proof in case we add more operations
> > (and also a bit less magic IMHO).
>
> +1
>
> I did it.

Nice, I think it's an improvement.

+          <literal>extension</literal>: data on foreign servers, works like
+          <option>--extension</option>. This keyword can only be
+          used with the <literal>include</literal> keyword.

This seems like a copy-pasteo, fixed in the attached.

I've spent some time polishing this version of the patch, among other things
trying to make the docs and --help screen consistent across the tools. I've
added the diff as a txt file to this email (to keep the CFbot from applying
it); it's mainly reformatting a few comments and making things consistent.

The attached is pretty close to a committable patch IMO, review is welcome on
both the patch and commit message. I tried to identify all reviewers over the
past 3+ years but I might have missed someone.

--
Daniel Gustafsson
Attachment
On 11/21/23 at 22:10, Daniel Gustafsson wrote:
>> On 20 Nov 2023, at 06:20, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> The attached is pretty close to a committable patch IMO, review is welcome on
> both the patch and commit message. I tried to identify all reviewers over the
> past 3+ years but I might have missed someone.

I've tested this, albeit mostly in the initial iterations (*shrug* but a
mention is nice)

Erik Rijkers

>
> --
> Daniel Gustafsson
>
> On 22 Nov 2023, at 05:27, Erik Rijkers <er@xs4all.nl> wrote:
>
> On 11/21/23 at 22:10, Daniel Gustafsson wrote:
>>> On 20 Nov 2023, at 06:20, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>
>> The attached is pretty close to a committable patch IMO, review is welcome on
>> both the patch and commit message. I tried to identify all reviewers over the
>> past 3+ years but I might have missed someone.

I took another look at this, found some more polish that was needed, added
another testcase, and ended up pushing it.

> I've tested this, albeit mostly in the initial iterations (*shrug* but a
> mention is nice)

As I mentioned above, it's easy to miss someone when reviewing three years'
worth of emails; no-one was intentionally left out. I went back and looked
and added you as a reviewer. Thanks for letting me know.

--
Daniel Gustafsson
Hi
On Wed, Nov 29, 2023 at 15:44 Daniel Gustafsson <daniel@yesql.se> wrote:
> On 22 Nov 2023, at 05:27, Erik Rijkers <er@xs4all.nl> wrote:
>
> On 11/21/23 at 22:10, Daniel Gustafsson wrote:
>>> On 20 Nov 2023, at 06:20, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>> The attached is pretty close to a committable patch IMO, review is welcome on
>> both the patch and commit message. I tried to identify all reviewers over the
>> past 3+ years but I might have missed someone.
I took another look at this, found some more polish that was needed, added
another testcase and ended up pushing it.
> I've tested this, albeit mostly in the initial iterations (*shrug* but a mention is nice)
As I mentioned above it's easy to miss when reviewing three years worth of
emails, no-one was intentionally left out. I went back and looked and added
you as a reviewer. Thanks for letting me know.
Thank you very much
Regards
Pavel
--
Daniel Gustafsson
Daniel Gustafsson <daniel@yesql.se> writes:
> I took another look at this, found some more polish that was needed, added
> another testcase and ended up pushing it.

mamba is unhappy because this uses <ctype.h> functions without casting their
arguments to unsigned char:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2023-11-30%2002%3A53%3A25

(I had not realized that we still had buildfarm animals that would complain
about this ... but I'm glad we do, because it's a hazard. POSIX is quite
clear that the behavior is undefined for signed chars.)

			regards, tom lane
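(Editorial aside: the idiom Tom is referring to, as used throughout the
PostgreSQL sources; a minimal illustration:

#include <ctype.h>
#include <stdbool.h>

/*
 * The cast to unsigned char is required: passing a negative value (a
 * high-bit byte on platforms where plain char is signed) to a <ctype.h>
 * classifier is undefined behavior per POSIX.
 */
static bool
byte_is_space(char c)
{
	return isspace((unsigned char) c) != 0;
}
)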
Hi
On Thu, Nov 30, 2023 at 4:40 Tom Lane <tgl@sss.pgh.pa.us> wrote:
Daniel Gustafsson <daniel@yesql.se> writes:
> I took another look at this, found some more polish that was needed, added
> another testcase and ended up pushing it.
mamba is unhappy because this uses <ctype.h> functions without
casting their arguments to unsigned char:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2023-11-30%2002%3A53%3A25
(I had not realized that we still had buildfarm animals that would
complain about this ... but I'm glad we do, because it's a hazard.
POSIX is quite clear that the behavior is undefined for signed chars.)
here is a patch
Regards
Pavel
regards, tom lane
Attachment
> On 30 Nov 2023, at 07:13, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> On Thu, Nov 30, 2023 at 4:40 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > Daniel Gustafsson <daniel@yesql.se> writes:
> > > I took another look at this, found some more polish that was needed, added
> > > another testcase and ended up pushing it.
> >
> > mamba is unhappy because this uses <ctype.h> functions without
> > casting their arguments to unsigned char:

Thanks for the heads-up.

> here is a patch

I agree with this fix, and have applied it.

--
Daniel Gustafsson
On Thu, Nov 30, 2023 at 14:05 Daniel Gustafsson <daniel@yesql.se> wrote:
> On 30 Nov 2023, at 07:13, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> On Thu, Nov 30, 2023 at 4:40 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Daniel Gustafsson <daniel@yesql.se> writes:
> > I took another look at this, found some more polish that was needed, added
> > another testcase and ended up pushing it.
>
> mamba is unhappy because this uses <ctype.h> functions without
> casting their arguments to unsigned char:
Thanks for the heads-up.
> here is a patch
I agree with this fix, and have applied it.
Thank you
Pavel
--
Daniel Gustafsson